Bharat MiniGPT 350M: A True Custom GPT-Style LLM Built from Scratch in India

May 12, 2026 Harshvardhan Mishra

Most AI projects today are built by fine-tuning existing open-source models. Many take a pretrained architecture and change only the dataset or the instruction tuning. A few projects, however, implement the entire transformer architecture, training pipeline, and optimization system themselves.

Bharat MiniGPT 350M is one such Indian AI project, developed from scratch by HVM Smart Solutions (Harshvardhan Mishra).

Importantly, it is not a simple derivative of GPT-2, LLaMA, or any other existing open model. Its core transformer implementation was written by hand in PyTorch, and Hugging Face ecosystem compatibility was added later.

What Is Bharat MiniGPT 350M?

Bharat MiniGPT 350M is a causal language model based on a custom decoder-only Transformer, built using modern LLM engineering concepts.

The project was not built only to produce a chatbot. Its focus is:

Foundational LLM engineering
Transformer implementation
Large-scale pretraining
Training stability experimentation
Attention optimization
Inference systems
KV-cache handling
Hugging Face compatibility
Future lightweight AI deployment

Why Is This Model Different?

Many of the “custom AI models” found online today are in fact wrappers around existing architectures or fine-tuned variants of them. In Bharat MiniGPT 350M, by contrast, several core systems are implemented by hand.

The shared training code clearly shows the following custom implementations:

Custom RMSNorm implementation
Manual RoPE logic
Custom SwiGLU feed-forward blocks
Self-written attention system
Decoder transformer blocks
Manual generation pipeline
KV-cache compatible generation structure
Streaming dataset pipeline
Custom LR scheduling
Gradient checkpointing integration

The model was then made compatible with the Hugging Face Transformers ecosystem to simplify inference and deployment.

In other words, the project began as a custom transformer implementation and was integrated into the HF-compatible ecosystem afterwards.

Model Specifications

Feature	Details
Model Name	Bharat MiniGPT 350M
Parameters	~350 Million
Architecture	Custom Decoder-only Transformer
Training Tokens	3 Billion
Framework	PyTorch
HF Compatibility	Added Later
Developer	Harshvardhan Mishra
Organization	HVM Smart Solutions

Architecture Details

The model uses modern GPT-style architecture concepts.

Component	Details
Transformer Layers	24
Attention Heads	16
Embedding Size	1024
Context Length	768 Tokens
Vocabulary Size	50,257
Positional Encoding	RoPE
Normalization	RMSNorm
Feed Forward	SwiGLU
Attention	SDPA / Flash Attention Compatible
Weight Tying	Yes
Precision	FP16
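
As a sanity check on the ~350 Million figure, the table values can be plugged into a rough parameter count. This is a sketch under stated assumptions, not the project's own accounting: the SwiGLU hidden width (`ffn_hidden`, set here to roughly 8/3 of the embedding size, a common sizing) and the absence of bias terms are not given in the article.

```python
def count_parameters(vocab_size=50_257, d_model=1024, n_layers=24,
                     ffn_hidden=2730, tie_weights=True):
    """Rough parameter count for a bias-free decoder-only transformer.

    ffn_hidden is an ASSUMED value (about 8/3 * d_model); the article
    does not state the feed-forward width.
    """
    embedding = vocab_size * d_model        # token embedding table
    attention = 4 * d_model * d_model       # Q, K, V and output projections
    swiglu = 3 * d_model * ffn_hidden       # gate, up and down projections
    norms = 2 * d_model                     # two RMSNorm gain vectors per block
    per_layer = attention + swiglu + norms
    final_norm = d_model
    # With weight tying the output head reuses the embedding table.
    lm_head = 0 if tie_weights else vocab_size * d_model
    return embedding + n_layers * per_layer + final_norm + lm_head


total = count_parameters()
print(f"{total / 1e6:.1f}M parameters")
```

RoPE adds no learned parameters, so it does not appear in the count. Under these assumptions the total lands close to the stated model size, and untying the weights would add one more vocabulary-sized matrix.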

Custom Transformer Engineering

The most interesting aspect of this project is its manual engineering approach.

Custom RMSNorm

RMSNorm is implemented by hand in the code rather than through a built-in transformer abstraction.

This lightweight normalization strategy is popular in modern LLMs.
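
To make the idea concrete, here is a minimal dependency-free sketch of RMSNorm on a single vector. It is not the project's PyTorch module; the function name and the `eps` value are assumptions chosen for illustration.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root mean square, then apply a per-feature gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    eps is an assumed stabilizer value.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


hidden = [0.5, -1.5, 2.0, 0.0]
normed = rms_norm(hidden, gain=[1.0] * len(hidden))
normed_mean_square = sum(v * v for v in normed) / len(normed)
```

With a gain of all ones, the output has a mean square very close to 1, which is the property the normalization is meant to enforce.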

Manual RoPE Implementation

The Rotary Position Embedding (RoPE) logic is written by hand.

This positional encoding approach is considered important for long-context modeling and is widely used in modern GPT-style architectures.
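
The core of RoPE can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's code: it uses the interleaved pairing convention and a frequency base of 10000, neither of which is confirmed by the article.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of vec by position-dependent angles.

    Assumes interleaved pairs (0,1), (2,3), ... and base=10000.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)   # earlier pairs rotate faster
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]
# Same relative offset (2) at two different absolute positions.
score_near = dot(apply_rope(q, 3), apply_rope(k, 1))
score_shifted = dot(apply_rope(q, 13), apply_rope(k, 11))
```

The two scores agree up to floating-point error, which shows the property that makes RoPE attractive: the query-key dot product depends only on the relative distance between positions.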

SwiGLU Feed Forward Network

The SwiGLU activation mechanism is implemented by hand in the feed-forward layers.

This architectural choice helps improve the transformer's expressiveness.
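
A SwiGLU block multiplies a SiLU-activated "gate" projection with a plain "up" projection, then projects back down. The sketch below shows that data flow with toy weights in plain Python; the weight names and sizes are hypothetical, and the project itself implements this in PyTorch.

```python
import math


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """down( silu(gate(x)) * up(x) ), without bias terms."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy sizes: model width 2, hidden width 3 (illustrative values only).
x = [1.0, -2.0]
w_gate = [[0.5, 0.1], [-0.3, 0.2], [0.0, 0.4]]
w_up = [[0.2, -0.1], [0.7, 0.3], [-0.5, 0.6]]
w_down = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
```

Compared with a classic two-matrix feed-forward layer, this uses three weight matrices, which is why SwiGLU models often pick a narrower hidden width to keep the parameter count similar.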

Custom Attention System

The attention module is self-written and includes scaled dot-product attention with RoPE integration.

It is built on an SDPA and Flash Attention compatible structure.
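
For readers unfamiliar with the operation, the following is a minimal single-head sketch of causal scaled dot-product attention in plain Python. It illustrates the math only and is not the project's implementation; in PyTorch the same computation is available as `torch.nn.functional.scaled_dot_product_attention` with `is_causal=True`, which can dispatch to Flash Attention kernels.

```python
import math


def softmax(scores):
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    width = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        # Position t may only attend to positions 0..t.
        scores = [scale * sum(a * b for a, b in zip(q, keys[s]))
                  for s in range(t + 1)]
        weights = softmax(scores)
        outputs.append([sum(w * values[s][j] for s, w in enumerate(weights))
                        for j in range(width)])
    return outputs


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attended = causal_attention(queries, keys, values)
```

Because of the mask, the first position can only see itself, so its output equals the first value vector, and changing later tokens never alters earlier outputs.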

Hugging Face Compatibility Was Added Later

This is an important technical detail.

Bharat MiniGPT was first developed as a standalone PyTorch transformer implementation. It was later made compatible with the Hugging Face ecosystem to enable:

Easier inference
Better deployment
generate() support
Model sharing
Standardized loading
Community compatibility

In other words, the project's original focus was architecture and training pipeline development, not merely building an HF wrapper.

Training Dataset

The model was trained on a weighted dataset mixture.

Dataset	Weight
HuggingFaceFW/fineweb (sample-10BT)	40%
HuggingFaceFW/fineweb-edu (sample-10BT)	30%
Wikimedia Wikipedia	30%

The streaming dataset architecture in the training pipeline was also built by hand.
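
One way to realize a 40/30/30 mixture over streaming sources is to pick the source of each document at random according to the weights. The sketch below shows that idea with fake in-memory streams; the function names, the seed, and the stop-on-exhaustion rule are assumptions, and the project's actual pipeline may work differently.

```python
import itertools
import random

MIXTURE = {
    "HuggingFaceFW/fineweb (sample-10BT)": 0.40,
    "HuggingFaceFW/fineweb-edu (sample-10BT)": 0.30,
    "Wikimedia Wikipedia": 0.30,
}


def mix_streams(streams, weights, seed=0):
    """Yield (source, document) pairs, choosing the source by weight."""
    rng = random.Random(seed)
    names = list(streams)
    iterators = {name: iter(streams[name]) for name in names}
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            return  # assumed policy: stop when any source runs dry


def fake_stream(tag):
    """Stand-in for a real streaming dataset (hypothetical)."""
    for i in itertools.count():
        yield f"{tag}-{i}"


streams = {name: fake_stream(name) for name in MIXTURE}
picked = [src for src, _ in itertools.islice(mix_streams(streams, MIXTURE), 10_000)]
share = {name: picked.count(name) / len(picked) for name in MIXTURE}
```

Over many draws the observed shares approach the configured weights. The Hugging Face `datasets` library offers a comparable built-in, `interleave_datasets` with a `probabilities` argument, for pipelines that do not need a hand-written mixer.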

Training Setup

Setting	Value
Optimizer	AdamW
Learning Rate	3e-4
Warmup Steps	51,200
LR Scheduler	Cosine Decay
Gradient Accumulation	128
Mixed Precision	FP16
Gradient Clipping	1.0
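
The learning-rate settings in the table can be turned into a schedule function. The sketch below assumes a linear warmup followed by cosine decay; `total_steps` and `min_lr` are hypothetical values, since the article gives neither the total step count nor a learning-rate floor.

```python
import math


def lr_at(step, max_lr=3e-4, warmup_steps=51_200,
          total_steps=1_000_000, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    total_steps and min_lr are ASSUMED for illustration.
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    decay_span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


schedule = [lr_at(s) for s in (0, 25_600, 51_200, 500_000, 1_000_000)]
```

The rate climbs from zero to the peak of 3e-4 during warmup and then falls smoothly, which avoids large updates while the optimizer statistics are still noisy.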

Engineering Features

The project includes several advanced engineering features:

Custom GPT implementation
RoPE embeddings
RMSNorm
SwiGLU
Flash Attention compatible SDPA
Gradient checkpointing
Mixed precision training
Weight tying
Streaming datasets
KV-cache compatible generation
Custom cosine LR scheduler
Manual checkpoint recovery system

Current Stage: Base Pretrained Model

It is important to understand what Bharat MiniGPT 350M currently is:

Base pretrained checkpoint
3B token stage
Non-instruction tuned model

This means:

It is not a polished assistant like ChatGPT
No conversational optimization has been done
No RLHF training has been done
No alignment tuning has been done

For now, its focus is foundational language learning and architecture experimentation.

Benchmark Results

Current benchmark scores from the EleutherAI LM Evaluation Harness:

Task	Metric	Score
ARC Easy	acc	0.3312
HellaSwag	acc	0.2650
PIQA	acc	0.5631

These results are for the early 3B-token pretrained stage.
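
To put these scores in context, it helps to compare them with random guessing. The short calculation below does that; the chance levels are approximations based on the usual task formats (two options for PIQA, four for HellaSwag, and mostly four for ARC Easy) and are not taken from the article.

```python
SCORES = {"ARC Easy": 0.3312, "HellaSwag": 0.2650, "PIQA": 0.5631}

# Approximate random-guess accuracy per task (assumed from task format).
CHANCE = {"ARC Easy": 0.25, "HellaSwag": 0.25, "PIQA": 0.50}

margins = {task: round(SCORES[task] - CHANCE[task], 4) for task in SCORES}

for task, margin in margins.items():
    print(f"{task}: {margin:+.4f} above chance")
```

All three scores sit above chance by modest margins, which is consistent with the model being at an early pretraining stage.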

Technical Importance of the Project

Implementing a transformer architecture from scratch is more than a coding task. It involves several complex engineering problems:

Memory optimization
Attention stability
Gradient scaling
Precision handling
Dataset streaming
Checkpoint management
Token generation stability
Training recovery
GPU utilization

This is why such projects matter in the AI engineering learning ecosystem.

Future Improvements

According to the project roadmap, several improvements are possible:

Better Tokenizer

Tokenizer quality directly affects language modeling.

Larger Token Training

Additional pretraining beyond 3B tokens could improve capability.

Instruction Tuning

Instruction tuning could improve conversational capability in the future.

Better Inference

ONNX, KV-cache, and quantization support could be useful in the future.

Indian Language Expansion

Future optimization for Hindi and other Indian languages is possible.

Independent AI Development in India

AI adoption is growing rapidly in India, but foundational LLM engineering projects are still limited.

Such projects are important because they:

Teach practical AI engineering
Encourage open experimentation
Support the local research ecosystem
Give students an architecture-level understanding

Conclusion

Bharat MiniGPT 350M is not just another fine-tuned AI model. It is a custom transformer engineering project developed from scratch.

Its architecture, RoPE implementation, RMSNorm, SwiGLU blocks, attention system, and training pipeline were developed by hand, and Hugging Face compatibility was added later.

Although the model is still at an early pretrained stage and needs considerable further refinement, it can be regarded as an interesting and technically valuable effort in India's independent AI engineering ecosystem.