KobiGPT is a character-level GPT language model trained exclusively on the complete collection of Rabindranath Tagore’s Bengali poems.
It generates new verses that stylistically and linguistically resemble Tagore’s original poetic rhythm, word patterns, and emotional tone.
KobiGPT is based on the GPT (Generative Pre-trained Transformer) architecture: a decoder-only transformer that predicts the next character in a sequence given the previous ones. This autoregressive modeling lets it generate coherent text one character at a time.
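Concretely, autoregressive modeling factorizes the probability of a character sequence into a product of next-character conditionals:

$$
P(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} P\left(x_t \mid x_1, \dots, x_{t-1}\right)
$$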
- Type: Character-level GPT
- Architecture: Decoder-only Transformer (like GPT-2 but smaller)
- Framework: PyTorch
- Dataset: Complete collection of Rabindranath Tagore’s Bengali poems
- Tokenizer: Character-level (139 unique characters)
- Training Device: NVIDIA T4 GPU (Google Colab Free Tier)
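For reference, a character-level tokenizer like this is typically just a pair of lookup tables built from the corpus. The snippet below is a minimal sketch under that assumption; the file path and identifiers (`stoi`, `itos`, `encode`, `decode`) are illustrative, not necessarily the repository's actual names.

```python
# Minimal sketch of a character-level tokenizer (illustrative; not the repo's exact code).
with open("data/tagore_poems.txt", encoding="utf-8") as f:  # hypothetical corpus path
    text = f.read()

chars = sorted(set(text))                      # unique characters in the corpus (139 here)
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer ID
itos = {i: ch for ch, i in stoi.items()}       # integer ID -> character

def encode(s: str) -> list[int]:
    return [stoi[ch] for ch in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)
```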
| Parameter | Description / Value |
|---|---|
| Context Length (block_size) | 256 characters |
| Vocabulary Size (vocab_size) | 139 unique characters |
| Transformer Layers (n_layer) | 6 |
| Attention Heads (n_head) | 8 |
| Embedding Dimension (n_embd) | 512 |
| Dropout | 0.2 |
| Use Bias | ✅ True |
| Total Parameters | ≈ 19.18 million |
| Optimizer | AdamW |
| Initial Learning Rate | 3 × 10⁻⁴ |
| LR Schedule | Decayed by ×0.1 at steps 3000 and 5000 |
| Batch Size | 32 |
| Training Steps | 8000 |
| Training Duration | ~1.5 hours |
| Hardware | NVIDIA T4 GPU (Google Colab Free Tier) |
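The AdamW optimizer and the ×0.1 decay at steps 3000 and 5000 map directly onto PyTorch's built-in `MultiStepLR` scheduler. A minimal sketch, assuming `model` is the KobiGPT module and `get_batch` is a hypothetical batch loader:

```python
import torch

# Optimizer and LR schedule from the table above (illustrative sketch).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[3000, 5000], gamma=0.1  # decay LR x0.1 at steps 3000 and 5000
)

for step in range(8000):                 # 8000 training steps
    xb, yb = get_batch("train")          # hypothetical loader returning a batch of 32
    _, loss = model(xb, yb)              # forward pass returns (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    scheduler.step()                     # advance the step-based LR schedule
```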
Each character is mapped to a 512-dimensional vector via a learnable embedding matrix.
Positional embeddings are added to retain sequence order.
The model consists of 6 transformer layers, each with (see the sketch after this list):
- Masked multi-head self-attention (8 heads)
- Feed-forward projection (2-layer MLP)
- Layer normalization and residual connections
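As a reference for that block structure, here is a minimal pre-norm PyTorch sketch using the hyperparameters above (illustrative; not the repository's exact code):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block: masked multi-head self-attention plus a 2-layer MLP,
    each wrapped in layer normalization and a residual connection (sketch)."""

    def __init__(self, n_embd=512, n_head=8, block_size=256, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(                 # feed-forward projection (2-layer MLP)
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )
        # Causal mask: True above the diagonal blocks attention to future positions.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
        x = x + a                                 # residual connection around attention
        x = x + self.mlp(self.ln2(x))             # residual connection around the MLP
        return x
```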
The final layer projects the hidden representation back into the vocabulary space, producing logits for each character.
- The input sequence (e.g., "\n") is encoded into character IDs.
- Each ID is converted into an embedding and combined with positional encodings.
- The transformer layers use self-attention to model dependencies between all characters.
- The model predicts a probability distribution over the next character.
- Loss is computed via cross-entropy (a forward-pass sketch follows this list).
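Putting those steps together, a hedged sketch of the full forward pass and loss (illustrative; `Block` is the module sketched above, and the attribute names are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KobiGPT(nn.Module):
    def __init__(self, vocab_size=139, block_size=256, n_embd=512, n_layer=6, n_head=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)   # character embeddings
        self.pos_emb = nn.Embedding(block_size, n_embd)   # learned positional embeddings
        self.blocks = nn.Sequential(
            *[Block(n_embd, n_head, block_size) for _ in range(n_layer)]
        )
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)      # project back to vocabulary space

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)         # token + positional embeddings
        x = self.blocks(x)                                # stacked transformer blocks
        logits = self.lm_head(self.ln_f(x))               # (B, T, vocab_size) logits
        loss = None
        if targets is not None:
            # Cross-entropy over every position's next-character prediction.
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss
```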
| Metric | Value |
|---|---|
| Final Train Loss | 1.50 |
| Final Val Loss | 1.69 |
| Training Steps | 8000 |
| GPU Time | ~1.5 hours |
The final validation loss of ~1.69 nats (≈2.44 bits per character) indicates strong character-level modeling capability for a small GPT trained on poetic text.
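For reference, cross-entropy in nats converts to bits per character by dividing by $\ln 2$:

$$
\frac{1.69\ \text{nats}}{\ln 2} \approx \frac{1.69}{0.693} \approx 2.44\ \text{bits/char}
$$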
From a generation run without any prompt (starting from "\n"):
হাসি ছুটি তাে গান গেয়ে,
তারে এঁকে দেবে হেসে।
এ ভুবনচরণ তৃষা এসে বেদনা,
তরুণ হৃদয় উঠে শ্যামা মেঘের ফুল
ওই যেন আলো
ঝাউয়ের'পরে পড়ে আজ।
আকাশ ছিঁড়ে নাইল চোখে।
স্নেহমুখীর ছড়ায়ে নাই, চমকি
চূর্ণ ছড়িয়া চলে যায়,
তার কপোলে তার বালে দেখী,
বাঁধন কেঁদে উড়ে যায়।
হৃদয় কুসুম সাগরে
মিশে কে জানিল না কখনো,
চেয়ে না এসে না বেশি বেশি বেদে।
বসন্ত যবে কালবৈশাখে,
ডালি পাছে ঝরিয়া গাঁথিতে খুঁজে,
ফুল যে পাছে ফুটে ফুটে,
তারি সুর পথে পাতা পাতা তফুটে।
হাসিমুখে মেঘের কোণে
রয়েছে চিরদিন
ব্যাপিয়া মধু,
প্রভাতসংগীত
ভালোবাসিল আকাশ-কুমল।
With a custom prompt: "বিনয় ভাবছে আকাশ কেন নীল," ("Binoy wonders why the sky is blue,"):
বিনয় ভাবছে আকাশ কেন নীল,
ভবিষ্যৎ মাঞ্চিতের হানল;
আমাদের মন্ত্র দেখছি বিস্ময়ের ক্রন্দনে
আমাকে এসেছি আমার দলে, কেবল সিহাস করে।
বিশ্বাস-বীরে কল্পনা...
When writing a prompt, avoid characters outside the known vocabulary to prevent unknown-token issues; unknown characters are currently mapped to a space (' ').
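That fallback amounts to a one-line change to the encoder sketched earlier (illustrative, assuming the same `stoi` table):

```python
def encode(s: str) -> list[int]:
    # Characters missing from the training vocabulary fall back to space (' ').
    return [stoi.get(ch, stoi[" "]) for ch in s]
```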
You can generate text using:
```bash
python src/generate.py
```

You can adjust `max_new_tokens` and `temperature` in `src/generate.py` to control generation length and creativity.
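For context, a typical character-level sampling loop with those two knobs looks like this (a hedged sketch; the actual `src/generate.py` may differ):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=500, temperature=1.0, block_size=256):
    """Autoregressively sample characters; higher temperature = more creative output."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]             # crop to the context window
        logits, _ = model(idx_cond)
        logits = logits[:, -1, :] / temperature     # rescale the next-character logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)      # append and continue
    return idx
```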
To train the model from scratch:

```bash
python src/train.py
```

Tweak hyperparameters in `src/train.py` as needed.
- Inspired by Andrej Karpathy's nanoGPT implementation.
- Trained and fine-tuned on Rabindranath Tagore's Bengali poem corpus.
- Dataset: public domain Bengali poems by Rabindranath Tagore, available on Kaggle.
- Developed and trained by Binoy Bhushan Barman Dipu as part of learning and experimentation with GPT architectures.
- Implement BPE or WordPiece tokenization for better semantic coherence.
- Experiment with larger context lengths (512+) and deeper models.
- Add Bangla rhyme and meter control for stylistic conditioning.
- Integrate a UI-based poem generator using Streamlit or Flask.
“KobiGPT is not just a model — it’s a poetic mirror where Tagore’s spirit of words finds a neural echo.”
