GujjuGPT

Getting Started

GujjuGPT is a Gujarati-language LLM. It takes Gujarati text as input, processes it, and returns its output in Gujarati.

It is still under development.

Steps

Data collection - Done
Data extraction - Done
Data cleaning - Done
Creating a high-quality instruction-style dataset - In progress
   i. Fine-tuned Llama 1B on 500 Q/A pairs - Done
   ii. Generating more questions - In progress
Quantization of the model - Done
Fine-tuning - In progress (v0 done)
Inference - Not completed
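The instruction-style dataset step (turning Q/A pairs into supervised training examples) can be sketched as below. This is a minimal sketch under stated assumptions: the Alpaca-style prompt template, the record fields (`instruction`, `output`, `text`) and the helpers `to_instruction_record` and `build_dataset` are hypothetical, not the project's actual format.

```python
import json

# Hypothetical Alpaca-style template; the project's real prompt format may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)


def to_instruction_record(question: str, answer: str) -> dict:
    """Convert one Gujarati Q/A pair into an instruction-style training record."""
    question = question.strip()
    answer = answer.strip()
    if not question or not answer:
        raise ValueError("question and answer must be non-empty")
    return {
        "instruction": question,
        "output": answer,
        "text": PROMPT_TEMPLATE.format(instruction=question, response=answer),
    }


def build_dataset(qa_pairs):
    """Drop pairs with an empty side or a repeated question; return the records."""
    seen = set()
    records = []
    for question, answer in qa_pairs:
        key = question.strip()
        if not key or not answer.strip() or key in seen:
            continue
        seen.add(key)
        records.append(to_instruction_record(question, answer))
    return records


if __name__ == "__main__":
    # Illustrative pair: "What is the capital of Gujarat?" / "The capital of Gujarat is Gandhinagar."
    pairs = [
        ("ગુજરાતની રાજધાની કઈ છે?", "ગુજરાતની રાજધાની ગાંધીનગર છે."),
        ("ગુજરાતની રાજધાની કઈ છે?", "ગાંધીનગર."),  # repeated question, skipped
    ]
    # ensure_ascii=False keeps the Gujarati script readable in JSONL output
    for record in build_dataset(pairs):
        print(json.dumps(record, ensure_ascii=False))
```

Deduplicating on the question keeps a few frequent prompts from dominating a small fine-tuning set like the 500 Q/A pairs mentioned above.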

Tech Specifications

Base model: Llama-2-7B (switched from Mistral to Llama)

Fine-tuning technique: LoRA with PEFT

Dataset: Sanghara (Bhasini)
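LoRA keeps the base model's weights frozen and learns a small low-rank update. PEFT is the Hugging Face library that applies it to a model. The plain-Python sketch below shows the idea for a single linear layer: the output is W x + (alpha / r) * B (A x), with B initialised to zero so training starts from the unmodified base model. The class name `LoRALinear` and the initialisation details are illustrative assumptions. This is neither the project's code nor PEFT's implementation.

```python
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight            # frozen base weight, never updated
        self.scaling = alpha / rank     # LoRA scaling factor alpha / r
        # A: rank x d_in, small random values; B: d_out x rank, zeros,
        # so B @ A starts at zero and the layer initially equals the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_parameters(self) -> int:
        # Only A and B are trained; W stays frozen.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=2.0)
    print(layer.forward([3.0, 4.0]))      # equals W x because B is all zeros
    print(layer.trainable_parameters())   # 1*2 (A) + 2*1 (B)
```

Because only A and B receive gradients, optimizer state and gradient memory scale with the rank r rather than with the full weight matrix.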

Versions

GujjuGPT-v0

Generates partially context-aware Gujarati text for a given prompt
Trained on a small dataset; best suited to Gujarati-language and Gujarat-specific information
No GUI yet; it is a console-only GPT for now
No inference
Has not been evaluated against standard metrics

GujjuGPT-v1

Has its own tokenizer, which lets it embed input and capture context better
Adds a ChatGPT-like UI for inference
Current stats:
- Training loss: 0.352
- Validation loss: 0.346
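To put the loss figures in more familiar terms, they can be converted to perplexity. This assumes the reported values are mean per-token cross-entropy in nats (natural log), which is what PyTorch and the Hugging Face Trainer report by default. The README does not state this, so treat the conversion as a sketch.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_cross_entropy)


if __name__ == "__main__":
    # Loss values taken from the v1 stats above.
    for name, loss in [("training", 0.352), ("validation", 0.346)]:
        print(f"{name}: loss={loss:.3f} perplexity={perplexity(loss):.2f}")
```

Under that assumption both values correspond to a perplexity of roughly 1.4. A validation loss this close to the training loss suggests the model is not heavily overfitting its training split. It says nothing about output quality on unseen Gujarati text, which is why metric-based evaluation remains an open item.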

Limitations

Limited compute prevents training on a larger dataset
The model is currently fine-tuned on small chunks of data using LoRA with PEFT
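A quick parameter count shows why LoRA fits a limited compute budget. Adapting a d_out x d_in matrix with rank r adds r * (d_in + d_out) trainable weights (A is r x d_in, B is d_out x r). The sketch uses Llama-2-7B's dimensions (32 transformer layers, hidden size 4096). The rank of 8 and the choice to adapt only the query and value projections are assumptions, since the README does not give the actual LoRA settings.

```python
def lora_trainable_params(num_layers: int, targets_per_layer: int,
                          d_in: int, d_out: int, rank: int) -> int:
    """Each adapted d_out x d_in matrix gains A (rank x d_in) and B (d_out x rank)."""
    return num_layers * targets_per_layer * rank * (d_in + d_out)


# Assumed setup: Llama-2-7B dimensions (32 layers, hidden size 4096) with LoRA
# rank 8 on the query and value projections (two 4096 x 4096 matrices per layer).
trainable = lora_trainable_params(num_layers=32, targets_per_layer=2,
                                  d_in=4096, d_out=4096, rank=8)
approx_total = 6.7e9  # approximate parameter count of Llama-2-7B

if __name__ == "__main__":
    print(f"trainable LoRA parameters: {trainable:,}")  # 4,194,304
    print(f"fraction of base model: {100 * trainable / approx_total:.3f}%")
```

Under these assumptions only about 4.2 million weights (well under 0.1% of the model) are trained. Full fine-tuning would need gradients and optimizer state for all ~6.7 billion.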
