This repository contains scripts for generating and validating G-code produced automatically by various LLM pipelines.
Clone the repository:

```bash
git clone https://github.com/Chitransh31/GLLM.git
```

This project uses Python 3.11. If it is not installed, you can install it via:
```bash
sudo apt update
sudo apt install python3.11
```

Alternatively, you can use pyenv to set up Python 3.11 in your repo folder:
```bash
brew install pyenv
```

Then add pyenv to your shell startup file:

```bash
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init - zsh)"' >> ~/.zshrc
```

Restart your shell (or `source ~/.zshrc`), then install Python 3.11 and verify it is available:

```bash
pyenv install 3.11
pyenv versions
```
Then, install Poetry and point it to Python 3.11:
```bash
pipx install poetry
poetry env use /usr/bin/python3.11
```

Then, install the requirements:
```bash
poetry install
```

To use Hugging Face models, you need to provide your API access token, either via Streamlit secrets or as an environment variable:
- Register or log in at Hugging Face and create an API token in your profile settings.
- Add a file called `secrets.toml` in a folder called `.streamlit` at the root of the repo, and provide your Hugging Face API token by adding the line `huggingface_token = "..."`.
- For OpenAI models, add the access token `openai_token = "YourOpenAITokenHere"` to `.streamlit/secrets.toml` (see the sketch below).
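Streamlit exposes these values through `st.secrets`. A minimal sketch of reading them, assuming the key names above (how the app actually consumes them may differ):

```python
import streamlit as st

# Read the tokens stored in .streamlit/secrets.toml.
hf_token = st.secrets["huggingface_token"]
openai_token = st.secrets.get("openai_token")  # only needed for OpenAI models
```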
Alternatively, you can open your shell's configuration file in a text editor:

```bash
vim ~/.bashrc
```

Add the following line to the end of the file:

```bash
export HUGGINGFACEHUB_API_TOKEN="YourHFTokenHere"
```

Save and close the file. To apply the changes, source the file (`source ~/.bashrc`) or restart your terminal.
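You can quickly check from Python that the token is visible to your processes; this is just a sanity-check sketch, not part of the repo:

```python
import os

# Fail early if the token was not exported or the shell config was not sourced.
token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
if token is None:
    raise RuntimeError("HUGGINGFACEHUB_API_TOKEN is not set; source ~/.bashrc first")
print(f"Token found: {token[:4]}...")  # print only a prefix, never the full secret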
Most import errors for missing libraries can be resolved using these installation commands:
```bash
pip3 install streamlit
pip3 install openai
pip3 install hub
pip3 install -e /gllm
pip3 install deeplake
pip3 install langchain
pip3 install PyPDF2
pip3 install langchain_community
pip3 install langchain_chains
pip3 install peft
pip3 install pygcode
pip3 install matplotlib
pip3 install plotly
pip3 install langgraph
pip3 install langgraph-checkpoint-sqlite
```
To run the GLLM application:
```bash
poetry run streamlit run gllm/code_generator_streamlit_reasoning_langchain_langgraph.py
```

This file contains code that takes in text and generates question-answer pairs, which can be used for LLM evaluation or instruction tuning.
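For reference, question-answer generation from text typically looks like the sketch below; the model name, prompt, and function are illustrative placeholders, not the repo's actual pipeline (assumes `OPENAI_API_KEY` is set):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa_pairs(text: str, n_pairs: int = 3) -> str:
    """Ask the model for n question-answer pairs grounded in the given text."""
    prompt = (
        f"Generate {n_pairs} question-answer pairs from the following text, "
        f"one per line in the form 'Q: ... A: ...':\n\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_qa_pairs("G0 moves the tool rapidly; G1 moves at a set feed rate."))
```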
The code was taken from GitHub; check that repository for details on how to set it up and run it.
`train_pipeline.py` contains code to finetune open-source LLMs from Hugging Face.

Run `python train_pipeline.py` to start the finetuning process. By default, the dataset used for finetuning is the set of PDF files stored in the `pdfs` directory. To use "The Stack" instead, specify `--dataset 'thestack'` (see the sketch below).
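A hypothetical sketch of how such a dataset switch is usually wired up; the actual `train_pipeline.py` may differ:

```python
import argparse

from datasets import load_dataset

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="pdfs",
                    help="'pdfs' for local PDF files, 'thestack' for bigcode/the-stack")
args = parser.parse_args()

if args.dataset == "thestack":
    # The Stack's G-code subset (requires Hugging Face login, see below).
    ds = load_dataset("bigcode/the-stack", data_dir="data/g-code", split="train")
else:
    ...  # load and chunk the PDF files from the pdfs/ directory
```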
The Stack contains code files collected from GitHub, including G-code. Around 400 MB of G-code is available, for a total of 16,020 examples.
To use this dataset, you need to log in to Hugging Face in your terminal by:

- Running `huggingface-cli login`
- Providing your Hugging Face access token
To load this dataset, use:

```python
ds = load_dataset("bigcode/the-stack", data_dir="data/g-code", split="train")
```
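If you do not want to download the full ~400 MB up front, the subset can also be streamed; this is a standard `datasets` option rather than something the repo requires:

```python
from datasets import load_dataset

# Stream the G-code subset instead of materializing it on disk.
ds = load_dataset("bigcode/the-stack", data_dir="data/g-code",
                  split="train", streaming=True)
for example in ds.take(2):
    print(example["content"][:80])  # "content" holds the raw file text
```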
So far, training is limited to models with <3B parameters due to memory limitations. Training code works for these models:
- WizardLM/WizardCoder-3B-V1.0
- bigcode/starcoderbase-3b
When training larger models, I tested several workarounds: smaller batch sizes, gradient accumulation and checkpointing, mixed-precision training, and setting `device_map='auto'` when loading the model, but nothing has worked so far (a sketch of these options follows).
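For reference, here is how those options are typically set with `transformers`; the model name and hyperparameter values are illustrative, not the repo's settings:

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# device_map="auto" spreads layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-3b", device_map="auto"
)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # smaller batch size
    gradient_accumulation_steps=8,   # simulate a larger effective batch
    gradient_checkpointing=True,     # trade compute for memory
    fp16=True,                       # mixed-precision training
)
```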
To push the model to the Hub after finetuning, make sure you are logged in via the CLI, just as when using "The Stack" dataset (provide a token that has write permission).
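With the standard `transformers` API this looks like the following; the checkpoint path and repo id are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("out")  # path to your finetuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("out")

# Requires a token with write permission (via huggingface-cli login).
model.push_to_hub("your-username/gcode-finetuned")
tokenizer.push_to_hub("your-username/gcode-finetuned")
```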
To use the StarCoder model, you need to be granted access to it. To do this:
- Log in to Hugging Face in a terminal, as described above
- Log in to the Hugging Face website and go to `bigcode/starcoder`
- Accept the conditions to access model files and content.
It is recommended to use the StarCoder Tech Assistant prompt, since the model is only trained on code completion (see the sketch below).
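A minimal sketch of wrapping a request in that prompt style; the preamble below is an abbreviated stand-in, the full Tech Assistant prompt text is published by the BigCode project:

```python
# Abbreviated stand-in for the StarCoder Tech Assistant prompt.
TECH_ASSISTANT_PREAMBLE = (
    "Below is a dialogue between a human and an AI technical assistant. "
    "The assistant gives helpful, correct answers to programming questions.\n\n"
)

def build_prompt(user_request: str) -> str:
    return f"{TECH_ASSISTANT_PREAMBLE}Question: {user_request}\n\nAnswer:"

print(build_prompt("Write G-code to mill a 10 mm square pocket."))
```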