
Performance Optimization

Kaggle notebook: https://www.kaggle.com/code/ariyasaran/ml-mixtral-assignopt/notebook

Running Inference Scripts

```shell
python3 Infernce_scripts/Base_script.py --model mistralai/Mistral-7B-v0.1 --prompt "I am batman" --output_length 1000 --hf_token "XYZ"
```
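If you want to see what such a baseline script looks like, here is a minimal sketch (not the repo's actual code; only the flag names mirror the command above, everything else is an assumption):

```python
# Minimal sketch of a baseline Hugging Face inference script.
# Not the repo's actual code; flag names mirror the command above.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Baseline Mistral inference")
    parser.add_argument("--model", default="mistralai/Mistral-7B-v0.1")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--output_length", type=int, default=1000)
    parser.add_argument("--hf_token", default=None)
    return parser

def run(args):
    # Heavy imports kept here so the module can be inspected without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(args.model, token=args.hf_token)
    model = AutoModelForCausalLM.from_pretrained(
        args.model, torch_dtype=torch.float16, device_map="auto", token=args.hf_token
    )
    inputs = tokenizer(args.prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=args.output_length)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    run(build_parser().parse_args())
```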

Run test scripts directly

```shell
python3 test.py
```
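The contents of test.py are not shown here; a minimal throughput check in the same spirit could look like the following (hypothetical names throughout, with a dummy generator standing in for the model):

```python
# Hypothetical throughput test; generate_tokens is a stand-in for model.generate().
import time

def generate_tokens(n):
    # Dummy generator returning n fake tokens instead of calling a real model.
    return list(range(n))

def tokens_per_second(n_tokens):
    start = time.perf_counter()
    tokens = generate_tokens(n_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / max(elapsed, 1e-9)

def test_throughput_is_positive():
    assert tokens_per_second(1000) > 0

if __name__ == "__main__":
    test_throughput_is_positive()
    print("ok")
```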

Small Comparison

This is a very simple comparison that does not take many factors into account.


Groq = 559 tokens/sec > Ours = 260 tokens/sec > Fireworks = 251 tokens/sec
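To put those figures in perspective, the relative throughput works out as plain arithmetic on the numbers above:

```python
# Relative throughput from the comparison above (tokens/sec).
rates = {"Groq": 559, "Ours": 260, "Fireworks": 251}

for name, rate in rates.items():
    print(f"{name}: {rate / rates['Ours']:.2f}x ours")
# Groq comes out roughly 2.15x our throughput; Fireworks slightly below ours.
```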


TODO

  • Improve speed: use vLLM and AWQ quantization to improve throughput
  • More on quantization: try 2-bit and 3-bit quantization with llama.cpp (GGUF models)
  • Kaggle notebook support: modify the code to run on Kaggle's 2×T4 GPU setup
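The planned vLLM + AWQ path could be sketched roughly as below (a sketch under assumptions, not part of this repo yet: it assumes vllm is installed with a GPU available, and the AWQ checkpoint name is a placeholder):

```python
# Sketch of the planned vLLM + AWQ inference path; not yet part of this repo.
# The config dict mirrors vLLM's SamplingParams arguments.
SAMPLING_CONFIG = {"temperature": 0.8, "max_tokens": 1000}

def main():
    # Imported lazily: vLLM needs a GPU, so keep the module importable without one.
    from vllm import LLM, SamplingParams

    # Placeholder AWQ checkpoint; substitute whichever quantized model you use.
    llm = LLM(model="TheBloke/Mistral-7B-v0.1-AWQ", quantization="awq")
    params = SamplingParams(**SAMPLING_CONFIG)
    for out in llm.generate(["I am batman"], params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```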

About

Scripts to optimise the model for faster inference
