LLM Benchmark for Throughput via Ollama (Local LLMs)
Installation:
pip install llm-bench

Usage:
llm_bench run

A 7B model can be run on machines with 8GB of RAM.
A 13B model can be run on machines with 16GB of RAM.
On Windows, Linux, and macOS, the tool detects the amount of system RAM and first downloads the required LLM models (see the sketch after the model lists below).

When RAM is at least 4GB but less than 7GB, it checks whether gemma:2b exists and pulls the model implicitly:
ollama pull gemma:2b

When RAM is greater than 7GB but less than 15GB, it checks whether these models exist and pulls them implicitly:
ollama pull gemma:2b
ollama pull gemma:7b
ollama pull mistral:7b
ollama pull llama2:7b
ollama pull llava:7b

When RAM is greater than 15GB, it checks whether these models exist and pulls them implicitly:
ollama pull gemma:2b
ollama pull gemma:7b
ollama pull mistral:7b
ollama pull llama2:7b
ollama pull llama2:13b
ollama pull llava:7b
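The following is a minimal sketch of this RAM-based model selection, not the actual llm-bench implementation. It assumes the psutil package is installed and the ollama CLI is on PATH; the tier thresholds mirror the description above.

import subprocess

import psutil

# RAM tiers (in GB) mapped to the models to pull; thresholds follow the text
# above. This is an illustrative sketch, not llm-bench's own code.
TIERS = [
    (4, 7, ["gemma:2b"]),
    (7, 15, ["gemma:2b", "gemma:7b", "mistral:7b", "llama2:7b", "llava:7b"]),
    (15, float("inf"), ["gemma:2b", "gemma:7b", "mistral:7b", "llama2:7b",
                        "llama2:13b", "llava:7b", "llava:13b"]),
]

def pull_models_for_ram():
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    for low, high, models in TIERS:
        if low <= ram_gb < high:
            for model in models:
                # "ollama pull" downloads the model only if it is not already present.
                subprocess.run(["ollama", "pull", model], check=True)
            break

if __name__ == "__main__":
    pull_models_for_ram()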
ollama pull llava:13b

For development, install Poetry manually (see https://python-poetry.org/docs/#installing-manually):
python3 -m venv .venv
. ./.venv/bin/activate
pip install -U pip setuptools
pip install poetry
poetry shell
poetry install
llm_benchmark hello jason

Example #1: Standard benchmark run:
llm_bench run

Example #2: Benchmark run that does not send system info:
llm_bench run --no-sendinfo

Example #3: Benchmark run with an explicitly given path to the ollama executable (when you have built your own developer version of ollama):
llm_bench run --ollamabin=~/code/ollama/ollama
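If you want to drive the benchmark from a Python script, here is a minimal sketch that simply shells out to the llm_bench CLI shown above; this wrapper is illustrative and not a documented API.

import subprocess
from typing import Optional

def run_benchmark(ollama_bin: Optional[str] = None, send_info: bool = True) -> str:
    # Build the same CLI invocations shown in the examples above.
    cmd = ["llm_bench", "run"]
    if not send_info:
        cmd.append("--no-sendinfo")
    if ollama_bin:
        cmd.append(f"--ollamabin={ollama_bin}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(run_benchmark(send_info=False))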