Running `python sweep.py --config ./configs/my_sweep_14B.json`, I consistently get a 400 Bad Request from llama-server:
[1/12] 14B Q8_0 (baseline)
[llama.cpp] Starting server on 127.0.0.1:8080
Command: /home/filip/Development/Builds/llama.cpp/build/bin/llama-server -m /home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-14B-Instruct-GGUF/Qwen2.5-14B-Instruct-Q8_0.gguf --host 127.0.0.1 --port 8080 -ngl 99 -c 4096
[llama.cpp] Server pid=46706
Waiting for server ...
Server ready. Benchmarking ...
[14B Q8_0 baseline] request 1/3 ERROR: 400 Client Error: Bad Request for url: http://127.0.0.1:8080/v1/chat/completions
[14B Q8_0 baseline] request 2/3 ERROR: 400 Client Error: Bad Request for url: http://127.0.0.1:8080/v1/chat/completions
[14B Q8_0 baseline] request 3/3 ERROR: 400 Client Error: Bad Request for url: http://127.0.0.1:8080/v1/chat/completions
Server stopped (pid 46706)
Result: 0 tok/s
my_sweep_14B.json
{
  "name": "qwen2.5-14B",
  "hardware": "9070 XT",
  "backend": "llamacpp",
  "model_family": "Qwen2.5",
  "targets": [
    {"label": "14B Q8_0", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-14B-Instruct-GGUF/Qwen2.5-14B-Instruct-Q8_0.gguf"},
    {"label": "14B Q4_K_M", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-14B-Instruct-GGUF/Qwen2.5-14B-Instruct-Q4_K_M.gguf"}
  ],
  "drafts": [
    {"label": "0.5B Q8_0", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-Coder-0.5B-Instruct-GGUF/Qwen2.5-Coder-0.5B-Instruct-Q8_0.gguf"},
    {"label": "1.5B Q8_0", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-1.5B-Instruct-GGUF/Qwen2.5-1.5B-Instruct-Q8_0.gguf"},
    {"label": "1.5B Q4_K_M", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-1.5B-Instruct-GGUF/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf"},
    {"label": "3B Q8_0", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-Q8_0.gguf"},
    {"label": "3B Q4_K_M", "path": "/home/filip/.lmstudio/models/lmstudio-community/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-Q4_K_M.gguf"}
  ],
  "settings": {
    "llama_bin": "/home/filip/Development/Builds/llama.cpp/build/bin/llama-server",
    "runs": 1,
    "max_tokens": 1024,
    "temperature": 0.0,
    "gpu_layers": 99,
    "ctx_size": 4096,
    "port": 8080
  }
}
I've double-checked my config paths, and they seem correct. Any help would be appreciated.
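One thing that may help narrow this down: llama-server usually returns a JSON error body with a 400 explaining which field it rejected, but `raise_for_status()`-style handling only prints the status line. A minimal sketch to surface that body (the payload fields here are assumptions modeled on the settings in the config; the actual request sweep.py builds isn't shown):

```python
import json
import urllib.request
import urllib.error

# Hypothetical payload mirroring the sweep settings; an invalid or
# missing field here is the usual cause of a 400 from /v1/chat/completions.
payload = {
    "model": "qwen2.5-14B",  # assumption: whatever name sweep.py sends
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 1024,
    "temperature": 0.0,
}

def debug_request(url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST the payload and print the server's error body on a 4xx/5xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        # The body typically names the offending parameter.
        print(e.code, e.read().decode("utf-8"))
```

Running this (or printing `response.text` in sweep.py's exception handler) should show the server's own explanation for the 400, which would make the root cause much clearer than the status line alone.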