The inference results appear to be a repetition of several words

I am using the model `pentagoniac/SEMIKONG-8b-GPTQ` and starting the server with `python -m vllm.entrypoints.api_server --model /public/home/lulingyi/repo/semikong/model/SEMIKONG-8b-GPTQ --device cuda --max-lora-rank 32 --dtype float16 --port 8080`.
When I run inference, the responses are repetitive: the output consists of the same few words or numbers over and over. How can I address this issue?
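
To make the problem easier to reproduce and quantify, here is a minimal sketch of a client for the server started above. It assumes the demo `api_server` exposes `POST /generate`, accepts sampling fields such as `max_tokens`, `temperature`, and `repetition_penalty` in the JSON body, and returns the completions under a `"text"` key; I have not confirmed this against the installed vLLM version. The helper names (`build_payload`, `repeated_ngram_ratio`, `generate`) and the default values are hypothetical, chosen only for illustration, and are not a confirmed fix.

```python
import json
import urllib.request
from collections import Counter


def build_payload(prompt, max_tokens=128, temperature=0.7, repetition_penalty=1.1):
    """Build the JSON body for the request.

    The default values are illustrative assumptions, not recommended settings.
    """
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "repetition_penalty": repetition_penalty,
    }


def repeated_ngram_ratio(text, n=3):
    """Return the fraction of whitespace-token n-grams that occur more than once.

    A value close to 1.0 means the text is almost entirely repeated phrases.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def generate(prompt, url="http://localhost:8080/generate", **sampling):
    """Send one request to the server (assumed endpoint and response shape)."""
    body = json.dumps(build_payload(prompt, **sampling)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response looks like {"text": ["<prompt + completion>", ...]}
        return json.loads(response.read())["text"]
```

With the server running, calling `generate("some prompt", repetition_penalty=1.0)` and then again with a higher penalty, and comparing `repeated_ngram_ratio` on the two outputs, would show whether the sampling settings affect the repetition or whether the cause lies elsewhere (for example in how the quantized weights are loaded).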

