Conversation
5ae9f2f to
60c7c38
Compare
60c7c38 to
a360e42
Compare
tokenizer.go
Outdated
| // defer C.free(unsafe.Pointer(cErr)) | ||
|
|
||
| cChatTemplate := C.new_chat_template(cTemplate, cBosToken, cEosToken) | ||
| // defer C.free_chat_template(cChatTemplate) |
There was a problem hiding this comment.
I added a Close() method so users can free the resource when needed. We shouldn’t free cChatTemplate here as it’s returned and may be used afterwards.
release/Dockerfile
Outdated
| # syntax=docker/dockerfile:1.4 | ||
|
|
||
| FROM rust:1.87 as builder-rust | ||
| FROM swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rust:1.87 as builder-rust |
There was a problem hiding this comment.
Sorry, this should not be pushed.
| [dependencies] | ||
| libc = "0.2.162" | ||
| tokenizers = {version = "0.20.2" } | ||
| tokenizers = { version = "0.20.0", features = ["http"] } |
There was a problem hiding this comment.
is there a reason for the downgrade in version? is something broken in 20.2?
| // Construct the model URL | ||
| modelURL := fmt.Sprintf("%s/%s/resolve/main", baseURL, modelID) | ||
| modelURL := "" | ||
| hfEndpoint, exist := os.LookupEnv("HF_ENDPOINT") |
There was a problem hiding this comment.
if we gonna make this public config, lets spell it out HUGGINGFACE_ENDPOINT
| if err != nil { | ||
| t.Fatalf("Failed to apply chat template: %v", err) | ||
| } | ||
| fmt.Println(result) |
There was a problem hiding this comment.
lets assert expected output, instead of printing it
| // TODO: replace with better solution | ||
| // hack to adjust gemma3 template for debug | ||
| // replace 'messages[0]['content'][0]['text']' with 'messages[0]['content']' | ||
| let mutated_template = template.replace( | ||
| "messages[0]['content'][0]['text']", | ||
| "messages[0]['content']", | ||
| ); | ||
| // Hack to fix Qwen3 templating. | ||
| // It uses python notation to reverse lists, which do not exist in minijinja | ||
| // so we're using the reverse filter instead. | ||
| let mutated_template = mutated_template.replace("[::-1]", "|reverse"); | ||
| // TODO: replace with a better solution | ||
| // Hack to remove the {% generation %} and {% endgeneration %} statements from | ||
| // the Jinja2 chat templates if there, since those are only using for assistant | ||
| // masking during training, and should be ignored during inference |
There was a problem hiding this comment.
should these hacks live in higher layer, like in go, so the library doesnt have to inherit them? Or maybe some kind of config preprocessing script to correct them?
No description provided.