cannot load Qwen3-235B-A22B tokenizer.json #75

@yunqi123

Hi, I used this code to load the Qwen3-235B-A22B tokenizer.json, but encountered this error:

tokenizer.json

code:
tk, err := pretrained.FromFile("tokenizer.json")

error:
Preloading sugarme tokenizer...
panic: regexp: Compile((?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+): error parsing regexp: invalid or unsupported Perl syntax: (?!

goroutine 1 [running]:
regexp.MustCompile({0xc0000244d0, 0x6e})
D:/software/Go/src/regexp/regexp.go:313 +0xb4
github.com/sugarme/tokenizer/normalizer.NewRegexpPattern(...)
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/pattern.go:192
github.com/sugarme/tokenizer/pretrained.createSplitPreTokenizer(0xc005b07990)
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/pretokenizer.go:189 +0x11a
github.com/sugarme/tokenizer/pretrained.CreatePreTokenizer(0xc00027ac90)
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/pretokenizer.go:56 +0xb3
github.com/sugarme/tokenizer/pretrained.createSequencePreTokenizer(0xc005b07a58?)
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/pretokenizer.go:223 +0xaf
github.com/sugarme/tokenizer/pretrained.CreatePreTokenizer(0xc00027ac60)
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/pretokenizer.go:46 +0x15b
github.com/sugarme/tokenizer/pretrained.FromReader({0x7ff6165b9580, 0xc00007a108})
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/tokenizer.go:51 +0x353
github.com/sugarme/tokenizer/pretrained.FromFile({0x7ff6164f37c2?, 0x0?})
D:/code/goPath/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretrained/tokenizer.go:19 +0x78
main.loadSugarmeTokenizer()
D:/code/go-test/cal-token.go:176 +0x1f
main.preloadEncodings()
D:/code/go-test/cal-token.go:74 +0x2df
main.main()
D:/code/go-test/cal-token.go:929 +0x13
