question about qwen3 finetune #189

Hi, thanks for the quick support for the Qwen3-Embedding model!

I noticed a discrepancy in the query prefix format. In the [official post](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), the instruction and query are concatenated with no space after `Query:`. In the Tevatron example, however, there is a space after the colon:
```
  --query_prefix "Find a relevant scientific paper abstract to support or reject the claim. Query: " \
```

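This seemingly minor difference changes the tokenization. Without the space, the first query token is `What`; with the space, it is `ĠWhat`, which is a different vocabulary entry: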

```python
>>> from transformers import AutoTokenizer
>>> # Assumes the checkpoint from the model card linked above
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

# Without space after colon
>>> print(tokenizer.tokenize("Answer the query\nQuery:What is the capital of China?"))
['Answer', 'Ġthe', 'Ġquery', 'Ċ', 'Query', ':', 'What', 'Ġis', 'Ġthe', 'Ġcapital', 'Ġof', 'ĠChina', '?']

# With space after colon
>>> print(tokenizer.tokenize("Answer the query\nQuery: What is the capital of China?"))
['Answer', 'Ġthe', 'Ġquery', 'Ċ', 'Query', ':', 'ĠWhat', 'Ġis', 'Ġthe', 'Ġcapital', 'Ġof', 'ĠChina', '?']
```

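To make the difference easy to spot, here is a minimal sketch that compares the two token lists quoted above and reports the first position where they diverge. The helper name `first_divergence` is hypothetical (it is not part of Tevatron or transformers), and the token lists are copied from the output above rather than recomputed.

```python
from typing import Optional, Sequence, Tuple

# Token lists copied from the tokenizer output quoted above.
NO_SPACE = ['Answer', 'Ġthe', 'Ġquery', 'Ċ', 'Query', ':', 'What',
            'Ġis', 'Ġthe', 'Ġcapital', 'Ġof', 'ĠChina', '?']
WITH_SPACE = ['Answer', 'Ġthe', 'Ġquery', 'Ċ', 'Query', ':', 'ĠWhat',
              'Ġis', 'Ġthe', 'Ġcapital', 'Ġof', 'ĠChina', '?']


def first_divergence(
    a: Sequence[str], b: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, token_a, token_b) at the first mismatch, or None if equal.

    Hypothetical helper for illustration only.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    if len(a) != len(b):
        # One list is a strict prefix of the other.
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else None), (b[i] if i < len(b) else None)
    return None


print(first_divergence(NO_SPACE, WITH_SPACE))  # (6, 'What', 'ĠWhat')
```

Both sequences have the same length (13 tokens), so the only difference is the token right after the colon.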