Question about Chinese Language Support and Model Retraining #875

@freyalu

❓ The question

I really appreciate the team's contribution in sharing this model for research and learning purposes. I have a question regarding its Chinese language capabilities. It appears the model is less optimized for Chinese inputs. Could you clarify:

  1. What percentage of the pretraining corpus consists of Chinese data?
  2. If I want to train a Chinese-optimized LLM based on this model, what technical recommendations would you suggest?

Thank you for your guidance.

Labels: type/question (An issue that's a question)
