❓ The question
I really appreciate the team sharing this model for research and learning purposes. I have a question about its Chinese language capabilities: the model appears to be less well optimized for Chinese input. Could you clarify:
- What percentage of the pretraining corpus consists of Chinese data?
- If I want to train a Chinese-optimized LLM based on this model, what technical approach would you recommend?
Thank you for your guidance.