Hi, thanks for the great work on LongWriter-Zero—this is a very exciting project.
I have a couple of questions regarding both the open-source plan and the training setup under limited GPU resources.
1. Open-source timeline
Is there a plan to open-source the LongWriter-Zero implementation?
If so, could you share an approximate timeline (e.g., in the coming weeks or months, or after a certain milestone)? This would be very helpful for planning reproduction and follow-up experiments.
2. Training with smaller models and fewer GPUs
I’m interested in experimenting with LongWriter-Zero under more constrained hardware settings. Specifically, would it be feasible to apply the method to a smaller model such as Qwen3-4B, instead of Qwen2.5-32B, to reduce GPU requirements?
For reference, I currently have access to 4×80GB A800 GPUs, which is significantly less than the 8 nodes × 8 H800 GPUs setup described in the paper.
In this context, I’m wondering:
- Would a 4B-scale model still meaningfully benefit from the LongWriter-Zero RL approach?
- Are there any recommended adjustments for smaller models and limited hardware, such as:
  - the number of concurrent trajectories per optimization step
  - batch size
  - maximum output length
  - or other key hyperparameters that are important for stable RL training?
Any guidance or high-level suggestions would be greatly appreciated.
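For concreteness, below is the kind of scaled-down configuration I have in mind. This is purely an illustrative sketch: all field names and values (e.g., `rollouts_per_prompt`, `max_response_tokens`, the KL coefficient) are my own guesses for a Qwen3-4B setup on 4×80GB GPUs, not taken from the paper or any released code.

```python
# Illustrative sketch only: hypothetical settings for small-scale RL training,
# not the actual LongWriter-Zero configuration.
from dataclasses import dataclass


@dataclass
class SmallScaleRLConfig:
    base_model: str = "Qwen3-4B"        # smaller policy model instead of Qwen2.5-32B
    num_gpus: int = 4                   # 4 x 80GB A800
    rollouts_per_prompt: int = 4        # fewer concurrent trajectories per step
    prompts_per_step: int = 16          # 64 trajectories per optimization step
    max_prompt_tokens: int = 2048
    max_response_tokens: int = 8192     # shorter max output than the 32B setup
    learning_rate: float = 1e-6
    kl_coef: float = 1e-3               # mild KL regularization for stability
    gradient_checkpointing: bool = True # trade compute for memory
    offload_optimizer: bool = True      # e.g., ZeRO-style CPU offload


if __name__ == "__main__":
    print(SmallScaleRLConfig())
```

Does this general direction (fewer, shorter rollouts plus aggressive memory offloading) seem reasonable to you, or are there settings you would prioritize differently?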
Looking forward to the open-source release—thanks again for the impressive work!