
Smol training replication with nano LLM models from scratch #34

@udapy


Our plan of action, based on the accessibility of resources:

Plan A: The "Locally Run" Plan (Best for SFT & Learning)

  • Resource Used: Local MacBook Pro M2 (16GB) + Hugging Face Pro.

  • Goal: Master the Data and Post-Training (SFT) phases without spending cloud credits.

  • Limitation: You cannot use Nanotron (CUDA-only); you will use MLX or TRL instead.

  • Context: The M2 chip is capable of training a 135M model, but it lacks the throughput for pre-training from scratch. However, it is excellent for the Supervised Fine-Tuning (SFT) chapter of the playbook.

  • Hardware Setup:

    • RAM: 16GB of unified memory is sufficient for a 135M-parameter model (the weights need well under 1GB).
    • Framework: MLX (Apple's array framework) or transformers with the mps (Metal Performance Shaders) backend.
  • Step-by-Step Execution:

    • Base Model: Download the pre-trained HuggingFaceTB/SmolLM2-135M from the Hugging Face Hub.
    • Dataset: Download HuggingFaceTB/smol-smoltalk.
    • Training (SFT): Use MLX-LM (see the data-preparation sketch after this list):
    • pip install mlx-lm
    • mlx_lm.lora --model HuggingFaceTB/SmolLM2-135M --train --data data/ --batch-size 4 --iters 1000
  • Note: While the Playbook uses alignment-handbook (PyTorch), MLX is 10x more efficient on your Mac. The curriculum logic remains the same.

  • Evaluation: Run inference locally to compare "General Knowledge" vs. "Instruction Following" behavior (see the evaluation sketch below).

  • Hosting: Use your Hugging Face Pro account to host the resulting model on a ZeroGPU Space for a free, shareable demo (a minimal Space app sketch follows below).
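
Below is a minimal data-preparation sketch for the Training (SFT) step. It assumes the smol-smoltalk rows expose a messages column of chat turns and that mlx_lm.lora accepts the {"messages": [...]} chat format in train.jsonl / valid.jsonl files inside the --data directory; the subsample sizes are arbitrary and only meant to keep a run on a 16GB M2 short.

```python
# prepare_sft_data.py - write smol-smoltalk into the data/ layout used by mlx_lm.lora
# Assumptions (verify against your installed mlx-lm version and the dataset card):
#   * each row has a "messages" list of {"role", "content"} turns
#   * mlx_lm.lora reads train.jsonl / valid.jsonl from the --data directory
import json
from pathlib import Path

from datasets import load_dataset

out_dir = Path("data")
out_dir.mkdir(exist_ok=True)

ds = load_dataset("HuggingFaceTB/smol-smoltalk", split="train").shuffle(seed=0)

# Small subsample so an SFT pass finishes quickly on a 16GB M2.
n_train, n_valid = 10_000, 500
splits = {
    "train": ds.select(range(n_train)),
    "valid": ds.select(range(n_train, n_train + n_valid)),
}

for name, subset in splits.items():
    with open(out_dir / f"{name}.jsonl", "w") as f:
        for row in subset:
            f.write(json.dumps({"messages": row["messages"]}) + "\n")

print({name: len(subset) for name, subset in splits.items()})
```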
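
For the evaluation step, here is a rough comparison script. It assumes the mlx_lm Python API (load / generate), that mlx_lm.lora wrote its LoRA weights to ./adapters (its default output path), and that the tokenizer wrapper exposes the underlying chat_template / apply_chat_template; treat it as a qualitative sketch, not a benchmark.

```python
# compare_models.py - qualitative check: base SmolLM2-135M vs. the SFT adapters
# Assumptions: adapters live in ./adapters (mlx_lm.lora default) and mlx_lm
# exposes load()/generate() with the keyword arguments used below.
from mlx_lm import load, generate

PROMPTS = [
    "What is the capital of France?",                       # general knowledge
    "Rewrite this sentence politely: give me the report.",  # instruction following
]

def answer(model, tokenizer, prompt):
    # Apply the chat template when the tokenizer defines one; the raw base
    # model may not have a template, in which case the plain prompt is used.
    if getattr(tokenizer, "chat_template", None):
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=False,
            add_generation_prompt=True,
        )
    return generate(model, tokenizer, prompt=prompt, max_tokens=128)

base_model, base_tok = load("HuggingFaceTB/SmolLM2-135M")
sft_model, sft_tok = load("HuggingFaceTB/SmolLM2-135M", adapter_path="adapters")

for p in PROMPTS:
    print(f"\n=== {p}")
    print("base:", answer(base_model, base_tok, p))
    print("sft: ", answer(sft_model, sft_tok, p))
```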
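
For hosting, a minimal app.py sketch for a Gradio Space. It assumes the fine-tuned weights have been exported to a transformers-compatible checkpoint and pushed to the Hub (the repo id below is a placeholder), and a recent gradio/transformers where ChatInterface(type="messages") and chat-style pipeline inputs are available. A 135M model also runs comfortably on the free CPU tier, so the ZeroGPU-specific decorator is omitted here.

```python
# app.py - minimal Gradio demo for a Hugging Face Space
# Assumption: the SFT'd model was exported to a transformers-compatible
# checkpoint and pushed to the Hub; the repo id below is a placeholder.
import gradio as gr
from transformers import pipeline

MODEL_ID = "your-username/SmolLM2-135M-smoltalk-sft"  # placeholder repo id

generator = pipeline("text-generation", model=MODEL_ID)

def chat(message, history):
    # Single-turn for simplicity: ignore history and answer the latest message.
    messages = [{"role": "user", "content": message}]
    out = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
    # The pipeline returns the full conversation; the last message is the reply.
    return out[0]["generated_text"][-1]["content"]

gr.ChatInterface(chat, type="messages", title="SmolLM2-135M SFT demo").launch()
```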
