Skill: model-onboard
Priority: P1 — New model onboarding frequency is increasing rapidly
Motivation
Adding support for a new model architecture in ATOM requires touching multiple files with specific patterns: model implementation, model runner registration, weight loading, quantization config, and testing. The process is well-defined but has many steps where small mistakes cause hard-to-debug failures. Encoding this workflow as a skill reduces onboarding time and error rate.
What This Skill Should Do
Given a HuggingFace model name or architecture description:
- Analyze the model architecture
  - Read the HuggingFace `config.json` and modeling files
  - Identify: attention type (MHA/MQA/GQA/MLA), FFN type (dense/MoE/SwiGLU), normalization, position encoding
  - Map to the closest existing ATOM model as a template
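The architecture-analysis step can be sketched as a small classifier over the HuggingFace config. The field names follow HF conventions (`num_attention_heads`, `num_key_value_heads`); the function itself is an illustrative sketch, not ATOM code.

```python
# Sketch: classify the attention type from a HuggingFace config dict.
# MLA detection is model-specific and omitted here for brevity.

def classify_attention(config: dict) -> str:
    """Return MHA/MQA/GQA based on the head counts in a HF config."""
    n_heads = config["num_attention_heads"]
    # HF configs omit num_key_value_heads for plain multi-head attention.
    n_kv = config.get("num_key_value_heads", n_heads)
    if n_kv == n_heads:
        return "MHA"  # every query head has its own KV head
    if n_kv == 1:
        return "MQA"  # all query heads share a single KV head
    return "GQA"      # query heads grouped over a smaller KV set

# Example: a Llama-3-style config (32 query heads, 8 KV heads)
print(classify_attention({"num_attention_heads": 32, "num_key_value_heads": 8}))  # GQA
```

The same pattern extends to FFN type (e.g. the presence of `num_experts` suggests MoE) and position encoding (`rope_theta`, `sliding_window`).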
- Generate model implementation skeleton
  - Create `atom/models/<model_name>.py` based on the closest template
  - Implement attention, FFN, and layer classes
  - Handle quantization hooks (FP8/FP4 weight loading)
  - Include proper weight name mapping for `load_weights()`
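The weight-name mapping for `load_weights()` is typically a checkpoint-key translation table plus a rule for per-layer weights. A minimal sketch, assuming hypothetical HF-to-ATOM name pairs (the real ATOM naming scheme may differ):

```python
# Sketch: translate HuggingFace checkpoint keys to ATOM module paths.
# The mapping entries below are illustrative, not ATOM's actual names.

HF_TO_ATOM = {
    "model.embed_tokens.weight": "embedding.weight",
    "model.norm.weight": "final_norm.weight",
    "lm_head.weight": "lm_head.weight",
}

def map_weight_name(hf_name: str) -> str:
    """Return the ATOM parameter path for a HF checkpoint key."""
    if hf_name in HF_TO_ATOM:
        return HF_TO_ATOM[hf_name]
    # Per-layer weights keep their structure, minus the "model." prefix:
    # model.layers.<i>.self_attn.q_proj.weight -> layers.<i>.self_attn.q_proj.weight
    if hf_name.startswith("model.layers."):
        return hf_name[len("model."):]
    return hf_name

print(map_weight_name("model.layers.0.self_attn.q_proj.weight"))
```

Getting this table wrong is one of the "small mistakes cause hard-to-debug failures" cases the skill is meant to prevent: a missed key usually surfaces only as silently uninitialized weights.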
- Register the model
  - Add entry to the `model_runner.py` model registry
  - Add model to the supported models documentation
  - Create basic test configuration
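Registration commonly maps the HF `architectures` string to the implementing class. A sketch of what such a registry entry might look like; the `MODEL_REGISTRY` dict and `register()` decorator are hypothetical stand-ins for whatever `model_runner.py` actually uses:

```python
# Sketch: map a HuggingFace architecture name to an ATOM model class.

MODEL_REGISTRY: dict[str, type] = {}

def register(architecture: str):
    """Decorator that records a model class under its HF architecture name."""
    def wrap(cls: type) -> type:
        MODEL_REGISTRY[architecture] = cls
        return cls
    return wrap

@register("LlamaForCausalLM")
class LlamaModel:
    """Placeholder for the actual model implementation."""

# Lookup at load time uses the architectures field from config.json.
print(MODEL_REGISTRY["LlamaForCausalLM"].__name__)  # LlamaModel
```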
- Validate the implementation
  - Load model weights successfully
  - Run a single forward pass
  - Compare output against the HuggingFace transformers reference
  - Check output quality (cosine similarity > 0.999 for FP16, > 0.99 for FP8)
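The quality check above can be sketched as a cosine-similarity comparison over the final logits, using the thresholds from the issue (> 0.999 for FP16, > 0.99 for FP8). Plain-Python sketch; a real harness would operate on tensors:

```python
# Sketch: validate ATOM output against the HF reference via cosine similarity.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def check_outputs(atom_logits: list[float], hf_logits: list[float],
                  dtype: str = "fp16") -> bool:
    """Apply the per-dtype threshold from the acceptance criteria."""
    threshold = 0.999 if dtype == "fp16" else 0.99
    return cosine_similarity(atom_logits, hf_logits) > threshold

# Identical outputs trivially pass; a real run compares one forward pass each.
print(check_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # True
```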
Acceptance Criteria
- Generate working model skeleton from HuggingFace model card
- Automatic model runner registration
- Weight loading validation
- Forward pass correctness check
- Support for common patterns: MoE, GQA, RoPE, sliding window, attention sinks