A high-performance Go runtime for DeepSeek-V3 models.
Disclaimer: This project is an independent implementation and is not affiliated with, endorsed by, or sponsored by DeepSeek. DeepSeek is a trademark of its respective owner. This project is a Go rewrite of software related to DeepSeek models.
Original work: Copyright (c) 2023 DeepSeek
Modifications: Rewritten from Python to Go, 2026.
Forked from: DeepSeek-V3 / README.md
GoSeek-V3 is a Go reimplementation of the inference engine for DeepSeek-V3, a powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token).
This project provides a DeepSeek-compatible inference runtime written entirely in Go, targeting production deployments that benefit from Go's concurrency model, low memory overhead, and static compilation.
GoSeek-V3 preserves all capabilities of the original model:
- Multi-head Latent Attention (MLA) for efficient inference
- DeepSeekMoE architecture for cost-effective computation
- Auxiliary-loss-free load balancing for stable training
- Multi-Token Prediction (MTP) objective for stronger performance and speculative decoding
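MLA's inference saving comes from caching one shared, compressed latent vector per token instead of full per-head keys and values. A back-of-envelope sketch in Go of the per-token, per-layer cache sizes; the dimensions and function names below are illustrative assumptions, not the exact V3 configuration:

```go
package main

import "fmt"

// mhaCache returns per-token, per-layer KV-cache bytes for standard
// multi-head attention: keys + values across all heads.
func mhaCache(nHeads, headDim, bytesPerElem int) int {
	return 2 * nHeads * headDim * bytesPerElem
}

// mlaCache returns the MLA equivalent: one compressed latent vector
// shared by all heads, plus a small decoupled RoPE key.
func mlaCache(latentDim, ropeDim, bytesPerElem int) int {
	return (latentDim + ropeDim) * bytesPerElem
}

func main() {
	const bytesPerElem = 2 // BF16
	mha := mhaCache(128, 128, bytesPerElem) // illustrative head count/dim
	mla := mlaCache(512, 64, bytesPerElem)  // illustrative latent/RoPE dims
	fmt.Println(mha, mla, mha/mla)          // 65536 1152 56
}
```

With these illustrative dimensions the latent cache is roughly 50x smaller, which is what makes long-context decoding memory-feasible.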
The underlying model was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages. Reasoning capabilities were further enhanced via knowledge distillation from the DeepSeek-R1 series.
This project focuses on the inference runtime. Model weights are sourced from the original DeepSeek-V3 release and are subject to the DeepSeek Model License.
| Property | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 671B |
| Activated Parameters (per token) | 37B |
| Context Length | 128K tokens |
| Attention | Multi-head Latent Attention (MLA) |
Load Balancing: An auxiliary-loss-free strategy minimizes performance degradation while encouraging balanced expert utilization.
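The routing side of this strategy can be sketched as follows: each expert carries a bias term that is added to its affinity score only when choosing the top-k experts, and the bias is nudged up or down depending on whether the expert was under- or overloaded. The step size, function names, and data shapes here are illustrative assumptions, not this repository's API:

```go
package main

import (
	"fmt"
	"sort"
)

// topKWithBias selects k experts by affinity score plus a per-expert
// bias. The bias influences routing only; gating weights would still
// be computed from the raw scores.
func topKWithBias(scores, bias []float64, k int) []int {
	idx := make([]int, len(scores))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		return scores[idx[a]]+bias[idx[a]] > scores[idx[b]]+bias[idx[b]]
	})
	return idx[:k]
}

// updateBias lowers the bias of overloaded experts and raises the
// bias of underloaded ones by a fixed step gamma.
func updateBias(bias []float64, load []int, meanLoad, gamma float64) {
	for i, l := range load {
		switch {
		case float64(l) > meanLoad:
			bias[i] -= gamma
		case float64(l) < meanLoad:
			bias[i] += gamma
		}
	}
}

func main() {
	scores := []float64{0.9, 0.1, 0.5, 0.4}
	bias := []float64{-0.6, 0.0, 0.0, 0.0} // expert 0 recently overloaded
	fmt.Println(topKWithBias(scores, bias, 2)) // [2 3]: expert 0 demoted by its bias
	updateBias(bias, []int{0, 0, 1, 1}, 0.5, 0.01)
}
```

Because no auxiliary loss term is added to the training objective, balancing pressure never competes with the language-modeling gradient.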
Multi-Token Prediction: MTP improves model performance and enables speculative decoding for faster inference.
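The speculative use of MTP follows the standard accept/reject loop: the MTP head drafts several tokens ahead, the main model verifies them in one pass, and the longest matching prefix is kept along with the main model's first correction. A greedy-variant sketch (function names are illustrative, not a released API):

```go
package main

import "fmt"

// acceptDraft compares MTP draft tokens against the main model's
// verified tokens and returns the longest accepted prefix plus the
// first corrected token (greedy speculative-decoding accept rule).
func acceptDraft(draft, verified []int) []int {
	accepted := []int{}
	for i := 0; i < len(draft) && i < len(verified); i++ {
		if draft[i] != verified[i] {
			// First mismatch: keep the main model's token and stop.
			accepted = append(accepted, verified[i])
			return accepted
		}
		accepted = append(accepted, draft[i])
	}
	return accepted
}

func main() {
	draft := []int{42, 7, 99, 3}    // proposed by the MTP head
	verified := []int{42, 7, 13, 5} // main model's greedy choices
	fmt.Println(acceptDraft(draft, verified)) // [42 7 13]
}
```

Every accepted draft token saves one sequential forward pass, so throughput gains scale with how often the MTP head agrees with the main model.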
FP8 Training: The model was trained using an FP8 mixed precision framework, validating FP8 effectiveness at extreme scale. FP8 weights are provided natively; a conversion script for BF16 is available (see How to Run Locally).
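For intuition about the FP8 format involved, here is a reference decoder for a single FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7). It is a sketch for illustration only; a real converter would vectorize this and also apply the per-block scaling factors stored alongside the weights:

```go
package main

import (
	"fmt"
	"math"
)

// fp8E4M3ToFloat32 decodes one FP8 E4M3 byte to float32.
// E4M3 has no infinities; only exponent=15, mantissa=7 encodes NaN.
func fp8E4M3ToFloat32(b uint8) float32 {
	sign := float32(1)
	if b&0x80 != 0 {
		sign = -1
	}
	exp := int((b >> 3) & 0x0F)
	mant := int(b & 0x07)
	if exp == 0x0F && mant == 0x07 { // reserved NaN pattern
		return float32(math.NaN())
	}
	if exp == 0 { // subnormal: mant/8 * 2^-6
		return sign * float32(mant) / 8 * float32(math.Pow(2, -6))
	}
	return sign * (1 + float32(mant)/8) * float32(math.Pow(2, float64(exp-7)))
}

func main() {
	fmt.Println(fp8E4M3ToFloat32(0x38)) // exp=7, mant=0 -> 1
	fmt.Println(fp8E4M3ToFloat32(0xC0)) // sign bit set, exp=8 -> -2
}
```

Upconverting to BF16 is lossless in value but doubles the weight footprint, which is why the FP8 checkpoints are the native distribution format.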
Knowledge Distillation: Reasoning patterns from DeepSeek-R1's long Chain-of-Thought are distilled into DeepSeek-V3, improving its reasoning while preserving output style and length control.
| Model | Total Params | Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-V3-Base | 671B | 37B | 128K | 🤗 Hugging Face |
| DeepSeek-V3 | 671B | 37B | 128K | 🤗 Hugging Face |
Note: The total size on Hugging Face is 685B, which includes 671B of main model weights and 14B of Multi-Token Prediction (MTP) module weights.
This code repository is licensed under the MIT License.
The use of DeepSeek-V3 Base/Chat model weights is subject to the DeepSeek Model License. DeepSeek-V3 (Base and Chat) supports commercial use.
This project (GoSeek-V3) is an independent Go reimplementation and is not affiliated with or endorsed by DeepSeek. The DeepSeek Model License applies to the model weights and derivatives.
For questions about this Go runtime, please open an issue in this repository or write to leycm@proton.me.
For questions about the underlying DeepSeek-V3 model, contact the original authors at their repository or at service@deepseek.com.