Generated: 2025-11-05
Status: Phase 1 Setup Completed (6/9 tasks)
Estimated Remaining Effort: 145 tasks, 6-8 weeks (1 FTE) or 3-4 weeks (3 FTE)
| Task ID | Description | Status |
|---|---|---|
| T001 | Create project directory structure | ✅ DONE |
| T002 | Configure pyproject.toml | ✅ DONE |
| T003 | Create .python-version | ✅ DONE |
| T004 | Create .gitignore | ✅ DONE |
| T005 | Configure mkdocs.yml | ✅ DONE |
| T006 | Write README.md | ✅ DONE |
| Task ID | Description | Priority | Estimated Time |
|---|---|---|---|
| T007 | Configure CI/CD (GitHub Actions) | Medium | 30 min |
| T008 | Create offline data directory structure | Low | 10 min |
| T009 | Initialize Git repository and first commit | High | 10 min |
Goal: Deliver a working MVP (User Story 1 - Stage 3) as quickly as possible
Days 1-2: Complete remaining setup + Foundation basics
- T007: Configure CI/CD
- T008-T009: Finalize setup
- T010-T013: Create YAML entity configurations (stages, modules, projects, datasets)
- T014-T021: Implement core scripts (env detection, data verification, validation)
Days 3-5: Foundation infrastructure + Cross-platform docs
- T022-T028: Write all 6 OS setup guides + troubleshooting
- T029-T032: Write auxiliary docs (glossary, prerequisites, learning path, framework comparison)
Week 2, Days 1-3: Modules + Data prep
- T033-T044: Write all 4 module tutorials + create Notebooks (12 files)
- T045-T047: Implement stage 3 data download scripts + offline package
Week 2, Days 4-5 + Week 3, Days 1-2: First batch of projects
- T048-T053: Project P01 (Healthcare)
- T054-T058: Project P02 (Ecommerce)
- T059-T063: Project P03 (Finance)
Week 3, Days 3-5: Remaining projects + Evaluation
- T064-T082: Projects P04-P09 (6 projects, can parallelize)
- T083-T085: Create rubrics, metrics, evaluation scripts
MVP Delivery: End of Week 3
- Learners can complete Stage 3 on any OS
- 9 working projects with evaluation
- Full cross-platform support
Team Structure:
- Dev A (Foundation Lead): Infrastructure & cross-platform
- Dev B (Content Creator 1): Stage 3 modules + first 3 projects
- Dev C (Content Creator 2): Stage 3 last 6 projects + evaluation
Dev A - Foundation Infrastructure:
# Day 1 Morning
- T007: Configure .github/workflows/ci.yml
- T010-T013: Create configs/content/*.yaml (stages, modules, projects, datasets)
# Day 1 Afternoon
- T014: scripts/env/detect-platform.py
- T015: scripts/data/verify.py
- T018-T020: scripts/validation/*.py
# Day 2
- T022-T028: docs/cross-platform/*.md (6 OS guides + troubleshooting)

Dev B - Module Content:
# Day 1-2
- T033-T036: Module M01 docs + 3 notebooks (NumPy/Pandas/Viz)
- T037-T039: Module M02 docs + 2 notebooks (Pandas practice)
- T040-T042: Module M03 docs + 2 notebooks (Math basics)
- T043-T044: Module M04 docs + 1 notebook (ML advanced)

Dev C - Support Infrastructure:
# Day 1
- T016-T017: Create project templates
- T021: configs/content/environments.yaml
- T029-T032: Auxiliary docs (glossary, prerequisites, learning path, framework comparison)
# Day 2
- T045-T047: Stage 3 data scripts + offline package prep

Dev A - Review & Integration:
# Day 3
- Review all foundation work
- T008-T009: Finalize setup & git commit
- Integration testing
# Day 4-5
- T083-T085: Evaluation system (rubrics, metrics, eval scripts)
- CI/CD pipeline testing

Dev B - Projects P01-P03:
# Day 3
- T048-T053: Project P01 Healthcare (6 tasks)
# Day 4
- T054-T058: Project P02 Ecommerce (5 tasks)
# Day 5
- T059-T063: Project P03 Finance (5 tasks)

Dev C - Projects P04-P09:
# Day 3
- T064-T067: Project P04 Telecom (4 tasks)
- T068-T070: Project P05 Retail (3 tasks)
# Day 4
- T071-T073: Project P06 Internet (3 tasks)
- T074-T076: Project P07 Ecommerce Annual (3 tasks)
# Day 5
- T077-T079: Project P08 Airline (3 tasks)
- T080-T082: Project P09 Credit (3 tasks)

MVP Delivery: End of Day 5
- Complete Stage 3 tutorial system
- All 9 projects working and tested
- Evaluation system functional
File: .github/workflows/ci.yml
name: CI
on:
push:
branches: [ main, develop, 002-ai-tutorial-stages ]
pull_request:
branches: [ main, develop ]
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.cargo/bin" >> $GITHUB_PATH
- name: Install dependencies
run: |
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
- name: Validate data models
run: |
python scripts/validation/validate-entities.py
python scripts/validation/validate-paths.py
python scripts/validation/validate-relationships.py
- name: Run tests
run: |
pytest tests/ --cov=scripts --cov-report=xml
- name: Code quality checks
run: |
black --check scripts/ tests/
ruff check scripts/ tests/
mypy scripts/
build-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -e ".[docs]"
- name: Build MkDocs
run: |
mkdocs build --strict
- name: Deploy to GitHub Pages (main branch only)
if: github.ref == 'refs/heads/main'
run: |
mkdocs gh-deploy --force

Estimated Time: 30 minutes
Commands:
mkdir -p offline/stage3-data
mkdir -p offline/stage4-data
mkdir -p offline/stage5-data
mkdir -p offline/stage4-models
# Create README for offline packages
cat > offline/README.md << 'EOF'
# Offline Data Packages
This directory contains offline data packages for use in network-restricted environments.
## Download Links
- **Stage 3 data package** (~1.5GB): [Baidu Netdisk link] | [Aliyun Drive link]
- **Stage 4 data package** (~6GB): [Baidu Netdisk link] | [Aliyun Drive link]
- **Stage 4 model package** (~3GB): [Baidu Netdisk link] | [Aliyun Drive link]
- **Stage 5 data package** (~2GB): [Baidu Netdisk link] | [Aliyun Drive link]
## Usage
1. Download the offline package (.tar.gz) for the stage you need
2. Extract it into the matching directory:
```bash
tar -xzf stage3-data.tar.gz -C data/stage3/
tar -xzf stage4-data.tar.gz -C data/stage4/
```
3. Verify integrity:
python scripts/data/verify.py --stage 3 --offline
Every data package ships with a checksums.txt file (SHA256).
EOF
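The verification step the README points at (scripts/data/verify.py, T015) could look roughly like this. It is a sketch: the two-column `<sha256>  <filename>` layout of checksums.txt is an assumption, since the file format is not specified above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so multi-GB packages never load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dir(data_dir: Path) -> dict:
    """Compare every entry in checksums.txt against the files on disk.

    Returns {filename: True/False}; False means missing or corrupted.
    """
    results = {}
    for line in (data_dir / "checksums.txt").read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        target = data_dir / name.strip()
        results[name.strip()] = target.exists() and sha256_of(target) == expected
    return results
```

A `--stage` flag would then just map the stage number to the matching `data/stageN/` directory before calling `verify_dir`.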
**Estimated Time**: 10 minutes
---
#### T009: Initialize Git and first commit
**Commands**:
```bash
cd /Users/hanlinqi/Desktop/Code/AICode/py_ai_tutorial
# Check git status
git status
# Add all setup files
git add pyproject.toml .python-version .gitignore mkdocs.yml README.md
git add docs/ notebooks/ scripts/ data/ templates/ tests/ configs/ offline/
git add .github/
# Create first commit
git commit -m "feat: initialize project structure and configuration

- Configure pyproject.toml (uv package management, stage 3/4/5 dependency groups)
- Configure MkDocs (Material theme, Chinese-language support)
- Create project directory structure (docs, notebooks, scripts, data, templates, tests)
- Configure CI/CD (GitHub Actions)
- Write project README (quick-start guide, learning path)

Co-authored-by: Claude <claude@anthropic.com>"
# Push to feature branch
git push origin 002-ai-tutorial-stages
Estimated Time: 10 minutes
This phase is critical - all User Stories depend on it. Prioritize completion before starting any Stage 3 content.
T010-T013: Create YAML Entity Configurations
These files define the data model for the entire tutorial system.
File: configs/content/stages.yaml
stages:
  - id: stage3
    name: 机器学习与数据挖掘
    name_en: Machine Learning & Data Mining
    description: Master classical machine learning algorithms (classification, regression, clustering, ensemble learning) and their application to real business scenarios, with fluent use of scikit-learn and the data-analysis tool stack.
    order: 1
    prerequisites: []
    estimated_hours:
      theory_min: 2
      theory_max: 3
      practice_min: 2
      practice_max: 3
    modules:
      - stage3-m01-scientific-computing
      - stage3-m02-pandas-practice
      - stage3-m03-ml-basics
      - stage3-m04-ml-advanced
    projects:
      - stage3-p01-healthcare
      - stage3-p02-ecommerce
      - stage3-p03-finance
      - stage3-p04-telecom
      - stage3-p05-retail
      - stage3-p06-internet
      - stage3-p07-ecommerce-annual
      - stage3-p08-airline
      - stage3-p09-credit
    learning_outcomes:
      - Use NumPy for efficient array computation and data preprocessing
      - Use Pandas for data cleaning, exploratory analysis, and feature engineering
      - Understand classification, regression, and clustering algorithms and choose the right one for a business problem
      - Train, tune, and evaluate models with scikit-learn and interpret the results
      - Complete an end-to-end machine learning project (from data to model delivery)
  - id: stage4
    name: 深度学习
    name_en: Deep Learning
    description: Master deep learning frameworks (PyTorch/TensorFlow), complete CV/NLP transfer-learning projects, and understand neural-network training techniques.
    order: 2
    prerequisites:
      - stage3
    estimated_hours:
      theory_min: 3
      theory_max: 4
      practice_min: 3
      practice_max: 6
    modules:
      - stage4-m01-dl-basics
      - stage4-m02-cv-basics
      - stage4-m03-nlp-basics
    projects:
      - stage4-p01-industrial-vision
      - stage4-p02-yolov11-realtime
      - stage4-p03-ocr
      - stage4-p04-image-segmentation
      - stage4-p05-medical-imaging
      - stage4-p06-transformer-translation
      - stage4-p07-pretrained-info-extraction
    learning_outcomes:
      - Master PyTorch/TensorFlow and define and train neural networks
      - Understand CNNs and complete image classification, object detection, and image segmentation tasks
      - Understand RNNs/Transformers and complete text classification, sequence labeling, and translation tasks
      - Use pretrained models for transfer learning to improve performance on small datasets
      - Deploy deep learning models in CPU/GPU environments
  - id: stage5
    name: AIGC与大模型
    name_en: AIGC & Large Language Models
    description: Master LLM application development (Prompt/RAG/Agent), build an end-to-end dialogue system, and understand large-model fine-tuning techniques.
    order: 3
    prerequisites:
      - stage4
    estimated_hours:
      theory_min: 2
      theory_max: 3
      practice_min: 6
      practice_max: 9
    modules:
      - stage5-m01-aigc-llm-intro
      - stage5-m02-llm-dev
    projects:
      - stage5-p01-dialogue-system
    learning_outcomes:
      - Understand GPT/LLM principles and application scenarios and choose an appropriate LLM API
      - Master prompt-engineering techniques and design effective prompts
      - Build a RAG system for retrieval-augmented generation
      - Design agent workflows to automate multi-step tasks
      - Fine-tune LLMs lightly with LoRA/QLoRA to adapt them to specific domains

Similar files needed:
- configs/content/modules.yaml (define all 9 modules)
- configs/content/projects.yaml (define all 17 projects)
- configs/content/datasets.yaml (define all datasets with download URLs, checksums)
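A single datasets.yaml entry might take roughly the following shape; the field names and URL below are illustrative assumptions, not part of the spec:

```yaml
# Hypothetical configs/content/datasets.yaml entry (illustrative only)
datasets:
  - id: stage3-p01-healthcare-data
    name: Healthcare patient records (sample)
    stage: stage3
    project: stage3-p01-healthcare
    size_mb: 120
    download_url: https://example.com/datasets/stage3/healthcare.tar.gz
    sha256: "<checksum of the archive>"
    offline_package: stage3-data.tar.gz
```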
Estimated Time: 2-3 hours for all 4 YAML files
T014-T020: Implement Core Scripts
These scripts are essential for environment setup, data management, and validation.
Priority Order:
- T014: scripts/env/detect-platform.py (helps learners identify their OS)
- T015: scripts/data/verify.py (ensures data integrity)
- T018-T020: Validation scripts (ensure YAML configs are correct)
Example: scripts/env/detect-platform.py
#!/usr/bin/env python3
"""
Environment detection script.
Automatically detects the operating system, CPU architecture, GPU availability,
Python version, and other system information.
"""
import platform
import subprocess
import sys
from typing import Dict, Optional


def detect_os() -> str:
    """Detect the operating system."""
    system = platform.system()
    if system == "Darwin":
        return "macOS"
    elif system == "Linux":
        return "Linux"
    elif system == "Windows":
        return "Windows"
    return "Unknown"


def detect_cpu_arch() -> str:
    """Detect the CPU architecture."""
    machine = platform.machine().lower()
    if machine in ["x86_64", "amd64"]:
        return "x86_64"
    elif machine in ["arm64", "aarch64"]:
        return "arm64"
    return machine


def detect_gpu() -> Dict[str, object]:
    """Detect GPU type and availability."""
    gpu_info: Dict[str, object] = {
        "available": False,
        "type": None,
        "device_name": None,
    }
    # Try NVIDIA GPU (CUDA)
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=5,
        )
        if result.returncode == 0:
            gpu_info["available"] = True
            gpu_info["type"] = "NVIDIA CUDA"
            gpu_info["device_name"] = result.stdout.strip()
            return gpu_info
    except (FileNotFoundError, subprocess.TimeoutExpired):
        pass
    # Try Apple Metal (Apple Silicon M1/M2/M3)
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        try:
            import torch
            if torch.backends.mps.is_available():
                gpu_info["available"] = True
                gpu_info["type"] = "Apple MPS"
                gpu_info["device_name"] = "Apple Silicon GPU"
                return gpu_info
        except ImportError:
            pass
    return gpu_info


def detect_python_version() -> str:
    """Detect the Python version."""
    return f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"


def detect_memory() -> Optional[int]:
    """Detect system memory (GB)."""
    try:
        import psutil
        return round(psutil.virtual_memory().total / (1024**3))
    except ImportError:
        return None


def recommend_setup_doc(os_type: str, cpu_arch: str, gpu_available: bool) -> str:
    """Recommend the matching setup guide."""
    if os_type == "macOS":
        if cpu_arch == "x86_64":
            return "docs/cross-platform/setup-macos-intel.md"
        elif cpu_arch == "arm64":
            return "docs/cross-platform/setup-macos-arm64.md"
    elif os_type == "Linux":
        return "docs/cross-platform/setup-linux.md"
    elif os_type == "Windows":
        # TODO: Detect WSL2 vs native
        return "docs/cross-platform/setup-windows-wsl2.md"
    return "docs/cross-platform/troubleshooting.md"


def main():
    """Print the detection report."""
    print("=" * 60)
    print("Environment Detection Report")
    print("=" * 60)
    os_type = detect_os()
    cpu_arch = detect_cpu_arch()
    python_version = detect_python_version()
    gpu_info = detect_gpu()
    memory_gb = detect_memory()
    print(f"\nOperating system: {os_type} {platform.release()}")
    print(f"CPU architecture: {cpu_arch}")
    print(f"Python version: {python_version}")
    if memory_gb:
        print(f"Memory: {memory_gb} GB")
    if gpu_info["available"]:
        print(f"\nGPU: ✅ detected {gpu_info['type']}")
        print(f"Device name: {gpu_info['device_name']}")
    else:
        print("\nGPU: ❌ not detected (falling back to CPU mode)")
    recommended_doc = recommend_setup_doc(os_type, cpu_arch, gpu_info["available"])
    print(f"\nRecommended setup guide: {recommended_doc}")
    print("\n" + "=" * 60)


if __name__ == "__main__":
    main()

Estimated Time: 1-2 hours per script (6-12 hours total for T014-T020)
T022-T028: Cross-Platform Setup Guides
These are documentation tasks - can be parallelized across team members.
Template for each OS guide:
# [OS Name] Environment Setup Guide
**Applies to**: [OS Version]
**Estimated time**: 30-60 minutes
**Prerequisites**: [Prerequisites]
## Environment Overview
- **Operating system**: [OS Details]
- **CPU architecture**: [x86_64/arm64]
- **GPU support**: [Yes/No, details]
- **Python version**: 3.9+ (3.11 recommended)
## Installation Steps
### 1. Install Python
[OS-specific Python installation steps]
### 2. Install the uv package manager
[OS-specific uv installation steps]
### 3. Clone the project and create a virtual environment
[Standard steps]
### 4. Install dependencies
[OS-specific dependency installation, including GPU drivers if applicable]
### 5. Verify the installation
[Validation commands]
## Common Issues
### Issue 1: [Common Issue]
- **Symptom**: [Description]
- **Cause**: [Root cause]
- **Fix**: [Solution]
[Repeat for 3-5 common issues]
## Next Steps
- Continue learning: [link to stage3 intro]
- Run your first project: [link to quickstart]
- Having trouble? See the [troubleshooting checklist](troubleshooting.md)

Estimated Time: 1-2 hours per guide (6-12 hours total for 6 guides + troubleshooting)
T029-T032: Auxiliary Documentation
These provide essential context for learners.
Key files:
- docs/glossary.md: ≥15 terms with Chinese/English equivalents
- docs/prerequisites.md: Math/Python requirements + external learning resources
- docs/learning-path.md: Milestone checklist, time estimates
- docs/framework-comparison.md: PyTorch vs TensorFlow comparison table
Estimated Time: 3-4 hours total
cd /Users/hanlinqi/Desktop/Code/AICode/py_ai_tutorial
# T007: Create CI/CD config
mkdir -p .github/workflows
# [Copy CI/CD YAML content above to .github/workflows/ci.yml]
# T008: Create offline directories
mkdir -p offline/{stage3-data,stage4-data,stage5-data,stage4-models}
# [Create offline/README.md]
# T009: Git commit
git add .
git commit -m "feat: complete base project setup

- Configure project structure and dependency management
- Configure the MkDocs documentation system
- Configure the CI/CD pipeline
- Create offline data directories

Co-authored-by: Claude <claude@anthropic.com>"
git push origin 002-ai-tutorial-stages
# Start Phase 2: Foundation
# T010: Create stages.yaml
mkdir -p configs/content
# [Create configs/content/stages.yaml with content above]
# Continue with remaining Foundation tasks...

All devs: Sync on branch and pull latest
git checkout 002-ai-tutorial-stages
git pull origin 002-ai-tutorial-stages

Dev A (Foundation Lead):
# T007-T009: Finalize setup
# [Create CI/CD, offline dirs, git commit]
# T010-T013: Create YAML configs
mkdir -p configs/content
# [Create all 4 YAML files]

Dev B (Content Creator 1):
# T033-T036: Start Module M01
mkdir -p docs/stage3/01-scientific-computing
mkdir -p notebooks/stage3
# [Create README.md + 3 notebooks]

Dev C (Content Creator 2):
# T029-T032: Auxiliary docs
# [Create glossary, prerequisites, learning path, framework comparison]
# T016-T017: Project templates
mkdir -p templates/project-template
# [Create template structure]

Update specs/002-ai-tutorial-stages/tasks.md:
-- [ ] T001 Create project root directory structure
+- [X] T001 Create project root directory structure

Create a daily log:
# Create progress log
cat > PROGRESS.md << 'EOF'
# Implementation Progress Log
## 2025-11-05
- ✅ T001-T006: Completed initial setup
- ⏳ T007-T009: In progress
- 📝 Next: Foundation phase (T010-T032)
## [Date]
- [Tasks completed]
- [Blockers encountered]
- [Next steps]
EOF

Symptom: YAML warnings about unresolved tags in mkdocs.yml
Cause: IDE doesn't recognize MkDocs-specific YAML tags
Solution: These are safe to ignore - MkDocs will process them correctly. Alternatively, suppress warnings in IDE settings.
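If you use VS Code with the Red Hat YAML extension, the warnings can be silenced by declaring the custom tags in settings.json. The tag list below is illustrative (it follows the tags Material for MkDocs commonly uses); adjust it to whatever tags your mkdocs.yml actually contains:

```json
{
  "yaml.customTags": [
    "!ENV scalar",
    "!ENV sequence",
    "tag:yaml.org,2002:python/name:material.extensions.emoji.to_svg",
    "tag:yaml.org,2002:python/name:material.extensions.emoji.twemoji",
    "tag:yaml.org,2002:python/name:pymdownx.superfences.fence_code_format"
  ]
}
```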
Symptom: curl command fails or uv not found
Solution: Use alternative installation:
pip install uv
# or
pipx install uv

Symptom: Git complains about file size when committing data
Solution: Ensure .gitignore excludes data files:
data/
*.parquet
*.h5
*.pth

- Spec Questions: Review specs/002-ai-tutorial-stages/spec.md
- Technical Decisions: Review specs/002-ai-tutorial-stages/research.md
- Data Model: Review specs/002-ai-tutorial-stages/data-model.md
- API Contracts: Review specs/002-ai-tutorial-stages/contracts/
- Task List: Review specs/002-ai-tutorial-stages/tasks.md
- Project structure created
- pyproject.toml configured
- MkDocs configured
- README.md complete
- CI/CD configured
- Git initialized with first commit
- All 4 YAML entity configs created
- Core scripts implemented (env detection, data verification, validation)
- All 6 OS setup guides complete
- Auxiliary docs complete (glossary, prerequisites, learning path)
- All 4 Stage 3 module tutorials complete (docs + notebooks)
- Stage 3 data download scripts working
- All 9 Stage 3 projects complete and tested
- Evaluation system working (rubrics, metrics, eval scripts)
- At least 3 projects verified on 2+ different OS platforms
- Documentation buildable and deployable
MVP Success Criteria:
- ✅ Learner can configure environment on any OS in <60 minutes
- ✅ Learner can complete Stage 3 projects with CPU only
- ✅ Project outputs match expected metric ranges (±5%)
- ✅ Documentation site builds without errors
- ✅ CI/CD pipeline passes
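The ±5% criterion above can be made mechanical in the evaluation scripts (T083-T085). A minimal sketch, where the metric names and expected values are placeholders rather than the projects' real rubric:

```python
def within_tolerance(actual: float, expected: float, tol: float = 0.05) -> bool:
    """True when actual lies within ±tol (relative) of expected."""
    return abs(actual - expected) <= tol * abs(expected)


# Hypothetical expected metrics for one project (illustrative numbers only);
# the real values would come from configs/content/projects.yaml or the rubrics.
EXPECTED = {"accuracy": 0.86, "f1": 0.83}


def check_project(metrics: dict) -> list:
    """Return the names of metrics that are missing or fall outside the allowed range."""
    return [
        name for name, expected in EXPECTED.items()
        if name not in metrics or not within_tolerance(metrics[name], expected)
    ]
```

An evaluation script would load a learner's reported metrics, call `check_project`, and fail (non-zero exit) when the returned list is non-empty.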
Good luck with implementation! Feel free to adjust priorities based on your team's strengths and project needs. 🚀