Data Directory

This directory stores all data files for the AI-Tutor system, including knowledge bases, user data, and logs.

📁 Directory Structure

data/
├── knowledge_bases/          # Knowledge base storage directory
│   ├── kb_config.json        # Knowledge base configuration file
│   └── {kb_name}/            # Individual knowledge base directories
│       ├── metadata.json     # Knowledge base metadata
│       ├── numbered_items.json  # Numbered items (definitions, theorems, etc.)
│       ├── raw/               # Original documents (PDF/Markdown)
│       ├── images/            # Extracted images
│       ├── content_list/      # Document content list
│       └── rag_storage/       # RAG knowledge graph storage
│           ├── graph_chunk_entity_relation.graphml
│           ├── kv_store_*.json
│           └── vdb_*.json
│
└── user/                      # User data directory
    ├── solve/                 # Problem solving module output
    │   └── solve_YYYYMMDD_HHMMSS/
    │       ├── investigate_memory.json
    │       ├── solve_chain.json
    │       ├── citation_memory.json
    │       ├── final_answer.md
    │       └── artifacts/     # Code execution output
    │
    ├── question/              # Question generation module output
    │   └── question_YYYYMMDD_HHMMSS/
    │
    ├── research/              # Research module output
    │   ├── cache/             # Research cache
    │   │   └── research_*/    # Queue and intermediate results
    │   └── reports/           # Research reports
    │       └── research_*.md
    │
    ├── guide/                 # Guided learning output
    │   └── session_{session_id}.json
    │
    ├── notebook/              # Notebook data
    │   └── notebooks_index.json
    │
    ├── co-writer/             # Co-Writer output
    │   ├── audio/             # TTS audio files
    │   └── tool_calls/        # Tool call history
    │
    ├── logs/                  # System logs
    │   └── ai_tutor_*.log
    │
    ├── run_code_workspace/    # Code execution workspace
    │
    └── user_history.json      # User activity history

📋 Directory Description

knowledge_bases/

Stores all knowledge base data files. Each knowledge base contains:

  • metadata.json: Knowledge base metadata, including creation time, update time, update history, etc.
  • numbered_items.json: Extracted numbered items (Definition, Theorem, Formula, etc.)
  • raw/: Original uploaded documents (PDF, Markdown, etc.)
  • images/: Images extracted from documents
  • content_list/: Document content list (JSON format)
  • rag_storage/: RAG knowledge graph storage files
    • graph_chunk_entity_relation.graphml: Knowledge graph structure
    • kv_store_*.json: Key-value storage (documents, entities, relations, etc.)
    • vdb_*.json: Vector database indices
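
The layout above can be checked with a short shell function. This is a minimal sketch: the function name `missing_kb_entries` is hypothetical, and the list of expected entries is taken from the list above rather than from the application code, so a freshly created knowledge base may legitimately lack some of them.

```shell
# Hypothetical helper: print the expected entries that are missing from one
# knowledge base directory. The expected names mirror the list above.
missing_kb_entries() {
  local kb_dir="$1" entry
  for entry in metadata.json numbered_items.json raw images content_list rag_storage; do
    # -e is true for both files and directories
    [ -e "$kb_dir/$entry" ] || printf '%s\n' "$entry"
  done
}
```

Running `missing_kb_entries data/knowledge_bases/ai-textbook` prints one line per missing entry and nothing when the directory is complete.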

user/

Stores all user-generated data and output files.

solve/

Problem solving module output directory. Each solving task generates a timestamped directory (solve_YYYYMMDD_HHMMSS) containing:

  • investigate_memory.json: Analysis Loop memory data
  • solve_chain.json: Complete Solve Loop steps and tool call records
  • citation_memory.json: Citation management data
  • final_answer.md: Final answer (Markdown format)
  • artifacts/: Files generated by code execution (images, data files, etc.)
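
Because the run directories use a fixed-width timestamp, sorting them by name also sorts them by time. The sketch below relies on that to find the newest run; `latest_solve_run` is a hypothetical helper, and the default path assumes the command is run from the project root.

```shell
# Hypothetical helper: print the newest solve_YYYYMMDD_HHMMSS directory.
# Glob results come back sorted, and this fixed-width timestamp format sorts
# chronologically, so the last match is the most recent run.
latest_solve_run() {
  local solve_dir="${1:-data/user/solve}" dir latest=""
  for dir in "$solve_dir"/solve_*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
    latest="${dir%/}"
  done
  if [ -n "$latest" ]; then
    printf '%s\n' "$latest"
  else
    return 1
  fi
}
```

For example, `cat "$(latest_solve_run)/final_answer.md"` shows the answer from the most recent task.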

question/

Question generation module output directory. Each task creates a timestamped directory containing the generated questions and validation results.

research/

Research module output directory:

  • cache/: Intermediate data during research (queue state, planning results, etc.)
  • reports/: Final generated research reports (Markdown format)

guide/

Guided learning module output directory. Each learning session is saved as a JSON file containing session state, knowledge points, chat history, etc.

notebook/

Notebook data storage. notebooks_index.json contains index information for all notebooks.

co-writer/

Co-Writer module output directory:

  • audio/: TTS-generated audio files
  • tool_calls/: AI tool call history

logs/

System log files, named by date (ai_tutor_*.log).

run_code_workspace/

Workspace for the code execution tool. It temporarily stores files generated while code runs.

🔧 Configuration

Data directory paths are configured in config/main.yaml:

paths:
  user_data_dir: "./data/user"
  knowledge_bases_dir: "./data/knowledge_bases"
  user_log_dir: "./data/user/logs"
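
Maintenance scripts can read these values instead of repeating the paths. The sketch below is one way to do that with awk. `get_data_path` is a hypothetical helper, and it assumes the flat `key: "value"` layout shown above; it is not a YAML parser.

```shell
# Hypothetical helper: read one key from the paths: block of the config file.
# Assumes the flat key: "value" layout shown above. It does not handle
# nesting, multi-line values, or values containing spaces.
get_data_path() {
  local key="$1" config="${2:-config/main.yaml}"
  awk -v key="$key" '
    /^paths:/     { in_paths = 1; next }   # entering the paths: block
    /^[^ \t#]/    { in_paths = 0 }         # any other top-level key ends it
    in_paths && $1 == (key ":") {
      value = $2
      gsub(/^"|"$/, "", value)             # strip surrounding double quotes
      print value
      exit
    }
  ' "$config"
}
```

For example, `get_data_path user_log_dir` prints the configured log directory, and prints nothing if the key is absent.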

📝 Notes

  1. Backup Important Data: Regularly back up knowledge_bases/ and important user data.
  2. Version Control: Add the data/ directory to .gitignore to avoid committing large files.
  3. Disk Space: Knowledge bases and user data can occupy significant disk space; clean up old data regularly.
  4. Permission Management: Ensure the application has read/write permissions on the data directories.
  5. Path Consistency: All modules use the unified path configuration; avoid hardcoding paths.
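
The permission note can be verified from the shell before starting the application. This is a minimal sketch; `check_data_dirs` is a hypothetical helper, and it checks access for the user running the command, which is assumed to be the same user that runs the application.

```shell
# Hypothetical helper: report data directories that are missing or that the
# current user cannot read and write. Prints nothing when all of them pass.
check_data_dirs() {
  local dir status=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      printf 'missing: %s\n' "$dir"
      status=1
    elif [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
      printf 'not readable/writable: %s\n' "$dir"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_data_dirs data/user data/knowledge_bases data/user/logs` returns a non-zero status if any of the three directories fails the check.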

🔗 Related Modules

  • Knowledge Base Management: src/knowledge/ - Knowledge base creation, updates, queries
  • User Data: Each functional module automatically manages its corresponding user data directory
  • Logging System: src/core/logging/ - Unified logging management

🛠️ Maintenance Operations

Clean Old Data

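# Clean old solving records (keep last 30 days)
# -mindepth/-maxdepth restrict the match to the timestamped run directories,
# so data/user/solve itself and nested artifacts/ are never matched on their own
find data/user/solve -mindepth 1 -maxdepth 1 -type d -name "solve_*" -mtime +30 -exec rm -rf {} +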

# Clean old log files (keep last 7 days)
find data/user/logs -name "*.log" -mtime +7 -delete
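
Since deletion cannot be undone, it can help to preview the candidates first. The sketch below lists old run directories without removing anything; `list_old_solve_runs` is a hypothetical helper, and the defaults (30 days, data/user/solve) are assumptions taken from the command above.

```shell
# Hypothetical helper: list solve run directories older than N days without
# deleting anything. Only direct children named solve_* are considered.
list_old_solve_runs() {
  local days="${1:-30}" solve_dir="${2:-data/user/solve}"
  find "$solve_dir" -mindepth 1 -maxdepth 1 -type d -name 'solve_*' -mtime +"$days" | sort
}
```

Running `list_old_solve_runs 30` prints the directories a 30-day cleanup would remove, one per line.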

Backup Knowledge Base

# Backup entire knowledge base directory
tar -czf knowledge_bases_backup_$(date +%Y%m%d).tar.gz data/knowledge_bases/

# Backup specific knowledge base
tar -czf ai_textbook_backup.tar.gz data/knowledge_bases/ai-textbook/

Restore Knowledge Base

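# Restore knowledge base (run from the project root: the backup archive stores
# paths as data/knowledge_bases/..., so extracting with -C data/ would create data/data/)
tar -xzf knowledge_bases_backup_20250101.tar.gz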
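
Where the restored files land depends on the paths stored inside the archive, so it can be worth listing them before extracting. A minimal sketch, with `archive_first_entry` as a hypothetical helper name:

```shell
# Hypothetical helper: print the first path stored in a gzip-compressed tar
# archive, so the extraction directory can be chosen to match it.
archive_first_entry() {
  # sed reads the whole listing, so tar is never cut off mid-output
  tar -tzf "$1" | sed -n '1p'
}
```

If `archive_first_entry knowledge_bases_backup_20250101.tar.gz` prints a path beginning with data/knowledge_bases/, the archive should be extracted from the project root.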