This module provides a collection of various tools used in the DeepTutor system, including code execution, knowledge retrieval, web search, paper search, and more.
tools/
├── __init__.py # Module initialization, unified export of all tools
├── code_executor.py # Code execution tool ⭐
├── query_item_tool.py # Numbered item query tool ⭐
├── rag_tool.py # RAG retrieval tool ⭐
├── web_search.py # Web search tool
├── paper_search_tool.py # Paper search tool
├── tex_downloader.py # LaTeX source download tool
├── tex_chunker.py # LaTeX text chunking tool
└── README.md # This document
from src.tools import (
run_code, # Code execution
run_code_sync, # Synchronous code execution
query_numbered_item, # Query numbered items
rag_search, # RAG retrieval
web_search, # Web search
PaperSearchTool, # Paper search (optional)
TexDownloader, # LaTeX download (optional)
TexChunker, # LaTeX chunking (optional)
)Execute Python code in an isolated workspace, preserving the original input/output structure.
Key Features:
- Asynchronous and synchronous code execution
- Isolated workspace environment
- Secure filesystem access control
- Support for persistent workspaces
- Automatic cleanup of temporary files
Usage Example:
from src.tools import run_code, run_code_sync
# Asynchronous execution
result = await run_code(
code="print('Hello, World!')",
workspace="my_workspace"
)
# Synchronous execution
result = run_code_sync(
code="x = 1 + 1\nprint(x)",
workspace="my_workspace"
)
# Result format
# {
# "status": "success" | "error",
# "output": str, # Standard output
# "error": str, # Error message (if any)
# "exit_code": int, # Exit code
# }Configuration:
- Configure via
tools.run_codeinsolve_config.yamlorquestion_config.yaml - Environment variables:
RUN_CODE_WORKSPACE: Workspace directoryRUN_CODE_ALLOWED_ROOTS: List of allowed root directories
Query numbered items in the knowledge base, such as definitions, theorems, formulas, figures, etc.
Key Features:
- Support for multiple numbered item types: definitions, theorems, formulas, figures, examples, remarks, etc.
- Support for fuzzy and exact matching
- Returns multiple matching results
- Backward compatible with single result format
Usage Example:
from src.tools import query_numbered_item
# Query definition
result = query_numbered_item(
identifier="Definition 1.1",
kb_name="ai_textbook"
)
# Query formula
result = query_numbered_item(
identifier="(1.2.1)",
kb_name="ai_textbook"
)
# Query figure
result = query_numbered_item(
identifier="Figure 2.5",
kb_name="ai_textbook",
max_results=3
)
# Result format
# {
# "identifier": str, # Original query identifier
# "type": str, # Type: formula/definition/theorem/lemma/figure/example/remark
# "status": str, # success/failed
# "count": int, # Number of matched items
# "items": [ # List of all matched items
# {
# "identifier": str,
# "type": str,
# "content": str
# }
# ],
# "content": str, # Backward compatible: single item content or merged content
# "error": str # Error message (only when failed)
# }Supported Identifier Formats:
- Definition/Theorem:
"Definition 1.1","Theorem 2.3" - Formula:
"(1.2.1)","(2.3.5)" - Figure:
"Figure 1.1","Figure 2.5" - Example/Remark:
"Example 1.1","Remark 2.1"
Knowledge base query tool based on RAG (Retrieval-Augmented Generation).
Key Features:
- Support for multiple query modes:
local,global,hybrid,naive - Based on LightRAG and RAG-Anything frameworks
- Automatic knowledge base connection management
- Support for custom LLM and Embedding configurations
Usage Example:
from src.tools import rag_search
# Asynchronous query
result = await rag_search(
query="What is machine learning?",
kb_name="ai_textbook",
mode="hybrid" # local/global/hybrid/naive
)
# Result format
# {
# "query": str,
# "answer": str, # RAG-generated answer
# "context": str, # Retrieved context
# "prompt": str, # Full prompt (if only_need_prompt=True)
# "status": str, # success/error
# "error": str # Error message (if any)
# }Query Mode Description:
local: Local retrieval, focusing on specific topicsglobal: Global retrieval, obtaining overall knowledgehybrid: Hybrid mode, combining local and global (recommended)naive: Simple retrieval mode
Configuration:
- Configure LLM and Embedding via
main.yaml - Knowledge base path: default
./knowledge_bases
Perform web search using Perplexity API.
Key Features:
- Real-time web search based on Perplexity API
- Automatic saving of search results
- Support for verbose output mode
Usage Example:
from src.tools import web_search
# Perform search
result = web_search(
query="latest deep learning research progress",
output_dir="./search_results", # Optional: save results
verbose=True
)
# Result format
# {
# "query": str,
# "answer": str, # Search result summary
# "result_file": str # Saved file path (if output_dir provided)
# }Dependencies:
- Requires
perplexitypackage - Requires
PERPLEXITY_API_KEYenvironment variable
Installation:
pip install perplexityArXiv paper search tool.
Key Features:
- Search ArXiv papers
- Parse paper metadata
- Format paper information
- Support for sorting by relevance or date
- Support for year limits
Usage Example:
from src.tools import PaperSearchTool
tool = PaperSearchTool()
# Search papers
papers = await tool.search_papers(
query="transformer attention mechanism",
max_results=5,
years_limit=3, # Last 3 years
sort_by="relevance" # relevance or date
)
# Result format
# [
# {
# "title": str,
# "authors": list[str],
# "year": int,
# "abstract": str,
# "url": str,
# "arxiv_id": str,
# "published": str,
# "categories": list[str]
# },
# ...
# ]
# Format paper information
formatted = tool.format_paper_info(papers[0])Dependencies:
- Requires
arxivpackage
Installation:
pip install arxivDownload LaTeX source code of papers from ArXiv.
Key Features:
- Download LaTeX source archive from ArXiv
- Automatically extract and locate main tex file
- Read tex file content
- Support for tar.gz and zip formats
Usage Example:
from src.tools import TexDownloader
downloader = TexDownloader(workspace_dir="./workspace")
# Download LaTeX source
result = await downloader.download_tex(
arxiv_id="2301.00001"
)
# Result format (TexDownloadResult)
# result.success: bool
# result.tex_path: str | None # Main tex file path
# result.tex_content: str | None # Tex file content
# result.error: str | None # Error message (if any)
if result.success:
print(f"Tex file: {result.tex_path}")
print(f"Content length: {len(result.tex_content)}")Helper Function:
from src.tools import read_tex_file
# Directly read tex file
content = read_tex_file(tex_path)Intelligently chunk LaTeX content for processing long documents.
Key Features:
- Chunk by section or token count
- Token estimation (based on GPT tokenizer)
- Maintain context coherence (overlap between chunks)
- Support for custom chunking strategies
Usage Example:
from src.tools import TexChunker
chunker = TexChunker(model="gpt-4o")
# Estimate token count
token_count = chunker.estimate_tokens(tex_content)
# Chunk by section
chunks = chunker.chunk_by_section(
tex_content,
max_tokens_per_chunk=4000,
overlap_tokens=200
)
# Chunk by token count
chunks = chunker.chunk_by_tokens(
tex_content,
max_tokens=4000,
overlap_tokens=200
)
# Result format
# [
# {
# "content": str, # Chunk content
# "tokens": int, # Token count
# "start_line": int, # Start line number
# "end_line": int # End line number
# },
# ...
# ]Dependencies:
- Requires
tiktokenpackage
Installation:
pip install tiktokenThe tools module uses the following environment variables:
# Code execution
RUN_CODE_WORKSPACE=/path/to/workspace
RUN_CODE_ALLOWED_ROOTS=/path1,/path2
# LLM configuration (for RAG)
LLM_MODEL=gpt-4o
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://api.openai.com/v1
# Web search
PERPLEXITY_API_KEY=your-perplexity-key
# Knowledge base path
KB_BASE_DIR=./knowledge_basesTool configuration is mainly in the following configuration files:
config/main.yaml: Main configuration file (LLM, Embedding, etc.)config/solve_config.yaml: Solver configuration (includestools.run_code)config/question_config.yaml: Question configuration (includestools.run_code)
Dependencies required by all tools:
lightrag: RAG frameworkraganything: RAG-Anything framework
Some tools require additional dependencies:
# Web search
pip install perplexity
# Paper search and LaTeX processing
pip install arxiv tiktoken requestsNote: If some optional dependencies are not installed, the related tools will fail to import, but this will not affect the use of other tools. __init__.py gracefully handles import failures.
from src.tools import run_code_sync
# Execute mathematical calculations
result = run_code_sync(
code="""
import numpy as np
x = np.array([1, 2, 3])
print(x.mean())
""",
workspace="math_workspace"
)from src.tools import query_numbered_item, rag_search
# Query specific definition
definition = query_numbered_item("Definition 3.1", kb_name="ai_textbook")
# RAG retrieval of related concepts
context = await rag_search(
query="Explain backpropagation algorithm",
kb_name="ai_textbook",
mode="hybrid"
)from src.tools import PaperSearchTool, TexDownloader, TexChunker
# Search related papers
tool = PaperSearchTool()
papers = await tool.search_papers("transformer architecture", max_results=3)
# Download and process LaTeX source
downloader = TexDownloader("./workspace")
result = await downloader.download_tex(papers[0]["arxiv_id"])
if result.success:
chunker = TexChunker()
chunks = chunker.chunk_by_section(result.tex_content, max_tokens_per_chunk=4000)from src.tools import web_search
# Search for latest information
result = web_search(
query="latest deep learning breakthroughs in 2024",
verbose=True
)- Create a new tool file in the
tools/directory - Implement the tool function or class
- Import and export in
__init__.py - Update this document
It is recommended that tool functions follow this interface specification:
def tool_function(
required_param: str,
optional_param: str | None = None,
**kwargs
) -> dict:
"""
Tool function description
Args:
required_param: Required parameter description
optional_param: Optional parameter description
**kwargs: Other parameters
Returns:
dict: Result dictionary containing status, data, error fields
"""
passAll tools should:
- Return a unified dictionary format result
- Include a
statusfield ("success"or"error") - Provide an
errorfield on failure - Use logging to record error information
- Code Execution Security:
code_executorruns in an isolated environment, but security configuration should still be noted - API Keys: Web search and RAG require corresponding API keys, please keep them secure
- Knowledge Base Path: Ensure the knowledge base path is configured correctly
- Async Functions: Some tools are asynchronous and need to be called with
awaitor in an async environment - Dependency Installation: Some tools require additional dependencies, please install according to usage scenarios