AI Research Assistant - Transform your questions into comprehensive research plans
Discovery Dojo is an intelligent research assistant that converts user questions into fully-developed research plans through a three-phase pipeline: idea generation, novelty assessment, and research planning.
The following diagram shows the high-level system design of the project.
The system uses a flow-based architecture built on the PocketFlow framework, orchestrating multiple AI agents through structured workflows with shared state management.
User Question → Idea Generation → RAG Novelty Assessment → Research Planning → Markdown Output
↓               ↓                 ↓                        ↓                   ↓
Web Search      Validation        ArXiv Papers             Interactive Config  File Save
Parallel        Refinement        Vector Similarity        Plan Validation     Pretty Display
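The shared-state idea behind this pipeline can be illustrated with a minimal sketch. It does not use PocketFlow; the stage functions, the dictionary keys, and `run_pipeline` are hypothetical stand-ins chosen only to show how each phase reads from and writes to one shared dictionary.

```python
from typing import Callable

# Hypothetical stand-ins for the real phases. Each stage receives the same
# dictionary, reads what earlier stages wrote, and adds its own output.
Stage = Callable[[dict], None]


def idea_generation(shared: dict) -> None:
    shared["ideas"] = [f"Idea about {shared['question']}"]


def novelty_assessment(shared: dict) -> None:
    # Placeholder only: the real phase compares ideas with ArXiv papers.
    shared["novelty"] = {idea: None for idea in shared["ideas"]}


def research_planning(shared: dict) -> None:
    shared["plan"] = f"# Research plan\n\n- {shared['ideas'][0]}"


def run_pipeline(question: str, stages: list[Stage]) -> dict:
    """Run the stages in order over one shared state dictionary."""
    shared = {"question": question}
    for stage in stages:
        stage(shared)
    return shared


result = run_pipeline(
    "protein folding",
    [idea_generation, novelty_assessment, research_planning],
)
print(result["plan"])
```

Because every stage depends only on keys in the shared dictionary, a stage can in principle be run alone as long as those keys are supplied up front.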
Query    → Parallel Search → Summarization → Idea Generation → Interactive Validation → Finalization
↓          (Tavily)          ↓               ↓                 ↑                        ↓
Multiple                     Web Results     Research Ideas    User Feedback            Final Ideas
Queries                      Processing      Generation        Loop Back                Storage
Research Idea → Embedding → Retrieval      → Reranking  → Novelty Assessment
↓               ↓           ↓                (Optional)   ↓
Text Input      Vector DB   Similar Papers   Top-N        Novelty Score
                (Qdrant)    from ArXiv       Selection    + Analysis
User Config → Plan Generation → Validation → Finalization → Markdown Output
↓             ↓                 ↓            ↓              ↓
Project Type  LLM Planning      User Review  Add Metadata   Beautiful File
Timeline      Structured        Refinement   Timestamps     + Console Display
Requirements  Output            Cycles       Context
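The last two steps of the planning flow (adding metadata, then writing markdown) might look roughly like the sketch below. The plan dictionary keys and the `render_plan` function are assumptions made for illustration; the project's actual plan schema and output format are not shown in this README.

```python
from datetime import datetime, timezone


def render_plan(plan: dict, generated_at: datetime) -> str:
    """Render a plan dictionary (hypothetical schema) as markdown text."""
    lines = [
        f"# {plan['title']}",
        "",
        f"_Generated: {generated_at.isoformat()}_",
        "",
        f"- Project type: {plan['project_type']}",
        f"- Timeline: {plan['timeline']}",
        "",
        "## Steps",
        "",
    ]
    # Number the steps starting at 1 so the output reads as an ordered list.
    lines += [f"{i}. {step}" for i, step in enumerate(plan["steps"], start=1)]
    return "\n".join(lines) + "\n"


example_plan = {
    "title": "Example research plan",
    "project_type": "exploratory",
    "timeline": "3 months",
    "steps": ["Survey related work", "Build a prototype"],
}
markdown = render_plan(example_plan, datetime(2025, 1, 1, tzinfo=timezone.utc))
print(markdown)
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the function, keeps the rendering deterministic and easy to test.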
- QueryGenerationNode: Creates diverse search queries from user input
- ParallelSearchNode: Executes multiple web searches concurrently (Tavily API)
- SummarizationNode: Processes and summarizes search results
- IdeaGenerationNode: Generates research ideas using LLM
- InteractiveValidationNode: User feedback and refinement cycles
- EmbeddingNode: Vector embeddings for similarity search
- RetrievalNode: Qdrant vector database integration
- RankingNode: Optional reranking with local Qwen models
- NoveltyAssessmentNode: Comprehensive novelty scoring
- PlanGenerationNode: Structured research plan creation
- PlanOutputNode: Beautiful markdown generation and file output
Each node is an autonomous agent with specific responsibilities, retry logic, and error handling.
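As a rough illustration of per-node retry logic and error handling, here is a self-contained sketch. `RetryingNode` and `FlakySummarizer` are hypothetical classes written for this example; they are not the PocketFlow API and not the project's actual node implementations.

```python
import time


class RetryingNode:
    """Hypothetical stand-in for a pipeline node with retries."""

    def __init__(self, max_retries: int = 3, wait: float = 0.0):
        self.max_retries = max_retries
        self.wait = wait

    def exec(self, shared: dict):
        raise NotImplementedError

    def fallback(self, shared: dict, error: Exception):
        # Default error handling: re-raise once all retries are used up.
        raise error

    def run(self, shared: dict):
        for attempt in range(1, self.max_retries + 1):
            try:
                return self.exec(shared)
            except Exception as error:
                if attempt == self.max_retries:
                    return self.fallback(shared, error)
                time.sleep(self.wait)


class FlakySummarizer(RetryingNode):
    """Fails twice, then succeeds, to exercise the retry loop."""

    def __init__(self):
        super().__init__(max_retries=3)
        self.calls = 0

    def exec(self, shared: dict):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("transient failure")
        shared["summary"] = f"{len(shared['results'])} results summarized"
        return shared["summary"]


node = FlakySummarizer()
shared = {"results": ["a", "b"]}
print(node.run(shared))
```

A subclass can override `fallback` to return a default value instead of raising, which lets a flow continue when one node fails permanently.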
- Map: Parallel web searches across multiple queries
- Reduce: Consolidate results into coherent research ideas
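The map and reduce steps can be sketched with `asyncio.gather`. This is an illustration under stated assumptions: `fake_search` is a stand-in that returns canned strings instead of calling Tavily, and the reduce step here only merges and de-duplicates results, whereas the project consolidates them into research ideas.

```python
import asyncio


async def fake_search(query: str) -> list[str]:
    # Stand-in for a real web search call; returns canned results.
    await asyncio.sleep(0)
    return [f"{query}: result {i}" for i in (1, 2)]


async def map_reduce(queries: list[str]) -> list[str]:
    # Map: launch every search concurrently; gather preserves input order.
    per_query = await asyncio.gather(*(fake_search(q) for q in queries))
    # Reduce: flatten, then de-duplicate while keeping first-seen order.
    merged = [hit for hits in per_query for hit in hits]
    return list(dict.fromkeys(merged))


results = asyncio.run(map_reduce(["graph neural networks", "protein folding"]))
for line in results:
    print(line)
```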
Sequential flow orchestration with conditional branching:
- Validation loops with user feedback
- Early termination on approval
- Maximum cycle limits
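A validation loop with early termination and a cycle limit could be sketched as follows. The function name, the `"approve"` sentinel, and the callback signatures are assumptions for this example, not the project's actual interface.

```python
from typing import Callable


def validation_loop(
    ideas: list[str],
    get_feedback: Callable[[list[str]], str],
    refine: Callable[[list[str], str], list[str]],
    max_cycles: int = 3,
) -> tuple[list[str], int]:
    """Refine ideas until the user approves or max_cycles is reached."""
    for cycle in range(1, max_cycles + 1):
        feedback = get_feedback(ideas)
        if feedback == "approve":
            # Early termination: stop as soon as the user approves.
            return ideas, cycle
        ideas = refine(ideas, feedback)
    # Cycle limit reached without approval: return the latest refinement.
    return ideas, max_cycles


# Scripted feedback replaces interactive input for this demonstration.
scripted = iter(["more detail", "approve"])
final_ideas, cycles_used = validation_loop(
    ["Idea A"],
    get_feedback=lambda ideas: next(scripted),
    refine=lambda ideas, fb: [f"{idea} ({fb})" for idea in ideas],
)
print(final_ideas, cycles_used)
```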
- Retrieval: Vector similarity search in ArXiv papers
- Augmentation: Context-aware novelty assessment
- Generation: Evidence-based analysis of the proposed research idea against the retrieved papers
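The retrieval step and a simple novelty heuristic can be illustrated with plain cosine similarity over toy vectors. This sketch does not use Qdrant or real embeddings, and the rule "novelty = 1 - highest similarity" is an assumption made for the example; the project's actual scoring is produced by its novelty assessment node.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(idea_vec: list[float], papers: dict, top_k: int = 2) -> list[tuple]:
    """Return the top_k (similarity, title) pairs, most similar first."""
    scored = [(cosine(idea_vec, vec), title) for title, vec in papers.items()]
    return sorted(scored, reverse=True)[:top_k]


def novelty_score(hits: list[tuple]) -> float:
    # Assumed heuristic: the closer the nearest paper, the lower the novelty.
    return 1.0 - max((score for score, _ in hits), default=0.0)


# Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
papers = {
    "Paper B": [0.0, 1.0],
    "Paper C": [1.0, 1.0],
    "Paper D": [-1.0, 0.0],
}
hits = retrieve([1.0, 0.0], papers)
print(hits, novelty_score(hits))
```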
Modular pipeline stages that can be run independently or combined:
get_flow("idea_generation") # Ideas only
get_flow("rag") # Novelty only
get_flow("planning") # Planning only
get_flow("complete_assistant") # Full pipeline
Dynamic flow creation based on user requirements:
flow = get_flow(flow_type)
await flow.run_async(shared_dict)
# Install dependencies
uv sync
# Set environment variables
export OPENAI_API_KEY="your-key"
export TAVILY_API_KEY="your-key"
export NEBIUS_API_KEY="your-key"
# Run the full research assistant
uv run src/main.py
Important: Ensure you have access to a valid Qdrant database containing embedded ArXiv papers; the RAG flow will not work without it.
You will be asked for the URL of this Qdrant database during the flow (e.g. http://localhost:6333).
- Complete Assistant: Full 3-phase pipeline (recommended)
- Idea Generation: Web research + idea creation only
- RAG Assessment: Novelty analysis against academic papers
- Planning: Convert ideas into actionable research plans
- Legacy Q/A: Simple question answering with a single LLM call
- Framework: PocketFlow (async workflow orchestration)
- LLM: OpenAI GPT models with structured output
- Search: Tavily API for web research
- Vector DB: Qdrant for similarity search
- Embeddings: Qwen3-Embedding models via Nebius Studio
- Reranking: Optional Qwen3-Reranker models
- Output: Beautiful markdown with emojis and formatting
src/
├── domain/ # Domain models and shared state
├── flows/ # Flow definitions and orchestration
├── nodes/ # Individual processing nodes
├── utils/ # LLM, search, and utility functions
└── main.py # CLI interface and flow execution
