AutoRAG-Research is a Python framework for automating RAG (Retrieval-Augmented Generation) research workflows. It provides tools for data ingestion, pipeline execution, and evaluation metrics. The user can download pre-ingested datasets and run pre-made RAG pipelines and evaluate it. The pipelines can be customized. There are two types (retrieval/generation) pipelines.
# Setup
make install # Create venv, install deps, setup pre-commit hooks
uv sync --all-groups --all-extras # Install all deps including optional (gpu, search)
# Code Quality
make check # Run all checks (ruff, ty type checker, deptry)
# Testing (requires Docker)
make test # Full test with Docker PostgreSQL lifecycle management
make test-only # Run tests (assumes PostgreSQL container is running)
make test-data # Run tests only marked as data
make test-full # Run all tests including api/gpu/data marked tests
# Run single test
uv run pytest tests/path/to/test_file.py::test_function_name -v
# Docker
make docker-up # Start PostgreSQL container
make docker-wait # Wait for PostgreSQL readiness
make docker-down # Stop container
make clean-docker # Remove container and volumes
# Docs
make docs # Build and serve docs locallyThe codebase follows a layered architecture with Generic Repository + Unit of Work + Service Layer patterns:
Executor/Evaluator (config.py, executor.py, evaluator.py)
↓
Pipeline Layer (pipelines/)
↓
Service Layer (orm/service/) - Business logic
↓
Unit of Work (orm/uow/) - Transaction management
↓
Repository Layer (orm/repository/) - Data access (GenericRepository[T])
↓
ORM Models (orm/models/) - SQLAlchemy with pgvector
Pipeline Types:
- Retrieval Pipelines (
pipelines/retrieval/) - Vector search, BM25, hybrid retrieval- Extend
BaseRetrievalPipeline - Use
RetrievalPipelineService+RetrievalUnitOfWork - Methods:
.retrieve(query, top_k)for single-query,.run()for batch
- Extend
- Generation Pipelines (
pipelines/generation/) - LLM-based answer generation- Extend
BaseGenerationPipeline - Use
GenerationPipelineService+GenerationUnitOfWork - Compose with retrieval pipelines for flexible RAG strategies
- Example:
NaiveRAGPipeline(single retrieve + generate)
- Extend
Key Entry Points:
executor.py- Orchestrates pipeline execution and metric evaluationconfig.py- Configuration dataclasses (ExecutorConfig, BasePipelineConfig, BaseMetricConfig)orm/schema_factory.py- Dynamic schema creation with custom embedding dimensions viacreate_schema(dim)
- Generic Repository: Base CRUD in
orm/repository/base.py, specialized repos extend it - Unit of Work: Context managers in
orm/uow/ensure transaction safety - Service Layer: Business logic in
orm/service/, uses UoW for data access - Vector Search: Uses VectorChord with pgvector
Vectortype - Driver: psycopg3 (imported as
psycopg) - Async: Use SQLAlchemy's greenlet-based bridging (
sqlalchemy.util.concurrency.greenlet_spawn)
- Use
db_sessionfixture from conftest.py (don't create new sessions) - Test data is pre-seeded in
postgresql/db/init/002-seed.sql- use it - Test file structure mirrors package structure in
tests/ - One test function = one function under test
- Clean up any data you add in tests
Test Markers:
@pytest.mark.gpu # Requires GPU
@pytest.mark.api # Requires LLM/API calls (prefer mocks)
@pytest.mark.data # Downloads external data
@pytest.mark.asyncio # Async testPrefer mocks over real API calls (use LangChain FakeListLLM/FakeEmbeddings from langchain_core).
- Python 3.10+ type hints: use
list,dict,|(nottyping.List,typing.Optional) - Line length: 120 characters
- Ruff for linting/formatting
- Type checking: ty
- Alwasy use
logger = logging.getLogger("AutoRAG-Research")to initialize logger - Do not use print statements for logging. Or any other logging methods.
MANDATORY: Before implementing any utility logic, consult ai_instructions/utility_reference.md.
DO NOT duplicate — reuse existing helpers. Run /check-duplication to verify.
- Document/Page/Chunk/ImageChunk - Content storage with embeddings
- Query - Search queries with ground truth
- RetrievalRelation - Query-to-chunk relevance (composite key: query_id, group_index, group_order)
- Pipeline/Metric - Configuration storage
- ChunkRetrievedResult - Retrieval pipeline outputs (query → chunks with scores)
- ExecutorResult - Generation pipeline outputs (generation_result, token_usage, execution_time)
- EvaluationResult - Metric evaluation outputs
Detailed patterns and examples are in /ai_instructions/ (and schema DDL in postgresql/db/init/):
db_pattern.md- Repository, UoW, Service patterns with code examplespostgresql/db/init/001-schema.sql- Canonical DB schema DDL (single source of truth)test_code_generation_instructions.md- Testing conventionsutility_reference.md- MANDATORY utility & service method catalog (read before implementing)pipeline_checklist.md- MANDATORY pipeline completion checklist (verify before merging any pipeline PR)metric_implementation.md- MANDATORY checklist for adding new evaluation metrics (Config, YAML, exports, tests)
- Code quality checks are automated via hooks -
make checkruns automatically after file edits and on session end - Generate the corresponding test file correctly when you add the code, and runs the test code before commit (
make test) - Do not use relative imports.
- Always use
uv sync --all-extras --all-groupsto install dependencies to avoid missing optional packages. - Do not add
__init__.pyintests/directory. - DO NOT CHANGE ty, ruff, deptry configurations in the
pyproject.toml. Use ignore annotation if necessary. - DO NOT USE
globalvariable anywhere in the codebase. - When adding new utility functions to
util.py, you MUST also updateai_instructions/utility_reference.md.