LitFlow is an open-source research intelligence platform for literature ingestion, retrieval, survey generation, and knowledge graph analytics.
It processes real research PDFs into a durable corpus, supports citation-grounded Q&A, generates topic reports, and surfaces graph-driven insights for method lineage, competitive performance, dataset dominance, and trends.
- Corpus ingestion from real PDFs (no synthetic seed data)
- Temporal-native orchestration for long-running, resumable pipelines
- RAG Q&A with citations
- Survey builder with Markdown reports
- Knowledge Graph + Research Intelligence dashboard
- Backfills/reprocessing (retry failed, re-embed, regenerate)
- Local-first stack with free defaults (mock, local embeddings)
Provider switching is implemented in workflow code with a deterministic policy:
- Quota exhausted: disable provider for cooldown window, switch immediately
- Rate-limited: bounded backoff, then short disable and switch
- Transient: bounded retries, then switch
- Context-too-long: reduce context and retry path
- Permanent errors: fail step with explicit reason
Failover state is tracked in workflow state (disabledUntil, retry counters), so behavior is durable and replay-safe.
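The policy above can be sketched as a pure decision function over explicit state. This is a minimal illustration, not LitFlow's actual code: the type names, `maxRetries`, and `shortDisable` are assumptions, and only the 900-second cooldown comes from the documented `LITFLOW_PROVIDER_COOLDOWN_SECONDS` example. Inside a real Temporal workflow the `now` argument would come from the SDK's `workflow.Now(ctx)` rather than the wall clock, so that replays see the same timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// ErrorClass is a hypothetical classification of provider errors.
type ErrorClass int

const (
	QuotaExhausted ErrorClass = iota
	RateLimited
	Transient
	ContextTooLong
	Permanent
)

// Action tells the caller what to do after a failed provider call.
type Action int

const (
	Retry          Action = iota // back off, then call the same provider again
	SwitchProvider               // move on to the next available provider
	ReduceContext                // shrink the prompt and retry
	Fail                         // fail the step with an explicit reason
)

const (
	maxRetries   = 3                 // assumed retry bound
	cooldown     = 900 * time.Second // mirrors LITFLOW_PROVIDER_COOLDOWN_SECONDS=900
	shortDisable = 60 * time.Second  // assumed short disable after rate limiting
)

// demoStart is a fixed timestamp so the example is deterministic.
var demoStart = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

// FailoverState holds the per-provider fields named in the README.
type FailoverState struct {
	DisabledUntil map[string]time.Time
	Retries       map[string]int
}

func NewFailoverState() *FailoverState {
	return &FailoverState{
		DisabledUntil: map[string]time.Time{},
		Retries:       map[string]int{},
	}
}

// Decide updates the state and returns the next action. It depends only on
// its arguments and the state, never on the clock or randomness.
func (s *FailoverState) Decide(provider string, class ErrorClass, now time.Time) Action {
	switch class {
	case QuotaExhausted:
		s.DisabledUntil[provider] = now.Add(cooldown)
		return SwitchProvider
	case RateLimited, Transient:
		s.Retries[provider]++
		if s.Retries[provider] <= maxRetries {
			return Retry
		}
		s.Retries[provider] = 0
		if class == RateLimited {
			s.DisabledUntil[provider] = now.Add(shortDisable)
		}
		return SwitchProvider
	case ContextTooLong:
		return ReduceContext
	default:
		return Fail
	}
}

// Available reports whether the provider's disable window has passed.
func (s *FailoverState) Available(provider string, now time.Time) bool {
	return !now.Before(s.DisabledUntil[provider])
}

func main() {
	s := NewFailoverState()
	fmt.Println(s.Decide("openai", QuotaExhausted, demoStart) == SwitchProvider)
	fmt.Println(s.Available("openai", demoStart.Add(time.Minute)))
}
```

Keeping the decision free of side effects other than the state update is what makes this kind of policy safe to replay: the same event history always produces the same sequence of actions.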
- Fan-out/fan-in ingestion with child workflows and bounded concurrency
- Queryable progress (GetProgress, GetPaperStatus, GetSurveyProgress, KG status)
- Idempotent activities and reproducible reruns
- Safe backfills with versioned manifests
- Clear event history and retry/failover trace in Temporal UI
- cmd/api - Go API server (:8080)
- cmd/worker - Go Temporal worker
- apps/web - Next.js + Tailwind UI (:3000)
- internal/workflows - Temporal workflows (ingest, survey, backfill, KG)
- internal/activities - idempotent workflow activities
- internal/providers - LLM/embedding provider abstractions + parsing
- internal/storage - Postgres repos
- internal/vector - pgvector search
- internal/graph - KG extraction, parsing, normalization
- migrations - schema + pgvector + KG migrations
- docker-compose.yml - Temporal, Temporal UI, Postgres
Docker services:
- Temporal server: localhost:7233
- Temporal UI: http://localhost:8233
- Postgres + pgvector: localhost:5432
App services:
- API: http://localhost:8080
- Web: http://localhost:3000
Data directories:
- Inbound PDFs: ./data/in/{corpusId}/...
- Artifacts/reports/manifests: ./data/out/{corpusId}/...
- Docker + Docker Compose
- Go (compatible with go.mod)
- Node.js 20+
- npm
cp .env.example .env
make up
make migrate
Run in separate terminals:
make worker
make api
make web
Open:
- Web: http://localhost:3000
- Temporal UI: http://localhost:8233
Stop everything:
make down
Available make targets:
- make up
- make down
- make migrate
- make api
- make worker
- make web
- make test
Both API and worker auto-load .env.
Core provider settings:
LITFLOW_LLM_PROVIDERS="mock|openai:key1|groq:key2"
LITFLOW_EMBED_PROVIDERS="mock|ollama:nomic|ollama:bge|openai:key1"
LITFLOW_PROVIDER_COOLDOWN_SECONDS=900
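As a rough illustration of how a pipe-separated provider list like the ones above could be parsed, here is a small Go sketch. `ProviderSpec` and `ParseProviders` are hypothetical names, and two points are assumptions rather than documented behavior: that list order expresses failover priority, and that the text after the colon is a key or model alias.

```go
package main

import (
	"fmt"
	"strings"
)

// ProviderSpec is a hypothetical parsed entry from a provider list.
type ProviderSpec struct {
	Name      string // e.g. "mock", "openai", "ollama"
	Qualifier string // text after ":"; assumed to be a key or model alias
}

// ParseProviders splits a value such as "mock|openai:key1|groq:key2"
// into ordered entries. Empty segments are skipped.
func ParseProviders(raw string) ([]ProviderSpec, error) {
	var specs []ProviderSpec
	for _, entry := range strings.Split(raw, "|") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		name, qualifier, _ := strings.Cut(entry, ":")
		if name == "" {
			return nil, fmt.Errorf("provider entry %q has no name", entry)
		}
		specs = append(specs, ProviderSpec{Name: name, Qualifier: qualifier})
	}
	if len(specs) == 0 {
		return nil, fmt.Errorf("no providers configured")
	}
	return specs, nil
}

func main() {
	specs, err := ParseProviders("mock|openai:key1|groq:key2")
	if err != nil {
		panic(err)
	}
	for i, s := range specs {
		fmt.Printf("%d: name=%s qualifier=%q\n", i, s.Name, s.Qualifier)
	}
}
```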
Embedding settings:
LITFLOW_EMBED_DIM=1536
LITFLOW_EMBED_VERSION=v1
LITFLOW_CHUNK_SIZE=1200
LITFLOW_CHUNK_OVERLAP=200
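To make the chunk size and overlap settings concrete, the sketch below shows one common way to cut text into overlapping windows. The README does not state the unit of `LITFLOW_CHUNK_SIZE`; this example assumes characters (runes), and `ChunkText` is a hypothetical helper, not the project's chunker.

```go
package main

import (
	"fmt"
	"strings"
)

// ChunkText splits text into windows of at most size runes, where each
// window starts size-overlap runes after the previous one, so consecutive
// windows share overlap runes.
func ChunkText(text string, size, overlap int) ([]string, error) {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil, fmt.Errorf("need size > 0 and 0 <= overlap < size, got size=%d overlap=%d", size, overlap)
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the last window reached the end of the text
		}
	}
	return chunks, nil
}

func main() {
	text := strings.Repeat("x", 3000)
	chunks, err := ChunkText(text, 1200, 200)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d runes\n", i, len([]rune(c)))
	}
}
```

With a size of 1200 and an overlap of 200, each window advances by 1000 runes, so a 3000-rune text yields three chunks.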
Frontend API base:
NEXT_PUBLIC_LITFLOW_API_BASE=http://localhost:8080
Optional providers:
- OpenAI: OPENAI_API_KEY or aliased LITFLOW_OPENAI_KEY_<ALIAS>
- Groq: GROQ_API_KEY or aliased LITFLOW_GROQ_KEY_<ALIAS>
- Ollama embeddings:
LITFLOW_OLLAMA_BASE_URL=http://localhost:11434
LITFLOW_OLLAMA_EMBED_MODEL_NOMIC=nomic-embed-text
LITFLOW_OLLAMA_EMBED_MODEL_BGE=bge-small-en-v1.5
- Lists PDFs from corpus input directory
- Starts PaperProcessWorkflow children with concurrency limits
- Continues despite individual paper failures
- Exposes query: GetProgress
- Writes corpus summary artifact
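The fan-out pattern in the list above (bounded concurrency, continuing past individual failures, countable progress) can be illustrated in plain Go. This is only a sketch of the control flow: `ProcessAll`, `Progress`, and `ErrUnreadable` are hypothetical names, and a real Temporal workflow would use the SDK's deterministic primitives (child workflow futures and `workflow.Go`) instead of native goroutines and mutexes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrUnreadable is a hypothetical per-paper failure used in the example.
var ErrUnreadable = errors.New("unreadable PDF")

// Progress is a hypothetical summary a progress query might return.
type Progress struct {
	Total     int
	Succeeded int
	Failed    int
}

// ProcessAll runs process for every paper with at most limit calls in
// flight. A failing paper is counted and does not stop the others.
func ProcessAll(papers []string, limit int, process func(string) error) Progress {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var mu sync.Mutex
	var wg sync.WaitGroup
	p := Progress{Total: len(papers)}

	for _, paper := range papers {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(paper string) {
			defer wg.Done()
			defer func() { <-sem }()
			err := process(paper)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				p.Failed++
				return
			}
			p.Succeeded++
		}(paper)
	}
	wg.Wait() // fan-in: wait for every paper before summarizing
	return p
}

func main() {
	papers := []string{"a.pdf", "scan.pdf", "c.pdf"}
	p := ProcessAll(papers, 2, func(name string) error {
		if name == "scan.pdf" {
			return ErrUnreadable
		}
		return nil
	})
	fmt.Printf("total=%d succeeded=%d failed=%d\n", p.Total, p.Succeeded, p.Failed)
}
```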
- Computes stable paper_id
- Extracts text (text PDFs only; no OCR)
- Chunks and embeds with provider failover
- Upserts chunks + embeddings idempotently
- Writes per-paper artifacts and status
- Exposes query: GetPaperStatus
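The README does not say how the stable paper_id is derived. One common approach, shown here purely as an assumption, is to hash the PDF bytes so that the same file always maps to the same identifier regardless of its name or location. `PaperID` is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PaperID returns the hex-encoded SHA-256 digest of the file content.
// Identical bytes always yield the same ID, which makes reruns and
// upserts keyed on the ID idempotent.
func PaperID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	first := PaperID([]byte("%PDF-1.7 example bytes"))
	second := PaperID([]byte("%PDF-1.7 example bytes"))
	fmt.Println(first == second) // same content, same ID
	fmt.Println(len(first))      // 64 hex characters
}
```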
- Retrieves relevant chunks per topic
- Generates outline + sections with failover
- Produces Markdown report + citations
- Exposes query: GetSurveyProgress
- RETRY_FAILED_PAPERS
- REEMBED_ALL_PAPERS
- REGENERATE_SURVEY
- Emits versioned run manifest
- KGBackfillWorkflow (corpus-wide)
- KGExtractPaperWorkflow (single paper)
The Knowledge Graph page provides productized insights rather than a query-console-first UX:
- Overview
- Lineage Explorer
- Performance Matrix
- Dataset Dominance
- Trend Timeline
- Full Graph (optional deep view)
Open http://localhost:8233.
Recommended views:
- Filter by workflow type:
  - CorpusIngestWorkflow
  - PaperProcessWorkflow
  - SurveyBuildWorkflow
  - BackfillWorkflow
  - KGBackfillWorkflow
  - KGExtractPaperWorkflow
- Inspect event history to trace:
  - activity retries
  - provider failover transitions
  - cooldown/disable behavior
  - terminal failure causes
- Idempotent DB upserts (ON CONFLICT DO UPDATE)
- Atomic artifact writes (temp + rename)
- Workflow-level provider failover policy
- Audit trail for provider calls (llm_calls)
- KG schema bootstrap protections for local drift
go test ./...
go build ./...
cd apps/web && npm install && npm run build