Polymath v4

An evolving hub of applied knowledge and skills for spatial multimodal data analysis, bridging theory, code, and actionable skills across domains.

Figure 1: The Polymath v4 architecture integrates unstructured scientific texts and structured codebases through a hybrid vector-graph engine. This enables novel applications including seamless navigation between theory and implementation ('Vibe Coding'), automated knowledge synthesis, and personalized concept mastery.

Status: Production Ready

Last Audit: 2026-01-19 | Auditor: Claude Opus 4.5 | Result: ✅ PASS

Vision

Polymath is a personal polymathic system. Applications are still being discovered, but the core purpose is clear: actionable, implementable knowledge that bridges theory and practice.

This isn't just a paper database. It's:

A concept graph connecting methods across fields (biology ↔ physics ↔ ML)
A code-paper bridge linking implementations to theory
A skill repository capturing successful workflows for reuse
A learning accelerator for conceptual mastery

Current State (2026-01-19)

Component	Count	Status
Documents	2,193	✅
Passages	174,321	✅ 100% embedded
Concepts	7.36M	✅
Repositories	1,881	✅
Code Chunks	578,830	✅
Neo4j Nodes	930K+	✅
Neo4j Edges	2.5M+	✅

Knowledge Structure

Quick Start

cd /home/user/polymath-v4

# Search papers
python scripts/q.py "spatial transcriptomics"

# Find code for papers (Code-Paper Bridge)
python scripts/q.py "gene expression prediction" --code

# Search repositories
python scripts/q.py "attention mechanisms" --repos

# Ingest a PDF
python scripts/ingest_pdf.py paper.pdf

# System health
python scripts/system_report.py --quick

Architecture

polymath-v4/
├── lib/
│   ├── config.py              # Central config (thread-safe)
│   ├── db/postgres.py         # Connection pool (thread-safe)
│   ├── embeddings/bge_m3.py   # BGE-M3 embeddings (thread-safe)
│   ├── search/hybrid_search.py # Vector + BM25 + reranking
│   └── ingest/                # PDF parsing, chunking, asset detection
├── scripts/                   # CLI tools (28 scripts)
├── schema/                    # PostgreSQL migrations (001-010)
├── skills/                    # Operational skills
├── docs/                      # Documentation and audits
└── dashboard/                 # Streamlit UI

Key Features

Hybrid Search

Vector similarity + BM25 keyword matching + optional cross-encoder reranking.

from lib.search.hybrid_search import search
results = search("gene expression prediction", n=10)
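The README does not say how the vector and BM25 result lists are combined before reranking. As a minimal sketch, assuming reciprocal rank fusion (a common fusion technique, not confirmed to be what `hybrid_search.py` uses), two ranked lists of passage IDs could be merged like this. The function name, the passage IDs, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list, best first.

    Each list contributes 1 / (k + rank) to an item's score, so items
    ranked highly by more than one retriever rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical passage IDs, standing in for real retriever output.
vector_hits = ["p3", "p1", "p7"]  # from vector similarity
bm25_hits = ["p1", "p9", "p3"]    # from BM25 keyword matching
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Rank-based fusion avoids having to put cosine similarities and BM25 scores on a common scale, which is one reason it is often chosen for hybrid retrieval.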

Code-Paper Bridge

Find implementations for papers, or papers for code.

python scripts/q.py "transformer architecture" --code
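The README does not describe how papers and repositories are linked. One plausible way to seed such links, shown here purely as an assumption, is to scan passage text for GitHub repository URLs. The function name and the example URLs below are hypothetical.

```python
import re
from typing import List

# Matches the owner/repo part of a GitHub URL; anything after the repo is ignored.
GITHUB_REPO = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def find_repo_mentions(text: str) -> List[str]:
    """Return unique 'owner/repo' slugs in order of first appearance."""
    seen: List[str] = []
    for owner, repo in GITHUB_REPO.findall(text):
        repo = repo.rstrip(".")        # drop a sentence-final period
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]  # normalize clone URLs
        slug = f"{owner}/{repo}"
        if slug not in seen:
            seen.append(slug)
    return seen


# Hypothetical passage text.
passage = (
    "Code is available at https://github.com/example-lab/spatial-model. "
    "Issues: https://github.com/example-lab/spatial-model/issues and "
    "a helper at http://github.com/other/tool.git."
)
print(find_repo_mentions(passage))
```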

Concept Extraction

Automatic extraction of METHOD, PROBLEM, DOMAIN, and ENTITY concepts via the Gemini batch API.

python scripts/batch_concepts.py --submit --limit 100
python scripts/batch_concepts.py --process
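To illustrate what the processing step might do with batch output, here is a minimal sketch that validates extracted concepts against the four types named above. The JSONL record shape (`passage_id`, `concepts`, `name`, `type`), the class names, and the lowercasing of concept names are all assumptions; the actual format handled by `batch_concepts.py` is not documented here.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class ConceptType(Enum):
    METHOD = "METHOD"
    PROBLEM = "PROBLEM"
    DOMAIN = "DOMAIN"
    ENTITY = "ENTITY"


@dataclass(frozen=True)
class Concept:
    passage_id: str
    name: str
    type: ConceptType


def parse_concepts(jsonl_lines: Iterable[str]) -> List[Concept]:
    """Parse hypothetical JSONL records, skipping malformed concept items."""
    concepts: List[Concept] = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for item in record.get("concepts", []):
            try:
                ctype = ConceptType(item["type"])
                name = item["name"].strip().lower()  # assumed normalization
            except (KeyError, ValueError):
                continue  # missing field or unrecognized concept type
            concepts.append(Concept(record["passage_id"], name, ctype))
    return concepts


# Hypothetical batch output: one JSON record per passage.
sample = [
    '{"passage_id": "p1", "concepts": [{"name": "Optimal Transport", "type": "METHOD"}, {"name": "misc", "type": "OTHER"}]}',
    '{"passage_id": "p2", "concepts": [{"name": "Spatial transcriptomics", "type": "DOMAIN"}]}',
]
parsed = parse_concepts(sample)
print(parsed)
```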

Neo4j Knowledge Graph

Papers → Passages → Concepts with MENTIONS edges for graph traversal.

python scripts/sync_neo4j.py --full
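The kind of traversal this structure enables can be sketched in plain Python, without a running Neo4j instance. The example below finds papers that share at least one concept with a given paper. It assumes that MENTIONS edges run from passages to concepts and that each passage belongs to exactly one paper; the IDs and the function name are hypothetical, and a real deployment would express this as a graph query instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def papers_sharing_concepts(
    paper_id: str,
    paper_passages: Iterable[Tuple[str, str]],  # (paper, passage) pairs
    mentions: Iterable[Tuple[str, str]],        # (passage, concept) MENTIONS pairs
) -> Dict[str, Set[str]]:
    """Map each related paper to the concepts it shares with paper_id."""
    passage_to_paper = {passage: paper for paper, passage in paper_passages}
    concept_to_papers: Dict[str, Set[str]] = defaultdict(set)
    paper_to_concepts: Dict[str, Set[str]] = defaultdict(set)
    for passage, concept in mentions:
        paper = passage_to_paper.get(passage)
        if paper is None:
            continue  # MENTIONS edge from a passage with no known paper
        concept_to_papers[concept].add(paper)
        paper_to_concepts[paper].add(concept)

    # Walk Paper -> Concept -> other Papers.
    related: Dict[str, Set[str]] = defaultdict(set)
    for concept in paper_to_concepts.get(paper_id, set()):
        for other in concept_to_papers[concept]:
            if other != paper_id:
                related[other].add(concept)
    return dict(related)


# Hypothetical toy graph.
paper_passages = [("paperA", "a1"), ("paperA", "a2"), ("paperB", "b1"), ("paperC", "c1")]
mentions = [
    ("a1", "optimal transport"),
    ("a2", "graph neural network"),
    ("b1", "optimal transport"),
    ("c1", "diffusion model"),
]
print(papers_sharing_concepts("paperA", paper_passages, mentions))
```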

Databases

Store	Purpose	Connection
PostgreSQL	Documents, passages, embeddings, concepts	`psql -U polymath -d polymath`
Neo4j	Concept graph	`bolt://localhost:7687`

Documentation

Document	Purpose
`CLAUDE.md`	Claude Code guide with commands and config
`ARCHITECTURE.md`	System design and pipeline details
`docs/audits/`	Audit history and verification reports
`skills/`	Operational skills for common workflows

Roadmap

Core search pipeline (vector + BM25 + rerank)
Code-Paper Bridge
Neo4j graph synchronization
Concept extraction (batch API)
Stabilization audit (2026-01-19)
SIMILAR_TO edges for concept clustering
Flashcard generation for learning
Gap analysis across polymathic connections

License

MIT

Acknowledgments

Built with BGE-M3, PostgreSQL + pgvector, Neo4j, PyMuPDF.

vanbelkummax/polymath-v4
