InfoMesh: Tech Stack & Coding Conventions


1. Tech Stack

| Layer | Technology | Notes |
| --- | --- | --- |
| Language | Python 3.12+ | Use modern Python features (type hints, match, type statement, StrEnum, etc.); TypeVar defaults require Python 3.13+ or typing_extensions |
| P2P Network | libp2p (py-libp2p) | DHT, NAT traversal, encryption built-in |
| DHT | Kademlia | Proven distributed hash table for index & crawl coordination |
| Crawling | httpx + asyncio | Async-first high-performance HTTP client |
| HTML Parsing | trafilatura | Best accuracy for main-content extraction |
| Keyword Index | SQLite FTS5 | Zero-install, embedded full-text search |
| Vector Index | ChromaDB | Semantic search with embeddings |
| MCP Server | mcp-python-sdk | VS Code / Claude integration |
| Admin API | FastAPI | Local status & config endpoints |
| Serialization | msgpack | Faster and smaller than JSON |
| Compression | zstd | Level-tunable compression; dictionary mode for similar documents |
| Local LLM | ollama / llama.cpp | Optional local summarization (Qwen 2.5, Llama 3.x, etc.) |
| Logging | structlog | Structured logging for all library code |
| Package Manager | uv | Fast Python package/project manager (replaces pip/venv) |
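
To make the "Keyword Index" row concrete, here is a minimal sketch of an FTS5 index using only the standard-library sqlite3 module. The table name, column names, and sample rows are assumptions for illustration; the actual schema in local_store.py may differ. It also assumes the bundled SQLite was compiled with FTS5, which is the case for most CPython builds.

```python
import sqlite3

# Hypothetical schema; the real local_store.py may use different names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(url UNINDEXED, title, body)")
conn.executemany(
    "INSERT INTO docs (url, title, body) VALUES (?, ?, ?)",
    [
        (
            "https://example.org/a",
            "Kademlia overview",
            "A distributed hash table used for peer lookup.",
        ),
        (
            "https://example.org/b",
            "Compression notes",
            "zstd supports dictionary mode for similar documents.",
        ),
    ],
)


def search(query: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (url, title) pairs, best BM25 match first."""
    rows = conn.execute(
        # bm25() returns lower values for better matches, so ascending order
        # puts the most relevant rows first.
        "SELECT url, title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Calling search("dictionary") returns only the second row, since that term appears in no other document.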

Optional / Fallback Dependencies

| Package | When Used |
| --- | --- |
| BeautifulSoup4 | HTML parsing fallback when trafilatura fails |
| vLLM | High-throughput GPU inference (alternative to ollama/llama.cpp) |
| sentence-transformers | Embedding generation for ChromaDB vector index |

2. Project Structure

infomesh/
├── pyproject.toml
├── infomesh/
│   ├── __init__.py          # Package root
│   ├── __main__.py          # CLI entry point
│   ├── config.py            # Configuration management
│   ├── services.py          # Central AppContext + index_document orchestration
│   ├── p2p/                 # P2P network layer
│   │   ├── node.py          #   Peer main process
│   │   ├── dht.py           #   Kademlia DHT
│   │   ├── routing.py       #   Query routing
│   │   ├── replication.py   #   Document/index replication
│   │   └── protocol.py      #   Message protocol definitions
│   ├── crawler/             # Web crawler
│   │   ├── worker.py        #   Async crawl workers
│   │   ├── scheduler.py     #   URL assignment (DHT-based)
│   │   ├── parser.py        #   HTML → text extraction
│   │   ├── robots.py        #   robots.txt compliance
│   │   ├── dedup.py         #   Deduplication pipeline (URL, SHA-256, SimHash)
│   │   ├── seeds.py         #   Seed URL management & category selection
│   │   └── crawl_loop.py    #   Continuous seed-and-crawl loop (extracted from services.py)
│   ├── index/               # Search index
│   │   ├── local_store.py   #   SQLite FTS5 local index
│   │   ├── vector_store.py  #   ChromaDB vector index
│   │   ├── distributed.py   #   DHT inverted-index publish/query
│   │   └── ranking.py       #   BM25 + freshness + trust scoring
│   ├── search/              # Search engine
│   │   ├── query.py         #   Query parsing + distributed orchestration
│   │   └── merge.py         #   Multi-node result merging
│   ├── mcp/                 # MCP server (SRP: split into 4 modules)
│   │   ├── server.py        #   Thin wiring: Server creation, tool dispatch, runners
│   │   ├── tools.py         #   Tool schema definitions + filter extraction
│   │   ├── handlers.py      #   Tool handler implementations (handle_search, etc.)
│   │   └── session.py       #   SearchSession, AnalyticsTracker, WebhookRegistry
│   ├── api/                 # Local admin API
│   │   └── local_api.py     #   FastAPI (status, config)
│   ├── credits/             # Incentive system
│   │   ├── types.py         #   ActionType, CreditState, dataclasses (extracted from ledger.py)
│   │   └── ledger.py        #   SQLite-backed credit ledger (imports types from types.py)
│   ├── trust/               # Trust & integrity
│   │   ├── attestation.py   #   Content attestation chain (signing, verification)
│   │   ├── audit.py         #   Random audit system
│   │   └── scoring.py       #   Unified trust score computation
│   ├── summarizer/          # Local LLM summarization
│   │   ├── engine.py        #   LLM backend abstraction (ollama, llama.cpp)
│   │   ├── summarize.py     #   Content summarization pipeline
│   │   └── verify.py        #   Summary verification (key-fact anchoring, NLI)
│   └── compression/         # Data compression
│       └── zstd.py          #   zstd compression with dictionary support
├── bootstrap/
│   └── nodes.json           # Bootstrap node list
├── seeds/                   # Bundled seed URL lists
│   ├── tech-docs.txt        #   Technology documentation URLs
│   ├── academic.txt         #   Academic paper source URLs
│   └── encyclopedia.txt     #   Encyclopedia URLs
├── tests/
│   ├── conftest.py          # Shared fixtures
│   ├── test_dht.py
│   ├── test_crawler.py
│   ├── test_index.py
│   ├── test_search.py
│   ├── test_credits.py
│   ├── test_trust.py
│   ├── test_summarizer.py
│   ├── test_mcp.py
│   ├── test_services.py     # Services layer tests
│   └── test_mcp_handlers.py # MCP handler tests
└── docs/
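
The tree lists crawler/dedup.py as a three-stage deduplication pipeline (URL, SHA-256, SimHash). A rough sketch of what those three stages could look like follows, using only the standard library. The function names, the URL normalization rules, and the 64-bit word-level SimHash are assumptions for illustration, not the project's actual implementation.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Stage 1: canonicalize a URL so trivially different forms compare equal.

    Illustrative rules only: lowercase scheme and host, drop the fragment,
    strip a trailing slash from the path.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def content_hash(text: str) -> str:
    """Stage 2: SHA-256 of the extracted text, for exact-duplicate detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def simhash(text: str, bits: int = 64) -> int:
    """Stage 3: SimHash over word tokens, for near-duplicate detection.

    `bits` must be a multiple of 8 and at most 256 (the SHA-256 digest size).
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two pages would typically be treated as near-duplicates when the Hamming distance between their fingerprints falls below a small threshold; the right threshold depends on the corpus and is not specified here.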

3. Coding Conventions

3.1 General

  • Language: All source code, comments, docstrings, commit messages, and PR descriptions in English.
  • Python version: 3.12+. Use modern syntax (match/case, type statement, StrEnum). TypeVar defaults need Python 3.13+ or typing_extensions on 3.12.
  • Async-first: All I/O-bound code must use async/await with asyncio. Never use blocking I/O in the event loop.
  • Type hints: Required on all public functions and class attributes. Use from __future__ import annotations for forward references.

3.2 Style & Formatting

  • Formatter: ruff format (default settings, line length 88).
  • Linter: ruff with select = ["E", "F", "I", "UP", "B", "SIM"].
  • Import order: stdlib → third-party → local (enforced by ruff/isort).
  • Prefer pathlib.Path over os.path.
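
As an illustration of the pathlib preference, the sketch below lists and reads seed files like those under seeds/. The helper names and the convention that lines starting with "#" are comments are assumptions, not documented behavior of seeds.py.

```python
import tempfile
from pathlib import Path


def seed_files(seed_dir: Path) -> list[Path]:
    """Return the *.txt seed lists in a directory, in a stable order."""
    return sorted(seed_dir.glob("*.txt"))


def read_seed_urls(path: Path) -> list[str]:
    """Read non-empty lines, skipping '#' comments (assumed convention)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [s for line in lines if (s := line.strip()) and not s.startswith("#")]


# Usage against a throwaway directory, so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    seed_dir = Path(tmp) / "seeds"
    seed_dir.mkdir()
    (seed_dir / "tech-docs.txt").write_text(
        "# comment\nhttps://example.org/docs\n\n", encoding="utf-8"
    )
    files = seed_files(seed_dir)
    urls = read_seed_urls(files[0])
```

Path objects compose with the / operator and carry their own read_text, glob, and mkdir methods, which avoids the string juggling that os.path.join and open() would need.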

3.3 Naming

| Target | Convention | Example |
| --- | --- | --- |
| Modules/packages | snake_case | local_store.py |
| Classes | PascalCase | SearchResult |
| Functions/methods/variables | snake_case | parse_query() |
| Constants | UPPER_SNAKE_CASE | MAX_RETRIES |
| Private members | Single underscore | _internal_state |

3.4 Error Handling

  • Use specific exception types, not bare except:.
  • Log errors with structlog or stdlib logging; never use print() in library code.
  • Network/IO failures must be retried with exponential backoff where appropriate.
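
The three rules above can be combined in a small retry helper. This is a minimal sketch under stated assumptions: the name retry_with_backoff, its parameters, the default exception tuple, and the jitter range are all hypothetical, and stdlib logging is used so the example has no third-party dependencies.

```python
import asyncio
import logging
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    retry_on: tuple[type[Exception], ...] = (ConnectionError, TimeoutError),
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await operation(), retrying only the listed exception types."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except retry_on as exc:
            if attempt == max_retries:
                raise  # Out of retries: surface the original error.
            # Delay doubles each attempt, capped, with jitter to avoid bursts.
            delay = min(max_delay, base_delay * 2**attempt)
            delay *= random.uniform(0.5, 1.0)
            logger.warning(
                "attempt %d failed (%s); retrying in %.2fs", attempt + 1, exc, delay
            )
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")


async def demo() -> tuple[str, int]:
    """Usage: an operation that fails twice, then succeeds."""
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("peer unreachable")
        return "ok"

    result = await retry_with_backoff(flaky, base_delay=0.01)
    return result, calls
```

Exceptions outside retry_on propagate immediately, so programming errors such as a ValueError are not masked by the retry loop.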

3.5 Testing

  • Framework: pytest with pytest-asyncio for async tests.
  • Test files mirror source layout: infomesh/p2p/dht.py → tests/test_dht.py.
  • Each public function/method should have at least one test.
  • Use fixtures and factories over inline setup.
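
A sketch of the fixture-and-factory style with pytest and pytest-asyncio follows. The Document dataclass, make_document factory, and word_count coroutine are hypothetical stand-ins for project code. The example assumes pytest-asyncio is installed and running in its default strict mode, where async tests need the explicit marker.

```python
import asyncio
from dataclasses import dataclass

import pytest


@dataclass
class Document:
    """Hypothetical stand-in for an indexed page."""

    url: str
    title: str = "Untitled"
    text: str = ""


def make_document(**overrides: str) -> Document:
    """Factory: sensible defaults that individual tests can override."""
    fields = {"url": "https://example.org/", "title": "Example", "text": "hello"}
    fields.update(overrides)
    return Document(**fields)


@pytest.fixture
def document() -> Document:
    """Shared fixture; in a real suite this would live in tests/conftest.py."""
    return make_document()


async def word_count(doc: Document) -> int:
    """Stand-in for the async code under test."""
    await asyncio.sleep(0)
    return len(doc.text.split())


@pytest.mark.asyncio
async def test_word_count(document: Document) -> None:
    assert await word_count(document) == 1


def test_factory_override() -> None:
    assert make_document(title="Custom").title == "Custom"
```

Keeping defaults in one factory means a new required field is added in a single place rather than in every test's inline setup.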

4. Dependencies & Package Management

Using uv

uv is used for all dependency resolution, virtual environments, and project management.

  • All dependencies are declared in pyproject.toml under the dependencies key of the [project] table.
  • Dev dependencies go under [dependency-groups] (PEP 735) or under the dev extra in [project.optional-dependencies].
  • Pin minimum versions only (e.g., httpx>=0.27), not exact pins.
  • Lock file: uv.lock, committed to the repository for reproducible builds.
  • No requirements.txt, no pip: use uv commands only.
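
For orientation, a pyproject.toml following these rules might look roughly like the fragment below. Only the httpx>=0.27 bound comes from this document; the version number, the choice of dev tools, and the absence of other runtime dependencies are placeholders, not the project's actual file.

```toml
[project]
name = "infomesh"
version = "0.1.0"              # placeholder
requires-python = ">=3.12"
dependencies = [
    "httpx>=0.27",             # minimum version only, no exact pin
]

[dependency-groups]            # PEP 735
dev = [
    # Minimum versions omitted here rather than invented.
    "pytest",
    "pytest-asyncio",
    "ruff",
]
```

Exact versions are recorded in uv.lock, so loose lower bounds in pyproject.toml do not cost reproducibility.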

Key Commands

uv sync              # Install all dependencies (creates .venv automatically)
uv sync --dev        # Install with dev dependencies
uv add <package>     # Add a new dependency
uv add --dev <pkg>   # Add a dev dependency
uv run <command>     # Run a command within the project environment
uv run pytest        # Run tests
uv run infomesh start  # Run the application

Related docs: Overview · Architecture · Credit System · Legal · Trust & Integrity · Security Audit · Console Dashboard · MCP Integration · Publishing · FAQ