[Phase 7] Project Structure Refactoring - Country Modules #12

@mikhashev

Task

Restructure the codebase to support pluggable country modules, enabling future multi-country expansion. The refactoring is informed by P2P research so that the architecture can support both centralized and decentralized modes.

Overview

Duration: Months 3-4 (parallel with Phase 3)

The current codebase is Russia-specific. This phase refactors it into a country-agnostic core with pluggable country-specific modules.

7.1 Pluggable Country Module Architecture

Current Structure

scripts/
├── crawler/        # pravo.gov.ru API (Russia-specific)
├── parser/         # Russian legal document parser
├── consolidation/  # Russian code consolidation
├── sync/           # Russian data sync
└── import/         # Russian legal codes

Target Structure

scripts/
├── core/               # Country-independent (existing, expand)
│   ├── config.py
│   ├── db.py
│   └── batch_saver.py
│
├── country_modules/    # Country-specific modules (NEW)
│   ├── base/           # Abstract base classes
│   │   ├── scraper.py          # BaseScraper interface
│   │   ├── parser.py           # BaseParser interface
│   │   ├── consolidator.py     # BaseConsolidator interface
│   │   └── schema.py           # Base schema definitions
│   │
│   ├── russia/         # Russian Federation (refactor existing)
│   │   ├── scrapers/
│   │   ├── parsers/
│   │   ├── consolidation/
│   │   └── schemas/
│   │
│   └── germany/       # Germany (future)
│
├── legal_systems/     # Legal system adapters (NEW)
│   ├── civil_law/     # Code-based systems (Russia, Germany, France)
│   └── common_law/    # Case law systems (UK, USA, Canada)
│
└── indexer/           # Country-agnostic (unchanged)

Files to Create

  • scripts/country_modules/base/scraper.py - Abstract base class for scrapers
  • scripts/country_modules/base/parser.py - Abstract base class for parsers
  • scripts/country_modules/base/consolidator.py - Abstract base class for consolidation
  • scripts/legal_systems/civil_law/schema.py - Civil law common schema
  • scripts/legal_systems/common_law/schema.py - Common law common schema
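
As a rough illustration of what the base interfaces could look like, here is a minimal sketch using Python's abc module. The class names BaseScraper, BaseParser and BaseConsolidator come from the target structure; the method names (fetch, parse, consolidate), their signatures, and the PlainTextParser subclass are assumptions made for the example, not a settled API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional


class BaseScraper(ABC):
    """Fetches raw documents from one country's official sources."""

    @abstractmethod
    def fetch(self, since: Optional[str] = None) -> Iterator[bytes]:
        """Yield raw documents, optionally only those published after `since`."""


class BaseParser(ABC):
    """Turns one raw document into a structured record."""

    @abstractmethod
    def parse(self, raw: bytes) -> Dict[str, Any]:
        """Return a dict of extracted fields for a single document."""


class BaseConsolidator(ABC):
    """Applies amendments to a base text to produce the current version."""

    @abstractmethod
    def consolidate(self, base_text: str, amendments: List[str]) -> str:
        """Return the consolidated text."""


class PlainTextParser(BaseParser):
    """Toy subclass, for illustration only (hypothetical)."""

    def parse(self, raw: bytes) -> Dict[str, Any]:
        text = raw.decode("utf-8")
        return {"title": text.splitlines()[0], "length": len(text)}
```

Because the methods are abstract, a country module that forgets to implement one fails at instantiation time rather than deep inside a crawl.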

Files to Refactor

  • scripts/crawler/pravo_api_client.py → scripts/country_modules/russia/scrapers/pravo_api_client.py
  • scripts/parser/html_parser.py → scripts/country_modules/russia/parsers/html_parser.py
  • scripts/consolidation/consolidate.py → scripts/country_modules/russia/consolidation/consolidate.py

7.2 Country Registry and Configuration

Create Country Registry

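# scripts/country_modules/registry.py
from typing import Dict, List, Type

from .base.parser import BaseParser
from .base.scraper import BaseScraper
# Assumed locations of the Russia classes after the refactor; adjust to the final layout.
from .russia.parsers.html_parser import RussiaHtmlParser
from .russia.scrapers.pravo_api_client import RussiaPravoScraper


class CountryModule:
    """Country-specific module configuration"""

    def __init__(
        self,
        country_id: str,        # ISO 3166-1 alpha-3 (e.g., "RUS", "DEU")
        country_name: str,
        legal_system: str,      # "civil_law", "common_law", "mixed"
        scraper_class: Type[BaseScraper],
        parser_class: Type[BaseParser],
        data_sources: Dict[str, str],
        jurisdiction_levels: List[str],
    ):
        # Store the configuration as plain attributes
        self.country_id = country_id
        self.country_name = country_name
        self.legal_system = legal_system
        self.scraper_class = scraper_class
        self.parser_class = parser_class
        self.data_sources = data_sources
        self.jurisdiction_levels = jurisdiction_levels


# Country registry, keyed by ISO 3166-1 alpha-3 code
COUNTRIES: Dict[str, CountryModule] = {
    "RUS": CountryModule(
        country_id="RUS",
        country_name="Russia",
        legal_system="civil_law",
        scraper_class=RussiaPravoScraper,
        parser_class=RussiaHtmlParser,
        data_sources={
            "federal": "http://pravo.gov.ru",
            "supreme_court": "https://vsrf.ru",
            "constitutional_court": "http://www.ksrf.ru",
        },
        jurisdiction_levels=["federal", "regional", "municipal"],
    ),
}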
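
Callers will likely need a single place to resolve a country code against the registry. The sketch below shows one way to do that, with case-insensitive lookup and an error that lists the registered codes. CountryEntry is a hypothetical, trimmed-down stand-in for CountryModule so the example runs on its own, and get_country_module is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CountryEntry:
    """Hypothetical stand-in for CountryModule, reduced to two fields."""
    country_id: str
    legal_system: str


# Stand-in registry mirroring the shape of COUNTRIES
REGISTRY: Dict[str, CountryEntry] = {
    "RUS": CountryEntry(country_id="RUS", legal_system="civil_law"),
}


def get_country_module(country_id: str) -> CountryEntry:
    """Look up a registered country by ISO 3166-1 alpha-3 code."""
    code = country_id.strip().upper()
    try:
        return REGISTRY[code]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(
            f"Unknown country {code!r}; registered: {known}"
        ) from None
```

Raising on unknown codes, rather than silently falling back to Russia, keeps a typo such as "RSU" from returning Russian results for a query that meant something else.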

7.3 Database Schema for Multi-Country

Schema Updates

-- Create the countries table first so the foreign key below has a target
CREATE TABLE countries (
    id VARCHAR(3) PRIMARY KEY,      -- ISO 3166-1 alpha-3
    name_en VARCHAR(100),
    name_native VARCHAR(100),
    legal_system_type VARCHAR(50),  -- 'civil_law', 'common_law', 'mixed'
    federal_structure BOOLEAN,
    official_languages VARCHAR(100)[],
    data_sources JSONB,
    scraper_config JSONB,
    parser_config JSONB,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Register Russia before adding the foreign key: existing rows default to 'RUS'
INSERT INTO countries (id, name_en, legal_system_type)
VALUES ('RUS', 'Russia', 'civil_law');

-- Add country_id to existing tables
ALTER TABLE documents ADD COLUMN country_id VARCHAR(3) NOT NULL DEFAULT 'RUS';
ALTER TABLE documents ADD CONSTRAINT fk_country
    FOREIGN KEY (country_id) REFERENCES countries(id);

ALTER TABLE documents ADD COLUMN jurisdiction_level VARCHAR(20);
ALTER TABLE documents ADD COLUMN jurisdiction_id VARCHAR(100);
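
The effect of the NOT NULL DEFAULT 'RUS' column on rows that already exist can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 with an in-memory database as a stand-in for the real database (the JSONB and array columns above suggest PostgreSQL, which is not used here); the simplified documents table and the row titles are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified pre-migration table with one existing row
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO documents (title) VALUES ('legacy document')")

# The migration step: add the column with a default
conn.execute(
    "ALTER TABLE documents "
    "ADD COLUMN country_id VARCHAR(3) NOT NULL DEFAULT 'RUS'"
)

# A row written after the migration can name another country explicitly
conn.execute(
    "INSERT INTO documents (title, country_id) VALUES (?, ?)",
    ("new document", "DEU"),
)

rows = conn.execute(
    "SELECT title, country_id FROM documents ORDER BY id"
).fetchall()
print(rows)
```

The pre-existing row reads back with country_id 'RUS' without any explicit backfill, which is what lets existing Russia-only queries keep working after the migration.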

7.4 MCP Server Country Parameter

Update MCP Tools

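// src/tools/query-laws.ts
export const queryLawsTool = {
  name: "query-laws",
  description: "Search legal documents by country",
  // MCP tool inputs are declared as JSON Schema
  inputSchema: {
    type: "object",
    properties: {
      // NEW: ISO 3166-1 alpha-3 country code; omitted means "RUS"
      country: { type: "string", default: "RUS" },
      query: { type: "string" },
      filters: { type: "object" },      // SearchFilters
      use_hybrid: { type: "boolean" },
    },
    required: ["query"],
  },
};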
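
For existing clients that never send a country, the handler has to fill in the default before running the search. The logic is language-neutral; it is sketched here in Python to match the rest of the scripts, although the MCP server itself is TypeScript. The function name normalize_query_args and the exact validation rules are assumptions.

```python
from typing import Any, Dict

DEFAULT_COUNTRY = "RUS"  # matches the default stated for the country parameter


def normalize_query_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Validate query-laws arguments and apply the default country."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' is required and must be a non-empty string")

    # Missing or empty country falls back to the default
    country = str(args.get("country") or DEFAULT_COUNTRY).strip().upper()
    if len(country) != 3 or not (country.isascii() and country.isalpha()):
        raise ValueError(f"country must be a 3-letter code, got {country!r}")

    # Return a copy so the caller's dict is not mutated
    return {**args, "country": country}
```

A call such as normalize_query_args({"query": "tax code"}) comes back with country set to "RUS", so requests written before the parameter existed behave as they did.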

7.5 Migration Path for Russia Module

Migration Steps

  1. Create new structure without touching existing code
  2. Move Russia module to country_modules/russia/
  3. Create shims for backward compatibility
  4. Update imports gradually
  5. Remove shims after all imports updated
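
One possible shape for the shims in step 3 is a module that stays at the old import path, forwards attribute access to the new location, and emits a DeprecationWarning so remaining callers can be found. The sketch below builds such a shim in memory, relying on module-level __getattr__ (PEP 562, Python 3.7+), so that it runs without touching the real package layout; install_shim and the demo module names are hypothetical. In the repository, a shim would more likely be a small file at the old path that re-exports from the new one.

```python
import importlib
import sys
import types
import warnings


def install_shim(old_name: str, new_name: str) -> types.ModuleType:
    """Register `old_name` in sys.modules as a thin proxy for `new_name`."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the shim fails
        warnings.warn(
            f"{old_name} is deprecated; import from {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_name), attr)

    # A module's __getattr__ is looked up in the module's own namespace
    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim


# Demo: a fake "new location" module plus a shim at the "old location"
new_module = types.ModuleType("demo_new_html_parser")
new_module.parse = lambda text: text.upper()
sys.modules["demo_new_html_parser"] = new_module

install_shim("demo_old_html_parser", "demo_new_html_parser")
```

Turning DeprecationWarning into an error in the test suite is one way to confirm that step 4 is complete before the shims are removed in step 5.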

Backward Compatibility

  • All existing scripts continue to work
  • Gradual migration via shims
  • No breaking changes to MCP tools
  • Database migration uses default country_id='RUS'

Deliverables

  • Refactored codebase with country modules
  • Country registry and configuration
  • Multi-country database schema
  • MCP server country parameter support
  • Backward-compatible migration completed

Timeline

Month 3: Create base classes, refactor core modules
Month 4: Move Russia module, create shims, test migration

Priority

HIGH - Enables multi-country expansion
