[FEAT] Comprehensive Court Decision Fetching - All Court Types (2022-2024) #25

@mikhashev

Feature Description

Add comprehensive court decision fetching capabilities to Law7, covering all Russian court types (arbitration, general jurisdiction, supreme/constitutional) for the last 2 years (2022-2024).

Problem Statement

Law7 currently has no court decisions in the database. Court decisions show how laws are actually interpreted and applied in practice, which is invaluable for:

  • AI Assistance: Better understanding of how legal articles work in real cases
  • Legal Research: Finding precedents for specific articles
  • Article Context: Seeing practical applications of legal codes
  • Historical Tracking: How court interpretations change over time

Related Work

This feature significantly expands Phase 7C (Issue #22), which currently covers:

  • Supreme Court + Constitutional Court only (~1K-2K docs)

This issue adds:

  • Arbitration courts (kad.arbitr.ru) - economic disputes
  • General jurisdiction (sudrf.ru) - civil, criminal, administrative cases
  • Supreme/Constitutional courts (vsrf.ru, ksrf.ru) - high-level precedents
  • Time scope: Last 2 years (2022-2024) instead of all-time

Proposed Solution

Hybrid approach (balances reliability and coverage):

  1. Start with pravo.gov.ru API (official, stable) - quick wins
  2. Add scraping for comprehensive coverage:
    • kad.arbitr.ru (arbitration courts)
    • sudrf.ru (general jurisdiction)
    • vsrf.ru / ksrf.ru (supreme/constitutional)

Architecture

Follow the established country_modules pattern from Phase 7A:

scripts/country_modules/russia/scrapers/
├── court_scraper.py          # Base court scraper (extend BaseScraper)
├── kad_scraper.py            # Arbitration courts scraper
├── sudrf_scraper.py          # General jurisdiction scraper
└── supreme_scraper.py        # Supreme + Constitutional courts
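
A minimal sketch of what court_scraper.py could look like. The interface of BaseScraper is not shown in this issue, so the sketch uses a plain ABC as a stand-in. CourtDecision, iter_raw, parse, fetch, and DemoScraper are hypothetical names chosen for illustration; the dataclass fields mirror the court_decisions columns proposed below.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import Iterator, Optional


@dataclass(frozen=True)
class CourtDecision:
    """One decision, shaped like a row of the proposed court_decisions table."""
    case_number: str
    decision_date: date
    court_name: str
    source_url: str
    court_code: Optional[str] = None
    case_type: Optional[str] = None
    instance: Optional[str] = None  # first, appeal, cassation, supreme
    decision_text: Optional[str] = None


class CourtScraper(ABC):
    """Stand-in base class; in the repo this would extend BaseScraper (assumption)."""

    court_type: str = ""

    @abstractmethod
    def iter_raw(self, start: date, end: date) -> Iterator[dict]:
        """Yield raw records from the source for the requested window."""

    @abstractmethod
    def parse(self, raw: dict) -> CourtDecision:
        """Convert one raw record into a CourtDecision."""

    def fetch(self, start: date, end: date) -> Iterator[CourtDecision]:
        """Parse raw records and keep only decisions dated inside [start, end]."""
        for raw in self.iter_raw(start, end):
            decision = self.parse(raw)
            if start <= decision.decision_date <= end:
                yield decision


class DemoScraper(CourtScraper):
    """In-memory example source with made-up records (no network access)."""

    court_type = "arbitration"

    def iter_raw(self, start: date, end: date) -> Iterator[dict]:
        yield {"case_number": "DEMO-1/2023", "date": "2023-05-10",
               "court": "Demo court", "url": "https://example.invalid/1"}
        yield {"case_number": "DEMO-2/2021", "date": "2021-01-01",
               "court": "Demo court", "url": "https://example.invalid/2"}

    def parse(self, raw: dict) -> CourtDecision:
        return CourtDecision(
            case_number=raw["case_number"],
            decision_date=date.fromisoformat(raw["date"]),
            court_name=raw["court"],
            source_url=raw["url"],
        )
```

Keeping the date filter in the base class means each concrete scraper (kad, sudrf, supreme) only has to implement source-specific fetching and parsing, and the 2022-2024 scope is enforced in one place.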

Database Schema

-- Court decisions metadata
CREATE TABLE court_decisions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    country_id VARCHAR(3) REFERENCES countries(id),
    case_number VARCHAR(255) UNIQUE NOT NULL,
    decision_date DATE NOT NULL,
    court_name TEXT NOT NULL,
    court_code VARCHAR(50),
    case_type VARCHAR(100),
    instance VARCHAR(50), -- first, appeal, cassation, supreme
    decision_text TEXT,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Link court decisions to articles they interpret
CREATE TABLE court_decision_article_references (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    court_decision_id UUID REFERENCES court_decisions(id) ON DELETE CASCADE,
    code_id VARCHAR(50), -- e.g., 'GK_RF', 'UK_RF'
    article_number VARCHAR(50), -- e.g., '123', '124.1'
    reference_context TEXT, -- excerpt showing how article was interpreted
    reference_type VARCHAR(50), -- 'cited', 'interpreted', 'applied'
    created_at TIMESTAMP DEFAULT NOW()
);

-- Court metadata reference table
CREATE TABLE courts (
    id VARCHAR(50) PRIMARY KEY, -- court code
    name TEXT NOT NULL,
    court_type VARCHAR(50), -- 'arbitration', 'general', 'supreme', 'constitutional'
    url TEXT,
    jurisdiction TEXT
);

Article Reference Extraction

Parse court decisions to extract article citations using regex patterns:

# Russian court decision citation patterns
patterns = [
    r'(?:ст\.?\s*|статья\s+)(\d+(?:\.\d+)*)\s+([А-ЯЁA-Z]{2,}(?:\s+[А-ЯЁA-Z]{2,})?(?:\s*РФ)?)',  # "ст. 15 ГК РФ"
    r'(?:п\.?\s*|пункт\s+)(\d+(?:\.\d+)*)\s*(?:ст\.?\s*|статья\s+)(\d+)',  # "п. 2 ст. 15"
]
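
As a rough illustration of how the two patterns above could be applied, the sketch below compiles them and returns structured references. ArticleRef, extract_article_refs, and the CODE_IDS mapping are hypothetical names; only the 'GK_RF' and 'UK_RF' identifiers come from the schema comments in this issue.

```python
import re
from typing import List, NamedTuple, Optional


class ArticleRef(NamedTuple):
    article: str              # e.g. "15"
    code: Optional[str]       # code abbreviation as written, e.g. "ГК РФ"
    code_id: Optional[str]    # normalized id, e.g. "GK_RF" (None if unknown)
    paragraph: Optional[str]  # e.g. "2" for "п. 2 ст. 15"


# Hypothetical mapping from abbreviations to the code_id values used in the schema.
CODE_IDS = {"ГК РФ": "GK_RF", "УК РФ": "UK_RF"}

# "ст. 15 ГК РФ"
ARTICLE_WITH_CODE = re.compile(
    r'(?:ст\.?\s*|статья\s+)(\d+(?:\.\d+)*)\s+'
    r'([А-ЯЁA-Z]{2,}(?:\s+[А-ЯЁA-Z]{2,})?(?:\s*РФ)?)'
)
# "п. 2 ст. 15"
PARAGRAPH_OF_ARTICLE = re.compile(
    r'(?:п\.?\s*|пункт\s+)(\d+(?:\.\d+)*)\s*(?:ст\.?\s*|статья\s+)(\d+)'
)


def extract_article_refs(text: str) -> List[ArticleRef]:
    """Return every citation found by either pattern, in pattern order."""
    refs = []
    for m in ARTICLE_WITH_CODE.finditer(text):
        code = " ".join(m.group(2).split())  # collapse internal whitespace
        refs.append(ArticleRef(m.group(1), code, CODE_IDS.get(code), None))
    for m in PARAGRAPH_OF_ARTICLE.finditer(text):
        refs.append(ArticleRef(m.group(2), None, None, m.group(1)))
    return refs
```

Two limitations to keep in mind: the patterns match only the nominative form "статья", so inflected forms such as "статьи" or "статьей" are missed, and a citation like "п. 2 ст. 15 ГК РФ" is reported by both patterns without being merged into one reference.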

MCP Tool

New MCP tool: get-court-decisions-for-article

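{
  name: "get-court-decisions-for-article",
  description: "Get court decisions that interpret or apply a specific legal article",
  // inputSchema is a JSON Schema object
  inputSchema: {
    type: "object",
    properties: {
      code_id: { type: "string" }, // e.g., "GK_RF"
      article_number: { type: "string" }, // e.g., "123"
      court_type: { type: "string" }, // optional filter
      limit: { type: "number", default: 10 }
    },
    required: ["code_id", "article_number"]
  }
}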
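
The lookup behind this tool could be a join across the three proposed tables. The sketch below is illustrative only: it uses Python's built-in sqlite3 with a trimmed-down copy of the schema so that it runs without a database server, whereas the project itself targets PostgreSQL. make_demo_db, the demo rows, and the Python function name are made up for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a simplified copy of the proposed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE courts (id TEXT PRIMARY KEY, name TEXT NOT NULL, court_type TEXT);
        CREATE TABLE court_decisions (
            id TEXT PRIMARY KEY, case_number TEXT UNIQUE NOT NULL,
            decision_date TEXT NOT NULL, court_name TEXT NOT NULL, court_code TEXT);
        CREATE TABLE court_decision_article_references (
            court_decision_id TEXT REFERENCES court_decisions(id),
            code_id TEXT, article_number TEXT, reference_type TEXT);
        INSERT INTO courts VALUES
            ('ARB1', 'Demo arbitration court', 'arbitration'),
            ('GEN1', 'Demo district court', 'general');
        INSERT INTO court_decisions VALUES
            ('d1', 'DEMO-1/2023', '2023-05-10', 'Demo arbitration court', 'ARB1'),
            ('d2', 'DEMO-2/2024', '2024-02-01', 'Demo district court', 'GEN1');
        INSERT INTO court_decision_article_references VALUES
            ('d1', 'GK_RF', '15', 'applied'),
            ('d2', 'GK_RF', '15', 'cited'),
            ('d2', 'UK_RF', '158', 'applied');
    """)
    return conn


def get_court_decisions_for_article(conn, code_id, article_number,
                                    court_type=None, limit=10):
    """Return (case_number, decision_date, court_name, reference_type), newest first."""
    sql = """
        SELECT d.case_number, d.decision_date, d.court_name, r.reference_type
        FROM court_decision_article_references AS r
        JOIN court_decisions AS d ON d.id = r.court_decision_id
        LEFT JOIN courts AS c ON c.id = d.court_code
        WHERE r.code_id = ? AND r.article_number = ?
    """
    params = [code_id, article_number]
    if court_type is not None:
        # The optional filter is applied only when the caller supplies it.
        sql += " AND c.court_type = ?"
        params.append(court_type)
    sql += " ORDER BY d.decision_date DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

All values are passed as bound parameters rather than interpolated into the SQL string, which matters here because code_id and article_number arrive from tool callers.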

Alternatives Considered

  1. Only pravo.gov.ru API - simpler but limited coverage
  2. Only commercial APIs (ConsultantPlus, Garant) - rejected because the project is restricted to official sources
  3. Scraping only - comprehensive but high maintenance burden
  4. Hybrid approach (selected) - starts with API, adds scraping for coverage

Official Sources Only

Constraint: Use only official government sources

  • ✅ pravo.gov.ru API (official legal publication portal)
  • ✅ kad.arbitr.ru (arbitration courts database)
  • ✅ sudrf.ru (general jurisdiction courts database)
  • ✅ vsrf.ru (Supreme Court official site)
  • ✅ ksrf.ru (Constitutional Court official site)
  • ❌ ConsultantPlus, Garant, Sudact (commercial - excluded)

Additional Context

Current Database: 157K+ legal documents with full consolidation history (2011-present)

Target: Court decisions for last 2 years (2022-2024) with:

  • Article reference links
  • Partial embeddings (summaries only) for semantic search
  • Metadata filtering by court, case type, date

Related Files:

  • scripts/country_modules/base/scraper.py - BaseScraper ABC to extend
  • scripts/country_modules/registry.py - Register new scrapers
  • docker/postgres/init.sql - Database schema
  • src/server.ts - MCP server tool registration

Implementation Ideas (Optional)

Implementation Timeline (~8 weeks)

Week 1: Foundation

  • Create database schema (court_decisions, court_decision_article_references, courts)
  • Extend PravoApiClient for court decision endpoints
  • Implement court decision parser (article reference extraction)

Week 2: Official API Integration

  • Fetch court decisions from pravo.gov.ru API (last 2 years)
  • Parse and store in database
  • Extract article references
  • Generate partial embeddings (summaries only)

Week 3-4: Arbitration Courts

  • Implement kad_scraper.py (arbitration courts)
  • Fetch last 2 years of decisions
  • Parse and merge with existing data

Week 5-6: General Jurisdiction

  • Implement sudrf_scraper.py (general courts)
  • Fetch last 2 years of decisions
  • Parse and merge with existing data

Week 7: Supreme/Constitutional Courts

  • Implement supreme_scraper.py (vsrf.ru / ksrf.ru)
  • Fetch last 2 years of high-level decisions

Week 8: MCP Tool & Search

  • Implement get-court-decisions-for-article tool
  • Add semantic search for summaries
  • Test end-to-end functionality

Reference Scrapers

Leverage existing GitHub implementations as reference:

  • yuglebov/kad_arbitr_ru - KAD scraper
  • tochno-st/sudrfscraper - SUDRF scraper

Add respectful rate limiting (10-30s delays) per AI_WORKFLOW.md guidelines.
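
One possible shape for the delay logic, as a sketch: a small throttle that enforces a random 10-30s gap between consecutive requests. PoliteThrottle and its parameters are hypothetical names; the sleep and clock functions are injectable so the behavior can be tested without actually waiting.

```python
import random
import time


class PoliteThrottle:
    """Enforce a random gap (default 10-30 s) between consecutive requests."""

    def __init__(self, min_delay=10.0, max_delay=30.0,
                 sleep=time.sleep, clock=time.monotonic):
        if min_delay < 0 or max_delay < min_delay:
            raise ValueError("require 0 <= min_delay <= max_delay")
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._sleep = sleep
        self._clock = clock
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Call before each request; blocks until the chosen gap has elapsed."""
        if self._last is not None:
            target_gap = random.uniform(self.min_delay, self.max_delay)
            remaining = target_gap - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

A scraper would call `throttle.wait()` immediately before each HTTP request. Time already spent parsing the previous response counts toward the gap, so the throttle never sleeps longer than needed.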

Priority

HIGH - Valuable context for AI and users, official sources available, architecture ready

Success Criteria

  1. Database schema created for court decisions and article references
  2. Scrapers implemented for all 4 official court sources
  3. Court decisions fetched for last 2 years (2022-2024)
  4. Article references extracted and linked
  5. MCP tool functional for querying decisions by article
  6. Partial embeddings generated for semantic search
  7. End-to-end test: Query court decisions for article 15 of the Civil Code (GK_RF)
