Feature Description
Add comprehensive court decision fetching capabilities to Law7, covering all Russian court types (arbitration, general jurisdiction, supreme/constitutional) for the last 2 years (2022-2024).
Problem Statement
Law7 currently has no court decisions in the database. Court decisions show how laws are actually interpreted and applied in practice, which is invaluable for:
- AI Assistance: Better understanding of how legal articles work in real cases
- Legal Research: Finding precedents for specific articles
- Article Context: Seeing practical applications of legal codes
- Historical Tracking: How court interpretations change over time
Related Work
This feature significantly expands Phase 7C (Issue #22), whose current scope is limited to:
- Supreme Court + Constitutional Court (~1K-2K docs)
This issue adds:
- Arbitration courts (kad.arbitr.ru) - economic disputes
- General jurisdiction (sudrf.ru) - civil, criminal, administrative cases
- Supreme/Constitutional courts (vsrf.ru, ksrf.ru) - high-level precedents
- Time scope: Last 2 years (2022-2024) instead of all-time
Proposed Solution
Hybrid approach (balance reliability and coverage):
- Start with pravo.gov.ru API (official, stable) - quick wins
- Add scraping for comprehensive coverage:
- kad.arbitr.ru (arbitration courts)
- sudrf.ru (general jurisdiction)
- vsrf.ru / ksrf.ru (supreme/constitutional)
Architecture
Follow the established country_modules pattern from Phase 7A:
scripts/country_modules/russia/scrapers/
├── court_scraper.py # Base court scraper (extend BaseScraper)
├── kad_scraper.py # Arbitration courts scraper
├── sudrf_scraper.py # General jurisdiction scraper
└── supreme_scraper.py # Supreme + Constitutional courts
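A minimal sketch of how the shared base in court_scraper.py might look. The real BaseScraper interface in scripts/country_modules/base/scraper.py is not shown in this issue, so the `BaseScraper` below is a hypothetical stand-in; `CourtDecision`, `fetch_decisions`, `in_scope`, and the `"RUS"` country id are likewise illustrative assumptions, not existing project code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import Iterator, Optional


@dataclass
class CourtDecision:
    """Mirrors the main columns of the proposed court_decisions table."""
    case_number: str
    decision_date: date
    court_name: str
    court_code: Optional[str] = None
    case_type: Optional[str] = None
    instance: Optional[str] = None  # first, appeal, cassation, supreme
    decision_text: Optional[str] = None
    source_url: Optional[str] = None


class BaseScraper(ABC):
    """Hypothetical stand-in for the project's BaseScraper ABC."""
    country_id: str = ""


class CourtScraper(BaseScraper):
    """Behaviour shared by all court scrapers (assumed design)."""
    country_id = "RUS"  # assumed value for the VARCHAR(3) country_id column
    court_type: str = ""

    @abstractmethod
    def fetch_decisions(self, start: date, end: date) -> Iterator[CourtDecision]:
        """Yield decisions dated within [start, end]."""

    def in_scope(self, decision: CourtDecision, start: date, end: date) -> bool:
        """Return True if the decision falls inside the requested window."""
        return start <= decision.decision_date <= end


class KadScraper(CourtScraper):
    """Placeholder for kad_scraper.py; performs no network access here."""
    court_type = "arbitration"

    def fetch_decisions(self, start: date, end: date) -> Iterator[CourtDecision]:
        # A real implementation would page through kad.arbitr.ru results.
        return iter(())
```

Keeping the date window in the method signature would let every scraper share the same "last 2 years" scope without hard-coding it per source.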
Database Schema
-- Court decisions metadata
CREATE TABLE court_decisions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    country_id VARCHAR(3) REFERENCES countries(id),
    case_number VARCHAR(255) UNIQUE NOT NULL,
    decision_date DATE NOT NULL,
    court_name TEXT NOT NULL,
    court_code VARCHAR(50),
    case_type VARCHAR(100),
    instance VARCHAR(50), -- first, appeal, cassation, supreme
    decision_text TEXT,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
-- Link court decisions to articles they interpret
CREATE TABLE court_decision_article_references (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    court_decision_id UUID REFERENCES court_decisions(id) ON DELETE CASCADE,
    code_id VARCHAR(50), -- e.g., 'GK_RF', 'UK_RF'
    article_number VARCHAR(50), -- e.g., '123', '124.1'
    reference_context TEXT, -- excerpt showing how article was interpreted
    reference_type VARCHAR(50), -- 'cited', 'interpreted', 'applied'
    created_at TIMESTAMP DEFAULT NOW()
);
-- Court metadata reference table
CREATE TABLE courts (
    id VARCHAR(50) PRIMARY KEY, -- court code
    name TEXT NOT NULL,
    court_type VARCHAR(50), -- 'arbitration', 'general', 'supreme', 'constitutional'
    url TEXT,
    jurisdiction TEXT
);
Article Reference Extraction
Parse court decisions to extract article citations using regex patterns:
# Russian court decision citation patterns
patterns = [
    r'(?:ст\.?\s*|статья\s+)(\d+(?:\.\d+)*)\s+([А-ЯЁA-Z]{2,}(?:\s+[А-ЯЁA-Z]{2,})?(?:\s*РФ)?)', # "ст. 15 ГК РФ"
    r'(?:п\.?\s*|пункт\s+)(\d+(?:\.\d+)*)\s*(?:ст\.?\s*|статья\s+)(\d+)', # "п. 2 ст. 15"
]
MCP Tool
New MCP tool: get-court-decisions-for-article
{
  name: "get-court-decisions-for-article",
  description: "Get court decisions that interpret or apply a specific legal article",
  inputSchema: {
    code_id: "string", // e.g., "GK_RF"
    article_number: "string", // e.g., "123"
    court_type?: "string", // optional filter
    limit?: "number" // optional, default 10
  }
}
Alternatives Considered
- Only pravo.gov.ru API - simpler but limited coverage
- Only commercial APIs (ConsultantPlus, Garant) - rejected, official sources only
- Scraping only - comprehensive but high maintenance burden
- Hybrid approach (selected) - starts with API, adds scraping for coverage
Official Sources Only
Constraint: Use only official government sources
- ✅ pravo.gov.ru API (official legal publication portal)
- ✅ kad.arbitr.ru (arbitration courts database)
- ✅ sudrf.ru (general jurisdiction courts database)
- ✅ vsrf.ru (Supreme Court official site)
- ✅ ksrf.ru (Constitutional Court official site)
- ❌ ConsultantPlus, Garant, Sudact (commercial - excluded)
Additional Context
Current Database: 157K+ legal documents with full consolidation history (2011-present)
Target: Court decisions for last 2 years (2022-2024) with:
- Article reference links
- Partial embeddings (summaries only) for semantic search
- Metadata filtering by court, case type, date
Related Files:
- scripts/country_modules/base/scraper.py - BaseScraper ABC to extend
- scripts/country_modules/registry.py - Register new scrapers
- docker/postgres/init.sql - Database schema
- src/server.ts - MCP server tool registration
Implementation Ideas (Optional)
Implementation Timeline (~8 weeks)
Week 1: Foundation
- Create database schema (court_decisions, court_decision_article_references, courts)
- Extend PravoApiClient for court decision endpoints
- Implement court decision parser (article reference extraction)
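The parser step above could be sketched as follows, reusing the first citation pattern from the Article Reference Extraction section. The function name `extract_article_references`, the `CODE_IDS` mapping from Russian abbreviations to code ids, and the default `reference_type` of `'cited'` are assumptions made for illustration; a real mapping would need to cover every code in the database.

```python
import re
from typing import Dict, List

# First pattern from the proposal: matches citations such as "ст. 15 ГК РФ".
ARTICLE_RE = re.compile(
    r'(?:ст\.?\s*|статья\s+)(\d+(?:\.\d+)*)\s+'
    r'([А-ЯЁA-Z]{2,}(?:\s+[А-ЯЁA-Z]{2,})?(?:\s*РФ)?)'
)

# Assumed mapping from code abbreviation to code_id (illustrative, incomplete).
CODE_IDS = {"ГК": "GK_RF", "УК": "UK_RF"}


def extract_article_references(text: str, context_chars: int = 60) -> List[Dict[str, str]]:
    """Return one dict per recognised citation, shaped like a row of
    court_decision_article_references (without the ids)."""
    refs = []
    for match in ARTICLE_RE.finditer(text):
        # Use the first token of the abbreviation ("ГК" from "ГК РФ").
        abbreviation = match.group(2).split()[0]
        code_id = CODE_IDS.get(abbreviation)
        if code_id is None:
            continue  # unknown code: skip rather than guess
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        refs.append({
            "code_id": code_id,
            "article_number": match.group(1),
            "reference_context": text[start:end].strip(),
            "reference_type": "cited",  # assumed default; refinement is out of scope
        })
    return refs


if __name__ == "__main__":
    sample = "Суд, руководствуясь ст. 15 ГК РФ и статья 124.1 УК РФ, решил."
    for ref in extract_article_references(sample):
        print(ref["code_id"], ref["article_number"])
```

The pattern is kept case-sensitive on purpose: with case-insensitive matching, the uppercase abbreviation class would also accept ordinary lowercase words that follow an article number.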
Week 2: Official API Integration
- Fetch court decisions from pravo.gov.ru API (last 2 years)
- Parse and store in database
- Extract article references
- Generate partial embeddings (summaries only)
Week 3-4: Arbitration Courts
- Implement kad_scraper.py (arbitration courts)
- Fetch last 2 years of decisions
- Parse and merge with existing data
Week 5-6: General Jurisdiction
- Implement sudrf_scraper.py (general courts)
- Fetch last 2 years of decisions
- Parse and merge with existing data
Week 7: Supreme/Constitutional Courts
- Implement vsrf/ksrf scrapers
- Fetch last 2 years of high-level decisions
Week 8: MCP Tool & Search
- Implement get-court-decisions-for-article tool
- Add semantic search for summaries
- Test end-to-end functionality
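As a rough illustration of the lookup behind get-court-decisions-for-article, the sketch below builds a parameterised SQL query against the proposed tables. It is written in Python for consistency with the scraper code, although the tool itself would be registered in src/server.ts. The function name `build_decisions_query`, the `%s` placeholder style, and the join of `court_decisions.court_code` to `courts.id` are assumptions; the schema does not declare that foreign key.

```python
from typing import List, Optional, Tuple


def build_decisions_query(
    code_id: str,
    article_number: str,
    court_type: Optional[str] = None,
    limit: int = 10,
) -> Tuple[str, List[object]]:
    """Return (sql, params) selecting decisions that reference one article."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    lines = [
        "SELECT d.case_number, d.decision_date, d.court_name, d.source_url,",
        "       r.reference_type, r.reference_context",
        "FROM court_decision_article_references r",
        "JOIN court_decisions d ON d.id = r.court_decision_id",
    ]
    conditions = ["r.code_id = %s", "r.article_number = %s"]
    params: List[object] = [code_id, article_number]

    if court_type is not None:
        # Assumed relationship: court_decisions.court_code holds courts.id.
        lines.append("JOIN courts c ON c.id = d.court_code")
        conditions.append("c.court_type = %s")
        params.append(court_type)

    lines.append("WHERE " + " AND ".join(conditions))
    lines.append("ORDER BY d.decision_date DESC")
    lines.append("LIMIT %s")
    params.append(limit)
    return "\n".join(lines), params


if __name__ == "__main__":
    sql, params = build_decisions_query("GK_RF", "15", court_type="arbitration")
    print(sql)
    print(params)
```

Passing values as parameters rather than formatting them into the SQL string keeps user-supplied article numbers out of the query text.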
Reference Scrapers
Leverage existing GitHub implementations as reference:
- yuglebov/kad_arbitr_ru - KAD scraper
- tochno-st/sudrfscraper - SUDRF scraper
Add respectful rate limiting (10-30s delays) per AI_WORKFLOW.md guidelines.
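One way the 10-30s delay could be enforced is a small helper that each scraper calls before every request. This is only a sketch: the class name `PoliteDelay` and its injectable `sleep` and `clock` parameters are hypothetical and do not come from AI_WORKFLOW.md.

```python
import random
import time
from typing import Callable, Optional


class PoliteDelay:
    """Enforce a random pause between consecutive requests to one site."""

    def __init__(
        self,
        min_seconds: float = 10.0,
        max_seconds: float = 30.0,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not 0 <= min_seconds <= max_seconds:
            raise ValueError("require 0 <= min_seconds <= max_seconds")
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self._sleep = sleep
        self._clock = clock
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Block until a random delay has passed since the previous call.

        Returns the number of seconds actually slept (0.0 on the first call).
        """
        pause = 0.0
        if self._last_request is not None:
            target = random.uniform(self.min_seconds, self.max_seconds)
            elapsed = self._clock() - self._last_request
            pause = max(0.0, target - elapsed)
            if pause > 0:
                self._sleep(pause)
        self._last_request = self._clock()
        return pause
```

A scraper would call `delay.wait()` immediately before each HTTP request. Randomising the pause within the range avoids a fixed request rhythm, and time already spent parsing the previous page counts toward the delay.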
Reference
- Related: Issue #22 - Phase 7C: Priority 1 Enhancements - Regional, Courts, Ministry Data (Supreme + Constitutional courts only)
- Expands: Phase 7C scope to all court types
- Follows: AI_WORKFLOW.md guidelines (official sources only, batch operations, bias mitigation)
- Uses: country_modules architecture from Phase 7A (Issues #15-#21, from "[Phase 7A] Create country module registry" to "[Phase 7A] Integration testing and validation")
Priority
HIGH - Valuable context for AI and users, official sources available, architecture ready
Success Criteria
- Database schema created for court decisions and article references
- Scrapers implemented for all 4 official court sources
- Court decisions fetched for last 2 years (2022-2024)
- Article references extracted and linked
- MCP tool functional for querying decisions by article
- Partial embeddings generated for semantic search
- End-to-end test: Query court decisions for article 15 of Civil Code