AI-powered legal document classification, research, and generation tool for German asylum law.
Rechtmaschine assists German asylum lawyers by automatically classifying legal documents, conducting intelligent web research, and generating structured legal drafts. The system leverages multiple AI models and features automatic document segmentation for complex case files (Akten), OCR text extraction, automated anonymization, and direct integration with j-lawyer practice management software.
Live at: https://rechtmaschine.de
Fully functional production system with:
- Gemini-powered document classification with structured output validation
- Automatic PDF segmentation for Akte files (extracts Anhörung and Bescheid documents)
- Web research with Google Search grounding + asyl.net legal database integration
- Saved sources management with automatic PDF downloads
- Document generation via Claude Sonnet 4.5 with Files API
- OCR text extraction via external microservice
- Automated anonymization with NER-based plaintiff identification
- j-lawyer template integration for direct export
- Real-time UI updates via Server-Sent Events (SSE)
- Frontend: Embedded HTML/CSS/JS with real-time SSE updates (Svelte migration planned)
- Backend: FastAPI (Python 3.11) with modular endpoint architecture running in Docker
- Database: PostgreSQL (latest alpine) with LISTEN/NOTIFY for real-time events
- Reverse Proxy: Caddy (HTTPS with automatic certificates)
- Deployment: Docker Compose on self-hosted server
- Google Gemini 2.5 Flash: Document classification with structured JSON output
- Google Gemini 2.5 Flash: Automatic PDF segmentation for Akte files
- Google Gemini 2.5 Flash: Web research with Google Search grounding
- Anthropic Claude Sonnet 4.5: Structured document generation via Files API
- OCR Service (desktop:8004 via Tailscale): PaddleOCR-based text extraction for scanned documents
- Anonymization Service (desktop:8004 via Tailscale): NER-based plaintiff identification and text anonymization
- Playwright: Web scraping for asyl.net search and automatic PDF detection
- Intelligent Classification: Automatically categorizes uploaded PDFs into:
- Anhörung (hearing protocols)
- Bescheid (administrative decisions)
- Akte (complete BAMF case files)
- Rechtsprechung (case law)
- Sonstiges (other documents)
- Automatic Segmentation: Akte files are automatically split into individual Anhörung and Bescheid documents
- OCR Text Extraction: Extract text from scanned PDFs using PaddleOCR
- Automated Anonymization: NER-based identification and replacement of plaintiff names with "Kläger/Klägerin"
- Real-time Updates: Instant UI updates via SSE (no polling required)
- Dual Web Research:
- Gemini with Google Search grounding (official sources, courts, government)
- asyl.net database scraping with keyword suggestions
- Automatic PDF Detection: First 10 sources analyzed for PDF availability
- Saved Sources: Organize research with automatic PDF downloads and status tracking
- Download Management: Background PDF downloads with real-time status updates
- Structured Drafts: Generate Klagebegründung or Schriftsatz using Claude Sonnet 4.5
- Multi-document Context: Reference multiple uploaded PDFs (Anhörung, Bescheid, Rechtsprechung, saved sources)
- Citation Analysis: Automatic detection of citations with quality warnings
- j-lawyer Integration:
- Direct export to ODT templates
- List available templates from configured folder
- Populate placeholders with generated text
- Custom file naming support
- PostgreSQL LISTEN/NOTIFY: Dual channels for document and source updates
- Unified SSE Stream: Single connection for all entity types
- Zero Polling: All updates delivered via push notifications
- Instant UI Updates: Sub-100ms latency for all operations
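
The push model described above (database notifications fanned out over one SSE stream) can be illustrated with a minimal, stdlib-only sketch. It is not the project's actual implementation: the `Broadcaster` class, the `format_sse` helper, and the channel name `documents_updates` are hypothetical, and the database LISTEN callback that would call `publish` is left out.

```python
import asyncio
import json


def format_sse(channel: str, payload: dict) -> str:
    """Encode one notification as a Server-Sent Events frame."""
    data = json.dumps(payload, ensure_ascii=False)
    # An SSE frame is a set of "field: value" lines terminated by a blank line.
    return f"event: {channel}\ndata: {data}\n\n"


class Broadcaster:
    """Fans notifications out to every connected SSE client (hypothetical helper)."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        # Each connected browser gets its own queue of pending frames.
        queue: asyncio.Queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    def publish(self, channel: str, payload: dict) -> None:
        # Assumption: a PostgreSQL LISTEN callback would call this
        # once per NOTIFY received on either channel.
        frame = format_sse(channel, payload)
        for queue in self._clients:
            queue.put_nowait(frame)
```

A streaming endpoint would then call `subscribe()` once per connection and yield frames from the returned queue, so document and source updates share a single stream and the client never has to poll.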
PostgreSQL Tables:
- documents - Classified legal documents with category, confidence, and file metadata
- research_sources - Saved legal research sources with PDF download tracking
- processed_documents - Text extraction and anonymization results linked to documents
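
As a rough illustration of how these three tables relate, the sketch below models them as Python dataclasses. Only the table names, the five categories, and the presence of category, confidence, and download tracking come from this README; every field name, the status string, and the 0 to 1 confidence range are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The five classification categories listed in this README.
CATEGORIES = ("Anhörung", "Bescheid", "Akte", "Rechtsprechung", "Sonstiges")


@dataclass
class Document:
    """One row of `documents` (field names are hypothetical)."""
    id: int
    filename: str
    category: str
    confidence: float

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        # Assumption: confidence is stored as a fraction between 0 and 1.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class ResearchSource:
    """One row of `research_sources` (field names are hypothetical)."""
    id: int
    url: str
    title: str
    download_status: str = "pending"
    pdf_path: Optional[str] = None


@dataclass
class ProcessedDocument:
    """One row of `processed_documents`, linked to a Document by id."""
    document_id: int
    extracted_text: Optional[str] = None
    anonymized_text: Optional[str] = None
```

Rejecting unknown categories at construction time mirrors the idea of validating the classifier's structured output before anything is stored.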
- POST /classify - Upload and classify PDF (auto-segments Akte files)
- GET /documents - Retrieve all documents grouped by category
- GET /documents/stream - SSE stream for real-time updates
- DELETE /documents/{filename} - Delete document and associated files

- POST /documents/{document_id}/ocr - Extract text via OCR service
- POST /documents/{document_id}/anonymize - Anonymize document with NER
- POST /anonymize-file - Anonymize uploaded file without storing

- POST /research - Web research (Gemini + asyl.net)
- POST /sources - Save source to collection
- GET /sources - List all saved sources
- GET /sources/download/{source_id} - Download saved PDF
- DELETE /sources/{source_id} - Delete specific source

- POST /generate - Generate legal draft with Claude Files API
- GET /jlawyer/templates - List available ODT templates
- POST /send-to-jlawyer - Populate j-lawyer template with generated text

- GET / - Main HTML interface
- GET /health - Health check
- DELETE /reset - Clear all data (documents, sources, processed data)
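
A client could talk to these endpoints along the following lines. This is a sketch using only the standard library: the paths come from the list above, but the JSON body with a `query` field is an assumption, since request schemas are not documented here, and both helper functions are hypothetical.

```python
import json
import urllib.request


def build_research_request(base_url: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /research request."""
    # Assumption: the endpoint accepts a JSON body with a "query" field.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/research",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(base_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET /health request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/health",
        method="GET",
    )
```

Passing either request to `urllib.request.urlopen(...)` would send it. Building the request separately keeps the example free of network access and makes the URL and payload easy to inspect.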
cd /var/opt/docker/rechtmaschine/app
docker compose up -d
docker compose logs -f app

Create .env file with:
DATABASE_URL=postgresql://rechtmaschine:password@postgres:5432/rechtmaschine_db
GOOGLE_API_KEY=your_gemini_api_key
ANTHROPIC_API_KEY=your_claude_api_key
JLAWYER_BASE_URL=http://jlawyer-server:8080
JLAWYER_USERNAME=username
JLAWYER_PASSWORD=password
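
One way the backend might read these settings at startup is sketched below. The variable names are the ones listed above; the `load_settings` function, and the decision to treat all six as required, are assumptions made for illustration.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
    "JLAWYER_BASE_URL",
    "JLAWYER_USERNAME",
    "JLAWYER_PASSWORD",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, failing fast if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup with the full list of missing names tends to be easier to debug than an error surfacing later, on the first API call that needs a key.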
docker exec -it rechtmaschine-postgres psql -U rechtmaschine -d rechtmaschine_db
docker compose restart app

Note: Hot reload doesn't work reliably with Docker volumes. Manual restart required after code changes.
See CLAUDE.md for comprehensive technical documentation including:
- Detailed implementation patterns
- SSE architecture and design decisions
- External service integration
- Module documentation
- Known issues and limitations
- Future development roadmap
- All processing happens on self-hosted infrastructure
- OCR and anonymization run on isolated Tailscale-connected services
- No client data sent to external services except AI API providers (Google, Anthropic)
- Automatic anonymization of plaintiff names before draft generation
- PostgreSQL for secure persistent storage
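
The anonymization step listed above replaces plaintiff names with "Kläger/Klägerin". The sketch below shows only that replacement, assuming the names have already been identified (in the real system by the NER-based service). The `anonymize` function and the sample name are made up for illustration.

```python
import re


def anonymize(
    text: str,
    plaintiff_names: list[str],
    replacement: str = "Kläger/Klägerin",
) -> str:
    """Replace every occurrence of the given names with a neutral label."""
    # Longest names first, so "Ali Rezai" is replaced as a whole
    # before the shorter "Rezai" gets a chance to match inside it.
    for name in sorted(plaintiff_names, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        text = pattern.sub(replacement, text)
    return text
```

For example, `anonymize("Herr Ali Rezai erklärte, dass Rezai ausgereist sei.", ["Rezai", "Ali Rezai"])` replaces both the full name and the later surname-only mention with the label.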
Developed and deployed on self-hosted infrastructure for maximum security and control.