Rechtmaschine

AI-powered legal document classification, research, and generation tool for German asylum law.

Overview

Rechtmaschine assists German asylum lawyers by automatically classifying legal documents, conducting intelligent web research, and generating structured legal drafts. The system leverages multiple AI models and features automatic document segmentation for complex case files (Akten), OCR text extraction, automated anonymization, and direct integration with j-lawyer practice management software.

Live at: https://rechtmaschine.de

Current Status

Fully functional production system with:

Gemini-powered document classification with structured output validation
Automatic PDF segmentation for Akte files (extracts Anhörung and Bescheid documents)
Web research with Google Search grounding + asyl.net legal database integration
Saved sources management with automatic PDF downloads
Document generation via Claude Sonnet 4.5 with Files API
OCR text extraction via external microservice
Automated anonymization with NER-based plaintiff identification
j-lawyer template integration for direct export
Real-time UI updates via Server-Sent Events (SSE)

Tech Stack

Core Infrastructure

Frontend: Embedded HTML/CSS/JS with real-time SSE updates (Svelte migration planned)
Backend: FastAPI (Python 3.11) with modular endpoint architecture running in Docker
Database: PostgreSQL (latest Alpine image) with LISTEN/NOTIFY for real-time events
Reverse Proxy: Caddy (HTTPS with automatic certificates)
Deployment: Docker Compose on self-hosted server

AI Models

Google Gemini 2.5 Flash: Document classification with structured JSON output
Google Gemini 2.5 Flash: Automatic PDF segmentation for Akte files
Google Gemini 2.5 Flash: Web research with Google Search grounding
Anthropic Claude Sonnet 4.5: Structured document generation via Files API

External Services

OCR Service (desktop:8004 via Tailscale): PaddleOCR-based text extraction for scanned documents
Anonymization Service (desktop:8004 via Tailscale): NER-based plaintiff identification and text anonymization
Playwright: Web scraping for asyl.net search and automatic PDF detection

Features

Document Management

Intelligent Classification: Automatically categorizes uploaded PDFs into:
- Anhörung (hearing protocols)
- Bescheid (administrative decisions)
- Akte (complete BAMF case files)
- Rechtsprechung (case law)
- Sonstiges (other documents)
Automatic Segmentation: Akte files are automatically split into individual Anhörung and Bescheid documents
OCR Text Extraction: Extract text from scanned PDFs using PaddleOCR
Automated Anonymization: NER-based identification and replacement of plaintiff names with "Kläger/Klägerin"
Real-time Updates: Instant UI updates via SSE (no polling required)
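
The classification step above relies on structured output validation. A minimal sketch of what such a check might look like, using only the Python standard library. The JSON field names (category, confidence) follow the documents table description in this README, but the exact response shape and the 0 to 1 confidence range are assumptions, and parse_classification is a hypothetical helper rather than the project's actual code:

```python
import json
from dataclasses import dataclass

# Categories exactly as listed in this README.
CATEGORIES = frozenset(
    {"Anhörung", "Bescheid", "Akte", "Rechtsprechung", "Sonstiges"}
)


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def parse_classification(raw: str) -> Classification:
    """Validate a model response (assumed shape); raise ValueError if it is off."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    category = data.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1 (assumed range)")

    return Classification(category, float(confidence))
```

Rejecting unknown categories outright, instead of silently mapping them to Sonstiges, keeps model errors visible; whether the production system does the same is not stated here.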

Research & Sources

Dual Web Research:
- Gemini with Google Search grounding (official sources, courts, government)
- asyl.net database scraping with keyword suggestions
Automatic PDF Detection: First 10 sources analyzed for PDF availability
Saved Sources: Organize research with automatic PDF downloads and status tracking
Download Management: Background PDF downloads with real-time status updates
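
To illustrate the "first 10 sources" rule for PDF detection, here is a simplified sketch. It swaps the Playwright-based detection mentioned under External Services for a plain URL-suffix heuristic, so it is a stand-in for the idea, not the project's method. The function names are hypothetical:

```python
from urllib.parse import urlparse

# The README states that the first 10 sources are analyzed.
MAX_CHECKED = 10


def looks_like_pdf(url: str) -> bool:
    """Heuristic only: true if the URL path ends in .pdf (case-insensitive)."""
    return urlparse(url).path.lower().endswith(".pdf")


def flag_pdf_sources(urls: list[str]) -> list[tuple[str, bool]]:
    """Check only the first MAX_CHECKED URLs and pair each with a PDF flag."""
    return [(url, looks_like_pdf(url)) for url in urls[:MAX_CHECKED]]
```

A suffix check misses PDFs served from extensionless URLs, which is presumably one reason a browser-based approach is used in practice.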

Document Generation

Structured Drafts: Generate Klagebegründung or Schriftsatz using Claude Sonnet 4.5
Multi-document Context: Reference multiple uploaded PDFs (Anhörung, Bescheid, Rechtsprechung, saved sources)
Citation Analysis: Automatic detection of citations with quality warnings
j-lawyer Integration:
- Direct export to ODT templates
- List available templates from configured folder
- Populate placeholders with generated text
- Custom file naming support
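
Populating template placeholders with generated text can be sketched as follows. The `{{NAME}}` placeholder syntax and the fill_placeholders helper are assumptions made for illustration; the actual j-lawyer integration may work differently. Because ODT content is stored as XML, the sketch escapes inserted text so that characters like & or < cannot break the document:

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax: {{NAME}} with word characters only.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(xml_text: str, values: dict[str, str]) -> str:
    """Replace known placeholders with XML-escaped values; keep unknown ones."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            return match.group(0)  # leave unknown placeholders untouched
        return escape(values[key])

    return PLACEHOLDER.sub(substitute, xml_text)
```

Leaving unknown placeholders in place makes a missing value easy to spot in the exported document.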

Real-time Architecture

PostgreSQL LISTEN/NOTIFY: Dual channels for document and source updates
Unified SSE Stream: Single connection for all entity types
Zero Polling: All updates delivered via push notifications
Instant UI Updates: Change notifications typically reach the UI in under 100 ms
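
The path from a PostgreSQL notification to the unified SSE stream can be sketched roughly like this. The channel names and event names below are hypothetical, and the payload is assumed to be JSON; only the SSE wire format itself (an event line, a data line, and a terminating blank line) is standard:

```python
import json
from typing import Optional

# Hypothetical mapping from the two NOTIFY channels to SSE event names.
CHANNEL_TO_EVENT = {
    "documents_updates": "document",
    "sources_updates": "source",
}


def format_sse(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events frame.

    json.dumps emits a single line here, so one data: field is enough.
    """
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


def notification_to_sse(channel: str, raw_payload: str) -> Optional[str]:
    """Turn a NOTIFY payload (assumed JSON) into an SSE frame, or None."""
    event = CHANNEL_TO_EVENT.get(channel)
    if event is None:
        return None  # ignore channels this stream does not serve
    return format_sse(event, json.loads(raw_payload))
```

Tagging each frame with an event name is one way a single connection can carry updates for several entity types, since the browser can register a separate listener per event name.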

Database Schema

PostgreSQL Tables:

documents - Classified legal documents with category, confidence, and file metadata
research_sources - Saved legal research sources with PDF download tracking
processed_documents - Text extraction and anonymization results linked to documents

API Endpoints

Document Operations

POST /classify - Upload and classify PDF (auto-segments Akte files)
GET /documents - Retrieve all documents grouped by category
GET /documents/stream - SSE stream for real-time updates
DELETE /documents/{filename} - Delete document and associated files

Document Processing

POST /documents/{document_id}/ocr - Extract text via OCR service
POST /documents/{document_id}/anonymize - Anonymize document with NER
POST /anonymize-file - Anonymize uploaded file without storing
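
As a rough illustration of the replacement step behind the anonymization endpoints, the sketch below assumes that the NER stage has already produced a list of plaintiff names and only shows how they might be swapped for "Kläger/Klägerin". The anonymize function is hypothetical and does not reflect the service's actual interface:

```python
import re


def anonymize(text: str, plaintiff_names: list[str],
              replacement: str = "Kläger/Klägerin") -> str:
    """Replace each given name with the replacement, matching whole words only."""
    # Longest names first, so "Ali Hassan" wins over "Ali" at the same position.
    names = sorted({n for n in plaintiff_names if n.strip()},
                   key=len, reverse=True)
    if not names:
        return text
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookarounds prevent matches inside longer words (e.g. "Alina").
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return pattern.sub(lambda _match: replacement, text)
```

Exact string matching does not cover inflected or misspelled variants of a name, so a real pipeline would likely need additional handling for those cases.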

Research & Sources

POST /research - Web research (Gemini + asyl.net)
POST /sources - Save source to collection
GET /sources - List all saved sources
GET /sources/download/{source_id} - Download saved PDF
DELETE /sources/{source_id} - Delete specific source

Document Generation

POST /generate - Generate legal draft with Claude Files API
GET /jlawyer/templates - List available ODT templates
POST /send-to-jlawyer - Populate j-lawyer template with generated text

System

GET / - Main HTML interface
GET /health - Health check
DELETE /reset - Clear all data (documents, sources, processed data)

Development

Quick Start

cd /var/opt/docker/rechtmaschine/app
docker compose up -d
docker compose logs -f app

Environment Variables

Create a .env file with:

DATABASE_URL=postgresql://rechtmaschine:password@postgres:5432/rechtmaschine_db
GOOGLE_API_KEY=your_gemini_api_key
ANTHROPIC_API_KEY=your_claude_api_key
JLAWYER_BASE_URL=http://jlawyer-server:8080
JLAWYER_USERNAME=username
JLAWYER_PASSWORD=password
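
A startup check for these variables might look like the sketch below. It treats all six variables as required, which is an assumption, and load_settings is a hypothetical helper, not a function from the codebase:

```python
import os
from typing import Mapping, Optional

# Variable names taken from the .env example above; all assumed mandatory.
REQUIRED = (
    "DATABASE_URL",
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
    "JLAWYER_BASE_URL",
    "JLAWYER_USERNAME",
    "JLAWYER_PASSWORD",
)


def load_settings(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, or fail fast listing every missing one."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED}
```

Reporting all missing variables at once avoids a fix-one-restart-repeat cycle, which matters when every change needs a manual container restart.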

Database Access

docker exec -it rechtmaschine-postgres psql -U rechtmaschine -d rechtmaschine_db

Restart After Changes

docker compose restart app

Note: Hot reload doesn't work reliably with Docker volumes, so restart the app container manually after code changes.

Architecture Details

See CLAUDE.md for comprehensive technical documentation including:

Detailed implementation patterns
SSE architecture and design decisions
External service integration
Module documentation
Known issues and limitations
Future development roadmap

Security & Privacy

All processing happens on self-hosted infrastructure
OCR and anonymization run on isolated Tailscale-connected services
No client data sent to external services except AI API providers (Google, Anthropic)
Automatic anonymization of plaintiff identities before draft generation
PostgreSQL for secure persistent storage

Developed and deployed on self-hosted infrastructure for maximum security and control.
