Gradio-based UI for comparing documents and spotting meaningful semantic matches/mismatches.
🎥 Demo video: link
📊 Full demo: Gradio live link
We present SLeDoC, a system for pairwise, span-aware semantic document comparison that moves beyond token- or character-level matching to semantic judgments. SLeDoC combines information-retrieval (IR) techniques with state-of-the-art large language models (LLMs).
SLeDoC provides an end-to-end flow to compare two documents, align their claims, and surface semantic mismatches with short, readable explanations.
The primary UI lives in app/full_pipeline_new.py.
Below is the full pipeline as implemented in the main Gradio app (full_pipeline_new.py):
- 📄 Input documents: two files (docx / code / text).
- 🧹 Text extraction and cleanup.
- ✂️ Paragraph split + claim extraction.
- 🔗 Pairwise matching (LLM + rules).
- ⚖️ NLI labeling (contradiction / neutral / entailment).
- 📌 Anchor span selection for precise evidence.
- 🖥️ UI rendering: side-by-side views + explanations + exportable JSON.
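The steps above can be sketched as a small pipeline skeleton. This is illustrative only — the data model and function names below are stand-ins, not the repo's actual API; the real stages in full_pipeline_new.py use LLM calls rather than the naive heuristics stubbed here:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    paragraph_idx: int

@dataclass
class MatchResult:
    left: Claim
    right: Claim
    label: str        # "contradiction" | "neutral" | "entailment"
    anchor_span: str  # minimal evidence span backing the label
    explanation: str  # short, readable rationale shown in the UI

def extract_claims(raw_text: str) -> list[Claim]:
    """Cleanup + paragraph split + naive claim extraction (one claim per sentence)."""
    claims = []
    paragraphs = [p for p in raw_text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        for sent in filter(None, (s.strip() for s in para.split("."))):
            claims.append(Claim(text=sent, paragraph_idx=i))
    return claims

def compare_documents(doc_a: str, doc_b: str) -> list[MatchResult]:
    """Pair claims across documents, label each pair, attach an anchor span.

    The real pipeline does LLM-assisted matching and NLI labeling here;
    this stub labels identical claims "entailment" and everything else "neutral".
    """
    results = []
    for ca in extract_claims(doc_a):
        for cb in extract_claims(doc_b):
            label = "entailment" if ca.text == cb.text else "neutral"
            results.append(MatchResult(ca, cb, label, anchor_span=cb.text,
                                       explanation=f"{label} between claims"))
    return results
```

The output list maps directly onto the side-by-side UI: each `MatchResult` becomes one row with its label, anchor span, and explanation, and the whole list is what gets exported as JSON.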
.
├─ app/                      Python package
│  ├─ main.py                FastAPI entry – mounts all three UIs
│  ├─ full_pipeline_new.py   Full mismatch pipeline + UI
│  ├─ nli.py                 NLI viewer
│  └─ pipeline.py            Two-step pipeline
├─ requirements.txt
├─ Dockerfile
├─ docker-compose.yml
└─ Makefile
Create a .env from .env.example (or set these in your process manager):
Set at least one of:
`OPENAI_API_KEY` or `OPENROUTER_API_KEY`
Optional:
- `LLM_MAX_PARALLEL` – max concurrent LLM calls (default: 8).
- `LLM_NLI_SYSTEM_PROMPT_FILE` – path to the system prompt for the end-to-end LLM branch.
- `CLAIM_EXTRACTOR_SYSTEM_PROMPT_FILE` – path to the system prompt for the claim-extraction stage.
- `LLM_PROVIDER` – force a provider: `openrouter` or `local`. If unset, the provider is inferred from the model name.
- `DEFAULT_MODEL` – default LLM model id used by the UI and pipeline (e.g., `openrouter/mistral-7b`).
You may also inline long prompts via `LLM_NLI_SYSTEM_PROMPT` / `CLAIM_EXTRACTOR_SYSTEM_PROMPT`.
Before launching, create an environment file:
.env:
OPENAI_API_KEY="sk-proj-"
OPENROUTER_API_KEY="sk-or-"
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1" # or your custom OpenRouter URL
OPENAI_BASE_URL="https://api.openai.com/v1" # or your custom URL
DEFAULT_MODEL="openrouter/mistral-7b" # or "openai/gpt-4o" or "openai/gpt-4o-mini" - any model
LLM_PROVIDER="openrouter" # optional: force OpenRouter even without a provider prefix
# For local OpenAI-compatible servers, e.g., FastAPI-based ones:
OPENAI_API_KEY="dummy" # required by the SDK
OPENAI_BASE_URL="http://127.0.0.1:8000/v1" # your FastAPI OpenAI-compatible base
LLM_MAX_PARALLEL="8" # How many requests to handle in parallel in the Gradio UI
LLM_NLI_SYSTEM_PROMPT_FILE="prompts/llm_nli_system.en.md" # main prompt for NLI
CLAIM_EXTRACTOR_SYSTEM_PROMPT_FILE="prompts/llm_align_defaut.en.md" # main prompt for CLAIM EXTRACTOR

# 1. build the image
docker build -t viewers .

# 2. run it (ports 7860 → 7860)
docker run --rm -p 7860:7860 viewers

# install deps into the active venv
make vendor
# run with auto-reload
make run
# format code
make fmt
# static analysis
make lint
# run tests
make test
