An interactive web application that simplifies complex legal documents like contracts, policies, and acts using a transformer-based abstractive summarization model from Hugging Face. Built using Streamlit, this tool enables users to upload .txt, .pdf, or .docx files and get readable, condensed summaries instantly.
Legal language is often long, repetitive, and hard to interpret. This app automates the summarization of such documents, helping:
- Law students & educators
- Legal professionals
- General users needing simplified legal content
- Upload legal documents in .txt, .pdf, or .docx
- Automatic chunking for long texts
- BART-based summarization (facebook/bart-large-cnn)
- Select between Concise and Detailed summaries
- Download the final summary as a .txt file
- PDF and DOCX parsing support
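The automatic chunking step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `split_sentences`, `count_words`, and `chunk_sentences` are hypothetical helpers, the regex splitter stands in for NLTK's sentence tokenizer, and counting whitespace-separated words is only an approximation of the model's subword token count.

```python
import re
from typing import Callable, List


def count_words(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! and ? (assumption: the app uses NLTK here instead)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = count_words,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens still becomes its own
    (oversized) chunk; the caller would need to truncate it.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = "One two three. Four five. Six seven eight nine."
    for chunk in chunk_sentences(split_sentences(text), max_tokens=5):
        print(chunk)
```

Packing whole sentences keeps clause boundaries intact, which matters for legal text. Because subword tokenizers usually produce more tokens than words, a word-based budget should be set below the model's real input limit, or `count_tokens` replaced with the model tokenizer's own count.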
| Tool | Purpose |
|---|---|
| Streamlit | Frontend interface |
| Hugging Face Transformers | Abstractive summarization |
| NLTK | Sentence tokenization |
| PyMuPDF | PDF text extraction |
| python-docx | DOCX text extraction |
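As a rough illustration of how the extraction libraries in the table could be combined, the sketch below dispatches on the file extension. `extract_text` is a hypothetical helper, and it assumes a file path on disk; a Streamlit upload is a file-like object, so the real app likely reads bytes instead. PyMuPDF and python-docx are imported lazily so that plain-text handling works without them installed.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a .txt, .pdf, or .docx file."""
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        # Replace undecodable bytes rather than failing on odd encodings.
        return Path(path).read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            # Concatenate the text layer of every page.
            return "\n".join(page.get_text() for page in doc)

    if suffix == ".docx":
        from docx import Document  # python-docx

        # Body paragraphs only; tables and headers are not included here.
        return "\n".join(p.text for p in Document(path).paragraphs)

    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Scanned PDFs without a text layer would come back empty from this approach and would need OCR, which is outside what this README describes.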
- Model: facebook/bart-large-cnn
- Type: Abstractive summarization
- Token Limit: Up to 1024 tokens per chunk
- Trained On: CNN/DailyMail dataset
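A minimal sketch of how the model above might be called per chunk, assuming an installed `transformers` version that provides the `summarization` pipeline. `LENGTH_PRESETS`, `load_bart_summarizer`, and `summarize_chunks` are hypothetical names, and the length values for the Concise and Detailed modes are illustrative guesses, not the app's settings.

```python
from typing import Callable, Dict, List

# Hypothetical presets: the real values used by the app are not documented here.
LENGTH_PRESETS: Dict[str, Dict[str, int]] = {
    "Concise": {"min_length": 30, "max_length": 80},
    "Detailed": {"min_length": 80, "max_length": 200},
}


def load_bart_summarizer():
    """Build the Hugging Face pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy dependency

    return pipeline("summarization", model="facebook/bart-large-cnn")


def summarize_chunks(chunks: List[str], summarizer: Callable, mode: str = "Concise") -> str:
    """Summarize each chunk independently and join the partial summaries."""
    lengths = LENGTH_PRESETS[mode]
    parts = []
    for chunk in chunks:
        # truncation=True guards against chunks that exceed the model's input limit.
        result = summarizer(chunk, truncation=True, do_sample=False, **lengths)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)


if __name__ == "__main__":
    summarizer = load_bart_summarizer()
    print(summarize_chunks(["Text of the first chunk ...", "Text of the second chunk ..."], summarizer))
```

Passing the summarizer in as an argument keeps the loop easy to test with a stub. Since the model was trained on news articles, summaries of statutory text are worth spot-checking against the source.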
The app was tested on various legal documents, including the Indian Patents Act, 1970.
- Original Length: ~14,000 words
- Final Summary: ~300–500 words
- Performance:
  - Preserved key definitions and clauses
  - Reduced redundant legal text
  - Made procedural language easier to interpret
Example:
“Patent means a patent for any invention granted under this Act...”
▶ Retained due to legal significance and accuracy