Redactify is a local, Flask-based web application that scans .txt and .csv files for sensitive information and redacts it using a combination of regular expressions and NLP (spaCy). It helps organizations minimize data exposure by retaining only essential information.
Example use case: Hospitals or clinics can use Redactify to remove patient-identifiable data from medical records before sharing them for research or AI model training.
Features:
- Detects and redacts:
  - Names, organizations, locations, and dates using spaCy's named entity recognition (NER)
  - Emails, phone numbers, usernames, and formatted dates using regular expressions
- Supports .txt and .csv files
- Instant download of the redacted file
- Clean, professional UI with drag-and-drop and a file picker
- Comes with a preconfigured virtual environment (env/)
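The regex half of the detection can be sketched as a list of labelled patterns applied in order, with each match replaced by a placeholder. This is a minimal sketch, not Redactify's actual redactor.py: the pattern set, the placeholder format ([EMAIL], [PHONE], ...), and the name redact_text are assumptions, and the patterns only cover common US-style phone numbers and numeric dates.

```python
import re

# Hypothetical (label, pattern) pairs; Redactify's real patterns may differ.
# Order matters: emails go first so their "@" is gone before usernames are matched.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # Numeric dates such as 03/15/2024, 3-15-24, or ISO 2024-03-15.
    ("DATE", re.compile(r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")),
    # US-style numbers such as (555) 123-4567, 555-123-4567, 555.123.4567.
    ("PHONE", re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")),
    # Handles such as @jdoe that are not part of a longer word.
    ("USERNAME", re.compile(r"(?<!\w)@\w+")),
]


def redact_text(text: str) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or (555) 123-4567 on 03/15/2024, ping @jdoe."
    print(redact_text(sample))
    # prints: Contact [EMAIL] or [PHONE] on [DATE], ping [USERNAME].
```

Names and organizations pass through untouched here; in Redactify those are left to spaCy's NER, since no regex can reliably recognize them.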
Project structure:

```
Redactify/
├── app.py
├── redactor.py
├── uploads/          # Uploaded and redacted files
├── templates/
│   └── index.html
├── static/
│   └── style.css
├── env/              # Virtual environment (already configured)
└── README.md
```
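The uploads/ folder receives user-supplied files, which is where secure file handling matters most. The following is a minimal sketch, assuming uploads are restricted to the two supported extensions: it never trusts the client's filename and generates a random one instead. The helper names is_allowed and storage_path are hypothetical; Flask projects often use werkzeug.utils.secure_filename for a similar purpose, and Redactify's app.py may handle this differently.

```python
import os
import uuid

# Only the file types Redactify supports.
ALLOWED_EXTENSIONS = {".txt", ".csv"}


def is_allowed(filename: str) -> bool:
    """Accept only .txt and .csv uploads (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_EXTENSIONS


def storage_path(upload_dir: str, filename: str) -> str:
    """Return a fresh path inside upload_dir, ignoring the client-supplied name."""
    if not is_allowed(filename):
        raise ValueError(f"unsupported file type: {filename!r}")
    _, ext = os.path.splitext(filename)
    # A random name means a hostile name like "../../etc/passwd.txt" cannot escape
    # upload_dir, and two uploads called "notes.txt" cannot overwrite each other.
    safe_name = f"{uuid.uuid4().hex}{ext.lower()}"
    path = os.path.join(upload_dir, safe_name)
    # Defensive check: the resolved path must still live under upload_dir.
    root = os.path.realpath(upload_dir)
    if os.path.commonpath([root, os.path.realpath(path)]) != root:
        raise ValueError("path escapes the upload directory")
    return path


if __name__ == "__main__":
    print(is_allowed("records.CSV"), is_allowed("payload.exe"))  # prints: True False
    print(storage_path("uploads", "../../etc/passwd.txt"))
```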
Setup:

Clone the repository:

```bash
git clone https://github.com/yourusername/redactify.git
cd redactify
```

Activate the virtual environment:

```bash
# Windows
env\Scripts\activate

# macOS/Linux
source env/bin/activate
```

Run the Flask server:

```bash
python app.py
```

Open your browser and go to http://localhost:5000.
Requirements:
- Flask
- spaCy (with the en_core_web_sm model)
- pandas
- re (Python standard library)
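Since pandas is listed, CSV files are presumably loaded as tables. The sketch below substitutes the standard-library csv module so it runs without extra dependencies: it applies a redaction function to every cell, leaves the header row alone, and lets the csv writer quote any cell that contains a comma after redaction. redact_csv and mask_emails are hypothetical names, and mask_emails is only a stand-in for a full text redactor.

```python
import csv
import io
import re
from typing import Callable

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_emails(value: str) -> str:
    """Stand-in redactor for this sketch: masks email addresses only."""
    return EMAIL.sub("[EMAIL]", value)


def redact_csv(
    src: str,
    redact: Callable[[str], str] = mask_emails,
    keep_header: bool = True,
) -> str:
    """Redact every cell of a CSV document given as a string; return the new CSV."""
    reader = csv.reader(io.StringIO(src))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        if i == 0 and keep_header:
            writer.writerow(row)  # column names are usually not sensitive
        else:
            writer.writerow([redact(cell) for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    print(redact_csv("name,email\nJane,jane@example.com\n"), end="")
    # prints:
    # name,email
    # Jane,[EMAIL]
```

For files on disk, open them with newline="" as the csv documentation recommends and pass the contents in. Note that "Jane" survives: catching names needs the NER step, not a regex.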
To download the spaCy model again (if needed):

```bash
python -m spacy download en_core_web_sm
```

This project was built not only as a practical tool but also to:
- Practice secure file handling
- Learn NLP-based entity recognition
- Apply privacy-aware design principles
- Understand regex and named entity offsets
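Character offsets are what let regex matches and spaCy entities feed a single redaction pass: both can be expressed as (start, end, label) spans. The following is a sketch under that assumption: overlapping spans are merged, then replaced from right to left so that a placeholder of a different length never shifts offsets still waiting to be processed. With spaCy, the NER spans would come from `[(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]`; here they are hard-coded so the example runs without the model. merge_spans, apply_redactions, and regex_spans are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label); end is exclusive


def merge_spans(spans: Iterable[Span]) -> List[Span]:
    """Sort spans and merge overlaps so no character is redacted twice."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            # Overlap: extend the earlier span and keep its label.
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def apply_redactions(text: str, spans: Iterable[Span]) -> str:
    """Replace each span with [LABEL], right to left so earlier offsets stay valid."""
    for start, end, label in reversed(merge_spans(spans)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def regex_spans(text: str, label: str, pattern: str) -> List[Span]:
    """Turn regex matches into spans in the same format as NER entities."""
    return [(m.start(), m.end(), label) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    text = "Jane Doe emailed jane@example.com"
    ner = [(0, 8, "PERSON")]  # stand-in for spaCy's doc.ents
    print(apply_redactions(text, ner + regex_spans(text, "EMAIL", r"\S+@\S+")))
    # prints: [PERSON] emailed [EMAIL]
```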
This is a basic version of Redactify.
Future versions may support:
- Image-based redaction using OCR
- DOCX and PDF files
- Scan summary reports (entities redacted, counts, etc.)
- Integration with cloud storage (optional)