🛡️ Redactify – Data Minimization Scanner

Redactify is a locally run Flask web application that scans .txt and .csv files for sensitive information and redacts it using a combination of regular expressions and NLP-based named entity recognition (spaCy). It helps organizations minimize data exposure by retaining only the information they need.

🧠 Example Use Case: Hospitals or clinics can use Redactify to remove patient-identifiable data from medical records before sharing them for research or AI model training.


✨ Features

  • 🔍 Detects and redacts:

    • Names, organizations, locations, and dates using spaCy's NER
    • Emails, phone numbers, usernames, and formatted dates using regex
  • 📄 Supports .txt and .csv files

  • ⚡ Instant download of redacted file

  • 🎨 Aesthetic and professional UI with drag-and-drop + file picker

  • 📦 Comes with pre-configured virtual environment (env/)
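As a rough illustration of the regex side of the detection step, here is a minimal sketch using only Python's built-in `re` module. The patterns, the `[LABEL]` placeholder format, and the function name `redact_with_regex` are assumptions made for this example; they are not taken from `redactor.py`, and "usernames" is assumed here to mean `@handle`-style mentions.

```python
import re

# Illustrative patterns only; the project's real expressions may differ.
# Order matters: emails and dates are replaced before phone numbers.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
    # Assumption: a "username" is an @handle not embedded in an email address.
    "USERNAME": re.compile(r"(?<![\w@.])@\w{2,}\b"),
}


def redact_with_regex(text: str) -> str:
    """Replace every pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or 555-123-4567 on 12/03/2024 (@jdoe)."
    print(redact_with_regex(sample))
```

Real-world phone and date formats vary widely, so patterns like these would likely need tuning for a given dataset.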


📂 Folder Structure

Redactify/
├── app.py
├── redactor.py
├── uploads/                # Uploaded and redacted files
├── templates/
│   └── index.html
├── static/
│   └── style.css
├── env/                    # Virtual environment (already configured)
└── README.md

🚀 How to Run Locally

  1. Clone the repository

    git clone https://github.com/Parantap-Mishra/Redactify.git
    cd Redactify
  2. Activate the virtual environment

    # Windows
    env\Scripts\activate
    
    # macOS/Linux
    source env/bin/activate
  3. Run the Flask server

    python app.py
  4. Open your browser and go to http://localhost:5000


⚙️ Dependencies (Preinstalled in env/)

  • Flask
  • spaCy (en_core_web_sm)
  • pandas
  • re (built-in)

To download the spaCy model again (if needed):

python -m spacy download en_core_web_sm
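If the bundled environment does not work on your machine, a quick way to see what is missing is to check whether each dependency can be imported. The following is a minimal sketch using only the standard library; the helper name `check_dependencies` is hypothetical, and the module names are inferred from the dependency list above (`en_core_web_sm` is the package name under which the spaCy model is installed).

```python
import importlib.util


def check_dependencies(modules):
    """Return {module_name: bool} saying whether each top-level module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Run this inside the activated virtual environment.
    wanted = ["flask", "spacy", "pandas", "en_core_web_sm"]
    for name, found in check_dependencies(wanted).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```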

🧠 Learning Goals

This project was built not just to demonstrate utility but also to:

  • Practice secure file handling
  • Learn NLP-based entity recognition
  • Implement privacy-aware design thinking
  • Understand regex and named entity offsets
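To make the point about named entity offsets concrete, here is a minimal sketch of offset-based redaction. spaCy entities expose `start_char`, `end_char`, and `label_`, so a list of `(start, end, label)` tuples can be built from `doc.ents`; the spans below are written by hand so the example runs without spaCy. The function name `redact_spans` and the overlap policy are assumptions for illustration, not the project's actual code.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def redact_spans(text: str, spans: Iterable[Span]) -> str:
    """Replace each [start, end) character span with a [LABEL] placeholder."""
    result = text
    last_start = len(text)
    # Process spans from the end of the text backwards: replacing a later span
    # never shifts the character offsets of the spans that come before it.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if end > last_start:
            # Overlaps a span that was already redacted; skipped in this sketch.
            # A stricter tool might merge overlapping spans instead.
            continue
        result = result[:start] + f"[{label}]" + result[end:]
        last_start = start
    return result


if __name__ == "__main__":
    text = "Alice Smith visited Paris on 5 May 2021."
    spans = [(0, 11, "PERSON"), (20, 25, "GPE"), (29, 39, "DATE")]
    print(redact_spans(text, spans))
```

Replacing from left to right instead would require adjusting every later offset by the length difference introduced by each placeholder, which is easy to get wrong.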

📌 Future Scope

  • This is a basic version of Redactify.

  • Future versions may support:

    • Image-based redaction using OCR
    • DOCX and PDF file types
    • Scan summary reports (entities redacted, counts, etc.)
    • Integration with cloud storage (optional)
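As one possible shape for the scan summary idea, the sketch below counts redactions per entity label. It is purely hypothetical: the function name `summarize_redactions` and the report structure are assumptions, and nothing like it exists in the current version.

```python
from collections import Counter


def summarize_redactions(labels):
    """Build a small report from the labels of all redacted entities."""
    counts = Counter(labels)
    return {"total": sum(counts.values()), "by_label": dict(counts)}


if __name__ == "__main__":
    # Labels would be collected while redacting; hard-coded here for illustration.
    print(summarize_redactions(["EMAIL", "PERSON", "EMAIL", "DATE"]))
```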

👨‍💻 Created By

Parantap Mishra (GitHub · LinkedIn)
