A Streamlit-based tool for scanning datasets or SQL tables for PII (Personally Identifiable Information), checking compliance rules, applying anonymization techniques, and generating reports with full scan history tracking.
- Data Input
  - Upload CSV / Excel files
  - Upload and scan SQLite databases
- PII Detection
  - Regex-based detection (email, phone, credit card, SSN, IP, passport, name, address, etc.)
  - NLP-based detection (person, organization, location using spaCy)
- Compliance Scoring
  - Define rules (`max_pii_fields`, allowed PII types, anonymization required)
  - Automatic compliance check with score & violations
- Anonymization
  - Multiple methods: `mask`, `hash`, `redact`, `fake`, `pseudonymize`
  - Per-PII type method selection
  - Verification of anonymization success
- Compliance Summary Dashboard
  - Scan history viewer (date & source filters)
  - Compliance score trends
  - Most frequent PII types
  - Violation frequencies
  - Anonymization rate over time
- Reporting & Export
  - PDF & CSV compliance reports
  - Save anonymized datasets
  - Scan history stored in `output/scan_history.csv`
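
To make the anonymization methods listed above more concrete, here is a minimal sketch of `mask`, `hash`, and `redact` with per-PII-type method selection. The function names, the `METHODS` mapping, and the default fallback to `redact` are illustrative assumptions, not the actual code in `modules/anonymize_data.py`.

```python
import hashlib


def mask(value: str, visible: int = 2) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def hash_value(value: str, salt: str = "") -> str:
    """Return the SHA-256 hex digest of the (optionally salted) value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def redact(value: str) -> str:
    """Replace the value with a fixed placeholder."""
    return "[REDACTED]"


# Hypothetical registry mapping method names to functions.
METHODS = {"mask": mask, "hash": hash_value, "redact": redact}


def anonymize(value: str, pii_type: str, method_by_type: dict) -> str:
    """Apply the method chosen for this PII type (assumed default: redact)."""
    method = method_by_type.get(pii_type, "redact")
    return METHODS[method](value)


def is_anonymized(original: str, result: str) -> bool:
    """Very simple verification: the original value must not survive in the result."""
    return original not in result


choices = {"email": "hash", "phone": "mask"}
masked_phone = anonymize("555-0100", "phone", choices)
hashed_email = anonymize("ann@example.com", "email", choices)
redacted_ssn = anonymize("123-45-6789", "ssn", choices)
```

Hashing keeps equal inputs equal, so joins across tables still work, while masking keeps a short suffix readable. Which trade-off fits depends on the PII type, which is why the method is chosen per type.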
How It Works
1. Upload your dataset via the web UI or CLI.
2. The tool scans all columns and rows for patterns matching PII.
3. It generates:
   - A compliance score
   - A risk rating (Low, Medium, High)
   - A PDF/CSV report of issues
4. Optional: automatically anonymize/mask detected fields.
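
The scan-and-score steps above can be sketched roughly as follows. This is only an illustration: the three patterns, the scoring formula (share of columns without PII), and the risk thresholds are assumptions chosen for the example, not the tool's actual rules.

```python
import re

# Simplified example patterns; real-world PII patterns need more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_rows(rows):
    """Return {column: set of PII types found} for a list of dict rows."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for pii_type, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(pii_type)
    return findings


def compliance_score(findings, total_columns):
    """Assumed formula: percentage of columns in which no PII was found."""
    if total_columns == 0:
        return 100.0
    return round(100.0 * (1 - len(findings) / total_columns), 1)


def risk_rating(score):
    """Assumed thresholds for the Low / Medium / High rating."""
    if score >= 80:
        return "Low"
    if score >= 50:
        return "Medium"
    return "High"


rows = [
    {"name": "Ann", "contact": "ann@example.com", "note": "ok"},
    {"name": "Bob", "contact": "bob@example.org", "note": "SSN 123-45-6789"},
]
findings = scan_rows(rows)
score = compliance_score(findings, total_columns=3)
rating = risk_rating(score)
```

With the sample rows, two of the three columns contain PII, so the score is 33.3 and the rating is High.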
| Component | Tool/Library |
|---|---|
| Language | Python 3.x |
| Data Handling | Pandas, OpenPyXL |
| PII Detection | Regex |
| UI Dashboard | Streamlit |
| Reports | FPDF |
| File Formats | CSV, XLSX, SQLite |
| PII Detection & NLP | spaCy (for named entity recognition), regex (for pattern-based PII detection) |
| Visualization | matplotlib / seaborn / plotly (for graphs, compliance summaries, trends) |
| Anonymization | Custom anonymization functions (masking, pseudonymization, or hashing) |
| Security/Validation | Custom logic for compliance scoring, audit logging, and error handling |
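
Since SQLite is one of the supported input formats, the sketch below shows one way to read every user table from a database with the standard-library `sqlite3` module so that rows can be scanned like CSV rows. The `load_tables` helper is hypothetical and not necessarily how `modules/db_loader.py` works.

```python
import sqlite3


def load_tables(conn):
    """Return {table_name: list of row dicts} for every user table."""
    conn.row_factory = sqlite3.Row
    names = [
        row["name"]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    tables = {}
    for name in names:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        tables[name] = [dict(row) for row in conn.execute(f"SELECT * FROM {quoted}")]
    return tables


# Demo with an in-memory database instead of an uploaded file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann@example.com')")
tables = load_tables(conn)
conn.close()
```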
- Clone this repository:
  git clone https://github.com/ezadin2/Group_19_cyber_Project.git
  cd Group_19_cyber_Project
- Create a virtual environment and install dependencies:
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  pip install -r requirements.txt
- Run the app:
  - Run this from the project directory in your terminal (Bash or Git Bash):
    streamlit run app.py
  - Or, if you have multiple versions of Python installed, use:
    python3 -m streamlit run app.py
privacy_checker/
├── app.py                     # Main Streamlit application
│
├── modules/
│   ├── anonymize_data.py      # Advanced anonymization functions
│   ├── pii_detector.py        # Regex + NLP PII detection
│   ├── compliance_scoring.py  # Compliance scoring & rule checks
│   ├── history_logger.py      # Logs scan history
│   ├── report_generator.py    # PDF & CSV reporting
│   ├── db_loader.py           # SQLite loader
│   └── file_loader.py         # CSV/Excel loader
│
├── output/                    # Reports & scan history
│   ├── anonymized_data.csv
│   ├── compliance_report.pdf
│   ├── compliance_report.csv
│   └── scan_history.csv
│
├── requirements.txt           # Python dependencies
│
├── dashboard.py
│
├── check_setup.py
│
├── temp/                      # Temporary files that have been scanned
│
├── config/                    # Configuration files and rule definitions (.json)
│
├── main.py                    # CLI mode for running in a terminal or Bash
│
├── readme.md
│
├── histry/                    # Scan history in CSV format
│
├── test/
│
└── data/                      # Sample data for testing the app
Requirements
These libraries are required for the project to work successfully.
- Third-party packages: pandas, openpyxl, xlsxwriter, streamlit, plotly (including plotly.express), matplotlib, fpdf, faker, spacy
- Python standard library: sqlite3, os, csv, json, logging, hashlib, re, datetime, collections
- Local package: modules
- Go to Privacy Scanner:
  - Upload a dataset (CSV/Excel/SQLite).
  - Run a scan to detect PII.
  - View compliance score & violations.
  - Choose anonymization methods per PII type.
  - Download anonymized dataset + compliance reports.
- Go to Compliance Summary Dashboard:
  - View scan history.
  - Analyze compliance score trends.
  - Check most common PII types.
  - Review frequent violations.
  - Track anonymization success over time.
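
The compliance score and violations shown in the Privacy Scanner come from checking scan findings against the configured rules. The sketch below illustrates the idea. Only `max_pii_fields` is a rule name taken from this README. The other keys, the violation messages, and the 25-point penalty per violation are assumptions made for the example.

```python
def check_compliance(findings, rules, anonymized=False):
    """Return (score, violations) for findings shaped like {column: set of PII types}."""
    violations = []

    # Rule 1: limit on how many fields may contain PII.
    if len(findings) > rules["max_pii_fields"]:
        violations.append(
            f"{len(findings)} PII fields found, limit is {rules['max_pii_fields']}"
        )

    # Rule 2: only explicitly allowed PII types may appear.
    allowed = set(rules["allowed_pii_types"])
    for column, pii_types in sorted(findings.items()):
        for pii_type in sorted(set(pii_types) - allowed):
            violations.append(f"{column}: PII type '{pii_type}' is not allowed")

    # Rule 3: data containing PII must be anonymized if the rules require it.
    if rules["anonymization_required"] and findings and not anonymized:
        violations.append("PII present but data is not anonymized")

    # Assumed scoring: start at 100 and subtract 25 per violation.
    score = max(0, 100 - 25 * len(violations))
    return score, violations


rules = {
    "max_pii_fields": 1,
    "allowed_pii_types": ["email"],
    "anonymization_required": True,
}
findings = {"contact": {"email"}, "note": {"ssn"}}
score, violations = check_compliance(findings, rules)
```

Here all three rules are violated, so the example score is 25.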
You can test the program without launching the Streamlit UI by running the test scripts located in the test/ folder.
git clone https://github.com/ezadin2/Group_19_cyber_Project.git
cd Group_19_cyber_Project
python -m venv venv
Windows (Git Bash)
source venv/Scripts/activate
Windows (PowerShell)
venv\Scripts\Activate.ps1
pip install --no-cache-dir -r requirements.txt
python -m spacy download en_core_web_sm
Option 1: Run a specific test file (PII Detector)
python -m pytest test/test_pii_detector.py -v --disable-warnings
Option 2: Run all tests from project root
python -m pytest -v --disable-warnings
Option 3: Run all tests from inside the test directory
cd test
python -m pytest -v --disable-warnings
- Detection Results: List of sensitive columns and detected patterns.
- Compliance Report: PDF + CSV with score & violations.
- Anonymized Dataset: Downloadable CSV with masked/hashed/faked values.
- Dashboard: Visual trends for compliance & anonymization across multiple scans.
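
The dashboard trends depend on each scan being appended to a history file such as `output/scan_history.csv`. A minimal sketch of that logging with the standard-library `csv` module follows. The column names and the `log_scan` / `read_history` helpers are hypothetical and may differ from `modules/history_logger.py`.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Assumed column layout for the history file.
FIELDS = ["timestamp", "source", "score", "pii_types", "anonymized"]


def log_scan(path, source, score, pii_types, anonymized):
    """Append one scan record, writing the header if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "source": source,
            "score": score,
            "pii_types": ";".join(sorted(pii_types)),
            "anonymized": anonymized,
        })


def read_history(path):
    """Load all scan records as a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Demo in a temporary directory so no real history file is touched.
with tempfile.TemporaryDirectory() as tmp:
    history_path = os.path.join(tmp, "scan_history.csv")
    log_scan(history_path, "customers.csv", 33.3, {"ssn", "email"}, False)
    log_scan(history_path, "orders.xlsx", 100.0, set(), True)
    history = read_history(history_path)
```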
- Regex-based PII detection
- Advanced NLP for contextual PII detection
- Support for Excel export of scan history
- Custom rule editor in-app
- Severity levels for PII violations (High/Medium/Low)
- Smart compliance summaries
- Compliance score + report
- NLP-based PII detection
- API integration
- Multi-language support
| Contributor |
|---|
| Abenezer Markos |
| Ezadin Badiru |
| Kaleab |
| William |
| Maranatha |
MIT License. You are free to use and modify.
This project is still under development, so stay tuned for future updates and news.