🔐 Data Privacy Compliance Checker – AI-Powered Data Privacy Compliance Suite. A modular cybersecurity toolkit by Group 19 for detecting, anonymizing, and auditing PII in structured datasets. Built for offline environments to support privacy compliance in government, research, and institutional sectors.

ezadin2/Group_19_cyber_Project

🔐 Data Privacy Compliance Checker

A Streamlit-based tool for scanning datasets or SQL tables for PII (Personally Identifiable Information), checking compliance rules, applying anonymization techniques, and generating reports with full scan history tracking.

✨ Features

  • 📂 Data Input

    • Upload CSV / Excel files
    • Upload and scan SQLite databases
  • 🔍 PII Detection

    • Regex-based detection (email, phone, credit card, SSN, IP, passport, name, address, etc.)
    • NLP-based detection (person, organization, location using spaCy)
  • 📋 Compliance Scoring

    • Define rules (max_pii_fields, allowed PII types, anonymization required)
    • Automatic compliance check with score & violations
  • 🛡 Anonymization

    • Multiple methods: mask, hash, redact, fake, pseudonymize
    • Per-PII type method selection
    • Verification of anonymization success
  • 📊 Compliance Summary Dashboard

    • Scan history viewer (date & source filters)
    • Compliance score trends
    • Most frequent PII types
    • Violation frequencies
    • Anonymization rate over time
  • 📑 Reporting & Export

    • PDF & CSV compliance reports
    • Save anonymized datasets
    • Scan history stored in output/scan_history.csv
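
The anonymization methods listed above can be illustrated with a minimal sketch. It is not the code in modules/anonymize_data.py; the function names, the `PERSON` prefix, and the per-type mapping are assumptions made for illustration, and the `fake` method (which the project's requirements suggest relies on Faker) is left out to keep the example dependency-free.

```python
import hashlib


def mask(value, visible=2):
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def hash_value(value, salt=""):
    """One-way SHA-256 digest; the optional salt is a hypothetical parameter."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def redact(value):
    """Drop the value entirely and leave a fixed marker."""
    return "[REDACTED]"


class Pseudonymizer:
    """Maps each distinct value to a stable placeholder such as PERSON_1."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._mapping = {}

    def __call__(self, value):
        if value not in self._mapping:
            self._mapping[value] = f"{self.prefix}_{len(self._mapping) + 1}"
        return self._mapping[value]


# Per-PII-type method selection (the assignments here are illustrative only).
METHODS = {
    "email": mask,
    "ssn": redact,
    "phone": hash_value,
    "name": Pseudonymizer("PERSON"),
}


def anonymize(pii_type, value):
    """Apply the method configured for this PII type."""
    return METHODS[pii_type](value)


def is_anonymized(original, result):
    """Very simple success check: the original value must not survive."""
    return original not in result
```

Pseudonymization keeps a mapping so that the same input always receives the same placeholder, which preserves joins and counts across rows, whereas hashing and redaction are one-way.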

🧠 How It Works

  1. Upload your dataset via the web UI or the CLI.
  2. The tool scans all columns and rows for patterns matching PII.
  3. It generates:

    • A compliance score
    • A risk rating (Low, Medium, High)
    • A PDF/CSV report of issues
  4. Optionally, it anonymizes or masks the detected fields automatically.
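
The scanning step can be sketched as follows, assuming a CSV input and a small set of simplified patterns. The pattern set, the `scan_csv` name, and the sample data are hypothetical; the project's own detector in modules/pii_detector.py covers more PII types and also uses spaCy.

```python
import csv
import io
import re

# Simplified, illustrative patterns; real-world PII regexes need more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_csv(text):
    """Return {column: {pii_type: matching_cell_count}} for columns with hits."""
    findings = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, cell in row.items():
            if not isinstance(cell, str):
                continue  # skip missing or malformed cells
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(cell):
                    hits = findings.setdefault(column, {})
                    hits[pii_type] = hits.get(pii_type, 0) + 1
    return findings


SAMPLE = """id,contact,note
1,alice@example.com,called from 10.0.0.5
2,bob@example.org,SSN 123-45-6789 on file
3,n/a,nothing sensitive
"""

if __name__ == "__main__":
    print(scan_csv(SAMPLE))
```

Counting matching cells per column, rather than stopping at the first hit, gives the later scoring and reporting steps enough information to rank columns by how much PII they contain.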


📦 Technologies Used

| Component | Tool/Library |
| --- | --- |
| Language | Python 3.x |
| Data Handling | Pandas, OpenPyXL |
| PII Detection | Regex |
| UI Dashboard | Streamlit |
| Reports | FPDF |
| File Formats | CSV, XLSX, SQLite |
| PII Detection & NLP | spaCy (for named entity recognition), regex (for pattern-based PII detection) |
| Visualization | matplotlib / seaborn / plotly (for graphs, compliance summaries, trends) |
| Anonymization | Custom anonymization functions (masking, pseudonymization, or hashing) |
| Security/Validation | Custom logic for compliance scoring, audit logging, and error handling |

📦 Installation

  1. Clone this repository:

    git clone https://github.com/ezadin2/Group_19_cyber_Project.git
    cd Group_19_cyber_Project
  2. Create a virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate   # Linux/Mac
    venv\Scripts\activate      # Windows
    
    pip install -r requirements.txt
  3. Run the app:

  • To run the app, your terminal (Bash or Git Bash) must be in the project directory:

    streamlit run app.py
  • If you have multiple versions of Python installed, use this instead:

    python3 -m streamlit run app.py

📂 Project Structure

privacy_checker/
├── app.py                     # Main Streamlit application
│
├── modules/
│   ├── anonymize_data.py      # Advanced anonymization functions
│   ├── pii_detector.py        # Regex + NLP PII detection
│   ├── compliance_scoring.py  # Compliance scoring & rule checks
│   ├── history_logger.py      # Logs scan history
│   ├── report_generator.py    # PDF & CSV reporting
│   ├── db_loader.py           # SQLite loader
│   └── file_loader.py         # CSV/Excel loader
│
├── output/                    # Reports & scan history
│   ├── anonymized_data.csv
│   ├── compliance_report.pdf
│   ├── compliance_report.csv
│   └── scan_history.csv
│
├── requirements.txt           # Python dependencies
├── dashboard.py
├── check_setup.py
├── temp/                      # Temporary files that have been scanned
├── config/                    # Configuration files and rule definitions in JSON format
├── main.py                    # CLI entry point for use in a terminal
├── readme.md
├── histry/                    # Scan history in CSV format
├── test/
└── data/                      # Sample data for testing the app

Requirements

These libraries are required for the project to work successfully:

plotly, sqlite3, modules, openpyxl, pandas, matplotlib, os, csv, json, logging, hashlib, streamlit, plotly.express, fpdf, re, date, faker, xlsxwriter, collections, spacy
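
A quick way to confirm that the third-party packages from this list are installed is sketched below. It is a hypothetical helper, not the repository's check_setup.py; the `REQUIRED` list is an assumption that keeps only the third-party import names and omits standard-library modules and the project's own `modules` package.

```python
import importlib.util

# Third-party import names taken from the list above (assumed selection).
REQUIRED = [
    "pandas", "openpyxl", "streamlit", "plotly", "matplotlib",
    "fpdf", "faker", "xlsxwriter", "spacy",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as spaCy.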

βš™οΈ Usage

  1. Go to Privacy Scanner:

    • Upload a dataset (CSV/Excel/SQLite).
    • Run a scan to detect PII.
    • View compliance score & violations.
    • Choose anonymization methods per PII type.
    • Download anonymized dataset + compliance reports.
  2. Go to Compliance Summary Dashboard:

    • View scan history.
    • Analyze compliance score trends.
    • Check most common PII types.
    • Review frequent violations.
    • Track anonymization success over time.
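
The compliance check behind the score and violations shown in the scanner can be sketched as below. Only `max_pii_fields` is named in this README; the other rule keys, the fixed penalty per violation, and the `check_compliance` function are assumptions for illustration and may differ from modules/compliance_scoring.py.

```python
def check_compliance(findings, rules, anonymized=False, penalty=25):
    """Score a scan against simple rules.

    findings: {column_name: set of detected PII types}
    rules: dict with max_pii_fields, allowed_pii_types, anonymization_required
    Returns (score from 0 to 100, list of violation messages).
    """
    violations = []

    # Rule 1: limit on the number of columns that contain PII.
    pii_columns = [col for col, types in findings.items() if types]
    if len(pii_columns) > rules["max_pii_fields"]:
        violations.append(
            f"{len(pii_columns)} PII fields exceed the limit of "
            f"{rules['max_pii_fields']}"
        )

    # Rule 2: every detected PII type must be on the allow-list.
    detected = set().union(*findings.values())
    for pii_type in sorted(detected - set(rules["allowed_pii_types"])):
        violations.append(f"PII type not allowed: {pii_type}")

    # Rule 3: PII must be anonymized when the rules demand it.
    if rules["anonymization_required"] and pii_columns and not anonymized:
        violations.append("Anonymization is required but has not been applied")

    # Assumed scoring: start at 100 and subtract a fixed penalty per violation.
    score = max(0, 100 - penalty * len(violations))
    return score, violations


RULES = {
    "max_pii_fields": 1,
    "allowed_pii_types": ["email"],
    "anonymization_required": True,
}

if __name__ == "__main__":
    found = {"contact": {"email"}, "note": {"ssn"}}
    print(check_compliance(found, RULES))
    print(check_compliance(found, RULES, anonymized=True))
```

A flat penalty keeps the score easy to explain in a report; weighting violations by severity would be a natural refinement.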

🧪 Testing Without Running the App

You can test the program without launching the Streamlit UI by running the test scripts located in the test/ folder.

1. Clone the repository

git clone https://github.com/ezadin2/Group_19_cyber_Project.git
cd Group_19_cyber_Project

2. Create a virtual environment

python -m venv venv

3. Activate the virtual environment

Windows (Git Bash)

source venv/Scripts/activate

Windows (PowerShell)

venv\Scripts\Activate.ps1

Linux/Mac

source venv/bin/activate

4. Install all required dependencies

pip install --no-cache-dir -r requirements.txt

5. Install the spaCy language model for NLP-based PII detection

python -m spacy download en_core_web_sm

6. Run tests without launching the Streamlit app

 Option 1: Run a specific test file (PII Detector)
python -m pytest test/test_pii_detector.py -v --disable-warnings
 Option 2: Run all tests from project root
python -m pytest -v --disable-warnings
 Option 3: Run all tests from inside the test directory
cd test
python -m pytest -v --disable-warnings

📊 Example Output

  • Detection Results: List of sensitive columns and detected patterns.
  • Compliance Report: PDF + CSV with score & violations.
  • Anonymized Dataset: Downloadable CSV with masked/hashed/faked values.
  • Dashboard: Visual trends for compliance & anonymization across multiple scans.
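
The trend figures on the dashboard depend on a scan history file. A minimal sketch of logging scans and summarizing them is shown below; the column names in `FIELDS` and both function names are hypothetical and are not taken from modules/history_logger.py.

```python
import csv
import os
from datetime import datetime

FIELDS = ["timestamp", "source", "score", "anonymized"]  # hypothetical columns


def log_scan(path, source, score, anonymized):
    """Append one scan record, writing the header first if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "source": source,
            "score": score,
            "anonymized": int(anonymized),
        })


def summarize(path):
    """Return (average score, anonymization rate) over all logged scans."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        return 0.0, 0.0
    avg_score = sum(float(row["score"]) for row in rows) / len(rows)
    anon_rate = sum(int(row["anonymized"]) for row in rows) / len(rows)
    return avg_score, anon_rate
```

Appending to a plain CSV keeps the history readable offline and easy to load into pandas for plotting.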

πŸ›£οΈ Roadmap (Planned Features)

  • Regex-based PII detection
  • Advanced NLP for contextual PII detection
  • Support for Excel export of scan history
  • Custom rule editor in-app
  • Severity levels for PII violations (High/Medium/Low)
  • Smart compliance summaries
  • Compliance score + report
  • NLP-based PII detection
  • API integration
  • Multi-language support

👥 Contributors

Abenezer Markos
Ezadin Badiru
Kaleab
William
Maranatha

📃 License

MIT License. You are free to use and modify.

This project is still being developed, so stay tuned for future updates and news.
