Nossks/FinGuard


🛡️ FinGuard: Secure Fraud Detection RAG

Privacy-First Financial Forensics Powered by CyborgDB


fraud_detection is a secure Retrieval-Augmented Generation (RAG) pipeline designed to detect anomalies and flag potential fraud in sensitive financial documents.

Financial data demands strict privacy. Unlike standard RAG implementations, this pipeline uses CyborgDB so that all transaction embeddings are encrypted at rest and in transit. This enables AI-driven insights without exposing raw financial vectors in plaintext.


📸 Demo & Dashboard

📉 Fraud Analysis Dashboard

[Image: Fraud Detection UI] Real-time analysis of transaction logs with risk scoring, powered by secure RAG retrieval.

🎥 Watch the System in Action

[Video thumbnail: FinGuard Demo] The linked demo video shows the encrypted fraud detection pipeline flow.


🔒 The Encrypted Architecture (ML Flow)

We utilize CyborgDB to maintain a "Zero-Trust" architecture for vector storage.

graph TD
    subgraph "Ingestion Phase"
        A[Synthetic Data Generator] -->|Create Logs| B(Data Ingestion)
        B -->|Generate Embeddings| C[Hugging Face Transformer]
        C -->|Encrypt & Store| D[(CyborgDB Encrypted Cloud)]
    end

    subgraph "Inference Phase"
        E[User Query] --> R{Router LLM}
        
        %% Chat Branch (Zero Latency)
        R -->|Chat Mode| L[Direct LLM Response]
        L --> I[Analyst Dashboard]

        %% Search Branch (RAG)
        R -->|Search Mode| F[Embed Query]
        F -->|Encrypted Similarity Search| D
        D -->|Retrieve Decrypted Context| G[Transaction Context]
        G -->|Combine with Risk Prompt| H[Main LLM Chain]
        H -->|Generate Risk Report| I
    end

    %% --- STYLING (Dark Theme for GitHub Visibility) ---
    
    %% Standard Nodes (Dark Grey)
    style A fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style B fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style C fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style E fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style F fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style G fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff
    style H fill:#2d2d2d,stroke:#ffffff,stroke-width:2px,color:#fff

    %% Key Nodes (Colored Accents)
    %% CyborgDB (Dark Red)
    style D fill:#4a151b,stroke:#ff6b6b,stroke-width:2px,stroke-dasharray: 5 5,color:#fff
    
    %% Router (Dark Gold)
    style R fill:#3d3100,stroke:#ffd700,stroke-width:2px,color:#fff
    
    %% Direct Chat (Dark Green)
    style L fill:#0d3329,stroke:#00ff41,stroke-width:2px,color:#fff
    
    %% Dashboard (Dark Blue)
    style I fill:#0e2a35,stroke:#00f0ff,stroke-width:2px,color:#fff
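
The routing step of the inference phase in the diagram can be sketched roughly as follows. This is an illustration only, not the project's code: the diagram's Router LLM is replaced here by a hypothetical keyword heuristic, and `route_query`, `answer`, `SEARCH_HINTS`, and the handler arguments are names assumed for the example.

```python
from typing import Callable

# Hypothetical keyword list standing in for the Router LLM's decision.
SEARCH_HINTS = ("transaction", "account", "transfer", "payment", "fraud")


def route_query(query: str) -> str:
    """Return "search" for data lookups and "chat" for everything else."""
    lowered = query.lower()
    return "search" if any(hint in lowered for hint in SEARCH_HINTS) else "chat"


def answer(query: str,
           chat_handler: Callable[[str], str],
           search_handler: Callable[[str], str]) -> str:
    """Dispatch the query to the branch chosen by the router."""
    if route_query(query) == "search":
        # Search Mode: embed query, encrypted similarity search, main LLM chain.
        return search_handler(query)
    # Chat Mode: direct LLM response, no retrieval.
    return chat_handler(query)


if __name__ == "__main__":
    # Placeholder handlers; the real ones would call the LLM and the RAG chain.
    demo_chat = lambda q: f"[chat] {q}"
    demo_search = lambda q: f"[search] {q}"
    print(answer("Hello, what can you do?", demo_chat, demo_search))
    print(answer("Show me transactions over $10k", demo_chat, demo_search))
```

Keeping the router separate from the two handlers means the chat branch never touches the vector store, which matches the split shown in the diagram.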

📊 Security & Performance Benchmarks

In financial contexts, speed matters, but security is non-negotiable. We benchmarked the impact of CyborgDB's encryption on retrieval latency.
See benchmark.html for the full interactive report.
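
A latency comparison of this kind can be reproduced with a small timing harness. The sketch below uses only the standard library; `measure_latency` and the two search functions are hypothetical placeholders, not functions from this repository or from CyborgDB, and the simulated delay is not a measured encryption cost.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency(search: Callable[[str], object],
                    queries: Sequence[str],
                    repeats: int = 5) -> Dict[str, float]:
    """Time `search` over `queries` and return latency statistics in milliseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Placeholder retrievers; swap in the real plaintext and encrypted searches.
    def plaintext_search(query: str) -> list:
        return [query]

    def encrypted_search(query: str) -> list:
        time.sleep(0.001)  # simulated extra work, not a measured CyborgDB cost
        return [query]

    queries = ["large wire transfer", "offshore account"]
    print("plaintext:", measure_latency(plaintext_search, queries))
    print("encrypted:", measure_latency(encrypted_search, queries))
```

Reporting the median alongside the mean helps when a few slow network round trips skew the average.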



🚀 Key Features

  • CyborgDB Integration: Encrypted vector search. Even if the database is compromised, the vectors remain unreadable without the encryption key.
  • Context-Aware Forensics: Queries like "Show me transactions over $10k sent to offshore accounts" retrieve the matching transactions from the encrypted index.
  • Hybrid Analysis: Combines semantic search (RAG) with rule-based filtering to broaden fraud detection coverage.
  • Persistent Secure Indexing: Ingest large volumes of logs once into a persistent encrypted index, then query it securely without re-ingesting.
  • Synthetic Data Generation: Generates synthetic data using the Faker library.
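
The hybrid analysis idea in the list above can be illustrated with a small sketch that pairs a similarity score with rule-based checks. It uses only the standard library. The `Transaction` fields, the toy 3-dimensional vectors, the $10,000 limit, and the 0.5 similarity cutoff are all assumptions made for the example; the actual pipeline relies on all-MiniLM-L6-v2 embeddings and CyborgDB retrieval.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transaction:
    tx_id: str
    amount: float
    offshore: bool
    embedding: Sequence[float]  # toy vector; all-MiniLM-L6-v2 produces 384 dimensions


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rule_flags(tx: Transaction, amount_limit: float = 10_000.0) -> List[str]:
    """Rule-based checks, independent of the embedding."""
    flags = []
    if tx.amount > amount_limit:
        flags.append("amount_over_limit")
    if tx.offshore:
        flags.append("offshore_destination")
    return flags


def hybrid_search(query_vec: Sequence[float],
                  transactions: Sequence[Transaction],
                  min_similarity: float = 0.5) -> List[Tuple[str, float, List[str]]]:
    """Keep transactions that are close to the query and trip at least one rule."""
    hits = []
    for tx in transactions:
        score = cosine_similarity(query_vec, tx.embedding)
        flags = rule_flags(tx)
        if score >= min_similarity and flags:
            hits.append((tx.tx_id, round(score, 3), flags))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    sample = [
        Transaction("tx-1", 25_000.0, True, (0.9, 0.1, 0.0)),
        Transaction("tx-2", 40.0, False, (0.8, 0.2, 0.1)),
        Transaction("tx-3", 15_000.0, False, (0.0, 0.1, 0.9)),
    ]
    for hit in hybrid_search((1.0, 0.0, 0.0), sample):
        print(hit)
```

This sketch requires both signals to agree (an intersection). Reporting a transaction when either signal fires would trade precision for recall and is an equally valid design.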

🛠️ Tech Stack

Component: Technologies

  • Vector Database: CyborgDB (Encrypted Storage)
  • Orchestration: LangChain, Python
  • Embeddings: Hugging Face (sentence-transformers/all-MiniLM-L6-v2)
  • Web Interface: Flask
  • Data Processing: Pandas, NumPy

๐Ÿ—๏ธ Deployment & Setup

Prerequisites

  • Python 3.11
  • CyborgDB API Key (Required for encrypted storage)
  • Git

1. Clone the Repository

git clone https://github.com/Nossks/fraud_detection.git
cd fraud_detection

2. Set up Virtual Environment

python -m venv venv
# Linux/Mac:
source venv/bin/activate
# Windows:
.\venv\Scripts\Activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables and API Key

Create a .env file. You must also provide CyborgDB credentials to enable the encrypted layer.

# AI Models
HUGGINGFACEHUB_API_TOKEN=hf_your_token_here
GOOGLE_API_KEY=your_google_api_key_here
# CyborgDB: set the API key in the data ingestion file located at src/components
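
Before launching the app, it can help to confirm that the variables above are actually set. The following is a minimal sketch using only the standard library. `load_env_file` and `missing_keys` are hypothetical helpers, not part of this repository, and the parser handles only simple KEY=VALUE lines (a library such as python-dotenv covers more cases).

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Mapping

# Keys taken from the .env example in this README.
REQUIRED_KEYS = ("HUGGINGFACEHUB_API_TOKEN", "GOOGLE_API_KEY")


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Mapping[str, str],
                 keys: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in keys if not values.get(key)]


if __name__ == "__main__":
    # Values already exported in the shell take precedence over the .env file.
    settings = {**load_env_file(), **os.environ}
    missing = missing_keys(settings)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```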

5. Run the Application

Run the prediction pipeline:

python prediction.py

Launch the Dashboard:

python app.py

📂 Project Structure

fraud_detection/
├── app.py                  # Application entry point
├── src/                    # Core logic (pipelines, components, utils)
├── data/                   # Datasets and vector stores
├── notebooks/              # Experiments and prototyping
├── static/ & templates/    # Frontend assets
├── logs/                   # Runtime logs
├── requirements.txt
└── README.md


🔮 Future Scope

  • Real-time Stream Processing: Hooking into Kafka for live transaction monitoring.
  • Graph RAG: Using Knowledge Graphs to detect syndicate fraud rings.
  • Multi-Modal Support: Processing scanned checks and invoices with OCR.

🤝 Contributing

Contributions are welcome! Please fork the repository and create a pull request.


📄 License

This project is licensed under the MIT License.

