Kineviz/paysim

PaySim Graph Data Processing

This project demonstrates financial fraud detection using Google Cloud graph capabilities (specifically Spanner Graph and BigQuery Graph) combined with the visual analytics of Kineviz GraphXR. It transforms PaySim transaction data into a property graph format for fraud detection analysis.

About PaySim

We build upon synthetic data generated by PaySim, the Mobile Money Payment Simulator. PaySim, originally created by Dr. Edgar Lopez-Rojas (http://edgarlopez.net), simulates authentic transaction behavior observed on a mobile money platform. The platform enables users to transfer funds between the electronic wallets on their mobile phones.

Specifically, we use PaySim 2, a version developed by David Voutila. The simulation is first configured with parameters such as the number of steps, clients, merchants, and banks, and the probabilities of various activities, and is then executed to generate the data.

Quick Start

# Clone this repository
git clone git@github.com:Kineviz/paysim.git

cd paysim

# Install and setup uv
pip install uv
uv venv --python=python3.11

.venv\Scripts\activate  # Windows; Linux/Mac: source .venv/bin/activate

# Install dependencies
uv pip install pandas google-cloud-bigquery google-cloud-spanner pandas-gbq db-dtypes python-dotenv

# Prepare data to be loaded to Spanner or BigQuery
uv run src/prepare_data.py

Requirements: Python 3.11, GCP credentials, and CSVs generated by the PaySim simulator (data/raw/transactions.csv, data/raw/clients.csv, data/raw/merchants.csv)
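
Before running the preparation script, it can help to confirm that the three input CSVs are where the requirements expect them. The following is a minimal sketch, not part of the repository; the helper name `missing_inputs` is hypothetical, and only the file paths come from the requirements above.

```python
from pathlib import Path

# Input files named in the requirements above.
REQUIRED_CSVS = (
    "data/raw/transactions.csv",
    "data/raw/clients.csv",
    "data/raw/merchants.csv",
)


def missing_inputs(root="."):
    """Return the required CSV paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED_CSVS if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing PaySim CSVs:", ", ".join(missing))
    else:
        print("All PaySim CSVs found.")
```

Run it from the repository root so that the relative paths resolve against the `data/raw` folder.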

Data Pipeline

See Data Preparation Pipeline for details. The pipeline generates:

  • Entity nodes: Banks, Emails, PhoneNumbers, SSNs
  • PII relationships: Client → Email/Phone/SSN
  • Transaction relationships: Client ↔ Transaction ↔ Merchant/Bank
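
To illustrate how entity nodes and PII relationships can be derived from client records, here is a small sketch using plain Python. It is not the code in `src/prepare_data.py`: the function name `build_pii_graph` and the column names `id`, `email`, `phone`, and `ssn` are assumptions made for the example, while the node and edge labels follow the graph schema in this README.

```python
def build_pii_graph(clients):
    """Derive de-duplicated PII nodes and Client -> PII edges.

    `clients` is an iterable of dicts; the keys "id", "email", "phone",
    and "ssn" are assumed column names, not taken from the repository.
    """
    # Map each assumed column to the (node label, edge label) it produces.
    pii_columns = {
        "email": ("Email", "HAS_EMAIL"),
        "phone": ("PhoneNumber", "HAS_PHONE"),
        "ssn": ("SSN", "HAS_SSN"),
    }
    nodes = {label: set() for label, _ in pii_columns.values()}
    edges = []
    for row in clients:
        for column, (node_label, edge_label) in pii_columns.items():
            value = row.get(column)
            if not value:
                continue  # skip clients with no value for this identifier
            nodes[node_label].add(value)
            edges.append((row["id"], edge_label, value))
    return nodes, edges


# Made-up sample rows for illustration only.
sample = [
    {"id": "c1", "email": "a@example.com", "phone": "555-0100", "ssn": "000-00-0001"},
    {"id": "c2", "email": "b@example.com", "phone": "555-0100", "ssn": "000-00-0002"},
]
nodes, edges = build_pii_graph(sample)
# Two clients sharing one phone number collapse into a single PhoneNumber node.
print(len(nodes["PhoneNumber"]), len(edges))  # prints: 1 6
```

Because identifier values are stored once per distinct value, clients that share an email, phone number, or SSN end up connected through the same node.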

Import to Cloud

Connect to GraphXR Explorer

Once the data is loaded into Spanner or BigQuery and you have run the DDL to define a graph, you can connect GraphXR Explorer to it by following the instructions on Google Cloud Marketplace.

Or create an account on https://graphxr.kineviz.com/ and connect to your instance.

Graph Schema

graph LR
    Client["<b>Client</b><br/>id, name, isfraud"]
    Transaction["<b>Transaction</b><br/>id, amount, timestamp<br/>action, globalstep<br/>isfraud, isflaggedfraud<br/>typedest, typeorig"]
    Merchant["<b>Merchant</b><br/>id, name, highrisk"]
    Bank["<b>Bank</b><br/>id, name"]
    Email["<b>Email</b><br/>id, name"]
    PhoneNumber["<b>PhoneNumber</b><br/>id, name"]
    SSN["<b>SSN</b><br/>id, name"]
    
    Client -->|PERFORMS| Transaction
    Transaction -->|TO_CLIENT| Client
    Transaction -->|TO_MERCHANT| Merchant
    Transaction -->|TO_BANK| Bank
    Client -->|HAS_EMAIL| Email
    Client -->|HAS_PHONE| PhoneNumber
    Client -->|HAS_SSN| SSN
    
    style Client fill:#e1f5ff
    style Transaction fill:#fff3e0
    style Merchant fill:#f3e5f5
    style Bank fill:#e8f5e9
    style Email fill:#fce4ec
    style PhoneNumber fill:#fce4ec
    style SSN fill:#fce4ec

Nodes: Client, Transaction, Merchant, Bank, Email, PhoneNumber, SSN
Edges: PERFORMS, TO_CLIENT, TO_MERCHANT, TO_BANK, HAS_EMAIL, HAS_PHONE, HAS_SSN
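
The schema can also be written down as data, which makes it easy to check that every edge connects declared node labels and to draft a graph definition from it. The sketch below is an illustration only and is not the DDL shipped with this project. The graph name `PaySimGraph`, the one-table-per-label layout, and the `id`, `src_id`, and `dst_id` key columns are all assumptions; the statement it prints follows the general `CREATE PROPERTY GRAPH` shape used by Spanner Graph and should be checked against the official DDL reference before use.

```python
# Node labels and edge triples copied from the schema diagram above.
NODES = ["Client", "Transaction", "Merchant", "Bank", "Email", "PhoneNumber", "SSN"]
EDGES = [
    ("Client", "PERFORMS", "Transaction"),
    ("Transaction", "TO_CLIENT", "Client"),
    ("Transaction", "TO_MERCHANT", "Merchant"),
    ("Transaction", "TO_BANK", "Bank"),
    ("Client", "HAS_EMAIL", "Email"),
    ("Client", "HAS_PHONE", "PhoneNumber"),
    ("Client", "HAS_SSN", "SSN"),
]


def check_schema(nodes, edges):
    """Raise ValueError if any edge endpoint is not a declared node label."""
    known = set(nodes)
    for src, label, dst in edges:
        for endpoint in (src, dst):
            if endpoint not in known:
                raise ValueError(f"{label}: unknown node label {endpoint!r}")


def render_graph_ddl(graph_name, nodes, edges):
    """Render a CREATE PROPERTY GRAPH statement as a string.

    Assumptions (hypothetical, not from the repository): one table per node
    label keyed by `id`, and one table per edge label with `src_id` and
    `dst_id` columns.
    """
    check_schema(nodes, edges)
    node_lines = ",\n".join(f"    {name}" for name in nodes)
    edge_lines = ",\n".join(
        f"    {label}\n"
        f"      SOURCE KEY (src_id) REFERENCES {src} (id)\n"
        f"      DESTINATION KEY (dst_id) REFERENCES {dst} (id)\n"
        f"      LABEL {label}"
        for src, label, dst in edges
    )
    return (
        f"CREATE PROPERTY GRAPH {graph_name}\n"
        f"  NODE TABLES (\n{node_lines}\n  )\n"
        f"  EDGE TABLES (\n{edge_lines}\n  )"
    )


print(render_graph_ddl("PaySimGraph", NODES, EDGES))
```

Keeping the schema in one place like this means a renamed label only has to change once, and the validation step catches an edge that points at a label missing from the node list.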
