Joseph Coco, Tony Zhang
We couldn't get it to connect to the cloud, so the project uses a manually created local database instead
- Grab Code
- Download Final_Proj folder and extract it
- SQL setup
- In pgAdmin, create a new server and database
- Open the Query Tool on that database and paste in the SQL from the Database/init.sql file (skip its first 2 lines)
- Run the query; it should create the necessary schemas and tables
- Python setup
- Have seaborn, psycopg2-binary, and scikit-learn installed
pip install seaborn psycopg2-binary scikit-learn
- Go into db_config.json and fill in your Postgres login info
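To check the login info before running anything else, a minimal sketch like the one below could load db_config.json and open a connection. The key names (host, port, dbname, user, password) and both helper functions are assumptions made for illustration; match them to whatever keys the project's db_config.json actually contains.

```python
import json

# Assumed key names; the project's db_config.json may use different ones.
REQUIRED_KEYS = ("host", "port", "dbname", "user", "password")


def load_db_config(path="db_config.json"):
    """Read the JSON config and fail early if a login field is missing or empty."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"db_config.json is missing values for: {', '.join(missing)}")
    return config


def connect(config):
    """Open a PostgreSQL connection using the loaded settings."""
    import psycopg2  # imported here so the config check works without the driver

    return psycopg2.connect(**{key: config[key] for key in REQUIRED_KEYS})
```

Calling `connect(load_db_config())` from the folder that holds db_config.json should either return a connection or raise an error that points at the field to fix.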
- Running Code
- Generate data and run
- Run
python Start.py
- When prompted with questions on data generation, type the desired amount and press Enter
- Each amount entered should be a positive number
- The number of purchases and payments should be roughly the same, or at least very close
- The number of accounts should be less than the number of payments and purchases
- When the terminal prompt returns, the program has finished running
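The rules for the generation amounts can be sketched as a small check. This is not part of Start.py; the function name and the 10% tolerance used for "roughly the same" are assumptions chosen for illustration.

```python
def validate_counts(accounts, purchases, payments, tolerance=0.1):
    """Return a list of problems with the requested amounts (empty if none).

    tolerance is an assumed cutoff: purchases and payments may differ by at
    most this fraction of the larger of the two.
    """
    problems = []
    for name, value in (("accounts", accounts), ("purchases", purchases), ("payments", payments)):
        if value <= 0:
            problems.append(f"{name} must be a positive number")
    if problems:
        return problems
    if abs(purchases - payments) > tolerance * max(purchases, payments):
        problems.append("purchases and payments should be roughly the same")
    if accounts >= min(purchases, payments):
        problems.append("accounts should be less than purchases and payments")
    return problems


print(validate_counts(accounts=100, purchases=1000, payments=980))  # []
print(validate_counts(accounts=100, purchases=1000, payments=500))
```

The second call reports one problem, because 1000 purchases against 500 payments falls outside the assumed tolerance.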
- Run using already generated data
- Make sure there is an output.csv inside the code folder; you can use one of the sample CSV files, but rename it to output.csv first
- Run
python ML.py
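Getting a sample file into place before running ML.py could look like the sketch below. It copies the sample instead of renaming it, so the original sample stays available; the helper name and the sample file name in the usage line are hypothetical.

```python
import shutil
from pathlib import Path


def prepare_output_csv(sample_csv, code_dir="."):
    """Copy a sample CSV into code_dir as output.csv, leaving the sample intact."""
    sample_csv = Path(sample_csv)
    if not sample_csv.is_file():
        raise FileNotFoundError(f"sample CSV not found: {sample_csv}")
    target = Path(code_dir) / "output.csv"
    shutil.copyfile(sample_csv, target)
    return target
```

For example, `prepare_output_csv("sample.csv")` run from inside the code folder would leave an output.csv next to ML.py, where sample.csv stands in for whichever sample file you pick.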
- Looking at the results
- Find the results in the code folder. After running, there should be 5 CSV files and a PNG
- Look at modCompare.csv to check how accurate the predictions are compared to the known labels
- Look at toResult.csv to see which labels the model assigned to the unknown transactions
- How to read the table: the columns are output_id, result, avg_cluster
- output_id is the ID of the transaction
- result is the given labels: 0 if false (fraud), 1 if true (legit), 2 if left empty (unknown)
- avg_cluster is the average of the clustering and labeling. If this value is greater than 0.5, the transaction is labeled legit; otherwise it is labeled fraud
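The column meanings above can be applied in a few lines of Python. This is a sketch that assumes toResult.csv has a header row with exactly the three column names listed; the sample rows are made up for illustration and are not project output.

```python
import csv
import io

# Meaning of the result column, as described above.
RESULT_MEANING = {"0": "fraud", "1": "legit", "2": "unknown"}


def label_from_avg_cluster(avg_cluster, threshold=0.5):
    """Apply the stated rule: greater than 0.5 is legit, otherwise fraud."""
    return "legit" if avg_cluster > threshold else "fraud"


def summarize(csv_text):
    """Return (output_id, result meaning, label implied by avg_cluster) per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append((
            row["output_id"],
            RESULT_MEANING.get(row["result"], "unrecognized"),
            label_from_avg_cluster(float(row["avg_cluster"])),
        ))
    return rows


# Made-up rows for illustration only; real values come from toResult.csv.
sample = "output_id,result,avg_cluster\n1,1,0.8\n2,0,0.2\n3,0,0.5\n"
for row in summarize(sample):
    print(row)
```

Note that an avg_cluster of exactly 0.5 comes out as fraud here, since the rule only labels values greater than 0.5 as legit.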