This is a Machine Learning (ML) project aimed at classifying user gender (Male/Female) based on anonymous financial transaction data.
The project focuses on building and comparing various classic and advanced classification models using the Python data science stack.
The project consists of two core files:
- mint_gender.ipynb: The main Jupyter Notebook containing the entire ML pipeline: data loading, cleaning, Feature Engineering, model training, and performance evaluation. The notebook compares performance across several algorithms, including Logistic Regression, Decision Tree, Random Forest, and XGBoost.
- utils.py: A Python module containing essential utility functions, categorical mappings (category_mapping), constants (RANDOM_STATE), and standard model evaluation routines:
  - eval_model: For evaluating a model on training and validation sets.
  - test_results: For evaluating a model on the held-out test set.
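The comparison workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `eval_model_sketch` is a hypothetical stand-in and not the actual `eval_model` from utils.py (whose signature is not shown here), the value of `RANDOM_STATE` is assumed, and the data is synthetic rather than the project's transaction data. XGBoost is left out to keep the sketch dependent on scikit-learn only; it could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42  # assumed value; the real constant is defined in utils.py


def eval_model_sketch(model, X_train, y_train, X_val, y_val):
    """Hypothetical helper: fit a model, return train and validation accuracy."""
    model.fit(X_train, y_train)
    return {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "val_accuracy": accuracy_score(y_val, model.predict(X_val)),
    }


# Synthetic binary-classification data standing in for the transaction features.
X, y = make_classification(n_samples=500, n_features=10, random_state=RANDOM_STATE)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=RANDOM_STATE
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=RANDOM_STATE),
    "Random Forest": RandomForestClassifier(random_state=RANDOM_STATE),
}

# Evaluate every model with the same helper so the scores are comparable.
results = {
    name: eval_model_sketch(model, X_train, y_train, X_val, y_val)
    for name, model in models.items()
}

for name, scores in results.items():
    print(f"{name}: train={scores['train_accuracy']:.3f}, val={scores['val_accuracy']:.3f}")
```

Comparing training and validation accuracy side by side makes overfitting easier to spot, for example when a tree-based model scores much higher on the training set than on the validation set.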
The project requires standard Python libraries from the data science stack (as imported in utils.py).
All necessary libraries can be installed using pip:
pip install numpy pandas scikit-learn xgboost matplotlib seaborn jupyter
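After installation, a quick check like the one below can confirm which packages are present. It is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the `check_packages` helper is hypothetical and not part of the project, and the package list simply mirrors the pip command above.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = ["numpy", "pandas", "scikit-learn", "xgboost", "matplotlib", "seaborn", "jupyter"]


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    status = {}
    for name in names:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status


status = check_packages(REQUIRED)
for name, version in status.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```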