This project analyzes data from an experimental speed dating event and builds a logistic regression model that predicts whether a participant wants a second date. The model examines how ratings on attributes like attractiveness, fun, and intelligence, together with demographic information, influence that decision.
The dataset contains information from participants in an experimental speed dating event, including:
- Ratings of dates on six key attributes (attractiveness, fun, intelligence, etc.)
- Participant demographic information
- Dating habits and preferences
- Match outcomes (whether participants wanted a second date)
The project aims to:
- Explore and understand the factors that influence dating decisions
- Develop a logistic regression model to predict match outcomes
- Identify key predictors of successful matches
- Evaluate model performance with appropriate metrics
The analysis proceeds in four stages:
- Data Preparation and Exploration
  - Clean and preprocess the dataset
  - Handle missing values
  - Explore relationships between variables
  - Generate descriptive statistics
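The preparation stage above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`attractive`, `fun`, `intelligence`, `age`, `decision`), the `prepare` helper, and the tiny synthetic table are all assumptions made for the example, and median imputation is just one reasonable way to handle missing values.

```python
import numpy as np
import pandas as pd


def prepare(df: pd.DataFrame, target: str = "decision") -> pd.DataFrame:
    """Drop rows with no target value and median-impute numeric predictors.

    Hypothetical helper for illustration only.
    """
    out = df.dropna(subset=[target]).copy()
    numeric_cols = out.select_dtypes(include="number").columns.drop(target)
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    out[target] = out[target].astype(int)
    return out


# Tiny synthetic stand-in for the real dataset (all values are made up).
raw = pd.DataFrame({
    "attractive": [7.0, 5.0, np.nan, 8.0, 3.0, 6.0],
    "fun": [6.0, np.nan, 4.0, 9.0, 2.0, 7.0],
    "intelligence": [8.0, 7.0, 6.0, 7.0, np.nan, 8.0],
    "age": [24, 27, 23, 30, 26, 25],
    "decision": [1, 0, 0, 1, 0, np.nan],
})

clean = prepare(raw)

# Descriptive statistics and a first look at relationships with the target.
summary = clean.describe()
corr_with_target = clean.corr()["decision"].drop("decision")

print(summary)
print(corr_with_target.sort_values(ascending=False))
```

With real data, the imputation statistics would ideally be computed on the training split only, so that no information from the test set leaks into preprocessing.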
- Feature Engineering
  - Transform relevant variables
  - Create interaction terms if needed
  - Select significant predictors
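One possible shape for the feature engineering stage is sketched below. The data is synthetic and the column names are hypothetical. The interaction term `attractive_x_fun` and the use of `SelectKBest` with an ANOVA F-test are illustrative choices, not necessarily what the project uses to select predictors.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the real dataset's columns may differ.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "attractive": rng.uniform(1, 10, n),
    "fun": rng.uniform(1, 10, n),
    "intelligence": rng.uniform(1, 10, n),
    "age": rng.integers(21, 36, n).astype(float),
})
# Made-up target: driven by attractiveness and fun only.
logit = 0.6 * (df["attractive"] - 5.5) + 0.4 * (df["fun"] - 5.5)
df["decision"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

features = df.drop(columns="decision")

# Interaction term: does fun matter more when attractiveness is high?
features["attractive_x_fun"] = features["attractive"] * features["fun"]

# Transform: standardize so every feature has mean 0 and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)

# Keep the k features with the strongest univariate relation to the target.
selector = SelectKBest(score_func=f_classif, k=3).fit(scaled, df["decision"])
selected = list(scaled.columns[selector.get_support()])
print(selected)
```

Univariate selection ignores how predictors behave jointly, so an L1-penalized logistic regression is a common alternative when features are correlated.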
- Model Development
  - Split data into training and testing sets
  - Implement logistic regression
  - Optimize model parameters
  - Evaluate performance (accuracy, precision, recall, ROC-AUC)
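The model development steps could look roughly like this with scikit-learn. The data here is synthetic, and the 75/25 split, the grid of `C` values, and 5-fold cross-validation are assumptions chosen for the sketch rather than settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three rating-like features, binary decision.
rng = np.random.default_rng(42)
n = 600
X = rng.uniform(1, 10, size=(n, 3))
logit = 0.7 * (X[:, 0] - 5.5) + 0.4 * (X[:, 1] - 5.5)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Stratify so both splits keep the same share of "yes" decisions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is refit on each CV fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune the inverse regularization strength C by cross-validated ROC-AUC.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(search.best_params_)
print(metrics)
```

ROC-AUC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is used alongside `predict`.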
- Results Analysis
  - Interpret model coefficients
  - Identify significant factors in dating decisions
  - Assess model strengths and limitations
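As an illustration of coefficient interpretation, the sketch below fits a model on synthetic data and converts each coefficient to an odds ratio. The feature names are hypothetical. Because the features are standardized, each odds ratio describes the multiplicative change in the odds of a "yes" for a one standard deviation increase in that feature. Note that scikit-learn's `LogisticRegression` does not report p-values, so ranking by coefficient magnitude here is a rough stand-in for a formal significance test.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(1)
n = 800
X = pd.DataFrame({
    "attractive": rng.uniform(1, 10, n),
    "fun": rng.uniform(1, 10, n),
    "intelligence": rng.uniform(1, 10, n),
})
# Made-up target: attractiveness has the largest effect, intelligence none.
logit = 0.8 * (X["attractive"] - 5.5) + 0.3 * (X["fun"] - 5.5)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so coefficients are comparable across features.
X_std = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_std, y)

coefs = model.coef_[0]
report = pd.DataFrame(
    {"coefficient": coefs, "odds_ratio": np.exp(coefs)},
    index=X.columns,
).sort_values("coefficient", key=np.abs, ascending=False)
print(report)
```

An odds ratio above 1 means the feature raises the odds of a second-date "yes", and one below 1 means it lowers them.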
- Type: Supervised Learning
- Category: Classification (Binary)
- Algorithm: Logistic Regression
- Target Variable: Second date decision (Yes/No)
- Python 3.10
- Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn