[DF Project] Child Mind Institute — Problematic Internet Use Relating Physical Activity to Problematic Internet Use
This project analyzes the relationship between physical activity and problematic internet use in children and adolescents. Using machine learning techniques, the goal is to predict levels of problematic internet usage based on physical activity and fitness data from the Kaggle competition: Child Mind Institute — Problematic Internet Use.
- Dataset: The dataset includes features such as physical activity metrics, demographics, and health indicators.
- Target Variable: The target variable (`sii`) reflects the severity of problematic internet usage on a scale of 0 to 3.
- Imputation: Missing values were handled using methods such as KNN and Random Forest imputation.
- Feature Engineering: Derived new features like BMI-to-age ratio and internet hours adjusted by age.
- Modeling: Trained regression models to predict `PCIAT-PCIAT_Total` scores and classified them into `sii` levels.
- Checked for missing values and feature correlations.
- Visualized relationships between physical activity and internet usage.
- Used heatmaps and scatterplots to explore feature significance.
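
The exploratory checks above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names (`activity_minutes`, `internet_hours`, `age`) are hypothetical stand-ins for the competition features.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the competition data; column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "activity_minutes": rng.normal(60, 15, 200),
    "internet_hours": rng.uniform(0, 6, 200),
    "age": rng.integers(5, 18, 200).astype(float),
})

# Knock out 30 values so the missing-value check has something to report.
df.loc[rng.choice(200, 30, replace=False), "activity_minutes"] = np.nan

# Fraction of missing values per column, worst first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Pairwise Pearson correlations; pandas ignores rows where either value is missing.
corr = df.corr()

print(missing_share)
print(corr.round(2))
```

The resulting `corr` matrix can be passed to `seaborn.heatmap(corr, annot=True)` to draw the kind of heatmap mentioned above.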
- Dropped irrelevant features (e.g., season and redundant demographic data).
- Handled missing values using a combination of mean, KNN, and Random Forest imputation.
- Standardized and encoded categorical variables for modeling.
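
A rough sketch of the preprocessing steps above, using scikit-learn. Several things here are assumptions rather than the project's confirmed approach: Random Forest imputation is approximated with `IterativeImputer` wrapped around a `RandomForestRegressor`, the data is synthetic, and the `sex` column is a hypothetical categorical feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features with roughly 10% of values missing at random.
rng = np.random.default_rng(0)
num = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
num = num.mask(rng.random(num.shape) < 0.1)

# Three imputation strategies applied to the same data for comparison.
mean_filled = SimpleImputer(strategy="mean").fit_transform(num)
knn_filled = KNNImputer(n_neighbors=5).fit_transform(num)
rf_filled = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=5,
    random_state=0,
).fit_transform(num)  # assumption: one possible way to do Random Forest imputation

# Standardize the numeric features to zero mean and unit variance.
scaled = StandardScaler().fit_transform(knn_filled)

# One-hot encode a hypothetical categorical column.
cat = pd.DataFrame({"sex": ["F", "M", "F"]})
encoded = pd.get_dummies(cat, columns=["sex"])
print(encoded.columns.tolist())
```

In a real pipeline the imputers and scaler would be fit on the training split only and then applied to the validation and test data, to avoid leaking information.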
- Created new interaction terms (e.g., BMI scaled by age).
- Calculated feature correlations to select impactful predictors.
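
The feature engineering and correlation-based selection above might look something like the sketch below. The column names, the use of division for the age adjustment, and the 0.1 correlation cutoff are all illustrative assumptions; the data and target are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic data; column names are hypothetical stand-ins.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(5, 18, n).astype(float),
    "bmi": rng.normal(19, 3, n),
    "internet_hours": rng.uniform(0, 6, n),
})
# Synthetic target tied to internet hours, only so the correlations are non-trivial.
df["target"] = 10 * df["internet_hours"] + rng.normal(0, 5, n)

# Age-adjusted features (assumption: adjustment means dividing by age).
df["bmi_per_age"] = df["bmi"] / df["age"]
df["internet_hours_per_age"] = df["internet_hours"] / df["age"]

# Absolute Pearson correlation of every feature with the target.
corr_with_target = df.drop(columns="target").corrwith(df["target"]).abs()

# Keep features above an arbitrary, illustrative cutoff.
selected = corr_with_target[corr_with_target > 0.1].index.tolist()
print(corr_with_target.sort_values(ascending=False))
print(selected)
```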
- Trained various regression models including LightGBM, Random Forest, and Linear Regression.
- Fine-tuned hyperparameters for better performance.
- Predicted `PCIAT-PCIAT_Total` and mapped results to `sii` levels using custom thresholds.
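
As a sketch of the modeling steps above: tune a regressor, predict a continuous score, then bin the score into `sii` levels. The cut points in `SII_THRESHOLDS`, the parameter grid, and the synthetic data are placeholders assumed for illustration; the project's own custom thresholds and tuned hyperparameters are not reproduced here. Random Forest is used for brevity, and the same pattern would apply to the other regressors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder cut points between sii levels (assumed, not the project's tuned values).
SII_THRESHOLDS = [30.5, 49.5, 79.5]


def scores_to_sii(scores, thresholds=SII_THRESHOLDS):
    """Map continuous total-score predictions to integer levels 0..3."""
    return np.digitize(scores, thresholds)


# Synthetic regression problem with a target clipped to the 0..100 range.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.clip(40 + 20 * X[:, 0] + rng.normal(0, 5, 200), 0, 100)

# Small, hypothetical hyperparameter grid searched with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# Convert the tuned model's continuous predictions into severity levels.
pred_sii = scores_to_sii(search.predict(X))
print(search.best_params_)
print(np.bincount(pred_sii, minlength=4))
```

Predicting a continuous score first and thresholding afterwards keeps the ordinal structure of the target, and the thresholds can be adjusted separately from the regressor.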
- Evaluated models using accuracy, F1-score, and Cohen's Kappa.
- Conducted validation and refined the models iteratively.
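
The evaluation metrics above can be computed with scikit-learn along these lines. The labels are made up for illustration. Macro averaging for F1 and the additional quadratic-weighted kappa are assumptions, since the averaging and weighting used in the project are not stated.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score


def evaluate_sii(y_true, y_pred):
    """Return accuracy, macro F1, and Cohen's kappa for sii predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
        # Quadratic weighting penalizes predictions further from the true level more heavily.
        "kappa_quadratic": cohen_kappa_score(y_true, y_pred, weights="quadratic"),
    }


# Made-up labels: six of the eight predictions match.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])

metrics = evaluate_sii(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```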
- The model achieved [include key metrics here, such as accuracy or Kappa score].
- Physical activity during evenings had a strong correlation with problematic internet usage.
- Feature engineering improved prediction accuracy significantly.
- Python
- Libraries:
  - NumPy, Pandas: Data manipulation and analysis.
  - Matplotlib, Seaborn: Data visualization.
  - Scikit-learn, LightGBM: Machine learning models.
- Clone the repository:
    git clone [repository URL]
    cd [repository folder]
- Install dependencies:
    pip install -r requirements.txt
- Run the notebook:
    jupyter notebook df-physical-activity-problematic-internet-use.ipynb