This project aims to classify depression levels based on survey responses using machine learning models. The dataset contains information such as age, gender, academic year, CGPA, and responses to several questions related to depression symptoms. The goal is to predict the depression level (Minimal, Mild, Moderate, Moderately Severe, or Severe) using this data.
The dataset used in this project is Depression.csv, which contains the following columns:
- Age: Age group of the respondent.
- Gender: Gender of the respondent.
- University: The university where the respondent is enrolled.
- Department: The department the respondent belongs to.
- Academic Year: The academic year of the respondent.
- Current CGPA: Current CGPA of the respondent.
- Survey Responses: Responses to 9 questions about the respondent's mental health, rated on a scale.
- Depression Value: Depression value calculated based on the responses (max = 27).
- Depression Level: The categorized depression level based on the depression value.
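The relationship between the responses, the depression value, and the depression level can be sketched as follows. This is a minimal illustration and not necessarily how the dataset was built: it assumes the depression value is the plain sum of the nine responses (each scored 0 to 3) and that the levels follow the standard PHQ-9 cutoffs. The function name `depression_level` and the sample answers are hypothetical.

```python
# Hypothetical helper: maps a total score (0-27) to a severity label.
# The cutoffs are the standard PHQ-9 bands, used here as an assumption;
# the dataset may have been labeled with different thresholds.
def depression_level(value: int) -> str:
    if not 0 <= value <= 27:
        raise ValueError("depression value must be between 0 and 27")
    if value <= 4:
        return "Minimal"
    if value <= 9:
        return "Mild"
    if value <= 14:
        return "Moderate"
    if value <= 19:
        return "Moderately Severe"
    return "Severe"


# Nine made-up answers, each assumed to be scored from 0 to 3.
responses = [1, 2, 0, 3, 1, 2, 1, 0, 2]
value = sum(responses)           # assumed: the value is the sum of the answers
level = depression_level(value)
print(value, level)
```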

Data Preprocessing:
- Loaded the dataset and previewed its contents.
- Checked for missing values and dropped irrelevant columns.
- Encoded categorical columns using label encoding.
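The preprocessing steps above might look roughly like the sketch below. It uses a tiny made-up frame in place of Depression.csv so that it runs on its own; the sample values, the choice of University as the column to drop, and the list of columns to encode are all assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows standing in for Depression.csv (illustrative values only).
# With the real file this would be: df = pd.read_csv("Depression.csv")
df = pd.DataFrame({
    "Age": ["18-22", "23-26", "18-22", "18-22"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "University": ["A", "B", "A", "C"],
    "Depression Level": ["Mild", "Severe", "Minimal", "Mild"],
})

# Preview the data and count missing values per column.
print(df.head())
print(df.isnull().sum())

# Drop columns treated as irrelevant (which ones is an assumption here).
df = df.drop(columns=["University"])

# Label-encode each categorical column, keeping the encoders so the
# integer codes can be mapped back to the original labels later.
categorical_cols = ["Age", "Gender", "Depression Level"]
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(df)
```

Note that `LabelEncoder` assigns codes in sorted label order, so the integer codes for the depression level do not follow severity order.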

Feature Selection:
- Selected the responses to depression-related questions as the feature set.
- The target variable is the depression level.

Modeling:
- Split the data into training and testing sets.
- Standardized the features using StandardScaler.
- Built two machine learning models: Logistic Regression and Random Forest.
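A minimal sketch of the feature selection and modeling steps is shown below. It runs on synthetic responses, not the real dataset. The column names `Q1` to `Q9`, the 80/20 split, the random seeds, the hyperparameters, and the PHQ-9 style cutoffs used to build the target are all assumptions; the original does not state them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine survey responses (hypothetical column names).
rng = np.random.default_rng(0)
question_cols = [f"Q{i}" for i in range(1, 10)]
X = pd.DataFrame(rng.integers(0, 4, size=(500, 9)), columns=question_cols)

# Assumed target: total score bucketed into five levels (0 = Minimal ... 4 = Severe).
total = X.sum(axis=1)
y = pd.cut(total, bins=[-1, 4, 9, 14, 19, 27], labels=False)

# Hold out part of the data for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Train each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    scores[name] = model.score(X_test_scaled, y_test)
    print(name, round(scores[name], 3))
```

Fitting the scaler on the training split alone keeps information from the test set out of the preprocessing step.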

Model Evaluation:
- Evaluated the models using metrics such as accuracy, precision, recall, F1-score, and confusion matrix.
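- Both models achieved an accuracy of 1.00 on the test set. A perfect score is plausible here because the depression level is derived from the depression value, which is itself calculated from the same survey responses used as features.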
- Visualized confusion matrices and ROC curves for both models.
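The metrics listed above can be computed with scikit-learn as in the sketch below. The true and predicted labels are hand-written for illustration and are not results from this project; plotting is left out so the example stays self-contained.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-written labels for illustration only (not project results).
labels = ["Minimal", "Mild", "Moderate"]
y_true = ["Minimal", "Mild", "Mild", "Moderate", "Moderate", "Minimal"]
y_pred = ["Minimal", "Mild", "Moderate", "Moderate", "Moderate", "Minimal"]

# Overall fraction of correct predictions.
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class precision, recall, and F1-score.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

print(acc)
print(cm)
print(report)
```

Passing `labels` explicitly fixes the row and column order of the confusion matrix, which makes it easier to read when classes have a natural severity order.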
This project requires the following Python libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn