This is a repository for my work in the Google Advanced Data Analytics course on Coursera.
- Course 2: Python Introduction
  - Loaded data from a `csv` file with Pandas
  - Retrieved preliminary information and summary statistics
  - Analyzed several features, especially in relation to the target variable, `claim_status`
  - Concluded that likes, comments, views, etc. differ substantially between the `claim_status` classes
- Course 3: Exploratory Data Analysis
  - Constructed visualizations using Seaborn (histograms, box plots, pie charts, bar graphs, scatterplots, etc.)
  - Analyzed the distribution of features by class of the target variable
  - Checked for statistical outliers
  - Drew conclusions about `author_ban_status` and other variables in relation to `claim_status`
- Course 4: Hypothesis Testing
  - Checked for missing data
  - Formulated the null and alternative hypotheses
  - Conducted a t-test to determine statistical significance
- Course 5: Regression Modeling
  - Performed an initial analysis of the target variable, `verification_status`
  - Checked correlations between features to satisfy the model's assumptions (no severe multicollinearity)
  - Split the data into train and test sets
  - Encoded categorical data using one-hot encoding
  - Trained a logistic regression model
  - Evaluated the model with a confusion matrix and classification metrics
- Course 6: Machine Learning Models
  - Split the data into train, validation, and test sets
  - Trained a Random Forest model (from `sklearn`) on the data
  - Trained an XGBoost model (from `xgboost`) on the data
  - Used grid search to tune hyperparameters
  - Evaluated the models using metrics and confusion matrices
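
The Course 4 t-test step can be sketched with SciPy's `stats.ttest_ind`. This is a minimal illustration, not the notebook code: the two groups of view counts are made-up numbers, and the 5% significance level is an assumed convention. Passing `equal_var=False` runs Welch's t-test, which does not assume the two groups share a variance.

```python
from scipy import stats

# Made-up view counts for two hypothetical groups (illustration only,
# not values from the course dataset).
group_a = [1200, 950, 1100, 1300, 1250, 1000, 1150]
group_b = [2300, 2100, 2500, 2400, 2200, 2600, 2350]

# Null hypothesis: both groups have the same mean view count.
# equal_var=False selects Welch's t-test (unequal variances allowed).
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, reject H0: {reject_null}")
```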
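
The Course 5 workflow (one-hot encoding, a train/test split, logistic regression, and confusion-matrix evaluation) looks roughly like this in pandas and scikit-learn. The DataFrame is a small synthetic stand-in: the category values, the `video_duration_sec` column, and the 0/1 encoding of `verification_status` are hypothetical, and the 75/25 split is an illustrative choice.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the project dataset.
df = pd.DataFrame({
    "claim_status": ["claim", "opinion"] * 30,
    "author_ban_status": ["active", "banned", "under review"] * 20,
    "video_duration_sec": [5 + (i * 7) % 55 for i in range(60)],
})
# Hypothetical binary target: 1 = verified, 0 = not verified.
df["verification_status"] = (df["claim_status"] == "opinion").astype(int)

# One-hot encode the categorical columns; drop_first removes one dummy per
# category so the encoded columns are not perfectly collinear.
X = pd.get_dummies(
    df[["claim_status", "author_ban_status", "video_duration_sec"]],
    columns=["claim_status", "author_ban_status"],
    drop_first=True,
    dtype=int,
)
y = df["verification_status"]

# Stratify so both classes appear in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes (order: 0, 1).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_test, y_pred))
```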
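
For Course 6, a train/validation/test split with a grid-searched Random Forest can be sketched as below. The data come from scikit-learn's `make_classification`, and the parameter grid, 60/20/20 split ratios, and F1 scoring are assumptions for illustration rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (stand-in for the real dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# 60/20/20 split: hold out the test set first, then carve a validation
# set out of the remaining 80% (0.25 * 0.8 = 0.2 of the total).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Illustrative hyperparameter grid, searched with 3-fold CV on the train set.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, scoring="f1", cv=3
)
search.fit(X_train, y_train)

# Check the tuned model on the validation set before touching the test set.
val_f1 = f1_score(y_val, search.best_estimator_.predict(X_val))
print("best params:", search.best_params_, "validation F1:", round(val_f1, 3))

# Final evaluation on the held-out test set.
cm = confusion_matrix(y_test, search.predict(X_test))
print(cm)
```

Because `xgboost.XGBClassifier` follows the scikit-learn estimator interface, it can be passed to `GridSearchCV` in the same way, with its own parameter grid.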