Some work from the Google Advanced Data Analytics course from Coursera.

SubtlePeon/gada

This is a repository for my work in the Google Advanced Data Analytics course on Coursera.

Todo

  • Add a link to the course mentioned above
  • Polish

Details

  1. Course 2: Python Introduction
  • Loaded data from a csv file with Pandas
  • Retrieved preliminary information and statistics
  • Analyzed some features, especially in relation to the target variable, claim_status
  • Concluded that likes, comments, views, etc. differ substantially between claim_status values
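
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook from the course: the CSV is a tiny made-up stand-in read from a string, and the column names `view_count`, `like_count`, and `comment_count` are assumptions (only `claim_status` comes from the notes above).

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the course dataset (all values are invented).
csv_text = """claim_status,view_count,like_count,comment_count
claim,1000,200,10
claim,3000,900,40
opinion,100,5,0
opinion,300,15,2
"""


def summarize_by_claim_status(df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean and median of every numeric column per claim_status."""
    return df.groupby("claim_status").agg(["mean", "median"])


# Load the data (a real run would pass a file path instead of a StringIO).
df = pd.read_csv(io.StringIO(csv_text))

# Preliminary information and statistics.
df.info()
print(df.describe())

# Compare the features across the classes of the target variable.
summary = summarize_by_claim_status(df)
print(summary)
```

Comparing the per-class mean with the per-class median is a quick way to see both how far apart the classes are and how skewed each feature is.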
  1. Course 3: Exploratory Data Analysis
  • Constructed visualizations using Seaborn (histograms, box plots, pie charts, bar graphs, scatterplots, etc.)
  • Analyzed distribution of features by class (of the target variable)
  • Checked for statistical outliers
  • Drew conclusions about author_ban_status and other variables in relation to claim_status
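
As an illustration of the outlier check, here is a small sketch that uses the common 1.5 × IQR rule. The rule, the helper name `count_iqr_outliers`, and the sample values are assumptions made for this example; the notebook may use a different threshold.

```python
import pandas as pd


def count_iqr_outliers(values: pd.Series, k: float = 1.5) -> int:
    """Count values lying more than k * IQR outside the first or third quartile."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return int(((values < lower) | (values > upper)).sum())


# Made-up view counts with one obviously extreme value.
views = pd.Series([10, 12, 11, 13, 12, 11, 500])
n_outliers = count_iqr_outliers(views)
print(n_outliers)
```

Running the same helper on each numeric column, or on each class separately, gives a quick picture of which features have heavy tails.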
  1. Course 4: Hypothesis Testing
  • Checked for missing data
  • Prepared hypotheses for hypothesis testing
  • Conducted a t-test to determine statistical significance
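
A two-sample t-test of this kind can be sketched with `scipy.stats.ttest_ind`. The two groups below are randomly generated placeholders, and the choice of Welch's variant (`equal_var=False`) and a 5% significance level are assumptions for the example rather than details taken from the notebook.

```python
import numpy as np
from scipy import stats

# Synthetic samples standing in for one numeric feature in two groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=100.0, scale=10.0, size=200)
group_b = rng.normal(loc=110.0, scale=10.0, size=200)

# Null hypothesis: both groups have the same mean.
# Alternative hypothesis: the means differ.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # significance level (assumed)
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, reject null: {reject_null}")
```

Welch's variant does not assume equal variances in the two groups, which makes it a reasonable default when the group sizes or spreads differ.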
  1. Course 5: Regression Modeling
  • Performed an initial analysis of the target variable, verification_status
  • Checked correlations between features to verify model assumptions
  • Split data into train and test sets
  • Encoded categorical data using one-hot encoding
  • Trained a logistic regression model
  • Evaluated the model using classification metrics and a confusion matrix
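
The workflow above might look roughly like the sketch below. The data is synthetic, `view_count` and the category labels are hypothetical, and wrapping the encoder and the model in a scikit-learn pipeline is a choice made for this example, not necessarily how the notebook is organized.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data: one numeric feature and one categorical feature.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "view_count": rng.normal(0.0, 1.0, n),
    "author_ban_status": rng.choice(["active", "banned", "under review"], n),
})
# Made-up binary target (1 = verified), loosely tied to view_count.
y = (X["view_count"] + rng.normal(0.0, 0.5, n) > 0).astype(int)

# Split first, so the encoder only ever sees training data when it is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-hot encode the categorical column and pass the numeric one through.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(drop="first"), ["author_ban_status"])],
    remainder="passthrough",
)
model = make_pipeline(encode, LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, model.predict(X_test))
accuracy = model.score(X_test, y_test)
print(cm)
print(f"accuracy = {accuracy:.2f}")
```

Precision and recall can be read off the same matrix, which is useful when one kind of error matters more than the other.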
  1. Course 6: Machine Learning Models
  • Split data into train, validation, and test sets
  • Trained a random forest model (from sklearn) on the data
  • Trained an XGBoost model (from xgboost) on the data
  • Used Grid Search to tune hyperparameters
  • Evaluated the models using metrics and confusion matrices
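
A rough sketch of the split, tuning, and evaluation steps is shown below, using only scikit-learn. The data comes from `make_classification`, and the 60/20/20 split, the small parameter grid, and recall as the scoring metric are all assumptions for illustration. XGBoost is left out to keep the example dependency-free; its `XGBClassifier` follows the scikit-learn estimator interface, so it should slot into `GridSearchCV` in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# 60/20/20 split: hold out the test set first, then carve validation from the rest.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0
)

# Small illustrative grid; a real search would usually cover more values.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="recall", cv=3
)
search.fit(X_train, y_train)

# Use the validation set to compare candidate models, the test set only at the end.
best_model = search.best_estimator_
val_recall = recall_score(y_val, best_model.predict(X_val))
test_cm = confusion_matrix(y_test, best_model.predict(X_test))
print(search.best_params_)
print(f"validation recall = {val_recall:.2f}")
print(test_cm)
```

Keeping the test set untouched until a final model is chosen gives a less optimistic estimate of how the model behaves on new data.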
