Projects from "Data Science Specialist" course from Yandex

The following projects were completed during my studies in Yandex.Practicum. The projects are listed in chronological order, so it is possible to see my Data Scientist skill toolkit grow.

Project name	Description	Libraries
1. Big city music	Preferences comparison of Yandex.Music users from Moscow and St. Petersburg depending on time and day. Data preparation, cleaning, exploratory analysis.	pandas
2. Сredit worthiness analysis	Investigation on the effect of family status, number of children and income on credit repayment. Data preparation, cleaning, exploratory analysis, variable categorization.	pandas, pymystem3
3. Real estate analysis	Determining the market value of real estate in St. Petersburg and its suburbs and defining parameters that make it possible to create an automated system capable of detecting anomalies and fraud. Data preparation, feature engineering, exploratory analysis, correlation between the real estate price and different parameters.	pandas, matplotlib.pyplot, plotly.express
4. Determination of a promising tariff for a telecom company	User behavior analysis for a telecom company to determine the most profitable tariff. Data preparation, cleaning, feature engineering, statistical analysis, hypothesis testing.	pandas, matplotlib.pyplot, plotly, numpy, scipy.stats
5. Computer games popularity research	Game popularity prediction depending on genre and platform based on historical data. Exploratory data analysis, user portrait analysis, hypothesis testing.	pandas, matplotlib.pyplot, plotly, numpy, scipy.stats
6. Telecom tariff recommendation	Building a machine learning model for user classification. The training is conducted on historical data with user behavior, the model will be used to recommend a suitable tariff to users. Data preparation and cleaning was not required here. The following models have been tested: DecisionTreeClassifier, RandomForestClassifier, LogisticRegression.	pandas, sklearn, numpy
7. Bank customer churn modeling	Building a machine learning model for customer churn prediction based on historical data with customer behavior. Data preparation, cleaning, OHE encoding of categorical variables, scaling of numerical variables, dealing with imbalanced data (class weight, upsampling, downsampling). Different models have been tested (DecisionTreeClassifier, RandomForestClassifier, LogisticRegression), optimal hyperparameters were found with GridSearchCV. F1-score and AUC-ROC were calculated.	pandas, sklearn, numpy, matplotlib.pyplot
8. New oil well location	Creating a model that analyzes different proposed locations for a new oil well that will maximize profits while minimizing risk. Profits and risk were analyzed using Bootstrap technique. LinearRegression was used as a model.	pandas, sklearn, numpy, scipy.stats, seaborn, matplotlib.pyplot
9. Gold recovery coefficient	Building a machine learning model to predict the amount of gold extracted from gold ore based on extraction and purification data. Different models have been tested: LinearRegression, DecisionTreeRegressor, RandomForestRegressor. Optimal hyperparameters were found with cross-validation.	pandas, sklearn, numpy, seaborn, matplotlib.pyplot
10. Customers' personal data encryption	Developing a method for data encryption based on multiplication of features by an invertible matrix. The correctness of the method is mathematically justified. It is shown that the quality of linear regression on the transformed data does not change.	pandas, sklearn, numpy, seaborn, matplotlib.pyplot
11. Car prices prediction	Building a machine learning model for car prices prediction based in historical data. Data was preprocessed, categorical features were encoded with OrdinalEncoder. Models used: Gradient Boosting, Random Forest. Optimal hyperparameters were found with GridSearchCV. Feature importance chart was plotted.	pandas, sklearn, numpy, seaborn, matplotlib.pyplot, plotly, lightgbm
12. Number of taxi orders forecast	Time-series forecasting for a number of taxi orders. Data resampling, trend and seasonality analysis. New features were added: calendar features, lag and rolling mean values. Models used: Gradient Boosting (CatBoost), Linear Regression. Optimal hyperparameters were found with GridSearchCV.	statsmodels.tsa.seasonal, catboost, pandas, sklearn, numpy, matplotlib.pyplot
13. Sentiment analysis	Building a model to study tonality of texts. Texts were preprocessed, lemmatized with Spacy. Models used: CatBoost, TF-IDF + Logistic Regression	catboost, pandas, sklearn, re, nltk
14. Analytics of airline customers' preferences	Analysis of customers' flights to different cities in September 2018. Studying the database and extracting the necessary information. Data analysis in python.	pandas, PostgreSQL

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
01_yandex_music		01_yandex_music
02_credit_scoring		02_credit_scoring
03_real_estate_prices		03_real_estate_prices
04_telecom_tariff		04_telecom_tariff
05_games		05_games
06_users_classification		06_users_classification
07_bank_customer_churn_modeling		07_bank_customer_churn_modeling
08_oil_extraction_location		08_oil_extraction_location
09_gold_recovery		09_gold_recovery
10_customer_data_encryption		10_customer_data_encryption
11_car_prices_boosting		11_car_prices_boosting
12_time_series		12_time_series
13_nlp		13_nlp
14_sql		14_sql
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Projects from "Data Science Specialist" course from Yandex

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Projects from "Data Science Specialist" course from Yandex

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages