During the intense 8-module learning cycle, you will study all the essential theory needed for full-cycle data science project development. The program consists of different online activities: top MOOCs in the best possible order, labs, Kaggle competitions, tests, and full-cycle projects.
Math Overview
We will review all the math fields needed for a comfortable dive into the world of Machine Learning. By the end of this week, you'll have refreshed your knowledge of Calculus, Linear Algebra, and Probability Theory. You'll validate your knowledge by passing the test at the end of the week.
During the next two weeks, you'll get hands-on experience with the Python language. We will guide you through Data Types, Conditional Statements, Loops, Functions, Decorators, Modules, Packages, OOP, and Libraries. Your knowledge of Python will be validated by a test and a lab on writing the Snake game.
Python Libraries for Data Science
We will dig into the four main Python-based cornerstones that are essential for any Data Scientist. Here are our heroes: NumPy, Pandas, SciPy, and Matplotlib. NumPy is an efficient multi-dimensional container of generic data, which gives you the ability to work with tensors of any shape. Pandas provides high-performance, easy-to-use data structures and data analysis tools; it is a must-have library for dealing with tabular data. SciPy builds on NumPy and gives access to higher-level mathematical objects and routines. Matplotlib lets you visualize your insights. Remember, visualization is essential in Data Science. Your knowledge of these libraries will be validated by the test at the end of the week.
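To give a feel for two of these libraries, here is a minimal sketch that builds a small tensor with NumPy and summarizes a table with Pandas. The `scores` table and its column names are made up for illustration and are not part of the course material.

```python
import numpy as np
import pandas as pd

# NumPy: a 3-dimensional array (a rank-3 tensor) of shape (2, 3, 4).
tensor = np.arange(24).reshape(2, 3, 4)
tensor_shape = tensor.shape

# Summing over the first axis collapses it, leaving a (3, 4) array.
summed_over_first_axis = tensor.sum(axis=0)

# Pandas: a small, hypothetical table of test scores.
scores = pd.DataFrame({
    "student": ["Ann", "Bob", "Cid", "Dee"],
    "module": ["math", "math", "python", "python"],
    "score": [90, 70, 80, 60],
})

# Average score per module, a typical tabular aggregation.
mean_by_module = scores.groupby("module")["score"].mean()
```

Here `mean_by_module` is a Pandas Series indexed by module name, so `mean_by_module["math"]` gives the average math score.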
You will spend the first three weeks working on Supervised Learning techniques. Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. During the first week, you'll work with different modifications of Linear Regression, Logistic Regression, Naive Bayes Classifier, and KNN. During the second week, you will work with Neural Networks and SVM. You will need to pass a lab on each of the algorithms.
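As an illustration of learning a function from input-output pairs, the sketch below fits a straight line to a handful of example pairs using ordinary least squares in NumPy. The data is synthetic (generated from y = 2x + 1) and the `predict` helper is a hypothetical name chosen for this example.

```python
import numpy as np

# Hypothetical training pairs, generated from y = 2x + 1 without noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for the input, one column of ones for the bias.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find the slope and intercept minimizing ||X w - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(new_x):
    """Apply the learned function to an unseen input."""
    return slope * new_x + intercept
```

After fitting, `predict(5.0)` applies the learned mapping to an input that was not in the training pairs.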
The next two weeks will be dedicated to working with Unsupervised Learning techniques. Unsupervised learning is a type of self-organized learning that helps find previously unknown patterns in data without pre-existing labels. During the first week, you'll work on the K-Means clustering and PCA dimensionality reduction algorithms. During the second week, you will work on one more Kaggle competition and take a General Test on Machine Learning.
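To show what finding patterns without labels can look like, here is a minimal sketch of K-Means (Lloyd's algorithm) in NumPy. It assumes a simplified, deterministic initialization from the first `k` points, whereas practical implementations usually pick starting centroids randomly or with k-means++. The function name `k_means` and the sample points are made up for illustration.

```python
import numpy as np

def k_means(points, k, n_iters=10):
    """Cluster points into k groups with a fixed number of Lloyd iterations."""
    # Simplifying assumption: start from the first k points.
    centroids = points[:k].copy()
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        # Assign each point to its nearest centroid.
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two visibly separate groups of unlabeled 2-D points.
points = np.array([
    [0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
    [1.0, 0.0], [10.0, 11.0], [11.0, 10.0],
])
centroids, labels = k_means(points, k=2)
```

No labels are supplied anywhere: the grouping in `labels` comes only from the distances between points.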
During the next two weeks, you'll be tasked with solving different real-life machine learning problems. You'll get a better understanding of what real data looks like and how to process it. You'll also get a quick overview of Machine Learning libraries with off-the-shelf algorithms, such as Scikit-Learn and XGBoost. You'll apply your knowledge by working on Kaggle competitions, and your results will also be validated automatically by our system.
RESTful APIs and Containerization
Every Data Scientist should be able to expose their results to the world, and in many cases visualizing the results is not enough, especially when you work in a development team. You need to be able to create an API quickly. We will guide you through a simple API framework called Flask. By the end of this week, you'll be able to write simple RESTful APIs. Another important part of creating an exposable model is making sure that it will run everywhere. Docker provides this level of virtualization, and you'll dedicate one more week to it. Your knowledge of Flask and Docker will be validated by a test.
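For a sense of what exposing a model through a simple API might look like, here is a minimal Flask sketch. The `/predict` route, the `features` field, and the `predict` function are all assumptions made for this example; `predict` merely sums its inputs as a stand-in for a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Hypothetical stand-in for a trained model: returns the sum of the inputs.
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON body; silent=True yields None instead of raising on bad input.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None

    # Reject anything that is not a list of numbers.
    is_valid = isinstance(features, list) and all(
        isinstance(value, (int, float)) for value in features
    )
    if not is_valid:
        return jsonify({"error": "expected a JSON body with a numeric 'features' list"}), 400

    return jsonify({"prediction": predict(features)})
```

The sketch only defines the application; starting a server (for example with `app.run()`) is left out. The endpoint can be exercised without a running server through Flask's built-in test client, `app.test_client()`.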