A tutorial on using pandas, a popular Python data analysis library, to perform data analysis tasks on structured data.
This repository is structured as follows:
contents/ : contains Python scripts used by the instructor for demonstration during training sessions
exercises/ : contains Python scripts with instructions for participants to practice
The class contents are structured so that the core functionality of pandas is introduced to participants step by step and in a functional manner. The following is a short description of the subject covered in each notebook:
01_intro.ipynb: Introduces Series and DataFrame, the two core data structures of pandas
02_load_and_save.ipynb: Importing and exporting files/datasets
03_data_manipulation.ipynb: Indexing and slicing data; the loc and iloc methods
04_EDA.ipynb: General methods for performing exploratory data analysis
05_data_cleaning.ipynb: Identifying and imputing missing data
06_data_analysis.ipynb: Performing more in-depth data analysis using rolling windows, grouping methods and more
07_data_visualization.ipynb: Depicting data in various visualizations
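As a quick preview of several topics listed above (Series and DataFrame, loc/iloc selection, missing-data imputation, rolling windows and grouping), here is a minimal sketch on a small made-up dataset. It is not taken from the notebooks; the column names and values are hypothetical, and mean imputation is just one simple strategy chosen for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labelled array (toy data, hypothetical values)
prices = pd.Series([10.0, 11.5, np.nan, 12.0, 12.5], name="price")

# A DataFrame is a two-dimensional table made of labelled columns
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "price": prices,
})

# loc selects by label, iloc selects by integer position
first_store = df.loc[0, "store"]   # row label 0, column "store"
last_price = df.iloc[-1, 1]        # last row, second column ("price")

# Identify missing values, then impute them with the column mean
n_missing = df["price"].isna().sum()
df["price"] = df["price"].fillna(df["price"].mean())

# Rolling window: mean over each pair of consecutive rows
df["rolling_mean"] = df["price"].rolling(window=2).mean()

# Grouping: average price per store
mean_by_store = df.groupby("store")["price"].mean()

print(df)
print(mean_by_store)
```

The first row of `rolling_mean` is NaN because a window of 2 needs two observations before it can produce a value.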
Isolating the environment of each project is best practice. One way to create a virtual environment is with Python's built-in venv module. From the root of this repository, execute the following to create a virtual environment:
python3 -m venv venv
The command to activate the virtual environment varies by operating system. Use the following on Linux (the same command also works on macOS):
source venv/bin/activate
Use the following on Windows:
venv\Scripts\activate
A (venv) prefix in front of your command prompt indicates that the virtual environment was activated successfully, as shown below:
(venv) C:\Users\User\pandas-tutorial>
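If the prompt prefix is not visible (some shells and IDE terminals hide it), the check can also be done from Python itself. This is a small sketch, assuming the environment was created with the venv module as described above: inside such an environment `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base Python installation. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv environment.

    In an environment created by `python -m venv`, sys.prefix is the
    environment directory and differs from sys.base_prefix.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate venv first.")
```

Saved to a file and run with `python` after activation, it should report the path of the `venv` directory.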
Finally, execute the following to install the dependencies for this lab into your activated virtual environment:
pip install -r requirements.txt
None of the datasets used in this repository belong to the authors; they are open-source datasets found online. The URL for each dataset is listed below: