pandas-tutorial

A tutorial on using pandas, a popular data analysis library for Python, to perform data analysis tasks on structured data.

Repository Structure

This repository is structured as follows:

contents/   : contains Python scripts the instructor uses for demonstrations during the training session
exercises/  : contains Python scripts with instructions for participants to practice

The class contents are structured so that participants are introduced to the core functionality of pandas step by step. The following is a short description of the subject covered by each notebook:

01_intro.ipynb: Introduces Series and DataFrame, two core data structures of pandas
02_load_and_save.ipynb: Importing and exporting files/datasets
03_data_manipulation.ipynb: Indexing and slicing data with the loc and iloc indexers
04_EDA.ipynb: General methods to perform exploratory data analysis
05_data_cleaning.ipynb: Identifying and imputing missing data
06_data_analysis.ipynb: Performing more in-depth data analysis using rolling windows, grouping, and more
07_data_visualization.ipynb: Depicting data in various visualizations
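
To give a feel for the topics listed above, here is a minimal sketch that touches on Series and DataFrame, a CSV round trip, loc and iloc, missing-data handling, grouping, and a rolling window. The data, column names, and variable names are invented for illustration; they are not taken from the notebooks or datasets in this repository.

```python
import io

import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical values).
prices = pd.Series([10.0, 12.5, np.nan, 11.0], index=["a", "b", "c", "d"], name="price")

# A DataFrame is a two-dimensional table whose columns are Series.
df = pd.DataFrame(
    {
        "group": ["x", "x", "y", "y"],
        "price": prices.to_numpy(),
        "qty": [1, 2, 3, 4],
    },
    index=["a", "b", "c", "d"],
)

# Export to CSV and import it again (an in-memory buffer stands in for a file).
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Label-based selection with loc, position-based selection with iloc.
by_label = df.loc["b", "price"]   # row labeled "b", column "price"
by_position = df.iloc[1, 1]       # second row, second column

# Identify missing values, then impute them with the column mean.
n_missing = int(df["price"].isna().sum())
filled = df["price"].fillna(df["price"].mean())

# Group rows by a key column and aggregate each group.
qty_per_group = df.groupby("group")["qty"].sum()

# Rolling window: mean over the current row and the one before it.
rolling_qty = df["qty"].rolling(window=2).mean()
```

Mean imputation is only one possible strategy; the notebooks may use a different one.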

Dependencies

Isolating the environment for each project is a best practice. One way to create a virtual environment is with Python's built-in venv module. From the root of this directory, run the following to create a virtual environment:

python3 -m venv venv

The command to activate the virtual environment varies by OS. Use the following for Linux or macOS:

source venv/bin/activate

Use the following for Windows:

venv\Scripts\activate

A (venv) prefix in front of your shell prompt indicates that the virtual environment was activated successfully, as shown below:

(venv) C:\Users\User\pandas-tutorial>
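
If the prompt prefix is hidden or you want to double-check, the same thing can be confirmed from Python itself. The sketch below relies on the fact that, inside an environment created by venv, sys.prefix differs from sys.base_prefix. The function name in_virtual_environment is a hypothetical helper, not part of this repository.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```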

Finally, execute the following to install the dependencies for this lab into your activated virtual environment:

pip install -r requirements.txt
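
After installation, a quick import check can confirm that the packages are visible to the active interpreter. This is a minimal sketch: the helper missing_packages is hypothetical, and the list of expected packages assumes that pandas is listed in requirements.txt, which is not shown here. Adjust the list to match the actual file.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: pandas is one of the dependencies in requirements.txt.
expected = ["pandas"]
missing = missing_packages(expected)
print("missing packages:", missing if missing else "none")
```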

Disclaimer

None of the datasets used in this repository belong to the authors; they are open-source datasets found online. The source of each dataset is listed below:

ct.json dataset source
airlines.csv dataset source
Pokemon.csv dataset source
Titanic.csv dataset source
