This course focuses on analyzing data of all types using the Python programming language. No programming experience is necessary.
The course starts with an introduction to, and refresher on, the command line. We then cover the fundamentals of Python and its data types, followed by the data analysis packages NumPy and Pandas, the plotting packages Matplotlib and Seaborn, and further topics including regular expressions, time series, statistics, and interactive visualization.
Jupyter (IPython) notebooks are used throughout. Conda is used for package management and virtual environments. All notebooks are in Python 3 unless otherwise noted.
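Because the notebooks assume Python 3 and a conda-managed set of packages, it can help to check the active environment before starting. The following is a minimal sketch, not part of the course materials. The package list is an assumption drawn from the lessons below, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed import names for packages used in the lessons below
# (import names can differ from conda package names, e.g. scikit-bio -> skbio)
COURSE_PACKAGES = ("numpy", "pandas", "matplotlib", "seaborn", "scipy", "skbio", "bokeh")


def check_environment(packages=COURSE_PACKAGES):
    """Return a dict mapping each top-level package name to True if it can be found."""
    # find_spec locates a top-level module without importing it; None means not installed
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Report the interpreter version, since the notebooks assume Python 3
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print("{:12s} {}".format(name, "ok" if found else "missing"))
```

Run it from the activated conda environment; any package reported as missing can then be installed with `conda install`.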
Luke Thompson, Ph.D.
Lecturer, Scripps Institution of Oceanography
Research Associate, National Oceanic and Atmospheric Administration
lukethompson@gmail.com
All lectures from the 2016 course are available on my YouTube channel, Doc Thompson Data Science.
The lessons below match the Jupyter notebooks in the ipynb directory. Any data files required by those notebooks are provided in the data directory.
| Lesson | Title | Readings | Topics |
|---|---|---|---|
| 0 | Introductions and Syllabus | Obtain Learn Python The Hard Way (Shaw), Python for Data Analysis (McKinney), and Learning Python (Lutz) | Introductions and overview of course |
| 1 | Command Line and Bash | Shaw: The Hard Way Is Easier, Exercise 0, Appendix A: Command Line Crash Course | A full introduction to using the command line, the bash shell, and text editors |
| 2 | Conda, IPython, and Jupyter Notebooks | Install: Miniconda 3 | Conda tutorial (conda environments, Python packages, and pip); Python and IPython on the command line; Jupyter notebook tutorial and Python crash course |
| 3 | Python Basics, Strings, Printing | Shaw: Exercises 1-10; Lutz: Ch 1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython |
| 4 | Taking Input, Reading and Writing Files, Functions | Shaw: Exercises 11-26; Lutz: Ch 9, 14-17 | Taking input, reading files, writing files, functions |
| 5 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Exercises 27-39; Lutz: Ch 8-13 | Logic and loops, lists and list comprehensions, tuples, dictionaries, other types |
| 6 | Python and IPython Review | McKinney: Appendix: Python Language Essentials, Ch 3 | Review of Python commands, IPython review -- enhanced interactive Python shells with support for data visualization, distributed and parallel computation and a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media |
| 7 | Regular Expressions | Grep tutorials: Drew's Grep Tutorial, Linux Grep Tutorial; Python Regular Expressions Tutorial | Regular expression syntax; command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module |
| 8 | NumPy, Pandas, and Matplotlib Crash Course | NumPy overview, Pandas overview, Matplotlib overview | |
| 9 | Pandas Basics | McKinney: Ch 1-2, 4 (Introduction to Scientific Computing with NumPy and Pandas) | Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime |
| 10 | Pandas Advanced | McKinney: Ch 5-7 (Data Analysis with Pandas); Pandas Documentation: Indexing and Selecting Data | concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull |
| 11 | Plotting with Matplotlib | McKinney: Ch 8; J.R. Johansson: Matplotlib 2D and 3D plotting in Python | |
| 12 | Plotting with Seaborn | Seaborn Tutorial | |
| 13 | Pandas Time Series | McKinney: Ch 10; Pandas Documentation: Time Series and Date | |
| 14 | Pandas Group Operations | McKinney: Ch 9 | groupby, melt, pivot, inplace=True, reindex |
| 15 | Statistics Packages | Statistics capabilities of Pandas, NumPy, SciPy, and scikit-bio | |
| 16 | Interactive Visualization with Bokeh | Bokeh IPython Notebooks | |
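Lesson 7 pairs command-line tools such as grep and sed with Python's built-in string methods and the `re` module. The following is a minimal sketch of that comparison, not taken from the course notebooks: the sample lines and the `extract_readings` helper are hypothetical.

```python
import re

# Hypothetical lines standing in for a text file you might search with grep
lines = [
    "sample_001  2016-03-14  temp=18.5",
    "sample_002  2016-03-15  temp=19.1",
    "control     2016-03-15  temp=n/a",
]

# Built-in string methods suit fixed substrings (similar to `grep sample_`)
with_prefix = [line for line in lines if line.startswith("sample_")]

# The re module handles structured patterns: a date followed by a numeric temperature
pattern = re.compile(r"(\d{4}-\d{2}-\d{2})\s+temp=(\d+(?:\.\d+)?)")


def extract_readings(text_lines):
    """Return (date, temperature) tuples for lines with a numeric temperature."""
    readings = []
    for line in text_lines:
        match = pattern.search(line)
        if match:  # lines such as temp=n/a do not match and are skipped
            readings.append((match.group(1), float(match.group(2))))
    return readings


# re.sub works like sed's s/// substitution: collapse runs of whitespace into commas
as_csv = [re.sub(r"\s+", ",", line) for line in lines]

print(with_prefix)
print(extract_readings(lines))
print(as_csv)
```

Plain string methods are simpler and faster when the text is fixed; a compiled pattern is worth it once the match depends on structure, such as digits in a particular layout.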