Anaconda includes all the libraries (scikit-learn, matplotlib, pandas, numpy) and the Jupyter Notebook application that we will use in this assignment. Follow the installation instructions for your machine here: https://www.anaconda.com/distribution/
(If you already have Anaconda installed, please ensure it is up to date!) This is written assuming use of Python 3.6 or greater.
If you are using a lab machine (or lack space on your personal machine), the Anaconda distribution is too large to install! You have two options: (1) follow the instructions for Google Colab below, or (2) run Jupyter notebooks with the default Python by opening the files in VS Code and using pip to install the modules as needed.
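Whichever route you take, it can help to confirm that your Python environment is ready before opening the notebook. The following is a minimal sketch, not part of the assignment: the function names (missing_modules, python_is_recent_enough, report) are made up for illustration, and the suggested pip command assumes that python on your machine refers to the interpreter you plan to use.

```python
import importlib.util
import sys

# Import name -> package name to give pip.
# Only scikit-learn differs: it is imported as "sklearn".
REQUIRED = {
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "numpy": "numpy",
}


def missing_modules(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 6)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def report():
    """Build a list of human-readable messages about the environment."""
    lines = []
    if not python_is_recent_enough():
        lines.append("Python 3.6 or greater is expected; found {}.{}.".format(
            sys.version_info[0], sys.version_info[1]))
    missing = missing_modules(REQUIRED)
    if missing:
        packages = " ".join(REQUIRED[name] for name in missing)
        lines.append("Missing: {}. Try: python -m pip install {}".format(
            ", ".join(missing), packages))
    return lines or ["Environment looks ready."]


if __name__ == "__main__":
    print("\n".join(report()))
```

If you installed Anaconda, run this with the Anaconda Python; with the default Python it tells you which modules still need to be installed with pip.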
Download the ipynb notebook file in this repository as well as data.zip. Place both of these files into a well-named folder on your computer, and click on the zip file to unzip the data folder. Rename the .ipynb file to [yourunix]_haii24a[assignmentnumber], e.g., ikh1_haii24a1. You will submit this file to Glow when completed.
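If you prefer to script these steps rather than click through them, the sketch below unzips data.zip and renames the notebook using only the Python standard library. It is an illustration under assumptions: the helper names (submission_name, prepare_folder) are hypothetical, and it assumes the renamed file keeps its .ipynb extension.

```python
import zipfile
from pathlib import Path


def submission_name(unix_id, assignment_number):
    """Build the file name [yourunix]_haii24a[assignmentnumber].ipynb."""
    # Assumption: the .ipynb extension is kept after renaming.
    return f"{unix_id}_haii24a{assignment_number}.ipynb"


def prepare_folder(folder, notebook, unix_id, assignment_number):
    """Unzip data.zip inside `folder` and rename the notebook file.

    `notebook` is the name of the downloaded .ipynb file in `folder`.
    Returns the path of the renamed notebook.
    """
    folder = Path(folder)

    # Extract the archive next to the notebook.
    with zipfile.ZipFile(folder / "data.zip") as archive:
        archive.extractall(folder)

    # Rename the notebook to the submission name.
    target = folder / submission_name(unix_id, assignment_number)
    (folder / notebook).rename(target)
    return target
```

For example, calling prepare_folder(".", "assignment.ipynb", "ikh1", 1) from inside your homework folder would produce ikh1_haii24a1.ipynb; here assignment.ipynb is a placeholder for whatever the downloaded notebook is actually called.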
If you have VS Code, it has built-in support for running Jupyter Notebooks. You'll be prompted to select a Python environment; choose the Anaconda Python environment so that you don't have to install scikit-learn, pandas, numpy, etc. separately.
In your terminal, navigate to your homework folder and run the command jupyter notebook . (the trailing dot is part of the command and refers to the current folder) to start the notebook server. A notebook session should open up in your browser.
On a Mac, you can run a Jupyter notebook from Terminal by typing jupyter notebook name_of_file_here.ipynb. On Windows, you can do the same from the 'Anaconda Prompt' that comes with the Anaconda distribution.
Once Anaconda is installed on your machine, open an application called Anaconda-Navigator. On the main page, click the 'launch' button on the Jupyter Notebook tile. A notebook session should open up in your browser.
If you lack space on your machine for installing libraries, Google Colab can be a good option. Google Colab is a cloud-based Jupyter Notebook. Instructions for using Google Colab for this assignment are available on the assignment page.
If you haven't used Jupyter Notebooks before, then read the Jupyter Notebook Tutorial: The Definitive Guide and take the User Interface Tour in the Jupyter notebook Help menu once you've opened your first Jupyter Notebook.
- Jupyter Notebook Tutorial: The Definitive Guide
- Python3 Documentation (tutorial and library reference are likely useful)
- Python tutorials from w3schools and Scrimba
- Python pandas documentation
- A list of python pandas tutorials
- scikit-learn tutorial
- How to Handle Imbalanced Classes in Machine Learning: Down-sample Majority Class
- scikit-learn models (many more in the User Guide):