Aditi Gajjar, Anagha Sikha, Othilia Norell, Soren Paetau, Nicholas Tan
agajjar@calpoly.edu / arsikha@calpoly.edu / onorell@calpoly.edu / spaetau@calpoly.edu / nktan@calpoly.edu
- Run python3 randomForest.py. To run the code on different subsets of attributes (all, without best attributes, and without worst attributes), update line 34 in randomForest.py; instructions are in randomForest.py
- For KMeans, run python3 kmeans.py spotify_songs.csv [-p (kmeans_plus)] [-m (manhattan dist)] [-n (normalize data)] [-t (testing)]
- For DBScan, run python3 dbscan.py spotify_songs.csv [-m (manhattan dist)]
- To run C4.5, run python3 InduceC45.py spotify_new_train.csv in the terminal
- To classify the test dataset with the decision tree produced by InduceC45.py, run python3 classify.py spotify_new_test.csv spotify_new_train.json
- To run Random Forests, run python3 randomForest.py spotify_songs_new.csv followed by its numeric arguments (example: python3 randomForest.py spotify_songs_new.csv 9 600 50)
- Results of the random forests classification are found in spotify_songs_new_rf_all.txt for all 9 attributes
- Results of the random forests classification after hyperparameter tuning are found in spotify_songs_new_rf.txt for 4 attributes
- For collaborative filtering, refer to the README in CollabFiltering; use -sp with the given test cases
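
For reference, the optional KMeans flags listed above could be parsed roughly as follows. This is only a sketch assuming Python's standard argparse module; the function name build_parser and the attribute names (kmeans_plus, manhattan, normalize, testing) are hypothetical, and the actual kmeans.py may handle its arguments differently.

```python
import argparse


def build_parser():
    # Hypothetical parser: flag letters mirror the kmeans.py usage line,
    # but the real script may parse its arguments another way.
    parser = argparse.ArgumentParser(prog="kmeans.py")
    parser.add_argument("csv_path", help="input CSV, e.g. spotify_songs.csv")
    parser.add_argument("-p", dest="kmeans_plus", action="store_true",
                        help="use the kmeans_plus option")
    parser.add_argument("-m", dest="manhattan", action="store_true",
                        help="use Manhattan distance")
    parser.add_argument("-n", dest="normalize", action="store_true",
                        help="normalize the data")
    parser.add_argument("-t", dest="testing", action="store_true",
                        help="run in testing mode")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["spotify_songs.csv", "-p", "-n"])
print(args.csv_path, args.kmeans_plus, args.manhattan, args.normalize, args.testing)
```

Each flag is a simple on/off switch, so omitting a flag leaves that option disabled.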