Project for CDIPS Data Science Workshop Summer 2014
Adam Kalman, Aleksey Kocherzhenko, Henoch Wong
- Put all the files from this repository into a local directory.
- Inside that directory, create empty subdirectories "Data" and "interdata".
- Put avito_test.tsv and avito_train.tsv into "Data". These files are over 1 GB, so they are not included here; they are available from Kaggle.
- Edit the path settings at the beginning of each script to match your local paths.
- Modify split.py to choose the training set size, then run it.
- Run classifycategories.py, illicitcontent.py, mergeSolutions.py, and finalMerge.py, in that order.
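The steps above can be collected into a small driver script. This is a minimal sketch, not part of the project. It assumes each script runs with the same Python interpreter, takes no command-line arguments, and works when started from the project directory. It also assumes you have already edited the paths and chosen the training set size in split.py by hand. The file name run_all.py and the helpers prepare_dirs, missing_inputs, and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Directory and file names taken from the setup steps.
SUBDIRS = ("Data", "interdata")
INPUTS = ("avito_train.tsv", "avito_test.tsv")
# split.py first, then the four scripts in the order the steps require.
PIPELINE = ("split.py", "classifycategories.py", "illicitcontent.py",
            "mergeSolutions.py", "finalMerge.py")


def prepare_dirs(root: Path) -> None:
    """Create the Data/ and interdata/ subdirectories if they do not exist."""
    for name in SUBDIRS:
        (root / name).mkdir(exist_ok=True)


def missing_inputs(root: Path) -> list:
    """Return the names of the Kaggle input files not yet present in Data/."""
    return [f for f in INPUTS if not (root / "Data" / f).is_file()]


def run_pipeline(root: Path, scripts=PIPELINE) -> None:
    """Run each script in order with root as the working directory.

    check=True raises CalledProcessError at the first script that fails,
    so later steps never run on incomplete intermediate data.
    """
    for script in scripts:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, script], cwd=root, check=True)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} PROJECT_DIR")
    else:
        project = Path(sys.argv[1])
        prepare_dirs(project)
        missing = missing_inputs(project)
        if missing:
            print("Download from Kaggle into Data/: " + ", ".join(missing))
        else:
            run_pipeline(project)
```

For example, `python run_all.py /path/to/project`. Stopping at the first failing script is a deliberate choice: later scripts presumably depend on output written to "interdata" by earlier ones, so continuing after an error would likely produce misleading results.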