Homework 1
- Download docs.trn.tsv.
- Each line in the file represents one document, in the following format:
    line     ::= <label><tab><document>
    document ::= <token>(<space><token>)*
- Create a vector for each document using both bag-of-words and TF-IDF. Sample Python code for creating the vectors is provided in hw1.py.
- Implement the k-means clustering algorithm and run it on all documents with both the bag-of-words and the TF-IDF vectors, using k = 7.
- Experiment with different sets of randomly selected initial centroids, and measure the purity score of each trial.
- Implement and experiment with the k-means++ clustering algorithm, and compare its results to those of the k-means clustering algorithm.
- Write a report describing your approach, results, and analysis, using the ACL LaTeX template.
- Compress your code and report into hw1.zip and submit it to: https://canvas.emory.edu/courses/29596/assignments/30886
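The docs.trn.tsv format described in the task list can be read with a few lines of Python. This is a minimal sketch written for illustration, not part of the provided materials: the names `parse_line` and `read_documents` are hypothetical, and it assumes the file is UTF-8 encoded and that every non-empty line follows the `<label><tab><document>` grammar.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split one line of docs.trn.tsv into (label, tokens).

    The label ends at the first tab; the rest of the line is the document,
    whose tokens are separated by spaces.
    """
    label, document = line.rstrip("\n").split("\t", 1)
    # str.split() with no argument splits on runs of whitespace, which
    # matches the single-space grammar for well-formed lines.
    return label, document.split()


def read_documents(path: str) -> Tuple[List[str], List[List[str]]]:
    """Read all non-empty lines into parallel label and token lists."""
    labels: List[str] = []
    docs: List[List[str]] = []
    with open(path, encoding="utf-8") as fin:  # assumption: UTF-8 input
        for line in fin:
            if not line.strip():
                continue  # skip blank lines such as a trailing newline
            label, tokens = parse_line(line)
            labels.append(label)
            docs.append(tokens)
    return labels, docs
```

For example, `labels, docs = read_documents("docs.trn.tsv")` yields the gold labels (useful later for purity) alongside the tokenized documents.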
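For the vector-creation step, the following is an independent sketch of bag-of-words and TF-IDF vectors; it is not the provided hw1.py, whose weighting scheme may differ. It assumes raw counts as term frequency and the unsmoothed idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t, and it stores each vector sparsely as a dict from vocabulary index to weight. The names `build_vocabulary`, `bag_of_words`, and `tf_idf` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# A sparse vector maps vocabulary indices to weights; absent indices are 0.
SparseVector = Dict[int, float]


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    """Give every distinct token an index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(docs: List[List[str]], vocab: Dict[str, int]) -> List[SparseVector]:
    """Raw term counts for each document."""
    return [{vocab[t]: float(c) for t, c in Counter(tokens).items()} for tokens in docs]


def tf_idf(docs: List[List[str]], vocab: Dict[str, int]) -> List[SparseVector]:
    """Weight each term by tf(t, d) * log(N / df(t)), with tf as the raw count."""
    n = len(docs)
    # df[t]: number of documents containing t at least once.
    df = Counter(t for tokens in docs for t in set(tokens))
    return [
        {vocab[t]: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in docs
    ]
```

Under this unsmoothed idf, a term that occurs in every document gets weight 0; smoothed idf variants avoid that, so it is worth checking which variant hw1.py uses before comparing results.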
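For the k-means step, here is a minimal sketch of Lloyd's algorithm on dense vectors with squared Euclidean distance, where the initial centroids are k points drawn at random from the data. The distance measure, the `max_iter` cap, the handling of empty clusters (they keep their previous centroid), and the names `kmeans`, `assign`, and `mean` are all assumptions; for sparse text vectors, cosine similarity on length-normalized vectors is a common alternative. The sketch expects dense lists of floats, so sparse document vectors must first be expanded to vocabulary length.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def assign(points: List[Vector], centroids: List[Vector]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(
    points: List[Vector],
    k: int = 7,
    max_iter: int = 100,
    init: Optional[List[Vector]] = None,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means; returns (cluster id per point, final centroids).

    Unless `init` is given, the initial centroids are k points sampled
    without replacement from the data, so each seed yields a different trial.
    """
    rng = random.Random(seed)
    source = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in source]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
        # Assignment step: stop once no point changes cluster.
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Running `kmeans(vectors, k=7, seed=s)` for several values of `s` gives the different sets of randomly selected initial centroids that the experiment calls for.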
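Purity, used to score each trial, gives every cluster the gold label that occurs most often in it and counts the fraction of documents that match that majority label: purity = (1/N) Σ_i max_j |ω_i ∩ c_j|, where ω_i is the i-th cluster, c_j is the set of documents with gold label j, and N is the number of documents. A minimal sketch, assuming the gold labels are the `<label>` fields of docs.trn.tsv; the name `purity` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, List


def purity(cluster_ids: List[int], gold_labels: List[Hashable]) -> float:
    """(1/N) * sum over clusters of the size of that cluster's majority gold label."""
    if not gold_labels or len(cluster_ids) != len(gold_labels):
        raise ValueError("need equally long, non-empty cluster_ids and gold_labels")
    # counts[c][g]: how many documents in cluster c have gold label g.
    counts: DefaultDict[int, Counter] = defaultdict(Counter)
    for c, g in zip(cluster_ids, gold_labels):
        counts[c][g] += 1
    majority = sum(max(label_counts.values()) for label_counts in counts.values())
    return majority / len(gold_labels)
```

For instance, `purity([0, 0, 1], ["a", "b", "b"])` returns 2/3: cluster 0 contributes one majority document and cluster 1 contributes one. Note that purity trivially reaches 1.0 when every document is its own cluster, so it is only comparable across runs with the same k.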
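k-means++ differs from k-means only in how the initial centroids are chosen: the first is drawn uniformly from the data, and each later one is drawn with probability proportional to D(x)^2, the squared distance from point x to its nearest already-chosen centroid. A minimal sketch of that seeding step, assuming dense vectors and squared Euclidean distance; the name `kmeans_pp_init` is hypothetical.

```python
import random
from typing import List, Optional

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Vector], k: int = 7, seed: Optional[int] = None) -> List[Vector]:
    """Choose k initial centroids with k-means++ seeding."""
    rng = random.Random(seed)
    # First centroid: uniformly at random.
    centroids = [list(rng.choice(points))]
    # d2[i]: squared distance from points[i] to its nearest chosen centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Pick index i with probability d2[i] / total.
            idx = rng.choices(range(len(points)), weights=d2)[0]
        centroids.append(list(points[idx]))
        # Only the newest centroid can shrink each point's nearest distance.
        d2 = [min(d, squared_distance(p, centroids[-1])) for d, p in zip(d2, points)]
    return centroids
```

The returned centroids take the place of the randomly selected ones, after which the usual k-means iterations run unchanged; this keeps the comparison with plain k-means limited to the seeding strategy.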
Copyright © 2015-2019 Emory University - All Rights Reserved.
