tf-idf.py
Preprocess dataset.
Extract text from CSV,
tokenise into n-grams,
compute TF*IDF,
output cosine similarity matrix,
encode to JSON.
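
The steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the actual contents of tf-idf.py: the CSV column name `text`, word-level bigrams, the `tf * log(N / df)` weighting, and all function names are assumptions made for the example.

```python
import csv
import io
import json
import math
from collections import Counter


def ngrams(text, n=2):
    """Lowercase the text, split on whitespace, and return word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Return one sparse {term: weight} dict per document.

    Assumed weighting: relative term frequency times log(N / df).
    """
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        vectors.append(
            {t: (k / length) * math.log(total / df[t]) for t, k in c.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_json(csv_text, column="text", n=2):
    """Read documents from CSV text and return the similarity matrix as JSON."""
    rows = csv.DictReader(io.StringIO(csv_text))
    docs = [row[column] for row in rows]
    vectors = tfidf_vectors(docs, n)
    matrix = [[cosine(a, b) for b in vectors] for a in vectors]
    return json.dumps(matrix)
```

With this weighting, an n-gram that occurs in every document gets weight zero, so it contributes nothing to the similarity. A real script may well smooth the IDF term or use a library vectoriser instead.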
t-sne.py
t-SNE visualisation.
Extract data from JSON,
(run PCA if high-dimensional),
run t-SNE,
output coordinate matrix,
plot in 2D,
save to CSV.
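
A rough sketch of these steps, assuming NumPy and scikit-learn are installed and that the JSON file holds a list of equal-length numeric rows. It is not the actual t-sne.py: the threshold of 50 dimensions for the PCA step, the perplexity, the `x`/`y` CSV header, and the function names are all assumptions, and the plotting step is left out to keep the example free of a display dependency.

```python
import csv
import json

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_2d(json_text, pca_threshold=50, perplexity=5.0, seed=0):
    """Parse a JSON matrix and return an (n_samples, 2) t-SNE embedding."""
    X = np.asarray(json.loads(json_text), dtype=float)
    # Optional PCA step: only applied when the input has many columns.
    if X.shape[1] > pca_threshold:
        # PCA cannot keep more components than there are samples.
        n_comp = min(pca_threshold, X.shape[0])
        X = PCA(n_components=n_comp, random_state=seed).fit_transform(X)
    # perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(X)


def save_csv(coords, path):
    """Write the 2D coordinates to a CSV file with an x,y header row."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["x", "y"])
        writer.writerows(coords.tolist())
```

Fixing `random_state` makes repeated runs comparable. t-SNE output still depends heavily on perplexity, so the default used here should be treated as a placeholder to tune per dataset.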
Live demo and details: http://34.121.78.114/