Repository used for storing data relevant to the CLAIR Ubuntu project.

jpeper/clair-ubuntu

clair-ubuntu

Repository used for storing files relevant to the CLAIR Ubuntu disentanglement project. Implementations of perceptron models for use on the Ubuntu dataset can be found here: https://github.com/jpeper/irc-perceptron

raw_data_processing.py
Handles file processing and uses a bag-of-words model to cluster messages. Also performs very primitive data annotation.
Usage: Specify the input filename as a command-line argument. While the program is running, the user is given the option of performing k-means, mean shift, or spectral clustering, and is prompted at that time to enter the values of any relevant parameters.
For more information on clustering methods, see the following: http://scikit-learn.org/stable/modules/clustering.html
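
As a rough illustration of the approach described above, the following sketch vectorizes messages as bag-of-words counts and clusters them with scikit-learn. It is not the code in raw_data_processing.py: the function name `cluster_messages`, its parameters, and the sample messages are hypothetical, and the script's actual tokenization, prompts, and defaults may differ.

```python
from sklearn.cluster import KMeans, MeanShift, SpectralClustering
from sklearn.feature_extraction.text import CountVectorizer


def cluster_messages(messages, method="kmeans", n_clusters=2, bandwidth=None):
    """Cluster messages on bag-of-words counts; returns one label per message.

    Hypothetical helper for illustration only, not taken from the repository.
    """
    # Bag of words: each message becomes a vector of token counts.
    features = CountVectorizer().fit_transform(messages)

    if method == "kmeans":
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        return model.fit_predict(features)
    if method == "meanshift":
        # MeanShift takes no cluster count; bandwidth=None lets
        # scikit-learn estimate the bandwidth from the data.
        return MeanShift(bandwidth=bandwidth).fit_predict(features.toarray())
    if method == "spectral":
        model = SpectralClustering(n_clusters=n_clusters, random_state=0)
        return model.fit_predict(features.toarray())
    raise ValueError(f"unknown clustering method: {method}")


# Made-up sample messages (not from the Ubuntu dataset).
messages = [
    "nvidia driver install",
    "nvidia driver install error",
    "wifi network connection",
    "wifi network connection dropped",
]
labels = cluster_messages(messages, method="kmeans", n_clusters=2)
print(labels)
```

The relevant parameter differs by method: k-means and spectral clustering need a cluster count, while mean shift is controlled by a bandwidth, which is presumably why the program asks for parameters only after a method is chosen.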

basicstats.py
Program that calculates basic statistics for a file from the Ubuntu dataset.
Usage: Specify the input filename as a command-line argument.
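
The README does not list which statistics basicstats.py reports, so the sketch below is only a guess at the general shape of such a script. It assumes one message per line and whitespace tokenization, and the chosen statistics (message count, token count, vocabulary size, mean tokens per message) and the function name `basic_stats` are hypothetical.

```python
import sys


def basic_stats(lines):
    """Return simple corpus statistics for an iterable of message lines.

    Assumes one message per line; blank lines are ignored.
    """
    messages = [line.strip() for line in lines if line.strip()]
    tokens = [tok for msg in messages for tok in msg.split()]
    return {
        "messages": len(messages),
        "tokens": len(tokens),
        "vocabulary": len(set(tokens)),
        "mean_tokens_per_message": len(tokens) / len(messages) if messages else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Mirrors the documented usage: input filename as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name, value in basic_stats(handle).items():
            print(f"{name}: {value}")
```

Saved under a name of your choosing, this could be run against one of the sample files, for example ubuntu_small_sample.txt.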

Example files from the Ubuntu dataset:
ubuntu_small_sample.txt
Small snippet (~30 messages) from the Ubuntu logs
ubuntu_medium_sample.txt
Moderately sized file (~170 messages) containing dialogue from the Ubuntu IRC chat
ubuntu_large_sample.txt
Large file (~4000 messages) from the Ubuntu logs

License

The code under src/ is licensed under the MIT license, while the data under data/ is licensed under the CC-BY-4.0 license, in both cases copyright Joseph Peper and Jonathan K. Kummerfeld. For details, see the LICENSE files in each folder.
