clair-ubuntu

Repository for files relevant to the CLAIR Ubuntu disentanglement project. An implementation of perceptron models for use on the Ubuntu dataset can be found here: https://github.com/jpeper/irc-perceptron

raw_data_processing.py
Handles file processing and uses a bag-of-words model to cluster messages. Also performs very primitive data annotation.
Usage: Specify the input filename as a command-line argument. While the program runs, the user is given the option of performing k-means, mean shift, or spectral clustering, and is then prompted to enter the values of any relevant parameters.
For more information on clustering methods, see the following: http://scikit-learn.org/stable/modules/clustering.html
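
The bag-of-words clustering described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the code in raw_data_processing.py: the function name `cluster_messages`, its parameters, and the sample messages are all assumptions made for the example, and only k-means is shown.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer


def cluster_messages(messages, n_clusters=2, seed=0):
    """Cluster messages by word counts (hypothetical helper, not from this repo)."""
    # Bag of words: each message becomes a vector of raw token counts.
    vectors = CountVectorizer().fit_transform(messages)
    # k-means on the count vectors; a fixed seed keeps runs reproducible.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors).tolist()


if __name__ == "__main__":
    # Made-up messages for illustration only, not taken from the dataset.
    sample = [
        "wifi driver not working",
        "wifi driver install",
        "broken wifi driver",
        "grub boot error",
        "grub boot menu",
        "fix grub boot",
    ]
    for label, text in zip(cluster_messages(sample), sample):
        print(label, text)
```

Mean shift and spectral clustering are available in the same scikit-learn module as `MeanShift` and `SpectralClustering`, each with its own parameters, which is presumably why the program prompts for values after the method is chosen.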

basicstats.py
Calculates basic statistics for a file from the Ubuntu dataset.
Usage: Specify the input filename as a command-line argument.
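
As a rough idea of what such statistics might look like, here is a small sketch. It is not the code in basicstats.py: the line format `[HH:MM] <nick> message`, the helper name `basic_stats`, and the particular statistics reported are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed line format: "[HH:MM] <nick> message text".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\]\s+<([^>]+)>\s+(.*)$")


def basic_stats(lines):
    """Summarize chat lines (hypothetical helper, not from this repo)."""
    speakers = Counter()
    total_words = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like user messages
        _, nick, text = match.groups()
        speakers[nick] += 1
        total_words += len(text.split())
    n_messages = sum(speakers.values())
    return {
        "messages": n_messages,
        "speakers": len(speakers),
        "mean_words_per_message": total_words / n_messages if n_messages else 0.0,
        "most_active": speakers.most_common(1)[0][0] if speakers else None,
    }


if __name__ == "__main__":
    import sys

    # Input filename is taken from the command line, as in the usage note.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(basic_stats(handle))
```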

Example files from Ubuntu dataset:
ubuntu_small_sample.txt
small snippet (~30 messages) from Ubuntu logs
ubuntu_medium_sample.txt
moderately sized (~170 messages) file containing dialogue from Ubuntu IRC chat
ubuntu_large_sample.txt
large (~4000 messages) file from Ubuntu logs

License

The code under src/ is licensed under the MIT license, while the data under data/ is licensed under the CC-BY-4.0 license, in both cases copyright Joseph Peper and Jonathan K. Kummerfeld. For details, see the LICENSE files in each folder.
