FoodSentimentObservatory/summarisation-scripts

summarisation-scripts

Scripts that process JST data into a more meaningful and readable format.

Prerequisites

Required installations

Follow the instructions provided by the above-mentioned packages to install and run them.

Other requirements

A sample config file is available.

Create a data directory for all input files and a results folder for the output.

Input files

The summarisation takes five input items: four text files (documentThetha, documentPi, topicWords and topicSentences) and a folder of raw tweets.

The contents of the four text files are copied from the JST final results (final.thetha, final.pi, final.twords and final.topSentences) and saved in txt format under the names listed above. The names are only specified in the config file, so other names can be used, but keeping these names saves time because the config file then needs no changes.

Note: do not change the extension of the JST files themselves, as this ruins their formatting. Copy the contents and paste them into a new file instead.
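The copy step described above can also be scripted. The following is a minimal sketch, not part of this repository: the function name `copy_jst_results`, the `.txt` suffix on the target names, and the directory arguments are assumptions made for illustration. It copies the contents of each JST result file into a new file and leaves the originals untouched.

```python
import shutil
from pathlib import Path

# JST result file names (as listed in this README) mapped to the names the
# summarisation expects. The ".txt" suffix is an assumption.
JST_TO_INPUT = {
    "final.thetha": "documentThetha.txt",
    "final.pi": "documentPi.txt",
    "final.twords": "topicWords.txt",
    "final.topSentences": "topicSentences.txt",
}


def copy_jst_results(jst_dir, data_dir):
    """Copy the JST result files into data_dir under the new names.

    The original JST files are neither renamed nor modified.
    Returns the list of files written.
    """
    jst_dir, data_dir = Path(jst_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source_name, target_name in JST_TO_INPUT.items():
        source = jst_dir / source_name
        if not source.is_file():
            raise FileNotFoundError(f"missing JST result file: {source}")
        target = data_dir / target_name
        shutil.copyfile(source, target)  # copies contents only
        written.append(target)
    return written


# Example call (both paths are hypothetical):
# copy_jst_results("path/to/jst/output", "data")
```

If the config file uses different file names, the mapping at the top is the only part that needs to change.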

The raw-tweets folder is a subdirectory of the data folder. It contains the raw texts generated by the raw-tweets.py script in pyMysql.
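Before running the summarisation it can help to confirm that all five inputs are in place. The sketch below is one way to do that, under assumptions: the helper `missing_inputs` is hypothetical, and the file names passed to it should match whatever the config file specifies.

```python
from pathlib import Path


def missing_inputs(data_dir, text_files, tweets_folder="raw-tweets"):
    """Return the names of expected inputs that are absent from data_dir.

    text_files    -- names of the four text files, as given in the config file
    tweets_folder -- name of the raw tweets subdirectory (assumed default)
    """
    data_dir = Path(data_dir)
    missing = [name for name in text_files if not (data_dir / name).is_file()]
    if not (data_dir / tweets_folder).is_dir():
        missing.append(tweets_folder + "/")
    return missing


# Example call (directory and file names are assumptions):
# missing_inputs("data", ["documentThetha.txt", "documentPi.txt",
#                         "topicWords.txt", "topicSentences.txt"])
```

An empty list means every expected file and the raw tweets folder were found.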

Running the code

Open a console, navigate to the directory of the summarisation-scripts package, and type:

summary_of_topics.py

Note: After each run of the script, copy the result files to a different directory or give them a relevant name. The code cannot currently generate distinct names, so each run simply overwrites the previous result files.
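One way to avoid losing results between runs is to copy the results folder to a labelled location after each run. This is a minimal sketch rather than part of the package: `archive_results`, the timestamp label format, and the directory names are all assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_results(results_dir, archive_root, label=None):
    """Copy results_dir to archive_root/<label> and return the new path.

    If no label is given, a timestamp such as 20240131-154500 is used.
    shutil.copytree raises FileExistsError when the target already exists,
    so an earlier archive is never silently overwritten.
    """
    if label is None:
        label = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(archive_root) / label
    shutil.copytree(Path(results_dir), target)
    return target


# Example call (directory names and label are hypothetical):
# archive_results("results", "archived-results", label="first-run")
```

Passing a descriptive label makes it easier to tell later which JST run each set of result files came from.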

Output files

  • JSON file used to visualise the summaries in HTML

  • text file of all topic summaries

  • spreadsheet of all topic summaries, ordered by topic importance

  • spreadsheet of all topics, ordered by importance

  • CSV file of all topics

About

Python code to parse and sort JST results
