meilinshi/soc-twitter

SNAPP Soil Organic Carbon -- Tweet Parsing and Analysis

This repository contains several scripts developed to process Twitter data to investigate how soil organic content and health are related.

Two different data sources are used:

  • Data collected directly from the Twitter API
  • Data from the Twitter archives

For the archive data:

  • Main: raw_data_processing.R
    • reads raw Twitter datasets from different sources (JSON or CSV format)
    • cleans and standardizes them to enable a merge
    • runs a simple analysis of what the data looks like
    • to correct parsing errors found in the CSV files derived from the API (cell overlap), use fixed_tweet.sh. Important: edit this script from the command line and NOT from R, because it deals with hidden characters.
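
The clean, standardize, and merge steps listed above can be sketched in base R as follows. This is a minimal illustration, not the code in raw_data_processing.R: the column names (`id_str`, `full_text`, `status_id`, `created`), the sample rows, and the `standardize()` helper are all assumptions made for the example.

```r
# Toy stand-ins for the two sources. Column names are hypothetical;
# the real archive and API exports may use different ones.
archive <- data.frame(
  id_str     = c("101", "102"),
  full_text  = c("Healthy soil stores carbon ", "Cover crops and soil health"),
  created_at = c("2018-03-01", "2018-03-02"),
  stringsAsFactors = FALSE
)
api <- data.frame(
  status_id = c("102", "103"),
  text      = c("Cover crops and soil health", "Soil organic carbon matters"),
  created   = c("2018-03-02", "2018-03-05"),
  stringsAsFactors = FALSE
)

# Map one source onto a shared schema so that rows can be stacked.
standardize <- function(df, id_col, text_col, date_col, source) {
  data.frame(
    tweet_id   = as.character(df[[id_col]]),  # IDs as text to avoid precision loss
    text       = trimws(df[[text_col]]),      # drop leading/trailing whitespace
    created_at = as.Date(df[[date_col]]),
    source     = source,
    stringsAsFactors = FALSE
  )
}

merged <- rbind(
  standardize(archive, "id_str", "full_text", "created_at", "archive"),
  standardize(api, "status_id", "text", "created", "api")
)

# A tweet can appear in both sources; keep the first occurrence of each ID.
merged <- merged[!duplicated(merged$tweet_id), ]
print(merged)
```

Reading tweet IDs as character strings is a deliberate choice here, since long numeric IDs can lose precision when stored as doubles.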

For the data collected via the Twitter API:

  • Main: automate.R
    • runs every week, collecting the last 6-9 days of Twitter data based on the query words in tag_list.csv
    • cleans and standardizes the data to enable a merge

  • Initial data exploration:

    • Data_viz_script.R: Data visualization and exploration
    • Sentiment_test.R: Used to explore text mining options with the archived JSON data. Reproducible for the larger merged dataset.
  • More specific exploration and visualizations can be found in the following folders (see their respective READMEs for more detailed information about specific analyses):

    • tweet_content: various ways of visualizing the content of tweets by different categories
    • influencers: attempts to identify what type of content appeals to different user groups
    • each of the above relies on the functions within text_analysis_functions.R
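
For the weekly collection step described under automate.R, the query and date window might be prepared along these lines. This is only a sketch under stated assumptions: the `tag` column name, the sample query words, the fixed run date, and the `quote_phrase()` helper are hypothetical, and the actual call to the Twitter API is left out.

```r
# Stand-in for tag_list.csv, written to a temporary file.
# The real file's layout and column name may differ.
tag_file <- tempfile(fileext = ".csv")
writeLines(c("tag", "soil health", "#soilcarbon", "soil organic carbon"), tag_file)

tags <- read.csv(tag_file, stringsAsFactors = FALSE)$tag

# Wrap multi-word phrases in double quotes, then join all terms with OR.
quote_phrase <- function(x) ifelse(grepl(" ", x), paste0('"', x, '"'), x)
query <- paste(quote_phrase(tags), collapse = " OR ")

# Collection window: the 9 days ending on the run date.
# A fixed date is used here so the example is reproducible.
run_date     <- as.Date("2019-06-10")
window_start <- run_date - 9

print(query)
print(window_start)
```

In a scheduled run, `run_date` would presumably be `Sys.Date()`, and `query` would be passed to whichever API client the script uses.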

The translation folder contains scripts for translating Hindi using Google Translate via its web interface.


The pre_processing folder contains scripts for specific tasks (usually run once).
