diff --git a/methodshub.qmd b/methodshub.qmd index 1187d6b..551e5a1 100644 --- a/methodshub.qmd +++ b/methodshub.qmd @@ -6,9 +6,36 @@ format: gfm: default --- -## Description +# oolong - Create Validation Tests for Automated Content Analysis + - +## Description + Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009) and word set intrusion (Ying et al. 2021) [doi:10.1017/pan.2021.33](https://doi.org/10.1017/pan.2021.33) tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) [doi:10.1080/10584609.2020.1723752](https://doi.org/10.1080/10584609.2020.1723752). @@ -18,53 +45,31 @@ Intended to create standard human-in-the-loop validity tests for typical automat * Text Analysis * Topic Model -## Science Usecase(s) - - - +## Use Cases + This package was used in the literature to validate topic models and prediction models trained on text data, e.g. [Rauchfleisch et al. (2023)](https://doi.org/10.1080/17512786.2022.2110928), [Rothut et al. (2023)](https://doi.org/10.1177/14614448231164409), [Eisele et al. (2023)](https://doi.org/10.1080/19312458.2023.2230560). -## Repository structure - -This repository follows [the standard structure of an R package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure). - -## Environment Setup - -With R installed: - -```r -install.packages("oolong") -``` - - - - - +## Input Data + -## Input Data - - - - -The input data has to be a topic model or prediction model trained on text data. 
For example, one can train a topic model from the text data (tweets from Donald Trump) included in the package by: +A sample input is a model trained on text data, e.g. -```r +```{r} +#| message: false +library(oolong) library(seededlda) -library(quanteda) -trump_corpus <- corpus(trump2k) -tokens(trump_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, - split_hyphens = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% - tokens_remove(stopwords("en")) %>% tokens_remove("@*") -> trump_toks - -model <- textmodel_lda(x = dfm(trump_toks), k = 8, verbose = TRUE) +abstracts_seededlda ``` -## Sample Input and Output Data - - - - A sample input is a model trained on text data, e.g. ```{r} @@ -74,9 +79,39 @@ library(seededlda) abstracts_seededlda ``` +## Output Data + + The sample output is an oolong [R6 object](https://r6.r-lib.org/). +## Hardware Requirements + + +This package runs on any hardware that can run R. + +## Environment Setup + + +With R installed: + +```r +install.packages("oolong") +``` + ## How to Use + Please refer to the [overview of this package](https://gesistsa.github.io/oolong/articles/overview.html) for a comprehensive introduction to all test types. @@ -123,18 +158,30 @@ And then obtain the result of the test. For example: oolong_test ``` -## Contact Details - -Maintainer: Chung-hong Chan +## Technical Details + -Issue Tracker: [https://github.com/gesistsa/oolong/issues](https://github.com/gesistsa/oolong/issues) +See the official [CRAN repository](https://cran.r-project.org/web/packages/oolong/) for further information about technical details. -## Publication +## References + -1. Chan, C. H., & Sältzer, M. (2020). oolong: An R package for validating automated content analysis tools. The Journal of Open Source Software: JOSS, 5(55), 2461. +Chan, C. H., & Sältzer, M. (2020). oolong: An R package for validating automated content analysis tools. The Journal of Open Source Software: JOSS, 5(55), 2461. 
- - +## Contact Details + - - +Maintainer: Chung-hong Chan +Issue Tracker: [https://github.com/gesistsa/oolong/issues](https://github.com/gesistsa/oolong/issues)