diff --git a/.Rbuildignore b/.Rbuildignore
index 885b5c5..4e41c8f 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -54,3 +54,4 @@
 ^CITATION\.cff$
 ^\.quarto$
 ^runtime\.txt$
+^\.binder$
diff --git a/apt.txt b/.binder/apt.txt
similarity index 100%
rename from apt.txt
rename to .binder/apt.txt
diff --git a/install.R b/.binder/install.R
similarity index 100%
rename from install.R
rename to .binder/install.R
diff --git a/postBuild b/.binder/postBuild
similarity index 100%
rename from postBuild
rename to .binder/postBuild
diff --git a/runtime.txt b/.binder/runtime.txt
similarity index 100%
rename from runtime.txt
rename to .binder/runtime.txt
diff --git a/methodshub.md b/methodshub.md
deleted file mode 100644
index 2aeb807..0000000
--- a/methodshub.md
+++ /dev/null
@@ -1,203 +0,0 @@
-# oolong - Create Validation Tests for Automated Content Analysis
-
-
-## Description
-
-
-
-Intended to create standard human-in-the-loop validity tests for typical
-automated content analysis such as topic modeling and dictionary-based
-methods. This package offers a standard workflow with functions to
-prepare, administer and evaluate a human-in-the-loop validity test. This
-package provides functions for validating topic models using word
-intrusion, topic intrusion (Chang et al. 2009,
-)
-and word set intrusion (Ying et al. 2021)
-[doi:10.1017/pan.2021.33](https://doi.org/10.1017/pan.2021.33) tests.
-This package also provides functions for generating gold-standard data
-which are useful for validating dictionary-based methods. The default
-settings of all generated tests match those suggested in Chang et
-al. (2009) and Song et al. (2020)
-[doi:10.1080/10584609.2020.1723752](https://doi.org/10.1080/10584609.2020.1723752).
-
-## Keywords
-
-- Validity
-- Text Analysis
-- Topic Model
-
-## Science Usecase(s)
-
-
-
-
-
-This package was used in the literature to valid topic models and
-prediction models trained on text data, e.g. [Rauchfleisch et
-al. (2023)](https://doi.org/10.1080/17512786.2022.2110928), [Rothut, et
-al. (2023)](https://doi.org/10.1177/14614448231164409), [Eisele, et
-al. (2023)](https://doi.org/10.1080/19312458.2023.2230560).
-
-## Repository structure
-
-This repository follows [the standard structure of an R
-package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure).
-
-## Environment Setup
-
-With R installed:
-
-``` r
-install.packages("oolong")
-```
-
-
-
-
-
-
-
-## Input Data
-
-
-
-
-
-The input data has to be a topic model or prediction model trained on
-text data. For example, one can train a topic model from the text data
-(tweets from Donald trump) included in the package by:
-
-``` r
-library(seededlda)
-library(quanteda)
-trump_corpus <- corpus(trump2k)
-tokens(trump_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE,
-       split_hyphens = TRUE, remove_url = TRUE) %>% tokens_tolower() %>%
-    tokens_remove(stopwords("en")) %>% tokens_remove("@*") -> trump_toks
-
-model <- textmodel_lda(x = dfm(trump_toks), k = 8, verbose = TRUE)
-```
-
-## Sample Input and Output Data
-
-
-
-
-
-A sample input is a model trained on text data, e.g.
-
-``` r
-library(oolong)
-library(seededlda)
-abstracts_seededlda
-```
-
-
-    Call:
-    lda(x = x, k = k, label = label, max_iter = max_iter, alpha = alpha,
-        beta = beta, seeds = seeds, words = NULL, verbose = verbose)
-
-    10 topics; 2,500 documents; 3,908 features.
-
-The sample output is an oolong [R6 object](https://r6.r-lib.org/).
-
-## How to Use
-
-Please refer to the [overview of this
-package](https://gesistsa.github.io/oolong/articles/overview.html) for a
-comprehensive introduction of all test types.
-
-Suppose there is a topic model trained on some text data called
-`abstracts_seededlda`, which is included in the package.
-
-``` r
-library(oolong)
-abstracts_seededlda
-```
-
-
-    Call:
-    lda(x = x, k = k, label = label, max_iter = max_iter, alpha = alpha,
-        beta = beta, seeds = seeds, words = NULL, verbose = verbose)
-
-    10 topics; 2,500 documents; 3,908 features.
-
-Suppose one would like to conduct a word intrusion test (Chang et
-al. 2009) to validate this topic model. This test can be generated by
-the `wi()` function.
-
-``` r
-oolong_test <- wi(abstracts_seededlda, userid = "Hadley")
-oolong_test
-```
-
-    ── oolong (topic model) ────────────────────────────────────────────────────────
-
-    ✔ WI ✖ TI ✖ WSI
-
-    ☺ Hadley
-
-    ℹ WI: k = 10, 0 coded.
-
-    ── Methods ──
-
-    • <$do_word_intrusion_test()>: do word intrusion test
-
-    • <$lock()>: finalize and see the results
-
-One can then conduct the test following the instruction displayed,
-i.e. `oolong_test$$do_word_intrusion_test()`.
-
-``` r
-oolong_test$do_word_intrusion_test()
-```
-
-One should see a graphic interface like the following and conduct the
-test.
-
-
-
-After the test, one can finalize the test by locking the test.
-
-``` r
-oolong_test$lock()
-```
-
-And then obtain the result of the test. For example:
-
-``` r
-oolong_test
-```
-
-    ── oolong (topic model) ────────────────────────────────────────────────────────
-
-    ✔ WI ✖ TI ✖ WSI
-
-    ☺ Hadley
-
-    ℹ WI: k = 10, 10 coded.
-
-    ── Results: ──
-
-    ℹ 90% precision
-
-## Contact Details
-
-Maintainer: Chung-hong Chan
-
-Issue Tracker:
-
-## Publication
-
-1. Chan, C. H., & Sältzer, M. (2020). oolong: An R package for
-   validating automated content analysis tools. The Journal of Open
-   Source Software: JOSS, 5(55), 2461.
-
-
-
-
-
-
-
-
-