cwb-cads: CWB-based API for Corpus-Assisted Discourse Studies

- implemented in Python/APIFlask
  - JWT authorisation
  - interactive OpenAPI documentation
- uses cwb-ccc for connecting to CWB
  - CWB must be installed and corpora must be encoded via cwb-encode
  - meta data can be stored separately or be parsed from structural attributes
- the repository also contains a beta version of a frontend (the "MMDA toolkit")
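
Since the API depends on a working CWB installation, it can be useful to verify that the CWB tools are reachable before setting anything else up. A minimal sketch, assuming the CWB binaries are expected on the PATH (the variable name cwb_status is only illustrative):

```shell
# Check whether the CWB encoder is available on the PATH.
# cwb_status is a hypothetical helper variable used only in this sketch.
if command -v cwb-encode >/dev/null 2>&1; then
    cwb_status="found"
else
    cwb_status="missing"
fi

# Report the result; "missing" means CWB still has to be installed.
echo "cwb-encode: ${cwb_status}"
```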

Reference

Our methodology is explained in detail in Heinrich & Evert (2024).

@InProceedings{HeinrichEvert2024,
  author    = {Heinrich, Philipp and Evert, Stephanie},
  title     = {Operationalising the Hermeneutic Grouping Process in Corpus-assisted Discourse Studies},
  booktitle = {Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers},
  year      = {2024},
  editor    = {Klamm, Christopher and Lapesa, Gabriella and Ponzetto, Simone Paolo and Rehbein, Ines and Sen, Indira},
  pages     = {33--44},
  address   = {Vienna, Austria},
  month     = sep,
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.cpss-1.3}
}

Abstract: We propose a framework for quantitative-qualitative research in corpus-assisted discourse studies (CADS), which operationalises the central process of manually forming groups of related words and phrases in terms of “discoursemes” and their constellations. We introduce an open-source implementation of this framework in the form of a REST API based on Corpus Workbench. Going through the workflow of a collocation analysis for fleeing and related terms in the German Federal Parliament, the paper gives details about the underlying algorithms, with available parameters and further possible choices. We also address multi-word units (which are often disregarded by CADS tools), a semantic map visualisation of collocations, and how to compute associations between discoursemes.

We provide running instances of the backend and frontend on our web server.

For information regarding the original mmda-v1 toolkit, see the former repository and our project website.

Manual

We provide information

- regarding general CADS functionality in the manual, and
- regarding discourseme-based functionality in the MMDA manual.

Installation and Configuration

Backend: cwb-cads

We recommend installing all dependencies of the API in a virtual environment:
```
python3 -m venv venv
. venv/bin/activate
pip3 install -r requirements.txt
```
The API is configured using cfg.py in the top-level directory. Use the example config (cfg_example.py) as a starting point. It defines stage-specific configurations, which are selected via the CWB_CADS_CONFIG environment variable, e.g.
```
export CWB_CADS_CONFIG=cfg.DevConfig
```
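
When the variable is set from a start-up script, it can help to keep any value that is already exported and only fall back to a default otherwise. A minimal sketch, assuming cfg.DevConfig is the desired fallback:

```shell
# Keep an existing CWB_CADS_CONFIG; otherwise fall back to cfg.DevConfig
# (the fallback choice is an assumption, adjust it to your deployment).
export CWB_CADS_CONFIG="${CWB_CADS_CONFIG:-cfg.DevConfig}"

# Print the active setting so it shows up in the shell or in logs.
echo "CWB_CADS_CONFIG=${CWB_CADS_CONFIG}"
```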
Initialise the database:
```
flask --app cads database init
```

Import corpus settings from a JSON file:

flask --app cads corpus import ${corpora.json}

Meta data can be imported from separate files or from within the XML data stored in structural attributes of indexed corpora:
```
flask --app cads corpus read-meta ${cwb_id} --level "text"
```

You can also import pre-defined subcorpora using a TSV file:

flask --app cads corpus subcorpora ${cwb_id} ${subcorpora.tsv}

Discoursemes can be imported using a TSV file

flask --app cads discourseme import --path_in ${discoursemes.tsv}

and can similarly be exported:

flask --app cads discourseme export --path_out ${discoursemes.tsv}

Start the development server:
```
flask --app cads --debug run
```

Frontend: mmda-toolkit

Requirements:

- node.js
- nvm (node version manager) is recommended

Setup:

Navigate to frontend/
Install the correct node version. If you have nvm installed, you can just run:
```
nvm install
```
And to use it:
```
nvm use
```
Otherwise, install the correct node version manually as specified in .nvmrc
Install node dependencies:
```
npm install
```
Specify the API in vite.config.ts. By default, it points to our development server.
Run the development build of the frontend:
```
npm run dev
```

Production

- set target in frontend/mmda/vite.config.ts
- set the frontend URL (VITE_ROUTER_BASEPATH) in frontend/mmda/.env.production
- set the backend URL (VITE_API_URL) in frontend/mmda/.env.production
- run npm run build and deploy mmda/dist/
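
As an illustration of the two variables named above, frontend/mmda/.env.production might look like the following sketch. Both values are placeholders chosen for this example (assumptions), not the project's actual URLs:

```shell
# frontend/mmda/.env.production (placeholder values, replace with your own)

# Base path under which the frontend is served (hypothetical value)
VITE_ROUTER_BASEPATH=/mmda/

# URL of the cwb-cads backend (hypothetical value)
VITE_API_URL=https://cads.example.org/api/
```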
