Multilingual concordancer for translation studies

This tool produces multilingual concordances from XML-TEI corpora that are aligned at segment level and lexically annotated. It is designed to support both quantitative and qualitative studies of translation corpora.

For each query, HTML concordance tables are produced from either a target or a source text.

Input file structure

The tool currently works on source/target text pairs. It requires two pre-aligned TEI files. The alignment unit, i.e. the element that contains the aligned segment, must be specified; by default, it is the cl (clause) element. Alignment is encoded through a set of @xml:id and @corresp attributes. Alignments of 1 > n, n > 1 and n > n segments are supported.

How it works

Formal (forms), lexical (lemmas) and grammatical (pos, morph) information can be queried using a basic CQL syntax:

python3 get_translations.py -s test_data/Val_S.xml -t test_data/Rome_W.xml -o new_alignement_2/ -me cl -q "[pos='AQ.*'][pos='NC.*']" -w 1

This command produces a concordance table with Val_S.xml as the source text and a context of 1 segment to the left and right for both source and target. It retrieves every adjective (AQ) immediately followed by a common noun (NC), using queries on EAGLES tags.
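As a rough illustration of the context window (apparently set by -w in the command above), the following hypothetical helpers cut out the neighbouring segments around a hit; for a 1 > n alignment, the context is taken around the whole run of aligned segments. The function names are illustrative and not taken from get_translations.py.

```python
def with_context(segments: list[str], index: int, window: int = 1):
    """Return (left context, hit, right context) around segments[index]."""
    # max(0, ...) avoids negative slice bounds, which would wrap around.
    left = segments[max(0, index - window):index]
    right = segments[index + 1:index + 1 + window]
    return left, segments[index], right


def span_with_context(segments: list[str], indices: list[int], window: int = 1):
    """Same idea for a run of aligned segments, e.g. the targets of a 1 > n link."""
    start, end = min(indices), max(indices)
    return (segments[max(0, start - window):start],
            segments[start:end + 1],
            segments[end + 1:end + 1 + window])


source = ["s0", "s1", "s2", "s3"]
target = ["t0", "t1", "t2", "t3", "t4"]
print(with_context(source, 2))            # (['s1'], 's2', ['s3'])
print(span_with_context(target, [2, 3]))  # (['t1'], ['t2', 't3'], ['t4'])
```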

Output

Tables are produced in HTML, CSV and LaTeX formats.

Multilingual concordancer for translation studies

This tool produces multilingual concordances from XML-TEI corpora that are aligned at segment level and lexically annotated.

For each query, HTML concordance tables are produced from either the target or the source text.

Input file structure

The tool currently works on source/target text pairs. It requires two pre-aligned TEI files.

The alignment unit, i.e. the element containing the aligned segment, must be specified; the default is the cl (clause) element. Alignment must be encoded with a set of @xml:id and @corresp attributes. The tool handles 1 > n, n > 1 and n > n segment alignments.
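A minimal sketch of how @xml:id/@corresp links can be resolved, assuming that each source segment carries a @corresp listing the #-prefixed @xml:id values of its target segments, and that both files use the TEI namespace. The functions `align` and `groups` and the sample fragments are hypothetical, not the tool's own code. Links that share a target segment are merged into one alignment unit, which covers the 1 > n, n > 1 and n > n cases.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical minimal fragments: s1 > t1 t2 (1 > n), s2 s3 > t3 (n > 1).
SOURCE = f"""<TEI xmlns="{TEI_NS}"><text><body><p>
  <cl xml:id="s1" corresp="#t1 #t2">...</cl>
  <cl xml:id="s2" corresp="#t3">...</cl>
  <cl xml:id="s3" corresp="#t3">...</cl>
</p></body></text></TEI>"""
TARGET = f"""<TEI xmlns="{TEI_NS}"><text><body><p>
  <cl xml:id="t1">...</cl><cl xml:id="t2">...</cl><cl xml:id="t3">...</cl>
</p></body></text></TEI>"""


def align(source_xml: str, target_xml: str, unit: str = "cl") -> dict[str, list[str]]:
    """Map each source segment id to the ids of the target segments it points to."""
    tag = f"{{{TEI_NS}}}{unit}"  # e.g. "{http://www.tei-c.org/ns/1.0}cl"
    target_ids = {el.get(XML_ID) for el in ET.fromstring(target_xml).iter(tag)}
    alignment = {}
    for el in ET.fromstring(source_xml).iter(tag):
        refs = [ref.lstrip("#") for ref in el.get("corresp", "").split()]
        # Drop references that do not resolve to a target segment.
        alignment[el.get(XML_ID)] = [ref for ref in refs if ref in target_ids]
    return alignment


def groups(alignment: dict[str, list[str]]) -> list[tuple[set[str], set[str]]]:
    """Merge links sharing a target segment into alignment units."""
    units: list[tuple[set[str], set[str]]] = []
    for source_id, target_ids in alignment.items():
        src, tgt = {source_id}, set(target_ids)
        remaining = []
        for unit_src, unit_tgt in units:
            if unit_tgt & tgt:  # overlapping targets: same alignment unit
                src |= unit_src
                tgt |= unit_tgt
            else:
                remaining.append((unit_src, unit_tgt))
        remaining.append((src, tgt))
        units = remaining
    return units


print(groups(align(SOURCE, TARGET)))
```

The `unit` parameter plays the role that -me cl appears to play in the example commands.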

How it works

Forms, lemmas, parts of speech (pos) and morphological features (morph) can be queried with a basic CQL syntax:

python3 get_translations.py -s test_data/Val_S.xml -t test_data/Rome_W.xml -o new_alignement_2/ -me cl -q "[pos='AQ.*'][pos='NC.*']" -w 1

This command produces a concordance table with Val_S.xml as the source text and a context of 1 segment to the left and right for both source and target. It extracts all segments in which an adjective (AQ) immediately precedes a common noun (NC), using queries on EAGLES tags.
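A minimal sketch of how a query such as [pos='AQ.*'][pos='NC.*'] can be evaluated: each bracket is one token constraint, and consecutive brackets must match consecutive tokens. It assumes tokens are dicts with form, lemma, pos and morph keys, and that values are regular expressions anchored on the whole value, as in CQP. Only the attribute='regex' subset shown in this README is supported; `parse_query`, `find_matches` and the sample sentence are hypothetical.

```python
import re

# A query is a sequence of [attribute='regex'] token constraints.
QUERY_SYNTAX = re.compile(r"(?:\[\w+='[^']*'\])+")
CONSTRAINT = re.compile(r"\[(\w+)='([^']*)'\]")


def parse_query(query: str) -> list[tuple[str, re.Pattern]]:
    """Turn "[pos='AQ.*'][pos='NC.*']" into [("pos", re.compile("AQ.*")), ...]."""
    if not QUERY_SYNTAX.fullmatch(query):
        raise ValueError(f"unsupported query: {query!r}")
    return [(attr, re.compile(value)) for attr, value in CONSTRAINT.findall(query)]


def find_matches(tokens: list[dict[str, str]], constraints) -> list[int]:
    """Return start indices where consecutive tokens satisfy all constraints in order."""
    n = len(constraints)
    return [
        start
        for start in range(len(tokens) - n + 1)
        # fullmatch anchors the regex on the whole attribute value
        if all(pattern.fullmatch(tokens[start + k].get(attr, ""))
               for k, (attr, pattern) in enumerate(constraints))
    ]


# Illustrative Castilian sentence with EAGLES-style tags.
sentence = [
    {"form": "la", "lemma": "el", "pos": "DA0FS0"},
    {"form": "noble", "lemma": "noble", "pos": "AQ0CS0"},
    {"form": "ciudad", "lemma": "ciudad", "pos": "NCFS000"},
    {"form": "de", "lemma": "de", "pos": "SPS00"},
    {"form": "Roma", "lemma": "Roma", "pos": "NP00000"},
]
hits = find_matches(sentence, parse_query("[pos='AQ.*'][pos='NC.*']"))
print([sentence[i]["form"] for i in hits])  # ['noble']
```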

Filtering

Positive filtering is possible:

python3 get_translations.py -t test_data/Val_S.xml -s test_data/Rome_W.xml -o new_alignement_2/ -me cl -q "[lemma='monarchia']" -f "[lemma='monarquía']" -w 1

This query retrieves all sentences that contain the lemma monarchia in the source and the lemma monarquía in the aligned (Castilian) target sentence. In other words, it helps detect all literal translations of the lemma monarchia.

The negative filter works the same way:

python3 get_translations.py -t test_data/Val_S.xml -s test_data/Rome_W.xml -o new_alignement_2/ -me cl -q "[lemma='monarchia']" -nf "[lemma='monarquía']" -w 1

This query retrieves all sentences that contain the lemma monarchia in the source, but whose aligned target sentence does not contain the lemma monarquía. It therefore detects all non-literal translations of the lemma monarchia.
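The two filters can be sketched as follows, assuming each hit is a (source tokens, aligned target tokens) pair and, for brevity, that the query and the filters are single-token attribute='regex' constraints passed as tuples rather than CQL strings. `filter_pairs` and the sample lemmas are hypothetical; the real -f and -nf options take CQL queries.

```python
import re


def contains(tokens: list[dict[str, str]], attr: str, value: str) -> bool:
    """True if at least one token's attribute fully matches the regex `value`."""
    pattern = re.compile(value)
    return any(pattern.fullmatch(token.get(attr, "")) for token in tokens)


def filter_pairs(pairs, query, positive=None, negative=None):
    """Keep (source, target) pairs whose source matches `query` and whose
    aligned target passes the optional positive (-f) / negative (-nf) filter."""
    kept = []
    for source, target in pairs:
        if not contains(source, *query):
            continue
        if positive is not None and not contains(target, *positive):
            continue  # -f: the filter must be found in the target
        if negative is not None and contains(target, *negative):
            continue  # -nf: the filter must be absent from the target
        kept.append((source, target))
    return kept


def lemmas(*values):
    """Build a token list carrying lemma annotations only (illustrative data)."""
    return [{"lemma": value} for value in values]


pairs = [
    (lemmas("monarchia", "romanus"), lemmas("el", "monarquía", "romano")),
    (lemmas("monarchia", "mundus"), lemmas("el", "imperio", "de", "el", "mundo")),
    (lemmas("regnum"), lemmas("el", "reino")),
]
literal = filter_pairs(pairs, ("lemma", "monarchia"), positive=("lemma", "monarquía"))
non_literal = filter_pairs(pairs, ("lemma", "monarchia"), negative=("lemma", "monarquía"))
print(len(literal), len(non_literal))  # 1 1
```

Running both filters on the same query partitions the hits into literal and non-literal renderings of the queried lemma.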

Output

Results are exported as HTML, CSV and LaTeX tables.
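A hedged sketch of writing the same concordance rows to the three formats with the Python standard library. The four-column layout (left, match, right, target) is an assumption, and the tool's actual tables may use different columns; cell contents are escaped for each output format.

```python
import csv
import html
import io

# Assumed column layout; the tool's real tables may differ.
COLUMNS = ["left", "match", "right", "target"]
LATEX_SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)


def to_csv(rows) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # handles quoting of commas and quotes
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def to_latex(rows) -> str:
    lines = [r"\begin{tabular}{" + "l" * len(COLUMNS) + "}",
             " & ".join(COLUMNS) + r" \\ \hline"]
    lines += [" & ".join(latex_escape(cell) for cell in row) + r" \\" for row in rows]
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


def to_html(rows) -> str:
    header = "".join(f"<th>{name}</th>" for name in COLUMNS)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


rows = [("left context", "noble ciudad", "right context", "target & segment")]
print(to_latex(rows))
```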

matgille/multilingual_concordances

Folders and files

Latest commit

History

Repository files navigation

Concordancier multilingue pour l'étude de traductions

Structure des fichiers d'entrée

Fonctionnement

Sortie

Multilingual concordancer for translation studies

Input file structure

How it works

Filtering

Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages