Automated Essay Scoring for Russian as a Second Language: A Deep Ordinal Learning Approach

Master's Thesis in Digital Humanities and Digital Knowledge
University of Bologna (UNIBO)

Author: Nikolai Gorbachev
Supervisor: Prof. Fabio Tamburini
Co-Supervisor: Prof. Mikhail Kopotev (University of Helsinki)

Abstract

Automated Essay Scoring (AES) systems are critical for scaling language assessment, yet most research focuses on English and large-scale datasets. This thesis addresses the challenge of developing reliable AES models for Russian as a Second Language (L2), a morphologically rich language, within the resource-constrained context of institutional learning. The study utilizes a real-world dataset of ~1,100 learner essays from the Russian language course at the Middlebury Language School (VT, USA), rated according to the ACTFL (American Council on the Teaching of Foreign Languages) proficiency guidelines.

The research systematically evaluates three modeling paradigms: (1) feature-based statistical models relying on engineered linguistic metrics (e.g., syntactic complexity, lexical diversity); (2) deep representation learning using fine-tuned multilingual Transformers (XLM-RoBERTa); and (3) hybrid fusion strategies. A central innovation of this work is the application of ordinal regression objectives, specifically Consistent Rank Logits (CORAL) and Conditional Ordinal Regression for Neural Networks (CORN), to explicitly model the ordered nature of proficiency levels.

Results demonstrate that the deep ordinal approach (XLM-RoBERTa + CORAL) achieves competitive performance under low-resource conditions (QWK 0.8677 ± 0.0083), substantially outperforming feature-based baselines and standard cross-entropy models.

Research Context

This work contributes to:

Low-resource AES research
Ordinal deep learning in NLP
Modeling morphologically rich languages (Russian)

Key Contributions

Application of CORAL and CORN ordinal objectives to Russian L2 AES.
Empirical comparison of feature-based, transformer-based, and fusion paradigms.
Analysis of linguistic feature evolution across ordinal proficiency thresholds.
Demonstration that ordinal objectives improve rank consistency over cross-entropy.

Why Ordinal Learning?

Standard classification treats proficiency levels as independent categories.
However, language proficiency is inherently ordered (e.g., Intermediate < Advanced < Superior).

Ordinal objectives (CORAL, CORN):

Enforce rank consistency
Penalize distant misclassifications more strongly
Reduce prediction incoherence (e.g., skipping levels)

This thesis demonstrates that explicitly modeling this structure improves evaluation metrics such as QWK and MAE.
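
The properties listed above can be illustrated with a minimal plain-Python sketch of a CORAL-style head. A K-level label is expanded into K-1 binary "is the level above threshold k?" targets, one shared score is compared against K-1 biases, and the predicted level is the number of thresholds passed. The function names, the bias values, and the 4-level scale are illustrative assumptions; this is not the code in notebooks/05_deep_models/, which fine-tunes XLM-RoBERTa.

```python
import math


def levels_from_label(label: int, num_classes: int) -> list[int]:
    """Expand an ordinal label into K-1 binary targets: 1 if label > k."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def coral_probs(score: float, biases: list[float]) -> list[float]:
    """P(y > k) per threshold: one shared score plus one bias per threshold."""
    return [sigmoid(score + b) for b in biases]


def coral_loss(score: float, biases: list[float], label: int) -> float:
    """Sum of binary cross-entropies over the K-1 threshold tasks."""
    targets = levels_from_label(label, len(biases) + 1)
    eps = 1e-12  # guards against log(0)
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(targets, coral_probs(score, biases))
    )


def predict_label(score: float, biases: list[float]) -> int:
    """Predicted level = number of thresholds with P(y > k) above 0.5."""
    return sum(p > 0.5 for p in coral_probs(score, biases))


# Hypothetical 4-level scale (0..3); bias values are assumed, not learned.
BIASES = [2.0, 0.0, -2.0]

if __name__ == "__main__":
    score = 1.0  # stand-in for the encoder's scalar output
    print("predicted level:", predict_label(score, BIASES))
    for gold in range(4):
        print(f"loss if gold level is {gold}: {coral_loss(score, BIASES, gold):.3f}")
```

Because every threshold shares the same score, the threshold probabilities are non-increasing whenever the biases are non-increasing, so the prediction cannot "skip" a level. In this toy setup the loss also grows as the gold level moves further from the predicted one.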

Main Result

Best Model: XLM-RoBERTa + CORAL head
QWK: 0.8677 ± 0.0083
MAE: 0.4956 ± 0.0155
RMSE: 0.7663 ± 0.0391

Repository Structure

This repository is organized as a sequential research pipeline, corresponding to the chapters of the thesis.

.
├── data/                       # (Excluded from repo for privacy) Dataset placeholders
├── docs/
│   └── Gorbachev_Nikolai_2026_Russian_AES_Thesis.pdf  # Full Thesis Text
├── notebooks/                  # Experimental pipeline
│   ├── 01_preprocessing/       # Stanza/spaCy pipelines for text normalization
│   ├── 02_feature_extraction/  # Implementation of 24 linguistic features (Syntactic/Lexical)
│   ├── 03_baselines/           # Dummy, Linear Regression, and Logistic Regression baselines
│   ├── 04_feature-based_models/# Feature-based ordinal models (MLP + CORAL/CORN on manual features)
│   ├── 05_deep_models/         # Fine-tuning XLM-RoBERTa with various loss heads (The core models)
│   ├── 06_fusion/              # Early and Late fusion experiments
│   └── 07_visualizations/      # Generation of Confusion Matrices and Error Analysis plots
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation

Reproducibility

Environment:

Python 3.12
PyTorch 2.x
HuggingFace Transformers
Scikit-learn

Deep models can be trained using the notebooks in 05_deep_models/.

Random seeds are fixed for reproducibility (seed=42).
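
As a rough illustration of what fixing seeds typically involves, the sketch below seeds Python's `random` module and, if they are installed, NumPy and PyTorch. The helper name `set_seed` is a hypothetical choice for this example; the notebooks may set their seeds differently.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


set_seed(42)
```

Fixed seeds make runs repeatable on the same setup, but some GPU operations are non-deterministic, so small differences between machines are still possible.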

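To set up the environment before running the deep learning experiments, install the dependencies and download the Russian spaCy and Stanza models: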

pip install -r requirements.txt
python -m spacy download ru_core_news_md
python -c "import stanza; stanza.download('ru')"

Evaluation Metrics

Models are evaluated using:

Accuracy
Macro-F1
Quadratic Weighted Kappa (QWK)
Mean Absolute Error (MAE)
Root Mean Squared Error (RMSE)
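
For readers unfamiliar with the two ordinal metrics, the following dependency-free sketch shows how QWK and MAE can be computed from integer-coded levels. The function names and toy labels are illustrative assumptions. In practice, scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` and `mean_absolute_error` are the usual choice.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """QWK = 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = len(y_true)
    k = num_classes
    # Observed confusion matrix: rows = gold level, columns = predicted level.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic distance penalty
            expected = hist_true[i] * hist_pred[j] / n  # count under chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0.0:
        return 1.0  # degenerate case: all labels and predictions in one class
    return 1.0 - num / den


def mean_absolute_error(y_true, y_pred):
    """Average distance, in levels, between gold and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    gold = [0, 1, 2, 3]
    near_miss = [1, 1, 2, 3]  # one error, off by one level
    far_miss = [3, 1, 2, 3]   # one error, off by three levels
    for name, pred in [("near", near_miss), ("far", far_miss)]:
        qwk = quadratic_weighted_kappa(gold, pred, num_classes=4)
        mae = mean_absolute_error(gold, pred)
        print(f"{name}: QWK={qwk:.3f} MAE={mae:.3f}")
```

Both toy predictions have the same accuracy (three of four correct), but QWK and MAE score the far miss much worse. That distance sensitivity is why they suit ordered proficiency levels.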

Data Availability

The dataset used in this study contains learner essays collected within an institutional setting and cannot be publicly released due to privacy agreements.

However, the notebooks/ directory includes full preprocessing and modeling pipelines that can be applied to any comparable proficiency-annotated dataset (e.g., standard AES corpora).

Future Work

Larger multi-institution Russian L2 corpora
Multitask learning with linguistic auxiliary objectives
Deployment as a web-based scoring assistant

Citation

If you use this code, please cite the thesis:

@mastersthesis{gorbachev2026russianAES,
  author  = {Nikolai Gorbachev},
  title   = {Automated Essay Scoring for Russian as a Second Language: A Deep Ordinal Learning Approach},
  school  = {University of Bologna},
  year    = {2026},
  address = {Bologna, Italy}
}
