Text Classifier v2: AI vs SLW (Second Language Writers)

This project extends Text Classifier v1 with an XGBoost model and a larger training dataset. It applies linguistic data science and syntactic complexity modeling to distinguish AI-generated essays from essays by second language writers.

Live App

Improvements over v1

Upgraded from Random Forest to XGBoost
Larger training dataset (300 → 1,000 samples)
More stable predictions and improved feature interpretability

About the Syntactic Complexity Indices

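The classifier uses the L2SCA (L2 Syntactic Complexity Analyzer) indices computed by TAASSC (Tool for the Automatic Analysis of Syntactic Sophistication and Complexity).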
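L2SCA reports 14 indices, each a ratio of two of nine raw counts obtained by parsing the text: words (W), sentences (S), verb phrases (VP), clauses (C), T-units (T), dependent clauses (DC), complex T-units (CT), coordinate phrases (CP), and complex nominals (CN). The minimal sketch below shows only the arithmetic from counts to indices. It assumes the counts are already available (TAASSC obtains them by parsing), the dictionary keys are illustrative labels rather than TAASSC's actual output column names, and returning 0.0 for a zero denominator is a choice made for this sketch.

```python
from typing import Dict

# The 14 L2SCA indices as (numerator, denominator) pairs over the nine raw counts.
# Keys are illustrative labels; TAASSC's CSV column names may differ.
L2SCA_INDICES = {
    "MLS": ("W", "S"),     # mean length of sentence
    "MLT": ("W", "T"),     # mean length of T-unit
    "MLC": ("W", "C"),     # mean length of clause
    "C/S": ("C", "S"),     # clauses per sentence
    "VP/T": ("VP", "T"),   # verb phrases per T-unit
    "C/T": ("C", "T"),     # clauses per T-unit
    "DC/C": ("DC", "C"),   # dependent clauses per clause
    "DC/T": ("DC", "T"),   # dependent clauses per T-unit
    "T/S": ("T", "S"),     # T-units per sentence
    "CT/T": ("CT", "T"),   # complex T-units per T-unit
    "CP/T": ("CP", "T"),   # coordinate phrases per T-unit
    "CP/C": ("CP", "C"),   # coordinate phrases per clause
    "CN/T": ("CN", "T"),   # complex nominals per T-unit
    "CN/C": ("CN", "C"),   # complex nominals per clause
}


def l2sca_indices(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute the 14 ratio indices from raw counts (sketch: zero denominator -> 0.0)."""
    result = {}
    for name, (num, den) in L2SCA_INDICES.items():
        denominator = counts[den]
        result[name] = counts[num] / denominator if denominator else 0.0
    return result


# Hypothetical counts for one short essay.
example = {"W": 120, "S": 8, "VP": 20, "C": 16, "T": 10,
           "DC": 6, "CT": 5, "CP": 4, "CN": 14}
indices = l2sca_indices(example)
print(indices["MLS"], indices["C/T"], indices["DC/C"])  # 15.0 1.6 0.375
```

Because every index is a ratio, essays of different lengths can be compared directly, which is what lets them serve as features for the classifier.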

Each prediction is accompanied by a SHAP contribution plot showing how much each feature pushes the outcome toward AI or SLW.
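SHAP values for a tree model such as XGBoost are additive: for one text, the explainer's base value plus the sum of the per-feature contributions equals the model's raw output, which for a binary logistic objective is in log-odds. The stdlib sketch below illustrates that arithmetic with made-up numbers. It assumes a binary:logistic objective and that class 1 means AI, neither of which is confirmed here; in practice the contributions would come from an explainer such as shap's TreeExplainer rather than being typed in.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Map a log-odds margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def explain(base_value: float,
            contributions: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return P(class 1) and features ranked by absolute contribution.

    Assumes SHAP values are in log-odds space, so
    margin = base_value + sum(contributions).
    """
    margin = base_value + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return sigmoid(margin), ranked


def direction(value: float) -> str:
    """Positive pushes toward class 1 (assumed AI), negative toward class 0 (SLW)."""
    if value > 0:
        return "AI"
    if value < 0:
        return "SLW"
    return "neutral"


# Hypothetical SHAP values for one essay.
shap_values = {"MLC": 0.9, "CN/C": 0.6, "DC/C": -0.3, "MLS": 0.0}
prob_ai, ranked = explain(base_value=-0.2, contributions=shap_values)
label = "AI" if prob_ai >= 0.5 else "SLW"
print(label, round(prob_ai, 3))
for name, value in ranked:
    print(f"{name:6s} {value:+.2f} -> {direction(value)}")
```

Here the margin is -0.2 + 0.9 + 0.6 - 0.3 + 0.0 = 1.0, so the predicted probability is sigmoid(1.0), about 0.73; a contribution plot is essentially the ranked list drawn as bars.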

Data Overview

The dataset used for model training consists of 1,000 writing samples (500 human, 500 AI):

Human-written:
500 essays by second language writers (SLW), sourced from ICNALE (International Corpus Network of Asian Learners of English)
AI-generated:
500 essays generated by large language models (LLMs), sourced from the LLM-generated Essay Dataset

All texts were preprocessed with TAASSC to extract the syntactic complexity indices.
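Because the dataset is balanced (500 human, 500 AI), a stratified train/test split keeps that 50/50 ratio in both partitions. The stdlib sketch below assumes a CSV feature table with one row per essay and a label column; the column names (`MLS`, `MLC`, `label`), the label values `AI`/`SLW`, the 20% test fraction, and the synthetic rows are all hypothetical, since the layout of the project's own files is not documented here.

```python
import csv
import io
import random
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def load_rows(handle) -> List[Row]:
    """Read a CSV feature table (one essay per row) into a list of dicts."""
    return list(csv.DictReader(handle))


def stratified_split(rows: List[Row], label_col: str = "label",
                     test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Split rows so each class keeps the same proportion in train and test."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Row]] = {}
    for row in rows:
        by_class.setdefault(row[label_col], []).append(row)
    train: List[Row] = []
    test: List[Row] = []
    for class_rows in by_class.values():
        shuffled = class_rows[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Synthetic stand-in for a 500/500 feature table (hypothetical columns and values).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["MLS", "MLC", "label"])
for i in range(1000):
    writer.writerow([10 + i % 7, 8 + i % 5, "AI" if i % 2 else "SLW"])
buffer.seek(0)

rows = load_rows(buffer)
train, test = stratified_split(rows)
print(Counter(r["label"] for r in train), Counter(r["label"] for r in test))
```

With 500 rows per class and a 0.2 test fraction, each class contributes 100 rows to the test set and 400 to the training set.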

Data Usage Notice

The .txt files in txt_samples/ are included only for demonstration and learning purposes.
They are not licensed for reuse, redistribution, or commercial use.
The dataset file X_binary.csv is private and is not licensed for reuse, redistribution, or modification.
It is shared solely for demonstration purposes and should not be used for any other purpose.

Repository Files

txt_samples/
.gitignore
LICENSE
README.md
X_binary.csv
model_xgb.pkl
requirements.txt
text-classifier-v2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Classifier v2: AI vs SLW (Second Language Writers)

Improvements over v1

About the syntactic Complexity Indices

Data Overview

Data Usage Notice

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Text Classifier v2: AI vs SLW (Second Language Writers)

Improvements over v1

About the syntactic Complexity Indices

Data Overview

Data Usage Notice

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages