Grammatical Error Correction (GEC) for Indian languages in a low-resource setting, with fewer than 1000 training samples per language. Indian languages are low-resource, and this task is designed to reflect that constraint.
The training (train.csv) and validation (dev.csv) sets are available in a separate folder for each language. The final test data will be released later.
The languages available are:
- Hindi
- Telugu
- Bangla
- Malayalam
- Tamil
Tamil GEC is an extreme low-resource setting, with fewer than 100 training samples.
This task is part of the Shared Task co-located with the 1st BHASHA workshop 2025.
The URLs for the final-phase competitions are:
- Max Team Size: 4
- An individual cannot be part of multiple teams.
- Each team submits from a single CodaBench account.
- A CodaBench account is required for participation.
- Google Form for registration of teams: https://forms.gle/gftDLz69Vv9aB3AEA
The GLEU score will be used for evaluation.
Your submission should contain a .zip of the predictions.csv file. The predictions.csv file should contain two columns named Input sentence and Output sentence. Submissions that do not conform to these requirements will not be evaluated by the system.
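The submission format above can be produced with a short script. The following is a minimal sketch, not an official tool: the function name `write_submission`, the archive name `submission.zip`, UTF-8 encoding, and placing predictions.csv at the root of the archive are all assumptions. Only the file name predictions.csv and the two column names come from the task description.

```python
import csv
import zipfile
from pathlib import Path


def write_submission(pairs, out_dir="."):
    """Write predictions.csv from (input, output) pairs and zip it.

    Assumptions (not stated by the organizers): UTF-8 encoding,
    the archive name "submission.zip", and predictions.csv stored
    at the top level of the archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / "predictions.csv"
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names as required by the task description.
        writer.writerow(["Input sentence", "Output sentence"])
        writer.writerows(pairs)

    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps the file at the archive root, without folders.
        zf.write(csv_path, arcname="predictions.csv")
    return zip_path
```

A call such as `write_submission(zip(inputs, predictions), "out")` would then leave `out/submission.zip` ready for upload, where `inputs` and `predictions` are hypothetical lists of sentences from your own system.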
All participating teams are expected to submit a system paper describing the methodologies adopted and the findings.
P.S.: All characters that are not in the native script will be counted as incorrect.
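Because characters outside the native script count as errors, it can help to scan predictions before submitting. The sketch below is one possible check, not the organizers' scoring rule: it assumes that only letters and combining marks are relevant (digits, punctuation, whitespace, and joiners are ignored), and it uses the standard Unicode block for each language's script. The names `SCRIPT_RANGES` and `non_native_chars` are hypothetical.

```python
import unicodedata

# Unicode blocks for each task language's script.
SCRIPT_RANGES = {
    "Hindi": (0x0900, 0x097F),      # Devanagari
    "Bangla": (0x0980, 0x09FF),     # Bengali
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Malayalam": (0x0D00, 0x0D7F),
}


def non_native_chars(text, language):
    """Return letters and marks in `text` that fall outside the script block.

    Assumption: only Unicode categories L* (letters) and M* (marks) are
    checked; how the official evaluation treats digits and punctuation
    is not specified in the task description.
    """
    lo, hi = SCRIPT_RANGES[language]
    flagged = []
    for ch in text:
        category = unicodedata.category(ch)
        if category[0] in ("L", "M") and not (lo <= ord(ch) <= hi):
            flagged.append(ch)
    return flagged
```

For example, `non_native_chars("मैं school जाता हूँ।", "Hindi")` flags the six Latin letters of "school" and leaves the Devanagari text and the danda alone.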