BHASHA-Workshop/IndicGEC2025

IndicGEC2025

Grammatical Error Correction (GEC) for Indian languages in a low-resource setting, with fewer than 1,000 training samples per language. Indian languages are low-resource, and this task is designed to reflect that constraint.

The training (train.csv) and validation (dev.csv) data are available in the folder for each language. The final test data will be released later.

The languages available are:

  1. Hindi
  2. Telugu
  3. Bangla
  4. Malayalam
  5. Tamil

Tamil GEC is in an extreme low-resource setting, with fewer than 100 training samples.

This is part of the Shared Task co-located with the 1st BHASHA workshop 2025.

The URLs for the final-phase competitions are:

  1. Hindi
  2. Malayalam
  3. Telugu
  4. Bangla
  5. Tamil

Rules for participation

  1. Max Team Size: 4
  2. An individual cannot be part of multiple teams.
  3. Each team must submit from a single CodaBench account.
  4. A CodaBench account is required for participation.
  5. Google Form for registration of teams: https://forms.gle/gftDLz69Vv9aB3AEA.

Evaluation Criteria

The GLEU score will be used for evaluation.

Your submission should contain a .zip archive of a predictions.csv file. The predictions.csv file should contain two columns named Input sentence and Output sentence. Submissions that do not conform to these requirements will not be evaluated by the system.
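
The submission format described above could be produced with a short script like the following. This is a minimal sketch, not an official tool: the function name `write_submission`, the archive name `submission.zip`, and the choice of UTF-8 encoding are assumptions; only the `predictions.csv` file name and the two column names come from the rules above.

```python
import csv
import zipfile
from pathlib import Path


def write_submission(pairs, out_dir="."):
    """Write predictions.csv and zip it (hypothetical helper).

    pairs: iterable of (input_sentence, output_sentence) tuples.
    Returns the path of the created .zip archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / "predictions.csv"
    # newline="" lets the csv module control line endings;
    # UTF-8 is an assumption, chosen because the text is in Indic scripts.
    with csv_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Input sentence", "Output sentence"])
        writer.writerows(pairs)

    # The archive name is an assumption; the rules only require a .zip.
    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root, without directory components.
        zf.write(csv_path, arcname="predictions.csv")
    return zip_path
```

Calling `write_submission(pairs, "out")` with the system's (input, corrected output) pairs leaves `out/submission.zip` ready for upload. Check the competition page for any further constraints, such as row order, that are not stated here.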

All participating teams are expected to submit a system paper describing the methodologies adopted and their findings.

P.S. All characters that are not in the native script will be counted as incorrect.
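
Because non-native characters are penalized, it may help to scan predictions before submitting. The sketch below flags letters that fall outside the main Unicode block of each language's script. It is an assumption about how to approximate the rule, not the organizers' scoring code: the names `SCRIPT_RANGES` and `non_native_chars` are hypothetical, and treating digits, punctuation, and whitespace as acceptable is a guess.

```python
# Main Unicode block for each language's script (standard Unicode ranges).
SCRIPT_RANGES = {
    "Hindi": (0x0900, 0x097F),      # Devanagari
    "Bangla": (0x0980, 0x09FF),     # Bengali
    "Tamil": (0x0B80, 0x0BFF),      # Tamil
    "Telugu": (0x0C00, 0x0C7F),     # Telugu
    "Malayalam": (0x0D00, 0x0D7F),  # Malayalam
}


def non_native_chars(text, language):
    """Return the letters in text that lie outside the language's block.

    Only alphabetic characters are checked; digits, punctuation, and
    whitespace are ignored (an assumption, see the note above).
    """
    lo, hi = SCRIPT_RANGES[language]
    return [ch for ch in text if ch.isalpha() and not (lo <= ord(ch) <= hi)]


if __name__ == "__main__":
    # Latin letters mixed into a Hindi sentence are reported.
    print(non_native_chars("यह ok है", "Hindi"))
```

An empty result means no stray letters were found under these assumptions; the official evaluation may still treat some characters differently.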
