The sentences in these files come from the Tatoeba Corpus and were downloaded from ManyThings.
The sentences in these files have been shuffled and split by language. Each language's sentences are divided into three portions: training (70%), validation (15%), and testing (15%).
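
For readers who want to reproduce a comparable split, the following is a minimal sketch of a shuffle followed by a 70/15/15 partition. It is not the script used to produce these files: the function name `split_sentences`, the fixed seed, and the choice to round the training and validation sizes down (so any remainder goes to the test portion) are all assumptions made for illustration.

```python
import random


def split_sentences(sentences, train_pct=70, val_pct=15, seed=0):
    """Shuffle a list of sentences and split it into train/validation/test.

    Hypothetical helper: percentages are integers, and the test portion
    receives whatever remains after the train and validation slices.
    """
    shuffled = list(sentences)             # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable shuffle

    n = len(shuffled)
    n_train = n * train_pct // 100         # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder data standing in for one language's sentences.
    example = [f"sentence {i}" for i in range(100)]
    train, val, test = split_sentences(example)
    print(len(train), len(val), len(test))
```

Applied separately to each language's sentences, this gives the three portions described above; a different seed or rounding rule would produce a different but equally valid split.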
This data is licensed under the Creative Commons Attribution 2.0 France license (CC BY 2.0 FR).