Description
Hi, I have found some duplicate data in train.txt. For example,
line 190245: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190246: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190667: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
line 190668: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
Each pair is completely identical: same entities, same relation, and same sentence. Are these duplicates intentional?
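For anyone who wants to check this themselves, here is a minimal sketch of how exact duplicate lines can be listed. It assumes train.txt is a UTF-8 text file with one instance per line, and it counts lines from 1, which may not match the numbering used above. The names `find_duplicate_lines` and `report` are just illustrative helpers, not part of this repository.

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Map each repeated line to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text:  # skip empty lines
            positions[text].append(number)
    # Keep only lines that appear more than once.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


def report(path="train.txt"):
    """Print the line numbers and first 80 characters of every duplicated line."""
    # Assumption: the file is UTF-8 with one instance per line.
    with open(path, encoding="utf-8") as handle:
        duplicates = find_duplicate_lines(handle)
    for text, nums in duplicates.items():
        print(nums, text[:80])
```

Calling `report("train.txt")` prints one row per duplicated line. This only catches byte-for-byte identical lines; instances that share entities and relation but differ in the sentence are not flagged.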