duplicate data in train.txt #12

@ShellingFord221

Hi, I have found some duplicate entries in train.txt. For example:
line 190245: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190246: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###

line 190667: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
line 190668: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###

Each pair is identical in both entities and sentence. Is the duplication intentional?
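
For anyone who wants to check this themselves, here is a minimal sketch of how such duplicates can be located. It assumes each example occupies exactly one line of train.txt and that a duplicate means an exact full-line match. The function names `find_duplicate_lines` and `report_duplicates` are hypothetical and not part of this repository.

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Map each repeated line to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text:  # ignore blank lines
            positions[text].append(number)
    # Keep only lines that appear more than once.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


def report_duplicates(path="train.txt"):
    """Print line numbers and a short preview of every duplicated line.

    The default path is an assumption; pass the real location of the file.
    """
    with open(path, encoding="utf-8") as handle:
        duplicates = find_duplicate_lines(handle)
    for text, nums in duplicates.items():
        print(nums, text[:80])
    return duplicates
```

Calling `report_duplicates("train.txt")` lists every group of identical lines, which should include the pairs quoted above if the local copy of the file matches. Exact matching is deliberately strict, so lines that differ only in whitespace would not be reported.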
