Formatting issues when converting the NLM-Chem corpus & question about converting relations #8

@mobashgr

Hi Lenz!
Great library and a life saver 👍. However, I have been doing extensive post-processing after converting BioC XML to CoNLL, even with `byte_offsets` set to `False`. Briefly, there are two problems. First, many tokens and their corresponding labels end up merged as if they were a single token. Second, some labels look like "B-Chemicalentity;B-Chemicalentity". Here is the corpus that I am using.
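
As an illustration of the kind of label post-processing mentioned above, here is a minimal sketch that collapses semicolon-joined labels such as "B-Chemicalentity;B-Chemicalentity" into a single label. It is not part of bconv; the helper names `normalize_label` and `clean_conll_lines` are hypothetical, and it assumes the label sits in the last tab-separated column and that keeping the first label is acceptable when the joined labels differ.

```python
from typing import Iterable, Iterator


def normalize_label(label: str) -> str:
    """Collapse a semicolon-joined label like "B-X;B-X" into one label.

    Assumption: identical duplicates carry no extra information, and when
    different labels are joined, keeping the first one is acceptable.
    """
    parts = [p for p in label.split(";") if p]
    return parts[0] if parts else label


def clean_conll_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield CoNLL lines with the last tab-separated column normalized.

    Assumption: the label is the last column. Blank lines (sentence
    breaks) and comment lines starting with '#' pass through unchanged.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        cols = stripped.split("\t")
        cols[-1] = normalize_label(cols[-1])
        yield "\t".join(cols)
```

For example, the line `"Aspirin\t0\t7\tB-Chemicalentity;B-Chemicalentity"` becomes `"Aspirin\t0\t7\tB-Chemicalentity"`. This only addresses the label problem; the merged-token problem would need a fix at tokenization time.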

Regarding relations: can you provide an example, or extra hints beyond those in the documentation, for converting relations from BioC XML to CoNLL?

Best,
Ghadeer