
Missing code to process source and target inputs #1

@rsgoncalves


Based on the code provided in usage_example.ipynb, it is not clear what the source and target datasets should be. I presumed the source dataset is a list of strings and the target is an ontology or a terminology; however, in the code below the target dataset also seems to be a text file rather than a terminology.

Similarly, it is not clear how the source and target weights are computed. To be able to run this tool, it would help to have these points clarified, along with code that processes the inputs and generates the corresponding BioBERT weights.

```python
source_dataset = "../data/input.txt"
target_dataset = "../data/target.txt"
source_weight = "../data/input.jsonl"
target_weight = "../data/target.jsonl"
```
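Going by the presumption above that the text inputs are plain lists of strings, a minimal loader for `input.txt` and `target.txt` might look like the sketch below. The one-term-per-line layout and the helper name `load_terms` are assumptions for illustration only; the repository does not document the expected format.

```python
from pathlib import Path


def load_terms(path):
    """Read a plain-text file with one term per line.

    Hypothetical helper: the one-term-per-line layout is an assumption,
    not a format documented by the repository. Surrounding whitespace is
    stripped and blank lines are skipped; file order is preserved.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Once the data folder is available, `load_terms("../data/input.txt")` would return the source terms in file order, and the same call would work for `target.txt` if the target really is a text list rather than an ontology file.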

The data folder used in the example usage is missing. What exactly should these files be, and how can potential users generate them?
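One guess, not confirmed by the repository, is that the `.jsonl` weight files are the per-token feature dumps written by `extract_features.py` from the original BERT code base (on which the TensorFlow BioBERT release is built), run over `input.txt` and `target.txt` with a BioBERT checkpoint via `--input_file`, `--output_file`, `--vocab_file`, `--bert_config_file` and `--init_checkpoint`. If so, each line is a JSON object with a `linex_index` and a list of `features`, each holding a `token` and per-layer `values`. The sketch below reads such a file and mean-pools the last-layer token vectors (skipping `[CLS]` and `[SEP]`) into one vector per input line. The pooling strategy and the helper names `pooled_vector` and `load_weights` are hypothetical; the tool may combine the vectors differently.

```python
import json
from pathlib import Path

# Tokens BERT adds around every input sequence; excluded from pooling
# (an assumption about how a single term vector would be built).
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def pooled_vector(record, layer_index=-1):
    """Mean-pool one extract_features-style record into a single vector.

    `record` is one parsed JSONL line shaped like:
    {"linex_index": int,
     "features": [{"token": str,
                   "layers": [{"index": int, "values": [float, ...]}]}]}
    Tokens that lack the requested layer are skipped.
    """
    vectors = []
    for feature in record["features"]:
        if feature["token"] in SPECIAL_TOKENS:
            continue
        for layer in feature["layers"]:
            if layer["index"] == layer_index:
                vectors.append(layer["values"])
                break
    if not vectors:
        raise ValueError(f"no token vectors for line {record['linex_index']}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def load_weights(path, layer_index=-1):
    """Return one pooled vector per JSONL line, ordered by linex_index."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    records.sort(key=lambda r: r["linex_index"])
    return [pooled_vector(r, layer_index) for r in records]
```

If this guess is right, `len(load_weights("../data/input.jsonl"))` should match the number of non-empty lines in `input.txt`, which gives a quick way to check that a text file and its weight file belong together.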
