
Data organization issue #39

@laoliu5280


Context:

To run on a dataset, we pass a task_name to pipeline.py. This task_name is also the name of the directory that contains the dataset, for example /hypothesis-generation/data/task_A.

When the task runs, a BaseTask object is created from this task_name, and it then loads the data, metadata, config, etc.
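The flow above can be sketched as follows. This is a minimal illustration under assumptions, not the repository's code: `SketchTask` is a hypothetical stand-in for `BaseTask`, the `--task_name` flag name is assumed, and the data root is taken from the example path above.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path

# Hypothetical data root, taken from the example path in this issue.
DATA_ROOT = Path("/hypothesis-generation/data")


def parse_task_name(argv: list[str]) -> str:
    # Hypothetical CLI flag; the real pipeline.py argument may be named differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--task_name", required=True)
    return parser.parse_args(argv).task_name


@dataclass(frozen=True)
class SketchTask:
    """Hypothetical stand-in for BaseTask: every path is derived from task_name."""

    task_name: str
    data_root: Path = DATA_ROOT

    @property
    def task_dir(self) -> Path:
        # The CLI task_name doubles as the dataset directory name.
        return self.data_root / self.task_name

    @property
    def config_path(self) -> Path:
        # Assumed file name, matching the config.yaml files mentioned in this issue.
        return self.task_dir / "config.yaml"


task = SketchTask(parse_task_name(["--task_name", "task_A"]))
```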

In the config.yaml files, users must specify another "task_name", which is used only to look up the task's registered extract_label function.

Defining task_name in two places can cause confusion or bugs.
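To make the two roles of task_name concrete, here is a minimal sketch assuming the registry is a plain dict filled by a decorator; `EXTRACT_LABEL_REGISTRY`, `register_extract_label`, `resolve`, and the "Answer:" parsing rule are all hypothetical names chosen for illustration. The two lookups never check each other, so a config whose task_name drifts from the directory name fails only when the extractor is resolved.

```python
from typing import Any, Callable

# Hypothetical registry keyed by the task_name written in config.yaml.
EXTRACT_LABEL_REGISTRY: dict[str, Callable[[str], str]] = {}


def register_extract_label(name: str):
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        EXTRACT_LABEL_REGISTRY[name] = fn
        return fn
    return decorator


@register_extract_label("task_A")
def extract_label_task_a(text: str) -> str:
    # Hypothetical parsing rule: keep whatever follows the last "Answer:".
    return text.rsplit("Answer:", 1)[-1].strip()


def resolve(cli_task_name: str, config: dict[str, Any]) -> tuple[str, Callable[[str], str]]:
    # First task_name (pipeline.py argument): selects the dataset directory.
    data_dir = f"data/{cli_task_name}"
    # Second task_name (config.yaml): selects the extract_label function.
    extract_label = EXTRACT_LABEL_REGISTRY[config["task_name"]]
    return data_dir, extract_label


# Consistent names resolve fine.
data_dir, extract_label = resolve("task_A", {"task_name": "task_A"})

# A config whose task_name differs from the directory name raises KeyError at lookup.
try:
    resolve("task_A", {"task_name": "task_A_few_shot"})
except KeyError as err:
    print(f"no extract_label registered for {err}")
```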

Maybe we can rename one of them or change how it is handled. The number of datasets is growing quickly, so we will likely need a more organized dataset layout in the future. Sometimes one task also has several configurations, and creating a separate extract_label function for each configuration of the same task is unnecessary.
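One possible direction, sketched under assumptions: keep the directory name as the only dataset identifier and give each config a separate, explicitly named extractor key, so several configurations of one task can share a single function. The `TaskConfig` fields (`dataset`, `variant`, `label_extractor`), the registry, and the "Answer:" rule are hypothetical; a layout such as `data/task_A/configs/<variant>.yaml` would fit this model but is likewise only a suggestion.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical registry keyed by a dedicated extractor name rather than a task_name.
LABEL_EXTRACTORS: dict[str, Callable[[str], str]] = {}


def register_label_extractor(name: str):
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        # Refuse silent overwrites so two tasks cannot clobber each other's parser.
        if name in LABEL_EXTRACTORS:
            raise ValueError(f"label extractor {name!r} is already registered")
        LABEL_EXTRACTORS[name] = fn
        return fn
    return decorator


@register_label_extractor("answer_suffix")
def answer_suffix(text: str) -> str:
    # Hypothetical parsing rule shared by every config that names it.
    return text.rsplit("Answer:", 1)[-1].strip()


@dataclass(frozen=True)
class TaskConfig:
    dataset: str          # directory name under the data root, e.g. "task_A"
    variant: str          # which configuration of that dataset, e.g. "few_shot"
    label_extractor: str  # registry key; several variants may share one

    def data_dir(self, root: Path) -> Path:
        return root / self.dataset

    def extract_label(self, text: str) -> str:
        try:
            fn = LABEL_EXTRACTORS[self.label_extractor]
        except KeyError:
            raise KeyError(
                f"config {self.dataset}/{self.variant} names unknown "
                f"label_extractor {self.label_extractor!r}"
            ) from None
        return fn(text)


# Two configurations of the same task reuse one extractor.
default = TaskConfig("task_A", "default", "answer_suffix")
few_shot = TaskConfig("task_A", "few_shot", "answer_suffix")
```

Here the directory name answers "where is the data", `variant` answers "which setup", and `label_extractor` answers "how to parse outputs", so no single name has to serve two roles.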

Metadata


Labels: invalid
