Support multiple input files vs dir of files #9

@cwensel

Tess can process all files in a given directory. The limitation is that the files must all have exactly the same schema if the transforms require specific fields.

New intrinsics have been added to clean up field names and add missing columns, but when applied to a set of files, Tess must be run on each file individually with a final Tess pass to merge them all.

Tess should accept multiple sources with the intent of merging them before the sink. That is, an identical branch will be created for each source, but each branch will accommodate the unique field names of its file. The requirement is that, by the merge before the sink, all branches have the same declared fields.
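To make the intended behavior concrete, here is a minimal sketch in plain Python. It does not use Tess or Cascading; `run_branch`, `merge`, and the sample file names and fields are hypothetical names chosen for illustration. Each source gets its own branch that renames fields and adds missing columns, and the merge step rejects any branch whose declared fields differ from the others.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def run_branch(rows: List[Row], rename: Dict[str, str], defaults: Row) -> List[Row]:
    """Apply one source's branch: rename fields, then add missing columns."""
    out = []
    for row in rows:
        renamed = {rename.get(key, key): value for key, value in row.items()}
        for name, value in defaults.items():
            renamed.setdefault(name, value)  # only fills columns that are absent
        out.append(renamed)
    return out


def merge(branches: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate branch outputs, requiring identical declared fields."""
    declared: Optional[Tuple[str, ...]] = None
    merged: List[Row] = []
    for name, rows in branches.items():
        for row in rows:
            fields = tuple(sorted(row))
            if declared is None:
                declared = fields
            elif fields != declared:
                raise ValueError(
                    f"branch {name!r} declares {fields}, expected {declared}"
                )
            merged.append(row)
    return merged


# Two hypothetical inputs with different schemas.
file_a = [{"User ID": 1, "amount": 10}]
file_b = [{"user_id": 2}]

# Same branch logic, parameterized per source.
branch_a = run_branch(file_a, rename={"User ID": "user_id"}, defaults={})
branch_b = run_branch(file_b, rename={}, defaults={"amount": None})

merged = merge({"a.csv": branch_a, "b.csv": branch_b})
```

The point of the sketch is the ordering: per-source cleanup happens inside each branch, and the schema check happens once, at the merge, rather than requiring a separate run per file.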

Cascading supports this natively, but the PipelineDef model will need to be updated. Better still, any named source not declared as a join file could be treated as a source for the main branch.
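The second option could look roughly like the following sketch. This is not the actual PipelineDef model: `split_sources`, the dictionary shape, and the example names and paths are all assumptions made for illustration. It only shows the rule that anything not declared as a join file falls through to the main branch.

```python
from typing import Dict, Set, Tuple


def split_sources(
    named_sources: Dict[str, str], join_names: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition named sources into main-branch sources and join files."""
    main = {n: path for n, path in named_sources.items() if n not in join_names}
    joins = {n: path for n, path in named_sources.items() if n in join_names}
    return main, joins


# Hypothetical source names and paths.
sources = {
    "orders_2023": "data/orders-2023.csv",
    "orders_2024": "data/orders-2024.csv",
    "customers": "data/customers.csv",
}

main_sources, join_sources = split_sources(sources, join_names={"customers"})
```

Under this rule no new declaration is needed for the extra inputs; the existing join declarations are enough to tell the two kinds of source apart.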

Labels: enhancement (New feature or request)