HashJoin has problematic interaction with Merge #61

@oscar-stripe

See this graph, generated on 3.2.1:
https://www.dropbox.com/s/iffadh9x7unrg5w/01-BalanceAssembly-init.dot.png?dl=0

You can see the full planner logs here:
https://www.dropbox.com/s/7qyc4a9pxtstwio/E552D2.tgz?dl=0

We are merging two HashJoins after some Each operations. In this particular graph, it seems possible to fix the issue by adding Checkpoints after all but one of the HashJoins. This is not a great solution, because it is hard to know what the graph will look like once many pipes are combined with functions.
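For reference, the workaround described above could look roughly like the sketch below. It is not the actual assembly from the linked graph: the pipe names, field names, the `Identity` function standing in for the real Each operations, and the placement of the Each before each join are all assumptions made for illustration. Only the overall shape (two HashJoins feeding a Merge, with a Checkpoint after all but one of them) comes from the description.

```java
import cascading.operation.Identity;
import cascading.pipe.Checkpoint;
import cascading.pipe.Each;
import cascading.pipe.HashJoin;
import cascading.pipe.Merge;
import cascading.pipe.Pipe;
import cascading.tuple.Fields;

public class CheckpointedMergeSketch {

  // One branch: an Each on the left side, followed by a HashJoin.
  // All pipe and field names here are hypothetical.
  static Pipe joinBranch(String suffix) {
    Pipe lhs = new Each(new Pipe("left" + suffix), new Identity());
    Pipe rhs = new Pipe("right" + suffix);
    // Distinct join field names avoid a field name collision in the join result.
    return new HashJoin(lhs, new Fields("id"), rhs, new Fields("rid"));
  }

  // Two HashJoin branches merged, with a Checkpoint after all but one of them.
  static Pipe build() {
    Pipe first = new Checkpoint(joinBranch("1")); // checkpointed branch
    Pipe second = joinBranch("2");                // left without a Checkpoint
    return new Merge("merged", first, second);
  }

  public static void main(String[] args) {
    Pipe merged = build();
    System.out.println(merged.getName());
  }
}
```

The sketch only builds the pipe assembly. Binding sources and sinks and running the planner are left out, so it does not by itself show whether the planner accepts the result.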

It would be great either to have a clear rule to follow when generating graphs, or to have this restriction removed, since we would like to use Cascading 3 in Scalding by default.

Thanks.
