Task cleanup should not look for _temporary dirs when talking to s3 #66

@cwensel

Seeing this exception sometimes when talking to S3. It could be a race condition on directory (key) creation in S3, but there should be no task cleanup at all when using S3.

    at cascading.tap.hadoop.io.TapOutputCollector.close(TapOutputCollector.java:184)
    at cascading.tuple.TupleEntrySchemeCollector.close(TupleEntrySchemeCollector.java:245)
    at cascading.tap.partition.BasePartitionTap$PartitionCollector.closeCollector(BasePartitionTap.java:205)
    at cascading.tap.partition.BasePartitionTap$PartitionCollector.close(BasePartitionTap.java:190)
    at cascading.flow.stream.element.SinkStage.cleanup(SinkStage.java:148)
    at cascading.flow.stream.graph.StreamGraph.cleanup(StreamGraph.java:187)
    at cascading.flow.local.planner.LocalStepRunner.call(LocalStepRunner.java:204)
    at cascading.flow.local.planner.LocalStepRunner.call(LocalStepRunner.java:53)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://bucket/test/_temporary/_attempt_002147483647_0000_m_000000_0/path/
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1931)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1822)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1763)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1585)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1561)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.moveTaskOutputs(Hadoop18TapUtil.java:326)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.moveTaskOutputs(Hadoop18TapUtil.java:332)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.moveTaskOutputs(Hadoop18TapUtil.java:332)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.moveTaskOutputs(Hadoop18TapUtil.java:332)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.moveTaskOutputs(Hadoop18TapUtil.java:332)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.commitTask(Hadoop18TapUtil.java:174)
    at cascading.tap.hadoop.io.TapOutputCollector.close(TapOutputCollector.java:171)
    ... 11 more
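
One possible shape for the guard requested here, as a minimal sketch only: decide from the output URI scheme whether the task-level commit (the `_temporary` listing and move seen in the trace) should run at all. The class `TempDirGuard`, its methods, and the set of schemes are hypothetical names introduced for illustration; none of them are part of Cascading or Hadoop, and the real fix may look quite different.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Cascading or Hadoop.
final class TempDirGuard {
    // Assumption: these schemes identify S3-backed outputs.
    private static final Set<String> OBJECT_STORE_SCHEMES = Set.of("s3", "s3a", "s3n");

    // True when the output URI points at an S3-style object store.
    static boolean isObjectStore(URI output) {
        String scheme = output.getScheme();
        return scheme != null
            && OBJECT_STORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // True when the task commit should list and move _temporary outputs.
    static boolean needsTaskCommit(URI output) {
        return !isObjectStore(output);
    }

    public static void main(String[] args) {
        // prints false: skip the _temporary handling for S3
        System.out.println(needsTaskCommit(URI.create("s3a://bucket/test/")));
        // prints true: keep the existing behavior elsewhere
        System.out.println(needsTaskCommit(URI.create("hdfs://namenode/test/")));
    }
}
```

A caller would check `needsTaskCommit` before attempting to move task outputs, so that no `listStatus` call is ever issued against a `_temporary` key on S3.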
