Ability to delete local cloudpath cache after upload #29

@lazear

I found myself downloading a large amount of data from PRIDE (PXD004452) on a small EC2 instance (64 GB disk space), with the goal of transferring the data directly to an S3 bucket (I have done this several times, I ❤️ ppx). I have always just started a small instance with minimal disk space, because I figured that since I was transferring directly to S3 it wouldn't matter... This is not the case though! I ran out of disk space due to cloudpathlib's local caching.

If I delete the files in the /tmp directory, I can free up space and try to resume the download - but when I restart it, the completed raw files are re-synced back to the /tmp directory. I think there should be a way (based on the issues linked below) to manually delete the locally cached file after upload - I'm not sure how that would work for a restarted download. I can take a stab at this if it's something you feel could be supported in ppx. This is probably too specialized to be upstreamed to cloudpathlib - I would say raw files downloaded from PRIDE etc. are immutable, so we don't need to worry about syncing changes from local to cloud, just whether the file is synced between cloud storage and the repository.
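To make the idea concrete, here is a rough sketch of the behaviour I have in mind. It is not ppx or cloudpathlib code: `fetch`, `remote_exists`, and `upload` are hypothetical callables standing in for "download from the repository", "check the bucket", and "write to the bucket", and the function name is made up. It assumes raw files are immutable, so a file that already exists in the bucket is skipped rather than re-synced to local disk.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def transfer_without_cache(
    names: Iterable[str],
    fetch: Callable[[str, Path], Path],
    remote_exists: Callable[[str], bool],
    upload: Callable[[Path], None],
    scratch_dir: Path,
) -> List[str]:
    """Move files to cloud storage one at a time, keeping no local copies.

    All three callables are hypothetical placeholders (assumptions for this
    sketch), not functions from ppx or cloudpathlib.
    """
    transferred = []
    for name in names:
        # Immutable raw files: if the bucket already has it, skip entirely
        # so a restarted run does not pull completed files back to disk.
        if remote_exists(name):
            continue
        local_path = fetch(name, scratch_dir)
        try:
            upload(local_path)
        finally:
            # Free the disk space whether or not the upload succeeded.
            local_path.unlink(missing_ok=True)
        transferred.append(name)
    return transferred
```

With this shape, peak local disk usage is roughly the size of the largest single file rather than the whole project, and a restart only touches files that are not yet in the bucket.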

https://cloudpathlib.drivendata.org/stable/caching/

drivendataorg/cloudpathlib#233
drivendataorg/cloudpathlib#153
