Add automatic Scratch.jl fallback for storage #80

Closed
asinghvi17 wants to merge 2 commits into EcoJulia:master from asinghvi17:docs/update-storage-documentation

asinghvi17 (Contributor) commented Sep 18, 2025

Add a Scratch.jl fallback for when RASTERDATASOURCES_PATH is not set. This removes the most common user footgun.

AI generated.

- Update README.md to explain automatic storage using Scratch.jl
- Make RASTERDATASOURCES_PATH optional in installation instructions
- Add comprehensive docstring to rasterpath() function
- Update examples in ALWB and MODIS documentation to be more generic
- Remove references to get_raster_storage_path() function per plan changes

This addresses task 7 from the default-scratch-storage spec.
asinghvi17 changed the title from "Update documentation for automatic storage functionality" to "Add automatic Scratch.jl fallback for storage" on Sep 18, 2025
rafaqz (Member) commented Oct 24, 2025

Hey we should merge this, just looks like a Project.toml issue?

rafaqz mentioned this pull request Oct 24, 2025

rafaqz (Member) commented Oct 24, 2025

Closing in favour of #83

rafaqz closed this Oct 24, 2025
"""
function rasterpath()
    # Priority 1: Use environment variable if set and valid
    if haskey(ENV, "RASTERDATASOURCES_PATH") && isdir(ENV["RASTERDATASOURCES_PATH"])
Collaborator

I think we might want to special-case the situation where RASTERDATASOURCES_PATH is set but is not a directory. As it stands, this will create a scratch directory and potentially start downloading a bunch of data. I think it should just error in that case (as before), since it means the user actively set a path and might just have made a typo.
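
The behaviour suggested in this comment could look roughly like the sketch below. `resolve_rasterpath`, its `env` argument and its `fallback` keyword are hypothetical names introduced here for illustration and are not part of the PR; in the PR the fallback would be the `@get_scratch!("raster_data")` call shown in the diff, for which `mktempdir` is only a stand-in.

```julia
# Hypothetical helper, not the PR's implementation. `env` and `fallback`
# are parameters so the logic can be exercised without touching real state.
function resolve_rasterpath(env::AbstractDict=ENV; fallback::Function=() -> mktempdir())
    key = "RASTERDATASOURCES_PATH"
    if haskey(env, key)
        path = env[key]
        # The user set a path explicitly. A non-directory is most likely a typo,
        # so fail loudly instead of silently downloading somewhere else.
        isdir(path) || error("$key is set to \"$path\", which is not an existing directory.")
        return path
    end
    # The variable is not set at all: only now use the fallback location.
    return fallback()
end
```

With this shape, a mistyped path raises an error immediately, and the fallback only runs when the variable is missing entirely.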

Comment on lines +106 to +107
scratch_dir = @get_scratch!("raster_data")
@debug "Using scratch directory for raster data storage: $scratch_dir"
Collaborator

Does @get_scratch! make a new directory by default, or does it just return a path? Can we get this to print an @info if scratch_dir doesn't exist yet, so that first-time users are aware of this hidden directory where potentially many GB will end up?

Member

Yeah, this is a concern, and it is why I never replaced the path. At least you know what you are doing when you set it manually.

Contributor Author

It depends on the package and Julia version.

In general we can have it print if the directory is empty though, which is a good indication that nothing has been downloaded yet.

But I really don't think the files RDS downloads are large enough that most users these days will care. Laptops have lots of storage!
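
The empty-directory check mentioned above could be sketched as follows. `notify_if_empty` is a hypothetical helper name and the message wording is an assumption; neither comes from the PR.

```julia
# Hypothetical helper: log a notice when the storage directory holds nothing
# yet, which is a good sign that no data has been downloaded so far.
function notify_if_empty(dir::AbstractString)
    empty = !isdir(dir) || isempty(readdir(dir))
    if empty
        # `dir` is attached to the log record as a key-value pair.
        @info "Raster data will be stored in a scratch directory; set RASTERDATASOURCES_PATH to choose another location." dir
    end
    return empty
end
```

The function returns `true` when it logged the notice, so a caller could also use the result to decide whether to ask for confirmation before a large download.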

Collaborator

Ah so on a new Julia version the scratch space would change and it would all start downloading again?? That's not great

> But I really don't think the files RDS downloads are large enough that most users these days will care. Laptops have lots of storage!

I disagree. I just checked, and my CHELSA folder alone holds 120 GB. That's a lot. And the thing is, it keeps accumulating.
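
For anyone who wants to check their own storage folder in the same way, here is a minimal sketch using only Julia's standard library. `dirsize` is a hypothetical helper name, not a function of this package.

```julia
# Sum the sizes (in bytes) of all regular files below `dir`.
# Broken symlinks are skipped because `isfile` returns false for them.
function dirsize(dir::AbstractString)
    total = 0
    for (root, _, files) in walkdir(dir)
        for f in files
            p = joinpath(root, f)
            total += isfile(p) ? filesize(p) : 0
        end
    end
    return total
end
```

Dividing the result by `2^30` gives the size in GiB, for example `dirsize(ENV["RASTERDATASOURCES_PATH"]) / 2^30` when that variable is set.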

Contributor Author

Huh wow, yeah, that's substantial. I somehow never downloaded more than 3 GB to my folder.

Contributor Author

You can still set your path manually, and we can warn when you haven't. But IMO not having a default is just user-unfriendly, and it has caused confusion in the past.

rafaqz (Member), Oct 24, 2025

There are multiple 100 GB+ datasets here, and some of the weather data must be many terabytes.

It's very much all in the "kill your home directory" range, so we have to be a bit careful.

That's the reason there is no default. Silently borking someone's system is also very user-unfriendly.

3 participants