-
Notifications
You must be signed in to change notification settings - Fork 31
Add a Workflow Example for Geospatial Analysis using RAPIDS #638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:49Z Chore: It would be nice to link things like
Chore: I would also be careful to define acronyms here on there first use, there are a few like AOI that readers may not know. |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:50Z Line #14. FEATURES_ZARR = Path("")
Suggestion (non-blocking): I would be tempted to separate out the variables that you expect users to change into a new cell. That way it's more clear what needs to be done. ncclementi commented on 2025-12-02T14:10:21Z +1 to this comment, I thought of the same. |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:51Z Suggestion (non-blocking): Have you thought about using |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:51Z suggestion (non-blocking): you could use cupy-xarray here instead of manually mapping the blocks over to cupy. It would be nice to encourage people to use that library. |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:52Z thought: This cell is going to take a while right? I wonder if you should add a note to talk about nvdashboard and suggest folks go and have a look at the docs there so they can see the activity of their hardware |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:54Z nitpick: You say "note how...", this could be a note admonition. |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:54Z Line #1. model_path = Path("/raid/jjayabaskar/full-outputs/models/lulc_rf_2022.pkl")
suggestion (non-blocking): Maybe this path variable should be defined at the top with the rest of them |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:55Z nit: I don't think you need to describe closing the client |
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-25T16:52:56Z thought: The command to "Run it now" made me chuckle |
jacobtomlinson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really nice! I left a few comments throughout (I'm trying out conventional comments, let me know what you think).
A few higher level thoughts:
The chunk of text above each code cell feels a little robotic. I'm guessing you used AI to generate some portion of these. It feels a little like that code comment trap where you just describe what the code does. I do appreciate some amount of that because the code is necessarily terse, lots of geo data analysis code is. But I think what it is missing is "why". You start out at the beginning with some high level goals. It would be nice to keep a narrative thread throughout explaining why you are doing each things and how each step is bringing us closer to that goal.
The notebook feels like it ends abruptly. There is no conclusion or recap of what was done. I think you need to wrap the post up in some way, probably by recapping what was done and then talking a little more about what could be improved or done next.
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
|
View / edit / reply to this conversation on ReviewNB jacobtomlinson commented on 2025-11-26T17:45:09Z Suggestion: This reads really nicely now. I would encourage you to add more links though. Any project name like dask, cudf, cupy, etc should be linked to their docs or GitHub repo on first use. I would also link NDVI and NDWI to their wikipedia pages.
I have two personas in my head for the readers of this post:
At the moment this post reads very well for the 1st persona, but there is some geospatial terminology that it would be good to unpack for the 2nd persona. It's also worth noting that PyData Boston will be heavily weighted towards the 2nd persona. ncclementi commented on 2025-12-02T14:04:02Z +1 to Jacob's comment.
Few links to add, and typos:
The prerequisites are a bit confusing to me, I assume that you will probably explain that in the code below, if so maybe add a line that says so. If I try to see before continuing how do I get a Prerequisites 2 and 3, I'm not sure I know how, maybe mentioning that you'll show how this below? (if you show it), or point to a place that shows how to do that. |
|
+1 to Jacob's comment.
Few links to add, and typos:
The prerequisites are a bit confusing to me, I assume that you will probably explain that in the code below, if so maybe add a line that says so. If I try to see before continuing how do I get a Prerequisites 2 and 3, I'm not sure I know how, maybe mentioning that you'll show how this below? (if you show it), or point to a place that shows how to do that. View entire conversation on ReviewNB |
|
+1 to this comment, I thought of the same. View entire conversation on ReviewNB |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:20Z Do we have instructions on how to set up the environment somewhere, with pinned versions? |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:20Z if this is the first time you introduce Zarr, then add a link to the library |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:21Z Do we have any guides to point someone how to define this? most specifically the AOI_GEOJSON and FEATURE_ZARR, as somoene who is not entirely familiar with this, I wouldn't know how to do this.
If we can write as an example somewhere what you did for this notebook, that would be great. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:22Z When you say "If you need to pin workers to specific GPUs or change memory limits" in this case, why would this happen? If you think it could be needed, add a link to the docs on how to do this, so the user has a guide. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:23Z Line #1. # %% Is this a typo? if so we could remove it, I noticed it appears in several cells, not sure if it's intentional |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:24Z There is a lot of info here, that only experts will understand, having a guide on how to prepare the AOI geometry as a reference would be helpful for the user. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:24Z Line #19. project = pyproj.Transformer.from_crs(4326, target_epsg, always_xy=True).transform Is this because your CRS is on 4326 right? It would be helpful to know what choices are made because of your particular case, and what is not, maybe some comments around why the |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:25Z "Sentinel-2 tiles also include a special band called" ... It would be nice to have a link to the docs that supports this, in case the user wants to read more about it. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:26Z Why do we do this step, maybe a small not mentioning why?
In general, I noticed that all the steps have text above on what are we doing, but not why is that needed or part of the process, I think this would be helpful for the audience with less geo knowledge. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:26Z "data preprocessing steps are done ", At this point, after so many steps, it would be worth adding a sentence or two re-caping what was done, in words. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:27Z Line #2. random_state = 42 Is this for the RF? Do we need a random state? Asking for two reasons:
|
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:28Z Add a comment, to expand on what is each plot in parentheses to remind the user: like "NDVI ( short desc ), NDWI ( short desc ), and WorldCover ( short desc )" |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:29Z "The strong skew you see here comes" is this s related to class 7? maybe add class 7 = Building area in parentheses. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:30Z How did you choose those values (n_estimators, n_bins, n_streams), if these are optimized mention how you got there, if these are a good starting point, mentioned that. To give the user a reference. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:31Z Do we know how long it takes for the hardware you run this on? It is helpful to give the user a reference of time. |
|
View / edit / reply to this conversation on ReviewNB ncclementi commented on 2025-12-02T15:15:31Z Any idea what those warnings are about? |
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
ncclementi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adding a notebook here which consists of an end-to-end workflow example to load and preprocess Sentinel-2 data, train a Random Forest model to perform LULC classification and perform inference on an unseen Sentinel-2 tile.