@cstenkampstrahm @bralex @Preston5789: Since you're presenting so soon, I wanted to make sure I got you some feedback on the current version of your project (based on the GitHub repository as of this evening). These are items to consider for the final draft of the project due next week, not things you necessarily need to resolve before you present. I don't expect you to incorporate all of the suggestions, but do address some of them. The only serious concern I have is that the RMarkdown document doesn't include any interesting R code: all the figures are generated by scripts somewhere else and then read in as images. This keeps the file from being truly reproducible. If I wanted to go back and see how you made a figure, I couldn't. I could find the image file you're reading in, but nothing in that image tells me which R script created it, so I'd have to search through all of your R scripts to find the right code. One quick alternative is to call `source()` on the R scripts that create the plots from within the RMarkdown document. At least then the RMarkdown document would clearly specify which file holds the code for each figure, so it would be a bit more reproducible.
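A minimal sketch of the `source()` idea, as it might look inside a report chunk. The script name `mollusc_plot.R` and the object `mollusc_plot` are hypothetical; the first few lines only write a stand-in script to a temporary folder so the example runs on its own. In the report you would skip that part and point `source()` at your real file in `R/`.

```r
# Stand-in for one of your plotting scripts (hypothetical name and contents).
# In the real project this file would already exist in R/.
script_path <- file.path(tempdir(), "mollusc_plot.R")
writeLines(c(
  "library(ggplot2)",
  "mollusc_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()"
), script_path)

# Inside an .Rmd chunk: run the script, so the report names the file that
# built the figure, then print the plot object the script created.
source(script_path, local = TRUE)
mollusc_plot
```

Inside knitr, `source(path, local = knitr::knit_global())` is another way to make the script's objects available to later chunks.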
Here are other comments, split into comments for the RMarkdown report, the Shiny app, and overall:
RMarkdown document
- It would be helpful to include more links in the RMarkdown document: for example, a link to a website with more information about the Ranger Uranium mine at its first mention, a link to the Environmental Research Institute of the Supervising Scientist (ERISS), etc.
- The first time you list the metals, put the full name, then abbreviation (for example, “copper (Cu)”). This will be helpful to readers who don’t remember all their abbreviations (which will probably be plenty of us in the class).
- The first map in the report is a bit hard to read because it’s zoomed out so far, and some of the labels don’t show at all (the ones in the background map, especially the green ones). I would suggest playing around with this map to make it more user-friendly. Some of the things I would try: (1) Try a different source for the background map. Something with a bit less going on, or with more readable labels, might make the whole map look cleaner. (2) Zoom in a little more. With some of the map sources for `ggmap::get_map()` (Stamen, I think, but not Google), you can input a bounding box (latitude and longitude for the outside corners of the range) and clip the map right down to the area you need. See the help file (`?get_stamenmap`) for more details.
- Ideally, since most people in the class don’t know their Australian geography, you would also include an inset or something similar that shows which part of Australia this map covers. This isn’t completely straightforward, and it’s not the end of the world if you can’t figure it out for the final report, but here are some webpages with examples (and code) for doing this in R, so you can see what I mean:
https://quantpalaeo.wordpress.com/2016/06/05/ggplot2-maps-with-inset/
https://plot.ly/r/choropleth-maps/
https://www.r-bloggers.com/creating-inset-map-with-ggplot2/
- In the RMarkdown document, why are you reading an image in for the first map, rather than running R code to generate it directly (`large map` code chunk)?
- Same for the second map (`small map` code chunk)?
- Best practice is to avoid spaces when naming your code chunks (so `largemap` instead of `large map`). knitr uses chunk names when it builds things like figure file names, so it’s safer not to have spaces.
- Be a bit clearer in explaining the difference between the first and second maps. I can see that the second map is showing the type of sample, but why are there some samples from further to the west and east that are shown in the first map but not the second? Are those samples still in the dataset used for the second map, but are cut out with the amount of zooming? Or are those samples excluded from the sample dataset as you move from the first map to the second? If so, why?
- For the second map, it’s also a bit hard to distinguish detailed information. You may want to try faceting (creating separate maps for each sample substrate), and then increasing the size of the points and making them somewhat transparent. That way, you could see the points better because they’d be larger, but you could also see where they overlap (transparency) and see the patterns in measurements for different substrates (facets). You could use some of the `forcats` functions to arrange these by number of samples or average radionuclide concentration (if that’s meaningful) or something.
- Great that you’ve linked through to the recent journal article.
- “The only manual process was to load the file into Excel and save as a .csv file”: Why not read it directly into R using `readxl` and write out a csv, if you want one, using R? This would allow you to script everything you did starting from the raw data you got from someone else.
- The conversion from Easting / Northing to latitude / longitude is a very interesting part of your data cleaning. I suggest you include a few more details about that and perhaps a bit of example R code. You’ll also probably want to talk a bit about that during your presentation.
- Measuring distance from the mine to a sample is an important part of your analysis. Please add a bit of text to your report describing how you did that in R. Any special functions or formulae?
- Again, you’re killing me with just saving a png somewhere else and then reading it in here. This takes all of the efficiency (and reproducibility) out of RMarkdown.
- For the `Mollusc` plot, you’ve got some overlap in points, so add some transparency to the points.
- For the `Mollusc` plot, it would be very interesting to turn this into a two-panel plot. Use the left panel to show a map with all Mollusc samples, with radionuclide concentration shown by the color of each point, and the right panel to show this scatterplot. That would let the reader see a few things. First, are all the measurements clustered at the same distance actually from the same location, or were those samples taken at different locations along a circle at that distance from the mine (maybe it’s a distance relevant to policy or something)? Also, does direction from the mine matter? For a given distance, is the concentration usually higher north of the mine than south of it, or something like that?
- In that same plot, any ideas why samples are clumping at certain distances? For example, there are loads of measurements all at an identical distance of just under 2 km. Why?
- For the metal concentration plot, try faceting by metal and use `scales = "free_y"` in the `facet_wrap()` call. The difference in scales between the metals makes it really hard to see any trends in As. Transparency would also help for the points in this plot. I expect, for example, that all of the metals have lots of measurements just under 2 km, but you can only see that for Cu, because that’s the only metal with much variation in the y variable across samples. With some transparency, you could see where measurements clump, even if the values don’t vary much in y.
- “there are very few samples of insects available”: Use some inline R code here to actually give the number of insect samples available in the data you’re working with.
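For the zoomed-in map bullet above, one way to choose the bounding box is to compute it from the sample coordinates plus a small margin. The `samples` data frame, its `lon`/`lat` values, and the helper name `make_bbox_padded()` are hypothetical (ggmap also ships its own `make_bbox()` helper). The `get_stamenmap()` call is left commented out because it needs a network connection; in recent ggmap versions Stamen tiles are, I believe, served through Stadia Maps (`get_stadiamap()`, which needs an API key), so check the help page for your installed version.

```r
# Hypothetical sample locations (decimal degrees); replace with your cleaned data.
samples <- data.frame(
  lon = c(132.85, 132.91, 132.97, 133.02),
  lat = c(-12.75, -12.68, -12.62, -12.70)
)

# Padded bounding box around the samples, using the left/bottom/right/top
# names that get_stamenmap() expects for its bbox argument.
make_bbox_padded <- function(lon, lat, pad = 0.05) {
  c(left   = min(lon) - pad,
    bottom = min(lat) - pad,
    right  = max(lon) + pad,
    top    = max(lat) + pad)
}

bbox <- make_bbox_padded(samples$lon, samples$lat)
bbox

# Needs a network connection (and possibly an API key in newer ggmap):
# base_map <- ggmap::get_stamenmap(bbox, zoom = 11, maptype = "toner-lite")
# ggmap::ggmap(base_map) +
#   ggplot2::geom_point(data = samples, ggplot2::aes(x = lon, y = lat))
```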
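For the inset bullet, here is a rough sketch of one approach from the linked posts: draw a small map of Australia with the study area outlined, turn it into a grob, and place it on the main map with `annotation_custom()`. It assumes the `maps` package is installed (for `map_data("world")`); the study-area box and the sample points are hypothetical.

```r
library(ggplot2)

# Outline of Australia from the maps package.
aus <- map_data("world", region = "Australia")

# Hypothetical study-area box; use the extent of your own main map.
study <- data.frame(xmin = 132.5, xmax = 133.3, ymin = -13.0, ymax = -12.4)

# Small overview map with the study area outlined in red.
inset <- ggplot(aus, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "grey40") +
  geom_rect(data = study, inherit.aes = FALSE,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            fill = NA, colour = "red") +
  coord_quickmap() +
  theme_void()

# Main map: here just hypothetical sample points inside the study area.
samples <- data.frame(lon = c(132.85, 132.91, 132.97),
                      lat = c(-12.75, -12.68, -12.62))
main <- ggplot(samples, aes(x = lon, y = lat)) +
  geom_point() +
  coord_quickmap(xlim = c(132.5, 133.3), ylim = c(-13.0, -12.4))

# Put the overview in the lower-left corner (positions are in data units).
with_inset <- main +
  annotation_custom(ggplotGrob(inset),
                    xmin = 132.5, xmax = 132.8, ymin = -13.0, ymax = -12.75)
with_inset
```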
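For the second-map bullet, a sketch of faceting by substrate with larger, semi-transparent points, using `forcats::fct_infreq()` to order the panels from most to fewest samples. The data are simulated and the substrate names are hypothetical. To order by average concentration instead, `fct_reorder(substrate, concentration, .fun = mean)` is one option, assuming you have a concentration column.

```r
library(ggplot2)
library(forcats)

# Hypothetical samples: substrate type and location for each one.
set.seed(1)
n <- 200
samples <- data.frame(
  substrate = sample(c("Sediment", "Water", "Mollusc", "Fish"), n,
                     replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
  lon = runif(n, 132.80, 133.00),
  lat = runif(n, -12.80, -12.60)
)

# Order facet levels by how many samples each substrate has.
samples$substrate <- fct_infreq(factor(samples$substrate))

substrate_map <- ggplot(samples, aes(x = lon, y = lat)) +
  geom_point(size = 3, alpha = 0.3) +  # bigger points; overlaps show up darker
  facet_wrap(~ substrate) +            # one small map per substrate
  coord_quickmap()
substrate_map
```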
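For the Excel bullet, a minimal sketch of scripting that step with `readxl`. It uses the example workbook that ships with readxl so it runs anywhere. The path and sheet for your raw file in `data_raw/` are up to you, and the output file name here is hypothetical.

```r
library(readxl)

# Example workbook bundled with readxl; swap in the path to your raw file.
xlsx_path <- readxl_example("datasets.xlsx")
raw <- read_excel(xlsx_path, sheet = 1)

# Write the CSV from R, so the Excel-to-CSV step is part of the script too.
csv_path <- file.path(tempdir(), "raw_data.csv")
write.csv(raw, csv_path, row.names = FALSE)

# Read it back to check that nothing was lost.
back <- read.csv(csv_path)
dim(back)
```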
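For the Easting/Northing bullet, a short example like this one (using `sf`) could go in the report. The coordinate values are hypothetical. The CRS is an assumption: the Ranger area falls in UTM zone 53, but whether the data use WGS 84 (EPSG:32753) or GDA94 / MGA zone 53 (EPSG:28353) should be checked against the dataset's metadata.

```r
library(sf)

# Hypothetical Easting/Northing values in metres; replace with your columns.
pts <- data.frame(id = 1:2,
                  easting  = c(273000, 275000),
                  northing = c(8598000, 8596000))

# ASSUMPTION: UTM zone 53 South on WGS 84 (EPSG:32753).
# If the metadata says GDA94 / MGA zone 53, use crs = 28353 instead.
pts_utm <- st_as_sf(pts, coords = c("easting", "northing"), crs = 32753)

# Reproject to longitude/latitude on WGS 84 (EPSG:4326).
pts_ll <- st_transform(pts_utm, crs = 4326)

# st_coordinates() returns a matrix with X (longitude) and Y (latitude).
lonlat <- st_coordinates(pts_ll)
pts$lon <- lonlat[, "X"]
pts$lat <- lonlat[, "Y"]
pts
```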
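For the distance bullet (and the direction question for the `Mollusc` plot), a base-R sketch of the haversine great-circle distance and the initial compass bearing from the mine to each sample. The mine coordinates and sample points are hypothetical placeholders. Packages such as `sf` (`st_distance()`) offer the same kind of calculation if you prefer not to write the formula yourself.

```r
# Great-circle (haversine) distance in km between points in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Initial compass bearing (degrees clockwise from north) from point 1 to point 2.
bearing_deg <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  y <- sin((lon2 - lon1) * to_rad) * cos(lat2 * to_rad)
  x <- cos(lat1 * to_rad) * sin(lat2 * to_rad) -
    sin(lat1 * to_rad) * cos(lat2 * to_rad) * cos((lon2 - lon1) * to_rad)
  (atan2(y, x) / to_rad) %% 360
}

# HYPOTHETICAL mine location and samples; use the values from your data.
mine <- c(lon = 132.91, lat = -12.68)
samples <- data.frame(lon = c(132.93, 132.91, 132.80),
                      lat = c(-12.68, -12.70, -12.60))

samples$dist_km <- haversine_km(mine[["lon"]], mine[["lat"]], samples$lon, samples$lat)
samples$bearing <- bearing_deg(mine[["lon"]], mine[["lat"]], samples$lon, samples$lat)
samples
```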
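For the two-panel `Mollusc` plot idea, a rough sketch using the `patchwork` package to put a map coloured by concentration next to the distance scatterplot. All data are simulated, `dist_km` is assumed to be computed already (here it is just random), and the mine location is hypothetical.

```r
library(ggplot2)
library(patchwork)

# Hypothetical mollusc samples: location, distance from mine, concentration.
set.seed(42)
mollusc <- data.frame(
  lon     = runif(60, 132.85, 132.97),
  lat     = runif(60, -12.74, -12.62),
  dist_km = runif(60, 0, 8),
  conc    = rlnorm(60, meanlog = 1)
)
mine <- c(lon = 132.91, lat = -12.68)  # hypothetical mine location

# Left panel: where each sample was taken, coloured by concentration.
map_panel <- ggplot(mollusc, aes(x = lon, y = lat, colour = conc)) +
  geom_point(size = 2.5, alpha = 0.6) +
  annotate("point", x = mine[["lon"]], y = mine[["lat"]], shape = 17, size = 4) +
  scale_colour_viridis_c(name = "Concentration") +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude")

# Right panel: distance vs. concentration, with transparency for overlaps.
scatter_panel <- ggplot(mollusc, aes(x = dist_km, y = conc)) +
  geom_point(alpha = 0.4) +
  labs(x = "Distance from mine (km)", y = "Concentration")

two_panel <- map_panel + scatter_panel  # patchwork puts them side by side
two_panel
```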
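For the metal concentration bullet, a sketch of `facet_wrap()` with `scales = "free_y"` plus transparent points. It assumes the data are in long format (one row per sample and metal). If the metals are in separate columns, `tidyr::pivot_longer()` can reshape them first. The metals other than Cu and As, and all values, are hypothetical.

```r
library(ggplot2)

# Hypothetical long-format data: one row per sample x metal.
set.seed(7)
metals <- data.frame(
  metal   = rep(c("Cu", "As", "Pb", "Zn"), each = 50),
  dist_km = rep(runif(50, 0, 10), times = 4),
  conc    = c(rlnorm(50, 4), rlnorm(50, 0), rlnorm(50, 1), rlnorm(50, 2))
)

metal_plot <- ggplot(metals, aes(x = dist_km, y = conc)) +
  geom_point(alpha = 0.3) +                  # clumps of points show up darker
  facet_wrap(~ metal, scales = "free_y") +   # each metal gets its own y range
  labs(x = "Distance from mine (km)", y = "Concentration")
metal_plot
```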
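For the insect-count bullet, the idea is to compute the number in a chunk and then reference it in the text with inline R. The `samples` table and its `type` column are hypothetical stand-ins for your cleaned data.

```r
# Hypothetical sample table; in the report this would be your cleaned data.
samples <- data.frame(type = c("Insect", "Mollusc", "Mollusc", "Fish", "Insect"))

n_insect <- sum(samples$type == "Insect")
n_insect

# In the .Rmd text you could then write something like:
#   "there are only `r n_insect` insect samples available"
```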
Shiny App
- The layout looks nice. It’s great that you customized with a grid layout. Also, nice job adding in a relevant image.
- I recommend that you include an “All” option for the “Sample Type” input and set that as the default initial value (if you list it first when you create the widget, it should be). This will require some if / else code for whether or not to filter the data in the `server.R` file.
- For the years of sampling, some years have very few samples (or none), so you get a blank map. Instead of allowing the user to select a single year, let them pick a range of years (they can always narrow the range to a single year). You can use `sliderInput()` with a two-element `value` to do that (e.g., `value = c(1976, 2015)`). Set the default to the full study period, so that there will be lots of points when your app first opens. You’ll need to change the server file so that it filters the data to years from `input$slider1[1]` to `input$slider1[2]`. You could also play around with adding an animation button to this widget (try the `animate = TRUE` option in `sliderInput()`).
- It’s great that you’ve included a source for the image in the app. See if you can make that a real link that someone can click on in the app.
- Is there any other information from the original dataset that might be useful to include in the pop-up of the map points?
- It would be great to include one other tab with a different output in addition to the map. For example, you may want to have an output that shows a scatterplot of distance versus radionuclide concentration for the selected subset of sample type and years (similar to the Mollusc plot you show in the RMarkdown document).
- Overall, very nice code for this. You’ve kept it very clean by doing a lot of the data cleaning in another script and just loading data here. This code was very easy to navigate and understand.
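For the “All” option bullet, a minimal sketch: list “All” first in `selectInput()` and keep the if / else in a small helper that the server's reactive can call. The input ID, choice names, helper name `filter_by_type()`, and `type` column are hypothetical.

```r
library(shiny)

# UI: "All" listed first, so it is also the initial selection.
type_input <- selectInput("sample_type", "Sample Type",
                          choices = c("All", "Sediment", "Water", "Mollusc"),
                          selected = "All")

# Server-side helper: only filter when a specific type is chosen.
filter_by_type <- function(data, type) {
  if (type == "All") {
    data
  } else {
    data[data$type == type, , drop = FALSE]
  }
}

# In server.R this would sit inside a reactive, e.g.
# filtered <- reactive(filter_by_type(samples, input$sample_type))

samples <- data.frame(type = c("Sediment", "Water", "Mollusc", "Water"),
                      year = c(1980, 1995, 2005, 2010))
nrow(filter_by_type(samples, "All"))    # 4
nrow(filter_by_type(samples, "Water"))  # 2
```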
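For the year-range bullet, a sketch of a two-handle `sliderInput()` defaulting to the full study period, plus a helper (hypothetical name `filter_by_years()`) that keeps rows between `input$slider1[1]` and `input$slider1[2]`. `sep = ""` stops years from being shown as “1,976”. How `animate = TRUE` behaves with a range slider is worth testing in the app itself.

```r
library(shiny)

# Range slider: a two-element `value` gives two handles; default = full period.
year_input <- sliderInput("slider1", "Years of sampling",
                          min = 1976, max = 2015, value = c(1976, 2015),
                          step = 1, sep = "",  # sep = "" avoids "1,976"
                          animate = TRUE)      # adds a play button

# Server-side helper: keep rows whose year is inside the selected range.
filter_by_years <- function(data, years) {
  data[data$year >= years[1] & data$year <= years[2], , drop = FALSE]
}

# In server.R, e.g.:
# filtered <- reactive(filter_by_years(samples, input$slider1))

samples <- data.frame(year = c(1976, 1990, 2003, 2015))
nrow(filter_by_years(samples, c(1985, 2005)))  # 2
```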
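For the clickable image-source bullet, Shiny's `tags$a()` creates a real HTML link. The URL and link text below are placeholders; swap in the actual source of the image.

```r
library(shiny)

# HYPOTHETICAL URL: replace with the real source of the image.
image_credit <- tags$p(
  "Image source: ",
  tags$a(href = "https://example.org/image-source", target = "_blank",
         "original photo")  # target = "_blank" opens the link in a new tab
)
as.character(image_credit)
```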
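For the extra-tab bullet, a bare-bones sketch of `tabsetPanel()` with a second tab holding a distance vs. concentration scatterplot. The output IDs, column names, and simulated data are hypothetical. In your app, `filtered()` would be the reactive built from the Sample Type and year inputs, and the map tab would keep whatever output you already use (e.g. `leafletOutput()` if the map is a leaflet map).

```r
library(shiny)
library(ggplot2)

# HYPOTHETICAL data standing in for your cleaned samples.
samples <- data.frame(dist_km = runif(40, 0, 10), conc = rlnorm(40),
                      type = "Mollusc", year = 2000)

ui <- fluidPage(
  tabsetPanel(
    tabPanel("Map", plotOutput("map")),  # swap in your existing map output
    tabPanel("Distance", plotOutput("dist_plot"))
  )
)

server <- function(input, output, session) {
  filtered <- reactive(samples)  # replace with your type/year filtering
  output$dist_plot <- renderPlot({
    ggplot(filtered(), aes(x = dist_km, y = conc)) +
      geom_point(alpha = 0.4) +
      labs(x = "Distance from mine (km)", y = "Concentration")
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```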
Overall
- Overall, the project directory is nicely organized. It’s particularly good that you put a lot of your R scripts in an `R` subdirectory (although why do you have a copy of the RMarkdown Word output there?). Same for the `data_raw` folder: great that you used this to store your raw data. It was immediately clear to me in looking through your directory which files contained your original data.
- Try to avoid spaces in file names. RStudio is a bit forgiving, but if you ever have to do things with these files from the command line, it could get to be a huge pain. It’s a good habit to get into to use underscores or something rather than spaces in file names.
- Files in the `R/` subdirectory are sensibly named (other than the spaces) and have useful comments within the code that make them easier to navigate. Nicely done.
- (PS: commit message for commit 81d53b2 is my favorite in this repository so far. Sometimes I feel that way too.)
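For the file-name bullet, a small base-R sketch that renames files in one folder so spaces become underscores. The helper name and demo file names are hypothetical, it is not recursive, and any scripts that refer to the old names would need their paths updated afterwards, so try it on a copy first.

```r
# Rename every file in `dir` whose name contains a space (not recursive).
rename_no_spaces <- function(dir) {
  old <- list.files(dir, pattern = " ", full.names = TRUE)
  new <- file.path(dirname(old), gsub(" ", "_", basename(old)))
  ok <- file.rename(old, new)  # TRUE for each successful rename
  setNames(ok, basename(new))
}

# Demo in a temporary folder with hypothetical file names.
demo_dir <- file.path(tempdir(), "rename_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("large map.R", "clean data.R", "app.R"))))

rename_no_spaces(demo_dir)
list.files(demo_dir)
```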