cstenkampstrahm · geanders · Dec 20, 2016 · Dec 20, 2016 · Dec 20, 2016 · Dec 20, 2016
diff --git a/FinalProject_Rclass.Rproj b/FinalProject_Rclass.Rproj
diff --git a/README-Heavy_Metal-1.png b/README-Heavy_Metal-1.png
diff --git a/README-Mollusc-1.png b/README-Mollusc-1.png
diff --git a/README-Rad_vs._Distance-1.png b/README-Rad_vs._Distance-1.png
diff --git a/README-largemap-1.png b/README-largemap-1.png
diff --git a/README-smallmap-1.png b/README-smallmap-1.png
diff --git a/Rmarkdown.Rmd → README.Rmd b/Rmarkdown.Rmd → README.Rmd
diff --git a/Rmarkdown.docx → README.docx b/Rmarkdown.docx → README.docx
diff --git a/README.md b/README.md
@@ -0,0 +1,68 @@
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+Research Questions:
+-------------------
+
+-   Have mining and milling operations at the Ranger Mine contaminated the surrounding Alligators River Region with radionuclides and metals?
+    -   Where are the sample types of interest located in relation to the mine?
+    -   How do the concentrations of radionuclides and metals change with distance from the mine?
+
+Problem
+-------
+
+The Alligator Rivers Region is a province in the wet-dry tropics of northeastern Australia. The land in this region is rich in uranium, and since 1980 has been home to a large mine. [This is the Ranger Uranium mine](http://www.world-nuclear.org/information-library/country-profiles/countries-a-f/appendices/australia-s-uranium-mines.aspx#ranger): ![Image of the Ranger Uranium Mine](images/RMDRanger2005.png)
+
+An Australian government entity called the [Environmental Research Institute of the Supervising Scientist](http://www.environment.gov.au/science/supervising-scientist) (ERISS) has been collecting animal and environmental samples from the land around the mine, in the general area of the Alligator Rivers Watershed and Kakadu National Park: ![Image of watershed](images/australia_with_mine.png)
+
+These have been collected since mine activity began. The samples have been measured for many different mine contaminants including the heavy metals Copper (Cu), Arsenic (As), Lead (Pb), Mercury (Hg), isotopes of Uranium (U), Thorium (Th), Radium (Ra), Polonium (Po), Actinium (Ac), and Potassium (K) - specifically <sup>234</sup>U, <sup>238</sup>U, <sup>230</sup>Th, <sup>232</sup>Th, <sup>226</sup>Ra, <sup>228</sup>Ra, <sup>210</sup>Pb, <sup>210</sup>Po, <sup>228</sup>Th, <sup>227</sup>Ac, and <sup>40</sup>K. All of these metals and radionuclides are naturally occurring, but may have been released during mining and milling at the mine.
+
+Radionuclide information is measured using the SI unit of activity, the Becquerel (Bq). This unit is defined as one radioactive decay or disintegration per second.
+
+The location of the mine and the surrounding region is show below. Locations of sample collection are indicated.
+
+<img src="README-largemap-1.png" style="display: block; margin: auto;" />
+
+Samples from flora, fauna, and soil and water have been gathered by the ERISS. The specific sample types can be clustered based on their ecological niche and purpose in the watershed, to contain the following groups: water, fish, molluscs, freshwater animals, freshwater plants, freshwater sediment, terrestrial animals, terrestrial plants, and terrestrial soil. The following map shows the collection locations of these varying sample types, near the Ranger Mine.
+
+<img src="README-smallmap-1.png" style="display: block; margin: auto;" />
+
+This dataset of environmental samples has been continually gathered by the ERISS to keep track of mine contaminants in the region over space and time. The dataset was recently published in the [Journal of Environmental Radioactivity](http://www.sciencedirect.com/science/article/pii/S0265931X16301928).
+
+Attempting to answer the research questions proposed, using this dataset, is important for many reasons. One reason is to better understand the overall impacts of uranium mining on water and fauna, as 1/3 of the land in the region forms the Kakadu National Park. Addtionally, most of the remaining land is aboriginally owned. Because of this ownership, there are concerns about the contamination of bush foods, and also the contamination of flora for wild animal consumption. After mine closure, the site is to be remediated by 2026 and incorporated into Kakadu. Knowing where and to what level contaminants can be found is also central to this ultimate goal.
+
+Analysis
+--------
+
+The initial dataset was made available as an .xlsx file. The only manual process was to load the file into Excel and save as a .csv file; all other data cleaning was performed using an R script. Most of the data cleanup consisted of cosmetic and convenience changes, renaming variables and setting data types. We created a variable called `sample_type` to group samples by substrate, such as Terrestrial Plants or Fish. This is important for both our analysis and for the Shiny app.
+
+The dataset includes geospatial data using Easting and Northing in two different zones of a Mercator Projection. We used functions from the `rgdal` and `sp` packages to convert to lat-long coordinates, which we were then able to plot using `ggplot2`.
+
+A number of sample results were reported as below detection limits, in the form of "<value", where value is the detection limit rather than an actual measured value.  We discussed several possibilities for managing these results.  We found references that directly used the LOD as a result (highly conservative), that used 0 as the result (effectively loses results), or divided the LOD by a factor (2 or $\sqrt{2}$) for use as a result (could be considered arbitary).  We elected to go the highly conservative route and use the LOD as the value. 
+
+
+##Results
+
+
+Below are the results for the radionuclide concentration (Bq/Kg) in Molluscs as a function of the year sampled. The color of each point is scaled by it's proximity to the mine. 
+
+<img src="README-Mollusc-1.png" style="display: block; margin: auto;" />
+
+If the presence of the mine had a direct impact on the radionuclide concentrations in the environment, one would expect to see concentrations of radionuclides increase over time, from the point before the mine was established to after. However, there is no apparent data to indicate an upward trend in radionuclide concentration over time, nor does the radionuclide concentration seem to depend on distance for a particular year.
+
+The fluctuations in radionuclide content are most likely caused by naturally occuring radionuclides within the local geology, as molluscs spend their lives embedded in the land! There has been a presence of these entities in the sampled environments, preceding any excavation from the mine.
+
+As water is fluid (PUN!) in this area, unlike the land or relatively static molluscs, we can look specifically at the content of radionuclides in the water samples taken over time.
+
+<img src="README-Rad_vs._Distance-1.png" style="display: block; margin: auto;" />
+
+As one can see, there is a slight downward trend in the amount of radionuclides in water samples as are taken farther and farther from the mine. The water samples represented in the graph above reflect those taken from all sampled years.
+
+<img src="README-Heavy_Metal-1.png" style="display: block; margin: auto;" />
+
+The figure above shows metal concentrations in samples as a function of distance from the mine. Both copper and lead show a potentially significant correlation with distance; mercury may as well. This figure also shows one side effect of our decision to treat results below the limits of detection as results equal to the limit of detection; a common detection threshold for lead was ~25 mg/l.
+
+Further Research
+----------------
+
+While the dataset is quite robust in general, there are some limitations. For example, there are very few samples of insects available. For locations more than a few kilometers from the mine, there are not many repeat samples such that a time course could be studied. Since the primary focus was environmental samples, there does not appear to have been a concerted effort conducted to study animals and their uptake of heavy metals and radionuclides.
+
+Interestingly, the same group that published the initial data set recently published a short communication entitled '[A tool for calculating concentration ratios from large environmental data sets](https://www.ncbi.nlm.nih.gov/pubmed/27579895). This publication outlines a tool developed in MS Excel that predicts biological accumulation of mine contaminants, given their proximity to environmental samples with certain concentrations of both metals and radionuclides. We'd expect others to start analyzing this current data set in light of this newly developed tool, since these many environmental measures can now been linked to biological outcomes.
diff --git a/data/geo_aussie_data.RData b/data/geo_aussie_data.RData
diff --git a/data_raw/clean_data.R b/data_raw/clean_data.R
@@ -0,0 +1,180 @@
+## this is the code for cleaning the data set
+library(tidyverse)
+library(lubridate)
+library(stringr)
+library(rgdal)
+library(sp)
+library(maptools)
+library(ggmap)
+library(sp)
+
+# guess_max set because otherwise R sets up the variables incorrectly
+aussie_data <- read_csv("data_raw/mmc1.csv", guess_max = 5000)
+
+# initial bits of cleanup
+# dropping some non-useful columns - leaves 5060 obs. of 53 variables
+clean_aussie_data <- dplyr::select(aussie_data, -`Database ID`, `Composite count`, -`Family code`, -Datum, -Notes, -Reference, -Geocoding, 
+                                   -(contains("err")), -`Mussel age (y)`)
+
+# renaming some columns to simplify R codes
+# deliberately left bare element names capitalized, but shifted isotope names
+clean_aussie_data <- clean_aussie_data %>%
+  dplyr::rename(sample_id = `Sample ID`, ecosystem = Ecosystem, 
+                sample_type = `Sample type`, common_name = `Common name`,
+                sci_name = `Scientific name`, bush_group = `Bush food group`,
+                wildlife_group = `Wildlife group`, compartment = Compartment,
+                composite_count = `Composite count`, sex = Sex, 
+                soil_depth = `Soil depth (cm)`, 
+                collection_date = `Collection date (day/month/year)`, 
+                location = Location, status = Status, easting = Easting,
+                northing = Northing, zone = Zone, u238 = `U-238`, 
+                u234 = `U-234`, th230 = `Th-230`, ra226 = `Ra-226`, 
+                pb210 = `Pb-210`, po210 = `Po-210`, th232 = `Th-232`, 
+                ra228 = `Ra-228`, th228 = `Th-228`, ac227 = `Ac-227`, 
+                k40 = `K-40`, dry_mass_fraction = `Dry mass fraction`)
+#
+# original dataset uses wildlife_group only for a fine-grained division of 
+# plant and animal samples.  Here we use ecosystem and sample_type to create 
+# entries in wildlife_group for soil and sediment samples, as well as for
+# fruits that were not counted in other types. Set the wildlife_group to a 
+# factor.
+clean_aussie_data <- clean_aussie_data %>%
+  dplyr::mutate(wildlife_group = ifelse(is.na(wildlife_group), 
+                                        paste0(ecosystem,"-",sample_type), wildlife_group ))
+#
+# set several values to factors: ecosystem, sample_type, wildlife_group
+# convert collection_date to a date
+clean_aussie_data$wildlife_group <- as.factor(clean_aussie_data$wildlife_group)
+clean_aussie_data$ecosystem <- as.factor(clean_aussie_data$ecosystem)
+clean_aussie_data$sample_type <- as.factor(clean_aussie_data$sample_type)
+clean_aussie_data$collection_date <- dmy(clean_aussie_data$collection_date)
+#
+# entries in sample_id are not unique, so create an entry number for later use
+clean_aussie_data$entry_num <- 1:nrow(clean_aussie_data)
+#
+# A number of sample results were reported as below detection limits, in the form
+# of "<value", where value is the detection limit.  Highly conservative to
+# use the LOD as the value but what we elected to do
+clean_aussie_data <- clean_aussie_data %>%
+  mutate(u234 = as.numeric(str_replace(u234, "<", "")),
+         th230 = as.numeric(str_replace(th230, "<", "")),
+         ra226 = as.numeric(str_replace(ra226, "<", "")),
+         pb210 = as.numeric(str_replace(pb210, "<", "")),
+         th232 = as.numeric(str_replace(th232, "<", "")),
+         po210 = as.numeric(str_replace(po210, "<", "")),
+         k40 = as.numeric(str_replace(k40, "<", "")),
+         u238 = as.numeric(str_replace(u238, "<", "")),
+         ra228 = as.numeric(str_replace(ra228, "<", "")),
+         th228 = as.numeric(str_replace(th228, "<", "")),
+         Al = as.numeric(str_replace(Al, "<", "")),
+         As = as.numeric(str_replace(As, "<", "")),
+         Ba = as.numeric(str_replace(Ba, "<", "")),
+         Ca = as.numeric(str_replace(Ca, "<", "")),
+         Cd = as.numeric(str_replace(Cd, "<", "")),
+         Co = as.numeric(str_replace(Co, "<", "")),
+         Cr = as.numeric(str_replace(Cr, "<", "")),
+         Cu = as.numeric(str_replace(Cu, "<", "")),
+         Fe = as.numeric(str_replace(Fe, "<", "")),
+         Hg = as.numeric(str_replace(Hg, "<", "")),
+         K = as.numeric(str_replace(K, "<", "")),
+         Mg = as.numeric(str_replace(Mg, "<", "")),
+         Mn = as.numeric(str_replace(Mn, "<", "")),
+         Na = as.numeric(str_replace(Na, "<", "")),
+         Ni = as.numeric(str_replace(Ni, "<", "")),
+         P = as.numeric(str_replace(P, "<", "")),
+         Pb = as.numeric(str_replace(Pb, "<", "")),
+         Rb = as.numeric(str_replace(Rb, "<", "")),
+         S = as.numeric(str_replace(S, "<", "")),
+         Sb = as.numeric(str_replace(Sb, "<", "")),
+         Se = as.numeric(str_replace(Se, "<", "")),
+         Th = as.numeric(str_replace(Th, "<", "")),
+         U = as.numeric(str_replace(U, "<", "")),
+         V = as.numeric(str_replace(V, "<", "")),
+         Zn = as.numeric(str_replace(Zn, "<", ""))
+  )
+emitters <- c("u238", "u234", "th230", "ra226", "pb210", "po210", "th232",
+              "ra228", "th228", "ac227", "k40")
+clean_aussie_data$radionuclides <- rowSums(clean_aussie_data[, emitters], 
+                                           na.rm = TRUE)
+#
+# consolidate the wildlife_group factors into sample_type
+clean_aussie_data <- clean_aussie_data%>% 
+  mutate(sample_type = case_when(
+    clean_aussie_data$wildlife_group == "Terrestrial-Bird" ~ 
+      "Terrestrial Animals",
+    clean_aussie_data$wildlife_group == "Terrestrial-Reptile" ~ 
+      "Terrestrial Animals",
+    clean_aussie_data$wildlife_group == "Terrestrial-Mammal" ~ 
+      "Terrestrial Animals",
+    clean_aussie_data$wildlife_group == "Terrestrial-Grasses and Herbs" ~
+      "Terrestrial Plants",
+    clean_aussie_data$wildlife_group == "Terrestrial-Plant" ~ 
+      "Terrestrial Plants",
+    clean_aussie_data$wildlife_group == "Terrestrial-Soil" ~ 
+      "Terrestrial Soil",
+    clean_aussie_data$wildlife_group == "Freshwater-Bird" ~ 
+      "Freshwater Animals",
+    clean_aussie_data$wildlife_group == "Freshwater-Reptile" ~ 
+      "Freshwater Animals",
+    clean_aussie_data$wildlife_group == "Freshwater-Fish" ~ "Fish",
+    clean_aussie_data$wildlife_group == "Freshwater-Mollusc" ~ "Mollusc",
+    clean_aussie_data$wildlife_group == "Freshwater-Sediment" ~ 
+      "Freshwater Sediment",
+    clean_aussie_data$wildlife_group == "Freshwater-Vascular Plant" ~ 
+      "Freshwater Plant",
+    clean_aussie_data$wildlife_group == "Freshwater-Water" ~ "Water",
+    TRUE ~ "NA"))
+# 
+# peeling out the geospatial information for plotting
+#geo_data <- select(clean_aussie_data, sample_id, easting, northing, zone)
+# filter out a few NAs
+geo_data <- subset(clean_aussie_data, easting != "" | northing != "")
+#
+# the geo_data is in UTM coordinates, need to convert to lat-long
+# the lat-long conversion requires that the data be split between the UTM
+# zones.  In this case, 52S and 53S.  From here on everything is split.
+geo_data_52 <- subset(geo_data, zone == "52S")
+geo_data_53 <- subset(geo_data, zone == "53S")
+coords_52 <- cbind(Easting = as.numeric(as.character(geo_data_52$easting)),
+                   Northing = as.numeric(as.character(geo_data_52$northing)))
+coords_53 <- cbind(Easting = as.numeric(as.character(geo_data_53$easting)),
+                   Northing = as.numeric(as.character(geo_data_53$northing)))
+#
+# Create the SpatialPointsDataFrame
+spatial_52 <- SpatialPointsDataFrame(coords_52, data = 
+                                       data.frame(geo_data_52$entry_num), 
+                                     proj4string = CRS("+init=epsg:32752"))
+spatial_53 <- SpatialPointsDataFrame(coords_53, data = 
+                                       data.frame(geo_data_53$entry_num), 
+                                     proj4string = CRS("+init=epsg:32753"))
+#
+# Convert to Lat Long
+# Convert from Eastings and Northings to Latitude and Longitude
+spatial_52_ll <- spTransform(spatial_52, CRS("+init=epsg:4326"))
+spatial_53_ll <- spTransform(spatial_53, CRS("+init=epsg:4326"))
+#
+# we also need to rename the columns 
+colnames(spatial_52_ll@coords)[colnames(spatial_52_ll@coords) == 
+                                 "Easting"]<- "Longitude" 
+colnames(spatial_52_ll@coords)[colnames(spatial_52_ll@coords) == 
+                                 "Northing"]<- "Latitude"
+colnames(spatial_53_ll@coords)[colnames(spatial_53_ll@coords) == 
+                                 "Easting"]<- "Longitude" 
+colnames(spatial_53_ll@coords)[colnames(spatial_53_ll@coords) == 
+                                 "Northing"]<- "Latitude"
+#
+# convert back to data frames
+geo_data_52_ll <- as.data.frame(spatial_52_ll)
+geo_data_52_ll <- dplyr::rename(geo_data_52_ll, 
+                                entry_num = geo_data_52.entry_num)
+geo_data_53_ll <- as.data.frame(spatial_53_ll)
+geo_data_53_ll <- dplyr::rename(geo_data_53_ll, 
+                                entry_num = geo_data_53.entry_num)
+#
+# merge the two lat-long data sets into one, then join back into original
+# cleaned data set, producing a new data frame
+geo_data_ll <- rbind(geo_data_52_ll, geo_data_53_ll)
+geo_aussie_data <- left_join(geo_data_ll, clean_aussie_data, by = "entry_num")
+#
+# the final clean dataset here is geo_aussie_data
+save(geo_aussie_data, file = "data/geo_aussie_data.RData")
diff --git a/RMDRanger2005.png → images/RMDRanger2005.png b/RMDRanger2005.png → images/RMDRanger2005.png
diff --git a/australia_with_mine.png → images/australia_with_mine.png b/australia_with_mine.png → images/australia_with_mine.png
diff --git a/Notes on Geospatial Data.txt → notes/Notes on Geospatial Data.txt b/Notes on Geospatial Data.txt → notes/Notes on Geospatial Data.txt
diff --git a/Notes on Shiny App.txt → notes/Notes on Shiny App.txt b/Notes on Shiny App.txt → notes/Notes on Shiny App.txt
diff --git a/notes on the cleaning script → notes/notes on the cleaning script b/notes on the cleaning script → notes/notes on the cleaning script