diff --git a/_episodes_rmd/01-intro-rstudio.Rmd b/_episodes_rmd/01-intro-rstudio.Rmd index 49d7698a3..0f8912dbb 100644 --- a/_episodes_rmd/01-intro-rstudio.Rmd +++ b/_episodes_rmd/01-intro-rstudio.Rmd @@ -42,7 +42,7 @@ on if needed. You can copy-paste into the R console, but the RStudio script editor allows you to 'send' the current line or the currently selected text to the R console using the Ctrl+Return shortcut. -At some point in your analysis you may want to check the content of variable or +At some point in your analysis you may want to check the content of a variable or the structure of an object, without necessarily keep a record of it in your script. You can type these commands directly in the console. RStudio provides the Ctrl+1 and Ctrl+2 shortcuts allow you to jump between the script and the @@ -61,8 +61,16 @@ window and press Esc; this should help you out of trouble. ### Commenting -Use `#` signs to comment. Comment liberally in your R scripts. Anything to the -right of a `#` is ignored by R. +Use `#` symbols to start a comment. Anything to the +right of a `#` is ignored by R. However, keep in mind that code should be +[like a joke: good without explanation](https://twitter.com/search?q=code%20joke%20explain%20good). +This means that adding comments should be of lower priority than: + +- naming functions, arguments and variables in a self-explanatory +manner, +- cleaning up or rewriting hard-to-understand code +- adding formal documentation for your functions +- adding automated tests for your code ### Assignment Operator diff --git a/_episodes_rmd/03-func-R.Rmd b/_episodes_rmd/03-func-R.Rmd index e7e64ed1d..ace7a6f75 100644 --- a/_episodes_rmd/03-func-R.Rmd +++ b/_episodes_rmd/03-func-R.Rmd @@ -31,6 +31,9 @@ knitr_fig_path("03-func-R-") ``` If we only have one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot some simple statistics. 
+However, we often have multiple data files or can expect more in the future. +That's why it's usually a good idea to prepare for this case +and write code in a reproducible manner right away. In this lesson, we'll learn how to write a function so that we can repeat several operations with a single command. ### Defining functions @@ -197,8 +200,8 @@ center <- function(data, desired) { ``` We could test this on our actual data, but since we don't know what the values ought to be, it will be hard to tell if the result was correct. -Instead, let's create a vector of 0s and then center that around 3. -This will make it simple to see if our function is working as expected: +Instead, let's create a small vector of integers and center it around a small integer, +so that we can comprehend, retrace and therefore check the calculation in our head: ```{r, } (z <- c(1, 2, 3)) @@ -310,20 +313,23 @@ Each `@param` documents an input parameter, while `@return` explains the functio output. The more comprehensible a description of its in- and output data (types), the easier a function can be integrated into larger analysis pipelines. -```{r challenge-more-advanced-function-analyze, eval=FALSE, include=FALSE} -analyze <- function(filename) { - dat <- read.csv(file = filename, header = FALSE) - avg_day_inflammation <- colMeans(dat) - plot(avg_day_inflammation) -} -``` - > ## Functions to Create Graphs > -> Write a function called `analyze` that takes a filename as an argument -> and displays the graph produced in the [previous lesson][start-ep] (average inflammation over time). -> `analyze("inflammation.csv")` should produce the graph already shown, -> Be sure to document your function with roxygen comments. 
+> To automate the plotting of the graphs from the +> [previous lesson][start-ep] (average inflammation over time), +> we wrote the following function: +> ~~~ +> analyze <- function(filename) { +> # Input a character string that corresponds to a filename +> # to get the average inflammation of each day plotted. +> dat <- read.csv(file = filename, header = FALSE) +> avg_day_inflammation <- colMeans(dat) +> plot(avg_day_inflammation) +> } +> ~~~ +> {: .r} +> +> Formalise the above comments into roxygen-style function documentation. > > > ## Solution > > ~~~ @@ -335,11 +341,7 @@ analyze <- function(filename) { > > #' > > #' @examples > > #' analyze("inflammation.csv") -> > analyze <- function(filename) { -> > dat <- read.csv(file = filename, header = FALSE) -> > avg_day_inflammation <- colMeans(dat) -> > plot(avg_day_inflammation) -> > } +> > analyze <- function(filename) { … } > > ~~~ > > {: .r} > {: .solution} @@ -352,17 +354,15 @@ and a roxygen comment skeleton will be inserted. Note that we are not using `@export` here, because it will only become relevant for [packaging]({{ page.root }}/reference/#packages). You can safely ignore it for now, or delete it. 
-```{r challenge-more-advanced-function-rescale, include=FALSE} -#' Rescaling vectors to lie in the range 0 to 1 -#' -#' @param v A numeric vector -#' -#' @return The rescaled numeric vector -#' -#' @examples -#' rescale(c(1, 2, 3)) # should return [1] 0.0 0.5 1.0 -#' rescale(c(1, 2, 3, 4, 5)) # should return [1] 0.00 0.25 0.50 0.75 1.00 +## Rescaling + +Another example of an informally documented function: + +```{r challenge-more-advanced-function-rescale} rescale <- function(v) { + # takes a vector as input + # returns a corresponding vector of values scaled to the range 0 to 1 + # e.g.: rescale(c(1, 2, 3)) => 0.0 0.5 1.0 L <- min(v) H <- max(v) result <- (v - L) / (H - L) @@ -370,16 +370,18 @@ rescale <- function(v) { } ``` -> ## Rescaling -> -> Write a function `rescale` that takes a vector as input and returns a corresponding vector of values scaled to lie in the range 0 to 1. -> Please create a new file for this and save it as `rescale.R`. -> (If `L` and `H` are the lowest and highest values in the original vector, then the replacement for a value `v` should be `(v-L) / (H-L)`.) -> Be sure to document your function with roxygen comments. +Please create a new file for this and save it as `rescale.R`, and test that +it is mathematically correct by using `min`, `max`, and `plot`. + +> ## Rescaling documentation > -> Test that your `rescale` function is working properly using `min`, `max`, and `plot`. +> Convert `rescale`'s comments into roxygen function docu! Which keyboard shortcut +> gets you started in RStudio? > > > ## Solution +> > +> > Find the keyboard shortcut above, and one possible roxygen documentation below. 
+> > > > ~~~ > > #' Rescaling vectors to lie in the range 0 to 1 > > #' @@ -390,13 +392,7 @@ rescale <- function(v) { > > #' @examples > > #' rescale(c(1, 2, 3)) # should return [1] 0.0 0.5 1.0 > > #' rescale(c(1, 2, 3, 4, 5)) # should return [1] 0.00 0.25 0.50 0.75 1.00 -> > -> > rescale <- function(v) { -> > L <- min(v) -> > H <- max(v) -> > result <- (v - L) / (H - L) -> > return(result) -> > } +> > rescale <- function(v) { ... } > > ~~~ > > {: .r} > {: .solution} @@ -546,3 +542,8 @@ saved by typing less. Also, a good IDE (integrated development environment) will reduce your typing with auto-completion. Thus, it is generally better to only omit an argument name if its value clarifies it. For example, `read.csv("path/to/file.xyz")` is comprehensible even without `...(file = "...` in the middle. + +> For the next episodes, we'll initially ignore both `2`-versions of our +> functions. Keep them in separate files, close those and concentrate on going +> back to the roots: `center()` and `rescale()`. +{: .callout} diff --git a/_episodes_rmd/04-testthat.Rmd b/_episodes_rmd/04-testthat.Rmd new file mode 100644 index 000000000..949027dfe --- /dev/null +++ b/_episodes_rmd/04-testthat.Rmd @@ -0,0 +1,107 @@ +--- +title: "Testing R Functions Automatically" +teaching: 30 +exercises: 10 +questions: +- "What is the benefit of auto-testing my functions?" +- "How do I create and run test-cases in R?" +- "Why would I change my code after I got it to run?" +objectives: +- "Formalise documented examples as tests." +- "Use testthat functions to create and run tests." +keypoints: +- "Tests help you ensure correct code." +source: Rmd +--- + +```{r, include = FALSE} +source("../bin/chunk-options.R") +knitr_fig_path("04-testthat-") +``` + +### Auto-testing functions with the `testthat` package + +Computer code evolves. 
As you just saw, functions need to be updated +when new use-cases appear, or new goals need to be achieved, or +sped up when more data needs to be crunched, or cleaned up to make their code +more readable for collaborators, reviewers, etc. You probably know the saying +"Never change a running system!" We surely invested a lot of time already to make +sure our functions work fine now. It is natural to be cautious about changing +computer code and it rightly shouldn't be done on a whim. + +However, as you have probably learned in the Git lesson, branching is one way to +try out code changes in a way that allows you to recover from a mistake, and only +merge successful changes. Automatic tests are another way to "span a safety net", +and in R, the [`testthat` package][testthat] is commonly used. Install it through +RStudio's `Packages` panel or execute `install.packages("testthat")` in the console. + +> ## Loading packages in RStudio or the console +> +> Checking and unticking a box in RStudio's `Packages` panel is the same as +> executing `library("…")` and `detach("…")` in the console. Note how your mouse +> interaction with the panel sends commands, or how your console commands affect +> the panel. The same is true for `install.packages()` and `remove.packages()`. +{: .callout} + +A `testthat` case is composed of the following elements: + +```{r center-test} +library("testthat") +test_that("centering works", { # function call with a descriptive string + expect_equal( # expectation, which compares… + object = center(data = c(1, 2, 3), desired = 0), # … the function's actual output… + expected = c(-1, 0, 1) # … with our expected result + ) +}) +``` + +Don't worry that there is no output when you execute one or more `expect_equal()` +statement(s), or the whole `test_that()` block. In good UNIX +tradition, `testthat` will only bother you in case some expectations are _not_ +met, i.e. one or more unit tests fail. 
+ +> ## Adding more test-cases +> +> When we tested our `rescale()` function interactively, we used several different +> numeric examples. Search your script and/or your console history in RStudio to +> find these examples and convert them into more `expect_equal(…)` test-cases. + + +> ## Testing our rescaling functions +> +> Apply the above example of converting the examples into unit tests in the +> `rescale.R` file as well. +> +> > ## Solution +> > ~~~ +> > library("testthat") +> > test_that("rescaling works", { +> > expect_equal(rescale(c(1, 2, 3)), c(0.0, 0.5, 1.0)) +> > expect_equal(rescale(c(1, 2, 3, 4, 5)), c(0.0, 0.25, 0.5, 0.75, 1.0)) +> > }) +> > ~~~ +> > {: .r} +> {: .solution} +> +> Think about the [DRY principle]({{ page.root }}/03-func-R/#composing-functions). +> Is it necessary to keep the `@examples` in the documentation when you are using +> them in the tests, or vice versa? Which considerations would help you decide? +> +> > ## Hint +> > +> > Examples are a good starting point, but: Tests help you ensure semantically, +> > mathematically and scientifically correct code. This may require covering +> > edge-cases, errors, etc. with dedicated tests. These may not be helpful, +> > instructive examples for your users. The examples should teach how your +> > functions can be applied to solve your users' problems. +> {: .solution} +{: .challenge} + +In order to get very quick feedback on your code, consider activating RStudio's +`Source on Save` function and getting used to pressing Ctrl+S +whenever you think your code is ready to be saved. If there are any errors, read +the error message, investigate and correct either your code or the test and +Ctrl+S again. + +Once all your tests are green, a good moment has come to also commit your tests +for `center.R` and `rescale.R`. 
Pick a self-explanatory commit message ;-) diff --git a/_episodes_rmd/04-making-packages-R.Rmd b/_episodes_rmd/05-making-packages-R.Rmd similarity index 92% rename from _episodes_rmd/04-making-packages-R.Rmd rename to _episodes_rmd/05-making-packages-R.Rmd index 385e49251..2f2697680 100644 --- a/_episodes_rmd/04-making-packages-R.Rmd +++ b/_episodes_rmd/05-making-packages-R.Rmd @@ -22,7 +22,7 @@ source: Rmd ```{r, include = FALSE} source("../bin/chunk-options.R") -knitr_fig_path("04-making-packages-R-") +knitr_fig_path("05-making-packages-R-") ``` Why should you make your own R packages? @@ -94,16 +94,28 @@ Suggestion: organize in a logical manner so that you know which file holds which ### `DESCRIPTION` file +We used RStudio's import process to include the `.R` files we already had. However, FAIRer +(because richer in metadata) `DESCRIPTION` files can be generated with a +[helper package called `usethis`][usethis]. Again, install it with RStudio or +the console, then load it and execute + +```{r use_desc, eval=FALSE} +use_description() +``` + ~~~ Package: PackageName Type: Package -Title: What the Package Does (Title Case) Version: 0.1.0 -Author: Who wrote it -Maintainer: The package maintainer -Description: More about what it does (maybe more than one line) +Title: What the Package Does (One Line, Title Case) +Authors@R: + person(given = "First", + family = "Last", + role = c("aut", "cre"), + email = "first.last@example.com") +Description: What the package does (one paragraph). Use four spaces when indenting paragraphs within the Description. -License: What license is it under? +License: What license it uses Encoding: UTF-8 LazyData: true ~~~ @@ -175,7 +187,7 @@ After `Install and Restart`-ing again, looking up the documentation should work. What exactly does `roxygen2` do? It reads lines that begin with `#'` as the function documentation for your package. Descriptive tags are preceded with the `@` symbol. 
For example, `@param` has information about the input parameters for the function. -### Exporting "user-level" function +### Exporting "user-level" functions We haven't talked about the `NAMESPACE` file yet! It belongs to the package skeleton, and was set up by RStudio with an `exportPattern`. Any file name in `R/` that @@ -193,7 +205,7 @@ Learn more about this from the ["R packages" book][r-pkgs-name]. ### Finishing up -Please take a look at RStudio's `Files` panes now. The `/man` directory should +Please take a look at RStudio's `Files` panes now. The `man/` directory should now contain one LaTeX-like formatted `.Rd` file for each function. In case you learned about Git already, also view the `.Rd` files in RStudio's @@ -231,4 +243,3 @@ your personal package. A good resource to find more guidance on packaging R code is [ROpenSci's onboarding guide][ROSPG]. [ROSPG]: https://onboarding.ropensci.org/packaging_guide.html -[ep-func]: {{ page.root }}/03-func-R/ diff --git a/_episodes_rmd/05-testthat.Rmd b/_episodes_rmd/06-pkg-tests-TDD.Rmd similarity index 51% rename from _episodes_rmd/05-testthat.Rmd rename to _episodes_rmd/06-pkg-tests-TDD.Rmd index 51c4ca8a5..10feb0df9 100644 --- a/_episodes_rmd/05-testthat.Rmd +++ b/_episodes_rmd/06-pkg-tests-TDD.Rmd @@ -1,14 +1,13 @@ --- -title: "Unit-Testing And Test-driven Development" -teaching: 45 -exercises: 15 +title: "Packaging Tests and Test-Driven Development" +teaching: 30 +exercises: 10 questions: -- "What is the benefit of unit-testing my code?" -- "How do I create and run unit tests?" -- "Why would I change my code after I got it to run?" +- "How are test-cases organised in an R package?" +- "Which tools help me organise my tests?" objectives: -- "Formalise documented examples as tests." -- "Use testthat functions to create and run tests." 
+- "Let the usethis package create test files and folder structures" +- "Use RStudio or testthat shortcuts to run all tests at once" keypoints: - "Changing code is not always necessary, but often useful." - "Tests provide a safety net for changing code." @@ -18,35 +17,12 @@ source: Rmd ```{r, include = FALSE} source("../bin/chunk-options.R") -knitr_fig_path("05-testthat-R-") +knitr_fig_path("06-pkg-tests-TDD-") ``` -### Unit testing with the `testthat` package - -Computer code evolves. Functions may need to be updated to new usage goals, -sped up when more data needs to be crunched, or cleaned up to make their code -more readable for collaborators, reviewers, etc. You probably know the saying -"Never change a running system!" We surely invested a lot of time already to make -sure our functions work fine now. It is natural to be cautious about changing -computer code and it rightly shouldn't be done on a whim. - -However, as you have probably learned in the Git lesson, branching is one way to -try out code changes in a way that allows you to recover from mistakes, and only -merge successful changes. - -Unit tests are another way to "span a safety net", and in R, the [`testthat` package][testthat] -is commonly used. Because adding tests requires an expansion of the folder and file -structure of an R package, we are going to use a helper package to do this for -us: [`usethis`][usethis]. - -```{r, eval=FALSE} -install.packages(c("testthat", "usethis")) -# several can be combined, but only for installations -library("testthat") -library("usethis") -``` - -Afterwards, type: +Because moving the test cases into their own files in a certain way +requires an expansion of the folder and file +structure of our R package, we are going to [`usethis`][usethis] again: ```{r use_test, eval=FALSE} use_test("center") @@ -63,77 +39,27 @@ A new file should be created and the console should show: ~~~ {: .source} -The file is pre-filled with a little example. 
While -following the explanation of each part, please delete the contents of that example -test in order to prepare inserting our own. The string within `test_that("…", …)` -is an explanation of what this test actually tests. In case of our `center()` -function we expect that "centering works". - -Next comes a specific `expect_`ation, usually calling a given function with -a defined set of arguments (inputs) to check whether that result is `_equal` -to a known return value (outputs). -In the function comments, we have already formalised a few of these as examples. -Copy-and-paste them into the test file, remove the `# should return [1]` -indicators of the expected results, and wrap the result values into a `c()` to -enable R to compare the expected values with the output of the two `center()` tests. - -```{r center-test, eval=FALSE} -test_that("centering works", { - expect_equal(center(c(1, 2, 3), 0), c(-1, 0, 1)) -}) -``` +The file is pre-filled with a little example. Remove it, then cut our test case +out of `center.R` and paste it here. Repeat this with the test in `rescale.R`. +You could now execute the tests interactively, but of course, switching between +files and manually executing the test code is not necessary. -Don't worry that there is no output when you execute either one or both of the -`expect_equal()` statements, or the whole `test_that()` block. In good UNIX -tradition, `testthat` will only bother you in case some expectations are _not_ -met, i.e. one or more unit tests fail. - -> ## Testing our rescaling functions -> -> Apply the above example of converting the examples into unit tests for the -> `rescale` function as well. -> -> > ## Solution -> > ~~~ -> > test_that("rescaling works", { -> > expect_equal(rescale(c(1, 2, 3)), c(0.0, 0.5, 1.0)) -> > expect_equal(rescale(c(1, 2, 3, 4, 5)), c(0.0, 0.25, 0.5, 0.75, 1.0)) -> > }) -> > ~~~ -> > {: .r} -> {: .solution} -> -> Think about the [DRY principle]({{ page.root }}/03-func-R/#composing-functions). 
-> Is it necessary to keep the `@examples` in the documentation when you are using -> them in the tests? Which factors would you consider in your decision? -> -> > ## Hint -> > -> > Examples are a good starting point, but not every test case will be a useful -> > example, and vice versa. The examples help users to figure out how to apply -> > your functions. The test cases help the package developer(s) improve the code. -> > -> {: .solution} -{: .challenge} +To automatically run all our packaged tests, we can use the `More > Test Package` +menu option in RStudio's `Build` pane, or [`testthat`'s `auto_test_package()`][tt-atp] +in the console. Notice the hopefully all green and `OK` `Results`. -To conclude this section about creating unit tests, let's again commit our results, +To conclude this section about packaging tests, let's again commit our results, for example as "Span safety net for TDD". - ### Test-driven development (TDD) Remember that we updated `rescale()` with lower and upper bounds and default -values at the end of [the functions episode][ep-func]? We had to manually test that change with a -new example back then. We were repeating ourselves a bit more often than necessary -back then, weren't we? Of course, there are ways to automate the testing of code -changes, which gives you quick feedback whether your changes worked, or broke -anything. - -To do that, we use `testthat`'s `auto_test_package()` -in the console, or in RStudio's `Build` pane the `More > Test Package` menu option, -and notice the hopefully all green and `OK` `Results`. +values at the end of [the functions episode][ep-func]? We either had to manually +test that change with a new example, or create another test case. +We were repeating ourselves a bit more often than necessary +back then, weren't we? 
-With this safety net enabled, we will first update `test-center.R`, and then +With the "safety net" enabled, we will first update `test-center.R`, and then update the code in `center.R` with the `desired = 0` default argument. This strategy of (re)writing (new) tests before (re)writing the code-to-be-tested is called "[test-driven design/development][TDD]". It is intended to reduce confirmation @@ -226,5 +152,3 @@ our time and energy into testing code improvement, while recovering from mistake [ep-func]: {{ page.root }}/03-func-R/ [refactoring]: https://en.wikipedia.org/wiki/Code_refactoring -[usethis]: https://usethis.r-lib.org/ -[testthat]: https://testthat.r-lib.org/ diff --git a/_episodes_rmd/06-tidy-data.Rmd b/_episodes_rmd/07-tidy-data.Rmd similarity index 79% rename from _episodes_rmd/06-tidy-data.Rmd rename to _episodes_rmd/07-tidy-data.Rmd index 993077454..bb9fe2055 100644 --- a/_episodes_rmd/06-tidy-data.Rmd +++ b/_episodes_rmd/07-tidy-data.Rmd @@ -19,7 +19,7 @@ source: Rmd ```{r, include = FALSE} source("../bin/chunk-options.R") -knitr_fig_path("06-tidy-data-R-") +knitr_fig_path("07-tidy-data-R-") ``` ## Tidying our inflammation datasets @@ -159,3 +159,38 @@ the documentation file, what is left to do in our little personal package? > {: .solution} {: .challenge} +## Checking CSVs for structural validity + +This was an average case of cleaning a dataset. On the one hand, we had to label +columns (`r names(dat_long)`). On the other hand, the data itself was easily +tidied. You have probably encountered many other `.csv` files which were messy +in their own ways. Two useful tools to find structural problems in CSVs are +[CSVLint.io][csvl] and [try.goodtables.io][gt]. + +> ## Challenge: Search for structural problems in both the initial `inflammation.csv` and in your tidied version +> +> How many errors or warnings does that produce? Then, +> `write.csv(dat_long, "inflammation-tidy.csv")` +> and check that version. A few warnings may be left. 
If yes, which one(s)? +> How would you approach eliminating those? +> +> > ## Solutions +> > +> > 1. [CSVLint.io][csvl] should report "Structural problem: Possible title row detected" +> > and [try.goodtables.io][gt] should report "1x Blank Header". +> > 1. View `inflammation-tidy.csv` in RStudio or a text editor, and read `?write.csv`. +> > 1. An empty `"",` at the beginning of the first row indicates that we need to +> > set `row.names = FALSE`. Row names are a bit weird, because they will often +> > be numbers, but the number you see with `tail(...)` will not necessarily +> > be equal to the number of rows. Always use `nrow(...)` to count those. We +> > recommend moving actually useful information about a row / observation into +> > its own column. +> > 1. Depending on your operating system, you could set `eol = "\r\n"` to meet +> > the most common CSV format specification ([RFC 4180]). +> > +> {: .solution} +{: .challenge} + +[csvl]: https://csvlint.io/ +[gt]: https://try.goodtables.io/ +[RFC 4180]: https://tools.ietf.org/html/rfc4180 diff --git a/_includes/links.md b/_includes/links.md index 2fb9f74cf..5038983f5 100644 --- a/_includes/links.md +++ b/_includes/links.md @@ -34,5 +34,8 @@ [rubygems]: https://rubygems.org/pages/download/ [styles]: https://github.com/swcarpentry/styles/ [swc-releases]: https://github.com/swcarpentry/swc-releases +[testthat]: https://testthat.r-lib.org/ +[tt-atp]: https://testthat.r-lib.org/reference/auto_test_package.html +[usethis]: https://usethis.r-lib.org/ [workshop-repo]: {{ site.workshop_repo }} [yaml]: http://yaml.org/
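
The `write.csv()` advice in the tidy-data solution above could be sketched roughly as follows. This is only an illustration under stated assumptions: the small `dat_long` data frame, its column names and values are hypothetical stand-ins for the lesson's object, and the output goes to a temporary file rather than `inflammation-tidy.csv`.

```r
# Hypothetical stand-in for the lesson's `dat_long`; columns and values are invented.
dat_long <- data.frame(
  patient      = c(1, 1, 2, 2),
  day          = c(1, 2, 1, 2),
  inflammation = c(0, 1, 0, 2)
)

# Write to a temporary file so that nothing in the project is overwritten.
tidy_csv <- tempfile(fileext = ".csv")

# row.names = FALSE avoids the unnamed first column that is reported as a blank header.
# eol = "\r\n" requests the CRLF line endings described in RFC 4180 on a Unix-alike;
# on Windows the default line ending is usually CRLF already, so check before setting it.
write.csv(dat_long, file = tidy_csv, row.names = FALSE, eol = "\r\n")

# Read the file back to check that column names and row count survived the round trip.
dat_check <- read.csv(tidy_csv)
names(dat_check)
nrow(dat_check) == nrow(dat_long)
```

Reading the file back with `read.csv()` is just one quick sanity check; the online validators named in the lesson remain the more thorough option.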