diff --git a/01-getting-started.Rmd b/01-getting-started.Rmd index a6174c8..8091388 100644 --- a/01-getting-started.Rmd +++ b/01-getting-started.Rmd @@ -1,7 +1,7 @@ # Getting Started with Data in R {#getting-started} ```{r setup_getting_started, include=FALSE, purl=FALSE} -chap <- 2 +chap <- 1 lc <- 0 rq <- 0 # **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** @@ -38,7 +38,7 @@ For much of this book, we will assume that you are using R via RStudio. First ti - RStudio is like a car's dashboard. | R: Engine | RStudio: Dashboard | -|:--------------------------------------:|:-----------------------------------------:| +|:---------------------------------:|:-----------------------------------:| | ![](images/engine.jpg){height="1.7in"} | ![](images/dashboard.jpg){height="1.7in"} | More precisely, R is a programming language that runs computations while RStudio is an *integrated development environment (IDE)* that provides an interface by adding many convenient features and tools. So just as having access to a speedometer, rearview mirrors, and a navigation system makes driving much easier, using RStudio's interface makes using R much easier as well. @@ -49,13 +49,13 @@ RStudio Cloud () is a hosted version of RStudio that allo To begin using RStudio Cloud use the link provided by your instructor to gain access to the classroom workspace. You will be prompted to create a free account or log in if you have an existing account. -After you open RStudio Cloud, you should now have access to the classroom under 'Spaces' on the left hand side (in this case 'Stat 202'). +After you open RStudio Cloud, you should now have access to the classroom under 'Spaces' on the left hand side (in this case 'STAT202'). -![](images/RStudioCloud.png) +![](images/rstudio_cloud.png) -Throughout class you will be working on various activities. Once the instructor has made an activity available you will click on the classroom Workspace (Stat 202) to access the available projects. 
To begin working on an activity click 'Start'. Once that activity project is open navigate to the 'File' pane and open the R Markdown '.Rmd' file. +Throughout this course you will be working on various activities. Once the instructor has made an activity available you will click on the classroom Workspace (STAT202) to access the available projects. To begin working on an activity click 'Start'. Once that activity project is open navigate to the 'File' pane and open the Quarto '.qmd' file. -![](images/RStudioWorkspace.png) +![](images/rstudio_workspace.png) You can use RStudio Cloud for personal use as well by creating projects in 'Your Workspace'. However, RStudio Cloud limits the number of projects and amount of accessible time so it is recommended that you later install the software on your own computer. @@ -73,12 +73,13 @@ You can use RStudio Cloud for personal use as well by creating projects in 'Your - Scroll down to "Installers for Supported Platforms" near the bottom of the page. - Click on the download link corresponding to your computer's operating system. + ### Using R via RStudio Recall our car analogy from above. Much as we don't drive a car by interacting directly with the engine but rather by interacting with elements on the car's dashboard, we won't be using R directly but rather we will use RStudio's interface. After you install R and RStudio on your computer, you'll have two new programs AKA applications you can open. We will always work in RStudio and not R. 
In other words: | R: Do not open this | RStudio: Open this | -|:--------------------------------------------------------------:|:---------------------------------------------------------------------:| +|:--------------------------------:|:------------------------------------:| | `r include_image("images/Rlogo.png", html_opts = "width=25%")` | `r include_image("images/RStudio-Ball.png", html_opts = "width=20%")` | After you open RStudio, you should see the following: @@ -87,28 +88,37 @@ After you open RStudio, you should see the following: Note the three panes, which are three panels dividing the screen: The *Console pane*, the *Files pane*, and the *Environment pane*. Over the course of this chapter, you'll come to learn what purpose each of these panes serve. + + ## How do I code in R? {#code} -Now that you're set up with R and RStudio, you are probably asking yourself "OK. Now how do I use R?" The first thing to note as that unlike other statistical software programs like Excel, STATA, or SAS that provide [point and click](https://en.wikipedia.org/wiki/Point_and_click) interfaces, R is an [interpreted language](https://en.wikipedia.org/wiki/Interpreted_language), meaning you have to enter in R commands written in R code. In other words, you have to code/program in R. Note that we'll use the terms "coding" and "programming" interchangeably in this book. +Now that you're set up with R and RStudio, you are probably asking yourself "OK. Now how do I use R?" The first thing to note is that unlike other statistical software programs like Excel, STATA, or SAS that provide [point and click](https://en.wikipedia.org/wiki/Point_and_click) interfaces, R is an [interpreted language](https://en.wikipedia.org/wiki/Interpreted_language), meaning you have to enter in R commands written in R code. In other words, you have to code/program in R. Note that we'll use the terms "coding" and "programming" interchangeably in this book. 
While it is not required to be a seasoned coder/computer programmer to use R, there is still a set of basic programming concepts that R users need to understand. Consequently, while this book is not a book on programming, you will still learn just enough of these basic programming concepts needed to explore and analyze data effectively. -### Creating your first R Markdown document +### Creating your first Quarto document + +Quarto allows you to easily create a document which combines your code, the results from your code, as well as any text that accompanies the analysis. If you are using RStudio on your personal computer you will need to install Quarto. If you are using RStudio Server/Cloud you can skip this installation step. + +1. **Install Quarto:** [Download and install Quarto](https://quarto.org/docs/get-started/). -R Markdown allows you to easily create a document which combines your code, the results from your code, as well as any text that accompanies the analysis. To create a new R Markdown file, in R Studio select File\>New File\>R Markdown. Then, you will see a window pop-up titled *New R Markdown*. Here, you specify the type of file you wish to create. HTML is generally the recommended document type since it does not have traditional *page* separators like PDF and Word do. You can also choose a title and author for your document using their respective fields. Finally, select *Ok* to create your new R Markdown file. You will see it appear as a tab in your R Studio session. Click the *save icon* to save your new document. + - Click on the download Quarto CLI link corresponding to your computer's operating system. -The following is an example of an R Markdown document: -![](images/markdown_example.png) +To create a new Quarto file, in RStudio select File\>New File\>Quarto Document. Then, you will see a window pop-up titled *New Quarto Document*. Here, you specify the type of file you wish to create. 
HTML is generally the recommended document type since it does not have traditional *page* separators like PDF and Word do. You can also choose a title and author for your document using their respective fields. Finally, select *Create* to create your new Quarto file. You will see it appear as a tab in your RStudio session. Click the *save icon* to save your new document. + +The following is an example of a Quarto document: + +
![](images/quarto_example.png){width="75%"}
a) Save your document. -b) Click *knit* to compile your R Markdown into the document file type that you specified. The file will be saved in your *Files pane*. This will also save your document. +b) Click *Render* to compile your Quarto document into the file type that you specified. The file will be saved in your *Files pane*. This will also save your document. c) Insert a new code chunk in your document where the cursor is located. You will often have many code chunks in your document. d) Run the current code chunk. -When you create your Markdown file and *knit* it into a document, the chunks are run in order and any output from them is shown in the document, in the order and location that their respective chunk appears. Sometimes you may wish to type code or analyze data without it printing in the document. If that is the case, you type the code in the *Console* rather than in the *.Rmd* file. +When you create your Quarto file and *Render* it into a document, the chunks are run in order and any output from them is shown in the document, in the order and location that their respective chunk appears. Sometimes you may wish to type code or analyze data without it printing in the document. If that is the case, you type the code in the *Console* rather than in the *.qmd* file. -While you read through this book, it will be helpful to have an RMarkdown document open so you can copy code provided and paste it into a code chunk to run. +While you read through this book, it will be helpful to have a Quarto document open so you can copy code provided and paste it into a code chunk to run. ### Basic programming concepts and terminology {#programming-concepts} @@ -121,7 +131,7 @@ We now introduce some basic programming concepts and terminology. Instead of ask - *Objects*: Where values are saved in R. In order to do useful and interesting things in R, we will want to *assign* a name to an object. 
For example we could do the following assignments: `x <- 44 - 20` and `three <- 3`. This would allow us to run `x + three` which would return `27`. - *Data types*: Integers, doubles/numerics, logicals, and characters. -In R Studio try typing the following code into the console or code chunk. +In RStudio try typing the following code into the console or code chunk. ```{r} x <- 44-20 @@ -181,7 +191,7 @@ Another point of confusion with many new R users is the idea of an R package. R A good analogy for R packages is they are like apps you can download onto a mobile phone: | R: A new phone | R Packages: Apps you can download | -|:--------------------------------------:|:------------------------------------:| +|:-----------------------------------:|:---------------------------------:| | ![](images/iphone.jpg){height="1.5in"} | ![](images/apps.jpg){height="1.5in"} | So R is like a new mobile phone: while it has a certain amount of features when you use it for the first time, it doesn't have everything. R packages are like the apps you can download onto your phone from Apple's App Store or Android's Google Play. @@ -216,7 +226,7 @@ There are two ways to install an R package. For example, to install the `ggplot2 Much like an app on your phone, you only have to install a package once. However, if you want to update an already installed package to a newer verions, you need to re-install it by repeating the above steps. -```{block lc2-0, type='learncheck'} +```{block lc1-0, type='learncheck'} **_Learning check_** ``` @@ -239,7 +249,7 @@ If after running the above code, a blinking cursor returns next to the `>` "prom ... it means that you didn't successfully install it. In that case, go back to the previous subsection "Package installation" and install it. -```{block lc2-1, type='learncheck'} +```{block lc1-1, type='learncheck'} **_Learning check_** ``` @@ -279,7 +289,7 @@ We'd all like to arrive at our destinations on time whenever possible. 
(Unless y - `flights`: Information on all `r scales::comma(nrow(nycflights13::flights))` flights - `airlines`: A table matching airline names and their two letter IATA airline codes (also known as carrier codes) for `r nrow(nycflights13::airlines)` airline companies - `planes`: Information about each of `r scales::comma(nrow(nycflights13::planes))` physical aircraft used. -- `weather`: Hourly meteorological data for each of the three NYC airports. This data frame has `r scales::comma(nrow(nycflights13::weather))` rows, roughtly corresponding to the 365 $\times$ 24 $\times$ 3 = 26,280 possible hourly measurements one can observe at three locations over the course of a year. +- `weather`: Hourly meteorological data for each of the three NYC airports. This data frame has `r scales::comma(nrow(nycflights13::weather))` rows, roughly corresponding to the 365 $\times$ 24 $\times$ 3 = 26,280 possible hourly measurements one can observe at three locations over the course of a year. - `airports`: Airport names, codes, and locations for `r scales::comma(nrow(nycflights13::airports))` destination airports. ### `flights` data frame @@ -318,7 +328,7 @@ Among the many ways of getting a feel for the data contained in a data frame suc Run `View(flights)` in your Console in RStudio, either by typing it or cutting & pasting it into the Console pane, and explore this data frame in the resulting pop-up viewer. You should get into the habit of always `View`ing any data frames that come your way. Note the capital "V" in `View`. R is case-sensitive so you'll receive an error is you run `view(flights)` instead of `View(flights)`. -```{block lc2-2, type='learncheck'} +```{block lc1-2, type='learncheck'} **_Learning check_** ``` @@ -346,7 +356,7 @@ glimpse(flights) We see that `glimpse()` will give you the first few entries of each variable in a row after the variable. 
In addition, the *data type* (see Subsection \@ref(programming-concepts)) of the variable is given immediately after each variable's name inside `< >`. Here, `int` and `dbl` refer to "integer" and "double", which are computer coding terminology for quantitative/numerical variables. In contrast, `chr` refers to "character", which is computer terminology for text data. Text data, such as the `carrier` or `origin` of a flight, are categorical variables. The `time_hour` variable is an example of one more type of data type: `dttm`. As you may suspect, this variable corresponds to a specific date and time of day. However, we won't work with dates in this class and leave it to a more advanced book on data science. -```{block lc2-3, type='learncheck'} +```{block lc1-3, type='learncheck'} **_Learning check_** ``` @@ -364,7 +374,7 @@ airlines kable(airlines) ``` -At first glance, it may not appear that there is much difference in the outputs. However when using tools for document production such as [R Markdown](http://rmarkdown.rstudio.com/lesson-1.html), the latter code produces output that is much more legible and reader-friendly. +At first glance, it may not appear that there is much difference in the outputs. However when using tools for document production such as [Quarto](https://quarto.org/docs/get-started/hello/rstudio.html), the latter code produces output that is much more legible and reader-friendly. **4. `$` operator** @@ -393,13 +403,91 @@ We've given you what we feel are the most essential concepts to know before you ### Additional resources -If you are completely new to the world of coding, R, and RStudio and feel you could benefit from a more detailed introduction, we suggest you check out ModernDive co-author Chester Ismay's [Getting used to R, RStudio, and R Markdown](https://rbasics.netlify.com/) short book [@usedtor2016], which includes screencast recordings that you can follow along and pause as you learn. 
Furthermore, there is an introduction to R Markdown, a tool used for reproducible research in R. +If you are completely new to the world of coding, R, and RStudio and feel you could benefit from a more detailed introduction, we suggest you check out ModernDive co-author Chester Ismay's [Getting used to R, RStudio, and R Markdown](https://rbasics.netlify.com/) short book [@usedtor2016], which includes screencast recordings that you can follow along and pause as you learn. While this book teaches R Markdown it is important to note that everything in R Markdown is transferable to Quarto. R Markdown and Quarto are both tools used for reproducible research but R Markdown is fundamentally tied to R while Quarto is a multi-language platform. + +
![](images/gettting-used-to-R.png){height="3.5in"}
+ +## Practice Problems + +```{block pp1-0, type='practice'} +**_Concept_** +``` + +1. Which type of document do we use to both code and write explanations? + + a) R Script + b) Quarto Document + c) HTML file + d) R Notebook + +\n + +2. Which type of red text in the console pane generally means that your code will not run? + + a) error + b) warning + c) message + +\n + +3. The function load is used to load and attach add-on packages. + + a) TRUE + b) FALSE + +\n + +4. If you place the operator ? before the name of a function or data frame, then you will be presented with a page showing the documentation for the respective function or data frame. + + a) TRUE + b) FALSE + +\n + +5. What does any ONE row in this flights dataset refer to? + + a) Data on an airline + b) Data on a flight + c) Data on an airport + d) Data on multiple flights + +\n + +6. In the flights dataset, `air_time` and `arr_delay` are which type of variables? + + a) string + b) categorical + c) quantitative + d) character + e) dataframe + +\n + +```{block pp1-1, type='practice'} +**_Application_** +``` + +7. In a code chunk, first define a variable `z` to be the product of 12 and 31, then define a variable called `add_on` to be the number 12. Print the output of `z + add_on`. + +\n + +8. Consider the `titanic` data set included in the package `ISDSdatasets`. This is one of the most popular data sets used for understanding machine learning basics, and you will likely see this data set in the future if you continue on in your studies to machine learning. + + Use the `glimpse()` function from the `dplyr` package to explore and describe the dataset. + +\n + +```{block pp1-2, type='practice'} +**_Advanced_** +``` + +For the following problems we will use the `titanic` data set to learn additional data exploration techniques. -
+9. Use the function `head()` on the `titanic` dataset. What does it do? Based on this, what do you expect the function `tail()` does? -![](images/gettting-used-to-R.png){height="3.5in"} +\n -
+10. The function `unique()`, when used on a specific variable within a data set, returns a vector of the values of the variable with duplicate elements removed. Try using the function `unique()` on the variable `Embarked`. diff --git a/images/RStudioCloud.png b/images/RStudioCloud.png deleted file mode 100644 index 526a9c8..0000000 Binary files a/images/RStudioCloud.png and /dev/null differ diff --git a/images/RStudioWorkspace.png b/images/RStudioWorkspace.png deleted file mode 100644 index 624e3d7..0000000 Binary files a/images/RStudioWorkspace.png and /dev/null differ diff --git a/images/markdown_example.png b/images/markdown_example.png deleted file mode 100644 index 808746f..0000000 Binary files a/images/markdown_example.png and /dev/null differ diff --git a/images/quarto_example.png b/images/quarto_example.png new file mode 100644 index 0000000..f20d1d6 Binary files /dev/null and b/images/quarto_example.png differ diff --git a/images/rstudio_cloud.png b/images/rstudio_cloud.png new file mode 100644 index 0000000..3663bb7 Binary files /dev/null and b/images/rstudio_cloud.png differ diff --git a/images/rstudio_workspace.png b/images/rstudio_workspace.png new file mode 100644 index 0000000..3a49d8e Binary files /dev/null and b/images/rstudio_workspace.png differ diff --git a/style.css b/style.css index d39a905..cac9562 100644 --- a/style.css +++ b/style.css @@ -1,7 +1,13 @@ .learncheck { padding: 1em 1em 1em 1em; margin-bottom: 10px; - background: #9ED3AD 5px center/3em no-repeat; + background: #E4E0EE 5px center/3em no-repeat; +} + +.practice { + padding: 0.1em 1em 0.1em 1em; + margin-bottom: 10px; + background: #D8D6D6 5px center/3em no-repeat; } .announcement {