## R Basics (2020–2021)

### Lieven Clement · Universiteit Gent

In this interactive course you will build a foundation in R and learn the basics of data wrangling. The course is based on the first part of the e-book **Introduction to Data Science** by Prof. Rafael Irizarry, Department of Data Sciences at the Dana-Farber Cancer Institute and Department of Biostatistics at the Harvard School of Public Health.


| Title |
|---|
| Preface |
| Acknowledgments |
| Introduction |

The instructions below include screenshots from the installation process, in which we used the Chrome browser. Chrome is not required, but you can freely download and install it from here: https://www.google.com/chrome/.

| Title |
|---|
| 36.1 Installing R |
| 36.2 Installing RStudio |

In this book, we will be using the R software environment for all our
analysis. You will learn R and data analysis techniques simultaneously.
To follow along you will therefore need access to R. We also recommend
the use of an *integrated development environment* (IDE), such as
RStudio, to save your work. Note that it is common for a course or
workshop to offer access to an R environment and an IDE through your web
browser, as done by RStudio Cloud. If you have access to such a
resource, you don’t need to install R and RStudio. However, if you
intend to become an advanced data analyst, we highly recommend
installing these tools on your computer. Both R and RStudio are
free and available online.
We suggest developing your code for the exercises in RStudio and pasting your script into Dodona to evaluate it.

| Title |
|---|
| 2.1 Case study: US Gun Murders |
| 2.2 The very basics |
| 2.3.1. Sum of integers 1,...,100 |
| 2.3.2. Sum of integers 1,...,1000 |
| 2.3.3. Interpret code |
| 2.3.4. Nested functions |
| 2.3.5. Interpret code |
| 2.4 Data types |
| 2.5.1. Variables in a dataframe |
| 2.5.2 Variable names |
| 2.5.3 Examining Variables |
| 2.5.4 Multiple ways to access variables |
| 2.5.5 Factors |
| 2.5.6 Tables |
| 2.6 Vectors |
| 2.7 Coercion |
| 2.8.1-5. Vectors |
| 2.8.6. Vector of numbers 12..73 |
| 2.8.7. Odd numbers |
| 2.8.8. Length of a sequence |
| 2.8.9. Class of seq(1, 10, 0.5) |
| 2.8.10. Class of seq(1, 10) |
| 2.8.11. 1 vs 1L |
| 2.8.12. Vector cast |
| 2.9 Sorting |
| 2.10.1-4. Dataframes 1 |
| 2.10.5-6. Dataframes 2 |
| 2.10.7-8. NA |
| 2.11 Vector arithmetics |
| 2.12.1. Convert Temperatures |
| 2.12.2. Vector Sum |
| 2.12.3. Vector Mean |
| 2.13 Indexing |
| 2.14.1-5. Dataframe operations |
| 2.14.6. Match function |
| 2.14.7-8. Match operator |
| 2.15 Basic plots |
| 2.16.1. Scatter Plot |
| 2.16.2. Histogram |
| 2.16.3. Boxplot |

We teach R because it greatly facilitates data analysis, the main topic
of this book. By coding in R, we can efficiently perform exploratory
data analysis, build data analysis pipelines, and prepare data
visualization to communicate results. However, R is not just a data
analysis environment but a programming language. Advanced R programmers
can develop complex packages and even improve R itself, but we do not
cover advanced programming in this book. Nonetheless, in this section,
we introduce three key programming concepts: conditional expressions,
for-loops, and functions. These are not just key building blocks for
advanced programming, but are sometimes useful during data analysis. We
also note that there are several functions that are widely used to
program in R but that we will not cover in this book. These include
`split`, `cut`, `do.call`, and `Reduce`, as well as the **data.table**
package. These are worth learning if you plan to become an expert R
programmer.
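As a rough illustration of the three concepts named above, the following base-R sketch combines a function, a for-loop, and a conditional expression. The names `sum_first_n`, `describe_sum`, and the `threshold` argument are hypothetical and chosen only for this example; they are not taken from the book.

```r
# Function + for-loop: add up the integers 1, ..., n one at a time.
# (Hypothetical helper, for illustration only.)
sum_first_n <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i
  }
  total
}

# Conditional expression: label the sum relative to a threshold.
# The default threshold of 1000 is an arbitrary assumption.
describe_sum <- function(n, threshold = 1000) {
  s <- sum_first_n(n)
  if (s > threshold) {
    "large"
  } else {
    "small"
  }
}

sum_first_n(100)   # 5050
describe_sum(100)  # "large"
describe_sum(10)   # "small"
```

In practice the vectorized `sum(1:n)` gives the same result as the loop; the loop is written out here only to show the construct.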

| Title |
|---|
| 3.0 Introduction |
| 3.1 Conditional expressions |
| 3.2 Defining functions |
| 3.3 Namespaces |
| 3.4 For-loops |
| 3.5 Vectorization and functionals |
| 3.6.1 Conditional expression |
| 3.6.2 Any and all |
| 3.6.3 Conditional changes |
| 3.6.4 Sum of the first n integers |
| 3.6.5 Functions with multiple variables |
| 3.6.6 Namespace |
| 3.6.7-8 Sum of the first n squares (I) |
| 3.6.9 Sum of the first n squares (II) |
| 3.6.10 Sum of the first n squares (III) |

Up to now we have been manipulating vectors by reordering and subsetting
them through indexing. However, once we start more advanced analyses,
the preferred unit for data storage is not the vector but the data
frame. In this chapter we learn to work directly with data frames, which
greatly facilitate the organization of information. We will be using
data frames for the majority of this book. We will focus on a specific
data format referred to as *tidy* and on a specific collection of packages
that are particularly helpful for working with *tidy* data, referred to
as the *tidyverse*.

We can load all the tidyverse packages at once by installing and loading
the **tidyverse** package:

```
library(tidyverse)
```

We will learn how to implement the tidyverse approach throughout the
book, but before delving into the details, in this chapter we introduce
some of the most widely used tidyverse functionality, starting with the
**dplyr** package for manipulating data frames and the **purrr** package
for working with functions. Note that the tidyverse also includes a
graphing package, **ggplot2**, which will be introduced in a later course on data visualization;
the **readr** package, discussed in Chapter 5;
and many others. In this chapter, we first introduce the concept of
*tidy data* and then demonstrate how we use the tidyverse to work with
data frames in this format.
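To give a feel for what this looks like in practice, here is a minimal sketch that assumes the **dplyr** and **purrr** packages are installed. The `heights` data frame and its values are made up for illustration and do not come from the book.

```r
library(dplyr)
library(purrr)

# A small, made-up data frame in tidy format:
# one row per observation, one column per variable.
heights <- data.frame(
  name = c("Ann", "Bob", "Cleo"),
  height_cm = c(170, 182, 165)
)

# dplyr: add a column, keep some rows, keep some columns.
tall <- heights %>%
  mutate(height_m = height_cm / 100) %>%
  filter(height_m > 1.68) %>%
  select(name, height_m)

# purrr: apply a function to each element and collect a numeric vector.
squares <- map_dbl(1:3, function(x) x^2)

tall
squares
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe `%>%`.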

| Title |
|---|
| 4.1 Tidy data |
| 4.2.1 CO2 |
| 4.2.2 Weight |
| 4.2.3 BOD |
| 4.2.4 Some more datasets |
| 4.3 Manipulating data frames |
| 4.4.1-3. Dataframe column operations |
| 4.4.4-6. Dataframe row operations |
| 4.4.7. Dataframe mixed operations |
| 4.5 The pipe: %>% |
| 4.6.1 Pipes |
| 4.7 Summarizing data |
| 4.8 Sorting data frames |
| 4.9.1. Summarizing |
| 4.9.2. Grouping |
| 4.9.3. Sorting |
| 4.10 Tibbles |
| 4.11 The dot operator |
| 4.12 do |
| 4.13 The purrr package |
| 4.14 Tidyverse conditionals |
| 4.15.1. Tibbles 1 |
| 4.15.2. Tibbles 2 |
| 4.15.3. Purrr |

| Title |
|---|
| 5.0 Introduction |
| 5.1 Paths and the working directory |
| 5.2 The readr and readxl packages |
| 5.3 Lost column headers |
| 5.4 Downloading files |
| 5.5 R-base importing functions |
| 5.6 Text versus binary files |
| 5.7 Unicode versus ASCII |
| 5.8 Organizing data with spreadsheets |
| 5.9.1 Spreadsheets 1 |
| 5.9.2 Spreadsheets 2 |