Dier en Data - R Basics (2023–2024)

Miel Hostens · Universiteit Utrecht

In this interactive course you will build a foundation in R and learn the basics of data wrangling. The course is based on the first part of the e-book Introduction to Data Science by Prof. Rafael Irizarry, Department of Data Sciences at the Dana-Farber Cancer Institute and Department of Biostatistics at the Harvard School of Public Health.

Exercise series

Prologue
17 November 2023 17:00

Installing R and RStudio
17 November 2023 17:00

The instructions below include screenshots from the installation process, for which we used the Chrome browser. Chrome is not required, but you can download and install it for free from https://www.google.com/chrome/.

1. Getting started with R and RStudio
17 November 2023 17:00

2. R Basics
17 November 2023 17:00

In this book, we will be using the R software environment for all our analyses. You will learn R and data analysis techniques simultaneously. To follow along you will therefore need access to R. We also recommend the use of an integrated development environment (IDE), such as RStudio, to save your work. Note that it is common for a course or workshop to offer access to an R environment and an IDE through your web browser, as done by RStudio Cloud. If you have access to such a resource, you don’t need to install R and RStudio. However, if you intend to become an advanced data analyst, we highly recommend installing these tools on your computer. Both R and RStudio are free and available online. We suggest developing your code for the exercises in RStudio and pasting your script into Dodona for evaluation.

3. Programming basics

We teach R because it greatly facilitates data analysis, the main topic of this book. By coding in R, we can efficiently perform exploratory data analysis, build data analysis pipelines, and prepare data visualizations to communicate results. However, R is not just a data analysis environment but also a programming language. Advanced R programmers can develop complex packages and even improve R itself, but we do not cover advanced programming in this book. Nonetheless, in this section, we introduce three key programming concepts: conditional expressions, for-loops, and functions. These are not just key building blocks for advanced programming, but are sometimes useful during data analysis. We also note that there are several functions that are widely used to program in R but that we will not cover in this book. These include split, cut, do.call, and Reduce, as well as the data.table package. These are worth learning if you plan to become an expert R programmer.
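
As a rough preview of the three concepts named above, the following minimal sketch combines a function, a conditional expression, and a for-loop. The function name sign_label and the vector sums are made up for this illustration and do not come from the course material.

```r
# Function + conditional expression: label a single number by its sign.
# (sign_label is a hypothetical helper, not a built-in R function.)
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# For-loop: store the sum 1 + 2 + ... + n for each n from 1 to 5.
n_values <- 1:5
sums <- vector("numeric", length(n_values))
for (n in n_values) {
  sums[n] <- sum(1:n)
}

sign_label(-2)  # "negative"
sums            # 1 3 6 10 15
```

In everyday data analysis, vectorized functions often replace explicit loops like this one, but the loop makes the mechanics easier to see.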

4. The tidyverse

Up to now we have been manipulating vectors by reordering and subsetting them through indexing. However, once we start more advanced analyses, the preferred unit for data storage is not the vector but the data frame. In this chapter we learn to work directly with data frames, which greatly facilitate the organization of information. We will be using data frames for the majority of this book. We will focus on a specific data format referred to as tidy and on a specific collection of packages, referred to as the tidyverse, that are particularly helpful for working with tidy data.
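
To make the contrast between vectors and data frames concrete, here is a small base R sketch. The names and heights are invented for illustration only; the point is that indexing a vector reorders one set of values, while a data frame keeps related columns together row by row.

```r
# A named numeric vector (made-up values), reordered and subset by indexing.
heights <- c(ana = 172, ben = 181, cem = 165)
ord <- order(heights)             # positions that sort the values ascending
sorted_heights <- heights[ord]    # cem, ana, ben
tall <- heights[heights > 170]    # ana, ben

# The same information as a data frame: one row per person,
# one column per variable.
people <- data.frame(
  name = c("ana", "ben", "cem"),
  height = c(172, 181, 165),
  stringsAsFactors = FALSE
)

# Sorting the rows keeps each name attached to its height.
people_sorted <- people[order(people$height), ]
people_sorted
```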

We can load all the tidyverse packages at once by installing and loading the tidyverse package:

library(tidyverse)

We will learn how to implement the tidyverse approach throughout the book, but before delving into the details, in this chapter we introduce some of the most widely used tidyverse functionality, starting with the dplyr package for manipulating data frames and the purrr package for working with functions. Note that the tidyverse also includes a graphing package, ggplot2, which will be introduced in a later course on data visualization; the readr package, discussed in Chapter 5; and many others. In this chapter, we first introduce the concept of tidy data and then demonstrate how we use the tidyverse to work with data frames in this format.
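
As a small taste of what this looks like in practice, the sketch below uses a few dplyr verbs and one purrr function. It assumes the dplyr and purrr packages are installed; the animals data frame and its values are invented for illustration and are not part of the course data.

```r
library(dplyr)
library(purrr)

# A small, made-up data frame in tidy format: one row per animal.
animals <- data.frame(
  name = c("cow", "pig", "hen", "goat"),
  weight_kg = c(650, 110, 2, 60),
  stringsAsFactors = FALSE
)

# dplyr: add a column, keep some rows, keep some columns.
heavy <- animals %>%
  mutate(weight_g = weight_kg * 1000) %>%  # new column computed from an old one
  filter(weight_kg > 50) %>%               # keep animals heavier than 50 kg
  select(name, weight_g)                   # keep only these two columns

# purrr: apply a function to each element and return a numeric vector.
squares <- map_dbl(1:4, function(n) n^2)

heavy
squares  # 1 4 9 16
```

Loading dplyr and purrr individually, rather than the whole tidyverse, keeps the example light; library(tidyverse) would attach both.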

5. Importing Data