2.1 Loading Data

R has many different functions for loading data, depending on the file format. Since data often come in comma-separated values (CSV) format, we’ll use that as an example.

We will load some data on the weight, length, and age of great white sharks (Carcharodon carcharias). It is in a file called pups.csv. You can download the file from the labs website on GitHub: pups.csv.

There are two basic options for loading a CSV file like this.

2.1.1 The Import Dataset RStudio Button

This is the “point-and-click” approach, and it is often the easiest. Click the Import Dataset button in RStudio’s Environment tab in the upper right pane. In this case we have a CSV file, so choose that option. A new window will appear.

Copy and paste the link above into the new window and click Update. The contents of the file will appear in the preview pane. In the lower left, you can choose what to call the variable where the data will be stored, and in the lower right, you can see a preview of the code that R will execute. Click Import and the pups variable should appear under Data in the Environment pane.

This is easy, but if you want to save a series of commands so that you can run them again later (which is essential for reproducible research), you should use the next option.

2.1.2 The read_csv() Function (in the readr package)

You can either type the code below directly into the R console (lower left pane of the RStudio screen) or save it in a script that can be “sourced” (executed) whenever you like. Scripts are written, saved, and sourced in the top left pane of the RStudio screen.
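As a minimal sketch of the script workflow, the commands below are written to a plain-text file and then re-run with base R’s source(). The script name load_data.R and the variable my_data are hypothetical. So that the sketch runs without an internet connection, it loads a temporary CSV copy of R’s built-in mtcars data with base R’s read.csv() rather than pups.csv.

```r
# For illustration only: save the built-in mtcars data as a CSV file
# so the script below has something local to load.
csv_path <- file.path(tempdir(), "example.csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Write a two-line script (hypothetical name "load_data.R"). In practice you
# would type these commands into the RStudio script editor and save the file.
# deparse() turns the path into a valid quoted R string on any platform.
script_path <- file.path(tempdir(), "load_data.R")
writeLines(c(
  sprintf("my_data <- read.csv(%s)", deparse(csv_path)),
  "print(dim(my_data))"
), script_path)

# Sourcing the script runs every command in it, top to bottom.
source(script_path)
```

Calling source(script_path) again re-executes exactly the same commands, which is what makes a saved script reproducible in a way that point-and-click steps are not.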

For this method you need to provide the location of the data and the name you want the object to have once it is loaded into R. For this example, our copy of the data is online at https://uw-statistics.github.io/Stat311Tutorial/data/pups.csv, and we wish to store the data in the variable pups. We would execute the following code:

library(readr)
pups <- read_csv("https://uw-statistics.github.io/Stat311Tutorial/data/pups.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   weight = col_double(),
##   length = col_double(),
##   age = col_double(),
##   clutch = col_double()
## )

Notice that this is very similar to what we see in the Code Preview pane in the Import Dataset method above.
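The column specification that read_csv() prints is its guess at each column’s type. A sketch of declaring the types yourself with the col_types argument follows; when the types are supplied, read_csv() does not need to guess and does not print that message. For pups.csv the specification would list the five columns shown above, each as col_double(). To keep the example runnable offline, it instead reads a temporary CSV copy of R’s built-in mtcars data (all numeric columns); the name cars_data is hypothetical.

```r
library(readr)

# Stand-in data file (illustration only): a CSV copy of the built-in mtcars data.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Declare the column types instead of letting read_csv() guess them.
# .default applies col_double() to every column. For pups.csv one could write
# cols(id = col_double(), weight = col_double(), length = col_double(),
#      age = col_double(), clutch = col_double()).
cars_data <- read_csv(csv_path, col_types = cols(.default = col_double()))

# spec() returns the column specification that was actually used.
spec(cars_data)
```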

Finally, if we only want to use read_csv() once or twice, we do not have to load the entire readr package with the library() command; instead, we can tell R that read_csv() is in readr using the :: operator. And if we have a copy of pups.csv saved in our “working directory” (see the next section for how R defines the current working directory), we can use the file name "pups.csv" instead of the URL, like so:

pups <- readr::read_csv("pups.csv")
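A relative file name such as "pups.csv" is resolved against the working directory. The sketch below, which uses base R apart from the readr::read_csv() call above, shows the equivalent absolute path and checks that the file is present before reading it.

```r
# "pups.csv" with no folder in front is looked up in the working directory,
# so these two paths point to the same place.
relative_path <- "pups.csv"
absolute_path <- file.path(getwd(), "pups.csv")
print(absolute_path)

# Check that the file is really there before trying to read it.
if (file.exists(relative_path)) {
  pups <- readr::read_csv(relative_path)
} else {
  message("pups.csv was not found in ", getwd())
}
```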

2.1.3 The Working Directory

The working directory is the location on your computer where R looks for files when you give only a file name rather than a full path. You can run the command getwd() to find out what the current working directory is. In RStudio, you can also see the current working directory directly under the Console tab in the Console pane.
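A short sketch of inspecting the working directory from code, using only base R; the variable name current_wd is just for illustration.

```r
# The current working directory, as an absolute path.
current_wd <- getwd()
print(current_wd)

# Files in this directory can be read by name alone, without a full path.
print(list.files(current_wd))
```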

To set the working directory, run setwd() with the path to the folder you want to use. In RStudio, you can also set the working directory by navigating to a folder in the Files tab, clicking More, and selecting Set As Working Directory.
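setwd() also invisibly returns the previous working directory, which makes it easy to switch back. The sketch below uses R’s temporary directory as a stand-in target; in practice you would pass the path of your own project folder (for example, a hypothetical setwd("~/Stat311")).

```r
# Switch to R's temporary directory (illustration only) and keep the old path.
old_wd <- setwd(tempdir())
print(getwd())

# ... work with files in the new directory here ...

# Restore the original working directory.
setwd(old_wd)
print(getwd())
```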