2.4 Plotting Bivariate Data

2.4.1 Box Plots

The code to make box plots for two variables is straightforward, and it introduces an important piece of R syntax: the formula response ~ covariate. The ~ operator tells R that whatever is to its left is the response (dependent variable) and whatever is to its right is the covariate (independent variable).
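
To make the formula syntax concrete, here is a minimal sketch that builds a formula and inspects its two sides. The variable names weight and clutch are used only as symbols here; no dataset needs to be loaded, because R does not look the variables up until the formula is handed to a function such as boxplot().

```r
# A formula is an ordinary R object; nothing is evaluated when it is created.
f <- weight ~ clutch

class(f)      # "formula"
all.vars(f)   # names used in the formula: "weight" "clutch"
f[[2]]        # left-hand side (the response): weight
f[[3]]        # right-hand side (the covariate): clutch
```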

Let us load the pups.csv dataset we used before, and then view the first few lines of the dataset.

pups <- readr::read_csv("https://uw-statistics.github.io/Stat311Tutorial/data/pups.csv")
head(pups)
## # A tibble: 6 x 5
##      id weight length   age clutch
##   <dbl>  <dbl>  <dbl> <dbl>  <dbl>
## 1     1   20.3     97    14      1
## 2     2   28      104    16      2
## 3     3   31.5    106    16      3
## 4     4   31.5    108    17      1
## 5     5   32.5    109    18      2
## 6     6   33.5    110    18      3

The clutch variable is a categorical variable noting the birth group, or clutch, of the pup. Bivariate box plots require that the independent variable be categorical, so let us make a plot of weight versus clutch.

boxplot(pups$weight ~ pups$clutch,
        main = "Default Plot", 
        xlab = "Birth group/Clutch", 
        ylab = "Weight")

Note that there is a large range for the weight variable, due in part to the outlier in the first clutch, but that otherwise the weight distributions look fairly similar across birth groups.
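
As an alternative way of writing the same call, the sketch below passes the data frame through the data argument so the formula can name columns directly. So that it runs on its own, it uses a small hand-typed data frame (the hypothetical name pups_head) holding only the six rows printed by head(pups) above, so the resulting plot is not the full-data plot.

```r
# The six rows shown by head(pups), typed in by hand for illustration only.
pups_head <- data.frame(
  weight = c(20.3, 28, 31.5, 31.5, 32.5, 33.5),
  clutch = c(1, 2, 3, 1, 2, 3)
)

# data = lets the formula refer to columns by name, without a pups$ prefix.
# clutch is stored as a number, but boxplot() treats the right-hand side of
# the formula as a grouping variable and draws one box per distinct value.
bp <- boxplot(weight ~ clutch, data = pups_head,
              xlab = "Birth group/Clutch",
              ylab = "Weight")

bp$names  # group labels, one per box
bp$n      # number of observations in each box
```

With the full dataset, the same pattern would be boxplot(weight ~ clutch, data = pups, ...).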

2.4.2 Scatter Plots

What do we do if we want to plot the association of two continuous variables? This is a very important task in statistics, so it uses the aptly named plot() function.

In the pups dataset, let us consider weight the response and age the covariate. To save on typing, let’s assign the data to shorter names:

weight <- pups$weight
age <- pups$age

Now let’s make the bivariate scatter plot:

plot(weight ~ age,
     main = "Pup Weight vs. Age", 
     xlab = "Age", 
     ylab = "Weight",
     pch = 16, col = rgb(0, 0, 1, 0.6))
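
The col argument in the call above is built with rgb() using four values. As a small sketch of what that call produces, the code below stores the color in a variable (the name semi_blue is only illustrative) and compares it with the fully opaque version.

```r
# Arguments are red, green, blue, and alpha (opacity), each on a 0 to 1 scale.
semi_blue <- rgb(0, 0, 1, 0.6)
semi_blue                     # a hex string of the form "#RRGGBBAA"

# Without the alpha value the result is plain, fully opaque blue.
rgb(0, 0, 1) == "#0000FF"     # TRUE
```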

It is important to make the plot explain itself to its audience as much as possible. You should always include a main title, as well as a clear and informative x-axis label (xlab) and y-axis label (ylab). Points may be colored with col. R accepts many named colors (such as "blue"; search for “R colors” to find more), as well as user-defined colors built with the rgb() function. In the call above, rgb(0, 0, 1, 0.6) gives a blue at 60% opacity, which makes overlapping points easier to see. The shape of the points themselves can be defined using pch (16, used above, is a filled circle), with some of the most popular listed below.