2 Working with Data

Statistics is based on the study of data. Or, as Wikipedia puts it:

“Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.”

Without data, all we can do is make guesses. In this chapter, we will learn how to work with data in R: how to load data into R so we can work with it, and how to explore our data by calculating summaries and plotting simple figures.

We will also explore tools for analyzing associations between variables. We start with the association between two variables, or bivariate association. We will first look at plots for two variables, then move on to numerical summaries.

Learning Objectives

  • Learn to load data into R.
  • Learn to explore a dataset using numerical summaries.
  • Learn to explore a dataset using simple plots.
  • Learn how to plot bivariate data.
  • Learn how to summarize a bivariate relationship numerically.
  • Learn how to fit a linear model, or equivalently, perform linear regression.

Useful Functions

  • Use read_csv() (from the readr package) or the Import Dataset button in RStudio to load data into R.
  • Use getwd() and setwd() to get and set R’s working directory.
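
As a minimal sketch of loading data, the example below writes a tiny CSV file to a temporary folder and reads it back. The file name heights.csv and its contents are made up for illustration. Base R's read.csv() is used here so the example runs without extra packages; assuming readr is installed, read_csv() would be called the same way.

```r
# Create a small CSV file in a temporary folder (hypothetical example data).
csv_dir  <- tempdir()
csv_path <- file.path(csv_dir, "heights.csv")
writeLines(c("name,height_cm", "Ann,162", "Bob,175", "Cy,181"), csv_path)

# Remember the current working directory, then switch to the temporary folder.
old_wd <- getwd()
setwd(csv_dir)

# With the working directory set, a bare file name is enough.
# read.csv() is base R; readr::read_csv("heights.csv") is the readr counterpart.
heights <- read.csv("heights.csv")

# Restore the original working directory.
setwd(old_wd)

print(heights)
```

Setting the working directory is optional: passing the full path, as in read.csv(csv_path), loads the same file without calling setwd().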

  • Use head() and tail() to examine the beginning and end of a dataset.
  • Use names() to see all the variable names in a dataset.
  • Use the $ operator to access a particular column of a dataset.
  • Use sort() to sort data.
  • Use length(), mean(), median(), range(), var(), sd(), and table() to calculate numerical summaries.
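
The functions above can be tried on any data frame. The sketch below uses mtcars, a dataset that ships with R, purely as a convenient stand-in; the choice of the mpg and cyl columns is an assumption for illustration.

```r
# mtcars is a data frame that ships with R (32 cars, 11 variables).
print(head(mtcars))      # first six rows
print(tail(mtcars, 3))   # last three rows
print(names(mtcars))     # all variable names

# The $ operator pulls out one column as a vector.
mpg <- mtcars$mpg

sorted_mpg <- sort(mpg)  # values from smallest to largest
n          <- length(mpg)

# Numerical summaries of a quantitative variable.
print(mean(mpg))
print(median(mpg))
print(range(mpg))        # minimum and maximum
print(var(mpg))          # sample variance
print(sd(mpg))           # sample standard deviation

# table() counts how often each value occurs; useful for categorical data.
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```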

  • Use barplot() to make a bar plot.
  • Use hist() to make a histogram.
  • Use boxplot() to make a box plot.
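
A short sketch of the three plotting functions, again using the built-in mtcars data as an assumed example. Axis labels and titles are illustrative choices, not requirements.

```r
# Bar plot: heights of the bars come from a table of counts.
cyl_counts <- table(mtcars$cyl)
bar_mids <- barplot(cyl_counts,
                    xlab = "Number of cylinders",
                    ylab = "Number of cars")

# Histogram: distribution of a quantitative variable.
h <- hist(mtcars$mpg,
          xlab = "Miles per gallon",
          main = "Fuel economy of 32 cars")

# Box plot: median, quartiles, and potential outliers in one figure.
bp <- boxplot(mtcars$mpg, ylab = "Miles per gallon")
```

Each function draws its plot and also returns a value (bar positions, bin counts, box statistics) that can be stored and inspected, as done here.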

  • Use the ~ operator to express the relationship between a dependent variable and independent variable(s).
  • Use cov() to calculate covariance.
  • Use cor() to calculate correlation.
  • Use lm() to fit a linear model, or equivalently, perform linear regression.
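
To make these four tools concrete, here is a minimal sketch that relates fuel economy (mpg) to car weight (wt) in the built-in mtcars data. Treating mpg as the dependent variable is an assumption made for the example.

```r
# Covariance and correlation between two quantitative variables.
cov_wt_mpg <- cov(mtcars$wt, mtcars$mpg)
cor_wt_mpg <- cor(mtcars$wt, mtcars$mpg)

# The formula mpg ~ wt reads "mpg as a function of wt":
# dependent variable on the left, independent variable on the right.
fit <- lm(mpg ~ wt, data = mtcars)

print(cov_wt_mpg)
print(cor_wt_mpg)
print(coef(fit))   # intercept and slope of the fitted line

# For a single predictor, the fitted slope equals cov(x, y) / var(x).
slope_by_hand <- cov_wt_mpg / var(mtcars$wt)
print(slope_by_hand)
```

The last two lines show how the numerical summaries connect: the regression slope is the covariance rescaled by the variance of the independent variable.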

  • Use plot() to make a scatter plot.
  • Use abline() to add a straight line to a plot.
  • Use boxplot() with the ~ operator to make side-by-side box plots.
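
The plotting functions for bivariate data can be sketched as follows, once more assuming the built-in mtcars data. The variables chosen (mpg, wt, cyl) are only illustrative.

```r
# Scatter plot of two quantitative variables, written with a formula.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# Fit a linear model and add its fitted line to the existing scatter plot.
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit)

# Side-by-side box plots: a quantitative variable split by a grouping variable.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Number of cylinders",
              ylab = "Miles per gallon")
```

Note that abline() draws on the plot that is currently open, so it must be called after plot().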