2 Working with Data
Statistics is based on the study of data. Or, as Wikipedia puts it:
“Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.”
Without data, all we can do is make guesses. In this chapter, we will learn how to work with data in R: how to load data into R so we can work with it, and how to explore our data by calculating summaries and plotting simple figures.
We will also explore tools for analyzing associations between variables. We start with the association between two variables, or bivariate association. We will first look at visual tools and plots for two variables, then move on to numerical tools.
Learning Objectives
- Learn to load data into R.
- Learn to explore a dataset using numerical summaries.
- Learn to explore a dataset using simple plots.
- Learn how to plot bivariate data.
- Learn how to summarize a bivariate relationship numerically.
- Learn how to fit a linear model, or equivalently, perform linear regression.
Useful Functions
- Use `read_csv()` or the Import Dataset button to load data into R.
- Use `getwd()` and `setwd()` to get and set R’s working directory.
- Use `head()` and `tail()` to examine the beginning and end of a dataset.
- Use `names()` to see all the variable names in a dataset.
- Use the `$` operator to access a particular column of a dataset.
- Use `sort()` to sort data.
- Use `length()`, `mean()`, `median()`, `range()`, `var()`, `sd()`, and `table()` to calculate numerical summaries.
- Use `barplot()` to make a bar plot.
- Use `hist()` to make a histogram.
- Use `boxplot()` to make a box plot.
- Use the `~` operator to express the relationship between a dependent variable and independent variable(s).
- Use `cov()` to calculate covariance.
- Use `cor()` to calculate correlation.
- Use `lm()` to fit a linear model, or equivalently, perform linear regression.
- Use `plot()` to make a scatter plot.
- Use `abline()` to add a straight line to a plot.
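As a rough preview of how these functions fit together, here is a minimal sketch. It rests on a few assumptions that are not part of the chapter: it uses R's built-in `mtcars` dataset, the names `csv_path`, `cars`, `mpg`, and `fit` are made up for illustration, and it loads the file with base R's `read.csv()` so that it runs without extra packages (`read_csv()` comes from the readr package and could be used in its place if readr is installed).

```r
# Assumption: mtcars (built into R) stands in for a real data file.
# Write it to a temporary CSV so there is something to load.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Load the data. read.csv() is base R; readr::read_csv(csv_path) is the
# readr analogue of this call.
cars <- read.csv(csv_path)

# Where R looks for files given by a relative path.
getwd()

# First look at the dataset.
head(cars)      # first six rows
tail(cars, 3)   # last three rows
names(cars)     # variable names

# Pull out one column with $ and summarize it.
mpg <- cars$mpg
sort(mpg)
length(mpg)
mean(mpg)
median(mpg)
range(mpg)
var(mpg)
sd(mpg)
table(cars$cyl)  # counts of cars per number of cylinders

# Simple plots of single variables.
barplot(table(cars$cyl), xlab = "Cylinders", ylab = "Count")
hist(mpg)
boxplot(mpg)

# Bivariate tools: the formula mpg ~ wt reads "mpg as a function of wt".
cov(cars$wt, cars$mpg)
cor(cars$wt, cars$mpg)

fit <- lm(mpg ~ wt, data = cars)  # linear regression of mpg on weight
plot(mpg ~ wt, data = cars)       # scatter plot
abline(fit)                       # add the fitted line to the scatter plot
boxplot(mpg ~ cyl, data = cars)   # one box per cylinder count
```

When run as a script rather than interactively, the plots are written to a graphics file instead of appearing in a window. The formula interface (`mpg ~ wt`) is used for both `plot()` and `lm()` here so that the scatter plot and the fitted line describe the same relationship.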