---
title: "Getting Started with R"
---
# Getting Started with R {#Rgettingstarted}
```{r}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
Let us get R installed and the IDE set up. The whole process takes about ten minutes and three steps. You will install R itself (the language), then an IDE that gives you a sane way to work with it. R is the language; the IDE is the workspace where you write, run, and debug code. You need the first; the second is what you actually spend your time in.
This chapter walks through RStudio Desktop because it is the long-time standard and the gentlest place to start. Positron — Posit's newer cross-language IDE — is mentioned at the end. Both work for everything in this book.
1. **Download R** from [https://cloud.r-project.org/](https://cloud.r-project.org/) -- pick the version that matches your operating system (Windows, Mac, or Linux) and select base R. Mac users: double-check that you grab the version compatible with your specific macOS version. It'll only take a minute.
2. **Install R.** Accept the installer defaults. (Since R 4.2, the Windows build is 64-bit only, so there is no 32-bit versus 64-bit choice to make; if an older installer asks, pick 64-bit.) Once it's installed, go ahead and open it just to make sure a console window pops up. See it? Great. Now close it. Seriously -- you can close it and never open this plain R console again. We've got something much better for you.
3. **Download RStudio Desktop** from [Posit's website](https://posit.co/download/rstudio-desktop/). (Posit is the company that makes RStudio — they rebranded a few years back as the company expanded into Python tooling.) Install the version for your operating system, then open it. From now on, RStudio is the workspace you live in; it connects to R automatically in the background, so the bare R console you opened in step 2 stays closed.
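Once RStudio is open, a quick check in the Console confirms that it found your R installation. The following is a minimal sketch using only base R; the object names `r_version` and `is_recent` are made up for illustration, and the `4.0.0` cutoff is just an example threshold, not a requirement stated by this book.

```{r}
# Ask R which version is running
r_version <- R.version.string
r_version

# Compare the running version against an example minimum (assumed cutoff)
is_recent <- getRversion() >= "4.0.0"
is_recent
```

If both lines print without an error, R and RStudio are talking to each other.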
## The RStudio Interface
Welcome to your cockpit.
When you open RStudio, you'll see the screen divided into four panels (or "panes"). It might look like a lot at first, but think of it as your digital workspace -- like having your desk, your calculator, your filing cabinet, and your whiteboard all visible at the same time. Here's what each panel does:
- **Source (top-left)**: This is your notepad. It's where you write and edit your R scripts and reports. Think of it like drafting an email -- you write your code here, review it, and then send it off to be executed when you're ready.
- **Console (bottom-left)**: This is where the magic happens. When you run code, R processes it here and shows you the results. You can also type commands directly into the console for quick, one-off calculations -- like using it as the world's most powerful calculator.
- **Environment/History (top-right)**: This panel shows you everything R is currently holding in memory -- your datasets, variables, and other objects. It's like a table of contents for your current work session. The History tab keeps a log of every command you've run, which is great for those "wait, what did I just do?" moments.
- **Files/Plots/Packages/Help/Viewer (bottom-right)**: The utility panel. Browse files on your computer, view plots you create, manage installed packages, and read help documentation. When you produce a chart, this is where it appears first.
## Organizing Projects
A few quick housekeeping tips that will save you from future headaches. Trust me, "Future You" will be grateful.
1. **Keep each project in its own folder.** Every analysis project should live in its own dedicated folder on your computer. Your data files, scripts, and any output (charts, reports) all go in there. It's like having a separate binder for each class -- simple, but it keeps you from losing things.
2. **Watch your naming conventions.** Avoid spaces and special characters in your folder names, file names, and variable names. Computers can be picky about these things. Instead of `My Sales Report (Final v2).csv`, go with `my_sales_report_v2.csv`. For variable names, something like `Customer_Age` or `CustomerAge` works great. And please, use names that actually mean something -- `x1` tells you nothing six months later, but `monthly_revenue` tells you everything.
3. **Always check your working directory.** After opening RStudio, make sure R knows where your project files are. The working directory is basically R's answer to the question "where should I look for files?" If it's pointed at the wrong folder, R won't be able to find your data. It's like telling your GPS the wrong starting address.
4. **Use RStudio Projects (seriously, do this).** This is the pro move. Go to *File -> New Project* to create one. An RStudio Project (`.Rproj` file) automatically sets your working directory, keeps all your files organized, and plays nicely with version control if you ever get into that. It's the difference between tossing papers on your desk and having a proper filing system.
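The folder layout described in the tips above can also be created from R itself. This is a sketch under assumptions: the project name `q3_analysis` and the subfolders `data`, `scripts`, and `output` are hypothetical, and everything is built inside a temporary directory so that running it cannot clutter your machine.

```{r}
# Hypothetical project name and subfolders; adjust to taste
project_dir <- file.path(tempdir(), "q3_analysis")
subfolders  <- c("data", "scripts", "output")

# Create the project folder and each subfolder inside it
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_dir)
```

In real work you would replace `tempdir()` with the place where you keep your projects, or let *File -> New Project* create the top-level folder for you.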
## Installing Packages from CRAN
Packages are like apps for R -- each one adds new capabilities. You install them from CRAN (the Comprehensive R Archive Network), which is basically R's app store. Here's how:
```{r}
#| eval: false
install.packages("pkgname")
install.packages(c("pkgname1","pkgname2")) # For two or more packages
```
You only need to install a package once (just like you only download an app once). After that, you just load it each time you want to use it.
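Because installing happens once and loading happens every session, it can be handy to check which packages are missing before calling `install.packages()`. The sketch below uses base R's `requireNamespace()`; the helper name `missing_packages()` is hypothetical, and the second package name is deliberately fake so the example has something to report.

```{r}
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R; the second name is deliberately fake
to_install <- missing_packages(c("stats", "notARealPackage12345"))
to_install

# In real use you would then run:
# install.packages(to_install)
```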
### Installing the tidyverse
The tidyverse is a collection of packages that we'll use *constantly* throughout this book. It's like a starter kit for modern data analysis. One command installs the whole bundle:
```{r}
#| eval: false
install.packages("tidyverse")
```
This single command installs ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and lubridate, among others. It's like buying the combo meal instead of ordering everything separately. While you're at it, grab a few other handy packages we'll use:
```{r}
#| eval: false
install.packages(c("readxl", "haven", "knitr", "rmarkdown", "kableExtra"))
```
## Installing a Package from GitHub
Sometimes a developer has a brand-new package (or a cutting-edge update) that hasn't made it to CRAN yet. In that case, you can install it directly from GitHub -- think of it as downloading a beta version of an app directly from the developer:
```{r}
#| eval: false
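# Install devtools only if it is missing; requireNamespace() checks without attaching the package
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")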
devtools::install_github("githubaccountname/packagerepository")
```
You probably won't need this often, but it's good to know it's an option.
## Loading Packages
Here's an important distinction: **installing** a package downloads it to your computer, but **loading** it makes it available for your current session. It's like the difference between buying a book and actually opening it.
Every time you start a new R session and want to use a package, you need to load it:
```{r}
#| eval: false
library(pkgname)
```
Loading the tidyverse loads all the core packages at once:
```{r}
library(tidyverse)
```
You will see a startup message listing the attached packages and any function name conflicts, for example `dplyr::filter()` masking `stats::filter()`. This is normal -- don't panic. It's just R being transparent about what it loaded.
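When two packages export a function with the same name, writing `package::function()` tells R exactly which one you mean, and it also lets you call a function without attaching its package first. Here is a small sketch that relies only on packages shipped with R; the object names are illustrative.

```{r}
# Call a function without attaching its package first
first_three <- utils::head(letters, 3)
first_three

# Be explicit about which filter() you mean: this is the moving-average
# filter from stats, not the row filter from dplyr
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg

# See which packages are currently attached
search()
```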
## Setting Your Working Directory
Your working directory is where R looks for files and saves output. Think of it as telling R: "everything I am working on today is in *this* folder."
* Check your current working directory: `getwd()`
* Set a new one: `setwd("path/to/directory")`
* In RStudio, the menu way: *Session → Set Working Directory*.
* The right way: use RStudio Projects (covered above). When you open a project, the working directory is set automatically and your code stays portable across machines.
::: {.callout-warning}
## AI Pitfall: AI loves `setwd()` and `setwd()` will sink you
Ask an AI assistant to "set my working directory to my data folder" and you will reliably get a line like `setwd("C:/Users/yourname/Documents/projects/myproject")`. It runs. It works on *your* machine, today. It also guarantees that the moment anyone else tries to run your code — a coworker, a reviewer, future-you on a new laptop — the script breaks.
Three habits to internalize from day one:
1. **Never paste hardcoded absolute paths into scripts.** Use RStudio Projects so the working directory is set automatically when you open the `.Rproj` file.
2. **Inside a project, use relative paths**: `read_csv("data/sales.csv")`, not `read_csv("C:/Users/me/projects/q3-analysis/data/sales.csv")`.
3. **For anything more complex, the [`here` package](https://here.r-lib.org/)** gives you `here("data", "sales.csv")` which always resolves correctly relative to your project root, regardless of where in the project tree your script lives.
AI assistants will keep generating `setwd()` because that is what most R code on the internet looks like. The fact that they generate it does not mean you should ship it.
:::
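To see why relative paths travel well, the sketch below builds one with base R's `file.path()` and shows what it resolves to on the current machine. The path `data/sales.csv` is the example from the callout; the file does not need to exist for the code to run.

```{r}
# A relative path: the same text works on every machine
rel_path <- file.path("data", "sales.csv")
rel_path

# What R resolves it to here: working directory + relative path
abs_path <- file.path(getwd(), rel_path)
abs_path

# Check before reading, so a wrong working directory fails with a clear message
if (!file.exists(rel_path)) {
  message("Not found: ", rel_path, " (working directory is ", getwd(), ")")
}
```

Only `rel_path` belongs in a script you share. `abs_path` is different on every computer, which is exactly the problem with hardcoding it.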
## Reading Data from Different Sources
This is the part you've been waiting for -- loading actual data into R. Whether your boss emailed you a CSV, your professor shared an Excel file, or you downloaded data from a survey platform, R can handle it.
Load a saved R data file:
```{r}
#| eval: false
load("filename.Rda")
```
Read a CSV file (the format you'll probably encounter most often) using the **readr** package (part of the tidyverse). See Chapter \@ref(dataimport) for the full deep dive.
```{r}
#| eval: false
library(readr)
dataname <- read_csv("datafilename.csv") # returns a tibble
```
That `<-` arrow means "take whatever's on the right and store it in the name on the left." So this line says: "Read this CSV file and call it `dataname` so I can work with it." The `read_csv()` function is fast, returns a tibble, and reports the column types it guessed so you can spot a misread column right away.
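To try this without hunting for a file, the sketch below writes a three-row CSV to a temporary location and reads it back with `read_csv()`. The file contents and the column names `region` and `revenue` are invented for illustration. It assumes readr 2.0 or later, where `show_col_types = FALSE` silences the column-type message.

```{r}
library(readr)

# Create a tiny CSV in a temporary location (made-up data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,revenue", "North,1200", "South,950", "West,1430"), csv_path)

# Read it back; the result is a tibble
sales <- read_csv(csv_path, show_col_types = FALSE)
sales

# Column types are guessed from the contents
class(sales$region)
class(sales$revenue)
```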
### Excel Files
Excel files arrive in inboxes constantly — the `readxl` package pulls them directly into R, no intermediate CSV step required:
```{r}
#| eval: false
library(readxl)
read_excel("excelfilename.xls") # reads xlsx extensions also
read_excel("excelfilename.xlsx", sheet = "nameofsheet")
read_excel("excelfilename.xlsx", sheet = 3) # for third sheet
read_excel("my-spreadsheet.xls", na = "NA") # specify NA representation
```
You can specify which sheet to read, which is a lifesaver when someone sends you a workbook with 17 tabs and the data you need is on sheet 3.
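When you do not know what the tabs are called, listing them first saves guesswork. This sketch uses the sample workbook `datasets.xlsx` that ships with readxl, located through `readxl_example()`; the object names are illustrative.

```{r}
library(readxl)

# Path to a sample workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read
sheets <- excel_sheets(xlsx_path)
sheets

# Read one sheet by name and check its dimensions
mtcars_sheet <- read_excel(xlsx_path, sheet = "mtcars")
dim(mtcars_sheet)
```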
### SPSS/SAS/Stata Files
If you're working with data from other statistical software (maybe from a research methods class or a dataset your professor shared), the **haven** package has you covered:
```{r}
#| eval: false
library(haven)
read_sas("path/to/file") # SAS file
read_por("path/to/file") # SPSS por files
read_sav("path/to/file") # SPSS sav files
read_dta("path/to/file") # Stata
```
### Many Others
R can read almost any data format you throw at it. For a full list of options:
```{r}
#| eval: false
?read.table
# The readr package provides many other options: https://readr.tidyverse.org/
```
## Saving Data
You've done all this work -- don't forget to save it. Here's how to export your data back out of R.
```{r}
#| eval: false
save(dataframename, file = "filename.Rda") # To R data (preserves multiple objects)
library(readr)
write_csv(dataframename, file = "myfilename.csv") # to CSV, no row names by default
write_rds(dataframename, file = "myfilename.rds") # single R object, preserves types
```
Pro tip: If you need to send the data to someone who only knows Excel, save it as a CSV. Everyone can open a CSV. It's the universal language of data files.
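The choice of format matters because a CSV stores plain text, while an R-native file keeps column types intact. The round trip below is a sketch with made-up data. To run without extra packages it uses base R's `saveRDS()`/`readRDS()` and `write.csv()`/`read.csv()` as stand-ins for the readr functions shown above.

```{r}
# Made-up data with a Date column
members <- data.frame(
  id     = 1:3,
  joined = as.Date(c("2024-01-05", "2024-02-10", "2024-03-15"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# Save the same data in both formats
saveRDS(members, rds_path)
write.csv(members, csv_path, row.names = FALSE)

# Read both back and compare the type of the date column
from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

class(from_rds$joined)  # still a Date
class(from_csv$joined)  # no longer a Date; it has to be converted again
```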
## Creating a Data Frame
A data frame is R's version of a spreadsheet -- rows and columns of data. If you can picture an Excel worksheet, you can picture a data frame. Let's create a simple one.
A **tibble** is the tidyverse version of a data frame -- it works the same way but has a few quality-of-life improvements, like nicer printing and not converting your text to weird factor categories behind your back.
```{r}
library(tibble)
page <- tibble(
Person = c("A", "B", "C", "D", "E"),
Age = c(15, 20, 25, 30, 35),
Height = c(60, 63, 75, 79, 56)
)
page
```
What happened here? We created three columns (Person, Age, Height), then stitched them together into a tibble called `page`. The `c()` function just means "combine these values into a vector." Easy.
You can also create small tibbles row-by-row using `tribble()`, which can be easier to read when you're typing in data manually:
```{r}
page_tribble <- tribble(
~Person, ~Age, ~Height,
"A", 15, 60,
"B", 20, 63,
"C", 25, 75,
"D", 30, 79,
"E", 35, 56
)
page_tribble
```
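Once a table exists, a handful of base functions report its shape and let you pull out a single column. The sketch below rebuilds the same five rows as a base `data.frame` so it runs without any packages; the name `page_df` is an assumption chosen to avoid clashing with the objects above. The same calls work on a tibble.

```{r}
page_df <- data.frame(
  Person = c("A", "B", "C", "D", "E"),
  Age    = c(15, 20, 25, 30, 35),
  Height = c(60, 63, 75, 79, 56)
)

nrow(page_df)        # number of rows
ncol(page_df)        # number of columns
names(page_df)       # column names
page_df$Age          # one column, returned as a vector
mean(page_df$Age)    # average age
```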
## R Basics: Objects and Assignment
Here's a fundamental concept: R stores everything in **objects**. An object is just a name that holds some data -- kind of like a labeled box. You put something in the box, slap a label on it, and then you can refer to it by name later.
You assign values to objects using `<-` (the preferred way) or `=`:
```{r}
x <- 10
y <- "hello"
z <- c(1, 2, 3, 4, 5)
```
Reading that first line out loud: "x *gets* 10." Now whenever you type `x`, R knows you mean 10. Simple as that.
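Objects can be used in calculations and overwritten later. The short sketch below illustrates both; the names `price`, `quantity`, `total`, and `total_updated` are hypothetical.

```{r}
price    <- 4.5
quantity <- 3

# Use objects in a calculation and store the result
total <- price * quantity
total

# Assigning again replaces the old value
quantity <- 10
total_updated <- price * quantity
total_updated

# `total` was computed earlier, so it does not change when `quantity` does
total
```

A stored result is a snapshot, not a live formula. That is one place where R behaves differently from a spreadsheet cell.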
### Common Data Types
R has a few basic types of data. Don't overthink this -- it's pretty intuitive:
```{r}
# Numeric
num <- 42.5
class(num)
# Character (string)
name <- "Statistics"
class(name)
# Logical (boolean)
flag <- TRUE
class(flag)
# Vector (collection of same type)
nums <- c(10, 20, 30, 40, 50)
people <- c("Alice", "Bob", "Charlie")  # named `people` so it does not shadow the built-in names() function
```
- **Numeric**: Numbers. Could be revenue figures, ages, stock prices -- anything with digits.
- **Character**: Text. Customer names, product descriptions, survey responses.
- **Logical**: TRUE or FALSE. Did the customer buy? Is the account active? Yes or no questions.
- **Vector**: An ordered collection of values that are all the same type. Like a single column from a spreadsheet.
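The "all the same type" rule has a practical consequence: if you mix types inside `c()`, R quietly converts everything to one common type. A minimal sketch, with illustrative object names:

```{r}
# Mixing numbers and text: everything becomes character
mixed <- c(1, "two", 3)
class(mixed)

# Mixing numbers and logicals: TRUE and FALSE become 1 and 0
num_logical <- c(10, TRUE, FALSE)
num_logical

# Convert text that looks like a number back into a number
as.numeric("42.5")
```

This is often the explanation when a column of numbers turns out to be text: one stray non-numeric value was enough to convert the whole column.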
### Useful Functions for Exploring Objects
When you're working with data and want to check what you're dealing with, these functions are your best friends:
```{r}
class(x) # What type of object?
length(x) # How many elements?
str(x) # Structure of the object
typeof(x) # Internal storage type
is.numeric(x) # Is it numeric?
```
Think of these as the "what am I looking at?" tools. You'll use them constantly, especially when something isn't working and you need to figure out why.
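The same tools work on whole datasets, with one surprise worth seeing early. This sketch uses `mtcars`, a dataset that ships with R.

```{r}
class(mtcars)       # what kind of object
dim(mtcars)         # rows and columns
length(mtcars)      # for a data frame, this counts columns, not rows
typeof(mtcars)      # internally, a data frame is a list of columns
str(mtcars[, 1:3])  # compact look at the first three columns
```

For the number of rows, use `nrow()` rather than `length()`.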
## Getting Help
You *will* get stuck. Everyone does. The good news is that R has an excellent built-in help system, and knowing how to use it will save you hours of frustration.
```{r}
#| eval: false
?mean # Help page for a specific function
help(mean) # Same as above
??regression # Search help pages for a keyword
help.search("regression") # Same as above
```
In RStudio, you can also press **F1** with your cursor on a function name to open its help page instantly. The help pages can look dense at first -- lots of technical details -- but the most useful part is usually the **Examples** section at the very bottom. Scroll down, look at the examples, and things usually click.
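A few more base R helpers complement the help pages. This is a sketch rather than a complete tour, and the object name `readers` is illustrative.

```{r}
# Show a function's arguments without opening the full help page
args(read.csv)

# Find functions whose names match a pattern
readers <- apropos("^read\\.")
readers

# Run the Examples section of a help page directly in the console
example("mean")
```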
And remember: Google is your friend. Searching "how to [thing you want to do] in R" will almost always lead you to a helpful answer on Stack Overflow or an R blog. You're not cheating by Googling -- even experienced R users do it every single day.