4 Getting Started with R

Code
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)

Let us get R installed and the IDE set up. The whole process takes about ten minutes and three steps. You will install R itself (the language), then an IDE that gives you a sane way to work with it. R is the language; the IDE is the workspace where you write, run, and debug code. You need the first; the second is what you actually spend your time in.

This chapter walks through RStudio Desktop because it is the long-time standard and the gentlest place to start. Positron — Posit’s newer cross-language IDE — is mentioned at the end. Both work for everything in this book.

  1. Download R from https://cloud.r-project.org/ – pick the version that matches your operating system (Windows, Mac, or Linux) and select base R. Mac users: double-check that you grab the version compatible with your specific macOS version. It’ll only take a minute.

  2. Install R. Recent R installers for Windows are 64-bit only; if an older installer asks, choose the 64-bit version. Once it’s installed, go ahead and open it just to make sure a console window pops up. See it? Great. Now close it. Seriously – you can close it and never open this plain R console again. We’ve got something much better for you.

  3. Download RStudio Desktop from Posit’s website. (Posit is the company that makes RStudio — they rebranded a few years back as the company expanded into Python tooling.) Install the version for your operating system, then open it. From now on, RStudio is the workspace you live in; it connects to R automatically in the background, so the bare R console you opened in step 2 stays closed.

4.1 The RStudio Interface

Welcome to your cockpit.

When you open RStudio, you’ll see the screen divided into four panels (or “panes”). It might look like a lot at first, but think of it as your digital workspace – like having your desk, your calculator, your filing cabinet, and your whiteboard all visible at the same time. Here’s what each panel does:

  • Source (top-left): This is your notepad. It’s where you write and edit your R scripts and reports. Think of it like drafting an email – you write your code here, review it, and then send it off to be executed when you’re ready.

  • Console (bottom-left): This is where the magic happens. When you run code, R processes it here and shows you the results. You can also type commands directly into the console for quick, one-off calculations – like using it as the world’s most powerful calculator.

  • Environment/History (top-right): This panel shows you everything R is currently holding in memory – your datasets, variables, and other objects. It’s like a table of contents for your current work session. The History tab keeps a log of every command you’ve run, which is great for those “wait, what did I just do?” moments.

  • Files/Plots/Packages/Help/Viewer (bottom-right): The utility panel. Browse files on your computer, view plots you create, manage installed packages, and read help documentation. When you produce a chart, this is where it appears first.

4.2 Organizing Projects

A few quick housekeeping tips that will save you from future headaches. Trust me, “Future You” will be grateful.

  1. Keep each project in its own folder. Every analysis project should live in its own dedicated folder on your computer. Your data files, scripts, and any output (charts, reports) all go in there. It’s like having a separate binder for each class – simple, but it keeps you from losing things.

  2. Watch your naming conventions. Avoid spaces and special characters in your folder names, file names, and variable names. Computers can be picky about these things. Instead of My Sales Report (Final v2).csv, go with my_sales_report_v2.csv. For variable names, something like Customer_Age or CustomerAge works great. And please, use names that actually mean something – x1 tells you nothing six months later, but monthly_revenue tells you everything.

  3. Always check your working directory. After opening RStudio, make sure R knows where your project files are. The working directory is basically R’s answer to the question “where should I look for files?” If it’s pointed at the wrong folder, R won’t be able to find your data. It’s like telling your GPS the wrong starting address.

  4. Use RStudio Projects (seriously, do this). This is the pro move. Go to File → New Project to create one. An RStudio Project (.Rproj file) automatically sets your working directory, keeps all your files organized, and plays nicely with version control if you ever get into that. It’s the difference between tossing papers on your desk and having a proper filing system.
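
As a rough sketch of tips 1 and 2, the snippet below builds a small project skeleton from R. The folder names (`q3_sales_analysis`, `data`, `scripts`, `output`) are illustrative assumptions rather than a required layout, and everything is created inside a temporary directory so nothing lands among your real files.

```r
# Build an example project skeleton inside a temporary directory.
# All folder names here are hypothetical; pick names that fit your project.
project_dir <- file.path(tempdir(), "q3_sales_analysis")  # no spaces or special characters

for (sub in c("data", "scripts", "output")) {
  # recursive = TRUE also creates the project folder itself on the first pass
  dir.create(file.path(project_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_dir)
```

In a real project you would create the folder once (or let File → New Project do it) and then keep data, scripts, and output in their own subfolders.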

4.3 Installing Packages from CRAN

Packages are like apps for R – each one adds new capabilities. You install them from CRAN (the Comprehensive R Archive Network), which is basically R’s app store. Here’s how:

Code
install.packages("pkgname")
install.packages(c("pkgname1", "pkgname2")) # For two or more packages

You only need to install a package once (just like you only download an app once). After that, you just load it each time you want to use it.
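
Because installation happens once and loading happens every session, a common pattern is to install only what is missing. The sketch below is one way to do that. The helper name `find_missing` is made up for this example, and the `install.packages()` call is left commented out so nothing is downloaded until you decide to run it.

```r
# Return the packages from `pkgs` that are not installed yet.
# `find_missing` is a hypothetical helper name, not part of any package.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" and "utils" ship with R, so this returns an empty character vector.
to_install <- find_missing(c("stats", "utils"))
to_install

# With your own list of packages, install only what is missing:
# to_install <- find_missing(c("tidyverse", "readxl"))
# if (length(to_install) > 0) install.packages(to_install)
```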

4.3.1 Installing the tidyverse

The tidyverse is a collection of packages that we’ll use constantly throughout this book. It’s like a starter kit for modern data analysis. One command installs the whole bundle:

Code
install.packages("tidyverse")

This single command installs ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and lubridate, among others. It’s like buying the combo meal instead of ordering everything separately. While you’re at it, grab a few other handy packages we’ll use:

Code
install.packages(c("readxl", "haven", "knitr", "rmarkdown", "kableExtra"))

4.4 Installing a Package from GitHub

Sometimes a developer has a brand-new package (or a cutting-edge update) that hasn’t made it to CRAN yet. In that case, you can install it directly from GitHub – think of it as downloading a beta version of an app directly from the developer:

Code
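# Install devtools only if it is missing; requireNamespace() checks without attaching it
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")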
devtools::install_github("githubaccountname/packagerepository")

You probably won’t need this often, but it’s good to know it’s an option.

4.5 Loading Packages

Here’s an important distinction: installing a package downloads it to your computer, but loading it makes it available for your current session. It’s like the difference between buying a book and actually opening it.

Every time you start a new R session and want to use a package, you need to load it:

Code
library(pkgname)

Loading the tidyverse loads all the core packages at once:
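
In practice that is a single `library()` call. The line below assumes the tidyverse was already installed as shown earlier in this chapter:

```r
# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, readr, and friends).
# Assumes install.packages("tidyverse") has already been run once on this machine.
library(tidyverse)
```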

You will see a startup message listing the attached packages and any function name conflicts, such as dplyr::filter() masking stats::filter(). This is normal – don’t panic. It’s just R being transparent about what it loaded.

4.6 Setting Your Working Directory

Your working directory is where R looks for files and saves output. Think of it as telling R: “everything I am working on today is in this folder.”

  • Check your current working directory: getwd()
  • Set a new one: setwd("path/to/directory")
  • In RStudio, the menu way: Session → Set Working Directory.
  • The right way: use RStudio Projects (covered above). When you open a project, the working directory is set automatically and your code stays portable across machines.

Warning (AI Pitfall): AI loves setwd() and setwd() will sink you

Ask an AI assistant to “set my working directory to my data folder” and you will reliably get a line like setwd("C:/Users/yourname/Documents/projects/myproject"). It runs. It works on your machine, today. It also guarantees that the moment anyone else tries to run your code — a coworker, a reviewer, future-you on a new laptop — the script breaks.

Three habits to internalize from day one:

  1. Never paste hardcoded absolute paths into scripts. Use RStudio Projects so the working directory is set automatically when you open the .Rproj file.
  2. Inside a project, use relative paths: read_csv("data/sales.csv"), not read_csv("C:/Users/me/projects/q3-analysis/data/sales.csv").
  3. For anything more complex, the here package gives you here("data", "sales.csv") which always resolves correctly relative to your project root, regardless of where in the project tree your script lives.

AI assistants will keep generating setwd() because that is what most R code on the internet looks like. The fact that they generate it does not mean you should ship it.
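
To make the second and third habits concrete, here is a small sketch using only base R. The `data/sales.csv` path is the same illustrative file name used above and is not expected to exist on your machine. `file.path()` is used here as a base-R way to build paths, and the `here()` call appears only as a comment because it depends on the here package being installed.

```r
# Build a relative path from its pieces; file.path() joins them with "/".
sales_path <- file.path("data", "sales.csv")
sales_path

# A relative path is resolved against the current working directory,
# which an RStudio Project sets to the project folder for you.
getwd()
file.exists(sales_path)  # TRUE only if data/sales.csv sits under the working directory

# The same idea with the here package (assumes here is installed):
# library(here)
# sales_path <- here("data", "sales.csv")
```

Notice that nothing in this snippet mentions a user name or a drive letter, which is what lets it run unchanged on someone else’s machine.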

4.7 Reading Data from Different Sources

This is the part you’ve been waiting for – loading actual data into R. Whether your boss emailed you a CSV, your professor shared an Excel file, or you downloaded data from a survey platform, R can handle it.

Load a saved R data file:

Code
load("filename.Rda")

Read a CSV file (the format you’ll probably encounter most often) using the readr package (part of the tidyverse). The data import chapter covers this in full detail.

Code
library(readr)
dataname <- read_csv("datafilename.csv") # returns a tibble

That <- arrow means “take whatever’s on the right and store it in the name on the left.” So this line says: “Read this CSV file and call it dataname so I can work with it.” The read_csv() function is fast, returns a tibble, and reports the column types it guessed so you can check them.
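
If you don’t have a CSV handy, the sketch below writes a tiny one to a temporary file and reads it back, so you can watch `read_csv()` work end to end. The column names and values are invented for illustration, and the code assumes readr is installed.

```r
library(readr)

# Write a three-line CSV to a temporary file (made-up data for this example).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,units", "North,12", "South,7"), csv_path)

# Read it back; show_col_types = FALSE hides the column-type message.
sales <- read_csv(csv_path, show_col_types = FALSE)

sales          # a tibble with 2 rows and 2 columns
sales$units    # the units column as a numeric vector
```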

4.7.1 Excel Files

Excel files arrive in inboxes constantly — the readxl package pulls them directly into R, no intermediate CSV step required:

Code
library(readxl)
read_excel("excelfilename.xls") # reads xlsx extensions also
read_excel("excelfilename.xlsx", sheet = "nameofsheet")
read_excel("excelfilename.xlsx", sheet = 3) # for third sheet
read_excel("my-spreadsheet.xls", na = "NA") # specify NA representation

You can specify which sheet to read, which is a lifesaver when someone sends you a workbook with 17 tabs and the data you need is on sheet 3.
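
To see sheet selection in action without hunting for a workbook, you can use an example file that ships with readxl. This sketch assumes readxl is installed; `readxl_example()` and `excel_sheets()` are readxl functions, `datasets.xlsx` is one of the package’s bundled sample files, and the object name `mtcars_sheet` is just for illustration.

```r
library(readxl)

# Path to a sample workbook bundled with the readxl package.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read.
excel_sheets(xlsx_path)

# Read one sheet by name and keep the result in an object.
mtcars_sheet <- read_excel(xlsx_path, sheet = "mtcars")
head(mtcars_sheet, 3)
```

Checking `excel_sheets()` first is a cheap way to avoid guessing sheet names or positions in an unfamiliar workbook.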

4.7.2 SPSS/SAS/Stata Files

If you’re working with data from other statistical software (maybe from a research methods class or a dataset your professor shared), the haven package has you covered:

Code
library(haven)
read_sas("path/to/file") # SAS file
read_por("path/to/file") # SPSS por files
read_sav("path/to/file") # SPSS sav files
read_dta("path/to/file") # Stata

4.7.3 Many Others

R can read almost any data format you throw at it. For a full list of options:

Code
?read.table
# The readr package provides many other options: https://readr.tidyverse.org/

4.8 Saving Data

You’ve done all this work – don’t forget to save it. Here’s how to export your data back out of R.

Code
save(dataframename, file = "filename.Rda") # To R data (preserves multiple objects)

library(readr)
write_csv(dataframename, file = "myfilename.csv") # to CSV, no row names by default
write_rds(dataframename, file = "myfilename.rds") # single R object, preserves types

Pro tip: If you need to send the data to someone who only knows Excel, save it as a CSV. Everyone can open a CSV. It’s the universal language of data files.
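
One practical difference between the formats is what survives the round trip. The sketch below assumes readr is installed and uses a made-up two-column data frame called `survey`. It saves the same data as CSV and as RDS in a temporary folder, then reads both back.

```r
library(readr)

# A small made-up data frame with a factor column.
survey <- data.frame(
  id = 1:3,
  rating = factor(c("low", "high", "low"), levels = c("low", "high"))
)

csv_path <- file.path(tempdir(), "survey.csv")
rds_path <- file.path(tempdir(), "survey.rds")

write_csv(survey, file = csv_path)
write_rds(survey, file = rds_path)

# CSV is plain text, so the factor comes back as ordinary character data.
from_csv <- read_csv(csv_path, show_col_types = FALSE)
class(from_csv$rating)

# RDS stores the R object itself, so the factor (and its levels) survive.
from_rds <- read_rds(rds_path)
class(from_rds$rating)
```

So CSV is the safe choice for sharing, while RDS is the safer choice when you plan to reopen the data in R yourself.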

4.9 Creating a Data Frame

A data frame is R’s version of a spreadsheet – rows and columns of data. If you can picture an Excel worksheet, you can picture a data frame. Let’s create a simple one.

A tibble is the tidyverse version of a data frame – it works the same way but has a few quality-of-life improvements, like nicer printing and never converting your text columns into factors (R’s categorical type) behind your back.

Code
library(tibble)
page <- tibble(
 Person = c("A", "B", "C", "D", "E"),
 Age = c(15, 20, 25, 30, 35),
 Height = c(60, 63, 75, 79, 56)
)
page
# A tibble: 5 × 3
  Person   Age Height
  <chr>  <dbl>  <dbl>
1 A         15     60
2 B         20     63
3 C         25     75
4 D         30     79
5 E         35     56

What happened here? We created three columns (Person, Age, Height), then stitched them together into a tibble called page. The c() function just means “combine these values into a vector.” Easy.
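
Once the tibble exists, you can pull out individual columns with `$` and hand them to other functions. Here is a small sketch, repeating the definition of `page` so it runs on its own (it assumes tibble is installed):

```r
library(tibble)

page <- tibble(
  Person = c("A", "B", "C", "D", "E"),
  Age = c(15, 20, 25, 30, 35),
  Height = c(60, 63, 75, 79, 56)
)

nrow(page)         # number of rows: 5
ncol(page)         # number of columns: 3
page$Age           # the Age column as a plain numeric vector
mean(page$Age)     # average age: 25
```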

You can also create small tibbles row-by-row using tribble(), which can be easier to read when you’re typing in data manually:

Code
page_tribble <- tribble(
 ~Person, ~Age, ~Height,
 "A", 15, 60,
 "B", 20, 63,
 "C", 25, 75,
 "D", 30, 79,
 "E", 35, 56
)
page_tribble
# A tibble: 5 × 3
  Person   Age Height
  <chr>  <dbl>  <dbl>
1 A         15     60
2 B         20     63
3 C         25     75
4 D         30     79
5 E         35     56

4.10 R Basics: Objects and Assignment

Here’s a fundamental concept: R stores everything in objects. An object is just a name that holds some data – kind of like a labeled box. You put something in the box, slap a label on it, and then you can refer to it by name later.

You assign values to objects using <- (the preferred way) or =:

Code
x <- 10
y <- "hello"
z <- c(1, 2, 3, 4, 5)

Reading that first line out loud: “x gets 10.” Now whenever you type x, R knows you mean 10. Simple as that.
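
Objects become useful once you reuse them. The short sketch below uses made-up object names (`price`, `quantity`, `total`) to show that an object can feed a calculation and can be overwritten with a new value.

```r
price <- 10             # "price gets 10"
quantity <- 3

total <- price * quantity
total                   # 30

# Assigning again replaces the old value; R does not warn you.
price <- 12
price * quantity        # 36

# `total` was computed earlier, so it still holds the old result.
total                   # 30
```

That last line trips up many beginners: changing `price` does not go back and update `total`. You have to rerun the line that computes it.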

4.10.1 Common Data Types

R has a few basic types of data. Don’t overthink this – it’s pretty intuitive:

Code
# Numeric
num <- 42.5
class(num)
[1] "numeric"
Code
# Character (string)
name <- "Statistics"
class(name)
[1] "character"
Code
# Logical (boolean)
flag <- TRUE
class(flag)
[1] "logical"
Code
# Vector (collection of same type)
nums <- c(10, 20, 30, 40, 50)
names <- c("Alice", "Bob", "Charlie")

  • Numeric: Numbers. Could be revenue figures, ages, stock prices – anything with digits.
  • Character: Text. Customer names, product descriptions, survey responses.
  • Logical: TRUE or FALSE. Did the customer buy? Is the account active? Yes or no questions.
  • Vector: An ordered collection of values that are all the same type. Like a single column from a spreadsheet.
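
Two consequences of the same-type rule are worth seeing once: arithmetic applies to every element of a vector at the same time, and mixing types inside `c()` silently converts everything to a single type. A brief base-R sketch (the object name `mixed` is just for illustration):

```r
nums <- c(10, 20, 30, 40, 50)

nums * 2        # every element is doubled: 20 40 60 80 100
mean(nums)      # 30
nums[2]         # second element: 20 (R counts from 1)

# Mixing numbers and text forces the whole vector to character.
mixed <- c(1, "two", 3)
class(mixed)    # "character"
```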

4.10.2 Useful Functions for Exploring Objects

When you’re working with data and want to check what you’re dealing with, these functions are your best friends:

Code
class(x) # What type of object?
[1] "numeric"
Code
length(x) # How many elements?
[1] 1
Code
str(x) # Structure of the object
 num 10
Code
typeof(x) # Internal storage type
[1] "double"
Code
is.numeric(x) # Is it numeric?
[1] TRUE

Think of these as the “what am I looking at?” tools. You’ll use them constantly, especially when something isn’t working and you need to figure out why.
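
A typical “why isn’t this working?” case is a number that was read in as text. The sketch below uses a made-up object, `revenue`, to show how these functions expose the problem and how `as.numeric()` resolves it.

```r
revenue <- "1500"        # looks like a number, but the quotes make it text

class(revenue)           # "character"
is.numeric(revenue)      # FALSE, which is why arithmetic on it would fail

# Convert the text to a real number, then check again.
revenue <- as.numeric(revenue)
class(revenue)           # "numeric"
revenue * 2              # 3000
```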

4.11 Getting Help

You will get stuck. Everyone does. The good news is that R has an excellent built-in help system, and knowing how to use it will save you hours of frustration.

Code
?mean # Help page for a specific function
help(mean) # Same as above
??regression # Search help pages for a keyword
help.search("regression") # Same as above

In RStudio, you can also press F1 with your cursor on a function name to open its help page instantly. The help pages can look dense at first – lots of technical details – but the most useful part is usually the Examples section at the very bottom. Scroll down, look at the examples, and things usually click.
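
A few more base-R helpers complement the help pages. This sketch shows `args()` for a quick look at a function’s arguments, `example()` for running the Examples section directly, and `apropos()` for finding functions by name. The search pattern is only an illustration.

```r
# Show the arguments a function accepts without opening the full help page.
args(sd)

# Run the code from the Examples section of a help page in the console.
example(mean)

# List functions on the search path whose names start with "read."
read_functions <- apropos("^read\\.")
read_functions
```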

And remember: Google is your friend. Searching “how to [thing you want to do] in R” will almost always lead you to a helpful answer on Stack Overflow or an R blog. You’re not cheating by Googling – even experienced R users do it every single day.