8 Data Wrangling with dplyr

9 Data Wrangling with dplyr

Code

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.5.2

Warning: package 'ggplot2' was built under R version 4.5.2

Warning: package 'tibble' was built under R version 4.5.2

Warning: package 'tidyr' was built under R version 4.5.2

Warning: package 'readr' was built under R version 4.5.2

Warning: package 'purrr' was built under R version 4.5.2

Warning: package 'dplyr' was built under R version 4.5.2

Warning: package 'stringr' was built under R version 4.5.2

Warning: package 'forcats' was built under R version 4.5.2

Warning: package 'lubridate' was built under R version 4.5.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(knitr)

Warning: package 'knitr' was built under R version 4.5.2

Code

library(DT)

Warning: package 'DT' was built under R version 4.5.2

Code

opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Data wrangling — filtering, selecting, sorting, mutating, grouping, summarizing — is where most analysts spend most of their time. The dplyr package replaces the menu-driven workflow of Excel pivot tables with short, readable commands that compose: “rows where revenue tops $10K,” “add a profit-margin column,” “average sales by region.” Each operation is a single function call, and because every function takes a data frame in and returns a data frame back, you can chain them into pipelines.

The result is code that reads close to how you would describe the analysis in plain language, and that reviewers can follow without reverse-engineering a series of UI clicks.

9.1 The Six Verbs You Need to Know

Almost everything in dplyr boils down to six functions. Once they are second nature, most data wrangling becomes a question of which verbs to chain in which order.

Verb	What It Does	Excel Equivalent
`filter()`	Show me only the rows I care about	Filtering rows
`select()`	Show me only the columns I care about	Hiding columns
`mutate()`	Add a new column (or change one)	Adding a formula column
`summarize()`	Give me summary stats	SUM, AVERAGE, COUNT
`arrange()`	Sort my data	Sort A-Z / Z-A
`group_by()`	Do it separately for each group	Pivot tables!

They all work the same way:

First argument: your data.
Next arguments: what you want to do, using column names directly (no quotes needed).
Result: a shiny new data frame.

Let’s dig in.

9.2 filter() — Show Me Only the Rows I Care About

filter() keeps only rows that match a condition. It is the analyst’s equivalent of opening a database and writing a WHERE clause: “rows where the customer is in California,” “rows where the order amount is over a thousand dollars,” “rows where the date falls in Q3.”

AI Pitfall: long pipelines hide their own intermediate failures

Ask an AI assistant to “find the average order value for high-value customers in each region this quarter” and you get back a clean five-step pipeline:

df %>%
  filter(date >= "2026-07-01", date <= "2026-09-30") %>%
  filter(customer_segment == "High-value") %>%
  group_by(region) %>%
  summarize(avg_order = mean(order_amount, na.rm = TRUE)) %>%
  arrange(desc(avg_order))

The pipeline reads beautifully. It also makes it impossible to see what happened at each step. If the date filter accidentally dropped 80% of the rows because of a parsing issue, you would not know — the only output is the final summary table. If customer_segment has values like “high-value” and “High-Value” mixed in (case-sensitive, common in inherited data), half your high-value customers got silently excluded.

The discipline: when a pipeline matters, run it one verb at a time first. Check nrow() after each step. Only chain it once you trust each piece. Or use dplyr::filter_at-style debugging where you assign intermediate results and inspect them. AI-generated pipelines are convenient; the convenience is exactly what makes verification harder.

9.2.1 Basic Comparisons

Code

# Cars that get more than 30 miles per gallon
filter(mtcars, mpg > 30)

                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

Only 4 cars out of the original 32 clear the 30 mpg bar — the Toyota Corolla, Fiat 128, Lotus Europa, and Honda Civic. Filtering transformed a 32-row dataset into a 4-row answer.

Code

# Only setosa flowers
filter(iris, Species == "setosa")

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
14          4.3         3.0          1.1         0.1  setosa
15          5.8         4.0          1.2         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
17          5.4         3.9          1.3         0.4  setosa
18          5.1         3.5          1.4         0.3  setosa
19          5.7         3.8          1.7         0.3  setosa
20          5.1         3.8          1.5         0.3  setosa
21          5.4         3.4          1.7         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
23          4.6         3.6          1.0         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
25          4.8         3.4          1.9         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
28          5.2         3.5          1.5         0.2  setosa
29          5.2         3.4          1.4         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
31          4.8         3.1          1.6         0.2  setosa
32          5.4         3.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa
35          4.9         3.1          1.5         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
38          4.9         3.6          1.4         0.1  setosa
39          4.4         3.0          1.3         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
41          5.0         3.5          1.3         0.3  setosa
42          4.5         2.3          1.3         0.3  setosa
43          4.4         3.2          1.3         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
45          5.1         3.8          1.9         0.4  setosa
46          4.8         3.0          1.4         0.3  setosa
47          5.1         3.8          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa

Code

# Cars with exactly 6 cylinders
filter(mtcars, cyl == 6)

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

The comparison operators are what you’d expect: == (equals), != (not equal), >, <, >=, <=.

Pro tip that will save you 20 minutes of debugging: Use == for comparison, not =. A single = is for assigning values. Everyone makes this mistake exactly once. Some of us make it twice.

9.2.2 Combining Conditions

Need multiple conditions? Use & (and) or | (or).

Code

# Setosa flowers with long sepals
filter(iris, Species == "setosa" & Sepal.Length > 5)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           5.4         3.9          1.7         0.4  setosa
3           5.4         3.7          1.5         0.2  setosa
4           5.8         4.0          1.2         0.2  setosa
5           5.7         4.4          1.5         0.4  setosa
6           5.4         3.9          1.3         0.4  setosa
7           5.1         3.5          1.4         0.3  setosa
8           5.7         3.8          1.7         0.3  setosa
9           5.1         3.8          1.5         0.3  setosa
10          5.4         3.4          1.7         0.2  setosa
11          5.1         3.7          1.5         0.4  setosa
12          5.1         3.3          1.7         0.5  setosa
13          5.2         3.5          1.5         0.2  setosa
14          5.2         3.4          1.4         0.2  setosa
15          5.4         3.4          1.5         0.4  setosa
16          5.2         4.1          1.5         0.1  setosa
17          5.5         4.2          1.4         0.2  setosa
18          5.5         3.5          1.3         0.2  setosa
19          5.1         3.4          1.5         0.2  setosa
20          5.1         3.8          1.9         0.4  setosa
21          5.1         3.8          1.6         0.2  setosa
22          5.3         3.7          1.5         0.2  setosa

Shortcut: commas inside filter() are treated as “and” automatically.

Code

# Same thing, cleaner syntax
filter(iris, Species == "setosa", Sepal.Length > 5)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           5.4         3.9          1.7         0.4  setosa
3           5.4         3.7          1.5         0.2  setosa
4           5.8         4.0          1.2         0.2  setosa
5           5.7         4.4          1.5         0.4  setosa
6           5.4         3.9          1.3         0.4  setosa
7           5.1         3.5          1.4         0.3  setosa
8           5.7         3.8          1.7         0.3  setosa
9           5.1         3.8          1.5         0.3  setosa
10          5.4         3.4          1.7         0.2  setosa
11          5.1         3.7          1.5         0.4  setosa
12          5.1         3.3          1.7         0.5  setosa
13          5.2         3.5          1.5         0.2  setosa
14          5.2         3.4          1.4         0.2  setosa
15          5.4         3.4          1.5         0.4  setosa
16          5.2         4.1          1.5         0.1  setosa
17          5.5         4.2          1.4         0.2  setosa
18          5.5         3.5          1.3         0.2  setosa
19          5.1         3.4          1.5         0.2  setosa
20          5.1         3.8          1.9         0.4  setosa
21          5.1         3.8          1.6         0.2  setosa
22          5.3         3.7          1.5         0.2  setosa

Code

# Cars with 4 cylinders OR great gas mileage
filter(mtcars, cyl == 4 | mpg > 30)

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

9.2.3 The %in% Operator

When you want to match against a list of values, %in% is your friend. Think of it as checking whether a job applicant’s major is on your “approved majors” list.

Code

# Setosa or virginica (but not versicolor)
filter(iris, Species %in% c("setosa", "virginica"))

    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
3            4.7         3.2          1.3         0.2    setosa
4            4.6         3.1          1.5         0.2    setosa
5            5.0         3.6          1.4         0.2    setosa
6            5.4         3.9          1.7         0.4    setosa
7            4.6         3.4          1.4         0.3    setosa
8            5.0         3.4          1.5         0.2    setosa
9            4.4         2.9          1.4         0.2    setosa
10           4.9         3.1          1.5         0.1    setosa
11           5.4         3.7          1.5         0.2    setosa
12           4.8         3.4          1.6         0.2    setosa
13           4.8         3.0          1.4         0.1    setosa
14           4.3         3.0          1.1         0.1    setosa
15           5.8         4.0          1.2         0.2    setosa
16           5.7         4.4          1.5         0.4    setosa
17           5.4         3.9          1.3         0.4    setosa
18           5.1         3.5          1.4         0.3    setosa
19           5.7         3.8          1.7         0.3    setosa
20           5.1         3.8          1.5         0.3    setosa
21           5.4         3.4          1.7         0.2    setosa
22           5.1         3.7          1.5         0.4    setosa
23           4.6         3.6          1.0         0.2    setosa
24           5.1         3.3          1.7         0.5    setosa
25           4.8         3.4          1.9         0.2    setosa
26           5.0         3.0          1.6         0.2    setosa
27           5.0         3.4          1.6         0.4    setosa
28           5.2         3.5          1.5         0.2    setosa
29           5.2         3.4          1.4         0.2    setosa
30           4.7         3.2          1.6         0.2    setosa
31           4.8         3.1          1.6         0.2    setosa
32           5.4         3.4          1.5         0.4    setosa
33           5.2         4.1          1.5         0.1    setosa
34           5.5         4.2          1.4         0.2    setosa
35           4.9         3.1          1.5         0.2    setosa
36           5.0         3.2          1.2         0.2    setosa
37           5.5         3.5          1.3         0.2    setosa
38           4.9         3.6          1.4         0.1    setosa
39           4.4         3.0          1.3         0.2    setosa
40           5.1         3.4          1.5         0.2    setosa
41           5.0         3.5          1.3         0.3    setosa
42           4.5         2.3          1.3         0.3    setosa
43           4.4         3.2          1.3         0.2    setosa
44           5.0         3.5          1.6         0.6    setosa
45           5.1         3.8          1.9         0.4    setosa
46           4.8         3.0          1.4         0.3    setosa
47           5.1         3.8          1.6         0.2    setosa
48           4.6         3.2          1.4         0.2    setosa
49           5.3         3.7          1.5         0.2    setosa
50           5.0         3.3          1.4         0.2    setosa
51           6.3         3.3          6.0         2.5 virginica
52           5.8         2.7          5.1         1.9 virginica
53           7.1         3.0          5.9         2.1 virginica
54           6.3         2.9          5.6         1.8 virginica
55           6.5         3.0          5.8         2.2 virginica
56           7.6         3.0          6.6         2.1 virginica
57           4.9         2.5          4.5         1.7 virginica
58           7.3         2.9          6.3         1.8 virginica
59           6.7         2.5          5.8         1.8 virginica
60           7.2         3.6          6.1         2.5 virginica
61           6.5         3.2          5.1         2.0 virginica
62           6.4         2.7          5.3         1.9 virginica
63           6.8         3.0          5.5         2.1 virginica
64           5.7         2.5          5.0         2.0 virginica
65           5.8         2.8          5.1         2.4 virginica
66           6.4         3.2          5.3         2.3 virginica
67           6.5         3.0          5.5         1.8 virginica
68           7.7         3.8          6.7         2.2 virginica
69           7.7         2.6          6.9         2.3 virginica
70           6.0         2.2          5.0         1.5 virginica
71           6.9         3.2          5.7         2.3 virginica
72           5.6         2.8          4.9         2.0 virginica
73           7.7         2.8          6.7         2.0 virginica
74           6.3         2.7          4.9         1.8 virginica
75           6.7         3.3          5.7         2.1 virginica
76           7.2         3.2          6.0         1.8 virginica
77           6.2         2.8          4.8         1.8 virginica
78           6.1         3.0          4.9         1.8 virginica
79           6.4         2.8          5.6         2.1 virginica
80           7.2         3.0          5.8         1.6 virginica
81           7.4         2.8          6.1         1.9 virginica
82           7.9         3.8          6.4         2.0 virginica
83           6.4         2.8          5.6         2.2 virginica
84           6.3         2.8          5.1         1.5 virginica
85           6.1         2.6          5.6         1.4 virginica
86           7.7         3.0          6.1         2.3 virginica
87           6.3         3.4          5.6         2.4 virginica
88           6.4         3.1          5.5         1.8 virginica
89           6.0         3.0          4.8         1.8 virginica
90           6.9         3.1          5.4         2.1 virginica
91           6.7         3.1          5.6         2.4 virginica
92           6.9         3.1          5.1         2.3 virginica
93           5.8         2.7          5.1         1.9 virginica
94           6.8         3.2          5.9         2.3 virginica
95           6.7         3.3          5.7         2.5 virginica
96           6.7         3.0          5.2         2.3 virginica
97           6.3         2.5          5.0         1.9 virginica
98           6.5         3.0          5.2         2.0 virginica
99           6.2         3.4          5.4         2.3 virginica
100          5.9         3.0          5.1         1.8 virginica

Code

# 4-cylinder or 8-cylinder cars
filter(mtcars, cyl %in% c(4, 8))

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

9.2.4 Filtering Ranges

between() is perfect for “give me everything from X to Y” — like pulling all transactions between Q1 and Q3.

Code

# Sepal length between 5.0 and 6.0
filter(iris, between(Sepal.Length, 5.0, 6.0))

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.1         3.5          1.4         0.2     setosa
2           5.0         3.6          1.4         0.2     setosa
3           5.4         3.9          1.7         0.4     setosa
4           5.0         3.4          1.5         0.2     setosa
5           5.4         3.7          1.5         0.2     setosa
6           5.8         4.0          1.2         0.2     setosa
7           5.7         4.4          1.5         0.4     setosa
8           5.4         3.9          1.3         0.4     setosa
9           5.1         3.5          1.4         0.3     setosa
10          5.7         3.8          1.7         0.3     setosa
11          5.1         3.8          1.5         0.3     setosa
12          5.4         3.4          1.7         0.2     setosa
13          5.1         3.7          1.5         0.4     setosa
14          5.1         3.3          1.7         0.5     setosa
15          5.0         3.0          1.6         0.2     setosa
16          5.0         3.4          1.6         0.4     setosa
17          5.2         3.5          1.5         0.2     setosa
18          5.2         3.4          1.4         0.2     setosa
19          5.4         3.4          1.5         0.4     setosa
20          5.2         4.1          1.5         0.1     setosa
21          5.5         4.2          1.4         0.2     setosa
22          5.0         3.2          1.2         0.2     setosa
23          5.5         3.5          1.3         0.2     setosa
24          5.1         3.4          1.5         0.2     setosa
25          5.0         3.5          1.3         0.3     setosa
26          5.0         3.5          1.6         0.6     setosa
27          5.1         3.8          1.9         0.4     setosa
28          5.1         3.8          1.6         0.2     setosa
29          5.3         3.7          1.5         0.2     setosa
30          5.0         3.3          1.4         0.2     setosa
31          5.5         2.3          4.0         1.3 versicolor
32          5.7         2.8          4.5         1.3 versicolor
33          5.2         2.7          3.9         1.4 versicolor
34          5.0         2.0          3.5         1.0 versicolor
35          5.9         3.0          4.2         1.5 versicolor
36          6.0         2.2          4.0         1.0 versicolor
37          5.6         2.9          3.6         1.3 versicolor
38          5.6         3.0          4.5         1.5 versicolor
39          5.8         2.7          4.1         1.0 versicolor
40          5.6         2.5          3.9         1.1 versicolor
41          5.9         3.2          4.8         1.8 versicolor
42          6.0         2.9          4.5         1.5 versicolor
43          5.7         2.6          3.5         1.0 versicolor
44          5.5         2.4          3.8         1.1 versicolor
45          5.5         2.4          3.7         1.0 versicolor
46          5.8         2.7          3.9         1.2 versicolor
47          6.0         2.7          5.1         1.6 versicolor
48          5.4         3.0          4.5         1.5 versicolor
49          6.0         3.4          4.5         1.6 versicolor
50          5.6         3.0          4.1         1.3 versicolor
51          5.5         2.5          4.0         1.3 versicolor
52          5.5         2.6          4.4         1.2 versicolor
53          5.8         2.6          4.0         1.2 versicolor
54          5.0         2.3          3.3         1.0 versicolor
55          5.6         2.7          4.2         1.3 versicolor
56          5.7         3.0          4.2         1.2 versicolor
57          5.7         2.9          4.2         1.3 versicolor
58          5.1         2.5          3.0         1.1 versicolor
59          5.7         2.8          4.1         1.3 versicolor
60          5.8         2.7          5.1         1.9  virginica
61          5.7         2.5          5.0         2.0  virginica
62          5.8         2.8          5.1         2.4  virginica
63          6.0         2.2          5.0         1.5  virginica
64          5.6         2.8          4.9         2.0  virginica
65          6.0         3.0          4.8         1.8  virginica
66          5.8         2.7          5.1         1.9  virginica
67          5.9         3.0          5.1         1.8  virginica

9.2.5 Dealing with Missing Values

Missing data (NA) is the business world’s version of “the dog ate my homework.” Someone didn’t fill out the survey, a sensor died, a record got corrupted. You can’t test for NA with ==; you need is.na().

Code

df_with_na <- tibble(
 x = c(1, 2, NA, 4, 5),
 y = c("a", NA, "c", "d", NA)
)

# Keep only rows where x is not missing
filter(df_with_na, !is.na(x))

# A tibble: 4 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     2 <NA> 
3     4 d    
4     5 <NA>

Code

# Keep rows where BOTH x and y are present
filter(df_with_na, !is.na(x), !is.na(y))

# A tibble: 2 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     4 d

9.2.6 Putting It Together

Real-world filtering usually stacks several conditions. Here’s the kind of thing you’ll actually write at work:

Code

# Efficient cars: 4 cylinders with above-average mpg
filter(mtcars, cyl == 4, mpg > mean(mpg))

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Notice something clever: mean(mpg) is computed on the full dataset (about 20.1 mpg), and then only 4-cylinder cars above that average are kept. All 4-cylinder cars are fuel-efficient by design, so most of them pass this bar.

9.3 select() – Show Me Only the Columns I Care About

Your dataset has 47 columns but you only need 5? select() is the Marie Kondo of data wrangling — it keeps only what sparks joy (or, you know, what’s relevant to your analysis).

9.3.1 Picking Columns

Code

# Just these two columns, please
select(iris, Sepal.Length, Species)

    Sepal.Length    Species
1            5.1     setosa
2            4.9     setosa
3            4.7     setosa
4            4.6     setosa
5            5.0     setosa
6            5.4     setosa
7            4.6     setosa
8            5.0     setosa
9            4.4     setosa
10           4.9     setosa
11           5.4     setosa
12           4.8     setosa
13           4.8     setosa
14           4.3     setosa
15           5.8     setosa
16           5.7     setosa
17           5.4     setosa
18           5.1     setosa
19           5.7     setosa
20           5.1     setosa
21           5.4     setosa
22           5.1     setosa
23           4.6     setosa
24           5.1     setosa
25           4.8     setosa
26           5.0     setosa
27           5.0     setosa
28           5.2     setosa
29           5.2     setosa
30           4.7     setosa
31           4.8     setosa
32           5.4     setosa
33           5.2     setosa
34           5.5     setosa
35           4.9     setosa
36           5.0     setosa
37           5.5     setosa
38           4.9     setosa
39           4.4     setosa
40           5.1     setosa
41           5.0     setosa
42           4.5     setosa
43           4.4     setosa
44           5.0     setosa
45           5.1     setosa
46           4.8     setosa
47           5.1     setosa
48           4.6     setosa
49           5.3     setosa
50           5.0     setosa
51           7.0 versicolor
52           6.4 versicolor
53           6.9 versicolor
54           5.5 versicolor
55           6.5 versicolor
56           5.7 versicolor
57           6.3 versicolor
58           4.9 versicolor
59           6.6 versicolor
60           5.2 versicolor
61           5.0 versicolor
62           5.9 versicolor
63           6.0 versicolor
64           6.1 versicolor
65           5.6 versicolor
66           6.7 versicolor
67           5.6 versicolor
68           5.8 versicolor
69           6.2 versicolor
70           5.6 versicolor
71           5.9 versicolor
72           6.1 versicolor
73           6.3 versicolor
74           6.1 versicolor
75           6.4 versicolor
76           6.6 versicolor
77           6.8 versicolor
78           6.7 versicolor
79           6.0 versicolor
80           5.7 versicolor
81           5.5 versicolor
82           5.5 versicolor
83           5.8 versicolor
84           6.0 versicolor
85           5.4 versicolor
86           6.0 versicolor
87           6.7 versicolor
88           6.3 versicolor
89           5.6 versicolor
90           5.5 versicolor
91           5.5 versicolor
92           6.1 versicolor
93           5.8 versicolor
94           5.0 versicolor
95           5.6 versicolor
96           5.7 versicolor
97           5.7 versicolor
98           6.2 versicolor
99           5.1 versicolor
100          5.7 versicolor
101          6.3  virginica
102          5.8  virginica
103          7.1  virginica
104          6.3  virginica
105          6.5  virginica
106          7.6  virginica
107          4.9  virginica
108          7.3  virginica
109          6.7  virginica
110          7.2  virginica
111          6.5  virginica
112          6.4  virginica
113          6.8  virginica
114          5.7  virginica
115          5.8  virginica
116          6.4  virginica
117          6.5  virginica
118          7.7  virginica
119          7.7  virginica
120          6.0  virginica
121          6.9  virginica
122          5.6  virginica
123          7.7  virginica
124          6.3  virginica
125          6.7  virginica
126          7.2  virginica
127          6.2  virginica
128          6.1  virginica
129          6.4  virginica
130          7.2  virginica
131          7.4  virginica
132          7.9  virginica
133          6.4  virginica
134          6.3  virginica
135          6.1  virginica
136          7.7  virginica
137          6.3  virginica
138          6.4  virginica
139          6.0  virginica
140          6.9  virginica
141          6.7  virginica
142          6.9  virginica
143          5.8  virginica
144          6.8  virginica
145          6.7  virginica
146          6.7  virginica
147          6.3  virginica
148          6.5  virginica
149          6.2  virginica
150          5.9  virginica

Code

# A range of columns
select(iris, Sepal.Length:Petal.Length)

    Sepal.Length Sepal.Width Petal.Length
1            5.1         3.5          1.4
2            4.9         3.0          1.4
3            4.7         3.2          1.3
4            4.6         3.1          1.5
5            5.0         3.6          1.4
6            5.4         3.9          1.7
7            4.6         3.4          1.4
8            5.0         3.4          1.5
9            4.4         2.9          1.4
10           4.9         3.1          1.5
11           5.4         3.7          1.5
12           4.8         3.4          1.6
13           4.8         3.0          1.4
14           4.3         3.0          1.1
15           5.8         4.0          1.2
16           5.7         4.4          1.5
17           5.4         3.9          1.3
18           5.1         3.5          1.4
19           5.7         3.8          1.7
20           5.1         3.8          1.5
21           5.4         3.4          1.7
22           5.1         3.7          1.5
23           4.6         3.6          1.0
24           5.1         3.3          1.7
25           4.8         3.4          1.9
26           5.0         3.0          1.6
27           5.0         3.4          1.6
28           5.2         3.5          1.5
29           5.2         3.4          1.4
30           4.7         3.2          1.6
31           4.8         3.1          1.6
32           5.4         3.4          1.5
33           5.2         4.1          1.5
34           5.5         4.2          1.4
35           4.9         3.1          1.5
36           5.0         3.2          1.2
37           5.5         3.5          1.3
38           4.9         3.6          1.4
39           4.4         3.0          1.3
40           5.1         3.4          1.5
41           5.0         3.5          1.3
42           4.5         2.3          1.3
43           4.4         3.2          1.3
44           5.0         3.5          1.6
45           5.1         3.8          1.9
46           4.8         3.0          1.4
47           5.1         3.8          1.6
48           4.6         3.2          1.4
49           5.3         3.7          1.5
50           5.0         3.3          1.4
51           7.0         3.2          4.7
52           6.4         3.2          4.5
53           6.9         3.1          4.9
54           5.5         2.3          4.0
55           6.5         2.8          4.6
56           5.7         2.8          4.5
57           6.3         3.3          4.7
58           4.9         2.4          3.3
59           6.6         2.9          4.6
60           5.2         2.7          3.9
61           5.0         2.0          3.5
62           5.9         3.0          4.2
63           6.0         2.2          4.0
64           6.1         2.9          4.7
65           5.6         2.9          3.6
66           6.7         3.1          4.4
67           5.6         3.0          4.5
68           5.8         2.7          4.1
69           6.2         2.2          4.5
70           5.6         2.5          3.9
71           5.9         3.2          4.8
72           6.1         2.8          4.0
73           6.3         2.5          4.9
74           6.1         2.8          4.7
75           6.4         2.9          4.3
76           6.6         3.0          4.4
77           6.8         2.8          4.8
78           6.7         3.0          5.0
79           6.0         2.9          4.5
80           5.7         2.6          3.5
81           5.5         2.4          3.8
82           5.5         2.4          3.7
83           5.8         2.7          3.9
84           6.0         2.7          5.1
85           5.4         3.0          4.5
86           6.0         3.4          4.5
87           6.7         3.1          4.7
88           6.3         2.3          4.4
89           5.6         3.0          4.1
90           5.5         2.5          4.0
91           5.5         2.6          4.4
92           6.1         3.0          4.6
93           5.8         2.6          4.0
94           5.0         2.3          3.3
95           5.6         2.7          4.2
96           5.7         3.0          4.2
97           5.7         2.9          4.2
98           6.2         2.9          4.3
99           5.1         2.5          3.0
100          5.7         2.8          4.1
101          6.3         3.3          6.0
102          5.8         2.7          5.1
103          7.1         3.0          5.9
104          6.3         2.9          5.6
105          6.5         3.0          5.8
106          7.6         3.0          6.6
107          4.9         2.5          4.5
108          7.3         2.9          6.3
109          6.7         2.5          5.8
110          7.2         3.6          6.1
111          6.5         3.2          5.1
112          6.4         2.7          5.3
113          6.8         3.0          5.5
114          5.7         2.5          5.0
115          5.8         2.8          5.1
116          6.4         3.2          5.3
117          6.5         3.0          5.5
118          7.7         3.8          6.7
119          7.7         2.6          6.9
120          6.0         2.2          5.0
121          6.9         3.2          5.7
122          5.6         2.8          4.9
123          7.7         2.8          6.7
124          6.3         2.7          4.9
125          6.7         3.3          5.7
126          7.2         3.2          6.0
127          6.2         2.8          4.8
128          6.1         3.0          4.9
129          6.4         2.8          5.6
130          7.2         3.0          5.8
131          7.4         2.8          6.1
132          7.9         3.8          6.4
133          6.4         2.8          5.6
134          6.3         2.8          5.1
135          6.1         2.6          5.6
136          7.7         3.0          6.1
137          6.3         3.4          5.6
138          6.4         3.1          5.5
139          6.0         3.0          4.8
140          6.9         3.1          5.4
141          6.7         3.1          5.6
142          6.9         3.1          5.1
143          5.8         2.7          5.1
144          6.8         3.2          5.9
145          6.7         3.3          5.7
146          6.7         3.0          5.2
147          6.3         2.5          5.0
148          6.5         3.0          5.2
149          6.2         3.4          5.4
150          5.9         3.0          5.1

Code

# Everything EXCEPT Species
select(iris, -Species)

    Sepal.Length Sepal.Width Petal.Length Petal.Width
1            5.1         3.5          1.4         0.2
2            4.9         3.0          1.4         0.2
3            4.7         3.2          1.3         0.2
4            4.6         3.1          1.5         0.2
5            5.0         3.6          1.4         0.2
6            5.4         3.9          1.7         0.4
7            4.6         3.4          1.4         0.3
8            5.0         3.4          1.5         0.2
9            4.4         2.9          1.4         0.2
10           4.9         3.1          1.5         0.1
11           5.4         3.7          1.5         0.2
12           4.8         3.4          1.6         0.2
13           4.8         3.0          1.4         0.1
14           4.3         3.0          1.1         0.1
15           5.8         4.0          1.2         0.2
16           5.7         4.4          1.5         0.4
17           5.4         3.9          1.3         0.4
18           5.1         3.5          1.4         0.3
19           5.7         3.8          1.7         0.3
20           5.1         3.8          1.5         0.3
21           5.4         3.4          1.7         0.2
22           5.1         3.7          1.5         0.4
23           4.6         3.6          1.0         0.2
24           5.1         3.3          1.7         0.5
25           4.8         3.4          1.9         0.2
26           5.0         3.0          1.6         0.2
27           5.0         3.4          1.6         0.4
28           5.2         3.5          1.5         0.2
29           5.2         3.4          1.4         0.2
30           4.7         3.2          1.6         0.2
31           4.8         3.1          1.6         0.2
32           5.4         3.4          1.5         0.4
33           5.2         4.1          1.5         0.1
34           5.5         4.2          1.4         0.2
35           4.9         3.1          1.5         0.2
36           5.0         3.2          1.2         0.2
37           5.5         3.5          1.3         0.2
38           4.9         3.6          1.4         0.1
39           4.4         3.0          1.3         0.2
40           5.1         3.4          1.5         0.2
41           5.0         3.5          1.3         0.3
42           4.5         2.3          1.3         0.3
43           4.4         3.2          1.3         0.2
44           5.0         3.5          1.6         0.6
45           5.1         3.8          1.9         0.4
46           4.8         3.0          1.4         0.3
47           5.1         3.8          1.6         0.2
48           4.6         3.2          1.4         0.2
49           5.3         3.7          1.5         0.2
50           5.0         3.3          1.4         0.2
51           7.0         3.2          4.7         1.4
52           6.4         3.2          4.5         1.5
53           6.9         3.1          4.9         1.5
54           5.5         2.3          4.0         1.3
55           6.5         2.8          4.6         1.5
56           5.7         2.8          4.5         1.3
57           6.3         3.3          4.7         1.6
58           4.9         2.4          3.3         1.0
59           6.6         2.9          4.6         1.3
60           5.2         2.7          3.9         1.4
61           5.0         2.0          3.5         1.0
62           5.9         3.0          4.2         1.5
63           6.0         2.2          4.0         1.0
64           6.1         2.9          4.7         1.4
65           5.6         2.9          3.6         1.3
66           6.7         3.1          4.4         1.4
67           5.6         3.0          4.5         1.5
68           5.8         2.7          4.1         1.0
69           6.2         2.2          4.5         1.5
70           5.6         2.5          3.9         1.1
71           5.9         3.2          4.8         1.8
72           6.1         2.8          4.0         1.3
73           6.3         2.5          4.9         1.5
74           6.1         2.8          4.7         1.2
75           6.4         2.9          4.3         1.3
76           6.6         3.0          4.4         1.4
77           6.8         2.8          4.8         1.4
78           6.7         3.0          5.0         1.7
79           6.0         2.9          4.5         1.5
80           5.7         2.6          3.5         1.0
81           5.5         2.4          3.8         1.1
82           5.5         2.4          3.7         1.0
83           5.8         2.7          3.9         1.2
84           6.0         2.7          5.1         1.6
85           5.4         3.0          4.5         1.5
86           6.0         3.4          4.5         1.6
87           6.7         3.1          4.7         1.5
88           6.3         2.3          4.4         1.3
89           5.6         3.0          4.1         1.3
90           5.5         2.5          4.0         1.3
91           5.5         2.6          4.4         1.2
92           6.1         3.0          4.6         1.4
93           5.8         2.6          4.0         1.2
94           5.0         2.3          3.3         1.0
95           5.6         2.7          4.2         1.3
96           5.7         3.0          4.2         1.2
97           5.7         2.9          4.2         1.3
98           6.2         2.9          4.3         1.3
99           5.1         2.5          3.0         1.1
100          5.7         2.8          4.1         1.3
101          6.3         3.3          6.0         2.5
102          5.8         2.7          5.1         1.9
103          7.1         3.0          5.9         2.1
104          6.3         2.9          5.6         1.8
105          6.5         3.0          5.8         2.2
106          7.6         3.0          6.6         2.1
107          4.9         2.5          4.5         1.7
108          7.3         2.9          6.3         1.8
109          6.7         2.5          5.8         1.8
110          7.2         3.6          6.1         2.5
111          6.5         3.2          5.1         2.0
112          6.4         2.7          5.3         1.9
113          6.8         3.0          5.5         2.1
114          5.7         2.5          5.0         2.0
115          5.8         2.8          5.1         2.4
116          6.4         3.2          5.3         2.3
117          6.5         3.0          5.5         1.8
118          7.7         3.8          6.7         2.2
119          7.7         2.6          6.9         2.3
120          6.0         2.2          5.0         1.5
121          6.9         3.2          5.7         2.3
122          5.6         2.8          4.9         2.0
123          7.7         2.8          6.7         2.0
124          6.3         2.7          4.9         1.8
125          6.7         3.3          5.7         2.1
126          7.2         3.2          6.0         1.8
127          6.2         2.8          4.8         1.8
128          6.1         3.0          4.9         1.8
129          6.4         2.8          5.6         2.1
130          7.2         3.0          5.8         1.6
131          7.4         2.8          6.1         1.9
132          7.9         3.8          6.4         2.0
133          6.4         2.8          5.6         2.2
134          6.3         2.8          5.1         1.5
135          6.1         2.6          5.6         1.4
136          7.7         3.0          6.1         2.3
137          6.3         3.4          5.6         2.4
138          6.4         3.1          5.5         1.8
139          6.0         3.0          4.8         1.8
140          6.9         3.1          5.4         2.1
141          6.7         3.1          5.6         2.4
142          6.9         3.1          5.1         2.3
143          5.8         2.7          5.1         1.9
144          6.8         3.2          5.9         2.3
145          6.7         3.3          5.7         2.5
146          6.7         3.0          5.2         2.3
147          6.3         2.5          5.0         1.9
148          6.5         3.0          5.2         2.0
149          6.2         3.4          5.4         2.3
150          5.9         3.0          5.1         1.8

9.3.2 Smart Selection Helpers

When your data has lots of similarly named columns (like Q1_Revenue, Q2_Revenue, Q3_Revenue…), these helpers save a ton of typing.

Code

# All columns that start with "Sepal"
select(iris, starts_with("Sepal"))

    Sepal.Length Sepal.Width
1            5.1         3.5
2            4.9         3.0
3            4.7         3.2
4            4.6         3.1
5            5.0         3.6
6            5.4         3.9
7            4.6         3.4
8            5.0         3.4
9            4.4         2.9
10           4.9         3.1
11           5.4         3.7
12           4.8         3.4
13           4.8         3.0
14           4.3         3.0
15           5.8         4.0
16           5.7         4.4
17           5.4         3.9
18           5.1         3.5
19           5.7         3.8
20           5.1         3.8
21           5.4         3.4
22           5.1         3.7
23           4.6         3.6
24           5.1         3.3
25           4.8         3.4
26           5.0         3.0
27           5.0         3.4
28           5.2         3.5
29           5.2         3.4
30           4.7         3.2
31           4.8         3.1
32           5.4         3.4
33           5.2         4.1
34           5.5         4.2
35           4.9         3.1
36           5.0         3.2
37           5.5         3.5
38           4.9         3.6
39           4.4         3.0
40           5.1         3.4
41           5.0         3.5
42           4.5         2.3
43           4.4         3.2
44           5.0         3.5
45           5.1         3.8
46           4.8         3.0
47           5.1         3.8
48           4.6         3.2
49           5.3         3.7
50           5.0         3.3
51           7.0         3.2
52           6.4         3.2
53           6.9         3.1
54           5.5         2.3
55           6.5         2.8
56           5.7         2.8
57           6.3         3.3
58           4.9         2.4
59           6.6         2.9
60           5.2         2.7
61           5.0         2.0
62           5.9         3.0
63           6.0         2.2
64           6.1         2.9
65           5.6         2.9
66           6.7         3.1
67           5.6         3.0
68           5.8         2.7
69           6.2         2.2
70           5.6         2.5
71           5.9         3.2
72           6.1         2.8
73           6.3         2.5
74           6.1         2.8
75           6.4         2.9
76           6.6         3.0
77           6.8         2.8
78           6.7         3.0
79           6.0         2.9
80           5.7         2.6
81           5.5         2.4
82           5.5         2.4
83           5.8         2.7
84           6.0         2.7
85           5.4         3.0
86           6.0         3.4
87           6.7         3.1
88           6.3         2.3
89           5.6         3.0
90           5.5         2.5
91           5.5         2.6
92           6.1         3.0
93           5.8         2.6
94           5.0         2.3
95           5.6         2.7
96           5.7         3.0
97           5.7         2.9
98           6.2         2.9
99           5.1         2.5
100          5.7         2.8
101          6.3         3.3
102          5.8         2.7
103          7.1         3.0
104          6.3         2.9
105          6.5         3.0
106          7.6         3.0
107          4.9         2.5
108          7.3         2.9
109          6.7         2.5
110          7.2         3.6
111          6.5         3.2
112          6.4         2.7
113          6.8         3.0
114          5.7         2.5
115          5.8         2.8
116          6.4         3.2
117          6.5         3.0
118          7.7         3.8
119          7.7         2.6
120          6.0         2.2
121          6.9         3.2
122          5.6         2.8
123          7.7         2.8
124          6.3         2.7
125          6.7         3.3
126          7.2         3.2
127          6.2         2.8
128          6.1         3.0
129          6.4         2.8
130          7.2         3.0
131          7.4         2.8
132          7.9         3.8
133          6.4         2.8
134          6.3         2.8
135          6.1         2.6
136          7.7         3.0
137          6.3         3.4
138          6.4         3.1
139          6.0         3.0
140          6.9         3.1
141          6.7         3.1
142          6.9         3.1
143          5.8         2.7
144          6.8         3.2
145          6.7         3.3
146          6.7         3.0
147          6.3         2.5
148          6.5         3.0
149          6.2         3.4
150          5.9         3.0

Code

# All columns that end with "Width"
select(iris, ends_with("Width"))

    Sepal.Width Petal.Width
1           3.5         0.2
2           3.0         0.2
3           3.2         0.2
4           3.1         0.2
5           3.6         0.2
6           3.9         0.4
7           3.4         0.3
8           3.4         0.2
9           2.9         0.2
10          3.1         0.1
11          3.7         0.2
12          3.4         0.2
13          3.0         0.1
14          3.0         0.1
15          4.0         0.2
16          4.4         0.4
17          3.9         0.4
18          3.5         0.3
19          3.8         0.3
20          3.8         0.3
21          3.4         0.2
22          3.7         0.4
23          3.6         0.2
24          3.3         0.5
25          3.4         0.2
26          3.0         0.2
27          3.4         0.4
28          3.5         0.2
29          3.4         0.2
30          3.2         0.2
31          3.1         0.2
32          3.4         0.4
33          4.1         0.1
34          4.2         0.2
35          3.1         0.2
36          3.2         0.2
37          3.5         0.2
38          3.6         0.1
39          3.0         0.2
40          3.4         0.2
41          3.5         0.3
42          2.3         0.3
43          3.2         0.2
44          3.5         0.6
45          3.8         0.4
46          3.0         0.3
47          3.8         0.2
48          3.2         0.2
49          3.7         0.2
50          3.3         0.2
51          3.2         1.4
52          3.2         1.5
53          3.1         1.5
54          2.3         1.3
55          2.8         1.5
56          2.8         1.3
57          3.3         1.6
58          2.4         1.0
59          2.9         1.3
60          2.7         1.4
61          2.0         1.0
62          3.0         1.5
63          2.2         1.0
64          2.9         1.4
65          2.9         1.3
66          3.1         1.4
67          3.0         1.5
68          2.7         1.0
69          2.2         1.5
70          2.5         1.1
71          3.2         1.8
72          2.8         1.3
73          2.5         1.5
74          2.8         1.2
75          2.9         1.3
76          3.0         1.4
77          2.8         1.4
78          3.0         1.7
79          2.9         1.5
80          2.6         1.0
81          2.4         1.1
82          2.4         1.0
83          2.7         1.2
84          2.7         1.6
85          3.0         1.5
86          3.4         1.6
87          3.1         1.5
88          2.3         1.3
89          3.0         1.3
90          2.5         1.3
91          2.6         1.2
92          3.0         1.4
93          2.6         1.2
94          2.3         1.0
95          2.7         1.3
96          3.0         1.2
97          2.9         1.3
98          2.9         1.3
99          2.5         1.1
100         2.8         1.3
101         3.3         2.5
102         2.7         1.9
103         3.0         2.1
104         2.9         1.8
105         3.0         2.2
106         3.0         2.1
107         2.5         1.7
108         2.9         1.8
109         2.5         1.8
110         3.6         2.5
111         3.2         2.0
112         2.7         1.9
113         3.0         2.1
114         2.5         2.0
115         2.8         2.4
116         3.2         2.3
117         3.0         1.8
118         3.8         2.2
119         2.6         2.3
120         2.2         1.5
121         3.2         2.3
122         2.8         2.0
123         2.8         2.0
124         2.7         1.8
125         3.3         2.1
126         3.2         1.8
127         2.8         1.8
128         3.0         1.8
129         2.8         2.1
130         3.0         1.6
131         2.8         1.9
132         3.8         2.0
133         2.8         2.2
134         2.8         1.5
135         2.6         1.4
136         3.0         2.3
137         3.4         2.4
138         3.1         1.8
139         3.0         1.8
140         3.1         2.1
141         3.1         2.4
142         3.1         2.3
143         2.7         1.9
144         3.2         2.3
145         3.3         2.5
146         3.0         2.3
147         2.5         1.9
148         3.0         2.0
149         3.4         2.3
150         3.0         1.8

Code

# All columns that contain "pal"
select(iris, contains("pal"))

    Sepal.Length Sepal.Width
1            5.1         3.5
2            4.9         3.0
3            4.7         3.2
4            4.6         3.1
5            5.0         3.6
6            5.4         3.9
7            4.6         3.4
8            5.0         3.4
9            4.4         2.9
10           4.9         3.1
11           5.4         3.7
12           4.8         3.4
13           4.8         3.0
14           4.3         3.0
15           5.8         4.0
16           5.7         4.4
17           5.4         3.9
18           5.1         3.5
19           5.7         3.8
20           5.1         3.8
21           5.4         3.4
22           5.1         3.7
23           4.6         3.6
24           5.1         3.3
25           4.8         3.4
26           5.0         3.0
27           5.0         3.4
28           5.2         3.5
29           5.2         3.4
30           4.7         3.2
31           4.8         3.1
32           5.4         3.4
33           5.2         4.1
34           5.5         4.2
35           4.9         3.1
36           5.0         3.2
37           5.5         3.5
38           4.9         3.6
39           4.4         3.0
40           5.1         3.4
41           5.0         3.5
42           4.5         2.3
43           4.4         3.2
44           5.0         3.5
45           5.1         3.8
46           4.8         3.0
47           5.1         3.8
48           4.6         3.2
49           5.3         3.7
50           5.0         3.3
51           7.0         3.2
52           6.4         3.2
53           6.9         3.1
54           5.5         2.3
55           6.5         2.8
56           5.7         2.8
57           6.3         3.3
58           4.9         2.4
59           6.6         2.9
60           5.2         2.7
61           5.0         2.0
62           5.9         3.0
63           6.0         2.2
64           6.1         2.9
65           5.6         2.9
66           6.7         3.1
67           5.6         3.0
68           5.8         2.7
69           6.2         2.2
70           5.6         2.5
71           5.9         3.2
72           6.1         2.8
73           6.3         2.5
74           6.1         2.8
75           6.4         2.9
76           6.6         3.0
77           6.8         2.8
78           6.7         3.0
79           6.0         2.9
80           5.7         2.6
81           5.5         2.4
82           5.5         2.4
83           5.8         2.7
84           6.0         2.7
85           5.4         3.0
86           6.0         3.4
87           6.7         3.1
88           6.3         2.3
89           5.6         3.0
90           5.5         2.5
91           5.5         2.6
92           6.1         3.0
93           5.8         2.6
94           5.0         2.3
95           5.6         2.7
96           5.7         3.0
97           5.7         2.9
98           6.2         2.9
99           5.1         2.5
100          5.7         2.8
101          6.3         3.3
102          5.8         2.7
103          7.1         3.0
104          6.3         2.9
105          6.5         3.0
106          7.6         3.0
107          4.9         2.5
108          7.3         2.9
109          6.7         2.5
110          7.2         3.6
111          6.5         3.2
112          6.4         2.7
113          6.8         3.0
114          5.7         2.5
115          5.8         2.8
116          6.4         3.2
117          6.5         3.0
118          7.7         3.8
119          7.7         2.6
120          6.0         2.2
121          6.9         3.2
122          5.6         2.8
123          7.7         2.8
124          6.3         2.7
125          6.7         3.3
126          7.2         3.2
127          6.2         2.8
128          6.1         3.0
129          6.4         2.8
130          7.2         3.0
131          7.4         2.8
132          7.9         3.8
133          6.4         2.8
134          6.3         2.8
135          6.1         2.6
136          7.7         3.0
137          6.3         3.4
138          6.4         3.1
139          6.0         3.0
140          6.9         3.1
141          6.7         3.1
142          6.9         3.1
143          5.8         2.7
144          6.8         3.2
145          6.7         3.3
146          6.7         3.0
147          6.3         2.5
148          6.5         3.0
149          6.2         3.4
150          5.9         3.0

Code

# Only numeric columns
select(iris, where(is.numeric))

    Sepal.Length Sepal.Width Petal.Length Petal.Width
1            5.1         3.5          1.4         0.2
2            4.9         3.0          1.4         0.2
3            4.7         3.2          1.3         0.2
4            4.6         3.1          1.5         0.2
5            5.0         3.6          1.4         0.2
6            5.4         3.9          1.7         0.4
7            4.6         3.4          1.4         0.3
8            5.0         3.4          1.5         0.2
9            4.4         2.9          1.4         0.2
10           4.9         3.1          1.5         0.1
11           5.4         3.7          1.5         0.2
12           4.8         3.4          1.6         0.2
13           4.8         3.0          1.4         0.1
14           4.3         3.0          1.1         0.1
15           5.8         4.0          1.2         0.2
16           5.7         4.4          1.5         0.4
17           5.4         3.9          1.3         0.4
18           5.1         3.5          1.4         0.3
19           5.7         3.8          1.7         0.3
20           5.1         3.8          1.5         0.3
21           5.4         3.4          1.7         0.2
22           5.1         3.7          1.5         0.4
23           4.6         3.6          1.0         0.2
24           5.1         3.3          1.7         0.5
25           4.8         3.4          1.9         0.2
26           5.0         3.0          1.6         0.2
27           5.0         3.4          1.6         0.4
28           5.2         3.5          1.5         0.2
29           5.2         3.4          1.4         0.2
30           4.7         3.2          1.6         0.2
31           4.8         3.1          1.6         0.2
32           5.4         3.4          1.5         0.4
33           5.2         4.1          1.5         0.1
34           5.5         4.2          1.4         0.2
35           4.9         3.1          1.5         0.2
36           5.0         3.2          1.2         0.2
37           5.5         3.5          1.3         0.2
38           4.9         3.6          1.4         0.1
39           4.4         3.0          1.3         0.2
40           5.1         3.4          1.5         0.2
41           5.0         3.5          1.3         0.3
42           4.5         2.3          1.3         0.3
43           4.4         3.2          1.3         0.2
44           5.0         3.5          1.6         0.6
45           5.1         3.8          1.9         0.4
46           4.8         3.0          1.4         0.3
47           5.1         3.8          1.6         0.2
48           4.6         3.2          1.4         0.2
49           5.3         3.7          1.5         0.2
50           5.0         3.3          1.4         0.2
51           7.0         3.2          4.7         1.4
52           6.4         3.2          4.5         1.5
53           6.9         3.1          4.9         1.5
54           5.5         2.3          4.0         1.3
55           6.5         2.8          4.6         1.5
56           5.7         2.8          4.5         1.3
57           6.3         3.3          4.7         1.6
58           4.9         2.4          3.3         1.0
59           6.6         2.9          4.6         1.3
60           5.2         2.7          3.9         1.4
61           5.0         2.0          3.5         1.0
62           5.9         3.0          4.2         1.5
63           6.0         2.2          4.0         1.0
64           6.1         2.9          4.7         1.4
65           5.6         2.9          3.6         1.3
66           6.7         3.1          4.4         1.4
67           5.6         3.0          4.5         1.5
68           5.8         2.7          4.1         1.0
69           6.2         2.2          4.5         1.5
70           5.6         2.5          3.9         1.1
71           5.9         3.2          4.8         1.8
72           6.1         2.8          4.0         1.3
73           6.3         2.5          4.9         1.5
74           6.1         2.8          4.7         1.2
75           6.4         2.9          4.3         1.3
76           6.6         3.0          4.4         1.4
77           6.8         2.8          4.8         1.4
78           6.7         3.0          5.0         1.7
79           6.0         2.9          4.5         1.5
80           5.7         2.6          3.5         1.0
81           5.5         2.4          3.8         1.1
82           5.5         2.4          3.7         1.0
83           5.8         2.7          3.9         1.2
84           6.0         2.7          5.1         1.6
85           5.4         3.0          4.5         1.5
86           6.0         3.4          4.5         1.6
87           6.7         3.1          4.7         1.5
88           6.3         2.3          4.4         1.3
89           5.6         3.0          4.1         1.3
90           5.5         2.5          4.0         1.3
91           5.5         2.6          4.4         1.2
92           6.1         3.0          4.6         1.4
93           5.8         2.6          4.0         1.2
94           5.0         2.3          3.3         1.0
95           5.6         2.7          4.2         1.3
96           5.7         3.0          4.2         1.2
97           5.7         2.9          4.2         1.3
98           6.2         2.9          4.3         1.3
99           5.1         2.5          3.0         1.1
100          5.7         2.8          4.1         1.3
101          6.3         3.3          6.0         2.5
102          5.8         2.7          5.1         1.9
103          7.1         3.0          5.9         2.1
104          6.3         2.9          5.6         1.8
105          6.5         3.0          5.8         2.2
106          7.6         3.0          6.6         2.1
107          4.9         2.5          4.5         1.7
108          7.3         2.9          6.3         1.8
109          6.7         2.5          5.8         1.8
110          7.2         3.6          6.1         2.5
111          6.5         3.2          5.1         2.0
112          6.4         2.7          5.3         1.9
113          6.8         3.0          5.5         2.1
114          5.7         2.5          5.0         2.0
115          5.8         2.8          5.1         2.4
116          6.4         3.2          5.3         2.3
117          6.5         3.0          5.5         1.8
118          7.7         3.8          6.7         2.2
119          7.7         2.6          6.9         2.3
120          6.0         2.2          5.0         1.5
121          6.9         3.2          5.7         2.3
122          5.6         2.8          4.9         2.0
123          7.7         2.8          6.7         2.0
124          6.3         2.7          4.9         1.8
125          6.7         3.3          5.7         2.1
126          7.2         3.2          6.0         1.8
127          6.2         2.8          4.8         1.8
128          6.1         3.0          4.9         1.8
129          6.4         2.8          5.6         2.1
130          7.2         3.0          5.8         1.6
131          7.4         2.8          6.1         1.9
132          7.9         3.8          6.4         2.0
133          6.4         2.8          5.6         2.2
134          6.3         2.8          5.1         1.5
135          6.1         2.6          5.6         1.4
136          7.7         3.0          6.1         2.3
137          6.3         3.4          5.6         2.4
138          6.4         3.1          5.5         1.8
139          6.0         3.0          4.8         1.8
140          6.9         3.1          5.4         2.1
141          6.7         3.1          5.6         2.4
142          6.9         3.1          5.1         2.3
143          5.8         2.7          5.1         1.9
144          6.8         3.2          5.9         2.3
145          6.7         3.3          5.7         2.5
146          6.7         3.0          5.2         2.3
147          6.3         2.5          5.0         1.9
148          6.5         3.0          5.2         2.0
149          6.2         3.4          5.4         2.3
150          5.9         3.0          5.1         1.8

9.3.3 Reordering and Renaming

Want to move a column to the front? Use everything() to keep the rest.

Code

# Put Species first, keep everything else
select(iris, Species, everything())

       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1       setosa          5.1         3.5          1.4         0.2
2       setosa          4.9         3.0          1.4         0.2
3       setosa          4.7         3.2          1.3         0.2
4       setosa          4.6         3.1          1.5         0.2
5       setosa          5.0         3.6          1.4         0.2
6       setosa          5.4         3.9          1.7         0.4
7       setosa          4.6         3.4          1.4         0.3
8       setosa          5.0         3.4          1.5         0.2
9       setosa          4.4         2.9          1.4         0.2
10      setosa          4.9         3.1          1.5         0.1
11      setosa          5.4         3.7          1.5         0.2
12      setosa          4.8         3.4          1.6         0.2
13      setosa          4.8         3.0          1.4         0.1
14      setosa          4.3         3.0          1.1         0.1
15      setosa          5.8         4.0          1.2         0.2
16      setosa          5.7         4.4          1.5         0.4
17      setosa          5.4         3.9          1.3         0.4
18      setosa          5.1         3.5          1.4         0.3
19      setosa          5.7         3.8          1.7         0.3
20      setosa          5.1         3.8          1.5         0.3
21      setosa          5.4         3.4          1.7         0.2
22      setosa          5.1         3.7          1.5         0.4
23      setosa          4.6         3.6          1.0         0.2
24      setosa          5.1         3.3          1.7         0.5
25      setosa          4.8         3.4          1.9         0.2
26      setosa          5.0         3.0          1.6         0.2
27      setosa          5.0         3.4          1.6         0.4
28      setosa          5.2         3.5          1.5         0.2
29      setosa          5.2         3.4          1.4         0.2
30      setosa          4.7         3.2          1.6         0.2
31      setosa          4.8         3.1          1.6         0.2
32      setosa          5.4         3.4          1.5         0.4
33      setosa          5.2         4.1          1.5         0.1
34      setosa          5.5         4.2          1.4         0.2
35      setosa          4.9         3.1          1.5         0.2
36      setosa          5.0         3.2          1.2         0.2
37      setosa          5.5         3.5          1.3         0.2
38      setosa          4.9         3.6          1.4         0.1
39      setosa          4.4         3.0          1.3         0.2
40      setosa          5.1         3.4          1.5         0.2
41      setosa          5.0         3.5          1.3         0.3
42      setosa          4.5         2.3          1.3         0.3
43      setosa          4.4         3.2          1.3         0.2
44      setosa          5.0         3.5          1.6         0.6
45      setosa          5.1         3.8          1.9         0.4
46      setosa          4.8         3.0          1.4         0.3
47      setosa          5.1         3.8          1.6         0.2
48      setosa          4.6         3.2          1.4         0.2
49      setosa          5.3         3.7          1.5         0.2
50      setosa          5.0         3.3          1.4         0.2
51  versicolor          7.0         3.2          4.7         1.4
52  versicolor          6.4         3.2          4.5         1.5
53  versicolor          6.9         3.1          4.9         1.5
54  versicolor          5.5         2.3          4.0         1.3
55  versicolor          6.5         2.8          4.6         1.5
56  versicolor          5.7         2.8          4.5         1.3
57  versicolor          6.3         3.3          4.7         1.6
58  versicolor          4.9         2.4          3.3         1.0
59  versicolor          6.6         2.9          4.6         1.3
60  versicolor          5.2         2.7          3.9         1.4
61  versicolor          5.0         2.0          3.5         1.0
62  versicolor          5.9         3.0          4.2         1.5
63  versicolor          6.0         2.2          4.0         1.0
64  versicolor          6.1         2.9          4.7         1.4
65  versicolor          5.6         2.9          3.6         1.3
66  versicolor          6.7         3.1          4.4         1.4
67  versicolor          5.6         3.0          4.5         1.5
68  versicolor          5.8         2.7          4.1         1.0
69  versicolor          6.2         2.2          4.5         1.5
70  versicolor          5.6         2.5          3.9         1.1
71  versicolor          5.9         3.2          4.8         1.8
72  versicolor          6.1         2.8          4.0         1.3
73  versicolor          6.3         2.5          4.9         1.5
74  versicolor          6.1         2.8          4.7         1.2
75  versicolor          6.4         2.9          4.3         1.3
76  versicolor          6.6         3.0          4.4         1.4
77  versicolor          6.8         2.8          4.8         1.4
78  versicolor          6.7         3.0          5.0         1.7
79  versicolor          6.0         2.9          4.5         1.5
80  versicolor          5.7         2.6          3.5         1.0
81  versicolor          5.5         2.4          3.8         1.1
82  versicolor          5.5         2.4          3.7         1.0
83  versicolor          5.8         2.7          3.9         1.2
84  versicolor          6.0         2.7          5.1         1.6
85  versicolor          5.4         3.0          4.5         1.5
86  versicolor          6.0         3.4          4.5         1.6
87  versicolor          6.7         3.1          4.7         1.5
88  versicolor          6.3         2.3          4.4         1.3
89  versicolor          5.6         3.0          4.1         1.3
90  versicolor          5.5         2.5          4.0         1.3
91  versicolor          5.5         2.6          4.4         1.2
92  versicolor          6.1         3.0          4.6         1.4
93  versicolor          5.8         2.6          4.0         1.2
94  versicolor          5.0         2.3          3.3         1.0
95  versicolor          5.6         2.7          4.2         1.3
96  versicolor          5.7         3.0          4.2         1.2
97  versicolor          5.7         2.9          4.2         1.3
98  versicolor          6.2         2.9          4.3         1.3
99  versicolor          5.1         2.5          3.0         1.1
100 versicolor          5.7         2.8          4.1         1.3
101  virginica          6.3         3.3          6.0         2.5
102  virginica          5.8         2.7          5.1         1.9
103  virginica          7.1         3.0          5.9         2.1
104  virginica          6.3         2.9          5.6         1.8
105  virginica          6.5         3.0          5.8         2.2
106  virginica          7.6         3.0          6.6         2.1
107  virginica          4.9         2.5          4.5         1.7
108  virginica          7.3         2.9          6.3         1.8
109  virginica          6.7         2.5          5.8         1.8
110  virginica          7.2         3.6          6.1         2.5
111  virginica          6.5         3.2          5.1         2.0
112  virginica          6.4         2.7          5.3         1.9
113  virginica          6.8         3.0          5.5         2.1
114  virginica          5.7         2.5          5.0         2.0
115  virginica          5.8         2.8          5.1         2.4
116  virginica          6.4         3.2          5.3         2.3
117  virginica          6.5         3.0          5.5         1.8
118  virginica          7.7         3.8          6.7         2.2
119  virginica          7.7         2.6          6.9         2.3
120  virginica          6.0         2.2          5.0         1.5
121  virginica          6.9         3.2          5.7         2.3
122  virginica          5.6         2.8          4.9         2.0
123  virginica          7.7         2.8          6.7         2.0
124  virginica          6.3         2.7          4.9         1.8
125  virginica          6.7         3.3          5.7         2.1
126  virginica          7.2         3.2          6.0         1.8
127  virginica          6.2         2.8          4.8         1.8
128  virginica          6.1         3.0          4.9         1.8
129  virginica          6.4         2.8          5.6         2.1
130  virginica          7.2         3.0          5.8         1.6
131  virginica          7.4         2.8          6.1         1.9
132  virginica          7.9         3.8          6.4         2.0
133  virginica          6.4         2.8          5.6         2.2
134  virginica          6.3         2.8          5.1         1.5
135  virginica          6.1         2.6          5.6         1.4
136  virginica          7.7         3.0          6.1         2.3
137  virginica          6.3         3.4          5.6         2.4
138  virginica          6.4         3.1          5.5         1.8
139  virginica          6.0         3.0          4.8         1.8
140  virginica          6.9         3.1          5.4         2.1
141  virginica          6.7         3.1          5.6         2.4
142  virginica          6.9         3.1          5.1         2.3
143  virginica          5.8         2.7          5.1         1.9
144  virginica          6.8         3.2          5.9         2.3
145  virginica          6.7         3.3          5.7         2.5
146  virginica          6.7         3.0          5.2         2.3
147  virginica          6.3         2.5          5.0         1.9
148  virginica          6.5         3.0          5.2         2.0
149  virginica          6.2         3.4          5.4         2.3
150  virginica          5.9         3.0          5.1         1.8

Rename as you select:

Code

select(iris, sepal_length = Sepal.Length, species = Species)

    sepal_length    species
1            5.1     setosa
2            4.9     setosa
3            4.7     setosa
4            4.6     setosa
5            5.0     setosa
6            5.4     setosa
7            4.6     setosa
8            5.0     setosa
9            4.4     setosa
10           4.9     setosa
11           5.4     setosa
12           4.8     setosa
13           4.8     setosa
14           4.3     setosa
15           5.8     setosa
16           5.7     setosa
17           5.4     setosa
18           5.1     setosa
19           5.7     setosa
20           5.1     setosa
21           5.4     setosa
22           5.1     setosa
23           4.6     setosa
24           5.1     setosa
25           4.8     setosa
26           5.0     setosa
27           5.0     setosa
28           5.2     setosa
29           5.2     setosa
30           4.7     setosa
31           4.8     setosa
32           5.4     setosa
33           5.2     setosa
34           5.5     setosa
35           4.9     setosa
36           5.0     setosa
37           5.5     setosa
38           4.9     setosa
39           4.4     setosa
40           5.1     setosa
41           5.0     setosa
42           4.5     setosa
43           4.4     setosa
44           5.0     setosa
45           5.1     setosa
46           4.8     setosa
47           5.1     setosa
48           4.6     setosa
49           5.3     setosa
50           5.0     setosa
51           7.0 versicolor
52           6.4 versicolor
53           6.9 versicolor
54           5.5 versicolor
55           6.5 versicolor
56           5.7 versicolor
57           6.3 versicolor
58           4.9 versicolor
59           6.6 versicolor
60           5.2 versicolor
61           5.0 versicolor
62           5.9 versicolor
63           6.0 versicolor
64           6.1 versicolor
65           5.6 versicolor
66           6.7 versicolor
67           5.6 versicolor
68           5.8 versicolor
69           6.2 versicolor
70           5.6 versicolor
71           5.9 versicolor
72           6.1 versicolor
73           6.3 versicolor
74           6.1 versicolor
75           6.4 versicolor
76           6.6 versicolor
77           6.8 versicolor
78           6.7 versicolor
79           6.0 versicolor
80           5.7 versicolor
81           5.5 versicolor
82           5.5 versicolor
83           5.8 versicolor
84           6.0 versicolor
85           5.4 versicolor
86           6.0 versicolor
87           6.7 versicolor
88           6.3 versicolor
89           5.6 versicolor
90           5.5 versicolor
91           5.5 versicolor
92           6.1 versicolor
93           5.8 versicolor
94           5.0 versicolor
95           5.6 versicolor
96           5.7 versicolor
97           5.7 versicolor
98           6.2 versicolor
99           5.1 versicolor
100          5.7 versicolor
101          6.3  virginica
102          5.8  virginica
103          7.1  virginica
104          6.3  virginica
105          6.5  virginica
106          7.6  virginica
107          4.9  virginica
108          7.3  virginica
109          6.7  virginica
110          7.2  virginica
111          6.5  virginica
112          6.4  virginica
113          6.8  virginica
114          5.7  virginica
115          5.8  virginica
116          6.4  virginica
117          6.5  virginica
118          7.7  virginica
119          7.7  virginica
120          6.0  virginica
121          6.9  virginica
122          5.6  virginica
123          7.7  virginica
124          6.3  virginica
125          6.7  virginica
126          7.2  virginica
127          6.2  virginica
128          6.1  virginica
129          6.4  virginica
130          7.2  virginica
131          7.4  virginica
132          7.9  virginica
133          6.4  virginica
134          6.3  virginica
135          6.1  virginica
136          7.7  virginica
137          6.3  virginica
138          6.4  virginica
139          6.0  virginica
140          6.9  virginica
141          6.7  virginica
142          6.9  virginica
143          5.8  virginica
144          6.8  virginica
145          6.7  virginica
146          6.7  virginica
147          6.3  virginica
148          6.5  virginica
149          6.2  virginica
150          5.9  virginica

If you want to rename without dropping columns, use rename():

Code

rename(iris, sepal_length = Sepal.Length, sepal_width = Sepal.Width) |> head(3)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa

9.4 mutate() – Add a New Column

Think of mutate() as adding a new column to your sales report. You’ve got Revenue and Costs? Great, now add a Profit column. That’s mutate() in a nutshell.

9.4.1 Creating New Columns

Code

# Approximate sepal area
iris_modified <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)
select(iris_modified, Sepal.Length, Sepal.Width, Sepal.Area) |> head()

  Sepal.Length Sepal.Width Sepal.Area
1          5.1         3.5      17.85
2          4.9         3.0      14.70
3          4.7         3.2      15.04
4          4.6         3.1      14.26
5          5.0         3.6      18.00
6          5.4         3.9      21.06

Code

# You can create multiple columns at once
# Later columns can reference earlier ones (like chained formulas)
mtcars_modified <- mutate(mtcars,
 weight_kg = wt * 453.592,
 hp_per_kg = hp / weight_kg
)
select(mtcars_modified, wt, weight_kg, hp, hp_per_kg) |> head()

                     wt weight_kg  hp  hp_per_kg
Mazda RX4         2.620  1188.411 110 0.09256057
Mazda RX4 Wag     2.875  1304.077 110 0.08435085
Datsun 710        2.320  1052.333  93 0.08837503
Hornet 4 Drive    3.215  1458.298 110 0.07543038
Hornet Sportabout 3.440  1560.356 175 0.11215386
Valiant           3.460  1569.428 105 0.06690334

9.4.2 Overwriting Existing Columns

If you assign to a name that already exists, it overwrites. No drama, no pop-up asking “are you sure?”

Code

# Convert wt from thousands of pounds to actual pounds
mutate(mtcars, wt = wt * 1000) |> select(wt) |> head()

                    wt
Mazda RX4         2620
Mazda RX4 Wag     2875
Datsun 710        2320
Hornet 4 Drive    3215
Hornet Sportabout 3440
Valiant           3460

9.4.3 Useful Functions Inside mutate()

Arithmetic — the usual suspects: +, -, *, /.

Code

mutate(iris,
 Sepal.Sum = Sepal.Length + Sepal.Width,
 Petal.Ratio = Petal.Length / Petal.Width
) |> head()

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Sum
1          5.1         3.5          1.4         0.2  setosa       8.6
2          4.9         3.0          1.4         0.2  setosa       7.9
3          4.7         3.2          1.3         0.2  setosa       7.9
4          4.6         3.1          1.5         0.2  setosa       7.7
5          5.0         3.6          1.4         0.2  setosa       8.6
6          5.4         3.9          1.7         0.4  setosa       9.3
  Petal.Ratio
1        7.00
2        7.00
3        6.50
4        7.50
5        7.00
6        4.25

Lag and lead — great for “compared to last month” calculations. Your CFO will love these.

Code

example_df <- tibble(
 day = 1:7,
 sales = c(100, 120, 90, 150, 130, 170, 160)
)
mutate(example_df,
 previous_day_sales = lag(sales),
 daily_change = sales - lag(sales)
)

# A tibble: 7 × 4
    day sales previous_day_sales daily_change
  <int> <dbl>              <dbl>        <dbl>
1     1   100                 NA           NA
2     2   120                100           20
3     3    90                120          -30
4     4   150                 90           60
5     5   130                150          -20
6     6   170                130           40
7     7   160                170          -10

Cumulative functions — running totals, running averages. Perfect for tracking year-to-date performance.

Code

mutate(example_df,
 cumulative_sales = cumsum(sales),
 running_average = cummean(sales)
)

# A tibble: 7 × 4
    day sales cumulative_sales running_average
  <int> <dbl>            <dbl>           <dbl>
1     1   100              100            100 
2     2   120              220            110 
3     3    90              310            103.
4     4   150              460            115 
5     5   130              590            118 
6     6   170              760            127.
7     7   160              920            131.

9.4.4 if_else() and case_when() – Conditional Logic

if_else() is your basic “if this, then that.”

Code

mutate(mtcars,
 efficiency = if_else(mpg > 20, "efficient", "inefficient")
) |> select(mpg, efficiency) |> head(10)

                   mpg  efficiency
Mazda RX4         21.0   efficient
Mazda RX4 Wag     21.0   efficient
Datsun 710        22.8   efficient
Hornet 4 Drive    21.4   efficient
Hornet Sportabout 18.7 inefficient
Valiant           18.1 inefficient
Duster 360        14.3 inefficient
Merc 240D         24.4   efficient
Merc 230          22.8   efficient
Merc 280          19.2 inefficient

case_when() handles multiple conditions — like a series of “if… else if… else if…” checks. Way more readable than nesting a dozen IF statements in Excel. (We’ve all seen those. We’ve all cried.)

Code

mutate(mtcars,
 mpg_category = case_when(
 mpg >= 30 ~ "excellent",
 mpg >= 20 ~ "good",
 mpg >= 15 ~ "moderate",
 .default = "poor"
 )
) |> select(mpg, mpg_category) |> head(10)

                   mpg mpg_category
Mazda RX4         21.0         good
Mazda RX4 Wag     21.0         good
Datsun 710        22.8         good
Hornet 4 Drive    21.4         good
Hornet Sportabout 18.7     moderate
Valiant           18.1     moderate
Duster 360        14.3         poor
Merc 240D         24.4         good
Merc 230          22.8         good
Merc 280          19.2     moderate

This is incredibly useful for bucketing data — customer segments, performance tiers, risk categories, you name it.

Code

mutate(iris,
 sepal_size = case_when(
 Sepal.Length > 6.5 ~ "large",
 Sepal.Length > 5.5 ~ "medium",
 .default = "small"
 )
) |> count(sepal_size, Species)

  sepal_size    Species  n
1      large versicolor  8
2      large  virginica 22
3     medium     setosa  3
4     medium versicolor 31
5     medium  virginica 27
6      small     setosa 47
7      small versicolor 11
8      small  virginica  1

But seriously — take a beat here. The categories you define in case_when() are not just code. Labels like “high value,” “poor performance,” or “high risk” drive real decisions about real people: who gets the promotion, who gets flagged for review, who qualifies for a loan. The code itself is neutral; the choices baked into it are not. Pressure-test your categories the way you would want someone to pressure-test a category that you got sorted into.

9.5 summarize() – Give Me the Summary Stats

summarize() (or summarise() if you’re feeling British) collapses your entire dataset into summary numbers. Alone, it gives you one row. Pair it with group_by() and it becomes a full-blown pivot table machine.

9.5.1 The Basics

Code

summarize(mtcars,
 avg_mpg = mean(mpg),
 avg_hp = mean(hp),
 min_wt = min(wt),
 max_wt = max(wt)
)

   avg_mpg   avg_hp min_wt max_wt
1 20.09062 146.6875  1.513  5.424

One row, four numbers. The average car gets about 20 mpg with 147 horsepower, and weights range from 1.5 to 5.4 (in thousands of pounds). That is the entire 32-row dataset collapsed into an executive summary.

9.5.2 Common Summary Functions

Your cheat sheet:

Function	What It Does
`mean()`	Average
`median()`	Median
`sd()`	Standard deviation
`min()`, `max()`	Smallest / largest value
`sum()`	Total
`n()`	Count of rows
`n_distinct()`	Count of unique values

Code

summarize(iris,
 n_rows = n(),
 n_species = n_distinct(Species),
 iqr_sepal_length = IQR(Sepal.Length)
)

  n_rows n_species iqr_sepal_length
1    150         3              1.3

9.5.3 Summarizing Multiple Columns with across()

Instead of writing out a summary for every single column (boring), across() lets you apply the same function to multiple columns at once.

Code

# Mean of all numeric columns
iris |>
 summarize(across(where(is.numeric), mean))

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

One line, four averages. Petal measurements (mean length 3.76, mean width 1.20) are more variable across species than sepal measurements, which hints at petal dimensions being more useful for telling species apart.

Code

# Multiple stats at once
iris |>
 summarize(across(
 where(is.numeric),
 list(mean = mean, sd = sd),
 .names = "{.col}_{.fn}"
 ))

  Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
1          5.843333       0.8280661         3.057333      0.4358663
  Petal.Length_mean Petal.Length_sd Petal.Width_mean Petal.Width_sd
1             3.758        1.765298         1.199333      0.7622377

Now each numeric column gets both its mean and standard deviation. The .names argument controls the output column names — {.col} is the original column name and {.fn} is the function name. Petal.Length has the highest standard deviation (1.77), confirming it varies the most across the dataset.

9.6 arrange() – Sort My Data

Sorting. Simple, essential, and surprisingly satisfying.

Code

# Sort by mpg (lowest first)
arrange(mtcars, mpg) |> head()

                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

9.6.1 Descending Order

Wrap the column in desc() to flip the sort — like clicking that Z-to-A button.

Code

# Best gas mileage first
arrange(mtcars, desc(mpg)) |> head()

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2

9.6.2 Sorting by Multiple Columns

Ties in the first column get broken by the second. Just like sorting a spreadsheet by Region, then by Revenue within each region.

Code

# Sort by cylinders, then by mpg within each group
arrange(mtcars, cyl, desc(mpg)) |> head(12)

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1

Note: NA values always sort to the end. Always.

9.7 group_by() – The Pivot Table of R

This is where dplyr goes from “neat tool” to “how did I ever live without this?” group_by() tells R: “For the next operation, do it separately for each group.” It’s the R equivalent of dragging a field into the Rows area of an Excel pivot table.

9.7.1 Grouped Summaries

The bread and butter combo: group_by() + summarize().

Code

iris |>
 group_by(Species) |>
 summarize(
 mean_sepal_length = mean(Sepal.Length),
 mean_petal_length = mean(Petal.Length),
 count = n()
 )

# A tibble: 3 × 4
  Species    mean_sepal_length mean_petal_length count
  <fct>                  <dbl>             <dbl> <int>
1 setosa                  5.01              1.46    50
2 versicolor              5.94              4.26    50
3 virginica               6.59              5.55    50

Three species, three rows. Setosa has the shortest petals (mean 1.46 cm) while virginica has the longest (mean 5.55 cm) — a nearly fourfold difference. This makes Petal.Length a strong candidate for distinguishing species.

Code

mtcars |>
 group_by(cyl) |>
 summarize(
 avg_mpg = mean(mpg),
 avg_hp = mean(hp),
 n_cars = n()
 )

# A tibble: 3 × 4
    cyl avg_mpg avg_hp n_cars
  <dbl>   <dbl>  <dbl>  <int>
1     4    26.7   82.6     11
2     6    19.7  122.       7
3     8    15.1  209.      14

Boom — pivot table. The pattern is crystal clear: 4-cylinder cars average 26.7 mpg with 82 hp, while 8-cylinder cars average only 15.1 mpg but pack 209 hp. More cylinders = more power but worse fuel economy. Your boss would frame this.

9.7.2 Multiple Grouping Variables

Group by more than one variable and R computes summaries for every combination — like a two-dimensional pivot table.

Code

mtcars |>
 group_by(cyl, am) |>
 summarize(
 avg_mpg = mean(mpg),
 n_cars = n(),
 .groups = "drop"
 )

# A tibble: 6 × 4
    cyl    am avg_mpg n_cars
  <dbl> <dbl>   <dbl>  <int>
1     4     0    22.9      3
2     4     1    28.1      8
3     6     0    19.1      4
4     6     1    20.6      3
5     8     0    15.0     12
6     8     1    15.4      2

Now we see six rows — every combination of cylinder count and transmission type (am = 0 is automatic, am = 1 is manual). Manual 4-cylinder cars get the best mileage (28.1 mpg), while automatic 8-cylinder cars get the worst (15.1 mpg). The .groups = "drop" removes the grouping when you’re done. Good habit — leftover grouping causes sneaky bugs.

9.7.3 Grouped Mutate

When you use mutate() on grouped data, calculations happen within each group. Perfect for “compared to the department average” analysis.

Code

# How does each flower compare to its species average?
iris |>
 group_by(Species) |>
 mutate(
 sepal_deviation = Sepal.Length - mean(Sepal.Length)
 ) |>
 select(Species, Sepal.Length, sepal_deviation) |>
 head(10)

# A tibble: 10 × 3
# Groups:   Species [1]
   Species Sepal.Length sepal_deviation
   <fct>          <dbl>           <dbl>
 1 setosa           5.1         0.0940 
 2 setosa           4.9        -0.106  
 3 setosa           4.7        -0.306  
 4 setosa           4.6        -0.406  
 5 setosa           5          -0.00600
 6 setosa           5.4         0.394  
 7 setosa           4.6        -0.406  
 8 setosa           5          -0.00600
 9 setosa           4.4        -0.606  
10 setosa           4.9        -0.106

The sepal_deviation column shows how far each flower is from its species mean. Positive values are above average for that species; negative values are below. This is a common technique for benchmarking — “how does this salesperson compare to their regional average?” — without mixing up groups that have different baselines.

9.7.4 Grouped Filter

Filter within groups: “Show me the top performers in each department.”

Code

# Cars with above-average mpg for their cylinder class
mtcars |>
 group_by(cyl) |>
 filter(mpg > mean(mpg)) |>
 select(cyl, mpg) |>
 arrange(cyl, desc(mpg))

# A tibble: 16 × 2
# Groups:   cyl [3]
     cyl   mpg
   <dbl> <dbl>
 1     4  33.9
 2     4  32.4
 3     4  30.4
 4     4  30.4
 5     4  27.3
 6     6  21.4
 7     6  21  
 8     6  21  
 9     8  19.2
10     8  18.7
11     8  17.3
12     8  16.4
13     8  15.8
14     8  15.5
15     8  15.2
16     8  15.2

This keeps only the cars that beat the average for their cylinder class. The key insight is that “above average” is computed within each group, not across the whole dataset. A 6-cylinder car with 21 mpg is above average for 6-cylinder cars, even though it is below average for 4-cylinder cars. This is exactly the logic you need for questions like “which stores outperform their regional average?”

9.7.5 Don’t Forget to ungroup()

Grouping is sticky — it hangs around until you explicitly remove it. If your results look weird, check for leftover grouping. It’s like leaving a filter on in Excel and then wondering why your row count seems off.

Code

iris_grouped <- iris |> group_by(Species)
group_vars(iris_grouped)

[1] "Species"

Code

iris_ungrouped <- iris_grouped |> ungroup()
group_vars(iris_ungrouped)

character(0)

9.8 Combining Operations – Where the Magic Happens

Here’s the payoff. Everything we’ve learned chains together with the pipe operator (|>). Each step feeds into the next, creating a readable analysis pipeline.

9.8.1 Example: Top Fuel-Efficient Cars by Class

Read this pipeline like a sentence: “Take the car data. Add car names. Keep only 4- and 6-cylinder cars. Select the columns I need. Within each cylinder group, rank by mpg. Show me the top 3 in each group.”

Code

mtcars |>
 mutate(car_name = rownames(mtcars)) |>
 filter(cyl %in% c(4, 6)) |>
 select(car_name, mpg, cyl, hp, wt) |>
 group_by(cyl) |>
 mutate(mpg_rank = min_rank(desc(mpg))) |>
 arrange(cyl, mpg_rank) |>
 filter(mpg_rank <= 3) |>
 ungroup()

# A tibble: 7 × 6
  car_name         mpg   cyl    hp    wt mpg_rank
  <chr>          <dbl> <dbl> <dbl> <dbl>    <int>
1 Toyota Corolla  33.9     4    65  1.84        1
2 Fiat 128        32.4     4    66  2.2         2
3 Honda Civic     30.4     4    52  1.62        3
4 Lotus Europa    30.4     4   113  1.51        3
5 Hornet 4 Drive  21.4     6   110  3.22        1
6 Mazda RX4       21       6   110  2.62        2
7 Mazda RX4 Wag   21       6   110  2.88        2

Eight clean lines that answer a real business question. Try doing that in Excel without a labyrinth of helper columns and VLOOKUP nightmares.

9.8.2 Example: Performance Report

“Label the transmissions and engine types. Group by both. Compute averages. Sort by fuel efficiency.”

Code

mtcars |>
 mutate(
 transmission = if_else(am == 1, "manual", "automatic"),
 engine = if_else(vs == 1, "straight", "V-shaped")
 ) |>
 group_by(transmission, engine) |>
 summarize(
 n = n(),
 avg_mpg = round(mean(mpg), 1),
 avg_hp = round(mean(hp), 1),
 avg_qsec = round(mean(qsec), 1),
 .groups = "drop"
 ) |>
 arrange(desc(avg_mpg))

# A tibble: 4 × 6
  transmission engine       n avg_mpg avg_hp avg_qsec
  <chr>        <chr>    <int>   <dbl>  <dbl>    <dbl>
1 manual       straight     7    28.4   80.6     18.7
2 automatic    straight     7    20.7  102.      20  
3 manual       V-shaped     6    19.8  181.      15.8
4 automatic    V-shaped    12    15.1  194.      17.1

This is exactly the kind of summary table your manager wants on their desk by Friday. Manual transmissions with straight engines get the best mileage (28.4 mpg), while automatic V-shaped engines are the thirstiest (15.1 mpg). Clean, grouped, sorted, readable.

9.8.3 Example: Species Comparison with Full Stats

Code

iris |>
 group_by(Species) |>
 summarize(
 across(everything(), list(mean = mean, sd = sd)),
 .groups = "drop"
 ) |>
 mutate(across(where(is.numeric), \(x) round(x, 2)))

# A tibble: 3 × 9
  Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
  <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
1 setosa                  5.01            0.35             3.43           0.38
2 versicolor              5.94            0.52             2.77           0.31
3 virginica               6.59            0.64             2.97           0.32
# ℹ 4 more variables: Petal.Length_mean <dbl>, Petal.Length_sd <dbl>,
#   Petal.Width_mean <dbl>, Petal.Width_sd <dbl>

This produces a comprehensive statistical profile for each species — mean and standard deviation for every numeric variable. Scanning across, virginica consistently has the largest petal measurements, while setosa has the smallest. This kind of “all stats for all groups” table is a staple of exploratory data analysis.

9.9 Joining Data – Connecting Your Tables

In the real world, data doesn’t live in one neat spreadsheet. Your company has a customer database, an orders database, a product catalog, and a returns log — all in different tables. Joins are how you connect them. Think of it like matching up customer names across two different spreadsheets so you can see the full picture.

9.9.1 Setting Up the Example

Let’s use a scenario every business student can relate to: a CRM with customer info and a separate order tracking system.

Code

customers <- tibble(
 customer_id = c(1, 2, 3, 4, 5),
 name = c("Alice", "Bob", "Carol", "David", "Eve"),
 region = c("East", "West", "East", "South", "West")
)
customers

# A tibble: 5 × 3
  customer_id name  region
        <dbl> <chr> <chr> 
1           1 Alice East  
2           2 Bob   West  
3           3 Carol East  
4           4 David South 
5           5 Eve   West

Code

orders <- tibble(
 customer_id = c(1, 2, 3, 6, 7),
 revenue = c(500, 320, 890, 150, 410),
 quarter = c("Q1", "Q1", "Q2", "Q1", "Q2")
)
orders

# A tibble: 5 × 3
  customer_id revenue quarter
        <dbl>   <dbl> <chr>  
1           1     500 Q1     
2           2     320 Q1     
3           3     890 Q2     
4           6     150 Q1     
5           7     410 Q2

Notice: David and Eve are in the CRM but haven’t placed orders. Customers 6 and 7 placed orders but aren’t in the CRM yet (maybe they signed up through a new channel).

9.9.2 left_join() – The One You’ll Use 90% of the Time

left_join() keeps every row from your main table (left) and pulls in matching data from the second table (right). No match? You get NA. It’s like doing a VLOOKUP from your customer list into the orders sheet.

Code

left_join(customers, orders, by = "customer_id") %>%
 DT::datatable(options = list(pageLength = 10, dom = 't'),
 caption = "Left join: all customers kept, with order data where available.")

All five customers are here. David and Eve show NA for revenue and quarter because they had no orders. Customers 6 and 7 from the orders table? Excluded — they weren’t in the customer list. This is your go-to join.

9.9.3 inner_join() – Only the Matches

inner_join() keeps only rows that appear in BOTH tables. It’s like saying “show me only customers who have actually bought something.”

Code

inner_join(customers, orders, by = "customer_id")

# A tibble: 3 × 5
  customer_id name  region revenue quarter
        <dbl> <chr> <chr>    <dbl> <chr>  
1           1 Alice East       500 Q1     
2           2 Bob   West       320 Q1     
3           3 Carol East       890 Q2

Only Alice, Bob, and Carol survive. They’re the only ones in both tables.

9.9.4 Other Joins (Quick Reference)

right_join() — like left_join() but keeps everything from the right table instead. (Rarely used — most people just swap the table order and use left_join().)

Code

right_join(customers, orders, by = "customer_id")

# A tibble: 5 × 5
  customer_id name  region revenue quarter
        <dbl> <chr> <chr>    <dbl> <chr>  
1           1 Alice East       500 Q1     
2           2 Bob   West       320 Q1     
3           3 Carol East       890 Q2     
4           6 <NA>  <NA>       150 Q1     
5           7 <NA>  <NA>       410 Q2

full_join() — keeps everything from both tables. The “I don’t want to lose ANY data” option.

Code

full_join(customers, orders, by = "customer_id") |>
 DT::datatable(options = list(pageLength = 10, dom = 't'),
 caption = "Full join: every row from both tables, with NA where no match exists.")

anti_join() — “Show me customers who DON’T have orders.” Perfect for finding gaps in your data, like identifying leads who haven’t converted.

Code

anti_join(customers, orders, by = "customer_id")

# A tibble: 2 × 3
  customer_id name  region
        <dbl> <chr> <chr> 
1           4 David South 
2           5 Eve   West

9.9.5 When Column Names Don’t Match

Sometimes the key column has different names in each table (because of course it does — the sales team and the marketing team never agree on naming conventions). Use join_by().

Code

orders_renamed <- orders |> rename(id = customer_id)

left_join(customers, orders_renamed, by = join_by(customer_id == id))

# A tibble: 5 × 5
  customer_id name  region revenue quarter
        <dbl> <chr> <chr>    <dbl> <chr>  
1           1 Alice East       500 Q1     
2           2 Bob   West       320 Q1     
3           3 Carol East       890 Q2     
4           4 David South       NA <NA>   
5           5 Eve   West        NA <NA>

9.9.6 Joining on Multiple Columns

You can join on more than one column — useful when a single column isn’t enough to uniquely identify a match (like needing both customer AND product to find the right price).

Code

order_details <- tibble(
 customer = c("A", "A", "B", "B"),
 product = c("X", "Y", "X", "Z"),
 quantity = c(10, 5, 8, 12)
)

prices <- tibble(
 customer = c("A", "A", "B", "C"),
 product = c("X", "Y", "X", "X"),
 price = c(2.50, 3.00, 2.75, 2.50)
)

inner_join(order_details, prices, by = c("customer", "product"))

# A tibble: 3 × 4
  customer product quantity price
  <chr>    <chr>      <dbl> <dbl>
1 A        X             10  2.5 
2 A        Y              5  3   
3 B        X              8  2.75

9.10 Other Handy Functions

A few more tools that come up constantly in practice.

9.10.1 count() – Quick Frequency Tables

count() is shorthand for “group by this, then count the rows.” You’ll use it more than you think.

Code

count(iris, Species)

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50

Code

# Sort by frequency
count(mtcars, cyl, sort = TRUE)

Code

# Count by multiple variables
count(mtcars, cyl, am)

  cyl am  n
1   4  0  3
2   4  1  8
3   6  0  4
4   6  1  3
5   8  0 12
6   8  1  2

9.10.2 The slice() Family – Rows by Position

Code

# First 5 rows
slice_head(iris, n = 5)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

Code

# Top 3 by horsepower
slice_max(mtcars, hp, n = 3)

                mpg cyl disp  hp drat   wt  qsec vs am gear carb
Maserati Bora  15.0   8  301 335 3.54 3.57 14.60  0  1    5    8
Ford Pantera L 15.8   8  351 264 4.22 3.17 14.50  0  1    5    4
Duster 360     14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
Camaro Z28     13.3   8  350 245 3.73 3.84 15.41  0  0    3    4

Code

# Bottom 3 by mpg
slice_min(mtcars, mpg, n = 3)

                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

Code

# Random sample
set.seed(42)
slice_sample(iris, n = 5)

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          5.3         3.7          1.5         0.2     setosa
2          5.6         2.9          3.6         1.3 versicolor
3          6.1         2.8          4.7         1.2 versicolor
4          6.7         3.0          5.2         2.3  virginica
5          5.6         2.8          4.9         2.0  virginica

9.10.3 distinct() – Remove Duplicates

Code

# Unique combinations of cylinders and transmission
distinct(mtcars, cyl, am)

                  cyl am
Mazda RX4           6  1
Datsun 710          4  1
Hornet 4 Drive      6  0
Hornet Sportabout   8  0
Merc 240D           4  0
Ford Pantera L      8  1

9.10.4 pull() – Extract a Column as a Vector

Sometimes you need a plain vector, not a data frame.

Code

mpg_values <- pull(mtcars, mpg)
mean(mpg_values)

[1] 20.09062

9.10.5 relocate() – Move Columns Around

Code

# Move Species to the front
relocate(iris, Species) |> head(3)

  Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1  setosa          5.1         3.5          1.4         0.2
2  setosa          4.9         3.0          1.4         0.2
3  setosa          4.7         3.2          1.3         0.2

9.10.6 bind_rows() – Stack Data Frames

When you have data split across separate tables with the same columns, stack them vertically — like combining regional sales spreadsheets into one master file.

Code

df1 <- tibble(x = 1:3, y = c("a", "b", "c"))
df2 <- tibble(x = 4:6, y = c("d", "e", "f"))
bind_rows(df1, df2)

# A tibble: 6 × 2
      x y    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c    
4     4 d    
5     5 e    
6     6 f

Code

# Track which table each row came from
bind_rows(first = df1, second = df2, .id = "source")

# A tibble: 6 × 3
  source     x y    
  <chr>  <int> <chr>
1 first      1 a    
2 first      2 b    
3 first      3 c    
4 second     4 d    
5 second     5 e    
6 second     6 f

9.11 Summary

Here’s the cheat sheet you’ll tape to your monitor:

filter() — pick the rows you want (sorting through job applicants)
select() — pick the columns you want (decluttering your spreadsheet)
mutate() — add or change columns (adding a formula to your sales report)
summarize() — compute summary statistics (the bottom of every pivot table)
arrange() — sort your data (A-to-Z, big-to-small)
group_by() — do any of the above by group (the actual pivot table)

Chain them together with the pipe (|>), add left_join() to connect your tables, count() for quick frequency tables, and case_when() for conditional logic, and you’ve got a toolkit that handles just about anything a business analyst needs.

The next chapter on tidyr (Chapter @ref(tidyr)) will show you how to reshape data between wide and long formats — the final piece of the data wrangling puzzle.