13 Customizing ggplot2 Visualizations

Code

library(tidyverse)

Code

library(plotly)

Code

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

The previous chapter covered the mechanics of building charts. This one covers customization — taking a chart from “communicates the data” to “publication-ready.” The default ggplot2 appearance is fine for exploratory work but rarely the right choice for slide decks, client reports, or anything that will be seen outside your console.

AI Pitfall: AI generates over-decorated charts that obscure the data

Ask an AI to “make this chart look professional” and you may get back code that adds drop shadows, gradients, custom backgrounds, three different fonts, and a logo. Each addition is technically possible. The cumulative effect is a chart that fights its own data for attention.

Edward Tufte’s principle of the data-ink ratio still applies: maximize the ink that represents data, minimize the ink that does not. AI is enthusiastic about adding visual elements; it is not opinionated about removing them. When an AI-generated chart starts to feel busy, that is usually a sign to remove decoration, not add more.

Three concrete checks for any “polished” chart before it ships:

- Does every visual element (gridline, color, axis tick) help the reader interpret the data, or is it just decoration?
- Are the colors meaningful (encoding categories or values) or just decorative?
- If you removed everything that is not data, would the chart still tell its story? If yes, the rest is probably overdecoration.

We will cover:

Themes – swap out the entire visual style in one line
Colors – use palettes that look polished and are accessible
Axis formatting – dollar signs, commas, percentages on your axes
Legends – move them, restyle them, or get rid of them
Annotations – draw attention to the things that matter
Combining plots – put multiple charts together for a complete story

We will use the built-in mpg, iris, and diamonds datasets throughout. This chapter also uses three additional packages — patchwork (combining plots), ggthemes (publication-style themes), and ggrepel (non-overlapping labels). Install them once if you haven’t already:

Code

install.packages(c("patchwork", "ggthemes", "ggrepel"))

13.1 Themes – One-Line Style Upgrades

13.1.1 Built-in Themes

Every ggplot2 plot uses a theme to control non-data elements like backgrounds, gridlines, and fonts. The default is theme_gray(). Here are the ones worth knowing:

Code

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
 geom_point(aes(color = class)) +
 labs(title = "Engine Displacement vs. Highway MPG")

library(patchwork)

p1 <- base_plot + theme_gray() + ggtitle("theme_gray() (default)")
p2 <- base_plot + theme_bw() + ggtitle("theme_bw()")
p3 <- base_plot + theme_minimal() + ggtitle("theme_minimal()")
p4 <- base_plot + theme_classic() + ggtitle("theme_classic()")
p5 <- base_plot + theme_light() + ggtitle("theme_light()")

(p1 | p2 | p3) / (p4 | p5)

Here is the quick rundown:

theme_bw() – Removes the gray background. Clean, great for printed reports.
theme_minimal() – Even cleaner. This is probably the most popular choice for business presentations.
theme_classic() – Looks like a textbook chart. Axis lines, no gridlines.
theme_light() – Light gray gridlines. A nice middle ground.

For most business work, theme_minimal() or theme_bw() will serve you well.

13.1.2 Themes from ggthemes

The ggthemes package provides themes inspired by well-known publications. If you want your chart to look like it belongs in The Economist or the Wall Street Journal:

Code

library(ggthemes)

t1 <- base_plot + theme_economist() + ggtitle("theme_economist()")
t2 <- base_plot + theme_fivethirtyeight() + ggtitle("theme_fivethirtyeight()")
t3 <- base_plot + theme_tufte() + ggtitle("theme_tufte()")
t4 <- base_plot + theme_wsj() + ggtitle("theme_wsj()")

(t1 | t2) / (t3 | t4)

Fun to play with, and occasionally the right choice for a specific audience.

13.1.3 Setting a Global Theme

Tired of adding + theme_minimal() to every plot? Set it once at the top of your script:

Code

theme_set(theme_minimal())

# Now every plot uses theme_minimal() automatically
ggplot(mpg, aes(x = class)) +
 geom_bar(fill = "steelblue") +
 labs(title = "Vehicle Class Counts", x = "Class", y = "Count")

Code

# Reset to default for the rest of the chapter
theme_set(theme_gray())
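One detail worth knowing: theme_set() invisibly returns the theme that was active before the call. That lets you restore whatever was in place instead of assuming it was theme_gray(). A minimal sketch (the variable name old_theme is just illustrative):

```r
library(ggplot2)

# theme_set() returns the previously active theme (invisibly)
old_theme <- theme_set(theme_minimal())

p <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Vehicle Class Counts", x = "Class", y = "Count")
p

# Restore whatever theme was active before, default or not
theme_set(old_theme)
```

This pattern is handy inside functions or shared scripts, where you cannot be sure which global theme the caller had set.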

13.2 Fine-Tuning with theme()

Sometimes you want a built-in theme but with a few tweaks. The theme() function lets you modify individual elements using four helpers:

Helper Function	Controls	Key Arguments
`element_text()`	Text	`family`, `face`, `size`, `color`, `angle`
`element_line()`	Lines	`color`, `linewidth`, `linetype`
`element_rect()`	Rectangles/borders	`fill`, `color`
`element_blank()`	Removes the element	(none)

13.2.1 Example: Making Titles Stand Out

Code

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
 geom_point(size = 2) +
 labs(
 title = "Fuel Efficiency by Engine Size",
 subtitle = "Larger engines tend to have lower highway mileage",
 caption = "Source: EPA fuel economy data (mpg dataset)",
 x = "Engine Displacement (liters)",
 y = "Highway MPG",
 color = "Vehicle Class"
 ) +
 theme(
 plot.title = element_text(face = "bold", size = 16, color = "navy"),
 plot.subtitle = element_text(face = "italic", size = 12, color = "gray40"),
 plot.caption = element_text(size = 9, hjust = 0, color = "gray50"),
 axis.title = element_text(face = "bold", size = 12)
 )

13.2.2 Example: Cleaning Up the Background

Code

ggplot(mpg, aes(x = cty, y = hwy)) +
 geom_point(alpha = 0.5, color = "darkorange") +
 labs(title = "City vs. Highway Fuel Economy") +
 theme(
 panel.background = element_rect(fill = "white"),
 plot.background = element_rect(fill = "gray95"),
 panel.grid.major = element_line(color = "gray80", linewidth = 0.5),
 panel.grid.minor = element_blank()
 )

13.2.3 Example: Rotating Long Axis Labels

When your category labels overlap, angle them:

Code

ggplot(mpg, aes(x = manufacturer)) +
 geom_bar(fill = "coral") +
 labs(title = "Vehicles by Manufacturer", x = NULL, y = "Count") +
 theme(
 axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
 plot.title = element_text(face = "bold", size = 14)
 )
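If you find yourself repeating the same theme() tweaks, one option is to bundle them into a small function that starts from a built-in theme. The sketch below is an illustration only: theme_report() is a hypothetical name, and the specific choices inside it are assumptions, not a recommended house style.

```r
library(ggplot2)

# Hypothetical helper: the name and the specific choices are illustrative
theme_report <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title       = element_text(face = "bold"),
      plot.subtitle    = element_text(color = "gray40"),
      panel.grid.minor = element_blank(),
      legend.position  = "bottom"
    )
}

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Fuel Efficiency by Engine Size",
    subtitle = "Styled with a reusable theme function"
  ) +
  theme_report()
p
```

Because the function returns an ordinary theme object, it also works with theme_set(theme_report()).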

13.3 Colors That Look Good

Color is one of the most impactful things you can change. ggplot2 gives you several ways to control it.

13.3.1 Manual Colors – Pick Your Own

Use scale_color_manual() (for points and lines) or scale_fill_manual() (for bars and areas) to set exact colors:

Code

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
 geom_point(size = 2) +
 scale_color_manual(
 values = c("f" = "#E41A1C", "r" = "#377EB8", "4" = "#4DAF4A"),
 labels = c("f" = "Front", "r" = "Rear", "4" = "Four-Wheel")
 ) +
 labs(title = "MPG by Engine Size and Drive Type", color = "Drive Type")

This is great when your company has brand colors or when you want precise control.
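When the same colors appear in several charts, it can help to define them once as named vectors and reuse them in both color and fill scales. The sketch below assumes the three drv values in mpg; the vector names drive_colors and drive_labels are made up for illustration.

```r
library(ggplot2)

# Hypothetical brand palette, keyed by the values of mpg$drv
drive_colors <- c("f" = "#E41A1C", "r" = "#377EB8", "4" = "#4DAF4A")
drive_labels <- c("f" = "Front", "r" = "Rear", "4" = "Four-Wheel")

# Guard against a level with no assigned color
# (it would otherwise be drawn in the scale's NA color)
stopifnot(all(unique(mpg$drv) %in% names(drive_colors)))

p_points <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = drive_colors, labels = drive_labels, name = "Drive Type")

p_bars <- ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar() +
  scale_fill_manual(values = drive_colors, labels = drive_labels, name = "Drive Type")

p_points
p_bars
```

Naming the vector elements means the mapping does not depend on the order of the factor levels.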

13.3.2 ColorBrewer – Pre-Made Palettes That Work

ColorBrewer palettes were designed by cartographer Cynthia Brewer to be visually clear and print-friendly. They are excellent defaults:

Code

RColorBrewer::display.brewer.all()

Three categories:

Sequential (Blues, Greens, OrRd) – for ordered data, low to high
Diverging (RdBu, RdYlGn) – for data with a meaningful midpoint
Qualitative (Set1, Set2, Dark2) – for unordered categories

Code

ggplot(mpg, aes(x = class, fill = class)) +
 geom_bar() +
 scale_fill_brewer(palette = "Set2") +
 labs(title = "Vehicle Counts by Class (Set2 palette)") +
 theme(legend.position = "none")

Code

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
 geom_point(size = 2.5) +
 scale_color_brewer(palette = "Dark2") +
 labs(title = "Iris Measurements (Dark2 palette)")
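Qualitative palettes have a fixed number of colors, so it is worth checking that a palette is large enough before mapping it to a variable with many categories. A short sketch using the RColorBrewer package directly (the variable names n_classes and max_set2 are illustrative):

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its size, category, and colorblind flag
brewer.pal.info[c("Set2", "Dark2", "Blues", "RdBu"), ]

# Compare the palette size with the number of categories you need to color
n_classes <- length(unique(ggplot2::mpg$class))
max_set2  <- brewer.pal.info["Set2", "maxcolors"]
n_classes <= max_set2

# Extract hex codes, e.g. to reuse them in scale_color_manual()
brewer.pal(n = 3, name = "Dark2")

# Show only the palettes flagged as colorblind-friendly
display.brewer.all(colorblindFriendly = TRUE)
```

If a variable has more levels than the palette has colors, the brewer scales cannot color every level, so a check like this saves a confusing plot later.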

13.3.3 Viridis – Colorblind-Friendly and Beautiful

The viridis palettes are perceptually uniform and work for people with color vision deficiencies. They are a great default choice, especially for continuous data:

Code

set.seed(42)  # fix the seed so the random sample (and the plots) are reproducible
diamonds_sample <- diamonds[sample(nrow(diamonds), 2000), ]

v1 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) +
 geom_point(alpha = 0.6) +
 scale_color_viridis_c(option = "viridis") +
 labs(title = "viridis")

v2 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) +
 geom_point(alpha = 0.6) +
 scale_color_viridis_c(option = "magma") +
 labs(title = "magma")

v3 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) +
 geom_point(alpha = 0.6) +
 scale_color_viridis_c(option = "plasma") +
 labs(title = "plasma")

v4 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) +
 geom_point(alpha = 0.6) +
 scale_color_viridis_c(option = "inferno") +
 labs(title = "inferno")

(v1 | v2) / (v3 | v4)

All four palettes go from dark to light, but the color journeys differ. Viridis (the default) runs from purple through green to yellow — the best all-rounder. Magma and inferno go through warm reds and oranges, making high values “pop” more dramatically. Plasma takes a purple-to-yellow path through pinks. All four are designed to be readable even when printed in black and white and are safe for people with color vision deficiencies. When in doubt, stick with the default viridis.
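The four panels above differ only in the option argument, so the comparison can also be written with a small helper and patchwork's wrap_plots(). This is a sketch under a few assumptions: viridis_panel() and palette_options are hypothetical names, and the seed value is arbitrary.

```r
library(ggplot2)
library(patchwork)

set.seed(42)  # arbitrary seed so the random sample is reproducible
diamonds_sample <- diamonds[sample(nrow(diamonds), 2000), ]

# Hypothetical helper: builds one panel for a given viridis option
viridis_panel <- function(option) {
  ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) +
    geom_point(alpha = 0.6) +
    scale_color_viridis_c(option = option) +
    labs(title = option)
}

palette_options <- c("viridis", "magma", "plasma", "inferno")
panels <- lapply(palette_options, viridis_panel)

# wrap_plots() arranges a list of plots; ncol = 2 gives a 2 x 2 grid
wrap_plots(panels, ncol = 2)
```

Adding a fifth palette (for example "cividis") then means adding one string rather than copying a whole plot.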

For discrete (categorical) data, use scale_color_viridis_d() or scale_fill_viridis_d():

Code

ggplot(mpg, aes(x = class, fill = class)) +
 geom_bar() +
 scale_fill_viridis_d(option = "plasma") +
 labs(title = "Vehicle Class with Viridis Discrete (Plasma)") +
 theme(legend.position = "none")

13.4 Axis Formatting – Dollars, Commas, and Percentages

Nothing says “this chart is for business people” like properly formatted axis labels.

13.4.1 The scales Package

The scales package gives you label formatters that handle the common cases:

Code

diamonds_summary <- diamonds %>%
 group_by(cut) %>%
 summarize(avg_price = mean(price), count = n())

ggplot(diamonds_summary, aes(x = cut, y = avg_price)) +
 geom_col(fill = "darkgreen") +
 scale_y_continuous(labels = scales::dollar) +
 labs(title = "Average Diamond Price by Cut", x = "Cut Quality", y = "Average Price")

Code

ggplot(diamonds_summary, aes(x = cut, y = count)) +
 geom_col(fill = "steelblue") +
 scale_y_continuous(labels = scales::comma) +
 labs(title = "Diamond Count by Cut", x = "Cut Quality", y = "Number of Diamonds")

Code

mpg_pct <- mpg %>%
 count(class) %>%
 mutate(pct = n / sum(n))

ggplot(mpg_pct, aes(x = reorder(class, pct), y = pct)) +
 geom_col(fill = "tomato") +
 scale_y_continuous(labels = scales::percent) +
 coord_flip() +
 labs(title = "Percentage of Vehicles by Class", x = NULL, y = "Percentage")
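The scales package also provides label_*() functions that build a formatter with your own settings, such as the number of decimals or a unit suffix. You can call the formatter on a plain vector to preview the labels before using it in a plot. A minimal sketch (the input numbers are made up):

```r
library(scales)

# label_*() functions return a formatter; call it on a vector to preview the result
label_dollar(accuracy = 1)(c(1234.56, 98765))   # "$1,235" "$98,765"
label_comma()(1234567)                          # "1,234,567"
label_percent(accuracy = 0.1)(0.256)            # "25.6%"

# Abbreviate large values: show thousands of dollars with a "K" suffix
label_dollar(scale = 1e-3, suffix = "K", accuracy = 1)(c(45000, 120000))  # "$45K" "$120K"
```

Any of these can be passed to a scale in place of scales::dollar, for example scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "K")).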

13.4.2 Log Scales

When your data spans several orders of magnitude (like revenue data that goes from $100 to $100M), log scales help:

Code

p_log <- ggplot(diamonds, aes(x = carat, y = price)) +
 geom_point(alpha = 0.05, color = "purple") +
 scale_x_log10() +
 scale_y_log10(labels = scales::dollar) +
 labs(title = "Diamond Price vs. Carat (Log Scale)", x = "Carat (log)", y = "Price (log)")

ggplotly(p_log)

On the original linear axes, the diamond data was a curved blob. On log scales, the relationship between carat and price becomes nearly linear, which means the relationship is multiplicative — doubling the carat weight roughly multiplies the price by a fixed factor rather than adding a fixed dollar amount. Log scales are your friend whenever data spans orders of magnitude or when exponential/power-law relationships are at play.
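To put a number on that "fixed factor," you can fit a straight line on the log-log scale. If log10(price) is roughly a + b * log10(carat), then price is proportional to carat raised to the power b, and doubling carat multiplies price by 2^b. The sketch below is a rough illustration with a plain linear model, not a careful pricing model:

```r
library(ggplot2)  # for the diamonds dataset

# Fit a straight line on the log-log scale: log10(price) = a + b * log10(carat)
fit <- lm(log10(price) ~ log10(carat), data = diamonds)
b <- unname(coef(fit)[2])
b

# Under price ~ carat^b, doubling carat multiplies price by 2^b
2^b
```

The slope b is the same whichever log base you use, as long as both axes use the same one.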

13.4.3 Zooming vs. Filtering – An Important Distinction

There are two ways to limit your axes, and they behave very differently:

coord_cartesian() zooms in without removing data. Trend lines and statistics stay accurate.
scale_*_continuous(limits = ...) removes data outside the range, which can change trend lines.

Code

p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
 geom_point() +
 geom_smooth(method = "lm")

p_scale <- p_base +
 scale_x_continuous(limits = c(3, 6)) +
 ggtitle("scale limits (data removed)")

p_coord <- p_base +
 coord_cartesian(xlim = c(3, 6)) +
 ggtitle("coord_cartesian (zoom only)")

p_scale | p_coord

See how the regression lines differ? The left panel’s line was fitted only to the data inside the limits (3–6 liters): values outside the scale limits are converted to NA and dropped before the fit, which ggplot2 normally reports in a warning about removed rows. The right panel’s line was fitted to all the data, and then the view was zoomed in. This distinction matters when you show a zoomed-in portion of a chart in a presentation: use coord_cartesian() so your trend lines and statistics reflect the full picture, not just the window.
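You can confirm the difference outside of ggplot2 by fitting the two regressions yourself. The sketch below mimics what each approach feeds to geom_smooth(method = "lm"); the object names are illustrative.

```r
library(ggplot2)  # for the mpg dataset

# What coord_cartesian() shows: a fit on all rows
slope_all <- unname(coef(lm(hwy ~ displ, data = mpg))[2])

# What scale limits produce: rows outside [3, 6] are dropped before fitting
mpg_window <- subset(mpg, displ >= 3 & displ <= 6)
slope_window <- unname(coef(lm(hwy ~ displ, data = mpg_window))[2])

c(all_data = slope_all, window_only = slope_window)

# Number of rows the scale limits discard
nrow(mpg) - nrow(mpg_window)
```

Two different slopes from the same chart code is exactly the kind of silent change that is easy to miss in a slide.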

13.5 Legend Customization

13.5.1 Moving the Legend Around

Code

base <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
 geom_point()

l1 <- base + theme(legend.position = "top") + ggtitle("Top")
l2 <- base + theme(legend.position = "bottom") + ggtitle("Bottom")
l3 <- base + theme(legend.position = "left") + ggtitle("Left")
l4 <- base + theme(legend.position = "right") + ggtitle("Right (default)")

(l1 | l2) / (l3 | l4)
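A legend with many keys can get cramped at the top or bottom. One way to handle that, sketched below, is to split the keys into rows with guide_legend() and to draw the keys larger and more opaque than the points themselves. The specific sizes are arbitrary choices for illustration.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(alpha = 0.6) +
  # Two rows of keys; override.aes changes how the keys are drawn, not the data
  guides(color = guide_legend(nrow = 2, override.aes = list(size = 3, alpha = 1))) +
  theme(legend.position = "bottom")
p_legend
```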

13.5.2 Removing the Legend

When the information is obvious from context (like a bar chart where color matches the x-axis labels), just remove it:

Code

ggplot(mpg, aes(x = class, fill = class)) +
 geom_bar() +
 theme(legend.position = "none") +
 labs(title = "Bar Chart Without Legend")

13.5.3 Customizing Legend Title and Labels

Code

p_custom_legend <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
 geom_point(size = 2) +
 scale_color_manual(
 values = c("setosa" = "#1b9e77", "versicolor" = "#d95f02", "virginica" = "#7570b3"),
 labels = c("Setosa", "Versicolor", "Virginica")
 ) +
 labs(color = "Iris Species", title = "Custom Legend Title and Labels")

ggplotly(p_custom_legend)
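An unnamed vector of labels is matched to the legend keys by position, so it depends on the order of the factor levels. Two safer options are a named vector or a function that transforms each level name. The sketch below uses stringr's str_to_title() as the labelling function; species_colors is an illustrative name.

```r
library(ggplot2)
library(stringr)

species_colors <- c("setosa" = "#1b9e77", "versicolor" = "#d95f02", "virginica" = "#7570b3")

p_species <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 2) +
  # A function passed to `labels` is applied to the level names
  scale_color_manual(values = species_colors, labels = str_to_title) +
  labs(color = "Iris Species", title = "Legend Labels from a Function")
p_species
```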

13.6 Annotations – Drawing Attention Where It Matters

13.6.1 annotate()

Add text, shapes, or arrows to specific spots on your chart:

Code

ggplot(mpg, aes(x = displ, y = hwy)) +
 geom_point(color = "gray50") +
 annotate("text", x = 6, y = 40, label = "Efficient outliers",
 color = "red", fontface = "bold", size = 5) +
 annotate("rect", xmin = 5.5, xmax = 7, ymin = 25, ymax = 45,
 alpha = 0.1, fill = "red", color = "red", linetype = "dashed") +
 labs(title = "Using annotate() for Text and Shapes")
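The example above covers text and a rectangle. For an arrow, annotate() accepts a "segment" (or "curve") geom with an arrow specification from the grid package. The coordinates and wording below are placeholders chosen by eye for the mpg scatter plot, so adjust them for your own data.

```r
library(ggplot2)

p_arrow <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "gray50") +
  annotate("text", x = 5.2, y = 38, label = "Large engines,\nabove-trend mileage",
           color = "red") +
  # Arrow from just below the text toward the cluster of points
  annotate("segment", x = 5.2, xend = 6, y = 35, yend = 27,
           color = "red",
           arrow = grid::arrow(length = grid::unit(0.25, "cm"))) +
  labs(title = "Using annotate() for an Arrow")
p_arrow
```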

13.6.2 geom_text() and geom_label() for Data-Driven Labels

Code

top_classes <- mpg %>%
 count(class) %>%
 arrange(desc(n)) %>%
 head(5)

ggplot(top_classes, aes(x = reorder(class, n), y = n)) +
 geom_col(fill = "steelblue") +
 geom_text(aes(label = n), hjust = -0.3, size = 5, fontface = "bold") +
 coord_flip() +
 scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
 labs(title = "Top 5 Vehicle Classes", x = NULL, y = "Count")
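geom_label() works the same way as geom_text() but draws a filled box behind the text, which helps when labels sit on top of gridlines or other marks. A small sketch that labels each bar with its share of vehicles (class_share is an illustrative name):

```r
library(ggplot2)
library(dplyr)

class_share <- mpg %>%
  count(class) %>%
  mutate(pct = n / sum(n))

p_share <- ggplot(class_share, aes(x = pct, y = reorder(class, pct))) +
  geom_col(fill = "steelblue") +
  # geom_label() draws a filled box behind the text
  geom_label(aes(label = scales::percent(pct, accuracy = 1)), hjust = -0.1, size = 3.5) +
  scale_x_continuous(labels = scales::percent, expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Share of Vehicles by Class", x = "Share", y = NULL)
p_share
```

Mapping the category to y directly gives horizontal bars without coord_flip().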

13.6.3 ggrepel for Non-Overlapping Labels

When labels pile on top of each other, the ggrepel package automatically spaces them out:

Code

library(ggrepel)

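# with_ties = FALSE keeps exactly one row per manufacturer and exactly 8 rows overall;
# by default slice_max() returns every tied row
best_mpg <- mpg %>%
 group_by(manufacturer) %>%
 slice_max(hwy, n = 1, with_ties = FALSE) %>%
 ungroup() %>%
 slice_max(hwy, n = 8, with_ties = FALSE)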

ggplot(mpg, aes(x = displ, y = hwy)) +
 geom_point(color = "gray70") +
 geom_point(data = best_mpg, color = "red", size = 3) +
 geom_label_repel(
 data = best_mpg,
 aes(label = paste(manufacturer, model)),
 size = 3,
 max.overlaps = 20,
 fill = "lightyellow"
 ) +
 labs(
 title = "Most Fuel-Efficient Vehicles by Manufacturer",
 subtitle = "Top 8 models highlighted",
 x = "Engine Displacement (L)",
 y = "Highway MPG"
 )

13.7 Combining Plots with patchwork

In real life, you often need to present multiple charts together – a scatter plot next to a bar chart, or a grid of related views. The patchwork package makes this easy with intuitive operators:

| places plots side by side
/ stacks plots vertically
() groups plots to control layout order

Code

p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
 geom_point(color = "steelblue") +
 labs(title = "Scatter Plot")

p_box <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
 geom_boxplot(show.legend = FALSE) +
 labs(title = "Box Plot")

p_bar <- ggplot(mpg, aes(x = class)) +
 geom_bar(fill = "coral") +
 labs(title = "Bar Chart") +
 theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_scatter | p_box | p_bar

13.7.1 Controlling Layout

Code

(p_scatter | p_box) / p_bar +
 plot_layout(heights = c(2, 1))
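For layouts that are awkward to express with | and /, plot_layout() also accepts a text "design" in which each letter stands for one plot and repeated letters span several cells. The sketch below rebuilds simplified versions of the three plots so it runs on its own; the layout string is just one possible arrangement.

```r
library(ggplot2)
library(patchwork)

p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(color = "steelblue")
p_box     <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot()
p_bar     <- ggplot(mpg, aes(x = class)) + geom_bar(fill = "coral")

# Letters map to plots in the order they are added; repeated letters span cells
layout <- "
AAB
CCC
"

combined <- p_scatter + p_box + p_bar +
  plot_layout(design = layout, heights = c(2, 1))
combined
```

Here the scatter plot takes two thirds of the top row and the bar chart spans the full bottom row.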

13.7.2 Adding a Shared Title and Tags

Tags automatically label your panels A, B, C – perfect for reports and papers:

Code

(p_scatter | p_box) / p_bar +
 plot_annotation(
 title = "Overview of the mpg Dataset",
 subtitle = "Three complementary views of the data",
 caption = "Source: EPA fuel economy data",
 tag_levels = "A"
 ) +
 plot_layout(heights = c(2, 1))

13.7.3 Sharing a Legend Across Plots

When multiple plots use the same color scheme, collect the legends into one:

Code

p_a <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
 geom_point()

p_b <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
 geom_point()

p_a + p_b +
 plot_layout(guides = "collect") +
 plot_annotation(title = "Shared Legend Between Plots")
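To style every panel of a combined figure at once, patchwork provides the & operator: a theme added with & is applied to all plots, whereas + only affects the last one. A sketch that collects the legend and moves it below both panels:

```r
library(ggplot2)
library(patchwork)

p_a <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()
p_b <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) + geom_point()

# `&` applies the theme to every panel; `+` would only modify the last one
shared <- (p_a + p_b) +
  plot_layout(guides = "collect") &
  theme_minimal() &
  theme(legend.position = "bottom")
shared
```

The complete theme comes first and the legend tweak second, so the tweak is not overwritten.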

13.8 Summary

Here is your customization toolkit at a glance:

Themes – Use theme_minimal() or theme_bw() for clean, professional charts. Try ggthemes for publication-specific styles. Set a global default with theme_set().
theme() tweaks – Fine-tune titles, axes, gridlines, and backgrounds with element_text(), element_line(), element_rect(), and element_blank().
Colors – Use scale_color_brewer() for categorical data, scale_color_viridis_c() for continuous data, and scale_color_manual() when you need exact colors.
Axis formatting – Use the scales package for dollar, comma, and percent labels. Use coord_cartesian() to zoom without distorting statistics.
Legends – Move with theme(legend.position = ...). Remove with "none". Customize labels in the scale function.
Annotations – Use annotate() for fixed labels, geom_text() for data-driven labels, and ggrepel to prevent overlaps.
Multi-panel figures – Use patchwork with |, /, plot_layout(), and plot_annotation() to combine charts into a cohesive story.

With these tools, you can take any default ggplot2 chart and turn it into something you would be proud to put in front of a client, a manager, or an audience of 500 people.

--- title: "Customizing ggplot2 Visualizations" --- # Customizing ggplot2 Visualizations {#ggplot2customize} ```{r} library(tidyverse) library(plotly) knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) ``` The previous chapter covered the mechanics of building charts. This one covers customization — taking a chart from "communicates the data" to "publication-ready." The default `ggplot2` appearance is fine for exploratory work but rarely the right choice for slide decks, client reports, or anything that will be seen outside your console. ::: {.callout-warning} ## AI Pitfall: AI generates over-decorated charts that obscure the data Ask an AI to "make this chart look professional" and you may get back code that adds drop shadows, gradients, custom backgrounds, three different fonts, and a logo. Each addition is technically possible. The cumulative effect is a chart that fights its own data for attention. Edward Tufte's principle of the data-ink ratio still applies: maximize the ink that represents data, minimize the ink that does not. AI is enthusiastic about adding visual elements; it is not opinionated about removing them. When an AI-generated chart starts to feel busy, that is usually a sign to remove decoration, not add more. Three concrete checks for any "polished" chart before it ships: - Does every visual element (gridline, color, axis tick) help the reader interpret the data, or is it just decoration? - Are the colors meaningful (encoding categories or values) or just decorative? - If you removed everything that is not data, would the chart still tell its story? If yes, the rest is probably overdecoration. 
::: We will cover: - **Themes** -- swap out the entire visual style in one line - **Colors** -- use palettes that look polished and are accessible - **Axis formatting** -- dollar signs, commas, percentages on your axes - **Legends** -- move them, restyle them, or get rid of them - **Annotations** -- draw attention to the things that matter - **Combining plots** -- put multiple charts together for a complete story We will use the built-in `mpg`, `iris`, and `diamonds` datasets throughout. This chapter also uses three additional packages --- **patchwork** (combining plots), **ggthemes** (publication-style themes), and **ggrepel** (non-overlapping labels). Install them once if you haven't already: ```{r} #| eval: false install.packages(c("patchwork", "ggthemes", "ggrepel")) ``` ## Themes -- One-Line Style Upgrades ### Built-in Themes Every ggplot2 plot uses a theme to control non-data elements like backgrounds, gridlines, and fonts. The default is `theme_gray()`. Here are the ones worth knowing: ```{r} base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + labs(title = "Engine Displacement vs. Highway MPG") library(patchwork) p1 <- base_plot + theme_gray() + ggtitle("theme_gray() (default)") p2 <- base_plot + theme_bw() + ggtitle("theme_bw()") p3 <- base_plot + theme_minimal() + ggtitle("theme_minimal()") p4 <- base_plot + theme_classic() + ggtitle("theme_classic()") p5 <- base_plot + theme_light() + ggtitle("theme_light()") (p1 | p2 | p3) / (p4 | p5) ``` Here is the quick rundown: - **`theme_bw()`** -- Removes the gray background. Clean, great for printed reports. - **`theme_minimal()`** -- Even cleaner. This is probably the most popular choice for business presentations. - **`theme_classic()`** -- Looks like a textbook chart. Axis lines, no gridlines. - **`theme_light()`** -- Light gray gridlines. A nice middle ground. For most business work, `theme_minimal()` or `theme_bw()` will serve you well. 
### Themes from ggthemes The **ggthemes** package provides themes inspired by well-known publications. If you want your chart to look like it belongs in The Economist or the Wall Street Journal: ```{r} library(ggthemes) t1 <- base_plot + theme_economist() + ggtitle("theme_economist()") t2 <- base_plot + theme_fivethirtyeight() + ggtitle("theme_fivethirtyeight()") t3 <- base_plot + theme_tufte() + ggtitle("theme_tufte()") t4 <- base_plot + theme_wsj() + ggtitle("theme_wsj()") (t1 | t2) / (t3 | t4) ``` Fun to play with, and occasionally the right choice for a specific audience. ### Setting a Global Theme Tired of adding `+ theme_minimal()` to every plot? Set it once at the top of your script: ```{r} theme_set(theme_minimal()) # Now every plot uses theme_minimal() automatically ggplot(mpg, aes(x = class)) + geom_bar(fill = "steelblue") + labs(title = "Vehicle Class Counts", x = "Class", y = "Count") ``` ```{r} # Reset to default for the rest of the chapter theme_set(theme_gray()) ``` ## Fine-Tuning with theme() Sometimes you want a built-in theme but with a few tweaks. 
The `theme()` function lets you modify individual elements using four helpers: | Helper Function | Controls | Key Arguments | |:---|:---|:---| | `element_text()` | Text | `family`, `face`, `size`, `color`, `angle` | | `element_line()` | Lines | `color`, `linewidth`, `linetype` | | `element_rect()` | Rectangles/borders | `fill`, `color` | | `element_blank()` | Removes the element | (none) | ### Example: Making Titles Stand Out ```{r} ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point(size = 2) + labs( title = "Fuel Efficiency by Engine Size", subtitle = "Larger engines tend to have lower highway mileage", caption = "Source: EPA fuel economy data (mpg dataset)", x = "Engine Displacement (liters)", y = "Highway MPG", color = "Vehicle Class" ) + theme( plot.title = element_text(face = "bold", size = 16, color = "navy"), plot.subtitle = element_text(face = "italic", size = 12, color = "gray40"), plot.caption = element_text(size = 9, hjust = 0, color = "gray50"), axis.title = element_text(face = "bold", size = 12) ) ``` ### Example: Cleaning Up the Background ```{r} ggplot(mpg, aes(x = cty, y = hwy)) + geom_point(alpha = 0.5, color = "darkorange") + labs(title = "City vs. Highway Fuel Economy") + theme( panel.background = element_rect(fill = "white"), plot.background = element_rect(fill = "gray95"), panel.grid.major = element_line(color = "gray80", linewidth = 0.5), panel.grid.minor = element_blank() ) ``` ### Example: Rotating Long Axis Labels When your category labels overlap, angle them: ```{r} ggplot(mpg, aes(x = manufacturer)) + geom_bar(fill = "coral") + labs(title = "Vehicles by Manufacturer", x = NULL, y = "Count") + theme( axis.text.x = element_text(angle = 45, hjust = 1, size = 9), plot.title = element_text(face = "bold", size = 14) ) ``` ## Colors That Look Good Color is one of the most impactful things you can change. ggplot2 gives you several ways to control it. 
### Manual Colors -- Pick Your Own Use `scale_color_manual()` (for points and lines) or `scale_fill_manual()` (for bars and areas) to set exact colors: ```{r} ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point(size = 2) + scale_color_manual( values = c("f" = "#E41A1C", "r" = "#377EB8", "4" = "#4DAF4A"), labels = c("f" = "Front", "r" = "Rear", "4" = "Four-Wheel") ) + labs(title = "MPG by Engine Size and Drive Type", color = "Drive Type") ``` This is great when your company has brand colors or when you want precise control. ### ColorBrewer -- Pre-Made Palettes That Work ColorBrewer palettes were designed by a cartographer to be visually clear and print-friendly. They are excellent defaults: ```{r} RColorBrewer::display.brewer.all() ``` Three categories: - **Sequential** (Blues, Greens, OrRd) -- for ordered data, low to high - **Diverging** (RdBu, RdYlGn) -- for data with a meaningful midpoint - **Qualitative** (Set1, Set2, Dark2) -- for unordered categories ```{r} ggplot(mpg, aes(x = class, fill = class)) + geom_bar() + scale_fill_brewer(palette = "Set2") + labs(title = "Vehicle Counts by Class (Set2 palette)") + theme(legend.position = "none") ``` ```{r} ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point(size = 2.5) + scale_color_brewer(palette = "Dark2") + labs(title = "Iris Measurements (Dark2 palette)") ``` ### Viridis -- Colorblind-Friendly and Beautiful The viridis palettes are perceptually uniform and work for people with color vision deficiencies. 
They are a great default choice, especially for continuous data: ```{r} diamonds_sample <- diamonds[sample(nrow(diamonds), 2000), ] v1 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) + geom_point(alpha = 0.6) + scale_color_viridis_c(option = "viridis") + labs(title = "viridis") v2 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) + geom_point(alpha = 0.6) + scale_color_viridis_c(option = "magma") + labs(title = "magma") v3 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) + geom_point(alpha = 0.6) + scale_color_viridis_c(option = "plasma") + labs(title = "plasma") v4 <- ggplot(diamonds_sample, aes(x = carat, y = price, color = depth)) + geom_point(alpha = 0.6) + scale_color_viridis_c(option = "inferno") + labs(title = "inferno") (v1 | v2) / (v3 | v4) ``` All four palettes go from dark to light, but the color journeys differ. **Viridis** (the default) runs from purple through green to yellow --- the best all-rounder. **Magma** and **inferno** go through warm reds and oranges, making high values "pop" more dramatically. **Plasma** takes a purple-to-yellow path through pinks. All four are designed to be readable even when printed in black and white and are safe for people with color vision deficiencies. When in doubt, stick with the default viridis. For discrete (categorical) data, use `scale_color_viridis_d()` or `scale_fill_viridis_d()`: ```{r} ggplot(mpg, aes(x = class, fill = class)) + geom_bar() + scale_fill_viridis_d(option = "plasma") + labs(title = "Vehicle Class with Viridis Discrete (Plasma)") + theme(legend.position = "none") ``` ## Axis Formatting -- Dollars, Commas, and Percentages Nothing says "this chart is for business people" like properly formatted axis labels. 
### The scales Package The **scales** package gives you label formatters that handle the common cases: ```{r} diamonds_summary <- diamonds %>% group_by(cut) %>% summarize(avg_price = mean(price), count = n()) ggplot(diamonds_summary, aes(x = cut, y = avg_price)) + geom_col(fill = "darkgreen") + scale_y_continuous(labels = scales::dollar) + labs(title = "Average Diamond Price by Cut", x = "Cut Quality", y = "Average Price") ``` ```{r} ggplot(diamonds_summary, aes(x = cut, y = count)) + geom_col(fill = "steelblue") + scale_y_continuous(labels = scales::comma) + labs(title = "Diamond Count by Cut", x = "Cut Quality", y = "Number of Diamonds") ``` ```{r} mpg_pct <- mpg %>% count(class) %>% mutate(pct = n / sum(n)) ggplot(mpg_pct, aes(x = reorder(class, pct), y = pct)) + geom_col(fill = "tomato") + scale_y_continuous(labels = scales::percent) + coord_flip() + labs(title = "Percentage of Vehicles by Class", x = NULL, y = "Percentage") ``` ### Log Scales When your data spans several orders of magnitude (like revenue data that goes from $100 to $100M), log scales help: ```{r} p_log <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.05, color = "purple") + scale_x_log10() + scale_y_log10(labels = scales::dollar) + labs(title = "Diamond Price vs. Carat (Log Scale)", x = "Carat (log)", y = "Price (log)") ggplotly(p_log) ``` On the original linear axes, the diamond data was a curved blob. On log scales, the relationship between carat and price becomes nearly linear, which means the relationship is *multiplicative* --- doubling the carat weight roughly multiplies the price by a fixed factor rather than adding a fixed dollar amount. Log scales are your friend whenever data spans orders of magnitude or when exponential/power-law relationships are at play. ### Zooming vs. Filtering -- An Important Distinction There are two ways to limit your axes, and they behave very differently: - **`coord_cartesian()`** zooms in *without removing data*. 
  Trend lines and statistics stay accurate.
- **`scale_*_continuous(limits = ...)`** *removes data* outside the range, which can change trend lines.

```{r}
p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

p_scale <- p_base +
  scale_x_continuous(limits = c(3, 6)) +
  ggtitle("scale limits (data removed)")

p_coord <- p_base +
  coord_cartesian(xlim = c(3, 6)) +
  ggtitle("coord_cartesian (zoom only)")

p_scale | p_coord
```

See how the regression lines differ? The left panel's line is steeper because it was fitted only to the visible data (3--6 liters), ignoring the smaller and larger engines. The right panel's line was fitted to *all* the data and then the view was zoomed in.

This distinction matters when you are showing a zoomed-in portion of a chart in a presentation --- use `coord_cartesian()` so your trend lines and statistics reflect the full picture, not just the window.

## Legend Customization

### Moving the Legend Around

```{r}
base <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

l1 <- base + theme(legend.position = "top") + ggtitle("Top")
l2 <- base + theme(legend.position = "bottom") + ggtitle("Bottom")
l3 <- base + theme(legend.position = "left") + ggtitle("Left")
l4 <- base + theme(legend.position = "right") + ggtitle("Right (default)")

(l1 | l2) / (l3 | l4)
```

### Removing the Legend

When the information is obvious from context (like a bar chart where color matches the x-axis labels), just remove it:

```{r}
ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar() +
  theme(legend.position = "none") +
  labs(title = "Bar Chart Without Legend")
```

### Customizing Legend Title and Labels

```{r}
p_custom_legend <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 2) +
  scale_color_manual(
    values = c("setosa" = "#1b9e77", "versicolor" = "#d95f02", "virginica" = "#7570b3"),
    labels = c("Setosa", "Versicolor", "Virginica")
  ) +
  labs(color = "Iris Species", title = "Custom Legend Title and Labels")

ggplotly(p_custom_legend)
```

## Annotations -- Drawing Attention Where It Matters

### annotate()

Add text, shapes, or arrows to specific spots on your chart:

```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "gray50") +
  annotate("text", x = 6, y = 40, label = "Efficient outliers",
           color = "red", fontface = "bold", size = 5) +
  annotate("rect", xmin = 5.5, xmax = 7, ymin = 25, ymax = 45,
           alpha = 0.1, fill = "red", color = "red", linetype = "dashed") +
  labs(title = "Using annotate() for Text and Shapes")
```

### geom_text() and geom_label() for Data-Driven Labels

```{r}
top_classes <- mpg %>%
  count(class) %>%
  arrange(desc(n)) %>%
  head(5)

ggplot(top_classes, aes(x = reorder(class, n), y = n)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = n), hjust = -0.3, size = 5, fontface = "bold") +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Top 5 Vehicle Classes", x = NULL, y = "Count")
```

### ggrepel for Non-Overlapping Labels

When labels pile on top of each other, the **ggrepel** package automatically spaces them out:

```{r}
library(ggrepel)

# with_ties = FALSE keeps exactly one row per manufacturer and exactly
# eight rows overall; the default (TRUE) returns extra rows on tied hwy values.
best_mpg <- mpg %>%
  group_by(manufacturer) %>%
  slice_max(hwy, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  slice_max(hwy, n = 8, with_ties = FALSE)

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "gray70") +
  geom_point(data = best_mpg, color = "red", size = 3) +
  geom_label_repel(
    data = best_mpg,
    aes(label = paste(manufacturer, model)),
    size = 3,
    max.overlaps = 20,
    fill = "lightyellow"
  ) +
  labs(
    title = "Most Fuel-Efficient Vehicles by Manufacturer",
    subtitle = "Top 8 models highlighted",
    x = "Engine Displacement (L)",
    y = "Highway MPG"
  )
```

## Combining Plots with patchwork

In real life, you often need to present multiple charts together -- a scatter plot next to a bar chart, or a grid of related views.
The **patchwork** package makes this easy with intuitive operators:

- `|` places plots side by side
- `/` stacks plots vertically
- `()` groups plots to control layout order

```{r}
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue") +
  labs(title = "Scatter Plot")

p_box <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Box Plot")

p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "coral") +
  labs(title = "Bar Chart") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_scatter | p_box | p_bar
```

### Controlling Layout

```{r}
(p_scatter | p_box) / p_bar +
  plot_layout(heights = c(2, 1))
```

### Adding a Shared Title and Tags

Tags automatically label your panels A, B, C -- perfect for reports and papers:

```{r}
(p_scatter | p_box) / p_bar +
  plot_annotation(
    title = "Overview of the mpg Dataset",
    subtitle = "Three complementary views of the data",
    caption = "Source: EPA fuel economy data",
    tag_levels = "A"
  ) +
  plot_layout(heights = c(2, 1))
```

### Sharing a Legend Across Plots

When multiple plots use the same color scheme, collect the legends into one:

```{r}
p_a <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

p_b <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point()

p_a + p_b +
  plot_layout(guides = "collect") +
  plot_annotation(title = "Shared Legend Between Plots")
```

## Summary

Here is your customization toolkit at a glance:

1. **Themes** -- Use `theme_minimal()` or `theme_bw()` for clean, professional charts. Try `ggthemes` for publication-specific styles. Set a global default with `theme_set()`.
2. **theme() tweaks** -- Fine-tune titles, axes, gridlines, and backgrounds with `element_text()`, `element_line()`, `element_rect()`, and `element_blank()`.
3. **Colors** -- Use `scale_color_brewer()` for categorical data, `scale_color_viridis_c()` for continuous data, and `scale_color_manual()` when you need exact colors.
4. **Axis formatting** -- Use the `scales` package for `dollar`, `comma`, and `percent` labels. Use `coord_cartesian()` to zoom without distorting statistics.
5. **Legends** -- Move with `theme(legend.position = ...)`. Remove with `"none"`. Customize labels in the scale function.
6. **Annotations** -- Use `annotate()` for fixed labels, `geom_text()` for data-driven labels, and `ggrepel` to prevent overlaps.
7. **Multi-panel figures** -- Use patchwork with `|`, `/`, `plot_layout()`, and `plot_annotation()` to combine charts into a cohesive story.

With these tools, you can take any default ggplot2 chart and turn it into something you would be proud to put in front of a client, a manager, or an audience of 500 people.
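
If you want to check the zoom-versus-filter point numerically rather than by eye, the sketch below may help. It assumes that `geom_smooth(method = "lm")` fits the same model as `lm(hwy ~ displ)`, and it mimics `scale_x_continuous(limits = c(3, 6))` by dropping rows outside the window before fitting. The names `fit_all`, `mpg_window`, and `fit_window` are illustrative only and do not appear elsewhere in this chapter.

```{r}
library(ggplot2)  # loaded here only for the mpg dataset

# Fit on every row: what the trend line is based on under coord_cartesian(),
# which changes the visible window but leaves the data untouched.
fit_all <- lm(hwy ~ displ, data = mpg)

# Fit on rows inside the window only: an approximation of what happens when
# scale limits discard out-of-range rows before the statistic is computed.
mpg_window <- subset(mpg, displ >= 3 & displ <= 6)
fit_window <- lm(hwy ~ displ, data = mpg_window)

# How many rows each fit used
c(rows_all = nrow(mpg), rows_window = nrow(mpg_window))

# The two slopes, side by side
c(
  slope_all    = unname(coef(fit_all)["displ"]),
  slope_window = unname(coef(fit_window)["displ"])
)
```

Comparing the two slopes shows how much of the difference between the panels comes from the discarded rows rather than from the zoom itself.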