20 R Markdown

Code

library(knitr)

Code

opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)

After an analysis is done, the next problem is communicating it. The default workflow — run code in R, copy numbers into Word, screenshot the charts, paste them into slides — falls apart the moment the data is updated. Every change forces a manual rebuild, and there is no good way to tell what is current versus stale.

R Markdown solves this by combining the writing and the code in a single document. Render once and you get a polished HTML page, PDF, or Word document with all the charts, tables, and numbers regenerated from the live code. Update the data, render again, and the entire document rebuilds.

Quarto vs R Markdown — what to use in 2026

R Markdown (.Rmd) is the original. Quarto (.qmd) is the next-generation framework Posit released in 2022; this book is written in Quarto. The differences for a beginner are minor — Quarto uses YAML chunk options (#| label:, #| echo: false) instead of the older inline form, supports more output formats out of the box, and has a more consistent CLI. Everything in this chapter applies to both. New projects should use Quarto; existing R Markdown projects work fine and do not need to be migrated.

AI Pitfall: AI generates YAML headers that produce wrong output

The YAML header at the top of an R Markdown or Quarto file controls everything from output format to figure dimensions to citation handling. AI assistants sometimes generate headers that work for the example they were shown but break for your context.

Common failures:

Mixing R Markdown and Quarto YAML. AI may give you output: html_document (R Markdown style) when you are working in Quarto, where the equivalent is format: html. The render fails or produces unexpected results.
Mixed chunk-option syntax. R Markdown chunk options such as fig.cap and fig.height are written with a dot in chunk headers, but in Quarto’s pipe-comment style they become fig-cap and fig-height (note the hyphen instead of the dot). AI sometimes mixes the two.
Bibliography paths that do not exist. AI confidently generates bibliography: book.bib even when no book.bib file exists in your project, and the render fails because the file cannot be found.

Always render the document immediately after creating or modifying YAML to catch these. Errors at render time are easy to diagnose; errors that produce wrong-looking output without erroring are not.
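Two of these failures can also be spotted before rendering. The sketch below is only an illustration: it assumes the rmarkdown package is installed, and the file contents and the variable names (doc, header, mixed_keys, bib_missing) are hypothetical.

```r
# Hypothetical header check; the file contents and variable names are illustrative only.
library(rmarkdown)

# Write a small document whose header mixes both styles and names a missing .bib file.
doc <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Header check\"",
  "output: html_document",
  "format: html",
  "bibliography: book.bib",
  "---",
  "",
  "Body text."
), doc)

# Parse only the YAML header into a named list.
header <- yaml_front_matter(doc)

# R Markdown reads `output:` and Quarto reads `format:`; seeing both suggests a mix-up.
mixed_keys <- all(c("output", "format") %in% names(header))

# A relative bibliography path is resolved against the document's folder.
bib_missing <- !file.exists(file.path(dirname(doc), header$bibliography))

mixed_keys
bib_missing
```

Both flags come back TRUE for this deliberately broken header. A check like this does not replace rendering, which remains the real test.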

20.1 What is R Markdown?

An R Markdown file (.Rmd) is a document where prose and R code live in the same file. When you render (or knit) it, three things happen:

R runs all your code.
It grabs the output—tables, charts, printed results.
It stitches everything together into a finished document.

The result can be an HTML page, a PDF, a Word doc, slides, or even a full website (this book is written in Quarto, R Markdown’s successor, which works the same way). Think of it as a self-updating report.

You need two packages to make this work:

Code

install.packages("rmarkdown")
install.packages("knitr")

That’s it. RStudio already has Pandoc (the engine that converts everything) built in, so you’re good to go.

20.2 The Three Components of an R Markdown Document

Every .Rmd file has exactly three types of content:

YAML header – The settings block at the top. Think of it as the “cover page setup.”
Markdown text – Your actual writing: paragraphs, headers, bullet points.
Code chunks – The R code that produces your numbers and charts.

Here is the simplest R Markdown document you can write:

---
title: "My First Report"
author: "Your Name"
date: "2024-01-15"
output: html_document
---

## Introduction

This is my first R Markdown document.

```{r}
summary(iris)
```

Let’s break each piece down.

20.3 YAML Header

The YAML header sits at the very top of your file, sandwiched between two lines of ---. It tells R Markdown the basics: what’s this document called, who wrote it, and what format should the output be?

20.3.1 The Essentials

Here is a perfectly good YAML header:

---
title: "Quarterly Sales Report"
author: "Vivek H. Patil"
date: "2024-09-01"
output: html_document
---

That’s really all you need. Four lines, and you’re in business.

Pro tip: Want the date to update automatically every time you knit? Use this:

date: "`r Sys.Date()`"

Now your report always shows today’s date. No more “wait, is this the March version or the April version?”
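To preview what a date expression will print before knitting, you can run it in the console. This is a minimal sketch using a fixed example date (report_date is a hypothetical name) so the result does not depend on when it runs.

```r
# A fixed example date, so the result does not depend on when this runs.
report_date <- as.Date("2024-09-01")

# Default formatting is year-month-day.
format(report_date)

# %B = full month name (in the current locale), %d = two-digit day, %Y = four-digit year.
format(report_date, "%B %d, %Y")
```

In the YAML header you would use Sys.Date() in place of the fixed date.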

20.3.2 Picking Your Output Format

The three formats you’ll actually use:

Format	YAML value	When to use it
HTML	`html_document`	Day-to-day reports, sharing via email
Word	`word_document`	When your boss needs it in Word
PDF	`pdf_document`	Formal reports (requires LaTeX)

HTML is the default, and honestly, it’s the best for most business use cases. It’s interactive, looks great, and you can email it as a single file.

20.3.3 Making It Look Good

You can add a floating table of contents, pick a theme, and let readers show/hide code—all from the YAML header:

---
title: "Sales Analysis Q3 2024"
author: "Vivek H. Patil"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output:
  html_document:
    toc: true
    toc_float: true
    theme: flatly
    code_folding: hide
---

Here’s what each of those options does:

toc: true – Adds a table of contents (generated from your headers).
toc_float: true – Makes the table of contents float on the side as you scroll. Very handy for long reports.
theme – Changes the visual style. Try cerulean, cosmo, flatly, journal, readable, or united.
code_folding: hide – Hides all code by default, but readers can click to reveal it. Perfect for reports where your manager doesn’t want to see code but your analyst colleague does.

That’s really all the YAML you need to know. If you want to go deeper, Section 20.7 has more detail.

20.4 Markdown Syntax

The writing part of your document uses Markdown—a simple way to format text without messing around in a toolbar. If you’ve ever used Slack formatting or Reddit, you’ve basically used Markdown.

20.4.1 Headers

Use # symbols to create headings. More # signs = smaller heading:

# Big Chapter Title
## Section
### Subsection

20.4.2 Bold, Italic, and Friends

*italic* gives you italic
**bold** gives you bold
***bold italic*** gives you bold italic
~~strikethrough~~ gives you ~~strikethrough~~

20.4.3 Lists

Bullet points:

- Revenue grew 12%
- Costs decreased 3%
    - Labor costs down 5%
    - Materials costs up 2%
- Net profit up 15%

Which renders as:

Revenue grew 12%
Costs decreased 3%
    Labor costs down 5%
    Materials costs up 2%
Net profit up 15%

Numbered lists:

1. Pull the data
2. Clean the data
3. Analyze the data
4. Pretend the data was clean all along

20.4.4 Links

[The R Project](https://www.r-project.org/)

Renders as: The R Project

20.4.5 Tables

You can type tables by hand using pipes and dashes:

| Region | Revenue | Growth |
|:--------|--------:|:------:|
| West | $45M | 12% |
| East | $38M | 8% |
| Central | $29M | -2% |

Renders as:

Region	Revenue	Growth
West	$45M	12%
East	$38M	8%
Central	$29M	-2%

The colons control alignment: :--- is left, ---: is right, :---: is centered.

For real data-driven tables, you’ll want to generate them from R (see Section 20.8).
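As a small preview, the hand-typed table above could also be produced from a data frame. The sketch assumes the knitr package is installed; sales and sales_md are hypothetical names, and the values are copied from the table above.

```r
library(knitr)

# The hand-typed table above, rebuilt as a data frame (values copied from it).
sales <- data.frame(
  Region  = c("West", "East", "Central"),
  Revenue = c("$45M", "$38M", "$29M"),
  Growth  = c("12%", "8%", "-2%")
)

# format = "pipe" asks kable() for a Markdown pipe table;
# align takes one letter per column: l = left, r = right, c = center.
sales_md <- kable(sales, format = "pipe", align = c("l", "r", "c"))
print(sales_md)
```

The second line of the printed table carries the same colon markers described above, so you can see how align maps onto the hand-written syntax.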

20.4.6 Blockquotes

Prefix a line with >:

> "Revenue is vanity, profit is sanity, cash is reality."

Renders as:

“Revenue is vanity, profit is sanity, cash is reality.”

20.4.7 Math Equations

Yes, you can write fancy equations if you need to. Most of you won’t need this often, but if you’re in a finance or econ class and your professor wants formulas, here’s the gist:

Inline math uses single dollar signs: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ produces $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Display math uses double dollar signs:

$$ \hat{\beta} = (X^T X)^{-1} X^T y $$

If you need more than that, Google “LaTeX math symbols” and you’ll find everything.

20.5 Code Chunks

Code chunks are where the magic happens. This is the R code that actually does your analysis.

20.5.1 Inserting a Code Chunk

A code chunk starts with ```{r} and ends with ```. The keyboard shortcut in RStudio is Ctrl+Alt+I (Windows) or Cmd+Option+I (Mac). Use it. Your fingers will thank you.

Code

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

20.5.2 Naming Your Chunks

Give your chunks a name right after the r, as in the mpg-summary chunk below. It makes life easier when debugging (“Error in chunk revenue-plot” is way more helpful than “Error in unnamed-chunk-47”):

```{r mpg-summary}
summary(mtcars$mpg)
```

Code

summary(mtcars$mpg)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90

20.5.3 The Chunk Options You Actually Need

There are dozens of chunk options, but here are the ones that matter for 95% of business reports:

20.5.3.1 Show/Hide Code and Output

Option	Default	What it does
`echo`	`TRUE`	Show the code in the report?
`eval`	`TRUE`	Actually run the code?
`include`	`TRUE`	Show anything (code + output) in the document?

echo=FALSE is your best friend for client-facing reports. It hides the code but shows the result:

Code

mean(mtcars$mpg)

[1] 20.09062

With echo=FALSE set on that chunk, the rendered report shows only the result, [1] 20.09062; the mean(mtcars$mpg) call that produced it stays hidden. Your VP doesn’t need to see R code. They need the answer.
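The effect of echo=FALSE can be checked without building a full report. The sketch below assumes knitr is installed and uses knitr::knit() with its text argument to process a one-chunk document held in memory; fence, src, and out are hypothetical names.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built here so the example nests cleanly

# A one-chunk document: the same code as above, with echo=FALSE in the chunk header.
src <- c(
  paste0(fence, "{r, echo=FALSE}"),
  "mean(mtcars$mpg)",
  fence
)

# knit() runs the chunk and returns the resulting Markdown as a string.
out <- knit(text = src, quiet = TRUE)
cat(out)

# The result is present in the output; the code that produced it is not.
grepl("20.09", out, fixed = TRUE)
grepl("mean(mtcars$mpg)", out, fixed = TRUE)
```

The first check returns TRUE and the second FALSE. Changing the header to echo=TRUE flips the second check.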

eval=FALSE shows code without running it. Good for “here’s how you would install this package” examples:

Code

install.packages("tidyverse")

include=FALSE runs the code silently—no code, no output in the report. Perfect for setup chunks where you load packages:

```{r setup, include=FALSE}
library(tidyverse)
library(knitr)
```

20.5.3.2 Silencing Messages and Warnings

Option	Default	What it does
`message`	`TRUE`	Show messages (like package loading info)?
`warning`	`TRUE`	Show warnings?

When you load packages, R loves to tell you about every attached namespace and masked function. Nobody reading your report cares:

Code

library(ggplot2)
library(dplyr)

Set message=FALSE and warning=FALSE and enjoy the silence.
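Here is a minimal sketch of what those two options do, again using knitr::knit() on an in-memory document. It assumes knitr is installed; the message and warning text are made up for the demonstration.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks

# One chunk that emits a message and a warning, then prints a value.
src <- c(
  paste0(fence, "{r, message=FALSE, warning=FALSE}"),
  "message('note: loading data')",
  "warning('heads up')",
  "nrow(mtcars)",
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted output still echoes the code and the printed value, but contains no message or warning lines.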

20.5.3.3 Controlling Figure Size

Option	Default	What it does
`fig.width`	`7`	Width in inches
`fig.height`	`5`	Height in inches
`fig.cap`	`NULL`	Caption for the figure
`fig.align`	`"default"`	Alignment: `"left"`, `"center"`, `"right"`
`out.width`	`NULL`	Width in the output (e.g., `"80%"`)

Code

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  labs(title = "Iris: Sepal Dimensions",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)") +
  theme_minimal()

20.5.4 Global Chunk Options

Tired of typing message=FALSE, warning=FALSE on every single chunk? Set defaults for the entire document in a setup chunk at the top:

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.align = "center"
)
```

Any individual chunk can still override these. Think of it like setting company-wide defaults, with each department free to customize.
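The override behavior can be seen in a small sketch, assuming knitr is installed. It sets a document-wide default of echo = FALSE, knits two in-memory chunks, and then restores the previous defaults; old_opts, src, and out are hypothetical names.

```r
library(knitr)

fence <- strrep("`", 3)        # three backticks
old_opts <- opts_chunk$get()   # remember current defaults so they can be restored

# Document-wide default: hide code everywhere.
opts_chunk$set(echo = FALSE)

src <- c(
  paste0(fence, "{r}"),              # inherits echo = FALSE
  "nrow(mtcars)",
  fence,
  "",
  paste0(fence, "{r, echo=TRUE}"),   # overrides the default for this chunk only
  "ncol(mtcars)",
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)

opts_chunk$restore(old_opts)   # put the defaults back
```

Only the second chunk's code appears in the output, while both results are printed.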

20.6 Inline R Code

This is one of the most underrated features of R Markdown. You can drop R results right into your sentences.

Instead of writing “the average MPG is 20.1” (and hoping you copied the right number), you write:

The average MPG is `r round(mean(mtcars$mpg), 1)`.

And it renders as: The average MPG is 20.1.

Here’s a more realistic example. Let’s say you computed some key numbers:

Code

avg_mpg <- mean(mtcars$mpg)
max_hp <- max(mtcars$hp)
n_cyl6 <- sum(mtcars$cyl == 6)

Now you can write sentences that update themselves:

The average fuel efficiency across all cars is `r round(avg_mpg, 1)` miles per gallon.
The most powerful car has `r max_hp` horsepower.
There are `r n_cyl6` cars with 6 cylinders in the dataset.

When knitted, the three expressions are replaced by 20.1, 335, and 7.

If the data changes, these numbers update automatically the next time you knit. No more copy-paste errors. No more “wait, did I update that number on slide 14?” This is the kind of thing that separates a good analyst from one who stays late fixing reports.
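To watch inline code being replaced, you can knit a short in-memory document. This is a sketch that assumes knitr is installed; tick, src, and out are hypothetical names, and the sentence is a shortened version of the ones above.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks
tick  <- "`"             # a single backtick, used to delimit inline code

src <- c(
  paste0(fence, "{r, include=FALSE}"),   # compute silently
  "avg_mpg <- mean(mtcars$mpg)",
  "max_hp <- max(mtcars$hp)",
  fence,
  "",
  paste0("Average: ", tick, "r round(avg_mpg, 1)", tick, " mpg. ",
         "Top horsepower: ", tick, "r max_hp", tick, ".")
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted text reads "Average: 20.1 mpg. Top horsepower: 335." with no trace of the R expressions.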

20.7 Output Formats in Detail

20.7.1 HTML Documents

HTML is the Swiss Army knife of output formats. Here’s a full-featured setup:

---
output:
  html_document:
    toc: true
    toc_float: true
    number_sections: true
    theme: flatly
    code_folding: hide
    df_print: paged
---

df_print: paged – Makes data frames display as nice, paginated tables readers can click through. Way better than a wall of text.
self_contained: true (the default) – Embeds everything into one HTML file. You can email it to anyone and it just works. No broken image links.

20.7.2 PDF Documents

PDF output looks polished and professional, but it needs a LaTeX installation. Easiest path:

Code

install.packages("tinytex")
tinytex::install_tinytex()

Then in your YAML:

---
output:
  pdf_document:
    toc: true
    number_sections: true
---

20.7.3 Word Documents

When your stakeholder absolutely, positively needs a .docx:

---
output:
  word_document:
    toc: true
    reference_docx: my-styles.docx
---

The reference_docx option is clever—you hand it a Word file with your company’s fonts and styles, and R Markdown applies them to the generated document. Brand guidelines? Handled.

20.8 Tables in R Markdown

Hand-typing tables in Markdown is fine for small stuff, but for real data, let R build your tables.

20.8.1 knitr::kable()

The simplest way to turn a data frame into a clean table:

Code

kable(head(iris), caption = "First six rows of the iris dataset.")

First six rows of the iris dataset.
Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

You can control column names, alignment, and decimal places:

Code

mtcars_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    Count = n(),
    Avg_MPG = mean(mpg),
    Avg_HP = mean(hp),
    Avg_WT = mean(wt)
  )

kable(mtcars_summary,
      digits = 2,
      col.names = c("Cylinders", "Count", "Avg MPG", "Avg HP", "Avg Weight"),
      align = c("c", "c", "r", "r", "r"),
      caption = "Summary statistics of mtcars by number of cylinders.")

Summary statistics of mtcars by number of cylinders.
Cylinders	Count	Avg MPG	Avg HP	Avg Weight
4	11	26.66	82.64	2.29
6	7	19.74	122.29	3.12
8	14	15.10	209.21	4.00

20.8.2 kableExtra for Fancy Styling

Want boardroom-ready tables? The kableExtra package has you covered:

Code

install.packages("kableExtra")

Code

library(kableExtra)

kable(mtcars_summary,
      digits = 2,
      col.names = c("Cylinders", "Count", "Avg MPG", "Avg HP", "Avg Weight"),
      caption = "Styled summary of mtcars by cylinder count.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center") %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3366cc")

Styled summary of mtcars by cylinder count.
Cylinders	Count	Avg MPG	Avg HP	Avg Weight
4	11	26.66	82.64	2.29
6	7	19.74	122.29	3.12
8	14	15.10	209.21	4.00

Striped rows, hover effects, bold headers with a branded color—this is the kind of table that makes people think you spent way more time than you actually did.

20.9 Figures and Images

20.9.1 R-Generated Plots

Any code chunk that produces a plot automatically includes it in your document. Control the size and add a caption:

Code

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(show.legend = FALSE) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

Use out.width for percentage-based sizing, which usually works better than inches:

Code

ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Petal Length (cm)", y = "Density") +
  theme_minimal()

20.9.2 Side-by-Side Plots

Want two charts next to each other? Use fig.show="hold" with out.width="50%":

Code

ggplot(mtcars, aes(x = wt, y = mpg)) +
 geom_point() +
 labs(title = "MPG vs Weight") +
 theme_minimal()

Code

ggplot(mtcars, aes(x = hp, y = mpg)) +
 geom_point() +
 labs(title = "MPG vs Horsepower") +
 theme_minimal()

20.9.3 External Images

To include an image file (like a company logo or a screenshot), use knitr::include_graphics() inside a code chunk:

Code

knitr::include_graphics("https://www.r-project.org/Rlogo.png")

20.10 Cross-Referencing

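When your report gets longer than a few pages, you’ll want to say things like “as shown in Figure 3” or “see Table 2.” R Markdown handles the numbering for you automatically, provided you use one of the bookdown output formats (for example, bookdown::html_document2 in place of html_document).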

20.10.1 Figures

To cross-reference a figure, your chunk needs two things: a name and a fig.cap. Then use \@ref(fig:chunk-name) in your text. For example, if a chunk named iris-scatter produces a captioned plot, “Figure \@ref(fig:iris-scatter)” renders as “Figure” followed by that plot’s number.

20.10.2 Tables

Same idea: \@ref(tab:chunk-name) where the chunk has a kable() with a caption. For example, “Table \@ref(tab:kable-formatted)” renders as “Table” followed by the number of the table produced by the kable-formatted chunk.

20.10.3 Sections

Give a section a label with {#label}, then reference it with \@ref(label). The YAML header section, for instance, is labeled {#yaml-header}, so “Section \@ref(yaml-header)” renders as “Section 20.3”.

20.10.4 Quick Reference

Element	How to label it	How to reference it
Figure	Chunk name + `fig.cap`	`\@ref(fig:chunk-name)`
Table	Chunk name + `kable(caption=...)`	`\@ref(tab:chunk-name)`
Section	`## Title {#label}`	`\@ref(label)`

20.11 Citations and Bibliography

If you’re writing a research paper or a thesis-style report, R Markdown handles citations. Add a .bib file to your YAML:

---
bibliography: references.bib
link-citations: true
---

Then cite with @key (in-text) or [@key] (parenthetical). The bibliography appears at the end automatically. If your professor or journal requires citations, this will save you hours compared to doing it by hand.
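If you cite R packages, knitr can write the .bib entries for you. The sketch below assumes knitr and rmarkdown are installed and writes to a temporary path for the demonstration; in a real project you would point file at the references.bib named in your YAML.

```r
library(knitr)

# Write BibTeX entries for two packages to a .bib file (a temp path for this sketch).
bib_file <- file.path(tempdir(), "references.bib")
write_bib(c("knitr", "rmarkdown"), file = bib_file)

# Each entry starts with "@Type{key,"; the key is what you cite in the text.
bib_lines <- readLines(bib_file)
grep("^@", bib_lines, value = TRUE)
```

The generated keys include R-knitr, so a citation in the text would be written as @R-knitr or [@R-knitr].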

20.12 Rendering Documents

20.12.1 The Knit Button

Click Knit at the top of the RStudio editor. That’s it. Seriously.

The dropdown arrow next to it lets you pick the output format if you have more than one in your YAML.

20.12.2 rmarkdown::render()

You can also render from the console, which is handy for automation:

Code

rmarkdown::render("my-report.Rmd")

Override the output format:

Code

rmarkdown::render("my-report.Rmd", output_format = "pdf_document")

20.12.3 What Happens Under the Hood

When you click Knit:

knitr runs all your code chunks and produces a plain Markdown (.md) file.
Pandoc converts that .md file into your chosen output format.

That’s the whole pipeline. If you get an error, it’s almost always in step 1 (your R code has a bug). Fix the code, knit again.
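The two steps can also be run by hand, which sometimes helps when tracking down where a render fails. This is a rough sketch rather than what the Knit button does internally: it assumes knitr and rmarkdown are installed, works in a temporary folder, and uses made-up file names (demo.Rmd, demo.md, demo.html).

```r
library(knitr)
library(rmarkdown)

fence <- strrep("`", 3)  # three backticks
work_dir <- tempfile("pipeline-")
dir.create(work_dir)
rmd_file <- file.path(work_dir, "demo.Rmd")
md_file  <- file.path(work_dir, "demo.md")

# A tiny source document with one chunk.
writeLines(c(
  "# Demo",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence
), rmd_file)

# Step 1: knitr runs the chunks and writes plain Markdown.
knit(rmd_file, output = md_file, quiet = TRUE)

# Step 2: Pandoc converts the Markdown to HTML (skipped if Pandoc is not available).
if (pandoc_available()) {
  pandoc_convert(md_file, to = "html", output = file.path(work_dir, "demo.html"))
}
```

If step 1 fails, the problem is in your R code; if only step 2 fails, look at the YAML or the Pandoc setup.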

20.13 Tips and Best Practices

20.13.1 Start with a Setup Chunk

Every document should begin with a setup chunk that loads packages and sets defaults. Think of it as “opening the store before customers arrive”:

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center"
)
library(tidyverse)
library(knitr)
```

20.13.2 Name Your Chunks

“Error in unnamed-chunk-47” is about as helpful as a meeting that could have been an email. Name your chunks so you can actually find problems.

20.13.3 Use `echo=FALSE` for Stakeholder Reports

Your CFO does not need to see library(dplyr). Use echo=FALSE (or code_folding: hide in the YAML) to keep reports clean while preserving your code.

20.13.4 Use Parameterized Reports for Repetitive Work

If you generate the same report for different regions, time periods, or product lines, use parameters:

---
title: "Regional Sales Report"
params:
  region: "West"
  year: 2024
output: html_document
---

Then in your code, use params$region and params$year. Render different versions like this:

Code

rmarkdown::render("report.Rmd", params = list(region = "East", year = 2023))

One template, infinite reports. Your future self will be grateful.
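A loop turns that single call into a batch job. The sketch below assumes rmarkdown is installed; it builds a stand-in template in a temporary folder, and the region list and output file names are hypothetical.

```r
library(rmarkdown)

tick <- "`"  # a single backtick, used to delimit inline code
out_dir <- tempfile("reports-")
dir.create(out_dir)
template <- file.path(out_dir, "report.Rmd")

# A stand-in template; a real one would filter data by params$region and params$year.
writeLines(c(
  "---",
  "title: \"Regional Sales Report\"",
  "params:",
  "  region: \"West\"",
  "  year: 2024",
  "output: html_document",
  "---",
  "",
  paste0("Region: ", tick, "r params$region", tick,
         ", year: ", tick, "r params$year", tick, ".")
), template)

regions <- c("West", "East", "Central")

# render() needs Pandoc, so skip the loop when it is not available.
if (pandoc_available()) {
  for (region in regions) {
    render(
      template,
      params      = list(region = region, year = 2024),
      output_file = paste0("report-", tolower(region), ".html"),
      output_dir  = out_dir,
      envir       = new.env(),   # fresh environment so runs do not share objects
      quiet       = TRUE
    )
  }
}

list.files(out_dir, pattern = "\\.html$")
```

Giving each run its own output_file keeps one region's report from overwriting another's.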

20.13.5 Keyboard Shortcuts Worth Memorizing

Windows/Linux	Mac	What it does
Ctrl+Alt+I	Cmd+Option+I	Insert a new code chunk
Ctrl+Shift+K	Cmd+Shift+K	Knit the document
Ctrl+Shift+Enter	Cmd+Shift+Enter	Run the current chunk
Ctrl+Enter	Cmd+Enter	Run the current line

20.14 Summary

Here’s what you now know how to do:

Write a YAML header that sets up your document’s title, author, output format, and appearance.
Use Markdown to format text with headers, bold, italic, lists, links, and tables.
Write code chunks that run R code and control what shows up in the report with options like echo, eval, message, and warning.
Use inline R code to drop computed numbers directly into your sentences (no more copy-paste errors).
Generate tables with knitr::kable() and kableExtra that look presentation-ready.
Include and size figures with chunk options like fig.width, fig.height, and out.width.
Cross-reference figures, tables, and sections automatically.
Render to HTML, PDF, or Word with one click.

The bottom line: R Markdown turns your analysis into a self-updating, professional report. The more you use it, the more time you save—and the fewer “can you update the numbers?” emails you’ll get.

And remember, you can always access the R Markdown cheat sheet from RStudio via Help > Cheat Sheets > R Markdown Cheat Sheet.

One last thought. Reproducibility is not just a technical convenience — it is a form of intellectual honesty. When your analysis is transparent and re-runnable, anyone can check your work, challenge your assumptions, and build on what you did. In a world where data-driven decisions affect jobs, budgets, and opportunities, that kind of accountability is not optional. It is what professional integrity looks like in practice.

--- title: "R Markdown" --- # R Markdown {#rmarkdown} ```{r} library(knitr) opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE) ``` After an analysis is done, the next problem is communicating it. The default workflow — run code in R, copy numbers into Word, screenshot the charts, paste them into slides — falls apart the moment the data is updated. Every change forces a manual rebuild, and there is no good way to tell what is current versus stale. R Markdown solves this by combining the writing and the code in a single document. Render once and you get a polished HTML page, PDF, or Word document with all the charts, tables, and numbers regenerated from the live code. Update the data, render again, and the entire document rebuilds. ::: {.callout-note} ## Quarto vs R Markdown — what to use in 2026 R Markdown (`.Rmd`) is the original. **Quarto** (`.qmd`) is the next-generation framework Posit released in 2022; this book is written in Quarto. The differences for a beginner are minor — Quarto uses YAML chunk options (`#| label:`, `#| echo: false`) instead of the older inline form, supports more output formats out of the box, and has a more consistent CLI. Everything in this chapter applies to both. New projects should use Quarto; existing R Markdown projects work fine and do not need to be migrated. ::: ::: {.callout-warning} ## AI Pitfall: AI generates YAML headers that produce wrong output The YAML header at the top of an R Markdown or Quarto file controls everything from output format to figure dimensions to citation handling. AI assistants sometimes generate headers that work for the example they were shown but break for your context. Common failures: - **Mixing R Markdown and Quarto YAML.** AI may give you `output: html_document` (R Markdown style) when you are working in Quarto, where the equivalent is `format: html`. The render fails or produces unexpected results. 
- **Deprecated chunk options.** Older R Markdown chunk options like `fig.cap` and `fig.height` work in chunk headers, but in Quarto's pipe-comment style they become `fig-cap` and `fig-height` (note the hyphen instead of dot). AI sometimes mixes the two. - **Bibliography paths that do not exist.** AI confidently generates `bibliography: book.bib` even when no `book.bib` file exists in your project, and the render errors with a citation lookup failure. Always render the document immediately after creating or modifying YAML to catch these. Errors at render time are easy to diagnose; errors that produce wrong-looking output without erroring are not. ::: ## What is R Markdown? {#what-is-rmarkdown} An R Markdown file (`.Rmd`) is a document where prose and R code live in the same file. When you *render* (or *knit*) it, three things happen: 1. R runs all your code. 2. It grabs the output---tables, charts, printed results. 3. It stitches everything together into a finished document. The result can be an HTML page, a PDF, a Word doc, slides, or even a full website (this book is written in R Markdown!). Think of it as a self-updating report. You need two packages to make this work: ```{r} #| eval: false install.packages("rmarkdown") install.packages("knitr") ``` That's it. RStudio already has Pandoc (the engine that converts everything) built in, so you're good to go. ## The Three Components of an R Markdown Document {#three-components} Every `.Rmd` file has exactly three types of content: 1. **YAML header** -- The settings block at the top. Think of it as the "cover page setup." 2. **Markdown text** -- Your actual writing: paragraphs, headers, bullet points. 3. **Code chunks** -- The R code that produces your numbers and charts. Here is the simplest R Markdown document you can write: ```` --- title: "My First Report" author: "Your Name" date: "2024-01-15" output: html_document --- ## Introduction This is my first R Markdown document. 
```{r}`r ''` summary(iris) ``` ```` Let's break each piece down. ## YAML Header {#yaml-header} The YAML header sits at the very top of your file, sandwiched between two lines of `---`. It tells R Markdown the basics: what's this document called, who wrote it, and what format should the output be? ### The Essentials Here is a perfectly good YAML header: ```` --- title: "Quarterly Sales Report" author: "Vivek H. Patil" date: "2024-09-01" output: html_document --- ```` That's really all you need. Four lines, and you're in business. **Pro tip:** Want the date to update automatically every time you knit? Use this: ```` date: "`r knitr::inline_expr('Sys.Date()')`" ```` Now your report always shows today's date. No more "wait, is this the March version or the April version?" ### Picking Your Output Format The three formats you'll actually use: | Format | YAML value | When to use it | |:-------|:-----------------|:----------------------------------------| | HTML | `html_document` | Day-to-day reports, sharing via email | | Word | `word_document` | When your boss *needs* it in Word | | PDF | `pdf_document` | Formal reports (requires LaTeX) | HTML is the default, and honestly, it's the best for most business use cases. It's interactive, looks great, and you can email it as a single file. ### Making It Look Good You can add a floating table of contents, pick a theme, and let readers show/hide code---all from the YAML header: ```` --- title: "Sales Analysis Q3 2024" author: "Vivek H. Patil" date: "`r knitr::inline_expr('format(Sys.Date(), \"%B %d, %Y\")')`" output: html_document: toc: true toc_float: true theme: flatly code_folding: hide --- ```` Here's what each of those options does: - **`toc: true`** -- Adds a table of contents (generated from your headers). - **`toc_float: true`** -- Makes the table of contents float on the side as you scroll. Very handy for long reports. - **`theme`** -- Changes the visual style. 
Try `cerulean`, `cosmo`, `flatly`, `journal`, `readable`, or `united`. - **`code_folding: hide`** -- Hides all code by default, but readers can click to reveal it. Perfect for reports where your manager doesn't want to see code but your analyst colleague does. That's really all the YAML you need to know. If you want to go deeper, Section \@ref(output-formats) has more detail. ## Markdown Syntax {#markdown-syntax} The writing part of your document uses Markdown---a simple way to format text without messing around in a toolbar. If you've ever used Slack formatting or Reddit, you've basically used Markdown. ### Headers Use `#` symbols to create headings. More `#` signs = smaller heading: ```` # Big Chapter Title ## Section ### Subsection ```` ### Bold, Italic, and Friends - `*italic*` gives you *italic* - `**bold**` gives you **bold** - `***bold italic***` gives you ***bold italic*** - `~~strikethrough~~` gives you ~~strikethrough~~ ### Lists Bullet points: ```` - Revenue grew 12% - Costs decreased 3% - Labor costs down 5% - Materials costs up 2% - Net profit up 15% ```` Which renders as: - Revenue grew 12% - Costs decreased 3% - Labor costs down 5% - Materials costs up 2% - Net profit up 15% Numbered lists: ```` 1. Pull the data 2. Clean the data 3. Analyze the data 4. Pretend the data was clean all along ```` ### Links ``` [The R Project](https://www.r-project.org/) ``` Renders as: [The R Project](https://www.r-project.org/) ### Tables You can type tables by hand using pipes and dashes: ```` | Region | Revenue | Growth | |:--------|--------:|:------:| | West | $45M | 12% | | East | $38M | 8% | | Central | $29M | -2% | ```` Renders as: | Region | Revenue | Growth | |:--------|--------:|:------:| | West | $45M | 12% | | East | $38M | 8% | | Central | $29M | -2% | The colons control alignment: `:---` is left, `---:` is right, `:---:` is centered. For real data-driven tables, you'll want to generate them from R---see Section \@ref(tables-in-rmarkdown). 
### Blockquotes Prefix a line with `>`: ```` > "Revenue is vanity, profit is sanity, cash is reality." ```` Renders as: > "Revenue is vanity, profit is sanity, cash is reality." ### Math Equations Yes, you can write fancy equations if you need to. Most of you won't need this often, but if you're in a finance or econ class and your professor wants formulas, here's the gist: Inline math uses single dollar signs: `$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$` produces $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Display math uses double dollar signs: $$ \hat{\beta} = (X^T X)^{-1} X^T y $$ If you need more than that, Google "LaTeX math symbols" and you'll find everything. ## Code Chunks {#code-chunks} Code chunks are where the magic happens. This is the R code that actually does your analysis. ### Inserting a Code Chunk A code chunk starts with ```` ```{r} ```` and ends with ```` ``` ````. The keyboard shortcut in RStudio is **Ctrl+Alt+I** (Windows) or **Cmd+Option+I** (Mac). Use it. Your fingers will thank you. ```{r} head(mtcars) ``` ### Naming Your Chunks Give your chunks a name right after the `r`. It makes life easier when debugging ("Error in chunk `revenue-plot`" is way more helpful than "Error in unnamed-chunk-47"): ```` ```{r}`r ''` summary(mtcars$mpg) ``` ```` ```{r} summary(mtcars$mpg) ``` ### The Chunk Options You Actually Need There are dozens of chunk options, but here are the ones that matter for 95% of business reports: #### Show/Hide Code and Output | Option | Default | What it does | |:----------|:--------|:---------------------------------------------------| | `echo` | `TRUE` | Show the code in the report? | | `eval` | `TRUE` | Actually run the code? | | `include` | `TRUE` | Show *anything* (code + output) in the document? | **`echo=FALSE`** is your best friend for client-facing reports. It hides the code but shows the result: ```{r} mean(mtcars$mpg) ``` The number above came from `mean(mtcars$mpg)`, but the code is hidden. 
Your VP doesn't need to see R code. They need the answer. **`eval=FALSE`** shows code without running it. Good for "here's how you would install this package" examples: ```{r} #| eval: false install.packages("tidyverse") ``` **`include=FALSE`** runs the code silently---no code, no output in the report. Perfect for setup chunks where you load packages: ```` ```{r}`r ''` library(tidyverse) library(knitr) ``` ```` #### Silencing Messages and Warnings | Option | Default | What it does | |:----------|:--------|:------------------------------------------------| | `message` | `TRUE` | Show messages (like package loading info)? | | `warning` | `TRUE` | Show warnings? | When you load packages, R loves to tell you about every attached namespace and masked function. Nobody reading your report cares: ```{r} library(ggplot2) library(dplyr) ``` Set `message=FALSE` and `warning=FALSE` and enjoy the silence. #### Controlling Figure Size | Option | Default | What it does | |:-------------|:--------|:------------------------------------------| | `fig.width` | `7` | Width in inches | | `fig.height` | `5` | Height in inches | | `fig.cap` | `NULL` | Caption for the figure | | `fig.align` | `"default"` | Alignment: `"left"`, `"center"`, `"right"` | | `out.width` | `NULL` | Width in the output (e.g., `"80%"`) | ```{r} ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point(size = 2) + labs(title = "Iris: Sepal Dimensions", x = "Sepal Length (cm)", y = "Sepal Width (cm)") + theme_minimal() ``` ### Global Chunk Options {#global-chunk-options} Tired of typing `message=FALSE, warning=FALSE` on every single chunk? Set defaults for the entire document in a setup chunk at the top: ```` ```{r}`r ''` knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center" ) ``` ```` Any individual chunk can still override these. Think of it like setting company-wide defaults, with each department free to customize. 
## Inline R Code {#inline-r-code}

This is one of the most underrated features of R Markdown. You can drop R results right into your sentences. Instead of writing "the average MPG is 20.1" (and hoping you copied the right number), you write:

````
The average MPG is `r knitr::inline_expr('round(mean(mtcars$mpg), 1)')`.
````

And it renders as:

The average MPG is `r round(mean(mtcars$mpg), 1)`.

Here's a more realistic example. Let's say you computed some key numbers:

```{r}
avg_mpg <- mean(mtcars$mpg)
max_hp <- max(mtcars$hp)
n_cyl6 <- sum(mtcars$cyl == 6)
```

Now you can write sentences that update themselves:

- The average fuel efficiency across all cars is `r round(avg_mpg, 1)` miles per gallon.
- The most powerful car has `r max_hp` horsepower.
- There are `r n_cyl6` cars with 6 cylinders in the dataset.

If the data changes, these numbers update automatically the next time you knit. No more copy-paste errors. No more "wait, did I update that number on slide 14?" This is the kind of thing that separates a good analyst from one who stays late fixing reports.

## Output Formats in Detail {#output-formats}

### HTML Documents

HTML is the Swiss Army knife of output formats. Here's a full-featured setup:

````
---
output:
  html_document:
    toc: true
    toc_float: true
    number_sections: true
    theme: flatly
    code_folding: hide
    df_print: paged
---
````

- **`df_print: paged`** -- Makes data frames display as nice, paginated tables readers can click through. Way better than a wall of text.
- **`self_contained: true`** (the default) -- Embeds everything into one HTML file. You can email it to anyone and it just works. No broken image links.

### PDF Documents

PDF output looks polished and professional, but it needs a LaTeX installation.
Easiest path:

```{r}
#| eval: false
install.packages("tinytex")
tinytex::install_tinytex()
```

Then in your YAML:

````
---
output:
  pdf_document:
    toc: true
    number_sections: true
---
````

### Word Documents

When your stakeholder absolutely, positively needs a `.docx`:

````
---
output:
  word_document:
    toc: true
    reference_docx: my-styles.docx
---
````

The `reference_docx` option is clever: you hand it a Word file with your company's fonts and styles, and R Markdown applies them to the generated document. Brand guidelines? Handled.

## Tables in R Markdown {#tables-in-rmarkdown}

Hand-typing tables in Markdown is fine for small stuff, but for real data, let R build your tables.

### knitr::kable()

The simplest way to turn a data frame into a clean table:

```{r}
kable(head(iris), caption = "First six rows of the iris dataset.")
```

You can control column names, alignment, and decimal places:

```{r kable-formatted}
mtcars_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    Count = n(),
    Avg_MPG = mean(mpg),
    Avg_HP = mean(hp),
    Avg_WT = mean(wt)
  )

kable(mtcars_summary,
      digits = 2,
      col.names = c("Cylinders", "Count", "Avg MPG", "Avg HP", "Avg Weight"),
      align = c("c", "c", "r", "r", "r"),
      caption = "Summary statistics of mtcars by number of cylinders.")
```

### kableExtra for Fancy Styling

Want boardroom-ready tables? The `kableExtra` package has you covered:

```{r}
#| eval: false
install.packages("kableExtra")
```

```{r}
library(kableExtra)

kable(mtcars_summary,
      digits = 2,
      col.names = c("Cylinders", "Count", "Avg MPG", "Avg HP", "Avg Weight"),
      caption = "Styled summary of mtcars by cylinder count.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center") %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3366cc")
```

Striped rows, hover effects, bold headers with a branded color: this is the kind of table that makes people think you spent way more time than you actually did.
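One more plain `kable()` control is worth knowing for business tables: thousands separators. The sketch below assumes only knitr and the built-in `mtcars` data. The data frame `weights` and the object `weight_table` are illustrative names, and the conversion relies on `mtcars$wt` being recorded in thousands of pounds.

```{r}
library(knitr)

# mtcars stores weight in thousands of pounds; convert to pounds for display
weights <- data.frame(
  Car = rownames(mtcars)[1:3],
  Weight_lbs = mtcars$wt[1:3] * 1000
)

# format.args is passed on to format(), so big.mark inserts thousands separators
weight_table <- kable(weights,
                      col.names = c("Car", "Weight (lbs)"),
                      format.args = list(big.mark = ","))
weight_table
```

Because the formatting happens inside `kable()`, the underlying column stays numeric and can still be used in later calculations.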
## Figures and Images {#figures-and-images}

### R-Generated Plots

Any code chunk that produces a plot automatically includes it in your document. Control the size and add a caption:

```{r, fig.width=6, fig.height=4, fig.cap="Fuel efficiency by number of cylinders."}
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(show.legend = FALSE) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")
```

Use `out.width` for percentage-based sizing, which usually works better than inches:

```{r, out.width="80%"}
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Petal Length (cm)", y = "Density") +
  theme_minimal()
```

### Side-by-Side Plots

Want two charts next to each other? Use `fig.show="hold"` with `out.width="50%"`:

```{r, fig.show="hold", out.width="50%"}
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "MPG vs Weight") +
  theme_minimal()

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "MPG vs Horsepower") +
  theme_minimal()
```

### External Images

To include an image file (like a company logo or a screenshot), use `knitr::include_graphics()` inside a code chunk:

```{r}
knitr::include_graphics("https://www.r-project.org/Rlogo.png")
```

## Cross-Referencing {#cross-referencing}

When your report gets longer than a few pages, you'll want to say things like "as shown in Figure 3" or "see Table 2." R Markdown (via bookdown) handles the numbering for you automatically.

### Figures

To cross-reference a figure, your chunk needs two things: a **name** and a **`fig.cap`**. Then use `\@ref(fig:chunk-name)` in your text.

For example, `\@ref(fig:iris-scatter)` refers to Figure \@ref(fig:iris-scatter).

### Tables

Same idea: `\@ref(tab:chunk-name)` where the chunk has a `kable()` with a `caption`.

Example: `\@ref(tab:kable-formatted)` refers to Table \@ref(tab:kable-formatted).

### Sections

Give a section a label with `{#label}`, then reference it with `\@ref(label)`.

The YAML header, for instance, is discussed in Section \@ref(yaml-header).
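The figure and section cases fit together as in the following sketch. The section label `fuel-efficiency`, the chunk name `mpg-hist`, and the caption are made-up names for illustration. The `bookdown::html_document2` output format is an assumption about your setup, since plain `html_document` does not resolve `\@ref()`.

````
---
output: bookdown::html_document2
---

## Fuel Efficiency {#fuel-efficiency}

```{r mpg-hist, fig.cap="Distribution of fuel efficiency in mtcars."}`r ''`
hist(mtcars$mpg, main = NULL, xlab = "Miles per gallon")
```

Figure \@ref(fig:mpg-hist) is discussed in Section \@ref(fuel-efficiency).
````

The chunk name supplies the part after `fig:`, so renaming the chunk means updating every reference to it.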
### Quick Reference

| Element | How to label it                   | How to reference it     |
|:--------|:----------------------------------|:------------------------|
| Figure  | Chunk name + `fig.cap`            | `\@ref(fig:chunk-name)` |
| Table   | Chunk name + `kable(caption=...)` | `\@ref(tab:chunk-name)` |
| Section | `## Title {#label}`               | `\@ref(label)`          |

## Citations and Bibliography {#citations-bibliography}

If you're writing a research paper or a thesis-style report, R Markdown handles citations. Add a `.bib` file to your YAML:

````
---
bibliography: references.bib
link-citations: true
---
````

Then cite with `@key` (in-text) or `[@key]` (parenthetical). The bibliography appears at the end automatically. If your professor or journal requires citations, this will save you hours compared to doing it by hand.

## Rendering Documents {#rendering-documents}

### The Knit Button

Click **Knit** at the top of the RStudio editor. That's it. Seriously. The dropdown arrow next to it lets you pick the output format if you have more than one in your YAML.

### rmarkdown::render()

You can also render from the console, which is handy for automation:

```{r}
#| eval: false
rmarkdown::render("my-report.Rmd")
```

Override the output format:

```{r}
#| eval: false
rmarkdown::render("my-report.Rmd", output_format = "pdf_document")
```

### What Happens Under the Hood

When you click Knit:

1. **knitr** runs all your code chunks and produces a plain Markdown (`.md`) file.
2. **Pandoc** converts that `.md` file into your chosen output format.

That's the whole pipeline. If you get an error, it's almost always in step 1 (your R code has a bug). Fix the code, knit again.

## Tips and Best Practices {#tips-and-best-practices}

### Start with a Setup Chunk

Every document should begin with a setup chunk that loads packages and sets defaults.
Think of it as "opening the store before customers arrive":

````
```{r setup, include=FALSE}`r ''`
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center"
)
library(tidyverse)
library(knitr)
```
````

### Name Your Chunks

"Error in unnamed-chunk-47" is about as helpful as a meeting that could have been an email. Name your chunks so you can actually find problems.

### Use `echo=FALSE` for Stakeholder Reports

Your CFO does not need to see `library(dplyr)`. Use `echo=FALSE` (or `code_folding: hide` in the YAML) to keep reports clean while preserving your code.

### Use Parameterized Reports for Repetitive Work

If you generate the same report for different regions, time periods, or product lines, use parameters:

````
---
title: "Regional Sales Report"
params:
  region: "West"
  year: 2024
output: html_document
---
````

Then in your code, use `params$region` and `params$year`. Render different versions like this:

```{r}
#| eval: false
rmarkdown::render("report.Rmd", params = list(region = "East", year = 2023))
```

One template, infinite reports. Your future self will be grateful.

### Keyboard Shortcuts Worth Memorizing

| Windows/Linux    | Mac             | What it does            |
|:-----------------|:----------------|:------------------------|
| Ctrl+Alt+I       | Cmd+Option+I    | Insert a new code chunk |
| Ctrl+Shift+K     | Cmd+Shift+K     | Knit the document       |
| Ctrl+Shift+Enter | Cmd+Shift+Enter | Run the current chunk   |
| Ctrl+Enter       | Cmd+Enter       | Run the current line    |

## Summary {#rmarkdown-summary}

Here's what you now know how to do:

- **Write a YAML header** that sets up your document's title, author, output format, and appearance.
- **Use Markdown** to format text with headers, bold, italic, lists, links, and tables.
- **Write code chunks** that run R code and control what shows up in the report with options like `echo`, `eval`, `message`, and `warning`.
- **Use inline R code** to drop computed numbers directly into your sentences (no more copy-paste errors).
- **Generate tables** with `knitr::kable()` and `kableExtra` that look presentation-ready.
- **Include and size figures** with chunk options like `fig.width`, `fig.height`, and `out.width`.
- **Cross-reference** figures, tables, and sections automatically.
- **Render** to HTML, PDF, or Word with one click.

The bottom line: R Markdown turns your analysis into a self-updating, professional report. The more you use it, the more time you save, and the fewer "can you update the numbers?" emails you'll get.

And remember, you can always access the R Markdown cheat sheet from RStudio via **Help > Cheat Sheets > R Markdown Cheat Sheet**.

One last thought. Reproducibility is not just a technical convenience; it is a form of intellectual honesty. When your analysis is transparent and re-runnable, anyone can check your work, challenge your assumptions, and build on what you did. In a world where data-driven decisions affect jobs, budgets, and opportunities, that kind of accountability is not optional. It is what professional integrity looks like in practice.
