---
title: "Relationships Between Variables"
---
# Relationships Between Variables {#multivar}
```{r}
library(tidyverse)
library(plotly)
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
Most of the questions that matter in analysis live between variables. Single-variable analysis tells you what the data looks like in isolation; multivariate analysis tells you how the pieces relate. This is where you go from describing data to understanding it, and where the questions that drive real decisions get answered:
- Does higher marketing spend actually lead to more sales, or are we just throwing money into the void? (Correlation)
- Do male and female employees earn different salaries? (t-test)
- Does customer satisfaction differ across our three service tiers? (ANOVA)
- Is there a relationship between customer region and product preference? (Chi-square test of independence)
This chapter covers bivariate and multivariate analysis --- fancy words for "looking at two or more variables at the same time and seeing what shakes out." We use visualizations throughout. For the full ggplot2 experience, see Chapters \@ref(ggplot2basics) and \@ref(ggplot2customize).
## Between continuous variables: Correlations
Correlation is the workhorse of "do these two things move together?" It produces a single number between -1 and +1 that tells you the strength and direction of a linear relationship. Here is the cheat sheet:
- **+1** = perfect positive relationship. When one goes up, the other goes up in lockstep. (Think: hours studied and exam scores.)
- **-1** = perfect negative relationship. When one goes up, the other goes down like a seesaw. (Think: price of a product and quantity demanded --- Econ 101 vibes.)
- **0** = no linear relationship whatsoever. These two variables could not care less about each other. (Think: your shoe size and your GPA.)
In business, correlations are everywhere. A marketing analyst might correlate ad spend with website traffic. An operations manager might correlate warehouse staffing levels with order fulfillment speed. A finance analyst might correlate interest rates with loan applications. The correlation coefficient gives you a quick read on whether two things are related --- and how strongly --- before you invest time in deeper analysis.
Here is a rough guide for interpreting correlation strength (these are rules of thumb, not hard boundaries):
| Absolute Value of r | Interpretation |
|:---------------------|:---------------|
| 0.00 -- 0.19 | Very weak |
| 0.20 -- 0.39 | Weak |
| 0.40 -- 0.59 | Moderate |
| 0.60 -- 0.79 | Strong |
| 0.80 -- 1.00 | Very strong |
And one critical warning you will hear a hundred times in your career: **correlation does not imply causation.** Just because two things move together does not mean one causes the other. Ice cream sales and drowning deaths are positively correlated --- not because ice cream is dangerous, but because both go up in the summer. Always think about whether a lurking third variable (a "confound") could be driving both.
### Plotting
The best way to start is with scatterplots. Below are several examples showing what different correlation values actually look like in practice. Notice how the dots cluster tighter around the blue line as the correlation gets stronger (closer to -1 or +1), and scatter into a shapeless cloud as it approaches 0. A correlation of 0.003 is basically "I drew dots at random and drew a line through them for fun":
```{r}
#| eval: false
load("data/corrdata.Rda")
corrdata_long <- corrdata %>%
pivot_longer(cols = -NumVar1, names_to = "variable", values_to = "value") %>%
mutate(variable = recode(variable,
"NumVar2" = "Correlation of -0.65",
"NumVar3" = "Correlation of -0.16",
"NumVar4" = "Correlation of 0.003",
"NumVar5" = "Correlation of 0.20",
"NumVar6" = "Correlation of 0.69"
))
ggplot(corrdata_long, aes(x = NumVar1, y = value)) +
geom_point(color = "red", shape = 18) +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
facet_wrap(~ variable, scales = "free_y", ncol = 3) +
labs(x = "Numeric Variable 1", y = "Numeric Variable 2") +
theme_minimal()
```
Look at how the visual pattern changes as the correlation number changes. At -0.65, the points form a clear downward-sloping cloud --- when one variable goes up, the other tends to go down. At -0.16 and 0.20, the relationship is so weak that you can barely see a trend; the dots look almost random. At 0.003, there is literally no pattern --- the blue line is nearly flat. At 0.69, a strong upward trend re-emerges. These panels give you visual intuition for what the numbers mean in practice.
### Correlations between two numeric variables
A few assumptions to keep in mind before running a correlation: variables should be on an interval or ratio scale (numbers, not categories), the relationship should be roughly linear (not curved), extreme outliers should not be driving the result, and the distributions should not be wildly skewed. The Pearson correlation, which is what `cor()` returns by default, is sensitive to all four. When those assumptions break, Spearman's rank correlation (`cor(x, y, method = "spearman")`) is more robust.
::: {.callout-warning}
## AI Pitfall: AI runs Pearson correlation on whatever you give it
Two failure modes worth flagging:
**1. Non-linear relationships.** Ask an AI assistant for "the correlation between price and demand" and you reliably get `cor(df$price, df$demand)` — Pearson by default. If the underlying relationship is curved (which most price-demand relationships are), Pearson reports a number close to zero even when the variables are tightly related. Always plot the scatter first. If the cloud bends, switch to `method = "spearman"` for a rank-based correlation that captures monotonic but non-linear relationships.
**2. Outliers hijacking the result.** A single extreme observation can inflate or deflate Pearson's r by 0.3 or more. AI does not warn you about this. The verification: after computing the correlation, run `cor(x, y)` and `cor(x[-which.max(abs(x - mean(x)))], y[-which.max(abs(x - mean(x)))])` (or just plot and look) to see if removing one extreme value changes the answer materially. If it does, you have an outlier-driven result, not a population-level relationship.
:::
Let us plot Sepal.Length against Petal.Length. In a business context, think of this as plotting "years of experience" against "salary" or "social media followers" against "monthly revenue."
```{r}
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
geom_point(color = "red") +
geom_smooth(method = "lm", se = TRUE, color = "blue") +
labs(x = "Sepal Length", y = "Petal Length",
title = "Sepal Length vs Petal Length") +
theme_minimal()
ggplotly(p_scatter)
```
The scatterplot shows a clear positive relationship --- as Sepal.Length increases, Petal.Length tends to increase as well. The blue **regression line** (also called a "line of best fit") is the straight line that comes closest to all the data points; it summarizes the overall trend in one stroke. The gray band around it is a **confidence band** --- it shows the range of uncertainty around the line. A narrow band (like this one) means the trend is reliable; a wide band means more uncertainty. In business terms, if these were "years of experience" and "salary," you would see that more experience is strongly associated with higher pay.
We can color by species to reveal group structure --- and this is a genuinely important technique. Sometimes the overall trend masks completely different patterns within subgroups. Imagine overall company revenue is flat, but when you break it down, the East region is booming while the West is tanking. The aggregate number hid the real story. Same idea here.
This matters beyond iris flowers. Aggregate numbers can mask real disparities --- in pay, in health outcomes, in loan approval rates. Disaggregating by demographic group is how you find out whether a system that looks fair on average is actually working for everyone. A commitment to equity in data work starts with something deceptively simple: looking at the right level of detail.
```{r}
p_species <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Sepal Length", y = "Petal Length",
title = "Sepal Length vs Petal Length by Species") +
theme_minimal()
ggplotly(p_species)
```
Hover over any point to see its exact measurements and species. Interactive charts like this are useful when you want to identify specific data points --- something static plots cannot do. Try clicking a species name in the legend to show or hide that group.
Now let us put an actual number on it. The Pearson correlation coefficient:
```{r}
cor(iris$Sepal.Length,iris$Petal.Length)
```
The result is approximately 0.87 --- a very strong positive correlation (refer to the table above). This means that as Sepal.Length increases, Petal.Length almost always increases too, and the relationship is tight. In business terms, if these two variables were "marketing spend" and "revenue," a correlation of 0.87 would make the marketing team very happy.
But here is the question that separates a good analyst from a great one: is it *statistically significant*, or could we have gotten a number like this purely by chance? The `cor.test()` function answers that. Testing whether the observed correlation coefficient is different from 0:
```{r}
cor.test(iris$Sepal.Length,iris$Petal.Length)
```
**How to read this output:**
- **t = 21.646**: The test statistic. It measures how many standard errors the observed correlation is away from zero. A t-value this large is overwhelming evidence.
- **df = 148**: Degrees of freedom (sample size minus 2 for a correlation test).
- **p-value < 2.2e-16**: Essentially zero. If there truly were no relationship between these variables, the chance of observing a correlation this strong is astronomically small. We reject the null hypothesis that the correlation is zero.
- **95 percent confidence interval: 0.827 to 0.906**: We are 95% confident the true population correlation falls in this range. The entire interval is above 0.8, confirming a very strong relationship.
- **sample estimate (cor = 0.872)**: The correlation coefficient we computed.
We can also do a one-sided test --- testing whether the correlation is *greater* than 0. This is useful when you already have a directional hypothesis going in, like "I expect ad spend and revenue to be positively correlated" (if they were negatively correlated, your marketing department has some explaining to do):
```{r}
cor.test(iris$Sepal.Length,iris$Petal.Length,alternative = "greater")
```
The one-sided test gives an even smaller p-value and a 95% confidence interval that starts at 0.836. The conclusion is the same: this is a strong, statistically significant positive relationship.
### Paired Samples - comparing two-means
Sometimes you want to compare two measurements taken on the *same* subjects. This is the classic "before and after" scenario, and it shows up everywhere in business. Did the training program actually improve test scores? Did the website redesign change time-on-page? Did the new pricing strategy affect average basket size? The key word here is *same* subjects, measured twice.
In our data, we have measurements of writing-hand span and non-writing-hand span for the same students. Think of this as comparing an employee's performance review score before and after a development program --- same people, two time points:
```{r}
survey <- MASS::survey
summary(survey[,c(2,3)])
```
The `summary()` output shows the min, median, mean, and max for each hand span, along with the count of missing values. Both hand spans have similar means (around 18.7 cm), but we need a formal test to determine if the small difference is real or just noise.
A paired t-test checks whether the average difference between the two measurements is statistically significant. It is smarter than just comparing two separate groups because it accounts for person-to-person variability:
```{r}
t.test(survey$Wr.Hnd,survey$NW.Hnd,paired = TRUE)
```
**How to read this output:**
- **t = 0.948**: The test statistic. It measures the average difference between writing-hand and non-writing-hand span, scaled by the variability of those differences. A t-value close to zero (like 0.95) suggests the difference is small relative to the noise.
- **df = 235**: Degrees of freedom (number of complete pairs minus 1).
- **p-value = 0.344**: This is well above .05. We fail to reject the null hypothesis --- there is no statistically significant difference between writing-hand and non-writing-hand spans.
- **95 percent confidence interval: -0.055 to 0.157**: The interval includes zero, which is consistent with the high p-value. If zero is inside the CI, the difference could plausibly be zero.
- **mean difference = 0.051**: The average difference between the two measurements. It is tiny --- about half a millimeter.
Bottom line: writing-hand and non-writing-hand spans are essentially the same. In a business context, if this were a "before and after" test of a training program, this result would mean the program had no measurable effect.
### Relationships between more than two numeric variables
When you have a bunch of numeric variables and you want to see all their pairwise relationships at once, a correlation matrix is your best friend. It is the "at-a-glance" table that lets you quickly spot which pairs of variables are strongly related. In a business setting, you might run this on metrics like revenue, profit margin, customer count, marketing spend, and employee count to see which ones move together:
```{r}
cor(iris[,1:4])
```
**Reading the correlation matrix:** Each cell shows the Pearson correlation between two variables. The diagonal is always 1.00 (every variable is perfectly correlated with itself). The matrix is symmetric --- the value in row 1, column 3 is the same as row 3, column 1. Scan for the largest absolute values to find the strongest relationships. Here, Petal.Length and Petal.Width have a very strong correlation (about 0.96), and Sepal.Length and Petal.Length are also very strong (0.87). Sepal.Width has weak or negative correlations with the other variables --- it marches to its own beat.
A correlation matrix is great, but how do you know which of those numbers are statistically significant versus just noise? The `rcorr()` function from the **Hmisc** package gives you the correlation coefficients *and* their p-values side by side. This is the "show me the receipts" version --- essential when you need to back up your findings in a report:
```{r}
# Correlations with significance levels
library(Hmisc)
rcorr(as.matrix(iris[,1:4]), type="pearson") # type can be pearson or spearman
```
```{r}
detach("package:Hmisc", unload = TRUE)
```
**Reading this output:** The `rcorr()` function prints three tables. The first table shows the correlation coefficients (same as before). The second shows the sample size (n) for each pair --- typically the same for all pairs unless there are missing values. The third table shows p-values. A p-value of 0 (or very close to it) means the correlation is statistically significant. Here, almost all pairwise correlations are significant. The one exception to watch for is Sepal.Width with other variables --- the correlations are weaker, and you should check whether the p-values still clear your significance threshold.
## Relationships between factor variables
When both your variables are categorical --- think "customer region" and "product preference," or "department" and "preferred communication channel" --- you cannot compute a correlation. (You cannot take the average of "Northeast" and "Midwest." Well, you can try, but R will not be happy about it.) Instead, you use cross-tabulations and the chi-square test of independence to figure out whether the two categories are related or just doing their own thing.
Let us use the `survey` dataset and focus on the `Sex` and `Exer` (exercise frequency) variables. In a business context, this is exactly the kind of analysis you would do to determine whether customer demographics (gender, age group, income bracket) are related to behavioral variables (purchase channel, product preference, loyalty program participation).
```{r}
str(survey)
```
```{r}
summary(survey$Sex)
```
```{r}
summary(survey$Exer)
```
A cross-tab (or cross-tabulation) is the bread and butter of categorical analysis. It shows the joint frequencies of two variables in a neat table --- basically a contingency table. If you have ever built a pivot table in Excel (and if you are a business student, you have built approximately one thousand of them), this is the R equivalent:
```{r}
crosstab <- table(survey$Exer, survey$Sex)
crosstab
```
**Reading the cross-tab:** Each cell shows the count of observations for that combination. For example, 49 females exercise frequently, compared to 65 males. Both sexes have "Freq" as their most common category, and "None" is the smallest group for both. The question is whether the *proportional* breakdown differs between sexes --- and that is what the chi-square test will formally check.
The tidyverse way of building a cross-tab uses `count()` and `pivot_wider()`. It reads more like a recipe than a math equation, which is always a win for readability:
```{r}
survey %>%
filter(!is.na(Sex), !is.na(Exer)) %>%
count(Sex, Exer) %>%
pivot_wider(names_from = Sex, values_from = n, values_fill = 0) %>%
DT::datatable(options = list(pageLength = 5, dom = 't'),
caption = "Cross-tabulation of Exercise Frequency by Sex.")
```
We could also plot that data in the form of stacked or dodged bar graphs. These are the kinds of charts that show up in every quarterly business review presentation, so get comfortable making them.
Dodged bar chart --- bars side by side for easy comparison. This is the "which group is bigger?" view:
```{r}
p_dodge <- survey %>%
filter(!is.na(Sex), !is.na(Exer)) %>%
ggplot(aes(x = Sex, fill = Exer)) +
geom_bar(position = "dodge") +
labs(title = "Exercise Frequency by Sex", y = "Count", fill = "Exercise") +
theme_minimal()
ggplotly(p_dodge)
```
Proportional (filled) bar chart --- great for when you care about the *percentage* breakdown rather than raw counts. This answers "what share of each group falls into each category?":
```{r}
survey %>%
filter(!is.na(Sex), !is.na(Exer)) %>%
ggplot(aes(x = Sex, fill = Exer)) +
geom_bar(position = "fill") +
labs(title = "Proportion of Exercise Frequency by Sex", y = "Proportion", fill = "Exercise") +
scale_y_continuous(labels = scales::percent) +
theme_minimal()
```
The dodged bar chart shows that both males and females exercise frequently, with "Freq" being the most common category for both groups. The proportional chart makes it easier to compare: the split between exercise levels looks similar across sexes, suggesting that exercise habits may not differ much by gender in this sample.
OK, so the charts look like there might be a pattern. But "looks like" is not good enough for a business decision. We need a formal test. The chi-square test of independence tells us whether the two variables are actually related or whether any apparent pattern in the chart is just random noise doing random noise things.
An HR analyst might use this exact test to determine whether participation in the company wellness program differs by department. A marketer might test whether response to a campaign varies by customer segment.
The null hypothesis is that the two variables are independent (no relationship). The alternative is that they are related:
```{r}
chisq.test(crosstab)
```
**How to read this output:**
- **X-squared = 5.7186**: The chi-square test statistic. It sums up how far each observed cell count is from the count you would expect if the variables were completely independent. Larger values mean bigger deviations from independence.
- **df = 2**: Degrees of freedom, calculated as (number of rows - 1) x (number of columns - 1). Here, (3-1) x (2-1) = 2.
- **p-value = 0.05731**: Just barely above .05. This is a borderline result --- tantalizingly close to significance but not quite there.
Interpretation: the p-value of .057 is above the typical .05 threshold, which means we do not have strong enough evidence to claim a relationship between sex and exercise frequency. The variables appear to be independent of each other. In business terms, if "sex" were "customer segment" and "exercise" were "product preference," you would conclude that product preference does not significantly differ across segments --- at least not based on this data. And that is a perfectly valid finding. Sometimes the most useful answer is "nope, those two things are not connected."
## Between factor and continuous variables
This is arguably the most common analytical scenario in the business world: you have groups (departments, product lines, regions, customer segments, pricing tiers) and you want to know if a numeric outcome (revenue, satisfaction score, response time, conversion rate) differs across those groups. Are some groups genuinely outperforming others, or is the variation just statistical noise?
### Two levels of a factor level and one continuous variable - t-test
When you have exactly two groups and one numeric variable, the independent samples t-test is your tool. This is the "are these two groups *really* different, or does it just look that way?" test. It shows up everywhere:
- An HR analyst asks: "Do employees with MBAs earn more than those without?"
- A product manager asks: "Do users on the annual plan have higher engagement than month-to-month users?"
- A marketing analyst asks: "Did Campaign A produce higher average order values than Campaign B?"
Let us check the mean height of different genders in the `survey` data.
```{r}
survey %>%
filter(!is.na(Sex), !is.na(Height)) %>%
group_by(Sex) %>%
dplyr::summarize(mean_height = mean(Height))
```
The numbers look different, but --- and this is critical --- "looks different" and "is statistically different" are not the same thing. (This distinction has saved many analysts from embarrassing PowerPoint corrections.) Let us visualize the distributions to get a better feel.
```{r}
survey %>%
filter(!is.na(Sex), !is.na(Height)) %>%
ggplot(aes(x = Sex, y = Height, fill = Sex)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.3) +
labs(title = "Height Distribution by Sex", y = "Height (cm)") +
theme_minimal() +
theme(legend.position = "none")
```
The boxplot shows a clear separation: males have a higher median height (around 178 cm) compared to females (around 165 cm). The IQRs barely overlap, and the jittered points confirm the pattern is not driven by a few outliers. Both distributions look roughly symmetric, which is good news for the t-test assumptions.
Males appear taller than females in the charts, but "appear" is a weasel word. Is the difference statistically significant, or could it be due to the luck of the draw in our particular sample? Time to bring in the t-test. We are testing whether the mean height of females is less than the mean height of males, with a 99% confidence level (because we want to be really sure):
```{r}
t.test(survey$Height ~ survey$Sex, alternative = 'less', conf.level = .99, var.equal = FALSE)
```
**How to read this output:**
- **t = -12.12**: The test statistic is large and negative. It is negative because the first group (Female) has a lower mean than the second group (Male), and we specified `alternative = 'less'`. The magnitude (12.12) is very large, meaning the difference is many standard errors away from zero.
- **df = 196.7**: Degrees of freedom. Because we set `var.equal = FALSE`, R uses the Welch approximation, which can produce non-integer degrees of freedom.
- **p-value < 2.2e-16**: Essentially zero. The evidence against the null hypothesis is overwhelming.
- **99 percent confidence interval: -Inf to -10.03**: We are 99% confident the true difference in means (Female - Male) is at most -10.03 cm. Since the upper bound is well below zero, the difference is not just statistically significant --- it is large.
- **sample estimates (mean Female = 165.7, mean Male = 178.8)**: A difference of about 13 cm.
The result confirms that mean height of females is indeed less than the mean height of males. In business terms, this level of confidence means you can walk into the meeting and say "yes, these two groups are genuinely different" without anyone being able to poke holes in your analysis. That is a good feeling.
### More than two levels of a factor variable and one continuous variable - One-Way ANOVA
But what happens when you have *more* than two groups? Maybe you have three customer segments, four regional offices, or five product lines, and you want to know if the average outcome differs across them. You might be tempted to just run a bunch of t-tests --- segment A vs. B, A vs. C, B vs. C --- but statisticians will send you strongly worded emails about something called "inflated Type I error rates." (Translation: you will find "significant" differences that are actually just noise.)
Instead, you use ANOVA --- Analysis of Variance. Despite the name (which sounds like it is about variance), it is really about comparing group means. ANOVA answers one clear question: "Is at least one group's average different from the others?" It does not tell you *which* group is different --- just that a difference exists somewhere. Think of it as a smoke detector: it tells you there is a fire, but not which room it is in.
Let us check the mean height of people who engage in different levels of exercising.
```{r}
survey %>%
filter(!is.na(Exer), !is.na(Height)) %>%
group_by(Exer) %>%
dplyr::summarize(
n = n(),
mean_height = mean(Height),
sd_height = sd(Height)
)
```
The table shows that the "Freq" group has the highest mean height (around 172 cm), followed by "Some" (around 170 cm), and "None" (around 167 cm). The standard deviations are similar across groups (around 9--10 cm), which is a good sign for ANOVA assumptions. The group sizes are unequal --- "Freq" has the most observations --- but ANOVA can handle that.
```{r}
survey %>%
filter(!is.na(Exer), !is.na(Height)) %>%
ggplot(aes(x = Exer, y = Height, fill = Exer)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.3) +
labs(title = "Height by Exercise Frequency",
x = "Exercise Level", y = "Height (cm)") +
theme_minimal() +
theme(legend.position = "none")
```
The boxplots show that the "Freq" group has a slightly higher median height than "Some," and "Some" has a slightly higher median than "None." The spread (IQR) is fairly similar across groups. There are a few outliers in each group, but nothing extreme. The differences look modest --- which is exactly why we need a formal test instead of eyeballing it.
Interestingly, we find that those who engage in frequent exercise have a higher mean height than those who engage in some exercise. In a similar vein, those who engage in some exercise seem to have a higher mean height than those who do not engage in any exercise. But, are these differences statistically significant? Since we have more than two groups, it is ANOVA time.
The null hypothesis is that the means of all groups are the same (boring world, no differences). The alternative is that at least one mean is different from the others (interesting world, something is going on):
```{r}
results <- aov(survey$Height ~ survey$Exer)
summary(results)
```
**How to read the ANOVA table:**
- **Df (Degrees of Freedom)**: The first row (`survey$Exer`) shows df = 2 (number of groups minus 1: 3 - 1 = 2). The second row (`Residuals`) shows the remaining degrees of freedom (total observations minus number of groups).
- **Sum Sq (Sum of Squares)**: Measures variability. The `survey$Exer` row shows how much variability in height is explained by exercise group differences. The `Residuals` row shows the remaining unexplained variability.
- **Mean Sq (Mean Square)**: Sum of Squares divided by its degrees of freedom. This gives an "average" variability per degree of freedom.
- **F value = 5.72**: The ratio of the between-group variability to the within-group variability. A large F means the group differences are large relative to the noise within groups. An F of 1.0 would mean no group differences at all.
- **Pr(>F) = 0.00354**: The p-value. This is the probability of getting an F-value this large if all three group means were truly equal. At 0.0035, this is well below .01.
The findings suggest that we have evidence that at least one of the three means is different from the others. (The p-value of .00354 indicates significance at the .01 level --- this is not a fluke.) In a business context, if these three groups were pricing strategies and the outcome was revenue, you would now know that pricing strategy *matters* --- at least one approach is producing meaningfully different results. **But ANOVA does not tell you which specific groups differ.** It just sounds the alarm.
So naturally, the next question is: OK, *which* groups are different from each other? Is "Frequent" different from "None"? Is "Some" different from "Frequent"? Are all three different from each other? For that detective work, we use Tukey's HSD (Honestly Significant Difference) test. It compares every possible pair of group means and tells you exactly where the statistically significant differences live:
```{r}
TukeyHSD(results, conf.level = .99)
```
**How to read this output:** Each row compares two groups. The columns are:
- **diff**: The difference in group means. For example, if "Freq-None" shows a diff of about 5.0, it means the "Freq" group's average height is about 5 cm higher than the "None" group's.
- **lwr and upr**: The lower and upper bounds of the 99% confidence interval for the difference. If this interval does *not* include zero, the difference is statistically significant at the 99% level.
- **p adj**: The adjusted p-value. "Adjusted" because Tukey's method accounts for the fact that we are making multiple comparisons at once (which inflates the chance of false positives). Look for p adj values below .01 (since we set `conf.level = .99`).
Check each row: if the confidence interval for a pair includes zero, those two groups are not significantly different. If it excludes zero, they are. This tells you exactly which exercise groups have meaningfully different average heights --- and by how much.