Lesson 37: ANOVA II


Project Day Expectations – Tuesday, 5 May (Lesson 38)

ImportantWhat you must do BEFORE class on 5 May
  1. Email your non-cadet guest the formal invitation – deadline is today (Thursday).
    • CC or BCC dusty.turner@westpoint.edu.
    • Include title, classroom (TH339), date, and class hour.
  2. Print every slide in COLOR, one slide per page.
  3. Tape your slides to the board / wall before class begins.
    • Arrive early – have everything posted before guests walk in.
    • Order your slides left-to-right so a reader can walk the wall like a poster.
  4. Stand by your display for the entire hour. Guests circulate; you don’t.
Section 12 May
0730–1100
13 May
0730–1100
15 May
1300–1630
A2 2 0 15
B2 1 0 17
C2 3 1 12

See Lesson 36 for the full section roster by date.


What We Did: Lesson 35 (ANOVA I)

  • Pairwise \(t\)-tests inflate the family-wise false-positive rate (1 - 0.95^6 ≈ 26.5% for 4 groups)
  • One-way ANOVA asks one question – “do any group means differ?” – with a single test at level \(\alpha\)
  • \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\) vs. \(H_a:\) at least one differs
  • \(F = MSTr / MSE\) on \((k-1,\ N-k)\) df; reject when \(F\) is large
  • ANOVA tells you that groups differ, not which pairs

What We’re Doing: Lesson 37

Objectives

  • Continue one-way ANOVA: assumptions, effect sizes, and reporting
  • Use multiple comparisons (Tukey HSD) to identify which group pairs differ while controlling the family-wise error rate

Required Reading

Devore, Section 10.2 (focus on multiple comparisons)


Break!

In Memoriam: My Classmates

9:11


The Takeaway for Today

ImportantToday’s Key Ideas
  • One-way ANOVA answers “do any group means differ?” – but it does not say which.
  • Running all \(\binom{k}{2}\) pairwise \(t\)-tests inflates the family-wise false-positive rate.
  • Tukey’s HSD tests every pair while holding the family-wise error rate at \(\alpha\).
  • Read Tukey output as CIs – if the CI excludes 0, that pair differs.

One-Way ANOVA Refresher

Today’s scenario: a brigade S-3 wants to know whether Land Navigation completion times (minutes) differ across the four cadet regiments (1st, 2nd, 3rd, 4th). We have 25 cadets sampled from each.

  • \(H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4\)
  • \(H_a:\) at least one regiment mean differs
  • Test statistic: \(F = MSTr / MSE\) on \((k - 1,\ N - k)\) df

The Two Pieces: Signal and Noise

ANOVA is a signal-to-noise ratio. The two pieces:

ImportantMSTr – Mean Square Treatment (between-group, the signal)

How far apart are the group means from the grand mean? When the groups really differ, this is large.

\[ SSTr = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \qquad MSTr = \frac{SSTr}{k - 1} \]

This is the betweenness – it captures variation between the groups.

ImportantMSE – Mean Square Error (within-group, the noise)

How spread out is the data within each group, around its own group mean? This is the background noise that’s there even if the groups are identical.

\[ SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \qquad MSE = \frac{SSE}{N - k} \]

This is the withinness – it captures variation within each group.

The Ratio

\[ F = \frac{MSTr}{MSE} = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{\text{signal}}{\text{noise}} \]

  • \(F\) near 1 → between-group spread looks like ordinary noise → consistent with \(H_0\)
  • \(F\) much larger than 1 → between-group spread is bigger than noise can explain → reject \(H_0\)

Computing F by Hand

Before we let R do it, let’s build the F statistic ourselves from the four regiment means. With \(n_i = 25\), \(N = 100\), \(k = 4\):

Step 1 – group means and grand mean. From the data,

Regiment \(\bar{x}_i\)
1st 154.72
2nd 171.65
3rd 164.40
4th 181.36

Grand mean: \(\bar{x} = 168.03\).

Step 2 – SSTr (between groups, the signal).

\[ SSTr = \sum_{i=1}^{4} n_i (\bar{x}_i - \bar{x})^2 \approx 9527.4 \]

On \(k - 1 = 3\) df, \(MSTr = 9527.4 / 3 \approx 3175.8\).

Step 3 – SSE (within groups, the noise).

\[ SSE = \sum_{i=1}^{4} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \approx 27,286.5 \]

On \(N - k = 96\) df, \(MSE = 27,286.5 / 96 \approx 284.2\).

Step 4 – assemble F.

\[ F = \frac{MSTr}{MSE} = \frac{3175.8}{284.2} \approx 11.17 \quad\text{on }(3,\,96)\text{ df.} \]

The p-value is the area to the right of \(F = 11.17\) under the \(F_{3,\,96}\) density:

So \(p \approx 0.0000\) – well below \(\alpha = 0.05\), so we reject \(H_0\): at least one regiment mean differs.

Confirm with R

Now let R do the same calculation:

fit_aov <- aov(time ~ regiment, data = landnav)
summary(fit_aov)
            Df Sum Sq Mean Sq F value     Pr(>F)    
regiment     3   9527    3176   11.17 0.00000236 ***
Residuals   96  27286     284                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Same \(F_{3,\,96} \approx 11.17\), same \(p \approx 0.0000\). We reject \(H_0\): at least one regiment’s mean Land Nav time differs from the others.

But that’s all ANOVA tells us. We still don’t know which regiments differ – if any specific pair actually does. Rejecting \(H_0\) only says “somewhere in there, at least one mean is different.” The pair could be 1st vs. 4th, or 3rd vs. 2nd, or some other combination – ANOVA is silent on that.

That’s the gap we close today, with Tukey’s HSD.


Tukey’s HSD: Which Pairs Actually Differ?

Tukey’s HSD (Honestly Significant Difference) compares every pair of group means while holding the family-wise error rate at \(\alpha\).

The procedure:

  1. Run ANOVA. (You should reject \(H_0\) first – otherwise there’s nothing to dissect.)
  2. For each pair \((i, j)\), compute the difference \(\bar{x}_i - \bar{x}_j\) and a CI based on the studentized range distribution and \(MSE\) from the ANOVA.
  3. Any pair whose CI excludes 0 is significantly different at the family-wise \(\alpha\) level.

For a balanced design with \(J\) observations per group and \(I\) groups, the family-wise \((1-\alpha)\) confidence interval for \(\mu_i - \mu_j\) (Devore §10.2) is

\[ \bar{X}_i - \bar{X}_j \;\pm\; Q_{\alpha,\, I,\, I(J-1)} \cdot \sqrt{\frac{MSE}{J}} \]

\(Q\) is the upper-\(\alpha\) critical value of the studentized range distribution – a separate tabulated distribution (like \(t\) or \(F\)) built specifically for “what’s the largest gap among \(I\) group means, scaled by their standard error?” With \(I = 4\), \(J = 25\), \(MSE \approx 284.2\), \(\alpha = 0.05\): \(Q_{0.05,\,4,\,96} \approx 3.70\), giving a margin of \(\approx 12.47\) minutes. Any pair of regiment means more than that far apart is significantly different at the family-wise 5% level.

In R, TukeyHSD() takes the fitted aov object:

TukeyHSD(fit_aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = time ~ regiment, data = landnav)

$regiment
             diff        lwr       upr     p adj
2nd-1st 16.934419   4.466633 29.402205 0.0032893
3rd-1st  9.680381  -2.787405 22.148167 0.1842393
4th-1st 26.637740  14.169954 39.105526 0.0000013
3rd-2nd -7.254038 -19.721824  5.213748 0.4289106
4th-2nd  9.703321  -2.764465 22.171107 0.1825279
4th-3rd 16.957359   4.489573 29.425145 0.0032380

For each pair you get:

  • diff – sample mean difference \(\bar{x}_i - \bar{x}_j\)
  • lwr, upr – family-wise 95% CI for the true difference
  • p adj – Tukey-adjusted p-value
TipHow to read it
  • If the CI excludes 0 (equivalently, p adj < 0.05), the pair is significantly different.
  • If the CI includes 0, the data don’t separate that pair.

Visualizing Tukey’s HSD

Any interval crossing the dashed zero line is not a significant pair. Intervals fully to one side of zero are.

Reporting It

After Tukey’s HSD, write the result like this:

A one-way ANOVA found a significant difference in mean Land Nav completion time across the four regiments (\(F_{3, 96} \approx 11.2\), \(p < 0.001\)). Tukey’s HSD identified three significantly different pairs at the family-wise 5% level: 2nd vs. 1st, 4th vs. 1st, and 4th vs. 3rd. Operationally, 1st Regiment was the fastest and 4th Regiment was the slowest; the gap between them is roughly 27 minutes (95% family-wise CI excludes 0). The 3rd-vs-1st, 3rd-vs-2nd, and 4th-vs-2nd pairs were not significantly different.


Worked Example: Coffee Shop Wait Times

A regional coffee chain wants to know if mean wait time (seconds) differs across its three store formats: drive-thru only, in-store only, and hybrid. They time 30 random orders at each.

Step 1 – ANOVA.

fit_coffee <- aov(wait ~ format, data = coffee)
summary(fit_coffee)
            Df Sum Sq Mean Sq F value         Pr(>F)    
format       2  42609   21304   27.95 0.000000000423 ***
Residuals   87  66322     762                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject \(H_0\) at \(\alpha = 0.05\): at least one format’s mean wait differs.

Step 2 – Tukey HSD at 90% confidence (\(\alpha = 0.10\)).

Suppose the chain is willing to tolerate a higher false-positive rate in exchange for tighter intervals. Pass conf.level = 0.90 to TukeyHSD():

TukeyHSD(fit_coffee, conf.level = 0.90)
  Tukey multiple comparisons of means
    90% family-wise confidence level

Fit: aov(formula = wait ~ format, data = coffee)

$format
                         diff       lwr        upr     p adj
In-store-Drive-thru  40.04380  25.21776  54.869846 0.0000007
Hybrid-Drive-thru   -10.43791 -25.26396   4.388135 0.3131373
Hybrid-In-store     -50.48171 -65.30776 -35.655665 0.0000000

Read it. With the 90% family-wise CIs, In-store vs. Drive-thru and In-store vs. Hybrid still exclude 0 (those pairs differ even at this looser level). Drive-thru vs. Hybrid still straddles 0 – the gap isn’t detectable even when we relax to \(\alpha = 0.10\).

Report.

A one-way ANOVA found a significant difference in mean wait time across the three store formats (\(F_{2,87} \approx 28\), \(p < 0.001\)). At a family-wise 90% confidence level (\(\alpha = 0.10\)), Tukey’s HSD shows In-store is slower than both Drive-thru (\(\approx 40\) sec) and Hybrid (\(\approx 50\) sec), while Drive-thru and Hybrid remain statistically indistinguishable (90% CI for the difference still includes 0). Operationally: if reducing wait time is the goal, in-store ordering – not the choice between drive-thru and hybrid – is where to focus.


Board Problems

Problem 1: Tukey HSD Output

TukeyHSD() for a one-way ANOVA on three platoons returns:

              diff     lwr     upr   p adj
B - A        4.20   -1.10    9.50   0.146
C - A        9.80    4.50   15.10   0.001
C - B        5.60    0.30   10.90   0.038
  1. Which pairs differ at the family-wise 5% level?
  2. Construct the “underline” or “letter” summary: which means group together?
  1. C vs. A (\(p = 0.001\)) and C vs. B (\(p = 0.038\)). B vs. A does not.

  2. Letters: A and B share a group; C is its own group. \[ \underline{\bar{x}_A \quad \bar{x}_B} \qquad \bar{x}_C \]


Before You Leave

Today

  • One-way ANOVA tells you that at least one mean differs – not which.
  • Tukey’s HSD compares every pair while holding family-wise error at \(\alpha\).
  • Read the output as CIs: any interval excluding 0 is a significantly different pair.

Any questions?


Next Lesson

Lesson 38: Project Presentations

  • Tuesday, 5 May 2026 – walk-around presentations
  • Slides printed in color and taped to the board before class begins
  • Non-cadet guest invited (with me CC/BCC’d)
  • Stand by your display; guests circulate

Upcoming Graded Events

  • Project Presentation5 May 2026 (Lesson 38)
  • Tech Report (final) – see Canvas
  • TEE – 12, 13, 15 May 2026 (per section schedule)