Two-Way ANOVA

PSYC 640 - Fall 2023

Dustin Haraden, PhD

Last Class

  • One Way ANOVA
    • Comparing means across multiple groups/levels

Looking Ahead

  • R-Workshop! (Link to Sign up)
    • 11/3 & 12/1 from 2-3pm
  • I will put together guidelines for the final project, along with dates for completing its various components
  • Lab 3 (starting today) will be due before class on 11/6
    • You will be able to resubmit any missed items up to 1 week after I post the initial feedback
  • Correlation & Regression

Today…

One-Way ANOVA

  • Contrasts/post-hoc tests

Two-Way ANOVA

# File management
library(here)
# for dplyr, ggplot2
library(tidyverse)
# Loading data
library(rio)
# Estimating Marginal Means
library(emmeans)
# Pretty Tables
library(kableExtra)

# Remove scientific notation
options(scipen = 999)

Review of One-Way ANOVA

One-Way ANOVA

Goal: Tell us whether there are differences among the levels of our variable of interest (Omnibus Test)

Using the between- and within-group variance to create the \(F\)-statistic/ratio

Hypotheses:

\[ H_0: \text{it is true that } \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \\ H_1: \text{it is } \boldsymbol{not} \text{ true that } \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \]

Little between-group variability relative to within-group variability: \(F = \frac{MS_{between}}{MS_{within}} = \frac{\text{small}}{\text{large}} < 1\)

Substantial between-group variability relative to within-group variability: \(F = \frac{MS_{between}}{MS_{within}} = \frac{\text{large}}{\text{small}} > 1\)

Review of the NHST process

  1. Collect Sample and define hypotheses

  2. Set alpha level

  3. Determine the sampling distribution (will be using \(F\)-distribution now)

  4. Identify the critical value

  5. Calculate test statistic for sample collected

  6. Inspect & compare statistic to critical value; Calculate probability
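
Steps 4 and 6 map onto one-liners in R; a minimal sketch (the df values here are arbitrary, chosen to match the figure below):

# Step 4: critical value for alpha = .05 with df1 = 3, df2 = 196
qf(.95, df1 = 3, df2 = 196)                       # ~2.65
# Step 6: probability of an F at least this large under the null
pf(2.65, df1 = 3, df2 = 196, lower.tail = FALSE)  # ~.05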

Steps to calculating F-ratio

  1. Partition the variance into Sums of Squares (Between & Within)
  2. Degrees of Freedom
  3. Mean Squares
  4. F-Statistic
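
As a sketch of these four steps in base R, using hypothetical toy data (\(k = 3\) groups of 10; not the class dataset):

set.seed(1)
toy <- data.frame(group = rep(c("a", "b", "c"), each = 10),
                  y = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 8)))
grand_mean <- mean(toy$y)

# 1. Sums of Squares (between & within)
ss_b <- sum(tapply(toy$y, toy$group,
                   function(x) length(x) * (mean(x) - grand_mean)^2))
ss_w <- sum(tapply(toy$y, toy$group, function(x) sum((x - mean(x))^2)))

# 2. Degrees of freedom
df_b <- 3 - 1          # k - 1
df_w <- nrow(toy) - 3  # N - k

# 3. Mean squares
ms_b <- ss_b / df_b
ms_w <- ss_w / df_w

# 4. F-statistic (matches summary(aov(y ~ group, data = toy)))
ms_b / ms_w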

Calculating the F-Statistic

\[F = \frac{MS_b}{MS_w}\]

If the null hypothesis is true, \(F\) has an expected value close to 1 (numerator and denominator are estimates of the same variability)

If it is false, the numerator will likely be larger, because systematic between-group differences contribute to the variance of the means but not to the variance within groups.

# Plot the F(3, 196) density and shade the rejection region
# beyond the critical value (2.65 at alpha = .05)
data.frame(F = c(0,8)) %>%
  ggplot(aes(x = F)) +
  stat_function(fun = function(x) df(x, df1 = 3, df2 = 196), 
                geom = "line") +
  stat_function(fun = function(x) df(x, df1 = 3, df2 = 196), 
                geom = "area", xlim = c(2.65, 8), fill = "purple") +
  geom_vline(aes(xintercept = 2.65), color = "purple") +
  scale_y_continuous("Density") + scale_x_continuous("F statistic", breaks = NULL) +
  theme_bw(base_size = 20)

If data are normally distributed, then the variance is \(\chi^2\) distributed

Tests based on the \(F\)-distribution are one-tailed. Recall that we’re interested in how far our test statistic is from the null \((F = 1)\).

ANOVA table

| Source of Variation | df | Sum of Squares | Mean Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Group | \(G-1\) | \(SS_b\) | \(MS_b = \frac{SS_b}{df_b}\) | \(F = \frac{MS_b}{MS_w}\) | \(p\) |
| Residual | \(N-G\) | \(SS_w\) | \(MS_w = \frac{SS_w}{df_w}\) | | |
| Total | \(N-1\) | \(SS_{total}\) | | | |

Contrasts/Post-Hoc

Contrasts/Post-Hoc Tests

Performed when the omnibus test shows a significant difference among the groups, to examine which groups differ

  1. Contrasts: When we have a priori hypotheses
  2. Post-hoc Tests: When we want to test everything

These comparisons take the general form of t-tests, but note some extensions:

  • the pooled variance estimate comes from \(SS_{\text{residual}}\), meaning it pulls information from all groups

  • the degrees of freedom for the \(t\)-test are \(N-k\), so it uses all of the data

Previous Spooky Data

EMF rating across multiple locations

spooky <- import("https://raw.githubusercontent.com/dharaden/dharaden.github.io/main/data/SS%20Calculations.csv") %>% 
  select(Location, EMF) # Drop some extra empty variables we don't care about

fit_1 <- aov(EMF ~ Location, data = spooky)
summary(fit_1)
             Df Sum Sq Mean Sq F value              Pr(>F)    
Location      7   5115   730.7   523.5 <0.0000000000000002 ***
Residuals   138    193     1.4                                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests

#library(emmeans)
emmeans(fit_1, pairwise ~ Location, adjust = "none")
$emmeans
 Location          emmean    SE  df lower.CL upper.CL
 Abandoned Walmart  11.42 0.278 138    10.87    11.97
 Gettysburg         21.23 0.278 138    20.68    21.78
 Ikea               16.27 0.278 138    15.72    16.82
 RIT Tunnels         7.51 0.278 138     6.96     8.06
 Stats Classroom    12.35 0.271 138    11.82    12.89
 The Woods          25.01 0.278 138    24.46    25.56
 Unused Stairwell   14.23 0.278 138    13.68    14.78
 Walmart             6.81 0.271 138     6.28     7.35

Confidence level used: 0.95 

$contrasts
 contrast                             estimate    SE  df t.ratio p.value
 Abandoned Walmart - Gettysburg         -9.809 0.394 138 -24.910  <.0001
 Abandoned Walmart - Ikea               -4.847 0.394 138 -12.308  <.0001
 Abandoned Walmart - RIT Tunnels         3.913 0.394 138   9.937  <.0001
 Abandoned Walmart - Stats Classroom    -0.932 0.389 138  -2.399  0.0178
 Abandoned Walmart - The Woods         -13.587 0.394 138 -34.501  <.0001
 Abandoned Walmart - Unused Stairwell   -2.809 0.394 138  -7.134  <.0001
 Abandoned Walmart - Walmart             4.607 0.389 138  11.857  <.0001
 Gettysburg - Ikea                       4.962 0.394 138  12.601  <.0001
 Gettysburg - RIT Tunnels               13.723 0.394 138  34.847  <.0001
 Gettysburg - Stats Classroom            8.877 0.389 138  22.845  <.0001
 Gettysburg - The Woods                 -3.777 0.394 138  -9.592  <.0001
 Gettysburg - Unused Stairwell           7.000 0.394 138  17.775  <.0001
 Gettysburg - Walmart                   14.417 0.389 138  37.101  <.0001
 Ikea - RIT Tunnels                      8.760 0.394 138  22.246  <.0001
 Ikea - Stats Classroom                  3.915 0.389 138  10.074  <.0001
 Ikea - The Woods                       -8.740 0.394 138 -22.193  <.0001
 Ikea - Unused Stairwell                 2.038 0.394 138   5.174  <.0001
 Ikea - Walmart                          9.454 0.389 138  24.330  <.0001
 RIT Tunnels - Stats Classroom          -4.846 0.389 138 -12.470  <.0001
 RIT Tunnels - The Woods               -17.500 0.394 138 -44.439  <.0001
 RIT Tunnels - Unused Stairwell         -6.723 0.394 138 -17.072  <.0001
 RIT Tunnels - Walmart                   0.694 0.389 138   1.786  0.0763
 Stats Classroom - The Woods           -12.654 0.389 138 -32.565  <.0001
 Stats Classroom - Unused Stairwell     -1.877 0.389 138  -4.831  <.0001
 Stats Classroom - Walmart               5.540 0.383 138  14.453  <.0001
 The Woods - Unused Stairwell           10.777 0.394 138  27.367  <.0001
 The Woods - Walmart                    18.194 0.389 138  46.821  <.0001
 Unused Stairwell - Walmart              7.417 0.389 138  19.087  <.0001

Family-wise error

These pairwise comparisons quickly grow in number as the number of groups increases. With \(k = 8\) groups, we have \(k(k-1)/2 = 28\) possible pairwise comparisons.

As the number of tests grows, and assuming the null hypothesis is true, the probability that we will make one or more Type I errors increases. To approximate the magnitude of the problem, we can assume that the multiple pairwise comparisons are independent. The probability that we don’t make a Type I error for one test is:

\[P(\text{No Type I}, 1 \text{ test}) = 1-\alpha\]

The probability that we don’t make a Type I error for two tests is:

\[P(\text{No Type I}, 2 \text{ tests}) = (1-\alpha)(1-\alpha)\]

For C tests, the probability that we make no Type I errors is

\[P(\text{No Type I}, C \text{ tests}) = (1-\alpha)^C\]

We can then use the following to calculate the probability that we make one or more Type I errors in a collection of C independent tests.

\[P(\text{At least one Type I}, C \text{ tests}) = 1-(1-\alpha)^C\]

The Type I error inflation that accompanies multiple comparisons motivates the large number of “correction” procedures that have been developed.

Multiple comparisons, each tested with \(\alpha_{per-test}\), increases the family-wise \(\alpha\) level.

\[\large \alpha_{family-wise} = 1 - (1-\alpha_{per-test})^C\]

Šidák showed that the family-wise \(\alpha\) can be controlled to a desired level (e.g., .05) by changing the \(\alpha_{per-test}\) to:

\[\large \alpha_{per-test} = 1 - (1-\alpha_{family-wise})^{\frac{1}{C}}\]
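
In R, for our 8-location example:

choose(8, 2)            # 28 pairwise comparisons
alpha <- .05
C <- 28
1 - (1 - alpha)^C       # ~.76: probability of at least one Type I error
1 - (1 - alpha)^(1/C)   # Sidak per-test alpha: ~.0018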

Bonferroni

Bonferroni (and Dunn, and others) suggested this simple approximation:

\[\large \alpha_{per-test} = \frac{\alpha_{family-wise}}{C}\]

This is typically called the Bonferroni correction and is very often used even though better alternatives are available.
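
For our 28 comparisons, the Bonferroni and Šidák per-test levels are nearly identical:

.05 / 28  # Bonferroni per-test alpha: ~.0018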

emmeans(fit_1, pairwise ~ Location, adjust = "bonferroni")
$emmeans
 Location          emmean    SE  df lower.CL upper.CL
 Abandoned Walmart  11.42 0.278 138    10.87    11.97
 Gettysburg         21.23 0.278 138    20.68    21.78
 Ikea               16.27 0.278 138    15.72    16.82
 RIT Tunnels         7.51 0.278 138     6.96     8.06
 Stats Classroom    12.35 0.271 138    11.82    12.89
 The Woods          25.01 0.278 138    24.46    25.56
 Unused Stairwell   14.23 0.278 138    13.68    14.78
 Walmart             6.81 0.271 138     6.28     7.35

Confidence level used: 0.95 

$contrasts
 contrast                             estimate    SE  df t.ratio p.value
 Abandoned Walmart - Gettysburg         -9.809 0.394 138 -24.910  <.0001
 Abandoned Walmart - Ikea               -4.847 0.394 138 -12.308  <.0001
 Abandoned Walmart - RIT Tunnels         3.913 0.394 138   9.937  <.0001
 Abandoned Walmart - Stats Classroom    -0.932 0.389 138  -2.399  0.4972
 Abandoned Walmart - The Woods         -13.587 0.394 138 -34.501  <.0001
 Abandoned Walmart - Unused Stairwell   -2.809 0.394 138  -7.134  <.0001
 Abandoned Walmart - Walmart             4.607 0.389 138  11.857  <.0001
 Gettysburg - Ikea                       4.962 0.394 138  12.601  <.0001
 Gettysburg - RIT Tunnels               13.723 0.394 138  34.847  <.0001
 Gettysburg - Stats Classroom            8.877 0.389 138  22.845  <.0001
 Gettysburg - The Woods                 -3.777 0.394 138  -9.592  <.0001
 Gettysburg - Unused Stairwell           7.000 0.394 138  17.775  <.0001
 Gettysburg - Walmart                   14.417 0.389 138  37.101  <.0001
 Ikea - RIT Tunnels                      8.760 0.394 138  22.246  <.0001
 Ikea - Stats Classroom                  3.915 0.389 138  10.074  <.0001
 Ikea - The Woods                       -8.740 0.394 138 -22.193  <.0001
 Ikea - Unused Stairwell                 2.038 0.394 138   5.174  <.0001
 Ikea - Walmart                          9.454 0.389 138  24.330  <.0001
 RIT Tunnels - Stats Classroom          -4.846 0.389 138 -12.470  <.0001
 RIT Tunnels - The Woods               -17.500 0.394 138 -44.439  <.0001
 RIT Tunnels - Unused Stairwell         -6.723 0.394 138 -17.072  <.0001
 RIT Tunnels - Walmart                   0.694 0.389 138   1.786  1.0000
 Stats Classroom - The Woods           -12.654 0.389 138 -32.565  <.0001
 Stats Classroom - Unused Stairwell     -1.877 0.389 138  -4.831  0.0001
 Stats Classroom - Walmart               5.540 0.383 138  14.453  <.0001
 The Woods - Unused Stairwell           10.777 0.394 138  27.367  <.0001
 The Woods - Walmart                    18.194 0.389 138  46.821  <.0001
 Unused Stairwell - Walmart              7.417 0.389 138  19.087  <.0001

P value adjustment: bonferroni method for 28 tests 

The Bonferroni procedure is conservative. Other correction procedures have been developed that control family-wise Type I error at .05 but that are more powerful than the Bonferroni procedure. The most common one is the Holm procedure.

The Holm procedure does not make a constant adjustment to each per-test \(\alpha\). Instead it makes adjustments in stages depending on the relative size of each pairwise p-value.

Holm correction

  1. Rank order the p-values from largest to smallest.
  2. Start with the smallest p-value. Multiply it by its rank.
  3. Go to the next smallest p-value. Multiply it by its rank. If the result is larger than the adjusted p-value from the previous step, keep it; otherwise, carry the previous step's adjusted p-value forward.
  4. Repeat Step 3 for the remaining p-values.
  5. Judge significance of each new p-value against \(\alpha = .05\).
| Original p value | Rank | Rank × p | Holm | Bonferroni |
|---|---|---|---|---|
| 0.0012 | 6 | 0.0072 | 0.0072 | 0.0072 |
| 0.0023 | 5 | 0.0115 | 0.0115 | 0.0138 |
| 0.0450 | 4 | 0.1800 | 0.1800 | 0.2700 |
| 0.0470 | 3 | 0.1410 | 0.1800 | 0.2820 |
| 0.0530 | 2 | 0.1060 | 0.1800 | 0.3180 |
| 0.2100 | 1 | 0.2100 | 0.2100 | 1.0000 |
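
Base R's p.adjust() reproduces both adjusted columns:

p <- c(0.0012, 0.0023, 0.0450, 0.0470, 0.0530, 0.2100)
p.adjust(p, method = "holm")        # 0.0072 0.0115 0.1800 0.1800 0.1800 0.2100
p.adjust(p, method = "bonferroni")  # 0.0072 0.0138 0.2700 0.2820 0.3180 1.0000
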
emmeans(fit_1, pairwise ~ Location, adjust = "holm")
$emmeans
 Location          emmean    SE  df lower.CL upper.CL
 Abandoned Walmart  11.42 0.278 138    10.87    11.97
 Gettysburg         21.23 0.278 138    20.68    21.78
 Ikea               16.27 0.278 138    15.72    16.82
 RIT Tunnels         7.51 0.278 138     6.96     8.06
 Stats Classroom    12.35 0.271 138    11.82    12.89
 The Woods          25.01 0.278 138    24.46    25.56
 Unused Stairwell   14.23 0.278 138    13.68    14.78
 Walmart             6.81 0.271 138     6.28     7.35

Confidence level used: 0.95 

$contrasts
 contrast                             estimate    SE  df t.ratio p.value
 Abandoned Walmart - Gettysburg         -9.809 0.394 138 -24.910  <.0001
 Abandoned Walmart - Ikea               -4.847 0.394 138 -12.308  <.0001
 Abandoned Walmart - RIT Tunnels         3.913 0.394 138   9.937  <.0001
 Abandoned Walmart - Stats Classroom    -0.932 0.389 138  -2.399  0.0355
 Abandoned Walmart - The Woods         -13.587 0.394 138 -34.501  <.0001
 Abandoned Walmart - Unused Stairwell   -2.809 0.394 138  -7.134  <.0001
 Abandoned Walmart - Walmart             4.607 0.389 138  11.857  <.0001
 Gettysburg - Ikea                       4.962 0.394 138  12.601  <.0001
 Gettysburg - RIT Tunnels               13.723 0.394 138  34.847  <.0001
 Gettysburg - Stats Classroom            8.877 0.389 138  22.845  <.0001
 Gettysburg - The Woods                 -3.777 0.394 138  -9.592  <.0001
 Gettysburg - Unused Stairwell           7.000 0.394 138  17.775  <.0001
 Gettysburg - Walmart                   14.417 0.389 138  37.101  <.0001
 Ikea - RIT Tunnels                      8.760 0.394 138  22.246  <.0001
 Ikea - Stats Classroom                  3.915 0.389 138  10.074  <.0001
 Ikea - The Woods                       -8.740 0.394 138 -22.193  <.0001
 Ikea - Unused Stairwell                 2.038 0.394 138   5.174  <.0001
 Ikea - Walmart                          9.454 0.389 138  24.330  <.0001
 RIT Tunnels - Stats Classroom          -4.846 0.389 138 -12.470  <.0001
 RIT Tunnels - The Woods               -17.500 0.394 138 -44.439  <.0001
 RIT Tunnels - Unused Stairwell         -6.723 0.394 138 -17.072  <.0001
 RIT Tunnels - Walmart                   0.694 0.389 138   1.786  0.0763
 Stats Classroom - The Woods           -12.654 0.389 138 -32.565  <.0001
 Stats Classroom - Unused Stairwell     -1.877 0.389 138  -4.831  <.0001
 Stats Classroom - Walmart               5.540 0.383 138  14.453  <.0001
 The Woods - Unused Stairwell           10.777 0.394 138  27.367  <.0001
 The Woods - Walmart                    18.194 0.389 138  46.821  <.0001
 Unused Stairwell - Walmart              7.417 0.389 138  19.087  <.0001

P value adjustment: holm method for 28 tests 

ANOVA is regression

In regression, we can accommodate categorical predictors. How does this compare to ANOVA?

  • Same omnibus test of the model!

(Really the same model, but packaged differently.)

  • When would you use one versus the other?

ANOVA

  • More traditional for 3+ groups

  • Comparing/controlling multiple categorical variables

Regression

  • Best for two groups

  • Incorporating continuous predictors too

  • Good for 3+ groups when you have more specific hypotheses (contrasts)
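
A quick check with the spooky data from earlier: the regression and ANOVA framings yield the identical omnibus test.

anova(lm(EMF ~ Location, data = spooky))  # regression framing
summary(fit_1)                            # ANOVA framing: same F and p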

Two-Way ANOVA

What is a Two-Way ANOVA?

Examines the impact of 2 nominal/categorical variables on a continuous outcome

We can now examine:

  • The impact of variable 1 on the outcome (Main Effect)

  • The impact of variable 2 on the outcome (Main Effect)

  • The interaction of variable 1 & 2 on the outcome (Interaction Effect)

    • The effect of variable 1 depends on the level of variable 2

Two-Way ANOVA: Assumptions

Same as what we’ve examined previously, plus a couple more:

  • Independence

  • Normality of Residuals

  • Homoscedasticity (Homogeneity of Variance)

  • Additivity

    • The effects of each factor are consistent across all levels of the other factor

  • Multicollinearity

    • Correlations between the factors can make it difficult to separate their unique contributions to the outcome

  • Equal Cell Sizes (N)
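
Several of these can be eyeballed with the usual aov/lm diagnostics; a minimal sketch (fit2 and sleep_data are the two-way model and data we build later in these slides):

plot(fit2, which = 1)  # Residuals vs Fitted: homoscedasticity
plot(fit2, which = 2)  # Normal Q-Q: normality of residuals
table(sleep_data$phone_type, sleep_data$phone_usage)  # check cell sizes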

Main Effect & Interactions

Main Effect: Basically a one-way ANOVA

  • The effect of variable 1 is the same across all levels of variable 2

Interaction:

  • Able to examine the effect of variable 1 across different levels of variable 2

  • Put simply, the effect of variable 1 on our outcome DEPENDS on the level of variable 2

Example Data

Data

We are interested in the impact of phone usage on overall sleep quality

We include 2 variables of interest: 1) Phone Type (iOS vs. Android) and 2) Amount of Usage (High, Medium & Low) to examine whether there are differences in sleep quality

Note: It is important to consider HOW we operationalize constructs, as some things we treat as factors here could easily be continuous variables

# Generate some example data

# Set random seed for reproducibility
set.seed(42)

n <- 500  # Set number of observations

# Generate Type of Phone data
phone_type <- sample(c("Android", "iOS"), 
                     n, 
                     replace = TRUE)

# Generate Phone Usage data
phone_usage <- factor(sample(c("Low", "Medium", "High"), 
                             n, 
                             replace = TRUE), 
                      levels= c("Low", "Medium", "High"))

# Generate Sleep Quality data (with some variation)
# Intentionally inflating to highlight main effects
sleep_quality <- round(
  rnorm(n,
        mean = ifelse(phone_type == "Android", 5, 7) +
          ifelse(phone_usage == "High", 1, -1),
        sd = 1),
  1
)

# Generate a second, purely random Sleep Quality outcome (no group effects)
sleep_quality2 <- round(rnorm(n, mean = 6, sd = 3), 1)

# Create a data frame
sleep_data <- data.frame(phone_type, 
                         phone_usage, 
                         sleep_quality,
                         sleep_quality2)

head(sleep_data)
  phone_type phone_usage sleep_quality sleep_quality2
1    Android        High           5.0           12.5
2    Android         Low           3.8           12.8
3    Android      Medium           3.9            8.7
4    Android      Medium           3.3            5.8
5        iOS         Low           5.6            8.0
6        iOS        High           7.8            3.6

Test Statistics

We’ve gone too far today without me showing some equations

With one-way ANOVA, we calculated \(SS_{between}\) and \(SS_{within}\) and used them to form the F-statistic

Now we have another variable to take into account. Therefore, we need to calculate:

\[ SS_{\text{between Group 1}}, \; SS_{\text{between Group 2}} \]

\[ SS_{\text{within}}, \; SS_{\text{total}} \]

\(F\)-Statistic/Ratio

Helpful site for hand calculations: (link)

  • Since we have these various sums of squares, we can then fill out the ANOVA table

  • This also means that we will have multiple F-Statistics

| Source of Variation | df | Sum of Squares | Mean Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Group 1 | \(j-1\) | \(SS_{b1}\) | \(MS_{b1} = \frac{SS_{b1}}{df_{b1}}\) | \(F_1 = \frac{MS_{b1}}{MS_w}\) | \(p_1\) |
| Group 2 | \(k-1\) | \(SS_{b2}\) | \(MS_{b2} = \frac{SS_{b2}}{df_{b2}}\) | \(F_2 = \frac{MS_{b2}}{MS_w}\) | \(p_2\) |
| Interaction | \((j-1)(k-1)\) | \(SS_{int}\) | \(MS_{int} = \frac{SS_{int}}{df_{int}}\) | \(F_3 = \frac{MS_{int}}{MS_w}\) | \(p_3\) |
| Residual (within) | \(N-jk\) | \(SS_w\) | \(MS_w = \frac{SS_w}{df_w}\) | | |
| Total | \(N-1\) | \(SS_{total}\) | | | |

Note that all three F-ratios share the same denominator, \(MS_w\).

Calculate \(SS_{interaction}\)

Now we need to be able to take into account the interaction term

This is done by calculating all other \(SS\) and then performing:

\[ SS_{interaction} = SS_{total} - SS_{b1} - SS_{b2} - SS_{w} \]

Example: Calculate the \(SS\) for the new data

  • no…we aren’t doing that

Running in R

Take a peek at the dataset again so we can look at the variables and their names

head(sleep_data)
  phone_type phone_usage sleep_quality sleep_quality2
1    Android        High           5.0           12.5
2    Android         Low           3.8           12.8
3    Android      Medium           3.9            8.7
4    Android      Medium           3.3            5.8
5        iOS         Low           5.6            8.0
6        iOS        High           7.8            3.6

Running in R

We will use the aov() function to set up our model

fit2 <- aov(sleep_quality ~ phone_type + phone_usage + phone_type:phone_usage, 
            data = sleep_data)

# OR, equivalently

fit2 <- aov(sleep_quality ~ phone_type * phone_usage, 
            data = sleep_data)

summary(fit2)
                        Df Sum Sq Mean Sq F value              Pr(>F)    
phone_type               1  544.6   544.6 515.727 <0.0000000000000002 ***
phone_usage              2  498.7   249.4 236.148 <0.0000000000000002 ***
phone_type:phone_usage   2    0.7     0.3   0.317               0.728    
Residuals              494  521.6     1.1                                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
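
As a quick check of the \(SS_{interaction}\) identity from earlier, we can pull the Sum Sq column out of the summary:

ss <- summary(fit2)[[1]][["Sum Sq"]]  # phone_type, phone_usage, interaction, residuals
ss_total <- sum((sleep_data$sleep_quality - mean(sleep_data$sleep_quality))^2)
ss_total - ss[1] - ss[2] - ss[4]      # equals ss[3], the interaction SS
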
# Interaction plot: does the effect of usage differ by phone type?
ggplot(sleep_data, aes(x = phone_usage, 
                       y = sleep_quality, 
                       color = phone_type, 
                       group = phone_type)) +
  geom_point() +
  geom_smooth(method = "lm",
              se=FALSE) +
  labs(x = "Phone Usage", 
       y = "Sleep Quality", 
       color = "Phone Type") +
  theme_minimal()
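
With the model in hand, emmeans can also probe simple effects, comparing usage levels within each phone type (a sketch using the same fit2):

# Simple effects: pairwise usage comparisons within each phone type
emmeans(fit2, pairwise ~ phone_usage | phone_type, adjust = "holm")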

Different Outcome

We created the outcome sleep_quality2 to be completely random (not following a formula) so that we can visualize what null effects look like

fit3 <- aov(sleep_quality2 ~ phone_type * phone_usage, 
            data = sleep_data)
summary(fit3)
                        Df Sum Sq Mean Sq F value Pr(>F)   
phone_type               1     86   85.65   9.162 0.0026 **
phone_usage              2     56   27.83   2.977 0.0519 . 
phone_type:phone_usage   2      8    4.18   0.447 0.6399   
Residuals              494   4618    9.35                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Create an interaction plot using ggplot2
ggplot(sleep_data, aes(x = phone_usage, y = sleep_quality2, color = phone_type, group = phone_type)) +
  geom_point() +
  geom_smooth(method = "lm",
              se=FALSE) +
  labs(x = "Phone Usage", y = "Sleep Quality", color = "Phone Type") +
  theme_minimal()

Next time…

  • ¯\_(ツ)_/¯

  • Correlation & Regression