PSYC 640 - Fall 2023
I plan to have 2 more labs similar to the last lab
Outside of these labs, I plan to have additional mini-labs
Introduction to ANOVA (Analysis of Variance)
One-Way ANOVA
Two-Way ANOVA
Repeated Measures ANOVA
ANCOVA
MANOVA
Goal: Determine whether there are differences among the levels of our variable of interest (omnibus test)
Hypotheses:
\[ H_0: \text{it is true that } \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \\ H_1: \text{it is } \textbf{not} \text{ true that } \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \]
We use variances to build a ratio (between-group variance relative to within-group variance) to determine whether the means differ
\[ F_{df_b, \: df_w} = \frac{MS_{between}}{MS_{within}} \]
If the group means are similar: \(F = \frac{MS_{between}}{MS_{within}} = \frac{small}{large} < 1\)
If the group means differ: \(F = \frac{MS_{between}}{MS_{within}} = \frac{large}{small} > 1\)
Independence
Homogeneity of Variance
Normality
Collect Sample and define hypotheses
Set alpha level
Determine the sampling distribution (we will now use the \(F\)-distribution)
Identify the critical value
Calculate test statistic for sample collected
Inspect & compare statistic to critical value; Calculate probability
We have calculated variance before!
\[ Var = \frac{1}{N}\sum(x_i - \bar{x})^2 \]
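As a quick check of this formula, here is a minimal R sketch that borrows the five grumpiness scores from the worked example later in these slides. It computes the variance exactly as written (dividing by \(N\)) and contrasts it with R's built-in `var()`, which divides by \(N - 1\).

```r
# Grumpiness scores borrowed from the worked example later in these slides
grumpiness <- c(20, 55, 21, 91, 22)

# Variance as written above: average squared deviation from the mean (divide by N)
pop_var <- sum((grumpiness - mean(grumpiness))^2) / length(grumpiness)
pop_var

# R's built-in var() divides by N - 1 (sample variance), so it is larger
var(grumpiness)
```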
Now we have to take into account the variance between and within the groups:
\[ Var(Y) = \frac{1}{N} \sum^G_{k=1}\sum^{N_k}_{i=1}(Y_{ik} - \bar{Y})^2 \]
Notice that we now sum across each group ( \(G\) ) and across each person within a group ( \(N_k\) )
Total Sum of Squares: add up the squared deviations instead of averaging them (notice that \(\frac{1}{N}\) has been removed)
\[ SS_{total} = \sum^G_{k=1}\sum^{N_k}_{i=1}(Y_{ik} - \bar{Y})^2 \]
The total can be partitioned into the variation between the groups AND the variation within the groups:
\[ SS_{total}=SS_{between}+SS_{within} \]
This gets us closer to understanding the difference between means
The difference between the group mean and grand mean
\[ SS_{between} = \sum^G_{k=1}N_k(\bar{Y_k} - \bar{Y})^2 \]
Group | Group Mean \(\bar{Y_k}\) | Grand Mean \(\bar{Y}\) | Squared Deviation \((\bar{Y_k} - \bar{Y})^2\) | \(N_k\) | Weighted Sq. Dev. \(N_k(\bar{Y_k} - \bar{Y})^2\) |
---|---|---|---|---|---|
Cool | 32 | 41.8 | 96.04 | 3 | 288.12 |
Uncool | 56.5 | 41.8 | 216.09 | 2 | 432.18 |
Now we can sum the Weighted Squared Deviations together to get our Sum of Squares Between: \(SS_{between} = 288.12 + 432.18 = 720.3\)
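The same calculation can be sketched in R. This assumes the group memberships shown in the tables (Frodo, Sam, and Bandit are "Cool"; Dolores U. and Dustin are "Uncool"); `tapply()` gives each group's mean and size.

```r
# Worked example: grumpiness scores and group membership
grumpiness <- c(20, 55, 21, 91, 22)
group <- factor(c("Cool", "Cool", "Cool", "Uncool", "Uncool"))

grand_mean  <- mean(grumpiness)                 # grand mean, 41.8
group_means <- tapply(grumpiness, group, mean)  # Cool = 32, Uncool = 56.5
group_n     <- tapply(grumpiness, group, length)

# Weight each group's squared deviation from the grand mean by its size N_k
weighted_sq_dev <- group_n * (group_means - grand_mean)^2
ss_between <- sum(weighted_sq_dev)
ss_between
```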
The difference between the individual and their group mean
\[ SS_{within} = \sum^G_{k=1}\sum^{N_k}_{i=1}(Y_{ik} - \bar{Y_k})^2 \]
Name | Grumpiness \(Y_{ik}\) | Group Mean \(\bar{Y_k}\) | Squared Deviation \((Y_{ik} - \bar{Y_k})^2\) |
---|---|---|---|
Frodo | 20 | 32 | 144 |
Sam | 55 | 32 | 529 |
Bandit | 21 | 32 | 121 |
Dolores U. | 91 | 56.5 | 1190.25 |
Dustin | 22 | 56.5 | 1190.25 |
Now we can sum the Squared Deviations together to get our Sum of Squares Within: \(SS_{within} = 144 + 529 + 121 + 1190.25 + 1190.25 = 3174.5\)
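A minimal R sketch of the same step, using the example data and the group memberships from the table. Here `ave()` returns each person's own group mean, so the subtraction lines up person by person.

```r
# Worked example: grumpiness scores and group membership
grumpiness <- c(20, 55, 21, 91, 22)
group <- factor(c("Cool", "Cool", "Cool", "Uncool", "Uncool"))

# ave() replaces each score with its group's mean (default FUN = mean)
own_group_mean <- ave(grumpiness, group)

# Squared deviation of each person from their own group mean
sq_dev <- (grumpiness - own_group_mean)^2
ss_within <- sum(sq_dev)
ss_within
```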
We can now start to see how the total variation splits up:
\[ SS_{between} = \sum^G_{k=1}N_k(\bar{Y_k} - \bar{Y})^2 = 720.3 \]
\[ SS_{within} = \sum^G_{k=1}\sum^{N_k}_{i=1}(Y_{ik} - \bar{Y_k})^2 = 3174.5 \]
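To confirm that \(SS_{total} = SS_{between} + SS_{within}\) holds for the example, a short R sketch can compute all three pieces directly from the five scores (group memberships as in the tables above).

```r
grumpiness <- c(20, 55, 21, 91, 22)
group <- factor(c("Cool", "Cool", "Cool", "Uncool", "Uncool"))
grand_mean <- mean(grumpiness)

# Each person's deviation from the grand mean
ss_total <- sum((grumpiness - grand_mean)^2)

# Group means vs. grand mean, weighted by group size
ss_between <- sum(tapply(grumpiness, group, length) *
                  (tapply(grumpiness, group, mean) - grand_mean)^2)

# Each person's deviation from their own group mean
ss_within <- sum((grumpiness - ave(grumpiness, group))^2)

ss_total                                      # 720.3 + 3174.5
all.equal(ss_total, ss_between + ss_within)   # the partition holds
```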
Next we have to take into account the degrees of freedom
Since we are examining two sources of variation, the degrees of freedom must reflect both
Between groups: take the number of groups and subtract 1
\(df_{between} = G - 1\)
Within groups: take the total number of observations and subtract the number of groups
\(df_{within} = N - G\)
Next we convert each sum of squares into a “mean square”
This is done by dividing each sum of squares by its degrees of freedom
\[ MS_b = \frac{SS_b}{df_b} \]
\[ MS_w = \frac{SS_w}{df_w} \]
Let’s take a look at how this applies to our example: \[ MS_b = \frac{SS_b}{G-1} = \frac{720.3}{2-1} = 720.3 \]
\[ MS_w = \frac{SS_w}{N-G} = \frac{3174.5}{5-2} = 1058.167 \]
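The degrees-of-freedom and mean-square steps reduce to a few lines of arithmetic. This R sketch plugs in the sums of squares computed above for the example (\(N = 5\), \(G = 2\)).

```r
# Sums of squares from the worked example
ss_between <- 720.3
ss_within  <- 3174.5
N <- 5  # total observations
G <- 2  # number of groups

# Degrees of freedom for each source of variation
df_between <- G - 1
df_within  <- N - G

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
c(ms_between = ms_between, ms_within = ms_within)
```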
\[F = \frac{MS_b}{MS_w}\]
If the null hypothesis is true, \(F\) has an expected value close to 1 (numerator and denominator are estimates of the same variability)
If it is false, the numerator will likely be larger, because systematic, between-group differences contribute to the variance of the means, but not to variance within group.
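One way to see the claim that \(F\) is close to 1 under the null is a small simulation. This sketch assumes a hypothetical design of 3 groups with 20 normally distributed scores each, all drawn from the same population (so \(H_0\) is true), and an arbitrary seed; it fits `lm()` and pulls the \(F\) value from `anova()`.

```r
set.seed(640)  # arbitrary seed so the simulation is reproducible

# Hypothetical design: 3 groups of 20, so df_b = 2 and df_w = 57
g <- factor(rep(c("A", "B", "C"), each = 20))

# Repeatedly draw data where every group has the same mean (null is true)
f_null <- replicate(2000, {
  y <- rnorm(length(g))
  anova(lm(y ~ g))[["F value"]][1]
})

mean(f_null)  # should land near 1

# Proportion of null F values beyond the alpha = .05 critical value (near .05)
reject_rate <- mean(f_null > qf(0.95, df1 = 2, df2 = 57))
reject_rate
```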
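library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Generic illustration: F(3, 196) with its alpha = .05 critical value (about 2.65)
f_crit <- qf(0.95, df1 = 3, df2 = 196)

data.frame(F = c(0, 8)) %>%
  ggplot(aes(x = F)) +
  stat_function(fun = function(x) df(x, df1 = 3, df2 = 196),
                geom = "line") +
  # Shade the rejection region to the right of the critical value
  stat_function(fun = function(x) df(x, df1 = 3, df2 = 196),
                geom = "area", xlim = c(f_crit, 8), fill = "purple") +
  geom_vline(xintercept = f_crit, color = "purple") +
  scale_y_continuous("Density") + scale_x_continuous("F statistic", breaks = NULL) +
  theme_bw(base_size = 20)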
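If the data are normally distributed, each sum of squares (divided by the population variance) follows a \(\chi^2\) distribution, and \(F\) is the ratio of two independent \(\chi^2\) variables, each divided by its degrees of freedom
The \(F\) test is one-tailed. Recall that we’re interested in how far our test statistic lies above the null value \((F \approx 1).\)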
\[F = \frac{MS_b}{MS_w} = \frac{720.3}{1058.167} = 0.68\]
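In R, the probability can be computed directly instead of with an external calculator. This sketch uses the mean squares from the example and `pf()` with `lower.tail = FALSE` to get the right-tail probability under \(F(1, 3)\).

```r
# Mean squares from the worked example
ms_between <- 720.3
ms_within  <- 3174.5 / 3

f_stat <- ms_between / ms_within
round(f_stat, 2)

# Right-tail (one-tailed) probability under F(df_b = 1, df_w = 3)
p_value <- pf(f_stat, df1 = 1, df2 = 3, lower.tail = FALSE)
p_value
```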
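library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Our example has df_b = 1 and df_w = 3; alpha = .05 critical value is about 10.13
f_crit <- qf(0.95, df1 = 1, df2 = 3)

# Start just above 0 because the F(1, 3) density is infinite at F = 0
data.frame(F = c(0.01, 15)) %>%
  ggplot(aes(x = F)) +
  stat_function(fun = function(x) df(x, df1 = 1, df2 = 3),
                geom = "line") +
  # Shade the rejection region to the right of the critical value
  stat_function(fun = function(x) df(x, df1 = 1, df2 = 3),
                geom = "area", xlim = c(f_crit, 15), fill = "purple") +
  geom_vline(xintercept = f_crit, color = "purple") +
  # Observed test statistic from the worked example
  geom_vline(xintercept = 0.68, color = "red") +
  annotate("text", label = "F=0.68",
           x = 2.5, y = 0.65, size = 8, color = "red") +
  scale_y_continuous("Density") + scale_x_continuous("F statistic", breaks = NULL) +
  theme_bw(base_size = 20)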
What can we conclude?
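A sketch of the decision step in R, assuming \(\alpha = .05\): compare the observed \(F\) with the critical value from `qf()` (equivalently, compare the p-value with \(\alpha\)).

```r
alpha <- 0.05

# Observed F from the worked example: MS_b / MS_w
f_stat <- 720.3 / (3174.5 / 3)

# Critical value and p-value for F(1, 3)
f_crit  <- qf(1 - alpha, df1 = 1, df2 = 3)
p_value <- pf(f_stat, df1 = 1, df2 = 3, lower.tail = FALSE)

decision <- if (f_stat > f_crit) "Reject H0" else "Fail to reject H0"
decision
```

Because \(F = 0.68\) is far below the critical value (and is even below 1), the sketch fails to reject \(H_0\) for this example.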
Post hoc tests are performed when the omnibus test shows a significant difference among the groups, to examine which specific groups differ
ANOVA output is usually presented as a summary table, and results are often reported the same way in the manuscript
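As one example of a post hoc procedure, base R provides Tukey's HSD via `TukeyHSD()` on an `aov()` fit. The sketch below runs it on the grumpiness example only to show the syntax. Since the omnibus test above was not significant, a post hoc test would not normally be run here, and with only two groups it simply repeats the omnibus comparison.

```r
grumpiness <- c(20, 55, 21, 91, 22)
group <- factor(c("Cool", "Cool", "Cool", "Uncool", "Uncool"))

fit <- aov(grumpiness ~ group)

# Pairwise differences with family-wise adjusted p-values
tukey <- TukeyHSD(fit)
tukey$group  # columns: diff, lwr, upr, p adj
```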
Source of Variation | df | Sum of Squares | Mean Squares | F-statistic | p-value |
---|---|---|---|---|---|
Group | \(G-1\) | \(SS_b\) | \(MS_b = \frac{SS_b}{df_b}\) | \(F = \frac{MS_b}{MS_w}\) | \(p\) |
Residual | \(N-G\) | \(SS_w\) | \(MS_w = \frac{SS_w}{df_w}\) | | |
Total | \(N-1\) | \(SS_{total}\) | | | |
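In R, `aov()` followed by `summary()` produces this table. A minimal sketch on the grumpiness example (group memberships as in the earlier tables); the `Total` row is not printed by R but is the sum of the two rows.

```r
grumpiness <- c(20, 55, 21, 91, 22)
group <- factor(c("Cool", "Cool", "Cool", "Uncool", "Uncool"))

fit <- aov(grumpiness ~ group)

# Df, Sum Sq, Mean Sq, F value, Pr(>F) for the group and residual rows
anova_table <- summary(fit)[[1]]
anova_table
```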
A one-way analysis of variance was used to test for differences in the [variable of interest/outcome variable] as a function of [whatever the factor is]. Specifically, differences in [variable of interest] were assessed for the [list different levels and be sure to include (M = , SD = )]. The one-way ANOVA revealed a significant/nonsignificant effect of [factor] on scores on the [variable of interest] (F(dfb, dfw) = f-ratio, p = p-value, η² = effect size).
Planned comparisons were conducted to compare expected differences among the [however many groups] means. Planned contrasts revealed that participants in the [one of the conditions] had greater/fewer [variable of interest] than participants in the [comparison condition] (include the p-value). Repeat this type of sentence for each contrast you completed. Descriptive statistics are reported in Table 1.
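The effect size in the template can be computed as \(\eta^2 = SS_{between} / SS_{total}\). This R sketch uses the example's sums of squares and `sprintf()` to assemble the statistics portion of the sentence; the exact text format is only an illustration.

```r
ss_between <- 720.3
ss_within  <- 3174.5
df_b <- 1L
df_w <- 3L

# Eta squared: proportion of total variation explained by group membership
eta_sq <- ss_between / (ss_between + ss_within)

f_stat  <- (ss_between / df_b) / (ss_within / df_w)
p_value <- pf(f_stat, df_b, df_w, lower.tail = FALSE)

report <- sprintf("F(%d, %d) = %.2f, p = %.3f, eta^2 = %.2f",
                  df_b, df_w, f_stat, p_value, eta_sq)
report
```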
We want to be able to connect with the paranormal. We collected data at different locations to examine whether certain areas have more ghost activity. At each location we took multiple electromagnetic field (EMF) ratings to gauge the potential presence of ghosts. The locations were chosen by a select group of undergraduate researchers. They include:
Collect Sample and define hypotheses
Set alpha level
Determine the sampling distribution (we will now use the \(F\)-distribution)
Identify the critical value
Calculate test statistic for sample collected
Inspect & compare statistic to critical value; Calculate probability
Take a look at the data and compute the following:
Source of Variation | df | Sum of Squares | Mean Squares | F-statistic | p-value |
---|---|---|---|---|---|
Group | \(G-1\) | \(SS_b\) | \(MS_b = \frac{SS_b}{df_b}\) | \(F = \frac{MS_b}{MS_w}\) | \(p\) |
Residual | \(N-G\) | \(SS_w\) | \(MS_w = \frac{SS_w}{df_w}\) | | |
Total | \(N-1\) | \(SS_{total}\) | | | |
You can use R or Excel
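If you use R, one option is to wrap the hand calculations from these slides in a small function that returns the summary table. The function name `one_way_table`, the file name, and the column names in the commented usage are hypothetical; adapt them to however the ghost data are stored. Here it is checked against the grumpiness example.

```r
# Hypothetical helper: one-way ANOVA summary table computed by hand
# y = outcome scores, g = group labels (same length as y)
one_way_table <- function(y, g) {
  g <- factor(g)
  N <- length(y)
  G <- nlevels(g)
  grand <- mean(y)
  ss_b <- sum(tapply(y, g, length) * (tapply(y, g, mean) - grand)^2)
  ss_w <- sum((y - ave(y, g))^2)        # deviations from own group mean
  df_b <- G - 1
  df_w <- N - G
  ms_b <- ss_b / df_b
  ms_w <- ss_w / df_w
  f <- ms_b / ms_w
  data.frame(
    source = c("Group", "Residual", "Total"),
    df = c(df_b, df_w, N - 1),
    ss = c(ss_b, ss_w, ss_b + ss_w),
    ms = c(ms_b, ms_w, NA),
    F  = c(f, NA, NA),
    p  = c(pf(f, df_b, df_w, lower.tail = FALSE), NA, NA)
  )
}

# Check on the grumpiness example
grumpiness <- c(20, 55, 21, 91, 22)
group <- c("Cool", "Cool", "Cool", "Uncool", "Uncool")
tab <- one_way_table(grumpiness, group)
tab

# Hypothetical usage with the ghost data (file and column names are placeholders):
# ghosts <- read.csv("ghost_emf.csv")
# one_way_table(ghosts$emf, ghosts$location)
```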