Comparing Means: t-tests

PSYC 640 - Fall 2023

Dustin Haraden, PhD

Last week

Categorical Data analysis with the \(\chi^2\) distribution
- Test of Independence & Goodness of Fit Test
Single Sample \(t\)-test

library(lsr)
# File management
library(here)
# for dplyr, ggplot
library(tidyverse)
# Making things look nice
library(ggpubr)
#Loading data
library(rio)
# Assumption Checks
library(car)

#Remove Scientific Notation 
options(scipen=999)

Today…

Comparing Means with the \(t\)-test
- Independent samples
- Paired Samples (probably next class)

Comparing Means

Calculated using a t-test. To calculate the t-statistic, you will use this formula:

\[t_{df=N-1} = \frac{\bar{X}-\mu}{\frac{\hat{\sigma}}{\sqrt{N}}}\]

The heavier tails of the t-distribution, especially for small N, are the penalty we pay for having to estimate the population standard deviation from the sample.

Load in the dataset from last class

school <- import("https://raw.githubusercontent.com/dharaden/dharaden.github.io/main/data/example2-chisq.csv") %>%  
  mutate(Score_in_memory_game = as.numeric(Score_in_memory_game))
school <- school %>% 
  filter(!is.na(Score_in_memory_game))

One-sample t-tests vs Z-test

Parameters	Z-test	t-test
\(\large{\mu}\)	known	known
\(\sigma\)	known	unknown
sem or \(\sigma_M\)	\(\frac{\sigma}{\sqrt{N}}\)	\(\frac{\hat{\sigma}}{\sqrt{N}}\)
Probability distribution	standard normal	\(t\)
DF	none	\(N-1\)
Tails	One or two	One or two
Critical value \((\alpha = .05, two-tailed)\)	1.96	Depends on DF

Assumptions of the one-sample t-test

Normality. We assume the sampling distribution of the mean is normally distributed. Under what two conditions can we be assured that this is true?
Independence. Observations in the dataset are not associated with one another. Put another way, collecting a score from Participant A doesn’t tell me anything about what Participant B will say. How can we be safe in this assumption?

A brief example - REVIEW

Using the same Census at School data, we find that New York students who participated in a memory game ( \(N = 224\) ) completed the game in an average time of 44.2 seconds ( \(s = 15.3\) ). We know that the average US student completed the game in 45.04 seconds. How do our students compare?

Hypotheses

\(H_0: \mu = 45.05\)

\(H_1: \mu \neq 45.05\)

\[\mu = 45.05\]

\[N = 227\]

\[ \bar{X} = 44.2 \]

\[ s = 15.3 \]

\[ \sigma = Unknown \]

t.test(x = school$Score_in_memory_game, 
       mu = 45.05,
       alternative = "two.sided")


    One Sample t-test

data:  school$Score_in_memory_game
t = -0.86545, df = 223, p-value = 0.3877
alternative hypothesis: true mean is not equal to 45.05
95 percent confidence interval:
 42.14729 46.18116
sample estimates:
mean of x 
 44.16422

lsr::oneSampleTTest(x = school$Score_in_memory_game,
                    mu = 45.05, one.sided = FALSE)


   One sample t-test 

Data variable:   school$Score_in_memory_game 

Descriptive statistics: 
            Score_in_memory_game
   mean                   44.164
   std dev.               15.318

Hypotheses: 
   null:        population mean equals 45.05 
   alternative: population mean not equal to 45.05 

Test results: 
   t-statistic:  -0.865 
   degrees of freedom:  223 
   p-value:  0.388 

Other information: 
   two-sided 95% confidence interval:  [42.147, 46.181] 
   estimated effect size (Cohen's d):  0.058

Writing Up a t-test

“A one-sample t-test was conducted to determine if the mean [variable name] differed from a hypothesized population mean of [population mean]. The sample mean was M = [sample mean], which was significantly [greater than/less than/different from] the hypothesized population mean, t(df) = [t-value], p = [p-value].”

A one-sample t-test was conducted to determine if the mean score in a memory game for NY students differed from the US population mean. The sample mean was \(M = 44.164\) (SD = 15.32, CI = [42.15, 46.18]), which was not significantly different from the population mean, \(t(223) = -0.87\), \(p = 0.388\).

Single sample t-tests are not used super often in practice

You will mainly see them when interpreting effect sizes of coefficients in your model

#Load sleep data: https://vincentarelbundock.github.io/Rdatasets/datasets.html
sleep <- read_csv("https://raw.githubusercontent.com/dharaden/dharaden.github.io/main/data/sleepstudy.csv")
model = lm(Reaction ~ Days, data = sleep)
summary(model)


Call:
lm(formula = Reaction ~ Days, data = sleep)

Residuals:
     Min       1Q   Median       3Q      Max 
-110.848  -27.483    1.546   26.142  139.953 

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept)  251.405      6.610  38.033 < 0.0000000000000002 ***
Days          10.467      1.238   8.454  0.00000000000000989 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.71 on 178 degrees of freedom
Multiple R-squared:  0.2865,    Adjusted R-squared:  0.2825 
F-statistic: 71.46 on 1 and 178 DF,  p-value: 0.000000000000009894

Types of t-Tests

Single Samples t-test

Independent Samples t-test

Paired Samples t-test

Types of t-Tests: Assumptions

~~Single Samples t-test~~

Independent Samples t-test

Random Sampling
Independent observations

Approximately normal distributions
Homogeneity of variances

Paired Samples t-test

Approximately normal distributions
Homogeneity of variances

Dataset

Moving forward for today, we will use this dataset

100 students from New York
100 students from New Mexico

state_school <- import("https://raw.githubusercontent.com/dharaden/dharaden.github.io/main/data/NM-NY_CAS.csv")

Normality Assumption

Check for Normality: Visualizing data (histograms), Q-Q plots, and statistical tests (Shapiro-Wilk, Anderson-Darling) to assess normality.
Remedies for Violations: data transformation or non-parametric alternatives when data is not normally distributed.

Normality Assumption - Visualizing

Visualizing Data

hist(as.numeric(state_school$Sleep_Hours_Schoolnight))

Let’s make it pretty

state_school %>% 
  ggplot(aes(Sleep_Hours_Schoolnight,
             fill = Region)) +
  geom_bar(position = "dodge")

Normality Assumption — Q-Q Plot

A Q-Q plot is a graphical method for assessing whether a dataset follows a normal distribution. It compares the quantiles of your data to the quantiles of a theoretical normal distribution. If your data follows a normal distribution, the points in the Q-Q plot should form a straight line.

qqPlot(state_school$Sleep_Hours_Schoolnight)

[1] 131   6

Normality Assumption — Shapiro-Wilk Test

Examines the Null Hypothesis that the data are normally distributed

shapiro.test(state_school$Sleep_Hours_Schoolnight)


    Shapiro-Wilk normality test

data:  state_school$Sleep_Hours_Schoolnight
W = 0.95501, p-value = 0.00001511

Normality Assumption

Strict adherence to normality assumptions is not always necessary,
- Larger samples bring in Central Limit Theorem
However, assessing normality is still a valuable step in understanding distributions and potential impacts on your analyses

Failure of Normality Assumptions

What can we do if our data violate these normality assumptions?

Logarithmic Transformations
Square Root Transformations
Non-parametric tests
- Mann-Whitney U Test (Wilcoxon Rank-Sum Test): Used for comparing two independent groups
- Wilcoxon Signed-Rank Test: Used for comparing two paired or matched groups

Homogeneity of Variance

Check for Equality of Variances: Levene’s test to assess if variances are equal between groups
Remedies for Violations: Welch’s t-test for unequal variances.

Levene’s Test

This test is used to examine if the variance is equal across groups. The Null Hypothesis is that the variances are equal

# Perform Levene's test for equality of variances
leveneTest(Sleep_Hours_Schoolnight ~ Region, 
           data = state_school)

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1  0.5937  0.442
      180

Like other tests of significance, Levene’s test gets more powerful as sample size increases. So unless your two variances are exactly equal to each other (and if they are, you don’t need to do a test), your test will be “significant” with a large enough sample. Part of the analysis has to be an eyeball test – is this “significant” because they are truly different, or because I have many subjects.

Independent Samples t-test

Chapter 13.3 in Learning Stats with R

Two different types: Student’s & Welch’s

Start with Student’s t-test which assumes equal variances between the groups

\[ t = \frac{\bar{X_1} - \bar{X_2}}{SE(\bar{X_1} - \bar{X_2})} \]

Student’s t-test

\[ H_0 : \mu_1 = \mu_2 \ \ H_1 : \mu_1 \neq \mu_2 \]

Student’s t-test: Calculate SE

Are able to use a pooled variance estimate

Both variances/standard deviations are assumed to be equal

Therefore:

\[ SE(\bar{X_1} - \bar{X_2}) = \hat{\sigma} \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} \]

We are calculating the Standard Error of the Difference between means

Degrees of Freedom: Total N - 2

Student’s t-test: In R

Using the independentSamplesTest() within the lsr library (make sure it is installed) we can run the test very easily

Formula is outcome ~ group

independentSamplesTTest(
  formula = Sleep_Hours_Schoolnight ~ Region, 
  data = state_school, 
  var.equal = TRUE #default is FALSE
)


   Student's independent samples t-test 

Outcome variable:   Sleep_Hours_Schoolnight 
Grouping variable:  Region 

Descriptive statistics: 
               NM    NY
   mean     6.989 6.994
   std dev. 1.379 1.512

Hypotheses: 
   null:        population means equal for both groups
   alternative: different population means in each group

Test results: 
   t-statistic:  -0.024 
   degrees of freedom:  180 
   p-value:  0.981 

Other information: 
   two-sided 95% confidence interval:  [-0.428, 0.418] 
   estimated effect size (Cohen's d):  0.004

Student’s t-test: In R (classic)

Let’s then try it out using the traditional t.test() function. It doesn’t look as nice, but you will likely encounter it when searching for help

t.test(
  formula = Sleep_Hours_Schoolnight ~ Region, 
  data = state_school, 
  var.equal = TRUE
)


    Two Sample t-test

data:  Sleep_Hours_Schoolnight by Region
t = -0.023951, df = 180, p-value = 0.9809
alternative hypothesis: true difference in means between group NM and group NY is not equal to 0
95 percent confidence interval:
 -0.4281648  0.4178954
sample estimates:
mean in group NM mean in group NY 
        6.989247         6.994382

Student’s t-test: Write-up

The mean amount of sleep in New Mexico for youth was 6.989 (SD = 1.379), while the mean in New York was 6.994 (SD = 1.512). A Student’s independent samples t-test showed that there was not a significant mean difference (t(180)=-0.024, p=.981, \(CI_{95}\)=[-0.43, 0.42], d=.004). This suggests that there is no difference between youth in NM and NY on amount of sleep on school nights.

#https://indrajeetpatil.github.io/ggstatsplot/
ggstatsplot::ggbetweenstats(
  data  = state_school,
  x     = Region,
  y     = Sleep_Hours_Schoolnight,
  title = "Distribution of hours of sleep across Region"
)

Welch’s t-test

\[ H_0 : \mu_1 = \mu_2 \ \ H_1 : \mu_1 \neq \mu_2 \]

Welch’s t-test: Calculate SE

Since the variances are not equal, we have to estimate the SE differently

\[ SE(\bar{X_1} - \bar{X_2}) = \sqrt{\frac{\hat{\sigma_1^2}}{N_1} + \frac{\hat{\sigma_2^2}}{N_2}} \]

Degrees of Freedom is also very different:

Welch’s t-test: In R

We use the same function as before, but specify that the variances are not equal

independentSamplesTTest(
  formula = Sleep_Hours_Schoolnight ~ Region, 
  data = state_school, 
  var.equal = FALSE
)


   Welch's independent samples t-test 

Outcome variable:   Sleep_Hours_Schoolnight 
Grouping variable:  Region 

Descriptive statistics: 
               NM    NY
   mean     6.989 6.994
   std dev. 1.379 1.512

Hypotheses: 
   null:        population means equal for both groups
   alternative: different population means in each group

Test results: 
   t-statistic:  -0.024 
   degrees of freedom:  176.737 
   p-value:  0.981 

Other information: 
   two-sided 95% confidence interval:  [-0.429, 0.419] 
   estimated effect size (Cohen's d):  0.004

Welch’s t-test: In R (classic)

Let’s then try it out using the traditional t.test() function…turns out it is pretty straightforward

t.test(
  formula = Sleep_Hours_Schoolnight ~ Region, 
  data = state_school, 
  var.equal = FALSE
)


    Welch Two Sample t-test

data:  Sleep_Hours_Schoolnight by Region
t = -0.023902, df = 176.74, p-value = 0.981
alternative hypothesis: true difference in means between group NM and group NY is not equal to 0
95 percent confidence interval:
 -0.4290776  0.4188082
sample estimates:
mean in group NM mean in group NY 
        6.989247         6.994382

Cool Visualizations

The library ggstatsplot has some wonderful visualizations of various tests

Code

ggstatsplot::ggbetweenstats(
  data  = state_school,
  x     = Region,
  y     = Sleep_Hours_Schoolnight,
  title = "Distribution of hours of sleep across Region"
)

In-Class Lab

Take a look at the data in state_school and identify another Independent Samples t-test that you can perform

Be sure to select (1) a grouping variable and (2) a continuous variable to look at differences between the groups

Follow the steps that we went through today:

Check for Normality of the variable
Check for Homogeneity of Variances
Perform the appropriate t-test
Report Results

Knit the document

Next Time…

Paired Samples t-test

Paired Samples t-Test in R

Chapter 13.5 - Learning Stats with R

Also called “Dependent Samples t-test”

We have been testing means between two independent samples. Participants may be randomly assigned to the separate groups
- This is limited to those types of study designs, but what if we have repeated measures?
We will then need to compare scores across people…the samples we are comparing now depend on one another and are paired

Common Mistakes and Pitfalls

Misinterpreting p-Values
Violations of Assumptions
Sample Size Considerations
Multiple Testing Issues

Comparing Means: t-tests

Last week

Today…

Comparing Means

Load in the dataset from last class

One-sample t-tests vs Z-test

Assumptions of the one-sample t-test

A brief example - REVIEW

Writing Up a t-test

Types of t-Tests

Types of t-Tests: Assumptions

Dataset

Normality Assumption

Normality Assumption - Visualizing

Normality Assumption — Q-Q Plot

Normality Assumption — Shapiro-Wilk Test

Normality Assumption

Failure of Normality Assumptions

Homogeneity of Variance

Levene’s Test

Independent Samples t-test

Student’s t-test

Student’s t-test: Calculate SE

Student’s t-test: In R

Student’s t-test: In R (classic)

Student’s t-test: Write-up

Welch’s t-test

Welch’s t-test: Calculate SE

Welch’s t-test: In R

Welch’s t-test: In R (classic)

Cool Visualizations

In-Class Lab

Next Time…

Paired Samples t-Test in R

Common Mistakes and Pitfalls

Next time…