Unit 6 (pt. 2) - Comparing Means

PSYC 640 - Fall 2024

Dustin Haraden, PhD

Reminders

  • Journal Entries

  • Lab #1 - Due 10/6

Last Class

  • Lots of professor talking
  • Intro to NHST
  • Different t-tests
  • More professor talking

Today…

  • Comparing Two Means

    • But we get to try it out
  • Comparing MANY MUCH Means -ANOVA!

# File management
library(here)
# just because
library(tidyverse)
# makes for easier to look at output
library(lsr)
# cool visualizations
library(ggstatsplot)
# Nice names
library(janitor)



#Remove Scientific Notation 
options(scipen=999)

Comparing Means

Converting to a t-score and checking against the t-distribution

  • One-Sample

  • Independent Sample

  • Paired Sample

New Data for Today

Dataset collected over 1,000 Americans in September 2017. Funded by Cards Against Humanity.

#import data - Your path may be different
ghost_data <- read_csv(here("lectures", "data", "ghosts.csv"))

View Data

woops…sorry

Import again

Since there are 2 header rows, we need to only include the first one. To do that, we have to “skip” the first two and then give the names of the columns back to the data.

We also need to specify what the missing values are. Typically we have been working with NA which is more traditional. However, missing values in this dataset are DK/REF and a blank. This will need to be specified in the import function (used this website)

# get the names of your columns which is the first row
ghost_data_names <- read_csv(here("lectures", "data", "ghosts.csv")) %>% 
  names()

# import second time; skip row 2, and assign column names to argument col_names =
ghost_data <- read_csv(here("lectures", "data", "ghosts.csv"),
                       skip = 2,
                       col_names = ghost_data_names,
                       na = c("DK/REF", "", " ")
                      ) %>% 
  clean_names()

Study Design

What type of study design is this?

Independent t-test

Income by Ghost Belief

t.test(
  formula = income ~ ghosts, 
  data = ghost_data, 
  var.equal = FALSE
)

    Welch Two Sample t-test

data:  income by ghosts
t = 0.28606, df = 337.95, p-value = 0.775
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 -11727.68  15719.22
sample estimates:
 mean in group No mean in group Yes 
         89131.87          87136.09 
ghost_data %>% 
  group_by(ghosts) %>% 
  summarise(mean = mean(income, na.rm = TRUE), 
            sd = sd(income, na.rm = TRUE))
# A tibble: 3 × 3
  ghosts    mean      sd
  <chr>    <dbl>   <dbl>
1 No      89132.  68316.
2 Yes     87136.  73055.
3 <NA>   128143. 120514.

Independent t-test: LSR

##Need to place the "ghosts" variable to a factor
ghost_data <- ghost_data %>% 
  mutate(ghosts = as.factor(ghosts))

#Can't get this to work :(
#independentSamplesTTest(
#  income ~ ghosts, 
#  data = ghost_data, 
#  var.equal = FALSE
#)

But it doesn’t work…

t-test: Write-up

The mean income for those who do not believe in ghotsts was $89,131 (SD = 68316), while the mean in New York was $87,136 (SD = 73055). A Welch’s independent samples t-test showed that there was not a significant mean difference (t(337.95)=-.29, p=.775, \(CI_{95}\)=[-0.16, 0.22]). This suggests that there is no difference between believing in ghosts and income.

Cool Visualizations

The library ggstatsplot has some wonderful visualizations of various tests

Code
ggstatsplot::ggbetweenstats(
  data  = ghost_data,
  x     = ghosts,
  y     = income,
  title = "Distribution of income across belief in ghosts"
)

Now you try