PSYC 640 - Fall 2024
Journal Entries
Reverse Results - Due 9/17
Continuing on this trend to get more comfortable with all that dplyr
can do!
This is a nice website that walks through a lot of this as well
We need to revisit how to read the ggplot2 cheatsheet!
A lot of cheatsheets will have something that looks like this:
`c`?! `c` is just an object that refers to the first line that we have in a ggplot function
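As a minimal sketch of what the cheatsheet is doing, the block below stores the first ggplot line in an object called `c` and then adds a layer to it. It uses the `mpg` dataset that ships with ggplot2; the choice of `geom_histogram()` and the `binwidth` value are just for illustration.

```r
library(ggplot2)

# Store the "first line" of the plot (data + aesthetics) in an object named c
c <- ggplot(mpg, aes(x = hwy))

# Adding a geom to the stored object is the same as writing it all in one call
p <- c + geom_histogram(binwidth = 5)
p
```

Writing `ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 5)` directly gives the same plot; the cheatsheet just saves space by naming the first line once.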
Let’s first start by opening our Project
Then, create a new Notebook/Markdown Document that we will use for today
Set up the libraries and bring in the data
Ooof…These variable names look terrible! How do we update those? Our brains would break having to remember how to connect them and having to reference another document.
First, let’s get rid of the variables that we don’t need for the moment:
Use `select()` to remove the following variables:
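A minimal sketch of dropping columns with `select()`, assuming a made-up data frame. `toy_data` and its column names are hypothetical stand-ins, not the variables from the actual file.

```r
library(dplyr)

# Toy data; the column names are hypothetical stand-ins
toy_data <- tibble(
  id         = 1:3,
  start_date = c("2024-09-01", "2024-09-02", "2024-09-03"),
  ip_address = c("a", "b", "c"),
  age        = c(19, 22, 20)
)

# A minus sign in front of c(...) drops those columns and keeps the rest
toy_small <- toy_data %>%
  select(-c(start_date, ip_address))

names(toy_small)
```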
Now we can update the names
Using the `names()` command, we can rename all columns in the file
Here is a list so that it makes it easier to do instead of having to type everything out…that would just be cruel
names(sleep_data) <- c('age', 'gender', 'roommate', 'other_sleep', 'bed_read', 'bed_study', 'bed_hw', 'attention1', 'bed_internet',
'bed_tv', 'bed_eat', 'bed_friends', 'bed_videogame', 'bed_readp', 'BL_1', 'BL_2', 'BL_3', 'BL_4',
'BL_5', 'attention2', 'BL_6', 'BL_7', 'sleepsat1', 'sleepsat2', 'sleepsat3', 'sleepsat4', 'sleepsat5',
'sleepsat6', 'ESS1', 'ESS2', 'ESS3', 'ESS4', 'ESS5', 'ESS6', 'ESS7', 'ESS8', 'ESS00', 'ashs1', 'ashs2',
'ashs3', 'ashs4', 'ashs5', 'attention3', 'ashs6', 'ashs7', 'ashs8', 'ashs9', 'ashs10', 'ashs11',
'ashs12', 'ashs13', 'ashs14', 'ashs15', 'ashs16', 'ashs17', 'attention4', 'ashs18', 'ashs19',
'ashs20', 'ashs21', 'ashs22', 'ashs23', 'ashs24', 'ashs26', 'ashs27', 'ashs28', 'attention5',
'ashs29', 'ashs30', 'ashs31', 'ashs32', 'ashs33', 'id')
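`names()` assigns the new names by position, so the vector needs one entry per column, in the same order as the columns. Here is a small sketch of a check that could be run before renaming. `toy_data` and `new_names` are hypothetical, and the three names are borrowed from the start of the list above.

```r
# Toy data with unhelpful column names (hypothetical)
toy_data <- data.frame(Q1 = c(19, 22), Q2 = c(1, 2), Q3 = c(3, 1))

new_names <- c("age", "gender", "roommate")

# Stop with an error if the counts differ, before overwriting anything
stopifnot(length(new_names) == ncol(toy_data))

names(toy_data) <- new_names
names(toy_data)
```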
The ‘filter()’ function is used to subset observations based on their values.
The result of filtering is a data frame with the same number of columns as before but fewer rows.
The first argument is data and subsequent arguments are logical expressions that tell you which observations to retain in the data frame.
# A tibble: 39 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Darth V… 202 136 none white yellow 41.9 male mascu…
2 Greedo 173 74 <NA> green black 44 male mascu…
3 IG-88 200 140 none metal red 15 none mascu…
4 Bossk 190 113 none green red 53 male mascu…
5 Lobot 175 79 none light blue 37 male mascu…
6 Ackbar 180 83 none brown mot… orange 41 male mascu…
7 Nien Nu… 160 68 none grey black NA male mascu…
8 Nute Gu… 191 90 none mottled g… red NA male mascu…
9 Jar Jar… 196 66 none orange orange 52 male mascu…
10 Roos Ta… 224 82 none grey orange NA male mascu…
# ℹ 29 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
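The output above comes from the `starwars` data that ships with dplyr. As a separate illustration (not the call that produced that output), here is a minimal `filter()` sketch on the same data. The conditions are just examples.

```r
library(dplyr)

# Keep only rows where species is "Droid"; every column is retained
droids <- starwars %>%
  filter(species == "Droid")

# Several conditions separated by commas must all be TRUE for a row to be kept
tall_droids <- starwars %>%
  filter(species == "Droid", height > 100)

ncol(droids) == ncol(starwars)  # compares the number of columns
nrow(droids) < nrow(starwars)   # compares the number of rows
```

Rows where a condition evaluates to `NA` are dropped as well, which matters for variables with missing values.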
We can now generate a subset of observations based on a particular value
Since the `age` variable has been giving us some trouble, let’s filter based on that variable to only include appropriate ages for “college aged” participants
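A minimal sketch of that filter on a toy data frame. The 18 to 25 cutoffs are an assumption made for illustration, since the range is not fixed in these notes, and `toy_sleep` stands in for the real data.

```r
library(dplyr)

# Toy stand-in for the sleep data; includes some implausible ages
toy_sleep <- tibble(
  id  = 1:6,
  age = c(19, 21, 2, 18, 99, 24)
)

# Keep participants whose age falls in the assumed "college aged" range
toy_college <- toy_sleep %>%
  filter(age >= 18, age <= 25)

toy_college
```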
Generating a bar chart to see if it worked out!
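One way to do this check, sketched with hypothetical ages: `geom_bar()` counts the rows at each value of `age`, so any out-of-range value that survived the filter would show up as a stray bar.

```r
library(ggplot2)

# Toy ages after filtering (hypothetical values)
toy_college <- data.frame(age = c(19, 21, 18, 24, 19, 19, 21))

# One bar per distinct age; bar height is the number of participants
age_plot <- ggplot(toy_college, aes(x = age)) +
  geom_bar()

age_plot
```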
We often need to make a sum/mean score for a variable of interest
The `mutate()` function is most commonly used to add new columns to your data frame that are functions of existing columns.
`mutate()` requires data as its first argument, followed by a set of expressions defining new columns.
For example, in the `mod_sleep_data`, we have the Epworth Sleepiness Scale
Take a look at the scoring of the ESS and compute the total score labelled ess_total
ess_total
It is a sum score with a scale of 0 - 24
The original items are supposed to be on a 0-3 scale
The data that we have are on a 1-4 scale
What should we do?!
We could subtract 1 from each item before creating another sum
Or we could combine all the 1’s that we are subtracting (one per item, so 8 in total) and just take that from the total score
Let’s see what the differences are between these methods
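A minimal sketch comparing the two approaches, using made-up responses for the eight items `ESS1` to `ESS8`. `toy_ess` and `ess_total_alt` are hypothetical names used only for this illustration.

```r
library(dplyr)

# Toy responses on the 1-4 scale for the eight ESS items (hypothetical values)
toy_ess <- tibble(
  ESS1 = c(1, 4, 2), ESS2 = c(1, 4, 3), ESS3 = c(1, 4, 1), ESS4 = c(1, 4, 2),
  ESS5 = c(1, 4, 4), ESS6 = c(1, 4, 1), ESS7 = c(1, 4, 2), ESS8 = c(1, 4, 3)
)

toy_ess <- toy_ess %>%
  mutate(
    # Method 1: shift each item to 0-3, then add the items up
    ess_total = (ESS1 - 1) + (ESS2 - 1) + (ESS3 - 1) + (ESS4 - 1) +
      (ESS5 - 1) + (ESS6 - 1) + (ESS7 - 1) + (ESS8 - 1),
    # Method 2: add up the raw 1-4 items, then subtract the eight 1's at once
    ess_total_alt = ESS1 + ESS2 + ESS3 + ESS4 + ESS5 + ESS6 + ESS7 + ESS8 - 8
  )

# TRUE if the two methods agree for every row
all(toy_ess$ess_total == toy_ess$ess_total_alt)
```

The first toy row answers 1 on every item and the second answers 4 on every item, so they land on the two ends of the 0 - 24 scale. That makes them a quick check that the rescaling worked.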
It is good to always know what the original scaling needs to be (e.g., 1-4 or 0-3)
Extremely important to visualize the data! ALWAYS VISUALIZE
There is also another way to allow for the `sum()` function in `mutate()`, but we’ll get to that later
You may have noticed the `attention` variables throughout.
These are “attention checks” that researchers put in to…um….check the attention of the participants. It’s right in the name.
For this survey, they were nonsensical questions that were embedded to look like the others.
Var Name | Question | Responses |
---|---|---|
attention1 | How often do you breathe in bed? | 1-5 (Never to Always) |
attention2 | How often have you used a screen device to steal over a million dollars? | 1-5 (Never to Every Day) |
attention3 | After 6:00 in the evening…Choose 60% | 1-6 (Never to Always) 4 = 60% |
attention4 | I go to bed and know that a magical train will literally take me to see Santa. | 1-6 (Never to Always) |
attention5 | I have gone 30 days or more without sleeping at all. | 1-6 (Never to Always) |
recode()
We can combine our `mutate()` with this new function to produce the results we want
Now let’s put this into action by recoding the attention variables
Getting a `1` indicates that they were paying attention
Let’s make a total score for the attention variable so that we can get rid of folks who are not paying attention
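A minimal sketch of recoding and then totaling, on made-up responses. The passing answers are assumptions, apart from `attention3`, where the table gives 4 = 60%. The sketch treats 5 (Always) as passing for `attention1` and 1 (Never) as passing for `attention2`, `attention4`, and `attention5`. `toy_attn` and `attention_total` are hypothetical names.

```r
library(dplyr)

# Toy responses to the five attention checks (hypothetical values)
toy_attn <- tibble(
  attention1 = c(5, 5, 3),
  attention2 = c(1, 2, 1),
  attention3 = c(4, 4, 4),
  attention4 = c(1, 1, 6),
  attention5 = c(1, 1, 1)
)

toy_attn <- toy_attn %>%
  mutate(
    # The assumed passing answer becomes 1; every other answer becomes 0
    attention1 = recode(attention1, `5` = 1, .default = 0),
    attention2 = recode(attention2, `1` = 1, .default = 0),
    attention3 = recode(attention3, `4` = 1, .default = 0),
    attention4 = recode(attention4, `1` = 1, .default = 0),
    attention5 = recode(attention5, `1` = 1, .default = 0)
  ) %>%
  mutate(
    # Number of checks passed, from 0 to 5
    attention_total = attention1 + attention2 + attention3 +
      attention4 + attention5
  )

toy_attn
```

Numeric values on the left-hand side of `recode()` need backticks because they are being used as argument names.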
OR
Create two new datasets labeled (1) “data_attend” and (2) “data_distract”. Put those who were paying attention in “data_attend” and those who were not in “data_distract”. Paying attention is operationalized as having a score of 5 on the aggregated variable.
Get the sample size of each of these datasets (use Google to search for things like number of rows)
What is the mean/average of `ess_total` in each of the samples?
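One possible approach, sketched on toy data rather than the real file. `toy_scored` is hypothetical, and the sketch assumes the aggregated attention variable is called `attention_total` and that `ess_total` has already been computed.

```r
library(dplyr)

# Toy data: attention_total and ess_total are assumed to exist already
toy_scored <- tibble(
  id              = 1:5,
  attention_total = c(5, 5, 4, 5, 3),
  ess_total       = c(6, 10, 15, 8, 12)
)

# Passed all five checks vs. missed at least one
data_attend   <- toy_scored %>% filter(attention_total == 5)
data_distract <- toy_scored %>% filter(attention_total != 5)

# Sample size of each dataset
nrow(data_attend)
nrow(data_distract)

# Mean ESS total in each dataset, ignoring missing values
mean(data_attend$ess_total, na.rm = TRUE)
mean(data_distract$ess_total, na.rm = TRUE)
```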
Starting to look at descriptive statistics and how to make these nice tables
Will begin Lab 1 (as long as we have time to do so)