Figures: Using ggplot2

PSYC 640 - Fall 2024

Dustin Haraden, PhD

Reminders

  • Journal Entries

  • Reverse Results - Due 9/17

Last Class

  • The workflow
    • Starting with a Project and going from there

Today…

Working with ggplot2 to get some really fancy visualizations!

Maybe integrating some generative AI (ChatGPT) to help us out too

# File management
library(here)
# for dplyr, ggplot2
library(tidyverse)
#Loading data
library(rio)

#for the penguins dataset
#install.packages('palmerpenguins')
library(palmerpenguins)

#Remove Scientific Notation 
options(scipen=999)
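To see what the scipen option changes, here is a small sketch that formats the same two numbers under R's default penalty and under the large penalty used above. The example values (100000 and 0.00001) are just illustrations, not part of the course data.

```r
# Temporarily go back to R's default penalty, remembering the current setting
old <- options(scipen = 0)
format(100000)    # printed in scientific notation under the default
format(0.00001)   # printed in scientific notation under the default

# A large penalty makes R prefer fixed (non-scientific) notation
options(scipen = 999)
big   <- format(100000)
small <- format(0.00001)
big
small

# Restore whatever setting was active before this sketch ran
options(old)
```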

Starting out

Let’s first start by opening our Project

Then, create a new Notebook/Markdown Document that we will use for today

Set up the libraries and bring in the data

  • We will use the “Sleep Data” file as well
#import sleep data - Your path may be different
sleep_data <- import(here("lectures", "data", "Sleep_Data.csv"))
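If the import fails, it can help to look at the path that here() builds before handing it to import(). This is a minimal sketch that assumes the same folder layout as the line above; sleep_path is just an illustrative name.

```r
library(here)

# Build the path relative to the project root (same pieces as in the import call)
sleep_path <- here("lectures", "data", "Sleep_Data.csv")
sleep_path

# TRUE only if the file really is where this path points
file.exists(sleep_path)
```

If file.exists() returns FALSE, adjust the folder names inside here() to match where you saved the file in your own Project.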

Take a look at the data

We will be using a dataset from the palmerpenguins library (link), which is a dataset about…penguins. The data() function will pull that data into our environment:

data(penguins)

ggplot2

ggplot2 from the tidyverse

Since we have already installed and loaded the library, we don’t have to do anything else at this point!

ggplot2 follows the “grammar of graphics”

  • Theoretical framework for creating data visualizations
  • Breaks the process down into separate components:

Data

Aesthetics (aes)

Geometric Objects (geoms)

Faceting

Themes
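The five components listed above can be seen together in one short plot. This is only a sketch: facet_wrap() and theme_minimal() are one possible choice for the faceting and theme components and are not covered elsewhere in these slides.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins,                       # Data
            mapping = aes(x = flipper_length_mm,   # Aesthetics
                          y = body_mass_g)) +
  geom_point() +                                   # Geometric object
  facet_wrap(~ species) +                          # Faceting: one panel per species
  theme_minimal()                                  # Theme
p
```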

Grammar of Graphics

ggplot2 cheatsheet

ggplot2 syntax

There is a basic structure for creating a plot in ggplot2, and it consists of at least these three things:

  1. A Data Set
  2. Coordinate System
  3. Geoms - visual marks to represent the data points

In R it looks like this:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

#or how I like to do it
<DATA> %>% 
  ggplot(aes(<MAPPINGS>)) + 
  <GEOM_FUNCTION>()

ggplot2 syntax

Let’s start with a basic figure with palmerpenguins

First we will define the data that we are using and the variables we are visualizing

#the dataset is called penguins

penguins %>% 
  #including the variables we want to visualize
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g))

What happens?

We forgot to tell it what to do with the data!

We need to add the appropriate geom so that it plots a point for each observation

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g)) + 
  geom_point()


Note:
The geom_point() layer inherits the mappings set in the aes() of the ggplot() call
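One way to check this inheritance is to ask ggplot2 for the data a layer is going to draw. The sketch below uses layer_data() on a plot where geom_point() has no aes() of its own; point_data is just an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()   # no aes() here

# layer_data() returns the data the first layer will draw
point_data <- layer_data(p, 1)

# The x and y columns are there even though geom_point() was given no mappings
head(point_data[, c("x", "y")])
```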

Adding in Color

Maybe we would like to have each of the points colored by their respective species

This information will be added to the aes() within the geom_point() layer

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g)) + 
  geom_point(aes(color = species))

Including a fit line

Why don’t we put in a line that represents the relationship between these variables?

We will want to add another layer/geom

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g)) + 
  geom_point(aes(color = species)) + 
  geom_smooth()


That looks a little wonky…why is that? Did you get a note in the console?

Including a fit line

For a dataset of this size, geom_smooth() defaults to fitting a loess curve to the data

To change that, we need to override the default for that layer and specify that we want to fit a “linear model” (the lm function) to the data

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g)) + 
  geom_point(aes(color = species)) + 
  geom_smooth(method = 'lm')


Did that look a little better?
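The straight line comes from an ordinary linear model of y on x. As a rough sketch, the same model can be fit directly with lm() to see the intercept and slope behind the line; fit is just an illustrative name.

```r
library(palmerpenguins)

# Same kind of model geom_smooth(method = 'lm') fits by default: y ~ x
fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

# Intercept and slope of the fitted line
coef(fit)
```

lm() drops the rows with missing values, so the coefficients describe only the penguins that have both measurements.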

Individual fit lines

It might make more sense to have an individual line for each species instead of one line across all of them

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g, 
             color = species)) + 
  geom_point() + 
  geom_smooth(method = 'lm')


What did we move around from the last set of code?
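To check your answer, one option is to count how many groups the smooth layer works with in each version. This is a sketch using layer_data(); the object names (one_line, by_species, n_one, n_by) are just illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# color mapped only inside geom_point(): the smooth layer does not see it
one_line <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm")

# color mapped in ggplot(): every layer inherits it
by_species <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g,
                                   color = species)) +
  geom_point() +
  geom_smooth(method = "lm")

# Number of separate groups (fit lines) in the second layer of each plot
n_one <- length(unique(layer_data(one_line, 2)$group))
n_by  <- length(unique(layer_data(by_species, 2)$group))
n_one
n_by
```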

Updating Labels/Title

By default, the variable names are used as the x and y labels, which aren’t very readable. It would also be good to have a title!

We add on another layer called labs() for our labels (link)

penguins %>% 
  ggplot(aes(x = flipper_length_mm, 
             y = body_mass_g, 
             color = species)) + 
  geom_point() + 
  geom_smooth(method = 'lm') + 
  labs(
    title = "Palmer Penguins",
    subtitle = "Body Mass by Flipper Length", 
    x = "Flipper Length (mm)", 
    y = "Body Mass (g)", 
    color = "Species"
  )

Penguin Histogram

Taken from the website for palmerpenguins (link)

penguins %>% 
  ggplot(aes(x = flipper_length_mm)) +
    geom_histogram(aes(fill = species), 
                   alpha = 0.5, 
                   position = "identity")
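In the histogram above, position = "identity" overlays the three species instead of stacking them, and alpha = 0.5 makes the bars see-through so the overlap stays visible. When no bin size is given, ggplot2 picks a default number of bins and suggests choosing your own. The sketch below sets binwidth explicitly; the value of 5 mm is only an assumption for illustration, not a recommendation.

```r
library(ggplot2)
library(palmerpenguins)

p_hist <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(aes(fill = species),
                 alpha = 0.5,            # semi-transparent bars
                 position = "identity",  # overlay rather than stack
                 binwidth = 5)           # hypothetical choice: 5 mm wide bins
p_hist
```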

Now our own data! 🎉

Sleep Data

Let’s start by looking at our data. You can either click on the dataset in the Environment or use the View(sleep_data) command. Here, I am using the head() command just to visualize a sample of the data for the slides.

head(sleep_data)
            StartDate             EndDate Status Progress Duration__in_seconds_
1 2017-08-16 13:13:06 2017-08-16 13:15:13      0      100                   126
2 2017-08-16 13:17:16 2017-08-16 13:19:03      0      100                   106
3 2017-11-27 16:59:07 2017-11-27 17:07:27      0      100                   499
4 2017-11-27 18:57:16 2017-11-27 19:13:00      0      100                   943
5 2017-11-27 18:54:35 2017-11-27 19:14:12      0      100                  1177
6 2017-11-27 19:41:14 2017-11-27 19:46:48      0      100                   333
  Finished        RecordedDate DistributionChannel UserLanguage Q5 Q7 Q8 Q12
1        1 2017-08-16 13:15:14           anonymous           EN 12  2  2   3
2        1 2017-08-16 13:19:03           anonymous           EN  1  2  2   1
3        1 2017-11-27 17:07:27           anonymous           EN 19  1  2   1
4        1 2017-11-27 19:13:00           anonymous           EN 18  2  2   2
5        1 2017-11-27 19:14:13           anonymous           EN 21  1  2   2
6        1 2017-11-27 19:46:48           anonymous           EN 18  2  1   3
  Q13_1 Q13_2 Q13_3 Q13_4 Q13_5 Q13_6 Q13_7 Q13_8 Q13_9 Q13_10 Q14_1 Q14_2
1     2     2     2    NA     2     2     2     2     2      2     2     2
2     1     1     1    NA     2     3     2     3     2      3     4     2
3     1     2     2     5     3     4     1     1     2      1     5     5
4     3     4     5     5     5     5     5     5     5      1     5     5
5     1     2     2     5     4     2     2     1     1      2     5     5
6     1     2     4     5     4     2     4     4     5      4     5     5
  Q14_3 Q14_4 Q14_5 Q14_6 Q14_7 Q14_8 Q11_1 Q11_2 Q11_3 Q11_4 Q11_5 Q11_6 Q17_1
1     2     2     2    NA     2     2     2     3     3     4     4     3     2
2     4     4     4    NA     4     4     4     4     4     4     4     4     2
3     5     5     1     1     1     1     4     5     4     4     5     4     4
4     5     5     5     1     5     5     2     3     4     3     1     1     4
5     5     5     2     1     3     4     2     1     4     3     4     2     2
6     5     5     4     1     4     3     1     4     2     2     4     5     2
  Q17_2 Q17_3 Q17_4 Q18_1 Q18_2 Q18_3 Q18_4 Q18_5 Q20_1 Q20_2 Q21_1 Q21_2 Q21_3
1     3     3     2     2     2     2     3     1     1     2     4     3     3
2     2     1     2     3     1     1     1     1     2     3     5     4     4
3     2     1     4     3     1     4     1     2     3     3     1     2     3
4     4     1     3     4     1     2     1     1     4     2     5     2     1
5     2     2     4     3     1     1     1     2     3     3     4     1     3
6     4     3     4     4     1     1     1     4     6     5     4     4     5
  Q21_4 Q21_5 Q21_6 Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q23_1 Q23_2 Q23_3 Q23_4 Q23_5
1    NA     5     3     3     2     1     2     3     4     5     3     3     2
2    NA     2     2     2     3     4     5     5     5     5     4     3     3
3     4     1     1     5     4     3     4     4     5     6     2     5     5
4     4     1     1     2     5     1     6     1     6     6     2     5     6
5     4     1     2     3     4     1     4     2     2     6     3     5     5
6     4     1     2     2     2     3     4     3     4     5     2     3     3
  Q23_6 Q23_7 Q23_8 Q24_1 Q24_2 Q24_3 Q24_4 Q24_5 Q25_1 Q25_2 Q25_3 Q25_4 Q25_5
1    NA     3     3     3     3     3     3     3     3     3     3    NA     4
2    NA     3     3     3     3     2     2     1     1     2     1    NA     1
3     1     1     3     1     1     1     1     2     3     4     2     1     1
4     1     3     1     1     6     2     1     1     4     2     6     1     6
5     1     2     2     1     2     2     1     2     5     4     2     1     1
6     1     2     4     4     5     1     2     6     6     3     5     1     3
  Q26_1 Q26_2 Q27_1 Q27_2 SC0 SC1 SC2 SC3 SC4 SC5    id Attention
1     3     3     3     3  19  11  19  14  82   0 15516        NA
2     1     1     1     1  24   5  24  26  79   0 15516        NA
3     3     1     4     1  26  13  26  23  70   0 17915        NA
4     4     4     1     1  14  12  14  35  89   0 17648        NA
5     5     2     5     5  16   9  16  29  76   0 17799        NA
6     5     1     6     6  18  15  18  31  99   0 18003        NA

What do these variables mean? Who does this?

Data Documentation

It is always important to have appropriate data documentation!

If you can’t look at your data and know what it means right away, you aren’t going to remember what it means later on.

Sleep Data Documentation - myCourses

Visualizing Demographics

“Q5” - What is your age in years? (open text)

This is a free text field…is that a good way to get quality data?

Let’s see if everyone followed directions. Check the “structure” of the variable

# Here we use str() to check structure of a variable
  # Then we state the data and the variable, linked by the $
str(sleep_data$Q5)
 int [1:1520] 12 1 19 18 21 18 18 18 18 18 ...
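The variable is stored as integers, so everyone typed a number, but values like 1 and 12 still look odd for an age. A quick sketch of how to check the range, using only the first ten values shown in the str() output; the cutoff of 18 is a hypothetical choice, and with the real data you would use sleep_data$Q5 instead of the toy vector.

```r
# First ten Q5 values, copied from the str() output
age <- c(12L, 1L, 19L, 18L, 21L, 18L, 18L, 18L, 18L, 18L)

# Minimum, maximum, and quartiles at a glance
summary(age)

# Hypothetical cutoff: how many responses are below 18?
n_under <- sum(age < 18)
n_under
```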

Giving it a try

Use the ggplot cheatsheet to identify an appropriate way to visualize the data

  • Add some color

  • Update the title and axes

When you are done, post your creation here!