Lab 5: Correlations

Instructions

We are going to investigate which variables are most strongly related to the number of Spotify streams in this Popular Music dataset. The data come from: https://www.kaggle.com/datasets/ahmadrazakashif/spotify-popularity-songs

Here are the things that you will need for this lab:

When you are finished, click the Knit button to turn your work into an HTML document. You will submit both this .Rmd file and the 🧶knitted .html file.

Scenario and Goal

We have been asked by an up-and-coming music artist to use our data analytics skills to find out which song characteristics are related to popularity. They have provided us with data on recent popular songs that have been streamed the most.

Our goal is to identify which variable is most strongly related to higher streams. The variables they are interested in include danceability, instrumentalness, etc.

Variables of Interest

  • streams: The number of streams for the individual song

  • Percentage Variables: Each of these variables is a rating from 0 to 100 of how strongly the song shows that characteristic

    • danceability_%
    • valence_%
    • energy_%
    • acousticness_%
    • instrumentalness_%
    • liveness_%
    • speechiness_%

Lab Exercises

Import your data and use the clean_names() function (from the janitor package) to tidy up the variable names.
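As a rough sketch of this step, the example below writes a tiny made-up CSV to a temporary file and reads it back, so it runs on its own. The file contents, the `toy_path` name, and the `songs` object are assumptions for illustration; in the lab you would point `read.csv()` (or `readr::read_csv()`) at the file you downloaded from Kaggle.

```r
library(janitor)

# Hypothetical toy file standing in for the Kaggle CSV (values are made up).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "track_name,streams,danceability_%,energy_%",
  "Song A,1000,80,70",
  "Song B,2500,65,55",
  "Song C,1800,72,60"
), toy_path)

# check.names = FALSE keeps the original headers so clean_names() does the renaming.
songs <- read.csv(toy_path, check.names = FALSE)
songs <- clean_names(songs)

# Headers such as "danceability_%" become snake_case names like "danceability_percent".
names(songs)
```

After cleaning, the names contain no special characters, so they can be used in later code without backticks.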

Start by creating a correlation table (make sure it is nicely formatted, not just the raw output of cor()).
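One possible way to format the table is to round the matrix from `cor()` and pass it to `knitr::kable()`, which renders as a proper table when the document is knitted. This is only a sketch: the `songs` data frame below holds made-up toy values, and other table packages would work just as well.

```r
library(knitr)

# Toy values for illustration only; not taken from the real dataset.
songs <- data.frame(
  streams              = c(1000, 2500, 1800, 3200, 900),
  danceability_percent = c(80, 65, 72, 60, 85),
  energy_percent       = c(70, 55, 60, 75, 50)
)

# Pearson correlations, using all available pairs if some values are missing.
cor_matrix <- cor(songs, use = "pairwise.complete.obs")

# Round to two decimals and render as a formatted table.
kable(round(cor_matrix, 2), caption = "Pearson correlations among study variables")
```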

Examine the correlation table and identify the two largest effects (largest in absolute value) with the outcome (streams).
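If the table is large, it can help to pull out only the row for streams and sort it by size. The sketch below does this on made-up toy values; `songs`, `cor_matrix`, and `with_streams` are illustrative names, not part of the lab materials.

```r
# Toy values for illustration only; not taken from the real dataset.
songs <- data.frame(
  streams              = c(1000, 2500, 1800, 3200, 900),
  danceability_percent = c(80, 65, 72, 60, 85),
  energy_percent       = c(70, 55, 60, 75, 50),
  valence_percent      = c(40, 62, 55, 48, 70)
)

cor_matrix <- cor(songs, use = "pairwise.complete.obs")

# Keep the correlations with streams, dropping streams' correlation with itself.
with_streams <- cor_matrix["streams", colnames(cor_matrix) != "streams"]

# Sort by absolute value so strong negative correlations are not overlooked.
top_two <- sort(abs(with_streams), decreasing = TRUE)[1:2]
top_two
```

Sorting on the absolute value matters because a correlation of -.50 is a larger effect than one of .30.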

Visualize the two correlations that you have identified.
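A scatterplot with a fitted line is a common way to visualize a correlation. The sketch below uses ggplot2 on made-up toy values and plots danceability only as an example; substitute whichever two variables you identified.

```r
library(ggplot2)

# Toy values for illustration only; not taken from the real dataset.
songs <- data.frame(
  streams              = c(1000, 2500, 1800, 3200, 900),
  danceability_percent = c(80, 65, 72, 60, 85)
)

# Points show individual songs; the straight line shows the linear trend.
p <- ggplot(songs, aes(x = danceability_percent, y = streams)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    x = "Danceability (%)",
    y = "Streams",
    title = "Streams by danceability"
  ) +
  theme_minimal()

p
```

Repeat the same pattern with the second variable to produce the other plot.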

Choose one of the relationships that you visualized and write the results in APA format.
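An APA-style write-up of a correlation typically reports the degrees of freedom, r, and the p value, for example in the form r(df) = .xx, p = .xxx. One way to get those numbers is `cor.test()`. The sketch below runs it on made-up toy values; the helper `drop_zero()` is a hypothetical convenience function that removes the leading zero, as APA style does for statistics that cannot exceed 1.

```r
# Toy values for illustration only; not taken from the real dataset.
songs <- data.frame(
  streams              = c(1000, 2500, 1800, 3200, 900),
  danceability_percent = c(80, 65, 72, 60, 85)
)

# Pearson correlation test: gives r, degrees of freedom (n - 2), and the p value.
res <- cor.test(songs$danceability_percent, songs$streams)

# Hypothetical helper: format a number and strip the leading zero (0.45 -> .45).
drop_zero <- function(x, digits) {
  sub("^(-?)0\\.", "\\1.", formatC(x, format = "f", digits = digits))
}

# APA reports very small p values as "p < .001" rather than an exact value.
p_text <- if (res$p.value < .001) {
  "p < .001"
} else {
  paste0("p = ", drop_zero(res$p.value, 3))
}

apa <- sprintf(
  "r(%d) = %s, %s",
  as.integer(res$parameter),
  drop_zero(res$estimate, 2),
  p_text
)
apa
```

The resulting string is only the statistical part; the written result should also describe the direction and meaning of the relationship in a sentence.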

End of Lab. Don’t forget to Knit! 🧶