Load the HomeRun_2014 data into R.
HomeRun <- read.csv("https://sullystats.github.io/Statistics6e/Data/HomeRun_2014.csv")
head(HomeRun, n = 4) # Display the first four rows
## Date Hitter HitterTeam Pitcher PitcherTeam INN
## 1 9/28/2014 Rizzo, Anthony CHC Fiers, Mike MIL 1
## 2 9/28/2014 Bernadina, Roger LAD Scahill, Rob COL 6
## 3 9/28/2014 Duvall, Adam SF Stauffer, Tim SD 4
## 4 9/28/2014 Duda, Lucas NYM Foltynewicz, Mike HOU 8
## Ballpark TrueDist SpeedOffBat Elev.Angle Horiz.Angle Apex Type
## 1 Miller Park 441 109.1 22.7 86.7 81 PL
## 2 Dodger Stadi... 424 113.2 27.7 62.3 98 ND
## 3 AT&T Park 423 103.6 31.9 112.9 98 ND
## 4 Citi Field 417 106.3 26.5 73.0 83 PL
If every variable in the data frame is quantitative, pass the data frame directly to cor( ) to obtain the correlation matrix:
cor(df_name)
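As a minimal sketch of this case, the toy data frame below contains only quantitative columns, so it can be passed to cor( ) directly. The name df_toy and its values are made up for illustration and are not part of the HomeRun data.

```r
# Hypothetical all-quantitative data frame (values invented for illustration)
df_toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 5, 4, 5),
  z = c(9, 7, 6, 3, 1)
)

# Every column is numeric, so cor() returns the full correlation matrix
r_toy <- cor(df_toy)
r_toy
```

The result is a symmetric matrix with 1s on the diagonal; the entry in row x and column y is the correlation coefficient between x and y.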
However, if the data frame has some qualitative variables, you will need to select variables from the original data frame so it only contains the quantitative variables you wish to analyze.
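To see why this step is needed, the sketch below builds a small mixed data frame and passes it straight to cor( ). The name df_mixed is hypothetical; the team codes, distances, and speeds are copied from the four rows printed above.

```r
# Hypothetical data frame mixing one qualitative and two quantitative columns
df_mixed <- data.frame(
  team  = c("CHC", "LAD", "SF", "NYM"),
  dist  = c(441, 424, 423, 417),
  speed = c(109.1, 113.2, 103.6, 106.3)
)

# cor() stops with an error because 'team' is not numeric;
# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(cor(df_mixed), error = function(e) conditionMessage(e))
msg
```

Here msg holds the error text rather than a correlation matrix, which is why the qualitative column has to be removed first.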
Use the select( ) command to create a new data frame that contains a subset of the columns of an existing data frame. This lets you drop the qualitative variables and keep only the quantitative ones, so that cor( ) can be applied. Let’s say we only want the correlation coefficients among TrueDist, SpeedOffBat, Elev.Angle, Horiz.Angle, and Apex.
library(mosaic) # Loading mosaic also attaches dplyr, which provides select()
df_HomeRun <- select(HomeRun, TrueDist, SpeedOffBat, Elev.Angle, Horiz.Angle, Apex) # Select certain variables from the data frame
cor(df_HomeRun) # Find the correlation matrix of the new data frame
## TrueDist SpeedOffBat Elev.Angle Horiz.Angle Apex
## TrueDist 1.00000000 0.6869344 -0.3328835 0.10126718 0.08807084
## SpeedOffBat 0.68693436 1.0000000 -0.6023920 0.13127437 -0.29903646
## Elev.Angle -0.33288347 -0.6023920 1.0000000 -0.01018030 0.86984450
## Horiz.Angle 0.10126718 0.1312744 -0.0101803 1.00000000 0.02517413
## Apex 0.08807084 -0.2990365 0.8698445 0.02517413 1.00000000
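If listing the columns by name becomes tedious, one possible alternative is to keep every numeric column automatically using base R. The sketch below is not the method used above; it works on a small stand-in data frame built from the first four rows printed earlier (df_demo, is_num, and df_num are hypothetical names). On the full HomeRun data this approach would also keep any other column that R reads as numeric, such as INN.

```r
# Stand-in data built from the first four rows of the head() output
df_demo <- data.frame(
  Hitter      = c("Rizzo, Anthony", "Bernadina, Roger", "Duvall, Adam", "Duda, Lucas"),
  TrueDist    = c(441, 424, 423, 417),
  SpeedOffBat = c(109.1, 113.2, 103.6, 106.3),
  Apex        = c(81, 98, 98, 83)
)

# Flag the numeric columns, then keep only those
is_num <- sapply(df_demo, is.numeric)
df_num <- df_demo[, is_num, drop = FALSE]

# Correlation matrix of the numeric columns, rounded for readability
round(cor(df_num), 3)
```

With only four rows these coefficients are for illustration only and will not match the matrix computed from the full data set.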