Load the HomeRun_2014 data into R.
HomeRun <- read.csv("https://sullystats.github.io/Statistics6e/Data/HomeRun_2014.csv")
head(HomeRun,n=4)
## Date Hitter HitterTeam Pitcher PitcherTeam INN
## 1 9/28/2014 Rizzo, Anthony CHC Fiers, Mike MIL 1
## 2 9/28/2014 Bernadina, Roger LAD Scahill, Rob COL 6
## 3 9/28/2014 Duvall, Adam SF Stauffer, Tim SD 4
## 4 9/28/2014 Duda, Lucas NYM Foltynewicz, Mike HOU 8
## Ballpark TrueDist SpeedOffBat Elev.Angle Horiz.Angle Apex Type
## 1 Miller Park 441 109.1 22.7 86.7 81 PL
## 2 Dodger Stadi... 424 113.2 27.7 62.3 98 ND
## 3 AT&T Park 423 103.6 31.9 112.9 98 ND
## 4 Citi Field 417 106.3 26.5 73.0 83 PL
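Before plotting, it can help to confirm that the columns were read with the expected types and to see how many rows fall under each home run type. The following is a minimal sketch on a hypothetical three-row stand-in for the CSV, so that it runs without a network connection; the names csv_text, toy, and type_counts are assumptions made for illustration.

```r
# Hypothetical three-row stand-in for the CSV file (illustration only)
csv_text <- "Hitter,TrueDist,SpeedOffBat,Type
Player A,441,109.1,PL
Player B,424,113.2,ND
Player C,423,103.6,ND"

# read.csv() accepts inline text through its text= argument
toy <- read.csv(text = csv_text)

# Show each column's type; TrueDist and SpeedOffBat should be numeric
str(toy)

# Count the rows for each level of the qualitative variable
type_counts <- table(toy$Type)
type_counts
```

With the real data, the same two checks would be str(HomeRun) and table(HomeRun$Type).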
Use the mosaic package. To draw a separate scatter plot for each level of a qualitative variable, add that variable after a vertical bar (|) in the formula, using the following syntax.
xyplot(y-variable ~ x-variable | qualitative variable, data = df_name)
library(mosaic)
xyplot(TrueDist ~ SpeedOffBat|Type,data=HomeRun)
Notice that there is a positive association for the JE (just enough), ND (no doubt), and PL (plenty) home run types, but the ITP (inside-the-park) home runs do not appear to show a clear relation in the plot.
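To make the direction of the association in each panel easier to judge, a least-squares line can be drawn in every panel. The following is a minimal sketch, assuming the lattice package (xyplot() is a lattice function) and a made-up toy data frame; the names toy, speed, dist, and type are hypothetical and are not part of the HomeRun data.

```r
library(lattice)  # xyplot() is defined in lattice

# Hypothetical toy data, NOT the HomeRun data: two groups of five points
toy <- data.frame(
  speed = c(100, 102, 104, 106, 108, 100, 102, 104, 106, 108),
  dist  = c(380, 390, 395, 410, 420, 400, 398, 405, 399, 403),
  type  = rep(c("A", "B"), each = 5)
)

# type = c("p", "r") asks for the points plus a regression line per panel
p <- xyplot(dist ~ speed | type, data = toy, type = c("p", "r"))

# p is a trellis object; print(p), or typing p at the console, draws it
```

Applied to the tutorial data, the same argument could be added to the existing call, for example xyplot(TrueDist ~ SpeedOffBat | Type, data = HomeRun, type = c("p", "r")).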
Use the subset() function to create a new data frame that contains only the rows of an existing data frame that satisfy a condition. This allows you to find the correlation coefficient separately for each level of a qualitative variable.
HomeRun_JE <- subset(HomeRun,Type=="JE") # Data frame for JE home runs
HomeRun_ND <- subset(HomeRun,Type=="ND") # Data frame for ND home runs
HomeRun_PL <- subset(HomeRun,Type=="PL") # Data frame for PL home runs
HomeRun_ITP <- subset(HomeRun,Type=="ITP") # Data frame for ITP home runs
cor(TrueDist ~ SpeedOffBat, data=HomeRun_JE) # The formula form of cor() requires mosaic to be loaded
## [1] 0.5283018
cor(TrueDist ~ SpeedOffBat, data=HomeRun_ND)
## [1] 0.6691496
cor(TrueDist ~ SpeedOffBat, data=HomeRun_PL)
## [1] 0.663038
cor(TrueDist ~ SpeedOffBat, data=HomeRun_ITP)
## [1] 0.4295198
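The four subsets and four cor() calls can also be collapsed into one step. The following is a minimal sketch in base R, not the approach used above: it splits a data frame by a grouping column and computes the correlation coefficient and the number of rows within each piece. The toy data frame and the names groups, r_by_type, and n_by_type are hypothetical; the toy values are exactly linear, so r is 1 in group A and -1 in group B.

```r
# Hypothetical toy data, NOT the HomeRun data
toy <- data.frame(
  speed = c(100, 102, 104, 106, 108, 100, 102, 104, 106, 108),
  dist  = c(380, 390, 400, 410, 420, 420, 410, 400, 390, 380),
  type  = rep(c("A", "B"), each = 5)
)

# split() returns a list with one data frame per level of type
groups <- split(toy, toy$type)

# Correlation coefficient and number of rows within each group
r_by_type <- sapply(groups, function(d) cor(d$speed, d$dist))
n_by_type <- sapply(groups, nrow)

# One row per group: sample size next to its correlation
data.frame(n = n_by_type, r = r_by_type)
```

Reporting n alongside r is a deliberate choice: a correlation coefficient computed from only a handful of points is far less stable than one computed from hundreds. For the tutorial data, the equivalent would be split(HomeRun, HomeRun$Type) with the TrueDist and SpeedOffBat columns.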