If necessary, install the mosaic package.
install.packages("mosaic")
We follow the scenario laid out in Section 11.3A, where we suspect that female students spend more time on homework than male students. Letting \(\mu_F\) and \(\mu_M\) denote the population mean homework times for female and male students, we test
\(H_0:\mu_F = \mu_M\)
\(H_1:\mu_F > \mu_M\)
Study <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter11/Table1_113A.csv")
head(Study,n=3)
##   Time Gender
## 1   81   Male
## 2   79   Male
## 3   81   Male
First, let’s explore the data by computing the summary statistics and drawing a dot plot.
library(mosaic)
favstats(Time ~ Gender,data=Study)
##   Gender min Q1 median     Q3 max     mean       sd  n missing
## 1 Female  70 80     99 109.25 152 99.75000 23.46031 12       0
## 2   Male  60 77     80  84.25 104 80.58333 10.51802 12       0
gf_dotplot(~Time | Gender,data=Study,title="Time Spent Doing Homework by Gender")
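So the sample mean time is 80.6 minutes for males and 99.8 minutes for females. The observed difference in sample means (females - males) is 99.8 - 80.6 = 19.2 minutes (19.17 from the unrounded means).
Now we randomly reassign the gender labels to the study times 5000 times and record the difference in means after each shuffle. If the null hypothesis is true, the labels are interchangeable, so these shuffled differences show what to expect from chance alone. Use do() together with diffmean() and shuffle().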
set.seed(93)
Random <- do(5000)*diffmean(Time ~ shuffle(Gender),data=Study)
head(Random,n=4)
## diffmean
## 1 -6.000000
## 2 -4.500000
## 3 4.666667
## 4 15.500000
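To make the shuffling step concrete, here is a minimal base R sketch of the same idea without mosaic. The helper `shuffled_diff` and the six toy values are hypothetical, introduced only for illustration. They are not the Section 11.3A data, and this is not how mosaic implements do(), diffmean(), or shuffle() internally.

```r
# One randomization run: permute the group labels, then recompute the
# difference in group means (first group minus second group).
# shuffled_diff is a hypothetical helper, not part of mosaic.
shuffled_diff <- function(values, groups, first, second) {
  shuffled <- sample(groups)  # random permutation of the labels
  mean(values[shuffled == first]) - mean(values[shuffled == second])
}

set.seed(1)
# Toy values for illustration only (NOT the homework data).
times  <- c(90, 85, 100, 70, 75, 80)
labels <- c("Female", "Female", "Female", "Male", "Male", "Male")

# Repeat the shuffle 5000 times, mirroring do(5000) in the mosaic version.
diffs <- replicate(5000, shuffled_diff(times, labels, "Female", "Male"))
head(diffs, n = 4)
```

Each element of `diffs` plays the role of one row of the diffmean column above. The order of subtraction is fixed explicitly here, so the sign convention does not depend on how the factor levels happen to be ordered.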
Draw a histogram of the randomized differences in means. Also, draw a vertical line at 19.2 (the observed difference in means).
histogram(~diffmean,data=Random,width=1,main="Randomized Difference in Means",v=19.2)
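Finally, determine the proportion of randomized differences in means that are at least as large as the observed difference, 19.2. Because the alternative hypothesis is one-sided, only the upper tail counts as "as extreme or more extreme."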
prop(~(diffmean >= 19.2),data=Random)
## prop_TRUE
## 0.0064
The P-value is approximately 0.0064.
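As a check on the arithmetic, the reported proportion is a count divided by the number of shuffles. Writing \(d_i^*\) for the \(i\)th randomized difference (notation introduced here, not taken from the text), a proportion of 0.0064 corresponds to 32 of the 5000 shuffles:

\[
\text{P-value} \approx \frac{\#\{\, i : d_i^* \ge 19.2 \,\}}{5000} = \frac{32}{5000} = 0.0064
\]

In other words, a difference of at least 19.2 minutes arose in fewer than 1% of the random relabelings.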