Using Randomization Techniques to Compare Two Independent Means

If necessary, install the mosaic package.

install.packages("mosaic")

We follow the scenario laid out in Section 11.3A, in which we suspect that female students spend more time on homework than male students. Letting \(\mu_F\) and \(\mu_M\) denote the mean time spent on homework by female and male students, respectively, we test

\(H_0:\mu_F = \mu_M\)
\(H_1:\mu_F > \mu_M\)
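The randomization test in this section uses the difference in sample means as its test statistic and estimates the P-value by simulation. In symbols, with \(d_i^*\) denoting the difference in means from the \(i\)th random reassignment of the gender labels (this notation is ours, introduced only for illustration):

```latex
\[
\text{test statistic} = \bar{x}_F - \bar{x}_M,
\qquad
\text{P-value} \approx \frac{\text{number of } d_i^* \ge \bar{x}_F - \bar{x}_M}{5000},
\quad i = 1, \ldots, 5000.
\]
```

Counting only the values at or above the observed statistic reflects the right-tailed alternative \(H_1:\mu_F > \mu_M\).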

Study <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter11/Table1_113A.csv")
head(Study,n=3)

##   Time Gender
## 1   81   Male
## 2   79   Male
## 3   81   Male

First, let’s explore the data by computing the summary statistics and drawing a dot plot.

library(mosaic)
favstats(Time ~ Gender,data=Study)

##   Gender min Q1 median     Q3 max     mean       sd  n missing
## 1 Female  70 80     99 109.25 152 99.75000 23.46031 12       0
## 2   Male  60 77     80  84.25 104 80.58333 10.51802 12       0

gf_dotplot(~Time | Gender,data=Study,title="Time Spent Doing Homework by Gender")

The sample mean time is 99.8 minutes for females and 80.6 minutes for males, so the difference in sample means (females - males) is 99.8 - 80.6 = 19.2 minutes.
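As a small check, the arithmetic can be repeated in R using the unrounded means that favstats() reports (99.75 for females and 80.58333 for males). The exact difference is about 19.17 minutes, which rounds to 19.2.

```r
# Group means copied from the favstats() output above
mean_female <- 99.75
mean_male   <- 80.58333

# Difference in the direction of H1: females minus males
obs_diff <- mean_female - mean_male
round(obs_diff, 2)  # 19.17
round(obs_diff, 1)  # 19.2
```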

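Next, we simulate what could happen if H_0 were true: we randomly reassign the gender labels to the study times (keeping 12 females and 12 males), compute the difference in means, and repeat this 5000 times. Use the do() function along with the diffmean() and shuffle() functions.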

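set.seed(93)  # fix the random seed so the results are reproducible
Random <- do(5000)*diffmean(Time ~ shuffle(Gender),data=Study)  # 5000 differences in means after shuffling the Gender labels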
head(Random,n=4)

##    diffmean
## 1 -6.000000
## 2 -4.500000
## 3  4.666667
## 4 15.500000
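To see what do() and shuffle() are doing, here is a base R sketch of the same idea. The helper shuffle_diffs() is hypothetical (it is not part of mosaic), and the data are a small made-up example rather than the study data, so the values will not match the output above. Each repetition permutes the labels with sample(), which keeps the number of females and males fixed, and records the difference in means, Female - Male.

```r
# Hypothetical helper (not part of mosaic): randomization distribution of
# mean(values in group a) - mean(values in group b), built by shuffling labels
shuffle_diffs <- function(values, labels, a, b, reps = 5000) {
  replicate(reps, {
    shuffled <- sample(labels)   # permute the labels; group sizes stay the same
    mean(values[shuffled == a]) - mean(values[shuffled == b])
  })
}

# Hypothetical toy data (NOT the study data): 4 times per group
toy_time   <- c(10, 12, 14, 16, 11, 15, 18, 20)
toy_gender <- rep(c("Male", "Female"), each = 4)

set.seed(93)
toy_diffs <- shuffle_diffs(toy_time, toy_gender, a = "Female", b = "Male")
head(toy_diffs, n = 4)
```

Because sample() draws without replacement, every shuffle is a rearrangement of the original labels, which is the randomization model behind H_0.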

Draw a histogram of the randomized differences in means, with a vertical line at 19.2, the observed difference in means.

histogram(~diffmean,data=Random,width=1,main="Randomized Difference in Means",v=19.2)

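Finally, determine the proportion of randomized differences in means that are at least as extreme as the observed difference of 19.2 minutes. Because the alternative hypothesis is right-tailed, "at least as extreme" means 19.2 or larger. (The diffmean() function subtracts the group means in the order of the group labels, here Male - Female, so the observed value of diffmean itself is about -19.2. Because both groups contain 12 students, however, the randomization distribution is symmetric about 0, so the proportion of values at or above 19.2 estimates the same P-value as the proportion at or below -19.2.)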

prop(~(diffmean >= 19.2),data=Random)

## prop_TRUE 
##    0.0064

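The P-value is approximately 0.0064: only 32 of the 5000 randomized differences (0.0064 × 5000 = 32) were 19.2 minutes or more. A difference this large rarely occurs when gender is assigned at random, so we reject H_0 at the α = 0.05 level (and even at α = 0.01). The data provide evidence that female students spend more time on homework, on average, than male students.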
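The same right-tailed P-value can be computed without mosaic. The sketch below uses only base R (sample() and replicate()) on a small hypothetical data set, not the study data. With only 8 observations there are choose(8, 4) = 70 ways to split them into two groups of 4, and 10 of those splits give a difference of 3 or more, so the simulated proportion should land near the exact value 10/70, or about 0.14.

```r
# Hypothetical toy data (NOT the study data): 4 times per group
toy_time   <- c(10, 12, 14, 16, 11, 15, 18, 20)
toy_gender <- rep(c("Male", "Female"), each = 4)

# Observed statistic in the direction of H1: Female mean minus Male mean
obs <- mean(toy_time[toy_gender == "Female"]) -
  mean(toy_time[toy_gender == "Male"])
obs  # 3

set.seed(93)
# 5000 random reassignments of the labels, each giving Female - Male
diffs <- replicate(5000, {
  g <- sample(toy_gender)
  mean(toy_time[g == "Female"]) - mean(toy_time[g == "Male"])
})

# Right-tailed P-value: proportion of randomized differences >= observed
p_value <- mean(diffs >= obs)
p_value

# Equal group sizes make the distribution symmetric about 0,
# so the left tail at -obs gives a similar proportion
mean(diffs <= -obs)
```

The last line illustrates why the direction of subtraction does not change the estimated P-value when the two groups are the same size.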