Let’s work Example 2 from Section 11.1.
In clinical trials of Nasonex, 3774 adult and adolescent allergy patients (patients 12 years and older) were randomly divided into two groups. The patients in group 1 (experimental group) received 200 mcg of Nasonex, while the patients in group 2 (control group) received a placebo. Of the 2103 patients in the experimental group, 547 reported headaches as a side effect. Of the 1671 patients in the control group, 368 reported headaches as a side effect. Is there evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group?
Here, we are testing
\(H_0:p_1 = p_2\)
\(H_1:p_1 > p_2\)
The syntax for the test is
prop.test(x = c(\(x_1\), \(x_2\)), n = c(\(n_1\), \(n_2\)), alternative = 'greater', correct = FALSE)
where:
* \(x_1\) and \(x_2\) are the numbers of patients who reported a headache in group 1 and group 2, respectively.
* \(n_1\) and \(n_2\) are the total numbers of patients in group 1 and group 2, respectively.
Note: For a left-tailed test, use alternative = 'less'; for a two-tailed test, use alternative = 'two.sided'.
prop.test(x = c(547, 368), n = c(2103, 1671), alternative = 'greater', conf.level = .95, correct = FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: c(547, 368) out of c(2103, 1671)
## X-squared = 8.0618, df = 1, p-value = 0.00226
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.01695043 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.2601046 0.2202274
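The P-value is 0.002. Because the P-value is small (less than \(\alpha = 0.05\), for example), there is sufficient evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group.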
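As a check on the output above, the same result can be reproduced by hand with the pooled-proportion z statistic from the normal model. This is a minimal sketch in base R; the variable names (p_pool, se, z, p_value) are illustrative and do not come from the original example.

```r
# Counts from the Nasonex example
x1 <- 547; n1 <- 2103   # experimental group
x2 <- 368; n2 <- 1671   # control group

p1 <- x1 / n1
p2 <- x2 / n2

# Pooled estimate of the common proportion under H0: p1 = p2
p_pool <- (x1 + x2) / (n1 + n2)

# Standard error of p1 - p2 under H0
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z <- (p1 - p2) / se
p_value <- pnorm(z, lower.tail = FALSE)  # right-tailed test

z        # about 2.839
z^2      # about 8.0618, the X-squared value reported by prop.test
p_value  # about 0.00226
```

Squaring z recovers the X-squared value in the output, which is why the two approaches give the same P-value.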
Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)
## Month Day Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1 1 2 9:03:00 TX 1 0 0 30000 2.55 100
## 2 1 2 9:44:00 TX 1 0 0 30000 2.57 100
## 3 1 2 10:06:00 LA 1 0 0 25000 0.30 20
## NumberStates F0
## 1 1 No
## 2 1 No
## 3 1 No
We are going to focus on the column F0, which is a categorical variable that is No if the tornado is not an F0 and Yes if the tornado is an F0 tornado.
Suppose we want to know if there is a difference in the proportion of F0 tornadoes in Louisiana (LA) versus Georgia (GA).
We need to use the Mosaic package. If it is not already installed, install it first (this only needs to be done once).
install.packages("mosaic")
The first thing we need to do is obtain a subset of the data set that contains only the observations for Louisiana (LA) and Georgia (GA). Use the subset command on the data set “Tornado”, keeping the rows where “State” is equal to (==) LA or (|) equal to GA. We will name the new data frame Data_LA_GA.
Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA")
head(Data_LA_GA,n=3)
## Month Day Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3 1 2 10:06:00 LA 1 0 0 25000 0.30 20
## 4 1 2 10:17:00 LA 1 0 0 50000 1.20 50
## 5 1 2 10:30:00 LA 1 0 0 20000 4.64 100
## NumberStates F0
## 3 1 No
## 4 1 No
## 5 1 No
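The same kind of filter can also be written with %in%, which stays readable when more than two states are involved. The following is a minimal sketch on a small made-up data frame; toy is hypothetical and is not the Tornado data.

```r
# Hypothetical toy data, used only to illustrate the filter
toy <- data.frame(
  State = c("TX", "LA", "GA", "LA", "OK"),
  F0    = c("No", "Yes", "No", "No", "Yes")
)

# Two equivalent ways to keep only the LA and GA rows
by_or <- subset(toy, State == "LA" | State == "GA")
by_in <- subset(toy, State %in% c("LA", "GA"))

identical(by_or, by_in)  # TRUE
nrow(by_in)              # 3
```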
Now, run the prop.test command in the Mosaic package. The syntax is
prop.test(response variable ~ explanatory variable, data = data frame, alternative = "less" or "greater" or "two.sided", correct = FALSE)
Note: correct=FALSE turns off the correction for continuity and gives results equivalent to using the normal model.
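To see what the continuity correction changes, here is a minimal sketch that reruns the Nasonex test from the first example with and without the correction, using the base R prop.test. The object names without_cc and with_cc are illustrative.

```r
# Nasonex counts from the first example
x <- c(547, 368)
n <- c(2103, 1671)

without_cc <- prop.test(x, n, alternative = "greater", correct = FALSE)
with_cc    <- prop.test(x, n, alternative = "greater", correct = TRUE)

# The correction shrinks the test statistic, so the P-value grows slightly
unname(without_cc$statistic)  # about 8.0618
unname(with_cc$statistic)
without_cc$p.value            # about 0.00226
with_cc$p.value
```

With samples this large the two versions differ only slightly; only the uncorrected statistic is the square of the normal-model z statistic.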
library(mosaic)
prop.test(F0 ~ State,data=Data_LA_GA,alternative="two.sided",correct=FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: tally(F0 ~ State)
## X-squared = 2.8769, df = 1, p-value = 0.08986
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.2375579 0.0144915
## sample estimates:
## prop 1 prop 2
## 0.6355932 0.7471264
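The P-value is 0.0899. At the \(\alpha = 0.05\) level of significance, this is not small enough to conclude that the proportion of F0 tornadoes differs between the two states.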
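R and Mosaic report a test statistic that follows the \(\chi^2\)-distribution; here it is shown to be 2.8769. With one degree of freedom, this value is the square of the test statistic from the normal model, so to find the magnitude of the normal-model test statistic, find the square root of the test statistic provided.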
sqrt(2.8769)
## [1] 1.696143
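So, the magnitude of the test statistic is 1.696. Because the first sample proportion (0.6356) is less than the second (0.7471), the test statistic under the normal model is \(z = -1.696\).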
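The square root recovers only the magnitude of the z statistic; its sign follows the sign of the difference in sample proportions. The sketch below attaches that sign and recomputes the two-tailed P-value from the standard normal distribution. It uses only the values printed in the output above; the variable names are illustrative.

```r
# Values copied from the prop.test output above
x_squared <- 2.8769
prop1 <- 0.6355932
prop2 <- 0.7471264

# Attach the sign of the difference in sample proportions
z <- sign(prop1 - prop2) * sqrt(x_squared)
z  # about -1.696

# Two-tailed P-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))
p_value  # about 0.0899
```

For a two-tailed test the sign does not affect the P-value, but it matters when the alternative is 'less' or 'greater'.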