Suppose we want to test
\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 \neq \mu_2\)
Further suppose we have the following summary statistics:
\(\bar{x}_1 = 43.2\)
\(s_1 = 3.2\)
\(n_1 = 40\)
\(\bar{x}_2 = 44.9\)
\(s_2 = 3.6\)
\(n_2 = 45\)
To conduct this test from summary statistics, install (if necessary) and load the PASWR package, then use its tsum.test function.
install.packages("PASWR")
library(PASWR)
tsum.test(mean.x=43.2,s.x=3.2,n.x=40,mean.y=44.9,s.y=3.6,n.y=45,alternative="two.sided")
##
## Welch Modified Two-Sample t-Test
##
## data: Summarized x and y
## t = -2.3049, df = 83, p-value = 0.02367
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.1669843 -0.2330157
## sample estimates:
## mean of x mean of y
## 43.2 44.9
The test statistic is \(t_0 = -2.3049\) and the P-value is 0.02367.
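As a check on the tsum.test output, the Welch statistic, its approximate degrees of freedom, and the two-sided P-value can be computed from the summary statistics using only base R. This is a minimal sketch; the variable names (se, t0, df, p_value) are chosen here for illustration and are not part of PASWR.

```r
# Summary statistics from the example above
xbar1 <- 43.2; s1 <- 3.2; n1 <- 40
xbar2 <- 44.9; s2 <- 3.6; n2 <- 45

# Standard error of the difference in sample means (variances not pooled)
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Welch test statistic
t0 <- (xbar1 - xbar2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (s1^2 / n1 + s2^2 / n2)^2 /
  ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# Two-sided P-value from the t distribution
p_value <- 2 * pt(-abs(t0), df)

round(c(t = t0, df = df, p.value = p_value), 4)
```

The rounded values should agree with the t, df, and p-value reported by tsum.test.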
We will use the data and scenario from Example 1 in Section 11.3 where we are testing
\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 \neq \mu_2\)
Table3 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter11/Table3.csv")
head(Table3,n=3)
## Flight Control
## 1 8.59 8.65
## 2 6.87 7.62
## 3 7.00 7.33
Use the following command to perform a hypothesis test:
t.test(x, y, alternative = "less" or "greater" or "two.sided")
where:
x is the first column in the data set (Sample 1)
y is the second column in the data set (Sample 2)
t.test(Table3$Flight, Table3$Control, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Table3$Flight and Table3$Control
## t = -1.4368, df = 25.996, p-value = 0.1627
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3351258 0.2365544
## sample estimates:
## mean of x mean of y
## 7.880714 8.430000
The test statistic is \(t_0 = -1.4368\) and the P-value is 0.1627.
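The object returned by t.test can be stored, and individual results can be pulled out of it instead of being read off the printed output. The sketch below uses two short vectors, sample1 and sample2, that are made up for illustration; they are not the Table3 data.

```r
# Hypothetical measurements, made up for illustration only
sample1 <- c(8.6, 6.9, 7.0, 7.9, 8.8, 9.8, 7.5)
sample2 <- c(8.7, 7.6, 7.3, 8.2, 9.4, 9.9, 8.1)

# Store the result of the test rather than only printing it
result <- t.test(sample1, sample2, alternative = "two.sided")

result$statistic  # test statistic t
result$parameter  # degrees of freedom
result$p.value    # P-value
result$conf.int   # confidence interval for the difference in means
result$estimate   # the two sample means
```

The same components are available for the test on Table3$Flight and Table3$Control.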
When the categorical variable is in one column and the quantitative response variable is in a second column, we use the Mosaic package.
Let’s use the Tornadoes_2017.csv data.
Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)
## Month Day Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1 1 2 9:03:00 TX 1 0 0 30000 2.55 100
## 2 1 2 9:44:00 TX 1 0 0 30000 2.57 100
## 3 1 2 10:06:00 LA 1 0 0 25000 0.30 20
## NumberStates F0
## 1 1 No
## 2 1 No
## 3 1 No
Suppose we want to know if the mean length of a tornado in Georgia (GA) is greater than the mean length of a tornado in Louisiana (LA).
If necessary, install the Mosaic package, then load it.
install.packages("mosaic")
library(mosaic)
First, we need to obtain a subset of the data set that only contains observations for Louisiana (LA) and Georgia (GA).
Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA") # The | means "or" in R
head(Data_LA_GA)
## Month Day Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3 1 2 10:06:00 LA 1 0 0 25000 0.30 20
## 4 1 2 10:17:00 LA 1 0 0 50000 1.20 50
## 5 1 2 10:30:00 LA 1 0 0 20000 4.64 100
## 6 1 2 10:30:00 LA 1 0 0 150000 2.74 100
## 7 1 2 11:06:00 LA 1 0 0 50000 0.54 50
## 8 1 2 11:30:00 LA 0 0 0 75000 0.54 25
## NumberStates F0
## 3 1 No
## 4 1 No
## 5 1 No
## 6 1 No
## 7 1 No
## 8 1 Yes
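When more than two values are being kept, the %in% operator is a compact alternative to chaining conditions with |. A minimal sketch on a small made-up data frame (toy is hypothetical and is not the tornado data):

```r
# Hypothetical data frame, made up for illustration only
toy <- data.frame(
  State  = c("TX", "LA", "GA", "LA", "AL", "GA"),
  Length = c(2.55, 0.30, 5.10, 1.20, 3.40, 6.40)
)

# Keeps the same rows as State == "LA" | State == "GA"
toy_LA_GA <- subset(toy, State %in% c("LA", "GA"))

# Confirm that only LA and GA rows remain
unique(as.character(toy_LA_GA$State))
nrow(toy_LA_GA)
```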
Now, run the t.test command in Mosaic. The syntax is
t.test(response variable ~ explanatory variable, data = data frame, alternative = "less" or "greater" or "two.sided")
t.test(Length ~ State,data=Data_LA_GA,alternative="greater")
##
## Welch Two Sample t-test
##
## data: Length by State
## t = 2.5539, df = 185.4, p-value = 0.005728
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.7505899 Inf
## sample estimates:
## mean in group GA mean in group LA
## 5.345593 3.217586
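Notice that Georgia is the first state in the output. R orders the groups alphabetically, so the difference being tested is the mean for GA minus the mean for LA. This is why the alternative is “greater”. The test statistic is \(t_0 = 2.5539\) and the P-value is 0.0057.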
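The direction of the difference follows the order of the group levels, so the order can be set explicitly with factor() when a different direction is easier to read. The sketch below uses a small made-up data frame, toy, rather than the tornado data, and shows that reversing the level order and switching the alternative tests the same hypothesis.

```r
# Hypothetical data, made up for illustration only
toy <- data.frame(
  State  = c("LA", "LA", "LA", "LA", "GA", "GA", "GA", "GA"),
  Length = c(0.3, 1.2, 4.6, 2.7, 5.1, 6.4, 3.9, 7.2)
)

# Default: groups are ordered alphabetically, so the difference is GA - LA
default_test <- t.test(Length ~ State, data = toy, alternative = "greater")

# Put LA first so the difference becomes LA - GA
toy$State <- factor(toy$State, levels = c("LA", "GA"))
releveled_test <- t.test(Length ~ State, data = toy, alternative = "less")

names(default_test$estimate)    # GA is listed first
names(releveled_test$estimate)  # LA is listed first

# Both calls test whether the GA mean exceeds the LA mean,
# so the two P-values are equal
c(default_test$p.value, releveled_test$p.value)
```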
Note: The argument var.equal=TRUE pools the sample variances. By default, t.test assumes unequal variances (and therefore uses Welch’s t).
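To see the effect of var.equal, the two versions of the test can be run side by side. This is a sketch on made-up vectors (group1 and group2 are hypothetical, not the tornado data).

```r
# Hypothetical data, made up for illustration only
group1 <- c(5.1, 6.4, 3.9, 7.2, 4.8, 6.0)
group2 <- c(0.3, 1.2, 4.6, 2.7, 0.5, 0.5, 3.1)

welch  <- t.test(group1, group2)                    # default: var.equal = FALSE
pooled <- t.test(group1, group2, var.equal = TRUE)  # pooled-variance t-test

welch$method      # identifies the Welch test
pooled$method     # identifies the pooled two-sample test

welch$parameter   # Welch degrees of freedom (generally not an integer)
pooled$parameter  # pooled degrees of freedom: n1 + n2 - 2
```

With six and seven observations, the pooled test has 6 + 7 - 2 = 11 degrees of freedom, while the Welch degrees of freedom come from the Welch-Satterthwaite approximation.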