Inference about Two Means: Independent Samples

install.packages("mosaic")

Hypothesis Tests for Two Independent Means: Summarized Data

Suppose we want to test

\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 \neq \mu_2\)

Further suppose we have the following summary statistics:

\(\bar{x}_1 = 43.2\)
\(s_1 = 3.2\)
\(n_1 = 40\)
\(\bar{x}_2 = 44.9\)
\(s_2 = 3.6\)
\(n_2 = 45\)

To conduct this test, install the PASWR package.

install.packages("PASWR")

library(PASWR)
tsum.test(mean.x=43.2,s.x=3.2,n.x=40,mean.y=44.9,s.y=3.6,n.y=45,alternative="two.sided")

## 
##  Welch Modified Two-Sample t-Test
## 
## data:  Summarized x and y
## t = -2.3049, df = 83, p-value = 0.02367
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.1669843 -0.2330157
## sample estimates:
## mean of x mean of y 
##      43.2      44.9

The test statistic is \(t_0 = -2.3049\) and the P-value is 0.02367.
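
As a cross-check, the same Welch calculation can be sketched by hand in base R from the summary statistics above. The variable names (`v1`, `se`, `df_welch`, and so on) are chosen here for illustration and are not part of PASWR; the sketch assumes the usual Welch-Satterthwaite approximation for the degrees of freedom.

```r
# Summary statistics from the example above
xbar1 <- 43.2; s1 <- 3.2; n1 <- 40
xbar2 <- 44.9; s2 <- 3.6; n2 <- 45

# Standard error of the difference in sample means (variances not pooled)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# Welch test statistic
t_stat <- (xbar1 - xbar2) / se

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided P-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# 95% confidence interval for mu_1 - mu_2
ci <- (xbar1 - xbar2) + c(-1, 1) * qt(0.975, df = df_welch) * se

round(c(t = t_stat, df = df_welch, p = p_value), 4)
round(ci, 4)
```

The rounded values should agree with the t, df, p-value, and confidence interval reported by tsum.test.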

Hypothesis Tests for Two Independent Means: Raw Data in Two Columns

We will use the data and scenario from Example 1 in Section 11.3 where we are testing

\(H_0:\mu_1 = \mu_2\)
\(H_1:\mu_1 \neq \mu_2\)

Table3 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter11/Table3.csv")
head(Table3,n=3)

##   Flight Control
## 1   8.59    8.65
## 2   6.87    7.62
## 3   7.00    7.33

Use the following command to perform a hypothesis test:

t.test(x, y, alternative = "less" or "greater" or "two.sided")

where:

x is the first column in the data set (Sample 1)
y is the second column in the data set (Sample 2)

t.test(Table3$Flight, Table3$Control, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  Table3$Flight and Table3$Control
## t = -1.4368, df = 25.996, p-value = 0.1627
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3351258  0.2365544
## sample estimates:
## mean of x mean of y 
##  7.880714  8.430000

The test statistic is \(t_0 = -1.4368\) and the P-value is 0.1627.
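
Rather than reading values off the printed output, the result of t.test can be stored and its components extracted by name. The sketch below uses two short hypothetical vectors (`sample1` and `sample2`, made up for illustration and not the Table3 data) so that it runs without a download.

```r
# Hypothetical measurements, made up for illustration (not the Table3 data)
sample1 <- c(8.1, 7.4, 9.0, 6.8, 7.7, 8.3)
sample2 <- c(8.9, 8.2, 9.4, 7.9, 8.8, 9.1)

# Store the test result instead of only printing it
result <- t.test(sample1, sample2, alternative = "two.sided")

result$statistic   # test statistic t_0
result$parameter   # Welch degrees of freedom
result$p.value     # P-value
result$conf.int    # confidence interval for mu_1 - mu_2
result$estimate    # the two sample means
```

The same component names apply when the call is t.test(Table3$Flight, Table3$Control, alternative = "two.sided").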

Hypothesis Tests for Two Independent Means: Raw Data with One Quantitative Column and One Qualitative Column

When the categorical variable is in one column and the quantitative response variable is in a second column, we use the Mosaic package.

Let’s use the Tornadoes_2017.csv data.

Tornado <- read.csv("https://sullystats.github.io/Statistics6e/Data/Tornadoes_2017.csv")
head(Tornado,n=3)

##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 1     1   2  9:03:00    TX       1        0          0    30000   2.55   100
## 2     1   2  9:44:00    TX       1        0          0    30000   2.57   100
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
##   NumberStates F0
## 1            1 No
## 2            1 No
## 3            1 No

Suppose we want to know if the mean length of a tornado in Georgia (GA) is greater than the mean length of a tornado in Louisiana (LA).

If necessary, install the Mosaic package.

install.packages("mosaic")

First, we need to obtain a subset of the data set that only contains observations for Louisiana (LA) and Georgia (GA).

Data_LA_GA <- subset(Tornado,State=="LA"|State=="GA")  # The | means "or" in R
head(Data_LA_GA)

##   Month Day     Time State F.Scale Injuries Fatalities PropLoss Length Width
## 3     1   2 10:06:00    LA       1        0          0    25000   0.30    20
## 4     1   2 10:17:00    LA       1        0          0    50000   1.20    50
## 5     1   2 10:30:00    LA       1        0          0    20000   4.64   100
## 6     1   2 10:30:00    LA       1        0          0   150000   2.74   100
## 7     1   2 11:06:00    LA       1        0          0    50000   0.54    50
## 8     1   2 11:30:00    LA       0        0          0    75000   0.54    25
##   NumberStates  F0
## 3            1  No
## 4            1  No
## 5            1  No
## 6            1  No
## 7            1  No
## 8            1 Yes
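
An equivalent way to write the subset, which scales better when more than two states are of interest, is the `%in%` operator. The sketch below uses a hypothetical miniature data frame (`Tornado_demo`, made up for illustration) in place of the full data set, and then counts the rows kept for each state.

```r
# Hypothetical miniature data frame standing in for Tornado
Tornado_demo <- data.frame(
  State  = c("TX", "LA", "GA", "LA", "AL", "GA"),
  Length = c(2.55, 0.30, 5.10, 1.20, 3.40, 6.25)
)

# %in% keeps rows whose State matches any value in the vector
Demo_LA_GA <- subset(Tornado_demo, State %in% c("LA", "GA"))

# Confirm that only the two states of interest remain, and count each group
table(Demo_LA_GA$State)
```

Checking the group counts before testing is a quick way to catch a misspelled state code, which would silently produce an empty group.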

Now, run the t.test command in Mosaic. The syntax is

t.test(response variable ~ explanatory variable, data = data frame, alternative = "less" or "greater" or "two.sided")

library(mosaic)
t.test(Length ~ State,data=Data_LA_GA,alternative="greater")

## 
##  Welch Two Sample t-test
## 
## data:  Length by State
## t = 2.5539, df = 185.4, p-value = 0.005728
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.7505899       Inf
## sample estimates:
## mean in group GA mean in group LA 
##         5.345593         3.217586

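Notice that Georgia is the first state in the output: R orders the groups alphabetically, so the difference being tested is \(\mu_{GA} - \mu_{LA}\). This is why the alternative is “greater”: it corresponds to \(H_1:\mu_{GA} > \mu_{LA}\). The test statistic is \(t_0 = 2.5539\) and the P-value is 0.0057.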
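
Because the direction of a one-sided test depends on which group comes first, it can help to set the group order explicitly. The sketch below is one way to do that with factor levels; the data frame `demo` and its values are hypothetical, made up for illustration, and are not the tornado data.

```r
# Hypothetical lengths, made up for illustration (not the tornado data)
demo <- data.frame(
  State  = rep(c("GA", "LA"), each = 5),
  Length = c(5.2, 6.1, 4.8, 7.0, 5.5, 3.1, 2.4, 4.0, 3.6, 2.9)
)

# Default: groups are ordered alphabetically, so the difference is GA - LA
default_order <- t.test(Length ~ State, data = demo, alternative = "greater")

# Put LA first so the difference is LA - GA; the same claim (GA mean larger)
# is now expressed with alternative = "less"
demo$State <- factor(demo$State, levels = c("LA", "GA"))
la_first <- t.test(Length ~ State, data = demo, alternative = "less")

default_order$statistic                     # positive
la_first$statistic                          # same magnitude, opposite sign
c(default_order$p.value, la_first$p.value)  # the two P-values agree
```

Reordering the levels flips the sign of the test statistic, so the alternative must be flipped as well to test the same claim.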

Note: The argument var.equal=TRUE pools the sample variances. By default, t.test does not assume equal variances (and therefore uses Welch’s t).
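
To see what pooling changes, the pooled calculation can be sketched by hand in base R using the summary statistics from the first example. The variable names (`sp2`, `se_pooled`, and so on) are chosen here for illustration; the sketch assumes the standard pooled-variance formula with \(n_1 + n_2 - 2\) degrees of freedom.

```r
# Summary statistics from the first example
xbar1 <- 43.2; s1 <- 3.2; n1 <- 40
xbar2 <- 44.9; s2 <- 3.6; n2 <- 45

# Pooled variance: weighted average of the two sample variances
df_pooled <- n1 + n2 - 2
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df_pooled

# Pooled standard error and test statistic
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled  <- (xbar1 - xbar2) / se_pooled

# Two-sided P-value on n1 + n2 - 2 degrees of freedom
p_pooled <- 2 * pt(-abs(t_pooled), df = df_pooled)

round(c(t = t_pooled, df = df_pooled, p = p_pooled), 4)
```

With sample sizes and standard deviations this similar, the pooled statistic is close to the Welch statistic of -2.3049; the two approaches diverge more when the sample variances or sample sizes differ substantially.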