Install the mosaic package, if necessary.
install.packages("mosaic")
Load the data from Table 1 in Section 12.1 into R.
Table1 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter12/Table1.csv")
head(Table1,n=3)
## Income Observed Proportion
## 1 Under $15,000 161 0.099
## 2 $15,000-$24,999 144 0.098
## 3 $25,000-$34,999 138 0.093
Now, let's find the expected counts. Recall that the expected count for the ith category is the sample size n times the hypothesized proportion for that category:
\(E_i = np_i\)
library(mosaic)
n <- sum(~Observed,data=Table1)
Expected <- n*Table1$Proportion
Expected
## [1] 148.5 147.0 139.5 202.5 268.5 196.5 223.5 91.5 82.5
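The same expected counts can be reproduced in base R without the mosaic package or a download. This is a minimal sketch, not the author's method: the vector names `observed` and `proportion` are illustrative, the observed counts are copied from the xchisq.test output in this section, and only the first three proportions appear in head(Table1); the remaining six are back-computed as Expected / 1500 and are therefore an assumption.

```r
# Observed counts, copied from the xchisq.test output in this section
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)

# Hypothesized proportions: the first three are shown in head(Table1);
# the other six are back-computed from Expected / 1500 (assumption)
proportion <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055)

n <- sum(observed)          # total sample size
expected <- n * proportion  # E_i = n * p_i
expected

# Sanity check: the expected counts should add up to n
all.equal(sum(expected), n)
```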
The test statistic in a chi-square Goodness-of-Fit Test is
\(\chi^2_0 = \sum{\frac{(Observed - Expected)^2}{Expected}}\)
Teststat <- sum(~((Table1$Observed - Expected)^2/Expected))
Teststat
## [1] 20.69282
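To see which categories drive the test statistic, it can help to look at each term of the sum separately. The following is a minimal base-R sketch; the names `observed`, `expected`, and `contribution` are illustrative, and the values are copied from the output shown in this section.

```r
# Observed and expected counts, copied from the output in this section
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)
expected <- c(148.5, 147.0, 139.5, 202.5, 268.5, 196.5, 223.5, 91.5, 82.5)

# Each category's contribution to the chi-square statistic
contribution <- (observed - expected)^2 / expected
round(contribution, 3)

# The test statistic is the sum of the contributions
teststat <- sum(contribution)
teststat

# Position of the category with the largest contribution
which.max(contribution)
```

These per-category values correspond to the bracketed "contribution to X-squared" row that xchisq.test prints.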
The P-value is the area under the chi-square distribution with k - 1 degrees of freedom to the right of the test statistic, where k is the number of categories of data. Here, k = 9, so there are 8 degrees of freedom.
1 - pchisq(Teststat,df=8)
## [1] 0.008009763
The P-value is 0.008.
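An equivalent way to get the right-tail area is to ask pchisq for the upper tail directly instead of subtracting from 1. This is a small sketch that assumes the test statistic 20.69282 found above; the names `teststat`, `k`, and `p_value` are illustrative.

```r
teststat <- 20.69282  # test statistic found above
k <- 9                # number of income categories

# Upper-tail area of the chi-square distribution with k - 1 degrees of freedom
p_value <- pchisq(teststat, df = k - 1, lower.tail = FALSE)
p_value
```

Using lower.tail = FALSE avoids the subtraction, which can lose precision when the P-value is extremely small.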
The direct commands for conducting the chi-square Goodness-of-Fit Test are chisq.test and xchisq.test. The advantage of the xchisq.test command is that its output includes the expected counts, each category's contribution to the test statistic, and the residuals. There are two options with the xchisq.test command.
Option 1: Use expected counts.
If using expected counts, the syntax of the command is
xchisq.test(x = Observed Counts,p = Expected Counts, rescale.p = TRUE)
For our scenario, the observed counts are in Table 1 while the expected counts are in their own vector, Expected.
xchisq.test(x=Table1$Observed,p=Expected,rescale.p = TRUE)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 20.693, df = 8, p-value = 0.00801
##
## 161 144 138 184 247 188 217 105 116
## (148.50) (147.00) (139.50) (202.50) (268.50) (196.50) (223.50) ( 91.50) ( 82.50)
## [ 1.052] [ 0.061] [ 0.016] [ 1.690] [ 1.722] [ 0.368] [ 0.189] [ 1.992] [13.603]
## < 1.03> <-0.25> <-0.13> <-1.30> <-1.31> <-0.61> <-0.43> < 1.41> < 3.69>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
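The rescale.p = TRUE argument is what makes it possible to pass counts in place of probabilities. The sketch below illustrates this with base R's chisq.test, on the assumption that xchisq.test hands these arguments through to chisq.test; the names `observed`, `expected`, `attempt`, and `res` are illustrative.

```r
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)
expected <- c(148.5, 147.0, 139.5, 202.5, 268.5, 196.5, 223.5, 91.5, 82.5)

# Without rescale.p = TRUE, p must sum to 1, so expected counts are rejected
attempt <- try(chisq.test(x = observed, p = expected), silent = TRUE)
inherits(attempt, "try-error")

# With rescale.p = TRUE, p is divided by its sum before the test is run
res <- chisq.test(x = observed, p = expected, rescale.p = TRUE)
res$statistic

# The rescaled values are the hypothesized proportions
expected / sum(expected)
```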
Option 2: Use expected proportions.
If using expected proportions, the syntax of the command is
xchisq.test(x = Observed Counts,p = Expected Proportions)
For our scenario, the observed counts and expected proportions are both in Table 1. Notice that the rescale.p = TRUE argument is not needed when using expected proportions, because the proportions already sum to 1.
xchisq.test(x=Table1$Observed,p=Table1$Proportion)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 20.693, df = 8, p-value = 0.00801
##
## 161 144 138 184 247 188 217 105 116
## (148.50) (147.00) (139.50) (202.50) (268.50) (196.50) (223.50) ( 91.50) ( 82.50)
## [ 1.052] [ 0.061] [ 0.016] [ 1.690] [ 1.722] [ 0.368] [ 0.189] [ 1.992] [13.603]
## < 1.03> <-0.25> <-0.13> <-1.30> <-1.31> <-0.61> <-0.43> < 1.41> < 3.69>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
The test statistic is \(\chi^2_0 = 20.693\) and the P-value is 0.008.
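The chisq.test command mentioned earlier prints less, but the same pieces can be pulled out of the object it returns. This is a minimal sketch using hard-coded data: the observed counts are copied from the output above, and the proportions beyond the first three are back-computed from the expected counts (an assumption). The names `observed`, `proportion`, and `fit` are illustrative.

```r
observed   <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)
proportion <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055)

# Store the test result instead of only printing it
fit <- chisq.test(x = observed, p = proportion)

fit$statistic            # chi-square test statistic
fit$parameter            # degrees of freedom
fit$p.value              # P-value
fit$expected             # expected counts
round(fit$residuals, 2)  # Pearson residuals: (observed - expected) / sqrt(expected)
```

Squaring the Pearson residuals gives each category's contribution to the test statistic.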
Suppose the expected proportions are uniform, as in Example 3 from Section 12.1. In that case, simply enter the observed data and run xchisq.test without the p argument. When p is omitted, every category is assigned the same expected proportion.
xchisq.test(x = Observed Frequencies)
Day <- c(46,76,83,81,81,80,53)
xchisq.test(x=Day)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 19.568, df = 6, p-value = 0.003305
##
## 46.00 76.00 83.00 81.00 81.00 80.00 53.00
## (71.43) (71.43) (71.43) (71.43) (71.43) (71.43) (71.43)
## [9.05] [0.29] [1.87] [1.28] [1.28] [1.03] [4.75]
## <-3.01> < 0.54> < 1.37> < 1.13> < 1.13> < 1.01> <-2.18>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
The test statistic is \(\chi^2_0 = 19.568\) and the P-value is 0.0033.
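To confirm what leaving out p does, the uniform case can be compared with an explicit vector of equal proportions. The sketch below uses base R's chisq.test, whose default for p is equal probabilities; the names `day`, `fit_default`, and `fit_explicit` are illustrative.

```r
day <- c(46, 76, 83, 81, 81, 80, 53)

# p omitted: each of the 7 categories gets probability 1/7
fit_default <- chisq.test(x = day)

# The same test with the uniform proportions written out
fit_explicit <- chisq.test(x = day, p = rep(1/7, 7))

fit_default$expected  # each expected count is sum(day) / 7
all.equal(unname(fit_default$statistic), unname(fit_explicit$statistic))
fit_default$p.value
```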