Goodness-of-Fit Test

Install the mosaic package, if necessary.

install.packages("mosaic")

Finding Expected Counts

Load the data from Table 1 in Section 12.1 into R.

Table1 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter12/Table1.csv")
head(Table1,n=3)

##            Income Observed Proportion
## 1   Under $15,000      161      0.099
## 2 $15,000-$24,999      144      0.098
## 3 $25,000-$34,999      138      0.093

Now, let’s find the expected counts. Recall that the expected count for category i is the sample size n times the hypothesized proportion for that category:

\(E_i = np_i\)

library(mosaic)
n <- sum(~Observed,data=Table1)
Expected <- n*Table1$Proportion
Expected

## [1] 148.5 147.0 139.5 202.5 268.5 196.5 223.5  91.5  82.5
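The same arithmetic can be sketched in base R without downloading the file. The vector names `observed` and `proportion` are hypothetical stand-ins for the columns of Table1. The counts are copied from the output shown in this guide, and the proportions for rows 4 through 9 are an assumption: they are back-computed from the printed expected counts (expected count divided by 1500), because only the first three rows of the table are displayed.

```r
# Observed counts for the nine income categories (copied from this guide's output)
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)

# Hypothesized proportions; rows 4-9 are back-computed from the printed
# expected counts (an assumption, not read from the CSV file)
proportion <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055)

n <- sum(observed)            # total sample size
expected <- n * proportion    # E_i = n * p_i for every category at once
expected
```

Because R multiplies a number by a vector element by element, one line produces all nine expected counts. The expected counts always add up to n when the proportions add up to 1.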

Perform a Goodness-of-Fit Test (Unequal Proportions)

Using Direct Computation

The test statistic in a chi-square Goodness-of-Fit Test is

\(\chi^2_0 = \sum{\frac{(O_i - E_i)^2}{E_i}}\)

where \(O_i\) is the observed count and \(E_i\) is the expected count for category i.

Teststat <- sum((Table1$Observed - Expected)^2/Expected)
Teststat

## [1] 20.69282

The P-value is the area under the chi-square distribution with k - 1 degrees of freedom to the right of the test statistic, where k is the number of categories of data. Here k = 9, so there are 8 degrees of freedom.

1 - pchisq(Teststat,df=8)

## [1] 0.008009763

The P-value is 0.008.
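As a rough cross-check, the right-tail area can also be requested directly with the `lower.tail = FALSE` argument of `pchisq`, and the test statistic can be compared with a critical value from `qchisq`. The sketch below types in the test statistic printed above. The level of significance of 0.05 is an assumption made only for illustration; it is not stated in this guide.

```r
teststat <- 20.69282   # test statistic printed above
k <- 9                 # number of income categories
alpha <- 0.05          # assumed level of significance (illustrative only)

# Area to the right of the test statistic; same value as 1 - pchisq(...)
p_value <- pchisq(teststat, df = k - 1, lower.tail = FALSE)
p_value

# Critical value that cuts off an area of alpha in the right tail
critical <- qchisq(1 - alpha, df = k - 1)
critical

# TRUE means the test statistic falls in the critical region
teststat > critical
```

Under the assumed 0.05 level, both approaches lead to the same decision: the P-value is below the level of significance and the test statistic exceeds the critical value.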

Using a Direct Command

The direct commands for conducting the chi-square Goodness-of-Fit Test are chisq.test (base R) and xchisq.test (mosaic). The advantage of xchisq.test is that its output includes the expected counts, each category's contribution to the test statistic, and the Pearson residuals. There are two ways to supply the expected values to xchisq.test.

Option 1: Use expected counts.

If using expected counts, the syntax of the command is

xchisq.test(x = Observed Counts, p = Expected Counts, rescale.p = TRUE)

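For our scenario, the observed counts are in Table1, while the expected counts are in their own vector, Expected. Because the expected counts do not sum to 1, rescale.p = TRUE tells R to rescale them to proportions.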

xchisq.test(x=Table1$Observed,p=Expected,rescale.p = TRUE)

## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 20.693, df = 8, p-value = 0.00801
## 
##   161      144      138      184      247      188      217      105      116   
## (148.50) (147.00) (139.50) (202.50) (268.50) (196.50) (223.50) ( 91.50) ( 82.50)
## [ 1.052] [ 0.061] [ 0.016] [ 1.690] [ 1.722] [ 0.368] [ 0.189] [ 1.992] [13.603]
## < 1.03>  <-0.25>  <-0.13>  <-1.30>  <-1.31>  <-0.61>  <-0.43>  < 1.41>  < 3.69> 
##                  
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>
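To see what rescale.p = TRUE does, the sketch below uses the base R command chisq.test, which accepts the same `x`, `p`, and `rescale.p` arguments. The vector names `observed` and `expected` are hypothetical stand-ins, with values copied from the output above. Rescaling simply divides each entry of `p` by the sum of `p`.

```r
# Values copied from the output above
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)
expected <- c(148.5, 147.0, 139.5, 202.5, 268.5, 196.5, 223.5, 91.5, 82.5)

# What rescale.p = TRUE does internally: convert counts to proportions
rescaled <- expected / sum(expected)
rescaled

# Without rescaling, chisq.test stops because p does not sum to 1
no_rescale <- try(chisq.test(x = observed, p = expected), silent = TRUE)
inherits(no_rescale, "try-error")

# With rescaling, the test runs and matches the output above
fit <- chisq.test(x = observed, p = expected, rescale.p = TRUE)
fit$statistic
```

The rescaled values are the hypothesized proportions, so supplying expected counts with rescale.p = TRUE is equivalent to supplying the proportions themselves.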

Option 2: Use expected proportions.

If using expected proportions, the syntax of the command is

xchisq.test(x = Observed Counts, p = Expected Proportions)

For our scenario, the observed counts and expected proportions are both in Table1. Notice the rescale.p = TRUE argument is not needed when using expected proportions, because they already sum to 1.

xchisq.test(x=Table1$Observed,p=Table1$Proportion)

## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 20.693, df = 8, p-value = 0.00801
## 
##   161      144      138      184      247      188      217      105      116   
## (148.50) (147.00) (139.50) (202.50) (268.50) (196.50) (223.50) ( 91.50) ( 82.50)
## [ 1.052] [ 0.061] [ 0.016] [ 1.690] [ 1.722] [ 0.368] [ 0.189] [ 1.992] [13.603]
## < 1.03>  <-0.25>  <-0.13>  <-1.30>  <-1.31>  <-0.61>  <-0.43>  < 1.41>  < 3.69> 
##                  
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

The test statistic is \(\chi^2_0 = 20.693\) and the P-value is 0.008.
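If only base R is available, a similar breakdown can be pulled from the object that chisq.test returns. This is a minimal sketch: the vector names are hypothetical, the counts are copied from the output above, and the proportions for rows 4 through 9 are back-computed from the printed expected counts (an assumption, since only three rows of the table are displayed).

```r
observed <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)
proportion <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055)

fit <- chisq.test(x = observed, p = proportion)

fit$expected                      # expected counts, n * p_i
fit$residuals                     # Pearson residuals, (O - E) / sqrt(E)
contribution <- fit$residuals^2   # each category's contribution to the statistic
round(contribution, 3)

# The category that contributes most to the test statistic
which.max(contribution)
```

The squared Pearson residuals add up to the test statistic, which makes it easy to see which categories drive a small P-value. Here the ninth category contributes the most.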

Perform a Goodness-of-Fit Test (Equal Proportions)

Suppose the expected proportions are equal, as in Example 3 from Section 12.1. Simply enter the observed counts and run xchisq.test without the p argument; by default, the command treats every category as equally likely. The syntax is

xchisq.test(x = Observed Frequencies)

Day <- c(46,76,83,81,81,80,53)
xchisq.test(x=Day)

## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 19.568, df = 6, p-value = 0.003305
## 
##  46.00    76.00    83.00    81.00    81.00    80.00    53.00  
## (71.43)  (71.43)  (71.43)  (71.43)  (71.43)  (71.43)  (71.43) 
##  [9.05]   [0.29]   [1.87]   [1.28]   [1.28]   [1.03]   [4.75] 
## <-3.01>  < 0.54>  < 1.37>  < 1.13>  < 1.13>  < 1.01>  <-2.18> 
##              
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

The test statistic is \(\chi^2_0 = 19.568\) and the P-value is 0.0033.
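The equal-proportions result can also be checked by direct computation, mirroring the approach used for unequal proportions. In this sketch the names `day`, `k`, and `expected` are chosen only for illustration. With k equally likely categories, every expected count is n/k.

```r
day <- c(46, 76, 83, 81, 81, 80, 53)   # observed counts from the example above

k <- length(day)                 # number of categories
expected <- rep(sum(day) / k, k) # equal proportions: each E_i = n / k

teststat <- sum((day - expected)^2 / expected)
teststat

# Right-tail area with k - 1 degrees of freedom
p_value <- pchisq(teststat, df = k - 1, lower.tail = FALSE)
p_value
```

The results agree with the xchisq.test output: a test statistic of about 19.568 with 6 degrees of freedom.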