Homogeneity of Proportions

Install the mosaic package, if necessary.

install.packages("mosaic")

Test for Homogeneity of Proportions: Summarized Data

Enter the data from Table 12 in Section 12.2 using the do() command, which repeats each one-row data frame the stated number of times. Notice that the row variable is Pain and the column variable is Group. Name the data frame Table12_df.

Use the tally command to convert Table12_df into a contingency table named Table12.

library(mosaic)
Table12_df <- rbind(
  do(51)*data.frame(Pain="Yes",Group="Zocor"),
  do(5)*data.frame(Pain="Yes",Group="Placebo"),
  do(16)*data.frame(Pain="Yes",Group="Cholestyramine"),
  do(1532)*data.frame(Pain="No",Group="Zocor"),
  do(152)*data.frame(Pain="No",Group="Placebo"),
  do(163)*data.frame(Pain="No",Group="Cholestyramine")
 )
Table12 <- tally(~Pain+Group,data=Table12_df)
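
As a cross-check on the data entry, the same counts can be typed directly into a matrix using base R only. This is a minimal sketch: the object name counts is an illustrative choice, not part of the original instructions, and the values are copied from the do() calls above.

```r
# Hypothetical alternative: enter the Table 12 counts directly (base R only).
counts <- matrix(
  c(163, 152, 1532,   # Pain = No
     16,   5,   51),  # Pain = Yes
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Pain  = c("No", "Yes"),
    Group = c("Cholestyramine", "Placebo", "Zocor")
  )
)
counts <- as.table(counts)

# Row and column totals, useful for checking the entries against the table.
addmargins(counts)
```

The cell layout matches what tally produces, because tally orders the levels of Pain and Group alphabetically.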

Use xchisq.test on the contingency table Table12. Recall that xchisq.test gives the expected counts, each cell's contribution to the \(\chi^2\) test statistic, and the residuals.

xchisq.test(Table12)

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 14.707, df = 2, p-value = 0.0006405
## 
##    163      152     1532  
## ( 172.28) ( 151.11) (1523.61)
## [ 0.5003] [ 0.0052] [ 0.0462]
## <-0.707> < 0.072> < 0.215>
##      
##     16        5       51  
## (   6.72) (   5.89) (  59.39)
## [12.8339] [ 0.1346] [ 1.1862]
## < 3.582> <-0.367> <-1.089>
##      
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

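The test statistic is \(\chi^2_0 = 14.707\) and the P-value is 0.0006. Because the P-value is so small, there is strong evidence that the proportion of patients reporting abdominal pain is not the same for all three groups.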
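
To see where these numbers come from, the expected counts, cell contributions, residuals, and P-value can be reproduced by hand in base R. This is a sketch under the assumption that the observed counts are exactly those printed in the output; the variable names are illustrative.

```r
# Observed counts, copied from the xchisq.test output above.
observed <- matrix(
  c(163, 152, 1532,
     16,   5,   51),
  nrow = 2, byrow = TRUE,
  dimnames = list(Pain  = c("No", "Yes"),
                  Group = c("Cholestyramine", "Placebo", "Zocor"))
)

n <- sum(observed)
# Expected count = (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / n
# Each cell's contribution to the test statistic.
contribution <- (observed - expected)^2 / expected
# Pearson residual = (observed - expected) / sqrt(expected).
residual <- (observed - expected) / sqrt(expected)

chi_sq  <- sum(contribution)
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)
round(contribution, 4)
round(residual, 3)
c(statistic = chi_sq, df = df, p_value = p_value)
```

The largest contribution comes from the Yes/Cholestyramine cell, which is the cell driving the small P-value.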

Conditional Distribution and Bar Graph

Now, let’s construct a conditional distribution from the Table12_df data frame. Use the tally command. Recall the syntax:

tally(~response variable|explanatory variable,margins=FALSE,format="proportion",data=data_frame)

We are treating Group as the explanatory variable (the column variable), so use

Table12_condition <- tally(~Pain|Group,margins=FALSE,format="proportion",data=Table12_df)
Table12_condition

##      Group
## Pain  Cholestyramine    Placebo      Zocor
##   No      0.91061453 0.96815287 0.96778269
##   Yes     0.08938547 0.03184713 0.03221731

A higher proportion of the cholestyramine patients experience abdominal pain (0.089) than of the placebo or Zocor patients (about 0.032 each).
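
The same conditional distribution can be checked from the summarized counts with base R's prop.table. This is a sketch, not the tutorial's method: the matrix observed is typed in by hand from Table 12, and its name is an illustrative choice.

```r
# Observed counts from Table 12 (rows = Pain, columns = Group).
observed <- matrix(
  c(163, 152, 1532,
     16,   5,   51),
  nrow = 2, byrow = TRUE,
  dimnames = list(Pain  = c("No", "Yes"),
                  Group = c("Cholestyramine", "Placebo", "Zocor"))
)

# margin = 2 divides each column by its column total,
# so the proportions within each Group sum to 1.
conditional <- prop.table(observed, margin = 2)
round(conditional, 4)
```

Conditioning on the columns matches the formula ~Pain|Group, where Group is the explanatory variable.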

Now that we have the conditional distribution, use the barplot command. The syntax is as follows:

barplot(df_name,beside=TRUE)

Note: cex.names scales the font size of the group labels (values below 1 shrink them). legend = TRUE adds a legend. ylim=c(0,1.2) extends the y-axis so the legend does not overlap the bars. Experiment with the limits until you are happy with the graph.

barplot(Table12_condition, beside = TRUE, cex.names = .7,legend=TRUE, ylim=c(0,1.2),main="Patients Reporting Abdominal Pain by Treatment", xlab = "Group", ylab = "Relative Frequency", col = c('#6897bb', '#c06723'))

Test for Homogeneity of Proportions: Raw Data

Is there an association between political philosophy and whether one texts while at a red light? Open the SullivanStatsSurveyII data file to answer this question.

Survey <- read.csv("https://sullystats.github.io/Statistics6e/Data/SullivanStatsSurveyII.csv")
head(Survey,n=3)

##   Response_id Gender Age    Education Tax.Rate GenderIncomeInequality
## 1      290408 Female  19 Some College       10                     No
## 2      290410 Female  18 Some College       10                    Yes
## 3      290412 Female  21 Some College       10                    Yes
##   MinWageOpinion MinWageAmount Political.Philosophy Text RetirementDollars
## 1            Yes          10.0             Moderate  Yes           1200000
## 2            Yes           9.0         Conservative   No            350000
## 3            Yes           9.5              Liberal  Yes           1000000
##   RetirementAge DeathAge
## 1            65       90
## 2            61      105
## 3            60       90

Now, let’s build a contingency table using the variables Political.Philosophy and Text.

ContTable <- tally(~Political.Philosophy+Text,data=Survey)
ContTable

##                     Text
## Political.Philosophy No Yes
##         Conservative 18  19
##         Liberal      18  11
##         Moderate     39  29

xchisq.test(ContTable)

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 1.2953, df = 2, p-value = 0.5233
## 
##    18       19   
## (20.71)  (16.29) 
## [0.354]  [0.450] 
## <-0.60>  < 0.67> 
##    
##    18       11   
## (16.23)  (12.77) 
## [0.193]  [0.245] 
## < 0.44>  <-0.49> 
##    
##    39       29   
## (38.06)  (29.94) 
## [0.023]  [0.030] 
## < 0.15>  <-0.17> 
##    
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>
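
To turn this output into a decision, compare the P-value with a significance level. The sketch below retypes the counts from ContTable and uses base R's chisq.test; the 0.05 level and the variable names are assumptions made for illustration, not values given in the source.

```r
# Counts copied from ContTable above (rows = philosophy, columns = Text).
text_counts <- matrix(
  c(18, 19,
    18, 11,
    39, 29),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Political.Philosophy = c("Conservative", "Liberal", "Moderate"),
    Text = c("No", "Yes")
  )
)

result <- chisq.test(text_counts)
alpha  <- 0.05  # assumed significance level

result$statistic              # chi-square test statistic
result$p.value                # P-value
all(result$expected >= 5)     # check that every expected count is at least 5
result$p.value < alpha        # TRUE would mean "reject the null hypothesis"
```

With a P-value of about 0.52, the null hypothesis is not rejected at the assumed 0.05 level: the sample does not provide sufficient evidence that the proportion who text at red lights differs by political philosophy.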

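Finally, construct the conditional distribution of Text by Political.Philosophy and draw a side-by-side bar graph.

Survey_condition <- tally(~Text|Political.Philosophy,margins=FALSE,format="proportion",data=Survey)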
barplot(Survey_condition, beside = TRUE, cex.names = .7,legend=TRUE, ylim=c(0,1.2),main="Do You Text While at Red Lights", xlab = "Political Philosophy", ylab = "Relative Frequency", col = c('#6897bb', '#c06723'))