Sampling Distribution of the Sample Proportion

First, make sure the mosaic package is installed. This only needs to be done once.

install.packages('mosaic')

Now, we will load the population data: the fares charged for ALL Chicago taxi rides on a single day.

Taxi <- read.csv("https://sullystats.github.io/Statistics6e/Data/ChicagoTaxi.csv")
head(Taxi,n=4)

##   Trip  Fare Payment
## 1  300  6.50    Cash
## 2 1281 42.25  Credit
## 3  780 10.75    Cash
## 4  900 17.00  Credit

We are going to focus on the variable Payment, which records how the fare is paid: cash or credit.

Let’s look at the distribution of this variable and get some summary statistics.

library(mosaic)
options(digits=3)
tally(~Payment,format="proportion", data=Taxi)

## Payment
##   Cash Credit 
##  0.523  0.477

The population proportion of fares paid with cash is 0.523.
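
To make the calculation behind a proportion concrete, here is a minimal base R sketch. It assumes a toy data frame, TaxiHead (a hypothetical name), built from only the four rides printed by head() above, so its proportions describe those four rows and not the full population.

```r
# Toy data frame holding only the four rides printed by head(Taxi, n = 4).
# TaxiHead is a hypothetical name used for illustration.
TaxiHead <- data.frame(
  Trip    = c(300, 1281, 780, 900),
  Fare    = c(6.50, 42.25, 10.75, 17.00),
  Payment = c("Cash", "Credit", "Cash", "Credit")
)

# Count each payment type, then divide by the total number of rides
counts <- table(TaxiHead$Payment)
props  <- prop.table(counts)
print(props)

# Shortcut for a single category: the mean of a logical vector
# is the fraction of TRUE values
p_cash <- mean(TaxiHead$Payment == "Cash")
print(p_cash)
```

Two of the four rides shown were paid with cash, so both approaches give a proportion of 0.5 for this toy data.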

Now, let’s take a random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.

tally(~Payment,format="proportion",data=sample(Taxi,50))   # Find the sample proportion of a sample of size 50

## Payment
##   Cash Credit 
##   0.52   0.48

Let’s take another random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.

tally(~Payment,format="proportion",data=sample(Taxi,50))   # Find the sample proportion of a sample of size 50

## Payment
##   Cash Credit 
##   0.56   0.44
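
The sample-to-sample variation seen above can also be sketched in base R without downloading the data. The block below assumes a hypothetical population of 10,000 rides in which 52.3% were paid with cash. The population size, the seed, and the object names are illustrative assumptions, not values taken from the Chicago data.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 rides, 5,230 of them paid with cash
population <- rep(c("Cash", "Credit"), times = c(5230, 4770))

# One sample proportion: draw 50 rides without replacement,
# then take the fraction that were paid with cash
one_sample <- sample(population, size = 50)
p_hat <- mean(one_sample == "Cash")
print(p_hat)

# Ten more samples of size 50, each reduced to its sample proportion
p_hats <- replicate(10, mean(sample(population, size = 50) == "Cash"))
print(p_hats)
```

Each run of sample() draws a different set of 50 rides, so the printed proportions differ from one another while clustering around 0.523.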

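Now, let’s repeat this process 5000 times, recording the sample proportions from each random sample of n = 50 rides, to build the sampling distribution of the sample proportion.

SamplingDist <- bind_rows(do(5000)*c(prop = tally(~Payment,format="proportion",data=sample(Taxi,50))))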
head(SamplingDist,n=4)

##   prop.Cash prop.Credit
## 1      0.44        0.56
## 2      0.44        0.56
## 3      0.40        0.60
## 4      0.58        0.42

Notice that the sample proportion of cash payments (prop.Cash) varies from sample to sample. Now, let’s look at the shape, center, and spread of the sampling distribution of \(\hat{p}\).

gf_histogram(~prop.Cash, data=SamplingDist, binwidth=0.02,
             color="black", fill="blue",
             xlab="Sample Proportion of Cash Payments",
             ylab="Frequency",
             title="Distribution of Sample Proportion of Cash Payments for Taxi Rides in Chicago")

mean(~prop.Cash,data=SamplingDist)

## [1] 0.524

sd(~prop.Cash,data=SamplingDist)

## [1] 0.0725

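The shape of the distribution of \(\hat{p}\) is approximately normal because \(np(1 - p) \geq 10\). The mean and standard deviation of the distribution of \(\hat{p}\) are \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\). This agrees with the simulation: the mean of the 5000 sample proportions, 0.524, is very close to the population proportion p = 0.523.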
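
As a quick check of these formulas, the following sketch plugs in the values used in this activity: p = 0.523 from the population tally and n = 50 from the sampling step. The object names are illustrative.

```r
p <- 0.523   # population proportion of cash fares (from the tally above)
n <- 50      # sample size used for each simulated sample

# Normality condition: n * p * (1 - p) should be at least 10
normal_check <- n * p * (1 - p)
print(normal_check >= 10)

# Theoretical mean and standard deviation of the sample proportion
mu_p_hat    <- p
sigma_p_hat <- sqrt(p * (1 - p) / n)
print(mu_p_hat)
print(round(sigma_p_hat, 4))
```

The condition is met, since 50(0.523)(0.477) is about 12.5, and the theoretical standard deviation is about 0.0706. The simulated standard deviation of 0.0725 reported above is close to this value; some difference is expected because the simulation is based on a finite number of random samples.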