First, be sure the package mosaic is installed.
install.packages('mosaic')
Now, we will load the population data: the fares charged for ALL Chicago taxi rides on a single day.
Taxi <- read.csv("https://sullystats.github.io/Statistics6e/Data/ChicagoTaxi.csv")
head(Taxi,n=4)
##   Trip  Fare Payment
## 1  300  6.50    Cash
## 2 1281 42.25  Credit
## 3  780 10.75    Cash
## 4  900 17.00  Credit
We are going to focus on the variable Payment, which records how the fare is paid: cash or credit.
Let’s look at the distribution of this variable and get some summary statistics.
library(mosaic)
options(digits=3)
tally(~Payment,format="proportion", data=Taxi)
## Payment
## Cash Credit
## 0.523 0.477
The population proportion of fares paid with cash is 0.523.
Now, let’s take a random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.
tally(~Payment,format="proportion",data=sample(Taxi,50)) # Find the sample proportion of a sample of size 50
## Payment
## Cash Credit
## 0.52 0.48
Let’s take another random sample of n = 50 rides from this data set and determine the sample proportion of fares paid with cash.
tally(~Payment,format="proportion",data=sample(Taxi,50)) # Find the sample proportion of a sample of size 50
## Payment
## Cash Credit
## 0.56 0.44
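Because each call to sample() draws a new random sample, the proportions you obtain will differ from those shown above. The following is a minimal base-R sketch of the same idea, made repeatable with a seed. It does not use the Taxi data: the vector payment is a hypothetical stand-in built only to mirror the population proportions reported earlier, and the seed value is arbitrary.

```r
# Hypothetical stand-in for the Payment column: 523 cash and 477 credit fares,
# chosen only to mirror the population proportions (0.523 and 0.477).
payment <- rep(c("Cash", "Credit"), times = c(523, 477))

set.seed(2024)  # arbitrary seed so the same sample is drawn on every run

# Draw 50 rides without replacement and compute the proportion paid in cash.
draw <- sample(payment, size = 50)
p_hat <- mean(draw == "Cash")
p_hat
```

Calling set.seed() with the same value immediately before sampling reproduces the same sample, which is convenient when results need to match a written report.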
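Two samples are not enough to see how the sample proportion behaves. Now, let’s draw 5000 random samples of n = 50 rides and record the sample proportions from each.
SamplingDist <- bind_rows(do(5000)*c(prop = tally(~Payment,format="proportion",data=sample(Taxi,50))))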
head(SamplingDist,n=4)
##   prop.Cash prop.Credit
## 1      0.44        0.56
## 2      0.44        0.56
## 3      0.40        0.60
## 4      0.58        0.42
Notice that the sample proportion of cash payments (prop.Cash) varies from sample to sample. Now, let’s look at the shape, center, and spread of the sampling distribution of \(\hat{p}\).
gf_histogram(~prop.Cash,data=SamplingDist,binwidth=0.02,color="black",fill="blue",xlab="Sample Proportion of Cash Payments",ylab="Frequency",title="Distribution of Sample Proportion of Cash Payments for Taxi Rides in Chicago")
mean(~prop.Cash,data=SamplingDist)
## [1] 0.524
sd(~prop.Cash,data=SamplingDist)
## [1] 0.0725
The shape of the distribution of \(\hat{p}\) is approximately normal because \(np(1 - p) \geq 10\). The mean and standard deviation of the distribution of \(\hat{p}\) are \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\), respectively.
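To check these formulas against the simulation, the theoretical values can be computed directly. This is a short sketch in base R that takes p = 0.523 from the population tally and n = 50 from the sample size used above. The names p, n, condition, and sigma_p_hat are chosen here for illustration.

```r
p <- 0.523   # population proportion of cash fares, from the tally above
n <- 50      # size of each random sample

# Normality condition: this value should be at least 10.
condition <- n * p * (1 - p)
condition

# Theoretical standard deviation of the sample proportion.
sigma_p_hat <- sqrt(p * (1 - p) / n)
sigma_p_hat
```

The condition evaluates to about 12.5, which exceeds 10, and the theoretical standard deviation is about 0.0706. Both theoretical values are close to the simulated results: a mean of 0.524 against p = 0.523, and a standard deviation of 0.0725 against 0.0706.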