Mean and Standard Deviation from Grouped Data

Load Data

Option 1: Use Excel or Google Sheets to create a CSV file. Put the class midpoints, $x_i$, in the first column and the frequencies, $f_i$, in the second column. Be sure to title each column, because read.csv() treats the first row as the column names by default.

We will use the data in Table 14 of Section 3.3.

Table14 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter3/Table14.csv")
head(Table14,n=4)

##   Midpoint Frequency
## 1     62.5         1
## 2     87.5         0
## 3    112.5         7
## 4    137.5        10
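The same pattern works for a CSV file saved on your own computer. The following is a minimal sketch, assuming a temporary file stands in for the spreadsheet export; the names `csv_path` and `local_table` are hypothetical, and only the first four rows of Table 14 are written.

```r
# A temporary file stands in for a CSV exported from Excel or Google Sheets
# (csv_path and local_table are hypothetical names for illustration).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Midpoint,Frequency",
             "62.5,1",
             "87.5,0",
             "112.5,7",
             "137.5,10"),
           csv_path)

# read.csv() uses the first row as column names by default
local_table <- read.csv(csv_path)
head(local_table, n = 4)
```

In practice, replace `csv_path` with the path to your own file.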

Option 2: Enter the data directly into R and create a data frame.

Midpoint <- c(62.5,87.5,112.5,137.5,162.5,187.5,212.5,237.5,262.5,287.5,312.5)
Freq <- c(1,0,7,10,5,4,13,4,5,0,1)
Table14a <- data.frame(Midpoint,Freq)
head(Table14a,n=4)

##   Midpoint Freq
## 1     62.5    1
## 2     87.5    0
## 3    112.5    7
## 4    137.5   10
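When data are typed in by hand, a quick check can catch entry errors before any statistics are computed. This sketch, using only base R, confirms that there is one frequency per midpoint, that the midpoints are evenly spaced, and that the frequencies add up to the sample size; the name `n` is introduced here for illustration.

```r
# Re-enter the Table 14 data so this block runs on its own
Midpoint <- c(62.5,87.5,112.5,137.5,162.5,187.5,212.5,237.5,262.5,287.5,312.5)
Freq <- c(1,0,7,10,5,4,13,4,5,0,1)

# One frequency per class midpoint
stopifnot(length(Midpoint) == length(Freq))

# Consecutive midpoints should differ by the class width (25 for these data)
unique(diff(Midpoint))

# The sample size is the sum of the frequencies
n <- sum(Freq)
n
```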

To find the mean and standard deviation from grouped data, we will use a package called Weighted.Desc.Stat. The package needs to be installed only once.

install.packages("Weighted.Desc.Stat")

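Now, we will load the Weighted.Desc.Stat package with library() and find the mean and standard deviation of the data in Table 14. In both w.mean() and w.sd(), the first argument is the vector of class midpoints and the second is the vector of frequencies (weights).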

library(Weighted.Desc.Stat)
w.mean(Table14$Midpoint,Table14$Frequency)   # Find the mean

## [1] 182.5

w.sd(Table14$Midpoint,Table14$Frequency)    # Find the standard deviation

## [1] 53.61903

The weighted mean is $\overline{x}_w = \$182.50$.
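The two results above can be reproduced in base R, which makes the underlying formulas explicit: the mean is $\sum x_i f_i / \sum f_i$, and the standard deviation divides the weighted squared deviations by $\sum f_i$. This is a sketch of the arithmetic, not the package's source code; the names `grouped_mean` and `grouped_sd_pop` are hypothetical.

```r
# Table 14 data, re-entered so this block runs on its own
Midpoint <- c(62.5,87.5,112.5,137.5,162.5,187.5,212.5,237.5,262.5,287.5,312.5)
Freq <- c(1,0,7,10,5,4,13,4,5,0,1)

n <- sum(Freq)                                   # total number of observations

# Weighted mean: sum of (midpoint * frequency) divided by n
grouped_mean <- sum(Midpoint * Freq) / n

# Standard deviation with n in the denominator
grouped_sd_pop <- sqrt(sum(Freq * (Midpoint - grouped_mean)^2) / n)

grouped_mean
grouped_sd_pop

# Base R's weighted.mean() gives the same mean
weighted.mean(Midpoint, Freq)
```

Both values should agree with the w.mean() and w.sd() output shown above.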

The weighted standard deviation returned by w.sd() is a population standard deviation. Because the data in Table 14 are sample data, we need to multiply the weighted standard deviation by $\sqrt{\frac{n}{n - 1}}$, where $n$ is the sum of the frequencies. In this problem, $n = 50$, so multiply the weighted standard deviation by $\sqrt{\frac{50}{49}}$.
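The correction factor comes from comparing the two denominators. Writing $\sigma_w$ for the population-style value and $s$ for the sample standard deviation (both symbols are notation introduced here, not taken from the package documentation), with $n = \sum f_i$:

$$
s = \sqrt{\frac{\sum f_i (x_i - \overline{x}_w)^2}{n - 1}}
  = \sqrt{\frac{n}{n - 1}} \cdot \sqrt{\frac{\sum f_i (x_i - \overline{x}_w)^2}{n}}
  = \sqrt{\frac{n}{n - 1}} \cdot \sigma_w
$$

For Table 14, $\sqrt{50/49} \approx 1.0102$, so the correction increases the standard deviation by about 1%.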

w.sd(Table14$Midpoint,Table14$Frequency)*sqrt(50/49)  # Find the sample standard deviation.

## [1] 54.1634

So, the weighted sample standard deviation is \$54.16.
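As a cross-check that needs no extra package, the grouped data can be expanded so that each midpoint is repeated as many times as its frequency. Base R's sd() uses the $n - 1$ denominator, so it returns the sample standard deviation directly. This is a sketch of an alternative route, not the method used above; the name `expanded` is hypothetical.

```r
# Table 14 data, re-entered so this block runs on its own
Midpoint <- c(62.5,87.5,112.5,137.5,162.5,187.5,212.5,237.5,262.5,287.5,312.5)
Freq <- c(1,0,7,10,5,4,13,4,5,0,1)

# Repeat each midpoint by its frequency; classes with frequency 0 drop out
expanded <- rep(Midpoint, times = Freq)

length(expanded)   # number of observations, equal to sum(Freq)
mean(expanded)     # same as the weighted mean
sd(expanded)       # sample standard deviation (n - 1 denominator)
```

This approach is convenient for small tables. For very large frequencies, the weighted formulas avoid building a long vector.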