Construct Histograms of Discrete Data

First, read in Table 8 (Wendy’s Number of Arrivals).

Table8 <- read.csv("https://sullystats.github.io/Statistics6e/Data/Chapter2/Table8.csv")
Table8

##    Arrivals
## 1         7
## 2         5
## 3         2
## 4         6
## 5         2
## 6         6
## 7         6
## 8         7
## 9         5
## 10        2
## 11        6
## 12        6
## 13        1
## 14        5
## 15        9
## 16        6
## 17       11
## 18        2
## 19        3
## 20        7
## 21        4
## 22        4
## 23        4
## 24        7
## 25        5
## 26        6
## 27        5
## 28        8
## 29        5
## 30        9
## 31        2
## 32        7
## 33        2
## 34        4
## 35        8
## 36        6
## 37        6
## 38        6
## 39        6
## 40        5
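Each bar in a histogram of discrete data shows how often one value occurs, so it helps to look at the frequency distribution first. The sketch below uses base R's `table()` and is only an illustration. It rebuilds `Table8` from the 40 printed values so that it runs without the download; if you already ran `read.csv()` above, you can skip that line.

```r
# Rebuild Table8 from the printed output so the example runs offline
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

# Frequency distribution: the number of times each value occurs
freq <- table(Table8$Arrivals)
freq
##  1  2  3  4  5  6  7  8  9 11
##  1  6  1  4  7 11  5  2  2  1
```

The value 10 never occurs, so a histogram with one bar per value will show a gap there.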

Using Mosaic

If you have not done so already, install the mosaic package.

install.packages("mosaic")

Functions in the mosaic package share a basic structure:

function_name(~variable_name,data = data_file)

To create a histogram with mosaic, we use the gf_histogram() function. Notice that the variable name in Table 8 is “Arrivals”.

library(mosaic)
gf_histogram(~Arrivals,data=Table8,binwidth=1,center=0,col="black",fill="blue",title="Arrivals at Wendy's")

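Note: binwidth = 1 makes the width of each bar 1. center = 0 places the center of one bin at 0, so with a width of 1 every bar is centered over a whole number and covers exactly one value of Arrivals. col sets the outline color of the bars and fill sets their interior color.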
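With `binwidth = 1` and `center = 0`, the bin edges fall halfway between whole numbers (-0.5, 0.5, 1.5, and so on). The base R sketch below imitates that binning with `cut()` to show that each bin then holds a single data value. It illustrates the idea only and does not reproduce how ggplot2 computes bins internally. It also assumes `Table8` is rebuilt from the printed values.

```r
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

# Bin edges halfway between whole numbers: -0.5, 0.5, ..., 11.5
edges <- seq(-0.5, max(Table8$Arrivals) + 0.5, by = 1)

# Assign each observation to a bin. No whole number falls on an edge,
# so the open or closed side of each interval does not matter here
bins <- cut(Table8$Arrivals, breaks = edges)
table(bins)
```

Each of the 12 bins contains only one possible value, so its count matches the frequency of that value. The bins centered at 0 and 10 are empty.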

To display relative frequencies as well, use the gf_refine() function to add a second vertical axis that rescales the counts.

Notice that we are using two functions in one command through the use of the plus (+) sign.

gf_histogram(~Arrivals,data=Table8,binwidth=1,center=0,col="black",fill="blue",title="Arrivals at Wendy's") + 
  gf_refine(scale_y_continuous(sec.axis=sec_axis(trans=~./nrow(Table8),name="Relative Frequency")))

The left vertical axis still shows the count, but the right vertical axis shows the relative frequency (the count divided by nrow(Table8), the 40 observations). A little unusual, but it gets the point across.
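The secondary axis divides each count by `nrow(Table8)`. You can compute the same relative frequencies directly with base R's `prop.table()`. This is a minimal sketch that again assumes `Table8` is rebuilt from the printed values:

```r
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

# Relative frequency = frequency / number of observations
rel_freq <- prop.table(table(Table8$Arrivals))
round(rel_freq, 3)

# Relative frequencies always sum to 1
sum(rel_freq)
```

For example, the value 6 occurs 11 times, so its relative frequency is 11/40 = 0.275. That is the height of the tallest bar on the right-hand axis.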

Using Base R

To create a frequency histogram (counts on the vertical axis), use the hist() function in this general form:

hist(table$column_name, breaks=0:max(table$column_name))

Note: Remember that you can use “main=”, “xlab=”, and “ylab=” to customize the title and axis labels, and “col=” to set the bar color.

hist(Table8$Arrivals, breaks = 0:max(Table8$Arrivals), xlab = "Arrivals", ylab="Frequency", main = "Arrivals at Wendy's", col = '#6897bb')

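Note: col accepts either a color name such as "blue" or a hexadecimal color code such as '#6897bb'. Run colors() in R to list the built-in color names.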

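The argument breaks=0:max(Table8$Arrivals) determines the classes. When breaks is a vector of cut-off points, R uses those points exactly. Here they are 0, 1, 2, ..., 11, which gives 12 cut-off points and therefore 11 classes. When breaks is a single number, such as breaks=5, R treats it only as a suggestion for the number of classes. It passes the number to pretty(), which may choose a different number of cut-off points.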
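You can see which classes a `breaks` vector produces without drawing anything. Call `hist()` with `plot = FALSE` and inspect the `breaks` and `counts` components of the object it returns. This sketch assumes `Table8` is rebuilt from the printed values:

```r
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

# Compute the histogram without plotting it
h <- hist(Table8$Arrivals, breaks = 0:max(Table8$Arrivals), plot = FALSE)

h$breaks   # 12 cut-off points: 0, 1, ..., 11
h$counts   # 11 classes: [0,1], (1,2], ..., (10,11]
```

By default hist() uses right-closed classes (right = TRUE), and it also closes the lowest class on the left (include.lowest = TRUE). As a result, the bar from 1 to 2 counts the 2s, the bar from 2 to 3 counts the 3s, and so on. Each bar sits between two values instead of over one.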

The problem with the histogram above is that the cut-off points are the values of the observations, so each bar sits between two values rather than over one. We would prefer bars with a class width of 1, centered over the observed values. Can we force R to do this? Yes.

In R, we can set the location of the breaks exactly by passing a vector of cut-off points built with the c() function:

breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5)

If you don’t want to enter every cut-off point by hand, use the seq() function.

breaks = c(-0.5,seq(0.5,max(Table8$Arrivals)+0.5,1))

The name seq() is short for “sequence”. Above, seq() creates values starting at 0.5 and ending at the largest value of Arrivals plus 0.5, in increments of 1. Because that sequence starts at 0.5, the first cut-off point, -0.5, is entered separately with c().
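It is worth printing the cut-off points before using them. This sketch, which assumes `Table8` is rebuilt from the printed values, also shows that starting `seq()` at -0.5 produces the same cut-off points in a single call. Entering -0.5 separately is therefore optional.

```r
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

# Cut-off points as written above: -0.5 entered by hand, the rest from seq()
b1 <- c(-0.5, seq(0.5, max(Table8$Arrivals) + 0.5, 1))

# Equivalent: let seq() start at -0.5
b2 <- seq(-0.5, max(Table8$Arrivals) + 0.5, by = 1)

b1
all.equal(b1, b2)  # TRUE
length(b1)         # 13 cut-off points, so 12 classes
```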

To label every bar on the x-axis, call the axis() function after hist(). It adds tick marks and labels to the existing plot:

axis(side=1, at=seq(0,12,1), labels=seq(0,12,1))

In addition, if you want to see the frequency of each class at the top of its bar, add the argument labels = TRUE to the hist() command.

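# max + 1.5 (rather than + 0.5) adds one empty class, from 11.5 to 12.5, so the x-axis extends to 12
hist(Table8$Arrivals, breaks = c(-0.5, seq(0.5, max(Table8$Arrivals) + 1.5, 1)), xlab = "Arrivals", main = "Arrivals at Wendy's", col = '#6897bb', labels = TRUE)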

axis(side=1, at=seq(0,12,1), labels=seq(0,12,1))

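To create a relative frequency histogram, add probability = TRUE to the same command as above and adjust your title accordingly. With this option, hist() plots density, which is the relative frequency of a class divided by its width. Every class here has width 1, so the density equals the relative frequency. The default label for the vertical axis is “Density”, so we add the y-label “Relative Frequency”.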

hist(Table8$Arrivals, breaks = c(-0.5, seq(0.5, max(Table8$Arrivals) + 1.5, 1)), probability = TRUE, xlab = "Arrivals", ylab = "Relative Frequency", main = "Arrivals at Wendy's", col = '#6897bb', labels = TRUE)

axis(side=1, at=seq(0,12,1), labels=seq(0,12,1))
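To confirm that the density heights equal the relative frequencies when each class has width 1, you can compare the two directly. This sketch uses the same breaks as the command above and assumes `Table8` is rebuilt from the printed values:

```r
Table8 <- data.frame(Arrivals = c(7, 5, 2, 6, 2, 6, 6, 7, 5, 2, 6, 6, 1, 5, 9, 6, 11, 2, 3, 7,
                                  4, 4, 4, 7, 5, 6, 5, 8, 5, 9, 2, 7, 2, 4, 8, 6, 6, 6, 6, 5))

h <- hist(Table8$Arrivals,
          breaks = c(-0.5, seq(0.5, max(Table8$Arrivals) + 1.5, 1)),
          plot = FALSE)

# Relative frequency of each class
rel_freq <- h$counts / sum(h$counts)

# Density = relative frequency / class width
all.equal(h$density, rel_freq / diff(h$breaks))  # TRUE

# Every class width is 1, so density and relative frequency coincide
all.equal(h$density, rel_freq)                   # TRUE
```

With unequal class widths the two would differ, and the “Relative Frequency” label would no longer be accurate.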