By fishR Blog
The Problem – Binning for Length Frequency Histograms
Fisheries scientists often make histograms of fish lengths. For example, the code below uses hist() (actually hist.formula()) from the FSA package to construct a histogram of total lengths for Chinook Salmon from Argentinian waters.
library(FSA)
data(ChinookArg)
hist(~tl,data=ChinookArg,xlab="Total Length (cm)")
The default bins for these histograms are rarely what the fisheries scientist desires. For example, the 10-cm wide bins shown above resulted in a histogram that lacked detail. Thus, the fisheries scientist may want to construct a histogram with 5-cm wide bins to reveal more detail.
As described in the Introductory Fisheries Analysis with R book, specific bin widths may be set by creating a sequence of numbers that represents the lower value of each bin. This sequence is most easily created with seq(), which takes the minimum value, the maximum value, and a step value (which will be the bin width) as its three arguments. For example, the following constructs a histogram with 5-cm bin widths.
hist(~tl,data=ChinookArg,xlab="Total Length (cm)",breaks=seq(15,125,5))
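To see what is being passed to breaks=, it may help to look at the sequence on its own. The short sketch below only inspects the vector returned by seq(); the name brks is introduced here for illustration and is not from the original post.

```r
# The same sequence that is passed to breaks= above
brks <- seq(15, 125, 5)

# First few break points (the lower bounds of the first bins)
head(brks)

# Number of break points, and the number of bins they define
n_breaks <- length(brks)
n_bins <- n_breaks - 1

# Every bin is exactly 5 cm wide
all(diff(brks) == 5)
```

Note that the number of bins is always one fewer than the number of break points, because each bin needs a lower and an upper boundary.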
Defining a sequence for bins is flexible, but it requires the user to identify the minimum and maximum values in the data. This is inefficient because it requires additional code or, more usually, constructing the plot once without any breaks= argument. In addition, the breaks are then “hard-wired,” which de-generalizes the code and leads to more inefficiency.
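As a rough illustration of the "additional code" that avoiding hard-wired breaks would take, the sketch below computes the breaks from the data in base R. It is only one possible approach and not necessarily the one this post goes on to develop. The helper make_breaks() and the vector tl are hypothetical; tl holds made-up lengths standing in for the tl variable in ChinookArg.

```r
# Hypothetical total lengths (cm); a stand-in for ChinookArg$tl
tl <- c(17.2, 23.5, 48.0, 66.1, 88.4, 101.3, 120.0)

# Hypothetical helper: breaks of width w that span all of x.
# The lower end is rounded down to a multiple of w; the upper end is
# pushed to the next multiple of w strictly above the maximum, so the
# largest fish always falls inside the last bin.
make_breaks <- function(x, w) {
  lo <- floor(min(x, na.rm = TRUE) / w) * w
  hi <- (floor(max(x, na.rm = TRUE) / w) + 1) * w
  seq(lo, hi, by = w)
}

brks <- make_breaks(tl, w = 5)

# Base R hist() on a plain vector; right=FALSE makes bins closed on the left,
# and plot=FALSE returns the bin counts without drawing
h <- hist(tl, breaks = brks, right = FALSE, plot = FALSE)
h$counts
```

Because the breaks are derived from the data, the same code could be reused for another sample without first plotting it to find the minimum and maximum.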
As an example, imagine having a markdown template that will be used to construct a length frequency histogram for Chinook Salmon. Suppose that this template will be used to construct histograms for Chinook Salmon from different water bodies, years, etc. Chances are that you will always want 5-cm breaks for these histograms. However, with the hard-wired breaks described above, the user (you!) may have to …
Source:: r-bloggers.com