
In our curriculum we have various exercises on calculating the three elementary measures of central tendency - mean, median, and mode for grouped data. For the same, we have been taught the following formulae:

Mean: $$\frac{\sum_i f_ix_i}{\sum_i f_i}$$

Median: $$l + \frac{\frac{N}{2}- cf}{f} \cdot h$$ where $l$ is the lower limit, $f$ the frequency, and $h$ the width of the median class, $cf$ the cumulative frequency of the preceding class, and $N = \sum_i {f_i}$

Mode:

$$l + h\cdot \frac{f_1-f_0}{2f_1-f_0-f_2}$$ where $l$ and $h$ are defined as before but this time for the modal class, $f_1$ is the frequency of the modal class, $f_0$ the frequency of the preceding class, and $f_2$ the frequency of the succeeding class.

Unfortunately, we have just been told to memorise these formulae and taught to find out the different arguments to put into the formulae according to the type of problem. This has left me curious about the intuition and proofs of these formulae.

Further, another book of mine mentions a completely different approach to finding the mode using “variates”, but does not explain it beyond some worked examples.

The online resources I have been able to find have also been unhelpful.

Hence any explanation of these formulae is appreciated. Any alternative, and perhaps more intuitive, methods for finding these values are also welcome.

Sahaj

1 Answer


These are in a sense intuitive (though I do not particularly like the notation, but will stick with it to describe the thought process), so you could if you wished work them out in an exam rather than memorising them now. Practising the thought processes now would be worthwhile, whether or not you memorise the formulae.

  • Mean: Take a representative central value $x_i$ for each group (perhaps the midpoint if the groups are bounded) and multiply it by the frequency of the group to give $f_ix_i$ as an estimate of the sum of the values in that group. Add these up across the groups to give $\sum f_i x_i$ as an estimate of the overall sum, and finally divide by the total frequency $N=\sum f_i$ to give $\frac{\sum f_i x_i}{\sum f_i}$ as an estimate of the overall mean

  • Median: Find which group contains the median (the first for which the cumulative frequency is $\ge \frac N2$). The cumulative frequency $cf$ up to the end of the previous group leaves $\frac N2 -cf$ further observations to reach the median, so the proportion of the $f$ observations in the median group that might lie below the median is $\frac{\frac{N}{2}- cf}{f}$. Add that proportion of the group width $h$ to the lower end of the median group $l$ to get $l + \frac{\frac{N}{2}- cf}{f} \cdot h$. In effect this treats the observations as spread evenly across the median group

  • Mode: Find which group has the highest frequency $f_1$. Assume that the modal point within that group is closer to the adjacent group with the nearer frequency, so compare $f_1-f_0$ and $f_1-f_2$: if $f_1-f_0\approx 0$ and $f_1-f_2>0$ you want the bottom end, if $f_1-f_0>0$ and $f_1-f_2\approx 0$ you want the top end, and if $f_1-f_0=f_1-f_2$ you want the middle. So consider $\frac{f_1-f_0}{(f_1-f_0)+(f_1-f_2)}= \frac{f_1-f_0}{2f_1-f_0-f_2}$ as the proportion of the modal group which might be below the mode and, as with the median, add this proportion of the group width $h$ to the lower end of that group $l$ to get $l + \frac{f_1-f_0}{2f_1-f_0-f_2} \cdot h$
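To see the three thought processes side by side, here is a minimal Python sketch of them. The function names and the example classes are my own, purely for illustration. It assumes the midpoint is used as each group's representative value, that there is a single modal group, and (my assumption, not part of the formula as taught) that a missing neighbour of the modal group counts as frequency $0$.

```python
def grouped_mean(mids, freqs):
    """Estimate the mean as sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_median(lowers, widths, freqs):
    """Estimate the median as l + (N/2 - cf) / f * h."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the groups before the current one
    for l, h, f in zip(lowers, widths, freqs):
        if cf + f >= n / 2:  # first group whose cumulative frequency reaches N/2
            return l + (n / 2 - cf) / f * h
        cf += f
    raise ValueError("no groups given")


def grouped_mode(lowers, widths, freqs):
    """Estimate the mode as l + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # first group with the highest frequency
    f1 = freqs[k]
    # Assumption: treat a missing neighbouring group as having frequency 0.
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    # Fails with a division by zero if both neighbours tie with the modal group.
    return lowers[k] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * widths[k]


# Made-up example: classes 0-10, 10-20, 20-30, 30-40, 40-50
lowers = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 15, 16, 6]
mids = [l + h / 2 for l, h in zip(lowers, widths)]

print(grouped_mean(mids, freqs))
print(grouped_median(lowers, widths, freqs))
print(grouped_mode(lowers, widths, freqs))
```

For these made-up classes $N=50$, so the mean estimate is $\frac{1350}{50}=27$, the median falls in the class $20$ to $30$ with $cf=13$ and $f=15$, giving $20+\frac{12}{15}\cdot 10=28$, and the modal class is $30$ to $40$ with $f_0=15$, $f_1=16$, $f_2=6$, giving $30+\frac{1}{11}\cdot 10\approx 30.91$. The mode estimate sits near the bottom of its class because the class below has almost the same frequency.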

Henry
  • Thanks for the answer. Why did you say that you do not like this notation? What other notation can be used here? – Sahaj Sep 13 '23 at 14:53
  • @SahajSatishSharma Some of my concerns include the mean estimator using $\sum f_i$ where the median uses $N$ to mean the same thing. Similarly the median uses $f$ where the mode uses $f_1$ for the same thing so there could be more consistent use of subscripts. And I would use $w$ for width rather than $h$. So as an alternative, if the median group is group $k$ write $l_k + \dfrac{\frac{\sum f_i}{2}- cf_{k-1}}{f_k} \cdot w_k$ for the median estimator, and if the modal group is group $c$ then write $l_c + \dfrac{f_c-f_{c-1}}{(f_{c}-f_{c-1})+(f_c-f_{c+1})} \cdot w_c$ for the mode estimator – Henry Sep 13 '23 at 15:16
  • Yes, that does make it more consistent and more helpful for a student such as myself. – Sahaj Sep 13 '23 at 15:21