21

The geothmetic meandian, $G_{MDN}$, is defined in this XKCD as

$$F(x_1, x_2, ..., x_n) = \left(\frac{x_1 +x_2+\cdots+x_n}{n}, \sqrt[n]{x_1 x_2 \cdots x_n}, x_{\frac{n+1}{2}} \right)$$

$$G_{MDN}(x_1, x_2, \ldots, x_n) = F(F(F(\ldots F(x_1, x_2, \ldots, x_n)\ldots)))$$

The comic also (correctly) claims that $G_{MDN}(1, 1, 2, 3, 5) \approx 2.089$

There are two convergence questions I'm interested in:

  1. For what values of $(x_1, x_2, \ldots, x_n)$ does $G_{MDN}$ converge to a single number?

  2. For what values of $(x_1, x_2, \ldots, x_n)$ does $G_{MDN}$ converge, but not to a single number?

I've written up Python 3 code so that you can test out numbers yourself by changing the values at the bottom of the code. If the code never seems to stop running, then $G_{MDN}$ does not converge for your input. (In these situations, you can set verbose=True to see what's happening.)

    # assumes Python 3 because I assume that "/" always means float division
    from typing import Sequence, Tuple
    from functools import reduce  # Required in Python 3
    import operator


    # from https://stackoverflow.com/a/48648756
    def prod(iterable):
        return reduce(operator.mul, iterable, 1)


    def geothmetic_meandian(nums: Sequence[float], verbose=False) -> Tuple[bool, Tuple[float, float, float]]:
        def inner_geothmetic_meandian(nums: Sequence[float]) -> Tuple[float, float, float]:
            arithmetic_mean = sum(nums) / len(nums)
            geometric_mean = prod(nums) ** (1 / len(nums))
            sorted_nums = sorted(list(nums))
            if len(nums) % 2 == 0:  # even number of numbers
                higher_median_index = int(len(nums) / 2)
                lower_median_index = higher_median_index - 1
                median = (sorted_nums[higher_median_index] + sorted_nums[lower_median_index]) / 2
            else:  # odd number of numbers
                median = sorted_nums[int((len(nums) - 1) / 2)]

            return (arithmetic_mean, geometric_mean, median)

        last_ans = None
        ans = inner_geothmetic_meandian(nums)
        converged = True
        # stop when all three entries are equal, or when the triple stops changing
        while not (ans[0] == ans[1] == ans[2]):
            if ans == last_ans:
                converged = False
                break
            last_ans = ans
            ans = inner_geothmetic_meandian(ans)

            if verbose:
                print(ans)

        return converged, ans


    if __name__ == "__main__":
        verbose = False
        values = (1, 1, 3, 2, 5)
        converged, results = geothmetic_meandian(values, verbose=verbose)
        if converged:
            print(f"The geothmetic meandian of {values} converged to: {results[0]}")
        else:
            print(f"The geothmetic meandian of {values} did not converge to a single value:\nArithmetic Mean: {results[0]}\nGeometric Mean: {results[1]}\nMedian: {results[2]}")

I've tested this code to verify that $G_{MDN}(1, 1, 2, 3, 5) \approx 2.089$.

I have also not found any inputs that cause the program to not converge at all.

However, I have found that $G_{MDN}(1, 2, 3, 4, 5) = (2.8993696858822964, 2.899369685882296, 2.8993696858822964)$ which would mean that $(1, 2, 3, 4, 5)$ is in the second convergence class, where it converged, just not all to the same number. But I'm worried that this result is due to a rounding error in Python itself. (I quickly tried and failed to use the Decimal class due to the nth root operation.)

Thus, my hunch is that all inputs converge to a single number, but I have not been able to prove this yet. It looks like an epsilon-delta proof.
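One possible way around the nth-root obstacle with `Decimal` is to compute the geometric mean as the exponential of the mean of the logarithms, since `Decimal` provides `ln()` and `exp()`. The following is only a sketch under that assumption: it works for strictly positive inputs, and the names `step` and `iterate`, the 60-digit precision, and the round count are illustrative choices rather than part of the code above.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # significant digits of working precision (arbitrary choice)


def step(values):
    """Apply F once: (arithmetic mean, geometric mean, median) as Decimals."""
    xs = sorted(Decimal(v) for v in values)
    n = len(xs)
    arithmetic = sum(xs) / n
    # Geometric mean as exp(mean(ln x)); only valid for strictly positive inputs.
    geometric = (sum(x.ln() for x in xs) / n).exp()
    mid = n // 2
    median = xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2
    return (arithmetic, geometric, median)


def iterate(values, rounds):
    """Apply F `rounds` times and return the final triple."""
    current = tuple(values)
    for _ in range(rounds):
        current = step(current)
    return current


if __name__ == "__main__":
    result = iterate((1, 2, 3, 4, 5), 60)
    print(result)
    print("spread:", max(result) - min(result))
```

If the printed spread is far below the last digits that differ in the float output, that would support the reading that the float disagreement is a rounding effect.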

Mike Earnest
  • 84,902
Pro Q
  • 943
  • 2
    It seems likely at least at first glance that convergence for all $x_i$ is a straightforward corollary of the convergence of the arithmetic-geometric mean. – Steven Stadnicki Mar 11 '21 at 03:29
  • The hover text on the comic says "Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse" which makes me think that it might not be completely straightforward, but I'm not entirely sure. – Pro Q Mar 11 '21 at 03:35
  • 1
    You have two values. The arithmetic and geometric means should be somewhere in between, and the median should be the repeated value. So only one of the three values should survive to the next round--but all three do. Counting the digits, it seems you're at about the limit of precision for 64-bit IEEE-754 numbers, so you're looking at a rounding effect. – David K Mar 11 '21 at 04:42
  • 1
    Don't compare float using strict equality, use epsilon ball test instead. – jlandercy Mar 11 '21 at 05:38
  • 1
    I just wanted to ask this question! Take an upvote. – Aldoggen Mar 11 '21 at 15:35
  • 3
    Regarding the numeric experiment, the Julia code `using Statistics, StatsBase; F(x) = (mean(x), geomean(x), median(x)); Fn(x, n) = ∘(fill(F, n)...)(x); Fn(BigFloat[1,2,3,4,5], 100)` shows convergence of the three values to the same number up to the 48th decimal after 100 iterations (using the default BigFloat precision of 256 bits). – sijo Mar 12 '21 at 13:18
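The comments above suggest replacing strict float equality with a tolerance test. A minimal sketch of that idea, assuming positive inputs and Python 3.8+ (for `math.prod`); the function name `geothmetic_meandian_tol` and the defaults `rel_tol=1e-12` and `max_rounds=1000` are illustrative and not part of the question's code:

```python
import math
from statistics import median


def step(values):
    """Apply F once: (arithmetic mean, geometric mean, median)."""
    n = len(values)
    return (sum(values) / n, math.prod(values) ** (1 / n), median(values))


def geothmetic_meandian_tol(values, rel_tol=1e-12, max_rounds=1000):
    """Iterate F until the three entries agree to within rel_tol.

    Returns (converged, triple). Both defaults are illustrative choices.
    """
    current = step(values)
    for _ in range(max_rounds):
        # Compare the extremes with a relative tolerance instead of ==.
        if math.isclose(min(current), max(current), rel_tol=rel_tol):
            return True, current
        current = step(current)
    return False, current


if __name__ == "__main__":
    print(geothmetic_meandian_tol((1, 2, 3, 4, 5)))
```

With a tolerance a few orders of magnitude above double-precision rounding noise, a triple whose entries differ only in the last digit is reported as converged.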

3 Answers

13

Let $a_n,g_n$ and $m_n$ denote the three means computed at stage $n$, for $n\in \{1,2,3,\dots\}$. Let $$\begin{align}l_n&=\min(a_n,g_n,m_n)\\u_n&=\max(a_n,g_n,m_n)\end{align}$$ Since $u_n$ is a non-increasing sequence, and $l_n$ is a non-decreasing sequence, in order to prove convergence, it suffices to prove that $u_n-l_n\to 0$. We do this by showing $(u_{n+2}-l_{n+2})\le \frac79 (u_n-l_n).$

Note that $$a_{n+1}=\tfrac13(a_n+g_n+m_n)\le \tfrac{2}3u_n+\tfrac13 l_n,$$which follows by replacing the smallest of $\{a_n,g_n,m_n\}$ with $l_n$, and upper bounding the other two with $u_n$. Then, by the AM-GM inequality, $g_{n+1}\le \frac{2}3u_n+\frac13 l_n$ as well. Since $m_{n+2}$ is the middle value of $a_{n+1},g_{n+1},m_{n+1}$, and two of those three are at most $\tfrac23u_n+\tfrac13l_n$, the last two inequalities imply $$ m_{n+2}\le \tfrac23u_n+\tfrac13l_n. $$ We have given an upper bound for one of the means in the $(n+2)^{\text{th}}$ stage. For the other two, note $$ \begin{align} g_{n+2}\le a_{n+2} &=\frac{a_{n+1}+g_{n+1}+m_{n+1}}3 \\ &\le \frac{2a_{n+1}+m_{n+1}}3\tag{AM-GM} \\ &=\frac{2a_n+2g_n+2m_n+3m_{n+1}}{9} \\ &\le\frac{2a_n+2g_n+2m_n+3u_{n}}{9}\tag{$m_{n+1}\le u_n$} \\ &\le\frac{7u_n+2l_{n}}{9} \end{align} $$ For the last step, we are replacing the smallest of $a_n,g_n,m_n$ with $l_n$, and upper bounding the other two with $u_n$.

We have shown that all three of $m_{n+2},g_{n+2}$, and $a_{n+2}$ are at most $\tfrac79u_n+\tfrac29l_n$ (for $m_{n+2}$ this uses $\tfrac23u_n+\tfrac13l_n\le\tfrac79u_n+\tfrac29l_n$, which holds because $l_n\le u_n$), so we conclude that $$ u_{n+2}\le \tfrac79u_n+\tfrac29l_n$$ which implies $$ u_{n+2}-l_{n+2}\le u_{n+2}-l_n\le \tfrac79(u_n-l_n) $$


Note that this shows that $(u_n-l_n)\in O\left(c^n\right)$, where $c=\frac{\sqrt7}3$. It is actually possible to prove, for any $\epsilon>0$, that $$ u_n-l_n\in O\left(\frac1{3^n} \right). $$
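The two-step bound $u_{n+2}-l_{n+2}\le\frac79(u_n-l_n)$ can be spot-checked numerically. This is only an illustration in double precision, not a substitute for the proof; the helper names `step` and `spreads` and the sample input are assumptions, and `math.prod` needs Python 3.8+.

```python
import math
from statistics import median


def step(values):
    """Apply F once: (arithmetic mean, geometric mean, median)."""
    n = len(values)
    return (sum(values) / n, math.prod(values) ** (1 / n), median(values))


def spreads(values, rounds):
    """Return [u_1 - l_1, u_2 - l_2, ...] for the first `rounds` stages."""
    out = []
    current = tuple(values)
    for _ in range(rounds):
        current = step(current)
        out.append(max(current) - min(current))
    return out


if __name__ == "__main__":
    d = spreads((1, 1, 2, 3, 5), 12)
    for n in range(len(d) - 2):
        # d[n] is the spread at stage n + 1; compare stages two apart.
        print(n + 1, d[n], d[n + 2] <= 7 / 9 * d[n])
```

Only a moderate number of rounds is used, because once the spread approaches floating-point rounding noise the comparison is no longer meaningful.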

Mike Earnest
  • 84,902
  • Nice! A lot of answers seem to rely on the range changing at each iteration, which is really unreliable b/c (epsilon, x, x) goes to (epsilon, 2x/3, x); this counterexample has broken every proof except yours. :) – Ryan Yang-Liu Jul 20 '21 at 20:19
7

Let $$a_k=\min(F^{(k)}(x_1, x_2, ..., x_n))$$ and $$b_k=\max(F^{(k)}(x_1, x_2, ..., x_n))$$ It follows from their definitions that the arithmetic and geometric means and the median of $F^{(k)}(x_1, x_2, ..., x_n)$ are all $\ge a_k$ and $\le b_k$, so $a_{k+1}\ge a_k$ and $b_{k+1}\le b_k$. We also know that $a_k$ and $b_k$ are bounded, so they must be convergent.

Let $\lim_{k\to\infty}a_k=a$ and $\lim_{k\to\infty}b_k=b$. Suppose for contradiction that $a\neq b$. Using the definition of limits, it’s fairly easy to show that you can pick $k$ such that the arithmetic and geometric means of $F^{(k)}(x_1, x_2, ..., x_n)$ are both greater than $a$ and less than $b$, which means that at least one of $a<a_{k+1}<b$ or $a<b_{k+1}<b$ is true. This contradicts the facts that $a_k$ is non-decreasing, $b_k$ is non-increasing, and their limits are $a$ and $b$. Thus $a=b$, and $G_{MDN}$ converges to a single number for all possible values.
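The monotonicity of $a_k$ and $b_k$ used above can be observed numerically. The sketch below is an illustration only, assuming positive inputs and Python 3.8+; the helper names `step` and `bounds` and the sample input `(1e-6, 1.0, 1.0)` are hypothetical choices.

```python
import math
from statistics import median


def step(values):
    """Apply F once: (arithmetic mean, geometric mean, median)."""
    n = len(values)
    return (sum(values) / n, math.prod(values) ** (1 / n), median(values))


def bounds(values, rounds):
    """Return (lows, highs): the min and max of F^(k)(values) for k = 1..rounds."""
    lows, highs = [], []
    current = tuple(values)
    for _ in range(rounds):
        current = step(current)
        lows.append(min(current))
        highs.append(max(current))
    return lows, highs


if __name__ == "__main__":
    lows, highs = bounds((1e-6, 1.0, 1.0), 10)
    for k, (lo, hi) in enumerate(zip(lows, highs), start=1):
        print(k, lo, hi)
```

The printed minima should never decrease and the maxima should never increase, up to floating-point rounding.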

iosce
  • 440
  • I don't get how you can find $k$ such that AM and GM are both in $(a,b)$ – Exodd Mar 11 '21 at 08:38
  • 1
    For any $\epsilon> 0$, we can choose $N$ such that for all $k\ge N$, $a-\epsilon<a_k\le a$ and $b\le b_k<b+\epsilon$. Let $a_k, v_k, b_k$ be the three elements of $F^{(k)}(x_1, x_2,...,x_n)$ for some $k\ge N$. We want to pick $\epsilon$ such that $\frac{a_k+b_k+v_k}{3}\in(a,b)$. By definition of $a_k$ and $b_k$ we know $a_k\le v_k\le b_k$, so $\frac{a_k+b_k+v_k}{3}\le\frac{a_k+2b_k}{3}\le\frac{a_k+2(b+\epsilon)}{3}$. Rearranging this gets an $\epsilon$ such that the arithmetic mean is less than $b$, and we can use a similar process to show it’s greater than $a$, and for the geometric mean. – iosce Mar 11 '21 at 11:08
-2

I mean, trivially, a list with an odd number of elements, all greater than or equal to 1, should not encounter any problems. However, if it contains a $0$ or has an even number of elements (for which the median isn't really defined), then you have a problem. If it contains a $0$, then the geometric part will always be $0$, so the whole thing will converge toward $0$. Otherwise, like others already stated, you shouldn't run into any significant problems.

Siong Thye Goh
  • 153,832
JoDa
  • 1
  • 1
    If you could explain why it is trivial, this would constitute a much better answer. – Pro Q May 07 '21 at 07:07
  • And the median is generally defined for an even number of elements (see my code in the question, which works for even $n$ inputs) – Pro Q May 07 '21 at 07:08