6

Addition can be thought of as repeated counting; multiplication can be thought of as repeated addition; and exponentiation can be thought of as repeated multiplication. And yet, while the first three operations listed—counting, addition, and multiplication—are associative, exponentiation is not. Even in its most primitive form, where $a^n$ simply means $$ \underbrace{a \times a \times a \times \ldots \times a}_{n \text{ times}} \, , $$ exponentiation lacks this property: $$ (a^b)^c \neq a^{(b^c)} \text{ in general} $$ (for instance, $(2^3)^2 = 64$ while $2^{(3^2)} = 512$). Is there a deeper reason for this phenomenon, something that makes exponentiation fundamentally different from counting, addition, and multiplication?

Joe
  • 22,603
  • 2
    Well, unlike addition and multiplication, exponentiation is not commutative. – TonyK Dec 05 '20 at 21:22
  • @TonyK That's a good point. It is possible for an operation to be associative but not commutative, however (e.g. function composition). – Joe Dec 05 '20 at 21:28
  • 4
    I don't have time to write a full answer now, but there is a deeper reason, which is most easily expressed in the language of category theory. The basic idea is that there exists a certain duality between addition and multiplication that forces the latter to share the equational properties of the former. There is also a kind of partial duality relating exponentiation and multiplication, but it affects only one of its arguments, so as a result we get that $(((a^b)^c)^{\ldots})^d$ is commutative only in $b, c, \ldots, d$. – pregunton Dec 05 '20 at 21:49
  • 2
    I think a better question to ask is: Why is multiplication associative? A priori we should be surprised when a given property is preserved by the "self-iterate" construction which associates to a function $*:\mathbb{N}^2\rightarrow\mathbb{N}$ the new function $$\hat{*}:\mathbb{N}^2\rightarrow\mathbb{N}: \begin{cases} (a,1)\mapsto a,\\ (a,b+1)\mapsto a*(a\,\hat{*}\,b). \end{cases}$$ The example of multiplication shows that this construction can cause drastic changes, with $\hat{\times}$ having very little in common with $\times$. So what is it about $+$ which "perpetuates" associativity? – Noah Schweber Dec 05 '20 at 22:57
  • 1
    (In my previous comment, contra my usual habit, I've assumed $0\not\in\mathbb{N}$ since the case where the second coordinate is $0$ is actually a bit weird. But this might actually be an important omission: it might be the case that the nature of the "starting point" - e.g. $a+0$ depends on $a$ but $a\times 0$ does not - is meaningful.) Note that whatever that perpetuating property is, it itself must not be perpetuated by self-iteration. So it's possible that there's a hierarchy of tamenesses at work here, with the self-iterate of a function being somewhat less tame than the original function. – Noah Schweber Dec 05 '20 at 23:04
  • 2
    Incidentally, this question and answer show that in a very strong sense we rarely preserve associativity when self-iterating. – Noah Schweber Dec 05 '20 at 23:28
  • @Noah Thanks, it does help to see things from that angle. – Joe Dec 05 '20 at 23:39
  • @pregunton if you find the time to write an answer to this, I think the topic you introduced would be very interesting. Very much an upgrade to the "usual" notions we repeat over and over about properties of binary operations :)) – MattAllegro Dec 20 '20 at 10:34
  • 1
    @MattAllegro Alright, just wrote an answer :) – pregunton Dec 21 '20 at 10:52

2 Answers

7

I've tried to omit most of the technical details and go for a more intuitive explanation, since I believe one should always try to explain elementary topics like this one in an accessible way. I also mostly used set-theoretic terminology for the sake of familiarity, but the underlying concepts fundamentally come from category/type theory.

The operations of addition, multiplication and exponentiation on natural numbers are important and useful not because they form part of an infinite sequence, but because they mimic natural constructions we can apply to collections of things (after all, natural numbers were originally used for counting). In particular, given two sets $A$ and $B$, we can construct:

  • The disjoint union $A\sqcup B$ containing labeled elements $(1,a)$ and $(2,b)$ where $a\in A, b\in B$. The labels $1$ and $2$ tell us which set the element came from, to avoid conflict in case $A$ and $B$ have nonempty intersection.

  • The Cartesian product $A\times B$, whose elements are all pairs $(a,b)$ where $a\in A, b\in B$.

  • The function set $A\to B$, whose elements are all functions $f$ from $A$ to $B$.

If we denote by $|X|$ the number of elements of a set $X$, we have $|A\sqcup B|=|A|+|B|$, $|A\times B| = |A|\cdot |B|$ and $|A\to B| = |B|^{|A|}$, so these are indeed the counterparts of addition, multiplication and exponentiation respectively. In fact function sets are frequently denoted by $B^A$, and sometimes disjoint union is denoted by $+$; I deliberately chose different notations here to distinguish between the arithmetic and set-theoretic settings. (I am excluding the operation of succession because it is fundamentally different from the others, as it is a unary operation instead of binary, so it doesn't make much sense to talk about its associativity or commutativity; however, it does have a counterpart in set theory, namely the disjoint union with a prescribed one-element set).
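For concreteness, here is how the three constructions look when rendered as Haskell types (a quick sketch; the function names are chosen just for illustration). `Either a b` plays the role of $A\sqcup B$, with `Left`/`Right` as the labels $1$ and $2$; `(a, b)` plays the role of $A\times B$; and `a -> b` plays the role of the function set.

```haskell
-- Finite sets are represented as lists of distinct elements,
-- so `length` plays the role of |·|.

disjointUnion :: [a] -> [b] -> [Either a b]
disjointUnion as bs = map Left as ++ map Right bs    -- length: |A| + |B|

cartesian :: [a] -> [b] -> [(a, b)]
cartesian as bs = [ (x, y) | x <- as, y <- bs ]      -- length: |A| * |B|

-- A function A -> B is determined by its table of outputs (one element of
-- B per element of A), so enumerating all tables yields |B|^|A| of them:
functionTables :: [a] -> [b] -> [[b]]
functionTables as bs = mapM (const bs) as            -- length: |B| ^ |A|
```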

In this set-theoretic language, arithmetic identities such as $a^{b+c} = a^b \cdot a^c$ turn into isomorphisms $(B\sqcup C)\to A \simeq (B\to A) \times (C\to A)$ between the underlying sets.
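This particular isomorphism can even be written down directly: a function out of a disjoint union is exactly a pair of functions. In the same Haskell rendering as above (again, names invented for illustration):

```haskell
-- Witnessing (B ⊔ C → A) ≅ (B → A) × (C → A), i.e. a^(b+c) = a^b · a^c.
split :: (Either b c -> a) -> (b -> a, c -> a)
split f = (f . Left, f . Right)

unsplit :: (b -> a, c -> a) -> (Either b c -> a)
unsplit (g, h) = either g h   -- `split` and `unsplit` are mutually inverse
```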

There are two main phenomena at play:

Repetition

What does it mean for a set construction to be another construction "repeated"? The following is enough for the purposes of this answer: if we have two constructions $\star$ and $\odot$, we will say that $\star$ is repeated $\odot$ if it is possible to define an operator $\bigodot_{x\in A} B(x)$ taking as inputs a family of sets $B(x)$ indexed by elements of another set $A$, such that the two operations are limiting cases of it, that is,

$$\bigodot_{x \in \{1,2,\ldots,n\}} \: B(x) \simeq ((B(1) \odot B(2)) \odot \ldots) \odot B(n),$$

and

$$\bigodot_{x \in A} \: B \simeq A \star B$$

whenever $B$ does not depend on $x$. (Note that for the big operator to be defined independently of the ordering of $A$, $\odot$ already needs to be commutative and associative up to isomorphism, so addition and multiplication are the only operations in the hyperoperation sequence whose set-theoretic counterparts can be "repeated", at least under this definition of repetition).

In the first case we have an obvious candidate, the generalized disjoint union $\bigsqcup_{x \in A} \: B(x)$ whose elements are pairs $(a,b)$ such that $a \in A$ and $b \in B(a)$. We can check by induction that the two above isomorphisms hold, so we can say that $\times$ is repeated $\sqcup$.
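In list-of-lists form (a minimal sketch, assuming the index set is $\{1,\ldots,n\}$ and sets are lists of distinct elements), the generalized disjoint union might look like this:

```haskell
-- A finite family B(1), ..., B(n), given as a list of lists.
bigSqcup :: [[b]] -> [(Int, b)]
bigSqcup family = [ (i, b) | (i, bi) <- zip [1 ..] family, b <- bi ]

-- When every B(i) is the same set bs, the result has n * length bs
-- elements, recovering the Cartesian product {1..n} × B:
--   length (bigSqcup (replicate n bs)) == n * length bs
```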

For the second case, we need to use an equivalent$^{(*)}$ definition of the Cartesian product:

  • The Cartesian product $A\times B$ can be equivalently defined as the set of functions $f$ from $\{1, 2\}$ to $A \cup B$ such that $f(1) \in A$ and $f(2) \in B$.

We can think of $f$ as the pair $(f(1), f(2))$ encoded as a function of its two slots. Defining the generalized Cartesian product $\prod_{x \in A} \: B(x)$ whose elements are functions $f$ from $A$ to $\bigcup_{x\in A} B(x)$ such that $f(a) \in B(a)$, we can again check that the above properties hold, so $\to$ is repeated $\times$.
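Under the same finite, list-based representation as before, the generalized Cartesian product is exactly what the Prelude's `sequence` computes on lists:

```haskell
-- One choice function per way of picking an element from each B(i):
bigProd :: [[b]] -> [[b]]
bigProd = sequence

-- When every B(i) is the same set bs, there are (length bs)^n results,
-- i.e. the functions {1..n} → B written out as tables:
--   length (bigProd (replicate n bs)) == length bs ^ n
```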

Duality

The phenomenon of duality is found everywhere in category theory; each concept has a corresponding opposite or co-concept. For example, the set-theoretic Cartesian product (under the second definition) is an example of what in category theory is called a product, and the disjoint union is the corresponding coproduct. Similarly, the function set corresponds to an exponential object, which is dual to the product in one of its arguments (categories having all three constructions are called bicartesian closed). Actually, to properly express the universal properties of exponential objects (and products under the first definition) would require something stronger than a category, called a Cartesian multicategory (see e.g. this PDF, or this recent MathOverflow answer). Category theory can even handle the big operators defined above, though their definition is more complicated.

To avoid introducing so much new terminology, and to state the dualities in the most obvious way possible, let me recapitulate what the elements of each construction do, in terms of the information they give you:

  1. Labeled element: "I give you $1$ and I give you an element of $A$, or I give you $2$ and I give you an element of $B$".

  2. Pair (definition I): "I give you an element of $A$ and I give you an element of $B$".

  3. Pair (definition II): "You give me $1$, then I give you an element of $A$, or you give me $2$, then I give you an element of $B$".

  4. Function: "You give me an element of $A$, then I give you an element of $B$".

Categorical duality in this case corresponds to replacing one or more of the outputs by inputs or vice versa, that is, swapping some "I give you... and" for "you give me... then". Hence there is an obvious duality between $A\sqcup B$ and $A\times B$ (under definition II), and another, partial duality between $A\times B$ (under definition I) and $A\to B$, which affects only the first argument. I don't want to go into too much detail, but this duality is manifested in category theory itself by interchanging source and target in the relevant morphisms (which in this context are just functions between sets, or multivariable functions in the case of multicategories).
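In Haskell terms, the partial duality between the pair (definition I) and the function is the familiar currying isomorphism (a sketch; `toExp` and `fromExp` are just the Prelude's `curry` and `uncurry`):

```haskell
-- "I give you an a and a b, then you give me a c" repackaged as
-- "you give me an a, then (you give me a b, then I give you a c)":
toExp :: ((a, b) -> c) -> (a -> (b -> c))
toExp f = \x y -> f (x, y)      -- Prelude's curry

fromExp :: (a -> (b -> c)) -> ((a, b) -> c)
fromExp g = \(x, y) -> g x y    -- Prelude's uncurry
```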

Categorical duality is a powerful notion, and lets us translate properties of one construction into properties of the other. For example, to prove that $A\sqcup B \simeq B\sqcup A$, we note that if we swap the labels $1\leftrightarrow 2$ in the element of $A\sqcup B$ described by "I give you $1$ and I give you an $a$, or I give you $2$ and I give you a $b$", we get an element of $B\sqcup A$, and if we do it again we recover the original element, so we have an isomorphism. Now, applying duality, we immediately find an isomorphism $A\times B \simeq B\times A$. It's easy to see that the same argument allows us to translate any equational property of $\sqcup$ alone into a corresponding property of $\times$.

In the case of exponentiation, the partial duality means that not all morphisms are reversed, so in some sense we can only translate arithmetic properties that involve exclusively the exponents or exclusively the bases: defining $g_a(b,c)=(a^b)^c$, we have commutativity $g_a(b,c)=g_a(c,b)$ and something reminiscent of associativity, namely that $g_a(b,c,d,\ldots)$ exists and is well-defined for any number of arguments (but note that $g_a(g_a(b,c),d)\neq g_a(b,g_a(c,d))$). Intuitively, in a multivariable function described by "You give me a $c$, then you give me a $b$, then I give you an $a$", the elements $b$ and $c$ can freely switch places without modifying the overall meaning, but we can't do the same with $a$.
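Read as function sets, this commutativity in the exponents is witnessed by the Prelude's `flip` (a sketch, under the same types-as-sets reading as above):

```haskell
-- (a^b)^c ≅ (a^c)^b: the input slots b and c may trade places, but the
-- output slot a is pinned in place.
swapExponents :: (c -> (b -> a)) -> (b -> (c -> a))
swapExponents = flip
```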

The question remains, what goes wrong if we try to dualize $A\times B$ with respect to both arguments? Taking the cardinality of the resulting construction, we do indeed obtain an operation which has all desirable properties: it is commutative, associative and distributes over $\times$. What is this mysterious operation?

Sadly, the operation turns out to be just $(1^a)^b=1$. The issue is that the corresponding construction receives two inputs but gives no output, and a function that gives no output conveys no information: there is exactly one such function, so the construction is always isomorphic to the singleton set $\mathbf{1}$.
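In the types-as-sets reading, that collapse is visible immediately (sketch):

```haskell
-- "You give me an a and a b, then I give you... nothing": a function into
-- the one-element type. There is exactly one such function, so the
-- construction always has cardinality 1, matching (1^a)^b = 1.
trivial :: a -> b -> ()
trivial _ _ = ()
```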


$(*)$: The fact that there are two equivalent definitions of Cartesian product is important, because there are situations where these definitions are not equivalent!

For example, in settings such as intuitionistic linear type theory (whose category-theoretic counterpart would be something like monoidal closed categories, or rather multicategories, as I mentioned above), there are two versions of the product, the tensor product $\otimes$ (definition I) and the Cartesian product $\&$ (definition II), such that $\otimes$ is repeated $\sqcup$ and $\to$ is repeated $\&$, but $\otimes \neq \&$. This is because in this setting, the ambient logic (called linear logic) only allows you to call a function exactly once, so given a pair in function form (an element of $\&$) you will never be able to recover both coordinates, only one of them. Thus $\otimes$ need not always share all the properties of $\sqcup$, and can be, for example, noncommutative (and indeed there is a noncommutative version of linear logic).

This might seem like a strange setting, but it has found applications in fields such as quantum mechanics (because of the so-called no-cloning theorem) and computer science (to handle situations where the availability of resources matters).

pregunton
  • 6,358
  • 1
    I will need to read again and again but I still think it was great of you to develop these topics...there aren't many copies of this answer around. Thank you! – MattAllegro Dec 21 '20 at 11:03
0

Short answer

Exponentiation symbolizes an ordered pair, as opposed to the scalar values you have with addition and multiplication.

Longer exposition

(Disclaimer: this is essentially entirely homegrown ideas of mine and they could be flat wrong, but I thought something here might help make things click.)

Exponentiation is similar to functions in general in that they can both be represented as ordered pairs. In lambda calculus, the standard algorithmic definition of exponentiation is $\lambda be.eb$, which is literally just defining it as a function and is by far the simplest mathematical operation in Church encoding.
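Here is that encoding spelled out in Haskell as a runnable sketch (the names `Church`, `churchExp`, and `toInt` are mine, not standard library functions):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church numeral n is "apply a function n times":
type Church = forall a. (a -> a) -> a -> a

two, three :: Church
two   f x = f (f x)
three f x = f (f (f x))

-- Exponentiation really is just \b e -> e b:
churchExp :: Church -> Church -> Church
churchExp b e = e b

toInt :: Church -> Int
toInt n = n (+ 1) 0
-- toInt (churchExp two three) == 8, i.e. 2^3
```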

So $a^{b^c}$ could be thought of as some nested functions $a(b(c))$; you would never look at that and wonder why it isn't equivalent to $(a(b))(c)$, as function application is very clearly non-associative in general.

Also note that while you can't effectively carry out multiplication by adding (I mean, not really), or exponentiation by multiplying, the reverse is not true: exponentiation subsumes the lower operations. You can encode arithmetical expressions through judicious use of exponentials and logarithms:

$$\large \log \log \left[\left(e^{\left(e^a\right)^{b}}\right)^{e^c}\right]=ab+c.$$
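To see why this identity holds, expand from the inside out:

$$\left(e^{(e^a)^b}\right)^{e^c} = \left(e^{e^{ab}}\right)^{e^c} = e^{e^{ab}\, e^{c}} = e^{e^{ab+c}},$$

so the two logarithms peel off the two layers of $e$ and leave $ab+c$.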

In fact, using a regular pushdown stack and the following three operations...

  1. Replace the top value $a$ on the stack with $\log_2 a$.
  2. Replace the top two values $a,b$ on the stack with $a^b$.
  3. Push $2$ onto the stack.

...you can seemingly carry out any standard arithmetical operation you want, and this may even be enough for general computation.
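As a sanity check, here is one way to render that machine in Haskell (a sketch; the `Op` encoding and the convention that `Pow` pops $a$ then $b$ are assumptions of mine):

```haskell
data Op = Log2 | Pow | Push2

step :: Op -> [Double] -> [Double]
step Push2 s           = 2 : s
step Log2  (a : s)     = logBase 2 a : s
step Pow   (a : b : s) = (a ** b) : s   -- top two values a, b become a^b
step _     _           = error "stack underflow"

run :: [Op] -> [Double] -> [Double]
run ops s0 = foldl (flip step) s0 ops

-- Multiplication from the three primitives: log2 ((2^a)^b) = a * b, so
--   run [Push2, Pow, Pow, Log2] [a, b]  ==  [a * b]
```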


In considering the basic operations, I think addition can be understood as combining two quantities of objects where every object is indistinguishable in every way; all you can tell is that each unit is a unit. Three of whatever plus seven of whatever else and you have ten whatevers. I figure this can be represented as a $0$-tuple, $()$, an object with no information.

Moving up to multiplication, we introduce the concept of distinctive properties, namely the prime factors involved. So multiplying two numbers can be viewed as looking through all the factors in both operands, and then lumping together (unit-style) all the factors that signify the same prime. This is why $$(2^3\cdot 5 \cdot 7^2)\times(5^3\cdot 7^4) = 2^{3+0} \cdot 5^{1+3} \cdot 7^{2+4}=2^3 \cdot 5^4 \cdot 7^6.$$
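That "lump the matching primes together" picture amounts to pointwise addition of exponent vectors; a small sketch (the `Factored` representation is just for illustration):

```haskell
import qualified Data.Map as M

-- A factored number as a map from prime to exponent; multiplying two
-- numbers is then pointwise addition of exponents.
type Factored = M.Map Integer Integer

mulF :: Factored -> Factored -> Factored
mulF = M.unionWith (+)

-- mulF (M.fromList [(2,3),(5,1),(7,2)]) (M.fromList [(5,3),(7,4)])
--   == M.fromList [(2,3),(5,4),(7,6)]
```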

I figured in this way, numbers being multiplied could be treated as $1$-tuples once they're broken down to their prime atomic elements: $(i)$, where $i$ is the index of their particular prime.

And then you get to exponentiation, and I'll skip to the punchline. The new thing this one introduces is order, which can be represented by a $2$-tuple, $(x,a)$. I'm not positive the $k$-tuple outlook is sound (given that I arbitrarily decided it made sense), but if it is, it's interesting that ordering seems to simply emerge as you gradually add arguments to the tuples.


Finally, it may be tempting to ask "but why can't $(x,a)$ work out to be equivalent to $(a,x)$?", but that's the entire point: $(x,a)$ isn't a collection of two objects, like $2\times 3$ or $5+7$; it's a single object itself, as is $(a,x)$, and the concept of the ordered pair represents an absolutely vital step in complexity. If those two tuples weren't distinct from each other (i.e., if they were interchangeable), then they wouldn't be two different objects, and you wouldn't have exponentiation. You'd be back to multiplication, which is indeed exactly what you see when you evaluate a power tower from the bottom up using parentheses: every term but the base just gets multiplied, commutatively.

Trevor
  • 6,144
  • 16
  • 39