
I have on multiple occasions stumbled on the idea that a universal property in category theory is a “most efficient solution to a problem”. See, e.g., the Wikipedia page. However, I don’t find the intuition given there very clarifying.

I find this somewhat intuitive for the existence part of the definition of a universal property. Take as an example a terminal morphism from a functor $U$ to an object $X$, i.e. a pair $(A,f)$ with $f:U(A)\to X$. The existence requirement is that for any object $D$, any morphism from $U(D)$ to $X$ must factor through $f$.

We can see this as “a most efficient solution”, in the sense that $A$ plays the role of a greatest element (a “supremum” of sorts) in a certain preorder: if there is a morphism from an object $U(D)$ in $U$’s image to $X$, then there must also be a morphism from $D$ to $A$, so $A$ is in this sense a “maximal object” with respect to $X$.

However, the other requirement is that any morphism from $U(D)$ to $X$ must factor uniquely through $f$. I see the intuition here in specific cases: e.g. the set-theoretic product of sets $X,Y$ is the “most efficient” set through which pairs of functions to $X$ and $Y$ factor.

But I cannot see, in general, how existence + uniqueness in the case of e.g. terminal morphisms justifies interpreting them as “most efficient solutions to problems”.

  • Can we actually write down an optimization problem, where a morphism is a solution iff both the existence + uniqueness conditions hold?

  • Or, is there some poset on morphisms such that a morphism is terminal iff it is maximal in this poset?

  • Or is there some other formal reason that justifies calling e.g. a terminal morphism “the most efficient solution to a certain problem”?

user56834
  • I don't use enough category theory to write a full answer, but I think of the example of products. The categorical definition of, say, $A\times B$ has existence of a way to factor through. But so would $A\times A\times B\times B$, which has tons of ways to factor through. In an informal sense, [any construction of] $A\times B$ is the most efficient way of capturing everything you need, as opposed to less efficient ways like $A\times B\times A$ or whatever. – Mark S. May 25 '19 at 13:06
  • @MarkS., yes I understand this vague intuition for the specific case, but I don't have the intuition for the general definition of universal properties, and I can't see what exactly we're optimizing for in the general case (In the specific case of Set we might say we're optimizing for smallest cardinality such that it still can factor everything, but I don't see the generalization of this intuition to arbitrary terminal morphisms in arbitrary categories) – user56834 May 25 '19 at 14:10
  • @MarkS., i.e. in Set there is literally an "objective function" on candidate products that we're finding the minimum of, namely the cardinality of the product object. Is there always an "objective function" of which the terminal morphism is an optimum? – user56834 May 25 '19 at 14:14

1 Answer


This is a very long answer, and it's hard to tell from your question how accessible it will be to you, but hopefully it's helpful.

First take (In what sense is finding a universal object solving an optimization problem?)

Let's first give a precise statement of what a universal object is. A universal object is an initial or terminal object in an appropriate auxiliary category. For example, a product $X\times Y$ in a category $C$ is terminal in the category of triples $(A,f,g)$, where $A\in C$, $f:A\to X$, $g:A\to Y$, and where morphisms $a:(A,f,g)\to (A',f',g')$ are arrows $a:A\to A'$ in $C$ such that $f=f'\circ a$ and $g=g'\circ a$.
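
(If a concrete rendering helps, here is a small, entirely optional Haskell sketch, treating Haskell types and functions as a stand-in for $C$ and the pair type as the product; the names `proj1`, `proj2`, `pair` are mine, purely for illustration.)

```haskell
-- Sketch: the universal property of the product in the category of
-- Haskell types and functions, where the product of x and y is (x, y).

-- The two projections that come with the product object.
proj1 :: (x, y) -> x
proj1 = fst

proj2 :: (x, y) -> y
proj2 = snd

-- An object of the auxiliary category is a triple (a, f, g); `pair f g`
-- is the unique mediating arrow a -> (x, y) satisfying
--   proj1 . pair f g == f   and   proj2 . pair f g == g.
pair :: (a -> x) -> (a -> y) -> (a -> (x, y))
pair f g = \z -> (f z, g z)

main :: IO ()
main = print (pair length head "category")  -- (8, 'c')
```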

This gives us a sense in which a universal object solves an optimization problem. Terminal and initial objects are the categorical analogues of maximum and minimum elements in a poset (or preorder). Indeed, a poset can be regarded as a category with a single arrow $x\to y$ whenever $x\le y$, and this category has a terminal object if and only if the poset has a greatest element, and an initial object if and only if it has a least element.
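
(Continuing the optional Haskell sketch: a finite preorder, viewed as a category, has a terminal object exactly when it has a greatest element, so finding one is an optimization problem in the most literal sense. The helper `greatest` is hypothetical, just for illustration.)

```haskell
-- A finite preorder, given by its order relation `leq`, viewed as a
-- category: there is (at most) one arrow a -> b exactly when a `leq` b.
-- A terminal object is an element that every element has an arrow into,
-- i.e. a greatest element; an initial object is a least element.
greatest :: (a -> a -> Bool) -> [a] -> Maybe a
greatest leq xs = case [t | t <- xs, all (`leq` t) xs] of
  (t:_) -> Just t
  []    -> Nothing

-- Example: divisibility on the divisors of 12; the greatest element is 12.
main :: IO ()
main = print (greatest divides [1, 2, 3, 4, 6, 12])  -- Just 12
  where divides a b = b `mod` a == 0
```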

Then since we get universal objects by constructing an auxiliary category and asking whether or not it has a terminal/initial object, we are quite literally setting up an optimization problem and asking whether we can solve it. This leads to the natural questions: what are we optimizing for, and why can we say the result is the most efficient solution? Also, as you ask in your question, why do we require a terminal object rather than just an object to which every other object has a morphism?

Second take (But why can we say that solving this optimization problem is finding the most efficient solution?)

Let's reconceptualize universal objects. Instead of thinking of universal objects as being initial or terminal objects in an auxiliary category, let's talk about information.

What information does an object in a category carry? Well, given some object $X\in C$, for any other object $Y$, we can construct the set $\newcommand\Hom{\operatorname{Hom}}\Hom(Y,X)$. Moreover, for any morphism $f:Y\to Z$, we get a morphism $\Hom(Z,X)\to \Hom(Y,X)$ given by $g\mapsto g\circ f$. Thus to an object $X$ in $C$, we can associate a functor $h_X : C^\newcommand\op{\text{op}}\op\to\newcommand\Set{\mathbf{Set}}\Set$. It turns out that this functor encodes all of the (categorical) information present in the original object, in the sense that the functor determines $X$ up to isomorphism. This is a consequence of the Yoneda lemma (the Yoneda embedding is fully faithful).
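
(In the optional Haskell sketch, $h_X$ is a contravariant hom functor; the rendering below uses the `Contravariant` class from base and is only an illustration.)

```haskell
import Data.Functor.Contravariant (Contravariant (contramap))

-- h_X as a Haskell type: HomInto x y represents Hom(y, x).
newtype HomInto x y = HomInto (y -> x)

-- Functoriality: an arrow f :: y -> z induces Hom(z, x) -> Hom(y, x)
-- by precomposition, g |-> g . f.
instance Contravariant (HomInto x) where
  contramap f (HomInto g) = HomInto (g . f)

main :: IO ()
main = let HomInto h = contramap words (HomInto length)  -- count the words
       in print (h "what information does an object carry")  -- 6
```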

What does this have to do with universal objects, you might ask.

Well, suppose we have a functor $F:C^\op\to \Set$. This encodes some information about the category. Let's stick with products (though it may not be clear what the relation to products is yet). If $X$ and $Y$ are objects of $C$, then we could define $F(Z)$ to be the set of pairs of morphisms $(f,g)$ with $f:Z\to X$ and $g:Z\to Y$, and if $a:Z\to Z'$ is a morphism, we can define $F(a):F(Z')\to F(Z)$ by $F(a)(f',g') = (f'\circ a, g'\circ a)$.
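
(In the sketch, this $F$ looks as follows.)

```haskell
import Data.Functor.Contravariant (Contravariant (contramap))

-- F(z) = pairs of arrows (f :: z -> x, g :: z -> y).
newtype PairsInto x y z = PairsInto (z -> x, z -> y)

-- F(a)(f', g') = (f' . a, g' . a) for a :: z -> z'.
instance Contravariant (PairsInto x y) where
  contramap a (PairsInto (f', g')) = PairsInto (f' . a, g' . a)

main :: IO ()
main = let PairsInto (f, g) = contramap show (PairsInto (length, head))
       in print (f (42 :: Int), g (42 :: Int))  -- (2, '4')
```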

Then $F(Z)$ encodes the information about pairs of morphisms from $Z$ to objects $X$ and $Y$. We might ask whether there is a single object in $C$ that encodes this same information.

What would this mean? It would mean that there is some object $X\times Y$ such that $h_{X\times Y}\simeq F$. In this case, we say $X\times Y$ represents the functor $F$. (There's actually a bit of extra data here, but we'll talk about that in the next section.)
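
(In the sketch, representability of this $F$ by the pair type amounts to the mutually inverse, natural conversions below; the snippet is self-contained, so the two newtypes are repeated.)

```haskell
-- The pair type (x, y) represents F: Hom(z, (x, y)) is naturally
-- isomorphic to F(z) = { (f :: z -> x, g :: z -> y) }.
newtype HomInto x z     = HomInto (z -> x)
newtype PairsInto x y z = PairsInto (z -> x, z -> y)

-- One component of the natural isomorphism F -> h_{(x, y)} ...
toHom :: PairsInto x y z -> HomInto (x, y) z
toHom (PairsInto (f, g)) = HomInto (\z -> (f z, g z))

-- ... and its inverse h_{(x, y)} -> F.
fromHom :: HomInto (x, y) z -> PairsInto x y z
fromHom (HomInto h) = PairsInto (fst . h, snd . h)

main :: IO ()
main = let HomInto h = toHom (PairsInto (length, reverse))
       in print (h "efficient")  -- (9, "tneiciffe")
```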

We've gotten fairly abstract, and apparently far from the context of the question. However, now we can talk about why we can say that this is the most efficient solution.

Suppose we just wanted to encode the information in $F$. Then all we need is an object $D$ such that there is a natural transformation $F\to h_D$, all of whose components are injections. Encoding it as efficiently as possible should mean that $h_D$ doesn't carry any extra information, which corresponds to the natural transformation actually being a (componentwise) bijection.

Alternatively, by saying $h_D$ encodes all of the information of $F$, we could have the natural transformation go from $h_D$ to $F$, and require the components to be surjective, and then maximum efficiency is again when the components are bijections. It turns out that this second formulation is more useful, because of how the Yoneda lemma works.
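
(Concretely, in the sketch: a padded candidate such as the triple type $X\times Y\times X$ admits a componentwise surjection onto this $F$, but not an injection, which is the sense in which it is an inefficient encoding.)

```haskell
-- F(z) = pairs of arrows into x and y, as before.
newtype PairsInto x y z = PairsInto (z -> x, z -> y)

-- A redundant candidate object: the triple type (x, y, x).  Every arrow
-- z -> (x, y, x) induces an element of F(z), and every element of F(z)
-- arises this way (surjectivity), but distinct arrows can induce the
-- same element (failure of injectivity), so the triple is "inefficient".
forget :: (z -> (x, y, x)) -> PairsInto x y z
forget h = PairsInto (fstOf3 . h, sndOf3 . h)
  where fstOf3 (a, _, _) = a
        sndOf3 (_, b, _) = b

-- Two distinct arrows Bool -> (Int, Char, Int) with the same image
-- under `forget`.
h1, h2 :: Bool -> (Int, Char, Int)
h1 _ = (0, 'c', 0)
h2 _ = (0, 'c', 1)

main :: IO ()
main = let PairsInto (f1, g1) = forget h1
           PairsInto (f2, g2) = forget h2
       in print [f1 b == f2 b && g1 b == g2 b | b <- [False, True]]
       -- [True,True]: h1 /= h2, yet they induce the same element of F(Bool)
```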

Synthesis (What does representing a functor have to do with uniqueness of the morphism to the terminal object in the auxiliary category?)

In the last section I focused on representability of functors, which corresponds to the terminal-object kind of universal object, so let's connect those two viewpoints.

There are two key points needed to connect the two viewpoints. The first is that we should be a little more careful about what a representing object is.

It turns out that if $X$ represents $F$, then not only do we need $h_X\simeq F$, but the particular natural isomorphism is important too. Luckily, the Yoneda lemma tells us what the natural transformations from a hom functor to an arbitrary contravariant functor are: $\Hom(h_X,F)\simeq F(X)$. Thus a natural transformation (in particular, a natural isomorphism) from $h_X$ to $F$ corresponds to a particular element of the set $F(X)$.
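
(In the sketch, the contravariant Yoneda lemma is the following pair of functions; this is the standard statement, not anything specific to this answer, with `Predicate` from base used as a sample contravariant functor.)

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Contravariant (Contravariant (contramap), Predicate (..))

-- A natural transformation h_x -> f, given componentwise:
-- for every z, a map Hom(z, x) -> f z.
type Nat x f = forall z. (z -> x) -> f z

-- Yoneda, one direction: evaluate the transformation at id :: x -> x.
toElement :: Nat x f -> f x
toElement nat = nat id

-- Yoneda, other direction: an element y :: f x gives the transformation
-- g |-> F(g)(y), which is contramap g y.
fromElement :: Contravariant f => f x -> Nat x f
fromElement y g = contramap g y

-- A small check: starting from the element `Predicate even` of F(Int)
-- (for F = Predicate), the induced transformation sends g to even . g.
example :: Bool
example = getPredicate (fromElement (Predicate even) (* 3)) (2 :: Int)

main :: IO ()
main = print example  -- True, since even (2 * 3)
```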

The second key point is to construct an auxiliary category that encodes the information of a functor, so that we can apply the perspective from the first section. Suppose then that we have some information encoded in a contravariant functor $F$. We can construct an auxiliary category whose objects are pairs $(X,y)$ with $y\in F(X)$, and whose morphisms $g:(X,y)\to (X',y')$ are maps $g:X\to X'$ such that $F(g)(y')=y$.

By the Yoneda lemma, the objects $(X,y)$ of this category encode natural transformations $\psi_{X,y}$ from $h_X$ to $F$.

Claim 1. There is at least one morphism to $(X,y)$ from any other object $(X',y')$ in the auxiliary category if and only if the natural transformation $\psi_{X,y}:h_X\to F$ has surjective components. (If you're not familiar with the Yoneda lemma, you'll probably want to skip or skim the proofs.)

Proof.

The proof is just unwrapping the definitions.

First suppose that for any other object $(X',y')$, there is a map $g:(X',y')\to (X,y)$.

Recall that the component $\psi_{X,y,X'} : h_X(X')=\Hom(X',X)\to F(X')$ is defined by $g\mapsto F(g)(y)$. Let $y' \in F(X')$, and let $g : (X',y')\to (X,y)$ be a morphism, which exists by assumption. By the definition of morphisms in the auxiliary category, $g\in \Hom(X',X)$ with $F(g)(y)=y'$. Then $\psi_{X,y,X'}(g) = F(g)(y)=y'$. Hence $\psi_{X,y,X'}$ is surjective.

Now suppose conversely that $\psi_{X,y,X'}$ is surjective for all $X'$. Then for any $y'\in F(X')$, there is some $g\in h_X(X')=\Hom(X',X)$ such that $\psi_{X,y,X'}(g) = y'$.

However $\psi_{X,y,X'}(g) = F(g)(y)$. Thus $F(g)(y)=y'$, so $g$ is a morphism from $(X',y')$ to $(X,y)$, as desired. $\blacksquare$

Claim 2. The natural transformation $\psi_{X,y} : h_X\to F$ determined by the object $(X,y)$ in the auxiliary category has injective components if and only if there is at most one morphism $g:(X',y')\to (X,y)$ for any other pair $(X',y')$.

Proof.

Once again, we are just unwrapping the definitions.

Suppose that there is at most one morphism to $(X,y)$ from any other pair. If $\psi_{X,y,X'}(g) =\psi_{X,y,X'}(h)$ for $g,h\in\Hom(X',X)$, then $F(g)(y)=F(h)(y)$, so $g$ and $h$ are both morphisms from $(X',F(g)(y))$ to $(X,y)$. By uniqueness, $g=h$. Therefore the components of the natural transformation are injective.

Conversely, suppose that $\psi_{X,y,X'}$ is injective for all $X'$. Then if $g$ and $h$ are both morphisms from $(X',y')$ to $(X,y)$, then $$\psi_{X,y,X'}(g) = F(g)(y) = y' = F(h)(y) = \psi_{X,y,X'}(h),$$ so $g=h$. Thus morphisms to $(X,y)$ are unique. $\blacksquare$

Corollary. The natural transformation $\psi_{X,y}:h_X\to F$ associated to a pair $(X,y)$ in the auxiliary category is a natural isomorphism if and only if $(X,y)$ is terminal in the auxiliary category. Existence of the morphisms in the auxiliary category corresponds to the pair $(X,y)$ capturing all of the information of $F$ (surjectivity of the components), while uniqueness of the morphisms is what corresponds to efficiency (injectivity of the components).
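
(Closing the optional Haskell sketch with the product example: the terminal object of the auxiliary category is the pair type together with its projections, i.e. the element $(\mathrm{fst},\mathrm{snd})\in F(X\times Y)$, and the natural transformation it induces has a two-sided inverse in every component, exactly as the corollary says.)

```haskell
-- F(z) = pairs of arrows into x and y, as in the earlier sketches.
newtype PairsInto x y z = PairsInto (z -> x, z -> y)

-- The terminal object of the auxiliary category is ((x, y), (fst, snd)),
-- i.e. the element (fst, snd) of F((x, y)).  Its induced natural
-- transformation, at component z, is
--   psi : Hom(z, (x, y)) -> F(z),  h |-> F(h)(fst, snd) = (fst . h, snd . h).
psi :: (z -> (x, y)) -> PairsInto x y z
psi h = PairsInto (fst . h, snd . h)

-- A two-sided inverse for every component: surjectivity means the pair
-- type captures all of the information in F, injectivity means it does
-- so without redundancy.
psiInv :: PairsInto x y z -> (z -> (x, y))
psiInv (PairsInto (f, g)) = \z -> (f z, g z)

main :: IO ()
main = let PairsInto (f, g) = psi id   -- corresponds to the identity on (x, y)
       in print (f (True, 'x'), g (True, 'x'))  -- (True,'x')
```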

jgon
  • So you're saying there are no famous optimization problems that can be readily categorized, because I think that's what the OP is looking for, though great post! – Daniel Donnelly May 29 '19 at 02:57
  • @BananaCatsCategoryTheoryApp I was mostly answering the OPs question in bullet point three. If I'm being honest, I don't really know the answers to the first two bullet points, though my intuition says that no, there isn't such a poset on the morphisms. I could certainly be wrong about that though. I don't know; to some extent efficiency is a sort of fuzzy word, and my answer encapsulates how I formalize that, but there may be another way to interpret it more in line with a more typical optimization problem. – jgon May 29 '19 at 03:12
  • Thanks for the answer! I am still trying to understand it. In particular, I have at least these questions: 1. What are you referring to when you say "suppose we have a contravariant functor $F$" under "synthesis"? What's the codomain of $F$? Is this still the same $F$ as before? 2. What exactly does it mean to say that "$h_X$ encodes all of the information about $X$" and that "$h_X$ 'determines' $X$ up to isomorphism"? Does it mean that there is a bijection between objects in $C$ and functors $h_\cdot$ for $\cdot\in C$? Or does it mean more than that? – user56834 May 29 '19 at 06:27
  • Continued: (Couldn't $h_X$ contain more information than $X$?) More generally I'm having some trouble intuitively getting this idea. 3. Is it correct to say that you're saying that a universal morphism solves an optimization problem in two senses? Firstly, it's an initial/terminal object and hence literally an optimum of a poset, and secondly it minimizes information? I'm having some trouble seeing how these combine into a single optimization problem. – user56834 May 29 '19 at 06:27