Codification of a formal language in set theory.

Question

Starting with an arbitrary class of sets $\Gamma$, can you generate a free semigroup $\Gamma^*$ over $\Gamma$ with the group operation of concatenation ($\frown$)?

The goal here is to codify a formal language in terms of set theory.

The difficulty is in coming up with a set-theoretic operation that corresponds to concatenation such that it makes every new element resulting from concatenation unique, and is associative.

Given $a,b \in \Gamma$, the first thought would be to represent $a \frown b\frown c$ as a 3-tuple $<a,b,c>$. I know I can define tuples set-theoretically via $<a,b>:=\{\{a\},\{a,b\}\}$ but this will violate associativity in concatenation:

$$a \frown(b \frown c)=<a,<b,c>> \ne <<a,b>,c>=(a \frown b)\frown c$$

I have tried other variants but I haven't been able to come up with a set-theoretic description of concatenation that respects associativity, any ideas?

EDIT: This is a related question: https://mathoverflow.net/questions/12190/set-theoretic-foundations-for-formal-language-theory

unfortunately none of the answers provide an explicit definition of concatenation in set-theoretic terms.

I'm pretty sure you're going to have issues avoiding both the Axiom of Infinity and the Axiom of Power Sets. Many concepts in formal language theory are inductively defined. There are two approaches to characterizing inductively defined sets: we can say that they are the smallest sets satisfying some condition, or we can that they are the union of an (countably) infinite number of "stages". The former is impredicative and thus requires something like powersets, while the latter relies the existence of naturals. — Derek Elkins left SE, Mar 10 '19 at 21:16
@DerekElkins Just to make the task easier, let me remove that restriction from the question. If someone can provide insight about the definition of concatenation in terms of set theory at this point I'll be happy no matter what axioms of set theory they assume. — Mike, Mar 10 '19 at 21:19
Foundation has nothing to do with that. Neither does infinity nor power set. You just need enough to prove there are infinitely many ordinals, then you can isolate the finite ones (no need for any of the aforementioned axioms). Then just define the sequences in the obvious way. — Asaf Karagila, Mar 10 '19 at 21:19
@AsafKaragila You mean define $a \frown b \frown c$ as the sequence ${a,b,c}$?
Wouldn't that cause the same problem with associativity that the tuples cause in my question? — Mike, Mar 10 '19 at 21:22
@AsafKaragila If you dropped the Axiom of Infinity, wouldn't $\omega$ become (essentially) a proper class? Otherwise, why wouldn't we be able to prove the Axiom of Infinity? If $\omega$ being a proper class was the case, you wouldn't be able to naively define sequences as functions $\omega\to X$, but if your intent was that we could still define a proper class of sequences, that would be plausible to me. — Derek Elkins left SE, Mar 10 '19 at 21:28
Er, I guess I should have said something like a dependent sum for a set of all finite sequences, e.g. something like ${(n,f)\mid n\in\omega\land f\in X^n}$. Still, the point being for $X={{}}$ it would seem like it was the natural numbers. — Derek Elkins left SE, Mar 10 '19 at 22:02
@AsafKaragila I see, the answer below does the sequences and the way they are defined (which is a natural one) doesn't have any associativity issues, I think that's what you meant. — Mike, Mar 10 '19 at 22:35
@Derek: You don't care about that. Just the fact that we can define when something is a finite ordinal, and that we have enough induction to define ordinal addition (which is also cardinal addition, that requires no induction). — Asaf Karagila, Mar 10 '19 at 23:45
@AsafKaragila Thanks a lot for all the input. This question is part of a larger curiosity I have about formal languages and set theory. As a next step to this, I've asked this question:
https://math.stackexchange.com/questions/3143072/how-to-define-the-class-of-terms-of-a-formal-language

about how to identify all the elements of the language that we conventionally call 'terms' (constants, variables and functions). — Mike, Mar 10 '19 at 23:48
Mike, I've posted an answer there by the time I saw this comment. Do note that it might be very helpful to understand how PA encodes first-order logic, since this is almost the same thing. — Asaf Karagila, Mar 11 '19 at 00:01

Noah Schweber · Accepted Answer · 2019-03-10T21:56:15.723

There are a couple ways to address this. Equivalence classes give the most algebraically natural treatment:

Taking the naive definition of concatenation-as-ordered-pair, we get the free magma $\hat{\Gamma}$ on $\Gamma$.
Now the free semigroup will just be the free magma modulo the "associativity relation" - basically, we just need to whip up the binary relation $\sim$ describing when two elements of $\hat{\Gamma}$ "should be" equal. It's a bit messy to describe $\sim$ "explicitly" - this winds up being an inductive construction - but we can also define it as the smallest equivalence relation on $\hat{\Gamma}$ such that (or, the intersection of all equivalence relations on $\hat{\Gamma}$ such that) for all $a,b,c,d\in\hat{\Gamma}$ we have:
- $a\sim b$ and $c\sim d$ implies $\langle a,c\rangle\sim\langle b,d\rangle$, and
- $\langle a,\langle b,c\rangle\rangle\sim \langle\langle a,b\rangle, c\rangle$.

It's now easy to define the semigroup operation $\cdot$ as $$[a]_\sim\cdot[b]_\sim=[\langle a,b\rangle_\sim$$ (after, of course, checking that this is in fact well-defined). Or if you want to be really pedantic about it, given $\sim$-classes $E,F$, their product $E\cdot F$ is the unique $\sim$-class $G$ such that there are elements $a\in E$ and $b\in F$ such that $\langle a,b\rangle\in G$.

Another approach, less algebraically natural but perhaps more concrete, is via tuples as functions.

Specifically:

An element of $\Gamma^*$ will be a function $f$ such that $(1)$ the domain of $f$ is some natural number $n$, and $(2)$ the range of $f$ is $\subseteq\Gamma$. A function with domain $n$ is "morally" an $n$-tuple.
We can now define a fully associative version of concatenation using arithmetic. Specifically, suppose $f,g\in \Gamma^*$ with domains $m,n$ respectively. We let $f^\smallfrown g$ be the function $h$ given by: $dom(h)=m+n$, for $k<m$ we have $h(k)=f(k)$, and for $m\le k<n$ we have $h(k)=g(k-m)$.
- So for example if $dom(f)=2, dom(g)=1$, $f$ sends $0$ to $0$ and $1$ to $1$, and $g$ sends $0$ to $3$, then $f^\smallfrown g$ has domain $3$, sends $0$ to $0$, sends $1$ to $1$, and sends $2$ to $3$.

(Remember that in set theory a natural number is just a finite ordinal, and in particular is just the set of smaller natural number; so e.g. "$dom(f)=5$" makes perfect sense.)

The only thing this relies on is arithmetic of finite ordinals, which is straightforward to develop.

Codification of a formal language in set theory.

1 Answers1

Linked