
I am looking for a grammar or algorithm to write down what I now call 'metacompositions' (or just compositions of compositions? see https://oeis.org/A133494) of character strings with the absolute minimum of characters, without losing any information.

I wrote an enumeration class in Java that enumerates these objects, but I can't seem to compress the notation to the absolute minimum; I'm convinced my code is not optimal in this respect.

If you're interested, here's my code: https://github.com/ncg777/name.ncg777/blob/main/src/name/ncg777/Maths/Enumerations/MetaCompositionEnumeration.java

Here is example output for the string "01234":

[[01234]]
[[0][1234]]
[[0]][[1234]]
[[01][234]]
[[01]][[234]]
[[0][1][234]]
[[0]][[1][234]]
[[0][1]][[234]]
[[0]][[1]][[234]]
[[012][34]]
[[012]][[34]]
[[0][12][34]]
[[0]][[12][34]]
[[0][12]][[34]]
[[0]][[12]][[34]]
[[01][2][34]]
[[01]][[2][34]]
[[01][2]][[34]]
[[01]][[2]][[34]]
[[0][1][2][34]]
[[0]][[1][2][34]]
[[0][1]][[2][34]]
[[0]][[1]][[2][34]]
[[0][1][2]][[34]]
[[0]][[1][2]][[34]]
[[0][1]][[2]][[34]]
[[0]][[1]][[2]][[34]]
[[0123][4]]
[[0123]][[4]]
[[0][123][4]]
[[0]][[123][4]]
[[0][123]][[4]]
[[0]][[123]][[4]]
[[01][23][4]]
[[01]][[23][4]]
[[01][23]][[4]]
[[01]][[23]][[4]]
[[0][1][23][4]]
[[0]][[1][23][4]]
[[0][1]][[23][4]]
[[0]][[1]][[23][4]]
[[0][1][23]][[4]]
[[0]][[1][23]][[4]]
[[0][1]][[23]][[4]]
[[0]][[1]][[23]][[4]]
[[012][3][4]]
[[012]][[3][4]]
[[012][3]][[4]]
[[012]][[3]][[4]]
[[0][12][3][4]]
[[0]][[12][3][4]]
[[0][12]][[3][4]]
[[0]][[12]][[3][4]]
[[0][12][3]][[4]]
[[0]][[12][3]][[4]]
[[0][12]][[3]][[4]]
[[0]][[12]][[3]][[4]]
[[01][2][3][4]]
[[01]][[2][3][4]]
[[01][2]][[3][4]]
[[01]][[2]][[3][4]]
[[01][2][3]][[4]]
[[01]][[2][3]][[4]]
[[01][2]][[3]][[4]]
[[01]][[2]][[3]][[4]]
[[0][1][2][3][4]]
[[0]][[1][2][3][4]]
[[0][1]][[2][3][4]]
[[0]][[1]][[2][3][4]]
[[0][1][2]][[3][4]]
[[0]][[1][2]][[3][4]]
[[0][1]][[2]][[3][4]]
[[0]][[1]][[2]][[3][4]]
[[0][1][2][3]][[4]]
[[0]][[1][2][3]][[4]]
[[0][1]][[2][3]][[4]]
[[0]][[1]][[2][3]][[4]]
[[0][1][2]][[3]][[4]]
[[0]][[1][2]][[3]][[4]]
[[0][1]][[2]][[3]][[4]]
[[0]][[1]][[2]][[3]][[4]]
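For anyone who wants to experiment without the Java class, here is a minimal Python sketch of the same enumeration. It assumes a metacomposition is a composition of the string into inner parts, followed by a composition of that list of inner parts into outer parts. The helper names `compositions` and `metacompositions` are hypothetical and are not taken from MetaCompositionEnumeration.java. Treating the cut positions as a binary number, with the first gap as the lowest bit, reproduces the order of the listing above for "01234".

```python
from typing import Iterator, List


def compositions(items: list) -> Iterator[List[list]]:
    """Yield every way to cut `items` (non-empty) into consecutive non-empty blocks.

    Bit i of `mask` means "cut after items[i]"; counting `mask` upward with the
    first gap as the lowest bit gives the order of the listing above.
    """
    gaps = len(items) - 1
    for mask in range(1 << gaps):
        blocks, start = [], 0
        for i in range(gaps):
            if mask >> i & 1:
                blocks.append(items[start:i + 1])
                start = i + 1
        blocks.append(items[start:])
        yield blocks


def metacompositions(s: str) -> Iterator[str]:
    """Yield every metacomposition of `s` in the [[..][..]] notation."""
    for inner in compositions(list(s)):
        parts = ["".join(block) for block in inner]  # inner parts as strings
        for outer in compositions(parts):             # group the inner parts
            yield "".join(
                "[" + "".join("[" + p + "]" for p in group) + "]" for group in outer
            )
```

Calling `list(metacompositions("01234"))` yields the 81 strings listed above, one per line of the listing.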

I'm convinced there is a better notation for these objects. Imagine then feeding the output of the program back into itself, since both the input and the output are strings.

UPDATE: I think I fixed my problem with a simple regex. Here is the new output for "01234" with parenthesizations that are not necessarily balanced.

<01234>
<0<1234>
<0><1234>
<01<234>
<01><234>
<0<1<234>
<0><1<234>
<0<1><234>
<0><1><234>
<012<34>
<012><34>
<0<12<34>
<0><12<34>
<0<12><34>
<0><12><34>
<01<2<34>
<01><2<34>
<01<2><34>
<01><2><34>
<0<1<2<34>
<0><1<2<34>
<0<1><2<34>
<0><1><2<34>
<0<1<2><34>
<0><1<2><34>
<0<1><2><34>
<0><1><2><34>
<0123<4>
<0123><4>
<0<123<4>
<0><123<4>
<0<123><4>
<0><123><4>
<01<23<4>
<01><23<4>
<01<23><4>
<01><23><4>
<0<1<23<4>
<0><1<23<4>
<0<1><23<4>
<0><1><23<4>
<0<1<23><4>
<0><1<23><4>
<0<1><23><4>
<0><1><23><4>
<012<3<4>
<012><3<4>
<012<3><4>
<012><3><4>
<0<12<3<4>
<0><12<3<4>
<0<12><3<4>
<0><12><3<4>
<0<12<3><4>
<0><12<3><4>
<0<12><3><4>
<0><12><3><4>
<01<2<3<4>
<01><2<3<4>
<01<2><3<4>
<01><2><3<4>
<01<2<3><4>
<01><2<3><4>
<01<2><3><4>
<01><2><3><4>
<0<1<2<3<4>
<0><1<2<3<4>
<0<1><2<3<4>
<0><1><2<3<4>
<0<1<2><3<4>
<0><1<2><3<4>
<0<1><2><3<4>
<0><1><2><3<4>
<0<1<2<3><4>
<0><1<2<3><4>
<0<1><2<3><4>
<0><1><2<3><4>
<0<1<2><3><4>
<0><1<2><3><4>
<0<1><2><3><4>
<0><1><2><3><4>
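The regex itself isn't shown here, so the following is only one possible way (an assumption, not necessarily the regex used) to get from the first notation to the second with a single `re.sub`: an outer boundary `]][[` becomes `><`, an inner boundary `][` becomes `<`, and the outermost `[[`/`]]` become `<`/`>`. The inverse shows that no information is lost, provided the underlying string contains no bracket characters of its own. The names `to_angle` and `from_angle` are hypothetical.

```python
import re

# One alternation handles both boundary kinds. "]][[" is tried first, so an
# outer boundary is never read as "]" + "][" + "[".
_BOUNDARY = re.compile(r"\]\]\[\[|\]\[")


def to_angle(bracketed: str) -> str:
    """'[[0][1]][[234]]' -> '<0<1><234>' (assumes no brackets in the data)."""
    body = bracketed[2:-2]  # drop the outermost '[[' and ']]'
    body = _BOUNDARY.sub(lambda m: "><" if m.group() == "]][[" else "<", body)
    return "<" + body + ">"


def from_angle(angled: str) -> str:
    """Inverse of to_angle: '<0<1><234>' -> '[[0][1]][[234]]'."""
    body = angled[1:-1]
    body = body.replace("><", "]][[")  # outer boundaries first
    body = body.replace("<", "][")     # any remaining '<' is an inner boundary
    return "[[" + body + "]]"
```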

UPDATE 2: I did come up with a regex for the language, written here with parentheses instead of brackets:

\([^()]*\(*[^()]*\(*[^()]*\)

I guess a similar expression could easily be used to enumerate these.
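Along those lines, here is a sketch using the angle brackets of the first update. The pattern `METACOMPOSITION` is my own full-match pattern for that notation, not the regex quoted above, and it assumes the underlying string contains no `<` or `>`. The enumeration follows the same shape: between two consecutive characters there is either nothing (same inner part), `<` (new inner part) or `><` (new outer part), so `itertools.product` over those three choices yields all $3^{n-1}$ strings, though in a different order from the listing. `SEPARATORS` and `enumerate_angle` are hypothetical names.

```python
import itertools
import re
from typing import Iterator

# Full-match pattern for the angle-bracket notation: non-empty runs of symbols
# separated by '<' or '><', wrapped in an outer '<' ... '>'.
# For the parenthesis version, replace '<' and '>' with '\(' and '\)'.
METACOMPOSITION = re.compile(r"<[^<>]+(?:>?<[^<>]+)*>")

# Gap between consecutive characters: same inner part, new inner part, new outer part.
SEPARATORS = ("", "<", "><")


def enumerate_angle(s: str) -> Iterator[str]:
    """Yield all 3**(len(s) - 1) metacompositions of a non-empty string `s`."""
    for seps in itertools.product(SEPARATORS, repeat=len(s) - 1):
        body = s[0] + "".join(sep + ch for sep, ch in zip(seps, s[1:]))
        yield "<" + body + ">"
```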

  • Related: Catalan numbers https://oeis.org/A000108 (see the "parentheses" in strings interpretation), see also https://oeis.org/A002415 and https://oeis.org/A000217 for other sequences with combinatorial interpretations in terms of inserting various numbers of parentheses into strings; https://oeis.org/A001003 ("Schroeder's second problem") – leslie townes Nov 25 '24 at 16:30
  • Thank you very much Sir for considering my question! One of these sequences may be relevant; I just don't know right now. I can tell you that the idea that led me to this sequence and enumeration was the not necessarily balanced parenthesizations of strings of length n. For example, words such as []] are unbalanced parenthesizations, while words such as [[]] are balanced. – Nicolas Couture-Grenier Nov 25 '24 at 17:09
  • Good news! I fixed my problem with a simple regex! Thanks for your time. – Nicolas Couture-Grenier Nov 25 '24 at 18:01


I can compress your notation a bit further.

Given a string with $n$ symbols, where $n\ge1$, there are $3^{n-1}$ possible metacompositions of that string. The most compact representation would be to encode each metacomposition as an integer between $0$ and $3^{n-1}-1$. This requires only $\lceil\log_2(3^{n-1})\rceil\approx 1.58\cdot (n-1)$ bits, and it is attainable, as I will now show.
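The count can be checked directly: a composition of the $n$ symbols into $m$ inner parts is a choice of $m-1$ of the $n-1$ gaps, and the $m$ inner parts can then be grouped into outer parts in $2^{m-1}$ ways, so by the binomial theorem

$$\sum_{m=1}^{n}\binom{n-1}{m-1}2^{m-1}=\sum_{j=0}^{n-1}\binom{n-1}{j}2^{j}=(1+2)^{n-1}=3^{n-1}.$$

For $n=5$ this gives $81$, the number of lines in each listing in the question, and $\lceil\log_2 81\rceil=7$ bits.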

Here is how the encoding works. Given a metacomposition, we need to produce an integer which has at most $n-1$ ternary digits when written in base $3$. For each $k\in\{1,\dots,n-1\}$, the $k^{th}$ digit is determined by the entries with indices $k$ and $k+1$ in the string.

  • If these two entries are in different outer parts, set the $k^{th}$ digit of the encoding to be $0$.

  • If these two entries are in the same outer part, but different inner parts, set the $k^{th}$ digit of the encoding to be $1$.

  • If these two entries are in the same outer and inner parts, set the $k^{th}$ digit of the encoding to be $2$.

Finally, convert that sequence of ternary digits to an integer between $0$ and $3^{n-1}-1$, and store that integer in binary.
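A minimal Python sketch of this encoding and its inverse, assuming a metacomposition is given as a list of outer parts, each a list of inner-part strings (e.g. `[["0"], ["12"], ["3", "4"]]` for `[[0]][[12]][[3][4]]`). The description above doesn't fix which end of the digit sequence is most significant; the sketch takes the first gap as the most significant digit, which is an assumption. `encode` and `decode` are hypothetical names.

```python
from typing import List

Meta = List[List[str]]  # outer parts -> inner parts -> substring


def encode(meta: Meta) -> int:
    """Digit per gap: 0 = different outer parts, 1 = same outer part but
    different inner parts, 2 = same inner part. First gap = most significant."""
    value = 0
    for o, outer in enumerate(meta):
        for i, inner in enumerate(outer):
            for c in range(len(inner)):
                if o == i == c == 0:
                    continue  # the first symbol has no gap before it
                if c > 0:
                    digit = 2  # continues the current inner part
                elif i > 0:
                    digit = 1  # new inner part, same outer part
                else:
                    digit = 0  # new outer part
                value = value * 3 + digit
    return value


def decode(value: int, s: str) -> Meta:
    """Rebuild the metacomposition of the non-empty string `s` from its integer."""
    digits = []
    for _ in range(len(s) - 1):
        value, d = divmod(value, 3)
        digits.append(d)
    digits.reverse()  # most significant digit belongs to the first gap
    meta: Meta = [[s[0]]]
    for ch, d in zip(s[1:], digits):
        if d == 0:
            meta.append([ch])      # start a new outer part
        elif d == 1:
            meta[-1].append(ch)    # new inner part in the current outer part
        else:
            meta[-1][-1] += ch     # extend the current inner part
    return meta
```

For example, `[[0]][[12]][[3][4]]` has digits $0,2,0,1$ and encodes to $19$.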

Mike Earnest

I think this is essentially the same as Mike Earnest's answer, but perhaps easier to grasp; note that I have reversed his $0,1,2$.

Transform each metacomposition as follows: use `-` as a separator between upper-level compositions, `|` as a separator between lower-level compositions, and `.` between symbols in the same lower part. Then turn `-`, `|`, `.` into $2,1,0$ respectively, and read the result as a ternary number. Taking some examples from your list:

[[01234]]                  0.1.2.3.4   0000     0
[[0123][4]]                0.1.2.3|4   0001     1
[[01][234]]                0.1|2.3.4   0100     9
[[0][1]][[234]]            0|1-2.3.4   1200    45 
[[0]][[12]][[3][4]]        0-1.2-3|4   2021    61
[[0]][[1]][[234]]          0-1-2.3.4   2200    72 
[[0]][[1]][[2]][[3]][[4]]  0-1-2-3-4   2222    80

It is easy enough to do this in reverse, going from an integer between $0$ and $3^{n-1}-1$ to a double composition, provided you know $n$. You could extend this to $k$ levels of compositions, using $k+1$ separator symbols and integers between $0$ and $(k+1)^{n-1}-1$, if you know $k$ and $n$.
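The separator version translates almost directly into code. This is a minimal Python sketch, assuming every symbol of the string is a single character, so that separators sit at the odd positions of strings like `0-1.2-3|4`. `SEPARATORS`, `to_int` and `from_int` are hypothetical names. Appending more separator symbols for deeper levels would give base $k+1$ for $k$ levels with no other changes.

```python
# Same lower part, new lower part, new upper part -> digits 0, 1, 2.
SEPARATORS = ".|-"


def to_int(sep_string: str) -> int:
    """Read the separators of e.g. '0-1.2-3|4' as a base-len(SEPARATORS) number."""
    base = len(SEPARATORS)
    value = 0
    for sep in sep_string[1::2]:  # separators sit between single-character symbols
        value = value * base + SEPARATORS.index(sep)
    return value


def from_int(value: int, s: str) -> str:
    """Inverse of to_int for the non-empty string `s` (n = len(s) must be known)."""
    base = len(SEPARATORS)
    seps = []
    for _ in range(len(s) - 1):
        value, d = divmod(value, base)
        seps.append(SEPARATORS[d])
    seps.reverse()  # the first separator is the most significant digit
    return s[0] + "".join(sep + ch for sep, ch in zip(seps, s[1:]))
```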

Henry