61

Are all Morse code strings uniquely decipherable? Without the spaces,

......-...-..---.-----.-..-..-..

could be Hello World, but perhaps the first letter is a 5. In fact, it looks very unlikely that an arbitrary sequence of dots and dashes has a unique translation.

One might possibly use the Kraft inequality but that only applies to prefix codes.

Morse code with spaces is a prefix code, so messages can always be uniquely decoded. Once we remove the spaces, this is no longer true.
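A side note on the Kraft inequality: McMillan's theorem extends it from prefix codes to every uniquely decodable binary code, so the sum of 2^(-length) over all codewords can be at most 1 for such a code. The sketch below evaluates that sum for space-free Morse, assuming the alphabet is just the 26 letters and 10 digits; the names `CODES` and `kraftSum` are made up for this illustration.

```javascript
// Codewords for A-Z followed by 1-9 and 0 (punctuation is left out).
const CODES = [
  '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '..',
  '.---', '-.-', '.-..', '--', '-.', '---', '.--.', '--.-', '.-.',
  '...', '-', '..-', '...-', '.--', '-..-', '-.--', '--..',
  '.----', '..---', '...--', '....-', '.....',
  '-....', '--...', '---..', '----.', '-----'
];

// Kraft sum: each codeword of length n contributes 2^(-n).
const kraftSum = CODES.reduce((sum, c) => sum + Math.pow(2, -c.length), 0);

console.log(kraftSum);      // 4.0625
console.log(kraftSum <= 1); // false
```

Since the sum is well above 1, the code without spaces cannot be uniquely decodable, whatever the message.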


In case I am right, and not every Morse code message can be uniquely decoded, is there a way to list all the possible messages? Here are some related exercises I found on codegolf.SE.

john mangual

8 Answers

100

The following are both plausible messages, but have a completely different meaning:

SOS HELP      = ...---...  .... . .-.. .--.        => ...---.........-...--.
I AM HIS DATE = ..  .- --  .... .. ...  -.. .- - . => ...---.........-...--.
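The collision above can be checked mechanically. This is a minimal sketch, assuming letters only; `MORSE` and `encode` are names chosen for the example.

```javascript
const MORSE = {
  A: '.-',   B: '-...', C: '-.-.', D: '-..',  E: '.',    F: '..-.',
  G: '--.',  H: '....', I: '..',   J: '.---', K: '-.-',  L: '.-..',
  M: '--',   N: '-.',   O: '---',  P: '.--.', Q: '--.-', R: '.-.',
  S: '...',  T: '-',    U: '..-',  V: '...-', W: '.--',  X: '-..-',
  Y: '-.--', Z: '--..'
};

// Encode text as Morse with no separators at all.
// Spaces and any character missing from the table are silently dropped.
function encode(text) {
  return text.toUpperCase().split('').map(ch => MORSE[ch] || '').join('');
}

const sos = encode('SOS HELP');
const date = encode('I AM HIS DATE');
console.log(sos);          // ...---.........-...--.
console.log(sos === date); // true
```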
celtschk
40

Quoting David Richerby from the comments:

Since ⋅ represents E and − represents T, any Morse message without spaces can be interpreted as a string in $\{E,T\}^*$

Further, since A, I, M, and N are represented by the four possible combinations of two Morse symbols (⋅-, ⋅⋅, --, -⋅, respectively), any message without spaces can also be interpreted as a string in $\{A,I,M,N\}^*\{E,T\}?$. Note that for any Morse message of length > 1, this is distinct from David's interpretation. Thus, the only messages with unique interpretations are those of length 1 (and, I suppose, 0, if that counts as a message), that is, ⋅, representing E, and -, representing T.
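To make the two readings concrete, here is a small sketch that produces both for any string of `.` and `-`. The function names `asET` and `asAIMN` are invented for this example, and the input is assumed to contain only those two ASCII characters.

```javascript
// Read every symbol on its own: dot is E, dash is T.
function asET(code) {
  return code.replace(/\./g, 'E').replace(/-/g, 'T');
}

// Read the symbols two at a time as A, I, M or N;
// a leftover final symbol (odd length) is read as E or T.
function asAIMN(code) {
  const PAIR = { '.-': 'A', '..': 'I', '--': 'M', '-.': 'N' };
  const SINGLE = { '.': 'E', '-': 'T' };
  let out = '';
  let i = 0;
  for (; i + 1 < code.length; i += 2) {
    out += PAIR[code.slice(i, i + 2)];
  }
  if (i < code.length) out += SINGLE[code[i]];
  return out;
}

console.log(asET('.-.'));   // ETE
console.log(asAIMN('.-.')); // AE
```

For any input of two or more symbols the two outputs have different lengths, so they are always different decodings.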

Here's some JavaScript that will tell you all possible interpretations of a string of . and -. Strings of up to length 22 run in under a second, but anything higher than that starts getting pretty slow - I wouldn't, for example, try to decode HELLO WORLD with it. You can pop open a JavaScript console in your browser, paste this in, and then call, for example, decode('......-...-..---'). (In this example, entry #2446 is the intended string "HELLO".)

var decode = function(code) {
  // cache[i] holds every decoding of the first i symbols of code.
  var cache = {
    '0': ['']
  };
  for(var start = 0;start < code.length;start++) {
    // Morse letters and digits are 1 to 5 symbols long.
    for(var len = 1;len < 6;len++) {
      if(start + len > code.length) continue;
      if(!cache[start + len]) cache[start + len] = [];
      var curCode = code.slice(start, start + len);
      if(dict[curCode]) {
        // Extend each decoding that ends at start by the matched character.
        for(var i_start = 0;i_start < cache[start].length;i_start++) {
          cache[start + len].push(cache[start][i_start] + dict[curCode]);
        }
      }
    }
  }
  return cache[code.length];
};

var dict = {
  '.-': 'A',
  '-...': 'B',
  '-.-.': 'C',
  '-..': 'D',
  '.': 'E',
  '..-.': 'F',
  '--.': 'G',
  '....': 'H',
  '..': 'I',
  '.---': 'J',
  '-.-': 'K',
  '.-..': 'L',
  '--': 'M',
  '-.': 'N',
  '---': 'O',
  '.--.': 'P',
  '--.-': 'Q',
  '.-.': 'R',
  '...': 'S',
  '-': 'T',
  '..-': 'U',
  '...-': 'V',
  '.--': 'W',
  '-..-': 'X',
  '-.--': 'Y',
  '--..': 'Z',
  '.----': '1',
  '..---': '2',
  '...--': '3',
  '....-': '4',
  '.....': '5',
  '-....': '6',
  '--...': '7',
  '---..': '8',
  '----.': '9',
  '-----': '0'
};

The code to prune it to only strings of real words is a bit longer, so I put it here. It runs under node.js and expects a file at /usr/share/dict/words-2500. The dictionary I'm using can be found here. It is not naive - it prunes as it goes, so it runs much faster on larger inputs.

The dictionary consists of a top-2500 words list I found on the internet somewhere, minus some 1-, 2-, and 3-letter combinations that I deemed not words. This algorithm is sensitive to having too many short words to choose from, and slows down drastically if you allow, say, every individual letter as a word (I'm looking at you, /usr/share/dict/words).

The algorithm finishes by sorting based on the number of words, so the "interesting" ones will hopefully be at the top. This works great on HELLO WORLD, running in under a second and returning the expected phrase as the first hit. From this I also learned that DATA SCIENTIST (the only other phrase I tried) morse codes the same as NEW REAL INDIA.

Edit: I searched for more interesting ones for a few minutes. The words SPACES and SWITCH are morsagrams. So far they're the longest single-word pair I've found.

Aaron Dufour
17

It is enough to observe that certain short combinations of letters give ambiguous decodings. A single ambiguous sequence suffices, but I can see the following:

ATE ~ P
EA ~ IT
MO ~ OM

etc. As David Richerby notes in the comments, any letter is equivalent to a string of Es and Ts, which makes Morse Code ambiguous as a way of encoding arbitrary sequences of letters; the above combinations show that this is true even of plausible letter combinations in English (for instance, MEAT ~ MITT). Perhaps an interesting coding exercise would be to find all strings of five or fewer letters which could be mistaken for something else, restricting to letter combinations that may actually be found in English text (using one or more words), grouped by equivalence class.
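A rough starting point for that exercise is sketched below. It groups every letter string up to a given length by its space-free Morse encoding; it does not restrict to combinations that occur in English, which would need a word list. The names `morseClasses`, `encode` and `LETTERS` are made up for the illustration.

```javascript
const MORSE = {
  A: '.-',   B: '-...', C: '-.-.', D: '-..',  E: '.',    F: '..-.',
  G: '--.',  H: '....', I: '..',   J: '.---', K: '-.-',  L: '.-..',
  M: '--',   N: '-.',   O: '---',  P: '.--.', Q: '--.-', R: '.-.',
  S: '...',  T: '-',    U: '..-',  V: '...-', W: '.--',  X: '-..-',
  Y: '-.--', Z: '--..'
};
const LETTERS = Object.keys(MORSE);

function encode(word) {
  return word.split('').map(ch => MORSE[ch]).join('');
}

// Map each Morse string to the list of letter strings (length 1..maxLen)
// that encode to it when the spaces are dropped.
function morseClasses(maxLen) {
  const classes = new Map();
  let frontier = [''];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const prefix of frontier) {
      for (const letter of LETTERS) {
        const word = prefix + letter;
        next.push(word);
        const key = encode(word);
        if (!classes.has(key)) classes.set(key, []);
        classes.get(key).push(word);
      }
    }
    frontier = next;
  }
  return classes;
}

const classes = morseClasses(3);
console.log(classes.get('.--.')); // the class containing both P and ATE
```

The number of strings grows as 26^maxLen, so five letters (about 12 million strings) is feasible but needs noticeably more memory than this three-letter run.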

Using your original example, it also happens to be the case that

HELLO WORLD ~ HAS TEAM NO MAID TOE

and while the right-hand side is perhaps unrealistic even as a partial message, it is certainly a sequence of English words, and one that could be found in less than 15 minutes without computer assistance. This could be taken as evidence that many phrases in English could be misparsed as a different (possibly nonsensical) sequence of English words.

Niel de Beaudrap
11

Morse Code is actually a ternary code, not a binary code, so the spaces are necessary. If spaces were not there, a lot of ambiguity would result, not so much with the entire message, but with individual letters.

For example, 2 dots is an I, but 3 dots is an S. If you are transcribing and you hear two dots, do you immediately write "I" or do you wait until you hear another dot (or dash)?

The answer is that each value is space separated so they are grouped together. When operators key messages in Morse, they make a pause of the same length as a dash after each letter code sequence to indicate the end of the sequence.

Even if you wrote an AI program to look at a full sentence at a time and figure out what was the logical interpretation of the message, there would still be many slight ambiguities and misspellings that would

David Richerby
Tyler Durden
5

A few notes not covered in the other (good) answers, which generally don't research prior knowledge or cite anything (to me an intrinsic part of computer science).

  • In CS terms, this falls into the category of text segmentation, also called "word splitting" or "disambiguation". The theory there is a bit different: it is about splitting sequences of symbols into words of variable length, where the symbols are the units. Here the strings are split into letters, where the letters have variable length. The theory is analogous, although the correspondence is not exactly one-to-one; i.e. the mapping is between splitting sentences into variable-length words and splitting Morse strings into variable-length letters.

  • As others have pointed out, this can be studied empirically. Someone did that from one angle (there are multiple ways to study this) and "published" the results on a web page with a big table of results:

    I found 25,787 ambiguous Morse code words. This is made of 10,330 distinct Morse strings. The highest frequency ambiguous Morse word has 13 possible donor words. The results are grouped below in tables based on the frequency of words that share the same Morse representation.

  • Wow, "context matters": a nearly identical question, "translating morse code without spaces", on Stack Overflow from 3 years ago currently has 0 votes.

vzn
2

In general there are exponentially many possible decodings, but if you really want you can list them all. You can also list them in a succinct way, that is, give a succinct representation for all of them. Since this is nothing more than a programming exercise, I challenge you to do it yourself.
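To get a feel for how fast the number of decodings grows without listing them, one can count them with a short dynamic program. This is only a sketch, assuming letters and digits as the alphabet; `CODES` and `countDecodings` are names picked for the example, and BigInt is used because the counts outgrow ordinary numbers on long inputs.

```javascript
// Valid codewords: A-Z, then 1-9 and 0.
const CODES = new Set([
  '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '..',
  '.---', '-.-', '.-..', '--', '-.', '---', '.--.', '--.-', '.-.',
  '...', '-', '..-', '...-', '.--', '-..-', '-.--', '--..',
  '.----', '..---', '...--', '....-', '.....',
  '-....', '--...', '---..', '----.', '-----'
]);

// ways[i] = number of ways to split the first i symbols into codewords.
function countDecodings(code) {
  const ways = new Array(code.length + 1).fill(0n);
  ways[0] = 1n; // the empty prefix has exactly one (empty) decoding
  for (let end = 1; end <= code.length; end++) {
    for (let len = 1; len <= 5 && len <= end; len++) {
      if (CODES.has(code.slice(end - len, end))) {
        ways[end] += ways[end - len];
      }
    }
  }
  return ways[code.length];
}

console.log(countDecodings('....'));  // 8n
console.log(countDecodings('.....')); // 16n
```

The same table, with each entry remembering which codewords led into it, is one succinct representation of all decodings.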

That said, the fact that there is ambiguity does not preclude the ability to decipher the message, or at least large parts of the message. Assuming a probabilistic model for the text represented by the Morse code (for definiteness, we can assume that it's English and use statistical properties of English), it may be possible to essentially decode the message, though some local ambiguities may be unavoidable. The reason is that most decodings correspond to nonsense plaintext. The way to do it is to extend the dynamic programming algorithm from the previous paragraph to estimate the likelihood of each decoding, and then choose the maximum likelihood decoding. This approach has a better chance of succeeding as the message gets longer.
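As a minimal sketch of the maximum likelihood idea, the dynamic program below scores each decoding with the crudest possible model: independent letters weighted by approximate English letter frequencies. The frequency table is rounded and illustrative only, the alphabet is assumed to be letters without digits, and all names (`FREQ`, `BY_CODE`, `mostLikelyDecoding`) are made up here. A realistic decoder would want at least letter n-grams or a word list, since this model simply prefers few, common letters.

```javascript
const MORSE = {
  A: '.-',   B: '-...', C: '-.-.', D: '-..',  E: '.',    F: '..-.',
  G: '--.',  H: '....', I: '..',   J: '.---', K: '-.-',  L: '.-..',
  M: '--',   N: '-.',   O: '---',  P: '.--.', Q: '--.-', R: '.-.',
  S: '...',  T: '-',    U: '..-',  V: '...-', W: '.--',  X: '-..-',
  Y: '-.--', Z: '--..'
};

// Approximate letter frequencies in percent (rounded, for illustration).
const FREQ = {
  E: 12.7, T: 9.1, A: 8.2, O: 7.5, I: 7.0, N: 6.7, S: 6.3, H: 6.1, R: 6.0,
  D: 4.3, L: 4.0, C: 2.8, U: 2.8, M: 2.4, W: 2.4, F: 2.2, G: 2.0, Y: 2.0,
  P: 1.9, B: 1.5, V: 1.0, K: 0.8, J: 0.15, X: 0.15, Q: 0.1, Z: 0.07
};

// Codeword -> letter and its log-probability under the unigram model.
const BY_CODE = {};
for (const [letter, code] of Object.entries(MORSE)) {
  BY_CODE[code] = { letter, logp: Math.log(FREQ[letter] / 100) };
}

// Viterbi-style search: best[i] is the highest log-likelihood of any
// decoding of the first i symbols; back[i] remembers its last letter.
function mostLikelyDecoding(code) {
  const n = code.length;
  const best = new Array(n + 1).fill(-Infinity);
  const back = new Array(n + 1).fill(null);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    for (let len = 1; len <= 4 && len <= end; len++) {
      const hit = BY_CODE[code.slice(end - len, end)];
      if (!hit) continue;
      const score = best[end - len] + hit.logp;
      if (score > best[end]) {
        best[end] = score;
        back[end] = { prev: end - len, letter: hit.letter };
      }
    }
  }
  if (best[n] === -Infinity) return null; // input had non-Morse symbols
  let out = '';
  for (let pos = n; pos > 0; pos = back[pos].prev) {
    out = back[pos].letter + out;
  }
  return out;
}

console.log(mostLikelyDecoding('....')); // H
```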

Yuval Filmus
1

How to define/recognize/generate the language of all possible decodings.

Clearly, without spaces, the morse code is no longer uniquely decipherable.

It is, however, possible to give in a condensed form all the possible ways to decode it. This is actually similar to what is done in speech processing: from a unique stream of sounds (or of phonemes), you have to find all the ways it can be decomposed into a sequence of words. The algorithms for doing this produce what is called a word lattice. You will find an example in the "lexical ambiguity" section of this answer.

In the case of binary Morse code (no spaces), you have only dots and dashes, but the problem is the same.

The way you can get all translations is as follows.

First you build a Generalized Sequential Machine (GSM) $T$ that decodes a Morse sentence. This is easily achieved by building a trie that recognizes Morse code. When a code is recognized, the corresponding letter/digit is output, and there is (non-deterministically) an empty transition back to the root of the trie. But at the same time, the code word may be continued into a longer one (non-deterministically).

Then you take your Morse sequence $w$ of $n$ dashes and dots, and you read it as a linear finite state automaton $W$ with $n+1$ states (the positions between the dashes and dots, from $0$ to $n$) that generates this unique sentence, i.e. the singleton language $L=\{w\}=\mathcal L(W)$. What you want is the language of all translations $T(L)$. You know from general theorems that, since $L$ is finite, hence regular, and since regular languages are closed under GSM mappings, the language $T(L)$ is regular. So what you really want is an FSA that recognizes (or generates) that language.

For that purpose, you can simply apply a standard cross-product construction, similar to the one used for the intersection of two regular languages, to the two finite-state devices $T$ and $W$. The transitions are chosen so that you mimic both the transducer $T$ and the FSA $W$. You easily get an FSA that defines the regular language of all translations.
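A simplified sketch of the resulting automaton follows. Instead of keeping the trie states explicitly, it contracts each root-to-codeword path of $T$ into a single edge, so the states are just the positions $0$ to $n$ of $W$ and every edge carries one output character. The names `buildLattice` and `BY_CODE` are assumptions for the example, and the alphabet is taken to be letters and digits.

```javascript
const MORSE = {
  A: '.-',   B: '-...', C: '-.-.', D: '-..',  E: '.',    F: '..-.',
  G: '--.',  H: '....', I: '..',   J: '.---', K: '-.-',  L: '.-..',
  M: '--',   N: '-.',   O: '---',  P: '.--.', Q: '--.-', R: '.-.',
  S: '...',  T: '-',    U: '..-',  V: '...-', W: '.--',  X: '-..-',
  Y: '-.--', Z: '--..',
  '1': '.----', '2': '..---', '3': '...--', '4': '....-', '5': '.....',
  '6': '-....', '7': '--...', '8': '---..', '9': '----.', '0': '-----'
};

// Invert the table: codeword -> character.
const BY_CODE = {};
for (const [ch, code] of Object.entries(MORSE)) BY_CODE[code] = ch;

// States are positions 0..n in the input. There is an edge from i to j
// labelled with a character whenever code.slice(i, j) is its codeword.
// Every path from state 0 to state n spells one decoding.
function buildLattice(code) {
  const edges = [];
  for (let from = 0; from < code.length; from++) {
    for (let len = 1; len <= 5 && from + len <= code.length; len++) {
      const symbol = BY_CODE[code.slice(from, from + len)];
      if (symbol !== undefined) {
        edges.push({ from: from, to: from + len, symbol: symbol });
      }
    }
  }
  return { states: code.length + 1, start: 0, accept: code.length, edges: edges };
}

console.log(buildLattice('.-').edges);
// three edges: 0 -> 1 (E), 0 -> 2 (A), 1 -> 2 (T)
```

The lattice has at most five outgoing edges per state, so its size is linear in the length of the input even though the language it defines is exponentially large.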

The details are easily worked out. But ask if you need more.

babou
0

Some pseudo-code for a solver that will give all possible interpretations. This is based on a few quick thoughts, so additional input would be welcome. The method accepts two inputs: the text translated so far, and the Morse code that remains to be translated.

MorseSolver (string textSoFar, string codeRemaining)
{
    if (codeRemaining length == 0) output textSoFar
    else
    {
        codeLength = length of codeRemaining
        for each prefix of 1 through min(5, codeLength) characters of codeRemaining
        {
            call an IsMorseCode method that checks whether the prefix
              is a valid Morse code
            if it is valid, append the translated character to a copy of
              textSoFar, remove the prefix from a copy of codeRemaining,
              then call MorseSolver again with the new strings
        }
    }
}

This will output all the possible combinations of letters and numbers without any spaces between "words". If you wanted to prove the ambiguity, this would certainly do it. If you wanted to get some meaningful messages out, then try looking for code meant to translate hashtags into readable language.

I wrote a program in C# that implements the above. I stopped it at 22 million possibilities for the string in the question, which can translate to hello world. The Morse code equivalent of "Hello" resulted in 20,569 possible results. I did not include the numbers; the counts would be higher if I allowed them.
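For readers without the C# program, here is a rough JavaScript sketch of the same recursion. It is not the author's code: it is written as a generator so that results are produced lazily and the search can be cut off after any number of them, and like the C# run it assumes letters only (so codewords are at most 4 symbols). The names `solve`, `take` and `BY_CODE` are invented for the example.

```javascript
const MORSE = {
  A: '.-',   B: '-...', C: '-.-.', D: '-..',  E: '.',    F: '..-.',
  G: '--.',  H: '....', I: '..',   J: '.---', K: '-.-',  L: '.-..',
  M: '--',   N: '-.',   O: '---',  P: '.--.', Q: '--.-', R: '.-.',
  S: '...',  T: '-',    U: '..-',  V: '...-', W: '.--',  X: '-..-',
  Y: '-.--', Z: '--..'
};

// Invert the table: codeword -> letter.
const BY_CODE = {};
for (const [letter, code] of Object.entries(MORSE)) BY_CODE[code] = letter;

// Lazily yield every decoding of codeRemaining, prefixed by textSoFar.
function* solve(codeRemaining, textSoFar = '') {
  if (codeRemaining.length === 0) {
    yield textSoFar;
    return;
  }
  const maxLen = Math.min(4, codeRemaining.length);
  for (let len = 1; len <= maxLen; len++) {
    const letter = BY_CODE[codeRemaining.slice(0, len)];
    if (letter !== undefined) {
      yield* solve(codeRemaining.slice(len), textSoFar + letter);
    }
  }
}

// Collect at most `limit` results, then stop pulling from the generator.
function take(iterator, limit) {
  const out = [];
  for (const item of iterator) {
    if (out.length >= limit) break;
    out.push(item);
  }
  return out;
}

console.log([...solve('.-')]);                        // [ 'ET', 'A' ]
console.log(take(solve('......-...-..---'), 3).length); // 3
```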

Raphael
Red_Shadow