Any efficient text-based steganographic schemes?

Question

Sophisticated and efficient steganographic schemes with images as cover are available. However, I wonder: are there any that use texts as cover instead?

If one could only transfer a few printable natural language texts due to constraints, using texts as the cover might be useful.

Does anyone know of such a scheme or how to design one? I want the ratio of steganographic bits to cover-text bits to be no less than 1/50. The scheme should be user-friendly and easy to implement.

[Addendum 1, edited] I have created a scheme named WORDLISTTEXTSTEGANOGRAPHY. It should achieve somewhere in the range of 0.5 to 1.0 bits per word of natural language text. The user must do some trial and error under the guidance of the software. (For example, an appropriate codebook could substantially compress the input steganographic sequence.) The latest version, 2.1, runs on Python 3.6.1.

score 8 · Answer 1 · edited May 03 '22 at 13:14

My understanding is that the three most popular approaches to "steganography using text as a cover" are:

1. Generate a completely new text by picking one word at a time from a dictionary, using the ciphertext bits to select which word.

Pick words in ways that, at first glance, look like real English sentences, using Markov chain algorithms. See, e.g.: Weihui Dai, "Text Steganography System Using Markov Chain Source Model and DES Algorithm" (2010); H. Hernan Moraldo, "An Approach for Text Steganography Based on Markov Chains" (2012); "Steganographic Markov Chains" (2008); Anna Lysyanskaya and Mira Meyerovich, "Provably Secure Steganography with Imperfect Sampling" (2006); etc.
For a given dictionary, picking words with uniform distribution gives you the most ciphertext bits per word, but it's pretty obviously not real English text. Still, these schemes are very popular for sending raw bits over systems (like email, voice, etc.) designed for text: NATO phonetic alphabet, PGP word list, the S/KEY dictionary, the Diceware dictionary, etc.
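To make the uniform word-picking variant concrete, here is a minimal sketch that maps each 4-bit nibble to one word of a 16-word list. The list `WORDS` and the function names are invented for this illustration; real lists such as the PGP word list are much larger, and nothing here makes the output look like real English.

```python
# Hypothetical 16-word list: each word carries 4 bits (one nibble).
WORDS = ["apple", "brick", "cloud", "delta", "eagle", "flame", "grape", "house",
         "ivory", "joker", "knife", "lemon", "mango", "night", "ocean", "piano"]
INDEX = {w: i for i, w in enumerate(WORDS)}

def words_encode(data: bytes) -> str:
    """Turn each byte into two words: high nibble first, then low nibble."""
    out = []
    for b in data:
        out.append(WORDS[b >> 4])
        out.append(WORDS[b & 0x0F])
    return " ".join(out)

def words_decode(text: str) -> bytes:
    """Inverse of words_encode: read the words back as nibbles and pair them up."""
    nibbles = [INDEX[w] for w in text.split()]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

With a list of 2^k words, every word carries k bits, which is why uniform selection gives the highest rate for a given dictionary.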

2. Take some fixed text, and send exactly the same series of letters as in the original text, adding the ciphertext bits by modifying the presentation of that text in ways that we hope the warden doesn't notice.

Use two slightly different typefaces as in Bacon's cipher (1605).
Add space characters to the end of lines, where they are usually invisible, as in SNOW (1996?).
Vary the spacing between words; nudge the letters left-right (nonstandard kerning) or up-down.
Etc.
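A rough sketch of the trailing-whitespace idea, under a deliberately simple assumption: one bit per line, where a single trailing space means 1 and no trailing space means 0. This is not SNOW's actual encoding, and the function names are made up for the example.

```python
def hide_bits(cover: str, bits: str) -> str:
    """Append one trailing space to a line for a 1 bit, nothing for a 0 bit."""
    # Strip existing trailing whitespace so the cover carries no stray marks.
    lines = [ln.rstrip() for ln in cover.split("\n")]
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines for this payload")
    marked = [ln + (" " if b == "1" else "") for ln, b in zip(lines, bits)]
    return "\n".join(marked + lines[len(bits):])

def reveal_bits(stego: str, n_bits: int) -> str:
    """Read one bit per line: 1 if the line ends in a space, else 0."""
    lines = stego.split("\n")[:n_bits]
    return "".join("1" if ln.endswith(" ") else "0" for ln in lines)
```

The receiver has to know `n_bits` in advance (or the payload has to carry its own length), and any system that trims trailing whitespace destroys the message.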

3. Take some fixed text, and slightly modify the words in ways that we hope the warden doesn't notice.

The receiver takes each word, looks it up in the agreed-on thesaurus; if the thesaurus shows 4 synonyms, the receiver decodes that word into 2 ciphertext bits. See, e.g.: Nanhe, Kunjir, Sakdeo, "Improved Synonym Approach to Linguistic Steganography"; Shirali-Shahreza, "A New Synonym Text Steganography"; Chang and Clark, "Practical Linguistic Steganography using Contextual Synonym Substitution and Vertex Colour Coding".
The receiver looks for certain "key words", ignoring all the other words.
The receiver decodes a few bits per sentence by noticing which of several ways (all with equivalent meaning) of arranging the words in that sentence was selected. Grothoff, Grothoff, Alkhutova, Stutsman, and Atallah, "Translation-Based Steganography" (2005).
Etc.
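The synonym variant can be sketched as follows. The two-entry `THESAURUS` is a toy stand-in for the agreed-on thesaurus, and the function names are hypothetical; the published schemes cited above also check that a synonym fits its context, which this sketch ignores.

```python
# Hypothetical shared thesaurus: each entry lists 4 interchangeable words,
# so the choice of word encodes 2 bits (its index within the entry).
THESAURUS = [
    ["big", "large", "huge", "vast"],
    ["quick", "fast", "rapid", "swift"],
]
LOOKUP = {w: (g, i) for g, group in enumerate(THESAURUS) for i, w in enumerate(group)}

def embed(cover_words, bits):
    """Swap each thesaurus word for the synonym whose index equals the next 2 bits."""
    out, pos = [], 0
    for w in cover_words:
        if w in LOOKUP and pos + 2 <= len(bits):
            group_id, _ = LOOKUP[w]
            w = THESAURUS[group_id][int(bits[pos:pos + 2], 2)]
            pos += 2
        out.append(w)
    if pos < len(bits):
        raise ValueError("payload does not fit (too few thesaurus words, or odd bit count)")
    return out

def extract(stego_words):
    """Decode 2 bits from every word that appears in the thesaurus."""
    return "".join(format(LOOKUP[w][1], "02b") for w in stego_words if w in LOOKUP)
```

Thesaurus words left over after the payload ends still decode to bits, so the receiver needs to know the payload length.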

Michael · Answer 2 · 2013-10-17T09:05:34.870

The most famous text-based steganographic scheme is the acrostic: using the first letters of words / sentences. If the mean sentence length is 15-20 words and the mean word length is 5 letters, then using one letter per sentence gives an efficiency of ~1%. You could use shorter-than-average sentences and/or words to increase the efficiency to within your bound of >2%.

Obviously this is a specific example of a class of schemes that use the nth letter of the nth word of each sentence to carry the hidden message.

This scheme certainly meets the user-friendly and not-too-hard-to-implement criteria (provided you have some imagination / a good thesaurus), but won't stand up to much cryptanalysis!

Quick thought: base-26 encode the output of a strong authenticated encryption function, then construct sentences whose first words start with the resulting letters, as above?
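A minimal sketch of the base-26 step only, assuming the ciphertext is already available as bytes (no encryption is performed here). The function names and the 0x01 marker byte are choices made for this example, not part of any standard.

```python
import string

ALPHABET = string.ascii_uppercase  # 26 letters, A=0 ... Z=25

def to_base26(data: bytes) -> str:
    """Write the bytes, read as one big-endian integer, in base 26."""
    # A 0x01 marker byte is prepended so leading zero bytes survive the round trip.
    n = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while n:
        n, r = divmod(n, 26)
        letters.append(ALPHABET[r])
    return "".join(reversed(letters))

def from_base26(text: str) -> bytes:
    """Inverse of to_base26: rebuild the integer, then drop the marker byte."""
    n = 0
    for ch in text:
        n = n * 26 + ALPHABET.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]
```

Each letter carries log2(26), about 4.7 bits, and the letters are close to uniformly distributed, so the writer will regularly have to start a sentence with Q, X or Z.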

Mok-Kong Shen · Answer 3 · 2013-02-02T10:47:10.463

In his answer, Michael mentioned a known stego scheme that uses the first characters of words/sentences as stego characters. He rightly remarked that the scheme can be practically applied ("user-friendly") only when the stego character sequence is itself natural language (i.e. not encrypted, in which case the scheme is evidently very weak), and not when the stego character sequence is a ciphertext of the actual secret message to be transmitted.

It seems to me that, with an appropriate adaptation/modification, the same classically known idea could nonetheless be usefully exploited, if one could accept a corresponding reduction in transmission efficiency. To illustrate with a concrete construction: let the 26 characters of the alphabet be suitably divided into 8 groups (in general of different sizes) such that each group contains at least one character that fairly frequently is the first character of sentences in natural language communications. Then, given any arbitrary set of 3 stego bits, the user wouldn't have too much difficulty writing a sentence that is sufficiently natural in the given context of communication and that starts with a character from the group that encodes those 3 stego bits. This way, each sentence of the covertext can transmit 3 stego bits which, though not a very high rate, is nevertheless worthy of consideration in practice IMHO. (Note that, since one exploits only one character of the words and not the entire words, one naturally has flexibility/simplicity that is hardly attainable with schemes that depend on word substitutions.)
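The receiver's side of this construction might look like the sketch below. The particular partition in `GROUPS` is an arbitrary example (each group just contains at least one reasonably common sentence-initial letter), and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical partition of the alphabet into 8 groups; group i encodes the 3 bits of i.
GROUPS = ["TAJQ", "IOXZ", "SHK", "WBV", "CMU", "DFY", "PEG", "NLR"]
GROUP_OF = {ch: i for i, g in enumerate(GROUPS) for ch in g}

def allowed_initials(three_bits: str) -> str:
    """Letters the writer may start the next sentence with to send these 3 bits."""
    return GROUPS[int(three_bits, 2)]

def decode_sentences(text: str) -> str:
    """Return 3 bits per sentence, taken from the group of its first letter."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    bits = []
    for s in sentences:
        first = next((c for c in s.upper() if c in GROUP_OF), None)
        if first is not None:
            bits.append(format(GROUP_OF[first], "03b"))
    return "".join(bits)
```

The writer calls `allowed_initials` for each 3-bit chunk and composes a fitting sentence by hand; only the decoding is mechanical.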

score 2 · Answer 4 · answered Apr 08 '15 at 22:36

I've developed one here:

https://github.com/mjethani/typo

In a nutshell, every 4 bits of the secret message is encoded as a typo in the stegotext. The value of the typo is the 4 least significant bits of the first byte of its SHA-256 hash. For example, the typo "infirmation" (information) carries the value 0xE (0b1110). The recipient simply identifies the typos and hashes them to extract the information.
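The value computation described above can be sketched in a few lines. UTF-8 encoding and hashing the typo exactly as typed (case included) are assumptions of this sketch; the linked tool may normalise its input differently, and `find_typo` is a hypothetical helper.

```python
import hashlib

def typo_value(typo: str) -> int:
    """Low 4 bits of the first byte of the SHA-256 hash of the typo."""
    digest = hashlib.sha256(typo.encode("utf-8")).digest()
    return digest[0] & 0x0F

def find_typo(candidates, nibble):
    """Return the first candidate misspelling whose value equals the wanted nibble, else None."""
    for c in candidates:
        if typo_value(c) == nibble:
            return c
    return None
```

The sender generates plausible misspellings of words in the cover text and keeps one whose value matches the next 4 bits; on average about 16 candidates are needed per nibble.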

Why is this great?

Everybody makes typing errors.

On the other hand, fake typos may fool a human, but they won't fool a machine. The main challenge is to generate typos that are resistant to all kinds of analyses.

I'm afraid this encoding scheme does not meet your criterion of 1/50. It's probably closer to 1/100.

score 2 · Answer 5 · answered Mar 25 '13 at 16:02

You could use a Cardano grille to solve the problem of steganography (not encryption). It is very difficult to identify or even detect, as long as care is taken when hand-lettering the final message. If the mask's letters are too dense, the language of the cover message needed to conform to those letters can get a bit tortuous.

Also note the grille doesn't need to be based on physical coordinates. You could achieve the same results with a list of numbers that look like dates: 3/4/16, 1/2/08, etc., which could mean: 3rd paragraph, 4th sentence, 16th letter is the first character. 1st paragraph, 2nd sentence, 8th letter is the second character. And so on.
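A small sketch of reading a message with such a date-like key. It assumes paragraphs are separated by blank lines, sentences end at '.', '!' or '?', and that "16th letter" counts letters only (not spaces or punctuation); all three are assumptions of this example, as is the function name.

```python
import re

def read_by_index(text: str, key: str) -> str:
    """Pick one letter per 'paragraph/sentence/letter' triple (all 1-based)."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    out = []
    for triple in key.split(","):
        p, s, n = (int(x) for x in triple.strip().split("/"))
        # Naive sentence split on end punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraphs[p - 1].strip())
        letters = [c for c in sentences[s - 1] if c.isalpha()]
        out.append(letters[n - 1])
    return "".join(out)
```

Both parties have to agree on the counting rules beforehand, since any ambiguity about what counts as a sentence or a letter shifts every later index.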

If your concern is cells being inspected by the warden, a random list of date-like numbers might be recognized as a code key, but an inexplicably perforated sheet of paper is also suspicious.

score 1 · Answer 6 · answered Feb 01 '13 at 19:04

You could hide information in a particular user's Wikipedia edit history. Someone following a trail of single-word edits across a range of Wikipedia articles, in chronological order, could reconstruct an otherwise hidden message. The key would be the Wikipedia user's name, and possibly a number indicating which of the (possibly k) edited words per document is significant.

A secret key that keyed a PRNG could also be used to generate the index of the relevant edited words.
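One way to derive those indices from a shared secret is sketched below. It swaps the generic "keyed PRNG" for HMAC-SHA256 over a document counter, which is a substitution made for this example; the function name and parameters are hypothetical.

```python
import hashlib
import hmac

def edit_indices(secret: bytes, n_docs: int, k: int) -> list:
    """For each of n_docs documents, derive which of its k edited words is significant."""
    out = []
    for i in range(n_docs):
        # HMAC of the document counter acts as a keyed pseudo-random value.
        mac = hmac.new(secret, i.to_bytes(4, "big"), hashlib.sha256).digest()
        # Reducing 64 bits modulo a small k leaves only a negligible bias.
        out.append(int.from_bytes(mac[:8], "big") % k)
    return out
```

Sender and receiver run the same function with the same secret, so both know which edit in each document matters, while the other k - 1 edits serve as decoys.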

The efficiency is possibly much greater than 1 bit in 50, since you know which words you are looking for: exactly those words edited by User X, in the order determined by edit time.

Another idea would be to share information over a Twitter account, where on average about 1 in every 50 tweets is part of the message, identified in some secret way.

Meler Lawler · Answer 7 · 2018-10-07T01:40:36.657

So there is this technique which allows for a cover text quite shockingly smaller than expected! Perhaps some excessive effort may come along with steganography like this; a fair criticism. Null ciphers for doing manual encryption generally do.

The above paragraph contains the hidden word STACKEXCHANGE.

To reveal it, take the first word of the above paragraph, then every 3rd word after it:

So this allows cover shockingly expected! excessive come steganography a Null doing generally

Highlight in the first of these words, and every 3rd one after it, the letter at position 11 modulo the word's length (ignoring punctuation; a remainder of 0 selects the last letter).

Highlight in the second of these words, and every 3rd one after it, the letter at position 5 modulo the word's length.

Highlight in the third of these words, and every 3rd one after it, the letter at position 25 modulo the word's length.

[S]o [t]his [a]llows [c]over shoc[k]ingly [e]xpected! e[x]cessive [c]ome steganograp[h]y [a] [N]ull doin[g] g[e]nerally

The highlighted letters construct the secret message:

StackexchaNge

The sequence 11, 5, 25 (along with the knowledge to take only every 3rd word beginning with the first) is the key, whose numbers, when translated into the letter at that position in the alphabet, become the string:

KEY

You can verify this by inputting the first paragraph at this page with key set to KEY and word spacing set to 3.
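The decoding steps can also be sketched in Python. This is my reading of the worked example, not the code behind that page: punctuation is ignored when counting letters, and a remainder of 0 is taken to mean the last letter of the word (as with "a" and "doing" above). The function name is made up.

```python
import string

def null_decode(cover: str, key: str, spacing: int) -> str:
    """Recover the hidden letters from a cover text using a letter key and word spacing."""
    # Keep every `spacing`-th word, starting with the first one.
    words = cover.split()[::spacing]
    # Key letters become 1-based alphabet positions: K=11, E=5, Y=25.
    shifts = [string.ascii_uppercase.index(c) + 1 for c in key.upper()]
    out = []
    for i, word in enumerate(words):
        letters = [c for c in word if c.isalpha()]  # ignore punctuation
        if not letters:
            continue
        k = shifts[i % len(shifts)]
        # (k - 1) % len is the 0-based form of "position k modulo the length",
        # with a remainder of 0 wrapping around to the last letter.
        out.append(letters[(k - 1) % len(letters)])
    return "".join(out)
```

Calling `null_decode` on the first paragraph of this answer with key "KEY" and spacing 3 reproduces the highlighted letters.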
