[CrackMonkey] Stegadodo

Mon Jan 29 13:30:44 PST 2001

So, I'm kind of posting this to crackmonkey because I know folks here
are into both a) dadadodo and b) weird projects.

Anyways, I've been thinking about the problem of
steganography. Steganography, as most of you know, is hiding data in
other data. For example, the following paragraph is steganographic:

        Be ever aware! Under John's ottoman lies an infuriated snake!

Sure, it's not cryptographically secure by any means, but the idea is
that there's a message encoded in another message.

Another classic example is the great program "stego", which takes a
text message and stores it in an image file. Each bit of the text
message is used to shift up or down a pixel in the image file --
imperceptible to the human eye, but easily retrievable on the
receiving end.

So, I was thinking that it'd be nice to encode text as OTHER text --
since most of the traffic on the Internet is text right now, it'd
blend right in. One way to do it is the acronym method shown in the
sentence above. Take each letter in the source text, search for a word
in /usr/dict/words that starts with that letter, use it as output.

However, it'd be hard to make the language sound natural, and it'd
submit really easily to frequency analysis (e.g., the word "xylophone"
would show up a lot more often than in normal conversation). So, how
do you make a natural-sounding text that has reasonable word
frequencies?

dadadodo, of course.

So, I think I have an algorithm for encoding and decoding data using a
shared dadadodo data file, which I want to share with you all. Here's
how it would go. First, let's say I have a dadadodo data file that
looks something like this:

((A . (B C F G H))
 (B . (C I J))
 (C . (E))
 (D . (E D B A))
 (E . (F A D B G I))
 (F . (C A B I)) 
 (G . (H I))
 (H . (B A D E F))
 (I . (E G H))
 (J . (A D E G H)))

In other words, for each word, I have mapped a list of other words
that can follow it. I think there may be some probability involved
with words but I'm going to ignore that for now. Now, let's say I want
to encode the following byte:

        01100111

Here's how the encoding would go: first, I'd pick a word at random out
of the data file. Let's say I picked "A." Well, I'd put A in my output
stream.

        A

Then, I look at the list of words that can follow A. There are 5,
which means that I can encode 2 bits of data in my choice of what word
to use next. (The number of bits I can encode is equal to the power of
the next smallest power of two less than the number of items). 

This is the key to this method -- using your choice of next word to
encode a set of bits. Since the first 2 bits of my encodable byte are
"01", I'm going to choose the zero-based index item 1, or "C". So "C"
goes in the output:

        A C

Next, I move on to C's list. C has only one follower, so I can encode
zero bits in there. So, we get a noop for encoding, but we add the
item anyways:

        A C E

Moving on to E's list, I have 6 items, so again there's 2 bits of
data, and I take the 2-indexed item, giving:

        A C E D

Continuing on in this fashion, my output string is:

        A C E D D A

Decoding is relatively straightforward, too. I know the first word is
A, so the next word, C, represents "01". E represents nothing, D
represents "10", then "01", then the A represent "11", and I've
decoded the byte.

It'd be a wordy encoding, but it'd be relatively realistic text.

~TMS

-- 
-----------------------------------------------------
  /~\  The Mighty Silverback - silverback at pigdog.org 
 C oo  
 _( ^) http://pigdog.org/ - The Online Handbook of 
/   ~\ 			    Bad People of the Future		
-----------------------------------------------------