Rob Abrazado (flatvurm) wrote,
Rob Abrazado
flatvurm

  • Mood:

You can't go home again

First off, I'll just mention that metalepticfit can safely skip this entry, since I've already related this story to him, but I figured I'd blog it, too, since I don't really have anything else to write about.

You know those little decoding puzzles? Whatever newspaper my mom gets calls them "cryptoquips," and they show up with the crossword and the jumble and stuff. Basically some cutesy phrase or something is encrypted with a simple substitution cipher, and it's up to you, the reader, to decode the ciphertext and reveal the cutesy phrase. Usually they start you off with a hint, too, like "X = T" or something to get the ball rolling.

For example, the puzzle might provide the text:
W NDAC XMC RDGOJLU WG CJL LSL. WAG'C CJBC AHLPP!

And start you off with the hint X = G. From there you have to figure out what the puzzle represents. They're kind of neat, and Momz and I used to spend some time on them back when we lived together.

Anyway. So I ran across something similar recently, and I got to thinking about how maybe a computer could help solve those things. That line of thinking got me to pondering if I could write a program to assist in brute-forcing cracks to simple substitution ciphers. And that line of thinking got me to pondering if I could still write a program at all.

See...it's been a while. Years, in fact. I haven't touched code in a serious way in quite some time now. So I figured I'd take a stab at writing this program, just to see if I could still cobble something together. And that's what I've been up to for the last couple days.

Dear God, what a disaster.

First off, I don't recommend trying to scare up a Java development environment from scratch with a crappy-ass Internet connection. I did, however, find that I had some old version of Java dev tools sitting around on my hard drive, so at least I could get something going. It just wouldn't be the newest version available. I could live with that; this was, after all, a pretty simple project. Disheartening, though, was that even though this version of Java I dug up was pretty old by modern standards, it was still way newer than the stuff I used way back in the day when I used to, you know, do this for a living. So there were actually changes in the language I had to adjust to, especially when my compiler started throwing warnings that I didn't understand at all. ("Unchecked conversion what now? Generics? What the hell is going on here?") I also found, by the way, that this activity made me talk to myself. A lot.

So anyway, I reacquainted myself enough with the language that I got a few things working. I didn't have an IDE or a debugger, but I had a text editor and a compiler, dammit, and I can make things work. Onward to victory!

Or not.

Once I had spent some time staring off into space and working out the algorithms the way I wanted, I found that I had a surprising amount of my own bone-headedness to work through before I actually got anything going. The actual ciphering mechanism was easy enough, but generating the keys turned out to be the more taxing task. See...the whole point of this exercise was to try and brute-force the decryption. That is, I would just keep trying to decrypt the text with different keys until I found one that worked. So part of this process was to generate key after key so I could attempt decryption.

I didn't want to do it randomly. There are, after all, a lot of keys, and I didn't want to have to deal with duplicates and junk. So a systematic approach seemed more in order. The keys for this thing, remember, are really just rearranged standard alphabets. So I needed a way to systematically rearrange the alphabet in every way it could be rearranged without creating duplicates. Once I got the algorithm worked out and coded, and the kinks worked out of the system, I was ready to go.

Ah, yes. Kinks. See...despite having been removed from the formal development process for quite some time, some vestiges of the practice remain with me. For example, as I was coding, I was keeping little unit tests handy so that I knew I wasn't breaking anything as I made changes. Unit tests are basically just little simple test cases so that you know if the easy stuff is getting tackled correctly, because if the easy stuff isn't working, then you sure as hell know that the hard stuff isn't working. One of these tests was generating sets of keys over smaller alphabets than the real one, just to make sure the algorithm was doing what I wanted it to do. Also I had a way set up to validate keys, just so I knew I wasn't creating nonsense in my output. I set up automated validation because it was very quickly going to become a pain in the ass to eyeball everything. For some reason, when I ran the unit tests on small data sets, say alphabets only three or four letters long, everything ran fine. When I tried it out with the full 26-letter alphabet, I kept generating invalid keys with duplicate letters in them. How could this be possible? How could code that worked fine over four iterations fail over larger iterations? This was not a complex transformation, and I just couldn't see how the number of letters in my alphabets could affect the validity of the keys being generated.

Want to know how? It can't. There wasn't a bug in the code, as I found out after a too-long and too-frustrating spate of debugging. The problem was that when I was running the key generator against the entire alphabet, I had made a typo when I entered the letters. Yes, that's right, everybody. I misspelled the alphabet. I am awesome. Even more awesome is that, later on in the process, I did it again. I not only misspelled the alphabet a second time, but I did it in the exact same way as the first one. And this wasn't a cut-and-paste error; I actually mis-typed the alphabet the exact same way two times, placing a second Q where the W should have been. Unreal.

Anyway. So I waded through all that usual programming fun, and finally, I was ready to rock. I knew that for the actual brute-forcing, I probably wouldn't need every possible key. Most keys probably wouldn't even make sense to use in one of these puzzles. Just for completeness, though, I felt like generating a complete master list anyway, just so I could have it around. It would be easier, I reasoned, to generate the list once and then pick keys I needed from the list, rather than generate keys over and over again for different projects. So I set the thing up to generate all the possible keys and dump them into a file I could use later. At some level, I knew that there were a large number of possible keys available, but it was just a simple operation, right? I figured that, when all was said and done, it would be a reasonable amount to deal with.

I was so, so...so wrong.

So wrong.

I started the key generator running, and I had it dumping its output into a text file. I did this through the OS with output redirection; I didn't code file access into my program itself. Every now and then I'd peek at the text file just to make sure everything was going well, and it was. Also I took a peek at how big the file was getting. After not too much time, it was already in the hundreds of megabytes. "Wow," I though, "this thing is really hustling." I puttered around doing other stuff for a while, and I finally decided that, based on the progress I'd seen so far, this was going to take a while to complete, so I did one last peek at the results file and then left the thing running and went to bed.

I woke up probably about five hours later to get up and use the bathroom. When I got back from the bathroom, I did a quick peek at the results file again. Wait a minute...those keys look familiar. I scrolled up to the last look I took at the file and, sure enough, the results hadn't changed. The program hadn't exited yet, so I didn't think it had finished. Did it hang somehow? Get stuck? A quick look at my CPU meter said that the thing was still running. Hmm, how strange. I wonder how big the results file is...

Four gigs. What. The. FUCK.

Well...that was...slightly higher than I expected. Not only that, but I'm pretty sure that's as big a file as my file system is going to be able to handle. So here's a little lesson, everybody; learn from my mistakes. Apparently, if you have output redirecting to a text file in Windows and that text file grows to four gigabytes in size, output will apparently stop accumulating there. Furthermore, no mention will be made of this to you. Unfortunate.

So there I was, with a four gigabyte text file filled with various arrangements of the alphabet, and apparently many, many hours of generated keys that were just thrown away somewhere, never to be seen again. Oh, well, I've got time. I figured the four gig thing was just an OS limitation, and I would have to write file output into the program itself so I could manage the files and chunk them into different sizes and stuff. Grumbling at this turn of events, I also paused to consider how after maybe six hours, I still apparently hadn't run though the entirety of the key set I was looking at. I then started to wonder just how many keys I was talking about here. I started to run the numbers.

So let's take a step back, here, and think about what we're doing. What I'm effectively doing is trying to list all the different ways you can arrange the letters of the alphabet. Because I'm dealing with specific orders, here, I'm dealing with sequences, and not sets. The combinatorics-minded among you recognize that I'm dealing with permutations here. The number of possible permutations of 26 letters is 26!, or 26 factorial. That's 26 times 25 times 24 ... all the way down to 1. In the back of my head, I knew that it was a big number. I'm kinda bad with big numbers, it turns out.

It is a friggin' huge number. When I actually sat down and ran the calculation, it turns out I'm looking at something like 400 septillion different keys. That's 4 x 1026. That is a lot of keys. How big a list is that? Well, each key is 26 bytes long. If want to dump them all into a file, it turns out I would need upwards of 10 quadrillion terabyte hard drives to hold that information. Yes. Quadrillion. Think of the rack space. Don't even talk to me about backups.

Somewhat dazed, I realized that this was slightly more storage capacity than I had available to me.

Because I am here to entertain and educate, I will now try to put these numbers in perspective. Let's say I could generate, oh, a billion keys every second. That's 1,000,000,000 keys. Every second. I can in no way meet that speed (I get about a million keys every three seconds), but let's just say. If I could generate a billion keys a second, I could probably be holding a completed list of keys right now...if I had started my key generation at the big bang. That's how many keys there are. Amazing how I figured that my program would finish running overnight.

So. Overall, as an intellectual exercise, this project was kinda fun. It was at least a little engaging for a while. For self-grading, I'm going to give myself a nominal failure in programming, but I think I did manage to score an epic fail in common sense. Go factorials!

[EDIT: Thanks to fiendess for pointing out that I made an some math errors before, which have now been corrected. Another fail! But I score a win for smart friends. Go smart friends!]
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 14 comments