Improved Caesar-like ciphers (2024)

Subsections

The Vignère cipher
One-time pads
Multi-character alphabets

Certainly the Caesar cipher offers no cryptographic security at all: if you know the alphabet the message was encoded in, you need only guess one character to crack the code. Even if you don't know the alphabet, guessing the correspondence is not very hard with a little patience.

In this section, we will discuss a few approaches to improving the security, while retaining the basic idea of character shifting.

The Vignère cipher

One way to make a Caesar cipher a bit harder to break is to use different shifts at different positions in the message. For example, we could shift the first character by 25, the second by 14, the third by 17, and the fourth by 10. Then we repeat the pattern, shifting the fifth character by 25, the sixth by 14, and so on, until we run out of characters in the plaintext. Such a scheme is called a Vignère cipher,4.14 which was first used around 1600 and was popularly believed to be unbreakable.4.15 It is a polyalphabetic substitution cipher, because several different substitutions are made depending on the position of the character within the text.

In our first example, the key consists of the four shifts [25, 14, 17, 10], which are the numerical equivalents of the string ``ZORK'' in a 26-letter alphabet consisting of the letters A-Z. It is common practice to think of our key as plaintext letters, rather than their numerical equivalents, but either will do. We can encode the string ``CRYPTOGRAPH'' as

C	R	Y	P	T	O	G	R	A	P	H
+25	+14	+17	+10	+25	+14	+17	+10	+25	+14	+17
B	F	P	Z	S	C	X	B	Z	D	Y

Note that in the above, the letter ``R'' in the plaintext encodes to both ``F'' and ``B'' in the crypttext, depending on its position. Similarly, the two ``Z''s in the crypttext come from different plain characters.
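The table above can be checked with a few lines of Maple. This is only a sketch on the 26-letter alphabet A-Z: toNums and toStr are hypothetical helpers written for this check (they are not the StringToList and ListToString routines from §4), and they assume the text contains only the capital letters A-Z.

```maple
# Hypothetical helpers for the alphabet A-Z only: "A" -> 0, ..., "Z" -> 25.
toNums := s -> map(c -> c - 65, convert(s, bytes)):
toStr  := L -> convert(map(n -> n + 65, L), bytes):

plain := toNums("CRYPTOGRAPH"):
key   := toNums("ZORK"):          # the shifts [25, 14, 17, 10]

# Shift position i by key entry ((i-1) mod 4) + 1, then reduce mod 26.
coded := [seq(modp(plain[i] + key[modp(i-1, nops(key)) + 1], 26),
              i = 1..nops(plain))]:
toStr(coded);                     # "BFPZSCXBZDY", the last row of the table
```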

Now we implement this in Maple. This is very similar to the Caesar cipher, just with the extra complication of multiple shifts, and letting our key be a string.

First, we set our Alphabet to the usual one. We also use the conversion functions from §4.

>  Alphabet := cat("", Select(IsPrintable,convert([seq(i,i=1..255)],bytes))):

>  Vignere := proc(plaintext::string, key::string)
     local textnum, codenum, i, p, offsets, keylen;
     global Alphabet;
     p := length(Alphabet);
     offsets := StringToList(key);
     keylen := length(key);
     textnum := StringToList(plaintext);
     codenum := [seq( modp(textnum[i] + offsets[modp(i-1,keylen)+1], p),
                      i=1..length(plaintext)) ];
     ListToString(codenum);
   end:

To try it out, we'll use the same text as in the previous section. Notice how much harder it is to pick out the word boundaries in the resulting ciphertext.

>  coded:=Vignere(text,"Prufrock");

We can make the decoding function from the original (let's call it unVignere) by changing exactly one + sign to a -.4.16 We omit the change here so perhaps you will figure it out for yourself. But we will test it, to show you that it does work.

>  printf(unVignere(coded,"Prufrock"));

Even though this scheme looks quite daunting, it is not so very hard to crack if you use a computer or have a very large supply of perseverance. If we know that the key is of a certain length, say 4, and our plaintext is sufficiently long, then we can perform frequency analysis on every fourth letter: the characters in positions 1, 5, 9, and so on were all shifted by the same amount, so they form an ordinary Caesar cipher, and likewise for the other three starting positions. Even if we don't know the key length, it is not too hard to write a computer program to try all the lengths less than, say, 10, and pick the one that looks best.
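One way to make ``looks best'' concrete is sketched below. It uses the index of coincidence (the chance that two characters picked at random from a list are equal), which is our choice of measure here rather than anything prescribed above. The names IndexOfCoincidence and ColumnScore are hypothetical, and the last two lines assume that StringToList from §4 and the ciphertext coded are already defined.

```maple
# Index of coincidence of a list L of numeric codes: the probability that
# two entries drawn at random (without replacement) are equal.
IndexOfCoincidence := proc(L::list)
  local n, v;
  n := nops(L);
  if n < 2 then return 0 end if;
  add(numboccur(L, v)*(numboccur(L, v) - 1), v in convert(L, set)) / (n*(n-1));
end proc:

# Average the index over the m "columns" L[j], L[j+m], L[j+2m], ...
# If m is the key length, each column was shifted by a single key character.
ColumnScore := proc(L::list, m::posint)
  local j, t;
  add(IndexOfCoincidence([seq(L[j + m*t], t = 0..floor((nops(L) - j)/m))]),
      j = 1..m) / m;
end proc:

# Score every candidate key length from 1 to 10.
codes := StringToList(coded):
seq([m, evalf(ColumnScore(codes, m))], m = 1..10);
```

For a wrong length, each column mixes several shifts and its letter frequencies flatten out, so the score drops. The correct length (and its multiples) should stand out with a noticeably larger score, provided the text is long enough.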

One-time pads

Note that the longer the key is in the Vignère cipher, the harder it is to break. If the key were as long as the text, then it might seem at first that analyzing the frequency of letters in the encrypted text would be of no help, since each letter would be shifted by a different amount. This is almost true. If the key is a passage of text in English, then the shifts will occur with a predictable frequency. Of course, the problem gets very difficult, but cryptanalysts are persistent people.

But what if there were no predictability within the key, having the shifts come at random? The result (a Vignère cipher with an infinitely long, completely random key) is a cryptosystem that cannot be broken. Since the shifts are random, if you manage to decipher part of the message, this gives you no clue about the rest. Furthermore, any plaintext of a given length can encrypt to any ciphertext (with a different key, of course). For example, the ciphertext ``=5nwhn KDNO?uWTBC-XA'' might have come from the phrase ``Let's have dinner.'' (with the key ``pOyntmbbXYtrjSTGe1''), or it might be the encryption of ``Attack at midnight'' with the key ``{@y4#!Jbz>&moSYEoL''. Since any message could encrypt to any other, there is no way to break such a code unless you know the key.

But that is the problem: the key is infinitely long. Infinitely long, truly random sequences of numbers tend to be somewhat unwieldy. And to decode the message, you must know what random sequence the message was encoded with.

Such a system is called a one-time pad, and was used regularly by spies and military personnel. Agents were furnished with codebooks containing pages and pages of random characters. Then the key to the encryption is given by the page on which to begin. It is, of course, important that each page be used only once (hence the name ``one-time pad''), because otherwise if a codebreaker were able to intercept a message and (via some other covert means) its corresponding translation, that could be used to decipher messages encoded with the same page. This sort of setup makes sense if an agent in the field is communicating with central command (but not with each other). Each agent could be given his own codebook (so that if he is captured, the whole system is not compromised), and he uses one page per message. Central command has on file the books for each agent.
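The claim that any plaintext can encrypt to any ciphertext of the same length is easy to test: the required key is just the character-by-character difference of the two. The sketch below works on the 26-letter alphabet A-Z for simplicity; toNums, toStr, and KeyFor are hypothetical helper names, not routines from the text, and the sample strings are our own.

```maple
# Hypothetical helpers for the alphabet A-Z only: "A" -> 0, ..., "Z" -> 25.
toNums := s -> map(c -> c - 65, convert(s, bytes)):
toStr  := L -> convert(map(n -> n + 65, L), bytes):

# The key that shifts `plain` onto `cipher` (both of the same length):
# key[i] = cipher[i] - plain[i] mod 26.
KeyFor := (plain, cipher) ->
  toStr(zip((c, m) -> modp(c - m, 26), toNums(cipher), toNums(plain))):

KeyFor("ATTACK", "QWERTY");       # "QDLRRO"
KeyFor("DINNER", "QWERTY");       # "NOREPH"
```

Both plaintexts produce the same ciphertext ``QWERTY'' under a suitable key, so the ciphertext alone cannot tell an eavesdropper which message was sent.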

A variation on this theme is the Augustus cipher,4.17 where instead of a random sequence of shifts, a phrase or passage from a text which is as long as the plaintext is used. The trouble with this is that, because of the regularities in the key, a statistical analysis of the crypttext allows one to break the cipher.

Another issue is that to be truly unbreakable, the random sequence must be truly random, with no correlation among the characters. This is harder than it sounds: real randomness is hard to come by. If the random sequence has some predictability, the resulting stream can be attacked. A number of attacks on cryptosystems have been made not by breaking the encryption scheme directly, but because the underlying random-number generator was predictable.

Multi-character alphabets

We can also improve security a bit by treating larger chunks of text as the characters of our message. For example, if we start with the usual 26-letter alphabet A-Z, we can turn it into a 676-letter alphabet by treating pairs of letters as a unit (such pairs are called digraphs), or a 26³-letter alphabet by using trigraphs, or triples of letters. This makes frequency analysis much harder, and is quite easy to combine with the other cryptosystems already discussed. We will use 99-graphs on a 256-letter alphabet (the ASCII code) when we implement the RSA cryptosystem in §11.2. While frequency analysis is still possible (charts of digraph frequencies are readily available, trigraphs less so), the analysis is much more complex.

To convert the digraph ``HI'' to an integer (using a length 26² alphabet of digraphs), one simple way is to just treat it as a base-26 number. That is, ``HI'' becomes 7×26 + 8 = 190, assuming the correspondence A=0, ..., H=7, I=8. To convert back, we look at the quotient and remainder when dividing by 26. For example, 300 = 26×11 + 14, yielding ``LO''.

Either we can do this arithmetic ourselves directly, or we can use the convert(,base) command. This command takes a list of numbers in one base and converts it to another. One slightly confusing fact is that in this conversion, the least significant figure comes first in the list,4.21 instead of the usual method of writing numbers with the most significant digit first.

For example, 128 = 1×10² + 2×10¹ + 8×10⁰ would be written as [8,2,1]. To convert this to base 16, we would note that 128 = 8×16¹ + 0×16⁰, so in base 16, it is written as 80.
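The arithmetic above can be reproduced directly in Maple. In this sketch, DigraphToInt and IntToDigraph are hypothetical helper names, and they assume capital letters A-Z with A=0.

```maple
# "HI" as a base-26 number, most significant letter first (H=7, I=8).
7*26 + 8;                         # 190

# Quotient and remainder on division by 26 recover the two letters of 300.
iquo(300, 26), irem(300, 26);     # 11, 14, that is, "L" and "O"

# The same conversions on strings (byte value 65 is "A").
DigraphToInt := s -> (convert(s, bytes)[1] - 65)*26 + (convert(s, bytes)[2] - 65):
IntToDigraph := n -> convert([iquo(n, 26) + 65, irem(n, 26) + 65], bytes):

DigraphToInt("HI");               # 190
IntToDigraph(300);                # "LO"
```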

Doing this calculation in Maple, we have

>  convert([8,2,1], base, 10, 16);

Below is one way to implement the conversion of text to numeric values for k-graphs. We assume our usual functions StringToList and ListToString are defined (see §4), as well as the global Alphabet. The routine below converts text into a list of integers, treating each block of k letters as a unit. A block of k characters c₁c₂c₃...c_k is assigned the numeric value x₀ + x₁p + x₂p² + ... + x_{k-1}p^{k-1}, where p is the length of the alphabet and x_i is the numeric equivalent of c_{i+1} assigned by StringToList.

>  StringToKgraph := proc(text::string, k::posint)
     local p;
     global Alphabet;
     p := length(Alphabet);
     convert(StringToList(text), base, p, p^k);
   end:

>  KgraphToString := proc(numlist::list(nonnegint), k::posint)
     local p;
     global Alphabet;
     p := length(Alphabet);
     ListToString( convert(numlist, base, p^k, p));
   end:

In the examples below, we are using our usual 97-character alphabet. Of course, this will work on any alphabet, although the specific numbers will differ.

>  StringToKgraph("two by two",2);

In our alphabet, ``t'' is character number 86 and ``w'' is number 89, so the digraph ``tw'' encodes as 86 + 89×97 = 8719 (remember we are using a little-endian representation). Similarly, ``o '' (the letter followed by a space) gives 2×97 + 81, and so on. Notice that the two occurrences of ``two'' give different numbers, because one begins on an even character and the other starts on an odd one. The first corresponds to the digraphs ``tw'' and ``o '', while the second is `` t'' and ``wo''.
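The first two digraph values can be checked without the helper routines by feeding the numeric codes straight to convert. The codes 86, 89, and 81 are the ones quoted above; the code 2 for the space is inferred from the expression 2×97 + 81 and is an assumption about the alphabet.

```maple
# Codes of "t", "w", "o", " " regrouped from base 97 into base 97^2,
# least significant character first.
convert([86, 89, 81, 2], base, 97, 97^2);   # [8719, 275]

86 + 89*97;                                 # 8719, the digraph "tw"
81 + 2*97;                                  # 275, the digraph "o "
```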

We can also encode with 4-graphs, if we like.

>  StringToKgraph("two by two",4);

One advantage (for us) of Maple's use of the little-endian order is that we needn't worry whether the length of our text is divisible by k. In the above example, the last 4-graph is ``wo'', which is encoded as though it had two more characters on the end with numeric code of 0. The disadvantage of this is that if our text ends with the character with code 0 (in our standard 97-character alphabet, this is a newline), that character will be lost.

Another way to treat multiple characters together is to think of them as vectors. For example, the digraph ``by'' might correspond to the vector [68,91]. We will treat this approach in §8.
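The padding behaviour can be explored with the numeric codes alone. This is a small experiment rather than a definitive statement of what convert prints: we expect the first two commands to return the same one-element list, since trailing code-0 characters contribute nothing to the value 89 + 81×97.

```maple
# The final 4-graph "wo" (codes 89 and 81) on its own ...
convert([89, 81], base, 97, 97^4);

# ... and followed by two characters with code 0.
convert([89, 81, 0, 0], base, 97, 97^4);

# Decoding the value gives back only the two nonzero codes.
convert(89 + 81*97, base, 97);              # [89, 81]
```

Because the two encodings coincide, nothing in the number records whether the trailing code-0 characters were present, which is why a final newline is lost.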

Footnotes

... cipher4.14: This cipher takes its name after Blaise de Vignère, although it is actually a corruption of the one he introduced in 1585. Vignère's original cipher changed the shift amount for each letter based on the result of the last encoding, and never repeated. This scheme is much harder to break. However, its lack of popularity was probably due to the fact that a single error renders the rest of the message undecipherable. More details can be found in [Kahn].
...unbreakable.4.15: In fact, as late as 1917, this cipher was described as ``impossible of translation'' in a respected journal (Scientific American), even though the means to break it had been well known among cryptographers for at least 50 years.
....4.16: Note that we could also use the Vignere routine, but with the inverse of the key. For example, in the preceding example, the inverse of ``Prufrock'' is ``M+(7+.:2'': the numeric code of P plus the numeric code of M is 97 (the length of the alphabet), similarly for r and +, u and (, and so on.
... cipher,4.17: Sometimes a Caesar cipher with a shift of +1 is also called an ``Augustus Cipher'', even though these are very different ciphers.
... pad.4.18: Technically speaking, this is not a one-time pad, but a one-time stream. The distinction is subtle, and we will ignore it here.
... key.4.19: Pseudo-random number generators appropriate for cryptography are rare. Most implementations (including Maple's) are good enough for everyday use, but not enough to be cryptographically secure. By analyzing the output of a typical random number generator, a good cryptanalyst can usually determine the pattern. For example, Maple's rand function (and that of most computer languages, such as C, Fortran, and Java) gives the result of an affine sequence of numbers, reduced to some modular base. That is, x_i = a x_{i-1} + b and s_i = x_i mod n for some fixed choices of a, b, and n. In this setting, the seed is x₀. We shall ignore the problem that this sequence is guessable, but if you want real security, you cannot.
..._seed.4.20: We can choose a ``random'' seed (based on the computer's clock) using the function randomize().
... list,4.21: Such a representation is called little-endian, as opposed to a big-endian one. Some computers represent numbers internally in big-endian format (Sun SPARC, Power Macintosh), and others use a little-endian representation (those using Intel processors, which is what MS-Windows and most versions of Linux run on). This name comes from Gulliver's Travels, where in the land of Blefuscu there is war between the big-endians (who eat their eggs big end first), and the little-endians (who eat eggs starting from the little end).


Translated from LaTeX by Scott Sutherland
2002-08-29