summaryrefslogtreecommitdiff
path: root/ss2017/cryptography/Notes01.txt
blob: 75c4cc691df433a679d8c4b447f0eeb49b04cb32 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
Throughout the notes the acronym KL stands for the book "Introduction
to Modern Cryptography" by Jonathan Katz and Yehuda Lindell, Chapman &
Hall, 2008.

Set braces {} may be used to enclose a subscript consisting of more
than one letter. E.g x_{10} is "x sub ten". Sometimes we omit the
underscore vor subscripting and write x1 or xi for x_1 and x_i.


1 Overview, Historic Ciphers:  Caesar, Vigenere,  (KL1.3)
---------------------------------------------------------


The Shift Cipher
- - - - - - - - -

Identify the 26 upper case alphabet letters with numbers 0..25,
e.g. A=0, B=1.  Choose an integer k:{0..25} as "key" and encrypt a
letter x as x+k mod 26. Encrypt a text as the sequence of the
encryption of its letter.  For example, if k=3 then the text
BEGINTHEATTACKNOW is encrypted as EHJLQWKHDWWDFNQRZ.  The decryption
of a letter x is just x-k mod 26 and the decryption of a text is
obtained by applying this procedure letterwise.

The special case k=3 is known as "Caesar Cipher" since purportedly
Julias Caesar used it to encrypt some of his letters. The special case
k=13 is known as ROT-13.

There are two simple attacks on a shift cipher. The first one uses the
fact that there are only 26 possible keys so one can try them all
out. This, however, requires that one can recognize the right
plaintext, e.g. because it is correct English.

The second is based on frequencies of letters. In a standard English
text the letters occur with typical frequencies. E.g. E is the most
frequent letter with likelihood 12.7%. Then follows T with 9.0% and A
with 8.2% etc. Thus, we can count the relative frequencies of the
letters in the ciphertext and thus work out the key.

An improved and more systematic attack based on frequencies goes as
follows: Let p_i stand for the frequency of letter i, e.g. p_4 = 12.7%
since E=4. One has

                         sum_{i=0..25} p_i^2 =~= 0.065

So, if q_i is the frequency of the letter i *in the ciphertext* then
we expect q_{i+k} to be close to p_i so that we can recover k by
choosing j in such a way that the following quantity I_j is as close to
0.065 as possible. 

                     I_j := sum_{i=0..25} p_i * q_{i+j mod 26}



The Vigenere Cipher
- - - - - - - - - -

The shift cipher suffers from two problems:

1) the number of possible keys is too small
2) the frequency distribution of letters in the plain text shows in
   the ciphertext

A famous historic cipher not suffering from the first problem is the
Vigenere cipher. Here one selects a word or a short text as key,
writes the key repeatedly under the plaintext and then "adds" letters
one above another (as numbers mod 26). Here is an example with key BEADS:

THEMANANDTHEWOMANRETRIEVEDTHELETTERFROMTHEPOSTOFFICE
BEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBE
ULEPSOENGLIIWREBRRHLSMEYWEXHHDFXTHJGVOPLIIPRKUSFIADI

Alas, the frequency distribution of the plaintext still seeps through
and we can use this information to break the cipher without knowing
the key:

If we know the length of the key, e.g. 5, and the ciphertext is x_0
x_1 x_2 ...  then we can apply the previous frequency analysis (with
the I_j scores) to the 5 texts

x_0 x_5 x_10 ...
x_1 x_6 x_11 ...
...
x_4 x_9 x_14 ...

all of which are obtained by encrypting the corresponding plaintext
fragments with a shift cipher. Therefore, their frequency spectrum
should be a shift of the standard one and the corresponding shift,
hence the individual letters of the key, can be retrieved.

In order to get the key length we can follow a similar approach. For a
given shift t let q_i be the relative frequency of letter i (e.g. E
when i=4) in the sequence x_t, x_2t, x_3t. If the shift t is the
actual key length then this subsequence will have the frequency
spectrum as English plaintexts just shifted by t so we expect that

            J_t := sum_{i=0..25} q_i^2 =~= 0.065

since the summation is invariant under the any shift. If, however, the
shift was the wrong one then we expect the letters to appear
completely random so that in that case the above sum is rather close
to 1/26 = 0.038. Thus, we compute J_t for t=1,2,3,... and pick the t
for which J_t is largest (and close to 0.065).

Other historic ciphers include monoalphabetic substitution (key =
arbitrary permutation of the 26 letters), Grand Chiffre, the German
Enigma, etc. All these historic ciphers have been broken. See
Wikipedia and other online resources.