summaryrefslogtreecommitdiff
path: root/ss2017/cryptography/Notes01.txt
diff options
context:
space:
mode:
Diffstat (limited to 'ss2017/cryptography/Notes01.txt')
-rw-r--r--ss2017/cryptography/Notes01.txt108
1 files changed, 108 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes01.txt b/ss2017/cryptography/Notes01.txt
new file mode 100644
index 0000000..75c4cc6
--- /dev/null
+++ b/ss2017/cryptography/Notes01.txt
@@ -0,0 +1,108 @@
1Throughout the notes the acronym KL stands for the book "Introduction
2to Modern Cryptography" by Jonathan Katz and Yehuda Lindell, Chapman &
3Hall, 2008.
4
5Set braces {} may be used to enclose a subscript consisting of more
6than one letter. E.g x_{10} is "x sub ten". Sometimes we omit the
7underscore vor subscripting and write x1 or xi for x_1 and x_i.
8
9
101 Overview, Historic Ciphers: Caesar, Vigenere, (KL1.3)
11---------------------------------------------------------
12
13
14The Shift Cipher
15- - - - - - - - -
16
17Identify the 26 upper case alphabet letters with numbers 0..25,
18e.g. A=0, B=1. Choose an integer k:{0..25} as "key" and encrypt a
19letter x as x+k mod 26. Encrypt a text as the sequence of the
20encryption of its letter. For example, if k=3 then the text
21BEGINTHEATTACKNOW is encrypted as EHJLQWKHDWWDFNQRZ. The decryption
22of a letter x is just x-k mod 26 and the decryption of a text is
23obtained by applying this procedure letterwise.
24
25The special case k=3 is known as "Caesar Cipher" since purportedly
26Julias Caesar used it to encrypt some of his letters. The special case
27k=13 is known as ROT-13.
28
29There are two simple attacks on a shift cipher. The first one uses the
30fact that there are only 26 possible keys so one can try them all
31out. This, however, requires that one can recognize the right
32plaintext, e.g. because it is correct English.
33
34The second is based on frequencies of letters. In a standard English
35text the letters occur with typical frequencies. E.g. E is the most
36frequent letter with likelihood 12.7%. Then follows T with 9.0% and A
37with 8.2% etc. Thus, we can count the relative frequencies of the
38letters in the ciphertext and thus work out the key.
39
40An improved and more systematic attack based on frequencies goes as
41follows: Let p_i stand for the frequency of letter i, e.g. p_4 = 12.7%
42since E=4. One has
43
44 sum_{i=0..25} p_i^2 =~= 0.065
45
46So, if q_i is the frequency of the letter i *in the ciphertext* then
47we expect q_{i+k} to be close to p_i so that we can recover k by
48choosing j in such a way that the following quantity I_j is as close to
490.065 as possible.
50
51 I_j := sum_{i=0..25} p_i * q_{i+j mod 26}
52
53
54
55The Vigenere Cipher
56- - - - - - - - - -
57
58The shift cipher suffers from two problems:
59
601) the number of possible keys is too small
612) the frequency distribution of letters in the plain text shows in
62 the ciphertext
63
64A famous historic cipher not suffering from the first problem is the
65Vigenere cipher. Here one selects a word or a short text as key,
66writes the key repeatedly under the plaintext and then "adds" letters
67one above another (as numbers mod 26). Here is an example with key BEADS:
68
69THEMANANDTHEWOMANRETRIEVEDTHELETTERFROMTHEPOSTOFFICE
70BEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBE
71ULEPSOENGLIIWREBRRHLSMEYWEXHHDFXTHJGVOPLIIPRKUSFIADI
72
73Alas, the frequency distribution of the plaintext still seeps through
74and we can use this information to break the cipher without knowing
75the key:
76
77If we know the length of the key, e.g. 5, and the ciphertext is x_0
78x_1 x_2 ... then we can apply the previous frequency analysis (with
79the I_j scores) to the 5 texts
80
81x_0 x_5 x_10 ...
82x_1 x_6 x_11 ...
83...
84x_4 x_9 x_14 ...
85
86all of which are obtained by encrypting the corresponding plaintext
87fragments with a shift cipher. Therefore, their frequency spectrum
88should be a shift of the standard one and the corresponding shift,
89hence the individual letters of the key, can be retrieved.
90
91In order to get the key length we can follow a similar approach. For a
92given shift t let q_i be the relative frequency of letter i (e.g. E
93when i=4) in the sequence x_t, x_2t, x_3t. If the shift t is the
94actual key length then this subsequence will have the frequency
95spectrum as English plaintexts just shifted by t so we expect that
96
97 J_t := sum_{i=0..25} q_i^2 =~= 0.065
98
99since the summation is invariant under the any shift. If, however, the
100shift was the wrong one then we expect the letters to appear
101completely random so that in that case the above sum is rather close
102to 1/26 = 0.038. Thus, we compute J_t for t=1,2,3,... and pick the t
103for which J_t is largest (and close to 0.065).
104
105Other historic ciphers include monoalphabetic substitution (key =
106arbitrary permutation of the 26 letters), Grand Chiffre, the German
107Enigma, etc. All these historic ciphers have been broken. See
108Wikipedia and other online resources.