diff options
Diffstat (limited to 'ss2017/cryptography/Notes01.txt')
-rw-r--r-- | ss2017/cryptography/Notes01.txt | 108 |
1 files changed, 108 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes01.txt b/ss2017/cryptography/Notes01.txt new file mode 100644 index 0000000..75c4cc6 --- /dev/null +++ b/ss2017/cryptography/Notes01.txt | |||
@@ -0,0 +1,108 @@ | |||
1 | Throughout the notes the acronym KL stands for the book "Introduction | ||
2 | to Modern Cryptography" by Jonathan Katz and Yehuda Lindell, Chapman & | ||
3 | Hall, 2008. | ||
4 | |||
5 | Set braces {} may be used to enclose a subscript consisting of more | ||
6 | than one letter. E.g x_{10} is "x sub ten". Sometimes we omit the | ||
7 | underscore vor subscripting and write x1 or xi for x_1 and x_i. | ||
8 | |||
9 | |||
10 | 1 Overview, Historic Ciphers: Caesar, Vigenere, (KL1.3) | ||
11 | --------------------------------------------------------- | ||
12 | |||
13 | |||
14 | The Shift Cipher | ||
15 | - - - - - - - - - | ||
16 | |||
17 | Identify the 26 upper case alphabet letters with numbers 0..25, | ||
18 | e.g. A=0, B=1. Choose an integer k:{0..25} as "key" and encrypt a | ||
19 | letter x as x+k mod 26. Encrypt a text as the sequence of the | ||
20 | encryption of its letter. For example, if k=3 then the text | ||
21 | BEGINTHEATTACKNOW is encrypted as EHJLQWKHDWWDFNQRZ. The decryption | ||
22 | of a letter x is just x-k mod 26 and the decryption of a text is | ||
23 | obtained by applying this procedure letterwise. | ||
24 | |||
25 | The special case k=3 is known as "Caesar Cipher" since purportedly | ||
26 | Julias Caesar used it to encrypt some of his letters. The special case | ||
27 | k=13 is known as ROT-13. | ||
28 | |||
29 | There are two simple attacks on a shift cipher. The first one uses the | ||
30 | fact that there are only 26 possible keys so one can try them all | ||
31 | out. This, however, requires that one can recognize the right | ||
32 | plaintext, e.g. because it is correct English. | ||
33 | |||
34 | The second is based on frequencies of letters. In a standard English | ||
35 | text the letters occur with typical frequencies. E.g. E is the most | ||
36 | frequent letter with likelihood 12.7%. Then follows T with 9.0% and A | ||
37 | with 8.2% etc. Thus, we can count the relative frequencies of the | ||
38 | letters in the ciphertext and thus work out the key. | ||
39 | |||
40 | An improved and more systematic attack based on frequencies goes as | ||
41 | follows: Let p_i stand for the frequency of letter i, e.g. p_4 = 12.7% | ||
42 | since E=4. One has | ||
43 | |||
44 | sum_{i=0..25} p_i^2 =~= 0.065 | ||
45 | |||
46 | So, if q_i is the frequency of the letter i *in the ciphertext* then | ||
47 | we expect q_{i+k} to be close to p_i so that we can recover k by | ||
48 | choosing j in such a way that the following quantity I_j is as close to | ||
49 | 0.065 as possible. | ||
50 | |||
51 | I_j := sum_{i=0..25} p_i * q_{i+j mod 26} | ||
52 | |||
53 | |||
54 | |||
55 | The Vigenere Cipher | ||
56 | - - - - - - - - - - | ||
57 | |||
58 | The shift cipher suffers from two problems: | ||
59 | |||
60 | 1) the number of possible keys is too small | ||
61 | 2) the frequency distribution of letters in the plain text shows in | ||
62 | the ciphertext | ||
63 | |||
64 | A famous historic cipher not suffering from the first problem is the | ||
65 | Vigenere cipher. Here one selects a word or a short text as key, | ||
66 | writes the key repeatedly under the plaintext and then "adds" letters | ||
67 | one above another (as numbers mod 26). Here is an example with key BEADS: | ||
68 | |||
69 | THEMANANDTHEWOMANRETRIEVEDTHELETTERFROMTHEPOSTOFFICE | ||
70 | BEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBE | ||
71 | ULEPSOENGLIIWREBRRHLSMEYWEXHHDFXTHJGVOPLIIPRKUSFIADI | ||
72 | |||
73 | Alas, the frequency distribution of the plaintext still seeps through | ||
74 | and we can use this information to break the cipher without knowing | ||
75 | the key: | ||
76 | |||
77 | If we know the length of the key, e.g. 5, and the ciphertext is x_0 | ||
78 | x_1 x_2 ... then we can apply the previous frequency analysis (with | ||
79 | the I_j scores) to the 5 texts | ||
80 | |||
81 | x_0 x_5 x_10 ... | ||
82 | x_1 x_6 x_11 ... | ||
83 | ... | ||
84 | x_4 x_9 x_14 ... | ||
85 | |||
86 | all of which are obtained by encrypting the corresponding plaintext | ||
87 | fragments with a shift cipher. Therefore, their frequency spectrum | ||
88 | should be a shift of the standard one and the corresponding shift, | ||
89 | hence the individual letters of the key, can be retrieved. | ||
90 | |||
91 | In order to get the key length we can follow a similar approach. For a | ||
92 | given shift t let q_i be the relative frequency of letter i (e.g. E | ||
93 | when i=4) in the sequence x_t, x_2t, x_3t. If the shift t is the | ||
94 | actual key length then this subsequence will have the frequency | ||
95 | spectrum as English plaintexts just shifted by t so we expect that | ||
96 | |||
97 | J_t := sum_{i=0..25} q_i^2 =~= 0.065 | ||
98 | |||
99 | since the summation is invariant under the any shift. If, however, the | ||
100 | shift was the wrong one then we expect the letters to appear | ||
101 | completely random so that in that case the above sum is rather close | ||
102 | to 1/26 = 0.038. Thus, we compute J_t for t=1,2,3,... and pick the t | ||
103 | for which J_t is largest (and close to 0.065). | ||
104 | |||
105 | Other historic ciphers include monoalphabetic substitution (key = | ||
106 | arbitrary permutation of the 26 letters), Grand Chiffre, the German | ||
107 | Enigma, etc. All these historic ciphers have been broken. See | ||
108 | Wikipedia and other online resources. | ||