1 files changed, 108 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes01.txt b/ss2017/cryptography/Notes01.txt
new file mode 100644
index 0000000..75c4cc6
--- /dev/null
+++ b/ss2017/cryptography/Notes01.txt
@@ -0,0 +1,108 @@
+Throughout the notes the acronym KL stands for the book "Introduction
+to Modern Cryptography" by Jonathan Katz and Yehuda Lindell, Chapman &
+Hall, 2008.
+Set braces {} may be used to enclose a subscript consisting of more
+than one letter. E.g x_{10} is "x sub ten". Sometimes we omit the
+underscore vor subscripting and write x1 or xi for x_1 and x_i.
+1 Overview, Historic Ciphers:  Caesar, Vigenere,  (KL1.3)
+---------------------------------------------------------
+The Shift Cipher
+- - - - - - - - -
+Identify the 26 upper case alphabet letters with numbers 0..25,
+e.g. A=0, B=1.  Choose an integer k:{0..25} as "key" and encrypt a
+letter x as x+k mod 26. Encrypt a text as the sequence of the
+encryption of its letter.  For example, if k=3 then the text
+BEGINTHEATTACKNOW is encrypted as EHJLQWKHDWWDFNQRZ.  The decryption
+of a letter x is just x-k mod 26 and the decryption of a text is
+obtained by applying this procedure letterwise.
+The special case k=3 is known as "Caesar Cipher" since purportedly
+Julias Caesar used it to encrypt some of his letters. The special case
+k=13 is known as ROT-13.
+There are two simple attacks on a shift cipher. The first one uses the
+fact that there are only 26 possible keys so one can try them all
+out. This, however, requires that one can recognize the right
+plaintext, e.g. because it is correct English.
+The second is based on frequencies of letters. In a standard English
+text the letters occur with typical frequencies. E.g. E is the most
+frequent letter with likelihood 12.7%. Then follows T with 9.0% and A
+with 8.2% etc. Thus, we can count the relative frequencies of the
+letters in the ciphertext and thus work out the key.
+An improved and more systematic attack based on frequencies goes as
+follows: Let p_i stand for the frequency of letter i, e.g. p_4 = 12.7%
+since E=4. One has
+                         sum_{i=0..25} p_i^2 =~= 0.065
+So, if q_i is the frequency of the letter i *in the ciphertext* then
+we expect q_{i+k} to be close to p_i so that we can recover k by
+choosing j in such a way that the following quantity I_j is as close to
+0.065 as possible. 
+                     I_j := sum_{i=0..25} p_i * q_{i+j mod 26}
+The Vigenere Cipher
+- - - - - - - - - -
+The shift cipher suffers from two problems:
+1) the number of possible keys is too small
+2) the frequency distribution of letters in the plain text shows in
+   the ciphertext
+A famous historic cipher not suffering from the first problem is the
+Vigenere cipher. Here one selects a word or a short text as key,
+writes the key repeatedly under the plaintext and then "adds" letters
+one above another (as numbers mod 26). Here is an example with key BEADS:
+THEMANANDTHEWOMANRETRIEVEDTHELETTERFROMTHEPOSTOFFICE
+BEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBE
+ULEPSOENGLIIWREBRRHLSMEYWEXHHDFXTHJGVOPLIIPRKUSFIADI
+Alas, the frequency distribution of the plaintext still seeps through
+and we can use this information to break the cipher without knowing
+the key:
+If we know the length of the key, e.g. 5, and the ciphertext is x_0
+x_1 x_2 ...  then we can apply the previous frequency analysis (with
+the I_j scores) to the 5 texts
+x_0 x_5 x_10 ...
+x_1 x_6 x_11 ...
+...
+x_4 x_9 x_14 ...
+all of which are obtained by encrypting the corresponding plaintext
+fragments with a shift cipher. Therefore, their frequency spectrum
+should be a shift of the standard one and the corresponding shift,
+hence the individual letters of the key, can be retrieved.
+In order to get the key length we can follow a similar approach. For a
+given shift t let q_i be the relative frequency of letter i (e.g. E
+when i=4) in the sequence x_t, x_2t, x_3t. If the shift t is the
+actual key length then this subsequence will have the frequency
+spectrum as English plaintexts just shifted by t so we expect that
+            J_t := sum_{i=0..25} q_i^2 =~= 0.065
+since the summation is invariant under the any shift. If, however, the
+shift was the wrong one then we expect the letters to appear
+completely random so that in that case the above sum is rather close
+to 1/26 = 0.038. Thus, we compute J_t for t=1,2,3,... and pick the t
+for which J_t is largest (and close to 0.065).
+Other historic ciphers include monoalphabetic substitution (key =
+arbitrary permutation of the 26 letters), Grand Chiffre, the German
+Enigma, etc. All these historic ciphers have been broken. See
+Wikipedia and other online resources.

diff --git a/ss2017/cryptography/Notes01.txt b/ss2017/cryptography/Notes01.txt new file mode 100644 index 0000000..75c4cc6 --- /dev/null +++ b/ss2017/cryptography/Notes01.txt
@@ -0,0 +1,108 @@
	1	Throughout the notes the acronym KL stands for the book "Introduction
	2	to Modern Cryptography" by Jonathan Katz and Yehuda Lindell, Chapman &
	3	Hall, 2008.
	4
	5	Set braces {} may be used to enclose a subscript consisting of more
	6	than one letter. E.g x_{10} is "x sub ten". Sometimes we omit the
	7	underscore vor subscripting and write x1 or xi for x_1 and x_i.
	8
	9
	10	1 Overview, Historic Ciphers: Caesar, Vigenere, (KL1.3)
	11	---------------------------------------------------------
	12
	13
	14	The Shift Cipher
	15	- - - - - - - - -
	16
	17	Identify the 26 upper case alphabet letters with numbers 0..25,
	18	e.g. A=0, B=1. Choose an integer k:{0..25} as "key" and encrypt a
	19	letter x as x+k mod 26. Encrypt a text as the sequence of the
	20	encryption of its letter. For example, if k=3 then the text
	21	BEGINTHEATTACKNOW is encrypted as EHJLQWKHDWWDFNQRZ. The decryption
	22	of a letter x is just x-k mod 26 and the decryption of a text is
	23	obtained by applying this procedure letterwise.
	24
	25	The special case k=3 is known as "Caesar Cipher" since purportedly
	26	Julias Caesar used it to encrypt some of his letters. The special case
	27	k=13 is known as ROT-13.
	28
	29	There are two simple attacks on a shift cipher. The first one uses the
	30	fact that there are only 26 possible keys so one can try them all
	31	out. This, however, requires that one can recognize the right
	32	plaintext, e.g. because it is correct English.
	33
	34	The second is based on frequencies of letters. In a standard English
	35	text the letters occur with typical frequencies. E.g. E is the most
	36	frequent letter with likelihood 12.7%. Then follows T with 9.0% and A
	37	with 8.2% etc. Thus, we can count the relative frequencies of the
	38	letters in the ciphertext and thus work out the key.
	39
	40	An improved and more systematic attack based on frequencies goes as
	41	follows: Let p_i stand for the frequency of letter i, e.g. p_4 = 12.7%
	42	since E=4. One has
	43
	44	sum_{i=0..25} p_i^2 =~= 0.065
	45
	46	So, if q_i is the frequency of the letter i in the ciphertext then
	47	we expect q_{i+k} to be close to p_i so that we can recover k by
	48	choosing j in such a way that the following quantity I_j is as close to
	49	0.065 as possible.
	50
	51	I_j := sum_{i=0..25} p_i * q_{i+j mod 26}
	52
	53
	54
	55	The Vigenere Cipher
	56	- - - - - - - - - -
	57
	58	The shift cipher suffers from two problems:
	59
	60	1) the number of possible keys is too small
	61	2) the frequency distribution of letters in the plain text shows in
	62	the ciphertext
	63
	64	A famous historic cipher not suffering from the first problem is the
	65	Vigenere cipher. Here one selects a word or a short text as key,
	66	writes the key repeatedly under the plaintext and then "adds" letters
	67	one above another (as numbers mod 26). Here is an example with key BEADS:
	68
	69	THEMANANDTHEWOMANRETRIEVEDTHELETTERFROMTHEPOSTOFFICE
	70	BEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBEADSBE
	71	ULEPSOENGLIIWREBRRHLSMEYWEXHHDFXTHJGVOPLIIPRKUSFIADI
	72
	73	Alas, the frequency distribution of the plaintext still seeps through
	74	and we can use this information to break the cipher without knowing
	75	the key:
	76
	77	If we know the length of the key, e.g. 5, and the ciphertext is x_0
	78	x_1 x_2 ... then we can apply the previous frequency analysis (with
	79	the I_j scores) to the 5 texts
	80
	81	x_0 x_5 x_10 ...
	82	x_1 x_6 x_11 ...
	83	...
	84	x_4 x_9 x_14 ...
	85
	86	all of which are obtained by encrypting the corresponding plaintext
	87	fragments with a shift cipher. Therefore, their frequency spectrum
	88	should be a shift of the standard one and the corresponding shift,
	89	hence the individual letters of the key, can be retrieved.
	90
	91	In order to get the key length we can follow a similar approach. For a
	92	given shift t let q_i be the relative frequency of letter i (e.g. E
	93	when i=4) in the sequence x_t, x_2t, x_3t. If the shift t is the
	94	actual key length then this subsequence will have the frequency
	95	spectrum as English plaintexts just shifted by t so we expect that
	96
	97	J_t := sum_{i=0..25} q_i^2 =~= 0.065
	98
	99	since the summation is invariant under the any shift. If, however, the
	100	shift was the wrong one then we expect the letters to appear
	101	completely random so that in that case the above sum is rather close
	102	to 1/26 = 0.038. Thus, we compute J_t for t=1,2,3,... and pick the t
	103	for which J_t is largest (and close to 0.065).
	104
	105	Other historic ciphers include monoalphabetic substitution (key =
	106	arbitrary permutation of the 26 letters), Grand Chiffre, the German
	107	Enigma, etc. All these historic ciphers have been broken. See
	108	Wikipedia and other online resources.