1 files changed, 257 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes10.txt b/ss2017/cryptography/Notes10.txt
new file mode 100644
index 0000000..5457bf3
--- /dev/null
+++ b/ss2017/cryptography/Notes10.txt
@@ -0,0 +1,257 @@
+10 Message integrity, CBC-MAC (KL4.1-4.5)
+-----------------------------------------
+So far, we wanted to prevent an eavesdropper or a more active attacker
+from getting to know the content of encrypted messages or at least
+partial information about them.
+Another important objective for cryptography is message integrity and
+authenticity. We want to prevent the attacker from being able to
+manipulate an encrypted message or from fabricating one such.
+It is not the case that encryption automatically guarantees
+integrity. For example, with the Vernam cipher and derived systems
+flipping one bit in the ciphertext results in the same bit in the
+plaintext being flipped. Also, randomly chosen ciphertexts correspond
+to plaintexts albeit probably nonsensical ones. 
+Message authentication codes
+- - - - - - - - - - - - - - -
+Definition: A message authentication code (MAC) consists of two
+probabilistic polynomial time algorithms (Gen,Mac) where 
+1. On input n (security parameter) Gen outputs a key k of length >= n 
+2. Given key k and message m:{0,1}* the tag-generation algorithm Mac outputs
+   t = Mac_k(m)
+We assume (unlike in KL where this is a special case) that Mac is
+deterministic.
+If m is furthermore constrained to be from {0,1}^l(n) for some length
+function l(n), e.g. l(n)=n then the MAC is called *fixed length*.    EOD
+The intended usage is as follows: Alice and Bob share a secret key k
+that has been obtained with Gen(). If Alice wants to send a message m
+to Bob she could compute the tag t=Mac_k(m) and transmit (m,t) either
+encrypted or not.
+In order to be sure that the received message (m,t) was really from
+Alice he can then compute t'=Mac_k(m) and check that t=t'. The MAC
+should be designed in such a way that no adversary can fabricate valid
+tags unless it knows the key so that if the verification succeeds Bob
+can indeed be sure that the message was from Alice. The following
+experiment formalizes this.
+The message authentication experiment Mac-forge_{A,Pi}(n):
+Pi = (Gen,Mac) a MAC, A an adversary, n the security parameter
+1. A random key k is generated with Gen(n). 
+2. The adversary is given n and oracle access to Mac_k(), i.e. it can
+request arbitrary tags w.r.t k. The adversary eventually outputs a
+pair (m,t). 
+3. The output of the experiment is defined to be 1 if
+  (1) Mac_k(m)=t and (2) m has not previously been queried by A.
+Thus, the adversary must prepare a valid tag for a message it has not
+previously requested a tag for. 
+We define a MAC to be secure if for all adversaries 
+the probability that Mac-forge yields 1 is negligible.
+While this notion of security is indeed very strong there are some
+attacks it cannot withstand:
+1) Replay attacks. The adversary can store an authenticated message
+(m,t) and send it at some later point. In order to prevent that one
+can extend message with timestamps or similar.
+2) Timing attacks. If the check t=t'? proceeds by comparing bits
+starting from the left until the first mismatch is encountered (cf
+strcmp() from the C-library) then the time it takes to perform the
+check depends on the tags. This can be used to find out the bits of a
+valid tag one by one. According to KL this kind of attack has been
+used against an early version of the Xbox in order to install
+unlicensed games. To prevent the attack one should make sure that the
+computation time for the comparison is always the same.
+Construction of a fixed-length MAC from a pseudorandom function:
+Let F_k() be a pseudorandom function. The following then is a secure
+fixed-length MAC.
+1. Gen(): Given n choose k:{0,1}^n uniformly at random
+2. Mac(): On input k and m output the tag t=F_k(m)
+Theorem: The above MAC is secure.
+Proof sketch: As in the other security proofs it suffices to show this for
+truly random F (selected at the beginning of the experiment). The
+passage to a pseudorandom F then incurs a negligible change in
+probability. In order to produce a valid tag the adversary would have
+to correctly guess a random value from {0,1}^n which has indeed
+negligible probability.  QED
+CBC-MAC
+- - - - 
+In order to extend a fixed-length MAC to variable length messages one
+can break the message up into equal length blocks m=m_1,...m_l of
+length n each possibly padding the last block with 0s and then
+authenticate each block separately. In order that this is secure, one
+must, however augment each block with
+- the length l, 
+- its sequence number i:{1..l}
+- a random identifier r:{0,1}^n (the same for each block)
+Thus, the blocks to be authenticated have length 4n each (assuming
+that l and i fit into n bits) and an appropriate MAC must be
+used. Alternatively, one can break m into blocks of size n/4....
+These additional data prevent the adversary from dropping blocks at
+the end (length), changing the order of blocks (sequence numbers),
+mixing blocks from two different messages (random identifier).
+One can show that this scheme is indeed secure, but it is rather
+inefficient since the tags are 4 times as long as the message itself.
+A more efficient method mimicks the CBC-mode encryption but is subtly
+different:
+Rather than an arbitrary MAC we need a pseudorandom function F_k,
+i.e. a block cipher. We then decompose the message m into equal sized
+blocks
+m = m_1 .. m_l
+as before (using padding if needed). We then *prepend* the message
+with another block encoding the length l (which is thus assumed
+smaller than 2^n). Then blocks are encrypted and XORed with the
+encryption of the previous block prior to encryption. The encryption
+of the last block then is the desired MAC:
+t_0 = F_k(l)
+t_{i+1} = F_k(t_i + m_i)
+t = t_l
+We will prove that this scheme is secure. Notice, however, that
+seemingly innocuous modifications thwart its security:
+First, let us see why we need to incorporate the length. Suppose we
+would just put: 
+t_1 = F_k(m_1)
+t_2 = F_k(m_2+t_1)
+...
+t = t_l
+Then, if t=Mac_k(m_1 m_2) and t'=Mac_k(m_3+t m_4) then t'=Mac_k(m_1
+m_2 m_3 m_4) thus allowing us to forge a previously unqueried tag.
+Now suppose that we would start with a random initialisation vector IV
+and transmit it along with the tag:
+t_0 = F_k(l+IV)
+...
+This would allow us to modify the length: If (IV,t) is the MAC of l,m
+then (IV+l+l',t) is the MAC of l',m. 
+It is also not ok to attach the length at the end rather than at the
+beginning. Namely, suppose that we obtain the following tags where the
+length is added in parenthesis for clarity.
+ABCDE(5)  : t
+XYZKL(5)  : t'
+ABCDE5UVW(9) : t''
+We then have that the tag of XYZKL5 U+t+t' VW(9) is also t''. 
+We will now sketch a proof that CBC-MAC is indeed secure. We will show
+the following theorem from which the claim follows by the standard
+reasoning used in earlier security proofs.
+Theorem: Fix a security parameter n. For g:{0,1}^n->{0,1}^n and m=m1
+m2 ... mk write CBC_g(m) = CBC_g(m1,m2,...,mk) =
+   g(ml+g(m_{k-1}+ ... g(m2 + g(m1)) ... )
+Also let f:({0,1}^n)* -> {0,1}^n be an arbitrary function. Let D be an
+adversary which is given oracle access to a function ({0,1}^n)* ->
+{0,1}^n.
+Suppose that D queries its oracle on a prefix-free set, i.e. if X and
+Y are both queried then X is not a prefix of Y nor is Y a prefix of X.
+One then has
+   |Pr(D^{CBC_g()}(n) = 1) - Pr(D^{f()}(n) = 1)| = negl(n)
+where probabilities are taken over g and f chosen uniformly at random.
+To see why this implies the theorem we first consider that the
+difference between CBC_g and CBC_{F_k} is negligible since F_k is
+assumed to be pseudorandom. Furthermore, notice that in CBC-MAC proper
+the function CBC_{F_k} is applied to a subset of P:={|m| m |
+m:({0,1}^n)*} which itself is a prefix-free set. Thus, the whole
+business with including the length and putting it into the first
+position rather than anywhere else serves to ensure prefix-free-ness.
+Proof of Theorem (sketch; for details see KL, 2nd ed): 
+Let Q={X1..Xq} be the (prefix-free) set of queries (q = number of
+queries) made by the distinguisher D to the oracle. Let l be the
+maximum number of blocks in those queries. Notice that both q---the
+number of queries---and l---their maximum length are polynomial in n
+since the distinguished is assumed to run in polynomial time.  If D
+queries X and X = m1 m2 ... mk then (in the case where the oracle is
+instantiated with CBC_g()) the function g() will be evaluated at m1,
+m2+g(m1), m3+g(m2+g(m1)), ...  . Write C_g(X) for this sequence (of
+length < l). We say that Q exhibits a collision if among the
+g-evaluations occurring in {C_g(X) | X:Q} one finds the situation
+g(x)=g(x') where x =/= x'. We also consider it a collision if at some
+point we have mi+g(x) = mj'+g(x') despite x=/=x' (and then necessarily
+mi=/=mj').
+Under the assumption that g() is random, the probability that such a
+collision is exhibited is bounded by q^2 * l^2 * negl(n) for there are
+<= q^2 * l^2 possibilities for where such a collision might occur and
+the probability for a collision at a given position is negligible.
+Since q and l are polynomial in n this probability is therefore
+negligible itself.
+Let us assume that no collision is exhibited. Now, for each X:Q let
+v(X) be the last value in C_g(X). We have CBC_g(X) = g(v(X)).  The
+values v(X) are pairwise distinct and do not agree with any of the
+other values in {C_g(X) | X:Q}. This is where prefix-freeness is
+used. As a result, since g() is random, the answers to the queries
+g(v(X)) are uniformly random and thus adhere to the same distribution
+as f(X) for the random f. Thus, the only advantage that the
+distinguisher has stems from the possibility of a collision being
+exhibited which, as we saw above, is negligible.
+In conclusion, we note that an already existing implementation of
+CBC-mode (for encryption) can be turned into an implementation of
+CBC-MAC by
+ - using IV=0
+ - discarding all but the last blocks of the ciphertext
+ - prepending the plaintext with its length
+Mistakes in any one of these (e.g. using a random IV and transmitting
+it, omitting the length or placing it elsewhere, transmitting other
+blocks of the ciphertext, lead to insecurities. See the nice blog
+article "Why I hate CBC-MAC").
+https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/

diff --git a/ss2017/cryptography/Notes10.txt b/ss2017/cryptography/Notes10.txt new file mode 100644 index 0000000..5457bf3 --- /dev/null +++ b/ss2017/cryptography/Notes10.txt
@@ -0,0 +1,257 @@
	1
	2
	3	10 Message integrity, CBC-MAC (KL4.1-4.5)
	4	-----------------------------------------
	5
	6	So far, we wanted to prevent an eavesdropper or a more active attacker
	7	from getting to know the content of encrypted messages or at least
	8	partial information about them.
	9
	10	Another important objective for cryptography is message integrity and
	11	authenticity. We want to prevent the attacker from being able to
	12	manipulate an encrypted message or from fabricating one such.
	13
	14	It is not the case that encryption automatically guarantees
	15	integrity. For example, with the Vernam cipher and derived systems
	16	flipping one bit in the ciphertext results in the same bit in the
	17	plaintext being flipped. Also, randomly chosen ciphertexts correspond
	18	to plaintexts albeit probably nonsensical ones.
	19
	20
	21	Message authentication codes
	22	- - - - - - - - - - - - - - -
	23
	24	Definition: A message authentication code (MAC) consists of two
	25	probabilistic polynomial time algorithms (Gen,Mac) where
	26
	27	1. On input n (security parameter) Gen outputs a key k of length >= n
	28	2. Given key k and message m:{0,1}* the tag-generation algorithm Mac outputs
	29	t = Mac_k(m)
	30
	31	We assume (unlike in KL where this is a special case) that Mac is
	32	deterministic.
	33
	34	If m is furthermore constrained to be from {0,1}^l(n) for some length
	35	function l(n), e.g. l(n)=n then the MAC is called fixed length. EOD
	36
	37	The intended usage is as follows: Alice and Bob share a secret key k
	38	that has been obtained with Gen(). If Alice wants to send a message m
	39	to Bob she could compute the tag t=Mac_k(m) and transmit (m,t) either
	40	encrypted or not.
	41
	42	In order to be sure that the received message (m,t) was really from
	43	Alice he can then compute t'=Mac_k(m) and check that t=t'. The MAC
	44	should be designed in such a way that no adversary can fabricate valid
	45	tags unless it knows the key so that if the verification succeeds Bob
	46	can indeed be sure that the message was from Alice. The following
	47	experiment formalizes this.
	48
	49
	50	The message authentication experiment Mac-forge_{A,Pi}(n):
	51
	52	Pi = (Gen,Mac) a MAC, A an adversary, n the security parameter
	53
	54	1. A random key k is generated with Gen(n).
	55
	56	2. The adversary is given n and oracle access to Mac_k(), i.e. it can
	57	request arbitrary tags w.r.t k. The adversary eventually outputs a
	58	pair (m,t).
	59
	60	3. The output of the experiment is defined to be 1 if
	61	(1) Mac_k(m)=t and (2) m has not previously been queried by A.
	62
	63	Thus, the adversary must prepare a valid tag for a message it has not
	64	previously requested a tag for.
	65
	66	We define a MAC to be secure if for all adversaries
	67	the probability that Mac-forge yields 1 is negligible.
	68
	69	While this notion of security is indeed very strong there are some
	70	attacks it cannot withstand:
	71
	72	1) Replay attacks. The adversary can store an authenticated message
	73	(m,t) and send it at some later point. In order to prevent that one
	74	can extend message with timestamps or similar.
	75
	76	2) Timing attacks. If the check t=t'? proceeds by comparing bits
	77	starting from the left until the first mismatch is encountered (cf
	78	strcmp() from the C-library) then the time it takes to perform the
	79	check depends on the tags. This can be used to find out the bits of a
	80	valid tag one by one. According to KL this kind of attack has been
	81	used against an early version of the Xbox in order to install
	82	unlicensed games. To prevent the attack one should make sure that the
	83	computation time for the comparison is always the same.
	84
	85	Construction of a fixed-length MAC from a pseudorandom function:
	86
	87	Let F_k() be a pseudorandom function. The following then is a secure
	88	fixed-length MAC.
	89
	90	1. Gen(): Given n choose k:{0,1}^n uniformly at random
	91	2. Mac(): On input k and m output the tag t=F_k(m)
	92
	93	Theorem: The above MAC is secure.
	94
	95	Proof sketch: As in the other security proofs it suffices to show this for
	96	truly random F (selected at the beginning of the experiment). The
	97	passage to a pseudorandom F then incurs a negligible change in
	98	probability. In order to produce a valid tag the adversary would have
	99	to correctly guess a random value from {0,1}^n which has indeed
	100	negligible probability. QED
	101
	102
	103	CBC-MAC
	104	- - - -
	105
	106	In order to extend a fixed-length MAC to variable length messages one
	107	can break the message up into equal length blocks m=m_1,...m_l of
	108	length n each possibly padding the last block with 0s and then
	109	authenticate each block separately. In order that this is secure, one
	110	must, however augment each block with
	111
	112	- the length l,
	113	- its sequence number i:{1..l}
	114	- a random identifier r:{0,1}^n (the same for each block)
	115
	116	Thus, the blocks to be authenticated have length 4n each (assuming
	117	that l and i fit into n bits) and an appropriate MAC must be
	118	used. Alternatively, one can break m into blocks of size n/4....
	119
	120	These additional data prevent the adversary from dropping blocks at
	121	the end (length), changing the order of blocks (sequence numbers),
	122	mixing blocks from two different messages (random identifier).
	123
	124	One can show that this scheme is indeed secure, but it is rather
	125	inefficient since the tags are 4 times as long as the message itself.
	126
	127	A more efficient method mimicks the CBC-mode encryption but is subtly
	128	different:
	129
	130	Rather than an arbitrary MAC we need a pseudorandom function F_k,
	131	i.e. a block cipher. We then decompose the message m into equal sized
	132	blocks
	133
	134	m = m_1 .. m_l
	135
	136	as before (using padding if needed). We then prepend the message
	137	with another block encoding the length l (which is thus assumed
	138	smaller than 2^n). Then blocks are encrypted and XORed with the
	139	encryption of the previous block prior to encryption. The encryption
	140	of the last block then is the desired MAC:
	141
	142	t_0 = F_k(l)
	143	t_{i+1} = F_k(t_i + m_i)
	144	t = t_l
	145
	146	We will prove that this scheme is secure. Notice, however, that
	147	seemingly innocuous modifications thwart its security:
	148
	149	First, let us see why we need to incorporate the length. Suppose we
	150	would just put:
	151
	152	t_1 = F_k(m_1)
	153	t_2 = F_k(m_2+t_1)
	154	...
	155	t = t_l
	156
	157	Then, if t=Mac_k(m_1 m_2) and t'=Mac_k(m_3+t m_4) then t'=Mac_k(m_1
	158	m_2 m_3 m_4) thus allowing us to forge a previously unqueried tag.
	159
	160	Now suppose that we would start with a random initialisation vector IV
	161	and transmit it along with the tag:
	162
	163	t_0 = F_k(l+IV)
	164	...
	165
	166	This would allow us to modify the length: If (IV,t) is the MAC of l,m
	167	then (IV+l+l',t) is the MAC of l',m.
	168
	169	It is also not ok to attach the length at the end rather than at the
	170	beginning. Namely, suppose that we obtain the following tags where the
	171	length is added in parenthesis for clarity.
	172
	173
	174	ABCDE(5) : t
	175	XYZKL(5) : t'
	176	ABCDE5UVW(9) : t''
	177
	178	We then have that the tag of XYZKL5 U+t+t' VW(9) is also t''.
	179
	180	We will now sketch a proof that CBC-MAC is indeed secure. We will show
	181	the following theorem from which the claim follows by the standard
	182	reasoning used in earlier security proofs.
	183
	184	Theorem: Fix a security parameter n. For g:{0,1}^n->{0,1}^n and m=m1
	185	m2 ... mk write CBC_g(m) = CBC_g(m1,m2,...,mk) =
	186
	187	g(ml+g(m_{k-1}+ ... g(m2 + g(m1)) ... )
	188
	189	Also let f:({0,1}^n)* -> {0,1}^n be an arbitrary function. Let D be an
	190	adversary which is given oracle access to a function ({0,1}^n)* ->
	191	{0,1}^n.
	192
	193	Suppose that D queries its oracle on a prefix-free set, i.e. if X and
	194	Y are both queried then X is not a prefix of Y nor is Y a prefix of X.
	195	One then has
	196
	197	\|Pr(D^{CBC_g()}(n) = 1) - Pr(D^{f()}(n) = 1)\| = negl(n)
	198
	199	where probabilities are taken over g and f chosen uniformly at random.
	200
	201	To see why this implies the theorem we first consider that the
	202	difference between CBC_g and CBC_{F_k} is negligible since F_k is
	203	assumed to be pseudorandom. Furthermore, notice that in CBC-MAC proper
	204	the function CBC_{F_k} is applied to a subset of P:={\|m\| m \|
	205	m:({0,1}^n)*} which itself is a prefix-free set. Thus, the whole
	206	business with including the length and putting it into the first
	207	position rather than anywhere else serves to ensure prefix-free-ness.
	208
	209	Proof of Theorem (sketch; for details see KL, 2nd ed):
	210
	211	Let Q={X1..Xq} be the (prefix-free) set of queries (q = number of
	212	queries) made by the distinguisher D to the oracle. Let l be the
	213	maximum number of blocks in those queries. Notice that both q---the
	214	number of queries---and l---their maximum length are polynomial in n
	215	since the distinguished is assumed to run in polynomial time. If D
	216	queries X and X = m1 m2 ... mk then (in the case where the oracle is
	217	instantiated with CBC_g()) the function g() will be evaluated at m1,
	218	m2+g(m1), m3+g(m2+g(m1)), ... . Write C_g(X) for this sequence (of
	219	length < l). We say that Q exhibits a collision if among the
	220	g-evaluations occurring in {C_g(X) \| X:Q} one finds the situation
	221	g(x)=g(x') where x =/= x'. We also consider it a collision if at some
	222	point we have mi+g(x) = mj'+g(x') despite x=/=x' (and then necessarily
	223	mi=/=mj').
	224
	225	Under the assumption that g() is random, the probability that such a
	226	collision is exhibited is bounded by q^2 * l^2 * negl(n) for there are
	227	<= q^2 * l^2 possibilities for where such a collision might occur and
	228	the probability for a collision at a given position is negligible.
	229	Since q and l are polynomial in n this probability is therefore
	230	negligible itself.
	231
	232	Let us assume that no collision is exhibited. Now, for each X:Q let
	233	v(X) be the last value in C_g(X). We have CBC_g(X) = g(v(X)). The
	234	values v(X) are pairwise distinct and do not agree with any of the
	235	other values in {C_g(X) \| X:Q}. This is where prefix-freeness is
	236	used. As a result, since g() is random, the answers to the queries
	237	g(v(X)) are uniformly random and thus adhere to the same distribution
	238	as f(X) for the random f. Thus, the only advantage that the
	239	distinguisher has stems from the possibility of a collision being
	240	exhibited which, as we saw above, is negligible.
	241
	242
	243
	244	In conclusion, we note that an already existing implementation of
	245	CBC-mode (for encryption) can be turned into an implementation of
	246	CBC-MAC by
	247
	248	- using IV=0
	249	- discarding all but the last blocks of the ciphertext
	250	- prepending the plaintext with its length
	251
	252	Mistakes in any one of these (e.g. using a random IV and transmitting
	253	it, omitting the length or placing it elsewhere, transmitting other
	254	blocks of the ciphertext, lead to insecurities. See the nice blog
	255	article "Why I hate CBC-MAC").
	256
	257	https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/