Notes on cryptography

author: Gregor Kleen <gkleen@yggdrasil.li> 2017-10-24 15:33:07 +0200
committer: Gregor Kleen <gkleen@yggdrasil.li> 2017-10-24 15:33:07 +0200
commit: 162df3190c4e179ef968b0a467ddd4951faba336 (patch)
tree: 3c9628cf1abf259e1b897ac14b006e2b448cff27 /ss2017/cryptography/Notes11.txt
parent: 92ed41381f40c97b52b0e96f0eaeaa9025091411 (diff)
download: uni-162df3190c4e179ef968b0a467ddd4951faba336.tar
uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.gz
uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.bz2
uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.xz
uni-162df3190c4e179ef968b0a467ddd4951faba336.zip
1 files changed, 310 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt
new file mode 100644
index 0000000..839d24b
--- /dev/null
+++ b/ss2017/cryptography/Notes11.txt
@@ -0,0 +1,310 @@
+11 Hashing, Merkle-Damgard transform, SHA, HMAC
+-----------------------------------------------
+Hash functions serve a related but slightly different purpose to that
+of MACs. Unlike those (and private key encryption) they are not
+immediately useful on their own but rather are building blocks and
+indeed are "a workhorse of cryptography" like e.g. solving systems of
+linear equations is to engineering.
+Intuitively, a hash function H() transforms a long string m into a
+short digest ("hash") t=H(m) such that it is extremely difficult to
+produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as
+a *collision*.
+As such, the notion cannot really be made sense of in the context of
+computational complexity for once we have x=/=x' with H(x)=H(x') then
+we can just hardwire x,x' into a program. In order to get around this,
+we introduce as before randomly chosen keys and security parameters.
+This will allow us to formulate meaningful security goals and
+meaningful security proofs for various constructions involving hash
+functions. 
+Definition: A *hash function* consists of a (probabilistic polytime)
+key generation algorithm Gen(n) as before and a polytime function H
+taking as input a key s ("seed") and m:{0,1}^* then
+H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple,
+length function, e.g. l(n)=n.
+There may also be another length function l'(n)>l(n) and
+H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a
+fixed-length hash function.
+Definition: A hash function Pi=(Gen,H) as above is *collision resistant* if
+for all probabilistic polytime adversaries A one has
+ 
+            Pr[Hash-coll_{A,Pi}(n)=1] = negl(n)
+for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the
+following experiment:
+1) s <-Gen(n) /* s = "seed" */
+2) A is given s and then outputs x,x'
+3) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and
+0 otherwise.
+Thus, the adversary must provide a collision given a key s and the
+probability that it succeeds in doing so should be negligible. In
+contrast, in the case of MACs the adversary does not get access to the
+key.
+Weaker notions of security
+- - - - - - - - - - - - - -
+In this context one may ask why being able to produce a collision
+should be considered dangerous. Firstly, if a hash function is
+collision resistant then it follows that it is also not possible to
+forge a message x' =/=x such that H(x)=H(x') given x. So collision
+resistance is certainly a "good enough" notion. However, one could
+instead ask for the weaker property of "second preimage resistance":
+it should be difficult to, given x, produce x'=/=x such that
+H(x)=(x').
+A typical application of hash functions is message integrity: If we
+send someone a message x via some untrusted channel (which can be
+modified by an adversary) and the hash value t=H(x) via a trusted
+channel then the receiver can check that the message has not been
+tampered with by checking that t=H(x') with x' being the received
+(possibly modified) message.
+While being able to produce an arbitrary collision may be harmless,
+being able to find a collision in a given set of texts of size ~
+2^{l(n)} can be very dangerous and is not covered by "second preimage
+resistance": Consider the following scenario: we produce a letter L,
+have its hash signed by a responsible person, and then send it to some
+recipient. Now, it is very easy to produce functions
+f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is,
+say, a different rejection letter, and g(z) is a different acceptance
+letter. E.g. by introducing various switches dear/esteemed, Sir/Madam
+or pleased/delighted etc. Or, if one is dealing with images rather
+than letters by slightly altering intensity values. Then if, one can
+produce z, z' such that H(f(z))=H(g(z')) one can have the hash
+t=H(f(z)) signed and communicate along with g(z'), for which the hash
+is also valid due to the collision. While this still requires
+collisions of a special kind it is sufficiently close to arbitrary
+collisions and for this reason one considers collision resistance as
+the "gold standard" for hash functions.
+Birthday paradox
+- - - - - - - - -
+Let us see, how many steps a brute force attack requires. In order to
+do that, we (simplifyingly) assume that H_s is a random function. Let
+us write H(x)=H_s(x). So we assume that H is drawn uniformly at random
+from {0,1}^*->{0,1}^{l} where l=l(n). 
+Clearly, if we fix x then the likelihood for x' drawn uniformly at
+random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out
+ca N/2 values (N=2^l) to get a 50% success probability. This also
+holds, if x has been drawn at random. Things are different, if we
+randomly choose a set of values S and then *search* for a collision
+within S. Of course, if |S|>N then we find a collision with certainty,
+but perhaps surprisingly |S|>=2*sqrt(N) ensures that the likelihood of
+a collision within S is above 50%.
+So, to find a collision one can choose a set S of size 2*sqrt(N), sort
+it according to hash values and then search (in linear time) for the
+desired collision. If one does not find one, repeat with new data.
+Thus, for example if l(n)=64 then trying out 2^63 values will (with
+probability >=50%) lead to a collision with a fixed left hand side,
+but trying out 2^33 values will suffice to get an arbitrary collision
+using the birthday paradox. While 2^63 steps can only be carried out
+with special hardware resources, carrying out ~2^33 steps is trivial on
+today's computers. One must bear this in mind when trying to estimate
+"bits of security" of a hash function. Even assuming that the hash
+function does not contain systematic security holes we will only get
+one half of the hash size (size of the digest) many bits of security.
+Improved birthday attack
+- - - - - - - - - - - - -  
+The implementation of the "birthday attack" sketched above takes not
+only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a
+more expensive resource. Indeed, storing 2^33 thirty-two byte words
+requires 64TB. We now describe a version of the birthday attack that
+runs in constant space.
+If we choose x0:{0,1}^l uniformly at random then, assuming as we did,
+that H is a random function then
+S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)}
+is also random and thus contains a collision with >=50%
+likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that
+xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a
+collision. Note that xi=H(x_{i-1}). Assuming i<j and writing j=i+d we
+then also have x_i=x_{i+kd} for all k>=0 and thus x_{i+z}=x_{i+kd+z}
+for all k>=0 and z>=0. If we choose k such that kd>=i then
+x_{kd}=x_{2kd} follows with z=kd-i.
+This suggests the following
+algorithm for finding a collision with high likelihood:
+x0 <- {0,1}^l /* uniformly at random */
+x = x0;y = x0; 
+for (r=1..4*sqrt(N)) {
+   x = H(x); /* x=H^r(x0) */
+   y = H(H(y)); /* y=H^{2r}(x0) */
+   if (x==y) break;}
+if (x!=y) start over with new x0; else
+y=x;x=x0; /* y=H^{kd}(x0) */
+if (x==x0) start over with new x0; else /* very unlikely */ 
+for (i=0..r) do {
+   x = H(x); /* x=H^i(x0) */
+   y = H(y); /* y=H^{i+kd}(x0) */
+   if (x==y) break; /* must happen before loop runs out */}
+return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */
+Merkle-Damgard Transform
+- - - - - - - - - - - - -
+The Merkle-Damgard Transform constructs a collision-resistant hash
+function (Gen,H) from a given *fixed length* collision-resistant hash
+function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long
+as the input. The key generation remains unchanged and H is as follows.
+1) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B
+of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L. 
+2) Set z0=0...0 /* l zeroes */ 
+3) for i=1..B+1 do zi = h_s(z_{i-1} || x_i)
+4) Output z_{B+1}
+  x1           x2              x_B          L  
+0 |  _____     |  _____         |  _____    |  _____
+| |_|     |    |_|     |        |_|     |   |_|     |
+|___| h_s |______| h_s |_...._____| h_s |_____| h_s |-----> H_s(x)
+    |_____|      |_____|          |_____|     |_____|
+Theorem: If (Gen,h) is a collision-resistant fixed length hash
+function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant
+hash function.
+Proof: We first show that any collision in H_s yields a collision in
+h_s.  Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks
+and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1},
+z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B||L)
+and H_s(x')=h_s(z'_B'||L'), so, if L=/=L' we already have produced the
+desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B
+(notice that B=B' in this case). If L=L' and z_B=z'_B then, since
+z_B=h_s(z_{B-1}||x_B) we get a collision unless x_B=x'_B. Continuing
+in this way, we eventually find our collision since x_i =/= x'_i for
+at least one i.
+Now, suppose that A is an adversary against H_s. We construct an
+adversary A' against h_s as follows. If A outputs x,x' we check
+whether H_s(x)=H_s(x') and if so, construct the corresponding
+collision h_s(y)=h_s(y') in h_s as described above and output
+y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e.,
+for the output to be a h_s-collision is negligible by assumption and
+thus the probability that A output a collision is negligible as
+well. Notice that the computations we carried out as A' were clearly
+in polynomial time. QED
+The well-known and standardized cryptographic hash functions MD5,
+SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying
+fixed-length hash function (called compression function) in this
+context is built using the heuristic principles that were described in
+Section 7 (Practical block ciphers. Principles: s/p- and Feistel
+networks).  We remark here that MD5 and SHA-1 are no longer consider
+sufficiently secure whereas SHA-2 is.
+The latest standard SHA-3 which is not yet widely used but might
+eventually supersede SHA-2 does not use Merkle-Damgard but instead
+relies on the so-called *sponge construction*, see Wikipedia. 
+MACs from hash functions
+- - - - - - - - - - - - - 
+A natural question is whether one can use hash functions also for the
+purpose of authentication, i.e. MAC. First, we note that naive
+attempts (which according to KL have been actually used in practice!)
+do not work. Consider for example the definition Mac_k(x) = H_s(k||x)
+where s is some public seed. If H_s was constructed with the
+Merkle-Damgard transform from h_s then we can easily forge MACs as
+follows: if t is the MAC of x then t'=h_s(t||L') is the
+MAC of (x || |x|) (glossing over the padding).
+Suppose that we are given a *fixed-length* MAC (Gen,Mac) capable of
+authenticating messages of length l(n). Suppose furthermore, that we
+have a collision-resistant hash function (Gen',H) which produces
+hashes of length l(n). We can then build a MAC (Gen'',Mac'') for
+arbitrary length messages as follows:
+Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */
+Mac''_{(k,s)}(m): return Mac_k(H_s(m))
+Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is
+collision-resistant then (Gen'',Mac'') is secure.
+Proof: Suppose that an adversary A asks questions Q and then succeeds
+in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is
+different from all the H_s(y) for y:Q then A has succeeded in forging
+a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not
+previously asked. If, however, H_s(x) is contained in {H_s(y) | y:Q}
+then A has produced a collision. Since both events occur with
+negligible probability the probability that A can forge a MAC is also
+negligible. QED 
+Suppose that H_s() above has been constructed using the Merkle-Damgard
+construction from the fixed-length hash function h_s(). For seed s
+define a fixed-length MAC by
+                    Mac_k(m) = h_s(k||m)
+We now make the assumption that h_s() is such that this is a secure
+fixed-length MAC. Notice that the abovedescribed extension attack on
+MACs of this sort does not apply due to the fixed length.
+If we now apply the above hash-and-MAC construction then it turns out
+that the arbitrary-length MAC obtained satisfies "essentially"
+Mac''_{(k,s)}(m) = H_s(k||H_s(m))
+because the last H_s degenerates to h_s due to the shortness of
+H_s(m). The quoted "essentially" refers to the fact that for this
+equation to hold exactly one needs to assume some conventions about
+how exactly the padding in the Merkle-Damgard construction is
+performed. We want to ignore these details here and refer to the
+literature.
+The HMAC construction
+- - - - - - - - - - -
+The widely used and standardised HMAC construction is essentially the
+above except that another key is added to the message m prior to
+invoking the inner hash function and that the two keys that are thus
+needed are derived from a single one by XOR-ing with certain fixed
+words ipad and opad:
+HMAC_{(k,s)}(m) = H_s( (k+opad) || H_s((k+ipad) || m))
+In practice, ipad and opad are chosen to be 00110110 and 01011100 each
+repeated as often as necessary.
+The additional inner keying with k+ipad allows one to weaken the
+assumptions on the collision freeness of H_s(). The details of these
+assumptions and their practical relevance are still subject to debate
+(see a 2012 paper by Koblitz and Menezes); however, no practical
+attack has been found to date and HMAC is considered secure in
+practice.
author	Gregor Kleen <gkleen@yggdrasil.li>	2017-10-24 15:33:07 +0200
committer	Gregor Kleen <gkleen@yggdrasil.li>	2017-10-24 15:33:07 +0200
commit	162df3190c4e179ef968b0a467ddd4951faba336 (patch)
tree	3c9628cf1abf259e1b897ac14b006e2b448cff27 /ss2017/cryptography/Notes11.txt
parent	92ed41381f40c97b52b0e96f0eaeaa9025091411 (diff)
download	uni-162df3190c4e179ef968b0a467ddd4951faba336.tar uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.gz uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.bz2 uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.xz uni-162df3190c4e179ef968b0a467ddd4951faba336.zip

diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt new file mode 100644 index 0000000..839d24b --- /dev/null +++ b/ss2017/cryptography/Notes11.txt
@@ -0,0 +1,310 @@
	1
	2
	3	11 Hashing, Merkle-Damgard transform, SHA, HMAC
	4	-----------------------------------------------
	5
	6	Hash functions serve a related but slightly different purpose to that
	7	of MACs. Unlike those (and private key encryption) they are not
	8	immediately useful on their own but rather are building blocks and
	9	indeed are "a workhorse of cryptography" like e.g. solving systems of
	10	linear equations is to engineering.
	11
	12	Intuitively, a hash function H() transforms a long string m into a
	13	short digest ("hash") t=H(m) such that it is extremely difficult to
	14	produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as
	15	a collision.
	16
	17	As such, the notion cannot really be made sense of in the context of
	18	computational complexity for once we have x=/=x' with H(x)=H(x') then
	19	we can just hardwire x,x' into a program. In order to get around this,
	20	we introduce as before randomly chosen keys and security parameters.
	21	This will allow us to formulate meaningful security goals and
	22	meaningful security proofs for various constructions involving hash
	23	functions.
	24
	25	Definition: A hash function consists of a (probabilistic polytime)
	26	key generation algorithm Gen(n) as before and a polytime function H
	27	taking as input a key s ("seed") and m:{0,1}^* then
	28	H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple,
	29	length function, e.g. l(n)=n.
	30
	31	There may also be another length function l'(n)>l(n) and
	32	H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a
	33	fixed-length hash function.
	34
	35	Definition: A hash function Pi=(Gen,H) as above is collision resistant if
	36	for all probabilistic polytime adversaries A one has
	37
	38	Pr[Hash-coll_{A,Pi}(n)=1] = negl(n)
	39
	40	for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the
	41	following experiment:
	42
	43	1) s <-Gen(n) /* s = "seed" */
	44	2) A is given s and then outputs x,x'
	45	3) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and
	46	0 otherwise.
	47
	48	Thus, the adversary must provide a collision given a key s and the
	49	probability that it succeeds in doing so should be negligible. In
	50	contrast, in the case of MACs the adversary does not get access to the
	51	key.
	52
	53
	54
	55	Weaker notions of security
	56	- - - - - - - - - - - - - -
	57
	58	In this context one may ask why being able to produce a collision
	59	should be considered dangerous. Firstly, if a hash function is
	60	collision resistant then it follows that it is also not possible to
	61	forge a message x' =/=x such that H(x)=H(x') given x. So collision
	62	resistance is certainly a "good enough" notion. However, one could
	63	instead ask for the weaker property of "second preimage resistance":
	64	it should be difficult to, given x, produce x'=/=x such that
	65	H(x)=(x').
	66
	67	A typical application of hash functions is message integrity: If we
	68	send someone a message x via some untrusted channel (which can be
	69	modified by an adversary) and the hash value t=H(x) via a trusted
	70	channel then the receiver can check that the message has not been
	71	tampered with by checking that t=H(x') with x' being the received
	72	(possibly modified) message.
	73
	74	While being able to produce an arbitrary collision may be harmless,
	75	being able to find a collision in a given set of texts of size ~
	76	2^{l(n)} can be very dangerous and is not covered by "second preimage
	77	resistance": Consider the following scenario: we produce a letter L,
	78	have its hash signed by a responsible person, and then send it to some
	79	recipient. Now, it is very easy to produce functions
	80	f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is,
	81	say, a different rejection letter, and g(z) is a different acceptance
	82	letter. E.g. by introducing various switches dear/esteemed, Sir/Madam
	83	or pleased/delighted etc. Or, if one is dealing with images rather
	84	than letters by slightly altering intensity values. Then if, one can
	85	produce z, z' such that H(f(z))=H(g(z')) one can have the hash
	86	t=H(f(z)) signed and communicate along with g(z'), for which the hash
	87	is also valid due to the collision. While this still requires
	88	collisions of a special kind it is sufficiently close to arbitrary
	89	collisions and for this reason one considers collision resistance as
	90	the "gold standard" for hash functions.
	91
	92
	93
	94	Birthday paradox
	95	- - - - - - - - -
	96
	97	Let us see, how many steps a brute force attack requires. In order to
	98	do that, we (simplifyingly) assume that H_s is a random function. Let
	99	us write H(x)=H_s(x). So we assume that H is drawn uniformly at random
	100	from {0,1}^*->{0,1}^{l} where l=l(n).
	101
	102	Clearly, if we fix x then the likelihood for x' drawn uniformly at
	103	random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out
	104	ca N/2 values (N=2^l) to get a 50% success probability. This also
	105	holds, if x has been drawn at random. Things are different, if we
	106	randomly choose a set of values S and then search for a collision
	107	within S. Of course, if \|S\|>N then we find a collision with certainty,
	108	but perhaps surprisingly \|S\|>=2*sqrt(N) ensures that the likelihood of
	109	a collision within S is above 50%.
	110
	111	So, to find a collision one can choose a set S of size 2*sqrt(N), sort
	112	it according to hash values and then search (in linear time) for the
	113	desired collision. If one does not find one, repeat with new data.
	114
	115	Thus, for example if l(n)=64 then trying out 2^63 values will (with
	116	probability >=50%) lead to a collision with a fixed left hand side,
	117	but trying out 2^33 values will suffice to get an arbitrary collision
	118	using the birthday paradox. While 2^63 steps can only be carried out
	119	with special hardware resources, carrying out ~2^33 steps is trivial on
	120	today's computers. One must bear this in mind when trying to estimate
	121	"bits of security" of a hash function. Even assuming that the hash
	122	function does not contain systematic security holes we will only get
	123	one half of the hash size (size of the digest) many bits of security.
	124
	125
	126
	127	Improved birthday attack
	128	- - - - - - - - - - - - -
	129
	130	The implementation of the "birthday attack" sketched above takes not
	131	only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a
	132	more expensive resource. Indeed, storing 2^33 thirty-two byte words
	133	requires 64TB. We now describe a version of the birthday attack that
	134	runs in constant space.
	135
	136	If we choose x0:{0,1}^l uniformly at random then, assuming as we did,
	137	that H is a random function then
	138
	139	S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)}
	140
	141	is also random and thus contains a collision with >=50%
	142	likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that
	143	xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a
	144	collision. Note that xi=H(x_{i-1}). Assuming i<j and writing j=i+d we
	145	then also have x_i=x_{i+kd} for all k>=0 and thus x_{i+z}=x_{i+kd+z}
	146	for all k>=0 and z>=0. If we choose k such that kd>=i then
	147	x_{kd}=x_{2kd} follows with z=kd-i.
	148
	149	This suggests the following
	150	algorithm for finding a collision with high likelihood:
	151
	152	x0 <- {0,1}^l /* uniformly at random */
	153	x = x0;y = x0;
	154	for (r=1..4*sqrt(N)) {
	155	x = H(x); /* x=H^r(x0) */
	156	y = H(H(y)); /* y=H^{2r}(x0) */
	157	if (x==y) break;}
	158	if (x!=y) start over with new x0; else
	159	y=x;x=x0; /* y=H^{kd}(x0) */
	160	if (x==x0) start over with new x0; else /* very unlikely */
	161	for (i=0..r) do {
	162	x = H(x); /* x=H^i(x0) */
	163	y = H(y); /* y=H^{i+kd}(x0) */
	164	if (x==y) break; /* must happen before loop runs out */}
	165	return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */
	166
	167
	168
	169	Merkle-Damgard Transform
	170	- - - - - - - - - - - - -
	171
	172	The Merkle-Damgard Transform constructs a collision-resistant hash
	173	function (Gen,H) from a given fixed length collision-resistant hash
	174	function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long
	175	as the input. The key generation remains unchanged and H is as follows.
	176
	177	1) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B
	178	of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L.
	179
	180	2) Set z0=0...0 /* l zeroes */
	181
	182	3) for i=1..B+1 do zi = h_s(z_{i-1} \|\| x_i)
	183
	184	4) Output z_{B+1}
	185
	186
	187	x1 x2 x_B L
	188	0 \| _____ \| _____ \| _____ \| _____
	189	\| \|_\| \| \|_\| \| \|_\| \| \|_\| \|
	190	\|___\| h_s \|______\| h_s \|_...._____\| h_s \|_____\| h_s \|-----> H_s(x)
	191	\|_____\| \|_____\| \|_____\| \|_____\|
	192
	193
	194	Theorem: If (Gen,h) is a collision-resistant fixed length hash
	195	function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant
	196	hash function.
	197
	198	Proof: We first show that any collision in H_s yields a collision in
	199	h_s. Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks
	200	and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1},
	201	z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B\|\|L)
	202	and H_s(x')=h_s(z'_B'\|\|L'), so, if L=/=L' we already have produced the
	203	desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B
	204	(notice that B=B' in this case). If L=L' and z_B=z'_B then, since
	205	z_B=h_s(z_{B-1}\|\|x_B) we get a collision unless x_B=x'_B. Continuing
	206	in this way, we eventually find our collision since x_i =/= x'_i for
	207	at least one i.
	208
	209	Now, suppose that A is an adversary against H_s. We construct an
	210	adversary A' against h_s as follows. If A outputs x,x' we check
	211	whether H_s(x)=H_s(x') and if so, construct the corresponding
	212	collision h_s(y)=h_s(y') in h_s as described above and output
	213	y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e.,
	214	for the output to be a h_s-collision is negligible by assumption and
	215	thus the probability that A output a collision is negligible as
	216	well. Notice that the computations we carried out as A' were clearly
	217	in polynomial time. QED
	218
	219	The well-known and standardized cryptographic hash functions MD5,
	220	SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying
	221	fixed-length hash function (called compression function) in this
	222	context is built using the heuristic principles that were described in
	223	Section 7 (Practical block ciphers. Principles: s/p- and Feistel
	224	networks). We remark here that MD5 and SHA-1 are no longer consider
	225	sufficiently secure whereas SHA-2 is.
	226
	227	The latest standard SHA-3 which is not yet widely used but might
	228	eventually supersede SHA-2 does not use Merkle-Damgard but instead
	229	relies on the so-called sponge construction, see Wikipedia.
	230
	231
	232
	233	MACs from hash functions
	234	- - - - - - - - - - - - -
	235
	236	A natural question is whether one can use hash functions also for the
	237	purpose of authentication, i.e. MAC. First, we note that naive
	238	attempts (which according to KL have been actually used in practice!)
	239
	240
	241	do not work. Consider for example the definition Mac_k(x) = H_s(k\|\|x)
	242	where s is some public seed. If H_s was constructed with the
	243	Merkle-Damgard transform from h_s then we can easily forge MACs as
	244	follows: if t is the MAC of x then t'=h_s(t\|\|L') is the
	245	MAC of (x \|\| \|x\|) (glossing over the padding).
	246
	247	Suppose that we are given a fixed-length MAC (Gen,Mac) capable of
	248	authenticating messages of length l(n). Suppose furthermore, that we
	249	have a collision-resistant hash function (Gen',H) which produces
	250	hashes of length l(n). We can then build a MAC (Gen'',Mac'') for
	251	arbitrary length messages as follows:
	252
	253	Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */
	254	Mac''_{(k,s)}(m): return Mac_k(H_s(m))
	255
	256	Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is
	257	collision-resistant then (Gen'',Mac'') is secure.
	258
	259	Proof: Suppose that an adversary A asks questions Q and then succeeds
	260	in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is
	261	different from all the H_s(y) for y:Q then A has succeeded in forging
	262	a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not
	263	previously asked. If, however, H_s(x) is contained in {H_s(y) \| y:Q}
	264	then A has produced a collision. Since both events occur with
	265	negligible probability the probability that A can forge a MAC is also
	266	negligible. QED
	267
	268	Suppose that H_s() above has been constructed using the Merkle-Damgard
	269	construction from the fixed-length hash function h_s(). For seed s
	270	define a fixed-length MAC by
	271
	272	Mac_k(m) = h_s(k\|\|m)
	273
	274	We now make the assumption that h_s() is such that this is a secure
	275	fixed-length MAC. Notice that the abovedescribed extension attack on
	276	MACs of this sort does not apply due to the fixed length.
	277
	278	If we now apply the above hash-and-MAC construction then it turns out
	279	that the arbitrary-length MAC obtained satisfies "essentially"
	280
	281	Mac''_{(k,s)}(m) = H_s(k\|\|H_s(m))
	282
	283	because the last H_s degenerates to h_s due to the shortness of
	284	H_s(m). The quoted "essentially" refers to the fact that for this
	285	equation to hold exactly one needs to assume some conventions about
	286	how exactly the padding in the Merkle-Damgard construction is
	287	performed. We want to ignore these details here and refer to the
	288	literature.
	289
	290
	291	The HMAC construction
	292	- - - - - - - - - - -
	293
	294	The widely used and standardised HMAC construction is essentially the
	295	above except that another key is added to the message m prior to
	296	invoking the inner hash function and that the two keys that are thus
	297	needed are derived from a single one by XOR-ing with certain fixed
	298	words ipad and opad:
	299
	300	HMAC_{(k,s)}(m) = H_s( (k+opad) \|\| H_s((k+ipad) \|\| m))
	301
	302	In practice, ipad and opad are chosen to be 00110110 and 01011100 each
	303	repeated as often as necessary.
	304
	305	The additional inner keying with k+ipad allows one to weaken the
	306	assumptions on the collision freeness of H_s(). The details of these
	307	assumptions and their practical relevance are still subject to debate
	308	(see a 2012 paper by Koblitz and Menezes); however, no practical
	309	attack has been found to date and HMAC is considered secure in
	310	practice.