From 162df3190c4e179ef968b0a467ddd4951faba336 Mon Sep 17 00:00:00 2001 From: Gregor Kleen Date: Tue, 24 Oct 2017 15:33:07 +0200 Subject: Notes on cryptography --- ss2017/cryptography/Notes11.txt | 310 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 310 insertions(+) create mode 100644 ss2017/cryptography/Notes11.txt (limited to 'ss2017/cryptography/Notes11.txt') diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt new file mode 100644 index 0000000..839d24b --- /dev/null +++ b/ss2017/cryptography/Notes11.txt @@ -0,0 +1,310 @@ + + +11 Hashing, Merkle-Damgard transform, SHA, HMAC +----------------------------------------------- + +Hash functions serve a related but slightly different purpose to that +of MACs. Unlike those (and private key encryption) they are not +immediately useful on their own but rather are building blocks and +indeed are "a workhorse of cryptography" like e.g. solving systems of +linear equations is to engineering. + +Intuitively, a hash function H() transforms a long string m into a +short digest ("hash") t=H(m) such that it is extremely difficult to +produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as +a *collision*. + +As such, the notion cannot really be made sense of in the context of +computational complexity for once we have x=/=x' with H(x)=H(x') then +we can just hardwire x,x' into a program. In order to get around this, +we introduce as before randomly chosen keys and security parameters. +This will allow us to formulate meaningful security goals and +meaningful security proofs for various constructions involving hash +functions. + +Definition: A *hash function* consists of a (probabilistic polytime) +key generation algorithm Gen(n) as before and a polytime function H +taking as input a key s ("seed") and m:{0,1}^* then +H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple, +length function, e.g. l(n)=n. + +There may also be another length function l'(n)>l(n) and +H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a +fixed-length hash function. + +Definition: A hash function Pi=(Gen,H) as above is *collision resistant* if +for all probabilistic polytime adversaries A one has + + Pr[Hash-coll_{A,Pi}(n)=1] = negl(n) + +for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the +following experiment: + +1) s <-Gen(n) /* s = "seed" */ +2) A is given s and then outputs x,x' +3) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and +0 otherwise. + +Thus, the adversary must provide a collision given a key s and the +probability that it succeeds in doing so should be negligible. In +contrast, in the case of MACs the adversary does not get access to the +key. + + + +Weaker notions of security +- - - - - - - - - - - - - - + +In this context one may ask why being able to produce a collision +should be considered dangerous. Firstly, if a hash function is +collision resistant then it follows that it is also not possible to +forge a message x' =/=x such that H(x)=H(x') given x. So collision +resistance is certainly a "good enough" notion. However, one could +instead ask for the weaker property of "second preimage resistance": +it should be difficult to, given x, produce x'=/=x such that +H(x)=(x'). + +A typical application of hash functions is message integrity: If we +send someone a message x via some untrusted channel (which can be +modified by an adversary) and the hash value t=H(x) via a trusted +channel then the receiver can check that the message has not been +tampered with by checking that t=H(x') with x' being the received +(possibly modified) message. + +While being able to produce an arbitrary collision may be harmless, +being able to find a collision in a given set of texts of size ~ +2^{l(n)} can be very dangerous and is not covered by "second preimage +resistance": Consider the following scenario: we produce a letter L, +have its hash signed by a responsible person, and then send it to some +recipient. Now, it is very easy to produce functions +f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is, +say, a different rejection letter, and g(z) is a different acceptance +letter. E.g. by introducing various switches dear/esteemed, Sir/Madam +or pleased/delighted etc. Or, if one is dealing with images rather +than letters by slightly altering intensity values. Then if, one can +produce z, z' such that H(f(z))=H(g(z')) one can have the hash +t=H(f(z)) signed and communicate along with g(z'), for which the hash +is also valid due to the collision. While this still requires +collisions of a special kind it is sufficiently close to arbitrary +collisions and for this reason one considers collision resistance as +the "gold standard" for hash functions. + + + +Birthday paradox +- - - - - - - - - + +Let us see, how many steps a brute force attack requires. In order to +do that, we (simplifyingly) assume that H_s is a random function. Let +us write H(x)=H_s(x). So we assume that H is drawn uniformly at random +from {0,1}^*->{0,1}^{l} where l=l(n). + +Clearly, if we fix x then the likelihood for x' drawn uniformly at +random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out +ca N/2 values (N=2^l) to get a 50% success probability. This also +holds, if x has been drawn at random. Things are different, if we +randomly choose a set of values S and then *search* for a collision +within S. Of course, if |S|>N then we find a collision with certainty, +but perhaps surprisingly |S|>=2*sqrt(N) ensures that the likelihood of +a collision within S is above 50%. + +So, to find a collision one can choose a set S of size 2*sqrt(N), sort +it according to hash values and then search (in linear time) for the +desired collision. If one does not find one, repeat with new data. + +Thus, for example if l(n)=64 then trying out 2^63 values will (with +probability >=50%) lead to a collision with a fixed left hand side, +but trying out 2^33 values will suffice to get an arbitrary collision +using the birthday paradox. While 2^63 steps can only be carried out +with special hardware resources, carrying out ~2^33 steps is trivial on +today's computers. One must bear this in mind when trying to estimate +"bits of security" of a hash function. Even assuming that the hash +function does not contain systematic security holes we will only get +one half of the hash size (size of the digest) many bits of security. + + + +Improved birthday attack +- - - - - - - - - - - - - + +The implementation of the "birthday attack" sketched above takes not +only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a +more expensive resource. Indeed, storing 2^33 thirty-two byte words +requires 64TB. We now describe a version of the birthday attack that +runs in constant space. + +If we choose x0:{0,1}^l uniformly at random then, assuming as we did, +that H is a random function then + +S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)} + +is also random and thus contains a collision with >=50% +likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that +xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a +collision. Note that xi=H(x_{i-1}). Assuming i=0 and thus x_{i+z}=x_{i+kd+z} +for all k>=0 and z>=0. If we choose k such that kd>=i then +x_{kd}=x_{2kd} follows with z=kd-i. + +This suggests the following +algorithm for finding a collision with high likelihood: + +x0 <- {0,1}^l /* uniformly at random */ +x = x0;y = x0; +for (r=1..4*sqrt(N)) { + x = H(x); /* x=H^r(x0) */ + y = H(H(y)); /* y=H^{2r}(x0) */ + if (x==y) break;} +if (x!=y) start over with new x0; else +y=x;x=x0; /* y=H^{kd}(x0) */ +if (x==x0) start over with new x0; else /* very unlikely */ +for (i=0..r) do { + x = H(x); /* x=H^i(x0) */ + y = H(y); /* y=H^{i+kd}(x0) */ + if (x==y) break; /* must happen before loop runs out */} +return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */ + + + +Merkle-Damgard Transform +- - - - - - - - - - - - - + +The Merkle-Damgard Transform constructs a collision-resistant hash +function (Gen,H) from a given *fixed length* collision-resistant hash +function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long +as the input. The key generation remains unchanged and H is as follows. + +1) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B +of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L. + +2) Set z0=0...0 /* l zeroes */ + +3) for i=1..B+1 do zi = h_s(z_{i-1} || x_i) + +4) Output z_{B+1} + + + x1 x2 x_B L +0 | _____ | _____ | _____ | _____ +| |_| | |_| | |_| | |_| | +|___| h_s |______| h_s |_...._____| h_s |_____| h_s |-----> H_s(x) + |_____| |_____| |_____| |_____| + + +Theorem: If (Gen,h) is a collision-resistant fixed length hash +function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant +hash function. + +Proof: We first show that any collision in H_s yields a collision in +h_s. Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks +and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1}, +z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B||L) +and H_s(x')=h_s(z'_B'||L'), so, if L=/=L' we already have produced the +desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B +(notice that B=B' in this case). If L=L' and z_B=z'_B then, since +z_B=h_s(z_{B-1}||x_B) we get a collision unless x_B=x'_B. Continuing +in this way, we eventually find our collision since x_i =/= x'_i for +at least one i. + +Now, suppose that A is an adversary against H_s. We construct an +adversary A' against h_s as follows. If A outputs x,x' we check +whether H_s(x)=H_s(x') and if so, construct the corresponding +collision h_s(y)=h_s(y') in h_s as described above and output +y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e., +for the output to be a h_s-collision is negligible by assumption and +thus the probability that A output a collision is negligible as +well. Notice that the computations we carried out as A' were clearly +in polynomial time. QED + +The well-known and standardized cryptographic hash functions MD5, +SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying +fixed-length hash function (called compression function) in this +context is built using the heuristic principles that were described in +Section 7 (Practical block ciphers. Principles: s/p- and Feistel +networks). We remark here that MD5 and SHA-1 are no longer consider +sufficiently secure whereas SHA-2 is. + +The latest standard SHA-3 which is not yet widely used but might +eventually supersede SHA-2 does not use Merkle-Damgard but instead +relies on the so-called *sponge construction*, see Wikipedia. + + + +MACs from hash functions +- - - - - - - - - - - - - + +A natural question is whether one can use hash functions also for the +purpose of authentication, i.e. MAC. First, we note that naive +attempts (which according to KL have been actually used in practice!) + + +do not work. Consider for example the definition Mac_k(x) = H_s(k||x) +where s is some public seed. If H_s was constructed with the +Merkle-Damgard transform from h_s then we can easily forge MACs as +follows: if t is the MAC of x then t'=h_s(t||L') is the +MAC of (x || |x|) (glossing over the padding). + +Suppose that we are given a *fixed-length* MAC (Gen,Mac) capable of +authenticating messages of length l(n). Suppose furthermore, that we +have a collision-resistant hash function (Gen',H) which produces +hashes of length l(n). We can then build a MAC (Gen'',Mac'') for +arbitrary length messages as follows: + +Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */ +Mac''_{(k,s)}(m): return Mac_k(H_s(m)) + +Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is +collision-resistant then (Gen'',Mac'') is secure. + +Proof: Suppose that an adversary A asks questions Q and then succeeds +in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is +different from all the H_s(y) for y:Q then A has succeeded in forging +a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not +previously asked. If, however, H_s(x) is contained in {H_s(y) | y:Q} +then A has produced a collision. Since both events occur with +negligible probability the probability that A can forge a MAC is also +negligible. QED + +Suppose that H_s() above has been constructed using the Merkle-Damgard +construction from the fixed-length hash function h_s(). For seed s +define a fixed-length MAC by + + Mac_k(m) = h_s(k||m) + +We now make the assumption that h_s() is such that this is a secure +fixed-length MAC. Notice that the abovedescribed extension attack on +MACs of this sort does not apply due to the fixed length. + +If we now apply the above hash-and-MAC construction then it turns out +that the arbitrary-length MAC obtained satisfies "essentially" + +Mac''_{(k,s)}(m) = H_s(k||H_s(m)) + +because the last H_s degenerates to h_s due to the shortness of +H_s(m). The quoted "essentially" refers to the fact that for this +equation to hold exactly one needs to assume some conventions about +how exactly the padding in the Merkle-Damgard construction is +performed. We want to ignore these details here and refer to the +literature. + + +The HMAC construction +- - - - - - - - - - - + +The widely used and standardised HMAC construction is essentially the +above except that another key is added to the message m prior to +invoking the inner hash function and that the two keys that are thus +needed are derived from a single one by XOR-ing with certain fixed +words ipad and opad: + +HMAC_{(k,s)}(m) = H_s( (k+opad) || H_s((k+ipad) || m)) + +In practice, ipad and opad are chosen to be 00110110 and 01011100 each +repeated as often as necessary. + +The additional inner keying with k+ipad allows one to weaken the +assumptions on the collision freeness of H_s(). The details of these +assumptions and their practical relevance are still subject to debate +(see a 2012 paper by Koblitz and Menezes); however, no practical +attack has been found to date and HMAC is considered secure in +practice. -- cgit v1.2.3