From 162df3190c4e179ef968b0a467ddd4951faba336 Mon Sep 17 00:00:00 2001
From: Gregor Kleen <gkleen@yggdrasil.li>
Date: Tue, 24 Oct 2017 15:33:07 +0200
Subject: Notes on cryptography

---
 ss2017/cryptography/Notes11.txt | 310 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 310 insertions(+)
 create mode 100644 ss2017/cryptography/Notes11.txt

(limited to 'ss2017/cryptography/Notes11.txt')

diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt
new file mode 100644
index 0000000..839d24b
--- /dev/null
+++ b/ss2017/cryptography/Notes11.txt
@@ -0,0 +1,310 @@
+
+
+11 Hashing, Merkle-Damgard transform, SHA, HMAC
+-----------------------------------------------
+
+Hash functions serve a related but slightly different purpose to that
+of MACs. Unlike those (and private key encryption) they are not
+immediately useful on their own but rather are building blocks and
+indeed are "a workhorse of cryptography" like e.g. solving systems of
+linear equations is to engineering.
+
+Intuitively, a hash function H() transforms a long string m into a
+short digest ("hash") t=H(m) such that it is extremely difficult to
+produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as
+a *collision*.
+
+As such, the notion cannot really be made sense of in the context of
+computational complexity for once we have x=/=x' with H(x)=H(x') then
+we can just hardwire x,x' into a program. In order to get around this,
+we introduce as before randomly chosen keys and security parameters.
+This will allow us to formulate meaningful security goals and
+meaningful security proofs for various constructions involving hash
+functions. 
+
+Definition: A *hash function* consists of a (probabilistic polytime)
+key generation algorithm Gen(n) as before and a polytime function H
+taking as input a key s ("seed") and m:{0,1}^* then
+H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple,
+length function, e.g. l(n)=n.
+
+There may also be another length function l'(n)>l(n) and
+H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a
+fixed-length hash function.
+
+Definition: A hash function Pi=(Gen,H) as above is *collision resistant* if
+for all probabilistic polytime adversaries A one has
+ 
+            Pr[Hash-coll_{A,Pi}(n)=1] = negl(n)
+
+for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the
+following experiment:
+
+1) s <-Gen(n) /* s = "seed" */
+2) A is given s and then outputs x,x'
+3) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and
+0 otherwise.
+
+Thus, the adversary must provide a collision given a key s and the
+probability that it succeeds in doing so should be negligible. In
+contrast, in the case of MACs the adversary does not get access to the
+key.
+
+
+
+Weaker notions of security
+- - - - - - - - - - - - - -
+
+In this context one may ask why being able to produce a collision
+should be considered dangerous. Firstly, if a hash function is
+collision resistant then it follows that it is also not possible to
+forge a message x' =/=x such that H(x)=H(x') given x. So collision
+resistance is certainly a "good enough" notion. However, one could
+instead ask for the weaker property of "second preimage resistance":
+it should be difficult to, given x, produce x'=/=x such that
+H(x)=(x').
+
+A typical application of hash functions is message integrity: If we
+send someone a message x via some untrusted channel (which can be
+modified by an adversary) and the hash value t=H(x) via a trusted
+channel then the receiver can check that the message has not been
+tampered with by checking that t=H(x') with x' being the received
+(possibly modified) message.
+
+While being able to produce an arbitrary collision may be harmless,
+being able to find a collision in a given set of texts of size ~
+2^{l(n)} can be very dangerous and is not covered by "second preimage
+resistance": Consider the following scenario: we produce a letter L,
+have its hash signed by a responsible person, and then send it to some
+recipient. Now, it is very easy to produce functions
+f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is,
+say, a different rejection letter, and g(z) is a different acceptance
+letter. E.g. by introducing various switches dear/esteemed, Sir/Madam
+or pleased/delighted etc. Or, if one is dealing with images rather
+than letters by slightly altering intensity values. Then if, one can
+produce z, z' such that H(f(z))=H(g(z')) one can have the hash
+t=H(f(z)) signed and communicate along with g(z'), for which the hash
+is also valid due to the collision. While this still requires
+collisions of a special kind it is sufficiently close to arbitrary
+collisions and for this reason one considers collision resistance as
+the "gold standard" for hash functions.
+
+
+
+Birthday paradox
+- - - - - - - - -
+
+Let us see, how many steps a brute force attack requires. In order to
+do that, we (simplifyingly) assume that H_s is a random function. Let
+us write H(x)=H_s(x). So we assume that H is drawn uniformly at random
+from {0,1}^*->{0,1}^{l} where l=l(n). 
+
+Clearly, if we fix x then the likelihood for x' drawn uniformly at
+random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out
+ca N/2 values (N=2^l) to get a 50% success probability. This also
+holds, if x has been drawn at random. Things are different, if we
+randomly choose a set of values S and then *search* for a collision
+within S. Of course, if |S|>N then we find a collision with certainty,
+but perhaps surprisingly |S|>=2*sqrt(N) ensures that the likelihood of
+a collision within S is above 50%.
+
+So, to find a collision one can choose a set S of size 2*sqrt(N), sort
+it according to hash values and then search (in linear time) for the
+desired collision. If one does not find one, repeat with new data.
+
+Thus, for example if l(n)=64 then trying out 2^63 values will (with
+probability >=50%) lead to a collision with a fixed left hand side,
+but trying out 2^33 values will suffice to get an arbitrary collision
+using the birthday paradox. While 2^63 steps can only be carried out
+with special hardware resources, carrying out ~2^33 steps is trivial on
+today's computers. One must bear this in mind when trying to estimate
+"bits of security" of a hash function. Even assuming that the hash
+function does not contain systematic security holes we will only get
+one half of the hash size (size of the digest) many bits of security.
+
+
+
+Improved birthday attack
+- - - - - - - - - - - - -  
+
+The implementation of the "birthday attack" sketched above takes not
+only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a
+more expensive resource. Indeed, storing 2^33 thirty-two byte words
+requires 64TB. We now describe a version of the birthday attack that
+runs in constant space.
+
+If we choose x0:{0,1}^l uniformly at random then, assuming as we did,
+that H is a random function then
+
+S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)}
+
+is also random and thus contains a collision with >=50%
+likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that
+xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a
+collision. Note that xi=H(x_{i-1}). Assuming i<j and writing j=i+d we
+then also have x_i=x_{i+kd} for all k>=0 and thus x_{i+z}=x_{i+kd+z}
+for all k>=0 and z>=0. If we choose k such that kd>=i then
+x_{kd}=x_{2kd} follows with z=kd-i.
+
+This suggests the following
+algorithm for finding a collision with high likelihood:
+
+x0 <- {0,1}^l /* uniformly at random */
+x = x0;y = x0; 
+for (r=1..4*sqrt(N)) {
+   x = H(x); /* x=H^r(x0) */
+   y = H(H(y)); /* y=H^{2r}(x0) */
+   if (x==y) break;}
+if (x!=y) start over with new x0; else
+y=x;x=x0; /* y=H^{kd}(x0) */
+if (x==x0) start over with new x0; else /* very unlikely */ 
+for (i=0..r) do {
+   x = H(x); /* x=H^i(x0) */
+   y = H(y); /* y=H^{i+kd}(x0) */
+   if (x==y) break; /* must happen before loop runs out */}
+return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */
+
+
+
+Merkle-Damgard Transform
+- - - - - - - - - - - - -
+
+The Merkle-Damgard Transform constructs a collision-resistant hash
+function (Gen,H) from a given *fixed length* collision-resistant hash
+function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long
+as the input. The key generation remains unchanged and H is as follows.
+
+1) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B
+of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L. 
+
+2) Set z0=0...0 /* l zeroes */ 
+
+3) for i=1..B+1 do zi = h_s(z_{i-1} || x_i)
+
+4) Output z_{B+1}
+
+
+  x1           x2              x_B          L  
+0 |  _____     |  _____         |  _____    |  _____
+| |_|     |    |_|     |        |_|     |   |_|     |
+|___| h_s |______| h_s |_...._____| h_s |_____| h_s |-----> H_s(x)
+    |_____|      |_____|          |_____|     |_____|
+
+
+Theorem: If (Gen,h) is a collision-resistant fixed length hash
+function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant
+hash function.
+
+Proof: We first show that any collision in H_s yields a collision in
+h_s.  Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks
+and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1},
+z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B||L)
+and H_s(x')=h_s(z'_B'||L'), so, if L=/=L' we already have produced the
+desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B
+(notice that B=B' in this case). If L=L' and z_B=z'_B then, since
+z_B=h_s(z_{B-1}||x_B) we get a collision unless x_B=x'_B. Continuing
+in this way, we eventually find our collision since x_i =/= x'_i for
+at least one i.
+
+Now, suppose that A is an adversary against H_s. We construct an
+adversary A' against h_s as follows. If A outputs x,x' we check
+whether H_s(x)=H_s(x') and if so, construct the corresponding
+collision h_s(y)=h_s(y') in h_s as described above and output
+y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e.,
+for the output to be a h_s-collision is negligible by assumption and
+thus the probability that A output a collision is negligible as
+well. Notice that the computations we carried out as A' were clearly
+in polynomial time. QED
+
+The well-known and standardized cryptographic hash functions MD5,
+SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying
+fixed-length hash function (called compression function) in this
+context is built using the heuristic principles that were described in
+Section 7 (Practical block ciphers. Principles: s/p- and Feistel
+networks).  We remark here that MD5 and SHA-1 are no longer consider
+sufficiently secure whereas SHA-2 is.
+
+The latest standard SHA-3 which is not yet widely used but might
+eventually supersede SHA-2 does not use Merkle-Damgard but instead
+relies on the so-called *sponge construction*, see Wikipedia. 
+
+
+
+MACs from hash functions
+- - - - - - - - - - - - - 
+
+A natural question is whether one can use hash functions also for the
+purpose of authentication, i.e. MAC. First, we note that naive
+attempts (which according to KL have been actually used in practice!)
+
+
+do not work. Consider for example the definition Mac_k(x) = H_s(k||x)
+where s is some public seed. If H_s was constructed with the
+Merkle-Damgard transform from h_s then we can easily forge MACs as
+follows: if t is the MAC of x then t'=h_s(t||L') is the
+MAC of (x || |x|) (glossing over the padding).
+
+Suppose that we are given a *fixed-length* MAC (Gen,Mac) capable of
+authenticating messages of length l(n). Suppose furthermore, that we
+have a collision-resistant hash function (Gen',H) which produces
+hashes of length l(n). We can then build a MAC (Gen'',Mac'') for
+arbitrary length messages as follows:
+
+Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */
+Mac''_{(k,s)}(m): return Mac_k(H_s(m))
+
+Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is
+collision-resistant then (Gen'',Mac'') is secure.
+
+Proof: Suppose that an adversary A asks questions Q and then succeeds
+in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is
+different from all the H_s(y) for y:Q then A has succeeded in forging
+a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not
+previously asked. If, however, H_s(x) is contained in {H_s(y) | y:Q}
+then A has produced a collision. Since both events occur with
+negligible probability the probability that A can forge a MAC is also
+negligible. QED 
+
+Suppose that H_s() above has been constructed using the Merkle-Damgard
+construction from the fixed-length hash function h_s(). For seed s
+define a fixed-length MAC by
+
+                    Mac_k(m) = h_s(k||m)
+
+We now make the assumption that h_s() is such that this is a secure
+fixed-length MAC. Notice that the abovedescribed extension attack on
+MACs of this sort does not apply due to the fixed length.
+
+If we now apply the above hash-and-MAC construction then it turns out
+that the arbitrary-length MAC obtained satisfies "essentially"
+
+Mac''_{(k,s)}(m) = H_s(k||H_s(m))
+
+because the last H_s degenerates to h_s due to the shortness of
+H_s(m). The quoted "essentially" refers to the fact that for this
+equation to hold exactly one needs to assume some conventions about
+how exactly the padding in the Merkle-Damgard construction is
+performed. We want to ignore these details here and refer to the
+literature.
+
+
+The HMAC construction
+- - - - - - - - - - -
+
+The widely used and standardised HMAC construction is essentially the
+above except that another key is added to the message m prior to
+invoking the inner hash function and that the two keys that are thus
+needed are derived from a single one by XOR-ing with certain fixed
+words ipad and opad:
+
+HMAC_{(k,s)}(m) = H_s( (k+opad) || H_s((k+ipad) || m))
+
+In practice, ipad and opad are chosen to be 00110110 and 01011100 each
+repeated as often as necessary.
+
+The additional inner keying with k+ipad allows one to weaken the
+assumptions on the collision freeness of H_s(). The details of these
+assumptions and their practical relevance are still subject to debate
+(see a 2012 paper by Koblitz and Menezes); however, no practical
+attack has been found to date and HMAC is considered secure in
+practice.
-- 
cgit v1.2.3