summaryrefslogtreecommitdiff
path: root/ss2017/cryptography/Notes11.txt
diff options
context:
space:
mode:
Diffstat (limited to 'ss2017/cryptography/Notes11.txt')
-rw-r--r--ss2017/cryptography/Notes11.txt310
1 files changed, 310 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt
new file mode 100644
index 0000000..839d24b
--- /dev/null
+++ b/ss2017/cryptography/Notes11.txt
@@ -0,0 +1,310 @@
1
2
311 Hashing, Merkle-Damgard transform, SHA, HMAC
4-----------------------------------------------
5
6Hash functions serve a related but slightly different purpose to that
7of MACs. Unlike those (and private key encryption) they are not
8immediately useful on their own but rather are building blocks and
9indeed are "a workhorse of cryptography" like e.g. solving systems of
10linear equations is to engineering.
11
12Intuitively, a hash function H() transforms a long string m into a
13short digest ("hash") t=H(m) such that it is extremely difficult to
14produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as
15a *collision*.
16
17As such, the notion cannot really be made sense of in the context of
18computational complexity for once we have x=/=x' with H(x)=H(x') then
19we can just hardwire x,x' into a program. In order to get around this,
20we introduce as before randomly chosen keys and security parameters.
21This will allow us to formulate meaningful security goals and
22meaningful security proofs for various constructions involving hash
23functions.
24
25Definition: A *hash function* consists of a (probabilistic polytime)
26key generation algorithm Gen(n) as before and a polytime function H
27taking as input a key s ("seed") and m:{0,1}^* then
28H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple,
29length function, e.g. l(n)=n.
30
31There may also be another length function l'(n)>l(n) and
32H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a
33fixed-length hash function.
34
35Definition: A hash function Pi=(Gen,H) as above is *collision resistant* if
36for all probabilistic polytime adversaries A one has
37
38 Pr[Hash-coll_{A,Pi}(n)=1] = negl(n)
39
40for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the
41following experiment:
42
431) s <-Gen(n) /* s = "seed" */
442) A is given s and then outputs x,x'
453) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and
460 otherwise.
47
48Thus, the adversary must provide a collision given a key s and the
49probability that it succeeds in doing so should be negligible. In
50contrast, in the case of MACs the adversary does not get access to the
51key.
52
53
54
55Weaker notions of security
56- - - - - - - - - - - - - -
57
58In this context one may ask why being able to produce a collision
59should be considered dangerous. Firstly, if a hash function is
60collision resistant then it follows that it is also not possible to
61forge a message x' =/=x such that H(x)=H(x') given x. So collision
62resistance is certainly a "good enough" notion. However, one could
63instead ask for the weaker property of "second preimage resistance":
64it should be difficult to, given x, produce x'=/=x such that
65H(x)=(x').
66
67A typical application of hash functions is message integrity: If we
68send someone a message x via some untrusted channel (which can be
69modified by an adversary) and the hash value t=H(x) via a trusted
70channel then the receiver can check that the message has not been
71tampered with by checking that t=H(x') with x' being the received
72(possibly modified) message.
73
74While being able to produce an arbitrary collision may be harmless,
75being able to find a collision in a given set of texts of size ~
762^{l(n)} can be very dangerous and is not covered by "second preimage
77resistance": Consider the following scenario: we produce a letter L,
78have its hash signed by a responsible person, and then send it to some
79recipient. Now, it is very easy to produce functions
80f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is,
81say, a different rejection letter, and g(z) is a different acceptance
82letter. E.g. by introducing various switches dear/esteemed, Sir/Madam
83or pleased/delighted etc. Or, if one is dealing with images rather
84than letters by slightly altering intensity values. Then if, one can
85produce z, z' such that H(f(z))=H(g(z')) one can have the hash
86t=H(f(z)) signed and communicate along with g(z'), for which the hash
87is also valid due to the collision. While this still requires
88collisions of a special kind it is sufficiently close to arbitrary
89collisions and for this reason one considers collision resistance as
90the "gold standard" for hash functions.
91
92
93
94Birthday paradox
95- - - - - - - - -
96
97Let us see, how many steps a brute force attack requires. In order to
98do that, we (simplifyingly) assume that H_s is a random function. Let
99us write H(x)=H_s(x). So we assume that H is drawn uniformly at random
100from {0,1}^*->{0,1}^{l} where l=l(n).
101
102Clearly, if we fix x then the likelihood for x' drawn uniformly at
103random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out
104ca N/2 values (N=2^l) to get a 50% success probability. This also
105holds, if x has been drawn at random. Things are different, if we
106randomly choose a set of values S and then *search* for a collision
107within S. Of course, if |S|>N then we find a collision with certainty,
108but perhaps surprisingly |S|>=2*sqrt(N) ensures that the likelihood of
109a collision within S is above 50%.
110
111So, to find a collision one can choose a set S of size 2*sqrt(N), sort
112it according to hash values and then search (in linear time) for the
113desired collision. If one does not find one, repeat with new data.
114
115Thus, for example if l(n)=64 then trying out 2^63 values will (with
116probability >=50%) lead to a collision with a fixed left hand side,
117but trying out 2^33 values will suffice to get an arbitrary collision
118using the birthday paradox. While 2^63 steps can only be carried out
119with special hardware resources, carrying out ~2^33 steps is trivial on
120today's computers. One must bear this in mind when trying to estimate
121"bits of security" of a hash function. Even assuming that the hash
122function does not contain systematic security holes we will only get
123one half of the hash size (size of the digest) many bits of security.
124
125
126
127Improved birthday attack
128- - - - - - - - - - - - -
129
130The implementation of the "birthday attack" sketched above takes not
131only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a
132more expensive resource. Indeed, storing 2^33 thirty-two byte words
133requires 64TB. We now describe a version of the birthday attack that
134runs in constant space.
135
136If we choose x0:{0,1}^l uniformly at random then, assuming as we did,
137that H is a random function then
138
139S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)}
140
141is also random and thus contains a collision with >=50%
142likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that
143xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a
144collision. Note that xi=H(x_{i-1}). Assuming i<j and writing j=i+d we
145then also have x_i=x_{i+kd} for all k>=0 and thus x_{i+z}=x_{i+kd+z}
146for all k>=0 and z>=0. If we choose k such that kd>=i then
147x_{kd}=x_{2kd} follows with z=kd-i.
148
149This suggests the following
150algorithm for finding a collision with high likelihood:
151
152x0 <- {0,1}^l /* uniformly at random */
153x = x0;y = x0;
154for (r=1..4*sqrt(N)) {
155 x = H(x); /* x=H^r(x0) */
156 y = H(H(y)); /* y=H^{2r}(x0) */
157 if (x==y) break;}
158if (x!=y) start over with new x0; else
159y=x;x=x0; /* y=H^{kd}(x0) */
160if (x==x0) start over with new x0; else /* very unlikely */
161for (i=0..r) do {
162 x = H(x); /* x=H^i(x0) */
163 y = H(y); /* y=H^{i+kd}(x0) */
164 if (x==y) break; /* must happen before loop runs out */}
165return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */
166
167
168
169Merkle-Damgard Transform
170- - - - - - - - - - - - -
171
172The Merkle-Damgard Transform constructs a collision-resistant hash
173function (Gen,H) from a given *fixed length* collision-resistant hash
174function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long
175as the input. The key generation remains unchanged and H is as follows.
176
1771) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B
178of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L.
179
1802) Set z0=0...0 /* l zeroes */
181
1823) for i=1..B+1 do zi = h_s(z_{i-1} || x_i)
183
1844) Output z_{B+1}
185
186
187 x1 x2 x_B L
1880 | _____ | _____ | _____ | _____
189| |_| | |_| | |_| | |_| |
190|___| h_s |______| h_s |_...._____| h_s |_____| h_s |-----> H_s(x)
191 |_____| |_____| |_____| |_____|
192
193
194Theorem: If (Gen,h) is a collision-resistant fixed length hash
195function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant
196hash function.
197
198Proof: We first show that any collision in H_s yields a collision in
199h_s. Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks
200and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1},
201z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B||L)
202and H_s(x')=h_s(z'_B'||L'), so, if L=/=L' we already have produced the
203desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B
204(notice that B=B' in this case). If L=L' and z_B=z'_B then, since
205z_B=h_s(z_{B-1}||x_B) we get a collision unless x_B=x'_B. Continuing
206in this way, we eventually find our collision since x_i =/= x'_i for
207at least one i.
208
209Now, suppose that A is an adversary against H_s. We construct an
210adversary A' against h_s as follows. If A outputs x,x' we check
211whether H_s(x)=H_s(x') and if so, construct the corresponding
212collision h_s(y)=h_s(y') in h_s as described above and output
213y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e.,
214for the output to be a h_s-collision is negligible by assumption and
215thus the probability that A output a collision is negligible as
216well. Notice that the computations we carried out as A' were clearly
217in polynomial time. QED
218
219The well-known and standardized cryptographic hash functions MD5,
220SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying
221fixed-length hash function (called compression function) in this
222context is built using the heuristic principles that were described in
223Section 7 (Practical block ciphers. Principles: s/p- and Feistel
224networks). We remark here that MD5 and SHA-1 are no longer consider
225sufficiently secure whereas SHA-2 is.
226
227The latest standard SHA-3 which is not yet widely used but might
228eventually supersede SHA-2 does not use Merkle-Damgard but instead
229relies on the so-called *sponge construction*, see Wikipedia.
230
231
232
233MACs from hash functions
234- - - - - - - - - - - - -
235
236A natural question is whether one can use hash functions also for the
237purpose of authentication, i.e. MAC. First, we note that naive
238attempts (which according to KL have been actually used in practice!)
239
240
241do not work. Consider for example the definition Mac_k(x) = H_s(k||x)
242where s is some public seed. If H_s was constructed with the
243Merkle-Damgard transform from h_s then we can easily forge MACs as
244follows: if t is the MAC of x then t'=h_s(t||L') is the
245MAC of (x || |x|) (glossing over the padding).
246
247Suppose that we are given a *fixed-length* MAC (Gen,Mac) capable of
248authenticating messages of length l(n). Suppose furthermore, that we
249have a collision-resistant hash function (Gen',H) which produces
250hashes of length l(n). We can then build a MAC (Gen'',Mac'') for
251arbitrary length messages as follows:
252
253Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */
254Mac''_{(k,s)}(m): return Mac_k(H_s(m))
255
256Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is
257collision-resistant then (Gen'',Mac'') is secure.
258
259Proof: Suppose that an adversary A asks questions Q and then succeeds
260in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is
261different from all the H_s(y) for y:Q then A has succeeded in forging
262a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not
263previously asked. If, however, H_s(x) is contained in {H_s(y) | y:Q}
264then A has produced a collision. Since both events occur with
265negligible probability the probability that A can forge a MAC is also
266negligible. QED
267
268Suppose that H_s() above has been constructed using the Merkle-Damgard
269construction from the fixed-length hash function h_s(). For seed s
270define a fixed-length MAC by
271
272 Mac_k(m) = h_s(k||m)
273
274We now make the assumption that h_s() is such that this is a secure
275fixed-length MAC. Notice that the abovedescribed extension attack on
276MACs of this sort does not apply due to the fixed length.
277
278If we now apply the above hash-and-MAC construction then it turns out
279that the arbitrary-length MAC obtained satisfies "essentially"
280
281Mac''_{(k,s)}(m) = H_s(k||H_s(m))
282
283because the last H_s degenerates to h_s due to the shortness of
284H_s(m). The quoted "essentially" refers to the fact that for this
285equation to hold exactly one needs to assume some conventions about
286how exactly the padding in the Merkle-Damgard construction is
287performed. We want to ignore these details here and refer to the
288literature.
289
290
291The HMAC construction
292- - - - - - - - - - -
293
294The widely used and standardised HMAC construction is essentially the
295above except that another key is added to the message m prior to
296invoking the inner hash function and that the two keys that are thus
297needed are derived from a single one by XOR-ing with certain fixed
298words ipad and opad:
299
300HMAC_{(k,s)}(m) = H_s( (k+opad) || H_s((k+ipad) || m))
301
302In practice, ipad and opad are chosen to be 00110110 and 01011100 each
303repeated as often as necessary.
304
305The additional inner keying with k+ipad allows one to weaken the
306assumptions on the collision freeness of H_s(). The details of these
307assumptions and their practical relevance are still subject to debate
308(see a 2012 paper by Koblitz and Menezes); however, no practical
309attack has been found to date and HMAC is considered secure in
310practice.