diff options
author | Gregor Kleen <gkleen@yggdrasil.li> | 2017-10-24 15:33:07 +0200 |
---|---|---|
committer | Gregor Kleen <gkleen@yggdrasil.li> | 2017-10-24 15:33:07 +0200 |
commit | 162df3190c4e179ef968b0a467ddd4951faba336 (patch) | |
tree | 3c9628cf1abf259e1b897ac14b006e2b448cff27 /ss2017/cryptography/Notes11.txt | |
parent | 92ed41381f40c97b52b0e96f0eaeaa9025091411 (diff) | |
download | uni-162df3190c4e179ef968b0a467ddd4951faba336.tar uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.gz uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.bz2 uni-162df3190c4e179ef968b0a467ddd4951faba336.tar.xz uni-162df3190c4e179ef968b0a467ddd4951faba336.zip |
Notes on cryptography
Diffstat (limited to 'ss2017/cryptography/Notes11.txt')
-rw-r--r-- | ss2017/cryptography/Notes11.txt | 310 |
1 files changed, 310 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes11.txt b/ss2017/cryptography/Notes11.txt new file mode 100644 index 0000000..839d24b --- /dev/null +++ b/ss2017/cryptography/Notes11.txt | |||
@@ -0,0 +1,310 @@ | |||
1 | |||
2 | |||
3 | 11 Hashing, Merkle-Damgard transform, SHA, HMAC | ||
4 | ----------------------------------------------- | ||
5 | |||
6 | Hash functions serve a related but slightly different purpose to that | ||
7 | of MACs. Unlike those (and private key encryption) they are not | ||
8 | immediately useful on their own but rather are building blocks and | ||
9 | indeed are "a workhorse of cryptography" like e.g. solving systems of | ||
10 | linear equations is to engineering. | ||
11 | |||
12 | Intuitively, a hash function H() transforms a long string m into a | ||
13 | short digest ("hash") t=H(m) such that it is extremely difficult to | ||
14 | produce two texts x =/= x' with H(x)=H(x'). Such pair x,x' is known as | ||
15 | a *collision*. | ||
16 | |||
17 | As such, the notion cannot really be made sense of in the context of | ||
18 | computational complexity for once we have x=/=x' with H(x)=H(x') then | ||
19 | we can just hardwire x,x' into a program. In order to get around this, | ||
20 | we introduce as before randomly chosen keys and security parameters. | ||
21 | This will allow us to formulate meaningful security goals and | ||
22 | meaningful security proofs for various constructions involving hash | ||
23 | functions. | ||
24 | |||
25 | Definition: A *hash function* consists of a (probabilistic polytime) | ||
26 | key generation algorithm Gen(n) as before and a polytime function H | ||
27 | taking as input a key s ("seed") and m:{0,1}^* then | ||
28 | H_s(m):{0,1}^{l(n)} where l(n) is a fixed, and typically simple, | ||
29 | length function, e.g. l(n)=n. | ||
30 | |||
31 | There may also be another length function l'(n)>l(n) and | ||
32 | H_s:{0,1}^{l'(n)} -> {0,1}^{l(n)}. In that case, we have a | ||
33 | fixed-length hash function. | ||
34 | |||
35 | Definition: A hash function Pi=(Gen,H) as above is *collision resistant* if | ||
36 | for all probabilistic polytime adversaries A one has | ||
37 | |||
38 | Pr[Hash-coll_{A,Pi}(n)=1] = negl(n) | ||
39 | |||
40 | for some negligible function negl(n). Here, Hash-coll_{A,Pi}(n) is the | ||
41 | following experiment: | ||
42 | |||
43 | 1) s <-Gen(n) /* s = "seed" */ | ||
44 | 2) A is given s and then outputs x,x' | ||
45 | 3) The result of the experiment is 1 if x=/=x' and H_s(x)=H_s(x') and | ||
46 | 0 otherwise. | ||
47 | |||
48 | Thus, the adversary must provide a collision given a key s and the | ||
49 | probability that it succeeds in doing so should be negligible. In | ||
50 | contrast, in the case of MACs the adversary does not get access to the | ||
51 | key. | ||
52 | |||
53 | |||
54 | |||
55 | Weaker notions of security | ||
56 | - - - - - - - - - - - - - - | ||
57 | |||
58 | In this context one may ask why being able to produce a collision | ||
59 | should be considered dangerous. Firstly, if a hash function is | ||
60 | collision resistant then it follows that it is also not possible to | ||
61 | forge a message x' =/=x such that H(x)=H(x') given x. So collision | ||
62 | resistance is certainly a "good enough" notion. However, one could | ||
63 | instead ask for the weaker property of "second preimage resistance": | ||
64 | it should be difficult to, given x, produce x'=/=x such that | ||
65 | H(x)=(x'). | ||
66 | |||
67 | A typical application of hash functions is message integrity: If we | ||
68 | send someone a message x via some untrusted channel (which can be | ||
69 | modified by an adversary) and the hash value t=H(x) via a trusted | ||
70 | channel then the receiver can check that the message has not been | ||
71 | tampered with by checking that t=H(x') with x' being the received | ||
72 | (possibly modified) message. | ||
73 | |||
74 | While being able to produce an arbitrary collision may be harmless, | ||
75 | being able to find a collision in a given set of texts of size ~ | ||
76 | 2^{l(n)} can be very dangerous and is not covered by "second preimage | ||
77 | resistance": Consider the following scenario: we produce a letter L, | ||
78 | have its hash signed by a responsible person, and then send it to some | ||
79 | recipient. Now, it is very easy to produce functions | ||
80 | f,g:{0,1}^{l(n)}->{0,1}^* such that for each z the string f(z) is, | ||
81 | say, a different rejection letter, and g(z) is a different acceptance | ||
82 | letter. E.g. by introducing various switches dear/esteemed, Sir/Madam | ||
83 | or pleased/delighted etc. Or, if one is dealing with images rather | ||
84 | than letters by slightly altering intensity values. Then if, one can | ||
85 | produce z, z' such that H(f(z))=H(g(z')) one can have the hash | ||
86 | t=H(f(z)) signed and communicate along with g(z'), for which the hash | ||
87 | is also valid due to the collision. While this still requires | ||
88 | collisions of a special kind it is sufficiently close to arbitrary | ||
89 | collisions and for this reason one considers collision resistance as | ||
90 | the "gold standard" for hash functions. | ||
91 | |||
92 | |||
93 | |||
94 | Birthday paradox | ||
95 | - - - - - - - - - | ||
96 | |||
97 | Let us see, how many steps a brute force attack requires. In order to | ||
98 | do that, we (simplifyingly) assume that H_s is a random function. Let | ||
99 | us write H(x)=H_s(x). So we assume that H is drawn uniformly at random | ||
100 | from {0,1}^*->{0,1}^{l} where l=l(n). | ||
101 | |||
102 | Clearly, if we fix x then the likelihood for x' drawn uniformly at | ||
103 | random to satisfy H(x)=H(x') is 2^{-l(n)} and thus, we need to try out | ||
104 | ca N/2 values (N=2^l) to get a 50% success probability. This also | ||
105 | holds, if x has been drawn at random. Things are different, if we | ||
106 | randomly choose a set of values S and then *search* for a collision | ||
107 | within S. Of course, if |S|>N then we find a collision with certainty, | ||
108 | but perhaps surprisingly |S|>=2*sqrt(N) ensures that the likelihood of | ||
109 | a collision within S is above 50%. | ||
110 | |||
111 | So, to find a collision one can choose a set S of size 2*sqrt(N), sort | ||
112 | it according to hash values and then search (in linear time) for the | ||
113 | desired collision. If one does not find one, repeat with new data. | ||
114 | |||
115 | Thus, for example if l(n)=64 then trying out 2^63 values will (with | ||
116 | probability >=50%) lead to a collision with a fixed left hand side, | ||
117 | but trying out 2^33 values will suffice to get an arbitrary collision | ||
118 | using the birthday paradox. While 2^63 steps can only be carried out | ||
119 | with special hardware resources, carrying out ~2^33 steps is trivial on | ||
120 | today's computers. One must bear this in mind when trying to estimate | ||
121 | "bits of security" of a hash function. Even assuming that the hash | ||
122 | function does not contain systematic security holes we will only get | ||
123 | one half of the hash size (size of the digest) many bits of security. | ||
124 | |||
125 | |||
126 | |||
127 | Improved birthday attack | ||
128 | - - - - - - - - - - - - - | ||
129 | |||
130 | The implementation of the "birthday attack" sketched above takes not | ||
131 | only roughly O(sqrt(N)) steps but also O(sqrt(N)) memory which is a | ||
132 | more expensive resource. Indeed, storing 2^33 thirty-two byte words | ||
133 | requires 64TB. We now describe a version of the birthday attack that | ||
134 | runs in constant space. | ||
135 | |||
136 | If we choose x0:{0,1}^l uniformly at random then, assuming as we did, | ||
137 | that H is a random function then | ||
138 | |||
139 | S={x0,H(x0),H(H(x0)),H(H(H(x0))),...,H^{2*sqrt(N)-1}(x0)} | ||
140 | |||
141 | is also random and thus contains a collision with >=50% | ||
142 | likelihood. Let us write xi:=H^i(x0). Suppose that i,j are such that | ||
143 | xi=xj, but x_{i-1} =/= x_{j-1}, so x_{i-1},x_{j-1} form a | ||
144 | collision. Note that xi=H(x_{i-1}). Assuming i<j and writing j=i+d we | ||
145 | then also have x_i=x_{i+kd} for all k>=0 and thus x_{i+z}=x_{i+kd+z} | ||
146 | for all k>=0 and z>=0. If we choose k such that kd>=i then | ||
147 | x_{kd}=x_{2kd} follows with z=kd-i. | ||
148 | |||
149 | This suggests the following | ||
150 | algorithm for finding a collision with high likelihood: | ||
151 | |||
152 | x0 <- {0,1}^l /* uniformly at random */ | ||
153 | x = x0;y = x0; | ||
154 | for (r=1..4*sqrt(N)) { | ||
155 | x = H(x); /* x=H^r(x0) */ | ||
156 | y = H(H(y)); /* y=H^{2r}(x0) */ | ||
157 | if (x==y) break;} | ||
158 | if (x!=y) start over with new x0; else | ||
159 | y=x;x=x0; /* y=H^{kd}(x0) */ | ||
160 | if (x==x0) start over with new x0; else /* very unlikely */ | ||
161 | for (i=0..r) do { | ||
162 | x = H(x); /* x=H^i(x0) */ | ||
163 | y = H(y); /* y=H^{i+kd}(x0) */ | ||
164 | if (x==y) break; /* must happen before loop runs out */} | ||
165 | return H^{i-1}(x0),H^{i+kd-1}(x0); /* can optimize recomputation */ | ||
166 | |||
167 | |||
168 | |||
169 | Merkle-Damgard Transform | ||
170 | - - - - - - - - - - - - - | ||
171 | |||
172 | The Merkle-Damgard Transform constructs a collision-resistant hash | ||
173 | function (Gen,H) from a given *fixed length* collision-resistant hash | ||
174 | function (Gen,h) with length 2*l(n), i.e. the h-hash is half as long | ||
175 | as the input. The key generation remains unchanged and H is as follows. | ||
176 | |||
177 | 1) Given key s and input x:{0,1}^* divide message into blocks x1,x2,...,x_B | ||
178 | of length l=l(n) possibly padding the last block with zeroes. Set x_{B+1}=L. | ||
179 | |||
180 | 2) Set z0=0...0 /* l zeroes */ | ||
181 | |||
182 | 3) for i=1..B+1 do zi = h_s(z_{i-1} || x_i) | ||
183 | |||
184 | 4) Output z_{B+1} | ||
185 | |||
186 | |||
187 | x1 x2 x_B L | ||
188 | 0 | _____ | _____ | _____ | _____ | ||
189 | | |_| | |_| | |_| | |_| | | ||
190 | |___| h_s |______| h_s |_...._____| h_s |_____| h_s |-----> H_s(x) | ||
191 | |_____| |_____| |_____| |_____| | ||
192 | |||
193 | |||
194 | Theorem: If (Gen,h) is a collision-resistant fixed length hash | ||
195 | function of length l'(n)=2*l(n) then (Gen,H) is a collision-resistant | ||
196 | hash function. | ||
197 | |||
198 | Proof: We first show that any collision in H_s yields a collision in | ||
199 | h_s. Suppose that H_s(x)=H_s(x') and write x_1..x_B,L for the blocks | ||
200 | and length of x and x'_1..x'_B' L' for x'. Also write z_1..z_{B+1}, | ||
201 | z'_1..z'_B' for the intermediate results. We have H_s(x)=h_s(z_B||L) | ||
202 | and H_s(x')=h_s(z'_B'||L'), so, if L=/=L' we already have produced the | ||
203 | desired collision in h_s. This also holds, if L=L' and z_B =/= z'_B | ||
204 | (notice that B=B' in this case). If L=L' and z_B=z'_B then, since | ||
205 | z_B=h_s(z_{B-1}||x_B) we get a collision unless x_B=x'_B. Continuing | ||
206 | in this way, we eventually find our collision since x_i =/= x'_i for | ||
207 | at least one i. | ||
208 | |||
209 | Now, suppose that A is an adversary against H_s. We construct an | ||
210 | adversary A' against h_s as follows. If A outputs x,x' we check | ||
211 | whether H_s(x)=H_s(x') and if so, construct the corresponding | ||
212 | collision h_s(y)=h_s(y') in h_s as described above and output | ||
213 | y,y'. Otherwise, we output 0,0. The likelihood for succeeding, i.e., | ||
214 | for the output to be a h_s-collision is negligible by assumption and | ||
215 | thus the probability that A output a collision is negligible as | ||
216 | well. Notice that the computations we carried out as A' were clearly | ||
217 | in polynomial time. QED | ||
218 | |||
219 | The well-known and standardized cryptographic hash functions MD5, | ||
220 | SHA-1, SHA-2 use the Merkle-Damgard transform. The underlying | ||
221 | fixed-length hash function (called compression function) in this | ||
222 | context is built using the heuristic principles that were described in | ||
223 | Section 7 (Practical block ciphers. Principles: s/p- and Feistel | ||
224 | networks). We remark here that MD5 and SHA-1 are no longer consider | ||
225 | sufficiently secure whereas SHA-2 is. | ||
226 | |||
227 | The latest standard SHA-3 which is not yet widely used but might | ||
228 | eventually supersede SHA-2 does not use Merkle-Damgard but instead | ||
229 | relies on the so-called *sponge construction*, see Wikipedia. | ||
230 | |||
231 | |||
232 | |||
233 | MACs from hash functions | ||
234 | - - - - - - - - - - - - - | ||
235 | |||
236 | A natural question is whether one can use hash functions also for the | ||
237 | purpose of authentication, i.e. MAC. First, we note that naive | ||
238 | attempts (which according to KL have been actually used in practice!) | ||
239 | |||
240 | |||
241 | do not work. Consider for example the definition Mac_k(x) = H_s(k||x) | ||
242 | where s is some public seed. If H_s was constructed with the | ||
243 | Merkle-Damgard transform from h_s then we can easily forge MACs as | ||
244 | follows: if t is the MAC of x then t'=h_s(t||L') is the | ||
245 | MAC of (x || |x|) (glossing over the padding). | ||
246 | |||
247 | Suppose that we are given a *fixed-length* MAC (Gen,Mac) capable of | ||
248 | authenticating messages of length l(n). Suppose furthermore, that we | ||
249 | have a collision-resistant hash function (Gen',H) which produces | ||
250 | hashes of length l(n). We can then build a MAC (Gen'',Mac'') for | ||
251 | arbitrary length messages as follows: | ||
252 | |||
253 | Gen''(n): k<-Gen(n); s<-Gen'(n);return (k,s) /* use some encoding of pairs */ | ||
254 | Mac''_{(k,s)}(m): return Mac_k(H_s(m)) | ||
255 | |||
256 | Theorem: Assuming that (Gen,Mac) is secure and (Gen',H) is | ||
257 | collision-resistant then (Gen'',Mac'') is secure. | ||
258 | |||
259 | Proof: Suppose that an adversary A asks questions Q and then succeeds | ||
260 | in forging a MAC t=Mac''_{(k,s)}(x), i.e., x is not in Q. If H_s(x) is | ||
261 | different from all the H_s(y) for y:Q then A has succeeded in forging | ||
262 | a MAC for Mac_k. Namely then t=Mac_k(H_s(x)) and H_s(x) was not | ||
263 | previously asked. If, however, H_s(x) is contained in {H_s(y) | y:Q} | ||
264 | then A has produced a collision. Since both events occur with | ||
265 | negligible probability the probability that A can forge a MAC is also | ||
266 | negligible. QED | ||
267 | |||
268 | Suppose that H_s() above has been constructed using the Merkle-Damgard | ||
269 | construction from the fixed-length hash function h_s(). For seed s | ||
270 | define a fixed-length MAC by | ||
271 | |||
272 | Mac_k(m) = h_s(k||m) | ||
273 | |||
274 | We now make the assumption that h_s() is such that this is a secure | ||
275 | fixed-length MAC. Notice that the abovedescribed extension attack on | ||
276 | MACs of this sort does not apply due to the fixed length. | ||
277 | |||
278 | If we now apply the above hash-and-MAC construction then it turns out | ||
279 | that the arbitrary-length MAC obtained satisfies "essentially" | ||
280 | |||
281 | Mac''_{(k,s)}(m) = H_s(k||H_s(m)) | ||
282 | |||
283 | because the last H_s degenerates to h_s due to the shortness of | ||
284 | H_s(m). The quoted "essentially" refers to the fact that for this | ||
285 | equation to hold exactly one needs to assume some conventions about | ||
286 | how exactly the padding in the Merkle-Damgard construction is | ||
287 | performed. We want to ignore these details here and refer to the | ||
288 | literature. | ||
289 | |||
290 | |||
291 | The HMAC construction | ||
292 | - - - - - - - - - - - | ||
293 | |||
294 | The widely used and standardised HMAC construction is essentially the | ||
295 | above except that another key is added to the message m prior to | ||
296 | invoking the inner hash function and that the two keys that are thus | ||
297 | needed are derived from a single one by XOR-ing with certain fixed | ||
298 | words ipad and opad: | ||
299 | |||
300 | HMAC_{(k,s)}(m) = H_s( (k+opad) || H_s((k+ipad) || m)) | ||
301 | |||
302 | In practice, ipad and opad are chosen to be 00110110 and 01011100 each | ||
303 | repeated as often as necessary. | ||
304 | |||
305 | The additional inner keying with k+ipad allows one to weaken the | ||
306 | assumptions on the collision freeness of H_s(). The details of these | ||
307 | assumptions and their practical relevance are still subject to debate | ||
308 | (see a 2012 paper by Koblitz and Menezes); however, no practical | ||
309 | attack has been found to date and HMAC is considered secure in | ||
310 | practice. | ||