diff options
Diffstat (limited to 'ss2017/cryptography/Notes15.txt')
-rw-r--r-- | ss2017/cryptography/Notes15.txt | 279 |
1 files changed, 279 insertions, 0 deletions
diff --git a/ss2017/cryptography/Notes15.txt b/ss2017/cryptography/Notes15.txt new file mode 100644 index 0000000..e76ada5 --- /dev/null +++ b/ss2017/cryptography/Notes15.txt | |||
@@ -0,0 +1,279 @@ | |||
1 | |||
2 | |||
3 | 15 Factoring and computing discrete logarithms | ||
4 | ---------------------------------------------- | ||
5 | |||
6 | In this chapter we survey algorithms for factoring large numbers, in | ||
7 | particular products of equal sized primes, and for computing discrete | ||
8 | logarithms. These algorithms or rather their runtime provide lower | ||
9 | bounds on the necessary key lengths for public key cryptography for it | ||
10 | relies on the difficulty of these problems. | ||
11 | |||
12 | For the sections about factoring we let N=pq be a product of two | ||
13 | roughly equal sized primes and n = O(log(N)) be the length of N in | ||
14 | binary. The task is to find p,q from N. | ||
15 | |||
16 | For the sections about discrete logarithms let G be a cyclic group of | ||
17 | order q and g,y:G with g a generator. The task is to find x:Z_q such | ||
18 | that g^x=y. | ||
19 | |||
20 | Trial division | ||
21 | - - - - - - - - | ||
22 | |||
23 | In order to factor N=pq, i.e. to find p or q given N, we can try out | ||
24 | divisors d=1...sqrt(N). The runtime of this is polynomial in sqrt(N), | ||
25 | hence takes time exponential in n. Once n>120, hence ||sqrt(N)||=~=60 | ||
26 | this is no longer feasible. | ||
27 | |||
28 | |||
29 | Pollard's p-1 method | ||
30 | - - - - - - - - - - - | ||
31 | |||
32 | Pollard's p-1 method may help when p-1 is a product of powers of small | ||
33 | primes. We let p_1,...p_k be the first k primes and construct | ||
34 | |||
35 | B_k = prod_{i=1..k}p_i^{n / ||p_i||} | ||
36 | |||
37 | The exponent n/||p_i|| is chosen so that if some power of p_i divides | ||
38 | p-1 then that power is at most n/||p_i||. As a result, if p-1 is a | ||
39 | product of powers of primes from among the set {p_1..p_k} then p-1 | | ||
40 | B_k. If, in addition, q-1 does *not* divide B_k then we can find out p | ||
41 | as we show shortly. In order that q-1 does *not* divide B_k we can | ||
42 | repeatedly execute the method with increasing values for k hoping that | ||
43 | at some point one of the two prime factors "works" and the other does | ||
44 | not. | ||
45 | |||
46 | OK so, now let B any number so that p-1 | B and q-1 does not divide | ||
47 | B. For any x it holds that x^B mod p = 1 whereas with constant | ||
48 | probability x^B mod q =/= 1. The former holds since the order of Z_p^* | ||
49 | is p-1 and B is a multiple thereof. The latter holds since when x is a | ||
50 | generator of Z_q^* then x^B mod q is not 1 and the proportion of those | ||
51 | generators is about 50% (not hard to see or use Thm B.15 of KL). | ||
52 | |||
53 | Now let y = x^B-1 mod N. We have y mod p = 0 and y mod q =/= 0. Thus, | ||
54 | p divides y and q does not divide y so gcd(y,N)=p. | ||
55 | |||
56 | The resulting algorithm is now as follows: | ||
57 | |||
58 | for k=1,2,3,... do the following until either a factor of N is found | ||
59 | or computing resources are exhausted. | ||
60 | Put B = B_k | ||
61 | for a certain number of times do the following | ||
62 | x<-Z_N | ||
63 | d = gcd(x^B-1 mod N,N) | ||
64 | if d=/=1 then a factor of N has been found. | ||
65 | |||
66 | |||
67 | |||
68 | Pollard's rho method | ||
69 | - - - - - - - - - - - | ||
70 | |||
71 | This algorithm is based on the observation that once we have two | ||
72 | distinct values x,y:Z_N such that x mod p = y mod p then p = | ||
73 | gcd(x-y,N). Thus, factoring amounts to finding a collision in the | ||
74 | "hash function" H(x) = x mod p. Notice that while we cannot compute | ||
75 | this function we can detect whether x,y constitute a collision by gcd | ||
76 | computation. | ||
77 | |||
78 | Now, by the birthday paradox, a random set of size S := 2*sqrt(p) = | ||
79 | O(N^{1/4}) contains such a collision with likelihood 50%. Finding it, | ||
80 | requires S^2 collision tests thus, when implemented naively, this | ||
81 | method is no better than trial division. | ||
82 | |||
83 | However, the constant space method for collision detection described | ||
84 | earlier only has a runtime of O(S), but of course, as already | ||
85 | mentioned, we cannot compute H(). However, we can replace it with any | ||
86 | function F which has the property that x = y (mod p) implies F(x)=F(y) | ||
87 | (mod p) and which is such that for random x0 the set | ||
88 | {x_0,F(x_0),F(F(x_0)),..., F^S(x_0)} is also sufficiently close to a | ||
89 | random set so that it is likely to contain a collision. The function | ||
90 | F(x)=x^2 + b for some b =/= 0, -2 mod N is believed to have this | ||
91 | property and at any rate behaves well in practice. Just as in the | ||
92 | earlier algorithm we have that F^i(x0) = F^j(x0) (mod p) implies | ||
93 | F^k(x0) = F^{2k}(x0) (mod p) for some k. | ||
94 | |||
95 | So, the set of the first S iterates of F() starting from random x0 is | ||
96 | likely to contain a collision. But since once x,y form a collision so | ||
97 | do F(x), F(y), it suffices to search for a collision among the pairs | ||
98 | F^k(x0), F^{2k}(x0) which leads to the following algorithm: | ||
99 | |||
100 | b <- Z_N \ {0,-2} | ||
101 | F(x) = x^2+b mod N | ||
102 | |||
103 | x <- Z_N | ||
104 | y = x | ||
105 | for k=1..2*sqrt(N) do | ||
106 | x = F(x); y=F(F(x)) | ||
107 | if gcd(x-y,N) != 1 a factor is found. | ||
108 | |||
109 | repeat the whole procedure until either a factor is found or resources | ||
110 | are exhausted. | ||
111 | |||
112 | Assuming that the function F() has the required properties (which has | ||
113 | not been proved to this date!) the runtime of this algorithm is | ||
114 | proportional to N^{1/4} which is still exponential in n but requires | ||
115 | one to double n if one wants to stay immune against this | ||
116 | method. | ||
117 | |||
118 | |||
119 | The quadratic sieve method | ||
120 | - - - - - - - - - - - - - - | ||
121 | |||
122 | The starting point for the most efficient factoring algorithms to this | ||
123 | date is the following observation made by Pierre de Fermat: | ||
124 | |||
125 | If we succeed in expressing N as the difference of two | ||
126 | squares N=x^2-y^2 then N=(x+y)(x-y) so we have factored N. | ||
127 | |||
128 | Fermat suggested to try out successive values for x starting at | ||
129 | sqrt(N) and for each of those checking whether x^2-N is a square. | ||
130 | |||
131 | The first improvement one can make on this observation is that it | ||
132 | suffices that N divides x^2 - y^2 for if kN = (x+y)(x-y) | ||
133 | then---assuming that x != (+/-)y mod N---the factors p and q of N | ||
134 | must straddle the product so that gcd(x-y,N) or gcd(x+y,N) will be | ||
135 | different from 1. | ||
136 | |||
137 | Thus, we seek x,y:Z_N such that x^2-y^2 = 0 (mod N) and x !=y, x!=-y mod N. | ||
138 | |||
139 | Notice that by the Chinese Remainder Theorem each square has four | ||
140 | roots modulo N, so given x, two out of the four roots of x^2 are good. | ||
141 | |||
142 | Now, to find such x,y more efficiently we proceed as follows. We | ||
143 | select a set B of small primes and try out various values x>sqrt(N) so | ||
144 | that x^2 mod N can be factored (because after reduction mod N it is | ||
145 | sufficiently small) and its factors are in B. In this way, we find | ||
146 | numbers x_1 .. x_l such that each x_i^2 mod N is expressed as a | ||
147 | product of primes in B. | ||
148 | |||
149 | With each of these numbers x_i we associate a B-indexed 0,1 vector | ||
150 | whose entry at p:B is the exponent of p in the factorisation of x_i | ||
151 | modulo 2. | ||
152 | |||
153 | For example, if B={2,3,5,11} and x_1^2 mod N is 2^5*3^2*11^2 then the | ||
154 | associated vector is (1 0 0 0). | ||
155 | |||
156 | If l exceeds the size of k then there is a (mod 2) linear combination | ||
157 | of these vectors which results in the all zero vector. But multiplying | ||
158 | the corresponding x_i together then yields a square because all prime | ||
159 | factors have even exponents. | ||
160 | |||
161 | Continuing the example, if x_2^2 mod N is 2^15*11, i.e. (1 0 0 1), and | ||
162 | x_3^2 mod N is 11^13, i.e. (0 0 0 1) then, since 1000+1001+0001=0000, | ||
163 | we obtain the following identity mod N: | ||
164 | |||
165 | (x_1*x_2*x_3)^2 = (2^10 * 3 * 11^8)^2 | ||
166 | |||
167 | and, heuristically, we may assume that there is a 50% chance for this | ||
168 | identity to be nontrivial in the sense that it yields a good pair of | ||
169 | square roots. | ||
170 | |||
171 | Summing up, we build up a large basis of values x whose squares mod N | ||
172 | can be completely factored, then compute a linear combination (mod 2) | ||
173 | of the zero vector from the corresponding B-indexed vectors and check | ||
174 | whether the resulting identity of squares mod N is nontrivial which | ||
175 | then yields the desired factor. | ||
176 | |||
177 | We now study the problem of finding discrete logarithms in a cyclic | ||
178 | group G as above. | ||
179 | |||
180 | |||
181 | The Pohlig-Hellman algorithm | ||
182 | - - - - - - - - - - - - - - - | ||
183 | |||
184 | If we have a factorisation of the group order q as prod_{i=1}^n q^i | ||
185 | with the q_i relatively prime (gcd = 1) then, if we have values x_i | ||
186 | such that | ||
187 | |||
188 | (g^{q/q_i})^x_i = y^{q/q_i} | ||
189 | |||
190 | we can find x by solving (using the Chinese Remainder Theorem) the | ||
191 | system of modular equations | ||
192 | |||
193 | x mod q_i = x_i | ||
194 | |||
195 | Since g^{q/q_i} generates a subgroup of order q_i which is easier if | ||
196 | the q_i are small. It remains to see why x satisfies the above system | ||
197 | of equations. Indeed, if g^x=y then g^{q/q_i}^x = y^{q/q_i} so x mod | ||
198 | q_i solves the discrete log problem of order q_i. | ||
199 | |||
200 | It remains to see how to deal with groups whose order is a prime power | ||
201 | q=p^f. In this case, one can expand the unknown x as | ||
202 | |||
203 | x = x_0 + x_1 * p + x_2 * p^2 + ... + x_{f-1} p^{f-1} | ||
204 | |||
205 | Now, by exhaustive search or using any other algorithm we can | ||
206 | determine x_0:Z_p such that g^{q/p}^{x_0} = y^{q/p} (mod p). Once we | ||
207 | know x_0 we can then recursively solve (g^p)^{x'} = y/g^{x_0} which | ||
208 | yields x_1,... The details are left as an exercise. | ||
209 | |||
210 | |||
211 | The giant-step baby-step algorithm | ||
212 | - - - - - - - - - - - - - - - - - - | ||
213 | |||
214 | Here the idea is that the unknown x can be decomposed as x = | ||
215 | t*[sqrt(q)] + s for some s <sqrt(q). Here [sqrt(q)] stands for the | ||
216 | greatest integer <= sqrt(q). We then precompute a table of the values | ||
217 | y_t := g^{t*[sqrt{q}]} for t=0..sqrt(q), this are the giant | ||
218 | steps. Now, for each s we look for (using binary search or similar) | ||
219 | for a t such that | ||
220 | |||
221 | g^{t*[sqrt(q)]} = y*g^{-s} | ||
222 | |||
223 | Once we have found it we can output x = s*[sqrt(q)]+t. The runtime of | ||
224 | this method is O(sqrt(q)) which again essentially halves the "bits of | ||
225 | security". | ||
226 | |||
227 | |||
228 | The index calculus method | ||
229 | - - - - - - - - - - - - - | ||
230 | |||
231 | Unlike the earlier methods the index calculus makes special | ||
232 | assumptions on the group G for which to solve the discrete logarithm | ||
233 | problem. Namely, it must be the group Z_p^* for some prime p. It can | ||
234 | be generalised to the group of invertible elements in a finite field, | ||
235 | but does not work in the case of ellipic curves, for instance. | ||
236 | |||
237 | So, let us assume that G=Z_p^* and that g,y:Z_p^*. As before, we seek | ||
238 | x such that g^x=y. As before, we write q=p-1 for the group order. | ||
239 | |||
240 | Let us write log_g(z) for the discrete logarithm of some value z:Z_p^* | ||
241 | so that g^{log_g(z)} = z. The idea is to precompute a table of such | ||
242 | logarithms. To that end, we choose values x_1..x_l such that g^{x_i} | ||
243 | can be factored modulo p into small primes from a previously fixed set | ||
244 | B={p_1,...,p_k}.From | ||
245 | |||
246 | g^{x_j} = prod_{i=1}^k p_i^{e_{j,i}} (mod p) | ||
247 | |||
248 | we then obtain | ||
249 | |||
250 | x_j = sum_{i=1}^k e_{j,i} * log_g(p_i) (mod q) | ||
251 | |||
252 | Now, assuming that l in relation to k was chosen large enough this | ||
253 | system can be olved for the unknowns log_g(p_i). We thus have obtained | ||
254 | the desired "table of logarithms". | ||
255 | |||
256 | It allows us to easily compute the discrete logarithm of any number | ||
257 | w:Z_p^* which can be factored into primes from B: if w = prod_{i=1}^k | ||
258 | p_i^{e_i} then log_g(w) = sum_{i=1}^k e_i * log(p_i) (mod q). | ||
259 | |||
260 | If y itself has that property we are obviously done. Otherwise, we can | ||
261 | search for a value With the latter at hand we can then --- with luck | ||
262 | --- take the logarithm of our y as follows: we choose some value z:Z_q | ||
263 | at random such that y*g^z can be so factored and thus taken a | ||
264 | logarithm of. This helps, since log_g(y) = log_g(y*g^z)-z. | ||
265 | |||
266 | Using advanced number-theoretic methods one can analyse the | ||
267 | probabilities that the various random choices in this method succeed | ||
268 | and thus derive an expected runtime that is subexponential (in | ||
269 | ||q||). Indeed, the index calculus method is currently the best known | ||
270 | method for computing discrete logarithms. | ||
271 | |||
272 | We also remark that it is the possibility of factoring elements of G | ||
273 | into primes what makes this algorithm possible. For other groups, in | ||
274 | particular the multiplicative group of a finite field, analogous | ||
275 | notions of "primes" can be found, but for groups derived from elliptic | ||
276 | curves no such notion has been discovered and hence these groups are | ||
277 | immune against index calculus making them particularly promising for | ||
278 | cryptography. Having said that, taking discrete logarithms in a group | ||
279 | Z_p^* where p has 1000+ digits remains unfeasible. | ||