15 Factoring and computing discrete logarithms ---------------------------------------------- In this chapter we survey algorithms for factoring large numbers, in particular products of equal sized primes, and for computing discrete logarithms. These algorithms or rather their runtime provide lower bounds on the necessary key lengths for public key cryptography for it relies on the difficulty of these problems. For the sections about factoring we let N=pq be a product of two roughly equal sized primes and n = O(log(N)) be the length of N in binary. The task is to find p,q from N. For the sections about discrete logarithms let G be a cyclic group of order q and g,y:G with g a generator. The task is to find x:Z_q such that g^x=y. Trial division - - - - - - - - In order to factor N=pq, i.e. to find p or q given N, we can try out divisors d=1...sqrt(N). The runtime of this is polynomial in sqrt(N), hence takes time exponential in n. Once n>120, hence ||sqrt(N)||=~=60 this is no longer feasible. Pollard's p-1 method - - - - - - - - - - - Pollard's p-1 method may help when p-1 is a product of powers of small primes. We let p_1,...p_k be the first k primes and construct B_k = prod_{i=1..k}p_i^{n / ||p_i||} The exponent n/||p_i|| is chosen so that if some power of p_i divides p-1 then that power is at most n/||p_i||. As a result, if p-1 is a product of powers of primes from among the set {p_1..p_k} then p-1 | B_k. If, in addition, q-1 does *not* divide B_k then we can find out p as we show shortly. In order that q-1 does *not* divide B_k we can repeatedly execute the method with increasing values for k hoping that at some point one of the two prime factors "works" and the other does not. OK so, now let B any number so that p-1 | B and q-1 does not divide B. For any x it holds that x^B mod p = 1 whereas with constant probability x^B mod q =/= 1. The former holds since the order of Z_p^* is p-1 and B is a multiple thereof. The latter holds since when x is a generator of Z_q^* then x^B mod q is not 1 and the proportion of those generators is about 50% (not hard to see or use Thm B.15 of KL). Now let y = x^B-1 mod N. We have y mod p = 0 and y mod q =/= 0. Thus, p divides y and q does not divide y so gcd(y,N)=p. The resulting algorithm is now as follows: for k=1,2,3,... do the following until either a factor of N is found or computing resources are exhausted. Put B = B_k for a certain number of times do the following x<-Z_N d = gcd(x^B-1 mod N,N) if d=/=1 then a factor of N has been found. Pollard's rho method - - - - - - - - - - - This algorithm is based on the observation that once we have two distinct values x,y:Z_N such that x mod p = y mod p then p = gcd(x-y,N). Thus, factoring amounts to finding a collision in the "hash function" H(x) = x mod p. Notice that while we cannot compute this function we can detect whether x,y constitute a collision by gcd computation. Now, by the birthday paradox, a random set of size S := 2*sqrt(p) = O(N^{1/4}) contains such a collision with likelihood 50%. Finding it, requires S^2 collision tests thus, when implemented naively, this method is no better than trial division. However, the constant space method for collision detection described earlier only has a runtime of O(S), but of course, as already mentioned, we cannot compute H(). However, we can replace it with any function F which has the property that x = y (mod p) implies F(x)=F(y) (mod p) and which is such that for random x0 the set {x_0,F(x_0),F(F(x_0)),..., F^S(x_0)} is also sufficiently close to a random set so that it is likely to contain a collision. The function F(x)=x^2 + b for some b =/= 0, -2 mod N is believed to have this property and at any rate behaves well in practice. Just as in the earlier algorithm we have that F^i(x0) = F^j(x0) (mod p) implies F^k(x0) = F^{2k}(x0) (mod p) for some k. So, the set of the first S iterates of F() starting from random x0 is likely to contain a collision. But since once x,y form a collision so do F(x), F(y), it suffices to search for a collision among the pairs F^k(x0), F^{2k}(x0) which leads to the following algorithm: b <- Z_N \ {0,-2} F(x) = x^2+b mod N x <- Z_N y = x for k=1..2*sqrt(N) do x = F(x); y=F(F(x)) if gcd(x-y,N) != 1 a factor is found. repeat the whole procedure until either a factor is found or resources are exhausted. Assuming that the function F() has the required properties (which has not been proved to this date!) the runtime of this algorithm is proportional to N^{1/4} which is still exponential in n but requires one to double n if one wants to stay immune against this method. The quadratic sieve method - - - - - - - - - - - - - - The starting point for the most efficient factoring algorithms to this date is the following observation made by Pierre de Fermat: If we succeed in expressing N as the difference of two squares N=x^2-y^2 then N=(x+y)(x-y) so we have factored N. Fermat suggested to try out successive values for x starting at sqrt(N) and for each of those checking whether x^2-N is a square. The first improvement one can make on this observation is that it suffices that N divides x^2 - y^2 for if kN = (x+y)(x-y) then---assuming that x != (+/-)y mod N---the factors p and q of N must straddle the product so that gcd(x-y,N) or gcd(x+y,N) will be different from 1. Thus, we seek x,y:Z_N such that x^2-y^2 = 0 (mod N) and x !=y, x!=-y mod N. Notice that by the Chinese Remainder Theorem each square has four roots modulo N, so given x, two out of the four roots of x^2 are good. Now, to find such x,y more efficiently we proceed as follows. We select a set B of small primes and try out various values x>sqrt(N) so that x^2 mod N can be factored (because after reduction mod N it is sufficiently small) and its factors are in B. In this way, we find numbers x_1 .. x_l such that each x_i^2 mod N is expressed as a product of primes in B. With each of these numbers x_i we associate a B-indexed 0,1 vector whose entry at p:B is the exponent of p in the factorisation of x_i modulo 2. For example, if B={2,3,5,11} and x_1^2 mod N is 2^5*3^2*11^2 then the associated vector is (1 0 0 0). If l exceeds the size of k then there is a (mod 2) linear combination of these vectors which results in the all zero vector. But multiplying the corresponding x_i together then yields a square because all prime factors have even exponents. Continuing the example, if x_2^2 mod N is 2^15*11, i.e. (1 0 0 1), and x_3^2 mod N is 11^13, i.e. (0 0 0 1) then, since 1000+1001+0001=0000, we obtain the following identity mod N: (x_1*x_2*x_3)^2 = (2^10 * 3 * 11^8)^2 and, heuristically, we may assume that there is a 50% chance for this identity to be nontrivial in the sense that it yields a good pair of square roots. Summing up, we build up a large basis of values x whose squares mod N can be completely factored, then compute a linear combination (mod 2) of the zero vector from the corresponding B-indexed vectors and check whether the resulting identity of squares mod N is nontrivial which then yields the desired factor. We now study the problem of finding discrete logarithms in a cyclic group G as above. The Pohlig-Hellman algorithm - - - - - - - - - - - - - - - If we have a factorisation of the group order q as prod_{i=1}^n q^i with the q_i relatively prime (gcd = 1) then, if we have values x_i such that (g^{q/q_i})^x_i = y^{q/q_i} we can find x by solving (using the Chinese Remainder Theorem) the system of modular equations x mod q_i = x_i Since g^{q/q_i} generates a subgroup of order q_i which is easier if the q_i are small. It remains to see why x satisfies the above system of equations. Indeed, if g^x=y then g^{q/q_i}^x = y^{q/q_i} so x mod q_i solves the discrete log problem of order q_i. It remains to see how to deal with groups whose order is a prime power q=p^f. In this case, one can expand the unknown x as x = x_0 + x_1 * p + x_2 * p^2 + ... + x_{f-1} p^{f-1} Now, by exhaustive search or using any other algorithm we can determine x_0:Z_p such that g^{q/p}^{x_0} = y^{q/p} (mod p). Once we know x_0 we can then recursively solve (g^p)^{x'} = y/g^{x_0} which yields x_1,... The details are left as an exercise. The giant-step baby-step algorithm - - - - - - - - - - - - - - - - - - Here the idea is that the unknown x can be decomposed as x = t*[sqrt(q)] + s for some s