From 162df3190c4e179ef968b0a467ddd4951faba336 Mon Sep 17 00:00:00 2001
From: Gregor Kleen <gkleen@yggdrasil.li>
Date: Tue, 24 Oct 2017 15:33:07 +0200
Subject: Notes on cryptography

---
 ss2017/cryptography/Notes15.txt | 279 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 279 insertions(+)
 create mode 100644 ss2017/cryptography/Notes15.txt

(limited to 'ss2017/cryptography/Notes15.txt')

diff --git a/ss2017/cryptography/Notes15.txt b/ss2017/cryptography/Notes15.txt
new file mode 100644
index 0000000..e76ada5
--- /dev/null
+++ b/ss2017/cryptography/Notes15.txt
@@ -0,0 +1,279 @@
+
+
+15 Factoring and computing discrete logarithms
+----------------------------------------------
+
+In this chapter we survey algorithms for factoring large numbers, in
+particular products of equal sized primes, and for computing discrete
+logarithms. These algorithms or rather their runtime provide lower
+bounds on the necessary key lengths for public key cryptography for it
+relies on the difficulty of these problems.
+
+For the sections about factoring we let N=pq be a product of two
+roughly equal sized primes and n = O(log(N)) be the length of N in
+binary. The task is to find p,q from N. 
+
+For the sections about discrete logarithms let G be a cyclic group of
+order q and g,y:G with g a generator. The task is to find x:Z_q such
+that g^x=y. 
+
+Trial division
+- - - - - - - -
+
+In order to factor N=pq, i.e. to find p or q given N, we can try out
+divisors d=1...sqrt(N). The runtime of this is polynomial in sqrt(N),
+hence takes time exponential in n. Once n>120, hence ||sqrt(N)||=~=60
+this is no longer feasible.
+
+
+Pollard's p-1 method
+- - - - - - - - - - -
+
+Pollard's p-1 method may help when p-1 is a product of powers of small
+primes. We let p_1,...p_k be the first k primes and construct
+
+              B_k = prod_{i=1..k}p_i^{n / ||p_i||} 
+
+The exponent n/||p_i|| is chosen so that if some power of p_i divides
+p-1 then that power is at most n/||p_i||. As a result, if p-1 is a
+product of powers of primes from among the set {p_1..p_k} then p-1 |
+B_k. If, in addition, q-1 does *not* divide B_k then we can find out p
+as we show shortly. In order that q-1 does *not* divide B_k we can
+repeatedly execute the method with increasing values for k hoping that
+at some point one of the two prime factors "works" and the other does
+not.
+
+OK so, now let B any number so that p-1 | B and q-1 does not divide
+B. For any x it holds that x^B mod p = 1 whereas with constant
+probability x^B mod q =/= 1. The former holds since the order of Z_p^*
+is p-1 and B is a multiple thereof. The latter holds since when x is a
+generator of Z_q^* then x^B mod q is not 1 and the proportion of those
+generators is about 50% (not hard to see or use Thm B.15 of KL).
+
+Now let y = x^B-1 mod N. We have y mod p = 0 and y mod q =/= 0. Thus,
+p divides y and q does not divide y so gcd(y,N)=p.
+
+The resulting algorithm is now as follows:
+
+for k=1,2,3,... do the following until either a factor of N is found
+or computing resources are exhausted.
+   Put B = B_k
+   for a certain number of times do the following
+        x<-Z_N
+        d = gcd(x^B-1 mod N,N)
+	if d=/=1 then a factor of N has been found.
+
+
+
+Pollard's rho method
+- - - - - - - - - - -
+
+This algorithm is based on the observation that once we have two
+distinct values x,y:Z_N such that x mod p = y mod p then p =
+gcd(x-y,N). Thus, factoring amounts to finding a collision in the
+"hash function" H(x) = x mod p. Notice that while we cannot compute
+this function we can detect whether x,y constitute a collision by gcd
+computation.
+
+Now, by the birthday paradox, a random set of size S := 2*sqrt(p) =
+O(N^{1/4}) contains such a collision with likelihood 50%. Finding it,
+requires S^2 collision tests thus, when implemented naively, this
+method is no better than trial division.
+
+However, the constant space method for collision detection described
+earlier only has a runtime of O(S), but of course, as already
+mentioned, we cannot compute H(). However, we can replace it with any
+function F which has the property that x = y (mod p) implies F(x)=F(y)
+(mod p) and which is such that for random x0 the set
+{x_0,F(x_0),F(F(x_0)),..., F^S(x_0)} is also sufficiently close to a
+random set so that it is likely to contain a collision. The function
+F(x)=x^2 + b for some b =/= 0, -2 mod N is believed to have this
+property and at any rate behaves well in practice. Just as in the
+earlier algorithm we have that F^i(x0) = F^j(x0) (mod p) implies
+F^k(x0) = F^{2k}(x0) (mod p) for some k.
+
+So, the set of the first S iterates of F() starting from random x0 is
+likely to contain a collision. But since once x,y form a collision so
+do F(x), F(y), it suffices to search for a collision among the pairs
+F^k(x0), F^{2k}(x0) which leads to the following algorithm:
+
+b <- Z_N \ {0,-2}
+F(x) = x^2+b mod N
+
+x <- Z_N
+y = x
+for k=1..2*sqrt(N) do
+  x = F(x); y=F(F(x))
+  if gcd(x-y,N) != 1 a factor is found. 
+
+repeat the whole procedure until either a factor is found or resources
+are exhausted.
+
+Assuming that the function F() has the required properties (which has
+not been proved to this date!) the runtime of this algorithm is
+proportional to N^{1/4} which is still exponential in n but requires
+one to double n if one wants to stay immune against this
+method.
+
+
+The quadratic sieve method
+- - - - - - - - - - - - - -
+
+The starting point for the most efficient factoring algorithms to this
+date is the following observation made by Pierre de Fermat:
+
+If we succeed in expressing N as the difference of two
+squares N=x^2-y^2 then N=(x+y)(x-y) so we have factored N.
+
+Fermat suggested to try out successive values for x starting at
+sqrt(N) and for each of those checking whether x^2-N is a square.
+
+The first improvement one can make on this observation is that it
+suffices that N divides x^2 - y^2 for if kN = (x+y)(x-y)
+then---assuming that x != (+/-)y mod N---the factors p and q of N
+must straddle the product so that gcd(x-y,N) or gcd(x+y,N) will be
+different from 1.
+
+Thus, we seek x,y:Z_N such that x^2-y^2 = 0 (mod N) and x !=y, x!=-y mod N. 
+
+Notice that by the Chinese Remainder Theorem each square has four
+roots modulo N, so given x, two out of the four roots of x^2 are good.
+
+Now, to find such x,y more efficiently we proceed as follows. We
+select a set B of small primes and try out various values x>sqrt(N) so
+that x^2 mod N can be factored (because after reduction mod N it is
+sufficiently small) and its factors are in B. In this way, we find
+numbers x_1 .. x_l such that each x_i^2 mod N is expressed as a
+product of primes in B.
+
+With each of these numbers x_i we associate a B-indexed 0,1 vector
+whose entry at p:B is the exponent of p in the factorisation of x_i
+modulo 2.
+
+For example, if B={2,3,5,11} and x_1^2 mod N is 2^5*3^2*11^2 then the
+associated vector is (1 0 0 0).
+
+If l exceeds the size of k then there is a (mod 2) linear combination
+of these vectors which results in the all zero vector. But multiplying
+the corresponding x_i together then yields a square because all prime
+factors have even exponents.
+
+Continuing the example, if x_2^2 mod N is 2^15*11, i.e. (1 0 0 1), and
+x_3^2 mod N is 11^13, i.e. (0 0 0 1) then, since 1000+1001+0001=0000,
+we obtain the following identity mod N:
+
+                      (x_1*x_2*x_3)^2 = (2^10 * 3 * 11^8)^2
+
+and, heuristically, we may assume that there is a 50% chance for this
+identity to be nontrivial in the sense that it yields a good pair of
+square roots.
+
+Summing up, we build up a large basis of values x whose squares mod N
+can be completely factored, then compute a linear combination (mod 2)
+of the zero vector from the corresponding B-indexed vectors and check
+whether the resulting identity of squares mod N is nontrivial which
+then yields the desired factor.
+
+We now study the problem of finding discrete logarithms in a cyclic
+group G as above. 
+
+
+The Pohlig-Hellman algorithm
+- - - - - - - - - - - - - - -
+
+If we have a factorisation of the group order q as prod_{i=1}^n q^i
+with the q_i relatively prime (gcd = 1) then, if we have values x_i
+such that
+
+                 (g^{q/q_i})^x_i = y^{q/q_i}
+
+we can find x by solving (using the Chinese Remainder Theorem) the
+system of modular equations
+
+                          x mod q_i = x_i
+
+Since g^{q/q_i} generates a subgroup of order q_i which is easier if
+the q_i are small. It remains to see why x satisfies the above system
+of equations. Indeed, if g^x=y then g^{q/q_i}^x = y^{q/q_i} so x mod
+q_i solves the discrete log problem of order q_i.
+
+It remains to see how to deal with groups whose order is a prime power
+q=p^f. In this case, one can expand the unknown x as
+
+     x = x_0 + x_1 * p + x_2 * p^2 + ... + x_{f-1} p^{f-1}
+
+Now, by exhaustive search or using any other algorithm we can
+determine x_0:Z_p such that g^{q/p}^{x_0} = y^{q/p} (mod p). Once we
+know x_0 we can then recursively solve (g^p)^{x'} = y/g^{x_0} which
+yields x_1,... The details are left as an exercise.
+
+
+The giant-step baby-step algorithm
+- - - - - - - - - - - - - - - - - - 
+
+Here the idea is that the unknown x can be decomposed as x =
+t*[sqrt(q)] + s for some s <sqrt(q).  Here [sqrt(q)] stands for the
+greatest integer <= sqrt(q). We then precompute a table of the values
+y_t := g^{t*[sqrt{q}]} for t=0..sqrt(q), this are the giant
+steps. Now, for each s we look for (using binary search or similar)
+for a t such that
+
+         g^{t*[sqrt(q)]} = y*g^{-s}
+
+Once we have found it we can output x = s*[sqrt(q)]+t. The runtime of
+this method is O(sqrt(q)) which again essentially halves the "bits of
+security".
+
+
+The index calculus method
+- - - - - - - - - - - - - 
+
+Unlike the earlier methods the index calculus makes special
+assumptions on the group G for which to solve the discrete logarithm
+problem. Namely, it must be the group Z_p^* for some prime p.  It can
+be generalised to the group of invertible elements in a finite field,
+but does not work in the case of ellipic curves, for instance.
+
+So, let us assume that G=Z_p^* and that g,y:Z_p^*. As before, we seek
+x such that g^x=y. As before, we write q=p-1 for the group order. 
+
+Let us write log_g(z) for the discrete logarithm of some value z:Z_p^*
+so that g^{log_g(z)} = z. The idea is to precompute a table of such
+logarithms. To that end, we choose values x_1..x_l such that g^{x_i}
+can be factored modulo p into small primes from a previously fixed set
+B={p_1,...,p_k}.From
+
+        g^{x_j} = prod_{i=1}^k p_i^{e_{j,i}} (mod p)
+
+we then obtain
+
+          x_j = sum_{i=1}^k e_{j,i} * log_g(p_i) (mod q)
+
+Now, assuming that l in relation to k was chosen large enough this
+system can be olved for the unknowns log_g(p_i). We thus have obtained
+the desired "table of logarithms".
+
+It allows us to easily compute the discrete logarithm of any number
+w:Z_p^* which can be factored into primes from B: if w = prod_{i=1}^k
+p_i^{e_i} then log_g(w) = sum_{i=1}^k e_i * log(p_i) (mod q).
+
+If y itself has that property we are obviously done. Otherwise, we can
+search for a value With the latter at hand we can then --- with luck
+--- take the logarithm of our y as follows: we choose some value z:Z_q
+at random such that y*g^z can be so factored and thus taken a
+logarithm of. This helps, since log_g(y) = log_g(y*g^z)-z.
+
+Using advanced number-theoretic methods one can analyse the
+probabilities that the various random choices in this method succeed
+and thus derive an expected runtime that is subexponential (in
+||q||). Indeed, the index calculus method is currently the best known
+method for computing discrete logarithms.
+
+We also remark that it is the possibility of factoring elements of G
+into primes what makes this algorithm possible. For other groups, in
+particular the multiplicative group of a finite field, analogous
+notions of "primes" can be found, but for groups derived from elliptic
+curves no such notion has been discovered and hence these groups are
+immune against index calculus making them particularly promising for
+cryptography. Having said that, taking discrete logarithms in a group
+Z_p^* where p has 1000+ digits remains unfeasible.
-- 
cgit v1.2.3