9 Cryptanalysis
---------------

Cryptanalysis is aimed at breaking ciphers in the sense of
reconstructing a key from given or chosen pairs of plain- and
ciphertext. In the first chapter, we saw how various historic ciphers
can be broken using regularities in the statistical distribution of
ciphertexts or rather using the fact that known properties of the
statistical distribution of plaintexts shine through in the
statistical distribution of plaintexts. Modern cryptographic schemes
avoid such attacks by making sure that the statistical distribution of
ciphertext looks close to uniform no matter what the statistical
distribution of the plaintexts is. The principles of confusion and
diffusion were introduced to ensure this property at least
approximately in practice. The notion of pseudo random generator
furnishes an ideal yet useful theoretical model.

Attacks against cryptographic schemes that properly implement
confusion and diffusion are much more complicated and in many cases,
in particular DES and AES not known to exist except in the sense that
the computational time needed to carry them out is close to that of a
brute-force attack (trying out all keys) which is always
possible. Nevertheless, it is important to know and study principles
of such attacks in order to avoid them when designing a new cipher,
e.g. a smaller variant of existing ones to be run in
resource-constrained environments e.g. "Internet of Things".

We have already seen that a cipher that only uses linear or affine
linear operations is vulnerable to an attack based on solving systems
of linear equations. In this chapter we take a brief look at three
more principles of cryptanalysis. First, we show why feedback shift
registers which are a popular device for generating pseudorandom
numbers are not secure.

Linear Feedback Shift Registers (LFSR): Such a device is given by a
sequence of shift registers with some outputs fed back via XOR to the
input. For instance:

       >[D]-o->[D]-o->[D]---o------------->
      |  |             |    |
       -(+)-----------(+)----

When the shift registers (here noted [D]) are initialised with some
bits then a stream of bits is generated at the output. E.g. starting from 100
we obtain the following sequence of states:

100,110,111,011,101,010,001,100,...
and the following output sequence 00111010...

Here, from xyz we get to uxy where u=x+z (mod 2).  Of course, the
sequence repeats after at most 8 steps, but until then looks rather
random. With a chain 16 shift registers one can achieve a period
length of 65535. In order to encrypt a stream of bits ("stream
cipher") one can initialise the shift registers with the secret key
and then add (XOR) the generated stream to the data stream. Due to the
relative simplicity of implementation this has been used for a long
time in esp. military applications.

However, LFSR are linear (the sequence generated by the sum of two
keys equals the sum of the sequences generated by the keys
individually) and this can be exploited in various ways. The most
interesting way uses the Berlekamp Massey algorithm which from a given
sequence of bits computes the shortest LFSR which generates it. If one
knows the beginning m of a plaintext and the corresponding ciphertext
c, e.g. because the plaintext starts with the date or some formal
greeting one can then apply the Berlekamp-Massey algorithm to m+c and
use the computed LFSR to decode the remaining ciphertext.

In order to circumvent this one can use nonlinear functions in the
feedback loop, e.g. logical or use nonlinear combinations of several
LFSR. The resulting stream ciphers, e.g. A5/1 and A5/2, E0 are still
in use in GSM and Bluetooth but are known to have vulnerabilities
too. Indeed, the design of secure stream ciphers is very difficult and
for high security applications one resorts to block ciphers run,
e.g. in CTR mode in order to encrypt bitstreams.


Linear cryptanalysis
- - - - - - - - - - - 

Linear cryptanalysis was developed in the 90s by Matsui and apparently
was not known to the DES developers. Nevertheless, it can only produce
a theoretical attack on the latter and on AES.

We illustrate linear cryptanalysis with the toy cryptosystem from
Michael Heys' tutorial
https://www.engr.mun.ca/~howard/PAPERS/ldc_tutorial.pdf.

It is an S/P network with a block length of 16 and four proper rounds,
i.e. four rounds and an additional key mixing step. We regard each
block as a 4x4 bit matrix. The substitution step applies the first
S-Box of DES S_1, i.e.

0 1 2 3 4 5 6 7 8 9 A B C D E F
-------------------------------
E 4 D 1 2 F B 8 3 A 6 C 5 9 0 7

separately to each of the four rows (regarded as a hexadecimal number). So
  
(1 0 1 0)             (A)         (6)          (0 1 1 0)
(0 1 1 0)   which is  (6) becomes (B) which is (1 0 1 1) 
(1 1 0 1)             (D)         (9)          (1 0 0 1)
(0 0 0 1)             (1)         (4)          (0 1 0 0)

The permutation step is simply matrix transposition, thus our example
becomes

(0 1 1 0)           (6)
(1 1 0 0)  which is (C)
(1 0 0 1)           (9)
(0 1 1 0)           (6)

As usual, each round consists of addition mod2 of the round key,
substitution (S), and permutation (P), i.e. transposition. After four
rounds we add one more key.  The master key simply consists of the
concatenation of the five round keys k = k1 k2 k3 k4 k5.

We can now observe that the S-box S is not completely balanced in the
following sense. Let WX1 be (1 0 1 1)^T and WY1 = (0 1 0 0)^T. The ^T
suffix means transposition so W1, W2 are actually columns. We now have

              Pr(WX1^T x + WY1^T S(x) = 0) = 3/4

where the probability is taken over uniformly distributed
x:{0,1}^4. In other words, for 12 out of the 16 possible inputs x=(x1
x2 x3 x4)^T and results S(x)=(y1 y2 y3 y4)^T one has x1+x3+x4=y2
whereas for the remaining four one has x1+x3+x4=/=y2,
i.e. x1+x3+x4=y2+1.

One can find such an imbalance by exhaustive search. Another such
imbalance is given by WX2=(0 1 0 0)^T WY2 = (0 1 0 1)^T:

              Pr(WX2^T x + WY2^T S(x) = 0) = 1/4

Chaining these imbalances through the whole cipher (using the first
one on round 1 and the second one on the other rounds) one obtains a
similar linear relationship between the plaintext, the ciphertext just
after round four, and the four round keys that have been used until
then: Letting P be the plaintext (now written as dim16 vector) and U
the ciphertext prior to applying the last S-boxes, permuting, and
mixing with the fifth and last round key we find dim16 vectors A, B
D1, D2, D3, D4 such that

    Pr(A^T P + B^T U + D1^T K1 + D2^T K2 + D3^T K3 + D4^T K4 = 0) = 15/32

with keys K1 ... K4 fixed and probability taken over the plaintexts P. 


Since D1^T K1 + D2^T K2 + D3^T K3 + D4^T K4 is either 0 or 1 we have that

     Prob(A^T P + B^T U = 0) = 15/32 or Prob(A^T P + B^T U = 0) = 17/32

irrespective of K1,..,K4. This allows us to work out a likely value
for K5 or rather some bits of it.

Namely, given a list of plaintexts P_1..P_N and corresponding
ciphertexts C_1..C_n we can try out all 2^16 possible choices for K_5,
for each one compute the corresponding U_i := S^{-1}(P^{-1}(C_i+K_5)),
and compute the ratio

   p(K_5) = {i | A^T P_i + B^T U_i = 0} / N 

We choose a value K_5 for which this ratio is close to 15/32 or 17/32
and discard all those where it is close to 1/2. Now, p(K_5) does not
actually depend on all bits of K_5 but only on those where the
corresponding bit in B^T is set for the other ones only influence
components of U_i which are not used in p(K_5). Nevertheless, this
allows us to considerably narrow down the possible options for
K_5. For each of those, we can then continue in the same fashion and
thus peel off the fourth round key etc.


Differential cryptanalysis
- - - - - - - - - - - - - -

Differential cryptanalysis was developed by Biham and Shamir as an
attack against DES but apparently already known to the DES
developers. It has been used to successfully break other ciphers. It
uses the fact that the effect of adding a round key cancels out when
we consider differences between two plaintexts. However, this
requires the preparation of chosen plaintexts. In more detail, the
*differential* between two plaintexts X1 and X2 is simply X1+X2,
i.e. the bitvector which has a 1 where X1 and X2 differ and a 0 where
they are equal.

Coming back to our 4bit example S-box for any given differential there
then are 2^4=16 different ways of realising it. E.g. the differential
1010 arises for the plaintext pair 0011,1001 and also for
1111,0101. Indeed, for any plaintext X1 there is a unique X2 such that
X1, X2 have a given differential.

Given two differentials X* and Y* we are interested in the following
question. For how many of the (16) plaintext pairs X1,X2 with
differential X* is it the case that the results of applying the S-box
Y1=S(X1), Y2=S(X2) have differential Y*. Tabulating these numbers by
exhaustive search yields the following table:

           Output differential
     0 1 2 3 4 5 6 7 8 9 A B C D E F 
I 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
n 1  0 0 0 2 0 0 0 2 0 2 4 0 4 2 0 0 
p 2  0 0 0 2 0 6 2 2 0 2 0 0 0 0 2 0
u 3  0 0 2 0 2 0 0 0 0 4 2 0 2 0 0 4
t 4  0 0 0 2 0 0 6 0 0 2 0 4 2 0 0 0 
  5  0 4 0 0 0 2 2 0 0 0 4 0 2 0 0 2
d 6  0 0 0 4 0 4 0 0 0 0 0 0 2 2 2 2 
i 7  0 0 2 2 2 0 2 0 0 2 2 0 0 0 0 4
f 8  0 0 0 0 0 0 2 2 0 0 0 4 0 4 2 2 
f 9  0 2 0 0 2 0 0 4 2 0 2 2 2 0 0 0 
' A  0 2 2 0 0 0 0 0 6 0 0 2 0 6 0 0 
i B  0 0 8 0 0 2 0 2 0 0 0 0 0 2 0 2 
a C  0 2 0 0 2 2 2 0 0 0 0 2 0 6 0 0 
l D  0 4 0 0 0 0 0 4 2 0 2 0 2 0 2 0 
  E  0 0 2 4 2 0 0 0 6 0 0 0 0 0 2 0   
  F  0 2 0 0 6 0 0 0 0 4 0 2 0 0 2 0


The high entries 8 and 6 are of particular interest because they
constitute a deviation of an intuitive pseudorandom permutation.
Let us now see how we can exploit these in order to attack our cipher.

We feed in a pair of 16bit plaintexts X1 X2 which differ in the second
block by B=1011 and are equal everywhere else.  With probability 8/16
this will result in differential 2=0010 at the output of the second
S-box. The output of the other S-boxes will be equal for X1 and X2.
This differential 2=0010 will be propagated through the P network and
results in a differential of 4=0100 at the entry of the third S-box of
the second round:

              Differential before P   Differential after P
                    0 0 0 0                     0 0 0 0 
                    0 0 0 0   ------P---->      0 0 1 0 
                    0 1 0 0                     0 0 0 0 
                    0 0 0 0                     0 0 0 0


There, with probability 6/16 it will produce differential 6=0110 at
the output. This results in differential 2=0010 at the input of the
second and third S-boxes of the third round. With probability 36/256
this will lead to differential 5 at both outputs which propagates to
differential 6=0110 at the entries to the second and fourth S-boxes of
the final round.

                     0 0 0 0                    0 0 0 0 
                     0 0 1 0  -------P----->    0 0 0 0 
                     0 0 1 0                    0 1 1 0 
		     0 0 0 0                    0 0 0 0

Summing up, when we encipher plaintexts with differential 0B00 we will
obtain with probability 1/2*3/8*9/64=27/1024 a differential 0606 at
the entry to the last round. We now have to set the relevant
components (second and fourth block) of the final key in such a way
that this ratio (after inverting the last layer of S-boxes) is
actually reproduced. This allows us (with high probability) to guess
hone half of the bits of the last round key. Continuing in this way
with other differentials combined with exhaustive search allows us to
undo the entire last round and then to proceed from there.