7 Practical block ciphers. Principles: s/p- and Feistel networks (5.1-5.2)
- - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -- - - -

We now describe how pseudorandom functions and block ciphers are
practically constructed. The constructions given do not ensure the
pseudorandom property, and in fact do not even fit into the asymptotic
paradigm since the blocklength is typically fixed. Their design is
guided by heuristic principles and the practical evidence that so far
no attacks have been found.  As already mentioned, there are other,
less efficient constructions that merely require the existence of a
PRG, but again, the latter has not yet been shown to exist
unconditionally.

In this section, a block cipher (of key length n and block length l)
thus is simply a function F:{0,1}^n x {0,1}^l -> {0,1}^l such that F_k
given by F_k(m)=F(k,m) is invertible (a permutation) for each k.

Claude Shannon formulated two important principles guiding the design
of block ciphers:

Confusion: statistical regularities in the plaintexts, e.g. letter
frequencies or frequencies of n-grams should not show up in the
ciphertext.

Diffusion: changing one bit of the plaintext should (on average)
change a large proportion, e.g. one half of the bits in the
ciphertext.

These principles are not rigorously defined and should be seen rather
as guidelines. The following procedure shows intuitively how they
could be observed.

Let us say that the key comprises 16 byte-substitutions f_1, ... f_16
: {0,1}^8->{0,1}^8, e.g. various shifts or other operations. So k =
(f_1,..,f_16). A first attempt at constructing a block cipher of
length 128 would be as follows:

F_k(x) = f_1(x_1) ... f_16(x_16)

where x is decomposed into 16 consecutive 8-bit blocks (bytes) x_1
.. x_16.


This introduces confusion because each byte is subjected to a
different permutation. However, the cipher still has insufficient
diffusion since x1 only influences the first byte of the ciphertext
etc. This lack of diffusion implies in particular that with chosen
plaintexts we can learn parts of the value tables of individual key
substitutions. 

To remedy this, we subject the individual bits of the ciphertext to
some global, known permutation P:{1..128}->{1..128} of the positions
and repeat this entire procedure several times ("rounds"). After a few
rounds a large degree of diffusion will be reached ("avalanche
effect") provided that the permutation mixes across block boundaries
and that the substitutions in themselves already achieve a certain
degree of diffusion in the sense that each input bit influences
several output bits and vice versa.

The substitutions and permutations used in each round need not always
be the same which then leads to the following general pattern.
Repeatedly break the text into equal size blocks, e.g. bytes which are
subjected to individual substitutions. Afterwards, the bits are
shuffled across the entire text using a permutation of the
positions. The key determines the substitutions and permutations being
used. Some of them may be independent of the key.


S/P-networks
- - - - - - -

A substitution-permutation network (S/P-network) implements this idea
in principle, however, neither the substitutions f_i, nor the
permutations on positions depend on the key but are rather fixed
upfront. In this way, confusion and diffusion are maintained, but of
course key dependency needs to be introduced one way or another. This
happens by XOR-ing the intermediate results with parts of or a known
function of the key. The key itself is often called *master key* and
the portions of it used as XOR-masks in the different rounds are known
as *key schedule*. The substitution functions being used are commonly
known as *S-boxes*. The permutations on positions are called *mixing
permutations*. Thus, in an S/P-network the message is repeatedly
subjected to the following three steps:

a) XOR-ing with keys derived from the master key through a key schedule
b) blockwise application of the S-boxes
c) global application of a mixing permutation

There are a number of principles to be followed when designing an
S/P-network.

1. The S-boxes used *should be invertible* for otherwise one would not
obtain an invertible cipher in the end.

2. The S-boxes *should not be linear* functions in the sense of XOR
for then the entire cipher would be a linear function which opens it
up to a variety of attacks based on solving linear equations, see
below. In fact they should not even be close to linear functions in
the sense of Hamming distance. 

3. The S-boxes should implement the *avalanche effect*. Changing one
bit in the input changes at least two bits in the output no matter
what the other bits are set to. In addition, the mixing permutations
should spread the output bits of any given S-box, i.e. in one block to
different S-boxes, i.e. blocks in the next round.

It is not hard to see that after a modest (polynomial in the message)
number of rounds these design principles ensure sufficient diffusion
because after each round the number of bits affected by a single bit
flip in the input roughly doubles.


Attacks against badly designed S/P-networks
- - - - - - - - - - - - - -- - - - - - - - -

Suppose we have an S/P-network with only one round. Given a plaintext
m and a ciphertext c we can then work out the key as follows. Undoing
the (known!) mixing permutation and the S-boxes yields c' := m+k from
which we retrieve k by adding m since m+k+m = k.

Indeed, practical implementations of S/P-networks usually omit the
application of S-boxes and the mixing permutation in the last round
since it can be undone anyway as we just saw. (NB in the 2nd edition
of KL the S/P-networks are set up in this fashion from the beginning;
however the present approach looks more systematic)

Suppose now that we have an S/P-network with two rounds. Suppose for
the sake of concreteness that the cipher uses 16 S-boxes operating on
4-bit each so that the total message length is 64. Suppose
furthermore, that the key k consists of 128 bits of which the first
half is used in the first round and the other half in the
second. Given m and c we, being the attacker, first undo the
S/P-operation of the second round as before. Now consider any 4-bit
block in the output. It depends on at most four 4-bit blocks from the
input because in the worst case each bit comes from a different
S-box. We also know from the mixing permutation in the first round
which these four 4-bit blocks are. Its 16 bits are transformed into
the 4 bits of the block we're looking at using a known function and 16
+ 4 bits of the key. These can all be tried out in 2^20 =~= 10^6 time
which is altogether feasible. A similar argument applies to a three
round S/P-network, however the number of key bits influencing a four
bit block then goes up to 4+16+64 = 84 which is still a bit shorter
than 128, but cannot really be feasibly explored by brute force.

However, even after three rounds the S/P-network can be efficiently
distinguished from a truly random permutation. Namely, each output bit
is then not influenced by 44 of the input bits. A distinguisher can
then toggle these bits and see whether the observed bit changes. If it
does not for, say 10^9 out of the 2^44 possible combinations then the
possibility that the distinguisher deals with a truly random function
is extremely small so it has an advantage.

Once we use four or more rounds (and the mixing permutations as well
as the S-boxes) were well designed then the avalanche effect has fully
kicked in and attacks like the ones sketched are no longer possible.

Let us also see why linear S-boxes cannot be recommended: if all
S-boxes f are linear functions in the sense that f(x+y)=f(x)+f(y) then
for each key k the function Enc_k() is linear too, i.e., we have
Enc_k(x+y)=Enc_k(x)+Enc_k(y). In this case, there exists for each key
an nxn matrix A_k with 0,1 entries (n the message length) such that

                               Enc_k(m) = A_k m 

If we now have n^2 many pairs of plaintext-ciphertext we obtain n^2
linear equations which we can solve for the entries of A_k. Once A_k
is known then even if we do not know k itself we can decrypt arbitrary
ciphertexts encrypted with the same key using the inverse matrix. This
kind of attack applies to all linear encryption functions regardless
whether they are S/P-networks. It also applies to affine linear
ciphers, where Enc_k(m) = A_k m + b_k.


Feistel networks
- - - - - - - -

Feistel networks are an alternative approach to designing practical
pseudorandom permutations. It also operates in rounds and each round a
key-dependent *mixing function* f_i() is applied to the second half of
the current text which is then XOR-ed with its first half to become
the new second half. The new first half is the old second half. Thus,
if L_i and R_i denote the first and second halves of the text after
round i then we have

         L_0 R_0 := "plaintext"
	 L_{i+1} := R_i
	 R_{i+1} := L_i + f_i(k_i,R_i)

Here k_i is the round key used in the i-th round which is derived from
the master key through the key schedule as usual.

Notice that,

        R_i = L_{i+1}
	L_i = R_{i+1}+f_i(k_i,L_{i+1})

so that the Feistel network is invertible as desired no matter what
f_i() does.

One can show (KL6.4) that if the mixing functions f_i are pseudorandom
functions then the function defined by a four round Feistel network is
a pseudorandom permutation.