7 Practical block ciphers. Principles: s/p- and Feistel networks (5.1-5.2) - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -- - - - We now describe how pseudorandom functions and block ciphers are practically constructed. The constructions given do not ensure the pseudorandom property, and in fact do not even fit into the asymptotic paradigm since the blocklength is typically fixed. Their design is guided by heuristic principles and the practical evidence that so far no attacks have been found. As already mentioned, there are other, less efficient constructions that merely require the existence of a PRG, but again, the latter has not yet been shown to exist unconditionally. In this section, a block cipher (of key length n and block length l) thus is simply a function F:{0,1}^n x {0,1}^l -> {0,1}^l such that F_k given by F_k(m)=F(k,m) is invertible (a permutation) for each k. Claude Shannon formulated two important principles guiding the design of block ciphers: Confusion: statistical regularities in the plaintexts, e.g. letter frequencies or frequencies of n-grams should not show up in the ciphertext. Diffusion: changing one bit of the plaintext should (on average) change a large proportion, e.g. one half of the bits in the ciphertext. These principles are not rigorously defined and should be seen rather as guidelines. The following procedure shows intuitively how they could be observed. Let us say that the key comprises 16 byte-substitutions f_1, ... f_16 : {0,1}^8->{0,1}^8, e.g. various shifts or other operations. So k = (f_1,..,f_16). A first attempt at constructing a block cipher of length 128 would be as follows: F_k(x) = f_1(x_1) ... f_16(x_16) where x is decomposed into 16 consecutive 8-bit blocks (bytes) x_1 .. x_16. This introduces confusion because each byte is subjected to a different permutation. However, the cipher still has insufficient diffusion since x1 only influences the first byte of the ciphertext etc. This lack of diffusion implies in particular that with chosen plaintexts we can learn parts of the value tables of individual key substitutions. To remedy this, we subject the individual bits of the ciphertext to some global, known permutation P:{1..128}->{1..128} of the positions and repeat this entire procedure several times ("rounds"). After a few rounds a large degree of diffusion will be reached ("avalanche effect") provided that the permutation mixes across block boundaries and that the substitutions in themselves already achieve a certain degree of diffusion in the sense that each input bit influences several output bits and vice versa. The substitutions and permutations used in each round need not always be the same which then leads to the following general pattern. Repeatedly break the text into equal size blocks, e.g. bytes which are subjected to individual substitutions. Afterwards, the bits are shuffled across the entire text using a permutation of the positions. The key determines the substitutions and permutations being used. Some of them may be independent of the key. S/P-networks - - - - - - - A substitution-permutation network (S/P-network) implements this idea in principle, however, neither the substitutions f_i, nor the permutations on positions depend on the key but are rather fixed upfront. In this way, confusion and diffusion are maintained, but of course key dependency needs to be introduced one way or another. This happens by XOR-ing the intermediate results with parts of or a known function of the key. The key itself is often called *master key* and the portions of it used as XOR-masks in the different rounds are known as *key schedule*. The substitution functions being used are commonly known as *S-boxes*. The permutations on positions are called *mixing permutations*. Thus, in an S/P-network the message is repeatedly subjected to the following three steps: a) XOR-ing with keys derived from the master key through a key schedule b) blockwise application of the S-boxes c) global application of a mixing permutation There are a number of principles to be followed when designing an S/P-network. 1. The S-boxes used *should be invertible* for otherwise one would not obtain an invertible cipher in the end. 2. The S-boxes *should not be linear* functions in the sense of XOR for then the entire cipher would be a linear function which opens it up to a variety of attacks based on solving linear equations, see below. In fact they should not even be close to linear functions in the sense of Hamming distance. 3. The S-boxes should implement the *avalanche effect*. Changing one bit in the input changes at least two bits in the output no matter what the other bits are set to. In addition, the mixing permutations should spread the output bits of any given S-box, i.e. in one block to different S-boxes, i.e. blocks in the next round. It is not hard to see that after a modest (polynomial in the message) number of rounds these design principles ensure sufficient diffusion because after each round the number of bits affected by a single bit flip in the input roughly doubles. Attacks against badly designed S/P-networks - - - - - - - - - - - - - -- - - - - - - - - Suppose we have an S/P-network with only one round. Given a plaintext m and a ciphertext c we can then work out the key as follows. Undoing the (known!) mixing permutation and the S-boxes yields c' := m+k from which we retrieve k by adding m since m+k+m = k. Indeed, practical implementations of S/P-networks usually omit the application of S-boxes and the mixing permutation in the last round since it can be undone anyway as we just saw. (NB in the 2nd edition of KL the S/P-networks are set up in this fashion from the beginning; however the present approach looks more systematic) Suppose now that we have an S/P-network with two rounds. Suppose for the sake of concreteness that the cipher uses 16 S-boxes operating on 4-bit each so that the total message length is 64. Suppose furthermore, that the key k consists of 128 bits of which the first half is used in the first round and the other half in the second. Given m and c we, being the attacker, first undo the S/P-operation of the second round as before. Now consider any 4-bit block in the output. It depends on at most four 4-bit blocks from the input because in the worst case each bit comes from a different S-box. We also know from the mixing permutation in the first round which these four 4-bit blocks are. Its 16 bits are transformed into the 4 bits of the block we're looking at using a known function and 16 + 4 bits of the key. These can all be tried out in 2^20 =~= 10^6 time which is altogether feasible. A similar argument applies to a three round S/P-network, however the number of key bits influencing a four bit block then goes up to 4+16+64 = 84 which is still a bit shorter than 128, but cannot really be feasibly explored by brute force. However, even after three rounds the S/P-network can be efficiently distinguished from a truly random permutation. Namely, each output bit is then not influenced by 44 of the input bits. A distinguisher can then toggle these bits and see whether the observed bit changes. If it does not for, say 10^9 out of the 2^44 possible combinations then the possibility that the distinguisher deals with a truly random function is extremely small so it has an advantage. Once we use four or more rounds (and the mixing permutations as well as the S-boxes) were well designed then the avalanche effect has fully kicked in and attacks like the ones sketched are no longer possible. Let us also see why linear S-boxes cannot be recommended: if all S-boxes f are linear functions in the sense that f(x+y)=f(x)+f(y) then for each key k the function Enc_k() is linear too, i.e., we have Enc_k(x+y)=Enc_k(x)+Enc_k(y). In this case, there exists for each key an nxn matrix A_k with 0,1 entries (n the message length) such that Enc_k(m) = A_k m If we now have n^2 many pairs of plaintext-ciphertext we obtain n^2 linear equations which we can solve for the entries of A_k. Once A_k is known then even if we do not know k itself we can decrypt arbitrary ciphertexts encrypted with the same key using the inverse matrix. This kind of attack applies to all linear encryption functions regardless whether they are S/P-networks. It also applies to affine linear ciphers, where Enc_k(m) = A_k m + b_k. Feistel networks - - - - - - - - Feistel networks are an alternative approach to designing practical pseudorandom permutations. It also operates in rounds and each round a key-dependent *mixing function* f_i() is applied to the second half of the current text which is then XOR-ed with its first half to become the new second half. The new first half is the old second half. Thus, if L_i and R_i denote the first and second halves of the text after round i then we have L_0 R_0 := "plaintext" L_{i+1} := R_i R_{i+1} := L_i + f_i(k_i,R_i) Here k_i is the round key used in the i-th round which is derived from the master key through the key schedule as usual. Notice that, R_i = L_{i+1} L_i = R_{i+1}+f_i(k_i,L_{i+1}) so that the Feistel network is invertible as desired no matter what f_i() does. One can show (KL6.4) that if the mixing functions f_i are pseudorandom functions then the function defined by a four round Feistel network is a pseudorandom permutation.