10 Message integrity, CBC-MAC (KL4.1-4.5) ----------------------------------------- So far, we wanted to prevent an eavesdropper or a more active attacker from getting to know the content of encrypted messages or at least partial information about them. Another important objective for cryptography is message integrity and authenticity. We want to prevent the attacker from being able to manipulate an encrypted message or from fabricating one such. It is not the case that encryption automatically guarantees integrity. For example, with the Vernam cipher and derived systems flipping one bit in the ciphertext results in the same bit in the plaintext being flipped. Also, randomly chosen ciphertexts correspond to plaintexts albeit probably nonsensical ones. Message authentication codes - - - - - - - - - - - - - - - Definition: A message authentication code (MAC) consists of two probabilistic polynomial time algorithms (Gen,Mac) where 1. On input n (security parameter) Gen outputs a key k of length >= n 2. Given key k and message m:{0,1}* the tag-generation algorithm Mac outputs t = Mac_k(m) We assume (unlike in KL where this is a special case) that Mac is deterministic. If m is furthermore constrained to be from {0,1}^l(n) for some length function l(n), e.g. l(n)=n then the MAC is called *fixed length*. EOD The intended usage is as follows: Alice and Bob share a secret key k that has been obtained with Gen(). If Alice wants to send a message m to Bob she could compute the tag t=Mac_k(m) and transmit (m,t) either encrypted or not. In order to be sure that the received message (m,t) was really from Alice he can then compute t'=Mac_k(m) and check that t=t'. The MAC should be designed in such a way that no adversary can fabricate valid tags unless it knows the key so that if the verification succeeds Bob can indeed be sure that the message was from Alice. The following experiment formalizes this. The message authentication experiment Mac-forge_{A,Pi}(n): Pi = (Gen,Mac) a MAC, A an adversary, n the security parameter 1. A random key k is generated with Gen(n). 2. The adversary is given n and oracle access to Mac_k(), i.e. it can request arbitrary tags w.r.t k. The adversary eventually outputs a pair (m,t). 3. The output of the experiment is defined to be 1 if (1) Mac_k(m)=t and (2) m has not previously been queried by A. Thus, the adversary must prepare a valid tag for a message it has not previously requested a tag for. We define a MAC to be secure if for all adversaries the probability that Mac-forge yields 1 is negligible. While this notion of security is indeed very strong there are some attacks it cannot withstand: 1) Replay attacks. The adversary can store an authenticated message (m,t) and send it at some later point. In order to prevent that one can extend message with timestamps or similar. 2) Timing attacks. If the check t=t'? proceeds by comparing bits starting from the left until the first mismatch is encountered (cf strcmp() from the C-library) then the time it takes to perform the check depends on the tags. This can be used to find out the bits of a valid tag one by one. According to KL this kind of attack has been used against an early version of the Xbox in order to install unlicensed games. To prevent the attack one should make sure that the computation time for the comparison is always the same. Construction of a fixed-length MAC from a pseudorandom function: Let F_k() be a pseudorandom function. The following then is a secure fixed-length MAC. 1. Gen(): Given n choose k:{0,1}^n uniformly at random 2. Mac(): On input k and m output the tag t=F_k(m) Theorem: The above MAC is secure. Proof sketch: As in the other security proofs it suffices to show this for truly random F (selected at the beginning of the experiment). The passage to a pseudorandom F then incurs a negligible change in probability. In order to produce a valid tag the adversary would have to correctly guess a random value from {0,1}^n which has indeed negligible probability. QED CBC-MAC - - - - In order to extend a fixed-length MAC to variable length messages one can break the message up into equal length blocks m=m_1,...m_l of length n each possibly padding the last block with 0s and then authenticate each block separately. In order that this is secure, one must, however augment each block with - the length l, - its sequence number i:{1..l} - a random identifier r:{0,1}^n (the same for each block) Thus, the blocks to be authenticated have length 4n each (assuming that l and i fit into n bits) and an appropriate MAC must be used. Alternatively, one can break m into blocks of size n/4.... These additional data prevent the adversary from dropping blocks at the end (length), changing the order of blocks (sequence numbers), mixing blocks from two different messages (random identifier). One can show that this scheme is indeed secure, but it is rather inefficient since the tags are 4 times as long as the message itself. A more efficient method mimicks the CBC-mode encryption but is subtly different: Rather than an arbitrary MAC we need a pseudorandom function F_k, i.e. a block cipher. We then decompose the message m into equal sized blocks m = m_1 .. m_l as before (using padding if needed). We then *prepend* the message with another block encoding the length l (which is thus assumed smaller than 2^n). Then blocks are encrypted and XORed with the encryption of the previous block prior to encryption. The encryption of the last block then is the desired MAC: t_0 = F_k(l) t_{i+1} = F_k(t_i + m_i) t = t_l We will prove that this scheme is secure. Notice, however, that seemingly innocuous modifications thwart its security: First, let us see why we need to incorporate the length. Suppose we would just put: t_1 = F_k(m_1) t_2 = F_k(m_2+t_1) ... t = t_l Then, if t=Mac_k(m_1 m_2) and t'=Mac_k(m_3+t m_4) then t'=Mac_k(m_1 m_2 m_3 m_4) thus allowing us to forge a previously unqueried tag. Now suppose that we would start with a random initialisation vector IV and transmit it along with the tag: t_0 = F_k(l+IV) ... This would allow us to modify the length: If (IV,t) is the MAC of l,m then (IV+l+l',t) is the MAC of l',m. It is also not ok to attach the length at the end rather than at the beginning. Namely, suppose that we obtain the following tags where the length is added in parenthesis for clarity. ABCDE(5) : t XYZKL(5) : t' ABCDE5UVW(9) : t'' We then have that the tag of XYZKL5 U+t+t' VW(9) is also t''. We will now sketch a proof that CBC-MAC is indeed secure. We will show the following theorem from which the claim follows by the standard reasoning used in earlier security proofs. Theorem: Fix a security parameter n. For g:{0,1}^n->{0,1}^n and m=m1 m2 ... mk write CBC_g(m) = CBC_g(m1,m2,...,mk) = g(ml+g(m_{k-1}+ ... g(m2 + g(m1)) ... ) Also let f:({0,1}^n)* -> {0,1}^n be an arbitrary function. Let D be an adversary which is given oracle access to a function ({0,1}^n)* -> {0,1}^n. Suppose that D queries its oracle on a prefix-free set, i.e. if X and Y are both queried then X is not a prefix of Y nor is Y a prefix of X. One then has |Pr(D^{CBC_g()}(n) = 1) - Pr(D^{f()}(n) = 1)| = negl(n) where probabilities are taken over g and f chosen uniformly at random. To see why this implies the theorem we first consider that the difference between CBC_g and CBC_{F_k} is negligible since F_k is assumed to be pseudorandom. Furthermore, notice that in CBC-MAC proper the function CBC_{F_k} is applied to a subset of P:={|m| m | m:({0,1}^n)*} which itself is a prefix-free set. Thus, the whole business with including the length and putting it into the first position rather than anywhere else serves to ensure prefix-free-ness. Proof of Theorem (sketch; for details see KL, 2nd ed): Let Q={X1..Xq} be the (prefix-free) set of queries (q = number of queries) made by the distinguisher D to the oracle. Let l be the maximum number of blocks in those queries. Notice that both q---the number of queries---and l---their maximum length are polynomial in n since the distinguished is assumed to run in polynomial time. If D queries X and X = m1 m2 ... mk then (in the case where the oracle is instantiated with CBC_g()) the function g() will be evaluated at m1, m2+g(m1), m3+g(m2+g(m1)), ... . Write C_g(X) for this sequence (of length < l). We say that Q exhibits a collision if among the g-evaluations occurring in {C_g(X) | X:Q} one finds the situation g(x)=g(x') where x =/= x'. We also consider it a collision if at some point we have mi+g(x) = mj'+g(x') despite x=/=x' (and then necessarily mi=/=mj'). Under the assumption that g() is random, the probability that such a collision is exhibited is bounded by q^2 * l^2 * negl(n) for there are <= q^2 * l^2 possibilities for where such a collision might occur and the probability for a collision at a given position is negligible. Since q and l are polynomial in n this probability is therefore negligible itself. Let us assume that no collision is exhibited. Now, for each X:Q let v(X) be the last value in C_g(X). We have CBC_g(X) = g(v(X)). The values v(X) are pairwise distinct and do not agree with any of the other values in {C_g(X) | X:Q}. This is where prefix-freeness is used. As a result, since g() is random, the answers to the queries g(v(X)) are uniformly random and thus adhere to the same distribution as f(X) for the random f. Thus, the only advantage that the distinguisher has stems from the possibility of a collision being exhibited which, as we saw above, is negligible. In conclusion, we note that an already existing implementation of CBC-mode (for encryption) can be turned into an implementation of CBC-MAC by - using IV=0 - discarding all but the last blocks of the ciphertext - prepending the plaintext with its length Mistakes in any one of these (e.g. using a random IV and transmitting it, omitting the length or placing it elsewhere, transmitting other blocks of the ciphertext, lead to insecurities. See the nice blog article "Why I hate CBC-MAC"). https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/