10 Message integrity, CBC-MAC (KL4.1-4.5)
-----------------------------------------

So far, we wanted to prevent an eavesdropper or a more active attacker
from getting to know the content of encrypted messages or at least
partial information about them.

Another important objective for cryptography is message integrity and
authenticity. We want to prevent the attacker from being able to
manipulate an encrypted message or from fabricating one such.

It is not the case that encryption automatically guarantees
integrity. For example, with the Vernam cipher and derived systems
flipping one bit in the ciphertext results in the same bit in the
plaintext being flipped. Also, randomly chosen ciphertexts correspond
to plaintexts albeit probably nonsensical ones. 


Message authentication codes
- - - - - - - - - - - - - - -

Definition: A message authentication code (MAC) consists of two
probabilistic polynomial time algorithms (Gen,Mac) where 

1. On input n (security parameter) Gen outputs a key k of length >= n 
2. Given key k and message m:{0,1}* the tag-generation algorithm Mac outputs
   t = Mac_k(m)

We assume (unlike in KL where this is a special case) that Mac is
deterministic.

If m is furthermore constrained to be from {0,1}^l(n) for some length
function l(n), e.g. l(n)=n then the MAC is called *fixed length*.    EOD

The intended usage is as follows: Alice and Bob share a secret key k
that has been obtained with Gen(). If Alice wants to send a message m
to Bob she could compute the tag t=Mac_k(m) and transmit (m,t) either
encrypted or not.

In order to be sure that the received message (m,t) was really from
Alice he can then compute t'=Mac_k(m) and check that t=t'. The MAC
should be designed in such a way that no adversary can fabricate valid
tags unless it knows the key so that if the verification succeeds Bob
can indeed be sure that the message was from Alice. The following
experiment formalizes this.


The message authentication experiment Mac-forge_{A,Pi}(n):

Pi = (Gen,Mac) a MAC, A an adversary, n the security parameter

1. A random key k is generated with Gen(n). 

2. The adversary is given n and oracle access to Mac_k(), i.e. it can
request arbitrary tags w.r.t k. The adversary eventually outputs a
pair (m,t). 

3. The output of the experiment is defined to be 1 if
  (1) Mac_k(m)=t and (2) m has not previously been queried by A.

Thus, the adversary must prepare a valid tag for a message it has not
previously requested a tag for. 

We define a MAC to be secure if for all adversaries 
the probability that Mac-forge yields 1 is negligible.

While this notion of security is indeed very strong there are some
attacks it cannot withstand:

1) Replay attacks. The adversary can store an authenticated message
(m,t) and send it at some later point. In order to prevent that one
can extend message with timestamps or similar.

2) Timing attacks. If the check t=t'? proceeds by comparing bits
starting from the left until the first mismatch is encountered (cf
strcmp() from the C-library) then the time it takes to perform the
check depends on the tags. This can be used to find out the bits of a
valid tag one by one. According to KL this kind of attack has been
used against an early version of the Xbox in order to install
unlicensed games. To prevent the attack one should make sure that the
computation time for the comparison is always the same.

Construction of a fixed-length MAC from a pseudorandom function:

Let F_k() be a pseudorandom function. The following then is a secure
fixed-length MAC.

1. Gen(): Given n choose k:{0,1}^n uniformly at random
2. Mac(): On input k and m output the tag t=F_k(m)

Theorem: The above MAC is secure.

Proof sketch: As in the other security proofs it suffices to show this for
truly random F (selected at the beginning of the experiment). The
passage to a pseudorandom F then incurs a negligible change in
probability. In order to produce a valid tag the adversary would have
to correctly guess a random value from {0,1}^n which has indeed
negligible probability.  QED


CBC-MAC
- - - - 

In order to extend a fixed-length MAC to variable length messages one
can break the message up into equal length blocks m=m_1,...m_l of
length n each possibly padding the last block with 0s and then
authenticate each block separately. In order that this is secure, one
must, however augment each block with

- the length l, 
- its sequence number i:{1..l}
- a random identifier r:{0,1}^n (the same for each block)

Thus, the blocks to be authenticated have length 4n each (assuming
that l and i fit into n bits) and an appropriate MAC must be
used. Alternatively, one can break m into blocks of size n/4....

These additional data prevent the adversary from dropping blocks at
the end (length), changing the order of blocks (sequence numbers),
mixing blocks from two different messages (random identifier).

One can show that this scheme is indeed secure, but it is rather
inefficient since the tags are 4 times as long as the message itself.

A more efficient method mimicks the CBC-mode encryption but is subtly
different:

Rather than an arbitrary MAC we need a pseudorandom function F_k,
i.e. a block cipher. We then decompose the message m into equal sized
blocks

m = m_1 .. m_l

as before (using padding if needed). We then *prepend* the message
with another block encoding the length l (which is thus assumed
smaller than 2^n). Then blocks are encrypted and XORed with the
encryption of the previous block prior to encryption. The encryption
of the last block then is the desired MAC:

t_0 = F_k(l)
t_{i+1} = F_k(t_i + m_i)
t = t_l

We will prove that this scheme is secure. Notice, however, that
seemingly innocuous modifications thwart its security:

First, let us see why we need to incorporate the length. Suppose we
would just put: 

t_1 = F_k(m_1)
t_2 = F_k(m_2+t_1)
...
t = t_l

Then, if t=Mac_k(m_1 m_2) and t'=Mac_k(m_3+t m_4) then t'=Mac_k(m_1
m_2 m_3 m_4) thus allowing us to forge a previously unqueried tag.

Now suppose that we would start with a random initialisation vector IV
and transmit it along with the tag:

t_0 = F_k(l+IV)
...

This would allow us to modify the length: If (IV,t) is the MAC of l,m
then (IV+l+l',t) is the MAC of l',m. 

It is also not ok to attach the length at the end rather than at the
beginning. Namely, suppose that we obtain the following tags where the
length is added in parenthesis for clarity.


ABCDE(5)  : t
XYZKL(5)  : t'
ABCDE5UVW(9) : t''

We then have that the tag of XYZKL5 U+t+t' VW(9) is also t''. 

We will now sketch a proof that CBC-MAC is indeed secure. We will show
the following theorem from which the claim follows by the standard
reasoning used in earlier security proofs.

Theorem: Fix a security parameter n. For g:{0,1}^n->{0,1}^n and m=m1
m2 ... mk write CBC_g(m) = CBC_g(m1,m2,...,mk) =

   g(ml+g(m_{k-1}+ ... g(m2 + g(m1)) ... )

Also let f:({0,1}^n)* -> {0,1}^n be an arbitrary function. Let D be an
adversary which is given oracle access to a function ({0,1}^n)* ->
{0,1}^n.

Suppose that D queries its oracle on a prefix-free set, i.e. if X and
Y are both queried then X is not a prefix of Y nor is Y a prefix of X.
One then has

   |Pr(D^{CBC_g()}(n) = 1) - Pr(D^{f()}(n) = 1)| = negl(n)

where probabilities are taken over g and f chosen uniformly at random.

To see why this implies the theorem we first consider that the
difference between CBC_g and CBC_{F_k} is negligible since F_k is
assumed to be pseudorandom. Furthermore, notice that in CBC-MAC proper
the function CBC_{F_k} is applied to a subset of P:={|m| m |
m:({0,1}^n)*} which itself is a prefix-free set. Thus, the whole
business with including the length and putting it into the first
position rather than anywhere else serves to ensure prefix-free-ness.

Proof of Theorem (sketch; for details see KL, 2nd ed): 

Let Q={X1..Xq} be the (prefix-free) set of queries (q = number of
queries) made by the distinguisher D to the oracle. Let l be the
maximum number of blocks in those queries. Notice that both q---the
number of queries---and l---their maximum length are polynomial in n
since the distinguished is assumed to run in polynomial time.  If D
queries X and X = m1 m2 ... mk then (in the case where the oracle is
instantiated with CBC_g()) the function g() will be evaluated at m1,
m2+g(m1), m3+g(m2+g(m1)), ...  . Write C_g(X) for this sequence (of
length < l). We say that Q exhibits a collision if among the
g-evaluations occurring in {C_g(X) | X:Q} one finds the situation
g(x)=g(x') where x =/= x'. We also consider it a collision if at some
point we have mi+g(x) = mj'+g(x') despite x=/=x' (and then necessarily
mi=/=mj').

Under the assumption that g() is random, the probability that such a
collision is exhibited is bounded by q^2 * l^2 * negl(n) for there are
<= q^2 * l^2 possibilities for where such a collision might occur and
the probability for a collision at a given position is negligible.
Since q and l are polynomial in n this probability is therefore
negligible itself.

Let us assume that no collision is exhibited. Now, for each X:Q let
v(X) be the last value in C_g(X). We have CBC_g(X) = g(v(X)).  The
values v(X) are pairwise distinct and do not agree with any of the
other values in {C_g(X) | X:Q}. This is where prefix-freeness is
used. As a result, since g() is random, the answers to the queries
g(v(X)) are uniformly random and thus adhere to the same distribution
as f(X) for the random f. Thus, the only advantage that the
distinguisher has stems from the possibility of a collision being
exhibited which, as we saw above, is negligible.


In conclusion, we note that an already existing implementation of
CBC-mode (for encryption) can be turned into an implementation of
CBC-MAC by

 - using IV=0
 - discarding all but the last blocks of the ciphertext
 - prepending the plaintext with its length

Mistakes in any one of these (e.g. using a random IV and transmitting
it, omitting the length or placing it elsewhere, transmitting other
blocks of the ciphertext, lead to insecurities. See the nice blog
article "Why I hate CBC-MAC").

https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/