From 162df3190c4e179ef968b0a467ddd4951faba336 Mon Sep 17 00:00:00 2001
From: Gregor Kleen <gkleen@yggdrasil.li>
Date: Tue, 24 Oct 2017 15:33:07 +0200
Subject: Notes on cryptography

---
 ss2017/cryptography/Notes10.txt | 257 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 ss2017/cryptography/Notes10.txt

(limited to 'ss2017/cryptography/Notes10.txt')

diff --git a/ss2017/cryptography/Notes10.txt b/ss2017/cryptography/Notes10.txt
new file mode 100644
index 0000000..5457bf3
--- /dev/null
+++ b/ss2017/cryptography/Notes10.txt
@@ -0,0 +1,257 @@
+
+
+10 Message integrity, CBC-MAC (KL4.1-4.5)
+-----------------------------------------
+
+So far, we wanted to prevent an eavesdropper or a more active attacker
+from getting to know the content of encrypted messages or at least
+partial information about them.
+
+Another important objective for cryptography is message integrity and
+authenticity. We want to prevent the attacker from being able to
+manipulate an encrypted message or from fabricating one such.
+
+It is not the case that encryption automatically guarantees
+integrity. For example, with the Vernam cipher and derived systems
+flipping one bit in the ciphertext results in the same bit in the
+plaintext being flipped. Also, randomly chosen ciphertexts correspond
+to plaintexts albeit probably nonsensical ones. 
+
+
+Message authentication codes
+- - - - - - - - - - - - - - -
+
+Definition: A message authentication code (MAC) consists of two
+probabilistic polynomial time algorithms (Gen,Mac) where 
+
+1. On input n (security parameter) Gen outputs a key k of length >= n 
+2. Given key k and message m:{0,1}* the tag-generation algorithm Mac outputs
+   t = Mac_k(m)
+
+We assume (unlike in KL where this is a special case) that Mac is
+deterministic.
+
+If m is furthermore constrained to be from {0,1}^l(n) for some length
+function l(n), e.g. l(n)=n then the MAC is called *fixed length*.    EOD
+
+The intended usage is as follows: Alice and Bob share a secret key k
+that has been obtained with Gen(). If Alice wants to send a message m
+to Bob she could compute the tag t=Mac_k(m) and transmit (m,t) either
+encrypted or not.
+
+In order to be sure that the received message (m,t) was really from
+Alice he can then compute t'=Mac_k(m) and check that t=t'. The MAC
+should be designed in such a way that no adversary can fabricate valid
+tags unless it knows the key so that if the verification succeeds Bob
+can indeed be sure that the message was from Alice. The following
+experiment formalizes this.
+
+
+The message authentication experiment Mac-forge_{A,Pi}(n):
+
+Pi = (Gen,Mac) a MAC, A an adversary, n the security parameter
+
+1. A random key k is generated with Gen(n). 
+
+2. The adversary is given n and oracle access to Mac_k(), i.e. it can
+request arbitrary tags w.r.t k. The adversary eventually outputs a
+pair (m,t). 
+
+3. The output of the experiment is defined to be 1 if
+  (1) Mac_k(m)=t and (2) m has not previously been queried by A.
+
+Thus, the adversary must prepare a valid tag for a message it has not
+previously requested a tag for. 
+
+We define a MAC to be secure if for all adversaries 
+the probability that Mac-forge yields 1 is negligible.
+
+While this notion of security is indeed very strong there are some
+attacks it cannot withstand:
+
+1) Replay attacks. The adversary can store an authenticated message
+(m,t) and send it at some later point. In order to prevent that one
+can extend message with timestamps or similar.
+
+2) Timing attacks. If the check t=t'? proceeds by comparing bits
+starting from the left until the first mismatch is encountered (cf
+strcmp() from the C-library) then the time it takes to perform the
+check depends on the tags. This can be used to find out the bits of a
+valid tag one by one. According to KL this kind of attack has been
+used against an early version of the Xbox in order to install
+unlicensed games. To prevent the attack one should make sure that the
+computation time for the comparison is always the same.
+
+Construction of a fixed-length MAC from a pseudorandom function:
+
+Let F_k() be a pseudorandom function. The following then is a secure
+fixed-length MAC.
+
+1. Gen(): Given n choose k:{0,1}^n uniformly at random
+2. Mac(): On input k and m output the tag t=F_k(m)
+
+Theorem: The above MAC is secure.
+
+Proof sketch: As in the other security proofs it suffices to show this for
+truly random F (selected at the beginning of the experiment). The
+passage to a pseudorandom F then incurs a negligible change in
+probability. In order to produce a valid tag the adversary would have
+to correctly guess a random value from {0,1}^n which has indeed
+negligible probability.  QED
+
+
+CBC-MAC
+- - - - 
+
+In order to extend a fixed-length MAC to variable length messages one
+can break the message up into equal length blocks m=m_1,...m_l of
+length n each possibly padding the last block with 0s and then
+authenticate each block separately. In order that this is secure, one
+must, however augment each block with
+
+- the length l, 
+- its sequence number i:{1..l}
+- a random identifier r:{0,1}^n (the same for each block)
+
+Thus, the blocks to be authenticated have length 4n each (assuming
+that l and i fit into n bits) and an appropriate MAC must be
+used. Alternatively, one can break m into blocks of size n/4....
+
+These additional data prevent the adversary from dropping blocks at
+the end (length), changing the order of blocks (sequence numbers),
+mixing blocks from two different messages (random identifier).
+
+One can show that this scheme is indeed secure, but it is rather
+inefficient since the tags are 4 times as long as the message itself.
+
+A more efficient method mimicks the CBC-mode encryption but is subtly
+different:
+
+Rather than an arbitrary MAC we need a pseudorandom function F_k,
+i.e. a block cipher. We then decompose the message m into equal sized
+blocks
+
+m = m_1 .. m_l
+
+as before (using padding if needed). We then *prepend* the message
+with another block encoding the length l (which is thus assumed
+smaller than 2^n). Then blocks are encrypted and XORed with the
+encryption of the previous block prior to encryption. The encryption
+of the last block then is the desired MAC:
+
+t_0 = F_k(l)
+t_{i+1} = F_k(t_i + m_i)
+t = t_l
+
+We will prove that this scheme is secure. Notice, however, that
+seemingly innocuous modifications thwart its security:
+
+First, let us see why we need to incorporate the length. Suppose we
+would just put: 
+
+t_1 = F_k(m_1)
+t_2 = F_k(m_2+t_1)
+...
+t = t_l
+
+Then, if t=Mac_k(m_1 m_2) and t'=Mac_k(m_3+t m_4) then t'=Mac_k(m_1
+m_2 m_3 m_4) thus allowing us to forge a previously unqueried tag.
+
+Now suppose that we would start with a random initialisation vector IV
+and transmit it along with the tag:
+
+t_0 = F_k(l+IV)
+...
+
+This would allow us to modify the length: If (IV,t) is the MAC of l,m
+then (IV+l+l',t) is the MAC of l',m. 
+
+It is also not ok to attach the length at the end rather than at the
+beginning. Namely, suppose that we obtain the following tags where the
+length is added in parenthesis for clarity.
+
+
+ABCDE(5)  : t
+XYZKL(5)  : t'
+ABCDE5UVW(9) : t''
+
+We then have that the tag of XYZKL5 U+t+t' VW(9) is also t''. 
+
+We will now sketch a proof that CBC-MAC is indeed secure. We will show
+the following theorem from which the claim follows by the standard
+reasoning used in earlier security proofs.
+
+Theorem: Fix a security parameter n. For g:{0,1}^n->{0,1}^n and m=m1
+m2 ... mk write CBC_g(m) = CBC_g(m1,m2,...,mk) =
+
+   g(ml+g(m_{k-1}+ ... g(m2 + g(m1)) ... )
+
+Also let f:({0,1}^n)* -> {0,1}^n be an arbitrary function. Let D be an
+adversary which is given oracle access to a function ({0,1}^n)* ->
+{0,1}^n.
+
+Suppose that D queries its oracle on a prefix-free set, i.e. if X and
+Y are both queried then X is not a prefix of Y nor is Y a prefix of X.
+One then has
+
+   |Pr(D^{CBC_g()}(n) = 1) - Pr(D^{f()}(n) = 1)| = negl(n)
+
+where probabilities are taken over g and f chosen uniformly at random.
+
+To see why this implies the theorem we first consider that the
+difference between CBC_g and CBC_{F_k} is negligible since F_k is
+assumed to be pseudorandom. Furthermore, notice that in CBC-MAC proper
+the function CBC_{F_k} is applied to a subset of P:={|m| m |
+m:({0,1}^n)*} which itself is a prefix-free set. Thus, the whole
+business with including the length and putting it into the first
+position rather than anywhere else serves to ensure prefix-free-ness.
+
+Proof of Theorem (sketch; for details see KL, 2nd ed): 
+
+Let Q={X1..Xq} be the (prefix-free) set of queries (q = number of
+queries) made by the distinguisher D to the oracle. Let l be the
+maximum number of blocks in those queries. Notice that both q---the
+number of queries---and l---their maximum length are polynomial in n
+since the distinguished is assumed to run in polynomial time.  If D
+queries X and X = m1 m2 ... mk then (in the case where the oracle is
+instantiated with CBC_g()) the function g() will be evaluated at m1,
+m2+g(m1), m3+g(m2+g(m1)), ...  . Write C_g(X) for this sequence (of
+length < l). We say that Q exhibits a collision if among the
+g-evaluations occurring in {C_g(X) | X:Q} one finds the situation
+g(x)=g(x') where x =/= x'. We also consider it a collision if at some
+point we have mi+g(x) = mj'+g(x') despite x=/=x' (and then necessarily
+mi=/=mj').
+
+Under the assumption that g() is random, the probability that such a
+collision is exhibited is bounded by q^2 * l^2 * negl(n) for there are
+<= q^2 * l^2 possibilities for where such a collision might occur and
+the probability for a collision at a given position is negligible.
+Since q and l are polynomial in n this probability is therefore
+negligible itself.
+
+Let us assume that no collision is exhibited. Now, for each X:Q let
+v(X) be the last value in C_g(X). We have CBC_g(X) = g(v(X)).  The
+values v(X) are pairwise distinct and do not agree with any of the
+other values in {C_g(X) | X:Q}. This is where prefix-freeness is
+used. As a result, since g() is random, the answers to the queries
+g(v(X)) are uniformly random and thus adhere to the same distribution
+as f(X) for the random f. Thus, the only advantage that the
+distinguisher has stems from the possibility of a collision being
+exhibited which, as we saw above, is negligible.
+
+
+
+In conclusion, we note that an already existing implementation of
+CBC-mode (for encryption) can be turned into an implementation of
+CBC-MAC by
+
+ - using IV=0
+ - discarding all but the last blocks of the ciphertext
+ - prepending the plaintext with its length
+
+Mistakes in any one of these (e.g. using a random IV and transmitting
+it, omitting the length or placing it elsewhere, transmitting other
+blocks of the ciphertext, lead to insecurities. See the nice blog
+article "Why I hate CBC-MAC").
+
+https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/
-- 
cgit v1.2.3