LinuxHDEncSettings.txt (20410B)
1 2 Linux hard disk encryption settings 3 4 This page intends to educate the reader about the existing weaknesses 5 of the public-IV on-disk format commonly used with cryptoloop and 6 dm-crypt (used in IV-plain mode). This page aims to facilitate risk 7 calculation when utilising Linux hard disk encryption. The attacks 8 presented on this page may pose a thread to you, but at the same time 9 may be totally irrelevant for others. At the end of this document, the 10 reader should be able to make a good choice according to his security 11 needs. 12 13 A good quote with respect to this topic is ''All security involves 14 trade-offs'' from Beyond Fear (Bruce Schneier). You should keep in mind 15 that perfect security is unachievable and by all means shouldn't be 16 your goal. For instance, when using pass phrase based cryptography, you 17 have to trust in that the underlying system is secure, the computer 18 system has not been tampered with, and nobody is watching you. The most 19 obvious weakness is the last one, but even if you make sure nobody nor 20 any camera is around, how about the keyboard you're typing on? Has it 21 been manipulated while you have been getting your lunch? 22 23 So security comes for a price, and the price when designing 24 cryptography security algorithms is performance. You will be introduced 25 to the fastest of all setups available, the "public-IV", which 26 sacrifices security properties for speed. After that we will talk about 27 ESSIV, the newest of IV modes implemented. It comes for a small price, 28 but it can deal with watermarking for a relatively small price. Then 29 you'll be introduced to the draft specifications of the Security in 30 Storage Working Group ([18]SISWG). Currently SISWG is considering EME 31 and LRW for standardisation. EME along with it's cousin CMC seems to 32 provide the best security level, but imposes additional encryption 33 steps. Plumb-IV is discussed only for reference, because it has the 34 same performance penalty as CMC, but in constrast suffers from 35 weaknesses of CBC encryption. 36 37 As convention, this document will use the term "blocks", when it 38 referes to a single block of plain or cipher text (usually 16 byte), 39 and will use the term "sectors", when it refers to a 512-byte wide hard 40 disk block. 41 42 CBC Mode: The basic level 43 44 Most hard disk encryption systems utilise CBC to encrypt bulk data. 45 Good descriptions on CBC and other common cipher modes are available at 46 * [19]Wikipedia 47 * [20]Connected: An Internet Encyclopedia 48 * [21]NIST: Recommendation for Block Cipher Modes of Operation (CBC 49 is at PDF Page 17) 50 51 Please make sure you're familiar with CBC before proceeding. 52 53 Since CBC encryption is a recursive algorithm, the encryption of the 54 n-th block requires the encryption of all preceding blocks, 0 till n-1. 55 Thus, if we would run the whole hard disk encryption in CBC mode, one 56 would have to re-encrypt the whole hard disk, if the first computation 57 step changed, this is, when the first plain text block changed. Of 58 course, this is an undesired property, therefore the CBC chaining is 59 cut every sector and restarted with a new initialisation vector (IV), 60 so we can encrypt sectors individually. The choice of the sector as 61 smallest unit matches with the smallest unit of hard disks, where a 62 sector is also atomic in terms of access. 63 64 For reference, I will give a formal definition of CBC encryption and 65 decryption. Note, that decryption is not recursive, in contrast to 66 encryption, since it's a function only of C[n-1] and C[n]. 67 Encryption: 68 C[-1] = IV 69 C[n] = E(P[n] ⊕ C[n-1]) 70 Decryption: 71 C[-1] = IV 72 P[n] = C[n-1] ⊕ D(C[n]) 73 The next sections will deal with how this IV is chosen. 74 75 The IV Modes 76 77 The "public-IV" 78 79 The IV for sector n is simply the 32-bit version of the number n 80 encoded in little-endian padded with zeros to the block-size of the 81 cipher used, if necessary. This is the most simple IV mode, but at the 82 same the most vulnerable. 83 84 ESSIV 85 86 E(Sector|Salt) IV, short ESSIV, derives the IV from key material via 87 encryption of the sector number with a hashed version of the key 88 material, the salt. ESSIV does not specify a particular hash algorithm, 89 but the digest size of the hash must be an accepted key size for the 90 block cipher in use. As the IV depends on a none public piece of 91 information, the key, the sequence of IV is not known, and the attacks 92 based on this can't be launched. 93 94 plumb IV 95 96 The IV is computed by hashing (or MAC-ing) the plain text from the 97 second block till the last. Additionally, the sector number and the key 98 are used as input as well. If a byte changes in the plain text of the 99 blocks 2 till n, the first block is influenced by the change of the IV. 100 As the first encryption effects all subsequent encryption steps due to 101 the nature of CBC, the whole sector is changed. 102 103 Decryption is possible because CBC is not recursive for decryption. The 104 prerequisites for a successful CBC decryption are two subsequent cipher 105 blocks. The former one is decrypted and the first one is XOR-ed into 106 the decryption result yielding the original plain text. Therefore 107 independent of the IV scheme, decryption is possible from the 2nd to 108 the last block. After the recovery of these plain text blocks, the IV 109 can be computed, and finally the first block can be decrypted as well. 110 111 The only weakness of this scheme is it's performance. It has to process 112 data twice: first for obtaining the IV, and then to produce the CBC 113 encryption with this IV. With the same performance penalty CMC is able 114 to achieve better security properties (CMC is discussed later), thus 115 plumb-IV will remain unimplemented. 116 117 The attack arsenal 118 119 Content leaks 120 121 This attack can be mounted against any system operating in CBC Mode. It 122 rests on the property, that in CBC decryption, the preceding cipher 123 block's influence is simple, that is, it's XORed into the plain text. 124 The preceding cipher block, C[n-1], is readily available on disk (for n 125 > 0) or may be deduced from the IV (for n = 0). If an attacker finds 126 two blocks with identical cipher text, he knows that both cipher texts 127 have been formed according to: 128 C[m] = E(P[m] ⊕ C[m-1] ) 129 C[n] = E(P[n] ⊕ C[n-1] ) 130 Since he found that C[m] = C[n], it holds 131 P[m] ⊕ C[m-1] = P[n] ⊕ C[n-1] 132 which can be rewritten as 133 C[m-1] ⊕ C[n-1] = P[n] ⊕ P[m] 134 The left hand side is known to the attacker by reading the preceding 135 cipher text from disk. If one of the blocks is the first block of a 136 sector, the IV must be examined instead (when it's available as it is 137 in public-IV). The attacker is now able to deduce the difference 138 between the plain texts by examining the difference of C[m-1] and 139 C[n-1]. If one of the plain text blocks happens to be zero, the 140 difference yields the original content of the other related plain text 141 block. 142 143 Another information is available to the attacker. Any succeeding 144 identical pair of cipher text, that follows the initial identical 145 cipher pair, is equal. No information about the content of those pairs 146 can be extracted, since the information is extracted from the 147 respective preceding cipher blocks, but those are all required to be 148 equal. 149 150 Let's have a look at the chance of succeeding with this attack. 151 Assuming the output of a cipher forms an uniform distribution, the 152 chance, p, of finding an identical block is 2^-blocksize. For instance, 153 p = 1/2^128 for a 128-bit cipher. Because the number of possible pairs 154 develops as an arithmetic series in n, the number of sectors, the 155 chance of not finding two identical blocks is given by 156 (1-p)^n(n-1)/2 157 As p is very small, but in contrast the power is very big, we apply the 158 logarithm to get meaningful answers, that is 159 n(n-1)/2 ln (1-p) 160 An example: The number of cipher blocks available on 200GB disk with 161 known C[n-1] is 200GB × 1024^2 KB/GB × 64/1KB ^1. Or in other words, a 162 128-bit block is 16 bytes, so the number of 16-byte blocks in a 200GB 163 hard disk is 13.4 billion. Therefore, n = 1.342e10. For a 128-bit 164 cipher, p = 2^-128. Hence, 165 ln(1-p) = -2.939e-39 166 n(n-1)/2 = 9.007e19 167 168 n(n-1)/2 ln (1-p) = -2.647e-19 169 1-e^-2.776e-13 = 2.647e-19 170 The last term is the chance of finding at least one pair of identical 171 cipher blocks. But how does this number grow in n? Obviously 172 exponentially. Plotting a few a decimal powers shows that the chance 173 for finding at least on identical cipher pair flips to 1 around n = 174 10^20 (n = 10^40 for a 256-bit cipher). This inflexion point is reached 175 for a 146 million TB storage (or a hundered thousand trillion trillions 176 TB storage for a 256-bit cipher). 177 178 ^1The blocks with available preceding cipher blocks is 62/1KB for all 179 non-public IV schemes, i.e. ESSIV/plumb IV 180 181 Data existence leak: The Watermark 182 183 No IV format discussed on this page allows the user to deny the 184 existence of encrypted data. Neither cryptoloop nor dm-crypt is an 185 implementation of a deniable cryptography system. But the problem is 186 more serious with public-IV. 187 188 With public IV and the predicable difference it introduces in the first 189 blocks of a sequence of plain text, data can be watermarked, which 190 means, the watermarked data is detectable even when the key has not 191 been recovered. As shown in the paragraph above, the existence of two 192 blocks with identical cipher text is very unlikely and coincidence can 193 be excluded, which is relevant when somebody tries to demonstrate 194 before the law that certain data is in an encrypted partition. 195 196 As the IV progresses with a foreseeable pattern and is guaranteed to 197 change the least significant bit ever step, we can build identical pair 198 of cipher text by writing three consecutive sectors each with a flipped 199 LSB relative to the previous. (The reason it's three instead of two is, 200 that the second least significant bit might change as well.) This 201 "public-IV"-driven CBC encryption will output exactly the same cipher 202 text for two consecutive sectors. An attacker can search the disk for 203 identical consecutive blocks to find the watermark. This can be done in 204 a single pass, and is much more feasible than finding to identical 205 blocks, that are scattered on the disk, as in the previous attack. A 206 few bits of information can be encoded into the watermarks, which might 207 serve as tag to prove the existence copyright infringing material. 208 209 A complete description of watermarking can be found in [22]Encrypted 210 Watermarks and Linux Laptop Security. The attack can be defeated by 211 using ESSIV. 212 213 Data modification leak 214 215 CBC encryption is recursive, so the n-th block depends on all previous 216 blocks. But the other way round would also be nice. Why? The weakness 217 becomes visible, if storage on a remote computer is used, or more 218 likely, the hard disk exhibits good forensic properties. The point is, 219 the attacker has to have access to preceding (in time) cipher text of a 220 sector, either by recording it from the network, or by using forensic 221 methods. 222 223 An attacker can now guess data modification patterns by examining the 224 historic data. If a sector is overwritten with a partial changed plain 225 text, there is an amount of bytes at the beginning, which are 226 unchanged. This point of change^2 is directly reflected in the cipher 227 text. So an attacker can deduce the point of the change in plain text 228 by finding the point where the cipher text starts to differ. 229 230 This weakness is present in public-IV and ESSIV. 231 232 ^2aligned to the cipher block size boundaries 233 234 Malleable plain text 235 236 The decryption structure of CBC is the source of this weakness. 237 Malleability (with respect to cryptography) is defined as a 238 modification of the cipher text that will resulting in a predictable 239 change in plain text. To put it formally, there is a function f(C), 240 that, if applied to the cipher text, C' = f(C), will result in a known 241 function f', which will predict the resulting plain text, P' = D(C'), 242 correctly assuming P is known, that is P' = f'(P). 243 244 As we can see in it's definition, CBC decryption depends on C[n-1]. An 245 attacker can flip arbitrary bits in the plain text by flipping bit in 246 C[n-1]. More formally^3, if 247 P = P[1] || P[2] || ... || P[i] || ... || P[n] 248 C = E[CBC](P) 249 C = C[1] || C[2] || ... || C[i-1] || ... || C[n] 250 the function 251 f(C[1] || ... || C[n]) = C[1] || ... || C[i-1] XOR M || ... || C[n] 252 follows the function f', which predicts the resulting plain text 253 correctly as, 254 f'(P[1] || ... || P[n]) = P[1] || ... || P[i] XOR M || ... || P[n] 255 The first block of the CBC cipher text stream is not malleable, because 256 it depends on the IV, which is not modifiable for an attacker. 257 258 ^3The IV parameter for E[CBC] has been intentionally omitted. 259 260 Movable 261 262 On the expense of one block decrypting to garbage, an attacker can move 263 around plain text as he likes. CBC decryption depends on two variables, 264 C[n-1] and C[n]. Both can be modified at free will. To make meaningful 265 modifications, an attacker has to replace the pair C[n-1] and C[n] with 266 other cipher text pair from disk. The first block C[n-1] will decrypt 267 to garbage, but the second block C[n] will yield a copy of the plain 268 text of the copied cipher block. This attack is also known as 269 copy&paste attack. This attack is mountable against any CBC setup. The 270 only limitation is, the first block, C[0], can't be replaced with 271 something meaningful, as C[-1] can't be modified, because it's the IV. 272 273 CMC and EME: Tweakable wide block cipher modes 274 275 CMC is a new chaining mode. It stands for ''CBC-Mask-CBC''. It works by 276 processing the data in three steps, first CBC, then masking the cipher 277 text, and then another CBC step, but this time backwards. The last step 278 introduces a dependency from the last block to the first block. The 279 authors of the CMC paper provide a prove for the security of this mode, 280 making a secure 128-bit cipher a secure 4096-bit cipher (sector size). 281 As in normal CBC, this scheme also takes an IV, but the authors call it 282 tweak. 283 284 EME is CMC's cousin. EME has also been authored by Haveli and Rogaway 285 as well been authored for the same purpose. The difference to CMC is, 286 that EME is parallelizable, that is, all operations of the underlying 287 cipher can be evaluated in parallel. To introduce an interdependency 288 among the resulting cipher blocks, the encryption happens in two 289 stages. Between these stages a mask is computed from all intermediate 290 blocks and applied to each intermediate block. This step causes an 291 interdependency among the cipher blocks. After applying the mask, 292 another encryption step diffuses the mask. 293 294 The interdependency among the resulting blocks allow CMC and EME to be 295 nonmovable, nonmalleable, to prevent content leaks and in-sector data 296 modification patterns. The tweaks are encrypted by both cipher modes, 297 thus both are nonwatermarkable. 298 299 For simplicity, the EME description above omitted the pre- and post- 300 whitening steps as well as the multiplications in GF(2^128). An 301 in-depth specification can be found at the [23]Cryptology ePrint 302 Archive. An applicable draft specification for EME-32-AES can be found 303 at [24]SISWG. I have written an EME-32-AES test implementation for 304 Linux 2.6. It's available [25]here. The CMC paper is available from the 305 [26]Cryptology ePrint Archive as well. 306 307 LRW: A tweakable narrow block cipher mode 308 309 EME as well as CMC are comparatively secure cipher modes, but heavy in 310 terms of performance. LRW tries to cope with most of security 311 requirements, and at the same time provide a good performance. LRW is a 312 narrow block cipher mode, that is, it operates only on one block, 313 instead of a whole sector. To make a cipher block tied to a location on 314 disk (to make it unmovable), a logical index is included in the 315 computation. For LRW you have to provide two keys, one for the cipher 316 and one for the cipher mode. The second key is multiplied with a 317 logical index under GF(2^128) and used as pre- and post- whitening for 318 encryption. With those whitening steps the block is effectively tied to 319 a logical index. The logical index is usually the absolute position on 320 disk measured with the block size of the cipher algorithm. The 321 different choice of the measuring unit is the only different between 322 the logical index and the public-IV. 323 324 The LRW draft is available from the [27]SISWG mailing list archive. 325 326 Summarising 327 328 The following table shows a comparison between the security properties 329 of different encryption setups and their computational costs. The 330 number of cipher calls, XOR operations and additional operations are 331 stated in terms of encryption blocks, n. 332 IV mode cipher mode content leaks watermarkable malleable movable 333 modification detection^5 cipher calls XOR ops additional op. 334 public-IV CBC Yes Yes Yes Yes Yes n n None 335 ESSIV CBC Yes No Yes Yes Yes n+1 n None 336 Plumb-IV1^4 CBC Yes No Yes Yes No 2n-1 2n None 337 public-IV CMC No No No No No 2n+1 2n+1 1 LW GF ⊗ 338 public-IV EME No No No No No 2n+1 5n 3n-1 LW GF ⊗ 339 public-IV LRW No No No No Yes n 2n n HW GF ⊗ 340 341 Legend: 342 * LW GF ⊗: light-weight Galois field multiplication, that is, a 343 multiplication with a constant x^2^i, which can be computed in 344 θ(1). 345 * HW GF ⊗: heavy-weight Galois field multiplication, that is, a 346 multiplication with an arbitrary constant, which can be computed in 347 θ(bits). 348 349 ^4plumb-IV1 uses CBC-MAC instead of hashing, so we can make a good 350 comparison with other ciphers in terms of cipher/XOR calls. 351 ^5detectable partial in-sector modification 352 __________________________________________________________________ 353 354 Clemens Fruhwirth, , also author of LUKS and ESSIV, porter of 355 cryptoloop, aes-i586 for 2.6., twofish-i586, and implementor of 356 EME-32-AES. This text is an excerpt of my diploma thesis. 357 358 This page has been reviewed by 359 360 Dr. Ernst Molitor 361 Arno Wagner 362 James Hughes , "Security in Storage Working Group" chair 363 364 Additional thanks to Pascal Brisset, for pointing out an error in the 365 Bernoulli estimation in an earlier version of this document, further 366 Adam J. Richter for pointing out an error in the KB/GB ratio. 367 368 Content and design, Copyright © 2004-2008 Clemens Fruhwirth, unless 369 stated otherwise 370 Original design by [28]haran | Additional art by [29]LinuxArt | | Blog 371 by [30]NanoBlogger 372 373 References 374 375 1. http://clemens.endorphin.org/ 376 2. http://clemens.endorphin.org/credits 377 3. http://clemens.endorphin.org/aboutme 378 4. http://clemens.endorphin.org/cryptography 379 5. http://blog.clemens.endorphin.org/ 380 6. http://clemens.endorphin.org/patches 381 7. http://clemens.endorphin.org/archive 382 8. http://clemens.endorphin.org/Cryptoloop_Migration_Guide 383 9. http://clemens.endorphin.org/LUKS 384 10. http://clemens.endorphin.org/AFsplitter 385 11. http://clemens.endorphin.org/lo-tracker 386 12. http://blog.clemens.endorphin.org/2008/12/luks-on-disk-format-revision-111.html 387 13. http://blog.clemens.endorphin.org/2008/11/xmonad-gridselect.html 388 14. http://blog.clemens.endorphin.org/2008/11/workaround-for-bittorrent-traffic.html 389 15. http://blog.clemens.endorphin.org/2008/09/i-love-lolcat-meme.html 390 16. http://blog.clemens.endorphin.org/2008/09/counter-steganography-research.html 391 17. http://clemens.endorphin.org/cryptography 392 18. http://www.siswg.org/ 393 19. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation 394 20. http://www.freesoft.org/CIE/Topics/143.htm 395 21. http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf 396 22. http://www.tcs.hut.fi/~mjos/doc/wisa2004.pdf 397 23. http://eprint.iacr.org/2003/147/ 398 24. http://grouper.ieee.org/groups/1619/email/pdf00011.pdf 399 25. http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/544 400 26. http://eprint.iacr.org/2003/148/ 401 27. http://grouper.ieee.org/groups/1619/email/msg00160.html 402 28. http://www.oswd.org/user/profile/id/3013 403 29. http://www.linuxart.com/ 404 30. http://nanoblogger.sourceforge.net/