tomb

the crypto undertaker
git clone git://parazyd.org/tomb.git
Log | Files | Refs | README | LICENSE

LinuxHDEncSettings.txt (20410B)


      1 
      2 Linux hard disk encryption settings
      3 
      4    This page intends to educate the reader about the existing weaknesses
      5    of the public-IV on-disk format commonly used with cryptoloop and
      6    dm-crypt (used in IV-plain mode). This page aims to facilitate risk
      7    calculation when utilising Linux hard disk encryption. The attacks
      8    presented on this page may pose a thread to you, but at the same time
      9    may be totally irrelevant for others. At the end of this document, the
     10    reader should be able to make a good choice according to his security
     11    needs.
     12 
     13    A good quote with respect to this topic is ''All security involves
     14    trade-offs'' from Beyond Fear (Bruce Schneier). You should keep in mind
     15    that perfect security is unachievable and by all means shouldn't be
     16    your goal. For instance, when using pass phrase based cryptography, you
     17    have to trust in that the underlying system is secure, the computer
     18    system has not been tampered with, and nobody is watching you. The most
     19    obvious weakness is the last one, but even if you make sure nobody nor
     20    any camera is around, how about the keyboard you're typing on? Has it
     21    been manipulated while you have been getting your lunch?
     22 
     23    So security comes for a price, and the price when designing
     24    cryptography security algorithms is performance. You will be introduced
     25    to the fastest of all setups available, the "public-IV", which
     26    sacrifices security properties for speed. After that we will talk about
     27    ESSIV, the newest of IV modes implemented. It comes for a small price,
     28    but it can deal with watermarking for a relatively small price. Then
     29    you'll be introduced to the draft specifications of the Security in
     30    Storage Working Group ([18]SISWG). Currently SISWG is considering EME
     31    and LRW for standardisation. EME along with it's cousin CMC seems to
     32    provide the best security level, but imposes additional encryption
     33    steps. Plumb-IV is discussed only for reference, because it has the
     34    same performance penalty as CMC, but in constrast suffers from
     35    weaknesses of CBC encryption.
     36 
     37    As convention, this document will use the term "blocks", when it
     38    referes to a single block of plain or cipher text (usually 16 byte),
     39    and will use the term "sectors", when it refers to a 512-byte wide hard
     40    disk block.
     41 
     42 CBC Mode: The basic level
     43 
     44    Most hard disk encryption systems utilise CBC to encrypt bulk data.
     45    Good descriptions on CBC and other common cipher modes are available at
     46      * [19]Wikipedia
     47      * [20]Connected: An Internet Encyclopedia
     48      * [21]NIST: Recommendation for Block Cipher Modes of Operation (CBC
     49        is at PDF Page 17)
     50 
     51    Please make sure you're familiar with CBC before proceeding.
     52 
     53    Since CBC encryption is a recursive algorithm, the encryption of the
     54    n-th block requires the encryption of all preceding blocks, 0 till n-1.
     55    Thus, if we would run the whole hard disk encryption in CBC mode, one
     56    would have to re-encrypt the whole hard disk, if the first computation
     57    step changed, this is, when the first plain text block changed. Of
     58    course, this is an undesired property, therefore the CBC chaining is
     59    cut every sector and restarted with a new initialisation vector (IV),
     60    so we can encrypt sectors individually. The choice of the sector as
     61    smallest unit matches with the smallest unit of hard disks, where a
     62    sector is also atomic in terms of access.
     63 
     64    For reference, I will give a formal definition of CBC encryption and
     65    decryption. Note, that decryption is not recursive, in contrast to
     66    encryption, since it's a function only of C[n-1] and C[n].
     67    Encryption:
     68    C[-1] = IV
     69    C[n] = E(P[n] ⊕ C[n-1])
     70    Decryption:
     71    C[-1] = IV
     72    P[n] = C[n-1] ⊕ D(C[n])
     73    The next sections will deal with how this IV is chosen.
     74 
     75 The IV Modes
     76 
     77 The "public-IV"
     78 
     79    The IV for sector n is simply the 32-bit version of the number n
     80    encoded in little-endian padded with zeros to the block-size of the
     81    cipher used, if necessary. This is the most simple IV mode, but at the
     82    same the most vulnerable.
     83 
     84 ESSIV
     85 
     86    E(Sector|Salt) IV, short ESSIV, derives the IV from key material via
     87    encryption of the sector number with a hashed version of the key
     88    material, the salt. ESSIV does not specify a particular hash algorithm,
     89    but the digest size of the hash must be an accepted key size for the
     90    block cipher in use. As the IV depends on a none public piece of
     91    information, the key, the sequence of IV is not known, and the attacks
     92    based on this can't be launched.
     93 
     94 plumb IV
     95 
     96    The IV is computed by hashing (or MAC-ing) the plain text from the
     97    second block till the last. Additionally, the sector number and the key
     98    are used as input as well. If a byte changes in the plain text of the
     99    blocks 2 till n, the first block is influenced by the change of the IV.
    100    As the first encryption effects all subsequent encryption steps due to
    101    the nature of CBC, the whole sector is changed.
    102 
    103    Decryption is possible because CBC is not recursive for decryption. The
    104    prerequisites for a successful CBC decryption are two subsequent cipher
    105    blocks. The former one is decrypted and the first one is XOR-ed into
    106    the decryption result yielding the original plain text. Therefore
    107    independent of the IV scheme, decryption is possible from the 2nd to
    108    the last block. After the recovery of these plain text blocks, the IV
    109    can be computed, and finally the first block can be decrypted as well.
    110 
    111    The only weakness of this scheme is it's performance. It has to process
    112    data twice: first for obtaining the IV, and then to produce the CBC
    113    encryption with this IV. With the same performance penalty CMC is able
    114    to achieve better security properties (CMC is discussed later), thus
    115    plumb-IV will remain unimplemented.
    116 
    117 The attack arsenal
    118 
    119 Content leaks
    120 
    121    This attack can be mounted against any system operating in CBC Mode. It
    122    rests on the property, that in CBC decryption, the preceding cipher
    123    block's influence is simple, that is, it's XORed into the plain text.
    124    The preceding cipher block, C[n-1], is readily available on disk (for n
    125    > 0) or may be deduced from the IV (for n = 0). If an attacker finds
    126    two blocks with identical cipher text, he knows that both cipher texts
    127    have been formed according to:
    128    C[m] = E(P[m] ⊕ C[m-1] )
    129    C[n] = E(P[n] ⊕ C[n-1] )
    130    Since he found that C[m] = C[n], it holds
    131    P[m] ⊕ C[m-1] = P[n] ⊕ C[n-1]
    132    which can be rewritten as
    133    C[m-1] ⊕ C[n-1] = P[n] ⊕ P[m]
    134    The left hand side is known to the attacker by reading the preceding
    135    cipher text from disk. If one of the blocks is the first block of a
    136    sector, the IV must be examined instead (when it's available as it is
    137    in public-IV). The attacker is now able to deduce the difference
    138    between the plain texts by examining the difference of C[m-1] and
    139    C[n-1]. If one of the plain text blocks happens to be zero, the
    140    difference yields the original content of the other related plain text
    141    block.
    142 
    143    Another information is available to the attacker. Any succeeding
    144    identical pair of cipher text, that follows the initial identical
    145    cipher pair, is equal. No information about the content of those pairs
    146    can be extracted, since the information is extracted from the
    147    respective preceding cipher blocks, but those are all required to be
    148    equal.
    149 
    150    Let's have a look at the chance of succeeding with this attack.
    151    Assuming the output of a cipher forms an uniform distribution, the
    152    chance, p, of finding an identical block is 2^-blocksize. For instance,
    153    p = 1/2^128 for a 128-bit cipher. Because the number of possible pairs
    154    develops as an arithmetic series in n, the number of sectors, the
    155    chance of not finding two identical blocks is given by
    156    (1-p)^n(n-1)/2
    157    As p is very small, but in contrast the power is very big, we apply the
    158    logarithm to get meaningful answers, that is
    159    n(n-1)/2 ln (1-p)
    160    An example: The number of cipher blocks available on 200GB disk with
    161    known C[n-1] is 200GB × 1024^2 KB/GB × 64/1KB ^1. Or in other words, a
    162    128-bit block is 16 bytes, so the number of 16-byte blocks in a 200GB
    163    hard disk is 13.4 billion. Therefore, n = 1.342e10. For a 128-bit
    164    cipher, p = 2^-128. Hence,
    165    ln(1-p) = -2.939e-39
    166    n(n-1)/2 = 9.007e19
    167 
    168    n(n-1)/2 ln (1-p) = -2.647e-19
    169    1-e^-2.776e-13 = 2.647e-19
    170    The last term is the chance of finding at least one pair of identical
    171    cipher blocks. But how does this number grow in n? Obviously
    172    exponentially. Plotting a few a decimal powers shows that the chance
    173    for finding at least on identical cipher pair flips to 1 around n =
    174    10^20 (n = 10^40 for a 256-bit cipher). This inflexion point is reached
    175    for a 146 million TB storage (or a hundered thousand trillion trillions
    176    TB storage for a 256-bit cipher).
    177 
    178    ^1The blocks with available preceding cipher blocks is 62/1KB for all
    179    non-public IV schemes, i.e. ESSIV/plumb IV
    180 
    181 Data existence leak: The Watermark
    182 
    183    No IV format discussed on this page allows the user to deny the
    184    existence of encrypted data. Neither cryptoloop nor dm-crypt is an
    185    implementation of a deniable cryptography system. But the problem is
    186    more serious with public-IV.
    187 
    188    With public IV and the predicable difference it introduces in the first
    189    blocks of a sequence of plain text, data can be watermarked, which
    190    means, the watermarked data is detectable even when the key has not
    191    been recovered. As shown in the paragraph above, the existence of two
    192    blocks with identical cipher text is very unlikely and coincidence can
    193    be excluded, which is relevant when somebody tries to demonstrate
    194    before the law that certain data is in an encrypted partition.
    195 
    196    As the IV progresses with a foreseeable pattern and is guaranteed to
    197    change the least significant bit ever step, we can build identical pair
    198    of cipher text by writing three consecutive sectors each with a flipped
    199    LSB relative to the previous. (The reason it's three instead of two is,
    200    that the second least significant bit might change as well.) This
    201    "public-IV"-driven CBC encryption will output exactly the same cipher
    202    text for two consecutive sectors. An attacker can search the disk for
    203    identical consecutive blocks to find the watermark. This can be done in
    204    a single pass, and is much more feasible than finding to identical
    205    blocks, that are scattered on the disk, as in the previous attack. A
    206    few bits of information can be encoded into the watermarks, which might
    207    serve as tag to prove the existence copyright infringing material.
    208 
    209    A complete description of watermarking can be found in [22]Encrypted
    210    Watermarks and Linux Laptop Security. The attack can be defeated by
    211    using ESSIV.
    212 
    213 Data modification leak
    214 
    215    CBC encryption is recursive, so the n-th block depends on all previous
    216    blocks. But the other way round would also be nice. Why? The weakness
    217    becomes visible, if storage on a remote computer is used, or more
    218    likely, the hard disk exhibits good forensic properties. The point is,
    219    the attacker has to have access to preceding (in time) cipher text of a
    220    sector, either by recording it from the network, or by using forensic
    221    methods.
    222 
    223    An attacker can now guess data modification patterns by examining the
    224    historic data. If a sector is overwritten with a partial changed plain
    225    text, there is an amount of bytes at the beginning, which are
    226    unchanged. This point of change^2 is directly reflected in the cipher
    227    text. So an attacker can deduce the point of the change in plain text
    228    by finding the point where the cipher text starts to differ.
    229 
    230    This weakness is present in public-IV and ESSIV.
    231 
    232    ^2aligned to the cipher block size boundaries
    233 
    234 Malleable plain text
    235 
    236    The decryption structure of CBC is the source of this weakness.
    237    Malleability (with respect to cryptography) is defined as a
    238    modification of the cipher text that will resulting in a predictable
    239    change in plain text. To put it formally, there is a function f(C),
    240    that, if applied to the cipher text, C' = f(C), will result in a known
    241    function f', which will predict the resulting plain text, P' = D(C'),
    242    correctly assuming P is known, that is P' = f'(P).
    243 
    244    As we can see in it's definition, CBC decryption depends on C[n-1]. An
    245    attacker can flip arbitrary bits in the plain text by flipping bit in
    246    C[n-1]. More formally^3, if
    247    P = P[1] || P[2] || ... || P[i] || ... || P[n]
    248    C = E[CBC](P)
    249    C = C[1] || C[2] || ... || C[i-1] || ... || C[n]
    250    the function
    251    f(C[1] || ... || C[n]) = C[1] || ... || C[i-1] XOR M || ... || C[n]
    252    follows the function f', which predicts the resulting plain text
    253    correctly as,
    254    f'(P[1] || ... || P[n]) = P[1] || ... || P[i] XOR M || ... || P[n]
    255    The first block of the CBC cipher text stream is not malleable, because
    256    it depends on the IV, which is not modifiable for an attacker.
    257 
    258    ^3The IV parameter for E[CBC] has been intentionally omitted.
    259 
    260 Movable
    261 
    262    On the expense of one block decrypting to garbage, an attacker can move
    263    around plain text as he likes. CBC decryption depends on two variables,
    264    C[n-1] and C[n]. Both can be modified at free will. To make meaningful
    265    modifications, an attacker has to replace the pair C[n-1] and C[n] with
    266    other cipher text pair from disk. The first block C[n-1] will decrypt
    267    to garbage, but the second block C[n] will yield a copy of the plain
    268    text of the copied cipher block. This attack is also known as
    269    copy&paste attack. This attack is mountable against any CBC setup. The
    270    only limitation is, the first block, C[0], can't be replaced with
    271    something meaningful, as C[-1] can't be modified, because it's the IV.
    272 
    273 CMC and EME: Tweakable wide block cipher modes
    274 
    275    CMC is a new chaining mode. It stands for ''CBC-Mask-CBC''. It works by
    276    processing the data in three steps, first CBC, then masking the cipher
    277    text, and then another CBC step, but this time backwards. The last step
    278    introduces a dependency from the last block to the first block. The
    279    authors of the CMC paper provide a prove for the security of this mode,
    280    making a secure 128-bit cipher a secure 4096-bit cipher (sector size).
    281    As in normal CBC, this scheme also takes an IV, but the authors call it
    282    tweak.
    283 
    284    EME is CMC's cousin. EME has also been authored by Haveli and Rogaway
    285    as well been authored for the same purpose. The difference to CMC is,
    286    that EME is parallelizable, that is, all operations of the underlying
    287    cipher can be evaluated in parallel. To introduce an interdependency
    288    among the resulting cipher blocks, the encryption happens in two
    289    stages. Between these stages a mask is computed from all intermediate
    290    blocks and applied to each intermediate block. This step causes an
    291    interdependency among the cipher blocks. After applying the mask,
    292    another encryption step diffuses the mask.
    293 
    294    The interdependency among the resulting blocks allow CMC and EME to be
    295    nonmovable, nonmalleable, to prevent content leaks and in-sector data
    296    modification patterns. The tweaks are encrypted by both cipher modes,
    297    thus both are nonwatermarkable.
    298 
    299    For simplicity, the EME description above omitted the pre- and post-
    300    whitening steps as well as the multiplications in GF(2^128). An
    301    in-depth specification can be found at the [23]Cryptology ePrint
    302    Archive. An applicable draft specification for EME-32-AES can be found
    303    at [24]SISWG. I have written an EME-32-AES test implementation for
    304    Linux 2.6. It's available [25]here. The CMC paper is available from the
    305    [26]Cryptology ePrint Archive as well.
    306 
    307 LRW: A tweakable narrow block cipher mode
    308 
    309    EME as well as CMC are comparatively secure cipher modes, but heavy in
    310    terms of performance. LRW tries to cope with most of security
    311    requirements, and at the same time provide a good performance. LRW is a
    312    narrow block cipher mode, that is, it operates only on one block,
    313    instead of a whole sector. To make a cipher block tied to a location on
    314    disk (to make it unmovable), a logical index is included in the
    315    computation. For LRW you have to provide two keys, one for the cipher
    316    and one for the cipher mode. The second key is multiplied with a
    317    logical index under GF(2^128) and used as pre- and post- whitening for
    318    encryption. With those whitening steps the block is effectively tied to
    319    a logical index. The logical index is usually the absolute position on
    320    disk measured with the block size of the cipher algorithm. The
    321    different choice of the measuring unit is the only different between
    322    the logical index and the public-IV.
    323 
    324    The LRW draft is available from the [27]SISWG mailing list archive.
    325 
    326 Summarising
    327 
    328    The following table shows a comparison between the security properties
    329    of different encryption setups and their computational costs. The
    330    number of cipher calls, XOR operations and additional operations are
    331    stated in terms of encryption blocks, n.
    332    IV mode cipher mode content leaks watermarkable malleable movable
    333    modification detection^5 cipher calls XOR ops additional op.
    334    public-IV CBC Yes Yes Yes Yes Yes n n None
    335    ESSIV CBC Yes No Yes Yes Yes n+1 n None
    336    Plumb-IV1^4 CBC Yes No Yes Yes No 2n-1 2n None
    337    public-IV CMC No No No No No 2n+1 2n+1 1 LW GF ⊗
    338    public-IV EME No No No No No 2n+1 5n 3n-1 LW GF ⊗
    339    public-IV LRW No No No No Yes n 2n n HW GF ⊗
    340 
    341    Legend:
    342      * LW GF ⊗: light-weight Galois field multiplication, that is, a
    343        multiplication with a constant x^2^i, which can be computed in
    344        θ(1).
    345      * HW GF ⊗: heavy-weight Galois field multiplication, that is, a
    346        multiplication with an arbitrary constant, which can be computed in
    347        θ(bits).
    348 
    349    ^4plumb-IV1 uses CBC-MAC instead of hashing, so we can make a good
    350    comparison with other ciphers in terms of cipher/XOR calls.
    351    ^5detectable partial in-sector modification
    352      __________________________________________________________________
    353 
    354    Clemens Fruhwirth, , also author of LUKS and ESSIV, porter of
    355    cryptoloop, aes-i586 for 2.6., twofish-i586, and implementor of
    356    EME-32-AES. This text is an excerpt of my diploma thesis.
    357 
    358 This page has been reviewed by
    359 
    360    Dr. Ernst Molitor
    361    Arno Wagner
    362    James Hughes , "Security in Storage Working Group" chair
    363 
    364    Additional thanks to Pascal Brisset, for pointing out an error in the
    365    Bernoulli estimation in an earlier version of this document, further
    366    Adam J. Richter for pointing out an error in the KB/GB ratio.
    367 
    368    Content and design, Copyright © 2004-2008 Clemens Fruhwirth, unless
    369    stated otherwise
    370    Original design by [28]haran | Additional art by [29]LinuxArt | | Blog
    371    by [30]NanoBlogger
    372 
    373 References
    374 
    375    1. http://clemens.endorphin.org/
    376    2. http://clemens.endorphin.org/credits
    377    3. http://clemens.endorphin.org/aboutme
    378    4. http://clemens.endorphin.org/cryptography
    379    5. http://blog.clemens.endorphin.org/
    380    6. http://clemens.endorphin.org/patches
    381    7. http://clemens.endorphin.org/archive
    382    8. http://clemens.endorphin.org/Cryptoloop_Migration_Guide
    383    9. http://clemens.endorphin.org/LUKS
    384   10. http://clemens.endorphin.org/AFsplitter
    385   11. http://clemens.endorphin.org/lo-tracker
    386   12. http://blog.clemens.endorphin.org/2008/12/luks-on-disk-format-revision-111.html
    387   13. http://blog.clemens.endorphin.org/2008/11/xmonad-gridselect.html
    388   14. http://blog.clemens.endorphin.org/2008/11/workaround-for-bittorrent-traffic.html
    389   15. http://blog.clemens.endorphin.org/2008/09/i-love-lolcat-meme.html
    390   16. http://blog.clemens.endorphin.org/2008/09/counter-steganography-research.html
    391   17. http://clemens.endorphin.org/cryptography
    392   18. http://www.siswg.org/
    393   19. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
    394   20. http://www.freesoft.org/CIE/Topics/143.htm
    395   21. http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf
    396   22. http://www.tcs.hut.fi/~mjos/doc/wisa2004.pdf
    397   23. http://eprint.iacr.org/2003/147/
    398   24. http://grouper.ieee.org/groups/1619/email/pdf00011.pdf
    399   25. http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/544
    400   26. http://eprint.iacr.org/2003/148/
    401   27. http://grouper.ieee.org/groups/1619/email/msg00160.html
    402   28. http://www.oswd.org/user/profile/id/3013
    403   29. http://www.linuxart.com/
    404   30. http://nanoblogger.sourceforge.net/