=====================================
Filesystem-level encryption (fscrypt)
=====================================

Introduction
============

fscrypt is a library which filesystems can hook into to support
transparent encryption of files and directories.

Note: "fscrypt" in this document refers to the kernel-level portion,
implemented in ``fs/crypto/``, as opposed to the userspace tool
`fscrypt <https://github.com/google/fscrypt>`_.  This document only
covers the kernel-level portion.  For command-line examples of how to
use encryption, see the documentation for the userspace tool `fscrypt
<https://github.com/google/fscrypt>`_.  Also, it is recommended to use
the fscrypt userspace tool, or other existing userspace tools such as
`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key
management system
<https://source.android.com/security/encryption/file-based>`_, over
using the kernel's API directly.  Using existing tools reduces the
chance of introducing your own security bugs.  (Nevertheless, for
completeness this documentation covers the kernel's API anyway.)

Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level.  This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem.  This is useful for multi-user systems where each user's
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata.

Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated
directly into supported filesystems --- currently ext4, F2FS, and
UBIFS.  This allows encrypted files to be read and written without
caching both the decrypted and encrypted pages in the pagecache,
thereby nearly halving the memory used and bringing it in line with
unencrypted files.  Similarly, half as many dentries and inodes are
needed.  eCryptfs also limits encrypted filenames to 143 bytes,
causing application compatibility issues; fscrypt allows the full 255
bytes (NAME_MAX).  Finally, unlike eCryptfs, the fscrypt API can be
used by unprivileged users, with no need to mount anything.

fscrypt does not support encrypting files in-place.  Instead, it
supports marking an empty directory as encrypted.  Then, after
userspace provides the key, all regular files, directories, and
symbolic links created in that directory tree are transparently
encrypted.

Threat model
============

Offline attacks
---------------

Provided that userspace chooses a strong encryption key, fscrypt
protects the confidentiality of file contents and filenames in the
event of a single point-in-time permanent offline compromise of the
block device content.  fscrypt does not protect the confidentiality of
non-filename metadata, e.g. file sizes, file permissions, file
timestamps, and extended attributes.  Also, the existence and location
of holes (unallocated blocks which logically contain all zeroes) in
files is not protected.

fscrypt is not guaranteed to protect confidentiality or authenticity
if an attacker is able to manipulate the filesystem offline prior to
an authorized user later accessing the filesystem.

Online attacks
--------------

fscrypt (and storage encryption in general) can only provide limited
protection, if any at all, against online attacks.  In detail:

fscrypt is only resistant to side-channel attacks, such as timing or
electromagnetic attacks, to the extent that the underlying Linux
Cryptographic API algorithms are.  If a vulnerable algorithm is used,
such as a table-based implementation of AES, it may be possible for an
attacker to mount a side-channel attack against the online system.
Side-channel attacks may also be mounted against applications
consuming decrypted data.

After an encryption key has been provided, fscrypt is not designed to
hide the plaintext file contents or filenames from other users on the
same system, regardless of the visibility of the keyring key.
Instead, existing access control mechanisms such as file mode bits,
POSIX ACLs, LSMs, or mount namespaces should be used for this purpose.
Also note that as long as the encryption keys are *anywhere* in
memory, an online attacker can necessarily compromise them by mounting
a physical attack or by exploiting any kernel security vulnerability
which provides an arbitrary memory read primitive.

While it is ostensibly possible to "evict" keys from the system,
recently accessed encrypted files will remain accessible at least
until the filesystem is unmounted or the VFS caches are dropped, e.g.
using ``echo 2 > /proc/sys/vm/drop_caches``.  Even after that, if the
RAM is compromised before being powered off, it will likely still be
possible to recover portions of the plaintext file contents, if not
some of the encryption keys as well.  (Since Linux v4.12, all
in-kernel keys related to fscrypt are sanitized before being freed.
However, userspace would need to do its part as well.)

Currently, fscrypt does not prevent a user from maliciously providing
an incorrect key for another user's existing encrypted files.  A
protection against this is planned.

Key hierarchy
=============

Master keys
-----------

Each encrypted directory tree is protected by a *master key*.  Master
keys can be up to 64 bytes long, and must be at least as long as the
greater of the key lengths needed by the contents and filenames
encryption modes being used.  For example, if AES-256-XTS is used for
contents encryption, the master key must be 64 bytes (512 bits).  Note
that the XTS mode is defined to require a key twice as long as that
required by the underlying block cipher.
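
The length rule above can be sketched as a small check.  This is only
an illustration: the ``demo_*`` names and the mode enum are
hypothetical and are not the kernel's constants, and the per-mode key
lengths assume the usual sizes for these ciphers (XTS takes two AES
keys).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_MASTER_KEY_SIZE 64 /* upper bound stated in the text */

/* Hypothetical mode identifiers for this sketch only. */
enum demo_mode {
    DEMO_AES_256_XTS,
    DEMO_AES_256_CTS,
    DEMO_AES_128_CBC,
    DEMO_AES_128_CTS,
};

/* Key length in bytes needed by each mode (0 for an unknown mode). */
size_t demo_mode_key_len(enum demo_mode mode)
{
    switch (mode) {
    case DEMO_AES_256_XTS: return 64; /* two 256-bit AES keys */
    case DEMO_AES_256_CTS: return 32;
    case DEMO_AES_128_CBC: return 16;
    case DEMO_AES_128_CTS: return 16;
    }
    return 0;
}

/* True if a master key of key_len bytes satisfies the length rule. */
bool demo_master_key_len_ok(size_t key_len, enum demo_mode contents,
                            enum demo_mode filenames)
{
    size_t c = demo_mode_key_len(contents);
    size_t f = demo_mode_key_len(filenames);
    size_t needed = c > f ? c : f;

    assert(needed > 0 && needed <= DEMO_MAX_MASTER_KEY_SIZE);
    return key_len >= needed && key_len <= DEMO_MAX_MASTER_KEY_SIZE;
}
```

With AES-256-XTS for contents, a 32-byte master key is rejected and a
64-byte one is accepted, matching the example in the text.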

To "unlock" an encrypted directory tree, userspace must provide the
appropriate master key.  There can be any number of master keys, each
of which protects any number of directory trees on any number of
filesystems.

Userspace should generate master keys either using a cryptographically
secure random number generator, or by using a KDF (Key Derivation
Function).  Note that whenever a KDF is used to "stretch" a
lower-entropy secret such as a passphrase, it is critical that a KDF
designed for this purpose be used, such as scrypt, PBKDF2, or Argon2.

Per-file keys
-------------

Master keys are not used to encrypt file contents or names directly.
Instead, a unique key is derived for each encrypted file, including
each regular file, directory, and symbolic link.  This has several
advantages:

- In cryptosystems, the same key material should never be used for
  different purposes.  Using the master key as both an XTS key for
  contents encryption and as a CTS-CBC key for filenames encryption
  would violate this rule.
- Per-file keys simplify the choice of IVs (Initialization Vectors)
  for contents encryption.  Without per-file keys, to ensure IV
  uniqueness both the inode and logical block number would need to be
  encoded in the IVs.  This would make it impossible to renumber
  inodes, which e.g. ``resize2fs`` can do when resizing an ext4
  filesystem.  With per-file keys, it is sufficient to encode just the
  logical block number in the IVs.
- Per-file keys strengthen the encryption of filenames, where IVs are
  reused out of necessity.  With a unique key per directory, IV reuse
  is limited to within a single directory.
- Per-file keys allow individual files to be securely erased simply by
  securely erasing their keys.  (Not yet implemented.)

A KDF (Key Derivation Function) is used to derive per-file keys from
the master key.  This is done instead of wrapping a randomly-generated
key for each file because it reduces the size of the encryption xattr,
which for some filesystems makes the xattr more likely to fit in-line
in the filesystem's inode table.  With a KDF, only a 16-byte nonce is
required --- long enough to make key reuse extremely unlikely.  A
wrapped key, on the other hand, would need to be up to 64 bytes ---
the length of an AES-256-XTS key.  Furthermore, currently there is no
requirement to support unlocking a file with multiple alternative
master keys or to support rotating master keys.  Instead, the master
keys may be wrapped in userspace, e.g. as done by the `fscrypt
<https://github.com/google/fscrypt>`_ tool.

The current KDF encrypts the master key using the 16-byte nonce as an
AES-128-ECB key.  The output is used as the derived key.  If the
output is longer than needed, then it is truncated to the needed
length.  Truncation is the norm for directories and symlinks, since
those use the CTS-CBC encryption mode which requires a key half as
long as that required by the XTS encryption mode.

Note: this KDF meets the primary security requirement, which is to
produce unique derived keys that preserve the entropy of the master
key, assuming that the master key is already a good pseudorandom key.
However, it is nonstandard and has some problems such as being
reversible, so it is generally considered to be a mistake!  It may be
replaced with HKDF or another more standard KDF in the future.

Encryption modes and usage
==========================

fscrypt allows one encryption mode to be specified for file contents
and one encryption mode to be specified for filenames.  Different
directory trees are permitted to use different encryption modes.
Currently, the following pairs of encryption modes are supported:

- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
- AES-128-CBC for contents and AES-128-CTS-CBC for filenames

It is strongly recommended to use AES-256-XTS for contents encryption.
AES-128-CBC was added only for low-powered embedded devices with
crypto accelerators such as CAAM or CESA that do not support XTS.

New encryption modes can be added relatively easily, without changes
to individual filesystems.  However, authenticated encryption (AE)
modes are not currently supported because of the difficulty of dealing
with ciphertext expansion.

For file contents, each filesystem block is encrypted independently.
Currently, only the case where the filesystem block size is equal to
the system's page size (usually 4096 bytes) is supported.  With the
XTS mode of operation (recommended), the logical block number within
the file is used as the IV.  With the CBC mode of operation (not
recommended), ESSIV is used; specifically, the IV for CBC is the
logical block number encrypted with AES-256, where the AES-256 key is
the SHA-256 hash of the inode's data encryption key.
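
As a rough illustration of the XTS case, the IV for a block can be
built from its logical block number alone.  The text does not specify
the byte layout, so this sketch assumes a 16-byte IV that holds the
block number as a 64-bit little-endian integer followed by zero
padding; ``demo_xts_iv()`` is a hypothetical helper, not a kernel
function.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_IV_SIZE 16 /* one AES block; assumed IV length */

/* Fill iv with lblk_num in little-endian order, zero-padded to 16 bytes. */
void demo_xts_iv(uint64_t lblk_num, uint8_t iv[DEMO_IV_SIZE])
{
    memset(iv, 0, DEMO_IV_SIZE);
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t)(lblk_num >> (8 * i));
}
```

Because each file has its own key, two files may use the same IV for
the same block number without the (key, IV) pair ever repeating.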

For filenames, the full filename is encrypted at once.  Because of the
requirements to retain support for efficient directory lookups and
filenames of up to 255 bytes, a constant initialization vector (IV) is
used.  However, each encrypted directory uses a unique key, which
limits IV reuse to within a single directory.  Note that IV reuse in
the context of CTS-CBC encryption means that when the original
filenames share a common prefix at least as long as the cipher block
size (16 bytes for AES), the corresponding encrypted filenames will
also share a common prefix.  This is undesirable; it may be fixed in
the future by switching to an encryption mode that is a strong
pseudorandom permutation on arbitrary-length messages, e.g. the HEH
(Hash-Encrypt-Hash) mode.

Since filenames are encrypted with the CTS-CBC mode of operation, the
plaintext and ciphertext filenames need not be multiples of the AES
block size, i.e. 16 bytes.  However, the minimum size that can be
encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes
before being encrypted.  In addition, to reduce leakage of filename
lengths via their ciphertexts, all filenames are NUL-padded to the
next 4, 8, 16, or 32-byte boundary (configurable).  32 is recommended
since this provides the best confidentiality, at the cost of making
directory entries consume slightly more space.  Note that since NUL
(``\0``) is not otherwise a valid character in filenames, the padding
will never produce duplicate plaintexts.
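
The two padding rules can be combined into one length computation.
The following is a minimal sketch; ``demo_padded_len()`` is a
hypothetical helper, and it ignores any cap a filesystem may apply to
names close to NAME_MAX.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CIPHER_BLOCK 16 /* AES block size in bytes */

/*
 * Length to which a filename of name_len bytes is NUL-padded before
 * encryption.  padding must be 4, 8, 16, or 32.
 */
size_t demo_padded_len(size_t name_len, size_t padding)
{
    size_t len = name_len < DEMO_CIPHER_BLOCK ? DEMO_CIPHER_BLOCK : name_len;

    assert(padding == 4 || padding == 8 || padding == 16 || padding == 32);
    /* Round up to the next multiple of padding (a power of two). */
    return (len + padding - 1) & ~(padding - 1);
}
```

For example, with 32-byte padding every name of 1 to 32 bytes yields a
32-byte ciphertext, so an observer cannot tell those lengths apart.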

Symbolic link targets are considered a type of filename and are
encrypted in the same way as filenames in directory entries.  Each
symlink also uses a unique key; hence, the hardcoded IV is not a
problem for symlinks.

User API
========

Setting an encryption policy
----------------------------

The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an
empty directory or verifies that a directory or regular file already
has the specified encryption policy.  It takes in a pointer to a
:c:type:`struct fscrypt_policy`, defined as follows::

    #define FS_KEY_DESCRIPTOR_SIZE  8

    struct fscrypt_policy {
            __u8 version;
            __u8 contents_encryption_mode;
            __u8 filenames_encryption_mode;
            __u8 flags;
            __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
    };

This structure must be initialized as follows:

- ``version`` must be 0.

- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must
  be set to constants from ``<linux/fs.h>`` which identify the
  encryption modes to use.  If unsure, use
  FS_ENCRYPTION_MODE_AES_256_XTS (1) for ``contents_encryption_mode``
  and FS_ENCRYPTION_MODE_AES_256_CTS (4) for
  ``filenames_encryption_mode``.

- ``flags`` must be set to a value from ``<linux/fs.h>`` which
  identifies the amount of NUL-padding to use when encrypting
  filenames.  If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3).

- ``master_key_descriptor`` specifies how to find the master key in
  the keyring; see `Adding keys`_.  It is up to userspace to choose a
  unique ``master_key_descriptor`` for each master key.  The e4crypt
  and fscrypt tools use the first 8 bytes of
  ``SHA-512(SHA-512(master_key))``, but this particular scheme is not
  required.  Also, the master key need not be in the keyring yet when
  FS_IOC_SET_ENCRYPTION_POLICY is executed.  However, it must be added
  before any files can be created in the encrypted directory.

If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY
verifies that the file is an empty directory.  If so, the specified
encryption policy is assigned to the directory, turning it into an
encrypted directory.  After that, and after providing the
corresponding master key as described in `Adding keys`_, all regular
files, directories (recursively), and symlinks created in the
directory will be encrypted, inheriting the same encryption policy.
The filenames in the directory's entries will be encrypted as well.

Alternatively, if the file is already encrypted, then
FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption
policy exactly matches the actual one.  If they match, then the ioctl
returns 0.  Otherwise, it fails with EEXIST.  This works on both
regular files and directories, including nonempty directories.

Note that the ext4 filesystem does not allow the root directory to be
encrypted, even if it is empty.  Users who want to encrypt an entire
filesystem with one key should consider using dm-crypt instead.

FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:

- ``EACCES``: the file is not owned by the process's uid, nor does the
  process have the CAP_FOWNER capability in a namespace with the file
  owner's uid mapped
- ``EEXIST``: the file is already encrypted with an encryption policy
  different from the one specified
- ``EINVAL``: an invalid encryption policy was specified (invalid
  version, mode(s), or flags)
- ``ENOTDIR``: the file is unencrypted and is a regular file, not a
  directory
- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory
- ``ENOTTY``: this type of filesystem does not implement encryption
- ``EOPNOTSUPP``: the kernel was not configured with encryption
  support for this filesystem, or the filesystem superblock has not
  had encryption enabled on it.  (For example, to use encryption on an
  ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the
  kernel config, and the superblock must have had the "encrypt"
  feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O
  encrypt``.)
- ``EPERM``: this directory may not be encrypted, e.g. because it is
  the root directory of an ext4 filesystem
- ``EROFS``: the filesystem is readonly
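
Putting the initialization rules together, a call from userspace might
look like the sketch below.  It assumes the installed UAPI headers
expose the struct and constants through ``<linux/fs.h>`` as described
above; ``demo_set_policy()`` is a hypothetical wrapper, and the caller
is expected to have chosen the descriptor already.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Set (or verify) the "if unsure" policy on the directory open at dirfd.
 * Returns 0 on success, or -1 with errno set to one of the errors
 * listed above.
 */
int demo_set_policy(int dirfd,
                    const unsigned char desc[FS_KEY_DESCRIPTOR_SIZE])
{
    struct fscrypt_policy policy;

    memset(&policy, 0, sizeof(policy));
    policy.version = 0;
    policy.contents_encryption_mode = FS_ENCRYPTION_MODE_AES_256_XTS;
    policy.filenames_encryption_mode = FS_ENCRYPTION_MODE_AES_256_CTS;
    policy.flags = FS_POLICY_FLAGS_PAD_32;
    memcpy(policy.master_key_descriptor, desc, FS_KEY_DESCRIPTOR_SIZE);

    return ioctl(dirfd, FS_IOC_SET_ENCRYPTION_POLICY, &policy);
}
```

A caller would typically open the empty directory with ``O_RDONLY``,
call the wrapper, and treat ``EEXIST`` as "already encrypted with a
different policy".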

Getting an encryption policy
----------------------------

The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the :c:type:`struct
fscrypt_policy`, if any, for a directory or regular file.  See above
for the struct definition.  No additional permissions are required
beyond the ability to open the file.

FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors:

- ``EINVAL``: the file is encrypted, but it uses an unrecognized
  encryption context format
- ``ENODATA``: the file is not encrypted
- ``ENOTTY``: this type of filesystem does not implement encryption
- ``EOPNOTSUPP``: the kernel was not configured with encryption
  support for this filesystem

Note: if you only need to know whether a file is encrypted or not, on
most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl
and check for FS_ENCRYPT_FL, or to use the statx() system call and
check for STATX_ATTR_ENCRYPTED in stx_attributes.
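
Both approaches can be sketched as thin wrappers.  The ``demo_*``
function names and their return conventions are hypothetical; only the
ioctl names, the flag, and the error codes come from the text, and the
headers are assumed to provide them through ``<linux/fs.h>``.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Returns 1 and fills *policy if the file is encrypted, 0 if it is not
 * (ENODATA), or -1 with errno set for any other failure.
 */
int demo_get_policy(int fd, struct fscrypt_policy *policy)
{
    assert(policy != NULL);
    if (ioctl(fd, FS_IOC_GET_ENCRYPTION_POLICY, policy) == 0)
        return 1;
    return errno == ENODATA ? 0 : -1;
}

/* Returns 1 if FS_ENCRYPT_FL is set, 0 if not, -1 with errno on failure. */
int demo_is_encrypted(int fd)
{
    int flags = 0;

    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
        return -1;
    return (flags & FS_ENCRYPT_FL) ? 1 : 0;
}
```

The second helper is cheaper when the policy contents are not needed,
but it only works on filesystems that implement FS_IOC_GETFLAGS.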

Getting the per-filesystem salt
-------------------------------

Some filesystems, such as ext4 and F2FS, also support the deprecated
ioctl FS_IOC_GET_ENCRYPTION_PWSALT.  This ioctl retrieves a randomly
generated 16-byte value stored in the filesystem superblock.  This
value is intended to be used as a salt when deriving an encryption key
from a passphrase or other low-entropy user credential.

FS_IOC_GET_ENCRYPTION_PWSALT is deprecated.  Instead, prefer to
generate and manage any needed salt(s) in userspace.

Adding keys
-----------

To provide a master key, userspace must add it to an appropriate
keyring using the add_key() system call (see:
``Documentation/security/keys/core.rst``).  The key type must be
"logon"; keys of this type are kept in kernel memory and cannot be
read back by userspace.  The key description must be "fscrypt:"
followed by the 16-character lower case hex representation of the
``master_key_descriptor`` that was set in the encryption policy.  The
key payload must conform to the following structure::

    #define FS_MAX_KEY_SIZE 64

    struct fscrypt_key {
            u32 mode;
            u8 raw[FS_MAX_KEY_SIZE];
            u32 size;
    };

``mode`` is ignored; just set it to 0.  The actual key is provided in
``raw`` with ``size`` indicating its size in bytes.  That is, the
bytes ``raw[0..size-1]`` (inclusive) are the actual key.
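
The following sketch shows one way to build the description and
payload and pass them to add_key().  The ``demo_*`` names and the
local mirror of the payload struct are hypothetical, and the choice of
the session keyring is only an example; which keyring is appropriate
depends on which processes need the key.  The raw ``syscall()`` form
is used so that no extra library is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/keyctl.h>

#define DEMO_KEY_DESCRIPTOR_SIZE 8
#define DEMO_MAX_KEY_SIZE 64

/* Userspace mirror of the payload layout shown above. */
struct demo_fscrypt_key {
    uint32_t mode;
    uint8_t raw[DEMO_MAX_KEY_SIZE];
    uint32_t size;
};

/* Write "fscrypt:" plus 16 lowercase hex digits into out (25 bytes). */
void demo_key_description(const uint8_t desc[DEMO_KEY_DESCRIPTOR_SIZE],
                          char out[25])
{
    int n = snprintf(out, 25, "fscrypt:");

    for (int i = 0; i < DEMO_KEY_DESCRIPTOR_SIZE; i++)
        n += snprintf(out + n, (size_t)(25 - n), "%02x", desc[i]);
    assert(n == 24);
}

/* Add the key to the session keyring; returns the key ID, or -1. */
long demo_add_master_key(const uint8_t desc[DEMO_KEY_DESCRIPTOR_SIZE],
                         const uint8_t *raw, uint32_t size)
{
    struct demo_fscrypt_key key;
    char description[25];
    long id;

    if (size > DEMO_MAX_KEY_SIZE)
        return -1;
    memset(&key, 0, sizeof(key)); /* also sets mode to 0 */
    memcpy(key.raw, raw, size);
    key.size = size;
    demo_key_description(desc, description);

    id = syscall(SYS_add_key, "logon", description, &key, sizeof(key),
                 KEY_SPEC_SESSION_KEYRING);
    memset(&key, 0, sizeof(key)); /* best-effort wipe of the local copy */
    return id;
}
```

For the descriptor bytes ``01 23 45 67 89 ab cd ef`` the description
is ``fscrypt:0123456789abcdef``.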

The key description prefix "fscrypt:" may alternatively be replaced
with a filesystem-specific prefix such as "ext4:".  However, the
filesystem-specific prefixes are deprecated and should not be used in
new programs.

There are several different types of keyrings in which encryption keys
may be placed, such as a session keyring, a user session keyring, or a
user keyring.  Each key must be placed in a keyring that is "attached"
to all processes that might need to access files encrypted with it, in
the sense that request_key() will find the key.  Generally, if only
processes belonging to a specific user need to access a given
encrypted directory and no session keyring has been installed, then
that directory's key should be placed in that user's user session
keyring or user keyring.  Otherwise, a session keyring should be
installed if needed, and the key should be linked into that session
keyring, or in a keyring linked into that session keyring.

Note: introducing the complex visibility semantics of keyrings here
was arguably a mistake --- especially given that by design, after any
process successfully opens an encrypted file (thereby setting up the
per-file key), possessing the keyring key is not actually required for
any process to read/write the file until its in-memory inode is
evicted.  In the future there probably should be a way to provide keys
directly to the filesystem instead, which would make the intended
semantics clearer.

Access semantics
================

With the key
------------

With the encryption key, encrypted regular files, directories, and
symlinks behave very similarly to their unencrypted counterparts ---
after all, the encryption is intended to be transparent.  However,
astute users may notice some differences in behavior:

- Unencrypted files, or files encrypted with a different encryption
  policy (i.e. different key, modes, or flags), cannot be renamed or
  linked into an encrypted directory; see `Encryption policy
  enforcement`_.  Attempts to do so will fail with EPERM.  However,
  encrypted files can be renamed within an encrypted directory, or
  into an unencrypted directory.

- Direct I/O is not supported on encrypted files.  Attempts to use
  direct I/O on such files will fall back to buffered I/O.

- The fallocate operations FALLOC_FL_COLLAPSE_RANGE,
  FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported
  on encrypted files and will fail with EOPNOTSUPP.

- Online defragmentation of encrypted files is not supported.  The
  EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with
  EOPNOTSUPP.

- The ext4 filesystem does not support data journaling with encrypted
  regular files.  It will fall back to ordered data mode instead.

- DAX (Direct Access) is not supported on encrypted files.

- The st_size of an encrypted symlink will not necessarily give the
  length of the symlink target as required by POSIX.  It will actually
  give the length of the ciphertext, which may be slightly longer than
  the plaintext due to the NUL-padding.

Note that mmap *is* supported.  This is possible because the pagecache
for an encrypted file contains the plaintext, not the ciphertext.

Without the key
---------------

Some filesystem operations may be performed on encrypted regular
files, directories, and symlinks even before their encryption key has
been provided:

- File metadata may be read, e.g. using stat().

- Directories may be listed, in which case the filenames will be
  listed in an encoded form derived from their ciphertext.  The
  current encoding algorithm is described in `Filename hashing and
  encoding`_.  The algorithm is subject to change, but it is
  guaranteed that the presented filenames will be no longer than
  NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and
  will uniquely identify directory entries.

  The ``.`` and ``..`` directory entries are special.  They are always
  present and are not encrypted or encoded.

- Files may be deleted.  That is, nondirectory files may be deleted
  with unlink() as usual, and empty directories may be deleted with
  rmdir() as usual.  Therefore, ``rm`` and ``rm -r`` will work as
  expected.

- Symlink targets may be read and followed, but they will be presented
  in encrypted form, similar to filenames in directories.  Hence, they
  are unlikely to point to anywhere useful.

Without the key, regular files cannot be opened or truncated.
Attempts to do so will fail with ENOKEY.  This implies that any
regular file operations that require a file descriptor, such as
read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden.

Also without the key, files of any type (including directories) cannot
be created or linked into an encrypted directory, nor can a name in an
encrypted directory be the source or target of a rename, nor can an
O_TMPFILE temporary file be created in an encrypted directory.  All
such operations will fail with ENOKEY.

It is not currently possible to back up and restore encrypted files
without the encryption key.  This would require special APIs which
have not yet been implemented.

Encryption policy enforcement
=============================

After an encryption policy has been set on a directory, all regular
files, directories, and symbolic links created in that directory
(recursively) will inherit that encryption policy.  Special files ---
that is, named pipes, device nodes, and UNIX domain sockets --- will
not be encrypted.

Except for those special files, it is forbidden to have unencrypted
files, or files encrypted with a different encryption policy, in an
encrypted directory tree.  Attempts to link or rename such a file into
an encrypted directory will fail with EPERM.  This is also enforced
during ->lookup() to provide limited protection against offline
attacks that try to disable or downgrade encryption in known locations
where applications may later write sensitive data.  It is recommended
that systems implementing a form of "verified boot" take advantage of
this by validating all top-level encryption policies prior to access.
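
The link and rename rule can be modeled in a few lines.  This is a
userspace model for illustration only, not the kernel's
implementation: ``struct demo_policy`` and ``demo_check_link()`` are
hypothetical, a NULL pointer stands for "unencrypted", and the
exemption for special files is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_KEY_DESCRIPTOR_SIZE 8

/* The policy fields that must match: modes, flags, and key descriptor. */
struct demo_policy {
    uint8_t contents_encryption_mode;
    uint8_t filenames_encryption_mode;
    uint8_t flags;
    uint8_t master_key_descriptor[DEMO_KEY_DESCRIPTOR_SIZE];
};

/* Returns 0 if file may be linked or renamed into dir, else -EPERM. */
int demo_check_link(const struct demo_policy *dir,
                    const struct demo_policy *file)
{
    if (dir == NULL)
        return 0;      /* unencrypted directory: no constraint */
    if (file == NULL)
        return -EPERM; /* unencrypted file into encrypted directory */
    if (dir->contents_encryption_mode != file->contents_encryption_mode ||
        dir->filenames_encryption_mode != file->filenames_encryption_mode ||
        dir->flags != file->flags ||
        memcmp(dir->master_key_descriptor, file->master_key_descriptor,
               DEMO_KEY_DESCRIPTOR_SIZE) != 0)
        return -EPERM; /* different key, modes, or flags */
    return 0;
}
```

Note that the comparison covers every policy field, so a file that
uses the same key but a different padding flag is still rejected.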

Implementation details
======================

Encryption context
------------------

An encryption policy is represented on-disk by a :c:type:`struct
fscrypt_context`.  It is up to individual filesystems to decide where
to store it, but normally it would be stored in a hidden extended
attribute.  It should *not* be exposed by the xattr-related system
calls such as getxattr() and setxattr() because of the special
semantics of the encryption xattr.  (In particular, there would be
much confusion if an encryption policy were to be added to or removed
from anything other than an empty directory.)  The struct is defined
as follows::

    #define FS_KEY_DESCRIPTOR_SIZE  8
    #define FS_KEY_DERIVATION_NONCE_SIZE 16

    struct fscrypt_context {
            u8 format;
            u8 contents_encryption_mode;
            u8 filenames_encryption_mode;
            u8 flags;
            u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
            u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE];
    };

Note that :c:type:`struct fscrypt_context` contains the same
information as :c:type:`struct fscrypt_policy` (see `Setting an
encryption policy`_), except that :c:type:`struct fscrypt_context`
also contains a nonce.  The nonce is randomly generated by the kernel
and is used to derive the inode's encryption key as described in
`Per-file keys`_.
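
To make the size argument from `Per-file keys`_ concrete, the layout
can be mirrored in userspace and its size checked at compile time.
``struct demo_context`` is a hypothetical copy of the struct above
using fixed-width types.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEY_DESCRIPTOR_SIZE 8
#define DEMO_NONCE_SIZE 16

struct demo_context {
    uint8_t format;
    uint8_t contents_encryption_mode;
    uint8_t filenames_encryption_mode;
    uint8_t flags;
    uint8_t master_key_descriptor[DEMO_KEY_DESCRIPTOR_SIZE];
    uint8_t nonce[DEMO_NONCE_SIZE];
};

/* All members are single bytes, so there is no padding: 4 + 8 + 16. */
static_assert(sizeof(struct demo_context) == 28, "unexpected context size");
```

The context therefore occupies 28 bytes.  A hypothetical variant that
stored a 64-byte wrapped key in place of the nonce would need 76.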

Data path changes
-----------------

For the read path (->readpage()) of regular files, filesystems can
read the ciphertext into the page cache and decrypt it in-place.  The
page lock must be held until decryption has finished, to prevent the
page from becoming visible to userspace prematurely.

For the write path (->writepage()) of regular files, filesystems
cannot encrypt data in-place in the page cache, since the cached
plaintext must be preserved.  Instead, filesystems must encrypt into a
temporary buffer or "bounce page", then write out the temporary
buffer.  Some filesystems, such as UBIFS, already use temporary
buffers regardless of encryption.  Other filesystems, such as ext4 and
F2FS, have to allocate bounce pages specially for encryption.

Filename hashing and encoding
-----------------------------

Modern filesystems accelerate directory lookups by using indexed
directories.  An indexed directory is organized as a tree keyed by
filename hashes.  When a ->lookup() is requested, the filesystem
normally hashes the filename being looked up so that it can quickly
find the corresponding directory entry, if any.

With encryption, lookups must be supported and efficient both with and
without the encryption key.  Clearly, it would not work to hash the
plaintext filenames, since the plaintext filenames are unavailable
without the key.  (Hashing the plaintext filenames would also make it
impossible for the filesystem's fsck tool to optimize encrypted
directories.)  Instead, filesystems hash the ciphertext filenames,
i.e. the bytes actually stored on-disk in the directory entries.  When
asked to do a ->lookup() with the key, the filesystem just encrypts
the user-supplied name to get the ciphertext.

Lookups without the key are more complicated.  The raw ciphertext may
contain the ``\0`` and ``/`` characters, which are illegal in
filenames.  Therefore, readdir() must base64-encode the ciphertext for
presentation.  For most filenames, this works fine; on ->lookup(), the
filesystem just base64-decodes the user-supplied name to get back to
the raw ciphertext.

However, for very long filenames, base64 encoding would cause the
filename length to exceed NAME_MAX.  To prevent this, readdir()
actually presents long filenames in an abbreviated form which encodes
a strong "hash" of the ciphertext filename, along with the optional
filesystem-specific hash(es) needed for directory lookups.  This
allows the filesystem to still, with a high degree of confidence, map
the filename given in ->lookup() back to a particular directory entry
that was previously listed by readdir().  See :c:type:`struct
fscrypt_digested_name` in the source for more details.
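
The length problem is easy to see with a small encoder.  The sketch
below is an unpadded, filename-safe base64 variant written for
illustration; its alphabet (``,`` in place of ``/``) and bit ordering
are assumptions of this example and are not claimed to match the
kernel's exact encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode len bytes from src into dst as unpadded base64 using a
 * 64-character alphabet without '/'.  dst must hold at least
 * (len * 8 + 5) / 6 + 1 bytes.  Returns the encoded length.
 */
size_t demo_encode(const uint8_t *src, size_t len, char *dst)
{
    static const char table[65] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
    uint32_t ac = 0; /* bit accumulator */
    int bits = 0;    /* number of pending bits in ac */
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        ac = (ac << 8) | src[i];
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            dst[n++] = table[(ac >> bits) & 0x3f];
        }
    }
    if (bits > 0) /* flush the last partial group, zero-filled */
        dst[n++] = table[(ac << (6 - bits)) & 0x3f];
    dst[n] = '\0';
    return n;
}
```

A 16-byte ciphertext encodes to 22 characters, but a 255-byte
ciphertext would need 340, which exceeds NAME_MAX; this is why long
names are presented in the abbreviated form instead.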

Note that the precise way that filenames are presented to userspace
without the key is subject to change in the future.  It is only meant
as a way to temporarily present valid filenames so that commands like
``rm -r`` work as expected on encrypted directories.