Documentation/ioctl/botching-up-ioctls.txt

   1 (How to avoid) Botching up ioctls
   2 =================================
   3
   4 From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
   5
   6 By: Daniel Vetter, Copyright © 2013 Intel Corporation
   7
   8 One clear insight kernel graphics hackers gained in the past few years is that
   9 trying to come up with a unified interface to manage the execution units and
  10 memory on completely different GPUs is a futile effort. So nowadays every
  11 driver has its own set of ioctls to allocate memory and submit work to the GPU.
  12 Which is nice, since there's no more insanity in the form of fake-generic
  13 interfaces that are actually only used once. But the clear downside is that
  14 there's much more potential to screw things up.
  15
  16 To avoid repeating all the same mistakes again I've written up some of the
  17 lessons learned while botching the job for the drm/i915 driver. Most of these
  18 only cover technicalities and not the big-picture issues like what the command
  19 submission ioctl exactly should look like. Learning these lessons is probably
  20 something every GPU driver has to do on its own.
  21
  22
  23 Prerequisites
  24 -------------
  25
  26 First the prerequisites. Without these you have already failed, because you
  27 will need to add a 32-bit compat layer:
  28
  29  * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
  30    the kernel has special types like __u32, __s64. Use them.
  31
  32  * Align everything to the natural size and use explicit padding. 32-bit
  33    platforms don't necessarily align 64-bit values to 64-bit boundaries, but
  34    64-bit platforms do. So we always need padding to the natural size to get
  35    this right.
  36
  37  * Pad the entire struct to a multiple of 64-bits if the structure contains
  38    64-bit types - the structure size will otherwise differ on 32-bit versus
  39    64-bit. Having a different structure size hurts when passing arrays of
  40    structures to the kernel, or if the kernel checks the structure size, which
  41    e.g. the drm core does.
  42
  43  * Pointers are __u64, cast from/to a uintptr_t on the userspace side and
  44    from/to a void __user * in the kernel. Try really hard not to delay this
  45    conversion or worse, fiddle the raw __u64 through your code since that
  46    diminishes the checking tools like sparse can provide. The macro
  47    u64_to_user_ptr can be used in the kernel to avoid warnings about integers
  48    and pointers of different sizes.
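
The prerequisites above can be sketched in a few lines of C. This is only an
illustration under stated assumptions: struct example_submit and the two
conversion helpers are made-up names, and the <stdint.h> types stand in for the
__u32/__u64 types a real kernel uAPI header would use. The static assertions
fail the build if the layout ever stops being identical on 32-bit and 64-bit:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-size integers, uintptr_t */

/*
 * Hypothetical ioctl argument struct. A kernel uAPI header would spell the
 * field types __u32/__u64; the <stdint.h> equivalents are used here so the
 * sketch builds in plain userspace.
 */
struct example_submit {
	uint64_t data_ptr;   /* userspace pointer carried as a 64-bit integer */
	uint32_t data_len;   /* size in bytes of the buffer at data_ptr */
	uint32_t flags;      /* no flags defined yet, must be 0 */
	uint64_t timeline;   /* 64-bit member, naturally aligned at offset 16 */
	uint32_t handle;
	uint32_t pad;        /* explicit padding up to a multiple of 64 bits */
};

static_assert(offsetof(struct example_submit, timeline) == 16,
	      "64-bit members sit on 64-bit boundaries");
static_assert(sizeof(struct example_submit) == 32,
	      "size is the same on 32-bit and 64-bit builds");

/* Userspace side: always convert pointers through uintptr_t. */
static inline uint64_t example_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

static inline void *example_u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t)value;
}
```

On the kernel side the matching conversion would be u64_to_user_ptr(), done
once at the ioctl entry point rather than deep inside the driver.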
  49
  50
  51 Basics
  52 ------
  53
  54 With the joys of writing a compat layer avoided we can take a look at the basic
  55 fumbles. Neglecting these will make backward and forward compatibility a real
  56 pain. And since getting things wrong on the first attempt is guaranteed, you
  57 will have a second iteration or at least an extension for any given interface.
  58
  59  * Have a clear way for userspace to figure out whether your new ioctl or ioctl
  60    extension is supported on a given kernel. If you can't rely on old kernels
  61    rejecting the new flags/modes or ioctls (since doing that was botched in the
  62    past) then you need a driver feature flag or revision number somewhere.
  63
  64  * Have a plan for extending ioctls with new flags or new fields at the end of
  65    the structure. The drm core checks the passed-in size for each ioctl call
  66    and zero-extends any mismatches between kernel and userspace. That helps,
  67    but isn't a complete solution since newer userspace on older kernels won't
  68    notice that the newly added fields at the end get ignored. So this still
  69    needs a new driver feature flag.
  70
  71  * Check all unused fields and flags and all the padding for whether it's 0,
  72    and reject the ioctl if that's not the case. Otherwise your nice plan for
  73    future extensions is going right down the gutters since someone will submit
  74    an ioctl struct with random stack garbage in the yet unused parts. Which
  75    then bakes in the ABI that those fields can never be used for anything else
  76    but garbage. This is also the reason why you must explicitly pad all
  77    structures, even if you never use them in an array - the padding the compiler
  78    might insert could contain garbage.
  79
  80  * Have simple testcases for all of the above.
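
A minimal sketch of the "reject anything that isn't 0" rule, assuming a
hypothetical struct example_args with two defined flags. All names here are
invented for illustration, and the <stdint.h> types again stand in for
__u32/__u64:

```c
#include <assert.h>   /* static_assert (C11) */
#include <errno.h>
#include <stdint.h>

/* Hypothetical flags; every other bit in 'flags' is reserved. */
#define EXAMPLE_FLAG_SYNC    (1u << 0)
#define EXAMPLE_FLAG_URGENT  (1u << 1)
#define EXAMPLE_KNOWN_FLAGS  (EXAMPLE_FLAG_SYNC | EXAMPLE_FLAG_URGENT)

struct example_args {
	uint32_t handle;
	uint32_t flags;
	uint64_t reserved;   /* unused for now, must be 0 */
	uint32_t mode;
	uint32_t pad;        /* explicit padding, must be 0 */
};

static_assert(sizeof(struct example_args) == 24, "no hidden compiler padding");

/* Returns 0 if the arguments are acceptable, -EINVAL otherwise. */
static int example_validate_args(const struct example_args *args)
{
	/* Unknown flag bits must be rejected, not silently ignored. */
	if (args->flags & ~EXAMPLE_KNOWN_FLAGS)
		return -EINVAL;

	/* Reserved fields and padding must be 0 so they can be used later. */
	if (args->reserved != 0 || args->pad != 0)
		return -EINVAL;

	return 0;
}
```

Because old kernels built this way reject unknown bits, userspace can probe
for a new flag simply by trying it and checking for the error.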
  81
  82
  83 Fun with Error Paths
  84 --------------------
  85
  86 Nowadays we have no excuse left for drm drivers being neat little root
  87 exploits. This means we need both full input validation and solid error
  88 handling paths - GPUs will die eventually in the oddest corner cases
  89 anyway:
  90
  91  * The ioctl must check for array overflows. Also it needs to check for
  92    over/underflows and clamping issues of integer values in general. The usual
  93    example is sprite positioning values fed directly into the hardware with the
  94    hardware just having 12 bits or so. Works nicely until some odd display
  95    server doesn't bother with clamping itself and the cursor wraps around the
  96    screen.
  97
  98  * Have simple testcases for every input validation failure case in your ioctl.
  99    Check that the error code matches your expectations. And finally make sure
 100    that you only test for one single error path in each subtest by submitting
 101    otherwise perfectly valid data. Without this an earlier check might reject
 102    the ioctl already and shadow the codepath you actually want to test, hiding
 103    bugs and regressions.
 104
 105  * Make all your ioctls restartable. First, X really loves signals, and
 106    second, this will allow you to test 90% of all error handling paths by just
 107    interrupting your main test suite constantly with signals. Thanks to X's
 108    love for signals you'll get an excellent base coverage of all your error
 109    paths pretty much for free for graphics drivers. Also, be consistent with
 110    how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
 111    userspace library. The i915 driver botched this with the set_tiling ioctl,
 112    now we're stuck forever with some arcane semantics in both the kernel and
 113    userspace.
 114
 115  * If you can't make a given codepath restartable, make a stuck task at least
 116    killable. GPUs just die and your users won't like you more if you hang their
 117    entire box (by means of an unkillable X process). If the state recovery is
 118    still too tricky have a timeout or hangcheck safety net as a last-ditch
 119    effort in case the hardware has gone bananas.
 120
 121  * Have testcases for the really tricky corner cases in your error recovery code
 122    - it's way too easy to create a deadlock between your hangcheck code and
 123    waiters.
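
The array and range checks from the first item above could look roughly like
this. It is a sketch, not a recipe: the limit EXAMPLE_MAX_ENTRIES, the 12-bit
position width and both function names are assumptions made up for the
example, and rejecting out-of-range positions (instead of clamping them) is
just one possible policy:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_ENTRIES  4096u   /* hypothetical sanity limit */
#define EXAMPLE_POS_BITS     12      /* hypothetical hardware field width */

/*
 * Compute the byte size of a user-supplied array of 'count' elements.
 * Returns 0 and stores the size in *bytes, or -EINVAL if the request is
 * empty, too large, or the multiplication would overflow size_t.
 */
static int example_array_bytes(uint32_t count, size_t elem_size, size_t *bytes)
{
	if (count == 0 || count > EXAMPLE_MAX_ENTRIES || elem_size == 0)
		return -EINVAL;

	/* Check before multiplying; the product itself could wrap around. */
	if (count > SIZE_MAX / elem_size)
		return -EINVAL;

	*bytes = (size_t)count * elem_size;
	return 0;
}

/* Reject positions that do not fit the hardware register field. */
static int example_check_position(int32_t pos)
{
	if (pos < 0 || pos >= (1 << EXAMPLE_POS_BITS))
		return -ERANGE;
	return 0;
}
```

Each failure branch here is a candidate for its own subtest, fed with data
that is valid in every other respect.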
 124
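
A userspace restart wrapper in the spirit of the drmIoctl helper mentioned
above might look like the following. The name example_ioctl is hypothetical,
and the retry condition (EINTR or EAGAIN) reflects what libdrm's helper is
understood to do, so treat this as a sketch rather than a copy of the real
thing:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/*
 * Call ioctl() and transparently restart it when a signal interrupted the
 * call. Any other error is returned to the caller with errno left intact.
 */
static int example_ioctl(int fd, unsigned long request, void *arg)
{
	int ret;

	do {
		ret = ioctl(fd, request, arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}
```

Routing every ioctl through one such helper keeps the restart behaviour
consistent, which is the point the set_tiling story makes.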
 125
 126 Time, Waiting and Missing it
 127 ----------------------------
 128
 129 GPUs do most everything asynchronously, so we have a need to time operations and
 130 wait for outstanding ones. This is really tricky business; at the moment none of
 131 the ioctls supported by the drm/i915 driver get this fully right, which means
 132 there are still tons more lessons to learn here.
 133
 134  * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
 135    v4l use by default nowadays. But let userspace know which timestamps are
 136    derived from different clock domains like your main system clock (provided
 137    by the kernel) or some independent hardware counter somewhere else. Clocks
 138    will mismatch if you look close enough, but if performance measuring tools
 139    have this information they can at least compensate. If your userspace can
 140    get at the raw values of some clocks (e.g. through in-command-stream
 141    performance counter sampling instructions) consider exposing those also.
 142
 143  * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
 144    convenient time specification, but it's mostly the standard.
 145
 146  * Check that input time values are normalized and reject them if not. Note
 147    that the kernel native struct ktime has a signed integer for both seconds
 148    and nanoseconds, so beware here.
 149
 150  * For timeouts, use absolute times. If you're a good fellow and made your
 151    ioctl restartable, relative timeouts tend to be too coarse and can
 152    indefinitely extend your wait time due to rounding on each restart.
 153    Especially if your reference clock is something really slow like the display
 154    frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
 155    always be extended - but users will surely hate you if their neat animations
 156    start to stutter due to this.
 157
 158  * Consider ditching any synchronous wait ioctls with timeouts and just deliver
 159    an asynchronous event on a pollable file descriptor. It fits much better
 160    into event driven applications' main loop.
 161
 162  * Have testcases for corner-cases, especially whether the return values for
 163    already-completed events, successful waits and timed-out waits are all sane
 164    and suited to your needs.
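
To make the time handling above concrete, here is a small sketch assuming a
hypothetical struct example_time with the suggested seconds/nanoseconds
split. The helper names are invented, and 'now' is expected to be a
CLOCK_MONOTONIC reading supplied by the caller. Since the deadline is
absolute, a restarted wait simply recomputes the remainder and never
accumulates rounding errors:

```c
#include <assert.h>   /* static_assert (C11) */
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ull

/* Hypothetical uAPI time value: __s64 seconds plus __u64 nanoseconds. */
struct example_time {
	int64_t  sec;
	uint64_t nsec;
};

static_assert(sizeof(struct example_time) == 16, "no implicit padding");

/* Reject values that are not normalized (or lie before the clock's epoch). */
static int example_time_validate(const struct example_time *t)
{
	if (t->sec < 0 || t->nsec >= EXAMPLE_NSEC_PER_SEC)
		return -EINVAL;
	return 0;
}

/*
 * Nanoseconds left until an absolute deadline, 0 if it has already passed.
 * Both arguments must have passed example_time_validate(). The result
 * saturates at UINT64_MAX for deadlines absurdly far in the future.
 */
static uint64_t example_time_remaining_ns(const struct example_time *deadline,
					  const struct example_time *now)
{
	uint64_t secs;

	if (deadline->sec < now->sec ||
	    (deadline->sec == now->sec && deadline->nsec <= now->nsec))
		return 0;

	secs = (uint64_t)(deadline->sec - now->sec);
	if (secs > UINT64_MAX / EXAMPLE_NSEC_PER_SEC - 1)
		return UINT64_MAX;

	return secs * EXAMPLE_NSEC_PER_SEC + deadline->nsec - now->nsec;
}
```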
 165
 166
 167 Leaking Resources, Not
 168 ----------------------
 169
 170 A full-blown drm driver essentially implements a little OS, but specialized to
 171 the given GPU platforms. This means a driver needs to expose tons of handles
 172 for different objects and other resources to userspace. Doing that right
 173 entails its own little set of pitfalls:
 174
 175  * Always attach the lifetime of your dynamically created resources to the
 176    lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
 177    needs to be shared across processes - fd-passing over unix domain sockets
 178    also simplifies lifetime management for userspace.
 179
 180  * Always have O_CLOEXEC support.
 181
 182  * Ensure that you have sufficient insulation between different clients. By
 183    default pick a private per-fd namespace which forces any sharing to be done
 184    explicitly. Only go with a more global per-device namespace if the objects
 185    are truly device-unique. One counterexample in the drm modeset interfaces is
 186    that the per-device modeset objects like connectors share a namespace with
 187    framebuffer objects, which mostly are not shared at all. A separate
 188    namespace, private by default, for framebuffers would have been more
 189    suitable.
 190
 191  * Think about uniqueness requirements for userspace handles. E.g. for most drm
 192    drivers it's a userspace bug to submit the same object twice in the same
 193    command submission ioctl. But then if objects are shareable userspace needs
 194    to know whether it has seen an imported object from a different process
 195    already or not. I haven't tried this myself yet due to lack of a new class
 196    of objects, but consider using inode numbers on your shared file descriptors
 197    as unique identifiers - it's how real files are told apart, too.
 198    Unfortunately this requires a full-blown virtual filesystem in the kernel.
 199
 200
 201 Last, but not Least
 202 -------------------
 203
 204 Not every problem needs a new ioctl:
 205
 206  * Think hard whether you really want a driver-private interface. Of course
 207    it's much quicker to push a driver-private interface than engaging in
 208    lengthy discussions for a more generic solution. And occasionally doing a
 209    private interface to spearhead a new concept is what's required. But in the
 210    end, once the generic interface comes around you'll end up maintaining two
 211    interfaces. Indefinitely.
 212
 213  * Consider other interfaces than ioctls. A sysfs attribute is much better for
 214    per-device settings, or for child objects with fairly static lifetimes (like
 215    output connectors in drm with all the detection override attributes). Or
 216    maybe only your testsuite needs this interface, and then debugfs with its
 217    disclaimer of not having a stable ABI would be better.
 218
 219 Finally, the name of the game is to get it right on the first attempt, since if
 220 your driver proves popular and your hardware platforms long-lived then you'll
 221 be stuck with a given ioctl essentially forever. You can try to deprecate
 222 horrible ioctls on newer iterations of your hardware, but generally it takes
 223 years to accomplish this. And then again years until the last user able to
 224 complain about regressions disappears, too.