From: David S. Miller Date: Thu, 5 Jan 2017 16:49:57 +0000 (-0500) Subject: Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf X-Git-Tag: v4.10-rc4~34^2~13 X-Git-Url: http://git.samba.org/samba.git/?p=sfrench%2Fcifs-2.6.git;a=commitdiff_plain;h=d896b3120b3391a2f95b2b8ec636e3f594d7f9c4;hp=14221cc45caad2fcab3a8543234bb7eda9b540d5 Merge git://git./pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains accumulated Netfilter fixes for your net tree: 1) Ensure quota dump and reset happens iff we can deliver numbers to userspace. 2) Silence splat on incorrect use of smp_processor_id() from nft_queue. 3) Fix an out-of-bound access reported by KASAN in nf_tables_rule_destroy(), patch from Florian Westphal. 4) Fix layer 4 checksum mangling in the nf_tables payload expression with IPv6. 5) Fix a race in the CLUSTERIP target from control plane path when two threads run to add a new configuration object. Serialize invocations of clusterip_config_init() using spin_lock. From Xin Long. 6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with the br_nf_pre_routing_finish() hook. From Artur Molchanov. ==================== Signed-off-by: David S. Miller --- diff --git a/CREDITS b/CREDITS index d7ebdfbc4d4f..c58560701d13 100644 --- a/CREDITS +++ b/CREDITS @@ -2775,6 +2775,10 @@ S: C/ Mieses 20, 9-B S: Valladolid 47009 S: Spain +N: Peter Oruba +D: AMD Microcode loader driver +S: Germany + N: Jens Osterkamp E: jens@de.ibm.com D: Maintainer of Spidernet network driver for Cell @@ -3945,8 +3949,6 @@ E: gwingerde@gmail.com D: Ralink rt2x00 WLAN driver D: Minix V2 file-system D: Misc fixes -S: Geessinkweg 177 -S: 7544 TX Enschede S: The Netherlands N: Lars Wirzenius diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 3acc4f1a6f84..c8a8eb1a2b11 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -14,13 +14,8 @@ Following translations are available on the WWW: - this file. ABI/ - info on kernel <-> userspace ABI and relative interface stability. - -BUG-HUNTING - - brute force method of doing binary search of patches to find bug. -Changes - - list of changes that break older software packages. CodingStyle - - how the maintainers expect the C code in the kernel to look. + - nothing here, just a pointer to process/coding-style.rst. DMA-API.txt - DMA API, pci_ API & extensions for non-consistent memory machines. DMA-API-HOWTO.txt @@ -33,8 +28,6 @@ DocBook/ - directory with DocBook templates etc. for kernel documentation. EDID/ - directory with info on customizing EDID for broken gfx/displays. -HOWTO - - the process and procedures of how to do Linux kernel development. IPMI.txt - info on Linux Intelligent Platform Management Interface (IPMI) Driver. IRQ-affinity.txt @@ -46,62 +39,43 @@ IRQ.txt Intel-IOMMU.txt - basic info on the Intel IOMMU virtualization support. Makefile - - This file does nothing. Removing it breaks make htmldocs and - make distclean. -ManagementStyle - - how to (attempt to) manage kernel hackers. + - It's not of interest for those who aren't touching the build system. +Makefile.sphinx + - It's not of interest for those who aren't touching the build system. +PCI/ + - info related to PCI drivers. RCU/ - directory with info on RCU (read-copy update). SAK.txt - info on Secure Attention Keys. SM501.txt - Silicon Motion SM501 multimedia companion chip -SecurityBugs - - procedure for reporting security bugs found in the kernel. -SubmitChecklist - - Linux kernel patch submission checklist. -SubmittingDrivers - - procedure to get a new driver source included into the kernel tree. SubmittingPatches - - procedure to get a source patch included into the kernel tree. -VGA-softcursor.txt - - how to change your VGA cursor from a blinking underscore. + - nothing here, just a pointer to process/coding-style.rst. accounting/ - documentation on accounting and taskstats. acpi/ - info on ACPI-specific hooks in the kernel. +admin-guide/ + - info related to Linux users and system admins. aoe/ - description of AoE (ATA over Ethernet) along with config examples. -applying-patches.txt - - description of various trees and how to apply their patches. arm/ - directory with info about Linux on the ARM architecture. arm64/ - directory with info about Linux on the 64 bit ARM architecture. -assoc_array.txt - - generic associative array intro. -atomic_ops.txt - - semantics and behavior of atomic and bitmask operations. auxdisplay/ - misc. LCD driver documentation (cfag12864b, ks0108). backlight/ - directory with info on controlling backlights in flat panel displays -bad_memory.txt - - how to use kernel parameters to exclude bad RAM regions. -basic_profiling.txt - - basic instructions for those who wants to profile Linux kernel. bcache.txt - Block-layer cache on fast SSDs to improve slow (raid) I/O performance. -binfmt_misc.txt - - info on the kernel support for extra binary formats. blackfin/ - directory with documentation for the Blackfin arch. block/ - info on the Block I/O (BIO) layer. blockdev/ - info on block devices & drivers -braille-console.txt - - info on how to use serial devices for Braille support. bt8xxgpio.txt - info on how to modify a bt8xx video card for GPIO usage. btmrvl.txt @@ -114,18 +88,24 @@ cachetlb.txt - describes the cache/TLB flushing interfaces Linux uses. cdrom/ - directory with information on the CD-ROM drivers that Linux has. -cgroups/ - - cgroups features, including cpusets and memory controller. +cgroup-v1/ + - cgroups v1 features, including cpusets and memory controller. +cgroup-v2.txt + - cgroups v2 features, including cpusets and memory controller. circular-buffers.txt - how to make use of the existing circular buffer infrastructure clk.txt - info on the common clock framework -coccinelle.txt - - info on how to get and use the Coccinelle code checking tool. +cma/ + - Continuous Memory Area (CMA) debugfs interface. +conf.py + - It's not of interest for those who aren't touching the build system. connector/ - docs on the netlink based userspace<->kernel space communication mod. console/ - documentation on Linux console drivers. +core-api/ + - documentation on kernel core components. cpu-freq/ - info on CPU frequency and voltage scaling. cpu-hotplug.txt @@ -150,42 +130,42 @@ debugging-via-ohci1394.txt - how to use firewire like a hardware debugger memory reader. dell_rbu.txt - document demonstrating the use of the Dell Remote BIOS Update driver. -development-process/ - - how to work with the mainline kernel development process. +dev-tools/ + - directory with info on development tools for the kernel. device-mapper/ - directory with info on Device Mapper. -devices.txt - - plain ASCII listing of all the nodes in /dev/ with major minor #'s. +dmaengine/ + - the DMA engine and controller API guides. devicetree/ - directory with info on device tree files used by OF/PowerPC/ARM digsig.txt -info on the Digital Signature Verification API dma-buf-sharing.txt - the DMA Buffer Sharing API Guide +docutils.conf + - nothing here. Just a configuration file for docutils. dontdiff - file containing a list of files that should never be diff'ed. +driver-api/ + - the Linux driver implementer's API guide. driver-model/ - directory with info about Linux driver model. -dvb/ - - info on Linux Digital Video Broadcast (DVB) subsystem. -dynamic-debug-howto.txt - - how to use the dynamic debug (dyndbg) feature. early-userspace/ - info about initramfs, klibc, and userspace early during boot. -edac.txt - - information on EDAC - Error Detection And Correction efi-stub.txt - How to use the EFI boot stub to bypass GRUB or elilo on EFI systems. eisa.txt - info on EISA bus support. -email-clients.txt - - info on how to use e-mail to send un-mangled (git) patches. extcon/ - directory with porting guide for Android kernel switch driver. +isa.txt + - info on EISA bus support. fault-injection/ - dir with docs about the fault injection capabilities infrastructure. fb/ - directory with info on the frame buffer graphics abstraction layer. +features/ + - status of feature implementation on different architectures. filesystems/ - info on the vfs and the various filesystems that Linux supports. firmware_class/ @@ -194,20 +174,22 @@ flexible-arrays.txt - how to make use of flexible sized arrays in linux fmc/ - information about the FMC bus abstraction +fpga/ + - FPGA Manager Core. frv/ - Fujitsu FR-V Linux documentation. futex-requeue-pi.txt - info on requeueing of tasks from a non-PI futex to a PI futex -gcov.txt - - use of GCC's coverage testing tool "gcov" with the Linux kernel +gcc-plugins.txt + - GCC plugin infrastructure. gpio/ - gpio related documentation +gpu/ + - directory with information on GPU driver developer's guide. hid/ - directory with information on human interface devices highuid.txt - notes on the change from 16 bit to 32 bit user/group IDs. -hsi.txt - - HSI subsystem overview. hwspinlock.txt - hardware spinlock provides hardware assistance for synchronization timers/ @@ -218,18 +200,18 @@ hwmon/ - directory with docs on various hardware monitoring drivers. i2c/ - directory with info about the I2C bus/protocol (2 wire, kHz speed). -i2o/ - - directory with info about the Linux I2O subsystem. x86/i386/ - directory with info about Linux on Intel 32 bit architecture. ia64/ - directory with info about Linux on Intel 64 bit architecture. +ide/ + - Information regarding the Enhanced IDE drive. +iio/ + - info on industrial IIO configfs support. +index.rst + - main index for the documentation at ReST format. infiniband/ - directory with documents concerning Linux InfiniBand support. -init.txt - - what to do when the kernel can't find the 1st process to run. -initrd.txt - - how to use the RAM disk as an initial/temporary root filesystem. input/ - info on Linux input device support. intel_txt.txt @@ -248,28 +230,16 @@ isapnp.txt - info on Linux ISA Plug & Play support. isdn/ - directory with info on the Linux ISDN support, and supported cards. -java.txt - - info on the in-kernel binary support for Java(tm). -ja_JP/ - - directory with Japanese translations of various documents kbuild/ - directory with info about the kernel build process. +kernel-doc-nano-HOWTO.txt + - outdated info about kernel-doc documentation. kdump/ - directory with mini HowTo on getting the crash dump code to work. -kernel-docs.txt - - listing of various WWW + books that document kernel internals. -kernel-documentation.rst +doc-guide/ - how to write and format reStructuredText kernel documentation -kernel-parameters.txt - - summary listing of command line / boot prompt args for the kernel. kernel-per-CPU-kthreads.txt - List of all per-CPU kthreads and how they introduce jitter. -kmemcheck.txt - - info on dynamic checker that detects uses of uninitialized memory. -kmemleak.txt - - info on how to make use of the kernel memory leak detection system -ko_KR/ - - directory with Korean translations of various documents kobject.txt - info of the kobject infrastructure of the Linux kernel. kprobes.txt @@ -284,8 +254,8 @@ ldm.txt - a brief description of LDM (Windows Dynamic Disks). leds/ - directory with info about LED handling under Linux. -local_ops.txt - - semantics and behavior of local atomic operations. +livepatch/ + - info on kernel live patching. locking/ - directory with info about kernel locking primitives lockup-watchdogs.txt @@ -298,22 +268,24 @@ lzo.txt - kernel LZO decompressor input formats m68k/ - directory with info about Linux on Motorola 68k architecture. -magic-number.txt - - list of magic numbers used to mark/protect kernel data structures. mailbox.txt - How to write drivers for the common mailbox framework (IPC). -md.txt - - info on boot arguments for the multiple devices driver. -media-framework.txt - - info on media framework, its data structures, functions and usage. +md-cluster.txt + - info on shared-device RAID MD cluster. +media/ + - info on media drivers: uAPI, kAPI and driver documentation. memory-barriers.txt - info on Linux kernel memory barriers. memory-devices/ - directory with info on parts like the Texas Instruments EMIF driver memory-hotplug.txt - Hotpluggable memory support, how to use and current status. +men-chameleon-bus.txt + - info on MEN chameleon bus. metag/ - directory with info about Linux on Meta architecture. +mic/ + - Intel Many Integrated Core (MIC) architecture device driver. mips/ - directory with info about Linux on MIPS architecture. misc-devices/ @@ -322,12 +294,8 @@ mmc/ - directory with info about the MMC subsystem mn10300/ - directory with info about the mn10300 architecture port -module-signing.txt - - Kernel module signing for increased security when loading modules. mtd/ - directory with info about memory technology devices (flash) -mono.txt - - how to execute Mono-based .NET binaries with the help of BINFMT_MISC. namespaces/ - directory with various information about namespaces netlabel/ @@ -336,30 +304,42 @@ networking/ - directory with info on various aspects of networking with Linux. nfc/ - directory relating info about Near Field Communications support. +nios2/ + - Linux on the Nios II architecture. nommu-mmap.txt - documentation about no-mmu memory mapping support. numastat.txt - info on how to read Numa policy hit/miss statistics in sysfs. -oops-tracing.txt - - how to decode those nasty internal kernel error dump messages. +ntb.txt + - info on Non-Transparent Bridge (NTB) drivers. +nvdimm/ + - info on non-volatile devices. +nvmem/ + - info on non volatile memory framework. +output/ + - default directory where html/LaTeX/pdf files will be written. padata.txt - An introduction to the "padata" parallel execution API parisc/ - directory with info on using Linux on PA-RISC architecture. -parport.txt - - how to use the parallel-port driver. parport-lowlevel.txt - description and usage of the low level parallel port functions. pcmcia/ - info on the Linux PCMCIA driver. percpu-rw-semaphore.txt - RCU based read-write semaphore optimized for locking for reading +perf/ + - info about the APM X-Gene SoC Performance Monitoring Unit (PMU). +phy/ + - ino on Samsung USB 2.0 PHY adaptation layer. phy.txt - Description of the generic PHY framework. pi-futex.txt - documentation on lightweight priority inheritance futexes. pinctrl.txt - info on pinctrl subsystem and the PINMUX/PINCONF and drivers +platform/ + - List of supported hardware by compal and Dell laptop. pnp.txt - Linux Plug and Play documentation. power/ @@ -372,14 +352,16 @@ preempt-locking.txt - info on locking under a preemptive kernel. printk-formats.txt - how to get printk format specifiers right +process/ + - how to work with the mainline kernel development process. pps/ - directory with information on the pulse-per-second support +pti/ + - directory with info on Intel MID PTI. ptp/ - directory with info on support for IEEE 1588 PTP clocks in Linux. pwm.txt - info on the pulse width modulation driver subsystem -ramoops.txt - - documentation of the ramoops oops/panic logging module. rapidio/ - directory with info on RapidIO packet-based fabric interconnect rbtree.txt @@ -406,8 +388,6 @@ security/ - directory that contains security-related info serial/ - directory with info on the low level serial API. -serial-console.txt - - how to set up Linux with a serial line console as the default. sgi-ioc4.txt - description of the SGI IOC4 PCI (multi function) device. sh/ @@ -416,24 +396,20 @@ smsc_ece1099.txt -info on the smsc Keyboard Scan Expansion/GPIO Expansion device. sound/ - directory with info on sound card support. -sparse.txt - - info on how to obtain and use the sparse tool for typechecking. spi/ - overview of Linux kernel Serial Peripheral Interface (SPI) support. -stable_api_nonsense.txt - - info on why the kernel does not have a stable in-kernel api or abi. -stable_kernel_rules.txt - - rules and procedures for the -stable kernel releases. +sphinx/ + - no documentation here, just files required by Sphinx toolchain. +sphinx-static/ + - no documentation here, just files required by Sphinx toolchain. static-keys.txt - info on how static keys allow debug code in hotpaths via patching svga.txt - short guide on selecting video modes at boot via VGA BIOS. -sysfs-rules.txt - - How not to use sysfs. +sync_file.txt + - Sync file API guide. sysctl/ - directory with info on the /proc/sys/* files. -sysrq.txt - - info on the magic SysRq key. target/ - directory with info on generating TCM v4 fabric .ko modules this_cpu_ops.txt @@ -442,39 +418,29 @@ thermal/ - directory with information on managing thermal issues (CPU/temp) trace/ - directory with info on tracing technologies within linux +translations/ + - translations of this document from English to another language unaligned-memory-access.txt - info on how to avoid arch breaking unaligned memory access in code. -unicode.txt - - info on the Unicode character/font mapping used in Linux. unshare.txt - description of the Linux unshare system call. usb/ - directory with info regarding the Universal Serial Bus. -vDSO/ - - directory with info regarding virtual dynamic shared objects vfio.txt - info on Virtual Function I/O used in guest/hypervisor instances. -vgaarbiter.txt - - info on enable/disable the legacy decoding on different VGA devices video-output.txt - sysfs class driver interface to enable/disable a video output device. -video4linux/ - - directory with info regarding video/TV/radio cards and linux. virtual/ - directory with information on the various linux virtualizations. vm/ - directory with info on the Linux vm code. -vme_api.txt - - file relating info on the VME bus API in linux -volatile-considered-harmful.txt - - Why the "volatile" type class should not be used w1/ - directory with documents regarding the 1-wire (w1) subsystem. watchdog/ - how to auto-reboot Linux if it has "fallen and can't get up". ;-) wimax/ - directory with info about Intel Wireless Wimax Connections -workqueue.txt +core-api/workqueue.rst - information on the Concurrency Managed Workqueue implementation x86/x86_64/ - directory with info on Linux support for AMD x86-64 (Hammer) machines. @@ -484,7 +450,5 @@ xtensa/ - directory with documents relating to arch/xtensa port/implementation xz.txt - how to make use of the XZ data compression within linux kernel -zh_CN/ - - directory with Chinese translations of various documents zorro.txt - info on writing drivers for Zorro bus devices found on Amigas. diff --git a/Documentation/ABI/README b/Documentation/ABI/README index 1fafc4b0753b..3121029dce21 100644 --- a/Documentation/ABI/README +++ b/Documentation/ABI/README @@ -84,4 +84,4 @@ stable: - Kernel-internal symbols. Do not rely on the presence, absence, location, or type of any kernel symbol, either in System.map files or the kernel binary - itself. See Documentation/stable_api_nonsense.txt. + itself. See Documentation/process/stable-api-nonsense.rst. diff --git a/Documentation/ABI/stable/sysfs-devices b/Documentation/ABI/stable/sysfs-devices index df449d79b563..35c457f8ce73 100644 --- a/Documentation/ABI/stable/sysfs-devices +++ b/Documentation/ABI/stable/sysfs-devices @@ -8,3 +8,17 @@ Description: Any device associated with a device-tree node will have an of_path symlink pointing to the corresponding device node in /sys/firmware/devicetree/ + +What: /sys/devices/*/devspec +Date: October 2016 +Contact: Device Tree mailing list +Description: + If CONFIG_OF is enabled, then this file is present. When + read, it returns full name of the device node. + +What: /sys/devices/*/obppath +Date: October 2016 +Contact: Device Tree mailing list +Description: + If CONFIG_OF is enabled, then this file is present. When + read, it returns full name of the device node. diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 71d184dbb70d..2da04ce6aeef 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -235,3 +235,45 @@ Description: write_same_max_bytes is 0, write same is not supported by the device. +What: /sys/block//queue/write_zeroes_max_bytes +Date: November 2016 +Contact: Chaitanya Kulkarni +Description: + Devices that support write zeroes operation in which a + single request can be issued to zero out the range of + contiguous blocks on storage without having any payload + in the request. This can be used to optimize writing zeroes + to the devices. write_zeroes_max_bytes indicates how many + bytes can be written in a single write zeroes command. If + write_zeroes_max_bytes is 0, write zeroes is not supported + by the device. + +What: /sys/block//queue/zoned +Date: September 2016 +Contact: Damien Le Moal +Description: + zoned indicates if the device is a zoned block device + and the zone model of the device if it is indeed zoned. + The possible values indicated by zoned are "none" for + regular block devices and "host-aware" or "host-managed" + for zoned block devices. The characteristics of + host-aware and host-managed zoned block devices are + described in the ZBC (Zoned Block Commands) and ZAC + (Zoned Device ATA Command Set) standards. These standards + also define the "drive-managed" zone model. However, + since drive-managed zoned block devices do not support + zone commands, they will be treated as regular block + devices and zoned will report "none". + +What: /sys/block//queue/chunk_sectors +Date: September 2016 +Contact: Hannes Reinecke +Description: + chunk_sectors has different meaning depending on the type + of the disk. For a RAID device (dm-raid), chunk_sectors + indicates the size in 512B sectors of the RAID volume + stripe segment. For a zoned block device, either + host-aware or host-managed, chunk_sectors indicates the + size of 512B sectors of the zones of the device, with + the eventual exception of the last zone of the device + which may be smaller. diff --git a/Documentation/ABI/testing/sysfs-bus-fsl-mc b/Documentation/ABI/testing/sysfs-bus-fsl-mc new file mode 100644 index 000000000000..80256b8b4f26 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-fsl-mc @@ -0,0 +1,21 @@ +What: /sys/bus/fsl-mc/drivers/.../bind +Date: December 2016 +Contact: stuart.yoder@nxp.com +Description: + Writing a device location to this file will cause + the driver to attempt to bind to the device found at + this location. The format for the location is Object.Id + and is the same as found in /sys/bus/fsl-mc/devices/. + For example: + # echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/bind + +What: /sys/bus/fsl-mc/drivers/.../unbind +Date: December 2016 +Contact: stuart.yoder@nxp.com +Description: + Writing a device location to this file will cause the + driver to attempt to unbind from the device found at + this location. The format for the location is Object.Id + and is the same as found in /sys/bus/fsl-mc/devices/. + For example: + # echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/unbind diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio index fee35c00cc4e..b8f220f978dd 100644 --- a/Documentation/ABI/testing/sysfs-bus-iio +++ b/Documentation/ABI/testing/sysfs-bus-iio @@ -329,6 +329,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_pressure_scale What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_scale What: /sys/bus/iio/devices/iio:deviceX/in_velocity_sqrt(x^2+y^2+z^2)_scale What: /sys/bus/iio/devices/iio:deviceX/in_illuminance_scale +What: /sys/bus/iio/devices/iio:deviceX/in_countY_scale KernelVersion: 2.6.35 Contact: linux-iio@vger.kernel.org Description: @@ -1579,3 +1580,20 @@ Contact: linux-iio@vger.kernel.org Description: Raw (unscaled no offset etc.) electric conductivity reading that can be processed to siemens per meter. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_raw +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Raw counter device counts from channel Y. For quadrature + counters, multiplication by an available [Y]_scale results in + the counts of a single quadrature signal phase from channel Y. + +What: /sys/bus/iio/devices/iio:deviceX/in_indexY_raw +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Raw counter device index value from channel Y. This attribute + provides an absolute positional reference (e.g. a pulse once per + revolution) which may be used to home positional systems as + required. diff --git a/Documentation/ABI/testing/sysfs-bus-iio-adc-envelope-detector b/Documentation/ABI/testing/sysfs-bus-iio-adc-envelope-detector new file mode 100644 index 000000000000..2071f9bcfaa5 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-adc-envelope-detector @@ -0,0 +1,36 @@ +What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_invert +Date: October 2016 +KernelVersion: 4.9 +Contact: Peter Rosin +Description: + The DAC is used to find the peak level of an alternating + voltage input signal by a binary search using the output + of a comparator wired to an interrupt pin. Like so: + _ + | \ + input +------>-------|+ \ + | \ + .-------. | }---. + | | | / | + | dac|-->--|- / | + | | |_/ | + | | | + | | | + | irq|------<-------' + | | + '-------' + The boolean invert attribute (0/1) should be set when the + input signal is centered around the maximum value of the + dac instead of zero. The envelope detector will search + from below in this case and will also invert the result. + The edge/level of the interrupt is also switched to its + opposite value. + +What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_compare_interval +Date: October 2016 +KernelVersion: 4.9 +Contact: Peter Rosin +Description: + Number of milliseconds to wait for the comparator in each + step of the binary search for the input peak level. Needs + to relate to the frequency of the input signal. diff --git a/Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8 b/Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8 new file mode 100644 index 000000000000..ba676520b953 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8 @@ -0,0 +1,125 @@ +What: /sys/bus/iio/devices/iio:deviceX/in_count_count_direction_available +What: /sys/bus/iio/devices/iio:deviceX/in_count_count_mode_available +What: /sys/bus/iio/devices/iio:deviceX/in_count_noise_error_available +What: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available +What: /sys/bus/iio/devices/iio:deviceX/in_index_index_polarity_available +What: /sys/bus/iio/devices/iio:deviceX/in_index_synchronous_mode_available +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Discrete set of available values for the respective counter + configuration are listed in this file. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_direction +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Read-only attribute that indicates whether the counter for + channel Y is counting up or down. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_mode +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Count mode for channel Y. Four count modes are available: + normal, range limit, non-recycle, and modulo-n. The preset value + for channel Y is used by the count mode where required. + + Normal: + Counting is continuous in either direction. + + Range Limit: + An upper or lower limit is set, mimicking limit switches + in the mechanical counterpart. The upper limit is set to + the preset value, while the lower limit is set to 0. The + counter freezes at count = preset when counting up, and + at count = 0 when counting down. At either of these + limits, the counting is resumed only when the count + direction is reversed. + + Non-recycle: + Counter is disabled whenever a 24-bit count overflow or + underflow takes place. The counter is re-enabled when a + new count value is loaded to the counter via a preset + operation or write to raw. + + Modulo-N: + A count boundary is set between 0 and the preset value. + The counter is reset to 0 at count = preset when + counting up, while the counter is set to the preset + value at count = 0 when counting down; the counter does + not freeze at the bundary points, but counts + continuously throughout. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_noise_error +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Read-only attribute that indicates whether excessive noise is + present at the channel Y count inputs in quadrature clock mode; + irrelevant in non-quadrature clock mode. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_preset +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + If the counter device supports preset registers, the preset + count for channel Y is provided by this attribute. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_quadrature_mode +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Configure channel Y counter for non-quadrature or quadrature + clock mode. Selecting non-quadrature clock mode will disable + synchronous load mode. In quadrature clock mode, the channel Y + scale attribute selects the encoder phase division (scale of 1 + selects full-cycle, scale of 0.5 selects half-cycle, scale of + 0.25 selects quarter-cycle) processed by the channel Y counter. + + Non-quadrature: + The filter and decoder circuit are bypassed. Encoder A + input serves as the count input and B as the UP/DOWN + direction control input, with B = 1 selecting UP Count + mode and B = 0 selecting Down Count mode. + + Quadrature: + Encoder A and B inputs are digitally filtered and + decoded for UP/DN clock. + +What: /sys/bus/iio/devices/iio:deviceX/in_countY_set_to_preset_on_index +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Whether to set channel Y counter with channel Y preset value + when channel Y index input is active, or continuously count. + Valid attribute values are boolean. + +What: /sys/bus/iio/devices/iio:deviceX/in_indexY_index_polarity +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Active level of channel Y index input; irrelevant in + non-synchronous load mode. + +What: /sys/bus/iio/devices/iio:deviceX/in_indexY_synchronous_mode +KernelVersion: 4.9 +Contact: linux-iio@vger.kernel.org +Description: + Configure channel Y counter for non-synchronous or synchronous + load mode. Synchronous load mode cannot be selected in + non-quadrature clock mode. + + Non-synchronous: + A logic low level is the active level at this index + input. The index function (as enabled via + set_to_preset_on_index) is performed directly on the + active level of the index input. + + Synchronous: + Intended for interfacing with encoder Index output in + quadrature clock mode. The active level is configured + via index_polarity. The index function (as enabled via + set_to_preset_on_index) is performed synchronously with + the quadrature clock on the active level of the index + input. diff --git a/Documentation/ABI/testing/sysfs-bus-iio-cros-ec b/Documentation/ABI/testing/sysfs-bus-iio-cros-ec new file mode 100644 index 000000000000..297b9720f024 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-cros-ec @@ -0,0 +1,18 @@ +What: /sys/bus/iio/devices/iio:deviceX/calibrate +Date: July 2015 +KernelVersion: 4.7 +Contact: linux-iio@vger.kernel.org +Description: + Writing '1' will perform a FOC (Fast Online Calibration). The + corresponding calibration offsets can be read from *_calibbias + entries. + +What: /sys/bus/iio/devices/iio:deviceX/location +Date: July 2015 +KernelVersion: 4.7 +Contact: linux-iio@vger.kernel.org +Description: + This attribute returns a string with the physical location where + the motion sensor is placed. For example, in a laptop a motion + sensor can be located on the base or on the lid. Current valid + values are 'base' and 'lid'. diff --git a/Documentation/ABI/testing/sysfs-bus-iio-dac-dpot-dac b/Documentation/ABI/testing/sysfs-bus-iio-dac-dpot-dac new file mode 100644 index 000000000000..580e93f373f6 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-dac-dpot-dac @@ -0,0 +1,8 @@ +What: /sys/bus/iio/devices/iio:deviceX/out_voltageY_raw_available +Date: October 2016 +KernelVersion: 4.9 +Contact: Peter Rosin +Description: + The range of available values represented as the minimum value, + the step and the maximum value, all enclosed in square brackets. + Example: [0 1 256] diff --git a/Documentation/ABI/testing/sysfs-bus-iio-light-isl29018 b/Documentation/ABI/testing/sysfs-bus-iio-light-isl29018 new file mode 100644 index 000000000000..f0ce0a0476ea --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-light-isl29018 @@ -0,0 +1,19 @@ +What: /sys/bus/iio/devices/iio:deviceX/proximity_on_chip_ambient_infrared_suppression +Date: January 2011 +KernelVersion: 2.6.37 +Contact: linux-iio@vger.kernel.org +Description: + From ISL29018 Data Sheet (FN6619.4, Oct 8, 2012) regarding the + infrared suppression: + + Scheme 0, makes full n (4, 8, 12, 16) bits (unsigned) proximity + detection. The range of Scheme 0 proximity count is from 0 to + 2^n. Logic 1 of this bit, Scheme 1, makes n-1 (3, 7, 11, 15) + bits (2's complementary) proximity_less_ambient detection. The + range of Scheme 1 proximity count is from -2^(n-1) to 2^(n-1). + The sign bit is extended for resolutions less than 16. While + Scheme 0 has wider dynamic range, Scheme 1 proximity detection + is less affected by the ambient IR noise variation. + + 0 Sensing IR from LED and ambient + 1 Sensing IR from LED with ambient IR rejection diff --git a/drivers/staging/iio/Documentation/sysfs-bus-iio-light-tsl2583 b/Documentation/ABI/testing/sysfs-bus-iio-light-tsl2583 similarity index 74% rename from drivers/staging/iio/Documentation/sysfs-bus-iio-light-tsl2583 rename to Documentation/ABI/testing/sysfs-bus-iio-light-tsl2583 index 660781df409f..a2e19964e87e 100644 --- a/drivers/staging/iio/Documentation/sysfs-bus-iio-light-tsl2583 +++ b/Documentation/ABI/testing/sysfs-bus-iio-light-tsl2583 @@ -1,18 +1,18 @@ -What: /sys/bus/iio/devices/device[n]/lux_table +What: /sys/bus/iio/devices/device[n]/in_illuminance_calibrate KernelVersion: 2.6.37 Contact: linux-iio@vger.kernel.org Description: - This property gets/sets the table of coefficients - used in calculating illuminance in lux. + This property causes an internal calibration of the als gain trim + value which is later used in calculating illuminance in lux. -What: /sys/bus/iio/devices/device[n]/illuminance0_calibrate +What: /sys/bus/iio/devices/device[n]/in_illuminance_lux_table KernelVersion: 2.6.37 Contact: linux-iio@vger.kernel.org Description: - This property causes an internal calibration of the als gain trim - value which is later used in calculating illuminance in lux. + This property gets/sets the table of coefficients + used in calculating illuminance in lux. -What: /sys/bus/iio/devices/device[n]/illuminance0_input_target +What: /sys/bus/iio/devices/device[n]/in_illuminance_input_target KernelVersion: 2.6.37 Contact: linux-iio@vger.kernel.org Description: diff --git a/Documentation/ABI/testing/sysfs-bus-iio-potentiometer-mcp4531 b/Documentation/ABI/testing/sysfs-bus-iio-potentiometer-mcp4531 new file mode 100644 index 000000000000..2a91fbe394fc --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-potentiometer-mcp4531 @@ -0,0 +1,8 @@ +What: /sys/bus/iio/devices/iio:deviceX/out_resistance_raw_available +Date: October 2016 +KernelVersion: 4.9 +Contact: Peter Rosin +Description: + The range of available values represented as the minimum value, + the step and the maximum value, all enclosed in square brackets. + Example: [0 1 256] diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index b3bc50f650ee..5a1732b78707 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -294,3 +294,10 @@ Description: a firmware bug to the system vendor. Writing to this file taints the kernel with TAINT_FIRMWARE_WORKAROUND, which reduces the supportability of your system. + +What: /sys/bus/pci/devices/.../revision +Date: November 2016 +Contact: Emil Velikov +Description: + This file contains the revision field of the the PCI device. + The value comes from device config space. The file is read only. diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev b/Documentation/ABI/testing/sysfs-bus-vfio-mdev new file mode 100644 index 000000000000..452dbe39270e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev @@ -0,0 +1,111 @@ +What: /sys/...//mdev_supported_types/ +Date: October 2016 +Contact: Kirti Wankhede +Description: + This directory contains list of directories of currently + supported mediated device types and their details for + . Supported type attributes are defined by the + vendor driver who registers with Mediated device framework. + Each supported type is a directory whose name is created + by adding the device driver string as a prefix to the + string provided by the vendor driver. + +What: /sys/...//mdev_supported_types// +Date: October 2016 +Contact: Kirti Wankhede +Description: + This directory gives details of supported type, like name, + description, available_instances, device_api etc. + 'device_api' and 'available_instances' are mandatory + attributes to be provided by vendor driver. 'name', + 'description' and other vendor driver specific attributes + are optional. + +What: /sys/.../mdev_supported_types//create +Date: October 2016 +Contact: Kirti Wankhede +Description: + Writing UUID to this file will create mediated device of + type for parent device . This is a + write-only file. + For example: + # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ + /sys/devices/foo/mdev_supported_types/foo-1/create + +What: /sys/.../mdev_supported_types//devices/ +Date: October 2016 +Contact: Kirti Wankhede +Description: + This directory contains symbolic links pointing to mdev + devices sysfs entries which are created of this . + +What: /sys/.../mdev_supported_types//available_instances +Date: October 2016 +Contact: Kirti Wankhede +Description: + Reading this attribute will show the number of mediated + devices of type that can be created. This is a + readonly file. +Users: + Userspace applications interested in creating mediated + device of that type. Userspace application should check + the number of available instances could be created before + creating mediated device of this type. + +What: /sys/.../mdev_supported_types//device_api +Date: October 2016 +Contact: Kirti Wankhede +Description: + Reading this attribute will show VFIO device API supported + by this type. For example, "vfio-pci" for a PCI device, + "vfio-platform" for platform device. + +What: /sys/.../mdev_supported_types//name +Date: October 2016 +Contact: Kirti Wankhede +Description: + Reading this attribute will show human readable name of the + mediated device that will get created of type . + This is optional attribute. For example: "Grid M60-0Q" +Users: + Userspace applications interested in knowing the name of + a particular that can help in understanding the + type of mediated device. + +What: /sys/.../mdev_supported_types//description +Date: October 2016 +Contact: Kirti Wankhede +Description: + Reading this attribute will show description of the type of + mediated device that will get created of type . + This is optional attribute. For example: + "2 heads, 512M FB, 2560x1600 maximum resolution" +Users: + Userspace applications interested in knowing the details of + a particular that can help in understanding the + features provided by that type of mediated device. + +What: /sys/.../// +Date: October 2016 +Contact: Kirti Wankhede +Description: + This directory represents device directory of mediated + device. It contains all the attributes related to mediated + device. + +What: /sys/...///mdev_type +Date: October 2016 +Contact: Kirti Wankhede +Description: + This is symbolic link pointing to supported type, + directory of which this mediated device is created. + +What: /sys/...///remove +Date: October 2016 +Contact: Kirti Wankhede +Description: + Writing '1' to this file destroys the mediated device. The + vendor driver can fail the remove() callback if that device + is active and the vendor driver doesn't support hot unplug. + Example: + # echo 1 > /sys/bus/mdev/devices//remove diff --git a/Documentation/ABI/testing/sysfs-class-fpga-bridge b/Documentation/ABI/testing/sysfs-class-fpga-bridge new file mode 100644 index 000000000000..312ae2c579d8 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-fpga-bridge @@ -0,0 +1,11 @@ +What: /sys/class/fpga_bridge//name +Date: January 2016 +KernelVersion: 4.5 +Contact: Alan Tull +Description: Name of low level FPGA bridge driver. + +What: /sys/class/fpga_bridge//state +Date: January 2016 +KernelVersion: 4.5 +Contact: Alan Tull +Description: Show bridge state as "enabled" or "disabled" diff --git a/Documentation/ABI/testing/sysfs-class-led b/Documentation/ABI/testing/sysfs-class-led index 86ace287d48b..491cdeedc195 100644 --- a/Documentation/ABI/testing/sysfs-class-led +++ b/Documentation/ABI/testing/sysfs-class-led @@ -4,16 +4,24 @@ KernelVersion: 2.6.17 Contact: Richard Purdie Description: Set the brightness of the LED. Most LEDs don't - have hardware brightness support so will just be turned on for + have hardware brightness support, so will just be turned on for non-zero brightness settings. The value is between 0 and /sys/class/leds//max_brightness. + Writing 0 to this file clears active trigger. + + Writing non-zero to this file while trigger is active changes the + top brightness trigger is going to use. + What: /sys/class/leds//max_brightness Date: March 2006 KernelVersion: 2.6.17 Contact: Richard Purdie Description: - Maximum brightness level for this led, default is 255 (LED_FULL). + Maximum brightness level for this LED, default is 255 (LED_FULL). + + If the LED does not support different brightness levels, this + should be 1. What: /sys/class/leds//trigger Date: March 2006 @@ -21,7 +29,7 @@ KernelVersion: 2.6.17 Contact: Richard Purdie Description: Set the trigger for this LED. A trigger is a kernel based source - of led events. + of LED events. You can change triggers in a similar manner to the way an IO scheduler is chosen. Trigger specific parameters can appear in /sys/class/leds/ once a given trigger is selected. For diff --git a/Documentation/ABI/testing/sysfs-class-mei b/Documentation/ABI/testing/sysfs-class-mei index 80d9888a8ece..5096a82f4cde 100644 --- a/Documentation/ABI/testing/sysfs-class-mei +++ b/Documentation/ABI/testing/sysfs-class-mei @@ -29,3 +29,19 @@ Description: Display fw status registers content Also number of registers varies between 1 and 6 depending on generation. +What: /sys/class/mei/meiN/hbm_ver +Date: Aug 2016 +KernelVersion: 4.9 +Contact: Tomas Winkler +Description: Display the negotiated HBM protocol version. + + The HBM protocol version negotiated + between the driver and the device. + +What: /sys/class/mei/meiN/hbm_ver_drv +Date: Aug 2016 +KernelVersion: 4.9 +Contact: Tomas Winkler +Description: Display the driver HBM protocol version. + + The HBM protocol version supported by the driver. diff --git a/Documentation/ABI/testing/sysfs-class-remoteproc b/Documentation/ABI/testing/sysfs-class-remoteproc new file mode 100644 index 000000000000..d188afebc8ba --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-remoteproc @@ -0,0 +1,50 @@ +What: /sys/class/remoteproc/.../firmware +Date: October 2016 +Contact: Matt Redfearn +Description: Remote processor firmware + + Reports the name of the firmware currently loaded to the + remote processor. + + To change the running firmware, ensure the remote processor is + stopped (using /sys/class/remoteproc/.../state) and write a new filename. + +What: /sys/class/remoteproc/.../state +Date: October 2016 +Contact: Matt Redfearn +Description: Remote processor state + + Reports the state of the remote processor, which will be one of: + + "offline" + "suspended" + "running" + "crashed" + "invalid" + + "offline" means the remote processor is powered off. + + "suspended" means that the remote processor is suspended and + must be woken to receive messages. + + "running" is the normal state of an available remote processor + + "crashed" indicates that a problem/crash has been detected on + the remote processor. + + "invalid" is returned if the remote processor is in an + unknown state. + + Writing this file controls the state of the remote processor. + The following states can be written: + + "start" + "stop" + + Writing "start" will attempt to start the processor running the + firmware indicated by, or written to, + /sys/class/remoteproc/.../firmware. The remote processor should + transition to "running" state. + + Writing "stop" will attempt to halt the remote processor and + return it to the "offline" state. diff --git a/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration b/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration index 4cf1e72222d9..ec950c93e5c6 100644 --- a/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration +++ b/Documentation/ABI/testing/sysfs-class-rtc-rtc0-device-rtc_calibration @@ -1,8 +1,9 @@ -What: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock +What: /sys/class/rtc/rtc0/device/rtc_calibration Date: Oct 2011 KernelVersion: 3.0 Contact: Mark Godfrey -Description: The rtc_calibration attribute allows the userspace to +Description: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock + The rtc_calibration attribute allows the userspace to calibrate the AB8500.s 32KHz Real Time Clock. Every 60 seconds the AB8500 will correct the RTC's value by adding to it the value of this attribute. diff --git a/Documentation/ABI/testing/sysfs-devices-deferred_probe b/Documentation/ABI/testing/sysfs-devices-deferred_probe new file mode 100644 index 000000000000..58553d7a321f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-deferred_probe @@ -0,0 +1,12 @@ +What: /sys/devices/.../deferred_probe +Date: August 2016 +Contact: Ben Hutchings +Description: + The /sys/devices/.../deferred_probe attribute is + present for all devices. If a driver detects during + probing a device that a related device is not yet + ready, it may defer probing of the first device. The + kernel will retry probing the first device after any + other device is successfully probed. This attribute + reads as 1 if probing of this device is currently + deferred, or 0 otherwise. diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 498741737055..2a4a423d08e0 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -272,6 +272,22 @@ Description: Parameters for the CPU cache attributes the modified cache line is written to main memory only when it is replaced + +What: /sys/devices/system/cpu/cpu*/cache/index*/id +Date: September 2016 +Contact: Linux kernel mailing list +Description: Cache id + + The id provides a unique number for a specific instance of + a cache of a particular type. E.g. there may be a level + 3 unified cache on each socket in a server and we may + assign them ids 0, 1, 2, ... + + Note that id value can be non-contiguous. E.g. level 1 + caches typically exist per core, but there may not be a + power of two cores on a socket, so these caches may be + numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ... + What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab index 91bd6ca5440f..2cc0a72b64be 100644 --- a/Documentation/ABI/testing/sysfs-kernel-slab +++ b/Documentation/ABI/testing/sysfs-kernel-slab @@ -347,7 +347,7 @@ Description: because of fragmentation, SLUB will retry with the minimum order possible depending on its characteristics. When debug_guardpage_minorder=N (N > 0) parameter is specified - (see Documentation/kernel-parameters.txt), the minimum possible + (see Documentation/admin-guide/kernel-parameters.rst), the minimum possible order is used and this sysfs entry can not be used to change the order at run time. diff --git a/Documentation/ABI/testing/sysfs-platform-phy-rcar-gen3-usb2 b/Documentation/ABI/testing/sysfs-platform-phy-rcar-gen3-usb2 new file mode 100644 index 000000000000..6212697bbf6f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-platform-phy-rcar-gen3-usb2 @@ -0,0 +1,15 @@ +What: /sys/devices/platform//role +Date: October 2016 +KernelVersion: 4.10 +Contact: Yoshihiro Shimoda +Description: + This file can be read and write. + The file can show/change the phy mode for role swap of usb. + + Write the following strings to change the mode: + "host" - switching mode from peripheral to host. + "peripheral" - switching mode from host to peripheral. + + Read the file, then it shows the following strings: + "host" - The mode is host now. + "peripheral" - The mode is peripheral now. diff --git a/Documentation/ABI/testing/sysfs-platform-sst-atom b/Documentation/ABI/testing/sysfs-platform-sst-atom new file mode 100644 index 000000000000..0d07c0395660 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-platform-sst-atom @@ -0,0 +1,17 @@ +What: /sys/devices/platform/8086%x:00/firmware_version +Date: November 2016 +KernelVersion: 4.10 +Contact: "Sebastien Guiriec" +Description: + LPE Firmware version for SST driver on all atom + plaforms (BYT/CHT/Merrifield/BSW). + If the FW has never been loaded it will display: + "FW not yet loaded" + If FW has been loaded it will display: + "v01.aa.bb.cc" + aa: Major version is reflecting SoC version: + 0d: BYT FW + 0b: BSW FW + 07: Merrifield FW + bb: Minor version + cc: Build version diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index 50b368d490b5..f523e5a3ac33 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -7,30 +7,35 @@ Description: subsystem. What: /sys/power/state -Date: May 2014 +Date: November 2016 Contact: Rafael J. Wysocki Description: The /sys/power/state file controls system sleep states. Reading from this file returns the available sleep state - labels, which may be "mem", "standby", "freeze" and "disk" - (hibernation). The meanings of the first three labels depend on - the relative_sleep_states command line argument as follows: - 1) relative_sleep_states = 1 - "mem", "standby", "freeze" represent non-hibernation sleep - states from the deepest ("mem", always present) to the - shallowest ("freeze"). "standby" and "freeze" may or may - not be present depending on the capabilities of the - platform. "freeze" can only be present if "standby" is - present. - 2) relative_sleep_states = 0 (default) - "mem" - "suspend-to-RAM", present if supported. - "standby" - "power-on suspend", present if supported. - "freeze" - "suspend-to-idle", always present. - - Writing to this file one of these strings causes the system to - transition into the corresponding state, if available. See - Documentation/power/states.txt for a description of what - "suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean. + labels, which may be "mem" (suspend), "standby" (power-on + suspend), "freeze" (suspend-to-idle) and "disk" (hibernation). + + Writing one of the above strings to this file causes the system + to transition into the corresponding state, if available. + + See Documentation/power/states.txt for more information. + +What: /sys/power/mem_sleep +Date: November 2016 +Contact: Rafael J. Wysocki +Description: + The /sys/power/mem_sleep file controls the operating mode of + system suspend. Reading from it returns the available modes + as "s2idle" (always present), "shallow" and "deep" (present if + supported). The mode that will be used on subsequent attempts + to suspend the system (by writing "mem" to the /sys/power/state + file described above) is enclosed in square brackets. + + Writing one of the above strings to this file causes the mode + represented by it to be used on subsequent attempts to suspend + the system. + + See Documentation/power/states.txt for more information. What: /sys/power/disk Date: September 2006 diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING deleted file mode 100644 index 65022a87bf17..000000000000 --- a/Documentation/BUG-HUNTING +++ /dev/null @@ -1,246 +0,0 @@ -Table of contents -================= - -Last updated: 20 December 2005 - -Contents -======== - -- Introduction -- Devices not appearing -- Finding patch that caused a bug --- Finding using git-bisect --- Finding it the old way -- Fixing the bug - -Introduction -============ - -Always try the latest kernel from kernel.org and build from source. If you are -not confident in doing that please report the bug to your distribution vendor -instead of to a kernel developer. - -Finding bugs is not always easy. Have a go though. If you can't find it don't -give up. Report as much as you have found to the relevant maintainer. See -MAINTAINERS for who that is for the subsystem you have worked on. - -Before you submit a bug report read REPORTING-BUGS. - -Devices not appearing -===================== - -Often this is caused by udev. Check that first before blaming it on the -kernel. - -Finding patch that caused a bug -=============================== - - - -Finding using git-bisect ------------------------- - -Using the provided tools with git makes finding bugs easy provided the bug is -reproducible. - -Steps to do it: -- start using git for the kernel source -- read the man page for git-bisect -- have fun - -Finding it the old way ----------------------- - -[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)] - -This is how to track down a bug if you know nothing about kernel hacking. -It's a brute force approach but it works pretty well. - -You need: - - . A reproducible bug - it has to happen predictably (sorry) - . All the kernel tar files from a revision that worked to the - revision that doesn't - -You will then do: - - . Rebuild a revision that you believe works, install, and verify that. - . Do a binary search over the kernels to figure out which one - introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but - you know that 1.3.69 does. Pick a kernel in the middle and build - that, like 1.3.50. Build & test; if it works, pick the mid point - between .50 and .69, else the mid point between .28 and .50. - . You'll narrow it down to the kernel that introduced the bug. You - can probably do better than this but it gets tricky. - - . Narrow it down to a subdirectory - - - Copy kernel that works into "test". Let's say that 3.62 works, - but 3.63 doesn't. So you diff -r those two kernels and come - up with a list of directories that changed. For each of those - directories: - - Copy the non-working directory next to the working directory - as "dir.63". - One directory at time, try moving the working directory to - "dir.62" and mv dir.63 dir"time, try - - mv dir dir.62 - mv dir.63 dir - find dir -name '*.[oa]' -print | xargs rm -f - - And then rebuild and retest. Assuming that all related - changes were contained in the sub directory, this should - isolate the change to a directory. - - Problems: changes in header files may have occurred; I've - found in my case that they were self explanatory - you may - or may not want to give up when that happens. - - . Narrow it down to a file - - - You can apply the same technique to each file in the directory, - hoping that the changes in that file are self contained. - - . Narrow it down to a routine - - - You can take the old file and the new file and manually create - a merged file that has - - #ifdef VER62 - routine() - { - ... - } - #else - routine() - { - ... - } - #endif - - And then walk through that file, one routine at a time and - prefix it with - - #define VER62 - /* both routines here */ - #undef VER62 - - Then recompile, retest, move the ifdefs until you find the one - that makes the difference. - -Finally, you take all the info that you have, kernel revisions, bug -description, the extent to which you have narrowed it down, and pass -that off to whomever you believe is the maintainer of that section. -A post to linux.dev.kernel isn't such a bad idea if you've done some -work to narrow it down. - -If you get it down to a routine, you'll probably get a fix in 24 hours. - -My apologies to Linus and the other kernel hackers for describing this -brute force approach, it's hardly what a kernel hacker would do. However, -it does work and it lets non-hackers help fix bugs. And it is cool -because Linux snapshots will let you do this - something that you can't -do with vendor supplied releases. - -Fixing the bug -============== - -Nobody is going to tell you how to fix bugs. Seriously. You need to work it -out. But below are some hints on how to use the tools. - -To debug a kernel, use objdump and look for the hex offset from the crash -output to find the valid line of code/assembler. Without debug symbols, you -will see the assembler code for the routine shown, but if your kernel has -debug symbols the C code will also be available. (Debug symbols can be enabled -in the kernel hacking menu of the menu configuration.) For example: - - objdump -r -S -l --disassemble net/dccp/ipv4.o - -NB.: you need to be at the top level of the kernel tree for this to pick up -your C files. - -If you don't have access to the code you can also debug on some crash dumps -e.g. crash dump output as shown by Dave Miller. - -> EIP is at ip_queue_xmit+0x14/0x4c0 -> ... -> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00 -> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08 -> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85 -> -> Put the bytes into a "foo.s" file like this: -> -> .text -> .globl foo -> foo: -> .byte .... /* bytes from Code: part of OOPS dump */ -> -> Compile it with "gcc -c -o foo.o foo.s" then look at the output of -> "objdump --disassemble foo.o". -> -> Output: -> -> ip_queue_xmit: -> push %ebp -> push %edi -> push %esi -> push %ebx -> sub $0xbc, %esp -> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb) -> mov 0x8(%ebp), %ebx ! %ebx = skb->sk -> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt - -In addition, you can use GDB to figure out the exact file and line -number of the OOPS from the vmlinux file. If you have -CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the -OOPS: - - EIP: 0060:[] Not tainted VLI - -And use GDB to translate that to human-readable form: - - gdb vmlinux - (gdb) l *0xc021e50e - -If you don't have CONFIG_DEBUG_INFO enabled, you use the function -offset from the OOPS: - - EIP is at vt_ioctl+0xda8/0x1482 - -And recompile the kernel with CONFIG_DEBUG_INFO enabled: - - make vmlinux - gdb vmlinux - (gdb) p vt_ioctl - (gdb) l *(0x
+ 0xda8) -or, as one command - (gdb) l *(vt_ioctl + 0xda8) - -If you have a call trace, such as :- ->Call Trace: -> [] :jbd:log_wait_commit+0xa3/0xf5 -> [] autoremove_wake_function+0x0/0x2e -> [] :jbd:journal_stop+0x1be/0x1ee -> ... -this shows the problem in the :jbd: module. You can load that module in gdb -and list the relevant code. - gdb fs/jbd/jbd.ko - (gdb) p log_wait_commit - (gdb) l *(0x
+ 0xa3) -or - (gdb) l *(log_wait_commit + 0xa3) - - -Another very useful option of the Kernel Hacking section in menuconfig is -Debug memory allocations. This will help you see whether data has been -initialised and not set before use etc. To see the values that get assigned -with this look at mm/slab.c and search for POISON_INUSE. When using this an -Oops will often show the poisoned data instead of zero which is the default. - -Once you have worked out a fix please submit it upstream. After all open -source is about sharing what you do and don't you want to be recognised for -your genius? - -Please do read Documentation/SubmittingPatches though to help your code get -accepted. diff --git a/Documentation/Changes b/Documentation/Changes deleted file mode 100644 index 22797a15dc24..000000000000 --- a/Documentation/Changes +++ /dev/null @@ -1,485 +0,0 @@ -.. _changes: - -Minimal requerements to compile the Kernel -++++++++++++++++++++++++++++++++++++++++++ - -Intro -===== - -This document is designed to provide a list of the minimum levels of -software necessary to run the 4.x kernels. - -This document is originally based on my "Changes" file for 2.0.x kernels -and therefore owes credit to the same people as that file (Jared Mauch, -Axel Boldt, Alessandro Sigala, and countless other users all over the -'net). - -Current Minimal Requirements -**************************** - -Upgrade to at **least** these software revisions before thinking you've -encountered a bug! If you're unsure what version you're currently -running, the suggested command should tell you. - -Again, keep in mind that this list assumes you are already functionally -running a Linux kernel. Also, not all tools are necessary on all -systems; obviously, if you don't have any ISDN hardware, for example, -you probably needn't concern yourself with isdn4k-utils. - -====================== =============== ======================================== - Program Minimal version Command to check the version -====================== =============== ======================================== -GNU C 3.2 gcc --version -GNU make 3.80 make --version -binutils 2.12 ld -v -util-linux 2.10o fdformat --version -module-init-tools 0.9.10 depmod -V -e2fsprogs 1.41.4 e2fsck -V -jfsutils 1.1.3 fsck.jfs -V -reiserfsprogs 3.6.3 reiserfsck -V -xfsprogs 2.6.0 xfs_db -V -squashfs-tools 4.0 mksquashfs -version -btrfs-progs 0.18 btrfsck -pcmciautils 004 pccardctl -V -quota-tools 3.09 quota -V -PPP 2.4.0 pppd --version -isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version -nfs-utils 1.0.5 showmount --version -procps 3.2.0 ps --version -oprofile 0.9 oprofiled --version -udev 081 udevd --version -grub 0.93 grub --version || grub-install --version -mcelog 0.6 mcelog --version -iptables 1.4.2 iptables -V -openssl & libcrypto 1.0.0 openssl version -bc 1.06.95 bc --version -Sphinx\ [#f1]_ 1.2 sphinx-build --version -====================== =============== ======================================== - -.. [#f1] Sphinx is needed only to build the Kernel documentation - -Kernel compilation -****************** - -GCC ---- - -The gcc version requirements may vary depending on the type of CPU in your -computer. - -Make ----- - -You will need GNU make 3.80 or later to build the kernel. - -Binutils --------- - -Linux on IA-32 has recently switched from using ``as86`` to using ``gas`` for -assembling the 16-bit boot code, removing the need for ``as86`` to compile -your kernel. This change does, however, mean that you need a recent -release of binutils. - -Perl ----- - -You will need perl 5 and the following modules: ``Getopt::Long``, -``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel. - -BC --- - -You will need bc to build kernels 3.10 and higher - - -OpenSSL -------- - -Module signing and external certificate handling use the OpenSSL program and -crypto library to do key creation and signature generation. - -You will need openssl to build kernels 3.7 and higher if module signing is -enabled. You will also need openssl development packages to build kernels 4.3 -and higher. - - -System utilities -**************** - -Architectural changes ---------------------- - -DevFS has been obsoleted in favour of udev -(http://www.kernel.org/pub/linux/utils/kernel/hotplug/) - -32-bit UID support is now in place. Have fun! - -Linux documentation for functions is transitioning to inline -documentation via specially-formatted comments near their -definitions in the source. These comments can be combined with the -SGML templates in the Documentation/DocBook directory to make DocBook -files, which can then be converted by DocBook stylesheets to PostScript, -HTML, PDF files, and several other formats. In order to convert from -DocBook format to a format of your choice, you'll need to install Jade as -well as the desired DocBook stylesheets. - -Util-linux ----------- - -New versions of util-linux provide ``fdisk`` support for larger disks, -support new options to mount, recognize more supported partition -types, have a fdformat which works with 2.4 kernels, and similar goodies. -You'll probably want to upgrade. - -Ksymoops --------- - -If the unthinkable happens and your kernel oopses, you may need the -ksymoops tool to decode it, but in most cases you don't. -It is generally preferred to build the kernel with ``CONFIG_KALLSYMS`` so -that it produces readable dumps that can be used as-is (this also -produces better output than ksymoops). If for some reason your kernel -is not build with ``CONFIG_KALLSYMS`` and you have no way to rebuild and -reproduce the Oops with that option, then you can still decode that Oops -with ksymoops. - -Module-Init-Tools ------------------ - -A new module loader is now in the kernel that requires ``module-init-tools`` -to use. It is backward compatible with the 2.4.x series kernels. - -Mkinitrd --------- - -These changes to the ``/lib/modules`` file tree layout also require that -mkinitrd be upgraded. - -E2fsprogs ---------- - -The latest version of ``e2fsprogs`` fixes several bugs in fsck and -debugfs. Obviously, it's a good idea to upgrade. - -JFSutils --------- - -The ``jfsutils`` package contains the utilities for the file system. -The following utilities are available: - -- ``fsck.jfs`` - initiate replay of the transaction log, and check - and repair a JFS formatted partition. - -- ``mkfs.jfs`` - create a JFS formatted partition. - -- other file system utilities are also available in this package. - -Reiserfsprogs -------------- - -The reiserfsprogs package should be used for reiserfs-3.6.x -(Linux kernels 2.4.x). It is a combined package and contains working -versions of ``mkreiserfs``, ``resize_reiserfs``, ``debugreiserfs`` and -``reiserfsck``. These utils work on both i386 and alpha platforms. - -Xfsprogs --------- - -The latest version of ``xfsprogs`` contains ``mkfs.xfs``, ``xfs_db``, and the -``xfs_repair`` utilities, among others, for the XFS filesystem. It is -architecture independent and any version from 2.0.0 onward should -work correctly with this version of the XFS kernel code (2.6.0 or -later is recommended, due to some significant improvements). - -PCMCIAutils ------------ - -PCMCIAutils replaces ``pcmcia-cs``. It properly sets up -PCMCIA sockets at system startup and loads the appropriate modules -for 16-bit PCMCIA devices if the kernel is modularized and the hotplug -subsystem is used. - -Quota-tools ------------ - -Support for 32 bit uid's and gid's is required if you want to use -the newer version 2 quota format. Quota-tools version 3.07 and -newer has this support. Use the recommended version or newer -from the table above. - -Intel IA32 microcode --------------------- - -A driver has been added to allow updating of Intel IA32 microcode, -accessible as a normal (misc) character device. If you are not using -udev you may need to:: - - mkdir /dev/cpu - mknod /dev/cpu/microcode c 10 184 - chmod 0644 /dev/cpu/microcode - -as root before you can use this. You'll probably also want to -get the user-space microcode_ctl utility to use with this. - -udev ----- - -``udev`` is a userspace application for populating ``/dev`` dynamically with -only entries for devices actually present. ``udev`` replaces the basic -functionality of devfs, while allowing persistent device naming for -devices. - -FUSE ----- - -Needs libfuse 2.4.0 or later. Absolute minimum is 2.3.0 but mount -options ``direct_io`` and ``kernel_cache`` won't work. - -Networking -********** - -General changes ---------------- - -If you have advanced network configuration needs, you should probably -consider using the network tools from ip-route2. - -Packet Filter / NAT -------------------- -The packet filtering and NAT code uses the same tools like the previous 2.4.x -kernel series (iptables). It still includes backwards-compatibility modules -for 2.2.x-style ipchains and 2.0.x-style ipfwadm. - -PPP ---- - -The PPP driver has been restructured to support multilink and to -enable it to operate over diverse media layers. If you use PPP, -upgrade pppd to at least 2.4.0. - -If you are not using udev, you must have the device file /dev/ppp -which can be made by:: - - mknod /dev/ppp c 108 0 - -as root. - -Isdn4k-utils ------------- - -Due to changes in the length of the phone number field, isdn4k-utils -needs to be recompiled or (preferably) upgraded. - -NFS-utils ---------- - -In ancient (2.4 and earlier) kernels, the nfs server needed to know -about any client that expected to be able to access files via NFS. This -information would be given to the kernel by ``mountd`` when the client -mounted the filesystem, or by ``exportfs`` at system startup. exportfs -would take information about active clients from ``/var/lib/nfs/rmtab``. - -This approach is quite fragile as it depends on rmtab being correct -which is not always easy, particularly when trying to implement -fail-over. Even when the system is working well, ``rmtab`` suffers from -getting lots of old entries that never get removed. - -With modern kernels we have the option of having the kernel tell mountd -when it gets a request from an unknown host, and mountd can give -appropriate export information to the kernel. This removes the -dependency on ``rmtab`` and means that the kernel only needs to know about -currently active clients. - -To enable this new functionality, you need to:: - - mount -t nfsd nfsd /proc/fs/nfsd - -before running exportfs or mountd. It is recommended that all NFS -services be protected from the internet-at-large by a firewall where -that is possible. - -mcelog ------- - -On x86 kernels the mcelog utility is needed to process and log machine check -events when ``CONFIG_X86_MCE`` is enabled. Machine check events are errors -reported by the CPU. Processing them is strongly encouraged. - -Kernel documentation -******************** - -Sphinx ------- - -The ReST markups currently used by the Documentation/ files are meant to be -built with ``Sphinx`` version 1.2 or upper. If you're desiring to build -PDF outputs, it is recommended to use version 1.4.6. - -.. note:: - - Please notice that, for PDF and LaTeX output, you'll also need ``XeLaTeX`` - version 3.14159265. Depending on the distribution, you may also need - to install a series of ``texlive`` packages that provide the minimal - set of functionalities required for ``XeLaTex`` to work. - -Other tools ------------ - -In order to produce documentation from DocBook, you'll also need ``xmlto``. -Please notice, however, that we're currently migrating all documents to use -``Sphinx``. - -Getting updated software -======================== - -Kernel compilation -****************** - -gcc ---- - -- - -Make ----- - -- - -Binutils --------- - -- - -OpenSSL -------- - -- - -System utilities -**************** - -Util-linux ----------- - -- - -Ksymoops --------- - -- - -Module-Init-Tools ------------------ - -- - -Mkinitrd --------- - -- - -E2fsprogs ---------- - -- - -JFSutils --------- - -- - -Reiserfsprogs -------------- - -- - -Xfsprogs --------- - -- - -Pcmciautils ------------ - -- - -Quota-tools ------------ - -- - -DocBook Stylesheets -------------------- - -- - -XMLTO XSLT Frontend -------------------- - -- - -Intel P6 microcode ------------------- - -- - -udev ----- - -- - -FUSE ----- - -- - -mcelog ------- - -- - -Networking -********** - -PPP ---- - -- - -Isdn4k-utils ------------- - -- - -NFS-utils ---------- - -- - -Iptables --------- - -- - -Ip-route2 ---------- - -- - -OProfile --------- - -- - -NFS-Utils ---------- - -- - -Kernel documentation -******************** - -Sphinx ------- - -- diff --git a/Documentation/Changes b/Documentation/Changes new file mode 120000 index 000000000000..7564ae1682ba --- /dev/null +++ b/Documentation/Changes @@ -0,0 +1 @@ +process/changes.rst \ No newline at end of file diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 9c61c039ccd9..320983ca114e 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -1,1062 +1 @@ -.. _codingstyle: - -Linux kernel coding style -========================= - -This is a short document describing the preferred coding style for the -linux kernel. Coding style is very personal, and I won't **force** my -views on anybody, but this is what goes for anything that I have to be -able to maintain, and I'd prefer it for most other things too. Please -at least consider the points made here. - -First off, I'd suggest printing out a copy of the GNU coding standards, -and NOT read it. Burn them, it's a great symbolic gesture. - -Anyway, here goes: - - -1) Indentation --------------- - -Tabs are 8 characters, and thus indentations are also 8 characters. -There are heretic movements that try to make indentations 4 (or even 2!) -characters deep, and that is akin to trying to define the value of PI to -be 3. - -Rationale: The whole idea behind indentation is to clearly define where -a block of control starts and ends. Especially when you've been looking -at your screen for 20 straight hours, you'll find it a lot easier to see -how the indentation works if you have large indentations. - -Now, some people will claim that having 8-character indentations makes -the code move too far to the right, and makes it hard to read on a -80-character terminal screen. The answer to that is that if you need -more than 3 levels of indentation, you're screwed anyway, and should fix -your program. - -In short, 8-char indents make things easier to read, and have the added -benefit of warning you when you're nesting your functions too deep. -Heed that warning. - -The preferred way to ease multiple indentation levels in a switch statement is -to align the ``switch`` and its subordinate ``case`` labels in the same column -instead of ``double-indenting`` the ``case`` labels. E.g.: - -.. code-block:: c - - switch (suffix) { - case 'G': - case 'g': - mem <<= 30; - break; - case 'M': - case 'm': - mem <<= 20; - break; - case 'K': - case 'k': - mem <<= 10; - /* fall through */ - default: - break; - } - -Don't put multiple statements on a single line unless you have -something to hide: - -.. code-block:: c - - if (condition) do_this; - do_something_everytime; - -Don't put multiple assignments on a single line either. Kernel coding style -is super simple. Avoid tricky expressions. - -Outside of comments, documentation and except in Kconfig, spaces are never -used for indentation, and the above example is deliberately broken. - -Get a decent editor and don't leave whitespace at the end of lines. - - -2) Breaking long lines and strings ----------------------------------- - -Coding style is all about readability and maintainability using commonly -available tools. - -The limit on the length of lines is 80 columns and this is a strongly -preferred limit. - -Statements longer than 80 columns will be broken into sensible chunks, unless -exceeding 80 columns significantly increases readability and does not hide -information. Descendants are always substantially shorter than the parent and -are placed substantially to the right. The same applies to function headers -with a long argument list. However, never break user-visible strings such as -printk messages, because that breaks the ability to grep for them. - - -3) Placing Braces and Spaces ----------------------------- - -The other issue that always comes up in C styling is the placement of -braces. Unlike the indent size, there are few technical reasons to -choose one placement strategy over the other, but the preferred way, as -shown to us by the prophets Kernighan and Ritchie, is to put the opening -brace last on the line, and put the closing brace first, thusly: - -.. code-block:: c - - if (x is true) { - we do y - } - -This applies to all non-function statement blocks (if, switch, for, -while, do). E.g.: - -.. code-block:: c - - switch (action) { - case KOBJ_ADD: - return "add"; - case KOBJ_REMOVE: - return "remove"; - case KOBJ_CHANGE: - return "change"; - default: - return NULL; - } - -However, there is one special case, namely functions: they have the -opening brace at the beginning of the next line, thus: - -.. code-block:: c - - int function(int x) - { - body of function - } - -Heretic people all over the world have claimed that this inconsistency -is ... well ... inconsistent, but all right-thinking people know that -(a) K&R are **right** and (b) K&R are right. Besides, functions are -special anyway (you can't nest them in C). - -Note that the closing brace is empty on a line of its own, **except** in -the cases where it is followed by a continuation of the same statement, -ie a ``while`` in a do-statement or an ``else`` in an if-statement, like -this: - -.. code-block:: c - - do { - body of do-loop - } while (condition); - -and - -.. code-block:: c - - if (x == y) { - .. - } else if (x > y) { - ... - } else { - .... - } - -Rationale: K&R. - -Also, note that this brace-placement also minimizes the number of empty -(or almost empty) lines, without any loss of readability. Thus, as the -supply of new-lines on your screen is not a renewable resource (think -25-line terminal screens here), you have more empty lines to put -comments on. - -Do not unnecessarily use braces where a single statement will do. - -.. code-block:: c - - if (condition) - action(); - -and - -.. code-block:: none - - if (condition) - do_this(); - else - do_that(); - -This does not apply if only one branch of a conditional statement is a single -statement; in the latter case use braces in both branches: - -.. code-block:: c - - if (condition) { - do_this(); - do_that(); - } else { - otherwise(); - } - -3.1) Spaces -*********** - -Linux kernel style for use of spaces depends (mostly) on -function-versus-keyword usage. Use a space after (most) keywords. The -notable exceptions are sizeof, typeof, alignof, and __attribute__, which look -somewhat like functions (and are usually used with parentheses in Linux, -although they are not required in the language, as in: ``sizeof info`` after -``struct fileinfo info;`` is declared). - -So use a space after these keywords:: - - if, switch, case, for, do, while - -but not with sizeof, typeof, alignof, or __attribute__. E.g., - -.. code-block:: c - - - s = sizeof(struct file); - -Do not add spaces around (inside) parenthesized expressions. This example is -**bad**: - -.. code-block:: c - - - s = sizeof( struct file ); - -When declaring pointer data or a function that returns a pointer type, the -preferred use of ``*`` is adjacent to the data name or function name and not -adjacent to the type name. Examples: - -.. code-block:: c - - - char *linux_banner; - unsigned long long memparse(char *ptr, char **retptr); - char *match_strdup(substring_t *s); - -Use one space around (on each side of) most binary and ternary operators, -such as any of these:: - - = + - < > * / % | & ^ <= >= == != ? : - -but no space after unary operators:: - - & * + - ~ ! sizeof typeof alignof __attribute__ defined - -no space before the postfix increment & decrement unary operators:: - - ++ -- - -no space after the prefix increment & decrement unary operators:: - - ++ -- - -and no space around the ``.`` and ``->`` structure member operators. - -Do not leave trailing whitespace at the ends of lines. Some editors with -``smart`` indentation will insert whitespace at the beginning of new lines as -appropriate, so you can start typing the next line of code right away. -However, some such editors do not remove the whitespace if you end up not -putting a line of code there, such as if you leave a blank line. As a result, -you end up with lines containing trailing whitespace. - -Git will warn you about patches that introduce trailing whitespace, and can -optionally strip the trailing whitespace for you; however, if applying a series -of patches, this may make later patches in the series fail by changing their -context lines. - - -4) Naming ---------- - -C is a Spartan language, and so should your naming be. Unlike Modula-2 -and Pascal programmers, C programmers do not use cute names like -ThisVariableIsATemporaryCounter. A C programmer would call that -variable ``tmp``, which is much easier to write, and not the least more -difficult to understand. - -HOWEVER, while mixed-case names are frowned upon, descriptive names for -global variables are a must. To call a global function ``foo`` is a -shooting offense. - -GLOBAL variables (to be used only if you **really** need them) need to -have descriptive names, as do global functions. If you have a function -that counts the number of active users, you should call that -``count_active_users()`` or similar, you should **not** call it ``cntusr()``. - -Encoding the type of a function into the name (so-called Hungarian -notation) is brain damaged - the compiler knows the types anyway and can -check those, and it only confuses the programmer. No wonder MicroSoft -makes buggy programs. - -LOCAL variable names should be short, and to the point. If you have -some random integer loop counter, it should probably be called ``i``. -Calling it ``loop_counter`` is non-productive, if there is no chance of it -being mis-understood. Similarly, ``tmp`` can be just about any type of -variable that is used to hold a temporary value. - -If you are afraid to mix up your local variable names, you have another -problem, which is called the function-growth-hormone-imbalance syndrome. -See chapter 6 (Functions). - - -5) Typedefs ------------ - -Please don't use things like ``vps_t``. -It's a **mistake** to use typedef for structures and pointers. When you see a - -.. code-block:: c - - - vps_t a; - -in the source, what does it mean? -In contrast, if it says - -.. code-block:: c - - struct virtual_container *a; - -you can actually tell what ``a`` is. - -Lots of people think that typedefs ``help readability``. Not so. They are -useful only for: - - (a) totally opaque objects (where the typedef is actively used to **hide** - what the object is). - - Example: ``pte_t`` etc. opaque objects that you can only access using - the proper accessor functions. - - .. note:: - - Opaqueness and ``accessor functions`` are not good in themselves. - The reason we have them for things like pte_t etc. is that there - really is absolutely **zero** portably accessible information there. - - (b) Clear integer types, where the abstraction **helps** avoid confusion - whether it is ``int`` or ``long``. - - u8/u16/u32 are perfectly fine typedefs, although they fit into - category (d) better than here. - - .. note:: - - Again - there needs to be a **reason** for this. If something is - ``unsigned long``, then there's no reason to do - - typedef unsigned long myflags_t; - - but if there is a clear reason for why it under certain circumstances - might be an ``unsigned int`` and under other configurations might be - ``unsigned long``, then by all means go ahead and use a typedef. - - (c) when you use sparse to literally create a **new** type for - type-checking. - - (d) New types which are identical to standard C99 types, in certain - exceptional circumstances. - - Although it would only take a short amount of time for the eyes and - brain to become accustomed to the standard types like ``uint32_t``, - some people object to their use anyway. - - Therefore, the Linux-specific ``u8/u16/u32/u64`` types and their - signed equivalents which are identical to standard types are - permitted -- although they are not mandatory in new code of your - own. - - When editing existing code which already uses one or the other set - of types, you should conform to the existing choices in that code. - - (e) Types safe for use in userspace. - - In certain structures which are visible to userspace, we cannot - require C99 types and cannot use the ``u32`` form above. Thus, we - use __u32 and similar types in all structures which are shared - with userspace. - -Maybe there are other cases too, but the rule should basically be to NEVER -EVER use a typedef unless you can clearly match one of those rules. - -In general, a pointer, or a struct that has elements that can reasonably -be directly accessed should **never** be a typedef. - - -6) Functions ------------- - -Functions should be short and sweet, and do just one thing. They should -fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, -as we all know), and do one thing and do that well. - -The maximum length of a function is inversely proportional to the -complexity and indentation level of that function. So, if you have a -conceptually simple function that is just one long (but simple) -case-statement, where you have to do lots of small things for a lot of -different cases, it's OK to have a longer function. - -However, if you have a complex function, and you suspect that a -less-than-gifted first-year high-school student might not even -understand what the function is all about, you should adhere to the -maximum limits all the more closely. Use helper functions with -descriptive names (you can ask the compiler to in-line them if you think -it's performance-critical, and it will probably do a better job of it -than you would have done). - -Another measure of the function is the number of local variables. They -shouldn't exceed 5-10, or you're doing something wrong. Re-think the -function, and split it into smaller pieces. A human brain can -generally easily keep track of about 7 different things, anything more -and it gets confused. You know you're brilliant, but maybe you'd like -to understand what you did 2 weeks from now. - -In source files, separate functions with one blank line. If the function is -exported, the **EXPORT** macro for it should follow immediately after the -closing function brace line. E.g.: - -.. code-block:: c - - int system_is_up(void) - { - return system_state == SYSTEM_RUNNING; - } - EXPORT_SYMBOL(system_is_up); - -In function prototypes, include parameter names with their data types. -Although this is not required by the C language, it is preferred in Linux -because it is a simple way to add valuable information for the reader. - - -7) Centralized exiting of functions ------------------------------------ - -Albeit deprecated by some people, the equivalent of the goto statement is -used frequently by compilers in form of the unconditional jump instruction. - -The goto statement comes in handy when a function exits from multiple -locations and some common work such as cleanup has to be done. If there is no -cleanup needed then just return directly. - -Choose label names which say what the goto does or why the goto exists. An -example of a good name could be ``out_free_buffer:`` if the goto frees ``buffer``. -Avoid using GW-BASIC names like ``err1:`` and ``err2:``, as you would have to -renumber them if you ever add or remove exit paths, and they make correctness -difficult to verify anyway. - -The rationale for using gotos is: - -- unconditional statements are easier to understand and follow -- nesting is reduced -- errors by not updating individual exit points when making - modifications are prevented -- saves the compiler work to optimize redundant code away ;) - -.. code-block:: c - - int fun(int a) - { - int result = 0; - char *buffer; - - buffer = kmalloc(SIZE, GFP_KERNEL); - if (!buffer) - return -ENOMEM; - - if (condition1) { - while (loop1) { - ... - } - result = 1; - goto out_buffer; - } - ... - out_free_buffer: - kfree(buffer); - return result; - } - -A common type of bug to be aware of is ``one err bugs`` which look like this: - -.. code-block:: c - - err: - kfree(foo->bar); - kfree(foo); - return ret; - -The bug in this code is that on some exit paths ``foo`` is NULL. Normally the -fix for this is to split it up into two error labels ``err_free_bar:`` and -``err_free_foo:``: - -.. code-block:: c - - err_free_bar: - kfree(foo->bar); - err_free_foo: - kfree(foo); - return ret; - -Ideally you should simulate errors to test all exit paths. - - -8) Commenting -------------- - -Comments are good, but there is also a danger of over-commenting. NEVER -try to explain HOW your code works in a comment: it's much better to -write the code so that the **working** is obvious, and it's a waste of -time to explain badly written code. - -Generally, you want your comments to tell WHAT your code does, not HOW. -Also, try to avoid putting comments inside a function body: if the -function is so complex that you need to separately comment parts of it, -you should probably go back to chapter 6 for a while. You can make -small comments to note or warn about something particularly clever (or -ugly), but try to avoid excess. Instead, put the comments at the head -of the function, telling people what it does, and possibly WHY it does -it. - -When commenting the kernel API functions, please use the kernel-doc format. -See the files Documentation/kernel-documentation.rst and scripts/kernel-doc -for details. - -The preferred style for long (multi-line) comments is: - -.. code-block:: c - - /* - * This is the preferred style for multi-line - * comments in the Linux kernel source code. - * Please use it consistently. - * - * Description: A column of asterisks on the left side, - * with beginning and ending almost-blank lines. - */ - -For files in net/ and drivers/net/ the preferred style for long (multi-line) -comments is a little different. - -.. code-block:: c - - /* The preferred comment style for files in net/ and drivers/net - * looks like this. - * - * It is nearly the same as the generally preferred comment style, - * but there is no initial almost-blank line. - */ - -It's also important to comment data, whether they are basic types or derived -types. To this end, use just one data declaration per line (no commas for -multiple data declarations). This leaves you room for a small comment on each -item, explaining its use. - - -9) You've made a mess of it ---------------------------- - -That's OK, we all do. You've probably been told by your long-time Unix -user helper that ``GNU emacs`` automatically formats the C sources for -you, and you've noticed that yes, it does do that, but the defaults it -uses are less than desirable (in fact, they are worse than random -typing - an infinite number of monkeys typing into GNU emacs would never -make a good program). - -So, you can either get rid of GNU emacs, or change it to use saner -values. To do the latter, you can stick the following in your .emacs file: - -.. code-block:: none - - (defun c-lineup-arglist-tabs-only (ignored) - "Line up argument lists by tabs, not spaces" - (let* ((anchor (c-langelem-pos c-syntactic-element)) - (column (c-langelem-2nd-pos c-syntactic-element)) - (offset (- (1+ column) anchor)) - (steps (floor offset c-basic-offset))) - (* (max steps 1) - c-basic-offset))) - - (add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - - (add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) - -This will make emacs go better with the kernel coding style for C -files below ``~/src/linux-trees``. - -But even if you fail in getting emacs to do sane formatting, not -everything is lost: use ``indent``. - -Now, again, GNU indent has the same brain-dead settings that GNU emacs -has, which is why you need to give it a few command line options. -However, that's not too bad, because even the makers of GNU indent -recognize the authority of K&R (the GNU people aren't evil, they are -just severely misguided in this matter), so you just give indent the -options ``-kr -i8`` (stands for ``K&R, 8 character indents``), or use -``scripts/Lindent``, which indents in the latest style. - -``indent`` has a lot of options, and especially when it comes to comment -re-formatting you may want to take a look at the man page. But -remember: ``indent`` is not a fix for bad programming. - - -10) Kconfig configuration files -------------------------------- - -For all of the Kconfig* configuration files throughout the source tree, -the indentation is somewhat different. Lines under a ``config`` definition -are indented with one tab, while help text is indented an additional two -spaces. Example:: - - config AUDIT - bool "Auditing support" - depends on NET - help - Enable auditing infrastructure that can be used with another - kernel subsystem, such as SELinux (which requires this for - logging of avc messages output). Does not do system-call - auditing without CONFIG_AUDITSYSCALL. - -Seriously dangerous features (such as write support for certain -filesystems) should advertise this prominently in their prompt string:: - - config ADFS_FS_RW - bool "ADFS write support (DANGEROUS)" - depends on ADFS_FS - ... - -For full documentation on the configuration files, see the file -Documentation/kbuild/kconfig-language.txt. - - -11) Data structures -------------------- - -Data structures that have visibility outside the single-threaded -environment they are created and destroyed in should always have -reference counts. In the kernel, garbage collection doesn't exist (and -outside the kernel garbage collection is slow and inefficient), which -means that you absolutely **have** to reference count all your uses. - -Reference counting means that you can avoid locking, and allows multiple -users to have access to the data structure in parallel - and not having -to worry about the structure suddenly going away from under them just -because they slept or did something else for a while. - -Note that locking is **not** a replacement for reference counting. -Locking is used to keep data structures coherent, while reference -counting is a memory management technique. Usually both are needed, and -they are not to be confused with each other. - -Many data structures can indeed have two levels of reference counting, -when there are users of different ``classes``. The subclass count counts -the number of subclass users, and decrements the global count just once -when the subclass count goes to zero. - -Examples of this kind of ``multi-level-reference-counting`` can be found in -memory management (``struct mm_struct``: mm_users and mm_count), and in -filesystem code (``struct super_block``: s_count and s_active). - -Remember: if another thread can find your data structure, and you don't -have a reference count on it, you almost certainly have a bug. - - -12) Macros, Enums and RTL -------------------------- - -Names of macros defining constants and labels in enums are capitalized. - -.. code-block:: c - - #define CONSTANT 0x12345 - -Enums are preferred when defining several related constants. - -CAPITALIZED macro names are appreciated but macros resembling functions -may be named in lower case. - -Generally, inline functions are preferable to macros resembling functions. - -Macros with multiple statements should be enclosed in a do - while block: - -.. code-block:: c - - #define macrofun(a, b, c) \ - do { \ - if (a == 5) \ - do_this(b, c); \ - } while (0) - -Things to avoid when using macros: - -1) macros that affect control flow: - -.. code-block:: c - - #define FOO(x) \ - do { \ - if (blah(x) < 0) \ - return -EBUGGERED; \ - } while (0) - -is a **very** bad idea. It looks like a function call but exits the ``calling`` -function; don't break the internal parsers of those who will read the code. - -2) macros that depend on having a local variable with a magic name: - -.. code-block:: c - - #define FOO(val) bar(index, val) - -might look like a good thing, but it's confusing as hell when one reads the -code and it's prone to breakage from seemingly innocent changes. - -3) macros with arguments that are used as l-values: FOO(x) = y; will -bite you if somebody e.g. turns FOO into an inline function. - -4) forgetting about precedence: macros defining constants using expressions -must enclose the expression in parentheses. Beware of similar issues with -macros using parameters. - -.. code-block:: c - - #define CONSTANT 0x4000 - #define CONSTEXP (CONSTANT | 3) - -5) namespace collisions when defining local variables in macros resembling -functions: - -.. code-block:: c - - #define FOO(x) \ - ({ \ - typeof(x) ret; \ - ret = calc_ret(x); \ - (ret); \ - }) - -ret is a common name for a local variable - __foo_ret is less likely -to collide with an existing variable. - -The cpp manual deals with macros exhaustively. The gcc internals manual also -covers RTL which is used frequently with assembly language in the kernel. - - -13) Printing kernel messages ----------------------------- - -Kernel developers like to be seen as literate. Do mind the spelling -of kernel messages to make a good impression. Do not use crippled -words like ``dont``; use ``do not`` or ``don't`` instead. Make the messages -concise, clear, and unambiguous. - -Kernel messages do not have to be terminated with a period. - -Printing numbers in parentheses (%d) adds no value and should be avoided. - -There are a number of driver model diagnostic macros in -which you should use to make sure messages are matched to the right device -and driver, and are tagged with the right level: dev_err(), dev_warn(), -dev_info(), and so forth. For messages that aren't associated with a -particular device, defines pr_notice(), pr_info(), -pr_warn(), pr_err(), etc. - -Coming up with good debugging messages can be quite a challenge; and once -you have them, they can be a huge help for remote troubleshooting. However -debug message printing is handled differently than printing other non-debug -messages. While the other pr_XXX() functions print unconditionally, -pr_debug() does not; it is compiled out by default, unless either DEBUG is -defined or CONFIG_DYNAMIC_DEBUG is set. That is true for dev_dbg() also, -and a related convention uses VERBOSE_DEBUG to add dev_vdbg() messages to -the ones already enabled by DEBUG. - -Many subsystems have Kconfig debug options to turn on -DDEBUG in the -corresponding Makefile; in other cases specific files #define DEBUG. And -when a debug message should be unconditionally printed, such as if it is -already inside a debug-related #ifdef section, printk(KERN_DEBUG ...) can be -used. - - -14) Allocating memory ---------------------- - -The kernel provides the following general purpose memory allocators: -kmalloc(), kzalloc(), kmalloc_array(), kcalloc(), vmalloc(), and -vzalloc(). Please refer to the API documentation for further information -about them. - -The preferred form for passing a size of a struct is the following: - -.. code-block:: c - - p = kmalloc(sizeof(*p), ...); - -The alternative form where struct name is spelled out hurts readability and -introduces an opportunity for a bug when the pointer variable type is changed -but the corresponding sizeof that is passed to a memory allocator is not. - -Casting the return value which is a void pointer is redundant. The conversion -from void pointer to any other pointer type is guaranteed by the C programming -language. - -The preferred form for allocating an array is the following: - -.. code-block:: c - - p = kmalloc_array(n, sizeof(...), ...); - -The preferred form for allocating a zeroed array is the following: - -.. code-block:: c - - p = kcalloc(n, sizeof(...), ...); - -Both forms check for overflow on the allocation size n * sizeof(...), -and return NULL if that occurred. - - -15) The inline disease ----------------------- - -There appears to be a common misperception that gcc has a magic "make me -faster" speedup option called ``inline``. While the use of inlines can be -appropriate (for example as a means of replacing macros, see Chapter 12), it -very often is not. Abundant use of the inline keyword leads to a much bigger -kernel, which in turn slows the system as a whole down, due to a bigger -icache footprint for the CPU and simply because there is less memory -available for the pagecache. Just think about it; a pagecache miss causes a -disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles -that can go into these 5 milliseconds. - -A reasonable rule of thumb is to not put inline at functions that have more -than 3 lines of code in them. An exception to this rule are the cases where -a parameter is known to be a compiletime constant, and as a result of this -constantness you *know* the compiler will be able to optimize most of your -function away at compile time. For a good example of this later case, see -the kmalloc() inline function. - -Often people argue that adding inline to functions that are static and used -only once is always a win since there is no space tradeoff. While this is -technically correct, gcc is capable of inlining these automatically without -help, and the maintenance issue of removing the inline when a second user -appears outweighs the potential value of the hint that tells gcc to do -something it would have done anyway. - - -16) Function return values and names ------------------------------------- - -Functions can return values of many different kinds, and one of the -most common is a value indicating whether the function succeeded or -failed. Such a value can be represented as an error-code integer -(-Exxx = failure, 0 = success) or a ``succeeded`` boolean (0 = failure, -non-zero = success). - -Mixing up these two sorts of representations is a fertile source of -difficult-to-find bugs. If the C language included a strong distinction -between integers and booleans then the compiler would find these mistakes -for us... but it doesn't. To help prevent such bugs, always follow this -convention:: - - If the name of a function is an action or an imperative command, - the function should return an error-code integer. If the name - is a predicate, the function should return a "succeeded" boolean. - -For example, ``add work`` is a command, and the add_work() function returns 0 -for success or -EBUSY for failure. In the same way, ``PCI device present`` is -a predicate, and the pci_dev_present() function returns 1 if it succeeds in -finding a matching device or 0 if it doesn't. - -All EXPORTed functions must respect this convention, and so should all -public functions. Private (static) functions need not, but it is -recommended that they do. - -Functions whose return value is the actual result of a computation, rather -than an indication of whether the computation succeeded, are not subject to -this rule. Generally they indicate failure by returning some out-of-range -result. Typical examples would be functions that return pointers; they use -NULL or the ERR_PTR mechanism to report failure. - - -17) Don't re-invent the kernel macros -------------------------------------- - -The header file include/linux/kernel.h contains a number of macros that -you should use, rather than explicitly coding some variant of them yourself. -For example, if you need to calculate the length of an array, take advantage -of the macro - -.. code-block:: c - - #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) - -Similarly, if you need to calculate the size of some structure member, use - -.. code-block:: c - - #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f)) - -There are also min() and max() macros that do strict type checking if you -need them. Feel free to peruse that header file to see what else is already -defined that you shouldn't reproduce in your code. - - -18) Editor modelines and other cruft ------------------------------------- - -Some editors can interpret configuration information embedded in source files, -indicated with special markers. For example, emacs interprets lines marked -like this: - -.. code-block:: c - - -*- mode: c -*- - -Or like this: - -.. code-block:: c - - /* - Local Variables: - compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c" - End: - */ - -Vim interprets markers that look like this: - -.. code-block:: c - - /* vim:set sw=8 noet */ - -Do not include any of these in source files. People have their own personal -editor configurations, and your source files should not override them. This -includes markers for indentation and mode configuration. People may use their -own custom mode, or may have some other magic method for making indentation -work correctly. - - -19) Inline assembly -------------------- - -In architecture-specific code, you may need to use inline assembly to interface -with CPU or platform functionality. Don't hesitate to do so when necessary. -However, don't use inline assembly gratuitously when C can do the job. You can -and should poke hardware from C when possible. - -Consider writing simple helper functions that wrap common bits of inline -assembly, rather than repeatedly writing them with slight variations. Remember -that inline assembly can use C parameters. - -Large, non-trivial assembly functions should go in .S files, with corresponding -C prototypes defined in C header files. The C prototypes for assembly -functions should use ``asmlinkage``. - -You may need to mark your asm statement as volatile, to prevent GCC from -removing it if GCC doesn't notice any side effects. You don't always need to -do so, though, and doing so unnecessarily can limit optimization. - -When writing a single inline assembly statement containing multiple -instructions, put each instruction on a separate line in a separate quoted -string, and end each string except the last with \n\t to properly indent the -next instruction in the assembly output: - -.. code-block:: c - - asm ("magic %reg1, #42\n\t" - "more_magic %reg2, %reg3" - : /* outputs */ : /* inputs */ : /* clobbers */); - - -20) Conditional Compilation ---------------------------- - -Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c -files; doing so makes code harder to read and logic harder to follow. Instead, -use such conditionals in a header file defining functions for use in those .c -files, providing no-op stub versions in the #else case, and then call those -functions unconditionally from .c files. The compiler will avoid generating -any code for the stub calls, producing identical results, but the logic will -remain easy to follow. - -Prefer to compile out entire functions, rather than portions of functions or -portions of expressions. Rather than putting an ifdef in an expression, factor -out part or all of the expression into a separate helper function and apply the -conditional to that function. - -If you have a function or variable which may potentially go unused in a -particular configuration, and the compiler would warn about its definition -going unused, mark the definition as __maybe_unused rather than wrapping it in -a preprocessor conditional. (However, if a function or variable *always* goes -unused, delete it.) - -Within code, where possible, use the IS_ENABLED macro to convert a Kconfig -symbol into a C boolean expression, and use it in a normal C conditional: - -.. code-block:: c - - if (IS_ENABLED(CONFIG_SOMETHING)) { - ... - } - -The compiler will constant-fold the conditional away, and include or exclude -the block of code just as with an #ifdef, so this will not add any runtime -overhead. However, this approach still allows the C compiler to see the code -inside the block, and check it for correctness (syntax, types, symbol -references, etc). Thus, you still have to use an #ifdef if the code inside the -block references symbols that will not exist if the condition is not met. - -At the end of any non-trivial #if or #ifdef block (more than a few lines), -place a comment after the #endif on the same line, noting the conditional -expression used. For instance: - -.. code-block:: c - - #ifdef CONFIG_SOMETHING - ... - #endif /* CONFIG_SOMETHING */ - - -Appendix I) References ----------------------- - -The C Programming Language, Second Edition -by Brian W. Kernighan and Dennis M. Ritchie. -Prentice Hall, Inc., 1988. -ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback). - -The Practice of Programming -by Brian W. Kernighan and Rob Pike. -Addison-Wesley, Inc., 1999. -ISBN 0-201-61586-X. - -GNU manuals - where in compliance with K&R and this text - for cpp, gcc, -gcc internals and indent, all available from http://www.gnu.org/manual/ - -WG14 is the international standardization working group for the programming -language C, URL: http://www.open-std.org/JTC1/SC22/WG14/ - -Kernel CodingStyle, by greg@kroah.com at OLS 2002: -http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ +This file has moved to process/coding-style.rst diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index fdf8232d0eeb..a6eb7dcd4dd5 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -9,13 +9,11 @@ DOCBOOKS := z8530book.xml \ kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ writing_usb_driver.xml networking.xml \ - kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ + kernel-api.xml filesystems.xml lsm.xml kgdb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ - debugobjects.xml sh.xml regulator.xml \ - alsa-driver-api.xml writing-an-alsa-driver.xml \ - tracepoint.xml w1.xml \ - writing_musb_glue_layer.xml crypto-API.xml iio.xml + sh.xml regulator.xml w1.xml \ + writing_musb_glue_layer.xml iio.xml ifeq ($(DOCBOOKS),) @@ -264,6 +262,7 @@ clean-files := $(DOCBOOKS) \ $(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \ $(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \ $(patsubst %.xml, %.xml, $(DOCBOOKS)) \ + $(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \ $(index) clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man diff --git a/Documentation/DocBook/alsa-driver-api.tmpl b/Documentation/DocBook/alsa-driver-api.tmpl deleted file mode 100644 index 53f439dcc94b..000000000000 --- a/Documentation/DocBook/alsa-driver-api.tmpl +++ /dev/null @@ -1,142 +0,0 @@ - - - - - - - - - The ALSA Driver API - - - - This document is free; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. - - - - This document is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the - implied warranty of MERCHANTABILITY or FITNESS FOR A - PARTICULAR PURPOSE. See the GNU General Public License - for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - - - - - Management of Cards and Devices - Card Management -!Esound/core/init.c - - Device Components -!Esound/core/device.c - - Module requests and Device File Entries -!Esound/core/sound.c - - Memory Management Helpers -!Esound/core/memory.c -!Esound/core/memalloc.c - - - PCM API - PCM Core -!Esound/core/pcm.c -!Esound/core/pcm_lib.c -!Esound/core/pcm_native.c -!Iinclude/sound/pcm.h - - PCM Format Helpers -!Esound/core/pcm_misc.c - - PCM Memory Management -!Esound/core/pcm_memory.c - - PCM DMA Engine API -!Esound/core/pcm_dmaengine.c -!Iinclude/sound/dmaengine_pcm.h - - - Control/Mixer API - General Control Interface -!Esound/core/control.c - - AC97 Codec API -!Esound/pci/ac97/ac97_codec.c -!Esound/pci/ac97/ac97_pcm.c - - Virtual Master Control API -!Esound/core/vmaster.c -!Iinclude/sound/control.h - - - MIDI API - Raw MIDI API -!Esound/core/rawmidi.c - - MPU401-UART API -!Esound/drivers/mpu401/mpu401_uart.c - - - Proc Info API - Proc Info Interface -!Esound/core/info.c - - - Compress Offload - Compress Offload API -!Esound/core/compress_offload.c -!Iinclude/uapi/sound/compress_offload.h -!Iinclude/uapi/sound/compress_params.h -!Iinclude/sound/compress_driver.h - - - ASoC - ASoC Core API -!Iinclude/sound/soc.h -!Esound/soc/soc-core.c - -!Esound/soc/soc-devres.c -!Esound/soc/soc-io.c -!Esound/soc/soc-pcm.c -!Esound/soc/soc-ops.c -!Esound/soc/soc-compress.c - - ASoC DAPM API -!Esound/soc/soc-dapm.c - - ASoC DMA Engine API -!Esound/soc/soc-generic-dmaengine-pcm.c - - - Miscellaneous Functions - Hardware-Dependent Devices API -!Esound/core/hwdep.c - - Jack Abstraction Layer API -!Iinclude/sound/jack.h -!Esound/core/jack.c -!Esound/soc/soc-jack.c - - ISA DMA Helpers -!Esound/core/isadma.c - - Other Helper Macros -!Iinclude/sound/core.h - - - - diff --git a/Documentation/DocBook/crypto-API.tmpl b/Documentation/DocBook/crypto-API.tmpl deleted file mode 100644 index 088b79c341ff..000000000000 --- a/Documentation/DocBook/crypto-API.tmpl +++ /dev/null @@ -1,2092 +0,0 @@ - - - - - - Linux Kernel Crypto API - - - - Stephan - Mueller - -
- smueller@chronox.de -
-
-
- - Marek - Vasut - -
- marek@denx.de -
-
-
-
- - - 2014 - Stephan Mueller - - - - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - - Kernel Crypto API Interface Specification - - Introduction - - - The kernel crypto API offers a rich set of cryptographic ciphers as - well as other data transformation mechanisms and methods to invoke - these. This document contains a description of the API and provides - example code. - - - - To understand and properly use the kernel crypto API a brief - explanation of its structure is given. Based on the architecture, - the API can be separated into different components. Following the - architecture specification, hints to developers of ciphers are - provided. Pointers to the API function call documentation are - given at the end. - - - - The kernel crypto API refers to all algorithms as "transformations". - Therefore, a cipher handle variable usually has the name "tfm". - Besides cryptographic operations, the kernel crypto API also knows - compression transformations and handles them the same way as ciphers. - - - - The kernel crypto API serves the following entity types: - - - - consumers requesting cryptographic services - - - data transformation implementations (typically ciphers) - that can be called by consumers using the kernel crypto - API - - - - - - This specification is intended for consumers of the kernel crypto - API as well as for developers implementing ciphers. This API - specification, however, does not discuss all API calls available - to data transformation implementations (i.e. implementations of - ciphers and other transformations (such as CRC or even compression - algorithms) that can register with the kernel crypto API). - - - - Note: The terms "transformation" and cipher algorithm are used - interchangeably. - - - - Terminology - - The transformation implementation is an actual code or interface - to hardware which implements a certain transformation with precisely - defined behavior. - - - - The transformation object (TFM) is an instance of a transformation - implementation. There can be multiple transformation objects - associated with a single transformation implementation. Each of - those transformation objects is held by a crypto API consumer or - another transformation. Transformation object is allocated when a - crypto API consumer requests a transformation implementation. - The consumer is then provided with a structure, which contains - a transformation object (TFM). - - - - The structure that contains transformation objects may also be - referred to as a "cipher handle". Such a cipher handle is always - subject to the following phases that are reflected in the API calls - applicable to such a cipher handle: - - - - - Initialization of a cipher handle. - - - Execution of all intended cipher operations applicable - for the handle where the cipher handle must be furnished to - every API call. - - - Destruction of a cipher handle. - - - - - When using the initialization API calls, a cipher handle is - created and returned to the consumer. Therefore, please refer - to all initialization API calls that refer to the data - structure type a consumer is expected to receive and subsequently - to use. The initialization API calls have all the same naming - conventions of crypto_alloc_*. - - - - The transformation context is private data associated with - the transformation object. - - - - - Kernel Crypto API Architecture - Cipher algorithm types - - The kernel crypto API provides different API calls for the - following cipher types: - - - Symmetric ciphers - AEAD ciphers - Message digest, including keyed message digest - Random number generation - User space interface - - - - - Ciphers And Templates - - The kernel crypto API provides implementations of single block - ciphers and message digests. In addition, the kernel crypto API - provides numerous "templates" that can be used in conjunction - with the single block ciphers and message digests. Templates - include all types of block chaining mode, the HMAC mechanism, etc. - - - - Single block ciphers and message digests can either be directly - used by a caller or invoked together with a template to form - multi-block ciphers or keyed message digests. - - - - A single block cipher may even be called with multiple templates. - However, templates cannot be used without a single cipher. - - - - See /proc/crypto and search for "name". For example: - - - aes - ecb(aes) - cmac(aes) - ccm(aes) - rfc4106(gcm(aes)) - sha1 - hmac(sha1) - authenc(hmac(sha1),cbc(aes)) - - - - - In these examples, "aes" and "sha1" are the ciphers and all - others are the templates. - - - - Synchronous And Asynchronous Operation - - The kernel crypto API provides synchronous and asynchronous - API operations. - - - - When using the synchronous API operation, the caller invokes - a cipher operation which is performed synchronously by the - kernel crypto API. That means, the caller waits until the - cipher operation completes. Therefore, the kernel crypto API - calls work like regular function calls. For synchronous - operation, the set of API calls is small and conceptually - similar to any other crypto library. - - - - Asynchronous operation is provided by the kernel crypto API - which implies that the invocation of a cipher operation will - complete almost instantly. That invocation triggers the - cipher operation but it does not signal its completion. Before - invoking a cipher operation, the caller must provide a callback - function the kernel crypto API can invoke to signal the - completion of the cipher operation. Furthermore, the caller - must ensure it can handle such asynchronous events by applying - appropriate locking around its data. The kernel crypto API - does not perform any special serialization operation to protect - the caller's data integrity. - - - - Crypto API Cipher References And Priority - - A cipher is referenced by the caller with a string. That string - has the following semantics: - - - template(single block cipher) - - - where "template" and "single block cipher" is the aforementioned - template and single block cipher, respectively. If applicable, - additional templates may enclose other templates, such as - - - template1(template2(single block cipher))) - - - - - The kernel crypto API may provide multiple implementations of a - template or a single block cipher. For example, AES on newer - Intel hardware has the following implementations: AES-NI, - assembler implementation, or straight C. Now, when using the - string "aes" with the kernel crypto API, which cipher - implementation is used? The answer to that question is the - priority number assigned to each cipher implementation by the - kernel crypto API. When a caller uses the string to refer to a - cipher during initialization of a cipher handle, the kernel - crypto API looks up all implementations providing an - implementation with that name and selects the implementation - with the highest priority. - - - - Now, a caller may have the need to refer to a specific cipher - implementation and thus does not want to rely on the - priority-based selection. To accommodate this scenario, the - kernel crypto API allows the cipher implementation to register - a unique name in addition to common names. When using that - unique name, a caller is therefore always sure to refer to - the intended cipher implementation. - - - - The list of available ciphers is given in /proc/crypto. However, - that list does not specify all possible permutations of - templates and ciphers. Each block listed in /proc/crypto may - contain the following information -- if one of the components - listed as follows are not applicable to a cipher, it is not - displayed: - - - - - name: the generic name of the cipher that is subject - to the priority-based selection -- this name can be used by - the cipher allocation API calls (all names listed above are - examples for such generic names) - - - driver: the unique name of the cipher -- this name can - be used by the cipher allocation API calls - - - module: the kernel module providing the cipher - implementation (or "kernel" for statically linked ciphers) - - - priority: the priority value of the cipher implementation - - - refcnt: the reference count of the respective cipher - (i.e. the number of current consumers of this cipher) - - - selftest: specification whether the self test for the - cipher passed - - - type: - - - skcipher for symmetric key ciphers - - - cipher for single block ciphers that may be used with - an additional template - - - shash for synchronous message digest - - - ahash for asynchronous message digest - - - aead for AEAD cipher type - - - compression for compression type transformations - - - rng for random number generator - - - givcipher for cipher with associated IV generator - (see the geniv entry below for the specification of the - IV generator type used by the cipher implementation) - - - - - - blocksize: blocksize of cipher in bytes - - - keysize: key size in bytes - - - ivsize: IV size in bytes - - - seedsize: required size of seed data for random number - generator - - - digestsize: output size of the message digest - - - geniv: IV generation type: - - - eseqiv for encrypted sequence number based IV - generation - - - seqiv for sequence number based IV generation - - - chainiv for chain iv generation - - - <builtin> is a marker that the cipher implements - IV generation and handling as it is specific to the given - cipher - - - - - - - - Key Sizes - - When allocating a cipher handle, the caller only specifies the - cipher type. Symmetric ciphers, however, typically support - multiple key sizes (e.g. AES-128 vs. AES-192 vs. AES-256). - These key sizes are determined with the length of the provided - key. Thus, the kernel crypto API does not provide a separate - way to select the particular symmetric cipher key size. - - - - Cipher Allocation Type And Masks - - The different cipher handle allocation functions allow the - specification of a type and mask flag. Both parameters have - the following meaning (and are therefore not covered in the - subsequent sections). - - - - The type flag specifies the type of the cipher algorithm. - The caller usually provides a 0 when the caller wants the - default handling. Otherwise, the caller may provide the - following selections which match the aforementioned cipher - types: - - - - - CRYPTO_ALG_TYPE_CIPHER Single block cipher - - - CRYPTO_ALG_TYPE_COMPRESS Compression - - - CRYPTO_ALG_TYPE_AEAD Authenticated Encryption with - Associated Data (MAC) - - - CRYPTO_ALG_TYPE_BLKCIPHER Synchronous multi-block cipher - - - CRYPTO_ALG_TYPE_ABLKCIPHER Asynchronous multi-block cipher - - - CRYPTO_ALG_TYPE_GIVCIPHER Asynchronous multi-block - cipher packed together with an IV generator (see geniv field - in the /proc/crypto listing for the known IV generators) - - - CRYPTO_ALG_TYPE_DIGEST Raw message digest - - - CRYPTO_ALG_TYPE_HASH Alias for CRYPTO_ALG_TYPE_DIGEST - - - CRYPTO_ALG_TYPE_SHASH Synchronous multi-block hash - - - CRYPTO_ALG_TYPE_AHASH Asynchronous multi-block hash - - - CRYPTO_ALG_TYPE_RNG Random Number Generation - - - CRYPTO_ALG_TYPE_AKCIPHER Asymmetric cipher - - - CRYPTO_ALG_TYPE_PCOMPRESS Enhanced version of - CRYPTO_ALG_TYPE_COMPRESS allowing for segmented compression / - decompression instead of performing the operation on one - segment only. CRYPTO_ALG_TYPE_PCOMPRESS is intended to replace - CRYPTO_ALG_TYPE_COMPRESS once existing consumers are converted. - - - - - The mask flag restricts the type of cipher. The only allowed - flag is CRYPTO_ALG_ASYNC to restrict the cipher lookup function - to asynchronous ciphers. Usually, a caller provides a 0 for the - mask flag. - - - - When the caller provides a mask and type specification, the - caller limits the search the kernel crypto API can perform for - a suitable cipher implementation for the given cipher name. - That means, even when a caller uses a cipher name that exists - during its initialization call, the kernel crypto API may not - select it due to the used type and mask field. - - - - Internal Structure of Kernel Crypto API - - - The kernel crypto API has an internal structure where a cipher - implementation may use many layers and indirections. This section - shall help to clarify how the kernel crypto API uses - various components to implement the complete cipher. - - - - The following subsections explain the internal structure based - on existing cipher implementations. The first section addresses - the most complex scenario where all other scenarios form a logical - subset. - - - Generic AEAD Cipher Structure - - - The following ASCII art decomposes the kernel crypto API layers - when using the AEAD cipher with the automated IV generation. The - shown example is used by the IPSEC layer. - - - - For other use cases of AEAD ciphers, the ASCII art applies as - well, but the caller may not use the AEAD cipher with a separate - IV generator. In this case, the caller must generate the IV. - - - - The depicted example decomposes the AEAD cipher of GCM(AES) based - on the generic C implementations (gcm.c, aes-generic.c, ctr.c, - ghash-generic.c, seqiv.c). The generic implementation serves as an - example showing the complete logic of the kernel crypto API. - - - - It is possible that some streamlined cipher implementations (like - AES-NI) provide implementations merging aspects which in the view - of the kernel crypto API cannot be decomposed into layers any more. - In case of the AES-NI implementation, the CTR mode, the GHASH - implementation and the AES cipher are all merged into one cipher - implementation registered with the kernel crypto API. In this case, - the concept described by the following ASCII art applies too. However, - the decomposition of GCM into the individual sub-components - by the kernel crypto API is not done any more. - - - - Each block in the following ASCII art is an independent cipher - instance obtained from the kernel crypto API. Each block - is accessed by the caller or by other blocks using the API functions - defined by the kernel crypto API for the cipher implementation type. - - - - The blocks below indicate the cipher type as well as the specific - logic implemented in the cipher. - - - - The ASCII art picture also indicates the call structure, i.e. who - calls which component. The arrows point to the invoked block - where the caller uses the API applicable to the cipher type - specified for the block. - - - - - - - - The following call sequence is applicable when the IPSEC layer - triggers an encryption operation with the esp_output function. During - configuration, the administrator set up the use of rfc4106(gcm(aes)) as - the cipher for ESP. The following call sequence is now depicted in the - ASCII art above: - - - - - - esp_output() invokes crypto_aead_encrypt() to trigger an encryption - operation of the AEAD cipher with IV generator. - - - - In case of GCM, the SEQIV implementation is registered as GIVCIPHER - in crypto_rfc4106_alloc(). - - - - The SEQIV performs its operation to generate an IV where the core - function is seqiv_geniv(). - - - - - - Now, SEQIV uses the AEAD API function calls to invoke the associated - AEAD cipher. In our case, during the instantiation of SEQIV, the - cipher handle for GCM is provided to SEQIV. This means that SEQIV - invokes AEAD cipher operations with the GCM cipher handle. - - - - During instantiation of the GCM handle, the CTR(AES) and GHASH - ciphers are instantiated. The cipher handles for CTR(AES) and GHASH - are retained for later use. - - - - The GCM implementation is responsible to invoke the CTR mode AES and - the GHASH cipher in the right manner to implement the GCM - specification. - - - - - - The GCM AEAD cipher type implementation now invokes the SKCIPHER API - with the instantiated CTR(AES) cipher handle. - - - - During instantiation of the CTR(AES) cipher, the CIPHER type - implementation of AES is instantiated. The cipher handle for AES is - retained. - - - - That means that the SKCIPHER implementation of CTR(AES) only - implements the CTR block chaining mode. After performing the block - chaining operation, the CIPHER implementation of AES is invoked. - - - - - - The SKCIPHER of CTR(AES) now invokes the CIPHER API with the AES - cipher handle to encrypt one block. - - - - - - The GCM AEAD implementation also invokes the GHASH cipher - implementation via the AHASH API. - - - - - - When the IPSEC layer triggers the esp_input() function, the same call - sequence is followed with the only difference that the operation starts - with step (2). - - - - Generic Block Cipher Structure - - Generic block ciphers follow the same concept as depicted with the ASCII - art picture above. - - - - For example, CBC(AES) is implemented with cbc.c, and aes-generic.c. The - ASCII art picture above applies as well with the difference that only - step (4) is used and the SKCIPHER block chaining mode is CBC. - - - - Generic Keyed Message Digest Structure - - Keyed message digest implementations again follow the same concept as - depicted in the ASCII art picture above. - - - - For example, HMAC(SHA256) is implemented with hmac.c and - sha256_generic.c. The following ASCII art illustrates the - implementation: - - - - - - - - The following call sequence is applicable when a caller triggers - an HMAC operation: - - - - - - The AHASH API functions are invoked by the caller. The HMAC - implementation performs its operation as needed. - - - - During initialization of the HMAC cipher, the SHASH cipher type of - SHA256 is instantiated. The cipher handle for the SHA256 instance is - retained. - - - - At one time, the HMAC implementation requires a SHA256 operation - where the SHA256 cipher handle is used. - - - - - - The HMAC instance now invokes the SHASH API with the SHA256 - cipher handle to calculate the message digest. - - - - - - - - Developing Cipher Algorithms - Registering And Unregistering Transformation - - There are three distinct types of registration functions in - the Crypto API. One is used to register a generic cryptographic - transformation, while the other two are specific to HASH - transformations and COMPRESSion. We will discuss the latter - two in a separate chapter, here we will only look at the - generic ones. - - - - Before discussing the register functions, the data structure - to be filled with each, struct crypto_alg, must be considered - -- see below for a description of this data structure. - - - - The generic registration functions can be found in - include/linux/crypto.h and their definition can be seen below. - The former function registers a single transformation, while - the latter works on an array of transformation descriptions. - The latter is useful when registering transformations in bulk, - for example when a driver implements multiple transformations. - - - - int crypto_register_alg(struct crypto_alg *alg); - int crypto_register_algs(struct crypto_alg *algs, int count); - - - - The counterparts to those functions are listed below. - - - - int crypto_unregister_alg(struct crypto_alg *alg); - int crypto_unregister_algs(struct crypto_alg *algs, int count); - - - - Notice that both registration and unregistration functions - do return a value, so make sure to handle errors. A return - code of zero implies success. Any return code < 0 implies - an error. - - - - The bulk registration/unregistration functions - register/unregister each transformation in the given array of - length count. They handle errors as follows: - - - - - crypto_register_algs() succeeds if and only if it - successfully registers all the given transformations. If an - error occurs partway through, then it rolls back successful - registrations before returning the error code. Note that if - a driver needs to handle registration errors for individual - transformations, then it will need to use the non-bulk - function crypto_register_alg() instead. - - - - - crypto_unregister_algs() tries to unregister all the given - transformations, continuing on error. It logs errors and - always returns zero. - - - - - - - Single-Block Symmetric Ciphers [CIPHER] - - Example of transformations: aes, arc4, ... - - - - This section describes the simplest of all transformation - implementations, that being the CIPHER type used for symmetric - ciphers. The CIPHER type is used for transformations which - operate on exactly one block at a time and there are no - dependencies between blocks at all. - - - Registration specifics - - The registration of [CIPHER] algorithm is specific in that - struct crypto_alg field .cra_type is empty. The .cra_u.cipher - has to be filled in with proper callbacks to implement this - transformation. - - - - See struct cipher_alg below. - - - - Cipher Definition With struct cipher_alg - - Struct cipher_alg defines a single block cipher. - - - - Here are schematics of how these functions are called when - operated from other part of the kernel. Note that the - .cia_setkey() call might happen before or after any of these - schematics happen, but must not happen during any of these - are in-flight. - - - - - KEY ---. PLAINTEXT ---. - v v - .cia_setkey() -> .cia_encrypt() - | - '-----> CIPHERTEXT - - - - - Please note that a pattern where .cia_setkey() is called - multiple times is also valid: - - - - - - KEY1 --. PLAINTEXT1 --. KEY2 --. PLAINTEXT2 --. - v v v v - .cia_setkey() -> .cia_encrypt() -> .cia_setkey() -> .cia_encrypt() - | | - '---> CIPHERTEXT1 '---> CIPHERTEXT2 - - - - - - - Multi-Block Ciphers - - Example of transformations: cbc(aes), ecb(arc4), ... - - - - This section describes the multi-block cipher transformation - implementations. The multi-block ciphers are - used for transformations which operate on scatterlists of - data supplied to the transformation functions. They output - the result into a scatterlist of data as well. - - - Registration Specifics - - - The registration of multi-block cipher algorithms - is one of the most standard procedures throughout the crypto API. - - - - Note, if a cipher implementation requires a proper alignment - of data, the caller should use the functions of - crypto_skcipher_alignmask() to identify a memory alignment mask. - The kernel crypto API is able to process requests that are unaligned. - This implies, however, additional overhead as the kernel - crypto API needs to perform the realignment of the data which - may imply moving of data. - - - - Cipher Definition With struct blkcipher_alg and ablkcipher_alg - - Struct blkcipher_alg defines a synchronous block cipher whereas - struct ablkcipher_alg defines an asynchronous block cipher. - - - - Please refer to the single block cipher description for schematics - of the block cipher usage. - - - - Specifics Of Asynchronous Multi-Block Cipher - - There are a couple of specifics to the asynchronous interface. - - - - First of all, some of the drivers will want to use the - Generic ScatterWalk in case the hardware needs to be fed - separate chunks of the scatterlist which contains the - plaintext and will contain the ciphertext. Please refer - to the ScatterWalk interface offered by the Linux kernel - scatter / gather list implementation. - - - - - Hashing [HASH] - - - Example of transformations: crc32, md5, sha1, sha256,... - - - Registering And Unregistering The Transformation - - - There are multiple ways to register a HASH transformation, - depending on whether the transformation is synchronous [SHASH] - or asynchronous [AHASH] and the amount of HASH transformations - we are registering. You can find the prototypes defined in - include/crypto/internal/hash.h: - - - - int crypto_register_ahash(struct ahash_alg *alg); - - int crypto_register_shash(struct shash_alg *alg); - int crypto_register_shashes(struct shash_alg *algs, int count); - - - - The respective counterparts for unregistering the HASH - transformation are as follows: - - - - int crypto_unregister_ahash(struct ahash_alg *alg); - - int crypto_unregister_shash(struct shash_alg *alg); - int crypto_unregister_shashes(struct shash_alg *algs, int count); - - - - Cipher Definition With struct shash_alg and ahash_alg - - Here are schematics of how these functions are called when - operated from other part of the kernel. Note that the .setkey() - call might happen before or after any of these schematics happen, - but must not happen during any of these are in-flight. Please note - that calling .init() followed immediately by .finish() is also a - perfectly valid transformation. - - - - I) DATA -----------. - v - .init() -> .update() -> .final() ! .update() might not be called - ^ | | at all in this scenario. - '----' '---> HASH - - II) DATA -----------.-----------. - v v - .init() -> .update() -> .finup() ! .update() may not be called - ^ | | at all in this scenario. - '----' '---> HASH - - III) DATA -----------. - v - .digest() ! The entire process is handled - | by the .digest() call. - '---------------> HASH - - - - Here is a schematic of how the .export()/.import() functions are - called when used from another part of the kernel. - - - - KEY--. DATA--. - v v ! .update() may not be called - .setkey() -> .init() -> .update() -> .export() at all in this scenario. - ^ | | - '-----' '--> PARTIAL_HASH - - ----------- other transformations happen here ----------- - - PARTIAL_HASH--. DATA1--. - v v - .import -> .update() -> .final() ! .update() may not be called - ^ | | at all in this scenario. - '----' '--> HASH1 - - PARTIAL_HASH--. DATA2-. - v v - .import -> .finup() - | - '---------------> HASH2 - - - - Specifics Of Asynchronous HASH Transformation - - Some of the drivers will want to use the Generic ScatterWalk - in case the implementation needs to be fed separate chunks of the - scatterlist which contains the input data. The buffer containing - the resulting hash will always be properly aligned to - .cra_alignmask so there is no need to worry about this. - - - - - - User Space Interface - Introduction - - The concepts of the kernel crypto API visible to kernel space is fully - applicable to the user space interface as well. Therefore, the kernel - crypto API high level discussion for the in-kernel use cases applies - here as well. - - - - The major difference, however, is that user space can only act as a - consumer and never as a provider of a transformation or cipher algorithm. - - - - The following covers the user space interface exported by the kernel - crypto API. A working example of this description is libkcapi that - can be obtained from [1]. That library can be used by user space - applications that require cryptographic services from the kernel. - - - - Some details of the in-kernel kernel crypto API aspects do not - apply to user space, however. This includes the difference between - synchronous and asynchronous invocations. The user space API call - is fully synchronous. - - - - [1] http://www.chronox.de/libkcapi.html - - - - - User Space API General Remarks - - The kernel crypto API is accessible from user space. Currently, - the following ciphers are accessible: - - - - - Message digest including keyed message digest (HMAC, CMAC) - - - - Symmetric ciphers - - - - AEAD ciphers - - - - Random Number Generators - - - - - The interface is provided via socket type using the type AF_ALG. - In addition, the setsockopt option type is SOL_ALG. In case the - user space header files do not export these flags yet, use the - following macros: - - - -#ifndef AF_ALG -#define AF_ALG 38 -#endif -#ifndef SOL_ALG -#define SOL_ALG 279 -#endif - - - - A cipher is accessed with the same name as done for the in-kernel - API calls. This includes the generic vs. unique naming schema for - ciphers as well as the enforcement of priorities for generic names. - - - - To interact with the kernel crypto API, a socket must be - created by the user space application. User space invokes the cipher - operation with the send()/write() system call family. The result of the - cipher operation is obtained with the read()/recv() system call family. - - - - The following API calls assume that the socket descriptor - is already opened by the user space application and discusses only - the kernel crypto API specific invocations. - - - - To initialize the socket interface, the following sequence has to - be performed by the consumer: - - - - - - Create a socket of type AF_ALG with the struct sockaddr_alg - parameter specified below for the different cipher types. - - - - - - Invoke bind with the socket descriptor - - - - - - Invoke accept with the socket descriptor. The accept system call - returns a new file descriptor that is to be used to interact with - the particular cipher instance. When invoking send/write or recv/read - system calls to send data to the kernel or obtain data from the - kernel, the file descriptor returned by accept must be used. - - - - - - In-place Cipher operation - - Just like the in-kernel operation of the kernel crypto API, the user - space interface allows the cipher operation in-place. That means that - the input buffer used for the send/write system call and the output - buffer used by the read/recv system call may be one and the same. - This is of particular interest for symmetric cipher operations where a - copying of the output data to its final destination can be avoided. - - - - If a consumer on the other hand wants to maintain the plaintext and - the ciphertext in different memory locations, all a consumer needs - to do is to provide different memory pointers for the encryption and - decryption operation. - - - - Message Digest API - - The message digest type to be used for the cipher operation is - selected when invoking the bind syscall. bind requires the caller - to provide a filled struct sockaddr data structure. This data - structure must be filled as follows: - - - -struct sockaddr_alg sa = { - .salg_family = AF_ALG, - .salg_type = "hash", /* this selects the hash logic in the kernel */ - .salg_name = "sha1" /* this is the cipher name */ -}; - - - - The salg_type value "hash" applies to message digests and keyed - message digests. Though, a keyed message digest is referenced by - the appropriate salg_name. Please see below for the setsockopt - interface that explains how the key can be set for a keyed message - digest. - - - - Using the send() system call, the application provides the data that - should be processed with the message digest. The send system call - allows the following flags to be specified: - - - - - - MSG_MORE: If this flag is set, the send system call acts like a - message digest update function where the final hash is not - yet calculated. If the flag is not set, the send system call - calculates the final message digest immediately. - - - - - - With the recv() system call, the application can read the message - digest from the kernel crypto API. If the buffer is too small for the - message digest, the flag MSG_TRUNC is set by the kernel. - - - - In order to set a message digest key, the calling application must use - the setsockopt() option of ALG_SET_KEY. If the key is not set the HMAC - operation is performed without the initial HMAC state change caused by - the key. - - - - Symmetric Cipher API - - The operation is very similar to the message digest discussion. - During initialization, the struct sockaddr data structure must be - filled as follows: - - - -struct sockaddr_alg sa = { - .salg_family = AF_ALG, - .salg_type = "skcipher", /* this selects the symmetric cipher */ - .salg_name = "cbc(aes)" /* this is the cipher name */ -}; - - - - Before data can be sent to the kernel using the write/send system - call family, the consumer must set the key. The key setting is - described with the setsockopt invocation below. - - - - Using the sendmsg() system call, the application provides the data that should be processed for encryption or decryption. In addition, the IV is - specified with the data structure provided by the sendmsg() system call. - - - - The sendmsg system call parameter of struct msghdr is embedded into the - struct cmsghdr data structure. See recv(2) and cmsg(3) for more - information on how the cmsghdr data structure is used together with the - send/recv system call family. That cmsghdr data structure holds the - following information specified with a separate header instances: - - - - - - specification of the cipher operation type with one of these flags: - - - - ALG_OP_ENCRYPT - encryption of data - - - ALG_OP_DECRYPT - decryption of data - - - - - - - specification of the IV information marked with the flag ALG_SET_IV - - - - - - The send system call family allows the following flag to be specified: - - - - - - MSG_MORE: If this flag is set, the send system call acts like a - cipher update function where more input data is expected - with a subsequent invocation of the send system call. - - - - - - Note: The kernel reports -EINVAL for any unexpected data. The caller - must make sure that all data matches the constraints given in - /proc/crypto for the selected cipher. - - - - With the recv() system call, the application can read the result of - the cipher operation from the kernel crypto API. The output buffer - must be at least as large as to hold all blocks of the encrypted or - decrypted data. If the output data size is smaller, only as many - blocks are returned that fit into that output buffer size. - - - - AEAD Cipher API - - The operation is very similar to the symmetric cipher discussion. - During initialization, the struct sockaddr data structure must be - filled as follows: - - - -struct sockaddr_alg sa = { - .salg_family = AF_ALG, - .salg_type = "aead", /* this selects the symmetric cipher */ - .salg_name = "gcm(aes)" /* this is the cipher name */ -}; - - - - Before data can be sent to the kernel using the write/send system - call family, the consumer must set the key. The key setting is - described with the setsockopt invocation below. - - - - In addition, before data can be sent to the kernel using the - write/send system call family, the consumer must set the authentication - tag size. To set the authentication tag size, the caller must use the - setsockopt invocation described below. - - - - Using the sendmsg() system call, the application provides the data that should be processed for encryption or decryption. In addition, the IV is - specified with the data structure provided by the sendmsg() system call. - - - - The sendmsg system call parameter of struct msghdr is embedded into the - struct cmsghdr data structure. See recv(2) and cmsg(3) for more - information on how the cmsghdr data structure is used together with the - send/recv system call family. That cmsghdr data structure holds the - following information specified with a separate header instances: - - - - - - specification of the cipher operation type with one of these flags: - - - - ALG_OP_ENCRYPT - encryption of data - - - ALG_OP_DECRYPT - decryption of data - - - - - - - specification of the IV information marked with the flag ALG_SET_IV - - - - - - specification of the associated authentication data (AAD) with the - flag ALG_SET_AEAD_ASSOCLEN. The AAD is sent to the kernel together - with the plaintext / ciphertext. See below for the memory structure. - - - - - - The send system call family allows the following flag to be specified: - - - - - - MSG_MORE: If this flag is set, the send system call acts like a - cipher update function where more input data is expected - with a subsequent invocation of the send system call. - - - - - - Note: The kernel reports -EINVAL for any unexpected data. The caller - must make sure that all data matches the constraints given in - /proc/crypto for the selected cipher. - - - - With the recv() system call, the application can read the result of - the cipher operation from the kernel crypto API. The output buffer - must be at least as large as defined with the memory structure below. - If the output data size is smaller, the cipher operation is not performed. - - - - The authenticated decryption operation may indicate an integrity error. - Such breach in integrity is marked with the -EBADMSG error code. - - - AEAD Memory Structure - - The AEAD cipher operates with the following information that - is communicated between user and kernel space as one data stream: - - - - - plaintext or ciphertext - - - - associated authentication data (AAD) - - - - authentication tag - - - - - The sizes of the AAD and the authentication tag are provided with - the sendmsg and setsockopt calls (see there). As the kernel knows - the size of the entire data stream, the kernel is now able to - calculate the right offsets of the data components in the data - stream. - - - - The user space caller must arrange the aforementioned information - in the following order: - - - - - - AEAD encryption input: AAD || plaintext - - - - - - AEAD decryption input: AAD || ciphertext || authentication tag - - - - - - The output buffer the user space caller provides must be at least as - large to hold the following data: - - - - - - AEAD encryption output: ciphertext || authentication tag - - - - - - AEAD decryption output: plaintext - - - - - - - Random Number Generator API - - Again, the operation is very similar to the other APIs. - During initialization, the struct sockaddr data structure must be - filled as follows: - - - -struct sockaddr_alg sa = { - .salg_family = AF_ALG, - .salg_type = "rng", /* this selects the symmetric cipher */ - .salg_name = "drbg_nopr_sha256" /* this is the cipher name */ -}; - - - - Depending on the RNG type, the RNG must be seeded. The seed is provided - using the setsockopt interface to set the key. For example, the - ansi_cprng requires a seed. The DRBGs do not require a seed, but - may be seeded. - - - - Using the read()/recvmsg() system calls, random numbers can be obtained. - The kernel generates at most 128 bytes in one call. If user space - requires more data, multiple calls to read()/recvmsg() must be made. - - - - WARNING: The user space caller may invoke the initially mentioned - accept system call multiple times. In this case, the returned file - descriptors have the same state. - - - - - Zero-Copy Interface - - In addition to the send/write/read/recv system call family, the AF_ALG - interface can be accessed with the zero-copy interface of splice/vmsplice. - As the name indicates, the kernel tries to avoid a copy operation into - kernel space. - - - - The zero-copy operation requires data to be aligned at the page boundary. - Non-aligned data can be used as well, but may require more operations of - the kernel which would defeat the speed gains obtained from the zero-copy - interface. - - - - The system-interent limit for the size of one zero-copy operation is - 16 pages. If more data is to be sent to AF_ALG, user space must slice - the input into segments with a maximum size of 16 pages. - - - - Zero-copy can be used with the following code example (a complete working - example is provided with libkcapi): - - - -int pipes[2]; - -pipe(pipes); -/* input data in iov */ -vmsplice(pipes[1], iov, iovlen, SPLICE_F_GIFT); -/* opfd is the file descriptor returned from accept() system call */ -splice(pipes[0], NULL, opfd, NULL, ret, 0); -read(opfd, out, outlen); - - - - - Setsockopt Interface - - In addition to the read/recv and send/write system call handling - to send and retrieve data subject to the cipher operation, a consumer - also needs to set the additional information for the cipher operation. - This additional information is set using the setsockopt system call - that must be invoked with the file descriptor of the open cipher - (i.e. the file descriptor returned by the accept system call). - - - - Each setsockopt invocation must use the level SOL_ALG. - - - - The setsockopt interface allows setting the following data using - the mentioned optname: - - - - - - ALG_SET_KEY -- Setting the key. Key setting is applicable to: - - - - the skcipher cipher type (symmetric ciphers) - - - the hash cipher type (keyed message digests) - - - the AEAD cipher type - - - the RNG cipher type to provide the seed - - - - - - - ALG_SET_AEAD_AUTHSIZE -- Setting the authentication tag size - for AEAD ciphers. For a encryption operation, the authentication - tag of the given size will be generated. For a decryption operation, - the provided ciphertext is assumed to contain an authentication tag - of the given size (see section about AEAD memory layout below). - - - - - - - User space API example - - Please see [1] for libkcapi which provides an easy-to-use wrapper - around the aforementioned Netlink kernel interface. [1] also contains - a test application that invokes all libkcapi API calls. - - - - [1] http://www.chronox.de/libkcapi.html - - - - - - - Programming Interface - - Please note that the kernel crypto API contains the AEAD givcrypt - API (crypto_aead_giv* and aead_givcrypt_* function calls in - include/crypto/aead.h). This API is obsolete and will be removed - in the future. To obtain the functionality of an AEAD cipher with - internal IV generation, use the IV generator as a regular cipher. - For example, rfc4106(gcm(aes)) is the AEAD cipher with external - IV generation and seqniv(rfc4106(gcm(aes))) implies that the kernel - crypto API generates the IV. Different IV generators are available. - - Block Cipher Context Data Structures -!Pinclude/linux/crypto.h Block Cipher Context Data Structures -!Finclude/crypto/aead.h aead_request - - Block Cipher Algorithm Definitions -!Pinclude/linux/crypto.h Block Cipher Algorithm Definitions -!Finclude/linux/crypto.h crypto_alg -!Finclude/linux/crypto.h ablkcipher_alg -!Finclude/crypto/aead.h aead_alg -!Finclude/linux/crypto.h blkcipher_alg -!Finclude/linux/crypto.h cipher_alg -!Finclude/crypto/rng.h rng_alg - - Symmetric Key Cipher API -!Pinclude/crypto/skcipher.h Symmetric Key Cipher API -!Finclude/crypto/skcipher.h crypto_alloc_skcipher -!Finclude/crypto/skcipher.h crypto_free_skcipher -!Finclude/crypto/skcipher.h crypto_has_skcipher -!Finclude/crypto/skcipher.h crypto_skcipher_ivsize -!Finclude/crypto/skcipher.h crypto_skcipher_blocksize -!Finclude/crypto/skcipher.h crypto_skcipher_setkey -!Finclude/crypto/skcipher.h crypto_skcipher_reqtfm -!Finclude/crypto/skcipher.h crypto_skcipher_encrypt -!Finclude/crypto/skcipher.h crypto_skcipher_decrypt - - Symmetric Key Cipher Request Handle -!Pinclude/crypto/skcipher.h Symmetric Key Cipher Request Handle -!Finclude/crypto/skcipher.h crypto_skcipher_reqsize -!Finclude/crypto/skcipher.h skcipher_request_set_tfm -!Finclude/crypto/skcipher.h skcipher_request_alloc -!Finclude/crypto/skcipher.h skcipher_request_free -!Finclude/crypto/skcipher.h skcipher_request_set_callback -!Finclude/crypto/skcipher.h skcipher_request_set_crypt - - Asynchronous Block Cipher API - Deprecated -!Pinclude/linux/crypto.h Asynchronous Block Cipher API -!Finclude/linux/crypto.h crypto_alloc_ablkcipher -!Finclude/linux/crypto.h crypto_free_ablkcipher -!Finclude/linux/crypto.h crypto_has_ablkcipher -!Finclude/linux/crypto.h crypto_ablkcipher_ivsize -!Finclude/linux/crypto.h crypto_ablkcipher_blocksize -!Finclude/linux/crypto.h crypto_ablkcipher_setkey -!Finclude/linux/crypto.h crypto_ablkcipher_reqtfm -!Finclude/linux/crypto.h crypto_ablkcipher_encrypt -!Finclude/linux/crypto.h crypto_ablkcipher_decrypt - - Asynchronous Cipher Request Handle - Deprecated -!Pinclude/linux/crypto.h Asynchronous Cipher Request Handle -!Finclude/linux/crypto.h crypto_ablkcipher_reqsize -!Finclude/linux/crypto.h ablkcipher_request_set_tfm -!Finclude/linux/crypto.h ablkcipher_request_alloc -!Finclude/linux/crypto.h ablkcipher_request_free -!Finclude/linux/crypto.h ablkcipher_request_set_callback -!Finclude/linux/crypto.h ablkcipher_request_set_crypt - - Authenticated Encryption With Associated Data (AEAD) Cipher API -!Pinclude/crypto/aead.h Authenticated Encryption With Associated Data (AEAD) Cipher API -!Finclude/crypto/aead.h crypto_alloc_aead -!Finclude/crypto/aead.h crypto_free_aead -!Finclude/crypto/aead.h crypto_aead_ivsize -!Finclude/crypto/aead.h crypto_aead_authsize -!Finclude/crypto/aead.h crypto_aead_blocksize -!Finclude/crypto/aead.h crypto_aead_setkey -!Finclude/crypto/aead.h crypto_aead_setauthsize -!Finclude/crypto/aead.h crypto_aead_encrypt -!Finclude/crypto/aead.h crypto_aead_decrypt - - Asynchronous AEAD Request Handle -!Pinclude/crypto/aead.h Asynchronous AEAD Request Handle -!Finclude/crypto/aead.h crypto_aead_reqsize -!Finclude/crypto/aead.h aead_request_set_tfm -!Finclude/crypto/aead.h aead_request_alloc -!Finclude/crypto/aead.h aead_request_free -!Finclude/crypto/aead.h aead_request_set_callback -!Finclude/crypto/aead.h aead_request_set_crypt -!Finclude/crypto/aead.h aead_request_set_ad - - Synchronous Block Cipher API - Deprecated -!Pinclude/linux/crypto.h Synchronous Block Cipher API -!Finclude/linux/crypto.h crypto_alloc_blkcipher -!Finclude/linux/crypto.h crypto_free_blkcipher -!Finclude/linux/crypto.h crypto_has_blkcipher -!Finclude/linux/crypto.h crypto_blkcipher_name -!Finclude/linux/crypto.h crypto_blkcipher_ivsize -!Finclude/linux/crypto.h crypto_blkcipher_blocksize -!Finclude/linux/crypto.h crypto_blkcipher_setkey -!Finclude/linux/crypto.h crypto_blkcipher_encrypt -!Finclude/linux/crypto.h crypto_blkcipher_encrypt_iv -!Finclude/linux/crypto.h crypto_blkcipher_decrypt -!Finclude/linux/crypto.h crypto_blkcipher_decrypt_iv -!Finclude/linux/crypto.h crypto_blkcipher_set_iv -!Finclude/linux/crypto.h crypto_blkcipher_get_iv - - Single Block Cipher API -!Pinclude/linux/crypto.h Single Block Cipher API -!Finclude/linux/crypto.h crypto_alloc_cipher -!Finclude/linux/crypto.h crypto_free_cipher -!Finclude/linux/crypto.h crypto_has_cipher -!Finclude/linux/crypto.h crypto_cipher_blocksize -!Finclude/linux/crypto.h crypto_cipher_setkey -!Finclude/linux/crypto.h crypto_cipher_encrypt_one -!Finclude/linux/crypto.h crypto_cipher_decrypt_one - - Message Digest Algorithm Definitions -!Pinclude/crypto/hash.h Message Digest Algorithm Definitions -!Finclude/crypto/hash.h hash_alg_common -!Finclude/crypto/hash.h ahash_alg -!Finclude/crypto/hash.h shash_alg - - Asynchronous Message Digest API -!Pinclude/crypto/hash.h Asynchronous Message Digest API -!Finclude/crypto/hash.h crypto_alloc_ahash -!Finclude/crypto/hash.h crypto_free_ahash -!Finclude/crypto/hash.h crypto_ahash_init -!Finclude/crypto/hash.h crypto_ahash_digestsize -!Finclude/crypto/hash.h crypto_ahash_reqtfm -!Finclude/crypto/hash.h crypto_ahash_reqsize -!Finclude/crypto/hash.h crypto_ahash_setkey -!Finclude/crypto/hash.h crypto_ahash_finup -!Finclude/crypto/hash.h crypto_ahash_final -!Finclude/crypto/hash.h crypto_ahash_digest -!Finclude/crypto/hash.h crypto_ahash_export -!Finclude/crypto/hash.h crypto_ahash_import - - Asynchronous Hash Request Handle -!Pinclude/crypto/hash.h Asynchronous Hash Request Handle -!Finclude/crypto/hash.h ahash_request_set_tfm -!Finclude/crypto/hash.h ahash_request_alloc -!Finclude/crypto/hash.h ahash_request_free -!Finclude/crypto/hash.h ahash_request_set_callback -!Finclude/crypto/hash.h ahash_request_set_crypt - - Synchronous Message Digest API -!Pinclude/crypto/hash.h Synchronous Message Digest API -!Finclude/crypto/hash.h crypto_alloc_shash -!Finclude/crypto/hash.h crypto_free_shash -!Finclude/crypto/hash.h crypto_shash_blocksize -!Finclude/crypto/hash.h crypto_shash_digestsize -!Finclude/crypto/hash.h crypto_shash_descsize -!Finclude/crypto/hash.h crypto_shash_setkey -!Finclude/crypto/hash.h crypto_shash_digest -!Finclude/crypto/hash.h crypto_shash_export -!Finclude/crypto/hash.h crypto_shash_import -!Finclude/crypto/hash.h crypto_shash_init -!Finclude/crypto/hash.h crypto_shash_update -!Finclude/crypto/hash.h crypto_shash_final -!Finclude/crypto/hash.h crypto_shash_finup - - Crypto API Random Number API -!Pinclude/crypto/rng.h Random number generator API -!Finclude/crypto/rng.h crypto_alloc_rng -!Finclude/crypto/rng.h crypto_rng_alg -!Finclude/crypto/rng.h crypto_free_rng -!Finclude/crypto/rng.h crypto_rng_generate -!Finclude/crypto/rng.h crypto_rng_get_bytes -!Finclude/crypto/rng.h crypto_rng_reset -!Finclude/crypto/rng.h crypto_rng_seedsize -!Cinclude/crypto/rng.h - - Asymmetric Cipher API -!Pinclude/crypto/akcipher.h Generic Public Key API -!Finclude/crypto/akcipher.h akcipher_alg -!Finclude/crypto/akcipher.h akcipher_request -!Finclude/crypto/akcipher.h crypto_alloc_akcipher -!Finclude/crypto/akcipher.h crypto_free_akcipher -!Finclude/crypto/akcipher.h crypto_akcipher_set_pub_key -!Finclude/crypto/akcipher.h crypto_akcipher_set_priv_key - - Asymmetric Cipher Request Handle -!Finclude/crypto/akcipher.h akcipher_request_alloc -!Finclude/crypto/akcipher.h akcipher_request_free -!Finclude/crypto/akcipher.h akcipher_request_set_callback -!Finclude/crypto/akcipher.h akcipher_request_set_crypt -!Finclude/crypto/akcipher.h crypto_akcipher_maxsize -!Finclude/crypto/akcipher.h crypto_akcipher_encrypt -!Finclude/crypto/akcipher.h crypto_akcipher_decrypt -!Finclude/crypto/akcipher.h crypto_akcipher_sign -!Finclude/crypto/akcipher.h crypto_akcipher_verify - - - - Code Examples - Code Example For Symmetric Key Cipher Operation - - -struct tcrypt_result { - struct completion completion; - int err; -}; - -/* tie all data structures together */ -struct skcipher_def { - struct scatterlist sg; - struct crypto_skcipher *tfm; - struct skcipher_request *req; - struct tcrypt_result result; -}; - -/* Callback function */ -static void test_skcipher_cb(struct crypto_async_request *req, int error) -{ - struct tcrypt_result *result = req->data; - - if (error == -EINPROGRESS) - return; - result->err = error; - complete(&result->completion); - pr_info("Encryption finished successfully\n"); -} - -/* Perform cipher operation */ -static unsigned int test_skcipher_encdec(struct skcipher_def *sk, - int enc) -{ - int rc = 0; - - if (enc) - rc = crypto_skcipher_encrypt(sk->req); - else - rc = crypto_skcipher_decrypt(sk->req); - - switch (rc) { - case 0: - break; - case -EINPROGRESS: - case -EBUSY: - rc = wait_for_completion_interruptible( - &sk->result.completion); - if (!rc && !sk->result.err) { - reinit_completion(&sk->result.completion); - break; - } - default: - pr_info("skcipher encrypt returned with %d result %d\n", - rc, sk->result.err); - break; - } - init_completion(&sk->result.completion); - - return rc; -} - -/* Initialize and trigger cipher operation */ -static int test_skcipher(void) -{ - struct skcipher_def sk; - struct crypto_skcipher *skcipher = NULL; - struct skcipher_request *req = NULL; - char *scratchpad = NULL; - char *ivdata = NULL; - unsigned char key[32]; - int ret = -EFAULT; - - skcipher = crypto_alloc_skcipher("cbc-aes-aesni", 0, 0); - if (IS_ERR(skcipher)) { - pr_info("could not allocate skcipher handle\n"); - return PTR_ERR(skcipher); - } - - req = skcipher_request_alloc(skcipher, GFP_KERNEL); - if (!req) { - pr_info("could not allocate skcipher request\n"); - ret = -ENOMEM; - goto out; - } - - skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, - test_skcipher_cb, - &sk.result); - - /* AES 256 with random key */ - get_random_bytes(&key, 32); - if (crypto_skcipher_setkey(skcipher, key, 32)) { - pr_info("key could not be set\n"); - ret = -EAGAIN; - goto out; - } - - /* IV will be random */ - ivdata = kmalloc(16, GFP_KERNEL); - if (!ivdata) { - pr_info("could not allocate ivdata\n"); - goto out; - } - get_random_bytes(ivdata, 16); - - /* Input data will be random */ - scratchpad = kmalloc(16, GFP_KERNEL); - if (!scratchpad) { - pr_info("could not allocate scratchpad\n"); - goto out; - } - get_random_bytes(scratchpad, 16); - - sk.tfm = skcipher; - sk.req = req; - - /* We encrypt one block */ - sg_init_one(&sk.sg, scratchpad, 16); - skcipher_request_set_crypt(req, &sk.sg, &sk.sg, 16, ivdata); - init_completion(&sk.result.completion); - - /* encrypt data */ - ret = test_skcipher_encdec(&sk, 1); - if (ret) - goto out; - - pr_info("Encryption triggered successfully\n"); - -out: - if (skcipher) - crypto_free_skcipher(skcipher); - if (req) - skcipher_request_free(req); - if (ivdata) - kfree(ivdata); - if (scratchpad) - kfree(scratchpad); - return ret; -} - - - - Code Example For Use of Operational State Memory With SHASH - - -struct sdesc { - struct shash_desc shash; - char ctx[]; -}; - -static struct sdescinit_sdesc(struct crypto_shash *alg) -{ - struct sdescsdesc; - int size; - - size = sizeof(struct shash_desc) + crypto_shash_descsize(alg); - sdesc = kmalloc(size, GFP_KERNEL); - if (!sdesc) - return ERR_PTR(-ENOMEM); - sdesc->shash.tfm = alg; - sdesc->shash.flags = 0x0; - return sdesc; -} - -static int calc_hash(struct crypto_shashalg, - const unsigned chardata, unsigned int datalen, - unsigned chardigest) { - struct sdescsdesc; - int ret; - - sdesc = init_sdesc(alg); - if (IS_ERR(sdesc)) { - pr_info("trusted_key: can't alloc %s\n", hash_alg); - return PTR_ERR(sdesc); - } - - ret = crypto_shash_digest(&sdesc->shash, data, datalen, digest); - kfree(sdesc); - return ret; -} - - - - Code Example For Random Number Generator Usage - - -static int get_random_numbers(u8 *buf, unsigned int len) -{ - struct crypto_rngrng = NULL; - chardrbg = "drbg_nopr_sha256"; /* Hash DRBG with SHA-256, no PR */ - int ret; - - if (!buf || !len) { - pr_debug("No output buffer provided\n"); - return -EINVAL; - } - - rng = crypto_alloc_rng(drbg, 0, 0); - if (IS_ERR(rng)) { - pr_debug("could not allocate RNG handle for %s\n", drbg); - return -PTR_ERR(rng); - } - - ret = crypto_rng_get_bytes(rng, buf, len); - if (ret < 0) - pr_debug("generation of random numbers failed\n"); - else if (ret == 0) - pr_debug("RNG returned no data"); - else - pr_debug("RNG returned %d bytes of data\n", ret); - -out: - crypto_free_rng(rng); - return ret; -} - - - -
diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl deleted file mode 100644 index 7e4f34fde697..000000000000 --- a/Documentation/DocBook/debugobjects.tmpl +++ /dev/null @@ -1,443 +0,0 @@ - - - - - - Debug objects life time - - - - Thomas - Gleixner - -
- tglx@linutronix.de -
-
-
-
- - - 2008 - Thomas Gleixner - - - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License version 2 as published by the Free Software Foundation. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - - Introduction - - debugobjects is a generic infrastructure to track the life time - of kernel objects and validate the operations on those. - - - debugobjects is useful to check for the following error patterns: - - Activation of uninitialized objects - Initialization of active objects - Usage of freed/destroyed objects - - - - debugobjects is not changing the data structure of the real - object so it can be compiled in with a minimal runtime impact - and enabled on demand with a kernel command line option. - - - - - Howto use debugobjects - - A kernel subsystem needs to provide a data structure which - describes the object type and add calls into the debug code at - appropriate places. The data structure to describe the object - type needs at minimum the name of the object type. Optional - functions can and should be provided to fixup detected problems - so the kernel can continue to work and the debug information can - be retrieved from a live system instead of hard core debugging - with serial consoles and stack trace transcripts from the - monitor. - - - The debug calls provided by debugobjects are: - - debug_object_init - debug_object_init_on_stack - debug_object_activate - debug_object_deactivate - debug_object_destroy - debug_object_free - debug_object_assert_init - - Each of these functions takes the address of the real object and - a pointer to the object type specific debug description - structure. - - - Each detected error is reported in the statistics and a limited - number of errors are printk'ed including a full stack trace. - - - The statistics are available via /sys/kernel/debug/debug_objects/stats. - They provide information about the number of warnings and the - number of successful fixups along with information about the - usage of the internal tracking objects and the state of the - internal tracking objects pool. - - - - Debug functions - - Debug object function reference -!Elib/debugobjects.c - - - debug_object_init - - This function is called whenever the initialization function - of a real object is called. - - - When the real object is already tracked by debugobjects it is - checked, whether the object can be initialized. Initializing - is not allowed for active and destroyed objects. When - debugobjects detects an error, then it calls the fixup_init - function of the object type description structure if provided - by the caller. The fixup function can correct the problem - before the real initialization of the object happens. E.g. it - can deactivate an active object in order to prevent damage to - the subsystem. - - - When the real object is not yet tracked by debugobjects, - debugobjects allocates a tracker object for the real object - and sets the tracker object state to ODEBUG_STATE_INIT. It - verifies that the object is not on the callers stack. If it is - on the callers stack then a limited number of warnings - including a full stack trace is printk'ed. The calling code - must use debug_object_init_on_stack() and remove the object - before leaving the function which allocated it. See next - section. - - - - - debug_object_init_on_stack - - This function is called whenever the initialization function - of a real object which resides on the stack is called. - - - When the real object is already tracked by debugobjects it is - checked, whether the object can be initialized. Initializing - is not allowed for active and destroyed objects. When - debugobjects detects an error, then it calls the fixup_init - function of the object type description structure if provided - by the caller. The fixup function can correct the problem - before the real initialization of the object happens. E.g. it - can deactivate an active object in order to prevent damage to - the subsystem. - - - When the real object is not yet tracked by debugobjects - debugobjects allocates a tracker object for the real object - and sets the tracker object state to ODEBUG_STATE_INIT. It - verifies that the object is on the callers stack. - - - An object which is on the stack must be removed from the - tracker by calling debug_object_free() before the function - which allocates the object returns. Otherwise we keep track of - stale objects. - - - - - debug_object_activate - - This function is called whenever the activation function of a - real object is called. - - - When the real object is already tracked by debugobjects it is - checked, whether the object can be activated. Activating is - not allowed for active and destroyed objects. When - debugobjects detects an error, then it calls the - fixup_activate function of the object type description - structure if provided by the caller. The fixup function can - correct the problem before the real activation of the object - happens. E.g. it can deactivate an active object in order to - prevent damage to the subsystem. - - - When the real object is not yet tracked by debugobjects then - the fixup_activate function is called if available. This is - necessary to allow the legitimate activation of statically - allocated and initialized objects. The fixup function checks - whether the object is valid and calls the debug_objects_init() - function to initialize the tracking of this object. - - - When the activation is legitimate, then the state of the - associated tracker object is set to ODEBUG_STATE_ACTIVE. - - - - - debug_object_deactivate - - This function is called whenever the deactivation function of - a real object is called. - - - When the real object is tracked by debugobjects it is checked, - whether the object can be deactivated. Deactivating is not - allowed for untracked or destroyed objects. - - - When the deactivation is legitimate, then the state of the - associated tracker object is set to ODEBUG_STATE_INACTIVE. - - - - - debug_object_destroy - - This function is called to mark an object destroyed. This is - useful to prevent the usage of invalid objects, which are - still available in memory: either statically allocated objects - or objects which are freed later. - - - When the real object is tracked by debugobjects it is checked, - whether the object can be destroyed. Destruction is not - allowed for active and destroyed objects. When debugobjects - detects an error, then it calls the fixup_destroy function of - the object type description structure if provided by the - caller. The fixup function can correct the problem before the - real destruction of the object happens. E.g. it can deactivate - an active object in order to prevent damage to the subsystem. - - - When the destruction is legitimate, then the state of the - associated tracker object is set to ODEBUG_STATE_DESTROYED. - - - - - debug_object_free - - This function is called before an object is freed. - - - When the real object is tracked by debugobjects it is checked, - whether the object can be freed. Free is not allowed for - active objects. When debugobjects detects an error, then it - calls the fixup_free function of the object type description - structure if provided by the caller. The fixup function can - correct the problem before the real free of the object - happens. E.g. it can deactivate an active object in order to - prevent damage to the subsystem. - - - Note that debug_object_free removes the object from the - tracker. Later usage of the object is detected by the other - debug checks. - - - - - debug_object_assert_init - - This function is called to assert that an object has been - initialized. - - - When the real object is not tracked by debugobjects, it calls - fixup_assert_init of the object type description structure - provided by the caller, with the hardcoded object state - ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem - by calling debug_object_init and other specific initializing - functions. - - - When the real object is already tracked by debugobjects it is - ignored. - - - - - Fixup functions - - Debug object type description structure -!Iinclude/linux/debugobjects.h - - - fixup_init - - This function is called from the debug code whenever a problem - in debug_object_init is detected. The function takes the - address of the object and the state which is currently - recorded in the tracker. - - - Called from debug_object_init when the object state is: - - ODEBUG_STATE_ACTIVE - - - - The function returns true when the fixup was successful, - otherwise false. The return value is used to update the - statistics. - - - Note, that the function needs to call the debug_object_init() - function again, after the damage has been repaired in order to - keep the state consistent. - - - - - fixup_activate - - This function is called from the debug code whenever a problem - in debug_object_activate is detected. - - - Called from debug_object_activate when the object state is: - - ODEBUG_STATE_NOTAVAILABLE - ODEBUG_STATE_ACTIVE - - - - The function returns true when the fixup was successful, - otherwise false. The return value is used to update the - statistics. - - - Note that the function needs to call the debug_object_activate() - function again after the damage has been repaired in order to - keep the state consistent. - - - The activation of statically initialized objects is a special - case. When debug_object_activate() has no tracked object for - this object address then fixup_activate() is called with - object state ODEBUG_STATE_NOTAVAILABLE. The fixup function - needs to check whether this is a legitimate case of a - statically initialized object or not. In case it is it calls - debug_object_init() and debug_object_activate() to make the - object known to the tracker and marked active. In this case - the function should return false because this is not a real - fixup. - - - - - fixup_destroy - - This function is called from the debug code whenever a problem - in debug_object_destroy is detected. - - - Called from debug_object_destroy when the object state is: - - ODEBUG_STATE_ACTIVE - - - - The function returns true when the fixup was successful, - otherwise false. The return value is used to update the - statistics. - - - - fixup_free - - This function is called from the debug code whenever a problem - in debug_object_free is detected. Further it can be called - from the debug checks in kfree/vfree, when an active object is - detected from the debug_check_no_obj_freed() sanity checks. - - - Called from debug_object_free() or debug_check_no_obj_freed() - when the object state is: - - ODEBUG_STATE_ACTIVE - - - - The function returns true when the fixup was successful, - otherwise false. The return value is used to update the - statistics. - - - - fixup_assert_init - - This function is called from the debug code whenever a problem - in debug_object_assert_init is detected. - - - Called from debug_object_assert_init() with a hardcoded state - ODEBUG_STATE_NOTAVAILABLE when the object is not found in the - debug bucket. - - - The function returns true when the fixup was successful, - otherwise false. The return value is used to update the - statistics. - - - Note, this function should make sure debug_object_init() is - called before returning. - - - The handling of statically initialized objects is a special - case. The fixup function should check if this is a legitimate - case of a statically initialized object or not. In this case only - debug_object_init() should be called to make the object known to - the tracker. Then the function should return false because this - is not - a real fixup. - - - - - Known Bugs And Assumptions - - None (knock on wood). - - -
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl index 2a272275c81b..da5c087462b1 100644 --- a/Documentation/DocBook/kernel-hacking.tmpl +++ b/Documentation/DocBook/kernel-hacking.tmpl @@ -1208,8 +1208,8 @@ static struct block_device_operations opt_fops = { - Finally, don't forget to read Documentation/SubmittingPatches - and possibly Documentation/SubmittingDrivers. + Finally, don't forget to read Documentation/process/submitting-patches.rst + and possibly Documentation/process/submitting-drivers.rst. diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl deleted file mode 100644 index b57a9ede3224..000000000000 --- a/Documentation/DocBook/tracepoint.tmpl +++ /dev/null @@ -1,112 +0,0 @@ - - - - - - The Linux Kernel Tracepoint API - - - - Jason - Baron - -
- jbaron@redhat.com -
-
-
- - William - Cohen - -
- wcohen@redhat.com -
-
-
-
- - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - Introduction - - Tracepoints are static probe points that are located in strategic points - throughout the kernel. 'Probes' register/unregister with tracepoints - via a callback mechanism. The 'probes' are strictly typed functions that - are passed a unique set of parameters defined by each tracepoint. - - - - From this simple callback mechanism, 'probes' can be used to profile, debug, - and understand kernel behavior. There are a number of tools that provide a - framework for using 'probes'. These tools include Systemtap, ftrace, and - LTTng. - - - - Tracepoints are defined in a number of header files via various macros. Thus, - the purpose of this document is to provide a clear accounting of the available - tracepoints. The intention is to understand not only what tracepoints are - available but also to understand where future tracepoints might be added. - - - - The API presented has functions of the form: - trace_tracepointname(function parameters). These are the - tracepoints callbacks that are found throughout the code. Registering and - unregistering probes with these callback sites is covered in the - Documentation/trace/* directory. - - - - - IRQ -!Iinclude/trace/events/irq.h - - - - SIGNAL -!Iinclude/trace/events/signal.h - - - - Block IO -!Iinclude/trace/events/block.h - - - - Workqueue -!Iinclude/trace/events/workqueue.h - -
diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl index cd0e452dfed5..5210f8a577c6 100644 --- a/Documentation/DocBook/uio-howto.tmpl +++ b/Documentation/DocBook/uio-howto.tmpl @@ -45,6 +45,13 @@ GPL version 2. + + 0.10 + 2016-10-17 + sch + Added generic hyperv driver + + 0.9 2009-07-16 @@ -1033,6 +1040,61 @@ int main() + + +Generic Hyper-V UIO driver + + The generic driver is a kernel module named uio_hv_generic. + It supports devices on the Hyper-V VMBus similar to uio_pci_generic + on PCI bus. + + + +Making the driver recognize the device + +Since the driver does not declare any device GUID's, it will not get loaded +automatically and will not automatically bind to any devices, you must load it +and allocate id to the driver yourself. For example, to use the network device +GUID: + + modprobe uio_hv_generic + echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id + + + +If there already is a hardware specific kernel driver for the device, the +generic driver still won't bind to it, in this case if you want to use the +generic driver (why would you?) you'll have to manually unbind the hardware +specific driver and bind the generic driver, like this: + + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind + echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind + + + +You can verify that the device has been bound to the driver +by looking for it in sysfs, for example like the following: + + ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver + +Which if successful should print + + .../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic + + + + + +Things to know about uio_hv_generic + +On each interrupt, uio_hv_generic sets the Interrupt Disable bit. +This prevents the device from generating further interrupts +until the bit is cleared. The userspace driver should clear this +bit before blocking and waiting for more interrupts. + + + + Further information diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl deleted file mode 100644 index bc776be0f19c..000000000000 --- a/Documentation/DocBook/usb.tmpl +++ /dev/null @@ -1,992 +0,0 @@ - - - - - - The Linux-USB Host Side API - - - - This documentation is free software; you can redistribute - it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - - - - This program is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - - - - - - - Introduction to USB on Linux - - A Universal Serial Bus (USB) is used to connect a host, - such as a PC or workstation, to a number of peripheral - devices. USB uses a tree structure, with the host as the - root (the system's master), hubs as interior nodes, and - peripherals as leaves (and slaves). - Modern PCs support several such trees of USB devices, usually - one USB 2.0 tree (480 Mbit/sec each) with - a few USB 1.1 trees (12 Mbit/sec each) that are used when you - connect a USB 1.1 device directly to the machine's "root hub". - - - That master/slave asymmetry was designed-in for a number of - reasons, one being ease of use. It is not physically possible to - assemble (legal) USB cables incorrectly: all upstream "to the host" - connectors are the rectangular type (matching the sockets on - root hubs), and all downstream connectors are the squarish type - (or they are built into the peripheral). - Also, the host software doesn't need to deal with distributed - auto-configuration since the pre-designated master node manages all that. - And finally, at the electrical level, bus protocol overhead is reduced by - eliminating arbitration and moving scheduling into the host software. - - - USB 1.0 was announced in January 1996 and was revised - as USB 1.1 (with improvements in hub specification and - support for interrupt-out transfers) in September 1998. - USB 2.0 was released in April 2000, adding high-speed - transfers and transaction-translating hubs (used for USB 1.1 - and 1.0 backward compatibility). - - - Kernel developers added USB support to Linux early in the 2.2 kernel - series, shortly before 2.3 development forked. Updates from 2.3 were - regularly folded back into 2.2 releases, which improved reliability and - brought /sbin/hotplug support as well more drivers. - Such improvements were continued in the 2.5 kernel series, where they added - USB 2.0 support, improved performance, and made the host controller drivers - (HCDs) more consistent. They also simplified the API (to make bugs less - likely) and added internal "kerneldoc" documentation. - - - Linux can run inside USB devices as well as on - the hosts that control the devices. - But USB device drivers running inside those peripherals - don't do the same things as the ones running inside hosts, - so they've been given a different name: - gadget drivers. - This document does not cover gadget drivers. - - - - - - USB Host-Side API Model - - Host-side drivers for USB devices talk to the "usbcore" APIs. - There are two. One is intended for - general-purpose drivers (exposed through - driver frameworks), and the other is for drivers that are - part of the core. - Such core drivers include the hub driver - (which manages trees of USB devices) and several different kinds - of host controller drivers, - which control individual busses. - - - The device model seen by USB drivers is relatively complex. - - - - - USB supports four kinds of data transfers - (control, bulk, interrupt, and isochronous). Two of them (control - and bulk) use bandwidth as it's available, - while the other two (interrupt and isochronous) - are scheduled to provide guaranteed bandwidth. - - - The device description model includes one or more - "configurations" per device, only one of which is active at a time. - Devices that are capable of high-speed operation must also support - full-speed configurations, along with a way to ask about the - "other speed" configurations which might be used. - - - Configurations have one or more "interfaces", each - of which may have "alternate settings". Interfaces may be - standardized by USB "Class" specifications, or may be specific to - a vendor or device. - - USB device drivers actually bind to interfaces, not devices. - Think of them as "interface drivers", though you - may not see many devices where the distinction is important. - Most USB devices are simple, with only one configuration, - one interface, and one alternate setting. - - - Interfaces have one or more "endpoints", each of - which supports one type and direction of data transfer such as - "bulk out" or "interrupt in". The entire configuration may have - up to sixteen endpoints in each direction, allocated as needed - among all the interfaces. - - - Data transfer on USB is packetized; each endpoint - has a maximum packet size. - Drivers must often be aware of conventions such as flagging the end - of bulk transfers using "short" (including zero length) packets. - - - The Linux USB API supports synchronous calls for - control and bulk messages. - It also supports asynchronous calls for all kinds of data transfer, - using request structures called "URBs" (USB Request Blocks). - - - - - Accordingly, the USB Core API exposed to device drivers - covers quite a lot of territory. You'll probably need to consult - the USB 2.0 specification, available online from www.usb.org at - no cost, as well as class or device specifications. - - - The only host-side drivers that actually touch hardware - (reading/writing registers, handling IRQs, and so on) are the HCDs. - In theory, all HCDs provide the same functionality through the same - API. In practice, that's becoming more true on the 2.5 kernels, - but there are still differences that crop up especially with - fault handling. Different controllers don't necessarily report - the same aspects of failures, and recovery from faults (including - software-induced ones like unlinking an URB) isn't yet fully - consistent. - Device driver authors should make a point of doing disconnect - testing (while the device is active) with each different host - controller driver, to make sure drivers don't have bugs of - their own as well as to make sure they aren't relying on some - HCD-specific behavior. - (You will need external USB 1.1 and/or - USB 2.0 hubs to perform all those tests.) - - - - -USB-Standard Types - - In <linux/usb/ch9.h> you will find - the USB data types defined in chapter 9 of the USB specification. - These data types are used throughout USB, and in APIs including - this host side API, gadget APIs, and usbfs. - - -!Iinclude/linux/usb/ch9.h - - - -Host-Side Data Types and Macros - - The host side API exposes several layers to drivers, some of - which are more necessary than others. - These support lifecycle models for host side drivers - and devices, and support passing buffers through usbcore to - some HCD that performs the I/O for the device driver. - - - -!Iinclude/linux/usb.h - - - - USB Core APIs - - There are two basic I/O models in the USB API. - The most elemental one is asynchronous: drivers submit requests - in the form of an URB, and the URB's completion callback - handle the next step. - All USB transfer types support that model, although there - are special cases for control URBs (which always have setup - and status stages, but may not have a data stage) and - isochronous URBs (which allow large packets and include - per-packet fault reports). - Built on top of that is synchronous API support, where a - driver calls a routine that allocates one or more URBs, - submits them, and waits until they complete. - There are synchronous wrappers for single-buffer control - and bulk transfers (which are awkward to use in some - driver disconnect scenarios), and for scatterlist based - streaming i/o (bulk or interrupt). - - - USB drivers need to provide buffers that can be - used for DMA, although they don't necessarily need to - provide the DMA mapping themselves. - There are APIs to use used when allocating DMA buffers, - which can prevent use of bounce buffers on some systems. - In some cases, drivers may be able to rely on 64bit DMA - to eliminate another kind of bounce buffer. - - -!Edrivers/usb/core/urb.c -!Edrivers/usb/core/message.c -!Edrivers/usb/core/file.c -!Edrivers/usb/core/driver.c -!Edrivers/usb/core/usb.c -!Edrivers/usb/core/hub.c - - - Host Controller APIs - - These APIs are only for use by host controller drivers, - most of which implement standard register interfaces such as - EHCI, OHCI, or UHCI. - UHCI was one of the first interfaces, designed by Intel and - also used by VIA; it doesn't do much in hardware. - OHCI was designed later, to have the hardware do more work - (bigger transfers, tracking protocol state, and so on). - EHCI was designed with USB 2.0; its design has features that - resemble OHCI (hardware does much more work) as well as - UHCI (some parts of ISO support, TD list processing). - - - There are host controllers other than the "big three", - although most PCI based controllers (and a few non-PCI based - ones) use one of those interfaces. - Not all host controllers use DMA; some use PIO, and there - is also a simulator. - - - The same basic APIs are available to drivers for all - those controllers. - For historical reasons they are in two layers: - struct usb_bus is a rather thin - layer that became available in the 2.2 kernels, while - struct usb_hcd is a more featureful - layer (available in later 2.4 kernels and in 2.5) that - lets HCDs share common code, to shrink driver size - and significantly reduce hcd-specific behaviors. - - -!Edrivers/usb/core/hcd.c -!Edrivers/usb/core/hcd-pci.c -!Idrivers/usb/core/buffer.c - - - - The USB Filesystem (usbfs) - - This chapter presents the Linux usbfs. - You may prefer to avoid writing new kernel code for your - USB driver; that's the problem that usbfs set out to solve. - User mode device drivers are usually packaged as applications - or libraries, and may use usbfs through some programming library - that wraps it. Such libraries include - libusb - for C/C++, and - jUSB for Java. - - - Unfinished - This particular documentation is incomplete, - especially with respect to the asynchronous mode. - As of kernel 2.5.66 the code and this (new) documentation - need to be cross-reviewed. - - - - Configure usbfs into Linux kernels by enabling the - USB filesystem option (CONFIG_USB_DEVICEFS), - and you get basic support for user mode USB device drivers. - Until relatively recently it was often (confusingly) called - usbdevfs although it wasn't solving what - devfs was. - Every USB device will appear in usbfs, regardless of whether or - not it has a kernel driver. - - - - What files are in "usbfs"? - - Conventionally mounted at - /proc/bus/usb, usbfs - features include: - - /proc/bus/usb/devices - ... a text file - showing each of the USB devices on known to the kernel, - and their configuration descriptors. - You can also poll() this to learn about new devices. - - /proc/bus/usb/BBB/DDD - ... magic files - exposing the each device's configuration descriptors, and - supporting a series of ioctls for making device requests, - including I/O to devices. (Purely for access by programs.) - - - - - Each bus is given a number (BBB) based on when it was - enumerated; within each bus, each device is given a similar - number (DDD). - Those BBB/DDD paths are not "stable" identifiers; - expect them to change even if you always leave the devices - plugged in to the same hub port. - Don't even think of saving these in application - configuration files. - Stable identifiers are available, for user mode applications - that want to use them. HID and networking devices expose - these stable IDs, so that for example you can be sure that - you told the right UPS to power down its second server. - "usbfs" doesn't (yet) expose those IDs. - - - - - - Mounting and Access Control - - There are a number of mount options for usbfs, which will - be of most interest to you if you need to override the default - access control policy. - That policy is that only root may read or write device files - (/proc/bus/BBB/DDD) although anyone may read - the devices - or drivers files. - I/O requests to the device also need the CAP_SYS_RAWIO capability, - - - The significance of that is that by default, all user mode - device drivers need super-user privileges. - You can change modes or ownership in a driver setup - when the device hotplugs, or maye just start the - driver right then, as a privileged server (or some activity - within one). - That's the most secure approach for multi-user systems, - but for single user systems ("trusted" by that user) - it's more convenient just to grant everyone all access - (using the devmode=0666 option) - so the driver can start whenever it's needed. - - - The mount options for usbfs, usable in /etc/fstab or - in command line invocations of mount, are: - - - - busgid=NNNNN - Controls the GID used for the - /proc/bus/usb/BBB - directories. (Default: 0) - busmode=MMM - Controls the file mode used for the - /proc/bus/usb/BBB - directories. (Default: 0555) - - busuid=NNNNN - Controls the UID used for the - /proc/bus/usb/BBB - directories. (Default: 0) - - devgid=NNNNN - Controls the GID used for the - /proc/bus/usb/BBB/DDD - files. (Default: 0) - devmode=MMM - Controls the file mode used for the - /proc/bus/usb/BBB/DDD - files. (Default: 0644) - devuid=NNNNN - Controls the UID used for the - /proc/bus/usb/BBB/DDD - files. (Default: 0) - - listgid=NNNNN - Controls the GID used for the - /proc/bus/usb/devices and drivers files. - (Default: 0) - listmode=MMM - Controls the file mode used for the - /proc/bus/usb/devices and drivers files. - (Default: 0444) - listuid=NNNNN - Controls the UID used for the - /proc/bus/usb/devices and drivers files. - (Default: 0) - - - - - Note that many Linux distributions hard-wire the mount options - for usbfs in their init scripts, such as - /etc/rc.d/rc.sysinit, - rather than making it easy to set this per-system - policy in /etc/fstab. - - - - - - /proc/bus/usb/devices - - This file is handy for status viewing tools in user - mode, which can scan the text format and ignore most of it. - More detailed device status (including class and vendor - status) is available from device-specific files. - For information about the current format of this file, - see the - Documentation/usb/proc_usb_info.txt - file in your Linux kernel sources. - - - This file, in combination with the poll() system call, can - also be used to detect when devices are added or removed: -int fd; -struct pollfd pfd; - -fd = open("/proc/bus/usb/devices", O_RDONLY); -pfd = { fd, POLLIN, 0 }; -for (;;) { - /* The first time through, this call will return immediately. */ - poll(&pfd, 1, -1); - - /* To see what's changed, compare the file's previous and current - contents or scan the filesystem. (Scanning is more precise.) */ -} - Note that this behavior is intended to be used for informational - and debug purposes. It would be more appropriate to use programs - such as udev or HAL to initialize a device or start a user-mode - helper program, for instance. - - - - - /proc/bus/usb/BBB/DDD - - Use these files in one of these basic ways: - - - They can be read, - producing first the device descriptor - (18 bytes) and then the descriptors for the current configuration. - See the USB 2.0 spec for details about those binary data formats. - You'll need to convert most multibyte values from little endian - format to your native host byte order, although a few of the - fields in the device descriptor (both of the BCD-encoded fields, - and the vendor and product IDs) will be byteswapped for you. - Note that configuration descriptors include descriptors for - interfaces, altsettings, endpoints, and maybe additional - class descriptors. - - - Perform USB operations using - ioctl() requests to make endpoint I/O - requests (synchronously or asynchronously) or manage - the device. - These requests need the CAP_SYS_RAWIO capability, - as well as filesystem access permissions. - Only one ioctl request can be made on one of these - device files at a time. - This means that if you are synchronously reading an endpoint - from one thread, you won't be able to write to a different - endpoint from another thread until the read completes. - This works for half duplex protocols, - but otherwise you'd use asynchronous i/o requests. - - - - - - - Life Cycle of User Mode Drivers - - Such a driver first needs to find a device file - for a device it knows how to handle. - Maybe it was told about it because a - /sbin/hotplug event handling agent - chose that driver to handle the new device. - Or maybe it's an application that scans all the - /proc/bus/usb device files, and ignores most devices. - In either case, it should read() all - the descriptors from the device file, - and check them against what it knows how to handle. - It might just reject everything except a particular - vendor and product ID, or need a more complex policy. - - - Never assume there will only be one such device - on the system at a time! - If your code can't handle more than one device at - a time, at least detect when there's more than one, and - have your users choose which device to use. - - - Once your user mode driver knows what device to use, - it interacts with it in either of two styles. - The simple style is to make only control requests; some - devices don't need more complex interactions than those. - (An example might be software using vendor-specific control - requests for some initialization or configuration tasks, - with a kernel driver for the rest.) - - - More likely, you need a more complex style driver: - one using non-control endpoints, reading or writing data - and claiming exclusive use of an interface. - Bulk transfers are easiest to use, - but only their sibling interrupt transfers - work with low speed devices. - Both interrupt and isochronous transfers - offer service guarantees because their bandwidth is reserved. - Such "periodic" transfers are awkward to use through usbfs, - unless you're using the asynchronous calls. However, interrupt - transfers can also be used in a synchronous "one shot" style. - - - Your user-mode driver should never need to worry - about cleaning up request state when the device is - disconnected, although it should close its open file - descriptors as soon as it starts seeing the ENODEV - errors. - - - - - The ioctl() Requests - - To use these ioctls, you need to include the following - headers in your userspace program: -#include <linux/usb.h> -#include <linux/usbdevice_fs.h> -#include <asm/byteorder.h> - The standard USB device model requests, from "Chapter 9" of - the USB 2.0 specification, are automatically included from - the <linux/usb/ch9.h> header. - - - Unless noted otherwise, the ioctl requests - described here will - update the modification time on the usbfs file to which - they are applied (unless they fail). - A return of zero indicates success; otherwise, a - standard USB error code is returned. (These are - documented in - Documentation/usb/error-codes.txt - in your kernel sources.) - - - Each of these files multiplexes access to several - I/O streams, one per endpoint. - Each device has one control endpoint (endpoint zero) - which supports a limited RPC style RPC access. - Devices are configured - by hub_wq (in the kernel) setting a device-wide - configuration that affects things - like power consumption and basic functionality. - The endpoints are part of USB interfaces, - which may have altsettings - affecting things like which endpoints are available. - Many devices only have a single configuration and interface, - so drivers for them will ignore configurations and altsettings. - - - - - Management/Status Requests - - A number of usbfs requests don't deal very directly - with device I/O. - They mostly relate to device management and status. - These are all synchronous requests. - - - - - USBDEVFS_CLAIMINTERFACE - This is used to force usbfs to - claim a specific interface, - which has not previously been claimed by usbfs or any other - kernel driver. - The ioctl parameter is an integer holding the number of - the interface (bInterfaceNumber from descriptor). - - Note that if your driver doesn't claim an interface - before trying to use one of its endpoints, and no - other driver has bound to it, then the interface is - automatically claimed by usbfs. - - This claim will be released by a RELEASEINTERFACE ioctl, - or by closing the file descriptor. - File modification time is not updated by this request. - - - USBDEVFS_CONNECTINFO - Says whether the device is lowspeed. - The ioctl parameter points to a structure like this: -struct usbdevfs_connectinfo { - unsigned int devnum; - unsigned char slow; -}; - File modification time is not updated by this request. - - You can't tell whether a "not slow" - device is connected at high speed (480 MBit/sec) - or just full speed (12 MBit/sec). - You should know the devnum value already, - it's the DDD value of the device file name. - - - USBDEVFS_GETDRIVER - Returns the name of the kernel driver - bound to a given interface (a string). Parameter - is a pointer to this structure, which is modified: -struct usbdevfs_getdriver { - unsigned int interface; - char driver[USBDEVFS_MAXDRIVERNAME + 1]; -}; - File modification time is not updated by this request. - - - USBDEVFS_IOCTL - Passes a request from userspace through - to a kernel driver that has an ioctl entry in the - struct usb_driver it registered. -struct usbdevfs_ioctl { - int ifno; - int ioctl_code; - void *data; -}; - -/* user mode call looks like this. - * 'request' becomes the driver->ioctl() 'code' parameter. - * the size of 'param' is encoded in 'request', and that data - * is copied to or from the driver->ioctl() 'buf' parameter. - */ -static int -usbdev_ioctl (int fd, int ifno, unsigned request, void *param) -{ - struct usbdevfs_ioctl wrapper; - - wrapper.ifno = ifno; - wrapper.ioctl_code = request; - wrapper.data = param; - - return ioctl (fd, USBDEVFS_IOCTL, &wrapper); -} - File modification time is not updated by this request. - - This request lets kernel drivers talk to user mode code - through filesystem operations even when they don't create - a character or block special device. - It's also been used to do things like ask devices what - device special file should be used. - Two pre-defined ioctls are used - to disconnect and reconnect kernel drivers, so - that user mode code can completely manage binding - and configuration of devices. - - - USBDEVFS_RELEASEINTERFACE - This is used to release the claim usbfs - made on interface, either implicitly or because of a - USBDEVFS_CLAIMINTERFACE call, before the file - descriptor is closed. - The ioctl parameter is an integer holding the number of - the interface (bInterfaceNumber from descriptor); - File modification time is not updated by this request. - - No security check is made to ensure - that the task which made the claim is the one - which is releasing it. - This means that user mode driver may interfere - other ones. - - - USBDEVFS_RESETEP - Resets the data toggle value for an endpoint - (bulk or interrupt) to DATA0. - The ioctl parameter is an integer endpoint number - (1 to 15, as identified in the endpoint descriptor), - with USB_DIR_IN added if the device's endpoint sends - data to the host. - - Avoid using this request. - It should probably be removed. - Using it typically means the device and driver will lose - toggle synchronization. If you really lost synchronization, - you likely need to completely handshake with the device, - using a request like CLEAR_HALT - or SET_INTERFACE. - - - USBDEVFS_DROP_PRIVILEGES - This is used to relinquish the ability - to do certain operations which are considered to be - privileged on a usbfs file descriptor. - This includes claiming arbitrary interfaces, resetting - a device on which there are currently claimed interfaces - from other users, and issuing USBDEVFS_IOCTL calls. - The ioctl parameter is a 32 bit mask of interfaces - the user is allowed to claim on this file descriptor. - You may issue this ioctl more than one time to narrow - said mask. - - - - - - - Synchronous I/O Support - - Synchronous requests involve the kernel blocking - until the user mode request completes, either by - finishing successfully or by reporting an error. - In most cases this is the simplest way to use usbfs, - although as noted above it does prevent performing I/O - to more than one endpoint at a time. - - - - - USBDEVFS_BULK - Issues a bulk read or write request to the - device. - The ioctl parameter is a pointer to this structure: -struct usbdevfs_bulktransfer { - unsigned int ep; - unsigned int len; - unsigned int timeout; /* in milliseconds */ - void *data; -}; - The "ep" value identifies a - bulk endpoint number (1 to 15, as identified in an endpoint - descriptor), - masked with USB_DIR_IN when referring to an endpoint which - sends data to the host from the device. - The length of the data buffer is identified by "len"; - Recent kernels support requests up to about 128KBytes. - FIXME say how read length is returned, - and how short reads are handled.. - - - USBDEVFS_CLEAR_HALT - Clears endpoint halt (stall) and - resets the endpoint toggle. This is only - meaningful for bulk or interrupt endpoints. - The ioctl parameter is an integer endpoint number - (1 to 15, as identified in an endpoint descriptor), - masked with USB_DIR_IN when referring to an endpoint which - sends data to the host from the device. - - Use this on bulk or interrupt endpoints which have - stalled, returning -EPIPE status - to a data transfer request. - Do not issue the control request directly, since - that could invalidate the host's record of the - data toggle. - - - USBDEVFS_CONTROL - Issues a control request to the device. - The ioctl parameter points to a structure like this: -struct usbdevfs_ctrltransfer { - __u8 bRequestType; - __u8 bRequest; - __u16 wValue; - __u16 wIndex; - __u16 wLength; - __u32 timeout; /* in milliseconds */ - void *data; -}; - - The first eight bytes of this structure are the contents - of the SETUP packet to be sent to the device; see the - USB 2.0 specification for details. - The bRequestType value is composed by combining a - USB_TYPE_* value, a USB_DIR_* value, and a - USB_RECIP_* value (from - <linux/usb.h>). - If wLength is nonzero, it describes the length of the data - buffer, which is either written to the device - (USB_DIR_OUT) or read from the device (USB_DIR_IN). - - At this writing, you can't transfer more than 4 KBytes - of data to or from a device; usbfs has a limit, and - some host controller drivers have a limit. - (That's not usually a problem.) - Also there's no way to say it's - not OK to get a short read back from the device. - - - USBDEVFS_RESET - Does a USB level device reset. - The ioctl parameter is ignored. - After the reset, this rebinds all device interfaces. - File modification time is not updated by this request. - - Avoid using this call - until some usbcore bugs get fixed, - since it does not fully synchronize device, interface, - and driver (not just usbfs) state. - - - USBDEVFS_SETINTERFACE - Sets the alternate setting for an - interface. The ioctl parameter is a pointer to a - structure like this: -struct usbdevfs_setinterface { - unsigned int interface; - unsigned int altsetting; -}; - File modification time is not updated by this request. - - Those struct members are from some interface descriptor - applying to the current configuration. - The interface number is the bInterfaceNumber value, and - the altsetting number is the bAlternateSetting value. - (This resets each endpoint in the interface.) - - - USBDEVFS_SETCONFIGURATION - Issues the - usb_set_configuration call - for the device. - The parameter is an integer holding the number of - a configuration (bConfigurationValue from descriptor). - File modification time is not updated by this request. - - Avoid using this call - until some usbcore bugs get fixed, - since it does not fully synchronize device, interface, - and driver (not just usbfs) state. - - - - - - - Asynchronous I/O Support - - As mentioned above, there are situations where it may be - important to initiate concurrent operations from user mode code. - This is particularly important for periodic transfers - (interrupt and isochronous), but it can be used for other - kinds of USB requests too. - In such cases, the asynchronous requests described here - are essential. Rather than submitting one request and having - the kernel block until it completes, the blocking is separate. - - - These requests are packaged into a structure that - resembles the URB used by kernel device drivers. - (No POSIX Async I/O support here, sorry.) - It identifies the endpoint type (USBDEVFS_URB_TYPE_*), - endpoint (number, masked with USB_DIR_IN as appropriate), - buffer and length, and a user "context" value serving to - uniquely identify each request. - (It's usually a pointer to per-request data.) - Flags can modify requests (not as many as supported for - kernel drivers). - - - Each request can specify a realtime signal number - (between SIGRTMIN and SIGRTMAX, inclusive) to request a - signal be sent when the request completes. - - - When usbfs returns these urbs, the status value - is updated, and the buffer may have been modified. - Except for isochronous transfers, the actual_length is - updated to say how many bytes were transferred; if the - USBDEVFS_URB_DISABLE_SPD flag is set - ("short packets are not OK"), if fewer bytes were read - than were requested then you get an error report. - - -struct usbdevfs_iso_packet_desc { - unsigned int length; - unsigned int actual_length; - unsigned int status; -}; - -struct usbdevfs_urb { - unsigned char type; - unsigned char endpoint; - int status; - unsigned int flags; - void *buffer; - int buffer_length; - int actual_length; - int start_frame; - int number_of_packets; - int error_count; - unsigned int signr; - void *usercontext; - struct usbdevfs_iso_packet_desc iso_frame_desc[]; -}; - - For these asynchronous requests, the file modification - time reflects when the request was initiated. - This contrasts with their use with the synchronous requests, - where it reflects when requests complete. - - - - - USBDEVFS_DISCARDURB - - TBS - File modification time is not updated by this request. - - - - USBDEVFS_DISCSIGNAL - - TBS - File modification time is not updated by this request. - - - - USBDEVFS_REAPURB - - TBS - File modification time is not updated by this request. - - - - USBDEVFS_REAPURBNDELAY - - TBS - File modification time is not updated by this request. - - - - USBDEVFS_SUBMITURB - - TBS - - - - - - - - - - - - diff --git a/Documentation/DocBook/writing-an-alsa-driver.tmpl b/Documentation/DocBook/writing-an-alsa-driver.tmpl deleted file mode 100644 index a27ab9f53fb6..000000000000 --- a/Documentation/DocBook/writing-an-alsa-driver.tmpl +++ /dev/null @@ -1,6206 +0,0 @@ - - - - - - - - - Writing an ALSA Driver - - Takashi - Iwai - -
- tiwai@suse.de -
-
-
- - Oct 15, 2007 - 0.3.7 - - - - This document describes how to write an ALSA (Advanced Linux - Sound Architecture) driver. - - - - - - Copyright (c) 2002-2005 Takashi Iwai tiwai@suse.de - - - - This document is free; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. - - - - This document is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the - implied warranty of MERCHANTABILITY or FITNESS FOR A - PARTICULAR PURPOSE. See the GNU General Public License - for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - -
- - - - - - Preface - - This document describes how to write an - - ALSA (Advanced Linux Sound Architecture) - driver. The document focuses mainly on PCI soundcards. - In the case of other device types, the API might - be different, too. However, at least the ALSA kernel API is - consistent, and therefore it would be still a bit help for - writing them. - - - - This document targets people who already have enough - C language skills and have basic linux kernel programming - knowledge. This document doesn't explain the general - topic of linux kernel coding and doesn't cover low-level - driver implementation details. It only describes - the standard way to write a PCI sound driver on ALSA. - - - - If you are already familiar with the older ALSA ver.0.5.x API, you - can check the drivers such as sound/pci/es1938.c or - sound/pci/maestro3.c which have also almost the same - code-base in the ALSA 0.5.x tree, so you can compare the differences. - - - - This document is still a draft version. Any feedback and - corrections, please!! - - - - - - - - - File Tree Structure - -
- General - - The ALSA drivers are provided in two ways. - - - - One is the trees provided as a tarball or via cvs from the - ALSA's ftp site, and another is the 2.6 (or later) Linux kernel - tree. To synchronize both, the ALSA driver tree is split into - two different trees: alsa-kernel and alsa-driver. The former - contains purely the source code for the Linux 2.6 (or later) - tree. This tree is designed only for compilation on 2.6 or - later environment. The latter, alsa-driver, contains many subtle - files for compiling ALSA drivers outside of the Linux kernel tree, - wrapper functions for older 2.2 and 2.4 kernels, to adapt the latest kernel API, - and additional drivers which are still in development or in - tests. The drivers in alsa-driver tree will be moved to - alsa-kernel (and eventually to the 2.6 kernel tree) when they are - finished and confirmed to work fine. - - - - The file tree structure of ALSA driver is depicted below. Both - alsa-kernel and alsa-driver have almost the same file - structure, except for core directory. It's - named as acore in alsa-driver tree. - - - ALSA File Tree Structure - - sound - /core - /oss - /seq - /oss - /instr - /ioctl32 - /include - /drivers - /mpu401 - /opl3 - /i2c - /l3 - /synth - /emux - /pci - /(cards) - /isa - /(cards) - /arm - /ppc - /sparc - /usb - /pcmcia /(cards) - /oss - - - -
- -
- core directory - - This directory contains the middle layer which is the heart - of ALSA drivers. In this directory, the native ALSA modules are - stored. The sub-directories contain different modules and are - dependent upon the kernel config. - - -
- core/oss - - - The codes for PCM and mixer OSS emulation modules are stored - in this directory. The rawmidi OSS emulation is included in - the ALSA rawmidi code since it's quite small. The sequencer - code is stored in core/seq/oss directory (see - - below). - -
- -
- core/ioctl32 - - - This directory contains the 32bit-ioctl wrappers for 64bit - architectures such like x86-64, ppc64 and sparc64. For 32bit - and alpha architectures, these are not compiled. - -
- -
- core/seq - - This directory and its sub-directories are for the ALSA - sequencer. This directory contains the sequencer core and - primary sequencer modules such like snd-seq-midi, - snd-seq-virmidi, etc. They are compiled only when - CONFIG_SND_SEQUENCER is set in the kernel - config. - -
- -
- core/seq/oss - - This contains the OSS sequencer emulation codes. - -
- -
- core/seq/instr - - This directory contains the modules for the sequencer - instrument layer. - -
-
- -
- include directory - - This is the place for the public header files of ALSA drivers, - which are to be exported to user-space, or included by - several files at different directories. Basically, the private - header files should not be placed in this directory, but you may - still find files there, due to historical reasons :) - -
- -
- drivers directory - - This directory contains code shared among different drivers - on different architectures. They are hence supposed not to be - architecture-specific. - For example, the dummy pcm driver and the serial MIDI - driver are found in this directory. In the sub-directories, - there is code for components which are independent from - bus and cpu architectures. - - -
- drivers/mpu401 - - The MPU401 and MPU401-UART modules are stored here. - -
- -
- drivers/opl3 and opl4 - - The OPL3 and OPL4 FM-synth stuff is found here. - -
-
- -
- i2c directory - - This contains the ALSA i2c components. - - - - Although there is a standard i2c layer on Linux, ALSA has its - own i2c code for some cards, because the soundcard needs only a - simple operation and the standard i2c API is too complicated for - such a purpose. - - -
- i2c/l3 - - This is a sub-directory for ARM L3 i2c. - -
-
- -
- synth directory - - This contains the synth middle-level modules. - - - - So far, there is only Emu8000/Emu10k1 synth driver under - the synth/emux sub-directory. - -
- -
- pci directory - - This directory and its sub-directories hold the top-level card modules - for PCI soundcards and the code specific to the PCI BUS. - - - - The drivers compiled from a single file are stored directly - in the pci directory, while the drivers with several source files are - stored on their own sub-directory (e.g. emu10k1, ice1712). - -
- -
- isa directory - - This directory and its sub-directories hold the top-level card modules - for ISA soundcards. - -
- -
- arm, ppc, and sparc directories - - They are used for top-level card modules which are - specific to one of these architectures. - -
- -
- usb directory - - This directory contains the USB-audio driver. In the latest version, the - USB MIDI driver is integrated in the usb-audio driver. - -
- -
- pcmcia directory - - The PCMCIA, especially PCCard drivers will go here. CardBus - drivers will be in the pci directory, because their API is identical - to that of standard PCI cards. - -
- -
- oss directory - - The OSS/Lite source files are stored here in Linux 2.6 (or - later) tree. In the ALSA driver tarball, this directory is empty, - of course :) - -
-
- - - - - - - Basic Flow for PCI Drivers - -
- Outline - - The minimum flow for PCI soundcards is as follows: - - - define the PCI ID table (see the section - PCI Entries - ). - create probe() callback. - create remove() callback. - create a pci_driver structure - containing the three pointers above. - create an init() function just calling - the pci_register_driver() to register the pci_driver table - defined above. - create an exit() function to call - the pci_unregister_driver() function. - - -
- -
- Full Code Example - - The code example is shown below. Some parts are kept - unimplemented at this moment but will be filled in the - next sections. The numbers in the comment lines of the - snd_mychip_probe() function - refer to details explained in the following section. - - - Basic Flow for PCI Drivers - Example - - - #include - #include - #include - #include - - /* module parameters (see "Module Parameters") */ - /* SNDRV_CARDS: maximum number of cards supported by this module */ - static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; - static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; - static bool enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP; - - /* definition of the chip-specific record */ - struct mychip { - struct snd_card *card; - /* the rest of the implementation will be in section - * "PCI Resource Management" - */ - }; - - /* chip-specific destructor - * (see "PCI Resource Management") - */ - static int snd_mychip_free(struct mychip *chip) - { - .... /* will be implemented later... */ - } - - /* component-destructor - * (see "Management of Cards and Components") - */ - static int snd_mychip_dev_free(struct snd_device *device) - { - return snd_mychip_free(device->device_data); - } - - /* chip-specific constructor - * (see "Management of Cards and Components") - */ - static int snd_mychip_create(struct snd_card *card, - struct pci_dev *pci, - struct mychip **rchip) - { - struct mychip *chip; - int err; - static struct snd_device_ops ops = { - .dev_free = snd_mychip_dev_free, - }; - - *rchip = NULL; - - /* check PCI availability here - * (see "PCI Resource Management") - */ - .... - - /* allocate a chip-specific data with zero filled */ - chip = kzalloc(sizeof(*chip), GFP_KERNEL); - if (chip == NULL) - return -ENOMEM; - - chip->card = card; - - /* rest of initialization here; will be implemented - * later, see "PCI Resource Management" - */ - .... - - err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops); - if (err < 0) { - snd_mychip_free(chip); - return err; - } - - *rchip = chip; - return 0; - } - - /* constructor -- see "Constructor" sub-section */ - static int snd_mychip_probe(struct pci_dev *pci, - const struct pci_device_id *pci_id) - { - static int dev; - struct snd_card *card; - struct mychip *chip; - int err; - - /* (1) */ - if (dev >= SNDRV_CARDS) - return -ENODEV; - if (!enable[dev]) { - dev++; - return -ENOENT; - } - - /* (2) */ - err = snd_card_new(&pci->dev, index[dev], id[dev], THIS_MODULE, - 0, &card); - if (err < 0) - return err; - - /* (3) */ - err = snd_mychip_create(card, pci, &chip); - if (err < 0) { - snd_card_free(card); - return err; - } - - /* (4) */ - strcpy(card->driver, "My Chip"); - strcpy(card->shortname, "My Own Chip 123"); - sprintf(card->longname, "%s at 0x%lx irq %i", - card->shortname, chip->ioport, chip->irq); - - /* (5) */ - .... /* implemented later */ - - /* (6) */ - err = snd_card_register(card); - if (err < 0) { - snd_card_free(card); - return err; - } - - /* (7) */ - pci_set_drvdata(pci, card); - dev++; - return 0; - } - - /* destructor -- see the "Destructor" sub-section */ - static void snd_mychip_remove(struct pci_dev *pci) - { - snd_card_free(pci_get_drvdata(pci)); - pci_set_drvdata(pci, NULL); - } -]]> - - - -
- -
- Constructor - - The real constructor of PCI drivers is the probe callback. - The probe callback and other component-constructors which are called - from the probe callback cannot be used with - the __init prefix - because any PCI device could be a hotplug device. - - - - In the probe callback, the following scheme is often used. - - -
- 1) Check and increment the device index. - - - -= SNDRV_CARDS) - return -ENODEV; - if (!enable[dev]) { - dev++; - return -ENOENT; - } -]]> - - - - where enable[dev] is the module option. - - - - Each time the probe callback is called, check the - availability of the device. If not available, simply increment - the device index and returns. dev will be incremented also - later (step - 7). - -
- -
- 2) Create a card instance - - - -dev, index[dev], id[dev], THIS_MODULE, - 0, &card); -]]> - - - - - - The details will be explained in the section - - Management of Cards and Components. - -
- -
- 3) Create a main component - - In this part, the PCI resources are allocated. - - - - - - - - The details will be explained in the section PCI Resource - Management. - -
- -
- 4) Set the driver ID and name strings. - - - -driver, "My Chip"); - strcpy(card->shortname, "My Own Chip 123"); - sprintf(card->longname, "%s at 0x%lx irq %i", - card->shortname, chip->ioport, chip->irq); -]]> - - - - The driver field holds the minimal ID string of the - chip. This is used by alsa-lib's configurator, so keep it - simple but unique. - Even the same driver can have different driver IDs to - distinguish the functionality of each chip type. - - - - The shortname field is a string shown as more verbose - name. The longname field contains the information - shown in /proc/asound/cards. - -
- -
- 5) Create other components, such as mixer, MIDI, etc. - - Here you define the basic components such as - PCM, - mixer (e.g. AC97), - MIDI (e.g. MPU-401), - and other interfaces. - Also, if you want a proc - file, define it here, too. - -
- -
- 6) Register the card instance. - - - - - - - - - - Will be explained in the section Management - of Cards and Components, too. - -
- -
- 7) Set the PCI driver data and return zero. - - - - - - - - In the above, the card record is stored. This pointer is - used in the remove callback and power-management - callbacks, too. - -
-
- -
- Destructor - - The destructor, remove callback, simply releases the card - instance. Then the ALSA middle layer will release all the - attached components automatically. - - - - It would be typically like the following: - - - - - - - - The above code assumes that the card pointer is set to the PCI - driver data. - -
- -
- Header Files - - For the above example, at least the following include files - are necessary. - - - - - #include - #include - #include - #include -]]> - - - - where the last one is necessary only when module options are - defined in the source file. If the code is split into several - files, the files without module options don't need them. - - - - In addition to these headers, you'll need - <linux/interrupt.h> for interrupt - handling, and <asm/io.h> for I/O - access. If you use the mdelay() or - udelay() functions, you'll need to include - <linux/delay.h> too. - - - - The ALSA interfaces like the PCM and control APIs are defined in other - <sound/xxx.h> header files. - They have to be included after - <sound/core.h>. - - -
-
- - - - - - - Management of Cards and Components - -
- Card Instance - - For each soundcard, a card record must be allocated. - - - - A card record is the headquarters of the soundcard. It manages - the whole list of devices (components) on the soundcard, such as - PCM, mixers, MIDI, synthesizer, and so on. Also, the card - record holds the ID and the name strings of the card, manages - the root of proc files, and controls the power-management states - and hotplug disconnections. The component list on the card - record is used to manage the correct release of resources at - destruction. - - - - As mentioned above, to create a card instance, call - snd_card_new(). - - - -dev, index, id, module, extra_size, &card); -]]> - - - - - - The function takes six arguments: the parent device pointer, - the card-index number, the id string, the module pointer (usually - THIS_MODULE), - the size of extra-data space, and the pointer to return the - card instance. The extra_size argument is used to - allocate card->private_data for the - chip-specific data. Note that these data - are allocated by snd_card_new(). - - - - The first argument, the pointer of struct - device, specifies the parent device. - For PCI devices, typically &pci-> is passed there. - -
- -
- Components - - After the card is created, you can attach the components - (devices) to the card instance. In an ALSA driver, a component is - represented as a struct snd_device object. - A component can be a PCM instance, a control interface, a raw - MIDI interface, etc. Each such instance has one component - entry. - - - - A component can be created via - snd_device_new() function. - - - - - - - - - - This takes the card pointer, the device-level - (SNDRV_DEV_XXX), the data pointer, and the - callback pointers (&ops). The - device-level defines the type of components and the order of - registration and de-registration. For most components, the - device-level is already defined. For a user-defined component, - you can use SNDRV_DEV_LOWLEVEL. - - - - This function itself doesn't allocate the data space. The data - must be allocated manually beforehand, and its pointer is passed - as the argument. This pointer (chip in the - above example) is used as the identifier for the instance. - - - - Each pre-defined ALSA component such as ac97 and pcm calls - snd_device_new() inside its - constructor. The destructor for each component is defined in the - callback pointers. Hence, you don't need to take care of - calling a destructor for such a component. - - - - If you wish to create your own component, you need to - set the destructor function to the dev_free callback in - the ops, so that it can be released - automatically via snd_card_free(). - The next example will show an implementation of chip-specific - data. - -
- -
- Chip-Specific Data - - Chip-specific information, e.g. the I/O port address, its - resource pointer, or the irq number, is stored in the - chip-specific record. - - - - - - - - - - In general, there are two ways of allocating the chip record. - - -
- 1. Allocating via <function>snd_card_new()</function>. - - As mentioned above, you can pass the extra-data-length - to the 5th argument of snd_card_new(), i.e. - - - -dev, index[dev], id[dev], THIS_MODULE, - sizeof(struct mychip), &card); -]]> - - - - struct mychip is the type of the chip record. - - - - In return, the allocated record can be accessed as - - - -private_data; -]]> - - - - With this method, you don't have to allocate twice. - The record is released together with the card instance. - -
- -
- 2. Allocating an extra device. - - - After allocating a card instance via - snd_card_new() (with - 0 on the 4th arg), call - kzalloc(). - - - -dev, index[dev], id[dev], THIS_MODULE, - 0, &card); - ..... - chip = kzalloc(sizeof(*chip), GFP_KERNEL); -]]> - - - - - - The chip record should have the field to hold the card - pointer at least, - - - - - - - - - - Then, set the card pointer in the returned chip instance. - - - -card = card; -]]> - - - - - - Next, initialize the fields, and register this chip - record as a low-level device with a specified - ops, - - - - - - - - snd_mychip_dev_free() is the - device-destructor function, which will call the real - destructor. - - - - - -device_data); - } -]]> - - - - where snd_mychip_free() is the real destructor. - -
-
- -
- Registration and Release - - After all components are assigned, register the card instance - by calling snd_card_register(). Access - to the device files is enabled at this point. That is, before - snd_card_register() is called, the - components are safely inaccessible from external side. If this - call fails, exit the probe function after releasing the card via - snd_card_free(). - - - - For releasing the card instance, you can call simply - snd_card_free(). As mentioned earlier, all - components are released automatically by this call. - - - - For a device which allows hotplugging, you can use - snd_card_free_when_closed. This one will - postpone the destruction until all devices are closed. - - -
- -
- - - - - - - PCI Resource Management - -
- Full Code Example - - In this section, we'll complete the chip-specific constructor, - destructor and PCI entries. Example code is shown first, - below. - - - PCI Resource Management Example - -irq >= 0) - free_irq(chip->irq, chip); - /* release the I/O ports & memory */ - pci_release_regions(chip->pci); - /* disable the PCI entry */ - pci_disable_device(chip->pci); - /* release the data */ - kfree(chip); - return 0; - } - - /* chip-specific constructor */ - static int snd_mychip_create(struct snd_card *card, - struct pci_dev *pci, - struct mychip **rchip) - { - struct mychip *chip; - int err; - static struct snd_device_ops ops = { - .dev_free = snd_mychip_dev_free, - }; - - *rchip = NULL; - - /* initialize the PCI entry */ - err = pci_enable_device(pci); - if (err < 0) - return err; - /* check PCI availability (28bit DMA) */ - if (pci_set_dma_mask(pci, DMA_BIT_MASK(28)) < 0 || - pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(28)) < 0) { - printk(KERN_ERR "error to set 28bit mask DMA\n"); - pci_disable_device(pci); - return -ENXIO; - } - - chip = kzalloc(sizeof(*chip), GFP_KERNEL); - if (chip == NULL) { - pci_disable_device(pci); - return -ENOMEM; - } - - /* initialize the stuff */ - chip->card = card; - chip->pci = pci; - chip->irq = -1; - - /* (1) PCI resource allocation */ - err = pci_request_regions(pci, "My Chip"); - if (err < 0) { - kfree(chip); - pci_disable_device(pci); - return err; - } - chip->port = pci_resource_start(pci, 0); - if (request_irq(pci->irq, snd_mychip_interrupt, - IRQF_SHARED, KBUILD_MODNAME, chip)) { - printk(KERN_ERR "cannot grab irq %d\n", pci->irq); - snd_mychip_free(chip); - return -EBUSY; - } - chip->irq = pci->irq; - - /* (2) initialization of the chip hardware */ - .... /* (not implemented in this document) */ - - err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops); - if (err < 0) { - snd_mychip_free(chip); - return err; - } - - *rchip = chip; - return 0; - } - - /* PCI IDs */ - static struct pci_device_id snd_mychip_ids[] = { - { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, - PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, - .... - { 0, } - }; - MODULE_DEVICE_TABLE(pci, snd_mychip_ids); - - /* pci_driver definition */ - static struct pci_driver driver = { - .name = KBUILD_MODNAME, - .id_table = snd_mychip_ids, - .probe = snd_mychip_probe, - .remove = snd_mychip_remove, - }; - - /* module initialization */ - static int __init alsa_card_mychip_init(void) - { - return pci_register_driver(&driver); - } - - /* module clean up */ - static void __exit alsa_card_mychip_exit(void) - { - pci_unregister_driver(&driver); - } - - module_init(alsa_card_mychip_init) - module_exit(alsa_card_mychip_exit) - - EXPORT_NO_SYMBOLS; /* for old kernels only */ -]]> - - - -
- -
- Some Hafta's - - The allocation of PCI resources is done in the - probe() function, and usually an extra - xxx_create() function is written for this - purpose. - - - - In the case of PCI devices, you first have to call - the pci_enable_device() function before - allocating resources. Also, you need to set the proper PCI DMA - mask to limit the accessed I/O range. In some cases, you might - need to call pci_set_master() function, - too. - - - - Suppose the 28bit mask, and the code to be added would be like: - - - - - - - -
- -
- Resource Allocation - - The allocation of I/O ports and irqs is done via standard kernel - functions. Unlike ALSA ver.0.5.x., there are no helpers for - that. And these resources must be released in the destructor - function (see below). Also, on ALSA 0.9.x, you don't need to - allocate (pseudo-)DMA for PCI like in ALSA 0.5.x. - - - - Now assume that the PCI device has an I/O port with 8 bytes - and an interrupt. Then struct mychip will have the - following fields: - - - - - - - - - - For an I/O port (and also a memory region), you need to have - the resource pointer for the standard resource management. For - an irq, you have to keep only the irq number (integer). But you - need to initialize this number as -1 before actual allocation, - since irq 0 is valid. The port address and its resource pointer - can be initialized as null by - kzalloc() automatically, so you - don't have to take care of resetting them. - - - - The allocation of an I/O port is done like this: - - - -port = pci_resource_start(pci, 0); -]]> - - - - - - - It will reserve the I/O port region of 8 bytes of the given - PCI device. The returned value, chip->res_port, is allocated - via kmalloc() by - request_region(). The pointer must be - released via kfree(), but there is a - problem with this. This issue will be explained later. - - - - The allocation of an interrupt source is done like this: - - - -irq, snd_mychip_interrupt, - IRQF_SHARED, KBUILD_MODNAME, chip)) { - printk(KERN_ERR "cannot grab irq %d\n", pci->irq); - snd_mychip_free(chip); - return -EBUSY; - } - chip->irq = pci->irq; -]]> - - - - where snd_mychip_interrupt() is the - interrupt handler defined later. - Note that chip->irq should be defined - only when request_irq() succeeded. - - - - On the PCI bus, interrupts can be shared. Thus, - IRQF_SHARED is used as the interrupt flag of - request_irq(). - - - - The last argument of request_irq() is the - data pointer passed to the interrupt handler. Usually, the - chip-specific record is used for that, but you can use what you - like, too. - - - - I won't give details about the interrupt handler at this - point, but at least its appearance can be explained now. The - interrupt handler looks usually like the following: - - - - - - - - - - Now let's write the corresponding destructor for the resources - above. The role of destructor is simple: disable the hardware - (if already activated) and release the resources. So far, we - have no hardware part, so the disabling code is not written here. - - - - To release the resources, the check-and-release - method is a safer way. For the interrupt, do like this: - - - -irq >= 0) - free_irq(chip->irq, chip); -]]> - - - - Since the irq number can start from 0, you should initialize - chip->irq with a negative value (e.g. -1), so that you can - check the validity of the irq number as above. - - - - When you requested I/O ports or memory regions via - pci_request_region() or - pci_request_regions() like in this example, - release the resource(s) using the corresponding function, - pci_release_region() or - pci_release_regions(). - - - -pci); -]]> - - - - - - When you requested manually via request_region() - or request_mem_region, you can release it via - release_resource(). Suppose that you keep - the resource pointer returned from request_region() - in chip->res_port, the release procedure looks like: - - - -res_port); -]]> - - - - - - Don't forget to call pci_disable_device() - before the end. - - - - And finally, release the chip-specific record. - - - - - - - - - - We didn't implement the hardware disabling part in the above. - If you need to do this, please note that the destructor may be - called even before the initialization of the chip is completed. - It would be better to have a flag to skip hardware disabling - if the hardware was not initialized yet. - - - - When the chip-data is assigned to the card using - snd_device_new() with - SNDRV_DEV_LOWLELVEL , its destructor is - called at the last. That is, it is assured that all other - components like PCMs and controls have already been released. - You don't have to stop PCMs, etc. explicitly, but just - call low-level hardware stopping. - - - - The management of a memory-mapped region is almost as same as - the management of an I/O port. You'll need three fields like - the following: - - - - - - - - and the allocation would be like below: - - - -iobase_phys = pci_resource_start(pci, 0); - chip->iobase_virt = ioremap_nocache(chip->iobase_phys, - pci_resource_len(pci, 0)); -]]> - - - - and the corresponding destructor would be: - - - -iobase_virt) - iounmap(chip->iobase_virt); - .... - pci_release_regions(chip->pci); - .... - } -]]> - - - - -
- -
- PCI Entries - - So far, so good. Let's finish the missing PCI - stuff. At first, we need a - pci_device_id table for this - chipset. It's a table of PCI vendor/device ID number, and some - masks. - - - - For example, - - - - - - - - - - The first and second fields of - the pci_device_id structure are the vendor and - device IDs. If you have no reason to filter the matching - devices, you can leave the remaining fields as above. The last - field of the pci_device_id struct contains - private data for this entry. You can specify any value here, for - example, to define specific operations for supported device IDs. - Such an example is found in the intel8x0 driver. - - - - The last entry of this list is the terminator. You must - specify this all-zero entry. - - - - Then, prepare the pci_driver record: - - - - - - - - - - The probe and - remove functions have already - been defined in the previous sections. - The name - field is the name string of this device. Note that you must not - use a slash / in this string. - - - - And at last, the module entries: - - - - - - - - - - Note that these module entries are tagged with - __init and - __exit prefixes. - - - - Oh, one thing was forgotten. If you have no exported symbols, - you need to declare it in 2.2 or 2.4 kernels (it's not necessary in 2.6 kernels). - - - - - - - - That's all! - -
-
- - - - - - - PCM Interface - -
- General - - The PCM middle layer of ALSA is quite powerful and it is only - necessary for each driver to implement the low-level functions - to access its hardware. - - - - For accessing to the PCM layer, you need to include - <sound/pcm.h> first. In addition, - <sound/pcm_params.h> might be needed - if you access to some functions related with hw_param. - - - - Each card device can have up to four pcm instances. A pcm - instance corresponds to a pcm device file. The limitation of - number of instances comes only from the available bit size of - the Linux's device numbers. Once when 64bit device number is - used, we'll have more pcm instances available. - - - - A pcm instance consists of pcm playback and capture streams, - and each pcm stream consists of one or more pcm substreams. Some - soundcards support multiple playback functions. For example, - emu10k1 has a PCM playback of 32 stereo substreams. In this case, at - each open, a free substream is (usually) automatically chosen - and opened. Meanwhile, when only one substream exists and it was - already opened, the successful open will either block - or error with EAGAIN according to the - file open mode. But you don't have to care about such details in your - driver. The PCM middle layer will take care of such work. - -
- -
- Full Code Example - - The example code below does not include any hardware access - routines but shows only the skeleton, how to build up the PCM - interfaces. - - - PCM Example Code - - - .... - - /* hardware definition */ - static struct snd_pcm_hardware snd_mychip_playback_hw = { - .info = (SNDRV_PCM_INFO_MMAP | - SNDRV_PCM_INFO_INTERLEAVED | - SNDRV_PCM_INFO_BLOCK_TRANSFER | - SNDRV_PCM_INFO_MMAP_VALID), - .formats = SNDRV_PCM_FMTBIT_S16_LE, - .rates = SNDRV_PCM_RATE_8000_48000, - .rate_min = 8000, - .rate_max = 48000, - .channels_min = 2, - .channels_max = 2, - .buffer_bytes_max = 32768, - .period_bytes_min = 4096, - .period_bytes_max = 32768, - .periods_min = 1, - .periods_max = 1024, - }; - - /* hardware definition */ - static struct snd_pcm_hardware snd_mychip_capture_hw = { - .info = (SNDRV_PCM_INFO_MMAP | - SNDRV_PCM_INFO_INTERLEAVED | - SNDRV_PCM_INFO_BLOCK_TRANSFER | - SNDRV_PCM_INFO_MMAP_VALID), - .formats = SNDRV_PCM_FMTBIT_S16_LE, - .rates = SNDRV_PCM_RATE_8000_48000, - .rate_min = 8000, - .rate_max = 48000, - .channels_min = 2, - .channels_max = 2, - .buffer_bytes_max = 32768, - .period_bytes_min = 4096, - .period_bytes_max = 32768, - .periods_min = 1, - .periods_max = 1024, - }; - - /* open callback */ - static int snd_mychip_playback_open(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - struct snd_pcm_runtime *runtime = substream->runtime; - - runtime->hw = snd_mychip_playback_hw; - /* more hardware-initialization will be done here */ - .... - return 0; - } - - /* close callback */ - static int snd_mychip_playback_close(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - /* the hardware-specific codes will be here */ - .... - return 0; - - } - - /* open callback */ - static int snd_mychip_capture_open(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - struct snd_pcm_runtime *runtime = substream->runtime; - - runtime->hw = snd_mychip_capture_hw; - /* more hardware-initialization will be done here */ - .... - return 0; - } - - /* close callback */ - static int snd_mychip_capture_close(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - /* the hardware-specific codes will be here */ - .... - return 0; - - } - - /* hw_params callback */ - static int snd_mychip_pcm_hw_params(struct snd_pcm_substream *substream, - struct snd_pcm_hw_params *hw_params) - { - return snd_pcm_lib_malloc_pages(substream, - params_buffer_bytes(hw_params)); - } - - /* hw_free callback */ - static int snd_mychip_pcm_hw_free(struct snd_pcm_substream *substream) - { - return snd_pcm_lib_free_pages(substream); - } - - /* prepare callback */ - static int snd_mychip_pcm_prepare(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - struct snd_pcm_runtime *runtime = substream->runtime; - - /* set up the hardware with the current configuration - * for example... - */ - mychip_set_sample_format(chip, runtime->format); - mychip_set_sample_rate(chip, runtime->rate); - mychip_set_channels(chip, runtime->channels); - mychip_set_dma_setup(chip, runtime->dma_addr, - chip->buffer_size, - chip->period_size); - return 0; - } - - /* trigger callback */ - static int snd_mychip_pcm_trigger(struct snd_pcm_substream *substream, - int cmd) - { - switch (cmd) { - case SNDRV_PCM_TRIGGER_START: - /* do something to start the PCM engine */ - .... - break; - case SNDRV_PCM_TRIGGER_STOP: - /* do something to stop the PCM engine */ - .... - break; - default: - return -EINVAL; - } - } - - /* pointer callback */ - static snd_pcm_uframes_t - snd_mychip_pcm_pointer(struct snd_pcm_substream *substream) - { - struct mychip *chip = snd_pcm_substream_chip(substream); - unsigned int current_ptr; - - /* get the current hardware pointer */ - current_ptr = mychip_get_hw_pointer(chip); - return current_ptr; - } - - /* operators */ - static struct snd_pcm_ops snd_mychip_playback_ops = { - .open = snd_mychip_playback_open, - .close = snd_mychip_playback_close, - .ioctl = snd_pcm_lib_ioctl, - .hw_params = snd_mychip_pcm_hw_params, - .hw_free = snd_mychip_pcm_hw_free, - .prepare = snd_mychip_pcm_prepare, - .trigger = snd_mychip_pcm_trigger, - .pointer = snd_mychip_pcm_pointer, - }; - - /* operators */ - static struct snd_pcm_ops snd_mychip_capture_ops = { - .open = snd_mychip_capture_open, - .close = snd_mychip_capture_close, - .ioctl = snd_pcm_lib_ioctl, - .hw_params = snd_mychip_pcm_hw_params, - .hw_free = snd_mychip_pcm_hw_free, - .prepare = snd_mychip_pcm_prepare, - .trigger = snd_mychip_pcm_trigger, - .pointer = snd_mychip_pcm_pointer, - }; - - /* - * definitions of capture are omitted here... - */ - - /* create a pcm device */ - static int snd_mychip_new_pcm(struct mychip *chip) - { - struct snd_pcm *pcm; - int err; - - err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, &pcm); - if (err < 0) - return err; - pcm->private_data = chip; - strcpy(pcm->name, "My Chip"); - chip->pcm = pcm; - /* set operators */ - snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, - &snd_mychip_playback_ops); - snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, - &snd_mychip_capture_ops); - /* pre-allocation of buffers */ - /* NOTE: this may fail */ - snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, - snd_dma_pci_data(chip->pci), - 64*1024, 64*1024); - return 0; - } -]]> - - - -
- -
- Constructor - - A pcm instance is allocated by the snd_pcm_new() - function. It would be better to create a constructor for pcm, - namely, - - - -card, "My Chip", 0, 1, 1, &pcm); - if (err < 0) - return err; - pcm->private_data = chip; - strcpy(pcm->name, "My Chip"); - chip->pcm = pcm; - .... - return 0; - } -]]> - - - - - - The snd_pcm_new() function takes four - arguments. The first argument is the card pointer to which this - pcm is assigned, and the second is the ID string. - - - - The third argument (index, 0 in the - above) is the index of this new pcm. It begins from zero. If - you create more than one pcm instances, specify the - different numbers in this argument. For example, - index = 1 for the second PCM device. - - - - The fourth and fifth arguments are the number of substreams - for playback and capture, respectively. Here 1 is used for - both arguments. When no playback or capture substreams are available, - pass 0 to the corresponding argument. - - - - If a chip supports multiple playbacks or captures, you can - specify more numbers, but they must be handled properly in - open/close, etc. callbacks. When you need to know which - substream you are referring to, then it can be obtained from - struct snd_pcm_substream data passed to each callback - as follows: - - - -number; -]]> - - - - - - After the pcm is created, you need to set operators for each - pcm stream. - - - - - - - - - - The operators are defined typically like this: - - - - - - - - All the callbacks are described in the - - Operators subsection. - - - - After setting the operators, you probably will want to - pre-allocate the buffer. For the pre-allocation, simply call - the following: - - - -pci), - 64*1024, 64*1024); -]]> - - - - It will allocate a buffer up to 64kB as default. - Buffer management details will be described in the later section Buffer and Memory - Management. - - - - Additionally, you can set some extra information for this pcm - in pcm->info_flags. - The available values are defined as - SNDRV_PCM_INFO_XXX in - <sound/asound.h>, which is used for - the hardware definition (described later). When your soundchip - supports only half-duplex, specify like this: - - - -info_flags = SNDRV_PCM_INFO_HALF_DUPLEX; -]]> - - - -
- -
- ... And the Destructor? - - The destructor for a pcm instance is not always - necessary. Since the pcm device will be released by the middle - layer code automatically, you don't have to call the destructor - explicitly. - - - - The destructor would be necessary if you created - special records internally and needed to release them. In such a - case, set the destructor function to - pcm->private_free: - - - PCM Instance with a Destructor - -my_private_pcm_data); - /* do what you like else */ - .... - } - - static int snd_mychip_new_pcm(struct mychip *chip) - { - struct snd_pcm *pcm; - .... - /* allocate your own data */ - chip->my_private_pcm_data = kmalloc(...); - /* set the destructor */ - pcm->private_data = chip; - pcm->private_free = mychip_pcm_free; - .... - } -]]> - - - -
- -
- Runtime Pointer - The Chest of PCM Information - - When the PCM substream is opened, a PCM runtime instance is - allocated and assigned to the substream. This pointer is - accessible via substream->runtime. - This runtime pointer holds most information you need - to control the PCM: the copy of hw_params and sw_params configurations, the buffer - pointers, mmap records, spinlocks, etc. - - - - The definition of runtime instance is found in - <sound/pcm.h>. Here are - the contents of this file: - - - - - - - - - For the operators (callbacks) of each sound driver, most of - these records are supposed to be read-only. Only the PCM - middle-layer changes / updates them. The exceptions are - the hardware description (hw) DMA buffer information and the - private data. Besides, if you use the standard buffer allocation - method via snd_pcm_lib_malloc_pages(), - you don't need to set the DMA buffer information by yourself. - - - - In the sections below, important records are explained. - - -
- Hardware Description - - The hardware descriptor (struct snd_pcm_hardware) - contains the definitions of the fundamental hardware - configuration. Above all, you'll need to define this in - - the open callback. - Note that the runtime instance holds the copy of the - descriptor, not the pointer to the existing descriptor. That - is, in the open callback, you can modify the copied descriptor - (runtime->hw) as you need. For example, if the maximum - number of channels is 1 only on some chip models, you can - still use the same hardware descriptor and change the - channels_max later: - - -runtime; - ... - runtime->hw = snd_mychip_playback_hw; /* common definition */ - if (chip->model == VERY_OLD_ONE) - runtime->hw.channels_max = 1; -]]> - - - - - - Typically, you'll have a hardware descriptor as below: - - - - - - - - - - - The info field contains the type and - capabilities of this pcm. The bit flags are defined in - <sound/asound.h> as - SNDRV_PCM_INFO_XXX. Here, at least, you - have to specify whether the mmap is supported and which - interleaved format is supported. - When the hardware supports mmap, add the - SNDRV_PCM_INFO_MMAP flag here. When the - hardware supports the interleaved or the non-interleaved - formats, SNDRV_PCM_INFO_INTERLEAVED or - SNDRV_PCM_INFO_NONINTERLEAVED flag must - be set, respectively. If both are supported, you can set both, - too. - - - - In the above example, MMAP_VALID and - BLOCK_TRANSFER are specified for the OSS mmap - mode. Usually both are set. Of course, - MMAP_VALID is set only if the mmap is - really supported. - - - - The other possible flags are - SNDRV_PCM_INFO_PAUSE and - SNDRV_PCM_INFO_RESUME. The - PAUSE bit means that the pcm supports the - pause operation, while the - RESUME bit means that the pcm supports - the full suspend/resume operation. - If the PAUSE flag is set, - the trigger callback below - must handle the corresponding (pause push/release) commands. - The suspend/resume trigger commands can be defined even without - the RESUME flag. See - Power Management section for details. - - - - When the PCM substreams can be synchronized (typically, - synchronized start/stop of a playback and a capture streams), - you can give SNDRV_PCM_INFO_SYNC_START, - too. In this case, you'll need to check the linked-list of - PCM substreams in the trigger callback. This will be - described in the later section. - - - - - - formats field contains the bit-flags - of supported formats (SNDRV_PCM_FMTBIT_XXX). - If the hardware supports more than one format, give all or'ed - bits. In the example above, the signed 16bit little-endian - format is specified. - - - - - - rates field contains the bit-flags of - supported rates (SNDRV_PCM_RATE_XXX). - When the chip supports continuous rates, pass - CONTINUOUS bit additionally. - The pre-defined rate bits are provided only for typical - rates. If your chip supports unconventional rates, you need to add - the KNOT bit and set up the hardware - constraint manually (explained later). - - - - - - rate_min and - rate_max define the minimum and - maximum sample rate. This should correspond somehow to - rates bits. - - - - - - channel_min and - channel_max - define, as you might already expected, the minimum and maximum - number of channels. - - - - - - buffer_bytes_max defines the - maximum buffer size in bytes. There is no - buffer_bytes_min field, since - it can be calculated from the minimum period size and the - minimum number of periods. - Meanwhile, period_bytes_min and - define the minimum and maximum size of the period in bytes. - periods_max and - periods_min define the maximum and - minimum number of periods in the buffer. - - - - The period is a term that corresponds to - a fragment in the OSS world. The period defines the size at - which a PCM interrupt is generated. This size strongly - depends on the hardware. - Generally, the smaller period size will give you more - interrupts, that is, more controls. - In the case of capture, this size defines the input latency. - On the other hand, the whole buffer size defines the - output latency for the playback direction. - - - - - - There is also a field fifo_size. - This specifies the size of the hardware FIFO, but currently it - is neither used in the driver nor in the alsa-lib. So, you - can ignore this field. - - - - -
- -
- PCM Configurations - - Ok, let's go back again to the PCM runtime records. - The most frequently referred records in the runtime instance are - the PCM configurations. - The PCM configurations are stored in the runtime instance - after the application sends hw_params data via - alsa-lib. There are many fields copied from hw_params and - sw_params structs. For example, - format holds the format type - chosen by the application. This field contains the enum value - SNDRV_PCM_FORMAT_XXX. - - - - One thing to be noted is that the configured buffer and period - sizes are stored in frames in the runtime. - In the ALSA world, 1 frame = channels * samples-size. - For conversion between frames and bytes, you can use the - frames_to_bytes() and - bytes_to_frames() helper functions. - - -period_size); -]]> - - - - - - Also, many software parameters (sw_params) are - stored in frames, too. Please check the type of the field. - snd_pcm_uframes_t is for the frames as unsigned - integer while snd_pcm_sframes_t is for the frames - as signed integer. - -
- -
- DMA Buffer Information - - The DMA buffer is defined by the following four fields, - dma_area, - dma_addr, - dma_bytes and - dma_private. - The dma_area holds the buffer - pointer (the logical address). You can call - memcpy from/to - this pointer. Meanwhile, dma_addr - holds the physical address of the buffer. This field is - specified only when the buffer is a linear buffer. - dma_bytes holds the size of buffer - in bytes. dma_private is used for - the ALSA DMA allocator. - - - - If you use a standard ALSA function, - snd_pcm_lib_malloc_pages(), for - allocating the buffer, these fields are set by the ALSA middle - layer, and you should not change them by - yourself. You can read them but not write them. - On the other hand, if you want to allocate the buffer by - yourself, you'll need to manage it in hw_params callback. - At least, dma_bytes is mandatory. - dma_area is necessary when the - buffer is mmapped. If your driver doesn't support mmap, this - field is not necessary. dma_addr - is also optional. You can use - dma_private as you like, too. - -
- -
- Running Status - - The running status can be referred via runtime->status. - This is the pointer to the struct snd_pcm_mmap_status - record. For example, you can get the current DMA hardware - pointer via runtime->status->hw_ptr. - - - - The DMA application pointer can be referred via - runtime->control, which points to the - struct snd_pcm_mmap_control record. - However, accessing directly to this value is not recommended. - -
- -
- Private Data - - You can allocate a record for the substream and store it in - runtime->private_data. Usually, this - is done in - - the open callback. - Don't mix this with pcm->private_data. - The pcm->private_data usually points to the - chip instance assigned statically at the creation of PCM, while the - runtime->private_data points to a dynamic - data structure created at the PCM open callback. - - - -runtime->private_data = data; - .... - } -]]> - - - - - - The allocated object must be released in - - the close callback. - -
- -
- -
- Operators - - OK, now let me give details about each pcm callback - (ops). In general, every callback must - return 0 if successful, or a negative error number - such as -EINVAL. To choose an appropriate - error number, it is advised to check what value other parts of - the kernel return when the same kind of request fails. - - - - The callback function takes at least the argument with - snd_pcm_substream pointer. To retrieve - the chip record from the given substream instance, you can use the - following macro. - - - - - - - - The macro reads substream->private_data, - which is a copy of pcm->private_data. - You can override the former if you need to assign different data - records per PCM substream. For example, the cmi8330 driver assigns - different private_data for playback and capture directions, - because it uses two different codecs (SB- and AD-compatible) for - different directions. - - -
- open callback - - - - - - - - This is called when a pcm substream is opened. - - - - At least, here you have to initialize the runtime->hw - record. Typically, this is done by like this: - - - -runtime; - - runtime->hw = snd_mychip_playback_hw; - return 0; - } -]]> - - - - where snd_mychip_playback_hw is the - pre-defined hardware description. - - - - You can allocate a private data in this callback, as described - in - Private Data section. - - - - If the hardware configuration needs more constraints, set the - hardware constraints here, too. - See - Constraints for more details. - -
- -
- close callback - - - - - - - - Obviously, this is called when a pcm substream is closed. - - - - Any private instance for a pcm substream allocated in the - open callback will be released here. - - - -runtime->private_data); - .... - } -]]> - - - -
- -
- ioctl callback - - This is used for any special call to pcm ioctls. But - usually you can pass a generic ioctl callback, - snd_pcm_lib_ioctl. - -
- -
- hw_params callback - - - - - - - - - - This is called when the hardware parameter - (hw_params) is set - up by the application, - that is, once when the buffer size, the period size, the - format, etc. are defined for the pcm substream. - - - - Many hardware setups should be done in this callback, - including the allocation of buffers. - - - - Parameters to be initialized are retrieved by - params_xxx() macros. To allocate - buffer, you can call a helper function, - - - - - - - - snd_pcm_lib_malloc_pages() is available - only when the DMA buffers have been pre-allocated. - See the section - Buffer Types for more details. - - - - Note that this and prepare callbacks - may be called multiple times per initialization. - For example, the OSS emulation may - call these callbacks at each change via its ioctl. - - - - Thus, you need to be careful not to allocate the same buffers - many times, which will lead to memory leaks! Calling the - helper function above many times is OK. It will release the - previous buffer automatically when it was already allocated. - - - - Another note is that this callback is non-atomic - (schedulable) as default, i.e. when no - nonatomic flag set. - This is important, because the - trigger callback - is atomic (non-schedulable). That is, mutexes or any - schedule-related functions are not available in - trigger callback. - Please see the subsection - - Atomicity for details. - -
- -
- hw_free callback - - - - - - - - - - This is called to release the resources allocated via - hw_params. For example, releasing the - buffer via - snd_pcm_lib_malloc_pages() is done by - calling the following: - - - - - - - - - - This function is always called before the close callback is called. - Also, the callback may be called multiple times, too. - Keep track whether the resource was already released. - -
- -
- prepare callback - - - - - - - - - - This callback is called when the pcm is - prepared. You can set the format type, sample - rate, etc. here. The difference from - hw_params is that the - prepare callback will be called each - time - snd_pcm_prepare() is called, i.e. when - recovering after underruns, etc. - - - - Note that this callback is now non-atomic. - You can use schedule-related functions safely in this callback. - - - - In this and the following callbacks, you can refer to the - values via the runtime record, - substream->runtime. - For example, to get the current - rate, format or channels, access to - runtime->rate, - runtime->format or - runtime->channels, respectively. - The physical address of the allocated buffer is set to - runtime->dma_area. The buffer and period sizes are - in runtime->buffer_size and runtime->period_size, - respectively. - - - - Be careful that this callback will be called many times at - each setup, too. - -
- -
- trigger callback - - - - - - - - This is called when the pcm is started, stopped or paused. - - - - Which action is specified in the second argument, - SNDRV_PCM_TRIGGER_XXX in - <sound/pcm.h>. At least, - the START and STOP - commands must be defined in this callback. - - - - - - - - - - When the pcm supports the pause operation (given in the info - field of the hardware table), the PAUSE_PUSH - and PAUSE_RELEASE commands must be - handled here, too. The former is the command to pause the pcm, - and the latter to restart the pcm again. - - - - When the pcm supports the suspend/resume operation, - regardless of full or partial suspend/resume support, - the SUSPEND and RESUME - commands must be handled, too. - These commands are issued when the power-management status is - changed. Obviously, the SUSPEND and - RESUME commands - suspend and resume the pcm substream, and usually, they - are identical to the STOP and - START commands, respectively. - See the - Power Management section for details. - - - - As mentioned, this callback is atomic as default unless - nonatomic flag set, and - you cannot call functions which may sleep. - The trigger callback should be as minimal as possible, - just really triggering the DMA. The other stuff should be - initialized hw_params and prepare callbacks properly - beforehand. - -
- -
- pointer callback - - - - - - - - This callback is called when the PCM middle layer inquires - the current hardware position on the buffer. The position must - be returned in frames, - ranging from 0 to buffer_size - 1. - - - - This is called usually from the buffer-update routine in the - pcm middle layer, which is invoked when - snd_pcm_period_elapsed() is called in the - interrupt routine. Then the pcm middle layer updates the - position and calculates the available space, and wakes up the - sleeping poll threads, etc. - - - - This callback is also atomic as default. - -
- -
- copy and silence callbacks - - These callbacks are not mandatory, and can be omitted in - most cases. These callbacks are used when the hardware buffer - cannot be in the normal memory space. Some chips have their - own buffer on the hardware which is not mappable. In such a - case, you have to transfer the data manually from the memory - buffer to the hardware buffer. Or, if the buffer is - non-contiguous on both physical and virtual memory spaces, - these callbacks must be defined, too. - - - - If these two callbacks are defined, copy and set-silence - operations are done by them. The detailed will be described in - the later section Buffer and Memory - Management. - -
- -
- ack callback - - This callback is also not mandatory. This callback is called - when the appl_ptr is updated in read or write operations. - Some drivers like emu10k1-fx and cs46xx need to track the - current appl_ptr for the internal buffer, and this callback - is useful only for such a purpose. - - - This callback is atomic as default. - -
- -
- page callback - - - This callback is optional too. This callback is used - mainly for non-contiguous buffers. The mmap calls this - callback to get the page address. Some examples will be - explained in the later section Buffer and Memory - Management, too. - -
-
- -
- Interrupt Handler - - The rest of pcm stuff is the PCM interrupt handler. The - role of PCM interrupt handler in the sound driver is to update - the buffer position and to tell the PCM middle layer when the - buffer position goes across the prescribed period size. To - inform this, call the snd_pcm_period_elapsed() - function. - - - - There are several types of sound chips to generate the interrupts. - - -
- Interrupts at the period (fragment) boundary - - This is the most frequently found type: the hardware - generates an interrupt at each period boundary. - In this case, you can call - snd_pcm_period_elapsed() at each - interrupt. - - - - snd_pcm_period_elapsed() takes the - substream pointer as its argument. Thus, you need to keep the - substream pointer accessible from the chip instance. For - example, define substream field in the chip record to hold the - current running substream pointer, and set the pointer value - at open callback (and reset at close callback). - - - - If you acquire a spinlock in the interrupt handler, and the - lock is used in other pcm callbacks, too, then you have to - release the lock before calling - snd_pcm_period_elapsed(), because - snd_pcm_period_elapsed() calls other pcm - callbacks inside. - - - - Typical code would be like: - - - Interrupt Handler Case #1 - -lock); - .... - if (pcm_irq_invoked(chip)) { - /* call updater, unlock before it */ - spin_unlock(&chip->lock); - snd_pcm_period_elapsed(chip->substream); - spin_lock(&chip->lock); - /* acknowledge the interrupt if necessary */ - } - .... - spin_unlock(&chip->lock); - return IRQ_HANDLED; - } -]]> - - - -
- -
- High frequency timer interrupts - - This happens when the hardware doesn't generate interrupts - at the period boundary but issues timer interrupts at a fixed - timer rate (e.g. es1968 or ymfpci drivers). - In this case, you need to check the current hardware - position and accumulate the processed sample length at each - interrupt. When the accumulated size exceeds the period - size, call - snd_pcm_period_elapsed() and reset the - accumulator. - - - - Typical code would be like the following. - - - Interrupt Handler Case #2 - -lock); - .... - if (pcm_irq_invoked(chip)) { - unsigned int last_ptr, size; - /* get the current hardware pointer (in frames) */ - last_ptr = get_hw_ptr(chip); - /* calculate the processed frames since the - * last update - */ - if (last_ptr < chip->last_ptr) - size = runtime->buffer_size + last_ptr - - chip->last_ptr; - else - size = last_ptr - chip->last_ptr; - /* remember the last updated point */ - chip->last_ptr = last_ptr; - /* accumulate the size */ - chip->size += size; - /* over the period boundary? */ - if (chip->size >= runtime->period_size) { - /* reset the accumulator */ - chip->size %= runtime->period_size; - /* call updater */ - spin_unlock(&chip->lock); - snd_pcm_period_elapsed(substream); - spin_lock(&chip->lock); - } - /* acknowledge the interrupt if necessary */ - } - .... - spin_unlock(&chip->lock); - return IRQ_HANDLED; - } -]]> - - - -
- -
- On calling <function>snd_pcm_period_elapsed()</function> - - In both cases, even if more than one period are elapsed, you - don't have to call - snd_pcm_period_elapsed() many times. Call - only once. And the pcm layer will check the current hardware - pointer and update to the latest status. - -
-
- -
- Atomicity - - One of the most important (and thus difficult to debug) problems - in kernel programming are race conditions. - In the Linux kernel, they are usually avoided via spin-locks, mutexes - or semaphores. In general, if a race condition can happen - in an interrupt handler, it has to be managed atomically, and you - have to use a spinlock to protect the critical session. If the - critical section is not in interrupt handler code and - if taking a relatively long time to execute is acceptable, you - should use mutexes or semaphores instead. - - - - As already seen, some pcm callbacks are atomic and some are - not. For example, the hw_params callback is - non-atomic, while trigger callback is - atomic. This means, the latter is called already in a spinlock - held by the PCM middle layer. Please take this atomicity into - account when you choose a locking scheme in the callbacks. - - - - In the atomic callbacks, you cannot use functions which may call - schedule or go to - sleep. Semaphores and mutexes can sleep, - and hence they cannot be used inside the atomic callbacks - (e.g. trigger callback). - To implement some delay in such a callback, please use - udelay() or mdelay(). - - - - All three atomic callbacks (trigger, pointer, and ack) are - called with local interrupts disabled. - - - - The recent changes in PCM core code, however, allow all PCM - operations to be non-atomic. This assumes that the all caller - sides are in non-atomic contexts. For example, the function - snd_pcm_period_elapsed() is called - typically from the interrupt handler. But, if you set up the - driver to use a threaded interrupt handler, this call can be in - non-atomic context, too. In such a case, you can set - nonatomic filed of - snd_pcm object after creating it. - When this flag is set, mutex and rwsem are used internally in - the PCM core instead of spin and rwlocks, so that you can call - all PCM functions safely in a non-atomic context. - - -
-
- Constraints - - If your chip supports unconventional sample rates, or only the - limited samples, you need to set a constraint for the - condition. - - - - For example, in order to restrict the sample rates in the some - supported values, use - snd_pcm_hw_constraint_list(). - You need to call this function in the open callback. - - - Example of Hardware Constraints - -runtime, 0, - SNDRV_PCM_HW_PARAM_RATE, - &constraints_rates); - if (err < 0) - return err; - .... - } -]]> - - - - - - There are many different constraints. - Look at sound/pcm.h for a complete list. - You can even define your own constraint rules. - For example, let's suppose my_chip can manage a substream of 1 channel - if and only if the format is S16_LE, otherwise it supports any format - specified in the snd_pcm_hardware structure (or in any - other constraint_list). You can build a rule like this: - - - Example of Hardware Constraints for Channels - -bits[0] == SNDRV_PCM_FMTBIT_S16_LE) { - ch.min = ch.max = 1; - ch.integer = 1; - return snd_interval_refine(c, &ch); - } - return 0; - } -]]> - - - - - - Then you need to call this function to add your rule: - - - -runtime, 0, SNDRV_PCM_HW_PARAM_CHANNELS, - hw_rule_channels_by_format, NULL, - SNDRV_PCM_HW_PARAM_FORMAT, -1); -]]> - - - - - - The rule function is called when an application sets the PCM - format, and it refines the number of channels accordingly. - But an application may set the number of channels before - setting the format. Thus you also need to define the inverse rule: - - - Example of Hardware Constraints for Formats - -min < 2) { - fmt.bits[0] &= SNDRV_PCM_FMTBIT_S16_LE; - return snd_mask_refine(f, &fmt); - } - return 0; - } -]]> - - - - - - ...and in the open callback: - - -runtime, 0, SNDRV_PCM_HW_PARAM_FORMAT, - hw_rule_format_by_channels, NULL, - SNDRV_PCM_HW_PARAM_CHANNELS, -1); -]]> - - - - - - I won't give more details here, rather I - would like to say, Luke, use the source. - -
- -
- - - - - - - Control Interface - -
- General - - The control interface is used widely for many switches, - sliders, etc. which are accessed from user-space. Its most - important use is the mixer interface. In other words, since ALSA - 0.9.x, all the mixer stuff is implemented on the control kernel API. - - - - ALSA has a well-defined AC97 control module. If your chip - supports only the AC97 and nothing else, you can skip this - section. - - - - The control API is defined in - <sound/control.h>. - Include this file if you want to add your own controls. - -
- -
- Definition of Controls - - To create a new control, you need to define the - following three - callbacks: info, - get and - put. Then, define a - struct snd_kcontrol_new record, such as: - - - Definition of a Control - - - - - - - - The iface field specifies the control - type, SNDRV_CTL_ELEM_IFACE_XXX, which - is usually MIXER. - Use CARD for global controls that are not - logically part of the mixer. - If the control is closely associated with some specific device on - the sound card, use HWDEP, - PCM, RAWMIDI, - TIMER, or SEQUENCER, and - specify the device number with the - device and - subdevice fields. - - - - The name is the name identifier - string. Since ALSA 0.9.x, the control name is very important, - because its role is classified from its name. There are - pre-defined standard control names. The details are described in - the - Control Names subsection. - - - - The index field holds the index number - of this control. If there are several different controls with - the same name, they can be distinguished by the index - number. This is the case when - several codecs exist on the card. If the index is zero, you can - omit the definition above. - - - - The access field contains the access - type of this control. Give the combination of bit masks, - SNDRV_CTL_ELEM_ACCESS_XXX, there. - The details will be explained in - the - Access Flags subsection. - - - - The private_value field contains - an arbitrary long integer value for this record. When using - the generic info, - get and - put callbacks, you can pass a value - through this field. If several small numbers are necessary, you can - combine them in bitwise. Or, it's possible to give a pointer - (casted to unsigned long) of some record to this field, too. - - - - The tlv field can be used to provide - metadata about the control; see the - - Metadata subsection. - - - - The other three are - - callback functions. - -
- -
- Control Names - - There are some standards to define the control names. A - control is usually defined from the three parts as - SOURCE DIRECTION FUNCTION. - - - - The first, SOURCE, specifies the source - of the control, and is a string such as Master, - PCM, CD and - Line. There are many pre-defined sources. - - - - The second, DIRECTION, is one of the - following strings according to the direction of the control: - Playback, Capture, Bypass - Playback and Bypass Capture. Or, it can - be omitted, meaning both playback and capture directions. - - - - The third, FUNCTION, is one of the - following strings according to the function of the control: - Switch, Volume and - Route. - - - - The example of control names are, thus, Master Capture - Switch or PCM Playback Volume. - - - - There are some exceptions: - - -
- Global capture and playback - - Capture Source, Capture Switch - and Capture Volume are used for the global - capture (input) source, switch and volume. Similarly, - Playback Switch and Playback - Volume are used for the global output gain switch and - volume. - -
- -
- Tone-controls - - tone-control switch and volumes are specified like - Tone Control - XXX, e.g. Tone Control - - Switch, Tone Control - Bass, - Tone Control - Center. - -
- -
- 3D controls - - 3D-control switches and volumes are specified like 3D - Control - XXX, e.g. 3D Control - - Switch, 3D Control - Center, 3D - Control - Space. - -
- -
- Mic boost - - Mic-boost switch is set as Mic Boost or - Mic Boost (6dB). - - - - More precise information can be found in - Documentation/sound/alsa/ControlNames.txt. - -
-
- -
- Access Flags - - - The access flag is the bitmask which specifies the access type - of the given control. The default access type is - SNDRV_CTL_ELEM_ACCESS_READWRITE, - which means both read and write are allowed to this control. - When the access flag is omitted (i.e. = 0), it is - considered as READWRITE access as default. - - - - When the control is read-only, pass - SNDRV_CTL_ELEM_ACCESS_READ instead. - In this case, you don't have to define - the put callback. - Similarly, when the control is write-only (although it's a rare - case), you can use the WRITE flag instead, and - you don't need the get callback. - - - - If the control value changes frequently (e.g. the VU meter), - VOLATILE flag should be given. This means - that the control may be changed without - - notification. Applications should poll such - a control constantly. - - - - When the control is inactive, set - the INACTIVE flag, too. - There are LOCK and - OWNER flags to change the write - permissions. - - -
- -
- Callbacks - -
- info callback - - The info callback is used to get - detailed information on this control. This must store the - values of the given struct snd_ctl_elem_info - object. For example, for a boolean control with a single - element: - - - Example of info callback - -type = SNDRV_CTL_ELEM_TYPE_BOOLEAN; - uinfo->count = 1; - uinfo->value.integer.min = 0; - uinfo->value.integer.max = 1; - return 0; - } -]]> - - - - - - The type field specifies the type - of the control. There are BOOLEAN, - INTEGER, ENUMERATED, - BYTES, IEC958 and - INTEGER64. The - count field specifies the - number of elements in this control. For example, a stereo - volume would have count = 2. The - value field is a union, and - the values stored are depending on the type. The boolean and - integer types are identical. - - - - The enumerated type is a bit different from others. You'll - need to set the string for the currently given item index. - - - -type = SNDRV_CTL_ELEM_TYPE_ENUMERATED; - uinfo->count = 1; - uinfo->value.enumerated.items = 4; - if (uinfo->value.enumerated.item > 3) - uinfo->value.enumerated.item = 3; - strcpy(uinfo->value.enumerated.name, - texts[uinfo->value.enumerated.item]); - return 0; - } -]]> - - - - - - The above callback can be simplified with a helper function, - snd_ctl_enum_info. The final code - looks like below. - (You can pass ARRAY_SIZE(texts) instead of 4 in the third - argument; it's a matter of taste.) - - - - - - - - - - Some common info callbacks are available for your convenience: - snd_ctl_boolean_mono_info() and - snd_ctl_boolean_stereo_info(). - Obviously, the former is an info callback for a mono channel - boolean item, just like snd_myctl_mono_info - above, and the latter is for a stereo channel boolean item. - - -
- -
- get callback - - - This callback is used to read the current value of the - control and to return to user-space. - - - - For example, - - - Example of get callback - -value.integer.value[0] = get_some_value(chip); - return 0; - } -]]> - - - - - - The value field depends on - the type of control as well as on the info callback. For example, - the sb driver uses this field to store the register offset, - the bit-shift and the bit-mask. The - private_value field is set as follows: - - - - - - and is retrieved in callbacks like - - -private_value & 0xff; - int shift = (kcontrol->private_value >> 16) & 0xff; - int mask = (kcontrol->private_value >> 24) & 0xff; - .... - } -]]> - - - - - - In the get callback, - you have to fill all the elements if the - control has more than one elements, - i.e. count > 1. - In the example above, we filled only one element - (value.integer.value[0]) since it's - assumed as count = 1. - -
- -
- put callback - - - This callback is used to write a value from user-space. - - - - For example, - - - Example of put callback - -current_value != - ucontrol->value.integer.value[0]) { - change_current_value(chip, - ucontrol->value.integer.value[0]); - changed = 1; - } - return changed; - } -]]> - - - - As seen above, you have to return 1 if the value is - changed. If the value is not changed, return 0 instead. - If any fatal error happens, return a negative error code as - usual. - - - - As in the get callback, - when the control has more than one elements, - all elements must be evaluated in this callback, too. - -
- -
- Callbacks are not atomic - - All these three callbacks are basically not atomic. - -
-
- -
- Constructor - - When everything is ready, finally we can create a new - control. To create a control, there are two functions to be - called, snd_ctl_new1() and - snd_ctl_add(). - - - - In the simplest way, you can do like this: - - - - - - - - where my_control is the - struct snd_kcontrol_new object defined above, and chip - is the object pointer to be passed to - kcontrol->private_data - which can be referred to in callbacks. - - - - snd_ctl_new1() allocates a new - snd_kcontrol instance, - and snd_ctl_add assigns the given - control component to the card. - -
- -
- Change Notification - - If you need to change and update a control in the interrupt - routine, you can call snd_ctl_notify(). For - example, - - - - - - - - This function takes the card pointer, the event-mask, and the - control id pointer for the notification. The event-mask - specifies the types of notification, for example, in the above - example, the change of control values is notified. - The id pointer is the pointer of struct snd_ctl_elem_id - to be notified. - You can find some examples in es1938.c or - es1968.c for hardware volume interrupts. - -
- -
- Metadata - - To provide information about the dB values of a mixer control, use - on of the DECLARE_TLV_xxx macros from - <sound/tlv.h> to define a variable - containing this information, set thetlv.p - field to point to this variable, and include the - SNDRV_CTL_ELEM_ACCESS_TLV_READ flag in the - access field; like this: - - - - - - - - - The DECLARE_TLV_DB_SCALE macro defines - information about a mixer control where each step in the control's - value changes the dB value by a constant dB amount. - The first parameter is the name of the variable to be defined. - The second parameter is the minimum value, in units of 0.01 dB. - The third parameter is the step size, in units of 0.01 dB. - Set the fourth parameter to 1 if the minimum value actually mutes - the control. - - - - The DECLARE_TLV_DB_LINEAR macro defines - information about a mixer control where the control's value affects - the output linearly. - The first parameter is the name of the variable to be defined. - The second parameter is the minimum value, in units of 0.01 dB. - The third parameter is the maximum value, in units of 0.01 dB. - If the minimum value mutes the control, set the second parameter to - TLV_DB_GAIN_MUTE. - -
- -
- - - - - - - API for AC97 Codec - -
- General - - The ALSA AC97 codec layer is a well-defined one, and you don't - have to write much code to control it. Only low-level control - routines are necessary. The AC97 codec API is defined in - <sound/ac97_codec.h>. - -
- -
- Full Code Example - - - Example of AC97 Interface - -private_data; - .... - /* read a register value here from the codec */ - return the_register_value; - } - - static void snd_mychip_ac97_write(struct snd_ac97 *ac97, - unsigned short reg, unsigned short val) - { - struct mychip *chip = ac97->private_data; - .... - /* write the given register value to the codec */ - } - - static int snd_mychip_ac97(struct mychip *chip) - { - struct snd_ac97_bus *bus; - struct snd_ac97_template ac97; - int err; - static struct snd_ac97_bus_ops ops = { - .write = snd_mychip_ac97_write, - .read = snd_mychip_ac97_read, - }; - - err = snd_ac97_bus(chip->card, 0, &ops, NULL, &bus); - if (err < 0) - return err; - memset(&ac97, 0, sizeof(ac97)); - ac97.private_data = chip; - return snd_ac97_mixer(bus, &ac97, &chip->ac97); - } - -]]> - - - -
- -
- Constructor - - To create an ac97 instance, first call snd_ac97_bus - with an ac97_bus_ops_t record with callback functions. - - - - - - - - The bus record is shared among all belonging ac97 instances. - - - - And then call snd_ac97_mixer() with an - struct snd_ac97_template - record together with the bus pointer created above. - - - -ac97); -]]> - - - - where chip->ac97 is a pointer to a newly created - ac97_t instance. - In this case, the chip pointer is set as the private data, so that - the read/write callback functions can refer to this chip instance. - This instance is not necessarily stored in the chip - record. If you need to change the register values from the - driver, or need the suspend/resume of ac97 codecs, keep this - pointer to pass to the corresponding functions. - -
- -
- Callbacks - - The standard callbacks are read and - write. Obviously they - correspond to the functions for read and write accesses to the - hardware low-level codes. - - - - The read callback returns the - register value specified in the argument. - - - -private_data; - .... - return the_register_value; - } -]]> - - - - Here, the chip can be cast from ac97->private_data. - - - - Meanwhile, the write callback is - used to set the register value. - - - - - - - - - - These callbacks are non-atomic like the control API callbacks. - - - - There are also other callbacks: - reset, - wait and - init. - - - - The reset callback is used to reset - the codec. If the chip requires a special kind of reset, you can - define this callback. - - - - The wait callback is used to - add some waiting time in the standard initialization of the codec. If the - chip requires the extra waiting time, define this callback. - - - - The init callback is used for - additional initialization of the codec. - -
- -
- Updating Registers in The Driver - - If you need to access to the codec from the driver, you can - call the following functions: - snd_ac97_write(), - snd_ac97_read(), - snd_ac97_update() and - snd_ac97_update_bits(). - - - - Both snd_ac97_write() and - snd_ac97_update() functions are used to - set a value to the given register - (AC97_XXX). The difference between them is - that snd_ac97_update() doesn't write a - value if the given value has been already set, while - snd_ac97_write() always rewrites the - value. - - - - - - - - - - snd_ac97_read() is used to read the value - of the given register. For example, - - - - - - - - - - snd_ac97_update_bits() is used to update - some bits in the given register. - - - - - - - - - - Also, there is a function to change the sample rate (of a - given register such as - AC97_PCM_FRONT_DAC_RATE) when VRA or - DRA is supported by the codec: - snd_ac97_set_rate(). - - - - - - - - - - The following registers are available to set the rate: - AC97_PCM_MIC_ADC_RATE, - AC97_PCM_FRONT_DAC_RATE, - AC97_PCM_LR_ADC_RATE, - AC97_SPDIF. When - AC97_SPDIF is specified, the register is - not really changed but the corresponding IEC958 status bits will - be updated. - -
- -
- Clock Adjustment - - In some chips, the clock of the codec isn't 48000 but using a - PCI clock (to save a quartz!). In this case, change the field - bus->clock to the corresponding - value. For example, intel8x0 - and es1968 drivers have their own function to read from the clock. - -
- -
- Proc Files - - The ALSA AC97 interface will create a proc file such as - /proc/asound/card0/codec97#0/ac97#0-0 and - ac97#0-0+regs. You can refer to these files to - see the current status and registers of the codec. - -
- -
- Multiple Codecs - - When there are several codecs on the same card, you need to - call snd_ac97_mixer() multiple times with - ac97.num=1 or greater. The num field - specifies the codec number. - - - - If you set up multiple codecs, you either need to write - different callbacks for each codec or check - ac97->num in the callback routines. - -
- -
- - - - - - - MIDI (MPU401-UART) Interface - -
- General - - Many soundcards have built-in MIDI (MPU401-UART) - interfaces. When the soundcard supports the standard MPU401-UART - interface, most likely you can use the ALSA MPU401-UART API. The - MPU401-UART API is defined in - <sound/mpu401.h>. - - - - Some soundchips have a similar but slightly different - implementation of mpu401 stuff. For example, emu10k1 has its own - mpu401 routines. - -
- -
- Constructor - - To create a rawmidi object, call - snd_mpu401_uart_new(). - - - - - - - - - - The first argument is the card pointer, and the second is the - index of this component. You can create up to 8 rawmidi - devices. - - - - The third argument is the type of the hardware, - MPU401_HW_XXX. If it's not a special one, - you can use MPU401_HW_MPU401. - - - - The 4th argument is the I/O port address. Many - backward-compatible MPU401 have an I/O port such as 0x330. Or, it - might be a part of its own PCI I/O region. It depends on the - chip design. - - - - The 5th argument is a bitflag for additional information. - When the I/O port address above is part of the PCI I/O - region, the MPU401 I/O port might have been already allocated - (reserved) by the driver itself. In such a case, pass a bit flag - MPU401_INFO_INTEGRATED, - and the mpu401-uart layer will allocate the I/O ports by itself. - - - - When the controller supports only the input or output MIDI stream, - pass the MPU401_INFO_INPUT or - MPU401_INFO_OUTPUT bitflag, respectively. - Then the rawmidi instance is created as a single stream. - - - - MPU401_INFO_MMIO bitflag is used to change - the access method to MMIO (via readb and writeb) instead of - iob and outb. In this case, you have to pass the iomapped address - to snd_mpu401_uart_new(). - - - - When MPU401_INFO_TX_IRQ is set, the output - stream isn't checked in the default interrupt handler. The driver - needs to call snd_mpu401_uart_interrupt_tx() - by itself to start processing the output stream in the irq handler. - - - - If the MPU-401 interface shares its interrupt with the other logical - devices on the card, set MPU401_INFO_IRQ_HOOK - (see - below). - - - - Usually, the port address corresponds to the command port and - port + 1 corresponds to the data port. If not, you may change - the cport field of - struct snd_mpu401 manually - afterward. However, snd_mpu401 pointer is not - returned explicitly by - snd_mpu401_uart_new(). You need to cast - rmidi->private_data to - snd_mpu401 explicitly, - - - -private_data; -]]> - - - - and reset the cport as you like: - - - -cport = my_own_control_port; -]]> - - - - - - The 6th argument specifies the ISA irq number that will be - allocated. If no interrupt is to be allocated (because your - code is already allocating a shared interrupt, or because the - device does not use interrupts), pass -1 instead. - For a MPU-401 device without an interrupt, a polling timer - will be used instead. - -
- -
- Interrupt Handler - - When the interrupt is allocated in - snd_mpu401_uart_new(), an exclusive ISA - interrupt handler is automatically used, hence you don't have - anything else to do than creating the mpu401 stuff. Otherwise, you - have to set MPU401_INFO_IRQ_HOOK, and call - snd_mpu401_uart_interrupt() explicitly from your - own interrupt handler when it has determined that a UART interrupt - has occurred. - - - - In this case, you need to pass the private_data of the - returned rawmidi object from - snd_mpu401_uart_new() as the second - argument of snd_mpu401_uart_interrupt(). - - - -private_data, regs); -]]> - - - -
- -
- - - - - - - RawMIDI Interface - -
- Overview - - - The raw MIDI interface is used for hardware MIDI ports that can - be accessed as a byte stream. It is not used for synthesizer - chips that do not directly understand MIDI. - - - - ALSA handles file and buffer management. All you have to do is - to write some code to move data between the buffer and the - hardware. - - - - The rawmidi API is defined in - <sound/rawmidi.h>. - -
- -
- Constructor - - - To create a rawmidi device, call the - snd_rawmidi_new function: - - -card, "MyMIDI", 0, outs, ins, &rmidi); - if (err < 0) - return err; - rmidi->private_data = chip; - strcpy(rmidi->name, "My MIDI"); - rmidi->info_flags = SNDRV_RAWMIDI_INFO_OUTPUT | - SNDRV_RAWMIDI_INFO_INPUT | - SNDRV_RAWMIDI_INFO_DUPLEX; -]]> - - - - - - The first argument is the card pointer, the second argument is - the ID string. - - - - The third argument is the index of this component. You can - create up to 8 rawmidi devices. - - - - The fourth and fifth arguments are the number of output and - input substreams, respectively, of this device (a substream is - the equivalent of a MIDI port). - - - - Set the info_flags field to specify - the capabilities of the device. - Set SNDRV_RAWMIDI_INFO_OUTPUT if there is - at least one output port, - SNDRV_RAWMIDI_INFO_INPUT if there is at - least one input port, - and SNDRV_RAWMIDI_INFO_DUPLEX if the device - can handle output and input at the same time. - - - - After the rawmidi device is created, you need to set the - operators (callbacks) for each substream. There are helper - functions to set the operators for all the substreams of a device: - - - - - - - - - The operators are usually defined like this: - - - - - - These callbacks are explained in the Callbacks - section. - - - - If there are more than one substream, you should give a - unique name to each of them: - - -streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams, - list { - sprintf(substream->name, "My MIDI Port %d", substream->number + 1); - } - /* same for SNDRV_RAWMIDI_STREAM_INPUT */ -]]> - - - -
- -
- Callbacks - - - In all the callbacks, the private data that you've set for the - rawmidi device can be accessed as - substream->rmidi->private_data. - - - - - If there is more than one port, your callbacks can determine the - port index from the struct snd_rawmidi_substream data passed to each - callback: - - -number; -]]> - - - - -
- <function>open</function> callback - - - - - - - - - This is called when a substream is opened. - You can initialize the hardware here, but you shouldn't - start transmitting/receiving data yet. - -
- -
- <function>close</function> callback - - - - - - - - - Guess what. - - - - The open and close - callbacks of a rawmidi device are serialized with a mutex, - and can sleep. - -
- -
- <function>trigger</function> callback for output - substreams - - - - - - - - - This is called with a nonzero up - parameter when there is some data in the substream buffer that - must be transmitted. - - - - To read data from the buffer, call - snd_rawmidi_transmit_peek. It will - return the number of bytes that have been read; this will be - less than the number of bytes requested when there are no more - data in the buffer. - After the data have been transmitted successfully, call - snd_rawmidi_transmit_ack to remove the - data from the substream buffer: - - - - - - - - - If you know beforehand that the hardware will accept data, you - can use the snd_rawmidi_transmit function - which reads some data and removes them from the buffer at once: - - - - - - - - - If you know beforehand how many bytes you can accept, you can - use a buffer size greater than one with the - snd_rawmidi_transmit* functions. - - - - The trigger callback must not sleep. If - the hardware FIFO is full before the substream buffer has been - emptied, you have to continue transmitting data later, either - in an interrupt handler, or with a timer if the hardware - doesn't have a MIDI transmit interrupt. - - - - The trigger callback is called with a - zero up parameter when the transmission - of data should be aborted. - -
- -
- <function>trigger</function> callback for input - substreams - - - - - - - - - This is called with a nonzero up - parameter to enable receiving data, or with a zero - up parameter do disable receiving data. - - - - The trigger callback must not sleep; the - actual reading of data from the device is usually done in an - interrupt handler. - - - - When data reception is enabled, your interrupt handler should - call snd_rawmidi_receive for all received - data: - - - - - - -
- -
- <function>drain</function> callback - - - - - - - - - This is only used with output substreams. This function should wait - until all data read from the substream buffer have been transmitted. - This ensures that the device can be closed and the driver unloaded - without losing data. - - - - This callback is optional. If you do not set - drain in the struct snd_rawmidi_ops - structure, ALSA will simply wait for 50 milliseconds - instead. - -
-
- -
- - - - - - - Miscellaneous Devices - -
- FM OPL3 - - The FM OPL3 is still used in many chips (mainly for backward - compatibility). ALSA has a nice OPL3 FM control layer, too. The - OPL3 API is defined in - <sound/opl3.h>. - - - - FM registers can be directly accessed through the direct-FM API, - defined in <sound/asound_fm.h>. In - ALSA native mode, FM registers are accessed through - the Hardware-Dependent Device direct-FM extension API, whereas in - OSS compatible mode, FM registers can be accessed with the OSS - direct-FM compatible API in /dev/dmfmX device. - - - - To create the OPL3 component, you have two functions to - call. The first one is a constructor for the opl3_t - instance. - - - - - - - - - - The first argument is the card pointer, the second one is the - left port address, and the third is the right port address. In - most cases, the right port is placed at the left port + 2. - - - - The fourth argument is the hardware type. - - - - When the left and right ports have been already allocated by - the card driver, pass non-zero to the fifth argument - (integrated). Otherwise, the opl3 module will - allocate the specified ports by itself. - - - - When the accessing the hardware requires special method - instead of the standard I/O access, you can create opl3 instance - separately with snd_opl3_new(). - - - - - - - - - - Then set command, - private_data and - private_free for the private - access function, the private data and the destructor. - The l_port and r_port are not necessarily set. Only the - command must be set properly. You can retrieve the data - from the opl3->private_data field. - - - - After creating the opl3 instance via snd_opl3_new(), - call snd_opl3_init() to initialize the chip to the - proper state. Note that snd_opl3_create() always - calls it internally. - - - - If the opl3 instance is created successfully, then create a - hwdep device for this opl3. - - - - - - - - - - The first argument is the opl3_t instance you - created, and the second is the index number, usually 0. - - - - The third argument is the index-offset for the sequencer - client assigned to the OPL3 port. When there is an MPU401-UART, - give 1 for here (UART always takes 0). - -
- -
- Hardware-Dependent Devices - - Some chips need user-space access for special - controls or for loading the micro code. In such a case, you can - create a hwdep (hardware-dependent) device. The hwdep API is - defined in <sound/hwdep.h>. You can - find examples in opl3 driver or - isa/sb/sb16_csp.c. - - - - The creation of the hwdep instance is done via - snd_hwdep_new(). - - - - - - - - where the third argument is the index number. - - - - You can then pass any pointer value to the - private_data. - If you assign a private data, you should define the - destructor, too. The destructor function is set in - the private_free field. - - - -private_data = p; - hw->private_free = mydata_free; -]]> - - - - and the implementation of the destructor would be: - - - -private_data; - kfree(p); - } -]]> - - - - - - The arbitrary file operations can be defined for this - instance. The file operators are defined in - the ops table. For example, assume that - this chip needs an ioctl. - - - -ops.open = mydata_open; - hw->ops.ioctl = mydata_ioctl; - hw->ops.release = mydata_release; -]]> - - - - And implement the callback functions as you like. - -
- -
- IEC958 (S/PDIF) - - Usually the controls for IEC958 devices are implemented via - the control interface. There is a macro to compose a name string for - IEC958 controls, SNDRV_CTL_NAME_IEC958() - defined in <include/asound.h>. - - - - There are some standard controls for IEC958 status bits. These - controls use the type SNDRV_CTL_ELEM_TYPE_IEC958, - and the size of element is fixed as 4 bytes array - (value.iec958.status[x]). For the info - callback, you don't specify - the value field for this type (the count field must be set, - though). - - - - IEC958 Playback Con Mask is used to return the - bit-mask for the IEC958 status bits of consumer mode. Similarly, - IEC958 Playback Pro Mask returns the bitmask for - professional mode. They are read-only controls, and are defined - as MIXER controls (iface = - SNDRV_CTL_ELEM_IFACE_MIXER). - - - - Meanwhile, IEC958 Playback Default control is - defined for getting and setting the current default IEC958 - bits. Note that this one is usually defined as a PCM control - (iface = SNDRV_CTL_ELEM_IFACE_PCM), - although in some places it's defined as a MIXER control. - - - - In addition, you can define the control switches to - enable/disable or to set the raw bit mode. The implementation - will depend on the chip, but the control should be named as - IEC958 xxx, preferably using - the SNDRV_CTL_NAME_IEC958() macro. - - - - You can find several cases, for example, - pci/emu10k1, - pci/ice1712, or - pci/cmipci.c. - -
- -
- - - - - - - Buffer and Memory Management - -
- Buffer Types - - ALSA provides several different buffer allocation functions - depending on the bus and the architecture. All these have a - consistent API. The allocation of physically-contiguous pages is - done via - snd_malloc_xxx_pages() function, where xxx - is the bus type. - - - - The allocation of pages with fallback is - snd_malloc_xxx_pages_fallback(). This - function tries to allocate the specified pages but if the pages - are not available, it tries to reduce the page sizes until - enough space is found. - - - - The release the pages, call - snd_free_xxx_pages() function. - - - - Usually, ALSA drivers try to allocate and reserve - a large contiguous physical space - at the time the module is loaded for the later use. - This is called pre-allocation. - As already written, you can call the following function at - pcm instance construction time (in the case of PCI bus). - - - - - - - - where size is the byte size to be - pre-allocated and the max is the maximum - size to be changed via the prealloc proc file. - The allocator will try to get an area as large as possible - within the given size. - - - - The second argument (type) and the third argument (device pointer) - are dependent on the bus. - In the case of the ISA bus, pass snd_dma_isa_data() - as the third argument with SNDRV_DMA_TYPE_DEV type. - For the continuous buffer unrelated to the bus can be pre-allocated - with SNDRV_DMA_TYPE_CONTINUOUS type and the - snd_dma_continuous_data(GFP_KERNEL) device pointer, - where GFP_KERNEL is the kernel allocation flag to - use. - For the PCI scatter-gather buffers, use - SNDRV_DMA_TYPE_DEV_SG with - snd_dma_pci_data(pci) - (see the - Non-Contiguous Buffers - section). - - - - Once the buffer is pre-allocated, you can use the - allocator in the hw_params callback: - - - - - - - - Note that you have to pre-allocate to use this function. - -
- -
- External Hardware Buffers - - Some chips have their own hardware buffers and the DMA - transfer from the host memory is not available. In such a case, - you need to either 1) copy/set the audio data directly to the - external hardware buffer, or 2) make an intermediate buffer and - copy/set the data from it to the external hardware buffer in - interrupts (or in tasklets, preferably). - - - - The first case works fine if the external hardware buffer is large - enough. This method doesn't need any extra buffers and thus is - more effective. You need to define the - copy and - silence callbacks for - the data transfer. However, there is a drawback: it cannot - be mmapped. The examples are GUS's GF1 PCM or emu8000's - wavetable PCM. - - - - The second case allows for mmap on the buffer, although you have - to handle an interrupt or a tasklet to transfer the data - from the intermediate buffer to the hardware buffer. You can find an - example in the vxpocket driver. - - - - Another case is when the chip uses a PCI memory-map - region for the buffer instead of the host memory. In this case, - mmap is available only on certain architectures like the Intel one. - In non-mmap mode, the data cannot be transferred as in the normal - way. Thus you need to define the copy and - silence callbacks as well, - as in the cases above. The examples are found in - rme32.c and rme96.c. - - - - The implementation of the copy and - silence callbacks depends upon - whether the hardware supports interleaved or non-interleaved - samples. The copy callback is - defined like below, a bit - differently depending whether the direction is playback or - capture: - - - - - - - - - - In the case of interleaved samples, the second argument - (channel) is not used. The third argument - (pos) points the - current position offset in frames. - - - - The meaning of the fourth argument is different between - playback and capture. For playback, it holds the source data - pointer, and for capture, it's the destination data pointer. - - - - The last argument is the number of frames to be copied. - - - - What you have to do in this callback is again different - between playback and capture directions. In the - playback case, you copy the given amount of data - (count) at the specified pointer - (src) to the specified offset - (pos) on the hardware buffer. When - coded like memcpy-like way, the copy would be like: - - - - - - - - - - For the capture direction, you copy the given amount of - data (count) at the specified offset - (pos) on the hardware buffer to the - specified pointer (dst). - - - - - - - - Note that both the position and the amount of data are given - in frames. - - - - In the case of non-interleaved samples, the implementation - will be a bit more complicated. - - - - You need to check the channel argument, and if it's -1, copy - the whole channels. Otherwise, you have to copy only the - specified channel. Please check - isa/gus/gus_pcm.c as an example. - - - - The silence callback is also - implemented in a similar way. - - - - - - - - - - The meanings of arguments are the same as in the - copy - callback, although there is no src/dst - argument. In the case of interleaved samples, the channel - argument has no meaning, as well as on - copy callback. - - - - The role of silence callback is to - set the given amount - (count) of silence data at the - specified offset (pos) on the hardware - buffer. Suppose that the data format is signed (that is, the - silent-data is 0), and the implementation using a memset-like - function would be like: - - - - - - - - - - In the case of non-interleaved samples, again, the - implementation becomes a bit more complicated. See, for example, - isa/gus/gus_pcm.c. - -
- -
- Non-Contiguous Buffers - - If your hardware supports the page table as in emu10k1 or the - buffer descriptors as in via82xx, you can use the scatter-gather - (SG) DMA. ALSA provides an interface for handling SG-buffers. - The API is provided in <sound/pcm.h>. - - - - For creating the SG-buffer handler, call - snd_pcm_lib_preallocate_pages() or - snd_pcm_lib_preallocate_pages_for_all() - with SNDRV_DMA_TYPE_DEV_SG - in the PCM constructor like other PCI pre-allocator. - You need to pass snd_dma_pci_data(pci), - where pci is the struct pci_dev pointer - of the chip as well. - The struct snd_sg_buf instance is created as - substream->dma_private. You can cast - the pointer like: - - - -dma_private; -]]> - - - - - - Then call snd_pcm_lib_malloc_pages() - in the hw_params callback - as well as in the case of normal PCI buffer. - The SG-buffer handler will allocate the non-contiguous kernel - pages of the given size and map them onto the virtually contiguous - memory. The virtual pointer is addressed in runtime->dma_area. - The physical address (runtime->dma_addr) is set to zero, - because the buffer is physically non-contiguous. - The physical address table is set up in sgbuf->table. - You can get the physical address at a certain offset via - snd_pcm_sgbuf_get_addr(). - - - - When a SG-handler is used, you need to set - snd_pcm_sgbuf_ops_page as - the page callback. - (See - page callback section.) - - - - To release the data, call - snd_pcm_lib_free_pages() in the - hw_free callback as usual. - -
- -
- Vmalloc'ed Buffers - - It's possible to use a buffer allocated via - vmalloc, for example, for an intermediate - buffer. Since the allocated pages are not contiguous, you need - to set the page callback to obtain - the physical address at every offset. - - - - The implementation of page callback - would be like this: - - - - - - /* get the physical page pointer on the given offset */ - static struct page *mychip_page(struct snd_pcm_substream *substream, - unsigned long offset) - { - void *pageptr = substream->runtime->dma_area + offset; - return vmalloc_to_page(pageptr); - } -]]> - - - -
- -
- - - - - - - Proc Interface - - ALSA provides an easy interface for procfs. The proc files are - very useful for debugging. I recommend you set up proc files if - you write a driver and want to get a running status or register - dumps. The API is found in - <sound/info.h>. - - - - To create a proc file, call - snd_card_proc_new(). - - - - - - - - where the second argument specifies the name of the proc file to be - created. The above example will create a file - my-file under the card directory, - e.g. /proc/asound/card0/my-file. - - - - Like other components, the proc entry created via - snd_card_proc_new() will be registered and - released automatically in the card registration and release - functions. - - - - When the creation is successful, the function stores a new - instance in the pointer given in the third argument. - It is initialized as a text proc file for read only. To use - this proc file as a read-only text file as it is, set the read - callback with a private data via - snd_info_set_text_ops(). - - - - - - - - where the second argument (chip) is the - private data to be used in the callbacks. The third parameter - specifies the read buffer size and the fourth - (my_proc_read) is the callback function, which - is defined like - - - - - - - - - - - In the read callback, use snd_iprintf() for - output strings, which works just like normal - printf(). For example, - - - -private_data; - - snd_iprintf(buffer, "This is my chip!\n"); - snd_iprintf(buffer, "Port = %ld\n", chip->port); - } -]]> - - - - - - The file permissions can be changed afterwards. As default, it's - set as read only for all users. If you want to add write - permission for the user (root as default), do as follows: - - - -mode = S_IFREG | S_IRUGO | S_IWUSR; -]]> - - - - and set the write buffer size and the callback - - - -c.text.write = my_proc_write; -]]> - - - - - - For the write callback, you can use - snd_info_get_line() to get a text line, and - snd_info_get_str() to retrieve a string from - the line. Some examples are found in - core/oss/mixer_oss.c, core/oss/and - pcm_oss.c. - - - - For a raw-data proc-file, set the attributes as follows: - - - -content = SNDRV_INFO_CONTENT_DATA; - entry->private_data = chip; - entry->c.ops = &my_file_io_ops; - entry->size = 4096; - entry->mode = S_IFREG | S_IRUGO; -]]> - - - - For the raw data, size field must be - set properly. This specifies the maximum size of the proc file access. - - - - The read/write callbacks of raw mode are more direct than the text mode. - You need to use a low-level I/O functions such as - copy_from/to_user() to transfer the - data. - - - - - - - - If the size of the info entry has been set up properly, - count and pos are - guaranteed to fit within 0 and the given size. - You don't have to check the range in the callbacks unless any - other condition is required. - - - - - - - - - - - Power Management - - If the chip is supposed to work with suspend/resume - functions, you need to add power-management code to the - driver. The additional code for power-management should be - ifdef'ed with - CONFIG_PM. - - - - If the driver fully supports suspend/resume - that is, the device can be - properly resumed to its state when suspend was called, - you can set the SNDRV_PCM_INFO_RESUME flag - in the pcm info field. Usually, this is possible when the - registers of the chip can be safely saved and restored to - RAM. If this is set, the trigger callback is called with - SNDRV_PCM_TRIGGER_RESUME after the resume - callback completes. - - - - Even if the driver doesn't support PM fully but - partial suspend/resume is still possible, it's still worthy to - implement suspend/resume callbacks. In such a case, applications - would reset the status by calling - snd_pcm_prepare() and restart the stream - appropriately. Hence, you can define suspend/resume callbacks - below but don't set SNDRV_PCM_INFO_RESUME - info flag to the PCM. - - - - Note that the trigger with SUSPEND can always be called when - snd_pcm_suspend_all is called, - regardless of the SNDRV_PCM_INFO_RESUME flag. - The RESUME flag affects only the behavior - of snd_pcm_resume(). - (Thus, in theory, - SNDRV_PCM_TRIGGER_RESUME isn't needed - to be handled in the trigger callback when no - SNDRV_PCM_INFO_RESUME flag is set. But, - it's better to keep it for compatibility reasons.) - - - In the earlier version of ALSA drivers, a common - power-management layer was provided, but it has been removed. - The driver needs to define the suspend/resume hooks according to - the bus the device is connected to. In the case of PCI drivers, the - callbacks look like below: - - - - - - - - - - The scheme of the real suspend job is as follows. - - - Retrieve the card and the chip data. - Call snd_power_change_state() with - SNDRV_CTL_POWER_D3hot to change the - power status. - Call snd_pcm_suspend_all() to suspend the running PCM streams. - If AC97 codecs are used, call - snd_ac97_suspend() for each codec. - Save the register values if necessary. - Stop the hardware if necessary. - Disable the PCI device by calling - pci_disable_device(). Then, call - pci_save_state() at last. - - - - - A typical code would be like: - - - -private_data; - /* (2) */ - snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); - /* (3) */ - snd_pcm_suspend_all(chip->pcm); - /* (4) */ - snd_ac97_suspend(chip->ac97); - /* (5) */ - snd_mychip_save_registers(chip); - /* (6) */ - snd_mychip_stop_hardware(chip); - /* (7) */ - pci_disable_device(pci); - pci_save_state(pci); - return 0; - } -]]> - - - - - - The scheme of the real resume job is as follows. - - - Retrieve the card and the chip data. - Set up PCI. First, call pci_restore_state(). - Then enable the pci device again by calling pci_enable_device(). - Call pci_set_master() if necessary, too. - Re-initialize the chip. - Restore the saved registers if necessary. - Resume the mixer, e.g. calling - snd_ac97_resume(). - Restart the hardware (if any). - Call snd_power_change_state() with - SNDRV_CTL_POWER_D0 to notify the processes. - - - - - A typical code would be like: - - - -private_data; - /* (2) */ - pci_restore_state(pci); - pci_enable_device(pci); - pci_set_master(pci); - /* (3) */ - snd_mychip_reinit_chip(chip); - /* (4) */ - snd_mychip_restore_registers(chip); - /* (5) */ - snd_ac97_resume(chip->ac97); - /* (6) */ - snd_mychip_restart_chip(chip); - /* (7) */ - snd_power_change_state(card, SNDRV_CTL_POWER_D0); - return 0; - } -]]> - - - - - - As shown in the above, it's better to save registers after - suspending the PCM operations via - snd_pcm_suspend_all() or - snd_pcm_suspend(). It means that the PCM - streams are already stopped when the register snapshot is - taken. But, remember that you don't have to restart the PCM - stream in the resume callback. It'll be restarted via - trigger call with SNDRV_PCM_TRIGGER_RESUME - when necessary. - - - - OK, we have all callbacks now. Let's set them up. In the - initialization of the card, make sure that you can get the chip - data from the card instance, typically via - private_data field, in case you - created the chip data individually. - - - -dev, index[dev], id[dev], THIS_MODULE, - 0, &card); - .... - chip = kzalloc(sizeof(*chip), GFP_KERNEL); - .... - card->private_data = chip; - .... - } -]]> - - - - When you created the chip data with - snd_card_new(), it's anyway accessible - via private_data field. - - - -dev, index[dev], id[dev], THIS_MODULE, - sizeof(struct mychip), &card); - .... - chip = card->private_data; - .... - } -]]> - - - - - - - If you need a space to save the registers, allocate the - buffer for it here, too, since it would be fatal - if you cannot allocate a memory in the suspend phase. - The allocated buffer should be released in the corresponding - destructor. - - - - And next, set suspend/resume callbacks to the pci_driver. - - - - - - - - - - - - - - - - Module Parameters - - There are standard module options for ALSA. At least, each - module should have the index, - id and enable - options. - - - - If the module supports multiple cards (usually up to - 8 = SNDRV_CARDS cards), they should be - arrays. The default initial values are defined already as - constants for easier programming: - - - - - - - - - - If the module supports only a single card, they could be single - variables, instead. enable option is not - always necessary in this case, but it would be better to have a - dummy option for compatibility. - - - - The module parameters must be declared with the standard - module_param()(), - module_param_array()() and - MODULE_PARM_DESC() macros. - - - - The typical coding would be like below: - - - - - - - - - - Also, don't forget to define the module description, classes, - license and devices. Especially, the recent modprobe requires to - define the module license as GPL, etc., otherwise the system is - shown as tainted. - - - - - - - - - - - - - - - - How To Put Your Driver Into ALSA Tree -
- General - - So far, you've learned how to write the driver codes. - And you might have a question now: how to put my own - driver into the ALSA driver tree? - Here (finally :) the standard procedure is described briefly. - - - - Suppose that you create a new PCI driver for the card - xyz. The card module name would be - snd-xyz. The new driver is usually put into the alsa-driver - tree, alsa-driver/pci directory in - the case of PCI cards. - Then the driver is evaluated, audited and tested - by developers and users. After a certain time, the driver - will go to the alsa-kernel tree (to the corresponding directory, - such as alsa-kernel/pci) and eventually - will be integrated into the Linux 2.6 tree (the directory would be - linux/sound/pci). - - - - In the following sections, the driver code is supposed - to be put into alsa-driver tree. The two cases are covered: - a driver consisting of a single source file and one consisting - of several source files. - -
- -
- Driver with A Single Source File - - - - - Modify alsa-driver/pci/Makefile - - - - Suppose you have a file xyz.c. Add the following - two lines - - - - - - - - - - - Create the Kconfig entry - - - - Add the new entry of Kconfig for your xyz driver. - - - - - - - the line, select SND_PCM, specifies that the driver xyz supports - PCM. In addition to SND_PCM, the following components are - supported for select command: - SND_RAWMIDI, SND_TIMER, SND_HWDEP, SND_MPU401_UART, - SND_OPL3_LIB, SND_OPL4_LIB, SND_VX_LIB, SND_AC97_CODEC. - Add the select command for each supported component. - - - - Note that some selections imply the lowlevel selections. - For example, PCM includes TIMER, MPU401_UART includes RAWMIDI, - AC97_CODEC includes PCM, and OPL3_LIB includes HWDEP. - You don't need to give the lowlevel selections again. - - - - For the details of Kconfig script, refer to the kbuild - documentation. - - - - - - - Run cvscompile script to re-generate the configure script and - build the whole stuff again. - - - - -
- -
- Drivers with Several Source Files - - Suppose that the driver snd-xyz have several source files. - They are located in the new subdirectory, - pci/xyz. - - - - - Add a new directory (xyz) in - alsa-driver/pci/Makefile as below - - - - - - - - - - - - Under the directory xyz, create a Makefile - - - Sample Makefile for a driver xyz - - - - - - - - - - Create the Kconfig entry - - - - This procedure is as same as in the last section. - - - - - - Run cvscompile script to re-generate the configure script and - build the whole stuff again. - - - - -
- -
- - - - - - Useful Functions - -
- <function>snd_printk()</function> and friends - - ALSA provides a verbose version of the - printk() function. If a kernel config - CONFIG_SND_VERBOSE_PRINTK is set, this - function prints the given message together with the file name - and the line of the caller. The KERN_XXX - prefix is processed as - well as the original printk() does, so it's - recommended to add this prefix, e.g. - - - - - - - - - - There are also printk()'s for - debugging. snd_printd() can be used for - general debugging purposes. If - CONFIG_SND_DEBUG is set, this function is - compiled, and works just like - snd_printk(). If the ALSA is compiled - without the debugging flag, it's ignored. - - - - snd_printdd() is compiled in only when - CONFIG_SND_DEBUG_VERBOSE is set. Please note - that CONFIG_SND_DEBUG_VERBOSE is not set as default - even if you configure the alsa-driver with - option. You need to give - explicitly option instead. - -
- -
- <function>snd_BUG()</function> - - It shows the BUG? message and - stack trace as well as snd_BUG_ON at the point. - It's useful to show that a fatal error happens there. - - - When no debug flag is set, this macro is ignored. - -
- -
- <function>snd_BUG_ON()</function> - - snd_BUG_ON() macro is similar with - WARN_ON() macro. For example, - - - - - - - - or it can be used as the condition, - - - - - - - - - - The macro takes an conditional expression to evaluate. - When CONFIG_SND_DEBUG, is set, if the - expression is non-zero, it shows the warning message such as - BUG? (xxx) - normally followed by stack trace. - - In both cases it returns the evaluated value. - - -
- -
- - - - - - - Acknowledgments - - I would like to thank Phil Kerr for his help for improvement and - corrections of this document. - - - Kevin Conder reformatted the original plain-text to the - DocBook format. - - - Giuliano Pochini corrected typos and contributed the example codes - in the hardware constraints section. - - -
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt index c0d8788e75d3..72292308d0f5 100644 --- a/Documentation/IPMI.txt +++ b/Documentation/IPMI.txt @@ -111,6 +111,8 @@ ipmi_ssif - A driver for accessing BMCs on the SMBus. It uses the I2C kernel driver's SMBus interfaces to send and receive IPMI messages over the SMBus. +ipmi_powernv - A driver for access BMCs on POWERNV systems. + ipmi_watchdog - IPMI requires systems to have a very capable watchdog timer. This driver implements the standard Linux watchdog timer interface on top of the IPMI message handler. @@ -118,17 +120,15 @@ interface on top of the IPMI message handler. ipmi_poweroff - Some systems support the ability to be turned off via IPMI commands. -These are all individually selectable via configuration options. +bt-bmc - This is not part of the main driver, but instead a driver for +accessing a BMC-side interface of a BT interface. It is used on BMCs +running Linux to provide an interface to the host. -Note that the KCS-only interface has been removed. The af_ipmi driver -is no longer supported and has been removed because it was impossible -to do 32 bit emulation on 64-bit kernels with it. +These are all individually selectable via configuration options. Much documentation for the interface is in the include files. The IPMI include files are: -net/af_ipmi.h - Contains the socket interface. - linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI. linux/ipmi_smi.h - Contains the interface for system management interfaces @@ -245,6 +245,16 @@ addressed (because some boards actually have multiple BMCs on them) and the user should not have to care what type of SMI is below them. +Watching For Interfaces + +When your code comes up, the IPMI driver may or may not have detected +if IPMI devices exist. So you might have to defer your setup until +the device is detected, or you might be able to do it immediately. +To handle this, and to allow for discovery, you register an SMI +watcher with ipmi_smi_watcher_register() to iterate over interfaces +and tell you when they come and go. + + Creating the User To user the message handler, you must first create a user using @@ -263,7 +273,7 @@ closing the device automatically destroys the user. Messaging -To send a message from kernel-land, the ipmi_request() call does +To send a message from kernel-land, the ipmi_request_settime() call does pretty much all message handling. Most of the parameter are self-explanatory. However, it takes a "msgid" parameter. This is NOT the sequence number of messages. It is simply a long value that is @@ -352,11 +362,12 @@ that for more details. The SI Driver ------------- -The SI driver allows up to 4 KCS or SMIC interfaces to be configured -in the system. By default, scan the ACPI tables for interfaces, and -if it doesn't find any the driver will attempt to register one KCS -interface at the spec-specified I/O port 0xca2 without interrupts. -You can change this at module load time (for a module) with: +The SI driver allows KCS, BT, and SMIC interfaces to be configured +in the system. It discovers interfaces through a host of different +methods, depending on the system. + +You can specify up to four interfaces on the module load line and +control some module parameters: modprobe ipmi_si.o type=,.... ports=,... addrs=,... @@ -367,7 +378,7 @@ You can change this at module load time (for a module) with: force_kipmid=,,... kipmid_max_busy_us=,,... unload_when_empty=[0|1] - trydefaults=[0|1] trydmi=[0|1] tryacpi=[0|1] + trydmi=[0|1] tryacpi=[0|1] tryplatform=[0|1] trypci=[0|1] Each of these except try... items is a list, the first item for the @@ -386,10 +397,6 @@ use the I/O port given as the device address. If you specify irqs as non-zero for an interface, the driver will attempt to use the given interrupt for the device. -trydefaults sets whether the standard IPMI interface at 0xca2 and -any interfaces specified by ACPE are tried. By default, the driver -tries it, set this value to zero to turn this off. - The other try... items disable discovery by their corresponding names. These are all enabled by default, set them to zero to disable them. The tryplatform disables openfirmware. @@ -434,7 +441,7 @@ kernel command line as: ipmi_si.type=,... ipmi_si.ports=,... ipmi_si.addrs=,... - ipmi_si.irqs=,... ipmi_si.trydefaults=[0|1] + ipmi_si.irqs=,... ipmi_si.regspacings=,,... ipmi_si.regsizes=,,... ipmi_si.regshifts=,,... @@ -444,11 +451,6 @@ kernel command line as: It works the same as the module parameters of the same names. -By default, the driver will attempt to detect any device specified by -ACPI, and if none of those then a KCS device at the spec-specified -0xca2. If you want to turn this off, set the "trydefaults" option to -false. - If your IPMI interface does not support interrupts and is a KCS or SMIC interface, the IPMI driver will start a kernel thread for the interface to help speed things up. This is a low-priority kernel @@ -500,7 +502,8 @@ at module load time (for a module) with: addr=[,[,...]] adapter=[,[...]] dbg=,... - slave_addrs=,,... + slave_addrs=,,... + tryacpi=[0|1] trydmi=[0|1] [dbg_probe=1] The addresses are normal I2C addresses. The adapter is the string @@ -513,6 +516,9 @@ spaces in kernel parameters. The debug flags are bit flags for each BMC found, they are: IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8 +The tryxxx parameters can be used to disable detecting interfaces +from various sources. + Setting dbg_probe to 1 will enable debugging of the probing and detection process for BMCs on the SMBusses. @@ -535,7 +541,8 @@ kernel command line as: ipmi_ssif.adapter=[,[...]] ipmi_ssif.dbg=[,[...]] ipmi_ssif.dbg_probe=1 - ipmi_ssif.slave_addrs=[,[...]] + ipmi_ssif.slave_addrs=[,[...]] + ipmi_ssif.tryacpi=[0|1] ipmi_ssif.trydmi=[0|1] These are the same options as on the module command line. diff --git a/Documentation/Makefile.sphinx b/Documentation/Makefile.sphinx index 92deea30b183..707c65337ebf 100644 --- a/Documentation/Makefile.sphinx +++ b/Documentation/Makefile.sphinx @@ -10,6 +10,8 @@ _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(src SPHINX_CONF = conf.py PAPER = BUILDDIR = $(obj)/output +PDFLATEX = xelatex +LATEXOPTS = -interaction=batchmode # User-friendly check for sphinx-build HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi) @@ -29,7 +31,7 @@ else ifneq ($(DOCBOOKS),) else # HAVE_SPHINX # User-friendly check for pdflatex -HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi) +HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi) # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 @@ -51,8 +53,8 @@ loop_cmd = $(echo-cmd) $(cmd_$(1)) # $5 reST source folder relative to $(srctree)/$(src), # e.g. "media" for the linux-tv book-set at ./Documentation/media -quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4); - cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\ +quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) + cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2;\ BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ $(SPHINXBUILD) \ -b $2 \ @@ -67,16 +69,19 @@ htmldocs: @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) latexdocs: + @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) + ifeq ($(HAVE_PDFLATEX),0) - $(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.) + +pdfdocs: + $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.) @echo " SKIP Sphinx $@ target." + else # HAVE_PDFLATEX - @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) -endif # HAVE_PDFLATEX pdfdocs: latexdocs -ifneq ($(HAVE_PDFLATEX),0) - $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex) + $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex;) + endif # HAVE_PDFLATEX epubdocs: @@ -93,6 +98,7 @@ installmandocs: cleandocs: $(Q)rm -rf $(BUILDDIR) + $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) -C Documentation/media clean endif # HAVE_SPHINX diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index a4d3838130e4..39bcb74ea733 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -547,7 +547,7 @@ The rcu_access_pointer() on line 6 is similar to It could reuse a value formerly fetched from this same pointer. It could also fetch the pointer from gp in a byte-at-a-time manner, resulting in load tearing, in turn resulting a bytewise - mash-up of two distince pointer values. + mash-up of two distinct pointer values. It might even use value-speculation optimizations, where it makes a wrong guess, but by the time it gets around to checking the value, an update has changed the pointer to match the wrong guess. @@ -659,6 +659,29 @@ systems with more than one CPU: In other words, a given instance of synchronize_rcu() can avoid waiting on a given RCU read-side critical section only if it can prove that synchronize_rcu() started first. + +

+ A related question is “When rcu_read_lock() + doesn't generate any code, why does it matter how it relates + to a grace period?” + The answer is that it is not the relationship of + rcu_read_lock() itself that is important, but rather + the relationship of the code within the enclosed RCU read-side + critical section to the code preceding and following the + grace period. + If we take this viewpoint, then a given RCU read-side critical + section begins before a given grace period when some access + preceding the grace period observes the effect of some access + within the critical section, in which case none of the accesses + within the critical section may observe the effects of any + access following the grace period. + +

+ As of late 2016, mathematical models of RCU take this + viewpoint, for example, see slides 62 and 63 + of the + 2016 LinuxCon EU + presentation.   diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 204422719197..5cbd8b2395b8 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -237,7 +237,7 @@ rcu_dereference() The reader uses rcu_dereference() to fetch an RCU-protected pointer, which returns a value that may then be safely - dereferenced. Note that rcu_deference() does not actually + dereferenced. Note that rcu_dereference() does not actually dereference the pointer, instead, it protects the pointer for later dereferencing. It also executes any needed memory-barrier instructions for a given CPU architecture. Currently, only Alpha diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 36f1dedc944c..81455705e4a6 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -1,841 +1 @@ -.. _submittingpatches: - -How to Get Your Change Into the Linux Kernel or Care And Operation Of Your Linus Torvalds -========================================================================================= - -For a person or company who wishes to submit a change to the Linux -kernel, the process can sometimes be daunting if you're not familiar -with "the system." This text is a collection of suggestions which -can greatly increase the chances of your change being accepted. - -This document contains a large number of suggestions in a relatively terse -format. For detailed information on how the kernel development process -works, see :ref:`Documentation/development-process `. -Also, read :ref:`Documentation/SubmitChecklist ` -for a list of items to check before -submitting code. If you are submitting a driver, also read -:ref:`Documentation/SubmittingDrivers `; -for device tree binding patches, read -Documentation/devicetree/bindings/submitting-patches.txt. - -Many of these steps describe the default behavior of the ``git`` version -control system; if you use ``git`` to prepare your patches, you'll find much -of the mechanical work done for you, though you'll still need to prepare -and document a sensible set of patches. In general, use of ``git`` will make -your life as a kernel developer easier. - -Creating and Sending your Change -******************************** - - -0) Obtain a current source tree -------------------------------- - -If you do not have a repository with the current kernel source handy, use -``git`` to obtain one. You'll want to start with the mainline repository, -which can be grabbed with:: - - git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git - -Note, however, that you may not want to develop against the mainline tree -directly. Most subsystem maintainers run their own trees and want to see -patches prepared against those trees. See the **T:** entry for the subsystem -in the MAINTAINERS file to find that tree, or simply ask the maintainer if -the tree is not listed there. - -It is still possible to download kernel releases via tarballs (as described -in the next section), but that is the hard way to do kernel development. - -1) ``diff -up`` ---------------- - -If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN`` -to create patches. Git generates patches in this form by default; if -you're using ``git``, you can skip this section entirely. - -All changes to the Linux kernel occur in the form of patches, as -generated by :manpage:`diff(1)`. When creating your patch, make sure to -create it in "unified diff" format, as supplied by the ``-u`` argument -to :manpage:`diff(1)`. -Also, please use the ``-p`` argument which shows which C function each -change is in - that makes the resultant ``diff`` a lot easier to read. -Patches should be based in the root kernel source directory, -not in any lower subdirectory. - -To create a patch for a single file, it is often sufficient to do:: - - SRCTREE= linux - MYFILE= drivers/net/mydriver.c - - cd $SRCTREE - cp $MYFILE $MYFILE.orig - vi $MYFILE # make your change - cd .. - diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch - -To create a patch for multiple files, you should unpack a "vanilla", -or unmodified kernel source tree, and generate a ``diff`` against your -own source tree. For example:: - - MYSRC= /devel/linux - - tar xvfz linux-3.19.tar.gz - mv linux-3.19 linux-3.19-vanilla - diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \ - linux-3.19-vanilla $MYSRC > /tmp/patch - -``dontdiff`` is a list of files which are generated by the kernel during -the build process, and should be ignored in any :manpage:`diff(1)`-generated -patch. - -Make sure your patch does not include any extra files which do not -belong in a patch submission. Make sure to review your patch -after- -generating it with :manpage:`diff(1)`, to ensure accuracy. - -If your changes produce a lot of deltas, you need to split them into -individual patches which modify things in logical stages; see -:ref:`split_changes`. This will facilitate review by other kernel developers, -very important if you want your patch accepted. - -If you're using ``git``, ``git rebase -i`` can help you with this process. If -you're not using ``git``, ``quilt`` -is another popular alternative. - -.. _describe_changes: - -2) Describe your changes ------------------------- - -Describe your problem. Whether your patch is a one-line bug fix or -5000 lines of a new feature, there must be an underlying problem that -motivated you to do this work. Convince the reviewer that there is a -problem worth fixing and that it makes sense for them to read past the -first paragraph. - -Describe user-visible impact. Straight up crashes and lockups are -pretty convincing, but not all bugs are that blatant. Even if the -problem was spotted during code review, describe the impact you think -it can have on users. Keep in mind that the majority of Linux -installations run kernels from secondary stable trees or -vendor/product-specific trees that cherry-pick only specific patches -from upstream, so include anything that could help route your change -downstream: provoking circumstances, excerpts from dmesg, crash -descriptions, performance regressions, latency spikes, lockups, etc. - -Quantify optimizations and trade-offs. If you claim improvements in -performance, memory consumption, stack footprint, or binary size, -include numbers that back them up. But also describe non-obvious -costs. Optimizations usually aren't free but trade-offs between CPU, -memory, and readability; or, when it comes to heuristics, between -different workloads. Describe the expected downsides of your -optimization so that the reviewer can weigh costs against benefits. - -Once the problem is established, describe what you are actually doing -about it in technical detail. It's important to describe the change -in plain English for the reviewer to verify that the code is behaving -as you intend it to. - -The maintainer will thank you if you write your patch description in a -form which can be easily pulled into Linux's source code management -system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`. - -Solve only one problem per patch. If your description starts to get -long, that's a sign that you probably need to split up your patch. -See :ref:`split_changes`. - -When you submit or resubmit a patch or patch series, include the -complete patch description and justification for it. Don't just -say that this is version N of the patch (series). Don't expect the -subsystem maintainer to refer back to earlier patch versions or referenced -URLs to find the patch description and put that into the patch. -I.e., the patch (series) and its description should be self-contained. -This benefits both the maintainers and reviewers. Some reviewers -probably didn't even receive earlier versions of the patch. - -Describe your changes in imperative mood, e.g. "make xyzzy do frotz" -instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy -to do frotz", as if you are giving orders to the codebase to change -its behaviour. - -If the patch fixes a logged bug entry, refer to that bug entry by -number and URL. If the patch follows from a mailing list discussion, -give a URL to the mailing list archive; use the https://lkml.kernel.org/ -redirector with a ``Message-Id``, to ensure that the links cannot become -stale. - -However, try to make your explanation understandable without external -resources. In addition to giving a URL to a mailing list archive or -bug, summarize the relevant points of the discussion that led to the -patch as submitted. - -If you want to refer to a specific commit, don't just refer to the -SHA-1 ID of the commit. Please also include the oneline summary of -the commit, to make it easier for reviewers to know what it is about. -Example:: - - Commit e21d2170f36602ae2708 ("video: remove unnecessary - platform_set_drvdata()") removed the unnecessary - platform_set_drvdata(), but left the variable "dev" unused, - delete it. - -You should also be sure to use at least the first twelve characters of the -SHA-1 ID. The kernel repository holds a *lot* of objects, making -collisions with shorter IDs a real possibility. Bear in mind that, even if -there is no collision with your six-character ID now, that condition may -change five years from now. - -If your patch fixes a bug in a specific commit, e.g. you found an issue using -``git bisect``, please use the 'Fixes:' tag with the first 12 characters of -the SHA-1 ID, and the one line summary. For example:: - - Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()") - -The following ``git config`` settings can be used to add a pretty format for -outputting the above style in the ``git log`` or ``git show`` commands:: - - [core] - abbrev = 12 - [pretty] - fixes = Fixes: %h (\"%s\") - -.. _split_changes: - -3) Separate your changes ------------------------- - -Separate each **logical change** into a separate patch. - -For example, if your changes include both bug fixes and performance -enhancements for a single driver, separate those changes into two -or more patches. If your changes include an API update, and a new -driver which uses that new API, separate those into two patches. - -On the other hand, if you make a single change to numerous files, -group those changes into a single patch. Thus a single logical change -is contained within a single patch. - -The point to remember is that each patch should make an easily understood -change that can be verified by reviewers. Each patch should be justifiable -on its own merits. - -If one patch depends on another patch in order for a change to be -complete, that is OK. Simply note **"this patch depends on patch X"** -in your patch description. - -When dividing your change into a series of patches, take special care to -ensure that the kernel builds and runs properly after each patch in the -series. Developers using ``git bisect`` to track down a problem can end up -splitting your patch series at any point; they will not thank you if you -introduce bugs in the middle. - -If you cannot condense your patch set into a smaller set of patches, -then only post say 15 or so at a time and wait for review and integration. - - - -4) Style-check your changes ---------------------------- - -Check your patch for basic style violations, details of which can be -found in -:ref:`Documentation/CodingStyle `. -Failure to do so simply wastes -the reviewers time and will get your patch rejected, probably -without even being read. - -One significant exception is when moving code from one file to -another -- in this case you should not modify the moved code at all in -the same patch which moves it. This clearly delineates the act of -moving the code and your changes. This greatly aids review of the -actual differences and allows tools to better track the history of -the code itself. - -Check your patches with the patch style checker prior to submission -(scripts/checkpatch.pl). Note, though, that the style checker should be -viewed as a guide, not as a replacement for human judgment. If your code -looks better with a violation then its probably best left alone. - -The checker reports at three levels: - - ERROR: things that are very likely to be wrong - - WARNING: things requiring careful review - - CHECK: things requiring thought - -You should be able to justify all violations that remain in your -patch. - - -5) Select the recipients for your patch ---------------------------------------- - -You should always copy the appropriate subsystem maintainer(s) on any patch -to code that they maintain; look through the MAINTAINERS file and the -source code revision history to see who those maintainers are. The -script scripts/get_maintainer.pl can be very useful at this step. If you -cannot find a maintainer for the subsystem you are working on, Andrew -Morton (akpm@linux-foundation.org) serves as a maintainer of last resort. - -You should also normally choose at least one mailing list to receive a copy -of your patch set. linux-kernel@vger.kernel.org functions as a list of -last resort, but the volume on that list has caused a number of developers -to tune it out. Look in the MAINTAINERS file for a subsystem-specific -list; your patch will probably get more attention there. Please do not -spam unrelated lists, though. - -Many kernel-related lists are hosted on vger.kernel.org; you can find a -list of them at http://vger.kernel.org/vger-lists.html. There are -kernel-related lists hosted elsewhere as well, though. - -Do not send more than 15 patches at once to the vger mailing lists!!! - -Linus Torvalds is the final arbiter of all changes accepted into the -Linux kernel. His e-mail address is . -He gets a lot of e-mail, and, at this point, very few patches go through -Linus directly, so typically you should do your best to -avoid- -sending him e-mail. - -If you have a patch that fixes an exploitable security bug, send that patch -to security@kernel.org. For severe bugs, a short embargo may be considered -to allow distributors to get the patch out to users; in such cases, -obviously, the patch should not be sent to any public lists. - -Patches that fix a severe bug in a released kernel should be directed -toward the stable maintainers by putting a line like this:: - - Cc: stable@vger.kernel.org - -into the sign-off area of your patch (note, NOT an email recipient). You -should also read -:ref:`Documentation/stable_kernel_rules.txt ` -in addition to this file. - -Note, however, that some subsystem maintainers want to come to their own -conclusions on which patches should go to the stable trees. The networking -maintainer, in particular, would rather not see individual developers -adding lines like the above to their patches. - -If changes affect userland-kernel interfaces, please send the MAN-PAGES -maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at -least a notification of the change, so that some information makes its way -into the manual pages. User-space API changes should also be copied to -linux-api@vger.kernel.org. - -For small patches you may want to CC the Trivial Patch Monkey -trivial@kernel.org which collects "trivial" patches. Have a look -into the MAINTAINERS file for its current manager. - -Trivial patches must qualify for one of the following rules: - -- Spelling fixes in documentation -- Spelling fixes for errors which could break :manpage:`grep(1)` -- Warning fixes (cluttering with useless warnings is bad) -- Compilation fixes (only if they are actually correct) -- Runtime fixes (only if they actually fix things) -- Removing use of deprecated functions/macros -- Contact detail and documentation fixes -- Non-portable code replaced by portable code (even in arch-specific, - since people copy, as long as it's trivial) -- Any fix by the author/maintainer of the file (ie. patch monkey - in re-transmission mode) - - - -6) No MIME, no links, no compression, no attachments. Just plain text ----------------------------------------------------------------------- - -Linus and other kernel developers need to be able to read and comment -on the changes you are submitting. It is important for a kernel -developer to be able to "quote" your changes, using standard e-mail -tools, so that they may comment on specific portions of your code. - -For this reason, all patches should be submitted by e-mail "inline". - -.. warning:: - - Be wary of your editor's word-wrap corrupting your patch, - if you choose to cut-n-paste your patch. - -Do not attach the patch as a MIME attachment, compressed or not. -Many popular e-mail applications will not always transmit a MIME -attachment as plain text, making it impossible to comment on your -code. A MIME attachment also takes Linus a bit more time to process, -decreasing the likelihood of your MIME-attached change being accepted. - -Exception: If your mailer is mangling patches then someone may ask -you to re-send them using MIME. - -See :ref:`Documentation/email-clients.txt ` -for hints about configuring your e-mail client so that it sends your patches -untouched. - -7) E-mail size --------------- - -Large changes are not appropriate for mailing lists, and some -maintainers. If your patch, uncompressed, exceeds 300 kB in size, -it is preferred that you store your patch on an Internet-accessible -server, and provide instead a URL (link) pointing to your patch. But note -that if your patch exceeds 300 kB, it almost certainly needs to be broken up -anyway. - -8) Respond to review comments ------------------------------ - -Your patch will almost certainly get comments from reviewers on ways in -which the patch can be improved. You must respond to those comments; -ignoring reviewers is a good way to get ignored in return. Review comments -or questions that do not lead to a code change should almost certainly -bring about a comment or changelog entry so that the next reviewer better -understands what is going on. - -Be sure to tell the reviewers what changes you are making and to thank them -for their time. Code review is a tiring and time-consuming process, and -reviewers sometimes get grumpy. Even in that case, though, respond -politely and address the problems they have pointed out. - - -9) Don't get discouraged - or impatient ---------------------------------------- - -After you have submitted your change, be patient and wait. Reviewers are -busy people and may not get to your patch right away. - -Once upon a time, patches used to disappear into the void without comment, -but the development process works more smoothly than that now. You should -receive comments within a week or so; if that does not happen, make sure -that you have sent your patches to the right place. Wait for a minimum of -one week before resubmitting or pinging reviewers - possibly longer during -busy times like merge windows. - - -10) Include PATCH in the subject --------------------------------- - -Due to high e-mail traffic to Linus, and to linux-kernel, it is common -convention to prefix your subject line with [PATCH]. This lets Linus -and other kernel developers more easily distinguish patches from other -e-mail discussions. - - - -11) Sign your work ------------------- - -To improve tracking of who did what, especially with patches that can -percolate to their final resting place in the kernel through several -layers of maintainers, we've introduced a "sign-off" procedure on -patches that are being emailed around. - -The sign-off is a simple line at the end of the explanation for the -patch, which certifies that you wrote it or otherwise have the right to -pass it on as an open-source patch. The rules are pretty simple: if you -can certify the below: - -Developer's Certificate of Origin 1.1 -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -By making a contribution to this project, I certify that: - - (a) The contribution was created in whole or in part by me and I - have the right to submit it under the open source license - indicated in the file; or - - (b) The contribution is based upon previous work that, to the best - of my knowledge, is covered under an appropriate open source - license and I have the right under that license to submit that - work with modifications, whether created in whole or in part - by me, under the same open source license (unless I am - permitted to submit under a different license), as indicated - in the file; or - - (c) The contribution was provided directly to me by some other - person who certified (a), (b) or (c) and I have not modified - it. - - (d) I understand and agree that this project and the contribution - are public and that a record of the contribution (including all - personal information I submit with it, including my sign-off) is - maintained indefinitely and may be redistributed consistent with - this project or the open source license(s) involved. - -then you just add a line saying:: - - Signed-off-by: Random J Developer - -using your real name (sorry, no pseudonyms or anonymous contributions.) - -Some people also put extra tags at the end. They'll just be ignored for -now, but you can do this to mark internal company procedures or just -point out some special detail about the sign-off. - -If you are a subsystem or branch maintainer, sometimes you need to slightly -modify patches you receive in order to merge them, because the code is not -exactly the same in your tree and the submitters'. If you stick strictly to -rule (c), you should ask the submitter to rediff, but this is a totally -counter-productive waste of time and energy. Rule (b) allows you to adjust -the code, but then it is very impolite to change one submitter's code and -make him endorse your bugs. To solve this problem, it is recommended that -you add a line between the last Signed-off-by header and yours, indicating -the nature of your changes. While there is nothing mandatory about this, it -seems like prepending the description with your mail and/or name, all -enclosed in square brackets, is noticeable enough to make it obvious that -you are responsible for last-minute changes. Example:: - - Signed-off-by: Random J Developer - [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] - Signed-off-by: Lucky K Maintainer - -This practice is particularly helpful if you maintain a stable branch and -want at the same time to credit the author, track changes, merge the fix, -and protect the submitter from complaints. Note that under no circumstances -can you change the author's identity (the From header), as it is the one -which appears in the changelog. - -Special note to back-porters: It seems to be a common and useful practice -to insert an indication of the origin of a patch at the top of the commit -message (just after the subject line) to facilitate tracking. For instance, -here's what we see in a 3.x-stable release:: - - Date: Tue Oct 7 07:26:38 2014 -0400 - - libata: Un-break ATA blacklist - - commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. - -And here's what might appear in an older kernel once a patch is backported:: - - Date: Tue May 13 22:12:27 2008 +0200 - - wireless, airo: waitbusy() won't delay - - [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a] - -Whatever the format, this information provides a valuable help to people -tracking your trees, and to people trying to troubleshoot bugs in your -tree. - - -12) When to use Acked-by: and Cc: ---------------------------------- - -The Signed-off-by: tag indicates that the signer was involved in the -development of the patch, or that he/she was in the patch's delivery path. - -If a person was not directly involved in the preparation or handling of a -patch but wishes to signify and record their approval of it then they can -ask to have an Acked-by: line added to the patch's changelog. - -Acked-by: is often used by the maintainer of the affected code when that -maintainer neither contributed to nor forwarded the patch. - -Acked-by: is not as formal as Signed-off-by:. It is a record that the acker -has at least reviewed the patch and has indicated acceptance. Hence patch -mergers will sometimes manually convert an acker's "yep, looks good to me" -into an Acked-by: (but note that it is usually better to ask for an -explicit ack). - -Acked-by: does not necessarily indicate acknowledgement of the entire patch. -For example, if a patch affects multiple subsystems and has an Acked-by: from -one subsystem maintainer then this usually indicates acknowledgement of just -the part which affects that maintainer's code. Judgement should be used here. -When in doubt people should refer to the original discussion in the mailing -list archives. - -If a person has had the opportunity to comment on a patch, but has not -provided such comments, you may optionally add a ``Cc:`` tag to the patch. -This is the only tag which might be added without an explicit action by the -person it names - but it should indicate that this person was copied on the -patch. This tag documents that potentially interested parties -have been included in the discussion. - - -13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes: --------------------------------------------------------------------------- - -The Reported-by tag gives credit to people who find bugs and report them and it -hopefully inspires them to help us again in the future. Please note that if -the bug was reported in private, then ask for permission first before using the -Reported-by tag. - -A Tested-by: tag indicates that the patch has been successfully tested (in -some environment) by the person named. This tag informs maintainers that -some testing has been performed, provides a means to locate testers for -future patches, and ensures credit for the testers. - -Reviewed-by:, instead, indicates that the patch has been reviewed and found -acceptable according to the Reviewer's Statement: - -Reviewer's statement of oversight -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -By offering my Reviewed-by: tag, I state that: - - (a) I have carried out a technical review of this patch to - evaluate its appropriateness and readiness for inclusion into - the mainline kernel. - - (b) Any problems, concerns, or questions relating to the patch - have been communicated back to the submitter. I am satisfied - with the submitter's response to my comments. - - (c) While there may be things that could be improved with this - submission, I believe that it is, at this time, (1) a - worthwhile modification to the kernel, and (2) free of known - issues which would argue against its inclusion. - - (d) While I have reviewed the patch and believe it to be sound, I - do not (unless explicitly stated elsewhere) make any - warranties or guarantees that it will achieve its stated - purpose or function properly in any given situation. - -A Reviewed-by tag is a statement of opinion that the patch is an -appropriate modification of the kernel without any remaining serious -technical issues. Any interested reviewer (who has done the work) can -offer a Reviewed-by tag for a patch. This tag serves to give credit to -reviewers and to inform maintainers of the degree of review which has been -done on the patch. Reviewed-by: tags, when supplied by reviewers known to -understand the subject area and to perform thorough reviews, will normally -increase the likelihood of your patch getting into the kernel. - -A Suggested-by: tag indicates that the patch idea is suggested by the person -named and ensures credit to the person for the idea. Please note that this -tag should not be added without the reporter's permission, especially if the -idea was not posted in a public forum. That said, if we diligently credit our -idea reporters, they will, hopefully, be inspired to help us again in the -future. - -A Fixes: tag indicates that the patch fixes an issue in a previous commit. It -is used to make it easy to determine where a bug originated, which can help -review a bug fix. This tag also assists the stable kernel team in determining -which stable kernel versions should receive your fix. This is the preferred -method for indicating a bug fixed by the patch. See :ref:`describe_changes` -for more details. - - -14) The canonical patch format ------------------------------- - -This section describes how the patch itself should be formatted. Note -that, if you have your patches stored in a ``git`` repository, proper patch -formatting can be had with ``git format-patch``. The tools cannot create -the necessary text, though, so read the instructions below anyway. - -The canonical patch subject line is:: - - Subject: [PATCH 001/123] subsystem: summary phrase - -The canonical patch message body contains the following: - - - A ``from`` line specifying the patch author (only needed if the person - sending the patch is not the author). - - - An empty line. - - - The body of the explanation, line wrapped at 75 columns, which will - be copied to the permanent changelog to describe this patch. - - - The ``Signed-off-by:`` lines, described above, which will - also go in the changelog. - - - A marker line containing simply ``---``. - - - Any additional comments not suitable for the changelog. - - - The actual patch (``diff`` output). - -The Subject line format makes it very easy to sort the emails -alphabetically by subject line - pretty much any email reader will -support that - since because the sequence number is zero-padded, -the numerical and alphabetic sort is the same. - -The ``subsystem`` in the email's Subject should identify which -area or subsystem of the kernel is being patched. - -The ``summary phrase`` in the email's Subject should concisely -describe the patch which that email contains. The ``summary -phrase`` should not be a filename. Do not use the same ``summary -phrase`` for every patch in a whole patch series (where a ``patch -series`` is an ordered sequence of multiple, related patches). - -Bear in mind that the ``summary phrase`` of your email becomes a -globally-unique identifier for that patch. It propagates all the way -into the ``git`` changelog. The ``summary phrase`` may later be used in -developer discussions which refer to the patch. People will want to -google for the ``summary phrase`` to read discussion regarding that -patch. It will also be the only thing that people may quickly see -when, two or three months later, they are going through perhaps -thousands of patches using tools such as ``gitk`` or ``git log ---oneline``. - -For these reasons, the ``summary`` must be no more than 70-75 -characters, and it must describe both what the patch changes, as well -as why the patch might be necessary. It is challenging to be both -succinct and descriptive, but that is what a well-written summary -should do. - -The ``summary phrase`` may be prefixed by tags enclosed in square -brackets: "Subject: [PATCH ...]

". The tags are -not considered part of the summary phrase, but describe how the patch -should be treated. Common tags might include a version descriptor if -the multiple versions of the patch have been sent out in response to -comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for -comments. If there are four patches in a patch series the individual -patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures -that developers understand the order in which the patches should be -applied and that they have reviewed or applied all of the patches in -the patch series. - -A couple of example Subjects:: - - Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching - Subject: [PATCH v2 01/27] x86: fix eflags tracking - -The ``from`` line must be the very first line in the message body, -and has the form: - - From: Original Author - -The ``from`` line specifies who will be credited as the author of the -patch in the permanent changelog. If the ``from`` line is missing, -then the ``From:`` line from the email header will be used to determine -the patch author in the changelog. - -The explanation body will be committed to the permanent source -changelog, so should make sense to a competent reader who has long -since forgotten the immediate details of the discussion that might -have led to this patch. Including symptoms of the failure which the -patch addresses (kernel log messages, oops messages, etc.) is -especially useful for people who might be searching the commit logs -looking for the applicable patch. If a patch fixes a compile failure, -it may not be necessary to include _all_ of the compile failures; just -enough that it is likely that someone searching for the patch can find -it. As in the ``summary phrase``, it is important to be both succinct as -well as descriptive. - -The ``---`` marker line serves the essential purpose of marking for patch -handling tools where the changelog message ends. - -One good use for the additional comments after the ``---`` marker is for -a ``diffstat``, to show what files have changed, and the number of -inserted and deleted lines per file. A ``diffstat`` is especially useful -on bigger patches. Other comments relevant only to the moment or the -maintainer, not suitable for the permanent changelog, should also go -here. A good example of such comments might be ``patch changelogs`` -which describe what has changed between the v1 and v2 version of the -patch. - -If you are going to include a ``diffstat`` after the ``---`` marker, please -use ``diffstat`` options ``-p 1 -w 70`` so that filenames are listed from -the top of the kernel source tree and don't use too much horizontal -space (easily fit in 80 columns, maybe with some indentation). (``git`` -generates appropriate diffstats by default.) - -See more details on the proper patch format in the following -references. - -.. _explicit_in_reply_to: - -15) Explicit In-Reply-To headers --------------------------------- - -It can be helpful to manually add In-Reply-To: headers to a patch -(e.g., when using ``git send-email``) to associate the patch with -previous relevant discussion, e.g. to link a bug fix to the email with -the bug report. However, for a multi-patch series, it is generally -best to avoid using In-Reply-To: to link to older versions of the -series. This way multiple versions of the patch don't become an -unmanageable forest of references in email clients. If a link is -helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in -the cover email text) to link to an earlier version of the patch series. - - -16) Sending ``git pull`` requests ---------------------------------- - -If you have a series of patches, it may be most convenient to have the -maintainer pull them directly into the subsystem repository with a -``git pull`` operation. Note, however, that pulling patches from a developer -requires a higher degree of trust than taking patches from a mailing list. -As a result, many subsystem maintainers are reluctant to take pull -requests, especially from new, unknown developers. If in doubt you can use -the pull request as the cover letter for a normal posting of the patch -series, giving the maintainer the option of using either. - -A pull request should have [GIT] or [PULL] in the subject line. The -request itself should include the repository name and the branch of -interest on a single line; it should look something like:: - - Please pull from - - git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus - - to get these changes: - -A pull request should also include an overall message saying what will be -included in the request, a ``git shortlog`` listing of the patches -themselves, and a ``diffstat`` showing the overall effect of the patch series. -The easiest way to get all this information together is, of course, to let -``git`` do it for you with the ``git request-pull`` command. - -Some maintainers (including Linus) want to see pull requests from signed -commits; that increases their confidence that the request actually came -from you. Linus, in particular, will not pull from public hosting sites -like GitHub in the absence of a signed tag. - -The first step toward creating such tags is to make a GNUPG key and get it -signed by one or more core kernel developers. This step can be hard for -new developers, but there is no way around it. Attending conferences can -be a good way to find developers who can sign your key. - -Once you have prepared a patch series in ``git`` that you wish to have somebody -pull, create a signed tag with ``git tag -s``. This will create a new tag -identifying the last commit in the series and containing a signature -created with your private key. You will also have the opportunity to add a -changelog-style message to the tag; this is an ideal place to describe the -effects of the pull request as a whole. - -If the tree the maintainer will be pulling from is not the repository you -are working from, don't forget to push the signed tag explicitly to the -public tree. - -When generating your pull request, use the signed tag as the target. A -command like this will do the trick:: - - git request-pull master git://my.public.tree/linux.git my-signed-tag - - -REFERENCES -********** - -Andrew Morton, "The perfect patch" (tpp). - - -Jeff Garzik, "Linux kernel patch submission format". - - -Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer". - - - - - - - - - - - - -NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! - - -Kernel Documentation/CodingStyle: - :ref:`Documentation/CodingStyle ` - -Linus Torvalds's mail on the canonical patch format: - - -Andi Kleen, "On submitting kernel patches" - Some strategies to get difficult or controversial changes in. - - http://halobates.de/on-submitting-patches.pdf - +This file has moved to process/submitting-patches.rst diff --git a/Documentation/VGA-softcursor.txt b/Documentation/VGA-softcursor.txt deleted file mode 100644 index 70acfbf399eb..000000000000 --- a/Documentation/VGA-softcursor.txt +++ /dev/null @@ -1,39 +0,0 @@ -Software cursor for VGA by Pavel Machek -======================= and Martin Mares - - Linux now has some ability to manipulate cursor appearance. Normally, you -can set the size of hardware cursor (and also work around some ugly bugs in -those miserable Trident cards--see #define TRIDENT_GLITCH in drivers/video/ -vgacon.c). You can now play a few new tricks: you can make your cursor look -like a non-blinking red block, make it inverse background of the character it's -over or to highlight that character and still choose whether the original -hardware cursor should remain visible or not. There may be other things I have -never thought of. - - The cursor appearance is controlled by a "[?1;2;3c" escape sequence -where 1, 2 and 3 are parameters described below. If you omit any of them, -they will default to zeroes. - - Parameter 1 specifies cursor size (0=default, 1=invisible, 2=underline, ..., -8=full block) + 16 if you want the software cursor to be applied + 32 if you -want to always change the background color + 64 if you dislike having the -background the same as the foreground. Highlights are ignored for the last two -flags. - - The second parameter selects character attribute bits you want to change -(by simply XORing them with the value of this parameter). On standard VGA, -the high four bits specify background and the low four the foreground. In both -groups, low three bits set color (as in normal color codes used by the console) -and the most significant one turns on highlight (or sometimes blinking--it -depends on the configuration of your VGA). - - The third parameter consists of character attribute bits you want to set. -Bit setting takes place before bit toggling, so you can simply clear a bit by -including it in both the set mask and the toggle mask. - -Examples: -========= - -To get normal blinking underline, use: echo -e '\033[?2c' -To get blinking block, use: echo -e '\033[?6c' -To get red non-blinking block, use: echo -e '\033[?17;0;64c' diff --git a/Documentation/acpi/DSD-properties-rules.txt b/Documentation/acpi/DSD-properties-rules.txt new file mode 100644 index 000000000000..3e4862bdad98 --- /dev/null +++ b/Documentation/acpi/DSD-properties-rules.txt @@ -0,0 +1,97 @@ +_DSD Device Properties Usage Rules +---------------------------------- + +Properties, Property Sets and Property Subsets +---------------------------------------------- + +The _DSD (Device Specific Data) configuration object, introduced in ACPI 5.1, +allows any type of device configuration data to be provided via the ACPI +namespace. In principle, the format of the data may be arbitrary, but it has to +be identified by a UUID which must be recognized by the driver processing the +_DSD output. However, there are generic UUIDs defined for _DSD recognized by +the ACPI subsystem in the Linux kernel which automatically processes the data +packages associated with them and makes those data available to device drivers +as "device properties". + +A device property is a data item consisting of a string key and a value (of a +specific type) associated with it. + +In the ACPI _DSD context it is an element of the sub-package following the +generic Device Properties UUID in the _DSD return package as specified in the +Device Properties UUID definition document [1]. + +It also may be regarded as the definition of a key and the associated data type +that can be returned by _DSD in the Device Properties UUID sub-package for a +given device. + +A property set is a collection of properties applicable to a hardware entity +like a device. In the ACPI _DSD context it is the set of all properties that +can be returned in the Device Properties UUID sub-package for the device in +question. + +Property subsets are nested collections of properties. Each of them is +associated with an additional key (name) allowing the subset to be referred +to as a whole (and to be treated as a separate entity). The canonical +representation of property subsets is via the mechanism specified in the +Hierarchical Properties Extension UUID definition document [2]. + +Property sets may be hierarchical. That is, a property set may contain +multiple property subsets that each may contain property subsets of its +own and so on. + +General Validity Rule for Property Sets +--------------------------------------- + +Valid property sets must follow the guidance given by the Device Properties UUID +definition document [1]. + +_DSD properties are intended to be used in addition to, and not instead of, the +existing mechanisms defined by the ACPI specification. Therefore, as a rule, +they should only be used if the ACPI specification does not make direct +provisions for handling the underlying use case. It generally is invalid to +return property sets which do not follow that rule from _DSD in data packages +associated with the Device Properties UUID. + +Additional Considerations +------------------------- + +There are cases in which, even if the general rule given above is followed in +principle, the property set may still not be regarded as a valid one. + +For example, that applies to device properties which may cause kernel code +(either a device driver or a library/subsystem) to access hardware in a way +possibly leading to a conflict with AML methods in the ACPI namespace. In +particular, that may happen if the kernel code uses device properties to +manipulate hardware normally controlled by ACPI methods related to power +management, like _PSx and _DSW (for device objects) or _ON and _OFF (for power +resource objects), or by ACPI device disabling/enabling methods, like _DIS and +_SRS. + +In all cases in which kernel code may do something that will confuse AML as a +result of using device properties, the device properties in question are not +suitable for the ACPI environment and consequently they cannot belong to a valid +property set. + +Property Sets and Device Tree Bindings +-------------------------------------- + +It often is useful to make _DSD return property sets that follow Device Tree +bindings. + +In those cases, however, the above validity considerations must be taken into +account in the first place and returning invalid property sets from _DSD must be +avoided. For this reason, it may not be possible to make _DSD return a property +set following the given DT binding literally and completely. Still, for the +sake of code re-use, it may make sense to provide as much of the configuration +data as possible in the form of device properties and complement that with an +ACPI-specific mechanism suitable for the use case at hand. + +In any case, property sets following DT bindings literally should not be +expected to automatically work in the ACPI environment regardless of their +contents. + +References +---------- + +[1] http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf +[2] http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf diff --git a/Documentation/acpi/enumeration.txt b/Documentation/acpi/enumeration.txt index a91ec5af52df..209a5eba6b87 100644 --- a/Documentation/acpi/enumeration.txt +++ b/Documentation/acpi/enumeration.txt @@ -415,3 +415,12 @@ the "compatible" property in the _DSD or a _CID as long as one of their ancestors provides a _DSD with a valid "compatible" property. Such device objects are then simply regarded as additional "blocks" providing hierarchical configuration information to the driver of the composite ancestor device. + +However, PRP0001 can only be returned from either _HID or _CID of a device +object if all of the properties returned by the _DSD associated with it (either +the _DSD of the device object itself or the _DSD of its ancestor in the +"composite device" case described above) can be used in the ACPI environment. +Otherwise, the _DSD itself is regarded as invalid and therefore the "compatible" +property returned by it is meaningless. + +Refer to DSD-properties-rules.txt for more information. diff --git a/Documentation/acpi/gpio-properties.txt b/Documentation/acpi/gpio-properties.txt index 5aafe0b351a1..2aff0349facd 100644 --- a/Documentation/acpi/gpio-properties.txt +++ b/Documentation/acpi/gpio-properties.txt @@ -51,6 +51,68 @@ it to 1 marks the GPIO as active low. In our Bluetooth example the "reset-gpios" refers to the second GpioIo() resource, second pin in that resource with the GPIO number of 31. +It is possible to leave holes in the array of GPIOs. This is useful in +cases like with SPI host controllers where some chip selects may be +implemented as GPIOs and some as native signals. For example a SPI host +controller can have chip selects 0 and 2 implemented as GPIOs and 1 as +native: + + Package () { + "cs-gpios", + Package () { + ^GPIO, 19, 0, 0, // chip select 0: GPIO + 0, // chip select 1: native signal + ^GPIO, 20, 0, 0, // chip select 2: GPIO + } + } + +Other supported properties +-------------------------- + +Following Device Tree compatible device properties are also supported by +_DSD device properties for GPIO controllers: + +- gpio-hog +- output-high +- output-low +- input +- line-name + +Example: + + Name (_DSD, Package () { + // _DSD Hierarchical Properties Extension UUID + ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"), + Package () { + Package () {"hog-gpio8", "G8PU"} + } + }) + + Name (G8PU, Package () { + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), + Package () { + Package () {"gpio-hog", 1}, + Package () {"gpios", Package () {8, 0}}, + Package () {"output-high", 1}, + Package () {"line-name", "gpio8-pullup"}, + } + }) + +- gpio-line-names + +Example: + + Package () { + "gpio-line-names", + Package () { + "SPI0_CS_N", "EXP2_INT", "MUX6_IO", "UART0_RXD", "MUX7_IO", + "LVL_C_A1", "MUX0_IO", "SPI1_MISO" + } + } + +See Documentation/devicetree/bindings/gpio/gpio.txt for more information +about these properties. + ACPI GPIO Mappings Provided by Drivers -------------------------------------- diff --git a/Documentation/acpi/osi.txt b/Documentation/acpi/osi.txt new file mode 100644 index 000000000000..50cde0ceb9b0 --- /dev/null +++ b/Documentation/acpi/osi.txt @@ -0,0 +1,187 @@ +ACPI _OSI and _REV methods +-------------------------- + +An ACPI BIOS can use the "Operating System Interfaces" method (_OSI) +to find out what the operating system supports. Eg. If BIOS +AML code includes _OSI("XYZ"), the kernel's AML interpreter +can evaluate that method, look to see if it supports 'XYZ' +and answer YES or NO to the BIOS. + +The ACPI _REV method returns the "Revision of the ACPI specification +that OSPM supports" + +This document explains how and why the BIOS and Linux should use these methods. +It also explains how and why they are widely misused. + +How to use _OSI +--------------- + +Linux runs on two groups of machines -- those that are tested by the OEM +to be compatible with Linux, and those that were never tested with Linux, +but where Linux was installed to replace the original OS (Windows or OSX). + +The larger group is the systems tested to run only Windows. Not only that, +but many were tested to run with just one specific version of Windows. +So even though the BIOS may use _OSI to query what version of Windows is running, +only a single path through the BIOS has actually been tested. +Experience shows that taking untested paths through the BIOS +exposes Linux to an entire category of BIOS bugs. +For this reason, Linux _OSI defaults must continue to claim compatibility +with all versions of Windows. + +But Linux isn't actually compatible with Windows, and the Linux community +has also been hurt with regressions when Linux adds the latest version of +Windows to its list of _OSI strings. So it is possible that additional strings +will be more thoroughly vetted before shipping upstream in the future. +But it is likely that they will all eventually be added. + +What should an OEM do if they want to support Linux and Windows +using the same BIOS image? Often they need to do something different +for Linux to deal with how Linux is different from Windows. +Here the BIOS should ask exactly what it wants to know: + +_OSI("Linux-OEM-my_interface_name") +where 'OEM' is needed if this is an OEM-specific hook, +and 'my_interface_name' describes the hook, which could be a +quirk, a bug, or a bug-fix. + +In addition, the OEM should send a patch to upstream Linux +via the linux-acpi@vger.kernel.org mailing list. When that patch +is checked into Linux, the OS will answer "YES" when the BIOS +on the OEM's system uses _OSI to ask if the interface is supported +by the OS. Linux distributors can back-port that patch for Linux +pre-installs, and it will be included by all distributions that +re-base to upstream. If the distribution can not update the kernel binary, +they can also add an acpi_osi=Linux-OEM-my_interface_name +cmdline parameter to the boot loader, as needed. + +If the string refers to a feature where the upstream kernel +eventually grows support, a patch should be sent to remove +the string when that support is added to the kernel. + +That was easy. Read on, to find out how to do it wrong. + +Before _OSI, there was _OS +-------------------------- + +ACPI 1.0 specified "_OS" as an +"object that evaluates to a string that identifies the operating system." + +The ACPI BIOS flow would include an evaluation of _OS, and the AML +interpreter in the kernel would return to it a string identifying the OS: + +Windows 98, SE: "Microsoft Windows" +Windows ME: "Microsoft WindowsME:Millenium Edition" +Windows NT: "Microsoft Windows NT" + +The idea was on a platform tasked with running multiple OS's, +the BIOS could use _OS to enable devices that an OS +might support, or enable quirks or bug workarounds +necessary to make the platform compatible with that pre-existing OS. + +But _OS had fundamental problems. First, the BIOS needed to know the name +of every possible version of the OS that would run on it, and needed to know +all the quirks of those OS's. Certainly it would make more sense +for the BIOS to ask *specific* things of the OS, such +"do you support a specific interface", and thus in ACPI 3.0, +_OSI was born to replace _OS. + +_OS was abandoned, though even today, many BIOS look for +_OS "Microsoft Windows NT", though it seems somewhat far-fetched +that anybody would install those old operating systems +over what came with the machine. + +Linux answers "Microsoft Windows NT" to please that BIOS idiom. +That is the *only* viable strategy, as that is what modern Windows does, +and so doing otherwise could steer the BIOS down an untested path. + +_OSI is born, and immediately misused +-------------------------------------- + +With _OSI, the *BIOS* provides the string describing an interface, +and asks the OS: "YES/NO, are you compatible with this interface?" + +eg. _OSI("3.0 Thermal Model") would return TRUE if the OS knows how +to deal with the thermal extensions made to the ACPI 3.0 specification. +An old OS that doesn't know about those extensions would answer FALSE, +and a new OS may be able to return TRUE. + +For an OS-specific interface, the ACPI spec said that the BIOS and the OS +were to agree on a string of the form such as "Windows-interface_name". + +But two bad things happened. First, the Windows ecosystem used _OSI +not as designed, but as a direct replacement for _OS -- identifying +the OS version, rather than an OS supported interface. Indeed, right +from the start, the ACPI 3.0 spec itself codified this misuse +in example code using _OSI("Windows 2001"). + +This misuse was adopted and continues today. + +Linux had no choice but to also return TRUE to _OSI("Windows 2001") +and its successors. To do otherwise would virtually guarantee breaking +a BIOS that has been tested only with that _OSI returning TRUE. + +This strategy is problematic, as Linux is never completely compatible with +the latest version of Windows, and sometimes it takes more than a year +to iron out incompatibilities. + +Not to be out-done, the Linux community made things worse by returning TRUE +to _OSI("Linux"). Doing so is even worse than the Windows misuse +of _OSI, as "Linux" does not even contain any version information. +_OSI("Linux") led to some BIOS' malfunctioning due to BIOS writer's +using it in untested BIOS flows. But some OEM's used _OSI("Linux") +in tested flows to support real Linux features. In 2009, Linux +removed _OSI("Linux"), and added a cmdline parameter to restore it +for legacy systems still needed it. Further a BIOS_BUG warning prints +for all BIOS's that invoke it. + +No BIOS should use _OSI("Linux"). + +The result is a strategy for Linux to maximize compatibility with +ACPI BIOS that are tested on Windows machines. There is a real risk +of over-stating that compatibility; but the alternative has often been +catastrophic failure resulting from the BIOS taking paths that +were never validated under *any* OS. + +Do not use _REV +--------------- + +Since _OSI("Linux") went away, some BIOS writers used _REV +to support Linux and Windows differences in the same BIOS. + +_REV was defined in ACPI 1.0 to return the version of ACPI +supported by the OS and the OS AML interpreter. + +Modern Windows returns _REV = 2. Linux used ACPI_CA_SUPPORT_LEVEL, +which would increment, based on the version of the spec supported. + +Unfortunately, _REV was also misused. eg. some BIOS would check +for _REV = 3, and do something for Linux, but when Linux returned +_REV = 4, that support broke. + +In response to this problem, Linux returns _REV = 2 always, +from mid-2015 onward. The ACPI specification will also be updated +to reflect that _REV is deprecated, and always returns 2. + +Apple Mac and _OSI("Darwin") +---------------------------- + +On Apple's Mac platforms, the ACPI BIOS invokes _OSI("Darwin") +to determine if the machine is running Apple OSX. + +Like Linux's _OSI("*Windows*") strategy, Linux defaults to +answering YES to _OSI("Darwin") to enable full access +to the hardware and validated BIOS paths seen by OSX. +Just like on Windows-tested platforms, this strategy has risks. + +Starting in Linux-3.18, the kernel answered YES to _OSI("Darwin") +for the purpose of enabling Mac Thunderbolt support. Further, +if the kernel noticed _OSI("Darwin") being invoked, it additionally +disabled all _OSI("*Windows*") to keep poorly written Mac BIOS +from going down untested combinations of paths. + +The Linux-3.18 change in default caused power regressions on Mac +laptops, and the 3.18 implementation did not allow changing +the default via cmdline "acpi_osi=!Darwin". Linux-4.7 fixed +the ability to use acpi_osi=!Darwin as a workaround, and +we hope to see Mac Thunderbolt power management support in Linux-4.11. diff --git a/Documentation/acpi/video_extension.txt b/Documentation/acpi/video_extension.txt index 78b32ac02466..79bf6a4921be 100644 --- a/Documentation/acpi/video_extension.txt +++ b/Documentation/acpi/video_extension.txt @@ -101,6 +101,6 @@ received a notification, it will set the backlight level accordingly. This does not affect the sending of event to user space, they are always sent to user space regardless of whether or not the video module controls the backlight level directly. This behaviour can be controlled through the brightness_switch_enabled -module parameter as documented in kernel-parameters.txt. It is recommended to +module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to disable this behaviour once a GUI environment starts up and wants to have full control of the backlight level. diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst new file mode 100644 index 000000000000..1b6dfb2b3adb --- /dev/null +++ b/Documentation/admin-guide/README.rst @@ -0,0 +1,411 @@ +Linux kernel release 4.x +============================================= + +These are the release notes for Linux version 4. Read them carefully, +as they tell you what this is all about, explain how to install the +kernel, and what to do if something goes wrong. + +What is Linux? +-------------- + + Linux is a clone of the operating system Unix, written from scratch by + Linus Torvalds with assistance from a loosely-knit team of hackers across + the Net. It aims towards POSIX and Single UNIX Specification compliance. + + It has all the features you would expect in a modern fully-fledged Unix, + including true multitasking, virtual memory, shared libraries, demand + loading, shared copy-on-write executables, proper memory management, + and multistack networking including IPv4 and IPv6. + + It is distributed under the GNU General Public License - see the + accompanying COPYING file for more details. + +On what hardware does it run? +----------------------------- + + Although originally developed first for 32-bit x86-based PCs (386 or higher), + today Linux also runs on (at least) the Compaq Alpha AXP, Sun SPARC and + UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, Cell, + IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64, AXIS CRIS, + Xtensa, Tilera TILE, AVR32, ARC and Renesas M32R architectures. + + Linux is easily portable to most general-purpose 32- or 64-bit architectures + as long as they have a paged memory management unit (PMMU) and a port of the + GNU C compiler (gcc) (part of The GNU Compiler Collection, GCC). Linux has + also been ported to a number of architectures without a PMMU, although + functionality is then obviously somewhat limited. + Linux has also been ported to itself. You can now run the kernel as a + userspace application - this is called UserMode Linux (UML). + +Documentation +------------- + + - There is a lot of documentation available both in electronic form on + the Internet and in books, both Linux-specific and pertaining to + general UNIX questions. I'd recommend looking into the documentation + subdirectories on any Linux FTP site for the LDP (Linux Documentation + Project) books. This README is not meant to be documentation on the + system: there are much better sources available. + + - There are various README files in the Documentation/ subdirectory: + these typically contain kernel-specific installation notes for some + drivers for example. See Documentation/00-INDEX for a list of what + is contained in each file. Please read the + :ref:`Documentation/process/changes.rst ` file, as it + contains information about the problems, which may result by upgrading + your kernel. + + - The Documentation/DocBook/ subdirectory contains several guides for + kernel developers and users. These guides can be rendered in a + number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others. + After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``, + or ``make mandocs`` will render the documentation in the requested format. + +Installing the kernel source +---------------------------- + + - If you install the full sources, put the kernel tarball in a + directory where you have permissions (e.g. your home directory) and + unpack it:: + + xz -cd linux-4.X.tar.xz | tar xvf - + + Replace "X" with the version number of the latest kernel. + + Do NOT use the /usr/src/linux area! This area has a (usually + incomplete) set of kernel headers that are used by the library header + files. They should match the library, and not get messed up by + whatever the kernel-du-jour happens to be. + + - You can also upgrade between 4.x releases by patching. Patches are + distributed in the xz format. To install by patching, get all the + newer patch files, enter the top level directory of the kernel source + (linux-4.X) and execute:: + + xz -cd ../patch-4.x.xz | patch -p1 + + Replace "x" for all versions bigger than the version "X" of your current + source tree, **in_order**, and you should be ok. You may want to remove + the backup files (some-file-name~ or some-file-name.orig), and make sure + that there are no failed patches (some-file-name# or some-file-name.rej). + If there are, either you or I have made a mistake. + + Unlike patches for the 4.x kernels, patches for the 4.x.y kernels + (also known as the -stable kernels) are not incremental but instead apply + directly to the base 4.x kernel. For example, if your base kernel is 4.0 + and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1 + and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and + want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is, + patch -R) **before** applying the 4.0.3 patch. You can read more on this in + :ref:`Documentation/process/applying-patches.rst `. + + Alternatively, the script patch-kernel can be used to automate this + process. It determines the current kernel version and applies any + patches found:: + + linux/scripts/patch-kernel linux + + The first argument in the command above is the location of the + kernel source. Patches are applied from the current directory, but + an alternative directory can be specified as the second argument. + + - Make sure you have no stale .o files and dependencies lying around:: + + cd linux + make mrproper + + You should now have the sources correctly installed. + +Software requirements +--------------------- + + Compiling and running the 4.x kernels requires up-to-date + versions of various software packages. Consult + :ref:`Documentation/process/changes.rst ` for the minimum version numbers + required and how to get updates for these packages. Beware that using + excessively old versions of these packages can cause indirect + errors that are very difficult to track down, so don't assume that + you can just update packages when obvious problems arise during + build or operation. + +Build directory for the kernel +------------------------------ + + When compiling the kernel, all output files will per default be + stored together with the kernel source code. + Using the option ``make O=output/dir`` allows you to specify an alternate + place for the output files (including .config). + Example:: + + kernel source code: /usr/src/linux-4.X + build directory: /home/name/build/kernel + + To configure and build the kernel, use:: + + cd /usr/src/linux-4.X + make O=/home/name/build/kernel menuconfig + make O=/home/name/build/kernel + sudo make O=/home/name/build/kernel modules_install install + + Please note: If the ``O=output/dir`` option is used, then it must be + used for all invocations of make. + +Configuring the kernel +---------------------- + + Do not skip this step even if you are only upgrading one minor + version. New configuration options are added in each release, and + odd problems will turn up if the configuration files are not set up + as expected. If you want to carry your existing configuration to a + new version with minimal work, use ``make oldconfig``, which will + only ask you for the answers to new questions. + + - Alternative configuration commands are:: + + "make config" Plain text interface. + + "make menuconfig" Text based color menus, radiolists & dialogs. + + "make nconfig" Enhanced text based color menus. + + "make xconfig" Qt based configuration tool. + + "make gconfig" GTK+ based configuration tool. + + "make oldconfig" Default all questions based on the contents of + your existing ./.config file and asking about + new config symbols. + + "make silentoldconfig" + Like above, but avoids cluttering the screen + with questions already answered. + Additionally updates the dependencies. + + "make olddefconfig" + Like above, but sets new symbols to their default + values without prompting. + + "make defconfig" Create a ./.config file by using the default + symbol values from either arch/$ARCH/defconfig + or arch/$ARCH/configs/${PLATFORM}_defconfig, + depending on the architecture. + + "make ${PLATFORM}_defconfig" + Create a ./.config file by using the default + symbol values from + arch/$ARCH/configs/${PLATFORM}_defconfig. + Use "make help" to get a list of all available + platforms of your architecture. + + "make allyesconfig" + Create a ./.config file by setting symbol + values to 'y' as much as possible. + + "make allmodconfig" + Create a ./.config file by setting symbol + values to 'm' as much as possible. + + "make allnoconfig" Create a ./.config file by setting symbol + values to 'n' as much as possible. + + "make randconfig" Create a ./.config file by setting symbol + values to random values. + + "make localmodconfig" Create a config based on current config and + loaded modules (lsmod). Disables any module + option that is not needed for the loaded modules. + + To create a localmodconfig for another machine, + store the lsmod of that machine into a file + and pass it in as a LSMOD parameter. + + target$ lsmod > /tmp/mylsmod + target$ scp /tmp/mylsmod host:/tmp + + host$ make LSMOD=/tmp/mylsmod localmodconfig + + The above also works when cross compiling. + + "make localyesconfig" Similar to localmodconfig, except it will convert + all module options to built in (=y) options. + + You can find more information on using the Linux kernel config tools + in Documentation/kbuild/kconfig.txt. + + - NOTES on ``make config``: + + - Having unnecessary drivers will make the kernel bigger, and can + under some circumstances lead to problems: probing for a + nonexistent controller card may confuse your other controllers + + - A kernel with math-emulation compiled in will still use the + coprocessor if one is present: the math emulation will just + never get used in that case. The kernel will be slightly larger, + but will work on different machines regardless of whether they + have a math coprocessor or not. + + - The "kernel hacking" configuration details usually result in a + bigger or slower kernel (or both), and can even make the kernel + less stable by configuring some routines to actively try to + break bad code to find kernel problems (kmalloc()). Thus you + should probably answer 'n' to the questions for "development", + "experimental", or "debugging" features. + +Compiling the kernel +-------------------- + + - Make sure you have at least gcc 3.2 available. + For more information, refer to :ref:`Documentation/process/changes.rst `. + + Please note that you can still run a.out user programs with this kernel. + + - Do a ``make`` to create a compressed kernel image. It is also + possible to do ``make install`` if you have lilo installed to suit the + kernel makefiles, but you may want to check your particular lilo setup first. + + To do the actual install, you have to be root, but none of the normal + build should require that. Don't take the name of root in vain. + + - If you configured any of the parts of the kernel as ``modules``, you + will also have to do ``make modules_install``. + + - Verbose kernel compile/build output: + + Normally, the kernel build system runs in a fairly quiet mode (but not + totally silent). However, sometimes you or other kernel developers need + to see compile, link, or other commands exactly as they are executed. + For this, use "verbose" build mode. This is done by passing + ``V=1`` to the ``make`` command, e.g.:: + + make V=1 all + + To have the build system also tell the reason for the rebuild of each + target, use ``V=2``. The default is ``V=0``. + + - Keep a backup kernel handy in case something goes wrong. This is + especially true for the development releases, since each new release + contains new code which has not been debugged. Make sure you keep a + backup of the modules corresponding to that kernel, as well. If you + are installing a new kernel with the same version number as your + working kernel, make a backup of your modules directory before you + do a ``make modules_install``. + + Alternatively, before compiling, use the kernel config option + "LOCALVERSION" to append a unique suffix to the regular kernel version. + LOCALVERSION can be set in the "General Setup" menu. + + - In order to boot your new kernel, you'll need to copy the kernel + image (e.g. .../linux/arch/x86/boot/bzImage after compilation) + to the place where your regular bootable kernel is found. + + - Booting a kernel directly from a floppy without the assistance of a + bootloader such as LILO, is no longer supported. + + If you boot Linux from the hard drive, chances are you use LILO, which + uses the kernel image as specified in the file /etc/lilo.conf. The + kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or + /boot/bzImage. To use the new kernel, save a copy of the old image + and copy the new image over the old one. Then, you MUST RERUN LILO + to update the loading map! If you don't, you won't be able to boot + the new kernel image. + + Reinstalling LILO is usually a matter of running /sbin/lilo. + You may wish to edit /etc/lilo.conf to specify an entry for your + old kernel image (say, /vmlinux.old) in case the new one does not + work. See the LILO docs for more information. + + After reinstalling LILO, you should be all set. Shutdown the system, + reboot, and enjoy! + + If you ever need to change the default root device, video mode, + ramdisk size, etc. in the kernel image, use the ``rdev`` program (or + alternatively the LILO boot options when appropriate). No need to + recompile the kernel to change these parameters. + + - Reboot with the new kernel and enjoy. + +If something goes wrong +----------------------- + + - If you have problems that seem to be due to kernel bugs, please check + the file MAINTAINERS to see if there is a particular person associated + with the part of the kernel that you are having trouble with. If there + isn't anyone listed there, then the second best thing is to mail + them to me (torvalds@linux-foundation.org), and possibly to any other + relevant mailing-list or to the newsgroup. + + - In all bug-reports, *please* tell what kernel you are talking about, + how to duplicate the problem, and what your setup is (use your common + sense). If the problem is new, tell me so, and if the problem is + old, please try to tell me when you first noticed it. + + - If the bug results in a message like:: + + unable to handle kernel paging request at address C0000010 + Oops: 0002 + EIP: 0010:XXXXXXXX + eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx + esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx + ds: xxxx es: xxxx fs: xxxx gs: xxxx + Pid: xx, process nr: xx + xx xx xx xx xx xx xx xx xx xx + + or similar kernel debugging information on your screen or in your + system log, please duplicate it *exactly*. The dump may look + incomprehensible to you, but it does contain information that may + help debugging the problem. The text above the dump is also + important: it tells something about why the kernel dumped code (in + the above example, it's due to a bad kernel pointer). More information + on making sense of the dump is in Documentation/admin-guide/oops-tracing.rst + + - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump + as is, otherwise you will have to use the ``ksymoops`` program to make + sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred). + This utility can be downloaded from + ftp://ftp..kernel.org/pub/linux/utils/kernel/ksymoops/ . + Alternatively, you can do the dump lookup by hand: + + - In debugging dumps like the above, it helps enormously if you can + look up what the EIP value means. The hex value as such doesn't help + me or anybody else very much: it will depend on your particular + kernel setup. What you should do is take the hex value from the EIP + line (ignore the ``0010:``), and look it up in the kernel namelist to + see which kernel function contains the offending address. + + To find out the kernel function name, you'll need to find the system + binary associated with the kernel that exhibited the symptom. This is + the file 'linux/vmlinux'. To extract the namelist and match it against + the EIP from the kernel crash, do:: + + nm vmlinux | sort | less + + This will give you a list of kernel addresses sorted in ascending + order, from which it is simple to find the function that contains the + offending address. Note that the address given by the kernel + debugging messages will not necessarily match exactly with the + function addresses (in fact, that is very unlikely), so you can't + just 'grep' the list: the list will, however, give you the starting + point of each kernel function, so by looking for the function that + has a starting address lower than the one you are searching for but + is followed by a function with a higher address you will find the one + you want. In fact, it may be a good idea to include a bit of + "context" in your problem report, giving a few lines around the + interesting one. + + If you for some reason cannot do the above (you have a pre-compiled + kernel image or similar), telling me as much about your setup as + possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst ` + document for details. + + - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you + cannot change values or set break points.) To do this, first compile the + kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make + clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``). + + After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``. + You can now use all the usual gdb commands. The command to look up the + point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes + with the EIP value.) + + gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly) + disregards the starting offset for which the kernel is compiled. diff --git a/Documentation/admin-guide/binfmt-misc.rst b/Documentation/admin-guide/binfmt-misc.rst new file mode 100644 index 000000000000..97b0d7927078 --- /dev/null +++ b/Documentation/admin-guide/binfmt-misc.rst @@ -0,0 +1,151 @@ +Kernel Support for miscellaneous (your favourite) Binary Formats v1.1 +===================================================================== + +This Kernel feature allows you to invoke almost (for restrictions see below) +every program by simply typing its name in the shell. +This includes for example compiled Java(TM), Python or Emacs programs. + +To achieve this you must tell binfmt_misc which interpreter has to be invoked +with which binary. Binfmt_misc recognises the binary-type by matching some bytes +at the beginning of the file with a magic byte sequence (masking out specified +bits) you have supplied. Binfmt_misc can also recognise a filename extension +aka ``.com`` or ``.exe``. + +First you must mount binfmt_misc:: + + mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc + +To actually register a new binary type, you have to set up a string looking like +``:name:type:offset:magic:mask:interpreter:flags`` (where you can choose the +``:`` upon your needs) and echo it to ``/proc/sys/fs/binfmt_misc/register``. + +Here is what the fields mean: + +- ``name`` + is an identifier string. A new /proc file will be created with this + ``name below /proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for + obvious reasons. +- ``type`` + is the type of recognition. Give ``M`` for magic and ``E`` for extension. +- ``offset`` + is the offset of the magic/mask in the file, counted in bytes. This + defaults to 0 if you omit it (i.e. you write ``:name:type::magic...``). + Ignored when using filename extension matching. +- ``magic`` + is the byte sequence binfmt_misc is matching for. The magic string + may contain hex-encoded characters like ``\x0a`` or ``\xA4``. Note that you + must escape any NUL bytes; parsing halts at the first one. In a shell + environment you might have to write ``\\x0a`` to prevent the shell from + eating your ``\``. + If you chose filename extension matching, this is the extension to be + recognised (without the ``.``, the ``\x0a`` specials are not allowed). + Extension matching is case sensitive, and slashes ``/`` are not allowed! +- ``mask`` + is an (optional, defaults to all 0xff) mask. You can mask out some + bits from matching by supplying a string like magic and as long as magic. + The mask is anded with the byte sequence of the file. Note that you must + escape any NUL bytes; parsing halts at the first one. Ignored when using + filename extension matching. +- ``interpreter`` + is the program that should be invoked with the binary as first + argument (specify the full path) +- ``flags`` + is an optional field that controls several aspects of the invocation + of the interpreter. It is a string of capital letters, each controls a + certain aspect. The following flags are supported: + + ``P`` - preserve-argv[0] + Legacy behavior of binfmt_misc is to overwrite + the original argv[0] with the full path to the binary. When this + flag is included, binfmt_misc will add an argument to the argument + vector for this purpose, thus preserving the original ``argv[0]``. + e.g. If your interp is set to ``/bin/foo`` and you run ``blah`` + (which is in ``/usr/local/bin``), then the kernel will execute + ``/bin/foo`` with ``argv[]`` set to ``["/bin/foo", "/usr/local/bin/blah", "blah"]``. The interp has to be aware of this so it can + execute ``/usr/local/bin/blah`` + with ``argv[]`` set to ``["blah"]``. + ``O`` - open-binary + Legacy behavior of binfmt_misc is to pass the full path + of the binary to the interpreter as an argument. When this flag is + included, binfmt_misc will open the file for reading and pass its + descriptor as an argument, instead of the full path, thus allowing + the interpreter to execute non-readable binaries. This feature + should be used with care - the interpreter has to be trusted not to + emit the contents of the non-readable binary. + ``C`` - credentials + Currently, the behavior of binfmt_misc is to calculate + the credentials and security token of the new process according to + the interpreter. When this flag is included, these attributes are + calculated according to the binary. It also implies the ``O`` flag. + This feature should be used with care as the interpreter + will run with root permissions when a setuid binary owned by root + is run with binfmt_misc. + ``F`` - fix binary + The usual behaviour of binfmt_misc is to spawn the + binary lazily when the misc format file is invoked. However, + this doesn``t work very well in the face of mount namespaces and + changeroots, so the ``F`` mode opens the binary as soon as the + emulation is installed and uses the opened image to spawn the + emulator, meaning it is always available once installed, + regardless of how the environment changes. + + +There are some restrictions: + + - the whole register string may not exceed 1920 characters + - the magic must reside in the first 128 bytes of the file, i.e. + offset+size(magic) has to be less than 128 + - the interpreter string may not exceed 127 characters + +To use binfmt_misc you have to mount it first. You can mount it with +``mount -t binfmt_misc none /proc/sys/fs/binfmt_misc`` command, or you can add +a line ``none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0`` to your +``/etc/fstab`` so it auto mounts on boot. + +You may want to add the binary formats in one of your ``/etc/rc`` scripts during +boot-up. Read the manual of your init program to figure out how to do this +right. + +Think about the order of adding entries! Later added entries are matched first! + + +A few examples (assumed you are in ``/proc/sys/fs/binfmt_misc``): + +- enable support for em86 (like binfmt_em86, for Alpha AXP only):: + + echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register + echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register + +- enable support for packed DOS applications (pre-configured dosemu hdimages):: + + echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register + +- enable support for Windows executables using wine:: + + echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register + +For java support see Documentation/admin-guide/java.rst + + +You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable) +or 1 (to enable) to ``/proc/sys/fs/binfmt_misc/status`` or +``/proc/.../the_name``. +Catting the file tells you the current status of ``binfmt_misc/the_entry``. + +You can remove one entry or all entries by echoing -1 to ``/proc/.../the_name`` +or ``/proc/sys/fs/binfmt_misc/status``. + + +Hints +----- + +If you want to pass special arguments to your interpreter, you can +write a wrapper script for it. See Documentation/admin-guide/java.rst for an +example. + +Your interpreter should NOT look in the PATH for the filename; the kernel +passes it the full filename (or the file descriptor) to use. Using ``$PATH`` can +cause unexpected behaviour and can be a security hazard. + + +Richard Günther diff --git a/Documentation/admin-guide/braille-console.rst b/Documentation/admin-guide/braille-console.rst new file mode 100644 index 000000000000..18e79337dcfd --- /dev/null +++ b/Documentation/admin-guide/braille-console.rst @@ -0,0 +1,38 @@ +Linux Braille Console +===================== + +To get early boot messages on a braille device (before userspace screen +readers can start), you first need to compile the support for the usual serial +console (see :ref:`Documentation/admin-guide/serial-console.rst `), and +for braille device +(in :menuselection:`Device Drivers --> Accessibility support --> Console on braille device`). + +Then you need to specify a ``console=brl``, option on the kernel command line, the +format is:: + + console=brl,serial_options... + +where ``serial_options...`` are the same as described in +:ref:`Documentation/admin-guide/serial-console.rst `. + +So for instance you can use ``console=brl,ttyS0`` if the braille device is connected to the first serial port, and ``console=brl,ttyS0,115200`` to +override the baud rate to 115200, etc. + +By default, the braille device will just show the last kernel message (console +mode). To review previous messages, press the Insert key to switch to the VT +review mode. In review mode, the arrow keys permit to browse in the VT content, +:kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go at the top/bottom of the screen, and +the :kbd:`HOME` key goes back +to the cursor, hence providing very basic screen reviewing facility. + +Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel +parameter. + +For simplicity, only one braille console can be enabled, other uses of +``console=brl,...`` will be discarded. Also note that it does not interfere with +the console selection mechanism described in +:ref:`Documentation/admin-guide/serial-console.rst `. + +For now, only the VisioBraille device is supported. + +Samuel Thibault diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst new file mode 100644 index 000000000000..59567da344e8 --- /dev/null +++ b/Documentation/admin-guide/bug-bisect.rst @@ -0,0 +1,76 @@ +Bisecting a bug ++++++++++++++++ + +Last updated: 28 October 2016 + +Introduction +============ + +Always try the latest kernel from kernel.org and build from source. If you are +not confident in doing that please report the bug to your distribution vendor +instead of to a kernel developer. + +Finding bugs is not always easy. Have a go though. If you can't find it don't +give up. Report as much as you have found to the relevant maintainer. See +MAINTAINERS for who that is for the subsystem you have worked on. + +Before you submit a bug report read +:ref:`Documentation/admin-guide/reporting-bugs.rst `. + +Devices not appearing +===================== + +Often this is caused by udev/systemd. Check that first before blaming it +on the kernel. + +Finding patch that caused a bug +=============================== + +Using the provided tools with ``git`` makes finding bugs easy provided the bug +is reproducible. + +Steps to do it: + +- build the Kernel from its git source +- start bisect with [#f1]_:: + + $ git bisect start + +- mark the broken changeset with:: + + $ git bisect bad [commit] + +- mark a changeset where the code is known to work with:: + + $ git bisect good [commit] + +- rebuild the Kernel and test +- interact with git bisect by using either:: + + $ git bisect good + + or:: + + $ git bisect bad + + depending if the bug happened on the changeset you're testing +- After some interactions, git bisect will give you the changeset that + likely caused the bug. + +- For example, if you know that the current version is bad, and version + 4.8 is good, you could do:: + + $ git bisect start + $ git bisect bad # Current version is bad + $ git bisect good v4.8 + + +.. [#f1] You can, optionally, provide both good and bad arguments at git + start with ``git bisect start [BAD] [GOOD]`` + +For further references, please read: + +- The man page for ``git-bisect`` +- `Fighting regressions with git bisect `_ +- `Fully automated bisecting with "git bisect run" `_ +- `Using Git bisect to figure out when brokenness was introduced `_ diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst new file mode 100644 index 000000000000..08c4b1308189 --- /dev/null +++ b/Documentation/admin-guide/bug-hunting.rst @@ -0,0 +1,369 @@ +Bug hunting +=========== + +Kernel bug reports often come with a stack dump like the one below:: + + ------------[ cut here ]------------ + WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70 + Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core] + CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1 + Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009 + 00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6 + c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10 + f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617 + Call Trace: + [] ? dump_stack+0x44/0x64 + [] ? __warn+0xfa/0x120 + [] ? module_put+0x57/0x70 + [] ? module_put+0x57/0x70 + [] ? warn_slowpath_null+0x23/0x30 + [] ? module_put+0x57/0x70 + [] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk] + [] ? symbol_put_addr+0x27/0x50 + [] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb] + [] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb] + [] ? usb_disable_endpoint+0x7c/0xb0 + [] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb] + [] ? usb_unbind_interface+0x62/0x250 + [] ? __pm_runtime_idle+0x44/0x70 + [] ? __device_release_driver+0x78/0x120 + [] ? driver_detach+0x87/0x90 + [] ? bus_remove_driver+0x38/0x90 + [] ? usb_deregister+0x58/0xb0 + [] ? SyS_delete_module+0x130/0x1f0 + [] ? task_work_run+0x64/0x80 + [] ? exit_to_usermode_loop+0x85/0x90 + [] ? do_fast_syscall_32+0x80/0x130 + [] ? sysenter_past_esp+0x40/0x6a + ---[ end trace 6ebc60ef3981792f ]--- + +Such stack traces provide enough information to identify the line inside the +Kernel's source code where the bug happened. Depending on the severity of +the issue, it may also contain the word **Oops**, as on this one:: + + BUG: unable to handle kernel NULL pointer dereference at (null) + IP: [] iret_exc+0x7d0/0xa59 + *pdpt = 000000002258a001 *pde = 0000000000000000 + Oops: 0002 [#1] PREEMPT SMP + ... + +Despite being an **Oops** or some other sort of stack trace, the offended +line is usually required to identify and handle the bug. Along this chapter, +we'll refer to "Oops" for all kinds of stack traces that need to be analized. + +.. note:: + + ``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original + format (from ``dmesg``, etc). Ignore any references in this or other docs to + "decoding the Oops" or "running it through ksymoops". + If you post an Oops from 2.6+ that has been run through ``ksymoops``, + people will just tell you to repost it. + +Where is the Oops message is located? +------------------------------------- + +Normally the Oops text is read from the kernel buffers by klogd and +handed to ``syslogd`` which writes it to a syslog file, typically +``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with +systemd, it may also be stored by the ``journald`` daemon, and accessed +by running ``journalctl`` command. + +Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to +read the data from the kernel buffers and save it. Or you can +``cat /proc/kmsg > file``, however you have to break in to stop the transfer, +``kmsg`` is a "never ending file". + +If the machine has crashed so badly that you cannot enter commands or +the disk is not available then you have three options: + +(1) Hand copy the text from the screen and type it in after the machine + has restarted. Messy but it is the only option if you have not + planned for a crash. Alternatively, you can take a picture of + the screen with a digital camera - not nice, but better than + nothing. If the messages scroll off the top of the console, you + may find that booting with a higher resolution (eg, ``vga=791``) + will allow you to read more of the text. (Caveat: This needs ``vesafb``, + so won't help for 'early' oopses) + +(2) Boot with a serial console (see + :ref:`Documentation/admin-guide/serial-console.rst `), + run a null modem to a second machine and capture the output there + using your favourite communication program. Minicom works well. + +(3) Use Kdump (see Documentation/kdump/kdump.txt), + extract the kernel ring buffer from old memory with using dmesg + gdbmacro in Documentation/kdump/gdbmacros.txt. + +Finding the bug's location +-------------------------- + +Reporting a bug works best if you point the location of the bug at the +Kernel source file. There are two methods for doing that. Usually, using +``gdb`` is easier, but the Kernel should be pre-compiled with debug info. + +gdb +^^^ + +The GNU debug (``gdb``) is the best way to figure out the exact file and line +number of the OOPS from the ``vmlinux`` file. + +The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``. +This can be set by running:: + + $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO + +On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the +EIP value from the OOPS:: + + EIP: 0060:[] Not tainted VLI + +And use GDB to translate that to human-readable form:: + + $ gdb vmlinux + (gdb) l *0xc021e50e + +If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function +offset from the OOPS:: + + EIP is at vt_ioctl+0xda8/0x1482 + +And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled:: + + $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO + $ make vmlinux + $ gdb vmlinux + (gdb) l *vt_ioctl+0xda8 + 0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293). + 288 { + 289 struct vc_data *vc = NULL; + 290 int ret = 0; + 291 + 292 console_lock(); + 293 if (VT_BUSY(vc_num)) + 294 ret = -EBUSY; + 295 else if (vc_num) + 296 vc = vc_deallocate(vc_num); + 297 console_unlock(); + +or, if you want to be more verbose:: + + (gdb) p vt_ioctl + $1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 + (gdb) l *0xae0+0xda8 + +You could, instead, use the object file:: + + $ make drivers/tty/ + $ gdb drivers/tty/vt/vt_ioctl.o + (gdb) l *vt_ioctl+0xda8 + +If you have a call trace, such as:: + + Call Trace: + [] :jbd:log_wait_commit+0xa3/0xf5 + [] autoremove_wake_function+0x0/0x2e + [] :jbd:journal_stop+0x1be/0x1ee + ... + +this shows the problem likely in the :jbd: module. You can load that module +in gdb and list the relevant code:: + + $ gdb fs/jbd/jbd.ko + (gdb) l *log_wait_commit+0xa3 + +.. note:: + + You can also do the same for any function call at the stack trace, + like this one:: + + [] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb] + + The position where the above call happened can be seen with:: + + $ gdb drivers/media/usb/dvb-usb/dvb-usb.o + (gdb) l *dvb_usb_adapter_frontend_exit+0x3a + +objdump +^^^^^^^ + +To debug a kernel, use objdump and look for the hex offset from the crash +output to find the valid line of code/assembler. Without debug symbols, you +will see the assembler code for the routine shown, but if your kernel has +debug symbols the C code will also be available. (Debug symbols can be enabled +in the kernel hacking menu of the menu configuration.) For example:: + + $ objdump -r -S -l --disassemble net/dccp/ipv4.o + +.. note:: + + You need to be at the top level of the kernel tree for this to pick up + your C files. + +If you don't have access to the code you can also debug on some crash dumps +e.g. crash dump output as shown by Dave Miller:: + + EIP is at +0x14/0x4c0 + ... + Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00 + 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08 + <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85 + + Put the bytes into a "foo.s" file like this: + + .text + .globl foo + foo: + .byte .... /* bytes from Code: part of OOPS dump */ + + Compile it with "gcc -c -o foo.o foo.s" then look at the output of + "objdump --disassemble foo.o". + + Output: + + ip_queue_xmit: + push %ebp + push %edi + push %esi + push %ebx + sub $0xbc, %esp + mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb) + mov 0x8(%ebp), %ebx ! %ebx = skb->sk + mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt + +Reporting the bug +----------------- + +Once you find where the bug happened, by inspecting its location, +you could either try to fix it yourself or report it upstream. + +In order to report it upstream, you should identify the mailing list +used for the development of the affected code. This can be done by using +the ``get_maintainer.pl`` script. + +For example, if you find a bug at the gspca's conex.c file, you can get +their maintainers with:: + + $ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c + Hans Verkuil (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%) + Mauro Carvalho Chehab (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%) + Tejun Heo (commit_signer:1/1=100%) + Bhaktipriya Shridhar (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%) + linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER) + linux-kernel@vger.kernel.org (open list) + +Please notice that it will point to: + +- The last developers that touched on the source code. On the above example, + Tejun and Bhaktipriya (in this specific case, none really envolved on the + development of this file); +- The driver maintainer (Hans Verkuil); +- The subsystem maintainer (Mauro Carvalho Chehab) +- The driver and/or subsystem mailing list (linux-media@vger.kernel.org); +- the Linux Kernel mailing list (linux-kernel@vger.kernel.org). + +Usually, the fastest way to have your bug fixed is to report it to mailing +list used for the development of the code (linux-media ML) copying the driver maintainer (Hans). + +If you are totally stumped as to whom to send the report, and +``get_maintainer.pl`` didn't provide you anything useful, send it to +linux-kernel@vger.kernel.org. + +Thanks for your help in making Linux as stable as humanly possible. + +Fixing the bug +-------------- + +If you know programming, you could help us by not only reporting the bug, +but also providing us with a solution. After all open source is about +sharing what you do and don't you want to be recognised for your genius? + +If you decide to take this way, once you have worked out a fix please submit +it upstream. + +Please do read +ref:`Documentation/process/submitting-patches.rst ` though +to help your code get accepted. + + +--------------------------------------------------------------------------- + +Notes on Oops tracing with ``klogd`` +------------------------------------ + +In order to help Linus and the other kernel developers there has been +substantial support incorporated into ``klogd`` for processing protection +faults. In order to have full support for address resolution at least +version 1.3-pl3 of the ``sysklogd`` package should be used. + +When a protection fault occurs the ``klogd`` daemon automatically +translates important addresses in the kernel log messages to their +symbolic equivalents. This translated kernel message is then +forwarded through whatever reporting mechanism ``klogd`` is using. The +protection fault message can be simply cut out of the message files +and forwarded to the kernel developers. + +Two types of address resolution are performed by ``klogd``. The first is +static translation and the second is dynamic translation. Static +translation uses the System.map file in much the same manner that +ksymoops does. In order to do static translation the ``klogd`` daemon +must be able to find a system map file at daemon initialization time. +See the klogd man page for information on how ``klogd`` searches for map +files. + +Dynamic address translation is important when kernel loadable modules +are being used. Since memory for kernel modules is allocated from the +kernel's dynamic memory pools there are no fixed locations for either +the start of the module or for functions and symbols in the module. + +The kernel supports system calls which allow a program to determine +which modules are loaded and their location in memory. Using these +system calls the klogd daemon builds a symbol table which can be used +to debug a protection fault which occurs in a loadable kernel module. + +At the very minimum klogd will provide the name of the module which +generated the protection fault. There may be additional symbolic +information available if the developer of the loadable module chose to +export symbol information from the module. + +Since the kernel module environment can be dynamic there must be a +mechanism for notifying the ``klogd`` daemon when a change in module +environment occurs. There are command line options available which +allow klogd to signal the currently executing daemon that symbol +information should be refreshed. See the ``klogd`` manual page for more +information. + +A patch is included with the sysklogd distribution which modifies the +``modules-2.0.0`` package to automatically signal klogd whenever a module +is loaded or unloaded. Applying this patch provides essentially +seamless support for debugging protection faults which occur with +kernel loadable modules. + +The following is an example of a protection fault in a loadable module +processed by ``klogd``:: + + Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc + Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000 + Aug 29 09:51:01 blizard kernel: *pde = 00000000 + Aug 29 09:51:01 blizard kernel: Oops: 0002 + Aug 29 09:51:01 blizard kernel: CPU: 0 + Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868] + Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212 + Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c + Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c + Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018 + Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000) + Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001 + Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00 + Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036 + Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128] + Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3 + +--------------------------------------------------------------------------- + +:: + + Dr. G.W. Wettstein Oncology Research Div. Computing Facility + Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com + 820 4th St. N. + Fargo, ND 58122 + Phone: 701-234-7556 diff --git a/Documentation/admin-guide/conf.py b/Documentation/admin-guide/conf.py new file mode 100644 index 000000000000..86f738953799 --- /dev/null +++ b/Documentation/admin-guide/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = 'Linux Kernel User Documentation' + +tags.add("subproject") + +latex_documents = [ + ('index', 'linux-user.tex', 'Linux Kernel User Documentation', + 'The kernel development community', 'manual'), +] diff --git a/Documentation/admin-guide/devices.rst b/Documentation/admin-guide/devices.rst new file mode 100644 index 000000000000..7fadc05330dd --- /dev/null +++ b/Documentation/admin-guide/devices.rst @@ -0,0 +1,268 @@ + +Linux allocated devices (4.x+ version) +====================================== + +This list is the Linux Device List, the official registry of allocated +device numbers and ``/dev`` directory nodes for the Linux operating +system. + +The LaTeX version of this document is no longer maintained, nor is +the document that used to reside at lanana.org. This version in the +mainline Linux kernel is the master document. Updates shall be sent +as patches to the kernel maintainers (see the +:ref:`Documentation/process/submitting-patches.rst ` document). +Specifically explore the sections titled "CHAR and MISC DRIVERS", and +"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers +to involve for character and block devices. + +This document is included by reference into the Filesystem Hierarchy +Standard (FHS). The FHS is available from http://www.pathname.com/fhs/. + +Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga +platform only. Allocations marked (68k/Atari) apply to Linux/68k on +the Atari platform only. + +This document is in the public domain. The authors requests, however, +that semantically altered versions are not distributed without +permission of the authors, assuming the authors can be contacted without +an unreasonable effort. + + +.. attention:: + + DEVICE DRIVERS AUTHORS PLEASE READ THIS + + Linux now has extensive support for dynamic allocation of device numbering + and can use ``sysfs`` and ``udev`` (``systemd``) to handle the naming needs. + There are still some exceptions in the serial and boot device area. Before + asking for a device number make sure you actually need one. + + To have a major number allocated, or a minor number in situations + where that applies (e.g. busmice), please submit a patch and send to + the authors as indicated above. + + Keep the description of the device *in the same format + as this list*. The reason for this is that it is the only way we have + found to ensure we have all the requisite information to publish your + device and avoid conflicts. + + Finally, sometimes we have to play "namespace police." Please don't be + offended. We often get submissions for ``/dev`` names that would be bound + to cause conflicts down the road. We are trying to avoid getting in a + situation where we would have to suffer an incompatible forward + change. Therefore, please consult with us **before** you make your + device names and numbers in any way public, at least to the point + where it would be at all difficult to get them changed. + + Your cooperation is appreciated. + +.. include:: devices.txt + :literal: + +Additional ``/dev/`` directory entries +-------------------------------------- + +This section details additional entries that should or may exist in +the /dev directory. It is preferred that symbolic links use the same +form (absolute or relative) as is indicated here. Links are +classified as "hard" or "symbolic" depending on the preferred type of +link; if possible, the indicated type of link should be used. + +Compulsory links +++++++++++++++++ + +These links should exist on all systems: + +=============== =============== =============== =============================== +/dev/fd /proc/self/fd symbolic File descriptors +/dev/stdin fd/0 symbolic stdin file descriptor +/dev/stdout fd/1 symbolic stdout file descriptor +/dev/stderr fd/2 symbolic stderr file descriptor +/dev/nfsd socksys symbolic Required by iBCS-2 +/dev/X0R null symbolic Required by iBCS-2 +=============== =============== =============== =============================== + +Note: ``/dev/X0R`` is --. + +Recommended links ++++++++++++++++++ + +It is recommended that these links exist on all systems: + + +=============== =============== =============== =============================== +/dev/core /proc/kcore symbolic Backward compatibility +/dev/ramdisk ram0 symbolic Backward compatibility +/dev/ftape qft0 symbolic Backward compatibility +/dev/bttv0 video0 symbolic Backward compatibility +/dev/radio radio0 symbolic Backward compatibility +/dev/i2o* /dev/i2o/* symbolic Backward compatibility +/dev/scd? sr? hard Alternate SCSI CD-ROM name +=============== =============== =============== =============================== + +Locally defined links ++++++++++++++++++++++ + +The following links may be established locally to conform to the +configuration of the system. This is merely a tabulation of existing +practice, and does not constitute a recommendation. However, if they +exist, they should have the following uses. + +=============== =============== =============== =============================== +/dev/mouse mouse port symbolic Current mouse device +/dev/tape tape device symbolic Current tape device +/dev/cdrom CD-ROM device symbolic Current CD-ROM device +/dev/cdwriter CD-writer symbolic Current CD-writer device +/dev/scanner scanner symbolic Current scanner device +/dev/modem modem port symbolic Current dialout device +/dev/root root device symbolic Current root filesystem +/dev/swap swap device symbolic Current swap device +=============== =============== =============== =============================== + +``/dev/modem`` should not be used for a modem which supports dialin as +well as dialout, as it tends to cause lock file problems. If it +exists, ``/dev/modem`` should point to the appropriate primary TTY device +(the use of the alternate callout devices is deprecated). + +For SCSI devices, ``/dev/tape`` and ``/dev/cdrom`` should point to the +*cooked* devices (``/dev/st*`` and ``/dev/sr*``, respectively), whereas +``/dev/cdwriter`` and /dev/scanner should point to the appropriate generic +SCSI devices (/dev/sg*). + +``/dev/mouse`` may point to a primary serial TTY device, a hardware mouse +device, or a socket for a mouse driver program (e.g. ``/dev/gpmdata``). + +Sockets and pipes ++++++++++++++++++ + +Non-transient sockets and named pipes may exist in /dev. Common entries are: + +=============== =============== =============================================== +/dev/printer socket lpd local socket +/dev/log socket syslog local socket +/dev/gpmdata socket gpm mouse multiplexer +=============== =============== =============================================== + +Mount points +++++++++++++ + +The following names are reserved for mounting special filesystems +under /dev. These special filesystems provide kernel interfaces that +cannot be provided with standard device nodes. + +=============== =============== =============================================== +/dev/pts devpts PTY slave filesystem +/dev/shm tmpfs POSIX shared memory maintenance access +=============== =============== =============================================== + +Terminal devices +---------------- + +Terminal, or TTY devices are a special class of character devices. A +terminal device is any device that could act as a controlling terminal +for a session; this includes virtual consoles, serial ports, and +pseudoterminals (PTYs). + +All terminal devices share a common set of capabilities known as line +disciplines; these include the common terminal line discipline as well +as SLIP and PPP modes. + +All terminal devices are named similarly; this section explains the +naming and use of the various types of TTYs. Note that the naming +conventions include several historical warts; some of these are +Linux-specific, some were inherited from other systems, and some +reflect Linux outgrowing a borrowed convention. + +A hash mark (``#``) in a device name is used here to indicate a decimal +number without leading zeroes. + +Virtual consoles and the console device ++++++++++++++++++++++++++++++++++++++++ + +Virtual consoles are full-screen terminal displays on the system video +monitor. Virtual consoles are named ``/dev/tty#``, with numbering +starting at ``/dev/tty1``; ``/dev/tty0`` is the current virtual console. +``/dev/tty0`` is the device that should be used to access the system video +card on those architectures for which the frame buffer devices +(``/dev/fb*``) are not applicable. Do not use ``/dev/console`` +for this purpose. + +The console device, ``/dev/console``, is the device to which system +messages should be sent, and on which logins should be permitted in +single-user mode. Starting with Linux 2.1.71, ``/dev/console`` is managed +by the kernel; for previous versions it should be a symbolic link to +either ``/dev/tty0``, a specific virtual console such as ``/dev/tty1``, or to +a serial port primary (``tty*``, not ``cu*``) device, depending on the +configuration of the system. + +Serial ports +++++++++++++ + +Serial ports are RS-232 serial ports and any device which simulates +one, either in hardware (such as internal modems) or in software (such +as the ISDN driver.) Under Linux, each serial ports has two device +names, the primary or callin device and the alternate or callout one. +Each kind of device is indicated by a different letter. For any +letter X, the names of the devices are ``/dev/ttyX#`` and ``/dev/cux#``, +respectively; for historical reasons, ``/dev/ttyS#`` and ``/dev/ttyC#`` +correspond to ``/dev/cua#`` and ``/dev/cub#``. In the future, it should be +expected that multiple letters will be used; all letters will be upper +case for the "tty" device (e.g. ``/dev/ttyDP#``) and lower case for the +"cu" device (e.g. ``/dev/cudp#``). + +The names ``/dev/ttyQ#`` and ``/dev/cuq#`` are reserved for local use. + +The alternate devices provide for kernel-based exclusion and somewhat +different defaults than the primary devices. Their main purpose is to +allow the use of serial ports with programs with no inherent or broken +support for serial ports. Their use is deprecated, and they may be +removed from a future version of Linux. + +Arbitration of serial ports is provided by the use of lock files with +the names ``/var/lock/LCK..ttyX#``. The contents of the lock file should +be the PID of the locking process as an ASCII number. + +It is common practice to install links such as /dev/modem +which point to serial ports. In order to ensure proper locking in the +presence of these links, it is recommended that software chase +symlinks and lock all possible names; additionally, it is recommended +that a lock file be installed with the corresponding alternate +device. In order to avoid deadlocks, it is recommended that the locks +are acquired in the following order, and released in the reverse: + + 1. The symbolic link name, if any (``/var/lock/LCK..modem``) + 2. The "tty" name (``/var/lock/LCK..ttyS2``) + 3. The alternate device name (``/var/lock/LCK..cua2``) + +In the case of nested symbolic links, the lock files should be +installed in the order the symlinks are resolved. + +Under no circumstances should an application hold a lock while waiting +for another to be released. In addition, applications which attempt +to create lock files for the corresponding alternate device names +should take into account the possibility of being used on a non-serial +port TTY, for which no alternate device would exist. + +Pseudoterminals (PTYs) +++++++++++++++++++++++ + +Pseudoterminals, or PTYs, are used to create login sessions or provide +other capabilities requiring a TTY line discipline (including SLIP or +PPP capability) to arbitrary data-generation processes. Each PTY has +a master side, named ``/dev/pty[p-za-e][0-9a-f]``, and a slave side, named +``/dev/tty[p-za-e][0-9a-f]``. The kernel arbitrates the use of PTYs by +allowing each master side to be opened only once. + +Once the master side has been opened, the corresponding slave device +can be used in the same manner as any TTY device. The master and +slave devices are connected by the kernel, generating the equivalent +of a bidirectional pipe with TTY capabilities. + +Recent versions of the Linux kernels and GNU libc contain support for +the System V/Unix98 naming scheme for PTYs, which assigns a common +device, ``/dev/ptmx``, to all the masters (opening it will automatically +give you a previously unassigned PTY) and a subdirectory, ``/dev/pts``, +for the slaves; the slaves are named with decimal integers (``/dev/pts/#`` +in our notation). This removes the problem of exhausting the +namespace and enables the kernel to automatically create the device +nodes for the slaves on demand using the "devpts" filesystem. diff --git a/Documentation/devices.txt b/Documentation/admin-guide/devices.txt similarity index 73% rename from Documentation/devices.txt rename to Documentation/admin-guide/devices.txt index 4035eca87144..c9cea2e39c21 100644 --- a/Documentation/devices.txt +++ b/Documentation/admin-guide/devices.txt @@ -1,63 +1,8 @@ - - LINUX ALLOCATED DEVICES (4.x+ version) - -This list is the Linux Device List, the official registry of allocated -device numbers and /dev directory nodes for the Linux operating -system. - -The LaTeX version of this document is no longer maintained, nor is -the document that used to reside at lanana.org. This version in the -mainline Linux kernel is the master document. Updates shall be sent -as patches to the kernel maintainers (see the SubmittingPatches document). -Specifically explore the sections titled "CHAR and MISC DRIVERS", and -"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers -to involve for character and block devices. - -This document is included by reference into the Filesystem Hierarchy -Standard (FHS). The FHS is available from http://www.pathname.com/fhs/. - -Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga -platform only. Allocations marked (68k/Atari) apply to Linux/68k on -the Atari platform only. - -This document is in the public domain. The authors requests, however, -that semantically altered versions are not distributed without -permission of the authors, assuming the authors can be contacted without -an unreasonable effort. - - - **** DEVICE DRIVERS AUTHORS PLEASE READ THIS **** - -Linux now has extensive support for dynamic allocation of device numbering -and can use sysfs and udev (systemd) to handle the naming needs. There are -still some exceptions in the serial and boot device area. Before asking -for a device number make sure you actually need one. - -To have a major number allocated, or a minor number in situations -where that applies (e.g. busmice), please submit a patch and send to -the authors as indicated above. - -Keep the description of the device *in the same format -as this list*. The reason for this is that it is the only way we have -found to ensure we have all the requisite information to publish your -device and avoid conflicts. - -Finally, sometimes we have to play "namespace police." Please don't be -offended. We often get submissions for /dev names that would be bound -to cause conflicts down the road. We are trying to avoid getting in a -situation where we would have to suffer an incompatible forward -change. Therefore, please consult with us *before* you make your -device names and numbers in any way public, at least to the point -where it would be at all difficult to get them changed. - -Your cooperation is appreciated. - - - 0 Unnamed devices (e.g. non-device mounts) + 0 Unnamed devices (e.g. non-device mounts) 0 = reserved as null device number See block major 144, 145, 146 for expansion areas. - 1 char Memory devices + 1 char Memory devices 1 = /dev/mem Physical memory access 2 = /dev/kmem Kernel virtual memory access 3 = /dev/null Null device @@ -72,7 +17,7 @@ Your cooperation is appreciated. export the buffered printk records. 12 = /dev/oldmem OBSOLETE - replaced by /proc/vmcore - 1 block RAM disk + 1 block RAM disk 0 = /dev/ram0 First RAM disk 1 = /dev/ram1 Second RAM disk ... @@ -83,7 +28,7 @@ Your cooperation is appreciated. by the boot loader; newer kernels use /dev/ram0 for the initrd. - 2 char Pseudo-TTY masters + 2 char Pseudo-TTY masters 0 = /dev/ptyp0 First PTY master 1 = /dev/ptyp1 Second PTY master ... @@ -101,7 +46,7 @@ Your cooperation is appreciated. master multiplex (/dev/ptmx) to acquire a PTY on demand. - 2 block Floppy disks + 2 block Floppy disks 0 = /dev/fd0 Controller 0, drive 0, autodetect 1 = /dev/fd1 Controller 0, drive 1, autodetect 2 = /dev/fd2 Controller 0, drive 2, autodetect @@ -158,7 +103,7 @@ Your cooperation is appreciated. and E for the 3.5" models have been deprecated, since the drive type is insignificant for these devices. - 3 char Pseudo-TTY slaves + 3 char Pseudo-TTY slaves 0 = /dev/ttyp0 First PTY slave 1 = /dev/ttyp1 Second PTY slave ... @@ -167,7 +112,7 @@ Your cooperation is appreciated. These are the old-style (BSD) PTY devices; Unix98 devices are on major 136 and above. - 3 block First MFM, RLL and IDE hard disk/CD-ROM interface + 3 block First MFM, RLL and IDE hard disk/CD-ROM interface 0 = /dev/hda Master: whole disk (or CD-ROM) 64 = /dev/hdb Slave: whole disk (or CD-ROM) @@ -183,7 +128,7 @@ Your cooperation is appreciated. Other versions of Linux use partitioning schemes appropriate to their respective architectures. - 4 char TTY devices + 4 char TTY devices 0 = /dev/tty0 Current virtual console 1 = /dev/tty1 First virtual console @@ -199,13 +144,13 @@ Your cooperation is appreciated. number for BSD PTY devices. As of Linux 2.1.115, this is no longer supported. Use major numbers 2 and 3. - 4 block Aliases for dynamically allocated major devices to be used + 4 block Aliases for dynamically allocated major devices to be used when its not possible to create the real device nodes because the root filesystem is mounted read-only. - 0 = /dev/root + 0 = /dev/root - 5 char Alternate TTY devices + 5 char Alternate TTY devices 0 = /dev/tty Current TTY device 1 = /dev/console System console 2 = /dev/ptmx PTY master multiplex @@ -218,7 +163,7 @@ Your cooperation is appreciated. the section on terminal devices for more information on /dev/console. - 6 char Parallel printer devices + 6 char Parallel printer devices 0 = /dev/lp0 Parallel printer on parport0 1 = /dev/lp1 Parallel printer on parport1 ... @@ -227,7 +172,7 @@ Your cooperation is appreciated. between parallel ports and I/O addresses. Instead, they are redirected through the parport multiplex layer. - 7 char Virtual console capture devices + 7 char Virtual console capture devices 0 = /dev/vcs Current vc text contents 1 = /dev/vcs1 tty1 text contents ... @@ -239,7 +184,7 @@ Your cooperation is appreciated. NOTE: These devices permit both read and write access. - 7 block Loopback devices + 7 block Loopback devices 0 = /dev/loop0 First loop device 1 = /dev/loop1 Second loop device ... @@ -248,7 +193,7 @@ Your cooperation is appreciated. associated with block devices. The binding to the loop devices is handled by mount(8) or losetup(8). - 8 block SCSI disk devices (0-15) + 8 block SCSI disk devices (0-15) 0 = /dev/sda First SCSI disk whole disk 16 = /dev/sdb Second SCSI disk whole disk 32 = /dev/sdc Third SCSI disk whole disk @@ -259,7 +204,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 9 char SCSI tape devices + 9 char SCSI tape devices 0 = /dev/st0 First SCSI tape, mode 0 1 = /dev/st1 Second SCSI tape, mode 0 ... @@ -290,7 +235,7 @@ Your cooperation is appreciated. ioctl()'s can be used to rewind the tape regardless of the device used to access it. - 9 block Metadisk (RAID) devices + 9 block Metadisk (RAID) devices 0 = /dev/md0 First metadisk group 1 = /dev/md1 Second metadisk group ... @@ -298,7 +243,7 @@ Your cooperation is appreciated. The metadisk driver is used to span a filesystem across multiple physical disks. - 10 char Non-serial mice, misc features + 10 char Non-serial mice, misc features 0 = /dev/logibm Logitech bus mouse 1 = /dev/psaux PS/2-style mouse port 2 = /dev/inportbm Microsoft Inport bus mouse @@ -428,22 +373,22 @@ Your cooperation is appreciated. 240-254 Reserved for local use 255 Reserved for MISC_DYNAMIC_MINOR - 11 char Raw keyboard device (Linux/SPARC only) + 11 char Raw keyboard device (Linux/SPARC only) 0 = /dev/kbd Raw keyboard device - 11 char Serial Mux device (Linux/PA-RISC only) + 11 char Serial Mux device (Linux/PA-RISC only) 0 = /dev/ttyB0 First mux port 1 = /dev/ttyB1 Second mux port ... - 11 block SCSI CD-ROM devices + 11 block SCSI CD-ROM devices 0 = /dev/scd0 First SCSI CD-ROM 1 = /dev/scd1 Second SCSI CD-ROM ... The prefix /dev/sr (instead of /dev/scd) has been deprecated. - 12 char QIC-02 tape + 12 char QIC-02 tape 2 = /dev/ntpqic11 QIC-11, no rewind-on-close 3 = /dev/tpqic11 QIC-11, rewind-on-close 4 = /dev/ntpqic24 QIC-24, no rewind-on-close @@ -456,9 +401,9 @@ Your cooperation is appreciated. The device names specified are proposed -- if there are "standard" names for these devices, please let me know. - 12 block + 12 block - 13 char Input core + 13 char Input core 0 = /dev/input/js0 First joystick 1 = /dev/input/js1 Second joystick ... @@ -472,10 +417,10 @@ Your cooperation is appreciated. Each device type has 5 bits (32 minors). - 13 block Previously used for the XT disk (/dev/xdN) + 13 block Previously used for the XT disk (/dev/xdN) Deleted in kernel v3.9. - 14 char Open Sound System (OSS) + 14 char Open Sound System (OSS) 0 = /dev/mixer Mixer control 1 = /dev/sequencer Audio sequencer 2 = /dev/midi00 First MIDI port @@ -493,44 +438,44 @@ Your cooperation is appreciated. 34 = /dev/midi02 Third MIDI port 50 = /dev/midi03 Fourth MIDI port - 14 block + 14 block - 15 char Joystick + 15 char Joystick 0 = /dev/js0 First analog joystick 1 = /dev/js1 Second analog joystick ... 128 = /dev/djs0 First digital joystick 129 = /dev/djs1 Second digital joystick ... - 15 block Sony CDU-31A/CDU-33A CD-ROM + 15 block Sony CDU-31A/CDU-33A CD-ROM 0 = /dev/sonycd Sony CDU-31a CD-ROM - 16 char Non-SCSI scanners + 16 char Non-SCSI scanners 0 = /dev/gs4500 Genius 4500 handheld scanner - 16 block GoldStar CD-ROM + 16 block GoldStar CD-ROM 0 = /dev/gscd GoldStar CD-ROM - 17 char OBSOLETE (was Chase serial card) + 17 char OBSOLETE (was Chase serial card) 0 = /dev/ttyH0 First Chase port 1 = /dev/ttyH1 Second Chase port ... - 17 block Optics Storage CD-ROM + 17 block Optics Storage CD-ROM 0 = /dev/optcd Optics Storage CD-ROM - 18 char OBSOLETE (was Chase serial card - alternate devices) + 18 char OBSOLETE (was Chase serial card - alternate devices) 0 = /dev/cuh0 Callout device for ttyH0 1 = /dev/cuh1 Callout device for ttyH1 ... - 18 block Sanyo CD-ROM + 18 block Sanyo CD-ROM 0 = /dev/sjcd Sanyo CD-ROM - 19 char Cyclades serial card + 19 char Cyclades serial card 0 = /dev/ttyC0 First Cyclades port ... 31 = /dev/ttyC31 32nd Cyclades port - 19 block "Double" compressed disk + 19 block "Double" compressed disk 0 = /dev/double0 First compressed disk ... 7 = /dev/double7 Eighth compressed disk @@ -541,15 +486,15 @@ Your cooperation is appreciated. See the Double documentation for the meaning of the mirror devices. - 20 char Cyclades serial card - alternate devices + 20 char Cyclades serial card - alternate devices 0 = /dev/cub0 Callout device for ttyC0 ... 31 = /dev/cub31 Callout device for ttyC31 - 20 block Hitachi CD-ROM (under development) + 20 block Hitachi CD-ROM (under development) 0 = /dev/hitcd Hitachi CD-ROM - 21 char Generic SCSI access + 21 char Generic SCSI access 0 = /dev/sg0 First generic SCSI device 1 = /dev/sg1 Second generic SCSI device ... @@ -559,7 +504,7 @@ Your cooperation is appreciated. the system and is counter to standard Linux device-naming practice. - 21 block Acorn MFM hard drive interface + 21 block Acorn MFM hard drive interface 0 = /dev/mfma First MFM drive whole disk 64 = /dev/mfmb Second MFM drive whole disk @@ -567,25 +512,25 @@ Your cooperation is appreciated. Partitions are handled the same way as for IDE disks (see major number 3). - 22 char Digiboard serial card + 22 char Digiboard serial card 0 = /dev/ttyD0 First Digiboard port 1 = /dev/ttyD1 Second Digiboard port ... - 22 block Second IDE hard disk/CD-ROM interface + 22 block Second IDE hard disk/CD-ROM interface 0 = /dev/hdc Master: whole disk (or CD-ROM) 64 = /dev/hdd Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 23 char Digiboard serial card - alternate devices + 23 char Digiboard serial card - alternate devices 0 = /dev/cud0 Callout device for ttyD0 1 = /dev/cud1 Callout device for ttyD1 ... - 23 block Mitsumi proprietary CD-ROM + 23 block Mitsumi proprietary CD-ROM 0 = /dev/mcd Mitsumi CD-ROM - 24 char Stallion serial card + 24 char Stallion serial card 0 = /dev/ttyE0 Stallion port 0 card 0 1 = /dev/ttyE1 Stallion port 1 card 0 ... @@ -598,10 +543,10 @@ Your cooperation is appreciated. 192 = /dev/ttyE192 Stallion port 0 card 3 193 = /dev/ttyE193 Stallion port 1 card 3 ... - 24 block Sony CDU-535 CD-ROM + 24 block Sony CDU-535 CD-ROM 0 = /dev/cdu535 Sony CDU-535 CD-ROM - 25 char Stallion serial card - alternate devices + 25 char Stallion serial card - alternate devices 0 = /dev/cue0 Callout device for ttyE0 1 = /dev/cue1 Callout device for ttyE1 ... @@ -614,21 +559,21 @@ Your cooperation is appreciated. 192 = /dev/cue192 Callout device for ttyE192 193 = /dev/cue193 Callout device for ttyE193 ... - 25 block First Matsushita (Panasonic/SoundBlaster) CD-ROM + 25 block First Matsushita (Panasonic/SoundBlaster) CD-ROM 0 = /dev/sbpcd0 Panasonic CD-ROM controller 0 unit 0 1 = /dev/sbpcd1 Panasonic CD-ROM controller 0 unit 1 2 = /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2 3 = /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3 - 26 char + 26 char - 26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM + 26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM 0 = /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0 1 = /dev/sbpcd5 Panasonic CD-ROM controller 1 unit 1 2 = /dev/sbpcd6 Panasonic CD-ROM controller 1 unit 2 3 = /dev/sbpcd7 Panasonic CD-ROM controller 1 unit 3 - 27 char QIC-117 tape + 27 char QIC-117 tape 0 = /dev/qft0 Unit 0, rewind-on-close 1 = /dev/qft1 Unit 1, rewind-on-close 2 = /dev/qft2 Unit 2, rewind-on-close @@ -654,29 +599,29 @@ Your cooperation is appreciated. 38 = /dev/nrawqft2 Unit 2, no rewind-on-close, no file marks 39 = /dev/nrawqft3 Unit 3, no rewind-on-close, no file marks - 27 block Third Matsushita (Panasonic/SoundBlaster) CD-ROM + 27 block Third Matsushita (Panasonic/SoundBlaster) CD-ROM 0 = /dev/sbpcd8 Panasonic CD-ROM controller 2 unit 0 1 = /dev/sbpcd9 Panasonic CD-ROM controller 2 unit 1 2 = /dev/sbpcd10 Panasonic CD-ROM controller 2 unit 2 3 = /dev/sbpcd11 Panasonic CD-ROM controller 2 unit 3 - 28 char Stallion serial card - card programming + 28 char Stallion serial card - card programming 0 = /dev/staliomem0 First Stallion card I/O memory 1 = /dev/staliomem1 Second Stallion card I/O memory 2 = /dev/staliomem2 Third Stallion card I/O memory 3 = /dev/staliomem3 Fourth Stallion card I/O memory - 28 char Atari SLM ACSI laser printer (68k/Atari) + 28 char Atari SLM ACSI laser printer (68k/Atari) 0 = /dev/slm0 First SLM laser printer 1 = /dev/slm1 Second SLM laser printer ... - 28 block Fourth Matsushita (Panasonic/SoundBlaster) CD-ROM + 28 block Fourth Matsushita (Panasonic/SoundBlaster) CD-ROM 0 = /dev/sbpcd12 Panasonic CD-ROM controller 3 unit 0 1 = /dev/sbpcd13 Panasonic CD-ROM controller 3 unit 1 2 = /dev/sbpcd14 Panasonic CD-ROM controller 3 unit 2 3 = /dev/sbpcd15 Panasonic CD-ROM controller 3 unit 3 - 28 block ACSI disk (68k/Atari) + 28 block ACSI disk (68k/Atari) 0 = /dev/ada First ACSI disk whole disk 16 = /dev/adb Second ACSI disk whole disk 32 = /dev/adc Third ACSI disk whole disk @@ -687,16 +632,16 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15, like SCSI. - 29 char Universal frame buffer + 29 char Universal frame buffer 0 = /dev/fb0 First frame buffer 1 = /dev/fb1 Second frame buffer ... 31 = /dev/fb31 32nd frame buffer - 29 block Aztech/Orchid/Okano/Wearnes CD-ROM + 29 block Aztech/Orchid/Okano/Wearnes CD-ROM 0 = /dev/aztcd Aztech CD-ROM - 30 char iBCS-2 compatibility devices + 30 char iBCS-2 compatibility devices 0 = /dev/socksys Socket access 1 = /dev/spx SVR3 local X interface 32 = /dev/inet/ip Network access @@ -727,17 +672,17 @@ Your cooperation is appreciated. /dev/nfsd -> /dev/socksys /dev/X0R -> /dev/null (? apparently not required ?) - 30 block Philips LMS CM-205 CD-ROM + 30 block Philips LMS CM-205 CD-ROM 0 = /dev/cm205cd Philips LMS CM-205 CD-ROM /dev/lmscd is an older name for this device. This driver does not work with the CM-205MS CD-ROM. - 31 char MPU-401 MIDI + 31 char MPU-401 MIDI 0 = /dev/mpu401data MPU-401 data port 1 = /dev/mpu401stat MPU-401 status port - 31 block ROM/flash memory card + 31 block ROM/flash memory card 0 = /dev/rom0 First ROM card (rw) ... 7 = /dev/rom7 Eighth ROM card (rw) @@ -756,25 +701,25 @@ Your cooperation is appreciated. devices. The read-only devices (ro) support reading only. - 32 char Specialix serial card + 32 char Specialix serial card 0 = /dev/ttyX0 First Specialix port 1 = /dev/ttyX1 Second Specialix port ... - 32 block Philips LMS CM-206 CD-ROM + 32 block Philips LMS CM-206 CD-ROM 0 = /dev/cm206cd Philips LMS CM-206 CD-ROM - 33 char Specialix serial card - alternate devices + 33 char Specialix serial card - alternate devices 0 = /dev/cux0 Callout device for ttyX0 1 = /dev/cux1 Callout device for ttyX1 ... - 33 block Third IDE hard disk/CD-ROM interface + 33 block Third IDE hard disk/CD-ROM interface 0 = /dev/hde Master: whole disk (or CD-ROM) 64 = /dev/hdf Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 34 char Z8530 HDLC driver + 34 char Z8530 HDLC driver 0 = /dev/scc0 First Z8530, first port 1 = /dev/scc1 First Z8530, second port 2 = /dev/scc2 Second Z8530, first port @@ -785,14 +730,14 @@ Your cooperation is appreciated. /dev/sc1 for /dev/scc0, /dev/sc2 for /dev/scc1, and so on. - 34 block Fourth IDE hard disk/CD-ROM interface + 34 block Fourth IDE hard disk/CD-ROM interface 0 = /dev/hdg Master: whole disk (or CD-ROM) 64 = /dev/hdh Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 35 char tclmidi MIDI driver + 35 char tclmidi MIDI driver 0 = /dev/midi0 First MIDI port, kernel timed 1 = /dev/midi1 Second MIDI port, kernel timed 2 = /dev/midi2 Third MIDI port, kernel timed @@ -806,10 +751,10 @@ Your cooperation is appreciated. 130 = /dev/smpte2 Third MIDI port, SMPTE timed 131 = /dev/smpte3 Fourth MIDI port, SMPTE timed - 35 block Slow memory ramdisk + 35 block Slow memory ramdisk 0 = /dev/slram Slow memory ramdisk - 36 char Netlink support + 36 char Netlink support 0 = /dev/route Routing, device updates, kernel to user 1 = /dev/skip enSKIP security cache control 3 = /dev/fwmonitor Firewall packet copies @@ -817,9 +762,9 @@ Your cooperation is appreciated. ... 31 = /dev/tap15 16th Ethertap device - 36 block OBSOLETE (was MCA ESDI hard disk) + 36 block OBSOLETE (was MCA ESDI hard disk) - 37 char IDE tape + 37 char IDE tape 0 = /dev/ht0 First IDE tape 1 = /dev/ht1 Second IDE tape ... @@ -829,10 +774,10 @@ Your cooperation is appreciated. Currently, only one IDE tape drive is supported. - 37 block Zorro II ramdisk + 37 block Zorro II ramdisk 0 = /dev/z2ram Zorro II ramdisk - 38 char Myricom PCI Myrinet board + 38 char Myricom PCI Myrinet board 0 = /dev/mlanai0 First Myrinet board 1 = /dev/mlanai1 Second Myrinet board ... @@ -841,9 +786,9 @@ Your cooperation is appreciated. and "user level packet I/O." This board is also accessible as a standard networking "eth" device. - 38 block OBSOLETE (was Linux/AP+) + 38 block OBSOLETE (was Linux/AP+) - 39 char ML-16P experimental I/O board + 39 char ML-16P experimental I/O board 0 = /dev/ml16pa-a0 First card, first analog channel 1 = /dev/ml16pa-a1 First card, second analog channel ... @@ -861,20 +806,20 @@ Your cooperation is appreciated. 50 = /dev/ml16pb-c1 Second card, second counter/timer 51 = /dev/ml16pb-c2 Second card, third counter/timer ... - 39 block + 39 block - 40 char + 40 char - 40 block + 40 block - 41 char Yet Another Micro Monitor + 41 char Yet Another Micro Monitor 0 = /dev/yamm Yet Another Micro Monitor - 41 block + 41 block - 42 char Demo/sample use + 42 char Demo/sample use - 42 block Demo/sample use + 42 block Demo/sample use This number is intended for use in sample code, as well as a general "example" device number. It @@ -887,12 +832,12 @@ Your cooperation is appreciated. IN PARTICULAR, ANY DISTRIBUTION WHICH CONTAINS A DEVICE DRIVER USING MAJOR NUMBER 42 IS NONCOMPLIANT. - 43 char isdn4linux virtual modem + 43 char isdn4linux virtual modem 0 = /dev/ttyI0 First virtual modem ... 63 = /dev/ttyI63 64th virtual modem - 43 block Network block devices + 43 block Network block devices 0 = /dev/nb0 First network block device 1 = /dev/nb1 Second network block device ... @@ -904,12 +849,12 @@ Your cooperation is appreciated. to mounting filesystems over the net, swapping over the net, implementing block device in userland etc. - 44 char isdn4linux virtual modem - alternate devices + 44 char isdn4linux virtual modem - alternate devices 0 = /dev/cui0 Callout device for ttyI0 ... 63 = /dev/cui63 Callout device for ttyI63 - 44 block Flash Translation Layer (FTL) filesystems + 44 block Flash Translation Layer (FTL) filesystems 0 = /dev/ftla FTL on first Memory Technology Device 16 = /dev/ftlb FTL on second Memory Technology Device 32 = /dev/ftlc FTL on third Memory Technology Device @@ -920,7 +865,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the partition limit is 15 rather than 63 per disk (same as SCSI.) - 45 char isdn4linux ISDN BRI driver + 45 char isdn4linux ISDN BRI driver 0 = /dev/isdn0 First virtual B channel raw data ... 63 = /dev/isdn63 64th virtual B channel raw data @@ -934,7 +879,7 @@ Your cooperation is appreciated. 255 = /dev/isdninfo ISDN monitor interface - 45 block Parallel port IDE disk devices + 45 block Parallel port IDE disk devices 0 = /dev/pda First parallel port IDE disk 16 = /dev/pdb Second parallel port IDE disk 32 = /dev/pdc Third parallel port IDE disk @@ -944,21 +889,21 @@ Your cooperation is appreciated. disks (see major number 3) except that the partition limit is 15 rather than 63 per disk. - 46 char Comtrol Rocketport serial card + 46 char Comtrol Rocketport serial card 0 = /dev/ttyR0 First Rocketport port 1 = /dev/ttyR1 Second Rocketport port ... - 46 block Parallel port ATAPI CD-ROM devices + 46 block Parallel port ATAPI CD-ROM devices 0 = /dev/pcd0 First parallel port ATAPI CD-ROM 1 = /dev/pcd1 Second parallel port ATAPI CD-ROM 2 = /dev/pcd2 Third parallel port ATAPI CD-ROM 3 = /dev/pcd3 Fourth parallel port ATAPI CD-ROM - 47 char Comtrol Rocketport serial card - alternate devices + 47 char Comtrol Rocketport serial card - alternate devices 0 = /dev/cur0 Callout device for ttyR0 1 = /dev/cur1 Callout device for ttyR1 ... - 47 block Parallel port ATAPI disk devices + 47 block Parallel port ATAPI disk devices 0 = /dev/pf0 First parallel port ATAPI disk 1 = /dev/pf1 Second parallel port ATAPI disk 2 = /dev/pf2 Third parallel port ATAPI disk @@ -967,11 +912,11 @@ Your cooperation is appreciated. This driver is intended for floppy disks and similar devices and hence does not support partitioning. - 48 char SDL RISCom serial card + 48 char SDL RISCom serial card 0 = /dev/ttyL0 First RISCom port 1 = /dev/ttyL1 Second RISCom port ... - 48 block Mylex DAC960 PCI RAID controller; first controller + 48 block Mylex DAC960 PCI RAID controller; first controller 0 = /dev/rd/c0d0 First disk, whole disk 8 = /dev/rd/c0d1 Second disk, whole disk ... @@ -983,11 +928,11 @@ Your cooperation is appreciated. ... 7 = /dev/rd/c?d?p7 Seventh partition - 49 char SDL RISCom serial card - alternate devices + 49 char SDL RISCom serial card - alternate devices 0 = /dev/cul0 Callout device for ttyL0 1 = /dev/cul1 Callout device for ttyL1 ... - 49 block Mylex DAC960 PCI RAID controller; second controller + 49 block Mylex DAC960 PCI RAID controller; second controller 0 = /dev/rd/c1d0 First disk, whole disk 8 = /dev/rd/c1d1 Second disk, whole disk ... @@ -995,19 +940,19 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 50 char Reserved for GLINT + 50 char Reserved for GLINT - 50 block Mylex DAC960 PCI RAID controller; third controller + 50 block Mylex DAC960 PCI RAID controller; third controller 0 = /dev/rd/c2d0 First disk, whole disk 8 = /dev/rd/c2d1 Second disk, whole disk ... 248 = /dev/rd/c2d31 32nd disk, whole disk - 51 char Baycom radio modem OR Radio Tech BIM-XXX-RS232 radio modem + 51 char Baycom radio modem OR Radio Tech BIM-XXX-RS232 radio modem 0 = /dev/bc0 First Baycom radio modem 1 = /dev/bc1 Second Baycom radio modem ... - 51 block Mylex DAC960 PCI RAID controller; fourth controller + 51 block Mylex DAC960 PCI RAID controller; fourth controller 0 = /dev/rd/c3d0 First disk, whole disk 8 = /dev/rd/c3d1 Second disk, whole disk ... @@ -1015,13 +960,13 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 52 char Spellcaster DataComm/BRI ISDN card + 52 char Spellcaster DataComm/BRI ISDN card 0 = /dev/dcbri0 First DataComm card 1 = /dev/dcbri1 Second DataComm card 2 = /dev/dcbri2 Third DataComm card 3 = /dev/dcbri3 Fourth DataComm card - 52 block Mylex DAC960 PCI RAID controller; fifth controller + 52 block Mylex DAC960 PCI RAID controller; fifth controller 0 = /dev/rd/c4d0 First disk, whole disk 8 = /dev/rd/c4d1 Second disk, whole disk ... @@ -1029,7 +974,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 53 char BDM interface for remote debugging MC683xx microcontrollers + 53 char BDM interface for remote debugging MC683xx microcontrollers 0 = /dev/pd_bdm0 PD BDM interface on lp0 1 = /dev/pd_bdm1 PD BDM interface on lp1 2 = /dev/pd_bdm2 PD BDM interface on lp2 @@ -1043,7 +988,7 @@ Your cooperation is appreciated. Domain Interface and ICD is the commercial interface by P&E. - 53 block Mylex DAC960 PCI RAID controller; sixth controller + 53 block Mylex DAC960 PCI RAID controller; sixth controller 0 = /dev/rd/c5d0 First disk, whole disk 8 = /dev/rd/c5d1 Second disk, whole disk ... @@ -1051,7 +996,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 54 char Electrocardiognosis Holter serial card + 54 char Electrocardiognosis Holter serial card 0 = /dev/holter0 First Holter port 1 = /dev/holter1 Second Holter port 2 = /dev/holter2 Third Holter port @@ -1060,7 +1005,7 @@ Your cooperation is appreciated. to transfer data from Holter 24-hour heart monitoring equipment. - 54 block Mylex DAC960 PCI RAID controller; seventh controller + 54 block Mylex DAC960 PCI RAID controller; seventh controller 0 = /dev/rd/c6d0 First disk, whole disk 8 = /dev/rd/c6d1 Second disk, whole disk ... @@ -1068,10 +1013,10 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 55 char DSP56001 digital signal processor + 55 char DSP56001 digital signal processor 0 = /dev/dsp56k First DSP56001 - 55 block Mylex DAC960 PCI RAID controller; eighth controller + 55 block Mylex DAC960 PCI RAID controller; eighth controller 0 = /dev/rd/c7d0 First disk, whole disk 8 = /dev/rd/c7d1 Second disk, whole disk ... @@ -1079,42 +1024,42 @@ Your cooperation is appreciated. Partitions are handled as for major 48. - 56 char Apple Desktop Bus + 56 char Apple Desktop Bus 0 = /dev/adb ADB bus control Additional devices will be added to this number, all starting with /dev/adb. - 56 block Fifth IDE hard disk/CD-ROM interface + 56 block Fifth IDE hard disk/CD-ROM interface 0 = /dev/hdi Master: whole disk (or CD-ROM) 64 = /dev/hdj Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 57 char Hayes ESP serial card + 57 char Hayes ESP serial card 0 = /dev/ttyP0 First ESP port 1 = /dev/ttyP1 Second ESP port ... - 57 block Sixth IDE hard disk/CD-ROM interface + 57 block Sixth IDE hard disk/CD-ROM interface 0 = /dev/hdk Master: whole disk (or CD-ROM) 64 = /dev/hdl Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 58 char Hayes ESP serial card - alternate devices + 58 char Hayes ESP serial card - alternate devices 0 = /dev/cup0 Callout device for ttyP0 1 = /dev/cup1 Callout device for ttyP1 ... - 58 block Reserved for logical volume manager + 58 block Reserved for logical volume manager - 59 char sf firewall package + 59 char sf firewall package 0 = /dev/firewall Communication with sf kernel module - 59 block Generic PDA filesystem device + 59 block Generic PDA filesystem device 0 = /dev/pda0 First PDA device 1 = /dev/pda1 Second PDA device ... @@ -1127,17 +1072,17 @@ Your cooperation is appreciated. NAMING CONFLICT -- PROPOSED REVISED NAME /dev/rpda0 etc - 60-63 char LOCAL/EXPERIMENTAL USE + 60-63 char LOCAL/EXPERIMENTAL USE - 60-63 block LOCAL/EXPERIMENTAL USE + 60-63 block LOCAL/EXPERIMENTAL USE Allocated for local/experimental use. For devices not assigned official numbers, these ranges should be used in order to avoid conflicting with future assignments. - 64 char ENskip kernel encryption package + 64 char ENskip kernel encryption package 0 = /dev/enskip Communication with ENskip kernel module - 64 block Scramdisk/DriveCrypt encrypted devices + 64 block Scramdisk/DriveCrypt encrypted devices 0 = /dev/scramdisk/master Master node for ioctls 1 = /dev/scramdisk/1 First encrypted device 2 = /dev/scramdisk/2 Second encrypted device @@ -1152,7 +1097,7 @@ Your cooperation is appreciated. Requested by: andy@scramdisklinux.org - 65 char Sundance "plink" Transputer boards (obsolete, unused) + 65 char Sundance "plink" Transputer boards (obsolete, unused) 0 = /dev/plink0 First plink device 1 = /dev/plink1 Second plink device 2 = /dev/plink2 Third plink device @@ -1173,7 +1118,7 @@ Your cooperation is appreciated. This is a commercial driver; contact James Howes for information. - 65 block SCSI disk devices (16-31) + 65 block SCSI disk devices (16-31) 0 = /dev/sdq 17th SCSI disk whole disk 16 = /dev/sdr 18th SCSI disk whole disk 32 = /dev/sds 19th SCSI disk whole disk @@ -1184,12 +1129,12 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 66 char YARC PowerPC PCI coprocessor card + 66 char YARC PowerPC PCI coprocessor card 0 = /dev/yppcpci0 First YARC card 1 = /dev/yppcpci1 Second YARC card ... - 66 block SCSI disk devices (32-47) + 66 block SCSI disk devices (32-47) 0 = /dev/sdag 33th SCSI disk whole disk 16 = /dev/sdah 34th SCSI disk whole disk 32 = /dev/sdai 35th SCSI disk whole disk @@ -1200,12 +1145,12 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 67 char Coda network file system + 67 char Coda network file system 0 = /dev/cfs0 Coda cache manager See http://www.coda.cs.cmu.edu for information about Coda. - 67 block SCSI disk devices (48-63) + 67 block SCSI disk devices (48-63) 0 = /dev/sdaw 49th SCSI disk whole disk 16 = /dev/sdax 50th SCSI disk whole disk 32 = /dev/sday 51st SCSI disk whole disk @@ -1216,7 +1161,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 68 char CAPI 2.0 interface + 68 char CAPI 2.0 interface 0 = /dev/capi20 Control device 1 = /dev/capi20.00 First CAPI 2.0 application 2 = /dev/capi20.01 Second CAPI 2.0 application @@ -1226,7 +1171,7 @@ Your cooperation is appreciated. ISDN CAPI 2.0 driver for use with CAPI 2.0 applications; currently supports the AVM B1 card. - 68 block SCSI disk devices (64-79) + 68 block SCSI disk devices (64-79) 0 = /dev/sdbm 65th SCSI disk whole disk 16 = /dev/sdbn 66th SCSI disk whole disk 32 = /dev/sdbo 67th SCSI disk whole disk @@ -1237,10 +1182,10 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 69 char MA16 numeric accelerator card + 69 char MA16 numeric accelerator card 0 = /dev/ma16 Board memory access - 69 block SCSI disk devices (80-95) + 69 block SCSI disk devices (80-95) 0 = /dev/sdcc 81st SCSI disk whole disk 16 = /dev/sdcd 82nd SCSI disk whole disk 32 = /dev/sdce 83th SCSI disk whole disk @@ -1251,7 +1196,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 70 char SpellCaster Protocol Services Interface + 70 char SpellCaster Protocol Services Interface 0 = /dev/apscfg Configuration interface 1 = /dev/apsauth Authentication interface 2 = /dev/apslog Logging interface @@ -1260,7 +1205,7 @@ Your cooperation is appreciated. 65 = /dev/apsasync Async command interface 128 = /dev/apsmon Monitor interface - 70 block SCSI disk devices (96-111) + 70 block SCSI disk devices (96-111) 0 = /dev/sdcs 97th SCSI disk whole disk 16 = /dev/sdct 98th SCSI disk whole disk 32 = /dev/sdcu 99th SCSI disk whole disk @@ -1271,7 +1216,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 71 char Computone IntelliPort II serial card + 71 char Computone IntelliPort II serial card 0 = /dev/ttyF0 IntelliPort II board 0, port 0 1 = /dev/ttyF1 IntelliPort II board 0, port 1 ... @@ -1289,7 +1234,7 @@ Your cooperation is appreciated. ... 255 = /dev/ttyF255 IntelliPort II board 3, port 63 - 71 block SCSI disk devices (112-127) + 71 block SCSI disk devices (112-127) 0 = /dev/sddi 113th SCSI disk whole disk 16 = /dev/sddj 114th SCSI disk whole disk 32 = /dev/sddk 115th SCSI disk whole disk @@ -1300,7 +1245,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 72 char Computone IntelliPort II serial card - alternate devices + 72 char Computone IntelliPort II serial card - alternate devices 0 = /dev/cuf0 Callout device for ttyF0 1 = /dev/cuf1 Callout device for ttyF1 ... @@ -1318,7 +1263,7 @@ Your cooperation is appreciated. ... 255 = /dev/cuf255 Callout device for ttyF255 - 72 block Compaq Intelligent Drive Array, first controller + 72 block Compaq Intelligent Drive Array, first controller 0 = /dev/ida/c0d0 First logical drive whole disk 16 = /dev/ida/c0d1 Second logical drive whole disk ... @@ -1328,7 +1273,7 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 73 char Computone IntelliPort II serial card - control devices + 73 char Computone IntelliPort II serial card - control devices 0 = /dev/ip2ipl0 Loadware device for board 0 1 = /dev/ip2stat0 Status device for board 0 4 = /dev/ip2ipl1 Loadware device for board 1 @@ -1338,7 +1283,7 @@ Your cooperation is appreciated. 12 = /dev/ip2ipl3 Loadware device for board 3 13 = /dev/ip2stat3 Status device for board 3 - 73 block Compaq Intelligent Drive Array, second controller + 73 block Compaq Intelligent Drive Array, second controller 0 = /dev/ida/c1d0 First logical drive whole disk 16 = /dev/ida/c1d1 Second logical drive whole disk ... @@ -1348,7 +1293,7 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 74 char SCI bridge + 74 char SCI bridge 0 = /dev/SCI/0 SCI device 0 1 = /dev/SCI/1 SCI device 1 ... @@ -1356,7 +1301,7 @@ Your cooperation is appreciated. Currently for Dolphin Interconnect Solutions' PCI-SCI bridge. - 74 block Compaq Intelligent Drive Array, third controller + 74 block Compaq Intelligent Drive Array, third controller 0 = /dev/ida/c2d0 First logical drive whole disk 16 = /dev/ida/c2d1 Second logical drive whole disk ... @@ -1366,14 +1311,14 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 75 char Specialix IO8+ serial card + 75 char Specialix IO8+ serial card 0 = /dev/ttyW0 First IO8+ port, first card 1 = /dev/ttyW1 Second IO8+ port, first card ... 8 = /dev/ttyW8 First IO8+ port, second card ... - 75 block Compaq Intelligent Drive Array, fourth controller + 75 block Compaq Intelligent Drive Array, fourth controller 0 = /dev/ida/c3d0 First logical drive whole disk 16 = /dev/ida/c3d1 Second logical drive whole disk ... @@ -1383,14 +1328,14 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 76 char Specialix IO8+ serial card - alternate devices + 76 char Specialix IO8+ serial card - alternate devices 0 = /dev/cuw0 Callout device for ttyW0 1 = /dev/cuw1 Callout device for ttyW1 ... 8 = /dev/cuw8 Callout device for ttyW8 ... - 76 block Compaq Intelligent Drive Array, fifth controller + 76 block Compaq Intelligent Drive Array, fifth controller 0 = /dev/ida/c4d0 First logical drive whole disk 16 = /dev/ida/c4d1 Second logical drive whole disk ... @@ -1401,10 +1346,10 @@ Your cooperation is appreciated. partitions is 15. - 77 char ComScire Quantum Noise Generator + 77 char ComScire Quantum Noise Generator 0 = /dev/qng ComScire Quantum Noise Generator - 77 block Compaq Intelligent Drive Array, sixth controller + 77 block Compaq Intelligent Drive Array, sixth controller 0 = /dev/ida/c5d0 First logical drive whole disk 16 = /dev/ida/c5d1 Second logical drive whole disk ... @@ -1414,12 +1359,12 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 78 char PAM Software's multimodem boards + 78 char PAM Software's multimodem boards 0 = /dev/ttyM0 First PAM modem 1 = /dev/ttyM1 Second PAM modem ... - 78 block Compaq Intelligent Drive Array, seventh controller + 78 block Compaq Intelligent Drive Array, seventh controller 0 = /dev/ida/c6d0 First logical drive whole disk 16 = /dev/ida/c6d1 Second logical drive whole disk ... @@ -1429,12 +1374,12 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 79 char PAM Software's multimodem boards - alternate devices + 79 char PAM Software's multimodem boards - alternate devices 0 = /dev/cum0 Callout device for ttyM0 1 = /dev/cum1 Callout device for ttyM1 ... - 79 block Compaq Intelligent Drive Array, eighth controller + 79 block Compaq Intelligent Drive Array, eighth controller 0 = /dev/ida/c7d0 First logical drive whole disk 16 = /dev/ida/c7d1 Second logical drive whole disk ... @@ -1444,10 +1389,10 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. - 80 char Photometrics AT200 CCD camera + 80 char Photometrics AT200 CCD camera 0 = /dev/at200 Photometrics AT200 CCD camera - 80 block I2O hard disk + 80 block I2O hard disk 0 = /dev/i2o/hda First I2O hard disk, whole disk 16 = /dev/i2o/hdb Second I2O hard disk, whole disk ... @@ -1457,7 +1402,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 81 char video4linux + 81 char video4linux 0 = /dev/video0 Video capture/overlay device ... 63 = /dev/video63 Video capture/overlay device @@ -1475,7 +1420,7 @@ Your cooperation is appreciated. CONFIG_VIDEO_FIXED_MINOR_RANGES (default n) configuration option is set. - 81 block I2O hard disk + 81 block I2O hard disk 0 = /dev/i2o/hdq 17th I2O hard disk, whole disk 16 = /dev/i2o/hdr 18th I2O hard disk, whole disk ... @@ -1485,7 +1430,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 82 char WiNRADiO communications receiver card + 82 char WiNRADiO communications receiver card 0 = /dev/winradio0 First WiNRADiO card 1 = /dev/winradio1 Second WiNRADiO card ... @@ -1493,7 +1438,7 @@ Your cooperation is appreciated. The driver and documentation may be obtained from http://www.winradio.com/ - 82 block I2O hard disk + 82 block I2O hard disk 0 = /dev/i2o/hdag 33rd I2O hard disk, whole disk 16 = /dev/i2o/hdah 34th I2O hard disk, whole disk ... @@ -1503,14 +1448,14 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 83 char Matrox mga_vid video driver - 0 = /dev/mga_vid0 1st video card + 83 char Matrox mga_vid video driver + 0 = /dev/mga_vid0 1st video card 1 = /dev/mga_vid1 2nd video card 2 = /dev/mga_vid2 3rd video card ... - 15 = /dev/mga_vid15 16th video card + 15 = /dev/mga_vid15 16th video card - 83 block I2O hard disk + 83 block I2O hard disk 0 = /dev/i2o/hdaw 49th I2O hard disk, whole disk 16 = /dev/i2o/hdax 50th I2O hard disk, whole disk ... @@ -1520,11 +1465,11 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 84 char Ikon 1011[57] Versatec Greensheet Interface + 84 char Ikon 1011[57] Versatec Greensheet Interface 0 = /dev/ihcp0 First Greensheet port 1 = /dev/ihcp1 Second Greensheet port - 84 block I2O hard disk + 84 block I2O hard disk 0 = /dev/i2o/hdbm 65th I2O hard disk, whole disk 16 = /dev/i2o/hdbn 66th I2O hard disk, whole disk ... @@ -1534,13 +1479,13 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 85 char Linux/SGI shared memory input queue + 85 char Linux/SGI shared memory input queue 0 = /dev/shmiq Master shared input queue 1 = /dev/qcntl0 First device pushed 2 = /dev/qcntl1 Second device pushed ... - 85 block I2O hard disk + 85 block I2O hard disk 0 = /dev/i2o/hdcc 81st I2O hard disk, whole disk 16 = /dev/i2o/hdcd 82nd I2O hard disk, whole disk ... @@ -1550,12 +1495,12 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 86 char SCSI media changer + 86 char SCSI media changer 0 = /dev/sch0 First SCSI media changer 1 = /dev/sch1 Second SCSI media changer ... - 86 block I2O hard disk + 86 block I2O hard disk 0 = /dev/i2o/hdcs 97th I2O hard disk, whole disk 16 = /dev/i2o/hdct 98th I2O hard disk, whole disk ... @@ -1565,12 +1510,12 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 87 char Sony Control-A1 stereo control bus + 87 char Sony Control-A1 stereo control bus 0 = /dev/controla0 First device on chain 1 = /dev/controla1 Second device on chain ... - 87 block I2O hard disk + 87 block I2O hard disk 0 = /dev/i2o/hddi 113rd I2O hard disk, whole disk 16 = /dev/i2o/hddj 114th I2O hard disk, whole disk ... @@ -1580,59 +1525,59 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 88 char COMX synchronous serial card + 88 char COMX synchronous serial card 0 = /dev/comx0 COMX channel 0 1 = /dev/comx1 COMX channel 1 ... - 88 block Seventh IDE hard disk/CD-ROM interface + 88 block Seventh IDE hard disk/CD-ROM interface 0 = /dev/hdm Master: whole disk (or CD-ROM) 64 = /dev/hdn Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 89 char I2C bus interface + 89 char I2C bus interface 0 = /dev/i2c-0 First I2C adapter 1 = /dev/i2c-1 Second I2C adapter ... - 89 block Eighth IDE hard disk/CD-ROM interface + 89 block Eighth IDE hard disk/CD-ROM interface 0 = /dev/hdo Master: whole disk (or CD-ROM) 64 = /dev/hdp Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 90 char Memory Technology Device (RAM, ROM, Flash) + 90 char Memory Technology Device (RAM, ROM, Flash) 0 = /dev/mtd0 First MTD (rw) 1 = /dev/mtdr0 First MTD (ro) ... 30 = /dev/mtd15 16th MTD (rw) 31 = /dev/mtdr15 16th MTD (ro) - 90 block Ninth IDE hard disk/CD-ROM interface + 90 block Ninth IDE hard disk/CD-ROM interface 0 = /dev/hdq Master: whole disk (or CD-ROM) 64 = /dev/hdr Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 91 char CAN-Bus devices + 91 char CAN-Bus devices 0 = /dev/can0 First CAN-Bus controller 1 = /dev/can1 Second CAN-Bus controller ... - 91 block Tenth IDE hard disk/CD-ROM interface + 91 block Tenth IDE hard disk/CD-ROM interface 0 = /dev/hds Master: whole disk (or CD-ROM) 64 = /dev/hdt Slave: whole disk (or CD-ROM) Partitions are handled the same way as for the first interface (see major number 3). - 92 char Reserved for ith Kommunikationstechnik MIC ISDN card + 92 char Reserved for ith Kommunikationstechnik MIC ISDN card - 92 block PPDD encrypted disk driver + 92 block PPDD encrypted disk driver 0 = /dev/ppdd0 First encrypted disk 1 = /dev/ppdd1 Second encrypted disk ... @@ -1641,35 +1586,35 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. - 93 char + 93 char - 93 block NAND Flash Translation Layer filesystem + 93 block NAND Flash Translation Layer filesystem 0 = /dev/nftla First NFTL layer 16 = /dev/nftlb Second NFTL layer ... 240 = /dev/nftlp 16th NTFL layer - 94 char + 94 char - 94 block IBM S/390 DASD block storage - 0 = /dev/dasda First DASD device, major - 1 = /dev/dasda1 First DASD device, block 1 - 2 = /dev/dasda2 First DASD device, block 2 - 3 = /dev/dasda3 First DASD device, block 3 - 4 = /dev/dasdb Second DASD device, major - 5 = /dev/dasdb1 Second DASD device, block 1 - 6 = /dev/dasdb2 Second DASD device, block 2 - 7 = /dev/dasdb3 Second DASD device, block 3 + 94 block IBM S/390 DASD block storage + 0 = /dev/dasda First DASD device, major + 1 = /dev/dasda1 First DASD device, block 1 + 2 = /dev/dasda2 First DASD device, block 2 + 3 = /dev/dasda3 First DASD device, block 3 + 4 = /dev/dasdb Second DASD device, major + 5 = /dev/dasdb1 Second DASD device, block 1 + 6 = /dev/dasdb2 Second DASD device, block 2 + 7 = /dev/dasdb3 Second DASD device, block 3 ... - 95 char IP filter + 95 char IP filter 0 = /dev/ipl Filter control device/log file 1 = /dev/ipnat NAT control device/log file 2 = /dev/ipstate State information log file 3 = /dev/ipauth Authentication control device/log file ... - 96 char Parallel port ATAPI tape devices + 96 char Parallel port ATAPI tape devices 0 = /dev/pt0 First parallel port ATAPI tape 1 = /dev/pt1 Second parallel port ATAPI tape ... @@ -1677,13 +1622,13 @@ Your cooperation is appreciated. 129 = /dev/npt1 Second p.p. ATAPI tape, no rewind ... - 96 block Inverse NAND Flash Translation Layer + 96 block Inverse NAND Flash Translation Layer 0 = /dev/inftla First INFTL layer 16 = /dev/inftlb Second INFTL layer ... 240 = /dev/inftlp 16th INTFL layer - 97 char Parallel port generic ATAPI interface + 97 char Parallel port generic ATAPI interface 0 = /dev/pg0 First parallel port ATAPI device 1 = /dev/pg1 Second parallel port ATAPI device 2 = /dev/pg2 Third parallel port ATAPI device @@ -1692,14 +1637,14 @@ Your cooperation is appreciated. These devices support the same API as the generic SCSI devices. - 98 char Control and Measurement Device (comedi) + 98 char Control and Measurement Device (comedi) 0 = /dev/comedi0 First comedi device 1 = /dev/comedi1 Second comedi device ... See http://stm.lbl.gov/comedi. - 98 block User-mode virtual block device + 98 block User-mode virtual block device 0 = /dev/ubda First user-mode block device 16 = /dev/udbb Second user-mode block device ... @@ -1710,26 +1655,26 @@ Your cooperation is appreciated. This device is used by the user-mode virtual kernel port. - 99 char Raw parallel ports + 99 char Raw parallel ports 0 = /dev/parport0 First parallel port 1 = /dev/parport1 Second parallel port ... - 99 block JavaStation flash disk + 99 block JavaStation flash disk 0 = /dev/jsfd JavaStation flash disk -100 char Telephony for Linux + 100 char Telephony for Linux 0 = /dev/phone0 First telephony device 1 = /dev/phone1 Second telephony device ... -101 char Motorola DSP 56xxx board + 101 char Motorola DSP 56xxx board 0 = /dev/mdspstat Status information 1 = /dev/mdsp1 First DSP board I/O controls ... 16 = /dev/mdsp16 16th DSP board I/O controls -101 block AMI HyperDisk RAID controller + 101 block AMI HyperDisk RAID controller 0 = /dev/amiraid/ar0 First array whole disk 16 = /dev/amiraid/ar1 Second array whole disk ... @@ -1742,9 +1687,9 @@ Your cooperation is appreciated. ... 15 = /dev/amiraid/ar?p15 15th partition -102 char + 102 char -102 block Compressed block device + 102 block Compressed block device 0 = /dev/cbd/a First compressed block device, whole device 16 = /dev/cbd/b Second compressed block device, whole device ... @@ -1754,7 +1699,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. -103 char Arla network file system + 103 char Arla network file system 0 = /dev/nnpfs0 First NNPFS device 1 = /dev/nnpfs1 Second NNPFS device @@ -1765,12 +1710,12 @@ Your cooperation is appreciated. write to or see http://www.stacken.kth.se/project/arla/ -103 block Audit device + 103 block Audit device 0 = /dev/audit Audit device -104 char Flash BIOS support + 104 char Flash BIOS support -104 block Compaq Next Generation Drive Array, first controller + 104 block Compaq Next Generation Drive Array, first controller 0 = /dev/cciss/c0d0 First logical drive, whole disk 16 = /dev/cciss/c0d1 Second logical drive, whole disk ... @@ -1780,12 +1725,12 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -105 char Comtrol VS-1000 serial controller + 105 char Comtrol VS-1000 serial controller 0 = /dev/ttyV0 First VS-1000 port 1 = /dev/ttyV1 Second VS-1000 port ... -105 block Compaq Next Generation Drive Array, second controller + 105 block Compaq Next Generation Drive Array, second controller 0 = /dev/cciss/c1d0 First logical drive, whole disk 16 = /dev/cciss/c1d1 Second logical drive, whole disk ... @@ -1795,12 +1740,12 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -106 char Comtrol VS-1000 serial controller - alternate devices + 106 char Comtrol VS-1000 serial controller - alternate devices 0 = /dev/cuv0 First VS-1000 port 1 = /dev/cuv1 Second VS-1000 port ... -106 block Compaq Next Generation Drive Array, third controller + 106 block Compaq Next Generation Drive Array, third controller 0 = /dev/cciss/c2d0 First logical drive, whole disk 16 = /dev/cciss/c2d1 Second logical drive, whole disk ... @@ -1810,10 +1755,10 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -107 char 3Dfx Voodoo Graphics device + 107 char 3Dfx Voodoo Graphics device 0 = /dev/3dfx Primary 3Dfx graphics device -107 block Compaq Next Generation Drive Array, fourth controller + 107 block Compaq Next Generation Drive Array, fourth controller 0 = /dev/cciss/c3d0 First logical drive, whole disk 16 = /dev/cciss/c3d1 Second logical drive, whole disk ... @@ -1823,10 +1768,10 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -108 char Device independent PPP interface + 108 char Device independent PPP interface 0 = /dev/ppp Device independent PPP interface -108 block Compaq Next Generation Drive Array, fifth controller + 108 block Compaq Next Generation Drive Array, fifth controller 0 = /dev/cciss/c4d0 First logical drive, whole disk 16 = /dev/cciss/c4d1 Second logical drive, whole disk ... @@ -1836,9 +1781,9 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -109 char Reserved for logical volume manager + 109 char Reserved for logical volume manager -109 block Compaq Next Generation Drive Array, sixth controller + 109 block Compaq Next Generation Drive Array, sixth controller 0 = /dev/cciss/c5d0 First logical drive, whole disk 16 = /dev/cciss/c5d1 Second logical drive, whole disk ... @@ -1848,12 +1793,12 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -110 char miroMEDIA Surround board + 110 char miroMEDIA Surround board 0 = /dev/srnd0 First miroMEDIA Surround board 1 = /dev/srnd1 Second miroMEDIA Surround board ... -110 block Compaq Next Generation Drive Array, seventh controller + 110 block Compaq Next Generation Drive Array, seventh controller 0 = /dev/cciss/c6d0 First logical drive, whole disk 16 = /dev/cciss/c6d1 Second logical drive, whole disk ... @@ -1863,9 +1808,9 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -111 char + 111 char -111 block Compaq Next Generation Drive Array, eighth controller + 111 block Compaq Next Generation Drive Array, eighth controller 0 = /dev/cciss/c7d0 First logical drive, whole disk 16 = /dev/cciss/c7d1 Second logical drive, whole disk ... @@ -1875,7 +1820,7 @@ Your cooperation is appreciated. DAC960 (see major number 48) except that the limit on partitions is 15. -112 char ISI serial card + 112 char ISI serial card 0 = /dev/ttyM0 First ISI port 1 = /dev/ttyM1 Second ISI port ... @@ -1883,7 +1828,7 @@ Your cooperation is appreciated. There is currently a device-naming conflict between these and PAM multimodems (major 78). -112 block IBM iSeries virtual disk + 112 block IBM iSeries virtual disk 0 = /dev/iseries/vda First virtual disk, whole disk 8 = /dev/iseries/vdb Second virtual disk, whole disk ... @@ -1896,17 +1841,17 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 7. -113 char ISI serial card - alternate devices + 113 char ISI serial card - alternate devices 0 = /dev/cum0 Callout device for ttyM0 1 = /dev/cum1 Callout device for ttyM1 ... -113 block IBM iSeries virtual CD-ROM + 113 block IBM iSeries virtual CD-ROM 0 = /dev/iseries/vcda First virtual CD-ROM 1 = /dev/iseries/vcdb Second virtual CD-ROM ... -114 char Picture Elements ISE board + 114 char Picture Elements ISE board 0 = /dev/ise0 First ISE board 1 = /dev/ise1 Second ISE board ... @@ -1919,24 +1864,24 @@ Your cooperation is appreciated. I/O access to the board, the /dev/isex0 nodes command nodes used to control the board. -114 block IDE BIOS powered software RAID interfaces such as the - Promise Fastrak + 114 block IDE BIOS powered software RAID interfaces such as the + Promise Fastrak - 0 = /dev/ataraid/d0 - 1 = /dev/ataraid/d0p1 - 2 = /dev/ataraid/d0p2 - ... - 16 = /dev/ataraid/d1 - 17 = /dev/ataraid/d1p1 - 18 = /dev/ataraid/d1p2 - ... - 255 = /dev/ataraid/d15p15 + 0 = /dev/ataraid/d0 + 1 = /dev/ataraid/d0p1 + 2 = /dev/ataraid/d0p2 + ... + 16 = /dev/ataraid/d1 + 17 = /dev/ataraid/d1p1 + 18 = /dev/ataraid/d1p2 + ... + 255 = /dev/ataraid/d15p15 Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -115 char TI link cable devices (115 was formerly the console driver speaker) + 115 char TI link cable devices (115 was formerly the console driver speaker) 0 = /dev/tipar0 Parallel cable on first parallel port ... 7 = /dev/tipar7 Parallel cable on seventh parallel port @@ -1949,28 +1894,28 @@ Your cooperation is appreciated. ... 47 = /dev/tiusb31 32nd USB cable -115 block NetWare (NWFS) Devices (0-255) + 115 block NetWare (NWFS) Devices (0-255) - The NWFS (NetWare) devices are used to present a - collection of NetWare Mirror Groups or NetWare - Partitions as a logical storage segment for - use in mounting NetWare volumes. A maximum of - 256 NetWare volumes can be supported in a single - machine. + The NWFS (NetWare) devices are used to present a + collection of NetWare Mirror Groups or NetWare + Partitions as a logical storage segment for + use in mounting NetWare volumes. A maximum of + 256 NetWare volumes can be supported in a single + machine. - http://cgfa.telepac.pt/ftp2/kernel.org/linux/kernel/people/jmerkey/nwfs/ + http://cgfa.telepac.pt/ftp2/kernel.org/linux/kernel/people/jmerkey/nwfs/ - 0 = /dev/nwfs/v0 First NetWare (NWFS) Logical Volume - 1 = /dev/nwfs/v1 Second NetWare (NWFS) Logical Volume - 2 = /dev/nwfs/v2 Third NetWare (NWFS) Logical Volume - ... - 255 = /dev/nwfs/v255 Last NetWare (NWFS) Logical Volume + 0 = /dev/nwfs/v0 First NetWare (NWFS) Logical Volume + 1 = /dev/nwfs/v1 Second NetWare (NWFS) Logical Volume + 2 = /dev/nwfs/v2 Third NetWare (NWFS) Logical Volume + ... + 255 = /dev/nwfs/v255 Last NetWare (NWFS) Logical Volume -116 char Advanced Linux Sound Driver (ALSA) + 116 char Advanced Linux Sound Driver (ALSA) -116 block MicroMemory battery backed RAM adapter (NVRAM) - Supports 16 boards, 15 partitions each. - Requested by neilb at cse.unsw.edu.au. + 116 block MicroMemory battery backed RAM adapter (NVRAM) + Supports 16 boards, 15 partitions each. + Requested by neilb at cse.unsw.edu.au. 0 = /dev/umem/d0 Whole of first board 1 = /dev/umem/d0p1 First partition of first board @@ -1982,7 +1927,7 @@ Your cooperation is appreciated. ... 255= /dev/umem/d15p15 15th partition of 16th board. -117 char COSA/SRP synchronous serial card + 117 char COSA/SRP synchronous serial card 0 = /dev/cosa0c0 1st board, 1st channel 1 = /dev/cosa0c1 1st board, 2nd channel ... @@ -1990,147 +1935,147 @@ Your cooperation is appreciated. 17 = /dev/cosa1c1 2nd board, 2nd channel ... -117 block Enterprise Volume Management System (EVMS) + 117 block Enterprise Volume Management System (EVMS) - The EVMS driver uses a layered, plug-in model to provide - unparalleled flexibility and extensibility in managing - storage. This allows for easy expansion or customization - of various levels of volume management. Requested by - Mark Peloquin (peloquin at us.ibm.com). + The EVMS driver uses a layered, plug-in model to provide + unparalleled flexibility and extensibility in managing + storage. This allows for easy expansion or customization + of various levels of volume management. Requested by + Mark Peloquin (peloquin at us.ibm.com). - Note: EVMS populates and manages all the devnodes in - /dev/evms. + Note: EVMS populates and manages all the devnodes in + /dev/evms. - http://sf.net/projects/evms + http://sf.net/projects/evms - 0 = /dev/evms/block_device EVMS block device - 1 = /dev/evms/legacyname1 First EVMS legacy device - 2 = /dev/evms/legacyname2 Second EVMS legacy device - ... - Both ranges can grow (down or up) until they meet. - ... - 254 = /dev/evms/EVMSname2 Second EVMS native device - 255 = /dev/evms/EVMSname1 First EVMS native device + 0 = /dev/evms/block_device EVMS block device + 1 = /dev/evms/legacyname1 First EVMS legacy device + 2 = /dev/evms/legacyname2 Second EVMS legacy device + ... + Both ranges can grow (down or up) until they meet. + ... + 254 = /dev/evms/EVMSname2 Second EVMS native device + 255 = /dev/evms/EVMSname1 First EVMS native device - Note: legacyname(s) are derived from the normal legacy - device names. For example, /dev/hda5 would become - /dev/evms/hda5. + Note: legacyname(s) are derived from the normal legacy + device names. For example, /dev/hda5 would become + /dev/evms/hda5. -118 char IBM Cryptographic Accelerator + 118 char IBM Cryptographic Accelerator 0 = /dev/ica Virtual interface to all IBM Crypto Accelerators 1 = /dev/ica0 IBMCA Device 0 2 = /dev/ica1 IBMCA Device 1 ... -119 char VMware virtual network control + 119 char VMware virtual network control 0 = /dev/vnet0 1st virtual network 1 = /dev/vnet1 2nd virtual network ... -120-127 char LOCAL/EXPERIMENTAL USE + 120-127 char LOCAL/EXPERIMENTAL USE -120-127 block LOCAL/EXPERIMENTAL USE + 120-127 block LOCAL/EXPERIMENTAL USE Allocated for local/experimental use. For devices not assigned official numbers, these ranges should be used in order to avoid conflicting with future assignments. -128-135 char Unix98 PTY masters + 128-135 char Unix98 PTY masters These devices should not have corresponding device nodes; instead they should be accessed through the /dev/ptmx cloning interface. -128 block SCSI disk devices (128-143) - 0 = /dev/sddy 129th SCSI disk whole disk - 16 = /dev/sddz 130th SCSI disk whole disk - 32 = /dev/sdea 131th SCSI disk whole disk - ... - 240 = /dev/sden 144th SCSI disk whole disk + 128 block SCSI disk devices (128-143) + 0 = /dev/sddy 129th SCSI disk whole disk + 16 = /dev/sddz 130th SCSI disk whole disk + 32 = /dev/sdea 131th SCSI disk whole disk + ... + 240 = /dev/sden 144th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -129 block SCSI disk devices (144-159) - 0 = /dev/sdeo 145th SCSI disk whole disk - 16 = /dev/sdep 146th SCSI disk whole disk - 32 = /dev/sdeq 147th SCSI disk whole disk - ... - 240 = /dev/sdfd 160th SCSI disk whole disk + 129 block SCSI disk devices (144-159) + 0 = /dev/sdeo 145th SCSI disk whole disk + 16 = /dev/sdep 146th SCSI disk whole disk + 32 = /dev/sdeq 147th SCSI disk whole disk + ... + 240 = /dev/sdfd 160th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -130 char (Misc devices) + 130 char (Misc devices) -130 block SCSI disk devices (160-175) - 0 = /dev/sdfe 161st SCSI disk whole disk - 16 = /dev/sdff 162nd SCSI disk whole disk - 32 = /dev/sdfg 163rd SCSI disk whole disk - ... - 240 = /dev/sdft 176th SCSI disk whole disk + 130 block SCSI disk devices (160-175) + 0 = /dev/sdfe 161st SCSI disk whole disk + 16 = /dev/sdff 162nd SCSI disk whole disk + 32 = /dev/sdfg 163rd SCSI disk whole disk + ... + 240 = /dev/sdft 176th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -131 block SCSI disk devices (176-191) - 0 = /dev/sdfu 177th SCSI disk whole disk - 16 = /dev/sdfv 178th SCSI disk whole disk - 32 = /dev/sdfw 179th SCSI disk whole disk - ... - 240 = /dev/sdgj 192nd SCSI disk whole disk + 131 block SCSI disk devices (176-191) + 0 = /dev/sdfu 177th SCSI disk whole disk + 16 = /dev/sdfv 178th SCSI disk whole disk + 32 = /dev/sdfw 179th SCSI disk whole disk + ... + 240 = /dev/sdgj 192nd SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -132 block SCSI disk devices (192-207) - 0 = /dev/sdgk 193rd SCSI disk whole disk - 16 = /dev/sdgl 194th SCSI disk whole disk - 32 = /dev/sdgm 195th SCSI disk whole disk - ... - 240 = /dev/sdgz 208th SCSI disk whole disk + 132 block SCSI disk devices (192-207) + 0 = /dev/sdgk 193rd SCSI disk whole disk + 16 = /dev/sdgl 194th SCSI disk whole disk + 32 = /dev/sdgm 195th SCSI disk whole disk + ... + 240 = /dev/sdgz 208th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -133 block SCSI disk devices (208-223) - 0 = /dev/sdha 209th SCSI disk whole disk - 16 = /dev/sdhb 210th SCSI disk whole disk - 32 = /dev/sdhc 211th SCSI disk whole disk - ... - 240 = /dev/sdhp 224th SCSI disk whole disk + 133 block SCSI disk devices (208-223) + 0 = /dev/sdha 209th SCSI disk whole disk + 16 = /dev/sdhb 210th SCSI disk whole disk + 32 = /dev/sdhc 211th SCSI disk whole disk + ... + 240 = /dev/sdhp 224th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -134 block SCSI disk devices (224-239) - 0 = /dev/sdhq 225th SCSI disk whole disk - 16 = /dev/sdhr 226th SCSI disk whole disk - 32 = /dev/sdhs 227th SCSI disk whole disk - ... - 240 = /dev/sdif 240th SCSI disk whole disk + 134 block SCSI disk devices (224-239) + 0 = /dev/sdhq 225th SCSI disk whole disk + 16 = /dev/sdhr 226th SCSI disk whole disk + 32 = /dev/sdhs 227th SCSI disk whole disk + ... + 240 = /dev/sdif 240th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -135 block SCSI disk devices (240-255) - 0 = /dev/sdig 241st SCSI disk whole disk - 16 = /dev/sdih 242nd SCSI disk whole disk - 32 = /dev/sdih 243rd SCSI disk whole disk - ... - 240 = /dev/sdiv 256th SCSI disk whole disk + 135 block SCSI disk devices (240-255) + 0 = /dev/sdig 241st SCSI disk whole disk + 16 = /dev/sdih 242nd SCSI disk whole disk + 32 = /dev/sdih 243rd SCSI disk whole disk + ... + 240 = /dev/sdiv 256th SCSI disk whole disk Partitions are handled in the same way as for IDE disks (see major number 3) except that the limit on partitions is 15. -136-143 char Unix98 PTY slaves + 136-143 char Unix98 PTY slaves 0 = /dev/pts/0 First Unix98 pseudo-TTY 1 = /dev/pts/1 Second Unix98 pseudo-TTY ... @@ -2142,7 +2087,7 @@ Your cooperation is appreciated. *most* distributions the appropriate options are "mode=0620,gid=".) -136 block Mylex DAC960 PCI RAID controller; ninth controller + 136 block Mylex DAC960 PCI RAID controller; ninth controller 0 = /dev/rd/c8d0 First disk, whole disk 8 = /dev/rd/c8d1 Second disk, whole disk ... @@ -2150,7 +2095,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -137 block Mylex DAC960 PCI RAID controller; tenth controller + 137 block Mylex DAC960 PCI RAID controller; tenth controller 0 = /dev/rd/c9d0 First disk, whole disk 8 = /dev/rd/c9d1 Second disk, whole disk ... @@ -2158,7 +2103,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -138 block Mylex DAC960 PCI RAID controller; eleventh controller + 138 block Mylex DAC960 PCI RAID controller; eleventh controller 0 = /dev/rd/c10d0 First disk, whole disk 8 = /dev/rd/c10d1 Second disk, whole disk ... @@ -2166,7 +2111,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -139 block Mylex DAC960 PCI RAID controller; twelfth controller + 139 block Mylex DAC960 PCI RAID controller; twelfth controller 0 = /dev/rd/c11d0 First disk, whole disk 8 = /dev/rd/c11d1 Second disk, whole disk ... @@ -2174,7 +2119,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -140 block Mylex DAC960 PCI RAID controller; thirteenth controller + 140 block Mylex DAC960 PCI RAID controller; thirteenth controller 0 = /dev/rd/c12d0 First disk, whole disk 8 = /dev/rd/c12d1 Second disk, whole disk ... @@ -2182,7 +2127,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -141 block Mylex DAC960 PCI RAID controller; fourteenth controller + 141 block Mylex DAC960 PCI RAID controller; fourteenth controller 0 = /dev/rd/c13d0 First disk, whole disk 8 = /dev/rd/c13d1 Second disk, whole disk ... @@ -2190,7 +2135,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -142 block Mylex DAC960 PCI RAID controller; fifteenth controller + 142 block Mylex DAC960 PCI RAID controller; fifteenth controller 0 = /dev/rd/c14d0 First disk, whole disk 8 = /dev/rd/c14d1 Second disk, whole disk ... @@ -2198,7 +2143,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -143 block Mylex DAC960 PCI RAID controller; sixteenth controller + 143 block Mylex DAC960 PCI RAID controller; sixteenth controller 0 = /dev/rd/c15d0 First disk, whole disk 8 = /dev/rd/c15d1 Second disk, whole disk ... @@ -2206,7 +2151,7 @@ Your cooperation is appreciated. Partitions are handled as for major 48. -144 char Encapsulated PPP + 144 char Encapsulated PPP 0 = /dev/pppox0 First PPP over Ethernet ... 63 = /dev/pppox63 64th PPP over Ethernet @@ -2216,11 +2161,11 @@ Your cooperation is appreciated. The SST 5136-DN DeviceNet interface driver has been relocated to major 183 due to an unfortunate conflict. -144 block Expansion Area #1 for more non-device (e.g. NFS) mounts + 144 block Expansion Area #1 for more non-device (e.g. NFS) mounts 0 = mounted device 256 255 = mounted device 511 -145 char SAM9407-based soundcard + 145 char SAM9407-based soundcard 0 = /dev/sam0_mixer 1 = /dev/sam0_sequencer 2 = /dev/sam0_midi00 @@ -2241,66 +2186,66 @@ Your cooperation is appreciated. addons, which are sam9407 specific. OSS can be operated simultaneously, taking care of the codec. -145 block Expansion Area #2 for more non-device (e.g. NFS) mounts + 145 block Expansion Area #2 for more non-device (e.g. NFS) mounts 0 = mounted device 512 255 = mounted device 767 -146 char SYSTRAM SCRAMNet mirrored-memory network + 146 char SYSTRAM SCRAMNet mirrored-memory network 0 = /dev/scramnet0 First SCRAMNet device 1 = /dev/scramnet1 Second SCRAMNet device ... -146 block Expansion Area #3 for more non-device (e.g. NFS) mounts + 146 block Expansion Area #3 for more non-device (e.g. NFS) mounts 0 = mounted device 768 255 = mounted device 1023 -147 char Aureal Semiconductor Vortex Audio device + 147 char Aureal Semiconductor Vortex Audio device 0 = /dev/aureal0 First Aureal Vortex 1 = /dev/aureal1 Second Aureal Vortex ... -147 block Distributed Replicated Block Device (DRBD) + 147 block Distributed Replicated Block Device (DRBD) 0 = /dev/drbd0 First DRBD device 1 = /dev/drbd1 Second DRBD device ... -148 char Technology Concepts serial card + 148 char Technology Concepts serial card 0 = /dev/ttyT0 First TCL port 1 = /dev/ttyT1 Second TCL port ... -149 char Technology Concepts serial card - alternate devices + 149 char Technology Concepts serial card - alternate devices 0 = /dev/cut0 Callout device for ttyT0 1 = /dev/cut0 Callout device for ttyT1 ... -150 char Real-Time Linux FIFOs + 150 char Real-Time Linux FIFOs 0 = /dev/rtf0 First RTLinux FIFO 1 = /dev/rtf1 Second RTLinux FIFO ... -151 char DPT I2O SmartRaid V controller + 151 char DPT I2O SmartRaid V controller 0 = /dev/dpti0 First DPT I2O adapter 1 = /dev/dpti1 Second DPT I2O adapter ... -152 char EtherDrive Control Device + 152 char EtherDrive Control Device 0 = /dev/etherd/ctl Connect/Disconnect an EtherDrive 1 = /dev/etherd/err Monitor errors 2 = /dev/etherd/raw Raw AoE packet monitor -152 block EtherDrive Block Devices + 152 block EtherDrive Block Devices 0 = /dev/etherd/0 EtherDrive 0 ... 255 = /dev/etherd/255 EtherDrive 255 -153 char SPI Bus Interface (sometimes referred to as MicroWire) + 153 char SPI Bus Interface (sometimes referred to as MicroWire) 0 = /dev/spi0 First SPI device on the bus 1 = /dev/spi1 Second SPI device on the bus ... 15 = /dev/spi15 Sixteenth SPI device on the bus -153 block Enhanced Metadisk RAID (EMD) storage units + 153 block Enhanced Metadisk RAID (EMD) storage units 0 = /dev/emd/0 First unit 1 = /dev/emd/0p1 Partition 1 on First unit 2 = /dev/emd/0p2 Partition 2 on First unit @@ -2316,41 +2261,41 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 15. -154 char Specialix RIO serial card + 154 char Specialix RIO serial card 0 = /dev/ttySR0 First RIO port ... 255 = /dev/ttySR255 256th RIO port -155 char Specialix RIO serial card - alternate devices + 155 char Specialix RIO serial card - alternate devices 0 = /dev/cusr0 Callout device for ttySR0 ... 255 = /dev/cusr255 Callout device for ttySR255 -156 char Specialix RIO serial card + 156 char Specialix RIO serial card 0 = /dev/ttySR256 257th RIO port ... 255 = /dev/ttySR511 512th RIO port -157 char Specialix RIO serial card - alternate devices + 157 char Specialix RIO serial card - alternate devices 0 = /dev/cusr256 Callout device for ttySR256 ... 255 = /dev/cusr511 Callout device for ttySR511 -158 char Dialogic GammaLink fax driver + 158 char Dialogic GammaLink fax driver 0 = /dev/gfax0 GammaLink channel 0 1 = /dev/gfax1 GammaLink channel 1 ... -159 char RESERVED + 159 char RESERVED -159 block RESERVED + 159 block RESERVED -160 char General Purpose Instrument Bus (GPIB) + 160 char General Purpose Instrument Bus (GPIB) 0 = /dev/gpib0 First GPIB bus 1 = /dev/gpib1 Second GPIB bus ... -160 block Carmel 8-port SATA Disks on First Controller + 160 block Carmel 8-port SATA Disks on First Controller 0 = /dev/carmel/0 SATA disk 0 whole disk 1 = /dev/carmel/0p1 SATA disk 0 partition 1 ... @@ -2365,7 +2310,7 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 31. -161 char IrCOMM devices (IrDA serial/parallel emulation) + 161 char IrCOMM devices (IrDA serial/parallel emulation) 0 = /dev/ircomm0 First IrCOMM device 1 = /dev/ircomm1 Second IrCOMM device ... @@ -2373,7 +2318,7 @@ Your cooperation is appreciated. 17 = /dev/irlpt1 Second IrLPT device ... -161 block Carmel 8-port SATA Disks on Second Controller + 161 block Carmel 8-port SATA Disks on Second Controller 0 = /dev/carmel/8 SATA disk 8 whole disk 1 = /dev/carmel/8p1 SATA disk 8 partition 1 ... @@ -2388,17 +2333,17 @@ Your cooperation is appreciated. disks (see major number 3) except that the limit on partitions is 31. -162 char Raw block device interface + 162 char Raw block device interface 0 = /dev/rawctl Raw I/O control device 1 = /dev/raw/raw1 First raw I/O device 2 = /dev/raw/raw2 Second raw I/O device ... - max minor number of raw device is set by kernel config - MAX_RAW_DEVS or raw module parameter 'max_raw_devs' + max minor number of raw device is set by kernel config + MAX_RAW_DEVS or raw module parameter 'max_raw_devs' -163 char + 163 char -164 char Chase Research AT/PCI-Fast serial card + 164 char Chase Research AT/PCI-Fast serial card 0 = /dev/ttyCH0 AT/PCI-Fast board 0, port 0 ... 15 = /dev/ttyCH15 AT/PCI-Fast board 0, port 15 @@ -2412,67 +2357,67 @@ Your cooperation is appreciated. ... 63 = /dev/ttyCH63 AT/PCI-Fast board 3, port 15 -165 char Chase Research AT/PCI-Fast serial card - alternate devices + 165 char Chase Research AT/PCI-Fast serial card - alternate devices 0 = /dev/cuch0 Callout device for ttyCH0 ... 63 = /dev/cuch63 Callout device for ttyCH63 -166 char ACM USB modems + 166 char ACM USB modems 0 = /dev/ttyACM0 First ACM modem 1 = /dev/ttyACM1 Second ACM modem ... -167 char ACM USB modems - alternate devices + 167 char ACM USB modems - alternate devices 0 = /dev/cuacm0 Callout device for ttyACM0 1 = /dev/cuacm1 Callout device for ttyACM1 ... -168 char Eracom CSA7000 PCI encryption adaptor + 168 char Eracom CSA7000 PCI encryption adaptor 0 = /dev/ecsa0 First CSA7000 1 = /dev/ecsa1 Second CSA7000 ... -169 char Eracom CSA8000 PCI encryption adaptor + 169 char Eracom CSA8000 PCI encryption adaptor 0 = /dev/ecsa8-0 First CSA8000 1 = /dev/ecsa8-1 Second CSA8000 ... -170 char AMI MegaRAC remote access controller + 170 char AMI MegaRAC remote access controller 0 = /dev/megarac0 First MegaRAC card 1 = /dev/megarac1 Second MegaRAC card ... -171 char Reserved for IEEE 1394 (Firewire) + 171 char Reserved for IEEE 1394 (Firewire) -172 char Moxa Intellio serial card + 172 char Moxa Intellio serial card 0 = /dev/ttyMX0 First Moxa port 1 = /dev/ttyMX1 Second Moxa port ... 127 = /dev/ttyMX127 128th Moxa port 128 = /dev/moxactl Moxa control port -173 char Moxa Intellio serial card - alternate devices + 173 char Moxa Intellio serial card - alternate devices 0 = /dev/cumx0 Callout device for ttyMX0 1 = /dev/cumx1 Callout device for ttyMX1 ... 127 = /dev/cumx127 Callout device for ttyMX127 -174 char SmartIO serial card + 174 char SmartIO serial card 0 = /dev/ttySI0 First SmartIO port 1 = /dev/ttySI1 Second SmartIO port ... -175 char SmartIO serial card - alternate devices + 175 char SmartIO serial card - alternate devices 0 = /dev/cusi0 Callout device for ttySI0 1 = /dev/cusi1 Callout device for ttySI1 ... -176 char nCipher nFast PCI crypto accelerator + 176 char nCipher nFast PCI crypto accelerator 0 = /dev/nfastpci0 First nFast PCI device 1 = /dev/nfastpci1 First nFast PCI device ... -177 char TI PCILynx memory spaces + 177 char TI PCILynx memory spaces 0 = /dev/pcilynx/aux0 AUX space of first PCILynx card ... 15 = /dev/pcilynx/aux15 AUX space of 16th PCILynx card @@ -2483,12 +2428,12 @@ Your cooperation is appreciated. ... 47 = /dev/pcilynx/ram15 RAM space of 16th PCILynx card -178 char Giganet cLAN1xxx virtual interface adapter + 178 char Giganet cLAN1xxx virtual interface adapter 0 = /dev/clanvi0 First cLAN adapter 1 = /dev/clanvi1 Second cLAN adapter ... -179 block MMC block devices + 179 block MMC block devices 0 = /dev/mmcblk0 First SD/MMC card 1 = /dev/mmcblk0p1 First partition on first MMC card 8 = /dev/mmcblk1 Second SD/MMC card @@ -2500,12 +2445,12 @@ Your cooperation is appreciated. bump the offset between each card to be the configured value instead of the default 8. -179 char CCube DVXChip-based PCI products + 179 char CCube DVXChip-based PCI products 0 = /dev/dvxirq0 First DVX device 1 = /dev/dvxirq1 Second DVX device ... -180 char USB devices + 180 char USB devices 0 = /dev/usb/lp0 First USB printer ... 15 = /dev/usb/lp15 16th USB printer @@ -2539,23 +2484,23 @@ Your cooperation is appreciated. ... 209 = /dev/usb/yurex16 16th USB Yurex device -180 block USB block devices + 180 block USB block devices 0 = /dev/uba First USB block device 8 = /dev/ubb Second USB block device 16 = /dev/ubc Third USB block device - ... + ... -181 char Conrad Electronic parallel port radio clocks + 181 char Conrad Electronic parallel port radio clocks 0 = /dev/pcfclock0 First Conrad radio clock 1 = /dev/pcfclock1 Second Conrad radio clock ... -182 char Picture Elements THR2 binarizer + 182 char Picture Elements THR2 binarizer 0 = /dev/pethr0 First THR2 board 1 = /dev/pethr1 Second THR2 board ... -183 char SST 5136-DN DeviceNet interface + 183 char SST 5136-DN DeviceNet interface 0 = /dev/ss5136dn0 First DeviceNet interface 1 = /dev/ss5136dn1 Second DeviceNet interface ... @@ -2563,12 +2508,12 @@ Your cooperation is appreciated. This device used to be assigned to major number 144. It had to be moved due to an unfortunate conflict. -184 char Picture Elements' video simulator/sender + 184 char Picture Elements' video simulator/sender 0 = /dev/pevss0 First sender board 1 = /dev/pevss1 Second sender board ... -185 char InterMezzo high availability file system + 185 char InterMezzo high availability file system 0 = /dev/intermezzo0 First cache manager 1 = /dev/intermezzo1 Second cache manager ... @@ -2576,48 +2521,48 @@ Your cooperation is appreciated. See http://web.archive.org/web/20080115195241/ http://inter-mezzo.org/index.html -186 char Object-based storage control device + 186 char Object-based storage control device 0 = /dev/obd0 First obd control device 1 = /dev/obd1 Second obd control device ... See ftp://ftp.lustre.org/pub/obd for code and information. -187 char DESkey hardware encryption device + 187 char DESkey hardware encryption device 0 = /dev/deskey0 First DES key 1 = /dev/deskey1 Second DES key ... -188 char USB serial converters + 188 char USB serial converters 0 = /dev/ttyUSB0 First USB serial converter 1 = /dev/ttyUSB1 Second USB serial converter ... -189 char USB serial converters - alternate devices + 189 char USB serial converters - alternate devices 0 = /dev/cuusb0 Callout device for ttyUSB0 1 = /dev/cuusb1 Callout device for ttyUSB1 ... -190 char Kansas City tracker/tuner card + 190 char Kansas City tracker/tuner card 0 = /dev/kctt0 First KCT/T card 1 = /dev/kctt1 Second KCT/T card ... -191 char Reserved for PCMCIA + 191 char Reserved for PCMCIA -192 char Kernel profiling interface + 192 char Kernel profiling interface 0 = /dev/profile Profiling control device 1 = /dev/profile0 Profiling device for CPU 0 2 = /dev/profile1 Profiling device for CPU 1 ... -193 char Kernel event-tracing interface + 193 char Kernel event-tracing interface 0 = /dev/trace Tracing control device 1 = /dev/trace0 Tracing device for CPU 0 2 = /dev/trace1 Tracing device for CPU 1 ... -194 char linVideoStreams (LINVS) + 194 char linVideoStreams (LINVS) 0 = /dev/mvideo/status0 Video compression status 1 = /dev/mvideo/stream0 Video stream 2 = /dev/mvideo/frame0 Single compressed frame @@ -2633,13 +2578,13 @@ Your cooperation is appreciated. 240 = /dev/mvideo/status15 16th device ... -195 char Nvidia graphics devices + 195 char Nvidia graphics devices 0 = /dev/nvidia0 First Nvidia card 1 = /dev/nvidia1 Second Nvidia card ... 255 = /dev/nvidiactl Nvidia card control device -196 char Tormenta T1 card + 196 char Tormenta T1 card 0 = /dev/tor/0 Master control channel for all cards 1 = /dev/tor/1 First DS0 2 = /dev/tor/2 Second DS0 @@ -2649,24 +2594,24 @@ Your cooperation is appreciated. 50 = /dev/tor/50 Second pseudo-channel ... -197 char OpenTNF tracing facility + 197 char OpenTNF tracing facility 0 = /dev/tnf/t0 Trace 0 data extraction 1 = /dev/tnf/t1 Trace 1 data extraction ... 128 = /dev/tnf/status Tracing facility status 130 = /dev/tnf/trace Tracing device -198 char Total Impact TPMP2 quad coprocessor PCI card + 198 char Total Impact TPMP2 quad coprocessor PCI card 0 = /dev/tpmp2/0 First card 1 = /dev/tpmp2/1 Second card ... -199 char Veritas volume manager (VxVM) volumes + 199 char Veritas volume manager (VxVM) volumes 0 = /dev/vx/rdsk/*/* First volume 1 = /dev/vx/rdsk/*/* Second volume ... -199 block Veritas volume manager (VxVM) volumes + 199 block Veritas volume manager (VxVM) volumes 0 = /dev/vx/dsk/*/* First volume 1 = /dev/vx/dsk/*/* Second volume ... @@ -2674,19 +2619,19 @@ Your cooperation is appreciated. The namespace in these directories is maintained by the user space VxVM software. -200 char Veritas VxVM configuration interface - 0 = /dev/vx/config Configuration access node - 1 = /dev/vx/trace Volume i/o trace access node - 2 = /dev/vx/iod Volume i/o daemon access node - 3 = /dev/vx/info Volume information access node - 4 = /dev/vx/task Volume tasks access node - 5 = /dev/vx/taskmon Volume tasks monitor daemon + 200 char Veritas VxVM configuration interface + 0 = /dev/vx/config Configuration access node + 1 = /dev/vx/trace Volume i/o trace access node + 2 = /dev/vx/iod Volume i/o daemon access node + 3 = /dev/vx/info Volume information access node + 4 = /dev/vx/task Volume tasks access node + 5 = /dev/vx/taskmon Volume tasks monitor daemon -201 char Veritas VxVM dynamic multipathing driver + 201 char Veritas VxVM dynamic multipathing driver 0 = /dev/vx/rdmp/* First multipath device 1 = /dev/vx/rdmp/* Second multipath device ... -201 block Veritas VxVM dynamic multipathing driver + 201 block Veritas VxVM dynamic multipathing driver 0 = /dev/vx/dmp/* First multipath device 1 = /dev/vx/dmp/* Second multipath device ... @@ -2694,28 +2639,28 @@ Your cooperation is appreciated. The namespace in these directories is maintained by the user space VxVM software. -202 char CPU model-specific registers + 202 char CPU model-specific registers 0 = /dev/cpu/0/msr MSRs on CPU 0 1 = /dev/cpu/1/msr MSRs on CPU 1 ... -202 block Xen Virtual Block Device + 202 block Xen Virtual Block Device 0 = /dev/xvda First Xen VBD whole disk 16 = /dev/xvdb Second Xen VBD whole disk 32 = /dev/xvdc Third Xen VBD whole disk ... 240 = /dev/xvdp Sixteenth Xen VBD whole disk - Partitions are handled in the same way as for IDE - disks (see major number 3) except that the limit on - partitions is 15. + Partitions are handled in the same way as for IDE + disks (see major number 3) except that the limit on + partitions is 15. -203 char CPU CPUID information + 203 char CPU CPUID information 0 = /dev/cpu/0/cpuid CPUID on CPU 0 1 = /dev/cpu/1/cpuid CPUID on CPU 1 ... -204 char Low-density serial ports + 204 char Low-density serial ports 0 = /dev/ttyLU0 LinkUp Systems L72xx UART - port 0 1 = /dev/ttyLU1 LinkUp Systems L72xx UART - port 1 2 = /dev/ttyLU2 LinkUp Systems L72xx UART - port 2 @@ -2787,7 +2732,7 @@ Your cooperation is appreciated. 211 = /dev/ttyMAX2 MAX3100 serial port 2 212 = /dev/ttyMAX3 MAX3100 serial port 3 -205 char Low-density serial ports (alternate device) + 205 char Low-density serial ports (alternate device) 0 = /dev/culu0 Callout device for ttyLU0 1 = /dev/culu1 Callout device for ttyLU1 2 = /dev/culu2 Callout device for ttyLU2 @@ -2823,7 +2768,7 @@ Your cooperation is appreciated. 82 = /dev/cuvr0 Callout device for ttyVR0 83 = /dev/cuvr1 Callout device for ttyVR1 -206 char OnStream SC-x0 tape devices + 206 char OnStream SC-x0 tape devices 0 = /dev/osst0 First OnStream SCSI tape, mode 0 1 = /dev/osst1 Second OnStream SCSI tape, mode 0 ... @@ -2857,7 +2802,7 @@ Your cooperation is appreciated. driver as well. The ADR-x0 drives are QIC-157 compliant and don't need osst. -207 char Compaq ProLiant health feature indicate + 207 char Compaq ProLiant health feature indicate 0 = /dev/cpqhealth/cpqw Redirector interface 1 = /dev/cpqhealth/crom EISA CROM 2 = /dev/cpqhealth/cdt Data Table @@ -2871,17 +2816,17 @@ Your cooperation is appreciated. 10 = /dev/cpqhealth/cram CMOS interface 11 = /dev/cpqhealth/cpci PCI IRQ interface -208 char User space serial ports + 208 char User space serial ports 0 = /dev/ttyU0 First user space serial port 1 = /dev/ttyU1 Second user space serial port ... -209 char User space serial ports (alternate devices) + 209 char User space serial ports (alternate devices) 0 = /dev/cuu0 Callout device for ttyU0 1 = /dev/cuu1 Callout device for ttyU1 ... -210 char SBE, Inc. sync/async serial card + 210 char SBE, Inc. sync/async serial card 0 = /dev/sbei/wxcfg0 Configuration device for board 0 1 = /dev/sbei/dld0 Download device for board 0 2 = /dev/sbei/wan00 WAN device, port 0, board 0 @@ -2906,12 +2851,12 @@ Your cooperation is appreciated. Yes, each board is really spaced 10 (decimal) apart. -211 char Addinum CPCI1500 digital I/O card + 211 char Addinum CPCI1500 digital I/O card 0 = /dev/addinum/cpci1500/0 First CPCI1500 card 1 = /dev/addinum/cpci1500/1 Second CPCI1500 card ... -212 char LinuxTV.org DVB driver subsystem + 212 char LinuxTV.org DVB driver subsystem 0 = /dev/dvb/adapter0/video0 first video decoder of first card 1 = /dev/dvb/adapter0/audio0 first audio decoder of first card 2 = /dev/dvb/adapter0/sec0 (obsolete/unused) @@ -2929,34 +2874,34 @@ Your cooperation is appreciated. ... 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card -216 char Bluetooth RFCOMM TTY devices + 216 char Bluetooth RFCOMM TTY devices 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device ... -217 char Bluetooth RFCOMM TTY devices (alternate devices) + 217 char Bluetooth RFCOMM TTY devices (alternate devices) 0 = /dev/curf0 Callout device for rfcomm0 1 = /dev/curf1 Callout device for rfcomm1 ... -218 char The Logical Company bus Unibus/Qbus adapters + 218 char The Logical Company bus Unibus/Qbus adapters 0 = /dev/logicalco/bci/0 First bus adapter 1 = /dev/logicalco/bci/1 First bus adapter ... -219 char The Logical Company DCI-1300 digital I/O card + 219 char The Logical Company DCI-1300 digital I/O card 0 = /dev/logicalco/dci1300/0 First DCI-1300 card 1 = /dev/logicalco/dci1300/1 Second DCI-1300 card ... -220 char Myricom Myrinet "GM" board + 220 char Myricom Myrinet "GM" board 0 = /dev/myricom/gm0 First Myrinet GM board 1 = /dev/myricom/gmp0 First board "root access" 2 = /dev/myricom/gm1 Second Myrinet GM board 3 = /dev/myricom/gmp1 Second board "root access" ... -221 char VME bus + 221 char VME bus 0 = /dev/bus/vme/m0 First master image 1 = /dev/bus/vme/m1 Second master image 2 = /dev/bus/vme/m2 Third master image @@ -2971,38 +2916,38 @@ Your cooperation is appreciated. same interface. For interface documentation see http://www.vmelinux.org/. -224 char A2232 serial card + 224 char A2232 serial card 0 = /dev/ttyY0 First A2232 port 1 = /dev/ttyY1 Second A2232 port ... -225 char A2232 serial card (alternate devices) + 225 char A2232 serial card (alternate devices) 0 = /dev/cuy0 Callout device for ttyY0 1 = /dev/cuy1 Callout device for ttyY1 ... -226 char Direct Rendering Infrastructure (DRI) + 226 char Direct Rendering Infrastructure (DRI) 0 = /dev/dri/card0 First graphics card 1 = /dev/dri/card1 Second graphics card ... -227 char IBM 3270 terminal Unix tty access + 227 char IBM 3270 terminal Unix tty access 1 = /dev/3270/tty1 First 3270 terminal 2 = /dev/3270/tty2 Seconds 3270 terminal ... -228 char IBM 3270 terminal block-mode access + 228 char IBM 3270 terminal block-mode access 0 = /dev/3270/tub Controlling interface 1 = /dev/3270/tub1 First 3270 terminal 2 = /dev/3270/tub2 Second 3270 terminal ... -229 char IBM iSeries/pSeries virtual console + 229 char IBM iSeries/pSeries virtual console 0 = /dev/hvc0 First console port 1 = /dev/hvc1 Second console port ... -230 char IBM iSeries virtual tape + 230 char IBM iSeries virtual tape 0 = /dev/iseries/vt0 First virtual tape, mode 0 1 = /dev/iseries/vt1 Second virtual tape, mode 0 ... @@ -3033,7 +2978,7 @@ Your cooperation is appreciated. ioctl()'s can be used to rewind the tape regardless of the device used to access it. -231 char InfiniBand + 231 char InfiniBand 0 = /dev/infiniband/umad0 1 = /dev/infiniband/umad1 ... @@ -3047,7 +2992,7 @@ Your cooperation is appreciated. ... 159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device -232 char Biometric Devices + 232 char Biometric Devices 0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device 1 = /dev/biometric/sensor0/iris first iris sensor on first device 2 = /dev/biometric/sensor0/retina first retina sensor on first device @@ -3060,7 +3005,7 @@ Your cooperation is appreciated. 20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device ... -233 char PathScale InfiniPath interconnect + 233 char PathScale InfiniPath interconnect 0 = /dev/ipath Primary device for programs (any unit) 1 = /dev/ipath0 Access specifically to unit 0 2 = /dev/ipath1 Access specifically to unit 1 @@ -3069,18 +3014,18 @@ Your cooperation is appreciated. 129 = /dev/ipath_sma Device used by Subnet Management Agent 130 = /dev/ipath_diag Device used by diagnostics programs -234-254 char RESERVED FOR DYNAMIC ASSIGNMENT + 234-254 char RESERVED FOR DYNAMIC ASSIGNMENT Character devices that request a dynamic allocation of major number will take numbers starting from 254 and downward. -240-254 block LOCAL/EXPERIMENTAL USE + 240-254 block LOCAL/EXPERIMENTAL USE Allocated for local/experimental use. For devices not assigned official numbers, these ranges should be used in order to avoid conflicting with future assignments. -255 char RESERVED + 255 char RESERVED -255 block RESERVED + 255 block RESERVED This major is reserved to assist the expansion to a larger number space. No device nodes with this major @@ -3088,25 +3033,25 @@ Your cooperation is appreciated. (This is probably not true anymore, but I'll leave it for now /Torben) ----LARGE MAJORS!!!!!--- + ---LARGE MAJORS!!!!!--- -256 char Equinox SST multi-port serial boards + 256 char Equinox SST multi-port serial boards 0 = /dev/ttyEQ0 First serial port on first Equinox SST board 127 = /dev/ttyEQ127 Last serial port on first Equinox SST board 128 = /dev/ttyEQ128 First serial port on second Equinox SST board ... 1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board -256 block Resident Flash Disk Flash Translation Layer + 256 block Resident Flash Disk Flash Translation Layer 0 = /dev/rfda First RFD FTL layer 16 = /dev/rfdb Second RFD FTL layer ... 240 = /dev/rfdp 16th RFD FTL layer -257 char Phoenix Technologies Cryptographic Services Driver + 257 char Phoenix Technologies Cryptographic Services Driver 0 = /dev/ptlsec Crypto Services Driver -257 block SSFDC Flash Translation Layer filesystem + 257 block SSFDC Flash Translation Layer filesystem 0 = /dev/ssfdca First SSFDC layer 8 = /dev/ssfdcb Second SSFDC layer 16 = /dev/ssfdcc Third SSFDC layer @@ -3116,210 +3061,21 @@ Your cooperation is appreciated. 48 = /dev/ssfdcg 7th SSFDC layer 56 = /dev/ssfdch 8th SSFDC layer -258 block ROM/Flash read-only translation layer + 258 block ROM/Flash read-only translation layer 0 = /dev/blockrom0 First ROM card's translation layer interface 1 = /dev/blockrom1 Second ROM card's translation layer interface ... -259 block Block Extended Major + 259 block Block Extended Major Used dynamically to hold additional partition minor numbers and allow large numbers of partitions per device -259 char FPGA configuration interfaces + 259 char FPGA configuration interfaces 0 = /dev/icap0 First Xilinx internal configuration 1 = /dev/icap1 Second Xilinx internal configuration -260 char OSD (Object-based-device) SCSI Device + 260 char OSD (Object-based-device) SCSI Device 0 = /dev/osd0 First OSD Device 1 = /dev/osd1 Second OSD Device ... 255 = /dev/osd255 256th OSD Device - - **** ADDITIONAL /dev DIRECTORY ENTRIES - -This section details additional entries that should or may exist in -the /dev directory. It is preferred that symbolic links use the same -form (absolute or relative) as is indicated here. Links are -classified as "hard" or "symbolic" depending on the preferred type of -link; if possible, the indicated type of link should be used. - - - Compulsory links - -These links should exist on all systems: - -/dev/fd /proc/self/fd symbolic File descriptors -/dev/stdin fd/0 symbolic stdin file descriptor -/dev/stdout fd/1 symbolic stdout file descriptor -/dev/stderr fd/2 symbolic stderr file descriptor -/dev/nfsd socksys symbolic Required by iBCS-2 -/dev/X0R null symbolic Required by iBCS-2 - -Note: /dev/X0R is --. - - Recommended links - -It is recommended that these links exist on all systems: - -/dev/core /proc/kcore symbolic Backward compatibility -/dev/ramdisk ram0 symbolic Backward compatibility -/dev/ftape qft0 symbolic Backward compatibility -/dev/bttv0 video0 symbolic Backward compatibility -/dev/radio radio0 symbolic Backward compatibility -/dev/i2o* /dev/i2o/* symbolic Backward compatibility -/dev/scd? sr? hard Alternate SCSI CD-ROM name - - Locally defined links - -The following links may be established locally to conform to the -configuration of the system. This is merely a tabulation of existing -practice, and does not constitute a recommendation. However, if they -exist, they should have the following uses. - -/dev/mouse mouse port symbolic Current mouse device -/dev/tape tape device symbolic Current tape device -/dev/cdrom CD-ROM device symbolic Current CD-ROM device -/dev/cdwriter CD-writer symbolic Current CD-writer device -/dev/scanner scanner symbolic Current scanner device -/dev/modem modem port symbolic Current dialout device -/dev/root root device symbolic Current root filesystem -/dev/swap swap device symbolic Current swap device - -/dev/modem should not be used for a modem which supports dialin as -well as dialout, as it tends to cause lock file problems. If it -exists, /dev/modem should point to the appropriate primary TTY device -(the use of the alternate callout devices is deprecated). - -For SCSI devices, /dev/tape and /dev/cdrom should point to the -``cooked'' devices (/dev/st* and /dev/sr*, respectively), whereas -/dev/cdwriter and /dev/scanner should point to the appropriate generic -SCSI devices (/dev/sg*). - -/dev/mouse may point to a primary serial TTY device, a hardware mouse -device, or a socket for a mouse driver program (e.g. /dev/gpmdata). - - Sockets and pipes - -Non-transient sockets and named pipes may exist in /dev. Common entries are: - -/dev/printer socket lpd local socket -/dev/log socket syslog local socket -/dev/gpmdata socket gpm mouse multiplexer - - Mount points - -The following names are reserved for mounting special filesystems -under /dev. These special filesystems provide kernel interfaces that -cannot be provided with standard device nodes. - -/dev/pts devpts PTY slave filesystem -/dev/shm tmpfs POSIX shared memory maintenance access - - **** TERMINAL DEVICES - -Terminal, or TTY devices are a special class of character devices. A -terminal device is any device that could act as a controlling terminal -for a session; this includes virtual consoles, serial ports, and -pseudoterminals (PTYs). - -All terminal devices share a common set of capabilities known as line -disciplines; these include the common terminal line discipline as well -as SLIP and PPP modes. - -All terminal devices are named similarly; this section explains the -naming and use of the various types of TTYs. Note that the naming -conventions include several historical warts; some of these are -Linux-specific, some were inherited from other systems, and some -reflect Linux outgrowing a borrowed convention. - -A hash mark (#) in a device name is used here to indicate a decimal -number without leading zeroes. - - Virtual consoles and the console device - -Virtual consoles are full-screen terminal displays on the system video -monitor. Virtual consoles are named /dev/tty#, with numbering -starting at /dev/tty1; /dev/tty0 is the current virtual console. -/dev/tty0 is the device that should be used to access the system video -card on those architectures for which the frame buffer devices -(/dev/fb*) are not applicable. Do not use /dev/console -for this purpose. - -The console device, /dev/console, is the device to which system -messages should be sent, and on which logins should be permitted in -single-user mode. Starting with Linux 2.1.71, /dev/console is managed -by the kernel; for previous versions it should be a symbolic link to -either /dev/tty0, a specific virtual console such as /dev/tty1, or to -a serial port primary (tty*, not cu*) device, depending on the -configuration of the system. - - Serial ports - -Serial ports are RS-232 serial ports and any device which simulates -one, either in hardware (such as internal modems) or in software (such -as the ISDN driver.) Under Linux, each serial ports has two device -names, the primary or callin device and the alternate or callout one. -Each kind of device is indicated by a different letter. For any -letter X, the names of the devices are /dev/ttyX# and /dev/cux#, -respectively; for historical reasons, /dev/ttyS# and /dev/ttyC# -correspond to /dev/cua# and /dev/cub#. In the future, it should be -expected that multiple letters will be used; all letters will be upper -case for the "tty" device (e.g. /dev/ttyDP#) and lower case for the -"cu" device (e.g. /dev/cudp#). - -The names /dev/ttyQ# and /dev/cuq# are reserved for local use. - -The alternate devices provide for kernel-based exclusion and somewhat -different defaults than the primary devices. Their main purpose is to -allow the use of serial ports with programs with no inherent or broken -support for serial ports. Their use is deprecated, and they may be -removed from a future version of Linux. - -Arbitration of serial ports is provided by the use of lock files with -the names /var/lock/LCK..ttyX#. The contents of the lock file should -be the PID of the locking process as an ASCII number. - -It is common practice to install links such as /dev/modem -which point to serial ports. In order to ensure proper locking in the -presence of these links, it is recommended that software chase -symlinks and lock all possible names; additionally, it is recommended -that a lock file be installed with the corresponding alternate -device. In order to avoid deadlocks, it is recommended that the locks -are acquired in the following order, and released in the reverse: - - 1. The symbolic link name, if any (/var/lock/LCK..modem) - 2. The "tty" name (/var/lock/LCK..ttyS2) - 3. The alternate device name (/var/lock/LCK..cua2) - -In the case of nested symbolic links, the lock files should be -installed in the order the symlinks are resolved. - -Under no circumstances should an application hold a lock while waiting -for another to be released. In addition, applications which attempt -to create lock files for the corresponding alternate device names -should take into account the possibility of being used on a non-serial -port TTY, for which no alternate device would exist. - - Pseudoterminals (PTYs) - -Pseudoterminals, or PTYs, are used to create login sessions or provide -other capabilities requiring a TTY line discipline (including SLIP or -PPP capability) to arbitrary data-generation processes. Each PTY has -a master side, named /dev/pty[p-za-e][0-9a-f], and a slave side, named -/dev/tty[p-za-e][0-9a-f]. The kernel arbitrates the use of PTYs by -allowing each master side to be opened only once. - -Once the master side has been opened, the corresponding slave device -can be used in the same manner as any TTY device. The master and -slave devices are connected by the kernel, generating the equivalent -of a bidirectional pipe with TTY capabilities. - -Recent versions of the Linux kernels and GNU libc contain support for -the System V/Unix98 naming scheme for PTYs, which assigns a common -device, /dev/ptmx, to all the masters (opening it will automatically -give you a previously unassigned PTY) and a subdirectory, /dev/pts, -for the slaves; the slaves are named with decimal integers (/dev/pts/# -in our notation). This removes the problem of exhausting the -namespace and enables the kernel to automatically create the device -nodes for the slaves on demand using the "devpts" filesystem. - diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst new file mode 100644 index 000000000000..88adcfdf5b2b --- /dev/null +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -0,0 +1,353 @@ +Dynamic debug ++++++++++++++ + + +Introduction +============ + +This document describes how to use the dynamic debug (dyndbg) feature. + +Dynamic debug is designed to allow you to dynamically enable/disable +kernel code to obtain additional kernel information. Currently, if +``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and +``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically +enabled per-callsite. + +If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just +shortcut for ``print_hex_dump(KERN_DEBUG)``. + +For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is +its ``prefix_str`` argument, if it is constant string; or ``hexdump`` +in case ``prefix_str`` is build dynamically. + +Dynamic debug has even more useful features: + + * Simple query language allows turning on and off debugging + statements by matching any combination of 0 or 1 of: + + - source filename + - function name + - line number (including ranges of line numbers) + - module name + - format string + + * Provides a debugfs control file: ``/dynamic_debug/control`` + which can be read to display the complete list of known debug + statements, to help guide you + +Controlling dynamic debug Behaviour +=================================== + +The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a +control file in the 'debugfs' filesystem. Thus, you must first mount +the debugfs filesystem, in order to make use of this feature. +Subsequently, we refer to the control file as: +``/dynamic_debug/control``. For example, if you want to enable +printing from source file ``svcsock.c``, line 1603 you simply do:: + + nullarbor:~ # echo 'file svcsock.c line 1603 +p' > + /dynamic_debug/control + +If you make a mistake with the syntax, the write will fail thus:: + + nullarbor:~ # echo 'file svcsock.c wtf 1 +p' > + /dynamic_debug/control + -bash: echo: write error: Invalid argument + +Viewing Dynamic Debug Behaviour +=============================== + +You can view the currently configured behaviour of all the debug +statements via:: + + nullarbor:~ # cat /dynamic_debug/control + # filename:lineno [module]function flags format + /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012" + /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012" + /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012" + /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012" + ... + + +You can also apply standard Unix text manipulation filters to this +data, e.g.:: + + nullarbor:~ # grep -i rdma /dynamic_debug/control | wc -l + 62 + + nullarbor:~ # grep -i tcp /dynamic_debug/control | wc -l + 42 + +The third column shows the currently enabled flags for each debug +statement callsite (see below for definitions of the flags). The +default value, with no flags enabled, is ``=_``. So you can view all +the debug statement callsites with any non-default flags:: + + nullarbor:~ # awk '$3 != "=_"' /dynamic_debug/control + # filename:lineno [module]function flags format + /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012" + +Command Language Reference +========================== + +At the lexical level, a command comprises a sequence of words separated +by spaces or tabs. So these are all equivalent:: + + nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' > + /dynamic_debug/control + nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' > + /dynamic_debug/control + nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > + /dynamic_debug/control + +Command submissions are bounded by a write() system call. +Multiple commands can be written together, separated by ``;`` or ``\n``:: + + ~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \ + > /dynamic_debug/control + +If your query set is big, you can batch them too:: + + ~# cat query-batch-file > /dynamic_debug/control + +A another way is to use wildcard. The match rule support ``*`` (matches +zero or more characters) and ``?`` (matches exactly one character).For +example, you can match all usb drivers:: + + ~# echo "file drivers/usb/* +p" > /dynamic_debug/control + +At the syntactical level, a command comprises a sequence of match +specifications, followed by a flags change specification:: + + command ::= match-spec* flags-spec + +The match-spec's are used to choose a subset of the known pr_debug() +callsites to which to apply the flags-spec. Think of them as a query +with implicit ANDs between each pair. Note that an empty list of +match-specs will select all debug statement callsites. + +A match specification comprises a keyword, which controls the +attribute of the callsite to be compared, and a value to compare +against. Possible keywords are::: + + match-spec ::= 'func' string | + 'file' string | + 'module' string | + 'format' string | + 'line' line-range + + line-range ::= lineno | + '-'lineno | + lineno'-' | + lineno'-'lineno + + lineno ::= unsigned-int + +.. note:: + + ``line-range`` cannot contain space, e.g. + "1-30" is valid range but "1 - 30" is not. + + +The meanings of each keyword are: + +func + The given string is compared against the function name + of each callsite. Example:: + + func svc_tcp_accept + +file + The given string is compared against either the full pathname, the + src-root relative pathname, or the basename of the source file of + each callsite. Examples:: + + file svcsock.c + file kernel/freezer.c + file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c + +module + The given string is compared against the module name + of each callsite. The module name is the string as + seen in ``lsmod``, i.e. without the directory or the ``.ko`` + suffix and with ``-`` changed to ``_``. Examples:: + + module sunrpc + module nfsd + +format + The given string is searched for in the dynamic debug format + string. Note that the string does not need to match the + entire format, only some part. Whitespace and other + special characters can be escaped using C octal character + escape ``\ooo`` notation, e.g. the space character is ``\040``. + Alternatively, the string can be enclosed in double quote + characters (``"``) or single quote characters (``'``). + Examples:: + + format svcrdma: // many of the NFS/RDMA server pr_debugs + format readahead // some pr_debugs in the readahead cache + format nfsd:\040SETATTR // one way to match a format with whitespace + format "nfsd: SETATTR" // a neater way to match a format with whitespace + format 'nfsd: SETATTR' // yet another way to match a format with whitespace + +line + The given line number or range of line numbers is compared + against the line number of each ``pr_debug()`` callsite. A single + line number matches the callsite line number exactly. A + range of line numbers matches any callsite between the first + and last line number inclusive. An empty first number means + the first line in the file, an empty line number means the + last number in the file. Examples:: + + line 1603 // exactly line 1603 + line 1600-1605 // the six lines from line 1600 to line 1605 + line -1605 // the 1605 lines from line 1 to line 1605 + line 1600- // all lines from line 1600 to the end of the file + +The flags specification comprises a change operation followed +by one or more flag characters. The change operation is one +of the characters:: + + - remove the given flags + + add the given flags + = set the flags to the given flags + +The flags are:: + + p enables the pr_debug() callsite. + f Include the function name in the printed message + l Include line number in the printed message + m Include module name in the printed message + t Include thread ID in messages not generated from interrupt context + _ No flags are set. (Or'd with others on input) + +For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag +have meaning, other flags ignored. + +For display, the flags are preceded by ``=`` +(mnemonic: what the flags are currently equal to). + +Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification. +To clear all flags at once, use ``=_`` or ``-flmpt``. + + +Debug messages during Boot Process +================================== + +To activate debug messages for core code and built-in modules during +the boot process, even before userspace and debugfs exists, use +``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"`` +(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows +the syntax described above, but must not exceed 1023 characters. Your +bootloader may impose lower limits. + +These ``dyndbg`` params are processed just after the ddebug tables are +processed, as part of the arch_initcall. Thus you can enable debug +messages in all code run after this arch_initcall via this boot +parameter. + +On an x86 system for example ACPI enablement is a subsys_initcall and:: + + dyndbg="file ec.c +p" + +will show early Embedded Controller transactions during ACPI setup if +your machine (typically a laptop) has an Embedded Controller. +PCI (or other devices) initialization also is a hot candidate for using +this boot parameter for debugging purposes. + +If ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at +boot time, without effect, but will be reprocessed when module is +loaded later. ``dyndbg_query=`` and bare ``dyndbg=`` are only processed at +boot. + + +Debug Messages at Module Initialization Time +============================================ + +When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for +``foo.params``, strips ``foo.``, and passes them to the kernel along with +params given in modprobe args or ``/etc/modprob.d/*.conf`` files, +in the following order: + +1. parameters given via ``/etc/modprobe.d/*.conf``:: + + options foo dyndbg=+pt + options foo dyndbg # defaults to +p + +2. ``foo.dyndbg`` as given in boot args, ``foo.`` is stripped and passed:: + + foo.dyndbg=" func bar +p; func buz +mp" + +3. args to modprobe:: + + modprobe foo dyndbg==pmf # override previous settings + +These ``dyndbg`` queries are applied in order, with last having final say. +This allows boot args to override or modify those from ``/etc/modprobe.d`` +(sensible, since 1 is system wide, 2 is kernel or boot specific), and +modprobe args to override both. + +In the ``foo.dyndbg="QUERY"`` form, the query must exclude ``module foo``. +``foo`` is extracted from the param-name, and applied to each query in +``QUERY``, and only 1 match-spec of each type is allowed. + +The ``dyndbg`` option is a "fake" module parameter, which means: + +- modules do not need to define it explicitly +- every module gets it tacitly, whether they use pr_debug or not +- it doesn't appear in ``/sys/module/$module/parameters/`` + To see it, grep the control file, or inspect ``/proc/cmdline.`` + +For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or +enabled by ``-DDEBUG`` flag during compilation) can be disabled later via +the sysfs interface if the debug messages are no longer needed:: + + echo "module module_name -p" > /dynamic_debug/control + +Examples +======== + +:: + + // enable the message at line 1603 of file svcsock.c + nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > + /dynamic_debug/control + + // enable all the messages in file svcsock.c + nullarbor:~ # echo -n 'file svcsock.c +p' > + /dynamic_debug/control + + // enable all the messages in the NFS server module + nullarbor:~ # echo -n 'module nfsd +p' > + /dynamic_debug/control + + // enable all 12 messages in the function svc_process() + nullarbor:~ # echo -n 'func svc_process +p' > + /dynamic_debug/control + + // disable all 12 messages in the function svc_process() + nullarbor:~ # echo -n 'func svc_process -p' > + /dynamic_debug/control + + // enable messages for NFS calls READ, READLINK, READDIR and READDIR+. + nullarbor:~ # echo -n 'format "nfsd: READ" +p' > + /dynamic_debug/control + + // enable messages in files of which the paths include string "usb" + nullarbor:~ # echo -n '*usb* +p' > /dynamic_debug/control + + // enable all messages + nullarbor:~ # echo -n '+p' > /dynamic_debug/control + + // add module, function to all enabled messages + nullarbor:~ # echo -n '+mf' > /dynamic_debug/control + + // boot-args example, with newlines and comments for readability + Kernel command line: ... + // see whats going on in dyndbg=value processing + dynamic_debug.verbose=1 + // enable pr_debugs in 2 builtins, #cmt is stripped + dyndbg="module params +p #cmt ; module sys +p" + // enable pr_debugs in 2 functions in a module loaded later + pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p" diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst new file mode 100644 index 000000000000..8ddae4e4299a --- /dev/null +++ b/Documentation/admin-guide/index.rst @@ -0,0 +1,69 @@ +The Linux kernel user's and administrator's guide +================================================= + +The following is a collection of user-oriented documents that have been +added to the kernel over time. There is, as yet, little overall order or +organization here — this material was not written to be a single, coherent +document! With luck things will improve quickly over time. + +This initial section contains overall information, including the README +file describing the kernel as a whole, documentation on kernel parameters, +etc. + +.. toctree:: + :maxdepth: 1 + + README + kernel-parameters + devices + +Here is a set of documents aimed at users who are trying to track down +problems and bugs in particular. + +.. toctree:: + :maxdepth: 1 + + reporting-bugs + security-bugs + bug-hunting + bug-bisect + tainted-kernels + ramoops + dynamic-debug-howto + init + +This is the beginning of a section with information of interest to +application developers. Documents covering various aspects of the kernel +ABI will be found here. + +.. toctree:: + :maxdepth: 1 + + sysfs-rules + +The rest of this manual consists of various unordered guides on how to +configure specific aspects of kernel behavior to your liking. + +.. toctree:: + :maxdepth: 1 + + initrd + serial-console + braille-console + parport + md + module-signing + sysrq + unicode + vga-softcursor + binfmt-misc + mono + java + ras + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/init.txt b/Documentation/admin-guide/init.rst similarity index 65% rename from Documentation/init.txt rename to Documentation/admin-guide/init.rst index 535ad5e82b98..e89d97f31eaf 100644 --- a/Documentation/init.txt +++ b/Documentation/admin-guide/init.rst @@ -5,6 +5,7 @@ OK, so you've got this pretty unintuitive message (currently located in init/main.c) and are wondering what the H*** went wrong. Some high-level reasons for failure (listed roughly in order of execution) to load the init binary are: + A) Unable to mount root FS B) init binary doesn't exist on rootfs C) broken console device @@ -12,37 +13,39 @@ D) binary exists but dependencies not available E) binary cannot be loaded Detailed explanations: -0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE) + +A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE) to get more detailed kernel messages. -A) make sure you have the correct root FS type - (and root= kernel parameter points to the correct partition), +B) make sure you have the correct root FS type + (and ``root=`` kernel parameter points to the correct partition), required drivers such as storage hardware (such as SCSI or USB!) and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules, to be pre-loaded by an initrd) -C) Possibly a conflict in console= setup --> initial console unavailable. +C) Possibly a conflict in ``console= setup`` --> initial console unavailable. E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing interrupt-based configuration). - Try using a different console= device or e.g. netconsole= . + Try using a different ``console= device`` or e.g. ``netconsole=``. D) e.g. required library dependencies of the init binary such as - /lib/ld-linux.so.2 missing or broken. Use readelf -d |grep NEEDED - to find out which libraries are required. + ``/lib/ld-linux.so.2`` missing or broken. Use + ``readelf -d |grep NEEDED`` to find out which libraries are required. E) make sure the binary's architecture matches your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware. In case you tried loading a non-binary file here (shell script?), you should make sure that the script specifies an interpreter in its shebang - header line (#!/...) that is fully working (including its library + header line (``#!/...``) that is fully working (including its library dependencies). And before tackling scripts, better first test a simple - non-script binary such as /bin/sh and confirm its successful execution. - To find out more, add code to init/main.c to display kernel_execve()s + non-script binary such as ``/bin/sh`` and confirm its successful execution. + To find out more, add code ``to init/main.c`` to display kernel_execve()s return values. Please extend this explanation whenever you find new failure causes (after all loading the init binary is a CRITICAL and hard transition step which needs to be made as painless as possible), then submit patch to LKML. Further TODOs: -- Implement the various run_init_process() invocations via a struct array - which can then store the kernel_execve() result value and on failure - log it all by iterating over _all_ results (very important usability fix). + +- Implement the various ``run_init_process()`` invocations via a struct array + which can then store the ``kernel_execve()`` result value and on failure + log it all by iterating over **all** results (very important usability fix). - try to make the implementation itself more helpful in general, e.g. by providing additional error messages at affected places. diff --git a/Documentation/initrd.txt b/Documentation/admin-guide/initrd.rst similarity index 70% rename from Documentation/initrd.txt rename to Documentation/admin-guide/initrd.rst index 4e1839ccb555..a03dabaaf3a3 100644 --- a/Documentation/initrd.txt +++ b/Documentation/admin-guide/initrd.rst @@ -2,7 +2,7 @@ Using the initial RAM disk (initrd) =================================== Written 1996,2000 by Werner Almesberger and - Hans Lermen +Hans Lermen initrd provides the capability to load a RAM disk by the boot loader. @@ -16,7 +16,7 @@ where the kernel comes up with a minimum set of compiled-in drivers, and where additional modules are loaded from initrd. This document gives a brief overview of the use of initrd. A more detailed -discussion of the boot process can be found in [1]. +discussion of the boot process can be found in [#f1]_. Operation @@ -27,10 +27,10 @@ When using initrd, the system typically boots as follows: 1) the boot loader loads the kernel and the initial RAM disk 2) the kernel converts initrd into a "normal" RAM disk and frees the memory used by initrd - 3) if the root device is not /dev/ram0, the old (deprecated) + 3) if the root device is not ``/dev/ram0``, the old (deprecated) change_root procedure is followed. see the "Obsolete root change mechanism" section below. - 4) root device is mounted. if it is /dev/ram0, the initrd image is + 4) root device is mounted. if it is ``/dev/ram0``, the initrd image is then mounted as root 5) /sbin/init is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything @@ -38,7 +38,7 @@ When using initrd, the system typically boots as follows: 6) init mounts the "real" root file system 7) init places the root file system at the root directory using the pivot_root system call - 8) init execs the /sbin/init on the new root filesystem, performing + 8) init execs the ``/sbin/init`` on the new root filesystem, performing the usual boot sequence 9) the initrd file system is removed @@ -51,7 +51,7 @@ be accessible. Boot command-line options ------------------------- -initrd adds the following new options: +initrd adds the following new options:: initrd= (e.g. LOADLIN) @@ -83,36 +83,36 @@ Recent kernels have support for populating a ramdisk from a compressed cpio archive. On such systems, the creation of a ramdisk image doesn't need to involve special block devices or loopbacks; you merely create a directory on disk with the desired initrd content, cd to that directory, and run (as an -example): +example):: -find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img + find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img -Examining the contents of an existing image file is just as simple: +Examining the contents of an existing image file is just as simple:: -mkdir /tmp/imagefile -cd /tmp/imagefile -gzip -cd /boot/imagefile.img | cpio -imd --quiet + mkdir /tmp/imagefile + cd /tmp/imagefile + gzip -cd /boot/imagefile.img | cpio -imd --quiet Installation ------------ First, a directory for the initrd file system has to be created on the -"normal" root file system, e.g. +"normal" root file system, e.g.:: -# mkdir /initrd + # mkdir /initrd -The name is not relevant. More details can be found on the pivot_root(2) -man page. +The name is not relevant. More details can be found on the +:manpage:`pivot_root(2)` man page. If the root file system is created during the boot procedure (i.e. if you're building an install floppy), the root file system creation -procedure should create the /initrd directory. +procedure should create the ``/initrd`` directory. If initrd will not be mounted in some cases, its content is still -accessible if the following device has been created: +accessible if the following device has been created:: -# mknod /dev/initrd b 1 250 -# chmod 400 /dev/initrd + # mknod /dev/initrd b 1 250 + # chmod 400 /dev/initrd Second, the kernel has to be compiled with RAM disk support and with support for the initial RAM disk enabled. Also, at least all components @@ -131,60 +131,76 @@ kernels, at least three types of devices are suitable for that: We'll describe the loopback device method: 1) make sure loopback block devices are configured into the kernel - 2) create an empty file system of the appropriate size, e.g. - # dd if=/dev/zero of=initrd bs=300k count=1 - # mke2fs -F -m0 initrd + 2) create an empty file system of the appropriate size, e.g.:: + + # dd if=/dev/zero of=initrd bs=300k count=1 + # mke2fs -F -m0 initrd + (if space is critical, you may want to use the Minix FS instead of Ext2) - 3) mount the file system, e.g. - # mount -t ext2 -o loop initrd /mnt - 4) create the console device: + 3) mount the file system, e.g.:: + + # mount -t ext2 -o loop initrd /mnt + + 4) create the console device:: + # mkdir /mnt/dev # mknod /mnt/dev/console c 5 1 + 5) copy all the files that are needed to properly use the initrd - environment. Don't forget the most important file, /sbin/init - Note that /sbin/init's permissions must include "x" (execute). + environment. Don't forget the most important file, ``/sbin/init`` + + .. note:: ``/sbin/init`` permissions must include "x" (execute). + 6) correct operation the initrd environment can frequently be tested - even without rebooting with the command - # chroot /mnt /sbin/init + even without rebooting with the command:: + + # chroot /mnt /sbin/init + This is of course limited to initrds that do not interfere with the general system state (e.g. by reconfiguring network interfaces, overwriting mounted devices, trying to start already running demons, etc. Note however that it is usually possible to use pivot_root in such a chroot'ed initrd environment.) - 7) unmount the file system - # umount /mnt + 7) unmount the file system:: + + # umount /mnt + 8) the initrd is now in the file "initrd". Optionally, it can now be - compressed - # gzip -9 initrd + compressed:: + + # gzip -9 initrd For experimenting with initrd, you may want to take a rescue floppy and -only add a symbolic link from /sbin/init to /bin/sh. Alternatively, you -can try the experimental newlib environment [2] to create a small +only add a symbolic link from ``/sbin/init`` to ``/bin/sh``. Alternatively, you +can try the experimental newlib environment [#f2]_ to create a small initrd. Finally, you have to boot the kernel and load initrd. Almost all Linux boot loaders support initrd. Since the boot process is still compatible with an older mechanism, the following boot command line parameters -have to be given: +have to be given:: root=/dev/ram0 rw (rw is only necessary if writing to the initrd file system.) -With LOADLIN, you simply execute +With LOADLIN, you simply execute:: LOADLIN initrd= -e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw -With LILO, you add the option INITRD= to either the global section -or to the section of the respective kernel in /etc/lilo.conf, and pass -the options using APPEND, e.g. +e.g.:: + + LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw + +With LILO, you add the option ``INITRD=`` to either the global section +or to the section of the respective kernel in ``/etc/lilo.conf``, and pass +the options using APPEND, e.g.:: image = /bzImage initrd = /boot/initrd.gz append = "root=/dev/ram0 rw" -and run /sbin/lilo +and run ``/sbin/lilo`` For other boot loaders, please refer to the respective documentation. @@ -204,33 +220,33 @@ The procedure involves the following steps: - unmounting the initrd file system and de-allocating the RAM disk Mounting the new root file system is easy: it just needs to be mounted on -a directory under the current root. Example: +a directory under the current root. Example:: -# mkdir /new-root -# mount -o ro /dev/hda1 /new-root + # mkdir /new-root + # mount -o ro /dev/hda1 /new-root The root change is accomplished with the pivot_root system call, which -is also available via the pivot_root utility (see pivot_root(8) man -page; pivot_root is distributed with util-linux version 2.10h or higher -[3]). pivot_root moves the current root to a directory under the new +is also available via the ``pivot_root`` utility (see :manpage:`pivot_root(8)` +man page; ``pivot_root`` is distributed with util-linux version 2.10h or higher +[#f3]_). ``pivot_root`` moves the current root to a directory under the new root, and puts the new root at its place. The directory for the old root -must exist before calling pivot_root. Example: +must exist before calling ``pivot_root``. Example:: -# cd /new-root -# mkdir initrd -# pivot_root . initrd + # cd /new-root + # mkdir initrd + # pivot_root . initrd Now, the init process may still access the old root via its executable, shared libraries, standard input/output/error, and its current root directory. All these references are dropped by the -following command: +following command:: -# exec chroot . what-follows dev/console 2>&1 + # exec chroot . what-follows dev/console 2>&1 -Where what-follows is a program under the new root, e.g. /sbin/init +Where what-follows is a program under the new root, e.g. ``/sbin/init`` If the new root file system will be used with udev and has no valid -/dev directory, udev must be initialized before invoking chroot in order -to provide /dev/console. +``/dev`` directory, udev must be initialized before invoking chroot in order +to provide ``/dev/console``. Note: implementation details of pivot_root may change with time. In order to ensure compatibility, the following points should be observed: @@ -244,13 +260,13 @@ to ensure compatibility, the following points should be observed: - use relative paths for dev/console in the exec command Now, the initrd can be unmounted and the memory allocated by the RAM -disk can be freed: +disk can be freed:: -# umount /initrd -# blockdev --flushbufs /dev/ram0 + # umount /initrd + # blockdev --flushbufs /dev/ram0 It is also possible to use initrd with an NFS-mounted root, see the -pivot_root(8) man page for details. +:manpage:`pivot_root(8)` man page for details. Usage scenarios @@ -263,21 +279,21 @@ as follows: 1) system boots from floppy or other media with a minimal kernel (e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and loads initrd - 2) /sbin/init determines what is needed to (1) mount the "real" root FS + 2) ``/sbin/init`` determines what is needed to (1) mount the "real" root FS (i.e. device type, device drivers, file system) and (2) the distribution media (e.g. CD-ROM, network, tape, ...). This can be done by asking the user, by auto-probing, or by using a hybrid approach. - 3) /sbin/init loads the necessary kernel modules - 4) /sbin/init creates and populates the root file system (this doesn't + 3) ``/sbin/init`` loads the necessary kernel modules + 4) ``/sbin/init`` creates and populates the root file system (this doesn't have to be a very usable system yet) - 5) /sbin/init invokes pivot_root to change the root file system and + 5) ``/sbin/init`` invokes ``pivot_root`` to change the root file system and execs - via chroot - a program that continues the installation 6) the boot loader is installed 7) the boot loader is configured to load an initrd with the set of - modules that was used to bring up the system (e.g. /initrd can be + modules that was used to bring up the system (e.g. ``/initrd`` can be modified, then unmounted, and finally, the image is written from - /dev/ram0 or /dev/rd/0 to a file) + ``/dev/ram0`` or ``/dev/rd/0`` to a file) 8) now the system is bootable and additional installation tasks can be performed @@ -290,7 +306,7 @@ different hardware configurations in a single administrative domain. In such cases, it is desirable to generate only a small set of kernels (ideally only one) and to keep the system-specific part of configuration information as small as possible. In this case, a common initrd could be -generated with all the necessary modules. Then, only /sbin/init or a file +generated with all the necessary modules. Then, only ``/sbin/init`` or a file read by it would have to be different. A third scenario is more convenient recovery disks, because information @@ -301,9 +317,9 @@ auto-detection). Last not least, CD-ROM distributors may use it for better installation from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk -via initrd from CD; or by booting via a loader like LOADLIN or directly +via initrd from CD; or by booting via a loader like ``LOADLIN`` or directly from the CD-ROM, and loading the RAM disk from CD without need of -floppies. +floppies. Obsolete root change mechanism @@ -316,51 +332,52 @@ continued availability. It works by mounting the "real" root device (i.e. the one set with rdev in the kernel image or with root=... at the boot command line) as the root file system when linuxrc exits. The initrd file system is then -unmounted, or, if it is still busy, moved to a directory /initrd, if +unmounted, or, if it is still busy, moved to a directory ``/initrd``, if such a directory exists on the new root file system. In order to use this mechanism, you do not have to specify the boot command options root, init, or rw. (If specified, they will affect the real root file system, not the initrd environment.) - + If /proc is mounted, the "real" root device can be changed from within linuxrc by writing the number of the new root FS device to the special -file /proc/sys/kernel/real-root-dev, e.g. +file /proc/sys/kernel/real-root-dev, e.g.:: # echo 0x301 >/proc/sys/kernel/real-root-dev Note that the mechanism is incompatible with NFS and similar file systems. -This old, deprecated mechanism is commonly called "change_root", while -the new, supported mechanism is called "pivot_root". +This old, deprecated mechanism is commonly called ``change_root``, while +the new, supported mechanism is called ``pivot_root``. Mixed change_root and pivot_root mechanism ------------------------------------------ -In case you did not want to use root=/dev/ram0 to trigger the pivot_root -mechanism, you may create both /linuxrc and /sbin/init in your initrd image. +In case you did not want to use ``root=/dev/ram0`` to trigger the pivot_root +mechanism, you may create both ``/linuxrc`` and ``/sbin/init`` in your initrd +image. -/linuxrc would contain only the following: +``/linuxrc`` would contain only the following:: -#! /bin/sh -mount -n -t proc proc /proc -echo 0x0100 >/proc/sys/kernel/real-root-dev -umount -n /proc + #! /bin/sh + mount -n -t proc proc /proc + echo 0x0100 >/proc/sys/kernel/real-root-dev + umount -n /proc Once linuxrc exited, the kernel would mount again your initrd as root, -this time executing /sbin/init. Again, it would be the duty of this init -to build the right environment (maybe using the root= device passed on -the cmdline) before the final execution of the real /sbin/init. +this time executing ``/sbin/init``. Again, it would be the duty of this init +to build the right environment (maybe using the ``root= device`` passed on +the cmdline) before the final execution of the real ``/sbin/init``. Resources --------- -[1] Almesberger, Werner; "Booting Linux: The History and the Future" +.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future" http://www.almesberger.net/cv/papers/ols2k-9.ps.gz -[2] newlib package (experimental), with initrd example - http://sources.redhat.com/newlib/ -[3] util-linux: Miscellaneous utilities for Linux - http://www.kernel.org/pub/linux/utils/util-linux/ +.. [#f2] newlib package (experimental), with initrd example + https://www.sourceware.org/newlib/ +.. [#f3] util-linux: Miscellaneous utilities for Linux + https://www.kernel.org/pub/linux/utils/util-linux/ diff --git a/Documentation/java.txt b/Documentation/admin-guide/java.rst similarity index 60% rename from Documentation/java.txt rename to Documentation/admin-guide/java.rst index 418020584ccc..8744e272e6f8 100644 --- a/Documentation/java.txt +++ b/Documentation/admin-guide/java.rst @@ -1,5 +1,5 @@ - Java(tm) Binary Kernel Support for Linux v1.03 - ---------------------------------------------- +Java(tm) Binary Kernel Support for Linux v1.03 +---------------------------------------------- Linux beats them ALL! While all other OS's are TALKING about direct support of Java Binaries in the OS, Linux is doing it! @@ -19,70 +19,82 @@ other program after you have done the following: as the application itself). 2) You have to compile BINFMT_MISC either as a module or into - the kernel (CONFIG_BINFMT_MISC) and set it up properly. + the kernel (``CONFIG_BINFMT_MISC``) and set it up properly. If you choose to compile it as a module, you will have to insert it manually with modprobe/insmod, as kmod - cannot easily be supported with binfmt_misc. + cannot easily be supported with binfmt_misc. Read the file 'binfmt_misc.txt' in this directory to know more about the configuration process. 3) Add the following configuration items to binfmt_misc - (you should really have read binfmt_misc.txt now): - support for Java applications: + (you should really have read ``binfmt_misc.txt`` now): + support for Java applications:: + ':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:' - support for executable Jar files: + + support for executable Jar files:: + ':ExecutableJAR:E::jar::/usr/local/bin/jarwrapper:' - support for Java Applets: + + support for Java Applets:: + ':Applet:E::html::/usr/bin/appletviewer:' - or the following, if you want to be more selective: + + or the following, if you want to be more selective:: + ':Applet:M:: in the first line - ('<' has to be the first character!) to let this work! + existing html-files to contain ```` in the first line + (``<`` has to be the first character!) to let this work! For the compiled Java programs you need a wrapper script like the following (this is because Java is broken in case of the filename handling), again fix the path names, both in the script and in the above given configuration string. - You, too, need the little program after the script. Compile like - gcc -O2 -o javaclassname javaclassname.c - and stick it to /usr/local/bin. + You, too, need the little program after the script. Compile like:: + + gcc -O2 -o javaclassname javaclassname.c + + and stick it to ``/usr/local/bin``. Both the javawrapper shellscript and the javaclassname program were supplied by Colin J. Watson . -====================== Cut here =================== -#!/bin/bash -# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java +Javawrapper shell script: + +.. code-block:: sh -if [ -z "$1" ]; then + #!/bin/bash + # /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java + + if [ -z "$1" ]; then exec 1>&2 echo Usage: $0 class-file exit 1 -fi + fi -CLASS=$1 -FQCLASS=`/usr/local/bin/javaclassname $1` -FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'` -FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'` + CLASS=$1 + FQCLASS=`/usr/local/bin/javaclassname $1` + FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'` + FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'` -# for example: -# CLASS=Test.class -# FQCLASS=foo.bar.Test -# FQCLASSN=Test -# FQCLASSP=foo/bar + # for example: + # CLASS=Test.class + # FQCLASS=foo.bar.Test + # FQCLASSN=Test + # FQCLASSP=foo/bar -unset CLASSBASE + unset CLASSBASE -declare -i LINKLEVEL=0 + declare -i LINKLEVEL=0 -while :; do + while :; do if [ "`basename $CLASS .class`" == "$FQCLASSN" ]; then # See if this directory works straight off cd -L `dirname $CLASS` @@ -119,9 +131,9 @@ while :; do exit 1 fi CLASS=`ls --color=no -l $CLASS | sed -e 's/^.* \([^ ]*\)$/\1/'` -done + done -if [ -z "$CLASSBASE" ]; then + if [ -z "$CLASSBASE" ]; then if [ -z "$FQCLASSP" ]; then GOODNAME=$FQCLASSN.class else @@ -131,96 +143,97 @@ if [ -z "$CLASSBASE" ]; then echo $0: echo " $FQCLASS should be in a file called $GOODNAME" exit 1 -fi + fi -if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then + if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then # class is not in CLASSPATH, so prepend dir of class to CLASSPATH if [ -z "${CLASSPATH}" ] ; then export CLASSPATH=$CLASSBASE else export CLASSPATH=$CLASSBASE:$CLASSPATH fi -fi - -shift -/usr/bin/java $FQCLASS "$@" -====================== Cut here =================== - - -====================== Cut here =================== -/* javaclassname.c - * - * Extracts the class name from a Java class file; intended for use in a Java - * wrapper of the type supported by the binfmt_misc option in the Linux kernel. - * - * Copyright (C) 1999 Colin J. Watson . - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include - -/* From Sun's Java VM Specification, as tag entries in the constant pool. */ - -#define CP_UTF8 1 -#define CP_INTEGER 3 -#define CP_FLOAT 4 -#define CP_LONG 5 -#define CP_DOUBLE 6 -#define CP_CLASS 7 -#define CP_STRING 8 -#define CP_FIELDREF 9 -#define CP_METHODREF 10 -#define CP_INTERFACEMETHODREF 11 -#define CP_NAMEANDTYPE 12 -#define CP_METHODHANDLE 15 -#define CP_METHODTYPE 16 -#define CP_INVOKEDYNAMIC 18 - -/* Define some commonly used error messages */ - -#define seek_error() error("%s: Cannot seek\n", program) -#define corrupt_error() error("%s: Class file corrupt\n", program) -#define eof_error() error("%s: Unexpected end of file\n", program) -#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program); - -char *program; - -long *pool; - -u_int8_t read_8(FILE *classfile); -u_int16_t read_16(FILE *classfile); -void skip_constant(FILE *classfile, u_int16_t *cur); -void error(const char *format, ...); -int main(int argc, char **argv); - -/* Reads in an unsigned 8-bit integer. */ -u_int8_t read_8(FILE *classfile) -{ + fi + + shift + /usr/bin/java $FQCLASS "$@" + +javaclassname.c: + +.. code-block:: c + + /* javaclassname.c + * + * Extracts the class name from a Java class file; intended for use in a Java + * wrapper of the type supported by the binfmt_misc option in the Linux kernel. + * + * Copyright (C) 1999 Colin J. Watson . + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + + #include + #include + #include + #include + + /* From Sun's Java VM Specification, as tag entries in the constant pool. */ + + #define CP_UTF8 1 + #define CP_INTEGER 3 + #define CP_FLOAT 4 + #define CP_LONG 5 + #define CP_DOUBLE 6 + #define CP_CLASS 7 + #define CP_STRING 8 + #define CP_FIELDREF 9 + #define CP_METHODREF 10 + #define CP_INTERFACEMETHODREF 11 + #define CP_NAMEANDTYPE 12 + #define CP_METHODHANDLE 15 + #define CP_METHODTYPE 16 + #define CP_INVOKEDYNAMIC 18 + + /* Define some commonly used error messages */ + + #define seek_error() error("%s: Cannot seek\n", program) + #define corrupt_error() error("%s: Class file corrupt\n", program) + #define eof_error() error("%s: Unexpected end of file\n", program) + #define utf8_error() error("%s: Only ASCII 1-255 supported\n", program); + + char *program; + + long *pool; + + u_int8_t read_8(FILE *classfile); + u_int16_t read_16(FILE *classfile); + void skip_constant(FILE *classfile, u_int16_t *cur); + void error(const char *format, ...); + int main(int argc, char **argv); + + /* Reads in an unsigned 8-bit integer. */ + u_int8_t read_8(FILE *classfile) + { int b = fgetc(classfile); if(b == EOF) eof_error(); return (u_int8_t)b; -} + } -/* Reads in an unsigned 16-bit integer. */ -u_int16_t read_16(FILE *classfile) -{ + /* Reads in an unsigned 16-bit integer. */ + u_int16_t read_16(FILE *classfile) + { int b1, b2; b1 = fgetc(classfile); if(b1 == EOF) @@ -229,11 +242,11 @@ u_int16_t read_16(FILE *classfile) if(b2 == EOF) eof_error(); return (u_int16_t)((b1 << 8) | b2); -} + } -/* Reads in a value from the constant pool. */ -void skip_constant(FILE *classfile, u_int16_t *cur) -{ + /* Reads in a value from the constant pool. */ + void skip_constant(FILE *classfile, u_int16_t *cur) + { u_int16_t len; int seekerr = 1; pool[*cur] = ftell(classfile); @@ -270,19 +283,19 @@ void skip_constant(FILE *classfile, u_int16_t *cur) } if(seekerr) seek_error(); -} + } -void error(const char *format, ...) -{ + void error(const char *format, ...) + { va_list ap; va_start(ap, format); vfprintf(stderr, format, ap); va_end(ap); exit(1); -} + } -int main(int argc, char **argv) -{ + int main(int argc, char **argv) + { FILE *classfile; u_int16_t cp_count, i, this_class, classinfo_ptr; u_int8_t length; @@ -349,19 +362,19 @@ int main(int argc, char **argv) free(pool); fclose(classfile); return 0; -} -====================== Cut here =================== + } + +jarwrapper:: + #!/bin/bash + # /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar -====================== Cut here =================== -#!/bin/bash -# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar + java -jar $1 -java -jar $1 -====================== Cut here =================== +Now simply ``chmod +x`` the ``.class``, ``.jar`` and/or ``.html`` files you +want to execute. -Now simply chmod +x the .class, .jar and/or .html files you want to execute. To add a Java program to your path best put a symbolic link to the main .class file into /usr/bin (or another place you like) omitting the .class extension. The directory containing the original .class file will be @@ -371,29 +384,36 @@ added to your CLASSPATH during execution. To test your new setup, enter in the following simple Java app, and name it "HelloWorld.java": +.. code-block:: java + class HelloWorld { public static void main(String args[]) { System.out.println("Hello World!"); } } -Now compile the application with: +Now compile the application with:: + javac HelloWorld.java -Set the executable permissions of the binary file, with: +Set the executable permissions of the binary file, with:: + chmod 755 HelloWorld.class -And then execute it: +And then execute it:: + ./HelloWorld.class -To execute Java Jar files, simple chmod the *.jar files to include -the execution bit, then just do +To execute Java Jar files, simple chmod the ``*.jar`` files to include +the execution bit, then just do:: + ./Application.jar -To execute Java Applets, simple chmod the *.html files to include -the execution bit, then just do +To execute Java Applets, simple chmod the ``*.html`` files to include +the execution bit, then just do:: + ./Applet.html @@ -401,4 +421,3 @@ originally by Brian A. Lantz, brian@lantz.com heavily edited for binfmt_misc by Richard Günther new scripts by Colin J. Watson added executable Jar file support by Kurt Huwig - diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst new file mode 100644 index 000000000000..b516164999a8 --- /dev/null +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -0,0 +1,209 @@ +The kernel's command-line parameters +==================================== + +The following is a consolidated list of the kernel parameters as +implemented by the __setup(), core_param() and module_param() macros +and sorted into English Dictionary order (defined as ignoring all +punctuation and sorting digits before letters in a case insensitive +manner), and with descriptions where known. + +The kernel parses parameters from the kernel command line up to "--"; +if it doesn't recognize a parameter and it doesn't contain a '.', the +parameter gets passed to init: parameters with '=' go into init's +environment, others are passed as command line arguments to init. +Everything after "--" is passed as an argument to init. + +Module parameters can be specified in two ways: via the kernel command +line with a module name prefix, or via modprobe, e.g.:: + + (kernel command line) usbcore.blinkenlights=1 + (modprobe command line) modprobe usbcore blinkenlights=1 + +Parameters for modules which are built into the kernel need to be +specified on the kernel command line. modprobe looks through the +kernel command line (/proc/cmdline) and collects module parameters +when it loads a module, so the kernel command line can be used for +loadable modules too. + +Hyphens (dashes) and underscores are equivalent in parameter names, so:: + + log_buf_len=1M print-fatal-signals=1 + +can also be entered as:: + + log-buf-len=1M print_fatal_signals=1 + +Double-quotes can be used to protect spaces in values, e.g.:: + + param="spaces in here" + +cpu lists: +---------- + +Some kernel parameters take a list of CPUs as a value, e.g. isolcpus, +nohz_full, irqaffinity, rcu_nocbs. The format of this list is: + + ,..., + +or + + - + (must be a positive range in ascending order) + +or a mixture + +,...,- + +Note that for the special case of a range one can split the range into equal +sized groups and for each group use some amount from the beginning of that +group: + + -cpu number>:/ + +For example one can add to the command line following parameter: + + isolcpus=1,2,10-20,100-2000:2/25 + +where the final item represents CPUs 100,101,125,126,150,151,... + + + +This document may not be entirely up to date and comprehensive. The command +"modinfo -p ${modulename}" shows a current list of all parameters of a loadable +module. Loadable modules, after being loaded into the running kernel, also +reveal their parameters in /sys/module/${modulename}/parameters/. Some of these +parameters may be changed at runtime by the command +``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``. + +The parameters listed below are only valid if certain kernel build options were +enabled and if respective hardware is present. The text in square brackets at +the beginning of each description states the restrictions within which a +parameter is applicable:: + + ACPI ACPI support is enabled. + AGP AGP (Accelerated Graphics Port) is enabled. + ALSA ALSA sound support is enabled. + APIC APIC support is enabled. + APM Advanced Power Management support is enabled. + ARM ARM architecture is enabled. + AVR32 AVR32 architecture is enabled. + AX25 Appropriate AX.25 support is enabled. + BLACKFIN Blackfin architecture is enabled. + CLK Common clock infrastructure is enabled. + CMA Contiguous Memory Area support is enabled. + DRM Direct Rendering Management support is enabled. + DYNAMIC_DEBUG Build in debug messages and enable them at runtime + EDD BIOS Enhanced Disk Drive Services (EDD) is enabled + EFI EFI Partitioning (GPT) is enabled + EIDE EIDE/ATAPI support is enabled. + EVM Extended Verification Module + FB The frame buffer device is enabled. + FTRACE Function tracing enabled. + GCOV GCOV profiling is enabled. + HW Appropriate hardware is enabled. + IA-64 IA-64 architecture is enabled. + IMA Integrity measurement architecture is enabled. + IOSCHED More than one I/O scheduler is enabled. + IP_PNP IP DHCP, BOOTP, or RARP is enabled. + IPV6 IPv6 support is enabled. + ISAPNP ISA PnP code is enabled. + ISDN Appropriate ISDN support is enabled. + JOY Appropriate joystick support is enabled. + KGDB Kernel debugger support is enabled. + KVM Kernel Virtual Machine support is enabled. + LIBATA Libata driver is enabled + LP Printer support is enabled. + LOOP Loopback device support is enabled. + M68k M68k architecture is enabled. + These options have more detailed description inside of + Documentation/m68k/kernel-options.txt. + MDA MDA console support is enabled. + MIPS MIPS architecture is enabled. + MOUSE Appropriate mouse support is enabled. + MSI Message Signaled Interrupts (PCI). + MTD MTD (Memory Technology Device) support is enabled. + NET Appropriate network support is enabled. + NUMA NUMA support is enabled. + NFS Appropriate NFS support is enabled. + OSS OSS sound support is enabled. + PV_OPS A paravirtualized kernel is enabled. + PARIDE The ParIDE (parallel port IDE) subsystem is enabled. + PARISC The PA-RISC architecture is enabled. + PCI PCI bus support is enabled. + PCIE PCI Express support is enabled. + PCMCIA The PCMCIA subsystem is enabled. + PNP Plug & Play support is enabled. + PPC PowerPC architecture is enabled. + PPT Parallel port support is enabled. + PS2 Appropriate PS/2 support is enabled. + RAM RAM disk support is enabled. + S390 S390 architecture is enabled. + SCSI Appropriate SCSI support is enabled. + A lot of drivers have their options described inside + the Documentation/scsi/ sub-directory. + SECURITY Different security models are enabled. + SELINUX SELinux support is enabled. + APPARMOR AppArmor support is enabled. + SERIAL Serial support is enabled. + SH SuperH architecture is enabled. + SMP The kernel is an SMP kernel. + SPARC Sparc architecture is enabled. + SWSUSP Software suspend (hibernation) is enabled. + SUSPEND System suspend states are enabled. + TPM TPM drivers are enabled. + TS Appropriate touchscreen support is enabled. + UMS USB Mass Storage support is enabled. + USB USB support is enabled. + USBHID USB Human Interface Device support is enabled. + V4L Video For Linux support is enabled. + VMMIO Driver for memory mapped virtio devices is enabled. + VGA The VGA console has been enabled. + VT Virtual terminal support is enabled. + WDT Watchdog support is enabled. + XT IBM PC/XT MFM hard disk support is enabled. + X86-32 X86-32, aka i386 architecture is enabled. + X86-64 X86-64 architecture is enabled. + More X86-64 boot options can be found in + Documentation/x86/x86_64/boot-options.txt . + X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64) + X86_UV SGI UV support is enabled. + XEN Xen support is enabled + +In addition, the following text indicates that the option:: + + BUGS= Relates to possible processor bugs on the said processor. + KNL Is a kernel start-up parameter. + BOOT Is a boot loader parameter. + +Parameters denoted with BOOT are actually interpreted by the boot +loader, and have no meaning to the kernel directly. +Do not modify the syntax of boot loader parameters without extreme +need or coordination with . + +There are also arch-specific kernel-parameters not documented here. +See for example . + +Note that ALL kernel parameters listed below are CASE SENSITIVE, and that +a trailing = on the name of any parameter states that that parameter will +be entered as an environment variable, whereas its absence indicates that +it will appear as a kernel argument readable via /proc/cmdline by programs +running once the system is up. + +The number of kernel parameters is not limited, but the length of the +complete command line (parameters including spaces etc.) is limited to +a fixed number of characters. This limit depends on the architecture +and is between 256 and 4096 characters. It is defined in the file +./include/asm/setup.h as COMMAND_LINE_SIZE. + +Finally, the [KMG] suffix is commonly described after a number of kernel +parameter values. These 'K', 'M', and 'G' letters represent the _binary_ +multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30 +bytes respectively. Such letter suffixes can also be entirely omitted: + +.. include:: kernel-parameters.txt + :literal: + +Todo +---- + + Add more DRM drivers. diff --git a/Documentation/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt similarity index 94% rename from Documentation/kernel-parameters.txt rename to Documentation/admin-guide/kernel-parameters.txt index 37babf91f2cb..21e2d8863705 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1,202 +1,3 @@ - Kernel Parameters - ~~~~~~~~~~~~~~~~~ - -The following is a consolidated list of the kernel parameters as -implemented by the __setup(), core_param() and module_param() macros -and sorted into English Dictionary order (defined as ignoring all -punctuation and sorting digits before letters in a case insensitive -manner), and with descriptions where known. - -The kernel parses parameters from the kernel command line up to "--"; -if it doesn't recognize a parameter and it doesn't contain a '.', the -parameter gets passed to init: parameters with '=' go into init's -environment, others are passed as command line arguments to init. -Everything after "--" is passed as an argument to init. - -Module parameters can be specified in two ways: via the kernel command -line with a module name prefix, or via modprobe, e.g.: - - (kernel command line) usbcore.blinkenlights=1 - (modprobe command line) modprobe usbcore blinkenlights=1 - -Parameters for modules which are built into the kernel need to be -specified on the kernel command line. modprobe looks through the -kernel command line (/proc/cmdline) and collects module parameters -when it loads a module, so the kernel command line can be used for -loadable modules too. - -Hyphens (dashes) and underscores are equivalent in parameter names, so - log_buf_len=1M print-fatal-signals=1 -can also be entered as - log-buf-len=1M print_fatal_signals=1 - -Double-quotes can be used to protect spaces in values, e.g.: - param="spaces in here" - -cpu lists: ----------- - -Some kernel parameters take a list of CPUs as a value, e.g. isolcpus, -nohz_full, irqaffinity, rcu_nocbs. The format of this list is: - - ,..., - -or - - - - (must be a positive range in ascending order) - -or a mixture - -,...,- - -Note that for the special case of a range one can split the range into equal -sized groups and for each group use some amount from the beginning of that -group: - - -cpu number>:/ - -For example one can add to the command line following parameter: - - isolcpus=1,2,10-20,100-2000:2/25 - -where the final item represents CPUs 100,101,125,126,150,151,... - - - -This document may not be entirely up to date and comprehensive. The command -"modinfo -p ${modulename}" shows a current list of all parameters of a loadable -module. Loadable modules, after being loaded into the running kernel, also -reveal their parameters in /sys/module/${modulename}/parameters/. Some of these -parameters may be changed at runtime by the command -"echo -n ${value} > /sys/module/${modulename}/parameters/${parm}". - -The parameters listed below are only valid if certain kernel build options were -enabled and if respective hardware is present. The text in square brackets at -the beginning of each description states the restrictions within which a -parameter is applicable: - - ACPI ACPI support is enabled. - AGP AGP (Accelerated Graphics Port) is enabled. - ALSA ALSA sound support is enabled. - APIC APIC support is enabled. - APM Advanced Power Management support is enabled. - ARM ARM architecture is enabled. - AVR32 AVR32 architecture is enabled. - AX25 Appropriate AX.25 support is enabled. - BLACKFIN Blackfin architecture is enabled. - CLK Common clock infrastructure is enabled. - CMA Contiguous Memory Area support is enabled. - DRM Direct Rendering Management support is enabled. - DYNAMIC_DEBUG Build in debug messages and enable them at runtime - EDD BIOS Enhanced Disk Drive Services (EDD) is enabled - EFI EFI Partitioning (GPT) is enabled - EIDE EIDE/ATAPI support is enabled. - EVM Extended Verification Module - FB The frame buffer device is enabled. - FTRACE Function tracing enabled. - GCOV GCOV profiling is enabled. - HW Appropriate hardware is enabled. - IA-64 IA-64 architecture is enabled. - IMA Integrity measurement architecture is enabled. - IOSCHED More than one I/O scheduler is enabled. - IP_PNP IP DHCP, BOOTP, or RARP is enabled. - IPV6 IPv6 support is enabled. - ISAPNP ISA PnP code is enabled. - ISDN Appropriate ISDN support is enabled. - JOY Appropriate joystick support is enabled. - KGDB Kernel debugger support is enabled. - KVM Kernel Virtual Machine support is enabled. - LIBATA Libata driver is enabled - LP Printer support is enabled. - LOOP Loopback device support is enabled. - M68k M68k architecture is enabled. - These options have more detailed description inside of - Documentation/m68k/kernel-options.txt. - MDA MDA console support is enabled. - MIPS MIPS architecture is enabled. - MOUSE Appropriate mouse support is enabled. - MSI Message Signaled Interrupts (PCI). - MTD MTD (Memory Technology Device) support is enabled. - NET Appropriate network support is enabled. - NUMA NUMA support is enabled. - NFS Appropriate NFS support is enabled. - OSS OSS sound support is enabled. - PV_OPS A paravirtualized kernel is enabled. - PARIDE The ParIDE (parallel port IDE) subsystem is enabled. - PARISC The PA-RISC architecture is enabled. - PCI PCI bus support is enabled. - PCIE PCI Express support is enabled. - PCMCIA The PCMCIA subsystem is enabled. - PNP Plug & Play support is enabled. - PPC PowerPC architecture is enabled. - PPT Parallel port support is enabled. - PS2 Appropriate PS/2 support is enabled. - RAM RAM disk support is enabled. - S390 S390 architecture is enabled. - SCSI Appropriate SCSI support is enabled. - A lot of drivers have their options described inside - the Documentation/scsi/ sub-directory. - SECURITY Different security models are enabled. - SELINUX SELinux support is enabled. - APPARMOR AppArmor support is enabled. - SERIAL Serial support is enabled. - SH SuperH architecture is enabled. - SMP The kernel is an SMP kernel. - SPARC Sparc architecture is enabled. - SWSUSP Software suspend (hibernation) is enabled. - SUSPEND System suspend states are enabled. - TPM TPM drivers are enabled. - TS Appropriate touchscreen support is enabled. - UMS USB Mass Storage support is enabled. - USB USB support is enabled. - USBHID USB Human Interface Device support is enabled. - V4L Video For Linux support is enabled. - VMMIO Driver for memory mapped virtio devices is enabled. - VGA The VGA console has been enabled. - VT Virtual terminal support is enabled. - WDT Watchdog support is enabled. - XT IBM PC/XT MFM hard disk support is enabled. - X86-32 X86-32, aka i386 architecture is enabled. - X86-64 X86-64 architecture is enabled. - More X86-64 boot options can be found in - Documentation/x86/x86_64/boot-options.txt . - X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64) - X86_UV SGI UV support is enabled. - XEN Xen support is enabled - -In addition, the following text indicates that the option: - - BUGS= Relates to possible processor bugs on the said processor. - KNL Is a kernel start-up parameter. - BOOT Is a boot loader parameter. - -Parameters denoted with BOOT are actually interpreted by the boot -loader, and have no meaning to the kernel directly. -Do not modify the syntax of boot loader parameters without extreme -need or coordination with . - -There are also arch-specific kernel-parameters not documented here. -See for example . - -Note that ALL kernel parameters listed below are CASE SENSITIVE, and that -a trailing = on the name of any parameter states that that parameter will -be entered as an environment variable, whereas its absence indicates that -it will appear as a kernel argument readable via /proc/cmdline by programs -running once the system is up. - -The number of kernel parameters is not limited, but the length of the -complete command line (parameters including spaces etc.) is limited to -a fixed number of characters. This limit depends on the architecture -and is between 256 and 4096 characters. It is defined in the file -./include/asm/setup.h as COMMAND_LINE_SIZE. - -Finally, the [KMG] suffix is commonly described after a number of kernel -parameter values. These 'K', 'M', and 'G' letters represent the _binary_ -multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30 -bytes respectively. Such letter suffixes can also be entirely omitted. - - acpi= [HW,ACPI,X86,ARM64] Advanced Configuration and Power Interface Format: { force | on | off | strict | noirq | rsdt | @@ -811,7 +612,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. bits, and "f" is flow control ("r" for RTS or omit it). Default is "9600n8". - See Documentation/serial-console.txt for more + See Documentation/admin-guide/serial-console.rst for more information. See Documentation/networking/netconsole.txt for an alternative. @@ -1062,6 +863,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. dscc4.setup= [NET] + dump_apple_properties [X86] + Dump name and content of EFI device properties on + x86 Macs. Useful for driver authors to determine + what data is available or for reverse-engineering. + dyndbg[="val"] [KNL,DYNAMIC_DEBUG] module.dyndbg[="val"] Enable debug messages at boot time. See @@ -1074,12 +880,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nopku [X86] Disable Memory Protection Keys CPU feature found in some Intel CPUs. - eagerfpu= [X86] - on enable eager fpu restore - off disable eager fpu restore - auto selects the default scheme, which automatically - enables eagerfpu restore for xsaveopt. - module.async_probe [KNL] Enable asynchronous probe on this module. @@ -1641,6 +1441,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. The builtin appraise policy appraises all files owned by uid=0. + ima_canonical_fmt [IMA] + Use the canonical format for the binary runtime + measurements, instead of host native format. + ima_hash= [IMA] Format: { md5 | sha1 | rmd160 | sha256 | sha384 | sha512 | ... } @@ -1760,6 +1564,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. disable Do not enable intel_pstate as the default scaling driver for the supported processors + passive + Use intel_pstate as a scaling driver, but configure it + to work with generic cpufreq governors (instead of + enabling its internal governor). This mode cannot be + used along with the hardware-managed P-states (HWP) + feature. force Enable intel_pstate on systems that prohibit it by default in favor of acpi-cpufreq. Forcing the intel_pstate driver @@ -1780,6 +1590,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Description Table, specifies preferred power management profile as "Enterprise Server" or "Performance Server", then this feature is turned on by default. + per_cpu_perf_limits + Allow per-logical-CPU P-State performance control limits using + cpufreq sysfs interface intremap= [X86-64, Intel-IOMMU] on enable Interrupt Remapping (default) @@ -1958,9 +1771,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. kmemcheck=2 (one-shot mode) Default: 2 (one-shot mode) - kstack=N [X86] Print N words from the kernel stack - in oops dumps. - kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) @@ -2235,7 +2045,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt md= [HW] RAID subsystems devices and level - See Documentation/md.txt. + See Documentation/admin-guide/md.rst. mdacon= [MDA] Format: , @@ -2325,6 +2135,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. memory contents and reserves bad memory regions that are detected. + mem_sleep_default= [SUSPEND] Default system suspend mode: + s2idle - Suspend-To-Idle + shallow - Power-On Suspend or equivalent (if supported) + deep - Suspend-To-RAM or equivalent (if supported) + See Documentation/power/states.txt. + meye.*= [HW] Set MotionEye Camera parameters See Documentation/video4linux/meye.txt. @@ -2401,7 +2217,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. that the amount of memory usable for all allocations is not too small. - movable_node [KNL,X86] Boot-time switch to enable the effects + movable_node [KNL] Boot-time switch to enable the effects of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details. MTD_Partition= [MTD] @@ -2545,7 +2361,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. will be sent. The default is to send the implementation identification information. - + nfs.recover_lost_locks = [NFSv4] Attempt to recover locks that were lost due to a lease timeout on the server. Please note that @@ -2754,6 +2570,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page fault handling. + no-vmw-sched-clock + [X86,PV_OPS] Disable paravirtualized VMware scheduler + clock and use the default one. + no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting. steal time is computed, but won't influence scheduler behaviour @@ -3235,6 +3055,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. may be specified. Format: ,.... + powersave=off [PPC] This option disables power saving features. + It specifically disables cpuidle and sets the + platform machine description specific power_save + function to NULL. On Idle the CPU just reduces + execution priority. + ppc_strict_facility_enable [PPC] This option catches any kernel floating point, Altivec, VSX and SPE outside of regions specifically @@ -3318,7 +3144,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. r128= [HW,DRM] raid= [HW,RAID] - See Documentation/md.txt. + See Documentation/admin-guide/md.rst. ramdisk_size= [RAM] Sizes of RAM disks in kilobytes See Documentation/blockdev/ramdisk.txt. @@ -3668,13 +3494,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. [KNL, SMP] Set scheduler's default relax_domain_level. See Documentation/cgroup-v1/cpusets.txt. - relative_sleep_states= - [SUSPEND] Use sleep state labeling where the deepest - state available other than hibernation is always "mem". - Format: { "0" | "1" } - 0 -- Traditional sleep state labels. - 1 -- Relative sleep state labels. - reserve= [KNL,BUGS] Force the kernel to ignore some iomem area reservetop= [X86-32] @@ -3824,12 +3643,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. shapers= [NET] Maximal number of shapers. - show_msr= [x86] show boot-time MSR settings - Format: { } - Show boot-time (BIOS-initialized) MSR settings. - The parameter means the number of CPUs to show, - for example 1 means boot CPU only. - simeth= [IA-64] simscsi= @@ -4197,7 +4010,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. See also Documentation/input/joystick-parport.txt udbg-immortal [PPC] When debugging early kernel crashes that - happen after console_init() and before a proper + happen after console_init() and before a proper console driver takes over, this boot options might help "seeing" what's going on. @@ -4564,9 +4377,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted. xirc2ps_cs= [NET,PCMCIA] Format: ,,,,,[,[,[,]]] - -______________________________________________________________________ - -TODO: - - Add more DRM drivers. diff --git a/Documentation/md.txt b/Documentation/admin-guide/md.rst similarity index 60% rename from Documentation/md.txt rename to Documentation/admin-guide/md.rst index d6e2fcf27337..e449fb5f277c 100644 --- a/Documentation/md.txt +++ b/Documentation/admin-guide/md.rst @@ -1,42 +1,77 @@ -Tools that manage md devices can be found at - http://www.kernel.org/pub/linux/utils/raid/ - +RAID arrays +=========== Boot time assembly of RAID arrays --------------------------------- +Tools that manage md devices can be found at + http://www.kernel.org/pub/linux/utils/raid/ + + You can boot with your md device with the following kernel command lines: -for old raid arrays without persistent superblocks: +for old raid arrays without persistent superblocks:: + md=,,,,dev0,dev1,...,devn -for raid arrays with persistent superblocks +for raid arrays with persistent superblocks:: + md=,dev0,dev1,...,devn -or, to assemble a partitionable array: + +or, to assemble a partitionable array:: + md=d,dev0,dev1,...,devn - -md device no. = the number of the md device ... - 0 means md0, - 1 md1, - 2 md2, - 3 md3, - 4 md4 - -raid level = -1 linear mode - 0 striped mode - other modes are only supported with persistent super blocks - -chunk size factor = (raid-0 and raid-1 only) - Set the chunk size as 4k << n. - -fault level = totally ignored - -dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1 - -A possible loadlin line (Harald Hoyer ) looks like this: - -e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro + +``md device no.`` ++++++++++++++++++ + +The number of the md device + +================= ========= +``md device no.`` device +================= ========= + 0 md0 + 1 md1 + 2 md2 + 3 md3 + 4 md4 +================= ========= + +``raid level`` +++++++++++++++ + +level of the RAID array + +=============== ============= +``raid level`` level +=============== ============= +-1 linear mode +0 striped mode +=============== ============= + +other modes are only supported with persistent super blocks + +``chunk size factor`` ++++++++++++++++++++++ + +(raid-0 and raid-1 only) + +Set the chunk size as 4k << n. + +``fault level`` ++++++++++++++++ + +Totally ignored + +``dev0`` to ``devn`` +++++++++++++++++++++ + +e.g. ``/dev/hda1``, ``/dev/hdc1``, ``/dev/sda1``, ``/dev/sdb1`` + +A possible loadlin line (Harald Hoyer ) looks like this:: + + e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro Boot time autodetection of RAID arrays @@ -45,10 +80,10 @@ Boot time autodetection of RAID arrays When md is compiled into the kernel (not as module), partitions of type 0xfd are scanned and automatically assembled into RAID arrays. This autodetection may be suppressed with the kernel parameter -"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0 +``raid=noautodetect``. As of kernel 2.6.9, only drives with a type 0 superblock can be autodetected and run at boot time. -The kernel parameter "raid=partitionable" (or "raid=part") means +The kernel parameter ``raid=partitionable`` (or ``raid=part``) means that all auto-detected arrays are assembled as partitionable. Boot time assembly of degraded/dirty arrays @@ -56,22 +91,23 @@ Boot time assembly of degraded/dirty arrays If a raid5 or raid6 array is both dirty and degraded, it could have undetectable data corruption. This is because the fact that it is -'dirty' means that the parity cannot be trusted, and the fact that it +``dirty`` means that the parity cannot be trusted, and the fact that it is degraded means that some datablocks are missing and cannot reliably be reconstructed (due to no parity). For this reason, md will normally refuse to start such an array. This requires the sysadmin to take action to explicitly start the array -despite possible corruption. This is normally done with +despite possible corruption. This is normally done with:: + mdadm --assemble --force .... This option is not really available if the array has the root filesystem on it. In order to support this booting from such an -array, md supports a module parameter "start_dirty_degraded" which, +array, md supports a module parameter ``start_dirty_degraded`` which, when set to 1, bypassed the checks and will allows dirty degraded arrays to be started. -So, to boot with a root filesystem of a dirty degraded raid[56], use +So, to boot with a root filesystem of a dirty degraded raid 5 or 6, use:: md-mod.start_dirty_degraded=1 @@ -80,30 +116,30 @@ Superblock formats ------------------ The md driver can support a variety of different superblock formats. -Currently, it supports superblock formats "0.90.0" and the "md-1" format +Currently, it supports superblock formats ``0.90.0`` and the ``md-1`` format introduced in the 2.5 development series. The kernel will autodetect which format superblock is being used. -Superblock format '0' is treated differently to others for legacy +Superblock format ``0`` is treated differently to others for legacy reasons - it is the original superblock format. General Rules - apply for all superblock formats ------------------------------------------------ -An array is 'created' by writing appropriate superblocks to all +An array is ``created`` by writing appropriate superblocks to all devices. -It is 'assembled' by associating each of these devices with an +It is ``assembled`` by associating each of these devices with an particular md virtual device. Once it is completely assembled, it can be accessed. An array should be created by a user-space tool. This will write superblocks to all devices. It will usually mark the array as -'unclean', or with some devices missing so that the kernel md driver -can create appropriate redundancy (copying in raid1, parity -calculation in raid4/5). +``unclean``, or with some devices missing so that the kernel md driver +can create appropriate redundancy (copying in raid 1, parity +calculation in raid 4/5). When an array is assembled, it is first initialized with the SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor @@ -126,13 +162,12 @@ Devices that have failed or are not yet active can be detached from an array using HOT_REMOVE_DISK. -Specific Rules that apply to format-0 super block arrays, and - arrays with no superblock (non-persistent). -------------------------------------------------------------- +Specific Rules that apply to format-0 super block arrays, and arrays with no superblock (non-persistent) +-------------------------------------------------------------------------------------------------------- -An array can be 'created' by describing the array (level, chunksize -etc) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and -raid_disks != 0. +An array can be ``created`` by describing the array (level, chunksize +etc) in a SET_ARRAY_INFO ioctl. This must have ``major_version==0`` and +``raid_disks != 0``. Then uninitialized devices can be added with ADD_NEW_DISK. The structure passed to ADD_NEW_DISK must specify the state of the device @@ -142,24 +177,26 @@ Once started with RUN_ARRAY, uninitialized spares can be added with HOT_ADD_DISK. - MD devices in sysfs ------------------- -md devices appear in sysfs (/sys) as regular block devices, -e.g. + +md devices appear in sysfs (``/sys``) as regular block devices, +e.g.:: + /sys/block/md0 -Each 'md' device will contain a subdirectory called 'md' which +Each ``md`` device will contain a subdirectory called ``md`` which contains further md-specific information about the device. All md devices contain: + level - a text file indicating the 'raid level'. e.g. raid0, raid1, + a text file indicating the ``raid level``. e.g. raid0, raid1, raid5, linear, multipath, faulty. If no raid level has been set yet (array is still being assembled), the value will reflect whatever has been written to it, which may be a name like the above, or may be a number - such as '0', '5', etc. + such as ``0``, ``5``, etc. raid_disks a text file with a simple number indicating the number of devices @@ -172,10 +209,10 @@ All md devices contain: A change to this attribute will not be permitted if it would reduce the size of the array. To reduce the number of drives in an e.g. raid5, the array size must first be reduced by - setting the 'array_size' attribute. + setting the ``array_size`` attribute. chunk_size - This is the size in bytes for 'chunks' and is only relevant to + This is the size in bytes for ``chunks`` and is only relevant to raid levels that involve striping (0,4,5,6,10). The address space of the array is conceptually divided into chunks and consecutive chunks are striped onto neighbouring devices. @@ -183,7 +220,7 @@ All md devices contain: of 2. This can only be set while assembling an array layout - The "layout" for the array for the particular level. This is + The ``layout`` for the array for the particular level. This is simply a number that is interpretted differently by different levels. It can be written while assembling an array. @@ -193,22 +230,24 @@ All md devices contain: devices. Writing a number (in Kilobytes) which is less than the available size will set the size. Any reconfiguration of the array (e.g. adding devices) will not cause the size to change. - Writing the word 'default' will cause the effective size of the + Writing the word ``default`` will cause the effective size of the array to be whatever size is actually available based on - 'level', 'chunk_size' and 'component_size'. + ``level``, ``chunk_size`` and ``component_size``. This can be used to reduce the size of the array before reducing the number of devices in a raid4/5/6, or to support external metadata formats which mandate such clipping. reshape_position - This is either "none" or a sector number within the devices of - the array where "reshape" is up to. If this is set, the three + This is either ``none`` or a sector number within the devices of + the array where ``reshape`` is up to. If this is set, the three attributes mentioned above (raid_disks, chunk_size, layout) can potentially have 2 values, an old and a new value. If these - values differ, reading the attribute returns + values differ, reading the attribute returns:: + new (old) - and writing will effect the 'new' value, leaving the 'old' + + and writing will effect the ``new`` value, leaving the ``old`` unchanged. component_size @@ -223,9 +262,9 @@ All md devices contain: metadata_version This indicates the format that is being used to record metadata about the array. It can be 0.90 (traditional format), 1.0, 1.1, - 1.2 (newer format in varying locations) or "none" indicating that + 1.2 (newer format in varying locations) or ``none`` indicating that the kernel isn't managing metadata at all. - Alternately it can be "external:" followed by a string which + Alternately it can be ``external:`` followed by a string which is set by user-space. This indicates that metadata is managed by a user-space program. Any device failure or other event that requires a metadata update will cause array activity to be @@ -233,9 +272,9 @@ All md devices contain: resync_start The point at which resync should start. If no resync is needed, - this will be a very large number (or 'none' since 2.6.30-rc1). At + this will be a very large number (or ``none`` since 2.6.30-rc1). At array creation it will default to 0, though starting the array as - 'clean' will set it much larger. + ``clean`` will set it much larger. new_dev This file can be written but not read. The value written should @@ -246,10 +285,10 @@ All md devices contain: safe_mode_delay When an md array has seen no write requests for a certain period - of time, it will be marked as 'clean'. When another write - request arrives, the array is marked as 'dirty' before the write - commences. This is known as 'safe_mode'. - The 'certain period' is controlled by this file which stores the + of time, it will be marked as ``clean``. When another write + request arrives, the array is marked as ``dirty`` before the write + commences. This is known as ``safe_mode``. + The ``certain period`` is controlled by this file which stores the period as a number of seconds. The default is 200msec (0.200). Writing a value of 0 disables safemode. @@ -260,38 +299,50 @@ All md devices contain: cannot be explicitly set, and some transitions are not allowed. Select/poll works on this file. All changes except between - active_idle and active (which can be frequent and are not - very interesting) are notified. active->active_idle is - reported if the metadata is externally managed. + Active_idle and active (which can be frequent and are not + very interesting) are notified. active->active_idle is + reported if the metadata is externally managed. clear No devices, no size, no level + Writing is equivalent to STOP_ARRAY ioctl + inactive May have some settings, but array is not active - all IO results in error + all IO results in error + When written, doesn't tear down array, but just stops it + suspended (not supported yet) All IO requests will block. The array can be reconfigured. + Writing this, if accepted, will block until array is quiessent + readonly no resync can happen. no superblocks get written. - write requests fail + + Write requests fail + read-auto - like readonly, but behaves like 'clean' on a write request. + like readonly, but behaves like ``clean`` on a write request. + + clean + no pending writes, but otherwise active. - clean - no pending writes, but otherwise active. When written to inactive array, starts without resync + If a write request arrives then - if metadata is known, mark 'dirty' and switch to 'active'. - if not known, block and switch to write-pending + if metadata is known, mark ``dirty`` and switch to ``active``. + if not known, block and switch to write-pending + If written to an active array that has pending writes, then fails. active fully active: IO and resync can be happening. When written to inactive array, starts with resync write-pending - clean, but writes are blocked waiting for 'active' to be written. + clean, but writes are blocked waiting for ``active`` to be written. active-idle like active, but no writes have been seen for a while (safe_mode_delay). @@ -299,57 +350,71 @@ All md devices contain: bitmap/location This indicates where the write-intent bitmap for the array is stored. - It can be one of "none", "file" or "[+-]N". - "file" may later be extended to "file:/file/name" - "[+-]N" means that many sectors from the start of the metadata. - This is replicated on all devices. For arrays with externally - managed metadata, the offset is from the beginning of the - device. + + It can be one of ``none``, ``file`` or ``[+-]N``. + ``file`` may later be extended to ``file:/file/name`` + ``[+-]N`` means that many sectors from the start of the metadata. + + This is replicated on all devices. For arrays with externally + managed metadata, the offset is from the beginning of the + device. + bitmap/chunksize The size, in bytes, of the chunk which will be represented by a single bit. For RAID456, it is a portion of an individual device. For RAID10, it is a portion of the array. For RAID1, it is both (they come to the same thing). + bitmap/time_base The time, in seconds, between looking for bits in the bitmap to be cleared. In the current implementation, a bit will be cleared - between 2 and 3 times "time_base" after all the covered blocks + between 2 and 3 times ``time_base`` after all the covered blocks are known to be in-sync. + bitmap/backlog When write-mostly devices are active in a RAID1, write requests to those devices proceed in the background - the filesystem (or other user of the device) does not have to wait for them. - 'backlog' sets a limit on the number of concurrent background + ``backlog`` sets a limit on the number of concurrent background writes. If there are more than this, new writes will by synchronous. + bitmap/metadata - This can be either 'internal' or 'external'. - 'internal' is the default and means the metadata for the bitmap - is stored in the first 256 bytes of the allocated space and is - managed by the md module. - 'external' means that bitmap metadata is managed externally to - the kernel (i.e. by some userspace program) + This can be either ``internal`` or ``external``. + + ``internal`` + is the default and means the metadata for the bitmap + is stored in the first 256 bytes of the allocated space and is + managed by the md module. + + ``external`` + means that bitmap metadata is managed externally to + the kernel (i.e. by some userspace program) + bitmap/can_clear - This is either 'true' or 'false'. If 'true', then bits in the + This is either ``true`` or ``false``. If ``true``, then bits in the bitmap will be cleared when the corresponding blocks are thought - to be in-sync. If 'false', bits will never be cleared. - This is automatically set to 'false' if a write happens on a + to be in-sync. If ``false``, bits will never be cleared. + This is automatically set to ``false`` if a write happens on a degraded array, or if the array becomes degraded during a write. When metadata is managed externally, it should be set to true once the array becomes non-degraded, and this fact has been recorded in the metadata. - - - -As component devices are added to an md array, they appear in the 'md' -directory as new directories named + + + +As component devices are added to an md array, they appear in the ``md`` +directory as new directories named:: + dev-XXX -where XXX is a name that the kernel knows for the device, e.g. hdb1. + +where ``XXX`` is a name that the kernel knows for the device, e.g. hdb1. Each directory contains: block - a symlink to the block device in /sys/block, e.g. + a symlink to the block device in /sys/block, e.g.:: + /sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1 super @@ -358,51 +423,83 @@ Each directory contains: state A file recording the current state of the device in the array - which can be a comma separated list of - faulty - device has been kicked from active use due to - a detected fault, or it has unacknowledged bad - blocks - in_sync - device is a fully in-sync member of the array - writemostly - device will only be subject to read - requests if there are no other options. - This applies only to raid1 arrays. - blocked - device has failed, and the failure hasn't been - acknowledged yet by the metadata handler. - Writes that would write to this device if - it were not faulty are blocked. - spare - device is working, but not a full member. - This includes spares that are in the process - of being recovered to - write_error - device has ever seen a write error. - want_replacement - device is (mostly) working but probably - should be replaced, either due to errors or - due to user request. - replacement - device is a replacement for another active - device with same raid_disk. + which can be a comma separated list of: + + faulty + device has been kicked from active use due to + a detected fault, or it has unacknowledged bad + blocks + + in_sync + device is a fully in-sync member of the array + + writemostly + device will only be subject to read + requests if there are no other options. + + This applies only to raid1 arrays. + + blocked + device has failed, and the failure hasn't been + acknowledged yet by the metadata handler. + + Writes that would write to this device if + it were not faulty are blocked. + + spare + device is working, but not a full member. + + This includes spares that are in the process + of being recovered to + + write_error + device has ever seen a write error. + + want_replacement + device is (mostly) working but probably + should be replaced, either due to errors or + due to user request. + + replacement + device is a replacement for another active + device with same raid_disk. This list may grow in future. + This can be written to. - Writing "faulty" simulates a failure on the device. - Writing "remove" removes the device from the array. - Writing "writemostly" sets the writemostly flag. - Writing "-writemostly" clears the writemostly flag. - Writing "blocked" sets the "blocked" flag. - Writing "-blocked" clears the "blocked" flags and allows writes - to complete and possibly simulates an error. - Writing "in_sync" sets the in_sync flag. - Writing "write_error" sets writeerrorseen flag. - Writing "-write_error" clears writeerrorseen flag. - Writing "want_replacement" is allowed at any time except to a - replacement device or a spare. It sets the flag. - Writing "-want_replacement" is allowed at any time. It clears - the flag. - Writing "replacement" or "-replacement" is only allowed before - starting the array. It sets or clears the flag. - - - This file responds to select/poll. Any change to 'faulty' - or 'blocked' causes an event. + + Writing ``faulty`` simulates a failure on the device. + + Writing ``remove`` removes the device from the array. + + Writing ``writemostly`` sets the writemostly flag. + + Writing ``-writemostly`` clears the writemostly flag. + + Writing ``blocked`` sets the ``blocked`` flag. + + Writing ``-blocked`` clears the ``blocked`` flags and allows writes + to complete and possibly simulates an error. + + Writing ``in_sync`` sets the in_sync flag. + + Writing ``write_error`` sets writeerrorseen flag. + + Writing ``-write_error`` clears writeerrorseen flag. + + Writing ``want_replacement`` is allowed at any time except to a + replacement device or a spare. It sets the flag. + + Writing ``-want_replacement`` is allowed at any time. It clears + the flag. + + Writing ``replacement`` or ``-replacement`` is only allowed before + starting the array. It sets or clears the flag. + + + This file responds to select/poll. Any change to ``faulty`` + or ``blocked`` causes an event. errors An approximate count of read errors that have been detected on @@ -417,9 +514,9 @@ Each directory contains: slot This gives the role that the device has in the array. It will - either be 'none' if the device is not active in the array + either be ``none`` if the device is not active in the array (i.e. is a spare or has failed) or an integer less than the - 'raid_disks' number for the array indicating which position + ``raid_disks`` number for the array indicating which position it currently fills. This can only be set while assembling an array. A device for which this is set is assumed to be working. @@ -437,7 +534,7 @@ Each directory contains: written, it will be rejected. recovery_start - When the device is not 'in_sync', this records the number of + When the device is not ``in_sync``, this records the number of sectors from the start of the device which are known to be correct. This is normally zero, but during a recovery operation it will steadily increase, and if the recovery is @@ -447,21 +544,21 @@ Each directory contains: This can be set whenever the device is not an active member of the array, either before the array is activated, or before - the 'slot' is set. + the ``slot`` is set. + + Setting this to ``none`` is equivalent to setting ``in_sync``. + Setting to any other value also clears the ``in_sync`` flag. - Setting this to 'none' is equivalent to setting 'in_sync'. - Setting to any other value also clears the 'in_sync' flag. - bad_blocks This gives the list of all known bad blocks in the form of start address and length (in sectors respectively). If output is too big to fit in a page, it will be truncated. Writing - "sector length" to this file adds new acknowledged (i.e. + ``sector length`` to this file adds new acknowledged (i.e. recorded to disk safely) bad blocks. unacknowledged_bad_blocks This gives the list of known-but-not-yet-saved-to-disk bad - blocks in the same form of 'bad_blocks'. If output is too big + blocks in the same form of ``bad_blocks``. If output is too big to fit in a page, it will be truncated. Writing to this file adds bad blocks without acknowledging them. This is largely for testing. @@ -469,16 +566,18 @@ Each directory contains: An active md device will also contain an entry for each active device -in the array. These are named +in the array. These are named:: rdNN -where 'NN' is the position in the array, starting from 0. +where ``NN`` is the position in the array, starting from 0. So for a 3 drive array there will be rd0, rd1, rd2. -These are symbolic links to the appropriate 'dev-XXX' entry. -Thus, for example, +These are symbolic links to the appropriate ``dev-XXX`` entry. +Thus, for example:: + cat /sys/block/md*/md/rd*/state -will show 'in_sync' on every line. + +will show ``in_sync`` on every line. @@ -488,50 +587,62 @@ also have sync_action a text file that can be used to monitor and control the rebuild process. It contains one word which can be one of: - resync - redundancy is being recalculated after unclean - shutdown or creation - recover - a hot spare is being built to replace a - failed/missing device - idle - nothing is happening - check - A full check of redundancy was requested and is - happening. This reads all blocks and checks - them. A repair may also happen for some raid - levels. - repair - A full check and repair is happening. This is - similar to 'resync', but was requested by the - user, and the write-intent bitmap is NOT used to - optimise the process. + + resync + redundancy is being recalculated after unclean + shutdown or creation + + recover + a hot spare is being built to replace a + failed/missing device + + idle + nothing is happening + check + A full check of redundancy was requested and is + happening. This reads all blocks and checks + them. A repair may also happen for some raid + levels. + + repair + A full check and repair is happening. This is + similar to ``resync``, but was requested by the + user, and the write-intent bitmap is NOT used to + optimise the process. This file is writable, and each of the strings that could be read are meaningful for writing. - 'idle' will stop an active resync/recovery etc. There is no - guarantee that another resync/recovery may not be automatically - started again, though some event will be needed to trigger - this. - 'resync' or 'recovery' can be used to restart the - corresponding operation if it was stopped with 'idle'. - 'check' and 'repair' will start the appropriate process - providing the current state is 'idle'. + ``idle`` will stop an active resync/recovery etc. There is no + guarantee that another resync/recovery may not be automatically + started again, though some event will be needed to trigger + this. + + ``resync`` or ``recovery`` can be used to restart the + corresponding operation if it was stopped with ``idle``. + + ``check`` and ``repair`` will start the appropriate process + providing the current state is ``idle``. This file responds to select/poll. Any important change in the value triggers a poll event. Sometimes the value will briefly be - "recover" if a recovery seems to be needed, but cannot be - achieved. In that case, the transition to "recover" isn't + ``recover`` if a recovery seems to be needed, but cannot be + achieved. In that case, the transition to ``recover`` isn't notified, but the transition away is. degraded This contains a count of the number of devices by which the - arrays is degraded. So an optimal array will show '0'. A - single failed/missing drive will show '1', etc. + arrays is degraded. So an optimal array will show ``0``. A + single failed/missing drive will show ``1``, etc. + This file responds to select/poll, any increase or decrease in the count of missing devices will trigger an event. mismatch_count - When performing 'check' and 'repair', and possibly when - performing 'resync', md will count the number of errors that are - found. The count in 'mismatch_cnt' is the number of sectors - that were re-written, or (for 'check') would have been + When performing ``check`` and ``repair``, and possibly when + performing ``resync``, md will count the number of errors that are + found. The count in ``mismatch_cnt`` is the number of sectors + that were re-written, or (for ``check``) would have been re-written. As most raid levels work in units of pages rather than sectors, this may be larger than the number of actual errors by a factor of the number of sectors in a page. @@ -542,27 +653,30 @@ also have would need to check the corresponding blocks. Either individual numbers or start-end pairs can be written. Multiple numbers can be separated by a space. - Note that the numbers are 'bit' numbers, not 'block' numbers. + + Note that the numbers are ``bit`` numbers, not ``block`` numbers. They should be scaled by the bitmap_chunksize. - sync_speed_min - sync_speed_max - This are similar to /proc/sys/dev/raid/speed_limit_{min,max} + sync_speed_min, sync_speed_max + This are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}`` however they only apply to the particular array. - If no value has been written to these, or if the word 'system' + + If no value has been written to these, or if the word ``system`` is written, then the system-wide value is used. If a value, in kibibytes-per-second is written, then it is used. + When the files are read, they show the currently active value - followed by "(local)" or "(system)" depending on whether it is + followed by ``(local)`` or ``(system)`` depending on whether it is a locally set or system-wide value. sync_completed This shows the number of sectors that have been completed of whatever the current sync_action is, followed by the number of sectors in total that could need to be processed. The two - numbers are separated by a '/' thus effectively showing one + numbers are separated by a ``/`` thus effectively showing one value, a fraction of the process that is complete. - A 'select' on this attribute will return when resync completes, + + A ``select`` on this attribute will return when resync completes, when it reaches the current sync_max (below) and possibly at other times. @@ -570,26 +684,24 @@ also have This shows the current actual speed, in K/sec, of the current sync_action. It is averaged over the last 30 seconds. - suspend_lo - suspend_hi + suspend_lo, suspend_hi The two values, given as numbers of sectors, indicate a range within the array where IO will be blocked. This is currently only supported for raid4/5/6. - sync_min - sync_max + sync_min, sync_max The two values, given as numbers of sectors, indicate a range - within the array where 'check'/'repair' will operate. Must be - a multiple of chunk_size. When it reaches "sync_max" it will + within the array where ``check``/``repair`` will operate. Must be + a multiple of chunk_size. When it reaches ``sync_max`` it will pause, rather than complete. - You can use 'select' or 'poll' on "sync_completed" to wait for + You can use ``select`` or ``poll`` on ``sync_completed`` to wait for that number to reach sync_max. Then you can either increase - "sync_max", or can write 'idle' to "sync_action". + ``sync_max``, or can write ``idle`` to ``sync_action``. - The value of 'max' for "sync_max" effectively disables the limit. + The value of ``max`` for ``sync_max`` effectively disables the limit. When a resync is active, the value can only ever be increased, never decreased. - The value of '0' is the minimum for "sync_min". + The value of ``0`` is the minimum for ``sync_min``. @@ -598,13 +710,15 @@ personality module that manages it. These are specific to the implementation of the module and could change substantially if the implementation changes. -These currently include +These currently include: stripe_cache_size (currently raid5 only) number of entries in the stripe cache. This is writable, but there are upper and lower limits (32768, 17). Default is 256. + strip_cache_active (currently raid5 only) number of active entries in the stripe cache + preread_bypass_threshold (currently raid5 only) number of times a stripe requiring preread will be bypassed by a stripe that does not require preread. For fairness defaults diff --git a/Documentation/module-signing.txt b/Documentation/admin-guide/module-signing.rst similarity index 71% rename from Documentation/module-signing.txt rename to Documentation/admin-guide/module-signing.rst index f0e3361db20c..27e59498b487 100644 --- a/Documentation/module-signing.txt +++ b/Documentation/admin-guide/module-signing.rst @@ -1,22 +1,21 @@ - ============================== - KERNEL MODULE SIGNING FACILITY - ============================== - -CONTENTS - - - Overview. - - Configuring module signing. - - Generating signing keys. - - Public keys in the kernel. - - Manually signing modules. - - Signed modules and stripping. - - Loading signed modules. - - Non-valid signatures and unsigned modules. - - Administering/protecting the private key. +Kernel module signing facility +------------------------------ + +.. CONTENTS +.. +.. - Overview. +.. - Configuring module signing. +.. - Generating signing keys. +.. - Public keys in the kernel. +.. - Manually signing modules. +.. - Signed modules and stripping. +.. - Loading signed modules. +.. - Non-valid signatures and unsigned modules. +.. - Administering/protecting the private key. ======== -OVERVIEW +Overview ======== The kernel module signing facility cryptographically signs modules during @@ -36,17 +35,19 @@ SHA-512 (the algorithm is selected by data in the signature). ========================== -CONFIGURING MODULE SIGNING +Configuring module signing ========================== -The module signing facility is enabled by going to the "Enable Loadable Module -Support" section of the kernel configuration and turning on +The module signing facility is enabled by going to the +:menuselection:`Enable Loadable Module Support` section of +the kernel configuration and turning on:: CONFIG_MODULE_SIG "Module signature verification" This has a number of options available: - (1) "Require modules to be validly signed" (CONFIG_MODULE_SIG_FORCE) + (1) :menuselection:`Require modules to be validly signed` + (``CONFIG_MODULE_SIG_FORCE``) This specifies how the kernel should deal with a module that has a signature for which the key is not known or a module that is unsigned. @@ -64,35 +65,39 @@ This has a number of options available: cannot be parsed, it will be rejected out of hand. - (2) "Automatically sign all modules" (CONFIG_MODULE_SIG_ALL) + (2) :menuselection:`Automatically sign all modules` + (``CONFIG_MODULE_SIG_ALL``) If this is on then modules will be automatically signed during the modules_install phase of a build. If this is off, then the modules must - be signed manually using: + be signed manually using:: scripts/sign-file - (3) "Which hash algorithm should modules be signed with?" + (3) :menuselection:`Which hash algorithm should modules be signed with?` This presents a choice of which hash algorithm the installation phase will sign the modules with: - CONFIG_MODULE_SIG_SHA1 "Sign modules with SHA-1" - CONFIG_MODULE_SIG_SHA224 "Sign modules with SHA-224" - CONFIG_MODULE_SIG_SHA256 "Sign modules with SHA-256" - CONFIG_MODULE_SIG_SHA384 "Sign modules with SHA-384" - CONFIG_MODULE_SIG_SHA512 "Sign modules with SHA-512" + =============================== ========================================== + ``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1` + ``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224` + ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256` + ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384` + ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512` + =============================== ========================================== The algorithm selected here will also be built into the kernel (rather than being a module) so that modules signed with that algorithm can have their signatures checked without causing a dependency loop. - (4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY) + (4) :menuselection:`File name or PKCS#11 URI of module signing key` + (``CONFIG_MODULE_SIG_KEY``) Setting this option to something other than its default of - "certs/signing_key.pem" will disable the autogeneration of signing keys + ``certs/signing_key.pem`` will disable the autogeneration of signing keys and allow the kernel modules to be signed with a key of your choosing. The string provided should identify a file containing both a private key and its corresponding X.509 certificate in PEM form, or — on systems where @@ -102,10 +107,11 @@ This has a number of options available: If the PEM file containing the private key is encrypted, or if the PKCS#11 token requries a PIN, this can be provided at build time by - means of the KBUILD_SIGN_PIN variable. + means of the ``KBUILD_SIGN_PIN`` variable. - (5) "Additional X.509 keys for default system keyring" (CONFIG_SYSTEM_TRUSTED_KEYS) + (5) :menuselection:`Additional X.509 keys for default system keyring` + (``CONFIG_SYSTEM_TRUSTED_KEYS``) This option can be set to the filename of a PEM-encoded file containing additional certificates which will be included in the system keyring by @@ -116,7 +122,7 @@ packages to the kernel build processes for the tool that does the signing. ======================= -GENERATING SIGNING KEYS +Generating signing keys ======================= Cryptographic keypairs are required to generate and check signatures. A @@ -126,14 +132,14 @@ it can be deleted or stored securely. The public key gets built into the kernel so that it can be used to check the signatures as the modules are loaded. -Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its +Under normal conditions, when ``CONFIG_MODULE_SIG_KEY`` is unchanged from its default, the kernel build will automatically generate a new keypair using -openssl if one does not exist in the file: +openssl if one does not exist in the file:: certs/signing_key.pem during the building of vmlinux (the public part of the key needs to be built -into vmlinux) using parameters in the: +into vmlinux) using parameters in the:: certs/x509.genkey @@ -142,14 +148,14 @@ file (which is also generated if it does not already exist). It is strongly recommended that you provide your own x509.genkey file. Most notably, in the x509.genkey file, the req_distinguished_name section -should be altered from the default: +should be altered from the default:: [ req_distinguished_name ] #O = Unspecified company CN = Build time autogenerated kernel key #emailAddress = unspecified.user@unspecified.company -The generated RSA key size can also be set with: +The generated RSA key size can also be set with:: [ req ] default_bits = 4096 @@ -158,23 +164,23 @@ The generated RSA key size can also be set with: It is also possible to manually generate the key private/public files using the x509.genkey key generation configuration file in the root node of the Linux kernel sources tree and the openssl command. The following is an example to -generate the public/private key files: +generate the public/private key files:: openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \ -config x509.genkey -outform PEM -out kernel_key.pem \ -keyout kernel_key.pem The full pathname for the resulting kernel_key.pem file can then be specified -in the CONFIG_MODULE_SIG_KEY option, and the certificate and key therein will +in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key therein will be used instead of an autogenerated keypair. ========================= -PUBLIC KEYS IN THE KERNEL +Public keys in the kernel ========================= The kernel contains a ring of public keys that can be viewed by root. They're -in a keyring called ".system_keyring" that can be seen by: +in a keyring called ".system_keyring" that can be seen by:: [root@deneb ~]# cat /proc/keys ... @@ -184,27 +190,27 @@ in a keyring called ".system_keyring" that can be seen by: Beyond the public key generated specifically for module signing, additional trusted certificates can be provided in a PEM-encoded file referenced by the -CONFIG_SYSTEM_TRUSTED_KEYS configuration option. +``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option. Further, the architecture code may take public keys from a hardware store and add those in also (e.g. from the UEFI key database). -Finally, it is possible to add additional public keys by doing: +Finally, it is possible to add additional public keys by doing:: keyctl padd asymmetric "" [.system_keyring-ID] <[key-file] -e.g.: +e.g.:: keyctl padd asymmetric "" 0x223c7853 /proc/sys/fs/binfmt_misc/register -else + else echo "No binfmt_misc support" exit 1 -fi + fi -4) Check that .exe binaries can be ran without the need of a - wrapper script, simply by launching the .exe file directly - from a command prompt, for example: +4) Check that ``.exe`` binaries can be ran without the need of a + wrapper script, simply by launching the ``.exe`` file directly + from a command prompt, for example:: /usr/bin/xsd.exe - NOTE: If this fails with a permission denied error, check - that the .exe file has execute permissions. + .. note:: + + If this fails with a permission denied error, check + that the ``.exe`` file has execute permissions. diff --git a/Documentation/admin-guide/parport.rst b/Documentation/admin-guide/parport.rst new file mode 100644 index 000000000000..ad3f9b8a11e1 --- /dev/null +++ b/Documentation/admin-guide/parport.rst @@ -0,0 +1,286 @@ +Parport ++++++++ + +The ``parport`` code provides parallel-port support under Linux. This +includes the ability to share one port between multiple device +drivers. + +You can pass parameters to the ``parport`` code to override its automatic +detection of your hardware. This is particularly useful if you want +to use IRQs, since in general these can't be autoprobed successfully. +By default IRQs are not used even if they **can** be probed. This is +because there are a lot of people using the same IRQ for their +parallel port and a sound card or network card. + +The ``parport`` code is split into two parts: generic (which deals with +port-sharing) and architecture-dependent (which deals with actually +using the port). + + +Parport as modules +================== + +If you load the `parport`` code as a module, say:: + + # insmod parport + +to load the generic ``parport`` code. You then must load the +architecture-dependent code with (for example):: + + # insmod parport_pc io=0x3bc,0x378,0x278 irq=none,7,auto + +to tell the ``parport`` code that you want three PC-style ports, one at +0x3bc with no IRQ, one at 0x378 using IRQ 7, and one at 0x278 with an +auto-detected IRQ. Currently, PC-style (``parport_pc``), Sun ``bpp``, +Amiga, Atari, and MFC3 hardware is supported. + +PCI parallel I/O card support comes from ``parport_pc``. Base I/O +addresses should not be specified for supported PCI cards since they +are automatically detected. + + +modprobe +-------- + +If you use modprobe , you will find it useful to add lines as below to a +configuration file in /etc/modprobe.d/ directory:: + + alias parport_lowlevel parport_pc + options parport_pc io=0x378,0x278 irq=7,auto + +modprobe will load ``parport_pc`` (with the options ``io=0x378,0x278 irq=7,auto``) +whenever a parallel port device driver (such as ``lp``) is loaded. + +Note that these are example lines only! You shouldn't in general need +to specify any options to ``parport_pc`` in order to be able to use a +parallel port. + + +Parport probe [optional] +------------------------ + +In 2.2 kernels there was a module called ``parport_probe``, which was used +for collecting IEEE 1284 device ID information. This has now been +enhanced and now lives with the IEEE 1284 support. When a parallel +port is detected, the devices that are connected to it are analysed, +and information is logged like this:: + + parport0: Printer, BJC-210 (Canon) + +The probe information is available from files in ``/proc/sys/dev/parport/``. + + +Parport linked into the kernel statically +========================================= + +If you compile the ``parport`` code into the kernel, then you can use +kernel boot parameters to get the same effect. Add something like the +following to your LILO command line:: + + parport=0x3bc parport=0x378,7 parport=0x278,auto,nofifo + +You can have many ``parport=...`` statements, one for each port you want +to add. Adding ``parport=0`` to the kernel command-line will disable +parport support entirely. Adding ``parport=auto`` to the kernel +command-line will make ``parport`` use any IRQ lines or DMA channels that +it auto-detects. + + +Files in /proc +============== + +If you have configured the ``/proc`` filesystem into your kernel, you will +see a new directory entry: ``/proc/sys/dev/parport``. In there will be a +directory entry for each parallel port for which parport is +configured. In each of those directories are a collection of files +describing that parallel port. + +The ``/proc/sys/dev/parport`` directory tree looks like:: + + parport + |-- default + | |-- spintime + | `-- timeslice + |-- parport0 + | |-- autoprobe + | |-- autoprobe0 + | |-- autoprobe1 + | |-- autoprobe2 + | |-- autoprobe3 + | |-- devices + | | |-- active + | | `-- lp + | | `-- timeslice + | |-- base-addr + | |-- irq + | |-- dma + | |-- modes + | `-- spintime + `-- parport1 + |-- autoprobe + |-- autoprobe0 + |-- autoprobe1 + |-- autoprobe2 + |-- autoprobe3 + |-- devices + | |-- active + | `-- ppa + | `-- timeslice + |-- base-addr + |-- irq + |-- dma + |-- modes + `-- spintime + +.. tabularcolumns:: |p{4.0cm}|p{13.5cm}| + +======================= ======================================================= +File Contents +======================= ======================================================= +``devices/active`` A list of the device drivers using that port. A "+" + will appear by the name of the device currently using + the port (it might not appear against any). The + string "none" means that there are no device drivers + using that port. + +``base-addr`` Parallel port's base address, or addresses if the port + has more than one in which case they are separated + with tabs. These values might not have any sensible + meaning for some ports. + +``irq`` Parallel port's IRQ, or -1 if none is being used. + +``dma`` Parallel port's DMA channel, or -1 if none is being + used. + +``modes`` Parallel port's hardware modes, comma-separated, + meaning: + + - PCSPP + PC-style SPP registers are available. + + - TRISTATE + Port is bidirectional. + + - COMPAT + Hardware acceleration for printers is + available and will be used. + + - EPP + Hardware acceleration for EPP protocol + is available and will be used. + + - ECP + Hardware acceleration for ECP protocol + is available and will be used. + + - DMA + DMA is available and will be used. + + Note that the current implementation will only take + advantage of COMPAT and ECP modes if it has an IRQ + line to use. + +``autoprobe`` Any IEEE-1284 device ID information that has been + acquired from the (non-IEEE 1284.3) device. + +``autoprobe[0-3]`` IEEE 1284 device ID information retrieved from + daisy-chain devices that conform to IEEE 1284.3. + +``spintime`` The number of microseconds to busy-loop while waiting + for the peripheral to respond. You might find that + adjusting this improves performance, depending on your + peripherals. This is a port-wide setting, i.e. it + applies to all devices on a particular port. + +``timeslice`` The number of milliseconds that a device driver is + allowed to keep a port claimed for. This is advisory, + and driver can ignore it if it must. + +``default/*`` The defaults for spintime and timeslice. When a new + port is registered, it picks up the default spintime. + When a new device is registered, it picks up the + default timeslice. +======================= ======================================================= + +Device drivers +============== + +Once the parport code is initialised, you can attach device drivers to +specific ports. Normally this happens automatically; if the lp driver +is loaded it will create one lp device for each port found. You can +override this, though, by using parameters either when you load the lp +driver:: + + # insmod lp parport=0,2 + +or on the LILO command line:: + + lp=parport0 lp=parport2 + +Both the above examples would inform lp that you want ``/dev/lp0`` to be +the first parallel port, and /dev/lp1 to be the **third** parallel port, +with no lp device associated with the second port (parport1). Note +that this is different to the way older kernels worked; there used to +be a static association between the I/O port address and the device +name, so ``/dev/lp0`` was always the port at 0x3bc. This is no longer the +case - if you only have one port, it will default to being ``/dev/lp0``, +regardless of base address. + +Also: + + * If you selected the IEEE 1284 support at compile time, you can say + ``lp=auto`` on the kernel command line, and lp will create devices + only for those ports that seem to have printers attached. + + * If you give PLIP the ``timid`` parameter, either with ``plip=timid`` on + the command line, or with ``insmod plip timid=1`` when using modules, + it will avoid any ports that seem to be in use by other devices. + + * IRQ autoprobing works only for a few port types at the moment. + +Reporting printer problems with parport +======================================= + +If you are having problems printing, please go through these steps to +try to narrow down where the problem area is. + +When reporting problems with parport, really you need to give all of +the messages that ``parport_pc`` spits out when it initialises. There are +several code paths: + +- polling +- interrupt-driven, protocol in software +- interrupt-driven, protocol in hardware using PIO +- interrupt-driven, protocol in hardware using DMA + +The kernel messages that ``parport_pc`` logs give an indication of which +code path is being used. (They could be a lot better actually..) + +For normal printer protocol, having IEEE 1284 modes enabled or not +should not make a difference. + +To turn off the 'protocol in hardware' code paths, disable +``CONFIG_PARPORT_PC_FIFO``. Note that when they are enabled they are not +necessarily **used**; it depends on whether the hardware is available, +enabled by the BIOS, and detected by the driver. + +So, to start with, disable ``CONFIG_PARPORT_PC_FIFO``, and load ``parport_pc`` +with ``irq=none``. See if printing works then. It really should, +because this is the simplest code path. + +If that works fine, try with ``io=0x378 irq=7`` (adjust for your +hardware), to make it use interrupt-driven in-software protocol. + +If **that** works fine, then one of the hardware modes isn't working +right. Enable ``CONFIG_FIFO`` (no, it isn't a module option, +and yes, it should be), set the port to ECP mode in the BIOS and note +the DMA channel, and try with:: + + io=0x378 irq=7 dma=none (for PIO) + io=0x378 irq=7 dma=3 (for DMA) + +---------- + +philb@gnu.org +tim@cyberelk.net diff --git a/Documentation/ramoops.txt b/Documentation/admin-guide/ramoops.rst similarity index 69% rename from Documentation/ramoops.txt rename to Documentation/admin-guide/ramoops.rst index 26b9f31cf65a..4efd7ce77565 100644 --- a/Documentation/ramoops.txt +++ b/Documentation/admin-guide/ramoops.rst @@ -5,34 +5,37 @@ Sergiu Iordache Updated: 17 November 2011 -0. Introduction +Introduction +------------ Ramoops is an oops/panic logger that writes its logs to RAM before the system crashes. It works by logging oopses and panics in a circular buffer. Ramoops needs a system with persistent RAM so that the content of that area can survive after a restart. -1. Ramoops concepts +Ramoops concepts +---------------- Ramoops uses a predefined memory area to store the dump. The start and size and type of the memory area are set using three variables: - * "mem_address" for the start - * "mem_size" for the size. The memory size will be rounded down to a - power of two. - * "mem_type" to specifiy if the memory type (default is pgprot_writecombine). - -Typically the default value of mem_type=0 should be used as that sets the pstore -mapping to pgprot_writecombine. Setting mem_type=1 attempts to use -pgprot_noncached, which only works on some platforms. This is because pstore + + * ``mem_address`` for the start + * ``mem_size`` for the size. The memory size will be rounded down to a + power of two. + * ``mem_type`` to specifiy if the memory type (default is pgprot_writecombine). + +Typically the default value of ``mem_type=0`` should be used as that sets the pstore +mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use +``pgprot_noncached``, which only works on some platforms. This is because pstore depends on atomic operations. At least on ARM, pgprot_noncached causes the memory to be mapped strongly ordered, and atomic operations on strongly ordered memory are implementation defined, and won't work on many ARMs such as omaps. -The memory area is divided into "record_size" chunks (also rounded down to -power of two) and each oops/panic writes a "record_size" chunk of +The memory area is divided into ``record_size`` chunks (also rounded down to +power of two) and each oops/panic writes a ``record_size`` chunk of information. -Dumping both oopses and panics can be done by setting 1 in the "dump_oops" +Dumping both oopses and panics can be done by setting 1 in the ``dump_oops`` variable while setting 0 in that variable dumps only the panics. The module uses a counter to record multiple dumps but the counter gets reset @@ -43,7 +46,8 @@ This might be useful when a hardware reset was used to bring the machine back to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat corrupt, but usually it is restorable. -2. Setting the parameters +Setting the parameters +---------------------- Setting the ramoops parameters can be done in several different manners: @@ -52,12 +56,13 @@ Setting the ramoops parameters can be done in several different manners: boot and then use the reserved memory for ramoops. For example, assuming a machine with > 128 MB of memory, the following kernel command line will tell the kernel to use only the first 128 MB of memory, and place ECC-protected - ramoops region at 128 MB boundary: - "mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1" + ramoops region at 128 MB boundary:: + + mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1 B. Use Device Tree bindings, as described in - Documentation/device-tree/bindings/reserved-memory/ramoops.txt. - For example: + ``Documentation/device-tree/bindings/reserved-memory/admin-guide/ramoops.rst``. + For example:: reserved-memory { #address-cells = <2>; @@ -75,58 +80,63 @@ Setting the ramoops parameters can be done in several different manners: C. Use a platform device and set the platform data. The parameters can then be set through that platform data. An example of doing that is: -#include -[...] + .. code-block:: c + + #include + [...] -static struct ramoops_platform_data ramoops_data = { + static struct ramoops_platform_data ramoops_data = { .mem_size = <...>, .mem_address = <...>, .mem_type = <...>, .record_size = <...>, .dump_oops = <...>, .ecc = <...>, -}; + }; -static struct platform_device ramoops_dev = { + static struct platform_device ramoops_dev = { .name = "ramoops", .dev = { .platform_data = &ramoops_data, }, -}; + }; -[... inside a function ...] -int ret; + [... inside a function ...] + int ret; -ret = platform_device_register(&ramoops_dev); -if (ret) { + ret = platform_device_register(&ramoops_dev); + if (ret) { printk(KERN_ERR "unable to register platform device\n"); return ret; -} + } You can specify either RAM memory or peripheral devices' memory. However, when specifying RAM, be sure to reserve the memory by issuing memblock_reserve() -very early in the architecture code, e.g.: +very early in the architecture code, e.g.:: -#include + #include -memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size); + memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size); -3. Dump format +Dump format +----------- -The data dump begins with a header, currently defined as "====" followed by a +The data dump begins with a header, currently defined as ``====`` followed by a timestamp and a new line. The dump then continues with the actual data. -4. Reading the data +Reading the data +---------------- The dump data can be read from the pstore filesystem. The format for these -files is "dmesg-ramoops-N", where N is the record number in memory. To delete +files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete a stored record from RAM, simply unlink the respective pstore file. -5. Persistent function tracing +Persistent function tracing +--------------------------- Persistent function tracing might be useful for debugging software or hardware -related hangs. The functions call chain log is stored in a "ftrace-ramoops" -file. Here is an example of usage: +related hangs. The functions call chain log is stored in a ``ftrace-ramoops`` +file. Here is an example of usage:: # mount -t debugfs debugfs /sys/kernel/debug/ # echo 1 > /sys/kernel/debug/pstore/record_ftrace diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst new file mode 100644 index 000000000000..d71340e86c27 --- /dev/null +++ b/Documentation/admin-guide/ras.rst @@ -0,0 +1,1190 @@ +.. include:: + +============================================ +Reliability, Availability and Serviceability +============================================ + +RAS concepts +************ + +Reliability, Availability and Serviceability (RAS) is a concept used on +servers meant to measure their robusteness. + +Reliability + is the probability that a system will produce correct outputs. + + * Generally measured as Mean Time Between Failures (MTBF) + * Enhanced by features that help to avoid, detect and repair hardware faults + +Availability + is the probability that a system is operational at a given time + + * Generally measured as a percentage of downtime per a period of time + * Often uses mechanisms to detect and correct hardware faults in + runtime; + +Serviceability (or maintainability) + is the simplicity and speed with which a system can be repaired or + maintained + + * Generally measured on Mean Time Between Repair (MTBR) + +Improving RAS +------------- + +In order to reduce systems downtime, a system should be capable of detecting +hardware errors, and, when possible correcting them in runtime. It should +also provide mechanisms to detect hardware degradation, in order to warn +the system administrator to take the action of replacing a component before +it causes data loss or system downtime. + +Among the monitoring measures, the most usual ones include: + +* CPU – detect errors at instruction execution and at L1/L2/L3 caches; +* Memory – add error correction logic (ECC) to detect and correct errors; +* I/O – add CRC checksums for tranfered data; +* Storage – RAID, journal file systems, checksums, + Self-Monitoring, Analysis and Reporting Technology (SMART). + +By monitoring the number of occurrences of error detections, it is possible +to identify if the probability of hardware errors is increasing, and, on such +case, do a preventive maintainance to replace a degrated component while +those errors are correctable. + +Types of errors +--------------- + +Most mechanisms used on modern systems use use technologies like Hamming +Codes that allow error correction when the number of errors on a bit packet +is below a threshold. If the number of errors is above, those mechanisms +can indicate with a high degree of confidence that an error happened, but +they can't correct. + +Also, sometimes an error occur on a component that it is not used. For +example, a part of the memory that it is not currently allocated. + +That defines some categories of errors: + +* **Correctable Error (CE)** - the error detection mechanism detected and + corrected the error. Such errors are usually not fatal, although some + Kernel mechanisms allow the system administrator to consider them as fatal. + +* **Uncorrected Error (UE)** - the amount of errors happened above the error + correction threshold, and the system was unable to auto-correct. + +* **Fatal Error** - when an UE error happens on a critical component of the + system (for example, a piece of the Kernel got corrupted by an UE), the + only reliable way to avoid data corruption is to hang or reboot the machine. + +* **Non-fatal Error** - when an UE error happens on an unused component, + like a CPU in power down state or an unused memory bank, the system may + still run, eventually replacing the affected hardware by a hot spare, + if available. + + Also, when an error happens on an userspace process, it is also possible to + kill such process and let userspace restart it. + +The mechanism for handling non-fatal errors is usually complex and may +require the help of some userspace application, in order to apply the +policy desired by the system administrator. + +Identifying a bad hardware component +------------------------------------ + +Just detecting a hardware flaw is usually not enough, as the system needs +to pinpoint to the minimal replaceable unit (MRU) that should be exchanged +to make the hardware reliable again. + +So, it requires not only error logging facilities, but also mechanisms that +will translate the error message to the silkscreen or component label for +the MRU. + +Typically, it is very complex for memory, as modern CPUs interlace memory +from different memory modules, in order to provide a better performance. The +DMI BIOS usually have a list of memory module labels, with can be obtained +using the ``dmidecode`` tool. For example, on a desktop machine, it shows:: + + Memory Device + Total Width: 64 bits + Data Width: 64 bits + Size: 16384 MB + Form Factor: SODIMM + Set: None + Locator: ChannelA-DIMM0 + Bank Locator: BANK 0 + Type: DDR4 + Type Detail: Synchronous + Speed: 2133 MHz + Rank: 2 + Configured Clock Speed: 2133 MHz + +On the above example, a DDR4 SO-DIMM memory module is located at the +system's memory labeled as "BANK 0", as given by the *bank locator* field. +Please notice that, on such system, the *total width* is equal to the +*data witdh*. It means that such memory module doesn't have error +detection/correction mechanisms. + +Unfortunately, not all systems use the same field to specify the memory +bank. On this example, from an older server, ``dmidecode`` shows:: + + Memory Device + Array Handle: 0x1000 + Error Information Handle: Not Provided + Total Width: 72 bits + Data Width: 64 bits + Size: 8192 MB + Form Factor: DIMM + Set: 1 + Locator: DIMM_A1 + Bank Locator: Not Specified + Type: DDR3 + Type Detail: Synchronous Registered (Buffered) + Speed: 1600 MHz + Rank: 2 + Configured Clock Speed: 1600 MHz + +There, the DDR3 RDIMM memory module is located at the system's memory labeled +as "DIMM_A1", as given by the *locator* field. Please notice that this +memory module has 64 bits of *data witdh* and 72 bits of *total width*. So, +it has 8 extra bits to be used by error detection and correction mechanisms. +Such kind of memory is called Error-correcting code memory (ECC memory). + +To make things even worse, it is not uncommon that systems with different +labels on their system's board to use exactly the same BIOS, meaning that +the labels provided by the BIOS won't match the real ones. + +ECC memory +---------- + +As mentioned on the previous section, ECC memory has extra bits to be +used for error correction. So, on 64 bit systems, a memory module +has 64 bits of *data width*, and 74 bits of *total width*. So, there are +8 bits extra bits to be used for the error detection and correction +mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. + +So, when the cpu requests the memory controller to write a word with +*data width*, the memory controller calculates the *syndrome* in real time, +using Hamming code, or some other error correction code, like SECDED+, +producing a code with *total width* size. Such code is then written +on the memory modules. + +At read, the *total width* bits code is converted back, using the same +ECC code used on write, producing a word with *data width* and a *syndrome*. +The word with *data width* is sent to the CPU, even when errors happen. + +The memory controller also looks at the *syndrome* in order to check if +there was an error, and if the ECC code was able to fix such error. +If the error was corrected, a Corrected Error (CE) happened. If not, an +Uncorrected Error (UE) happened. + +The information about the CE/UE errors is stored on some special registers +at the memory controller and can be accessed by reading such registers, +either by BIOS, by some special CPUs or by Linux EDAC driver. On x86 64 +bit CPUs, such errors can also be retrieved via the Machine Check +Architecture (MCA)\ [#f3]_. + +.. [#f1] Please notice that several memory controllers allow operation on a + mode called "Lock-Step", where it groups two memory modules together, + doing 128-bit reads/writes. That gives 16 bits for error correction, with + significatively improves the error correction mechanism, at the expense + that, when an error happens, there's no way to know what memory module is + to blame. So, it has to blame both memory modules. + +.. [#f2] Some memory controllers also allow using memory in mirror mode. + On such mode, the same data is written to two memory modules. At read, + the system checks both memory modules, in order to check if both provide + identical data. On such configuration, when an error happens, there's no + way to know what memory module is to blame. So, it has to blame both + memory modules (or 4 memory modules, if the system is also on Lock-step + mode). + +.. [#f3] For more details about the Machine Check Architecture (MCA), + please read Documentation/x86/x86_64/machinecheck at the Kernel tree. + +EDAC - Error Detection And Correction +************************************* + +.. note:: + + "bluesmoke" was the name for this device driver subsystem when it + was "out-of-tree" and maintained at http://bluesmoke.sourceforge.net. + That site is mostly archaic now and can be used only for historical + purposes. + + When the subsystem was pushed upstream for the first time, on + Kernel 2.6.16, for the first time, it was renamed to ``EDAC``. + +Purpose +------- + +The ``edac`` kernel module's goal is to detect and report hardware errors +that occur within the computer system running under linux. + +Memory +------ + +Memory Correctable Errors (CE) and Uncorrectable Errors (UE) are the +primary errors being harvested. These types of errors are harvested by +the ``edac_mc`` device. + +Detecting CE events, then harvesting those events and reporting them, +**can** but must not necessarily be a predictor of future UE events. With +CE events only, the system can and will continue to operate as no data +has been damaged yet. + +However, preventive maintenance and proactive part replacement of memory +modules exhibiting CEs can reduce the likelihood of the dreaded UE events +and system panics. + +Other hardware elements +----------------------- + +A new feature for EDAC, the ``edac_device`` class of device, was added in +the 2.6.23 version of the kernel. + +This new device type allows for non-memory type of ECC hardware detectors +to have their states harvested and presented to userspace via the sysfs +interface. + +Some architectures have ECC detectors for L1, L2 and L3 caches, +along with DMA engines, fabric switches, main data path switches, +interconnections, and various other hardware data paths. If the hardware +reports it, then a edac_device device probably can be constructed to +harvest and present that to userspace. + + +PCI bus scanning +---------------- + +In addition, PCI devices are scanned for PCI Bus Parity and SERR Errors +in order to determine if errors are occurring during data transfers. + +The presence of PCI Parity errors must be examined with a grain of salt. +There are several add-in adapters that do **not** follow the PCI specification +with regards to Parity generation and reporting. The specification says +the vendor should tie the parity status bits to 0 if they do not intend +to generate parity. Some vendors do not do this, and thus the parity bit +can "float" giving false positives. + +There is a PCI device attribute located in sysfs that is checked by +the EDAC PCI scanning code. If that attribute is set, PCI parity/error +scanning is skipped for that device. The attribute is:: + + broken_parity_status + +and is located in ``/sys/devices/pci/0000:XX:YY.Z`` directories for +PCI devices. + + +Versioning +---------- + +EDAC is composed of a "core" module (``edac_core.ko``) and several Memory +Controller (MC) driver modules. On a given system, the CORE is loaded +and one MC driver will be loaded. Both the CORE and the MC driver (or +``edac_device`` driver) have individual versions that reflect current +release level of their respective modules. + +Thus, to "report" on what version a system is running, one must report +both the CORE's and the MC driver's versions. + + +Loading +------- + +If ``edac`` was statically linked with the kernel then no loading +is necessary. If ``edac`` was built as modules then simply modprobe +the ``edac`` pieces that you need. You should be able to modprobe +hardware-specific modules and have the dependencies load the necessary +core modules. + +Example:: + + $ modprobe amd76x_edac + +loads both the ``amd76x_edac.ko`` memory controller module and the +``edac_mc.ko`` core module. + + +Sysfs interface +--------------- + +EDAC presents a ``sysfs`` interface for control and reporting purposes. It +lives in the /sys/devices/system/edac directory. + +Within this directory there currently reside 2 components: + + ======= ============================== + mc memory controller(s) system + pci PCI control and status system + ======= ============================== + + + +Memory Controller (mc) Model +---------------------------- + +Each ``mc`` device controls a set of memory modules [#f4]_. These modules +are laid out in a Chip-Select Row (``csrowX``) and Channel table (``chX``). +There can be multiple csrows and multiple channels. + +.. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely + used to refer to a memory module, although there are other memory + packaging alternatives, like SO-DIMM, SIMM, etc. Along this document, + and inside the EDAC system, the term "dimm" is used for all memory + modules, even when they use a different kind of packaging. + +Memory controllers allow for several csrows, with 8 csrows being a +typical value. Yet, the actual number of csrows depends on the layout of +a given motherboard, memory controller and memory module characteristics. + +Dual channels allow for dual data length (e. g. 128 bits, on 64 bit systems) +data transfers to/from the CPU from/to memory. Some newer chipsets allow +for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory +controllers. The following example will assume 2 channels: + + +------------+-----------------------+ + | Chip | Channels | + | Select +-----------+-----------+ + | rows | ``ch0`` | ``ch1`` | + +============+===========+===========+ + | ``csrow0`` | DIMM_A0 | DIMM_B0 | + +------------+ | | + | ``csrow1`` | | | + +------------+-----------+-----------+ + | ``csrow2`` | DIMM_A1 | DIMM_B1 | + +------------+ | | + | ``csrow3`` | | | + +------------+-----------+-----------+ + +In the above example, there are 4 physical slots on the motherboard +for memory DIMMs: + + +---------+---------+ + | DIMM_A0 | DIMM_B0 | + +---------+---------+ + | DIMM_A1 | DIMM_B1 | + +---------+---------+ + +Labels for these slots are usually silk-screened on the motherboard. +Slots labeled ``A`` are channel 0 in this example. Slots labeled ``B`` are +channel 1. Notice that there are two csrows possible on a physical DIMM. +These csrows are allocated their csrow assignment based on the slot into +which the memory DIMM is placed. Thus, when 1 DIMM is placed in each +Channel, the csrows cross both DIMMs. + +Memory DIMMs come single or dual "ranked". A rank is a populated csrow. +Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above +will have just one csrow (csrow0). csrow1 will be empty. On the other +hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0 +and csrow1 will be populated. The pattern repeats itself for csrow2 and +csrow3. + +The representation of the above is reflected in the directory +tree in EDAC's sysfs interface. Starting in directory +``/sys/devices/system/edac/mc``, each memory controller will be +represented by its own ``mcX`` directory, where ``X`` is the +index of the MC:: + + ..../edac/mc/ + | + |->mc0 + |->mc1 + |->mc2 + .... + +Under each ``mcX`` directory each ``csrowX`` is again represented by a +``csrowX``, where ``X`` is the csrow index:: + + .../mc/mc0/ + | + |->csrow0 + |->csrow2 + |->csrow3 + .... + +Notice that there is no csrow1, which indicates that csrow0 is composed +of a single ranked DIMMs. This should also apply in both Channels, in +order to have dual-channel mode be operational. Since both csrow2 and +csrow3 are populated, this indicates a dual ranked set of DIMMs for +channels 0 and 1. + +Within each of the ``mcX`` and ``csrowX`` directories are several EDAC +control and attribute files. + +``mcX`` directories +------------------- + +In ``mcX`` directories are EDAC control and attribute files for +this ``X`` instance of the memory controllers. + +For a description of the sysfs API, please see: + + Documentation/ABI/testing/sysfs-devices-edac + + +``dimmX`` or ``rankX`` directories +---------------------------------- + +The recommended way to use the EDAC subsystem is to look at the information +provided by the ``dimmX`` or ``rankX`` directories [#f5]_. + +A typical EDAC system has the following structure under +``/sys/devices/system/edac/``\ [#f6]_:: + + /sys/devices/system/edac/ + ├── mc + │   ├── mc0 + │   │   ├── ce_count + │   │   ├── ce_noinfo_count + │   │   ├── dimm0 + │   │   │   ├── dimm_dev_type + │   │   │   ├── dimm_edac_mode + │   │   │   ├── dimm_label + │   │   │   ├── dimm_location + │   │   │   ├── dimm_mem_type + │   │   │   ├── size + │   │   │   └── uevent + │   │   ├── max_location + │   │   ├── mc_name + │   │   ├── reset_counters + │   │   ├── seconds_since_reset + │   │   ├── size_mb + │   │   ├── ue_count + │   │   ├── ue_noinfo_count + │   │   └── uevent + │   ├── mc1 + │   │   ├── ce_count + │   │   ├── ce_noinfo_count + │   │   ├── dimm0 + │   │   │   ├── dimm_dev_type + │   │   │   ├── dimm_edac_mode + │   │   │   ├── dimm_label + │   │   │   ├── dimm_location + │   │   │   ├── dimm_mem_type + │   │   │   ├── size + │   │   │   └── uevent + │   │   ├── max_location + │   │   ├── mc_name + │   │   ├── reset_counters + │   │   ├── seconds_since_reset + │   │   ├── size_mb + │   │   ├── ue_count + │   │   ├── ue_noinfo_count + │   │   └── uevent + │   └── uevent + └── uevent + +In the ``dimmX`` directories are EDAC control and attribute files for +this ``X`` memory module: + +- ``size`` - Total memory managed by this csrow attribute file + + This attribute file displays, in count of megabytes, the memory + that this csrow contains. + +- ``dimm_dev_type`` - Device type attribute file + + This attribute file will display what type of DRAM device is + being utilized on this DIMM. + Examples: + + - x1 + - x2 + - x4 + - x8 + +- ``dimm_edac_mode`` - EDAC Mode of operation attribute file + + This attribute file will display what type of Error detection + and correction is being utilized. + +- ``dimm_label`` - memory module label control file + + This control file allows this DIMM to have a label assigned + to it. With this label in the module, when errors occur + the output can provide the DIMM label in the system log. + This becomes vital for panic events to isolate the + cause of the UE event. + + DIMM Labels must be assigned after booting, with information + that correctly identifies the physical slot with its + silk screen label. This information is currently very + motherboard specific and determination of this information + must occur in userland at this time. + +- ``dimm_location`` - location of the memory module + + The location can have up to 3 levels, and describe how the + memory controller identifies the location of a memory module. + Depending on the type of memory and memory controller, it + can be: + + - *csrow* and *channel* - used when the memory controller + doesn't identify a single DIMM - e. g. in ``rankX`` dir; + - *branch*, *channel*, *slot* - typically used on FB-DIMM memory + controllers; + - *channel*, *slot* - used on Nehalem and newer Intel drivers. + +- ``dimm_mem_type`` - Memory Type attribute file + + This attribute file will display what type of memory is currently + on this csrow. Normally, either buffered or unbuffered memory. + Examples: + + - Registered-DDR + - Unbuffered-DDR + +.. [#f5] On some systems, the memory controller doesn't have any logic + to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories. + On modern Intel memory controllers, the memory controller identifies the + memory modules directly. On such systems, the directory is called ``dimmX``. + +.. [#f6] There are also some ``power`` directories and ``subsystem`` + symlinks inside the sysfs mapping that are automatically created by + the sysfs subsystem. Currently, they serve no purpose. + +``csrowX`` directories +---------------------- + +When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX`` +directories. As this API doesn't work properly for Rambus, FB-DIMMs and +modern Intel Memory Controllers, this is being deprecated in favor of +``dimmX`` directories. + +In the ``csrowX`` directories are EDAC control and attribute files for +this ``X`` instance of csrow: + + +- ``ue_count`` - Total Uncorrectable Errors count attribute file + + This attribute file displays the total count of uncorrectable + errors that have occurred on this csrow. If panic_on_ue is set + this counter will not have a chance to increment, since EDAC + will panic the system. + + +- ``ce_count`` - Total Correctable Errors count attribute file + + This attribute file displays the total count of correctable + errors that have occurred on this csrow. This count is very + important to examine. CEs provide early indications that a + DIMM is beginning to fail. This count field should be + monitored for non-zero values and report such information + to the system administrator. + + +- ``size_mb`` - Total memory managed by this csrow attribute file + + This attribute file displays, in count of megabytes, the memory + that this csrow contains. + + +- ``mem_type`` - Memory Type attribute file + + This attribute file will display what type of memory is currently + on this csrow. Normally, either buffered or unbuffered memory. + Examples: + + - Registered-DDR + - Unbuffered-DDR + + +- ``edac_mode`` - EDAC Mode of operation attribute file + + This attribute file will display what type of Error detection + and correction is being utilized. + + +- ``dev_type`` - Device type attribute file + + This attribute file will display what type of DRAM device is + being utilized on this DIMM. + Examples: + + - x1 + - x2 + - x4 + - x8 + + +- ``ch0_ce_count`` - Channel 0 CE Count attribute file + + This attribute file will display the count of CEs on this + DIMM located in channel 0. + + +- ``ch0_ue_count`` - Channel 0 UE Count attribute file + + This attribute file will display the count of UEs on this + DIMM located in channel 0. + + +- ``ch0_dimm_label`` - Channel 0 DIMM Label control file + + + This control file allows this DIMM to have a label assigned + to it. With this label in the module, when errors occur + the output can provide the DIMM label in the system log. + This becomes vital for panic events to isolate the + cause of the UE event. + + DIMM Labels must be assigned after booting, with information + that correctly identifies the physical slot with its + silk screen label. This information is currently very + motherboard specific and determination of this information + must occur in userland at this time. + + +- ``ch1_ce_count`` - Channel 1 CE Count attribute file + + + This attribute file will display the count of CEs on this + DIMM located in channel 1. + + +- ``ch1_ue_count`` - Channel 1 UE Count attribute file + + + This attribute file will display the count of UEs on this + DIMM located in channel 0. + + +- ``ch1_dimm_label`` - Channel 1 DIMM Label control file + + This control file allows this DIMM to have a label assigned + to it. With this label in the module, when errors occur + the output can provide the DIMM label in the system log. + This becomes vital for panic events to isolate the + cause of the UE event. + + DIMM Labels must be assigned after booting, with information + that correctly identifies the physical slot with its + silk screen label. This information is currently very + motherboard specific and determination of this information + must occur in userland at this time. + + +System Logging +-------------- + +If logging for UEs and CEs is enabled, then system logs will contain +information indicating that errors have been detected:: + + EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, channel 1 "DIMM_B1": amd76x_edac + EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, channel 1 "DIMM_B1": amd76x_edac + + +The structure of the message is: + + +---------------------------------------+-------------+ + | Content + Example | + +=======================================+=============+ + | The memory controller | MC0 | + +---------------------------------------+-------------+ + | Error type | CE | + +---------------------------------------+-------------+ + | Memory page | 0x283 | + +---------------------------------------+-------------+ + | Offset in the page | 0xce0 | + +---------------------------------------+-------------+ + | The byte granularity | grain 8 | + | or resolution of the error | | + +---------------------------------------+-------------+ + | The error syndrome | 0xb741 | + +---------------------------------------+-------------+ + | Memory row | row 0 + + +---------------------------------------+-------------+ + | Memory channel | channel 1 | + +---------------------------------------+-------------+ + | DIMM label, if set prior | DIMM B1 | + +---------------------------------------+-------------+ + | And then an optional, driver-specific | | + | message that may have additional | | + | information. | | + +---------------------------------------+-------------+ + +Both UEs and CEs with no info will lack all but memory controller, error +type, a notice of "no info" and then an optional, driver-specific error +message. + + +PCI Bus Parity Detection +------------------------ + +On Header Type 00 devices, the primary status is looked at for any +parity error regardless of whether parity is enabled on the device or +not. (The spec indicates parity is generated in some cases). On Header +Type 01 bridges, the secondary status register is also looked at to see +if parity occurred on the bus on the other side of the bridge. + + +Sysfs configuration +------------------- + +Under ``/sys/devices/system/edac/pci`` are control and attribute files as +follows: + + +- ``check_pci_parity`` - Enable/Disable PCI Parity checking control file + + This control file enables or disables the PCI Bus Parity scanning + operation. Writing a 1 to this file enables the scanning. Writing + a 0 to this file disables the scanning. + + Enable:: + + echo "1" >/sys/devices/system/edac/pci/check_pci_parity + + Disable:: + + echo "0" >/sys/devices/system/edac/pci/check_pci_parity + + +- ``pci_parity_count`` - Parity Count + + This attribute file will display the number of parity errors that + have been detected. + + +Module parameters +----------------- + +- ``edac_mc_panic_on_ue`` - Panic on UE control file + + An uncorrectable error will cause a machine panic. This is usually + desirable. It is a bad idea to continue when an uncorrectable error + occurs - it is indeterminate what was uncorrected and the operating + system context might be so mangled that continuing will lead to further + corruption. If the kernel has MCE configured, then EDAC will never + notice the UE. + + LOAD TIME:: + + module/kernel parameter: edac_mc_panic_on_ue=[0|1] + + RUN TIME:: + + echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue + + +- ``edac_mc_log_ue`` - Log UE control file + + + Generate kernel messages describing uncorrectable errors. These errors + are reported through the system message log system. UE statistics + will be accumulated even when UE logging is disabled. + + LOAD TIME:: + + module/kernel parameter: edac_mc_log_ue=[0|1] + + RUN TIME:: + + echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ue + + +- ``edac_mc_log_ce`` - Log CE control file + + + Generate kernel messages describing correctable errors. These + errors are reported through the system message log system. + CE statistics will be accumulated even when CE logging is disabled. + + LOAD TIME:: + + module/kernel parameter: edac_mc_log_ce=[0|1] + + RUN TIME:: + + echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ce + + +- ``edac_mc_poll_msec`` - Polling period control file + + + The time period, in milliseconds, for polling for error information. + Too small a value wastes resources. Too large a value might delay + necessary handling of errors and might loose valuable information for + locating the error. 1000 milliseconds (once each second) is the current + default. Systems which require all the bandwidth they can get, may + increase this. + + LOAD TIME:: + + module/kernel parameter: edac_mc_poll_msec=[0|1] + + RUN TIME:: + + echo "1000" > /sys/module/edac_core/parameters/edac_mc_poll_msec + + +- ``panic_on_pci_parity`` - Panic on PCI PARITY Error + + + This control file enables or disables panicking when a parity + error has been detected. + + + module/kernel parameter:: + + edac_panic_on_pci_pe=[0|1] + + Enable:: + + echo "1" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe + + Disable:: + + echo "0" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe + + + +EDAC device type +---------------- + +In the header file, edac_pci.h, there is a series of edac_device structures +and APIs for the EDAC_DEVICE. + +User space access to an edac_device is through the sysfs interface. + +At the location ``/sys/devices/system/edac`` (sysfs) new edac_device devices +will appear. + +There is a three level tree beneath the above ``edac`` directory. For example, +the ``test_device_edac`` device (found at the http://bluesmoke.sourceforget.net +website) installs itself as:: + + /sys/devices/system/edac/test-instance + +in this directory are various controls, a symlink and one or more ``instance`` +directories. + +The standard default controls are: + + ============== ======================================================= + log_ce boolean to log CE events + log_ue boolean to log UE events + panic_on_ue boolean to ``panic`` the system if an UE is encountered + (default off, can be set true via startup script) + poll_msec time period between POLL cycles for events + ============== ======================================================= + +The test_device_edac device adds at least one of its own custom control: + + ============== ================================================== + test_bits which in the current test driver does nothing but + show how it is installed. A ported driver can + add one or more such controls and/or attributes + for specific uses. + One out-of-tree driver uses controls here to allow + for ERROR INJECTION operations to hardware + injection registers + ============== ================================================== + +The symlink points to the 'struct dev' that is registered for this edac_device. + +Instances +--------- + +One or more instance directories are present. For the ``test_device_edac`` +case: + + +----------------+ + | test-instance0 | + +----------------+ + + +In this directory there are two default counter attributes, which are totals of +counter in deeper subdirectories. + + ============== ==================================== + ce_count total of CE events of subdirectories + ue_count total of UE events of subdirectories + ============== ==================================== + +Blocks +------ + +At the lowest directory level is the ``block`` directory. There can be 0, 1 +or more blocks specified in each instance: + + +-------------+ + | test-block0 | + +-------------+ + +In this directory the default attributes are: + + ============== ================================================ + ce_count which is counter of CE events for this ``block`` + of hardware being monitored + ue_count which is counter of UE events for this ``block`` + of hardware being monitored + ============== ================================================ + + +The ``test_device_edac`` device adds 4 attributes and 1 control: + + ================== ==================================================== + test-block-bits-0 for every POLL cycle this counter + is incremented + test-block-bits-1 every 10 cycles, this counter is bumped once, + and test-block-bits-0 is set to 0 + test-block-bits-2 every 100 cycles, this counter is bumped once, + and test-block-bits-1 is set to 0 + test-block-bits-3 every 1000 cycles, this counter is bumped once, + and test-block-bits-2 is set to 0 + ================== ==================================================== + + + ================== ==================================================== + reset-counters writing ANY thing to this control will + reset all the above counters. + ================== ==================================================== + + +Use of the ``test_device_edac`` driver should enable any others to create their own +unique drivers for their hardware systems. + +The ``test_device_edac`` sample driver is located at the +http://bluesmoke.sourceforge.net project site for EDAC. + + +Usage of EDAC APIs on Nehalem and newer Intel CPUs +-------------------------------------------------- + +On older Intel architectures, the memory controller was part of the North +Bridge chipset. Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Sky Lake and +newer Intel architectures integrated an enhanced version of the memory +controller (MC) inside the CPUs. + +This chapter will cover the differences of the enhanced memory controllers +found on newer Intel CPUs, such as ``i7core_edac``, ``sb_edac`` and +``sbx_edac`` drivers. + +.. note:: + + The Xeon E7 processor families use a separate chip for the memory + controller, called Intel Scalable Memory Buffer. This section doesn't + apply for such families. + +1) There is one Memory Controller per Quick Patch Interconnect + (QPI). At the driver, the term "socket" means one QPI. This is + associated with a physical CPU socket. + + Each MC have 3 physical read channels, 3 physical write channels and + 3 logic channels. The driver currently sees it as just 3 channels. + Each channel can have up to 3 DIMMs. + + The minimum known unity is DIMMs. There are no information about csrows. + As EDAC API maps the minimum unity is csrows, the driver sequentially + maps channel/DIMM into different csrows. + + For example, supposing the following layout:: + + Ch0 phy rd0, wr0 (0x063f4031): 2 ranks, UDIMMs + dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 + dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400 + Ch1 phy rd1, wr1 (0x063f4031): 2 ranks, UDIMMs + dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 + Ch2 phy rd3, wr3 (0x063f4031): 2 ranks, UDIMMs + dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 + + The driver will map it as:: + + csrow0: channel 0, dimm0 + csrow1: channel 0, dimm1 + csrow2: channel 1, dimm0 + csrow3: channel 2, dimm0 + + exports one DIMM per csrow. + + Each QPI is exported as a different memory controller. + +2) The MC has the ability to inject errors to test drivers. The drivers + implement this functionality via some error injection nodes: + + For injecting a memory error, there are some sysfs nodes, under + ``/sys/devices/system/edac/mc/mc?/``: + + - ``inject_addrmatch/*``: + Controls the error injection mask register. It is possible to specify + several characteristics of the address to match an error code:: + + dimm = the affected dimm. Numbers are relative to a channel; + rank = the memory rank; + channel = the channel that will generate an error; + bank = the affected bank; + page = the page address; + column (or col) = the address column. + + each of the above values can be set to "any" to match any valid value. + + At driver init, all values are set to any. + + For example, to generate an error at rank 1 of dimm 2, for any channel, + any bank, any page, any column:: + + echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm + echo 1 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank + + To return to the default behaviour of matching any, you can do:: + + echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm + echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank + + - ``inject_eccmask``: + specifies what bits will have troubles, + + - ``inject_section``: + specifies what ECC cache section will get the error:: + + 3 for both + 2 for the highest + 1 for the lowest + + - ``inject_type``: + specifies the type of error, being a combination of the following bits:: + + bit 0 - repeat + bit 1 - ecc + bit 2 - parity + + - ``inject_enable``: + starts the error generation when something different than 0 is written. + + All inject vars can be read. root permission is needed for write. + + Datasheet states that the error will only be generated after a write on an + address that matches inject_addrmatch. It seems, however, that reading will + also produce an error. + + For example, the following code will generate an error for any write access + at socket 0, on any DIMM/address on channel 2:: + + echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/channel + echo 2 >/sys/devices/system/edac/mc/mc0/inject_type + echo 64 >/sys/devices/system/edac/mc/mc0/inject_eccmask + echo 3 >/sys/devices/system/edac/mc/mc0/inject_section + echo 1 >/sys/devices/system/edac/mc/mc0/inject_enable + dd if=/dev/mem of=/dev/null seek=16k bs=4k count=1 >& /dev/null + + For socket 1, it is needed to replace "mc0" by "mc1" at the above + commands. + + The generated error message will look like:: + + EDAC MC0: UE row 0, channel-a= 0 channel-b= 0 labels "-": NON_FATAL (addr = 0x0075b980, socket=0, Dimm=0, Channel=2, syndrome=0x00000040, count=1, Err=8c0000400001009f:4000080482 (read error: read ECC error)) + +3) Corrected Error memory register counters + + Those newer MCs have some registers to count memory errors. The driver + uses those registers to report Corrected Errors on devices with Registered + DIMMs. + + However, those counters don't work with Unregistered DIMM. As the chipset + offers some counters that also work with UDIMMs (but with a worse level of + granularity than the default ones), the driver exposes those registers for + UDIMM memories. + + They can be read by looking at the contents of ``all_channel_counts/``:: + + $ for i in /sys/devices/system/edac/mc/mc0/all_channel_counts/*; do echo $i; cat $i; done + /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0 + 0 + /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1 + 0 + /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2 + 0 + + What happens here is that errors on different csrows, but at the same + dimm number will increment the same counter. + So, in this memory mapping:: + + csrow0: channel 0, dimm0 + csrow1: channel 0, dimm1 + csrow2: channel 1, dimm0 + csrow3: channel 2, dimm0 + + The hardware will increment udimm0 for an error at the first dimm at either + csrow0, csrow2 or csrow3; + + The hardware will increment udimm1 for an error at the second dimm at either + csrow0, csrow2 or csrow3; + + The hardware will increment udimm2 for an error at the third dimm at either + csrow0, csrow2 or csrow3; + +4) Standard error counters + + The standard error counters are generated when an mcelog error is received + by the driver. Since, with UDIMM, this is counted by software, it is + possible that some errors could be lost. With RDIMM's, they display the + contents of the registers + +Reference documents used on ``amd64_edac`` +------------------------------------------ + +``amd64_edac`` module is based on the following documents +(available from http://support.amd.com/en-us/search/tech-docs): + +1. :Title: BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD + Opteron Processors + :AMD publication #: 26094 + :Revision: 3.26 + :Link: http://support.amd.com/TechDocs/26094.PDF + +2. :Title: BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh + Processors + :AMD publication #: 32559 + :Revision: 3.00 + :Issue Date: May 2006 + :Link: http://support.amd.com/TechDocs/32559.pdf + +3. :Title: BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h + Processors + :AMD publication #: 31116 + :Revision: 3.00 + :Issue Date: September 07, 2007 + :Link: http://support.amd.com/TechDocs/31116.pdf + +4. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h + Models 30h-3Fh Processors + :AMD publication #: 49125 + :Revision: 3.06 + :Issue Date: 2/12/2015 (latest release) + :Link: http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf + +5. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h + Models 60h-6Fh Processors + :AMD publication #: 50742 + :Revision: 3.01 + :Issue Date: 7/23/2015 (latest release) + :Link: http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf + +6. :Title: BIOS and Kernel Developer's Guide (BKDG) for AMD Family 16h + Models 00h-0Fh Processors + :AMD publication #: 48751 + :Revision: 3.03 + :Issue Date: 2/23/2015 (latest release) + :Link: http://support.amd.com/TechDocs/48751_16h_bkdg.pdf + +Credits +======= + +* Written by Doug Thompson + + - 7 Dec 2005 + - 17 Jul 2007 Updated + +* |copy| Mauro Carvalho Chehab + + - 05 Aug 2009 Nehalem interface + - 26 Oct 2016 Converted to ReST and cleanups at the Nehalem section + +* EDAC authors/maintainers: + + - Doug Thompson, Dave Jiang, Dave Peterson et al, + - Mauro Carvalho Chehab + - Borislav Petkov + - original author: Thayne Harbaugh diff --git a/REPORTING-BUGS b/Documentation/admin-guide/reporting-bugs.rst similarity index 78% rename from REPORTING-BUGS rename to Documentation/admin-guide/reporting-bugs.rst index 914baf9cf5fa..26b60b419652 100644 --- a/REPORTING-BUGS +++ b/Documentation/admin-guide/reporting-bugs.rst @@ -1,3 +1,8 @@ +.. _reportingbugs: + +Reporting bugs +++++++++++++++ + Background ========== @@ -50,12 +55,13 @@ maintainer replies to you, make sure to 'Reply-all' in order to keep the public mailing list(s) in the email thread. If you know which driver is causing issues, you can pass one of the driver -files to the get_maintainer.pl script: +files to the get_maintainer.pl script:: + perl scripts/get_maintainer.pl -f If it is a security bug, please copy the Security Contact listed in the MAINTAINERS file. They can help coordinate bugfix and disclosure. See -Documentation/SecurityBugs for more information. +:ref:`Documentation/admin-guide/security-bugs.rst ` for more information. If you can't figure out which subsystem caused the issue, you should file a bug in kernel.org bugzilla and send email to @@ -69,8 +75,9 @@ Tips for reporting bugs If you haven't reported a bug before, please read: -http://www.chiark.greenend.org.uk/~sgtatham/bugs.html -http://www.catb.org/esr/faqs/smart-questions.html + http://www.chiark.greenend.org.uk/~sgtatham/bugs.html + + http://www.catb.org/esr/faqs/smart-questions.html It's REALLY important to report bugs that seem unrelated as separate email threads or separate bugzilla entries. If you report several unrelated @@ -87,7 +94,7 @@ step-by-step instructions for how a user can trigger the bug. If the failure includes an "OOPS:", take a picture of the screen, capture a netconsole trace, or type the message from your screen into the bug -report. Please read "Documentation/oops-tracing.txt" before posting your +report. Please read "Documentation/admin-guide/oops-tracing.rst" before posting your bug report. This explains what you should do with the "Oops" information to make it useful to the recipient. @@ -99,34 +106,34 @@ relevant to your bug, feel free to exclude it. First run the ver_linux script included as scripts/ver_linux, which reports the version of some important subsystems. Run this script with -the command "sh scripts/ver_linux". +the command ``awk -f scripts/ver_linux``. Use that information to fill in all fields of the bug report form, and post it to the mailing list with a subject of "PROBLEM: " for easy identification by the developers. - -[1.] One line summary of the problem: -[2.] Full description of the problem/report: -[3.] Keywords (i.e., modules, networking, kernel): -[4.] Kernel information -[4.1.] Kernel version (from /proc/version): -[4.2.] Kernel .config file: -[5.] Most recent kernel version which did not have the bug: -[6.] Output of Oops.. message (if applicable) with symbolic information - resolved (see Documentation/oops-tracing.txt) -[7.] A small shell script or example program which triggers the - problem (if possible) -[8.] Environment -[8.1.] Software (add the output of the ver_linux script here) -[8.2.] Processor information (from /proc/cpuinfo): -[8.3.] Module information (from /proc/modules): -[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) -[8.5.] PCI information ('lspci -vvv' as root) -[8.6.] SCSI information (from /proc/scsi/scsi) -[8.7.] Other information that might be relevant to the problem - (please look in /proc and include all information that you - think to be relevant): -[X.] Other notes, patches, fixes, workarounds: +summary from [1.]>" for easy identification by the developers:: + + [1.] One line summary of the problem: + [2.] Full description of the problem/report: + [3.] Keywords (i.e., modules, networking, kernel): + [4.] Kernel information + [4.1.] Kernel version (from /proc/version): + [4.2.] Kernel .config file: + [5.] Most recent kernel version which did not have the bug: + [6.] Output of Oops.. message (if applicable) with symbolic information + resolved (see Documentation/admin-guide/oops-tracing.rst) + [7.] A small shell script or example program which triggers the + problem (if possible) + [8.] Environment + [8.1.] Software (add the output of the ver_linux script here) + [8.2.] Processor information (from /proc/cpuinfo): + [8.3.] Module information (from /proc/modules): + [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) + [8.5.] PCI information ('lspci -vvv' as root) + [8.6.] SCSI information (from /proc/scsi/scsi) + [8.7.] Other information that might be relevant to the problem + (please look in /proc and include all information that you + think to be relevant): + [X.] Other notes, patches, fixes, workarounds: Follow up @@ -153,7 +160,8 @@ Expectations for kernel maintainers Linux kernel maintainers are busy, overworked human beings. Some times they may not be able to address your bug in a day, a week, or two weeks. If they don't answer your email, they may be on vacation, or at a Linux -conference. Check the conference schedule at LWN.net for more info: +conference. Check the conference schedule at https://LWN.net for more info: + https://lwn.net/Calendar/ In general, kernel maintainers take 1 to 5 business days to respond to diff --git a/Documentation/SecurityBugs b/Documentation/admin-guide/security-bugs.rst similarity index 91% rename from Documentation/SecurityBugs rename to Documentation/admin-guide/security-bugs.rst index 342d769834f6..4f7414cad586 100644 --- a/Documentation/SecurityBugs +++ b/Documentation/admin-guide/security-bugs.rst @@ -8,8 +8,8 @@ like to know when a security bug is found so that it can be fixed and disclosed as quickly as possible. Please report security bugs to the Linux kernel security team. -1) Contact ----------- +Contact +------- The Linux kernel security team can be contacted by email at . This is a private list of security officers @@ -19,12 +19,12 @@ area maintainers to understand and fix the security vulnerability. As it is with any bug, the more information provided the easier it will be to diagnose and fix. Please review the procedure outlined in -REPORTING-BUGS if you are unclear about what information is helpful. +admin-guide/reporting-bugs.rst if you are unclear about what information is helpful. Any exploit code is very helpful and will not be released without consent from the reporter unless it has already been made public. -2) Disclosure -------------- +Disclosure +---------- The goal of the Linux kernel security team is to work with the bug submitter to bug resolution as well as disclosure. We prefer @@ -39,8 +39,8 @@ disclosure is from immediate (esp. if it's already publicly known) to a few weeks. As a basic default policy, we expect report date to disclosure date to be on the order of 7 days. -3) Non-disclosure agreements ----------------------------- +Non-disclosure agreements +------------------------- The Linux kernel security team is not a formal body and therefore unable to enter any non-disclosure agreements. diff --git a/Documentation/serial-console.txt b/Documentation/admin-guide/serial-console.rst similarity index 60% rename from Documentation/serial-console.txt rename to Documentation/admin-guide/serial-console.rst index 9a7bc8b3f479..a8d1e36b627a 100644 --- a/Documentation/serial-console.txt +++ b/Documentation/admin-guide/serial-console.rst @@ -1,15 +1,21 @@ - Linux Serial Console +.. _serial_console: + +Linux Serial Console +==================== To use a serial port as console you need to compile the support into your kernel - by default it is not compiled in. For PC style serial ports -it's the config option next to "Standard/generic (dumb) serial support". +it's the config option next to menu option: + +:menuselection:`Character devices --> Serial drivers --> 8250/16550 and compatible serial support --> Console on 8250/16550 and compatible serial port` + You must compile serial support into the kernel and not as a module. It is possible to specify multiple devices for console output. You can define a new kernel command line option to select which device(s) to use for console output. -The format of this option is: +The format of this option is:: console=device,options @@ -28,11 +34,11 @@ The format of this option is: You can specify multiple console= options on the kernel command line. Output will appear on all of them. The last device will be used when -you open /dev/console. So, for example: +you open ``/dev/console``. So, for example:: console=ttyS1,9600 console=tty0 -defines that opening /dev/console will get you the current foreground +defines that opening ``/dev/console`` will get you the current foreground virtual console, and kernel messages will appear on both the VGA console and the 2nd serial port (ttyS1 or COM2) at 9600 baud. @@ -44,61 +50,61 @@ first looks for a VGA card and then for a serial port. So if you don't have a VGA card in your system the first serial port will automatically become the console. -You will need to create a new device to use /dev/console. The official -/dev/console is now character device 5,1. +You will need to create a new device to use ``/dev/console``. The official +``/dev/console`` is now character device 5,1. (You can also use a network device as a console. See -Documentation/networking/netconsole.txt for information on that.) +``Documentation/networking/netconsole.txt`` for information on that.) -Here's an example that will use /dev/ttyS1 (COM2) as the console. +Here's an example that will use ``/dev/ttyS1`` (COM2) as the console. Replace the sample values as needed. -1. Create /dev/console (real console) and /dev/tty0 (master virtual - console): +1. Create ``/dev/console`` (real console) and ``/dev/tty0`` (master virtual + console):: - cd /dev - rm -f console tty0 - mknod -m 622 console c 5 1 - mknod -m 622 tty0 c 4 0 + cd /dev + rm -f console tty0 + mknod -m 622 console c 5 1 + mknod -m 622 tty0 c 4 0 2. LILO can also take input from a serial device. This is a very useful option. To tell LILO to use the serial port: - In lilo.conf (global section): + In lilo.conf (global section):: - serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits) + serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits) 3. Adjust to kernel flags for the new kernel, - again in lilo.conf (kernel section) + again in lilo.conf (kernel section):: - append = "console=ttyS1,9600" + append = "console=ttyS1,9600" 4. Make sure a getty runs on the serial port so that you can login to it once the system is done booting. This is done by adding a line - like this to /etc/inittab (exact syntax depends on your getty): + like this to ``/etc/inittab`` (exact syntax depends on your getty):: - S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100 + S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100 -5. Init and /etc/ioctl.save +5. Init and ``/etc/ioctl.save`` - Sysvinit remembers its stty settings in a file in /etc, called - `/etc/ioctl.save'. REMOVE THIS FILE before using the serial + Sysvinit remembers its stty settings in a file in ``/etc``, called + ``/etc/ioctl.save``. REMOVE THIS FILE before using the serial console for the first time, because otherwise init will probably set the baudrate to 38400 (baudrate of the virtual console). -6. /dev/console and X +6. ``/dev/console`` and X Programs that want to do something with the virtual console usually - open /dev/console. If you have created the new /dev/console device, + open ``/dev/console``. If you have created the new ``/dev/console`` device, and your console is NOT the virtual console some programs will fail. Those are programs that want to access the VT interface, and use - /dev/console instead of /dev/tty0. Some of those programs are: + ``/dev/console instead of /dev/tty0``. Some of those programs are:: - Xfree86, svgalib, gpm, SVGATextMode + Xfree86, svgalib, gpm, SVGATextMode It should be fixed in modern versions of these programs though. - Note that if you boot without a console= option (or with - console=/dev/tty0), /dev/console is the same as /dev/tty0. In that - case everything will still work. + Note that if you boot without a ``console=`` option (or with + ``console=/dev/tty0``), ``/dev/console`` is the same as ``/dev/tty0``. + In that case everything will still work. 7. Thanks diff --git a/Documentation/admin-guide/sysfs-rules.rst b/Documentation/admin-guide/sysfs-rules.rst new file mode 100644 index 000000000000..abad33526aca --- /dev/null +++ b/Documentation/admin-guide/sysfs-rules.rst @@ -0,0 +1,192 @@ +Rules on how to access information in sysfs +=========================================== + +The kernel-exported sysfs exports internal kernel implementation details +and depends on internal kernel structures and layout. It is agreed upon +by the kernel developers that the Linux kernel does not provide a stable +internal API. Therefore, there are aspects of the sysfs interface that +may not be stable across kernel releases. + +To minimize the risk of breaking users of sysfs, which are in most cases +low-level userspace applications, with a new kernel release, the users +of sysfs must follow some rules to use an as-abstract-as-possible way to +access this filesystem. The current udev and HAL programs already +implement this and users are encouraged to plug, if possible, into the +abstractions these programs provide instead of accessing sysfs directly. + +But if you really do want or need to access sysfs directly, please follow +the following rules and then your programs should work with future +versions of the sysfs interface. + +- Do not use libsysfs + It makes assumptions about sysfs which are not true. Its API does not + offer any abstraction, it exposes all the kernel driver-core + implementation details in its own API. Therefore it is not better than + reading directories and opening the files yourself. + Also, it is not actively maintained, in the sense of reflecting the + current kernel development. The goal of providing a stable interface + to sysfs has failed; it causes more problems than it solves. It + violates many of the rules in this document. + +- sysfs is always at ``/sys`` + Parsing ``/proc/mounts`` is a waste of time. Other mount points are a + system configuration bug you should not try to solve. For test cases, + possibly support a ``SYSFS_PATH`` environment variable to overwrite the + application's behavior, but never try to search for sysfs. Never try + to mount it, if you are not an early boot script. + +- devices are only "devices" + There is no such thing like class-, bus-, physical devices, + interfaces, and such that you can rely on in userspace. Everything is + just simply a "device". Class-, bus-, physical, ... types are just + kernel implementation details which should not be expected by + applications that look for devices in sysfs. + + The properties of a device are: + + - devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``) + + - identical to the DEVPATH value in the event sent from the kernel + at device creation and removal + - the unique key to the device at that point in time + - the kernel's path to the device directory without the leading + ``/sys``, and always starting with a slash + - all elements of a devpath must be real directories. Symlinks + pointing to /sys/devices must always be resolved to their real + target and the target path must be used to access the device. + That way the devpath to the device matches the devpath of the + kernel used at event time. + - using or exposing symlink values as elements in a devpath string + is a bug in the application + + - kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...) + + - a directory name, identical to the last element of the devpath + - applications need to handle spaces and characters like ``!`` in + the name + + - subsystem (``block``, ``tty``, ``pci``, ...) + + - simple string, never a path or a link + - retrieved by reading the "subsystem"-link and using only the + last element of the target path + + - driver (``tg3``, ``ata_piix``, ``uhci_hcd``) + + - a simple string, which may contain spaces, never a path or a + link + - it is retrieved by reading the "driver"-link and using only the + last element of the target path + - devices which do not have "driver"-link just do not have a + driver; copying the driver value in a child device context is a + bug in the application + + - attributes + + - the files in the device directory or files below subdirectories + of the same device directory + - accessing attributes reached by a symlink pointing to another device, + like the "device"-link, is a bug in the application + + Everything else is just a kernel driver-core implementation detail + that should not be assumed to be stable across kernel releases. + +- Properties of parent devices never belong into a child device. + Always look at the parent devices themselves for determining device + context properties. If the device ``eth0`` or ``sda`` does not have a + "driver"-link, then this device does not have a driver. Its value is empty. + Never copy any property of the parent-device into a child-device. Parent + device properties may change dynamically without any notice to the + child device. + +- Hierarchy in a single device tree + There is only one valid place in sysfs where hierarchy can be examined + and this is below: ``/sys/devices.`` + It is planned that all device directories will end up in the tree + below this directory. + +- Classification by subsystem + There are currently three places for classification of devices: + ``/sys/block,`` ``/sys/class`` and ``/sys/bus.`` It is planned that these will + not contain any device directories themselves, but only flat lists of + symlinks pointing to the unified ``/sys/devices`` tree. + All three places have completely different rules on how to access + device information. It is planned to merge all three + classification directories into one place at ``/sys/subsystem``, + following the layout of the bus directories. All buses and + classes, including the converted block subsystem, will show up + there. + The devices belonging to a subsystem will create a symlink in the + "devices" directory at ``/sys/subsystem//devices``, + + If ``/sys/subsystem`` exists, ``/sys/bus``, ``/sys/class`` and ``/sys/block`` + can be ignored. If it does not exist, you always have to scan all three + places, as the kernel is free to move a subsystem from one place to + the other, as long as the devices are still reachable by the same + subsystem name. + + Assuming ``/sys/class/`` and ``/sys/bus/``, or + ``/sys/block`` and ``/sys/class/block`` are not interchangeable is a bug in + the application. + +- Block + The converted block subsystem at ``/sys/class/block`` or + ``/sys/subsystem/block`` will contain the links for disks and partitions + at the same level, never in a hierarchy. Assuming the block subsystem to + contain only disks and not partition devices in the same flat list is + a bug in the application. + +- "device"-link and :-links + Never depend on the "device"-link. The "device"-link is a workaround + for the old layout, where class devices are not created in + ``/sys/devices/`` like the bus devices. If the link-resolving of a + device directory does not end in ``/sys/devices/``, you can use the + "device"-link to find the parent devices in ``/sys/devices/``, That is the + single valid use of the "device"-link; it must never appear in any + path as an element. Assuming the existence of the "device"-link for + a device in ``/sys/devices/`` is a bug in the application. + Accessing ``/sys/class/net/eth0/device`` is a bug in the application. + + Never depend on the class-specific links back to the ``/sys/class`` + directory. These links are also a workaround for the design mistake + that class devices are not created in ``/sys/devices.`` If a device + directory does not contain directories for child devices, these links + may be used to find the child devices in ``/sys/class.`` That is the single + valid use of these links; they must never appear in any path as an + element. Assuming the existence of these links for devices which are + real child device directories in the ``/sys/devices`` tree is a bug in + the application. + + It is planned to remove all these links when all class device + directories live in ``/sys/devices.`` + +- Position of devices along device chain can change. + Never depend on a specific parent device position in the devpath, + or the chain of parent devices. The kernel is free to insert devices into + the chain. You must always request the parent device you are looking for + by its subsystem value. You need to walk up the chain until you find + the device that matches the expected subsystem. Depending on a specific + position of a parent device or exposing relative paths using ``../`` to + access the chain of parents is a bug in the application. + +- When reading and writing sysfs device attribute files, avoid dependency + on specific error codes wherever possible. This minimizes coupling to + the error handling implementation within the kernel. + + In general, failures to read or write sysfs device attributes shall + propagate errors wherever possible. Common errors include, but are not + limited to: + + ``-EIO``: The read or store operation is not supported, typically + returned by the sysfs system itself if the read or store pointer + is ``NULL``. + + ``-ENXIO``: The read or store operation failed + + Error codes will not be changed without good reason, and should a change + to error codes result in user-space breakage, it will be fixed, or the + the offending change will be reverted. + + Userspace applications can, however, expect the format and contents of + the attribute files to remain consistent in the absence of a version + attribute change in the context of a given attribute. diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst new file mode 100644 index 000000000000..d1712ea2d314 --- /dev/null +++ b/Documentation/admin-guide/sysrq.rst @@ -0,0 +1,289 @@ +Linux Magic System Request Key Hacks +==================================== + +Documentation for sysrq.c + +What is the magic SysRq key? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is a 'magical' key combo you can hit which the kernel will respond to +regardless of whatever else it is doing, unless it is completely locked up. + +How do I enable the magic SysRq key? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when +configuring the kernel. When running a kernel with SysRq compiled in, +/proc/sys/kernel/sysrq controls the functions allowed to be invoked via +the SysRq key. The default value in this file is set by the +CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE config symbol, which itself defaults +to 1. Here is the list of possible values in /proc/sys/kernel/sysrq: + + - 0 - disable sysrq completely + - 1 - enable all functions of sysrq + - >1 - bitmask of allowed sysrq functions (see below for detailed function + description):: + + 2 = 0x2 - enable control of console logging level + 4 = 0x4 - enable control of keyboard (SAK, unraw) + 8 = 0x8 - enable debugging dumps of processes etc. + 16 = 0x10 - enable sync command + 32 = 0x20 - enable remount read-only + 64 = 0x40 - enable signalling of processes (term, kill, oom-kill) + 128 = 0x80 - allow reboot/poweroff + 256 = 0x100 - allow nicing of all RT tasks + +You can set the value in the file by the following command:: + + echo "number" >/proc/sys/kernel/sysrq + +The number may be written here either as decimal or as hexadecimal +with the 0x prefix. CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE must always be +written in hexadecimal. + +Note that the value of ``/proc/sys/kernel/sysrq`` influences only the invocation +via a keyboard. Invocation of any operation via ``/proc/sysrq-trigger`` is +always allowed (by a user with admin privileges). + +How do I use the magic SysRq key? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On x86 - You press the key combo :kbd:`ALT-SysRq-`. + +.. note:: + Some + keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is + also known as the 'Print Screen' key. Also some keyboards cannot + handle so many keys being pressed at the same time, so you might + have better luck with press :kbd:`Alt`, press :kbd:`SysRq`, + release :kbd:`SysRq`, press :kbd:``, release everything. + +On SPARC - You press :kbd:`ALT-STOP-`, I believe. + +On the serial console (PC style standard serial ports only) + You send a ``BREAK``, then within 5 seconds a command key. Sending + ``BREAK`` twice is interpreted as a normal BREAK. + +On PowerPC + Press :kbd:`ALT - Print Screen` (or :kbd:`F13`) - :kbd:``, + :kbd:`Print Screen` (or :kbd:`F13`) - :kbd:`` may suffice. + +On other + If you know of the key combos for other architectures, please + let me know so I can add them to this section. + +On all + write a character to /proc/sysrq-trigger. e.g.:: + + echo t > /proc/sysrq-trigger + +What are the 'command' keys? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +=========== =================================================================== +Command Function +=========== =================================================================== +``b`` Will immediately reboot the system without syncing or unmounting + your disks. + +``c`` Will perform a system crash by a NULL pointer dereference. + A crashdump will be taken if configured. + +``d`` Shows all locks that are held. + +``e`` Send a SIGTERM to all processes, except for init. + +``f`` Will call the oom killer to kill a memory hog process, but do not + panic if nothing can be killed. + +``g`` Used by kgdb (kernel debugger) + +``h`` Will display help (actually any other key than those listed + here will display help. but ``h`` is easy to remember :-) + +``i`` Send a SIGKILL to all processes, except for init. + +``j`` Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl. + +``k`` Secure Access Key (SAK) Kills all programs on the current virtual + console. NOTE: See important comments below in SAK section. + +``l`` Shows a stack backtrace for all active CPUs. + +``m`` Will dump current memory info to your console. + +``n`` Used to make RT tasks nice-able + +``o`` Will shut your system off (if configured and supported). + +``p`` Will dump the current registers and flags to your console. + +``q`` Will dump per CPU lists of all armed hrtimers (but NOT regular + timer_list timers) and detailed information about all + clockevent devices. + +``r`` Turns off keyboard raw mode and sets it to XLATE. + +``s`` Will attempt to sync all mounted filesystems. + +``t`` Will dump a list of current tasks and their information to your + console. + +``u`` Will attempt to remount all mounted filesystems read-only. + +``v`` Forcefully restores framebuffer console +``v`` Causes ETM buffer dump [ARM-specific] + +``w`` Dumps tasks that are in uninterruptable (blocked) state. + +``x`` Used by xmon interface on ppc/powerpc platforms. + Show global PMU Registers on sparc64. + Dump all TLB entries on MIPS. + +``y`` Show global CPU Registers [SPARC-64 specific] + +``z`` Dump the ftrace buffer + +``0``-``9`` Sets the console log level, controlling which kernel messages + will be printed to your console. (``0``, for example would make + it so that only emergency messages like PANICs or OOPSes would + make it to your console.) +=========== =================================================================== + +Okay, so what can I use them for? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Well, unraw(r) is very handy when your X server or a svgalib program crashes. + +sak(k) (Secure Access Key) is useful when you want to be sure there is no +trojan program running at console which could grab your password +when you would try to login. It will kill all programs on given console, +thus letting you make sure that the login prompt you see is actually +the one from init, not some trojan program. + +.. important:: + + In its true form it is not a true SAK like the one in a + c2 compliant system, and it should not be mistaken as + such. + +It seems others find it useful as (System Attention Key) which is +useful when you want to exit a program that will not let you switch consoles. +(For example, X or a svgalib program.) + +``reboot(b)`` is good when you're unable to shut down. But you should also +``sync(s)`` and ``umount(u)`` first. + +``crash(c)`` can be used to manually trigger a crashdump when the system is hung. +Note that this just triggers a crash if there is no dump mechanism available. + +``sync(s)`` is great when your system is locked up, it allows you to sync your +disks and will certainly lessen the chance of data loss and fscking. Note +that the sync hasn't taken place until you see the "OK" and "Done" appear +on the screen. (If the kernel is really in strife, you may not ever get the +OK or Done message...) + +``umount(u)`` is basically useful in the same ways as ``sync(s)``. I generally +``sync(s)``, ``umount(u)``, then ``reboot(b)`` when my system locks. It's saved +me many a fsck. Again, the unmount (remount read-only) hasn't taken place until +you see the "OK" and "Done" message appear on the screen. + +The loglevels ``0``-``9`` are useful when your console is being flooded with +kernel messages you do not want to see. Selecting ``0`` will prevent all but +the most urgent kernel messages from reaching your console. (They will +still be logged if syslogd/klogd are alive, though.) + +``term(e)`` and ``kill(i)`` are useful if you have some sort of runaway process +you are unable to kill any other way, especially if it's spawning other +processes. + +"just thaw ``it(j)``" is useful if your system becomes unresponsive due to a +frozen (probably root) filesystem via the FIFREEZE ioctl. + +Sometimes SysRq seems to get 'stuck' after using it, what can I do? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +That happens to me, also. I've found that tapping shift, alt, and control +on both sides of the keyboard, and hitting an invalid sysrq sequence again +will fix the problem. (i.e., something like :kbd:`alt-sysrq-z`). Switching to +another virtual console (:kbd:`ALT+Fn`) and then back again should also help. + +I hit SysRq, but nothing seems to happen, what's wrong? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are some keyboards that produce a different keycode for SysRq than the +pre-defined value of 99 (see ``KEY_SYSRQ`` in ``include/linux/input.h``), or +which don't have a SysRq key at all. In these cases, run ``showkey -s`` to find +an appropriate scancode sequence, and use ``setkeycodes 99`` to map +this sequence to the usual SysRq code (e.g., ``setkeycodes e05b 99``). It's +probably best to put this command in a boot script. Oh, and by the way, you +exit ``showkey`` by not typing anything for ten seconds. + +I want to add SysRQ key events to a module, how does it work? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to register a basic function with the table, you must first include +the header ``include/linux/sysrq.h``, this will define everything else you need. +Next, you must create a ``sysrq_key_op`` struct, and populate it with A) the key +handler function you will use, B) a help_msg string, that will print when SysRQ +prints help, and C) an action_msg string, that will print right before your +handler is called. Your handler must conform to the prototype in 'sysrq.h'. + +After the ``sysrq_key_op`` is created, you can call the kernel function +``register_sysrq_key(int key, struct sysrq_key_op *op_p);`` this will +register the operation pointed to by ``op_p`` at table key 'key', +if that slot in the table is blank. At module unload time, you must call +the function ``unregister_sysrq_key(int key, struct sysrq_key_op *op_p)``, which +will remove the key op pointed to by 'op_p' from the key 'key', if and only if +it is currently registered in that slot. This is in case the slot has been +overwritten since you registered it. + +The Magic SysRQ system works by registering key operations against a key op +lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has +a number of operations registered into it at compile time, but is mutable, +and 2 functions are exported for interface to it:: + + register_sysrq_key and unregister_sysrq_key. + +Of course, never ever leave an invalid pointer in the table. I.e., when +your module that called register_sysrq_key() exits, it must call +unregister_sysrq_key() to clean up the sysrq key table entry that it used. +Null pointers in the table are always safe. :) + +If for some reason you feel the need to call the handle_sysrq function from +within a function called by handle_sysrq, you must be aware that you are in +a lock (you are also in an interrupt handler, which means don't sleep!), so +you must call ``__handle_sysrq_nolock`` instead. + +When I hit a SysRq key combination only the header appears on the console? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Sysrq output is subject to the same console loglevel control as all +other console output. This means that if the kernel was booted 'quiet' +as is common on distro kernels the output may not appear on the actual +console, even though it will appear in the dmesg buffer, and be accessible +via the dmesg command and to the consumers of ``/proc/kmsg``. As a specific +exception the header line from the sysrq command is passed to all console +consumers as if the current loglevel was maximum. If only the header +is emitted it is almost certain that the kernel loglevel is too low. +Should you require the output on the console channel then you will need +to temporarily up the console loglevel using :kbd:`alt-sysrq-8` or:: + + echo 8 > /proc/sysrq-trigger + +Remember to return the loglevel to normal after triggering the sysrq +command you are interested in. + +I have more questions, who can I ask? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Just ask them on the linux-kernel mailing list: + linux-kernel@vger.kernel.org + +Credits +~~~~~~~ + +Written by Mydraal +Updated by Adam Sulmicki +Updated by Jeremy M. Dolan 2001/01/28 10:15:59 +Added to by Crutcher Dunnavant diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst new file mode 100644 index 000000000000..1df03b5cb02f --- /dev/null +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -0,0 +1,59 @@ +Tainted kernels +--------------- + +Some oops reports contain the string **'Tainted: '** after the program +counter. This indicates that the kernel has been tainted by some +mechanism. The string is followed by a series of position-sensitive +characters, each representing a particular tainted value. + + 1) 'G' if all modules loaded have a GPL or compatible license, 'P' if + any proprietary module has been loaded. Modules without a + MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by + insmod as GPL compatible are assumed to be proprietary. + + 2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all + modules were loaded normally. + + 3) ``S`` if the oops occurred on an SMP kernel running on hardware that + hasn't been certified as safe to run multiprocessor. + Currently this occurs only on various Athlons that are not + SMP capable. + + 4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all + modules were unloaded normally. + + 5) ``M`` if any processor has reported a Machine Check Exception, + ``' '`` if no Machine Check Exceptions have occurred. + + 6) ``B`` if a page-release function has found a bad page reference or + some unexpected page flags. + + 7) ``U`` if a user or user application specifically requested that the + Tainted flag be set, ``' '`` otherwise. + + 8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG. + + 9) ``A`` if the ACPI table has been overridden. + + 10) ``W`` if a warning has previously been issued by the kernel. + (Though some warnings may set more specific taint flags.) + + 11) ``C`` if a staging driver has been loaded. + + 12) ``I`` if the kernel is working around a severe bug in the platform + firmware (BIOS or similar). + + 13) ``O`` if an externally-built ("out-of-tree") module has been loaded. + + 14) ``E`` if an unsigned module has been loaded in a kernel supporting + module signature. + + 15) ``L`` if a soft lockup has previously occurred on the system. + + 16) ``K`` if the kernel has been live patched. + +The primary reason for the **'Tainted: '** string is to tell kernel +debuggers if this is a clean kernel or if anything unusual has +occurred. Tainting is permanent: even if an offending module is +unloaded, the tainted value remains to indicate that the kernel is not +trustworthy. diff --git a/Documentation/unicode.txt b/Documentation/admin-guide/unicode.rst similarity index 89% rename from Documentation/unicode.txt rename to Documentation/admin-guide/unicode.rst index 4a33f81cadb1..7425a3351321 100644 --- a/Documentation/unicode.txt +++ b/Documentation/admin-guide/unicode.rst @@ -1,12 +1,16 @@ +Unicode support +=============== + Last update: 2005-01-17, version 1.4 This file is maintained by H. Peter Anvin as part of the Linux Assigned Names And Numbers Authority (LANANA) project. The current version can be found at: - http://www.lanana.org/docs/unicode/unicode.txt + http://www.lanana.org/docs/unicode/admin-guide/unicode.rst - ------------------------ +Introduction +------------ The Linux kernel code has been rewritten to use Unicode to map characters to fonts. By downloading a single Unicode-to-font table, @@ -16,12 +20,14 @@ the font as indicated. This changes the semantics of the eight-bit character tables subtly. The four character tables are now: +=============== =============================== ================ Map symbol Map name Escape code (G0) - +=============== =============================== ================ LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B GRAF_MAP DEC VT100 pseudographics ESC ( 0 IBMPC_MAP IBM code page 437 ESC ( U USER_MAP User defined ESC ( K +=============== =============================== ================ In particular, ESC ( U is no longer "straight to font", since the font might be completely different than the IBM character set. This @@ -55,10 +61,12 @@ In addition, the following characters not present in Unicode 1.1.4 have been defined; these are used by the DEC VT graphics map. [v1.2] THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW. +====== ====================================== U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1 U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3 U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7 U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9 +====== ====================================== The DEC VT220 uses a 6x10 character matrix, and these characters form a smooth progression in the DEC VT graphics character set. I have @@ -74,10 +82,12 @@ keyboard symbols that are unlikely to ever be added to Unicode proper since they are horribly vendor-specific. This, of course, is an excellent example of horrible design. +====== ====================================== U+F810 KEYBOARD SYMBOL FLYING FLAG U+F811 KEYBOARD SYMBOL PULLDOWN MENU U+F812 KEYBOARD SYMBOL OPEN APPLE U+F813 KEYBOARD SYMBOL SOLID APPLE +====== ====================================== Klingon language support ------------------------ @@ -99,8 +109,10 @@ of the dingbats/symbols/forms type and this is a language, I have located it at the end, on a 16-cell boundary in keeping with standard Unicode practice. -NOTE: This range is now officially managed by the ConScript Unicode -Registry. The normative reference is at: +.. note:: + + This range is now officially managed by the ConScript Unicode + Registry. The normative reference is at: http://www.evertype.com/standards/csur/klingon.html @@ -112,6 +124,7 @@ However, since the set of symbols appear to be consistent throughout, with only the actual shapes being different, in keeping with standard Unicode practice these differences are considered font variants. +====== ======================================================= U+F8D0 KLINGON LETTER A U+F8D1 KLINGON LETTER B U+F8D2 KLINGON LETTER CH @@ -155,6 +168,7 @@ U+F8F9 KLINGON DIGIT NINE U+F8FD KLINGON COMMA U+F8FE KLINGON FULL STOP U+F8FF KLINGON SYMBOL FOR EMPIRE +====== ======================================================= Other Fictional and Artificial Scripts -------------------------------------- diff --git a/Documentation/admin-guide/vga-softcursor.rst b/Documentation/admin-guide/vga-softcursor.rst new file mode 100644 index 000000000000..f52175457e60 --- /dev/null +++ b/Documentation/admin-guide/vga-softcursor.rst @@ -0,0 +1,62 @@ +Software cursor for VGA +======================= + +by Pavel Machek +and Martin Mares + +Linux now has some ability to manipulate cursor appearance. Normally, +you can set the size of hardware cursor. You can now play a few new +tricks: you can make your cursor look like a non-blinking red block, +make it inverse background of the character it's over or to highlight +that character and still choose whether the original hardware cursor +should remain visible or not. There may be other things I have never +thought of. + +The cursor appearance is controlled by a ``[?1;2;3c`` escape sequence +where 1, 2 and 3 are parameters described below. If you omit any of them, +they will default to zeroes. + +first Parameter + specifies cursor size:: + + 0=default + 1=invisible + 2=underline, + ... + 8=full block + + 16 if you want the software cursor to be applied + + 32 if you want to always change the background color + + 64 if you dislike having the background the same as the + foreground. + + Highlights are ignored for the last two flags. + +second parameter + selects character attribute bits you want to change + (by simply XORing them with the value of this parameter). On standard + VGA, the high four bits specify background and the low four the + foreground. In both groups, low three bits set color (as in normal + color codes used by the console) and the most significant one turns + on highlight (or sometimes blinking -- it depends on the configuration + of your VGA). + +third parameter + consists of character attribute bits you want to set. + + Bit setting takes place before bit toggling, so you can simply clear a + bit by including it in both the set mask and the toggle mask. + +Examples +-------- + +To get normal blinking underline, use:: + + echo -e '\033[?2c' + +To get blinking block, use:: + + echo -e '\033[?6c' + +To get red non-blinking block, use:: + + echo -e '\033[?17;0;64c' diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting index 83c1df2fc758..259f00af3ab3 100644 --- a/Documentation/arm/Booting +++ b/Documentation/arm/Booting @@ -51,7 +51,7 @@ As an alternative, the boot loader can pass the relevant 'console=' option to the kernel via the tagged lists specifying the port, and serial format options as described in - Documentation/kernel-parameters.txt. + Documentation/admin-guide/kernel-parameters.rst. 3. Detect the machine type diff --git a/Documentation/arm/stm32/overview.txt b/Documentation/arm/stm32/overview.txt index 09aed5588d7c..a03b0357c017 100644 --- a/Documentation/arm/stm32/overview.txt +++ b/Documentation/arm/stm32/overview.txt @@ -5,7 +5,8 @@ Introduction ------------ The STMicroelectronics family of Cortex-M based MCUs are supported by the - 'STM32' platform of ARM Linux. Currently only the STM32F429 is supported. + 'STM32' platform of ARM Linux. Currently only the STM32F429 (Cortex-M4) + and STM32F746 (Cortex-M7) are supported. Configuration diff --git a/Documentation/arm/stm32/stm32f746-overview.txt b/Documentation/arm/stm32/stm32f746-overview.txt new file mode 100644 index 000000000000..cffd2b1ccd6f --- /dev/null +++ b/Documentation/arm/stm32/stm32f746-overview.txt @@ -0,0 +1,34 @@ + STM32F746 Overview + ================== + + Introduction + ------------ + The STM32F746 is a Cortex-M7 MCU aimed at various applications. + It features: + - Cortex-M7 core running up to @216MHz + - 1MB internal flash, 320KBytes internal RAM (+4KB of backup SRAM) + - FMC controller to connect SDRAM, NOR and NAND memories + - Dual mode QSPI + - SD/MMC/SDIO support + - Ethernet controller + - USB OTFG FS & HS controllers + - I2C, SPI, CAN busses support + - Several 16 & 32 bits general purpose timers + - Serial Audio interface + - LCD controller + - HDMI-CEC + - SPDIFRX + + Resources + --------- + Datasheet and reference manual are publicly available on ST website: + - http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series/stm32f7x6/stm32f746ng.html + + Document Author + --------------- + Alexandre Torgue + + + + + diff --git a/Documentation/assoc_array.txt b/Documentation/assoc_array.txt deleted file mode 100644 index 2f2c6cdd73c0..000000000000 --- a/Documentation/assoc_array.txt +++ /dev/null @@ -1,574 +0,0 @@ - ======================================== - GENERIC ASSOCIATIVE ARRAY IMPLEMENTATION - ======================================== - -Contents: - - - Overview. - - - The public API. - - Edit script. - - Operations table. - - Manipulation functions. - - Access functions. - - Index key form. - - - Internal workings. - - Basic internal tree layout. - - Shortcuts. - - Splitting and collapsing nodes. - - Non-recursive iteration. - - Simultaneous alteration and iteration. - - -======== -OVERVIEW -======== - -This associative array implementation is an object container with the following -properties: - - (1) Objects are opaque pointers. The implementation does not care where they - point (if anywhere) or what they point to (if anything). - - [!] NOTE: Pointers to objects _must_ be zero in the least significant bit. - - (2) Objects do not need to contain linkage blocks for use by the array. This - permits an object to be located in multiple arrays simultaneously. - Rather, the array is made up of metadata blocks that point to objects. - - (3) Objects require index keys to locate them within the array. - - (4) Index keys must be unique. Inserting an object with the same key as one - already in the array will replace the old object. - - (5) Index keys can be of any length and can be of different lengths. - - (6) Index keys should encode the length early on, before any variation due to - length is seen. - - (7) Index keys can include a hash to scatter objects throughout the array. - - (8) The array can iterated over. The objects will not necessarily come out in - key order. - - (9) The array can be iterated over whilst it is being modified, provided the - RCU readlock is being held by the iterator. Note, however, under these - circumstances, some objects may be seen more than once. If this is a - problem, the iterator should lock against modification. Objects will not - be missed, however, unless deleted. - -(10) Objects in the array can be looked up by means of their index key. - -(11) Objects can be looked up whilst the array is being modified, provided the - RCU readlock is being held by the thread doing the look up. - -The implementation uses a tree of 16-pointer nodes internally that are indexed -on each level by nibbles from the index key in the same manner as in a radix -tree. To improve memory efficiency, shortcuts can be emplaced to skip over -what would otherwise be a series of single-occupancy nodes. Further, nodes -pack leaf object pointers into spare space in the node rather than making an -extra branch until as such time an object needs to be added to a full node. - - -============== -THE PUBLIC API -============== - -The public API can be found in . The associative array is -rooted on the following structure: - - struct assoc_array { - ... - }; - -The code is selected by enabling CONFIG_ASSOCIATIVE_ARRAY. - - -EDIT SCRIPT ------------ - -The insertion and deletion functions produce an 'edit script' that can later be -applied to effect the changes without risking ENOMEM. This retains the -preallocated metadata blocks that will be installed in the internal tree and -keeps track of the metadata blocks that will be removed from the tree when the -script is applied. - -This is also used to keep track of dead blocks and dead objects after the -script has been applied so that they can be freed later. The freeing is done -after an RCU grace period has passed - thus allowing access functions to -proceed under the RCU read lock. - -The script appears as outside of the API as a pointer of the type: - - struct assoc_array_edit; - -There are two functions for dealing with the script: - - (1) Apply an edit script. - - void assoc_array_apply_edit(struct assoc_array_edit *edit); - - This will perform the edit functions, interpolating various write barriers - to permit accesses under the RCU read lock to continue. The edit script - will then be passed to call_rcu() to free it and any dead stuff it points - to. - - (2) Cancel an edit script. - - void assoc_array_cancel_edit(struct assoc_array_edit *edit); - - This frees the edit script and all preallocated memory immediately. If - this was for insertion, the new object is _not_ released by this function, - but must rather be released by the caller. - -These functions are guaranteed not to fail. - - -OPERATIONS TABLE ----------------- - -Various functions take a table of operations: - - struct assoc_array_ops { - ... - }; - -This points to a number of methods, all of which need to be provided: - - (1) Get a chunk of index key from caller data: - - unsigned long (*get_key_chunk)(const void *index_key, int level); - - This should return a chunk of caller-supplied index key starting at the - *bit* position given by the level argument. The level argument will be a - multiple of ASSOC_ARRAY_KEY_CHUNK_SIZE and the function should return - ASSOC_ARRAY_KEY_CHUNK_SIZE bits. No error is possible. - - - (2) Get a chunk of an object's index key. - - unsigned long (*get_object_key_chunk)(const void *object, int level); - - As the previous function, but gets its data from an object in the array - rather than from a caller-supplied index key. - - - (3) See if this is the object we're looking for. - - bool (*compare_object)(const void *object, const void *index_key); - - Compare the object against an index key and return true if it matches and - false if it doesn't. - - - (4) Diff the index keys of two objects. - - int (*diff_objects)(const void *object, const void *index_key); - - Return the bit position at which the index key of the specified object - differs from the given index key or -1 if they are the same. - - - (5) Free an object. - - void (*free_object)(void *object); - - Free the specified object. Note that this may be called an RCU grace - period after assoc_array_apply_edit() was called, so synchronize_rcu() may - be necessary on module unloading. - - -MANIPULATION FUNCTIONS ----------------------- - -There are a number of functions for manipulating an associative array: - - (1) Initialise an associative array. - - void assoc_array_init(struct assoc_array *array); - - This initialises the base structure for an associative array. It can't - fail. - - - (2) Insert/replace an object in an associative array. - - struct assoc_array_edit * - assoc_array_insert(struct assoc_array *array, - const struct assoc_array_ops *ops, - const void *index_key, - void *object); - - This inserts the given object into the array. Note that the least - significant bit of the pointer must be zero as it's used to type-mark - pointers internally. - - If an object already exists for that key then it will be replaced with the - new object and the old one will be freed automatically. - - The index_key argument should hold index key information and is - passed to the methods in the ops table when they are called. - - This function makes no alteration to the array itself, but rather returns - an edit script that must be applied. -ENOMEM is returned in the case of - an out-of-memory error. - - The caller should lock exclusively against other modifiers of the array. - - - (3) Delete an object from an associative array. - - struct assoc_array_edit * - assoc_array_delete(struct assoc_array *array, - const struct assoc_array_ops *ops, - const void *index_key); - - This deletes an object that matches the specified data from the array. - - The index_key argument should hold index key information and is - passed to the methods in the ops table when they are called. - - This function makes no alteration to the array itself, but rather returns - an edit script that must be applied. -ENOMEM is returned in the case of - an out-of-memory error. NULL will be returned if the specified object is - not found within the array. - - The caller should lock exclusively against other modifiers of the array. - - - (4) Delete all objects from an associative array. - - struct assoc_array_edit * - assoc_array_clear(struct assoc_array *array, - const struct assoc_array_ops *ops); - - This deletes all the objects from an associative array and leaves it - completely empty. - - This function makes no alteration to the array itself, but rather returns - an edit script that must be applied. -ENOMEM is returned in the case of - an out-of-memory error. - - The caller should lock exclusively against other modifiers of the array. - - - (5) Destroy an associative array, deleting all objects. - - void assoc_array_destroy(struct assoc_array *array, - const struct assoc_array_ops *ops); - - This destroys the contents of the associative array and leaves it - completely empty. It is not permitted for another thread to be traversing - the array under the RCU read lock at the same time as this function is - destroying it as no RCU deferral is performed on memory release - - something that would require memory to be allocated. - - The caller should lock exclusively against other modifiers and accessors - of the array. - - - (6) Garbage collect an associative array. - - int assoc_array_gc(struct assoc_array *array, - const struct assoc_array_ops *ops, - bool (*iterator)(void *object, void *iterator_data), - void *iterator_data); - - This iterates over the objects in an associative array and passes each one - to iterator(). If iterator() returns true, the object is kept. If it - returns false, the object will be freed. If the iterator() function - returns true, it must perform any appropriate refcount incrementing on the - object before returning. - - The internal tree will be packed down if possible as part of the iteration - to reduce the number of nodes in it. - - The iterator_data is passed directly to iterator() and is otherwise - ignored by the function. - - The function will return 0 if successful and -ENOMEM if there wasn't - enough memory. - - It is possible for other threads to iterate over or search the array under - the RCU read lock whilst this function is in progress. The caller should - lock exclusively against other modifiers of the array. - - -ACCESS FUNCTIONS ----------------- - -There are two functions for accessing an associative array: - - (1) Iterate over all the objects in an associative array. - - int assoc_array_iterate(const struct assoc_array *array, - int (*iterator)(const void *object, - void *iterator_data), - void *iterator_data); - - This passes each object in the array to the iterator callback function. - iterator_data is private data for that function. - - This may be used on an array at the same time as the array is being - modified, provided the RCU read lock is held. Under such circumstances, - it is possible for the iteration function to see some objects twice. If - this is a problem, then modification should be locked against. The - iteration algorithm should not, however, miss any objects. - - The function will return 0 if no objects were in the array or else it will - return the result of the last iterator function called. Iteration stops - immediately if any call to the iteration function results in a non-zero - return. - - - (2) Find an object in an associative array. - - void *assoc_array_find(const struct assoc_array *array, - const struct assoc_array_ops *ops, - const void *index_key); - - This walks through the array's internal tree directly to the object - specified by the index key.. - - This may be used on an array at the same time as the array is being - modified, provided the RCU read lock is held. - - The function will return the object if found (and set *_type to the object - type) or will return NULL if the object was not found. - - -INDEX KEY FORM --------------- - -The index key can be of any form, but since the algorithms aren't told how long -the key is, it is strongly recommended that the index key includes its length -very early on before any variation due to the length would have an effect on -comparisons. - -This will cause leaves with different length keys to scatter away from each -other - and those with the same length keys to cluster together. - -It is also recommended that the index key begin with a hash of the rest of the -key to maximise scattering throughout keyspace. - -The better the scattering, the wider and lower the internal tree will be. - -Poor scattering isn't too much of a problem as there are shortcuts and nodes -can contain mixtures of leaves and metadata pointers. - -The index key is read in chunks of machine word. Each chunk is subdivided into -one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and -on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is -unlikely that more than one word of any particular index key will have to be -used. - - -================= -INTERNAL WORKINGS -================= - -The associative array data structure has an internal tree. This tree is -constructed of two types of metadata blocks: nodes and shortcuts. - -A node is an array of slots. Each slot can contain one of four things: - - (*) A NULL pointer, indicating that the slot is empty. - - (*) A pointer to an object (a leaf). - - (*) A pointer to a node at the next level. - - (*) A pointer to a shortcut. - - -BASIC INTERNAL TREE LAYOUT --------------------------- - -Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index -key space is strictly subdivided by the nodes in the tree and nodes occur on -fixed levels. For example: - - Level: 0 1 2 3 - =============== =============== =============== =============== - NODE D - NODE B NODE C +------>+---+ - +------>+---+ +------>+---+ | | 0 | - NODE A | | 0 | | | 0 | | +---+ - +---+ | +---+ | +---+ | : : - | 0 | | : : | : : | +---+ - +---+ | +---+ | +---+ | | f | - | 1 |---+ | 3 |---+ | 7 |---+ +---+ - +---+ +---+ +---+ - : : : : | 8 |---+ - +---+ +---+ +---+ | NODE E - | e |---+ | f | : : +------>+---+ - +---+ | +---+ +---+ | 0 | - | f | | | f | +---+ - +---+ | +---+ : : - | NODE F +---+ - +------>+---+ | f | - | 0 | NODE G +---+ - +---+ +------>+---+ - : : | | 0 | - +---+ | +---+ - | 6 |---+ : : - +---+ +---+ - : : | f | - +---+ +---+ - | f | - +---+ - -In the above example, there are 7 nodes (A-G), each with 16 slots (0-f). -Assuming no other meta data nodes in the tree, the key space is divided thusly: - - KEY PREFIX NODE - ========== ==== - 137* D - 138* E - 13[0-69-f]* C - 1[0-24-f]* B - e6* G - e[0-57-f]* F - [02-df]* A - -So, for instance, keys with the following example index keys will be found in -the appropriate nodes: - - INDEX KEY PREFIX NODE - =============== ======= ==== - 13694892892489 13 C - 13795289025897 137 D - 13889dde88793 138 E - 138bbb89003093 138 E - 1394879524789 12 C - 1458952489 1 B - 9431809de993ba - A - b4542910809cd - A - e5284310def98 e F - e68428974237 e6 G - e7fffcbd443 e F - f3842239082 - A - -To save memory, if a node can hold all the leaves in its portion of keyspace, -then the node will have all those leaves in it and will not have any metadata -pointers - even if some of those leaves would like to be in the same slot. - -A node can contain a heterogeneous mix of leaves and metadata pointers. -Metadata pointers must be in the slots that match their subdivisions of key -space. The leaves can be in any slot not occupied by a metadata pointer. It -is guaranteed that none of the leaves in a node will match a slot occupied by a -metadata pointer. If the metadata pointer is there, any leaf whose key matches -the metadata key prefix must be in the subtree that the metadata pointer points -to. - -In the above example list of index keys, node A will contain: - - SLOT CONTENT INDEX KEY (PREFIX) - ==== =============== ================== - 1 PTR TO NODE B 1* - any LEAF 9431809de993ba - any LEAF b4542910809cd - e PTR TO NODE F e* - any LEAF f3842239082 - -and node B: - - 3 PTR TO NODE C 13* - any LEAF 1458952489 - - -SHORTCUTS ---------- - -Shortcuts are metadata records that jump over a piece of keyspace. A shortcut -is a replacement for a series of single-occupancy nodes ascending through the -levels. Shortcuts exist to save memory and to speed up traversal. - -It is possible for the root of the tree to be a shortcut - say, for example, -the tree contains at least 17 nodes all with key prefix '1111'. The insertion -algorithm will insert a shortcut to skip over the '1111' keyspace in a single -bound and get to the fourth level where these actually become different. - - -SPLITTING AND COLLAPSING NODES ------------------------------- - -Each node has a maximum capacity of 16 leaves and metadata pointers. If the -insertion algorithm finds that it is trying to insert a 17th object into a -node, that node will be split such that at least two leaves that have a common -key segment at that level end up in a separate node rooted on that slot for -that common key segment. - -If the leaves in a full node and the leaf that is being inserted are -sufficiently similar, then a shortcut will be inserted into the tree. - -When the number of objects in the subtree rooted at a node falls to 16 or -fewer, then the subtree will be collapsed down to a single node - and this will -ripple towards the root if possible. - - -NON-RECURSIVE ITERATION ------------------------ - -Each node and shortcut contains a back pointer to its parent and the number of -slot in that parent that points to it. None-recursive iteration uses these to -proceed rootwards through the tree, going to the parent node, slot N + 1 to -make sure progress is made without the need for a stack. - -The backpointers, however, make simultaneous alteration and iteration tricky. - - -SIMULTANEOUS ALTERATION AND ITERATION -------------------------------------- - -There are a number of cases to consider: - - (1) Simple insert/replace. This involves simply replacing a NULL or old - matching leaf pointer with the pointer to the new leaf after a barrier. - The metadata blocks don't change otherwise. An old leaf won't be freed - until after the RCU grace period. - - (2) Simple delete. This involves just clearing an old matching leaf. The - metadata blocks don't change otherwise. The old leaf won't be freed until - after the RCU grace period. - - (3) Insertion replacing part of a subtree that we haven't yet entered. This - may involve replacement of part of that subtree - but that won't affect - the iteration as we won't have reached the pointer to it yet and the - ancestry blocks are not replaced (the layout of those does not change). - - (4) Insertion replacing nodes that we're actively processing. This isn't a - problem as we've passed the anchoring pointer and won't switch onto the - new layout until we follow the back pointers - at which point we've - already examined the leaves in the replaced node (we iterate over all the - leaves in a node before following any of its metadata pointers). - - We might, however, re-see some leaves that have been split out into a new - branch that's in a slot further along than we were at. - - (5) Insertion replacing nodes that we're processing a dependent branch of. - This won't affect us until we follow the back pointers. Similar to (4). - - (6) Deletion collapsing a branch under us. This doesn't affect us because the - back pointers will get us back to the parent of the new node before we - could see the new node. The entire collapsed subtree is thrown away - unchanged - and will still be rooted on the same slot, so we shouldn't - process it a second time as we'll go back to slot + 1. - -Note: - - (*) Under some circumstances, we need to simultaneously change the parent - pointer and the parent slot pointer on a node (say, for example, we - inserted another node before it and moved it up a level). We cannot do - this without locking against a read - so we have to replace that node too. - - However, when we're changing a shortcut into a node this isn't a problem - as shortcuts only have one slot and so the parent slot number isn't used - when traversing backwards over one. This means that it's okay to change - the slot number first - provided suitable barriers are used to make sure - the parent slot number is read after the back pointer. - -Obsolete blocks and leaves are freed up after an RCU grace period has passed, -so as long as anyone doing walking or iteration holds the RCU read lock, the -old superstructure should not go away on them. diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt deleted file mode 100644 index df8416213202..000000000000 --- a/Documentation/bad_memory.txt +++ /dev/null @@ -1,45 +0,0 @@ -March 2008 -Jan-Simon Moeller, dl9pf@gmx.de - - -How to deal with bad memory e.g. reported by memtest86+ ? -######################################################### - -There are three possibilities I know of: - -1) Reinsert/swap the memory modules - -2) Buy new modules (best!) or try to exchange the memory - if you have spare-parts - -3) Use BadRAM or memmap - -This Howto is about number 3) . - - -BadRAM -###### -BadRAM is the actively developed and available as kernel-patch -here: http://rick.vanrein.org/linux/badram/ - -For more details see the BadRAM documentation. - -memmap -###### - -memmap is already in the kernel and usable as kernel-parameter at -boot-time. Its syntax is slightly strange and you may need to -calculate the values by yourself! - -Syntax to exclude a memory area (see kernel-parameters.txt for details): -memmap=$
- -Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and - some others. All had 0x1869xxxx in common, so I chose a pattern of - 0x18690000,0xffff0000. - -With the numbers of the example above: -memmap=64K$0x18690000 - or -memmap=0x10000$0x18690000 - diff --git a/Documentation/basic_profiling.txt b/Documentation/basic_profiling.txt deleted file mode 100644 index 8764e9f70821..000000000000 --- a/Documentation/basic_profiling.txt +++ /dev/null @@ -1,56 +0,0 @@ -These instructions are deliberately very basic. If you want something clever, -go read the real docs ;-) Please don't add more stuff, but feel free to -correct my mistakes ;-) (mbligh@aracnet.com) -Thanks to John Levon, Dave Hansen, et al. for help writing this. - - is the thing you're trying to measure. -Make sure you have the correct System.map / vmlinux referenced! - -It is probably easiest to use "make install" for linux and hack -/sbin/installkernel to copy vmlinux to /boot, in addition to vmlinuz, -config, System.map, which are usually installed by default. - -Readprofile ------------ -A recent readprofile command is needed for 2.6, such as found in util-linux -2.12a, which can be downloaded from: - -http://www.kernel.org/pub/linux/utils/util-linux/ - -Most distributions will ship it already. - -Add "profile=2" to the kernel command line. - -clear readprofile -r - -dump output readprofile -m /boot/System.map > captured_profile - -Oprofile --------- - -Get the source (see Changes for required version) from -http://oprofile.sourceforge.net/ and add "idle=poll" to the kernel command -line. - -Configure with CONFIG_PROFILING=y and CONFIG_OPROFILE=y & reboot on new kernel - -./configure --with-kernel-support -make install - -For superior results, be sure to enable the local APIC. If opreport sees -a 0Hz CPU, APIC was not on. Be aware that idle=poll may mean a performance -penalty. - -One time setup: - opcontrol --setup --vmlinux=/boot/vmlinux - -clear opcontrol --reset -start opcontrol --start - -stop opcontrol --stop -dump output opreport > output_file - -To only report on the kernel, run opreport -l /boot/vmlinux > output_file - -A reset is needed to clear old statistics, which survive a reboot. - diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt deleted file mode 100644 index ec83bbce547a..000000000000 --- a/Documentation/binfmt_misc.txt +++ /dev/null @@ -1,131 +0,0 @@ - Kernel Support for miscellaneous (your favourite) Binary Formats v1.1 - ===================================================================== - -This Kernel feature allows you to invoke almost (for restrictions see below) -every program by simply typing its name in the shell. -This includes for example compiled Java(TM), Python or Emacs programs. - -To achieve this you must tell binfmt_misc which interpreter has to be invoked -with which binary. Binfmt_misc recognises the binary-type by matching some bytes -at the beginning of the file with a magic byte sequence (masking out specified -bits) you have supplied. Binfmt_misc can also recognise a filename extension -aka '.com' or '.exe'. - -First you must mount binfmt_misc: - mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc - -To actually register a new binary type, you have to set up a string looking like -:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':' -upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register. - -Here is what the fields mean: - - 'name' is an identifier string. A new /proc file will be created with this - name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious - reasons. - - 'type' is the type of recognition. Give 'M' for magic and 'E' for extension. - - 'offset' is the offset of the magic/mask in the file, counted in bytes. This - defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored - when using filename extension matching. - - 'magic' is the byte sequence binfmt_misc is matching for. The magic string - may contain hex-encoded characters like \x0a or \xA4. Note that you must - escape any NUL bytes; parsing halts at the first one. In a shell environment - you might have to write \\x0a to prevent the shell from eating your \. - If you chose filename extension matching, this is the extension to be - recognised (without the '.', the \x0a specials are not allowed). Extension - matching is case sensitive, and slashes '/' are not allowed! - - 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some - bits from matching by supplying a string like magic and as long as magic. - The mask is anded with the byte sequence of the file. Note that you must - escape any NUL bytes; parsing halts at the first one. Ignored when using - filename extension matching. - - 'interpreter' is the program that should be invoked with the binary as first - argument (specify the full path) - - 'flags' is an optional field that controls several aspects of the invocation - of the interpreter. It is a string of capital letters, each controls a - certain aspect. The following flags are supported - - 'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite - the original argv[0] with the full path to the binary. When this - flag is included, binfmt_misc will add an argument to the argument - vector for this purpose, thus preserving the original argv[0]. - e.g. If your interp is set to /bin/foo and you run `blah` (which is - in /usr/local/bin), then the kernel will execute /bin/foo with - argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The - interp has to be aware of this so it can execute /usr/local/bin/blah - with argv[] set to ["blah"]. - 'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path - of the binary to the interpreter as an argument. When this flag is - included, binfmt_misc will open the file for reading and pass its - descriptor as an argument, instead of the full path, thus allowing - the interpreter to execute non-readable binaries. This feature - should be used with care - the interpreter has to be trusted not to - emit the contents of the non-readable binary. - 'C' - credentials. Currently, the behavior of binfmt_misc is to calculate - the credentials and security token of the new process according to - the interpreter. When this flag is included, these attributes are - calculated according to the binary. It also implies the 'O' flag. - This feature should be used with care as the interpreter - will run with root permissions when a setuid binary owned by root - is run with binfmt_misc. - 'F' - fix binary. The usual behaviour of binfmt_misc is to spawn the - binary lazily when the misc format file is invoked. However, - this doesn't work very well in the face of mount namespaces and - changeroots, so the F mode opens the binary as soon as the - emulation is installed and uses the opened image to spawn the - emulator, meaning it is always available once installed, - regardless of how the environment changes. - - -There are some restrictions: - - the whole register string may not exceed 1920 characters - - the magic must reside in the first 128 bytes of the file, i.e. - offset+size(magic) has to be less than 128 - - the interpreter string may not exceed 127 characters - -To use binfmt_misc you have to mount it first. You can mount it with -"mount -t binfmt_misc none /proc/sys/fs/binfmt_misc" command, or you can add -a line "none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0" to your -/etc/fstab so it auto mounts on boot. - -You may want to add the binary formats in one of your /etc/rc scripts during -boot-up. Read the manual of your init program to figure out how to do this -right. - -Think about the order of adding entries! Later added entries are matched first! - - -A few examples (assumed you are in /proc/sys/fs/binfmt_misc): - -- enable support for em86 (like binfmt_em86, for Alpha AXP only): - echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register - echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register - -- enable support for packed DOS applications (pre-configured dosemu hdimages): - echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register - -- enable support for Windows executables using wine: - echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register - -For java support see Documentation/java.txt - - -You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable) -or 1 (to enable) to /proc/sys/fs/binfmt_misc/status or /proc/.../the_name. -Catting the file tells you the current status of binfmt_misc/the entry. - -You can remove one entry or all entries by echoing -1 to /proc/.../the_name -or /proc/sys/fs/binfmt_misc/status. - - -HINTS: -====== - -If you want to pass special arguments to your interpreter, you can -write a wrapper script for it. See Documentation/java.txt for an -example. - -Your interpreter should NOT look in the PATH for the filename; the kernel -passes it the full filename (or the file descriptor) to use. Using $PATH can -cause unexpected behaviour and can be a security hazard. - - -Richard Günther diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 918e1e0d0e78..01ddeaf64b0f 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -348,7 +348,7 @@ Drivers can now specify a request prepare function (q->prep_rq_fn) that the block layer would invoke to pre-build device commands for a given request, or perform other preparatory processing for the request. This is routine is called by elv_next_request(), i.e. typically just before servicing a request. -(The prepare function would not be called for requests that have REQ_DONTPREP +(The prepare function would not be called for requests that have RQF_DONTPREP enabled) Aside: @@ -553,8 +553,8 @@ struct request { struct request_list *rl; } -See the rq_flag_bits definitions for an explanation of the various flags -available. Some bits are used by the block layer or i/o scheduler. +See the req_ops and req_flag_bits definitions for an explanation of the various +flags available. Some bits are used by the block layer or i/o scheduler. The behaviour of the various sector counts are almost the same as before, except that since we have multi-segment bios, current_nr_sectors refers diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt index 1e4f835a659d..895bd3813115 100644 --- a/Documentation/block/cfq-iosched.txt +++ b/Documentation/block/cfq-iosched.txt @@ -240,11 +240,11 @@ All cfq queues doing synchronous sequential IO go on to sync-idle tree. On this tree we idle on each queue individually. All synchronous non-sequential queues go on sync-noidle tree. Also any -request which are marked with REQ_NOIDLE go on this service tree. On this -tree we do not idle on individual queues instead idle on the whole group -of queues or the tree. So if there are 4 queues waiting for IO to dispatch -we will idle only once last queue has dispatched the IO and there is -no more IO on this service tree. +synchronous write request which is not marked with REQ_IDLE goes on this +service tree. On this tree we do not idle on individual queues instead idle +on the whole group of queues or the tree. So if there are 4 queues waiting +for IO to dispatch we will idle only once last queue has dispatched the IO +and there is no more IO on this service tree. All async writes go on async service tree. There is no idling on async queues. @@ -257,17 +257,17 @@ tree idling provides isolation with buffered write queues on async tree. FAQ === -Q1. Why to idle at all on queues marked with REQ_NOIDLE. +Q1. Why to idle at all on queues not marked with REQ_IDLE. -A1. We only do tree idle (all queues on sync-noidle tree) on queues marked - with REQ_NOIDLE. This helps in providing isolation with all the sync-idle +A1. We only do tree idle (all queues on sync-noidle tree) on queues not marked + with REQ_IDLE. This helps in providing isolation with all the sync-idle queues. Otherwise in presence of many sequential readers, other synchronous IO might not get fair share of disk. For example, if there are 10 sequential readers doing IO and they get - 100ms each. If a REQ_NOIDLE request comes in, it will be scheduled - roughly after 1 second. If after completion of REQ_NOIDLE request we - do not idle, and after a couple of milli seconds a another REQ_NOIDLE + 100ms each. If a !REQ_IDLE request comes in, it will be scheduled + roughly after 1 second. If after completion of !REQ_IDLE request we + do not idle, and after a couple of milli seconds a another !REQ_IDLE request comes in, again it will be scheduled after 1second. Repeat it and notice how a workload can lose its disk share and suffer due to multiple sequential readers. @@ -276,16 +276,16 @@ A1. We only do tree idle (all queues on sync-noidle tree) on queues marked context of fsync, and later some journaling data is written. Journaling data comes in only after fsync has finished its IO (atleast for ext4 that seemed to be the case). Now if one decides not to idle on fsync - thread due to REQ_NOIDLE, then next journaling write will not get + thread due to !REQ_IDLE, then next journaling write will not get scheduled for another second. A process doing small fsync, will suffer badly in presence of multiple sequential readers. - Hence doing tree idling on threads using REQ_NOIDLE flag on requests + Hence doing tree idling on threads using !REQ_IDLE flag on requests provides isolation from multiple sequential readers and at the same time we do not idle on individual threads. -Q2. When to specify REQ_NOIDLE -A2. I would think whenever one is doing synchronous write and not expecting +Q2. When to specify REQ_IDLE +A2. I would think whenever one is doing synchronous write and expecting more writes to be dispatched from same context soon, should be able - to specify REQ_NOIDLE on writes and that probably should work well for + to specify REQ_IDLE on writes and that probably should work well for most of the cases. diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index d8880ca30af4..3140dbd860d8 100644 --- a/Documentation/block/null_blk.txt +++ b/Documentation/block/null_blk.txt @@ -72,4 +72,4 @@ use_per_node_hctx=[0/1]: Default: 0 queue for each CPU node in the system. use_lightnvm=[0/1]: Default: 0 - Register device with LightNVM. Requires blk-mq to be used. + Register device with LightNVM. Requires blk-mq and CONFIG_NVM to be enabled. diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index 2a3904030dea..c0a3bb5a6e4e 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -54,9 +54,23 @@ This is the hardware sector size of the device, in bytes. io_poll (RW) ------------ -When read, this file shows the total number of block IO polls and how -many returned success. Writing '0' to this file will disable polling -for this device. Writing any non-zero value will enable this feature. +When read, this file shows whether polling is enabled (1) or disabled +(0). Writing '0' to this file will disable polling for this device. +Writing any non-zero value will enable this feature. + +io_poll_delay (RW) +------------------ +If polling is enabled, this controls what kind of polling will be +performed. It defaults to -1, which is classic polling. In this mode, +the CPU will repeatedly ask for completions without giving up any time. +If set to 0, a hybrid polling mode is used, where the kernel will attempt +to make an educated guess at when the IO will complete. Based on this +guess, the kernel will put the process issuing IO to sleep for an amount +of time, before entering a classic poll loop. This mode might be a +little slower than pure classic polling, but it will be more efficient. +If set to a value larger than 0, the kernel will put the process issuing +IO to sleep for this amont of microseconds before entering classic +polling. iostats (RW) ------------- @@ -169,5 +183,14 @@ This is the number of bytes the device can write in a single write-same command. A value of '0' means write-same is not supported by this device. +wb_lat_usec (RW) +---------------- +If the device is registered for writeback throttling, then this file shows +the target minimum read latency. If this latency is exceeded in a given +window of time (see wb_window_usec), then the writeback throttling will start +scaling back writes. Writing a value of '0' to this file disables the +feature. Writing a value of '-1' to this file resets the value to the +default setting. + Jens Axboe , February 2009 diff --git a/Documentation/blockdev/cciss.txt b/Documentation/blockdev/cciss.txt index b79d0a13e7cd..3a5477cc456e 100644 --- a/Documentation/blockdev/cciss.txt +++ b/Documentation/blockdev/cciss.txt @@ -184,7 +184,7 @@ infrequently used and the primary purpose of Smart Array controllers is to act as a RAID controller for disk drives, so the vast majority of commands are allocated for disk devices. However, if you have more than a few tape drives attached to a smart array, the default number of commands may not be -enought (for example, if you have 8 tape drives, you could only rewind 6 +enough (for example, if you have 8 tape drives, you could only rewind 6 at one time with the default number of commands.) The cciss_tape_cmds module parameter allows more commands (up to 16 more) to be allocated for use by tape drives. For example: diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.txt index fe2ef978d85a..501e12e0323e 100644 --- a/Documentation/blockdev/ramdisk.txt +++ b/Documentation/blockdev/ramdisk.txt @@ -14,7 +14,7 @@ Contents: The RAM disk driver is a way to use main system memory as a block device. It is required for initrd, an initial filesystem used if you need to load modules -in order to access the root filesystem (see Documentation/initrd.txt). It can +in order to access the root filesystem (see Documentation/admin-guide/initrd.rst). It can also be used for a temporary filesystem for crypto work, since the contents are erased on reboot. diff --git a/Documentation/braille-console.txt b/Documentation/braille-console.txt deleted file mode 100644 index d0d042c2fd5e..000000000000 --- a/Documentation/braille-console.txt +++ /dev/null @@ -1,34 +0,0 @@ - Linux Braille Console - -To get early boot messages on a braille device (before userspace screen -readers can start), you first need to compile the support for the usual serial -console (see serial-console.txt), and for braille device (in Device Drivers - -Accessibility). - -Then you need to specify a console=brl, option on the kernel command line, the -format is: - - console=brl,serial_options... - -where serial_options... are the same as described in serial-console.txt - -So for instance you can use console=brl,ttyS0 if the braille device is connected -to the first serial port, and console=brl,ttyS0,115200 to override the baud rate -to 115200, etc. - -By default, the braille device will just show the last kernel message (console -mode). To review previous messages, press the Insert key to switch to the VT -review mode. In review mode, the arrow keys permit to browse in the VT content, -page up/down keys go at the top/bottom of the screen, and the home key goes back -to the cursor, hence providing very basic screen reviewing facility. - -Sound feedback can be obtained by adding the braille_console.sound=1 kernel -parameter. - -For simplicity, only one braille console can be enabled, other uses of -console=brl,... will be discarded. Also note that it does not interfere with -the console selection mechanism described in serial-console.txt - -For now, only the VisioBraille device is supported. - -Samuel Thibault diff --git a/Documentation/cgroup-v1/00-INDEX b/Documentation/cgroup-v1/00-INDEX index 106885ad670d..13e0c85e7b35 100644 --- a/Documentation/cgroup-v1/00-INDEX +++ b/Documentation/cgroup-v1/00-INDEX @@ -8,7 +8,7 @@ cpuacct.txt - CPU Accounting Controller; account CPU usage for groups of tasks. cpusets.txt - documents the cpusets feature; assign CPUs and Mem to a set of tasks. -devices.txt +admin-guide/devices.rst - Device Whitelist Controller; description, interface and security. freezer-subsystem.txt - checkpointing; rationale to not use signals, interface. diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt index 88951b179262..4a824d232472 100644 --- a/Documentation/circular-buffers.txt +++ b/Documentation/circular-buffers.txt @@ -161,7 +161,7 @@ The producer will look something like this: unsigned long head = buffer->head; /* The spin_unlock() and next spin_lock() provide needed ordering. */ - unsigned long tail = ACCESS_ONCE(buffer->tail); + unsigned long tail = READ_ONCE(buffer->tail); if (CIRC_SPACE(head, tail, buffer->size) >= 1) { /* insert one item into the buffer */ @@ -222,7 +222,7 @@ This will instruct the CPU to make sure the index is up to date before reading the new item, and then it shall make sure the CPU has finished reading the item before it writes the new tail pointer, which will erase the item. -Note the use of ACCESS_ONCE() and smp_load_acquire() to read the +Note the use of READ_ONCE() and smp_load_acquire() to read the opposition index. This prevents the compiler from discarding and reloading its cached value - which some compilers will do across smp_read_barrier_depends(). This isn't strictly needed if you can diff --git a/Documentation/conf.py b/Documentation/conf.py index bf6f310e5170..1ac958c0333d 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -34,10 +34,10 @@ from load_config import loadConfig # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. -extensions = ['kernel-doc', 'rstFlatTable', 'kernel_include', 'cdomain'] +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain'] # The name of the math extension changed on Sphinx 1.4 -if minor > 3: +if major == 1 and minor > 3: extensions.append("sphinx.ext.imgmath") else: extensions.append("sphinx.ext.pngmath") @@ -136,7 +136,7 @@ pygments_style = 'sphinx' todo_include_todos = False primary_domain = 'C' -highlight_language = 'guess' +highlight_language = 'none' # -- Options for HTML output ---------------------------------------------- @@ -332,18 +332,32 @@ latex_elements = { ''' } +# Fix reference escape troubles with Sphinx 1.4.x +if major == 1 and minor > 3: + latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n' + # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). latex_documents = [ + ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide', + 'The kernel development community', 'manual'), + ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation', + 'The kernel development community', 'manual'), + ('core-api/index', 'core-api.tex', 'The kernel core API manual', + 'The kernel development community', 'manual'), + ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual', + 'The kernel development community', 'manual'), ('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation', 'The kernel development community', 'manual'), - ('development-process/index', 'development-process.tex', 'Linux Kernel Development Documentation', + ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation', 'The kernel development community', 'manual'), ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', 'The kernel development community', 'manual'), ('media/index', 'media.tex', 'Linux Media Subsystem Documentation', 'The kernel development community', 'manual'), + ('security/index', 'security.tex', 'The kernel security subsystem manual', + 'The kernel development community', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of diff --git a/Documentation/core-api/assoc_array.rst b/Documentation/core-api/assoc_array.rst new file mode 100644 index 000000000000..d83cfff9ea43 --- /dev/null +++ b/Documentation/core-api/assoc_array.rst @@ -0,0 +1,551 @@ +======================================== +Generic Associative Array Implementation +======================================== + +Overview +======== + +This associative array implementation is an object container with the following +properties: + +1. Objects are opaque pointers. The implementation does not care where they + point (if anywhere) or what they point to (if anything). +.. note:: Pointers to objects _must_ be zero in the least significant bit. + +2. Objects do not need to contain linkage blocks for use by the array. This + permits an object to be located in multiple arrays simultaneously. + Rather, the array is made up of metadata blocks that point to objects. + +3. Objects require index keys to locate them within the array. + +4. Index keys must be unique. Inserting an object with the same key as one + already in the array will replace the old object. + +5. Index keys can be of any length and can be of different lengths. + +6. Index keys should encode the length early on, before any variation due to + length is seen. + +7. Index keys can include a hash to scatter objects throughout the array. + +8. The array can iterated over. The objects will not necessarily come out in + key order. + +9. The array can be iterated over whilst it is being modified, provided the + RCU readlock is being held by the iterator. Note, however, under these + circumstances, some objects may be seen more than once. If this is a + problem, the iterator should lock against modification. Objects will not + be missed, however, unless deleted. + +10. Objects in the array can be looked up by means of their index key. + +11. Objects can be looked up whilst the array is being modified, provided the + RCU readlock is being held by the thread doing the look up. + +The implementation uses a tree of 16-pointer nodes internally that are indexed +on each level by nibbles from the index key in the same manner as in a radix +tree. To improve memory efficiency, shortcuts can be emplaced to skip over +what would otherwise be a series of single-occupancy nodes. Further, nodes +pack leaf object pointers into spare space in the node rather than making an +extra branch until as such time an object needs to be added to a full node. + + +The Public API +============== + +The public API can be found in ````. The associative +array is rooted on the following structure:: + + struct assoc_array { + ... + }; + +The code is selected by enabling ``CONFIG_ASSOCIATIVE_ARRAY`` with:: + + ./script/config -e ASSOCIATIVE_ARRAY + + +Edit Script +----------- + +The insertion and deletion functions produce an 'edit script' that can later be +applied to effect the changes without risking ``ENOMEM``. This retains the +preallocated metadata blocks that will be installed in the internal tree and +keeps track of the metadata blocks that will be removed from the tree when the +script is applied. + +This is also used to keep track of dead blocks and dead objects after the +script has been applied so that they can be freed later. The freeing is done +after an RCU grace period has passed - thus allowing access functions to +proceed under the RCU read lock. + +The script appears as outside of the API as a pointer of the type:: + + struct assoc_array_edit; + +There are two functions for dealing with the script: + +1. Apply an edit script:: + + void assoc_array_apply_edit(struct assoc_array_edit *edit); + +This will perform the edit functions, interpolating various write barriers +to permit accesses under the RCU read lock to continue. The edit script +will then be passed to ``call_rcu()`` to free it and any dead stuff it points +to. + +2. Cancel an edit script:: + + void assoc_array_cancel_edit(struct assoc_array_edit *edit); + +This frees the edit script and all preallocated memory immediately. If +this was for insertion, the new object is _not_ released by this function, +but must rather be released by the caller. + +These functions are guaranteed not to fail. + + +Operations Table +---------------- + +Various functions take a table of operations:: + + struct assoc_array_ops { + ... + }; + +This points to a number of methods, all of which need to be provided: + +1. Get a chunk of index key from caller data:: + + unsigned long (*get_key_chunk)(const void *index_key, int level); + +This should return a chunk of caller-supplied index key starting at the +*bit* position given by the level argument. The level argument will be a +multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return +``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible. + + +2. Get a chunk of an object's index key:: + + unsigned long (*get_object_key_chunk)(const void *object, int level); + +As the previous function, but gets its data from an object in the array +rather than from a caller-supplied index key. + + +3. See if this is the object we're looking for:: + + bool (*compare_object)(const void *object, const void *index_key); + +Compare the object against an index key and return ``true`` if it matches and +``false`` if it doesn't. + + +4. Diff the index keys of two objects:: + + int (*diff_objects)(const void *object, const void *index_key); + +Return the bit position at which the index key of the specified object +differs from the given index key or -1 if they are the same. + + +5. Free an object:: + + void (*free_object)(void *object); + +Free the specified object. Note that this may be called an RCU grace period +after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be +necessary on module unloading. + + +Manipulation Functions +---------------------- + +There are a number of functions for manipulating an associative array: + +1. Initialise an associative array:: + + void assoc_array_init(struct assoc_array *array); + +This initialises the base structure for an associative array. It can't fail. + + +2. Insert/replace an object in an associative array:: + + struct assoc_array_edit * + assoc_array_insert(struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key, + void *object); + +This inserts the given object into the array. Note that the least +significant bit of the pointer must be zero as it's used to type-mark +pointers internally. + +If an object already exists for that key then it will be replaced with the +new object and the old one will be freed automatically. + +The ``index_key`` argument should hold index key information and is +passed to the methods in the ops table when they are called. + +This function makes no alteration to the array itself, but rather returns +an edit script that must be applied. ``-ENOMEM`` is returned in the case of +an out-of-memory error. + +The caller should lock exclusively against other modifiers of the array. + + +3. Delete an object from an associative array:: + + struct assoc_array_edit * + assoc_array_delete(struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key); + +This deletes an object that matches the specified data from the array. + +The ``index_key`` argument should hold index key information and is +passed to the methods in the ops table when they are called. + +This function makes no alteration to the array itself, but rather returns +an edit script that must be applied. ``-ENOMEM`` is returned in the case of +an out-of-memory error. ``NULL`` will be returned if the specified object is +not found within the array. + +The caller should lock exclusively against other modifiers of the array. + + +4. Delete all objects from an associative array:: + + struct assoc_array_edit * + assoc_array_clear(struct assoc_array *array, + const struct assoc_array_ops *ops); + +This deletes all the objects from an associative array and leaves it +completely empty. + +This function makes no alteration to the array itself, but rather returns +an edit script that must be applied. ``-ENOMEM`` is returned in the case of +an out-of-memory error. + +The caller should lock exclusively against other modifiers of the array. + + +5. Destroy an associative array, deleting all objects:: + + void assoc_array_destroy(struct assoc_array *array, + const struct assoc_array_ops *ops); + +This destroys the contents of the associative array and leaves it +completely empty. It is not permitted for another thread to be traversing +the array under the RCU read lock at the same time as this function is +destroying it as no RCU deferral is performed on memory release - +something that would require memory to be allocated. + +The caller should lock exclusively against other modifiers and accessors +of the array. + + +6. Garbage collect an associative array:: + + int assoc_array_gc(struct assoc_array *array, + const struct assoc_array_ops *ops, + bool (*iterator)(void *object, void *iterator_data), + void *iterator_data); + +This iterates over the objects in an associative array and passes each one to +``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it +returns ``false``, the object will be freed. If the ``iterator()`` function +returns ``true``, it must perform any appropriate refcount incrementing on the +object before returning. + +The internal tree will be packed down if possible as part of the iteration +to reduce the number of nodes in it. + +The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise +ignored by the function. + +The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't +enough memory. + +It is possible for other threads to iterate over or search the array under +the RCU read lock whilst this function is in progress. The caller should +lock exclusively against other modifiers of the array. + + +Access Functions +---------------- + +There are two functions for accessing an associative array: + +1. Iterate over all the objects in an associative array:: + + int assoc_array_iterate(const struct assoc_array *array, + int (*iterator)(const void *object, + void *iterator_data), + void *iterator_data); + +This passes each object in the array to the iterator callback function. +``iterator_data`` is private data for that function. + +This may be used on an array at the same time as the array is being +modified, provided the RCU read lock is held. Under such circumstances, +it is possible for the iteration function to see some objects twice. If +this is a problem, then modification should be locked against. The +iteration algorithm should not, however, miss any objects. + +The function will return ``0`` if no objects were in the array or else it will +return the result of the last iterator function called. Iteration stops +immediately if any call to the iteration function results in a non-zero +return. + + +2. Find an object in an associative array:: + + void *assoc_array_find(const struct assoc_array *array, + const struct assoc_array_ops *ops, + const void *index_key); + +This walks through the array's internal tree directly to the object +specified by the index key.. + +This may be used on an array at the same time as the array is being +modified, provided the RCU read lock is held. + +The function will return the object if found (and set ``*_type`` to the object +type) or will return ``NULL`` if the object was not found. + + +Index Key Form +-------------- + +The index key can be of any form, but since the algorithms aren't told how long +the key is, it is strongly recommended that the index key includes its length +very early on before any variation due to the length would have an effect on +comparisons. + +This will cause leaves with different length keys to scatter away from each +other - and those with the same length keys to cluster together. + +It is also recommended that the index key begin with a hash of the rest of the +key to maximise scattering throughout keyspace. + +The better the scattering, the wider and lower the internal tree will be. + +Poor scattering isn't too much of a problem as there are shortcuts and nodes +can contain mixtures of leaves and metadata pointers. + +The index key is read in chunks of machine word. Each chunk is subdivided into +one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and +on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is +unlikely that more than one word of any particular index key will have to be +used. + + +Internal Workings +================= + +The associative array data structure has an internal tree. This tree is +constructed of two types of metadata blocks: nodes and shortcuts. + +A node is an array of slots. Each slot can contain one of four things: + +* A NULL pointer, indicating that the slot is empty. +* A pointer to an object (a leaf). +* A pointer to a node at the next level. +* A pointer to a shortcut. + + +Basic Internal Tree Layout +-------------------------- + +Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index +key space is strictly subdivided by the nodes in the tree and nodes occur on +fixed levels. For example:: + + Level: 0 1 2 3 + =============== =============== =============== =============== + NODE D + NODE B NODE C +------>+---+ + +------>+---+ +------>+---+ | | 0 | + NODE A | | 0 | | | 0 | | +---+ + +---+ | +---+ | +---+ | : : + | 0 | | : : | : : | +---+ + +---+ | +---+ | +---+ | | f | + | 1 |---+ | 3 |---+ | 7 |---+ +---+ + +---+ +---+ +---+ + : : : : | 8 |---+ + +---+ +---+ +---+ | NODE E + | e |---+ | f | : : +------>+---+ + +---+ | +---+ +---+ | 0 | + | f | | | f | +---+ + +---+ | +---+ : : + | NODE F +---+ + +------>+---+ | f | + | 0 | NODE G +---+ + +---+ +------>+---+ + : : | | 0 | + +---+ | +---+ + | 6 |---+ : : + +---+ +---+ + : : | f | + +---+ +---+ + | f | + +---+ + +In the above example, there are 7 nodes (A-G), each with 16 slots (0-f). +Assuming no other meta data nodes in the tree, the key space is divided +thusly:: + + KEY PREFIX NODE + ========== ==== + 137* D + 138* E + 13[0-69-f]* C + 1[0-24-f]* B + e6* G + e[0-57-f]* F + [02-df]* A + +So, for instance, keys with the following example index keys will be found in +the appropriate nodes:: + + INDEX KEY PREFIX NODE + =============== ======= ==== + 13694892892489 13 C + 13795289025897 137 D + 13889dde88793 138 E + 138bbb89003093 138 E + 1394879524789 12 C + 1458952489 1 B + 9431809de993ba - A + b4542910809cd - A + e5284310def98 e F + e68428974237 e6 G + e7fffcbd443 e F + f3842239082 - A + +To save memory, if a node can hold all the leaves in its portion of keyspace, +then the node will have all those leaves in it and will not have any metadata +pointers - even if some of those leaves would like to be in the same slot. + +A node can contain a heterogeneous mix of leaves and metadata pointers. +Metadata pointers must be in the slots that match their subdivisions of key +space. The leaves can be in any slot not occupied by a metadata pointer. It +is guaranteed that none of the leaves in a node will match a slot occupied by a +metadata pointer. If the metadata pointer is there, any leaf whose key matches +the metadata key prefix must be in the subtree that the metadata pointer points +to. + +In the above example list of index keys, node A will contain:: + + SLOT CONTENT INDEX KEY (PREFIX) + ==== =============== ================== + 1 PTR TO NODE B 1* + any LEAF 9431809de993ba + any LEAF b4542910809cd + e PTR TO NODE F e* + any LEAF f3842239082 + +and node B:: + + 3 PTR TO NODE C 13* + any LEAF 1458952489 + + +Shortcuts +--------- + +Shortcuts are metadata records that jump over a piece of keyspace. A shortcut +is a replacement for a series of single-occupancy nodes ascending through the +levels. Shortcuts exist to save memory and to speed up traversal. + +It is possible for the root of the tree to be a shortcut - say, for example, +the tree contains at least 17 nodes all with key prefix ``1111``. The +insertion algorithm will insert a shortcut to skip over the ``1111`` keyspace +in a single bound and get to the fourth level where these actually become +different. + + +Splitting And Collapsing Nodes +------------------------------ + +Each node has a maximum capacity of 16 leaves and metadata pointers. If the +insertion algorithm finds that it is trying to insert a 17th object into a +node, that node will be split such that at least two leaves that have a common +key segment at that level end up in a separate node rooted on that slot for +that common key segment. + +If the leaves in a full node and the leaf that is being inserted are +sufficiently similar, then a shortcut will be inserted into the tree. + +When the number of objects in the subtree rooted at a node falls to 16 or +fewer, then the subtree will be collapsed down to a single node - and this will +ripple towards the root if possible. + + +Non-Recursive Iteration +----------------------- + +Each node and shortcut contains a back pointer to its parent and the number of +slot in that parent that points to it. None-recursive iteration uses these to +proceed rootwards through the tree, going to the parent node, slot N + 1 to +make sure progress is made without the need for a stack. + +The backpointers, however, make simultaneous alteration and iteration tricky. + + +Simultaneous Alteration And Iteration +------------------------------------- + +There are a number of cases to consider: + +1. Simple insert/replace. This involves simply replacing a NULL or old + matching leaf pointer with the pointer to the new leaf after a barrier. + The metadata blocks don't change otherwise. An old leaf won't be freed + until after the RCU grace period. + +2. Simple delete. This involves just clearing an old matching leaf. The + metadata blocks don't change otherwise. The old leaf won't be freed until + after the RCU grace period. + +3. Insertion replacing part of a subtree that we haven't yet entered. This + may involve replacement of part of that subtree - but that won't affect + the iteration as we won't have reached the pointer to it yet and the + ancestry blocks are not replaced (the layout of those does not change). + +4. Insertion replacing nodes that we're actively processing. This isn't a + problem as we've passed the anchoring pointer and won't switch onto the + new layout until we follow the back pointers - at which point we've + already examined the leaves in the replaced node (we iterate over all the + leaves in a node before following any of its metadata pointers). + + We might, however, re-see some leaves that have been split out into a new + branch that's in a slot further along than we were at. + +5. Insertion replacing nodes that we're processing a dependent branch of. + This won't affect us until we follow the back pointers. Similar to (4). + +6. Deletion collapsing a branch under us. This doesn't affect us because the + back pointers will get us back to the parent of the new node before we + could see the new node. The entire collapsed subtree is thrown away + unchanged - and will still be rooted on the same slot, so we shouldn't + process it a second time as we'll go back to slot + 1. + +.. note:: + + Under some circumstances, we need to simultaneously change the parent + pointer and the parent slot pointer on a node (say, for example, we + inserted another node before it and moved it up a level). We cannot do + this without locking against a read - so we have to replace that node too. + + However, when we're changing a shortcut into a node this isn't a problem + as shortcuts only have one slot and so the parent slot number isn't used + when traversing backwards over one. This means that it's okay to change + the slot number first - provided suitable barriers are used to make sure + the parent slot number is read after the back pointer. + +Obsolete blocks and leaves are freed up after an RCU grace period has passed, +so as long as anyone doing walking or iteration holds the RCU read lock, the +old superstructure should not go away on them. diff --git a/Documentation/atomic_ops.txt b/Documentation/core-api/atomic_ops.rst similarity index 76% rename from Documentation/atomic_ops.txt rename to Documentation/core-api/atomic_ops.rst index c9d1cacb4395..55e43f1c80de 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/core-api/atomic_ops.rst @@ -1,36 +1,42 @@ - Semantics and Behavior of Atomic and - Bitmask Operations +======================================================= +Semantics and Behavior of Atomic and Bitmask Operations +======================================================= - David S. Miller +:Author: David S. Miller - This document is intended to serve as a guide to Linux port +This document is intended to serve as a guide to Linux port maintainers on how to implement atomic counter, bitops, and spinlock interfaces properly. - The atomic_t type should be defined as a signed integer and +Atomic Type And Operations +========================== + +The atomic_t type should be defined as a signed integer and the atomic_long_t type as a signed long integer. Also, they should be made opaque such that any kind of cast to a normal C integer type -will fail. Something like the following should suffice: +will fail. Something like the following should suffice:: typedef struct { int counter; } atomic_t; typedef struct { long counter; } atomic_long_t; Historically, counter has been declared volatile. This is now discouraged. -See Documentation/volatile-considered-harmful.txt for the complete rationale. +See :ref:`Documentation/process/volatile-considered-harmful.rst +` for the complete rationale. local_t is very similar to atomic_t. If the counter is per CPU and only updated by one CPU, local_t is probably more appropriate. Please see -Documentation/local_ops.txt for the semantics of local_t. +:ref:`Documentation/core-api/local_ops.rst ` for the semantics of +local_t. The first operations to implement for atomic_t's are the initializers and -plain reads. +plain reads. :: #define ATOMIC_INIT(i) { (i) } #define atomic_set(v, i) ((v)->counter = (i)) -The first macro is used in definitions, such as: +The first macro is used in definitions, such as:: -static atomic_t my_counter = ATOMIC_INIT(1); + static atomic_t my_counter = ATOMIC_INIT(1); The initializer is atomic in that the return values of the atomic operations are guaranteed to be correct reflecting the initialized value if the @@ -38,10 +44,10 @@ initializer is used before runtime. If the initializer is used at runtime, a proper implicit or explicit read memory barrier is needed before reading the value with atomic_read from another thread. -As with all of the atomic_ interfaces, replace the leading "atomic_" -with "atomic_long_" to operate on atomic_long_t. +As with all of the ``atomic_`` interfaces, replace the leading ``atomic_`` +with ``atomic_long_`` to operate on atomic_long_t. -The second interface can be used at runtime, as in: +The second interface can be used at runtime, as in:: struct foo { atomic_t counter; }; ... @@ -59,7 +65,7 @@ been set with this operation or set with another operation. A proper implicit or explicit memory barrier is needed before the value set with the operation is guaranteed to be readable with atomic_read from another thread. -Next, we have: +Next, we have:: #define atomic_read(v) ((v)->counter) @@ -73,36 +79,37 @@ initialization by any other thread is visible yet, so the user of the interface must take care of that with a proper implicit or explicit memory barrier. -*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! *** +.. warning:: -Some architectures may choose to use the volatile keyword, barriers, or inline -assembly to guarantee some degree of immediacy for atomic_read() and -atomic_set(). This is not uniformly guaranteed, and may change in the future, -so all users of atomic_t should treat atomic_read() and atomic_set() as simple -C statements that may be reordered or optimized away entirely by the compiler -or processor, and explicitly invoke the appropriate compiler and/or memory -barrier for each use case. Failure to do so will result in code that may -suddenly break when used with different architectures or compiler -optimizations, or even changes in unrelated code which changes how the -compiler optimizes the section accessing atomic_t variables. + ``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS! -*** YOU HAVE BEEN WARNED! *** + Some architectures may choose to use the volatile keyword, barriers, or + inline assembly to guarantee some degree of immediacy for atomic_read() + and atomic_set(). This is not uniformly guaranteed, and may change in + the future, so all users of atomic_t should treat atomic_read() and + atomic_set() as simple C statements that may be reordered or optimized + away entirely by the compiler or processor, and explicitly invoke the + appropriate compiler and/or memory barrier for each use case. Failure + to do so will result in code that may suddenly break when used with + different architectures or compiler optimizations, or even changes in + unrelated code which changes how the compiler optimizes the section + accessing atomic_t variables. Properly aligned pointers, longs, ints, and chars (and unsigned equivalents) may be atomically loaded from and stored to in the same -sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE() -macro should be used to prevent the compiler from using optimizations -that might otherwise optimize accesses out of existence on the one hand, -or that might create unsolicited accesses on the other. +sense as described for atomic_read() and atomic_set(). The READ_ONCE() +and WRITE_ONCE() macros should be used to prevent the compiler from using +optimizations that might otherwise optimize accesses out of existence on +the one hand, or that might create unsolicited accesses on the other. -For example consider the following code: +For example consider the following code:: while (a > 0) do_something(); If the compiler can prove that do_something() does not store to the variable a, then the compiler is within its rights transforming this to -the following: +the following:: tmp = a; if (a > 0) @@ -110,14 +117,14 @@ the following: do_something(); If you don't want the compiler to do this (and you probably don't), then -you should use something like the following: +you should use something like the following:: - while (ACCESS_ONCE(a) < 0) + while (READ_ONCE(a) < 0) do_something(); Alternatively, you could place a barrier() call in the loop. -For another example, consider the following code: +For another example, consider the following code:: tmp_a = a; do_something_with(tmp_a); @@ -125,7 +132,7 @@ For another example, consider the following code: If the compiler can prove that do_something_with() does not store to the variable a, then the compiler is within its rights to manufacture an -additional load as follows: +additional load as follows:: tmp_a = a; do_something_with(tmp_a); @@ -139,15 +146,15 @@ The compiler would be likely to manufacture this additional load if do_something_with() was an inline function that made very heavy use of registers: reloading from variable a could save a flush to the stack and later reload. To prevent the compiler from attacking your -code in this manner, write the following: +code in this manner, write the following:: - tmp_a = ACCESS_ONCE(a); + tmp_a = READ_ONCE(a); do_something_with(tmp_a); do_something_else_with(tmp_a); For a final example, consider the following code, assuming that the variable a is set at boot time before the second CPU is brought online -and never changed later, so that memory barriers are not needed: +and never changed later, so that memory barriers are not needed:: if (a) b = 9; @@ -155,7 +162,7 @@ and never changed later, so that memory barriers are not needed: b = 42; The compiler is within its rights to manufacture an additional store -by transforming the above code into the following: +by transforming the above code into the following:: b = 42; if (a) @@ -163,20 +170,22 @@ by transforming the above code into the following: This could come as a fatal surprise to other code running concurrently that expected b to never have the value 42 if a was zero. To prevent -the compiler from doing this, write something like: +the compiler from doing this, write something like:: if (a) - ACCESS_ONCE(b) = 9; + WRITE_ONCE(b, 9); else - ACCESS_ONCE(b) = 42; + WRITE_ONCE(b, 42); Don't even -think- about doing this without proper use of memory barriers, locks, or atomic operations if variable a can change at runtime! -*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *** +.. warning:: + + ``READ_ONCE()`` OR ``WRITE_ONCE()`` DO NOT IMPLY A BARRIER! Now, we move onto the atomic operation interfaces typically implemented with -the help of assembly code. +the help of assembly code. :: void atomic_add(int i, atomic_t *v); void atomic_sub(int i, atomic_t *v); @@ -192,7 +201,7 @@ One very important aspect of these two routines is that they DO NOT require any explicit memory barriers. They need only perform the atomic_t counter update in an SMP safe manner. -Next, we have: +Next, we have:: int atomic_inc_return(atomic_t *v); int atomic_dec_return(atomic_t *v); @@ -214,7 +223,7 @@ If the atomic instructions used in an implementation provide explicit memory barrier semantics which satisfy the above requirements, that is fine as well. -Let's move on: +Let's move on:: int atomic_add_return(int i, atomic_t *v); int atomic_sub_return(int i, atomic_t *v); @@ -224,7 +233,7 @@ explicit counter adjustment is given instead of the implicit "1". This means that like atomic_{inc,dec}_return(), the memory barrier semantics are required. -Next: +Next:: int atomic_inc_and_test(atomic_t *v); int atomic_dec_and_test(atomic_t *v); @@ -234,13 +243,13 @@ given atomic counter. They return a boolean indicating whether the resulting counter value was zero or not. Again, these primitives provide explicit memory barrier semantics around -the atomic operation. +the atomic operation:: int atomic_sub_and_test(int i, atomic_t *v); This is identical to atomic_dec_and_test() except that an explicit decrement is given instead of the implicit "1". This primitive must -provide explicit memory barrier semantics around the operation. +provide explicit memory barrier semantics around the operation:: int atomic_add_negative(int i, atomic_t *v); @@ -249,7 +258,7 @@ is return which indicates whether the resulting counter value is negative. This primitive must provide explicit memory barrier semantics around the operation. -Then: +Then:: int atomic_xchg(atomic_t *v, int new); @@ -257,14 +266,14 @@ This performs an atomic exchange operation on the atomic variable v, setting the given new value. It returns the old value that the atomic variable v had just before the operation. -atomic_xchg must provide explicit memory barriers around the operation. +atomic_xchg must provide explicit memory barriers around the operation. :: int atomic_cmpxchg(atomic_t *v, int old, int new); This performs an atomic compare exchange operation on the atomic value v, with the given old and new values. Like all atomic_xxx operations, atomic_cmpxchg will only satisfy its atomicity semantics as long as all -other accesses of *v are performed through atomic_xxx operations. +other accesses of \*v are performed through atomic_xxx operations. atomic_cmpxchg must provide explicit memory barriers around the operation, although if the comparison fails then no memory ordering guarantees are @@ -273,7 +282,7 @@ required. The semantics for atomic_cmpxchg are the same as those defined for 'cas' below. -Finally: +Finally:: int atomic_add_unless(atomic_t *v, int a, int u); @@ -289,12 +298,12 @@ atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0) If a caller requires memory barrier semantics around an atomic_t operation which does not return a value, a set of interfaces are -defined which accomplish this: +defined which accomplish this:: void smp_mb__before_atomic(void); void smp_mb__after_atomic(void); -For example, smp_mb__before_atomic() can be used like so: +For example, smp_mb__before_atomic() can be used like so:: obj->dead = 1; smp_mb__before_atomic(); @@ -315,67 +324,69 @@ atomic_t implementation above can have disastrous results. Here is an example, which follows a pattern occurring frequently in the Linux kernel. It is the use of atomic counters to implement reference counting, and it works such that once the counter falls to zero it can -be guaranteed that no other entity can be accessing the object: - -static void obj_list_add(struct obj *obj, struct list_head *head) -{ - obj->active = 1; - list_add(&obj->list, head); -} - -static void obj_list_del(struct obj *obj) -{ - list_del(&obj->list); - obj->active = 0; -} - -static void obj_destroy(struct obj *obj) -{ - BUG_ON(obj->active); - kfree(obj); -} - -struct obj *obj_list_peek(struct list_head *head) -{ - if (!list_empty(head)) { - struct obj *obj; +be guaranteed that no other entity can be accessing the object:: + + static void obj_list_add(struct obj *obj, struct list_head *head) + { + obj->active = 1; + list_add(&obj->list, head); + } + + static void obj_list_del(struct obj *obj) + { + list_del(&obj->list); + obj->active = 0; + } - obj = list_entry(head->next, struct obj, list); - atomic_inc(&obj->refcnt); - return obj; + static void obj_destroy(struct obj *obj) + { + BUG_ON(obj->active); + kfree(obj); } - return NULL; -} -void obj_poke(void) -{ - struct obj *obj; + struct obj *obj_list_peek(struct list_head *head) + { + if (!list_empty(head)) { + struct obj *obj; + + obj = list_entry(head->next, struct obj, list); + atomic_inc(&obj->refcnt); + return obj; + } + return NULL; + } + + void obj_poke(void) + { + struct obj *obj; + + spin_lock(&global_list_lock); + obj = obj_list_peek(&global_list); + spin_unlock(&global_list_lock); - spin_lock(&global_list_lock); - obj = obj_list_peek(&global_list); - spin_unlock(&global_list_lock); + if (obj) { + obj->ops->poke(obj); + if (atomic_dec_and_test(&obj->refcnt)) + obj_destroy(obj); + } + } + + void obj_timeout(struct obj *obj) + { + spin_lock(&global_list_lock); + obj_list_del(obj); + spin_unlock(&global_list_lock); - if (obj) { - obj->ops->poke(obj); if (atomic_dec_and_test(&obj->refcnt)) obj_destroy(obj); } -} - -void obj_timeout(struct obj *obj) -{ - spin_lock(&global_list_lock); - obj_list_del(obj); - spin_unlock(&global_list_lock); - if (atomic_dec_and_test(&obj->refcnt)) - obj_destroy(obj); -} +.. note:: -(This is a simplification of the ARP queue management in the - generic neighbour discover code of the networking. Olaf Kirch - found a bug wrt. memory barriers in kfree_skb() that exposed - the atomic_t memory barrier requirements quite clearly.) + This is a simplification of the ARP queue management in the generic + neighbour discover code of the networking. Olaf Kirch found a bug wrt. + memory barriers in kfree_skb() that exposed the atomic_t memory barrier + requirements quite clearly. Given the above scheme, it must be the case that the obj->active update done by the obj list deletion be visible to other processors @@ -383,7 +394,7 @@ before the atomic counter decrement is performed. Otherwise, the counter could fall to zero, yet obj->active would still be set, thus triggering the assertion in obj_destroy(). The error -sequence looks like this: +sequence looks like this:: cpu 0 cpu 1 obj_poke() obj_timeout() @@ -420,6 +431,10 @@ same scheme. Another note is that the atomic_t operations returning values are extremely slow on an old 386. + +Atomic Bitmask +============== + We will now cover the atomic bitmask operations. You will find that their SMP and memory barrier semantics are similar in shape and scope to the atomic_t ops above. @@ -427,7 +442,7 @@ to the atomic_t ops above. Native atomic bit operations are defined to operate on objects aligned to the size of an "unsigned long" C data type, and are least of that size. The endianness of the bits within each "unsigned long" are the -native endianness of the cpu. +native endianness of the cpu. :: void set_bit(unsigned long nr, volatile unsigned long *addr); void clear_bit(unsigned long nr, volatile unsigned long *addr); @@ -437,7 +452,7 @@ These routines set, clear, and change, respectively, the bit number indicated by "nr" on the bit mask pointed to by "ADDR". They must execute atomically, yet there are no implicit memory barrier -semantics required of these interfaces. +semantics required of these interfaces. :: int test_and_set_bit(unsigned long nr, volatile unsigned long *addr); int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr); @@ -466,7 +481,7 @@ must provide explicit memory barrier semantics around their execution. All memory operations before the atomic bit operation call must be made visible globally before the atomic bit operation is made visible. Likewise, the atomic bit operation must be visible globally before any -subsequent memory operation is made visible. For example: +subsequent memory operation is made visible. For example:: obj->dead = 1; if (test_and_set_bit(0, &obj->flags)) @@ -479,7 +494,7 @@ done by test_and_set_bit() becomes visible. Likewise, the atomic memory operation done by test_and_set_bit() must become visible before "obj->killed = 1;" is visible. -Finally there is the basic operation: +Finally there is the basic operation:: int test_bit(unsigned long nr, __const__ volatile unsigned long *addr); @@ -488,13 +503,13 @@ pointed to by "addr". If explicit memory barriers are required around {set,clear}_bit() (which do not return a value, and thus does not need to provide memory barrier -semantics), two interfaces are provided: +semantics), two interfaces are provided:: void smp_mb__before_atomic(void); void smp_mb__after_atomic(void); They are used as follows, and are akin to their atomic_t operation -brothers: +brothers:: /* All memory operations before this call will * be globally visible before the clear_bit(). @@ -511,7 +526,7 @@ There are two special bitops with lock barrier semantics (acquire/release, same as spinlocks). These operate in the same way as their non-_lock/unlock postfixed variants, except that they are to provide acquire/release semantics, respectively. This means they can be used for bit_spin_trylock and -bit_spin_unlock type operations without specifying any more barriers. +bit_spin_unlock type operations without specifying any more barriers. :: int test_and_set_bit_lock(unsigned long nr, unsigned long *addr); void clear_bit_unlock(unsigned long nr, unsigned long *addr); @@ -526,7 +541,7 @@ provided. They are used in contexts where some other higher-level SMP locking scheme is being used to protect the bitmask, and thus less expensive non-atomic operations may be used in the implementation. They have names similar to the above bitmask operation interfaces, -except that two underscores are prefixed to the interface name. +except that two underscores are prefixed to the interface name. :: void __set_bit(unsigned long nr, volatile unsigned long *addr); void __clear_bit(unsigned long nr, volatile unsigned long *addr); @@ -542,9 +557,11 @@ The routines xchg() and cmpxchg() must provide the same exact memory-barrier semantics as the atomic and bit operations returning values. -Note: If someone wants to use xchg(), cmpxchg() and their variants, -linux/atomic.h should be included rather than asm/cmpxchg.h, unless -the code is in arch/* and can take care of itself. +.. note:: + + If someone wants to use xchg(), cmpxchg() and their variants, + linux/atomic.h should be included rather than asm/cmpxchg.h, unless the + code is in arch/* and can take care of itself. Spinlocks and rwlocks have memory barrier expectations as well. The rule to follow is simple: @@ -558,7 +575,7 @@ The rule to follow is simple: Which finally brings us to _atomic_dec_and_lock(). There is an architecture-neutral version implemented in lib/dec_and_lock.c, -but most platforms will wish to optimize this in assembler. +but most platforms will wish to optimize this in assembler. :: int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock); @@ -573,7 +590,7 @@ sure the spinlock operation is globally visible before any subsequent memory operation. We can demonstrate this operation more clearly if we define -an abstract atomic operation: +an abstract atomic operation:: long cas(long *mem, long old, long new); @@ -584,48 +601,48 @@ an abstract atomic operation: 3) Regardless, the current value at "mem" is returned. As an example usage, here is what an atomic counter update -might look like: +might look like:: -void example_atomic_inc(long *counter) -{ - long old, new, ret; + void example_atomic_inc(long *counter) + { + long old, new, ret; - while (1) { - old = *counter; - new = old + 1; + while (1) { + old = *counter; + new = old + 1; - ret = cas(counter, old, new); - if (ret == old) - break; - } -} - -Let's use cas() in order to build a pseudo-C atomic_dec_and_lock(): - -int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) -{ - long old, new, ret; - int went_to_zero; - - went_to_zero = 0; - while (1) { - old = atomic_read(atomic); - new = old - 1; - if (new == 0) { - went_to_zero = 1; - spin_lock(lock); - } - ret = cas(atomic, old, new); - if (ret == old) - break; - if (went_to_zero) { - spin_unlock(lock); - went_to_zero = 0; + ret = cas(counter, old, new); + if (ret == old) + break; } } - return went_to_zero; -} +Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():: + + int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) + { + long old, new, ret; + int went_to_zero; + + went_to_zero = 0; + while (1) { + old = atomic_read(atomic); + new = old - 1; + if (new == 0) { + went_to_zero = 1; + spin_lock(lock); + } + ret = cas(atomic, old, new); + if (ret == old) + break; + if (went_to_zero) { + spin_unlock(lock); + went_to_zero = 0; + } + } + + return went_to_zero; + } Now, as far as memory barriers go, as long as spin_lock() strictly orders all subsequent memory operations (including @@ -635,6 +652,7 @@ Said another way, _atomic_dec_and_lock() must guarantee that a counter dropping to zero is never made visible before the spinlock being acquired. -Note that this also means that for the case where the counter -is not dropping to zero, there are no memory ordering -requirements. +.. note:: + + Note that this also means that for the case where the counter is not + dropping to zero, there are no memory ordering requirements. diff --git a/Documentation/core-api/conf.py b/Documentation/core-api/conf.py new file mode 100644 index 000000000000..db1f7659f3da --- /dev/null +++ b/Documentation/core-api/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = "Core-API Documentation" + +tags.add("subproject") + +latex_documents = [ + ('index', 'core-api.tex', project, + 'The kernel development community', 'manual'), +] diff --git a/Documentation/core-api/debug-objects.rst b/Documentation/core-api/debug-objects.rst new file mode 100644 index 000000000000..ac926fd55a64 --- /dev/null +++ b/Documentation/core-api/debug-objects.rst @@ -0,0 +1,310 @@ +============================================ +The object-lifetime debugging infrastructure +============================================ + +:Author: Thomas Gleixner + +Introduction +============ + +debugobjects is a generic infrastructure to track the life time of +kernel objects and validate the operations on those. + +debugobjects is useful to check for the following error patterns: + +- Activation of uninitialized objects + +- Initialization of active objects + +- Usage of freed/destroyed objects + +debugobjects is not changing the data structure of the real object so it +can be compiled in with a minimal runtime impact and enabled on demand +with a kernel command line option. + +Howto use debugobjects +====================== + +A kernel subsystem needs to provide a data structure which describes the +object type and add calls into the debug code at appropriate places. The +data structure to describe the object type needs at minimum the name of +the object type. Optional functions can and should be provided to fixup +detected problems so the kernel can continue to work and the debug +information can be retrieved from a live system instead of hard core +debugging with serial consoles and stack trace transcripts from the +monitor. + +The debug calls provided by debugobjects are: + +- debug_object_init + +- debug_object_init_on_stack + +- debug_object_activate + +- debug_object_deactivate + +- debug_object_destroy + +- debug_object_free + +- debug_object_assert_init + +Each of these functions takes the address of the real object and a +pointer to the object type specific debug description structure. + +Each detected error is reported in the statistics and a limited number +of errors are printk'ed including a full stack trace. + +The statistics are available via /sys/kernel/debug/debug_objects/stats. +They provide information about the number of warnings and the number of +successful fixups along with information about the usage of the internal +tracking objects and the state of the internal tracking objects pool. + +Debug functions +=============== + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_init + +This function is called whenever the initialization function of a real +object is called. + +When the real object is already tracked by debugobjects it is checked, +whether the object can be initialized. Initializing is not allowed for +active and destroyed objects. When debugobjects detects an error, then +it calls the fixup_init function of the object type description +structure if provided by the caller. The fixup function can correct the +problem before the real initialization of the object happens. E.g. it +can deactivate an active object in order to prevent damage to the +subsystem. + +When the real object is not yet tracked by debugobjects, debugobjects +allocates a tracker object for the real object and sets the tracker +object state to ODEBUG_STATE_INIT. It verifies that the object is not +on the callers stack. If it is on the callers stack then a limited +number of warnings including a full stack trace is printk'ed. The +calling code must use debug_object_init_on_stack() and remove the +object before leaving the function which allocated it. See next section. + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_init_on_stack + +This function is called whenever the initialization function of a real +object which resides on the stack is called. + +When the real object is already tracked by debugobjects it is checked, +whether the object can be initialized. Initializing is not allowed for +active and destroyed objects. When debugobjects detects an error, then +it calls the fixup_init function of the object type description +structure if provided by the caller. The fixup function can correct the +problem before the real initialization of the object happens. E.g. it +can deactivate an active object in order to prevent damage to the +subsystem. + +When the real object is not yet tracked by debugobjects debugobjects +allocates a tracker object for the real object and sets the tracker +object state to ODEBUG_STATE_INIT. It verifies that the object is on +the callers stack. + +An object which is on the stack must be removed from the tracker by +calling debug_object_free() before the function which allocates the +object returns. Otherwise we keep track of stale objects. + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_activate + +This function is called whenever the activation function of a real +object is called. + +When the real object is already tracked by debugobjects it is checked, +whether the object can be activated. Activating is not allowed for +active and destroyed objects. When debugobjects detects an error, then +it calls the fixup_activate function of the object type description +structure if provided by the caller. The fixup function can correct the +problem before the real activation of the object happens. E.g. it can +deactivate an active object in order to prevent damage to the subsystem. + +When the real object is not yet tracked by debugobjects then the +fixup_activate function is called if available. This is necessary to +allow the legitimate activation of statically allocated and initialized +objects. The fixup function checks whether the object is valid and calls +the debug_objects_init() function to initialize the tracking of this +object. + +When the activation is legitimate, then the state of the associated +tracker object is set to ODEBUG_STATE_ACTIVE. + + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_deactivate + +This function is called whenever the deactivation function of a real +object is called. + +When the real object is tracked by debugobjects it is checked, whether +the object can be deactivated. Deactivating is not allowed for untracked +or destroyed objects. + +When the deactivation is legitimate, then the state of the associated +tracker object is set to ODEBUG_STATE_INACTIVE. + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_destroy + +This function is called to mark an object destroyed. This is useful to +prevent the usage of invalid objects, which are still available in +memory: either statically allocated objects or objects which are freed +later. + +When the real object is tracked by debugobjects it is checked, whether +the object can be destroyed. Destruction is not allowed for active and +destroyed objects. When debugobjects detects an error, then it calls the +fixup_destroy function of the object type description structure if +provided by the caller. The fixup function can correct the problem +before the real destruction of the object happens. E.g. it can +deactivate an active object in order to prevent damage to the subsystem. + +When the destruction is legitimate, then the state of the associated +tracker object is set to ODEBUG_STATE_DESTROYED. + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_free + +This function is called before an object is freed. + +When the real object is tracked by debugobjects it is checked, whether +the object can be freed. Free is not allowed for active objects. When +debugobjects detects an error, then it calls the fixup_free function of +the object type description structure if provided by the caller. The +fixup function can correct the problem before the real free of the +object happens. E.g. it can deactivate an active object in order to +prevent damage to the subsystem. + +Note that debug_object_free removes the object from the tracker. Later +usage of the object is detected by the other debug checks. + + +.. kernel-doc:: lib/debugobjects.c + :functions: debug_object_assert_init + +This function is called to assert that an object has been initialized. + +When the real object is not tracked by debugobjects, it calls +fixup_assert_init of the object type description structure provided by +the caller, with the hardcoded object state ODEBUG_NOT_AVAILABLE. The +fixup function can correct the problem by calling debug_object_init +and other specific initializing functions. + +When the real object is already tracked by debugobjects it is ignored. + +Fixup functions +=============== + +Debug object type description structure +--------------------------------------- + +.. kernel-doc:: include/linux/debugobjects.h + :internal: + +fixup_init +----------- + +This function is called from the debug code whenever a problem in +debug_object_init is detected. The function takes the address of the +object and the state which is currently recorded in the tracker. + +Called from debug_object_init when the object state is: + +- ODEBUG_STATE_ACTIVE + +The function returns true when the fixup was successful, otherwise +false. The return value is used to update the statistics. + +Note, that the function needs to call the debug_object_init() function +again, after the damage has been repaired in order to keep the state +consistent. + +fixup_activate +--------------- + +This function is called from the debug code whenever a problem in +debug_object_activate is detected. + +Called from debug_object_activate when the object state is: + +- ODEBUG_STATE_NOTAVAILABLE + +- ODEBUG_STATE_ACTIVE + +The function returns true when the fixup was successful, otherwise +false. The return value is used to update the statistics. + +Note that the function needs to call the debug_object_activate() +function again after the damage has been repaired in order to keep the +state consistent. + +The activation of statically initialized objects is a special case. When +debug_object_activate() has no tracked object for this object address +then fixup_activate() is called with object state +ODEBUG_STATE_NOTAVAILABLE. The fixup function needs to check whether +this is a legitimate case of a statically initialized object or not. In +case it is it calls debug_object_init() and debug_object_activate() +to make the object known to the tracker and marked active. In this case +the function should return false because this is not a real fixup. + +fixup_destroy +-------------- + +This function is called from the debug code whenever a problem in +debug_object_destroy is detected. + +Called from debug_object_destroy when the object state is: + +- ODEBUG_STATE_ACTIVE + +The function returns true when the fixup was successful, otherwise +false. The return value is used to update the statistics. + +fixup_free +----------- + +This function is called from the debug code whenever a problem in +debug_object_free is detected. Further it can be called from the debug +checks in kfree/vfree, when an active object is detected from the +debug_check_no_obj_freed() sanity checks. + +Called from debug_object_free() or debug_check_no_obj_freed() when +the object state is: + +- ODEBUG_STATE_ACTIVE + +The function returns true when the fixup was successful, otherwise +false. The return value is used to update the statistics. + +fixup_assert_init +------------------- + +This function is called from the debug code whenever a problem in +debug_object_assert_init is detected. + +Called from debug_object_assert_init() with a hardcoded state +ODEBUG_STATE_NOTAVAILABLE when the object is not found in the debug +bucket. + +The function returns true when the fixup was successful, otherwise +false. The return value is used to update the statistics. + +Note, this function should make sure debug_object_init() is called +before returning. + +The handling of statically initialized objects is a special case. The +fixup function should check if this is a legitimate case of a statically +initialized object or not. In this case only debug_object_init() +should be called to make the object known to the tracker. Then the +function should return false because this is not a real fixup. + +Known Bugs And Assumptions +========================== + +None (knock on wood). diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst new file mode 100644 index 000000000000..2872ca1a52f1 --- /dev/null +++ b/Documentation/core-api/index.rst @@ -0,0 +1,33 @@ +====================== +Core API Documentation +====================== + +This is the beginning of a manual for core kernel APIs. The conversion +(and writing!) of documents for this manual is much appreciated! + +Core utilities +============== + +.. toctree:: + :maxdepth: 1 + + assoc_array + atomic_ops + local_ops + workqueue + +Interfaces for kernel debugging +=============================== + +.. toctree:: + :maxdepth: 1 + + debug-objects + tracepoint + +.. only:: subproject + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/core-api/local_ops.rst b/Documentation/core-api/local_ops.rst new file mode 100644 index 000000000000..1062ddba62c7 --- /dev/null +++ b/Documentation/core-api/local_ops.rst @@ -0,0 +1,206 @@ + +.. _local_ops: + +================================================= +Semantics and Behavior of Local Atomic Operations +================================================= + +:Author: Mathieu Desnoyers + + +This document explains the purpose of the local atomic operations, how +to implement them for any given architecture and shows how they can be used +properly. It also stresses on the precautions that must be taken when reading +those local variables across CPUs when the order of memory writes matters. + +.. note:: + + Note that ``local_t`` based operations are not recommended for general + kernel use. Please use the ``this_cpu`` operations instead unless there is + really a special purpose. Most uses of ``local_t`` in the kernel have been + replaced by ``this_cpu`` operations. ``this_cpu`` operations combine the + relocation with the ``local_t`` like semantics in a single instruction and + yield more compact and faster executing code. + + +Purpose of local atomic operations +================================== + +Local atomic operations are meant to provide fast and highly reentrant per CPU +counters. They minimize the performance cost of standard atomic operations by +removing the LOCK prefix and memory barriers normally required to synchronize +across CPUs. + +Having fast per CPU atomic counters is interesting in many cases: it does not +require disabling interrupts to protect from interrupt handlers and it permits +coherent counters in NMI handlers. It is especially useful for tracing purposes +and for various performance monitoring counters. + +Local atomic operations only guarantee variable modification atomicity wrt the +CPU which owns the data. Therefore, care must taken to make sure that only one +CPU writes to the ``local_t`` data. This is done by using per cpu data and +making sure that we modify it from within a preemption safe context. It is +however permitted to read ``local_t`` data from any CPU: it will then appear to +be written out of order wrt other memory writes by the owner CPU. + + +Implementation for a given architecture +======================================= + +It can be done by slightly modifying the standard atomic operations: only +their UP variant must be kept. It typically means removing LOCK prefix (on +i386 and x86_64) and any SMP synchronization barrier. If the architecture does +not have a different behavior between SMP and UP, including +``asm-generic/local.h`` in your architecture's ``local.h`` is sufficient. + +The ``local_t`` type is defined as an opaque ``signed long`` by embedding an +``atomic_long_t`` inside a structure. This is made so a cast from this type to +a ``long`` fails. The definition looks like:: + + typedef struct { atomic_long_t a; } local_t; + + +Rules to follow when using local atomic operations +================================================== + +* Variables touched by local ops must be per cpu variables. +* *Only* the CPU owner of these variables must write to them. +* This CPU can use local ops from any context (process, irq, softirq, nmi, ...) + to update its ``local_t`` variables. +* Preemption (or interrupts) must be disabled when using local ops in + process context to make sure the process won't be migrated to a + different CPU between getting the per-cpu variable and doing the + actual local op. +* When using local ops in interrupt context, no special care must be + taken on a mainline kernel, since they will run on the local CPU with + preemption already disabled. I suggest, however, to explicitly + disable preemption anyway to make sure it will still work correctly on + -rt kernels. +* Reading the local cpu variable will provide the current copy of the + variable. +* Reads of these variables can be done from any CPU, because updates to + "``long``", aligned, variables are always atomic. Since no memory + synchronization is done by the writer CPU, an outdated copy of the + variable can be read when reading some *other* cpu's variables. + + +How to use local atomic operations +================================== + +:: + + #include + #include + + static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0); + + +Counting +======== + +Counting is done on all the bits of a signed long. + +In preemptible context, use ``get_cpu_var()`` and ``put_cpu_var()`` around +local atomic operations: it makes sure that preemption is disabled around write +access to the per cpu variable. For instance:: + + local_inc(&get_cpu_var(counters)); + put_cpu_var(counters); + +If you are already in a preemption-safe context, you can use +``this_cpu_ptr()`` instead:: + + local_inc(this_cpu_ptr(&counters)); + + + +Reading the counters +==================== + +Those local counters can be read from foreign CPUs to sum the count. Note that +the data seen by local_read across CPUs must be considered to be out of order +relatively to other memory writes happening on the CPU that owns the data:: + + long sum = 0; + for_each_online_cpu(cpu) + sum += local_read(&per_cpu(counters, cpu)); + +If you want to use a remote local_read to synchronize access to a resource +between CPUs, explicit ``smp_wmb()`` and ``smp_rmb()`` memory barriers must be used +respectively on the writer and the reader CPUs. It would be the case if you use +the ``local_t`` variable as a counter of bytes written in a buffer: there should +be a ``smp_wmb()`` between the buffer write and the counter increment and also a +``smp_rmb()`` between the counter read and the buffer read. + + +Here is a sample module which implements a basic per cpu counter using +``local.h``:: + + /* test-local.c + * + * Sample module for local.h usage. + */ + + + #include + #include + #include + + static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0); + + static struct timer_list test_timer; + + /* IPI called on each CPU. */ + static void test_each(void *info) + { + /* Increment the counter from a non preemptible context */ + printk("Increment on cpu %d\n", smp_processor_id()); + local_inc(this_cpu_ptr(&counters)); + + /* This is what incrementing the variable would look like within a + * preemptible context (it disables preemption) : + * + * local_inc(&get_cpu_var(counters)); + * put_cpu_var(counters); + */ + } + + static void do_test_timer(unsigned long data) + { + int cpu; + + /* Increment the counters */ + on_each_cpu(test_each, NULL, 1); + /* Read all the counters */ + printk("Counters read from CPU %d\n", smp_processor_id()); + for_each_online_cpu(cpu) { + printk("Read : CPU %d, count %ld\n", cpu, + local_read(&per_cpu(counters, cpu))); + } + del_timer(&test_timer); + test_timer.expires = jiffies + 1000; + add_timer(&test_timer); + } + + static int __init test_init(void) + { + /* initialize the timer that will increment the counter */ + init_timer(&test_timer); + test_timer.function = do_test_timer; + test_timer.expires = jiffies + 1; + add_timer(&test_timer); + + return 0; + } + + static void __exit test_exit(void) + { + del_timer_sync(&test_timer); + } + + module_init(test_init); + module_exit(test_exit); + + MODULE_LICENSE("GPL"); + MODULE_AUTHOR("Mathieu Desnoyers"); + MODULE_DESCRIPTION("Local Atomic Ops"); diff --git a/Documentation/core-api/tracepoint.rst b/Documentation/core-api/tracepoint.rst new file mode 100644 index 000000000000..6b44bec0de43 --- /dev/null +++ b/Documentation/core-api/tracepoint.rst @@ -0,0 +1,55 @@ +=============================== +The Linux Kernel Tracepoint API +=============================== + +:Author: Jason Baron +:Author: William Cohen + +Introduction +============ + +Tracepoints are static probe points that are located in strategic points +throughout the kernel. 'Probes' register/unregister with tracepoints via +a callback mechanism. The 'probes' are strictly typed functions that are +passed a unique set of parameters defined by each tracepoint. + +From this simple callback mechanism, 'probes' can be used to profile, +debug, and understand kernel behavior. There are a number of tools that +provide a framework for using 'probes'. These tools include Systemtap, +ftrace, and LTTng. + +Tracepoints are defined in a number of header files via various macros. +Thus, the purpose of this document is to provide a clear accounting of +the available tracepoints. The intention is to understand not only what +tracepoints are available but also to understand where future +tracepoints might be added. + +The API presented has functions of the form: +``trace_tracepointname(function parameters)``. These are the tracepoints +callbacks that are found throughout the code. Registering and +unregistering probes with these callback sites is covered in the +``Documentation/trace/*`` directory. + +IRQ +=== + +.. kernel-doc:: include/trace/events/irq.h + :internal: + +SIGNAL +====== + +.. kernel-doc:: include/trace/events/signal.h + :internal: + +Block IO +======== + +.. kernel-doc:: include/trace/events/block.h + :internal: + +Workqueue +========= + +.. kernel-doc:: include/trace/events/workqueue.h + :internal: diff --git a/Documentation/workqueue.txt b/Documentation/core-api/workqueue.rst similarity index 63% rename from Documentation/workqueue.txt rename to Documentation/core-api/workqueue.rst index c49e3178178d..ffdec94fbca1 100644 --- a/Documentation/workqueue.txt +++ b/Documentation/core-api/workqueue.rst @@ -1,21 +1,14 @@ - +==================================== Concurrency Managed Workqueue (cmwq) +==================================== -September, 2010 Tejun Heo - Florian Mickler - -CONTENTS - -1. Introduction -2. Why cmwq? -3. The Design -4. Application Programming Interface (API) -5. Example Execution Scenarios -6. Guidelines -7. Debugging +:Date: September, 2010 +:Author: Tejun Heo +:Author: Florian Mickler -1. Introduction +Introduction +============ There are many cases where an asynchronous process execution context is needed and the workqueue (wq) API is the most commonly used @@ -32,7 +25,8 @@ there is no work item left on the workqueue the worker becomes idle. When a new work item gets queued, the worker begins executing again. -2. Why cmwq? +Why cmwq? +========= In the original wq implementation, a multi threaded (MT) wq had one worker thread per CPU and a single threaded (ST) wq had one worker @@ -71,7 +65,8 @@ focus on the following goals. the API users don't need to worry about such details. -3. The Design +The Design +========== In order to ease the asynchronous execution of functions a new abstraction, the work item, is introduced. @@ -102,7 +97,7 @@ aspects of the way the work items are executed by setting flags on the workqueue they are putting the work item on. These flags include things like CPU locality, concurrency limits, priority and more. To get a detailed overview refer to the API description of -alloc_workqueue() below. +``alloc_workqueue()`` below. When a work item is queued to a workqueue, the target worker-pool is determined according to the queue parameters and workqueue attributes @@ -136,7 +131,7 @@ them. For unbound workqueues, the number of backing pools is dynamic. Unbound workqueue can be assigned custom attributes using -apply_workqueue_attrs() and workqueue will automatically create +``apply_workqueue_attrs()`` and workqueue will automatically create backing worker pools matching the attributes. The responsibility of regulating concurrency level is on the users. There is also a flag to mark a bound wq to ignore the concurrency management. Please refer to @@ -151,94 +146,95 @@ pressure. Else it is possible that the worker-pool deadlocks waiting for execution contexts to free up. -4. Application Programming Interface (API) +Application Programming Interface (API) +======================================= -alloc_workqueue() allocates a wq. The original create_*workqueue() -functions are deprecated and scheduled for removal. alloc_workqueue() -takes three arguments - @name, @flags and @max_active. @name is the -name of the wq and also used as the name of the rescuer thread if -there is one. +``alloc_workqueue()`` allocates a wq. The original +``create_*workqueue()`` functions are deprecated and scheduled for +removal. ``alloc_workqueue()`` takes three arguments - @``name``, +``@flags`` and ``@max_active``. ``@name`` is the name of the wq and +also used as the name of the rescuer thread if there is one. A wq no longer manages execution resources but serves as a domain for -forward progress guarantee, flush and work item attributes. @flags -and @max_active control how work items are assigned execution +forward progress guarantee, flush and work item attributes. ``@flags`` +and ``@max_active`` control how work items are assigned execution resources, scheduled and executed. -@flags: - - WQ_UNBOUND - - Work items queued to an unbound wq are served by the special - worker-pools which host workers which are not bound to any - specific CPU. This makes the wq behave as a simple execution - context provider without concurrency management. The unbound - worker-pools try to start execution of work items as soon as - possible. Unbound wq sacrifices locality but is useful for - the following cases. - - * Wide fluctuation in the concurrency level requirement is - expected and using bound wq may end up creating large number - of mostly unused workers across different CPUs as the issuer - hops through different CPUs. - - * Long running CPU intensive workloads which can be better - managed by the system scheduler. - - WQ_FREEZABLE - - A freezable wq participates in the freeze phase of the system - suspend operations. Work items on the wq are drained and no - new work item starts execution until thawed. - - WQ_MEM_RECLAIM - - All wq which might be used in the memory reclaim paths _MUST_ - have this flag set. The wq is guaranteed to have at least one - execution context regardless of memory pressure. - - WQ_HIGHPRI - Work items of a highpri wq are queued to the highpri - worker-pool of the target cpu. Highpri worker-pools are - served by worker threads with elevated nice level. - - Note that normal and highpri worker-pools don't interact with - each other. Each maintain its separate pool of workers and - implements concurrency management among its workers. - - WQ_CPU_INTENSIVE - - Work items of a CPU intensive wq do not contribute to the - concurrency level. In other words, runnable CPU intensive - work items will not prevent other work items in the same - worker-pool from starting execution. This is useful for bound - work items which are expected to hog CPU cycles so that their - execution is regulated by the system scheduler. - - Although CPU intensive work items don't contribute to the - concurrency level, start of their executions is still - regulated by the concurrency management and runnable - non-CPU-intensive work items can delay execution of CPU - intensive work items. - - This flag is meaningless for unbound wq. - -Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues -are now non-reentrant - any work item is guaranteed to be executed by -at most one worker system-wide at any given time. - -@max_active: - -@max_active determines the maximum number of execution contexts per -CPU which can be assigned to the work items of a wq. For example, -with @max_active of 16, at most 16 work items of the wq can be +``flags`` +--------- + +``WQ_UNBOUND`` + Work items queued to an unbound wq are served by the special + worker-pools which host workers which are not bound to any + specific CPU. This makes the wq behave as a simple execution + context provider without concurrency management. The unbound + worker-pools try to start execution of work items as soon as + possible. Unbound wq sacrifices locality but is useful for + the following cases. + + * Wide fluctuation in the concurrency level requirement is + expected and using bound wq may end up creating large number + of mostly unused workers across different CPUs as the issuer + hops through different CPUs. + + * Long running CPU intensive workloads which can be better + managed by the system scheduler. + +``WQ_FREEZABLE`` + A freezable wq participates in the freeze phase of the system + suspend operations. Work items on the wq are drained and no + new work item starts execution until thawed. + +``WQ_MEM_RECLAIM`` + All wq which might be used in the memory reclaim paths **MUST** + have this flag set. The wq is guaranteed to have at least one + execution context regardless of memory pressure. + +``WQ_HIGHPRI`` + Work items of a highpri wq are queued to the highpri + worker-pool of the target cpu. Highpri worker-pools are + served by worker threads with elevated nice level. + + Note that normal and highpri worker-pools don't interact with + each other. Each maintain its separate pool of workers and + implements concurrency management among its workers. + +``WQ_CPU_INTENSIVE`` + Work items of a CPU intensive wq do not contribute to the + concurrency level. In other words, runnable CPU intensive + work items will not prevent other work items in the same + worker-pool from starting execution. This is useful for bound + work items which are expected to hog CPU cycles so that their + execution is regulated by the system scheduler. + + Although CPU intensive work items don't contribute to the + concurrency level, start of their executions is still + regulated by the concurrency management and runnable + non-CPU-intensive work items can delay execution of CPU + intensive work items. + + This flag is meaningless for unbound wq. + +Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all +workqueues are now non-reentrant - any work item is guaranteed to be +executed by at most one worker system-wide at any given time. + + +``max_active`` +-------------- + +``@max_active`` determines the maximum number of execution contexts +per CPU which can be assigned to the work items of a wq. For example, +with ``@max_active`` of 16, at most 16 work items of the wq can be executing at the same time per CPU. -Currently, for a bound wq, the maximum limit for @max_active is 512 -and the default value used when 0 is specified is 256. For an unbound -wq, the limit is higher of 512 and 4 * num_possible_cpus(). These -values are chosen sufficiently high such that they are not the -limiting factor while providing protection in runaway cases. +Currently, for a bound wq, the maximum limit for ``@max_active`` is +512 and the default value used when 0 is specified is 256. For an +unbound wq, the limit is higher of 512 and 4 * +``num_possible_cpus()``. These values are chosen sufficiently high +such that they are not the limiting factor while providing protection +in runaway cases. The number of active work items of a wq is usually regulated by the users of the wq, more specifically, by how many work items the users @@ -247,13 +243,14 @@ throttling the number of active work items, specifying '0' is recommended. Some users depend on the strict execution ordering of ST wq. The -combination of @max_active of 1 and WQ_UNBOUND is used to achieve this -behavior. Work items on such wq are always queued to the unbound -worker-pools and only one work item can be active at any given time thus -achieving the same ordering property as ST wq. +combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` is used to +achieve this behavior. Work items on such wq are always queued to the +unbound worker-pools and only one work item can be active at any given +time thus achieving the same ordering property as ST wq. -5. Example Execution Scenarios +Example Execution Scenarios +=========================== The following example execution scenarios try to illustrate how cmwq behave under different configurations. @@ -265,7 +262,7 @@ behave under different configurations. Ignoring all other tasks, works and processing overhead, and assuming simple FIFO scheduling, the following is one highly simplified version -of possible sequences of events with the original wq. +of possible sequences of events with the original wq. :: TIME IN MSECS EVENT 0 w0 starts and burns CPU @@ -279,7 +276,7 @@ of possible sequences of events with the original wq. 40 w2 sleeps 50 w2 wakes up and finishes -And with cmwq with @max_active >= 3, +And with cmwq with ``@max_active`` >= 3, :: TIME IN MSECS EVENT 0 w0 starts and burns CPU @@ -293,7 +290,7 @@ And with cmwq with @max_active >= 3, 20 w1 wakes up and finishes 25 w2 wakes up and finishes -If @max_active == 2, +If ``@max_active`` == 2, :: TIME IN MSECS EVENT 0 w0 starts and burns CPU @@ -308,7 +305,7 @@ If @max_active == 2, 35 w2 wakes up and finishes Now, let's assume w1 and w2 are queued to a different wq q1 which has -WQ_CPU_INTENSIVE set, +``WQ_CPU_INTENSIVE`` set, :: TIME IN MSECS EVENT 0 w0 starts and burns CPU @@ -322,13 +319,15 @@ WQ_CPU_INTENSIVE set, 25 w2 wakes up and finishes -6. Guidelines +Guidelines +========== -* Do not forget to use WQ_MEM_RECLAIM if a wq may process work items - which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM - set has an execution context reserved for it. If there is - dependency among multiple work items used during memory reclaim, - they should be queued to separate wq each with WQ_MEM_RECLAIM. +* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work + items which are used during memory reclaim. Each wq with + ``WQ_MEM_RECLAIM`` set has an execution context reserved for it. If + there is dependency among multiple work items used during memory + reclaim, they should be queued to separate wq each with + ``WQ_MEM_RECLAIM``. * Unless strict ordering is required, there is no need to use ST wq. @@ -337,30 +336,31 @@ WQ_CPU_INTENSIVE set, well under the default limit. * A wq serves as a domain for forward progress guarantee - (WQ_MEM_RECLAIM, flush and work item attributes. Work items which - are not involved in memory reclaim and don't need to be flushed as a - part of a group of work items, and don't require any special - attribute, can use one of the system wq. There is no difference in - execution characteristics between using a dedicated wq and a system - wq. + (``WQ_MEM_RECLAIM``, flush and work item attributes. Work items + which are not involved in memory reclaim and don't need to be + flushed as a part of a group of work items, and don't require any + special attribute, can use one of the system wq. There is no + difference in execution characteristics between using a dedicated wq + and a system wq. * Unless work items are expected to consume a huge amount of CPU cycles, using a bound wq is usually beneficial due to the increased level of locality in wq operations and work item execution. -7. Debugging +Debugging +========= Because the work functions are executed by generic worker threads there are a few tricks needed to shed some light on misbehaving workqueue users. -Worker threads show up in the process list as: +Worker threads show up in the process list as: :: -root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1] -root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2] -root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0] -root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0] + root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1] + root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2] + root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0] + root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0] If kworkers are going crazy (using too much cpu), there are two types of possible problems: @@ -368,7 +368,7 @@ of possible problems: 1. Something being scheduled in rapid succession 2. A single work item that consumes lots of cpu cycles -The first one can be tracked using tracing: +The first one can be tracked using tracing: :: $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt @@ -380,9 +380,15 @@ the output and the offender can be determined with the work item function. For the second type of problems it should be possible to just check -the stack trace of the offending worker thread. +the stack trace of the offending worker thread. :: $ cat /proc/THE_OFFENDING_KWORKER/stack The work item's function should be trivially visible in the stack trace. + + +Kernel Inline Documentations Reference +====================================== + +.. kernel-doc:: include/linux/workqueue.h diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt index 8d9773f23550..3c355f6ad834 100644 --- a/Documentation/cpu-freq/cpufreq-stats.txt +++ b/Documentation/cpu-freq/cpufreq-stats.txt @@ -44,11 +44,17 @@ the stats driver insertion. total 0 drwxr-xr-x 2 root root 0 May 14 16:06 . drwxr-xr-x 3 root root 0 May 14 15:58 .. +--w------- 1 root root 4096 May 14 16:06 reset -r--r--r-- 1 root root 4096 May 14 16:06 time_in_state -r--r--r-- 1 root root 4096 May 14 16:06 total_trans -r--r--r-- 1 root root 4096 May 14 16:06 trans_table -------------------------------------------------------------------------------- +- reset +Write-only attribute that can be used to reset the stat counters. This can be +useful for evaluating system behaviour under different governors without the +need for a reboot. + - time_in_state This gives the amount of time spent in each of the frequencies supported by this CPU. The cat output will have "