###
# The build process is as follows (targets):
-# (xmldocs)
-# file.tmpl --> file.xml +--> file.ps (psdocs)
-# +--> file.pdf (pdfdocs)
-# +--> DIR=file (htmldocs)
-# +--> man/ (mandocs)
+# (xmldocs) [by docproc]
+# file.tmpl --> file.xml +--> file.ps (psdocs) [by db2ps or xmlto]
+# +--> file.pdf (pdfdocs) [by db2pdf or xmlto]
+# +--> DIR=file (htmldocs) [by xmlto]
+# +--> man/ (mandocs) [by xmlto]
# for PDF and PS output you can choose between xmlto and docbook-utils tools
!Earch/i386/lib/usercopy.c
</sect1>
<sect1><title>More Memory Management Functions</title>
-!Iinclude/linux/rmap.h
!Emm/readahead.c
!Emm/filemap.c
!Emm/memory.c
struct cn_msg *m;
char data[32];
- m = kmalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC);
+ m = kzalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC);
if (m) {
- memset(m, 0, sizeof(*m) + sizeof(data));
memcpy(&m->id, &cn_test_id, sizeof(m->id));
m->seq = cn_test_timer_counter;
If sysfs is enabled, the contents of /sys/class/vtconsole can be
examined. This shows the console backends currently registered by the
-system which are named vtcon<n> where <n> is an integer fro 0 to 15. Thus:
+system which are named vtcon<n> where <n> is an integer from 0 to 15. Thus:
ls /sys/class/vtconsole
. .. vtcon0 vtcon1
EDAC - Error Detection And Correction
-Written by Doug Thompson <norsk5@xmission.com>
+Written by Doug Thompson <dougthompson@xmission.com>
7 Dec 2005
+17 Jul 2007 Updated
-EDAC was written by:
- Thayne Harbaugh,
- modified by Dave Peterson, Doug Thompson, et al,
- from the bluesmoke.sourceforge.net project.
+EDAC is maintained and written by:
+ Doug Thompson, Dave Jiang, Dave Peterson et al,
+ original author: Thayne Harbaugh,
+
+Contact:
+ website: bluesmoke.sourceforge.net
+ mailing list: bluesmoke-devel@lists.sourceforge.net
+
+"bluesmoke" was the name for this device driver when it was "out-of-tree"
+and maintained at sourceforge.net. When it was pushed into 2.6.16 for the
+first time, it was renamed to 'EDAC'.
+
+The bluesmoke project at sourceforge.net is now utilized as a 'staging area'
+for EDAC development, before it is sent upstream to kernel.org
+
+At the bluesmoke/EDAC project site, is a series of quilt patches against
+recent kernels, stored in a SVN respository. For easier downloading, there
+is also a tarball snapshot available.
============================================================================
EDAC PURPOSE
The 'edac' kernel module goal is to detect and report errors that occur
-within the computer system. In the initial release, memory Correctable Errors
-(CE) and Uncorrectable Errors (UE) are the primary errors being harvested.
+within the computer system running under linux.
+
+MEMORY
+
+In the initial release, memory Correctable Errors (CE) and Uncorrectable
+Errors (UE) are the primary errors being harvested. These types of errors
+are harvested by the 'edac_mc' class of device.
Detecting CE events, then harvesting those events and reporting them,
CAN be a predictor of future UE events. With CE events, the system can
proactive part replacement of memory DIMMs exhibiting CEs can reduce
the likelihood of the dreaded UE events and system 'panics'.
+NON-MEMORY
+
+A new feature for EDAC, the edac_device class of device, was added in
+the 2.6.23 version of the kernel.
+
+This new device type allows for non-memory type of ECC hardware detectors
+to have their states harvested and presented to userspace via the sysfs
+interface.
+
+Some architectures have ECC detectors for L1, L2 and L3 caches, along with DMA
+engines, fabric switches, main data path switches, interconnections,
+and various other hardware data paths. If the hardware reports it, then
+a edac_device device probably can be constructed to harvest and present
+that to userspace.
+
+
+PCI BUS SCANNING
In addition, PCI Bus Parity and SERR Errors are scanned for on PCI devices
in order to determine if errors are occurring on data transfers.
+
The presence of PCI Parity errors must be examined with a grain of salt.
There are several add-in adapters that do NOT follow the PCI specification
with regards to Parity generation and reporting. The specification says
to generate parity. Some vendors do not do this, and thus the parity bit
can "float" giving false positives.
-[There are patches in the kernel queue which will allow for storage of
-quirks of PCI devices reporting false parity positives. The 2.6.18
-kernel should have those patches included. When that becomes available,
-then EDAC will be patched to utilize that information to "skip" such
-devices.]
+In the kernel there is a pci device attribute located in sysfs that is
+checked by the EDAC PCI scanning code. If that attribute is set,
+PCI parity/error scannining is skipped for that device. The attribute
+is:
+
+ broken_parity_status
+
+as is located in /sys/devices/pci<XXX>/0000:XX:YY.Z directorys for
+PCI devices.
+
+FUTURE HARDWARE SCANNING
EDAC will have future error detectors that will be integrated with
EDAC or added to it, in the following list:
============================================================================
EDAC VERSIONING
-EDAC is composed of a "core" module (edac_mc.ko) and several Memory
+EDAC is composed of a "core" module (edac_core.ko) and several Memory
Controller (MC) driver modules. On a given system, the CORE
is loaded and one MC driver will be loaded. Both the CORE and
-the MC driver have individual versions that reflect current release
-level of their respective modules. Thus, to "report" on what version
-a system is running, one must report both the CORE's and the
-MC driver's versions.
+the MC driver (or edac_device driver) have individual versions that reflect
+current release level of their respective modules.
+
+Thus, to "report" on what version a system is running, one must report both
+the CORE's and the MC driver's versions.
LOADING
EDAC presents a 'sysfs' interface for control, reporting and attribute
reporting purposes.
-EDAC lives in the /sys/devices/system/edac directory. Within this directory
-there currently reside 2 'edac' components:
+EDAC lives in the /sys/devices/system/edac directory.
+
+Within this directory there currently reside 2 'edac' components:
mc memory controller(s) system
pci PCI control and status system
Panic on UE control file:
- 'panic_on_ue'
+ 'edac_mc_panic_on_ue'
An uncorrectable error will cause a machine panic. This is usually
desirable. It is a bad idea to continue when an uncorrectable error
LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]
- RUN TIME: echo "1" >/sys/devices/system/edac/mc/panic_on_ue
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_panic_on_ue
Log UE control file:
- 'log_ue'
+ 'edac_mc_log_ue'
Generate kernel messages describing uncorrectable errors. These errors
are reported through the system message log system. UE statistics
LOAD TIME: module/kernel parameter: log_ue=[0|1]
- RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ue
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ue
Log CE control file:
- 'log_ce'
+ 'edac_mc_log_ce'
Generate kernel messages describing correctable errors. These
errors are reported through the system message log system.
LOAD TIME: module/kernel parameter: log_ce=[0|1]
- RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ce
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ce
Polling period control file:
- 'poll_msec'
+ 'edac_mc_poll_msec'
The time period, in milliseconds, for polling for error information.
Too small a value wastes resources. Too large a value might delay
LOAD TIME: module/kernel parameter: poll_msec=[0|1]
- RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec
+ RUN TIME: echo "1000" >/sys/devices/system/edac/mc/edac_mc_poll_msec
============================================================================
=======================================================================
+
+
+EDAC_DEVICE type of device
+
+In the header file, edac_core.h, there is a series of edac_device structures
+and APIs for the EDAC_DEVICE.
+
+User space access to an edac_device is through the sysfs interface.
+
+At the location /sys/devices/system/edac (sysfs) new edac_device devices will
+appear.
+
+There is a three level tree beneath the above 'edac' directory. For example,
+the 'test_device_edac' device (found at the bluesmoke.sourceforget.net website)
+installs itself as:
+
+ /sys/devices/systm/edac/test-instance
+
+in this directory are various controls, a symlink and one or more 'instance'
+directorys.
+
+The standard default controls are:
+
+ log_ce boolean to log CE events
+ log_ue boolean to log UE events
+ panic_on_ue boolean to 'panic' the system if an UE is encountered
+ (default off, can be set true via startup script)
+ poll_msec time period between POLL cycles for events
+
+The test_device_edac device adds at least one of its own custom control:
+
+ test_bits which in the current test driver does nothing but
+ show how it is installed. A ported driver can
+ add one or more such controls and/or attributes
+ for specific uses.
+ One out-of-tree driver uses controls here to allow
+ for ERROR INJECTION operations to hardware
+ injection registers
+
+The symlink points to the 'struct dev' that is registered for this edac_device.
+
+INSTANCES
+
+One or more instance directories are present. For the 'test_device_edac' case:
+
+ test-instance0
+
+
+In this directory there are two default counter attributes, which are totals of
+counter in deeper subdirectories.
+
+ ce_count total of CE events of subdirectories
+ ue_count total of UE events of subdirectories
+
+BLOCKS
+
+At the lowest directory level is the 'block' directory. There can be 0, 1
+or more blocks specified in each instance.
+
+ test-block0
+
+
+In this directory the default attributes are:
+
+ ce_count which is counter of CE events for this 'block'
+ of hardware being monitored
+ ue_count which is counter of UE events for this 'block'
+ of hardware being monitored
+
+
+The 'test_device_edac' device adds 4 attributes and 1 control:
+
+ test-block-bits-0 for every POLL cycle this counter
+ is incremented
+ test-block-bits-1 every 10 cycles, this counter is bumped once,
+ and test-block-bits-0 is set to 0
+ test-block-bits-2 every 100 cycles, this counter is bumped once,
+ and test-block-bits-1 is set to 0
+ test-block-bits-3 every 1000 cycles, this counter is bumped once,
+ and test-block-bits-2 is set to 0
+
+
+ reset-counters writing ANY thing to this control will
+ reset all the above counters.
+
+
+Use of the 'test_device_edac' driver should any others to create their own
+unique drivers for their hardware systems.
+
+The 'test_device_edac' sample driver is located at the
+bluesmoke.sourceforge.net project site for EDAC.
+
What: Video4Linux API 1 ioctls and video_decoder.h from Video devices.
When: December 2006
Files: include/linux/video_decoder.h
+Check: include/linux/video_decoder.h
Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6
series. The old API have lots of drawbacks and don't provide enough
means to work with all video and audio standards. The newer API is
What: remove EXPORT_SYMBOL(kernel_thread)
When: August 2006
Files: arch/*/kernel/*_ksyms.c
-Funcs: kernel_thread
+Check: kernel_thread
Why: kernel_thread is a low-level implementation detail. Drivers should
use the <linux/kthread.h> API instead which shields them from
implementation details and provides a higherlevel interface that
---------------------------
+What: vm_ops.nopage
+When: Soon, provided in-kernel callers have been converted
+Why: This interface is replaced by vm_ops.fault, but it has been around
+ forever, is used by a lot of drivers, and doesn't cost much to
+ maintain.
+Who: Nick Piggin <npiggin@suse.de>
+
+---------------------------
+
What: Interrupt only SA_* flags
When: September 2007
Why: The interrupt related SA_* flags are replaced by IRQF_* to move them
---------------------------
-What: i2c-isa
-When: December 2006
-Why: i2c-isa is a non-sense and doesn't fit in the device driver
- model. Drivers relying on it are better implemented as platform
- drivers.
-Who: Jean Delvare <khali@linux-fr.org>
-
----------------------------
-
What: i2c_adapter.list
When: July 2007
Why: Superfluous, this list duplicates the one maintained by the driver
prototypes:
void (*open)(struct vm_area_struct*);
void (*close)(struct vm_area_struct*);
+ int (*fault)(struct vm_area_struct*, struct vm_fault *);
struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
+ int (*page_mkwrite)(struct vm_area_struct *, struct page *);
locking rules:
- BKL mmap_sem
+ BKL mmap_sem PageLocked(page)
open: no yes
close: no yes
+fault: no yes
nopage: no yes
+page_mkwrite: no yes no
+
+ ->page_mkwrite() is called when a previously read-only page is
+about to become writeable. The file system is responsible for
+protecting against truncate races. Once appropriate action has been
+taking to lock out truncate, the page range should be verified to be
+within i_size. The page mapping should also be checked that it is not
+NULL.
================================================================================
Dubious stuff
{
struct simple_child *simple_child;
- simple_child = kmalloc(sizeof(struct simple_child), GFP_KERNEL);
+ simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
if (!simple_child)
return NULL;
- memset(simple_child, 0, sizeof(struct simple_child));
config_item_init_type_name(&simple_child->item, name,
&simple_child_type);
{
struct simple_children *simple_children;
- simple_children = kmalloc(sizeof(struct simple_children),
+ simple_children = kzalloc(sizeof(struct simple_children),
GFP_KERNEL);
if (!simple_children)
return NULL;
- memset(simple_children, 0, sizeof(struct simple_children));
config_group_init_type_name(&simple_children->group, name,
&simple_children_type);
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
2.13 /proc/<pid>/oom_score - Display current oom-killer score
2.14 /proc/<pid>/io - Display the IO accounting fields
+ 2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
------------------------------------------------------------------------------
Preface
resume it if we have a value of 3 or more percent; consider information about
the amount of free space valid for 30 seconds
+audit_argv_kb
+-------------
+
+The file contains a single value denoting the limit on the argv array size
+for execve (in KiB). This limit is only applied when system call auditing for
+execve is enabled, otherwise the value is ignored.
+
ctrl-alt-del
------------
More information about this can be found within the taskstats documentation in
Documentation/accounting.
+2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
+---------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory. Conversely,
+sometimes we want to save file-backed memory segments into a core file, not
+only the individual files.
+
+/proc/<pid>/coredump_filter allows you to customize which memory segments
+will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
+of memory types. If a bit of the bitmask is set, memory segments of the
+corresponding memory type are dumped, otherwise they are not dumped.
+
+The following 4 memory types are supported:
+ - (bit 0) anonymous private memory
+ - (bit 1) anonymous shared memory
+ - (bit 2) file-backed private memory
+ - (bit 3) file-backed shared memory
+
+ Note that MMIO pages such as frame buffer are never dumped and vDSO pages
+ are always dumped regardless of the bitmask status.
+
+Default value of coredump_filter is 0x3; this means all anonymous memory
+segments are dumped.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 1 to the process's proc file.
+
+ $ echo 0x1 > /proc/1234/coredump_filter
+
+When a new process is created, the process inherits the bitmask status from its
+parent. It is useful to set up coredump_filter before the program runs.
+For example:
+
+ $ echo 0x7 > /proc/self/coredump_filter
+ $ ./some_program
+
------------------------------------------------------------------------------
If you stick to this convention then it'll be easier for other developers to
see what your code is doing, and help maintain it.
+Note that these operations include I/O barriers on platforms which need to
+use them; drivers don't need to add them explicitly.
+
Identifying GPIOs
-----------------
=======================
Supported chips:
- * Abit uGuru revision 1-3 (Hardware Monitor part only)
+ * Abit uGuru revision 1 & 2 (Hardware Monitor part only)
Prefix: 'abituguru'
Addresses scanned: ISA 0x0E0
Datasheet: Not available, this driver is based on reverse engineering.
uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8)
uGuru 2.2.0.0 ~ 2.2.0.6 (AA8 Fatal1ty)
uGuru 2.3.0.0 ~ 2.3.0.9 (AN8)
- uGuru 3.0.0.0 ~ 3.0.1.2 (AW8, AL8, NI8)
- uGuru 4.xxxxx? (AT8 32X) (2)
+ uGuru 3.0.0.0 ~ 3.0.x.x (AW8, AL8, AT8, NI8 SLI, AT8 32X, AN8 32X,
+ AW9D-MAX) (2)
1) For revisions 2 and 3 uGuru's the driver can autodetect the
sensortype (Volt or Temp) for bank1 sensors, for revision 1 uGuru's
this doesnot always work. For these uGuru's the autodection can
bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1
You may also need to specify the fan_sensors option for these boards
fan_sensors=5
- 2) The current version of the abituguru driver is known to NOT work
- on these Motherboards
+ 2) There is a seperate abituguru3 driver for these motherboards,
+ the abituguru (without the 3 !) driver will not work on these
+ motherboards (and visa versa)!
Authors:
Hans de Goede <j.w.r.degoede@hhs.nl>,
-----------------
* force: bool Force detection. Note this parameter only causes the
- detection to be skipped, if the uGuru can't be read
- the module initialization (insmod) will still fail.
+ detection to be skipped, and thus the insmod to
+ succeed. If the uGuru can't be read the actual hwmon
+ driver will not load and thus no hwmon device will get
+ registered.
* bank1_types: int[] Bank1 sensortype autodetection override:
-1 autodetect (default)
0 volt sensor
Description
-----------
-This driver supports the hardware monitoring features of the Abit uGuru chip
-found on Abit uGuru featuring motherboards (most modern Abit motherboards).
+This driver supports the hardware monitoring features of the first and
+second revision of the Abit uGuru chip found on Abit uGuru featuring
+motherboards (most modern Abit motherboards).
-The uGuru chip in reality is a Winbond W83L950D in disguise (despite Abit
-claiming it is "a new microprocessor designed by the ABIT Engineers").
-Unfortunatly this doesn't help since the W83L950D is a generic
-microcontroller with a custom Abit application running on it.
+The first and second revision of the uGuru chip in reality is a Winbond
+W83L950D in disguise (despite Abit claiming it is "a new microprocessor
+designed by the ABIT Engineers"). Unfortunatly this doesn't help since the
+W83L950D is a generic microcontroller with a custom Abit application running
+on it.
Despite Abit not releasing any information regarding the uGuru, Olle
Sandberg <ollebull@gmail.com> has managed to reverse engineer the sensor part
--- /dev/null
+Kernel driver abituguru3
+========================
+
+Supported chips:
+ * Abit uGuru revision 3 (Hardware Monitor part, reading only)
+ Prefix: 'abituguru3'
+ Addresses scanned: ISA 0x0E0
+ Datasheet: Not available, this driver is based on reverse engineering.
+ Note:
+ The uGuru is a microcontroller with onboard firmware which programs
+ it to behave as a hwmon IC. There are many different revisions of the
+ firmware and thus effectivly many different revisions of the uGuru.
+ Below is an incomplete list with which revisions are used for which
+ Motherboards:
+ uGuru 1.00 ~ 1.24 (AI7, KV8-MAX3, AN7)
+ uGuru 2.0.0.0 ~ 2.0.4.2 (KV8-PRO)
+ uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8)
+ uGuru 2.3.0.0 ~ 2.3.0.9 (AN8)
+ uGuru 3.0.0.0 ~ 3.0.x.x (AW8, AL8, AT8, NI8 SLI, AT8 32X, AN8 32X,
+ AW9D-MAX)
+ The abituguru3 driver is only for revison 3.0.x.x motherboards,
+ this driver will not work on older motherboards. For older
+ motherboards use the abituguru (without the 3 !) driver.
+
+Authors:
+ Hans de Goede <j.w.r.degoede@hhs.nl>,
+ (Initial reverse engineering done by Louis Kruger)
+
+
+Module Parameters
+-----------------
+
+* force: bool Force detection. Note this parameter only causes the
+ detection to be skipped, and thus the insmod to
+ succeed. If the uGuru can't be read the actual hwmon
+ driver will not load and thus no hwmon device will get
+ registered.
+* verbose: bool Should the driver be verbose?
+ 0/off/false normal output
+ 1/on/true + verbose error reporting (default)
+ Default: 1 (the driver is still in the testing phase)
+
+Description
+-----------
+
+This driver supports the hardware monitoring features of the third revision of
+the Abit uGuru chip, found on recent Abit uGuru featuring motherboards.
+
+The 3rd revision of the uGuru chip in reality is a Winbond W83L951G.
+Unfortunatly this doesn't help since the W83L951G is a generic microcontroller
+with a custom Abit application running on it.
+
+Despite Abit not releasing any information regarding the uGuru revision 3,
+Louis Kruger has managed to reverse engineer the sensor part of the uGuru.
+Without his work this driver would not have been possible.
+
+Known Issues
+------------
+
+The voltage and frequency control parts of the Abit uGuru are not supported,
+neither is writing any of the sensor settings and writing / reading the
+fanspeed control registers (FanEQ)
+
+If you encounter any problems please mail me <j.w.r.degoede@hhs.nl> and
+include the output of: "dmesg | grep abituguru"
--- /dev/null
+Kernel driver dme1737
+=====================
+
+Supported chips:
+ * SMSC DME1737 and compatibles (like Asus A8000)
+ Prefix: 'dme1737'
+ Addresses scanned: I2C 0x2c, 0x2d, 0x2e
+ Datasheet: Provided by SMSC upon request and under NDA
+
+Authors:
+ Juerg Haefliger <juergh@gmail.com>
+
+
+Module Parameters
+-----------------
+
+* force_start: bool Enables the monitoring of voltage, fan and temp inputs
+ and PWM output control functions. Using this parameter
+ shouldn't be required since the BIOS usually takes care
+ of this.
+
+Note that there is no need to use this parameter if the driver loads without
+complaining. The driver will say so if it is necessary.
+
+
+Description
+-----------
+
+This driver implements support for the hardware monitoring capabilities of the
+SMSC DME1737 and Asus A8000 (which are the same) Super-I/O chips. This chip
+features monitoring of 3 temp sensors temp[1-3] (2 remote diodes and 1
+internal), 7 voltages in[0-6] (6 external and 1 internal) and 6 fan speeds
+fan[1-6]. Additionally, the chip implements 5 PWM outputs pwm[1-3,5-6] for
+controlling fan speeds both manually and automatically.
+
+Fan[3-6] and pwm[3,5-6] are optional features and their availability is
+dependent on the configuration of the chip. The driver will detect which
+features are present during initialization and create the sysfs attributes
+accordingly.
+
+
+Voltage Monitoring
+------------------
+
+The voltage inputs are sampled with 12-bit resolution and have internal
+scaling resistors. The values returned by the driver therefore reflect true
+millivolts and don't need scaling. The voltage inputs are mapped as follows
+(the last column indicates the input ranges):
+
+ in0: +5VTR (+5V standby) 0V - 6.64V
+ in1: Vccp (processor core) 0V - 3V
+ in2: VCC (internal +3.3V) 0V - 4.38V
+ in3: +5V 0V - 6.64V
+ in4: +12V 0V - 16V
+ in5: VTR (+3.3V standby) 0V - 4.38V
+ in6: Vbat (+3.0V) 0V - 4.38V
+
+Each voltage input has associated min and max limits which trigger an alarm
+when crossed.
+
+
+Temperature Monitoring
+----------------------
+
+Temperatures are measured with 12-bit resolution and reported in millidegree
+Celsius. The chip also features offsets for all 3 temperature inputs which -
+when programmed - get added to the input readings. The chip does all the
+scaling by itself and the driver therefore reports true temperatures that don't
+need any user-space adjustments. The temperature inputs are mapped as follows
+(the last column indicates the input ranges):
+
+ temp1: Remote diode 1 (3904 type) temperature -127C - +127C
+ temp2: DME1737 internal temperature -127C - +127C
+ temp3: Remote diode 2 (3904 type) temperature -127C - +127C
+
+Each temperature input has associated min and max limits which trigger an alarm
+when crossed. Additionally, each temperature input has a fault attribute that
+returns 1 when a faulty diode or an unconnected input is detected and 0
+otherwise.
+
+
+Fan Monitoring
+--------------
+
+Fan RPMs are measured with 16-bit resolution. The chip provides inputs for 6
+fan tachometers. All 6 inputs have an associated min limit which triggers an
+alarm when crossed. Fan inputs 1-4 provide type attributes that need to be set
+to the number of pulses per fan revolution that the connected tachometer
+generates. Supported values are 1, 2, and 4. Fan inputs 5-6 only support fans
+that generate 2 pulses per revolution. Fan inputs 5-6 also provide a max
+attribute that needs to be set to the maximum attainable RPM (fan at 100% duty-
+cycle) of the input. The chip adjusts the sampling rate based on this value.
+
+
+PWM Output Control
+------------------
+
+This chip features 5 PWM outputs. PWM outputs 1-3 are associated with fan
+inputs 1-3 and PWM outputs 5-6 are associated with fan inputs 5-6. PWM outputs
+1-3 can be configured to operate either in manual or automatic mode by setting
+the appropriate enable attribute accordingly. PWM outputs 5-6 can only operate
+in manual mode, their enable attributes are therefore read-only. When set to
+manual mode, the fan speed is set by writing the duty-cycle value to the
+appropriate PWM attribute. In automatic mode, the PWM attribute returns the
+current duty-cycle as set by the fan controller in the chip. All PWM outputs
+support the setting of the output frequency via the freq attribute.
+
+In automatic mode, the chip supports the setting of the PWM ramp rate which
+defines how fast the PWM output is adjusting to changes of the associated
+temperature input. Associating PWM outputs to temperature inputs is done via
+temperature zones. The chip features 3 zones whose assignments to temperature
+inputs is static and determined during initialization. These assignments can
+be retrieved via the zone[1-3]_auto_channels_temp attributes. Each PWM output
+is assigned to one (or hottest of multiple) temperature zone(s) through the
+pwm[1-3]_auto_channels_zone attributes. Each PWM output has 3 distinct output
+duty-cycles: full, low, and min. Full is internally hard-wired to 255 (100%)
+and low and min can be programmed via pwm[1-3]_auto_point1_pwm and
+pwm[1-3]_auto_pwm_min, respectively. The thermal thresholds of the zones are
+programmed via zone[1-3]_auto_point[1-3]_temp and
+zone[1-3]_auto_point1_temp_hyst:
+
+ pwm[1-3]_auto_point2_pwm full-speed duty-cycle (255, i.e., 100%)
+ pwm[1-3]_auto_point1_pwm low-speed duty-cycle
+ pwm[1-3]_auto_pwm_min min-speed duty-cycle
+
+ zone[1-3]_auto_point3_temp full-speed temp (all outputs)
+ zone[1-3]_auto_point2_temp full-speed temp
+ zone[1-3]_auto_point1_temp low-speed temp
+ zone[1-3]_auto_point1_temp_hyst min-speed temp
+
+The chip adjusts the output duty-cycle linearly in the range of auto_point1_pwm
+to auto_point2_pwm if the temperature of the associated zone is between
+auto_point1_temp and auto_point2_temp. If the temperature drops below the
+auto_point1_temp_hyst value, the output duty-cycle is set to the auto_pwm_min
+value which only supports two values: 0 or auto_point1_pwm. That means that the
+fan either turns completely off or keeps spinning with the low-speed
+duty-cycle. If any of the temperatures rise above the auto_point3_temp value,
+all PWM outputs are set to 100% duty-cycle.
+
+Following is another representation of how the chip sets the output duty-cycle
+based on the temperature of the associated thermal zone:
+
+ Duty-Cycle Duty-Cycle
+ Temperature Rising Temp Falling Temp
+ ----------- ----------- ------------
+ full-speed full-speed full-speed
+
+ < linearly adjusted duty-cycle >
+
+ low-speed low-speed low-speed
+ min-speed low-speed
+ min-speed min-speed min-speed
+ min-speed min-speed
+
+
+Sysfs Attributes
+----------------
+
+Following is a list of all sysfs attributes that the driver provides, their
+permissions and a short description:
+
+Name Perm Description
+---- ---- -----------
+cpu0_vid RO CPU core reference voltage in
+ millivolts.
+vrm RW Voltage regulator module version
+ number.
+
+in[0-6]_input RO Measured voltage in millivolts.
+in[0-6]_min RW Low limit for voltage input.
+in[0-6]_max RW High limit for voltage input.
+in[0-6]_alarm RO Voltage input alarm. Returns 1 if
+ voltage input is or went outside the
+ associated min-max range, 0 otherwise.
+
+temp[1-3]_input RO Measured temperature in millidegree
+ Celsius.
+temp[1-3]_min RW Low limit for temp input.
+temp[1-3]_max RW High limit for temp input.
+temp[1-3]_offset RW Offset for temp input. This value will
+ be added by the chip to the measured
+ temperature.
+temp[1-3]_alarm RO Alarm for temp input. Returns 1 if temp
+ input is or went outside the associated
+ min-max range, 0 otherwise.
+temp[1-3]_fault RO Temp input fault. Returns 1 if the chip
+ detects a faulty thermal diode or an
+ unconnected temp input, 0 otherwise.
+
+zone[1-3]_auto_channels_temp RO Temperature zone to temperature input
+ mapping. This attribute is a bitfield
+ and supports the following values:
+ 1: temp1
+ 2: temp2
+ 4: temp3
+zone[1-3]_auto_point1_temp_hyst RW Auto PWM temp point1 hysteresis. The
+ output of the corresponding PWM is set
+ to the pwm_auto_min value if the temp
+ falls below the auto_point1_temp_hyst
+ value.
+zone[1-3]_auto_point[1-3]_temp RW Auto PWM temp points. Auto_point1 is
+ the low-speed temp, auto_point2 is the
+ full-speed temp, and auto_point3 is the
+ temp at which all PWM outputs are set
+ to full-speed (100% duty-cycle).
+
+fan[1-6]_input RO Measured fan speed in RPM.
+fan[1-6]_min RW Low limit for fan input.
+fan[1-6]_alarm RO Alarm for fan input. Returns 1 if fan
+ input is or went below the associated
+ min value, 0 otherwise.
+fan[1-4]_type RW Type of attached fan. Expressed in
+ number of pulses per revolution that
+ the fan generates. Supported values are
+ 1, 2, and 4.
+fan[5-6]_max RW Max attainable RPM at 100% duty-cycle.
+ Required for chip to adjust the
+ sampling rate accordingly.
+
+pmw[1-3,5-6] RO/RW Duty-cycle of PWM output. Supported
+ values are 0-255 (0%-100%). Only
+ writeable if the associated PWM is in
+ manual mode.
+pwm[1-3]_enable RW Enable of PWM outputs 1-3. Supported
+ values are:
+ 0: turned off (output @ 100%)
+ 1: manual mode
+ 2: automatic mode
+pwm[5-6]_enable RO Enable of PWM outputs 5-6. Always
+ returns 1 since these 2 outputs are
+ hard-wired to manual mode.
+pmw[1-3,5-6]_freq RW Frequency of PWM output. Supported
+ values are in the range 11Hz-30000Hz
+ (default is 25000Hz).
+pmw[1-3]_ramp_rate RW Ramp rate of PWM output. Determines how
+ fast the PWM duty-cycle will change
+ when the PWM is in automatic mode.
+ Expressed in ms per PWM step. Supported
+ values are in the range 0ms-206ms
+ (default is 0, which means the duty-
+ cycle changes instantly).
+pwm[1-3]_auto_channels_zone RW PWM output to temperature zone mapping.
+ This attribute is a bitfield and
+ supports the following values:
+ 1: zone1
+ 2: zone2
+ 4: zone3
+ 6: highest of zone[2-3]
+ 7: highest of zone[1-3]
+pwm[1-3]_auto_pwm_min RW Auto PWM min pwm. Minimum PWM duty-
+ cycle. Supported values are 0 or
+ auto_point1_pwm.
+pwm[1-3]_auto_point1_pwm RW Auto PWM pwm point. Auto_point1 is the
+ low-speed duty-cycle.
+pwm[1-3]_auto_point2_pwm RO Auto PWM pwm point. Auto_point2 is the
+ full-speed duty-cycle which is hard-
+ wired to 255 (100% duty-cycle).
* Fintek F71805F/FG
Prefix: 'f71805f'
Addresses scanned: none, address read from Super I/O config space
- Datasheet: Provided by Fintek on request
+ Datasheet: Available from the Fintek website
* Fintek F71872F/FG
Prefix: 'f71872f'
Addresses scanned: none, address read from Super I/O config space
- Datasheet: Provided by Fintek on request
+ Datasheet: Available from the Fintek website
Author: Jean Delvare <khali@linux-fr.org>
When the PWM method is used, you can select the operating frequency,
from 187.5 kHz (default) to 31 Hz. The best frequency depends on the
fan model. As a rule of thumb, lower frequencies seem to give better
-control, but may generate annoying high-pitch noise. Fintek recommends
+control, but may generate annoying high-pitch noise. So a frequency just
+above the audible range, such as 25 kHz, may be a good choice; if this
+doesn't give you good linear control, try reducing it. Fintek recommends
not going below 1 kHz, as the fan tachometers get confused by lower
frequencies as well.
corresponds to a pwm value of 106 for the driver. The driver doesn't
enforce this limit though.
-Three different fan control modes are supported:
+Three different fan control modes are supported; the mode number is written
+to the pwm<n>_enable file.
-* Manual mode
- You ask for a specific PWM duty cycle or DC voltage.
+* 1: Manual mode
+ You ask for a specific PWM duty cycle or DC voltage by writing to the
+ pwm<n> file.
-* Fan speed mode
- You ask for a specific fan speed. This mode assumes that pwm1
- corresponds to fan1, pwm2 to fan2 and pwm3 to fan3.
+* 2: Temperature mode
+ You define 3 temperature/fan speed trip points using the
+ pwm<n>_auto_point<m>_temp and _fan files. These define a staircase
+ relationship between temperature and fan speed with two additional points
+ interpolated between the values that you define. When the temperature
+ is below auto_point1_temp the fan is switched off.
-* Temperature mode
- You define 3 temperature/fan speed trip points, and the fan speed is
- adjusted depending on the measured temperature, using interpolation.
- This mode is not yet supported by the driver.
+* 3: Fan speed mode
+ You ask for a specific fan speed by writing to the fan<n>_target file.
+
+Both of the automatic modes require that pwm1 corresponds to fan1, pwm2 to
+fan2 and pwm3 to fan3. Temperature mode also requires that temp1 corresponds
+to pwm1 and fan1, etc.
Addresses scanned: from Super I/O config space (8 I/O ports)
Datasheet: Publicly available at the ITE website
http://www.ite.com.tw/
- * IT8716F
+ * IT8716F/IT8726F
Prefix: 'it8716'
Addresses scanned: from Super I/O config space (8 I/O ports)
Datasheet: Publicly available at the ITE website
http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP
+ http://www.ite.com.tw/product_info/file/pc/IT8726F_V0.3.pdf
* IT8718F
Prefix: 'it8718'
Addresses scanned: from Super I/O config space (8 I/O ports)
-----------
This driver implements support for the IT8705F, IT8712F, IT8716F,
-IT8718F and SiS950 chips.
+IT8718F, IT8726F and SiS950 chips.
These chips are 'Super I/O chips', supporting floppy disks, infrared ports,
joysticks and other miscellaneous stuff. For hardware monitoring, they
revisions. For now, the driver only uses the 16-bit mode on the
IT8716F and IT8718F.
+The IT8726F is just bit enhanced IT8716F with additional hardware
+for AMD power sequencing. Therefore the chip will appear as IT8716F
+to userspace applications.
+
Temperatures are measured in degrees Celsius. An alarm is triggered once
when the Overtemperature Shutdown limit is crossed.
Addresses scanned: I2C 0x4c, 0x4d (unsupported 0x4e)
Datasheet: Publicly available at the Maxim website
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2578
+ * Maxim MAX6680
+ Prefix: 'max6680'
+ Addresses scanned: I2C 0x18, 0x19, 0x1a, 0x29, 0x2a, 0x2b,
+ 0x4c, 0x4d and 0x4e
+ Datasheet: Publicly available at the Maxim website
+ http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3370
+ * Maxim MAX6681
+ Prefix: 'max6680'
+ Addresses scanned: I2C 0x18, 0x19, 0x1a, 0x29, 0x2a, 0x2b,
+ 0x4c, 0x4d and 0x4e
+ Datasheet: Publicly available at the Maxim website
+ http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3370
Author: Jean Delvare <khali@linux-fr.org>
The LM90 is a digital temperature sensor. It senses its own temperature as
well as the temperature of up to one external diode. It is compatible
with many other devices such as the LM86, the LM89, the LM99, the ADM1032,
-the MAX6657, MAX6658 and the MAX6659 all of which are supported by this driver.
-Note that there is no easy way to differentiate between the last three
-variants. The extra address and features of the MAX6659 are not supported by
-this driver. Additionally, the ADT7461 is supported if found in ADM1032
-compatibility mode.
+the MAX6657, MAX6658, MAX6659, MAX6680 and the MAX6681 all of which are
+supported by this driver.
+
+Note that there is no easy way to differentiate between the MAX6657,
+MAX6658 and MAX6659 variants. The extra address and features of the
+MAX6659 are not supported by this driver. The MAX6680 and MAX6681 only
+differ in their pinout, therefore they obviously can't (and don't need to)
+be distinguished. Additionally, the ADT7461 is supported if found in
+ADM1032 compatibility mode.
The specificity of this family of chipsets over the ADM1021/LM84
family is that it features critical limits with hysteresis, and an
* ALERT is triggered by open remote sensor.
* SMBus PEC support for Write Byte and Receive Byte transactions.
-ADT7461
+ADT7461:
* Extended temperature range (breaks compatibility)
* Lower resolution for remote temperature
MAX6657 and MAX6658:
* Remote sensor type selection
-MAX6659
+MAX6659:
* Selectable address
* Second critical temperature limit
* Remote sensor type selection
+MAX6680 and MAX6681:
+ * Selectable address
+ * Remote sensor type selection
+
All temperature values are given in degrees Celsius. Resolution
is 1.0 degree for the local temperature, 0.125 degree for the remote
temperature.
Additionally, the ADM1032 doesn't support SMBus Send Byte with PEC.
Instead, it will try to write the PEC value to the register (because the
SMBus Send Byte transaction with PEC is similar to a Write Byte transaction
-without PEC), which is not what we want. Thus, PEC is explicitely disabled
+without PEC), which is not what we want. Thus, PEC is explicitly disabled
on SMBus Send Byte transactions in the lm90 driver.
PEC on byte data transactions represents a significant increase in bandwidth
--- /dev/null
+Kernel driver lm93
+==================
+
+Supported chips:
+ * National Semiconductor LM93
+ Prefix 'lm93'
+ Addresses scanned: I2C 0x2c-0x2e
+ Datasheet: http://www.national.com/ds.cgi/LM/LM93.pdf
+
+Author:
+ Mark M. Hoffman <mhoffman@lightlink.com>
+ Ported to 2.6 by Eric J. Bowersox <ericb@aspsys.com>
+ Adapted to 2.6.20 by Carsten Emde <ce@osadl.org>
+ Modified for mainline integration by Hans J. Koch <hjk@linutronix.de>
+
+Module Parameters
+-----------------
+
+(specific to LM93)
+* init: integer
+ Set to non-zero to force some initializations (default is 0).
+* disable_block: integer
+ A "0" allows SMBus block data transactions if the host supports them. A "1"
+ disables SMBus block data transactions. The default is 0.
+* vccp_limit_type: integer array (2)
+ Configures in7 and in8 limit type, where 0 means absolute and non-zero
+ means relative. "Relative" here refers to "Dynamic Vccp Monitoring using
+ VID" from the datasheet. It greatly simplifies the interface to allow
+ only one set of limits (absolute or relative) to be in operation at a
+ time (even though the hardware is capable of enabling both). There's
+ not a compelling use case for enabling both at once, anyway. The default
+ is "0,0".
+* vid_agtl: integer
+ A "0" configures the VID pins for V(ih) = 2.1V min, V(il) = 0.8V max.
+ A "1" configures the VID pins for V(ih) = 0.8V min, V(il) = 0.4V max.
+ (The latter setting is referred to as AGTL+ Compatible in the datasheet.)
+ I.e. this parameter controls the VID pin input thresholds; if your VID
+ inputs are not working, try changing this. The default value is "0".
+
+(common among sensor drivers)
+* force: short array (min = 1, max = 48)
+ List of adapter,address pairs to assume to be present. Autodetection
+ of the target device will still be attempted. Use one of the more
+ specific force directives below if this doesn't detect the device.
+* force_lm93: short array (min = 1, max = 48)
+ List of adapter,address pairs which are unquestionably assumed to contain
+ a 'lm93' chip
+* ignore: short array (min = 1, max = 48)
+ List of adapter,address pairs not to scan
+* ignore_range: short array (min = 1, max = 48)
+ List of adapter,start-addr,end-addr triples not to scan
+* probe: short array (min = 1, max = 48)
+ List of adapter,address pairs to scan additionally
+* probe_range: short array (min = 1, max = 48)
+ List of adapter,start-addr,end-addr triples to scan additionally
+
+
+Hardware Description
+--------------------
+
+(from the datasheet)
+
+The LM93, hardware monitor, has a two wire digital interface compatible with
+SMBus 2.0. Using an 8-bit ADC, the LM93 measures the temperature of two remote
+diode connected transistors as well as its own die and 16 power supply
+voltages. To set fan speed, the LM93 has two PWM outputs that are each
+controlled by up to four temperature zones. The fancontrol algorithm is lookup
+table based. The LM93 includes a digital filter that can be invoked to smooth
+temperature readings for better control of fan speed. The LM93 has four
+tachometer inputs to measure fan speed. Limit and status registers for all
+measured values are included. The LM93 builds upon the functionality of
+previous motherboard management ASICs and uses some of the LM85 s features
+(i.e. smart tachometer mode). It also adds measurement and control support
+for dynamic Vccp monitoring and PROCHOT. It is designed to monitor a dual
+processor Xeon class motherboard with a minimum of external components.
+
+
+Driver Description
+------------------
+
+This driver implements support for the National Semiconductor LM93.
+
+
+User Interface
+--------------
+
+#PROCHOT:
+
+The LM93 can monitor two #PROCHOT signals. The results are found in the
+sysfs files prochot1, prochot2, prochot1_avg, prochot2_avg, prochot1_max,
+and prochot2_max. prochot1_max and prochot2_max contain the user limits
+for #PROCHOT1 and #PROCHOT2, respectively. prochot1 and prochot2 contain
+the current readings for the most recent complete time interval. The
+value of prochot1_avg and prochot2_avg is something like a 2 period
+exponential moving average (but not quite - check the datasheet). Note
+that this third value is calculated by the chip itself. All values range
+from 0-255 where 0 indicates no throttling, and 255 indicates > 99.6%.
+
+The monitoring intervals for the two #PROCHOT signals is also configurable.
+These intervals can be found in the sysfs files prochot1_interval and
+prochot2_interval. The values in these files specify the intervals for
+#P1_PROCHOT and #P2_PROCHOT, respectively. Selecting a value not in this
+list will cause the driver to use the next largest interval. The available
+intervals are:
+
+#PROCHOT intervals: 0.73, 1.46, 2.9, 5.8, 11.7, 23.3, 46.6, 93.2, 186, 372
+
+It is possible to configure the LM93 to logically short the two #PROCHOT
+signals. I.e. when #P1_PROCHOT is asserted, the LM93 will automatically
+assert #P2_PROCHOT, and vice-versa. This mode is enabled by writing a
+non-zero integer to the sysfs file prochot_short.
+
+The LM93 can also override the #PROCHOT pins by driving a PWM signal onto
+one or both of them. When overridden, the signal has a period of 3.56 mS,
+a minimum pulse width of 5 clocks (at 22.5kHz => 6.25% duty cycle), and
+a maximum pulse width of 80 clocks (at 22.5kHz => 99.88% duty cycle).
+
+The sysfs files prochot1_override and prochot2_override contain boolean
+intgers which enable or disable the override function for #P1_PROCHOT and
+#P2_PROCHOT, respectively. The sysfs file prochot_override_duty_cycle
+contains a value controlling the duty cycle for the PWM signal used when
+the override function is enabled. This value ranges from 0 to 15, with 0
+indicating minimum duty cycle and 15 indicating maximum.
+
+#VRD_HOT:
+
+The LM93 can monitor two #VRD_HOT signals. The results are found in the
+sysfs files vrdhot1 and vrdhot2. There is one value per file: a boolean for
+which 1 indicates #VRD_HOT is asserted and 0 indicates it is negated. These
+files are read-only.
+
+Smart Tach Mode:
+
+(from the datasheet)
+
+ If a fan is driven using a low-side drive PWM, the tachometer
+ output of the fan is corrupted. The LM93 includes smart tachometer
+ circuitry that allows an accurate tachometer reading to be
+ achieved despite the signal corruption. In smart tach mode all
+ four signals are measured within 4 seconds.
+
+Smart tach mode is enabled by the driver by writing 1 or 2 (associating the
+the fan tachometer with a pwm) to the sysfs file fan<n>_smart_tach. A zero
+will disable the function for that fan. Note that Smart tach mode cannot be
+enabled if the PWM output frequency is 22500 Hz (see below).
+
+Manual PWM:
+
+The LM93 has a fixed or override mode for the two PWM outputs (although, there
+are still some conditions that will override even this mode - see section
+15.10.6 of the datasheet for details.) The sysfs files pwm1_override
+and pwm2_override are used to enable this mode; each is a boolean integer
+where 0 disables and 1 enables the manual control mode. The sysfs files pwm1
+and pwm2 are used to set the manual duty cycle; each is an integer (0-255)
+where 0 is 0% duty cycle, and 255 is 100%. Note that the duty cycle values
+are constrained by the hardware. Selecting a value which is not available
+will cause the driver to use the next largest value. Also note: when manual
+PWM mode is disabled, the value of pwm1 and pwm2 indicates the current duty
+cycle chosen by the h/w.
+
+PWM Output Frequency:
+
+The LM93 supports several different frequencies for the PWM output channels.
+The sysfs files pwm1_freq and pwm2_freq are used to select the frequency. The
+frequency values are constrained by the hardware. Selecting a value which is
+not available will cause the driver to use the next largest value. Also note
+that this parameter has implications for the Smart Tach Mode (see above).
+
+PWM Output Frequencies: 12, 36, 48, 60, 72, 84, 96, 22500 (h/w default)
+
+Automatic PWM:
+
+The LM93 is capable of complex automatic fan control, with many different
+points of configuration. To start, each PWM output can be bound to any
+combination of eight control sources. The final PWM is the largest of all
+individual control sources to which the PWM output is bound.
+
+The eight control sources are: temp1-temp4 (aka "zones" in the datasheet),
+#PROCHOT 1 & 2, and #VRDHOT 1 & 2. The bindings are expressed as a bitmask
+in the sysfs files pwm<n>_auto_channels, where a "1" enables the binding, and
+ a "0" disables it. The h/w default is 0x0f (all temperatures bound).
+
+ 0x01 - Temp 1
+ 0x02 - Temp 2
+ 0x04 - Temp 3
+ 0x08 - Temp 4
+ 0x10 - #PROCHOT 1
+ 0x20 - #PROCHOT 2
+ 0x40 - #VRDHOT 1
+ 0x80 - #VRDHOT 2
+
+The function y = f(x) takes a source temperature x to a PWM output y. This
+function of the LM93 is derived from a base temperature and a table of 12
+temperature offsets. The base temperature is expressed in degrees C in the
+sysfs files temp<n>_auto_base. The offsets are expressed in cumulative
+degrees C, with the value of offset <i> for temperature value <n> being
+contained in the file temp<n>_auto_offset<i>. E.g. if the base temperature
+is 40C:
+
+ offset # temp<n>_auto_offset<i> range pwm
+ 1 0 - 25.00%
+ 2 0 - 28.57%
+ 3 1 40C - 41C 32.14%
+ 4 1 41C - 42C 35.71%
+ 5 2 42C - 44C 39.29%
+ 6 2 44C - 46C 42.86%
+ 7 2 48C - 50C 46.43%
+ 8 2 50C - 52C 50.00%
+ 9 2 52C - 54C 53.57%
+ 10 2 54C - 56C 57.14%
+ 11 2 56C - 58C 71.43%
+ 12 2 58C - 60C 85.71%
+ > 60C 100.00%
+
+Valid offsets are in the range 0C <= x <= 7.5C in 0.5C increments.
+
+There is an independent base temperature for each temperature channel. Note,
+however, there are only two tables of offsets: one each for temp[12] and
+temp[34]. Therefore, any change to e.g. temp1_auto_offset<i> will also
+affect temp2_auto_offset<i>.
+
+The LM93 can also apply hysteresis to the offset table, to prevent unwanted
+oscillation between two steps in the offsets table. These values are found in
+the sysfs files temp<n>_auto_offset_hyst. The value in this file has the
+same representation as in temp<n>_auto_offset<i>.
+
+If a temperature reading falls below the base value for that channel, the LM93
+will use the minimum PWM value. These values are found in the sysfs files
+temp<n>_auto_pwm_min. Note, there are only two minimums: one each for temp[12]
+and temp[34]. Therefore, any change to e.g. temp1_auto_pwm_min will also
+affect temp2_auto_pwm_min.
+
+PWM Spin-Up Cycle:
+
+A spin-up cycle occurs when a PWM output is commanded from 0% duty cycle to
+some value > 0%. The LM93 supports a minimum duty cycle during spin-up. These
+values are found in the sysfs files pwm<n>_auto_spinup_min. The value in this
+file has the same representation as other PWM duty cycle values. The
+duration of the spin-up cycle is also configurable. These values are found in
+the sysfs files pwm<n>_auto_spinup_time. The value in this file is
+the spin-up time in seconds. The available spin-up times are constrained by
+the hardware. Selecting a value which is not available will cause the driver
+to use the next largest value.
+
+Spin-up Durations: 0 (disabled, h/w default), 0.1, 0.25, 0.4, 0.7, 1.0,
+ 2.0, 4.0
+
+#PROCHOT and #VRDHOT PWM Ramping:
+
+If the #PROCHOT or #VRDHOT signals are asserted while bound to a PWM output
+channel, the LM93 will ramp the PWM output up to 100% duty cycle in discrete
+steps. The duration of each step is configurable. There are two files, with
+one value each in seconds: pwm_auto_prochot_ramp and pwm_auto_vrdhot_ramp.
+The available ramp times are constrained by the hardware. Selecting a value
+which is not available will cause the driver to use the next largest value.
+
+Ramp Times: 0 (disabled, h/w default) to 0.75 in 0.05 second intervals
+
+Fan Boost:
+
+For each temperature channel, there is a boost temperature: if the channel
+exceeds this limit, the LM93 will immediately drive both PWM outputs to 100%.
+This limit is expressed in degrees C in the sysfs files temp<n>_auto_boost.
+There is also a hysteresis temperature for this function: after the boost
+limit is reached, the temperature channel must drop below this value before
+the boost function is disabled. This temperature is also expressed in degrees
+C in the sysfs files temp<n>_auto_boost_hyst.
+
+GPIO Pins:
+
+The LM93 can monitor the logic level of four dedicated GPIO pins as well as the
+four tach input pins. GPIO0-GPIO3 correspond to (fan) tach 1-4, respectively.
+All eight GPIOs are read by reading the bitmask in the sysfs file gpio. The
+LSB is GPIO0, and the MSB is GPIO7.
+
+
+LM93 Unique sysfs Files
+-----------------------
+
+ file description
+ -------------------------------------------------------------
+
+ prochot<n> current #PROCHOT %
+
+ prochot<n>_avg moving average #PROCHOT %
+
+ prochot<n>_max limit #PROCHOT %
+
+ prochot_short enable or disable logical #PROCHOT pin short
+
+ prochot<n>_override force #PROCHOT assertion as PWM
+
+ prochot_override_duty_cycle
+ duty cycle for the PWM signal used when
+ #PROCHOT is overridden
+
+ prochot<n>_interval #PROCHOT PWM sampling interval
+
+ vrdhot<n> 0 means negated, 1 means asserted
+
+ fan<n>_smart_tach enable or disable smart tach mode
+
+ pwm<n>_auto_channels select control sources for PWM outputs
+
+ pwm<n>_auto_spinup_min minimum duty cycle during spin-up
+
+ pwm<n>_auto_spinup_time duration of spin-up
+
+ pwm_auto_prochot_ramp ramp time per step when #PROCHOT asserted
+
+ pwm_auto_vrdhot_ramp ramp time per step when #VRDHOT asserted
+
+ temp<n>_auto_base temperature channel base
+
+ temp<n>_auto_offset[1-12]
+ temperature channel offsets
+
+ temp<n>_auto_offset_hyst
+ temperature channel offset hysteresis
+
+ temp<n>_auto_boost temperature channel boost (PWMs to 100%) limit
+
+ temp<n>_auto_boost_hyst temperature channel boost hysteresis
+
+ gpio input state of 8 GPIO pins; read-only
+
+
+Sample Configuration File
+-------------------------
+
+Here is a sample LM93 chip config for sensors.conf:
+
+---------- cut here ----------
+chip "lm93-*"
+
+# VOLTAGE INPUTS
+
+ # labels and scaling based on datasheet recommendations
+ label in1 "+12V1"
+ compute in1 @ * 12.945, @ / 12.945
+ set in1_min 12 * 0.90
+ set in1_max 12 * 1.10
+
+ label in2 "+12V2"
+ compute in2 @ * 12.945, @ / 12.945
+ set in2_min 12 * 0.90
+ set in2_max 12 * 1.10
+
+ label in3 "+12V3"
+ compute in3 @ * 12.945, @ / 12.945
+ set in3_min 12 * 0.90
+ set in3_max 12 * 1.10
+
+ label in4 "FSB_Vtt"
+
+ label in5 "3GIO"
+
+ label in6 "ICH_Core"
+
+ label in7 "Vccp1"
+
+ label in8 "Vccp2"
+
+ label in9 "+3.3V"
+ set in9_min 3.3 * 0.90
+ set in9_max 3.3 * 1.10
+
+ label in10 "+5V"
+ set in10_min 5.0 * 0.90
+ set in10_max 5.0 * 1.10
+
+ label in11 "SCSI_Core"
+
+ label in12 "Mem_Core"
+
+ label in13 "Mem_Vtt"
+
+ label in14 "Gbit_Core"
+
+ # Assuming R1/R2 = 4.1143, and 3.3V reference
+ # -12V = (4.1143 + 1) * (@ - 3.3) + 3.3
+ label in15 "-12V"
+ compute in15 @ * 5.1143 - 13.57719, (@ + 13.57719) / 5.1143
+ set in15_min -12 * 0.90
+ set in15_max -12 * 1.10
+
+ label in16 "+3.3VSB"
+ set in16_min 3.3 * 0.90
+ set in16_max 3.3 * 1.10
+
+# TEMPERATURE INPUTS
+
+ label temp1 "CPU1"
+ label temp2 "CPU2"
+ label temp3 "LM93"
+
+# TACHOMETER INPUTS
+
+ label fan1 "Fan1"
+ set fan1_min 3000
+ label fan2 "Fan2"
+ set fan2_min 3000
+ label fan3 "Fan3"
+ set fan3_min 3000
+ label fan4 "Fan4"
+ set fan4_min 3000
+
+# PWM OUTPUTS
+
+ label pwm1 "CPU1"
+ label pwm2 "CPU2"
+
Supported chips:
* SMSC LPC47B397-NC
* SMSC SCH5307-NS
+ * SMSC SCH5317
Prefix: 'smsc47b397'
Addresses scanned: none, address read from Super I/O config space
Datasheet: In this file
provided by Craig Kelly (In-Store Broadcast Network) and edited/corrected
by Mark M. Hoffman <mhoffman@lightlink.com>.
-[1] And SMSC SCH5307-NS, which has a different device ID but is otherwise
-compatible.
+[1] And SMSC SCH5307-NS and SCH5317, which have different device IDs but are
+otherwise compatible.
* * * * *
The registers of interest for identifying the SIO on the dc7100 are Device ID
(0x20) and Device Rev (0x21).
-The Device ID will read 0x6F (for SCH5307-NS, 0x81)
+The Device ID will read 0x6F (0x81 for SCH5307-NS, and 0x85 for SCH5317)
The Device Rev currently reads 0x01
Obtaining the HWM Base Address.
255 is max or 100%.
pwm[1-*]_enable
- Switch PWM on and off.
- Not always present even if pwmN is.
- 0: turn off
- 1: turn on in manual mode
- 2+: turn on in automatic mode
+ Fan speed control method:
+ 0: no fan speed control (i.e. fan at full speed)
+ 1: manual fan speed control enabled (using pwm[1-*])
+ 2+: automatic fan speed control enabled
Check individual chip documentation files for automatic mode
details.
RW
supports it. When this boolean has value 1, the measurement for that
channel should not be trusted.
-in[0-*]_input_fault
-fan[1-*]_input_fault
-temp[1-*]_input_fault
+in[0-*]_fault
+fan[1-*]_fault
+temp[1-*]_fault
Input fault condition
0: no fault occured
1: fault condition
W83627DHG super I/O chips. We will refer to them collectively as Winbond chips.
The chips implement three temperature sensors, five fan rotation
-speed sensors, ten analog voltage sensors (only nine for the 627DHG), alarms
-with beep warnings (control unimplemented), and some automatic fan regulation
-strategies (plus manual fan control mode).
+speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
+VID (6 pins), alarms with beep warnings (control unimplemented), and
+some automatic fan regulation strategies (plus manual fan control mode).
Temperatures are measured in degrees Celsius and measurement resolution is 1
degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
The third parameter may be a text as in this example, but it may also
be an expanded variable or a macro.
+ cc-fullversion
+ cc-fullversion is useful when the exact version of gcc is needed.
+ One typical use-case is when a specific GCC version is broken.
+ cc-fullversion points out a more specific version than cc-version does.
+
+ Example:
+ #arch/powerpc/Makefile
+ $(Q)if test "$(call cc-fullversion)" = "040200" ; then \
+ echo -n '*** GCC-4.2.0 cannot compile the 64-bit powerpc ' ; \
+ false ; \
+ fi
+
+ In this example for a specific GCC version the build will error out explaining
+ to the user why it stops.
=== 4 Host Program support
fastcall, or anything else that affects how args are passed, the
handler's declaration must match.
-NOTE: A macro JPROBE_ENTRY is provided to handle architecture-specific
-aliasing of jp->entry. In the interest of portability, it is advised
-to use:
-
- jp->entry = JPROBE_ENTRY(handler);
-
register_jprobe() returns 0 on success, or a negative errno otherwise.
4.3 register_kretprobe
}
static struct jprobe my_jprobe = {
- .entry = JPROBE_ENTRY(jdo_fork)
+ .entry = jdo_fork
};
static int __init jprobe_init(void)
--- /dev/null
+# This creates the demonstration utility "lguest" which runs a Linux guest.
+
+# For those people that have a separate object dir, look there for .config
+KBUILD_OUTPUT := ../..
+ifdef O
+ ifeq ("$(origin O)", "command line")
+ KBUILD_OUTPUT := $(O)
+ endif
+endif
+# We rely on CONFIG_PAGE_OFFSET to know where to put lguest binary.
+include $(KBUILD_OUTPUT)/.config
+LGUEST_GUEST_TOP := ($(CONFIG_PAGE_OFFSET) - 0x08000000)
+
+CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 \
+ -static -DLGUEST_GUEST_TOP="$(LGUEST_GUEST_TOP)" -Wl,-T,lguest.lds
+LDLIBS:=-lz
+
+all: lguest.lds lguest
+
+# The linker script on x86 is so complex the only way of creating one
+# which will link our binary in the right place is to mangle the
+# default one.
+lguest.lds:
+ $(LD) --verbose | awk '/^==========/ { PRINT=1; next; } /SIZEOF_HEADERS/ { gsub(/0x[0-9A-F]*/, "$(LGUEST_GUEST_TOP)") } { if (PRINT) print $$0; }' > $@
+
+clean:
+ rm -f lguest.lds lguest
--- /dev/null
+/* Simple program to layout "physical" memory for new lguest guest.
+ * Linked high to avoid likely physical memory. */
+#define _LARGEFILE64_SOURCE
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <err.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <elf.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <ctype.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <time.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <linux/sockios.h>
+#include <linux/if_tun.h>
+#include <sys/uio.h>
+#include <termios.h>
+#include <getopt.h>
+#include <zlib.h>
+typedef unsigned long long u64;
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+#include "../../include/linux/lguest_launcher.h"
+#include "../../include/asm-i386/e820.h"
+
+#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
+#define NET_PEERNUM 1
+#define BRIDGE_PFX "bridge:"
+#ifndef SIOCBRADDIF
+#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
+#endif
+
+static bool verbose;
+#define verbose(args...) \
+ do { if (verbose) printf(args); } while(0)
+static int waker_fd;
+
+struct device_list
+{
+ fd_set infds;
+ int max_infd;
+
+ struct device *dev;
+ struct device **lastdev;
+};
+
+struct device
+{
+ struct device *next;
+ struct lguest_device_desc *desc;
+ void *mem;
+
+ /* Watch this fd if handle_input non-NULL. */
+ int fd;
+ bool (*handle_input)(int fd, struct device *me);
+
+ /* Watch DMA to this key if handle_input non-NULL. */
+ unsigned long watch_key;
+ u32 (*handle_output)(int fd, const struct iovec *iov,
+ unsigned int num, struct device *me);
+
+ /* Device-specific data. */
+ void *priv;
+};
+
+static int open_or_die(const char *name, int flags)
+{
+ int fd = open(name, flags);
+ if (fd < 0)
+ err(1, "Failed to open %s", name);
+ return fd;
+}
+
+static void *map_zeroed_pages(unsigned long addr, unsigned int num)
+{
+ static int fd = -1;
+
+ if (fd == -1)
+ fd = open_or_die("/dev/zero", O_RDONLY);
+
+ if (mmap((void *)addr, getpagesize() * num,
+ PROT_READ|PROT_WRITE|PROT_EXEC, MAP_FIXED|MAP_PRIVATE, fd, 0)
+ != (void *)addr)
+ err(1, "Mmaping %u pages of /dev/zero @%p", num, (void *)addr);
+ return (void *)addr;
+}
+
+/* Find magic string marking entry point, return entry point. */
+static unsigned long entry_point(void *start, void *end,
+ unsigned long page_offset)
+{
+ void *p;
+
+ for (p = start; p < end; p++)
+ if (memcmp(p, "GenuineLguest", strlen("GenuineLguest")) == 0)
+ return (long)p + strlen("GenuineLguest") + page_offset;
+
+ err(1, "Is this image a genuine lguest?");
+}
+
+/* Returns the entry point */
+static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr,
+ unsigned long *page_offset)
+{
+ void *addr;
+ Elf32_Phdr phdr[ehdr->e_phnum];
+ unsigned int i;
+ unsigned long start = -1UL, end = 0;
+
+ /* Sanity checks. */
+ if (ehdr->e_type != ET_EXEC
+ || ehdr->e_machine != EM_386
+ || ehdr->e_phentsize != sizeof(Elf32_Phdr)
+ || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
+ errx(1, "Malformed elf header");
+
+ if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
+ err(1, "Seeking to program headers");
+ if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
+ err(1, "Reading program headers");
+
+ *page_offset = 0;
+ /* We map the loadable segments at virtual addresses corresponding
+ * to their physical addresses (our virtual == guest physical). */
+ for (i = 0; i < ehdr->e_phnum; i++) {
+ if (phdr[i].p_type != PT_LOAD)
+ continue;
+
+ verbose("Section %i: size %i addr %p\n",
+ i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
+
+ /* We expect linear address space. */
+ if (!*page_offset)
+ *page_offset = phdr[i].p_vaddr - phdr[i].p_paddr;
+ else if (*page_offset != phdr[i].p_vaddr - phdr[i].p_paddr)
+ errx(1, "Page offset of section %i different", i);
+
+ if (phdr[i].p_paddr < start)
+ start = phdr[i].p_paddr;
+ if (phdr[i].p_paddr + phdr[i].p_filesz > end)
+ end = phdr[i].p_paddr + phdr[i].p_filesz;
+
+ /* We map everything private, writable. */
+ addr = mmap((void *)phdr[i].p_paddr,
+ phdr[i].p_filesz,
+ PROT_READ|PROT_WRITE|PROT_EXEC,
+ MAP_FIXED|MAP_PRIVATE,
+ elf_fd, phdr[i].p_offset);
+ if (addr != (void *)phdr[i].p_paddr)
+ err(1, "Mmaping vmlinux seg %i gave %p not %p",
+ i, addr, (void *)phdr[i].p_paddr);
+ }
+
+ return entry_point((void *)start, (void *)end, *page_offset);
+}
+
+/* This is amazingly reliable. */
+static unsigned long intuit_page_offset(unsigned char *img, unsigned long len)
+{
+ unsigned int i, possibilities[256] = { 0 };
+
+ for (i = 0; i + 4 < len; i++) {
+ /* mov 0xXXXXXXXX,%eax */
+ if (img[i] == 0xA1 && ++possibilities[img[i+4]] > 3)
+ return (unsigned long)img[i+4] << 24;
+ }
+ errx(1, "could not determine page offset");
+}
+
+static unsigned long unpack_bzimage(int fd, unsigned long *page_offset)
+{
+ gzFile f;
+ int ret, len = 0;
+ void *img = (void *)0x100000;
+
+ f = gzdopen(fd, "rb");
+ while ((ret = gzread(f, img + len, 65536)) > 0)
+ len += ret;
+ if (ret < 0)
+ err(1, "reading image from bzImage");
+
+ verbose("Unpacked size %i addr %p\n", len, img);
+ *page_offset = intuit_page_offset(img, len);
+
+ return entry_point(img, img + len, *page_offset);
+}
+
+static unsigned long load_bzimage(int fd, unsigned long *page_offset)
+{
+ unsigned char c;
+ int state = 0;
+
+ /* Ugly brute force search for gzip header. */
+ while (read(fd, &c, 1) == 1) {
+ switch (state) {
+ case 0:
+ if (c == 0x1F)
+ state++;
+ break;
+ case 1:
+ if (c == 0x8B)
+ state++;
+ else
+ state = 0;
+ break;
+ case 2 ... 8:
+ state++;
+ break;
+ case 9:
+ lseek(fd, -10, SEEK_CUR);
+ if (c != 0x03) /* Compressed under UNIX. */
+ state = -1;
+ else
+ return unpack_bzimage(fd, page_offset);
+ }
+ }
+ errx(1, "Could not find kernel in bzImage");
+}
+
+static unsigned long load_kernel(int fd, unsigned long *page_offset)
+{
+ Elf32_Ehdr hdr;
+
+ if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr))
+ err(1, "Reading kernel");
+
+ if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0)
+ return map_elf(fd, &hdr, page_offset);
+
+ return load_bzimage(fd, page_offset);
+}
+
+static inline unsigned long page_align(unsigned long addr)
+{
+ return ((addr + getpagesize()-1) & ~(getpagesize()-1));
+}
+
+/* initrd gets loaded at top of memory: return length. */
+static unsigned long load_initrd(const char *name, unsigned long mem)
+{
+ int ifd;
+ struct stat st;
+ unsigned long len;
+ void *iaddr;
+
+ ifd = open_or_die(name, O_RDONLY);
+ if (fstat(ifd, &st) < 0)
+ err(1, "fstat() on initrd '%s'", name);
+
+ len = page_align(st.st_size);
+ iaddr = mmap((void *)mem - len, st.st_size,
+ PROT_READ|PROT_EXEC|PROT_WRITE,
+ MAP_FIXED|MAP_PRIVATE, ifd, 0);
+ if (iaddr != (void *)mem - len)
+ err(1, "Mmaping initrd '%s' returned %p not %p",
+ name, iaddr, (void *)mem - len);
+ close(ifd);
+ verbose("mapped initrd %s size=%lu @ %p\n", name, st.st_size, iaddr);
+ return len;
+}
+
+static unsigned long setup_pagetables(unsigned long mem,
+ unsigned long initrd_size,
+ unsigned long page_offset)
+{
+ u32 *pgdir, *linear;
+ unsigned int mapped_pages, i, linear_pages;
+ unsigned int ptes_per_page = getpagesize()/sizeof(u32);
+
+ /* If we can map all of memory above page_offset, we do so. */
+ if (mem <= -page_offset)
+ mapped_pages = mem/getpagesize();
+ else
+ mapped_pages = -page_offset/getpagesize();
+
+ /* Each linear PTE page can map ptes_per_page pages. */
+ linear_pages = (mapped_pages + ptes_per_page-1)/ptes_per_page;
+
+ /* We lay out top-level then linear mapping immediately below initrd */
+ pgdir = (void *)mem - initrd_size - getpagesize();
+ linear = (void *)pgdir - linear_pages*getpagesize();
+
+ for (i = 0; i < mapped_pages; i++)
+ linear[i] = ((i * getpagesize()) | PAGE_PRESENT);
+
+ /* Now set up pgd so that this memory is at page_offset */
+ for (i = 0; i < mapped_pages; i += ptes_per_page) {
+ pgdir[(i + page_offset/getpagesize())/ptes_per_page]
+ = (((u32)linear + i*sizeof(u32)) | PAGE_PRESENT);
+ }
+
+ verbose("Linear mapping of %u pages in %u pte pages at %p\n",
+ mapped_pages, linear_pages, linear);
+
+ return (unsigned long)pgdir;
+}
+
+static void concat(char *dst, char *args[])
+{
+ unsigned int i, len = 0;
+
+ for (i = 0; args[i]; i++) {
+ strcpy(dst+len, args[i]);
+ strcat(dst+len, " ");
+ len += strlen(args[i]) + 1;
+ }
+ /* In case it's empty. */
+ dst[len] = '\0';
+}
+
+static int tell_kernel(u32 pgdir, u32 start, u32 page_offset)
+{
+ u32 args[] = { LHREQ_INITIALIZE,
+ LGUEST_GUEST_TOP/getpagesize(), /* Just below us */
+ pgdir, start, page_offset };
+ int fd;
+
+ fd = open_or_die("/dev/lguest", O_RDWR);
+ if (write(fd, args, sizeof(args)) < 0)
+ err(1, "Writing to /dev/lguest");
+ return fd;
+}
+
+static void set_fd(int fd, struct device_list *devices)
+{
+ FD_SET(fd, &devices->infds);
+ if (fd > devices->max_infd)
+ devices->max_infd = fd;
+}
+
+/* When input arrives, we tell the kernel to kick lguest out with -EAGAIN. */
+static void wake_parent(int pipefd, int lguest_fd, struct device_list *devices)
+{
+ set_fd(pipefd, devices);
+
+ for (;;) {
+ fd_set rfds = devices->infds;
+ u32 args[] = { LHREQ_BREAK, 1 };
+
+ select(devices->max_infd+1, &rfds, NULL, NULL, NULL);
+ if (FD_ISSET(pipefd, &rfds)) {
+ int ignorefd;
+ if (read(pipefd, &ignorefd, sizeof(ignorefd)) == 0)
+ exit(0);
+ FD_CLR(ignorefd, &devices->infds);
+ } else
+ write(lguest_fd, args, sizeof(args));
+ }
+}
+
+static int setup_waker(int lguest_fd, struct device_list *device_list)
+{
+ int pipefd[2], child;
+
+ pipe(pipefd);
+ child = fork();
+ if (child == -1)
+ err(1, "forking");
+
+ if (child == 0) {
+ close(pipefd[1]);
+ wake_parent(pipefd[0], lguest_fd, device_list);
+ }
+ close(pipefd[0]);
+
+ return pipefd[1];
+}
+
+static void *_check_pointer(unsigned long addr, unsigned int size,
+ unsigned int line)
+{
+ if (addr >= LGUEST_GUEST_TOP || addr + size >= LGUEST_GUEST_TOP)
+ errx(1, "%s:%i: Invalid address %li", __FILE__, line, addr);
+ return (void *)addr;
+}
+#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
+
+/* Returns pointer to dma->used_len */
+static u32 *dma2iov(unsigned long dma, struct iovec iov[], unsigned *num)
+{
+ unsigned int i;
+ struct lguest_dma *udma;
+
+ udma = check_pointer(dma, sizeof(*udma));
+ for (i = 0; i < LGUEST_MAX_DMA_SECTIONS; i++) {
+ if (!udma->len[i])
+ break;
+
+ iov[i].iov_base = check_pointer(udma->addr[i], udma->len[i]);
+ iov[i].iov_len = udma->len[i];
+ }
+ *num = i;
+ return &udma->used_len;
+}
+
+static u32 *get_dma_buffer(int fd, void *key,
+ struct iovec iov[], unsigned int *num, u32 *irq)
+{
+ u32 buf[] = { LHREQ_GETDMA, (u32)key };
+ unsigned long udma;
+ u32 *res;
+
+ udma = write(fd, buf, sizeof(buf));
+ if (udma == (unsigned long)-1)
+ return NULL;
+
+ /* Kernel stashes irq in ->used_len. */
+ res = dma2iov(udma, iov, num);
+ *irq = *res;
+ return res;
+}
+
+static void trigger_irq(int fd, u32 irq)
+{
+ u32 buf[] = { LHREQ_IRQ, irq };
+ if (write(fd, buf, sizeof(buf)) != 0)
+ err(1, "Triggering irq %i", irq);
+}
+
+static void discard_iovec(struct iovec *iov, unsigned int *num)
+{
+ static char discard_buf[1024];
+ *num = 1;
+ iov->iov_base = discard_buf;
+ iov->iov_len = sizeof(discard_buf);
+}
+
+static struct termios orig_term;
+static void restore_term(void)
+{
+ tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
+}
+
+struct console_abort
+{
+ int count;
+ struct timeval start;
+};
+
+/* We DMA input to buffer bound at start of console page. */
+static bool handle_console_input(int fd, struct device *dev)
+{
+ u32 irq = 0, *lenp;
+ int len;
+ unsigned int num;
+ struct iovec iov[LGUEST_MAX_DMA_SECTIONS];
+ struct console_abort *abort = dev->priv;
+
+ lenp = get_dma_buffer(fd, dev->mem, iov, &num, &irq);
+ if (!lenp) {
+ warn("console: no dma buffer!");
+ discard_iovec(iov, &num);
+ }
+
+ len = readv(dev->fd, iov, num);
+ if (len <= 0) {
+ warnx("Failed to get console input, ignoring console.");
+ len = 0;
+ }
+
+ if (lenp) {
+ *lenp = len;
+ trigger_irq(fd, irq);
+ }
+
+ /* Three ^C within one second? Exit. */
+ if (len == 1 && ((char *)iov[0].iov_base)[0] == 3) {
+ if (!abort->count++)
+ gettimeofday(&abort->start, NULL);
+ else if (abort->count == 3) {
+ struct timeval now;
+ gettimeofday(&now, NULL);
+ if (now.tv_sec <= abort->start.tv_sec+1) {
+ /* Make sure waker is not blocked in BREAK */
+ u32 args[] = { LHREQ_BREAK, 0 };
+ close(waker_fd);
+ write(fd, args, sizeof(args));
+ exit(2);
+ }
+ abort->count = 0;
+ }
+ } else
+ abort->count = 0;
+
+ if (!len) {
+ restore_term();
+ return false;
+ }
+ return true;
+}
+
+static u32 handle_console_output(int fd, const struct iovec *iov,
+ unsigned num, struct device*dev)
+{
+ return writev(STDOUT_FILENO, iov, num);
+}
+
+static u32 handle_tun_output(int fd, const struct iovec *iov,
+ unsigned num, struct device *dev)
+{
+ /* Now we've seen output, we should warn if we can't get buffers. */
+ *(bool *)dev->priv = true;
+ return writev(dev->fd, iov, num);
+}
+
+static unsigned long peer_offset(unsigned int peernum)
+{
+ return 4 * peernum;
+}
+
+static bool handle_tun_input(int fd, struct device *dev)
+{
+ u32 irq = 0, *lenp;
+ int len;
+ unsigned num;
+ struct iovec iov[LGUEST_MAX_DMA_SECTIONS];
+
+ lenp = get_dma_buffer(fd, dev->mem+peer_offset(NET_PEERNUM), iov, &num,
+ &irq);
+ if (!lenp) {
+ if (*(bool *)dev->priv)
+ warn("network: no dma buffer!");
+ discard_iovec(iov, &num);
+ }
+
+ len = readv(dev->fd, iov, num);
+ if (len <= 0)
+ err(1, "reading network");
+ if (lenp) {
+ *lenp = len;
+ trigger_irq(fd, irq);
+ }
+ verbose("tun input packet len %i [%02x %02x] (%s)\n", len,
+ ((u8 *)iov[0].iov_base)[0], ((u8 *)iov[0].iov_base)[1],
+ lenp ? "sent" : "discarded");
+ return true;
+}
+
+static u32 handle_block_output(int fd, const struct iovec *iov,
+ unsigned num, struct device *dev)
+{
+ struct lguest_block_page *p = dev->mem;
+ u32 irq, *lenp;
+ unsigned int len, reply_num;
+ struct iovec reply[LGUEST_MAX_DMA_SECTIONS];
+ off64_t device_len, off = (off64_t)p->sector * 512;
+
+ device_len = *(off64_t *)dev->priv;
+
+ if (off >= device_len)
+ err(1, "Bad offset %llu vs %llu", off, device_len);
+ if (lseek64(dev->fd, off, SEEK_SET) != off)
+ err(1, "Bad seek to sector %i", p->sector);
+
+ verbose("Block: %s at offset %llu\n", p->type ? "WRITE" : "READ", off);
+
+ lenp = get_dma_buffer(fd, dev->mem, reply, &reply_num, &irq);
+ if (!lenp)
+ err(1, "Block request didn't give us a dma buffer");
+
+ if (p->type) {
+ len = writev(dev->fd, iov, num);
+ if (off + len > device_len) {
+ ftruncate(dev->fd, device_len);
+ errx(1, "Write past end %llu+%u", off, len);
+ }
+ *lenp = 0;
+ } else {
+ len = readv(dev->fd, reply, reply_num);
+ *lenp = len;
+ }
+
+ p->result = 1 + (p->bytes != len);
+ trigger_irq(fd, irq);
+ return 0;
+}
+
+static void handle_output(int fd, unsigned long dma, unsigned long key,
+ struct device_list *devices)
+{
+ struct device *i;
+ u32 *lenp;
+ struct iovec iov[LGUEST_MAX_DMA_SECTIONS];
+ unsigned num = 0;
+
+ lenp = dma2iov(dma, iov, &num);
+ for (i = devices->dev; i; i = i->next) {
+ if (i->handle_output && key == i->watch_key) {
+ *lenp = i->handle_output(fd, iov, num, i);
+ return;
+ }
+ }
+ warnx("Pending dma %p, key %p", (void *)dma, (void *)key);
+}
+
+static void handle_input(int fd, struct device_list *devices)
+{
+ struct timeval poll = { .tv_sec = 0, .tv_usec = 0 };
+
+ for (;;) {
+ struct device *i;
+ fd_set fds = devices->infds;
+
+ if (select(devices->max_infd+1, &fds, NULL, NULL, &poll) == 0)
+ break;
+
+ for (i = devices->dev; i; i = i->next) {
+ if (i->handle_input && FD_ISSET(i->fd, &fds)) {
+ if (!i->handle_input(fd, i)) {
+ FD_CLR(i->fd, &devices->infds);
+ /* Tell waker to ignore it too... */
+ write(waker_fd, &i->fd, sizeof(i->fd));
+ }
+ }
+ }
+ }
+}
+
+static struct lguest_device_desc *new_dev_desc(u16 type, u16 features,
+ u16 num_pages)
+{
+ static unsigned long top = LGUEST_GUEST_TOP;
+ struct lguest_device_desc *desc;
+
+ desc = malloc(sizeof(*desc));
+ desc->type = type;
+ desc->num_pages = num_pages;
+ desc->features = features;
+ desc->status = 0;
+ if (num_pages) {
+ top -= num_pages*getpagesize();
+ map_zeroed_pages(top, num_pages);
+ desc->pfn = top / getpagesize();
+ } else
+ desc->pfn = 0;
+ return desc;
+}
+
+static struct device *new_device(struct device_list *devices,
+ u16 type, u16 num_pages, u16 features,
+ int fd,
+ bool (*handle_input)(int, struct device *),
+ unsigned long watch_off,
+ u32 (*handle_output)(int,
+ const struct iovec *,
+ unsigned,
+ struct device *))
+{
+ struct device *dev = malloc(sizeof(*dev));
+
+ /* Append to device list. */
+ *devices->lastdev = dev;
+ dev->next = NULL;
+ devices->lastdev = &dev->next;
+
+ dev->fd = fd;
+ if (handle_input)
+ set_fd(dev->fd, devices);
+ dev->desc = new_dev_desc(type, features, num_pages);
+ dev->mem = (void *)(dev->desc->pfn * getpagesize());
+ dev->handle_input = handle_input;
+ dev->watch_key = (unsigned long)dev->mem + watch_off;
+ dev->handle_output = handle_output;
+ return dev;
+}
+
+static void setup_console(struct device_list *devices)
+{
+ struct device *dev;
+
+ if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
+ struct termios term = orig_term;
+ term.c_lflag &= ~(ISIG|ICANON|ECHO);
+ tcsetattr(STDIN_FILENO, TCSANOW, &term);
+ atexit(restore_term);
+ }
+
+ /* We don't currently require a page for the console. */
+ dev = new_device(devices, LGUEST_DEVICE_T_CONSOLE, 0, 0,
+ STDIN_FILENO, handle_console_input,
+ LGUEST_CONSOLE_DMA_KEY, handle_console_output);
+ dev->priv = malloc(sizeof(struct console_abort));
+ ((struct console_abort *)dev->priv)->count = 0;
+ verbose("device %p: console\n",
+ (void *)(dev->desc->pfn * getpagesize()));
+}
+
+static void setup_block_file(const char *filename, struct device_list *devices)
+{
+ int fd;
+ struct device *dev;
+ off64_t *device_len;
+ struct lguest_block_page *p;
+
+ fd = open_or_die(filename, O_RDWR|O_LARGEFILE|O_DIRECT);
+ dev = new_device(devices, LGUEST_DEVICE_T_BLOCK, 1,
+ LGUEST_DEVICE_F_RANDOMNESS,
+ fd, NULL, 0, handle_block_output);
+ device_len = dev->priv = malloc(sizeof(*device_len));
+ *device_len = lseek64(fd, 0, SEEK_END);
+ p = dev->mem;
+
+ p->num_sectors = *device_len/512;
+ verbose("device %p: block %i sectors\n",
+ (void *)(dev->desc->pfn * getpagesize()), p->num_sectors);
+}
+
+/* We use fnctl locks to reserve network slots (autocleanup!) */
+static unsigned int find_slot(int netfd, const char *filename)
+{
+ struct flock fl;
+
+ fl.l_type = F_WRLCK;
+ fl.l_whence = SEEK_SET;
+ fl.l_len = 1;
+ for (fl.l_start = 0;
+ fl.l_start < getpagesize()/sizeof(struct lguest_net);
+ fl.l_start++) {
+ if (fcntl(netfd, F_SETLK, &fl) == 0)
+ return fl.l_start;
+ }
+ errx(1, "No free slots in network file %s", filename);
+}
+
+static void setup_net_file(const char *filename,
+ struct device_list *devices)
+{
+ int netfd;
+ struct device *dev;
+
+ netfd = open(filename, O_RDWR, 0);
+ if (netfd < 0) {
+ if (errno == ENOENT) {
+ netfd = open(filename, O_RDWR|O_CREAT, 0600);
+ if (netfd >= 0) {
+ char page[getpagesize()];
+ memset(page, 0, sizeof(page));
+ write(netfd, page, sizeof(page));
+ }
+ }
+ if (netfd < 0)
+ err(1, "cannot open net file '%s'", filename);
+ }
+
+ dev = new_device(devices, LGUEST_DEVICE_T_NET, 1,
+ find_slot(netfd, filename)|LGUEST_NET_F_NOCSUM,
+ -1, NULL, 0, NULL);
+
+ /* We overwrite the /dev/zero mapping with the actual file. */
+ if (mmap(dev->mem, getpagesize(), PROT_READ|PROT_WRITE,
+ MAP_FIXED|MAP_SHARED, netfd, 0) != dev->mem)
+ err(1, "could not mmap '%s'", filename);
+ verbose("device %p: shared net %s, peer %i\n",
+ (void *)(dev->desc->pfn * getpagesize()), filename,
+ dev->desc->features & ~LGUEST_NET_F_NOCSUM);
+}
+
+static u32 str2ip(const char *ipaddr)
+{
+ unsigned int byte[4];
+
+ sscanf(ipaddr, "%u.%u.%u.%u", &byte[0], &byte[1], &byte[2], &byte[3]);
+ return (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3];
+}
+
+/* adapted from libbridge */
+static void add_to_bridge(int fd, const char *if_name, const char *br_name)
+{
+ int ifidx;
+ struct ifreq ifr;
+
+ if (!*br_name)
+ errx(1, "must specify bridge name");
+
+ ifidx = if_nametoindex(if_name);
+ if (!ifidx)
+ errx(1, "interface %s does not exist!", if_name);
+
+ strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
+ ifr.ifr_ifindex = ifidx;
+ if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
+ err(1, "can't add %s to bridge %s", if_name, br_name);
+}
+
+static void configure_device(int fd, const char *devname, u32 ipaddr,
+ unsigned char hwaddr[6])
+{
+ struct ifreq ifr;
+ struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
+
+ memset(&ifr, 0, sizeof(ifr));
+ strcpy(ifr.ifr_name, devname);
+ sin->sin_family = AF_INET;
+ sin->sin_addr.s_addr = htonl(ipaddr);
+ if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
+ err(1, "Setting %s interface address", devname);
+ ifr.ifr_flags = IFF_UP;
+ if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
+ err(1, "Bringing interface %s up", devname);
+
+ if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
+ err(1, "getting hw address for %s", devname);
+
+ memcpy(hwaddr, ifr.ifr_hwaddr.sa_data, 6);
+}
+
+static void setup_tun_net(const char *arg, struct device_list *devices)
+{
+ struct device *dev;
+ struct ifreq ifr;
+ int netfd, ipfd;
+ u32 ip;
+ const char *br_name = NULL;
+
+ netfd = open_or_die("/dev/net/tun", O_RDWR);
+ memset(&ifr, 0, sizeof(ifr));
+ ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+ strcpy(ifr.ifr_name, "tap%d");
+ if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
+ err(1, "configuring /dev/net/tun");
+ ioctl(netfd, TUNSETNOCSUM, 1);
+
+ /* You will be peer 1: we should create enough jitter to randomize */
+ dev = new_device(devices, LGUEST_DEVICE_T_NET, 1,
+ NET_PEERNUM|LGUEST_DEVICE_F_RANDOMNESS, netfd,
+ handle_tun_input, peer_offset(0), handle_tun_output);
+ dev->priv = malloc(sizeof(bool));
+ *(bool *)dev->priv = false;
+
+ ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+ if (ipfd < 0)
+ err(1, "opening IP socket");
+
+ if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
+ ip = INADDR_ANY;
+ br_name = arg + strlen(BRIDGE_PFX);
+ add_to_bridge(ipfd, ifr.ifr_name, br_name);
+ } else
+ ip = str2ip(arg);
+
+ /* We are peer 0, ie. first slot. */
+ configure_device(ipfd, ifr.ifr_name, ip, dev->mem);
+
+ /* Set "promisc" bit: we want every single packet. */
+ *((u8 *)dev->mem) |= 0x1;
+
+ close(ipfd);
+
+ verbose("device %p: tun net %u.%u.%u.%u\n",
+ (void *)(dev->desc->pfn * getpagesize()),
+ (u8)(ip>>24), (u8)(ip>>16), (u8)(ip>>8), (u8)ip);
+ if (br_name)
+ verbose("attached to bridge: %s\n", br_name);
+}
+
+/* Now we know how much memory we have, we copy in device descriptors */
+static void map_device_descriptors(struct device_list *devs, unsigned long mem)
+{
+ struct device *i;
+ unsigned int num;
+ struct lguest_device_desc *descs;
+
+ /* Device descriptor array sits just above top of normal memory */
+ descs = map_zeroed_pages(mem, 1);
+
+ for (i = devs->dev, num = 0; i; i = i->next, num++) {
+ if (num == LGUEST_MAX_DEVICES)
+ errx(1, "too many devices");
+ verbose("Device %i: %s\n", num,
+ i->desc->type == LGUEST_DEVICE_T_NET ? "net"
+ : i->desc->type == LGUEST_DEVICE_T_CONSOLE ? "console"
+ : i->desc->type == LGUEST_DEVICE_T_BLOCK ? "block"
+ : "unknown");
+ descs[num] = *i->desc;
+ free(i->desc);
+ i->desc = &descs[num];
+ }
+}
+
+static void __attribute__((noreturn))
+run_guest(int lguest_fd, struct device_list *device_list)
+{
+ for (;;) {
+ u32 args[] = { LHREQ_BREAK, 0 };
+ unsigned long arr[2];
+ int readval;
+
+ /* We read from the /dev/lguest device to run the Guest. */
+ readval = read(lguest_fd, arr, sizeof(arr));
+
+ if (readval == sizeof(arr)) {
+ handle_output(lguest_fd, arr[0], arr[1], device_list);
+ continue;
+ } else if (errno == ENOENT) {
+ char reason[1024] = { 0 };
+ read(lguest_fd, reason, sizeof(reason)-1);
+ errx(1, "%s", reason);
+ } else if (errno != EAGAIN)
+ err(1, "Running guest failed");
+ handle_input(lguest_fd, device_list);
+ if (write(lguest_fd, args, sizeof(args)) < 0)
+ err(1, "Resetting break");
+ }
+}
+
+static struct option opts[] = {
+ { "verbose", 0, NULL, 'v' },
+ { "sharenet", 1, NULL, 's' },
+ { "tunnet", 1, NULL, 't' },
+ { "block", 1, NULL, 'b' },
+ { "initrd", 1, NULL, 'i' },
+ { NULL },
+};
+static void usage(void)
+{
+ errx(1, "Usage: lguest [--verbose] "
+ "[--sharenet=<filename>|--tunnet=(<ipaddr>|bridge:<bridgename>)\n"
+ "|--block=<filename>|--initrd=<filename>]...\n"
+ "<mem-in-mb> vmlinux [args...]");
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long mem, pgdir, start, page_offset, initrd_size = 0;
+ int c, lguest_fd;
+ struct device_list device_list;
+ void *boot = (void *)0;
+ const char *initrd_name = NULL;
+
+ device_list.max_infd = -1;
+ device_list.dev = NULL;
+ device_list.lastdev = &device_list.dev;
+ FD_ZERO(&device_list.infds);
+
+ while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) {
+ switch (c) {
+ case 'v':
+ verbose = true;
+ break;
+ case 's':
+ setup_net_file(optarg, &device_list);
+ break;
+ case 't':
+ setup_tun_net(optarg, &device_list);
+ break;
+ case 'b':
+ setup_block_file(optarg, &device_list);
+ break;
+ case 'i':
+ initrd_name = optarg;
+ break;
+ default:
+ warnx("Unknown argument %s", argv[optind]);
+ usage();
+ }
+ }
+ if (optind + 2 > argc)
+ usage();
+
+ /* We need a console device */
+ setup_console(&device_list);
+
+ /* First we map /dev/zero over all of guest-physical memory. */
+ mem = atoi(argv[optind]) * 1024 * 1024;
+ map_zeroed_pages(0, mem / getpagesize());
+
+ /* Now we load the kernel */
+ start = load_kernel(open_or_die(argv[optind+1], O_RDONLY),
+ &page_offset);
+
+ /* Write the device descriptors into memory. */
+ map_device_descriptors(&device_list, mem);
+
+ /* Map the initrd image if requested */
+ if (initrd_name) {
+ initrd_size = load_initrd(initrd_name, mem);
+ *(unsigned long *)(boot+0x218) = mem - initrd_size;
+ *(unsigned long *)(boot+0x21c) = initrd_size;
+ *(unsigned char *)(boot+0x210) = 0xFF;
+ }
+
+ /* Set up the initial linar pagetables. */
+ pgdir = setup_pagetables(mem, initrd_size, page_offset);
+
+ /* E820 memory map: ours is a simple, single region. */
+ *(char*)(boot+E820NR) = 1;
+ *((struct e820entry *)(boot+E820MAP))
+ = ((struct e820entry) { 0, mem, E820_RAM });
+ /* Command line pointer and command line (at 4096) */
+ *(void **)(boot + 0x228) = boot + 4096;
+ concat(boot + 4096, argv+optind+2);
+ /* Paravirt type: 1 == lguest */
+ *(int *)(boot + 0x23c) = 1;
+
+ lguest_fd = tell_kernel(pgdir, start, page_offset);
+ waker_fd = setup_waker(lguest_fd, &device_list);
+
+ run_guest(lguest_fd, &device_list);
+}
--- /dev/null
+Rusty's Remarkably Unreliable Guide to Lguest
+ - or, A Young Coder's Illustrated Hypervisor
+http://lguest.ozlabs.org
+
+Lguest is designed to be a minimal hypervisor for the Linux kernel, for
+Linux developers and users to experiment with virtualization with the
+minimum of complexity. Nonetheless, it should have sufficient
+features to make it useful for specific tasks, and, of course, you are
+encouraged to fork and enhance it.
+
+Features:
+
+- Kernel module which runs in a normal kernel.
+- Simple I/O model for communication.
+- Simple program to create new guests.
+- Logo contains cute puppies: http://lguest.ozlabs.org
+
+Developer features:
+
+- Fun to hack on.
+- No ABI: being tied to a specific kernel anyway, you can change anything.
+- Many opportunities for improvement or feature implementation.
+
+Running Lguest:
+
+- Lguest runs the same kernel as guest and host. You can configure
+ them differently, but usually it's easiest not to.
+
+ You will need to configure your kernel with the following options:
+
+ CONFIG_HIGHMEM64G=n ("High Memory Support" "64GB")[1]
+ CONFIG_TUN=y/m ("Universal TUN/TAP device driver support")
+ CONFIG_EXPERIMENTAL=y ("Prompt for development and/or incomplete code/drivers")
+ CONFIG_PARAVIRT=y ("Paravirtualization support (EXPERIMENTAL)")
+ CONFIG_LGUEST=y/m ("Linux hypervisor example code")
+
+ and I recommend:
+ CONFIG_HZ=100 ("Timer frequency")[2]
+
+- A tool called "lguest" is available in this directory: type "make"
+ to build it. If you didn't build your kernel in-tree, use "make
+ O=<builddir>".
+
+- Create or find a root disk image. There are several useful ones
+ around, such as the xm-test tiny root image at
+ http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img
+
+ For more serious work, I usually use a distribution ISO image and
+ install it under qemu, then make multiple copies:
+
+ dd if=/dev/zero of=rootfile bs=1M count=2048
+ qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d
+
+- "modprobe lg" if you built it as a module.
+
+- Run an lguest as root:
+
+ Documentation/lguest/lguest 64m vmlinux --tunnet=192.168.19.1 --block=rootfile root=/dev/lgba
+
+ Explanation:
+ 64m: the amount of memory to use.
+
+ vmlinux: the kernel image found in the top of your build directory. You
+ can also use a standard bzImage.
+
+ --tunnet=192.168.19.1: configures a "tap" device for networking with this
+ IP address.
+
+ --block=rootfile: a file or block device which becomes /dev/lgba
+ inside the guest.
+
+ root=/dev/lgba: this (and anything else on the command line) are
+ kernel boot parameters.
+
+- Configuring networking. I usually have the host masquerade, using
+ "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 >
+ /proc/sys/net/ipv4/ip_forward". In this example, I would configure
+ eth0 inside the guest at 192.168.19.2.
+
+ Another method is to bridge the tap device to an external interface
+ using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest
+ to obtain an IP address. The bridge needs to be configured first:
+ this option simply adds the tap interface to it.
+
+ A simple example on my system:
+
+ ifconfig eth0 0.0.0.0
+ brctl addbr lg0
+ ifconfig lg0 up
+ brctl addif lg0 eth0
+ dhclient lg0
+
+ Then use --tunnet=bridge:lg0 when launching the guest.
+
+ See http://linux-net.osdl.org/index.php/Bridge for general information
+ on how to get bridging working.
+
+- You can also create an inter-guest network using
+ "--sharenet=<filename>": any two guests using the same file are on
+ the same network. This file is created if it does not exist.
+
+Lguest I/O model:
+
+Lguest uses a simplified DMA model plus shared memory for I/O. Guests
+can communicate with each other if they share underlying memory
+(usually by the lguest program mmaping the same file), but they can
+use any non-shared memory to communicate with the lguest process.
+
+Guests can register DMA buffers at any key (must be a valid physical
+address) using the LHCALL_BIND_DMA(key, dmabufs, num<<8|irq)
+hypercall. "dmabufs" is the physical address of an array of "num"
+"struct lguest_dma": each contains a used_len, and an array of
+physical addresses and lengths. When a transfer occurs, the
+"used_len" field of one of the buffers which has used_len 0 will be
+set to the length transferred and the irq will fire.
+
+Using an irq value of 0 unbinds the dma buffers.
+
+To send DMA, the LHCALL_SEND_DMA(key, dma_physaddr) hypercall is used,
+and the bytes used is written to the used_len field. This can be 0 if
+noone else has bound a DMA buffer to that key or some other error.
+DMA buffers bound by the same guest are ignored.
+
+Cheers!
+Rusty Russell rusty@rustcorp.com.au.
+
+[1] These are on various places on the TODO list, waiting for you to
+ get annoyed enough at the limitation to fix it.
+[2] Lguest is not yet tickless when idle. See [1].
--- /dev/null
+Suspend notifiers
+ (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+There are some operations that device drivers may want to carry out in their
+.suspend() routines, but shouldn't, because they can cause the hibernation or
+suspend to fail. For example, a driver may want to allocate a substantial amount
+of memory (like 50 MB) in .suspend(), but that shouldn't be done after the
+swsusp's memory shrinker has run.
+
+Also, there may be some operations, that subsystems want to carry out before a
+hibernation/suspend or after a restore/resume, requiring the system to be fully
+functional, so the drivers' .suspend() and .resume() routines are not suitable
+for this purpose. For example, device drivers may want to upload firmware to
+their devices after a restore from a hibernation image, but they cannot do it by
+calling request_firmware() from their .resume() routines (user land processes
+are frozen at this point). The solution may be to load the firmware into
+memory before processes are frozen and upload it from there in the .resume()
+routine. Of course, a hibernation notifier may be used for this purpose.
+
+The subsystems that have such needs can register suspend notifiers that will be
+called upon the following events by the suspend core:
+
+PM_HIBERNATION_PREPARE The system is going to hibernate or suspend, tasks will
+ be frozen immediately.
+
+PM_POST_HIBERNATION The system memory state has been restored from a
+ hibernation image or an error occured during the
+ hibernation. Device drivers' .resume() callbacks have
+ been executed and tasks have been thawed.
+
+PM_SUSPEND_PREPARE The system is preparing for a suspend.
+
+PM_POST_SUSPEND The system has just resumed or an error occured during
+ the suspend. Device drivers' .resume() callbacks have
+ been executed and tasks have been thawed.
+
+It is generally assumed that whatever the notifiers do for
+PM_HIBERNATION_PREPARE, should be undone for PM_POST_HIBERNATION. Analogously,
+operations performed for PM_SUSPEND_PREPARE should be reversed for
+PM_POST_SUSPEND. Additionally, all of the notifiers are called for
+PM_POST_HIBERNATION if one of them fails for PM_HIBERNATION_PREPARE, and
+all of the notifiers are called for PM_POST_SUSPEND if one of them fails for
+PM_SUSPEND_PREPARE.
+
+The hibernation and suspend notifiers are called with pm_mutex held. They are
+defined in the usual way, but their last argument is meaningless (it is always
+NULL). To register and/or unregister a suspend notifier use the functions
+register_pm_notifier() and unregister_pm_notifier(), respectively, defined in
+include/linux/suspend.h . If you don't need to unregister the notifier, you can
+also use the pm_notifier() macro defined in include/linux/suspend.h .
always := $(offsets-file)
targets := $(offsets-file)
targets += arch/$(ARCH)/kernel/asm-offsets.s
+clean-files := $(addprefix $(objtree)/,$(targets))
# Default sed regexp - multiline due to syntax constraints
define sed-y
W: http://xf.iksaif.net/acpi4asus
S: Maintained
+ASUS ASB100 HARDWARE MONITOR DRIVER
+P: Mark M. Hoffman
+M: mhoffman@lightlink.com
+L: lm-sensors@lm-sensors.org
+S: Maintained
+
ASUS LAPTOP EXTRAS DRIVER
P: Corentin Chary
M: corentincj@iksaif.net
L: linux-kernel@vger.kernel.org
S: Maintained
+DME1737 HARDWARE MONITOR DRIVER
+P: Juerg Haefliger
+M: juergh@gmail.com
+L: lm-sensors@lm-sensors.org
+S: Maintained
+
DOCBOOK FOR DOCUMENTATION
P: Randy Dunlap
M: rdunlap@xenotime.net
EDAC-CORE
P: Doug Thompson
-M: norsk5@xmission.com
+M: dougthompson@xmission.com
L: bluesmoke-devel@lists.sourceforge.net
W: bluesmoke.sourceforge.net
S: Supported
EDAC-E752X
P: Mark Gross
+P: Doug Thompson
M: mark.gross@intel.com
+M: dougthompson@xmission.com
L: bluesmoke-devel@lists.sourceforge.net
W: bluesmoke.sourceforge.net
S: Maintained
EDAC-E7XXX
P: Doug Thompson
-M: norsk5@xmission.com
+M: dougthompson@xmission.com
+L: bluesmoke-devel@lists.sourceforge.net
+W: bluesmoke.sourceforge.net
+S: Maintained
+
+EDAC-I82443BXGX
+P: Tim Small
+M: tim@buttersideup.com
+L: bluesmoke-devel@lists.sourceforge.net
+W: bluesmoke.sourceforge.net
+S: Maintained
+
+EDAC-I3000
+P: Jason Uhlenkott
+M: juhlenko@akamai.com
+L: bluesmoke-devel@lists.sourceforge.net
+W: bluesmoke.sourceforge.net
+S: Maintained
+
+EDAC-I5000
+P: Doug Thompson
+M: dougthompson@xmission.com
+L: bluesmoke-devel@lists.sourceforge.net
+W: bluesmoke.sourceforge.net
+S: Maintained
+
+EDAC-I82975X
+P: Ranganathan Desikan
+P: Arvind R.
+M: rdesikan@jetzbroadband.com
+M: arvind@acarlab.com
+L: bluesmoke-devel@lists.sourceforge.net
+W: bluesmoke.sourceforge.net
+S: Maintained
+
+EDAC-PASEMI
+P: Egor Martovetsky
+M: egor@pasemi.com
L: bluesmoke-devel@lists.sourceforge.net
W: bluesmoke.sourceforge.net
S: Maintained
S: Maintained
HARDWARE MONITORING
-P: Jean Delvare
-M: khali@linux-fr.org
+P: Mark M. Hoffman
+M: mhoffman@lightlink.com
L: lm-sensors@lm-sensors.org
W: http://www.lm-sensors.org/
-T: quilt http://khali.linux-fr.org/devel/linux-2.6/jdelvare-hwmon/
+T: git lm-sensors.org:/kernel/mhoffman/hwmon-2.6.git
S: Maintained
HARDWARE RANDOM NUMBER GENERATOR CORE
M: wli@holomorphy.com
S: Maintained
+I2C/SMBUS STUB DRIVER
+P: Mark M. Hoffman
+M: mhoffman@lightlink.com
+L: lm-sensors@lm-sensors.org
+S: Maintained
+
I2C SUBSYSTEM
P: Jean Delvare
M: khali@linux-fr.org
S: Maintained
POSIX CLOCKS and TIMERS
-P: George Anzinger
-M: george@mvista.com
+P: Thomas Gleixner
+M: tglx@linutronix.de
L: linux-kernel@vger.kernel.org
S: Supported
L: netdev@vger.kernel.org
S: Maintained
+SIS 96X I2C/SMBUS DRIVER
+P: Mark M. Hoffman
+M: mhoffman@lightlink.com
+L: lm-sensors@lm-sensors.org
+S: Maintained
+
SIS FRAMEBUFFER DRIVER
P: Thomas Winischhofer
M: thomas@winischhofer.net
M: nico@cam.org
S: Maintained
+SMSC47B397 HARDWARE MONITOR DRIVER
+P: Mark M. Hoffman
+M: mhoffman@lightlink.com
+L: lm-sensors@lm-sensors.org
+S: Maintained
+
SOFTMAC LAYER (IEEE 802.11)
P: Johannes Berg
M: johannes@sipsolutions.net
L: linux-raid@vger.kernel.org
S: Supported
-SOFTWARE SUSPEND:
+HIBERNATION (aka Software Suspend, aka swsusp):
+P: Pavel Machek
+M: pavel@suse.cz
+P: Rafael J. Wysocki
+M: rjw@sisk.pl
+L: linux-pm@lists.linux-foundation.org
+S: Supported
+
+SUSPEND TO RAM:
P: Pavel Machek
M: pavel@suse.cz
+P: Rafael J. Wysocki
+M: rjw@sisk.pl
L: linux-pm@lists.linux-foundation.org
S: Maintained
include $(srctree)/arch/$(ARCH)/Makefile
ifdef CONFIG_FRAME_POINTER
-CFLAGS += -fno-omit-frame-pointer $(call cc-option,-fno-optimize-sibling-calls,)
+CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
else
CFLAGS += -fomit-frame-pointer
endif
# disable pointer signed / unsigned warnings in gcc 4.0
CFLAGS += $(call cc-option,-Wno-pointer-sign,)
+# Use --build-id when available.
+LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\
+ $(call ld-option, -Wl$(comma)--build-id,))
+LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID)
+LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID)
+
# Default kernel image to build when no specific target is given.
# KBUILD_IMAGE may be overruled on the command line or
# set in the environment
cmd_vmlinux__ ?= $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) -o $@ \
-T $(vmlinux-lds) $(vmlinux-init) \
--start-group $(vmlinux-main) --end-group \
- $(filter-out $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) FORCE ,$^)
+ $(filter-out $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) vmlinux.o FORCE ,$^)
# Generate new vmlinux version
quiet_cmd_vmlinux_version = GEN .version
endif # ifdef CONFIG_KALLSYMS
+# Do modpost on a prelinked vmlinux. The finally linked vmlinux has
+# relevant sections renamed as per the linker script.
+quiet_cmd_vmlinux-modpost = LD $@
+ cmd_vmlinux-modpost = $(LD) $(LDFLAGS) -r -o $@ \
+ $(vmlinux-init) --start-group $(vmlinux-main) --end-group \
+ $(filter-out $(vmlinux-init) $(vmlinux-main) $(vmlinux-lds) FORCE ,$^)
+define rule_vmlinux-modpost
+ :
+ +$(call cmd,vmlinux-modpost)
+ $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost $@
+ $(Q)echo 'cmd_$@ := $(cmd_vmlinux-modpost)' > $(dot-target).cmd
+endef
+
# vmlinux image - including updated kernel symbols
-vmlinux: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) $(kallsyms.o) FORCE
+vmlinux: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) $(kallsyms.o) vmlinux.o FORCE
ifdef CONFIG_HEADERS_CHECK
$(Q)$(MAKE) -f $(srctree)/Makefile headers_check
endif
+ $(call vmlinux-modpost)
$(call if_changed_rule,vmlinux__)
- $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost $@
$(Q)rm -f .old_version
+vmlinux.o: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) $(kallsyms.o) FORCE
+ $(call if_changed_rule,vmlinux-modpost)
+
# The actual objects are generated when descending,
# make sure no implicit rule kicks in
$(sort $(vmlinux-init) $(vmlinux-main)) $(vmlinux-lds): $(vmlinux-dirs) ;
-I __initdata,__exitdata,__acquires,__releases \
-I EXPORT_SYMBOL,EXPORT_SYMBOL_GPL \
--extra=+f --c-kinds=+px \
- --regex-asm='/ENTRY\(([^)]*)\).*/\1/'; \
+ --regex-asm='/^ENTRY\(([^)]*)\).*/\1/'; \
$(all-kconfigs) | xargs $1 -a \
--langdef=kconfig \
--language-force=kconfig \
}
nsyms = symtab->sh_size / sizeof(Elf64_Sym);
- chains = kmalloc(nsyms * sizeof(struct got_entry), GFP_KERNEL);
- memset(chains, 0, nsyms * sizeof(struct got_entry));
+ chains = kcalloc(nsyms, sizeof(struct got_entry), GFP_KERNEL);
got->sh_size = 0;
got->sh_addralign = 8;
OUTPUT_FORMAT("elf64-alpha")
OUTPUT_ARCH(alpha)
ENTRY(__start)
-PHDRS { kernel PT_LOAD ; }
+PHDRS { kernel PT_LOAD; note PT_NOTE; }
jiffies = jiffies_64;
SECTIONS
{
__ex_table : { *(__ex_table) }
__stop___ex_table = .;
+ NOTES :kernel :note
+ .dummy : { *(.dummy) } :kernel
+
RODATA
/* Will be freed after init */
. = ALIGN(8);
SECURITY_INIT
- . = ALIGN(8192);
- __per_cpu_start = .;
- .data.percpu : { *(.data.percpu) }
- __per_cpu_end = .;
+ PERCPU(8192)
. = ALIGN(2*8192);
__init_end = .;
the fault. */
fault = handle_mm_fault(mm, vma, address, cause > 0);
up_read(&mm->mmap_sem);
-
- switch (fault) {
- case VM_FAULT_MINOR:
- current->min_flt++;
- break;
- case VM_FAULT_MAJOR:
- current->maj_flt++;
- break;
- case VM_FAULT_SIGBUS:
- goto do_sigbus;
- case VM_FAULT_OOM:
- goto out_of_memory;
- default:
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ goto do_sigbus;
BUG();
}
+ if (fault & VM_FAULT_MAJOR)
+ current->maj_flt++;
+ else
+ current->min_flt++;
return;
/* Something tried to access memory that isn't in our memory map.
# I2C Hardware Bus support
#
CONFIG_I2C_ELEKTOR=m
-# CONFIG_I2C_ISA is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_STUB is not set
# I2C Hardware Bus support
#
# CONFIG_I2C_ELEKTOR is not set
-# CONFIG_I2C_ISA is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_PCA_ISA is not set
# CONFIG_I2C_ELEKTOR is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_I810 is not set
-# CONFIG_I2C_ISA is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# I2C Hardware Bus support
#
# CONFIG_I2C_ELEKTOR is not set
-# CONFIG_I2C_ISA is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_PCA_ISA is not set
# I2C Hardware Bus support
#
CONFIG_I2C_AT91=m
-CONFIG_I2C_ISA=m
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_STUB is not set
#
# I2C Hardware Bus support
#
-# CONFIG_I2C_ISA is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_STUB is not set
# I2C Hardware Bus support
#
# CONFIG_I2C_ELEKTOR is not set
-CONFIG_I2C_ISA=m
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
. = ALIGN(4096);
__per_cpu_start = .;
*(.data.percpu)
+ *(.data.percpu.shared_aligned)
__per_cpu_end = .;
#ifndef CONFIG_XIP_KERNEL
__init_begin = _stext;
if (nr > 1)
return 0;
- res = kmalloc(sizeof(struct resource) * 2, GFP_KERNEL);
+ res = kcalloc(2, sizeof(struct resource), GFP_KERNEL);
if (!res)
panic("PCI: unable to alloc resources");
- memset(res, 0, sizeof(struct resource) * 2);
/* 'nr' assumptions:
* ATUX is always 0
*/
survive:
fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, fsr & (1 << 11));
-
- /*
- * Handle the "normal" cases first - successful and sigbus
- */
- switch (fault) {
- case VM_FAULT_MAJOR:
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ return fault;
+ BUG();
+ }
+ if (fault & VM_FAULT_MAJOR)
tsk->maj_flt++;
- return fault;
- case VM_FAULT_MINOR:
+ else
tsk->min_flt++;
- case VM_FAULT_SIGBUS:
- return fault;
- }
+ return fault;
+out_of_memory:
if (!is_init(tsk))
goto out;
/*
* Handle the "normal" case first - VM_FAULT_MAJOR / VM_FAULT_MINOR
*/
- if (fault >= VM_FAULT_MINOR)
+ if (likely(!(fault & VM_FAULT_ERROR)))
return 0;
/*
if (!user_mode(regs))
goto no_context;
- switch (fault) {
- case VM_FAULT_OOM:
+ if (fault & VM_FAULT_OOM) {
/*
* We ran out of memory, or some other thing
* happened to us that made us unable to handle
printk("VM: killing process %s\n", tsk->comm);
do_exit(SIGKILL);
return 0;
-
- case VM_FAULT_SIGBUS:
+ }
+ if (fault & VM_FAULT_SIGBUS) {
/*
* We had some memory, but were unable to
* successfully fix up this page fault.
*/
sig = SIGBUS;
code = BUS_ADRERR;
- break;
-
- default:
+ } else {
/*
* Something tried to access memory that
* isn't in our memory map..
sig = SIGSEGV;
code = fault == VM_FAULT_BADACCESS ?
SEGV_ACCERR : SEGV_MAPERR;
- break;
}
__do_user_fault(tsk, addr, fsr, sig, code, regs);
*/
survive:
fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, DO_COW(fsr));
-
- /*
- * Handle the "normal" cases first - successful and sigbus
- */
- switch (fault) {
- case VM_FAULT_MAJOR:
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ return fault;
+ BUG();
+ }
+ if (fault & VM_FAULT_MAJOR)
tsk->maj_flt++;
- return fault;
- case VM_FAULT_MINOR:
+ else
tsk->min_flt++;
- case VM_FAULT_SIGBUS:
- return fault;
- }
+ return fault;
+out_of_memory:
fault = -3; /* out of memory */
if (!is_init(tsk))
goto out;
/*
* Handle the "normal" case first
*/
- switch (fault) {
- case VM_FAULT_MINOR:
- case VM_FAULT_MAJOR:
+ if (likely(!(fault & VM_FAULT_ERROR)))
return 0;
- case VM_FAULT_SIGBUS:
+ if (fault & VM_FAULT_SIGBUS)
goto do_sigbus;
- }
+ /* else VM_FAULT_OOM */
/*
* If we are in kernel mode at this point, we
int writeaccess;
long signr;
int code;
+ int fault;
if (notify_page_fault(regs, ecr))
return;
* fault.
*/
survive:
- switch (handle_mm_fault(mm, vma, address, writeaccess)) {
- case VM_FAULT_MINOR:
- tsk->min_flt++;
- break;
- case VM_FAULT_MAJOR:
- tsk->maj_flt++;
- break;
- case VM_FAULT_SIGBUS:
- goto do_sigbus;
- case VM_FAULT_OOM:
- goto out_of_memory;
- default:
+ fault = handle_mm_fault(mm, vma, address, writeaccess);
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ goto do_sigbus;
BUG();
}
+ if (fault & VM_FAULT_MAJOR)
+ tsk->maj_flt++;
+ else
+ tsk->min_flt++;
up_read(&mm->mmap_sem);
return;
struct sram_list_struct *lsl = NULL;
struct mm_struct *mm = current->mm;
- lsl = kmalloc(sizeof(struct sram_list_struct), GFP_KERNEL);
+ lsl = kzalloc(sizeof(struct sram_list_struct), GFP_KERNEL);
if (!lsl)
return NULL;
- memset(lsl, 0, sizeof(*lsl));
if (flags & L1_INST_SRAM)
addr = l1_inst_sram_alloc(size);
void __exit
pcf8563_exit(void)
{
- if (unregister_chrdev(PCF8563_MAJOR, DEVICE_NAME) < 0) {
- printk(KERN_INFO "%s: Unable to unregister device.\n", PCF8563_NAME);
- }
+ unregister_chrdev(PCF8563_MAJOR, DEVICE_NAME);
}
/*
___data_start = . ;
__Sdata = . ;
.data : { /* Data */
- *(.data)
+ DATA_DATA
}
__edata = . ; /* End of data section */
_edata = . ;
void __exit
pcf8563_exit(void)
{
- if (unregister_chrdev(PCF8563_MAJOR, DEVICE_NAME) < 0) {
- printk(KERN_INFO "%s: Unable to unregister device.\n", PCF8563_NAME);
- }
+ unregister_chrdev(PCF8563_MAJOR, DEVICE_NAME);
}
/*
if (!mem_base)
goto out;
- dev->dma_mem = kmalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
+ dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
if (!dev->dma_mem)
goto out;
- memset(dev->dma_mem, 0, sizeof(struct dma_coherent_mem));
- dev->dma_mem->bitmap = kmalloc(bitmap_size, GFP_KERNEL);
+ dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
if (!dev->dma_mem->bitmap)
goto free1_out;
- memset(dev->dma_mem->bitmap, 0, bitmap_size);
dev->dma_mem->virt_base = mem_base;
dev->dma_mem->device_base = device_addr;
___data_start = . ;
__Sdata = . ;
.data : { /* Data */
- *(.data)
+ DATA_DATA
}
__edata = . ; /* End of data section. */
_edata = . ;
}
SECURITY_INIT
- . = ALIGN (8192);
- __per_cpu_start = .;
- .data.percpu : { *(.data.percpu) }
- __per_cpu_end = .;
+ PERCPU(8192)
#ifdef CONFIG_BLK_DEV_INITRD
.init.ramfs : {
struct mm_struct *mm;
struct vm_area_struct * vma;
siginfo_t info;
+ int fault;
D(printk("Page fault for %lX on %X at %lX, prot %d write %d\n",
address, smp_processor_id(), instruction_pointer(regs),
* the fault.
*/
- switch (handle_mm_fault(mm, vma, address, writeaccess & 1)) {
- case VM_FAULT_MINOR:
- tsk->min_flt++;
- break;
- case VM_FAULT_MAJOR:
- tsk->maj_flt++;
- break;
- case VM_FAULT_SIGBUS:
- goto do_sigbus;
- default:
- goto out_of_memory;
+ fault = handle_mm_fault(mm, vma, address, writeaccess & 1);
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ goto do_sigbus;
+ BUG();
}
+ if (fault & VM_FAULT_MAJOR)
+ tsk->maj_flt++;
+ else
+ tsk->min_flt++;
up_read(&mm->mmap_sem);
return;
# make sure the .S files get compiled with debug info
# and disable optimisations that are unhelpful whilst debugging
ifdef CONFIG_DEBUG_INFO
-CFLAGS += -O1
+#CFLAGS += -O1
AFLAGS += -Wa,--gdwarf2
ASFLAGS += -Wa,--gdwarf2
endif
__alt_instructions_end = .;
.altinstr_replacement : { *(.altinstr_replacement) }
- . = ALIGN(4096);
- __per_cpu_start = .;
- .data.percpu : { *(.data.percpu) }
- __per_cpu_end = .;
+ PERCPU(4096)
#ifdef CONFIG_BLK_DEV_INITRD
. = ALIGN(4096);
pud_t *pue;
pte_t *pte;
int write;
+ int fault;
#if 0
const char *atxc[16] = {
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- switch (handle_mm_fault(mm, vma, ear0, write)) {
- case VM_FAULT_MINOR:
- current->min_flt++;
- break;
- case VM_FAULT_MAJOR:
- current->maj_flt++;
- break;
- case VM_FAULT_SIGBUS:
- goto do_sigbus;
- default:
- goto out_of_memory;
+ fault = handle_mm_fault(mm, vma, ear0, write);
+ if (unlikely(fault & VM_FAULT_ERROR)) {
+ if (fault & VM_FAULT_OOM)
+ goto out_of_memory;
+ else if (fault & VM_FAULT_SIGBUS)
+ goto do_sigbus;
+ BUG();
}
+ if (fault & VM_FAULT_MAJOR)
+ current->maj_flt++;
+ else
+ current->min_flt++;
up_read(&mm->mmap_sem);
return;
depends on !M386
default y
-config X86_CMPXCHG64
- bool
- depends on X86_PAE
- default y
-
config X86_ALIGNMENT_16
bool
depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || X86_ELAN || MK6 || M586MMX || M586TSC || M586 || M486 || MVIAC3_2 || MGEODEGX1
bootsect
bzImage
setup
+setup.bin
+setup.elf
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
-CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
/* address in low memory of the wakeup routine. */
unsigned long acpi_wakeup_address = 0;
-unsigned long acpi_video_flags;
+unsigned long acpi_realmode_flags;
extern char wakeup_start, wakeup_end;
extern unsigned long FASTCALL(acpi_copy_wakeup_routine(unsigned long));
{
while ((str != NULL) && (*str != '\0')) {
if (strncmp(str, "s3_bios", 7) == 0)
- acpi_video_flags = 1;
+ acpi_realmode_flags |= 1;
if (strncmp(str, "s3_mode", 7) == 0)
- acpi_video_flags |= 2;
+ acpi_realmode_flags |= 2;
+ if (strncmp(str, "s3_beep", 7) == 0)
+ acpi_realmode_flags |= 4;
str = strchr(str, ',');
if (str != NULL)
str += strspn(str, ", \t");
__setup("acpi_sleep=", acpi_sleep_setup);
+/* Ouch, we want to delete this. We already have better version in userspace, in
+ s2ram from suspend.sf.net project */
static __init int reset_videomode_after_s3(struct dmi_system_id *d)
{
- acpi_video_flags |= 2;
+ acpi_realmode_flags |= 2;
return 0;
}
# cs = 0x1234, eip = 0x05
#
+#define BEEP \
+ inb $97, %al; \
+ outb %al, $0x80; \
+ movb $3, %al; \
+ outb %al, $97; \
+ outb %al, $0x80; \
+ movb $-74, %al; \
+ outb %al, $67; \
+ outb %al, $0x80; \
+ movb $-119, %al; \
+ outb %al, $66; \
+ outb %al, $0x80; \
+ movb $15, %al; \
+ outb %al, $66;
+
ALIGN
.align 4096
ENTRY(wakeup_start)
movw %cs, %ax
movw %ax, %ds # Make ds:0 point to wakeup_start
movw %ax, %ss
+
+ testl $4, realmode_flags - wakeup_code
+ jz 1f
+ BEEP
+1:
mov $(wakeup_stack - wakeup_code), %sp # Private stack is needed for ASUS board
movw $0x0e00 + 'S', %fs:(0x12)
cmpl $0x12345678, %eax
jne bogus_real_magic
- testl $1, video_flags - wakeup_code
+ testl $1, realmode_flags - wakeup_code
jz 1f
lcall $0xc000,$3
movw %cs, %ax
movw %ax, %ss
1:
- testl $2, video_flags - wakeup_code
+ testl $2, realmode_flags - wakeup_code
jz 1f
mov video_mode - wakeup_code, %ax
call mode_set
cmpl $0x12345678, %eax
jne bogus_real_magic
- ljmpl $__KERNEL_CS,$wakeup_pmode_return
+ testl $8, realmode_flags - wakeup_code
+ jz 1f
+ BEEP
+1:
+ ljmpl $__KERNEL_CS, $wakeup_pmode_return
real_save_gdt: .word 0
.long 0
real_save_cr4: .long 0
real_magic: .long 0
video_mode: .long 0
-video_flags: .long 0
+realmode_flags: .long 0
+beep_flags: .long 0
real_efer_save_restore: .long 0
real_save_efer_edx: .long 0
real_save_efer_eax: .long 0
movl saved_videomode, %edx
movl %edx, video_mode - wakeup_start (%eax)
- movl acpi_video_flags, %edx
- movl %edx, video_flags - wakeup_start (%eax)
+ movl acpi_realmode_flags, %edx
+ movl %edx, realmode_flags - wakeup_start (%eax)
movl $0x12345678, real_magic - wakeup_start (%eax)
movl $0x12345678, saved_magic
popl %ebx
#include <xen/interface/xen.h>
+#ifdef CONFIG_LGUEST_GUEST
+#include <linux/lguest.h>
+#include "../../../drivers/lguest/lg.h"
+#endif
+
#define DEFINE(sym, val) \
asm volatile("\n->" #sym " %0 " #val : : "i" (val))
OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
OFFSET(XEN_vcpu_info_pending, vcpu_info, evtchn_upcall_pending);
#endif
+
+#ifdef CONFIG_LGUEST_GUEST
+ BLANK();
+ OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
+ OFFSET(LGUEST_PAGES_host_gdt_desc, lguest_pages, state.host_gdt_desc);
+ OFFSET(LGUEST_PAGES_host_idt_desc, lguest_pages, state.host_idt_desc);
+ OFFSET(LGUEST_PAGES_host_cr3, lguest_pages, state.host_cr3);
+ OFFSET(LGUEST_PAGES_host_sp, lguest_pages, state.host_sp);
+ OFFSET(LGUEST_PAGES_guest_gdt_desc, lguest_pages,state.guest_gdt_desc);
+ OFFSET(LGUEST_PAGES_guest_idt_desc, lguest_pages,state.guest_idt_desc);
+ OFFSET(LGUEST_PAGES_guest_gdt, lguest_pages, state.guest_gdt);
+ OFFSET(LGUEST_PAGES_regs_trapnum, lguest_pages, regs.trapnum);
+ OFFSET(LGUEST_PAGES_regs_errcode, lguest_pages, regs.errcode);
+ OFFSET(LGUEST_PAGES_regs, lguest_pages, regs);
+#endif
}
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
* no more per-task TSS's.
*/
-DEFINE_PER_CPU(struct tss_struct, init_tss) ____cacheline_internodealigned_in_smp = INIT_TSS;
+DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss) = INIT_TSS;
#include <asm/apic.h>
#include <asm/uaccess.h>
-DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_internodealigned_in_smp;
+DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
DEFINE_PER_CPU(struct pt_regs *, irq_regs);
#include <linux/mca.h>
#endif
+#if defined(CONFIG_EDAC)
+#include <linux/edac.h>
+#endif
+
#include <asm/processor.h>
#include <asm/system.h>
#include <asm/io.h>