Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6
This commit is contained in:
Коммит
ff877ea80e
3
CREDITS
3
CREDITS
|
@ -3344,8 +3344,7 @@ S: Spain
|
|||
N: Linus Torvalds
|
||||
E: torvalds@linux-foundation.org
|
||||
D: Original kernel hacker
|
||||
S: 12725 SW Millikan Way, Suite 400
|
||||
S: Beaverton, Oregon 97005
|
||||
S: Portland, Oregon 97005
|
||||
S: USA
|
||||
|
||||
N: Marcelo Tosatti
|
||||
|
|
|
@ -26,3 +26,37 @@ Description:
|
|||
I/O statistics of partition <part>. The format is the
|
||||
same as the above-written /sys/block/<disk>/stat
|
||||
format.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/format
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Metadata format for integrity capable block device.
|
||||
E.g. T10-DIF-TYPE1-CRC.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/read_verify
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should verify the
|
||||
integrity of read requests serviced by devices that
|
||||
support sending integrity metadata.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/tag_size
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Number of bytes of integrity tag space available per
|
||||
512 bytes of data.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/write_generate
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
|
|
@ -0,0 +1,35 @@
|
|||
What: /sys/bus/css/devices/.../type
|
||||
Date: March 2008
|
||||
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||
linux-s390@vger.kernel.org
|
||||
Description: Contains the subchannel type, as reported by the hardware.
|
||||
This attribute is present for all subchannel types.
|
||||
|
||||
What: /sys/bus/css/devices/.../modalias
|
||||
Date: March 2008
|
||||
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||
linux-s390@vger.kernel.org
|
||||
Description: Contains the module alias as reported with uevents.
|
||||
It is of the format css:t<type> and present for all
|
||||
subchannel types.
|
||||
|
||||
What: /sys/bus/css/drivers/io_subchannel/.../chpids
|
||||
Date: December 2002
|
||||
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||
linux-s390@vger.kernel.org
|
||||
Description: Contains the ids of the channel paths used by this
|
||||
subchannel, as reported by the channel subsystem
|
||||
during subchannel recognition.
|
||||
Note: This is an I/O-subchannel specific attribute.
|
||||
Users: s390-tools, HAL
|
||||
|
||||
What: /sys/bus/css/drivers/io_subchannel/.../pimpampom
|
||||
Date: December 2002
|
||||
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||
linux-s390@vger.kernel.org
|
||||
Description: Contains the PIM/PAM/POM values, as reported by the
|
||||
channel subsystem when last queried by the common I/O
|
||||
layer (this implies that this attribute is not neccessarily
|
||||
in sync with the values current in the channel subsystem).
|
||||
Note: This is an I/O-subchannel specific attribute.
|
||||
Users: s390-tools, HAL
|
|
@ -0,0 +1,20 @@
|
|||
What: /sys/dev
|
||||
Date: April 2008
|
||||
KernelVersion: 2.6.26
|
||||
Contact: Dan Williams <dan.j.williams@intel.com>
|
||||
Description: The /sys/dev tree provides a method to look up the sysfs
|
||||
path for a device using the information returned from
|
||||
stat(2). There are two directories, 'block' and 'char',
|
||||
beneath /sys/dev containing symbolic links with names of
|
||||
the form "<major>:<minor>". These links point to the
|
||||
corresponding sysfs path for the given device.
|
||||
|
||||
Example:
|
||||
$ readlink /sys/dev/block/8:32
|
||||
../../block/sdc
|
||||
|
||||
Entries in /sys/dev/char and /sys/dev/block will be
|
||||
dynamically created and destroyed as devices enter and
|
||||
leave the system.
|
||||
|
||||
Users: mdadm <linux-raid@vger.kernel.org>
|
|
@ -29,46 +29,46 @@ Description:
|
|||
|
||||
$ cd /sys/firmware/acpi/interrupts
|
||||
$ grep . *
|
||||
error:0
|
||||
ff_gbl_lock:0
|
||||
ff_pmtimer:0
|
||||
ff_pwr_btn:0
|
||||
ff_rt_clk:0
|
||||
ff_slp_btn:0
|
||||
gpe00:0
|
||||
gpe01:0
|
||||
gpe02:0
|
||||
gpe03:0
|
||||
gpe04:0
|
||||
gpe05:0
|
||||
gpe06:0
|
||||
gpe07:0
|
||||
gpe08:0
|
||||
gpe09:174
|
||||
gpe0A:0
|
||||
gpe0B:0
|
||||
gpe0C:0
|
||||
gpe0D:0
|
||||
gpe0E:0
|
||||
gpe0F:0
|
||||
gpe10:0
|
||||
gpe11:60
|
||||
gpe12:0
|
||||
gpe13:0
|
||||
gpe14:0
|
||||
gpe15:0
|
||||
gpe16:0
|
||||
gpe17:0
|
||||
gpe18:0
|
||||
gpe19:7
|
||||
gpe1A:0
|
||||
gpe1B:0
|
||||
gpe1C:0
|
||||
gpe1D:0
|
||||
gpe1E:0
|
||||
gpe1F:0
|
||||
gpe_all:241
|
||||
sci:241
|
||||
error: 0
|
||||
ff_gbl_lock: 0 enable
|
||||
ff_pmtimer: 0 invalid
|
||||
ff_pwr_btn: 0 enable
|
||||
ff_rt_clk: 2 disable
|
||||
ff_slp_btn: 0 invalid
|
||||
gpe00: 0 invalid
|
||||
gpe01: 0 enable
|
||||
gpe02: 108 enable
|
||||
gpe03: 0 invalid
|
||||
gpe04: 0 invalid
|
||||
gpe05: 0 invalid
|
||||
gpe06: 0 enable
|
||||
gpe07: 0 enable
|
||||
gpe08: 0 invalid
|
||||
gpe09: 0 invalid
|
||||
gpe0A: 0 invalid
|
||||
gpe0B: 0 invalid
|
||||
gpe0C: 0 invalid
|
||||
gpe0D: 0 invalid
|
||||
gpe0E: 0 invalid
|
||||
gpe0F: 0 invalid
|
||||
gpe10: 0 invalid
|
||||
gpe11: 0 invalid
|
||||
gpe12: 0 invalid
|
||||
gpe13: 0 invalid
|
||||
gpe14: 0 invalid
|
||||
gpe15: 0 invalid
|
||||
gpe16: 0 invalid
|
||||
gpe17: 1084 enable
|
||||
gpe18: 0 enable
|
||||
gpe19: 0 invalid
|
||||
gpe1A: 0 invalid
|
||||
gpe1B: 0 invalid
|
||||
gpe1C: 0 invalid
|
||||
gpe1D: 0 invalid
|
||||
gpe1E: 0 invalid
|
||||
gpe1F: 0 invalid
|
||||
gpe_all: 1192
|
||||
sci: 1194
|
||||
|
||||
sci - The total number of times the ACPI SCI
|
||||
has claimed an interrupt.
|
||||
|
@ -89,6 +89,13 @@ Description:
|
|||
|
||||
error - an interrupt that can't be accounted for above.
|
||||
|
||||
invalid: it's either a wakeup GPE or a GPE/Fixed Event that
|
||||
doesn't have an event handler.
|
||||
|
||||
disable: the GPE/Fixed Event is valid but disabled.
|
||||
|
||||
enable: the GPE/Fixed Event is valid and enabled.
|
||||
|
||||
Root has permission to clear any of these counters. Eg.
|
||||
# echo 0 > gpe11
|
||||
|
||||
|
@ -97,3 +104,43 @@ Description:
|
|||
|
||||
None of these counters has an effect on the function
|
||||
of the system, they are simply statistics.
|
||||
|
||||
Besides this, user can also write specific strings to these files
|
||||
to enable/disable/clear ACPI interrupts in user space, which can be
|
||||
used to debug some ACPI interrupt storm issues.
|
||||
|
||||
Note that only writting to VALID GPE/Fixed Event is allowed,
|
||||
i.e. user can only change the status of runtime GPE and
|
||||
Fixed Event with event handler installed.
|
||||
|
||||
Let's take power button fixed event for example, please kill acpid
|
||||
and other user space applications so that the machine won't shutdown
|
||||
when pressing the power button.
|
||||
# cat ff_pwr_btn
|
||||
0
|
||||
# press the power button for 3 times;
|
||||
# cat ff_pwr_btn
|
||||
3
|
||||
# echo disable > ff_pwr_btn
|
||||
# cat ff_pwr_btn
|
||||
disable
|
||||
# press the power button for 3 times;
|
||||
# cat ff_pwr_btn
|
||||
disable
|
||||
# echo enable > ff_pwr_btn
|
||||
# cat ff_pwr_btn
|
||||
4
|
||||
/*
|
||||
* this is because the status bit is set even if the enable bit is cleared,
|
||||
* and it triggers an ACPI fixed event when the enable bit is set again
|
||||
*/
|
||||
# press the power button for 3 times;
|
||||
# cat ff_pwr_btn
|
||||
7
|
||||
# echo disable > ff_pwr_btn
|
||||
# press the power button for 3 times;
|
||||
# echo clear > ff_pwr_btn /* clear the status bit */
|
||||
# echo disable > ff_pwr_btn
|
||||
# cat ff_pwr_btn
|
||||
7
|
||||
|
||||
|
|
|
@ -0,0 +1,71 @@
|
|||
What: /sys/firmware/memmap/
|
||||
Date: June 2008
|
||||
Contact: Bernhard Walle <bwalle@suse.de>
|
||||
Description:
|
||||
On all platforms, the firmware provides a memory map which the
|
||||
kernel reads. The resources from that memory map are registered
|
||||
in the kernel resource tree and exposed to userspace via
|
||||
/proc/iomem (together with other resources).
|
||||
|
||||
However, on most architectures that firmware-provided memory
|
||||
map is modified afterwards by the kernel itself, either because
|
||||
the kernel merges that memory map with other information or
|
||||
just because the user overwrites that memory map via command
|
||||
line.
|
||||
|
||||
kexec needs the raw firmware-provided memory map to setup the
|
||||
parameter segment of the kernel that should be booted with
|
||||
kexec. Also, the raw memory map is useful for debugging. For
|
||||
that reason, /sys/firmware/memmap is an interface that provides
|
||||
the raw memory map to userspace.
|
||||
|
||||
The structure is as follows: Under /sys/firmware/memmap there
|
||||
are subdirectories with the number of the entry as their name:
|
||||
|
||||
/sys/firmware/memmap/0
|
||||
/sys/firmware/memmap/1
|
||||
/sys/firmware/memmap/2
|
||||
/sys/firmware/memmap/3
|
||||
...
|
||||
|
||||
The maximum depends on the number of memory map entries provided
|
||||
by the firmware. The order is just the order that the firmware
|
||||
provides.
|
||||
|
||||
Each directory contains three files:
|
||||
|
||||
start : The start address (as hexadecimal number with the
|
||||
'0x' prefix).
|
||||
end : The end address, inclusive (regardless whether the
|
||||
firmware provides inclusive or exclusive ranges).
|
||||
type : Type of the entry as string. See below for a list of
|
||||
valid types.
|
||||
|
||||
So, for example:
|
||||
|
||||
/sys/firmware/memmap/0/start
|
||||
/sys/firmware/memmap/0/end
|
||||
/sys/firmware/memmap/0/type
|
||||
/sys/firmware/memmap/1/start
|
||||
...
|
||||
|
||||
Currently following types exist:
|
||||
|
||||
- System RAM
|
||||
- ACPI Tables
|
||||
- ACPI Non-volatile Storage
|
||||
- reserved
|
||||
|
||||
Following shell snippet can be used to display that memory
|
||||
map in a human-readable format:
|
||||
|
||||
-------------------- 8< ----------------------------------------
|
||||
#!/bin/bash
|
||||
cd /sys/firmware/memmap
|
||||
for dir in * ; do
|
||||
start=$(cat $dir/start)
|
||||
end=$(cat $dir/end)
|
||||
type=$(cat $dir/type)
|
||||
printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
|
||||
done
|
||||
-------------------- >8 ----------------------------------------
|
|
@ -22,3 +22,12 @@ ready and available in memory. The DMA of the "completion indication"
|
|||
could race with data DMA. Mapping the memory used for completion
|
||||
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
|
||||
|
||||
DMA_ATTR_WEAK_ORDERING
|
||||
----------------------
|
||||
|
||||
DMA_ATTR_WEAK_ORDERING specifies that reads and writes to the mapping
|
||||
may be weakly ordered, that is that reads and writes may pass each other.
|
||||
|
||||
Since it is optional for platforms to implement DMA_ATTR_WEAK_ORDERING,
|
||||
those that do not will simply ignore the attribute and exhibit default
|
||||
behavior.
|
||||
|
|
|
@ -524,6 +524,44 @@ These utilities include endpoint autoconfiguration.
|
|||
<!-- !Edrivers/usb/gadget/epautoconf.c -->
|
||||
</sect1>
|
||||
|
||||
<sect1 id="composite"><title>Composite Device Framework</title>
|
||||
|
||||
<para>The core API is sufficient for writing drivers for composite
|
||||
USB devices (with more than one function in a given configuration),
|
||||
and also multi-configuration devices (also more than one function,
|
||||
but not necessarily sharing a given configuration).
|
||||
There is however an optional framework which makes it easier to
|
||||
reuse and combine functions.
|
||||
</para>
|
||||
|
||||
<para>Devices using this framework provide a <emphasis>struct
|
||||
usb_composite_driver</emphasis>, which in turn provides one or
|
||||
more <emphasis>struct usb_configuration</emphasis> instances.
|
||||
Each such configuration includes at least one
|
||||
<emphasis>struct usb_function</emphasis>, which packages a user
|
||||
visible role such as "network link" or "mass storage device".
|
||||
Management functions may also exist, such as "Device Firmware
|
||||
Upgrade".
|
||||
</para>
|
||||
|
||||
!Iinclude/linux/usb/composite.h
|
||||
!Edrivers/usb/gadget/composite.c
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="functions"><title>Composite Device Functions</title>
|
||||
|
||||
<para>At this writing, a few of the current gadget drivers have
|
||||
been converted to this framework.
|
||||
Near-term plans include converting all of them, except for "gadgetfs".
|
||||
</para>
|
||||
|
||||
!Edrivers/usb/gadget/f_acm.c
|
||||
!Edrivers/usb/gadget/f_serial.c
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
|
||||
|
|
|
@ -21,6 +21,18 @@
|
|||
</affiliation>
|
||||
</author>
|
||||
|
||||
<copyright>
|
||||
<year>2006-2008</year>
|
||||
<holder>Hans-Jürgen Koch.</holder>
|
||||
</copyright>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is Free Software licensed under the terms of the
|
||||
GPL version 2.
|
||||
</para>
|
||||
</legalnotice>
|
||||
|
||||
<pubdate>2006-12-11</pubdate>
|
||||
|
||||
<abstract>
|
||||
|
@ -29,6 +41,12 @@
|
|||
</abstract>
|
||||
|
||||
<revhistory>
|
||||
<revision>
|
||||
<revnumber>0.5</revnumber>
|
||||
<date>2008-05-22</date>
|
||||
<authorinitials>hjk</authorinitials>
|
||||
<revremark>Added description of write() function.</revremark>
|
||||
</revision>
|
||||
<revision>
|
||||
<revnumber>0.4</revnumber>
|
||||
<date>2007-11-26</date>
|
||||
|
@ -57,20 +75,9 @@
|
|||
</bookinfo>
|
||||
|
||||
<chapter id="aboutthisdoc">
|
||||
<?dbhtml filename="about.html"?>
|
||||
<?dbhtml filename="aboutthis.html"?>
|
||||
<title>About this document</title>
|
||||
|
||||
<sect1 id="copyright">
|
||||
<?dbhtml filename="copyright.html"?>
|
||||
<title>Copyright and License</title>
|
||||
<para>
|
||||
Copyright (c) 2006 by Hans-Jürgen Koch.</para>
|
||||
<para>
|
||||
This documentation is Free Software licensed under the terms of the
|
||||
GPL version 2.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="translations">
|
||||
<?dbhtml filename="translations.html"?>
|
||||
<title>Translations</title>
|
||||
|
@ -189,6 +196,30 @@ interested in translating it, please email me
|
|||
represents the total interrupt count. You can use this number
|
||||
to figure out if you missed some interrupts.
|
||||
</para>
|
||||
<para>
|
||||
For some hardware that has more than one interrupt source internally,
|
||||
but not separate IRQ mask and status registers, there might be
|
||||
situations where userspace cannot determine what the interrupt source
|
||||
was if the kernel handler disables them by writing to the chip's IRQ
|
||||
register. In such a case, the kernel has to disable the IRQ completely
|
||||
to leave the chip's register untouched. Now the userspace part can
|
||||
determine the cause of the interrupt, but it cannot re-enable
|
||||
interrupts. Another cornercase is chips where re-enabling interrupts
|
||||
is a read-modify-write operation to a combined IRQ status/acknowledge
|
||||
register. This would be racy if a new interrupt occurred
|
||||
simultaneously.
|
||||
</para>
|
||||
<para>
|
||||
To address these problems, UIO also implements a write() function. It
|
||||
is normally not used and can be ignored for hardware that has only a
|
||||
single interrupt source or has separate IRQ mask and status registers.
|
||||
If you need it, however, a write to <filename>/dev/uioX</filename>
|
||||
will call the <function>irqcontrol()</function> function implemented
|
||||
by the driver. You have to write a 32-bit value that is usually either
|
||||
0 or 1 to disable or enable interrupts. If a driver does not implement
|
||||
<function>irqcontrol()</function>, <function>write()</function> will
|
||||
return with <varname>-ENOSYS</varname>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
To handle interrupts properly, your custom kernel module can
|
||||
|
@ -362,6 +393,14 @@ device is actually used.
|
|||
<function>open()</function>, you will probably also want a custom
|
||||
<function>release()</function> function.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>
|
||||
<varname>int (*irqcontrol)(struct uio_info *info, s32 irq_on)
|
||||
</varname>: Optional. If you need to be able to enable or disable
|
||||
interrupts from userspace by writing to <filename>/dev/uioX</filename>,
|
||||
you can implement this function. The parameter <varname>irq_on</varname>
|
||||
will be 0 to disable interrupts and 1 to enable them.
|
||||
</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
|
|
|
@ -358,7 +358,7 @@ Here is a list of some of the different kernel trees available:
|
|||
- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
|
||||
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
|
||||
|
||||
- SCSI, James Bottomley <James.Bottomley@SteelEye.com>
|
||||
- SCSI, James Bottomley <James.Bottomley@hansenpartnership.com>
|
||||
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
|
||||
|
||||
- x86, Ingo Molnar <mingo@elte.hu>
|
||||
|
@ -377,7 +377,7 @@ Bug Reporting
|
|||
bugzilla.kernel.org is where the Linux kernel developers track kernel
|
||||
bugs. Users are encouraged to report all bugs that they find in this
|
||||
tool. For details on how to use the kernel bugzilla, please see:
|
||||
http://test.kernel.org/bugzilla/faq.html
|
||||
http://bugzilla.kernel.org/page.cgi?id=faq.html
|
||||
|
||||
The file REPORTING-BUGS in the main kernel source directory has a good
|
||||
template for how to report a possible kernel bug, and details what kind
|
||||
|
|
|
@ -1,17 +1,26 @@
|
|||
ChangeLog:
|
||||
Started by Ingo Molnar <mingo@redhat.com>
|
||||
Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||
|
||||
SMP IRQ affinity, started by Ingo Molnar <mingo@redhat.com>
|
||||
|
||||
SMP IRQ affinity
|
||||
|
||||
/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
|
||||
for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
|
||||
to turn off all CPUs, and if an IRQ controller does not support IRQ
|
||||
affinity then the value will not change from the default 0xffffffff.
|
||||
|
||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||
the IRQ to CPU4-7 (this is an 8-CPU SMP box):
|
||||
/proc/irq/default_smp_affinity specifies default affinity mask that applies
|
||||
to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
|
||||
will be set to the default mask. It can then be changed as described above.
|
||||
Default mask is 0xffffffff.
|
||||
|
||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||
it to CPU4-7 (this is an 8-CPU SMP box):
|
||||
|
||||
[root@moon 44]# cd /proc/irq/44
|
||||
[root@moon 44]# cat smp_affinity
|
||||
ffffffff
|
||||
|
||||
[root@moon 44]# echo 0f > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
0000000f
|
||||
|
@ -21,17 +30,27 @@ PING hell (195.4.7.3): 56 data bytes
|
|||
--- hell ping statistics ---
|
||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
||||
44: 0 1785 1785 1783 1783 1
|
||||
1 0 IO-APIC-level eth1
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||
|
||||
As can be seen from the line above IRQ44 was delivered only to the first four
|
||||
processors (0-3).
|
||||
Now lets restrict that IRQ to CPU(4-7).
|
||||
|
||||
[root@moon 44]# echo f0 > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
000000f0
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
..
|
||||
--- hell ping statistics ---
|
||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
||||
44: 1068 1785 1785 1784 1784 1069 1070 1069 IO-APIC-level eth1
|
||||
[root@moon 44]#
|
||||
[root@moon 44]# cat /proc/interrupts | 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||
|
||||
This time around IRQ44 was delivered only to the last four processors.
|
||||
i.e counters for the CPU0-3 did not change.
|
||||
|
||||
|
|
|
@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed
|
|||
not to return until all ongoing NMI handlers exit. It is therefore safe
|
||||
to free up the handler's data as soon as synchronize_sched() returns.
|
||||
|
||||
Important note: for this to work, the architecture in question must
|
||||
invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
|
||||
|
||||
|
||||
Answer to Quick Quiz
|
||||
|
||||
|
|
|
@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly
|
|||
structured data, such as the matrices used in scientific programs, and
|
||||
is thus inapplicable to most data structures in operating-system kernels.
|
||||
|
||||
In 1992, Henry (now Alexia) Massalin completed a dissertation advising
|
||||
parallel programmers to defer processing when feasible to simplify
|
||||
synchronization. RCU makes extremely heavy use of this advice.
|
||||
|
||||
In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
|
||||
simplest deferred-free technique: simply waiting a fixed amount of time
|
||||
before freeing blocks awaiting deferred free. Jacobson did not describe
|
||||
|
@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c],
|
|||
Robert Olsson described an RCU-protected trie-hash combination
|
||||
[RobertOlsson2006a].
|
||||
|
||||
2007 saw the journal version of the award-winning RCU paper from 2006
|
||||
[ThomasEHart2007a], as well as a paper demonstrating use of Promela
|
||||
and Spin to mechanically verify an optimization to Oleg Nesterov's
|
||||
QRCU [PaulEMcKenney2007QRCUspin], a design document describing
|
||||
preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
|
||||
LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
|
||||
PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
|
||||
|
||||
Bibtex Entries
|
||||
|
||||
|
@ -202,6 +213,20 @@ Bibtex Entries
|
|||
,Year="1991"
|
||||
}
|
||||
|
||||
@phdthesis{HMassalinPhD
|
||||
,author="H. Massalin"
|
||||
,title="Synthesis: An Efficient Implementation of Fundamental Operating
|
||||
System Services"
|
||||
,school="Columbia University"
|
||||
,address="New York, NY"
|
||||
,year="1992"
|
||||
,annotation="
|
||||
Mondo optimizing compiler.
|
||||
Wait-free stuff.
|
||||
Good advice: defer work to avoid synchronization.
|
||||
"
|
||||
}
|
||||
|
||||
@unpublished{Jacobson93
|
||||
,author="Van Jacobson"
|
||||
,title="Avoid Read-Side Locking Via Delayed Free"
|
||||
|
@ -635,3 +660,86 @@ Revised:
|
|||
"
|
||||
}
|
||||
|
||||
@unpublished{PaulEMcKenney2007PreemptibleRCU
|
||||
,Author="Paul E. McKenney"
|
||||
,Title="The design of preemptible read-copy-update"
|
||||
,month="October"
|
||||
,day="8"
|
||||
,year="2007"
|
||||
,note="Available:
|
||||
\url{http://lwn.net/Articles/253651/}
|
||||
[Viewed October 25, 2007]"
|
||||
,annotation="
|
||||
LWN article describing the design of preemptible RCU.
|
||||
"
|
||||
}
|
||||
|
||||
########################################################################
|
||||
#
|
||||
# "What is RCU?" LWN series.
|
||||
#
|
||||
|
||||
@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally
|
||||
,Author="Paul E. McKenney and Jonathan Walpole"
|
||||
,Title="What is {RCU}, Fundamentally?"
|
||||
,month="December"
|
||||
,day="17"
|
||||
,year="2007"
|
||||
,note="Available:
|
||||
\url{http://lwn.net/Articles/262464/}
|
||||
[Viewed December 27, 2007]"
|
||||
,annotation="
|
||||
Lays out the three basic components of RCU: (1) publish-subscribe,
|
||||
(2) wait for pre-existing readers to complete, and (2) maintain
|
||||
multiple versions.
|
||||
"
|
||||
}
|
||||
|
||||
@unpublished{PaulEMcKenney2008WhatIsRCUUsage
|
||||
,Author="Paul E. McKenney"
|
||||
,Title="What is {RCU}? Part 2: Usage"
|
||||
,month="January"
|
||||
,day="4"
|
||||
,year="2008"
|
||||
,note="Available:
|
||||
\url{http://lwn.net/Articles/263130/}
|
||||
[Viewed January 4, 2008]"
|
||||
,annotation="
|
||||
Lays out six uses of RCU:
|
||||
1. RCU is a Reader-Writer Lock Replacement
|
||||
2. RCU is a Restricted Reference-Counting Mechanism
|
||||
3. RCU is a Bulk Reference-Counting Mechanism
|
||||
4. RCU is a Poor Man's Garbage Collector
|
||||
5. RCU is a Way of Providing Existence Guarantees
|
||||
6. RCU is a Way of Waiting for Things to Finish
|
||||
"
|
||||
}
|
||||
|
||||
@unpublished{PaulEMcKenney2008WhatIsRCUAPI
|
||||
,Author="Paul E. McKenney"
|
||||
,Title="{RCU} part 3: the {RCU} {API}"
|
||||
,month="January"
|
||||
,day="17"
|
||||
,year="2008"
|
||||
,note="Available:
|
||||
\url{http://lwn.net/Articles/264090/}
|
||||
[Viewed January 10, 2008]"
|
||||
,annotation="
|
||||
Gives an overview of the Linux-kernel RCU API and a brief annotated RCU
|
||||
bibliography.
|
||||
"
|
||||
}
|
||||
|
||||
@article{DinakarGuniguntala2008IBMSysJ
|
||||
,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
|
||||
,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
|
||||
,Year="2008"
|
||||
,Month="April"
|
||||
,journal="IBM Systems Journal"
|
||||
,volume="47"
|
||||
,number="2"
|
||||
,pages="@@-@@"
|
||||
,annotation="
|
||||
RCU, realtime RCU, sleepable RCU, performance.
|
||||
"
|
||||
}
|
||||
|
|
|
@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome!
|
|||
detailed performance measurements show that RCU is nonetheless
|
||||
the right tool for the job.
|
||||
|
||||
The other exception would be where performance is not an issue,
|
||||
and RCU provides a simpler implementation. An example of this
|
||||
situation is the dynamic NMI code in the Linux 2.6 kernel,
|
||||
at least on architectures where NMIs are rare.
|
||||
Another exception is where performance is not an issue, and RCU
|
||||
provides a simpler implementation. An example of this situation
|
||||
is the dynamic NMI code in the Linux 2.6 kernel, at least on
|
||||
architectures where NMIs are rare.
|
||||
|
||||
Yet another exception is where the low real-time latency of RCU's
|
||||
read-side primitives is critically important.
|
||||
|
||||
1. Does the update code have proper mutual exclusion?
|
||||
|
||||
|
@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome!
|
|||
|
||||
2. Do the RCU read-side critical sections make proper use of
|
||||
rcu_read_lock() and friends? These primitives are needed
|
||||
to suppress preemption (or bottom halves, in the case of
|
||||
rcu_read_lock_bh()) in the read-side critical sections,
|
||||
and are also an excellent aid to readability.
|
||||
to prevent grace periods from ending prematurely, which
|
||||
could result in data being unceremoniously freed out from
|
||||
under your read-side code, which can greatly increase the
|
||||
actuarial risk of your kernel.
|
||||
|
||||
As a rough rule of thumb, any dereference of an RCU-protected
|
||||
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
||||
|
@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome!
|
|||
be running while updates are in progress. There are a number
|
||||
of ways to handle this concurrency, depending on the situation:
|
||||
|
||||
a. Make updates appear atomic to readers. For example,
|
||||
a. Use the RCU variants of the list and hlist update
|
||||
primitives to add, remove, and replace elements on an
|
||||
RCU-protected list. Alternatively, use the RCU-protected
|
||||
trees that have been added to the Linux kernel.
|
||||
|
||||
This is almost always the best approach.
|
||||
|
||||
b. Proceed as in (a) above, but also maintain per-element
|
||||
locks (that are acquired by both readers and writers)
|
||||
that guard per-element state. Of course, fields that
|
||||
the readers refrain from accessing can be guarded by the
|
||||
update-side lock.
|
||||
|
||||
This works quite well, also.
|
||||
|
||||
c. Make updates appear atomic to readers. For example,
|
||||
pointer updates to properly aligned fields will appear
|
||||
atomic, as will individual atomic primitives. Operations
|
||||
performed under a lock and sequences of multiple atomic
|
||||
primitives will -not- appear to be atomic.
|
||||
|
||||
This is almost always the best approach.
|
||||
This can work, but is starting to get a bit tricky.
|
||||
|
||||
b. Carefully order the updates and the reads so that
|
||||
d. Carefully order the updates and the reads so that
|
||||
readers see valid data at all phases of the update.
|
||||
This is often more difficult than it sounds, especially
|
||||
given modern CPUs' tendency to reorder memory references.
|
||||
|
@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome!
|
|||
when publicizing a pointer to a structure that can
|
||||
be traversed by an RCU read-side critical section.
|
||||
|
||||
5. If call_rcu(), or a related primitive such as call_rcu_bh(),
|
||||
is used, the callback function must be written to be called
|
||||
from softirq context. In particular, it cannot block.
|
||||
5. If call_rcu(), or a related primitive such as call_rcu_bh() or
|
||||
call_rcu_sched(), is used, the callback function must be
|
||||
written to be called from softirq context. In particular,
|
||||
it cannot block.
|
||||
|
||||
6. Since synchronize_rcu() can block, it cannot be called from
|
||||
any sort of irq context.
|
||||
any sort of irq context. Ditto for synchronize_sched() and
|
||||
synchronize_srcu().
|
||||
|
||||
7. If the updater uses call_rcu(), then the corresponding readers
|
||||
must use rcu_read_lock() and rcu_read_unlock(). If the updater
|
||||
uses call_rcu_bh(), then the corresponding readers must use
|
||||
rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up
|
||||
will result in confusion and broken kernels.
|
||||
rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
|
||||
uses call_rcu_sched(), then the corresponding readers must
|
||||
disable preemption. Mixing things up will result in confusion
|
||||
and broken kernels.
|
||||
|
||||
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
|
||||
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
|
||||
|
@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome!
|
|||
such cases is a must, of course! And the jury is still out on
|
||||
whether the increased speed is worth it.
|
||||
|
||||
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
|
||||
it usually results in simpler code. So, unless update
|
||||
performance is critically important or the updaters cannot block,
|
||||
8. Although synchronize_rcu() is slower than is call_rcu(), it
|
||||
usually results in simpler code. So, unless update performance
|
||||
is critically important or the updaters cannot block,
|
||||
synchronize_rcu() should be used in preference to call_rcu().
|
||||
|
||||
An especially important property of the synchronize_rcu()
|
||||
|
@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome!
|
|||
number of updates per grace period.
|
||||
|
||||
9. All RCU list-traversal primitives, which include
|
||||
list_for_each_rcu(), list_for_each_entry_rcu(),
|
||||
rcu_dereference(), list_for_each_rcu(), list_for_each_entry_rcu(),
|
||||
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
|
||||
must be within an RCU read-side critical section. RCU
|
||||
must be either within an RCU read-side critical section or
|
||||
must be protected by appropriate update-side locks. RCU
|
||||
read-side critical sections are delimited by rcu_read_lock()
|
||||
and rcu_read_unlock(), or by similar primitives such as
|
||||
rcu_read_lock_bh() and rcu_read_unlock_bh().
|
||||
|
||||
Use of the _rcu() list-traversal primitives outside of an
|
||||
RCU read-side critical section causes no harm other than
|
||||
a slight performance degradation on Alpha CPUs. It can
|
||||
also be quite helpful in reducing code bloat when common
|
||||
code is shared between readers and updaters.
|
||||
The reason that it is permissible to use RCU list-traversal
|
||||
primitives when the update-side lock is held is that doing so
|
||||
can be quite helpful in reducing code bloat when common code is
|
||||
shared between readers and updaters.
|
||||
|
||||
10. Conversely, if you are in an RCU read-side critical section,
|
||||
you -must- use the "_rcu()" variants of the list macros.
|
||||
Failing to do so will break Alpha and confuse people reading
|
||||
your code.
|
||||
and you don't hold the appropriate update-side lock, you -must-
|
||||
use the "_rcu()" variants of the list macros. Failing to do so
|
||||
will break Alpha and confuse people reading your code.
|
||||
|
||||
11. Note that synchronize_rcu() -only- guarantees to wait until
|
||||
all currently executing rcu_read_lock()-protected RCU read-side
|
||||
|
@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome!
|
|||
must use whatever locking or other synchronization is required
|
||||
to safely access and/or modify that data structure.
|
||||
|
||||
RCU callbacks are -usually- executed on the same CPU that executed
|
||||
the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(),
|
||||
but are by -no- means guaranteed to be. For example, if a given
|
||||
CPU goes offline while having an RCU callback pending, then that
|
||||
RCU callback will execute on some surviving CPU. (If this was
|
||||
not the case, a self-spawning RCU callback would prevent the
|
||||
victim CPU from ever going offline.)
|
||||
|
||||
14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
|
||||
may only be invoked from process context. Unlike other forms of
|
||||
RCU, it -is- permissible to block in an SRCU read-side critical
|
||||
|
|
|
@ -10,23 +10,30 @@ status messages via printk(), which can be examined via the dmesg
|
|||
command (perhaps grepping for "torture"). The test is started
|
||||
when the module is loaded, and stops when the module is unloaded.
|
||||
|
||||
However, actually setting this config option to "y" results in the system
|
||||
running the test immediately upon boot, and ending only when the system
|
||||
is taken down. Normally, one will instead want to build the system
|
||||
with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control
|
||||
the test, perhaps using a script similar to the one shown at the end of
|
||||
this document. Note that you will need CONFIG_MODULE_UNLOAD in order
|
||||
to be able to end the test.
|
||||
CONFIG_RCU_TORTURE_TEST_RUNNABLE
|
||||
|
||||
It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
|
||||
result in the tests being loaded into the base kernel. In this case,
|
||||
the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
|
||||
whether the RCU torture tests are to be started immediately during
|
||||
boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
|
||||
to enable them. This /proc file can be used to repeatedly pause and
|
||||
restart the tests, regardless of the initial state specified by the
|
||||
CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
|
||||
|
||||
You will normally -not- want to start the RCU torture tests during boot
|
||||
(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
|
||||
this can sometimes be useful in finding boot-time bugs.
|
||||
|
||||
|
||||
MODULE PARAMETERS
|
||||
|
||||
This module has the following parameters:
|
||||
|
||||
nreaders This is the number of RCU reading threads supported.
|
||||
The default is twice the number of CPUs. Why twice?
|
||||
To properly exercise RCU implementations with preemptible
|
||||
read-side critical sections.
|
||||
irqreaders Says to invoke RCU readers from irq level. This is currently
|
||||
done via timers. Defaults to "1" for variants of RCU that
|
||||
permit this. (Or, more accurately, variants of RCU that do
|
||||
-not- permit this know to ignore this variable.)
|
||||
|
||||
nfakewriters This is the number of RCU fake writer threads to run. Fake
|
||||
writer threads repeatedly use the synchronous "wait for
|
||||
|
@ -37,6 +44,16 @@ nfakewriters This is the number of RCU fake writer threads to run. Fake
|
|||
to trigger special cases caused by multiple writers, such as
|
||||
the synchronize_srcu() early return optimization.
|
||||
|
||||
nreaders This is the number of RCU reading threads supported.
|
||||
The default is twice the number of CPUs. Why twice?
|
||||
To properly exercise RCU implementations with preemptible
|
||||
read-side critical sections.
|
||||
|
||||
shuffle_interval
|
||||
The number of seconds to keep the test threads affinitied
|
||||
to a particular subset of the CPUs, defaults to 3 seconds.
|
||||
Used in conjunction with test_no_idle_hz.
|
||||
|
||||
stat_interval The number of seconds between output of torture
|
||||
statistics (via printk()). Regardless of the interval,
|
||||
statistics are printed when the module is unloaded.
|
||||
|
@ -44,10 +61,11 @@ stat_interval The number of seconds between output of torture
|
|||
be printed -only- when the module is unloaded, and this
|
||||
is the default.
|
||||
|
||||
shuffle_interval
|
||||
The number of seconds to keep the test threads affinitied
|
||||
to a particular subset of the CPUs, defaults to 5 seconds.
|
||||
Used in conjunction with test_no_idle_hz.
|
||||
stutter The length of time to run the test before pausing for this
|
||||
same period of time. Defaults to "stutter=5", so as
|
||||
to run and pause for (roughly) five-second intervals.
|
||||
Specifying "stutter=0" causes the test to run continuously
|
||||
without pausing, which is the old default behavior.
|
||||
|
||||
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
||||
a kernel that disables the scheduling-clock interrupt to
|
||||
|
|
|
@ -1,3 +1,11 @@
|
|||
Please note that the "What is RCU?" LWN series is an excellent place
|
||||
to start learning about RCU:
|
||||
|
||||
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
|
||||
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
|
||||
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
|
||||
|
||||
|
||||
What is RCU?
|
||||
|
||||
RCU is a synchronization mechanism that was added to the Linux kernel
|
||||
|
@ -772,26 +780,18 @@ Linux-kernel source code, but it helps to have a full list of the
|
|||
APIs, since there does not appear to be a way to categorize them
|
||||
in docbook. Here is the list, by category.
|
||||
|
||||
Markers for RCU read-side critical sections:
|
||||
|
||||
rcu_read_lock
|
||||
rcu_read_unlock
|
||||
rcu_read_lock_bh
|
||||
rcu_read_unlock_bh
|
||||
srcu_read_lock
|
||||
srcu_read_unlock
|
||||
|
||||
RCU pointer/list traversal:
|
||||
|
||||
rcu_dereference
|
||||
list_for_each_rcu (to be deprecated in favor of
|
||||
list_for_each_entry_rcu)
|
||||
list_for_each_entry_rcu
|
||||
list_for_each_continue_rcu (to be deprecated in favor of new
|
||||
list_for_each_entry_continue_rcu)
|
||||
hlist_for_each_entry_rcu
|
||||
|
||||
RCU pointer update:
|
||||
list_for_each_rcu (to be deprecated in favor of
|
||||
list_for_each_entry_rcu)
|
||||
list_for_each_continue_rcu (to be deprecated in favor of new
|
||||
list_for_each_entry_continue_rcu)
|
||||
|
||||
RCU pointer/list update:
|
||||
|
||||
rcu_assign_pointer
|
||||
list_add_rcu
|
||||
|
@ -799,16 +799,36 @@ RCU pointer update:
|
|||
list_del_rcu
|
||||
list_replace_rcu
|
||||
hlist_del_rcu
|
||||
hlist_add_after_rcu
|
||||
hlist_add_before_rcu
|
||||
hlist_add_head_rcu
|
||||
hlist_replace_rcu
|
||||
list_splice_init_rcu()
|
||||
|
||||
RCU grace period:
|
||||
RCU: Critical sections Grace period Barrier
|
||||
|
||||
rcu_read_lock synchronize_net rcu_barrier
|
||||
rcu_read_unlock synchronize_rcu
|
||||
call_rcu
|
||||
|
||||
|
||||
bh: Critical sections Grace period Barrier
|
||||
|
||||
rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
|
||||
rcu_read_unlock_bh
|
||||
|
||||
|
||||
sched: Critical sections Grace period Barrier
|
||||
|
||||
[preempt_disable] synchronize_sched rcu_barrier_sched
|
||||
[and friends] call_rcu_sched
|
||||
|
||||
|
||||
SRCU: Critical sections Grace period Barrier
|
||||
|
||||
srcu_read_lock synchronize_srcu N/A
|
||||
srcu_read_unlock
|
||||
|
||||
synchronize_net
|
||||
synchronize_sched
|
||||
synchronize_rcu
|
||||
synchronize_srcu
|
||||
call_rcu
|
||||
call_rcu_bh
|
||||
|
||||
See the comment headers in the source code (or the docbook generated
|
||||
from them) for more information.
|
||||
|
|
|
@ -0,0 +1,327 @@
|
|||
----------------------------------------------------------------------
|
||||
1. INTRODUCTION
|
||||
|
||||
Modern filesystems feature checksumming of data and metadata to
|
||||
protect against data corruption. However, the detection of the
|
||||
corruption is done at read time which could potentially be months
|
||||
after the data was written. At that point the original data that the
|
||||
application tried to write is most likely lost.
|
||||
|
||||
The solution is to ensure that the disk is actually storing what the
|
||||
application meant it to. Recent additions to both the SCSI family
|
||||
protocols (SBC Data Integrity Field, SCC protection proposal) as well
|
||||
as SATA/T13 (External Path Protection) try to remedy this by adding
|
||||
support for appending integrity metadata to an I/O. The integrity
|
||||
metadata (or protection information in SCSI terminology) includes a
|
||||
checksum for each sector as well as an incrementing counter that
|
||||
ensures the individual sectors are written in the right order. And
|
||||
for some protection schemes also that the I/O is written to the right
|
||||
place on disk.
|
||||
|
||||
Current storage controllers and devices implement various protective
|
||||
measures, for instance checksumming and scrubbing. But these
|
||||
technologies are working in their own isolated domains or at best
|
||||
between adjacent nodes in the I/O path. The interesting thing about
|
||||
DIF and the other integrity extensions is that the protection format
|
||||
is well defined and every node in the I/O path can verify the
|
||||
integrity of the I/O and reject it if corruption is detected. This
|
||||
allows not only corruption prevention but also isolation of the point
|
||||
of failure.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
2. THE DATA INTEGRITY EXTENSIONS
|
||||
|
||||
As written, the protocol extensions only protect the path between
|
||||
controller and storage device. However, many controllers actually
|
||||
allow the operating system to interact with the integrity metadata
|
||||
(IMD). We have been working with several FC/SAS HBA vendors to enable
|
||||
the protection information to be transferred to and from their
|
||||
controllers.
|
||||
|
||||
The SCSI Data Integrity Field works by appending 8 bytes of protection
|
||||
information to each sector. The data + integrity metadata is stored
|
||||
in 520 byte sectors on disk. Data + IMD are interleaved when
|
||||
transferred between the controller and target. The T13 proposal is
|
||||
similar.
|
||||
|
||||
Because it is highly inconvenient for operating systems to deal with
|
||||
520 (and 4104) byte sectors, we approached several HBA vendors and
|
||||
encouraged them to allow separation of the data and integrity metadata
|
||||
scatter-gather lists.
|
||||
|
||||
The controller will interleave the buffers on write and split them on
|
||||
read. This means that the Linux can DMA the data buffers to and from
|
||||
host memory without changes to the page cache.
|
||||
|
||||
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
|
||||
is somewhat heavy to compute in software. Benchmarks found that
|
||||
calculating this checksum had a significant impact on system
|
||||
performance for a number of workloads. Some controllers allow a
|
||||
lighter-weight checksum to be used when interfacing with the operating
|
||||
system. Emulex, for instance, supports the TCP/IP checksum instead.
|
||||
The IP checksum received from the OS is converted to the 16-bit CRC
|
||||
when writing and vice versa. This allows the integrity metadata to be
|
||||
generated by Linux or the application at very low cost (comparable to
|
||||
software RAID5).
|
||||
|
||||
The IP checksum is weaker than the CRC in terms of detecting bit
|
||||
errors. However, the strength is really in the separation of the data
|
||||
buffers and the integrity metadata. These two distinct buffers much
|
||||
match up for an I/O to complete.
|
||||
|
||||
The separation of the data and integrity metadata buffers as well as
|
||||
the choice in checksums is referred to as the Data Integrity
|
||||
Extensions. As these extensions are outside the scope of the protocol
|
||||
bodies (T10, T13), Oracle and its partners are trying to standardize
|
||||
them within the Storage Networking Industry Association.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
3. KERNEL CHANGES
|
||||
|
||||
The data integrity framework in Linux enables protection information
|
||||
to be pinned to I/Os and sent to/received from controllers that
|
||||
support it.
|
||||
|
||||
The advantage to the integrity extensions in SCSI and SATA is that
|
||||
they enable us to protect the entire path from application to storage
|
||||
device. However, at the same time this is also the biggest
|
||||
disadvantage. It means that the protection information must be in a
|
||||
format that can be understood by the disk.
|
||||
|
||||
Generally Linux/POSIX applications are agnostic to the intricacies of
|
||||
the storage devices they are accessing. The virtual filesystem switch
|
||||
and the block layer make things like hardware sector size and
|
||||
transport protocols completely transparent to the application.
|
||||
|
||||
However, this level of detail is required when preparing the
|
||||
protection information to send to a disk. Consequently, the very
|
||||
concept of an end-to-end protection scheme is a layering violation.
|
||||
It is completely unreasonable for an application to be aware whether
|
||||
it is accessing a SCSI or SATA disk.
|
||||
|
||||
The data integrity support implemented in Linux attempts to hide this
|
||||
from the application. As far as the application (and to some extent
|
||||
the kernel) is concerned, the integrity metadata is opaque information
|
||||
that's attached to the I/O.
|
||||
|
||||
The current implementation allows the block layer to automatically
|
||||
generate the protection information for any I/O. Eventually the
|
||||
intent is to move the integrity metadata calculation to userspace for
|
||||
user data. Metadata and other I/O that originates within the kernel
|
||||
will still use the automatic generation interface.
|
||||
|
||||
Some storage devices allow each hardware sector to be tagged with a
|
||||
16-bit value. The owner of this tag space is the owner of the block
|
||||
device. I.e. the filesystem in most cases. The filesystem can use
|
||||
this extra space to tag sectors as they see fit. Because the tag
|
||||
space is limited, the block interface allows tagging bigger chunks by
|
||||
way of interleaving. This way, 8*16 bits of information can be
|
||||
attached to a typical 4KB filesystem block.
|
||||
|
||||
This also means that applications such as fsck and mkfs will need
|
||||
access to manipulate the tags from user space. A passthrough
|
||||
interface for this is being worked on.
|
||||
|
||||
|
||||
----------------------------------------------------------------------
|
||||
4. BLOCK LAYER IMPLEMENTATION DETAILS
|
||||
|
||||
4.1 BIO
|
||||
|
||||
The data integrity patches add a new field to struct bio when
|
||||
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
|
||||
to a struct bip which contains the bio integrity payload. Essentially
|
||||
a bip is a trimmed down struct bio which holds a bio_vec containing
|
||||
the integrity metadata and the required housekeeping information (bvec
|
||||
pool, vector count, etc.)
|
||||
|
||||
A kernel subsystem can enable data integrity protection on a bio by
|
||||
calling bio_integrity_alloc(bio). This will allocate and attach the
|
||||
bip to the bio.
|
||||
|
||||
Individual pages containing integrity metadata can subsequently be
|
||||
attached using bio_integrity_add_page().
|
||||
|
||||
bio_free() will automatically free the bip.
|
||||
|
||||
|
||||
4.2 BLOCK DEVICE
|
||||
|
||||
Because the format of the protection data is tied to the physical
|
||||
disk, each block device has been extended with a block integrity
|
||||
profile (struct blk_integrity). This optional profile is registered
|
||||
with the block layer using blk_integrity_register().
|
||||
|
||||
The profile contains callback functions for generating and verifying
|
||||
the protection data, as well as getting and setting application tags.
|
||||
The profile also contains a few constants to aid in completing,
|
||||
merging and splitting the integrity metadata.
|
||||
|
||||
Layered block devices will need to pick a profile that's appropriate
|
||||
for all subdevices. blk_integrity_compare() can help with that. DM
|
||||
and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
|
||||
will require extra work due to the application tag.
|
||||
|
||||
|
||||
----------------------------------------------------------------------
|
||||
5.0 BLOCK LAYER INTEGRITY API
|
||||
|
||||
5.1 NORMAL FILESYSTEM
|
||||
|
||||
The normal filesystem is unaware that the underlying block device
|
||||
is capable of sending/receiving integrity metadata. The IMD will
|
||||
be automatically generated by the block layer at submit_bio() time
|
||||
in case of a WRITE. A READ request will cause the I/O integrity
|
||||
to be verified upon completion.
|
||||
|
||||
IMD generation and verification can be toggled using the
|
||||
|
||||
/sys/block/<bdev>/integrity/write_generate
|
||||
|
||||
and
|
||||
|
||||
/sys/block/<bdev>/integrity/read_verify
|
||||
|
||||
flags.
|
||||
|
||||
|
||||
5.2 INTEGRITY-AWARE FILESYSTEM
|
||||
|
||||
A filesystem that is integrity-aware can prepare I/Os with IMD
|
||||
attached. It can also use the application tag space if this is
|
||||
supported by the block device.
|
||||
|
||||
|
||||
int bdev_integrity_enabled(block_device, int rw);
|
||||
|
||||
bdev_integrity_enabled() will return 1 if the block device
|
||||
supports integrity metadata transfer for the data direction
|
||||
specified in 'rw'.
|
||||
|
||||
bdev_integrity_enabled() honors the write_generate and
|
||||
read_verify flags in sysfs and will respond accordingly.
|
||||
|
||||
|
||||
int bio_integrity_prep(bio);
|
||||
|
||||
To generate IMD for WRITE and to set up buffers for READ, the
|
||||
filesystem must call bio_integrity_prep(bio).
|
||||
|
||||
Prior to calling this function, the bio data direction and start
|
||||
sector must be set, and the bio should have all data pages
|
||||
added. It is up to the caller to ensure that the bio does not
|
||||
change while I/O is in progress.
|
||||
|
||||
bio_integrity_prep() should only be called if
|
||||
bio_integrity_enabled() returned 1.
|
||||
|
||||
|
||||
int bio_integrity_tag_size(bio);
|
||||
|
||||
If the filesystem wants to use the application tag space it will
|
||||
first have to find out how much storage space is available.
|
||||
Because tag space is generally limited (usually 2 bytes per
|
||||
sector regardless of sector size), the integrity framework
|
||||
supports interleaving the information between the sectors in an
|
||||
I/O.
|
||||
|
||||
Filesystems can call bio_integrity_tag_size(bio) to find out how
|
||||
many bytes of storage are available for that particular bio.
|
||||
|
||||
Another option is bdev_get_tag_size(block_device) which will
|
||||
return the number of available bytes per hardware sector.
|
||||
|
||||
|
||||
int bio_integrity_set_tag(bio, void *tag_buf, len);
|
||||
|
||||
After a successful return from bio_integrity_prep(),
|
||||
bio_integrity_set_tag() can be used to attach an opaque tag
|
||||
buffer to a bio. Obviously this only makes sense if the I/O is
|
||||
a WRITE.
|
||||
|
||||
|
||||
int bio_integrity_get_tag(bio, void *tag_buf, len);
|
||||
|
||||
Similarly, at READ I/O completion time the filesystem can
|
||||
retrieve the tag buffer using bio_integrity_get_tag().
|
||||
|
||||
|
||||
6.3 PASSING EXISTING INTEGRITY METADATA
|
||||
|
||||
Filesystems that either generate their own integrity metadata or
|
||||
are capable of transferring IMD from user space can use the
|
||||
following calls:
|
||||
|
||||
|
||||
struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
|
||||
|
||||
Allocates the bio integrity payload and hangs it off of the bio.
|
||||
nr_pages indicate how many pages of protection data need to be
|
||||
stored in the integrity bio_vec list (similar to bio_alloc()).
|
||||
|
||||
The integrity payload will be freed at bio_free() time.
|
||||
|
||||
|
||||
int bio_integrity_add_page(bio, page, len, offset);
|
||||
|
||||
Attaches a page containing integrity metadata to an existing
|
||||
bio. The bio must have an existing bip,
|
||||
i.e. bio_integrity_alloc() must have been called. For a WRITE,
|
||||
the integrity metadata in the pages must be in a format
|
||||
understood by the target device with the notable exception that
|
||||
the sector numbers will be remapped as the request traverses the
|
||||
I/O stack. This implies that the pages added using this call
|
||||
will be modified during I/O! The first reference tag in the
|
||||
integrity metadata must have a value of bip->bip_sector.
|
||||
|
||||
Pages can be added using bio_integrity_add_page() as long as
|
||||
there is room in the bip bio_vec array (nr_pages).
|
||||
|
||||
Upon completion of a READ operation, the attached pages will
|
||||
contain the integrity metadata received from the storage device.
|
||||
It is up to the receiver to process them and verify data
|
||||
integrity upon completion.
|
||||
|
||||
|
||||
6.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
|
||||
METADATA
|
||||
|
||||
To enable integrity exchange on a block device the gendisk must be
|
||||
registered as capable:
|
||||
|
||||
int blk_integrity_register(gendisk, blk_integrity);
|
||||
|
||||
The blk_integrity struct is a template and should contain the
|
||||
following:
|
||||
|
||||
static struct blk_integrity my_profile = {
|
||||
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
|
||||
.generate_fn = my_generate_fn,
|
||||
.verify_fn = my_verify_fn,
|
||||
.get_tag_fn = my_get_tag_fn,
|
||||
.set_tag_fn = my_set_tag_fn,
|
||||
.tuple_size = sizeof(struct my_tuple_size),
|
||||
.tag_size = <tag bytes per hw sector>,
|
||||
};
|
||||
|
||||
'name' is a text string which will be visible in sysfs. This is
|
||||
part of the userland API so chose it carefully and never change
|
||||
it. The format is standards body-type-variant.
|
||||
E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
|
||||
|
||||
'generate_fn' generates appropriate integrity metadata (for WRITE).
|
||||
|
||||
'verify_fn' verifies that the data buffer matches the integrity
|
||||
metadata.
|
||||
|
||||
'tuple_size' must be set to match the size of the integrity
|
||||
metadata per sector. I.e. 8 for DIF and EPP.
|
||||
|
||||
'tag_size' must be set to identify how many bytes of tag space
|
||||
are available per hardware sector. For DIF this is either 2 or
|
||||
0 depending on the value of the Control Mode Page ATO bit.
|
||||
|
||||
See 6.2 for a description of get_tag_fn and set_tag_fn.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
|
|
@ -14,9 +14,8 @@ represent the thread siblings to cpu X in the same physical package;
|
|||
To implement it in an architecture-neutral way, a new source file,
|
||||
drivers/base/topology.c, is to export the 4 attributes.
|
||||
|
||||
If one architecture wants to support this feature, it just needs to
|
||||
implement 4 defines, typically in file include/asm-XXX/topology.h.
|
||||
The 4 defines are:
|
||||
For an architecture to support this feature, it must define some of
|
||||
these macros in include/asm-XXX/topology.h:
|
||||
#define topology_physical_package_id(cpu)
|
||||
#define topology_core_id(cpu)
|
||||
#define topology_thread_siblings(cpu)
|
||||
|
@ -25,17 +24,10 @@ The 4 defines are:
|
|||
The type of **_id is int.
|
||||
The type of siblings is cpumask_t.
|
||||
|
||||
To be consistent on all architectures, the 4 attributes should have
|
||||
default values if their values are unavailable. Below is the rule.
|
||||
1) physical_package_id: If cpu has no physical package id, -1 is the
|
||||
default value.
|
||||
2) core_id: If cpu doesn't support multi-core, its core id is 0.
|
||||
3) thread_siblings: Just include itself, if the cpu doesn't support
|
||||
HT/multi-thread.
|
||||
4) core_siblings: Just include itself, if the cpu doesn't support
|
||||
multi-core and HT/Multi-thread.
|
||||
|
||||
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
|
||||
|
||||
If an attribute isn't defined on an architecture, it won't be exported.
|
||||
|
||||
To be consistent on all architectures, include/linux/topology.h
|
||||
provides default definitions for any of the above macros that are
|
||||
not defined by include/asm-XXX/topology.h:
|
||||
1) physical_package_id: -1
|
||||
2) core_id: 0
|
||||
3) thread_siblings: just the given CPU
|
||||
4) core_siblings: just the given CPU
|
||||
|
|
|
@ -222,13 +222,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
|
|||
|
||||
---------------------------
|
||||
|
||||
What: i2c-i810, i2c-prosavage and i2c-savage4
|
||||
When: May 2008
|
||||
Why: These drivers are superseded by i810fb, intelfb and savagefb.
|
||||
Who: Jean Delvare <khali@linux-fr.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What (Why):
|
||||
- include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
|
||||
(superseded by xt_TOS/xt_tos target & match)
|
||||
|
@ -315,9 +308,41 @@ Who: Matthew Wilcox <willy@linux.intel.com>
|
|||
|
||||
---------------------------
|
||||
|
||||
What: SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD,
|
||||
SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD
|
||||
When: June 2009
|
||||
Why: A newer version of the options have been introduced in 2005 that
|
||||
removes the limitions of the old API. The sctp library has been
|
||||
converted to use these new options at the same time. Any user
|
||||
space app that directly uses the old options should convert to using
|
||||
the new options.
|
||||
Who: Vlad Yasevich <vladislav.yasevich@hp.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: CONFIG_THERMAL_HWMON
|
||||
When: January 2009
|
||||
Why: This option was introduced just to allow older lm-sensors userspace
|
||||
to keep working over the upgrade to 2.6.26. At the scheduled time of
|
||||
removal fixed lm-sensors (2.x or 3.x) should be readily available.
|
||||
Who: Rene Herman <rene.herman@gmail.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
|
||||
(in net/core/net-sysfs.c)
|
||||
When: After the only user (hal) has seen a release with the patches
|
||||
for enough time, probably some time in 2010.
|
||||
Why: Over 1K .text/.data size reduction, data is available in other
|
||||
ways (ioctls)
|
||||
Who: Johannes Berg <johannes@sipsolutions.net>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: CONFIG_NF_CT_ACCT
|
||||
When: 2.6.29
|
||||
Why: Accounting can now be enabled/disabled without kernel recompilation.
|
||||
Currently used only to set a default value for a feature that is also
|
||||
controlled by a kernel/module/sysfs/sysctl parameter.
|
||||
Who: Krzysztof Piotr Oledzki <ole@ans.pl>
|
||||
|
||||
|
|
|
@ -26,11 +26,11 @@ You can simplify mounting by just typing:
|
|||
|
||||
this will allocate the first available loopback device (and load loop.o
|
||||
kernel module if necessary) automatically. If the loopback driver is not
|
||||
loaded automatically, make sure that your kernel is compiled with kmod
|
||||
support (CONFIG_KMOD) enabled. Beware that umount will not
|
||||
deallocate /dev/loopN device if /etc/mtab file on your system is a
|
||||
symbolic link to /proc/mounts. You will need to do it manually using
|
||||
"-d" switch of losetup(8). Read losetup(8) manpage for more info.
|
||||
loaded automatically, make sure that you have compiled the module and
|
||||
that modprobe is functioning. Beware that umount will not deallocate
|
||||
/dev/loopN device if /etc/mtab file on your system is a symbolic link to
|
||||
/proc/mounts. You will need to do it manually using "-d" switch of
|
||||
losetup(8). Read losetup(8) manpage for more info.
|
||||
|
||||
To create the BFS image under UnixWare you need to find out first which
|
||||
slice contains it. The command prtvtoc(1M) is your friend:
|
||||
|
|
|
@ -279,7 +279,7 @@ static struct config_item *simple_children_make_item(struct config_group *group,
|
|||
|
||||
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
||||
if (!simple_child)
|
||||
return NULL;
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
|
||||
config_item_init_type_name(&simple_child->item, name,
|
||||
|
@ -366,7 +366,7 @@ static struct config_group *group_children_make_group(struct config_group *group
|
|||
simple_children = kzalloc(sizeof(struct simple_children),
|
||||
GFP_KERNEL);
|
||||
if (!simple_children)
|
||||
return NULL;
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
|
||||
config_group_init_type_name(&simple_children->group, name,
|
||||
|
|
|
@ -13,72 +13,93 @@ Mailing list: linux-ext4@vger.kernel.org
|
|||
1. Quick usage instructions:
|
||||
===========================
|
||||
|
||||
- Grab updated e2fsprogs from
|
||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
|
||||
This is a patchset on top of e2fsprogs-1.39, which can be found at
|
||||
- Compile and install the latest version of e2fsprogs (as of this
|
||||
writing version 1.41) from:
|
||||
|
||||
http://sourceforge.net/project/showfiles.php?group_id=2406
|
||||
|
||||
or
|
||||
|
||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
|
||||
|
||||
- It's still mke2fs -j /dev/hda1
|
||||
or grab the latest git repository from:
|
||||
|
||||
- mount /dev/hda1 /wherever -t ext4dev
|
||||
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
|
||||
|
||||
- To enable extents,
|
||||
- Create a new filesystem using the ext4dev filesystem type:
|
||||
|
||||
mount /dev/hda1 /wherever -t ext4dev -o extents
|
||||
# mke2fs -t ext4dev /dev/hda1
|
||||
|
||||
- The filesystem is compatible with the ext3 driver until you add a file
|
||||
which has extents (ie: `mount -o extents', then create a file).
|
||||
Or configure an existing ext3 filesystem to support extents and set
|
||||
the test_fs flag to indicate that it's ok for an in-development
|
||||
filesystem to touch this filesystem:
|
||||
|
||||
NOTE: The "extents" mount flag is temporary. It will soon go away and
|
||||
extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
|
||||
# tune2fs -O extents -E test_fs /dev/hda1
|
||||
|
||||
If the filesystem was created with 128 byte inodes, it can be
|
||||
converted to use 256 byte for greater efficiency via:
|
||||
|
||||
# tune2fs -I 256 /dev/hda1
|
||||
|
||||
(Note: we currently do not have tools to convert an ext4dev
|
||||
filesystem back to ext3; so please do not do try this on production
|
||||
filesystems.)
|
||||
|
||||
- Mounting:
|
||||
|
||||
# mount -t ext4dev /dev/hda1 /wherever
|
||||
|
||||
- When comparing performance with other filesystems, remember that
|
||||
ext3/4 by default offers higher data integrity guarantees than most. So
|
||||
when comparing with a metadata-only journalling filesystem, use `mount -o
|
||||
data=writeback'. And you might as well use `mount -o nobh' too along
|
||||
with it. Making the journal larger than the mke2fs default often helps
|
||||
performance with metadata-intensive workloads.
|
||||
ext3/4 by default offers higher data integrity guarantees than most.
|
||||
So when comparing with a metadata-only journalling filesystem, such
|
||||
as ext3, use `mount -o data=writeback'. And you might as well use
|
||||
`mount -o nobh' too along with it. Making the journal larger than
|
||||
the mke2fs default often helps performance with metadata-intensive
|
||||
workloads.
|
||||
|
||||
2. Features
|
||||
===========
|
||||
|
||||
2.1 Currently available
|
||||
|
||||
* ability to use filesystems > 16TB
|
||||
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
|
||||
* extent format reduces metadata overhead (RAM, IO for access, transactions)
|
||||
* extent format more robust in face of on-disk corruption due to magics,
|
||||
* internal redunancy in tree
|
||||
|
||||
2.1 Previously available, soon to be enabled by default by "mkefs.ext4":
|
||||
|
||||
* dir_index and resize inode will be on by default
|
||||
* large inodes will be used by default for fast EAs, nsec timestamps, etc
|
||||
* improved file allocation (multi-block alloc)
|
||||
* fix 32000 subdirectory limit
|
||||
* nsec timestamps for mtime, atime, ctime, create time
|
||||
* inode version field on disk (NFSv4, Lustre)
|
||||
* reduced e2fsck time via uninit_bg feature
|
||||
* journal checksumming for robustness, performance
|
||||
* persistent file preallocation (e.g for streaming media, databases)
|
||||
* ability to pack bitmaps and inode tables into larger virtual groups via the
|
||||
flex_bg feature
|
||||
* large file support
|
||||
* Inode allocation using large virtual block groups via flex_bg
|
||||
* delayed allocation
|
||||
* large block (up to pagesize) support
|
||||
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
|
||||
the ordering)
|
||||
|
||||
2.2 Candidate features for future inclusion
|
||||
|
||||
There are several under discussion, whether they all make it in is
|
||||
partly a function of how much time everyone has to work on them:
|
||||
* Online defrag (patches available but not well tested)
|
||||
* reduced mke2fs time via lazy itable initialization in conjuction with
|
||||
the uninit_bg feature (capability to do this is available in e2fsprogs
|
||||
but a kernel thread to do lazy zeroing of unused inode table blocks
|
||||
after filesystem is first mounted is required for safety)
|
||||
|
||||
* improved file allocation (multi-block alloc, delayed alloc; basically done)
|
||||
* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
|
||||
* nsec timestamps for mtime, atime, ctime, create time (patch exists,
|
||||
needs some e2fsck work)
|
||||
* inode version field on disk (NFSv4, Lustre; prototype exists)
|
||||
* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
|
||||
* journal checksumming for robustness, performance (prototype exists)
|
||||
* persistent file preallocation (e.g for streaming media, databases)
|
||||
There are several others under discussion, whether they all make it in is
|
||||
partly a function of how much time everyone has to work on them. Features like
|
||||
metadata checksumming have been discussed and planned for a bit but no patches
|
||||
exist yet so I'm not sure they're in the near-term roadmap.
|
||||
|
||||
Features like metadata checksumming have been discussed and planned for
|
||||
a bit but no patches exist yet so I'm not sure they're in the near-term
|
||||
roadmap.
|
||||
The big performance win will come with mballoc, delalloc and flex_bg
|
||||
grouping of bitmaps and inode tables. Some test results available here:
|
||||
|
||||
The big performance win will come with mballoc and delalloc. CFS has
|
||||
been using mballoc for a few years already with Lustre, and IBM + Bull
|
||||
did a lot of benchmarking on it. The reason it isn't in the first set of
|
||||
patches is partly a manageability issue, and partly because it doesn't
|
||||
directly affect the on-disk format (outside of much better allocation)
|
||||
so it isn't critical to get into the first round of changes. I believe
|
||||
Alex is working on a new set of patches right now.
|
||||
- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
|
||||
- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
|
||||
|
||||
3. Options
|
||||
==========
|
||||
|
@ -222,9 +243,11 @@ stripe=n Number of filesystem blocks that mballoc will try
|
|||
to use for allocation size and alignment. For RAID5/6
|
||||
systems this should be the number of data
|
||||
disks * RAID chunk size in file system blocks.
|
||||
|
||||
delalloc (*) Deferring block allocation until write-out time.
|
||||
nodelalloc Disable delayed allocation. Blocks are allocation
|
||||
when data is copied from user to page cache.
|
||||
Data Mode
|
||||
---------
|
||||
=========
|
||||
There are 3 different data modes:
|
||||
|
||||
* writeback mode
|
||||
|
@ -236,10 +259,10 @@ typically provide the best ext4 performance.
|
|||
|
||||
* ordered mode
|
||||
In data=ordered mode, ext4 only officially journals metadata, but it logically
|
||||
groups metadata and data blocks into a single unit called a transaction. When
|
||||
it's time to write the new metadata out to disk, the associated data blocks
|
||||
are written first. In general, this mode performs slightly slower than
|
||||
writeback but significantly faster than journal mode.
|
||||
groups metadata information related to data changes with the data blocks into a
|
||||
single unit called a transaction. When it's time to write the new metadata
|
||||
out to disk, the associated data blocks are written first. In general,
|
||||
this mode performs slightly slower than writeback but significantly faster than journal mode.
|
||||
|
||||
* journal mode
|
||||
data=journal mode provides full data and metadata journaling. All new data is
|
||||
|
@ -247,7 +270,8 @@ written to the journal first, and then to its final location.
|
|||
In the event of a crash, the journal can be replayed, bringing both data and
|
||||
metadata into a consistent state. This mode is the slowest except when data
|
||||
needs to be read from and written to disk at the same time where it
|
||||
outperforms all others modes.
|
||||
outperforms all others modes. Curently ext4 does not have delayed
|
||||
allocation support if this data journalling mode is selected.
|
||||
|
||||
References
|
||||
==========
|
||||
|
@ -256,7 +280,8 @@ kernel source: <file:fs/ext4/>
|
|||
<file:fs/jbd2/>
|
||||
|
||||
programs: http://e2fsprogs.sourceforge.net/
|
||||
http://ext2resize.sourceforge.net
|
||||
|
||||
useful links: http://fedoraproject.org/wiki/ext3-devel
|
||||
http://www.bullopensource.org/ext4/
|
||||
http://ext4.wiki.kernel.org/index.php/Main_Page
|
||||
http://fedoraproject.org/wiki/Features/Ext4
|
||||
|
|
|
@ -0,0 +1,114 @@
|
|||
Glock internal locking rules
|
||||
------------------------------
|
||||
|
||||
This documents the basic principles of the glock state machine
|
||||
internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
|
||||
has two main (internal) locks:
|
||||
|
||||
1. A spinlock (gl_spin) which protects the internal state such
|
||||
as gl_state, gl_target and the list of holders (gl_holders)
|
||||
2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
|
||||
threads from making calls to the DLM, etc. at the same time. If a
|
||||
thread takes this lock, it must then call run_queue (usually via the
|
||||
workqueue) when it releases it in order to ensure any pending tasks
|
||||
are completed.
|
||||
|
||||
The gl_holders list contains all the queued lock requests (not
|
||||
just the holders) associated with the glock. If there are any
|
||||
held locks, then they will be contiguous entries at the head
|
||||
of the list. Locks are granted in strictly the order that they
|
||||
are queued, except for those marked LM_FLAG_PRIORITY which are
|
||||
used only during recovery, and even then only for journal locks.
|
||||
|
||||
There are three lock states that users of the glock layer can request,
|
||||
namely shared (SH), deferred (DF) and exclusive (EX). Those translate
|
||||
to the following DLM lock modes:
|
||||
|
||||
Glock mode | DLM lock mode
|
||||
------------------------------
|
||||
UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
|
||||
SH | PR (Protected read)
|
||||
DF | CW (Concurrent write)
|
||||
EX | EX (Exclusive)
|
||||
|
||||
Thus DF is basically a shared mode which is incompatible with the "normal"
|
||||
shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
|
||||
operations. The glocks are basically a lock plus some routines which deal
|
||||
with cache management. The following rules apply for the cache:
|
||||
|
||||
Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
|
||||
--------------------------------------------------------------------------
|
||||
UN | No | No | No | No
|
||||
SH | Yes | Yes | No | No
|
||||
DF | No | Yes | No | No
|
||||
EX | Yes | Yes | Yes | Yes
|
||||
|
||||
These rules are implemented using the various glock operations which
|
||||
are defined for each type of glock. Not all types of glocks use
|
||||
all the modes. Only inode glocks use the DF mode for example.
|
||||
|
||||
Table of glock operations and per type constants:
|
||||
|
||||
Field | Purpose
|
||||
----------------------------------------------------------------------------
|
||||
go_xmote_th | Called before remote state change (e.g. to sync dirty data)
|
||||
go_xmote_bh | Called after remote state change (e.g. to refill cache)
|
||||
go_inval | Called if remote state change requires invalidating the cache
|
||||
go_demote_ok | Returns boolean value of whether its ok to demote a glock
|
||||
| (e.g. checks timeout, and that there is no cached data)
|
||||
go_lock | Called for the first local holder of a lock
|
||||
go_unlock | Called on the final local unlock of a lock
|
||||
go_dump | Called to print content of object for debugfs file, or on
|
||||
| error to dump glock to the log.
|
||||
go_type; | The type of the glock, LM_TYPE_.....
|
||||
go_min_hold_time | The minimum hold time
|
||||
|
||||
The minimum hold time for each lock is the time after a remote lock
|
||||
grant for which we ignore remote demote requests. This is in order to
|
||||
prevent a situation where locks are being bounced around the cluster
|
||||
from node to node with none of the nodes making any progress. This
|
||||
tends to show up most with shared mmaped files which are being written
|
||||
to by multiple nodes. By delaying the demotion in response to a
|
||||
remote callback, that gives the userspace program time to make
|
||||
some progress before the pages are unmapped.
|
||||
|
||||
There is a plan to try and remove the go_lock and go_unlock callbacks
|
||||
if possible, in order to try and speed up the fast path though the locking.
|
||||
Also, eventually we hope to make the glock "EX" mode locally shared
|
||||
such that any local locking will be done with the i_mutex as required
|
||||
rather than via the glock.
|
||||
|
||||
Locking rules for glock operations:
|
||||
|
||||
Operation | GLF_LOCK bit lock held | gl_spin spinlock held
|
||||
-----------------------------------------------------------------
|
||||
go_xmote_th | Yes | No
|
||||
go_xmote_bh | Yes | No
|
||||
go_inval | Yes | No
|
||||
go_demote_ok | Sometimes | Yes
|
||||
go_lock | Yes | No
|
||||
go_unlock | Yes | No
|
||||
go_dump | Sometimes | Yes
|
||||
|
||||
N.B. Operations must not drop either the bit lock or the spinlock
|
||||
if its held on entry. go_dump and do_demote_ok must never block.
|
||||
Note that go_dump will only be called if the glock's state
|
||||
indicates that it is caching uptodate data.
|
||||
|
||||
Glock locking order within GFS2:
|
||||
|
||||
1. i_mutex (if required)
|
||||
2. Rename glock (for rename only)
|
||||
3. Inode glock(s)
|
||||
(Parents before children, inodes at "same level" with same parent in
|
||||
lock number order)
|
||||
4. Rgrp glock(s) (for (de)allocation operations)
|
||||
5. Transaction glock (via gfs2_trans_begin) for non-read operations
|
||||
6. Page lock (always last, very important!)
|
||||
|
||||
There are two glocks per inode. One deals with access to the inode
|
||||
itself (locking order as above), and the other, known as the iopen
|
||||
glock is used in conjunction with the i_nlink field in the inode to
|
||||
determine the lifetime of the inode in question. Locking of inodes
|
||||
is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
|
||||
|
|
@ -5,7 +5,7 @@
|
|||
################################################################################
|
||||
|
||||
Author: NetApp and Open Grid Computing
|
||||
Date: April 15, 2008
|
||||
Date: May 29, 2008
|
||||
|
||||
Table of Contents
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
@ -60,16 +60,18 @@ Installation
|
|||
The procedures described in this document have been tested with
|
||||
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
||||
|
||||
- Install nfs-utils-1.1.1 or greater on the client
|
||||
- Install nfs-utils-1.1.2 or greater on the client
|
||||
|
||||
An NFS/RDMA mount point can only be obtained by using the mount.nfs
|
||||
command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs
|
||||
you are using, type:
|
||||
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
|
||||
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
|
||||
version with support for NFS/RDMA mounts, but for various reasons we
|
||||
recommend using nfs-utils-1.1.2 or greater). To see which version of
|
||||
mount.nfs you are using, type:
|
||||
|
||||
> /sbin/mount.nfs -V
|
||||
$ /sbin/mount.nfs -V
|
||||
|
||||
If the version is less than 1.1.1 or the command does not exist,
|
||||
then you will need to install the latest version of nfs-utils.
|
||||
If the version is less than 1.1.2 or the command does not exist,
|
||||
you should install the latest version of nfs-utils.
|
||||
|
||||
Download the latest package from:
|
||||
|
||||
|
@ -77,22 +79,33 @@ Installation
|
|||
|
||||
Uncompress the package and follow the installation instructions.
|
||||
|
||||
If you will not be using GSS and NFSv4, the installation process
|
||||
can be simplified by disabling these features when running configure:
|
||||
If you will not need the idmapper and gssd executables (you do not need
|
||||
these to create an NFS/RDMA enabled mount command), the installation
|
||||
process can be simplified by disabling these features when running
|
||||
configure:
|
||||
|
||||
> ./configure --disable-gss --disable-nfsv4
|
||||
$ ./configure --disable-gss --disable-nfsv4
|
||||
|
||||
For more information on this see the package's README and INSTALL files.
|
||||
To build nfs-utils you will need the tcp_wrappers package installed. For
|
||||
more information on this see the package's README and INSTALL files.
|
||||
|
||||
After building the nfs-utils package, there will be a mount.nfs binary in
|
||||
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
||||
or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4.
|
||||
The standard technique is to create a symlink called mount.nfs4 to mount.nfs.
|
||||
or v4 mounts. To initiate a v4 mount, the binary must be called
|
||||
mount.nfs4. The standard technique is to create a symlink called
|
||||
mount.nfs4 to mount.nfs.
|
||||
|
||||
NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed
|
||||
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
|
||||
|
||||
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||
|
||||
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||
by the system mount commmand.
|
||||
|
||||
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||
on the NFS client machine. You do not need this specific version of
|
||||
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
||||
nfs-utils-1.1.1 is needed on the client.
|
||||
nfs-utils-1.1.2 is needed on the client.
|
||||
|
||||
- Install a Linux kernel with NFS/RDMA
|
||||
|
||||
|
@ -156,8 +169,8 @@ Check RDMA and NFS Setup
|
|||
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
||||
card:
|
||||
|
||||
> modprobe ib_mthca
|
||||
> modprobe ib_ipoib
|
||||
$ modprobe ib_mthca
|
||||
$ modprobe ib_ipoib
|
||||
|
||||
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
||||
running on the network. If your IB switch has an embedded SM, you can
|
||||
|
@ -166,7 +179,7 @@ Check RDMA and NFS Setup
|
|||
|
||||
If an SM is running on your network, you should see the following:
|
||||
|
||||
> cat /sys/class/infiniband/driverX/ports/1/state
|
||||
$ cat /sys/class/infiniband/driverX/ports/1/state
|
||||
4: ACTIVE
|
||||
|
||||
where driverX is mthca0, ipath5, ehca3, etc.
|
||||
|
@ -174,10 +187,10 @@ Check RDMA and NFS Setup
|
|||
To further test the InfiniBand software stack, use IPoIB (this
|
||||
assumes you have two IB hosts named host1 and host2):
|
||||
|
||||
host1> ifconfig ib0 a.b.c.x
|
||||
host2> ifconfig ib0 a.b.c.y
|
||||
host1> ping a.b.c.y
|
||||
host2> ping a.b.c.x
|
||||
host1$ ifconfig ib0 a.b.c.x
|
||||
host2$ ifconfig ib0 a.b.c.y
|
||||
host1$ ping a.b.c.y
|
||||
host2$ ping a.b.c.x
|
||||
|
||||
For other device types, follow the appropriate procedures.
|
||||
|
||||
|
@ -202,11 +215,11 @@ NFS/RDMA Setup
|
|||
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
||||
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
||||
|
||||
The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the
|
||||
cleint's iWARP address(es) for an RNIC.
|
||||
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
|
||||
HCA or the cleint's iWARP address(es) for an RNIC.
|
||||
|
||||
NOTE: The "insecure" option must be used because the NFS/RDMA client does not
|
||||
use a reserved port.
|
||||
NOTE: The "insecure" option must be used because the NFS/RDMA client does
|
||||
not use a reserved port.
|
||||
|
||||
Each time a machine boots:
|
||||
|
||||
|
@ -214,43 +227,45 @@ NFS/RDMA Setup
|
|||
|
||||
For InfiniBand using a Mellanox adapter:
|
||||
|
||||
> modprobe ib_mthca
|
||||
> modprobe ib_ipoib
|
||||
> ifconfig ib0 a.b.c.d
|
||||
$ modprobe ib_mthca
|
||||
$ modprobe ib_ipoib
|
||||
$ ifconfig ib0 a.b.c.d
|
||||
|
||||
NOTE: use unique addresses for the client and server
|
||||
|
||||
- Start the NFS server
|
||||
|
||||
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
||||
load the RDMA transport module:
|
||||
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||
kernel config), load the RDMA transport module:
|
||||
|
||||
> modprobe svcrdma
|
||||
$ modprobe svcrdma
|
||||
|
||||
Regardless of how the server was built (module or built-in), start the server:
|
||||
Regardless of how the server was built (module or built-in), start the
|
||||
server:
|
||||
|
||||
> /etc/init.d/nfs start
|
||||
$ /etc/init.d/nfs start
|
||||
|
||||
or
|
||||
|
||||
> service nfs start
|
||||
$ service nfs start
|
||||
|
||||
Instruct the server to listen on the RDMA transport:
|
||||
|
||||
> echo rdma 2050 > /proc/fs/nfsd/portlist
|
||||
$ echo rdma 2050 > /proc/fs/nfsd/portlist
|
||||
|
||||
- On the client system
|
||||
|
||||
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
||||
load the RDMA client module:
|
||||
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||
kernel config), load the RDMA client module:
|
||||
|
||||
> modprobe xprtrdma.ko
|
||||
$ modprobe xprtrdma.ko
|
||||
|
||||
Regardless of how the client was built (module or built-in), issue the mount.nfs command:
|
||||
Regardless of how the client was built (module or built-in), use this
|
||||
command to mount the NFS/RDMA server:
|
||||
|
||||
> /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050
|
||||
$ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||
|
||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check the
|
||||
"proto" field for the given mount.
|
||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
||||
the "proto" field for the given mount.
|
||||
|
||||
Congratulations! You're using NFS/RDMA!
|
||||
|
|
|
@ -380,28 +380,35 @@ i386 and x86_64 platforms support the new IRQ vector displays.
|
|||
Of some interest is the introduction of the /proc/irq directory to 2.4.
|
||||
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
|
||||
IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
|
||||
irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask
|
||||
irq subdir is one subdir for each IRQ, and two files; default_smp_affinity and
|
||||
prof_cpu_mask.
|
||||
|
||||
For example
|
||||
> ls /proc/irq/
|
||||
0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
|
||||
1 11 13 15 17 19 3 5 7 9
|
||||
1 11 13 15 17 19 3 5 7 9 default_smp_affinity
|
||||
> ls /proc/irq/0/
|
||||
smp_affinity
|
||||
|
||||
The contents of the prof_cpu_mask file and each smp_affinity file for each IRQ
|
||||
is the same by default:
|
||||
smp_affinity is a bitmask, in which you can specify which CPUs can handle the
|
||||
IRQ, you can set it by doing:
|
||||
|
||||
> cat /proc/irq/0/smp_affinity
|
||||
> echo 1 > /proc/irq/10/smp_affinity
|
||||
|
||||
This means that only the first CPU will handle the IRQ, but you can also echo
|
||||
5 which means that only the first and fourth CPU can handle the IRQ.
|
||||
|
||||
The contents of each smp_affinity file is the same by default:
|
||||
|
||||
> cat /proc/irq/0/smp_affinity
|
||||
ffffffff
|
||||
|
||||
It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can
|
||||
set it by doing:
|
||||
The default_smp_affinity mask applies to all non-active IRQs, which are the
|
||||
IRQs which have not yet been allocated/activated, and hence which lack a
|
||||
/proc/irq/[0-9]* directory.
|
||||
|
||||
> echo 1 > /proc/irq/prof_cpu_mask
|
||||
|
||||
This means that only the first CPU will handle the IRQ, but you can also echo 5
|
||||
which means that only the first and fourth CPU can handle the IRQ.
|
||||
prof_cpu_mask specifies which CPUs are to be profiled by the system wide
|
||||
profiler. Default value is ffffffff (all cpus).
|
||||
|
||||
The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
|
||||
between all the CPUs which are allowed to handle it. As usual the kernel has
|
||||
|
|
|
@ -248,6 +248,7 @@ The top level sysfs directory looks like:
|
|||
block/
|
||||
bus/
|
||||
class/
|
||||
dev/
|
||||
devices/
|
||||
firmware/
|
||||
net/
|
||||
|
@ -274,6 +275,11 @@ fs/ contains a directory for some filesystems. Currently each
|
|||
filesystem wanting to export attributes must create its own hierarchy
|
||||
below fs/ (see ./fuse.txt for an example).
|
||||
|
||||
dev/ contains two directories char/ and block/. Inside these two
|
||||
directories there are symlinks named <major>:<minor>. These symlinks
|
||||
point to the sysfs directory for the given device. /sys/dev provides a
|
||||
quick way to lookup the sysfs interface for a device from the result of
|
||||
a stat(2) operation.
|
||||
|
||||
More information can driver-model specific features can be found in
|
||||
Documentation/driver-model/.
|
||||
|
|
|
@ -0,0 +1,164 @@
|
|||
Introduction
|
||||
=============
|
||||
|
||||
UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
|
||||
Block Images". UBIFS is a flash file system, which means it is designed
|
||||
to work with flash devices. It is important to understand, that UBIFS
|
||||
is completely different to any traditional file-system in Linux, like
|
||||
Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
|
||||
which work with MTD devices, not block devices. The other Linux
|
||||
file-system of this class is JFFS2.
|
||||
|
||||
To make it more clear, here is a small comparison of MTD devices and
|
||||
block devices.
|
||||
|
||||
1 MTD devices represent flash devices and they consist of eraseblocks of
|
||||
rather large size, typically about 128KiB. Block devices consist of
|
||||
small blocks, typically 512 bytes.
|
||||
2 MTD devices support 3 main operations - read from some offset within an
|
||||
eraseblock, write to some offset within an eraseblock, and erase a whole
|
||||
eraseblock. Block devices support 2 main operations - read a whole
|
||||
block and write a whole block.
|
||||
3 The whole eraseblock has to be erased before it becomes possible to
|
||||
re-write its contents. Blocks may be just re-written.
|
||||
4 Eraseblocks become worn out after some number of erase cycles -
|
||||
typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
|
||||
NAND flashes. Blocks do not have the wear-out property.
|
||||
5 Eraseblocks may become bad (only on NAND flashes) and software should
|
||||
deal with this. Blocks on hard drives typically do not become bad,
|
||||
because hardware has mechanisms to substitute bad blocks, at least in
|
||||
modern LBA disks.
|
||||
|
||||
It should be quite obvious why UBIFS is very different to traditional
|
||||
file-systems.
|
||||
|
||||
UBIFS works on top of UBI. UBI is a separate software layer which may be
|
||||
found in drivers/mtd/ubi. UBI is basically a volume management and
|
||||
wear-leveling layer. It provides so called UBI volumes which is a higher
|
||||
level abstraction than a MTD device. The programming model of UBI devices
|
||||
is very similar to MTD devices - they still consist of large eraseblocks,
|
||||
they have read/write/erase operations, but UBI devices are devoid of
|
||||
limitations like wear and bad blocks (items 4 and 5 in the above list).
|
||||
|
||||
In a sense, UBIFS is a next generation of JFFS2 file-system, but it is
|
||||
very different and incompatible to JFFS2. The following are the main
|
||||
differences.
|
||||
|
||||
* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
|
||||
top of UBI volumes.
|
||||
* JFFS2 does not have on-media index and has to build it while mounting,
|
||||
which requires full media scan. UBIFS maintains the FS indexing
|
||||
information on the flash media and does not require full media scan,
|
||||
so it mounts many times faster than JFFS2.
|
||||
* JFFS2 is a write-through file-system, while UBIFS supports write-back,
|
||||
which makes UBIFS much faster on writes.
|
||||
|
||||
Similarly to JFFS2, UBIFS supports on-the-flight compression which makes
|
||||
it possible to fit quite a lot of data to the flash.
|
||||
|
||||
Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
|
||||
It does not need stuff like ckfs.ext2. UBIFS automatically replays its
|
||||
journal and recovers from crashes, ensuring that the on-flash data
|
||||
structures are consistent.
|
||||
|
||||
UBIFS scales logarithmically (most of the data structures it uses are
|
||||
trees), so the mount time and memory consumption do not linearly depend
|
||||
on the flash size, like in case of JFFS2. This is because UBIFS
|
||||
maintains the FS index on the flash media. However, UBIFS depends on
|
||||
UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
|
||||
Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
|
||||
|
||||
The authors of UBIFS believe, that it is possible to develop UBI2 which
|
||||
would scale logarithmically as well. UBI2 would support the same API as UBI,
|
||||
but it would be binary incompatible to UBI. So UBIFS would not need to be
|
||||
changed to use UBI2
|
||||
|
||||
|
||||
Mount options
|
||||
=============
|
||||
|
||||
(*) == default.
|
||||
|
||||
norm_unmount (*) commit on unmount; the journal is committed
|
||||
when the file-system is unmounted so that the
|
||||
next mount does not have to replay the journal
|
||||
and it becomes very fast;
|
||||
fast_unmount do not commit on unmount; this option makes
|
||||
unmount faster, but the next mount slower
|
||||
because of the need to replay the journal.
|
||||
|
||||
|
||||
Quick usage instructions
|
||||
========================
|
||||
|
||||
The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
|
||||
where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
|
||||
UBI volume name.
|
||||
|
||||
Mount volume 0 on UBI device 0 to /mnt/ubifs:
|
||||
$ mount -t ubifs ubi0_0 /mnt/ubifs
|
||||
|
||||
Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
|
||||
name):
|
||||
$ mount -t ubifs ubi0:rootfs /mnt/ubifs
|
||||
|
||||
The following is an example of the kernel boot arguments to attach mtd0
|
||||
to UBI and mount volume "rootfs":
|
||||
ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
|
||||
|
||||
|
||||
Module Parameters for Debugging
|
||||
===============================
|
||||
|
||||
When UBIFS has been compiled with debugging enabled, there are 3 module
|
||||
parameters that are available to control aspects of testing and debugging.
|
||||
The parameters are unsigned integers where each bit controls an option.
|
||||
The parameters are:
|
||||
|
||||
debug_msgs Selects which debug messages to display, as follows:
|
||||
|
||||
Message Type Flag value
|
||||
|
||||
General messages 1
|
||||
Journal messages 2
|
||||
Mount messages 4
|
||||
Commit messages 8
|
||||
LEB search messages 16
|
||||
Budgeting messages 32
|
||||
Garbage collection messages 64
|
||||
Tree Node Cache (TNC) messages 128
|
||||
LEB properties (lprops) messages 256
|
||||
Input/output messages 512
|
||||
Log messages 1024
|
||||
Scan messages 2048
|
||||
Recovery messages 4096
|
||||
|
||||
debug_chks Selects extra checks that UBIFS can do while running:
|
||||
|
||||
Check Flag value
|
||||
|
||||
General checks 1
|
||||
Check Tree Node Cache (TNC) 2
|
||||
Check indexing tree size 4
|
||||
Check orphan area 8
|
||||
Check old indexing tree 16
|
||||
Check LEB properties (lprops) 32
|
||||
Check leaf nodes and inodes 64
|
||||
|
||||
debug_tsts Selects a mode of testing, as follows:
|
||||
|
||||
Test mode Flag value
|
||||
|
||||
Force in-the-gaps method 2
|
||||
Failure mode for recovery testing 4
|
||||
|
||||
For example, set debug_msgs to 5 to display General messages and Mount
|
||||
messages.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
UBIFS documentation and FAQ/HOWTO at the MTD web site:
|
||||
http://www.linux-mtd.infradead.org/doc/ubifs.html
|
||||
http://www.linux-mtd.infradead.org/faq/ubifs.html
|
|
@ -2,8 +2,12 @@
|
|||
========================
|
||||
|
||||
Copyright 2008 Red Hat Inc.
|
||||
Author: Steven Rostedt <srostedt@redhat.com>
|
||||
Author: Steven Rostedt <srostedt@redhat.com>
|
||||
License: The GNU Free Documentation License, Version 1.2
|
||||
Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
|
||||
John Kacur, and David Teigland.
|
||||
|
||||
Written for: 2.6.27-rc1
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
@ -15,10 +19,11 @@ issues that take place outside of user-space.
|
|||
|
||||
Although ftrace is the function tracer, it also includes an
|
||||
infrastructure that allows for other types of tracing. Some of the
|
||||
tracers that are currently in ftrace is a tracer to trace
|
||||
tracers that are currently in ftrace include a tracer to trace
|
||||
context switches, the time it takes for a high priority task to
|
||||
run after it was woken up, the time interrupts are disabled, and
|
||||
more.
|
||||
more (ftrace allows for tracer plugins, which means that the list of
|
||||
tracers can always grow).
|
||||
|
||||
|
||||
The File System
|
||||
|
@ -32,6 +37,8 @@ To mount the debugfs system:
|
|||
# mkdir /debug
|
||||
# mount -t debugfs nodev /debug
|
||||
|
||||
(Note: it is more common to mount at /sys/kernel/debug, but for simplicity
|
||||
this document will use /debug)
|
||||
|
||||
That's it! (assuming that you have ftrace configured into your kernel)
|
||||
|
||||
|
@ -46,21 +53,20 @@ of ftrace. Here is a list of some of the key files:
|
|||
that is configured.
|
||||
|
||||
available_tracers : This holds the different types of tracers that
|
||||
has been compiled into the kernel. The tracers
|
||||
listed here can be configured by echoing in their
|
||||
name into current_tracer.
|
||||
have been compiled into the kernel. The tracers
|
||||
listed here can be configured by echoing their name
|
||||
into current_tracer.
|
||||
|
||||
tracing_enabled : This sets or displays whether the current_tracer
|
||||
is activated and tracing or not. Echo 0 into this
|
||||
file to disable the tracer or 1 (or non-zero) to
|
||||
enable it.
|
||||
file to disable the tracer or 1 to enable it.
|
||||
|
||||
trace : This file holds the output of the trace in a human readable
|
||||
format.
|
||||
format (described below).
|
||||
|
||||
latency_trace : This file shows the same trace but the information
|
||||
is organized more to display possible latencies
|
||||
in the system.
|
||||
in the system (described below).
|
||||
|
||||
trace_pipe : The output is the same as the "trace" file but this
|
||||
file is meant to be streamed with live tracing.
|
||||
|
@ -72,7 +78,7 @@ of ftrace. Here is a list of some of the key files:
|
|||
file, it is consumed, and will not be read
|
||||
again with a sequential read. The "trace" and
|
||||
"latency_trace" files are static, and if the
|
||||
tracer isn't adding more data, they will display
|
||||
tracer is not adding more data, they will display
|
||||
the same information every time they are read.
|
||||
|
||||
iter_ctrl : This file lets the user control the amount of data
|
||||
|
@ -89,12 +95,14 @@ of ftrace. Here is a list of some of the key files:
|
|||
|
||||
trace_entries : This sets or displays the number of trace
|
||||
entries each CPU buffer can hold. The tracer buffers
|
||||
are the same size for each CPU, so care must be
|
||||
taken when modifying the trace_entries. The number
|
||||
of actually entries will be the number given
|
||||
times the number of possible CPUS. The buffers
|
||||
are saved as individual pages, and the actual entries
|
||||
will always be rounded up to entries per page.
|
||||
are the same size for each CPU. The displayed number
|
||||
is the size of the CPU buffer and not total size. The
|
||||
trace buffers are allocated in pages (blocks of memory
|
||||
that the kernel uses for allocation, usually 4 KB in size).
|
||||
Since each entry is smaller than a page, if the last
|
||||
allocated page has room for more entries than were
|
||||
requested, the rest of the page is used to allocate
|
||||
entries.
|
||||
|
||||
This can only be updated when the current_tracer
|
||||
is set to "none".
|
||||
|
@ -107,20 +115,19 @@ of ftrace. Here is a list of some of the key files:
|
|||
on specified CPUS. The format is a hex string
|
||||
representing the CPUS.
|
||||
|
||||
set_ftrace_filter : When dynamic ftrace is configured in, the
|
||||
code is dynamically modified to disable calling
|
||||
of the function profiler (mcount). This lets
|
||||
tracing be configured in with practically no overhead
|
||||
in performance. This also has a side effect of
|
||||
enabling or disabling specific functions to be
|
||||
traced. Echoing in names of functions into this
|
||||
file will limit the trace to only those files.
|
||||
set_ftrace_filter : When dynamic ftrace is configured in (see the
|
||||
section below "dynamic ftrace"), the code is dynamically
|
||||
modified (code text rewrite) to disable calling of the
|
||||
function profiler (mcount). This lets tracing be configured
|
||||
in with practically no overhead in performance. This also
|
||||
has a side effect of enabling or disabling specific functions
|
||||
to be traced. Echoing names of functions into this file
|
||||
will limit the trace to only those functions.
|
||||
|
||||
set_ftrace_notrace: This has the opposite effect that
|
||||
set_ftrace_filter has. Any function that is added
|
||||
here will not be traced. If a function exists
|
||||
in both set_ftrace_filter and set_ftrace_notrace
|
||||
the function will _not_ bet traced.
|
||||
set_ftrace_notrace: This has an effect opposite to that of
|
||||
set_ftrace_filter. Any function that is added here will not
|
||||
be traced. If a function exists in both set_ftrace_filter
|
||||
and set_ftrace_notrace, the function will _not_ be traced.
|
||||
|
||||
available_filter_functions : When a function is encountered the first
|
||||
time by the dynamic tracer, it is recorded and
|
||||
|
@ -128,32 +135,31 @@ of ftrace. Here is a list of some of the key files:
|
|||
lists the functions that have been recorded
|
||||
by the dynamic tracer and these functions can
|
||||
be used to set the ftrace filter by the above
|
||||
"set_ftrace_filter" file.
|
||||
"set_ftrace_filter" file. (See the section "dynamic ftrace"
|
||||
below for more details).
|
||||
|
||||
|
||||
The Tracers
|
||||
-----------
|
||||
|
||||
Here are the list of current tracers that can be configured.
|
||||
Here is the list of current tracers that may be configured.
|
||||
|
||||
ftrace - function tracer that uses mcount to trace all functions.
|
||||
It is possible to filter out which functions that are
|
||||
traced when dynamic ftrace is configured in.
|
||||
|
||||
sched_switch - traces the context switches between tasks.
|
||||
|
||||
irqsoff - traces the areas that disable interrupts and saves off
|
||||
irqsoff - traces the areas that disable interrupts and saves
|
||||
the trace with the longest max latency.
|
||||
See tracing_max_latency. When a new max is recorded,
|
||||
it replaces the old trace. It is best to view this
|
||||
trace with the latency_trace file.
|
||||
trace via the latency_trace file.
|
||||
|
||||
preemptoff - Similar to irqsoff but traces and records the time
|
||||
preemption is disabled.
|
||||
preemptoff - Similar to irqsoff but traces and records the amount of
|
||||
time for which preemption is disabled.
|
||||
|
||||
preemptirqsoff - Similar to irqsoff and preemptoff, but traces and
|
||||
records the largest time irqs and/or preemption is
|
||||
disabled.
|
||||
records the largest time for which irqs and/or preemption
|
||||
is disabled.
|
||||
|
||||
wakeup - Traces and records the max latency that it takes for
|
||||
the highest priority task to get scheduled after
|
||||
|
@ -166,13 +172,13 @@ Here are the list of current tracers that can be configured.
|
|||
Examples of using the tracer
|
||||
----------------------------
|
||||
|
||||
Here are typical examples of using the tracers with only controlling
|
||||
them with the debugfs interface (without using any user-land utilities).
|
||||
Here are typical examples of using the tracers when controlling them only
|
||||
with the debugfs interface (without using any user-land utilities).
|
||||
|
||||
Output format:
|
||||
--------------
|
||||
|
||||
Here's an example of the output format of the file "trace"
|
||||
Here is an example of the output format of the file "trace"
|
||||
|
||||
--------
|
||||
# tracer: ftrace
|
||||
|
@ -184,14 +190,15 @@ Here's an example of the output format of the file "trace"
|
|||
bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
|
||||
--------
|
||||
|
||||
A header is printed with the trace that is represented. In this case
|
||||
the tracer is "ftrace". Then a header showing the format. Task name
|
||||
"bash", the task PID "4251", the CPU that it was running on
|
||||
A header is printed with the tracer name that is represented by the trace.
|
||||
In this case the tracer is "ftrace". Then a header showing the format. Task
|
||||
name "bash", the task PID "4251", the CPU that it was running on
|
||||
"01", the timestamp in <secs>.<usecs> format, the function name that was
|
||||
traced "path_put" and the parent function that called this function
|
||||
"path_walk".
|
||||
"path_walk". The timestamp is the time at which the function was
|
||||
entered.
|
||||
|
||||
The sched_switch tracer also includes tracing of task wake ups and
|
||||
The sched_switch tracer also includes tracing of task wakeups and
|
||||
context switches.
|
||||
|
||||
ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
|
||||
|
@ -201,7 +208,7 @@ context switches.
|
|||
kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
|
||||
ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
|
||||
|
||||
Wake ups are represented by a "+" and the context switches show
|
||||
Wake ups are represented by a "+" and the context switches are shown as
|
||||
"==>". The format is:
|
||||
|
||||
Context switches:
|
||||
|
@ -216,7 +223,7 @@ Wake ups are represented by a "+" and the context switches show
|
|||
|
||||
<pid>:<prio>:<state> + <pid>:<prio>:<state>
|
||||
|
||||
The prio is the internal kernel priority, which is inverse to the
|
||||
The prio is the internal kernel priority, which is the inverse of the
|
||||
priority that is usually displayed by user-space tools. Zero represents
|
||||
the highest priority (99). Prio 100 starts the "nice" priorities with
|
||||
100 being equal to nice -20 and 139 being nice 19. The prio "140" is
|
||||
|
@ -227,7 +234,7 @@ Latency trace format
|
|||
--------------------
|
||||
|
||||
For traces that display latency times, the latency_trace file gives
|
||||
a bit more information to see why a latency happened. Here's a typical
|
||||
somewhat more information to see why a latency happened. Here is a typical
|
||||
trace.
|
||||
|
||||
# tracer: irqsoff
|
||||
|
@ -255,21 +262,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
<idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
|
||||
|
||||
This shows that the current tracer is "irqsoff" tracing the time
|
||||
interrupts are disabled. It gives the trace version and the kernel
|
||||
this was executed on (2.6.26-rc8). Then it displays the max latency
|
||||
in microsecs (97 us). The number of trace entries displayed
|
||||
by the total number recorded (both are three: #3/3). The type of
|
||||
This shows that the current tracer is "irqsoff" tracing the time for which
|
||||
interrupts were disabled. It gives the trace version and the version
|
||||
of the kernel upon which this was executed on (2.6.26-rc8). Then it displays
|
||||
the max latency in microsecs (97 us). The number of trace entries displayed
|
||||
and the total number recorded (both are three: #3/3). The type of
|
||||
preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero
|
||||
and reserved for later use. #P is the number of online CPUS (#P:2).
|
||||
and are reserved for later use. #P is the number of online CPUS (#P:2).
|
||||
|
||||
The task is the process that was running when the latency happened.
|
||||
The task is the process that was running when the latency occurred.
|
||||
(swapper pid: 0).
|
||||
|
||||
The start and stop that caused the latencies:
|
||||
The start and stop (the functions in which the interrupts were disabled and
|
||||
enabled respectively) that caused the latencies:
|
||||
|
||||
apic_timer_interrupt is where the interrupts were disabled.
|
||||
do_softirq is where they were enabled again.
|
||||
|
@ -281,14 +287,14 @@ explains which is which.
|
|||
|
||||
pid: The PID of that process.
|
||||
|
||||
CPU#: The CPU that the process was running on.
|
||||
CPU#: The CPU which the process was running on.
|
||||
|
||||
irqs-off: 'd' interrupts are disabled. '.' otherwise.
|
||||
|
||||
need-resched: 'N' task need_resched is set, '.' otherwise.
|
||||
|
||||
hardirq/softirq:
|
||||
'H' - hard irq happened inside a softirq.
|
||||
'H' - hard irq occurred inside a softirq.
|
||||
'h' - hard irq is running
|
||||
's' - soft irq is running
|
||||
'.' - normal context.
|
||||
|
@ -297,13 +303,13 @@ explains which is which.
|
|||
|
||||
The above is mostly meaningful for kernel developers.
|
||||
|
||||
time: This differs from the trace output where as the trace output
|
||||
contained a absolute timestamp. This timestamp is relative
|
||||
to the start of the first entry in the the trace.
|
||||
time: This differs from the trace file output. The trace file output
|
||||
includes an absolute timestamp. The timestamp used by the
|
||||
latency_trace file is relative to the start of the trace.
|
||||
|
||||
delay: This is just to help catch your eye a bit better. And
|
||||
needs to be fixed to be only relative to the same CPU.
|
||||
The marks is determined by the difference between this
|
||||
The marks are determined by the difference between this
|
||||
current trace and the next trace.
|
||||
'!' - greater than preempt_mark_thresh (default 100)
|
||||
'+' - greater than 1 microsecond
|
||||
|
@ -322,13 +328,13 @@ output. To see what is available, simply cat the file:
|
|||
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
|
||||
noblock nostacktrace nosched-tree
|
||||
|
||||
To disable one of the options, echo in the option appended with "no".
|
||||
To disable one of the options, echo in the option prepended with "no".
|
||||
|
||||
echo noprint-parent > /debug/tracing/iter_ctrl
|
||||
|
||||
To enable an option, leave off the "no".
|
||||
|
||||
echo sym-offest > /debug/tracing/iter_ctrl
|
||||
echo sym-offset > /debug/tracing/iter_ctrl
|
||||
|
||||
Here are the available options:
|
||||
|
||||
|
@ -344,7 +350,7 @@ Here are the available options:
|
|||
|
||||
sym-offset - Display not only the function name, but also the offset
|
||||
in the function. For example, instead of seeing just
|
||||
"ktime_get" you will see "ktime_get+0xb/0x20"
|
||||
"ktime_get", you will see "ktime_get+0xb/0x20".
|
||||
|
||||
sym-offset:
|
||||
bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
|
||||
|
@ -364,7 +370,7 @@ Here are the available options:
|
|||
user applications that can translate the raw numbers better than
|
||||
having it done in the kernel.
|
||||
|
||||
hex - similar to raw, but the numbers will be in a hexadecimal format.
|
||||
hex - Similar to raw, but the numbers will be in a hexadecimal format.
|
||||
|
||||
bin - This will print out the formats in raw binary.
|
||||
|
||||
|
@ -380,8 +386,8 @@ Here are the available options:
|
|||
sched_switch
|
||||
------------
|
||||
|
||||
This tracer simply records schedule switches. Here's an example
|
||||
on how to implement it.
|
||||
This tracer simply records schedule switches. Here is an example
|
||||
of how to use it.
|
||||
|
||||
# echo sched_switch > /debug/tracing/current_tracer
|
||||
# echo 1 > /debug/tracing/tracing_enabled
|
||||
|
@ -416,8 +422,8 @@ the name of the trace and points to the options. The "FUNCTION"
|
|||
is a misnomer since here it represents the wake ups and context
|
||||
switches.
|
||||
|
||||
The sched_switch only lists the wake ups (represented with '+')
|
||||
and context switches ('==>') with the previous task or current
|
||||
The sched_switch file only lists the wake ups (represented with '+')
|
||||
and context switches ('==>') with the previous task or current task
|
||||
first followed by the next task or task waking up. The format for both
|
||||
of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO
|
||||
is the inverse of the actual priority with zero (0) being the highest
|
||||
|
@ -432,7 +438,8 @@ The task states are:
|
|||
|
||||
R - running : wants to run, may not actually be running
|
||||
S - sleep : process is waiting to be woken up (handles signals)
|
||||
D - deep sleep : process must be woken up (ignores signals)
|
||||
D - disk sleep (uninterruptible sleep) : process must be woken up
|
||||
(ignores signals)
|
||||
T - stopped : process suspended
|
||||
t - traced : process is being traced (with something like gdb)
|
||||
Z - zombie : process waiting to be cleaned up
|
||||
|
@ -442,8 +449,8 @@ The task states are:
|
|||
ftrace_enabled
|
||||
--------------
|
||||
|
||||
The following tracers give different output depending on whether
|
||||
or not the sysctl ftrace_enabled is set. To set ftrace_enabled,
|
||||
The following tracers (listed below) give different output depending
|
||||
on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled,
|
||||
one can either use the sysctl function or set it via the proc
|
||||
file system interface.
|
||||
|
||||
|
@ -470,13 +477,12 @@ interrupt from triggering or the mouse interrupt from letting the
|
|||
kernel know of a new mouse event. The result is a latency with the
|
||||
reaction time.
|
||||
|
||||
The irqsoff tracer tracks the time interrupts are disabled and when
|
||||
they are re-enabled. When a new maximum latency is hit, it saves off
|
||||
the trace so that it may be retrieved at a later time. Every time a
|
||||
new maximum in reached, the old saved trace is discarded and the new
|
||||
trace is saved.
|
||||
The irqsoff tracer tracks the time for which interrupts are disabled.
|
||||
When a new maximum latency is hit, the tracer saves the trace leading up
|
||||
to that latency point so that every time a new maximum is reached, the old
|
||||
saved trace is discarded and the new trace is saved.
|
||||
|
||||
To reset the maximum, echo 0 into tracing_max_latency. Here's an
|
||||
To reset the maximum, echo 0 into tracing_max_latency. Here is an
|
||||
example:
|
||||
|
||||
# echo irqsoff > /debug/tracing/current_tracer
|
||||
|
@ -488,14 +494,14 @@ example:
|
|||
# cat /debug/tracing/latency_trace
|
||||
# tracer: irqsoff
|
||||
#
|
||||
irqsoff latency trace v1.1.5 on 2.6.26-rc8
|
||||
irqsoff latency trace v1.1.5 on 2.6.26
|
||||
--------------------------------------------------------------------
|
||||
latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
|
||||
latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
|
||||
-----------------
|
||||
| task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0)
|
||||
| task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
|
||||
-----------------
|
||||
=> started at: copy_page_range
|
||||
=> ended at: copy_page_range
|
||||
=> started at: sys_setpgid
|
||||
=> ended at: sys_setpgid
|
||||
|
||||
# _------=> CPU#
|
||||
# / _-----=> irqs-off
|
||||
|
@ -506,21 +512,19 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
# ||||| delay
|
||||
# cmd pid ||||| time | caller
|
||||
# \ / ||||| \ | /
|
||||
bash-4269 1...1 0us+: _spin_lock (copy_page_range)
|
||||
bash-4269 1...1 7us : _spin_unlock (copy_page_range)
|
||||
bash-4269 1...2 7us : trace_preempt_on (copy_page_range)
|
||||
bash-3730 1d... 0us : _write_lock_irq (sys_setpgid)
|
||||
bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid)
|
||||
bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
Here we see that that we had a latency of 12 microsecs (which is
|
||||
very good). The _write_lock_irq in sys_setpgid disabled interrupts.
|
||||
The difference between the 12 and the displayed timestamp 14us occurred
|
||||
because the clock was incremented between the time of recording the max
|
||||
latency and the time of recording the function that had that latency.
|
||||
|
||||
Here we see that that we had a latency of 6 microsecs (which is
|
||||
very good). The spin_lock in copy_page_range disabled interrupts.
|
||||
The difference between the 6 and the displayed timestamp 7us is
|
||||
because the clock must have incremented between the time of recording
|
||||
the max latency and recording the function that had that latency.
|
||||
|
||||
Note the above had ftrace_enabled not set. If we set the ftrace_enabled
|
||||
we get a much larger output:
|
||||
Note the above example had ftrace_enabled not set. If we set the
|
||||
ftrace_enabled, we get a much larger output:
|
||||
|
||||
# tracer: irqsoff
|
||||
#
|
||||
|
@ -566,27 +570,26 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
|
||||
|
||||
Here we traced a 50 microsecond latency. But we also see all the
|
||||
functions that were called during that time. Note that enabling
|
||||
function tracing we endure an added overhead. This overhead may
|
||||
extend the latency times. But never the less, this trace has provided
|
||||
some very helpful debugging.
|
||||
functions that were called during that time. Note that by enabling
|
||||
function tracing, we incur an added overhead. This overhead may
|
||||
extend the latency times. But nevertheless, this trace has provided
|
||||
some very helpful debugging information.
|
||||
|
||||
|
||||
preemptoff
|
||||
----------
|
||||
|
||||
When preemption is disabled we may be able to receive interrupts but
|
||||
the task can not be preempted and a higher priority task must wait
|
||||
When preemption is disabled, we may be able to receive interrupts but
|
||||
the task cannot be preempted and a higher priority task must wait
|
||||
for preemption to be enabled again before it can preempt a lower
|
||||
priority task.
|
||||
|
||||
The preemptoff tracer traces the places that disables preemption.
|
||||
Like the irqsoff, it records the maximum latency that preemption
|
||||
was disabled. The control of preemptoff is much like the irqsoff.
|
||||
The preemptoff tracer traces the places that disable preemption.
|
||||
Like the irqsoff tracer, it records the maximum latency for which preemption
|
||||
was disabled. The control of preemptoff tracer is much like the irqsoff
|
||||
tracer.
|
||||
|
||||
# echo preemptoff > /debug/tracing/current_tracer
|
||||
# echo 0 > /debug/tracing/tracing_max_latency
|
||||
|
@ -620,8 +623,6 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
|
||||
This has some more changes. Preemption was disabled when an interrupt
|
||||
came in (notice the 'h'), and was enabled while doing a softirq.
|
||||
(notice the 's'). But we also see that interrupts have been disabled
|
||||
|
@ -689,16 +690,16 @@ The above is an example of the preemptoff trace with ftrace_enabled
|
|||
set. Here we see that interrupts were disabled the entire time.
|
||||
The irq_enter code lets us know that we entered an interrupt 'h'.
|
||||
Before that, the functions being traced still show that it is not
|
||||
in an interrupt, but we can see by the functions themselves that
|
||||
in an interrupt, but we can see from the functions themselves that
|
||||
this is not the case.
|
||||
|
||||
Notice that the __do_softirq when called doesn't have a preempt_count.
|
||||
It may seem that we missed a preempt enabled. What really happened
|
||||
is that the preempt count is held on the threads stack and we
|
||||
Notice that __do_softirq when called does not have a preempt_count.
|
||||
It may seem that we missed a preempt enabling. What really happened
|
||||
is that the preempt count is held on the thread's stack and we
|
||||
switched to the softirq stack (4K stacks in effect). The code
|
||||
does not copy the preempt count, but because interrupts are disabled
|
||||
we don't need to worry about it. Having a tracer like this is good
|
||||
to let people know what really happens inside the kernel.
|
||||
does not copy the preempt count, but because interrupts are disabled,
|
||||
we do not need to worry about it. Having a tracer like this is good
|
||||
for letting people know what really happens inside the kernel.
|
||||
|
||||
|
||||
preemptirqsoff
|
||||
|
@ -708,7 +709,7 @@ Knowing the locations that have interrupts disabled or preemption
|
|||
disabled for the longest times is helpful. But sometimes we would
|
||||
like to know when either preemption and/or interrupts are disabled.
|
||||
|
||||
The following code:
|
||||
Consider the following code:
|
||||
|
||||
local_irq_disable();
|
||||
call_function_with_irqs_off();
|
||||
|
@ -732,7 +733,7 @@ To record this time, use the preemptirqsoff tracer.
|
|||
|
||||
Again, using this trace is much like the irqsoff and preemptoff tracers.
|
||||
|
||||
# echo preemptoff > /debug/tracing/current_tracer
|
||||
# echo preemptirqsoff > /debug/tracing/current_tracer
|
||||
# echo 0 > /debug/tracing/tracing_max_latency
|
||||
# echo 1 > /debug/tracing/tracing_enabled
|
||||
# ls -ltr
|
||||
|
@ -764,12 +765,10 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
|
||||
|
||||
The trace_hardirqs_off_thunk is called from assembly on x86 when
|
||||
interrupts are disabled in the assembly code. Without the function
|
||||
tracing, we don't know if interrupts were enabled within the preemption
|
||||
tracing, we do not know if interrupts were enabled within the preemption
|
||||
points. We do see that it started with preemption enabled.
|
||||
|
||||
Here is a trace with ftrace_enabled set:
|
||||
|
@ -860,25 +859,25 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
|
|||
|
||||
This is a very interesting trace. It started with the preemption of
|
||||
the ls task. We see that the task had the "need_resched" bit set
|
||||
with the 'N' in the trace. Interrupts are disabled in the spin_lock
|
||||
and the trace started. We see that a schedule took place to run
|
||||
sshd. When the interrupts were enabled we took an interrupt.
|
||||
On return of the interrupt the softirq ran. We took another interrupt
|
||||
while running the softirq as we see with the capital 'H'.
|
||||
via the 'N' in the trace. Interrupts were disabled before the spin_lock
|
||||
at the beginning of the trace. We see that a schedule took place to run
|
||||
sshd. When the interrupts were enabled, we took an interrupt.
|
||||
On return from the interrupt handler, the softirq ran. We took another
|
||||
interrupt while running the softirq as we see from the capital 'H'.
|
||||
|
||||
|
||||
wakeup
|
||||
------
|
||||
|
||||
In Real-Time environment it is very important to know the wakeup
|
||||
time it takes for the highest priority task that wakes up to the
|
||||
time it executes. This is also known as "schedule latency".
|
||||
In a Real-Time environment it is very important to know the wakeup
|
||||
time it takes for the highest priority task that is woken up to the
|
||||
time that it executes. This is also known as "schedule latency".
|
||||
I stress the point that this is about RT tasks. It is also important
|
||||
to know the scheduling latency of non-RT tasks, but the average
|
||||
schedule latency is better for non-RT tasks. Tools like
|
||||
LatencyTop is more appropriate for such measurements.
|
||||
LatencyTop are more appropriate for such measurements.
|
||||
|
||||
Real-Time environments is interested in the worst case latency.
|
||||
Real-Time environments are interested in the worst case latency.
|
||||
That is the longest latency it takes for something to happen, and
|
||||
not the average. We can have a very fast scheduler that may only
|
||||
have a large latency once in a while, but that would not work well
|
||||
|
@ -889,8 +888,8 @@ tasks that are unpredictable will overwrite the worst case latency
|
|||
of RT tasks.
|
||||
|
||||
Since this tracer only deals with RT tasks, we will run this slightly
|
||||
different than we did with the previous tracers. Instead of performing
|
||||
an 'ls' we will run 'sleep 1' under 'chrt' which changes the
|
||||
differently than we did with the previous tracers. Instead of performing
|
||||
an 'ls', we will run 'sleep 1' under 'chrt' which changes the
|
||||
priority of the task.
|
||||
|
||||
# echo wakeup > /debug/tracing/current_tracer
|
||||
|
@ -921,12 +920,10 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8
|
|||
<idle>-0 1d..4 4us : schedule (cpu_idle)
|
||||
|
||||
|
||||
vim:ft=help
|
||||
|
||||
|
||||
Running this on an idle system we see that it only took 4 microseconds
|
||||
Running this on an idle system, we see that it only took 4 microseconds
|
||||
to perform the task switch. Note, since the trace marker in the
|
||||
schedule is before the actual "switch" we stop the tracing when
|
||||
schedule is before the actual "switch", we stop the tracing when
|
||||
the recorded task is about to schedule in. This may change if
|
||||
we add a new marker at the end of the scheduler.
|
||||
|
||||
|
@ -991,13 +988,16 @@ ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
|
|||
ksoftirq-7 1d..4 50us : schedule (__cond_resched)
|
||||
|
||||
The interrupt went off while running ksoftirqd. This task runs at
|
||||
SCHED_OTHER. Why didn't we see the 'N' set early? This may be
|
||||
a harmless bug with x86_32 and 4K stacks. The need_reched() function
|
||||
that tests if we need to reschedule looks on the actual stack.
|
||||
Where as the setting of the NEED_RESCHED bit happens on the
|
||||
task's stack. But because we are in a hard interrupt, the test
|
||||
is with the interrupts stack which has that to be false. We don't
|
||||
see the 'N' until we switch back to the task's stack.
|
||||
SCHED_OTHER. Why did not we see the 'N' set early? This may be
|
||||
a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks
|
||||
configured, the interrupt and softirq run with their own stack.
|
||||
Some information is held on the top of the task's stack (need_resched
|
||||
and preempt_count are both stored there). The setting of the NEED_RESCHED
|
||||
bit is done directly to the task's stack, but the reading of the
|
||||
NEED_RESCHED is done by looking at the current stack, which in this case
|
||||
is the stack for the hard interrupt. This hides the fact that NEED_RESCHED
|
||||
has been set. We do not see the 'N' until we switch back to the task's
|
||||
assigned stack.
|
||||
|
||||
ftrace
|
||||
------
|
||||
|
@ -1036,14 +1036,14 @@ this tracer is a nop.
|
|||
[...]
|
||||
|
||||
|
||||
Note: It is sometimes better to enable or disable tracing directly from
|
||||
a program, because the buffer may be overflowed by the echo commands
|
||||
before you get to the point you want to trace. It is also easier to
|
||||
stop the tracing at the point that you hit the part that you are
|
||||
interested in. Since the ftrace buffer is a ring buffer with the
|
||||
oldest data being overwritten, usually it is sufficient to start the
|
||||
tracer with an echo command but have you code stop it. Something
|
||||
like the following is usually appropriate for this.
|
||||
Note: ftrace uses ring buffers to store the above entries. The newest data
|
||||
may overwrite the oldest data. Sometimes using echo to stop the trace
|
||||
is not sufficient because the tracing could have overwritten the data
|
||||
that you wanted to record. For this reason, it is sometimes better to
|
||||
disable tracing directly from a program. This allows you to stop the
|
||||
tracing at the point that you hit the part that you are interested in.
|
||||
To disable the tracing directly from a C program, something like following
|
||||
code snippet can be used:
|
||||
|
||||
int trace_fd;
|
||||
[...]
|
||||
|
@ -1052,25 +1052,31 @@ int main(int argc, char *argv[]) {
|
|||
trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
|
||||
[...]
|
||||
if (condition_hit()) {
|
||||
write(trace_fd, "0", 1);
|
||||
write(trace_fd, "0", 1);
|
||||
}
|
||||
[...]
|
||||
}
|
||||
|
||||
Note: Here we hard coded the path name. The debugfs mount is not
|
||||
guaranteed to be at /debug (and is more commonly at /sys/kernel/debug).
|
||||
For simple one time traces, the above is sufficent. For anything else,
|
||||
a search through /proc/mounts may be needed to find where the debugfs
|
||||
file-system is mounted.
|
||||
|
||||
dynamic ftrace
|
||||
--------------
|
||||
|
||||
If CONFIG_DYNAMIC_FTRACE is set, then the system will run with
|
||||
If CONFIG_DYNAMIC_FTRACE is set, the system will run with
|
||||
virtually no overhead when function tracing is disabled. The way
|
||||
this works is the mcount function call (placed at the start of
|
||||
every kernel function, produced by the -pg switch in gcc), starts
|
||||
of pointing to a simple return.
|
||||
of pointing to a simple return. (Enabling FTRACE will include the
|
||||
-pg switch in the compiling of the kernel.)
|
||||
|
||||
When dynamic ftrace is initialized, it calls kstop_machine to make it
|
||||
act like a uniprocessor so that it can freely modify code without
|
||||
worrying about other processors executing that same code. At
|
||||
initialization, the mcount calls are change to call a "record_ip"
|
||||
When dynamic ftrace is initialized, it calls kstop_machine to make
|
||||
the machine act like a uniprocessor so that it can freely modify code
|
||||
without worrying about other processors executing that same code. At
|
||||
initialization, the mcount calls are changed to call a "record_ip"
|
||||
function. After this, the first time a kernel function is called,
|
||||
it has the calling address saved in a hash table.
|
||||
|
||||
|
@ -1078,15 +1084,15 @@ Later on the ftraced kernel thread is awoken and will again call
|
|||
kstop_machine if new functions have been recorded. The ftraced thread
|
||||
will change all calls to mcount to "nop". Just calling mcount
|
||||
and having mcount return has shown a 10% overhead. By converting
|
||||
it to a nop, there is no recordable overhead to the system.
|
||||
it to a nop, there is no measurable overhead to the system.
|
||||
|
||||
One special side-effect to the recording of the functions being
|
||||
traced, is that we can now selectively choose which functions we
|
||||
want to trace and which ones we want the mcount calls to remain as
|
||||
traced is that we can now selectively choose which functions we
|
||||
wish to trace and which ones we want the mcount calls to remain as
|
||||
nops.
|
||||
|
||||
Two files that contain to the enabling and disabling of recorded
|
||||
functions are:
|
||||
Two files are used, one for enabling and one for disabling the tracing
|
||||
of specified functions. They are:
|
||||
|
||||
set_ftrace_filter
|
||||
|
||||
|
@ -1094,7 +1100,7 @@ and
|
|||
|
||||
set_ftrace_notrace
|
||||
|
||||
A list of available functions that you can add to this files is listed
|
||||
A list of available functions that you can add to these files is listed
|
||||
in:
|
||||
|
||||
available_filter_functions
|
||||
|
@ -1108,7 +1114,7 @@ pick_next_task_fair
|
|||
mutex_lock
|
||||
[...]
|
||||
|
||||
If I'm only interested in sys_nanosleep and hrtimer_interrupt:
|
||||
If I am only interested in sys_nanosleep and hrtimer_interrupt:
|
||||
|
||||
# echo sys_nanosleep hrtimer_interrupt \
|
||||
> /debug/tracing/set_ftrace_filter
|
||||
|
@ -1125,21 +1131,21 @@ If I'm only interested in sys_nanosleep and hrtimer_interrupt:
|
|||
usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
|
||||
<idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
|
||||
|
||||
To see what functions are being traced, you can cat the file:
|
||||
To see which functions are being traced, you can cat the file:
|
||||
|
||||
# cat /debug/tracing/set_ftrace_filter
|
||||
hrtimer_interrupt
|
||||
sys_nanosleep
|
||||
|
||||
|
||||
Perhaps this isn't enough. The filters also allow simple wild cards.
|
||||
Only the following is currently available
|
||||
Perhaps this is not enough. The filters also allow simple wild cards.
|
||||
Only the following are currently available
|
||||
|
||||
<match>* - will match functions that begins with <match>
|
||||
<match>* - will match functions that begin with <match>
|
||||
*<match> - will match functions that end with <match>
|
||||
*<match>* - will match functions that have <match> in it
|
||||
|
||||
Thats all the wild cards that are allowed.
|
||||
These are the only wild cards which are supported.
|
||||
|
||||
<match>*<match> will not work.
|
||||
|
||||
|
@ -1187,7 +1193,7 @@ This is because the '>' and '>>' act just like they do in bash.
|
|||
To rewrite the filters, use '>'
|
||||
To append to the filters, use '>>'
|
||||
|
||||
To clear out a filter so that all functions will be recorded again.
|
||||
To clear out a filter so that all functions will be recorded again:
|
||||
|
||||
# echo > /debug/tracing/set_ftrace_filter
|
||||
# cat /debug/tracing/set_ftrace_filter
|
||||
|
@ -1246,24 +1252,24 @@ ftraced
|
|||
|
||||
As mentioned above, when dynamic ftrace is configured in, a kernel
|
||||
thread wakes up once a second and checks to see if there are mcount
|
||||
calls that need to be converted into nops. If there is not, then
|
||||
it simply goes back to sleep. But if there is, it will call
|
||||
calls that need to be converted into nops. If there are not any, then
|
||||
it simply goes back to sleep. But if there are some, it will call
|
||||
kstop_machine to convert the calls to nops.
|
||||
|
||||
There may be a case that you do not want this added latency.
|
||||
There may be a case in which you do not want this added latency.
|
||||
Perhaps you are doing some audio recording and this activity might
|
||||
cause skips in the playback. There is an interface to disable
|
||||
and enable the ftraced kernel thread.
|
||||
and enable the "ftraced" kernel thread.
|
||||
|
||||
# echo 0 > /debug/tracing/ftraced_enabled
|
||||
|
||||
This will disable the calling of the kstop_machine to update the
|
||||
mcount calls to nops. Remember that there's a large overhead
|
||||
This will disable the calling of kstop_machine to update the
|
||||
mcount calls to nops. Remember that there is a large overhead
|
||||
to calling mcount. Without this kernel thread, that overhead will
|
||||
exist.
|
||||
|
||||
Any write to the ftraced_enabled file will cause the kstop_machine
|
||||
to run if there are recorded calls to mcount. This means that a
|
||||
If there are recorded calls to mcount, any write to the ftraced_enabled
|
||||
file will cause the kstop_machine to run. This means that a
|
||||
user can manually perform the updates when they want to by simply
|
||||
echoing a '0' into the ftraced_enabled file.
|
||||
|
||||
|
@ -1274,8 +1280,8 @@ that uses ftrace function recording.
|
|||
trace_pipe
|
||||
----------
|
||||
|
||||
The trace_pipe outputs the same as trace, but the effect on the
|
||||
tracing is different. Every read from trace_pipe is consumed.
|
||||
The trace_pipe outputs the same content as the trace file, but the effect
|
||||
on the tracing is different. Every read from trace_pipe is consumed.
|
||||
This means that subsequent reads will be different. The trace
|
||||
is live.
|
||||
|
||||
|
@ -1305,7 +1311,7 @@ is live.
|
|||
bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
|
||||
|
||||
|
||||
Note, reading the trace_pipe will block until more input is added.
|
||||
Note, reading the trace_pipe file will block until more input is added.
|
||||
By changing the tracer, trace_pipe will issue an EOF. We needed
|
||||
to set the ftrace tracer _before_ cating the trace_pipe file.
|
||||
|
||||
|
@ -1314,8 +1320,8 @@ trace entries
|
|||
-------------
|
||||
|
||||
Having too much or not enough data can be troublesome in diagnosing
|
||||
some issue in the kernel. The file trace_entries is used to modify
|
||||
the size of the internal trace buffers. The numbers listed
|
||||
an issue in the kernel. The file trace_entries is used to modify
|
||||
the size of the internal trace buffers. The number listed
|
||||
is the number of entries that can be recorded per CPU. To know
|
||||
the full size, multiply the number of possible CPUS with the
|
||||
number of entries.
|
||||
|
@ -1323,8 +1329,9 @@ number of entries.
|
|||
# cat /debug/tracing/trace_entries
|
||||
65620
|
||||
|
||||
Note, to modify this you must have tracing fulling disabled. To do that,
|
||||
echo "none" into the current_tracer.
|
||||
Note, to modify this, you must have tracing completely disabled. To do that,
|
||||
echo "none" into the current_tracer. If the current_tracer is not set
|
||||
to "none", an EINVAL error will be returned.
|
||||
|
||||
# echo none > /debug/tracing/current_tracer
|
||||
# echo 100000 > /debug/tracing/trace_entries
|
||||
|
@ -1333,18 +1340,18 @@ echo "none" into the current_tracer.
|
|||
|
||||
|
||||
Notice that we echoed in 100,000 but the size is 100,045. The entries
|
||||
are held by individual pages. It allocates the number of pages it takes
|
||||
are held in individual pages. It allocates the number of pages it takes
|
||||
to fulfill the request. If more entries may fit on the last page
|
||||
it will add them.
|
||||
then they will be added.
|
||||
|
||||
# echo 1 > /debug/tracing/trace_entries
|
||||
# cat /debug/tracing/trace_entries
|
||||
85
|
||||
|
||||
This shows us that 85 entries can fit on a single page.
|
||||
This shows us that 85 entries can fit in a single page.
|
||||
|
||||
The number of pages that will be allocated is a percentage of available
|
||||
memory. Allocating too much will produces an error.
|
||||
The number of pages which will be allocated is limited to a percentage
|
||||
of available memory. Allocating too much will produce an error.
|
||||
|
||||
# echo 1000000000000 > /debug/tracing/trace_entries
|
||||
-bash: echo: write error: Cannot allocate memory
|
||||
|
|
|
@ -1,47 +0,0 @@
|
|||
Kernel driver i2c-i810
|
||||
|
||||
Supported adapters:
|
||||
* Intel 82810, 82810-DC100, 82810E, and 82815 (GMCH)
|
||||
* Intel 82845G (GMCH)
|
||||
|
||||
Authors:
|
||||
Frodo Looijaard <frodol@dds.nl>,
|
||||
Philip Edelbrock <phil@netroedge.com>,
|
||||
Kyösti Mälkki <kmalkki@cc.hut.fi>,
|
||||
Ralph Metzler <rjkm@thp.uni-koeln.de>,
|
||||
Mark D. Studebaker <mdsxyz123@yahoo.com>
|
||||
|
||||
Main contact: Mark Studebaker <mdsxyz123@yahoo.com>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
WARNING: If you have an '810' or '815' motherboard, your standard I2C
|
||||
temperature sensors are most likely on the 801's I2C bus. You want the
|
||||
i2c-i801 driver for those, not this driver.
|
||||
|
||||
Now for the i2c-i810...
|
||||
|
||||
The GMCH chip contains two I2C interfaces.
|
||||
|
||||
The first interface is used for DDC (Data Display Channel) which is a
|
||||
serial channel through the VGA monitor connector to a DDC-compliant
|
||||
monitor. This interface is defined by the Video Electronics Standards
|
||||
Association (VESA). The standards are available for purchase at
|
||||
http://www.vesa.org .
|
||||
|
||||
The second interface is a general-purpose I2C bus. It may be connected to a
|
||||
TV-out chip such as the BT869 or possibly to a digital flat-panel display.
|
||||
|
||||
Features
|
||||
--------
|
||||
|
||||
Both busses use the i2c-algo-bit driver for 'bit banging'
|
||||
and support for specific transactions is provided by i2c-algo-bit.
|
||||
|
||||
Issues
|
||||
------
|
||||
|
||||
If you enable bus testing in i2c-algo-bit (insmod i2c-algo-bit bit_test=1),
|
||||
the test may fail; if so, the i2c-i810 driver won't be inserted. However,
|
||||
we think this has been fixed.
|
|
@ -1,23 +0,0 @@
|
|||
Kernel driver i2c-prosavage
|
||||
|
||||
Supported adapters:
|
||||
|
||||
S3/VIA KM266/VT8375 aka ProSavage8
|
||||
S3/VIA KM133/VT8365 aka Savage4
|
||||
|
||||
Author: Henk Vergonet <henk@god.dyndns.org>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
The Savage4 chips contain two I2C interfaces (aka a I2C 'master' or
|
||||
'host').
|
||||
|
||||
The first interface is used for DDC (Data Display Channel) which is a
|
||||
serial channel through the VGA monitor connector to a DDC-compliant
|
||||
monitor. This interface is defined by the Video Electronics Standards
|
||||
Association (VESA). The standards are available for purchase at
|
||||
http://www.vesa.org . The second interface is a general-purpose I2C bus.
|
||||
|
||||
Usefull for gaining access to the TV Encoder chips.
|
||||
|
|
@ -1,26 +0,0 @@
|
|||
Kernel driver i2c-savage4
|
||||
|
||||
Supported adapters:
|
||||
* Savage4
|
||||
* Savage2000
|
||||
|
||||
Authors:
|
||||
Alexander Wold <awold@bigfoot.com>,
|
||||
Mark D. Studebaker <mdsxyz123@yahoo.com>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
The Savage4 chips contain two I2C interfaces (aka a I2C 'master'
|
||||
or 'host').
|
||||
|
||||
The first interface is used for DDC (Data Display Channel) which is a
|
||||
serial channel through the VGA monitor connector to a DDC-compliant
|
||||
monitor. This interface is defined by the Video Electronics Standards
|
||||
Association (VESA). The standards are available for purchase at
|
||||
http://www.vesa.org . The DDC bus is not yet supported because its register
|
||||
is not directly memory-mapped.
|
||||
|
||||
The second interface is a general-purpose I2C bus. This is the only
|
||||
interface supported by the driver at the moment.
|
||||
|
|
@ -49,7 +49,7 @@ $ modprobe max6875 force=0,0x50
|
|||
|
||||
The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
|
||||
addresses. For example, for address 0x50, it also reserves 0x51.
|
||||
The even-address instance is called 'max6875', the odd one is 'max6875 subclient'.
|
||||
The even-address instance is called 'max6875', the odd one is 'dummy'.
|
||||
|
||||
|
||||
Programming the chip using i2c-dev
|
||||
|
|
|
@ -7,7 +7,7 @@ drivers/gpio/pca9539.c instead.
|
|||
Supported chips:
|
||||
* Philips PCA9539
|
||||
Prefix: 'pca9539'
|
||||
Addresses scanned: 0x74 - 0x77
|
||||
Addresses scanned: none
|
||||
Datasheet:
|
||||
http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf
|
||||
|
||||
|
@ -23,6 +23,14 @@ The input sense can also be inverted.
|
|||
The 16 lines are split between two bytes.
|
||||
|
||||
|
||||
Detection
|
||||
---------
|
||||
|
||||
The PCA9539 is difficult to detect and not commonly found in PC machines,
|
||||
so you have to pass the I2C bus and address of the installed PCA9539
|
||||
devices explicitly to the driver at load time via the force=... parameter.
|
||||
|
||||
|
||||
Sysfs entries
|
||||
-------------
|
||||
|
||||
|
|
|
@ -4,13 +4,13 @@ Kernel driver pcf8574
|
|||
Supported chips:
|
||||
* Philips PCF8574
|
||||
Prefix: 'pcf8574'
|
||||
Addresses scanned: I2C 0x20 - 0x27
|
||||
Addresses scanned: none
|
||||
Datasheet: Publicly available at the Philips Semiconductors website
|
||||
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
||||
|
||||
* Philips PCF8574A
|
||||
Prefix: 'pcf8574a'
|
||||
Addresses scanned: I2C 0x38 - 0x3f
|
||||
Addresses scanned: none
|
||||
Datasheet: Publicly available at the Philips Semiconductors website
|
||||
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
||||
|
||||
|
@ -38,12 +38,10 @@ For more informations see the datasheet.
|
|||
Accessing PCF8574(A) via /sys interface
|
||||
-------------------------------------
|
||||
|
||||
! Be careful !
|
||||
The PCF8574(A) is plainly impossible to detect ! Stupid chip.
|
||||
So every chip with address in the interval [20..27] and [38..3f] are
|
||||
detected as PCF8574(A). If you have other chips in this address
|
||||
range, the workaround is to load this module after the one
|
||||
for your others chips.
|
||||
So, you have to pass the I2C bus and address of the installed PCF857A
|
||||
and PCF8574A devices explicitly to the driver at load time via the
|
||||
force=... parameter.
|
||||
|
||||
On detection (i.e. insmod, modprobe et al.), directories are being
|
||||
created for each detected PCF8574(A):
|
||||
|
|
|
@ -40,12 +40,9 @@ Detection
|
|||
---------
|
||||
|
||||
There is no method known to detect whether a chip on a given I2C address is
|
||||
a PCF8575 or whether it is any other I2C device. So there are two alternatives
|
||||
to let the driver find the installed PCF8575 devices:
|
||||
- Load this driver after any other I2C driver for I2C devices with addresses
|
||||
in the range 0x20 .. 0x27.
|
||||
- Pass the I2C bus and address of the installed PCF8575 devices explicitly to
|
||||
the driver at load time via the probe=... or force=... parameters.
|
||||
a PCF8575 or whether it is any other I2C device, so you have to pass the I2C
|
||||
bus and address of the installed PCF8575 devices explicitly to the driver at
|
||||
load time via the force=... parameter.
|
||||
|
||||
/sys interface
|
||||
--------------
|
||||
|
|
|
@ -0,0 +1,127 @@
|
|||
This is a summary of the most important conventions for use of fault
|
||||
codes in the I2C/SMBus stack.
|
||||
|
||||
|
||||
A "Fault" is not always an "Error"
|
||||
----------------------------------
|
||||
Not all fault reports imply errors; "page faults" should be a familiar
|
||||
example. Software often retries idempotent operations after transient
|
||||
faults. There may be fancier recovery schemes that are appropriate in
|
||||
some cases, such as re-initializing (and maybe resetting). After such
|
||||
recovery, triggered by a fault report, there is no error.
|
||||
|
||||
In a similar way, sometimes a "fault" code just reports one defined
|
||||
result for an operation ... it doesn't indicate that anything is wrong
|
||||
at all, just that the outcome wasn't on the "golden path".
|
||||
|
||||
In short, your I2C driver code may need to know these codes in order
|
||||
to respond correctly. Other code may need to rely on YOUR code reporting
|
||||
the right fault code, so that it can (in turn) behave correctly.
|
||||
|
||||
|
||||
I2C and SMBus fault codes
|
||||
-------------------------
|
||||
These are returned as negative numbers from most calls, with zero or
|
||||
some positive number indicating a non-fault return. The specific
|
||||
numbers associated with these symbols differ between architectures,
|
||||
though most Linux systems use <asm-generic/errno*.h> numbering.
|
||||
|
||||
Note that the descriptions here are not exhaustive. There are other
|
||||
codes that may be returned, and other cases where these codes should
|
||||
be returned. However, drivers should not return other codes for these
|
||||
cases (unless the hardware doesn't provide unique fault reports).
|
||||
|
||||
Also, codes returned by adapter probe methods follow rules which are
|
||||
specific to their host bus (such as PCI, or the platform bus).
|
||||
|
||||
|
||||
EAGAIN
|
||||
Returned by I2C adapters when they lose arbitration in master
|
||||
transmit mode: some other master was transmitting different
|
||||
data at the same time.
|
||||
|
||||
Also returned when trying to invoke an I2C operation in an
|
||||
atomic context, when some task is already using that I2C bus
|
||||
to execute some other operation.
|
||||
|
||||
EBADMSG
|
||||
Returned by SMBus logic when an invalid Packet Error Code byte
|
||||
is received. This code is a CRC covering all bytes in the
|
||||
transaction, and is sent before the terminating STOP. This
|
||||
fault is only reported on read transactions; the SMBus slave
|
||||
may have a way to report PEC mismatches on writes from the
|
||||
host. Note that even if PECs are in use, you should not rely
|
||||
on these as the only way to detect incorrect data transfers.
|
||||
|
||||
EBUSY
|
||||
Returned by SMBus adapters when the bus was busy for longer
|
||||
than allowed. This usually indicates some device (maybe the
|
||||
SMBus adapter) needs some fault recovery (such as resetting),
|
||||
or that the reset was attempted but failed.
|
||||
|
||||
EINVAL
|
||||
This rather vague error means an invalid parameter has been
|
||||
detected before any I/O operation was started. Use a more
|
||||
specific fault code when you can.
|
||||
|
||||
One example would be a driver trying an SMBus Block Write
|
||||
with block size outside the range of 1-32 bytes.
|
||||
|
||||
EIO
|
||||
This rather vague error means something went wrong when
|
||||
performing an I/O operation. Use a more specific fault
|
||||
code when you can.
|
||||
|
||||
ENODEV
|
||||
Returned by driver probe() methods. This is a bit more
|
||||
specific than ENXIO, implying the problem isn't with the
|
||||
address, but with the device found there. Driver probes
|
||||
may verify the device returns *correct* responses, and
|
||||
return this as appropriate. (The driver core will warn
|
||||
about probe faults other than ENXIO and ENODEV.)
|
||||
|
||||
ENOMEM
|
||||
Returned by any component that can't allocate memory when
|
||||
it needs to do so.
|
||||
|
||||
ENXIO
|
||||
Returned by I2C adapters to indicate that the address phase
|
||||
of a transfer didn't get an ACK. While it might just mean
|
||||
an I2C device was temporarily not responding, usually it
|
||||
means there's nothing listening at that address.
|
||||
|
||||
Returned by driver probe() methods to indicate that they
|
||||
found no device to bind to. (ENODEV may also be used.)
|
||||
|
||||
EOPNOTSUPP
|
||||
Returned by an adapter when asked to perform an operation
|
||||
that it doesn't, or can't, support.
|
||||
|
||||
For example, this would be returned when an adapter that
|
||||
doesn't support SMBus block transfers is asked to execute
|
||||
one. In that case, the driver making that request should
|
||||
have verified that functionality was supported before it
|
||||
made that block transfer request.
|
||||
|
||||
Similarly, if an I2C adapter can't execute all legal I2C
|
||||
messages, it should return this when asked to perform a
|
||||
transaction it can't. (These limitations can't be seen in
|
||||
the adapter's functionality mask, since the assumption is
|
||||
that if an adapter supports I2C it supports all of I2C.)
|
||||
|
||||
EPROTO
|
||||
Returned when slave does not conform to the relevant I2C
|
||||
or SMBus (or chip-specific) protocol specifications. One
|
||||
case is when the length of an SMBus block data response
|
||||
(from the SMBus slave) is outside the range 1-32 bytes.
|
||||
|
||||
ETIMEDOUT
|
||||
This is returned by drivers when an operation took too much
|
||||
time, and was aborted before it completed.
|
||||
|
||||
SMBus adapters may return it when an operation took more
|
||||
time than allowed by the SMBus specification; for example,
|
||||
when a slave stretches clocks too far. I2C has no such
|
||||
timeouts, but it's normal for I2C adapters to impose some
|
||||
arbitrary limits (much longer than SMBus!) too.
|
||||
|
|
@ -42,8 +42,8 @@ Count (8 bits): A data byte containing the length of a block operation.
|
|||
[..]: Data sent by I2C device, as opposed to data sent by the host adapter.
|
||||
|
||||
|
||||
SMBus Quick Command: i2c_smbus_write_quick()
|
||||
=============================================
|
||||
SMBus Quick Command
|
||||
===================
|
||||
|
||||
This sends a single bit to the device, at the place of the Rd/Wr bit.
|
||||
|
||||
|
|
|
@ -44,6 +44,10 @@ static struct i2c_driver foo_driver = {
|
|||
.id_table = foo_ids,
|
||||
.probe = foo_probe,
|
||||
.remove = foo_remove,
|
||||
/* if device autodetection is needed: */
|
||||
.class = I2C_CLASS_SOMETHING,
|
||||
.detect = foo_detect,
|
||||
.address_data = &addr_data,
|
||||
|
||||
/* else, driver uses "legacy" binding model: */
|
||||
.attach_adapter = foo_attach_adapter,
|
||||
|
@ -217,6 +221,31 @@ in the I2C bus driver. You may want to save the returned i2c_client
|
|||
reference for later use.
|
||||
|
||||
|
||||
Device Detection (Standard driver model)
|
||||
----------------------------------------
|
||||
|
||||
Sometimes you do not know in advance which I2C devices are connected to
|
||||
a given I2C bus. This is for example the case of hardware monitoring
|
||||
devices on a PC's SMBus. In that case, you may want to let your driver
|
||||
detect supported devices automatically. This is how the legacy model
|
||||
was working, and is now available as an extension to the standard
|
||||
driver model (so that we can finally get rid of the legacy model.)
|
||||
|
||||
You simply have to define a detect callback which will attempt to
|
||||
identify supported devices (returning 0 for supported ones and -ENODEV
|
||||
for unsupported ones), a list of addresses to probe, and a device type
|
||||
(or class) so that only I2C buses which may have that type of device
|
||||
connected (and not otherwise enumerated) will be probed. The i2c
|
||||
core will then call you back as needed and will instantiate a device
|
||||
for you for every successful detection.
|
||||
|
||||
Note that this mechanism is purely optional and not suitable for all
|
||||
devices. You need some reliable way to identify the supported devices
|
||||
(typically using device-specific, dedicated identification registers),
|
||||
otherwise misdetections are likely to occur and things can get wrong
|
||||
quickly.
|
||||
|
||||
|
||||
Device Deletion (Standard driver model)
|
||||
---------------------------------------
|
||||
|
||||
|
@ -569,7 +598,6 @@ SMBus communication
|
|||
in terms of it. Never use this function directly!
|
||||
|
||||
|
||||
extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
|
||||
extern s32 i2c_smbus_read_byte(struct i2c_client * client);
|
||||
extern s32 i2c_smbus_write_byte(struct i2c_client * client, u8 value);
|
||||
extern s32 i2c_smbus_read_byte_data(struct i2c_client * client, u8 command);
|
||||
|
@ -578,30 +606,31 @@ SMBus communication
|
|||
extern s32 i2c_smbus_read_word_data(struct i2c_client * client, u8 command);
|
||||
extern s32 i2c_smbus_write_word_data(struct i2c_client * client,
|
||||
u8 command, u16 value);
|
||||
extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
|
||||
u8 command, u8 *values);
|
||||
extern s32 i2c_smbus_write_block_data(struct i2c_client * client,
|
||||
u8 command, u8 length,
|
||||
u8 *values);
|
||||
extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
|
||||
u8 command, u8 length, u8 *values);
|
||||
|
||||
These ones were removed in Linux 2.6.10 because they had no users, but could
|
||||
be added back later if needed:
|
||||
|
||||
extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
|
||||
u8 command, u8 *values);
|
||||
extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client,
|
||||
u8 command, u8 length,
|
||||
u8 *values);
|
||||
|
||||
These ones were removed from i2c-core because they had no users, but could
|
||||
be added back later if needed:
|
||||
|
||||
extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
|
||||
extern s32 i2c_smbus_process_call(struct i2c_client * client,
|
||||
u8 command, u16 value);
|
||||
extern s32 i2c_smbus_block_process_call(struct i2c_client *client,
|
||||
u8 command, u8 length,
|
||||
u8 *values)
|
||||
|
||||
All these transactions return -1 on failure. The 'write' transactions
|
||||
return 0 on success; the 'read' transactions return the read value, except
|
||||
for read_block, which returns the number of values read. The block buffers
|
||||
need not be longer than 32 bytes.
|
||||
All these transactions return a negative errno value on failure. The 'write'
|
||||
transactions return 0 on success; the 'read' transactions return the read
|
||||
value, except for block transactions, which return the number of values
|
||||
read. The block buffers need not be longer than 32 bytes.
|
||||
|
||||
You can read the file `smbus-protocol' for more information about the
|
||||
actual SMBus protocol.
|
||||
|
|
|
@ -0,0 +1,137 @@
|
|||
Paravirt_ops on IA64
|
||||
====================
|
||||
21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
The aim of this documentation is to help with maintainability and/or to
|
||||
encourage people to use paravirt_ops/IA64.
|
||||
|
||||
paravirt_ops (pv_ops in short) is a way for virtualization support of
|
||||
Linux kernel on x86. Several ways for virtualization support were
|
||||
proposed, paravirt_ops is the winner.
|
||||
On the other hand, now there are also several IA64 virtualization
|
||||
technologies like kvm/IA64, xen/IA64 and many other academic IA64
|
||||
hypervisors so that it is good to add generic virtualization
|
||||
infrastructure on Linux/IA64.
|
||||
|
||||
|
||||
What is paravirt_ops?
|
||||
---------------------
|
||||
It has been developed on x86 as virtualization support via API, not ABI.
|
||||
It allows each hypervisor to override operations which are important for
|
||||
hypervisors at API level. And it allows a single kernel binary to run on
|
||||
all supported execution environments including native machine.
|
||||
Essentially paravirt_ops is a set of function pointers which represent
|
||||
operations corresponding to low level sensitive instructions and high
|
||||
level functionalities in various area. But one significant difference
|
||||
from usual function pointer table is that it allows optimization with
|
||||
binary patch. It is because some of these operations are very
|
||||
performance sensitive and indirect call overhead is not negligible.
|
||||
With binary patch, indirect C function call can be transformed into
|
||||
direct C function call or in-place execution to eliminate the overhead.
|
||||
|
||||
Thus, operations of paravirt_ops are classified into three categories.
|
||||
- simple indirect call
|
||||
These operations correspond to high level functionality so that the
|
||||
overhead of indirect call isn't very important.
|
||||
|
||||
- indirect call which allows optimization with binary patch
|
||||
Usually these operations correspond to low level instructions. They
|
||||
are called frequently and performance critical. So the overhead is
|
||||
very important.
|
||||
|
||||
- a set of macros for hand written assembly code
|
||||
Hand written assembly codes (.S files) also need paravirtualization
|
||||
because they include sensitive instructions or some of code paths in
|
||||
them are very performance critical.
|
||||
|
||||
|
||||
The relation to the IA64 machine vector
|
||||
---------------------------------------
|
||||
Linux/IA64 has the IA64 machine vector functionality which allows the
|
||||
kernel to switch implementations (e.g. initialization, ipi, dma api...)
|
||||
depending on executing platform.
|
||||
We can replace some implementations very easily defining a new machine
|
||||
vector. Thus another approach for virtualization support would be
|
||||
enhancing the machine vector functionality.
|
||||
But paravirt_ops approach was taken because
|
||||
- virtualization support needs wider support than machine vector does.
|
||||
e.g. low level instruction paravirtualization. It must be
|
||||
initialized very early before platform detection.
|
||||
|
||||
- virtualization support needs more functionality like binary patch.
|
||||
Probably the calling overhead might not be very large compared to the
|
||||
emulation overhead of virtualization. However in the native case, the
|
||||
overhead should be eliminated completely.
|
||||
A single kernel binary should run on each environment including native,
|
||||
and the overhead of paravirt_ops on native environment should be as
|
||||
small as possible.
|
||||
|
||||
- for full virtualization technology, e.g. KVM/IA64 or
|
||||
Xen/IA64 HVM domain, the result would be
|
||||
(the emulated platform machine vector. probably dig) + (pv_ops).
|
||||
This means that the virtualization support layer should be under
|
||||
the machine vector layer.
|
||||
|
||||
Possibly it might be better to move some function pointers from
|
||||
paravirt_ops to machine vector. In fact, Xen domU case utilizes both
|
||||
pv_ops and machine vector.
|
||||
|
||||
|
||||
IA64 paravirt_ops
|
||||
-----------------
|
||||
In this section, the concrete paravirt_ops will be discussed.
|
||||
Because of the architecture difference between ia64 and x86, the
|
||||
resulting set of functions is very different from x86 pv_ops.
|
||||
|
||||
- C function pointer tables
|
||||
They are not very performance critical so that simple C indirect
|
||||
function call is acceptable. The following structures are defined at
|
||||
this moment. For details see linux/include/asm-ia64/paravirt.h
|
||||
- struct pv_info
|
||||
This structure describes the execution environment.
|
||||
- struct pv_init_ops
|
||||
This structure describes the various initialization hooks.
|
||||
- struct pv_iosapic_ops
|
||||
This structure describes hooks to iosapic operations.
|
||||
- struct pv_irq_ops
|
||||
This structure describes hooks to irq related operations
|
||||
- struct pv_time_op
|
||||
This structure describes hooks to steal time accounting.
|
||||
|
||||
- a set of indirect calls which need optimization
|
||||
Currently this class of functions correspond to a subset of IA64
|
||||
intrinsics. At this moment the optimization with binary patch isn't
|
||||
implemented yet.
|
||||
struct pv_cpu_op is defined. For details see
|
||||
linux/include/asm-ia64/paravirt_privop.h
|
||||
Mostly they correspond to ia64 intrinsics 1-to-1.
|
||||
Caveat: Now they are defined as C indirect function pointers, but in
|
||||
order to support binary patch optimization, they will be changed
|
||||
using GCC extended inline assembly code.
|
||||
|
||||
- a set of macros for hand written assembly code (.S files)
|
||||
For maintenance purpose, the taken approach for .S files is single
|
||||
source code and compile multiple times with different macros definitions.
|
||||
Each pv_ops instance must define those macros to compile.
|
||||
The important thing here is that sensitive, but non-privileged
|
||||
instructions must be paravirtualized and that some privileged
|
||||
instructions also need paravirtualization for reasonable performance.
|
||||
Developers who modify .S files must be aware of that. At this moment
|
||||
an easy checker is implemented to detect paravirtualization breakage.
|
||||
But it doesn't cover all the cases.
|
||||
|
||||
Sometimes this set of macros is called pv_cpu_asm_op. But there is no
|
||||
corresponding structure in the source code.
|
||||
Those macros mostly 1:1 correspond to a subset of privileged
|
||||
instructions. See linux/include/asm-ia64/native/inst.h.
|
||||
And some functions written in assembly also need to be overrided so
|
||||
that each pv_ops instance have to define some macros. Again see
|
||||
linux/include/asm-ia64/native/inst.h.
|
||||
|
||||
|
||||
Those structures must be initialized very early before start_kernel.
|
||||
Probably initialized in head.S using multi entry point or some other trick.
|
||||
For native case implementation see linux/arch/ia64/kernel/paravirt.c.
|
|
@ -1,5 +1,3 @@
|
|||
$Id: gameport-programming.txt,v 1.3 2001/04/24 13:51:37 vojtech Exp $
|
||||
|
||||
Programming gameport drivers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
|
|
@ -1,7 +1,6 @@
|
|||
Linux Input drivers v1.0
|
||||
(c) 1999-2001 Vojtech Pavlik <vojtech@ucw.cz>
|
||||
Sponsored by SuSE
|
||||
$Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $
|
||||
----------------------------------------------------------------------------
|
||||
|
||||
0. Disclaimer
|
||||
|
|
|
@ -5,8 +5,6 @@
|
|||
|
||||
7 Aug 1998
|
||||
|
||||
$Id: joystick-api.txt,v 1.2 2001/05/08 21:21:23 vojtech Exp $
|
||||
|
||||
1. Initialization
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
|
|
@ -2,7 +2,6 @@
|
|||
(c) 1998-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
||||
(c) 1998 Andree Borrmann <a.borrmann@tu-bs.de>
|
||||
Sponsored by SuSE
|
||||
$Id: joystick-parport.txt,v 1.6 2001/09/25 09:31:32 vojtech Exp $
|
||||
----------------------------------------------------------------------------
|
||||
|
||||
0. Disclaimer
|
||||
|
|
|
@ -1,7 +1,6 @@
|
|||
Linux Joystick driver v2.0.0
|
||||
(c) 1996-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
||||
Sponsored by SuSE
|
||||
$Id: joystick.txt,v 1.12 2002/03/03 12:13:07 jdeneux Exp $
|
||||
----------------------------------------------------------------------------
|
||||
|
||||
0. Disclaimer
|
||||
|
|
|
@ -117,6 +117,7 @@ Code Seq# Include File Comments
|
|||
<mailto:natalia@nikhefk.nikhef.nl>
|
||||
'c' 00-7F linux/comstats.h conflict!
|
||||
'c' 00-7F linux/coda.h conflict!
|
||||
'c' 80-9F asm-s390/chsc.h
|
||||
'd' 00-FF linux/char/drm/drm/h conflict!
|
||||
'd' 00-DF linux/video_decoder.h conflict!
|
||||
'd' F0-FF linux/digi1.h
|
||||
|
|
|
@ -508,12 +508,13 @@ HDIO_DRIVE_RESET execute a device reset
|
|||
|
||||
error returns:
|
||||
EACCES Access denied: requires CAP_SYS_ADMIN
|
||||
ENXIO No such device: phy dead or ctl_addr == 0
|
||||
EIO I/O error: reset timed out or hardware error
|
||||
|
||||
notes:
|
||||
|
||||
Abort any current command, prevent anything else from being
|
||||
queued, execute a reset on the device, and issue BLKRRPART
|
||||
ioctl on the block device.
|
||||
Execute a reset on the device as soon as the current IO
|
||||
operation has completed.
|
||||
|
||||
Executes an ATAPI soft reset if applicable, otherwise
|
||||
executes an ATA soft reset on the controller.
|
||||
|
|
|
@ -109,7 +109,7 @@ There are two possible methods of using Kdump.
|
|||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||
no need to build a separate dump-capture kernel. This is possible
|
||||
only with the architecutres which support a relocatable kernel. As
|
||||
of today i386 and ia64 architectures support relocatable kernel.
|
||||
of today, i386, x86_64 and ia64 architectures support relocatable kernel.
|
||||
|
||||
Building a relocatable kernel is advantageous from the point of view that
|
||||
one does not have to build a second kernel for capturing the dump. But
|
||||
|
|
|
@ -147,10 +147,14 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
default: 0
|
||||
|
||||
acpi_sleep= [HW,ACPI] Sleep options
|
||||
Format: { s3_bios, s3_mode, s3_beep }
|
||||
Format: { s3_bios, s3_mode, s3_beep, old_ordering }
|
||||
See Documentation/power/video.txt for s3_bios and s3_mode.
|
||||
s3_beep is for debugging; it makes the PC's speaker beep
|
||||
as soon as the kernel's real-mode entry point is called.
|
||||
old_ordering causes the ACPI 1.0 ordering of the _PTS
|
||||
control method, wrt putting devices into low power
|
||||
states, to be enforced (the ACPI 2.0 ordering of _PTS is
|
||||
used by default).
|
||||
|
||||
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
|
||||
Format: { level | edge | high | low }
|
||||
|
@ -271,6 +275,17 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
aic79xx= [HW,SCSI]
|
||||
See Documentation/scsi/aic79xx.txt.
|
||||
|
||||
amd_iommu= [HW,X86-84]
|
||||
Pass parameters to the AMD IOMMU driver in the system.
|
||||
Possible values are:
|
||||
isolate - enable device isolation (each device, as far
|
||||
as possible, will get its own protection
|
||||
domain)
|
||||
amd_iommu_size= [HW,X86-64]
|
||||
Define the size of the aperture for the AMD IOMMU
|
||||
driver. Possible values are:
|
||||
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||
|
||||
amijoy.map= [HW,JOY] Amiga joystick support
|
||||
Map of devices attached to JOY0DAT and JOY1DAT
|
||||
Format: <a>,<b>
|
||||
|
@ -560,6 +575,8 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
|
||||
debug_objects [KNL] Enable object debugging
|
||||
|
||||
debugpat [X86] Enable PAT debugging
|
||||
|
||||
decnet.addr= [HW,NET]
|
||||
Format: <area>[,<node>]
|
||||
See also Documentation/networking/decnet.txt.
|
||||
|
@ -599,6 +616,29 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
See drivers/char/README.epca and
|
||||
Documentation/digiepca.txt.
|
||||
|
||||
disable_mtrr_cleanup [X86]
|
||||
enable_mtrr_cleanup [X86]
|
||||
The kernel tries to adjust MTRR layout from continuous
|
||||
to discrete, to make X server driver able to add WB
|
||||
entry later. This parameter enables/disables that.
|
||||
|
||||
mtrr_chunk_size=nn[KMG] [X86]
|
||||
used for mtrr cleanup. It is largest continous chunk
|
||||
that could hold holes aka. UC entries.
|
||||
|
||||
mtrr_gran_size=nn[KMG] [X86]
|
||||
Used for mtrr cleanup. It is granularity of mtrr block.
|
||||
Default is 1.
|
||||
Large value could prevent small alignment from
|
||||
using up MTRRs.
|
||||
|
||||
mtrr_spare_reg_nr=n [X86]
|
||||
Format: <integer>
|
||||
Range: 0,7 : spare reg number
|
||||
Default : 1
|
||||
Used for mtrr cleanup. It is spare mtrr entries number.
|
||||
Set to 2 or more if your graphical card needs more.
|
||||
|
||||
disable_mtrr_trim [X86, Intel and AMD only]
|
||||
By default the kernel will trim any uncacheable
|
||||
memory out of your available memory pool based on
|
||||
|
@ -722,9 +762,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
hd= [EIDE] (E)IDE hard drive subsystem geometry
|
||||
Format: <cyl>,<head>,<sect>
|
||||
|
||||
hd?= [HW] (E)IDE subsystem
|
||||
hd?lun= See Documentation/ide/ide.txt.
|
||||
|
||||
highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
|
||||
size of <nn>. This works even on boxes that have no
|
||||
highmem otherwise. This also works to reduce highmem
|
||||
|
@ -785,7 +822,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
See Documentation/ide/ide.txt.
|
||||
|
||||
idle= [X86]
|
||||
Format: idle=poll or idle=mwait
|
||||
Format: idle=poll or idle=mwait, idle=halt, idle=nomwait
|
||||
Poll forces a polling idle loop that can slightly improves the performance
|
||||
of waking up a idle CPU, but will use a lot of power and make the system
|
||||
run hot. Not recommended.
|
||||
|
@ -793,6 +830,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
to not use it because it doesn't save as much power as a normal idle
|
||||
loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same
|
||||
as idle=poll.
|
||||
idle=halt. Halt is forced to be used for CPU idle.
|
||||
In such case C2/C3 won't be used again.
|
||||
idle=nomwait. Disable mwait for CPU C-states
|
||||
|
||||
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
|
||||
Claim all unknown PCI IDE storage controllers.
|
||||
|
@ -1166,7 +1206,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
or
|
||||
memmap=0x10000$0x18690000
|
||||
|
||||
memtest= [KNL,X86_64] Enable memtest
|
||||
memtest= [KNL,X86] Enable memtest
|
||||
Format: <integer>
|
||||
range: 0,4 : pattern number
|
||||
default : 0 <disable>
|
||||
|
@ -1208,6 +1248,11 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
mtdparts= [MTD]
|
||||
See drivers/mtd/cmdlinepart.c.
|
||||
|
||||
mtdset= [ARM]
|
||||
ARM/S3C2412 JIVE boot control
|
||||
|
||||
See arch/arm/mach-s3c2412/mach-jive.c
|
||||
|
||||
mtouchusb.raw_coordinates=
|
||||
[HW] Make the MicroTouch USB driver use raw coordinates
|
||||
('y', default) or cooked coordinates ('n')
|
||||
|
@ -1234,6 +1279,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
This usage is only documented in each driver source
|
||||
file if at all.
|
||||
|
||||
nf_conntrack.acct=
|
||||
[NETFILTER] Enable connection tracking flow accounting
|
||||
0 to disable accounting
|
||||
1 to enable accounting
|
||||
Default value depends on CONFIG_NF_CT_ACCT that is
|
||||
going to be removed in 2.6.29.
|
||||
|
||||
nfsaddrs= [NFS]
|
||||
See Documentation/filesystems/nfsroot.txt.
|
||||
|
||||
|
@ -1496,6 +1548,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
Use with caution as certain devices share
|
||||
address decoders between ROMs and other
|
||||
resources.
|
||||
norom [X86-32,X86_64] Do not assign address space to
|
||||
expansion ROMs that do not already have
|
||||
BIOS assigned address ranges.
|
||||
irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
|
||||
assigned automatically to PCI devices. You can
|
||||
make the kernel exclude IRQs of your ISA cards
|
||||
|
@ -1571,6 +1626,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
Format: { parport<nr> | timid | 0 }
|
||||
See also Documentation/parport.txt.
|
||||
|
||||
pmtmr= [X86] Manual setup of pmtmr I/O Port.
|
||||
Override pmtimer IOPort with a hex value.
|
||||
e.g. pmtmr=0x508
|
||||
|
||||
pnpacpi= [ACPI]
|
||||
{ off }
|
||||
|
||||
|
@ -1975,6 +2034,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
|
||||
snd-ymfpci= [HW,ALSA]
|
||||
|
||||
softlockup_panic=
|
||||
[KNL] Should the soft-lockup detector generate panics.
|
||||
|
||||
sonypi.*= [HW] Sony Programmable I/O Control Device driver
|
||||
See Documentation/sonypi.txt
|
||||
|
||||
|
@ -2106,6 +2168,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
Note that genuine overcurrent events won't be
|
||||
reported either.
|
||||
|
||||
unknown_nmi_panic
|
||||
[X86-32,X86-64]
|
||||
Set unknown_nmi_panic=1 early on boot.
|
||||
|
||||
usbcore.autosuspend=
|
||||
[USB] The autosuspend time delay (in seconds) used
|
||||
for newly-detected USB devices (default 2). This
|
||||
|
@ -2116,6 +2182,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
|||
usbhid.mousepoll=
|
||||
[USBHID] The interval which mice are to be polled at.
|
||||
|
||||
add_efi_memmap [EFI; x86-32,X86-64] Include EFI memory map in
|
||||
kernel's map of available physical RAM.
|
||||
|
||||
vdso= [X86-32,SH,x86-64]
|
||||
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
|
||||
vdso=1: enable VDSO (default)
|
||||
|
|
|
@ -172,6 +172,7 @@ architectures:
|
|||
- ia64 (Does not support probes on instruction slot1.)
|
||||
- sparc64 (Return probes not yet implemented.)
|
||||
- arm
|
||||
- ppc
|
||||
|
||||
3. Configuring Kprobes
|
||||
|
||||
|
|
|
@ -174,8 +174,6 @@ The LED is exposed through the LED subsystem, and can be found in:
|
|||
The mail LED is autodetected, so if you don't have one, the LED device won't
|
||||
be registered.
|
||||
|
||||
If you have a mail LED that is not green, please report this to me.
|
||||
|
||||
Backlight
|
||||
*********
|
||||
|
||||
|
|
|
@ -236,6 +236,11 @@ All md devices contain:
|
|||
writing the word for the desired state, however some states
|
||||
cannot be explicitly set, and some transitions are not allowed.
|
||||
|
||||
Select/poll works on this file. All changes except between
|
||||
active_idle and active (which can be frequent and are not
|
||||
very interesting) are notified. active->active_idle is
|
||||
reported if the metadata is externally managed.
|
||||
|
||||
clear
|
||||
No devices, no size, no level
|
||||
Writing is equivalent to STOP_ARRAY ioctl
|
||||
|
@ -292,6 +297,10 @@ Each directory contains:
|
|||
writemostly - device will only be subject to read
|
||||
requests if there are no other options.
|
||||
This applies only to raid1 arrays.
|
||||
blocked - device has failed, metadata is "external",
|
||||
and the failure hasn't been acknowledged yet.
|
||||
Writes that would write to this device if
|
||||
it were not faulty are blocked.
|
||||
spare - device is working, but not a full member.
|
||||
This includes spares that are in the process
|
||||
of being recovered to
|
||||
|
@ -301,6 +310,12 @@ Each directory contains:
|
|||
Writing "remove" removes the device from the array.
|
||||
Writing "writemostly" sets the writemostly flag.
|
||||
Writing "-writemostly" clears the writemostly flag.
|
||||
Writing "blocked" sets the "blocked" flag.
|
||||
Writing "-blocked" clear the "blocked" flag and allows writes
|
||||
to complete.
|
||||
|
||||
This file responds to select/poll. Any change to 'faulty'
|
||||
or 'blocked' causes an event.
|
||||
|
||||
errors
|
||||
An approximate count of read errors that have been detected on
|
||||
|
@ -332,7 +347,7 @@ Each directory contains:
|
|||
for storage of data. This will normally be the same as the
|
||||
component_size. This can be written while assembling an
|
||||
array. If a value less than the current component_size is
|
||||
written, component_size will be reduced to this value.
|
||||
written, it will be rejected.
|
||||
|
||||
|
||||
An active md device will also contain and entry for each active device
|
||||
|
@ -381,6 +396,19 @@ also have
|
|||
'check' and 'repair' will start the appropriate process
|
||||
providing the current state is 'idle'.
|
||||
|
||||
This file responds to select/poll. Any important change in the value
|
||||
triggers a poll event. Sometimes the value will briefly be
|
||||
"recover" if a recovery seems to be needed, but cannot be
|
||||
achieved. In that case, the transition to "recover" isn't
|
||||
notified, but the transition away is.
|
||||
|
||||
degraded
|
||||
This contains a count of the number of devices by which the
|
||||
arrays is degraded. So an optimal array with show '0'. A
|
||||
single failed/missing drive will show '1', etc.
|
||||
This file responds to select/poll, any increase or decrease
|
||||
in the count of missing devices will trigger an event.
|
||||
|
||||
mismatch_count
|
||||
When performing 'check' and 'repair', and possibly when
|
||||
performing 'resync', md will count the number of errors that are
|
||||
|
|
|
@ -289,35 +289,73 @@ downdelay
|
|||
fail_over_mac
|
||||
|
||||
Specifies whether active-backup mode should set all slaves to
|
||||
the same MAC address (the traditional behavior), or, when
|
||||
enabled, change the bond's MAC address when changing the
|
||||
active interface (i.e., fail over the MAC address itself).
|
||||
the same MAC address at enslavement (the traditional
|
||||
behavior), or, when enabled, perform special handling of the
|
||||
bond's MAC address in accordance with the selected policy.
|
||||
|
||||
Fail over MAC is useful for devices that cannot ever alter
|
||||
their MAC address, or for devices that refuse incoming
|
||||
broadcasts with their own source MAC (which interferes with
|
||||
the ARP monitor).
|
||||
Possible values are:
|
||||
|
||||
The down side of fail over MAC is that every device on the
|
||||
network must be updated via gratuitous ARP, vs. just updating
|
||||
a switch or set of switches (which often takes place for any
|
||||
traffic, not just ARP traffic, if the switch snoops incoming
|
||||
traffic to update its tables) for the traditional method. If
|
||||
the gratuitous ARP is lost, communication may be disrupted.
|
||||
none or 0
|
||||
|
||||
When fail over MAC is used in conjuction with the mii monitor,
|
||||
devices which assert link up prior to being able to actually
|
||||
transmit and receive are particularly susecptible to loss of
|
||||
the gratuitous ARP, and an appropriate updelay setting may be
|
||||
required.
|
||||
This setting disables fail_over_mac, and causes
|
||||
bonding to set all slaves of an active-backup bond to
|
||||
the same MAC address at enslavement time. This is the
|
||||
default.
|
||||
|
||||
A value of 0 disables fail over MAC, and is the default. A
|
||||
value of 1 enables fail over MAC. This option is enabled
|
||||
automatically if the first slave added cannot change its MAC
|
||||
address. This option may be modified via sysfs only when no
|
||||
slaves are present in the bond.
|
||||
active or 1
|
||||
|
||||
This option was added in bonding version 3.2.0.
|
||||
The "active" fail_over_mac policy indicates that the
|
||||
MAC address of the bond should always be the MAC
|
||||
address of the currently active slave. The MAC
|
||||
address of the slaves is not changed; instead, the MAC
|
||||
address of the bond changes during a failover.
|
||||
|
||||
This policy is useful for devices that cannot ever
|
||||
alter their MAC address, or for devices that refuse
|
||||
incoming broadcasts with their own source MAC (which
|
||||
interferes with the ARP monitor).
|
||||
|
||||
The down side of this policy is that every device on
|
||||
the network must be updated via gratuitous ARP,
|
||||
vs. just updating a switch or set of switches (which
|
||||
often takes place for any traffic, not just ARP
|
||||
traffic, if the switch snoops incoming traffic to
|
||||
update its tables) for the traditional method. If the
|
||||
gratuitous ARP is lost, communication may be
|
||||
disrupted.
|
||||
|
||||
When this policy is used in conjuction with the mii
|
||||
monitor, devices which assert link up prior to being
|
||||
able to actually transmit and receive are particularly
|
||||
susecptible to loss of the gratuitous ARP, and an
|
||||
appropriate updelay setting may be required.
|
||||
|
||||
follow or 2
|
||||
|
||||
The "follow" fail_over_mac policy causes the MAC
|
||||
address of the bond to be selected normally (normally
|
||||
the MAC address of the first slave added to the bond).
|
||||
However, the second and subsequent slaves are not set
|
||||
to this MAC address while they are in a backup role; a
|
||||
slave is programmed with the bond's MAC address at
|
||||
failover time (and the formerly active slave receives
|
||||
the newly active slave's MAC address).
|
||||
|
||||
This policy is useful for multiport devices that
|
||||
either become confused or incur a performance penalty
|
||||
when multiple ports are programmed with the same MAC
|
||||
address.
|
||||
|
||||
|
||||
The default policy is none, unless the first slave cannot
|
||||
change its MAC address, in which case the active policy is
|
||||
selected by default.
|
||||
|
||||
This option may be modified via sysfs only when no slaves are
|
||||
present in the bond.
|
||||
|
||||
This option was added in bonding version 3.2.0. The "follow"
|
||||
policy was added in bonding version 3.3.0.
|
||||
|
||||
lacp_rate
|
||||
|
||||
|
@ -338,7 +376,8 @@ max_bonds
|
|||
Specifies the number of bonding devices to create for this
|
||||
instance of the bonding driver. E.g., if max_bonds is 3, and
|
||||
the bonding driver is not already loaded, then bond0, bond1
|
||||
and bond2 will be created. The default value is 1.
|
||||
and bond2 will be created. The default value is 1. Specifying
|
||||
a value of 0 will load bonding, but will not create any devices.
|
||||
|
||||
miimon
|
||||
|
||||
|
@ -501,6 +540,17 @@ mode
|
|||
swapped with the new curr_active_slave that was
|
||||
chosen.
|
||||
|
||||
num_grat_arp
|
||||
|
||||
Specifies the number of gratuitous ARPs to be issued after a
|
||||
failover event. One gratuitous ARP is issued immediately after
|
||||
the failover, subsequent ARPs are sent at a rate of one per link
|
||||
monitor interval (arp_interval or miimon, whichever is active).
|
||||
|
||||
The valid range is 0 - 255; the default value is 1. This option
|
||||
affects only the active-backup mode. This option was added for
|
||||
bonding version 3.3.0.
|
||||
|
||||
primary
|
||||
|
||||
A string (eth0, eth2, etc) specifying which slave is the
|
||||
|
|
|
@ -0,0 +1,167 @@
|
|||
DM9000 Network driver
|
||||
=====================
|
||||
|
||||
Copyright 2008 Simtec Electronics,
|
||||
Ben Dooks <ben@simtec.co.uk> <ben-linux@fluff.org>
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
This file describes how to use the DM9000 platform-device based network driver
|
||||
that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h.
|
||||
|
||||
The driver supports three DM9000 variants, the DM9000E which is the first chip
|
||||
supported as well as the newer DM9000A and DM9000B devices. It is currently
|
||||
maintained and tested by Ben Dooks, who should be CC: to any patches for this
|
||||
driver.
|
||||
|
||||
|
||||
Defining the platform device
|
||||
----------------------------
|
||||
|
||||
The minimum set of resources attached to the platform device are as follows:
|
||||
|
||||
1) The physical address of the address register
|
||||
2) The physical address of the data register
|
||||
3) The IRQ line the device's interrupt pin is connected to.
|
||||
|
||||
These resources should be specified in that order, as the ordering of the
|
||||
two address regions is important (the driver expects these to be address
|
||||
and then data).
|
||||
|
||||
An example from arch/arm/mach-s3c2410/mach-bast.c is:
|
||||
|
||||
static struct resource bast_dm9k_resource[] = {
|
||||
[0] = {
|
||||
.start = S3C2410_CS5 + BAST_PA_DM9000,
|
||||
.end = S3C2410_CS5 + BAST_PA_DM9000 + 3,
|
||||
.flags = IORESOURCE_MEM,
|
||||
},
|
||||
[1] = {
|
||||
.start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40,
|
||||
.end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f,
|
||||
.flags = IORESOURCE_MEM,
|
||||
},
|
||||
[2] = {
|
||||
.start = IRQ_DM9000,
|
||||
.end = IRQ_DM9000,
|
||||
.flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL,
|
||||
}
|
||||
};
|
||||
|
||||
static struct platform_device bast_device_dm9k = {
|
||||
.name = "dm9000",
|
||||
.id = 0,
|
||||
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
|
||||
.resource = bast_dm9k_resource,
|
||||
};
|
||||
|
||||
Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags,
|
||||
as this will generate a warning if it is not present. The trigger from
|
||||
the flags field will be passed to request_irq() when registering the IRQ
|
||||
handler to ensure that the IRQ is setup correctly.
|
||||
|
||||
This shows a typical platform device, without the optional configuration
|
||||
platform data supplied. The next example uses the same resources, but adds
|
||||
the optional platform data to pass extra configuration data:
|
||||
|
||||
static struct dm9000_plat_data bast_dm9k_platdata = {
|
||||
.flags = DM9000_PLATF_16BITONLY,
|
||||
};
|
||||
|
||||
static struct platform_device bast_device_dm9k = {
|
||||
.name = "dm9000",
|
||||
.id = 0,
|
||||
.num_resources = ARRAY_SIZE(bast_dm9k_resource),
|
||||
.resource = bast_dm9k_resource,
|
||||
.dev = {
|
||||
.platform_data = &bast_dm9k_platdata,
|
||||
}
|
||||
};
|
||||
|
||||
The platform data is defined in include/linux/dm9000.h and described below.
|
||||
|
||||
|
||||
Platform data
|
||||
-------------
|
||||
|
||||
Extra platform data for the DM9000 can describe the IO bus width to the
|
||||
device, whether or not an external PHY is attached to the device and
|
||||
the availability of an external configuration EEPROM.
|
||||
|
||||
The flags for the platform data .flags field are as follows:
|
||||
|
||||
DM9000_PLATF_8BITONLY
|
||||
|
||||
The IO should be done with 8bit operations.
|
||||
|
||||
DM9000_PLATF_16BITONLY
|
||||
|
||||
The IO should be done with 16bit operations.
|
||||
|
||||
DM9000_PLATF_32BITONLY
|
||||
|
||||
The IO should be done with 32bit operations.
|
||||
|
||||
DM9000_PLATF_EXT_PHY
|
||||
|
||||
The chip is connected to an external PHY.
|
||||
|
||||
DM9000_PLATF_NO_EEPROM
|
||||
|
||||
This can be used to signify that the board does not have an
|
||||
EEPROM, or that the EEPROM should be hidden from the user.
|
||||
|
||||
DM9000_PLATF_SIMPLE_PHY
|
||||
|
||||
Switch to using the simpler PHY polling method which does not
|
||||
try and read the MII PHY state regularly. This is only available
|
||||
when using the internal PHY. See the section on link state polling
|
||||
for more information.
|
||||
|
||||
The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry
|
||||
"Force simple NSR based PHY polling" allows this flag to be
|
||||
forced on at build time.
|
||||
|
||||
|
||||
PHY Link state polling
|
||||
----------------------
|
||||
|
||||
The driver keeps track of the link state and informs the network core
|
||||
about link (carrier) availablilty. This is managed by several methods
|
||||
depending on the version of the chip and on which PHY is being used.
|
||||
|
||||
For the internal PHY, the original (and currently default) method is
|
||||
to read the MII state, either when the status changes if we have the
|
||||
necessary interrupt support in the chip or every two seconds via a
|
||||
periodic timer.
|
||||
|
||||
To reduce the overhead for the internal PHY, there is now the option
|
||||
of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY
|
||||
platform data option to read the summary information without the
|
||||
expensive MII accesses. This method is faster, but does not print
|
||||
as much information.
|
||||
|
||||
When using an external PHY, the driver currently has to poll the MII
|
||||
link status as there is no method for getting an interrupt on link change.
|
||||
|
||||
|
||||
DM9000A / DM9000B
|
||||
-----------------
|
||||
|
||||
These chips are functionally similar to the DM9000E and are supported easily
|
||||
by the same driver. The features are:
|
||||
|
||||
1) Interrupt on internal PHY state change. This means that the periodic
|
||||
polling of the PHY status may be disabled on these devices when using
|
||||
the internal PHY.
|
||||
|
||||
2) TCP/UDP checksum offloading, which the driver does not currently support.
|
||||
|
||||
|
||||
ethtool
|
||||
-------
|
||||
|
||||
The driver supports the ethtool interface for access to the driver
|
||||
state information, the PHY state and the EEPROM.
|
|
@ -513,21 +513,11 @@ Additional Configurations
|
|||
Intel(R) PRO/1000 PT Dual Port Server Connection
|
||||
Intel(R) PRO/1000 PT Dual Port Server Adapter
|
||||
Intel(R) PRO/1000 PF Dual Port Server Adapter
|
||||
Intel(R) PRO/1000 PT Quad Port Server Adapter
|
||||
Intel(R) PRO/1000 PT Quad Port Server Adapter
|
||||
|
||||
NAPI
|
||||
----
|
||||
NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled
|
||||
or disabled based on the configuration of the kernel. To override
|
||||
the default, use the following compile-time flags.
|
||||
|
||||
To enable NAPI, compile the driver module, passing in a configuration option:
|
||||
|
||||
make CFLAGS_EXTRA=-DE1000_NAPI install
|
||||
|
||||
To disable NAPI, compile the driver module, passing in a configuration option:
|
||||
|
||||
make CFLAGS_EXTRA=-DE1000_NO_NAPI install
|
||||
NAPI (Rx polling mode) is enabled in the e1000 driver.
|
||||
|
||||
See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
|
||||
|
||||
|
|
|
@ -551,8 +551,9 @@ icmp_echo_ignore_broadcasts - BOOLEAN
|
|||
icmp_ratelimit - INTEGER
|
||||
Limit the maximal rates for sending ICMP packets whose type matches
|
||||
icmp_ratemask (see below) to specific targets.
|
||||
0 to disable any limiting, otherwise the maximal rate in jiffies(1)
|
||||
Default: 100
|
||||
0 to disable any limiting,
|
||||
otherwise the minimal space between responses in milliseconds.
|
||||
Default: 1000
|
||||
|
||||
icmp_ratemask - INTEGER
|
||||
Mask made of ICMP types for which rates are being limited.
|
||||
|
@ -1023,11 +1024,23 @@ max_addresses - INTEGER
|
|||
autoconfigured addresses.
|
||||
Default: 16
|
||||
|
||||
disable_ipv6 - BOOLEAN
|
||||
Disable IPv6 operation.
|
||||
Default: FALSE (enable IPv6 operation)
|
||||
|
||||
accept_dad - INTEGER
|
||||
Whether to accept DAD (Duplicate Address Detection).
|
||||
0: Disable DAD
|
||||
1: Enable DAD (default)
|
||||
2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
|
||||
link-local address has been found.
|
||||
|
||||
icmp/*:
|
||||
ratelimit - INTEGER
|
||||
Limit the maximal rates for sending ICMPv6 packets.
|
||||
0 to disable any limiting, otherwise the maximal rate in jiffies(1)
|
||||
Default: 100
|
||||
0 to disable any limiting,
|
||||
otherwise the minimal space between responses in milliseconds.
|
||||
Default: 1000
|
||||
|
||||
|
||||
IPv6 Update by:
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
Linux* Base Driver for the Intel(R) PRO/10GbE Family of Adapters
|
||||
================================================================
|
||||
Linux Base Driver for 10 Gigabit Intel(R) Network Connection
|
||||
=============================================================
|
||||
|
||||
November 17, 2004
|
||||
October 9, 2007
|
||||
|
||||
|
||||
Contents
|
||||
|
@ -9,94 +9,151 @@ Contents
|
|||
|
||||
- In This Release
|
||||
- Identifying Your Adapter
|
||||
- Building and Installation
|
||||
- Command Line Parameters
|
||||
- Improving Performance
|
||||
- Additional Configurations
|
||||
- Known Issues/Troubleshooting
|
||||
- Support
|
||||
|
||||
|
||||
|
||||
In This Release
|
||||
===============
|
||||
|
||||
This file describes the Linux* Base Driver for the Intel(R) PRO/10GbE Family
|
||||
of Adapters, version 1.0.x.
|
||||
This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
|
||||
Network Connection. This driver includes support for Itanium(R)2-based
|
||||
systems.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your 10 Gigabit adapter. All hardware requirements listed apply
|
||||
to use with Linux.
|
||||
|
||||
The following features are available in this kernel:
|
||||
- Native VLANs
|
||||
- Channel Bonding (teaming)
|
||||
- SNMP
|
||||
|
||||
Channel Bonding documentation can be found in the Linux kernel source:
|
||||
/Documentation/networking/bonding.txt
|
||||
|
||||
The driver information previously displayed in the /proc filesystem is not
|
||||
supported in this release. Alternatively, you can use ethtool (version 1.6
|
||||
or later), lspci, and ifconfig to obtain the same information.
|
||||
|
||||
Instructions on updating ethtool can be found in the section "Additional
|
||||
Configurations" later in this document.
|
||||
|
||||
For questions related to hardware requirements, refer to the documentation
|
||||
supplied with your Intel PRO/10GbE adapter. All hardware requirements listed
|
||||
apply to use with Linux.
|
||||
|
||||
Identifying Your Adapter
|
||||
========================
|
||||
|
||||
To verify your Intel adapter is supported, find the board ID number on the
|
||||
adapter. Look for a label that has a barcode and a number in the format
|
||||
A12345-001.
|
||||
The following Intel network adapters are compatible with the drivers in this
|
||||
release:
|
||||
|
||||
Use the above information and the Adapter & Driver ID Guide at:
|
||||
Controller Adapter Name Physical Layer
|
||||
---------- ------------ --------------
|
||||
82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber)
|
||||
Server Adapters 10G Base-SR (850 nm optical fiber)
|
||||
10G Base-CX4(twin-axial copper cabling)
|
||||
|
||||
http://support.intel.com/support/network/adapter/pro100/21397.htm
|
||||
For more information on how to identify your adapter, go to the Adapter &
|
||||
Driver ID Guide at:
|
||||
|
||||
For the latest Intel network drivers for Linux, go to:
|
||||
http://support.intel.com/support/network/sb/CS-012904.htm
|
||||
|
||||
|
||||
Building and Installation
|
||||
=========================
|
||||
|
||||
select m for "Intel(R) PRO/10GbE support" located at:
|
||||
Location:
|
||||
-> Device Drivers
|
||||
-> Network device support (NETDEVICES [=y])
|
||||
-> Ethernet (10000 Mbit) (NETDEV_10000 [=y])
|
||||
1. make modules && make modules_install
|
||||
|
||||
2. Load the module:
|
||||
|
||||
modprobe ixgb <parameter>=<value>
|
||||
|
||||
The insmod command can be used if the full
|
||||
path to the driver module is specified. For example:
|
||||
|
||||
insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko
|
||||
|
||||
With 2.6 based kernels also make sure that older ixgb drivers are
|
||||
removed from the kernel, before loading the new module:
|
||||
|
||||
rmmod ixgb; modprobe ixgb
|
||||
|
||||
3. Assign an IP address to the interface by entering the following, where
|
||||
x is the interface number:
|
||||
|
||||
ifconfig ethx <IP_address>
|
||||
|
||||
4. Verify that the interface works. Enter the following, where <IP_address>
|
||||
is the IP address for another machine on the same subnet as the interface
|
||||
that is being tested:
|
||||
|
||||
ping <IP_address>
|
||||
|
||||
http://downloadfinder.intel.com/scripts-df/support_intel.asp
|
||||
|
||||
Command Line Parameters
|
||||
=======================
|
||||
|
||||
If the driver is built as a module, the following optional parameters are
|
||||
used by entering them on the command line with the modprobe or insmod command
|
||||
using this syntax:
|
||||
If the driver is built as a module, the following optional parameters are
|
||||
used by entering them on the command line with the modprobe command using
|
||||
this syntax:
|
||||
|
||||
modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
|
||||
|
||||
insmod ixgb [<option>=<VAL1>,<VAL2>,...]
|
||||
For example, with two 10GbE PCI adapters, entering:
|
||||
|
||||
For example, with two PRO/10GbE PCI adapters, entering:
|
||||
modprobe ixgb TxDescriptors=80,128
|
||||
|
||||
insmod ixgb TxDescriptors=80,128
|
||||
|
||||
loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
|
||||
loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
|
||||
resources for the second adapter.
|
||||
|
||||
The default value for each parameter is generally the recommended setting,
|
||||
unless otherwise noted. Also, if the driver is statically built into the
|
||||
kernel, the driver is loaded with the default values for all the parameters.
|
||||
Ethtool can be used to change some of the parameters at runtime.
|
||||
unless otherwise noted.
|
||||
|
||||
FlowControl
|
||||
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
|
||||
Default: Read from the EEPROM
|
||||
If EEPROM is not detected, default is 3
|
||||
This parameter controls the automatic generation(Tx) and response(Rx) to
|
||||
Ethernet PAUSE frames.
|
||||
If EEPROM is not detected, default is 1
|
||||
This parameter controls the automatic generation(Tx) and response(Rx) to
|
||||
Ethernet PAUSE frames. There are hardware bugs associated with enabling
|
||||
Tx flow control so beware.
|
||||
|
||||
RxDescriptors
|
||||
Valid Range: 64-512
|
||||
Default Value: 512
|
||||
This value is the number of receive descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to buffer more incoming packets.
|
||||
Each descriptor is 16 bytes. A receive buffer is also allocated for
|
||||
each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
|
||||
depending on the MTU setting. When the MTU size is 1500 or less, the
|
||||
This value is the number of receive descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to buffer more incoming packets.
|
||||
Each descriptor is 16 bytes. A receive buffer is also allocated for
|
||||
each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
|
||||
depending on the MTU setting. When the MTU size is 1500 or less, the
|
||||
receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
|
||||
receive buffer size will be either 4056, 8192, or 16384 bytes. The
|
||||
receive buffer size will be either 4056, 8192, or 16384 bytes. The
|
||||
maximum MTU size is 16114.
|
||||
|
||||
RxIntDelay
|
||||
Valid Range: 0-65535 (0=off)
|
||||
Default Value: 6
|
||||
This value delays the generation of receive interrupts in units of
|
||||
0.8192 microseconds. Receive interrupt reduction can improve CPU
|
||||
efficiency if properly tuned for specific network traffic. Increasing
|
||||
this value adds extra latency to frame reception and can end up
|
||||
decreasing the throughput of TCP traffic. If the system is reporting
|
||||
dropped receives, this value may be set too high, causing the driver to
|
||||
Default Value: 72
|
||||
This value delays the generation of receive interrupts in units of
|
||||
0.8192 microseconds. Receive interrupt reduction can improve CPU
|
||||
efficiency if properly tuned for specific network traffic. Increasing
|
||||
this value adds extra latency to frame reception and can end up
|
||||
decreasing the throughput of TCP traffic. If the system is reporting
|
||||
dropped receives, this value may be set too high, causing the driver to
|
||||
run out of available receive descriptors.
|
||||
|
||||
TxDescriptors
|
||||
Valid Range: 64-4096
|
||||
Default Value: 256
|
||||
This value is the number of transmit descriptors allocated by the driver.
|
||||
Increasing this value allows the driver to queue more transmits. Each
|
||||
Increasing this value allows the driver to queue more transmits. Each
|
||||
descriptor is 16 bytes.
|
||||
|
||||
XsumRX
|
||||
|
@ -105,51 +162,49 @@ Default Value: 1
|
|||
A value of '1' indicates that the driver should enable IP checksum
|
||||
offload for received packets (both UDP and TCP) to the adapter hardware.
|
||||
|
||||
XsumTX
|
||||
Valid Range: 0-1
|
||||
Default Value: 1
|
||||
A value of '1' indicates that the driver should enable IP checksum
|
||||
offload for transmitted packets (both UDP and TCP) to the adapter
|
||||
hardware.
|
||||
|
||||
Improving Performance
|
||||
=====================
|
||||
|
||||
With the Intel PRO/10 GbE adapter, the default Linux configuration will very
|
||||
likely limit the total available throughput artificially. There is a set of
|
||||
things that when applied together increase the ability of Linux to transmit
|
||||
and receive data. The following enhancements were originally acquired from
|
||||
settings published at http://www.spec.org/web99 for various submitted results
|
||||
using Linux.
|
||||
With the 10 Gigabit server adapters, the default Linux configuration will
|
||||
very likely limit the total available throughput artificially. There is a set
|
||||
of configuration changes that, when applied together, will increase the ability
|
||||
of Linux to transmit and receive data. The following enhancements were
|
||||
originally acquired from settings published at http://www.spec.org/web99/ for
|
||||
various submitted results using Linux.
|
||||
|
||||
NOTE: These changes are only suggestions, and serve as a starting point for
|
||||
tuning your network performance.
|
||||
NOTE: These changes are only suggestions, and serve as a starting point for
|
||||
tuning your network performance.
|
||||
|
||||
The changes are made in three major ways, listed in order of greatest effect:
|
||||
- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen
|
||||
- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen
|
||||
parameter.
|
||||
- Use sysctl to modify /proc parameters (essentially kernel tuning)
|
||||
- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
|
||||
- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
|
||||
transmit burst lengths on the bus.
|
||||
|
||||
NOTE: setpci modifies the adapter's configuration registers to allow it to read
|
||||
up to 4k bytes at a time (for transmits). However, for some systems the
|
||||
behavior after modifying this register may be undefined (possibly errors of some
|
||||
kind). A power-cycle, hard reset or explicitly setting the e6 register back to
|
||||
22 (setpci -d 8086:1048 e6.b=22) may be required to get back to a stable
|
||||
configuration.
|
||||
NOTE: setpci modifies the adapter's configuration registers to allow it to read
|
||||
up to 4k bytes at a time (for transmits). However, for some systems the
|
||||
behavior after modifying this register may be undefined (possibly errors of
|
||||
some kind). A power-cycle, hard reset or explicitly setting the e6 register
|
||||
back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
|
||||
stable configuration.
|
||||
|
||||
- COPY these lines and paste them into ixgb_perf.sh:
|
||||
#!/bin/bash
|
||||
echo "configuring network performance , edit this file to change the interface"
|
||||
echo "configuring network performance , edit this file to change the interface
|
||||
or device ID of 10GbE card"
|
||||
# set mmrbc to 4k reads, modify only Intel 10GbE device IDs
|
||||
setpci -d 8086:1048 e6.b=2e
|
||||
# set the MTU (max transmission unit) - it requires your switch and clients to change too!
|
||||
# replace 1a48 with appropriate 10GbE device's ID installed on the system,
|
||||
# if needed.
|
||||
setpci -d 8086:1a48 e6.b=2e
|
||||
# set the MTU (max transmission unit) - it requires your switch and clients
|
||||
# to change as well.
|
||||
# set the txqueuelen
|
||||
# your ixgb adapter should be loaded as eth1 for this to work, change if needed
|
||||
ifconfig eth1 mtu 9000 txqueuelen 1000 up
|
||||
# call the sysctl utility to modify /proc/sys entries
|
||||
sysctl -p ./sysctl_ixgb.conf
|
||||
# call the sysctl utility to modify /proc/sys entries
|
||||
sysctl -p ./sysctl_ixgb.conf
|
||||
- END ixgb_perf.sh
|
||||
|
||||
- COPY these lines and paste them into sysctl_ixgb.conf:
|
||||
|
@ -159,54 +214,220 @@ sysctl -p ./sysctl_ixgb.conf
|
|||
# several network benchmark tests, your mileage may vary
|
||||
|
||||
### IPV4 specific settings
|
||||
net.ipv4.tcp_timestamps = 0 # turns TCP timestamp support off, default 1, reduces CPU use
|
||||
net.ipv4.tcp_sack = 0 # turn SACK support off, default on
|
||||
# on systems with a VERY fast bus -> memory interface this is the big gainer
|
||||
net.ipv4.tcp_rmem = 10000000 10000000 10000000 # sets min/default/max TCP read buffer, default 4096 87380 174760
|
||||
net.ipv4.tcp_wmem = 10000000 10000000 10000000 # sets min/pressure/max TCP write buffer, default 4096 16384 131072
|
||||
net.ipv4.tcp_mem = 10000000 10000000 10000000 # sets min/pressure/max TCP buffer space, default 31744 32256 32768
|
||||
# turn TCP timestamp support off, default 1, reduces CPU use
|
||||
net.ipv4.tcp_timestamps = 0
|
||||
# turn SACK support off, default on
|
||||
# on systems with a VERY fast bus -> memory interface this is the big gainer
|
||||
net.ipv4.tcp_sack = 0
|
||||
# set min/default/max TCP read buffer, default 4096 87380 174760
|
||||
net.ipv4.tcp_rmem = 10000000 10000000 10000000
|
||||
# set min/pressure/max TCP write buffer, default 4096 16384 131072
|
||||
net.ipv4.tcp_wmem = 10000000 10000000 10000000
|
||||
# set min/pressure/max TCP buffer space, default 31744 32256 32768
|
||||
net.ipv4.tcp_mem = 10000000 10000000 10000000
|
||||
|
||||
### CORE settings (mostly for socket and UDP effect)
|
||||
net.core.rmem_max = 524287 # maximum receive socket buffer size, default 131071
|
||||
net.core.wmem_max = 524287 # maximum send socket buffer size, default 131071
|
||||
net.core.rmem_default = 524287 # default receive socket buffer size, default 65535
|
||||
net.core.wmem_default = 524287 # default send socket buffer size, default 65535
|
||||
net.core.optmem_max = 524287 # maximum amount of option memory buffers, default 10240
|
||||
net.core.netdev_max_backlog = 300000 # number of unprocessed input packets before kernel starts dropping them, default 300
|
||||
# set maximum receive socket buffer size, default 131071
|
||||
net.core.rmem_max = 524287
|
||||
# set maximum send socket buffer size, default 131071
|
||||
net.core.wmem_max = 524287
|
||||
# set default receive socket buffer size, default 65535
|
||||
net.core.rmem_default = 524287
|
||||
# set default send socket buffer size, default 65535
|
||||
net.core.wmem_default = 524287
|
||||
# set maximum amount of option memory buffers, default 10240
|
||||
net.core.optmem_max = 524287
|
||||
# set number of unprocessed input packets before kernel starts dropping them; default 300
|
||||
net.core.netdev_max_backlog = 300000
|
||||
- END sysctl_ixgb.conf
|
||||
|
||||
Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
|
||||
your ixgb driver is using.
|
||||
Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
|
||||
your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
|
||||
ID installed on the system.
|
||||
|
||||
NOTE: Unless these scripts are added to the boot process, these changes will
|
||||
only last only until the next system reboot.
|
||||
NOTE: Unless these scripts are added to the boot process, these changes will
|
||||
only last only until the next system reboot.
|
||||
|
||||
|
||||
Resolving Slow UDP Traffic
|
||||
--------------------------
|
||||
If your server does not seem to be able to receive UDP traffic as fast as it
|
||||
can receive TCP traffic, it could be because Linux, by default, does not set
|
||||
the network stack buffers as large as they need to be to support high UDP
|
||||
transfer rates. One way to alleviate this problem is to allow more memory to
|
||||
be used by the IP stack to store incoming data.
|
||||
|
||||
If your server does not seem to be able to receive UDP traffic as fast as it
|
||||
can receive TCP traffic, it could be because Linux, by default, does not set
|
||||
the network stack buffers as large as they need to be to support high UDP
|
||||
transfer rates. One way to alleviate this problem is to allow more memory to
|
||||
be used by the IP stack to store incoming data.
|
||||
|
||||
For instance, use the commands:
|
||||
For instance, use the commands:
|
||||
sysctl -w net.core.rmem_max=262143
|
||||
and
|
||||
sysctl -w net.core.rmem_default=262143
|
||||
to increase the read buffer memory max and default to 262143 (256k - 1) from
|
||||
defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
|
||||
will increase the amount of memory used by the network stack for receives, and
|
||||
to increase the read buffer memory max and default to 262143 (256k - 1) from
|
||||
defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
|
||||
will increase the amount of memory used by the network stack for receives, and
|
||||
can be increased significantly more if necessary for your application.
|
||||
|
||||
|
||||
Additional Configurations
|
||||
=========================
|
||||
|
||||
Configuring the Driver on Different Distributions
|
||||
-------------------------------------------------
|
||||
Configuring a network driver to load properly when the system is started is
|
||||
distribution dependent. Typically, the configuration process involves adding
|
||||
an alias line to /etc/modprobe.conf as well as editing other system startup
|
||||
scripts and/or configuration files. Many popular Linux distributions ship
|
||||
with tools to make these changes for you. To learn the proper way to
|
||||
configure a network device for your system, refer to your distribution
|
||||
documentation. If during this process you are asked for the driver or module
|
||||
name, the name for the Linux Base Driver for the Intel 10GbE Family of
|
||||
Adapters is ixgb.
|
||||
|
||||
Viewing Link Messages
|
||||
---------------------
|
||||
Link messages will not be displayed to the console if the distribution is
|
||||
restricting system messages. In order to see network driver link messages on
|
||||
your console, set dmesg to eight by entering the following:
|
||||
|
||||
dmesg -n 8
|
||||
|
||||
NOTE: This setting is not saved across reboots.
|
||||
|
||||
|
||||
Jumbo Frames
|
||||
------------
|
||||
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
|
||||
enabled by changing the MTU to a value larger than the default of 1500.
|
||||
The maximum value for the MTU is 16114. Use the ifconfig command to
|
||||
increase the MTU size. For example:
|
||||
|
||||
ifconfig ethx mtu 9000 up
|
||||
|
||||
The maximum MTU setting for Jumbo Frames is 16114. This value coincides
|
||||
with the maximum Jumbo Frames size of 16128.
|
||||
|
||||
|
||||
Ethtool
|
||||
-------
|
||||
The driver utilizes the ethtool interface for driver configuration and
|
||||
diagnostics, as well as displaying statistical information. Ethtool
|
||||
version 1.6 or later is required for this functionality.
|
||||
|
||||
The latest release of ethtool can be found from
|
||||
http://sourceforge.net/projects/gkernel
|
||||
|
||||
NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
|
||||
for a more complete ethtool feature set can be enabled by upgrading
|
||||
to the latest version.
|
||||
|
||||
|
||||
NAPI
|
||||
----
|
||||
|
||||
NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled
|
||||
or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI
|
||||
|
||||
See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
|
||||
|
||||
|
||||
Known Issues/Troubleshooting
|
||||
============================
|
||||
|
||||
NOTE: After installing the driver, if your Intel Network Connection is not
|
||||
working, verify in the "In This Release" section of the readme that you have
|
||||
installed the correct driver.
|
||||
|
||||
Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with
|
||||
Fujitsu XENPAK Module in SmartBits Chassis
|
||||
---------------------------------------------------------------------
|
||||
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
|
||||
Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
|
||||
chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
|
||||
The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
|
||||
Server adapter or the SmartBits. If this situation occurs using a different
|
||||
cable assembly may resolve the issue.
|
||||
|
||||
CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl
|
||||
Switch Port
|
||||
------------------------------------------------------------------------
|
||||
Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
|
||||
adapter is connected to an HP Procurve 3400cl switch port using short cables
|
||||
(1 m or shorter). If this situation occurs, using a longer cable may resolve
|
||||
the issue.
|
||||
|
||||
Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
|
||||
Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC
|
||||
errors may be received either by the CX4 Server adapter or at the switch. If
|
||||
this situation occurs, using a different cable assembly may resolve the issue.
|
||||
|
||||
|
||||
Jumbo Frames System Requirement
|
||||
-------------------------------
|
||||
Memory allocation failures have been observed on Linux systems with 64 MB
|
||||
of RAM or less that are running Jumbo Frames. If you are using Jumbo
|
||||
Frames, your system may require more than the advertised minimum
|
||||
requirement of 64 MB of system memory.
|
||||
|
||||
|
||||
Performance Degradation with Jumbo Frames
|
||||
-----------------------------------------
|
||||
Degradation in throughput performance may be observed in some Jumbo frames
|
||||
environments. If this is observed, increasing the application's socket buffer
|
||||
size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
|
||||
See the specific application manual and /usr/src/linux*/Documentation/
|
||||
networking/ip-sysctl.txt for more details.
|
||||
|
||||
|
||||
Allocating Rx Buffers when Using Jumbo Frames
|
||||
---------------------------------------------
|
||||
Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
|
||||
the available memory is heavily fragmented. This issue may be seen with PCI-X
|
||||
adapters or with packet split disabled. This can be reduced or eliminated
|
||||
by changing the amount of available memory for receive buffer allocation, by
|
||||
increasing /proc/sys/vm/min_free_kbytes.
|
||||
|
||||
|
||||
Multiple Interfaces on Same Ethernet Broadcast Network
|
||||
------------------------------------------------------
|
||||
Due to the default ARP behavior on Linux, it is not possible to have
|
||||
one system on two IP networks in the same Ethernet broadcast domain
|
||||
(non-partitioned switch) behave as expected. All Ethernet interfaces
|
||||
will respond to IP traffic for any IP address assigned to the system.
|
||||
This results in unbalanced receive traffic.
|
||||
|
||||
If you have multiple interfaces in a server, do either of the following:
|
||||
|
||||
- Turn on ARP filtering by entering:
|
||||
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
|
||||
|
||||
- Install the interfaces in separate broadcast domains - either in
|
||||
different switches or in a switch partitioned to VLANs.
|
||||
|
||||
|
||||
UDP Stress Test Dropped Packet Issue
|
||||
--------------------------------------
|
||||
Under small packets UDP stress test with 10GbE driver, the Linux system
|
||||
may drop UDP packets due to the fullness of socket buffers. You may want
|
||||
to change the driver's Flow Control variables to the minimum value for
|
||||
controlling packet reception.
|
||||
|
||||
|
||||
Tx Hangs Possible Under Stress
|
||||
------------------------------
|
||||
Under stress conditions, if TX hangs occur, turning off TSO
|
||||
"ethtool -K eth0 tso off" may resolve the problem.
|
||||
|
||||
|
||||
Support
|
||||
=======
|
||||
|
||||
For general information and support, go to the Intel support website at:
|
||||
For general information, go to the Intel support website at:
|
||||
|
||||
http://support.intel.com
|
||||
|
||||
or the Intel Wired Networking project hosted by Sourceforge at:
|
||||
|
||||
http://sourceforge.net/projects/e1000
|
||||
|
||||
If an issue is identified with the released source code on the supported
|
||||
kernel with a supported adapter, email the specific information related to
|
||||
the issue to linux.nics@intel.com.
|
||||
kernel with a supported adapter, email the specific information related
|
||||
to the issue to e1000-devel@lists.sf.net
|
||||
|
|
|
@ -0,0 +1,67 @@
|
|||
mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211
|
||||
Copyright (c) 2008, Jouni Malinen <j@w1.fi>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License version 2 as
|
||||
published by the Free Software Foundation.
|
||||
|
||||
|
||||
Introduction
|
||||
|
||||
mac80211_hwsim is a Linux kernel module that can be used to simulate
|
||||
arbitrary number of IEEE 802.11 radios for mac80211. It can be used to
|
||||
test most of the mac80211 functionality and user space tools (e.g.,
|
||||
hostapd and wpa_supplicant) in a way that matches very closely with
|
||||
the normal case of using real WLAN hardware. From the mac80211 view
|
||||
point, mac80211_hwsim is yet another hardware driver, i.e., no changes
|
||||
to mac80211 are needed to use this testing tool.
|
||||
|
||||
The main goal for mac80211_hwsim is to make it easier for developers
|
||||
to test their code and work with new features to mac80211, hostapd,
|
||||
and wpa_supplicant. The simulated radios do not have the limitations
|
||||
of real hardware, so it is easy to generate an arbitrary test setup
|
||||
and always reproduce the same setup for future tests. In addition,
|
||||
since all radio operation is simulated, any channel can be used in
|
||||
tests regardless of regulatory rules.
|
||||
|
||||
mac80211_hwsim kernel module has a parameter 'radios' that can be used
|
||||
to select how many radios are simulated (default 2). This allows
|
||||
configuration of both very simply setups (e.g., just a single access
|
||||
point and a station) or large scale tests (multiple access points with
|
||||
hundreds of stations).
|
||||
|
||||
mac80211_hwsim works by tracking the current channel of each virtual
|
||||
radio and copying all transmitted frames to all other radios that are
|
||||
currently enabled and on the same channel as the transmitting
|
||||
radio. Software encryption in mac80211 is used so that the frames are
|
||||
actually encrypted over the virtual air interface to allow more
|
||||
complete testing of encryption.
|
||||
|
||||
A global monitoring netdev, hwsim#, is created independent of
|
||||
mac80211. This interface can be used to monitor all transmitted frames
|
||||
regardless of channel.
|
||||
|
||||
|
||||
Simple example
|
||||
|
||||
This example shows how to use mac80211_hwsim to simulate two radios:
|
||||
one to act as an access point and the other as a station that
|
||||
associates with the AP. hostapd and wpa_supplicant are used to take
|
||||
care of WPA2-PSK authentication. In addition, hostapd is also
|
||||
processing access point side of association.
|
||||
|
||||
Please note that the current Linux kernel does not enable AP mode, so a
|
||||
simple patch is needed to enable AP mode selection:
|
||||
http://johannes.sipsolutions.net/patches/kernel/all/LATEST/006-allow-ap-vlan-modes.patch
|
||||
|
||||
|
||||
# Build mac80211_hwsim as part of kernel configuration
|
||||
|
||||
# Load the module
|
||||
modprobe mac80211_hwsim
|
||||
|
||||
# Run hostapd (AP) for wlan0
|
||||
hostapd hostapd.conf
|
||||
|
||||
# Run wpa_supplicant (station) for wlan1
|
||||
wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf
|
|
@ -0,0 +1,11 @@
|
|||
interface=wlan0
|
||||
driver=nl80211
|
||||
|
||||
hw_mode=g
|
||||
channel=1
|
||||
ssid=mac80211 test
|
||||
|
||||
wpa=2
|
||||
wpa_key_mgmt=WPA-PSK
|
||||
wpa_pairwise=CCMP
|
||||
wpa_passphrase=12345678
|
|
@ -0,0 +1,10 @@
|
|||
ctrl_interface=/var/run/wpa_supplicant
|
||||
|
||||
network={
|
||||
ssid="mac80211 test"
|
||||
psk="12345678"
|
||||
key_mgmt=WPA-PSK
|
||||
proto=WPA2
|
||||
pairwise=CCMP
|
||||
group=CCMP
|
||||
}
|
|
@ -3,19 +3,11 @@
|
|||
===========================================
|
||||
|
||||
Section 1: Base driver requirements for implementing multiqueue support
|
||||
Section 2: Qdisc support for multiqueue devices
|
||||
Section 3: Brief howto using PRIO or RR for multiqueue devices
|
||||
|
||||
|
||||
Intro: Kernel support for multiqueue devices
|
||||
---------------------------------------------------------
|
||||
|
||||
Kernel support for multiqueue devices is only an API that is presented to the
|
||||
netdevice layer for base drivers to implement. This feature is part of the
|
||||
core networking stack, and all network devices will be running on the
|
||||
multiqueue-aware stack. If a base driver only has one queue, then these
|
||||
changes are transparent to that driver.
|
||||
|
||||
Kernel support for multiqueue devices is always present.
|
||||
|
||||
Section 1: Base driver requirements for implementing multiqueue support
|
||||
-----------------------------------------------------------------------
|
||||
|
@ -32,84 +24,4 @@ netif_{start|stop|wake}_subqueue() functions to manage each queue while the
|
|||
device is still operational. netdev->queue_lock is still used when the device
|
||||
comes online or when it's completely shut down (unregister_netdev(), etc.).
|
||||
|
||||
Finally, the base driver should indicate that it is a multiqueue device. The
|
||||
feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
|
||||
bitmap on device initialization. Below is an example from e1000:
|
||||
|
||||
#ifdef CONFIG_E1000_MQ
|
||||
if ( (adapter->hw.mac.type == e1000_82571) ||
|
||||
(adapter->hw.mac.type == e1000_82572) ||
|
||||
(adapter->hw.mac.type == e1000_80003es2lan))
|
||||
netdev->features |= NETIF_F_MULTI_QUEUE;
|
||||
#endif
|
||||
|
||||
|
||||
Section 2: Qdisc support for multiqueue devices
|
||||
-----------------------------------------------
|
||||
|
||||
Currently two qdiscs support multiqueue devices. A new round-robin qdisc,
|
||||
sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
|
||||
bands and queues, and will store the queue mapping into skb->queue_mapping.
|
||||
Use this field in the base driver to determine which queue to send the skb
|
||||
to.
|
||||
|
||||
sch_rr has been added for hardware that doesn't want scheduling policies from
|
||||
software, so it's a straight round-robin qdisc. It uses the same syntax and
|
||||
classification priomap that sch_prio uses, so it should be intuitive to
|
||||
configure for people who've used sch_prio.
|
||||
|
||||
In order to utilitize the multiqueue features of the qdiscs, the network
|
||||
device layer needs to enable multiple queue support. This can be done by
|
||||
selecting NETDEVICES_MULTIQUEUE under Drivers.
|
||||
|
||||
The PRIO qdisc naturally plugs into a multiqueue device. If
|
||||
NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
|
||||
bands requested is compared to the number of queues on the hardware. If they
|
||||
are equal, it sets a one-to-one mapping up between the queues and bands. If
|
||||
they're not equal, it will not load the qdisc. This is the same behavior
|
||||
for RR. Once the association is made, any skb that is classified will have
|
||||
skb->queue_mapping set, which will allow the driver to properly queue skb's
|
||||
to multiple queues.
|
||||
|
||||
|
||||
Section 3: Brief howto using PRIO and RR for multiqueue devices
|
||||
---------------------------------------------------------------
|
||||
|
||||
The userspace command 'tc,' part of the iproute2 package, is used to configure
|
||||
qdiscs. To add the PRIO qdisc to your network device, assuming the device is
|
||||
called eth0, run the following command:
|
||||
|
||||
# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
|
||||
|
||||
This will create 4 bands, 0 being highest priority, and associate those bands
|
||||
to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
|
||||
would look like:
|
||||
|
||||
band 0 => queue 0
|
||||
band 1 => queue 1
|
||||
band 2 => queue 2
|
||||
band 3 => queue 3
|
||||
|
||||
Traffic will begin flowing through each queue if your TOS values are assigning
|
||||
traffic across the various bands. For example, ssh traffic will always try to
|
||||
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
|
||||
so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal"
|
||||
traffic classification, which is band 1. Therefore pings will be send out
|
||||
queue 1 on the NIC.
|
||||
|
||||
Note the use of the multiqueue keyword. This is only in versions of iproute2
|
||||
that support multiqueue networking devices; if this is omitted when loading
|
||||
a qdisc onto a multiqueue device, the qdisc will load and operate the same
|
||||
if it were loaded onto a single-queue device (i.e. - sends all traffic to
|
||||
queue 0).
|
||||
|
||||
Another alternative to multiqueue band allocation can be done by using the
|
||||
multiqueue option and specify 0 bands. If this is the case, the qdisc will
|
||||
allocate the number of bands to equal the number of queues that the device
|
||||
reports, and bring the qdisc online.
|
||||
|
||||
The behavior of tc filters remains the same, where it will override TOS priority
|
||||
classification.
|
||||
|
||||
|
||||
Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
|
||||
|
|
|
@ -52,13 +52,10 @@ d. MSI/MSI-X. Can be enabled on platforms which support this feature
|
|||
(IA64, Xeon) resulting in noticeable performance improvement(upto 7%
|
||||
on certain platforms).
|
||||
|
||||
e. NAPI. Compile-time option(CONFIG_S2IO_NAPI) for better Rx interrupt
|
||||
moderation.
|
||||
|
||||
f. Statistics. Comprehensive MAC-level and software statistics displayed
|
||||
e. Statistics. Comprehensive MAC-level and software statistics displayed
|
||||
using "ethtool -S" option.
|
||||
|
||||
g. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
|
||||
f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings,
|
||||
with multiple steering options.
|
||||
|
||||
4. Command line parameters
|
||||
|
|
|
@ -148,7 +148,7 @@
|
|||
getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
|
||||
|
||||
is meaningless (as in TCP). Packets with a zero checksum field are
|
||||
illegal (cf. RFC 3828, sec. 3.1) will be silently discarded.
|
||||
illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
|
||||
|
||||
4) Fragmentation
|
||||
|
||||
|
|
|
@ -10,7 +10,7 @@ us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
|
|||
which get executed even if the system is otherwise locked up hard).
|
||||
This can be used to debug hard kernel lockups. By executing periodic
|
||||
NMI interrupts, the kernel can monitor whether any CPU has locked up,
|
||||
and print out debugging messages if so.
|
||||
and print out debugging messages if so.
|
||||
|
||||
In order to use the NMI watchdog, you need to have APIC support in your
|
||||
kernel. For SMP kernels, APIC support gets compiled in automatically. For
|
||||
|
@ -22,8 +22,7 @@ CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
|
|||
kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
|
||||
may implicitly disable the NMI watchdog.]
|
||||
|
||||
For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
|
||||
always enabled with I/O-APIC mode (nmi_watchdog=1).
|
||||
For x86-64, the needed APIC is always compiled in.
|
||||
|
||||
Using local APIC (nmi_watchdog=2) needs the first performance register, so
|
||||
you can't use it for other purposes (such as high precision performance
|
||||
|
@ -63,16 +62,15 @@ when the system is idle), but if your system locks up on anything but the
|
|||
"hlt", then you are out of luck -- the event will not happen at all and the
|
||||
watchdog won't trigger. This is a shortcoming of the local APIC watchdog
|
||||
-- unfortunately there is no "clock ticks" event that would work all the
|
||||
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
|
||||
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
|
||||
But its NMI frequency is much higher, resulting in a more significant hit
|
||||
to the overall system performance.
|
||||
|
||||
NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default,
|
||||
you have to enable it with a boot time parameter. Prior to 2.4.2-ac18
|
||||
the NMI-oopser is enabled unconditionally on x86 SMP boxes.
|
||||
On x86 nmi_watchdog is disabled by default so you have to enable it with
|
||||
a boot time parameter.
|
||||
|
||||
On x86-64 the NMI oopser is on by default. On 64bit Intel CPUs
|
||||
it uses IO-APIC by default and on AMD it uses local APIC.
|
||||
NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
|
||||
on x86 SMP boxes.
|
||||
|
||||
[ feel free to send bug reports, suggestions and patches to
|
||||
Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
|
||||
|
|
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
|
@ -0,0 +1,141 @@
|
|||
The PowerPC boot wrapper
|
||||
------------------------
|
||||
Copyright (C) Secret Lab Technologies Ltd.
|
||||
|
||||
PowerPC image targets compresses and wraps the kernel image (vmlinux) with
|
||||
a boot wrapper to make it usable by the system firmware. There is no
|
||||
standard PowerPC firmware interface, so the boot wrapper is designed to
|
||||
be adaptable for each kind of image that needs to be built.
|
||||
|
||||
The boot wrapper can be found in the arch/powerpc/boot/ directory. The
|
||||
Makefile in that directory has targets for all the available image types.
|
||||
The different image types are used to support all of the various firmware
|
||||
interfaces found on PowerPC platforms. OpenFirmware is the most commonly
|
||||
used firmware type on general purpose PowerPC systems from Apple, IBM and
|
||||
others. U-Boot is typically found on embedded PowerPC hardware, but there
|
||||
are a handful of other firmware implementations which are also popular. Each
|
||||
firmware interface requires a different image format.
|
||||
|
||||
The boot wrapper is built from the makefile in arch/powerpc/boot/Makefile and
|
||||
it uses the wrapper script (arch/powerpc/boot/wrapper) to generate target
|
||||
image. The details of the build system is discussed in the next section.
|
||||
Currently, the following image format targets exist:
|
||||
|
||||
cuImage.%: Backwards compatible uImage for older version of
|
||||
U-Boot (for versions that don't understand the device
|
||||
tree). This image embeds a device tree blob inside
|
||||
the image. The boot wrapper, kernel and device tree
|
||||
are all embedded inside the U-Boot uImage file format
|
||||
with boot wrapper code that extracts data from the old
|
||||
bd_info structure and loads the data into the device
|
||||
tree before jumping into the kernel.
|
||||
Because of the series of #ifdefs found in the
|
||||
bd_info structure used in the old U-Boot interfaces,
|
||||
cuImages are platform specific. Each specific
|
||||
U-Boot platform has a different platform init file
|
||||
which populates the embedded device tree with data
|
||||
from the platform specific bd_info file. The platform
|
||||
specific cuImage platform init code can be found in
|
||||
arch/powerpc/boot/cuboot.*.c. Selection of the correct
|
||||
cuImage init code for a specific board can be found in
|
||||
the wrapper structure.
|
||||
dtbImage.%: Similar to zImage, except device tree blob is embedded
|
||||
inside the image instead of provided by firmware. The
|
||||
output image file can be either an elf file or a flat
|
||||
binary depending on the platform.
|
||||
dtbImages are used on systems which do not have an
|
||||
interface for passing a device tree directly.
|
||||
dtbImages are similar to simpleImages except that
|
||||
dtbImages have platform specific code for extracting
|
||||
data from the board firmware, but simpleImages do not
|
||||
talk to the firmware at all.
|
||||
PlayStation 3 support uses dtbImage. So do Embedded
|
||||
Planet boards using the PlanetCore firmware. Board
|
||||
specific initialization code is typically found in a
|
||||
file named arch/powerpc/boot/<platform>.c; but this
|
||||
can be overridden by the wrapper script.
|
||||
simpleImage.%: Firmware independent compressed image that does not
|
||||
depend on any particular firmware interface and embeds
|
||||
a device tree blob. This image is a flat binary that
|
||||
can be loaded to any location in RAM and jumped to.
|
||||
Firmware cannot pass any configuration data to the
|
||||
kernel with this image type and it depends entirely on
|
||||
the embedded device tree for all information.
|
||||
The simpleImage is useful for booting systems with
|
||||
an unknown firmware interface or for booting from
|
||||
a debugger when no firmware is present (such as on
|
||||
the Xilinx Virtex platform). The only assumption that
|
||||
simpleImage makes is that RAM is correctly initialized
|
||||
and that the MMU is either off or has RAM mapped to
|
||||
base address 0.
|
||||
simpleImage also supports inserting special platform
|
||||
specific initialization code to the start of the bootup
|
||||
sequence. The virtex405 platform uses this feature to
|
||||
ensure that the cache is invalidated before caching
|
||||
is enabled. Platform specific initialization code is
|
||||
added as part of the wrapper script and is keyed on
|
||||
the image target name. For example, all
|
||||
simpleImage.virtex405-* targets will add the
|
||||
virtex405-head.S initialization code (This also means
|
||||
that the dts file for virtex405 targets should be
|
||||
named (virtex405-<board>.dts). Search the wrapper
|
||||
script for 'virtex405' and see the file
|
||||
arch/powerpc/boot/virtex405-head.S for details.
|
||||
treeImage.%; Image format for used with OpenBIOS firmware found
|
||||
on some ppc4xx hardware. This image embeds a device
|
||||
tree blob inside the image.
|
||||
uImage: Native image format used by U-Boot. The uImage target
|
||||
does not add any boot code. It just wraps a compressed
|
||||
vmlinux in the uImage data structure. This image
|
||||
requires a version of U-Boot that is able to pass
|
||||
a device tree to the kernel at boot. If using an older
|
||||
version of U-Boot, then you need to use a cuImage
|
||||
instead.
|
||||
zImage.%: Image format which does not embed a device tree.
|
||||
Used by OpenFirmware and other firmware interfaces
|
||||
which are able to supply a device tree. This image
|
||||
expects firmware to provide the device tree at boot.
|
||||
Typically, if you have general purpose PowerPC
|
||||
hardware then you want this image format.
|
||||
|
||||
Image types which embed a device tree blob (simpleImage, dtbImage, treeImage,
|
||||
and cuImage) all generate the device tree blob from a file in the
|
||||
arch/powerpc/boot/dts/ directory. The Makefile selects the correct device
|
||||
tree source based on the name of the target. Therefore, if the kernel is
|
||||
built with 'make treeImage.walnut simpleImage.virtex405-ml403', then the
|
||||
build system will use arch/powerpc/boot/dts/walnut.dts to build
|
||||
treeImage.walnut and arch/powerpc/boot/dts/virtex405-ml403.dts to build
|
||||
the simpleImage.virtex405-ml403.
|
||||
|
||||
Two special targets called 'zImage' and 'zImage.initrd' also exist. These
|
||||
targets build all the default images as selected by the kernel configuration.
|
||||
Default images are selected by the boot wrapper Makefile
|
||||
(arch/powerpc/boot/Makefile) by adding targets to the $image-y variable. Look
|
||||
at the Makefile to see which default image targets are available.
|
||||
|
||||
How it is built
|
||||
---------------
|
||||
arch/powerpc is designed to support multiplatform kernels, which means
|
||||
that a single vmlinux image can be booted on many different target boards.
|
||||
It also means that the boot wrapper must be able to wrap for many kinds of
|
||||
images on a single build. The design decision was made to not use any
|
||||
conditional compilation code (#ifdef, etc) in the boot wrapper source code.
|
||||
All of the boot wrapper pieces are buildable at any time regardless of the
|
||||
kernel configuration. Building all the wrapper bits on every kernel build
|
||||
also ensures that obscure parts of the wrapper are at the very least compile
|
||||
tested in a large variety of environments.
|
||||
|
||||
The wrapper is adapted for different image types at link time by linking in
|
||||
just the wrapper bits that are appropriate for the image type. The 'wrapper
|
||||
script' (found in arch/powerpc/boot/wrapper) is called by the Makefile and
|
||||
is responsible for selecting the correct wrapper bits for the image type.
|
||||
The arguments are well documented in the script's comment block, so they
|
||||
are not repeated here. However, it is worth mentioning that the script
|
||||
uses the -p (platform) argument as the main method of deciding which wrapper
|
||||
bits to compile in. Look for the large 'case "$platform" in' block in the
|
||||
middle of the script. This is also the place where platform specific fixups
|
||||
can be selected by changing the link order.
|
||||
|
||||
In particular, care should be taken when working with cuImages. cuImage
|
||||
wrapper bits are very board specific and care should be taken to make sure
|
||||
the target you are trying to build is supported by the wrapper bits.
|
|
@ -0,0 +1,29 @@
|
|||
* Board Control and Status (BCSR)
|
||||
|
||||
Required properties:
|
||||
|
||||
- device_type : Should be "board-control"
|
||||
- reg : Offset and length of the register set for the device
|
||||
|
||||
Example:
|
||||
|
||||
bcsr@f8000000 {
|
||||
device_type = "board-control";
|
||||
reg = <f8000000 8000>;
|
||||
};
|
||||
|
||||
* Freescale on board FPGA
|
||||
|
||||
This is the memory-mapped registers for on board FPGA.
|
||||
|
||||
Required properities:
|
||||
- compatible : should be "fsl,fpga-pixis".
|
||||
- reg : should contain the address and the lenght of the FPPGA register
|
||||
set.
|
||||
|
||||
Example (MPC8610HPCD):
|
||||
|
||||
board-control@e8000000 {
|
||||
compatible = "fsl,fpga-pixis";
|
||||
reg = <0xe8000000 32>;
|
||||
};
|
|
@ -0,0 +1,67 @@
|
|||
* Freescale Communications Processor Module
|
||||
|
||||
NOTE: This is an interim binding, and will likely change slightly,
|
||||
as more devices are supported. The QE bindings especially are
|
||||
incomplete.
|
||||
|
||||
* Root CPM node
|
||||
|
||||
Properties:
|
||||
- compatible : "fsl,cpm1", "fsl,cpm2", or "fsl,qe".
|
||||
- reg : A 48-byte region beginning with CPCR.
|
||||
|
||||
Example:
|
||||
cpm@119c0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
#interrupt-cells = <2>;
|
||||
compatible = "fsl,mpc8272-cpm", "fsl,cpm2";
|
||||
reg = <119c0 30>;
|
||||
}
|
||||
|
||||
* Properties common to mulitple CPM/QE devices
|
||||
|
||||
- fsl,cpm-command : This value is ORed with the opcode and command flag
|
||||
to specify the device on which a CPM command operates.
|
||||
|
||||
- fsl,cpm-brg : Indicates which baud rate generator the device
|
||||
is associated with. If absent, an unused BRG
|
||||
should be dynamically allocated. If zero, the
|
||||
device uses an external clock rather than a BRG.
|
||||
|
||||
- reg : Unless otherwise specified, the first resource represents the
|
||||
scc/fcc/ucc registers, and the second represents the device's
|
||||
parameter RAM region (if it has one).
|
||||
|
||||
* Multi-User RAM (MURAM)
|
||||
|
||||
The multi-user/dual-ported RAM is expressed as a bus under the CPM node.
|
||||
|
||||
Ranges must be set up subject to the following restrictions:
|
||||
|
||||
- Children's reg nodes must be offsets from the start of all muram, even
|
||||
if the user-data area does not begin at zero.
|
||||
- If multiple range entries are used, the difference between the parent
|
||||
address and the child address must be the same in all, so that a single
|
||||
mapping can cover them all while maintaining the ability to determine
|
||||
CPM-side offsets with pointer subtraction. It is recommended that
|
||||
multiple range entries not be used.
|
||||
- A child address of zero must be translatable, even if no reg resources
|
||||
contain it.
|
||||
|
||||
A child "data" node must exist, compatible with "fsl,cpm-muram-data", to
|
||||
indicate the portion of muram that is usable by the OS for arbitrary
|
||||
purposes. The data node may have an arbitrary number of reg resources,
|
||||
all of which contribute to the allocatable muram pool.
|
||||
|
||||
Example, based on mpc8272:
|
||||
muram@0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
ranges = <0 0 10000>;
|
||||
|
||||
data@0 {
|
||||
compatible = "fsl,cpm-muram-data";
|
||||
reg = <0 2000 9800 800>;
|
||||
};
|
||||
};
|
|
@ -0,0 +1,21 @@
|
|||
* Baud Rate Generators
|
||||
|
||||
Currently defined compatibles:
|
||||
fsl,cpm-brg
|
||||
fsl,cpm1-brg
|
||||
fsl,cpm2-brg
|
||||
|
||||
Properties:
|
||||
- reg : There may be an arbitrary number of reg resources; BRG
|
||||
numbers are assigned to these in order.
|
||||
- clock-frequency : Specifies the base frequency driving
|
||||
the BRG.
|
||||
|
||||
Example:
|
||||
brg@119f0 {
|
||||
compatible = "fsl,mpc8272-brg",
|
||||
"fsl,cpm2-brg",
|
||||
"fsl,cpm-brg";
|
||||
reg = <119f0 10 115f0 10>;
|
||||
clock-frequency = <d#25000000>;
|
||||
};
|
|
@ -0,0 +1,41 @@
|
|||
* I2C
|
||||
|
||||
The I2C controller is expressed as a bus under the CPM node.
|
||||
|
||||
Properties:
|
||||
- compatible : "fsl,cpm1-i2c", "fsl,cpm2-i2c"
|
||||
- reg : On CPM2 devices, the second resource doesn't specify the I2C
|
||||
Parameter RAM itself, but the I2C_BASE field of the CPM2 Parameter RAM
|
||||
(typically 0x8afc 0x2).
|
||||
- #address-cells : Should be one. The cell is the i2c device address with
|
||||
the r/w bit set to zero.
|
||||
- #size-cells : Should be zero.
|
||||
- clock-frequency : Can be used to set the i2c clock frequency. If
|
||||
unspecified, a default frequency of 60kHz is being used.
|
||||
The following two properties are deprecated. They are only used by legacy
|
||||
i2c drivers to find the bus to probe:
|
||||
- linux,i2c-index : Can be used to hard code an i2c bus number. By default,
|
||||
the bus number is dynamically assigned by the i2c core.
|
||||
- linux,i2c-class : Can be used to override the i2c class. The class is used
|
||||
by legacy i2c device drivers to find a bus in a specific context like
|
||||
system management, video or sound. By default, I2C_CLASS_HWMON (1) is
|
||||
being used. The definition of the classes can be found in
|
||||
include/i2c/i2c.h
|
||||
|
||||
Example, based on mpc823:
|
||||
|
||||
i2c@860 {
|
||||
compatible = "fsl,mpc823-i2c",
|
||||
"fsl,cpm1-i2c";
|
||||
reg = <0x860 0x20 0x3c80 0x30>;
|
||||
interrupts = <16>;
|
||||
interrupt-parent = <&CPM_PIC>;
|
||||
fsl,cpm-command = <0x10>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
rtc@68 {
|
||||
compatible = "dallas,ds1307";
|
||||
reg = <0x68>;
|
||||
};
|
||||
};
|
|
@ -0,0 +1,18 @@
|
|||
* Interrupt Controllers
|
||||
|
||||
Currently defined compatibles:
|
||||
- fsl,cpm1-pic
|
||||
- only one interrupt cell
|
||||
- fsl,pq1-pic
|
||||
- fsl,cpm2-pic
|
||||
- second interrupt cell is level/sense:
|
||||
- 2 is falling edge
|
||||
- 8 is active low
|
||||
|
||||
Example:
|
||||
interrupt-controller@10c00 {
|
||||
#interrupt-cells = <2>;
|
||||
interrupt-controller;
|
||||
reg = <10c00 80>;
|
||||
compatible = "mpc8272-pic", "fsl,cpm2-pic";
|
||||
};
|
|
@ -0,0 +1,15 @@
|
|||
* USB (Universal Serial Bus Controller)
|
||||
|
||||
Properties:
|
||||
- compatible : "fsl,cpm1-usb", "fsl,cpm2-usb", "fsl,qe-usb"
|
||||
|
||||
Example:
|
||||
usb@11bc0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "fsl,cpm2-usb";
|
||||
reg = <11b60 18 8b00 100>;
|
||||
interrupts = <b 8>;
|
||||
interrupt-parent = <&PIC>;
|
||||
fsl,cpm-command = <2e600000>;
|
||||
};
|
|
@ -0,0 +1,38 @@
|
|||
Every GPIO controller node must have #gpio-cells property defined,
|
||||
this information will be used to translate gpio-specifiers.
|
||||
|
||||
On CPM1 devices, all ports are using slightly different register layouts.
|
||||
Ports A, C and D are 16bit ports and Ports B and E are 32bit ports.
|
||||
|
||||
On CPM2 devices, all ports are 32bit ports and use a common register layout.
|
||||
|
||||
Required properties:
|
||||
- compatible : "fsl,cpm1-pario-bank-a", "fsl,cpm1-pario-bank-b",
|
||||
"fsl,cpm1-pario-bank-c", "fsl,cpm1-pario-bank-d",
|
||||
"fsl,cpm1-pario-bank-e", "fsl,cpm2-pario-bank"
|
||||
- #gpio-cells : Should be two. The first cell is the pin number and the
|
||||
second cell is used to specify optional paramters (currently unused).
|
||||
- gpio-controller : Marks the port as GPIO controller.
|
||||
|
||||
Example of three SOC GPIO banks defined as gpio-controller nodes:
|
||||
|
||||
CPM1_PIO_A: gpio-controller@950 {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,cpm1-pario-bank-a";
|
||||
reg = <0x950 0x10>;
|
||||
gpio-controller;
|
||||
};
|
||||
|
||||
CPM1_PIO_B: gpio-controller@ab8 {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,cpm1-pario-bank-b";
|
||||
reg = <0xab8 0x10>;
|
||||
gpio-controller;
|
||||
};
|
||||
|
||||
CPM1_PIO_E: gpio-controller@ac8 {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,cpm1-pario-bank-e";
|
||||
reg = <0xac8 0x18>;
|
||||
gpio-controller;
|
||||
};
|
|
@ -0,0 +1,45 @@
|
|||
* Network
|
||||
|
||||
Currently defined compatibles:
|
||||
- fsl,cpm1-scc-enet
|
||||
- fsl,cpm2-scc-enet
|
||||
- fsl,cpm1-fec-enet
|
||||
- fsl,cpm2-fcc-enet (third resource is GFEMR)
|
||||
- fsl,qe-enet
|
||||
|
||||
Example:
|
||||
|
||||
ethernet@11300 {
|
||||
device_type = "network";
|
||||
compatible = "fsl,mpc8272-fcc-enet",
|
||||
"fsl,cpm2-fcc-enet";
|
||||
reg = <11300 20 8400 100 11390 1>;
|
||||
local-mac-address = [ 00 00 00 00 00 00 ];
|
||||
interrupts = <20 8>;
|
||||
interrupt-parent = <&PIC>;
|
||||
phy-handle = <&PHY0>;
|
||||
fsl,cpm-command = <12000300>;
|
||||
};
|
||||
|
||||
* MDIO
|
||||
|
||||
Currently defined compatibles:
|
||||
fsl,pq1-fec-mdio (reg is same as first resource of FEC device)
|
||||
fsl,cpm2-mdio-bitbang (reg is port C registers)
|
||||
|
||||
Properties for fsl,cpm2-mdio-bitbang:
|
||||
fsl,mdio-pin : pin of port C controlling mdio data
|
||||
fsl,mdc-pin : pin of port C controlling mdio clock
|
||||
|
||||
Example:
|
||||
mdio@10d40 {
|
||||
device_type = "mdio";
|
||||
compatible = "fsl,mpc8272ads-mdio-bitbang",
|
||||
"fsl,mpc8272-mdio-bitbang",
|
||||
"fsl,cpm2-mdio-bitbang";
|
||||
reg = <10d40 14>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
fsl,mdio-pin = <12>;
|
||||
fsl,mdc-pin = <13>;
|
||||
};
|
|
@ -0,0 +1,58 @@
|
|||
* Freescale QUICC Engine module (QE)
|
||||
This represents qe module that is installed on PowerQUICC II Pro.
|
||||
|
||||
NOTE: This is an interim binding; it should be updated to fit
|
||||
in with the CPM binding later in this document.
|
||||
|
||||
Basically, it is a bus of devices, that could act more or less
|
||||
as a complete entity (UCC, USB etc ). All of them should be siblings on
|
||||
the "root" qe node, using the common properties from there.
|
||||
The description below applies to the qe of MPC8360 and
|
||||
more nodes and properties would be extended in the future.
|
||||
|
||||
i) Root QE device
|
||||
|
||||
Required properties:
|
||||
- compatible : should be "fsl,qe";
|
||||
- model : precise model of the QE, Can be "QE", "CPM", or "CPM2"
|
||||
- reg : offset and length of the device registers.
|
||||
- bus-frequency : the clock frequency for QUICC Engine.
|
||||
|
||||
Recommended properties
|
||||
- brg-frequency : the internal clock source frequency for baud-rate
|
||||
generators in Hz.
|
||||
|
||||
Example:
|
||||
qe@e0100000 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
#interrupt-cells = <2>;
|
||||
compatible = "fsl,qe";
|
||||
ranges = <0 e0100000 00100000>;
|
||||
reg = <e0100000 480>;
|
||||
brg-frequency = <0>;
|
||||
bus-frequency = <179A7B00>;
|
||||
}
|
||||
|
||||
* Multi-User RAM (MURAM)
|
||||
|
||||
Required properties:
|
||||
- compatible : should be "fsl,qe-muram", "fsl,cpm-muram".
|
||||
- mode : the could be "host" or "slave".
|
||||
- ranges : Should be defined as specified in 1) to describe the
|
||||
translation of MURAM addresses.
|
||||
- data-only : sub-node which defines the address area under MURAM
|
||||
bus that can be allocated as data/parameter
|
||||
|
||||
Example:
|
||||
|
||||
muram@10000 {
|
||||
compatible = "fsl,qe-muram", "fsl,cpm-muram";
|
||||
ranges = <0 00010000 0000c000>;
|
||||
|
||||
data-only@0{
|
||||
compatible = "fsl,qe-muram-data",
|
||||
"fsl,cpm-muram-data";
|
||||
reg = <0 c000>;
|
||||
};
|
||||
};
|
|
@ -0,0 +1,24 @@
|
|||
* Uploaded QE firmware
|
||||
|
||||
If a new firwmare has been uploaded to the QE (usually by the
|
||||
boot loader), then a 'firmware' child node should be added to the QE
|
||||
node. This node provides information on the uploaded firmware that
|
||||
device drivers may need.
|
||||
|
||||
Required properties:
|
||||
- id: The string name of the firmware. This is taken from the 'id'
|
||||
member of the qe_firmware structure of the uploaded firmware.
|
||||
Device drivers can search this string to determine if the
|
||||
firmware they want is already present.
|
||||
- extended-modes: The Extended Modes bitfield, taken from the
|
||||
firmware binary. It is a 64-bit number represented
|
||||
as an array of two 32-bit numbers.
|
||||
- virtual-traps: The virtual traps, taken from the firmware binary.
|
||||
It is an array of 8 32-bit numbers.
|
||||
|
||||
Example:
|
||||
firmware {
|
||||
id = "Soft-UART";
|
||||
extended-modes = <0 0>;
|
||||
virtual-traps = <0 0 0 0 0 0 0 0>;
|
||||
};
|
|
@ -0,0 +1,51 @@
|
|||
* Parallel I/O Ports
|
||||
|
||||
This node configures Parallel I/O ports for CPUs with QE support.
|
||||
The node should reside in the "soc" node of the tree. For each
|
||||
device that using parallel I/O ports, a child node should be created.
|
||||
See the definition of the Pin configuration nodes below for more
|
||||
information.
|
||||
|
||||
Required properties:
|
||||
- device_type : should be "par_io".
|
||||
- reg : offset to the register set and its length.
|
||||
- num-ports : number of Parallel I/O ports
|
||||
|
||||
Example:
|
||||
par_io@1400 {
|
||||
reg = <1400 100>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
device_type = "par_io";
|
||||
num-ports = <7>;
|
||||
ucc_pin@01 {
|
||||
......
|
||||
};
|
||||
|
||||
Note that "par_io" nodes are obsolete, and should not be used for
|
||||
the new device trees. Instead, each Par I/O bank should be represented
|
||||
via its own gpio-controller node:
|
||||
|
||||
Required properties:
|
||||
- #gpio-cells : should be "2".
|
||||
- compatible : should be "fsl,<chip>-qe-pario-bank",
|
||||
"fsl,mpc8323-qe-pario-bank".
|
||||
- reg : offset to the register set and its length.
|
||||
- gpio-controller : node to identify gpio controllers.
|
||||
|
||||
Example:
|
||||
qe_pio_a: gpio-controller@1400 {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,mpc8360-qe-pario-bank",
|
||||
"fsl,mpc8323-qe-pario-bank";
|
||||
reg = <0x1400 0x18>;
|
||||
gpio-controller;
|
||||
};
|
||||
|
||||
qe_pio_e: gpio-controller@1460 {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,mpc8360-qe-pario-bank",
|
||||
"fsl,mpc8323-qe-pario-bank";
|
||||
reg = <0x1460 0x18>;
|
||||
gpio-controller;
|
||||
};
|
|
@ -0,0 +1,60 @@
|
|||
* Pin configuration nodes
|
||||
|
||||
Required properties:
|
||||
- linux,phandle : phandle of this node; likely referenced by a QE
|
||||
device.
|
||||
- pio-map : array of pin configurations. Each pin is defined by 6
|
||||
integers. The six numbers are respectively: port, pin, dir,
|
||||
open_drain, assignment, has_irq.
|
||||
- port : port number of the pin; 0-6 represent port A-G in UM.
|
||||
- pin : pin number in the port.
|
||||
- dir : direction of the pin, should encode as follows:
|
||||
|
||||
0 = The pin is disabled
|
||||
1 = The pin is an output
|
||||
2 = The pin is an input
|
||||
3 = The pin is I/O
|
||||
|
||||
- open_drain : indicates the pin is normal or wired-OR:
|
||||
|
||||
0 = The pin is actively driven as an output
|
||||
1 = The pin is an open-drain driver. As an output, the pin is
|
||||
driven active-low, otherwise it is three-stated.
|
||||
|
||||
- assignment : function number of the pin according to the Pin Assignment
|
||||
tables in User Manual. Each pin can have up to 4 possible functions in
|
||||
QE and two options for CPM.
|
||||
- has_irq : indicates if the pin is used as source of external
|
||||
interrupts.
|
||||
|
||||
Example:
|
||||
ucc_pin@01 {
|
||||
linux,phandle = <140001>;
|
||||
pio-map = <
|
||||
/* port pin dir open_drain assignment has_irq */
|
||||
0 3 1 0 1 0 /* TxD0 */
|
||||
0 4 1 0 1 0 /* TxD1 */
|
||||
0 5 1 0 1 0 /* TxD2 */
|
||||
0 6 1 0 1 0 /* TxD3 */
|
||||
1 6 1 0 3 0 /* TxD4 */
|
||||
1 7 1 0 1 0 /* TxD5 */
|
||||
1 9 1 0 2 0 /* TxD6 */
|
||||
1 a 1 0 2 0 /* TxD7 */
|
||||
0 9 2 0 1 0 /* RxD0 */
|
||||
0 a 2 0 1 0 /* RxD1 */
|
||||
0 b 2 0 1 0 /* RxD2 */
|
||||
0 c 2 0 1 0 /* RxD3 */
|
||||
0 d 2 0 1 0 /* RxD4 */
|
||||
1 1 2 0 2 0 /* RxD5 */
|
||||
1 0 2 0 2 0 /* RxD6 */
|
||||
1 4 2 0 2 0 /* RxD7 */
|
||||
0 7 1 0 1 0 /* TX_EN */
|
||||
0 8 1 0 1 0 /* TX_ER */
|
||||
0 f 2 0 1 0 /* RX_DV */
|
||||
0 10 2 0 1 0 /* RX_ER */
|
||||
0 0 2 0 1 0 /* RX_CLK */
|
||||
2 9 1 0 3 0 /* GTX_CLK - CLK10 */
|
||||
2 8 2 0 1 0>; /* GTX125 - CLK9 */
|
||||
};
|
||||
|
||||
|
|
@ -0,0 +1,70 @@
|
|||
* UCC (Unified Communications Controllers)
|
||||
|
||||
Required properties:
|
||||
- device_type : should be "network", "hldc", "uart", "transparent"
|
||||
"bisync", "atm", or "serial".
|
||||
- compatible : could be "ucc_geth" or "fsl_atm" and so on.
|
||||
- cell-index : the ucc number(1-8), corresponding to UCCx in UM.
|
||||
- reg : Offset and length of the register set for the device
|
||||
- interrupts : <a b> where a is the interrupt number and b is a
|
||||
field that represents an encoding of the sense and level
|
||||
information for the interrupt. This should be encoded based on
|
||||
the information in section 2) depending on the type of interrupt
|
||||
controller you have.
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
- pio-handle : The phandle for the Parallel I/O port configuration.
|
||||
- port-number : for UART drivers, the port number to use, between 0 and 3.
|
||||
This usually corresponds to the /dev/ttyQE device, e.g. <0> = /dev/ttyQE0.
|
||||
The port number is added to the minor number of the device. Unlike the
|
||||
CPM UART driver, the port-number is required for the QE UART driver.
|
||||
- soft-uart : for UART drivers, if specified this means the QE UART device
|
||||
driver should use "Soft-UART" mode, which is needed on some SOCs that have
|
||||
broken UART hardware. Soft-UART is provided via a microcode upload.
|
||||
- rx-clock-name: the UCC receive clock source
|
||||
"none": clock source is disabled
|
||||
"brg1" through "brg16": clock source is BRG1-BRG16, respectively
|
||||
"clk1" through "clk24": clock source is CLK1-CLK24, respectively
|
||||
- tx-clock-name: the UCC transmit clock source
|
||||
"none": clock source is disabled
|
||||
"brg1" through "brg16": clock source is BRG1-BRG16, respectively
|
||||
"clk1" through "clk24": clock source is CLK1-CLK24, respectively
|
||||
The following two properties are deprecated. rx-clock has been replaced
|
||||
with rx-clock-name, and tx-clock has been replaced with tx-clock-name.
|
||||
Drivers that currently use the deprecated properties should continue to
|
||||
do so, in order to support older device trees, but they should be updated
|
||||
to check for the new properties first.
|
||||
- rx-clock : represents the UCC receive clock source.
|
||||
0x00 : clock source is disabled;
|
||||
0x1~0x10 : clock source is BRG1~BRG16 respectively;
|
||||
0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
|
||||
- tx-clock: represents the UCC transmit clock source;
|
||||
0x00 : clock source is disabled;
|
||||
0x1~0x10 : clock source is BRG1~BRG16 respectively;
|
||||
0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
|
||||
|
||||
Required properties for network device_type:
|
||||
- mac-address : list of bytes representing the ethernet address.
|
||||
- phy-handle : The phandle for the PHY connected to this controller.
|
||||
|
||||
Recommended properties:
|
||||
- phy-connection-type : a string naming the controller/PHY interface type,
|
||||
i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id" (Internal
|
||||
Delay), "rgmii-txid" (delay on TX only), "rgmii-rxid" (delay on RX only),
|
||||
"tbi", or "rtbi".
|
||||
|
||||
Example:
|
||||
ucc@2000 {
|
||||
device_type = "network";
|
||||
compatible = "ucc_geth";
|
||||
cell-index = <1>;
|
||||
reg = <2000 200>;
|
||||
interrupts = <a0 0>;
|
||||
interrupt-parent = <700>;
|
||||
mac-address = [ 00 04 9f 00 23 23 ];
|
||||
rx-clock = "none";
|
||||
tx-clock = "clk9";
|
||||
phy-handle = <212000>;
|
||||
phy-connection-type = "gmii";
|
||||
pio-handle = <140001>;
|
||||
};
|
|
@ -0,0 +1,37 @@
|
|||
Freescale QUICC Engine USB Controller
|
||||
|
||||
Required properties:
|
||||
- compatible : should be "fsl,<chip>-qe-usb", "fsl,mpc8323-qe-usb".
|
||||
- reg : the first two cells should contain usb registers location and
|
||||
length, the next two two cells should contain PRAM location and
|
||||
length.
|
||||
- interrupts : should contain USB interrupt.
|
||||
- interrupt-parent : interrupt source phandle.
|
||||
- fsl,fullspeed-clock : specifies the full speed USB clock source:
|
||||
"none": clock source is disabled
|
||||
"brg1" through "brg16": clock source is BRG1-BRG16, respectively
|
||||
"clk1" through "clk24": clock source is CLK1-CLK24, respectively
|
||||
- fsl,lowspeed-clock : specifies the low speed USB clock source:
|
||||
"none": clock source is disabled
|
||||
"brg1" through "brg16": clock source is BRG1-BRG16, respectively
|
||||
"clk1" through "clk24": clock source is CLK1-CLK24, respectively
|
||||
- hub-power-budget : USB power budget for the root hub, in mA.
|
||||
- gpios : should specify GPIOs in this order: USBOE, USBTP, USBTN, USBRP,
|
||||
USBRN, SPEED (optional), and POWER (optional).
|
||||
|
||||
Example:
|
||||
|
||||
usb@6c0 {
|
||||
compatible = "fsl,mpc8360-qe-usb", "fsl,mpc8323-qe-usb";
|
||||
reg = <0x6c0 0x40 0x8b00 0x100>;
|
||||
interrupts = <11>;
|
||||
interrupt-parent = <&qeic>;
|
||||
fsl,fullspeed-clock = "clk21";
|
||||
gpios = <&qe_pio_b 2 0 /* USBOE */
|
||||
&qe_pio_b 3 0 /* USBTP */
|
||||
&qe_pio_b 8 0 /* USBTN */
|
||||
&qe_pio_b 9 0 /* USBRP */
|
||||
&qe_pio_b 11 0 /* USBRN */
|
||||
&qe_pio_e 20 0 /* SPEED */
|
||||
&qe_pio_e 21 0 /* POWER */>;
|
||||
};
|
|
@ -0,0 +1,21 @@
|
|||
* Serial
|
||||
|
||||
Currently defined compatibles:
|
||||
- fsl,cpm1-smc-uart
|
||||
- fsl,cpm2-smc-uart
|
||||
- fsl,cpm1-scc-uart
|
||||
- fsl,cpm2-scc-uart
|
||||
- fsl,qe-uart
|
||||
|
||||
Example:
|
||||
|
||||
serial@11a00 {
|
||||
device_type = "serial";
|
||||
compatible = "fsl,mpc8272-scc-uart",
|
||||
"fsl,cpm2-scc-uart";
|
||||
reg = <11a00 20 8000 100>;
|
||||
interrupts = <28 8>;
|
||||
interrupt-parent = <&PIC>;
|
||||
fsl,cpm-brg = <1>;
|
||||
fsl,cpm-command = <00800000>;
|
||||
};
|
|
@ -0,0 +1,18 @@
|
|||
* Freescale Display Interface Unit
|
||||
|
||||
The Freescale DIU is a LCD controller, with proper hardware, it can also
|
||||
drive DVI monitors.
|
||||
|
||||
Required properties:
|
||||
- compatible : should be "fsl-diu".
|
||||
- reg : should contain at least address and length of the DIU register
|
||||
set.
|
||||
- Interrupts : one DIU interrupt should be describe here.
|
||||
|
||||
Example (MPC8610HPCD):
|
||||
display@2c000 {
|
||||
compatible = "fsl,diu";
|
||||
reg = <0x2c000 100>;
|
||||
interrupts = <72 2>;
|
||||
interrupt-parent = <&mpic>;
|
||||
};
|
|
@ -0,0 +1,127 @@
|
|||
* Freescale 83xx DMA Controller
|
||||
|
||||
Freescale PowerPC 83xx have on chip general purpose DMA controllers.
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible : compatible list, contains 2 entries, first is
|
||||
"fsl,CHIP-dma", where CHIP is the processor
|
||||
(mpc8349, mpc8360, etc.) and the second is
|
||||
"fsl,elo-dma"
|
||||
- reg : <registers mapping for DMA general status reg>
|
||||
- ranges : Should be defined as specified in 1) to describe the
|
||||
DMA controller channels.
|
||||
- cell-index : controller index. 0 for controller @ 0x8100
|
||||
- interrupts : <interrupt mapping for DMA IRQ>
|
||||
- interrupt-parent : optional, if needed for interrupt mapping
|
||||
|
||||
|
||||
- DMA channel nodes:
|
||||
- compatible : compatible list, contains 2 entries, first is
|
||||
"fsl,CHIP-dma-channel", where CHIP is the processor
|
||||
(mpc8349, mpc8350, etc.) and the second is
|
||||
"fsl,elo-dma-channel"
|
||||
- reg : <registers mapping for channel>
|
||||
- cell-index : dma channel index starts at 0.
|
||||
|
||||
Optional properties:
|
||||
- interrupts : <interrupt mapping for DMA channel IRQ>
|
||||
(on 83xx this is expected to be identical to
|
||||
the interrupts property of the parent node)
|
||||
- interrupt-parent : optional, if needed for interrupt mapping
|
||||
|
||||
Example:
|
||||
dma@82a8 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
compatible = "fsl,mpc8349-dma", "fsl,elo-dma";
|
||||
reg = <82a8 4>;
|
||||
ranges = <0 8100 1a4>;
|
||||
interrupt-parent = <&ipic>;
|
||||
interrupts = <47 8>;
|
||||
cell-index = <0>;
|
||||
dma-channel@0 {
|
||||
compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
|
||||
cell-index = <0>;
|
||||
reg = <0 80>;
|
||||
};
|
||||
dma-channel@80 {
|
||||
compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
|
||||
cell-index = <1>;
|
||||
reg = <80 80>;
|
||||
};
|
||||
dma-channel@100 {
|
||||
compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
|
||||
cell-index = <2>;
|
||||
reg = <100 80>;
|
||||
};
|
||||
dma-channel@180 {
|
||||
compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
|
||||
cell-index = <3>;
|
||||
reg = <180 80>;
|
||||
};
|
||||
};
|
||||
|
||||
* Freescale 85xx/86xx DMA Controller
|
||||
|
||||
Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers.
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible : compatible list, contains 2 entries, first is
|
||||
"fsl,CHIP-dma", where CHIP is the processor
|
||||
(mpc8540, mpc8540, etc.) and the second is
|
||||
"fsl,eloplus-dma"
|
||||
- reg : <registers mapping for DMA general status reg>
|
||||
- cell-index : controller index. 0 for controller @ 0x21000,
|
||||
1 for controller @ 0xc000
|
||||
- ranges : Should be defined as specified in 1) to describe the
|
||||
DMA controller channels.
|
||||
|
||||
- DMA channel nodes:
|
||||
- compatible : compatible list, contains 2 entries, first is
|
||||
"fsl,CHIP-dma-channel", where CHIP is the processor
|
||||
(mpc8540, mpc8560, etc.) and the second is
|
||||
"fsl,eloplus-dma-channel"
|
||||
- cell-index : dma channel index starts at 0.
|
||||
- reg : <registers mapping for channel>
|
||||
- interrupts : <interrupt mapping for DMA channel IRQ>
|
||||
- interrupt-parent : optional, if needed for interrupt mapping
|
||||
|
||||
Example:
|
||||
dma@21300 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
compatible = "fsl,mpc8540-dma", "fsl,eloplus-dma";
|
||||
reg = <21300 4>;
|
||||
ranges = <0 21100 200>;
|
||||
cell-index = <0>;
|
||||
dma-channel@0 {
|
||||
compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
|
||||
reg = <0 80>;
|
||||
cell-index = <0>;
|
||||
interrupt-parent = <&mpic>;
|
||||
interrupts = <14 2>;
|
||||
};
|
||||
dma-channel@80 {
|
||||
compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
|
||||
reg = <80 80>;
|
||||
cell-index = <1>;
|
||||
interrupt-parent = <&mpic>;
|
||||
interrupts = <15 2>;
|
||||
};
|
||||
dma-channel@100 {
|
||||
compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
|
||||
reg = <100 80>;
|
||||
cell-index = <2>;
|
||||
interrupt-parent = <&mpic>;
|
||||
interrupts = <16 2>;
|
||||
};
|
||||
dma-channel@180 {
|
||||
compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
|
||||
reg = <180 80>;
|
||||
cell-index = <3>;
|
||||
interrupt-parent = <&mpic>;
|
||||
interrupts = <17 2>;
|
||||
};
|
||||
};
|
|
@ -0,0 +1,31 @@
|
|||
* Freescale General-purpose Timers Module
|
||||
|
||||
Required properties:
|
||||
- compatible : should be
|
||||
"fsl,<chip>-gtm", "fsl,gtm" for SOC GTMs
|
||||
"fsl,<chip>-qe-gtm", "fsl,qe-gtm", "fsl,gtm" for QE GTMs
|
||||
"fsl,<chip>-cpm2-gtm", "fsl,cpm2-gtm", "fsl,gtm" for CPM2 GTMs
|
||||
- reg : should contain gtm registers location and length (0x40).
|
||||
- interrupts : should contain four interrupts.
|
||||
- interrupt-parent : interrupt source phandle.
|
||||
- clock-frequency : specifies the frequency driving the timer.
|
||||
|
||||
Example:
|
||||
|
||||
timer@500 {
|
||||
compatible = "fsl,mpc8360-gtm", "fsl,gtm";
|
||||
reg = <0x500 0x40>;
|
||||
interrupts = <90 8 78 8 84 8 72 8>;
|
||||
interrupt-parent = <&ipic>;
|
||||
/* filled by u-boot */
|
||||
clock-frequency = <0>;
|
||||
};
|
||||
|
||||
timer@440 {
|
||||
compatible = "fsl,mpc8360-qe-gtm", "fsl,qe-gtm", "fsl,gtm";
|
||||
reg = <0x440 0x40>;
|
||||
interrupts = <12 13 14 15>;
|
||||
interrupt-parent = <&qeic>;
|
||||
/* filled by u-boot */
|
||||
clock-frequency = <0>;
|
||||
};
|
|
@ -0,0 +1,25 @@
|
|||
* Global Utilities Block
|
||||
|
||||
The global utilities block controls power management, I/O device
|
||||
enabling, power-on-reset configuration monitoring, general-purpose
|
||||
I/O signal configuration, alternate function selection for multiplexed
|
||||
signals, and clock control.
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible : Should define the compatible device type for
|
||||
global-utilities.
|
||||
- reg : Offset and length of the register set for the device.
|
||||
|
||||
Recommended properties:
|
||||
|
||||
- fsl,has-rstcr : Indicates that the global utilities register set
|
||||
contains a functioning "reset control register" (i.e. the board
|
||||
is wired to reset upon setting the HRESET_REQ bit in this register).
|
||||
|
||||
Example:
|
||||
global-utilities@e0000 { /* global utilities block */
|
||||
compatible = "fsl,mpc8548-guts";
|
||||
reg = <e0000 1000>;
|
||||
fsl,has-rstcr;
|
||||
};
|
|
@ -0,0 +1,32 @@
|
|||
* I2C
|
||||
|
||||
Required properties :
|
||||
|
||||
- device_type : Should be "i2c"
|
||||
- reg : Offset and length of the register set for the device
|
||||
|
||||
Recommended properties :
|
||||
|
||||
- compatible : Should be "fsl-i2c" for parts compatible with
|
||||
Freescale I2C specifications.
|
||||
- interrupts : <a b> where a is the interrupt number and b is a
|
||||
field that represents an encoding of the sense and level
|
||||
information for the interrupt. This should be encoded based on
|
||||
the information in section 2) depending on the type of interrupt
|
||||
controller you have.
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
- dfsrr : boolean; if defined, indicates that this I2C device has
|
||||
a digital filter sampling rate register
|
||||
- fsl5200-clocking : boolean; if defined, indicated that this device
|
||||
uses the FSL 5200 clocking mechanism.
|
||||
|
||||
Example :
|
||||
i2c@3000 {
|
||||
interrupt-parent = <40000>;
|
||||
interrupts = <1b 3>;
|
||||
reg = <3000 18>;
|
||||
device_type = "i2c";
|
||||
compatible = "fsl-i2c";
|
||||
dfsrr;
|
||||
};
|
|
@ -0,0 +1,35 @@
|
|||
* Chipselect/Local Bus
|
||||
|
||||
Properties:
|
||||
- name : Should be localbus
|
||||
- #address-cells : Should be either two or three. The first cell is the
|
||||
chipselect number, and the remaining cells are the
|
||||
offset into the chipselect.
|
||||
- #size-cells : Either one or two, depending on how large each chipselect
|
||||
can be.
|
||||
- ranges : Each range corresponds to a single chipselect, and cover
|
||||
the entire access window as configured.
|
||||
|
||||
Example:
|
||||
localbus@f0010100 {
|
||||
compatible = "fsl,mpc8272-localbus",
|
||||
"fsl,pq2-localbus";
|
||||
#address-cells = <2>;
|
||||
#size-cells = <1>;
|
||||
reg = <f0010100 40>;
|
||||
|
||||
ranges = <0 0 fe000000 02000000
|
||||
1 0 f4500000 00008000>;
|
||||
|
||||
flash@0,0 {
|
||||
compatible = "jedec-flash";
|
||||
reg = <0 0 2000000>;
|
||||
bank-width = <4>;
|
||||
device-width = <1>;
|
||||
};
|
||||
|
||||
board-control@1,0 {
|
||||
reg = <1 0 20>;
|
||||
compatible = "fsl,mpc8272ads-bcsr";
|
||||
};
|
||||
};
|
|
@ -0,0 +1,17 @@
|
|||
Freescale MPC8349E-mITX-compatible Power Management Micro Controller Unit (MCU)
|
||||
|
||||
Required properties:
|
||||
- compatible : "fsl,<mcu-chip>-<board>", "fsl,mcu-mpc8349emitx".
|
||||
- reg : should specify I2C address (0x0a).
|
||||
- #gpio-cells : should be 2.
|
||||
- gpio-controller : should be present.
|
||||
|
||||
Example:
|
||||
|
||||
mcu@0a {
|
||||
#gpio-cells = <2>;
|
||||
compatible = "fsl,mc9s08qg8-mpc8349emitx",
|
||||
"fsl,mcu-mpc8349emitx";
|
||||
reg = <0x0a>;
|
||||
gpio-controller;
|
||||
};
|
|
@ -0,0 +1,36 @@
|
|||
* Freescale MSI interrupt controller
|
||||
|
||||
Reguired properities:
|
||||
- compatible : compatible list, contains 2 entries,
|
||||
first is "fsl,CHIP-msi", where CHIP is the processor(mpc8610, mpc8572,
|
||||
etc.) and the second is "fsl,mpic-msi" or "fsl,ipic-msi" depending on
|
||||
the parent type.
|
||||
- reg : should contain the address and the length of the shared message
|
||||
interrupt register set.
|
||||
- msi-available-ranges: use <start count> style section to define which
|
||||
msi interrupt can be used in the 256 msi interrupts. This property is
|
||||
optional, without this, all the 256 MSI interrupts can be used.
|
||||
- interrupts : each one of the interrupts here is one entry per 32 MSIs,
|
||||
and routed to the host interrupt controller. the interrupts should
|
||||
be set as edge sensitive.
|
||||
- interrupt-parent: the phandle for the interrupt controller
|
||||
that services interrupts for this device. for 83xx cpu, the interrupts
|
||||
are routed to IPIC, and for 85xx/86xx cpu the interrupts are routed
|
||||
to MPIC.
|
||||
|
||||
Example:
|
||||
msi@41600 {
|
||||
compatible = "fsl,mpc8610-msi", "fsl,mpic-msi";
|
||||
reg = <0x41600 0x80>;
|
||||
msi-available-ranges = <0 0x100>;
|
||||
interrupts = <
|
||||
0xe0 0
|
||||
0xe1 0
|
||||
0xe2 0
|
||||
0xe3 0
|
||||
0xe4 0
|
||||
0xe5 0
|
||||
0xe6 0
|
||||
0xe7 0>;
|
||||
interrupt-parent = <&mpic>;
|
||||
};
|
|
@ -0,0 +1,63 @@
|
|||
* Power Management Controller
|
||||
|
||||
Properties:
|
||||
- compatible: "fsl,<chip>-pmc".
|
||||
|
||||
"fsl,mpc8349-pmc" should be listed for any chip whose PMC is
|
||||
compatible. "fsl,mpc8313-pmc" should also be listed for any chip
|
||||
whose PMC is compatible, and implies deep-sleep capability.
|
||||
|
||||
"fsl,mpc8548-pmc" should be listed for any chip whose PMC is
|
||||
compatible. "fsl,mpc8536-pmc" should also be listed for any chip
|
||||
whose PMC is compatible, and implies deep-sleep capability.
|
||||
|
||||
"fsl,mpc8641d-pmc" should be listed for any chip whose PMC is
|
||||
compatible; all statements below that apply to "fsl,mpc8548-pmc" also
|
||||
apply to "fsl,mpc8641d-pmc".
|
||||
|
||||
Compatibility does not include bit assigments in SCCR/PMCDR/DEVDISR; these
|
||||
bit assigments are indicated via the sleep specifier in each device's
|
||||
sleep property.
|
||||
|
||||
- reg: For devices compatible with "fsl,mpc8349-pmc", the first resource
|
||||
is the PMC block, and the second resource is the Clock Configuration
|
||||
block.
|
||||
|
||||
For devices compatible with "fsl,mpc8548-pmc", the first resource
|
||||
is a 32-byte block beginning with DEVDISR.
|
||||
|
||||
- interrupts: For "fsl,mpc8349-pmc"-compatible devices, the first
|
||||
resource is the PMC block interrupt.
|
||||
|
||||
- fsl,mpc8313-wakeup-timer: For "fsl,mpc8313-pmc"-compatible devices,
|
||||
this is a phandle to an "fsl,gtm" node on which timer 4 can be used as
|
||||
a wakeup source from deep sleep.
|
||||
|
||||
Sleep specifiers:
|
||||
|
||||
fsl,mpc8349-pmc: Sleep specifiers consist of one cell. For each bit
|
||||
that is set in the cell, the corresponding bit in SCCR will be saved
|
||||
and cleared on suspend, and restored on resume. This sleep controller
|
||||
supports disabling and resuming devices at any time.
|
||||
|
||||
fsl,mpc8536-pmc: Sleep specifiers consist of three cells, the third of
|
||||
which will be ORed into PMCDR upon suspend, and cleared from PMCDR
|
||||
upon resume. The first two cells are as described for fsl,mpc8578-pmc.
|
||||
This sleep controller only supports disabling devices during system
|
||||
sleep, or permanently.
|
||||
|
||||
fsl,mpc8548-pmc: Sleep specifiers consist of one or two cells, the
|
||||
first of which will be ORed into DEVDISR (and the second into
|
||||
DEVDISR2, if present -- this cell should be zero or absent if the
|
||||
hardware does not have DEVDISR2) upon a request for permanent device
|
||||
disabling. This sleep controller does not support configuring devices
|
||||
to disable during system sleep (unless supported by another compatible
|
||||
match), or dynamically.
|
||||
|
||||
Example:
|
||||
|
||||
power@b00 {
|
||||
compatible = "fsl,mpc8313-pmc", "fsl,mpc8349-pmc";
|
||||
reg = <0xb00 0x100 0xa00 0x100>;
|
||||
interrupts = <80 8>;
|
||||
};
|
|
@ -0,0 +1,29 @@
|
|||
* Freescale 8xxx/3.0 Gb/s SATA nodes
|
||||
|
||||
SATA nodes are defined to describe on-chip Serial ATA controllers.
|
||||
Each SATA port should have its own node.
|
||||
|
||||
Required properties:
|
||||
- compatible : compatible list, contains 2 entries, first is
|
||||
"fsl,CHIP-sata", where CHIP is the processor
|
||||
(mpc8315, mpc8379, etc.) and the second is
|
||||
"fsl,pq-sata"
|
||||
- interrupts : <interrupt mapping for SATA IRQ>
|
||||
- cell-index : controller index.
|
||||
1 for controller @ 0x18000
|
||||
2 for controller @ 0x19000
|
||||
3 for controller @ 0x1a000
|
||||
4 for controller @ 0x1b000
|
||||
|
||||
Optional properties:
|
||||
- interrupt-parent : optional, if needed for interrupt mapping
|
||||
- reg : <registers mapping>
|
||||
|
||||
Example:
|
||||
sata@18000 {
|
||||
compatible = "fsl,mpc8379-sata", "fsl,pq-sata";
|
||||
reg = <0x18000 0x1000>;
|
||||
cell-index = <1>;
|
||||
interrupts = <2c 8>;
|
||||
interrupt-parent = < &ipic >;
|
||||
};
|
|
@ -0,0 +1,68 @@
|
|||
Freescale SoC SEC Security Engines
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible : Should contain entries for this and backward compatible
|
||||
SEC versions, high to low, e.g., "fsl,sec2.1", "fsl,sec2.0"
|
||||
- reg : Offset and length of the register set for the device
|
||||
- interrupts : the SEC's interrupt number
|
||||
- fsl,num-channels : An integer representing the number of channels
|
||||
available.
|
||||
- fsl,channel-fifo-len : An integer representing the number of
|
||||
descriptor pointers each channel fetch fifo can hold.
|
||||
- fsl,exec-units-mask : The bitmask representing what execution units
|
||||
(EUs) are available. It's a single 32-bit cell. EU information
|
||||
should be encoded following the SEC's Descriptor Header Dword
|
||||
EU_SEL0 field documentation, i.e. as follows:
|
||||
|
||||
bit 0 = reserved - should be 0
|
||||
bit 1 = set if SEC has the ARC4 EU (AFEU)
|
||||
bit 2 = set if SEC has the DES/3DES EU (DEU)
|
||||
bit 3 = set if SEC has the message digest EU (MDEU/MDEU-A)
|
||||
bit 4 = set if SEC has the random number generator EU (RNG)
|
||||
bit 5 = set if SEC has the public key EU (PKEU)
|
||||
bit 6 = set if SEC has the AES EU (AESU)
|
||||
bit 7 = set if SEC has the Kasumi EU (KEU)
|
||||
bit 8 = set if SEC has the CRC EU (CRCU)
|
||||
bit 11 = set if SEC has the message digest EU extended alg set (MDEU-B)
|
||||
|
||||
remaining bits are reserved for future SEC EUs.
|
||||
|
||||
- fsl,descriptor-types-mask : The bitmask representing what descriptors
|
||||
are available. It's a single 32-bit cell. Descriptor type information
|
||||
should be encoded following the SEC's Descriptor Header Dword DESC_TYPE
|
||||
field documentation, i.e. as follows:
|
||||
|
||||
bit 0 = set if SEC supports the aesu_ctr_nonsnoop desc. type
|
||||
bit 1 = set if SEC supports the ipsec_esp descriptor type
|
||||
bit 2 = set if SEC supports the common_nonsnoop desc. type
|
||||
bit 3 = set if SEC supports the 802.11i AES ccmp desc. type
|
||||
bit 4 = set if SEC supports the hmac_snoop_no_afeu desc. type
|
||||
bit 5 = set if SEC supports the srtp descriptor type
|
||||
bit 6 = set if SEC supports the non_hmac_snoop_no_afeu desc.type
|
||||
bit 7 = set if SEC supports the pkeu_assemble descriptor type
|
||||
bit 8 = set if SEC supports the aesu_key_expand_output desc.type
|
||||
bit 9 = set if SEC supports the pkeu_ptmul descriptor type
|
||||
bit 10 = set if SEC supports the common_nonsnoop_afeu desc. type
|
||||
bit 11 = set if SEC supports the pkeu_ptadd_dbl descriptor type
|
||||
|
||||
..and so on and so forth.
|
||||
|
||||
Optional properties:
|
||||
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
|
||||
Example:
|
||||
|
||||
/* MPC8548E */
|
||||
crypto@30000 {
|
||||
compatible = "fsl,sec2.1", "fsl,sec2.0";
|
||||
reg = <0x30000 0x10000>;
|
||||
interrupts = <29 2>;
|
||||
interrupt-parent = <&mpic>;
|
||||
fsl,num-channels = <4>;
|
||||
fsl,channel-fifo-len = <24>;
|
||||
fsl,exec-units-mask = <0xfe>;
|
||||
fsl,descriptor-types-mask = <0x12b0ebf>;
|
||||
};
|
|
@ -0,0 +1,24 @@
|
|||
* SPI (Serial Peripheral Interface)
|
||||
|
||||
Required properties:
|
||||
- cell-index : SPI controller index.
|
||||
- compatible : should be "fsl,spi".
|
||||
- mode : the SPI operation mode, it can be "cpu" or "cpu-qe".
|
||||
- reg : Offset and length of the register set for the device
|
||||
- interrupts : <a b> where a is the interrupt number and b is a
|
||||
field that represents an encoding of the sense and level
|
||||
information for the interrupt. This should be encoded based on
|
||||
the information in section 2) depending on the type of interrupt
|
||||
controller you have.
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
|
||||
Example:
|
||||
spi@4c0 {
|
||||
cell-index = <0>;
|
||||
compatible = "fsl,spi";
|
||||
reg = <4c0 40>;
|
||||
interrupts = <82 0>;
|
||||
interrupt-parent = <700>;
|
||||
mode = "cpu";
|
||||
};
|
|
@ -0,0 +1,38 @@
|
|||
Freescale Synchronous Serial Interface
|
||||
|
||||
The SSI is a serial device that communicates with audio codecs. It can
|
||||
be programmed in AC97, I2S, left-justified, or right-justified modes.
|
||||
|
||||
Required properties:
|
||||
- compatible : compatible list, containing "fsl,ssi"
|
||||
- cell-index : the SSI, <0> = SSI1, <1> = SSI2, and so on
|
||||
- reg : offset and length of the register set for the device
|
||||
- interrupts : <a b> where a is the interrupt number and b is a
|
||||
field that represents an encoding of the sense and
|
||||
level information for the interrupt. This should be
|
||||
encoded based on the information in section 2)
|
||||
depending on the type of interrupt controller you
|
||||
have.
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
- fsl,mode : the operating mode for the SSI interface
|
||||
"i2s-slave" - I2S mode, SSI is clock slave
|
||||
"i2s-master" - I2S mode, SSI is clock master
|
||||
"lj-slave" - left-justified mode, SSI is clock slave
|
||||
"lj-master" - l.j. mode, SSI is clock master
|
||||
"rj-slave" - right-justified mode, SSI is clock slave
|
||||
"rj-master" - r.j., SSI is clock master
|
||||
"ac97-slave" - AC97 mode, SSI is clock slave
|
||||
"ac97-master" - AC97 mode, SSI is clock master
|
||||
|
||||
Optional properties:
|
||||
- codec-handle : phandle to a 'codec' node that defines an audio
|
||||
codec connected to this SSI. This node is typically
|
||||
a child of an I2C or other control node.
|
||||
|
||||
Child 'codec' node required properties:
|
||||
- compatible : compatible list, contains the name of the codec
|
||||
|
||||
Child 'codec' node optional properties:
|
||||
- clock-frequency : The frequency of the input clock, which typically
|
||||
comes from an on-board dedicated oscillator.
|
|
@ -0,0 +1,62 @@
|
|||
* MDIO IO device
|
||||
|
||||
The MDIO is a bus to which the PHY devices are connected. For each
|
||||
device that exists on this bus, a child node should be created. See
|
||||
the definition of the PHY node below for an example of how to define
|
||||
a PHY.
|
||||
|
||||
Required properties:
|
||||
- reg : Offset and length of the register set for the device
|
||||
- compatible : Should define the compatible device type for the
|
||||
mdio. Currently, this is most likely to be "fsl,gianfar-mdio"
|
||||
|
||||
Example:
|
||||
|
||||
mdio@24520 {
|
||||
reg = <24520 20>;
|
||||
compatible = "fsl,gianfar-mdio";
|
||||
|
||||
ethernet-phy@0 {
|
||||
......
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
* Gianfar-compatible ethernet nodes
|
||||
|
||||
Properties:
|
||||
|
||||
- device_type : Should be "network"
|
||||
- model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
|
||||
- compatible : Should be "gianfar"
|
||||
- reg : Offset and length of the register set for the device
|
||||
- local-mac-address : List of bytes representing the ethernet address of
|
||||
this controller
|
||||
- interrupts : For FEC devices, the first interrupt is the device's
|
||||
interrupt. For TSEC and eTSEC devices, the first interrupt is
|
||||
transmit, the second is receive, and the third is error.
|
||||
- phy-handle : The phandle for the PHY connected to this ethernet
|
||||
controller.
|
||||
- fixed-link : <a b c d e> where a is emulated phy id - choose any,
|
||||
but unique to the all specified fixed-links, b is duplex - 0 half,
|
||||
1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
|
||||
pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
|
||||
- phy-connection-type : a string naming the controller/PHY interface type,
|
||||
i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id", "sgmii",
|
||||
"tbi", or "rtbi". This property is only really needed if the connection
|
||||
is of type "rgmii-id", as all other connection types are detected by
|
||||
hardware.
|
||||
- fsl,magic-packet : If present, indicates that the hardware supports
|
||||
waking up via magic packet.
|
||||
|
||||
Example:
|
||||
ethernet@24000 {
|
||||
device_type = "network";
|
||||
model = "TSEC";
|
||||
compatible = "gianfar";
|
||||
reg = <0x24000 0x1000>;
|
||||
local-mac-address = [ 00 E0 0C 00 73 00 ];
|
||||
interrupts = <29 2 30 2 34 2>;
|
||||
interrupt-parent = <&mpic>;
|
||||
phy-handle = <&phy0>
|
||||
};
|
|
@ -0,0 +1,28 @@
|
|||
Freescale Localbus UPM programmed to work with NAND flash
|
||||
|
||||
Required properties:
|
||||
- compatible : "fsl,upm-nand".
|
||||
- reg : should specify localbus chip select and size used for the chip.
|
||||
- fsl,upm-addr-offset : UPM pattern offset for the address latch.
|
||||
- fsl,upm-cmd-offset : UPM pattern offset for the command latch.
|
||||
- gpios : may specify optional GPIO connected to the Ready-Not-Busy pin.
|
||||
|
||||
Example:
|
||||
|
||||
upm@1,0 {
|
||||
compatible = "fsl,upm-nand";
|
||||
reg = <1 0 1>;
|
||||
fsl,upm-addr-offset = <16>;
|
||||
fsl,upm-cmd-offset = <8>;
|
||||
gpios = <&qe_pio_e 18 0>;
|
||||
|
||||
flash {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
compatible = "...";
|
||||
|
||||
partition@0 {
|
||||
...
|
||||
};
|
||||
};
|
||||
};
|
|
@ -0,0 +1,59 @@
|
|||
Freescale SOC USB controllers
|
||||
|
||||
The device node for a USB controller that is part of a Freescale
|
||||
SOC is as described in the document "Open Firmware Recommended
|
||||
Practice : Universal Serial Bus" with the following modifications
|
||||
and additions :
|
||||
|
||||
Required properties :
|
||||
- compatible : Should be "fsl-usb2-mph" for multi port host USB
|
||||
controllers, or "fsl-usb2-dr" for dual role USB controllers
|
||||
- phy_type : For multi port host USB controllers, should be one of
|
||||
"ulpi", or "serial". For dual role USB controllers, should be
|
||||
one of "ulpi", "utmi", "utmi_wide", or "serial".
|
||||
- reg : Offset and length of the register set for the device
|
||||
- port0 : boolean; if defined, indicates port0 is connected for
|
||||
fsl-usb2-mph compatible controllers. Either this property or
|
||||
"port1" (or both) must be defined for "fsl-usb2-mph" compatible
|
||||
controllers.
|
||||
- port1 : boolean; if defined, indicates port1 is connected for
|
||||
fsl-usb2-mph compatible controllers. Either this property or
|
||||
"port0" (or both) must be defined for "fsl-usb2-mph" compatible
|
||||
controllers.
|
||||
- dr_mode : indicates the working mode for "fsl-usb2-dr" compatible
|
||||
controllers. Can be "host", "peripheral", or "otg". Default to
|
||||
"host" if not defined for backward compatibility.
|
||||
|
||||
Recommended properties :
|
||||
- interrupts : <a b> where a is the interrupt number and b is a
|
||||
field that represents an encoding of the sense and level
|
||||
information for the interrupt. This should be encoded based on
|
||||
the information in section 2) depending on the type of interrupt
|
||||
controller you have.
|
||||
- interrupt-parent : the phandle for the interrupt controller that
|
||||
services interrupts for this device.
|
||||
|
||||
Example multi port host USB controller device node :
|
||||
usb@22000 {
|
||||
compatible = "fsl-usb2-mph";
|
||||
reg = <22000 1000>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
interrupt-parent = <700>;
|
||||
interrupts = <27 1>;
|
||||
phy_type = "ulpi";
|
||||
port0;
|
||||
port1;
|
||||
};
|
||||
|
||||
Example dual role USB controller device node :
|
||||
usb@23000 {
|
||||
compatible = "fsl-usb2-dr";
|
||||
reg = <23000 1000>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
interrupt-parent = <700>;
|
||||
interrupts = <26 1>;
|
||||
dr_mode = "otg";
|
||||
phy = "ulpi";
|
||||
};
|
|
@ -0,0 +1,15 @@
|
|||
LED connected to GPIO
|
||||
|
||||
Required properties:
|
||||
- compatible : should be "gpio-led".
|
||||
- label : (optional) the label for this LED. If omitted, the label is
|
||||
taken from the node name (excluding the unit address).
|
||||
- gpios : should specify LED GPIO.
|
||||
|
||||
Example:
|
||||
|
||||
led@0 {
|
||||
compatible = "gpio-led";
|
||||
label = "hdd";
|
||||
gpios = <&mcu_pio 0 1>;
|
||||
};
|
|
@ -1,89 +1,528 @@
|
|||
rfkill - RF switch subsystem support
|
||||
====================================
|
||||
|
||||
1 Implementation details
|
||||
2 Driver support
|
||||
3 Userspace support
|
||||
1 Introduction
|
||||
2 Implementation details
|
||||
3 Kernel driver guidelines
|
||||
3.1 wireless device drivers
|
||||
3.2 platform/switch drivers
|
||||
3.3 input device drivers
|
||||
4 Kernel API
|
||||
5 Userspace support
|
||||
|
||||
|
||||
1. Introduction:
|
||||
|
||||
The rfkill switch subsystem exists to add a generic interface to circuitry that
|
||||
can enable or disable the signal output of a wireless *transmitter* of any
|
||||
type. By far, the most common use is to disable radio-frequency transmitters.
|
||||
|
||||
Note that disabling the signal output means that the the transmitter is to be
|
||||
made to not emit any energy when "blocked". rfkill is not about blocking data
|
||||
transmissions, it is about blocking energy emission.
|
||||
|
||||
The rfkill subsystem offers support for keys and switches often found on
|
||||
laptops to enable wireless devices like WiFi and Bluetooth, so that these keys
|
||||
and switches actually perform an action in all wireless devices of a given type
|
||||
attached to the system.
|
||||
|
||||
The buttons to enable and disable the wireless transmitters are important in
|
||||
situations where the user is for example using his laptop on a location where
|
||||
radio-frequency transmitters _must_ be disabled (e.g. airplanes).
|
||||
|
||||
Because of this requirement, userspace support for the keys should not be made
|
||||
mandatory. Because userspace might want to perform some additional smarter
|
||||
tasks when the key is pressed, rfkill provides userspace the possibility to
|
||||
take over the task to handle the key events.
|
||||
|
||||
===============================================================================
|
||||
1: Implementation details
|
||||
2: Implementation details
|
||||
|
||||
The rfkill switch subsystem offers support for keys often found on laptops
|
||||
to enable wireless devices like WiFi and Bluetooth.
|
||||
The rfkill subsystem is composed of various components: the rfkill class, the
|
||||
rfkill-input module (an input layer handler), and some specific input layer
|
||||
events.
|
||||
|
||||
This is done by providing the user 3 possibilities:
|
||||
1 - The rfkill system handles all events; userspace is not aware of events.
|
||||
2 - The rfkill system handles all events; userspace is informed about the events.
|
||||
3 - The rfkill system does not handle events; userspace handles all events.
|
||||
The rfkill class provides kernel drivers with an interface that allows them to
|
||||
know when they should enable or disable a wireless network device transmitter.
|
||||
This is enabled by the CONFIG_RFKILL Kconfig option.
|
||||
|
||||
The buttons to enable and disable the wireless radios are important in
|
||||
situations where the user is for example using his laptop on a location where
|
||||
wireless radios _must_ be disabled (e.g. airplanes).
|
||||
Because of this requirement, userspace support for the keys should not be
|
||||
made mandatory. Because userspace might want to perform some additional smarter
|
||||
tasks when the key is pressed, rfkill still provides userspace the possibility
|
||||
to take over the task to handle the key events.
|
||||
The rfkill class support makes sure userspace will be notified of all state
|
||||
changes on rfkill devices through uevents. It provides a notification chain
|
||||
for interested parties in the kernel to also get notified of rfkill state
|
||||
changes in other drivers. It creates several sysfs entries which can be used
|
||||
by userspace. See section "Userspace support".
|
||||
|
||||
The system inside the kernel has been split into 2 separate sections:
|
||||
1 - RFKILL
|
||||
2 - RFKILL_INPUT
|
||||
The rfkill-input module provides the kernel with the ability to implement a
|
||||
basic response when the user presses a key or button (or toggles a switch)
|
||||
related to rfkill functionality. It is an in-kernel implementation of default
|
||||
policy of reacting to rfkill-related input events and neither mandatory nor
|
||||
required for wireless drivers to operate. It is enabled by the
|
||||
CONFIG_RFKILL_INPUT Kconfig option.
|
||||
|
||||
The first option enables rfkill support and will make sure userspace will
|
||||
be notified of any events through the input device. It also creates several
|
||||
sysfs entries which can be used by userspace. See section "Userspace support".
|
||||
rfkill-input is a rfkill-related events input layer handler. This handler will
|
||||
listen to all rfkill key events and will change the rfkill state of the
|
||||
wireless devices accordingly. With this option enabled userspace could either
|
||||
do nothing or simply perform monitoring tasks.
|
||||
|
||||
The second option provides an rfkill input handler. This handler will
|
||||
listen to all rfkill key events and will toggle the radio accordingly.
|
||||
With this option enabled userspace could either do nothing or simply
|
||||
perform monitoring tasks.
|
||||
The rfkill-input module also provides EPO (emergency power-off) functionality
|
||||
for all wireless transmitters. This function cannot be overridden, and it is
|
||||
always active. rfkill EPO is related to *_RFKILL_ALL input layer events.
|
||||
|
||||
|
||||
Important terms for the rfkill subsystem:
|
||||
|
||||
In order to avoid confusion, we avoid the term "switch" in rfkill when it is
|
||||
referring to an electronic control circuit that enables or disables a
|
||||
transmitter. We reserve it for the physical device a human manipulates
|
||||
(which is an input device, by the way):
|
||||
|
||||
rfkill switch:
|
||||
|
||||
A physical device a human manipulates. Its state can be perceived by
|
||||
the kernel either directly (through a GPIO pin, ACPI GPE) or by its
|
||||
effect on a rfkill line of a wireless device.
|
||||
|
||||
rfkill controller:
|
||||
|
||||
A hardware circuit that controls the state of a rfkill line, which a
|
||||
kernel driver can interact with *to modify* that state (i.e. it has
|
||||
either write-only or read/write access).
|
||||
|
||||
rfkill line:
|
||||
|
||||
An input channel (hardware or software) of a wireless device, which
|
||||
causes a wireless transmitter to stop emitting energy (BLOCK) when it
|
||||
is active. Point of view is extremely important here: rfkill lines are
|
||||
always seen from the PoV of a wireless device (and its driver).
|
||||
|
||||
soft rfkill line/software rfkill line:
|
||||
|
||||
A rfkill line the wireless device driver can directly change the state
|
||||
of. Related to rfkill_state RFKILL_STATE_SOFT_BLOCKED.
|
||||
|
||||
hard rfkill line/hardware rfkill line:
|
||||
|
||||
A rfkill line that works fully in hardware or firmware, and that cannot
|
||||
be overridden by the kernel driver. The hardware device or the
|
||||
firmware just exports its status to the driver, but it is read-only.
|
||||
Related to rfkill_state RFKILL_STATE_HARD_BLOCKED.
|
||||
|
||||
The enum rfkill_state describes the rfkill state of a transmitter:
|
||||
|
||||
When a rfkill line or rfkill controller is in the RFKILL_STATE_UNBLOCKED state,
|
||||
the wireless transmitter (radio TX circuit for example) is *enabled*. When the
|
||||
it is in the RFKILL_STATE_SOFT_BLOCKED or RFKILL_STATE_HARD_BLOCKED, the
|
||||
wireless transmitter is to be *blocked* from operating.
|
||||
|
||||
RFKILL_STATE_SOFT_BLOCKED indicates that a call to toggle_radio() can change
|
||||
that state. RFKILL_STATE_HARD_BLOCKED indicates that a call to toggle_radio()
|
||||
will not be able to change the state and will return with a suitable error if
|
||||
attempts are made to set the state to RFKILL_STATE_UNBLOCKED.
|
||||
|
||||
RFKILL_STATE_HARD_BLOCKED is used by drivers to signal that the device is
|
||||
locked in the BLOCKED state by a hardwire rfkill line (typically an input pin
|
||||
that, when active, forces the transmitter to be disabled) which the driver
|
||||
CANNOT override.
|
||||
|
||||
Full rfkill functionality requires two different subsystems to cooperate: the
|
||||
input layer and the rfkill class. The input layer issues *commands* to the
|
||||
entire system requesting that devices registered to the rfkill class change
|
||||
state. The way this interaction happens is not complex, but it is not obvious
|
||||
either:
|
||||
|
||||
Kernel Input layer:
|
||||
|
||||
* Generates KEY_WWAN, KEY_WLAN, KEY_BLUETOOTH, SW_RFKILL_ALL, and
|
||||
other such events when the user presses certain keys, buttons, or
|
||||
toggles certain physical switches.
|
||||
|
||||
THE INPUT LAYER IS NEVER USED TO PROPAGATE STATUS, NOTIFICATIONS OR THE
|
||||
KIND OF STUFF AN ON-SCREEN-DISPLAY APPLICATION WOULD REPORT. It is
|
||||
used to issue *commands* for the system to change behaviour, and these
|
||||
commands may or may not be carried out by some kernel driver or
|
||||
userspace application. It follows that doing user feedback based only
|
||||
on input events is broken, as there is no guarantee that an input event
|
||||
will be acted upon.
|
||||
|
||||
Most wireless communication device drivers implementing rfkill
|
||||
functionality MUST NOT generate these events, and have no reason to
|
||||
register themselves with the input layer. Doing otherwise is a common
|
||||
misconception. There is an API to propagate rfkill status change
|
||||
information, and it is NOT the input layer.
|
||||
|
||||
rfkill class:
|
||||
|
||||
* Calls a hook in a driver to effectively change the wireless
|
||||
transmitter state;
|
||||
* Keeps track of the wireless transmitter state (with help from
|
||||
the driver);
|
||||
* Generates userspace notifications (uevents) and a call to a
|
||||
notification chain (kernel) when there is a wireless transmitter
|
||||
state change;
|
||||
* Connects a wireless communications driver with the common rfkill
|
||||
control system, which, for example, allows actions such as
|
||||
"switch all bluetooth devices offline" to be carried out by
|
||||
userspace or by rfkill-input.
|
||||
|
||||
THE RFKILL CLASS NEVER ISSUES INPUT EVENTS. THE RFKILL CLASS DOES
|
||||
NOT LISTEN TO INPUT EVENTS. NO DRIVER USING THE RFKILL CLASS SHALL
|
||||
EVER LISTEN TO, OR ACT ON RFKILL INPUT EVENTS. Doing otherwise is
|
||||
a layering violation.
|
||||
|
||||
Most wireless data communication drivers in the kernel have just to
|
||||
implement the rfkill class API to work properly. Interfacing to the
|
||||
input layer is not often required (and is very often a *bug*) on
|
||||
wireless drivers.
|
||||
|
||||
Platform drivers often have to attach to the input layer to *issue*
|
||||
(but never to listen to) rfkill events for rfkill switches, and also to
|
||||
the rfkill class to export a control interface for the platform rfkill
|
||||
controllers to the rfkill subsystem. This does NOT mean the rfkill
|
||||
switch is attached to a rfkill class (doing so is almost always wrong).
|
||||
It just means the same kernel module is the driver for different
|
||||
devices (rfkill switches and rfkill controllers).
|
||||
|
||||
|
||||
Userspace input handlers (uevents) or kernel input handlers (rfkill-input):
|
||||
|
||||
* Implements the policy of what should happen when one of the input
|
||||
layer events related to rfkill operation is received.
|
||||
* Uses the sysfs interface (userspace) or private rfkill API calls
|
||||
to tell the devices registered with the rfkill class to change
|
||||
their state (i.e. translates the input layer event into real
|
||||
action).
|
||||
* rfkill-input implements EPO by handling EV_SW SW_RFKILL_ALL 0
|
||||
(power off all transmitters) in a special way: it ignores any
|
||||
overrides and local state cache and forces all transmitters to the
|
||||
RFKILL_STATE_SOFT_BLOCKED state (including those which are already
|
||||
supposed to be BLOCKED). Note that the opposite event (power on all
|
||||
transmitters) is handled normally.
|
||||
|
||||
Userspace uevent handler or kernel platform-specific drivers hooked to the
|
||||
rfkill notifier chain:
|
||||
|
||||
* Taps into the rfkill notifier chain or to KOBJ_CHANGE uevents,
|
||||
in order to know when a device that is registered with the rfkill
|
||||
class changes state;
|
||||
* Issues feedback notifications to the user;
|
||||
* In the rare platforms where this is required, synthesizes an input
|
||||
event to command all *OTHER* rfkill devices to also change their
|
||||
statues when a specific rfkill device changes state.
|
||||
|
||||
|
||||
===============================================================================
|
||||
3: Kernel driver guidelines
|
||||
|
||||
Remember: point-of-view is everything for a driver that connects to the rfkill
|
||||
subsystem. All the details below must be measured/perceived from the point of
|
||||
view of the specific driver being modified.
|
||||
|
||||
The first thing one needs to know is whether his driver should be talking to
|
||||
the rfkill class or to the input layer. In rare cases (platform drivers), it
|
||||
could happen that you need to do both, as platform drivers often handle a
|
||||
variety of devices in the same driver.
|
||||
|
||||
Do not mistake input devices for rfkill controllers. The only type of "rfkill
|
||||
switch" device that is to be registered with the rfkill class are those
|
||||
directly controlling the circuits that cause a wireless transmitter to stop
|
||||
working (or the software equivalent of them), i.e. what we call a rfkill
|
||||
controller. Every other kind of "rfkill switch" is just an input device and
|
||||
MUST NOT be registered with the rfkill class.
|
||||
|
||||
A driver should register a device with the rfkill class when ALL of the
|
||||
following conditions are met (they define a rfkill controller):
|
||||
|
||||
1. The device is/controls a data communications wireless transmitter;
|
||||
|
||||
2. The kernel can interact with the hardware/firmware to CHANGE the wireless
|
||||
transmitter state (block/unblock TX operation);
|
||||
|
||||
3. The transmitter can be made to not emit any energy when "blocked":
|
||||
rfkill is not about blocking data transmissions, it is about blocking
|
||||
energy emission;
|
||||
|
||||
A driver should register a device with the input subsystem to issue
|
||||
rfkill-related events (KEY_WLAN, KEY_BLUETOOTH, KEY_WWAN, KEY_WIMAX,
|
||||
SW_RFKILL_ALL, etc) when ALL of the folowing conditions are met:
|
||||
|
||||
1. It is directly related to some physical device the user interacts with, to
|
||||
command the O.S./firmware/hardware to enable/disable a data communications
|
||||
wireless transmitter.
|
||||
|
||||
Examples of the physical device are: buttons, keys and switches the user
|
||||
will press/touch/slide/switch to enable or disable the wireless
|
||||
communication device.
|
||||
|
||||
2. It is NOT slaved to another device, i.e. there is no other device that
|
||||
issues rfkill-related input events in preference to this one.
|
||||
|
||||
Please refer to the corner cases and examples section for more details.
|
||||
|
||||
When in doubt, do not issue input events. For drivers that should generate
|
||||
input events in some platforms, but not in others (e.g. b43), the best solution
|
||||
is to NEVER generate input events in the first place. That work should be
|
||||
deferred to a platform-specific kernel module (which will know when to generate
|
||||
events through the rfkill notifier chain) or to userspace. This avoids the
|
||||
usual maintenance problems with DMI whitelisting.
|
||||
|
||||
|
||||
Corner cases and examples:
|
||||
====================================
|
||||
2: Driver support
|
||||
|
||||
To build a driver with rfkill subsystem support, the driver should
|
||||
depend on the Kconfig symbol RFKILL; it should _not_ depend on
|
||||
RKFILL_INPUT.
|
||||
1. If the device is an input device that, because of hardware or firmware,
|
||||
causes wireless transmitters to be blocked regardless of the kernel's will, it
|
||||
is still just an input device, and NOT to be registered with the rfkill class.
|
||||
|
||||
Unless key events trigger an interrupt to which the driver listens, polling
|
||||
will be required to determine the key state changes. For this the input
|
||||
layer providers the input-polldev handler.
|
||||
2. If the wireless transmitter switch control is read-only, it is an input
|
||||
device and not to be registered with the rfkill class (and maybe not to be made
|
||||
an input layer event source either, see below).
|
||||
|
||||
A driver should implement a few steps to correctly make use of the
|
||||
rfkill subsystem. First for non-polling drivers:
|
||||
3. If there is some other device driver *closer* to the actual hardware the
|
||||
user interacted with (the button/switch/key) to issue an input event, THAT is
|
||||
the device driver that should be issuing input events.
|
||||
|
||||
- rfkill_allocate()
|
||||
- input_allocate_device()
|
||||
- rfkill_register()
|
||||
- input_register_device()
|
||||
E.g:
|
||||
[RFKILL slider switch] -- [GPIO hardware] -- [WLAN card rf-kill input]
|
||||
(platform driver) (wireless card driver)
|
||||
|
||||
For polling drivers:
|
||||
The user is closer to the RFKILL slide switch plaform driver, so the driver
|
||||
which must issue input events is the platform driver looking at the GPIO
|
||||
hardware, and NEVER the wireless card driver (which is just a slave). It is
|
||||
very likely that there are other leaves than just the WLAN card rf-kill input
|
||||
(e.g. a bluetooth card, etc)...
|
||||
|
||||
- rfkill_allocate()
|
||||
- input_allocate_polled_device()
|
||||
- rfkill_register()
|
||||
- input_register_polled_device()
|
||||
On the other hand, some embedded devices do this:
|
||||
|
||||
When a key event has been detected, the correct event should be
|
||||
sent over the input device which has been registered by the driver.
|
||||
[RFKILL slider switch] -- [WLAN card rf-kill input]
|
||||
(wireless card driver)
|
||||
|
||||
In this situation, the wireless card driver *could* register itself as an input
|
||||
device and issue rf-kill related input events... but in order to AVOID the need
|
||||
for DMI whitelisting, the wireless card driver does NOT do it. Userspace (HAL)
|
||||
or a platform driver (that exists only on these embedded devices) will do the
|
||||
dirty job of issuing the input events.
|
||||
|
||||
|
||||
COMMON MISTAKES in kernel drivers, related to rfkill:
|
||||
====================================
|
||||
3: Userspace support
|
||||
|
||||
For each key an input device will be created which will send out the correct
|
||||
key event when the rfkill key has been pressed.
|
||||
1. NEVER confuse input device keys and buttons with input device switches.
|
||||
|
||||
1a. Switches are always set or reset. They report the current state
|
||||
(on position or off position).
|
||||
|
||||
1b. Keys and buttons are either in the pressed or not-pressed state, and
|
||||
that's it. A "button" that latches down when you press it, and
|
||||
unlatches when you press it again is in fact a switch as far as input
|
||||
devices go.
|
||||
|
||||
Add the SW_* events you need for switches, do NOT try to emulate a button using
|
||||
KEY_* events just because there is no such SW_* event yet. Do NOT try to use,
|
||||
for example, KEY_BLUETOOTH when you should be using SW_BLUETOOTH instead.
|
||||
|
||||
2. Input device switches (sources of EV_SW events) DO store their current state
|
||||
(so you *must* initialize it by issuing a gratuitous input layer event on
|
||||
driver start-up and also when resuming from sleep), and that state CAN be
|
||||
queried from userspace through IOCTLs. There is no sysfs interface for this,
|
||||
but that doesn't mean you should break things trying to hook it to the rfkill
|
||||
class to get a sysfs interface :-)
|
||||
|
||||
3. Do not issue *_RFKILL_ALL events by default, unless you are sure it is the
|
||||
correct event for your switch/button. These events are emergency power-off
|
||||
events when they are trying to turn the transmitters off. An example of an
|
||||
input device which SHOULD generate *_RFKILL_ALL events is the wireless-kill
|
||||
switch in a laptop which is NOT a hotkey, but a real switch that kills radios
|
||||
in hardware, even if the O.S. has gone to lunch. An example of an input device
|
||||
which SHOULD NOT generate *_RFKILL_ALL events by default, is any sort of hot
|
||||
key that does nothing by itself, as well as any hot key that is type-specific
|
||||
(e.g. the one for WLAN).
|
||||
|
||||
|
||||
3.1 Guidelines for wireless device drivers
|
||||
------------------------------------------
|
||||
|
||||
1. Each independent transmitter in a wireless device (usually there is only one
|
||||
transmitter per device) should have a SINGLE rfkill class attached to it.
|
||||
|
||||
2. If the device does not have any sort of hardware assistance to allow the
|
||||
driver to rfkill the device, the driver should emulate it by taking all actions
|
||||
required to silence the transmitter.
|
||||
|
||||
3. If it is impossible to silence the transmitter (i.e. it still emits energy,
|
||||
even if it is just in brief pulses, when there is no data to transmit and there
|
||||
is no hardware support to turn it off) do NOT lie to the users. Do not attach
|
||||
it to a rfkill class. The rfkill subsystem does not deal with data
|
||||
transmission, it deals with energy emission. If the transmitter is emitting
|
||||
energy, it is not blocked in rfkill terms.
|
||||
|
||||
4. It doesn't matter if the device has multiple rfkill input lines affecting
|
||||
the same transmitter, their combined state is to be exported as a single state
|
||||
per transmitter (see rule 1).
|
||||
|
||||
This rule exists because users of the rfkill subsystem expect to get (and set,
|
||||
when possible) the overall transmitter rfkill state, not of a particular rfkill
|
||||
line.
|
||||
|
||||
Example of a WLAN wireless driver connected to the rfkill subsystem:
|
||||
--------------------------------------------------------------------
|
||||
|
||||
A certain WLAN card has one input pin that causes it to block the transmitter
|
||||
and makes the status of that input pin available (only for reading!) to the
|
||||
kernel driver. This is a hard rfkill input line (it cannot be overridden by
|
||||
the kernel driver).
|
||||
|
||||
The card also has one PCI register that, if manipulated by the driver, causes
|
||||
it to block the transmitter. This is a soft rfkill input line.
|
||||
|
||||
It has also a thermal protection circuitry that shuts down its transmitter if
|
||||
the card overheats, and makes the status of that protection available (only for
|
||||
reading!) to the kernel driver. This is also a hard rfkill input line.
|
||||
|
||||
If either one of these rfkill lines are active, the transmitter is blocked by
|
||||
the hardware and forced offline.
|
||||
|
||||
The driver should allocate and attach to its struct device *ONE* instance of
|
||||
the rfkill class (there is only one transmitter).
|
||||
|
||||
It can implement the get_state() hook, and return RFKILL_STATE_HARD_BLOCKED if
|
||||
either one of its two hard rfkill input lines are active. If the two hard
|
||||
rfkill lines are inactive, it must return RFKILL_STATE_SOFT_BLOCKED if its soft
|
||||
rfkill input line is active. Only if none of the rfkill input lines are
|
||||
active, will it return RFKILL_STATE_UNBLOCKED.
|
||||
|
||||
If it doesn't implement the get_state() hook, it must make sure that its calls
|
||||
to rfkill_force_state() are enough to keep the status always up-to-date, and it
|
||||
must do a rfkill_force_state() on resume from sleep.
|
||||
|
||||
Every time the driver gets a notification from the card that one of its rfkill
|
||||
lines changed state (polling might be needed on badly designed cards that don't
|
||||
generate interrupts for such events), it recomputes the rfkill state as per
|
||||
above, and calls rfkill_force_state() to update it.
|
||||
|
||||
The driver should implement the toggle_radio() hook, that:
|
||||
|
||||
1. Returns an error if one of the hardware rfkill lines are active, and the
|
||||
caller asked for RFKILL_STATE_UNBLOCKED.
|
||||
|
||||
2. Activates the soft rfkill line if the caller asked for state
|
||||
RFKILL_STATE_SOFT_BLOCKED. It should do this even if one of the hard rfkill
|
||||
lines are active, effectively double-blocking the transmitter.
|
||||
|
||||
3. Deactivates the soft rfkill line if none of the hardware rfkill lines are
|
||||
active and the caller asked for RFKILL_STATE_UNBLOCKED.
|
||||
|
||||
===============================================================================
|
||||
4: Kernel API
|
||||
|
||||
To build a driver with rfkill subsystem support, the driver should depend on
|
||||
(or select) the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT.
|
||||
|
||||
The hardware the driver talks to may be write-only (where the current state
|
||||
of the hardware is unknown), or read-write (where the hardware can be queried
|
||||
about its current state).
|
||||
|
||||
The rfkill class will call the get_state hook of a device every time it needs
|
||||
to know the *real* current state of the hardware. This can happen often.
|
||||
|
||||
Some hardware provides events when its status changes. In these cases, it is
|
||||
best for the driver to not provide a get_state hook, and instead register the
|
||||
rfkill class *already* with the correct status, and keep it updated using
|
||||
rfkill_force_state() when it gets an event from the hardware.
|
||||
|
||||
There is no provision for a statically-allocated rfkill struct. You must
|
||||
use rfkill_allocate() to allocate one.
|
||||
|
||||
You should:
|
||||
- rfkill_allocate()
|
||||
- modify rfkill fields (flags, name)
|
||||
- modify state to the current hardware state (THIS IS THE ONLY TIME
|
||||
YOU CAN ACCESS state DIRECTLY)
|
||||
- rfkill_register()
|
||||
|
||||
The only way to set a device to the RFKILL_STATE_HARD_BLOCKED state is through
|
||||
a suitable return of get_state() or through rfkill_force_state().
|
||||
|
||||
When a device is in the RFKILL_STATE_HARD_BLOCKED state, the only way to switch
|
||||
it to a different state is through a suitable return of get_state() or through
|
||||
rfkill_force_state().
|
||||
|
||||
If toggle_radio() is called to set a device to state RFKILL_STATE_SOFT_BLOCKED
|
||||
when that device is already at the RFKILL_STATE_HARD_BLOCKED state, it should
|
||||
not return an error. Instead, it should try to double-block the transmitter,
|
||||
so that its state will change from RFKILL_STATE_HARD_BLOCKED to
|
||||
RFKILL_STATE_SOFT_BLOCKED should the hardware blocking cease.
|
||||
|
||||
Please refer to the source for more documentation.
|
||||
|
||||
===============================================================================
|
||||
5: Userspace support
|
||||
|
||||
rfkill devices issue uevents (with an action of "change"), with the following
|
||||
environment variables set:
|
||||
|
||||
RFKILL_NAME
|
||||
RFKILL_STATE
|
||||
RFKILL_TYPE
|
||||
|
||||
The ABI for these variables is defined by the sysfs attributes. It is best
|
||||
to take a quick look at the source to make sure of the possible values.
|
||||
|
||||
It is expected that HAL will trap those, and bridge them to DBUS, etc. These
|
||||
events CAN and SHOULD be used to give feedback to the user about the rfkill
|
||||
status of the system.
|
||||
|
||||
Input devices may issue events that are related to rfkill. These are the
|
||||
various KEY_* events and SW_* events supported by rfkill-input.c.
|
||||
|
||||
******IMPORTANT******
|
||||
When rfkill-input is ACTIVE, userspace is NOT TO CHANGE THE STATE OF AN RFKILL
|
||||
SWITCH IN RESPONSE TO AN INPUT EVENT also handled by rfkill-input, unless it
|
||||
has set to true the user_claim attribute for that particular switch. This rule
|
||||
is *absolute*; do NOT violate it.
|
||||
******IMPORTANT******
|
||||
|
||||
Userspace must not assume it is the only source of control for rfkill switches.
|
||||
Their state CAN and WILL change due to firmware actions, direct user actions,
|
||||
and the rfkill-input EPO override for *_RFKILL_ALL.
|
||||
|
||||
When rfkill-input is not active, userspace must initiate a rfkill status
|
||||
change by writing to the "state" attribute in order for anything to happen.
|
||||
|
||||
Take particular care to implement EV_SW SW_RFKILL_ALL properly. When that
|
||||
switch is set to OFF, *every* rfkill device *MUST* be immediately put into the
|
||||
RFKILL_STATE_SOFT_BLOCKED state, no questions asked.
|
||||
|
||||
The following sysfs entries will be created:
|
||||
|
||||
name: Name assigned by driver to this key (interface or driver name).
|
||||
type: Name of the key type ("wlan", "bluetooth", etc).
|
||||
state: Current state of the key. 1: On, 0: Off.
|
||||
state: Current state of the transmitter
|
||||
0: RFKILL_STATE_SOFT_BLOCKED
|
||||
transmitter is forced off, but one can override it
|
||||
by a write to the state attribute;
|
||||
1: RFKILL_STATE_UNBLOCKED
|
||||
transmiter is NOT forced off, and may operate if
|
||||
all other conditions for such operation are met
|
||||
(such as interface is up and configured, etc);
|
||||
2: RFKILL_STATE_HARD_BLOCKED
|
||||
transmitter is forced off by something outside of
|
||||
the driver's control. One cannot set a device to
|
||||
this state through writes to the state attribute;
|
||||
claim: 1: Userspace handles events, 0: Kernel handles events
|
||||
|
||||
Both the "state" and "claim" entries are also writable. For the "state" entry
|
||||
this means that when 1 or 0 is written all radios, not yet in the requested
|
||||
state, will be will be toggled accordingly.
|
||||
this means that when 1 or 0 is written, the device rfkill state (if not yet in
|
||||
the requested state), will be will be toggled accordingly.
|
||||
|
||||
For the "claim" entry writing 1 to it means that the kernel no longer handles
|
||||
key events even though RFKILL_INPUT input was enabled. When "claim" has been
|
||||
set to 0, userspace should make sure that it listens for the input events or
|
||||
check the sysfs "state" entry regularly to correctly perform the required
|
||||
tasks when the rkfill key is pressed.
|
||||
check the sysfs "state" entry regularly to correctly perform the required tasks
|
||||
when the rkfill key is pressed.
|
||||
|
||||
A note about input devices and EV_SW events:
|
||||
|
||||
In order to know the current state of an input device switch (like
|
||||
SW_RFKILL_ALL), you will need to use an IOCTL. That information is not
|
||||
available through sysfs in a generic way at this time, and it is not available
|
||||
through the rfkill class AT ALL.
|
||||
|
|
|
@ -61,10 +61,7 @@ builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
|
|||
arch_init_sched_domains function. This function will attach domains to all
|
||||
CPUs using cpu_attach_domain.
|
||||
|
||||
Implementors should change the line
|
||||
#undef SCHED_DOMAIN_DEBUG
|
||||
to
|
||||
#define SCHED_DOMAIN_DEBUG
|
||||
in kernel/sched.c as this enables an error checking parse of the sched domains
|
||||
The sched-domains debugging infrastructure can be enabled by enabling
|
||||
CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
|
||||
which should catch most possible errors (described above). It also prints out
|
||||
the domain structure in a visual format.
|
||||
|
|
|
@ -51,9 +51,9 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
|
|||
0.00015s. So this group can be scheduled with a period of 0.005s and a run time
|
||||
of 0.00015s.
|
||||
|
||||
The remaining CPU time will be used for user input and other tass. Because
|
||||
The remaining CPU time will be used for user input and other tasks. Because
|
||||
realtime tasks have explicitly allocated the CPU time they need to perform
|
||||
their tasks, buffer underruns in the graphocs or audio can be eliminated.
|
||||
their tasks, buffer underruns in the graphics or audio can be eliminated.
|
||||
|
||||
NOTE: the above example is not fully implemented as of yet (2.6.25). We still
|
||||
lack an EDF scheduler to make non-uniform periods usable.
|
||||
|
|
Некоторые файлы не были показаны из-за слишком большого количества измененных файлов Показать больше
Загрузка…
Ссылка в новой задаче