Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux vm.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
cpu is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communication. Schemes
for this range from replicating kernel text and read-only data
across nodes to housing all the data structures that key
components of the kernel need in memory on the node those
components run on.

Currently, all the NUMA support does is provide efficient handling
of widely discontiguous physical memory, so architectures which
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.

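As a rough illustration of what "widely discontiguous physical memory"
means for lookups, here is a minimal user-space sketch. It is not the
kernel's code: the struct, the table contents and the function name
sketch_pfn_to_node() are all made up for this example. It shows a page
frame number being mapped to its owning node, with frames that fall in
a hole reported as belonging to no node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one contiguous chunk of physical memory. */
struct sketch_mem_range {
    unsigned long start_pfn;  /* first page frame number in the chunk */
    unsigned long nr_pages;   /* number of page frames in the chunk */
    int node_id;              /* node that owns the chunk */
};

/* Made-up layout: two chunks separated by a large hole. */
static const struct sketch_mem_range sketch_ranges[] = {
    { 0x00000UL, 0x8000UL, 0 },
    { 0x40000UL, 0x8000UL, 1 },
};

#define SKETCH_NR_RANGES (sizeof(sketch_ranges) / sizeof(sketch_ranges[0]))

/* Return the node owning a page frame, or -1 if the frame is in a hole. */
int sketch_pfn_to_node(unsigned long pfn)
{
    size_t i;

    for (i = 0; i < SKETCH_NR_RANGES; i++) {
        const struct sketch_mem_range *r = &sketch_ranges[i];

        assert(r->nr_pages > 0);  /* every chunk must be non-empty */
        if (pfn >= r->start_pfn && pfn - r->start_pfn < r->nr_pages)
            return r->node_id;
    }
    return -1;
}
```

The same lookup works whether the two chunks sit on different NUMA
nodes or are simply two banks on a non-NUMA machine with a hole
between them, which is why both kinds of platform can share the code.
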
The initial port includes NUMAizing the bootmem allocator code by
encapsulating all the pieces of information into a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform which uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.

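The idea of keeping all allocator state in one per-node structure and
passing that structure to node-specific calls can be sketched in user
space as follows. This is only an illustration under simplifying
assumptions: struct sketch_bootmem and both functions are hypothetical
names, the layout does not match the kernel's bootmem_data_t, and a
plain bump pointer is used in place of the real allocator's
bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one node's boot-time allocator state. */
struct sketch_bootmem {
    unsigned char *base;  /* start of this node's boot memory */
    size_t size;          /* bytes available on this node */
    size_t used;          /* bytes handed out so far */
};

/* Attach a memory region to a node's allocator state. */
void sketch_bootmem_init(struct sketch_bootmem *b, void *base, size_t size)
{
    assert(b != NULL && base != NULL);
    b->base = base;
    b->size = size;
    b->used = 0;
}

/*
 * Allocate from one specific node. The offset within the node's region
 * is rounded up to 'align', which must be a power of two. Returns NULL
 * when the node cannot satisfy the request.
 */
void *sketch_alloc_bootmem_node(struct sketch_bootmem *b, size_t size,
                                size_t align)
{
    size_t start;

    assert(b != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    start = (b->used + align - 1) & ~(align - 1);
    if (start > b->size || size > b->size - start)
        return NULL;

    b->used = start + size;
    return b->base + start;
}
```

Because every call names the node's state explicitly, a platform is
free to decide which node's memory holds a given boot-time structure.
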
Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables like low_on_memory, nr_free_pages etc. by moving them
into the pg_data_t.

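The uniformity argument can be illustrated with a small user-space
sketch. Everything here is hypothetical (struct sketch_pgdat, the
SKETCH_NUMA switch and the sketch_* functions are invented for the
example and are not the kernel's definitions); the point is only that
code written as a loop over nodes compiles and behaves sensibly on a
UMA build, where there is exactly one statically allocated descriptor.

```c
#include <assert.h>

/* Hypothetical per-node descriptor, loosely in the spirit of pg_data_t. */
struct sketch_pgdat {
    int node_id;
    unsigned long free_pages;
};

#ifdef SKETCH_NUMA
#define SKETCH_MAX_NODES 4
static struct sketch_pgdat sketch_pgdat_list[SKETCH_MAX_NODES];
#else
/* UMA: a single static descriptor, playing the role of contig_page_data. */
#define SKETCH_MAX_NODES 1
static struct sketch_pgdat sketch_contig_page_data;
#endif

/* Defined for both kinds of build so callers never need an #ifdef. */
int sketch_num_online_nodes(void)
{
    return SKETCH_MAX_NODES;
}

/* Look up the descriptor for a node id. */
struct sketch_pgdat *sketch_node_data(int nid)
{
    assert(nid >= 0 && nid < sketch_num_online_nodes());
#ifdef SKETCH_NUMA
    return &sketch_pgdat_list[nid];
#else
    return &sketch_contig_page_data;
#endif
}

/* Generic code: sums a per-node counter without knowing the platform. */
unsigned long sketch_total_free_pages(void)
{
    unsigned long total = 0;
    int nid;

    for (nid = 0; nid < sketch_num_online_nodes(); nid++)
        total += sketch_node_data(nid)->free_pages;
    return total;
}
```

A counter such as free_pages living inside the descriptor is the kind
of "NUMAized" variable the paragraph above has in mind.
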
The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round robin manner. This will be changed to
do concentric circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node() has been
added, so that drivers can make the call and not worry about whether
they are running on a NUMA or UMA platform.

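To make the difference between the two node selection policies
concrete, here is a minimal user-space sketch. It is not the kernel's
allocator: the node count, the distance table, the free page counters
and both sketch_pick_* functions are assumptions made up for the
example. The "concentric circle" policy is modelled simply as "pick
the closest node that still has a free page".

```c
#include <assert.h>

#define SKETCH_NR_NODES 4  /* hypothetical node count */

/* Made-up relative access costs; 10 means local memory. */
static const int sketch_distance[SKETCH_NR_NODES][SKETCH_NR_NODES] = {
    { 10, 20, 30, 40 },
    { 20, 10, 20, 30 },
    { 30, 20, 10, 20 },
    { 40, 30, 20, 10 },
};

/* Free pages left on each node (hypothetical bookkeeping). */
unsigned long sketch_free_pages[SKETCH_NR_NODES];

/*
 * Round robin: start at the node after the one used last time and take
 * a page from the first node that has one. Returns the node id, or -1
 * if every node is empty.
 */
int sketch_pick_round_robin(void)
{
    static int next;
    int tried;

    for (tried = 0; tried < SKETCH_NR_NODES; tried++) {
        int nid = (next + tried) % SKETCH_NR_NODES;

        if (sketch_free_pages[nid] > 0) {
            sketch_free_pages[nid]--;
            next = (nid + 1) % SKETCH_NR_NODES;
            return nid;
        }
    }
    return -1;
}

/*
 * Nearest first: take a page from the node with free memory that is
 * closest to 'home'. Returns the node id, or -1 if every node is empty.
 */
int sketch_pick_nearest(int home)
{
    int nid, best = -1;

    assert(home >= 0 && home < SKETCH_NR_NODES);
    for (nid = 0; nid < SKETCH_NR_NODES; nid++) {
        if (sketch_free_pages[nid] == 0)
            continue;
        if (best < 0 ||
            sketch_distance[home][nid] < sketch_distance[home][best])
            best = nid;
    }
    if (best >= 0)
        sketch_free_pages[best]--;
    return best;
}
```

Round robin spreads allocations evenly but ignores where the caller
runs; the nearest-first variant keeps allocations local and only falls
back to more distant nodes as closer ones run out.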