HDK Technical Reference

Cache coherent NUMA localization

The ccNUMA kernel is designed to optimize performance on machines with non-uniform memory access. Such machines contain multiple CPU groups, each with local memory, interconnected by a coherent memory bus. The optimization is achieved by keeping data local to the CPU groups that access it. When data is read-only, a stronger form of localization, called replication, can be used. The ccNUMA kernel implements a process placement policy that balances load across the CPU groups, maximizing the effectiveness of localization.

A ccNUMA system reduces cost by using standard high-volume server components connected by a reasonably priced coherent bus. The result is a system with the processing power of a small supercomputer, but at a much lower price, and with more scalability than can be achieved in an SMP system, where all memory access is randomly placed.

The SVR5 kernel includes features to support ccNUMA functionality, including replicated kernel text, replicated kernel page tables, localized page pools, localized page reservation pools, localized KMA pools, localized anon slot pools, localized run queues, and fork/exec time process placement. These features have been designed to support up to 4 CPU groups, providing for up to 16 processors and up to 16 gigabytes of memory.

The libcg library contains routines that allow applications to optimize for the ccNUMA environment.

DDI Version 8 contains the functionality required to make drivers work in a ccNUMA configuration.

No ccNUMA hardware is yet available. SCO is working with hardware vendors to ensure that ccNUMA is supported in the SVR5 kernel when hardware is available.

© 2005 The SCO Group, Inc. All rights reserved.
OpenServer 6 and UnixWare (SVR5) HDK - June 2005