Finding Out and Visualising Hardware Locality
Although most commonly used machines (those with Intel or AMD processors) present a common programming model, the way they are actually configured can vary hugely, and the optimal way of operating them varies accordingly. In particular, a machine can have multiple sockets, each with multiple cores, caches, I/O devices, and so on. The way these are connected also varies, so it is necessary to know the hardware locality, i.e., which processing cores can access which resources efficiently.
For people who use many machines, even remembering the hardware locality of each one can be hard, and in any case one often needs to make choices based on this information at run time. A useful tool for both of these tasks is hwloc. The simplest way of using this package is through the lstopo tool, which produces a graphical summary of the configuration and hardware locality of the machine on which it is run.
As an example, here is the output on my X220 laptop:
And here is example lstopo output on a rather better configured machine:
The hwloc library also provides facilities for discovering these topologies at run time from within your own program.
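hwloc's own programming interface is a C library, which is not shown here. As a rough sketch of the run-time idea, a script can instead ask lstopo for its XML export and count the objects it reports. This assumes that `lstopo --of xml` (or the text-only `lstopo-no-graphics` build) writes XML to standard output and that each topology object appears as an `<object type="...">` element. The helper names `count_objects` and `local_topology_xml` and the `SAMPLE` string are made up for illustration, and the sample is hand-written rather than real lstopo output.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def count_objects(xml_text):
    """Count topology objects by their 'type' attribute (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return Counter(obj.get("type") for obj in root.iter("object"))


def local_topology_xml():
    """Return lstopo's XML export for this machine, or None if unavailable."""
    # Prefer the text-only build; fall back to plain lstopo if present.
    exe = shutil.which("lstopo-no-graphics") or shutil.which("lstopo")
    if exe is None:
        return None
    try:
        # Assumption: with --of xml and no file name, XML goes to stdout.
        result = subprocess.run(
            [exe, "--of", "xml"], capture_output=True, text=True
        )
    except OSError:
        return None
    return result.stdout if result.returncode == 0 else None


# Hand-written sample for illustration only; NOT real lstopo output.
SAMPLE = """
<topology>
  <object type="Machine">
    <object type="Package">
      <object type="Core">
        <object type="PU" os_index="0"/>
        <object type="PU" os_index="1"/>
      </object>
      <object type="Core">
        <object type="PU" os_index="2"/>
        <object type="PU" os_index="3"/>
      </object>
    </object>
  </object>
</topology>
"""

if __name__ == "__main__":
    # Use the real machine if lstopo is installed, else the sample above.
    xml_text = local_topology_xml() or SAMPLE
    for kind, n in sorted(count_objects(xml_text).items()):
        print(kind, n)
```

Counting by type is only the simplest thing one can do with the export; the nesting of the elements is what carries the locality information, so a program that needs to know which cores share a cache or a socket would walk the tree rather than flatten it.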