We tried using OpenMP many years ago on clusters of SMPs but found the payoff to be minimal or negative. What is different now?
First, we reiterate that the declining memory available per core is a key difference: it constrains our choices far more than it did in the past. Although many of the issues are the same as they were then, there are several key architectural differences between a Hopper node and the SMPs of years past. For example, the ratio of intra-socket to inter-node MPI latency and bandwidth on a machine like Hopper is some 10-100 times higher than the corresponding intra-processor to inter-processor ratio was on older SMPs. An MPI-only code simply cannot exploit as effectively the vastly improved latencies and bandwidths available within a chip multiprocessor such as Hopper's Magny-Cours. See below for a more detailed look at OpenMP benefits and drawbacks. To review: MPI + OpenMP may not be faster than pure MPI, but it will almost certainly use less memory.
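The memory argument can be illustrated with a rough back-of-envelope sketch (all numbers below are illustrative assumptions, not Hopper measurements, and the function name is ours): with fewer MPI ranks per node, each rank owns a larger subdomain, so the total volume of replicated ghost/halo cells on the node shrinks, even though the problem size per node is unchanged.

```python
def ghost_memory_per_node(ranks_per_node, cells_per_node=192**3,
                          ghost_width=1, bytes_per_cell=8):
    """Rough ghost-cell memory (bytes) per node for a cubic domain
    split into roughly cubic per-rank subdomains. Illustrative only."""
    # Approximate side length of each rank's cubic subdomain, in cells
    # (rounded, since real decompositions are rarely perfect cubes).
    n = round((cells_per_node / ranks_per_node) ** (1 / 3))
    g = ghost_width
    # Ghost shell around an n^3 interior: (n + 2g)^3 - n^3 extra cells.
    ghost_cells = (n + 2 * g) ** 3 - n ** 3
    return ranks_per_node * ghost_cells * bytes_per_cell

# Pure MPI: one rank per core on a hypothetical 24-core node.
pure_mpi = ghost_memory_per_node(24)    # ~5.3 MB of ghost cells
# Hybrid: 4 MPI ranks per node, 6 OpenMP threads each.
hybrid = ghost_memory_per_node(4)       # ~2.9 MB of ghost cells
print(pure_mpi, hybrid)
```

The hybrid layout carries roughly half the replicated halo data in this toy setup, and the same argument applies to per-rank MPI buffers and any per-rank copies of lookup tables or global metadata.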