Why does using OpenMP help?
OpenMP allows an application to exploit parallelism using threads rather than processes. The threads run on the CPU cores of a single node and share one address space. As a result, the memory duplication that MPI often requires (for example, replicated data or ghost-cell copies held by every process) is no longer needed: the parallelism is expressed within a single process, and threads exchange data through shared memory rather than through explicit messages.

A second benefit comes from running fewer MPI processes. Each process then sends larger (and typically fewer) messages, which amortizes the fixed per-message overhead, and fewer tasks participate in each collective operation, which makes collectives cheaper. Both effects tend to improve communication performance.
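As an illustration of the shared-memory point, the minimal C sketch below has several threads work on one array that lives in a single process. No thread receives its own copy of the data and no messages are sent; per-thread partial sums are combined by OpenMP's `reduction` clause. The function names `shared_sum`, `shared_scale` and `available_threads` are hypothetical, chosen only for this example. It assumes compilation with OpenMP enabled (e.g. GCC's `-fopenmp`); without it, the pragmas are ignored and the code runs serially with the same results.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: sum an array that all threads read directly
   from the process's shared memory. The iteration space is split
   among threads; each thread keeps a private partial sum, and the
   reduction clause combines them at the end. No copies, no messages. */
double shared_sum(const double *data, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += data[i];
    }
    return total;
}

/* Hypothetical helper: scale every element in place. Threads write to
   disjoint indices of the same shared buffer, so no synchronization
   beyond the implicit barrier at the end of the loop is needed. */
void shared_scale(double *data, size_t n, double factor)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        data[i] *= factor;
    }
}

/* Number of threads the OpenMP runtime would use for a parallel
   region; returns 1 when the code is compiled without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

In a pure-MPI version of the same computation, each rank would instead own a separate slice of the array and the partial sums would have to be combined with a collective such as `MPI_Allreduce`. In a hybrid design, that collective runs across fewer ranks, while the threads inside each rank share the rank's memory.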