Does Open MPI support end-to-end data reliability in MPI message passing?
The current release of Open MPI does not support end-to-end data reliability in message passing beyond what the underlying network already guarantees. Future releases of Open MPI will include explicit data reliability support (i.e., more functionality than is provided by the underlying network).

Specifically, the data reliability (“dr”) PML component (available on the trunk, but not yet in a stable release) assumes that the underlying network is unreliable. It can drop and restart connections, retransmit corrupted or lost data, etc. The end effect is that data sent through MPI API functions will be guaranteed to be reliable.

For example, if you’re using TCP as a message transport, chances of data corruption are fairly low. However, other interconnects do not guarantee that data will be uncorrupted when traveling across the network. Additionally, there is a nonzero possibility that data can be corrupted while traversing PCI buses, etc. (some corruption errors at this level can be caugh