How was the 5% tolerance decided?
Benchmark timings cannot be reproduced exactly, even between consecutive runs of the same binary under the same system settings. The reproducibility tolerance is meant to cover acceptable run-to-run variability; degradation beyond it likely indicates a deficiency in the hardware or software. Because MPI-parallel applications consist of asynchronous processes that communicate with one another, they show higher run-to-run variability than serial benchmarks do. SPEC/HPG studied benchmark runtimes on a variety of systems used in the MPI2007 internal acceptance tests and chose 5% as a tolerance that such systems can be expected to meet. As MPI2007 comes into wider use and is measured on a larger set of machines, the SPEC/HPG committee may widen the tolerance if that proves necessary. If you are concerned about the stability of your measured results, you can improve it by running more iterations of the benchmark suite. Statistically, both the median mea