How are p-values calculated, and why are they approximate?
A raw functional association score is a ratio of the between, background, within, and baseline measures of average functional relationship for two gene sets. The larger this ratio, the more related the two sets are, but quantifying “how much more” depends on the sizes of the two gene sets and on the current biological context. To analyze these scores in a more principled manner, we convert them to p-values by comparing them to bootstrapped null distributions, generated by calculating scores for thousands of random gene sets over a wide range of sizes in every biological context. This yields a distribution of expected scores (per size of the two gene sets, per context) that is approximately normal with mean one, and comparing the score for a “real” gene set to this background distribution yields a p-value. However, the exact variance of the distribution depends on the sizes of the gene sets being analyzed and on the current context, and we can’t randomly generate thousands of gene