How should I pick m (the table size) for the Bloom filter?
You should do this exactly the same way as it is done for the provided open addressing bucket mapping. I’ll now describe how that is done and the reason for it. For this genomics application, an estimated value (really an upper bound) for the number of elements to insert is provided to the constructor, along with a desired load (passed as an argument or taken from a default value). Let capacity be the estimated value for n (i.e., the desired capacity for which you are designing the hash table). Recall that the actual load is alpha = n/m (when d = 0), and solving for m yields m = n/alpha. So setting m = capacity/load, where load is the target load, gives the table size at which the actual load matches the target load when n = capacity. The open addressing bucket mapping sets m to be the smallest power of 2 that is at least as big as capacity/load. Since this rounds m up, the actual load when n = capacity is at most the target load.
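
The sizing rule above can be sketched in a few lines of Python. This is only an illustration: the function name table_size and its parameter names are hypothetical and are not taken from the provided open addressing code, which may compute the same value differently.

```python
import math


def table_size(capacity: int, load: float) -> int:
    """Return the smallest power of 2 that is at least capacity / load.

    Hypothetical helper: `capacity` is the estimated (upper bound) number
    of elements n, and `load` is the target load alpha.
    """
    if capacity <= 0 or load <= 0:
        raise ValueError("capacity and load must both be positive")

    # m = n / alpha, rounded up to a whole number of slots.
    target = math.ceil(capacity / load)

    # Round up to the next power of 2 (or keep target if it already is one).
    m = 1
    while m < target:
        m *= 2
    return m
```

For example, with capacity = 1000 and load = 0.5 the target is 2000 slots, so the function returns 2048, and the actual load when n = 1000 is 1000/2048, which is just under the target of 0.5.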