
How should I pick m (the table size) for the Bloom filter?

April 26, 2017 | bloom filter pick size Table

1 Answer

You should do this exactly the same way it is done for the provided open addressing bucket mapping. I'll describe how that is done and the reason for it.

For this genomics application, an estimated value (really an upper bound) for the number of elements to insert is provided to the constructor, along with a desired load (passed as an argument or taken from a default value). Let capacity be the estimated value for n, i.e., the desired capacity for which you are designing the hash table. Recall that the actual load is alpha = n/m (when d = 0); solving for m yields m = n/alpha. So setting m = capacity/load, where load is the target load, gives the table size at which the actual load matches the target load when n = capacity. The open addressing bucket mapping then sets m to the smallest power of 2 that is at least as big as capacity/load. Because m is rounded up, the actual load at n = capacity is at most the target load.
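As a rough illustration of the sizing rule, here is a minimal Python sketch. The function name `pick_table_size` and the default load of 0.5 are assumptions made for this example; they are not taken from the provided open addressing code, whose actual constructor signature and default may differ.

```python
import math


def pick_table_size(capacity: int, load: float = 0.5) -> int:
    """Return the smallest power of 2 that is at least capacity / load.

    capacity: estimated upper bound on n, the number of elements to insert.
    load: target load alpha (the 0.5 default is a hypothetical choice).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if load <= 0:
        raise ValueError("load must be positive")

    # m = n / alpha, rounded up to a whole number of table slots.
    target = math.ceil(capacity / load)

    # Round up to a power of 2 by doubling until the target is reached.
    m = 1
    while m < target:
        m *= 2
    return m


if __name__ == "__main__":
    capacity, load = 1000, 0.5
    m = pick_table_size(capacity, load)
    print(m)             # 2048
    print(capacity / m)  # 0.48828125, which does not exceed the target load
```

With capacity = 1000 and a target load of 0.5, capacity/load is 2000, so m becomes 2048 and the actual load at n = capacity is slightly below the target.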