Prime number storage

When I was in college, my friend Mike and I had an idea for a method of compressing prime numbers. The basic idea was to use a bitvector, where each bit was set to a 1 if the index of that element was prime. Otherwise it was set to 0.

Of course this can be greatly improved upon using the some of the ideas behind the Sieve of Eratosthenes. The sieve starts with a list of numbers (that starts at 2), and then goes through the list, removing each element that is a multiple of 2. It then goes on to the next number in the list, which is 3 and repeats. At each iteration, the next number in the list is guaranteed to be prime, so after \sqrt(N) iterations all the numbers left in the list from 2 to N will be prime.

This is a great way to compute the primes, but using a complete sieve for storing primes in a bitvector doesn’t make any sense. However we can do a few iterations to remove a significant number of non-prime numbers from the bitvector. We decided upon removing all of the multiples of 2, 3, 5 and 7, and starting our bitvector at 11 instead of 2.

However once the bitvector was filled in, we had to figure out how to calculate the real index of a given number. For that we turned to some basic set theory. We basically had four sets that we wanted to remove: the set of numbers divisible by 2, the set of numbers divisible by 3, divisible by 5 and divisible by 7. However there is overlap between all of these sets. To figure out how many numbers we need to remove, we have to take that into account. We knew that the magnitude of the union of four sets was:

||A \cup B \cup C \cup D|| = ||A|| + ||B|| + ||C|| + ||D|| - ||A \cap B|| - ||A \cap C|| - ||A \cap D|| - ||B \cap C|| - ||B \cap D|| - ||C \cap D|| + ||A \cap B \cap C|| + ||A \cap B \cap D|| + ||A \cap C \cap D|| + ||B \cap C \cap D|| - ||A \cap B \cap C \cap D||

A, B, C, D represent the four sets (2, 3, 5, 7) and the intersection of sets can be represented by multiples of the products of those numbers. In other words, || A \cap B|| is the set of multiples of 6 (2 times 3). The easiest way of thinking about it is like this: you remove all multiples of two. That includes the number six. You now remove all multiples of three. That also includes the number six. You have effectively removed the number 6 twice, so you need to add it back in once.

Once we had worked out all the math, we decided to search online for a large list of prime numbers so we wouldn’t need to generate our own. We found a website that contained a suitable list, along with an explanation of how they stored that list. It turns out they used the same exact method we did.

We never ended up implementing our method primarily due to the knowledge that it was already in use. However recently I decided to implement it in C, just for fun. Here it is:

Download the source code