# Segmented Sieve of Eratosthenes

A classic programming problem is to find all the primes up to a certain number. This problem admits a classic solution, the sieve of Eratosthenes. Here it is in Java.

/**
* @param upper bound exclusive
* @return a list of primes strictly less than upper
*/
public static Deque<Integer> findPrimes(int upper) {
Deque<Integer> primes = new ArrayDeque<Integer>();
if (upper <= 2) return primes;
boolean[] isPrime = new boolean[(upper-2)/2]; // index 0 is 3
Arrays.fill(isPrime, true);
for (int p = 3; p < upper; p += 2) {
if (isPrime[p/2 - 1]) {
// only need to start from p^2 since we already checked p*m, where m < p
for (long q = ((long) p)*((long) p); q < upper; q += 2*p) {
isPrime[((int) q)/2 - 1] = false;
}
}
}
return primes;
}


The problem is with the isPrime array. This solution is $O(n)$ in space and computation. We may only be interested in finding large primes from say 999,900,000 to 1,000,000,000 as in this problem PRIME1. It doesn't make sense to check numbers less than 999,900,000 or allocate space for them.

Hence, we use a segmented sieve. The underlying idea is that to check if a number $P$ is prime by trial division, we only need to check that it is not divisible by any prime numbers $q \leq \sqrt{P}$. Thus, if we want to find all the primes between $m$ and $n$, we first generate all the primes that are less than or equal to $\sqrt{n}$ with the traditional sieve. Let $S$ be the set of those primes.

Then, let $L$ be some constant number. We work in segments $[m, m + L)$, $[m + L, m + 2L)$, $\ldots$, $[m + kL, n + 1)$. In each of these segments, we identify of all the multiples of the primes found in $S$ and mark them as not prime. Now, we only need $O(\max(|S|, L))$ space, and computation is $$O\left(|S| \cdot \frac{n-m}{L} + (n-m)\right),$$ and we can set $L$ to be as large or small as we want.

By the prime number theorem, $|S|$ is not typically very large. Asympototically, $$|S| = \pi(\sqrt{n}) \sim \frac{\sqrt{n}}{\log \sqrt{n}}.$$ For $L$, we have a tradeoff. If we have large $L$, we may need a lot of space. If we have $L$ too small, our sieve is very small and may not contain many multiples of the primes in $S$, which results in wasted computation. Here is the code with some tweaks to avoid even numbers.

/**
* Find primes in range
* @param lower bound, inclusive
* @param upper bound exclusive
* @param sieveSize space to use
* @return list of primes in range
*/
public static Deque<Integer> findPrimes(int lower, int upper, int sieveSize) {
if (lower >= upper) throw new IllegalArgumentException("lower must be less than upper");
int sievingPrimesUpper = (int) Math.sqrt(upper);
if (lower <= sievingPrimesUpper || sievingPrimesUpper <= 2) {
Deque<Integer> primes = findPrimes(upper);
if (!primes.isEmpty()) while (primes.peekFirst() < lower) primes.removeFirst();
return primes;
}
if (sieveSize < 5) sieveSize = 10;
Deque<Integer> primes = new ArrayDeque<Integer>();
Deque<Integer> sievingPrimes = findPrimes(sievingPrimesUpper + 1);
sievingPrimes.removeFirst(); // get rid of 2
while (!sievingPrimes.isEmpty() &&
sievingPrimes.getLast()*sievingPrimes.getLast() >= upper) sievingPrimes.removeLast();
if (lower % 2 == 0) lower += 1; // make lower odd
boolean[] isPrime = new boolean[sieveSize]; // isPrime[i] refers to lower + 2*i
/**
* Find first odd multiple for each sieving prime. lower + 2*nextMultipleOffset[i]
* will be the first odd multiple of sievingPrimes[i] that is greater than or
* equal to lower.
*/
int[] nextMultipleOffset = new int[sievingPrimes.size()];
int idx = 0;
for (int p : sievingPrimes) {
int nextMultiple = lower - (lower % p); // make it a multiple of p
if (nextMultiple < lower)  nextMultiple += p; // make it at least lower
if (nextMultiple % 2 == 0) nextMultiple += p; // make sure it's odd
nextMultipleOffset[idx++] = (nextMultiple - lower)/2;
}
while (lower < upper) {
Arrays.fill(isPrime, true);
idx = 0;
for (int p : sievingPrimes) {
int offset = nextMultipleOffset[idx];
for (int j = offset; j < sieveSize; j += p) isPrime[j] = false;
/**
* We want (lower + 2*sieveSize + 2*(nextMultipleOffset[idx] + k)) % p == 0
* and (lower + 2*sieveSize + 2*(nextMultipleOffset[idx] + k)) % 2 == 1,
* where k is the correction term. Second equation is always true.
* First reduces to 2*(sieveSize + k) % p == 0 ==> (sieveSize + k) % p == 0
* since 2 must be invertible in the field F_p. Thus, we have that
* k % p = (-sieveSize) % p. Then, we make sure that the offset is nonnegative.
*/
nextMultipleOffset[idx] = (nextMultipleOffset[idx] - sieveSize) % p;
if (nextMultipleOffset[idx] < 0) nextMultipleOffset[idx] += p;
++idx;
}
for (int i = 0; i < sieveSize; ++i) {
int newPrime = lower + i*2;
if (newPrime >= upper) break;