The Sieve of Eratosthenes - Finding Primes Like a Pro

The Sieve of Eratosthenes: An Ancient Algorithm for Modern Problems

Finding prime numbers is a fundamental problem in computer science and mathematics, with applications ranging from cryptography to advanced number theory. While trial division is the most straightforward method to check if a single number is prime, it becomes incredibly inefficient when you need to find all primes up to a certain limit.

Enter the Sieve of Eratosthenes. Dating back to ancient Greece (around 200 BC), this algorithm is surprisingly efficient and elegant for generating lists of prime numbers up to a specified integer. It’s a classic for a reason, and understanding it is a rite of passage for any developer delving into algorithms.

What is the Sieve of Eratosthenes?

At its core, the Sieve of Eratosthenes is an algorithm for finding all prime numbers up to any given limit. It does this by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime number, 2.

How It Works: The Core Idea

Imagine you have a list of consecutive integers starting from 2 up to your desired limit n. The Sieve works by systematically eliminating the composite numbers.

Here’s the step-by-step process:

  1. Create a list: Start with a list of booleans, say is_prime, of size n+1. Initialize all entries from is_prime[2] to is_prime[n] as true. Mark is_prime[0] and is_prime[1] as false because 0 and 1 are not prime.

  2. Start with the first prime: Begin with p = 2 (the first prime number).

  3. Mark multiples:

    • If is_prime[p] is true (meaning p has not been marked as composite yet, so it’s prime):
      • Iterate through all multiples of p starting from p*p up to n (i.e., p*p, p*p + p, p*p + 2p, …).
      • Mark these multiples as false in your is_prime list.
      • Note: We start marking from p*p because any composite number less than p*p that has p as a factor must also have a smaller prime factor (let’s say q < p). These smaller prime factors would have already been processed and their multiples (including the current one) marked false by previous iterations. For example, when p=5, multiples like 10, 15, 20 are already marked by 2 or 3. The first multiple of 5 not yet marked is 25 (5*5).
  4. Move to the next number: Increment p to p + 1.

  5. Repeat: Go back to step 3. Continue this process as long as p*p <= n. Why p*p <= n? Because if p*p > n, then any multiple k*p where k >= p would be greater than n, so there’s nothing left to mark. If k < p, it would have been marked by a previous prime factor.

  6. Collect primes: Once the loop finishes, all numbers i for which is_prime[i] is true are prime numbers.

Let’s walk through an example to find primes up to 30:

  • Initialize: [F, F, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T] (indices 0 to 30)

  • p = 2: is_prime[2] is T.

    • Mark multiples of 2 from 2*2=4: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 as F.
    • List becomes: [F, F, T, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F, T, F]
  • p = 3: is_prime[3] is T.

    • Mark multiples of 3 from 3*3=9: 9, 12(already F), 15, 18(already F), 21, 24(already F), 27, 30(already F) as F.
    • List becomes: [F, F, T, T, F, T, F, T, F, F, F, T, F, T, F, F, F, T, F, T, F, F, F, T, F, F, F, T, F, T, F]
  • p = 4: is_prime[4] is F. Skip.

  • p = 5: is_prime[5] is T.

    • Mark multiples of 5 from 5*5=25: 25, 30(already F) as F.
    • List becomes: [F, F, T, T, F, T, F, T, F, F, F, T, F, T, F, F, F, T, F, T, F, F, F, T, F, F, F, F, F, T, F]
  • p = 6: is_prime[6] is F. Skip.

  • **p = sqrt(30) ~ 5.47. So, pwill go up to 5. The loop finishes afterp=5`.

The primes up to 30 are the indices where the value is T: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.

Why It’s Important: Efficiency and Practicality

The Sieve is much faster than trial division for finding multiple primes because it avoids repeated division operations. Instead, it uses simple array lookups and arithmetic. Its efficiency makes it suitable for:

  • Competitive Programming: Many problems require a list of primes up to a certain N.
  • Cryptography: While not directly used for generating large cryptographic primes (which are usually found by probabilistic methods like Miller-Rabin), understanding prime generation is foundational.
  • Number Theory Algorithms: Primes are building blocks for many other algorithms.

Implementation in Python

Python is a great language to demonstrate the Sieve due to its readability.

Basic Python Sieve

def sieve_of_eratosthenes_basic(limit):
    """
    Finds all prime numbers up to 'limit' using the basic Sieve of Eratosthenes.
    """
    if limit < 2:
        return []

    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False # 0 and 1 are not prime

    p = 2
    while p * p <= limit:
        # If is_prime[p] is still True, then it is a prime
        if is_prime[p]:
            # Mark all multiples of p as not prime
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1

    # Collect all prime numbers
    primes = [i for i, prime in enumerate(is_prime) if prime]
    return primes

# Example Usage
print(f"Primes up to 10: {sieve_of_eratosthenes_basic(10)}")
print(f"Primes up to 30: {sieve_of_eratosthenes_basic(30)}")
print(f"Primes up to 100: {sieve_of_eratosthenes_basic(100)}")
Primes up to 10: [2, 3, 5, 7]
Primes up to 30: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Primes up to 100: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Optimized Python Sieve (Skipping Even Numbers)

We can optimize by realizing that after processing p=2, all even numbers (except 2 itself) are marked false. We only need to check odd numbers for p and mark their odd multiples.

def sieve_of_eratosthenes_optimized(limit):
    """
    Finds all prime numbers up to 'limit' using an optimized Sieve.
    Optimizations:
    1. Handles 2 separately.
    2. Only considers odd numbers for marking.
    """
    if limit < 2:
        return []
    if limit == 2:
        return [2]

    # Initialize all odd numbers as potentially prime
    # size is (limit-1)//2 + 1 for indices 0 to (limit-1)//2
    # maps index i to number 2*i + 1
    # is_prime_odd[0] corresponds to 1 (not prime)
    # is_prime_odd[1] corresponds to 3 (prime)
    is_prime_odd = [True] * ((limit - 1) // 2 + 1)

    # Note: is_prime_odd[0] corresponds to 1, which is not prime.
    # It's automatically handled by the loop starting from p=3.

    p_idx = 1 # Corresponds to p = 3
    # We iterate p as 2*p_idx + 1
    # The loop condition p * p <= limit means (2*p_idx + 1)^2 <= limit
    while (2 * p_idx + 1)**2 <= limit:
        p = 2 * p_idx + 1
        # If is_prime_odd[p_idx] is True, then p is prime
        if is_prime_odd[p_idx]:
            # Mark all multiples of p as not prime
            # Multiples will be p*p, p*(p+2), p*(p+4), ...
            # These are p*(2k+1)
            # The indices for these multiples are ((p*k) - 1) // 2
            # Start multiple: p*p. Index: (p*p - 1) // 2
            # Subsequent multiples: p*(p+2), p*(p+4)...
            # The step size for indices is p
            for multiple_val in range(p * p, limit + 1, 2 * p):
                is_prime_odd[(multiple_val - 1) // 2] = False
        p_idx += 1

    # Collect primes: 2, and all odd numbers for which is_prime_odd is True
    primes = [2]
    for i, prime_status in enumerate(is_prime_odd):
        if prime_status:
            num = 2 * i + 1
            if num > limit: # Ensure we don't go past the limit
                break
            if num != 1: # 1 is not prime
                primes.append(num)

    return primes

# Example Usage
print(f"Primes up to 10 (optimized): {sieve_of_eratosthenes_optimized(10)}")
print(f"Primes up to 30 (optimized): {sieve_of_eratosthenes_optimized(30)}")
print(f"Primes up to 100 (optimized): {sieve_of_eratosthenes_optimized(100)}")
print(f"Primes up to 2 (optimized): {sieve_of_eratosthenes_optimized(2)}")
print(f"Primes up to 1 (optimized): {sieve_of_eratosthenes_optimized(1)}")
Primes up to 10 (optimized): [2, 3, 5, 7]
Primes up to 30 (optimized): [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Primes up to 100 (optimized): [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Primes up to 2 (optimized): [2]
Primes up to 1 (optimized): []

Implementation in Go (Golang)

Go’s static typing and efficient slices make it a good choice for competitive programming and systems where performance matters.

package main

import "fmt"

// SieveOfEratosthenes finds all prime numbers up to 'limit' using the Sieve of Eratosthenes.
func SieveOfEratosthenes(limit int) []int {
	if limit < 2 {
		return []int{}
	}

	// Create a boolean slice, initialized to true (default for bool)
	isNotPrime := make([]bool, limit+1) // isNotPrime[i] == true means i is not prime

	// 0 and 1 are not prime
	isNotPrime[0] = true
	isNotPrime[1] = true

	// Start from p = 2
	for p := 2; p*p <= limit; p++ {
		// If isNotPrime[p] is false, then p is prime
		if !isNotPrime[p] {
			// Mark all multiples of p as not prime
			for multiple := p * p; multiple <= limit; multiple += p {
				isNotPrime[multiple] = true
			}
		}
	}

	// Collect all prime numbers
	primes := []int{}
	for i := 2; i <= limit; i++ {
		if !isNotPrime[i] {
			primes = append(primes, i)
		}
	}

	return primes
}

func main() {
	fmt.Printf("Primes up to 10: %v\n", SieveOfEratosthenes(10))
	fmt.Printf("Primes up to 30: %v\n", SieveOfEratosthenes(30))
	fmt.Printf("Primes up to 100: %v\n", SieveOfEratosthenes(100))
}
Primes up to 10: [2 3 5 7]
Primes up to 30: [2 3 5 7 11 13 17 19 23 29]
Primes up to 100: [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Implementation in C++

C++ offers low-level control and performance, making it ideal for tasks where speed is critical. Using std::vector<bool> is a common optimization, as it’s specialized to pack booleans efficiently, often using single bits.

#include <iostream>
#include <vector>
#include <numeric> // For std::iota if needed, not strictly for sieve logic
#include <cmath>   // For std::sqrt

// SieveOfEratosthenes finds all prime numbers up to 'limit' using the Sieve of Eratosthenes.
std::vector<int> SieveOfEratosthenes(int limit) {
    if (limit < 2) {
        return {}; // Return empty vector for limits less than 2
    }

    // std::vector<bool> is a specialized template that uses bits for storage,
    // which is memory-efficient.
    std::vector<bool> is_prime(limit + 1, true);

    is_prime[0] = false; // 0 is not prime
    is_prime[1] = false; // 1 is not prime

    for (int p = 2; p * p <= limit; ++p) {
        // If is_prime[p] is still true, then it is a prime
        if (is_prime[p]) {
            // Mark all multiples of p as not prime
            // We start from p*p because smaller multiples would have been
            // marked by smaller primes.
            for (int multiple = p * p; multiple <= limit; multiple += p) {
                is_prime[multiple] = false;
            }
        }
    }

    // Collect all prime numbers
    std::vector<int> primes;
    for (int i = 2; i <= limit; ++i) {
        if (is_prime[i]) {
            primes.push_back(i);
        }
    }

    return primes;
}

int main() {
    std::cout << "Primes up to 10: ";
    for (int p : SieveOfEratosthenes(10)) {
        std::cout << p << " ";
    }
    std::cout << std::endl;

    std::cout << "Primes up to 30: ";
    for (int p : SieveOfEratosthenes(30)) {
        std::cout << p << " ";
    }
    std::cout << std::endl;

    std::cout << "Primes up to 100: ";
    for (int p : SieveOfEratosthenes(100)) {
        std::cout << p << " ";
    }
    std::cout << std::endl;

    return 0;
}
Primes up to 10: 2 3 5 7 
Primes up to 30: 2 3 5 7 11 13 17 19 23 29 
Primes up to 100: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Complexity Analysis

  • Time Complexity: The Sieve of Eratosthenes has a time complexity of O(N log log N).

    • The outer loop runs up to sqrt(N).
    • The inner loop (marking multiples) runs for N/p times for each prime p.
    • The sum N/2 + N/3 + N/5 + ... for primes p up to N is approximately N log log N. This makes it extremely efficient for generating primes up to N.
  • Space Complexity: The algorithm requires a boolean array of size N+1, so its space complexity is O(N).

Limitations and Trade-offs

While highly efficient, the Sieve of Eratosthenes isn’t without its limitations:

  1. Memory Usage: For very large N (e.g., 10^10), O(N) space becomes prohibitive. Storing a boolean for every number up to 10^10 would require approximately 10 gigabytes of memory, which is usually too much for a single process.
  2. Fixed Upper Limit: It’s designed to find all primes up to a specific N. If you need to find primes in a very large range (e.g., between 10^12 and 10^12 + 10^5) but not from 0, the basic Sieve is inefficient as you’d still need an array up to 10^12.
  3. Not for Nth Prime: If your goal is to find the Nth prime number, the Sieve is not directly suitable unless you have an upper bound estimate for the Nth prime.

For these larger scale problems, variations like the Segmented Sieve (to handle larger ranges with limited memory) or more advanced algorithms like the Sieve of Atkin (which is asymptotically faster but much more complex to implement) might be considered.

Conclusion

The Sieve of Eratosthenes stands as a testament to the enduring power of simple yet brilliant algorithms. It’s an essential tool for any developer working with prime numbers up to a reasonable limit. Understanding its mechanics, optimizations, and complexities provides a solid foundation for tackling more advanced number theory and algorithm challenges. It’s a pragmatic, clear, and efficient solution that continues to be relevant in modern computing.

Last updated on