A prime number is an integer number divisible only by 1 and by itself. Opposite to prime numbers there are composite numbers, numbers that can be obtained multiplying at least two numbers greater than 1.
Euclide’s theorem is a simple theorem that proves the infinity of prime numbers, and by Euler’s fundamental theorem of arithmetic we know that every number has a unique representation in prime factors, so they are the bricks of the whole integer numbers domain.
Trial division and basic facts
The simplest approach to determine if a number is prime or composite is trial division: you try to divide the candidate n by all the primes smaller than it. You can optimize this method noticing that you need to divide the number only up to primes at most equals to square root of the candidate (the Sieve of Eratosthenes in next paragraph shows why this is true).
There is an interesting algorithm to find all numbers less then n in few time, the Sieve of Eratosthenes. You take a table with numbers from 1 to n, you highlight 2 and them make an X over the multiples of 2. The first remaining number is 3, so you highlight 3 and mark all multiples of 3. You repeat this elimination process until √ n and in the table will survive only primes numbers less then n. This algorithm is fast, in fact it requires O(n log log n) operations.
Image to divide numbers in ranges of 1000 numbers. As you can perceive, the greatest is the starting number of a range and the more it will be sieved by the previous sieve, so less prime numbers should be contained in the range. We already know that prime numbers are infinite, so how sparse do they become in the integer domain? The response is the Prime Number Theorem of Gauss: π(N) ~ N/log(N), that is the number of prime numbers less than N is asymptotically N/log(N). For large N there is a 1/log N probability than a random number of that size is prime.
Pollard’s Rho method
The Pollard’s Rho method uses clever observations to probabilistic speed up the factorization process. Let n = pq, and consider the polynomial f(x) = x2 + 1 modulo n. Let choose a value for x, let’s say x1, the algorithm uses the pseudo-random sequence s1 = f(x1), s2 = f(f(x1)), s3 = f(f(f(x1))), etc, that is related to the sequence modulo p, although it is unknown and we can not compute it directly.
The sequences modulo n and modulo p at some point it will repeat their elements. Since the sequences are similar to random sequences, for the Birthday Paradox we expect a repetition after O(√ N) elements, that is smaller in the mod p sequence than in mod n sequence. If you find two numbers in this sequence mod n with xa ≠ xb, but xa ≡ xb mod p than |xa – xb| is a multiple of p, so gcd(n, |xa – xb|) will give n or p.
The fact that the used sequence repeats elements at some point can be thought as a line shaped like the Rho greek letter (ρ) and this names the algorithm.
Quadratic Sieve and its enhancements
Previous algorithms require in average exponential time to factor the number. Since eighties sub-exponential algorithms have been studied, the first of these is Quadratic Sieve.
It tries to factor a number n building a difference of squares modulo n. Formally x2 ≡ y2 (mod
n), but it simply means that x2 – y2 = kn and we can try to factor n using gcd
(n, x–y).
But how can it build a difference of squares? Consider a polynomial f
(x) = x2 – n, when x is about √n than f(x) modulo n is “small” and we can find relations like f(x) ≡ abc (mod
n) with “small” factors like a,b and c. We want to combine relations like this to obtain squares in both the sides of the modulo, for example:
u12 ≡ abc (mod
n)
u22 ≡ ac (mod
n)
u32 ≡ b (mod
n)
We can combine them in
(u1u2u3)2 ≡ (abc)2 (mod
n)
This is a difference of squares like declared above. The fact that if f
(x) is divisible by p also f
(x+ kp) are divisible by p allow us to “sieve” the polynomial. This is the basic algorithm, but lots of enhancements are availables (multiple polynomial, small and large primes variations etc.) Please refer to the full article for details.
Primality and Pseudo-primality tests
TODO: Primality and Pseudo-primality tests
Algorithms complexity
If you don’t like algorithm complexity theory please skip this paragraph. 🙂 We present models useful to study complexity here, and hopefully we’ll treat also complexity in general in future.
In trial division considering that the size of the problem is x = log n and that you must perform in the worst case about √n = n1/2 division, the complexity of this approach is exponential: O (2x/2).
Since Pollard’s Rho method involve birthday paradox, you expect a repetition in the sequence and possibly a factor in O(√ p) steps, which in the worst case (the maximum minimum factor size is √ n) is O(∜p) so the overall expected complexity is still exponential but now it is O(2x/4).
Mersenne Primes
Mersenne Primes are primes with special form 2p-1, that enables them to be tested for primality with the fast Lucas-Lehmer. Maybe they are interesting only as intellectual challenge, but the GIMPS project that aims to find them is a noticeable example of powerful programs and organization between people.