(Note: last time I looked, the following link was no longer valid.) For another source of introductory material on factoring, you may want to go to this website and download the paper by Peter Montgomery. This file is in PostScript format, so you will probably need to print it (on a printer that understands PostScript) before you can read it.
I have been using a program called factor.exe, which can find the factors of any number up to about 80 digits, using several different factoring methods. This program uses its authors MIRACL library for doing arithmetic with large numbers.
Another program that I have found useful is PPSIQS, available here. (Scroll down a couple of screenfuls to find it.) It uses the Quadratic Sieve factoring method, to factor numbers up to about 100 digits. A 100-digit number will take a few weeks. That site also has PPMPQS, an earlier program that also uses the Quadratic Sieve.
A prime number is an integer greater than 1 (one)
that is divisible only by 1 and itself.
A composite number is an integer greater than 1
that is not prime.
12 = 2 * 2 * 3
30 = 2 * 3 * 5
119 = 7 * 17
245 = 5 * 7 * 7
257 = 257 (a prime number)
Note that the asterisk (*) is used to mean multiply, since the usual multiplication symbol looks too much like the letter x.
If p is a prime number, then for any integer a:
a^p - a is divisible by p.
If a is not divisible by p, then we can divide the above by a, and get:
a^(p-1) - 1 is divisible by p.
For example, for the prime number 7:
2^7 - 2 = 126 = 7 * 18 3^7 - 3 = 2184 = 7 * 312 4^7 - 4 = 16380 = 7 * 2340 2^6 - 1 = 63 = 7 * 9 3^6 - 1 = 728 = 7 * 104 4^6 - 1 = 4095 = 7 * 585If we call the number we are trying to factor N, then we pick a value for a, such as 3, and compute the remainder when dividing 3^(N - 1) by N. If N is prime (and not equal to 3), then the result will be 1. If the result is not 1, then N cannot be prime.
Note that if the result of this calculation is 1, this does not prove that N is prime. It means that N is very likely to be prime, but it could be composite, since some composite numbers will look prime by this test. Proving that a large number is prime takes more work than just this test. The Prime Pages has an introduction to proving primality.
There are two basic classes of factoring algorithms. In one class, the amount of work (or computer time) that is needed to find a factor depends mostly on the size of the factor that is found (or looked for). In the other class, the amount of work depends primarily on the size of the number being factored. The first several methods described below belong to the first class.
The program factor.exe uses all of the following methods except for the Number Field Sieve.
As an example, suppose you want to find the factors of:
24595184394
First, we divide by 2 and find that the remainder is zero. So we have:
24595184394 = 2 * 12297592197
But we are not done yet. That last number is not prime. So we continue (with the new number), by trying 2 again. After trying 2, 3, 5, 7, and 11, we will have:
24595184394 = 2 * 3 * 3 * 11 * 124218103
A person with pencil and paper might stop here, without being able to determine whether that last factor is prime or not. But the computer would continue and find that:
24595184394 = 2 * 3 * 3 * 11 * 97 * 103 * 12433
So how do we know that the computer is done? Imagine that the computer has just divided by 103, and got a quotient of 12433 and a remainder of 0. Now it will divide 12433 by 103, and get quotient 120, remainder 73. After dividing by the primes 107 and 109, it will divide 12433 by 113 and get quotient 110, remainder 3. So if 12433 has a prime divisor greater than 113, then it is also divisible by the quotient (12433/prime), which will be less than 113. In this case, we would have already found this smaller factor. So 12433 must be prime, and the factorization is complete.Trial division is suitable for finding factors up to about 8 or 9 digits. But what if the number to be factored is:
174224571863520493293247799005065324265471
For this number, the smallest prime factor is 20 digits long. Finding this factor by trial division would take years, even on the fastest computer. In contrast, it only takes factor.exe a minute or two to find that this number is:
32032215596496435569 * 5439042183600204290159 (both prime)
factor.exe finds these factors using the quadratic sieve method, after trying the other methods first.
x^2 + 1
and an initial value of x, such as 1, and compute the value of the polynomial. The polynomial is then evaluated using the computed value as the new value of x. The values of x are reduced modulo N (use the remainder when divided by N), where N is the number being factored. Eventually, the sequence of values of x will repeat, except for possibly the first several values of x. However, we are hoping that the sequence will repeat modulo p, where p is a factor of N; and that the length of the repeating sequence is not too big.
So, how do we test if the sequence repeats modulo p, when we dont know what p is? Suppose, for example, that the 3rd and 15th term of the sequence are the same modulo p, for some factor p of N. Then, if we subtract the two values (modulo N), the result will be divisible by p. If we then compute the Greatest Common Divisor of this difference and N, we will get p (or a multiple of p). In any case, the result will be a factor of N (and we hope this factor is not N itself).
In the example above, note that if the 3rd and 15th terms are the same modulo p, then the 4th and 16th terms will be the same, the 5th and 17th terms will be same (all modulo p), and so on.
An implementation of Pollards rho method is:
Initialize: A = first term of sequence, B = second term Step 1: set A = next term of sequence (that is: A <-- (A^2 + 1) mod N) Step 2: compute two terms of the sequence starting at B and set B to the second of these terms. Step 3: Compute GCD(B-A,N) If the GCD is 1, go to Step 1 If the GCD is greater than 1 and less than N, then it is a factor (not necessarily prime) of N. Otherwise, the GCD is N. Either give up, or start over with a different polynomial.The first time we reach Step 3, A is the second term and B is the fourth term. The next time, A is the third term and B is the sixth term. Thus we test for a repetition period of length 2, then 3, then 4, and so on. In the example above, where the 3rd and 15th terms are the same modulo p, we will find the factor p (or larger) when A is the 12th term and B is the 24th.
Since computing a GCD (Step 3) takes longer than computing three terms in the sequence (Steps 1 and 2), we can speed things up by changing Step 3 so that it computes the GCD only some of the time (perhaps every 10th time, for example). The other times, Step 3 will accumulate values of (B-A), by multiplying them together and taking the remainder when divided by N.
Brents method is a modification of the rho method, discovered by Richard P. Brent, that requires that the sequence be computed only once, instead of twice.
In the program factor.exe, Brents method finds factors from 5 digits up to about 10 or 11 digits.
As an example, suppose we want to find the factors of N=10573. Using the polynomial X^2 + 1 and an initial value of 1 for x, the sequence is (modulo 10573):
1, 2, 5, 26, 95, 677, 3691, 5458, 5624, 5534, ...
In Step 3, we will compute GCD(2-1, 10573), GCD(26-2, 10573) GCD(677-5, 10573) and GCD( 5458-26, 10573). The first three GCDs are 1, and the last is 97. Thus, we have found the factor 97. Dividing 10573 by 97 gives the quotient 109, so 10573 = 97*109.
Let p be a prime factor of n. Then, by Fermats Little Theorem:
3p-1 = 1 mod p
Note that this implies, for any positive integer k
3k(p-1) = 1 mod p
This is because
3k(p-1) = (3(p-1))^k = 1^k = 1 mod p
The (p-1) method raises 3 (or some other value for a) to some power t, where t has lots of small factors. If (p-1) has only small factors (such that it is a factor of the exponent t), then we will have:
3t = 1 mod p
This last statement is equivalent to:
3t - 1 is a multiple of p
Since we cannot calculate 3t modulo p, because we dont know what p is, we do the calculation modulo n instead. We then compute
gcd(3t - 1, n)
The result will be p, or possibly a multiple of p.
For example, let n = 18446744073709551617, and let us choose the limit B1 to be 19. Then one possible value for t is 19! (factorial). [Note: choosing the exponent t this way results in too many occurrences of small factors (such as 2 and 3), so in an actual implementation of the (p-1) method, t is chosen a little differently.] We then compute:
319! = 12392788485487941828 mod n
and then
gcd( 12392788485487941827, n ) = 274177
so we have found a factor. Note that
19! = 216 * 38 * 53 * 72 * 11 * 13 * 17 * 19274176 = 28 * 32 * 7 * 17
so that 274176 is indeed a factor of t = 19!, which is required in order for the (p-1) method to find this factor.
More details to be added.
As with the (p-1) and (p+1) methods, limits B1 and B2 are chosen. (Like those other methods, larger values of B1 and B2 can result in finding larger factors, but the computation takes longer as well.) B1 can be anywhere from about 11,000 (to find small factors) to as high as 44,000,000 or more (if you are really ambitious). B2 is often 100*B1.
To run one curve, a starting point, called sigma, is chosen. I am not sure if sigma determines which elliptic curve is used, or if it determines a starting point on some built-in curve. (Or possibly it is something else entirely.) The actual calculation is similar in concept to the (p-1) or (p+1) calculation, in that we are hoping that a particular number, related to a prime factor p, has only small factors. If no factor is found after doing Phase 1 and (if necessary) Phase 2, a different value is chosen for sigma and the calculation is repeated.
The Elliptic Curve Method finds a factor p if p+d has only small factors, where d is a number between -sqrt(p) and sqrt(p). The value of d for a particular calculation depends on the value of sigma. So, if one value of sigma fails to find a factor, we try another value of sigma, hoping that eventually we pick a sigma that succeeds. Typically, an ECM program will either choose a random value of sigma each time it starts a new curve, or it will run through a predetermined series of values for sigma.
If we run lots of curves and fail to find a factor, we increase the limits B1 and B2, and try some more curves. Paul Zimmermanns ECMNET page contains a table (scroll down a ways to find it) of how many curves to run with particular B1 and B2 values.
I used an ECM factoring program from ECMNET to complete the factorization of Woodall number 441. This number is 441*2^441 - 1. As of October, 1998, five prime factors of this number were known, but there was a 100-digit composite factor still to be dealt with. (How was it known that this number is composite? The answer is Fermats Little Theorem. See above.) In February, 1999, after about three months of trying, the ECM program announced that the 100-digit number was the product of a 36-digit prime and a 65-digit prime. This occurred on the 304th curve that was tried. The output of the ECM program was:
GMP-ECM 3, by P. Zimmermann (Inria), 17 Sep 1998, with contributions from T. Granlund, P. Leyland, C. Curry, A. Stuebinger, G. Woltman, JC. Meyrignac. Input number is 2115884038681905273654978796817034949245931610669015166858567548335257539002578135081915952440228791 (100 digits) Using B1=20000000, B2=2000000000, polynomial x^120, sigma=734067359 Step 1 took 18544175ms for 262778083 muls, 3 gcdexts Step 2 took 9473901ms for 111918387 muls, 330429 gcdexts ********** Factor found in step 2: 210591719488568010340939210068507793 Found probable prime factor of 36 digits: 210591719488568010340939210068507793 Probable prime cofactor 10047327804817920303940778680419610211637795822510390814121375687 has 65 digitsIn July, 1999, I found the last two factors of Smarandache number 79. This number is 12345678910111213141516171819...76777879, that is the numbers from 1 to 79 written as one big number. Previously it was known that this number is:
73 * 137 * 22683534613064519783 * 132316335833889742191773 * c102where c102 is a 102-digit composite number. Using factor.exe (with my modifications), I factored the c102 as:
35488612864124533038957177977 * 11589330059060921218833486882285427414280233987959540582909167514265308253The program factor.exe uses several different factoring methods, and it was the Elliptic Curve method that actually found these factors.
The following two factoring methods work by trying to find integers A and B (with both of them between 1 and N) such that A^2 - B^2 is a multiple of N (the number to be factored). Note that A^2 - B^2 = (A + B)*(A - B). If A is not equal to B and A+B is not equal to N, then GCD(A+B,N) and GCD(A-B,N) will be factors of N.
As above, GCD means Greatest Common Divisor.
Probably, most quadratic sieve programs are written so that one computer does all the work. However, it can be written so that the sieving can take place on several computers simultaneously, with each computer using a different range. The results from each computer are then combined to find factors.
This method is suitable for numbers up to about 100 digits, or perhaps a few more. For numbers larger than this, the Number Field Sieve is more efficient.
In contrast to the quadratic sieve, implementations of the Number Field Sieve are (probably) almost always written so that sieving can take place on several computers, with the results being combined later.
The following is a portion of a read me file about a project that used the Number Field Sieve (NFS). In this write-up, C refers (I think) to the set of complex numbers, which are numbers of the form a+bi, where a and b are real numbers (or perhaps integers?) and i is sqrt(-1).
Special Number Field Sieve --------------------------- A brief introduction for those unfamiliar with the algorithm. I have simplified some things. The version of the Number Field Sieve implemented in this code uses "special" polynomials to factor numbers of the form a^n +/- 1. It can do others (e.g. k*A^N + k1*B^M) as well. The two polynomials will generally be a linear one and a quartic or quintic, but can vary somewhat. These two polynomials share a common root mod N, where N is the number we are trying to factor. Call this M. The linear polynomial is usually just 'x - M'. Consider a root (in C) of the quartic/quintic. Call this alpha. There is a map (a homomorphism) which sends alpha to M mod N (because the polynomials share a common root). Thus: M = phi(alpha) mod N, where phi is the homomorphism. We then have, for sets of integers (a,b): a + bM = phi(a + b alpha) mod N. What NFS does is hope to find *many* sets (a,b) such that a + bM and norm(a + b alpha) are both factorisable with a fixed set of primes. (actually one is with prime ideals of index 1; a minor nitpick). This is done by sieving, which is the time consuming part of factoring N. After enough factorizations have been collected, we perform some "magic" and come up with a set of (a,b)'s such that the product over all values of (a + bM) and over all values of norm(a + b alpha) are squares. Thus, we get: A^2 = B^2 mod N and: GCD(A+B,N) is a factor of N if A != +/- B.
A typical factoring project using NFS works as follows. The number to be factored is one which is composite, but for which the other factoring methods have not produced any factors. Usually this number will be between about 100 and 200 digits. The sieving part of the factoring is done first. Often this is done by several volunteers using PC-style computers, with each computer working on a different range of numbers. When enough data is obtained (taking perhaps 2 weeks to 3 or 4 months), it is sent to a single collection point, where the project leader (for example) shuffles it all together, and perhaps tweaks the data a little, such as removing duplicate data. The data is then sent to a super computer, which will spend several hours digesting the data, to come up with the factors.
I saw a write-up somewhere about why a super computer is needed for part 2 of NFS. I will try to find it again. For now, I think I remember that this computation needed several hundred megabytes of RAM, and it was working on an array of numbers that was about 1,000,000 by 1,000,000. Several hours of computing on a super computer may correspond to a couple of weeks on todays PCs.
Last Update: August 21, 2002.