Empirical explorations of faster Fermat factorisation, part 8

by | Published


This is the eighth in a series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed.

This article uses concepts and terminology introduced in the previous articles and is not intended to be understood without them. See the Related section below for the other articles in the series.

Part 3 explained optimisation by filtering out non-viable presquare candidates using reduction modulo M. The public-domain code YAFU uses three separate filters with three different M values and combines them by applying them successively. This article introduces a different way of combining two such filters.


big square
Successful factorisation finds b2 - s2 = N where b2 is called here the big square. See Part 1.
GNU general public license
least significant bit
least significant digit
The input range, or modulus, of the filter. In effect, what happens during filtering is that a big presquare candidate is reduced mod M and the residue tested in the filter to discover whether it is viable. This is named after the variable in the YAFU code. See Part 3.
The number to factorise, of the form pq where p and q are large primes; possibly the modulus of an RSA public key.
The root x of any square x2 will be called the presquare.
reduction ratio, defined in Part 3.
Rivest Shamir Adleman
small square
Successful factorisation finds b2 - s2 = N where s2 is called here the small square. See Part 1.
table ratio. This is the number of elements needed in a table to implement a filter. It is the same as the number of candidates that pass the filter in the input range M of the filter. If M is the moduluse of the filter then by their definitions RR = M/TR.
Yet Another Factorisation Utility. See Part 2.


If M is a composite number then it filters by all its factors at the same time which means that the more factors that could be included in a filter, the faster factorisation would be. The limit to the number of factors that can be added to a single filter is the resulting size of the filter table because M increases multiplicatively with each factor added.

YAFU uses three separate filters applied successively. Candidates passing the first filter are presented to the second filter and those that pass the second filter are presented to the third filter. The modulus of each of the three filters is represented by a variable called M, M1 and M2 respectively. Their values are defined in the following code snippet from zFermat() in YAFU's factor/trialdiv.c

uint32 M = 2 * 2 * 2 * 2 * 3 * 3 * 5 * 5 * 7 * 7; //176400u
uint32 M1 = 11 * 17 * 23 * 31; //133331u
uint32 M2 = 13 * 19 * 29 * 37; //265031u

The first filter is stored as series of skips from one viable big presquare candidate to the next The second and third filter tables are stored differently. For them, the filter table is a bit map of size M1 or M2 respectively. The position in the bit map represents the modular residue of the big presquare candidate with respect to the filter modulus. The value of the bit indicates whether that value passes the filter or is rejected.

The alternative filter combination

The idea here was to attempt to combine two filters where both filters are stored as a list of skips or intervals to the next candidate.

One motivation is the following reasoning. Although storing a skip value uses 16 bits (in YAFU's implementation) rather than the single bit of a bit map, the number of elements in the table is M/RR rather than M, as discussed in Part 3 and demonstrated in code in Part 7. Now the RR of a larger prime (say 11 or more) in a filter is approximately two, as noted in Part 3. For lower primes it can be significantly more. It follows that with four prime factors in the filter, the 16x size advantage of the bit map is gone. As more primes are added the bit map will grow at about twice the rate of the of the skip table. So the list-of-intervals implementation will scale with larger M values better than the bit map implementation.

The idea for combining two filters was to consult both skip tables simultaneously in a manner reminiscent of a merge sort.

For the current presquare candidate the algorithm looks into the first filter table to see what value must be added to make it pass this filter. Let's call this value the advance request of that filter. It then consults the table of the second filter in the same way. If these two values are the same then it has found a candidate that passes both filters. If not, it looks to see which filter is requesting the smaller advance. It then adds the next skip value from that filter's skip table to that filter's advance request and makes the comparison again. It keeps adding the next skip interval from whichever filter is requesting the smaller advance until the advances requested from both filters are equal. When they are equal it tests the candidate and clears these accumulated advance requests of both filters.

All these calculations are done over integers. There are no modular reductions in the search loop (except for the index of each table, which must wrap when it reaches the top of the table).