Empirical explorations of faster Fermat factorisation, part 7
by Stephen Hewitt | Published | Last updated
Introduction
This is the seventh in a series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed.
This article uses concepts and terminology introduced in the previous articles and is not intended to be understood without them. See the Related section below for the other articles in the series.
Part 3 reported a fourfold increase in speed after tuning the YAFU filter for a particular chosen value of number to factor. This article reports an empirical test of speed with tuning, across the full range of numbers to factor, representative of a random distribution.
Terminology
- big square
- Successful factorisation finds b2 - s2 = N where b2 is called here the big square. See Part 1.
- GPL
- GNU general public license
- LSB
- least significant bit
- LSD
- least significant digit
- M
- The input range, or modulus, of the filter. In effect, what happens during filtering is that a big presquare candidate is reduced mod M and the residue tested in the filter to discover whether it is viable. This is named after the variable in the YAFU code. See Part 3.
- N
- The number to factorise, of the form pq where p and q are large primes; possibly the modulus of an RSA public key.
- presquare
- The root x of any square x2 will be called the presquare.
- RR
- reduction ratio, defined in Part 3.
- RSA
- Rivest Shamir Adleman
- small square
- Successful factorisation finds b2 - s2 = N where s2 is called here the small square. See Part 1.
- YAFU
- Yet Another Factorisation Utility. See Part 2.
Experiment
For each of the 24 categories of N a design value of filter modulus M was selected from the Appendix in Part 5. These values are shown in Table 1.
Two modified versions of the YAFU Fermat factorisation function, zFermat(), were created. One was with a compact filter table of size M/RR (as discussed in Part 3 and in Part 5) and one with the original YAFU filter table of size M.
The modified compact table version allows M to be changed at run time, so that it can be tuned.
The benchmark software factorises each test number in three different ways.
- It uses the original YAFU zFermat() filter table size of M with the fixed YAFU M value of 176400.
- For the tuned filter it categorises each number to be factorised into one of the 24 categories and uses the value of M shown in Table 1 for that category.
- It uses the same version of zFermat() with a compact filter table, but with the original YAFU fixed value of M = 176400.
N category | M | RR | TR |
---|---|---|---|
AAA1 | 101606400 | 2362.5 | 43008 |
AAA3 | 793800 | 443.0 | 1792 |
AAA5 | 3175200 | 885.9 | 3584 |
AAB1 | 3763200 | 700.0 | 5376 |
AAB3 | 29400 | 131.2 | 224 |
AAB5 | 117600 | 262.5 | 448 |
ABA1 | 20321280 | 1653.8 | 12288 |
ABA3 | 158760 | 310.1 | 512 |
ABA5 | 635040 | 620.2 | 1024 |
ABB1 | 752640 | 490.0 | 1536 |
ABB3 | 5880 | 91.9 | 64 |
ABB5 | 23520 | 183.8 | 128 |
BAA1 | 14515200 | 1800.0 | 8064 |
BAA3 | 113400 | 337.5 | 336 |
BAA5 | 453600 | 675.0 | 672 |
BAB1 | 537600 | 533.3 | 1008 |
BAB3 | 4200 | 100.0 | 42 |
BAB5 | 16800 | 200.0 | 84 |
BBA1 | 2903040 | 1260.0 | 2304 |
BBA3 | 22680 | 236.2 | 96 |
BBA5 | 90720 | 472.5 | 192 |
BBB1 | 107520 | 373.3 | 288 |
BBB3 | 840 | 70.0 | 12 |
BBB5 | 3360 | 140.0 | 24 |
Results
N category | Original | Untuned | Tuned | Untuned/Tuned | RRtuned/RRuntuned |
---|---|---|---|---|---|
AAA1 | 185.8s | 118.0s | 11.8s | 10.0 | 12.0 |
AAA3 | 156.9s | 105.2s | 46.9s | 2.2 | 2.2 |
AAA5 | 104.1s | 68.1s | 17.0s | 4.0 | 4.5 |
AAB1 | 96.7s | 58.6s | 13.9s | 4.2 | 5.3 |
AAB3 | 105.1s | 71.5s | 71.5s | 1.0 | 1.0 |
AAB5 | 206.1s | 129.9s | 73.8s | 1.8 | 2.0 |
ABA1 | 89.2s | 55.6s | 5.5s | 10.1 | 12.0 |
ABA3 | 140.6s | 95.3s | 42.5s | 2.2 | 2.2 |
ABA5 | 258.1s | 162.1s | 40.9s | 4.0 | 4.5 |
ABB1 | 170.5s | 105.3s | 24.1s | 4.4 | 5.3 |
ABB3 | 352.5s | 245.8s | 247.9s | 1.0 | 1.0 |
ABB5 | 269.6s | 178.1s | 102.0s | 1.7 | 2.0 |
BAA1 | 128.7s | 80.8s | 8.3s | 9.7 | 12.0 |
BAA3 | 219.4s | 147.7s | 66.1s | 2.2 | 2.2 |
BAA5 | 208.8s | 140.9s | 36.3s | 3.9 | 4.5 |
BAB1 | 346.8s | 220.8s | 52.5s | 4.2 | 5.3 |
BAB3 | 309.6s | 206.5s | 210.2s | 1.0 | 1.0 |
BAB5 | 155.1s | 98.2s | 56.6s | 1.7 | 2.0 |
BBA1 | 268.0s | 160.4s | 16.2s | 9.9 | 12.0 |
BBA3 | 230.3s | 150.0s | 67.2s | 2.2 | 2.2 |
BBA5 | 310.1s | 197.0s | 49.4s | 4.0 | 4.5 |
BBB1 | 402.2s | 258.8s | 58.8s | 4.4 | 5.3 |
BBB3 | 278.9s | 190.3s | 200.3s | 0.9 | 1.0 |
BBB5 | 293.8s | 187.2s | 107.9s | 1.7 | 2.0 |
Mean | 221.3s | 163.3s | 80.6s | 2.0 | - |
The mean in Table 2 has been adjusted to reflect the fact that all categories of N occur with equal probability except for categories ending in '3' which occur twice as often. (This is because the '3' ending reflects either the LSB pattern 011 or 111 in N, whereas '5' reflects the bit pattern 101 only and '1' reflects the bit pattern 001 only, as described in Part 3)
Discussion
The results in Table 2 show that the version of the factorisation function modified to use a compact filter table (without tuning) factored the number in about 65-70% of the time of the version using the original sparse filter table, for every value of N.
As expected, the tuned version was faster than the untuned version, and multiplication in speed correlated with the calculated improvement in filter RR due to tuning. The last two columns of Table 2 illustrate this. However the correlation is not exact, and it is noticeable that the empirical improvement in almost all cases is only around 85% of the improvement in RR. For example for the category AAA1, the improvement in RR from tuning is a factor of 12 but the measured time reduction is only a factor of 10. For the category BBB5, the improvement in RR is 2 but the measured improvement is 1.7 Exceptions to this are the categories ending '3' where the RR is 1.0 or 2.2.
GPL free software to download
The open-source software here produces benchmark timings like the ones in Table 2.
It consists of the following files with the following SHA256 hashes at the following locations on this website.
3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986 /download/GPLv3.txt 2af8c3410899a696f6d3945993f5a0b2e227d4d41191912e1d157e18209d175f /download/benchmark.cc a68feaaaea73d4d9f6574de8ac61541dc49599d73f0e1591af52ab39e6b41c03 /download/benchmark.mk 6d06e857886ab6290412af17dcf294de54908e8c4bf02f5c3c96df74fdf1b388 /download/check.c 2daaa50eff3d0953a912c1b1ae8a089886264a37aea334eb4986108011601cbc /download/generate.cc 8037db0a4b7fc32e9e96741950e0d154e34c8580c680c22571fb1503522f4df8 /download/optima.gh 0ac100ee43662d69f9695728a5a9cd79332d14bcfb752b80cb99c428deb02365 /download/timings.cc e855144097e59ba7036e18f0ceecc08b937b085e8d14266fc51ed2bb42be6f1b /download/timings.h b9f239a4e595122a27d0c715cc05136ea46f493146c996be33b2a41ad1d0003e /download/zf.h ed0fd4971718c76b1ff7fe8ee641ddb4d674d4577f649e4afed103852af08b42 /download/zf1.c 2de98117dbb6ea4066facf91d111a05431f045e7ec5dcf9113ecf1b5bc2deda9 /download/zf2.c
The files zf1.c and zf2.c each contain a modified version of the zFermat() factorisation function from YAFU. This code generated the times recorded in the benchmarks in Table 2.
This build also requires a generated header file lsd2series.gh
which is the output from the programme lsd2.cc,
released in
Part 6.