Empirical explorations of faster Fermat factorisation, part 7

by | Published | Last updated

Introduction

This is the seventh in a series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed.

This article uses concepts and terminology introduced in the previous articles and is not intended to be understood without them. See the Related section below for the other articles in the series.

Part 3 reported a fourfold increase in speed after tuning the YAFU filter for a particular chosen value of number to factor. This article reports an empirical test of speed with tuning, across the full range of numbers to factor, representative of a random distribution.

Terminology

big square
Successful factorisation finds b2 - s2 = N where b2 is called here the big square. See Part 1.
GPL
GNU general public license
LSB
least significant bit
LSD
least significant digit
M
The input range, or modulus, of the filter. In effect, what happens during filtering is that a big presquare candidate is reduced mod M and the residue tested in the filter to discover whether it is viable. This is named after the variable in the YAFU code. See Part 3.
N
The number to factorise, of the form pq where p and q are large primes; possibly the modulus of an RSA public key.
presquare
The root x of any square x2 will be called the presquare.
RR
reduction ratio, defined in Part 3.
RSA
Rivest Shamir Adleman
small square
Successful factorisation finds b2 - s2 = N where s2 is called here the small square. See Part 1.
YAFU
Yet Another Factorisation Utility. See Part 2.

Experiment

For each of the 24 categories of N a design value of filter modulus M was selected from the Appendix in Part 5. These values are shown in Table 1.

Two modified versions of the YAFU Fermat factorisation function, zFermat(), were created. One was with a compact filter table of size M/RR (as discussed in Part 3 and in Part 5) and one with the original YAFU filter table of size M.

The modified compact table version allows M to be changed at run time, so that it can be tuned.

The benchmark software factorises each test number in three different ways.

  1. It uses the original YAFU zFermat() filter table size of M with the fixed YAFU M value of 176400.
  2. For the tuned filter it categorises each number to be factorised into one of the 24 categories and uses the value of M shown in Table 1 for that category.
  3. It uses the same version of zFermat() with a compact filter table, but with the original YAFU fixed value of M = 176400.
Table 1: The values of tuned filter modulus M chosen from the Pareto front in the Appendix in Part 5 for each of the 24 categories of number to factorise. The reduction ratio RR and the table ratio (TR) are fixed by the choice of M.
N category M RR TR
AAA1 101606400 2362.5 43008
AAA3 793800 443.0 1792
AAA5 3175200 885.9 3584
AAB1 3763200 700.0 5376
AAB3 29400 131.2 224
AAB5 117600 262.5 448
ABA1 20321280 1653.8 12288
ABA3 158760 310.1 512
ABA5 635040 620.2 1024
ABB1 752640 490.0 1536
ABB3 5880 91.9 64
ABB5 23520 183.8 128
BAA1 14515200 1800.0 8064
BAA3 113400 337.5 336
BAA5 453600 675.0 672
BAB1 537600 533.3 1008
BAB3 4200 100.0 42
BAB5 16800 200.0 84
BBA1 2903040 1260.0 2304
BBA3 22680 236.2 96
BBA5 90720 472.5 192
BBB1 107520 373.3 288
BBB3 840 70.0 12
BBB5 3360 140.0 24

Results

Table 2: Empirical measurements of Fermat factorisation time for three different implementations: the original YAFU zFermat() function, zFermat() with a compact table with an untuned filter and zFermat() with a compact table with a tuned filter. Filter parameters and measured times are shown for 24 categories of number to factorise.
N category Original Untuned Tuned Untuned/Tuned RRtuned/RRuntuned
AAA1 185.8s 118.0s 11.8s 10.0 12.0
AAA3 156.9s 105.2s 46.9s 2.2 2.2
AAA5 104.1s 68.1s 17.0s 4.0 4.5
AAB1 96.7s 58.6s 13.9s 4.2 5.3
AAB3 105.1s 71.5s 71.5s 1.0 1.0
AAB5 206.1s 129.9s 73.8s 1.8 2.0
ABA1 89.2s 55.6s 5.5s 10.1 12.0
ABA3 140.6s 95.3s 42.5s 2.2 2.2
ABA5 258.1s 162.1s 40.9s 4.0 4.5
ABB1 170.5s 105.3s 24.1s 4.4 5.3
ABB3 352.5s 245.8s 247.9s 1.0 1.0
ABB5 269.6s 178.1s 102.0s 1.7 2.0
BAA1 128.7s 80.8s 8.3s 9.7 12.0
BAA3 219.4s 147.7s 66.1s 2.2 2.2
BAA5 208.8s 140.9s 36.3s 3.9 4.5
BAB1 346.8s 220.8s 52.5s 4.2 5.3
BAB3 309.6s 206.5s 210.2s 1.0 1.0
BAB5 155.1s 98.2s 56.6s 1.7 2.0
BBA1 268.0s 160.4s 16.2s 9.9 12.0
BBA3 230.3s 150.0s 67.2s 2.2 2.2
BBA5 310.1s 197.0s 49.4s 4.0 4.5
BBB1 402.2s 258.8s 58.8s 4.4 5.3
BBB3 278.9s 190.3s 200.3s 0.9 1.0
BBB5 293.8s 187.2s 107.9s 1.7 2.0
Mean 221.3s 163.3s 80.6s 2.0 -

The mean in Table 2 has been adjusted to reflect the fact that all categories of N occur with equal probability except for categories ending in '3' which occur twice as often. (This is because the '3' ending reflects either the LSB pattern 011 or 111 in N, whereas '5' reflects the bit pattern 101 only and '1' reflects the bit pattern 001 only, as described in Part 3)

Discussion

The results in Table 2 show that the version of the factorisation function modified to use a compact filter table (without tuning) factored the number in about 65-70% of the time of the version using the original sparse filter table, for every value of N.

As expected, the tuned version was faster than the untuned version, and multiplication in speed correlated with the calculated improvement in filter RR due to tuning. The last two columns of Table 2 illustrate this. However the correlation is not exact, and it is noticeable that the empirical improvement in almost all cases is only around 85% of the improvement in RR. For example for the category AAA1, the improvement in RR from tuning is a factor of 12 but the measured time reduction is only a factor of 10. For the category BBB5, the improvement in RR is 2 but the measured improvement is 1.7 Exceptions to this are the categories ending '3' where the RR is 1.0 or 2.2.

GPL free software to download

The open-source software here produces benchmark timings like the ones in Table 2.

It consists of the following files with the following SHA256 hashes at the following locations on this website.

3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986  /download/GPLv3.txt
2af8c3410899a696f6d3945993f5a0b2e227d4d41191912e1d157e18209d175f  /download/benchmark.cc
a68feaaaea73d4d9f6574de8ac61541dc49599d73f0e1591af52ab39e6b41c03  /download/benchmark.mk
6d06e857886ab6290412af17dcf294de54908e8c4bf02f5c3c96df74fdf1b388  /download/check.c
2daaa50eff3d0953a912c1b1ae8a089886264a37aea334eb4986108011601cbc  /download/generate.cc
8037db0a4b7fc32e9e96741950e0d154e34c8580c680c22571fb1503522f4df8  /download/optima.gh
0ac100ee43662d69f9695728a5a9cd79332d14bcfb752b80cb99c428deb02365  /download/timings.cc
e855144097e59ba7036e18f0ceecc08b937b085e8d14266fc51ed2bb42be6f1b  /download/timings.h
b9f239a4e595122a27d0c715cc05136ea46f493146c996be33b2a41ad1d0003e  /download/zf.h
ed0fd4971718c76b1ff7fe8ee641ddb4d674d4577f649e4afed103852af08b42  /download/zf1.c
2de98117dbb6ea4066facf91d111a05431f045e7ec5dcf9113ecf1b5bc2deda9  /download/zf2.c

The files zf1.c and zf2.c each contain a modified version of the zFermat() factorisation function from YAFU. This code generated the times recorded in the benchmarks in Table 2.

This build also requires a generated header file lsd2series.gh which is the output from the programme lsd2.cc, released in Part 6.

Related