Innovation and design

Empirical explorations of faster Fermat factorisation, part 9

by Stephen Hewitt | Published 13 October 2022

Introduction

This is the ninth in a series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed. It uses concepts and terminology introduced in the previous articles and is not intended to be understood without them. See the Related section below for the other articles in the series.

This article reports an experiment to measure the performance of Fermat factorisation with filtering using the idea of the previous article (Part 8) for filtering with two filters by considering them in parallel rather than sequentially. It presents the software written for this experiment, under a free software GPL licence. This is the first implementation using filtering that does not incorporate code from YAFU.

Experiment

The experiment compared the factorisation times of the following two programmes for the same N values on the same machine.

New software written to implement the idea for the parallel combination of two filters of Part 8.
The fastest software presented in Part 7, modified to reduce the number of its filters from three to two. (This was the YAFU zFermat() function with a reduced skip table, with the first filter tuned to suit N.)

Both these programmes used the same filter values as each other. The experiment used the same 24 values of N used in the benchmarks of Part 7. Each of these belongs to a different category of N, as described in Part 5. The first of the two filters was tuned to suit the category of N, with the same values as used in Part 7, shown in its Table 1. The other filter was set to the original YAFU value of 133331.

As described in Part 8, the YAFU combining of the two filters is sequential. The first filter is stored as a series of skip values from one candidate that passes the filter to the next. A candidate passing this first filter is then presented to the second filter, which is stored as a look-up table in the form of bit map.

In contrast the combining of two filters of Part 8 is parallel and symmetrical. Both filters are stored as a table of skip values from one candidate that passes the filter to the next. The search steps through both tables in parallel until it finds a matching cumulative offset from the last valid candidate in both tables. The following code snippet illustrates the basic algorithm.

    while (1)
    {
        if (advance_requested > advance2_requested)
        {
            advance2_requested += skip2[skindex2++];

            if (skindex2 >= skip2_n)
                skindex2 = 0;
        }
        else if (advance2_requested > advance_requested)
        {
            advance_requested += skip[skindex++];

            if (skindex >= skip_n)
                skindex = 0;
        }
        else
        {
            Form the little square candidate here and test whether it is square.  Break from loop if found.

            // now start next round
            advance_requested = skip[skindex++];
            if (skindex >= skip_n)
                skindex = 0;

            advance2_requested = skip2[skindex2++];
            if (skindex2 >= skip2_n)
                skindex2 = 0;
        }
    }

Results

The results of this experiment are shown in Table 1.

Table 1: Empirical measurements of Fermat factorisation time in seconds for two different implementations: 1. The YAFU `zFermat()` function, modified to use a compact filter table and with a tuned first filter as in Part 7 but with the third filter removed, leaving two filters. 2. The same two filters but with the new parallel filter combining algorithm. The measured times are shown for 24 categories of number to factorise. The last column shows the filter modulus used in the tuned filter.
`N` category	Sequential	Parallel	Relative time	Tuned RR
AAA1	34.7s	537.4s	15.5	2362.5
AAA3	147.5s	572.6s	3.9	443.0
AAA5	53.2s	320.9s	6.0	885.9
AAB1	46.9s	240.6s	5.1	700.0
AAB3	233.9s	443.8s	1.9	131.2
AAB5	226.7s	590.5s	2.6	262.5
ABA1	15.2s	165.0s	10.9	1653.8
ABA3	133.1s	382.7s	2.9	310.1
ABA5	132.8s	669.2s	5.0	620.2
ABB1	73.8s	308.7s	4.2	490.0
ABB3	805.6s	1340.7s	1.7	91.9
ABB5	328.6s	711.6s	2.2	183.8
BAA1	24.7s	274.6s	11.1	1800.0
BAA3	221.0s	673.6s	3.0	337.5
BAA5	116.2s	638.7s	5.5	675.0
BAB1	182.7s	776.4s	4.2	533.3
BAB3	687.6s	1172.6s	1.7	100.0
BAB5	189.5s	428.6s	2.3	200.0
BBA1	51.7s	467.5s	9.0	1260.0
BBA3	187.4s	485.2s	2.6	236.2
BBA5	143.2s	577.2s	4.0	472.5
BBB1	182.2s	630.7s	3.5	373.3
BBB3	639.1s	1005.5s	1.6	70.0
BBB5	335.6s	669.9s	2.0	140.0

Discussion

The parallel algorithm for combining two filters was significantly slower than the sequential bit map implementation of YAFU, in some cases by more than an order of magnitude. However, the ratio between their times varies greatly with the category of N, from 1.6 to 15.5, and seems to correlate with the RR of the tuned filter. This suggests that the parallel filter combining might be improved by a different distribution of the prime factors between the two filters.

On theoretical reasoning, it may be that time is lost in the parallel algorithm when one of the filters has a very large skip value but skip values in the other filter are small. (For example the maximum skip value in the tuned filter for N of class AAA1 is 14,080 but in the other filter the maximum skip is only 134). Then the algorithm will have to make multiple increments in the other filter table to catch up with the the large skip. The algorithm using a bit map for the second filter can make a random-access query of the second filter at the same cost regardless of the length of the skip from the first filter.

GPL free software to download

Here is the experimental factorisation software using the parallel filter combining algorithm. It consists of the following files with the following SHA256 hashes at the following locations on this website.

/download/check2.cc: 5e7a51d83f82c5f9f61b3bea3efb64bbe6634935941ec00ada0ca0a939f4598f
/download/check2.h: 413175e31f6ac6c7121b3a21f485c57c885c24e1b8a48677098b2bccddbcbd9b
/download/demo_dual.cc: 26d7ef95ff0a1346ac9986872381fd83f453d79dfc58123d349faf29b5c8b546
/download/dual.cc: a2a6bc5730e69bcd769191b5073e227318e797216f17b249003fb281bcd8a2a7
/download/dual.h: d790ae488a97b6b2480d193139e2d50bdf0b850cda1f9d226a10e01f3bf80f54
/download/filter.cc: 4a6d20f1c87d5ef32c9669f89d0d97d92706fe251549cf81e0e6b22dd60df788
/download/filter.h: 00a4ca0bfec01ac34fe77077334db44eb2d5d1cfd54bfc905fda1a829257db29
/download/optima.gh: 8037db0a4b7fc32e9e96741950e0d154e34c8580c680c22571fb1503522f4df8
/download/result.h: 53301afc31eb5d82e2bf033abf91800a5fea36a67e85fe821324f38a32d64b3c

It requires the GMP library. On GNU/Linux it has been built with

g++ --std=c++17 demo_dual.cc filter.cc dual.cc check2.cc -l:libgmp.a -o demo_dual

Terminology

big square: Successful factorisation finds b² - s² = N where b² is called here the big square. See Part 1.
GPL: GNU general public license
LSB: least significant bit
LSD: least significant digit
M: The input range, or modulus, of the filter. In effect, what happens during filtering is that a big presquare candidate is reduced mod M and the residue tested in the filter to discover whether it is viable. This is named after the variable in the YAFU code. See Part 3.
N: The number to factorise, of the form pq where p and q are large primes; possibly the modulus of an RSA public key.
presquare: The root x of any square x² will be called the presquare.
RR: reduction ratio, defined in Part 3.
RSA: Rivest Shamir Adleman
small square: Successful factorisation finds b² - s² = N where s² is called here the small square. See Part 1.
TR: table ratio. This is the number of elements needed in a table to implement a filter. It is the same as the number of candidates that pass the filter in the input range M of the filter. If M is the moduluse of the filter then by their definitions RR = M/TR.
YAFU: Yet Another Factorisation Utility. See Part 2.

Empirical explorations of faster Fermat factorisation, part 9

Introduction

Experiment

Results

Discussion

GPL free software to download

Terminology

Related articles

Empirical explorations of faster Fermat factorisation, part 1

Empirical explorations of faster Fermat factorisation, part 2

Empirical explorations of faster Fermat factorisation, part 3

Empirical explorations of faster Fermat factorisation, part 4

Empirical explorations of faster Fermat factorisation, part 5

Empirical explorations of faster Fermat factorisation, part 6

Empirical explorations of faster Fermat factorisation, part 7

Empirical explorations of faster Fermat factorisation, part 8

How to remember a 128-bit key using ‘constrained choice’

Manual encryption with a one-time pad revisited

How to remember a provably strong password: a new way using ‘constrained choice’

Authenticating a public key from just a name - a feasibility study with GPL code

Lepron, a project to develop pronounceable pseudowords for representing binary strings - Part 1: introduction and goals

Which web search engines have their own index? Evidence from website visits

External links