Empirical explorations of faster Fermat factorisation, part 2

by Stephen Hewitt | Published 25 February 2022

Introduction

This is the second in a series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed.

See the Related section below for the other articles in the series. The first article introduced Fermat factorisation, some terminology and a simple approach. This article explains an optimisation included in this simple approach, presents an implementation and reports an empirical measurement of its performance compared to an implementation in publicly available software called YAFU.

To explain the optimisation it first introduces some concepts that will also be used in further work.

Terminology

LSB
least significant bit
MSB
most significant bit
Presquare
The root x of any square x2 will be called the presquare.

Theoretical observations on least significant bits

Lemma: The n least significant bits (LSBs) of a square depend only on the n least significant bits of its presquare.

This is intuitive but can be formally proven by supposing the following. Let's define the presquare as consisting of the sum of two components: the value, X, of its more significant bits and the value, x, of its least significant bits. For example if n = 3, and the presquare is 11101110 in binary then X = 11101000 and x = 110. We can then say:

Square = (X + x)(X + x) = X2 + 2Xx + x2 = X(X + 2x) + x2

So the square is the sum of two components, one x2 and the other a multiple of X. By its definition, X has its n LSBs zero. It is not possible for an integer multiple of X to have non-zero values of those LSBs. We can see this clearly by thinking of integer multiplication as repeated addition. There is no number of additions of X to itself that can produce non-zero values of those LSBs, as the following example column of binary numbers to be added illustrates:

11101000
11101000
...
11101000
---------
.....000

It is not difficult to extend this argument from base two to any base number system. This extending will be used later.

The optimisation of testing only alternate presquares

The third optimisation mentioned in part 1 is that in factorising any particular modulus it is necessary to consider either only odd or only even big presquares. The proof of this follows.

It is not possible for a square to have bit 1 set. There are various ways to see this, but building on the lemma above a simple way is just to consider all four possible cases for the 2 LSBs of the presquare, resulting in all possible cases of the 2 LSBs of the square, as follows:

00 01 10 11 presquare
00 01 00 01 square

Now we know that the big square is the sum of the modulus and the little square. Also, the modulus is the product of two large primes so must be odd. So possible sequences for last two bits of modulus are 01, 11. Possible endings for last two bits of the small square are 00 and 01. So the only (four) possible combinations are as follows

01 11 01 11 modulus
00 00 01 01 small square
-----------
01 11 10 00 sum

Now if this sum is to be the least significant two bits of a big square then the middle two cases are not possible because bit 1 must be 0 in a square. This leaves only two cases and they show that if the modulus ends 01 then the small square is even and the big square is odd. If the modulus ends 11 then the small square is odd and the big square is even.

The optimisation of reporting on a signal

There is a drawback to the fourth optimisation introduced in part 1. The optimisation was to allow inspection at any time of how far the big square has advanced without loading the loop with an extra test. The programme achieves this by using a POSIX signal to interrupt the search. The drawback is that there is no protection against data races. At the moment of interruption, the main loop might be part way through updating a single variable or between updating different variables. To attempt to mitigate this, the reporting function checks for inconsistencies between variables. In practice the problem does not seem to happen often, but it can happen. The following example is the output of the programme after pressing ctrl-\, where presumably the small square variable was being changed at the time.

...
^\--------- Report on signal 3
Note there is no protection against data races on this asynchronous signal; variables could be in an inconsistent state
current small square candidate 21332531531272794815123139589896758318980016372868461642221641431240520551508923111866278057477509999970333951483954914000615205457394402482912871417747870009705492
modulus 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512590323915826448266616741533232891220025698071647988317789125525821000237672311769577294966790885260094640627513288213434253048131773244591800441567386069459534350419701
big square back-calculated from small square + modulus 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512590345248357979539411556656372481116784017051664361186250767747462431478192863278500406833068942737604640597847239697389167048746978701986202924480257487207404360125193
fermat1: error 133 internal problem - back-calculated big square is not square

First implementation

An implementation of the first approach is is available as free, open-source software at /download/fermat1.cc on this website. It comes with absolutely no warranty. It is licensed under the GNU Public Licence (GPL) version 3.

This factorises a modulus defined in the source code, in the include file available at /download/examples.h. The value in examples.h is taken from a website. In October 2021 an article with the title “Attacking RSA keys”, dated 19 June 2019, with stated author Sjoerd Langkemper was at https://www.sjoerdlangkemper.nl/2019/06/19/attacking-rsa/. This contained a link to https://demo.sjoerdlangkemper.nl/rsa/notsoclosepq.pem from which I extracted this modulus.

In decimal it is 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512590323915826448266616741533232891220025698071647988317789125525821000237672311769577294966790885260094640627513288213434253048131773244591800441567386069459534350419701

This modulus can, of course, be changed and the software is intended to work with any modulus value of the form pq where p and q are large primes.

The SHA256 hashes of these two files are:

69c82c8e22ad3f6f3735a02ef0576d137a1366cbd08b3fa8c11b26b944ee9592  fermat1.cc
efc708b808f9ed865dcd1d58d80dead60cbace18fe964e389eb1b1cf7ba69932  modulus1.h

Performance measurements

The following shows the last part of the output of this programme running on a desktop machine.

...
--------- Report on successful completion
current small square candidate 7895324230118036470764587708336457979125285060284382914718665776645099853730433432960984258937033652288686113999085731142389274244728582027377814486856005883659928900
modulus 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512590323915826448266616741533232891220025698071647988317789125525821000237672311769577294966790885260094640627513288213434253048131773244591800441567386069459534350419701
big square back-calculated from small square + modulus 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512598219240056566303087506120941227678004823356708272700703844191597645337526042203010255951049822293746929313627287299165395437406017973173827819381872925465418010348601
initial delta 88630764540473205959718698236926695570576042994647335606355050796586358089813639184428585176769131659381497491429539411657271872058380454810249189293267472
current delta 88630764540473205959718698236926695570576042994647335606355050796586358089813639184428585176769131659381497491429539411657271872058380454810249901941907800
delta increase 712648640328
iterations 89081080041
seconds elapsed 4362
iterations per second 20422072
modulus, ie number factorised 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512590323915826448266616741533232891220025698071647988317789125525821000237672311769577294966790885260094640627513288213434253048131773244591800441567386069459534350419701
big square back-calculated from small square + modulus 490963276439300163974358078751564966935647623573945092674153919631095106803113960790445734183657340544356961673814947178422179864580766235512598219240056566303087506120941227678004823356708272700703844191597645337526042203010255951049822293746929313627287299165395437406017973173827819381872925465418010348601
big back-calculated from big square 22157691135118301489929674559231673892644010748661833901588762699146589522453409796107146294192282914845374372857384852914317968014595113702562475485476949
big back-calculated from delta 22157691135118301489929674559231673892644010748661833901588762699146589522453409796107146294192282914845374372857384852914317968014595113702562475485476949
Absolute increase in big presquare from initial value 178162160082
small 88855637019370000523373608745053305410561547591286137512157928530692601731284014830
prime factor p 22157691135118301489929674559231673892644010748661833901588762699146589611309046815477146817565891659898679783418932444200455480172523644395164206769491779
prime factor q 22157691135118301489929674559231673892644010748661833901588762699146589433597772776737145770818674169792068962295837261628180455856666583009960744201462119
good for you pq = modulus

It means factorisation took 4,362 seconds or nearly 1 hour 15 minutes.

Incidentally the output above illustrates the point from part 1, why on empirical grounds it does not seem feasible to implement Fermat factorisation by incrementing the small presquare rather than testing for a perfect square. The value 'small 888556370...' in this print-out refers to the final value of the small presquare, here of the order of 1083. The initial value was zero, when the first candidate big presquare was near the square root of the modulus. An increase of 83 orders of magnitude means it is not tractable to reach this final value by incrementing the small presquare, at least not if it is going to be incremented through all integer values, or even a significant fraction of them.

A web search for software for a speed comparison found YAFU (Yet Another Factorisation Utility), https://github.com/bbuhrow/yafu. This monolithic programme includes multiple different factorisation methods. A function zFermat() in factor/trialdiv.c implements Fermat factorisation. I extracted zFermat() and wrote a harness to call it.

On the same desktop machine this function from YAFU factorised the same modulus in about 13 seconds. That means it was over 330 times faster than the simple approach explored here.

The question of how YAFU is so fast and whether there is scope to make it even faster will be topics in part 3.

Related