# Empirical explorations of faster Fermat factorisation, part 1

by Stephen Hewitt | Published 13 February 2022

## Introduction

This is the first in a planned series of technical articles presenting original work towards an open-source, free software implementation of the Fermat factorisation algorithm optimised for speed.

Fermat factorisation here means factorising a large number pq which is itself the product of two primes, p and q. This kind of number is sometimes called a ‘semiprime’ but that is evidently a misnomer, since it is not prime.

Factorisation of such numbers has received attention in the last decades because of RSA public key cryptography. The modulus of the public key is usually of the form pq and the security of RSA depends on the difficulty of factorising this modulus. For this reason, in these articles I will refer to the number to be factorised as the “modulus”, represented by algebraic variable N or n.

These articles will include some assertions or theorems which seem to hold empirically, as demonstrated by a computer programme, but for which I have yet to find a proof.

This first article introduces the idea and describes the first, simple approach. Further articles will present the software developed and more sophisticated approaches.

## Fermat factorisation

Fermat factorisation here means finding two squares (of integers) such that the modulus
is the difference between the two. These squares are the big square
b^{2} and the small square s^{2}. So the goal is to find integers b and s such that:

n = b^{2} - s^{2}

Now this can be written as

n = (b-s)(b+s)

which means that we have found factors of n.

The only approach pursued here is to start with the lowest possible guess for b, calculate b^{2} - n and keep increasing b it until
b^{2} - n is a perfect square.
This first guess will be that b is the lowest integer not smaller than the square root of n. If n did in fact have an integer square root, this would correspond to s = 0.

Supposedly, there is another approach. This involves simultaneously maintaining
a guess for s, calculating b^{2} - s^{2} and trying to make that difference equal to n.
So instead of checking whether something is a perfect
square, you have to check whether the difference between two known squares is the modulus.
I cannot see how such a method can be viable, at least not in its simple form. The reason is that the
the amount by which s must increase is orders of magnitude more than the increase in b.
The time taken in incrementing s and generating its square would therefore make this prohibitive. I will illustrate
this with figures from a real example later.

## Terminology

**Modulus**- For the reason explained above the number to be factorised will be referred to as the modulus.
**Presquare**- The root b of the big square b
^{2}will be called the big presquare and likewise s, the small presquare. **Semiprime**- This term will not be used.

## First exploration

As a way of handling large integers, all the approaches explored here used GMP, the GNU Multiple Precision Arithmetic Library. The first attempt had optimisations, all of which are illustrated in the following code fragment. This is the main search loop, which has been optimised to consist of just two additions and a test for a perfect square.

while (!mpz_perfect_square_p(candidate)) { mpz_add(candidate, candidate, delta); mpz_add_ui(delta, delta, 8); }

Here all the functions with names starting mpz_ are GMP library functions. The variables `candidate` and `delta`
are of library type mpz_t, type that can hold arbitrarily large integers.

The first optimisation is to generate each value
of b^{2} from the previous value without having to square a large number again.

This was based on the fact that

(b + 1)^{2} = b^{2} + 2b + 1

So if b^{2} is the current big square, then to
to generate the next big square
all we need to do to
is to increase it by 2b + 1.

The second idea for optimisation extends this idea. Since what we really want is not b^{2} but b^{2} - n,
we don't actually need to generate b^{2} at all. Given the value of
b^{2} - n at the last iteration, the same increase can be added directly to that. This is the value held in
`candidate`. It means that the value of the modulus is only ever subtracted once, before the loop starts.

The third optimisation is the realisation that we can predict from the modulus whether the big presquare is going to be odd or even. This will be explained in the next article, but in this loop it means we increase the big presquare b by 2 each iteration rather than by 1.

The `delta` variable in the loop is updated to always contain the increase from the current big square to the next one.
Table 1 below illustrates by example why the loop adds 8 to `delta` and then uses delta as the increase to the next square.

presquare | square | delta to next | delta, even presquares | delta, odd presquares |
---|---|---|---|---|

0 | 0 | 1 | 4 | - |

1 | 1 | 3 | - | 8 |

2 | 4 | 5 | 12 | - |

3 | 9 | 7 | - | 16 |

4 | 16 | 9 | 20 | - |

5 | 25 | 11 | - | 24 |

6 | 36 | 13 | 28 | - |

7 | 49 | 15 | - | 32 |

8 | 64 | 17 | 36 | - |

The fourth optimisation was to allow inspection at any time of how far the big square has advanced while the search is in progress, but without slowing the loop with additional code.

The solution was to use a POSIX signal. The following code fragment sets up a signal handling function report().

signal(SIGQUIT, report);

In Linux pressing ctrl-\ (control back slash) on the keyboard generates a SIGQUIT, signal 3.
The programme catches this in report() and prints a report of the current value of the big square and presquare
and the number of iterations so far etc, all of which can be back-calculated from
`delta` and `candidate`.

It would be possible to write a programme to build on this idea, so that if you wanted to shut down the computer you could print out the search position so far, then resume later at the value of the big presquare previously reached.

The next article will present the free, open-source software and some performance measurements.

## Related

- Empirical explorations of faster Fermat factorisation, part 2 February 2022, A technical article - simple algorithm in GPL software
- Empirical explorations of faster Fermat factorisation, part 3 March 2022, A technical article - experiments with filtering
- How to remember a 128-bit key using ‘constrained choice’ August 2018, Stephen Hewitt. The 3rd Clarion data privacy article
- Manual encryption with a one-time pad revisited March 2019, Stephen Hewitt. The 4th Clarion data privacy article
- How to remember a provably strong password: a new way using ‘constrained choice’ July 2018, Stephen Hewitt. The 2nd Clarion data privacy article
- An idea for public key authentication from a name without certificates or central authority May 2021 Stephen Hewitt
- Lepron, a project to develop pronounceable pseudowords for representing binary strings - Part 1: introduction and goals May 2021 An empirical investigation by Stephen Hewitt