Manual encryption with a one-time pad revisited
by Stephen Hewitt | Published 18 March 2019
abc | def | ghi |
jkl | mno | pqr |
stu | vwx | yz3 |
Introduction
Here is a possibly novel way to encrypt and decrypt messages without a computer using the classic principle of the one-time pad.
The method presented here enciphers directly from the letters of the plain text without conversion to numbers. What may be novel is that it uses only the 3x3 grid shown in Table 1, whereas other published systems using letters directly have instead used a larger NxN table or substitution square to look up pairs of letters, where N is the number of letters in the alphabet. (See below for two such published systems)
This grid contains the letters of the alphabet in normal alphabetical order and can be written out by hand in a minute. It has also proven possible to encipher without the grid by mentally picturing it.
To my knowledge this method presented here is indeed novel, although this seems like a field where there may be techniques and knowledge which are never published.
Uses
Enciphering directly from letters is potentially convenient for splitting a strong password consisting entirely of letters of the type described in the earlier Clarion article How to remember a provably strong password: a new way using constrained choice.
A demonstration of splitting an example password from the article is given in the next section.
Splitting means creating two splits or shares of the password which when recombined yield the original password, but each of which on its own yields no information about the password at all.
For example keeping a written copy of each share in a different location might provide more security than simply writing the password down.
One way of generating the random letters of the key or one-time pad is with beads. Note that this would need 27 beads - the 26 letters of the alphabet and one more bead to represent '3'. The use of beads is described in general in the Clarion article How to use beads to generate random letters and numbers by hand.
How to use the grid
Table 1 consists of a 3x3 grid of cells where each cell contains three letters. For example the top left cell contains “abc”. The letters are coloured according to their position in the cell, with the first one red, the second one green and the third one blue. Cell position and colour are therefore equivalent.
To encipher with a one-time pad each successive letter of the plain text is combined with each successive letter of the key (one-time pad) to produce each successive letter of the cipher text.
To combine a pair letters using Table 1, you look at the row, column and cell position of both letters.
If both letters are in the same column of the grid, then the result is also in that column. Otherwise it is in the remaining column that neither of the letters are in.
If both letters are in the same row, then the result is also in that row. Otherwise it is in the remaining row that neither of the letters are in.
If both letters have the same cell position or colour, then the resulting letter also has that cell position or colour, otherwise it has the remaining cell position and colour that neither of the letters do.
Examples
H+N=T because H and T are in different rows and different columns so the result is in a different row and column to both of them. But they both have the same cell position (green, in the middle) so the result is also green and in the middle of the cell.
E+L=Y because E and L have everything different to each other - column, row and cell position, so the result has all these things different to both of them.
W+X=V because both letters are in the same cell (ie they have the same row and column) but they have different positions within the cell, so the result is in the same cell but in a different cell position.
I+I=I because combining two identical letters always results in a letter that is the same too, since under the rules it will be in the same row, the same column and the same cell position, making it the same letter.
Here is an example of splitting a strong password. The password is bxwg ftiv... (“box wig faithful ivy”...) from the Memory Walk example in How to remember a provably strong password: a new way using constrained choice.
Share 1 is a sequence of random symbols from the alphabet. Note this 27-symbol alphabet includes the number '3' in addition to the 26 letters.
Share 2 is the result of using the grid to combine each letter of the password with the corresponding letter of Share 1.
Password: bxwg ftiv bvmc mwpl ploy gmct glbw mrpe Share 1: gvhx rkau 3mo3 tiza oulb pqka vyml qmml Share 2: fwkk ubez mdno ijit kcro ylsl je3g lkjy
After generating Share 2 the written copy of the password can be destroyed and Share 1 and Share 2 can be kept securely in separate locations. To to retrieve the password, simply combine Share 1 and Share 2 using the same process and the result will be the original password.
Share 1: gvhx rkau 3mo3 tiza oulb pqka vyml qmml Share 2: fwkb ubez mdno ijit kcro ylsl je3g lkjy Password: bxwg ftiv bvmc mwpl ploy gmct glbw mrpe
Triplets
Apart from the convenience of knowing that the result of any letter combined with itself is always itself, it is useful to note that all the letters work in triplets.
A triplet is a group of three letters where combining two of them results in the third, for example ABC. Now in this system it is always the case that triplets are symmetrical. Combining any two letters of a triplet (in either order) will result in the third one. So for example A+B=C, B+A=C, B+C=A, A+C=B and so on.
Note that it is always the case that either all three letters in a triplet are different, or they are all identical. It is not possible to have just two out of three the same.
Theory of how it works
Inspiration for a novel operator
The exclusive OR or XOR operator, traditionally used in computers for one-time pad or other stream ciphers, is very easy for humans to calculate. It takes two binary digits or bits, which can both only be 0 or 1, and it produces an output bit which is 0 if the two input bits are the same but 1 if the two input bits are different. It turns out this is equivalent to modular 2 addition, but to a human it is intuitive and hardly seems like addition at all.
The XOR operator has two other desirable properties. First, it does not care which input bit is which, so you don't have to think about which bit came from your message and which came from your key. Second, applying it with the same key bit twice always gets back to the original message bit, meaning that decryption and encryption are the same operations with the same key.
The problem with using XOR for manual enciphering is that with an alphabet of 26 symbols each symbol would need 5 bits, meaning 5 operations.
Inspired by the idea of one kind of output for equal inputs and a different kind of output for unequal inputs, could there be a way to extend this intuitive operator so that not so many operations are required for each letter? Let's try increasing the base system from base 2 to base 3.
Defining a novel operator, the rule-of-three
This section will use the notation of upper-case Roman letters for literal letters from an alphabet but Greek letters for variables over an alphabet.
Suppose we have an alphabet with only three symbols A,B,C. Let us define a novel addition operator '+' such that when it operates on two symbols then if they are the same, then the result is the same. So for example
A + A = A
But if the symbols are different, then the result is different to both of them. In other words, the result is the only other symbol in the alphabet. So:
A + B = C
B + C = A
A + C = B
Let us call this novel operator the rule-of-three operator. It follows that a property of this operator is that adding the same symbol twice gets back to the original value. That is to say:
(α + β) + β = α ....... (Equation 1)
where variables α and β each represent any symbol in the alphabet.
This useful property, reciprocity, will allow us the convenience of being able to use the same operation to decrypt as we use to encrypt.
Another useful property is commutativity. That is to say that for all α and β in the alphabet:
α + β = β + α ....... (Equation 2)
This symmetry will allow us the convenience of not needing to mentally distinguish the one-time pad symbol from the from the text symbol as we apply the operator.
By the way, unlike XOR, this operator is not associative and (α + β) + γ is not necessarily equal to α + (β + γ). For our immediate purposes, this does not matter. (It might matter if it is used to split a password into more than two shares, when this property will mean that the shares have to be recombined in the correct order.)
So our one-time pad or key consists of a stream of randomly chosen symbols from the alphabet and we encipher the plain text by taking each letter in turn and applying the rule-of-three operator to it to produce the corresponding letter of the cipher text:
plaintext = CABBAA one-time pad = ACCBCB ciphertext = BBABBC
Of course for practical purposes we would like an alphabet with more than three symbols.
We will try to extend our use of the rule-of-three operator by defining an alphabet where each letter consists of a vector of component values and each component has three possible values, making it amenable to the operator.
An equivalent and alternative way of thinking of this is that we are going to encode a numeric value for each letter as a base 3 number in a way analogous to how a computer would represent it as a binary number. Then in the same way that a one-time pad could be implemented in a computer by applying the XOR operator to a vector of bits holding the value of the letter so we will apply the rule-of-three operator to a vector of trinary digits.
Let's require 3n letters in our practical alphabet, where n is an integer. This requirement ensures that applying the operator to any pair of letters always produces another valid letter. It does this because it means that every possible permutation of n trinary digits (or vector components, according to taste) represents a letter.
Here we are in luck, because the English alphabet has only 26 symbols. If we choose 27 symbols for our alphabet, corresponding to n=3, we can use the extra symbol to represent a space. For clarity here we will use the number '3' for this 27th symbol. So our alphabet will be, in groups of three:
ABC DEF GHI JKL MNO PQR STU VWX YZ3
So we will assume that n=3 for the rest of this article, but all the proofs would work with any value of n.
It's possible to imagine each letter as a point in three dimensional space (which is close to what the grid in Table 1 represents), or as a number written in base-3 with three digits. So every letter in our practical alphabet is represented as a vector of dimension n=3 with three possible values for each of its components. So a letter α in the practical alphabet can be written as a vector (αx, αy, αz)
We will extend the definition of our rule-of-three addition operator so that it works over such a vector alphabet (or you can think of it as the definition of a new operator, it really makes no difference)
Then by definition of the extended rule-of-three operator:
(αx, αy, αz ) + (βx, βy, βz ) = (αx + βx, αy + βy, αz + βz)
In other words, applying the operator to a pair of such a letters is defined as applying the simple operator above to each component independently.
It follows from this definition that the properties of commutativity and reciprocity apply to the operator over the practical alphabet too.
So to prove commutativity:
α + β = (αx + βx, αy + βy, αz + βz)
But by the commutativity of the simple operator in Equation 2, we can rewrite the right hand side
α + β = (βx + αx, βy + αy, βz + αz)
(α + β) + β = α
Similarly for reciprocity. Informally it is obvious that if every component of the vector returns to its original value, then the vector itself is its original value. More formally:
(α + β) + β = ((αx + βx) + βx, (αy + βy) + βy, (αz + βz) + βz)
But by applying the reciprocity of the simple operator in Equation 1 to each vector component, we can rewrite the right hand side:
(α + β) + β = (αx, αy, αz)
(α + β) + β = α
Triplets
Now suppose we have any two letters α, β and apply the operator to produce γ:
γ = α + β
It follows from the properties of reciprocity and commutativity of the operator that the following are also true
α = β + γ
β = γ + α
We can prove this because
γ = α + β
γ + β = (α + β) + β
But we know from the property of reciprocity that (α + β) + β = α, therefore
γ + β = α
This means that we can think of letters of the alphabet in triplets, where the addition of any two letters of a triplet always results in the third letter of the triplet.
The grid
The grid in Table 1 functions as a representation of three vector components. These are column position, row position and cell position, each of which has three possible values.
Related to some historical published methods
In his book Between Silk and Cyanide A codemaker's war 1941-1945 Leo Marks wrote that during World War II while working for Britain's “Special Operations Executive” he devised a manual one-time pad system which directly enciphered letters without an intervening step of converting them to numbers. He called this a “letter one-time pad” or “LOP”.
This worked by means of what he called a “substitution square” - a 26x26 grid serving as a lookup table for any pair of letters in the alphabet.
He demonstrated his idea to an expert from Bletchley Park, Commander Dudley Smith, who then told him “As a matter of fact letter one-time pads have been working very successfully for quite a long time.”
There does not seem to be any information in the book about this earlier system or any indication that Marks received any information about it.
There is a footnote (on page 250 of the HarperCollins “corrected paperback edition 2000”):
Even in 1998, when so much has been written about SOE that only its secrets remain, I am still credited with inventing the letter one-time pad. WOKs, yes - but with LOPs I was pre-empted, and I wish I knew by whom! Whoever you are, and wherever you may be, my apologies and thanks. L.M.
Dudley Smith pointed out to Marks that there was no security benefit to using a different substitution square for each agent as he had planned. Despite that, the book contains photographs with the caption “Some of SOE's silks” that show at least two different substitution squares printed on silk. Further, Appendix 2 reproduces a minute dated 2 November 1943 which includes an example with plaintext, key and ciphertext (HOUSE, ZVRBI and BIPJB) which together are not consistent with either of the squares legible in the photos.
The substitution squares in the book seem to be reciprocal in the sense above, and Marks drew attention to this: “I pointed out that O over L had produced Q, and that in the decoding process O over Q would produce L, and the same principle applied to all the other pairs of letters.”
However the squares are not commutative. If they had been, two of the potential ciphering mistakes listed by Marks in Appendix 2 would not have been possible.
Appendix 2 is titled Minute of 2 November 1943 from L.S. Marks, HQ Security & Planning Office .
The subject of the minute is “Indecipherable Messages on Letter One-Time Pads” and it starts:
Owing to the very simple construction of the letter one-time pad code it is most improbable that many indecipherable messages will be received. However, when these indecipherables occur, the following are the lines of attack which should be attempted.
It then provides an ordered list of ways to attempt deciphering, each based on a putative mistake made in enciphering. Of these, the following two could not arise in a commutative system.
- Attempt No 4
- “The Home Station should now try to assume that the agent has written the message on top of the one-time pad groups instead of beneath them.”...
- Attempt No 6
- “Assume that the agent, instead of taking the little letter as the cipher group and the large letter as the en clair, reverses the process and takes the little letter as the en clair group, and the large capital as the cipher group.”
In contrast, a document from the American military's NSA dated 1973 shows a substitution square that is both commutative and reciprocal. The square is introduced with the words “The oldest type still in substantial use is called DIANA. Here is a sample.”
A | ABCDEFGHIJKLMNOPQRSTUVWXYZ ZYXWVUTSRQPONMLKJIHGFEDCBA |
B | ABCDEFGHIJKLMNOPQRSTUVWXYZ YXWVUTSRQPONMLKJIHGFEDCBAZ |
C | ABCDEFGHIJKLMNOPQRSTUVWXYZ XWVUTSRQPONMLKJIHGFEDCBAZY |
D | ABCDEFGHIJKLMNOPQRSTUVWXYZ WVUTSRQPONMLKJIHGFEDCBAZYX |
E | ABCDEFGHIJKLMNOPQRSTUVWXYZ VUTSRQPONMLKJIHGFEDCBAZYXW |
F | ABCDEFGHIJKLMNOPQRSTUVWXYZ UTSRQPONMLKJIHGFEDCBAZYXWV |
G | ABCDEFGHIJKLMNOPQRSTUVWXYZ TSRQPONMLKJIHGFEDCBAZYXWVU |
H | ABCDEFGHIJKLMNOPQRSTUVWXYZ SRQPONMLKJIHGFEDCBAZYXWVUT |
I | ABCDEFGHIJKLMNOPQRSTUVWXYZ RQPONMLKJIHGFEDCBAZYXWVUTS |
J | ABCDEFGHIJKLMNOPQRSTUVWXYZ QPONMLKJIHGFEDCBAZYXWVUTSR |
K | ABCDEFGHIJKLMNOPQRSTUVWXYZ PONMLKJIHGFEDCBAZYXWVUTSRQ |
L | ABCDEFGHIJKLMNOPQRSTUVWXYZ ONMLKJIHGFEDCBAZYXWVUTSRQP |
M | ABCDEFGHIJKLMNOPQRSTUVWXYZ NMLKJIHGFEDCBAZYXWVUTSRQPO |
N | ABCDEFGHIJKLMNOPQRSTUVWXYZ MLKJIHGFEDCBAZYXWVUTSRQPON |
O | ABCDEFGHIJKLMNOPQRSTUVWXYZ LKJIHGFEDCBAZYXWVUTSRQPONM |
P | ABCDEFGHIJKLMNOPQRSTUVWXYZ KJIHGFEDCBAZYXWVUTSRQPONML |
Q | ABCDEFGHIJKLMNOPQRSTUVWXYZ JIHGFEDCBAZYXWVUTSRQPONMLK |
R | ABCDEFGHIJKLMNOPQRSTUVWXYZ IHGFEDCBAZYXWVUTSRQPONMLKJ |
S | ABCDEFGHIJKLMNOPQRSTUVWXYZ HGFEDCBAZYXWVUTSRQPONMLKJI |
T | ABCDEFGHIJKLMNOPQRSTUVWXYZ GFEDCBAZYXWVUTSRQPONMLKJIH |
U | ABCDEFGHIJKLMNOPQRSTUVWXYZ FEDCBAZYXWVUTSRQPONMLKJIHG |
V | ABCDEFGHIJKLMNOPQRSTUVWXYZ EDCBAZYXWVUTSRQPONMLKJIHGF |
W | ABCDEFGHIJKLMNOPQRSTUVWXYZ DCBAZYXWVUTSRQPONMLKJIHGFE |
X | ABCDEFGHIJKLMNOPQRSTUVWXYZ CBAZYXWVUTSRQPONMLKJIHGFED |
Y | ABCDEFGHIJKLMNOPQRSTUVWXYZ BAZYXWVUTSRQPONMLKJIHGFEDC |
Z | ABCDEFGHIJKLMNOPQRSTUVWXYZ AZYXWVUTSRQPONMLKJIHGFEDCB |
That substitution square is shown in Table 2. It is from page 22 of A HISTORY of U.S. COMMUNICATIONS SECURITY (U) (The David G. Boak Lectures) National Security Agency, Revised July 1973.
A photograph of essentially the same square printed on paper is included on this web page (“One-time Pad”), with the description “Below, on the left, a one-time pad booklet with Vigenere table from a Western agent, seized by the East-German MfS (Ministerium für Staatssicherheit or Stasi).”
The commutativity and reciprocity of this system means that it also has triplets (albeit of course different ones to the 3x3 grid).
According to one web page ENCRYPTION VIA ONE-TIME PADS By Pete McCollum, which claims that “a retired CIA old-timer” provided the information:
An experienced radio operator would memorize the entire table, and could take a look at the plain text, and key text, identify the triad from memory, and thus encipher as he sent code.
This memorisation was done by memorising “triads” (triplets).
The number of triplets to memorise in this system is 126 and they are listed in Table 2.
How does this number compare with the rule-of-three system?
It turns out that there are only 117 triplets to memorise in the rule-of-three system. They are listed in Table 1.
There are of course an additional 27 triplets where every letter in the triplet is identical: aaa, bbb, ccc, and so on. These hardly need to be memorised.
So despite having an extra symbol in the alphabet, there are less triplets to memorise than the 26x26 “DIANA” system.
If that seems a paradox, it can be resolved by the following argument. A principle of the rule-of-three system is that all the letters in a triplet are different, which means that every triplet allows the look-up of three different pairs of letters. For example the ABC triplet allows look-up of AB, BC and AC.
In contrast, the 26x26 system includes triplets with double letters, such as XBB. A triplet with double letters can be used to look up only two different pairs of letters, in the case of XBB for example only XB and BB, meaning on average less pairs are covered by a triplet so more triplets are needed per pair.
For those who would like to calculate the number of rule-of-three triplets without listing them and counting, here is a way. In a 27-symbol alphabet there are 27x26 ordered pairs of symbols where the symbols in the pair are different. For every such pair of symbols, there will be a unique third symbol that completes a triplet. However if we select all these pairs from the alphabet we will have included every triplet in all of its possible orderings. Since to count triplets we don't care about the order of symbols within them, we need to divide by the number of ways of permuting 3 objects = 3x2. So the number of triplets is 26 x 27 / (3 x 2) = 117.
Personal experience using the grid
Using special software to drill you, I spent several hours practising. The drill software chooses a random pair of letters and asks you the result of combining them.
My general observations were that while looking at the 3x3 grid it took on average about 6 seconds per operation, and I could do something like 50 before making a mistake, although on occasion I got to nearly 200.
When I tried without the grid the time increased to around 9 - 10 seconds per operation.
After an hour or two's accumulated practice, you have probably noticed and memorised some triplets without trying.
For example, the first three vowels of the alphabet, AEI, form a triplet across the top row. By serendipity the vowels IOU form another triplet down the diagonal.
As soon as I was drilled on it, I noticed that the last letter in each cell in the middle column spells FOX, and I have remembered it ever since. A similar triplet within the middle column is MEX, for which the mnemonic is, obviously, Mexico. Another memorable triplet is ELY, the famous cathedral city, 15 miles north of Cambridge.
Without conscious intent you notice things like this and it means that sometimes you know the answer without looking at the grid.
However, in order to encipher without using the grid at all, I attempted to memorise it, so that I could picture it. The simplicity of its layout enables you to re-construct in your head any parts that are a bit hazy, but it helps if you have a least some cells that you just know.
For me one of these is the central cell MNO, which I remember with the mnemonic manganese oxide. To its left is JKL which has the mnemonic jackal.
Some cells were easier to construct a mnemonic for than others. ABC and DEF do not seem to need a mnemonic at all. GHI was a little more difficult...
The nature of the rule-of-three operator means that in many cases the three cells of the three letters will be in a straight line - either on a diagonal (eg AMY) or a row (eg JNR) or a column (eg GPY). The point is that looking at the grid, either physically or in your mind's eye, you can in many cases see instinctively how to complete the line.
There is a complication that sometimes make this more difficult. Let's call it the “broken diagonal”. An example is GJV. I learned to think of these in terms of chess moves. One of the cells will be in a corner of the grid, and the other two cells will both be a knight's move away from it, both against the edge of the grid, one in the middle of a row and the other in the middle of column.
The idea of a “broken diagonal” comes from another way of looking at the situation which is to assume that the all three cells are actually on a diagonal but the diagonal reaches the edge of the grid and has had to wrap round, re-appearing at the opposite side of the grid.
Related
- How to remember a provably strong password: a new way using ‘constrained choice’ July 2018, Stephen Hewitt. The 2nd Clarion data privacy article
- How to use beads to generate random letters and numbers by hand 22 September 2017, Stephen Hewitt. The first Clarion data privacy article
- How to remember a 128-bit key using ‘constrained choice’ August 2018, Stephen Hewitt. The 3rd Clarion data privacy article
- Lepron, a project to develop pronounceable pseudowords for representing binary strings - Part 1: introduction and goals May 2021 An empirical investigation by Stephen Hewitt
- Lepron project part 2: towards public key authentication without central authority using names made of pseudowords August 2021 An open source project by Stephen Hewitt
- Lepron project part 3: A first attempt at pronounceable passwords using dice for quantified strength September 2021 An open source project by Stephen Hewitt
- An idea for public key authentication from a name without certificates or central authority May 2021 Stephen Hewitt
- A simple way to represent cryptographic key fingerprints 7 June 2020, Stephen Hewitt. The 5th Clarion data privacy article
- Empirical explorations of faster Fermat factorisation, part 1 A technical article
External links
- A HISTORY of U.S. COMMUNICATIONS SECURITY (U) (The David G. Boak Lectures) National Security Agency, USA, Revised July 1973