A simple way to represent cryptographic key fingerprints
by Stephen Hewitt | Published 7 June 2020
Here is a proposal for a transparent way to represent a cryptographic key fingerprint as a list of words suitable for speaking over the telephone. The key could be a PGP/GPG key or some other cryptographic key.
Such a list of words could be written on a business card. More generally, it could be used for communicating any kind of binary string as a more user-friendly sequence of words.
Systems with similar goals have been published. What is novel here is the aim for simplicity. For example, it would be easy to convert the words back into hex by hand, using only Table 1. Other published systems are discussed below.
The basic principle
This system is different because it works at the level of letters.
Each word represents one byte.
67B4 007A FFFD = walnut barley onion lizard fifty friday
To convert from the word to the hex, each word is abbreviated to two letters, each letter representing one of the hex digits. Then the mapping defined in Table 1 is applied to convert from each of the two letters to a hex digit.
The abbreviation system is to take the first letter of the word as the first letter of the abbreviation. There are then two cases to consider, depending on whether that is a vowel or consonant.
If the first letter in the word is a consonant, then the second letter in the abbreviation is the first consonant in the next consonant cluster. For example walnut => wl, song => sn, reptile => rp, friday => fd, string => sn, style => sl, day => dy. (For these purposes y is defined as a consonant if and only if it directly follows one of the vowels aeiou or is at the start of the word.) If there is no following consonant cluster, then the first vowel in the word is taken, for example tree => te, tea => te, dye => dy, spa => sa. There are relatively few words where this happens.
If the first letter in the word is a vowel, then the second letter in the abbreviation is the next consonant found in the word. For example egg => eg, eel => el, early => er, ozone => oz.
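The two abbreviation cases can be sketched in Python as follows. This is only an illustration of the rules as described here, not a reference implementation. The function names are hypothetical, and applying the same y rule to words that start with a vowel is an assumption, since the text states it only for the consonant case.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Return True if word[i] counts as a consonant under the rules above."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # y is a consonant only at the start of the word
        # or directly after one of the vowels aeiou.
        return i == 0 or word[i - 1] in VOWELS
    return True


def abbreviate(word: str) -> str:
    """Reduce a word to its two-letter abbreviation."""
    word = word.lower()
    if not word.isalpha():
        raise ValueError(f"not a plain word: {word!r}")
    n = len(word)
    first = word[0]

    if is_consonant(word, 0):
        i = 0
        while i < n and is_consonant(word, i):      # skip the opening consonant cluster
            i += 1
        while i < n and not is_consonant(word, i):  # skip the vowels that follow it
            i += 1
        if i < n:
            return first + word[i]                  # first consonant of the next cluster
        for j in range(n):                          # no later cluster: take the first vowel
            if not is_consonant(word, j):
                return first + word[j]
        raise ValueError(f"no vowel in {word!r}")

    # The word starts with a vowel: take the next consonant found.
    # Assumption: the y rule applies here too.
    for j in range(1, n):
        if is_consonant(word, j):
            return first + word[j]
    raise ValueError(f"no consonant after the first letter in {word!r}")


for w in ["walnut", "string", "style", "day", "dye", "tree", "early", "ozone"]:
    print(w, "=>", abbreviate(w))
```

Words with no usable second letter are not covered by the description, so the sketch simply raises an error for them.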
The letters are then looked up individually in Table 1. For example, "ozone" is abbreviated to oz, and o => 0 and z => A, so the value of the byte in hex is 0A.
Although it is not the intended use, the coding is simple enough that many words can be decoded at a glance. To assist in this, the letter conversion table includes various mnemonics as follows. The most important of these is the first one.
- The letters a-f map to hex digits A-F respectively.
- o => 0
- n => 0 (n for nil or nothing)
- i => 1
- t => 2. The first number whose name starts with a 't' is two.
- m => 3. Letter m with three legs.
- r => 4. The only digit with a 'r' in its name is four.
- p => 5. From the Greek as in pentagon.
- h => 6. From the Greek as in hexagon.
- s => 8. If the loose ends of the 's' were joined together it would look like '8'.
- g => 9. Lower case g has a similar shape to 9. Also Giga is 10^9.
- x => A. The Roman numeral X is ten, which is A in hex.
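As a rough sketch of the lookup step, the letter mappings stated in the text can be collected into a partial table in Python. This is not the complete Table 1. It contains only the mnemonic pairs listed above plus z => A from the "ozone" example. The entries for w and l are an inference from the worked example 67B4 007A FFFD (walnut = 67, lizard = 7A), and the names used are hypothetical.

```python
# Letter-to-digit pairs stated in the text: the mnemonic list, plus z => A
# from the "ozone" example. This is NOT the complete Table 1.
STATED = {
    "a": "A", "b": "B", "c": "C", "d": "D", "e": "E", "f": "F",
    "o": "0", "n": "0", "i": "1", "t": "2", "m": "3", "r": "4",
    "p": "5", "h": "6", "s": "8", "g": "9", "x": "A", "z": "A",
}

# Inferred (an assumption): "walnut" = 67 and "lizard" = 7A in the worked
# example imply w => 6 and l => 7.
INFERRED = {"w": "6", "l": "7"}

PARTIAL_TABLE = {**STATED, **INFERRED}


def abbreviation_to_hex(abbreviation: str) -> str:
    """Convert a two-letter abbreviation to two hex digits."""
    if len(abbreviation) != 2:
        raise ValueError("an abbreviation must have exactly two letters")
    try:
        return "".join(PARTIAL_TABLE[letter] for letter in abbreviation.lower())
    except KeyError as err:
        raise KeyError(f"letter {err.args[0]!r} is not in the partial table") from None


def abbreviations_to_hex(abbreviations) -> str:
    """Convert a sequence of two-letter abbreviations to a hex string."""
    return "".join(abbreviation_to_hex(a) for a in abbreviations)


print(abbreviation_to_hex("oz"))  # 0A
# walnut barley onion lizard fifty friday, already abbreviated:
print(abbreviations_to_hex(["wl", "br", "on", "lz", "ff", "fd"]))  # 67B4007AFFFD
```

Several letters map to the same digit (for example o and n both give 0), which is what leaves room to choose a convenient word for each byte value.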
A vocabulary for speaking
A proposed vocabulary of 256 words for speaking is shown in Figure 1.
The selection criteria for these words were that they should be reasonably common and easy to spell and recognise, including for people speaking English as a second language.
Words with fewer letters are preferred but monosyllabic words are mostly avoided on the premise that in speech there is a greater chance of mishearing a monosyllabic word. Words that sound the same or similar to other English words are avoided. For example flower is not included because it is indistinguishable from flour.
To reduce the chance of a word being misheard, it helps if the listener is aware of what kind of word is likely, without having to memorise or even have seen the 256 actual words. For this reason in general the selected words have certain predictable characteristics.
Nouns are preferred to verbs or adjectives. Concrete concepts are preferred to abstract. So-called “functional” words (conjunctions etc, for example “because”) are never used. There are several themes:
- animals, plants and flowers and the natural world
- food and eating
- the occasional musical instrument
- names of countries
- words from the NATO phonetic alphabet (alpha, bravo, charlie ...)
- words related to the calendar - names of months, or days of the week
- the occasional number such as fifty.
Related published systems
A word list is given in the PGPfone: Pretty Good Privacy Phone Owner's Manual by Philip R. Zimmermann, dated 5 January 1996, which was available at a mit.edu URL in June 2020.
A similar manual in pdf form is at https://philzimmermann.com/docs/pgpfone10b7.pdf
Describing how the word list was chosen, the manual says:
“After the computer evolved the winning list, we looked at it. Yes, the words were phonetically distinct. But many of them looked like a computer picked them, not a human.”
... “Some words were deleted, and replaced by some human-chosen words.” ... “I'm not entirely happy with the word list.”
The system described uses one word per byte, but it has two 256-word lists, one with two-syllable words and one with three. Odd byte positions are supposed to use one list and even byte positions the other. The two-syllable word list starts “aardvark absurd accrue acme adrift”. The three-syllable list starts “adroitness adviser aggregate alkali almighty amulet”.
There is a paper about the selection of these words:
Whole-Word Phonetic Distances and the PGPfone Alphabet, Patrick Juola & Philip Zimmermann
It includes: “Clearly, a full evaluation of this work requires some empirical checking, which at this point has not yet been done.”
The word list is not used in GPG. An archived email list on gnupg.org includes discussion of a patch to add it, dated March 2001.
The word list is in the source code of GNU ZRTP (RFC 6189) on github.com.
There is a paper on arxiv.org, A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable, by Daniel Shawcross Wilkerson, dated 26 January 2009.
The proposal is to use a group of 5 letters of alternating consonants and vowels to represent 4 hex digits. The paper says "We suspect the fact that proquints are more information-dense than English is the reason proquints must be pronounced a bit more carefully than English - there's no such thing as a free lunch."
It includes examples of the representation of IP addresses:
127.0.0.1 lusab-babad
63.84.220.193 gutih-tugad
63.118.7.35 gutuk-bisog
140.98.193.141 mudof-sakat
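For comparison, here is a minimal Python sketch of the Proquint encoding of an IPv4 address. The consonant and vowel alphabets are the ones defined in the Proquint paper, not in this article, and the function names are hypothetical. Each 16-bit half of the address becomes five letters: 4 bits per consonant and 2 bits per vowel.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 consonants, 4 bits each
VOWELS = "aiou"                  # 4 vowels, 2 bits each


def quint(value: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def ipv4_to_proquint(address: str) -> str:
    """Encode a dotted-quad IPv4 address as two hyphen-separated quints."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {address!r}")
    a, b, c, d = octets
    return quint((a << 8) | b) + "-" + quint((c << 8) | d)


print(ipv4_to_proquint("127.0.0.1"))  # lusab-babad
```

Unlike the word system proposed here, the letters carry no meaning on their own, so decoding by hand needs both alphabets and some bit arithmetic.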
The Proquint paper does not include a reference to Bubble Babble (below) which had been published over seven years earlier.
There is an email dated 7 August 2001 with the subject "Bubble Babble", which seems to contain a draft RFC, in a mail archive on ietf.org.
Similar but adding “corrected 2011” is a draft at https://web.mit.edu/kenta/www/one/bubblebabble/spec/jrtrjwzi/draft-huima-01.txt
Like Proquint, Bubble Babble produces 5 letters, alternating vowels and consonants, from two bytes or 4 hex digits. Bubble Babble is a more complicated encoding and includes a checksum.
Bubble Babble is included in OpenSSH in ssh-keygen, where there is an option to use it to show the key fingerprint, for example:
ssh-keygen -B -f .ssh/id_dsa
1024 xetok-fylyk-vepec-cegec-muhul-cylen-resup-kucyr-ginar-vevev-doxox xxx@yyy (DSA)
IETF RFC 1751
A Convention for Human-Readable 128-bit Keys, D. McDonald, December 1994
The RFC defines a dictionary of 2048 short words, with each word representing 11 bits. An example given is:
EFF8 1F9B FBC6 5350 920C DD74 16DE 8009 = TROD MUTE TAIL WARM CHAR KONG HAAG CITY BORE O TEAL AWL
Not every “word” in the dictionary is English, for example PHI, MOS, MOE, RUBE, ULAN. There are words with identical sounds, for example SOW, SO, SEW, READ, REED, REID.