Innovation and design

A simple way to represent cryptographic key fingerprints

by Stephen Hewitt | Published 7 June 2020

Introduction

Here is a proposal for a transparent way to represent a cryptographic key fingerprint as a list of words suitable for speaking over the telephone. The key could be a PGP/GPG key or some other cryptographic key.

Such a list of words could be written on a business card. More generally, it could be used for communicating any kind of binary string as a more user-friendly sequence of words.

Systems with similar goals have been published. What is novel here is the aim for simplicity. For example, it would be easy to convert the words back into hex by hand, using only Table 1. Other published systems are discussed below.

The basic principle

This system is different because it works at the level of letters.

Hex	Letters
0	no
1	ikqy
2	t
3	mu
4	r
5	p
6	hw
7	l
8	s
9	g
A	ajxz
B	b
C	c
D	d
E	ev
F	f

Table 1: A mapping between the value of a hex digit and letters in the abbreviation of a word. The word represents a byte value whose hex digits can be obtained using this mapping.

Each word represents one byte.

An example:

67B4 007A FFFD = walnut barley onion lizard fifty friday

To convert from the word to the hex, each word is abbreviated to two letters, each letter representing one of the hex digits. Then the mapping defined in Table 1 is applied to convert from each of the two letters to a hex digit.

The abbreviation system is to take the first letter of the word as the first letter of the abbreviation. There are then two cases to consider, depending on whether that is a vowel or consonant.

If the first letter in the word is a consonant, then the second letter in the abbreviation is the first consonant in the next consonant cluster. For example walnut => wl, song => sn, reptile => rp, friday => fd, string => sn, style => sl, day => dy. (For these purposes y is defined as a consonant if and only if it directly follows one of the vowels aeiou or is at the start of the word.) If there is no following consonant cluster, then the first vowel in the word is taken, for example tree => te, tea => te, dye => dy, spa => sa. There are relatively few words where this happens.
If the first letter in the word is a vowel, then the second letter in the abbreviation is the next consonant found in the word. For example egg => eg, eel => el, early => er, ozone => oz.

The letters are then looked up individually in Table 1. So, for example, in the case of "ozone" abbreviated to oz, o => 0 and z => A, so the value of the byte in hex is 0A.
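The abbreviation and lookup steps above can be sketched in Python. This is an illustrative reading of the rules as stated in the text, not a reference implementation; the function names (`is_consonant`, `abbreviate`, `word_to_hex`, `words_to_hex`) are assumptions made for this example.

```python
# Table 1 as a lookup from hex digit to the letters that stand for it.
TABLE_1 = {
    "0": "no", "1": "ikqy", "2": "t", "3": "mu",
    "4": "r", "5": "p", "6": "hw", "7": "l",
    "8": "s", "9": "g", "A": "ajxz", "B": "b",
    "C": "c", "D": "d", "E": "ev", "F": "f",
}
# Inverted table: each of the 26 letters maps to exactly one hex digit.
LETTER_TO_HEX = {ch: digit for digit, letters in TABLE_1.items() for ch in letters}

VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """y counts as a consonant only at the start of the word or
    directly after one of the vowels aeiou, as the text specifies."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or word[i - 1] in VOWELS
    return True


def abbreviate(word: str) -> str:
    """Return the two-letter abbreviation of a word."""
    w = word.lower()
    n = len(w)
    if w[0] in VOWELS:
        # Vowel start: the second letter is the next consonant in the word.
        for i in range(1, n):
            if is_consonant(w, i):
                return w[0] + w[i]
        raise ValueError(f"no consonant in {word!r}")
    # Consonant start: skip the rest of the opening consonant cluster...
    i = 1
    while i < n and is_consonant(w, i):
        i += 1
    first_vowel = i
    # ...then skip the vowels that follow it.
    while i < n and not is_consonant(w, i):
        i += 1
    if i < n:
        return w[0] + w[i]  # first consonant of the next cluster
    if first_vowel < n:
        return w[0] + w[first_vowel]  # no later cluster: use the first vowel
    raise ValueError(f"cannot abbreviate {word!r}")


def word_to_hex(word: str) -> str:
    """Decode one word to the two hex digits of the byte it represents."""
    return "".join(LETTER_TO_HEX[ch] for ch in abbreviate(word))


def words_to_hex(words: str) -> str:
    """Decode a space-separated list of words to a hex string."""
    return "".join(word_to_hex(w) for w in words.split())


print(words_to_hex("walnut barley onion lizard fifty friday"))  # 67B4007AFFFD
```

Because the decoder works from the letters alone, it needs no word list: any word that abbreviates to the right two letters decodes to the same byte.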

Although it is not the intended use, the coding is simple enough that many words can be decoded at a glance. To assist in this, the letter conversion table includes various mnemonics, as follows. The most important of these is the first one.

The letters a-f map to hex digits A-F respectively.
o => 0
n => 0 (n for nil or nothing)
i => 1
t => 2. The first number whose name starts with a 't' is two.
m => 3. Letter m with three legs.
r => 4. The only digit whose name ends in 'r' is four.
p => 5. From the Greek as in pentagon.
h => 6. From the Greek as in hexagon.
s => 8. If the loose ends of the 's' were joined together it would look like '8'.
g => 9. Lower case g has a similar shape to 9. Also Giga is 10⁹.
x => A. X is the Roman numeral for ten, which is A in hex.

A vocabulary for speaking

A proposed vocabulary of 256 words for speaking is shown in Figure 1.

The selection criteria for these words were that they should be reasonably common and easy to spell and recognise, including for people speaking English as a second language.

Words with fewer letters are preferred, but monosyllabic words are mostly avoided on the premise that in speech there is a greater chance of mishearing a monosyllabic word. Words that sound the same as, or similar to, other English words are avoided. For example flower is not included because it is indistinguishable from flour.

To reduce the chance of a word being misheard, it helps if the listener is aware of what kind of word is likely, without having to memorise or even have seen the 256 actual words. For this reason in general the selected words have certain predictable characteristics.

Nouns are preferred to verbs or adjectives. Concrete concepts are preferred to abstract. So-called “functional” words (conjunctions etc, for example “because”) are never used. There are several themes:

animals, plants and flowers and the natural world
food and eating
the occasional musical instrument
names of countries
words from the NATO phonetic alphabet (alpha, bravo, charlie ...)
words related to the calendar - names of months, or days of the week
the occasional number such as fifty.

00 onion 01 oyster 02 nutmeg 03 number 04 orange 05 napkin 06 newt 07 olive 08 oscar 09 nigeria 0A ozone 0B nobody 0C octopus 0D needle 0E november 0F office
10 india 11 keyboard 12 italy 13 yemen 14 iran 15 kipper 16 kiwi 17 kilo 18 question 19 yoga 1A quiz 1B kebab 1C iceberg 1D kidnap 1E ivy 1F knife
20 tango 21 tequila 22 tattoo 23 tomato 24 turkey 25 teapot 26 tower 27 tulip 28 tuesday 29 tiger 2A taxi 2B table 2C tractor 2D tadpole 2E tree 2F toffee
30 uniform 31 mike 32 mother 33 mimosa 34 marble 35 maple 36 mahogany 37 melon 38 mouse 39 magic 3A mexico 3B mobile 3C mucus 3D meadow 3E maverick 3F mafia
40 rainbow 41 ray 42 retina 43 romeo 44 rural 45 reptile 46 reward 47 relax 48 risotto 49 roger 4A rejoice 4B robin 4C rocket 4D radish 4E river 4F rifle
50 peanut 51 pakistan 52 potato 53 pumpkin 54 peru 55 papa 56 powder 57 pelican 58 pasta 59 pigeon 5A pizza 5B pebble 5C picnic 5D pedal 5E privacy 5F puffin
60 honey 61 wok 62 hotel 63 woman 64 heron 65 happy 66 hawk 67 walnut 68 whisky 69 haggis 6A wax 6B web 6C hack 6D hedgehog 6E havoc 6F wife
70 lion 71 liquid 72 lettuce 73 lima 74 lorry 75 laptop 76 lawn 77 lilac 78 lesson 79 lager 7A lizard 7B lobster 7C locust 7D lady 7E lever 7F leafy
80 sunset 81 soya 82 saturday 83 samba 84 sierra 85 supper 86 sahara 87 salad 88 sushi 89 sugar 8A sixty 8B subway 8C second 8D spider 8E seven 8F sofa
90 ginger 91 grey 92 guitar 93 gammon 94 garlic 95 gypsy 96 growl 97 golf 98 goose 99 gaga 9A gazelle 9B globe 9C gecko 9D gadget 9E gravy 9F gift
A0 animal A1 joy A2 atom A3 amazon A4 aroma A5 apple A6 jawbone A7 zulu A8 asia A9 august AA axle AB zebra AC acorn AD adder AE java AF africa
B0 banana B1 bikini B2 butter B3 bamboo B4 barley B5 biopsy B6 bowtie B7 balcony B8 biscuit B9 bagel BA boxer BB baby BC bacon BD badger BE bravo BF beef
C0 china C1 coyote C2 cattle C3 camel C4 charlie C5 copper C6 cow C7 celery C8 cheese C9 cage CA crazy CB cuba CC cocoa CD cider CE caviar CF cafe
D0 donkey D1 day D2 diet D3 damson D4 dairy D5 depot D6 drowsy D7 delta D8 daisy D9 dog DA dozen DB debut DC doctor DD deduct DE dove DF daffodil
E0 vanilla E1 voyage E2 vitamin E3 emu E4 virgin E5 epic E6 vehicle E7 violin E8 eskimo E9 egg EA vixen EB ebony EC echo ED vodka EE evening EF effort
F0 finger F1 foyer F2 future F3 family F4 ferret F5 flipper F6 flow F7 falcon F8 fossil F9 frog FA foxtrot FB fabric FC focus FD friday FE flavour FF fifty

Figure 1: A 256-word vocabulary for saying all possible values of a byte or a pair of hex digits. The average word length is 5.4 letters and the maximum is 8. Apparent at first glance is a correlation between the first digit of the hex code and the first letter of the word, a consequence of the mapping in Table 1, explained in the text.
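Going the other way, from bytes to words, is a plain table lookup in the vocabulary of Figure 1. The sketch below is one possible way to do it in Python; the names `VOCABULARY`, `bytes_to_words` and `words_to_bytes` are assumptions for this example, and decoding here uses a reverse dictionary rather than the letter rules.

```python
# The 256 words of Figure 1 in byte order 00..FF, one row per first hex digit.
VOCABULARY = """
onion oyster nutmeg number orange napkin newt olive oscar nigeria ozone nobody octopus needle november office
india keyboard italy yemen iran kipper kiwi kilo question yoga quiz kebab iceberg kidnap ivy knife
tango tequila tattoo tomato turkey teapot tower tulip tuesday tiger taxi table tractor tadpole tree toffee
uniform mike mother mimosa marble maple mahogany melon mouse magic mexico mobile mucus meadow maverick mafia
rainbow ray retina romeo rural reptile reward relax risotto roger rejoice robin rocket radish river rifle
peanut pakistan potato pumpkin peru papa powder pelican pasta pigeon pizza pebble picnic pedal privacy puffin
honey wok hotel woman heron happy hawk walnut whisky haggis wax web hack hedgehog havoc wife
lion liquid lettuce lima lorry laptop lawn lilac lesson lager lizard lobster locust lady lever leafy
sunset soya saturday samba sierra supper sahara salad sushi sugar sixty subway second spider seven sofa
ginger grey guitar gammon garlic gypsy growl golf goose gaga gazelle globe gecko gadget gravy gift
animal joy atom amazon aroma apple jawbone zulu asia august axle zebra acorn adder java africa
banana bikini butter bamboo barley biopsy bowtie balcony biscuit bagel boxer baby bacon badger bravo beef
china coyote cattle camel charlie copper cow celery cheese cage crazy cuba cocoa cider caviar cafe
donkey day diet damson dairy depot drowsy delta daisy dog dozen debut doctor deduct dove daffodil
vanilla voyage vitamin emu virgin epic vehicle violin eskimo egg vixen ebony echo vodka evening effort
finger foyer future family ferret flipper flow falcon fossil frog foxtrot fabric focus friday flavour fifty
""".split()

# Reverse lookup from word to byte value.
WORD_TO_BYTE = {word: value for value, word in enumerate(VOCABULARY)}


def bytes_to_words(data: bytes) -> str:
    """Encode each byte as the word at that index in the vocabulary."""
    return " ".join(VOCABULARY[b] for b in data)


def words_to_bytes(words: str) -> bytes:
    """Decode by dictionary lookup; raises KeyError for an unknown word."""
    return bytes(WORD_TO_BYTE[w] for w in words.lower().split())


fingerprint = bytes.fromhex("67B4 007A FFFD")
spoken = bytes_to_words(fingerprint)
print(spoken)  # walnut barley onion lizard fifty friday
print(words_to_bytes(spoken).hex().upper())  # 67B4007AFFFD
```

A dictionary lookup rejects any word outside the list, which may be preferable in software; decoding by the letter rules would instead accept any word with the right abbreviation.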

Related published systems

PGP Words

A word list is given in the PGPfone: Pretty Good Privacy Phone Owner's Manual by Philip R. Zimmermann, dated 5 January 1996, which was available at a mit.edu URL in June 2020.

A similar manual in PDF form is at https://philzimmermann.com/docs/pgpfone10b7.pdf

Describing how the word list was chosen, the manual says:

“After the computer evolved the winning list, we looked at it. Yes, the words were phonetically distinct. But many of them looked like a computer picked them, not a human.”

... “Some words were deleted, and replaced by some human-chosen words.” ... “I'm not entirely happy with the word list.”

The system described uses one word per byte, but it has two 256-word lists, one of two-syllable words and one of three-syllable words. Odd byte positions are supposed to use one list and even byte positions the other. The two-syllable word list starts “aardvark absurd accrue acme adrift”. The three-syllable list starts “adroitness adviser aggregate alkali almighty amulet”.

There is a paper about the selection of these words:

Whole-Word Phonetic Distances and the PGPfone Alphabet, Patrick Juola & Philip Zimmermann

It includes: “Clearly, a full evaluation of this work requires some empirical checking, which at this point has not yet been done.”

The word list is not used in GPG. An archived email list on gnupg.org includes discussion of a patch to add it, dated March 2001.

The word list is in the source code of GNU ZRTP (RFC 6189) on github.com.

Proquints

There is a paper on arxiv.org, A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable, by Daniel Shawcross Wilkerson, dated 26 January 2009.

The proposal is to use a group of 5 letters of alternating consonants and vowels to represent 4 hex digits. The paper says "We suspect the fact that proquints are more information-dense than English is the reason proquints must be pronounced a bit more carefully than English - there's no such thing as a free lunch."

It includes examples of the representation of IP addresses:

127.0.0.1       lusab-babad
63.84.220.193   gutih-tugad
63.118.7.35     gutuk-bisog
140.98.193.141  mudof-sakat
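The IP address examples above can be reproduced with a short Python sketch. The two alphabets used here (16 consonants carrying 4 bits each and 4 vowels carrying 2 bits each) are quoted from memory of Wilkerson's proposal and are not stated in this article, so treat them as an assumption; the function names are also illustrative.

```python
import ipaddress

# Assumed alphabets from the Proquint proposal:
# 16 consonants (4 bits each) and 4 vowels (2 bits each).
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def proquint(value: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant,
    taking 4 + 2 + 4 + 2 + 4 bits from the most significant end."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def ipv4_to_proquints(address: str) -> str:
    """Encode a dotted-quad IPv4 address as two hyphen-separated proquints."""
    n = int(ipaddress.IPv4Address(address))
    return proquint(n >> 16) + "-" + proquint(n & 0xFFFF)


print(ipv4_to_proquints("127.0.0.1"))  # lusab-babad
```

Each group of 5 letters carries 16 bits, against 8 bits per word in the system proposed here.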

The Proquint paper does not include a reference to Bubble Babble (below), which had been published over seven years earlier.

Bubble Babble

There is an email dated 7 August 2001 with Subject "Bubble Babble" which seems to contain a draft RFC, in a mail archive on ietf.org

A similar draft, with “corrected 2011” added, is at https://web.mit.edu/kenta/www/one/bubblebabble/spec/jrtrjwzi/draft-huima-01.txt

See https://kenta.blogspot.com/2011/03/jrtrjwzi-errors-in-bubble-babble.html

Like Proquint, Bubble Babble produces 5 letters, alternating vowels and consonants, from two bytes or 4 hex digits. Bubble Babble is a more complicated encoding and includes a checksum.

Bubble Babble is included in OpenSSH in ssh-keygen, where there is an option to use it to show the key fingerprint, for example:

ssh-keygen -B -f .ssh/id_dsa
1024 xetok-fylyk-vepec-cegec-muhul-cylen-resup-kucyr-ginar-vevev-doxox xxx@yyy (DSA)

IETF RFC 1751

A Convention for Human-Readable 128-bit Keys, D. McDonald, December 1994

This defines a dictionary of 2048 short words, with each word representing 11 bits. An example given is:

EFF8 1F9B FBC6 5350 920C DD74 16DE 8009 = TROD MUTE TAIL WARM CHAR KONG HAAG CITY BORE O TEAL AWL

Not every “word” in the dictionary is English, for example PHI, MOS, MOE, RUBE, ULAN. There are words with identical sounds, for example SOW, SO, SEW, READ, REED, REID.

