Lepron project part 3: A first attempt at pronounceable passwords using dice for quantified strength

by Stephen Hewitt | Published 8 September 2021

Background

This is the third article in a planned series of technical articles about Lepron, an open-source, free software project to develop a system for representing arbitrary binary strings as pronounceable pseudowords. This project is for the benefit of anyone who wants to use it or develop it further.

The first article [LEPRON] introduced three motivations for the project. This article explores an aspect of the first of these, which was to generate pronounceable passwords of quantified strength.

Introduction

This article presents tables for creating a pronounceable password by casting dice, without using a computer.

In using dice the method resembles those described in [DICEWARE] or [EFF] but the difference is that the result is a pseudoword rather than a sequence of real words or passphrase.

Another difference is that this article incorporates the idea from [PW] that, for password strength, the mapping between number and pseudoword does not have to be bijective. It is enough for the mapping from pseudoword to number to be surjective, so that multiple pseudowords can map to the same number. Note that this is therefore a departure from the original goal of a bijective mapping proposed in [LEPRON].

The surjective mapping allows someone using these tables some strictly limited choice. From a linguistic perspective the method used to produce these pseudowords is primitive, and the presence of this choice might mitigate this by enabling a user to reject the more ungrammatical word forms. It is also possible that choice may help with personal memorisation, as in [PW], though this aspect has not been explored here.

The tables have been generated using the simple free software provided in the Appendix, with additional hand-tweaking of the output, but no software is needed to use them. The method used to form pseudowords is the same as that described in [LEPRON2], and the software used is a modified version of the software presented in that article.

How it works

Each pseudoword produced using this method takes 11 dice results. It is constructed in the simple way described in [LEPRON2], briefly recapitulated here.

The pseudoword is built from segments, each of which is essentially (but not exactly) a cluster of consonants or a cluster of vowels. The reason they are not exactly that is that a ‘u’ following a ‘q’ is considered to be part of the same consonant cluster, and that trailing vowels on the end of the word are considered to be part of the final consonant cluster.

There are four segment types. One is a vowel cluster and three are consonant clusters. The consonant clusters are defined as a start cluster, which occurs at the start of the word, a middle cluster, which occurs between vowel clusters, and an end cluster, which occurs at the end of the word and always includes any trailing vowels on the end of the word.
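The segmentation rules just described can be sketched in a few lines of Python. This is only an illustration, not the Lepron software itself: the function name `segment` is hypothetical, and the sketch assumes that the vowels are exactly a, e, i, o and u, so that ‘y’ is always treated as a consonant.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant in this sketch


def segment(word):
    """Split a word into alternating consonant and vowel clusters.

    Two special rules from the text are applied:
    - a 'u' directly after a 'q' belongs to the consonant cluster;
    - trailing vowels are merged into the final consonant cluster.
    """
    clusters = []  # each item is [is_vowel_cluster, text]
    previous = ""
    for ch in word.lower():
        is_vowel = ch in VOWELS and not (ch == "u" and previous == "q")
        if clusters and clusters[-1][0] == is_vowel:
            clusters[-1][1] += ch  # same kind of letter: extend the cluster
        else:
            clusters.append([is_vowel, ch])  # kind changed: start a new cluster
        previous = ch
    # Trailing vowels become part of the final consonant cluster.
    if len(clusters) >= 2 and clusters[-1][0]:
        tail = clusters.pop()
        clusters[-1][1] += tail[1]
    return [text for _, text in clusters]


print(segment("quota"))  # ['qu', 'o', 'ta']
```

In the usage line, ‘quota’ exercises both special rules: the ‘u’ stays with the ‘q’, and the final ‘a’ is absorbed into the end cluster.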

The word is then constructed from a template of clusters as follows:

<start><vowel><middle><vowel><middle><vowel><end>

For example the real word ‘consonant’ could be generated from this template and would be segmented as c-o-ns-o-n-a-nt as follows:

<start> = c <vowel> = o <middle> = ns <vowel> = o <middle> = n <vowel> = a <end> = nt

For each of the four segment types, there is a corresponding table below. To build a pseudoword segment by segment, the appropriate table is used to look up the particular instance of a segment type that corresponds to its dice result.

It takes two dice to select a start segment from the 36 possibilities in Figure 1. It takes one dice to select a vowel segment from the six possibilities in Figure 2. It takes two dice to select a middle segment from the 36 possibilities in Figure 3. Finally it takes two dice to select an end segment from Figure 4.

For example the following dice results can produce the pseudoword ‘struppersoige’:

56   5  62  2  35  5   61
str  u  pp  e  rs  oi  ge

Using the choices available in the tables, other possibilities for these dice results might have been ‘spurmessuge’ or ‘stroippessupe’.

11 c/cp    12 b/bh    13 p/xm    14 d/bs    15 r/ts    16 s/chl
21 m/kh    22 l/dw    23 t/kl    24 h/kr    25 f/chr   26 g/ps
31 n/x     32 w/sch   33 v/gn    34 j/gh    35 pr/spl  36 st/shr
41 tr/spr  42 ch/sk   43 br/kn   44 fl/squ  45 cr/thr  46 gr/tw
51 k/rh    52 sh/scr  53 qu/sm   54 fr/ph   55 bl/wr   56 sp/str
61 cl/sn   62 dr/sw   63 sl/z    64 pl/th   65 y/sc    66 gl/wh
Figure 1: Start-of-word segment mapping from dice outcome
1 a
2 e
3 i
4 o
5 u/oi
6 ee/ea/au/ua/ya/y/oo/ai/io
Figure 2: Vowel segment mapping from dice outcome
11 t/mpl   12 l/nf    13 r/gg    14 n/nv    15 m/lt    16 v/dd
21 c/rb    22 d/bb    23 s/ch    24 g/rl    25 p/pl    26 b/pr
31 st/sp   32 ll/sh   33 nt/bl   34 nd/str  35 ss/rs   36 rr/br
41 w/gn    42 f/rg    43 mp/tr   44 ct/j    45 rt/cr   46 tt/cc
51 mb/y    52 ff/rc   53 x/gr    54 h/rd    55 th/pt   56 nn/nc
61 qu/ph   62 rm/pp   63 sc/k    64 ns/rn   65 mm/ng   66 z/ck
Figure 3: Middle consonant segment mapping from dice outcome
11 r/rry   12 n/gh    13 t/wl    14 l/lk    15 te/nge  16 nt/ft
21 m/sk    22 d/wn    23 s/na    24 re/be   25 ne/gy   26 p/rm
31 st/ny   32 y/cs    33 w/sm    34 se/nch  35 ck/f    36 nd/ld
41 ll/dge  42 ry/ght  43 de/rk   44 ss/ze   45 c/th    46 ve/ff
51 g/ch    52 le/ble  53 nce/rn  54 sh/tch  55 ty/mp   56 ce/lt
61 ge/pe   62 ng/nk   63 ct/k    64 b/ke    65 me/rd   66 rt/x
Figure 4: End-of-word segment mapping from dice outcome
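
Although the tables are designed to be used by hand, the look-up can also be sketched in Python for readers who want to check a result. The table contents below are copied from Figures 1 to 4; the names `parse_table`, `TABLES`, `TEMPLATE` and `candidates` are hypothetical and are not taken from the Lepron software. The sketch lists every pseudoword that a given set of 11 dice results allows.

```python
from itertools import product

START = """11 c/cp 12 b/bh 13 p/xm 14 d/bs 15 r/ts 16 s/chl
21 m/kh 22 l/dw 23 t/kl 24 h/kr 25 f/chr 26 g/ps
31 n/x 32 w/sch 33 v/gn 34 j/gh 35 pr/spl 36 st/shr
41 tr/spr 42 ch/sk 43 br/kn 44 fl/squ 45 cr/thr 46 gr/tw
51 k/rh 52 sh/scr 53 qu/sm 54 fr/ph 55 bl/wr 56 sp/str
61 cl/sn 62 dr/sw 63 sl/z 64 pl/th 65 y/sc 66 gl/wh"""
VOWEL = "1 a 2 e 3 i 4 o 5 u/oi 6 ee/ea/au/ua/ya/y/oo/ai/io"
MIDDLE = """11 t/mpl 12 l/nf 13 r/gg 14 n/nv 15 m/lt 16 v/dd
21 c/rb 22 d/bb 23 s/ch 24 g/rl 25 p/pl 26 b/pr
31 st/sp 32 ll/sh 33 nt/bl 34 nd/str 35 ss/rs 36 rr/br
41 w/gn 42 f/rg 43 mp/tr 44 ct/j 45 rt/cr 46 tt/cc
51 mb/y 52 ff/rc 53 x/gr 54 h/rd 55 th/pt 56 nn/nc
61 qu/ph 62 rm/pp 63 sc/k 64 ns/rn 65 mm/ng 66 z/ck"""
END = """11 r/rry 12 n/gh 13 t/wl 14 l/lk 15 te/nge 16 nt/ft
21 m/sk 22 d/wn 23 s/na 24 re/be 25 ne/gy 26 p/rm
31 st/ny 32 y/cs 33 w/sm 34 se/nch 35 ck/f 36 nd/ld
41 ll/dge 42 ry/ght 43 de/rk 44 ss/ze 45 c/th 46 ve/ff
51 g/ch 52 le/ble 53 nce/rn 54 sh/tch 55 ty/mp 56 ce/lt
61 ge/pe 62 ng/nk 63 ct/k 64 b/ke 65 me/rd 66 rt/x"""


def parse_table(text):
    """Turn 'dice options dice options ...' into {dice: [option, ...]}."""
    tokens = text.split()
    return {key: opts.split("/") for key, opts in zip(tokens[0::2], tokens[1::2])}


TABLES = {
    "start": parse_table(START),
    "vowel": parse_table(VOWEL),
    "middle": parse_table(MIDDLE),
    "end": parse_table(END),
}
# The word template: start, then vowel/middle alternating, then end.
TEMPLATE = ["start", "vowel", "middle", "vowel", "middle", "vowel", "end"]


def candidates(dice):
    """Return all pseudowords allowed by dice results such as '56 5 62 2 35 5 61'."""
    results = dice.split()
    if len(results) != len(TEMPLATE):
        raise ValueError("expected one dice result per template segment")
    choices = [TABLES[kind][result] for kind, result in zip(TEMPLATE, results)]
    return ["".join(parts) for parts in product(*choices)]


words = candidates("56 5 62 2 35 5 61")
print(len(words), "struppersoige" in words)  # 64 True
```

For the example dice results the sketch yields 64 candidates, since six of the seven segments offer two options each and the vowel result 2 offers only one. The password strength is still that of the 11 dice results, because all 64 candidates map back to the same dice results.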

Some comparisons with passphrases from Diceware and the EFF word list

The Diceware word list [DICEWARE] and the Electronic Frontier Foundation's long word list [EFF] both use five dice per word, so one of these pseudowords (with 11 dice) has slightly more entropy than a two-word passphrase made from those lists (10 dice).

The XKCD comic estimated a ‘correct horse battery staple’ type of phrase to have 44 bits of entropy. Two of these pseudowords would have very slightly less than 57 bits.
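
The figures in this comparison follow from the fact that each fair die contributes log2(6) bits, which is about 2.585 bits. A minimal Python sketch of the arithmetic, with the hypothetical helper name `dice_entropy_bits`:

```python
import math

BITS_PER_DIE = math.log2(6)  # about 2.585 bits for one fair six-sided die


def dice_entropy_bits(num_dice):
    """Entropy in bits of num_dice independent fair dice results."""
    return num_dice * BITS_PER_DIE


print(f"one pseudoword (11 dice):         {dice_entropy_bits(11):.2f} bits")  # 28.43
print(f"two pseudowords (22 dice):        {dice_entropy_bits(22):.2f} bits")  # 56.87
print(f"two-word passphrase (10 dice):    {dice_entropy_bits(10):.2f} bits")  # 25.85
```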

This article avoids making any recommendations about the strength of passwords.

Further work

There is scope to optimise these tables, meaning that better quality words could perhaps be produced with the same number of pseudowords, or the number of possible pseudowords could be increased with no loss of quality.

The introduction of choice into the mapping from number to word segment has introduced an area for investigation. Some arbitrary decisions have been made in the tables presented here.

Firstly the decision to limit the choice to two is arbitrary. There is no fundamental reason not to introduce more possibilities. One motivation for limiting them might be to keep the system simple to use, but it is not clear whether choice really affects this. And if reduced choice does make it easier to use, should the choice be entirely removed?

Secondly, for all of the consonant segment types, the number used is currently truncated at 72 (36 x 2). This number was close to the numbers used in the software of [LEPRON2], which produced words of reasonable quality. But given user choice, it may be possible to increase this number without degrading the word quality, since a user can reject low quality possibilities.

Because choice is included in the system, the consonant segments are partitioned. The dice result now selects not an individual segment, but a partition containing several possible segments, in the same way as described for a word abbreviation in [PW].

There is scope for investigating not only how many, but also which segments should be together in a particular partition. It might turn out that consonant segments can be classified into different types and that certain types are suitable for a particular context in a word and certain types are not.

If this is so then there is scope to investigate whether the partitioning could be optimised so that dissimilar types of segment are in the same partition, so that the partition could cope with multiple possible contexts.

One possible investigation is to determine how many middle clusters contain distinct syllable boundaries, where the start of the second syllable is independent of the end of the first syllable. An extreme example would be a segment from a compound word like ‘background’. In this case the middle segment ‘ckgr’ represents the juxtaposition of the end of one word and the start of a following word.

If in some middle segments the end of the first syllable is completely independent of the start of the next one, then is the number of such start-end permutations included in the set of middle segments in Figure 3 currently being limited by the words that happen to be in the small dictionary used to generate them rather than by linguistic constraints?

If so it might be possible to increase the number of middle segments. Perhaps the number could even be increased to three dice (6x6x6 = 216). Alternatively perhaps the word template could be changed at least for some words, so that what is now a middle segment will become two components.

A related open question for the dice is whether the size of each look-up table has to remain a power of six. For example, it would be easy to allow table sizes that also include factors of 2 and 3. This could be achieved either by instructing the user to divide a dice result by 2 or 3, or by simply duplicating table entries.
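
As a rough illustration of the division idea, the Python sketch below maps a single die result onto three or two equally likely values. The article does not say how the division should be rounded, so rounding up is an assumption here, and the function name `reduce_die` is hypothetical.

```python
from collections import Counter


def reduce_die(result, divisor):
    """Map a die result 1..6 onto 1..(6 // divisor), dividing and rounding up.

    Rounding up is an assumption of this sketch; any rule that sends the
    same number of faces to each value would serve equally well.
    """
    if result not in range(1, 7):
        raise ValueError("a die result must be between 1 and 6")
    if divisor not in (2, 3):
        raise ValueError("divisor must be 2 or 3")
    return (result + divisor - 1) // divisor  # integer ceiling of result / divisor


# Each reduced value is produced by the same number of faces, so it stays uniform.
print(Counter(reduce_die(d, 2) for d in range(1, 7)))  # values 1, 2, 3: two faces each
print(Counter(reduce_die(d, 3) for d in range(1, 7)))  # values 1, 2: three faces each
```

Under this assumption one ordinary die combined with one halved die would index a table of 6 x 3 = 18 entries, and one ordinary die combined with one die divided by 3 would index a table of 12 entries.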

Another method of achieving division by two is the method of [PW] where dice are cast in pairs and the order of the dice ignored, so making 18 different possibilities rather than 36.

It might be better overall to restrict the vowels to five possibilities, meaning that if the dice showed a 6 then it would have to be cast again, rather than allowing a segment of multiple vowels. In a simple system where the adjacent consonants do not depend on the vowels and vice versa, it seems that allowing multiple vowels degrades the quality of the words, as noted in [LEPRON2].
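
The effect of such a restriction can be sketched in Python. The sketch assumes that the five remaining possibilities would be the single vowels a, e, i, o and u, and the names `cast_die` and `cast_vowel` are hypothetical. It also computes the entropy of a pseudoword under that assumption: the eight consonant dice are unchanged, while each of the three vowels contributes log2(5) bits instead of log2(6).

```python
import math
import secrets

# Assumption: with the restriction, results 1 to 5 select a single vowel.
SINGLE_VOWELS = {1: "a", 2: "e", 3: "i", 4: "o", 5: "u"}


def cast_die():
    """Simulate one fair die using a cryptographically secure source."""
    return secrets.randbelow(6) + 1


def cast_vowel(cast=cast_die):
    """Cast until the result is not 6, then return the matching single vowel."""
    while True:
        result = cast()
        if result != 6:
            return SINGLE_VOWELS[result]


# Eight dice for the four consonant segments plus three five-way vowel choices.
restricted_bits = 8 * math.log2(6) + 3 * math.log2(5)
print(f"{restricted_bits:.2f} bits")  # 27.65, compared with 28.43 for 11 full dice
print(cast_vowel())
```

Casting again on a 6 keeps the five vowels equally likely, at the cost of an average of 1.2 casts per vowel.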

The problem with ‘qu’ has not been completely solved. The initial problem was that ‘qu’ cannot normally be followed by a vowel segment of a single ‘u’, and yet single vowels are preferred in this system, as just noted, which means a single ‘u’ will occur quite often. The partial mitigation here was to include an optional alternative to the single ‘u’, so ‘quoi’ can be chosen instead of ‘quu’. It might be better just to exclude ‘q’ from this dice system.

Appendix - Lepron is free GPL software

The software that generated the tables is available for download under the GPL (GNU General Public Licence) on this website at /download/lepron-r2.tgz (SHA256 bea8e5886b0d8bce4e2a26f1d8bc1dade55de480fe4f4e5de68eda67f1e80730). It comes with no warranty: see the Licence for more details. This is the second release of the Lepron software and it also contains the software presented in [LEPRON2], with some updates.

References

[DICEWARE]
The Diceware Passphrase Home Page, Arnold G. Reinhold, diceware.com
[EFF]
EFF's New Wordlists for Random Passphrases, Joseph Bonneau, Electronic Frontier Foundation, 19 July 2016
[GLOBALNAMES]
An idea for public key authentication from a name without certificates or central authority, Stephen Hewitt, Cambridge Clarion, 14 May 2021
[LEPRON]
Lepron, a project to develop pronounceable pseudowords for representing binary strings - Part 1: introduction and goals, Stephen Hewitt, Cambridge Clarion, 2 July 2021
[LEPRON2]
Lepron project part 2: towards public key authentication without central authority using names made of pseudowords, Stephen Hewitt, Cambridge Clarion, 3 August 2021
[PW]
How to remember a provably strong password: a new way using ‘constrained choice’, Stephen Hewitt, Cambridge Clarion, July 2018
