Lepron project part 3: A first attempt at pronounceable passwords using dice for quantified strength
by Stephen Hewitt | Published 8 September 2021
This is the third article in a planned series of technical articles about Lepron, an open-source, free software project to develop a system for representing arbitrary binary strings as pronounceable pseudowords. This project is for the benefit of anyone who wants to use it or develop it further.
The first article [LEPRON] introduced three motivations for the project. This article explores an aspect of the first of these, which was to generate pronounceable passwords of quantified strength.
This article presents tables for creating a pronounceable password by casting dice, without using a computer.
In using dice the method resembles those described in [DICEWARE] or [EFF], but the difference is that the result is a pseudoword rather than a passphrase made of a sequence of real words.
Another difference is that this article incorporates the idea from [PW] that for password strength the mapping between number and pseudoword does not have to be bijective but can be surjective, where multiple pseudowords can map to the same number. Note therefore that this case is a departure from the original goal of bijective mapping proposed in [LEPRON].
The surjective mapping allows someone using these tables some strictly limited choice. From a linguistic perspective the method used to produce these pseudowords is primitive, and the presence of this choice might mitigate this by enabling a user to reject the more ungrammatical word forms. Choice may also help with personal memorisation, as in [PW], though this aspect has not been explored here.
The tables have been generated using the simple free software provided in the Appendix, with additional hand-tweaking of the output, but no software is needed to use them. The method used to form pseudowords is the same as described in [LEPRON2] and the software used is a modified version of that presented in that article.
How it works
Each pseudoword produced using this method takes 11 dice results. It is constructed in the simple way described in [LEPRON2], briefly recapitulated here.
The pseudoword is built from segments, each of which is essentially (but not exactly) a cluster of consonants or a cluster of vowels. The reasons they are not exactly that are that a ‘u’ following a ‘q’ is considered to be part of the same consonant cluster and that trailing vowels on the end of the word are considered to be part of the final consonant cluster.
There are four segment types. One is a vowel cluster and three are consonant clusters. The consonant clusters are defined as a start cluster, which occurs at the start of the word, a middle cluster, which occurs between vowel clusters, and an end cluster, which occurs at the end of the word and always includes any trailing vowels on the end of the word.
The word is then constructed from a template of clusters as follows:
<start> <vowel> <middle> <vowel> <middle> <vowel> <end>
For example the real word ‘consonant’ could be generated from this template and would be segmented as c-o-ns-o-n-a-nt as follows:
<start> = c
<vowel> = o
<middle> = ns
<vowel> = o
<middle> = n
<vowel> = a
<end> = nt
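The segmentation rules described above can be sketched in a few lines of Python. This is only an illustration and not the Lepron software: the function name `segment` is hypothetical, and it assumes lowercase input, treats only a, e, i, o and u as vowels (so ‘y’ counts as a consonant) and assumes the word starts with a consonant.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def segment(word):
    """Split a lowercase word into alternating consonant and vowel segments."""
    segments = []  # the text of each segment
    kinds = []     # "C" or "V" for each segment
    prev = ""
    for ch in word:
        # A 'u' directly after a 'q' stays in the consonant cluster.
        is_vowel = ch in VOWELS and not (ch == "u" and prev == "q")
        kind = "V" if is_vowel else "C"
        if kinds and kinds[-1] == kind:
            segments[-1] += ch
        else:
            segments.append(ch)
            kinds.append(kind)
        prev = ch
    # Trailing vowels are counted as part of the end consonant cluster.
    if len(segments) > 1 and kinds[-1] == "V":
        tail = segments.pop()
        kinds.pop()
        segments[-1] += tail
    return segments


print("-".join(segment("consonant")))  # prints c-o-ns-o-n-a-nt
```

A word with more or fewer than seven segments, such as ‘background’ (b-a-ckgr-ou-nd), does not fit the template and could not be produced by it.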
For each of the four segment types, there is a corresponding table below. To build a pseudoword segment by segment, the appropriate table is used to look up the particular instance of a segment type that corresponds to its dice result.
It takes two dice to select a start segment from the 36 possibilities in Figure 1. It takes one dice to select a vowel segment from the six possibilities in Figure 2. It takes two dice to select a middle segment from the 36 possibilities in Figure 3. Finally it takes two dice to select an end segment from Figure 4.
For example the following dice results can produce the pseudoword ‘struppersoige’:
Dice:     56   5   62   2   35   5   61
Segment:  str  u   pp   e   rs   oi  ge
Using the choices available in the tables, other possibilities for these dice results might have been ‘spurmessuge’ or ‘stroippessupe’.
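The way a set of dice results leads to several candidate pseudowords can be sketched as follows. The tables in this sketch are not the real Figures 1 to 4: they contain only the few entries that can be inferred from the three example words above, and the names `START`, `VOWEL`, `MIDDLE`, `END` and `candidates` are hypothetical.

```python
from itertools import product

# Partial tables inferred from the worked example only (an assumption).
# Each dice result selects a partition of one or more possible segments.
START = {"56": ("str", "sp")}
VOWEL = {"2": ("e",), "5": ("u", "oi")}
MIDDLE = {"62": ("pp", "rm"), "35": ("rs", "ss")}
END = {"61": ("ge", "pe")}

# Template: start, vowel, middle, vowel, middle, vowel, end (11 dice in all).
TEMPLATE = (START, VOWEL, MIDDLE, VOWEL, MIDDLE, VOWEL, END)


def candidates(rolls):
    """Return every pseudoword the user could choose for these dice results."""
    if len(rolls) != len(TEMPLATE):
        raise ValueError("expected one dice result per template segment")
    options = [table[roll] for table, roll in zip(TEMPLATE, rolls)]
    return ["".join(parts) for parts in product(*options)]


words = candidates(["56", "5", "62", "2", "35", "5", "61"])
for word in ("struppersoige", "spurmessuge", "stroippessupe"):
    print(word, word in words)
```

With these partial tables the same eleven dice give 64 candidates, of which the user keeps one. The dice results, not the choice, are what is counted towards the strength.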
Some comparisons with passphrases from Diceware and the EFF word list
The Diceware word list [DICEWARE] and the Electronic Frontier Foundation's long word list [EFF] both use 5 dice per word, so one of these pseudowords (with 11 dice) is equivalent in entropy to slightly more than a two-word passphrase made with those words.
The XKCD comic estimated a ‘correct horse battery staple’ type of phrase to have 44 bits of entropy. Two of these pseudowords (22 dice) would have very slightly less than 57 bits.
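The arithmetic behind these comparisons can be checked with a short calculation. It assumes fair dice, so that each die contributes log2(6) bits, and it counts only the dice: the user's choice within a table entry is assumed to add nothing. The function name `dice_bits` is just for illustration.

```python
import math

BITS_PER_DIE = math.log2(6)  # about 2.585 bits for one fair die


def dice_bits(n_dice):
    """Entropy in bits of n_dice independent fair dice."""
    return n_dice * BITS_PER_DIE


print(f"one pseudoword (11 dice):        {dice_bits(11):.2f} bits")  # 28.43
print(f"one word-list word (5 dice):     {dice_bits(5):.2f} bits")   # 12.92
print(f"two word-list words (10 dice):   {dice_bits(10):.2f} bits")  # 25.85
print(f"two pseudowords (22 dice):       {dice_bits(22):.2f} bits")  # 56.87
```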
This article avoids making any recommendations about the strength of passwords.
There is scope to optimise these tables, meaning that better quality words could perhaps be produced with the same number of pseudowords, or the number of possible pseudowords could be increased with no loss of quality.
The introduction of choice into the mapping from number to word segment has introduced an area for investigation. Some arbitrary decisions have been made in the tables presented here.
Firstly the decision to limit the choice to two is arbitrary. There is no fundamental reason not to introduce more possibilities. One motivation for limiting them might be to keep the system simple to use, but it is not clear whether choice really affects this. And if reduced choice does make it easier to use, should the choice be entirely removed?
Secondly, for all of the consonant segment types, the number used is currently truncated at 72 (36 x 2). This number was close to the numbers used in the software of [LEPRON2], which produced words of reasonable quality. But given user choice, it may be possible to increase this number without degrading the word quality, since a user can reject low quality possibilities.
Given that choice is included in the system, the consonant segments are partitioned. The dice result now selects not an individual segment, but a partition containing several possible segments, in the same way as described for a word abbreviation in [PW].
There is scope for investigating not only how many, but also which segments should be together in a particular partition. It might turn out that consonant segments can be classified into different types and that certain types are suitable for a particular context in a word and certain types are not.
If this is so then there is scope to investigate whether the partitioning could be optimised so that dissimilar types of segment are in the same partition, so that the partition could cope with multiple possible contexts.
One possible investigation is to determine how many middle clusters contain distinct syllable boundaries, where the start of the second syllable is independent of the end of the first syllable. An extreme example would be a segment from a compound word like ‘background’. In this case the middle segment ‘ckgr’ represents the juxtaposition of the end of one word and the start of a following word.
If in some middle segments the end of the first syllable is completely independent of the start of the next one, then is the number of such start-end permutations included in the set of middle segments in Figure 3 currently being limited by the words that happen to be in the small dictionary used to generate them rather than by linguistic constraints?
If so it might be possible to increase the number of middle segments. Perhaps the number could even be increased to three dice (6x6x6 = 216). Alternatively perhaps the word template could be changed at least for some words, so that what is now a middle segment will become two components.
A related open question for the dice is whether the size of each look-up table has to remain a power of six. For example it would easily be possible to have table sizes that also include factors of 2 and 3. This could be achieved either by instructing the user to divide a dice result by 2 or 3, or by simply duplicating table entries.
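As a rough illustration of the division idea, the sketch below builds an index into a table of 18 entries (6 x 3) from two dice, by halving the second result. Rounding the division up is an assumption made here, since the text does not specify how the user would divide, and the names `reduce_die` and `index_18` are hypothetical.

```python
from collections import Counter
from itertools import product


def reduce_die(roll, divisor):
    """Map a die result 1..6 onto 1..(6 // divisor), dividing and rounding up."""
    if divisor not in (2, 3):
        raise ValueError("divisor must be 2 or 3")
    return -(-roll // divisor)  # ceiling division


def index_18(first, second):
    """Index 0..17 into an 18-entry table: a full die and a halved die."""
    return (first - 1) * 3 + (reduce_die(second, 2) - 1)


# Enumerate all 36 ordered results to check every entry is equally likely.
counts = Counter(index_18(a, b) for a, b in product(range(1, 7), repeat=2))
print(len(counts), set(counts.values()))  # prints 18 {2}
```

Each of the 18 entries is reached by exactly two of the 36 equally likely results, so the selection remains uniform and contributes log2(18) bits.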
Another method of achieving division by two is the method of [PW] where dice are cast in pairs and the order of the dice ignored, so making 18 different possibilities rather than 36.
It might be better overall to restrict the vowels to five possibilities, meaning that if the dice showed a 6 then it would have to be cast again, rather than allowing a segment of multiple vowels. In a simple system where the adjacent consonants do not depend on the vowels and vice versa, it seems that allowing multiple vowels degrades the quality of the words, as noted in [LEPRON2].
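The cast-again rule for a five-vowel table could look like the sketch below. The assignment of results 1 to 5 to a, e, i, o, u is an assumption (it agrees with the worked example, where 2 gives ‘e’ and 5 gives ‘u’, but the real table is Figure 2), and the function name is hypothetical. The last lines estimate the cost in entropy, assuming the three vowel dice each drop from six to five outcomes while the eight consonant dice are unchanged.

```python
import math

VOWELS = "aeiou"  # assumed order: 1=a, 2=e, 3=i, 4=o, 5=u


def vowel_from_rolls(rolls):
    """Return the vowel for the first die result that is not a 6."""
    for roll in rolls:
        if not 1 <= roll <= 6:
            raise ValueError("a die result must be between 1 and 6")
        if roll != 6:
            return VOWELS[roll - 1]
    raise ValueError("every result was a 6; cast again")


print(vowel_from_rolls([6, 6, 2]))  # the two sixes are discarded, prints e

bits_now = 11 * math.log2(6)                     # current scheme
bits_five = 8 * math.log2(6) + 3 * math.log2(5)  # five-vowel variant
print(f"{bits_now:.2f} bits -> {bits_five:.2f} bits per pseudoword")
```

On these assumptions the restriction would cost a little under 0.8 bits per pseudoword.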
The problem with ‘qu’ has not been completely solved. The initial problem was that ‘qu’ cannot normally be followed by a vowel segment of a single ‘u’ and yet single vowels are preferred in this system as just noted, which means a single ‘u’ will occur quite often. The partial mitigation here was to include an optional alternative to the single ‘u’. So ‘quoi’ can be chosen instead of ‘quu’. It might be better just to exclude q from this dice system.
Appendix - Lepron is free GPL software
The software that generated the tables is available for download under the GPL (GNU General Public License) on this website at /download/lepron-r2.tgz (SHA256 bea8e5886b0d8bce4e2a26f1d8bc1dade55de480fe4f4e5de68eda67f1e80730). It comes with no warranty; see the Licence for more details. This is the second release of Lepron software and it also contains the software presented in [LEPRON2], with some updates.
Links to these are in the sections below.
- The Diceware Passphrase Home Page, Arnold G. Reinhold, diceware.com
- EFF's New Wordlists for Random Passphrases, Joseph Bonneau, Electronic Frontier Foundation, 19 July 2016
- An idea for public key authentication from a name without certificates or central authority, Stephen Hewitt, Cambridge Clarion, 14 May 2021
- Lepron, a project to develop pronounceable pseudowords for representing binary strings - Part 1: introduction and goals, Stephen Hewitt, Cambridge Clarion, 2 July 2021
- Lepron project part 2: towards public key authentication without central authority using names made of pseudowords, Stephen Hewitt, Cambridge Clarion, 3 August 2021
- How to remember a provably strong password: a new way using ‘constrained choice’, Stephen Hewitt, Cambridge Clarion, July 2018.
- How to remember a 128-bit key using ‘constrained choice’ August 2018, Stephen Hewitt. The 3rd Clarion data privacy article
- How to use beads to generate random letters and numbers by hand 22 September 2017, Stephen Hewitt. The first Clarion data privacy article
- ars technica article: Anatomy of a hack: How crackers ransack passwords like “qeadzcwrsfxv1331” Dan Goodin, ars technica, 28 May 2013
- XKCD article: Password Strength XKCD, 10 August 2011
- How to Memorize a Random 60-Bit String Marjan Ghazvininejad and Kevin Knight, 2015 (SHA256 of PDF f0e091f445a4f777b123d56153cf149552e51424493ddf276378a709cbf5d79c)