Innovation and design

Authenticating a public key from a globally unique name continued

by | Published

This article presents a small advance in the concept described in the 2024 article Authenticating a public key from just a name - a feasibility study with GPL code [1].

That article described how a cryptographic hash can be used prove that a particular globally-unique but human-friendly name has been derived from a particular cryptographic public key and cannot have come from anywhere else. Its principle is to generate a hash from a pre-image that includes both the key and a grammar. The grammar defines a translation of that hash value into the name. For details see the article.

The original article held as a principle of security that a name may represent only one hash value. In other words for any particular grammar, only one value would map to a particular name. This article proposes a limited relaxation of this requirement with no loss of security. Under the relaxation a name may represent a range of hash values, provided that the range is restricted to a small enough fraction of all possible hash values. For example if that fraction is less than or equal to 1/2128 then the system has at least the same security as a system that requires a unique hash value but uses only a 128-bit hash image. The revised concept is shown in Figure 1 (cf Figure 4 in the earlier article).


       Key file             Grammar file
 +-----------------+    +------------------+
 | Nonce           |    | Word PCFG        |
 | Associated info |    | %%               |
 | Public key      |    | Phrase Grammar   |
 | Hash(Grammar)<-------| %%               |
 | Length(Grammar) |    | Word permutation |
 +-----------------+    +------------------+
           |                      |
           V                      |
   hash(Key file)                 |
     |                            |
     V                            |
 +-------------------------+      |
 | Key fingerprint, n bits |      |
 +-------------------------+      |
      |                           |
   ___V____                  _____V_____
 /         \  Probability  /            \
 | Entropy |  model        | Generate   |
 | coding  |<--------------| strings    |
 |         |               | using PCFG |
 |         |-------------->|            |
 |         |  Production   |            |
 \_________/  choices      \____________/
      |                           |
      V                           V
 +------------------+  +----------------------+
 | Residual entropy |  | Globally unique name |
 | k bits           |  +----------------------+
 +------------------+

Figure 1. Authentication of a public key by regeneration of its owner's name. Here n is an integer but k is a real number, because the residual entropy may contain a fractional number of bits. Before each hashing operation on a new key file the a priori probability of it resulting in a particular name is 2k/2n in the random oracle model. Security depends on enforcing a maximum value on this fraction for valid names. The intent is that this maximum is small enough that an adversary has negligible probability of colliding with a target name by repeated hashing of different key files.

The motivation for this reformulation is the interaction with the range encoding used by the grammar to convert the hash value to a name. The fundamental step forward is that what is enforced now is an inequality rather than an equality. This is better suited to the probabilistic encoding of arithmetic encoding, rather than enumerative encoding.

For example if we enforce k <= n - 128 then we have equivalent security level at a minimum as the prototype version in the previous article.

It means there are more than 2128 names, and we don't really know how many there are.

The original idea was to make the name represent exactly 128 bits. The range encoding used takes an integer (initially of 128 bits) and uses it to create choices. so as entropy is used to make each choice in a choice node, the entropy remaining in the integer is reduced by a corresponding amount and the integer becomes smaller.

As explained in [1], a name in consists of a sequence of pseudowords. However each pseudoword generated in a name may consume an unpredictable amount of entropy, so that making a complete name of whole pseudowords exactly 128 bits requires a method. This method was to make the last word produced enumerative rather than probabilistic. By enumerative is meant that every possible string is equally likely and is assigned to a particular integer value. Provided then that the number of integer values remaining from the hash was not more than the number of strings under the node in the grammar for that last word it was guaranteed that this final word would consume all the remaining entropy and no more.

However it is evident that we can allow a set of values to result in a single name, providing that the probability of the hash image falling within this set is less than our security requirement. With range or arithmetic encoding this set of values will be a range of contiguous values, but in principle it can be any set of values. A scattered set of values might occur with other kinds of entropy encoding.

So for example, if we start with a hash image of 128 bits, then in the prototype of [1] we require that there is only one integer in the range 0 to (2128 - 1) that maps to any particular name. But if we use (say) 132 bits from the hash then our requirement becomes that a set of x integers within the range 0 to 2132 - 1 may map to the same name, provided that

x/2132 < 1/2128

In other words, the requirement is that the probability of generating a hash that maps to this name must be less than 1/2128, a value at least as small as it would be if we insisted on using only 128 bits and a unique integer. Note that this requirement does not necessarily mean that the first 128 bits of the hash image are the same for all the values that map to the same name. The security proposition from [1] can now be restated in the following terms.

The restated security proposition

Suppose the key fingerprint input to the PCFG consists of n bits and that after generating the name k bits remain, meaning that there are 2k values of the n-bit hash image that generate the name. In the random oracle model [ROM], given a particular target name that has been generated from a key file, for each hashing operation on a different key file there is at most a 2k/2n chance of finding a collision with the target name.

Terminology

CFG
Context free grammar
Commission
Defined in this article to be the remaining number of fingerprint values that are possible at any particular point in range-encoding
Fingerprint
A cryptographic hash or digest of a public key and related information
PCFG
Probabilistic context free grammar

References

[1]

Authenticating a public key from just a name - a feasibility study with GPL code, Stephen Hewitt

[ROM]

Random oracles are practical: a paradigm for designing efficient protocols, Mihir Bellare and Phillip Rogaway, December 1993

In December 2023 a PDF file was at https://dl.acm.org/doi/pdf/10.1145/168588.168596, linked from page https://dl.acm.org/doi/10.1145/168588.168596 with SHA256 hash 61f2224f2af5bfaaa586a6be0d17b9b2164410506e224b692bf7312ad9e0afdc.

Related articles