Distributed Information Reference

Christophe Meessen - Tue, Nov 3, 2015

A Distribute Information Reference (DIR) uniquely identifies a Node reference or an information in the Distributed Information Sytem (DIS). It is composed of a header byte followed by a varying number of Local Information References (LIR). The sequence of LIR defines a path in the DIS tree. A DIR has a 1 to 32 bytes long compact binary representation and a 4 to 98 bytes long ASCII representation as a URI.

1. Local Information Reference (LIR)

Each DIS Node contains references to other Nodes or information. The reference to other Nodes correspond to branches (Node links) in the DIS tree. An information can by of any type like images, text, music,… Each of these references or information is identified by a locally unique sequence of bytes called a Local Information Reference (LIR).

As depicted in figure 1, the LIR for information and Node references are distinct and independent identification spaces. The identification space of a LIR is specified by its type. There are at most 4 types of LIR in DIS, but only 2 are currently defined and used.

              _______________
Parent       |      Node     |
Node    <----|* 0         0 *|----> information on this node
SubNode <----|* 1         1 *|----> image
SubNode <----|* 2         2 *|----> text
SubNode <----|* 3         3 *|----> music
             |  :         4 -|
             |  :         5 *|----> DIS certificate
             |  :         :  |
             |  :  Local  :  |
             |  Information  |
             |   Reference   |
             |_______________|
   Node references        Information

 Fig. 1: A Node containing information and Node references

A LIR can be 1 to 26 byte long and may contain up to 200 bits (25 bytes) of identification information. But keep in mind that the maximum length of a DIR is 32 byte long. Thus at most one LIR can be 26 byte long.

The identifier will most frequently be an unsigned 64 bit integer assigned incrementally to newly inserted data because it is the most simple and compact identifier. Since small integer values will then appear more frequently, the LIR uses a varying length integer encoding so that small integer values are encoded in only a few bytes. The identifier may also be a raw byte sequence defined by the user like a hash value, an IP address, a phone number, … A Node may use a combination of raw bytes LIR and integer LIR for each LIR type, each one corresponding to a distinct and independent address space.

a. LIR Binary encoding

A LIR is a varying length identifier value with bytes A0, A1, … An. When A0 is bigger than 230, the LIR is a raw bytes identifier. The A0 - 230 raw bytes follow A0. A raw byte LIR may have at most 25 raw bytes (200 bits).

When A0 is smaller than 231, the LIR is a 64 bit unsigned integer identifier. The table below shows for each encoding rule number, the corresponding values for A0 and the LIR byte size. It also shows the constants Ki associated with the rule Ni to encode and decode the LIR integer.

N A0 size K
0 0..127 1 0
1 128..223 2 0x80
2 224 3 0x6080
3 225 4 0x16080
4 226 5 0x1016080
5 227 6 0x101016080
6 228 7 0x10101016080
7 229 8 0x1010101016080
8 230 9 0x101010101016080

To encode the unsigned integer x into a LIR, the encoding rule number must be first determined. The rule Ni is picked with the biggest Ki smaller than x.

To decode the LIR into an integer, we first determine the decoding rule based on the value of A0.

b. LIR ASCII encoding

The ASCII LIR encoding uses a variant of a base64 encoding using the characters ‘-’, ‘0’ to ‘9’, ‘A’ to ‘Z’, ‘_’ and ‘a’ to ‘z’ in that order. It is the character set defined in RFC4648 which can be used for URIs and file names without needing percent encoding. The character order is however different to match the ASCII ordering.

To encode an integer LIR in ASCII:

To encode a raw bytes LIR:

Due to the base64 variant encoding, not all sequences of the character set form a valid ASCII encoding of a LIR.

2. Distributed Information Reference (DIR)

A Distributed Information Reference (DIR), as depicted in figure 2, is a sequence of LIR defining a path in the DIS tree. When the path starts at the root Node, the first LIR applies to the root Node and the DIR is called an absolute DIR. Otherwise it starts at another Node defined by the context and the DIR is called a relative DIR.

All LIR in a DIR, except the last one, are exclusively Node reference LIR. The last LIR can be of any type and defines the DIR type.

         ________________________ ____________________
        |hdr|_ROOT_LIR_|_LIR_|__//___|_LIR_|_LAST_LIR_|
       
       Fig. 2: General encoding of an absolute DIR

a. DIR Binary encoding

The first byte of a DIR is the DIR header byte. It has 3 bit fields.

  1. bit 7: 0 for asolute DIR, 1 for relative DIR ;
  2. bits 5 and 6: type of the last LIR (0..3) ;
    • 0: Node reference identifier ;
    • 1: Information identifier ;
    • 2: reserved ;
    • 3: reserved ;
  3. bits 4 to 0: total number of LIR bytes that follow (0..31) ;

The header byte is followed by the sequence of LIR values encoded in binary.

If a DIR has 0 LIR bytes, the DIR is a null DIR. A null DIR can only be an absolute reference (header byte = 0x00) or a relative information (header byte = 0xA0).

b. DIR ASCII encoding

The ASCII representation of a DIR is a URI (RFC3986) with the scheme dis. The URI has only a path part. It has no authority, query or fragment part.

The segments of the path are the ASCII representation of the LIR in order of appearance in the DIR. Like any URI, path segments are separated by the character ‘/’. When the URI path starts with ‘/’ the DIR is an absolute DIR, otherwise it’s a relative DIR (root-less). When the path ends with ‘/’, it’s a Node reference DIR, otherwise it’s an information DIR. The URI “dis:” is a null (relative) information DIR. The URI “dis:/” is a null (absolute) Node reference DIR.

The other types of DIR will later be distinguished by appending a ‘~’ or ‘.’ to the last segment. These characters can only appear at the end of the path and can’t be directly preceded by a ‘/’. The URI “dis:~” or “dis:.” are null (relative) DIR of the corresponding types.