A Distribute Information Reference (DIR) uniquely identifies a Node reference or an information in the Distributed Information Sytem (DIS). It is composed of a header byte followed by a varying number of Local Information References (LIR). The sequence of LIR defines a path in the DIS tree. A DIR has a 1 to 32 bytes long compact binary representation and a 4 to 98 bytes long ASCII representation as a URI.
1. Local Information Reference (LIR)
Each DIS Node contains references to other Nodes or information. The reference to other Nodes correspond to branches (Node links) in the DIS tree. An information can by of any type like images, text, music,… Each of these references or information is identified by a locally unique sequence of bytes called a Local Information Reference (LIR).
As depicted in figure 1, the LIR for information and Node references are distinct and independent identification spaces. The identification space of a LIR is specified by its type. There are at most 4 types of LIR in DIS, but only 2 are currently defined and used.
_______________ Parent | Node | Node <----|* 0 0 *|----> information on this node SubNode <----|* 1 1 *|----> image SubNode <----|* 2 2 *|----> text SubNode <----|* 3 3 *|----> music | : 4 -| | : 5 *|----> DIS certificate | : : | | : Local : | | Information | | Reference | |_______________| Node references Information Fig. 1: A Node containing information and Node references
A LIR can be 1 to 26 byte long and may contain up to 200 bits (25 bytes) of identification information. But keep in mind that the maximum length of a DIR is 32 byte long. Thus at most one LIR can be 26 byte long.
The identifier will most frequently be an unsigned 64 bit integer assigned incrementally to newly inserted data because it is the most simple and compact identifier. Since small integer values will then appear more frequently, the LIR uses a varying length integer encoding so that small integer values are encoded in only a few bytes. The identifier may also be a raw byte sequence defined by the user like a hash value, an IP address, a phone number, … A Node may use a combination of raw bytes LIR and integer LIR for each LIR type, each one corresponding to a distinct and independent address space.
a. LIR Binary encoding
A LIR is a varying length identifier value with bytes A0, A1, … An. When A0 is bigger than 230, the LIR is a raw bytes identifier. The A0 - 230 raw bytes follow A0. A raw byte LIR may have at most 25 raw bytes (200 bits).
When A0 is smaller than 231, the LIR is a 64 bit unsigned integer identifier. The table below shows for each encoding rule number, the corresponding values for A0 and the LIR byte size. It also shows the constants Ki associated with the rule Ni to encode and decode the LIR integer.
To encode the unsigned integer x into a LIR, the encoding rule number must be first determined. The rule Ni is picked with the biggest Ki smaller than x.
- encoding rule N0 : A0 = x;
- encoding rule N1 :
- x’ = x - K1;
- A0 = (x’ / 256) + 128;
- A1 = x’ % 256;
- encodings rule N2 to N8 :
- x’ = x - Ki;
- the i less significant bytes of x’ are stored in big endian order after the value A0i.
To decode the LIR into an integer, we first determine the decoding rule based on the value of A0.
- decoding rule N0 : x = A0;
- decoding rule N1 : x = (A0 - 128) * 256 + A1 + K1;
- decoding rule N2 to N8 :
- we decode the i bytes after A0 as the less significant bytes of the integer value x’ in big endian order;
- x = x’ + Ki.
b. LIR ASCII encoding
The ASCII LIR encoding uses a variant of a base64 encoding using the characters ‘-’, ‘0’ to ‘9’, ‘A’ to ‘Z’, ‘_’ and ‘a’ to ‘z’ in that order. It is the character set defined in RFC4648 which can be used for URIs and file names without needing percent encoding. The character order is however different to match the ASCII ordering.
To encode an integer LIR in ASCII:
- The integer LIR is first decoded as a 64 bit unsigned integer x;
- Two 0 bits are added as most significant bits so that the number of bits (66) of the integer can be divided by 6;
- Each group of 6 bits is replaced by the corresponding character in the list above, from the most significant bits to the less significant bits;
- All ‘-’ characters in front of the string are removed until the string doesn’t start with ‘-’ or the string contains the single character ‘-’.The string “-” then corresponds to the integer value 0;
To encode a raw bytes LIR:
- 0 bits are added in front of the most significant bits of A1 so that the total number of bits in the raw bytes can be divided by 6;
- each group of 6 bits is then replaced by the corresponding character from the above list, from the group of 6 bit starting with the most significant bits of A1 to the last group of 6 bits in the last byte;
- The character ‘~’ is prepended to the string to distinguish it from an integer.
Due to the base64 variant encoding, not all sequences of the character set form a valid ASCII encoding of a LIR.
2. Distributed Information Reference (DIR)
A Distributed Information Reference (DIR), as depicted in figure 2, is a sequence of LIR defining a path in the DIS tree. When the path starts at the root Node, the first LIR applies to the root Node and the DIR is called an absolute DIR. Otherwise it starts at another Node defined by the context and the DIR is called a relative DIR.
All LIR in a DIR, except the last one, are exclusively Node reference LIR. The last LIR can be of any type and defines the DIR type.
________________________ ____________________ |hdr|_ROOT_LIR_|_LIR_|__//___|_LIR_|_LAST_LIR_| Fig. 2: General encoding of an absolute DIR
a. DIR Binary encoding
The first byte of a DIR is the DIR header byte. It has 3 bit fields.
- bit 7: 0 for asolute DIR, 1 for relative DIR ;
- bits 5 and 6: type of the last LIR (0..3) ;
- 0: Node reference identifier ;
- 1: Information identifier ;
- 2: reserved ;
- 3: reserved ;
- bits 4 to 0: total number of LIR bytes that follow (0..31) ;
The header byte is followed by the sequence of LIR values encoded in binary.
If a DIR has 0 LIR bytes, the DIR is a null DIR. A null DIR can only be an absolute reference (header byte = 0x00) or a relative information (header byte = 0xA0).
b. DIR ASCII encoding
The ASCII representation of a DIR is a URI (RFC3986) with the scheme dis. The URI has only a path part. It has no authority, query or fragment part.
The segments of the path are the ASCII representation of the LIR in order of appearance in the DIR. Like any URI, path segments are separated by the character ‘/’. When the URI path starts with ‘/’ the DIR is an absolute DIR, otherwise it’s a relative DIR (root-less). When the path ends with ‘/’, it’s a Node reference DIR, otherwise it’s an information DIR. The URI “dis:” is a null (relative) information DIR. The URI “dis:/” is a null (absolute) Node reference DIR.
The other types of DIR will later be distinguished by appending a ‘~’ or ‘.’ to the last segment. These characters can only appear at the end of the path and can’t be directly preceded by a ‘/’. The URI “dis:~” or “dis:.” are null (relative) DIR of the corresponding types.