ASN.1 is a notation for describing data structures; it's very much
like a type declaration in C or C++. Let's start with a C++ structure and
create the appropriate ASN.1 Data structure. I'll use a simplified form of
the the GET request from FHTTP to start with.
struct GetRequest {
int headerOnly; // flag: do we only want headers?
int lock; // flag: should we checkout the object?
string url; // the url to fetch
AcceptTypes* acceptTypes; // Optional list of accept types that only apply
// to this request
};
struct AcceptTypes {
List<bitset>* standardTypes; // list of bitmaps indicating which
// preference order for standard types
List<string>* otherTypes; // nonstandard types in preference order
};
GetRequest ::= [APPLICATION 0] IMPLICIT SEQUENCE {
headerOnly BOOLEAN,
lock BOOLEAN,
acceptTypes AcceptTypes OPTIONAL
url OCTET STRING,
}
AcceptTypes ::= [APPLICATION 1] IMPLICIT SEQUENCE {
standardTypes [0] IMPLICIT SEQUENCE OF BIT STRING
{html(0),plain-text(1),gif(2), jpeg(3)} OPTIONAL,
otherTypes [1] IMPLICIT SEQUENCE OF OCTET STRING OPTIONAL
}
For the encoding examples, we'll use the following example. (the notation for
the test case is a little lispy. If that makes you uncomfortable, just thing
of it as a wais source :-)
(GetRequest
:headerOnly TRUE
:lock FALSE
:acceptTypes (AcceptTypes
:standardTypes ((html)
(plain-text)))
:url "/ses/magic/moxen.html")
-----
The Basic Encoding Rules (BER) were the orignal rules for taking an
ASN.1 data type, and turning it into a sequence of bits and bytes. BER uses
a form of encoding commonly known as Tag-Length-Value. Each item is encoded as
a tag, indicating what type it is, a length indicating the size of the object,
and a value, which contains the actual contents of the object.
Tags are reasonably simple - in there simplest form they consist of a single
byte; the highest two bits indicate the tag class (whether the tag is an
official ISO tag, an application wide tag, a private tag, or a tag that only
has meaning for this particular structure. The next bit is a flag to indicate
whether the tagged object is a simple type, such as a number or a string,
or a compound type made up from a bunch of other types. The remaining 5 bits
are used to represent the tag number. If the tag is too big to fit in 5 bits,
then these bits are set to all ones, and the tag number is encoded in the
following bytes as a sequence of seven bit bytes. The high bit of these
bytes is used as a flag to indicate whether there's more tag available.
Lengths are also quite simple. There are two ways of encoding lengths -
the definite form, and the indefinite form.
For the definite form, if the length is less than 128, you just use a single
byte, with the high bit set to zero. Otherwise the high bit is set to one,
and the low seven bits set to the length of length. The length is then
encoded in that many bytes.
The indefinite form is encoded by sending a length field with a length of
length of 0 - i.e. [1000|0000]. The object is ended by sending two
zero bytes.
Here's our test case encoded using the BER
0x60 -- [0110|0000], [APPLICATION 0, Compound] - GetRequest
0x80 -- [1000|0000], Indefinite length
0x01 -- [0000|0001], [BOOLEAN] GetRequest.headerOnly
0x01 -- [0000|0001], length 1
0x01 -- [0000|0001], value TRUE
0x01 -- [0000|0001], [BOOLEAN] GetRequest.lock
0x01 -- [0000|0001], length 1
0x00 -- [0000|0000] value FALSE
0x61 -- [0110|0001], [APPLICATION 1, Compound] - GetRequest.types
0x80 -- indefinite length
0xa0 -- [1010|0000], [CONTEXT 0, Compound] types.standardTypes
0x80 -- indefinite length
0x03 -- [0000|0011] [BIT STRING]
0x02 -- length 2
0x04 -- Four bits unused
0x80 -- [1000|0000] {html}
0x03 -- [0000|0011] [BIT STRING]
0x02 -- length 2
0x04 -- Four bits unused
0x40 -- [0100|0000] {plain-text}
0x00
0x00 -- End of list of standard Types
0x00
0x00 -- End of Accept Types
0x04 -- [0000|0100], [OCTET STRING] GetRequest.url
0x16 -- [0001|0110], length 22
[/ses/magic/moxen.html] -- value
0x00
0x00 -- End of get request
[50 bytes total], 22 url
---------------
The packed encoding rules use a different style of encoding. Instead of
using a generic style of encoding that encodes all types in a uniform way,
the PER specialise the encoding based on the data type to generate much more
compact representations.
PER only generates tags when they are needed to prevent ambiguity- this
only occurs when ASN.1's version of union is used (CHOICE). PER also only
generates lengths when the size of an object can vary. Even then, PER tries
to represent the lengths in the most compact form possible.
PER encodings are not always aligned on byte boundaries- if the 'aligned'
variant of the rules is used, then strings *will* be aligned - otherwise
the encoding is treated as a string of bits, allowing things like booleans
and small integers to be squished together in one byte.
The presence of optional elements in a sequence is indicated by a list of
single bit flags placed at the start of a sequence - if the bit is set, then
the option is present.
Here's what our test case looks like in PER.
[1] -- flag bit indicates acceptTypes is present
[1] -- boolean, header only, TRUE
[0] -- boolean, lock, FALSE
[1] -- flag bit, indicates standardTypes is present
[0] -- flag bit, indicates otherTypes is absent
[000] -- pad bits to make length octet aligned
[00000010] -- length of 2, -- 2 standardType bit strings to follow
[1000] -- the first bit string, html is set
[0100] -- the second bit string, plain-text is set
[00010110] -- length 22; url is 22 octets long
/ses/magic/moxen.html -- value of url
[total size is 26, 22 url]
In this case, the PER are about twice as compact as the BER. If I had chosen
a shorter URL, such as /, the difference would have been even greater -
BER, 29; PER, 5, for a ratio of over 5-1.