| 1 | <?xml version="1.0" standalone="no"?> |
---|
| 2 | <!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" |
---|
| 3 | "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [ |
---|
| 4 | |
---|
| 5 | ]> |
---|
| 6 | |
---|
| 7 | <section id="vorbis-spec-codebook"> |
---|
| 8 | <sectioninfo> |
---|
| 9 | <releaseinfo> |
---|
| 10 | $Id: 03-codebook.xml 7186 2004-07-20 07:19:25Z xiphmont $ |
---|
| 11 | </releaseinfo> |
---|
| 12 | </sectioninfo> |
---|
| 13 | <title>Probability Model and Codebooks</title> |
---|
| 14 | |
---|
| 15 | <section> |
---|
| 16 | <title>Overview</title> |
---|
| 17 | |
---|
| 18 | <para> |
---|
| 19 | Unlike practically every other mainstream audio codec, Vorbis has no |
---|
| 20 | statically configured probability model, instead packing all entropy |
---|
| 21 | decoding configuration, VQ and Huffman, into the bitstream itself in |
---|
| 22 | the third header, the codec setup header. This packed configuration |
---|
| 23 | consists of multiple 'codebooks', each containing a specific |
---|
| 24 | Huffman-equivalent representation for decoding compressed codewords as |
---|
| 25 | well as an optional lookup table of output vector values to which a |
---|
| 26 | decoded Huffman value is applied as an offset, generating the final |
---|
| 27 | decoded output corresponding to a given compressed codeword.</para> |
---|
| 28 | |
---|
| 29 | <section><title>Bitwise operation</title> |
---|
| 30 | <para> |
---|
| 31 | The codebook mechanism is built on top of the Vorbis bitpacker. Both |
---|
| 32 | the codebooks themselves and the codewords they decode are unrolled |
---|
| 33 | from a packet as a series of arbitrary-width values read from the |
---|
| 34 | stream according to <xref linkend="vorbis-spec-bitpacking"/>.</para> |
---|
| 35 | </section> |
---|
| 36 | |
---|
| 37 | </section> |
---|
| 38 | |
---|
| 39 | <section> |
---|
| 40 | <title>Packed codebook format</title> |
---|
| 41 | |
---|
| 42 | <para> |
---|
| 43 | For purposes of the examples below, we assume that the storage |
---|
| 44 | system's native byte width is eight bits. This is not universally |
---|
| 45 | true; see <xref linkend="vorbis-spec-bitpacking"/> for discussion |
---|
| 46 | relating to non-eight-bit bytes.</para> |
---|
| 47 | |
---|
| 48 | <section><title>codebook decode</title> |
---|
| 49 | |
---|
| 50 | <para> |
---|
| 51 | A codebook begins with a 24 bit sync pattern, 0x564342: |
---|
| 52 | |
---|
| 53 | <screen> |
---|
| 54 | byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42) |
---|
| 55 | byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43) |
---|
| 56 | byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56) |
---|
| 57 | </screen></para> |
---|
| 58 | |
---|
| 59 | <para> |
---|
| 60 | 16 bit <varname>[codebook_dimensions]</varname> and 24 bit <varname>[codebook_entries]</varname> fields: |
---|
| 61 | |
---|
| 62 | <screen> |
---|
| 63 | |
---|
| 64 | byte 3: [ X X X X X X X X ] |
---|
| 65 | byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned) |
---|
| 66 | |
---|
| 67 | byte 5: [ X X X X X X X X ] |
---|
| 68 | byte 6: [ X X X X X X X X ] |
---|
| 69 | byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned) |
---|
| 70 | |
---|
| 71 | </screen></para> |
---|
| 72 | |
---|
| 73 | <para> |
---|
| 74 | Next is the <varname>[ordered]</varname> bit flag: |
---|
| 75 | |
---|
| 76 | <screen> |
---|
| 77 | |
---|
| 78 | byte 8: [ X ] [ordered] (1 bit) |
---|
| 79 | |
---|
| 80 | </screen></para> |
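<para>
As a rough illustration of the fixed-width fields above, the following
Python sketch reads the sync pattern, <varname>[codebook_dimensions]</varname>,
<varname>[codebook_entries]</varname> and the <varname>[ordered]</varname> flag
from a byte string. The names <literal>BitReader</literal> and
<literal>read_codebook_header</literal> are hypothetical and exist only for this
example. The reader assumes eight-bit bytes and the LSb-first convention
described in <xref linkend="vorbis-spec-bitpacking"/>.</para>

```python
class BitReader:
    """Hypothetical LSb-first bit reader over a bytes object (eight-bit bytes assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position within data

    def read(self, nbits: int) -> int:
        """Read nbits; the first bit read becomes the least significant bit."""
        value = 0
        for i in range(nbits):
            byte_index, bit_index = divmod(self.pos, 8)
            if byte_index >= len(self.data):
                raise EOFError("end of packet")
            bit = (self.data[byte_index] >> bit_index) & 1
            value |= bit << i
            self.pos += 1
        return value


def read_codebook_header(reader: BitReader):
    """Return (dimensions, entries, ordered) or raise on a bad sync pattern."""
    if reader.read(24) != 0x564342:
        raise ValueError("bad codebook sync pattern")
    dimensions = reader.read(16)
    entries = reader.read(24)
    ordered = bool(reader.read(1))
    return dimensions, entries, ordered


# Bytes 0-2: sync, bytes 3-4: dimensions = 2, bytes 5-7: entries = 8,
# byte 8 bit 0: ordered = 1.
header = bytes([0x42, 0x43, 0x56, 0x02, 0x00, 0x08, 0x00, 0x00, 0x01])
print(read_codebook_header(BitReader(header)))  # (2, 8, True)
```

<para>
The example raises <literal>EOFError</literal> when the data runs out, which is
one possible way to surface the 'end of packet' condition; a real decoder may
report it differently.</para>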
---|
| 81 | |
---|
| 82 | <para> |
---|
| 83 | Each entry, numbering a |
---|
| 84 | total of <varname>[codebook_entries]</varname>, is assigned a codeword length. |
---|
| 85 | We now read the list of codeword lengths and store these lengths in |
---|
| 86 | the array <varname>[codebook_codeword_lengths]</varname>. Decode of lengths is |
---|
| 87 | according to whether the <varname>[ordered]</varname> flag is set or unset. |
---|
| 88 | |
---|
| 89 | <itemizedlist> |
---|
| 90 | <listitem> |
---|
| 91 | <para>If the <varname>[ordered]</varname> flag is unset, the codeword list is not |
---|
| 92 | length ordered and the decoder needs to read each codeword length |
---|
| 93 | one-by-one.</para> |
---|
| 94 | |
---|
| 95 | <para>The decoder first reads one additional bit flag, the |
---|
| 96 | <varname>[sparse]</varname> flag. This flag determines whether or not the |
---|
| 97 | codebook contains unused entries that are not to be included in the |
---|
| 98 | codeword decode tree: |
---|
| 99 | |
---|
| 100 | <screen> |
---|
| 101 | byte 8: [ X 1 ] [sparse] flag (1 bit) |
---|
| 102 | </screen></para> |
---|
| 103 | |
---|
| 104 | <para> |
---|
| 105 | The decoder now performs for each of the <varname>[codebook_entries]</varname> |
---|
| 106 | codebook entries: |
---|
| 107 | |
---|
| 108 | <screen> |
---|
| 109 | |
---|
| 110 | 1) if([sparse] is set){ |
---|
| 111 | |
---|
| 112 | 2) [flag] = read one bit; |
---|
| 113 | 3) if([flag] is set){ |
---|
| 114 | |
---|
| 115 | 4) [length] = read a five bit unsigned integer; |
---|
| 116 | 5) codeword length for this entry is [length]+1; |
---|
| 117 | |
---|
| 118 | } else { |
---|
| 119 | |
---|
| 120 | 6) this entry is unused. mark it as such. |
---|
| 121 | |
---|
| 122 | } |
---|
| 123 | |
---|
| 124 | } else the sparse flag is not set { |
---|
| 125 | |
---|
| 126 | 7) [length] = read a five bit unsigned integer; |
---|
| 127 | 8) the codeword length for this entry is [length]+1; |
---|
| 128 | |
---|
| 129 | } |
---|
| 130 | |
---|
| 131 | </screen></para> |
---|
| 132 | </listitem> |
---|
| 133 | <listitem> |
---|
| 134 | <para>If the <varname>[ordered]</varname> flag is set, the codeword list for this |
---|
| 135 | codebook is encoded in ascending length order. Rather than reading |
---|
| 136 | a length for every codeword, the decoder reads the number of |
---|
| 137 | codewords per length. That is, beginning at entry zero: |
---|
| 138 | |
---|
| 139 | <screen> |
---|
| 140 | 1) [current_entry] = 0; |
---|
| 141 | 2) [current_length] = read a five bit unsigned integer and add 1; |
---|
| 142 | 3) [number] = read <link linkend="vorbis-spec-ilog">ilog</link>([codebook_entries] - [current_entry]) bits as an unsigned integer |
---|
| 143 | 4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive, |
---|
| 144 | of the [codebook_codeword_lengths] array to [current_length] |
---|
| 145 | 5) set [current_entry] to [number] + [current_entry] |
---|
| 146 | 6) increment [current_length] by 1 |
---|
| 147 | 7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; |
---|
| 148 | the decoder will not be able to read this stream. |
---|
| 149 | 8) if [current_entry] is less than [codebook_entries], repeat process starting at 3) |
---|
| 150 | 9) done. |
---|
| 151 | </screen></para> |
---|
| 152 | </listitem> |
---|
| 153 | </itemizedlist> |
---|
| 154 | |
---|
| 155 | After all codeword lengths have been decoded, the decoder reads the |
---|
| 156 | vector lookup table. Vorbis I supports three lookup types: |
---|
| 157 | <orderedlist> |
---|
| 158 | <listitem> |
---|
| 159 | <simpara>No lookup</simpara> |
---|
| 160 | </listitem><listitem> |
---|
| 161 | <simpara>Implicitly populated value mapping (lattice VQ)</simpara> |
---|
| 162 | </listitem><listitem> |
---|
| 163 | <simpara>Explicitly populated value mapping (tessellated or 'foam' |
---|
| 164 | VQ)</simpara> |
---|
| 165 | </listitem> |
---|
| 166 | </orderedlist> |
---|
| 167 | </para> |
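<para>
The codeword length decode described above, for both the unordered and the
ordered case, can be sketched in Python as follows. This is an illustration
under stated assumptions, not a reference implementation:
<literal>BitReader</literal>, <literal>pack</literal> and
<literal>read_codeword_lengths</literal> are hypothetical names, unused entries
are represented by <literal>None</literal>, and
<literal>int.bit_length()</literal> is assumed to match
<link linkend="vorbis-spec-ilog">ilog</link> for non-negative integers.</para>

```python
class BitReader:
    """Hypothetical LSb-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            byte_index, bit_index = divmod(self.pos, 8)
            if byte_index >= len(self.data):
                raise EOFError("end of packet")
            value |= ((self.data[byte_index] >> bit_index) & 1) << i
            self.pos += 1
        return value


def pack(fields):
    """Hypothetical test helper: pack (value, nbits) pairs LSb first, zero padded."""
    bits = []
    for value, nbits in fields:
        bits.extend((value >> i) & 1 for i in range(nbits))
    bits.extend([0] * (-len(bits) % 8))
    return bytes(
        sum(bit << i for i, bit in enumerate(bits[n:n + 8]))
        for n in range(0, len(bits), 8)
    )


def read_codeword_lengths(reader, entries, ordered):
    """Return the [codebook_codeword_lengths] list; None marks an unused entry."""
    lengths = [None] * entries
    if not ordered:
        sparse = reader.read(1)
        for entry in range(entries):
            if sparse and not reader.read(1):
                continue  # per-entry flag unset: entry is unused
            lengths[entry] = reader.read(5) + 1
        return lengths
    current_entry = 0
    current_length = reader.read(5) + 1
    while current_entry < entries:
        # ilog([codebook_entries] - [current_entry]) bits
        number = reader.read((entries - current_entry).bit_length())
        if current_entry + number > entries:
            raise ValueError("codeword count exceeds [codebook_entries]")
        for entry in range(current_entry, current_entry + number):
            lengths[entry] = current_length
        current_entry += number
        current_length += 1
    return lengths


# Unordered and sparse: entry 0 has length 2, entry 1 is unused, entry 2 has length 1.
unordered = pack([(1, 1), (1, 1), (1, 5), (0, 1), (1, 1), (0, 5)])
print(read_codeword_lengths(BitReader(unordered), 3, ordered=False))  # [2, None, 1]

# Ordered: one codeword of length 1, one of length 2, two of length 3.
ordered = pack([(0, 5), (1, 3), (1, 2), (2, 2)])
print(read_codeword_lengths(BitReader(ordered), 4, ordered=True))  # [1, 2, 3, 3]
```

<para>
The sketch checks for the error condition of step 7 before writing the lengths
rather than after, which rejects the same streams while avoiding an
out-of-range write.</para>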
---|
| 168 | |
---|
| 169 | <para> |
---|
| 170 | The lookup table type is read as a four bit unsigned integer: |
---|
| 171 | <screen> |
---|
| 172 | 1) [codebook_lookup_type] = read four bits as an unsigned integer |
---|
| 173 | </screen></para> |
---|
| 174 | |
---|
| 175 | <para> |
---|
| 176 | Codebook decode proceeds according to <varname>[codebook_lookup_type]</varname>: |
---|
| 177 | <itemizedlist> |
---|
| 178 | <listitem> |
---|
| 179 | <para>Lookup type zero indicates no lookup to be read. Proceed past |
---|
| 180 | lookup decode.</para> |
---|
| 181 | </listitem><listitem> |
---|
| 182 | <para>Lookup types one and two are similar, differing only in the |
---|
| 183 | number of lookup values to be read. Lookup type one reads a list of |
---|
| 184 | values that are permuted in a set pattern to build a list of vectors, |
---|
| 185 | each vector of order <varname>[codebook_dimensions]</varname> scalars. Lookup |
---|
| 186 | type two builds the same vector list, but reads each scalar for each |
---|
| 187 | vector explicitly, rather than building vectors from a smaller list of |
---|
| 188 | possible scalar values. Lookup decode proceeds as follows: |
---|
| 189 | |
---|
| 190 | <screen> |
---|
| 191 | 1) [codebook_minimum_value] = <link linkend="vorbis-spec-float32_unpack">float32_unpack</link>( read 32 bits as an unsigned integer) |
---|
| 192 | 2) [codebook_delta_value] = <link linkend="vorbis-spec-float32_unpack">float32_unpack</link>( read 32 bits as an unsigned integer) |
---|
| 193 | 3) [codebook_value_bits] = read 4 bits as an unsigned integer and add 1 |
---|
| 194 | 4) [codebook_sequence_p] = read 1 bit as a boolean flag |
---|
| 195 | |
---|
| 196 | if ( [codebook_lookup_type] is 1 ) { |
---|
| 197 | |
---|
| 198 | 5) [codebook_lookup_values] = <link linkend="vorbis-spec-lookup1_values">lookup1_values</link>(<varname>[codebook_entries]</varname>, <varname>[codebook_dimensions]</varname> ) |
---|
| 199 | |
---|
| 200 | } else { |
---|
| 201 | |
---|
| 202 | 6) [codebook_lookup_values] = <varname>[codebook_entries]</varname> * <varname>[codebook_dimensions]</varname> |
---|
| 203 | |
---|
| 204 | } |
---|
| 205 | |
---|
| 206 | 7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each; |
---|
| 207 | store these in order in the array [codebook_multiplicands] |
---|
| 208 | </screen></para> |
---|
| 209 | </listitem><listitem> |
---|
| 210 | <para>A <varname>[codebook_lookup_type]</varname> of greater than two is reserved |
---|
| 211 | and indicates a stream that is not decodable by the specification in this |
---|
| 212 | document.</para> |
---|
| 213 | </listitem> |
---|
| 214 | </itemizedlist> |
---|
| 215 | </para> |
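<para>
To make the difference between lookup types one and two concrete, the
following Python sketch computes <varname>[codebook_lookup_values]</varname>,
the number of multiplicands that step 7 reads. The function
<literal>lookup_value_count</literal> is a hypothetical name. The body of
<literal>lookup1_values</literal> is an assumption based on its definition
elsewhere in this specification (the greatest integer whose
<varname>[codebook_dimensions]</varname>-th power does not exceed
<varname>[codebook_entries]</varname>); consult
<link linkend="vorbis-spec-lookup1_values">lookup1_values</link> for the
normative text.</para>

```python
def lookup1_values(entries: int, dimensions: int) -> int:
    """Greatest integer r with r ** dimensions not exceeding entries (assumed definition)."""
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    r = 0
    while (r + 1) ** dimensions <= entries:
        r += 1
    return r


def lookup_value_count(lookup_type: int, entries: int, dimensions: int) -> int:
    """Number of [codebook_multiplicands] values stored for a lookup table."""
    if lookup_type == 0:
        return 0  # no lookup table is stored
    if lookup_type == 1:
        return lookup1_values(entries, dimensions)
    if lookup_type == 2:
        return entries * dimensions
    raise ValueError("reserved lookup type; stream is not decodable")


print(lookup_value_count(1, 81, 4))  # 3, since 3**4 == 81
print(lookup_value_count(1, 80, 4))  # 2, since 3**4 exceeds 80
print(lookup_value_count(2, 81, 4))  # 324
```

<para>
For the same codebook size, type one stores far fewer values than type two
because each vector is derived from a short list of scalars instead of being
listed in full.</para>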
---|
| 216 | |
---|
| 217 | <para> |
---|
| 218 | An 'end of packet' during any read operation in the above steps is |
---|
| 219 | considered an error condition rendering the stream undecodable.</para> |
---|
| 220 | |
---|
| 221 | <section><title>Huffman decision tree representation</title> |
---|
| 222 | |
---|
| 223 | <para> |
---|
| 224 | The <varname>[codebook_codeword_lengths]</varname> array and |
---|
| 225 | <varname>[codebook_entries]</varname> value uniquely define the Huffman decision |
---|
| 226 | tree used for entropy decoding.</para> |
---|
| 227 | |
---|
| 228 | <para> |
---|
| 229 | Briefly, each used codebook entry (recall that length-unordered |
---|
| 230 | codebooks support unused codeword entries) is assigned, in order, the |
---|
| 231 | lowest valued unused binary Huffman codeword possible. Assume the |
---|
| 232 | following codeword length list: |
---|
| 233 | |
---|
| 234 | <screen> |
---|
| 235 | entry 0: length 2 |
---|
| 236 | entry 1: length 4 |
---|
| 237 | entry 2: length 4 |
---|
| 238 | entry 3: length 4 |
---|
| 239 | entry 4: length 4 |
---|
| 240 | entry 5: length 2 |
---|
| 241 | entry 6: length 3 |
---|
| 242 | entry 7: length 3 |
---|
| 243 | </screen></para> |
---|
| 244 | |
---|
| 245 | <para> |
---|
| 246 | Assigning codewords in order (lowest possible value of the appropriate |
---|
| 247 | length to highest) results in the following codeword list: |
---|
| 248 | |
---|
| 249 | <screen> |
---|
| 250 | entry 0: length 2 codeword 00 |
---|
| 251 | entry 1: length 4 codeword 0100 |
---|
| 252 | entry 2: length 4 codeword 0101 |
---|
| 253 | entry 3: length 4 codeword 0110 |
---|
| 254 | entry 4: length 4 codeword 0111 |
---|
| 255 | entry 5: length 2 codeword 10 |
---|
| 256 | entry 6: length 3 codeword 110 |
---|
| 257 | entry 7: length 3 codeword 111 |
---|
| 258 | </screen></para> |
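<para>
The assignment rule can be sketched in Python as shown below. The function
<literal>assign_codewords</literal> is a hypothetical name, codewords are
represented as strings of '0' and '1' read left to right, and unused entries
are passed in as <literal>None</literal>. The sketch tracks the subtrees that
are still free and, for each entry, takes the free position that yields the
lowest codeword of the requested length. It is one possible realization of the
rule, not necessarily how any particular decoder implements it.</para>

```python
def assign_codewords(lengths):
    """Assign each used entry, in order, the lowest available codeword of its length."""
    free = [""]  # prefixes of subtrees that contain no assigned codeword yet
    codewords = []
    for length in lengths:
        if length is None:
            codewords.append(None)  # unused entries are skipped
            continue
        candidates = [p for p in free if len(p) <= length]
        if not candidates:
            raise ValueError("overspecified Huffman tree")
        # Padding with zeros gives the lowest codeword reachable under each prefix.
        prefix = min(candidates, key=lambda p: p.ljust(length, "0"))
        free.remove(prefix)
        # Walk down the 0 branches; every 1 branch passed over stays free.
        while len(prefix) < length:
            free.append(prefix + "1")
            prefix += "0"
        codewords.append(prefix)
    return codewords


for entry, codeword in enumerate(assign_codewords([2, 4, 4, 4, 4, 2, 3, 3])):
    print(entry, codeword)
```

<para>
Run on the length list above, the sketch reproduces the codeword list shown in
this section, from <literal>00</literal> for entry 0 through
<literal>111</literal> for entry 7.</para>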
---|
| 259 | |
---|
| 260 | |
---|
| 261 | <note> |
---|
| 262 | <para> |
---|
| 263 | Unlike most binary numerical values in this document, we |
---|
| 264 | intend the above codewords to be read and used bit by bit from left to |
---|
| 265 | right, thus the codeword '001' is the bit string 'zero, zero, one'. |
---|
| 266 | When determining 'lowest possible value' in the assignment definition |
---|
| 267 | above, the leftmost bit is the MSb.</para> |
---|
| 268 | </note> |
---|
| 269 | |
---|
| 270 | <para> |
---|
| 271 | It is clear that the codeword length list represents a Huffman |
---|
| 272 | decision tree with the entry numbers equivalent to the leaves numbered |
---|
| 273 | left-to-right: |
---|
| 274 | |
---|
| 275 | <mediaobject> |
---|
| 276 | <imageobject> |
---|
| 277 | <imagedata fileref="hufftree.png" format="PNG"/> |
---|
| 278 | </imageobject> |
---|
| 279 | <textobject> |
---|
| 280 | <phrase>[huffman tree illustration]</phrase> |
---|
| 281 | </textobject> |
---|
| 282 | </mediaobject> |
---|
| 283 | </para> |
---|
| 284 | |
---|
| 285 | <para> |
---|
| 286 | As we assign codewords in order, we see that each choice constructs a |
---|
| 287 | new leaf in the leftmost possible position.</para> |
---|
| 288 | |
---|
| 289 | <para> |
---|
| 290 | Note that it's possible to underspecify or overspecify a Huffman tree |
---|
| 291 | via the length list. In the above example, if codeword seven were |
---|
| 292 | eliminated, the tree would clearly be unfinished: |
---|
| 293 | |
---|
| 294 | <mediaobject> |
---|
| 295 | <imageobject> |
---|
| 296 | <imagedata fileref="hufftree-under.png" format="PNG"/> |
---|
| 297 | </imageobject> |
---|
| 298 | <textobject> |
---|
| 299 | <phrase>[underspecified huffman tree illustration]</phrase> |
---|
| 300 | </textobject> |
---|
| 301 | </mediaobject> |
---|
| 302 | </para> |
---|
| 303 | |
---|
| 304 | <para> |
---|
| 305 | Similarly, in the original codebook, it's clear that the tree is fully |
---|
| 306 | populated and a ninth codeword is impossible. Both underspecified and |
---|
| 307 | overspecified trees are an error condition rendering the stream |
---|
| 308 | undecodable.</para> |
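<para>
One way to detect both error conditions without building the tree is to sum
2<superscript>-length</superscript> over all used entries: a sum of exactly one
corresponds to a fully populated tree, a smaller sum to an underspecified tree,
and a larger sum to an overspecified one. The Python sketch below applies this
arithmetic check. It is offered as an equivalent test under that assumption,
not as the procedure this document mandates, and
<literal>classify_length_list</literal> is a hypothetical name.</para>

```python
from fractions import Fraction


def classify_length_list(lengths):
    """Classify a codeword length list; None marks an unused entry."""
    # Exact rational arithmetic avoids rounding issues for long codewords.
    total = sum(Fraction(1, 2 ** n) for n in lengths if n is not None)
    if total > 1:
        return "overspecified"
    if total == 1:
        return "complete"
    return "underspecified"


example = [2, 4, 4, 4, 4, 2, 3, 3]
print(classify_length_list(example))        # complete
print(classify_length_list(example[:7]))    # underspecified (codeword seven removed)
print(classify_length_list(example + [5]))  # overspecified (a ninth codeword)
```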
---|
| 309 | |
---|
| 310 | <para> |
---|
| 311 | Codebook entries marked 'unused' are simply skipped in the assigning |
---|
| 312 | process. They have no codeword and do not appear in the decision |
---|
| 313 | tree, thus it's impossible for any bit pattern read from the stream to |
---|
| 314 | decode to that entry number.</para> |
---|
| 315 | |
---|
| 316 | </section> |
---|
| 317 | |
---|
| 318 | <section><title>VQ lookup table vector representation</title> |
---|
| 319 | |
---|
| 320 | <para> |
---|
| 321 | Unpacking the VQ lookup table vectors relies on the following values: |
---|
| 322 | <programlisting> |
---|
| 323 | the [codebook_multiplicands] array |
---|
| 324 | [codebook_minimum_value] |
---|
| 325 | [codebook_delta_value] |
---|
| 326 | [codebook_sequence_p] |
---|
| 327 | [codebook_lookup_type] |
---|
| 328 | [codebook_entries] |
---|
| 329 | [codebook_dimensions] |
---|
| 330 | [codebook_lookup_values] |
---|
| 331 | </programlisting> |
---|
| 332 | </para> |
---|
| 333 | |
---|
| 334 | <para> |
---|
| 335 | Decoding (unpacking) a specific vector in the vector lookup table |
---|
| 336 | proceeds according to <varname>[codebook_lookup_type]</varname>. The unpacked |
---|
| 337 | vector values are what a codebook would return during audio packet |
---|
| 338 | decode in a VQ context.</para> |
---|
| 339 | |
---|
| 340 | <section><title>Vector value decode: Lookup type 1</title> |
---|
| 341 | |
---|
| 342 | <para> |
---|
| 343 | Lookup type one specifies a lattice VQ lookup table built |
---|
| 344 | algorithmically from a list of scalar values. Calculate (unpack) the |
---|
| 345 | final values of a codebook entry vector from the entries in |
---|
| 346 | <varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname> |
---|
| 347 | is the output vector representing the vector of values for entry number |
---|
| 348 | <varname>[lookup_offset]</varname> in this codebook): |
---|
| 349 | |
---|
| 350 | <screen> |
---|
| 351 | 1) [last] = 0; |
---|
| 352 | 2) [index_divisor] = 1; |
---|
| 353 | 3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) { |
---|
| 354 | |
---|
| 355 | 4) [multiplicand_offset] = ( [lookup_offset] divided by [index_divisor] using integer |
---|
| 356 | division ) integer modulo [codebook_lookup_values] |
---|
| 357 | |
---|
| 358 | 5) vector [value_vector] element [i] = |
---|
| 359 | ( [codebook_multiplicands] array element number [multiplicand_offset] ) * |
---|
| 360 | [codebook_delta_value] + [codebook_minimum_value] + [last]; |
---|
| 361 | |
---|
| 362 | 6) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i] |
---|
| 363 | |
---|
| 364 | 7) [index_divisor] = [index_divisor] * [codebook_lookup_values] |
---|
| 365 | |
---|
| 366 | } |
---|
| 367 | |
---|
| 368 | 8) vector calculation completed. |
---|
| 369 | </screen></para> |
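<para>
The steps above translate fairly directly into Python. In this sketch
<literal>unpack_lookup1</literal> is a hypothetical name,
<varname>[codebook_lookup_values]</varname> is taken to be the length of the
multiplicand list, and the sample codebook values are invented for
illustration only.</para>

```python
def unpack_lookup1(lookup_offset, multiplicands, dimensions,
                   minimum_value, delta_value, sequence_p):
    """Unpack the value vector for entry lookup_offset of a lookup type 1 codebook."""
    lookup_values = len(multiplicands)  # [codebook_lookup_values]
    last = 0.0
    index_divisor = 1
    value_vector = []
    for _ in range(dimensions):
        # Treat lookup_offset as a base-lookup_values number, one digit per dimension.
        multiplicand_offset = (lookup_offset // index_divisor) % lookup_values
        value = multiplicands[multiplicand_offset] * delta_value + minimum_value + last
        if sequence_p:
            last = value
        value_vector.append(value)
        index_divisor *= lookup_values
    return value_vector


# Invented example: 3 scalars, 2 dimensions, so 9 possible entries.
multiplicands = [0, 1, 2]
print(unpack_lookup1(5, multiplicands, 2, -1.0, 1.0, False))  # [1.0, 0.0]
print(unpack_lookup1(5, multiplicands, 2, -1.0, 1.0, True))   # [1.0, 1.0]
```

<para>
Entry 5 selects multiplicand 5 mod 3 = 2 for the first scalar and
(5 div 3) mod 3 = 1 for the second. With <varname>[codebook_sequence_p]</varname>
set, each scalar is added to the one before it.</para>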
---|
| 370 | |
---|
| 371 | </section> |
---|
| 372 | |
---|
| 373 | <section><title>Vector value decode: Lookup type 2</title> |
---|
| 374 | |
---|
| 375 | <para> |
---|
| 376 | Lookup type two specifies a VQ lookup table in which each scalar in |
---|
| 377 | each vector is explicitly set by the <varname>[codebook_multiplicands]</varname> |
---|
| 378 | array in a one-to-one mapping. Calculate (unpack) the |
---|
| 379 | final values of a codebook entry vector from the entries in |
---|
| 380 | <varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname> |
---|
| 381 | is the output vector representing the vector of values for entry number |
---|
| 382 | <varname>[lookup_offset]</varname> in this codebook): |
---|
| 383 | |
---|
| 384 | <screen> |
---|
| 385 | 1) [last] = 0; |
---|
| 386 | 2) [multiplicand_offset] = [lookup_offset] * [codebook_dimensions] |
---|
| 387 | 3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) { |
---|
| 388 | |
---|
| 389 | 4) vector [value_vector] element [i] = |
---|
| 390 | ( [codebook_multiplicands] array element number [multiplicand_offset] ) * |
---|
| 391 | [codebook_delta_value] + [codebook_minimum_value] + [last]; |
---|
| 392 | |
---|
| 393 | 5) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i] |
---|
| 394 | |
---|
| 395 | 6) increment [multiplicand_offset] |
---|
| 396 | |
---|
| 397 | } |
---|
| 398 | |
---|
| 399 | 7) vector calculation completed. |
---|
| 400 | </screen></para> |
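<para>
A corresponding Python sketch for lookup type two follows. As before,
<literal>unpack_lookup2</literal> is a hypothetical name and the sample values
are invented for illustration only.</para>

```python
def unpack_lookup2(lookup_offset, multiplicands, dimensions,
                   minimum_value, delta_value, sequence_p):
    """Unpack the value vector for entry lookup_offset of a lookup type 2 codebook."""
    last = 0.0
    # Each entry owns a contiguous run of dimensions multiplicands.
    multiplicand_offset = lookup_offset * dimensions
    value_vector = []
    for _ in range(dimensions):
        value = multiplicands[multiplicand_offset] * delta_value + minimum_value + last
        if sequence_p:
            last = value
        value_vector.append(value)
        multiplicand_offset += 1
    return value_vector


# Invented example: 3 entries of 2 dimensions each, stored back to back.
multiplicands = [0, 1, 2, 3, 4, 5]
print(unpack_lookup2(1, multiplicands, 2, 0.5, 0.5, False))  # [1.5, 2.0]
print(unpack_lookup2(1, multiplicands, 2, 0.5, 0.5, True))   # [1.5, 3.5]
```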
---|
| 401 | |
---|
| 402 | </section> |
---|
| 403 | |
---|
| 404 | </section> |
---|
| 405 | |
---|
| 406 | </section> |
---|
| 407 | |
---|
| 408 | </section> |
---|
| 409 | |
---|
| 410 | <section> |
---|
| 411 | <title>Use of the codebook abstraction</title> |
---|
| 412 | |
---|
| 413 | <para> |
---|
| 414 | The decoder uses the codebook abstraction much as it does the |
---|
| 415 | bit-unpacking convention; a specific codebook reads a |
---|
| 416 | codeword from the bitstream, decoding it into an entry number, and then |
---|
| 417 | returns that entry number to the decoder (when used in a scalar |
---|
| 418 | entropy coding context), or uses that entry number as an offset into |
---|
| 419 | the VQ lookup table, returning a vector of values (when used in a context |
---|
| 420 | desiring a VQ value). Scalar or VQ context is always explicit; any call |
---|
| 421 | to the codebook mechanism requests either a scalar entry number or a |
---|
| 422 | lookup vector.</para> |
---|
| 423 | |
---|
| 424 | <para> |
---|
| 425 | Note that VQ lookup type zero indicates that there is no lookup table; |
---|
| 426 | requesting decode using a codebook of lookup type 0 in any context |
---|
| 427 | expecting a vector return value (even in the case of a vector of |
---|
| 428 | dimension one) is forbidden. If decoder setup or decode requests such |
---|
| 429 | an action, that is an error condition rendering the packet |
---|
| 430 | undecodable.</para> |
---|
| 431 | |
---|
| 432 | <para> |
---|
| 433 | Using a codebook to read from the packet bitstream consists first of |
---|
| 434 | reading and decoding the next codeword in the bitstream. The decoder |
---|
| 435 | reads bits until the accumulated bits match a codeword in the |
---|
| 436 | codebook. This process can be thought of as logically walking the |
---|
| 437 | Huffman decode tree by reading one bit at a time from the bitstream, |
---|
| 438 | and using the bit as a decision boolean to take the 0 branch (left in |
---|
| 439 | the above examples) or the 1 branch (right in the above examples). |
---|
| 440 | Walking the tree finishes when the decode process hits a leaf in the |
---|
| 441 | decision tree; the result is the entry number corresponding to that |
---|
| 442 | leaf. Reading past the end of a packet propagates the 'end-of-stream' |
---|
| 443 | condition to the decoder.</para> |
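<para>
The bit-by-bit matching described above can be sketched in Python as follows.
The function <literal>decode_entry</literal> is a hypothetical name, the
bitstream is modeled as a plain iterator of 0 and 1 values rather than a real
packet reader, and the codewords are the strings from the example codebook
earlier in this section. Raising <literal>EOFError</literal> is just one way to
signal that the packet ran out.</para>

```python
def decode_entry(bits, codewords):
    """Read bits until they match a codeword; return that codeword's entry number."""
    table = {cw: entry for entry, cw in enumerate(codewords) if cw is not None}
    longest = max(len(cw) for cw in table)
    accumulated = ""
    for bit in bits:
        accumulated += "1" if bit else "0"
        if accumulated in table:
            return table[accumulated]  # reached a leaf of the decision tree
        if len(accumulated) >= longest:
            # Cannot happen with a fully populated tree; kept as a safeguard.
            raise ValueError("bit pattern matches no codeword")
    raise EOFError("end of packet")


codewords = ["00", "0100", "0101", "0110", "0111", "10", "110", "111"]
stream = iter([1, 1, 0, 0, 0, 0, 1, 0, 1])
print(decode_entry(stream, codewords))  # 6 (codeword 110)
print(decode_entry(stream, codewords))  # 0 (codeword 00)
print(decode_entry(stream, codewords))  # 2 (codeword 0101)
```

<para>
Because the stream is an iterator, each call resumes where the previous
codeword ended. A production decoder would more likely walk a prebuilt tree or
table than compare strings, but the result is the same entry number.</para>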
---|
| 444 | |
---|
| 445 | <para> |
---|
| 446 | When used in a scalar context, the resulting codeword entry is the |
---|
| 447 | desired return value.</para> |
---|
| 448 | |
---|
| 449 | <para> |
---|
| 450 | When used in a VQ context, the codeword entry number is used as an |
---|
| 451 | offset into the VQ lookup table. The value returned to the decoder is |
---|
| 452 | the vector of scalars corresponding to this offset.</para> |
---|
| 453 | |
---|
| 454 | </section> |
---|
| 455 | |
---|
| 456 | </section> |
---|
| 457 | |
---|
| 458 | <!-- end section of probability model and codebooks --> |
---|