source: downloads/libvorbis-1.2.0/doc/xml/01-introduction.xml @ 16

Last change on this file since 16 was 16, checked in by landauf, 16 years ago
added libvorbis
File size: 25.8 KB

Rev	Line
[16]	1	<?xml version="1.0" standalone="no"?>
	2	<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
	3	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
	4
	5	]>
	6
	7	<section id="vorbis-spec-intro">
	8	<sectioninfo>
	9	<releaseinfo>
	10	$Id: 01-introduction.xml 7186 2004-07-20 07:19:25Z xiphmont $
	11	</releaseinfo>
	12	</sectioninfo>
	13	<title>Introduction and Description</title>
	14
	15	<section>
	16	<title>Overview</title>
	17
	18	<para>
	19	This document provides a high level description of the Vorbis codec's
	20	construction. A bit-by-bit specification appears beginning in
	21	<xref linkend="vorbis-spec-codec"/>.
	22	The later sections assume a high-level
	23	understanding of the Vorbis decode process, which is
	24	provided here.</para>
	25
	26	<section>
	27	<title>Application</title>
	28	<para>
	29	Vorbis is a general purpose perceptual audio CODEC intended to allow
	30	maximum encoder flexibility, thus allowing it to scale competitively
	31	over an exceptionally wide range of bitrates. At the high
	32	quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
	33	it is in the same league as MPEG-2 and MPC. Similarly, the 1.0
	34	encoder can encode high-quality CD and DAT rate stereo at below 48kbps
	35	without resampling to a lower rate. Vorbis is also intended for
	36	lower and higher sample rates (from 8kHz telephony to 192kHz digital
	37	masters) and a range of channel representations (monaural,
	38	polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
	39	discrete channels).
	40	</para>
	41	</section>
	42
	43	<section>
	44	<title>Classification</title>
	45	<para>
	46	Vorbis I is a forward-adaptive monolithic transform CODEC based on the
	47	Modified Discrete Cosine Transform. The codec is structured to allow
	48	addition of a hybrid wavelet filterbank in Vorbis II to offer better
	49	transient response and reproduction using a transform better suited to
	50	localized time events.
	51	</para>
	52	</section>
	53
	54	<section>
	55	<title>Assumptions</title>
	56
	57	<para>
	58	The Vorbis CODEC design assumes a complex, psychoacoustically-aware
	59	encoder and simple, low-complexity decoder. Vorbis decode is
	60	computationally simpler than mp3, although it does require more
	61	working memory as Vorbis has no static probability model; the vector
	62	codebooks used in the first stage of decoding from the bitstream are
	63	packed in their entirety into the Vorbis bitstream headers. In
	64	packed form, these codebooks occupy only a few kilobytes; the extent
	65	to which they are pre-decoded into a cache is the dominant factor in
	66	decoder memory usage.
	67	</para>
	68
	69	<para>
	70	Vorbis provides none of its own framing, synchronization or protection
	71	against errors; it is solely a method of accepting input audio,
	72	dividing it into individual frames and compressing these frames into
	73	raw, unformatted 'packets'. The decoder then accepts these raw
	74	packets in sequence, decodes them, synthesizes audio frames from
	75	them, and reassembles the frames into a facsimile of the original
	76	audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
	77	minimum size, maximum size, or fixed/expected size. Packets
	78	are designed so that they may be truncated (or padded) and remain
	79	decodable; this is not to be considered an error condition and is used
	80	extensively in bitrate management in peeling. Both the transport
	81	mechanism and decoder must allow that a packet may be any size, or
	82	end before or after packet decode expects.</para>
	83
	84	<para>
	85	Vorbis packets are thus intended to be used with a transport mechanism
	86	that provides free-form framing, sync, positioning and error correction
	87	in accordance with these design assumptions, such as Ogg (for file
	88	transport) or RTP (for network multicast). For purposes of a few
	89	examples in this document, we will assume that Vorbis is to be
	90	embedded in an Ogg stream specifically, although this is by no means a
	91	requirement or fundamental assumption in the Vorbis design.</para>
	92
	93	<para>
	94	The specification for embedding Vorbis into
	95	an Ogg transport stream is in <xref linkend="vorbis-over-ogg"/>.
	96	</para>
	97
	98	</section>
	99
	100	<section>
	101	<title>Codec Setup and Probability Model</title>
	102
	103	<para>
	104	Vorbis' heritage is as a research CODEC and its current design
	105	reflects a desire to allow multiple decades of continuous encoder
	106	improvement before running out of room within the codec specification.
	107	For these reasons, configurable aspects of codec setup intentionally
	108	lean toward the extreme of forward adaptive.</para>
	109
	110	<para>
	111	The single most controversial design decision in Vorbis (and the most
	112	unusual for a Vorbis developer to keep in mind) is that the entire
	113	probability model of the codec, the Huffman and VQ codebooks, is
	114	packed into the bitstream header along with extensive CODEC setup
	115	parameters (often several hundred fields). This makes it impossible,
	116	as would be possible with MPEG audio layers, to embed a simple frame type
	117	flag in each audio packet, or to begin decode at any frame in the stream
	118	without having previously fetched the codec setup header.
	119	</para>
	120
	121	<note><para>
	122	Vorbis <emphasis>can</emphasis> initiate decode at any arbitrary packet within a
	123	bitstream so long as the codec has been initialized/setup with the
	124	setup headers.</para></note>
	125
	126	<para>
	127	Thus, Vorbis headers are both required for decode to begin and
	128	relatively large as bitstream headers go. The header size is
	129	unbounded, although for streaming a rule-of-thumb of 4kB or less is
	130	recommended (and Xiph.Org's Vorbis encoder follows this suggestion).</para>
	131
	132	<para>
	133	Our own design work indicates the primary liability of the
	134	required header is in mindshare; it is an unusual design and thus
	135	causes some amount of complaint among engineers as this runs against
	136	current design trends (and also points out limitations in some
	137	existing software/interface designs, such as Windows' ACM codec
	138	framework). However, we find that it does not fundamentally limit
	139	Vorbis' suitable application space.</para>
	140
	141	</section>
	142
	143	<section><title>Format Specification</title>
	144	<para>
	145	The Vorbis format is well-defined by its decode specification; any
	146	encoder that produces packets that are correctly decoded by the
	147	reference Vorbis decoder described below may be considered a proper
	148	Vorbis encoder. A decoder must faithfully and completely implement
	149	the specification defined below (except where noted) to be considered
	150	a proper Vorbis decoder.</para>
	151	</section>
	152
	153	<section><title>Hardware Profile</title>
	154	<para>
	155	Although Vorbis decode is computationally simple, it may still run
	156	into specific limitations of an embedded design. For this reason,
	157	embedded designs are allowed to deviate in limited ways from the
	158	'full' decode specification yet still be certified compliant. These
	159	optional omissions are labelled in the spec where relevant.</para>
	160	</section>
	161
	162	</section>
	163
	164	<section>
	165	<title>Decoder Configuration</title>
	166
	167	<para>
	168	Decoder setup consists of configuration of multiple, self-contained
	169	component abstractions that perform specific functions in the decode
	170	pipeline. Each different component instance of a specific type is
	171	semantically interchangeable; decoder configuration consists both of
	172	internal component configuration, as well as arrangement of specific
	173	instances into a decode pipeline. Componentry arrangement is roughly
	174	as follows:</para>
	175
	176	<mediaobject>
	177	<imageobject>
	178	<imagedata fileref="components.png" format="PNG"/>
	179	</imageobject>
	180	<textobject>
	181	<phrase>decoder pipeline configuration</phrase>
	182	</textobject>
	183	</mediaobject>
	184
	185	<section><title>Global Config</title>
	186	<para>
	187	Global codec configuration consists of a few audio-related fields
	188	(sample rate, channels), Vorbis version (always '0' in Vorbis I),
	189	bitrate hints, and the lists of component instances. All other
	190	configuration is in the context of specific components.</para>
	191	</section>
	192
	193	<section><title>Mode</title>
	194
	195	<para>
	196	Each Vorbis frame is coded according to a master 'mode'. A bitstream
	197	may use one or many modes.</para>
	198
	199	<para>
	200	The mode mechanism is used to encode a frame according to one of
	201	multiple possible methods with the intention of choosing a method best
	202	suited to that frame. Switching modes is, for example, how frame size
	203	is changed from frame to frame. The mode number of a frame serves as a
	204	top level configuration switch for all other specific aspects of frame
	205	decode.</para>
	206
	207	<para>
	208	A 'mode' configuration consists of a frame size setting, window type
	209	(always 0, the Vorbis window, in Vorbis I), transform type (always
	210	type 0, the MDCT, in Vorbis I) and a mapping number. The mapping
	211	number specifies which mapping configuration instance to use for
	212	low-level packet decode and synthesis.</para>
	213
	214	</section>
	215
	216	<section><title>Mapping</title>
	217
	218	<para>
	219	A mapping contains a channel coupling description and a list of
	220	'submaps' that bundle sets of channel vectors together for grouped
	221	encoding and decoding. These submaps are not references to external
	222	components; the submap list is internal and specific to a mapping.</para>
	223
	224	<para>
	225	A 'submap' is a configuration/grouping that applies to a subset of
	226	floor and residue vectors within a mapping. The submap functions as a
	227	last layer of indirection such that specific special floor or residue
	228	settings can be applied not only to all the vectors in a given mode,
	229	but also specific vectors in a specific mode. Each submap specifies
	230	the proper floor and residue instance number to use for decoding that
	231	submap's spectral floor and spectral residue vectors.</para>
	232
	233	<para>
	234	As an example:</para>
	235
	236	<para>
	237	Assume a Vorbis stream that contains six channels in the standard 5.1
	238	format. The sixth channel, as is normal in 5.1, is bass only.
	239	Therefore it would be wasteful to encode a full-spectrum version of it
	240	as with the other channels. The submapping mechanism can be used to
	241	apply a full range floor and residue encoding to channels 0 through 4,
	242	and a bass-only representation to the bass channel, thus saving space.
	243	In this example, channels 0-4 belong to submap 0 (which indicates use
	244	of a full-range floor) and channel 5 belongs to submap 1, which uses a
	245	bass-only representation.</para>
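
<para>
The 5.1 example above can be sketched as a small data structure. This
is only an illustration: the names <literal>Submap</literal>,
<literal>Mapping</literal> and <literal>channel_submap</literal>, and
the instance numbers 0 and 1, are assumptions made for the sketch and
are not field names or values taken from the bitstream format.</para>

<programlisting>
```python
from dataclasses import dataclass


@dataclass
class Submap:
    floor: int    # floor instance number used by this submap
    residue: int  # residue instance number used by this submap


@dataclass
class Mapping:
    submaps: list         # submap list, internal to this mapping
    channel_submap: list  # channel_submap[c] is the submap index of channel c

    def floor_for(self, channel):
        """Floor instance number used to decode the given channel."""
        return self.submaps[self.channel_submap[channel]].floor

    def residue_for(self, channel):
        """Residue instance number used to decode the given channel."""
        return self.submaps[self.channel_submap[channel]].residue


# 5.1 example: channels 0-4 are full range, channel 5 is bass only.
# The instance numbers are placeholders, not values from a real stream.
mapping_5_1 = Mapping(
    submaps=[Submap(floor=0, residue=0), Submap(floor=1, residue=1)],
    channel_submap=[0, 0, 0, 0, 0, 1],
)
```
</programlisting>

<para>
Looking up channel 5 goes through submap 1 and so selects the
bass-only floor and residue instances, while channels 0 through 4
share the full-range instances of submap 0.</para>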
	246
	247	</section>
	248
	249	<section><title>Floor</title>
	250
	251	<para>
	252	Vorbis encodes a spectral 'floor' vector for each PCM channel. This
	253	vector is a low-resolution representation of the audio spectrum for
	254	the given channel in the current frame, generally used akin to a
	255	whitening filter. It is named a 'floor' because the Xiph.Org
	256	reference encoder has historically used it as a unit-baseline for
	257	spectral resolution.</para>
	258
	259	<para>
	260	A floor encoding may be of two types. Floor 0 uses a packed LSP
	261	representation on a dB amplitude scale and Bark frequency scale.
	262	Floor 1 represents the curve as a piecewise linear interpolated
	263	representation on a dB amplitude scale and linear frequency scale.
	264	The two floors are semantically interchangeable in
	265	encoding/decoding. However, floor type 1 provides more stable
	266	inter-frame behavior, and so is the preferred choice in all
	267	coupled-stereo and high bitrate modes. Floor 1 is also considerably
	268	less expensive to decode than floor 0.</para>
	269
	270	<para>
	271	Floor 0 is not to be considered deprecated, but it is of limited
	272	modern use. No known Vorbis encoder past Xiph.org's own beta 4 makes
	273	use of floor 0.</para>
	274
	275	<para>
	276	The values coded/decoded by a floor are both compactly formatted and
	277	make use of entropy coding to save space. For this reason, a floor
	278	configuration generally refers to multiple codebooks in the codebook
	279	component list. Entropy coding is thus provided as an abstraction,
	280	and each floor instance may choose from any and all available
	281	codebooks when coding/decoding.</para>
	282
	283	</section>
	284
	285	<section><title>Residue</title>
	286	<para>
	287	The spectral residue is the fine structure of the audio spectrum
	288	once the floor curve has been subtracted out. In simplest terms, it
	289	is coded in the bitstream using cascaded (multi-pass) vector
	290	quantization according to one of three specific packing/coding
	291	algorithms numbered 0 through 2. The packing algorithm details are
	292	configured by residue instance. As with the floor components, the
	293	final VQ/entropy encoding is provided by external codebook instances
	294	and each residue instance may choose from any and all available
	295	codebooks.</para>
	296	</section>
	297
	298	<section><title>Codebooks</title>
	299
	300	<para>
	301	Codebooks are a self-contained abstraction that performs entropy
	302	decoding and, optionally, use the entropy-decoded integer value as an
	303	offset into an index of output value vectors, returning the indicated
	304	vector of values.</para>
	305
	306	<para>
	307	The entropy coding in a Vorbis I codebook is provided by a standard
	308	Huffman binary tree representation. This tree is tightly packed using
	309	one of several methods, depending on whether codeword lengths are
	310	ordered or unordered, or the tree is sparse.</para>
	311
	312	<para>
	313	The codebook vector index is similarly packed according to index
	314	characteristic. Most commonly, the vector index is encoded as a
	315	single list of possible values that are then permuted into
	316	a list of n-dimensional rows (lattice VQ).</para>
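
<para>
One way to picture the permutation of a single value list into
n-dimensional rows is to treat the entropy-decoded integer as a number
written in base N, where N is the length of the value list, and let
each digit select one value. The sketch below follows that reading.
The function and parameter names are assumptions made for
illustration, and details such as sequential (cumulative) values are
omitted; the codebook section of the specification remains the
authority on the exact unpacking.</para>

<programlisting>
```python
def lattice_vector(lookup_offset, multiplicands, dimensions,
                   minimum_value=0.0, delta_value=1.0):
    """Build one output vector from a shared list of possible values.

    lookup_offset is the entropy-decoded integer; its base-N digits
    (N = len(multiplicands)) pick one value per output dimension.
    """
    lookup_values = len(multiplicands)
    vector = []
    index_divisor = 1
    for _ in range(dimensions):
        digit = (lookup_offset // index_divisor) % lookup_values
        vector.append(multiplicands[digit] * delta_value + minimum_value)
        index_divisor *= lookup_values
    return vector
```
</programlisting>

<para>
With three possible values and two dimensions, the nine offsets 0
through 8 enumerate every pair of values, which is why only the short
value list needs to be stored in the header.</para>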
	317
	318	</section>
	319
	320	</section>
	321
	322
	323	<section>
	324	<title>High-level Decode Process</title>
	325
	326	<section>
	327	<title>Decode Setup</title>
	328
	329	<para>
	330	Before decoding can begin, a decoder must initialize using the
	331	bitstream headers matching the stream to be decoded. Vorbis uses
	332	three header packets; all are required, in-order, by this
	333	specification. Once set up, decode may begin at any audio packet
	334	belonging to the Vorbis stream. In Vorbis I, all packets after the
	335	three initial headers are audio packets. </para>
	336
	337	<para>
	338	The header packets are, in order, the identification
	339	header, the comments header, and the setup header.</para>
	340
	341	<section><title>Identification Header</title>
	342	<para>
	343	The identification header identifies the bitstream as Vorbis and gives the Vorbis
	344	version and the simple audio characteristics of the stream such as
	345	sample rate and number of channels.</para>
	346	</section>
	347
	348	<section><title>Comment Header</title>
	349	<para>
	350	The comment header includes user text comments ("tags") and a vendor
	351	string for the application/library that produced the bitstream. The
	352	encoding and proper use of the comment header is described in
	353	<xref linkend="vorbis-spec-comment"/>.</para>
	354	</section>
	355
	356	<section><title>Setup Header</title>
	357	<para>
	358	The setup header includes extensive CODEC setup information as well as
	359	the complete VQ and Huffman codebooks needed for decode.</para>
	360	</section>
	361
	362	</section>
	363
	364	<section><title>Decode Procedure</title>
	365
	366	<highlights>
	367	<para>
	368	The decoding and synthesis procedure for all audio packets is
	369	fundamentally the same.
	370	<orderedlist>
	371	<listitem><simpara>decode packet type flag</simpara></listitem>
	372	<listitem><simpara>decode mode number</simpara></listitem>
	373	<listitem><simpara>decode window shape (long windows only)</simpara></listitem>
	374	<listitem><simpara>decode floor</simpara></listitem>
	375	<listitem><simpara>decode residue into residue vectors</simpara></listitem>
	376	<listitem><simpara>inverse channel coupling of residue vectors</simpara></listitem>
	377	<listitem><simpara>generate floor curve from decoded floor data</simpara></listitem>
	378	<listitem><simpara>compute dot product of floor and residue, producing audio spectrum vector</simpara></listitem>
	379	<listitem><simpara>inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I</simpara></listitem>
	380	<listitem><simpara>overlap/add left-hand output of transform with right-hand output of previous frame</simpara></listitem>
	381	<listitem><simpara>store right-hand data from transform of current frame for future lapping</simpara></listitem>
	382	<listitem><simpara>if not first frame, return results of overlap/add as audio result of current frame</simpara></listitem>
	383	</orderedlist>
	384	</para>
	385	</highlights>
	386
	387	<para>
	388	Note that clever rearrangement of the synthesis arithmetic is
	389	possible; as an example, one can take advantage of symmetries in the
	390	MDCT to store the right-hand transform data of a partial MDCT for a
	391	50% inter-frame buffer space savings, and then complete the transform
	392	later before overlap/add with the next frame. This optimization
	393	produces entirely equivalent output and is naturally perfectly legal.
	394	The decoder must be <emphasis>entirely mathematically equivalent</emphasis> to the
	395	specification; it need not be a literal semantic implementation.</para>
	396
	397	<section><title>Packet type decode</title>
	398
	399	<para>
	400	Vorbis I uses four packet types. The first three packet types mark each
	401	of the three Vorbis headers described above. The fourth packet type
	402	marks an audio packet. All other packet types are reserved; packets
	403	marked with a reserved type should be ignored.</para>
	404
	405	<para>
	406	Following the three header packets, all packets in a Vorbis I stream
	407	are audio. The first step of audio packet decode is to read and
	408	verify the packet type; <emphasis>a non-audio packet when audio is expected
	409	indicates stream corruption or a non-compliant stream. The decoder
	410	must ignore the packet and not attempt decoding it to
	411	audio</emphasis>.</para>
	412
	413	</section>
	414
	415
	416	<section><title>Mode decode</title>
	417	<para>
	418	Vorbis allows an encoder to set up multiple, numbered packet 'modes',
	419	as described earlier, all of which may be used in a given Vorbis
	420	stream. The mode is encoded as an integer used as a direct offset into
	421	the mode instance index. </para>
	422	</section>
	423
	424	<section id="vorbis-spec-window">
	425	<title>Window shape decode (long windows only)</title>
	426
	427	<para>
	428	Vorbis frames may be one of two PCM sample sizes specified during
	429	codec setup. In Vorbis I, legal frame sizes are powers of two from 64
	430	to 8192 samples. Aside from coupling, Vorbis handles channels as
	431	independent vectors and these frame sizes are in samples per channel.</para>
	432
	433	<para>
	434	Vorbis uses an overlapping transform, namely the MDCT, to blend one
	435	frame into the next, avoiding most inter-frame block boundary
	436	artifacts. The MDCT output of one frame is windowed according to MDCT
	437	requirements, overlapped 50% with the output of the previous frame and
	438	added. The window shape assures seamless reconstruction. </para>
	439
	440	<para>
	441	This is easy to visualize in the case of equal-sized windows:</para>
	442
	443	<mediaobject>
	444	<imageobject>
	445	<imagedata fileref="window1.png" format="PNG"/>
	446	</imageobject>
	447	<textobject>
	448	<phrase>overlap of two equal-sized windows</phrase>
	449	</textobject>
	450	</mediaobject>
	451
	452	<para>
	453	And slightly more complex in the case of overlapping unequal-sized
	454	windows:</para>
	455
	456	<mediaobject>
	457	<imageobject>
	458	<imagedata fileref="window2.png" format="PNG"/>
	459	</imageobject>
	460	<textobject>
	461	<phrase>overlap of a long and a short window</phrase>
	462	</textobject>
	463	</mediaobject>
	464
	465	<para>
	466	In the unequal-sized window case, the window shape of the long window
	467	must be modified for seamless lapping as above. It is possible to
	468	correctly infer window shape to be applied to the current window from
	469	knowing the sizes of the current, previous and next window. It is
	470	legal for a decoder to use this method. However, in the case of a long
	471	window (short windows require no modification), Vorbis also codes two
	472	flag bits to specify pre- and post- window shape. Although not
	473	strictly necessary for function, this minor redundancy allows a packet
	474	to be fully decoded to the point of lapping entirely independently of
	475	any other packet, allowing easier abstraction of decode layers as well
	476	as allowing a greater level of easy parallelism in encode and
	477	decode.</para>
	478
	479	<para>
	480	A description of valid window functions for use with an inverse MDCT
	481	can be found in the paper
	482	<citetitle pubwork="article">
	483	<ulink url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps">
	484	The use of multirate filter banks for coding of high quality digital
	485	audio</ulink></citetitle>, by T. Sporer, K. Brandenburg and B. Edler. Vorbis windows
	486	all use the slope function
	487	<inlineequation>
	488
	489	<alt>y=sin(.5*pi*sin^2((x+.5)/n*pi))</alt>
	490	<inlinemediaobject>
	491	<textobject>
	492	<phrase>$y = \sin\left(\frac{\pi}{2} \, \sin^2\left(\frac{x+.5}{n} \, \pi\right)\right)$</phrase>
	493	</textobject>
	494	</inlinemediaobject>
	495	</inlineequation>.
	496	</para>
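
<para>
As a minimal sketch, the slope function can be evaluated directly to
build a window for the equal-sized case. It is assumed here that
<literal>n</literal> is the full window length and that
<literal>x</literal> runs from 0 to n-1; the function name
<literal>vorbis_window</literal> is illustrative only.</para>

<programlisting>
```python
import math


def vorbis_window(n):
    """Evaluate y = sin(pi/2 * sin^2((x + 0.5) / n * pi)) for x in 0..n-1."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]


def is_power_complementary(window):
    """Check that squared samples half a window apart sum to one."""
    half = len(window) // 2
    return all(
        math.isclose(window[x] ** 2 + window[x + half] ** 2, 1.0)
        for x in range(half)
    )
```
</programlisting>

<para>
The second function checks the property that makes 50% overlapped
lapping seamless: the squares of two samples half a window apart sum
to one, so the windowed overlap of consecutive frames reconstructs the
signal at unit gain.</para>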
	497
	498	</section>
	499
	500	<section><title>floor decode</title>
	501	<para>
	502	Each floor is encoded/decoded in channel order; however, each floor
	503	belongs to a 'submap' that specifies which floor configuration to
	504	use. All floors are decoded before residue decode begins.</para>
	505	</section>
	506
	507	<section><title>residue decode</title>
	508
	509	<para>
	510	Although the number of residue vectors equals the number of channels,
	511	channel coupling may mean that the raw residue vectors extracted
	512	during decode do not map directly to specific channels. When channel
	513	coupling is in use, some vectors will correspond to coupled magnitude
	514	or angle. The coupling relationships are described in the codec setup
	515	and may differ from frame to frame, due to different mode numbers.</para>
	516
	517	<para>
	518	Vorbis codes residue vectors in groups by submap; the coding is done
	519	in submap order from submap 0 through n-1. This differs from floors
	520	which are coded using a configuration provided by submap number, but
	521	are coded individually in channel order.</para>
	522
	523	</section>
	524
	525	<section><title>inverse channel coupling</title>
	526
	527	<para>
	528	A detailed discussion of stereo in the Vorbis codec can be found in
	529	the document <ulink url="stereo.html"><citetitle>Stereo Channel Coupling in the
	530	Vorbis CODEC</citetitle></ulink>. Vorbis is not limited to only stereo coupling, but
	531	the stereo document also gives a good overview of the generic coupling
	532	mechanism.</para>
	533
	534	<para>
	535	Vorbis coupling applies to pairs of residue vectors at a time;
	536	decoupling is done in-place a pair at a time in the order and using
	537	the vectors specified in the current mapping configuration. The
	538	decoupling operation is the same for all pairs, converting square
	539	polar representation (where one vector is magnitude and the second
	540	angle) back to Cartesian representation.</para>
	541
	542	<para>
	543	After each pair of vectors on the coupling list has been decoupled, in order,
	544	the resulting residue vectors represent the fine spectral detail
	545	of each output channel.</para>
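
<para>
The in-place, pair-at-a-time decoupling can be sketched as follows.
The branch structure shown is an assumption reproduced for
illustration of the square polar to Cartesian conversion; the
authoritative rule is the one given in the bit-by-bit part of the
specification, and the function names are hypothetical.</para>

<programlisting>
```python
def decouple(m, a):
    """Convert one (magnitude, angle) element pair back to two channel values.

    Returns the new values for the magnitude and angle vectors, in that order.
    """
    if m > 0:
        if a > 0:
            return m, m - a
        return m + a, m
    if a > 0:
        return m, m + a
    return m - a, m


def decouple_vectors(magnitude, angle):
    """Decouple a pair of residue vectors in place, element by element."""
    for i, (m, a) in enumerate(zip(magnitude, angle)):
        magnitude[i], angle[i] = decouple(m, a)
```
</programlisting>

<para>
A decoder would call <literal>decouple_vectors</literal> once per
entry on the coupling list, in the order the mapping specifies, since
later pairs may reuse vectors that earlier pairs have already
rewritten.</para>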
	546
	547	</section>
	548
	549	<section><title>generate floor curve</title>
	550
	551	<para>
	552	The decoder may choose to generate the floor curve at any appropriate
	553	time. It is reasonable to generate the output curve when the floor
	554	data is decoded from the raw packet, or it can be generated after
	555	inverse coupling and applied to the spectral residue directly,
	556	combining generation and the dot product into one step and eliminating
	557	some working space.</para>
	558
	559	<para>
	560	Both floor 0 and floor 1 generate a linear-range, linear-domain output
	561	vector to be multiplied (dot product) by the linear-range,
	562	linear-domain spectral residue.</para>
	563
	564	</section>
	565
	566	<section><title>compute floor/residue dot product</title>
	567
	568	<para>
	569	This step is straightforward; for each output channel, the decoder
	570	multiplies the floor curve and residue vectors element by element,
	571	producing the finished audio spectrum of each channel.</para>
	572
	573	<para>
	574	One point is worth mentioning about this dot product; a common mistake
	575	in a fixed point implementation might be to assume that a 32 bit
	576	fixed-point representation for floor and residue and direct
	577	multiplication of the vectors is sufficient for acceptable spectral
	578	depth in all cases because it happens to mostly work with the current
	579	Xiph.Org reference encoder.</para>
	580
	581	<para>
	582	However, floor vector values can span ~140dB (~24 bits unsigned), and
	583	the audio spectrum vector should represent a minimum of 120dB (~21
	584	bits with sign), even when output is to a 16 bit PCM device. For the
	585	residue vector to represent full scale if the floor is nailed to
	586	-140dB, it must be able to span 0 to +140dB. For the residue vector
	587	to reach full scale if the floor is nailed at 0dB, it must be able to
	588	represent -140dB to +0dB. Thus, in order to handle full range
	589	dynamics, a residue vector may span -140dB to +140dB entirely within
	590	spec. A 280dB range is approximately 48 bits with sign; thus the
	591	residue vector must be able to represent a 48 bit range and the dot
	592	product must be able to handle an effective 48 bit times 24 bit
	593	multiplication. This range may be achieved using large (64 bit or
	594	larger) integers, or implementing a movable binary point
	595	representation.</para>
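
<para>
The bit counts quoted above follow from the rule that each bit of
linear amplitude covers 20*log10(2), roughly 6.02, dB. The short
sketch below reproduces the arithmetic; the helper name
<literal>bits_for_range</literal> is an assumption made for this
example.</para>

<programlisting>
```python
import math

# Dynamic range covered by one bit of linear amplitude, about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def bits_for_range(db, signed=False):
    """Whole bits needed to span a range of db decibels, plus an optional sign bit."""
    return math.ceil(db / DB_PER_BIT) + (1 if signed else 0)
```
</programlisting>

<para>
Under these assumptions a 140dB floor needs 24 unsigned bits, a 120dB
spectrum needs 21 bits with sign, and a 280dB residue range needs 48
bits with sign, matching the figures in the text.</para>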
	596
	597	</section>
	598
	599	<section><title>inverse monolithic transform (MDCT)</title>
	600
	601	<para>
	602	The audio spectrum is converted back into time domain PCM audio via an
	603	inverse Modified Discrete Cosine Transform (MDCT). A detailed
	604	description of the MDCT is available in the paper <ulink
	605	url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps"><citetitle pubwork="article">The use of multirate filter banks for coding of high quality digital
	606	audio</citetitle></ulink>, by T. Sporer, K. Brandenburg and B. Edler.</para>
	607
	608	<para>
	609	Note that the PCM produced directly from the MDCT is not yet finished
	610	audio; it must be lapped with surrounding frames using an appropriate
	611	window (such as the Vorbis window) before the MDCT can be considered
	612	orthogonal.</para>
	613
	614	</section>
	615
	616	<section><title>overlap/add data</title>
	617	<para>
	618	Windowed MDCT output is overlapped and added with the right hand data
	619	of the previous window such that the 3/4 point of the previous window
	620	is aligned with the 1/4 point of the current window (as illustrated in
	621	the window overlap diagram). At this point, the audio data between the
	622	center of the previous frame and the center of the current frame is
	623	now finished and ready to be returned. </para>
	624	</section>
	625
	626	<section><title>cache right hand data</title>
	627	<para>
	628	The decoder must cache the right hand portion of the current frame to
	629	be lapped with the left hand portion of the next frame.
	630	</para>
	631	</section>
	632
	633	<section><title>return finished audio data</title>
	634
	635	<para>
	636	The overlapped portion produced from overlapping the previous and
	637	current frame data is finished data to be returned by the decoder.
	638	This data spans from the center of the previous window to the center
	639	of the current window. In the case of same-sized windows, the amount
	640	of data to return is one-half block, consisting only of the
	641	overlapped portions. When overlapping a short and long window, much of
	642	the returned range is not actually overlap. This does not damage
	643	transform orthogonality. Take care, however, to return the
	644	correct data range; the amount of data to be returned is:
	645
	646	<programlisting>
	647	window_blocksize(previous_window)/4+window_blocksize(current_window)/4
	648	</programlisting>
	649
	650	from the center of the previous window to the center of the current
	651	window.</para>
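
<para>
As a quick check of the expression above, the following sketch
computes the number of samples per channel returned for a pair of
consecutive windows. The function name is illustrative, and the block
sizes used in the note that follows are only example values within the
legal range.</para>

<programlisting>
```python
def samples_returned(previous_blocksize, current_blocksize):
    """Samples per channel between the centers of two consecutive windows."""
    return previous_blocksize // 4 + current_blocksize // 4
```
</programlisting>

<para>
For two 2048-sample windows this gives 1024 samples, the one-half
block described above; for a 256-sample window followed by a
2048-sample window it gives 576.</para>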
	652
	653	<para>
	654	Data is not returned from the first frame; it must be used to 'prime'
	655	the decode engine. The encoder accounts for this priming when
	656	calculating PCM offsets; after the first frame, the proper PCM output
	657	offset is '0' (as no data has been returned yet).</para>
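
<para>
The overlap/add, caching and priming steps can be sketched together
for the simplest case. This sketch assumes equal-sized windows only
and input frames that have already been windowed; the class name
<literal>EqualSizeLapper</literal> and its method are hypothetical and
do not correspond to any reference decoder interface.</para>

<programlisting>
```python
class EqualSizeLapper:
    """Overlap/add for a stream of equal-sized, already-windowed frames."""

    def __init__(self):
        self.right = None  # cached right-hand half of the previous frame

    def push(self, windowed_frame):
        """Accept one frame; return finished samples, or None while priming."""
        half = len(windowed_frame) // 2
        left, right = windowed_frame[:half], windowed_frame[half:]
        previous, self.right = self.right, right
        if previous is None:
            return None  # the first frame only primes the decoder
        return [p + c for p, c in zip(previous, left)]
```
</programlisting>

<para>
The first call returns nothing, mirroring the priming behavior
described above; every later call returns one-half block of finished
audio.</para>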
	658	</section>
	659	</section>
	660
	661	</section>
	662
	663	</section>
	664	<!-- end Vorbis I specification introduction and description -->
	665
