diff options
Diffstat (limited to 'libtoolame-dab/html/vbr.html')
-rw-r--r-- | libtoolame-dab/html/vbr.html | 239 |
1 files changed, 239 insertions, 0 deletions
diff --git a/libtoolame-dab/html/vbr.html b/libtoolame-dab/html/vbr.html new file mode 100644 index 0000000..5f7ae38 --- /dev/null +++ b/libtoolame-dab/html/vbr.html @@ -0,0 +1,239 @@ +<html> +<head> +<title>tooLAME: MPEG Audio Layer II VBR</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> + +<table border = 0 width="75%" align="center"><tr><td> +<h1> tooLAME: MPEG Audio Layer II VBR </h1> + +<h2>Contents</h2> +<Ul> +<LI>Introduction +<LI>Usage +<LI>Bitrate Ranges for various Sampling frequencies +<LI>Why can't the bitrate vary from 32kbps to 384kbps for every file? +<UL> + <LI>Short Answer + <LI>Long Answer +</UL> +<LI> Tech Stuff +</UL> + +<h2>Introduction</h2> +VBR mode works by selecting a different bitrate for each frame. Frames +which are harder to encode will be allocated more bits i.e. a higher bitrate.</p> + +LayerII VBR is a complete hack - the ISO standard actually says that decoders are not +required to support it. As a hack, its implementation is a pain to try and understand. +If you're mega-keen to get full range VBR working, either (a) send me money (b) grab the +ISO standard and a C compiler and email me.</p> + +<h2>Usage</h2> +<pre> + toolame -v [level] inputfile outputfile. +</pre> +A level of 5 works very well for me.</p> + +The level value can is a measurement of quality - the higher +the level the higher the average bitrate of the resultant file. +[See TECH STUFF for a better explanation of what the value does]</p> + +The confusing part of my implementation of LayerII VBR is that it's different from MP3 VBR. +<UL> +<LI>The range of bitrates used is controlled by the input sampling frequency. (See below "Bitrate ranges") +<LI>The tendency to use higher bitrates is governed by the <level>. +</ul> + +E.g. Say you have a 44.1kHz Stereo file. In VBR mode, the bitrate can range from 192 to 384 kbps. +Using "-v -5" will force the encoder to favour the lower bitrate. +Using "-v 5" will force the encoder to favour the upper bitrate. +The value can actually be *any* int. -27, 233, 47. The larger the number, the greater +the bitrate bias.</p> + +<h2>Bitrate Ranges</h2> + +When making a VBR stream, the bitrate is only allowed to vary within +set limits</p> + +<pre> +48kHz +Stereo: 112-384kbps Mono: 56-192kbps + +44.1kHz & 32kHz +Stereo: 192-384kbps Mono: 96-192kbps + +24kHz, 22.05kHz & 16kHz +Stereo/Mono: 8-160kbps +</pre> + +<h2>Why doesn't the VBR mode work the same as MP3VBR? The Short Answer</h2> +<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b></p> +According to the standard (ISO/IEC 11172-3:1993) Section 2.4.2.3 +<pre> + "In order to provide the smallest possible delay and complexity, the + decoder is not required to support a continuously variable bitrate when + in layer I or II. Layer III supports variable bitrate by switching the + bitrate index." + + and + + "For Layer II, not all combinations of total bitrate and mode are allowed." +</pre> + +Hence, most LayerII coders would not have been written with VBR in mind, and +LayerII VBR is a hack. It works for limited cases. Getting it to work to +the same extent as MP3-style VBR will be a major hack.</p> + +(If you *really* want better bitrate ranges, read "The Long Answer" and submit your mega-patch.)</p> + +<h2>Why doesn't the VBR mode work the same as MP3VBR? The Long Answer</h2> +<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b> + +<h3>Reason 1: The standard limits the range</h3> + +As quoted above from the standard for 48/44.1/32kHz: +<pre> + "For Layer II, not all combinations of total bitrate and mode are allowed. See + the following table." + +Bitrate Allowed Modes +(kbps) +32 mono only +48 mono only +56 mono only +64 all modes +80 mono only +96 all modes +112 all modes +128 all modes +160 all modes +192 all modes +224 stereo only +256 stereo only +320 stereo only +384 stereo only +</pre> + +So based upon this table alone, you *could* have VBR stereo encoding which varies +smoothly from 96 to 384kbps. Or you could have have VBR mono encoding which varies from +32 to 192kbps. But since the top and bottom bitrates don't apply to all modes, it would +be impossible to have a stereo file encoded from 32 to 384 kbps.</p> + +But this isn't what is really limiting the allowable bitrate range - the bit allocation +tables are the major hurdle.</p> + +<h3>Reason 2: The bit allocation tables don't allow it</h3> + +From the standard, Section 2.4.3.3.1 "Bit allocation decoding"</p> +<pre> + "For different combinations of bitrate and sampling frequency, different bit + allocation tables exist. +</pre> +These bit allocation tables are pre-determined tables (in Annex B of the standard) which +indicate +<UL> + <LI>how many bits to read for the initial data (2,3 or 4) + <LI>these bits are then used as an index back into the table to + find the number of quantize levels for the samples in this subband +</ul> +But the table used (and hence the number of bits and the calculated index) are different +for different combinations of bitrate and sampling frequency.</p> + +I will use TableB.2a as an example.</p> + +Table B.2a Applies for the following combinations. +<pre> +Sampling Freq Bitrates in (kbps/channel) [emphasis: this is a PER CHANNEL bitrate] +48 56, 64, 80, 96, 112, 128, 160, 192 +44.1 56, 64, 80 +32 56, 64, 80 +</pre> +If we have a STEREO 48kHz input file, and we use this table, then the bitrates +we could calculate from this would be 112, 128, 160, 192, 224, 256, 320 and 384 kbps.</p> + +This table contains no information on how to encode stuff at bitrates less than 112kbps +(for a stereo file). You would have to load allocation table B.2c to encode stereo at +64kbps and 128kbps.</p> + +Since it would be a MAJOR piece of hacking to get the different tables shifted in and out +during the encoding process, once an allocation table is loaded *IT IS NOT CHANGED*.</p> + +Hence, the best table is picked at the start of the encoding process, and the encoder +is stuck with it for the rest of the encode. </p> + +For toolame-02j, I have picked the table it loads for different +sampling frequencies in order to optimize the range of bitrates possible. +<pre> +48 kHz - Table B.2a + Stereo Bitrate Range: 112 - 384 + Mono Bitrate Range : 56 - 192 + +44.1/32 kHz - Table B.2b + Stereo Bitrate Range: 192 - 384 + Mono Bitrate Range: 96 - 192 + +24/22.05/16 kHz - LSF Table (Standard ISO/IEC 13818.3:1995 Annex B, Table B.1) + There is only 1 table for the Lower Sampling Frequencies + All modes (mono and stereo) are allowable at all bitrates + So at the Lower Sampling Frequencies you *can* have a completely variable + bitrate over the entire range. +</pre> +<h2>Tech Stuff</h2> + +The VBR mode is mainly centered around the main_bit_allocation() and +a_bit_allocation() routines in encode.c.</p> + +The limited range of VBR is due to my particular implementation which restricts +ranges to within one alloc table (see tables B.2a, B.2b, B.2c and B.2d in ISO 11172). +The VBR range for 32/44.1khz lies within B.2b, and the 48khz VBR lies within table B.2a.</p> + +I'm not sure whether it is worth extending these ranges down to lower bitrates. +The work required to switch alloc tables *during* the encoding is major.</p> + +In the case of silence, it might be worth doing a quick check for very low signals +and writing a pre-calculated *blank* 32kpbs frame. [probably also a lot of work].</p> + +<h3>How CBR works</h3> +<UL> +<LI> Use the psycho model to determine the MNRs for each subband + [MNR = the ratio of "masking" to "noise"] + (From an encoding perspective, a bigger MNR in a subband means that + it sounds better since the noise is more masked)) +<LI> calculate the available data bits (adb) for this bitrate. +<LI> Based upon the MNR (Masking:Noise Ratio) values, allocate bits to each + subband +<LI> Keep increasing the bits to whichever subband currently has the min MNR + value until we have no bits left. +<LI> This mode does not guarentee that all the subbands are without noise + ie there may still be subbands with MNR less than 0.0 (noisy!) +</ul> + +<h3>How VBR works</h3> +<UL> +<LI> pretend we have lots of bits to spare, and work out the bits which would + raise the MNR in each subband to the level given by the argument on the + command line "-v [int]" +<LI> Pick the bitrate which has more bits than the required_bits we just calculated +<LI> calculate a_bit_allocation() +<LI> VBR "guarantees" that all subbands have MNR > VBRLEVEL or that we have + reached the maximum bitrate. +</ul> + +<h2>FUTURE</h2> +<UL> +<LI> with this VBR mode, we know the bits aren't going to run out, so we can + just assign them "greedily". +<LI> VBR_a_bit_allocation() is yet to be written :) +</ul> + + +</tr></td> +</body> +</html> |