libtoolame-dab/html/vbr.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239

<html>
<head>
<title>tooLAME: MPEG Audio Layer II VBR</title>
<style>	
<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif }
	A { COLOR: #111177;  TEXT-DECORATION: none }
       TD { font-size: medium; font-weight:normal }
--!>
</STYLE>
</head>
<body>

<table border = 0 width="75%" align="center"><tr><td>
<h1> tooLAME: MPEG Audio Layer II VBR </h1>

<h2>Contents</h2>
<Ul>
<LI>Introduction
<LI>Usage
<LI>Bitrate Ranges for various Sampling frequencies
<LI>Why can't the bitrate vary from 32kbps to 384kbps for every  file?
<UL>
	<LI>Short Answer
	<LI>Long Answer 
</UL>
<LI> Tech Stuff
</UL>

<h2>Introduction</h2>
VBR mode works by selecting a different bitrate for each frame.  Frames
which are harder to encode will be allocated more bits i.e. a higher bitrate.</p>

LayerII VBR is a complete hack - the ISO standard actually says that decoders are not
required to support it. As a hack, its implementation is a pain to try and understand.
If you're mega-keen to get full range VBR working, either (a) send me money (b) grab the 
ISO standard and a C compiler and email me.</p>

<h2>Usage</h2>
<pre>
	toolame -v [level] inputfile outputfile.
</pre>
A level of 5 works very well for me.</p>

The level value can is a measurement of quality - the higher
the level the higher the average bitrate of the resultant file. 
[See TECH STUFF for a better explanation of what the value does]</p>

The confusing part of my implementation of LayerII VBR is that it's different from MP3 VBR.
<UL>
<LI>The range of bitrates used is controlled by the input sampling frequency. (See below "Bitrate ranges")
<LI>The tendency to use higher bitrates is governed by the <level>.
</ul>

E.g. Say you have a 44.1kHz Stereo file. In VBR mode, the bitrate can range from 192 to 384 kbps.  
Using "-v -5" will force the encoder to favour the lower bitrate.  
Using "-v 5" will force the encoder to favour the upper bitrate. 
The value can actually be *any* int. -27, 233, 47. The larger the number, the greater 
the bitrate bias.</p>

<h2>Bitrate Ranges</h2>

When making a VBR stream, the bitrate is only allowed to vary within
set limits</p>

<pre>
48kHz   
Stereo: 112-384kbps  Mono: 56-192kbps

44.1kHz & 32kHz
Stereo: 192-384kbps  Mono: 96-192kbps

24kHz, 22.05kHz & 16kHz
Stereo/Mono: 8-160kbps
</pre>

<h2>Why doesn't the VBR mode work the same as MP3VBR? The Short Answer</h2>
<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b></p>
According to the standard (ISO/IEC 11172-3:1993) Section 2.4.2.3
<pre>
	"In order to provide the smallest possible delay and complexity, the
	 decoder is not required to support a continuously variable bitrate when
	 in layer I or II.  Layer III supports variable bitrate by switching the
	 bitrate index."

	and 

	"For Layer II, not all combinations of total bitrate and mode are allowed."
</pre>

Hence, most LayerII coders would not have been written with VBR in mind, and 
LayerII VBR is a hack. It works for limited cases. Getting it to work to 
the same extent as MP3-style VBR will be a major hack.</p>

(If you *really* want better bitrate ranges, read "The Long Answer" and submit your mega-patch.)</p>

<h2>Why doesn't the VBR mode work the same as MP3VBR? The Long Answer</h2>
<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b>

<h3>Reason 1: The standard limits the range</h3>

As quoted above from the standard for 48/44.1/32kHz: 
<pre>
	"For Layer II, not all combinations of total bitrate and mode are allowed. See
	 the following table."

Bitrate		Allowed Modes
(kbps)
32		mono only
48		mono only
56		mono only
64		all modes
80		mono only
96		all modes
112		all modes
128		all modes
160		all modes
192		all modes
224		stereo only
256		stereo only
320		stereo only
384		stereo only
</pre>

So based upon this table alone, you *could* have VBR stereo encoding which varies
smoothly from 96 to 384kbps. Or you could have have VBR mono encoding which varies from
32 to 192kbps.  But since the top and bottom bitrates don't apply to all modes, it would
be impossible to have a stereo file encoded from 32 to 384 kbps.</p>

But this isn't what is really limiting the allowable bitrate range - the bit allocation
tables are the major hurdle.</p>

<h3>Reason 2: The bit allocation tables don't allow it</h3>

From the standard, Section 2.4.3.3.1 "Bit allocation decoding"</p>
<pre>
	"For different combinations of bitrate and sampling frequency, different bit 
	 allocation tables exist.
</pre>
These bit allocation tables are pre-determined tables (in Annex B of the standard) which
indicate 
<UL>
	<LI>how many bits to read for the initial data (2,3 or 4)
	<LI>these bits are then used as an index back into the table to 
	  find the number of quantize levels for the samples in this subband
</ul>
But the table used (and hence the number of bits and the calculated index) are different
for different combinations of bitrate and sampling frequency.</p>

I will use TableB.2a as an example.</p>

Table B.2a Applies for the following combinations.
<pre>
Sampling Freq		Bitrates in (kbps/channel) [emphasis: this is a PER CHANNEL bitrate]
48			56, 64, 80, 96, 112, 128, 160, 192
44.1			56, 64, 80
32			56, 64, 80
</pre>
If we have a STEREO 48kHz input file, and we use this table, then the bitrates
we could calculate from this would be 112, 128, 160, 192, 224, 256, 320 and 384 kbps.</p>

This table contains no information on how to encode stuff at bitrates less than 112kbps 
(for a stereo file). You would have to load allocation table B.2c to encode stereo at
64kbps and 128kbps.</p>

Since it would be a MAJOR piece of hacking to get the different tables shifted in and out
during the encoding process, once an allocation table is loaded *IT IS NOT CHANGED*.</p>

Hence, the best table is picked at the start of the encoding process, and the encoder
is stuck with it for the rest of the encode. </p>

For toolame-02j, I have picked the table it loads for different
sampling frequencies in order to optimize the range of bitrates possible.
<pre>
48 kHz - Table B.2a
	 Stereo Bitrate Range: 112 - 384
	 Mono Bitrate Range : 56 - 192

44.1/32 kHz - Table B.2b
	Stereo Bitrate Range: 192 - 384
	Mono Bitrate Range: 96 - 192

24/22.05/16 kHz - LSF Table (Standard ISO/IEC 13818.3:1995 Annex B, Table B.1)
	There is only 1 table for the Lower Sampling Frequencies
	All modes (mono and stereo) are allowable at all bitrates
	So at the Lower Sampling Frequencies you *can* have a completely variable 
	bitrate over the entire range.
</pre>
<h2>Tech Stuff</h2>

The VBR mode is mainly centered around the main_bit_allocation() and 
a_bit_allocation() routines in encode.c.</p>

The limited range of VBR is due to my particular implementation which restricts 
ranges to within one alloc table (see tables B.2a, B.2b, B.2c and B.2d in ISO 11172).  
The VBR range for 32/44.1khz lies within B.2b, and the 48khz VBR lies within table B.2a.</p>

I'm not sure whether it is worth extending these ranges down to lower bitrates.
The work required to switch alloc tables *during* the encoding is major.</p>

In the case of silence, it might be worth doing a quick check for very low signals
and writing a pre-calculated *blank* 32kpbs frame. [probably also a lot of work].</p>

<h3>How CBR works</h3>
<UL>
<LI>	Use the psycho model to determine the MNRs for each subband
	  [MNR = the ratio of "masking" to "noise"]
	  (From an encoding perspective, a bigger MNR in a subband means that
	   it sounds better since the noise is more masked))
<LI>	calculate the available data bits (adb) for this bitrate.
<LI>	Based upon the MNR (Masking:Noise Ratio) values, allocate bits to each 
	  subband 
<LI>	Keep increasing the bits to whichever subband currently has the min MNR 
	  value until we have no bits left.
<LI>	This mode does not guarentee that all the subbands are without noise
	  ie there may still be subbands with MNR less than  0.0  (noisy!)
</ul>

<h3>How VBR works</h3>
<UL>
<LI>	pretend we have lots of bits to spare, and work out the bits which would
	  raise the MNR in each subband to the level given by the argument on the
	  command line "-v [int]"
<LI>	Pick the bitrate which has more bits than the  required_bits we just calculated
<LI>	calculate a_bit_allocation()
<LI>	VBR "guarantees" that all subbands have MNR > VBRLEVEL or that we have 
	  reached the maximum bitrate.
</ul>

<h2>FUTURE</h2>
<UL>
<LI>	with this VBR mode, we know the bits aren't going to run out, so we can
	  just assign them "greedily".
<LI>	VBR_a_bit_allocation() is yet to be written :)
</ul>


</tr></td>
</body>
</html>