Diffstat (limited to 'libtoolame-dab/html')
-rw-r--r-- | libtoolame-dab/html/changes.html | 107
-rw-r--r-- | libtoolame-dab/html/default.html | 16
-rw-r--r-- | libtoolame-dab/html/psycho.html | 58
-rw-r--r-- | libtoolame-dab/html/readme.html | 236
-rw-r--r-- | libtoolame-dab/html/vbr.html | 239
5 files changed, 656 insertions, 0 deletions
diff --git a/libtoolame-dab/html/changes.html b/libtoolame-dab/html/changes.html new file mode 100644 index 0000000..544ad7b --- /dev/null +++ b/libtoolame-dab/html/changes.html @@ -0,0 +1,107 @@ +<html> +<head> +<title>tooLAME changes</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> +<table border = 0 width="75%" align="center"><tr><td> +<h1>TooLAME 02l - 2 March 2003</h1> +<UL> +<LI> <b>Major psychoacoustic model overhauls</b> +<UL> + <LI> For more detail, see <a href="psycho.html">psycho models</a> + <LI> <b>Psychoacoustic Model 3</b> + <UL> + <LI> Psycho3 is a reimplementation of the psychoacoustic model 1 from the + ISO standard. Pretty much totally rewritten from the ground up, following the nomenclature of the ISO standard. + <LI> Uses arrays for keeping tone/noise labels rather than the really + hard to grok pointer-stuff from the dist10 code. + <LI> Uses Painter + Spanias' Formula for the ATH rather than the tables from the standard. + <LI> Uses LAME's freq-to-bark conversion to construct the critical bands. + <LI> Future: Needs a proper geometric mean for weighting the noise. Needs one more function added to the decimation routines which would eliminate maskers within 0.5 dB of each other. + <LI> Eliminated all the ISO/dist10 tables. Everything is from equations or built from scratch. + </ul> + <LI> <b>Psychoacoustic Model 4</b> + <UL> + <LI> A reimplementation of psychoacoustic model 2. + <LI> Eliminated all the ISO/dist10 tables. Everything (ath, bark, critical bands) is from equations or built from scratch + <LI> FUTURE: For psycho model 2 and 4 there's some really bad "warbling" and "Davros" type noise. Depending on the loudness of your sound sample, this can get really annoying. I don't know where it's coming from. 
+ </ul> + <LI> <b>Psychoacoustic Model 0</b> + <UL> + <LI> This model uses the ATH and the scalefactors for each subband to build an approximate SMR for each subband at nearly zero cost. + <LI> Based upon an idea mentioned in "Low power mpeg/audio encoders using simplified psychoacoustic model and fast bit allocation" by Hyen-O Oh et al. + <LI> For the amount of effort that this psycho model puts in, the results are pretty good. + <LI> Future: Add some parameters to the equation to allow it to be tweaked on the fly. + </UL> + <LI> <b>Psychoacoustic Model -1</b> + <UL> + <LI> This is the old "fast" psychoacoustic model ("-f"). + <LI> All it does is copy over a static set of pre-calculated SMR values + <LI> Sounds OK for most stuff. + </UL> +</UL> +<LI> <b>New bitstream encoding routines (encode_new.c)</b> +<UL> + <LI> All the old tables.c/alloc_table stuff is now superfluous. + <LI> All tables are now at the top of encode_new.c + <LI> The tables have quite a bit of indirection to get to the value that you need (but really not any more indirection than the old alloc_table stuff). Probably need to add some more docs to say what's going on. + <LI> These new routines are the default, but you can remove the "NEWENCODE" definition and use the old ones (just in case). + <LI> The new routines will become the default in the next release. +</UL> +<LI> <b>More speed</b> +<UL> +<LI> All the trig stuff for psychomodel 4 is now done with tables instead of calculating exact values. +<LI> The <i>exact</i> trig values aren't really used directly in the encoder. They're sort of averaged over a couple of iterations and used as a predictor of uncertainty. So being off a few thousandths won't really affect anything. 
+</UL> +</UL> +<HL> +<h1>TooLAME 02k - 16 February 2003</h1> +<UL> +<LI> Some great speedups with a combined filtersubband and windowsubband (Ricardo Schelp ricardoschelp at arnet.com.ar) +<LI> Cleaned up the psycho model calling (should be easier to add your own psycho +model if you felt like having a hack) +<LI> DAB Extensions are now of variable length controlled by an argument to the -D switch + (Nicolas Croiset - ncroiset at vdl.fr) +<LI> Fixed raw PCM reading to no longer miss the first 40 bytes. (MFC) +<LI> No longer a 4GB limit when reading from stdin (or if your filesys supports) (Nicolas) +<LI> Tweaks to the end of the bitstream to allow concatenation of mp2 files (Nicolas) +<LI> Finally (?) fixed the segfaults using psy model 1 (Nicolas et al) +</ul> +<HL> +<h1>TooLAME 02j - 12 Feb 2003</h1> +<UL> +<LI>Definitely LGPL now. + +<LI>encode.c - VBR mode has been stabilised to work correctly for all sampling frequencies (<a href="vbr.html">README.VBR</a> has more details) + +<LI>get_audio.c has become audio_read.c - cleaned up that really dodgy wave header parsing. +(thanks to Philippe Jouguet - philippe.jouguet at vdl.fr and Henrik Herranen - leopold at vlsi.fi) + +<LI>spelling fix for 'extension' - Philippe again + +<LI>psycho_I.c - Speedup for "% 1408" calcs "-DSAMI_P1" sami.sallinen at g-cluster.com + (about 4% overall speedup for me) + +<LI>subband.c - Pointer arithmetic for filter subband "-DSAMI_SB" (sami again) + (doesn't give any advantage over gcc3.2 on my system) + +<LI>psycho_II.c +<UL> + <LI> enabled the use of gcc's _sincos(). "-DSINCOS -D_GNU_SOURCE" + about a 5% overall speed-up in encoding (Philippe again) + <LI> added the LSF frequencies so that you can use psy model 2 with LSF (good old Philippe) +</UL> +<LI>verbosity - added a '-t' flag to set the 'talkativity' level needed for transcode plugin (Andreas Neukoetter - anti at webhome.de) + +<LI>toolame.c - LSF files should now select a valid default bitrate by default.
(96kbps) + +</li> +</tr></td> +</body> +</html> diff --git a/libtoolame-dab/html/default.html b/libtoolame-dab/html/default.html new file mode 100644 index 0000000..b5a7b94 --- /dev/null +++ b/libtoolame-dab/html/default.html @@ -0,0 +1,16 @@ +<html> +<head> +<title>Stuff and Thing</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> +<table border = 0 width="75%" align="center"><tr><td> +Text goes here +</tr></td> +</body> +</html> diff --git a/libtoolame-dab/html/psycho.html b/libtoolame-dab/html/psycho.html new file mode 100644 index 0000000..c02ad53 --- /dev/null +++ b/libtoolame-dab/html/psycho.html @@ -0,0 +1,58 @@ +<html> +<head> +<title>tooLAME Psychoacoustic Models</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> +<table border = 0 width="75%" align="center"><tr><td> +<h1>Psychoacoustic Models in tooLAME</h1> +<h3> Introduction </h3> +In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which are the sonically important parts of the waveform that is being encoded. The PAM looks for loud sounds which may mask soft sounds, noise which may affect the level of sounds nearby, sounds which are too soft for us to hear and should be ignored and so on. The information from the PAM is used to determine which parts of the spectrum should get more bits and thus be encoded at greater quality - and which parts are inaudible/unimportant and should thus get fewer bits.<p> +In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes a <i>frame</i>. 
For each frame the PAM outputs just <b>32</b> values (The values are the Signal to Masking Ratio [SMR] in that subband). This is important! There are only 32 values to determine how to allocate bits for 1152 samples - this is a pretty coarse technique.<p> +The different PAMs listed below use different techniques to decide on these 32 values. Some models are better than others - meaning that the 32 values chosen are pretty good at spreading the bits where they should go. Even with a really bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time. <P> +All of these models have strengths and weaknesses. The model <i>you</i> end up using will be the one that produces the best sound for your ears, for your audio. <p> + +<h3> Psychoacoustic Model -1 </h3> +This PAM doesn't actually look at the samples being encoded to decide upon the output values. There is simply a set of 32 default values which are used, regardless of input.<p> +<b>Pros</b>: Faaaast. Low complexity.<br> +<b>Cons</b>: Absolutely no attempt to consider any of the masking effects that would help the audio sound better<br> + +<h3> Psychoacoustic Model 0 </h3> +This PAM looks at the sizes of the <i>scalefactors</i> for the audio and combines them with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.<p> +<b>Pros</b>: Faaast. Low complexity. Sounds better than PAM-1<br> +<b>Cons</b>: This model has absolutely no mathematical basis and does not use any perceptual model of hearing. It simply juggles some of the numbers of the input sound to determine the values.<br> + +<h3> Psychoacoustic Model 1 and 2</h3> +These PAMs are from the ISO standard. Just because they are the standard, doesn't mean that they are any good.
Look at <a href="http://lame.sourceforge.net">LAME</a> which basically threw out the MP3 standard psycho models and made their own (GPSYCHO).<p> +<b>Pros</b>: A reference for future PAMs<br> +<b>Cons</b>: Terrible ISO code, buggy tables, poor documentation.<br> + +<h3> Psychoacoustic Model 3</h3> +A re-implementation of psychoacoustic model 1. ISO11172 was used as the guide for re-writing this PAM from the ground up.<p> +<b>Pros</b>: No more obscure tables of values from the ISO code. Hopefully a good base to work upon for tweaking PAMs<br> +<b>Cons</b>: At the moment, doesn't really sound any better than PAM1<br> + +<h3>Psychoacoustic Model 4</h3> +A cleaned up version of PAM2.<p> +<b>Pros</b>: Faster than PAM2. No more obscure tables of values from the ISO standard. Hopefully a good base to work from for improving the PAMs<br> +<b>Cons</b>: Still has the same "warbling"/"Davros" problems as PAM2.<br> + +<h3> Future psychoacoustic models </h3> +There's a heap that could be done. Unfortunately, I've got a set of tin ears, crappy speakers and a noisy computer room. If you've got the capability to do proper PAM testing then please feel free to do so. Otherwise, I'll just keep plodding along with new ideas as they arise, such as: +<UL> +<LI> Temporal masking (there's no pre-echo or anything in tooLAME) +<LI> Left Right Masking +<LI> A PAM that's fully tuneable from the command line? +<LI> Graphical output of SMR values etc. 
Would allow better debugging of PAMs +<LI> Re-sampling routines +<LI> Low/High pass filtering +</UL> + +</tr></td> +</body> +</html> diff --git a/libtoolame-dab/html/readme.html b/libtoolame-dab/html/readme.html new file mode 100644 index 0000000..c62394a --- /dev/null +++ b/libtoolame-dab/html/readme.html @@ -0,0 +1,236 @@ +<html> +<head> +<title>TooLAME: MPEG 1/2 Layer II Audio Encoder</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> +<table border = 0 width="75%" align="center"><tr><td> + +<h2>tooLAME - an optimized mpeg 1/2 layer 2 audio encoder</h2> +Copyright (C) 2002, 2003 Michael Cheng [mikecheng at NOT planckenergy com] remove the NOT +http://www.planckenergy.com/</p> + +<h2>Contents</h2> +<UL> +<LI> LGPL +<LI> Introduction +<LI> Usage +<LI> Examples +<LI> Contributors +<LI> References +</UL> + +<h2>LGPL</h2> +<pre> +All changes to the ISO source are licensed under the LGPL +(see LGPL.txt for details) + +tooLAME is free software; you can redistribute it and/or +modify it under the terms of the GNU Lesser General Public +License as published by the Free Software Foundation; either +version 2.1 of the License, or (at your option) any later version. + +tooLAME is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser General Public License for more details. + +You should have received a copy of the GNU Lesser General Public +License along with tooLAME; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +</pre> + +<h2>Introduction</h2> + +tooLAME is an optimized Mpeg Audio 1/2 Layer 2 encoder. 
It is based heavily on +<UL> + <LI>the ISO dist10 code + <LI>improvement to algorithms as part of the LAME project (www.sulaco.org/mp3) + <LI>work by myself and other contributors (see CONTRIBUTORS) +</ul> +<h2>Installation</h2> +<OL> +<LI> edit Makefile (at least change the architecture type [ARCH] to suit your machine) +<LI> 'make' +</ol> + +<h2>Usage</h2> + +<pre> + ./toolame [options] [input file] [output file] + +Input File + tooLAME parses AIFF and WAV files for file info + raw PCM is assumed if no header is found + for stdin use a - + +Output File + file is automatically renamed from *.* to *.mp2 + for stdout use a - + +Input Options + -s [int] + if inputting raw PCM sound, you must specify the sample rate + default sample rate is 44.1khz. + + -a + downmix from stereo to mono + if the incoming file is stereo, combine the audio into + a single channel + + -x + force byte-swapping of the input. (current endian detection is dodgy, + so if toolame produces only noise, use -x ) + + -g + swap the LR channels of a stereo file + +Output Options + -m [char] + the encoding mode (default 'j') + 's' stereo + 'd' dual channel + 'j' joint stereo + 'm' mono + + -p [int] + which psy model to use (default '1') + Different models for the psychoacoustics + Models: -1 to 4 + + -b [int] + the total bitrate + For 48/44.1/32kHz default = 192 + For 24/22.05/16kHz default = 96 + + -v [int] + Switch on VBR mode. + The higher the number the better the quality. + Useful range -10 to 10. + See README.VBR for details. + + +Operation + -f + fast mode turns off calculation of the psychoacoustic model. + Instead a set of default values are assumed + + -q [int] + quick mode calculates the psy model every 'num' frames. + +Misc + -d emp + de-emphasis (default 'n') + -c + mark as copyright + -o + mark as original + -e + add error protection + -r + force padding bits off + -D + add DAB extensions + -t [int] + 'talkativity' setting. 0 = no message. 
3 = too much information + (-t 20 will probably flood you off your terminal) +</pre> + +<h2>Examples</h2> + +<pre> + toolame sound.wav +</pre> + This will encode sound.wav to sound.mp2 using the default bitrate of 192 kbps + and using the default psychoacoustic model (model 1)</p> + +<pre> + toolame -p 2 -v 5 sound.wav newfile.mp2 +</pre> + Encode sound.wav to newfile.mp2 using psychoacoustic model 2 and encoding + with variable bitrate. The high value of the "-v" argument means that + the encoding will tend to favour higher bitrates.</p> + +<pre> + toolame -p 2 -v -5 sound.wav newfile.mp2 +</pre> + Same as example above, except that the negative value of the "-v" argument + means that the lower bitrates will be favoured over the higher ones.</p> + +<pre> + cat sound.raw | toolame -s 22050 -f -b 96 - newfile.mp2 +</pre> + Toolame is encoding from stdin at a bitrate of 96kbps and is using the + 'fast' mode which means that no psychoacoustic modelling is done. The + input file is raw pcm so the sample rate needs to be specified (22050Hz)</p> + +<h2>Contributors</h2> + +<UL> +<LI>Dist10 code writers +<LI>LAME specific contributions +<UL> + <LI>fht routines from Ron Mayer mayer at acuson.com + <LI>fht tweaking by Mathew Hendry math at vissci.com + <LI>window_subband & filter_subband from LAME circa v3.30 + (multiple LAME authors) + (before Takehiro's window/filter/mdct combination) +</UL> +<LI> Oliver Lietz - lietz at nanocosmos.de - Tables included in the exe + +<LI> Patrick de Smet - pds at telin.rug.ac.be - scale_factor calc speedup. subband_quantization speedup + +<LI> Federico Grau - grauf at rfa.org - and Bill Eldridge - bill at hk.rfa.org - option for "no padding" + +<LI> Nick Burch - gagravarr at SoftHome.net - WAV file reading, os/2 Makefile mods. + +<LI> Phillipe Jouguet - philippe.jouguet at vdldiffusion.com - DAB extensions. 
spelling, LSF using psyII, WAVE reading [02j] + +<LI> Henrik Herranen - leopold at vlsi.fi - fixed WAVE reading [02j] + +<LI> Andreas Neukoetter - anti at webhome.de - verbosity patch '-t' switch for transcode plugin [02j] + +<LI> Sami Sallinen - sami.sallinen at g-cluster.com - filter_subband loop unroll, psycho_i fix for "% 1408" calcs [02j] + +<LI> Ricardo Schelp - ricardoschelp at arnet.com.ar - merged window/filter subband for a nice speedup [02k] + +<LI> Nicolas Croiset - ncroiset at vdl.fr - DAB length control, ignore 4GB limit when reading from stdin, fixed bitstream ending to allow concatenation of mp2 files, fixes for psycho1 model [02k] + +<LI> Mike Cheng mikecheng at NOT planckenergy.com - Most of the rest + +</UL> + +<h2>References</h2> + +Kumar, M & Zubair, M., A high performance software implementation of mpeg audio +encoder, 1996, ICASSP Conf Proceedings (I think)</p> + +Fischer, K.A., Calculation of the psychoacoustic simultaneous masked threshold +based on MPEG/Audio Encoder Model One, ICSI Technical Report, 1997 +ftp://ftp.icsi.berkeley.edu/pub/real/kyrill/PsychoMpegOne.tar.Z </p> + +Hyen-O et al, New Implementation techniques of a real-time mpeg-2 audio encoding +system. p2287, ICASSP 99.</p> + +Imai, T., et al, MPEG-1 Audio real-time encoding system, IEEE Trans on Consumer +Electronics, v44, n3 1998. 
p888</p> + +Teh, D., et al, Efficient bit allocation algorithm for ISO/MPEG audio encoder, +Electronics Letters, v34, n8, p721</p> + +Murphy, C & Anandakumar, K, Real-time MPEG-1 audio coding and decoding on a DSP +Chip, IEEE Trans on Consumer Electronics, v43, n1, 1997 p40</p> + +Hans, M & Bhaskaran, V., A compliant MPEG-1 layer II audio decoder with 16-B +arithmetic operations, IEEE Signal Proc Letters v4 n5 1997 p121</p> + +[mikecheng at NOT planckenergy com] remove the NOT</p> + +</tr></td> +</body> +</html> diff --git a/libtoolame-dab/html/vbr.html b/libtoolame-dab/html/vbr.html new file mode 100644 index 0000000..5f7ae38 --- /dev/null +++ b/libtoolame-dab/html/vbr.html @@ -0,0 +1,239 @@ +<html> +<head> +<title>tooLAME: MPEG Audio Layer II VBR</title> +<style> +<!-- BODY { BACKGROUND: #FFFFFF; COLOR: #000000; FONT-SIZE: 10pt; FONT-FAMILY: verdana, sans-serif } + A { COLOR: #111177; TEXT-DECORATION: none } + TD { font-size: medium; font-weight:normal } +--!> +</STYLE> +</head> +<body> + +<table border = 0 width="75%" align="center"><tr><td> +<h1> tooLAME: MPEG Audio Layer II VBR </h1> + +<h2>Contents</h2> +<Ul> +<LI>Introduction +<LI>Usage +<LI>Bitrate Ranges for various Sampling frequencies +<LI>Why can't the bitrate vary from 32kbps to 384kbps for every file? +<UL> + <LI>Short Answer + <LI>Long Answer +</UL> +<LI> Tech Stuff +</UL> + +<h2>Introduction</h2> +VBR mode works by selecting a different bitrate for each frame. Frames +which are harder to encode will be allocated more bits i.e. a higher bitrate.</p> + +LayerII VBR is a complete hack - the ISO standard actually says that decoders are not +required to support it. As a hack, its implementation is a pain to try and understand. +If you're mega-keen to get full range VBR working, either (a) send me money (b) grab the +ISO standard and a C compiler and email me.</p> + +<h2>Usage</h2> +<pre> + toolame -v [level] inputfile outputfile. 
+</pre> +A level of 5 works very well for me.</p> + +The level value is a measurement of quality - the higher +the level the higher the average bitrate of the resultant file. +[See TECH STUFF for a better explanation of what the value does]</p> + +The confusing part of my implementation of LayerII VBR is that it's different from MP3 VBR. +<UL> +<LI>The range of bitrates used is controlled by the input sampling frequency. (See below "Bitrate ranges") +<LI>The tendency to use higher bitrates is governed by the <level>. +</ul> + +E.g. Say you have a 44.1kHz Stereo file. In VBR mode, the bitrate can range from 192 to 384 kbps. +Using "-v -5" will force the encoder to favour the lower bitrate. +Using "-v 5" will force the encoder to favour the upper bitrate. +The value can actually be *any* int. -27, 233, 47. The larger the number, the greater +the bitrate bias.</p> + +<h2>Bitrate Ranges</h2> + +When making a VBR stream, the bitrate is only allowed to vary within +set limits</p> + +<pre> +48kHz +Stereo: 112-384kbps Mono: 56-192kbps + +44.1kHz & 32kHz +Stereo: 192-384kbps Mono: 96-192kbps + +24kHz, 22.05kHz & 16kHz +Stereo/Mono: 8-160kbps +</pre> + +<h2>Why doesn't the VBR mode work the same as MP3VBR? The Short Answer</h2> +<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b></p> +According to the standard (ISO/IEC 11172-3:1993) Section 2.4.2.3 +<pre> + "In order to provide the smallest possible delay and complexity, the + decoder is not required to support a continuously variable bitrate when + in layer I or II. Layer III supports variable bitrate by switching the + bitrate index." + + and + + "For Layer II, not all combinations of total bitrate and mode are allowed." +</pre> + +Hence, most LayerII coders would not have been written with VBR in mind, and +LayerII VBR is a hack. It works for limited cases.
Getting it to work to +the same extent as MP3-style VBR will be a major hack.</p> + +(If you *really* want better bitrate ranges, read "The Long Answer" and submit your mega-patch.)</p> + +<h2>Why doesn't the VBR mode work the same as MP3VBR? The Long Answer</h2> +<b>Why can't the bitrate vary from 32kbps to 384kbps for every file?</b> + +<h3>Reason 1: The standard limits the range</h3> + +As quoted above from the standard for 48/44.1/32kHz: +<pre> + "For Layer II, not all combinations of total bitrate and mode are allowed. See + the following table." + +Bitrate Allowed Modes +(kbps) +32 mono only +48 mono only +56 mono only +64 all modes +80 mono only +96 all modes +112 all modes +128 all modes +160 all modes +192 all modes +224 stereo only +256 stereo only +320 stereo only +384 stereo only +</pre> + +So based upon this table alone, you *could* have VBR stereo encoding which varies +smoothly from 96 to 384kbps. Or you could have VBR mono encoding which varies from +32 to 192kbps. But since the top and bottom bitrates don't apply to all modes, it would +be impossible to have a stereo file encoded from 32 to 384 kbps.</p> + +But this isn't what is really limiting the allowable bitrate range - the bit allocation +tables are the major hurdle.</p> + +<h3>Reason 2: The bit allocation tables don't allow it</h3> + +From the standard, Section 2.4.3.3.1 "Bit allocation decoding"</p> +<pre> + "For different combinations of bitrate and sampling frequency, different bit + allocation tables exist."
+</pre> +These bit allocation tables are pre-determined tables (in Annex B of the standard) which +indicate +<UL> + <LI>how many bits to read for the initial data (2,3 or 4) + <LI>these bits are then used as an index back into the table to + find the number of quantize levels for the samples in this subband +</ul> +But the table used (and hence the number of bits and the calculated index) are different +for different combinations of bitrate and sampling frequency.</p> + +I will use TableB.2a as an example.</p> + +Table B.2a Applies for the following combinations. +<pre> +Sampling Freq Bitrates in (kbps/channel) [emphasis: this is a PER CHANNEL bitrate] +48 56, 64, 80, 96, 112, 128, 160, 192 +44.1 56, 64, 80 +32 56, 64, 80 +</pre> +If we have a STEREO 48kHz input file, and we use this table, then the bitrates +we could calculate from this would be 112, 128, 160, 192, 224, 256, 320 and 384 kbps.</p> + +This table contains no information on how to encode stuff at bitrates less than 112kbps +(for a stereo file). You would have to load allocation table B.2c to encode stereo at +64kbps and 128kbps.</p> + +Since it would be a MAJOR piece of hacking to get the different tables shifted in and out +during the encoding process, once an allocation table is loaded *IT IS NOT CHANGED*.</p> + +Hence, the best table is picked at the start of the encoding process, and the encoder +is stuck with it for the rest of the encode. </p> + +For toolame-02j, I have picked the table it loads for different +sampling frequencies in order to optimize the range of bitrates possible. 
+<pre> +48 kHz - Table B.2a + Stereo Bitrate Range: 112 - 384 + Mono Bitrate Range : 56 - 192 + +44.1/32 kHz - Table B.2b + Stereo Bitrate Range: 192 - 384 + Mono Bitrate Range: 96 - 192 + +24/22.05/16 kHz - LSF Table (Standard ISO/IEC 13818.3:1995 Annex B, Table B.1) + There is only 1 table for the Lower Sampling Frequencies + All modes (mono and stereo) are allowable at all bitrates + So at the Lower Sampling Frequencies you *can* have a completely variable + bitrate over the entire range. +</pre> +<h2>Tech Stuff</h2> + +The VBR mode is mainly centered around the main_bit_allocation() and +a_bit_allocation() routines in encode.c.</p> + +The limited range of VBR is due to my particular implementation which restricts +ranges to within one alloc table (see tables B.2a, B.2b, B.2c and B.2d in ISO 11172). +The VBR range for 32/44.1khz lies within B.2b, and the 48khz VBR lies within table B.2a.</p> + +I'm not sure whether it is worth extending these ranges down to lower bitrates. +The work required to switch alloc tables *during* the encoding is major.</p> + +In the case of silence, it might be worth doing a quick check for very low signals +and writing a pre-calculated *blank* 32kbps frame. [probably also a lot of work].</p> + +<h3>How CBR works</h3> +<UL> +<LI> Use the psycho model to determine the MNRs for each subband + [MNR = the ratio of "masking" to "noise"] + (From an encoding perspective, a bigger MNR in a subband means that + it sounds better since the noise is more masked) +<LI> calculate the available data bits (adb) for this bitrate. +<LI> Based upon the MNR (Masking:Noise Ratio) values, allocate bits to each + subband +<LI> Keep increasing the bits to whichever subband currently has the min MNR + value until we have no bits left. +<LI> This mode does not guarantee that all the subbands are without noise + ie there may still be subbands with MNR less than 0.0 (noisy!)
+</ul> + +<h3>How VBR works</h3> +<UL> +<LI> pretend we have lots of bits to spare, and work out the bits which would + raise the MNR in each subband to the level given by the argument on the + command line "-v [int]" +<LI> Pick the bitrate which has more bits than the required_bits we just calculated +<LI> calculate a_bit_allocation() +<LI> VBR "guarantees" that all subbands have MNR > VBRLEVEL or that we have + reached the maximum bitrate. +</ul> + +<h2>FUTURE</h2> +<UL> +<LI> with this VBR mode, we know the bits aren't going to run out, so we can + just assign them "greedily". +<LI> VBR_a_bit_allocation() is yet to be written :) +</ul> + + +</tr></td> +</body> +</html>
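Note on vbr.html (not part of the diff): the bitrate selection it describes can be illustrated with a small sketch. This is not the code in encode.c. The function names below are hypothetical, frame padding is ignored, and `frame_bits` counts every bit in the frame (header and side information included) rather than the "available data bits" that the encoder itself works with. The per-channel list is the 48 kHz row quoted for Table B.2a, and 1152 is the Layer II frame size given in psycho.html.

```python
SAMPLES_PER_FRAME = 1152  # Layer II frame size, as stated in psycho.html

# Per-channel bitrates (kbps) quoted in vbr.html for Table B.2a at 48 kHz.
TABLE_B2A_48KHZ_PER_CHANNEL = [56, 64, 80, 96, 112, 128, 160, 192]


def total_bitrates(per_channel_kbps, channels):
    """Turn per-channel bitrates into total bitrates for the given channel count."""
    return [rate * channels for rate in per_channel_kbps]


def frame_bits(bitrate_kbps, sample_rate_hz):
    """Total bits in one frame at this bitrate (assumption: no padding bits)."""
    return bitrate_kbps * 1000 * SAMPLES_PER_FRAME // sample_rate_hz


def pick_vbr_bitrate(required_bits, bitrates_kbps, sample_rate_hz):
    """Pick the lowest bitrate whose frame holds more than required_bits.

    Falls back to the highest bitrate in the list when none is large enough,
    mirroring the "or that we have reached the maximum bitrate" case.
    """
    for rate in sorted(bitrates_kbps):
        if frame_bits(rate, sample_rate_hz) > required_bits:
            return rate
    return max(bitrates_kbps)


stereo_48k = total_bitrates(TABLE_B2A_48KHZ_PER_CHANNEL, channels=2)
print(stereo_48k)
print(pick_vbr_bitrate(4000, stereo_48k, 48000))
```

With the stereo list for 48 kHz, a frame that needs 4000 bits lands on 192 kbps, since 160 kbps only provides 3840 bits per frame. Because the list comes from a single allocation table, the choice can never drop below 112 kbps for stereo, which is the restriction the "Long Answer" describes.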