Primer’s Detailed Guide to Digital Audio File Types

While you’ve probably been converting your CDs to MP3s for years, we guarantee you haven’t been getting the best quality. It’s time to set the record (err, MP3) straight so you can finally listen to your crappy ’90s rock collection the way it was meant to be heard.

 

Back in the day, trading bootlegged, pirated* or otherwise portable (i.e. non-retail) music was fraught with relatively few technological stumbling blocks. Most of the high-tech finesse dealt with remembering to keep your cassettes away from magnets and affixing a piece of masking tape over your mom’s “G Force” by Kenny G cassette so you could record over it with Metallica’s “Kill ‘Em All.”

Now we have an entirely different set of problems that has less to do with hitting pause just in time to cut out “You’re listening to 105.5 KNAC…” and more to do with figuring out what the hell the difference is between an M4A, MP3, FLAC, WMA and APE. If that string of acronyms made your brain melt, go ahead and bookmark this page now – you’ll be needing it if you plan to traffic in digital media.

*A quick note on legality: It doesn’t matter if you’re downloading a digitally ripped album merely as a “backup” for your physical copy (which I’m pretty sure still isn’t legal) – if you’re grabbing it via Torrent or some other Peer-to-Peer transfer network, you’ll likely be sending it back out to others who may not be so legit. That’s what will get you busted (i.e. get you kicked off your campus’ Internet access, fined or jailed). For safety’s sake, make sure you are grabbing your tunes from a reputable, official source.

Why so many file types?

To understand why there are so many different types of digital music and movie files, you need to understand encoding and compression. Bear with me for a second, we’re going to make a brief stopover in Nerdland. If you don’t care about all this mumbo jumbo, feel free to skip down to the compatibility reference sheet below.

As you can imagine, you can’t simply bottle up sound in a jelly jar and then squish it onto a compact disc. It must first be converted into a digitized format (i.e. a series of zeros and ones) that can be read electronically. The product of this process is Pulse-code Modulation (PCM) data, a digital representation of the analog sound. PCM is what we use for telephones, compact discs and computer audio.

On your computer, a PCM file is wrapped with a digital header that makes it readable by your machine – for Macs, it’s usually AIFF. With Windows, it’s typically WAV. That’s basically the only difference between a straight-up PCM file and an AIFF or WAV. Don’t confuse an AIFF or a WAV file for a digital audio format itself. They are merely media containers. Media containers can hold numerous different kinds of audio or video formats. For example, an M4A (Apple’s proprietary file type) is an AAC file within an MP4 container. If you put a TwinVQ file into an MP4 container, it would be a VQF file.
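To see how thin that container layer is, here’s a minimal sketch using Python’s standard `wave` module. It wraps one second of silent, made-up PCM data in a WAV container in memory and measures how much the header adds. The sample rate, channel count and 16-bit sample width are my assumptions for a CD-style recording, not anything a particular ripper is guaranteed to use.

```python
import io
import struct
import wave

# Assumed CD-style settings: 44.1 kHz, stereo, 16-bit (2-byte) samples.
SAMPLE_RATE = 44_100
CHANNELS = 2
SAMPLE_WIDTH = 2

# One second of silence as raw PCM: one 16-bit zero per channel per frame.
pcm = struct.pack("<h", 0) * (SAMPLE_RATE * CHANNELS)

# Wrap the raw PCM in a WAV container (written to memory, not to disk).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(SAMPLE_WIDTH)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(pcm)

wav_bytes = buffer.getvalue()
header_size = len(wav_bytes) - len(pcm)

print(wav_bytes[:4], wav_bytes[8:12])  # container markers at the start of the file
print(header_size, "bytes of header around", len(pcm), "bytes of PCM")
```

Everything after the header is the same PCM data that went in, which is the point: the container labels the audio, it doesn’t change it.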

PCM files are uncompressed and relatively unaltered – meaning they sound as close to the original as possible. However, the downside of this is that they are big. For example, a five-minute, two-channel, 44.1 kHz WAV file at roughly 720 kbit/s per channel – that’s a CD-quality stereo recording – would weigh in at about 54 megabytes. At that rate, you would’ve already filled up your 2nd Gen iPod Shuffle after adding “Bat out of Hell” and “Bat out of Hell II: Back into Hell” and you wouldn’t even have room for “Bat out of Hell III: The Monster is Loose.”

In order to shrink down an audio file’s size, it needs to be compressed. With an uncompressed audio file, every second of the audio takes up the same amount of space, whether it’s a chorus of yodeling goats or dead silence. Essentially, it’d be like packing your boxes half full of nothing when moving out of your apartment because that’s how your stuff was arranged in your closet. Audio compression says to hell with that, ignoring empty space and, depending on how small your box is, some other extra crap that you’re never going to use, too. It’s also going to take apart your bed and collapse all your other furniture in order to make it fit better. This space-efficient packing is what we’ll call “encoding.” That clutter that gets tossed out during the move – let’s call that “loss.”

Torturing the metaphor further, there are dozens of ways to pack a U-Haul, some more efficient than others, and some that will result in less damage to your Fabergé egg collection when you go bumping down the backroads. Likewise, there are numerous different methods for encoding an audio file, hence numerous different file types and ultimately, numerous procedures for unpacking it all. Let’s talk about the most common method first: lossy audio encoding.

Bits, Quantization and Codecs

The concept behind lossy audio compression hinges upon psychoacoustics. But before we get into that, we need to talk about bits. Remember the zeros and ones we talked about above? Those are bits, and they compose the audio file that you’re listening to. Eight bits equal a byte, 1024 bytes (usually rounded to 1000) make a kilobyte and 1024 kilobytes (again, usually rounded to 1000) make a megabyte. The bitrate refers to how many bits are conveyed per second. Essentially, the higher the bitrate, the more detailed the file becomes. Just like a shitty JPEG with fewer bits per pixel (BPP) looks pixelated and blurry, an audio file with a low bitrate becomes garbled and muddy (i.e. noisy).

Take the example from above: 720 kilobits per second (a kilobit being 1,000 bits) for 300 seconds (that’s five minutes) times two (for stereo sound) equals 432,000 kilobits. Divide by eight to get bytes and you have 54,000 kilobytes, or 54 megabytes. The goal in lossy audio compression is to create a more streamlined distribution of bits while keeping the audio quality mostly intact.
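If you’d rather let a computer do that arithmetic, here’s a small sketch. It assumes standard CD audio (44,100 samples per second, 16 bits per sample, two channels) and decimal megabytes; the function name is just something I made up for illustration. With those exact figures the per-channel rate comes out a little under the round 720 used above, so the total lands a touch under 54 megabytes.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of the raw PCM audio data, ignoring the small container header."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second * seconds // 8

# Assumed CD settings: 44.1 kHz, 16-bit, stereo.
cd_bitrate = 44_100 * 16 * 2                      # bits per second, both channels
five_minutes = pcm_size_bytes(44_100, 16, 2, 300)  # bytes in a five-minute track

print(cd_bitrate / 1000, "kbit/s")
print(five_minutes / 1_000_000, "MB")
```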

The keyword there would be “mostly” intact, which is where the meaning of the term “lossy” comes in. Basically, when the audio file is compressed, certain pieces of data are being lopped off or just kind of fudged over. Think of a curve drawn on a line graph (sound waves are curves, after all). Now, say you plot a couple of points along that curve – let’s say, one hundred of them – and then take the curve away and re-draw it based on the points you plotted. As long as the curve is relatively predictable you can easily draw a pretty accurate representation of the original curve from the hundred points you plotted. Now, try the same thing again, but with ten points. The curve that you re-draw begins to resemble the original curve less and less as you use fewer points. This is what happens when you take bits away – approximating the curve from a limited set of values is called quantization and is much easier to present visually, as this Tufts professor did in his lecture notes.

So you see, the sounds you are hearing aren’t actually the song as recorded. They are a digitally deconstructed and then digitally reconstructed representation of the recording of the song. And the inaccuracies in the reconstruction are what are called quantization noise or more generally compression artifacts. You’ll recognize these sounds on poorly encoded songs as popping, drop-outs, warbling, hissing, an “underwater” quality or pre-echo. Compression artifacts are most apparent in very “busy” recordings, such as applause or cymbal crashes.
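Here’s a rough sketch of quantization noise in the strict sense: rounding each sample to the nearest of a limited number of levels and measuring how far the result drifts from the original. The sine wave is a made-up test signal and the function names are mine; real codecs are far cleverer about where they spend their bits, so treat this as an illustration of the idea rather than how any particular encoder works.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    return round((sample + 1.0) / 2.0 * steps) / steps * 2.0 - 1.0

def rms_error(bits, n=1000):
    """Root-mean-square difference between one sine cycle and its quantized copy."""
    total = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * i / n)
        total += (x - quantize(x, bits)) ** 2
    return math.sqrt(total / n)

# Fewer bits per sample means coarser rounding and more noise.
errors = {bits: rms_error(bits) for bits in (4, 8, 16)}
for bits, err in errors.items():
    print(f"{bits:2d} bits: RMS error {err:.6f}")
```

The error shrinks as the bit count grows, which is the numerical version of re-drawing the curve from more points.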

Deciding which frequencies to fiddle with and which ones to leave be is all up to the algorithm that is used to encode the file and subsequently decode it for listening later. The program that runs that algorithm is called a codec. A codec is something you may need to download in order to play a file, and something you will definitely need if you intend to encode an audio file.

Psychoacoustics

Now that we have that figured out, let’s return to psychoacoustics. Aside from totally being the name of my next folk/horror punk band, psychoacoustics is basically the study of humans’ perception of sound. For audio compression, it’s simple: there are frequencies of the audio spectrum that are imperceptible to man, thus rendering certain sounds inaudible, such as dog whistles, rat laughter and incessant female nagging while the game is on (Hey-oh! I’ll be here all week folks; remember to tip your waiter).

The first step to audio compression is cutting all that unhearable junk out. While hipsters and audiophiles may argue otherwise, unpretentious folk won’t notice a thing. A marginal amount of space can also be saved right off the bat by chopping out the few seconds of silence at the beginning of the song and the 10 and a half minutes of nothing between the last song on the album and the obligatory hidden track. Of course, the silence will still be there when you listen, it just will hardly have any bits devoted to it.

Variable Bitrates

Some lossy encoding methods employ a variable bitrate. Just as there are some frequencies we don’t hear at all, there are some frequencies that are less audible at certain levels of loudness. As such, compression artifacts occurring in these less audible frequencies are nearly undetectable. Hiding encoding errors in inaudible frequency ranges is called noise shaping. Because certain frequencies don’t need a high bitrate to sound passable, putting bits to work there is overkill. Instead of letting them hang around like redundant employees, wasting electricity and gobbling up all the snacks in the break room, a variable bitrate just lets them go.

There are pros and cons to variable bitrates. On the one hand, you do get a much more efficient distribution of bits – those bits that were hanging out in the song’s moments of silence can now be sent to the big guitar solo, where lots of bits are needed. However, if the bitrate dips too low, things can start to sound junky again. Luckily, when encoding, you can set a threshold to keep the bitrate from dipping too low or soaring too high (i.e. a constrained variable bitrate).
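To make the constrained variable bitrate idea concrete, here’s a toy sketch. Each chunk of the song gets a bitrate in proportion to how “busy” it is, and the result is clamped between a floor and a ceiling. The busyness scores, the function name and the specific kbps values are all invented for illustration; real encoders make this call with psychoacoustic analysis, not a one-line proportion.

```python
def allocate_bitrates(complexities, average_kbps, floor_kbps, ceiling_kbps):
    """Give each frame a bitrate proportional to its complexity, clamped to a range.

    Toy model only. Assumes at least one frame has a non-zero complexity score.
    """
    mean_complexity = sum(complexities) / len(complexities)
    rates = []
    for c in complexities:
        wanted = average_kbps * c / mean_complexity
        rates.append(min(ceiling_kbps, max(floor_kbps, wanted)))
    return rates

# Hypothetical per-frame "busyness" scores: silence, verse, guitar solo, verse.
frames = [0.0, 1.0, 3.0, 1.0]
rates = allocate_bitrates(frames, average_kbps=192, floor_kbps=32, ceiling_kbps=320)
print(rates)
```

The silent frame is held at the floor and the guitar solo is capped at the ceiling. Note that the clamping means the final average won’t exactly match the target; a real encoder would redistribute the leftovers.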

Specs

While mileage varies between lossy codecs, especially when different bitrates are factored in, a CD-quality audio file typically compresses to about 10:1. That is, about 1 megabyte per minute. Because of their small size and high quality, lossy encoded audio files have long been the gold standard in trading audio files over the Internet.

Lossy file formats include: Advanced Audio Coding (AAC), Adaptive Differential Pulse Code Modulation (ADPCM), Adaptive Transform Acoustic Coding (ATRAC), Dolby AC-3, MPEG-1 Audio Layer II (MP2), MPEG-1 Audio Layer III (MP3), Musepack (MPC), Ogg Vorbis (OGG or OGA) and Windows Media Audio (WMA).

Lossless Audio Compression

Lossless audio compression, being a somewhat more impressive and recent technology, is understandably a bit harder to explain and understand without getting really technical. For our purposes, it suffices to understand lossless encoding in contrast to lossy encoding in that, whereas lossy encoding ditches stuff that you don’t need, lossless doesn’t. Thereby, a losslessly encoded file should decode to something bit-for-bit identical to the uncompressed original. So how do they make the file size smaller? Well it’s complicated. And not everyone does it the same. But basically, it involves “applying carefully selected filters to the audio, noting the filter coefficients and only storing the coefficients and residual audio output of those filters,” as explained by Seneschal. If that makes no sense to you, don’t worry, I’m with you. Here’s how I understand it (stand by for a three-paragraph sandwich metaphor):

Think of a lossless codec as an intern taking down a lunch order for forty extremely picky executives, each one itching to fire him if he gets it wrong. Problem is: he’s only got one post-it note to write it all down.

The first executive, let’s call him “Reuben,” goes to him and says: “Listen, I want you to get me two slices of rye bread, toasted, and between that bread I want a slice of beef that’s been preserved by anaerobic fermentation in a solution of salt and water. Also, I want you to stick some fermented cabbage in there with some curdled milk (the kind with holes in it) and slather it with a mixture of ketchup, mayonnaise, Tabasco and some chopped onions, bell peppers, olives and also some cucumber that has also been preserved by anaerobic fermentation in a solution of salt and water. Got that?”

The intern nods, and abbreviates the order down to a “corned beef sandwich with Swiss cheese, sauerkraut and Thousand Island dressing on rye.” Then, when the next two executives order the same thing, rather than writing all that down two more times, he just writes, “three Reubens.” One lady orders mostly the same thing, but with a few tweaks. The next three people say, “I’ll have what she’s having.” So he writes down “four Rachels.” And so on. After all that, he’s got a post-it note with a bunch of words on it but seemingly no food items. That would be your audio file. The guy at the deli translating the items on the post-it to sandwiches would be the audio player decoding the file.

So, basically, imagine the lossless codec doing that to an audio recording, but multiplied by a million or more times. An example of another kind of lossless encoding beyond sandwiches and audio files would be an archive such as a zip file. An archive contains files that have been saved in their entirety with no alterations, but must be extracted from the archive before they can be used.
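For the curious, here’s a bare-bones sketch of the “filter plus residual” idea. The “filter” here is the simplest predictor I can think of (guess that each sample equals the one before it) and the residual is how wrong that guess was. That predictor, the function names and the 440 Hz test tone are my own assumptions for illustration; actual lossless codecs use much smarter predictors and then pack the residuals tightly.

```python
import math

def encode(samples):
    """Store the first sample, then only the difference from the previous one."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)  # prediction: "same as the last sample"
    return residuals

def decode(residuals):
    """Rebuild the original samples by adding the differences back up."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples

# A made-up test signal: a 440 Hz tone as integer samples at 44.1 kHz.
signal = [round(10_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(1000)]
residuals = encode(signal)

print("round trip exact:", decode(residuals) == signal)
print("largest sample:  ", max(abs(s) for s in signal))
print("largest residual:", max(abs(r) for r in residuals))
```

The residuals are much smaller numbers than the samples themselves, so they need fewer bits to write down, yet adding them back up recovers every original sample exactly.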

Because a lossless audio file is, by definition, lossless, you won’t have the same issues with discerning bitrate and such. You can make the file smaller, but it won’t affect the quality – it will simply take the algorithm longer to figure out a way to pack it all down and, subsequently, for the codec to unpack it. In that case, it’d be like one of the executives ordered some obscure foreign sandwich and the intern had to take time to learn how to spell it correctly and the guy at the deli had to call his buddy in Turkmenistan to ask how to make it.
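You can watch the same trade-off with a general-purpose lossless compressor. This sketch uses Python’s standard `zlib` module, which is not an audio codec, on some repetitive made-up data standing in for an audio file. It compresses at the lowest and highest effort levels and checks that both unpack to exactly what went in.

```python
import zlib

# Repetitive made-up data standing in for an audio file.
original = bytes(range(256)) * 400

fast = zlib.compress(original, 1)      # least effort
thorough = zlib.compress(original, 9)  # most effort

print(len(original), len(fast), len(thorough))
print(zlib.decompress(fast) == original, zlib.decompress(thorough) == original)
```

Whatever the effort level, what comes back out is identical to the original. The level only changes how hard the packer works and how small the package ends up.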

Specs

Again, while mileage varies depending on the codec and the audio file (e.g. a five-minute recording of silence will be much smaller than a five-minute Dillinger Escape Plan song), lossless audio typically compresses down to a 2:1 ratio. That is, about 5 megabytes per minute. As Internet connections get faster and hard drives and digital players get bigger, lossless audio encoding is becoming increasingly popular for those who don’t want to muck up their precious audio collections.
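As a sanity check on the ratios quoted for lossy (10:1) and lossless (2:1) compression, here’s a quick sketch that turns a compression ratio into megabytes per minute. It assumes CD audio as the starting point (44.1 kHz, 16-bit, stereo) and decimal megabytes; the names are mine.

```python
# Assumed CD audio: 44,100 samples/s x 2 bytes per sample x 2 channels x 60 seconds.
CD_BYTES_PER_MINUTE = 44_100 * 2 * 2 * 60

def megabytes_per_minute(ratio):
    """Approximate size of one minute of CD audio compressed at ratio:1."""
    return CD_BYTES_PER_MINUTE / ratio / 1_000_000

uncompressed = megabytes_per_minute(1)
lossy = megabytes_per_minute(10)
lossless = megabytes_per_minute(2)

print(f"uncompressed {uncompressed:.1f} MB, lossy {lossy:.1f} MB, lossless {lossless:.1f} MB")
```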

Lossless file types include: Free Lossless Audio Codec (FLAC), WavPack (WV), Monkey’s Audio (APE), Shorten (SHN) and Apple Lossless (ALAC).

Proprietary Formats

So how do people choose which file format to use? Other than personal preference, a lot of it has to do with proprietary licensing. Take the most popular format: MPEG-1 Layer III (MP3). People refer to audio files as MP3s generically sort of like southerners call all soda pop Coke (even if it’s a Mountain Dew!).

Like any software, the code that’s used to encode and decode MP3s is intellectual property with numerous organizations holding the rights to distribute and use its various facets. Developers – such as Nullsoft, Microsoft and Apple – all need to buy licensing in order to support MP3 playback. There are more intricacies to the rights involved as well, which is why major players have largely moved towards their own file formats.

Microsoft’s file format is called Windows Media Audio (WMA) and all Windows systems come with built-in support for its playback through Windows Media Player. Apple, on the other hand, touts its own Apple Lossless audio codec, which is bundled inside an MP4 container and called an M4A. Note that this is different from Advanced Audio Coding (AAC) which is also bundled into an M4A (the preferred file type for iTunes) and licensed by MPEG.

What this all means is that your Windows PC won’t be able to play Apple Lossless files out of the box and your Apple computer won’t be able to play WMA out of the box. You’ll need to download the corresponding codecs for your platform in order to do so. In this instance, you’d need to download a filter for Windows Media Player (such as this one by DSP-worx) and Flip4Mac for your Apple computer. Kind of a hassle, right?

In response to this headache, some developers are creating open source solutions, namely the Free Lossless Audio Codec and Ogg Vorbis. These are virtually free for anyone to use and develop, but rarely come bundled with default installs.

What does this mean to you? Unless you’re a developer, not much other than the headache of downloading codecs (something that’s made easier with a program like CodecInstaller). But if you plan on backing up your music collection and/or sharing music with friends, you’ll want something that transfers across platforms easily. You can always convert a file, but each time you do so with a lossy format, you lose a little bit of quality (converting between lossless formats doesn’t cost you anything but time). My advice: choose a file type and stick to it.

Digital Rights Management

Another reason purveyors of tunes use proprietary formats is so they can impose Digital Rights Management (DRM) on the files. In other words, anti-piracy measures. Up until recently, all music from iTunes required you to “authorize” your computer in order to play your downloaded files. Currently, iTunes still uses DRM for its videos and iPhone apps. Other audio retailers – such as Audible.com – still use DRM. While certainly defensible (having people buy albums and then share them amongst ten of their friends is hardly good for the bottom line), DRM more often proves a hindrance to legitimate customers. Anyone who has ever tried to play a song they purchased off iTunes pre-April 2009 on their Microsoft Zune is familiar with the frustrations of being thwarted by DRM. There are tricky ways to circumvent DRM, but they are largely complex and often illegal. Alternatively, it’s best to get your songs and audio from a non-DRM vendor – such as Amazon or eMusic.

Compatibility Reference Guide

Now that you know more than you wanted about digital audio compression, here’s the quick reference guide on compatibility. Bookmark this page and refer back to it whenever you’re stumped about what to do with an audio file.

Applications

iTunes by Apple

Plays: Apple Lossless, MP3, AIFF, WAV, MPEG-4, AAC (also plays Quicktime files)

Rips: AAC, MP3, AIFF, Apple Lossless, WAV (Windows version converts WMA)

Windows Media Player by Microsoft

Plays: WMA, WAV, MP3

Rips: WMA, WMA Lossless, MP3, WAV

Winamp by Nullsoft

Plays: AAC, it, mod, nst, stm, AIF, itz, MP1, NSV, stz, AIFF, KAR, MP2, OGG, ult, amf, M2V, MP3, okt, flac, VLB, mp3, m3u, m3u8, pls, ASF, M4A, MP4, ptm, WAV, AU, mdz, MPEG, RMI, WMA, AVI, MID, MPG, s3m, WMV, CDA, MIDI, mtm, s3z, xm, far, MIZ, NSA, SND, xmz, 669, VOC, b4s, asx, wpl

Rips: aacPlus, AAC, WMA, HE-AAC, MP3 (requires purchase)

VLC Media Player by VideoLAN

Plays: ALAC, RA, Speex, Screamtracker 3/S3M, TTA, OGG, WavPack, AMR, DTS, AAC, AC3, MP3, PLS, DV Audio, XM, FLAC, MACE, MOD, QDM2/QDMC, WMA.

Rips: MP3, A/52, FLAC, SPEEX, MP4

Portable Audio Players

Zune by Microsoft

Plays: WMA, WMV, MP3, MP4, M4V, M4A, M4B, MOV, AA

More details here.

iPod by Apple

Plays: MP3, AAC, AAC/M4A, AIFF, WAV, AA, Apple Lossless

More details here.

Sansa by SanDisk

Plays: MP3, WMA, PCM WAV, OGG and FLAC

More details here.

Converters

If you find yourself stuck with one kind of file and needing it to be a different kind of file, try one of these converters:

Online:

Downloads:

About

Jack Busch is a Pittsburgh resident, freelance writer and a crummy dancer. You can find him on Twitter and at JackBusch.com.

 
