Why does digital music only make sense to a human ear?

August 20, 2020

German acoustics professor Eberhard Zwicker spent years studying how humans recognise sounds. After a number of experiments, he reached an important conclusion: the human ear doesn’t work on the same principles as a microphone. It is a sense organ that evolved to be especially good at recognising speech and detecting danger in the natural environment. That is why it excels at picking out a conversation in the buzz of a coffee shop, yet falls short as a universal sensor equally effective at detecting every kind of sound.

Zwicker’s experiments showed that people can distinguish between two tones of different pitch only if their frequencies lie far enough apart. As he brought the frequencies of two tones closer together, listeners eventually could no longer tell them apart, and this effect was strongest when the lower tone was the louder one. A similar phenomenon was observed with sequences of clicks and beats: if two clicks are too close together in time, listeners can’t distinguish between them. Again, if one click is louder than the other, the closer they are, the more likely it is that listeners will hear them as a single sound.
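To make the asymmetry concrete, here is a minimal Python sketch of a single-tone masking check. It uses Zwicker and Terhardt’s approximation of the Bark (critical-band) scale; the spreading slopes and the 10 dB offset are illustrative assumptions, not figures from Zwicker’s experiments or from any mp3 encoder. The point it shows is qualitative: a loud tone hides quieter tones just above it in pitch far more readily than tones just below it.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt's approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Illustrative assumptions, not measured values:
SLOPE_BELOW_DB_PER_BARK = 25.0  # masking fades quickly toward lower pitches
SLOPE_ABOVE_DB_PER_BARK = 10.0  # ...and slowly toward higher pitches
MASK_OFFSET_DB = 10.0           # threshold sits this far below the masker's level


def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level below which a probe tone is assumed to be hidden by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)  # distance in critical bands
    slope = SLOPE_ABOVE_DB_PER_BARK if dz >= 0 else SLOPE_BELOW_DB_PER_BARK
    return masker_db - MASK_OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    """True if the probe tone falls under the masker's (assumed) threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# A loud 1 kHz tone at 80 dB against two quieter tones at 55 dB.
print(is_masked(1000, 80, 1200, 55))  # higher neighbour
print(is_masked(1000, 80, 800, 55))   # lower neighbour
```

With these assumed slopes, the 1200 Hz tone falls under the threshold while the 800 Hz tone stays audible, mirroring the finding that a louder lower tone is the more effective mask.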

While pondering such idiosyncrasies of human hearing, which came to be known collectively as “psychoacoustic masking,” Zwicker’s student, a computer engineer called Dieter Seitzer, came up with the idea of putting them to use. Because the human ear can process only a limited range of sounds, Seitzer reasoned, a digital recording needed to preserve far less data than it actually did. If he could remove the data the human ear cannot detect, the recording would be smaller, yet nobody would be able to tell the difference.

When the CD was introduced as a new audio recording format in 1982, it seemed to represent a huge advance in sound reproduction. But Seitzer wasn’t impressed. He was convinced that most of the audio data on a CD could be discarded without the listener ever noticing. A CD recording uses approximately 1.4 million bits of data for each second of stereo sound, and Seitzer believed that comparable quality could be achieved with a recording that required a device to read just 128,000 bits per second. To accomplish this, he needed help. That help arrived in the form of a young electrical engineering student called Karlheinz Brandenburg, who soon took on the project of finding an algorithm for compressing audio files.
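The gap between those two figures is easy to work out. The sketch below assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels) and takes Seitzer’s target as 128,000 bits per second; it simply multiplies and divides to show the compression ratio he was aiming for.

```python
# Standard uncompressed CD audio parameters.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

# Seitzer's target rate for comparable quality, in bits per second.
TARGET_BITRATE_BPS = 128_000


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bitrate_bps(CD_SAMPLE_RATE_HZ, CD_BITS_PER_SAMPLE, CD_CHANNELS)
ratio = cd_bps / TARGET_BITRATE_BPS  # how many times smaller the data must become

print(f"CD audio: {cd_bps:,} bits/s ({cd_bps // 8:,} bytes/s)")
print(f"Compression needed: about {ratio:.1f} : 1")
```

In other words, roughly ten out of every eleven bits on a CD would have to go.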

Deleting redundant data in audio recordings

In digital audio coding, a recording is first broken up into its basic elements. These elements can be described as the auditory equivalent of pixels, the cell-like building blocks of a digital image. Brandenburg’s task was to apply the findings of psychoacoustics to clean out the “sound pixels” that the human ear couldn’t hear anyway.
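One way to picture these “sound pixels” is as a time-frequency grid: the signal is cut into short frames, and each frame is split into frequency bins. The Python sketch below does this with a plain, naive discrete Fourier transform; that is an assumption made for readability, since mp3 itself uses a hybrid filter bank (a polyphase filter bank followed by a modified discrete cosine transform). The frame length and hop size are likewise arbitrary illustrative choices.

```python
import cmath
import math


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def dft_magnitudes(frame):
    """Magnitude of each frequency bin (0 up to Nyquist) of one frame, via a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def sound_pixels(signal, frame_len=64, hop=32):
    """Time-frequency grid: one row per frame, one column per frequency bin."""
    return [dft_magnitudes(frame) for frame in split_into_frames(signal, frame_len, hop)]


# A pure tone that completes exactly 8 cycles in every 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
grid = sound_pixels(tone)
print(len(grid), "frames x", len(grid[0]), "bins")
print("loudest bin in first frame:", max(range(len(grid[0])), key=lambda k: grid[0][k]))
```

Each cell of `grid` is one “pixel”: an amount of energy at one pitch during one slice of time. A psychoacoustic coder decides, cell by cell, how much of that detail the ear would actually miss.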

Human hearing is attuned to the range of frequencies found in human speech. With this in mind, Brandenburg believed he could safely reduce the number of pixels a recording contained at very high and very low frequencies. When two sounds were close in pitch, he could thin out the pixels of the slightly higher one, because a lower tone tends to mask a higher one. The ear is also briefly less sensitive after a loud bang, so it registers the sounds directly following the bang less accurately. Surprisingly, it transpired that the ear also fails to register sounds occurring moments before a loud bang, because the brain is still busy analysing them when the bang interrupts. This means that the moments preceding a loud event can also be stored with less data in a digital recording.
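The temporal rules can be sketched as a simple budgeting step. The hypothetical function `bit_budget` below gives every frame full detail except frames that fall just before or just after a loud “bang” frame, which get fewer bits. The 10 ms frames, the 80 dB loudness threshold, the 20 ms and 150 ms windows and the 16/4 bit budgets are all assumptions chosen only to reflect that the masked stretch before a bang is much shorter than the one after it; real encoders handle this very differently.

```python
def bit_budget(levels_db, frame_ms=10, loud_db=80.0,
               pre_ms=20, post_ms=150, full_bits=16, reduced_bits=4):
    """Assign a hypothetical number of bits to each frame.

    Frames shortly before or after a loud frame are treated as temporally
    masked and get reduced_bits; every other frame keeps full_bits.
    """
    n = len(levels_db)
    budget = [full_bits] * n
    pre_frames = pre_ms // frame_ms    # masked frames before the bang
    post_frames = post_ms // frame_ms  # masked frames after the bang
    for i, level in enumerate(levels_db):
        if level < loud_db:
            continue
        for j in range(max(0, i - pre_frames), min(n, i + post_frames + 1)):
            if levels_db[j] < loud_db:  # loud frames themselves keep full detail
                budget[j] = reduced_bits
    return budget


levels = [40.0] * 30
levels[10] = 90.0  # a single loud bang at 100 ms
print(bit_budget(levels))
```

Two quiet frames before the bang and fifteen after it are coarsened, while the bang itself and the rest of the recording keep full detail.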

Brandenburg conducted many experiments before figuring out which sound pixels could be omitted without reducing the quality of a recording. After much research, he managed to formulate a set of mathematical rules for compressing audio data that could be applied systematically to any recording. In 1986 he filed a patent for psychoacoustic sound compression, and soon afterwards he was invited to the Fraunhofer Institute, a government-sponsored technology incubator founded to develop new state-of-the-art products and techniques. There he met Bernhard Grill, a young programmer and excellent musician, who turned Brandenburg’s mathematical formulas into an efficient computer algorithm. Together they kept improving the algorithm through listening tests with test subjects.

In June 1990 the sound compression algorithm was good enough to compete in a contest held by the Moving Picture Experts Group (MPEG), the international body that sets standards for digital audio and video coding. At the group’s meeting in Stockholm, a panel of young Swedish students took a double-blind test in which they listened to samples of various pieces of music and sounds encoded with the competing algorithms.

The public was somewhat surprised by the results, which revealed two winning algorithms: Brandenburg’s, and one developed by the MUSICAM consortium, a group supported by Philips. The jury spent several months deciding which standard they should recommend for general use. There were pros and cons on both sides. All things considered, Brandenburg’s algorithm attained comparable sound quality by using less data than its rival, yet required more processing power for coding.

In the end, MPEG offered a compromise. Alongside the MUSICAM format, the Fraunhofer Institute’s algorithm would also be proposed as a standard, but only if it incorporated a filter bank patented by Philips. Because this addition didn’t improve the algorithm, and actually made it worse by demanding even more processing power, one can assume that the “compromise” was really the result of backstage lobbying: a powerful company pressurising the expert jury.

After some intense internal debate, the Fraunhofer Institute accepted these terms. In April 1991 MPEG announced three audio layers: MPEG Audio Layer I, which was briefly used for digital cassettes but soon died out; MPEG Audio Layer II, the MUSICAM algorithm, also known as mp2; and, last but not least, Brandenburg’s method, dubbed MPEG Audio Layer III, or the now famous mp3.

Unusual ways of introducing new technologies

Unfortunately, despite this official backing, mp3 failed to catch on in new applications. Everything from digital radio and interactive CD-ROM to VCD and HDTV seemed to prefer the mp2 standard. Manufacturers of audio devices explained that mp3 required too much processing power, partly because of the Philips technology it had been forced to adopt. In 1994 an upgraded mp3 algorithm needed only a twelfth of the processing power while retaining the same sound quality, but even this wasn’t enough for it to be chosen as the sound standard for DVD in 1995.

According to Stephen Witt, author of How Music Got Free: The End of an Industry, the Turn of the Century, and the Patient Zero of Piracy (Viking, 2015), the Internet music piracy community was the first to recognise the potential of mp3 technology and the first to popularise it. When mp3’s inventors had all but lost hope of ever selling their product, they tried their luck by addressing end users directly. They released a simple program, free of charge, that let users convert their music collections into mp3 files. At first this wasn’t easy: only in 1993 did Intel’s Pentium processors make home computers powerful enough to play mp3 files. Encoding was still extremely slow, and it took a Pentium six hours to convert the tracks of a single CD into mp3 files. Nevertheless, the mp3 format gained popularity with music buffs and became almost synonymous with digital audio.

In the mid-1990s, the team behind mp3 started developing a new, second-generation psychoacoustic coding system that proved faster, simpler and more efficient. It was named Advanced Audio Coding (AAC), and it later became familiar through the MP4 and M4A file formats in which it is usually stored. Most digital music, sound and video today is encoded with this second-generation system.

If quality alone had decided the matter, the mp3 format would have died out in 1996, since its successor AAC was far superior. But years of rejection and disappointment had given the creators of mp3 some valuable entrepreneurial nous. Once mp3 had gained a large enough audience, they left it at that and focused on charging royalties. AAC was first adopted in phones, high-definition TV and a number of new applications, while mp3 remained the foremost format for digital music.

Without efficient data compression, the revolution in how we use and store music in recent decades could not have occurred. Interestingly enough, the father of psychoacoustic data compression, Dieter Seitzer, applied decades ago for a patent on a “digital jukebox”: subscribers would listen to music over a network connection by accessing a centralised server that held all the music files. The patent application was rejected because the idea was deemed impossible to implement, since music supposedly couldn’t be compressed into files small enough for remote transfer. Or could it?