On Recording the Human Voice

Recording the human speaking voice can be one of the trickiest tasks a professional sound recordist encounters. Even when working with seasoned professional voice artists, problems can creep in. Here are a few of them and how to solve each one.

Let’s begin by clarifying one thing. I don’t mean a singing vocalist, but a speaking voice: perhaps an announcer, a narrator, a person doing a commercial or even someone recording a book on tape. We are talking about everyday speech.

First of all, the voice must sound natural and be clear and understandable. This is not music. Special efforts to manipulate the voice are not allowed. No masking is allowed either. We are talking about the purity of the human voice here.

Courtesy Alt Recording Studios.

Normally, in such situations, the voice talent sits or stands in front of a microphone in a treated studio or voice-over booth. It can be at a broadcast station, a recording studio or even on-location with the proper acoustic treatment. The problems we must deal with here occur even under ideal recording conditions.

The first situation that can occur is sibilance, a manner of articulation of fricative and affricate consonants. Sibilance occurs when a stream of air is directed with the tongue toward the sharp edge of the teeth. It causes a sibilant — or strident — sound.

Sibilance is an unpleasant tonal harshness that can happen on certain consonant sounds (like S, T and Z), caused by disproportionate audio dynamics in the upper midrange frequencies. Sibilance is often centered between 5 kHz and 8 kHz, but can occur well above that frequency range.
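
One rough way to check whether a take is heavy in that region is to compare the energy between 5 kHz and 8 kHz with the energy of the whole signal. The sketch below is only an illustration. The function names, the naive DFT and the test tones are assumptions made for the example, not part of any particular tool, and the band that matters will vary from voice to voice.

```python
import math


def tone(freq_hz, sample_rate, n, amplitude=0.5):
    """Generate n samples of a sine wave (a stand-in for real audio)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def band_energy_ratio(samples, sample_rate, lo_hz=5000.0, hi_hz=8000.0):
    """Fraction of spectral energy that falls between lo_hz and hi_hz.

    Uses a naive DFT, which is slow (O(n^2)), so it only suits short
    frames such as a few hundred samples.
    """
    n = len(samples)
    total = 0.0
    band = 0.0
    for k in range(1, n // 2):  # skip DC, positive frequencies only
        freq = k * sample_rate / n
        re = sum(s * math.cos(2.0 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = sum(s * math.sin(2.0 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        power = re * re + im * im
        total += power
        if lo_hz <= freq <= hi_hz:
            band += power
    return band / total if total > 0.0 else 0.0


sample_rate = 48000
frame = 512
s_like = band_energy_ratio(tone(6000.0, sample_rate, frame), sample_rate)
vowel_like = band_energy_ratio(tone(750.0, sample_rate, frame), sample_rate)
print(f"6 kHz tone: {s_like:.3f}, 750 Hz tone: {vowel_like:.3f}")
```

On real speech the ratio would be computed frame by frame. Frames where it jumps well above the speaker's typical value are the ones worth listening to closely.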

This problem is usually caused by the actual vocal formant, but can also be exaggerated by microphone placement and technique. Every human voice is different, so don’t presuppose that anything you’ve tried before will or will not work again. It’s all up for grabs.

The best way to start is to leave some space, about 12 to 18 inches, between the speaker and the microphone. Forget a pop filter here; it won’t help. Once you find a suitable microphone and distance combination that reduces sibilance, point the microphone downward 10 to 15 degrees, toward the throat instead of the mouth. Another good tip is to change out the type of microphone. Dynamic mics often work when condensers don’t in these situations.

If electronics are required, de-essers are the tools of choice. The de-esser technique typically uses a narrow peak EQ in the sidechain to boost the most offensive sibilant frequencies. This EQ exaggerates the dynamic difference between the sibilant band and the rest of the vocal waveform, making it much easier to achieve gain reduction during those consonants.
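
The sidechain idea described above can be sketched in a few lines. This is a minimal illustration, not how any specific commercial de-esser is built. The peaking filter follows the widely used Audio EQ Cookbook biquad form, and every parameter value (6.5 kHz center, 12 dB boost, threshold, ratio, attack and release times) is an assumption chosen for the example.

```python
import math


def peaking_eq_coeffs(sample_rate, center_hz, gain_db, q):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form), normalized by a0."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a_lin
    b = ((1.0 + alpha * a_lin) / a0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha * a_lin) / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha / a_lin) / a0)
    return b, a


def de_ess(samples, sample_rate, center_hz=6500.0, boost_db=12.0, q=2.0,
           threshold=0.7, ratio=4.0, attack_s=0.001, release_s=0.050):
    """Reduce gain on the dry signal when the boosted sidechain gets loud."""
    b, a = peaking_eq_coeffs(sample_rate, center_hz, boost_db, q)
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    x1 = x2 = y1 = y2 = 0.0
    env = 0.0
    out = []
    for x in samples:
        # Sidechain: a copy with the sibilant band boosted, used only for detection.
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        # Envelope follower: fast attack, slower release.
        level = abs(y)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        # Gain computer: compress only while the sidechain exceeds the threshold.
        if env > threshold:
            gain = (threshold / env) ** (1.0 - 1.0 / ratio)
        else:
            gain = 1.0
        # The gain is applied to the original, unboosted sample.
        out.append(x * gain)
    return out


sample_rate = 48000
n = 9600  # 0.2 seconds
ess = [0.5 * math.sin(2.0 * math.pi * 6500.0 * i / sample_rate) for i in range(n)]
vowel = [0.5 * math.sin(2.0 * math.pi * 300.0 * i / sample_rate) for i in range(n)]
ess_out = de_ess(ess, sample_rate)
vowel_out = de_ess(vowel, sample_rate)
print("sibilant-band peak after:", round(max(abs(v) for v in ess_out[-2400:]), 3))
print("low tone changed:", vowel_out != vowel)
```

Both test tones have the same level, yet only the one in the boosted band crosses the threshold. That is the point of the sidechain EQ: the detector hears the sibilant band as louder than it really is, while the audio that is passed on is never equalized.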

Another vocal issue that can develop into a problem is plosives: blasts of air that result from certain consonant sounds, usually heard on words with Ps and Bs. This is where a pop shield does help. Position it a couple of inches from the mic and cross your fingers.

Plosives can be especially bad with cardioid or hypercardioid mics. They can cause the diaphragm to bottom out, hit the backplate insulator and produce mechanical clipping, which can ruin a recording. In this case, try a mic with an omnidirectional pickup pattern, which can lessen the effect. Sometimes, though, plosives are unavoidable.

If electronics are needed to fix plosives, try iZotope’s De-Plosive module in RX Advanced. As with all such problems, though, it is best to solve them in the recording session rather than depend on electronic solutions in post.

Finally, there are the assorted pops, clicks, smacks, swallows and other odd sounds that creep into human speech. These tend to come into play when working with anyone other than top-tier trained voiceover artists. They can happen at any moment and often test the skill set of the engineer running the recording session.

These oddball sounds fall under the idiosyncrasies of human speech. Handling them depends on the talent and professionalism of the person running the session more than on anyone else. It may involve working with the voice talent to address the problem and making sure the person is well hydrated before the recording session. It is always good to have hot tea, lemon and honey on the set to help soothe the voice.

Of course, switching mics and other gear can help, but in the end iZotope’s RX Advanced and Waves’ modules can also help save the day. Editing with these tools has become the go-to fix for many tiny, indescribable problems.

Recording the human voice has never been easy. It tests the skills of every recordist. When you think you’ve seen it all, there is something new waiting in the wings to test you again.