Across Acoustics

Music Mixing for Listeners with Hearing Impairment

September 25, 2023 ASA Publications' Office

Musical mixes are typically created with normal-hearing listeners’ preferences in mind. How do the preferences of listeners with hearing impairment differ, though?  In this episode, we talk to Aravindan Joseph Benjamin and Kai Siedenburg (University of Oldenburg) about their recent article, which explores how various spectrum- and level-based mixing transforms might be altered to cater to listeners with different hearing abilities.

Associated paper: Aravindan Joseph Benjamin and Kai Siedenburg. "Exploring level- and spectrum-based music mixing transforms for hearing-impaired listeners." The Journal of the Acoustical Society of America 154, 1048 (2023); https://doi.org/10.1121/10.0020269.


Read more from The Journal of the Acoustical Society of America (JASA).
Learn more about Acoustical Society of America Publications.


Music Credit: Min 2019 by minwbu from Pixabay. https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=music&utm_content=1022

 

 

Kat Setzer  00:06

Welcome to Across Acoustics, the official podcast of the Acoustical Society of America's publications office. On this podcast, we will highlight research from our four publications. I'm your host, Kat Setzer, editorial associate for the ASA. Today we'll be talking to Aravindan Joseph Benjamin and Kai Siedenburg about their recent article "Exploring level- and spectrum-based music mixing transforms for hearing-impaired listeners," which appeared in the August 2023 issue of JASA. Thank you both for taking the time to speak with me today. How are you doing?

 

Kai Siedenburg  00:42

Good. Thanks for having us.

 

Aravindan Benjamin  00:44

Thank you very much.

 

Kat Setzer  00:45

So first, tell us a bit about your research backgrounds.

 

Aravindan Benjamin  00:48

I come from an electronic and electrical engineering background. I have a master's degree in electronic media technology from the Technical University of Ilmenau here in Germany. There, my focus was on audio signal processing, psychoacoustics, and virtual room acoustics. I'm currently working in the music perception and processing lab headed by Kai.

 

Kai Siedenburg  01:08

Yeah, and I have a math and music background originally. And in my lab, we mostly work on the psychoacoustics of music-- so how sound waves in the ear transform into a sense of melody, timbre, emotion, and the like.

 

Kat Setzer  01:22

And for listeners, we've actually had Kai on here before. If you would like to hear about some of his other research, you can listen to his episode about lead vocal levels in music, which aired in April, I think.  But, so, let's get on to this episode here. So what is multitrack mixing?

 

Aravindan Benjamin  01:40

Multitrack mixing is quite an elaborate process in which the separate recordings that make up a song are modified and combined by a trained mixing engineer into the final mix, which is often referred to as the "mix-down." This is done to make the so-called mix-down more coherent and enjoyable for the listener, or the consumer. The mixing engineer, while making the mix-down coherent, should also make sure that each instrument and the vocals in the mix-down are clearly audible.

 

Kat Setzer  02:04

Okay, so then how does automated mixing work? 

 

Aravindan Benjamin  02:07

Automated mixing is a way of computationally mimicking the involvement of the mixing engineer through mathematical modeling or machine learning, for example. But these methods require a priori information, which is generally the settings used by expert engineers in the past on mixes that were already created. These settings are also referred to as the best practices in mixing, and these best practices can be used to train these mathematical models to create new mixes from scratch.

 

Kat Setzer  02:40

What are situations that you would use automated mixing in compared to, like, regular mixing?

 

Kai Siedenburg  02:45

So I think automated mixing is very handy if you want to have a very quick first approximation of a feasible mix. Like if you're recording something and then you want to have a first shot at a mix which sounds better than the raw mix, which is obviously not a fine-tuned mix, or not a very artistically distinct mix, then automatic mixing might be a very useful tool.

 

Kat Setzer  03:10

Okay, so it kind of gives you a general idea of what it will sound like, and then you can... The audio engineer would individually tweak things to make it specifically what you would want it to sound like. So then why is masking a concern in multitrack mixing, and how's it typically handled for normal-hearing listeners?

 

Kai Siedenburg  03:29

Yeah, we're speaking of masking when the presence of one sound renders another sound inaudible. To allow a comparison, in vision, masking is pretty clear, I guess: when I hold my hand in front of an object, I can't see the object because light just doesn't get through. In hearing, masking is a bit more tricky, because sound waves don't get blocked but superimpose in the air. Masking then happens due to the mechanics of the cochlea, the inner ear, where not all frequency components of a sound get represented in the auditory nerve; only the most dominant components get through, so to say. So that's masking in a nutshell.  And masking happens in multitrack music all the time, because several tracks obviously overlap in time and frequency. Sometimes masking is desired, because producers want to have a certain degree of blend between a selection of tracks. So if you have a brass section, you want a certain degree of cohesiveness and blend between the different instruments of that section, and blend obviously involves partial masking. In other situations, it's valuable to be able to hear out individual components from a mix, which yields a sense of clarity of the mix, and in this case masking is unwanted.  And we know that masking is more severe for hearing-impaired listeners compared to normal-hearing listeners, so it should be of concern when we talk about adjusting mixes for hearing-impaired listeners.

 

Kat Setzer  05:04

So what was known prior to this study about mixing preferences for listeners with hearing impairment?

 

Kai Siedenburg  05:09

So I would say, not so much. There's been quite some work on cochlear implant listeners, but for people with, say, a moderate degree of hearing loss, who are not cochlear implant candidates, we don't know much about their preferences. So with this study, we tried to give a starting point for this.

 

Kat Setzer  05:29

Okay, so what was the goal of this study?

 

Kai Siedenburg  05:32

The goals were straightforward. We wanted to explore the mixing preferences of hearing-impaired listeners by simply comparing their preferences with those of normal-hearing listeners. And we wanted to do this in a setting with and without hearing aids, and in a musical setting which is realistic, so using naturalistic music mixes. This would allow us to see whether there are potential differences along fairly basic mixing dimensions that could be considered when adjusting mixes for hearing-impaired listeners in the future.

 

Kat Setzer  06:05

Okay, got it. So what mixing effects did you look at in the study?

 

Aravindan Benjamin  06:12

We looked at three mixing effects in this study. The first one was a level-based effect called the lead-to-accompaniment ratio. Here, to bring about changes, the overall level of the lead vocal tracks alone was changed. All the other tracks in the mix, which we referred to in the study as the accompaniment, remained unchanged in the process; we did this to preserve the relative levels of these accompanying tracks. The lead-to-accompaniment ratio, or LAR as we refer to it in the study, can be negative, zero, or positive. To provide a bit of perspective, an infinitely positive lead-to-accompaniment ratio would render only the lead vocals in the mix audible. Conversely, an infinitely negative lead-to-accompaniment ratio would render only the accompaniment audible.
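
As a rough illustration of the lead-to-accompaniment ratio transform described above, here is a minimal Python sketch (the function and variable names are illustrative, not taken from the paper; it assumes mono NumPy arrays for the lead vocal and for the summed accompaniment):

```python
import numpy as np

def rms_db(x):
    """Root-mean-square level of a signal in dB (re full scale)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def apply_lar(lead, accompaniment, target_lar_db):
    """Scale only the lead vocal so its level sits target_lar_db above
    (positive) or below (negative) the accompaniment, then sum the mix."""
    current_lar = rms_db(lead) - rms_db(accompaniment)
    gain = 10 ** ((target_lar_db - current_lar) / 20)  # gain applied to the lead only
    return gain * lead + accompaniment

# Toy usage: raise the lead vocal to +6 dB relative to the accompaniment.
fs = 44100
t = np.arange(fs) / fs
lead = 0.1 * np.sin(2 * np.pi * 440 * t)    # stand-in for a vocal track
accompaniment = 0.2 * np.random.randn(fs)   # stand-in for the summed band tracks
mix = apply_lar(lead, accompaniment, target_lar_db=6.0)
```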

 

The next audio effect was the spectral balance, which was a frequency-domain effect. Here, we balanced the energy weightings of the overall mix about one kilohertz. With this effect, we can change the energy weighting such that the mix sounds boomy on one side of the range and shrill on the other. So this effect essentially sweeps from a low-pass to a high-pass filter. To elaborate on what a low-pass or a high-pass filter is: a low-pass filter means we have a cutoff frequency and allow all frequencies below it to pass while attenuating the frequencies above it, and vice versa with high-pass filtering, where we attenuate low frequencies and pass high frequencies. So it's in a way akin to the filtering we encounter on a day-to-day basis.
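
A minimal sketch of a spectral balance transform of this kind, treated here as a simple FFT-domain re-weighting about 1 kHz (an illustration under that assumption, not the exact filter used in the study):

```python
import numpy as np

def spectral_balance(mix, fs, tilt_db):
    """Re-weight the mix's energy about 1 kHz: positive tilt_db boosts the
    range above 1 kHz (toward 'shrill'), negative tilt_db boosts the range
    below 1 kHz (toward 'boomy'); tilt_db = 0 leaves the mix unchanged."""
    spectrum = np.fft.rfft(mix)
    freqs = np.fft.rfftfreq(len(mix), d=1.0 / fs)
    gains_db = np.where(freqs >= 1000.0, tilt_db / 2.0, -tilt_db / 2.0)
    spectrum *= 10 ** (gains_db / 20.0)
    return np.fft.irfft(spectrum, n=len(mix))
```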

 

And, last but not least, we have what we call the EQ transform, which is also a spectral effect. Here, we aim at exaggerating or downplaying the equalization, or what we call "EQing," applied to the tracks by the mixing engineer. You may wonder what equalization is. Equalization is essentially a tool the mixing engineer uses to color the audio signal in the frequency domain. He or she does that by segregating an audio signal into individual octave bands, or smaller third-octave bands, and applying different energies or weightings in different bands, thereby coloring the audio signal in question. The EQ transform that we talk about in this study works by linearly interpolating or extrapolating between a reference spectrum and the spectrum of each individual track bearing the equalization applied by the mixing engineer. This so-called reference spectrum was an ensemble average spectrum of commonly occurring tracks in the open-source databases that we chose; these included lead vocals, bass, drums, guitar, piano, percussion instruments, and synth instruments. The EQ transform was measured as a percentage of the equalization applied by the mixing engineer. To provide a bit of perspective, a 100% EQ transform setting would mean that the participant preferred the equalization applied by the mixing engineer as it was, without any changes. Anything above 100% would mean that the participant prefers to exaggerate the equalization applied by the engineer. Anything below that simply reduces the equalization applied by the engineer, making the overall spectrum of the audio signal to which the equalization was applied more flat. I would like to add that, interestingly, by altering the equalization in the tracks with our EQ transform, we could show significant changes in frequency-domain sparsity. We did so by quantitatively measuring the sparsity using what we call the Gini coefficient, or Gini index. This index has been shown time and time again in previous studies to be more robust than most available mathematical measures of sparsity.
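
The interpolation idea behind such an EQ transform might look roughly like the sketch below, using per-bin FFT magnitudes for simplicity rather than the smoothed octave- or third-octave-band spectra a real implementation would likely use (names and details are assumptions, not taken from the paper):

```python
import numpy as np

def eq_transform(track, reference_db, amount_percent):
    """Move a track's magnitude spectrum along the line between a reference
    spectrum (amount = 0%) and the engineer's original EQ (amount = 100%).
    Values below 100% flatten the track toward the reference; values above
    100% extrapolate, exaggerating the engineer's EQ.
    'reference_db' is a dB magnitude spectrum on the same rFFT grid."""
    alpha = amount_percent / 100.0
    spectrum = np.fft.rfft(track)
    track_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    target_db = reference_db + alpha * (track_db - reference_db)
    gain = 10 ** ((target_db - track_db) / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(track))
```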

 

Kat Setzer  09:44

What is sparsity?

 

Aravindan Benjamin  09:46

Yes, sparsity describes, for example, how energy is distributed across frequencies. When we talk about spectral sparsity, it is akin to wealth distribution: in a given nation, there would be a sparse distribution of wealth if only 2% of the population held, say, 99% of the wealth, and a denser distribution if everybody was equally rich and held the same amount.
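
For concreteness, the Gini index mentioned above can be computed from any non-negative vector, such as a magnitude spectrum; here is a short sketch following the standard definition (applying it per rFFT spectrum, as in the final comment, is an illustrative choice, not necessarily the paper's exact procedure):

```python
import numpy as np

def gini_index(values):
    """Gini index of a non-negative vector: near 0 for an even ('dense')
    distribution, approaching 1 when almost all the energy sits in a few
    components (a sparse distribution)."""
    c = np.sort(np.abs(np.asarray(values, dtype=float)))
    n = c.size
    s = c.sum()
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / s) * ((n - k + 0.5) / n))

# Spectral sparsity of a mix could then be measured as, for example:
# sparsity = gini_index(np.abs(np.fft.rfft(mix)))
```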

 

Kat Setzer  09:22

So what was the setup of the first experiment? 

 

Aravindan Benjamin  10:13

Yeah, we used a standard two-speaker setup in a quiet room, so a generic hi-fi setup that you'd have in a living room environment, for example. The speakers were 90 degrees apart from one another and placed two meters away from the participant. I would like to add that the overall playback level of the loudspeakers was 80 decibels at the participant position. The participant could alter the audio effects that we discussed using what we call an ungraduated dial, which means that there were no markings to indicate where the dial should be, so it was just a completely blank dial. The participant just had to rotate this virtual dial with the mouse on a standalone computer, and as they rotated it, the audio effects would change in real time.

 

Kat Setzer  10:55

Okay, got it. So what did you end up finding with regard to listening preferences among the three groups in the first experiment?

 

Aravindan Benjamin  11:01

So we have the normal-hearing group, the participants who did not have hearing impairment. Compared to these participants, our hearing-impaired participants who used hearing aids, bilateral hearing aids, had elevated lead-vocal-level preferences. So the hearing-aid users preferred louder lead vocals compared to normal-hearing participants. But these lead-vocal-level preferences varied considerably among the hearing-impaired listeners without hearing aids, so those who did not use hearing aids.

 

I would also like to add that there were large individual differences in the spectral balance preferences among the normal-hearing participants. So there was a diverse range of preferences among the normal-hearing participants compared to the other two groups, the hearing-impaired participants who did not use hearing aids and those who did. Furthermore, for the spectral balance, our hearing-impaired participants with hearing aids preferred mixes without any changes to the spectral balance; so effectively, they set the spectral balance preference to zero. However, those who did not use hearing aids preferred higher energy weightings at frequencies above one kilohertz in the mixes presented. So they sort of favored amplifying high frequencies.

 

As for the EQ transform preferences, all three groups that we studied in the first experiment preferred spectrally denser mixes, by way of preferring EQ transform settings below 100%.

 

Kat Setzer  12:23

So the second experiment was essentially the same setup, but you were looking specifically at hearing-impaired listeners with and without hearing aids. How did this experiment differ from the first one?

 

Aravindan Benjamin  12:31

So it's just that the sample of participants was more bespoke compared to the first experiment: we targeted a controlled group of hearing-impaired participants who were all bilateral hearing aid users. This was so that we could assess their preferences with and without the hearing aids.

 

Kat Setzer  12:48

Okay, got it. So what did you end up finding with regards to the relationship between mixing effect preferences and the level of hearing loss?

 

Kai Siedenburg  12:55

So when we looked across all subjects from our two experiments, we had quite a range of hearing loss covered, actually from 0 dB to 60 dB of hearing loss in terms of pure-tone averages of hearing thresholds. And when we then correlated the degree of hearing loss with the LAR preferences, so the loudness of the vocals, we saw that the stronger the hearing loss, the more the participants preferred louder vocals. And this is very similar to the literature on cochlear implants that we mentioned before. We also saw that with higher levels of hearing loss, people tended to have a higher preference for sparser mixes, which potentially counteracts some of the masking we talked about earlier.

 

Kat Setzer  13:42

Okay, interesting. What else did you find out about the mixing effect preferences of listeners with hearing impairment?

 

Kai Siedenburg  13:49

So Aravindan said earlier that in experiment two, we just looked at people with hearing aids. We had them wear their hearing aids in one session and not wear them in another session, so we could directly compare results from these two sessions. And here we saw that hearing aids compensate for the hearing loss when listening to music, to a certain extent. With hearing aids, lower vocal levels were preferred, as well as less sparsity; and without hearing aids, higher vocal levels and more sparsity were preferred. So we saw that the type of amplification through the hearing aid is really critical to consider in this picture. We can't just say hearing-impaired, yes/no; we really need to be looking at the whole acoustic chain, of course involving hearing aids. It's important to emphasize this because other studies show that hearing aids don't improve certain aspects of music perception. Here we found that, if we look at sound preferences for music mixes, there is a considerable difference depending on whether people use hearing aids or don't.

 

Kat Setzer  15:02

Okay, yeah, that makes sense. And this is using a loudspeaker. So I imagine... I've known people with hearing aids who use their hearing aids to listen to music now, since they're used kind of like Bluetooth speakers in a way. Would that affect, you think, what their preferences on mixing are?

 

Kai Siedenburg  15:18

Yeah, I think so. It really depends whether you're listening via loudspeakers or streaming to the hearing aid directly. I think there are also quite a few degrees of freedom that we can use in the future to improve music processing for hearing-impaired listeners.

 

Kat Setzer  15:34

Yeah. So actually, that leads us to our next question: what are your future research plans?

 

Kai Siedenburg  15:38

We have quite a lot of plans. We're currently exploring the other side of the coin of this whole topic, namely a performance-based task. So, in the study we were talking about, we looked at preference, but of course, one could also look at performance-- so looking into whether people can hear out individual components from a mix or not. There we will see whether subjective preference and objective performance are closely or only loosely related. So that's one interesting path that we're currently exploring. Aravindan, do you want to add something to that on future research?

 

Aravindan Benjamin  16:19

I would also like to add that we would like to consider a synergy of effects. Here we isolated individual effects and then considered their preferences separately. But we would like to see what the preference is if two effects were considered together, for example, an EQ transform of 150%. How would it feel if the participant had the degrees of freedom to change two effects at once?

 

Kat Setzer  16:43

Well, that sounds all very exciting. I'm sure a lot of people will benefit from the research and how it will increase accessibility of music to a wider variety of listeners. I wish you the best in your future research and thank you again for chatting with me today.

 

Kai Siedenburg  16:57

Thanks for having us.

 

Aravindan Benjamin  16:59

Thank you very much.

 

Kat Setzer  17:02

Thank you for tuning into Across Acoustics. If you would like to hear more interviews from our authors about their research, please subscribe and find us on your preferred podcast platform.