Across Acoustics
POMA Student Paper Competition: New Orleans
Find out what the next generation of acousticians is researching! In this episode, we talk to the latest round of POMA Student Paper Competition winners from the joint 188th meeting of the ASA and the 25th International Congress on Acoustics, held in New Orleans in May 2025. Their topics include:
- Using the spatial decomposition method to parameterize acoustic reflections in a room (Lucas Hocquette, L-Acoustics)
- Visualizing nonlinearities in a bolted plate system with digital image correlation (Nicholas Pomianek, Boston University)
- Analyzing how people pronounce the word "just" in casual speech (Ki Woong Moon, University of Arizona)
- Modeling strings of historical instruments that no longer make sound (Riccardo Russo, University of Bologna)
- Improving automatic music mashup generators (Yu Foon Darin Chau, Hong Kong University of Science and Technology)
Associated papers:
Lucas Hocquette, Philip Coleman, and Frederic Roskam. "Acoustic reflection parameterization based on the spatial decomposition method." Proc. Mtgs. Acoust. 56, 055004 (2025). https://doi.org/10.1121/2.0002037.
Nicholas Pomianek, Trevor Jerome, Enrique Gutierrez-Wing, and J. Gregory McDaniel. "Visualizing contact area dependent nonlinearity in a bolted plate system through digital image correlation." Proc. Mtgs. Acoust. 56, 065001 (2025). https://doi.org/10.1121/2.0002099.
Ki Woong Moon and Natasha Warner. "Realization of just: Speech reduction in a high-frequency word." Proc. Mtgs. Acoust. 56, 060005 (2025). https://doi.org/10.1121/2.0002080.
Riccardo Russo, Craig J. Webb, Michele Ducceschi, and Stefan Bilbao. "Convergence analysis and relaxation techniques for modal scalar auxiliary variable methods applied to nonlinear transverse string vibration." Proc. Mtgs. Acoust. 56, 035007 (2025). https://doi.org/10.1121/2.0002073.
Yu Foon Darin Chau, Andrew Brian Horner, Joshua Chang, Chun Yuen Chan, and Harris Lau. "Retrieval-based automatic mashup generation with deep learning-guided features." Proc. Mtgs. Acoust. 56, 035006 (2025). https://doi.org/10.1121/2.0002071.
Learn more about entering the POMA Student Paper Competition for the Fall 2025 meeting in Honolulu.
Read more from Proceedings of Meetings on Acoustics (POMA).
Learn more about Acoustical Society of America Publications.
ASA Publications (00:26)
Today we're featuring a new round of POMA student paper competition winners, this time from the joint 188th meeting of the ASA and the 25th International Congress on Acoustics, which was held in New Orleans this past May. First up, I'm talking to Lucas Hocquette, who authored the paper, “Acoustic reflection parameterization based on the spatial decomposition method.” Congrats on the award and thanks for taking the time to speak with me today. How are you?
Lucas Hocquette (00:50)
I'm doing great, thanks for having me today.
ASA Publications (00:53)
Awesome, well, excited to learn about your paper. So first, tell us a bit about your research background.
Lucas Hocquette (00:58)
So I started with pretty much some engineering studies with a strong emphasis on applied mathematics and signal processing. I was quite far from physics and especially from acoustics, but during my studies, I got into music organizations, and I started working with plugins and speakers, doing a bit of audio engineering. And that's how I realized that actually acoustics is a really nice field. And after discovering that world, I kind of thought, why not try to work in that field? And that's where I started to apply for internships in acoustics, first at L-Acoustics London and then France. So that's pretty much how I am here. And so this study was led at L-Acoustics London for about six months. And I'm happy to be able to present those results.
ASA Publications (01:45)
So what is the spatial decomposition method, and how is it typically used in acoustics?
Lucas Hocquette (01:50)
So the spatial decomposition method is a room acoustics tool. It's used to kind of understand how the sound field kind of propagates spatially somewhere. And so with that method, we can try to understand from a listener's point of view where the sound field is coming from over time, when the sound is coming from a speaker, say. So it's used for now mostly in room acoustics to kind of investigate how the distribution of sound energy around the audience can have an impact on their musical experience.
ASA Publications (02:22)
Okay, got it, that makes sense. So what are the challenges with using this method in visualizing spatial sound fields?
Lucas Hocquette (02:29)
So one of the tricky parts with the method is that it kind of provides a very noisy output. So with SDM, we get a new type of signal, which is kind of a direction signal over time, and it tends to be very noisy. So when we have to analyze the results after doing some measurements, it's kind of difficult to build good links between the measurement that we make and the kind of physical features that we can see in the rooms, and try to understand at some point over time is the sound more reflected by a wall, by the seats, the roof, the ceiling, who knows. And so yeah, that was kind of the big problem that we had when we first encountered SDM at L-Acoustics.
ASA Publications (03:10)
So what was the goal of your work?
Lucas Hocquette (03:12)
So the goal of my work was to try to get rid of that noise and try to find kind of an interpretable way to read these SDM, the spatial decomposition measurements. And so our goal, kind of our ideal goal was to kind of have an automatic way to understand the links between the room, the geometry, the physical features of the room,
and the measurements that we had. And for example, say at one second after the speaker starts playing, does the sound bounce from a wall? Does it come from the stage? That was our ultimate goal.
ASA Publications (03:52)
Okay, okay. So like, is this reflection that we're seeing coming from here or is it coming from there? Kind of, yeah.
Lucas Hocquette (03:57)
Yeah, exactly.
ASA Publications (03:59)
Tell us about your sound field model and how you incorporated the spatial decomposition method. What did that entail?
Lucas Hocquette (04:06)
So we kind of started by just reusing the hypotheses that are made by the spatial decomposition method in itself. And what we added on top is a statistical model to kind of incorporate that noise that we would observe, that appears in the estimation of the directions of the sound field. And with this kind of new model that we have, we were able to kind of represent our sound fields in a much more compact way with a very reduced number of simple parameters, hence the parameterization that we are talking about, because we are converting our observation from a signal where we had a direction of arrival signal over time to a representation with a small number of reflections that are characterized by one direction and an energy level. And so that's pretty much the sound field model that we built. And on top of that, we deduced a set of estimators to then compute these parameters for different sound fields that we were able to measure. So yeah, there was a model and on top of that, then an estimator to kind of be able to use that model in kind of a real world sense.
ASA Publications (05:18)
Okay, okay, what kind of parameters were you looking at?
Lucas Hocquette (05:22)
So we had a few: directions, so where the reflections are coming from, but also how specular they are, how spatially concentrated they pretty much are, and also their energy level, so to what extent they contribute to the impulse response's overall energy level.
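To make those parameters concrete, here is a minimal, hypothetical sketch of the general idea (it is not the statistical model or the estimators from the paper, and the function and variable names are invented for illustration): take noisy per-sample direction-of-arrival vectors and energies from an SDM-style analysis and summarize them as a handful of reflections, each with a direction, a concentration (how specular it is), and an energy. Energy-weighted spherical clustering is just one simple way to do that.

```python
# Hypothetical sketch: summarize noisy SDM output (per-sample DOA unit vectors
# plus energies) as K discrete reflections with direction, concentration, energy.
import numpy as np

def parameterize_reflections(doa, energy, K=5, iters=50, seed=0):
    """doa: (N, 3) unit vectors; energy: (N,) nonnegative weights."""
    rng = np.random.default_rng(seed)
    dirs = doa[rng.choice(len(doa), K, replace=False)]      # initial directions
    for _ in range(iters):
        sim = doa @ dirs.T                                  # cosine similarity (N, K)
        labels = np.argmax(sim, axis=1)                     # assign sample to nearest reflection
        for k in range(K):
            w = energy * (labels == k)
            if w.sum() == 0:
                continue
            m = (w[:, None] * doa).sum(axis=0)              # energy-weighted resultant vector
            dirs[k] = m / np.linalg.norm(m)
    out = []
    for k in range(K):
        mask = labels == k
        w = energy[mask]
        m = (w[:, None] * doa[mask]).sum(axis=0)
        R = np.linalg.norm(m) / max(w.sum(), 1e-12)         # mean resultant length in [0, 1]
        out.append({"direction": dirs[k],
                    "concentration": R,                     # closer to 1 = more specular
                    "energy": float(w.sum())})
    return out
```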
ASA Publications (05:41)
Okay, okay, okay. So how did you validate the model?
Lucas Hocquette (05:44)
So we validated the model kind of in the very classical, I'd say, workflow. We ran some simulations to just assess the accuracy of the model in an accessible and cost-effective way and to just simulate plenty of different situations. And then we ran some measurements in one of our labs in London to kind of further confirm that intuition. We observed quite a few differences between what we had in simulation and in our measurements, because we had kind of not the best measurement setup when we did the real-life measurements, but we still had kind of a good agreement between our two validation steps.
ASA Publications (06:25)
Okay, yeah, that does sound very straightforward. So what are some of the applications of these visualizations of estimated parameters?
Lucas Hocquette (06:33)
We've identified mostly two applications, but many more maybe can exist. The first is room diagnostics. So the idea would be to have this tool to help designers and audio engineers understand, when they have problems in a room, where those problems are coming from and then potentially resolve them. And another application that we can see is source localization, or the detection of obstacles or reflectors in a sound field or in a room. So, yeah, detection of physical acoustic objects in space.
ASA Publications (07:14)
I mean, both of those sound like very broad applications that can be used…
Lucas Hocquette (07:17)
Yeah
ASA Publications (07:19)
So what are the next steps in this research?
Lucas Hocquette (07:22)
So the project for now is on hold because this study was a study led during an internship that I finished a year ago now, and then I pretty much swapped teams, and so I’m working on different topics and so the project for now is pretty much on hold.
ASA Publications (07:36)
So what was the most exciting, interesting, or surprising aspect of this research?
Lucas Hocquette (07:42)
So there were quite a few. I'd say the first one was I didn't expect the project to end up where it ended up. Initially, we just wanted to kind of assess SDM and see if we would be able to use it for calibrations at L-Acoustics. And we ended up building this statistical model, and I really didn't expect to go this deep into statistics at the beginning of this project.
And one of the other interesting things that I feel is it's really nice to kind of see the links unfold between kind of the abstract mathematical theory and concepts that we're using and the physical experiments and the observations that we were able to make afterwards. And it's really, really nice to observe. And so, yeah, I find that very appealing in a way when you do this kind of research. And I think I always remember kind of the first time that we got positive results with this parametrization. And so I was like, just like a kid looking at my graphs and yes, having these links was yeah, really, really fun. So yeah, that would be the best.
ASA Publications (08:43)
Yeah, yeah, it's seeing how the simulations just link to real life and it's like, oh, this actually does simulate it the way I need it to. Yeah.
Lucas Hocquette (08:51)
Exactly, exactly. Especially like at first you try some stuff out and you're not really sure if it's going to work, and just yeah, you continue to kind of work on it, fiddle a bit, and it turns out that sometimes it works and yeah really nice to have.
ASA Publications (09:05)
Yeah, very satisfying.
Lucas Hocquette (09:06)
Yeah exactly.
ASA Publications (09:08)
Well, thank you for sharing with us about your efforts to build upon the spatial decomposition method. And congratulations again on winning the award. I wish you the best of luck in your future research.
Lucas Hocquette (09:19)
Thank you very much. Thanks for having me. It was great.
ASA Publications (09:25)
Our next POMA student paper winner from the New Orleans meeting is Nicholas Pomianek, who will be chatting with me about his article, “Visualizing Contact Area-Dependent Non-Linearity in a Bolted Plate System Through Digital Image Correlation.” Thanks for taking the time to speak with me today, Nick, and congrats on the award. How are you?
Nicholas Pomianek (09:42)
Thanks, I'm great. Happy to be here and excited to talk about this work.
ASA Publications (09:46)
Yeah. So first, just tell us a bit about your research background.
Nicholas Pomianek (09:50)
So I'm a fifth year mechanical engineering PhD student in the McDaniel lab at Boston University. And the overall goal of my work so far is to kind of contribute to a better understanding of how vibrations propagate through all types of solid material interfaces.
ASA Publications (10:09)
Okay, okay. So your work is concerned with the vibration in bolted connections. Why are we concerned with bolts and vibration and what's so tricky about these systems?
Nicholas Pomianek (10:18)
So my work is focused on bolted connections, but it's kind of intended to inform our general understanding of any mechanical or structural joint where friction plays a major role. So riveted joints or dovetail joints could also be included here. An example would be the joint where an airplane's engines are bolted to the wing, or gusset plates holding truss members together on a bridge; really anything where friction is the glue that's preventing things from moving, and it plays a big role. As a research community, the joints research community, we're primarily motivated by the fact that the vibration of a mechanical structure with joints is really messy and hard to predict compared to an equivalent structure without these joints. And that's because when a joint vibrates, the different regions inside the joint where the friction interface exists are subjected to lots of different forces in different directions, creating this very chaotic internal environment. Some parts of the joint will slip and grind against each other while other parts of the joint will be separating and clapping back together. But in reality, every part of the joint is going to express some combination of these behaviors, and which of these behaviors appear, and in what ratios, is going to depend on both the forcing level of the whole structure, so how hard it's vibrating, and the frequencies that are being expressed by the system as it vibrates.
I guess the most important ways that joints can create issues or opportunities for engineers are their tendency to alter the overall resonant frequencies and damping characteristics of built-up structures. For one, joints can actually be used to damp large structural vibrations in something like an earthquake scenario in a large high-rise building. There's lots of energy that gets dissipated as the building sways and the joints kind of move relative to each other. And this can be engineered in a way to reduce risk of collapse, as it absorbs a lot of the energy, preventing the building from vibrating as severely.
And secondly, from a structural engineering perspective, understanding resonant frequencies is extremely important in order to prevent unexpected destructive behavior or destructive vibrations. The most famous example of this is the Tacoma Narrows Bridge collapse. Wind blowing across the bridge excited a torsional mode that caused the bridge to tear itself apart and caused lots of damage and injury. So anything that can affect resonant frequencies has to be treated with a lot of care. But right now, it's extremely computationally intensive to model highly detailed joint behavior. So most of the time the effects of joints are kind of baked into the safety factors and designed around, because it's not really possible to account for them without taking thousands of hours of extra computational time. And for something like getting an airframe design approved by regulators, you have to submit thousands of different structural load cases and run thousands of simulations. And for a structure with possibly tens of thousands of joints, that's not really feasible. So long-term as a field, we're working towards reducing this cost by better understanding the physics of joints and improving those modeling techniques.
ASA Publications (13:54)
Right, right, that sounds insanely complicated. So what is system nonlinearity and what approaches are typically used for identifying it in bolted joints?
Nicholas Pomianek (14:04)
Yeah, so system nonlinearity or linearity is kind of the central difficulty when it comes to joint modeling. In this context, for structural dynamics, or really any kind of physical system, we're talking about the input/output relationship between the force applied to a structure and then how that structure responds to that force being applied. Nothing in nature or in reality is perfectly linear, but lots of things can be modeled by linear models. They're perfectly good most of the time. The backbone of most FEA code is linear models. So you can think of any simple spring-mass or spring-mass-damper system. But again, in reality, a tuning fork is practically a real-life linear system. It's always going to ring at the same frequency that it was designed to ring at, no matter how hard you hit it. Within reason, of course; if you hit it too hard, you'll break it. But within its usable force range, if you hit it twice as hard, you're going to get a response that's twice as loud. And that's kind of at the core of what a linear system is.
An example of a nonlinear system, kind of in the same theme, would be a guitar string pinned against the fretboard. So at the lower kind of forcing levels, where if you pluck the guitar string lightly, it's going to have that same kind of behavior where if you pluck it twice as hard, it's going to be exactly or almost exactly twice as loud. But there are limits to how far you can push this, where if you pluck the guitar string too hard, it's either going to start impacting other parts of the guitar fretboard or other strings or, in a scenario more relevant to my work, it'll start slipping between your finger and the fretboard, slightly changing the length of the string or introducing other artifacts or harmonics and changing the response. So in a system where forces are low, it's linear, but as that force increases, you're going to transition to a nonlinear system. And it's at that transition point where this input/output relationship breaks down and everything gets much more complicated and our usual models stop working as well. And this applies to mechanical jointed systems.
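That proportionality test is easy to see numerically. Below is a hedged, illustrative sketch (not anything from the paper): a damped oscillator driven at a fixed frequency, once with a purely linear spring and once with an added cubic stiffness term (a Duffing-type nonlinearity standing in loosely for joint behavior). The frequencies, damping, and cubic coefficient are arbitrary illustration values. Doubling the force doubles the linear response exactly; the nonlinear response drifts away from a factor of two.

```python
# Illustrative only: compare response scaling of a linear vs. cubic-stiffness oscillator.
import numpy as np
from scipy.integrate import solve_ivp

def response(force_amp, k3=0.0, w0=2*np.pi*5.0, zeta=0.01, wf=2*np.pi*4.8):
    """Late-time peak of x'' + 2*zeta*w0*x' + w0^2*x + k3*x^3 = F*sin(wf*t)."""
    def rhs(t, y):
        x, v = y
        return [v, force_amp*np.sin(wf*t) - 2*zeta*w0*v - w0**2*x - k3*x**3]
    t = np.linspace(0, 20, 20000)
    sol = solve_ivp(rhs, (0, 20), [0.0, 0.0], t_eval=t, max_step=1e-3)
    return np.abs(sol.y[0][len(t)//2:]).max()      # ignore the initial transient

for k3 in (0.0, 5e6):                              # 0 = linear spring, >0 = nonlinear spring
    r1, r2 = response(1.0, k3), response(2.0, k3)
    print(f"k3={k3:g}: response ratio for doubled force = {r2/r1:.3f}")
```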
ASA Publications (16:29)
Okay, okay that makes sense. So what was the goal in your study?
Nicholas Pomianek (16:36)
So in this research field, we typically look at the overall or global behavior of a structure with joints. But we, in contrast, wanted to develop a way to zoom in on a joint and measure spatially dependent nonlinearity in finer detail. So looking at a place near a bolt versus away from a bolt and seeing how that changes the response. In typical vibration testing, you will usually use accelerometers. You attach them to your structure at single point locations and record how it responds to being excited at different frequencies. You could also use something like a laser Doppler vibrometer, which doesn't mass load the structure, but either way, they are going to be single-point or multi-point measurements that work in the frequency domain. Instead, we wanted to use a newer method called digital image correlation, which I won't go into the fine details of, but long story short, we're able to measure the displacement of tens of thousands of points on a surface by taking slow motion videos and running them through specialized algorithms, which essentially place lots of displacement sensors all across the surface virtually. This kind of allows us to create a detailed map of a structure's vibration in a way that didn't really exist before digital image correlation. And it doesn't influence the mass of the structure. And you get all of this displacement information over time all at the same time. So it's a very highly data-dense measurement that is kind of newer and very exciting. And yeah, the overall goal of the study is to develop a method specifically for digital image correlation to analyze joint nonlinearity.
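For readers curious what those "specialized algorithms" do, here is a drastically reduced, hypothetical sketch of the core digital image correlation step: take a small pixel subset around a point in a reference frame and find where it moved in a later frame by maximizing normalized cross-correlation. Production DIC codes do this for thousands of subsets with subpixel interpolation and shape functions; this integer-pixel version, with made-up parameter names, only shows the idea.

```python
# Toy DIC subset tracker: integer-pixel displacement via normalized cross-correlation.
import numpy as np

def track_subset(ref, cur, center, half=10, search=15):
    """Displacement (drow, dcol) of the subset around `center` between two frames."""
    r, c = center
    tpl = ref[r-half:r+half+1, c-half:c+half+1].astype(float)
    tpl = (tpl - tpl.mean()) / (tpl.std() + 1e-12)
    best, best_dv = -np.inf, (0, 0)
    for dr in range(-search, search+1):
        for dc in range(-search, search+1):
            win = cur[r+dr-half:r+dr+half+1, c+dc-half:c+dc+half+1].astype(float)
            if win.shape != tpl.shape:
                continue                      # subset ran off the image edge
            win = (win - win.mean()) / (win.std() + 1e-12)
            score = (tpl * win).mean()        # normalized cross-correlation
            if score > best:
                best, best_dv = score, (dr, dc)
    return best_dv
```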
ASA Publications (18:26)
Okay, okay, that's cool. It does sound a lot more efficient and probably helpful for all the problems you've talked about.
Nicholas Pomianek (18:33)
Helpful, yes. Efficient, definitely not. It's an extremely, extremely data-intense and highly expensive method.
ASA Publications (18:41)
Okay, okay. Well, you know, you win some, lose some. So you used a corner bolted plate in your work. Why?
Nicholas Pomianek (18:47)
So the joints community has kind of a variety of academic benchmark structures. So these are structures that some researcher has designed to be interesting in a joints research context. And the community has kind of agreed to study these because it makes it easier to compare results and replicate results. And I chose the corner bolted plate out of the few different ones that are popular right now because it's highly compatible with digital image correlation in that it has a large surface area with all of that surface area being directly on top of a friction interface, making it really easy to get a lot of information using the digital image correlation method. You're looking at the entire surface’s displacement all at once.
ASA Publications (19:36)
Okay, okay, got it. So what did you do to the corner bolted plate?
Nicholas Pomianek (19:40)
So for this experiment, we used impact hammer tap testing. It's a vibration testing method that involves impacting the system with an instrumented hammer. So you can imagine a small hammer with an electrical wire coming out of the handle. It has a force gauge on the tip. It's very sensitive and it measures exactly how hard you hit something and for exactly how long it was impacted, whether or not there were multiple hits, et cetera.
The reason this is so useful and probably the first step to any vibration analysis is that when the structure is impacted, it's equivalent to a broadband modal excitation, which means all modes or resonances are excited at the same time. And then they all decay together at the same time at different rates. And the rate at which each mode decays depends on which mode gets the most energy from this broadband excitation. And comparing these tells you a lot about what's going on in the system. In my specific test, the corner bolted plate system that we fabricated was suspended from fishing line and elastics, approximating free boundary conditions. And I tested the plate system itself in a number of different internal contact configurations by placing different spacing washers at different locations in between the bolts. So these configurations varied from full 100% contact down to around 10% internal contact area at different locations in the system. So varying the quantity and location really helped us look for spatial distributions of nonlinearity that wouldn't really be possible with other measurement methods.
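As a point of reference, below is a hedged sketch of the standard single-channel tap-test reduction (this is generic practice, not the DIC analysis from the paper): estimate a frequency response function from the instrumented hammer's force signal and one response channel using the H1 estimator, then look for resonance peaks. The sampling rate and signal names are made up for illustration.

```python
# Generic tap-test FRF estimate (H1 = cross-spectrum / force auto-spectrum).
import numpy as np
from scipy.signal import csd, welch, find_peaks

def tap_test_frf(force, response, fs, nperseg=4096):
    f, S_ff = welch(force, fs=fs, nperseg=nperseg)          # force auto-spectrum
    _, S_fr = csd(force, response, fs=fs, nperseg=nperseg)  # force-response cross-spectrum
    H1 = S_fr / S_ff                                        # H1 FRF estimate
    peaks, _ = find_peaks(np.abs(H1), prominence=np.abs(H1).max() * 0.1)
    return f, H1, f[peaks]                                  # FRF and resonance candidates

# usage (hypothetical signals):
# f, H1, resonances = tap_test_frf(hammer_signal, accel_signal, fs=25600)
```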
ASA Publications (21:27)
Okay, okay. So then you did some finite element modeling. What did you model and what did those models show you?
Nicholas Pomianek (21:34)
Yeah, so I modeled every test that I ran. All the different contact configurations and the impacts. We made pretty highly detailed finite element models, which are pretty important to create a baseline for both mode shapes and resonant frequencies to compare to when we're analyzing the response of the actual system.
ASA Publications (21:56)
So what did you learn from your impact hammer and DIC testing?
Nicholas Pomianek (21:59)
So after analysis, we found that frictional contact area was almost entirely responsible for nonlinear behavior. So nonlinear behavior only really existed in parts of the system that were experiencing contact and was significantly diminished away from areas with contact. We also found that friction played a large role in determining which mode shapes were actually expressed by the structure and at which ratios they were expressed. And that mainly the image correlation method that we developed is not only a viable tool for analyzing this spatially dependent nonlinear behavior, but it's kind of uniquely suited to this purpose. And we think it has a lot of potential for growth in this space. So yeah, it was kind of what we needed to see in order to prove the method.
ASA Publications (22:50)
Well, that's exciting.
Nicholas Pomianek (22:52)
Right.
ASA Publications (22:53)
What are the next steps to this research?
Nicholas Pomianek (22:56)
So the next steps are definitely to expand to more sophisticated nonlinear modal testing techniques. The tap test is a great first step in any modal test, but there exist far better methods for identifying nonlinear behavior, such as ring-down testing or force-controlled shaker testing. These methods will give a much better picture and quantification of system nonlinearity than tap testing can, just because they focus the test more on a specific mode. Yeah, expanding this method to a better nonlinear test method is definitely the next step. We think it could have some exciting results.
ASA Publications (23:39)
That’s very cool. So what was the most exciting, surprising, or interesting aspect of this project for you?
Nicholas Pomianek (23:44)
So for me, the most exciting was definitely the digital image correlation video results that I produced. They had these beautiful patterns of vibration that looked almost like a drop in water as the impact hammer hit the back of the plate and then the vibration propagated and bounced off the sides. It is completely imperceptible to the naked eye when you're doing the test, but then seeing it come together like that, like theory coming to life, was very cool to see.
ASA Publications (24:12)
Yeah, that sounds awesome. Well, hopefully these methods will lead to more computationally efficient ways of assessing vibration in structures and help make structures safer. Thank you again for taking the time to speak with me today and good luck with your future research.
Nicholas Pomianek (24:26)
Yeah, and thanks so much for having me. This has been really fun.
ASA Publications (24:31)
Our next POMA student paper competition winner from New Orleans comes from the field of speech communication. With me is Ki Woong Moon, who will be discussing his article, “Realization of just: Speech reduction in a high-frequency word.” Thanks for taking the time to speak with me today and congrats on the award. How are you doing?
Ki Woong Moon (24:49)
Hi, I'm doing good and thank you for your warm welcome.
ASA Publications (24:52)
Yeah, I'm happy to have you. So first, tell us a bit about your research background.
Ki Woong Moon (24:58)
I’m a fourth year PhD student in Linguistics at the University of Arizona, and I am also expecting to earn a Masters in Human Language Technology at the end of this year. So my research lives at the intersection of phonetics, speech technologies, and computational linguistics. I'm especially interested in incorporating speech features into computational models to enhance their performance. Also, I am deeply intrigued by analyzing speech data using statistical methods. Lately, I've been diving deep into the phenomenon of speech reduction in spontaneous speech and exploring ways to adapt automatic speech recognition models to more accurately interpret reduced and variable speech patterns. My goal is to bridge the gap between variable human speech and machine understanding.
ASA Publications (25:49)
Oh, very interesting. Okay. So the basis of your work stems out of the fact that people pronounce some words differently in conversational speech than they do in careful speech. Can you tell us more about this phenomenon?
Ki Woong Moon (26:02)
Yes, this is known as speech reduction, and it's extremely common in spontaneous conversations. Instead of producing every segment or sound clearly, speakers often shorten vowels, delete consonants, or merge articulatory gestures, especially in high-frequency words. Prior work shows that more than a quarter of words in conversational American English have at least one segment deleted. And importantly, this phenomenon isn't a random thing. It's systematic and context dependent, influenced by various features such as speech rate, predictability, and the surrounding context. My study builds on this framework by examining how and when these reductions occur in the word just.
ASA Publications (26:48)
Interesting. Okay. Yeah. So you just mentioned you were interested in the word “just.” Why? And what was your goal for this study?
Ki Woong Moon (26:56)
So my interest in the word just actually started in a very simple way. During a meeting with my advisor, I noticed that I use the word just frequently, and every time it came out a little differently, sometimes without the final T, sometimes as a smooth sibilant noise. That made me wonder whether this kind of variation was something only I was doing, or whether other speakers showed the same patterns. I checked some recordings from our lab, and then began exploring tokens in the Buckeye corpus, which contains spontaneous American English. Very quickly, it became clear that "just" is an ideal case for studying reductions. It's extremely common, very short, has two sibilant sounds that easily blend with surrounding speech segments, and appears in many different positions in a sentence. Because it carries multiple meanings and can fit almost anywhere syntactically, speakers use it in a wide variety of contexts. The goal of the study was to understand how "just" is actually produced in spontaneous American English, how often it's reduced, what those reductions look like acoustically, how they depend on the surrounding sounds, and how predictable the word is in its local context.
ASA Publications (28:15)
Interesting, interesting. The self-reflection that that involved to build a study out of that was pretty impressive. So how did you go about taking measurements and analyzing the conversational use of “just”?
Ki Woong Moon (28:27)
So, to study how "just" is produced in conversation, I started by extracting a little over a thousand tokens of the word from the Buckeye corpus, along with the words immediately before and after it. I then aligned the word and segment boundaries using the Montreal Forced Aligner, and manually corrected them in Praat to make sure the boundaries were accurate. For each token, I measured things like the total word duration, the vowel duration, and the vowel's formant values, and I also created an 8-point clarity score that reflects how many of the expected segments were present or substituted. This allowed me to treat reduction as a gradient rather than simply labeling tokens as reduced or not reduced. I then categorized the sounds surrounding "just," such as sibilants, vowels, or pauses, and also calculated how predictable "just" was in each context using bigram probabilities from the corpus transcripts. Finally, I used a linear mixed-effects model to test whether these contextual factors and predictability measures significantly influenced the acoustic properties and degree of reduction.
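For listeners less familiar with that last step, here is a hypothetical sketch of what fitting such a model can look like in Python with statsmodels. The file name and column names (duration, context, bigram_prob, speaker) are invented for illustration and are not the variables from the paper; the point is simply fixed effects for context and predictability with a random intercept per speaker.

```python
# Illustrative mixed-effects model: duration of "just" ~ context + predictability,
# with a random intercept for each speaker.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tokens.csv")   # hypothetical file: one row per token of "just"

model = smf.mixedlm("duration ~ C(context) + bigram_prob",
                    data=df,
                    groups=df["speaker"])   # random intercept per speaker
result = model.fit()
print(result.summary())
```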
ASA Publications (29:40)
What did you end up learning about how people pronounce “just” in conversation?
Ki Woong Moon (29:44)
One major finding is that over 90% of tokens were reduced relative to the canonical form. "Just" often lost segments, especially the final T, and the vowel was sometimes extremely short or absent entirely. Importantly, reduction depended strongly on context, and "just" was more reduced next to sibilants, both before and after, showing shorter durations, higher vowel-to-word ratios, and lower clarity scores. Predictability also mattered. "Just" was more reduced when it appeared in very common sequences like "just like" or "was just." However, the acoustic cues were not significantly affected by predictability.
ASA Publications (30:30)
Okay. So what are the implications of these findings in the larger discussion of spontaneous speech production?
Ki Woong Moon (30:37)
One of the key findings is that reduction in spontaneous speech is highly systematic and very sensitive to context, rather than random. Speakers don't reduce "just" in arbitrary ways; they do it in predictable patterns depending on what comes before and after the word. For example, when "just" is next to sibilant sounds, it is reduced much more often and in more extreme ways. These similar sounds tend to blend together in natural conversations, and this creates a consistent environment where "just" becomes shorter or loses segments. Reduction is also shaped by how predictable the word is in a given sequence. In very common combinations like "was just like," speakers reduce it more than when it appears in less familiar contexts. This means speakers seem to rely on listeners being able to anticipate the word in those frequent patterns. Altogether, the results show that speech production follows organized, context-dependent patterns. Even when words sound very reduced or ambiguous, the variation reflects stable influences from surrounding sounds and predictable sequences, helping explain why listeners can still understand the message despite the lack of clear, careful pronunciation.
ASA Publications (31:54)
Okay, okay, interesting. So that contextualization then, too, would probably, you'd want to incorporate that if you were designing a computer program to recognize speech to expect that in these certain situations “just” might be shortened. Okay. So what are the next steps in this research?
Ki Woong Moon (32:15)
The next step that I'm planning to do is to look at perception. So now that we know how systematically "just" is reduced in different contexts, the next question is probably what do listeners actually need to hear in order to perceive the word "just." In particular, I want to test how different phonetic contexts affect perception and ask how much sibilant noise or how much vowel duration is necessary for listeners to recognize that "just" is present at all. When the vowel is extremely short or when the surrounding sibilants blend together, listeners might miss the word entirely. Our perception experiment can help us understand the boundary between when listeners still hear "just" and when they no longer perceive it, and how that threshold shifts depending on the surrounding sounds.
ASA Publications (33:06)
Okay, okay. That does sound very interesting as far as next steps. So what was the most exciting, interesting, or surprising aspect of this research for you?
Ki Woong Moon (33:15)
One of the most exciting and surprising aspects of this research was discovering just how systematic and context-sensitive the reduction of "just" really is. At the beginning, I noticed my own casual pronunciations varying a lot, but I expected that across speakers the patterns would be messy or unpredictable. Instead, data showed very clear and consistent trends. More than 90% of tokens were reduced in some way, and the extent of the reduction wasn't random at all. It depended very strongly on the surrounding phonetic context, especially the sibilants, and on how predictable the word was in a given sequence.
Another surprising part was seeing how extreme some reductions were. In many tokens, the vowel was almost entirely missing, and the word was realized as essentially a stretch of sibilant noise. And in natural conversations, listeners still seemed to understand it. That raised the intriguing question of how much acoustic information listeners actually need to perceive the presence of the word, which motivated my ideas for the follow-up perception study. Overall, the most exciting part was realizing that something as small and common as "just," a word we barely notice in everyday speech, reveals a rich, highly organized pattern of variation that reflects broader principles of spontaneous speech. It was a reminder that even very simple words can uncover complex and uniform phonetic behavior.
ASA Publications (34:50)
Yeah, that is so interesting that there are these patterns that just are common among people. Anyway, well, thank you again for taking the time to speak with me, and congratulations again on the award. And I wish you the best of luck in your future research.
Ki Woong Moon (35:04)
Thank you. Thank you for having me today.
ASA Publications (35:06)
You're welcome.
ASA Publications (35:07)
Our next student interview is from Musical Acoustics. With me is Riccardo Russo, who will be discussing his article, “Convergence analysis and relaxation techniques for modal scalar auxiliary variable methods applied to nonlinear transverse string vibration.” Thanks for taking the time to speak with me today and congrats on the award. How are you?
Riccardo Russo (35:25)
I'm good, I'm good. And thank you for inviting me. Very glad to be here.
ASA Publications (35:29)
Yeah. Glad to have you. So tell us a bit about your research background.
Riccardo Russo (35:35)
Sure, so I just finished my PhD at the Department of Engineering here at the University of Bologna, my hometown; I just defended a few months ago. And I have quite a strange background because I did my bachelor's in physics here at the University of Bologna. Then I went to work as a developer for an audio company for a while. And then my passion for research came back and I decided to take the master's in sound and music computing at Aalborg University in Copenhagen. And then from that, my PhD. So I've always tried to bring together my passion for physics, music, and music technology. So that brings me to the topic of my current research, which is developing efficient numerical algorithms to simulate the nonlinear vibration of strings. And my PhD research, and my current research, is within the NEMUS project, which is a European-funded project dedicated to developing digital restorations of non-sounding ancient musical instruments. So the idea is that there are instruments in museums that are ancient and cannot be played anymore. And instead of creating physical copies, what we want to do is use modern engineering techniques to digitally reconstruct the sound.
ASA Publications (36:52)
Oh, so that's so cool. So people can go to the museum and hear what instruments might have sounded like at some point? That's so cool. Okay. So can you give us a bit of background on the numerical simulation of nonlinear string vibration?
Riccardo Russo (37:07)
Yes, when we want to develop a model, a mathematical model that describes the vibration of strings, we typically end up with one or more partial differential equations. And so, as always, when we try to create a mathematical model that describes reality, there are some assumptions that we need to make and some aspects that we need to neglect. And so in the literature, there are different partial differential equations that describe the vibration of strings with different levels of accuracy. And the first example is d'Alembert's wave equation, which is linear. And also it's the first example of a partial differential equation. But this is very simple, and we may add other components to it to model stiffness and damping, for example. And there are different models for stiffness and damping in the literature, but in musical acoustics, typically we consider them to be linear. But then linear models are somehow limited because they cannot capture important features. For example, the pitch glide that you get when you pluck a string very strongly. In that case, what happens is that tension increases due to the high amplitude vibration and so does the frequency. So what you perceive is a pitch glide, essentially. Or another feature is the mode coupling that happens in nonlinear models that gives you spectral enrichment. So of course, the more complex the model, the more accurately it describes reality, but also the more difficult it is to find a solution. In the particular case of the string, d'Alembert's equation, so the simplest one, is the only one that admits a closed-form mathematical solution. But for all the other models, we need to use a numerical method to find approximate solutions.
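As a rough illustration of the hierarchy Riccardo describes, here are two textbook forms (not necessarily the exact model used in the paper): the linear wave equation, and a Kirchhoff-Carrier-type string in which the averaged stretching of the string raises the effective tension, producing the pitch glide, with simple stiffness and damping terms added.

```latex
% u(x,t): transverse displacement; rho A: linear density; T_0: rest tension;
% E: Young's modulus; A: cross-section area; I: area moment; L: length; sigma_0: damping.
\begin{align}
  \rho A\,\frac{\partial^{2} u}{\partial t^{2}}
    &= T_{0}\,\frac{\partial^{2} u}{\partial x^{2}}
    && \text{(d'Alembert, linear)}\\[4pt]
  \rho A\,\frac{\partial^{2} u}{\partial t^{2}}
    &= \left( T_{0} + \frac{E A}{2L}\int_{0}^{L}\!\left(\frac{\partial u}{\partial x}\right)^{2} dx \right)
       \frac{\partial^{2} u}{\partial x^{2}}
       - E I\,\frac{\partial^{4} u}{\partial x^{4}}
       - 2\rho A\,\sigma_{0}\,\frac{\partial u}{\partial t}
    && \text{(tension-modulated, stiff, damped)}
\end{align}
```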
ASA Publications (38:55)
Okay. So in your paper, you say that an issue with simulating nonlinear systems is guaranteeing numerical stability. Why is this a challenge, and how has it been dealt with previously?
Riccardo Russo (39:06)
So, yeah, this question is closely linked to the previous one. And stability is a fundamental property of numerical methods, numerical solvers in general. If a method is unstable, then it may happen that a combination of parameters makes it blow up. And when this happens, basically the solution grows indefinitely towards infinity. And we want to avoid this because, first, we want to be able to reach a solution for all possible combinations of parameters. But also, this is a particular issue in real-time applications. So for example, let's say that we build a software instrument from our simulations. So for example, a musician plays a MIDI keyboard, which sends a signal out to the software that runs the simulation under the hood. Then if the software explodes or stops working just because the user selected the wrong set of parameters, that is just not good software. So we want to avoid unstable schemes at all costs. And there are different techniques to ensure stability of numerical methods. For example, a classic technique is von Neumann analysis, which is very powerful, but also only applies to linear systems. The technique we use is energy analysis. And in this technique, basically we try to bound a quantity with the dimensions of energy, which we can view as the energy of the numerical scheme. And if we manage to do this, then we are sure that the numerical scheme will not explode.
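Schematically (and this is only the general shape of an energy argument, not the specific proof in the paper), the idea is to show that the scheme admits a non-negative discrete energy that can only decrease, and that this energy bounds the size of the numerical solution:

```latex
% h^n: discrete energy at step n; Q^n >= 0: energy lost to damping;
% u^n: numerical state; alpha > 0: a constant depending on the scheme.
\begin{align}
  \mathfrak{h}^{n+1} - \mathfrak{h}^{n} = -\,\mathcal{Q}^{n} \le 0,
  \qquad
  \mathfrak{h}^{n} \ge \alpha\,\lVert \mathbf{u}^{n}\rVert^{2}
  \;\;\Longrightarrow\;\;
  \lVert \mathbf{u}^{n}\rVert \le \sqrt{\mathfrak{h}^{0}/\alpha}
  \quad \text{for all } n.
\end{align}
```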
ASA Publications (40:42)
Okay, got it. What was the goal of this project?
Riccardo Russo (40:45)
So the goal here was to use the scalar auxiliary variable method, or SAV, together with modal methods to solve a nonlinear string. So far, SAV had been used in this context only together with finite difference methods. It had been used with modal methods, but to solve different kinds of nonlinearities, such as collision modeling, for example. So the final goal was to develop the numerical string model that will be used as part of the sound engine for the final part of the NEMUS project. And it was very important for us to use modal methods because they offer particular features with respect to finite difference methods. So for example, with modal methods, it's very easy to implement complex damping profiles that depend on each frequency, allowing for very realistic sounds. And also they benefit from reduced numerical dispersion, which is something we try to limit.
ASA Publications (41:45)
Okay, okay. So how did you develop your algorithm, and how did you go about testing it?
Riccardo Russo (41:50)
So here my answer will have to be a bit technical. So this algorithm is directly derived from my previous work where I used finite difference methods to solve the same string model. The difficulty here was to re-adapt everything to use it with modal methods and especially adapt the regularization technique that I developed in the finite difference case. So this technique allows us to keep the solution well behaved, because otherwise SAV in its “vanilla” version tends to give inaccurate results at low sample rates, especially at the sample rates that we need, so 44.1 kilohertz typically.
So the testing I did was essentially to ensure that the method is convergent. So if a method is convergent, what happens is that as the sample rate is increased, the solution converges to the real one, which is what we want. So basically, we took a benchmark solution, and we saw that our solution was actually approaching the benchmark one as we increased the sample rate. So we managed to prove that the method was convergent. And also, we saw that, as we expected, when we refined the spatial grid (something that's not feasible with finite difference methods, and that we can only do with modal methods), the overall error of the scheme with respect to the benchmark solution became smaller. So yeah, this is one of the reasons why we used this technique and we did this work.
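A hedged sketch of that kind of convergence check is below. The function simulate_string is a hypothetical stand-in for the actual SAV/modal solver, assumed to return the solution sampled on the same grid as the benchmark; the observed order of accuracy is then just the slope of the error against the time step on a log-log plot.

```python
# Illustrative convergence study: error vs. benchmark at increasing sample rates.
import numpy as np

def convergence_study(simulate_string, benchmark, sample_rates):
    errors = []
    for fs in sample_rates:
        u = simulate_string(fs)                   # solution on the benchmark's grid
        errors.append(np.linalg.norm(u - benchmark) / np.linalg.norm(benchmark))
    h = 1.0 / np.asarray(sample_rates, dtype=float)     # time step
    rates = np.diff(np.log(errors)) / np.diff(np.log(h))  # estimated order of accuracy
    return errors, rates

# usage (hypothetical solver and reference solution):
# errs, orders = convergence_study(my_solver, ref_solution, [44100, 88200, 176400, 352800])
```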
ASA Publications (43:24)
You ended up using the model you developed for a virtual harpsichord. Can you tell us about that?
Riccardo Russo (43:29)
Yeah. So as I mentioned previously, this work is part of the NEMUS project, which is dedicated to developing digital restorations of non-sounding ancient musical instruments. And in particular, what we are studying is one ancient harpsichord, which is stored in a museum in Bologna, the Tagliavini collection. So the final goal of the project is to develop a software instrument from these simulations that people will be able to play. So by connecting a MIDI keyboard to their computer and running our software, which runs the mathematical simulations of the strings under the hood, one will be able to actually play the digital reconstruction of the harpsichord. And in this project, the controller part is also very important. So for this reason, we asked a harpsichord builder to build a harpsichord keyboard for us. And then Matthew Hamilton, a colleague of mine, during his PhD, augmented the keyboard with sensors to turn it into a MIDI controller. So essentially now we have this harpsichord keyboard, which has sensors and sends out MIDI signals that can be used as a controller for our simulations. And this way it's possible to retain the haptic feedback of playing a real harpsichord. And the project is also devoted to increasing museum accessibility. So, for example, while a real harpsichord copy probably wouldn't be accessible to the public, our keyboard will be stored in the museum next to the non-sounding instrument, connected to a computer, so that visitors will be able to come and play the simulation with the feeling of playing a real instrument.
ASA Publications (45:13)
Well, that's so fun. That sounds like so much fun. So what are the next steps for this research?
Riccardo Russo (45:18)
So the next steps, which we're working on at the moment, are to include other aspects of the instrument inside the simulations. So in this paper, I only showed the simulation of the string, but of course there's much more inside the instrument. So usually, especially in the harpsichord, the string is attached at both sides to two bridges, which vibrate together with the string and then set the soundboard into vibration, and then the soundboard transmits the vibration to the air, which is what we perceive. And at this moment, we're working on reconstructing exactly this part. So a colleague of mine, Sebastian Duran, has worked on measuring the modal parameters of the harpsichord soundboard. And then these will be included as an impedance condition in the string model. And then from this, we will also model transmission into the air with a radiation filter.
ASA Publications (46:13)
Okay, okay, very neat. So what was the most exciting, surprising, or interesting aspect of this research?
Riccardo Russo (46:18)
So I think two aspects were most exciting. The first one is more related to the simulations, and was when I actually observed that the error was getting lower when we refined the spatial grid. So these things, you study them in theory, but then when you actually see that they work with the simulations and numbers, it's always very exciting. But I think the best part was actually when we connected the simulation of the harpsichord to the MIDI keyboard my colleague built, and we were actually able to play and to see the whole system working. This was very exciting because these are the final parts of the project, and it's very nice to see all our work, so my work, my colleagues' work, converging together to the final outcome. This is very cool.
ASA Publications (47:10)
Yeah, I was gonna say that is really exciting. That makes me really want to go to the museum and play the little harpsichord.
Riccardo Russo (47:16)
Yeah.
ASA Publications (47:17)
Anyway, it's really cool that your research can be used to help us hear musical instruments from the past, and hopefully we see some of these simulations showing up in museums. Thank you again for speaking with me today and I wish you the best of luck on your future research.
Riccardo Russo (47:29)
Thank you very much and thank you for having me here.
ASA Publications (47:32)
Yeah, you're welcome.
ASA Publications (47:35)
Our last POMA student paper competition interview is with Yu Foon Darin Chau. We'll be discussing his article, “Retrieval-Based Automatic Mashup Generation with Deep Learning Guided Features.” Thanks for taking the time to speak with me today, Darin, and congratulations on the award. How are you?
Chau Yu Foon Darin (47:50)
Great, thanks for having me.
ASA Publications (47:52)
So first, tell us a bit about your research background.
Chau Yu Foon Darin (47:55)
Sure, so I'm currently a second-year MPhil student on my research journey, and I'm interested in the area of AI machine creativity related to music. Basically, I would like to teach computers to listen to, analyze, and compose music. This would involve designing algorithms and metrics to analyze the acoustics of different pop songs and different instruments. Then we would create songs with different acoustic properties, ask people to evaluate our creations, and see how our metrics align with human intuition. Since I have a classical music background, I also care about structure, voice leading, and musical storytelling, and want to work those into the computer, not just sound quality.
ASA Publications (48:39)
Oh, so cool. That sounds like a lot of fun. So this work has to do with musical mashups, which I imagine our listeners are probably a little bit familiar with, but if they aren't, can you explain what a musical mashup is and how automatic mashup makers typically work?
Chau Yu Foon Darin (48:53)
Sure. So a musical mashup is a genre of music created by superimposing stems of two or more existing pieces of music, typically pop songs, together in a way such that the outcome is harmonious. So for example, I would take the vocals of a song and combine it with, say, the drums of another song and the instruments of yet another song. An example of a mashup could be the collaboration album from Jay-Z and Linkin Park, where they combined “99 Problems” and “One Step Closer.” And DJs and mashup creators pick songs with compatible tempos and keys, line up the beats, adjust the pitch, and then mix the stems so that the energy and transitions feel natural. And automatic mashup makers try to replicate each part of that workflow with algorithms. There are mainly three parts to automate: the song compatibility part, which will be some sort of heuristic; lining up the beats, so beat detection; and adjusting the pitch, so pitch or harmony or key detection.
ASA Publications (49:59)
Okay, okay, that makes sense. So your goal was to improve on these current automatic mashup creators since they don't really meet the same quality as human-made mashups do. How did you go about doing this?
Chau Yu Foon Darin (50:11)
Well, we started by asking what makes two songs fit musically the way humans do by ear. And this allows us to investigate how we might evaluate the compatibility of two songs and retrieve matching song segments at scale. So using models from the music information retrieval community, we were able to measure what harmonies are happening over time. Then we scaled up. We built a large library of about 37k pop songs, pre-analyzed everything for chords and beats, and let the system search for the best matching sections. Once it finds a promising pair, it can go through the whole pipeline. And finally, we tested the outputs with listeners, including musicians and college students, and refined the scoring and mixing based on their feedback.
ASA Publications (51:00)
Okay, okay. You kind of get into this a little bit, but how did you assess the mashup compatibility of the two songs?
Chau Yu Foon Darin (51:07)
Well, we measured how well two songs fit by looking at two things that we probably care the most about: that would be harmony and rhythm. For harmony, we used a chord recognition model that listens to a song and labels which chords are happening over time. That would be like turning the music into a sequence of building blocks, so for example, C major and A minor and so on. And then we compared two song segments by lining them up according to the beats that we detected and scoring how similar those building blocks are: we get a number by computing a distance between these blocks over a fixed window of, say, eight bars. A lower distance would mean the chords and timing agree more, so the pair would be more mashable. We also tested a richer version that uses the model's latent features, which would be like the model's internal understanding of harmony. In practice, the system searches a large song library, tries different starting points and key shifts, and picks the candidate with the lowest distance, and those are the segments that are most likely to sound good together.
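A rough, hypothetical sketch of that scoring idea is below (it is not the paper's actual metric or its deep-learning features): represent each segment as beat-synchronous 12-dimensional chroma or chord vectors, try candidate start offsets and the twelve key shifts, and keep the pairing with the smallest average distance over a fixed window, here 32 beats standing in for eight 4/4 bars.

```python
# Illustrative compatibility search over beat-aligned chroma vectors.
import numpy as np

def segment_distance(chroma_a, chroma_b):
    """chroma_*: (n_beats, 12) beat-synchronous chroma; returns mean distance."""
    n = min(len(chroma_a), len(chroma_b))
    return float(np.linalg.norm(chroma_a[:n] - chroma_b[:n], axis=1).mean())

def best_match(query, candidate, window_beats=32):
    """Search start offsets (in whole bars) and the 12 key shifts of the candidate."""
    best = (np.inf, 0, 0)                               # (distance, offset, key_shift)
    q = query[:window_beats]
    for start in range(0, max(1, len(candidate) - window_beats), 4):
        for shift in range(12):
            c = np.roll(candidate[start:start + window_beats], shift, axis=1)  # transpose
            d = segment_distance(q, c)
            if d < best[0]:
                best = (d, start, shift)
    return best                                          # lower distance = more "mashable"
```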
ASA Publications (52:19)
Okay, okay. How did you create the pipeline of songs to use in the mashups?
Chau Yu Foon Darin (52:24)
So we built the song pipeline in two big steps. The first step would be to prepare a large music library, and then we would make it analysis-ready so that the computer can quickly find good pairings. So we prepared about, as we said, 37,000 pop songs from public sources, like YouTube and large datasets. So for each song, we ran all the detection tools and stored their analysis alongside information like the language, the length, and the YouTube link. We also filtered out low-quality or non-musical audio. And then after that, we indexed these features so that the system can scan the library quickly and try different starting points and small key shifts. So when a user submits a song, we run the same analysis on it. And then the search matches its segments against the pre-analyzed library to return the best-fitting candidates. And, finally, once the match is chosen, we separate stems and do the mixing and mastering steps to deliver a clean, polished mashup.
ASA Publications (53:33)
How did your mashup system perform in listening tests?
Chau Yu Foon Darin (53:36)
So we tested the system with two groups of people: everyday listeners and professional musicians. So for casual listeners, we asked whether our compatibility scores matched their gut feeling about how well two song clips fit. On average, they said it was pretty accurate. It was slightly optimistic, but it was close. So the score is useful for finding good pairs. We also compared our mashups to a popular online tool called Rave DJ. People rated both as generally good at keeping the feel of the original songs, with a small edge to Rave, likely because Rave keeps a lot of the vocals, and vocals strongly carry a song's identity. For the pros, we had them rate enjoyment of our mashups on a one-to-five scale. The original songs scored the highest, as expected, but our mashups built with the deeper harmony features beat both our simpler chord label method and a classic academic system. So in short, listeners found our mashups sensible and competitive with a commercial tool, and musicians preferred the version guided by the richer harmony model.
ASA Publications (54:51)
Okay, interesting, interesting. So what are the next steps for this research?
Chau Yu Foon Darin (54:56)
So the next steps would be to improve the fit model beyond, let's say, harmony and beats to include, for example, the melody, the phrase boundaries, and the vocal phrasing. We also want to upgrade the mixing and mastering automation with tools from music production, for example, dynamic EQ and spectral masking control. And since the latent factors, which are like the inner understanding of the model, worked quite well for predicting the acoustic properties of a pop song, we thought we might use some machine learning methods to transfer these latent factors directly from one song to another. And on this, we just got a paper accepted to IEEE Big Data. So that's good news, I guess.
ASA Publications (55:44)
Yeah, totally! So what was the most interesting, exciting, or surprising aspect of this research?
Chau Yu Foon Darin (55:51)
So two things stood out. The first is that for these kinds of library-based mashup systems, scale really matters. So as we grew the library, the best-match compatibility improved roughly geometrically, which is to say that if we double the size of the library, it is roughly twice as likely we'll find a good match. And this is evidence that retrieval size is a first-class quality lever, let's say. And secondly, a practical surprise was how far simple heuristics can go in characterizing the search space. Our estimates from the heuristics suggest that there are on the order of about 850,000 unique eight-bar segments in the corpus, which shows, first of all, that there are a lot of unique pop songs, and also explains the power and the challenge of retrieval. So there are many good options, but also many near misses.
ASA Publications (56:51)
OK. Yeah, yeah. Well, it's really interesting to hear about your attempts to bring some more of this nuance of human-made mashups into automatic mashups. It'll be interesting to see how this technology progresses. Thank you again for taking the time to speak with me today, and good luck with your future research.
Chau Yu Foon Darin (57:08)
Thank you so much for having me.
ASA Publications (57:10)
Yeah! So before we wrap up this episode, I'd like to share a couple messages with our listeners. One, if you liked what you heard in this episode, please text it or email it to someone who may enjoy it as well. Second, for any students or mentors listening around the time this episode is airing, we're actually holding another student paper competition for the sixth joint meeting between the ASA and the Acoustical Society of Japan, which was recently held in Honolulu. So students, if you gave a presentation or had a poster at the meeting, now's the time to submit your POMA. We're accepting papers from all the technical areas represented by the ASA. Not only will you get the respect of your peers, you'll win $300, and perhaps the greatest reward of all, the opportunity to appear on this podcast. And if you don't win, this is a great opportunity to boost your CV or resume with an editor-reviewed proceedings paper. The deadline is January 16th, 2026. We’ll include a link to the submission information in the show notes for this episode.