Across Acoustics
Wait, What's That?: Weird Data in Underwater Acoustics
Oftentimes, when a scientist studying underwater acoustics begins an experiment, they have a specific goal in mind... but then there's a bloop or a crackle they don't expect, or the instruments are saying the ocean floor is at 500 meters instead of 5,000 meters like all the charts say, or a rogue pod of dolphins has caused measurements to go awry. In this episode, we talk to Erin Fischell (Acbotics Research) about all the weird data researchers can run into when they're trying to study underwater sound.
Read the associated article: Erin M. Fischell (2022). "Weird Data: The Element of Surprise in Underwater Acoustic Sensing." Acoustics Today 18(2). https://doi.org/10.1121/AT.2022.18.2.34.
Read more from Acoustics Today.
Learn more about Acoustical Society of America Publications.
Intro/Outro Music Credit: Min 2019 by minwbu from Pixabay. https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=music&utm_content=1022
Kat Setzer 00:06
Welcome to Across Acoustics, the official podcast of the Acoustical Society of America's publications office. On this podcast, we will highlight research from our four publications. I'm your host, Kat Setzer, editorial associate for the ASA. Today I'm talking to Erin Fischell, whose article, "Weird Data: The Element of Surprise in Underwater Acoustic Sensing," appeared in the summer 2022 issue of Acoustics Today. Thanks for taking the time to speak with me today, Erin! How are you?
Erin Fischell 00:38
I am good. Thanks for having me on.
Kat Setzer 00:41
Thanks for being here. I'm excited to hear some weird stories. So what's your research background?
Erin Fischell 00:46
I actually started work in underwater acoustics and robotics as an undergraduate on the Cornell University Autonomous Underwater Vehicle team. I really enjoyed the engineering challenge of working on underwater systems, so I went on and did a PhD in the MIT/WHOI joint program. I then did my postdoc work at MIT, and then was hired as research faculty at Woods Hole Oceanographic Institution in 2017. I left WHOI in 2021 and have since split my time between my work as a senior scientist at a company called JP Analytics, as well as founding my own company called Acbotics Research. So my work in underwater acoustics has actually been pretty broad. It includes scattering, array processing, environmental acoustic signal processing and machine learning, acoustic system development, and Arctic seismic acoustics. So I've worked across a broad swath of underwater acoustics, which I think was part of what motivated me to write this article, because there were some commonalities and challenges in all of those different types of work.
Kat Setzer 01:45
That's cool. You're kind of like a Jane of all trades, but in underwater acoustics.
Erin Fischell 01:50
Yeah, a little bit. Yeah.
Kat Setzer 01:52
Okay. So what brought about this article?
Erin Fischell 01:54
When I was a faculty member at WHOI, I taught in the MIT/WHOI joint program, and one of my favorite lectures that I gave teaching both adaptive array processing and environmental ocean acoustics was a lecture that I called "Weird Data." It was basically aiming to give students a sense of all the things that go wrong when you're trying to do underwater acoustics in the actual field. The other motivation was that when we're all at conferences, and I'm talking to my underwater acoustics friends across the spectrum, from high-frequency to low-frequency to marine mammal folks, when people talk shop, they really enjoy complaining about all the weird stuff that they find in their data when they're actually out in the field. So I started collecting stories for this lecture, and I started reaching out to colleagues. And then I was approached about whether I wanted to write an article for Acoustics Today, and it seemed like the perfect opportunity to write down some of the many, many stories that I had heard over the years.
Kat Setzer 02:57
You know, you're talking about how you were teaching this to students. And it's, like, probably very helpful for them to understand that, like, all the things that are going wrong in their data, it's not them. It's the underwater sound.
Erin Fischell 03:10
Sometimes it's them, but a lot of the time it is the underwater sound. And one of the things you kind of have to accept as an underwater acoustician is that sometimes the equipment doesn't work, sometimes the environment is against you. Sometimes stuff just breaks. And sometimes you see something that is completely and utterly unexpected that sparks a completely new hypothesis about how sound in the ocean works. So whenever I was teaching graduate-level acoustics, it was very important for me to include that in the curriculum, because the reality is, it's not completely cut and dried the way sound behaves in the ocean and what is going to show up every time you go out on a ship.
Kat Setzer 03:51
There are even more variables to think about than you might plan for.
Erin Fischell 03:56
Exactly.
Kat Setzer 03:57
So can you give us just a little bit of an explanation of how underwater acoustics is different from in-air acoustics?
Erin Fischell 04:03
Yeah, so the biggest differences are that underwater acoustic propagation is so heavily impacted by the environment, but also that we are incredibly reliant on acoustics underwater in general. We rely really heavily on acoustics underwater for nearly all our underwater sensing needs, because light and the electromagnetic spectrum are absorbed more than 10 billion times more in water compared to air. So as a result, we use acoustic sensors and systems for everything from environmental characterization to communications, to navigation, to observing wildlife, over pretty much every spatial scale you can imagine in the ocean. But even though we're highly reliant on it, acoustic propagation underwater is really, really variable in behavior. The sound speed in the water column changes with temperature, it changes with depth, it changes with salinity, and all of those variables change depending on where you are in the ocean. So this means that we get a so-called sound speed profile that changes with depth, but it also changes with latitude and longitude, and it changes with time due to a variety of environmental forcing factors. So you get some really weird behavior in underwater sound waves. In addition, we have a bottom and a surface, and that means there are also a lot of multipath effects; you get interactions due to the exact nature of the bottom, the layering in the bottom... And also, we're in water, and that water has certain molecular properties. One of the effects of that is, for example, that high frequencies of underwater sound are attenuated more rapidly than low frequencies. So all of this means that there are a lot of effects that come into play when you're trying to do underwater acoustics. The sound also moves very fast in water compared to in air: the sound speed in air is about 340 meters per second, while in water it's about 1500 meters per second. So all of that means we're highly reliant on it, and it's highly variable, and there's no real baseline information about exactly what the state is at any given time to make it predictable. A famous example of this is how low-frequency sounds in the ocean can travel thousands of kilometers along propagation paths that don't interact with either the bottom or the surface. This means you can literally hear sounds from the far side of the ocean through what's called the SOFAR channel.
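For readers who want to put rough numbers on the temperature, salinity, and depth dependence Erin describes, here is a minimal sketch (not from the article) using Medwin's simplified empirical formula for sound speed in seawater. The toy thermocline values in the loop are invented purely for illustration.

```python
# A minimal sketch: Medwin's simplified empirical formula for sound speed in
# seawater, showing how c depends on temperature, salinity, and depth.
# Only intended as a rough illustration, not a precise oceanographic model.

def sound_speed(T, S, z):
    """Approximate sound speed (m/s).

    T: temperature in degrees C, S: salinity in ppt, z: depth in m.
    """
    return (1449.2 + 4.6 * T - 0.055 * T**2 + 0.00029 * T**3
            + (1.34 - 0.010 * T) * (S - 35.0) + 0.016 * z)

# Example: a crude "profile" with a made-up thermocline.
for z in (0, 50, 200, 500, 1000):
    T = 20.0 if z < 50 else 10.0 if z < 200 else 4.0  # invented temperatures
    print(f"z = {z:5d} m  ->  c ~ {sound_speed(T, 35.0, z):7.1f} m/s")
```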
Kat Setzer 06:35
You've already kind of gone into this next question a little bit, so this is a good segue: What are common challenges that researchers run into when dealing with underwater sound? And what are you talking about exactly when you say, "weird data"?
Erin Fischell 06:48
Yeah, so most of the time, when we're talking about underwater instrumentation and problems, we're talking about signal-to-noise ratio, right? The level of the thing you want to sense divided by everything else. What is signal and what is noise changes depending on what you're trying to tell about the world from your sensing system. So what's signal to me when I'm tracking ships might be noise to someone else trying to understand whale behavior, for example. And the biggest challenge in underwater acoustics is that it's nearly impossible to get a full picture of what's present in the environment in terms of anthropogenic, meaning manmade, sources of signal, the biological signals, and the environmental sources. But that's just the environmental acoustics; we also have to add in the challenges of actually deploying sensors in the ocean. So in seagoing ocean acoustics, scientists are dealing with building and deploying systems that can survive the ocean environment: it's corrosive, it's high pressure, it's wet. This is not a good place to be putting electronics. It's also very expensive, because you're having to go out on a ship to deploy it. There are often space limitations; you're often sticking these acoustic systems onto a platform or a system without much choice in where it is or how it's connected or what the power levels are. As an example of how hard it is sometimes to do this, even in "easy" underwater acoustics, I went out with the University of Delaware, just into Delaware Bay, to collect some acoustic data using a system I had built for them with my company. And while we were out there, there were biting flies attacking all the students who had worn shorts instead of long pants, and nearly everyone got seasick. It was hot; it was muggy. We had to change our plan for rigging to lift the system, because the type of winch that was on the boat was different than the type of winch we expected. Halfway through, one of the batteries died. So the data from that experiment is excellent; it worked fine. We had planned for contingencies, we had seasickness medication. But even in an easy case where it's just a quick day trip offshore, lots of things can go wrong. And the combination of the environment and the difficulty of building and deploying these instruments leads to a lot of what I'm calling "weird data" in this article.

So weird data is all that stuff in the noise category for whatever your experiment is, and it's going to vary depending on what your hypothesis is. Whenever you're designing an underwater acoustic instrument, there's something you're trying to observe. It might be whale calls on a hydrophone array. It might be copepods on a fish finder. It might be acoustic modem messages from a distant AUV that's sending information across several kilometers. But regardless, there's something you want out of that acoustics data, and weird data is everything else that isn't that. You might call it noise in your experiment. But the reality is that sometimes what's noise to your experiment is actually just weird data: something that might be useful to someone else. So I think of weird data as the part of any of my data sets that doesn't get highlighted in the paper summarizing the experiment, but might interest someone else.
So an example of this: in 2021, I was part of a sea ice dynamics experiment where I developed a seismic acoustic sensing system that was deployed on the Beaufort Sea for a month. My work on that data, according to my ONR contract, is focused on ice cracking events. So I am localizing and characterizing and getting temporal and spatial statistics on all of the ice cracking in that data. But I've got 15 hydrophones under the ice for 30 days with continuous data, right? There are so many mysterious bloops and chirps and rumbles that are not the dispersive flexural events that I'm trying to localize; there's something else there. Some of that might be noise from camp, some of it might be planes taking off and landing, some are seal and whale calls. Others might be seabed earthquakes, and some of it is system noise. And all of that is weird data to that experiment. So in the article, I took all these different things that aren't what you designed the experiment for, and I put them in a bunch of categories. We talk about external interference: that's the camp noise, or that's a whale if you're tracking ships, or ships if you're tracking whales. There are propagation effects: that's all the weird bounces and things that you get in the underwater environment due to the curving of the sound underwater. There's also system noise: say, a power ripple in my system that's causing a specific type of resonance. And there are also unexpected reflections. At the beginning of the article we talked some about that; one of the most famous cases is that depth sounders will sometimes just show a false bottom at, say, 500 meters, which is due to very small zooplankton that are congregating and creating a bunch of scattering effects. So all of these things are happening in the ocean all the time. Which ones show up depends on your frequencies and your systems. These are very fuzzy categories. But the goal was to try to encapsulate all this weird stuff that we all see every time we put instruments in the ocean.
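Since weird data is framed here in terms of signal-to-noise ratio, the sketch below shows one rough way to estimate SNR in decibels from a recording, assuming you can point at a window that contains the signal plus background and another window with background only. The tone-in-noise example is invented; it is not from any of the datasets discussed.

```python
import numpy as np

def snr_db(signal_window, noise_window):
    """Estimate SNR in dB from two windows of samples.

    Assumes the 'signal' window contains signal plus noise and the
    'noise' window contains background noise only (mean-square power).
    """
    p_sig_plus_noise = np.mean(np.asarray(signal_window, float) ** 2)
    p_noise = np.mean(np.asarray(noise_window, float) ** 2)
    p_sig = max(p_sig_plus_noise - p_noise, 1e-20)  # guard against log(<=0)
    return 10.0 * np.log10(p_sig / p_noise)

# Toy example: a 1 kHz tone buried in white noise, sampled at 48 kHz.
fs = 48_000
t = np.arange(fs) / fs
noise = 0.1 * np.random.randn(fs)
tone = 0.05 * np.sin(2 * np.pi * 1000 * t)
print(f"estimated SNR ~ {snr_db(tone + noise, noise):.1f} dB")
```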
Kat Setzer 12:37
Okay, awesome. So why don't we get into some of those categories that you were just discussing? The first one was external interference, which is what most people think of when they think of underwater noise. What is external interference? And what kind of problems can it cause?
Erin Fischell 12:51
Yeah, so external interference is basically anything from the many sources of noise out there, where sound is being put into the ocean by various things. And again, exactly what's the weird data and what's the signal depends on your experiment. But some examples I've seen in my own data are things like pile driving, or whale calls, or ships, or sonars that aren't a part of your system. As an example of that, I ran a passive acoustic data collection where I had this really weird, episodic broadband signal showing up. Well, it went away when we turned off the fish finder on the boat. It turned out that the array was close enough to the fish finder that we were getting these very broadband acoustic peaks showing up in the system, even though that fish finder wasn't part of our system. That sound's in the water, so it's showing up. Other examples might be bubbles, wave breaking, rain; all of those things put sound into the water. There's a really fun anecdote from Matt Palanze at Woods Hole Oceanographic Institution in the article, where he talks about an acoustic modem acoustic release for an OOI system completely failing to work one time when they were out on the ship, and they couldn't figure out what was going on. They were sitting in the lab, they were telling it to release multiple times, and nothing was happening. Someone finally came down from the deck and said, "Uhhhhh... there are a few hundred dolphins off the ship, and that might be causing the problem." So they had to wait, and they put it in the log, he says, as "operations delayed due to mammalian interference." So all of that is external interference; it's all of those things.
Kat Setzer 14:46
That's hilarious. Okay, so then the next one you talked about was environmental features, which can create some weird data for sound propagation. What are those?
Erin Fischell 14:56
So I counted environmental propagation effects as effects due to the surface, the water column, and the bottom. One of my personal favorites of these effects is something called a Scholte wave: if you have a low-frequency sound wave that's encountering the bottom and, say, you have a layered bottom with a mud or a sand layer and then bedrock underneath, that sound will actually travel along the bottom and then shed back out at a bit of a distance from where it originally entered. So you get weird angles. Another one, which I mentioned in the beginning, is this so-called SOFAR channel, which allows you to essentially hear low-frequency sound from all the way around the world, if it's loud enough, because it's not encountering the bottom and it's not encountering the surface; it's just kind of curving its way around, never interacting, and staying in the water over massive distances. People think that whales may be communicating over long distances using this. There have been a number of acoustic studies using explosive sources that have demonstrated this effect over thousands of kilometers. I mean, it's really a remarkable feature of our ocean that this exists. Other examples might be the fact that there's something wonky on the seabed. So if you're operating near the continental shelf, you're going to see reflections coming back at you at odd angles from the continental shelf. This showed up for me when I was a postdoc at MIT. We were running AUVs in the Charles River, and there's a great honking big seawall right there. And we had to actually filter our AUV navigation solution, using an acoustic array, to take out the reflection from the seawall, because the AUV was supposed to be following an acoustic source in the water, but it started following the reflection from that seawall. So that's an example of an environmental propagation effect that can change your navigation solution.

Kat Setzer
Okay, got it. And then some surprises can come from the acoustic systems themselves. How can internal system noise throw off a researcher's expected results?

Erin Fischell
So one of my earliest encounters with underwater acoustics, as I mentioned in the beginning, was when I was an undergraduate on the Cornell University Autonomous Underwater Vehicle team. On that team, we built a hydrophone system with a little acoustic array of four hydrophones in a tetrahedron to be able to localize a pinger relative to our vehicle. Now, when we were running this system on our AUV, we had this very odd effect where a couple of frequencies were showing up at two specific levels all the time; no matter if we were in a pool or in a tank or in the lake, they'd always be there on our hydrophone system. We finally diagnosed them as an alias of the PWM switching frequency of the motors on our AUV. So PWM is pulse width modulation; it's basically a way to dictate how hard a motor is spinning, and the PWM switching frequency is how quickly it goes back and forth in providing that information to the motor. And that PWM switching frequency, it turns out, has been one of the major sources of system noise in hydrophone systems across my career. So if you're trying to collect passive acoustic data, meaning you're just listening, a combination of the mechanical coupling of the switching frequency, the acoustic coupling of that switching frequency, and noise in the electrical system can potentially show up quite a bit.
So I also saw that same effect when I was a graduate student, in the AUV I was working on, and I saw it again as a postdoc in a surface vehicle I was working on, and I saw it yet again as a junior faculty member, again on an AUV. So it's something that comes up again and again and again. Another example of that is power noise. Underwater systems are highly prone to power noise issues, because if you have a terrestrial system, ground is ground, right? You can have everything be connected to a single ground, and you can still wind up with ground issues. But underwater, you don't have that ground; it's kind of floating. And so there are a number of challenges that you can run into with that, and one of them is that you can get noise on your ground line. An example of that, again from my PhD work: I was working on a time-synchronized acoustic system for an AUV, and we were getting power for a chip-scale atomic clock from a 5-volt rail of a Bluefin 21 vehicle. Now, this was a rather elderly Bluefin 21 vehicle, with a fairly old power system. And we were seeing this very odd effect where we were triggering our recording based on what's called a pulse-per-second signal. So the rising edge of the second: it goes up, you catch that going up, and then you start your recording. Well, it was triggering recording at, like, random times, but only when it was on the AUV. It turned out there were nearly 3 volts of noise on the ground line of that power system, which meant we were getting this really strong up-and-down signal on ground that was then triggering off of non-PPS rising edges due to the noise in that ground signal. So there are just a lot of things like that that can happen. That was a particularly egregious example that we had to fix before we could get the experiment to work, but on a smaller scale that happens in a lot of different acoustic systems; you'll have lines of noise showing up depending on what else is in the system. Another really famous example, that nearly everyone who's worked with underwater systems, especially autonomous underwater vehicles, sees, is acoustic modem noise. Acoustic modems, for those who don't know: imagine, for those who are old enough, those old-fashioned modems we used to access the Internet back in the 90s, where they kind of just screech and groan at each other and in that way are able to send data. That's how we do most of our communication underwater: via these acoustic modems. The problem is, in order to put enough energy into the water to get transmission ranges of kilometers, there are giant capacitor banks in these things, and these giant capacitor banks are very, very noisy. So you can get introduced electrical noise from these acoustic modem systems. But you also get a lot of self-noise: if, say, you have a passive acoustic system and you also have an acoustic modem, you'll get that interference from the acoustic modem into your passive acoustic system. And if you're working off an energy detector or something, you have to take that into account, or it'll trigger every single time the modem fires, even when that's not what you care about.
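As a rough illustration of the PWM aliasing Erin describes, the sketch below computes where an out-of-band switching tone folds into a recorder's passband after sampling. The 25 kHz switching frequency and 37.5 kHz sample rate are hypothetical numbers chosen only for illustration, not values from her systems.

```python
def alias_frequency(f_in, fs):
    """Frequency (Hz) at which a tone at f_in appears after sampling at fs."""
    f_folded = f_in % fs
    return min(f_folded, fs - f_folded)

# Hypothetical numbers: a 25 kHz PWM switching frequency and its harmonics,
# recorded by a hydrophone system sampling at 37.5 kHz.
fs = 37_500.0
f_pwm = 25_000.0
for k in (1, 2, 3):
    print(f"{k} x PWM ({k * f_pwm / 1000:.0f} kHz) shows up near "
          f"{alias_frequency(k * f_pwm, fs) / 1000:.2f} kHz in the recording")
```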
Kat Setzer 22:27
It's kind of like your systems are just constantly trying to sabotage your research.
Erin Fischell 22:30
Yeah, and so much of it is because it's a hard environment to work in, right? You get limited test time, usually. It's extremely expensive to go out on the ocean in general. And you don't necessarily have the whole system together; you're often taking some acoustic system and then sticking it on something that someone else has built, assuming that everything's going to work, hoping that everything's going to work. And sometimes it does and sometimes it doesn't. So internal system noise is pretty high on my list of things that can cause acoustics experiments to not work. And unlike the other categories of noise, it's generally less interesting to other scientists: when you have an 18-kilohertz PWM switching frequency peak in your passive acoustic data, that's not actually that interesting to anyone. There was an example in the article where it was something interesting and positive, though, which I really liked. This is from Chris Bassett; he's a scientist at APL-UW. He was doing hydroacoustic surveys and had this weird cyclical noise issue that was causing their signal-to-noise ratio to drop, and it was screwing up their echo sounders, their fish finders. So these were active systems, but the noise was just fuzzing it all over. They were able to observe this on hydrophones on the vessel, and they had a hypothesis that there was a bearing going bad on the ship. The ship went to dry dock, and it turned out there was a bad bearing, so they were right on that. So it's not that internal system noise is always just something unhelpful that fogs up your ability to do science; sometimes it does provide you with system information. Another example from my Arctic dataset, right? I don't know if this is external interference or internal system noise, but we recharge our batteries on that ice floe system using a wind turbine, and of course that wind turbine is frozen to the ice, so it doesn't go away. Well, that's coupling to the hydrophones that are nearly directly underneath it. And so we have a near-direct correlation between the wind speed and the noise level, and thus the signal-to-noise ratio, on that particular part of the hydrophone array. So we've made ourselves a very expensive wind anemometer.
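The "expensive wind anemometer" observation suggests a simple diagnostic you could run on any similar dataset: correlate a band-limited hydrophone noise level against a co-located wind record. Here is a minimal sketch using synthetic data only; it is not the actual Arctic dataset, and the numbers are invented.

```python
import numpy as np

# Minimal sketch with synthetic data: check whether a per-minute hydrophone
# band level tracks wind speed, as a crude test for wind-turbine coupling.
rng = np.random.default_rng(0)
minutes = 600
wind_mps = 5 + 3 * np.sin(np.linspace(0, 6 * np.pi, minutes)) \
    + rng.normal(0, 0.5, minutes)
# Pretend the band level rises ~1 dB per m/s of wind, plus measurement scatter.
level_db = 80 + 1.0 * wind_mps + rng.normal(0, 1.5, minutes)

r = np.corrcoef(wind_mps, level_db)[0, 1]
print(f"Pearson correlation between wind speed and band level: r = {r:.2f}")
```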
Kat Setzer 25:05
I like it. So the last category of surprises you talk about in your article have to do with reflected sound. How do reflections, scattering, and clutter affect underwater acoustic data in unexpected ways?
Erin Fischell 25:17
Yeah, so as I mentioned at the beginning, we're trying to use acoustics underwater for many of the things that we use the electromagnetic spectrum for in terrestrial sensing. Cameras have very limited range. You can't really do satellite measurements of what's present in the water column, and there's no real equivalent of flying a plane overhead and counting the penguins for underwater. So we use these echosounder systems for lots of different scientific questions in underwater sensing. In these systems, you're sending out a ping, and then you're listening to the echoes, and the time it takes for those echoes to get back to you, and the phase of the echoes, and the frequency of the echoes; all of that tells you something about what is in the water column. And that's great, but lots of things create reflections. So there's a whole category of these unexpected reflections, like that classic World War II example: they're offshore, they're starting to lay cables, and they have early acoustic bathymetry sensors, where essentially they're sending a ping, it's supposed to bounce off the bottom and come back, and they can get the depth. And all of a sudden, they're in 5,000 meters of water, and it's showing up as 500 meters. And that makes no sense whatsoever, right? That is not where the bottom is. Well, that's this thing that we call the ocean twilight zone; there's this whole mesopelagic region in the ocean. In this so-called mesopelagic zone, there are all these, for lack of a better word, critters that have a daily migration, going from deeper to shallower and back again, and they create what's called a deep scattering layer. So those depth sounders are bouncing off all these creatures in the water column and coming back, and these days we're really interested in those creatures. But if you're interested in the depth of the water, that is definitely weird data. So that's an example of the reflections; that's kind of the most famous example. But this occurs all the time with pretty much every echo sounder or passive acoustics experiment. There's just a lot of stuff out there that reflects sound. As an example, I was working on the ARPA-E MARINER program, where we were building autonomous underwater vehicle systems to try to map out kelp aquaculture farms. So, can we use broadband echo sounders in an AUV, with cameras and a bunch of other sensors, to be able to say something about the growth of the kelp on those lines, the position of the lines, the engineering characteristics of the system? Well, we get out there, we collect our data, and we start looking at it, and I pull up the data for some of my colleagues who were on the farming and engineering end of how you actually build these big marine systems. And the thing that was most interesting to them was not necessarily the kelp itself. It was the fact that we were seeing all of these, probably some kind of zooplankton, congregating right around where the kelp was. They were also really interested in the fact that these lines were bowing significantly and changing with time. And they were interested in what looked like turbulent structures in the flow of the water around these lines. So you should approach these types of data recording systems as hopefully answering the question you ask, but also maybe providing insight into broader processes.
Being open to the idea that there's something else in there, you can often get information about things you wouldn't even consider. I really liked the story on this in the article from Ian Vaughn at Woods Hole Oceanographic Institution. Ian was out at the time with the Sentry program, and they were surveying an underwater volcano. They were doing multibeam sonar surveys, so they're just mowing the lawn, going back and forth, trying to get a map of this volcano. And something really weird kept showing up: they were getting a cluster of scatterers right at the edge of the volcano. It turned out it was a school of large tuna, which they picked up when they drove an ROV down there and actually looked with a camera. So too deep to fish for, but it was still a very interesting thing to see in that data set. Which points to another thing to talk about with all this weird data: the acoustics at a single frequency give you a single perspective on something. Having secondary sensors and different frequencies actually tells you a lot more about what something might actually be. And it's important, when designing acoustic systems and acoustic sensing experiments, to think about what all these different categories of weird data might be, so that you can get some additional classification information for whatever weird data you get. So again, going back to that Arctic dataset, which I keep going back to because it's the one I've been working on most recently: I had all these really weird episodes where it's very loud, and it's rumbly, and it's all the things at once, when the planes were taking off and landing on the ice sheet, and the entire ice sheet is going "Rrrrrahraaahrrrrrr." So having secondary metadata, other sensors to give you information, can be extraordinarily helpful. The other thing is, again from that Arctic dataset, I found a couple of incidents of this really weird acoustic signal. Again, it wasn't what I was looking for; it wasn't a discrete ice cracking event that I could pull statistics from and get a temporal and spatial map of. Instead, it's like the entire ice sheet is shifting in the horizontal plane; there's no vertical component, it's just moving back and forth. Fortunately, other people have seen the same effect before. I was able to find a paper from another acoustician who specifically observed this type of horizontally polarized ice event, describing something very similar to what I saw. So I guess the takeaway is, if you can use your weird data or make it available to other people, there's a lot of information in there that might otherwise never get used for science. You shouldn't throw it away.
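To make the false-bottom arithmetic from the depth sounder story concrete: an echo sounder converts two-way travel time to range as range = c × t / 2, so a strong echo from a scattering layer at 500 meters arrives exactly as if it were a seabed at 500 meters. A small sketch with a nominal 1500 m/s sound speed (assumed round numbers, not figures from the article):

```python
C_WATER = 1500.0  # nominal sound speed in seawater, m/s

def range_from_echo(two_way_time_s, c=C_WATER):
    """Range (m) implied by a two-way echo travel time."""
    return c * two_way_time_s / 2.0

# A real bottom at 5000 m returns after about 6.7 s...
print(f"bottom echo:           {range_from_echo(2 * 5000 / C_WATER):.0f} m")
# ...but a deep scattering layer at 500 m returns after about 0.67 s and,
# if it's strong enough, gets picked as the 'bottom' instead.
print(f"scattering-layer echo: {range_from_echo(2 * 500 / C_WATER):.0f} m")
```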
Kat Setzer 32:23
Right. You're getting a lot of rich information that's got more layers to it than what you might be looking for just by yourself.
Erin Fischell 32:30
Exactly. Exactly.
Kat Setzer 32:32
Can machine learning be used to deal with weird data at all?
Erin Fischell 32:35
It's complicated. Machine learning in acoustics is still developing. The machine learning in acoustics that is probably the most sophisticated is a lot of the speech- and language-type machine learning. There's also been a big effort in bioacoustics, including underwater bioacoustics. The challenge in underwater acoustics is that a lot of machine learning algorithms and systems assume a reasonably stationary signal-to-noise ratio, meaning the statistics of the noise are not changing over time. The very nature of the underwater environment means that the environment is essentially non-stationary; we have a constantly evolving signal-to-noise ratio. And that's not just a single number: the noise statistics themselves shift, for example, with tide, or with where you are spatially if you have an acoustic system on a moving platform. So it makes it very difficult to have a big-data trainable system. The other problem is lack of labeling. You have variable signal-to-noise ratio, which makes a lot of the automated, throw-it-in-a-pile-and-see-what-pops-out work difficult. But there's also a bit of a lack of labeled data that would let you get a start on it. And people do tend to cherry-pick their data sets for where they have a good signal-to-noise ratio, or where the data answers the question that they're going for. I mean, I've done a fair amount of machine learning in acoustics; I actually helped to run a machine learning in acoustics session at ASA, at one of the meetings a few years ago. But it is tricky, and it continues to be tricky. And I think a big part of the solution is going to be making more data broadly available. But again, we have other problems there. The data formats are not at all fixed in terms of how we handle big amounts of acoustic data. Every manufacturer of a sonar has a different proprietary data format, it seems, and everyone's storing their hydrophone data in a slightly different file format. And most of those data sets are not broadly available to people. I wish they were. I wish there were a central place to put all of this that someone could then work a big-data machine learning angle on. But again, because there are so many weird sources of interference in these data sets, without full metadata, which isn't always available, it's hard to train on. For that Arctic dataset that I mentioned, a big part of my effort this last year has been getting an initial paper ready to go, and it's actually about to be published in IEEE JOE, that links to the full dataset with all of the data and all the metadata I have. And my hope with that is that someone else can look at my weird data. Like, I'm not a marine mammal expert; I'm hoping a marine mammal expert will be able to take that data, put it into their machine learning system, and pull out all the seal calls. But it's a fair amount of work to get to that point. To make that possible, I had to set up the Google Drive, because there isn't a single clearinghouse for acoustic data that I'm aware of. I had to figure out how I wanted to host the code, I had to get the code to the point where I felt like it was publishable and usable by other people, I had to document it, I had to provide a GitHub; there were a lot of steps in that. And for this particular experiment, I'm fortunate in that ONR was very interested in me doing that with this dataset, and so I was able to do that as part of the project.
But I was surprised, as I started this process, that there wasn't a place to put that data and that code that could be easily accessed, an obvious place people would go looking for it if they wanted to write a master's thesis on seals in the Arctic, where they could just type that into a search and have my dataset come up for them. And to my knowledge, that's not really available for acoustics yet.
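One common way to cope with the non-stationary background noise Erin describes, before any detector or classifier sees the data, is to normalize each frequency bin of a spectrogram by an estimate of its own background level. The sketch below does a crude per-bin median normalization; the parameters and the toy noise record are invented, and this is not her processing pipeline.

```python
import numpy as np
from scipy.signal import stft

def whitened_spectrogram(x, fs, nperseg=1024):
    """Spectrogram with each frequency bin divided by its median level.

    Crude per-bin background normalization over the whole recording;
    helps when the noise floor drifts relative to the signals of interest.
    """
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)
    background = np.median(mag, axis=1, keepdims=True) + 1e-12
    return f, t, mag / background

# Toy example: a noise record whose level doubles halfway through.
fs = 8000
noise = np.concatenate([np.random.randn(4 * fs), 2.0 * np.random.randn(4 * fs)])
f, t, W = whitened_spectrogram(noise, fs)
print(W.shape, W.mean())
```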
Kat Setzer 37:04
So hopefully, that becomes a resource available for folks.
Erin Fischell 37:07
Yeah, I really would hope so. I think part of the challenge is, the National Science Foundation has a set of ways that they do data management and organization for their programs, right? But a lot of acoustics research, most acoustics research, is not funded by NSF, so it's not going into those places where people might search. And then each program often has its own database. There are great websites, and I list a few in the article, that compile a bunch of different sources for acoustic data, so you can find it. But it's not necessarily the easiest or most searchable. It's not like an image database, where you can type in "pictures of dogs" and get thousands and thousands and millions of pictures of dogs. I can't type in "seal calls" and get thousands of examples of audio files of seal calls. I can't type in "scattering from copepods at 200 kilohertz" and get thousands of examples. So there's a lot of labeling needed. And the automated labeling is getting better, but it's still definitely a challenge, and one that I hope the community will move toward addressing. There's a lot of potential there. But it has to be used properly, with appropriate checks, because there's a lot of bad machine learning out there, and we don't want to be guilty of that as a community.
Kat Setzer 38:36
That makes a lot of sense. So you kind of have already talked about this, and this has kind of already been answered, but why should researchers pay attention to weird data? And how should they deal with it?
Erin Fischell 38:45
So the longer you get from an experiment, the more you forget about what happened, and the more reliant you are on whatever notes you happened to take. That means that if you do not immediately analyze your data, which most people don't most of the time, because you get a lot of funding to do the initial experiment and very little funding to actually do the analysis, then by the time you get to the analysis, you're just looking for the things that make sense, that answer a specific question. The problem is there's a lot of data left on the cutting room floor if you're approaching it like that. If you don't have the time and the interest to dig a little into the weird things that show up, as opposed to the things that directly answer the question you were hoping to study, there's a lot of information that gets lost. And again, if you don't go through this, frankly, painful process of documenting all of your metadata and publishing your code and getting it to the point where you can really say, "Here's my dataset, anyone who wants to use it can use it, and here are the scripts to read it into MATLAB or Python," no one's really going to be able to use it. So I think there's an important conversation to be had in terms of, how do we share these large data sets? They're often quite large. How do we document the metadata in a way that makes it possible for other people to use it? How do we track these things adequately over multiple years? And how do we maybe, hopefully, recover data that's currently being lost? I think about, for example, all the Arctic acoustics experiments that happened back in the 90s; those are on physical tape under somebody's desk at Scripps. Certain parts of that data set were paid to be transcribed as part of, say, the ICEX efforts: there was an effort to take a bunch of those tapes and digitize them and get them to the point where people could use them, and they have been used. But as the whole generation of Cold War acousticians is retiring now, how do we preserve all that data on tapes and hard drives and floppy disks, sitting on hundreds of desks across the world, with metadata that is probably handwritten notes in a notebook? And the thing that bothers me is, we don't have much of a better system now. Maybe all that stuff's lost; maybe we just have to say that if it hasn't been digitized and documented by somebody in the last 10 years, it may not be usable by anybody. But for the experiments we're running now, the data isn't handled that much better for long-term access. So I think it's mostly about including in experiment designs ways to preserve the metadata with the data, in a place where it'll be appropriately backed up, appropriately managed, and searchable and findable by people. And that's a huge ask, right? That's not something that's easy, unless people are writing it into their proposals and into their programs. So I think there's a funding agency role here as well, and there's been interest in this. But a question for ONR, say, is: for the data sets that are scientific data sets, where we want them available to the scientific community, how do you make sure that all this information is preserved along with them? So that people can look through weird data, so that people can take a propagation loss dataset, say, and pull whale calls out. Which are there, right?
They're almost guaranteed to be there. But the data that's publicly available is probably the transmission loss statistics, not the raw data files that you would need to run a seal or whale call classifier against. So that's where it gets complicated. And that was very meandering, what I just said.
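As one small, concrete piece of the metadata problem discussed above, here is a sketch of the kind of sidecar file one might write alongside each raw hydrophone recording so it stays interpretable years later. Every field name and value here is hypothetical, not a community standard and not taken from the actual Arctic dataset.

```python
import json

# Hypothetical sidecar metadata for one raw hydrophone file; all field names
# and values are illustrative only.
metadata = {
    "file": "array01_20210315T000000Z.wav",
    "sample_rate_hz": 96000,
    "num_channels": 15,
    "start_time_utc": "2021-03-15T00:00:00Z",
    "deployment": "Beaufort Sea ice camp, spring 2021",
    "sensor_positions_m": {"ch01": [0.0, 0.0, -3.0]},  # one entry per channel
    "known_interference": ["camp generator", "wind turbine", "aircraft ops"],
    "calibration_db_re_1v_per_upa": -170.0,
    "notes": "See field log p. 12 for battery swap at ~14:30 local.",
}

# Write the sidecar next to the raw file so data and metadata travel together.
with open("array01_20210315T000000Z.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```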
Kat Setzer 43:19
[laughs] It's okay.
Erin Fischell 43:20
I can try to summarize it a little more patly. It's a major challenge for the acoustics community to figure out how to save and share and document data sets so that they are accessible to scientists in adjacent fields who might have science questions that can be answered by data that isn't of interest to the original study, but might be of value in a different area.
Kat Setzer 43:50
Right, right. Yeah.
Erin Fischell 43:52
That's a more pat way to put it.
Kat Setzer 43:56
That's your elevator speech version of that.
Erin Fischell 43:58
That's my elevator speech. But no one wants to pay for it, as far as I know, right? And it's complicated in underwater acoustics because a lot of the work has been done by the Navy, so there's always this issue that what exists and what's available are very different things.
Kat Setzer 44:16
Right. All right. Do you have any other closing thoughts?
Erin Fischell 44:19
I think it's very important that people stay curious as they're looking at their data. And it's hard when you're on a timeline and you're trying to just get stuff published. Actually, I've appreciated this a lot more since I left academia and I'm no longer just trying to get stuff published; I have a little more time to just be curious about my data and look for things that might surprise me. Looking for those surprises is a lot of the fun of science. You want that "Huh, that's funny" moment looking at your data, to try and drive what other questions there might be about the environment you're in, about the things that might be in that environment, and about how the world is changing. Because I think there's a lot of information in the data sets we are collecting now. In the future, people are going to want to understand how the soundscape, as a fundamental measure of the health of the ocean, is being affected by things like climate change, changing shipping, and the changing amount of carbon in the ocean. And I think the more we can do to document our data sets and make them accessible and searchable, the better we're going to be able to use what might be weird data to us now to better understand, in the future, what's going on everywhere.
Kat Setzer 46:00
That all makes sense. Well, I am not going to lie, all these stories make the study of underwater acoustics sound just a wee bit daunting, but it also sounds like you end up getting some really interesting and valuable discoveries out of it, both despite and because of the so-called "weird data." I hope our listeners find your stories as strange and entertaining as I did. And thank you again for chatting with me.
Erin Fischell 46:24
Thank you, Kat.
Kat Setzer 46:28
Thank you for tuning into Across Acoustics. If you'd like to hear more interviews from our authors about their research, please subscribe and find us on your preferred podcast platform.