We've just released Steam Audio beta 2, and one of the improvements we've made is to the performance of bilinear filtering of HRTF data. This blog post describes this feature in more detail, with examples and benchmarks. For the full release notes for the beta 2 release, click here.
Wait, What Is This About Again?
In Steam Audio, when you attach a Phonon Source component to any Audio Source in Unity, the following options appear:
Under Direct Sound, there's a drop-down labeled HRTF Interpolation. There are two settings here: Nearest and Bilinear. This post explains what each of those settings does, and the performance vs. quality trade-off involved in selecting the best option for your sound source.
For those of you using the Steam Audio SDK directly, note that HRTF effects are applied using the iplApplyBinauralEffect function:
There are two possible values of the interpolation parameter: IPL_HRTFINTERPOLATION_NEAREST and IPL_HRTFINTERPOLATION_BILINEAR. These correspond directly to the Unity drop-down options mentioned above, and the performance vs. quality trade-offs discussed here apply as-is.
Querying HRTF Data
Here's how HRTF data is measured. A person (or a mannequin) sits in the middle of an anechoic room. A microphone is placed in each of their ears. A speaker is positioned at one of many pre-determined directions around the listener, and a special sound is played through it. The sound recorded by the microphones is (after some post-processing) the HRTF for that specific direction. The HRTFs for all the measurement directions together make up the HRTF data set for the listener.
So HRTFs are measured only at some specific directions. But in a game, the source can move freely, and the direction from the listener to the source may be something other than one of the measurement directions. Given all the measured HRTFs, how can we estimate the HRTF for the source's actual direction? There are two common ways of doing this:
Nearest Neighbor Just pick the measurement direction that's closest to the source's actual direction -- its nearest neighbor. If the measurements are regularly-spaced, this approach is super-fast. On the downside, as the source moves, the nearest neighbor will change, and you may be able to hear the sudden change in the HRTF.
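The nearest-neighbor lookup can be sketched as follows. This is a minimal illustration, not Steam Audio's actual implementation: it simply finds the measured direction with the largest dot product against the source's direction (equivalent to the smallest angle between unit vectors).

```cpp
#include <cstddef>
#include <vector>

// A measurement direction, as a unit vector relative to the listener's head.
struct Direction { float x, y, z; };

// Illustrative sketch: return the index of the measured HRTF direction
// closest to the source's actual direction, by maximizing the dot product.
std::size_t nearestNeighbor(const std::vector<Direction>& measured,
                            const Direction& source)
{
    std::size_t best = 0;
    float bestDot = -2.0f; // dot products of unit vectors lie in [-1, 1]
    for (std::size_t i = 0; i < measured.size(); ++i) {
        float d = measured[i].x * source.x +
                  measured[i].y * source.y +
                  measured[i].z * source.z;
        if (d > bestDot) {
            bestDot = d;
            best = i;
        }
    }
    return best;
}
```

With regularly-spaced measurements, a real implementation can replace this linear scan with a constant-time index calculation, which is why the nearest-neighbor approach is so fast.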
Bilinear Interpolation For regularly-spaced measurements, calculate a weighted average of the HRTFs at the four measurement directions that are closest to the source's actual direction. This is slower than the nearest neighbor approach, because extra calculations are needed, and care must be taken because the averaging involves complex numbers. With this approach, you're less likely to hear noticeable changes in the HRTF as the source moves around.
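The weighting can be sketched like this. This is a simplified illustration under the assumption of a regular azimuth-elevation grid with spacing of gridStep degrees; a single complex value per direction stands in for a full frequency-domain HRTF, whereas a real implementation (including Steam Audio's) averages entire complex HRTF spectra.

```cpp
#include <cmath>
#include <complex>

// Illustrative sketch of bilinear interpolation between the four HRTF
// measurements surrounding the source's direction, on a regular grid
// spaced gridStep degrees apart in azimuth and elevation.
std::complex<float> bilinearHrtf(float azimuth, float elevation, float gridStep,
                                 std::complex<float> h00, std::complex<float> h10,
                                 std::complex<float> h01, std::complex<float> h11)
{
    // Fractional position of the source between the four surrounding
    // measurement directions, in [0, 1) along each axis.
    float u = std::fmod(azimuth, gridStep) / gridStep;
    float v = std::fmod(elevation, gridStep) / gridStep;

    // Standard bilinear weights; the four weights sum to 1.
    return (1 - u) * (1 - v) * h00 + u * (1 - v) * h10 +
           (1 - u) * v * h01 + u * v * h11;
}
```

The extra multiplications and the complex arithmetic, repeated for every frequency bin of the HRTF, are where the additional CPU cost of bilinear interpolation comes from.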
Comparing Nearest Neighbor and Bilinear Interpolation For many kinds of sound, nearest neighbor works well in that sudden changes are not noticeable when the source moves around. For others, the changes are noticeable, and bilinear interpolation should be considered.
For example, here's white noise, with nearest neighbor, followed by bilinear interpolation:
In general, noise-like sounds, like machinery, helicopters, etc. benefit the most from bilinear interpolation. Sounds like speech and music tend to sound good enough with nearest neighbor.
Performance
To compare the CPU overhead of nearest neighbor and bilinear interpolation, we used a small benchmarking program that directly uses the Steam Audio SDK. We measured the performance of the iplApplyBinauralEffect function, which performs all of the computational work of HRTF-based binaural rendering, including either nearest neighbor or bilinear interpolation. To measure performance, we used the std::chrono::high_resolution_clock class, which is part of the C++11 standard library.
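The timing approach can be sketched as follows. This is a minimal illustration of using std::chrono::high_resolution_clock as described above; timeMicroseconds and processOneFrame are hypothetical names, with the latter standing in for a call to iplApplyBinauralEffect.

```cpp
#include <chrono>

// Measure the wall-clock time taken by one call to a function, in
// microseconds, using the C++11 high-resolution clock.
template <typename Fn>
double timeMicroseconds(Fn&& processOneFrame)
{
    auto start = std::chrono::high_resolution_clock::now();
    processOneFrame();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}
```

In practice, you would run the measured call many times and average, to smooth out scheduler jitter and cache effects.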
Terminology Before going any further, let’s quickly recap some basic terms related to audio processing.
Sampling Rate. The number of digital samples used to represent 1 second of audio. For example, a sampling rate of 48 kilohertz (48 kHz) means 48,000 sample values are used to represent 1 second of audio.
Frame Size. The number of digital samples that are processed at the same time. This is usually set by your audio engine. Typical frame sizes are 1024 samples or 512 samples.
Frame Time. The duration of audio represented by a single frame’s samples. This can be calculated as (audio frame time) = (audio frame size) / (sampling rate). For example, with a 1024-sample frame at 48 kHz, the audio frame time is about 21 milliseconds (ms).
Metric Measured With our test program, we measured the Theoretical Max. Sources that can be achieved on a single CPU core. This is the maximum number of HRTF-based 3D audio sources that can be processed on a given device, with one CPU core maxed out. To calculate this metric, we measure the time taken for a single execution of iplApplyBinauralEffect, and use the following calculation: (max sources) = (audio frame time) / (process time for 1 source). We do this because all audio processing for a single frame must complete within the audio frame time, or else the user will hear artifacts.
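The metric calculation above can be sketched as a simple function. The numbers below are placeholders, not measured results; the point is just that all per-frame audio processing must finish within one audio frame time.

```cpp
// Sketch of the "theoretical max sources" calculation described above:
// (max sources) = (audio frame time) / (process time for 1 source).
double maxSources(int frameSize, int samplingRate, double perSourceSeconds)
{
    double frameTime = static_cast<double>(frameSize) / samplingRate;
    return frameTime / perSourceSeconds;
}
```

For example, with a 1024-sample frame at 48 kHz (a frame time of about 21.3 ms), a hypothetical per-source processing time of 10 microseconds would allow roughly 2133 sources on one core.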
Device Specs Here are all the devices on which we measured the performance of Steam Audio:

Device       | Operating System   | CPU                     | Cores | Clock Frequency
Windows PC   | Windows 10         | Intel Core i7 6850K     | 6     | 3.6 GHz
Linux PC     | Ubuntu 16.04 LTS   | Intel Core i7 5930K     | 6     | 3.7 GHz
Mac Pro      | macOS Sierra       | Intel Xeon E5           | 4     | 3.7 GHz
Google Pixel | Android 7.1 Nougat | Qualcomm Snapdragon 821 | 4     | 2.15 GHz
Benchmark Results Below are the benchmarking results for each of the devices we tested. All numbers are the theoretical max. number of sources that can be rendered by using 100% of a single CPU core:

Device       | Nearest Neighbor | Bilinear Interpolation
Windows PC   | 2290             | 1114
Linux PC     | 1437             | 743
Mac Pro      | 1852             | 712
Google Pixel | 580              | 54
Conclusion
Bilinear filtering is a great way to ensure that binaural sources sound smooth as they move. While there is a performance hit relative to the default nearest-neighbor filtering, the absolute performance remains high on PC platforms. On the other hand, many common kinds of sounds are acceptable with nearest-neighbor filtering, so you don't need to spend CPU cycles on bilinear filtering for them. Experiment with the various sounds you're creating and using in your game or VR experience, and enable bilinear filtering only on the sounds where the difference is noticeable.
We are excited to update the Steam Audio SDK after receiving some great feedback. Steam Audio SDK 2.0-beta.2 release includes the following changes:
(Unity Integration) Fixed a bug where baked data is not properly saved with the scene, requiring a bake every time a project is opened in the Unity editor.
(Unity Integration) Fixed a bug where all audio sources stopped playing when Partial occlusion is turned on and a source intersects with geometry.
(Unity Integration) Fixed a bug that introduced a buzzing artifact when moving quickly or teleporting from one location to another.
(Unity Integration) Fixed a bug where baking reverb or propagation effects crashed Unity if no probes had been generated.
(Unity Integration) Fixed a bug where coroutines on Phonon Source and Phonon Listener did not start properly if a GameObject is toggled.
(Unity Integration) Cleaned up Phonon Effect UX by dividing it into Phonon Source and Phonon Listener.
(Unity Integration) Reduced overhead when dynamically creating a GameObject with Phonon Source or Phonon Listener.
(C API) Fixed incorrect baked data lookup when two or more sources use baked data.
(Performance) Bilinear interpolation for HRTF is up to 4x faster on PC (Windows, Linux, macOS).
(Performance) Convolution for indirect sound is up to 2x faster.
(Documentation) Added information on the actual frequencies of the low, mid, and high frequency bands used for acoustic materials.
(Documentation) Added more introductory material to the API documentation.
(Hotfix-1) Fixed issue with Unity integration when Audio Listener is not present.
(Hotfix-1) Updated documentation: Environment Component, Environmental Renderer Component, and Phonon Static Listener Component can be attached to any GameObject in Unity.
Click here to download the latest version of Steam Audio.
Visit the Discussions Forum for important details on transitioning from Steam Audio SDK 2.0-beta.1 to version 2.0-beta.2.
Steam Audio is now available, delivering an advanced spatial audio solution for games and VR apps. Steam Audio includes several exciting features that significantly improve immersion and open up new possibilities for spatial audio design.
The Steam Audio SDK is available free of charge, for use by teams of any size, without any royalty requirements. Steam Audio currently supports Windows, Linux, macOS, and Android. Just like Steam itself, Steam Audio is available for use with a growing list of VR devices and platforms.
Steam Audio SDK is not restricted to any particular VR device or to Steam.
Steam Audio adds physics-based sound propagation on top of HRTF-based binaural audio, for increased immersion. Sounds interact with and bounce off of the actual scene geometry, so they feel like they are actually in the scene, and give players more information about the scene they are in.
What can Steam Audio do?
Binaural Rendering The simplest thing that any spatial audio technology must do is HRTF-based binaural rendering. This refers to a way of recreating how a sound is affected by a listener's head, ears, and torso, resulting in subtle cues that allow you to pinpoint where a sound is coming from.
Steam Audio's implementation of HRTF-based binaural rendering has a very low CPU overhead; you can handle hundreds, even thousands of sources using a single CPU core. It also minimizes the frequency coloration of audio clips, while maintaining good localization.
Occlusion Steam Audio simulates how objects occlude sound sources. In addition to the typical raycast occlusion that many game engines already support, Steam Audio supports partial occlusion: if you can see part of a sound source, Steam Audio will only partly occlude the sound. Steam Audio uses your existing scene geometry to occlude sounds, so you don't need to create special occlusion geometry just for sounds.
Physics-Based Reverb Reflections and reverb can add a lot to spatial audio. Steam Audio uses the actual scene geometry to simulate reverb. This lets users sense the scene around them through subtle sound cues, an important addition to VR audio. This physics-based reverb can handle many important scenarios that don't easily fit within a simple box-model.
Steam Audio applies physics-based reverb by simulating how sound bounces off of the different objects in the scene, based on their acoustic material properties (a carpet doesn't reflect as much sound as a large pane of glass, for example). Simulations can run in real-time, so the reverb can respond easily to design changes. Add furniture to a room, or change a wall from brick to drywall, and you can hear the difference.
Real-Time Sound Propagation In reality, sound is emitted from a source, after which it bounces around through the environment, interacting with and reflecting off of various objects before it reaches the listener. Developers have long wanted to model this effect, and tend to approximate sound propagation manually (and painstakingly!) using hand-tuned filters and scripts. Steam Audio models these sound propagation effects automatically.
Steam Audio simulates sound propagation in real time, so the effects can change automatically as sources move around the scene. Sounds interact with the actual geometry of the scene, so they feel integrated with the scene.
Baked Reverb & Propagation Just like light probes can accelerate high-quality lighting calculations by precomputing lighting in static scenes, Steam Audio can bake sound propagation and reverb effects in a static scene. For largely static scenes, baking can significantly reduce CPU load while allowing you to improve the quality of sound propagation and reverb effects.
If your geometry is mostly static, you can bake reverb during design. If a sound source is fixed in place, you can bake sound propagation effects during design. For VR experiences where you have only a few listener positions, but multiple moving sources, you can bake sound propagation effects during design too.
Putting It All Together Steam Audio can apply binaural rendering to occlusion, reverb, and sound propagation effects, so you can get a strong sense of space and direction, even from reflected sounds, reverb entering a room through a doorway, and more.
Download the Steam Audio SDK Beta now and try any of these features today. Steam Audio is currently available as a plugin for Unity and as a C API for integration into custom engines and tools.