趣味人のブログ

Cogito, ergo sum. I think, therefore I am.

SoundObject

1. Overview of SoundObject

SoundObject creates binaural sound with a sense of three-dimensional sound localization from a monaural acoustic source and its positional information. It supports both headphones-based and stereo speakers-based 3D spatial sound. It can also support headphones monitoring of stereo speaker sound.

Conventional three-dimensional binaural sound processors implement sound localization by convolving an acoustic source with a head-related impulse response (HRIR) that represents scattering by the head. Since this convolution consumes a large amount of computational resources, SoundObject assumes that scattering by a head consists of scattering by a rigid sphere and by the pinnae (outer ears), and achieves sound localization with simplified sphere and pinna scattering effect filters.

Moreover, conventional convolution-based binaural sound rarely creates a sense of frontal distance, whereas SoundObject creates it by using reflected waves in a reverberation room.

Furthermore, a moving acoustic source inevitably produces a Doppler effect. Generally, sine waves whose frequencies differ by 0.3% are distinguishable as different sounds. This implies that the Doppler effect becomes recognizable when the acoustic source moves faster than approximately 1 m/s, which is about 0.3% of the acoustic speed, taken here as 345 m/s. Since 1 m/s is by no means a high speed, SoundObject always applies the Doppler effect while an acoustic source is moving.
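The 1 m/s figure can be checked with a few lines of Python. This is only a sketch: it assumes a stationary listener and a source moving straight toward the listener, and the function names `doppler_ratio` and `speed_for_shift` are made up for this example.

```python
def doppler_ratio(radial_speed, sound_speed=345.0):
    """Observed/emitted frequency ratio for a source approaching a
    stationary listener at radial_speed (m/s). Assumed textbook model."""
    return sound_speed / (sound_speed - radial_speed)


def speed_for_shift(relative_shift, sound_speed=345.0):
    """Approach speed (m/s) that raises the frequency by relative_shift
    (0.003 means 0.3%). Obtained by solving doppler_ratio = 1 + shift."""
    return sound_speed * relative_shift / (1.0 + relative_shift)


threshold = speed_for_shift(0.003)
print(f"speed for a 0.3% shift: {threshold:.3f} m/s")
```

With an acoustic speed of 345 m/s the threshold comes out slightly above 1 m/s, in line with the estimate above.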

SoundObject is provided as a VST 3 plug-in for digital audio workstations (DAWs) and supports 44.1 kHz, 48 kHz, and 96 kHz sampling rates. The supported OS environments are 64-bit Windows 10 and macOS 10.14.

The SoundObject binary distribution is licensed under Creative Commons Attribution 4.0 (CC BY 4.0) at no charge.

github.com

The SoundObject source code distribution is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) at no charge.

github.com

2. Block diagram of SoundObject

SoundObject enables senses of three-dimensional directions, distances, and speed by the following:

  • A sense of leftward and rightward direction by interaural time difference (ITD) and interaural level difference (ILD).
  • A sense of leftward and rightward distance by sphere scattering effect filter.
  • Senses of front-back and upward-downward directions by pinna scattering effect filter.
  • Senses of front-back and upward-downward distances by reflected waves.
  • A sense of speed by Doppler effect.

A single-channel block diagram of SoundObject is shown in Fig. 1.

Fig. 1: A single-channel block diagram of SoundObject.

As shown in the figure, a single channel of SoundObject takes a monaural acoustic source as input, then creates a direct wave, reflected waves from six directions, and their combination. Finally, it outputs the result as headphones-based (binaural) or speakers-based (transaural) 3D sound.

Delay line and resampler

SoundObject implements ITD and ILD with a delay line and the Doppler effect with a resampler. The latter is implemented by polyphase decomposition-based upsampling and downsampling. For example, a 48 kHz input signal is first converted into a signal sampled at 43.2 MHz to 52.8 MHz and then output as a 48 kHz signal. Polyphase decomposition-based upsampling and downsampling is a well-known technique, so a detailed explanation is omitted here.
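As a rough illustration of how resampling produces a pitch shift, the following Python sketch reads an input signal at a variable step using linear interpolation. This is a deliberately simplified stand-in, not the polyphase decomposition-based resampler that SoundObject uses; the function name `resample_linear` and the `ratio` parameter are assumptions made for this example.

```python
def resample_linear(signal, ratio):
    """Return `signal` re-read at a step of `ratio` input samples per
    output sample, using linear interpolation between neighbors.

    ratio > 1 shortens the signal (pitch goes up at a fixed playback rate),
    ratio < 1 lengthens it (pitch goes down).
    """
    out = []
    pos = 0.0
    last = len(signal) - 1
    while pos <= last:
        i = int(pos)                    # index of the sample at or before pos
        frac = pos - i                  # fractional distance to the next sample
        nxt = signal[i + 1] if i < last else signal[i]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
        pos += ratio
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
print(resample_linear(ramp, 0.5))   # twice as many points along the ramp
print(resample_linear(ramp, 2.0))   # every other sample
```

Linear interpolation introduces more aliasing and high-frequency loss than a properly filtered polyphase resampler, which is presumably one reason to prefer the latter in an audio plug-in.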

Sphere scattering effect filter

The sphere scattering effect filter takes a wave incident on a rigid sphere as its input and outputs the incident wave together with the wave scattered by the sphere. Details are described in the following document.

suzumushi0.hatenablog.com

Pinna scattering effect filter

The pinna scattering effect filter outputs the convolution of its input with an impulse response that represents scattering by the pinna. The pinna-related impulse responses of SoundObject are derived from open head-related transfer function (HRTF) databases. Details are described in the following document.

suzumushi0.hatenablog.com
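For readers unfamiliar with the operation, a direct-form convolution can be sketched in Python as follows. The impulse response values here are placeholders chosen for illustration, not actual pinna-related impulse responses, and the function name `convolve` is likewise an assumption.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: every input sample launches a scaled copy
    of the impulse response, and the overlapping copies are summed."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Placeholder impulse response: direct sound plus one weaker, delayed copy.
print(convolve([1.0, 2.0, 3.0], [1.0, 0.5]))
```

The cost grows with the product of the two lengths, so a short impulse response keeps the filter cheap.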

Coordinate converter

The coordinate converter calculates, in real time, the delay, decay, and angles (such as θo and θp) of the direct and reflected waves from the position of the acoustic source, the dimensions of the reverberation room, and the position of the sphere.
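One common way to obtain six reflected waves in a rectangular room is to mirror the source across each wall (first-order image sources). Whether SoundObject does exactly this is an assumption on my part; the sketch below only shows the idea, with the listener at the origin, hypothetical wall coordinates `room_min` and `room_max`, and a 1/r decay that corresponds to a point source.

```python
import math


def first_order_images(source, room_min, room_max):
    """Mirror `source` (x, y, z) across each of the six walls of an
    axis-aligned room spanning room_min..room_max on every axis."""
    images = []
    for axis in range(3):
        for wall in (room_min[axis], room_max[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]   # reflect across the wall
            images.append(tuple(image))
    return images


def delay_and_gain(position, sound_speed=345.0):
    """Propagation delay (s) and 1/r amplitude gain to the origin."""
    distance = math.sqrt(sum(c * c for c in position))
    return distance / sound_speed, 1.0 / distance


source = (1.0, 0.0, 0.0)
for image in first_order_images(source, (-2.0, -2.0, -1.0), (3.0, 2.0, 2.0)):
    delay, gain = delay_and_gain(image)
    print(image, f"delay={delay * 1000:.2f} ms", f"gain={gain:.3f}")
```

Each image position also gives the arrival direction of the corresponding reflected wave, from which the angles can be derived.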

To monitor the effects of the digital signal processing units described above, SoundObject can output not only the combined waves but also the direct wave, the reflected waves, the waves scattered by the sphere, or the incident wave with ITD, ILD, and Doppler effect applied. Reflected waves also pass through a low-pass filter, which can change how they are perceived.

3. Operation of SoundObject

Coordinate system

The coordinate system of SoundObject, which is based on the Spatially Oriented Format for Acoustics (SOFA) defined by the AES69-2020 specification [1], is shown in Fig. 2.

Fig. 2: Coordinate system of SoundObject.

As shown in the figure, the center of the sphere is located at the origin, and the x, y, and z axes point in the front, leftward, and upward directions, respectively. In the polar coordinate system, radius r is the distance from the origin to the acoustic source, elevation angle θ in [−90, 90] deg. is measured from the x-y plane, and azimuth angle φ in [0, 360) deg. is measured counterclockwise from the x-axis. The depth, width, and height of the reverberation room are its extents along the x, y, and z axes, respectively. Distances to the center of the sphere are measured from the point shown in the figure.

Note: The definition of elevation angle in AES69-2020 is different from the typical definition.
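The relation between the polar and rectangular coordinates described above can be written out as a short Python sketch. The function names are made up for this example; the formulas follow directly from elevation being measured from the x-y plane and azimuth counterclockwise from the x-axis.

```python
import math


def polar_to_cartesian(r, elevation_deg, azimuth_deg):
    """Convert (r, elevation, azimuth) in degrees to (x, y, z),
    with x = front, y = left, z = up."""
    theta = math.radians(elevation_deg)
    phi = math.radians(azimuth_deg)
    x = r * math.cos(theta) * math.cos(phi)
    y = r * math.cos(theta) * math.sin(phi)
    z = r * math.sin(theta)
    return x, y, z


def cartesian_to_polar(x, y, z):
    """Convert (x, y, z) to (r, elevation, azimuth) in degrees;
    negative azimuths are wrapped by adding 360."""
    r = math.sqrt(x * x + y * y + z * z)
    elevation = math.degrees(math.asin(z / r)) if r > 0.0 else 0.0
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    return r, elevation, azimuth


print(polar_to_cartesian(1.0, 0.0, 90.0))    # a source directly to the left
print(cartesian_to_polar(0.0, -1.0, 0.0))    # a source directly to the right
```

A source directly to the right therefore has an azimuth of 270 deg. rather than −90 deg.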

User interface

User interface of SoundObject is shown in Fig. 3.

Fig. 3: User interface of SoundObject.

The position of the acoustic source can be set through any of the rectangular coordinates, the polar coordinates, the x-y pad illustrating the top view, or the y-z pad illustrating the rear view. The position is constrained to the interior of the reverberation room, which is shown by the rectangles.

The distance attenuation slider sets the distance decay of direct and reflected waves. −6 dB means that the level is halved each time the distance to the acoustic source is doubled (i.e., a point sound source; compatible with SoundObject ver. 2 and earlier), and 0 dB means no attenuation (i.e., a plane sound source).
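Read as "decibels per doubling of distance", the slider value can be turned into a gain as in the following Python sketch. The reference distance of 1 m and the function names are assumptions for illustration; the actual reference used by SoundObject is not stated here.

```python
import math


def distance_gain_db(distance, attenuation_db, reference=1.0):
    """Gain in dB at `distance`, given an attenuation in dB per doubling
    of distance relative to `reference` (assumed to be 1 m)."""
    return attenuation_db * math.log2(distance / reference)


def db_to_amplitude(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


for slider in (-6.0, -3.0, 0.0):
    gains = [distance_gain_db(d, slider) for d in (1.0, 2.0, 4.0)]
    print(f"slider {slider:+.0f} dB -> gains at 1, 2, 4 m: {gains}")
```

At −6 dB the amplitude factor is roughly 0.5 per doubling, which matches the 1/r law of a point source.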

The reflectance slider sets the reflectance of the reverberation room. Typically, it is set to less than 0 dB. In a large reverberation room, however, it may be set above 0 dB, because the level of reflected waves decreases due to distance attenuation.

The LPF cutoff frequency slider sets the cutoff frequency of the LPF for reflected waves. When it is set to ∞ kHz, the LPF is disabled.

The HRIR option menu selects the HRIR database used by the pinna scattering effect filter. The output option menu selects the monitoring outputs described in section 2, and also selects headphones-based (binaural) or speakers-based (transaural) 3D sound. As shown in Fig. 4, the azimuth angle to the left speaker is measured from the x-axis.

Fig. 4: Azimuth angle to the left speaker.

The dimensions of the reverberation room and the center of the sphere are explained under the coordinate system above. Together with these, the settings for acoustic speed and sphere radius are applied only while the DAW is not playing; that is, they are not updated in real time.

Headphones monitoring of stereo speaker sound

Refer to the following.

suzumushi0.hatenablog.com

References

[1] Audio Engineering Society, "AES standard for file exchange - Spatial acoustic data file format," AES69-2020, December 2020.

VST is a registered trademark of Steinberg Media Technologies GmbH.
