An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment

Shivam Saini; Isaac Engel; Jürgen Peissig

doi:10.1186/s13636-024-00338-6

Details

Original language	English
Article number	16
Number of pages	24
Journal	Eurasip Journal on Audio, Speech, and Music Processing
Volume	2024
Publication status	Published - 27 Mar 2024

Abstract

Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time (RT₆₀) is a predominant metric for the characterization of room acoustics and numerous approaches have been proposed to estimate it blindly from a reverberant speech signal. However, a single RT₆₀ value may not be sufficient to correctly describe and render the acoustics of a room. This contribution presents a method for the estimation of multiple room acoustic parameters required to render close-to-accurate room acoustics in an unknown environment. It is shown how these parameters can be estimated blindly using an audio transformer that can be deployed on a mobile device. Furthermore, the paper also discusses the use of the estimated room acoustic parameters to find a similar room from a dataset of real BRIRs that can be further used for rendering the virtual audio source. Additionally, a novel binaural room impulse response (BRIR) augmentation technique to overcome the limitation of inadequate data is proposed. Finally, the proposed method is validated perceptually by means of a listening test.

Keywords

Binaural rendering, BRIR augmentation, Reverberation time estimation

ASJC Scopus subject areas

Physics and Astronomy(all)
Acoustics and Ultrasonics
Engineering(all)
Electrical and Electronic Engineering

Cite this

An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment. / Saini, Shivam; Engel, Isaac; Peissig, Jürgen.
In: Eurasip Journal on Audio, Speech, and Music Processing, Vol. 2024, 16, 27.03.2024.

Research output: Contribution to journal › Article › Research › peer review

Saini, S, Engel, I & Peissig, J 2024, 'An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment', Eurasip Journal on Audio, Speech, and Music Processing, vol. 2024, 16. https://doi.org/10.1186/s13636-024-00338-6

Saini, S., Engel, I., & Peissig, J. (2024). An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment. Eurasip Journal on Audio, Speech, and Music Processing, 2024, Article 16. https://doi.org/10.1186/s13636-024-00338-6

Saini S, Engel I, Peissig J. An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment. Eurasip Journal on Audio, Speech, and Music Processing. 2024 Mar 27;2024:16. doi: 10.1186/s13636-024-00338-6

Saini, Shivam ; Engel, Isaac ; Peissig, Jürgen. / An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment. In: Eurasip Journal on Audio, Speech, and Music Processing. 2024 ; Vol. 2024.

@article{db95d0a4479c48a984dcd608265d9f0b,
  title = "An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment",
  abstract = "Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time (RT60) is a predominant metric for the characterization of room acoustics and numerous approaches have been proposed to estimate it blindly from a reverberant speech signal. However, a single RT60 value may not be sufficient to correctly describe and render the acoustics of a room. This contribution presents a method for the estimation of multiple room acoustic parameters required to render close-to-accurate room acoustics in an unknown environment. It is shown how these parameters can be estimated blindly using an audio transformer that can be deployed on a mobile device. Furthermore, the paper also discusses the use of the estimated room acoustic parameters to find a similar room from a dataset of real BRIRs that can be further used for rendering the virtual audio source. Additionally, a novel binaural room impulse response (BRIR) augmentation technique to overcome the limitation of inadequate data is proposed. Finally, the proposed method is validated perceptually by means of a listening test.",
  keywords = "Binaural rendering, BRIR augmentation, Reverberation time estimation",
  author = "Shivam Saini and Isaac Engel and J{\"u}rgen Peissig",
  note = "Funding Information: This work is part of the PhD research of Shivam Saini at Leibniz Universit{\"a}t Hannover commissioned by Huawei Technologies D{\"u}sseldorf GmbH.",
  year = "2024",
  month = mar,
  day = "27",
  doi = "10.1186/s13636-024-00338-6",
  language = "English",
  volume = "2024",
  journal = "Eurasip Journal on Audio, Speech, and Music Processing",
  issn = "1687-4714",
  publisher = "Springer Publishing Company",
}

TY - JOUR
T1 - An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment
AU - Saini, Shivam
AU - Engel, Isaac
AU - Peissig, Jürgen
N1 - Funding Information: This work is part of the PhD research of Shivam Saini at Leibniz Universität Hannover commissioned by Huawei Technologies Düsseldorf GmbH.
PY - 2024/3/27
Y1 - 2024/3/27
N2 - Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time (RT60) is a predominant metric for the characterization of room acoustics and numerous approaches have been proposed to estimate it blindly from a reverberant speech signal. However, a single RT60 value may not be sufficient to correctly describe and render the acoustics of a room. This contribution presents a method for the estimation of multiple room acoustic parameters required to render close-to-accurate room acoustics in an unknown environment. It is shown how these parameters can be estimated blindly using an audio transformer that can be deployed on a mobile device. Furthermore, the paper also discusses the use of the estimated room acoustic parameters to find a similar room from a dataset of real BRIRs that can be further used for rendering the virtual audio source. Additionally, a novel binaural room impulse response (BRIR) augmentation technique to overcome the limitation of inadequate data is proposed. Finally, the proposed method is validated perceptually by means of a listening test.
AB - Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time (RT60) is a predominant metric for the characterization of room acoustics and numerous approaches have been proposed to estimate it blindly from a reverberant speech signal. However, a single RT60 value may not be sufficient to correctly describe and render the acoustics of a room. This contribution presents a method for the estimation of multiple room acoustic parameters required to render close-to-accurate room acoustics in an unknown environment. It is shown how these parameters can be estimated blindly using an audio transformer that can be deployed on a mobile device. Furthermore, the paper also discusses the use of the estimated room acoustic parameters to find a similar room from a dataset of real BRIRs that can be further used for rendering the virtual audio source. Additionally, a novel binaural room impulse response (BRIR) augmentation technique to overcome the limitation of inadequate data is proposed. Finally, the proposed method is validated perceptually by means of a listening test.
KW - Binaural rendering
KW - BRIR augmentation
KW - Reverberation time estimation
UR - http://www.scopus.com/inward/record.url?scp=85189182996&partnerID=8YFLogxK
U2 - 10.1186/s13636-024-00338-6
DO - 10.1186/s13636-024-00338-6
M3 - Article
AN - SCOPUS:85189182996
VL - 2024
JO - Eurasip Journal on Audio, Speech, and Music Processing
JF - Eurasip Journal on Audio, Speech, and Music Processing
SN - 1687-4714
M1 - 16
ER -

Research@Leibniz University
