On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Nils Poschadel
  • Roman Kiyan
  • Stephan Preihs
  • Jürgen Peissig

Details

Original language: English
Title of host publication: Proceedings of the International Congress on Acoustics
Publication status: Published - 2022
Event: 24th International Congress on Acoustics, ICA 2022 - Gyeongju, South Korea
Duration: 24 Oct 2022 to 28 Oct 2022

Publication series

Name: Proceedings of the International Congress on Acoustics
Publisher: International Commission for Acoustics (ICA)
ISSN (Print): 2226-7808

Abstract

When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations give insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results might serve as guidance for design choices of preprocessing methods for similar applications.
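
The abstract does not spell out which scaling strategies were compared, so the following is only an illustrative sketch of what "scaling at the level of audio data" and "scaling at the level of spectrograms" could look like. It assumes joint peak normalization of a four-channel first-order Ambisonics (FOA) signal and z-score standardization of magnitude spectrograms along selectable dimensions. All function names, the frame length, the hop size, and the random test signal are hypothetical and are not taken from the paper.

```python
import numpy as np


def peak_normalize_audio(audio, eps=1e-12):
    """Audio-level scaling (assumed variant): divide by the joint peak.

    audio has shape (channels, samples). One common factor is used for all
    channels so that inter-channel level ratios are left unchanged.
    """
    peak = np.max(np.abs(audio))
    return audio / (peak + eps)


def magnitude_spectrogram(audio, frame_len=256, hop=128):
    """Simple Hann-windowed STFT magnitude: (channels, frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (audio.shape[1] - frame_len) // hop
    frames = np.stack(
        [audio[:, i * hop : i * hop + frame_len] for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames * window, axis=-1))


def standardize(features, axes, eps=1e-12):
    """Spectrogram-level scaling (assumed variant): z-score along `axes`."""
    mean = features.mean(axis=axes, keepdims=True)
    std = features.std(axis=axes, keepdims=True)
    return (features - mean) / (std + eps)


# Hypothetical stand-in for a noisy FOA recording: 4 channels of noise.
rng = np.random.default_rng(0)
foa = 0.1 * rng.standard_normal((4, 4096))

# Audio-level scaling, then two different spectrogram-level scalings.
scaled_audio = peak_normalize_audio(foa)
spec = magnitude_spectrogram(scaled_audio)
spec_per_channel = standardize(spec, axes=(1, 2))  # statistics per channel
spec_per_bin = standardize(spec, axes=(1,))  # statistics per channel and bin
```

Applying `peak_normalize_audio` before `standardize` corresponds to one possible form of combined scaling. Changing `axes` changes the dimensions over which the statistics are computed, which is one way to read "scaling across different dimensions of the input data".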

Cite this

On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. / Poschadel, Nils; Kiyan, Roman; Preihs, Stephan et al.
Proceedings of the International Congress on Acoustics. 2022. (Proceedings of the International Congress on Acoustics).

Poschadel, N, Kiyan, R, Preihs, S & Peissig, J 2022, On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. in Proceedings of the International Congress on Acoustics. Proceedings of the International Congress on Acoustics, 24th International Congress on Acoustics, ICA 2022, Gyeongju, South Korea, 24 Oct 2022.
Poschadel, N., Kiyan, R., Preihs, S., & Peissig, J. (2022). On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. In Proceedings of the International Congress on Acoustics (Proceedings of the International Congress on Acoustics).
Poschadel N, Kiyan R, Preihs S, Peissig J. On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. in Proceedings of the International Congress on Acoustics. 2022. (Proceedings of the International Congress on Acoustics).
Poschadel, Nils ; Kiyan, Roman ; Preihs, Stephan et al. / On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. Proceedings of the International Congress on Acoustics. 2022. (Proceedings of the International Congress on Acoustics).
@inproceedings{0f1294b7c03740c0a5474371726b0016,
title = "On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals",
abstract = "When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations give insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results might serve as guidance for design choices of preprocessing methods for similar applications.",
keywords = "DOA, Feature Scaling, FOA, Input Scaling",
author = "Nils Poschadel and Roman Kiyan and Stephan Preihs and J{\"u}rgen Peissig",
note = "Publisher Copyright: {\textcopyright} 2022 Proceedings of the International Congress on Acoustics. All rights reserved.; 24th International Congress on Acoustics, ICA 2022 ; Conference date: 24-10-2022 Through 28-10-2022",
year = "2022",
language = "English",
series = "Proceedings of the International Congress on Acoustics",
publisher = "International Commission for Acoustics (ICA)",
booktitle = "Proceedings of the International Congress on Acoustics",

}

TY - GEN

T1 - On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals

AU - Poschadel, Nils

AU - Kiyan, Roman

AU - Preihs, Stephan

AU - Peissig, Jürgen

N1 - Publisher Copyright: © 2022 Proceedings of the International Congress on Acoustics. All rights reserved.

PY - 2022

Y1 - 2022

N2 - When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations give insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results might serve as guidance for design choices of preprocessing methods for similar applications.

AB - When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations give insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results might serve as guidance for design choices of preprocessing methods for similar applications.

KW - DOA

KW - Feature Scaling

KW - FOA

KW - Input Scaling

UR - http://www.scopus.com/inward/record.url?scp=85192520745&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85192520745

T3 - Proceedings of the International Congress on Acoustics

BT - Proceedings of the International Congress on Acoustics

T2 - 24th International Congress on Acoustics, ICA 2022

Y2 - 24 October 2022 through 28 October 2022

ER -