Details
Original language | English |
---|---|
Title of host publication | Proceedings of the International Congress on Acoustics |
Publication status | Published - 2022 |
Event | 24th International Congress on Acoustics, ICA 2022 - Gyeongju, South Korea. Duration: 24 Oct 2022 → 28 Oct 2022 |
Publication series
Name | Proceedings of the International Congress on Acoustics |
---|---|
Publisher | International Commission for Acoustics (ICA) |
ISSN (Print) | 2226-7808 |
Abstract
When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work, we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations provide insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results may serve as guidance for design choices of preprocessing methods for similar applications.
ASJC Scopus subject areas
- Engineering (all)
- Mechanical Engineering
- Physics and Astronomy (all)
- Acoustics and Ultrasonics
Cite this
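On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals. / Poschadel, Nils; Kiyan, Roman; Preihs, Stephan; Peissig, Jürgen. Proceedings of the International Congress on Acoustics. 2022. (Proceedings of the International Congress on Acoustics).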
Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - On the impact of input scaling strategies for deep learning based DOA estimation from Ambisonics signals
AU - Poschadel, Nils
AU - Kiyan, Roman
AU - Preihs, Stephan
AU - Peissig, Jürgen
N1 - Publisher Copyright: © 2022 Proceedings of the International Congress on Acoustics. All rights reserved.
PY - 2022
Y1 - 2022
N2 - When training deep neural networks, input scaling plays an important role and can contribute decisively to the performance of the resulting models. Here, we focus on the application of estimating the direction of arrival (DOA) from noisy Ambisonics speech signals with convolutional recurrent neural networks. The input features used for training the models are either amplitude and phase spectrograms or spectrograms of features derived from the intensity vector. In this work, we systematically evaluate different input scaling strategies at the level of both audio data and spectrograms, as well as combined scaling. Our investigations provide insight into the dependence of DOA estimation accuracy on various combinations of scaling across different dimensions of the input data. We evaluate both regression and classification models as well as single- and multi-speaker scenarios. Our results may serve as guidance for design choices of preprocessing methods for similar applications.
KW - DOA
KW - Feature Scaling
KW - FOA
KW - Input Scaling
UR - http://www.scopus.com/inward/record.url?scp=85192520745&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85192520745
T3 - Proceedings of the International Congress on Acoustics
BT - Proceedings of the International Congress on Acoustics
T2 - 24th International Congress on Acoustics, ICA 2022
Y2 - 24 October 2022 through 28 October 2022
ER -