Details
Original language | English |
---|---|
Pages (from - to) | 44-56 |
Number of pages | 13 |
Journal | Speech communication |
Volume | 106 |
Early online date | 26 Nov 2018 |
Publication status | Published - Jan 2019 |
Abstract
In several applications of machine listening, predicting how well an automatic speech recognition system will perform before the actual decoding enables the system to adapt to unseen acoustic characteristics dynamically. Feedback about speech quality, for instance, could allow modern hearing aids to select a speech source in complex acoustic scenes with the aim of enhancing the speech intelligibility of a target speaker. In this study, we look at different performance measures to estimate the word error rates of simulated behind-the-ear hearing aid signals and detect the azimuth angle of the target source in 180-degree spatial scenes. These measures derive from phoneme posterior probabilities produced by a deep neural network acoustic model. However, the more complex the model is, the more computationally expensive it becomes to obtain these measures; therefore, we assess how the model size affects prediction performance. Our findings suggest measures derived from smaller nets are suitable to predict error rates of more complex models reliably enough to be implemented in hearing aid hardware.
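To make the idea concrete: performance measures of this kind score how confident the acoustic model is over time, which only requires the DNN forward pass and not a full decoding run. The Python/NumPy sketch below shows one such measure, the mean per-frame entropy of the phoneme posteriorgram; the function name, the (frames × phonemes) shape, and the toy data are illustrative assumptions, not the specific measures defined in the paper.

```python
import numpy as np

def mean_posterior_entropy(posteriors, eps=1e-12):
    """Average per-frame entropy of a phoneme posteriorgram.

    posteriors: array of shape (num_frames, num_phonemes), each row a
    probability distribution produced by a DNN acoustic model.
    Peaked (low-entropy) posteriors typically go along with lower word
    error rates; high entropy signals uncertain frames.
    """
    p = np.clip(posteriors, eps, 1.0)                # avoid log(0)
    frame_entropy = -np.sum(p * np.log(p), axis=1)   # entropy per frame
    return float(np.mean(frame_entropy))             # utterance-level score

# Toy usage with hypothetical sizes: 100 frames over a 40-phoneme inventory.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 40))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(mean_posterior_entropy(posteriors))
```

Because such a statistic is computed before decoding, it stays cheap even when the acoustic model whose error rate it predicts is large, which is what makes the hearing-aid hardware scenario described in the abstract plausible.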
ASJC Scopus subject areas
- Computer Science (all)
  - Software
- Mathematics (all)
  - Modelling and Simulation
- Social Sciences (all)
  - Communication
- Arts and Humanities (all)
  - Language and Linguistics
- Social Sciences (all)
  - Linguistics and Language
- Computer Science (all)
  - Computer Vision and Pattern Recognition
- Computer Science (all)
  - Computer Science Applications
Cite
DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters. / Castro Martinez, A.M.; Gerlach, Lukas; Payá-Vayá, Guillermo; Hermansky, Hynek; Ooster, Jasper; Meyer, Bernd T.
In: Speech communication, Vol. 106, 01.2019, p. 44-56.
Publication: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters
AU - Castro Martinez, A.M.
AU - Gerlach, Lukas
AU - Payá-Vayá, Guillermo
AU - Hermansky, Hynek
AU - Ooster, Jasper
AU - Meyer, Bernd T.
N1 - Funding Information: This work was funded by the DFG (Cluster of Excellence 1077/1 Hearing4All (http://hearing4all.eu)).
PY - 2019/1
Y1 - 2019/1
N2 - In several applications of machine listening, predicting how well an automatic speech recognition system will perform before the actual decoding enables the system to adapt to unseen acoustic characteristics dynamically. Feedback about speech quality, for instance, could allow modern hearing aids to select a speech source in complex acoustic scenes with the aim of enhancing the speech intelligibility of a target speaker. In this study, we look at different performance measures to estimate the word error rates of simulated behind-the-ear hearing aid signals and detect the azimuth angle of the target source in 180-degree spatial scenes. These measures derive from phoneme posterior probabilities produced by a deep neural network acoustic model. However, the more complex the model is, the more computationally expensive it becomes to obtain these measures; therefore, we assess how the model size affects prediction performance. Our findings suggest measures derived from smaller nets are suitable to predict error rates of more complex models reliably enough to be implemented in hearing aid hardware.
AB - In several applications of machine listening, predicting how well an automatic speech recognition system will perform before the actual decoding enables the system to adapt to unseen acoustic characteristics dynamically. Feedback about speech quality, for instance, could allow modern hearing aids to select a speech source in complex acoustic scenes with the aim of enhancing the speech intelligibility of a target speaker. In this study, we look at different performance measures to estimate the word error rates of simulated behind-the-ear hearing aid signals and detect the azimuth angle of the target source in 180-degree spatial scenes. These measures derive from phoneme posterior probabilities produced by a deep neural network acoustic model. However, the more complex the model is, the more computationally expensive it becomes to obtain these measures; therefore, we assess how the model size affects prediction performance. Our findings suggest measures derived from smaller nets are suitable to predict error rates of more complex models reliably enough to be implemented in hearing aid hardware.
KW - Automatic speech recognition
KW - Hearing aids
KW - Performance monitoring
KW - Spatial filtering
UR - http://www.scopus.com/inward/record.url?scp=85057441627&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2018.11.006
DO - 10.1016/j.specom.2018.11.006
M3 - Article
VL - 106
SP - 44
EP - 56
JO - Speech communication
JF - Speech communication
SN - 0167-6393
ER -