Details
Original language | English |
---|---|
Pages (from - to) | 4583-4587 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2023-August |
Publication status | Published - 2023 |
Event | 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland. Duration: 20 Aug 2023 → 24 Aug 2023 |
Abstract
Predictive uncertainty estimation of deep neural networks is important when their outputs are used for high-stakes decision making. We investigate token-level uncertainty of connectionist temporal classification (CTC) based automatic speech recognition models. We propose an approach that takes into account that not all changes at the frame level lead to a change at the token level after CTC decoding. The approach shows promising performance for predicting recognition errors on TIMIT, Mozilla Common Voice (MCV), and kidsTALC, a corpus of children's speech, using two different model architectures, while introducing only negligible computational overhead. Our approach identifies over 80% of a wav2vec2.0 model's errors on MCV by selecting 10% of the tokens. We further show that the predictive uncertainty estimate relates to the uncertainty of a human annotator by re-annotating 500 utterances of kidsTALC.
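The observation that a frame-level change does not always change the decoded tokens can be illustrated with a minimal sketch of greedy CTC collapsing. This is not the authors' uncertainty measure, which the abstract does not specify. The blank symbol `_`, the function name `ctc_collapse`, and the toy frame sequence are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch


def ctc_collapse(frames):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != BLANK]


# Toy per-frame best labels (assumed data, not taken from the paper).
frames = ["_", "a", "a", "_", "b", "b", "_"]
baseline = ctc_collapse(frames)

# Changing frame 2 from "a" to blank leaves the token sequence unchanged,
# because frame 1 still emits "a".
no_change = list(frames)
no_change[2] = BLANK

# Changing frame 4 from "b" to "c" inserts a new token into the output.
change = list(frames)
change[4] = "c"

print(baseline, ctc_collapse(no_change), ctc_collapse(change))
```

In this toy case only the second modification alters the decoded sequence, which is the kind of distinction between frame-level and token-level changes that the abstract refers to.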
ASJC Scopus subject areas
- Arts and Humanities (all)
- Language and Linguistics
- Computer Science (all)
- Human-Computer Interaction
- Computer Science (all)
- Signal Processing
- Computer Science (all)
- Software
- Mathematics (all)
- Modeling and Simulation
Citation
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2023-August, 2023, pp. 4583-4587.
Publication: Contribution to journal › Conference article in journal › Research › Peer-reviewed
TY - JOUR
T1 - Uncertainty Estimation for Connectionist Temporal Classification Based Automatic Speech Recognition
AU - Rumberg, Lars
AU - Gebauer, Christopher
AU - Ehlert, Hanna
AU - Wallbaum, Maren
AU - Lüdtke, Ulrike
AU - Ostermann, Jörn
PY - 2023
Y1 - 2023
KW - Automatic Speech Recognition
KW - Children's speech
KW - Uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85171598603&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2023-907
DO - 10.21437/Interspeech.2023-907
M3 - Conference article
AN - SCOPUS:85171598603
VL - 2023-August
SP - 4583
EP - 4587
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 24th International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -