Details
Original language | English |
---|---|
Pages (from-to) | 4583-4587 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2023-August |
Publication status | Published - 2023 |
Event | 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland. Duration: 20 Aug 2023 → 24 Aug 2023 |
Abstract
Predictive uncertainty estimation for deep neural networks is important when their outputs are used for high-stakes decision making. We investigate token-level uncertainty of connectionist temporal classification (CTC) based automatic speech recognition models. We propose an approach that takes into account that not all changes at the frame level lead to a change at the token level after CTC decoding. The approach shows promising performance in predicting recognition errors on TIMIT, Mozilla Common Voice (MCV), and kidsTALC, a corpus of children's speech, using two different model architectures, while introducing only negligible computational overhead. Our approach identifies over 80% of a wav2vec2.0 model's errors on MCV by selecting 10% of the tokens. We further show that the predictive uncertainty estimate relates to the uncertainty of a human annotator by re-annotating 500 utterances of kidsTALC.
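The abstract does not describe the method in detail, so the following is only an illustrative sketch, not the authors' implementation. It shows two ideas the abstract mentions: that relabelling a single frame does not always change the token sequence after greedy CTC decoding, and how the share of errors found within a selection budget could be computed. The blank symbol `_`, the function names, and all toy values are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen for this example only


def ctc_collapse(frames):
    """Greedy CTC decoding: merge repeated frame labels, then drop blanks."""
    return [label for label, _ in groupby(frames) if label != BLANK]


def changes_tokens(frames, index, new_label):
    """Return True if relabelling one frame alters the collapsed tokens."""
    altered = list(frames)
    altered[index] = new_label
    return ctc_collapse(altered) != ctc_collapse(frames)


def error_recall_at_budget(uncertainties, is_error, budget):
    """Share of all errors found among the `budget` fraction of tokens
    with the highest uncertainty scores."""
    n_select = int(round(budget * len(uncertainties)))
    ranked = sorted(range(len(uncertainties)),
                    key=lambda i: uncertainties[i], reverse=True)
    total_errors = sum(is_error)
    if total_errors == 0:
        return 0.0
    return sum(is_error[i] for i in ranked[:n_select]) / total_errors


# Toy frame-level labels (made up for illustration).
frames = ["_", "c", "c", "_", "a", "a", "t", "_"]
print(ctc_collapse(frames))            # ['c', 'a', 't']
print(changes_tokens(frames, 3, "c"))  # False: the blank is absorbed into 'c'
print(changes_tokens(frames, 6, "_"))  # True: the token 't' disappears

# Toy per-token uncertainty scores and error flags (made up for illustration).
toy_uncertainty = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3, 0.1, 0.2, 0.15, 0.05]
toy_is_error = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(error_recall_at_budget(toy_uncertainty, toy_is_error, 0.2))  # 1.0
```

In this toy setting, inspecting the 20% most uncertain tokens recovers both errors; the figures reported in the abstract come from the paper's own experiments and are not reproduced here.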
Keywords
- Automatic Speech Recognition, Children's speech, Uncertainty
ASJC Scopus subject areas
- Arts and Humanities (all): Language and Linguistics
- Computer Science (all): Human-Computer Interaction
- Computer Science (all): Signal Processing
- Computer Science (all): Software
- Mathematics (all): Modelling and Simulation
Cite this
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2023-August, 2023, p. 4583-4587.
Research output: Contribution to journal › Conference article › Research › peer review
TY - JOUR
T1 - Uncertainty Estimation for Connectionist Temporal Classification Based Automatic Speech Recognition
AU - Rumberg, Lars
AU - Gebauer, Christopher
AU - Ehlert, Hanna
AU - Wallbaum, Maren
AU - Lüdtke, Ulrike
AU - Ostermann, Jörn
PY - 2023
Y1 - 2023
AB - Predictive uncertainty estimation of deep neural networks is important when their outputs are used for high stakes decision making. We investigate token-level uncertainty of connectionist temporal classification (CTC) based automatic speech recognition models. We propose an approach, which considers that not all changes at frame-level lead to a change at token-level after CTC decoding. The approach shows promising performance for prediction of recognition errors on TIMIT, Mozilla Common Voice (MCV) and kidsTALC, a corpus of children's speech, using two different model architectures, while introducing only negligible computational overhead. Our approach identifies over 80 % of a wav2vec2.0 model's errors on MCV by selecting 10 % of the tokens. We further show, that the predictive uncertainty estimate relates to the uncertainty of a human annotator, by re-annotating 500 utterances of kidsTALC.
KW - Automatic Speech Recognition
KW - Children's speech
KW - Uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85171598603&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2023-907
DO - 10.21437/Interspeech.2023-907
M3 - Conference article
AN - SCOPUS:85171598603
VL - 2023-August
SP - 4583
EP - 4587
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 24th International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -