Details
Original language | English |
---|---|
Title of host publication | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
Pages | 1913-1917 |
Number of pages | 5 |
ISBN (electronic) | 9781713836902 |
Publication status | Published - 2021 |
Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic Duration: 30 Aug 2021 → 3 Sept 2021 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
Volume | 3 |
ISSN (Print) | 2308-457X |
ISSN (electronic) | 1990-9772 |
Abstract
Automatic speech recognition for children’s speech is challenging, mainly because publicly available child speech corpora are scarce and because children’s speech shows wide inter- and intra-speaker variability in its acoustic and linguistic characteristics. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. In addition to distinguishing between the child and adult domains, we use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this makes better use of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods; both should be used together for best performance.
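The general idea of adversarial multi-task learning mentioned in the abstract can be illustrated with a toy gradient-reversal update. This is only a minimal sketch under stated assumptions: the scalar "encoder", the two linear heads, the squared-error losses, and the names `forward`, `gradients`, `sgd_step`, and `lam` are all hypothetical stand-ins chosen for illustration, not the architecture, losses, or hyperparameters used in the paper.

```python
# Toy illustration of adversarial multi-task training with gradient reversal.
# Everything here (names, scalar weights, squared-error losses, lam) is a
# hypothetical stand-in, not the model or training setup from the paper.

def forward(w_enc, w_task, w_age, x):
    """Shared encoder feeds a main-task head and an age (adversary) head."""
    h = w_enc * x  # shared feature
    return h, w_task * h, w_age * h


def gradients(w_enc, w_task, w_age, x, y_task, y_age, lam):
    """Return the gradients used to update (encoder, task head, age head)."""
    h, task_pred, age_pred = forward(w_enc, w_task, w_age, x)
    d_task = 2.0 * (task_pred - y_task)  # d(task loss) / d(task_pred)
    d_age = 2.0 * (age_pred - y_age)     # d(age loss) / d(age_pred)

    # Both heads follow ordinary gradient descent on their own loss.
    g_task_head = d_task * h
    g_age_head = d_age * h

    g_enc_from_task = d_task * w_task * x
    g_enc_from_age = d_age * w_age * x
    # Gradient reversal: the encoder receives the age gradient with its sign
    # flipped and scaled by lam, so it moves to make age harder to predict.
    g_enc = g_enc_from_task - lam * g_enc_from_age
    return g_enc, g_task_head, g_age_head


def sgd_step(params, x, y_task, y_age, lam=0.5, lr=0.01):
    """Apply one plain SGD step to all three parameters."""
    w_enc, w_task, w_age = params
    g_enc, g_task, g_age = gradients(w_enc, w_task, w_age, x, y_task, y_age, lam)
    return (w_enc - lr * g_enc, w_task - lr * g_task, w_age - lr * g_age)
```

With `lam = 0` the encoder is trained on the main task alone; increasing `lam` strengthens the pressure on the shared feature to carry less age information while the age head keeps trying to recover it.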
ASJC Scopus subject areas
- Arts and Humanities (all)
- Language and Linguistics
- Computer Science (all)
- Human-Computer Interaction
- Computer Science (all)
- Signal Processing
- Computer Science (all)
- Software
- Mathematics (all)
- Modeling and Simulation
Cite this
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. pp. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning
AU - Rumberg, Lars
AU - Ehlert, Hanna
AU - Lüdtke, Ulrike
AU - Ostermann, Jörn
N1 - Publisher Copyright: Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
AB - Automatic speech recognition for children’s speech is challenging, mainly because publicly available child speech corpora are scarce and because children’s speech shows wide inter- and intra-speaker variability in its acoustic and linguistic characteristics. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. In addition to distinguishing between the child and adult domains, we use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this makes better use of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods; both should be used together for best performance.
KW - Child speech
KW - Domain adaptation
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85119174913&partnerID=8YFLogxK
U2 - 10.21437/interspeech.2021-1241
DO - 10.21437/interspeech.2021-1241
M3 - Conference contribution
AN - SCOPUS:85119174913
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1913
EP - 1917
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -