Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

Lars Rumberg, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann


Details

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Pages: 1913-1917
Number of pages: 5
ISBN (electronic): 9781713836902
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sept 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 3
ISSN (Print): 2308-457X
ISSN (electronic): 1990-9772

Abstract

Automatic speech recognition for children’s speech is a challenging task, mainly due to the scarcity of publicly available child speech corpora and the wide inter- and intra-speaker variability in the acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. Beyond merely differentiating between the child and adult domains, we use age information, thus forcing the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods; rather, both should be used together for best performance.
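The abstract describes the training scheme only at a high level. As a rough illustration of the general idea behind adversarial multi-task learning, and not the authors' implementation, the sketch below wires a shared encoder to a recognition head and an age-group classifier through a gradient reversal layer, a common way to implement the adversarial branch. Everything here is an assumption for illustration: the linear layers, the single-token cross-entropy head standing in for the real end-to-end recognition loss, the four hypothetical age groups, and the names `amtl_step`, `grad_reverse` and `lam`.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matTvec(M: Matrix, v: Vector) -> Vector:
    """Transposed product M^T v, used to backpropagate through a linear layer."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product u v^T, the weight gradient of a linear layer."""
    return [[a * b for b in v] for a in u]


def ce_and_grad(logits: Vector, target: int) -> Tuple[float, Vector]:
    """Softmax cross-entropy and its gradient w.r.t. the logits (p - one_hot)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    p = [e / total for e in exps]
    grad = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    return -math.log(p[target]), grad


def grad_reverse(grad: Vector, lam: float) -> Vector:
    """Backward pass of a gradient reversal layer (its forward pass is the identity)."""
    return [-lam * g for g in grad]


def amtl_step(W: Matrix, A: Matrix, V: Matrix, x: Vector,
              token: int, age_group: int, lam: float):
    """Forward/backward pass of a toy adversarial multi-task model.

    W: shared encoder, A: recognition head (hypothetical stand-in for the
    end-to-end ASR loss), V: age-group classifier. All layers are linear.
    Returns (loss_asr, loss_age, grad_W, grad_A, grad_V).
    """
    h = matvec(W, x)  # shared features that should become age-invariant
    loss_asr, d_asr = ce_and_grad(matvec(A, h), token)
    loss_age, d_age = ce_and_grad(matvec(V, h), age_group)

    grad_A = outer(d_asr, h)  # recognition head: plain descent on L_asr
    grad_V = outer(d_age, h)  # age classifier: plain descent on L_age (not reversed)

    # The encoder sees the recognition gradient plus the reversed age gradient,
    # i.e. it descends on L_asr - lam * L_age and so works against the age classifier.
    dh_asr = matTvec(A, d_asr)
    dh_age = grad_reverse(matTvec(V, d_age), lam)
    grad_W = outer([a + b for a, b in zip(dh_asr, dh_age)], x)
    return loss_asr, loss_age, grad_W, grad_A, grad_V


if __name__ == "__main__":
    # Hypothetical toy sizes: 3-dim input, 2-dim shared features,
    # 3 output tokens, 4 age groups (e.g. adults plus three child age bands).
    W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
    A = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.1]]
    V = [[0.3, -0.2], [0.1, 0.1], [-0.4, 0.2], [0.2, 0.3]]
    x = [1.0, -0.5, 0.25]
    lr = 0.1
    for _ in range(3):
        l_asr, l_age, gW, gA, gV = amtl_step(W, A, V, x, token=1, age_group=2, lam=0.5)
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        A = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(A, gA)]
        V = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(V, gV)]
        print(f"L_asr={l_asr:.4f}  L_age={l_age:.4f}")
```

Running the file performs three toy gradient-descent updates and prints both losses; the values depend entirely on the made-up weights. Setting `lam = 0` reduces the encoder update to ordinary recognition-only training, and shrinking the age head to two classes gives the plain child/adult domain-adversarial setup that the abstract contrasts with using finer age information.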


Cite this

Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. / Rumberg, Lars; Ehlert, Hanna; Lüdtke, Ulrike et al.
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. pp. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3).


Rumberg, L, Ehlert, H, Lüdtke, U & Ostermann, J 2021, Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 3, pp. 1913-1917, 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, Czech Republic, 30 Aug 2021. https://doi.org/10.21437/interspeech.2021-1241
Rumberg, L., Ehlert, H., Lüdtke, U., & Ostermann, J. (2021). Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 1913-1917). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3). https://doi.org/10.21437/interspeech.2021-1241
Rumberg L, Ehlert H, Lüdtke U, Ostermann J. Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/interspeech.2021-1241
Rumberg, Lars ; Ehlert, Hanna ; Lüdtke, Ulrike et al. / Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. pp. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).
@inproceedings{78446b345f734c27999fe407bd76232a,
title = "Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning",
abstract = "Automatic speech recognition for children{\textquoteright}s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children{\textquoteright}s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.",
keywords = "Child speech, Domain adaptation, Speech recognition",
author = "Lars Rumberg and Hanna Ehlert and Ulrike L{\"u}dtke and J{\"o}rn Ostermann",
note = "Publisher Copyright: Copyright {\textcopyright} 2021 ISCA.; 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 ; Conference date: 30-08-2021 Through 03-09-2021",
year = "2021",
doi = "10.21437/interspeech.2021-1241",
language = "English",
series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
pages = "1913--1917",
booktitle = "22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021",

}


TY - GEN

T1 - Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

AU - Rumberg, Lars

AU - Ehlert, Hanna

AU - Lüdtke, Ulrike

AU - Ostermann, Jörn

N1 - Publisher Copyright: Copyright © 2021 ISCA.

PY - 2021

Y1 - 2021

N2 - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.

AB - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.

KW - Child speech

KW - Domain adaptation

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85119174913&partnerID=8YFLogxK

U2 - 10.21437/interspeech.2021-1241

DO - 10.21437/interspeech.2021-1241

M3 - Conference contribution

AN - SCOPUS:85119174913

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 1913

EP - 1917

BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

Y2 - 30 August 2021 through 3 September 2021

ER -
