Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Abhilash Nandy
  • Soumya Sharma
  • Shubham Maddhashiya
  • Kapil Sachdeva
  • Pawan Goyal
  • Niloy Ganguly

External Research Organisations

  • Indian Institute of Technology Kharagpur (IITKGP)
  • Samsung R&D Institute India-Delhi (SRI-Delhi)

Details

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publication: EMNLP 2021
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
Pages: 4600-4609
Number of pages: 10
ISBN (electronic): 9781955917100
Publication status: Published - 2021
Event: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: 7 Nov 2021 - 11 Nov 2021

Publication series

Name: Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Abstract

Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
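
The abstract describes EMQAP as a shared RoBERTa encoder trained with a supervised multi-task objective over two tasks: identifying the relevant E-manual section and extracting the answer span within it. The following is a minimal, hypothetical sketch of such a multi-task setup in PyTorch with the HuggingFace transformers library; the class name, the single-linear-layer heads, and the loss weighting are illustrative assumptions and are not taken from the authors' implementation (see the linked repository for the actual EMQAP code).

# Hypothetical sketch of a RoBERTa-based multi-task QA model in the spirit of
# EMQAP: one shared encoder, a section-relevance head, and an answer-span head
# trained jointly. Names and loss weights are illustrative, not the authors'.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast

class MultiTaskEManualQA(nn.Module):
    def __init__(self, model_name="roberta-base", span_weight=1.0, section_weight=1.0):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Task 1: does this (question, section) pair contain the answer?
        self.section_head = nn.Linear(hidden, 2)
        # Task 2: start/end logits for the answer span within the section.
        self.span_head = nn.Linear(hidden, 2)
        self.span_weight = span_weight
        self.section_weight = section_weight

    def forward(self, input_ids, attention_mask,
                section_label=None, start_positions=None, end_positions=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        seq = out.last_hidden_state                    # (B, T, H)
        section_logits = self.section_head(seq[:, 0])  # <s> token -> (B, 2)
        start_logits, end_logits = self.span_head(seq).split(1, dim=-1)
        start_logits, end_logits = start_logits.squeeze(-1), end_logits.squeeze(-1)

        loss = None
        if section_label is not None and start_positions is not None:
            ce = nn.CrossEntropyLoss()
            span_loss = (ce(start_logits, start_positions) + ce(end_logits, end_positions)) / 2
            section_loss = ce(section_logits, section_label)
            loss = self.span_weight * span_loss + self.section_weight * section_loss
        return {"loss": loss, "section_logits": section_logits,
                "start_logits": start_logits, "end_logits": end_logits}

# Minimal usage: encode a question/section pair and run a forward pass.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = MultiTaskEManualQA()
enc = tokenizer("How do I reset the device?",
                "To reset, hold the power button for ten seconds.",
                return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    preds = model(**enc)
print(preds["start_logits"].shape)  # (1, sequence_length)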

Cite this

Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. / Nandy, Abhilash; Sharma, Soumya; Maddhashiya, Shubham et al.
Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. ed. / Marie-Francine Moens; Xuanjing Huang; Lucia Specia; Scott Wen-Tau Yih. 2021. p. 4600-4609 (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Nandy, A, Sharma, S, Maddhashiya, S, Sachdeva, K, Goyal, P & Ganguly, N 2021, Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. in M-F Moens, X Huang, L Specia & SW-T Yih (eds), Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp. 4600-4609, 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, Punta Cana, Dominican Republic, 7 Nov 2021.
Nandy, A., Sharma, S., Maddhashiya, S., Sachdeva, K., Goyal, P., & Ganguly, N. (2021). Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In M.-F. Moens, X. Huang, L. Specia, & S. W.-T. Yih (Eds.), Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 4600-4609). (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
Nandy A, Sharma S, Maddhashiya S, Sachdeva K, Goyal P, Ganguly N. Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In Moens MF, Huang X, Specia L, Yih SWT, editors, Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. 2021. p. 4600-4609. (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
Nandy, Abhilash ; Sharma, Soumya ; Maddhashiya, Shubham et al. / Question Answering over Electronic Devices : A New Benchmark Dataset and a Multi-Task Learning based QA Framework. Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. editor / Marie-Francine Moens ; Xuanjing Huang ; Lucia Specia ; Scott Wen-Tau Yih. 2021. pp. 4600-4609 (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
BibTeX
@inproceedings{d59698afad224fd284cfeae65706ac7d,
title = "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework",
abstract = "Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper we meticulously create a large amount of data connected with E-manuals and develop suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets which include question answer pairs curated by experts based upon two E-manuals, real user questions from Community Question Answering Forum pertaining to E-manuals etc. We introduce EMQAP (E-Manual Question Answering Pipeline) that answers questions pertaining to electronics devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/ EMNLP-2021-Findings, and the corresponding project website is https://sites. google.com/view/emanualqa/home.",
author = "Abhilash Nandy and Soumya Sharma and Shubham Maddhashiya and Kapil Sachdeva and Pawan Goyal and Niloy Ganguly",
note = "Funding Information: We would like to thank the annotators who made the curation of the datasets possible. Also, special thanks to Manav Kapadnis, an Undergraduate Student of Indian Institute of Technology Kharagpur, for his contribution towards the implementation of certain baselines. This work is supported in part by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003). This work is also supported in part by Confederation of Indian Industry (CII) and the Science & Engineering Research Board Department of Science & Technology Government of India (SERB) through the Prime Minister's Research Fellowship scheme. Finally, we acknowledge the funding received from Samsung Research Institute, Delhi for the work. ; 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 ; Conference date: 07-11-2021 Through 11-11-2021",
year = "2021",
language = "English",
series = "Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021",
pages = "4600--4609",
editor = "Marie-Francine Moens and Xuanjing Huang and Lucia Specia and Yih, {Scott Wen-Tau}",
booktitle = "Findings of the Association for Computational Linguistics, Findings of ACL",

}

RIS

TY - GEN

T1 - Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

T2 - 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

AU - Nandy, Abhilash

AU - Sharma, Soumya

AU - Maddhashiya, Shubham

AU - Sachdeva, Kapil

AU - Goyal, Pawan

AU - Ganguly, Niloy

N1 - Funding Information: We would like to thank the annotators who made the curation of the datasets possible. Also, special thanks to Manav Kapadnis, an undergraduate student at the Indian Institute of Technology Kharagpur, for his contribution towards the implementation of certain baselines. This work is supported in part by the Federal Ministry of Education and Research (BMBF), Germany, under the project LeibnizKILabor (grant no. 01DD20003). This work is also supported in part by the Confederation of Indian Industry (CII) and the Science & Engineering Research Board (SERB), Department of Science & Technology, Government of India, through the Prime Minister's Research Fellowship scheme. Finally, we acknowledge the funding received from Samsung Research Institute, Delhi, for this work.

PY - 2021

Y1 - 2021

N2 - Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.

AB - Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.

UR - http://www.scopus.com/inward/record.url?scp=85129184921&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85129184921

T3 - Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

SP - 4600

EP - 4609

BT - Findings of the Association for Computational Linguistics, Findings of ACL

A2 - Moens, Marie-Francine

A2 - Huang, Xuanjing

A2 - Specia, Lucia

A2 - Yih, Scott Wen-Tau

Y2 - 7 November 2021 through 11 November 2021

ER -