Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Abhilash Nandy
  • Soumya Sharma
  • Shubham Maddhashiya
  • Kapil Sachdeva
  • Pawan Goyal
  • Niloy Ganguly

External Research Organisations

  • Indian Institute of Technology Kharagpur (IITKGP)
  • Samsung R&D Institute India-Delhi (SRI-Delhi)

Details

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publication: EMNLP 2021
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
Pages: 4600-4609
Number of pages: 10
ISBN (electronic): 9781955917100
Publication status: Published - 2021
Event: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: 7 Nov 2021 - 11 Nov 2021

Publication series

Name: Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Abstract

Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
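
The abstract describes EMQAP as a shared RoBERTa encoder trained with a supervised multi-task objective over two tasks: identifying the relevant E-manual section and extracting the answer span within it. The following is a minimal, hypothetical sketch of such a multi-task setup in PyTorch with the HuggingFace transformers library; the class name, the single-linear-layer heads, and the loss weighting are illustrative assumptions and are not taken from the authors' implementation (see the linked repository for the actual EMQAP code).

# Hypothetical sketch of a RoBERTa-based multi-task QA model in the spirit of
# EMQAP: one shared encoder, a section-relevance head, and an answer-span head
# trained jointly. Names and loss weights are illustrative, not the authors'.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast

class MultiTaskEManualQA(nn.Module):
    def __init__(self, model_name="roberta-base", span_weight=1.0, section_weight=1.0):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Task 1: does this (question, section) pair contain the answer?
        self.section_head = nn.Linear(hidden, 2)
        # Task 2: start/end logits for the answer span within the section.
        self.span_head = nn.Linear(hidden, 2)
        self.span_weight = span_weight
        self.section_weight = section_weight

    def forward(self, input_ids, attention_mask,
                section_label=None, start_positions=None, end_positions=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        seq = out.last_hidden_state                    # (B, T, H)
        section_logits = self.section_head(seq[:, 0])  # <s> token -> (B, 2)
        start_logits, end_logits = self.span_head(seq).split(1, dim=-1)
        start_logits, end_logits = start_logits.squeeze(-1), end_logits.squeeze(-1)

        loss = None
        if section_label is not None and start_positions is not None:
            ce = nn.CrossEntropyLoss()
            span_loss = (ce(start_logits, start_positions) + ce(end_logits, end_positions)) / 2
            section_loss = ce(section_logits, section_label)
            loss = self.span_weight * span_loss + self.section_weight * section_loss
        return {"loss": loss, "section_logits": section_logits,
                "start_logits": start_logits, "end_logits": end_logits}

# Minimal usage: encode a question/section pair and run a forward pass.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = MultiTaskEManualQA()
enc = tokenizer("How do I reset the device?",
                "To reset, hold the power button for ten seconds.",
                return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    preds = model(**enc)
print(preds["start_logits"].shape)  # (1, sequence_length)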

Cite this

Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. / Nandy, Abhilash; Sharma, Soumya; Maddhashiya, Shubham et al.
Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. ed. / Marie-Francine Moens; Xuanjing Huang; Lucia Specia; Scott Wen-Tau Yih. 2021. p. 4600-4609 (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Nandy, A, Sharma, S, Maddhashiya, S, Sachdeva, K, Goyal, P & Ganguly, N 2021, Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. in M-F Moens, X Huang, L Specia & SW-T Yih (eds), Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp. 4600-4609, 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, Punta Cana, Dominican Republic, 7 Nov 2021.
Nandy, A., Sharma, S., Maddhashiya, S., Sachdeva, K., Goyal, P., & Ganguly, N. (2021). Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In M.-F. Moens, X. Huang, L. Specia, & S. W.-T. Yih (Eds.), Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 4600-4609). (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
Nandy A, Sharma S, Maddhashiya S, Sachdeva K, Goyal P, Ganguly N. Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In Moens MF, Huang X, Specia L, Yih SWT, editors, Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. 2021. p. 4600-4609. (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
Nandy, Abhilash ; Sharma, Soumya ; Maddhashiya, Shubham et al. / Question Answering over Electronic Devices : A New Benchmark Dataset and a Multi-Task Learning based QA Framework. Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. editor / Marie-Francine Moens ; Xuanjing Huang ; Lucia Specia ; Scott Wen-Tau Yih. 2021. pp. 4600-4609 (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
BibTeX
@inproceedings{d59698afad224fd284cfeae65706ac7d,
title = "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework",
abstract = "Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper we meticulously create a large amount of data connected with E-manuals and develop suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets which include question answer pairs curated by experts based upon two E-manuals, real user questions from Community Question Answering Forum pertaining to E-manuals etc. We introduce EMQAP (E-Manual Question Answering Pipeline) that answers questions pertaining to electronics devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/ EMNLP-2021-Findings, and the corresponding project website is https://sites. google.com/view/emanualqa/home.",
author = "Abhilash Nandy and Soumya Sharma and Shubham Maddhashiya and Kapil Sachdeva and Pawan Goyal and Niloy Ganguly",
note = "Funding Information: We would like to thank the annotators who made the curation of the datasets possible. Also, special thanks to Manav Kapadnis, an Undergraduate Student of Indian Institute of Technology Kharagpur, for his contribution towards the implementation of certain baselines. This work is supported in part by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003). This work is also supported in part by Confederation of Indian Industry (CII) and the Science & Engineering Research Board Department of Science & Technology Government of India (SERB) through the Prime Minister's Research Fellowship scheme. Finally, we acknowledge the funding received from Samsung Research Institute, Delhi for the work. ; 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 ; Conference date: 07-11-2021 Through 11-11-2021",
year = "2021",
language = "English",
series = "Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021",
pages = "4600--4609",
editor = "Marie-Francine Moens and Xuanjing Huang and Lucia Specia and Yih, {Scott Wen-Tau}",
booktitle = "Findings of the Association for Computational Linguistics, Findings of ACL",

}

RIS

TY - GEN

T1 - Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

T2 - 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

AU - Nandy, Abhilash

AU - Sharma, Soumya

AU - Maddhashiya, Shubham

AU - Sachdeva, Kapil

AU - Goyal, Pawan

AU - Ganguly, Niloy

N1 - Funding Information: We would like to thank the annotators who made the curation of the datasets possible. Also, special thanks to Manav Kapadnis, an undergraduate student at the Indian Institute of Technology Kharagpur, for his contribution towards the implementation of certain baselines. This work is supported in part by the Federal Ministry of Education and Research (BMBF), Germany, under the project LeibnizKILabor (grant no. 01DD20003). This work is also supported in part by the Confederation of Indian Industry (CII) and the Science & Engineering Research Board (SERB), Department of Science & Technology, Government of India, through the Prime Minister's Research Fellowship scheme. Finally, we acknowledge the funding received from Samsung Research Institute, Delhi, for this work.

PY - 2021

Y1 - 2021

N2 - Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.

AB - Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.

UR - http://www.scopus.com/inward/record.url?scp=85129184921&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85129184921

T3 - Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

SP - 4600

EP - 4609

BT - Findings of the Association for Computational Linguistics, Findings of ACL

A2 - Moens, Marie-Francine

A2 - Huang, Xuanjing

A2 - Specia, Lucia

A2 - Yih, Scott Wen-Tau

Y2 - 7 November 2021 through 11 November 2021

ER -