Details
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics, Findings of ACL |
Subtitle of host publication | EMNLP 2021 |
Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih |
Pages | 4600-4609 |
Number of pages | 10 |
ISBN (electronic) | 9781955917100 |
Publication status | Published - 2021 |
Event | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic; Duration: 7 Nov 2021 → 11 Nov 2021 |
Publication series
Name | Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
---|
Abstract
Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
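The abstract reports results in ROUGE-L F1. As a rough illustration of what that metric measures, the sketch below computes a ROUGE-L F1 score from the longest common subsequence (LCS) of two token sequences. It is a minimal sketch under stated assumptions, not the authors' evaluation script: the function names `lcs_length` and `rouge_l_f1`, the lowercased whitespace tokenization, and the example strings are all hypothetical choices made for this illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a predicted answer and a reference answer.

    Assumption: tokens are lowercased, whitespace-separated words;
    published evaluations may tokenize or stem differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of predicted tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical prediction and reference, for illustration only.
    score = rouge_l_f1("press the power button",
                       "press and hold the power button")
    print(f"ROUGE-L F1: {score:.3f}")
```

Because the score depends on the longest in-order overlap rather than on exact string match, a predicted span that omits or adds a few words relative to the reference still receives partial credit.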
ASJC Scopus subject areas
- Arts and Humanities(all)
- Language and Linguistics
- Social Sciences(all)
- Linguistics and Language
Cite this
Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021. ed. / Marie-Francine Moens; Xuanjing Huang; Lucia Specia; Scott Wen-Tau Yih. 2021. p. 4600-4609 (Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Question Answering over Electronic Devices
T2 - 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
AU - Nandy, Abhilash
AU - Sharma, Soumya
AU - Maddhashiya, Shubham
AU - Sachdeva, Kapil
AU - Goyal, Pawan
AU - Ganguly, Niloy
N1 - Funding Information: We would like to thank the annotators who made the curation of the datasets possible. Also, special thanks to Manav Kapadnis, an Undergraduate Student of Indian Institute of Technology Kharagpur, for his contribution towards the implementation of certain baselines. This work is supported in part by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor (grant no. 01DD20003). This work is also supported in part by Confederation of Indian Industry (CII) and the Science & Engineering Research Board Department of Science & Technology Government of India (SERB) through the Prime Minister's Research Fellowship scheme. Finally, we acknowledge the funding received from Samsung Research Institute, Delhi for the work.
PY - 2021
Y1 - 2021
UR - http://www.scopus.com/inward/record.url?scp=85129184921&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85129184921
T3 - Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
SP - 4600
EP - 4609
BT - Findings of the Association for Computational Linguistics, Findings of ACL
A2 - Moens, Marie-Francine
A2 - Huang, Xuanjing
A2 - Specia, Lucia
A2 - Yih, Scott Wen-Tau
Y2 - 7 November 2021 through 11 November 2021
ER -