Details
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| Editors | Houda Bouamor, Juan Pino, Kalika Bali |
| Pages | 10915-10931 |
| Number of pages | 17 |
| ISBN (electronic) | 9798891760608 |
| Publication status | Published - 2023 |
| Event | Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Hybrid, Singapore, 6 Dec 2023 → 10 Dec 2023 |
Abstract
Summaries of medical text should be faithful, i.e., consistent and factual with respect to the source inputs; faithfulness is an important but understudied requirement for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness across a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework that improves faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective, delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS and achieving state-of-the-art performance on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.
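The contrastive idea described in the abstract can be sketched in a few lines: fine-tune a seq2seq model so that a faithful summary scores higher than an unfaithful one. The sketch below is a minimal illustration only; the BART checkpoint, margin value, learning rate, hinge loss form, and the toy dose-perturbation negative are all assumptions, not the paper's actual design. Consult the linked repository for the authors' implementation.

```python
# Minimal sketch (not the authors' code): margin-based contrastive
# fine-tuning that pushes a faithful summary's likelihood above an
# unfaithful one's. The checkpoint, margin, toy example, and loss form
# are all assumptions; see https://github.com/psunlpgroup/FaMeSumm for
# the actual FAMESUMM implementation.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # assumed lr

def sequence_log_prob(source: str, summary: str) -> torch.Tensor:
    """Length-normalized log-probability of `summary` given `source`."""
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
    logits = model(**inputs, labels=labels).logits  # (1, len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_lp.mean()  # normalize by summary length

# Toy contrastive pair: the unfaithful summary corrupts the dosage,
# mimicking the kind of medical-term error the paper targets.
source = "The patient should take 5 mg of amlodipine once daily."
faithful = "Take 5 mg of amlodipine once daily."
unfaithful = "Take 50 mg of amlodipine once daily."  # fabricated dose

margin = 1.0  # assumed hyperparameter
model.train()
optimizer.zero_grad()
pos = sequence_log_prob(source, faithful)
neg = sequence_log_prob(source, unfaithful)
# Hinge loss: penalize the model unless it prefers the faithful
# summary by at least `margin` in log-probability.
loss = torch.clamp(margin - (pos - neg), min=0.0)
loss.backward()
optimizer.step()
```

Per the abstract, the paper combines a contrastive signal of this kind with an objective that incorporates medical terms and their contexts; the sketch above omits that component for brevity.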
ASJC Scopus subject areas
- Computer Science (all)
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
- Social Sciences (all)
- Linguistics and Language
Cite this
FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. / Zhang, Nan; Zhang, Yusen; Guo, Wu; Mitra, Prasenjit; Zhang, Rui. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. ed. / Houda Bouamor; Juan Pino; Kalika Bali. 2023. p. 10915-10931. https://doi.org/10.18653/v1/2023.emnlp-main.673
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization
T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
AU - Zhang, Nan
AU - Zhang, Yusen
AU - Guo, Wu
AU - Mitra, Prasenjit
AU - Zhang, Rui
N1 - Funding Information: We thank Tianyang Zhao, Xiangyu Dong, Yilun Zhao, Yiyang Feng, Fangxu Yu, Yunxiang Li, Xue-qing Zhang, and Wei Chen for their significant assistance with data annotation. This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany, under the project LeibnizKILabor (grant no. 01DD20003).
PY - 2023
Y1 - 2023
N2 - Summaries of medical text should be faithful, i.e., consistent and factual with respect to the source inputs; faithfulness is an important but understudied requirement for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness across a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework that improves faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective, delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS and achieving state-of-the-art performance on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.
AB - Summaries of medical text should be faithful, i.e., consistent and factual with respect to the source inputs; faithfulness is an important but understudied requirement for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness across a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework that improves faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective, delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS and achieving state-of-the-art performance on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.
UR - http://www.scopus.com/inward/record.url?scp=85184801118&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.emnlp-main.673
DO - 10.18653/v1/2023.emnlp-main.673
M3 - Conference contribution
AN - SCOPUS:85184801118
SP - 10915
EP - 10931
BT - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
Y2 - 6 December 2023 through 10 December 2023
ER -