FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization

Publication: Contribution to book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Nan Zhang
  • Yusen Zhang
  • Wu Guo
  • Prasenjit Mitra
  • Rui Zhang

Organisational units

External organisations

  • Pennsylvania State University
  • Zhengzhou University
Research metrics (PlumX)
  • Citations
    • Citation Indexes: 3
  • Captures
    • Readers: 13
  • Mentions
    • News Mentions: 5

Details

Original language: English
Title of host publication: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics (ACL)
Pages: 10915-10931
Number of pages: 17
ISBN (electronic): 9798891760608
Publication status: Published - 2023
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore
Duration: 6 Dec 2023 – 10 Dec 2023

Abstract

Summaries of medical text shall be faithful by being consistent and factual with source inputs, which is an important but understudied topic for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness in summarization on a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective by delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art performances on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.
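
The abstract outlines two ingredients: a contrastive objective over constructed sets of faithful and unfaithful summaries, and a term-level objective that encourages faithful generation of medical terms. The authors' actual implementation is in the repository linked above. The snippet below is only a minimal, hedged sketch of the first ingredient, showing one plausible way to score a faithful summary against an unfaithful perturbation with a margin loss; it assumes PyTorch and Hugging Face transformers, and every name in it (the helper functions, the t5-small checkpoint, the margin value, the toy sentences) is an illustrative assumption rather than something taken from the paper.

# Minimal sketch (NOT the official FaMeSumm code; see the GitHub link above).
# Assumes PyTorch and Hugging Face transformers; the t5-small checkpoint, the
# margin value, and all example strings are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


def sequence_log_likelihood(model, tokenizer, source: str, summary: str) -> torch.Tensor:
    """Length-normalized log-likelihood of `summary` given `source` under the model."""
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
    logits = model(input_ids=enc.input_ids,
                   attention_mask=enc.attention_mask,
                   labels=labels).logits                      # (1, tgt_len, vocab)
    logprobs = F.log_softmax(logits, dim=-1)
    token_ll = logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    mask = (labels != tokenizer.pad_token_id).float()
    return (token_ll * mask).sum() / mask.sum()


def contrastive_faithfulness_loss(model, tokenizer, source, faithful, unfaithful, margin=1.0):
    """Margin loss pushing the faithful summary to score higher than the unfaithful one."""
    pos = sequence_log_likelihood(model, tokenizer, source, faithful)
    neg = sequence_log_likelihood(model, tokenizer, source, unfaithful)
    return torch.clamp(margin - (pos - neg), min=0.0)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("t5-small")
    mdl = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    src = ("summarize: Patient reports chest pain for two days; "
           "ECG shows no acute changes.")
    loss = contrastive_faithfulness_loss(
        mdl, tok,
        source=src,
        faithful="Chest pain for two days; ECG without acute changes.",
        unfaithful="Chest pain for two weeks; ECG shows an acute infarction.",
    )
    # In training this term would be added to the usual cross-entropy objective.
    print(float(loss))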

Cite this

FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. / Zhang, Nan; Zhang, Yusen; Guo, Wu et al.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. ed. / Houda Bouamor; Juan Pino; Kalika Bali. Association for Computational Linguistics (ACL), 2023. pp. 10915-10931.

Publication: Contribution to book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Zhang, N, Zhang, Y, Guo, W, Mitra, P & Zhang, R 2023, FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. in H Bouamor, J Pino & K Bali (eds), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), pp. 10915-10931, Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, 6 Dec. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.673
Zhang, N., Zhang, Y., Guo, W., Mitra, P., & Zhang, R. (2023). FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10915-10931). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.673
Zhang N, Zhang Y, Guo W, Mitra P, Zhang R. FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. In Bouamor H, Pino J, Bali K, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL). 2023. p. 10915-10931. Epub 2023 Dec. doi: 10.18653/v1/2023.emnlp-main.673
Zhang, Nan ; Zhang, Yusen ; Guo, Wu et al. / FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. ed. / Houda Bouamor ; Juan Pino ; Kalika Bali. Association for Computational Linguistics (ACL), 2023. pp. 10915-10931
BibTeX
@inproceedings{d005864a37bc436ebd6b02405e943516,
title = "FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization",
abstract = "Summaries of medical text shall be faithful by being consistent and factual with source inputs, which is an important but understudied topic for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness in summarization on a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective by delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art performances on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.",
author = "Nan Zhang and Yusen Zhang and Wu Guo and Prasenjit Mitra and Rui Zhang",
note = "Funding Information: We thank Tianyang Zhao, Xiangyu Dong, Yilun Zhao, Yiyang Feng, Fangxu Yu, Yunxiang Li, Xue-qing Zhang, and Wei Chen for their significant assistance on data annotation. This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003. ; Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 ; Conference date: 06-12-2023 Through 10-12-2023",
year = "2023",
doi = "10.18653/v1/2023.emnlp-main.673",
language = "English",
pages = "10915--10931",
editor = "Houda Bouamor and Juan Pino and Kalika Bali",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
publisher = "Association for Computational Linguistics (ACL)",
address = "Australia",

}

RIS

TY - GEN

T1 - FAMESUMM: Investigating and Improving Faithfulness of Medical Summarization

T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2023

AU - Zhang, Nan

AU - Zhang, Yusen

AU - Guo, Wu

AU - Mitra, Prasenjit

AU - Zhang, Rui

N1 - Funding Information: We thank Tianyang Zhao, Xiangyu Dong, Yilun Zhao, Yiyang Feng, Fangxu Yu, Yunxiang Li, Xue-qing Zhang, and Wei Chen for their significant assistance on data annotation. This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003.

PY - 2023

Y1 - 2023

N2 - Summaries of medical text shall be faithful by being consistent and factual with source inputs, which is an important but understudied topic for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness in summarization on a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective by delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art performances on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.

AB - Summaries of medical text shall be faithful by being consistent and factual with source inputs, which is an important but understudied topic for safety and efficiency in healthcare. In this paper, we investigate and improve faithfulness in summarization on a broad range of medical summarization tasks. Our investigation reveals that current summarization models often produce unfaithful outputs for medical input text. We then introduce FAMESUMM, a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge. FAMESUMM performs contrastive learning on designed sets of faithful and unfaithful summaries, and it incorporates medical terms and their contexts to encourage faithful generation of medical terms. We conduct comprehensive experiments on three datasets in two languages: health question and radiology report summarization datasets in English, and a patient-doctor dialogue dataset in Chinese. Results demonstrate that FAMESUMM is flexible and effective by delivering consistent improvements over mainstream language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art performances on metrics for faithfulness and general quality. Human evaluation by doctors also shows that FAMESUMM generates more faithful outputs. Our code is available at https://github.com/psunlpgroup/FaMeSumm.

UR - http://www.scopus.com/inward/record.url?scp=85184801118&partnerID=8YFLogxK

U2 - 10.18653/v1/2023.emnlp-main.673

DO - 10.18653/v1/2023.emnlp-main.673

M3 - Conference contribution

AN - SCOPUS:85184801118

SP - 10915

EP - 10931

BT - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

A2 - Bouamor, Houda

A2 - Pino, Juan

A2 - Bali, Kalika

PB - Association for Computational Linguistics (ACL)

Y2 - 6 December 2023 through 10 December 2023

ER -