Assessing the Sufficiency of Arguments through Conclusion Generation

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Timon Gurcke
  • Milad Alshomary
  • Henning Wachsmuth

External Research Organisations

  • Heinz Nixdorf Institute
  • Paderborn University

Details

Original language: English
Title of host publication: 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings
Place of publication: Punta Cana
Pages: 67-77
Number of pages: 11
Publication status: Published - 2021
Externally published: Yes
Event: 8th Workshop on Argument Mining, ArgMining 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 10 Nov 2021 - 11 Nov 2021

Abstract

The premises of an argument give evidence or other reasons to support a conclusion. However, the amount of support required depends on the generality of a conclusion, the nature of the individual premises, and similar. An argument whose premises make its conclusion rationally worthy to be drawn is called sufficient in argument quality research. Previous work tackled sufficiency assessment as a standard text classification problem, not modeling the inherent relation of premises and conclusion. In this paper, we hypothesize that the conclusion of a sufficient argument can be generated from its premises. To study this hypothesis, we explore the potential of assessing sufficiency based on the output of large-scale pre-trained language models. Our best model variant achieves an F1-score of .885, outperforming the previous state-of-the-art and being on par with human experts. While manual evaluation reveals the quality of the generated conclusions, their impact remains low ultimately.
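
Read as a pipeline, the abstract describes two stages: a pre-trained language model generates a conclusion from the premises, and a classifier then judges whether the premises are sufficient for it. The sketch below only illustrates that shape; the model names, the input separator, and the untrained classification head are assumptions for illustration, not the configuration reported in the paper.

# Illustrative sketch of the premises -> generated conclusion -> sufficiency idea.
# Models and labels are placeholders; a real system would fine-tune both stages.
from transformers import pipeline

# Stage 1: generate a conclusion from the premises (BART as a stand-in
# for whichever sequence-to-sequence model is fine-tuned for the task).
generator = pipeline("summarization", model="facebook/bart-large-cnn")

premises = (
    "School uniforms reduce visible income differences between students. "
    "They also shorten the time families spend choosing clothes each morning."
)
conclusion = generator(premises, max_length=30, min_length=5)[0]["summary_text"]

# Stage 2: judge sufficiency from the premises plus the generated conclusion.
# "roberta-base" here has no trained classification head; it would need
# fine-tuning on sufficiency-annotated arguments to produce meaningful labels.
classifier = pipeline("text-classification", model="roberta-base")
prediction = classifier(f"{premises} </s></s> {conclusion}")[0]

print("Generated conclusion:", conclusion)
print("Sufficiency prediction (untrained placeholder):", prediction)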


Cite this

Assessing the Sufficiency of Arguments through Conclusion Generation. / Gurcke, Timon; Alshomary, Milad; Wachsmuth, Henning.
8th Workshop on Argument Mining, ArgMining 2021 - Proceedings. Punta Cana, 2021. p. 67-77.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Gurcke, T, Alshomary, M & Wachsmuth, H 2021, Assessing the Sufficiency of Arguments through Conclusion Generation. in 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings. Punta Cana, pp. 67-77, 8th Workshop on Argument Mining, ArgMining 2021, Virtual, Punta Cana, Dominican Republic, 10 Nov 2021. https://doi.org/10.48550/arXiv.2110.13495
Gurcke, T., Alshomary, M., & Wachsmuth, H. (2021). Assessing the Sufficiency of Arguments through Conclusion Generation. In 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings (pp. 67-77). https://doi.org/10.48550/arXiv.2110.13495
Gurcke T, Alshomary M, Wachsmuth H. Assessing the Sufficiency of Arguments through Conclusion Generation. In 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings. Punta Cana. 2021. p. 67-77 doi: 10.48550/arXiv.2110.13495
Gurcke, Timon ; Alshomary, Milad ; Wachsmuth, Henning. / Assessing the Sufficiency of Arguments through Conclusion Generation. 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings. Punta Cana, 2021. pp. 67-77
BibTeX
@inproceedings{b70e71fb4ec943428cb09527ce926a5f,
title = "Assessing the Sufficiency of Arguments through Conclusion Generation",
abstract = "The premises of an argument give evidence or other reasons to support a conclusion. However, the amount of support required depends on the generality of a conclusion, the nature of the individual premises, and similar. An argument whose premises make its conclusion rationally worthy to be drawn is called sufficient in argument quality research. Previous work tackled sufficiency assessment as a standard text classification problem, not modeling the inherent relation of premises and conclusion. In this paper, we hypothesize that the conclusion of a sufficient argument can be generated from its premises. To study this hypothesis, we explore the potential of assessing sufficiency based on the output of large-scale pre-trained language models. Our best model variant achieves an F1-score of .885, outperforming the previous state-of-the-art and being on par with human experts. While manual evaluation reveals the quality of the generated conclusions, their impact remains low ultimately.",
author = "Timon Gurcke and Milad Alshomary and Henning Wachsmuth",
note = "Funding Information: Employing knowledge about argumentative structure can benefit sufficiency assessment, in line with findings on predicting essay-level argument quality (Wachsmuth et al., 2016). Our results sug- gest that there is at least some additional knowledge We thank Katharina Brennig, Simon Seidl, Abdul-in an argument{\textquoteright}s conclusion that our model could lah Burak, Frederike Gurcke and Dr. Maurice Gur-not learn itself. However, we did not actually mine cke for their feedback. We gratefully acknowledge argumentative structure here, but we resorted to the computing time provided the described experi-the human-annotated ground truth, which is usu-ments by the Paderborn Center for Parallel Comput-ally not available in a real-world setting. Thus, the ing (PC2). This project has been partially funded improvements obtained by the structure could van-by the German Research Foundation (DFG) within ish as soon as we resort to computational methods. the project OASiS, project number 455913891, as We note, though, that we obtained state-of-the-art part of the Priority Program “Robust Argumenta-results also using RoBERTa on the plain text only. tion Machines (RATIO)” (SPP-1999).; 8th Workshop on Argument Mining, ArgMining 2021 ; Conference date: 10-11-2021 Through 11-11-2021",
year = "2021",
doi = "10.48550/arXiv.2110.13495",
language = "English",
isbn = "9781954085923",
pages = "67--77",
booktitle = "8th Workshop on Argument Mining, ArgMining 2021 - Proceedings",

}

RIS

TY - GEN

T1 - Assessing the Sufficiency of Arguments through Conclusion Generation

AU - Gurcke, Timon

AU - Alshomary, Milad

AU - Wachsmuth, Henning

N1 - Funding Information: We thank Katharina Brennig, Simon Seidl, Abdullah Burak, Frederike Gurcke and Dr. Maurice Gurcke for their feedback. We gratefully acknowledge computing time provided for the described experiments by the Paderborn Center for Parallel Computing (PC2). This project has been partially funded by the German Research Foundation (DFG) within the project OASiS, project number 455913891, as part of the Priority Program “Robust Argumentation Machines (RATIO)” (SPP-1999).

PY - 2021

Y1 - 2021

N2 - The premises of an argument give evidence or other reasons to support a conclusion. However, the amount of support required depends on the generality of a conclusion, the nature of the individual premises, and similar. An argument whose premises make its conclusion rationally worthy to be drawn is called sufficient in argument quality research. Previous work tackled sufficiency assessment as a standard text classification problem, not modeling the inherent relation of premises and conclusion. In this paper, we hypothesize that the conclusion of a sufficient argument can be generated from its premises. To study this hypothesis, we explore the potential of assessing sufficiency based on the output of large-scale pre-trained language models. Our best model variant achieves an F1-score of .885, outperforming the previous state-of-the-art and being on par with human experts. While manual evaluation reveals the quality of the generated conclusions, their impact remains low ultimately.

AB - The premises of an argument give evidence or other reasons to support a conclusion. However, the amount of support required depends on the generality of a conclusion, the nature of the individual premises, and similar. An argument whose premises make its conclusion rationally worthy to be drawn is called sufficient in argument quality research. Previous work tackled sufficiency assessment as a standard text classification problem, not modeling the inherent relation of premises and conclusion. In this paper, we hypothesize that the conclusion of a sufficient argument can be generated from its premises. To study this hypothesis, we explore the potential of assessing sufficiency based on the output of large-scale pre-trained language models. Our best model variant achieves an F1-score of .885, outperforming the previous state-of-the-art and being on par with human experts. While manual evaluation reveals the quality of the generated conclusions, their impact remains low ultimately.

UR - http://www.scopus.com/inward/record.url?scp=85127214319&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2110.13495

DO - 10.48550/arXiv.2110.13495

M3 - Conference contribution

AN - SCOPUS:85127214319

SN - 9781954085923

SP - 67

EP - 77

BT - 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings

CY - Punta Cana

T2 - 8th Workshop on Argument Mining, ArgMining 2021

Y2 - 10 November 2021 through 11 November 2021

ER -
