Selecting textual analysis tools to classify sustainability information in corporate reporting

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Frederik Maibaum
  • Johannes Kriebel
  • Johann Nils Foege

External Research Organisations

  • University of Münster

Details

Original language: English
Article number: 114269
Number of pages: 11
Journal: Decision support systems
Volume: 183
Early online date: 11 Jun 2024
Publication status: Published - Aug 2024

Abstract

Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.

Keywords

    ChatGPT, Corporate reporting, Natural language processing, Performance evaluation, Sustainability

Cite this

Selecting textual analysis tools to classify sustainability information in corporate reporting. / Maibaum, Frederik; Kriebel, Johannes; Foege, Johann Nils.
In: Decision support systems, Vol. 183, 114269, 08.2024.

Maibaum F, Kriebel J, Foege JN. Selecting textual analysis tools to classify sustainability information in corporate reporting. Decision support systems. 2024 Aug;183:114269. Epub 2024 Jun 11. doi: 10.1016/j.dss.2024.114269
Maibaum, Frederik ; Kriebel, Johannes ; Foege, Johann Nils. / Selecting textual analysis tools to classify sustainability information in corporate reporting. In: Decision support systems. 2024 ; Vol. 183.
@article{4f9247bb360644ba8b45eb29ae0f56cb,
title = "Selecting textual analysis tools to classify sustainability information in corporate reporting",
abstract = "Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.",
keywords = "ChatGPT, Corporate reporting, Natural language processing, Performance evaluation, Sustainability",
author = "Frederik Maibaum and Johannes Kriebel and Foege, {Johann Nils}",
note = "Publisher Copyright: {\textcopyright} 2024 The Authors",
year = "2024",
month = aug,
doi = "10.1016/j.dss.2024.114269",
language = "English",
volume = "183",
journal = "Decision support systems",
issn = "0167-9236",
publisher = "Elsevier",

}

TY - JOUR

T1 - Selecting textual analysis tools to classify sustainability information in corporate reporting

AU - Maibaum, Frederik

AU - Kriebel, Johannes

AU - Foege, Johann Nils

N1 - Publisher Copyright: © 2024 The Authors

PY - 2024/8

Y1 - 2024/8

N2 - Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.

AB - Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.

KW - ChatGPT

KW - Corporate reporting

KW - Natural language processing

KW - Performance evaluation

KW - Sustainability

UR - http://www.scopus.com/inward/record.url?scp=85196207417&partnerID=8YFLogxK

U2 - 10.1016/j.dss.2024.114269

DO - 10.1016/j.dss.2024.114269

M3 - Article

AN - SCOPUS:85196207417

VL - 183

JO - Decision support systems

JF - Decision support systems

SN - 0167-9236

M1 - 114269

ER -