Selecting textual analysis tools to classify sustainability information in corporate reporting

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Frederik Maibaum
  • Johannes Kriebel
  • Johann Nils Foege

External organisations

  • Westfälische Wilhelms-Universität Münster (WWU)

Details

Original language: English
Article number: 114269
Number of pages: 11
Journal: Decision support systems
Volume: 183
Early online date: 11 Jun 2024
Publication status: Published - Aug 2024

Abstract

Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.
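The first of the four compared approaches, dictionary-based classification, can be illustrated with a minimal sketch. Note that the term list and function below are hypothetical examples for illustration only, not the dictionary or implementation used in the paper:

```python
# Illustrative sketch (not the authors' implementation): a dictionary-based
# classifier flags a sentence as sustainability-related when it contains
# any term from a predefined keyword list.

SUSTAINABILITY_TERMS = {  # hypothetical mini-dictionary for illustration
    "sustainability", "emissions", "renewable", "carbon", "climate",
}

def is_sustainability_sentence(sentence: str) -> bool:
    """Return True if any dictionary term occurs in the sentence."""
    tokens = {token.strip(".,;:()").lower() for token in sentence.split()}
    return not tokens.isdisjoint(SUSTAINABILITY_TERMS)

sentences = [
    "We reduced carbon emissions by upgrading our fleet.",
    "Revenue grew 4% year over year.",
]
labels = [is_sustainability_sentence(s) for s in sentences]
```

As the abstract notes, the quality of such classifiers varies widely with the chosen word list, which is one reason the study benchmarks them against topic models, word embeddings, and large language models on manually labeled ground truth.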

ASJC Scopus subject areas

Cite

Selecting textual analysis tools to classify sustainability information in corporate reporting. / Maibaum, Frederik; Kriebel, Johannes; Foege, Johann Nils.
In: Decision support systems, Vol. 183, 114269, 08.2024.


Maibaum F, Kriebel J, Foege JN. Selecting textual analysis tools to classify sustainability information in corporate reporting. Decision support systems. 2024 Aug;183:114269. Epub 2024 Jun 11. doi: 10.1016/j.dss.2024.114269
Maibaum, Frederik; Kriebel, Johannes; Foege, Johann Nils. / Selecting textual analysis tools to classify sustainability information in corporate reporting. In: Decision support systems. 2024; Vol. 183.
Download citation (BibTeX)
@article{4f9247bb360644ba8b45eb29ae0f56cb,
title = "Selecting textual analysis tools to classify sustainability information in corporate reporting",
abstract = "Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.",
keywords = "ChatGPT, Corporate reporting, Natural language processing, Performance evaluation, Sustainability",
author = "Frederik Maibaum and Johannes Kriebel and Foege, {Johann Nils}",
note = "Publisher Copyright: {\textcopyright} 2024 The Authors",
year = "2024",
month = aug,
doi = "10.1016/j.dss.2024.114269",
language = "English",
volume = "183",
journal = "Decision support systems",
issn = "0167-9236",
publisher = "Elsevier",

}

Download citation (RIS)

TY - JOUR

T1 - Selecting textual analysis tools to classify sustainability information in corporate reporting

AU - Maibaum, Frederik

AU - Kriebel, Johannes

AU - Foege, Johann Nils

N1 - Publisher Copyright: © 2024 The Authors

PY - 2024/8

Y1 - 2024/8

N2 - Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.

AB - Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.

KW - ChatGPT

KW - Corporate reporting

KW - Natural language processing

KW - Performance evaluation

KW - Sustainability

UR - http://www.scopus.com/inward/record.url?scp=85196207417&partnerID=8YFLogxK

U2 - 10.1016/j.dss.2024.114269

DO - 10.1016/j.dss.2024.114269

M3 - Article

AN - SCOPUS:85196207417

VL - 183

JO - Decision support systems

JF - Decision support systems

SN - 0167-9236

M1 - 114269

ER -