Temporal Blind Spots in Large Language Models

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Authors

  • Jonas Wallat
  • Adam Jatowt
  • Avishek Anand

Organizational units

External organizations

  • Universität Innsbruck
  • Delft University of Technology

Details

Original language: English
Title of host publication: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
Subtitle: WSDM ’24
Pages: 683-692
Number of pages: 10
ISBN (electronic): 9798400703713
Publication status: Published - 4 March 2024
Event: 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico
Duration: 4 March 2024 – 8 March 2024

Abstract

Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
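
To make the probing setup concrete, the following is a minimal sketch of zero-shot temporal QA evaluation, not the authors' implementation (their code is in the repository linked above); the model (Flan-T5 via the Hugging Face transformers pipeline), the two sample questions, and the containment-based scoring are illustrative assumptions. Binning such probes by the year a fact refers to is one way to surface the past- and recency-related failure modes the abstract reports.

# A minimal sketch of zero-shot temporal QA probing; NOT the authors'
# implementation (see https://github.com/jwallat/temporalblindspots).
# Model choice, example questions, and scoring are assumptions.
from transformers import pipeline

# Hypothetical temporal questions with gold answers, in the style of
# temporal QA datasets that scope a fact to a specific year.
probes = [
    ("Who was the president of the United States in 1995?", "bill clinton"),
    ("Which country won the FIFA World Cup in 2014?", "germany"),
]

qa = pipeline("text2text-generation", model="google/flan-t5-small")

hits = 0
for question, gold in probes:
    answer = qa(question, max_new_tokens=20)[0]["generated_text"]
    # Lenient exact match: count a hit if the gold answer string
    # appears anywhere in the model's prediction.
    if gold in answer.lower():
        hits += 1
    print(f"Q: {question}\nA: {answer}")

print(f"Accuracy: {hits / len(probes):.2f}")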

ASJC Scopus subject areas

Cite

Temporal Blind Spots in Large Language Models. / Wallat, Jonas; Jatowt, Adam; Anand, Avishek.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692.

Wallat, J, Jatowt, A & Anand, A 2024, Temporal Blind Spots in Large Language Models. in Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. pp. 683-692, 17th ACM International Conference on Web Search and Data Mining, WSDM 2024, Merida, Mexico, 4 March 2024. https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat, J., Jatowt, A., & Anand, A. (2024). Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24 (pp. 683-692). https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat J, Jatowt A, Anand A. Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692. doi: 10.48550/arXiv.2401.12078, 10.1145/3616855.3635818
Wallat, Jonas ; Jatowt, Adam ; Anand, Avishek. / Temporal Blind Spots in Large Language Models. Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692
BibTeX
@inproceedings{2eff110b72264957b1a5f241c1054d04,
title = "Temporal Blind Spots in Large Language Models",
abstract = "Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.",
keywords = "large language models, question answering, temporal information retrieval, temporal query intents",
author = "Jonas Wallat and Adam Jatowt and Avishek Anand",
note = "Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052 ; 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 ; Conference date: 04-03-2024 Through 08-03-2024",
year = "2024",
month = mar,
day = "4",
doi = "10.48550/arXiv.2401.12078",
language = "English",
pages = "683--692",
booktitle = "Proceedings of the 17th ACM International Conference on Web Search and Data Mining",

}

RIS

TY - GEN

T1 - Temporal Blind Spots in Large Language Models

AU - Wallat, Jonas

AU - Jatowt, Adam

AU - Anand, Avishek

N1 - Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052

PY - 2024/3/4

Y1 - 2024/3/4

N2 - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

AB - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

KW - large language models

KW - question answering

KW - temporal information retrieval

KW - temporal query intents

UR - http://www.scopus.com/inward/record.url?scp=85191716137&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2401.12078

DO - 10.48550/arXiv.2401.12078

M3 - Conference contribution

AN - SCOPUS:85191716137

SP - 683

EP - 692

BT - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024

Y2 - 4 March 2024 through 8 March 2024

ER -
