Temporal Blind Spots in Large Language Models

Jonas Wallat; Adam Jatowt; Avishek Anand

doi:10.48550/arXiv.2401.12078

Details

Original language	English
Title of host publication	Proceedings of the 17th ACM International Conference on Web Search and Data Mining
Subtitle	WSDM ’24
Pages	683-692
Number of pages	10
ISBN (electronic)	9798400703713
Publication status	Published - 4 March 2024
Event	17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico. Duration: 4 March 2024 → 8 March 2024

Abstract

Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

ASJC Scopus subject areas

General Computer Science
Computer Networks and Communications
General Computer Science
Computer Science Applications
General Computer Science
Software

Cite this

Temporal Blind Spots in Large Language Models. / Wallat, Jonas; Jatowt, Adam; Anand, Avishek.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692.

Publication: Chapter in book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed

Wallat, J, Jatowt, A & Anand, A 2024, Temporal Blind Spots in Large Language Models. in Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. pp. 683-692, 17th ACM International Conference on Web Search and Data Mining, WSDM 2024, Merida, Mexico, 4 March 2024. https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818

Wallat, J., Jatowt, A., & Anand, A. (2024). Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24 (pp. 683-692). https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818

Wallat J, Jatowt A, Anand A. Temporal Blind Spots in Large Language Models. in Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692. doi: 10.48550/arXiv.2401.12078, 10.1145/3616855.3635818

Wallat, Jonas ; Jatowt, Adam ; Anand, Avishek. / Temporal Blind Spots in Large Language Models. Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692

Download

@inproceedings{2eff110b72264957b1a5f241c1054d04,

title = "Temporal Blind Spots in Large Language Models",

abstract = "Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.",

keywords = "large language models, question answering, temporal information retrieval, temporal query intents",

author = "Jonas Wallat and Adam Jatowt and Avishek Anand",

note = "Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052 ; 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 ; Conference date: 04-03-2024 Through 08-03-2024",

year = "2024",

month = mar,

day = "4",

doi = "10.48550/arXiv.2401.12078",

language = "English",

pages = "683--692",

booktitle = "Proceedings of the 17th ACM International Conference on Web Search and Data Mining",

}

Download

TY - GEN

T1 - Temporal Blind Spots in Large Language Models

AU - Wallat, Jonas

AU - Jatowt, Adam

AU - Anand, Avishek

N1 - Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052

PY - 2024/3/4

Y1 - 2024/3/4

N2 - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

AB - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

KW - large language models

KW - question answering

KW - temporal information retrieval

KW - temporal query intents

UR - http://www.scopus.com/inward/record.url?scp=85191716137&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2401.12078

DO - 10.48550/arXiv.2401.12078

M3 - Conference contribution

AN - SCOPUS:85191716137

SP - 683

EP - 692

BT - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024

Y2 - 4 March 2024 through 8 March 2024

ER -


By the same authors

Probing BERT for Ranking Abilities

GENEMASK: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning

BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Causal Probing for Dual Encoders
