Temporal Blind Spots in Large Language Models

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Jonas Wallat
  • Adam Jatowt
  • Avishek Anand

External Research Organisations

  • University of Innsbruck
  • Delft University of Technology

Details

Original language: English
Title of host publication: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
Subtitle of host publication: WSDM ’24
Pages: 683-692
Number of pages: 10
ISBN (electronic): 9798400703713
Publication status: Published - 4 Mar 2024
Event: 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico
Duration: 4 Mar 2024 - 8 Mar 2024

Abstract

Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
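
To give a concrete sense of the kind of zero-shot temporal QA probing described above, the following is a minimal illustrative sketch in Python. It is not the authors' released code (that is linked above); the model checkpoint, prompt wording, toy question/answer pairs, and the relaxed exact-match check are assumptions chosen purely for illustration.

# Minimal sketch of zero-shot temporal QA probing with an open
# instruction-tuned model. NOT the authors' released code; model choice,
# questions, and the relaxed exact-match metric are illustrative assumptions.
from transformers import pipeline

# Any instruction-tuned seq2seq checkpoint can stand in here.
qa = pipeline("text2text-generation", model="google/flan-t5-large")

# Toy temporal questions paired with gold answers (illustrative only).
questions = [
    ("Who was the president of the United States in 1994?", "Bill Clinton"),
    ("Which city hosted the Summer Olympics in 2016?", "Rio de Janeiro"),
]

def normalize(text: str) -> str:
    """Lowercase and strip whitespace for a crude answer comparison."""
    return text.lower().strip()

correct = 0
for question, gold in questions:
    prediction = qa(question, max_new_tokens=20)[0]["generated_text"]
    # Relaxed exact match: count a hit if the gold answer string appears
    # anywhere in the model's output.
    if normalize(gold) in normalize(prediction):
        correct += 1
    print(f"Q: {question}\n  predicted: {prediction}\n  gold: {gold}")

print(f"Relaxed exact-match accuracy: {correct / len(questions):.2f}")

In this spirit, accuracy can be broken down by the time period a question refers to, which is how performance drops on older and very recent facts become visible.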

Keywords

    large language models, question answering, temporal information retrieval, temporal query intents

Cite this

Temporal Blind Spots in Large Language Models. / Wallat, Jonas; Jatowt, Adam; Anand, Avishek.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. p. 683-692.

Wallat, J, Jatowt, A & Anand, A 2024, Temporal Blind Spots in Large Language Models. in Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. pp. 683-692, 17th ACM International Conference on Web Search and Data Mining, WSDM 2024, Merida, Mexico, 4 Mar 2024. https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat, J., Jatowt, A., & Anand, A. (2024). Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24 (pp. 683-692) https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat J, Jatowt A, Anand A. Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. p. 683-692 doi: 10.48550/arXiv.2401.12078, 10.1145/3616855.3635818
Wallat, Jonas ; Jatowt, Adam ; Anand, Avishek. / Temporal Blind Spots in Large Language Models. Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692
BibTeX
@inproceedings{2eff110b72264957b1a5f241c1054d04,
title = "Temporal Blind Spots in Large Language Models",
abstract = "Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.",
keywords = "large language models, question answering, temporal information retrieval, temporal query intents",
author = "Jonas Wallat and Adam Jatowt and Avishek Anand",
note = "Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052 ; 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 ; Conference date: 04-03-2024 Through 08-03-2024",
year = "2024",
month = mar,
day = "4",
doi = "10.48550/arXiv.2401.12078",
language = "English",
pages = "683--692",
booktitle = "Proceedings of the 17th ACM International Conference on Web Search and Data Mining",

}

RIS

TY - GEN

T1 - Temporal Blind Spots in Large Language Models

AU - Wallat, Jonas

AU - Jatowt, Adam

AU - Anand, Avishek

N1 - Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052

PY - 2024/3/4

Y1 - 2024/3/4

N2 - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

AB - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

KW - large language models

KW - question answering

KW - temporal information retrieval

KW - temporal query intents

UR - http://www.scopus.com/inward/record.url?scp=85191716137&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2401.12078

DO - 10.48550/arXiv.2401.12078

M3 - Conference contribution

AN - SCOPUS:85191716137

SP - 683

EP - 692

BT - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024

Y2 - 4 March 2024 through 8 March 2024

ER -
