Temporal Blind Spots in Large Language Models

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Jonas Wallat
  • Adam Jatowt
  • Avishek Anand

External Research Organisations

  • University of Innsbruck
  • Delft University of Technology

Details

Original language: English
Title of host publication: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
Subtitle of host publication: WSDM ’24
Pages: 683-692
Number of pages: 10
ISBN (electronic): 9798400703713
Publication status: Published - 4 Mar 2024
Event: 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico
Duration: 4 Mar 2024 - 8 Mar 2024

Abstract

Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
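
To give a concrete sense of the kind of zero-shot temporal QA probing described above, the following is a minimal illustrative sketch in Python. It is not the authors' released code (that is linked above); the model checkpoint, prompt wording, toy question/answer pairs, and the relaxed exact-match check are assumptions chosen purely for illustration.

# Minimal sketch of zero-shot temporal QA probing with an open
# instruction-tuned model. NOT the authors' released code; model choice,
# questions, and the relaxed exact-match metric are illustrative assumptions.
from transformers import pipeline

# Any instruction-tuned seq2seq checkpoint can stand in here.
qa = pipeline("text2text-generation", model="google/flan-t5-large")

# Toy temporal questions paired with gold answers (illustrative only).
questions = [
    ("Who was the president of the United States in 1994?", "Bill Clinton"),
    ("Which city hosted the Summer Olympics in 2016?", "Rio de Janeiro"),
]

def normalize(text: str) -> str:
    """Lowercase and strip whitespace for a crude answer comparison."""
    return text.lower().strip()

correct = 0
for question, gold in questions:
    prediction = qa(question, max_new_tokens=20)[0]["generated_text"]
    # Relaxed exact match: count a hit if the gold answer string appears
    # anywhere in the model's output.
    if normalize(gold) in normalize(prediction):
        correct += 1
    print(f"Q: {question}\n  predicted: {prediction}\n  gold: {gold}")

print(f"Relaxed exact-match accuracy: {correct / len(questions):.2f}")

In this spirit, accuracy can be broken down by the time period a question refers to, which is how performance drops on older and very recent facts become visible.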

Keywords

    large language models, question answering, temporal information retrieval, temporal query intents

Cite this

Temporal Blind Spots in Large Language Models. / Wallat, Jonas; Jatowt, Adam; Anand, Avishek.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. p. 683-692.

Wallat, J, Jatowt, A & Anand, A 2024, Temporal Blind Spots in Large Language Models. in Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. pp. 683-692, 17th ACM International Conference on Web Search and Data Mining, WSDM 2024, Merida, Mexico, 4 Mar 2024. https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat, J., Jatowt, A., & Anand, A. (2024). Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24 (pp. 683-692) https://doi.org/10.48550/arXiv.2401.12078, https://doi.org/10.1145/3616855.3635818
Wallat J, Jatowt A, Anand A. Temporal Blind Spots in Large Language Models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. p. 683-692 doi: 10.48550/arXiv.2401.12078, 10.1145/3616855.3635818
Wallat, Jonas ; Jatowt, Adam ; Anand, Avishek. / Temporal Blind Spots in Large Language Models. Proceedings of the 17th ACM International Conference on Web Search and Data Mining: WSDM ’24. 2024. pp. 683-692
BibTeX
@inproceedings{2eff110b72264957b1a5f241c1054d04,
title = "Temporal Blind Spots in Large Language Models",
abstract = "Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.",
keywords = "large language models, question answering, temporal information retrieval, temporal query intents",
author = "Jonas Wallat and Adam Jatowt and Avishek Anand",
note = "Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052 ; 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 ; Conference date: 04-03-2024 Through 08-03-2024",
year = "2024",
month = mar,
day = "4",
doi = "10.48550/arXiv.2401.12078",
language = "English",
pages = "683--692",
booktitle = "Proceedings of the 17th ACM International Conference on Web Search and Data Mining",

}

RIS

TY - GEN

T1 - Temporal Blind Spots in Large Language Models

AU - Wallat, Jonas

AU - Jatowt, Adam

AU - Anand, Avishek

N1 - Funding Information: This research was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under the project LeibnizKILabor with grant No. 01DD20003 and Cubra with grant No. 13N16052

PY - 2024/3/4

Y1 - 2024/3/4

N2 - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

AB - Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.

KW - large language models

KW - question answering

KW - temporal information retrieval

KW - temporal query intents

UR - http://www.scopus.com/inward/record.url?scp=85191716137&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2401.12078

DO - 10.48550/arXiv.2401.12078

M3 - Conference contribution

AN - SCOPUS:85191716137

SP - 683

EP - 692

BT - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

T2 - 17th ACM International Conference on Web Search and Data Mining, WSDM 2024

Y2 - 4 March 2024 through 8 March 2024

ER -
