KAMEL: Knowledge Analysis with Multitoken Entities in Language Models

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

External organisations

  • Technische Universität Braunschweig

Details

Original language: English
Title of host publication: Conference on Automated Knowledge Base Construction
Publication status: Published - 2022
Published externally: Yes

Abstract

Large language models (LMs) have been shown to capture large amounts of relational knowledge from their pre-training corpus. These models can be probed for this factual knowledge using cloze-style prompts, as demonstrated on the LAMA benchmark. However, recent studies have uncovered that models perform well only because they are good at making educated guesses or at recalling facts from the training data. We present a novel Wikidata-based benchmark dataset, KAMEL, for probing relational knowledge in LMs. In contrast to previous datasets, it covers a broader range of knowledge, probes for single- and multi-token entities, and contains facts with literal values. Furthermore, the evaluation procedure is more accurate, since the dataset contains alternative entity labels and deals with higher-cardinality relations. Instead of performing the evaluation on masked language models, we present results for a variety of recent causal LMs in a few-shot setting. We show that recent models indeed perform very well on LAMA, achieving a promising F1-score of 52.90%, while achieving only 17.62% on KAMEL. Our analysis shows that even large language models are far from being able to memorize all the varieties of relational knowledge that are usually stored in knowledge graphs.
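The abstract describes two methodological points: probing causal LMs with few-shot prompts (rather than masked-token cloze prompts) and scoring predictions against alternative entity labels and higher-cardinality relations. A minimal sketch of both ideas, assuming illustrative function names, templates, and example facts that are not taken from the authors' actual code:

```python
# Hypothetical sketch of KAMEL-style few-shot probing and evaluation.
# build_prompt, evaluate_prediction, and the example facts are
# illustrative assumptions, not the paper's implementation.

def build_prompt(demonstrations, query_subject, template):
    """Build a few-shot prompt for a causal LM: each demonstration
    fact becomes one completed line, the query line is left open."""
    lines = [template.format(subject=s) + " " + o for s, o in demonstrations]
    lines.append(template.format(subject=query_subject))
    return "\n".join(lines)

def evaluate_prediction(prediction, gold_objects):
    """Count a prediction as correct if it matches ANY alternative
    label of ANY gold object; higher-cardinality relations simply
    contribute several gold objects, each with its own label list."""
    norm = prediction.strip().lower()
    return any(norm == alias.strip().lower()
               for aliases in gold_objects for alias in aliases)

demos = [("Germany", "Berlin"), ("France", "Paris")]
prompt = build_prompt(demos, "Japan", "The capital of {subject} is")
# One gold object ("Tokyo") with an alternative entity label.
gold = [["Tokyo", "Tōkyō"]]
print(prompt)
print(evaluate_prediction("Tokyo", gold))   # True
```

The prompt string would be fed to a causal LM and the model's completion compared with `evaluate_prediction`; multi-token entities fall out naturally here because the model generates free text rather than a single masked token.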

Cite

KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. / Kalo, Jan-Christoph; Fichtel, Leandra.
Conference on Automated Knowledge Base Construction. 2022.


Kalo, J-C & Fichtel, L 2022, KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. in Conference on Automated Knowledge Base Construction.
Kalo, J.-C., & Fichtel, L. (2022). KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. In Conference on Automated Knowledge Base Construction
Kalo JC, Fichtel L. KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. in Conference on Automated Knowledge Base Construction. 2022
Kalo, Jan-Christoph ; Fichtel, Leandra. / KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. Conference on Automated Knowledge Base Construction. 2022.
BibTeX
@inproceedings{067240ed740b4ecbac2e523dd2353af4,
title = "KAMEL: Knowledge Analysis with Multitoken Entities in Language Models",
abstract = "Large language models (LMs) have been shown to capture large amounts of relational knowledge from their pre-training corpus. These models can be probed for this factual knowledge using cloze-style prompts, as demonstrated on the LAMA benchmark. However, recent studies have uncovered that models perform well only because they are good at making educated guesses or at recalling facts from the training data. We present a novel Wikidata-based benchmark dataset, KAMEL, for probing relational knowledge in LMs. In contrast to previous datasets, it covers a broader range of knowledge, probes for single- and multi-token entities, and contains facts with literal values. Furthermore, the evaluation procedure is more accurate, since the dataset contains alternative entity labels and deals with higher-cardinality relations. Instead of performing the evaluation on masked language models, we present results for a variety of recent causal LMs in a few-shot setting. We show that recent models indeed perform very well on LAMA, achieving a promising F1-score of 52.90%, while achieving only 17.62% on KAMEL. Our analysis shows that even large language models are far from being able to memorize all the varieties of relational knowledge that are usually stored in knowledge graphs.",
author = "Jan-Christoph Kalo and Leandra Fichtel",
year = "2022",
language = "English",
booktitle = "Conference on Automated Knowledge Base Construction",

}

RIS

TY - GEN

T1 - KAMEL: Knowledge Analysis with Multitoken Entities in Language Models

AU - Kalo, Jan-Christoph

AU - Fichtel, Leandra

PY - 2022

Y1 - 2022

N2 - Large language models (LMs) have been shown to capture large amounts of relational knowledge from their pre-training corpus. These models can be probed for this factual knowledge using cloze-style prompts, as demonstrated on the LAMA benchmark. However, recent studies have uncovered that models perform well only because they are good at making educated guesses or at recalling facts from the training data. We present a novel Wikidata-based benchmark dataset, KAMEL, for probing relational knowledge in LMs. In contrast to previous datasets, it covers a broader range of knowledge, probes for single- and multi-token entities, and contains facts with literal values. Furthermore, the evaluation procedure is more accurate, since the dataset contains alternative entity labels and deals with higher-cardinality relations. Instead of performing the evaluation on masked language models, we present results for a variety of recent causal LMs in a few-shot setting. We show that recent models indeed perform very well on LAMA, achieving a promising F1-score of 52.90%, while achieving only 17.62% on KAMEL. Our analysis shows that even large language models are far from being able to memorize all the varieties of relational knowledge that are usually stored in knowledge graphs.

AB - Large language models (LMs) have been shown to capture large amounts of relational knowledge from their pre-training corpus. These models can be probed for this factual knowledge using cloze-style prompts, as demonstrated on the LAMA benchmark. However, recent studies have uncovered that models perform well only because they are good at making educated guesses or at recalling facts from the training data. We present a novel Wikidata-based benchmark dataset, KAMEL, for probing relational knowledge in LMs. In contrast to previous datasets, it covers a broader range of knowledge, probes for single- and multi-token entities, and contains facts with literal values. Furthermore, the evaluation procedure is more accurate, since the dataset contains alternative entity labels and deals with higher-cardinality relations. Instead of performing the evaluation on masked language models, we present results for a variety of recent causal LMs in a few-shot setting. We show that recent models indeed perform very well on LAMA, achieving a promising F1-score of 52.90%, while achieving only 17.62% on KAMEL. Our analysis shows that even large language models are far from being able to memorize all the varieties of relational knowledge that are usually stored in knowledge graphs.

M3 - Conference contribution

BT - Conference on Automated Knowledge Base Construction

ER -
