BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Jonas Wallat; Jaspreet Singh; Avishek Anand

doi:10.18653/v1/2020.blackboxnlp-1.17

Details

Original language	English
Title of host publication	Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Pages	174–183
ISBN (electronic)	978-1-952148-86-6
Publication status	Published - Nov 2020

Abstract

Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17–60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting depends on the fine-tuning objective and not on the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on GitHub so that the experiments can be repeated.
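
To make the "share of total knowledge" figure more concrete, the following is a minimal sketch of one way such a number could be computed. It assumes that "total knowledge" means the set of facts answered correctly at any probed layer, and that the intermediate-layer contribution is the part of that set the final layer misses. This reading is an assumption made for illustration, not a definition taken from this record. The function name `intermediate_share`, the layer indices, and the toy facts are all hypothetical.

```python
from typing import Dict, Set


def intermediate_share(correct_by_layer: Dict[int, Set[str]], final_layer: int) -> float:
    """Fraction of facts recovered at any layer that the final layer does not recover.

    correct_by_layer maps a layer index to the set of fact IDs whose probe
    prediction was correct at that layer (hypothetical input format).
    """
    # "Total knowledge" (assumed definition): union of correct facts over all layers.
    total = set().union(*correct_by_layer.values())
    if not total:
        return 0.0
    # Facts some intermediate layer gets right but the final layer gets wrong.
    only_intermediate = total - correct_by_layer[final_layer]
    return len(only_intermediate) / len(total)


# Made-up toy data for three layers; not results from the paper.
toy = {
    10: {"capital_of_France", "birthplace_of_Dante"},
    11: {"capital_of_France", "maker_of_iPod"},
    12: {"capital_of_France", "maker_of_iPod", "river_through_Cologne"},
}
share = intermediate_share(toy, final_layer=12)
print(f"intermediate-only share: {share:.2f}")
```

In this toy input, one of the four distinct facts is recovered only below the final layer, so probing the final layer alone would understate what the model captures.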

Cite this

BERTnesia: Investigating the capture and forgetting of knowledge in BERT. / Wallat, Jonas; Singh, Jaspreet; Anand, Avishek.
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. 2020. p. 174–183.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Wallat, J, Singh, J & Anand, A 2020, BERTnesia: Investigating the capture and forgetting of knowledge in BERT. in Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. pp. 174–183. https://doi.org/10.18653/v1/2020.blackboxnlp-1.17

Wallat, J., Singh, J., & Anand, A. (2020). BERTnesia: Investigating the capture and forgetting of knowledge in BERT. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (pp. 174–183). https://doi.org/10.18653/v1/2020.blackboxnlp-1.17

Wallat J, Singh J, Anand A. BERTnesia: Investigating the capture and forgetting of knowledge in BERT. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. 2020. p. 174–183. Epub 2020 Oct 19. doi: 10.18653/v1/2020.blackboxnlp-1.17

Wallat, Jonas ; Singh, Jaspreet ; Anand, Avishek. / BERTnesia: Investigating the capture and forgetting of knowledge in BERT. Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. 2020. pp. 174–183.

@inproceedings{50ed398836314e91a7b9485636b979c1,
title = "BERTnesia: Investigating the capture and forgetting of knowledge in BERT",
abstract = "Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on github to repeat the experiments.",
keywords = "cs.CL, cs.LG, I.2.7",
author = "Jonas Wallat and Jaspreet Singh and Avishek Anand",
note = "BBNLP 2020",
year = "2020",
month = nov,
doi = "10.18653/v1/2020.blackboxnlp-1.17",
language = "English",
pages = "174--183",
booktitle = "Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP",
}

TY  - GEN
T1  - BERTnesia: Investigating the capture and forgetting of knowledge in BERT
AU  - Wallat, Jonas
AU  - Singh, Jaspreet
AU  - Anand, Avishek
N1  - BBNLP 2020
PY  - 2020/11
Y1  - 2020/11
AB  - Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on github to repeat the experiments.
KW  - cs.CL
KW  - cs.LG
KW  - I.2.7
U2  - 10.18653/v1/2020.blackboxnlp-1.17
DO  - 10.18653/v1/2020.blackboxnlp-1.17
M3  - Conference contribution
SP  - 174
EP  - 183
BT  - Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
ER  - 

By the same authors

Temporal Blind Spots in Large Language Models

Probing BERT for Ranking Abilities

GENEMASK: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning

Causal Probing for Dual Encoders
