
Rethinking Evaluation Methods for Machine Unlearning

Publication: Chapter in book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Leon Wichert
  • Sandipan Sikdar

Organizational units

Details

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle: EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Pages: 4727-4739
Number of pages: 13
ISBN (electronic): 9798891761681
Publication status: Published - 12 Nov 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 - 16 Nov 2024

Abstract

Machine *unlearning* refers to methods for deleting information about specific training instances from a trained machine learning model. This enables models to delete user information and comply with privacy regulations. While retraining the model from scratch on the training set, excluding the instances to be “forgotten”, would yield the desired unlearned model, this is infeasible owing to the size of datasets and models. Hence, unlearning algorithms have been developed, with the goal of obtaining an unlearned model that behaves as closely as possible to the retrained model. Consequently, evaluating an unlearning method involves (i) randomly selecting a forget set (i.e., the training instances to be unlearned), (ii) obtaining an unlearned and a retrained model, and (iii) comparing the performance of the unlearned and the retrained model on the test and forget sets. However, when the forget set is randomly selected, the unlearned model is almost always similar to the original (i.e., pre-unlearning) model. Hence, it is unclear whether the model really unlearned the instances or simply copied the weights from the original model. For a more robust evaluation, we instead propose to consider training instances with significant influence on the trained model. When such influential instances are included in the forget set, we observe that the unlearned model deviates significantly from the retrained model. Such deviations are also observed when the size of the forget set is increased. Lastly, the choice of evaluation dataset can also lead to misleading interpretations of the results.
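The protocol described in the abstract (pick a forget set, unlearn, retrain, compare the two models) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the precomputed influence scores, and the simple disagreement metric are illustrative placeholders; the paper's actual influence estimation and comparison measures are more involved.

```python
import random

def select_forget_set(influence_scores, k, strategy="influential"):
    """Pick k training indices to unlearn.

    strategy="random" mirrors the common evaluation setup the paper
    critiques; strategy="influential" picks the k instances with the
    highest (hypothetical, precomputed) influence on the trained model,
    in the spirit of what the paper proposes.
    """
    indices = list(range(len(influence_scores)))
    if strategy == "random":
        return set(random.sample(indices, k))
    # Rank instances by influence score, highest first.
    ranked = sorted(indices, key=lambda i: influence_scores[i], reverse=True)
    return set(ranked[:k])

def model_disagreement(preds_a, preds_b):
    """Fraction of instances on which two models' predictions differ:
    a crude stand-in for comparing the unlearned model against the
    retrained reference model on the test and forget sets."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Illustrative usage with made-up scores and predictions:
scores = [0.9, 0.1, 0.5, 0.7]
forget = select_forget_set(scores, 2)           # {0, 3}: most influential
gap = model_disagreement([1, 0, 1, 1], [1, 1, 1, 0])  # 0.5
```

Under random selection the forget set rarely contains influential points, so the disagreement between the unlearned and original models stays near zero; selecting by influence is what makes the comparison informative.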

ASJC Scopus subject areas

Cite

Rethinking Evaluation Methods for Machine Unlearning. / Wichert, Leon; Sikdar, Sandipan.
Findings of the Association for Computational Linguistics: EMNLP 2024. Ed. / Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen. 2024. pp. 4727-4739.

Publication: Chapter in book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Wichert, L & Sikdar, S 2024, Rethinking Evaluation Methods for Machine Unlearning. in Y Al-Onaizan, M Bansal & Y-N Chen (eds), Findings of the Association for Computational Linguistics: EMNLP 2024. pp. 4727-4739, 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Hybrid, Miami, United States, 12 Nov 2024. https://doi.org/10.18653/v1/2024.findings-emnlp.271
Wichert, L., & Sikdar, S. (2024). Rethinking Evaluation Methods for Machine Unlearning. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 4727-4739). https://doi.org/10.18653/v1/2024.findings-emnlp.271
Wichert L, Sikdar S. Rethinking Evaluation Methods for Machine Unlearning. In Al-Onaizan Y, Bansal M, Chen YN, editors, Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. p. 4727-4739. doi: 10.18653/v1/2024.findings-emnlp.271
Wichert, Leon ; Sikdar, Sandipan. / Rethinking Evaluation Methods for Machine Unlearning. Findings of the Association for Computational Linguistics: EMNLP 2024. Ed. / Yaser Al-Onaizan ; Mohit Bansal ; Yun-Nung Chen. 2024. pp. 4727-4739
BibTeX
@inproceedings{a7d13687cdba42899327e26944099c2a,
title = "Rethinking Evaluation Methods for Machine Unlearning",
abstract = "Machine *unlearning* refers to methods for deleting information about specific training instances from a trained machine learning model. This enables models to delete user information and comply with privacy regulations. While retraining the model from scratch on the training set, excluding the instances to be “forgotten”, would yield the desired unlearned model, this is infeasible owing to the size of datasets and models. Hence, unlearning algorithms have been developed, with the goal of obtaining an unlearned model that behaves as closely as possible to the retrained model. Consequently, evaluating an unlearning method involves (i) randomly selecting a forget set (i.e., the training instances to be unlearned), (ii) obtaining an unlearned and a retrained model, and (iii) comparing the performance of the unlearned and the retrained model on the test and forget sets. However, when the forget set is randomly selected, the unlearned model is almost always similar to the original (i.e., pre-unlearning) model. Hence, it is unclear whether the model really unlearned the instances or simply copied the weights from the original model. For a more robust evaluation, we instead propose to consider training instances with significant influence on the trained model. When such influential instances are included in the forget set, we observe that the unlearned model deviates significantly from the retrained model. Such deviations are also observed when the size of the forget set is increased. Lastly, the choice of evaluation dataset can also lead to misleading interpretations of the results.",
author = "Leon Wichert and Sandipan Sikdar",
note = "Publisher Copyright: {\textcopyright} 2024 Association for Computational Linguistics.; 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 ; Conference date: 12-11-2024 Through 16-11-2024",
year = "2024",
month = nov,
day = "12",
doi = "10.18653/v1/2024.findings-emnlp.271",
language = "English",
pages = "4727--4739",
editor = "Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen",
booktitle = "Findings of the Association for Computational Linguistics",

}

RIS

TY - GEN

T1 - Rethinking Evaluation Methods for Machine Unlearning

AU - Wichert, Leon

AU - Sikdar, Sandipan

N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.

PY - 2024/11/12

Y1 - 2024/11/12

N2 - Machine *unlearning* refers to methods for deleting information about specific training instances from a trained machine learning model. This enables models to delete user information and comply with privacy regulations. While retraining the model from scratch on the training set, excluding the instances to be “forgotten”, would yield the desired unlearned model, this is infeasible owing to the size of datasets and models. Hence, unlearning algorithms have been developed, with the goal of obtaining an unlearned model that behaves as closely as possible to the retrained model. Consequently, evaluating an unlearning method involves (i) randomly selecting a forget set (i.e., the training instances to be unlearned), (ii) obtaining an unlearned and a retrained model, and (iii) comparing the performance of the unlearned and the retrained model on the test and forget sets. However, when the forget set is randomly selected, the unlearned model is almost always similar to the original (i.e., pre-unlearning) model. Hence, it is unclear whether the model really unlearned the instances or simply copied the weights from the original model. For a more robust evaluation, we instead propose to consider training instances with significant influence on the trained model. When such influential instances are included in the forget set, we observe that the unlearned model deviates significantly from the retrained model. Such deviations are also observed when the size of the forget set is increased. Lastly, the choice of evaluation dataset can also lead to misleading interpretations of the results.

AB - Machine *unlearning* refers to methods for deleting information about specific training instances from a trained machine learning model. This enables models to delete user information and comply with privacy regulations. While retraining the model from scratch on the training set, excluding the instances to be “forgotten”, would yield the desired unlearned model, this is infeasible owing to the size of datasets and models. Hence, unlearning algorithms have been developed, with the goal of obtaining an unlearned model that behaves as closely as possible to the retrained model. Consequently, evaluating an unlearning method involves (i) randomly selecting a forget set (i.e., the training instances to be unlearned), (ii) obtaining an unlearned and a retrained model, and (iii) comparing the performance of the unlearned and the retrained model on the test and forget sets. However, when the forget set is randomly selected, the unlearned model is almost always similar to the original (i.e., pre-unlearning) model. Hence, it is unclear whether the model really unlearned the instances or simply copied the weights from the original model. For a more robust evaluation, we instead propose to consider training instances with significant influence on the trained model. When such influential instances are included in the forget set, we observe that the unlearned model deviates significantly from the retrained model. Such deviations are also observed when the size of the forget set is increased. Lastly, the choice of evaluation dataset can also lead to misleading interpretations of the results.

UR - http://www.scopus.com/inward/record.url?scp=85217615154&partnerID=8YFLogxK

U2 - 10.18653/v1/2024.findings-emnlp.271

DO - 10.18653/v1/2024.findings-emnlp.271

M3 - Conference contribution

AN - SCOPUS:85217615154

SP - 4727

EP - 4739

BT - Findings of the Association for Computational Linguistics

A2 - Al-Onaizan, Yaser

A2 - Bansal, Mohit

A2 - Chen, Yun-Nung

T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Y2 - 12 November 2024 through 16 November 2024

ER -