Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study

Lorena Poenaru-Olaru; Luis Miranda da Cruz; Arie Van Deursen; Jan S. Rellermeyer

doi:10.48550/arXiv.2211.13098

Details

Original language	English
Title of host publication	Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors	Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3364-3373
Number of pages	10
ISBN (electronic)	9781665480451
ISBN (print)	978-1-6654-8046-8
Publication status	Published - 2022
Event	2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration	17 Dec 2022 → 20 Dec 2022

Abstract

As machine learning models increasingly replace traditional business logic in production systems, their lifecycle management is becoming a significant concern. Once deployed into production, machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models; thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess how reliably concept drift detectors identify drift in time by exploring how late they report drifts and how many false alarms they signal. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups: error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate how well the detectors identify two types of concept drift: abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations. To this end, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as an alarming system.
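
As a rough illustration of the two quantities the abstract mentions (how late a drift is reported and how many false alarms are raised), the sketch below scores a list of alarm positions against known drift positions. It is not the paper's evaluation protocol: the function name `evaluate_alarms`, the matching rule (the first alarm after a drift counts as its detection, and any other alarm counts as false), and the sample positions are all assumptions made for illustration.

```python
from bisect import bisect_right


def evaluate_alarms(true_drifts, alarms):
    """Score alarm positions against known drift positions (hypothetical rule).

    The first alarm raised at or after a drift, and before the next drift,
    is treated as the detection of that drift. Alarms raised before the
    first drift, or repeated within the same segment, count as false alarms.
    Returns (delays, false_alarms), where delays maps each drift position to
    its detection delay in samples, or None if the drift was never reported.
    """
    true_drifts = sorted(true_drifts)
    delays = {d: None for d in true_drifts}
    false_alarms = 0
    for alarm in sorted(alarms):
        # Index of the most recent drift at or before this alarm.
        i = bisect_right(true_drifts, alarm) - 1
        if i < 0:
            false_alarms += 1  # alarm raised before any drift happened
            continue
        drift = true_drifts[i]
        if delays[drift] is None:
            delays[drift] = alarm - drift  # first alarm after the drift
        else:
            false_alarms += 1  # drift was already reported
    return delays, false_alarms


# Example with made-up positions: drifts at samples 1000 and 2000.
delays, false_alarms = evaluate_alarms([1000, 2000], [400, 1050, 1500, 2300])
# delays == {1000: 50, 2000: 300}, false_alarms == 2
```

Under this matching rule a detector that fires often tends to show short delays at the cost of more false alarms, which is the trade-off an alarming system has to balance.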

Keywords

concept drift detection, machine learning lifecycle management

ASJC Scopus subject areas

Mathematics(all): Modelling and Simulation
Computer Science(all): Computer Networks and Communications
Computer Science(all): Information Systems
Decision Sciences(all): Information Systems and Management
Engineering(all): Safety, Risk, Reliability and Quality
Mathematics(all): Control and Optimization

Cite this

Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. / Poenaru-Olaru, Lorena; Miranda da Cruz, Luis; Van Deursen, Arie et al.
Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. ed. / Shusaku Tsumoto; Yukio Ohsawa; Lei Chen; Dirk Van den Poel; Xiaohua Hu; Yoichi Motomura; Takuya Takagi; Lingfei Wu; Ying Xie; Akihiro Abe; Vijay Raghavan. Institute of Electrical and Electronics Engineers Inc., 2022. p. 3364-3373.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Poenaru-Olaru, L, Miranda da Cruz, L, Van Deursen, A & Rellermeyer, JS 2022, Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. in S Tsumoto, Y Ohsawa, L Chen, D Van den Poel, X Hu, Y Motomura, T Takagi, L Wu, Y Xie, A Abe & V Raghavan (eds), Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. Institute of Electrical and Electronics Engineers Inc., pp. 3364-3373, 2022 IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, 17 Dec 2022. https://doi.org/10.48550/arXiv.2211.13098, https://doi.org/10.1109/BigData55660.2022.10020292

Poenaru-Olaru, L., Miranda da Cruz, L., Van Deursen, A., & Rellermeyer, J. S. (2022). Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. In S. Tsumoto, Y. Ohsawa, L. Chen, D. Van den Poel, X. Hu, Y. Motomura, T. Takagi, L. Wu, Y. Xie, A. Abe, & V. Raghavan (Eds.), Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022 (pp. 3364-3373). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2211.13098, https://doi.org/10.1109/BigData55660.2022.10020292

Poenaru-Olaru L, Miranda da Cruz L, Van Deursen A, Rellermeyer JS. Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. In Tsumoto S, Ohsawa Y, Chen L, Van den Poel D, Hu X, Motomura Y, Takagi T, Wu L, Xie Y, Abe A, Raghavan V, editors, Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 3364-3373 doi: 10.48550/arXiv.2211.13098, 10.1109/BigData55660.2022.10020292

Poenaru-Olaru, Lorena ; Miranda da Cruz, Luis ; Van Deursen, Arie et al. / Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. editor / Shusaku Tsumoto ; Yukio Ohsawa ; Lei Chen ; Dirk Van den Poel ; Xiaohua Hu ; Yoichi Motomura ; Takuya Takagi ; Lingfei Wu ; Ying Xie ; Akihiro Abe ; Vijay Raghavan. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 3364-3373

@inproceedings{641ffe08f8e2489fadb846691ba98ec0,

title = "Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study",

abstract = "As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models, thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess the reliability of concept drift detectors to identify drift in time by exploring how late are they reporting drifts and how many false alarms are they signaling. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate the performance of detectors to identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations and, to achieve this, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as an alarming system.",

keywords = "concept drift detection, machine learning lifecycle management",

author = "Lorena Poenaru-Olaru and {Miranda da Cruz}, Luis and {Van Deursen}, Arie and Rellermeyer, {Jan S.}",

note = "Funding Information: This work was partially supported by ING through the AI for Fintech Research Lab with TU Delft. ; 2022 IEEE International Conference on Big Data, Big Data 2022 ; Conference date: 17-12-2022 Through 20-12-2022",

year = "2022",

doi = "10.48550/arXiv.2211.13098",

language = "English",

isbn = "978-1-6654-8046-8",

pages = "3364--3373",

editor = "Shusaku Tsumoto and Yukio Ohsawa and Lei Chen and {Van den Poel}, Dirk and Xiaohua Hu and Yoichi Motomura and Takuya Takagi and Lingfei Wu and Ying Xie and Akihiro Abe and Vijay Raghavan",

booktitle = "Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

address = "United States",

}

TY - GEN

T1 - Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study

AU - Poenaru-Olaru, Lorena

AU - Miranda da Cruz, Luis

AU - Van Deursen, Arie

AU - Rellermeyer, Jan S.

N1 - Funding Information: This work was partially supported by ING through the AI for Fintech Research Lab with TU Delft.

PY - 2022

Y1 - 2022

N2 - As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models, thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess the reliability of concept drift detectors to identify drift in time by exploring how late are they reporting drifts and how many false alarms are they signaling. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate the performance of detectors to identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations and, to achieve this, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as an alarming system.

AB - As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models, thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess the reliability of concept drift detectors to identify drift in time by exploring how late are they reporting drifts and how many false alarms are they signaling. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate the performance of detectors to identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations and, to achieve this, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as an alarming system.

KW - concept drift detection

KW - machine learning lifecycle management

UR - http://www.scopus.com/inward/record.url?scp=85147976931&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2211.13098

DO - 10.48550/arXiv.2211.13098

M3 - Conference contribution

AN - SCOPUS:85147976931

SN - 978-1-6654-8046-8

SP - 3364

EP - 3373

BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

A2 - Tsumoto, Shusaku

A2 - Ohsawa, Yukio

A2 - Chen, Lei

A2 - Van den Poel, Dirk

A2 - Hu, Xiaohua

A2 - Motomura, Yoichi

A2 - Takagi, Takuya

A2 - Wu, Lingfei

A2 - Xie, Ying

A2 - Abe, Akihiro

A2 - Raghavan, Vijay

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Conference on Big Data, Big Data 2022

Y2 - 17 December 2022 through 20 December 2022

ER -

By the same author(s)

Toward Competitive Serverless Deep Learning

The Performance of Distributed Applications: A Traffic Shaping Perspective

Log Parsing Evaluation in the Era of Modern Software Systems

Brug: An Adaptive Memory (Re-)Allocator

Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World
