Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Lorena Poenaru-Olaru
  • Luis Miranda da Cruz
  • Arie Van Deursen
  • Jan S. Rellermeyer

External Research Organisations

  • Delft University of Technology
Details

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3364-3373
Number of pages: 10
ISBN (electronic): 9781665480451
ISBN (print): 978-1-6654-8046-8
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: 17 Dec 2022 to 20 Dec 2022

Abstract

As machine learning models increasingly replace traditional business logic in production systems, their lifecycle management is becoming a significant concern. Once deployed into production, machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually affects the performance of machine learning models, so the moment at which it occurs must be identified. Concept drift is identified through concept drift detectors. In this work, we assess how reliably concept drift detectors identify drift in time by examining how late they report drifts and how many false alarms they signal. We compare the performance of the most popular drift detectors from two concept drift detector groups: error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. On synthetic data, we investigate how well the detectors identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector to employ in different situations. To this end, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group as an alarming system.
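
The two reliability measures named in the abstract, how late a drift is reported and how many false alarms are raised, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `SimpleDDM` is a simplified error rate-based detector in the style of DDM, the stream is a deterministic toy error sequence with one abrupt drift, and `evaluate_alarms` uses one plausible definition of delay and false alarms that may differ from the study's own metrics.

```python
import math


class SimpleDDM:
    """Simplified DDM-style detector over 0/1 prediction errors (illustrative)."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed one error indicator (1 = misclassified); return True on drift."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)  # its standard deviation
        if self.n < self.min_samples:
            return False
        if p + s <= self.p_min + self.s_min:   # remember the best state seen
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                       # start over after an alarm
            return True
        return False


def run_detector(detector, errors):
    """Return the stream indices at which the detector raised an alarm."""
    return [i for i, e in enumerate(errors) if detector.update(e)]


def evaluate_alarms(detections, drift_at):
    """Assumed metrics: alarms before the drift are false alarms; the delay
    is the distance from the drift to the first alarm at or after it."""
    false_alarms = sum(1 for d in detections if d < drift_at)
    on_time = [d for d in detections if d >= drift_at]
    delay = on_time[0] - drift_at if on_time else None
    return delay, false_alarms


# Toy stream: 10% error rate before the drift, 50% after (abrupt drift).
DRIFT_AT = 1000
errors = [1 if i % 10 == 9 else 0 for i in range(DRIFT_AT)]
errors += [1 if i % 2 == 1 else 0 for i in range(DRIFT_AT, 2000)]

detections = run_detector(SimpleDDM(), errors)
delay, false_alarms = evaluate_alarms(detections, DRIFT_AT)
print(f"detections={detections}, delay={delay}, false_alarms={false_alarms}")
```

A data distribution-based detector could be plugged into the same harness by feeding it feature values instead of error indicators, and a gradual drift could be simulated by mixing the two error patterns over a transition window.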

Keywords

    concept drift detection, machine learning lifecycle management

Cite this

Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. / Poenaru-Olaru, Lorena; Miranda da Cruz, Luis; Van Deursen, Arie et al.
Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. ed. / Shusaku Tsumoto; Yukio Ohsawa; Lei Chen; Dirk Van den Poel; Xiaohua Hu; Yoichi Motomura; Takuya Takagi; Lingfei Wu; Ying Xie; Akihiro Abe; Vijay Raghavan. Institute of Electrical and Electronics Engineers Inc., 2022. p. 3364-3373.

Poenaru-Olaru, L, Miranda da Cruz, L, Van Deursen, A & Rellermeyer, JS 2022, Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. in S Tsumoto, Y Ohsawa, L Chen, D Van den Poel, X Hu, Y Motomura, T Takagi, L Wu, Y Xie, A Abe & V Raghavan (eds), Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. Institute of Electrical and Electronics Engineers Inc., pp. 3364-3373, 2022 IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, 17 Dec 2022. https://doi.org/10.48550/arXiv.2211.13098, https://doi.org/10.1109/BigData55660.2022.10020292
Poenaru-Olaru, L., Miranda da Cruz, L., Van Deursen, A., & Rellermeyer, J. S. (2022). Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. In S. Tsumoto, Y. Ohsawa, L. Chen, D. Van den Poel, X. Hu, Y. Motomura, T. Takagi, L. Wu, Y. Xie, A. Abe, & V. Raghavan (Eds.), Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022 (pp. 3364-3373). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2211.13098, https://doi.org/10.1109/BigData55660.2022.10020292
Poenaru-Olaru L, Miranda da Cruz L, Van Deursen A, Rellermeyer JS. Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. In Tsumoto S, Ohsawa Y, Chen L, Van den Poel D, Hu X, Motomura Y, Takagi T, Wu L, Xie Y, Abe A, Raghavan V, editors, Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 3364-3373 doi: 10.48550/arXiv.2211.13098, 10.1109/BigData55660.2022.10020292
Poenaru-Olaru, Lorena ; Miranda da Cruz, Luis ; Van Deursen, Arie et al. / Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022. editor / Shusaku Tsumoto ; Yukio Ohsawa ; Lei Chen ; Dirk Van den Poel ; Xiaohua Hu ; Yoichi Motomura ; Takuya Takagi ; Lingfei Wu ; Ying Xie ; Akihiro Abe ; Vijay Raghavan. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 3364-3373
@inproceedings{641ffe08f8e2489fadb846691ba98ec0,
title = "Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study",
abstract = "As machine learning models increasingly replace traditional business logic in production systems, their lifecycle management is becoming a significant concern. Once deployed into production, machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually affects the performance of machine learning models, so the moment at which it occurs must be identified. Concept drift is identified through concept drift detectors. In this work, we assess how reliably concept drift detectors identify drift in time by examining how late they report drifts and how many false alarms they signal. We compare the performance of the most popular drift detectors from two concept drift detector groups: error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. On synthetic data, we investigate how well the detectors identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector to employ in different situations. To this end, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group as an alarming system.",
keywords = "concept drift detection, machine learning lifecycle management",
author = "Lorena Poenaru-Olaru and {Miranda da Cruz}, Luis and {Van Deursen}, Arie and Rellermeyer, {Jan S.}",
note = "Funding Information: ACKNOWLEDGMENT This work was partially supported by ING through the AI for Fintech Research Lab with TU Delft. ; 2022 IEEE International Conference on Big Data, Big Data 2022 ; Conference date: 17-12-2022 Through 20-12-2022",
year = "2022",
doi = "10.48550/arXiv.2211.13098",
language = "English",
isbn = "978-1-6654-8046-8",
pages = "3364--3373",
editor = "Shusaku Tsumoto and Yukio Ohsawa and Lei Chen and {Van den Poel}, Dirk and Xiaohua Hu and Yoichi Motomura and Takuya Takagi and Lingfei Wu and Ying Xie and Akihiro Abe and Vijay Raghavan",
booktitle = "Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

TY - GEN

T1 - Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study

AU - Poenaru-Olaru, Lorena

AU - Miranda da Cruz, Luis

AU - Van Deursen, Arie

AU - Rellermeyer, Jan S.

N1 - Funding Information: ACKNOWLEDGMENT This work was partially supported by ING through the AI for Fintech Research Lab with TU Delft.

PY - 2022

Y1 - 2022

N2 - As machine learning models increasingly replace traditional business logic in production systems, their lifecycle management is becoming a significant concern. Once deployed into production, machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually affects the performance of machine learning models, so the moment at which it occurs must be identified. Concept drift is identified through concept drift detectors. In this work, we assess how reliably concept drift detectors identify drift in time by examining how late they report drifts and how many false alarms they signal. We compare the performance of the most popular drift detectors from two concept drift detector groups: error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. On synthetic data, we investigate how well the detectors identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector to employ in different situations. To this end, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group as an alarming system.

KW - concept drift detection

KW - machine learning lifecycle management

UR - http://www.scopus.com/inward/record.url?scp=85147976931&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2211.13098

DO - 10.48550/arXiv.2211.13098

M3 - Conference contribution

AN - SCOPUS:85147976931

SN - 978-1-6654-8046-8

SP - 3364

EP - 3373

BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

A2 - Tsumoto, Shusaku

A2 - Ohsawa, Yukio

A2 - Chen, Lei

A2 - Van den Poel, Dirk

A2 - Hu, Xiaohua

A2 - Motomura, Yoichi

A2 - Takagi, Takuya

A2 - Wu, Lingfei

A2 - Xie, Ying

A2 - Abe, Akihiro

A2 - Raghavan, Vijay

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Conference on Big Data, Big Data 2022

Y2 - 17 December 2022 through 20 December 2022

ER -
