An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class

Publication: Contribution to book/report/anthology/conference proceedings; Conference paper; Research; Peer-reviewed

Authors

  • Sajjad Kamali Siahroudi
  • Daniel Kudenko

Details

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle: 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I
Editors: Kamal Karlapalem, Hong Cheng, Naren Ramakrishnan, R. K. Agrawal, P. Krishna Reddy, Jaideep Srivastava, Tanmoy Chakraborty
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 603-615
Number of pages: 13
ISBN (electronic): 978-3-030-75762-5
ISBN (print): 9783030757618
Publication status: Published - 9 May 2021
Event: 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021 - Virtual, Online
Duration: 11 May 2021 to 14 May 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12712 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Online learning has been one of the trending areas of machine learning in recent years. The core question in developing an online classifier is how to update the model as new data arrives. When new data arrives, the classifier should keep its model up to date by (1) learning new knowledge, (2) keeping relevant learned knowledge, and (3) forgetting obsolete knowledge. This problem becomes more challenging in imbalanced non-stationary scenarios. Previous approaches save arriving instances, then use up/down sampling techniques to balance the preserved samples and update their models. However, this strategy has two drawbacks: first, it delays model updates, and second, up/down sampling causes information loss for the majority classes and introduces noise for the minority classes. To address these drawbacks, we propose the Hyper-Ellipses-Extra-Margin model (HEEM), which addresses the class imbalance challenge in online learning by reacting to every new instance as it arrives. HEEM keeps an ensemble of hyper-extended-ellipses for the minority class. Misclassified instances of the majority class are then used to shrink the ellipse, and correctly predicted instances of the minority class are used to enlarge the ellipse. Experimental results show that HEEM mitigates the class imbalance problem and outperforms the state-of-the-art methods.
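
The shrink/enlarge rule described in the abstract can be illustrated with a toy sketch. This is not the authors' HEEM implementation: it assumes a single axis-aligned ellipse instead of an ensemble, and the class name `ToyEllipse`, the factors `shrink` and `grow`, and the choice to leave the ellipse unchanged on a missed minority instance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyEllipse:
    """Single axis-aligned hyper-ellipse around the minority class.

    Illustrative only: the paper uses an ensemble of ellipses, and the
    update factors below are assumptions, not values from the paper.
    """
    center: List[float]
    radii: List[float]
    shrink: float = 0.9   # hypothetical factor applied on majority errors
    grow: float = 1.05    # hypothetical factor applied on minority hits

    def score(self, x: Sequence[float]) -> float:
        # Squared normalized distance; values <= 1 lie inside the ellipse.
        return sum(((xi - c) / r) ** 2
                   for xi, c, r in zip(x, self.center, self.radii))

    def predict(self, x: Sequence[float]) -> int:
        # 1 = minority class (inside the ellipse), 0 = majority class.
        return 1 if self.score(x) <= 1.0 else 0

    def update(self, x: Sequence[float], y: int) -> int:
        """Predict first, then adapt immediately using the true label y."""
        y_hat = self.predict(x)
        if y == 0 and y_hat == 1:
            # Majority instance wrongly placed inside: shrink the ellipse.
            self.radii = [r * self.shrink for r in self.radii]
        elif y == 1 and y_hat == 1:
            # Minority instance correctly placed inside: enlarge the ellipse.
            self.radii = [r * self.grow for r in self.radii]
        # Other cases leave the ellipse unchanged in this sketch.
        return y_hat


# Each instance is processed as it arrives, with no buffering or resampling.
ellipse = ToyEllipse(center=[0.0, 0.0], radii=[1.0, 1.0])
stream = [([0.5, 0.0], 1), ([0.9, 0.0], 0), ([3.0, 3.0], 0)]
for x, y in stream:
    ellipse.update(x, y)
```

The point of the sketch is the per-instance update: the model changes right after each labeled instance, which is the property the abstract contrasts with approaches that store instances and resample them before updating.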

Cite this

An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class. / Siahroudi, Sajjad Kamali; Kudenko, Daniel.
Advances in Knowledge Discovery and Data Mining : 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I. Eds. / Kamal Karlapalem; Hong Cheng; Naren Ramakrishnan; R. K. Agrawal; P. Krishna Reddy; Jaideep Srivastava; Tanmoy Chakraborty. Springer Science and Business Media Deutschland GmbH, 2021. pp. 603-615 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12712 LNAI).

Siahroudi, SK & Kudenko, D 2021, An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class. in K Karlapalem, H Cheng, N Ramakrishnan, RK Agrawal, PK Reddy, J Srivastava & T Chakraborty (eds), Advances in Knowledge Discovery and Data Mining : 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12712 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 603-615, 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021, Virtual, Online, 11 May 2021. https://doi.org/10.1007/978-3-030-75762-5_48
Siahroudi, S. K., & Kudenko, D. (2021). An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class. In K. Karlapalem, H. Cheng, N. Ramakrishnan, R. K. Agrawal, P. K. Reddy, J. Srivastava, & T. Chakraborty (Eds.), Advances in Knowledge Discovery and Data Mining : 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I (pp. 603-615). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12712 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-75762-5_48
Siahroudi SK, Kudenko D. An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class. In Karlapalem K, Cheng H, Ramakrishnan N, Agrawal RK, Reddy PK, Srivastava J, Chakraborty T, editors, Advances in Knowledge Discovery and Data Mining : 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I. Springer Science and Business Media Deutschland GmbH. 2021. p. 603-615. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-75762-5_48
Siahroudi, Sajjad Kamali ; Kudenko, Daniel. / An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class. Advances in Knowledge Discovery and Data Mining : 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part I. Eds. / Kamal Karlapalem ; Hong Cheng ; Naren Ramakrishnan ; R. K. Agrawal ; P. Krishna Reddy ; Jaideep Srivastava ; Tanmoy Chakraborty. Springer Science and Business Media Deutschland GmbH, 2021. pp. 603-615 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{4ce549f7d741411d80cc747ecf3bfd9e,
title = "An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class",
keywords = "Imbalanced data, Nonstationary data, Online learning",
author = "Siahroudi, {Sajjad Kamali} and Daniel Kudenko",
year = "2021",
month = may,
day = "9",
doi = "10.1007/978-3-030-75762-5_48",
language = "English",
isbn = "9783030757618",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "603--615",
editor = "Kamal Karlapalem and Hong Cheng and Naren Ramakrishnan and Agrawal, {R. K.} and Reddy, {P. Krishna} and Jaideep Srivastava and Tanmoy Chakraborty",
booktitle = "Advances in Knowledge Discovery and Data Mining",
address = "Germany",
note = "25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021 ; Conference date: 11-05-2021 Through 14-05-2021",

}

TY - GEN

T1 - An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class

AU - Siahroudi, Sajjad Kamali

AU - Kudenko, Daniel

PY - 2021/5/9

Y1 - 2021/5/9

KW - Imbalanced data

KW - Nonstationary data

KW - Online learning

UR - http://www.scopus.com/inward/record.url?scp=85111098740&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-75762-5_48

DO - 10.1007/978-3-030-75762-5_48

M3 - Conference contribution

AN - SCOPUS:85111098740

SN - 9783030757618

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 603

EP - 615

BT - Advances in Knowledge Discovery and Data Mining

A2 - Karlapalem, Kamal

A2 - Cheng, Hong

A2 - Ramakrishnan, Naren

A2 - Agrawal, R. K.

A2 - Reddy, P. Krishna

A2 - Srivastava, Jaideep

A2 - Chakraborty, Tanmoy

PB - Springer Science and Business Media Deutschland GmbH

T2 - 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021

Y2 - 11 May 2021 through 14 May 2021

ER -