Resource management for model learning at entity level

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Christian Beyer
  • Vishnu Unnikrishnan
  • Robert Brüggemann
  • Vincent Toulouse
  • Hafez Kader Omar
  • Eirini Ntoutsi
  • Myra Spiliopoulou

Organisational units

External organisations

  • Otto-von-Guericke-Universität Magdeburg

Details

Original language: English
Pages (from - to): 549-561
Number of pages: 13
Journal: Annales des Telecommunications/Annals of Telecommunications
Volume: 75
Issue number: 9-10
Early online date: 29 Aug 2020
Publication status: Published - Oct 2020

Abstract

Many current and future applications plan to provide entity-specific predictions. These range from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we showed that error-weighted ensembles which combine entity-centric classifiers, trained only on reviews of one particular product (entity), with entity-ignorant classifiers, trained on all reviews irrespective of the product, can improve prediction quality. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again as their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two different methods of reducing the primary memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream, within an error margin. We then store all models which do not fulfil this requirement in secondary memory, from which they can be retrieved in case future instances belonging to them arrive later in the stream. The second method replaces entity-centric models with a much more naive model which only stores the past labels and predicts the majority label seen so far. We applied our methods to the previously used Amazon data sets, which contain up to 1.4M reviews, and added two subsets of the Yelp data set, which contain up to 4.2M reviews. Both methods were successful in reducing the primary memory requirements while still outperforming an entity-ignorant model.
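
The abstract describes two concrete mechanisms, so a brief illustration may help. The record itself contains no code; the Python sketch below is only a minimal illustration of the two ideas, not the authors' implementation. It pairs the lossy counting algorithm (Manku & Motwani) over entity ids, used here to decide which entity-centric models may stay in primary memory, with a naive majority-label model as the lightweight replacement. All names and parameters (LossyCounter, MajorityLabelModel, support, error) are assumptions made for this sketch.

import math
from collections import Counter

class LossyCounter:
    # Lossy counting over a stream of entity ids (sketch, not the paper's code).
    # Entities whose estimated share of the stream reaches (support - error) * n
    # count as frequent; models of all other entities would be candidates for
    # off-loading to secondary memory.
    def __init__(self, support, error):
        assert 0.0 < error < support < 1.0
        self.support = support
        self.error = error
        self.width = math.ceil(1.0 / error)   # bucket width of the algorithm
        self.n = 0                            # instances seen so far
        self.entries = {}                     # entity -> (count, delta)

    def add(self, entity):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        count, delta = self.entries.get(entity, (0, bucket - 1))
        self.entries[entity] = (count + 1, delta)
        if self.n % self.width == 0:          # bucket boundary: prune rare entities
            self.entries = {e: (c, d) for e, (c, d) in self.entries.items()
                            if c + d > bucket}

    def frequent(self):
        threshold = (self.support - self.error) * self.n
        return {e for e, (c, _) in self.entries.items() if c >= threshold}

class MajorityLabelModel:
    # Naive per-entity model: stores past labels, predicts the majority seen so far.
    def __init__(self):
        self.labels = Counter()

    def update(self, label):
        self.labels[label] += 1

    def predict(self):
        return self.labels.most_common(1)[0][0] if self.labels else None

In a stream loop over (entity id, review, label) one would update the counter for every arriving instance, keep full entity-centric models only for the entities returned by frequent(), and either serialize the remaining models to secondary memory for lazy reloading (first method) or replace them with a MajorityLabelModel (second method); the exact thresholds and model types used in the paper are not part of this record.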

ASJC Scopus subject areas

Cite

Resource management for model learning at entity level. / Beyer, Christian; Unnikrishnan, Vishnu; Brüggemann, Robert et al.
In: Annales des Telecommunications/Annals of Telecommunications, Vol. 75, No. 9-10, 10.2020, pp. 549-561.

Publication: Contribution to journal › Article › Research › Peer-reviewed

Beyer, C, Unnikrishnan, V, Brüggemann, R, Toulouse, V, Omar, HK, Ntoutsi, E & Spiliopoulou, M 2020, 'Resource management for model learning at entity level', Annales des Telecommunications/Annals of Telecommunications, vol. 75, no. 9-10, pp. 549-561. https://doi.org/10.1007/s12243-020-00800-4
Beyer, C., Unnikrishnan, V., Brüggemann, R., Toulouse, V., Omar, H. K., Ntoutsi, E., & Spiliopoulou, M. (2020). Resource management for model learning at entity level. Annales des Telecommunications/Annals of Telecommunications, 75(9-10), 549-561. https://doi.org/10.1007/s12243-020-00800-4
Beyer C, Unnikrishnan V, Brüggemann R, Toulouse V, Omar HK, Ntoutsi E et al. Resource management for model learning at entity level. Annales des Telecommunications/Annals of Telecommunications. 2020 Oct;75(9-10):549-561. Epub 2020 Aug 29. doi: 10.1007/s12243-020-00800-4
Beyer, Christian ; Unnikrishnan, Vishnu ; Brüggemann, Robert et al. / Resource management for model learning at entity level. In: Annales des Telecommunications/Annals of Telecommunications. 2020 ; Vol. 75, No. 9-10. pp. 549-561.
@article{15f55ac302dc49c9b05a62409b58f226,
title = "Resource management for model learning at entity level",
abstract = "Many current and future applications plan to provide entity-specific predictions. These range from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we could show that error-weighted ensembles that combine entity-centric classifiers, which are only trained on reviews of one particular product (entity), and entity-ignorant classifiers, which are trained on all reviews irrespective of the product, can improve prediction quality. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again as their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two different methods of reducing the primary memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream within an error-margin. We then store all models which do not fulfil this requirement in secondary memory, from which they can be retrieved in case future instances belonging to them should arrive later in the stream. The second method replaces entity-centric models with a much more naive model which only stores the past labels and predicts the majority label seen so far. We applied our methods on the previously used Amazon data sets which contained up to 1.4M reviews and added two subsets of the Yelp data set which contain up to 4.2M reviews. Both methods were successful in reducing the primary memory requirements while still outperforming an entity-ignorant model.",
keywords = "Document prediction, Entity-centric learning, Memory reduction, Stream classification, Text ignorant models",
author = "Christian Beyer and Vishnu Unnikrishnan and Robert Br{\"u}ggemann and Vincent Toulouse and Omar, {Hafez Kader} and Eirini Ntoutsi and Myra Spiliopoulou",
note = "Funding information: Open Access funding provided by Projekt DEAL. This work was partially funded by the German Research Foundation, project OSCAR “Opinion Stream Classification with Ensembles and Active Learners.” Additionally, the first author is also partially funded by a PhD grant from the federal state of Saxony-Anhalt.",
year = "2020",
month = oct,
doi = "10.1007/s12243-020-00800-4",
language = "English",
volume = "75",
pages = "549--561",
journal = "Annales des Telecommunications/Annals of Telecommunications",
issn = "0003-4347",
publisher = "Springer Paris",
number = "9-10",

}


TY - JOUR

T1 - Resource management for model learning at entity level

AU - Beyer, Christian

AU - Unnikrishnan, Vishnu

AU - Brüggemann, Robert

AU - Toulouse, Vincent

AU - Omar, Hafez Kader

AU - Ntoutsi, Eirini

AU - Spiliopoulou, Myra

N1 - Funding information: Open Access funding provided by Projekt DEAL. This work was partially funded by the German Research Foundation, project OSCAR “Opinion Stream Classification with Ensembles and Active Learners.” Additionally, the first author is also partially funded by a PhD grant from the federal state of Saxony-Anhalt.

PY - 2020/10

Y1 - 2020/10

N2 - Many current and future applications plan to provide entity-specific predictions. These range from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we could show that error-weighted ensembles that combine entity-centric classifiers, which are only trained on reviews of one particular product (entity), and entity-ignorant classifiers, which are trained on all reviews irrespective of the product, can improve prediction quality. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again as their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two different methods of reducing the primary memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream within an error-margin. We then store all models which do not fulfil this requirement in secondary memory, from which they can be retrieved in case future instances belonging to them should arrive later in the stream. The second method replaces entity-centric models with a much more naive model which only stores the past labels and predicts the majority label seen so far. We applied our methods on the previously used Amazon data sets which contained up to 1.4M reviews and added two subsets of the Yelp data set which contain up to 4.2M reviews. Both methods were successful in reducing the primary memory requirements while still outperforming an entity-ignorant model.

AB - Many current and future applications plan to provide entity-specific predictions. These range from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we could show that error-weighted ensembles that combine entity-centric classifiers, which are only trained on reviews of one particular product (entity), and entity-ignorant classifiers, which are trained on all reviews irrespective of the product, can improve prediction quality. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again as their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two different methods of reducing the primary memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream within an error-margin. We then store all models which do not fulfil this requirement in secondary memory, from which they can be retrieved in case future instances belonging to them should arrive later in the stream. The second method replaces entity-centric models with a much more naive model which only stores the past labels and predicts the majority label seen so far. We applied our methods on the previously used Amazon data sets which contained up to 1.4M reviews and added two subsets of the Yelp data set which contain up to 4.2M reviews. Both methods were successful in reducing the primary memory requirements while still outperforming an entity-ignorant model.

KW - Document prediction

KW - Entity-centric learning

KW - Memory reduction

KW - Stream classification

KW - Text ignorant models

UR - http://www.scopus.com/inward/record.url?scp=85089996172&partnerID=8YFLogxK

U2 - 10.1007/s12243-020-00800-4

DO - 10.1007/s12243-020-00800-4

M3 - Article

AN - SCOPUS:85089996172

VL - 75

SP - 549

EP - 561

JO - Annales des Telecommunications/Annals of Telecommunications

JF - Annales des Telecommunications/Annals of Telecommunications

SN - 0003-4347

IS - 9-10

ER -