Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity

Publication: Contribution to book/report/anthology/conference proceedings; Conference paper; Research; Peer-reviewed

Authors

  • Christian Beyer
  • Vishnu Unnikrishnan
  • Pawel Matuszyk
  • Uli Niemann
  • Rudiger Pryss
  • Winfried Schlee
  • Eirini Ntoutsi
  • Myra Spiliopoulou

Organisational units

External organisations

  • Otto-von-Guericke-Universität Magdeburg
  • Universität Ulm
  • Universität Regensburg

Details

Original language: English
Title of host publication: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
Subtitle: Proceedings
Editors: Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Foster Provost, Rayid Ghani, Francesco Bonchi
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 246-255
Number of pages: 10
ISBN (electronic): 9781538650905
ISBN (print): 9781538650912
Publication status: Published - Oct 2019
Event: 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 - Turin, Italy
Duration: 1 Oct 2018 to 3 Oct 2018

Abstract

Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
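
The abstract does not spell out the similarity measure or the voting rule, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that each entity is profiled by the mean feature vector of its labelled seed observations, that entity similarity is Euclidean distance between those profiles, and that a future label is predicted by majority vote over the seed labels of the entity and its k neighbours. All function names are hypothetical. A kRE-style baseline replaces the nearest entities with k randomly drawn ones.

```python
import random
from collections import Counter, defaultdict
from math import dist


def build_substreams(stream):
    """Partition (entity_id, features, label) triples into entity-centric substreams."""
    substreams = defaultdict(list)
    for entity_id, features, label in stream:
        substreams[entity_id].append((features, label))
    return dict(substreams)


def centroid(observations):
    """Mean feature vector of an entity's seed observations (assumed entity profile)."""
    vectors = [features for features, _ in observations]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def nearest_entities(target, substreams, k):
    """Return the k entities whose seed centroids are closest to the target's."""
    target_centroid = centroid(substreams[target])
    others = [e for e in substreams if e != target]
    others.sort(key=lambda e: dist(target_centroid, centroid(substreams[e])))
    return others[:k]


def random_entities(target, substreams, k, rng):
    """kRE-style baseline: draw k other entities uniformly at random."""
    others = sorted(e for e in substreams if e != target)
    return rng.sample(others, min(k, len(others)))


def predict(target, substreams, neighbours):
    """Majority label over the seeds of the target entity and the given neighbours."""
    votes = Counter()
    for entity_id in [target, *neighbours]:
        votes.update(label for _, label in substreams[entity_id])
    return votes.most_common(1)[0][0]


# Toy seed stream (made-up data): a few labelled observations per entity.
seed_stream = [
    ("a", (1.0, 0.0), "pos"),
    ("a", (0.9, 0.1), "pos"),
    ("b", (1.0, 0.2), "pos"),
    ("b", (0.8, 0.0), "neg"),
    ("c", (0.0, 1.0), "neg"),
    ("c", (0.1, 0.9), "neg"),
]
substreams = build_substreams(seed_stream)
knn_neighbours = nearest_entities("a", substreams, k=1)
knn_label = predict("a", substreams, knn_neighbours)
kre_neighbours = random_entities("a", substreams, 2, random.Random(0))
kre_label = predict("a", substreams, kre_neighbours)
```

Comparing the accuracy of the nearest-entity variant against the random-entity variant on held-out future observations is one way to judge whether entity similarity contributes anything beyond simply pooling more labelled substreams.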

Cite this

Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. / Beyer, Christian; Unnikrishnan, Vishnu; Matuszyk, Pawel et al.
2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. ed. / Tina Eliassi-Rad; Wei Wang; Ciro Cattuto; Foster Provost; Rayid Ghani; Francesco Bonchi. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 246-255.

Beyer, C, Unnikrishnan, V, Matuszyk, P, Niemann, U, Pryss, R, Schlee, W, Ntoutsi, E & Spiliopoulou, M 2019, Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. in T Eliassi-Rad, W Wang, C Cattuto, F Provost, R Ghani & F Bonchi (eds), 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. Institute of Electrical and Electronics Engineers Inc., pp. 246-255, 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018, Turin, Italy, 1 Oct 2018. https://doi.org/10.1109/DSAA.2018.00035
Beyer, C., Unnikrishnan, V., Matuszyk, P., Niemann, U., Pryss, R., Schlee, W., Ntoutsi, E., & Spiliopoulou, M. (2019). Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. In T. Eliassi-Rad, W. Wang, C. Cattuto, F. Provost, R. Ghani, & F. Bonchi (Eds.), 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings (pp. 246-255). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSAA.2018.00035
Beyer C, Unnikrishnan V, Matuszyk P, Niemann U, Pryss R, Schlee W et al. Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. in Eliassi-Rad T, Wang W, Cattuto C, Provost F, Ghani R, Bonchi F, editors, 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 246-255 doi: 10.1109/DSAA.2018.00035
Beyer, Christian ; Unnikrishnan, Vishnu ; Matuszyk, Pawel et al. / Entity-Level Stream Classification : Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. ed. / Tina Eliassi-Rad ; Wei Wang ; Ciro Cattuto ; Foster Provost ; Rayid Ghani ; Francesco Bonchi. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 246-255
@inproceedings{08b862b75c9c416d8fb23833c41c40e7,
title = "Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity",
abstract = "Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.",
keywords = "Entity similarity, KNN, Stream classification",
author = "Christian Beyer and Vishnu Unnikrishnan and Pawel Matuszyk and Uli Niemann and Rudiger Pryss and Winfried Schlee and Eirini Ntoutsi and Myra Spiliopoulou",
note = "Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project{\textquoteright}s principal investigators.; 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 ; Conference date: 01-10-2018 Through 03-10-2018",
year = "2019",
month = oct,
doi = "10.1109/DSAA.2018.00035",
language = "English",
isbn = "9781538650912",
pages = "246--255",
editor = "Tina Eliassi-Rad and Wei Wang and Ciro Cattuto and Foster Provost and Rayid Ghani and Francesco Bonchi",
booktitle = "2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Entity-Level Stream Classification

T2 - 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018

AU - Beyer, Christian

AU - Unnikrishnan, Vishnu

AU - Matuszyk, Pawel

AU - Niemann, Uli

AU - Pryss, Rudiger

AU - Schlee, Winfried

AU - Ntoutsi, Eirini

AU - Spiliopoulou, Myra

N1 - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.

PY - 2019/10

Y1 - 2019/10

AB - Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.

KW - Entity similarity

KW - KNN

KW - Stream classification

UR - http://www.scopus.com/inward/record.url?scp=85062867049&partnerID=8YFLogxK

U2 - 10.1109/DSAA.2018.00035

DO - 10.1109/DSAA.2018.00035

M3 - Conference contribution

AN - SCOPUS:85062867049

SN - 9781538650912

SP - 246

EP - 255

BT - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)

A2 - Eliassi-Rad, Tina

A2 - Wang, Wei

A2 - Cattuto, Ciro

A2 - Provost, Foster

A2 - Ghani, Rayid

A2 - Bonchi, Francesco

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 1 October 2018 through 3 October 2018

ER -