Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Christian Beyer
  • Vishnu Unnikrishnan
  • Pawel Matuszyk
  • Uli Niemann
  • Rudiger Pryss
  • Winfried Schlee
  • Eirini Ntoutsi
  • Myra Spiliopoulou

External Research Organisations

  • Otto-von-Guericke University Magdeburg
  • Ulm University
  • University of Regensburg

Details

Original language: English
Title of host publication: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
Subtitle of host publication: Proceedings
Editors: Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Foster Provost, Rayid Ghani, Francesco Bonchi
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 246-255
Number of pages: 10
ISBN (electronic): 9781538650905
ISBN (print): 9781538650912
Publication status: Published - Oct 2019
Event: 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 - Turin, Italy
Duration: 1 Oct 2018 to 3 Oct 2018

Abstract

Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
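
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names, the use of mean seed vectors with Euclidean distance as the entity similarity, and the parameters `k` and `k_obs` are all hypothetical choices made for the example. It partitions a stream into entity-centric substreams, picks either the k most similar entities or k random entities (the kRE-style baseline), and labels an arriving observation by a nearest-neighbour vote over the labelled seed observations of the entity and its selected neighbours.

```python
import random
from collections import Counter, defaultdict


def partition_by_entity(stream):
    """Split (entity_id, features, label) tuples into per-entity substreams."""
    substreams = defaultdict(list)
    for entity_id, features, label in stream:
        substreams[entity_id].append((features, label))
    return substreams


def mean_vector(observations):
    """Average feature vector of an entity's labelled seed observations."""
    n = len(observations)
    dim = len(observations[0][0])
    return [sum(f[i] for f, _ in observations) / n for i in range(dim)]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_similar_entities(target, seeds, k):
    """Assumed similarity: distance between mean seed vectors of two entities."""
    centre = mean_vector(seeds[target])
    others = [e for e in seeds if e != target]
    others.sort(key=lambda e: euclidean(centre, mean_vector(seeds[e])))
    return others[:k]


def k_random_entities(target, seeds, k, rng):
    """Baseline: pick k other entities uniformly at random."""
    others = sorted(e for e in seeds if e != target)
    return rng.sample(others, min(k, len(others)))


def predict(features, target, neighbours, seeds, k_obs=3):
    """Vote among the k_obs seed observations closest to the arriving one,
    drawn from the target entity and its selected neighbour entities."""
    pool = [obs for e in [target, *neighbours] for obs in seeds[e]]
    pool.sort(key=lambda obs: euclidean(features, obs[0]))
    votes = Counter(label for _, label in pool[:k_obs])
    return votes.most_common(1)[0][0]


# Toy seed data (invented for illustration only).
stream = [
    ("a", [0.0, 0.0], "neg"), ("a", [0.1, 0.0], "neg"),
    ("b", [0.0, 0.2], "neg"), ("b", [0.2, 0.1], "neg"),
    ("c", [5.0, 5.0], "pos"), ("c", [5.1, 4.9], "pos"),
]
seeds = partition_by_entity(stream)
similar = k_similar_entities("a", seeds, k=1)
sampled = k_random_entities("a", seeds, k=1, rng=random.Random(0))
label = predict([0.05, 0.05], "a", similar, seeds)
```

Comparing predictions made with `similar` against those made with `sampled` mirrors the abstract's idea of checking whether entity similarity actually helps: if random entities do as well as similar ones, the entity-level knowledge is not contributing.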

Keywords

    Entity similarity, KNN, Stream classification

Cite this

Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. / Beyer, Christian; Unnikrishnan, Vishnu; Matuszyk, Pawel et al.
2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. ed. / Tina Eliassi-Rad; Wei Wang; Ciro Cattuto; Foster Provost; Rayid Ghani; Francesco Bonchi. Institute of Electrical and Electronics Engineers Inc., 2019. p. 246-255.

Beyer, C, Unnikrishnan, V, Matuszyk, P, Niemann, U, Pryss, R, Schlee, W, Ntoutsi, E & Spiliopoulou, M 2019, Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. in T Eliassi-Rad, W Wang, C Cattuto, F Provost, R Ghani & F Bonchi (eds), 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. Institute of Electrical and Electronics Engineers Inc., pp. 246-255, 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018, Turin, Italy, 1 Oct 2018. https://doi.org/10.1109/DSAA.2018.00035
Beyer, C., Unnikrishnan, V., Matuszyk, P., Niemann, U., Pryss, R., Schlee, W., Ntoutsi, E., & Spiliopoulou, M. (2019). Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. In T. Eliassi-Rad, W. Wang, C. Cattuto, F. Provost, R. Ghani, & F. Bonchi (Eds.), 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings (pp. 246-255). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSAA.2018.00035
Beyer C, Unnikrishnan V, Matuszyk P, Niemann U, Pryss R, Schlee W et al. Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. In Eliassi-Rad T, Wang W, Cattuto C, Provost F, Ghani R, Bonchi F, editors, 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 246-255 doi: 10.1109/DSAA.2018.00035
Beyer, Christian ; Unnikrishnan, Vishnu ; Matuszyk, Pawel et al. / Entity-Level Stream Classification : Exploiting Entity Similarity to Label the Future Observations Referring to an Entity. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. editor / Tina Eliassi-Rad ; Wei Wang ; Ciro Cattuto ; Foster Provost ; Rayid Ghani ; Francesco Bonchi. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 246-255
@inproceedings{08b862b75c9c416d8fb23833c41c40e7,
title = "Entity-Level Stream Classification: Exploiting Entity Similarity to Label the Future Observations Referring to an Entity",
abstract = "Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.",
keywords = "Entity similarity, KNN, Stream classification",
author = "Christian Beyer and Vishnu Unnikrishnan and Pawel Matuszyk and Uli Niemann and Rudiger Pryss and Winfried Schlee and Eirini Ntoutsi and Myra Spiliopoulou",
note = "Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project{\textquoteright}s principal investigators.; 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 ; Conference date: 01-10-2018 Through 03-10-2018",
year = "2019",
month = oct,
doi = "10.1109/DSAA.2018.00035",
language = "English",
isbn = "9781538650912",
pages = "246--255",
editor = "Tina Eliassi-Rad and Wei Wang and Ciro Cattuto and Foster Provost and Rayid Ghani and Francesco Bonchi",
booktitle = "2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

TY - GEN
T1 - Entity-Level Stream Classification
T2 - 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018
AU - Beyer, Christian
AU - Unnikrishnan, Vishnu
AU - Matuszyk, Pawel
AU - Niemann, Uli
AU - Pryss, Rudiger
AU - Schlee, Winfried
AU - Ntoutsi, Eirini
AU - Spiliopoulou, Myra
N1 - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.
PY - 2019/10
Y1 - 2019/10
AB - Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.

KW - Entity similarity
KW - KNN
KW - Stream classification
UR - http://www.scopus.com/inward/record.url?scp=85062867049&partnerID=8YFLogxK
U2 - 10.1109/DSAA.2018.00035
DO - 10.1109/DSAA.2018.00035
M3 - Conference contribution
AN - SCOPUS:85062867049
SN - 9781538650912
SP - 246
EP - 255
BT - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
A2 - Eliassi-Rad, Tina
A2 - Wang, Wei
A2 - Cattuto, Ciro
A2 - Provost, Foster
A2 - Ghani, Rayid
A2 - Bonchi, Francesco
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 October 2018 through 3 October 2018
ER -