Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Publication: Contribution to journal > Article > Research > Peer-reviewed

Authors

  • Vishnu Unnikrishnan
  • Christian Beyer
  • Pawel Matuszyk
  • Uli Niemann
  • Rüdiger Pryss
  • Winfried Schlee
  • Eirini Ntoutsi
  • Myra Spiliopoulou

External organisations

  • Otto-von-Guericke-Universität Magdeburg
  • Universität Ulm
  • Universität Regensburg

Details

Original language: English
Pages (from - to): 1-15
Number of pages: 15
Journal: International Journal of Data Science and Analytics
Volume: 9
Issue number: 1
Early online date: 22 Feb 2019
Publication status: Published - Feb 2020

Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
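
The following is a minimal Python sketch of the idea described in the abstract, not the authors' implementation. It partitions a stream into entity-centric substreams, keeps the first m labels of each entity as the labelled seed, and predicts a label for an entity by voting over its own seed and the seeds of other entities. The abstract does not specify the similarity measure or the voting scheme, so the Euclidean distance over static entity feature vectors, the plain majority vote, and all function names here are assumptions made for illustration. The `random_entities` helper mimics the spirit of the k Random Entities (kRE) heuristic by choosing entities at random instead of by similarity.

```python
import math
import random
from collections import Counter, defaultdict


def partition_by_entity(stream):
    """Split a stream of (entity_id, label) pairs into per-entity substreams."""
    substreams = defaultdict(list)
    for entity_id, label in stream:
        substreams[entity_id].append(label)
    return substreams


def seed_labels(substreams, m):
    """Keep only the first m labels of each entity (the labelled seed)."""
    return {entity: labels[:m] for entity, labels in substreams.items()}


def nearest_entities(target, entity_features, k):
    """Return the k entities closest to `target`.

    Assumption: similarity is Euclidean distance between entity feature vectors.
    """
    others = [e for e in entity_features if e != target]
    others.sort(key=lambda e: math.dist(entity_features[target], entity_features[e]))
    return others[:k]


def random_entities(target, entity_features, k, rng):
    """kRE-style baseline (assumed form): pick k other entities uniformly at random."""
    others = [e for e in entity_features if e != target]
    return rng.sample(others, min(k, len(others)))


def predict(target, neighbours, seeds):
    """Majority vote over the seed labels of the target entity and its neighbours."""
    votes = Counter(seeds[target])
    for entity in neighbours:
        votes.update(seeds[entity])
    return votes.most_common(1)[0][0]
```

Comparing `predict` with neighbours from `nearest_entities` against neighbours from `random_entities` on held-out future observations is one way to check whether entity similarity contributes anything beyond what a random selection of substreams already provides.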

Cite this

Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. / Unnikrishnan, Vishnu; Beyer, Christian; Matuszyk, Pawel et al.
In: International Journal of Data Science and Analytics, Vol. 9, No. 1, 02.2020, pp. 1-15.

Unnikrishnan, V, Beyer, C, Matuszyk, P, Niemann, U, Pryss, R, Schlee, W, Ntoutsi, E & Spiliopoulou, M 2020, 'Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity', International Journal of Data Science and Analytics, vol. 9, no. 1, pp. 1-15. https://doi.org/10.1007/s41060-019-00177-1
Unnikrishnan, V., Beyer, C., Matuszyk, P., Niemann, U., Pryss, R., Schlee, W., Ntoutsi, E., & Spiliopoulou, M. (2020). Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics, 9(1), 1-15. https://doi.org/10.1007/s41060-019-00177-1
Unnikrishnan V, Beyer C, Matuszyk P, Niemann U, Pryss R, Schlee W et al. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics. 2020 Feb;9(1):1-15. Epub 2019 Feb 22. doi: 10.1007/s41060-019-00177-1
Unnikrishnan, Vishnu ; Beyer, Christian ; Matuszyk, Pawel et al. / Entity-level stream classification : exploiting entity similarity to label the future observations referring to an entity. In: International Journal of Data Science and Analytics. 2020 ; Vol. 9, No. 1. pp. 1-15.
@article{0afbd8e82e7b4a71887cf2b273482a08,
title = "Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity",
keywords = "Entity similarity, kNN, Stream classification",
author = "Vishnu Unnikrishnan and Christian Beyer and Pawel Matuszyk and Uli Niemann and R{\"u}diger Pryss and Winfried Schlee and Eirini Ntoutsi and Myra Spiliopoulou",
note = "Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project{\textquoteright}s principal investigators.",
year = "2020",
month = feb,
doi = "10.1007/s41060-019-00177-1",
language = "English",
volume = "9",
pages = "1--15",
number = "1",

}

TY - JOUR

T1 - Entity-level stream classification

T2 - exploiting entity similarity to label the future observations referring to an entity

AU - Unnikrishnan, Vishnu

AU - Beyer, Christian

AU - Matuszyk, Pawel

AU - Niemann, Uli

AU - Pryss, Rüdiger

AU - Schlee, Winfried

AU - Ntoutsi, Eirini

AU - Spiliopoulou, Myra

N1 - Funding information: Work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG-project OSCAR Opinion Stream Classification with Ensembles and Active Learners. The last two authors are the project’s principal investigators.

PY - 2020/2

Y1 - 2020/2

KW - Entity similarity

KW - kNN

KW - Stream classification

UR - http://www.scopus.com/inward/record.url?scp=85078408482&partnerID=8YFLogxK

U2 - 10.1007/s41060-019-00177-1

DO - 10.1007/s41060-019-00177-1

M3 - Article

AN - SCOPUS:85078408482

VL - 9

SP - 1

EP - 15

JO - International Journal of Data Science and Analytics

JF - International Journal of Data Science and Analytics

SN - 2364-415X

IS - 1

ER -