Details
Original language | English |
---|---|
Title of host publication | 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) |
Subtitle of host publication | Proceedings |
Editors | Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Foster Provost, Rayid Ghani, Francesco Bonchi |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 246-255 |
Number of pages | 10 |
ISBN (electronic) | 9781538650905 |
ISBN (print) | 9781538650912 |
Publication status | Published - Oct 2019 |
Event | 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 - Turin, Italy; Duration: 1 Oct 2018 → 3 Oct 2018 |
Abstract
Stream classification algorithms traditionally treat arriving observations as independent. However, in many applications the arriving examples may depend on the 'entity' that generated them, e.g. in product reviewing or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of observations into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k Nearest Neighbour inspired stream classification approach (kNN), in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially transferred from another domain. To distinguish between cases where this kind of knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial few observations of each entity, we assume that no additional labels arrive, and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
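The entity-centric idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the similarity measure or the voting rule, so the sketch assumes that entities are compared by the Euclidean distance between the centroids of their labelled seed observations, and that a label is predicted by a majority vote over the seed labels of the entity and its chosen neighbours. All function names (`split_by_entity`, `centroid`, `nearest_entities`, `random_entities`, `predict`) are hypothetical.

```python
import random
from collections import Counter, defaultdict
from math import dist


def split_by_entity(stream):
    """Partition (entity_id, features, label) tuples into per-entity substreams."""
    substreams = defaultdict(list)
    for entity_id, features, label in stream:
        substreams[entity_id].append((features, label))
    return substreams


def centroid(observations):
    """Mean feature vector of an entity's observations (assumed entity profile)."""
    n = len(observations)
    dims = len(observations[0][0])
    return tuple(sum(f[d] for f, _ in observations) / n for d in range(dims))


def nearest_entities(target, profiles, k):
    """kNN-style choice: the k entities whose profiles are closest to the target's."""
    others = [e for e in profiles if e != target]
    return sorted(others, key=lambda e: dist(profiles[target], profiles[e]))[:k]


def random_entities(target, profiles, k, rng):
    """kRE-style choice: k entities drawn at random, ignoring similarity."""
    others = [e for e in profiles if e != target]
    return rng.sample(others, min(k, len(others)))


def predict(target, seeds, neighbours):
    """Majority vote over the seed labels of the target entity and its neighbours.

    Assumption: the features of the arriving observation are ignored here;
    only the entity it belongs to matters.
    """
    votes = Counter(label for _, label in seeds[target])
    for entity in neighbours:
        votes.update(label for _, label in seeds[entity])
    return votes.most_common(1)[0][0]
```

Comparing the predictions obtained with `nearest_entities` against those obtained with `random_entities` mirrors the role the abstract gives to kRE: if similar entities help no more than random ones, the entity-level knowledge is probably not contributing to the classification.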
Keywords
- Entity similarity, KNN, Stream classification
ASJC Scopus subject areas
- Computer Science(all)
- Signal Processing
- Decision Sciences(all)
- Information Systems and Management
- Statistics, Probability and Uncertainty
- Computer Networks and Communications
Cite this
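Entity-Level Stream Classification. / Beyer, Christian; Unnikrishnan, Vishnu; Matuszyk, Pawel; Niemann, Uli; Pryss, Rudiger; Schlee, Winfried; Ntoutsi, Eirini; Spiliopoulou, Myra. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Proceedings. ed. / Tina Eliassi-Rad; Wei Wang; Ciro Cattuto; Foster Provost; Rayid Ghani; Francesco Bonchi. Institute of Electrical and Electronics Engineers Inc., 2019. p. 246-255.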
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Entity-Level Stream Classification
T2 - 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018
AU - Beyer, Christian
AU - Unnikrishnan, Vishnu
AU - Matuszyk, Pawel
AU - Niemann, Uli
AU - Pryss, Rudiger
AU - Schlee, Winfried
AU - Ntoutsi, Eirini
AU - Spiliopoulou, Myra
N1 - Funding information: The work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG project OSCAR (Opinion Stream Classification with Ensembles and Active Learners). The last two authors are the project’s principal investigators.
PY - 2019/10
Y1 - 2019/10
KW - Entity similarity
KW - KNN
KW - Stream classification
UR - http://www.scopus.com/inward/record.url?scp=85062867049&partnerID=8YFLogxK
U2 - 10.1109/DSAA.2018.00035
DO - 10.1109/DSAA.2018.00035
M3 - Conference contribution
AN - SCOPUS:85062867049
SN - 9781538650912
SP - 246
EP - 255
BT - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)
A2 - Eliassi-Rad, Tina
A2 - Wang, Wei
A2 - Cattuto, Ciro
A2 - Provost, Foster
A2 - Ghani, Rayid
A2 - Bonchi, Francesco
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 October 2018 through 3 October 2018
ER -