Details
Original language | English |
---|---|
Title of host publication | Proceedings of the 2017 SIAM International Conference on Data Mining (SDM) |
Editors | Nitesh Chawla, Wei Wang |
Publisher | Society for Industrial and Applied Mathematics Publications |
Pages | 390-398 |
Number of pages | 9 |
ISBN (electronic) | 9781611974973 |
Publication status | Published - 2017 |
Event | 17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States |
Duration | 27 Apr 2017 → 29 Apr 2017 |
Abstract
A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus led, to a certain extent, to wrong conclusions in the evaluation of algorithms tested on them. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise awareness of the importance of data quality, model understanding, and appropriate evaluation.
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Science Applications
Cite this
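Redundancies in Data and their Effect on the Evaluation of Recommendation Systems. / Basaran, Daniel; Ntoutsi, Eirini; Zimek, Arthur. Proceedings of the 2017 SIAM International Conference on Data Mining (SDM). ed. / Nitesh Chawla; Wei Wang. Society for Industrial and Applied Mathematics Publications, 2017. p. 390-398.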
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Redundancies in Data and their Effect on the Evaluation of Recommendation Systems
T2 - 17th SIAM International Conference on Data Mining, SDM 2017
AU - Basaran, Daniel
AU - Ntoutsi, Eirini
AU - Zimek, Arthur
N1 - Publisher Copyright: Copyright © by SIAM. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2017
Y1 - 2017
AB - A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus led, to a certain extent, to wrong conclusions in the evaluation of algorithms tested on them. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise awareness of the importance of data quality, model understanding, and appropriate evaluation.
UR - http://www.scopus.com/inward/record.url?scp=85027880582&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974973.44
DO - 10.1137/1.9781611974973.44
M3 - Conference contribution
AN - SCOPUS:85027880582
SP - 390
EP - 398
BT - Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)
A2 - Chawla, Nitesh
A2 - Wang, Wei
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 27 April 2017 through 29 April 2017
ER -