Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets

Publication: Contribution to book/report/anthology/conference proceedings, conference paper, research, peer-reviewed

Authors

  • Daniel Basaran
  • Eirini Ntoutsi
  • Arthur Zimek

Organisational units

External organisations

  • Ludwig-Maximilians-Universität München (LMU)
  • University of Southern Denmark

Details

Original language: English
Title of host publication: Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)
Editors: Nitesh Chawla, Wei Wang
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 390-398
Number of pages: 9
ISBN (electronic): 9781611974973
Publication status: Published - 2017
Event: 17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States
Duration: 27 Apr. 2017 to 29 Apr. 2017

Abstract

A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus incurred to a certain extent wrong conclusions in the evaluation of algorithms tested on these datasets. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise the awareness of the importance of data quality, model understanding, and appropriate evaluation.

Cite this

Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets. / Basaran, Daniel; Ntoutsi, Eirini; Zimek, Arthur.
Proceedings of the 2017 SIAM International Conference on Data Mining (SDM). Ed. / Nitesh Chawla; Wei Wang. Society for Industrial and Applied Mathematics Publications, 2017. pp. 390-398.

Basaran, D, Ntoutsi, E & Zimek, A 2017, Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets. in N Chawla & W Wang (eds), Proceedings of the 2017 SIAM International Conference on Data Mining (SDM). Society for Industrial and Applied Mathematics Publications, pp. 390-398, 17th SIAM International Conference on Data Mining, SDM 2017, Houston, United States, 27 Apr. 2017. https://doi.org/10.1137/1.9781611974973.44
Basaran, D., Ntoutsi, E., & Zimek, A. (2017). Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets. In N. Chawla, & W. Wang (Eds.), Proceedings of the 2017 SIAM International Conference on Data Mining (SDM) (pp. 390-398). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.44
Basaran D, Ntoutsi E, Zimek A. Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets. In Chawla N, Wang W, editors, Proceedings of the 2017 SIAM International Conference on Data Mining (SDM). Society for Industrial and Applied Mathematics Publications. 2017. p. 390-398. doi: 10.1137/1.9781611974973.44
Basaran, Daniel ; Ntoutsi, Eirini ; Zimek, Arthur. / Redundancies in Data and their Effect on the Evaluation of Recommendation Systems : A Case Study on the Amazon Reviews Datasets. Proceedings of the 2017 SIAM International Conference on Data Mining (SDM). Ed. / Nitesh Chawla ; Wei Wang. Society for Industrial and Applied Mathematics Publications, 2017. pp. 390-398
@inproceedings{9076c016f90a4b549e8f0e4cc4085e31,
title = "Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets",
abstract = "A collection of datasets crawled from Amazon, {"}Amazon reviews{"}, is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus incurred to a certain extent wrong conclusions in the evaluation of algorithms tested on these datasets. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise the awareness of the importance of data quality, model understanding, and appropriate evaluation.",
author = "Daniel Basaran and Eirini Ntoutsi and Arthur Zimek",
note = "Publisher Copyright: Copyright {\textcopyright} by SIAM. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.; 17th SIAM International Conference on Data Mining, SDM 2017 ; Conference date: 27-04-2017 Through 29-04-2017",
year = "2017",
doi = "10.1137/1.9781611974973.44",
language = "English",
pages = "390--398",
editor = "Nitesh Chawla and Wei Wang",
booktitle = "Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)",
publisher = "Society for Industrial and Applied Mathematics Publications",
address = "United States",
}

TY  - GEN
T1  - Redundancies in Data and their Effect on the Evaluation of Recommendation Systems
T2  - 17th SIAM International Conference on Data Mining, SDM 2017
AU  - Basaran, Daniel
AU  - Ntoutsi, Eirini
AU  - Zimek, Arthur
N1  - Publisher Copyright: Copyright © by SIAM. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY  - 2017
Y1  - 2017
N2  - A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus incurred to a certain extent wrong conclusions in the evaluation of algorithms tested on these datasets. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise the awareness of the importance of data quality, model understanding, and appropriate evaluation.
AB  - A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus incurred to a certain extent wrong conclusions in the evaluation of algorithms tested on these datasets. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise the awareness of the importance of data quality, model understanding, and appropriate evaluation.
UR  - http://www.scopus.com/inward/record.url?scp=85027880582&partnerID=8YFLogxK
U2  - 10.1137/1.9781611974973.44
DO  - 10.1137/1.9781611974973.44
M3  - Conference contribution
AN  - SCOPUS:85027880582
SP  - 390
EP  - 398
BT  - Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)
A2  - Chawla, Nitesh
A2  - Wang, Wei
PB  - Society for Industrial and Applied Mathematics Publications
Y2  - 27 April 2017 through 29 April 2017
ER  -