Is Big Data Performance Reproducible in Modern Cloud Networks?

Publication: Contribution in book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Alexandru Uta
  • Alexandru Custura
  • Dmitry Duplyakin
  • Ivo Jimenez
  • Jan Rellermeyer
  • Carlos Maltzahn
  • Robert Ricci
  • Alexandru Iosup

External organizations

  • Vrije Universiteit Amsterdam
  • University of Utah
  • University of California at Santa Cruz
  • Delft University of Technology

Details

Original language: English
Title of host publication: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020
Pages: 513-527
Number of pages: 15
ISBN (electronic): 9781939133137
Publication status: Published - 25 Feb 2020
Externally published: Yes
Event: 17th USENIX Symposium on Networked Systems Design and Implementation, United States
Duration: 25 Feb 2020 - 27 Feb 2020

Abstract

Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.
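To make the variability analysis the abstract alludes to concrete, the following is a minimal, illustrative Python sketch (not taken from the paper) of how one might quantify run-to-run variability of repeated cloud measurements using a coefficient of variation and a bootstrap confidence interval for the median. The runtime values and all function names are hypothetical.

import random
import statistics

def coefficient_of_variation(samples):
    # CV = standard deviation / mean: a scale-free measure of run-to-run variability.
    return statistics.stdev(samples) / statistics.mean(samples)

def bootstrap_ci_median(samples, n_resamples=10_000, alpha=0.05, seed=0):
    # Nonparametric (bootstrap) 1 - alpha confidence interval for the median.
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = medians[int((alpha / 2) * n_resamples)]
    hi = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

if __name__ == "__main__":
    # Hypothetical runtimes (seconds) of the same big-data job repeated 30 times in a cloud.
    runtimes = [412, 395, 431, 402, 510, 398, 405, 612, 401, 409,
                399, 420, 455, 403, 397, 590, 408, 411, 400, 404,
                396, 415, 430, 402, 575, 407, 399, 413, 405, 410]
    print(f"coefficient of variation: {coefficient_of_variation(runtimes):.2%}")
    print("95% bootstrap CI for the median runtime:", bootstrap_ci_median(runtimes))

Reporting a nonparametric confidence interval alongside the median of many repetitions, rather than a single run, is one way to make such experiments more comparable; it is a sketch under stated assumptions, not the authors' methodology.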


Cite

Is Big Data Performance Reproducible in Modern Cloud Networks? / Uta, Alexandru; Custura, Alexandru; Duplyakin, Dmitry et al.
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. 2020. pp. 513-527.

Uta, A, Custura, A, Duplyakin, D, Jimenez, I, Rellermeyer, J, Maltzahn, C, Ricci, R & Iosup, A 2020, Is Big Data Performance Reproducible in Modern Cloud Networks? in Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. pp. 513-527, 17th USENIX Symposium on Networked Systems Design and Implementation, United States, 25 Feb 2020.
Uta, A., Custura, A., Duplyakin, D., Jimenez, I., Rellermeyer, J., Maltzahn, C., Ricci, R., & Iosup, A. (2020). Is Big Data Performance Reproducible in Modern Cloud Networks? In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 (pp. 513-527).
Uta A, Custura A, Duplyakin D, Jimenez I, Rellermeyer J, Maltzahn C et al. Is Big Data Performance Reproducible in Modern Cloud Networks? in Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. 2020. pp. 513-527.
Uta, Alexandru ; Custura, Alexandru ; Duplyakin, Dmitry et al. / Is Big Data Performance Reproducible in Modern Cloud Networks?. Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. 2020. pp. 513-527.
BibTeX
@inproceedings{cb389b4283b040d587999ceffe280d57,
title = "Is Big Data Performance Reproducible in Modern Cloud Networks?",
abstract = "Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.",
author = "Alexandru Uta and Alexandru Custura and Dmitry Duplyakin and Ivo Jimenez and Jan Rellermeyer and Carlos Maltzahn and Robert Ricci and Alexandru Iosup",
note = "Funding information: We thank our shepherd Amar Phanishayee and all the anonymous reviewers for all their valuable suggestions. Work on this article was funded via NWO VIDI MagnaData (#14826), SURFsara e-infra180061, as well as NSF Grant numbers CNS-1419199, CNS-1743363, OAC-1836650, CNS-1764102, CNS-1705021, OAC-1450488, and the Center for Research in Open Source Software.; 17th USENIX Symposium on Networked Systems Design and Implementation ; Conference date: 25-02-2020 Through 27-02-2020",
year = "2020",
month = feb,
day = "25",
language = "English",
pages = "513--527",
booktitle = "Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020",

}

RIS

TY - GEN

T1 - Is Big Data Performance Reproducible in Modern Cloud Networks?

AU - Uta, Alexandru

AU - Custura, Alexandru

AU - Duplyakin, Dmitry

AU - Jimenez, Ivo

AU - Rellermeyer, Jan

AU - Maltzahn, Carlos

AU - Ricci, Robert

AU - Iosup, Alexandru

N1 - Funding information: We thank our shepherd Amar Phanishayee and all the anonymous reviewers for all their valuable suggestions. Work on this article was funded via NWO VIDI MagnaData (#14826), SURFsara e-infra180061, as well as NSF Grant numbers CNS-1419199, CNS-1743363, OAC-1836650, CNS-1764102, CNS-1705021, OAC-1450488, and the Center for Research in Open Source Software.

PY - 2020/2/25

Y1 - 2020/2/25

N2 - Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.

AB - Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.

UR - http://www.scopus.com/inward/record.url?scp=85084920033&partnerID=8YFLogxK

M3 - Conference contribution

SP - 513

EP - 527

BT - Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020

T2 - 17th USENIX Symposium on Networked Systems Design and Implementation

Y2 - 25 February 2020 through 27 February 2020

ER -
