Details
Original language | English |
---|---|
Title of host publication | Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 |
Pages | 513-527 |
Number of pages | 15 |
ISBN (electronic) | 9781939133137 |
Publication status | Published - 25 Feb 2020 |
Externally published | Yes |
Event | 17th USENIX Symposium on Networked Systems Design and Implementation - , United States Duration: 25 Feb 2020 → 27 Feb 2020 |
ASJC Scopus subject areas
- Engineering(all)
- Control and Systems Engineering
- Computer Science(all)
- Computer Networks and Communications
Cite this
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. 2020. p. 513-527.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Is Big Data Performance Reproducible in Modern Cloud Networks?
AU - Uta, Alexandru
AU - Custura, Alexandru
AU - Duplyakin, Dmitry
AU - Jimenez, Ivo
AU - Rellermeyer, Jan
AU - Maltzahn, Carlos
AU - Ricci, Robert
AU - Iosup, Alexandru
N1 - Funding information: We thank our shepherd Amar Phanishayee and all the anonymous reviewers for all their valuable suggestions. Work on this article was funded via NWO VIDI MagnaData (#14826), SURFsara e-infra180061, as well as NSF Grant numbers CNS-1419199, CNS-1743363, OAC-1836650, CNS-1764102, CNS-1705021, OAC-1450488, and the Center for Research in Open Source Software.
PY - 2020/2/25
Y1 - 2020/2/25
N2 - Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.
UR - http://www.scopus.com/inward/record.url?scp=85084920033&partnerID=8YFLogxK
M3 - Conference contribution
SP - 513
EP - 527
BT - Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020
T2 - 17th USENIX Symposium on Networked Systems Design and Implementation
Y2 - 25 February 2020 through 27 February 2020
ER -