Details
Original language | English |
---|---|
Pages (from-to) | 1545-1559 |
Number of pages | 15 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 29 |
Issue number | 7 |
Early online date | 1 Jan 2018 |
Publication status | Published - 1 Jul 2018 |
Abstract
Parallel computing has become a standard tool with architectures such as Google MapReduce, Hadoop, and Spark being broadly used in applications such as data processing and machine learning. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a join operation where completed tasks wait for the other tasks of the job before leaving the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging, and few explicit results are known. In this work, we formulate a max-plus server model for parallel systems which allows us to derive performance bounds for a variety of systems in the GI GI and G G cases. We contribute end-to-end delay bounds for multi-stage fork-join networks. We perform a detailed comparison of different multi-server configurations, including an analysis of single-queue fork-join systems that achieve a fundamental performance gain. We compare these results to both simulation and a live Spark system.
Keywords
- Hadoop, MapReduce, Parallel computing, performance analysis, spark, stochastic network calculus
ASJC Scopus subject areas
- Computer Science(all)
- Signal Processing
- Computer Science(all)
- Hardware and Architecture
- Computer Science(all)
- Computational Theory and Mathematics
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
In: IEEE Transactions on Parallel and Distributed Systems, Vol. 29, No. 7, 01.07.2018, p. 1545-1559.
Research output: Contribution to journal › Article › Research › peer review
}
TY - JOUR
T1 - Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints
AU - Fidler, Markus
AU - Walker, Brenton
AU - Jiang, Yuming
N1 - Funding information: This work was supported in part by the European Research Council (ERC) under StG 306644.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - Parallel computing has become a standard tool with architectures such as Google MapReduce, Hadoop, and Spark being broadly used in applications such as data processing and machine learning. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a join operation where completed tasks wait for the other tasks of the job before leaving the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging, and few explicit results are known. In this work, we formulate a max-plus server model for parallel systems which allows us to derive performance bounds for a variety of systems in the GI GI and G G cases. We contribute end-to-end delay bounds for multi-stage fork-join networks. We perform a detailed comparison of different multi-server configurations, including an analysis of single-queue fork-join systems that achieve a fundamental performance gain. We compare these results to both simulation and a live Spark system.
AB - Parallel computing has become a standard tool with architectures such as Google MapReduce, Hadoop, and Spark being broadly used in applications such as data processing and machine learning. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a join operation where completed tasks wait for the other tasks of the job before leaving the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging, and few explicit results are known. In this work, we formulate a max-plus server model for parallel systems which allows us to derive performance bounds for a variety of systems in the GI GI and G G cases. We contribute end-to-end delay bounds for multi-stage fork-join networks. We perform a detailed comparison of different multi-server configurations, including an analysis of single-queue fork-join systems that achieve a fundamental performance gain. We compare these results to both simulation and a live Spark system.
KW - Hadoop
KW - MapReduce
KW - Parallel computing
KW - performance analysis
KW - spark
KW - stochastic network calculus
UR - http://www.scopus.com/inward/record.url?scp=85040042465&partnerID=8YFLogxK
U2 - 10.48550/arXiv.1610.06309
DO - 10.48550/arXiv.1610.06309
M3 - Article
AN - SCOPUS:85040042465
VL - 29
SP - 1545
EP - 1559
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
IS - 7
ER -