Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Authors

View graph of relations

Details

Original languageEnglish
Title of host publicationINFOCOM 2020
Subtitle of host publication IEEE Conference on Computer Communications
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1063-1072
Number of pages10
ISBN (electronic)9781728164120
ISBN (print)978-1-7281-6413-7
Publication statusPublished - 2020
Event38th IEEE Conference on Computer Communications, INFOCOM 2020 - Toronto, Canada
Duration: 6 Jul 20209 Jul 2020

Publication series

NameProceedings - IEEE INFOCOM
Volume2020-July
ISSN (Print)0743-166X

Abstract

Models of parallel processing systems typically assume that one has l servers and jobs are split into an equal number of k = l tasks. This seemingly simple approximation has surprisingly large consequences for the resulting stability and performance bounds. In reality, best practices for modern mapreduce systems indicate that a job's partitioning factor should be much larger than the number of servers available, with some researchers going to far as to advocate for a tiny tasks regime, where jobs are split into over 10,000 tasks. In this paper we use recent advances in stochastic network calculus to fundamentally understand the effects of task granularity on parallel systems' scaling, stability, and performance. For the split-merge model, we show that when one allows for tiny tasks, the stability region is actually much better than had previously been concluded. For the single-queue fork-join model, we show that sojourn times quickly approach the optimal case when l big tasks are subdivided into k≫ l tiny tasks. Our results are validated using extensive simulations, and the applicability of the models used is validated by experiments on an Apache Spark cluster.

ASJC Scopus subject areas

Cite this

Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems. / Fidler, Markus; Walker, Brenton; Bora, Stefan.
INFOCOM 2020 : IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2020. p. 1063-1072 9155368 (Proceedings - IEEE INFOCOM; Vol. 2020-July).

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Fidler, M, Walker, B & Bora, S 2020, Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems. in INFOCOM 2020 : IEEE Conference on Computer Communications., 9155368, Proceedings - IEEE INFOCOM, vol. 2020-July, Institute of Electrical and Electronics Engineers Inc., pp. 1063-1072, 38th IEEE Conference on Computer Communications, INFOCOM 2020, Toronto, Canada, 6 Jul 2020. https://doi.org/10.1109/infocom41043.2020.9155368
Fidler, M., Walker, B., & Bora, S. (2020). Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems. In INFOCOM 2020 : IEEE Conference on Computer Communications (pp. 1063-1072). Article 9155368 (Proceedings - IEEE INFOCOM; Vol. 2020-July). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/infocom41043.2020.9155368
Fidler M, Walker B, Bora S. Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems. In INFOCOM 2020 : IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc. 2020. p. 1063-1072. 9155368. (Proceedings - IEEE INFOCOM). doi: 10.1109/infocom41043.2020.9155368
Fidler, Markus ; Walker, Brenton ; Bora, Stefan. / Tiny Tasks : A Remedy for Synchronization Constraints in Multi-Server Systems. INFOCOM 2020 : IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 1063-1072 (Proceedings - IEEE INFOCOM).
Download
@inproceedings{62e97c90f40840ebaee7ebe5c55f8030,
title = "Tiny Tasks: A Remedy for Synchronization Constraints in Multi-Server Systems",
abstract = "Models of parallel processing systems typically assume that one has l servers and jobs are split into an equal number of k = l tasks. This seemingly simple approximation has surprisingly large consequences for the resulting stability and performance bounds. In reality, best practices for modern mapreduce systems indicate that a job's partitioning factor should be much larger than the number of servers available, with some researchers going to far as to advocate for a tiny tasks regime, where jobs are split into over 10,000 tasks. In this paper we use recent advances in stochastic network calculus to fundamentally understand the effects of task granularity on parallel systems' scaling, stability, and performance. For the split-merge model, we show that when one allows for tiny tasks, the stability region is actually much better than had previously been concluded. For the single-queue fork-join model, we show that sojourn times quickly approach the optimal case when l big tasks are subdivided into k≫ l tiny tasks. Our results are validated using extensive simulations, and the applicability of the models used is validated by experiments on an Apache Spark cluster.",
author = "Markus Fidler and Brenton Walker and Stefan Bora",
note = "Funding information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1).; 38th IEEE Conference on Computer Communications, INFOCOM 2020 ; Conference date: 06-07-2020 Through 09-07-2020",
year = "2020",
doi = "10.1109/infocom41043.2020.9155368",
language = "English",
isbn = "978-1-7281-6413-7",
series = "Proceedings - IEEE INFOCOM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1063--1072",
booktitle = "INFOCOM 2020",
address = "United States",

}

Download

TY - GEN

T1 - Tiny Tasks

T2 - 38th IEEE Conference on Computer Communications, INFOCOM 2020

AU - Fidler, Markus

AU - Walker, Brenton

AU - Bora, Stefan

N1 - Funding information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1).

PY - 2020

Y1 - 2020

N2 - Models of parallel processing systems typically assume that one has l servers and jobs are split into an equal number of k = l tasks. This seemingly simple approximation has surprisingly large consequences for the resulting stability and performance bounds. In reality, best practices for modern mapreduce systems indicate that a job's partitioning factor should be much larger than the number of servers available, with some researchers going to far as to advocate for a tiny tasks regime, where jobs are split into over 10,000 tasks. In this paper we use recent advances in stochastic network calculus to fundamentally understand the effects of task granularity on parallel systems' scaling, stability, and performance. For the split-merge model, we show that when one allows for tiny tasks, the stability region is actually much better than had previously been concluded. For the single-queue fork-join model, we show that sojourn times quickly approach the optimal case when l big tasks are subdivided into k≫ l tiny tasks. Our results are validated using extensive simulations, and the applicability of the models used is validated by experiments on an Apache Spark cluster.

AB - Models of parallel processing systems typically assume that one has l servers and jobs are split into an equal number of k = l tasks. This seemingly simple approximation has surprisingly large consequences for the resulting stability and performance bounds. In reality, best practices for modern mapreduce systems indicate that a job's partitioning factor should be much larger than the number of servers available, with some researchers going to far as to advocate for a tiny tasks regime, where jobs are split into over 10,000 tasks. In this paper we use recent advances in stochastic network calculus to fundamentally understand the effects of task granularity on parallel systems' scaling, stability, and performance. For the split-merge model, we show that when one allows for tiny tasks, the stability region is actually much better than had previously been concluded. For the single-queue fork-join model, we show that sojourn times quickly approach the optimal case when l big tasks are subdivided into k≫ l tiny tasks. Our results are validated using extensive simulations, and the applicability of the models used is validated by experiments on an Apache Spark cluster.

UR - http://www.scopus.com/inward/record.url?scp=85090297693&partnerID=8YFLogxK

U2 - 10.1109/infocom41043.2020.9155368

DO - 10.1109/infocom41043.2020.9155368

M3 - Conference contribution

AN - SCOPUS:85090297693

SN - 978-1-7281-6413-7

T3 - Proceedings - IEEE INFOCOM

SP - 1063

EP - 1072

BT - INFOCOM 2020

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 6 July 2020 through 9 July 2020

ER -

By the same author(s)