Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer-reviewed

Authors

Brenton Walker, Stefan Bora, Markus Fidler

Details

Original language: English
Title of host publication: INFOCOM 2022 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 460-469
Number of pages: 10
ISBN (electronic): 978-1-6654-5822-1
ISBN (print): 978-1-6654-5823-8
Publication status: Published - 2022
Event: 41st IEEE Conference on Computer Communications, INFOCOM 2022 - Virtual, Online, United Kingdom (UK)
Duration: 2 May 2022 to 5 May 2022

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2022-May
ISSN (print): 0743-166X
ISSN (electronic): 2641-9874

Abstract

Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.
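To make the stability penalty of barriers concrete, the sketch below uses a textbook special case rather than the paper's own model: each job forks into k tasks on k servers, task service times are i.i.d. exponential with rate mu, and both a start and a departure barrier apply (a split-merge system). A job then occupies all servers for the maximum of k exponentials, whose mean is H_k / mu, so the queue is stable only if lam * H_k / mu < 1. The non-blocking counterpart (fork-join, one independent FIFO queue per server) is stable whenever lam < mu. The function names are hypothetical, and the exponential-service assumption is ours, not taken from the paper.

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """Exact harmonic number H_k = 1 + 1/2 + ... + 1/k."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def split_merge_max_rate(k: int, mu: float) -> float:
    """Largest stable arrival rate under the ASSUMED split-merge model.

    Start and departure barriers: a job holds all k servers until its
    slowest task finishes, so its service time is the max of k i.i.d.
    Exp(mu) variables with mean H_k / mu. Stability: lam * H_k / mu < 1.
    """
    return mu / float(harmonic(k))


def fork_join_max_rate(mu: float) -> float:
    """Non-blocking counterpart: each server serves its own task queue,
    so each server (and hence the system) is stable iff lam < mu."""
    return mu


if __name__ == "__main__":
    # Fraction of the non-blocking capacity that remains usable with barriers.
    for k in (1, 2, 4, 8, 16):
        ratio = split_merge_max_rate(k, 1.0) / fork_join_max_rate(1.0)
        print(f"k={k:2d}  usable capacity fraction = {ratio:.3f}")
```

Under these assumptions the usable fraction of capacity shrinks like 1/H_k, roughly 1/ln k, as the degree of parallelism grows. This is one simple way to see why barrier systems can lose stability relative to non-blocking ones.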


Cite this

Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. / Walker, Brenton; Bora, Stefan; Fidler, Markus.
INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. p. 460-469 (Proceedings - IEEE INFOCOM; Vol. 2022-May).


Walker, B, Bora, S & Fidler, M 2022, Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. in INFOCOM 2022 - IEEE Conference on Computer Communications. Proceedings - IEEE INFOCOM, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 460-469, 41st IEEE Conference on Computer Communications, INFOCOM 2022, Virtual, Online, United Kingdom (UK), 2 May 2022. https://doi.org/10.1109/INFOCOM48880.2022.9796754
Walker, B., Bora, S., & Fidler, M. (2022). Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. In INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 460-469). (Proceedings - IEEE INFOCOM; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM48880.2022.9796754
Walker B, Bora S, Fidler M. Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. In INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc. 2022. p. 460-469. (Proceedings - IEEE INFOCOM). doi: 10.1109/INFOCOM48880.2022.9796754
Walker, Brenton ; Bora, Stefan ; Fidler, Markus. / Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 460-469 (Proceedings - IEEE INFOCOM).
@inproceedings{2a89d40a6fdb400a95308b735f356147,
title = "Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers",
abstract = "Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call {"}Take-Half{"}, that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.",
author = "Brenton Walker and Stefan Bora and Markus Fidler",
note = "Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1). ; 41st IEEE Conference on Computer Communications, INFOCOM 2022 ; Conference date: 02-05-2022 Through 05-05-2022",
year = "2022",
doi = "10.1109/INFOCOM48880.2022.9796754",
language = "English",
isbn = "978-1-6654-5823-8",
series = "Proceedings - IEEE INFOCOM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "460--469",
booktitle = "INFOCOM 2022 - IEEE Conference on Computer Communications",
address = "United States",

}


TY - GEN

T1 - Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

AU - Walker, Brenton

AU - Bora, Stefan

AU - Fidler, Markus

N1 - Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1).

PY - 2022

Y1 - 2022

N2 - Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

AB - Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

UR - http://www.scopus.com/inward/record.url?scp=85126371308&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM48880.2022.9796754

DO - 10.1109/INFOCOM48880.2022.9796754

M3 - Conference contribution

AN - SCOPUS:85126371308

SN - 978-1-6654-5823-8

T3 - Proceedings - IEEE INFOCOM

SP - 460

EP - 469

BT - INFOCOM 2022 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 41st IEEE Conference on Computer Communications, INFOCOM 2022

Y2 - 2 May 2022 through 5 May 2022

ER -
