Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

Publication: Contribution to book/report/anthology/conference proceedings › Conference contribution › Research › Peer-reviewed

Authors

Brenton Walker, Stefan Bora, Markus Fidler

Details

Original language: English
Title of host publication: INFOCOM 2022 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 460-469
Number of pages: 10
ISBN (electronic): 9781665458221
ISBN (print): 978-1-6654-5823-8
Publication status: Published - 2022
Event: 41st IEEE Conference on Computer Communications, INFOCOM 2022 - Virtual, Online, United Kingdom
Duration: 2 May 2022 to 5 May 2022

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2022-May
ISSN (print): 0743-166X
ISSN (electronic): 2641-9874

Abstract

Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.
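
The trade-off between waiting time and job execution time described in the abstract can be explored with a small simulation. The following is a minimal sketch and not the authors' model or code: it assumes Poisson arrivals, first-come-first-served dispatch, a start barrier only (tasks depart independently), and task service times drawn from an exponential distribution with rate `k` so that the expected total work per job stays at 1. The function name `simulate_start_barrier` and all parameters are hypothetical. It does not implement the "Take-Half" scheduler, whose details are not given in the abstract.

```python
import heapq
import random


def simulate_start_barrier(s, k, lam, n_jobs=20000, seed=1):
    """Estimate the mean job sojourn time with a blocking start barrier.

    s   : number of workers
    k   : tasks per job; all k must start at the same instant (assumption)
    lam : Poisson arrival rate of jobs (assumption)
    """
    if not 1 <= k <= s:
        raise ValueError("k must satisfy 1 <= k <= s")
    rng = random.Random(seed)
    free_at = [0.0] * s          # min-heap: time at which each worker turns idle
    arrival = 0.0
    total_sojourn = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(lam)
        # Reserve the k workers that become idle earliest.
        taken = [heapq.heappop(free_at) for _ in range(k)]
        # Start barrier: the job waits until the last of those k workers is idle.
        start = max(arrival, taken[-1])
        finish = start
        for _ in range(k):
            done = start + rng.expovariate(k)   # task time, mean 1/k (assumption)
            heapq.heappush(free_at, done)
            finish = max(finish, done)
        # The job leaves when its slowest task is done; workers free up independently.
        total_sojourn += finish - arrival
    return total_sojourn / n_jobs


if __name__ == "__main__":
    # Sweep the degree of parallelism at a fixed nominal utilization lam / s.
    for k in (1, 2, 4, 8):
        print(k, round(simulate_start_barrier(s=8, k=k, lam=4.0), 3))
```

Sweeping `k` for a fixed `s` and `lam` gives a rough feel for how more parallelism shortens execution but lengthens the wait for `k` simultaneously idle workers. Under these assumptions the run may fail to settle for larger `k`, which is the stability concern the abstract raises.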

Cite this

Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. / Walker, Brenton; Bora, Stefan; Fidler, Markus.
INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 460-469 (Proceedings - IEEE INFOCOM; Vol. 2022-May).

Walker, B, Bora, S & Fidler, M 2022, Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. in INFOCOM 2022 - IEEE Conference on Computer Communications. Proceedings - IEEE INFOCOM, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 460-469, 41st IEEE Conference on Computer Communications, INFOCOM 2022, Virtual, Online, United Kingdom, 2 May 2022. https://doi.org/10.1109/INFOCOM48880.2022.9796754
Walker, B., Bora, S., & Fidler, M. (2022). Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. In INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 460-469). (Proceedings - IEEE INFOCOM; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM48880.2022.9796754
Walker B, Bora S, Fidler M. Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. in INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc. 2022. pp. 460-469. (Proceedings - IEEE INFOCOM). doi: 10.1109/INFOCOM48880.2022.9796754
Walker, Brenton ; Bora, Stefan ; Fidler, Markus. / Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 460-469 (Proceedings - IEEE INFOCOM).
@inproceedings{2a89d40a6fdb400a95308b735f356147,
title = "Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers",
abstract = "Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call {"}Take-Half{"}, that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.",
author = "Brenton Walker and Stefan Bora and Markus Fidler",
note = "Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1). ; 41st IEEE Conference on Computer Communications, INFOCOM 2022 ; Conference date: 02-05-2022 Through 05-05-2022",
year = "2022",
doi = "10.1109/INFOCOM48880.2022.9796754",
language = "English",
isbn = "978-1-6654-5823-8",
series = "Proceedings - IEEE INFOCOM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "460--469",
booktitle = "INFOCOM 2022 - IEEE Conference on Computer Communications",
address = "United States",

}


TY - GEN

T1 - Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

AU - Walker, Brenton

AU - Bora, Stefan

AU - Fidler, Markus

N1 - Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1).

PY - 2022

Y1 - 2022

N2 - Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

AB - Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, which we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

UR - http://www.scopus.com/inward/record.url?scp=85126371308&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM48880.2022.9796754

DO - 10.1109/INFOCOM48880.2022.9796754

M3 - Conference contribution

AN - SCOPUS:85126371308

SN - 978-1-6654-5823-8

T3 - Proceedings - IEEE INFOCOM

SP - 460

EP - 469

BT - INFOCOM 2022 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 41st IEEE Conference on Computer Communications, INFOCOM 2022

Y2 - 2 May 2022 through 5 May 2022

ER -
