Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

Brenton Walker; Stefan Bora; Markus Fidler

doi:10.1109/INFOCOM48880.2022.9796754

Details

Original language	English
Title of host publication	INFOCOM 2022 - IEEE Conference on Computer Communications
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	460-469
Number of pages	10
ISBN (electronic)	9781665458221
ISBN (print)	978-1-6654-5823-8
Publication status	Published - 2022
Event	41st IEEE Conference on Computer Communications, INFOCOM 2022 - Virtual, Online, United Kingdom. Duration: 2 May 2022 → 5 May 2022

Publication series

Name	Proceedings - IEEE INFOCOM
Volume	2022-May
ISSN (print)	0743-166X
ISSN (electronic)	2641-9874

Abstract

Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

ASJC Scopus subject areas

Computer Science (all)
General Computer Science
Engineering (all)
Electrical and Electronic Engineering

Cite this

Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. / Walker, Brenton; Bora, Stefan; Fidler, Markus.
INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 460-469 (Proceedings - IEEE INFOCOM; Vol. 2022-May).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Walker, B, Bora, S & Fidler, M 2022, Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. in INFOCOM 2022 - IEEE Conference on Computer Communications. Proceedings - IEEE INFOCOM, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 460-469, 41st IEEE Conference on Computer Communications, INFOCOM 2022, Virtual, Online, United Kingdom, 2 May 2022. https://doi.org/10.1109/INFOCOM48880.2022.9796754

Walker, B., Bora, S., & Fidler, M. (2022). Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. In INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 460-469). (Proceedings - IEEE INFOCOM; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM48880.2022.9796754

Walker B, Bora S, Fidler M. Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. In INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc. 2022. pp. 460-469. (Proceedings - IEEE INFOCOM). doi: 10.1109/INFOCOM48880.2022.9796754

Walker, Brenton ; Bora, Stefan ; Fidler, Markus. / Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers. INFOCOM 2022 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 460-469 (Proceedings - IEEE INFOCOM).

@inproceedings{2a89d40a6fdb400a95308b735f356147,

title = "Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers",

abstract = "Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, we call {"}Take-Half{"}, that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.",

author = "Brenton Walker and Stefan Bora and Markus Fidler",

note = "Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1). ; 41st IEEE Conference on Computer Communications, INFOCOM 2022 ; Conference date: 02-05-2022 Through 05-05-2022",

year = "2022",

doi = "10.1109/INFOCOM48880.2022.9796754",

language = "English",

isbn = "978-1-6654-5823-8",

series = "Proceedings - IEEE INFOCOM",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "460--469",

booktitle = "INFOCOM 2022 - IEEE Conference on Computer Communications",

address = "United States",

}

TY - GEN

T1 - Performance and Scaling of Parallel Systems with Blocking Start and/or Departure Barriers

AU - Walker, Brenton

AU - Bora, Stefan

AU - Fidler, Markus

N1 - Funding Information: This work was supported in part by the German Research Council (DFG) under Grant VaMoS (FI 1236/7-1).

PY - 2022

Y1 - 2022

AB - Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems. We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.

UR - http://www.scopus.com/inward/record.url?scp=85126371308&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM48880.2022.9796754

DO - 10.1109/INFOCOM48880.2022.9796754

M3 - Conference contribution

AN - SCOPUS:85126371308

SN - 978-1-6654-5823-8

T3 - Proceedings - IEEE INFOCOM

SP - 460

EP - 469

BT - INFOCOM 2022 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 41st IEEE Conference on Computer Communications, INFOCOM 2022

Y2 - 2 May 2022 through 5 May 2022

ER -

By the same authors

Statistical Age-of-Information Bounds for Parallel Systems: When Do Independent Channels Make a Difference?

The Tiny-Tasks Granularity Trade-Off: Balancing Overhead Versus Performance in Parallel Systems

A Min-plus Model of Age-of-Information with Worst-case and Statistical Bounds

Age-of-Information in Tandem Queues with Delayed Feedback: Zero-Wait vs. Pipelining

Age- and deviation-of-information of hybrid time- and event-triggered systems: What matters more, determinism or resource conservation?