Analyzing the memory ordering models of the Apple M1

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Lars Wrenger
  • Dominik Töllner
  • Daniel Lohmann

Details

Original language: English
Article number: 103102
Number of pages: 8
Journal: Journal of Systems Architecture
Volume: 149
Early online date: 4 Mar 2024
Publication status: Published - Apr 2024

Abstract

The Apple M1 ARM processor family incorporates two memory consistency models: the conventional ARM weak memory ordering and the Total store ordering (TSO) model from the x86 architecture utilized by Apple's x86 emulator, Rosetta 2. The presence of both memory ordering models on the same hardware enables us to thoroughly benchmark and compare their performance characteristics and worst-case workloads. In this paper, we assess the performance implications of TSO on the Apple M1 processor architecture. Based on the multi-threading workloads of the SPEC2017 CPU FP benchmark suite, our findings indicate that TSO is, on average, 8.94 percent slower than ARM's weaker memory ordering. Through synthetic benchmarks, we further explore the workloads that experience the most significant performance degradation due to TSO. We also take a deeper look into the specific atomic instructions provided by the ARMv8.3 specification and their synchronization overheads.
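To illustrate the difference between the two ordering models named in the abstract, the following is a minimal sketch of the classic message-passing litmus test. It is not the paper's benchmark and not a faithful model of either architecture: it assumes a deliberately simplified, reordering-based view in which TSO only lets a store be overtaken by a later load to a different address, while the weak model lets any two accesses to different addresses swap. All names (`WRITER`, `READER`, `may_reorder`, `thread_orders`, `interleavings`, `outcomes`) are hypothetical and chosen for this example only.

```python
from itertools import permutations

# Message-passing litmus test: (kind, variable, value-or-register).
# The writer publishes data, then sets a flag; the reader checks the
# flag, then reads the data. No barriers are used in either thread.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def may_reorder(first, second, model):
    """Return True if `second` may execute before `first`, where
    `first` comes earlier in program order (simplified assumption)."""
    if first[1] == second[1]:
        return False  # same address: kept in order in this sketch
    if model == "weak":
        return True   # any pair of different-address accesses may swap
    # TSO: only a store followed by a later load may be reordered.
    return first[0] == "store" and second[0] == "load"


def thread_orders(ops, model):
    """All execution orders of one thread that the model permits."""
    orders = []
    for perm in permutations(range(len(ops))):
        allowed = True
        for i, a in enumerate(perm):
            for b in perm[i + 1:]:
                # a runs before b; if a is later in program order,
                # it overtook b and the model must permit that.
                if a > b and not may_reorder(ops[b], ops[a], model):
                    allowed = False
        if allowed:
            orders.append([ops[i] for i in perm])
    return orders


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps their order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(model):
    """Set of (r1, r2) results reachable under the given model."""
    results = set()
    for w in thread_orders(WRITER, model):
        for r in thread_orders(READER, model):
            for schedule in interleavings(w, r):
                memory = {"data": 0, "flag": 0}
                regs = {}
                for kind, var, arg in schedule:
                    if kind == "store":
                        memory[var] = arg
                    else:
                        regs[arg] = memory[var]
                results.add((regs["r1"], regs["r2"]))
    return results


if __name__ == "__main__":
    for model in ("tso", "weak"):
        print(model, sorted(outcomes(model)))
```

In this sketch the outcome `(1, 0)`, where the reader sees the flag but stale data, is reachable only under the weak model. Code written for TSO may rely on that outcome being impossible, which is the kind of guarantee a TSO mode on the hardware provides to emulated x86 code.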

Keywords

    Apple M1, ARM, Memory ordering, TSO

Cite this

Analyzing the memory ordering models of the Apple M1. / Wrenger, Lars; Töllner, Dominik; Lohmann, Daniel.
In: Journal of Systems Architecture, Vol. 149, 103102, 04.2024.

Wrenger L, Töllner D, Lohmann D. Analyzing the memory ordering models of the Apple M1. Journal of Systems Architecture. 2024 Apr;149:103102. Epub 2024 Mar 4. doi: 10.1016/j.sysarc.2024.103102
Wrenger, Lars; Töllner, Dominik; Lohmann, Daniel. / Analyzing the memory ordering models of the Apple M1. In: Journal of Systems Architecture. 2024; Vol. 149.
@article{b22c44b5838441f7b74c74c5434d90b8,
title = "Analyzing the memory ordering models of the Apple M1",
abstract = "The Apple M1 ARM processor family incorporates two memory consistency models: the conventional ARM weak memory ordering and the Total store ordering (TSO) model from the x86 architecture utilized by Apple's x86 emulator, Rosetta 2. The presence of both memory ordering models on the same hardware enables us to thoroughly benchmark and compare their performance characteristics and worst-case workloads. In this paper, we assess the performance implications of TSO on the Apple M1 processor architecture. Based on the multi-threading workloads of the SPEC2017 CPU FP benchmark suite, our findings indicate that TSO is, on average, 8.94 percent slower than ARM's weaker memory ordering. Through synthetic benchmarks, we further explore the workloads that experience the most significant performance degradation due to TSO. We also take a deeper look into the specific atomic instructions provided by the ARMv8.3 specification and their synchronization overheads.",
keywords = "Apple M1, ARM, Memory ordering, TSO",
author = "Lars Wrenger and Dominik T{\"o}llner and Daniel Lohmann",
note = "Funding Information: We thank our reviewers for their valuable feedback. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – LO 1719/8-1.",
year = "2024",
month = apr,
doi = "10.1016/j.sysarc.2024.103102",
language = "English",
volume = "149",
journal = "Journal of Systems Architecture",
issn = "1383-7621",
publisher = "Elsevier",
}

TY - JOUR

T1 - Analyzing the memory ordering models of the Apple M1

AU - Wrenger, Lars

AU - Töllner, Dominik

AU - Lohmann, Daniel

N1 - Funding Information: We thank our reviewers for their valuable feedback. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – LO 1719/8-1.

PY - 2024/4

Y1 - 2024/4

AB - The Apple M1 ARM processor family incorporates two memory consistency models: the conventional ARM weak memory ordering and the Total store ordering (TSO) model from the x86 architecture utilized by Apple's x86 emulator, Rosetta 2. The presence of both memory ordering models on the same hardware enables us to thoroughly benchmark and compare their performance characteristics and worst-case workloads. In this paper, we assess the performance implications of TSO on the Apple M1 processor architecture. Based on the multi-threading workloads of the SPEC2017 CPU FP benchmark suite, our findings indicate that TSO is, on average, 8.94 percent slower than ARM's weaker memory ordering. Through synthetic benchmarks, we further explore the workloads that experience the most significant performance degradation due to TSO. We also take a deeper look into the specific atomic instructions provided by the ARMv8.3 specification and their synchronization overheads.

KW - Apple M1

KW - ARM

KW - Memory ordering

KW - TSO

UR - http://www.scopus.com/inward/record.url?scp=85186716348&partnerID=8YFLogxK

U2 - 10.1016/j.sysarc.2024.103102

DO - 10.1016/j.sysarc.2024.103102

M3 - Article

AN - SCOPUS:85186716348

VL - 149

JO - Journal of Systems Architecture

JF - Journal of Systems Architecture

SN - 1383-7621

M1 - 103102

ER -