Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Michael Beyer
  • Sven Gesper
  • Andre Guntoro
  • Guillermo Paya-Vaya
  • Holger Blume

External organisations

  • Robert Bosch GmbH
  • Technische Universität Braunschweig

Details

Original language: English
Title of host publication: Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 61-68
Number of pages: 8
ISBN (electronic): 9798350346855
ISBN (print): 979-8-3503-4686-2
Publication status: Published - 2023
Event: 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 - Porto, Portugal
Duration: 19 July 2023 - 21 July 2023

Publication series

Name: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
Volume: 2023-July
ISSN (print): 1063-6862

Abstract

Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, specialized hardware that supports computations with low bit widths is required to take advantage of such optimizations. In this work, we propose permutations at the subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of required data transfers by up to 1.9×. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.
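The abstract's core idea, reusing one loaded vector word across several multiply-accumulate lanes by rearranging its packed subwords, can be sketched in software. The following Python model is purely illustrative and is not the paper's hardware implementation; the lane width (8 bits), word width (32 bits), and all function names (`pack`, `unpack`, `permute`, `simd_mac`) are assumptions made for this sketch.

```python
def pack(subwords):
    """Pack four unsigned 8-bit values into one 32-bit word (lane 0 = LSB)."""
    assert len(subwords) == 4 and all(0 <= v < 256 for v in subwords)
    w = 0
    for i, v in enumerate(subwords):
        w |= v << (8 * i)
    return w

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def permute(word, order):
    """Rearrange the four 8-bit subwords of `word` according to `order`."""
    lanes = unpack(word)
    return pack([lanes[i] for i in order])

def simd_mac(acc, a_word, b_word):
    """Lane-wise 8-bit multiply-accumulate on two packed words."""
    return [acc[i] + unpack(a_word)[i] * unpack(b_word)[i] for i in range(4)]

# A single loaded activation word is consumed twice, in two different lane
# orders, emulating the extra data reuse that subword permutation enables.
acts = pack([1, 2, 3, 4])
wts = pack([5, 6, 7, 8])
acc = simd_mac([0, 0, 0, 0], acts, wts)            # [5, 12, 21, 32]
acc = simd_mac(acc, permute(acts, [1, 2, 3, 0]), wts)  # [15, 30, 49, 40]
```

In a real vector processor the permutation would be a hardware subword shuffle inside the datapath, so the second MAC needs no additional memory access; this sketch only models the arithmetic effect.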

ASJC Scopus subject areas

Cite

Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. / Beyer, Michael; Gesper, Sven; Guntoro, Andre et al.
Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 61-68 (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors; Vol. 2023-July).


Beyer, M, Gesper, S, Guntoro, A, Paya-Vaya, G & Blume, H 2023, Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. in Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Vol. 2023-July, Institute of Electrical and Electronics Engineers Inc., pp. 61-68, 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023, Porto, Portugal, 19 July 2023. https://doi.org/10.1109/ASAP57973.2023.00023
Beyer, M., Gesper, S., Guntoro, A., Paya-Vaya, G., & Blume, H. (2023). Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. In Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 (pp. 61-68). (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors; Vol. 2023-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASAP57973.2023.00023
Beyer M, Gesper S, Guntoro A, Paya-Vaya G, Blume H. Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. In Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 61-68. (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors). doi: 10.1109/ASAP57973.2023.00023
Beyer, Michael ; Gesper, Sven ; Guntoro, Andre et al. / Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 61-68 (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors).
BibTeX
@inproceedings{652b927dcaba485face632adffdbac25,
title = "Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency",
abstract = "Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, specialized hardware that supports computations with low bit widths is required to take advantage of such optimizations. In this work, we propose permutations at the subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of required data transfers by up to 1.9×. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.",
keywords = "Application-Specific Processor, CNN, Neural Network Hardware, Subword Permutation",
author = "Michael Beyer and Sven Gesper and Andre Guntoro and Guillermo Paya-Vaya and Holger Blume",
note = "Funding Information: This work is supported by the German federal ministry of education and research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062).; 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 ; Conference date: 19-07-2023 Through 21-07-2023",
year = "2023",
doi = "10.1109/ASAP57973.2023.00023",
language = "English",
isbn = "979-8-3503-4686-2",
series = "Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "61--68",
booktitle = "Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023",
address = "United States",

}

RIS

TY - GEN

T1 - Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency

AU - Beyer, Michael

AU - Gesper, Sven

AU - Guntoro, Andre

AU - Paya-Vaya, Guillermo

AU - Blume, Holger

N1 - Funding Information: This work is supported by the German federal ministry of education and research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062).

PY - 2023

Y1 - 2023

N2 - Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, specialized hardware that supports computations with low bit widths is required to take advantage of such optimizations. In this work, we propose permutations at the subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of required data transfers by up to 1.9×. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.

AB - Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, specialized hardware that supports computations with low bit widths is required to take advantage of such optimizations. In this work, we propose permutations at the subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of required data transfers by up to 1.9×. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.

KW - Application-Specific Processor

KW - CNN

KW - Neural Network Hardware

KW - Subword Permutation

UR - http://www.scopus.com/inward/record.url?scp=85174836754&partnerID=8YFLogxK

U2 - 10.1109/ASAP57973.2023.00023

DO - 10.1109/ASAP57973.2023.00023

M3 - Conference contribution

AN - SCOPUS:85174836754

SN - 979-8-3503-4686-2

T3 - Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors

SP - 61

EP - 68

BT - Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023

Y2 - 19 July 2023 through 21 July 2023

ER -
