Details
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 61-68 |
Number of pages | 8 |
ISBN (electronic) | 9798350346855 |
ISBN (print) | 979-8-3503-4686-2 |
Publication status | Published - 2023 |
Event | 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 - Porto, Portugal. Duration: 19 Jul 2023 → 21 Jul 2023 |
Publication series
Name | Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors |
---|---|
Volume | 2023-July |
ISSN (Print) | 1063-6862 |
Abstract
Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, taking advantage of such optimizations requires specialized hardware that supports computations with low bit widths. In this work, we propose subword-level permutations that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of up to 1.9× in required data transfers. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.
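The data-reuse idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the lane count `LANES`, the helpers `load`, `simd_mac` and `permute`, and the load counters are all hypothetical names chosen for illustration. It models a vector register as a list of subwords and compares a 1D convolution (in the CNN sense, without kernel flipping) done with one vector load per kernel tap against a variant that loads each register once and derives the shifted windows by a subword permutation.

```python
LANES = 4  # assumed number of subwords per vector register (e.g. four 8-bit values)


def load(x, start):
    """Model one vector load of LANES subwords, zero-padded past the end of x."""
    chunk = x[start:start + LANES]
    return chunk + [0] * (LANES - len(chunk))


def simd_mac(acc, a, b):
    """Lane-wise multiply-accumulate: acc[i] + a[i] * b[i] for every lane."""
    return [c + p * q for c, p, q in zip(acc, a, b)]


def permute(lo, hi, shift):
    """Hypothetical subword permutation: pick LANES consecutive subwords
    from the concatenation of two registers, starting at `shift`."""
    return (lo + hi)[shift:shift + LANES]


def conv1d_simd(x, w):
    """Plain SIMD: every kernel tap triggers its own vector load."""
    n_out = len(x) - len(w) + 1
    out, loads = [], 0
    for i in range(0, n_out, LANES):
        acc = [0] * LANES
        for k, wk in enumerate(w):
            window = load(x, i + k)
            loads += 1
            acc = simd_mac(acc, window, [wk] * LANES)
        out.extend(acc)
    return out[:n_out], loads


def conv1d_permuted(x, w):
    """Load each register once and reuse it across taps via permutation."""
    assert len(w) <= LANES + 1, "two registers must cover all shifted windows"
    n_out = len(x) - len(w) + 1
    out, loads = [], 0
    lo = load(x, 0)
    loads += 1
    for i in range(0, n_out, LANES):
        hi = load(x, i + LANES)
        loads += 1
        acc = [0] * LANES
        for k, wk in enumerate(w):
            window = permute(lo, hi, k)  # no memory access here
            acc = simd_mac(acc, window, [wk] * LANES)
        out.extend(acc)
        lo = hi  # the upper register becomes the lower one for the next block
    return out[:n_out], loads
```

With `x = list(range(1, 11))` and `w = [1, 2, 3]`, both functions return the same eight outputs, while the permuted variant counts 3 modeled loads instead of 6. The counts only reflect this toy model and say nothing about the speedups reported in the paper.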
Keywords
- Application-Specific Processor, CNN, Neural Network Hardware, Subword Permutation
ASJC Scopus subject areas
- Computer Science(all)
- Hardware and Architecture
- Computer Networks and Communications
Cite this
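Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency. / Beyer, Michael; Gesper, Sven; Guntoro, Andre; Paya-Vaya, Guillermo; Blume, Holger. Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 61-68 (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors; Vol. 2023-July).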
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency
AU - Beyer, Michael
AU - Gesper, Sven
AU - Guntoro, Andre
AU - Paya-Vaya, Guillermo
AU - Blume, Holger
N1 - Funding Information: This work is supported by the German Federal Ministry of Education and Research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062).
PY - 2023
Y1 - 2023
KW - Application-Specific Processor
KW - CNN
KW - Neural Network Hardware
KW - Subword Permutation
UR - http://www.scopus.com/inward/record.url?scp=85174836754&partnerID=8YFLogxK
U2 - 10.1109/ASAP57973.2023.00023
DO - 10.1109/ASAP57973.2023.00023
M3 - Conference contribution
AN - SCOPUS:85174836754
SN - 979-8-3503-4686-2
T3 - Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
SP - 61
EP - 68
BT - Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023
Y2 - 19 July 2023 through 21 July 2023
ER -