Details
Original language | English |
---|---|
Pages (from-to) | 391-408 |
Number of pages | 18 |
Journal | Microprocessing and Microprogramming |
Volume | 41 |
Issue number | 5-6 |
Publication status | Published - Oct 1995 |
Abstract
A highly parallel single-chip image signal processor architecture has been derived by analysis of image processing algorithms. Available levels of parallelism and their associated demands on data access, control and complexity of operations were taken into account. The RISC architecture, called "HiPAR-DSP", consists of a control unit, 16 parallel ASIMD-controlled datapaths with autonomous addressing and instruction selection capability, a local data cache per datapath, a shared memory with matrix-type data access and a powerful DMA unit. The proposed architecture was designed by assessing the results of an analysis of characteristic algorithm properties with respect to their inherent parallelization resources, achievable speedup and implementation costs. This resulted in a proper balance between the degree of parallelism and flexibility, leading to high performance for a wide field of applications. Additional measures were taken to support efficient high-level programmability of the processor. This was achieved by the concurrent implementation of special architectural features and a C++ programming environment. The environment consists of an adaptation of the GNU C++ compiler and an optimizing assembler, supporting all levels of concurrency offered by the hardware. While most levels of parallelization are kept invisible to the programmer, data-level parallelism is expressed by the programmer using special new data types added to the standard C/C++ data types. A sustained performance of about 2.0 giga-operations per second is achieved by the processor, clocked at 100 MHz, for numerous image processing algorithms; for example, a normalized correlation of a 512 × 512 image with a 32 × 32 correlation mask takes 450 ms. Thus, a programmable parallel processor architecture achieves a performance that hitherto required a dedicated integrated circuit.
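The performance figures quoted in the abstract can be cross-checked with a short back-of-envelope calculation. The sketch below is not taken from the paper: the function names are hypothetical, and it assumes that the 450 ms run is executed at the quoted sustained rate and that the mask is evaluated either at every pixel position or only at positions where it fits entirely inside the image. The abstract does not say which convention applies, nor how an "operation" is counted.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# All constants come from the abstract; every function name is hypothetical.

SUSTAINED_OPS_PER_SECOND = 2.0e9   # "about 2.0 Gigaoperations per second"
RUNTIME_SECONDS = 0.450            # normalized correlation time
IMAGE_SIZE = 512                   # 512 x 512 image
MASK_SIZE = 32                     # 32 x 32 correlation mask
DATAPATHS = 16                     # parallel datapaths
CLOCK_HZ = 100e6                   # 100 MHz clock


def total_operations(ops_per_second, seconds):
    """Operations executed if the sustained rate holds for the whole run."""
    return ops_per_second * seconds


def mask_point_evaluations(image_size, mask_size, valid_only):
    """Number of (window position, mask point) pairs.

    valid_only=True counts only positions where the mask lies fully inside
    the image; valid_only=False counts one position per pixel (assumption).
    """
    if valid_only:
        positions_per_axis = image_size - mask_size + 1
    else:
        positions_per_axis = image_size
    return positions_per_axis ** 2 * mask_size ** 2


def ops_per_mask_point(valid_only):
    """Implied operation budget per mask point under the stated assumptions."""
    ops = total_operations(SUSTAINED_OPS_PER_SECOND, RUNTIME_SECONDS)
    return ops / mask_point_evaluations(IMAGE_SIZE, MASK_SIZE, valid_only)


def ops_per_datapath_per_cycle():
    """Sustained operations per datapath per clock cycle."""
    return SUSTAINED_OPS_PER_SECOND / (DATAPATHS * CLOCK_HZ)


if __name__ == "__main__":
    print("ops per mask point, every pixel:", ops_per_mask_point(False))
    print("ops per mask point, valid only: ", ops_per_mask_point(True))
    print("ops per datapath per cycle:     ", ops_per_datapath_per_cycle())
```

Under these assumptions the budget works out to roughly 3.4 to 3.8 operations per mask point and 1.25 operations per datapath per cycle. That is a plausible order of magnitude for accumulating the cross term and the image energy of a normalized correlation, but the abstract does not give the exact operation count.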
Keywords
- Autonomous, Image processing, Parallel language extension, Parallel VLSI RISC processor, Parallelizing compiler, Shared memory architecture, SIMD controlling
ASJC Scopus subject areas
- Engineering(all)
- General Engineering
Cite this
In: Microprocessing and Microprogramming, Vol. 41, No. 5-6, 10.1995, p. 391-408.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Architecture and C++-programming environment of a highly parallel image signal processor
AU - Kneip, J.
AU - Ohmacht, M.
AU - Rönner, K.
AU - Pirsch, P.
PY - 1995/10
Y1 - 1995/10
KW - Autonomous
KW - Image processing
KW - Parallel language extension
KW - Parallel VLSI RISC processor
KW - Parallelizing compiler
KW - Shared memory architecture
KW - SIMD controlling
UR - http://www.scopus.com/inward/record.url?scp=0029386737&partnerID=8YFLogxK
U2 - 10.1016/0165-6074(95)00023-H
DO - 10.1016/0165-6074(95)00023-H
M3 - Article
AN - SCOPUS:0029386737
VL - 41
SP - 391
EP - 408
JO - Microprocessing and Microprogramming
JF - Microprocessing and Microprogramming
SN - 0165-6074
IS - 5-6
ER -