Details
Original language | English |
---|---|
Pages | 62-69 |
Number of pages | 8 |
Publication status | Published - 27 Sept. 2003 |
Event | 2003 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, MEDEA '03 - Antibes Juan-les-Pins, France. Duration: 29 Sept. 2004 → 3 Oct. 2004 |
Conference
Conference | 2003 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, MEDEA '03 |
---|---|
Country/Territory | France |
City | Antibes Juan-les-Pins |
Period | 29 Sept. 2004 → 3 Oct. 2004 |
Abstract
A scalable, distributed processor architecture is presented that targets high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1], which was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are spread across the chip and communicate with each other through special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
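To make the dataflow idea in the abstract more concrete, the following is a minimal toy sketch of generic Tomasulo-style tag matching with instructions issued from two threads. It is not the modified scheme of the paper: the class names `Station` and `Machine`, the two-operation instruction set, and the one-cycle latency are all assumptions made for illustration, and the broadcast that the paper distributes over on-chip networks is modeled here by a plain loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# An operand is either a ready value (int) or the tag (str) of its producer.
Operand = Union[int, str]


@dataclass
class Station:
    """Hypothetical reservation station holding one pending operation."""
    tag: str
    op: str
    src: List[Operand]

    def ready(self) -> bool:
        return all(isinstance(s, int) for s in self.src)

    def capture(self, tag: str, value: int) -> None:
        # Replace every operand that waits on `tag` with the broadcast value.
        self.src = [value if s == tag else s for s in self.src]

    def execute(self) -> int:
        a, b = self.src
        return a + b if self.op == "add" else a * b


class Machine:
    """Toy model: per-thread register maps, shared pool of stations."""

    def __init__(self, threads: int) -> None:
        self.regs: List[Dict[str, Operand]] = [dict() for _ in range(threads)]
        self.stations: List[Station] = []
        self.count = 0

    def issue(self, tid: int, op: str, dst: str, a: Operand, b: Operand) -> None:
        regs = self.regs[tid]
        # Strings name registers of the issuing thread; ints are immediates.
        src = [regs.get(x, 0) if isinstance(x, str) else x for x in (a, b)]
        tag = f"t{tid}.{self.count}"
        self.count += 1
        self.stations.append(Station(tag, op, src))
        regs[dst] = tag  # rename: dst now refers to the producing station

    def step(self) -> int:
        """One cycle: every ready station fires and broadcasts its result."""
        fired = [s for s in self.stations if s.ready()]
        self.stations = [s for s in self.stations if not s.ready()]
        for s in fired:
            value = s.execute()
            for waiting in self.stations:
                waiting.capture(s.tag, value)
            for regs in self.regs:
                for name in list(regs):
                    if regs[name] == s.tag:
                        regs[name] = value
        return len(fired)


m = Machine(threads=2)
m.issue(0, "add", "r1", 2, 3)     # thread 0: r1 = 2 + 3
m.issue(0, "mul", "r2", "r1", 4)  # thread 0: r2 = r1 * 4, waits on r1
m.issue(1, "mul", "r1", 6, 7)     # thread 1: r1 = 6 * 7, independent
first = m.step()
second = m.step()
```

In this sketch the independent instructions of both threads fire in the same cycle, while the dependent multiply of thread 0 waits one cycle for its operand; no global scoreboard is consulted, only tag matches.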
ASJC Scopus subject areas
- Computer Science (all)
- Computer Science Applications
- Computer Science (all)
- Hardware and Architecture
- Engineering (all)
- Electrical and Electronic Engineering
Cite this
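A scalable, clustered SMT processor for digital signal processing. / Berekovic, Mladen; Moch, Sören; Pirsch, Peter. 2003. 62-69 Paper presented at 2003 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, MEDEA '03, Antibes Juan-les-Pins, France.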
Publication: Conference contribution › Paper › Research › Peer-reviewed
TY - CONF
T1 - A scalable, clustered SMT processor for digital signal processing
AU - Berekovic, Mladen
AU - Moch, Sören
AU - Pirsch, Peter
PY - 2003/9/27
Y1 - 2003/9/27
AB - A scalable, distributed processor architecture is presented that targets high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1], which was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are spread across the chip and communicate with each other through special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
UR - http://www.scopus.com/inward/record.url?scp=77953574256&partnerID=8YFLogxK
U2 - 10.1145/1152923.1024304
DO - 10.1145/1152923.1024304
M3 - Paper
AN - SCOPUS:77953574256
SP - 62
EP - 69
T2 - 2003 Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, MEDEA '03
Y2 - 29 September 2004 through 3 October 2004
ER -