Networks-on-chip from implementations to programming paradigms

Networks-on-Chip: From Implementations to Programming Paradigms provides a thorough and bottom-up exploration of the whole NoC design space in a coherent and uniform fashion, from low-level router, buffer and topology implementations, to routing and flow control schemes, to co-optimizations of NoC a...

Bibliographic Details
Other Authors:	Ma, Sheng (author); Wang, Zhiying (editor)
Format:	E-book
Language:	English
Published:	Waltham, Massachusetts: Morgan Kaufmann, 2015.
Edition:	First edition
Subjects:	Networks on a chip > Design and construction. Networks on a chip > Reliability.
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629232706719

Table of Contents:

Front Cover
Networks-on-Chip: From Implementations to Programming Paradigms
Copyright
Contents in Brief
Contents
Preface
About the Editor-in-Chief and Authors
Editor-in-Chief
Authors
Part I: Prologue
Chapter 1: Introduction
1.1 The dawn of the many-core era
1.2 Communication-centric cross-layer optimizations
1.3 A baseline design space exploration of NoCs
1.3.1 Topology
1.3.2 Routing algorithm
1.3.3 Flow control
1.3.4 Router microarchitecture
1.3.5 Performance metric
1.4 Review of NoC research
1.4.1 Research on topologies
1.4.2 Research on unicast routing
1.4.3 Research on supporting collective communications
1.4.4 Research on flow control
1.4.5 Research on router microarchitecture
1.5 Trends of real processors
1.5.1 The MIT Raw processor
1.5.2 The Tilera TILE64 processor
1.5.3 The Sony/Toshiba/IBM Cell processor
1.5.4 The U.T. Austin TRIPS processor
1.5.5 The Intel Teraflops processor
1.5.6 The Intel SCC processor
1.5.7 The Intel Larrabee processor
1.5.8 The Intel Knights Corner processor
1.5.9 Summary of real processors
1.6 Overview of the book
References
Part II: Logic implementations
Chapter 2: A single-cycle router with wing channels
2.1 Introduction
2.2 The router architecture
2.2.1 The overall architecture
2.2.2 Wing channels
2.3 Microarchitecture designs
2.3.1 Channel dispensers
2.3.2 Fast arbiter components
2.3.3 SIG managers and SIG controllers
2.4 Experimental results
2.4.1 Simulation infrastructures
2.4.2 Pipeline delay analysis
2.4.3 Latency and throughput
2.4.4 Area and power consumption
2.5 Chapter summary
References
Chapter 3: Dynamic virtual channel routers with congestion awareness
3.1 Introduction
3.2 DVC with congestion awareness
3.2.1 DVC scheme
3.2.2 Congestion avoidance scheme
3.3 Multiple-port shared buffer with congestion awareness
3.3.1 DVC scheme among multiple ports
3.3.2 Congestion avoidance scheme
3.4 DVC router microarchitecture
3.4.1 VC control module
3.4.2 Metric aggregation and congestion avoidance
3.4.3 VC allocation module
3.5 HiBB router microarchitecture
3.5.1 VC control module
3.5.2 VC allocation and output port allocation
3.5.3 VC regulation
3.6 Evaluation
3.6.1 DVC router evaluation
3.6.2 HiBB router evaluation
3.7 Chapter summary
References
Chapter 4: Virtual bus structure-based network-on-chip topologies
4.1 Introduction
4.2 Background
4.3 Motivation
4.3.1 Baseline on-chip communication networks
4.3.1.1 Transaction-based bus
4.3.1.2 Packet-based NoC
4.3.2 Analysis of NoC problems
4.3.2.1 Multihop problem
4.3.2.2 Multicast problem
4.3.3 Advantages of a transaction-based bus
4.4 The VBON
4.4.1 Interconnect structures
4.4.1.1 Wire delay consideration
4.4.2 The VB mechanism
4.4.2.1 The VB construction
4.4.2.2 VB arbitration
4.4.2.3 Packet format
4.4.2.4 VB operation
4.4.2.5 A simple example for VB communication
4.4.3 Starvation and deadlock avoidance
4.4.4 The VBON router microarchitecture
4.5 Evaluation
4.5.1 Simulation infrastructures
4.5.1.1 Router choices for comparison
4.5.1.2 Network configuration
4.5.1.3 Traffic generation
4.5.2 Synthetic traffic evaluations
4.5.2.1 Single-level 4 × 4 VBON
4.5.2.2 Hierarchical 8 × 8 VBON
4.5.3 Real application evaluations
4.5.4 Power consumption analysis
4.5.5 Overhead analysis
4.6 Chapter summary
References
Part III: Routing and flow control
Chapter 5: Routing algorithms for workload consolidation
5.1 Introduction
5.2 Background
5.3 Motivation
5.3.1 Insufficient information
5.3.2 Intraregion interference
5.3.3 Inter-region interference
5.4 Destination-based adaptive routing
5.4.1 Destination-based selection strategy
5.4.1.1 Congestion information propagation network
5.4.1.2 DBSS router microarchitecture
5.4.2 Routing function design
5.4.2.1 Offered path diversity
5.4.2.2 VC reallocation scheme
5.5 Evaluation
5.5.1 Evaluation of routing functions
5.5.2 Single-region performance
5.5.2.1 Synthetic traffic results
5.5.2.2 Application results
5.5.3 Multiple-region performance
5.5.3.1 Results for a small regular region
5.5.3.2 Irregular-region results
5.5.3.3 Summary
5.5.4 CMesh evaluation
5.5.4.1 Configuration
5.5.4.2 Performance
5.5.5 Hardware overhead
5.5.5.1 Wiring overhead
5.5.5.2 Router overhead
5.5.5.3 Power consumption
5.6 Analysis and discussion
5.6.1 In-depth analysis of interference
5.6.2 Design space exploration
5.6.2.1 Number of propagation wires
5.6.2.2 DBSS scalability
5.6.2.3 Congestion propagation delay
5.7 Chapter summary
References
Chapter 6: Flow control for fully adaptive routing
6.1 Introduction
6.2 Background
6.2.1 Deadlock avoidance theories
6.2.2 Fully adaptive routing algorithms
6.3 Motivation
6.3.1 VC reallocation
6.3.2 Routing flexibility
6.4 Flow control and routing designs
6.4.1 Whole packet forwarding
6.4.2 Aggressive VC reallocation for EVCs
6.4.3 Maintain routing flexibility
6.4.4 Router microarchitecture
6.5 Evaluation on synthetic traffic
6.5.1 Performance of synthetic workloads
6.5.2 Buffer utilization of routing algorithms
6.5.3 Sensitivity to network design
6.5.3.1 SFP ratio
6.5.3.2 VC depth
6.5.3.3 VC count
6.5.3.4 Network size
6.6 Evaluation of PARSEC workloads
6.6.1 Methodology and configuration
6.6.2 Performance
6.7 Detailed analysis of flow control
6.7.1 The detailed buffer utilization
6.7.1.1 Allowable EVCs
6.7.1.2 Performance analysis
6.7.2 The effect of flow control on fairness
6.8 Further discussion
6.8.1 Packet length
6.8.2 Dynamically allocated multiqueue and hybrid flow controls
6.9 Chapter summary
Appendix: Logical Equivalence of Alg and Alg + WPF
References
Chapter 7: Deadlock-free flow control for torus networks-on-chip
7.1 Introduction
7.2 Limitations of existing designs
7.2.1 Dateline
7.2.2 Localized bubble scheme
7.2.3 Critical bubble scheme
7.2.4 Inefficiency with variable-size packets
7.3 Flit bubble flow control
7.3.1 Theoretical description
7.3.2 FBFC-localized
7.3.3 FBFC-critical
7.3.4 Starvation
7.4 Router microarchitecture
7.4.1 FBFC routers
7.4.2 VCT routers
7.5 Methodology
7.6 Evaluation on 1D tori (rings)
7.6.1 Performance
7.6.2 Buffer utilization
7.6.3 Latency of short and long packets
7.7 Evaluation on 2D tori
7.7.1 Performance for a 4 × 4 torus
7.7.2 Sensitivity to SFP ratios
7.7.3 Sensitivity to buffer size
7.7.4 Scalability for an 8 × 8 torus
7.7.5 Effect of starvation
7.7.6 Real application performance
7.7.7 Large-scale systems and message passing
7.8 Overheads: Power and area
7.8.1 Methodology
7.8.2 Power efficiency
7.8.3 Area
7.8.4 Comparison with meshes
7.9 Discussion and related work
7.9.1 Discussion
7.9.2 Related work
7.10 Chapter summary
References
Part IV: Programming paradigms
Chapter 8: Supporting cache-coherent collective communications
8.1 Introduction
8.2 Message combination framework
8.2.1 MCT format
8.2.2 Message combination example
8.2.3 Insufficient MCT entries
8.3 BAM routing
8.4 Router pipeline and microarchitecture
8.5 Evaluation
8.5.1 Performance
8.5.1.1 Overall network performance
8.5.1.2 Multicast transaction performance
8.5.1.3 Real application performance
8.5.2 Comparing multicast VN configurations
8.5.2.1 Unicast performance
8.5.2.2 Multicast performance
8.5.3 MCT size
8.5.4 Sensitivity to network design
8.5.4.1 VC count
8.5.4.2 Multicast ratio
8.5.4.3 Destinations per multicast
8.5.4.4 Network size
8.6 Power analysis
8.7 Related work
8.7.1 Message combination
8.7.2 NoC multicast routing
8.8 Chapter summary
References
Chapter 9: Network-on-chip customizations for message passing interface primitives
9.1 Introduction
9.2 Background
9.3 Motivation
9.3.1 MPI adaption in NoC designs
9.3.2 Optimizations of MPI functions
9.4 Communication customization architectures
9.4.1 Architecture overview
9.4.2 The customized NoC design: VBON
9.4.3 The MPI primitive implementation: MU
9.4.3.1 The architecture of the MU
9.4.3.2 MPI processing unit
9.4.3.3 The collective operation implementation
9.4.3.4 Communication protocols
9.5 Evaluation
9.5.1 Methodology
9.5.2 Experimental results
9.5.2.1 The effect of point-to-point communication: Bandwidth
9.5.2.2 The effect of collective communication: Broadcast operations
9.5.2.3 The effect of collective communication: Barrier operations
9.5.2.4 The effect of collective communication: Reduce operation
9.5.2.5 The effect of application communication: Performance
9.5.2.6 The effect of application communication: Power and scalability
9.5.2.7 Implementation overheads
9.6 Chapter summary
References
Chapter 10: Message passing interface communication protocol optimizations
10.1 Introduction
10.2 Background
10.2.1 Communication protocols in MPI
10.2.2 Existing problems
