Networks-on-Chip: From Implementations to Programming Paradigms
Networks-on-Chip: From Implementations to Programming Paradigms provides a thorough and bottom-up exploration of the whole NoC design space in a coherent and uniform fashion, from low-level router, buffer and topology implementations, to routing and flow control schemes, to co-optimizations of NoC a...
Other Authors: | , |
---|---|
Format: | E-book |
Language: | English |
Published: | Waltham, Massachusetts : Morgan Kaufmann, 2015 |
Edition: | First edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629232706719 |
Table of Contents:
- Front Cover
- Networks-on-Chip: From Implementations to Programming Paradigms
- Copyright
- Contents in Brief
- Contents
- Preface
- About the Editor-in-Chief and Authors
- Editor-in-Chief
- Authors
- Part I: Prologue
- Chapter 1: Introduction
- 1.1 The dawn of the many-core era
- 1.2 Communication-centric cross-layer optimizations
- 1.3 A baseline design space exploration of NoCs
- 1.3.1 Topology
- 1.3.2 Routing algorithm
- 1.3.3 Flow control
- 1.3.4 Router microarchitecture
- 1.3.5 Performance metric
- 1.4 Review of NoC research
- 1.4.1 Research on topologies
- 1.4.2 Research on unicast routing
- 1.4.3 Research on supporting collective communications
- 1.4.4 Research on flow control
- 1.4.5 Research on router microarchitecture
- 1.5 Trends of real processors
- 1.5.1 The MIT Raw processor
- 1.5.2 The Tilera TILE64 processor
- 1.5.3 The Sony/Toshiba/IBM Cell processor
- 1.5.4 The U.T. Austin TRIPS processor
- 1.5.5 The Intel Teraflops processor
- 1.5.6 The Intel SCC processor
- 1.5.7 The Intel Larrabee processor
- 1.5.8 The Intel Knights Corner processor
- 1.5.9 Summary of real processors
- 1.6 Overview of the book
- References
- Part II: Logic implementations
- Chapter 2: A single-cycle router with wing channels
- 2.1 Introduction
- 2.2 The router architecture
- 2.2.1 The overall architecture
- 2.2.2 Wing channels
- 2.3 Microarchitecture designs
- 2.3.1 Channel dispensers
- 2.3.2 Fast arbiter components
- 2.3.3 SIG managers and SIG controllers
- 2.4 Experimental results
- 2.4.1 Simulation infrastructures
- 2.4.2 Pipeline delay analysis
- 2.4.3 Latency and throughput
- 2.4.4 Area and power consumption
- 2.5 Chapter summary
- References
- Chapter 3: Dynamic virtual channel routers with congestion awareness
- 3.1 Introduction
- 3.2 DVC with congestion awareness
- 3.2.1 DVC scheme
- 3.2.2 Congestion avoidance scheme
- 3.3 Multiple-port shared buffer with congestion awareness
- 3.3.1 DVC scheme among multiple ports
- 3.3.2 Congestion avoidance scheme
- 3.4 DVC router microarchitecture
- 3.4.1 VC control module
- 3.4.2 Metric aggregation and congestion avoidance
- 3.4.3 VC allocation module
- 3.5 HiBB router microarchitecture
- 3.5.1 VC control module
- 3.5.2 VC allocation and output port allocation
- 3.5.3 VC regulation
- 3.6 Evaluation
- 3.6.1 DVC router evaluation
- 3.6.2 HiBB router evaluation
- 3.7 Chapter summary
- References
- Chapter 4: Virtual bus structure-based network-on-chip topologies
- 4.1 Introduction
- 4.2 Background
- 4.3 Motivation
- 4.3.1 Baseline on-chip communication networks
- 4.3.1.1 Transaction-based bus
- 4.3.1.2 Packet-based NoC
- 4.3.2 Analysis of NoC problems
- 4.3.2.1 Multihop problem
- 4.3.2.2 Multicast problem
- 4.3.3 Advantages of a transaction-based bus
- 4.4 The VBON
- 4.4.1 Interconnect structures
- 4.4.1.1 Wire delay consideration
- 4.4.2 The VB mechanism
- 4.4.2.1 The VB construction
- 4.4.2.2 VB arbitration
- 4.4.2.3 Packet format
- 4.4.2.4 VB operation
- 4.4.2.5 A simple example for VB communication
- 4.4.3 Starvation and deadlock avoidance
- 4.4.4 The VBON router microarchitecture
- 4.5 Evaluation
- 4.5.1 Simulation infrastructures
- 4.5.1.1 Router choices for comparison
- 4.5.1.2 Network configuration
- 4.5.1.3 Traffic generation
- 4.5.2 Synthetic traffic evaluations
- 4.5.2.1 Single-level 4×4 VBON
- 4.5.2.2 Hierarchical 8×8 VBON
- 4.5.3 Real application evaluations
- 4.5.4 Power consumption analysis
- 4.5.5 Overhead analysis
- 4.6 Chapter summary
- References
- Part III: Routing and flow control
- Chapter 5: Routing algorithms for workload consolidation
- 5.1 Introduction
- 5.2 Background
- 5.3 Motivation
- 5.3.1 Insufficient information
- 5.3.2 Intraregion interference
- 5.3.3 Inter-region interference
- 5.4 Destination-based adaptive routing
- 5.4.1 Destination-based selection strategy
- 5.4.1.1 Congestion information propagation network
- 5.4.1.2 DBSS router microarchitecture
- 5.4.2 Routing function design
- 5.4.2.1 Offered path diversity
- 5.4.2.2 VC reallocation scheme
- 5.5 Evaluation
- 5.5.1 Evaluation of routing functions
- 5.5.2 Single-region performance
- 5.5.2.1 Synthetic traffic results
- 5.5.2.2 Application results
- 5.5.3 Multiple-region performance
- 5.5.3.1 Results for a small regular region
- 5.5.3.2 Irregular-region results
- 5.5.3.3 Summary
- 5.5.4 CMesh evaluation
- 5.5.4.1 Configuration
- 5.5.4.2 Performance
- 5.5.5 Hardware overhead
- 5.5.5.1 Wiring overhead
- 5.5.5.2 Router overhead
- 5.5.5.3 Power consumption
- 5.6 Analysis and discussion
- 5.6.1 In-depth analysis of interference
- 5.6.2 Design space exploration
- 5.6.2.1 Number of propagation wires
- 5.6.2.2 DBSS scalability
- 5.6.2.3 Congestion propagation delay
- 5.7 Chapter summary
- References
- Chapter 6: Flow control for fully adaptive routing
- 6.1 Introduction
- 6.2 Background
- 6.2.1 Deadlock avoidance theories
- 6.2.2 Fully adaptive routing algorithms
- 6.3 Motivation
- 6.3.1 VC reallocation
- 6.3.2 Routing flexibility
- 6.4 Flow control and routing designs
- 6.4.1 Whole packet forwarding
- 6.4.2 Aggressive VC reallocation for EVCs
- 6.4.3 Maintain routing flexibility
- 6.4.4 Router microarchitecture
- 6.5 Evaluation on synthetic traffic
- 6.5.1 Performance of synthetic workloads
- 6.5.2 Buffer utilization of routing algorithms
- 6.5.3 Sensitivity to network design
- 6.5.3.1 SFP ratio
- 6.5.3.2 VC depth
- 6.5.3.3 VC count
- 6.5.3.4 Network size
- 6.6 Evaluation of PARSEC workloads
- 6.6.1 Methodology and configuration
- 6.6.2 Performance
- 6.7 Detailed analysis of flow control
- 6.7.1 The detailed buffer utilization
- 6.7.1.1 Allowable EVCs
- 6.7.1.2 Performance analysis
- 6.7.2 The effect of flow control on fairness
- 6.8 Further discussion
- 6.8.1 Packet length
- 6.8.2 Dynamically allocated multiqueue and hybrid flow controls
- 6.9 Chapter summary
- Appendix: Logical Equivalence of Alg and Alg + WPF
- References
- Chapter 7: Deadlock-free flow control for torus networks-on-chip
- 7.1 Introduction
- 7.2 Limitations of existing designs
- 7.2.1 Dateline
- 7.2.2 Localized bubble scheme
- 7.2.3 Critical bubble scheme
- 7.2.4 Inefficiency with variable-size packets
- 7.3 Flit bubble flow control
- 7.3.1 Theoretical description
- 7.3.2 FBFC-localized
- 7.3.3 FBFC-critical
- 7.3.4 Starvation
- 7.4 Router microarchitecture
- 7.4.1 FBFC routers
- 7.4.2 VCT routers
- 7.5 Methodology
- 7.6 Evaluation on 1D tori (rings)
- 7.6.1 Performance
- 7.6.2 Buffer utilization
- 7.6.3 Latency of short and long packets
- 7.7 Evaluation on 2D tori
- 7.7.1 Performance for a 4×4 torus
- 7.7.2 Sensitivity to SFP ratios
- 7.7.3 Sensitivity to buffer size
- 7.7.4 Scalability for an 8×8 torus
- 7.7.5 Effect of starvation
- 7.7.6 Real application performance
- 7.7.7 Large-scale systems and message passing
- 7.8 Overheads: Power and area
- 7.8.1 Methodology
- 7.8.2 Power efficiency
- 7.8.3 Area
- 7.8.4 Comparison with meshes
- 7.9 Discussion and related work
- 7.9.1 Discussion
- 7.9.2 Related work
- 7.10 Chapter summary
- References
- Part IV: Programming paradigms
- Chapter 8: Supporting cache-coherent collective communications
- 8.1 Introduction
- 8.2 Message combination framework
- 8.2.1 MCT format
- 8.2.2 Message combination example
- 8.2.3 Insufficient MCT entries
- 8.3 BAM routing
- 8.4 Router pipeline and microarchitecture
- 8.5 Evaluation
- 8.5.1 Performance
- 8.5.1.1 Overall network performance
- 8.5.1.2 Multicast transaction performance
- 8.5.1.3 Real application performance
- 8.5.2 Comparing multicast VN configurations
- 8.5.2.1 Unicast performance
- 8.5.2.2 Multicast performance
- 8.5.3 MCT size
- 8.5.4 Sensitivity to network design
- 8.5.4.1 VC count
- 8.5.4.2 Multicast ratio
- 8.5.4.3 Destinations per multicast
- 8.5.4.4 Network size
- 8.6 Power analysis
- 8.7 Related work
- 8.7.1 Message combination
- 8.7.2 NoC multicast routing
- 8.8 Chapter summary
- References
- Chapter 9: Network-on-chip customizations for message passing interface primitives
- 9.1 Introduction
- 9.2 Background
- 9.3 Motivation
- 9.3.1 MPI adaption in NoC designs
- 9.3.2 Optimizations of MPI functions
- 9.4 Communication customization architectures
- 9.4.1 Architecture overview
- 9.4.2 The customized NoC design: VBON
- 9.4.3 The MPI primitive implementation: MU
- 9.4.3.1 The architecture of the MU
- 9.4.3.2 MPI processing unit
- 9.4.3.3 The collective operation implementation
- 9.4.3.4 Communication protocols
- 9.5 Evaluation
- 9.5.1 Methodology
- 9.5.2 Experimental results
- 9.5.2.1 The effect of point-to-point communication: Bandwidth
- 9.5.2.2 The effect of collective communication: Broadcast operations
- 9.5.2.3 The effect of collective communication: Barrier operations
- 9.5.2.4 The effect of collective communication: Reduce operation
- 9.5.2.5 The effect of application communication: Performance
- 9.5.2.6 The effect of application communication: Power and scalability
- 9.5.2.7 Implementation overheads
- 9.6 Chapter summary
- References
- Chapter 10: Message passing interface communication protocol optimizations
- 10.1 Introduction
- 10.2 Background
- 10.2.1 Communication protocols in MPI
- 10.2.2 Existing problems