Embedded Computing for High Performance: Design Exploration and Customization Using High-Level Compilation and Synthesis Tools

Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, th...


Bibliographic Details
Other Authors: Cardoso, João Manuel Paiva (author), Coutinho, José Gabriel de Figueiredo (author), Diniz, Pedro C. (author)
Format: Electronic book
Language: English
Published: Cambridge, Massachusetts: Morgan Kaufmann, 2017.
Edition: Second edition
View at the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630068406719
Table of Contents:
  • Front Cover
  • Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Com...
  • Copyright
  • Dedication
  • Contents
  • About the Authors
  • Preface
  • Acknowledgments
  • Abbreviations
  • Chapter 1: Introduction
  • 1.1. Overview
  • 1.2. Embedded Systems in Society and Industry
  • 1.3. Embedded Computing Trends
  • 1.4. Embedded Systems: Prototyping and Production
  • 1.5. About LARA: An Aspect-Oriented Approach
  • 1.6. Objectives and Target Audience
  • 1.7. Complementary Bibliography
  • 1.8. Dependences in Terms of Knowledge
  • 1.9. Examples and Benchmarks
  • 1.10. Book Organization
  • 1.11. Intended Use
  • 1.12. Summary
  • References
  • Chapter 2: High-performance embedded computing
  • 2.1. Introduction
  • 2.2. Target Architectures
  • 2.2.1. Hardware Accelerators as Coprocessors
  • 2.2.2. Multiprocessor and Multicore Architectures
  • 2.2.3. Heterogeneous Multiprocessor/Multicore Architectures
  • 2.2.4. OpenCL Platform Model
  • 2.3. Core-Based Architectural Enhancements
  • 2.3.1. Single Instruction, Multiple Data Units
  • 2.3.2. Fused Multiply-Add Units
  • 2.3.3. Multithreading Support
  • 2.4. Common Hardware Accelerators
  • 2.4.1. GPU Accelerators
  • 2.4.2. Reconfigurable Hardware Accelerators
  • 2.4.3. SoCs With Reconfigurable Hardware
  • 2.5. Performance
  • 2.5.1. Amdahl's Law
  • 2.5.2. The Roofline Model
  • 2.5.3. Worst-Case Execution Time Analysis
  • 2.6. Power and Energy Consumption
  • 2.6.1. Dynamic Power Management
  • 2.6.2. Dynamic Voltage and Frequency Scaling
  • 2.6.3. Dark Silicon
  • 2.7. Comparing Results
  • 2.8. Summary
  • 2.9. Further Reading
  • References
  • Chapter 3: Controlling the design and development cycle
  • 3.1. Introduction
  • 3.2. Specifications in MATLAB and C: Prototyping and Development
  • 3.2.1. Abstraction Levels
  • 3.2.2. Dealing With Different Concerns
  • 3.2.3. Dealing With Generic Code
  • 3.2.4. Dealing With Multiple Targets
  • 3.3. Translation, Compilation, and Synthesis Design Flows
  • 3.4. Hardware/Software Partitioning
  • 3.4.1. Static Partitioning
  • 3.4.2. Dynamic Partitioning
  • 3.5. LARA: A Language for Specifying Strategies
  • 3.5.1. Select and Apply
  • 3.5.2. Insert Action
  • 3.5.3. Exec and Def Actions
  • 3.5.4. Invoking Aspects
  • 3.5.5. Executing External Tools
  • 3.5.6. Compilation and Synthesis Strategies in LARA
  • 3.6. Summary
  • 3.7. Further Reading
  • References
  • Chapter 4: Source code analysis and instrumentation
  • 4.1. Introduction
  • 4.2. Analysis and Metrics
  • 4.3. Static Source Code Analysis
  • 4.3.1. Data Dependences
  • 4.3.2. Code Metrics
  • 4.4. Dynamic Analysis: The Need for Instrumentation
  • 4.4.1. Information From Profiling
  • 4.4.2. Profiling Example
  • 4.5. Custom Profiling Examples
  • 4.5.1. Finding Hotspots
  • 4.5.2. Loop Metrics
  • 4.5.3. Dynamic Call Graphs
  • 4.5.4. Branch Frequencies
  • 4.5.5. Heap Memory
  • 4.6. Summary
  • 4.7. Further Reading
  • References
  • Chapter 5: Source code transformations and optimizations
  • 5.1. Introduction
  • 5.2. Basic Transformations
  • 5.3. Data Type Conversions
  • 5.4. Code Reordering
  • 5.5. Data Reuse
  • 5.6. Loop-Based Transformations
  • 5.6.1. Loop Alignment
  • 5.6.2. Loop Coalescing
  • 5.6.3. Loop Flattening
  • 5.6.4. Loop Fusion and Loop Fission
  • 5.6.5. Loop Interchange and Loop Permutation (Loop Reordering)
  • 5.6.6. Loop Peeling
  • 5.6.7. Loop Shifting
  • 5.6.8. Loop Skewing
  • 5.6.9. Loop Splitting
  • 5.6.10. Loop Stripmining
  • 5.6.11. Loop Tiling (Loop Blocking)
  • 5.6.12. Loop Unrolling
  • 5.6.13. Unroll and Jam
  • 5.6.14. Loop Unswitching
  • 5.6.15. Loop Versioning
  • 5.6.16. Software Pipelining
  • 5.6.17. Evaluator-Executor Transformation
  • 5.6.18. Loop Perforation
  • 5.6.19. Other Loop Transformations
  • 5.6.20. Overview
  • 5.7. Function-Based Transformations
  • 5.7.1. Function Inlining/Outlining
  • 5.7.2. Partial Evaluation and Code Specialization
  • 5.7.3. Function Approximation
  • 5.8. Data Structure-Based Transformations
  • 5.8.1. Scalar Expansion, Array Contraction, and Array Scalarization
  • 5.8.2. Scalar and Array Renaming
  • 5.8.3. Arrays and Records
  • 5.8.4. Reducing the Number of Dimensions of Arrays
  • 5.8.5. From Arrays to Pointers and Array Recovery
  • 5.8.6. Array Padding
  • 5.8.7. Representation of Matrices and Graphs
  • 5.8.8. Object Inlining
  • 5.8.9. Data Layout Transformations
  • 5.8.10. Data Replication and Data Distribution
  • 5.9. From Recursion to Iterations
  • 5.10. From Nonstreaming to Streaming
  • 5.11. Data and Computation Partitioning
  • 5.11.1. Data Partitioning
  • 5.11.2. Partitioning Computations
  • 5.11.3. Computation Offloading
  • 5.12. LARA Strategies
  • 5.13. Summary
  • 5.14. Further Reading
  • References
  • Chapter 6: Code retargeting for CPU-based platforms
  • 6.1. Introduction
  • 6.2. Retargeting Mechanisms
  • 6.3. Parallelism and Compiler Options
  • 6.3.1. Parallel Execution Opportunities
  • 6.3.2. Compiler Options
  • 6.3.3. Compiler Phase Selection and Ordering
  • 6.4. Loop Vectorization
  • 6.5. Shared Memory (Multicore)
  • 6.6. Distributed Memory (Multiprocessor)
  • 6.7. Cache-Based Program Optimizations
  • 6.8. LARA Strategies
  • 6.8.1. Capturing Heuristics to Control Code Transformations
  • 6.8.2. Parallelizing Code With OpenMP
  • 6.8.3. Monitoring an MPI Application
  • 6.9. Summary
  • 6.10. Further Reading
  • References
  • Chapter 7: Targeting heterogeneous computing platforms
  • 7.1. Introduction
  • 7.2. Roofline Model Revisited
  • 7.3. Workload Distribution
  • 7.4. Graphics Processing Units
  • 7.5. High-Level Synthesis
  • 7.6. LARA Strategies
  • 7.7. Summary
  • 7.8. Further Reading
  • References
  • Chapter 8: Additional topics
  • 8.1. Introduction
  • 8.2. Design Space Exploration
  • 8.2.1. Single-Objective Optimization and Single/Multiple Criteria
  • 8.2.2. Multiobjective Optimization, Pareto Optimal Solutions
  • 8.2.3. DSE Automation
  • 8.3. Hardware/Software Codesign
  • 8.4. Runtime Adaptability
  • 8.4.1. Tuning Application Parameters
  • 8.4.2. Adaptive Algorithms
  • 8.4.3. Resource Adaptivity
  • 8.5. Automatic Tuning (Autotuning)
  • 8.5.1. Search Space
  • 8.5.2. Static and Dynamic Autotuning
  • 8.5.3. Models for Autotuning
  • 8.5.4. Autotuning Without Dynamic Compilation
  • 8.5.5. Autotuning With Dynamic Compilation
  • 8.6. Using LARA for Exploration of Code Transformation Strategies
  • 8.7. Summary
  • 8.8. Further Reading
  • References
  • Glossary
  • Index
  • Back Cover