Embedded Computing for High Performance: Design Exploration and Customization Using High-Level Compilation and Synthesis Tools

Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, th...


Bibliographic Details
Other Authors: Cardoso, João Manuel Paiva (author), Coutinho, José Gabriel de Figueiredo (author), Diniz, Pedro C. (author)
Format: Electronic book
Language: English
Published: Cambridge, Massachusetts: Morgan Kaufmann, 2017.
Edition: Second edition
View at the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630068406719
Table of Contents:
  • Front Cover
  • Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Com...
  • Copyright
  • Dedication
  • Contents
  • About the Authors
  • Preface
  • Acknowledgments
  • Abbreviations
  • Chapter 1: Introduction
  • 1.1. Overview
  • 1.2. Embedded Systems in Society and Industry
  • 1.3. Embedded Computing Trends
  • 1.4. Embedded Systems: Prototyping and Production
  • 1.5. About LARA: An Aspect-Oriented Approach
  • 1.6. Objectives and Target Audience
  • 1.7. Complementary Bibliography
  • 1.8. Dependences in Terms of Knowledge
  • 1.9. Examples and Benchmarks
  • 1.10. Book Organization
  • 1.11. Intended Use
  • 1.12. Summary
  • References
  • Chapter 2: High-performance embedded computing
  • 2.1. Introduction
  • 2.2. Target Architectures
  • 2.2.1. Hardware Accelerators as Coprocessors
  • 2.2.2. Multiprocessor and Multicore Architectures
  • 2.2.3. Heterogeneous Multiprocessor/Multicore Architectures
  • 2.2.4. OpenCL Platform Model
  • 2.3. Core-Based Architectural Enhancements
  • 2.3.1. Single Instruction, Multiple Data Units
  • 2.3.2. Fused Multiply-Add Units
  • 2.3.3. Multithreading Support
  • 2.4. Common Hardware Accelerators
  • 2.4.1. GPU Accelerators
  • 2.4.2. Reconfigurable Hardware Accelerators
  • 2.4.3. SoCs With Reconfigurable Hardware
  • 2.5. Performance
  • 2.5.1. Amdahl's Law
  • 2.5.2. The Roofline Model
  • 2.5.3. Worst-Case Execution Time Analysis
  • 2.6. Power and Energy Consumption
  • 2.6.1. Dynamic Power Management
  • 2.6.2. Dynamic Voltage and Frequency Scaling
  • 2.6.3. Dark Silicon
  • 2.7. Comparing Results
  • 2.8. Summary
  • 2.9. Further Reading
  • References
  • Chapter 3: Controlling the design and development cycle
  • 3.1. Introduction
  • 3.2. Specifications in MATLAB and C: Prototyping and Development
  • 3.2.1. Abstraction Levels
  • 3.2.2. Dealing With Different Concerns
  • 3.2.3. Dealing With Generic Code
  • 3.2.4. Dealing With Multiple Targets
  • 3.3. Translation, Compilation, and Synthesis Design Flows
  • 3.4. Hardware/Software Partitioning
  • 3.4.1. Static Partitioning
  • 3.4.2. Dynamic Partitioning
  • 3.5. LARA: A Language for Specifying Strategies
  • 3.5.1. Select and Apply
  • 3.5.2. Insert Action
  • 3.5.3. Exec and Def Actions
  • 3.5.4. Invoking Aspects
  • 3.5.5. Executing External Tools
  • 3.5.6. Compilation and Synthesis Strategies in LARA
  • 3.6. Summary
  • 3.7. Further Reading
  • References
  • Chapter 4: Source code analysis and instrumentation
  • 4.1. Introduction
  • 4.2. Analysis and Metrics
  • 4.3. Static Source Code Analysis
  • 4.3.1. Data Dependences
  • 4.3.2. Code Metrics
  • 4.4. Dynamic Analysis: The Need for Instrumentation
  • 4.4.1. Information From Profiling
  • 4.4.2. Profiling Example
  • 4.5. Custom Profiling Examples
  • 4.5.1. Finding Hotspots
  • 4.5.2. Loop Metrics
  • 4.5.3. Dynamic Call Graphs
  • 4.5.4. Branch Frequencies
  • 4.5.5. Heap Memory
  • 4.6. Summary
  • 4.7. Further Reading
  • References
  • Chapter 5: Source code transformations and optimizations
  • 5.1. Introduction
  • 5.2. Basic Transformations
  • 5.3. Data Type Conversions
  • 5.4. Code Reordering
  • 5.5. Data Reuse
  • 5.6. Loop-Based Transformations
  • 5.6.1. Loop Alignment
  • 5.6.2. Loop Coalescing
  • 5.6.3. Loop Flattening
  • 5.6.4. Loop Fusion and Loop Fission
  • 5.6.5. Loop Interchange and Loop Permutation (Loop Reordering)
  • 5.6.6. Loop Peeling
  • 5.6.7. Loop Shifting
  • 5.6.8. Loop Skewing
  • 5.6.9. Loop Splitting
  • 5.6.10. Loop Stripmining
  • 5.6.11. Loop Tiling (Loop Blocking)
  • 5.6.12. Loop Unrolling
  • 5.6.13. Unroll and Jam
  • 5.6.14. Loop Unswitching
  • 5.6.15. Loop Versioning
  • 5.6.16. Software Pipelining
  • 5.6.17. Evaluator-Executor Transformation
  • 5.6.18. Loop Perforation
  • 5.6.19. Other Loop Transformations
  • 5.6.20. Overview
  • 5.7. Function-Based Transformations
  • 5.7.1. Function Inlining/Outlining
  • 5.7.2. Partial Evaluation and Code Specialization
  • 5.7.3. Function Approximation
  • 5.8. Data Structure-Based Transformations
  • 5.8.1. Scalar Expansion, Array Contraction, and Array Scalarization
  • 5.8.2. Scalar and Array Renaming
  • 5.8.3. Arrays and Records
  • 5.8.4. Reducing the Number of Dimensions of Arrays
  • 5.8.5. From Arrays to Pointers and Array Recovery
  • 5.8.6. Array Padding
  • 5.8.7. Representation of Matrices and Graphs
  • 5.8.8. Object Inlining
  • 5.8.9. Data Layout Transformations
  • 5.8.10. Data Replication and Data Distribution
  • 5.9. From Recursion to Iterations
  • 5.10. From Nonstreaming to Streaming
  • 5.11. Data and Computation Partitioning
  • 5.11.1. Data Partitioning
  • 5.11.2. Partitioning Computations
  • 5.11.3. Computation Offloading
  • 5.12. LARA Strategies
  • 5.13. Summary
  • 5.14. Further Reading
  • References
  • Chapter 6: Code retargeting for CPU-based platforms
  • 6.1. Introduction
  • 6.2. Retargeting Mechanisms
  • 6.3. Parallelism and Compiler Options
  • 6.3.1. Parallel Execution Opportunities
  • 6.3.2. Compiler Options
  • 6.3.3. Compiler Phase Selection and Ordering
  • 6.4. Loop Vectorization
  • 6.5. Shared Memory (Multicore)
  • 6.6. Distributed Memory (Multiprocessor)
  • 6.7. Cache-Based Program Optimizations
  • 6.8. LARA Strategies
  • 6.8.1. Capturing Heuristics to Control Code Transformations
  • 6.8.2. Parallelizing Code With OpenMP
  • 6.8.3. Monitoring an MPI Application
  • 6.9. Summary
  • 6.10. Further Reading
  • References
  • Chapter 7: Targeting heterogeneous computing platforms
  • 7.1. Introduction
  • 7.2. Roofline Model Revisited
  • 7.3. Workload Distribution
  • 7.4. Graphics Processing Units
  • 7.5. High-Level Synthesis
  • 7.6. LARA Strategies
  • 7.7. Summary
  • 7.8. Further Reading
  • References
  • Chapter 8: Additional topics
  • 8.1. Introduction
  • 8.2. Design Space Exploration
  • 8.2.1. Single-Objective Optimization and Single/Multiple Criteria
  • 8.2.2. Multiobjective Optimization, Pareto Optimal Solutions
  • 8.2.3. DSE Automation
  • 8.3. Hardware/Software Codesign
  • 8.4. Runtime Adaptability
  • 8.4.1. Tuning Application Parameters
  • 8.4.2. Adaptive Algorithms
  • 8.4.3. Resource Adaptivity
  • 8.5. Automatic Tuning (Autotuning)
  • 8.5.1. Search Space
  • 8.5.2. Static and Dynamic Autotuning
  • 8.5.3. Models for Autotuning
  • 8.5.4. Autotuning Without Dynamic Compilation
  • 8.5.5. Autotuning With Dynamic Compilation
  • 8.6. Using LARA for Exploration of Code Transformation Strategies
  • 8.7. Summary
  • 8.8. Further Reading
  • References
  • Glossary
  • Index
  • Back Cover