Programming Massively Parallel Processors: A Hands-on Approach

Programming Massively Parallel Processors: A Hands-on Approach shows students and professionals alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, wh...


Bibliographic Details
Main author: Kirk, David, 1960-
Other authors: Hwu, Wen-mei
Format: Electronic book
Language: English
Published: Burlington, MA: Morgan Kaufmann Publishers, c2013.
Edition: 2nd ed.
Subjects:
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009628564806719
Table of Contents:
  • Front Cover; Programming Massively Parallel Processors; Copyright Page; Contents; Preface; Target Audience; How to Use the Book; A Three-Phased Approach; Tying It All Together: The Final Project; Project Workshop; Design Document; Project Report; Online Supplements; Acknowledgements; Dedication; 1 Introduction; 1.1 Heterogeneous Parallel Computing; 1.2 Architecture of a Modern GPU; 1.3 Why More Speed or Parallelism?; 1.4 Speeding Up Real Applications; 1.5 Parallel Programming Languages and Models; 1.6 Overarching Goals; 1.7 Organization of the Book; References; 2 History of GPU Computing
  • 2.1 Evolution of Graphics Pipelines; The Era of Fixed-Function Graphics Pipelines; Evolution of Programmable Real-Time Graphics; Unified Graphics and Computing Processors; 2.2 GPGPU: An Intermediate Step; 2.3 GPU Computing; Scalable GPUs; Recent Developments; Future Trends; References and Further Reading; 3 Introduction to Data Parallelism and CUDA C; 3.1 Data Parallelism; 3.2 CUDA Program Structure; 3.3 A Vector Addition Kernel; 3.4 Device Global Memory and Data Transfer; 3.5 Kernel Functions and Threading; 3.6 Summary; Function Declarations; Kernel Launch; Predefined Variables; Runtime API
  • 3.7 Exercises; References; 4 Data-Parallel Execution Model; 4.1 CUDA Thread Organization; 4.2 Mapping Threads to Multidimensional Data; 4.3 Matrix-Matrix Multiplication-A More Complex Kernel; 4.4 Synchronization and Transparent Scalability; 4.5 Assigning Resources to Blocks; 4.6 Querying Device Properties; 4.7 Thread Scheduling and Latency Tolerance; 4.8 Summary; 4.9 Exercises; 5 CUDA Memories; 5.1 Importance of Memory Access Efficiency; 5.2 CUDA Device Memory Types; 5.3 A Strategy for Reducing Global Memory Traffic; 5.4 A Tiled Matrix-Matrix Multiplication Kernel
  • 5.5 Memory as a Limiting Factor to Parallelism; 5.6 Summary; 5.7 Exercises; 6 Performance Considerations; 6.1 Warps and Thread Execution; 6.2 Global Memory Bandwidth; 6.3 Dynamic Partitioning of Execution Resources; 6.4 Instruction Mix and Thread Granularity; 6.5 Summary; 6.6 Exercises; References; 7 Floating-Point Considerations; 7.1 Floating-Point Format; Normalized Representation of M; Excess Encoding of E; 7.2 Representable Numbers; 7.3 Special Bit Patterns and Precision in IEEE Format; 7.4 Arithmetic Accuracy and Rounding; 7.5 Algorithm Considerations; 7.6 Numerical Stability; 7.7 Summary
  • 7.8 Exercises; References; 8 Parallel Patterns: Convolution; 8.1 Background; 8.2 1D Parallel Convolution-A Basic Algorithm; 8.3 Constant Memory and Caching; 8.4 Tiled 1D Convolution with Halo Elements; 8.5 A Simpler Tiled 1D Convolution-General Caching; 8.6 Summary; 8.7 Exercises; 9 Parallel Patterns: Prefix Sum; 9.1 Background; 9.2 A Simple Parallel Scan; 9.3 Work Efficiency Considerations; 9.4 A Work-Efficient Parallel Scan; 9.5 Parallel Scan for Arbitrary-Length Inputs; 9.6 Summary; 9.7 Exercises; Reference; 10 Parallel Patterns: Sparse Matrix-Vector Multiplication; 10.1 Background
  • 10.2 Parallel SpMV Using CSR