Source: developer.download.nvidia.com


Partial capture of text on file.
                         
OpenCL™ Programming Guide for the CUDA™ Architecture

Version 4.2
3/9/2012

Table of Contents
Chapter 1. Introduction
  1.1 From Graphics Processing to General-Purpose Parallel Computing
  1.2 CUDA™: a General-Purpose Parallel Computing Architecture
  1.3 A Scalable Programming Model
  1.4 Document’s Structure
Chapter 2. OpenCL on the CUDA Architecture
  2.1 CUDA Architecture
    2.1.1 SIMT Architecture
    2.1.2 Hardware Multithreading
  2.2 Compilation
    2.2.1 PTX
    2.2.2 Volatile
  2.3 Compute Capability
  2.4 Mode Switches
  2.5 Matrix Multiplication Example
Chapter 3. Performance Guidelines
  3.1 Overall Performance Optimization Strategies
  3.2 Maximize Utilization
    3.2.1 Application Level
    3.2.2 Device Level
    3.2.3 Multiprocessor Level
  3.3 Maximize Memory Throughput
    3.3.1 Data Transfer between Host and Device
    3.3.2 Device Memory Accesses
      3.3.2.1 Global Memory
      3.3.2.2 Local Memory
      3.3.2.3 Shared Memory
      3.3.2.4 Constant Memory
      3.3.2.5 Texture Memory
  3.4 Maximize Instruction Throughput
    3.4.1 Arithmetic Instructions
    3.4.2 Control Flow Instructions
    3.4.3 Synchronization Instruction
Appendix A. CUDA-Enabled GPUs
Appendix B. Mathematical Functions Accuracy
  B.1 Standard Functions
    B.1.1 Single-Precision Floating-Point Functions
    B.1.2 Double-Precision Floating-Point Functions
  B.2 Native Functions
Appendix C. Compute Capabilities
  C.1 Features and Technical Specifications
  C.2 Floating-Point Standard
  C.3 Compute Capability 1.x
    C.3.1 Architecture
    C.3.2 Global Memory
      C.3.2.1 Devices of Compute Capability 1.0 and 1.1
      C.3.2.2 Devices of Compute Capability 1.2 and 1.3
    C.3.3 Shared Memory
      C.3.3.1 32-Bit Strided Access
      C.3.3.2 32-Bit Broadcast Access
      C.3.3.3 8-Bit and 16-Bit Access
      C.3.3.4 Larger Than 32-Bit Access
  C.4 Compute Capability 2.x
    C.4.1 Architecture
    C.4.2 Global Memory
    C.4.3 Shared Memory
      C.4.3.1 32-Bit Strided Access
      C.4.3.2 Larger Than 32-Bit Access
    C.4.4 Constant Memory
  C.5 Compute Capability 3.0
    C.5.1 Architecture
    C.5.2 Global Memory
    C.5.3 Shared Memory