GPU Considerations

GPU Compute Unit

Work-group Size

As a rule of thumb, make your work-group size equal to (or a multiple of) the wavefront size (64 on AMD) or warp size (32 on NVIDIA), so that no hardware lanes are left idle.
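For example, on NVIDIA hardware a launch whose block size is a multiple of 32 keeps every warp fully populated. A minimal sketch (the kernel name and the block size of 256 are illustrative assumptions, not taken from a specific program):

```cuda
#include <cstdio>

// Trivial kernel: each thread writes its own element.
__global__ void fill(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // 256 = 8 warps of 32 threads, so every warp is fully populated.
    // A block size of, say, 200 would occupy 7 warps (224 lanes) and leave 24 idle.
    const int blockSize = 256;
    const int gridSize  = (n + blockSize - 1) / blockSize;
    fill<<<gridSize, blockSize>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```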

Global Memory

What you may think is that global memory is one flat block that every work-item can read and write at the same speed.

In reality it behaves differently. Global memory is accessed through channels, and depending on the size and alignment (boundaries) of the chunks you read or write, performance can suffer. You also want to avoid having multiple compute units use the same memory channel, because their accesses will be serialized.

To minimize this problem, we should make work-items in the same work-group access adjacent memory addresses, so that their accesses coalesce into a few wide transactions.
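A sketch of the difference, assuming a simple copy kernel: in the coalesced version consecutive threads touch consecutive addresses, so one warp's loads combine into a few wide transactions; in the strided version each thread jumps by `stride` elements and the same warp needs many separate transactions.

```cuda
// Coalesced: thread i reads element i, so the 32 loads of a warp
// fall in adjacent addresses and combine into wide transactions.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: thread i reads element i * stride, so the 32 loads of a warp
// are scattered and each may require its own memory transaction.
__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n)
        out[i] = in[i * stride];
}
```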

GPU/CPU Transfer

Consider the following CUDA program. It just allocates 1 GB of memory and transfers it from host to device.
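A minimal sketch of such a program might look like this (the file layout and variable names are my own, not the original listing):

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

int main()
{
    // 1 GB buffer.
    const size_t bytes = 1ULL << 30;

    // Pageable host allocation; pinned memory (cudaMallocHost)
    // would usually give a noticeably faster transfer.
    float *h_buf = (float *)malloc(bytes);
    memset(h_buf, 0, bytes);

    float *d_buf;
    cudaMalloc(&d_buf, bytes);

    // Host -> device transfer of the whole gigabyte.
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();

    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```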

Now, if we use the console profiler, nvprof, which ships with the CUDA toolkit (I'm assuming you already have it installed), we can see how long the transfer takes.
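Assuming the program above was saved as memtransfer.cu and compiled to an executable called memtransfer (both names are illustrative), the invocation would be:

```
nvcc memtransfer.cu -o memtransfer
nvprof ./memtransfer
```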

Optionally, we can use the NVIDIA Visual Profiler (nvvp) to inspect the same information graphically on a timeline.
