Creating OpenCL buffers is tedious, and every time I lose a lot of time figuring out why something doesn't work. Here are some notes I made after several hours of working on it.
__constant Limitation
A large buffer cannot be passed as a __constant kernel argument. On an nVidia GPU the kernel then reads wrong values from the buffer, and no error message is shown. The reason is that constant memory is a small, dedicated region, not something that can hold a large array. The limit can be queried with clGetDeviceInfo as CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, and it is typically 64 KB on nVidia hardware.
Solution: Use __global instead. To help performance, pass CL_MEM_READ_ONLY to clCreateBuffer. In my tests this increased the speed a lot.
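As a minimal sketch of the kernel side, assuming a hypothetical kernel named `scale_lookup`: the large table is declared `__global const` instead of `__constant`. On the host, the matching buffer would be created with `clCreateBuffer` and the `CL_MEM_READ_ONLY` flag. The `#ifndef __OPENCL_VERSION__` parts, including `host_gid` and `run_scale_lookup_on_host`, are my own illustration and not part of OpenCL. They only let the same source build as plain C so the kernel body can be checked without a device.

```c
/* On a device the OpenCL compiler defines __OPENCL_VERSION__ and skips this
   shim. On the host it lets the same source build as plain C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid;  /* hypothetical stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
#endif

/* The large table is passed as __global const rather than __constant, so its
   size is not bounded by CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. */
__kernel void scale_lookup(__global const float *table,
                           __global float *out,
                           const float factor)
{
    size_t i = get_global_id(0);
    out[i] = table[i] * factor;
}

#ifndef __OPENCL_VERSION__
/* Host-only emulation: run the kernel body once per work-item, serially. */
static void run_scale_lookup_on_host(const float *table, float *out,
                                     float factor, size_t n)
{
    for (host_gid = 0; host_gid < n; ++host_gid)
        scale_lookup(table, out, factor);
}
#endif
```

Calling `run_scale_lookup_on_host(table, out, 2.0f, n)` on the host fills `out` with each table entry doubled, which is what the kernel would compute per work-item on the device.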
Data Re-usage
If you have an array that is changed by an OpenCL kernel and will be re-used the next time the kernel runs, it seemed inefficient to just create a CL_MEM_READ_WRITE buffer (but see the revision below).
Solution: Instead, create two buffers, one CL_MEM_READ_ONLY and one CL_MEM_WRITE_ONLY, and declare both as __global in the kernel arguments. After each round, the host copies the written buffer back to the read-only buffer with clEnqueueCopyBuffer. In other words, never use CL_MEM_READ_WRITE! In my first tests it appeared to serialize the parallel work.
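The two-buffer round trip can be sketched as follows. This is only an illustration under assumptions: the kernel name `advance`, the halving update, and the helper `run_rounds_two_buffers` are hypothetical. The `memcpy` stands in for the `clEnqueueCopyBuffer` call a real host program would issue, and the host-only shim exists just so the logic can be run without a device.

```c
/* Host-only shim; skipped when compiled as OpenCL C on a device. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#include <string.h>
#define __kernel
#define __global
static size_t host_gid;  /* hypothetical stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
#endif

/* Reads last round's state and writes this round's state to a separate buffer. */
__kernel void advance(__global const float *prev, __global float *next)
{
    size_t i = get_global_id(0);
    next[i] = prev[i] * 0.5f;
}

#ifndef __OPENCL_VERSION__
/* Host-only emulation of several rounds. After each round the written
   buffer is copied back over the read buffer. */
static void run_rounds_two_buffers(float *prev, float *next,
                                   size_t n, int rounds)
{
    for (int r = 0; r < rounds; ++r) {
        for (host_gid = 0; host_gid < n; ++host_gid)
            advance(prev, next);
        memcpy(prev, next, n * sizeof(float));  /* copy-back step */
    }
}
#endif
```

The cost of this scheme is the extra copy after every round, which grows with the buffer size.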
* Revision: In the end I used CL_MEM_READ_WRITE and found that it worked well for holding the previous state and re-using it in the next clEnqueueNDRangeKernel call. This saves the time spent copying buffers. I am not sure why it ran slowly before. Maybe it was the updated driver, or maybe it is because I now pay attention to memory coalescing.
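A minimal sketch of the single-buffer variant, again with hypothetical names (`advance_in_place`, `run_rounds_in_place`) and a host-only shim for checking the logic. The state lives in one buffer that a real host program would create with `CL_MEM_READ_WRITE` and pass unchanged to every `clEnqueueNDRangeKernel` call. The serial host loop cannot show whether a kernel is race-free. The assumption that makes the in-place update safe here is that work-item i reads and writes only element i.

```c
/* Host-only shim; skipped when compiled as OpenCL C on a device. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid;  /* hypothetical stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
#endif

/* Updates the state in place. Work-item i touches only element i, and
   neighbouring work-items touch neighbouring addresses. */
__kernel void advance_in_place(__global float *state)
{
    size_t i = get_global_id(0);
    state[i] = state[i] * 0.5f;
}

#ifndef __OPENCL_VERSION__
/* Host-only emulation: the same buffer carries the state from one round to
   the next, with no copy in between. */
static void run_rounds_in_place(float *state, size_t n, int rounds)
{
    for (int r = 0; r < rounds; ++r)
        for (host_gid = 0; host_gid < n; ++host_gid)
            advance_in_place(state);
}
#endif
```

Compared with the two-buffer scheme, there is no copy-back step between rounds.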
OpenCL buffer usage
Labels: OpenCL
Posted by MiGi at 8:56 AM
1 comment:
I may be wrong in this post about using CL_MEM_READ_WRITE.
My recent trials show that CL_MEM_READ_WRITE doesn't harm performance if each thread in the kernel only reads and writes its own location (and perhaps also keeps its memory accesses coalesced; see the OpenCL Best Practices Guide from nVidia).