Notes on using clCreateImage



cl_mem clCreateImage3D(
    cl_context context,
    cl_mem_flags flags,
    const cl_image_format *image_format,
    size_t image_width,
    size_t image_height,
    size_t image_depth,
    size_t image_row_pitch,
    size_t image_slice_pitch,
    void *host_ptr,
    cl_int *errcode_ret)


Note that image_width, image_height and image_depth are given in pixels,
but image_row_pitch and image_slice_pitch are given in bytes!

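Example:
// sizeof tmpBlock[0] is the size of one element; sizeof tmpBlock would be the size of the pointer (or of the whole array)
mem_volume = clCreateImage3D(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &volume_format,
    vrParam.volSize[0], vrParam.volSize[1], vrParam.volSize[2], // width, height, depth: in pixels
    vrParam.volSize[0] * sizeof tmpBlock[0], // row pitch: in bytes
    vrParam.volSize[0] * vrParam.volSize[1] * sizeof tmpBlock[0], // slice pitch: in bytes
    tmpBlock, &err);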
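
The pixel/byte distinction can be made explicit with a small helper. This is only a sketch: image_pitches and tight_pitches are hypothetical names, not part of OpenCL, and it assumes the host data is tightly packed (no padding between scanlines or slices) and that element_size matches the chosen cl_image_format (for example, 4 bytes for a single-channel float image).

```c
#include <stddef.h>

/* Hypothetical helper type: pitches for a tightly packed 3D image. */
typedef struct {
    size_t row_pitch;   /* bytes per scanline */
    size_t slice_pitch; /* bytes per 2D slice */
    size_t total_bytes; /* minimum size of the host buffer */
} image_pitches;

/* width, height, depth are in pixels; element_size is in bytes. */
static image_pitches tight_pitches(size_t width, size_t height, size_t depth,
                                   size_t element_size)
{
    image_pitches p;
    p.row_pitch   = width * element_size;   /* pixels -> bytes */
    p.slice_pitch = p.row_pitch * height;   /* one full 2D slice */
    p.total_bytes = p.slice_pitch * depth;  /* whole volume */
    return p;
}
```

The row_pitch and slice_pitch fields are the values that would be passed as image_row_pitch and image_slice_pitch.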

Besides, the spec says the size of each element in bytes must be a power of 2 (see the host_ptr description below), but this did not seem to be enforced in my experiment.

host_ptr

A pointer to the image data that may already be allocated by the application.
The size of the buffer that host_ptr points to must be greater than or equal to
image_slice_pitch * image_depth. The size of each element in bytes must be a power
of 2. The image data specified by host_ptr is stored as a linear sequence of
adjacent 2D slices. Each 2D slice is a linear sequence of adjacent scanlines. Each
scanline is a linear sequence of image elements.
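
The layout described in the spec text above (slices, then scanlines, then elements) amounts to a simple byte-offset formula. The following is a minimal sketch with hypothetical helper names (is_power_of_two, voxel_offset, host_buffer_ok); it only restates the two constraints quoted above and does not call any OpenCL function.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n is a non-zero power of two (the spec's element-size rule). */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Byte offset of element (x, y, z) inside the buffer host_ptr points to.
   x, y, z are in pixels; element_size and both pitches are in bytes. */
static size_t voxel_offset(size_t x, size_t y, size_t z, size_t element_size,
                           size_t row_pitch, size_t slice_pitch)
{
    return z * slice_pitch + y * row_pitch + x * element_size;
}

/* Checks the two conditions from the host_ptr description:
   element size is a power of 2, and the buffer holds at least
   image_slice_pitch * image_depth bytes. */
static bool host_buffer_ok(size_t buffer_bytes, size_t element_size,
                           size_t slice_pitch, size_t depth)
{
    return is_power_of_two(element_size) &&
           buffer_bytes >= slice_pitch * depth;
}
```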

OpenCL / OpenGL Interop.


It took me a week to figure this out.

First, download the 3.0 beta driver here (only this version supports CL/GL interop):
http://forums.nvidia.com/index.php?showtopic=149959

Then follow this page to modify the sample code:
http://oscarbg.blogspot.com/2009/11/amd-opencl-samples-on-nvidia-195-opencl_05.html
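
Before trying interop, it may help to confirm that the installed driver actually reports the cl_khr_gl_sharing extension. The sketch below assumes the space-separated extension string has already been obtained (for example via clGetDeviceInfo with CL_DEVICE_EXTENSIONS); has_extension is a hypothetical helper, not an OpenCL function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole, space-delimited token
   in the extension list `ext_list`. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t n = strlen(name);
    const char *p = ext_list;

    if (n == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[n] == '\0') || (p[n] == ' ');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after a partial match */
    }
    return false;
}
```

Matching whole tokens rather than using a bare strstr avoids a false positive when one extension name is a prefix of another.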

OpenCL & volume rendering - GF 8800 GTX


OK, I moved the same program to a PC with a GeForce 8800 GTX.
It runs pretty fast!
It takes 12.41 ms (80.51 fps) to render a skull (85x96x134 voxels) and
29.19 ms (34.25 fps) with 169x192x268 voxels.
The screen size is again 500x500.

P.S. The volume renderer has no lighting and no transfer function lookup.
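
The post does not show the kernel itself, so the following is not the author's code. It is only a CPU-side sketch, in plain C, of what the per-pixel loop of a renderer with no lighting and no transfer function might look like: the sampled density is used directly as both grey value and opacity, with nearest-neighbour sampling and front-to-back compositing. All names (sample_volume, march_ray) and the choice of compositing are assumptions.

```c
#include <stddef.h>

/* Nearest-neighbour sample of a w*h*d volume of floats in [0, 1],
   stored slice by slice, row by row. Returns 0 outside the volume. */
static float sample_volume(const float *vol, int w, int h, int d,
                           float x, float y, float z)
{
    if (x < 0.0f || y < 0.0f || z < 0.0f ||
        x >= (float)w || y >= (float)h || z >= (float)d)
        return 0.0f;
    int ix = (int)x, iy = (int)y, iz = (int)z;
    return vol[((size_t)iz * (size_t)h + (size_t)iy) * (size_t)w + (size_t)ix];
}

/* March one ray with a fixed step and composite front to back.
   The density is used directly as opacity (no transfer function)
   and as grey value (no lighting). Returns the final grey value. */
static float march_ray(const float *vol, int w, int h, int d,
                       const float origin[3], const float dir[3],
                       float step, int max_steps)
{
    float color = 0.0f;
    float alpha = 0.0f;

    for (int i = 0; i < max_steps && alpha < 0.99f; ++i) {
        float t = step * (float)i;
        float s = sample_volume(vol, w, h, d,
                                origin[0] + t * dir[0],
                                origin[1] + t * dir[1],
                                origin[2] + t * dir[2]);
        color += (1.0f - alpha) * s * s; /* opacity s times grey value s */
        alpha += (1.0f - alpha) * s;
    }
    return color;
}
```

In an OpenCL version, each work-item would run this loop for one screen pixel and read the volume from the 3D image instead of a float array.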


GPU info:
CL_DEVICE_NAME: GeForce 8800 GTX
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 195.62
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 16
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1350 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 192 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 768 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim>: 2D_MAX_WIDTH 8192
                       2D_MAX_HEIGHT 8192
                       3D_MAX_WIDTH 2048
                       3D_MAX_HEIGHT 2048
                       3D_MAX_DEPTH 2048

CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query


CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.0
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 8192
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_FALSE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>: CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1

OpenCL & volume rendering


I implemented a simple volume renderer with OpenCL, which can load the mouse dataset from the project.
On my laptop running WinXP (Boot Camp, NVIDIA 9400M),
it takes around 250 ms (4 fps) to render a skull (85x96x134 voxels) and
around 500 ms (2 fps) with 169x192x268 voxels.
The screen size is 500x500.
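
For reference, the frame rates quoted above follow directly from the frame times, and the voxel counts of the two datasets can be compared the same way. A minimal sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Frames per second for a given frame time in milliseconds. */
static double fps_from_ms(double frame_ms)
{
    return 1000.0 / frame_ms;
}

/* Total number of voxels in a width x height x depth volume. */
static size_t voxel_count(size_t width, size_t height, size_t depth)
{
    return width * height * depth;
}
```

With these, voxel_count(169, 192, 268) is roughly eight times voxel_count(85, 96, 134), while the measured frame time above only doubles.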

----
GPU info:


CL_DEVICE_NAME: GeForce 9400M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 195.62
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1100 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 128 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 253 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim>: 2D_MAX_WIDTH 8192
                       2D_MAX_HEIGHT 8192
                       3D_MAX_WIDTH 2048
                       3D_MAX_HEIGHT 2048
                       3D_MAX_DEPTH 2048

CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics


CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.1
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 8192
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_FALSE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_TRUE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>: CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1