Although the CUDA utils (cutil) provide convenient vector-type functions, the operator/ overloads for vector types appear to be broken.
In cutil_math.h you can find:
inline __host__ __device__ float4 operator/(float4 a, float s)
{
float inv = 1.0f / s;
return a * inv;
}
inline __host__ __device__ float4 operator/(float s, float4 a)
{
float inv = 1.0f / s;
return a * inv;
}
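See? The two function bodies are identical! The second overload computes a * (1/s), which is a / s, rather than s divided by each component of a. With s = 1 that just returns a unchanged, so I always got wrong values when I tried to write: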
float4 a,inv_a;
inv_a = 1/a;
No wonder someone told me not to use the CUDA utils.
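For reference, here is a minimal sketch of what a scalar-over-vector overload would presumably need to do: divide s by each component separately. This is my own illustration, not the actual SDK source. The HOST_DEVICE macro and the host-only float4 / make_float4 stand-ins are assumptions added so the snippet also builds with a plain C++ compiler; under nvcc the real types from cuda_runtime.h are used. Don't include it alongside a cutil_math.h that already defines the same operator, or the two definitions will clash.

```cpp
// Sketch only: a corrected scalar / vector overload, not the actual CUDA SDK code.
#ifdef __CUDACC__
#include <cuda_runtime.h>   // real float4 and make_float4 when compiling with nvcc
#define HOST_DEVICE __host__ __device__
#else
// Hypothetical host-only stand-ins so the sketch builds without the CUDA toolkit.
struct float4 { float x, y, z, w; };
inline float4 make_float4(float x, float y, float z, float w)
{
    return float4{x, y, z, w};
}
#define HOST_DEVICE
#endif

// scalar / vector: divide s by each component of a.
// Division does not commute, so the "multiply by 1/s" shortcut that works
// for vector / scalar cannot be reused here.
inline HOST_DEVICE float4 operator/(float s, float4 a)
{
    return make_float4(s / a.x, s / a.y, s / a.z, s / a.w);
}
```

With an overload like this, 1.0f / make_float4(1.0f, 2.0f, 4.0f, 8.0f) yields the component-wise reciprocals (1, 0.5, 0.25, 0.125) instead of returning the input unchanged.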
PS.
The code has been corrected in CUDA 4.0+.