Install TeX Live on Ubuntu



sudo apt-get install texlive
sudo apt-get install xzdec

The second line fixes a problem that appears when using tlmgr.
Some people suggest avoiding apt-get and using a third-party PPA instead. In my experience, however, that causes dependency problems later: apt-get upgrade starts reporting "packages have been kept back..." with no clean way to resolve it. Therefore I will keep using the default repositories whenever possible.


Cuda: Write to depth buffer


After several tests, I finally found out how to write to the depth buffer (z-buffer) from CUDA. The ultimate tip when writing the depth buffer:
Do NOT disable GL_DEPTH_TEST!!

Here is the code (only the parts related to the depth buffer). Thanks to my colleagues for the references:

1. Create PBO and resources:

GLuint m_pbo_depth;
cudaGraphicsResource *m_cuda_pbo_depth_resource;

void initPBO()
{
	if (m_pbo_depth) {
		// delete the old buffer
		CUDA_SAFE_CALL(cudaGraphicsUnregisterResource(m_cuda_pbo_depth_resource));
		glDeleteBuffersARB(1, &m_pbo_depth);
	}

	glGenBuffersARB(1, &m_pbo_depth);

	glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, m_pbo_depth);
	glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, imgWidth*imgHeight*sizeof(GLuint),
		0, GL_STREAM_DRAW_ARB);
	glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

	// the color PBO (m_pbo / m_cuda_pbo_resource) is created and registered the same way
	CUDA_SAFE_CALL(cudaGraphicsGLRegisterBuffer(&m_cuda_pbo_depth_resource,
		m_pbo_depth, cudaGraphicsMapFlagsNone));
}
2. Map the PBO and call the CUDA kernel
void executeKernel()
{
	// map the PBO to get a CUDA device pointer
	uint *d_depth_buffer;
	size_t num_bytes;
	CUDA_SAFE_CALL(cudaGraphicsMapResources(1, &m_cuda_pbo_depth_resource, 0));
	CUDA_SAFE_CALL(cudaGraphicsResourceGetMappedPointer((void **)&d_depth_buffer,
		&num_bytes, m_cuda_pbo_depth_resource));

	d_render<<<...>>>(..., d_depth_buffer);

	CUDA_SAFE_CALL(cudaGraphicsUnmapResources(1, &m_cuda_pbo_depth_resource, 0));
}
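One detail the snippets leave implicit is what the kernel writes into d_depth_buffer: glDrawPixels with GL_DEPTH_COMPONENT / GL_UNSIGNED_INT expects the depth scaled to the full 32-bit range. Here is a sketch of that conversion (depth_to_uint is my own name, not from the original code; in the real renderer this would be a __device__ function used inside d_render):

```cpp
#include <cstdint>

// Convert a depth value in [0, 1] to the full GL_UNSIGNED_INT range,
// matching what glDrawPixels(..., GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, 0) expects.
static uint32_t depth_to_uint(double d)
{
    if (d < 0.0) d = 0.0;  // clamp out-of-range depths
    if (d > 1.0) d = 1.0;
    return static_cast<uint32_t>(d * 4294967295.0);  // scale to [0, 2^32 - 1]
}
```

Inside the kernel, each thread would then do something like d_depth_buffer[y * imgWidth + x] = depth_to_uint(z);.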
3. In the rendering step, copy the depth PBO into the framebuffer's depth buffer
	glEnable(GL_TEXTURE_2D);
	glEnable(GL_DEPTH_TEST);
//	glDisable(GL_DEPTH_TEST); !!! cannot disable depth test
	glDepthFunc(GL_ALWAYS);

	// normalize matrix
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glLoadIdentity();

	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	glLoadIdentity();
	glOrtho(0.0, 1.0, 0.0, 1.0, 0.0, 1.0); 

	// draw image from PBO
	glRasterPos2i(0, 0);

	// depth
	glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, m_pbo_depth);
	glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
	glDrawPixels(vrParam.imgWidth, vrParam.imgHeight, 
GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, 0);
	glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
	glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

	// recall matrix
	glMatrixMode(GL_MODELVIEW);
	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();

	glDepthFunc(GL_LESS);

Access Nearest-neighbor in Linear Texture-read Mode


You may not have noticed that, in both CUDA and OpenCL, you can perform a nearest-neighbor texture lookup (as in cudaFilterModePoint) on a texture that is set to linear filtering mode (cudaFilterModeLinear). Why not just use the texture in nearest-neighbor mode? Because sometimes a single texture holds channels that must not be linearly interpolated alongside channels that must be (packing them together lets the texture cache work more efficiently). And you don't want to implement the linear interpolation yourself, because that roughly halves performance.

Suppose your texture lookup code for point access mode is as follows:


float2 data = tex3D(tex, x, y, z);
id = data.x;
alpha = data.y;

id is a color ID used for a later lookup and must not be blended.

Suppose you want no interpolation on the first channel but interpolation on the second. Here is the way, for a texture in cudaFilterModeLinear and cudaReadModeElementType mode (do not normalize the coordinates):

id = tex3D(tex, floor(x+.5f)+.5f, floor(y+.5f)+.5f, floor(z+.5f)+.5f).x; // nearest neighbor
alpha = tex3D(tex, x,y,z).y; // linear

The inner +.5f makes floor() round to the nearest integer instead of rounding down.
The outer +.5f addresses the texel center, so the hardware fetch returns the raw, un-interpolated value.
This assumes x, y, z are unnormalized texel coordinates.
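The snapping trick can be factored into a small helper. This is just a sketch of the arithmetic (the name nearest_coord is my own); in device code you would use floorf the same way:

```cpp
#include <cmath>

// Snap an unnormalized texel coordinate to the center of the nearest texel,
// so that a fetch on a cudaFilterModeLinear texture returns the raw,
// un-interpolated value stored there.
static float nearest_coord(float x)
{
    return std::floor(x + 0.5f) + 0.5f;  // round to nearest texel index, then address its center
}
```

The lookup then reads id = tex3D(tex, nearest_coord(x), nearest_coord(y), nearest_coord(z)).x;.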

For normalized coordinates (cudaReadModeNormalizedFloat), it's more complex and I would not recommend it:

id = tex3D(tex, (floor(x+.5f)+.5f)/w, (floor(y+.5f)+.5f)/h, (floor(z+.5f)+.5f)/d).x; // nearest neighbor
alpha = tex3D(tex, x/w, y/h, z/d).y; // linear


Although there is more computation to perform, this is still much faster than doing your own interpolation. One drawback is that you lose quality compared to manually interpolating the colors referenced by the IDs of the 6 neighbors around the access point.

Cuda operator/ overload for float4


Although the Cuda utils provide convenient vector-type functions, they screwed up the vector-type operator/ overloads.
In cutil_math.h you can find:


inline __host__ __device__ float4 operator/(float4 a, float s)
{
    float inv = 1.0f / s;
    return a * inv;
}
inline __host__ __device__ float4 operator/(float s, float4 a)
{
    float inv = 1.0f / s;
    return a * inv;
}


See? The function bodies are identical! So I always got wrong values when I wrote:
    float4 a, inv_a;
    inv_a = 1/a;


No wonder someone told me not to use the Cuda utils.
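For reference, the correct scalar-over-vector overload divides componentwise instead of multiplying by 1/s. Here is a self-contained sketch (with my own minimal float4 stand-in for host-side illustration; the real fix lives in cutil_math.h):

```cpp
struct float4 { float x, y, z, w; };

// Correct overload: s / a must divide s by each component,
// not reuse the body of a / s.
inline float4 operator/(float s, const float4 &a)
{
    float4 r;
    r.x = s / a.x;
    r.y = s / a.y;
    r.z = s / a.z;
    r.w = s / a.w;
    return r;
}
```

With this version, inv_a = 1/a for a = (2, 4, 8, 16) gives (0.5, 0.25, 0.125, 0.0625) as expected.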

PS. The code has been corrected in Cuda 4.0+.

Cache matters


In my CS parallelism class homework we were given a simple nested for-loop and asked to make it faster without changing the semantics. In other words, the task is purely about increasing the cache hit rate. Here is the code:


 
#define N (128)
double A[N][N][N], B[N][N], C[N][N][N],  CC[N][N][N]; 
...

  for (i=0; i<N; i++)
   for (j=0; j<N; j++)
    for (k=0; k<N; k++)
     for (l=0; l<N; l++)
       C[i][j][k] += A[l][i][j]*B[l][k];
 

By permuting and unrolling the loops and checking whether the running time decreased, this is the best version I arrived at:

  
for (i=0; i<N; i++)
   for (l=0; l<N; l+=2)
     for (j=0; j<N; j+=2)   
      for (k=0; k<N; k+=2)
      {

      CC[i][j][k] += A[l][i][j]*B[l][k];
      CC[i][j][k+1] += A[l][i][j]*B[l][k+1];
 
      CC[i][j][k] += A[l+1][i][j]*B[l+1][k];
      CC[i][j][k+1] += A[l+1][i][j]*B[l+1][k+1];
 
      CC[i][j+1][k] += A[l][i][j+1]*B[l][k];
      CC[i][j+1][k+1] += A[l][i][j+1]*B[l][k+1];

      
      CC[i][j+1][k] += A[l+1][i][j+1]*B[l+1][k];
      CC[i][j+1][k+1] += A[l+1][i][j+1]*B[l+1][k+1];
      }
 

 
And here is the result (Base is the original code, and Test is my modified version):

Tensor Size = 128
Base-TensorMult: 113.4 MFLOPS; Time = 4.736 sec; 
Test-TensorMult: 1863.8 MFLOPS; Time = 0.288 sec;
No differences found between base and test versions
 

You see? It can be more than 16 times faster!! So we really need to take caches into consideration when coding, even though the resulting code can be really ugly :(
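To double-check that the transformed loop nest does not change the semantics, here is a small self-contained verifier of my own (smaller N and a deterministic fill; for each cell, both versions add the l terms in increasing l order, so the doubles should match exactly):

```cpp
#include <cstring>

// Compare the original loop nest against the permuted + unrolled one.
bool same_result()
{
    const int N = 8;  // small for a quick check; the transformation is identical at N = 128
    double A[N][N][N] = {}, B[N][N] = {}, C[N][N][N] = {}, CC[N][N][N] = {};

    // deterministic fill
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            B[i][j] = i + 0.5 * j;
            for (int k = 0; k < N; k++)
                A[i][j][k] = i - j + 0.25 * k;
        }

    // original order
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                for (int l = 0; l < N; l++)
                    C[i][j][k] += A[l][i][j] * B[l][k];

    // permuted + unrolled order
    for (int i = 0; i < N; i++)
        for (int l = 0; l < N; l += 2)
            for (int j = 0; j < N; j += 2)
                for (int k = 0; k < N; k += 2) {
                    CC[i][j][k]     += A[l][i][j]   * B[l][k];
                    CC[i][j][k+1]   += A[l][i][j]   * B[l][k+1];
                    CC[i][j][k]     += A[l+1][i][j] * B[l+1][k];
                    CC[i][j][k+1]   += A[l+1][i][j] * B[l+1][k+1];
                    CC[i][j+1][k]   += A[l][i][j+1]   * B[l][k];
                    CC[i][j+1][k+1] += A[l][i][j+1]   * B[l][k+1];
                    CC[i][j+1][k]   += A[l+1][i][j+1] * B[l+1][k];
                    CC[i][j+1][k+1] += A[l+1][i][j+1] * B[l+1][k+1];
                }

    return std::memcmp(C, CC, sizeof(C)) == 0;
}
```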

I found an ultimate method to share files among apps on iPad (also iPhone)


Background
The impression we get from Apple's iPad promotion is that it's easy to produce documents on an iPad. True, it's easy, but Apple seems to have messed up file sharing among apps and among devices. We need to share files among apps because one application has features another lacks; for example, some apps are good at downloading/uploading files but poor readers/editors, and vice versa.

Yes, Apple provides tools for developers to send files to another app, and users can directly access the Documents directory under an application's own directory through iTunes. These are still handicapped in the following ways:
1. Files are not organized. Our working files are spread across several apps, some duplicated in different versions. It's a nightmare to recall, every time, which app holds the latest version of a file, and with lots of documents we have to scan through them all to find the right one.
2. Wasted storage space. As mentioned above, sending a file to another application makes a copy in that application's own directory. So we waste storage keeping two or more copies on our iDevice, while Apple charges a lot for extra storage.

So the ultimate solution to the above problems is to let all regular applications see one shared directory. If they share the same directory, there are no more duplicated files, and we can organize everything in one place once and for all! I have come up with an elegant and simple way to do this, though it has to be done on a jailbroken device. I believe it's theoretically doable on non-jailbroken iDevices too, but I guess Apple will never allow this method or such an app to work legally on our devices (though they could consider providing more flexible file sharing now).

Problem
So I want all my applications to be able to see a shared directory.

Solution
The simplest way, after jailbreaking, is to use a Unix symbolic link (like an alias in Mac OS, or a shortcut in Windows) to redirect applications to our shared directory. So which directory is shared among all regular applications, and is both readable and writable? I can find only one: the shared photo album directory, /var/mobile/Media/DCIM.
The idea is to create a directory there, say CrossAppDocs, and then create a link to it inside the Documents directory of every application that should have access. Here are two methods:

Method
Method #1. Using iFile (Cydia)'s graphical interface (to keep it short I won't tediously list every tiny step):
a. Browse to /var/mobile/Media/DCIM
b. Tap Edit and then '+' to create a new directory, naming it CrossAppDocs. Change the Group permission to be readable and writable. Tap Create to save.
c. You are still in directory edit mode. Select the directory you just created and tap the icon in the lower right corner to 'copy/link'.
d. Tap Done to finish editing.
e. Browse to /var/mobile/Applications and go to the applications you want to link to CrossAppDocs.
f. Go to the Documents directory, tap Edit again and tap the same lower-right icon. Choose 'create link', and you are done creating a link to the shared documents!


Method #2: using the terminal/SSH
1. Go to /var/mobile/Media/DCIM
2. mkdir CrossAppDocs
3. Since you are logged in as root, you have to fix the permissions and ownership:
- chmod 775 CrossAppDocs
- chown mobile CrossAppDocs
4. Go to the Documents directory of each application you want to link: /var/mobile/Applications/(some hash code)/Documents
5. Type: ln -s /var/mobile/Media/DCIM/CrossAppDocs ./CrossAppDocs
That's it; you have created a link to the shared documents.

Result Images:

GoodReadersLite sees CrossAppDocs


Downloads Lite sees CrossAppDocs




Discussion
Note that some applications use their Documents directory differently. You may need to add the link inside their subdirectories.

Adding a link in the Documents directory of each app is just one way to use the Unix link feature. If an application cannot read/write files in subdirectories of its Documents folder, you can even try turning the Documents directory itself into a link to a folder inside your shared documentation folder.
Here is a list of applications that work very well:
- iBooks (only shows PDF)
- GoodReader
- Office2
- iAnnotate
- Downloads
- FileBrowser(in data sub-directory)
- Files
- PrintCentral
And more you can imagine

Unfortunately the iWork apps currently do not work well with this scheme, because they do not see subdirectories of Documents. The only (not recommended) workaround is to turn the Documents directory itself into a link to the shared folder.

How does iTunes backup handle the links? I don't know, but iTunes doesn't seem to complain. Maybe it simply backs up the link itself.

Going further: can we add a link to an app's Documents directory without jailbreaking? iTunes has supported file uploading since iOS 3.2 (and in iOS 4.0), so I wonder whether we could somehow send a locally created link to the device. Directly uploading a link does not work. Could it be done through some iTunes API?

This article is just a first idea about cross-application file sharing. If you have more advanced ideas, please feel free to leave a comment.

Fast way to back up Time Machine on the Windows network for the first time


Keywords: Mac, Time Machine, Windows sharing, fast

If you have just migrated from PC to Mac (especially to a MacBook), you probably have an old PC that you seldom use, or one that shares files to the Mac. I turned my old PC into a file server, using plain Windows sharing. So I have been wondering: why not back up my Time Machine to that file server? Then I wouldn't have to plug in a USB external drive every weekend. Here are some articles that really do work (though not perfectly; I'll explain why):


OSX Time Machine and Samba/Windows share
Time Machine Backup to Network Device for Mac

I followed the first guide and successfully started Time Machine, but found the first backup might take forever to finish (I used 802.11n to back up to my PC).

However, I know Time Capsule lets us back up locally first and then move the drive to the network. Is there a similar trick for a Windows share? Unfortunately, the Mac doesn't seem to allow connecting to its own locally shared directory. The only way I can think of is to run a virtual machine (such as VMware Fusion) and share from inside it; the Mac then sees the share as a remote one.

I spent a week on trial and error. It's not trivial, as Time Machine sometimes halts and there is nothing to do but restart. I finally found some tricks and successfully backed up to Windows XP inside VMware. It took me just half a day to back up 114 GB of data.

Some things I have confirmed:
1. You can back up Time Machine through either Mac sharing (Netatalk/AFP protocol) or Windows sharing (Samba/CIFS protocol). The backup formats differ, so you cannot switch between them after backing up. Note that Mac sharing requires less hacking (more reliability, less complexity), but you must have another Mac. I don't have one, so I focus on backing up to a PC network. (If you do have another Mac for backup, don't follow this post.)

2. I used NTFS on the PC. I guess FAT32 works too, since the Mac's sparsebundle is actually a directory containing many small files.


How to:
First stage - preparation:
1. Run Windows in a virtual machine, and set the network interface to Host-only (it sees only the host) to prevent unwanted network traffic.

2. Connect the backup external drive, create a backup folder, say TM, and share it (remember to set the permissions to allow read/write). Later on we'll let Time Machine back up everything into this folder.

3. Check the virtual machine's IP address by typing "ipconfig" at the command line. Say you get 192.168.1.5.

4. Back on the Mac, run Terminal, type "ping 192.168.1.5", and leave it running until the backup is done.
- This ensures the virtual machine does not halt and miss packets from the host.

Second stage - first backup (I followed this post, with extra notes of my own):

1. Run Terminal and type:

defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

2. Mount the network share TM (Apple+K in Finder) for backup.
Note:
- It is recommended that the partition have free space of twice the size of your data.

3. Get the Ethernet address of your Mac. You can do this either in Terminal, by typing "ifconfig en0 | grep ether", or in System Profiler (About This Mac --> More Info), clicking Network and then Ethernet.


4. Back in Terminal, type (adjusting to your situation):
sudo hdiutil create -size 199.5g -type SPARSEBUNDLE -nospotlight -volname "Time Machine" -fs HFS+J -verbose YourMacName_YourMacAddress.sparsebundle -tgtimagekey sparse-band-size=262144
Note:
- Do not use the GUI Disk Utility application, because you cannot specify "nospotlight" and the image key there. Once, my Time Machine halted mid-backup because Spotlight tried to index the shared drive. It's stupid.
- Do this locally. Even though you specify a 199.5g image, this step creates an image of less than 500 MB; you can move it to the network drive later.
- 199.5g: change this based on the free space on your network drive.
- YourMacName is your Mac's name. Some articles say you should not use names longer than 8 characters.
- YourMacAddress is what you got from the previous step, with no colons (12 characters).
- tgtimagekey: I'm not sure it improves the chance of a successful backup. Just try it.


5. After the sparsebundle is generated, the Mac automatically mounts the image.

6. Unmount / Eject the image.

7. Move the .sparsebundle disk image file you created to your mounted network location.

8. Open up the time machine preferences, and click on "Choose Backup Disk".

9. Select your new disk image and then click on "Use for Backup".


10. Your Time Machine should start the initial backup process.
Note:
- You can hit "Back Up Now" in Time Machine's menu on the status bar to start immediately.
- Never hit the X icon in the status bar or "Stop Time Machine" to pause the backup. Time Machine will restart from zero next time and never recover; you would have to start over by creating a new sparsebundle image.
- Leave the pinging Terminal running until the backup finishes. Do not run anything else in the virtual machine; let it focus on saving the packets to your external hard drive.
- Unfortunately, even when I did everything right, Time Machine still sometimes halted during backup. My guess is that it is waiting for a reply from the client whose packets were lost. You can tell by opening Task Manager in Windows and switching to the Networking tab: if the graph shows a flat zero, Time Machine has probably halted. Here is what I do: put the Mac to sleep, then wake it up. Time Machine magically continues.
- Open Utilities => Activity Monitor. You can check the backup speed on the Disk Activity tab at the bottom. The write speed should not be much less than the read speed. (My read speed was actually not high -- just 2-10 MB/s, sometimes dropping to KB/s.) If you find it too slow, use the same magic: put the Mac to sleep, then wake it up.

Last stage
1. Once done, move the external drive back to your normal PC and share the TM folder again (remember to allow write permission).
2. Back on the Mac, mount the TM network share. In Time Machine, hit "Select Disk" and choose the mounted network drive. Then you are done. Enjoy the mobile backup environment.