Hi,
Wouldn't Powder Toy be way faster using OpenCL? For those who don't know what it is, it's a framework for parallel computing. It's normally used to run code on the GPU for extra power, and man does it go fast.
Here is a comparison; you can clearly see OpenCL at 1:36: http://www.youtube.com/watch?v=E67jVgcBZS0
I swear you already have an OpenCL branch of TPT anyway, so it can't be that laborious.