Unfortunately, it delegates all of the necessary per-pixel processing to the CPU. I have an amazingly fast computer - 3.0 GHz quad-core, 9800 GTX+ 1GB video card - and Powdertoy still runs incredibly slowly when the pixels start to add up.
I understand it is a lot of work in C++ to move this sort of processing onto the GPU. However, doing so could transform the capabilities and potential of Powdertoy from a simple game into a real simulator, even a learning tool.
Explanation:
If calculations are coal, then CPUs are like giant trains. If you have 2 CPU cores in your computer, you're running two huge trains that can perform extremely complicated/long-term jobs, like running millions of tons of coal across the entire country!
On the other hand, GPUs are like collections of hundreds or even thousands of tiny trains. If you need to perform geographically complex operations (analogous to the huge number of pixel-based physics calculations in Powdertoy) then a GPU is your best buddy -- he gets the coal everywhere it needs to be, though each tiny train can't move a lot at once. Fortunately, the calculations for each pixel are relatively simple. The many separate cores in a GPU can handle them.
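To make the "tiny trains" idea a bit more concrete, here is a rough C++ sketch of what a per-pixel update looks like when every pixel can be worked out on its own. This is NOT Powdertoy's real code: Grid, update_pixel and step are names I made up, and the "physics" is just a simple heat-spreading average I picked for illustration. The point is that each pixel only reads from the previous frame, so no pixel has to wait for any other one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid of per-pixel temperatures (not Powdertoy's real layout).
struct Grid {
    int width;
    int height;
    std::vector<float> temp;  // row-major, width * height entries
};

// New value for ONE pixel, computed only from the previous frame.
// A neighbour that falls off the edge is replaced by the pixel's own value.
float update_pixel(const Grid& prev, int x, int y) {
    auto at = [&](int px, int py) {
        if (px < 0 || px >= prev.width || py < 0 || py >= prev.height) {
            return prev.temp[static_cast<std::size_t>(y) * prev.width + x];
        }
        return prev.temp[static_cast<std::size_t>(py) * prev.width + px];
    };
    // Average of the pixel and its four direct neighbours.
    return (at(x, y) + at(x - 1, y) + at(x + 1, y) +
            at(x, y - 1) + at(x, y + 1)) / 5.0f;
}

// One simulation step. The CPU walks the pixels one after another here;
// on a GPU each (x, y) pair could be handed to its own thread instead,
// because update_pixel never writes to 'prev' and never reads 'next'.
Grid step(const Grid& prev) {
    assert(prev.temp.size() ==
           static_cast<std::size_t>(prev.width) * prev.height);
    Grid next{prev.width, prev.height,
              std::vector<float>(prev.temp.size())};
    for (int y = 0; y < prev.height; ++y) {
        for (int x = 0; x < prev.width; ++x) {
            next.temp[static_cast<std::size_t>(y) * prev.width + x] =
                update_pixel(prev, x, y);
        }
    }
    return next;
}
```

The two loops in step are the "one giant train" version. Since every call to update_pixel is independent, the same function body could in principle become a GPU kernel with one thread per pixel. Particles that actually move are harder than this, because two of them can want the same empty pixel, so take this as the easy case only.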
Please, unleash the potential of this already-astounding product, Mr. Programmerguy.