Now I realize that this may have been suggested many times before.
How about a function that can skip multiple frames at a time to speed up reactions? It could use a ratio of some sort, like 1:2, where for every frame that is displayed, the simulation advances by 2 steps.
Or even ratios of 1:3, 1:4, or 1:5, where for every frame that is displayed, the simulation advances by 3, 4, or 5 steps.
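To make the ratio idea concrete, here is a rough sketch of what such a loop could look like. `Simulation`, `Renderer`, and `runFrame` are made-up names for illustration and are not TPT's actual classes or functions.

```cpp
#include <cassert>

// Hypothetical stand-in for the game's simulation state (not TPT's real class).
struct Simulation {
    long ticks = 0;
    void step() { ++ticks; }  // one simulation update
};

// Hypothetical stand-in for the renderer; it only counts drawn frames.
struct Renderer {
    long frames = 0;
    void draw(const Simulation&) { ++frames; }
};

// Advance the simulation stepsPerFrame times, then draw exactly once.
// A ratio of 1:3 corresponds to stepsPerFrame == 3.
void runFrame(Simulation& sim, Renderer& ren, int stepsPerFrame) {
    assert(stepsPerFrame >= 1);
    for (int i = 0; i < stepsPerFrame; ++i) {
        sim.step();
    }
    ren.draw(sim);
}
```

Calling `runFrame(sim, ren, 3)` ten times would leave 30 simulation steps and 10 drawn frames. Note that this only saves the drawing work; each simulation step still costs the same, so the gain depends on how much of the frame time goes to rendering.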
The only real way of speeding it up like this would be to allow scalable multithreading (manually scaled for this purpose), so that you can tweak the number of threads and maximize speed. Other than that, good luck.
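A minimal sketch of what a manually adjustable thread count could look like is below. It assumes every element can be updated independently, which is a big simplification: real particles interact with their neighbours, so an actual engine would need far more care. `updateRange` and `updateAll` are hypothetical names, and the update formula is arbitrary.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-element update that does not depend on any neighbour.
void updateRange(std::vector<float>& temps, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        temps[i] = temps[i] * 0.5f + 1.0f;
    }
}

// Split the array into contiguous chunks and update each chunk on its own
// thread. threadCount is the user-tunable knob; 0 is treated as 1.
void updateAll(std::vector<float>& temps, unsigned threadCount) {
    threadCount = std::max(1u, threadCount);
    const std::size_t n = temps.size();
    const std::size_t chunk = (n + threadCount - 1) / threadCount;  // ceiling division

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) {
            break;  // no elements left for this or any later thread
        }
        workers.emplace_back(updateRange, std::ref(temps), begin, end);
    }
    for (auto& w : workers) {
        w.join();  // wait for every chunk before the next simulation step
    }
}
```

The chunks do not overlap, so no locking is needed here. That is exactly the part that stops being true once particles can move into or react with cells owned by another thread.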
SIMD is really beneficial for TPT; hopefully, with AVX and FMA from AMD, we can see some major speed-ups. It won't come close to what a proper multi-threaded engine would produce, but then again, there is a reason that has not happened yet.