That's quick to be coming out, since the OS X API beta only came out in August... CUDA is coming.
:drool:
I'm a software developer, and I don't think writing software that uses multiple cores efficiently is an easy thing. It really depends on the architecture of the program and on what it's used for.
The reason a lot of software engineers may be waiting for a compiler is that, once one exists, porting these programs to use multiple cores effectively becomes much easier and much less error-prone.
But I think one thing that needs to be understood is that some programs just don't lend themselves to parallelism. Some things have to be done one after the other, so not all programs will necessarily benefit from parallelism.
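To make that concrete, here's a rough Python sketch of the two cases. The function names (`square`, `parallel_squares`, `running_checksum`) and the checksum formula are made up for illustration. Threads are used only to keep it short; in standard CPython you'd normally reach for processes to get a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Depends only on its own input, so calls can run in any order.
    return x * x


def parallel_squares(values, workers=4):
    # Independent items: safe to hand out to a pool of workers.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def running_checksum(values, seed=1):
    # Hypothetical checksum: every step needs the previous step's
    # result, so the iterations must run one after the other.
    state = seed
    for v in values:
        state = (state * 31 + v) % 1_000_003
    return state
```

The first function splits up cleanly because no item cares about any other. The second can't be split the same way: step N can't start until step N-1 has finished, no matter how many cores are sitting idle.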
Indeed, the problem we face now is not a hardware one; it is the software that lags behind. There is no magic code that makes your software run more efficiently on multiple cores. Seamless multitasking, where each core handles an application so the user can switch between apps without slowdown, is one thing, but getting all the cores to work together on a single process is seriously difficult.
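One rough way to see why piling cores onto a single process gives diminishing returns is Amdahl's law: if only a fraction p of the work can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n). A minimal sketch, with `amdahl_speedup` being a name chosen here for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    # Upper bound on speedup when only `parallel_fraction` of the
    # work can be spread across `cores`; the rest stays serial.
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

For example, `amdahl_speedup(0.5, 8)` comes out below 2: if half the program is inherently serial, eight cores can't even double its speed. This is only an upper bound and ignores the overhead of coordinating the cores.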