
A note about GPGPU for the masses


A note about the growing trend towards GPGPU for the masses. This is my response to a Reddit post I saw about a new GPU language; I felt I should copy it here.

As someone who does a lot of development on the GPU, a new language is the last thing I want. Programming for the GPU is complicated, it just is, and it should be, because what you are running your program on is a very complicated piece of hardware. You have to treat it right, and you have to structure your programs and algorithms in a specific way that is not common to other architectures.

All these attempts lately to make GPU programming easy and doable for everyone (the LISP and Haskell libraries come to mind) completely miss the mark. They work under the premise that if you make it easier, everyone can make everything GPU accelerated, and that this will be better. It won't.

Half the problem with current libraries that make CPU concurrency easier is that people start parallelizing too early. They don't fully optimize what they already have, they don't go in and profile it, they don't notice the long system calls or work in a bit of assembly to reduce latency. No. Instead they just chuck some threads in there, because threads make things faster… It's just not true. A minimal sketch of the profile-first approach follows below.
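To illustrate the "measure before you parallelize" point, here is a minimal sketch of my own (the buffer size and the hot path are hypothetical, not from the original post): time the work you actually have before deciding whether threads, assembly, or the GPU are worth it.

```cpp
#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical hot path: sum a large buffer. Measure it before reaching
// for threads or a device.
int main()
{
    std::vector<double> data(1 << 24, 1.0);

    auto t0 = std::chrono::steady_clock::now();
    double sum = std::accumulate(data.begin(), data.end(), 0.0);
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("sum = %f, took %.2f ms\n", sum, ms);

    // Only if this number is genuinely the bottleneck, and the work is
    // compute-bound rather than stuck in system calls or I/O, is it worth
    // splitting across threads or moving onto the GPU.
    return 0;
}
```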

This problem is even more acute on the GPU. It's a delicate balance, and not every job is suited to the GPU, just as not every job is suited to multithreading on the CPU. Giving people (the uninformed ones at least) the power of the GPU for every conceivable task is just daft.

When you structure programs for the GPU you need full and complete control over everything it does and when it does it. Languages like OpenCL and CUDA may be complicated, but your kernels do what they say they do. It's exactly why writing good C is hard: you are right at the hardware level with very little abstraction. OpenCL and CUDA don't try to optimize what you wrote (past a few compile-time optimizations, which are to be expected); they translate your commands onto the hardware almost directly. The downside is that you need to fully and completely understand your algorithm and how the hardware will react to each stage of it; the benefit is incredible performance and massively parallel execution.
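To make that concrete, here is a minimal CUDA sketch of my own (the kernel, launch geometry, and memory choices are illustrative assumptions, not code from the post). Every thread index, bounds check, and allocation is spelled out by the programmer, and the runtime maps it onto the hardware more or less one-to-one; that is the kind of control the paragraph above is talking about.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Each thread handles exactly one element; nothing is hidden by the language.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // explicit global index
    if (i < n)                                     // explicit bounds check
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));      // you choose the memory model
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // You pick the launch geometry; the runtime does not second-guess it.
    int block = 256;
    int grid  = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                   // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The point is not that this code is clever; it's that nothing in it happens unless you wrote it, which is exactly what you want when you are reasoning about how an algorithm lands on the hardware.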

TL;DR: GPU programming is hard for a reason, and giving everyone an easy way to do it completely misses the point. It's like trying to turn everyone's car into a supercar by handing out nitrous injectors.