April 26, 2015 - 5:41 pm by Joss Whittle
C/C++ Graphics PhD
While working on my submission to this years SURF Research as Art Competition I realized that if I was to have any hope of rendering the final image at high resolution in a reasonable amount of time I would need more power. To do this I applied node parallelism in the form of a computer lab turned render farm.
The above image is the result of ~8 hours of rendering, at 4k resolution, over 18 machines (as described below). No colour correction or other post-processing (other than converting to jpeg for uploading) has been applied.
I try to keep the current generation of my rendering software nicely optimized but at it’s core it’s purpose is to be mathematically correct, capable of capturing a suite of internal statistics, and to be simple to extend. Speedup by cpu parallelism is only performed at the pixel (technically pixel tile) sampling level to reduce the amount of intrusion ray-packet tracing can bring to a renderer. In the future I plan to add a GPU work distributor using CUDA but for now this is quite low on my research priorities.
In order to get the speed boost I needed to render high resolution bi-directional path traced images I made use of Swansea Computer Sciences – Linux Lab, which has 30 or so i7, 8GB Ram, 256GB SSD machines running OpenSUSE. I wrote a bash script which for each ip address in a
machine_file (containing all ip’s in the farm) ssh’s into each system and starts the renderer as a background process, and another to ssh into all machines and stop the current render.
The render job on each node outputs to a unique binary
partials file every 10 samples per pixel (at 4k resolution) to a common network directory, overwriting the files previous values. This file contains three int’s containing
samples respectively; followed by
(width * height * 3) double precision numbers representing the row-column pixel data stored in
The data in the file represents the average luminance of each pixel in HDR. A second utility program can be run at a later time to process all compatible partials files in the same directory and turn them into a single image. Which is then properly tone mapped, gamma corrected, and saved as a
.bmp file. To combine two partials, the utility program simply performs the following equation for each pixel:
P_1,2 = ((P_1 * S_1) + (P_2 * S_2)) / (S_1 + S_2)
By repeating this process one by one (each partial file can be > 500MB) all partials in a directory can be aggregated together into a single consistent and unbiased image.
December 3, 2013 - 9:43 pm by Joss Whittle
C/C++ Graphics PhD
Today has been a pretty good day, both for me and for my work. For the last couple of weeks I’ve been working from home because all I am doing lately is background reading and coding my new renderer. At first it was nice to know that my workspace was only a commando roll (or a fall) out of bed away from me; but after 3 weeks it just got to be a bit much. Don’t take this to mean I didn’t leave the house for three weeks, I did, but working long hours from the comfort of my room did dramatically take its toll on me.
But enough about me going mad in the house! Today I came into the office (which I think I’ll start doing a lot more often) and have made great progress on my new
November 6, 2013 - 4:13 pm by Joss Whittle
C/C++ GPGPU Graphics Java L2Program PhD
It’s still not perfect, far from it in fact, but it’s progress none the less. I’ve been reading a lot lately about Metropolis Light Transport, Manifold Exploration, Multiple Importance Sampling (they do love their M names) and it’s high time I started implementing some of them myself.
So it’s with great sadness that I am retiring my PRT project which began over a year ago, all the way back at the start of my dissertation. PRT is written in Java, for simplicity, and was designed in such a way that as I read new papers about more and more complex rendering techniques I could easily drop in a new class, add a call to the render loop, or even replace the main renderer all together with an alternative algorithm which still called upon the original framework.
I added many features over time from Ray Tracing, Photon Mapping, Phong and Blinn-Phong shading, DOF, Refraction, Glossy Surfaces, Texture Mapping, Spacial Trees, Meshes, Ambient-Occlusion, Area Lighting, Anti-Aliasing, Jitter Sampling, Adaptive Super-Sampling, Parallelization via both multi-threading and using the gpu with OpenCL, Path Tracing, all the way up top Bi-Directional Path Tracing.
But the time has taken it’s toll and too much has been added on top of what began as a very simple ray tracer. It’s time to start anew.
My plans for the new renderer is to build it entirely in C++ with the ability to easily add plugins over time like the original. Working in C++ gives a nice benefit that as time goes by I can choose to dedicate some parts of the code to the GPU via CUDA or OpenCL without too much overhead or hassle. For now though the plan is to rebuild the optimized maths library and get a generic framework for a render in place. Functioning renderers will then be built on top of the framework each implementing different feature sets and algorithms.
July 25, 2013 - 4:50 pm by Joss Whittle
C/C++ GPGPU L2Program Rant
A note about the growing trend towards GPGPU for the masses. This is my response to a Reddit post I saw about a new GPU language, I felt I should copy it here.
As someone who does a lot of development using the GPU a new language is the last thing I want. Programming for the GPU is complicated, it just is, and it should be because what you are running your program on is a very complicated piece of hardware. You have to treat it right and you have to structure your programs and algorithms in a specific way which is not common to other architectures.
All these attempts lately to make GPU programming easy and doable for everyone (the LISP and Haskell libraries come to mind) completely miss the mark. They work under the premise that if you make it easier, everyone can make everything GPU accelerated, and that that will be better. It won’t.
Half the problem with current libraries that make CPU concurrency easier is that people start parallelizing too early. They don’t fully and completely optimize something they already have, they don’t go in and profile it, they don’t notice long system calls and work in a bit of assembly which will reduce the latency. No. Instead they will just chuck some threads in there, because threads make things faster… It’s just not true.
This problem is even more volatile on the GPU. It’s a delicate balance and not every job is suited to the GPU just as not every job is suited to multi threading on the CPU. Giving people (the uninformed people at least) the power of the GPU for every conceivable task is just daft.
When you structure programs for the GPU you need to have full and complete control over everything it does and when it does it. Languages like OpenCL and CUDA may be complicated but your kernels do what they say they do. It’s exactly why writing good C is complicated because you are right on the hardware level with very low abstraction. OpenCL and CUDA don’t try to optimize what you wrote (past a few compile time optimizations which are to be expected) they translate your commands onto the hardware nearly directly. The downside of that is it means you need to fully and completely understand your algorithm and how the hardware will react to each stage of it, the benefit is incredible performance and massively parallel execution.
TL:DR GPU programming is hard for a reason and giving everyone an easy way to do it completely misses the point. It’s like trying to make everyones car into a supercar by handing out nitrous injectors.