How to build a powerful distributed computer

You can never have enough processing power, especially if you enjoy working with 3D graphics or compiling your own software.

Pleasingly, with a very small outlay, you can turn any spare machines you may have into a single distributed computing and calculation engine, just by wiring them all together and running the right software.

Even modest hardware can make a significant contribution to your net computing power, and if you've already got the hardware then you've got nothing to lose.

PC hardware is now so cheap that buying a couple of extra machines and wiring them into the same computing pool could make a very cost-effective expansion. This is what we are going to build, and we're going to use Ubuntu Linux to do it. Linux can take cluster computing tasks like these in its stride, and you don't need to fork out for a licence for every machine.

Before embarking on this endeavour, we need to make one thing clear. The combined processing power of a cluster of computers can only be used for certain applications.

You won't be able to boost your frame rate on Crysis or Far Cry 2, for example, and you won't be able to run your everyday applications across the cluster unless they're designed to do so. Neither are we building a Beowulf cluster, where you'd need to use specialised libraries and programming languages to take advantage of the parallel processing potential within the cluster.

We're going to work with distributed computing, which involves splitting up a task across several machines in a local cluster. As a result, our applications can be far more down to earth and practical. You will be able to dramatically reduce rendering times with Blender, for example, or the compilation time of major projects like the Linux kernel.
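
To give a flavour of what splitting a task up actually looks like, here's a minimal Python sketch that farms individual Blender frames out to other machines over SSH. The node names, the shared scene path and the idea of driving Blender's command-line renderer directly are assumptions for the sake of illustration, not the exact setup we'll build; dedicated tools handle this more gracefully.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical node names - replace with the machines in your own cluster.
    NODES = ["node1", "node2", "node3"]
    SCENE = "/shared/scene.blend"      # assumed to sit on storage every node can see
    FRAMES = range(1, 101)             # frames 1 to 100

    def render_frame(job):
        node, frame = job
        # Ask the remote machine to render one frame in the background.
        # 'blender -b file -o pattern -f N' is Blender's standard CLI syntax.
        cmd = ["ssh", node, "blender", "-b", SCENE,
               "-o", "/shared/out/frame_####", "-f", str(frame)]
        return subprocess.run(cmd, capture_output=True).returncode

    # Deal the frames out round-robin and run them in parallel; with one
    # worker thread per node, each machine stays roughly fully occupied.
    jobs = [(NODES[i % len(NODES)], frame) for i, frame in enumerate(FRAMES)]
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        results = list(pool.map(render_frame, jobs))

    print(f"{results.count(0)} of {len(jobs)} frames rendered successfully")
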

You'll also be able to parallelise any number of tasks and use each machine separately from your master if you wish. As with multi-core processors, there's some inefficiency when scaling jobs across a cluster of processors, but it will almost always be much faster than running the same job on a single machine.
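
That inefficiency can be put into rough numbers with Amdahl's law: whatever fraction of a job can't be split up caps the speedup, no matter how many machines you add. The 10 per cent serial figure below is just an assumption to show the shape of the curve.

    # Amdahl's law: speedup = 1 / (serial + (1 - serial) / n)
    serial_fraction = 0.10   # assumed: 10% of the job can't be parallelised

    for n in (1, 2, 4, 8):
        speedup = 1 / (serial_fraction + (1 - serial_fraction) / n)
        print(f"{n} machines: {speedup:.2f}x faster")

    # With 10% serial work, four machines give about 3.1x rather than 4x -
    # still well worth having, but doubling the hardware never quite
    # doubles the throughput.
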

In theory, you can use any old PC. The minimum requirement is that it must be able to run Linux, which narrows the choice down to almost any PC from the last 10 years. But in reality, the cluster works best if the machines that you're linking together are relatively close in specification, especially when you start to take running costs into consideration.

A 1GHz Athlon machine, for example, could cost you over £50 a year in electricity. You'd be much better off spending this money on a processor upgrade for a more efficient machine. A similar platform for each computer also makes configuration considerably easier. For our cluster, we used four identical powerful machines.
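
That £50 figure is easy to sanity-check. The wattage and unit price below are assumptions (roughly what an always-on Athlon-era box draws and what a unit of electricity cost at the time), but the arithmetic shows how quickly an idle node eats into any savings.

    # Rough running-cost estimate for one always-on node (all figures assumed).
    watts = 100            # typical draw for an older Athlon-class machine
    pence_per_kwh = 6      # assumed UK unit price of the period

    kwh_per_year = watts / 1000 * 24 * 365           # about 876 kWh
    cost_pounds = kwh_per_year * pence_per_kwh / 100
    print(f"About £{cost_pounds:.0f} a year")        # roughly £53
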

You only need powerful machines if you're making a living from something computer-based – 3D animation, for instance – where you can weigh the extra cost against increased performance. We're also going to assume that you have a main machine you can use as the master. This will be the eyes and ears of the cluster, and it's from here that you'll be able to set up jobs and control the other machines.

Hardware compatibility

Linux has come a long way in terms of hardware compatibility, but you don't want to be troubleshooting three different network adaptors if you can help it. And it's the network adaptors that are likely to be the weakest link.

Cluster computing is dependent on each machine having access to the same data, and that means data needs to be shuffled continually between the machines in the cluster.
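
One practical consequence is that every node needs to see the same working files, typically via a directory exported from the master over the network. A quick sanity check, sketched below in Python, is to hash the same file from each node and make sure the answers agree; the hostnames and the /shared mount point are assumptions for illustration.

    import subprocess

    NODES = ["node1", "node2", "node3"]     # hypothetical cluster members
    TEST_FILE = "/shared/scene.blend"       # assumed network-mounted path

    for node in NODES:
        # md5sum prints "<hash>  <filename>"; a differing hash (or an error)
        # means that node isn't seeing the same shared storage as the others.
        out = subprocess.run(["ssh", node, "md5sum", TEST_FILE],
                             capture_output=True, text=True)
        status = out.stdout.split()[0] if out.returncode == 0 else "NOT MOUNTED"
        print(node, status)
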
