Parallel computing is the use of multiple processors in order to solve a specific task. For quite a few decades now parallelism has been used in the domain of High Performance Computing (HPC) where large, difficult problems are split up into pieces which are solved and then recombined to form the answer. Popular issues such as weather prediction and cosmological simulation have been greatly helped by this form of computing. However, nowadays there is an extra dimension to the whole issue - with multiple cores per processor common place (and only expected to increase) it is important that programmers stop writing sequential code and start writing parallel code - otherwise we will have processors with 64 or 128 cores which are very much underused. Many programmers consider parallel programming "hard"..... and quite rightly so.
In this article I will explain some of the elementary concepts of parallel computing and point the reader to further points of information.
Problem DecompositionWhen we wish to solve a problem via parallel computing we need to decide how to decompose (or divide) it up most effectively. There are two broad ways this can be done.
Task Parallelism - Here a problem is divided into distinct tasks, with each processor running the task and then recombining the results at the end. Lets imagine you have a fence to paint - there are two jobs (firstly preparing the fence and secondly painting it) and there are two of you to do it. In task parallelism one person would prepare the fence whilst the other person would paint it. A bit more computer orientated, lets imagine a typical desktop computer with a dual core processor (two processors.) The user might wish to have a word processor open and also be playing on a game at the same time.... in task parallelism one CPU might run the word processor whilst the other would run the game.
Data Parallelism - Here each processor does exactly the same task, its just that the data they work on is different. Lets go back to the fence example - here both would prepare the fence and then paint it, however one would work on one side of the fence and the other would work on the other side of the fence and aim to meet in the middle. Dividing a problem via data parallelism tends to allow for more gain than using task parallelism.... lets go back to our computer example. Here the game might take loads of CPU resources whilst the word processor takes very few. Using task parallelism CPU 1 (with the word processor on) is only 10% used, whilst CPU 2 (with the game on) is 100%+ used. Using data parallelism the game could be running on both CPUs, with CPU 1 working on half the game data and CPU 2 working on the other half thus resulting in a better use of the CPU resources. Data parallelism also scales much better than task parallelism as the problem size increases. |  |
Problem CommunicationUnfortunately, for us as programmers, during parallel computing often one process needs to communicate with another which adds a whole new level of complexity. For instance, during weather prediction if one processor is trying to decide if it will be warm or hot where you live it will need to take into account lots of different things - some of these might be on other processes and as such it will need communicate with them. We can categorize problems, depending on their communication. in a number of ways.
Embarrassingly Parallel - Here it requires very little or no effort to divide the problem up into parallel subtasks. There is no communication or dependency required between processes - quite rare in practice, but they do exist, an example is the two (data parallel) fence painters above and a bit more technical the Mandelbrot calculation.
Coarse Grained - Here processes do communicate but not all that frequently and are not particularly dependent on each other. Something known as a master-slave design pattern is a common example of this - the slaves do computation and communicate their results to the master process which "glues" them all together. |
 |
Fine Grained - Here processes are very tightly connected, with a high level of communication and dependency on each other. These problems are often difficult to parallelise and, due to the high cost of communication, sometimes not worth parallelising either.
In reality many problems sit somewhere between these classifications.
Communication ModelsIt is important that communication between processors is efficient, easy to program and scalable. There are two main models of communication, Shared memory and Message Passing, which are realized both in hardware and as an abstract programming model irrespective of the hardware.
Shared Memory - Here all the processors are connected to the same (shared) memory. They can all access any part of the memory and as such communicating with each other (and sharing results) is easy to do. In hardware the Symmetric Multi Processors (SMPs) architecture that modern dual core (and core duo) follow is shared memory. Programming model wise, programmers can abstract many machines (such as a cluster) into the shared memory architecture using a library such as Bulk Synchronous Parallelism (BSP.) Although shared memory is relatively simple to program, it is not efficient and really does not scale well. Hardware wise, the more processors you add, the further away from memory they are and this will be a real challenge for SMPs to overcome in future years. Because of the drawbacks, few serious HPC programmers use shared memory. The diagram below shows the typical architecture of a shared memory (SMP) system.
Message Passing - Here all processors are completely separate from each other, each has its own memory and they are connected only via a link (such as a network link in a cluster) and in order to communicate with each other they have to send messages. Typical messages include point to point (synchronous and asynchronous), broadcast and scattering data. Whilst this approach is highly efficient and very scalable there are a number of difficulties for the programmer. For instance, the programmer must ensure that there is no deadlock (where two processes are both waiting for the other one to do something before continuing) and must ensure that messages are received in the correct order and interpreted correctly. Although there is much extra burden placed onto the programmer, this model is by far the most popular with HPC programmers and has meant that up to this point parallel programming has been in the domain of the few experts. The Message Passing Interface (MPI) is one of the most popular message passing standards and is most commonly used in conjunction with C, C++ and Fortran. Even with SMPs which follow the shared memory architecture, one can still use the message passing paradigm for communication, which is still often more efficient than programming shared memory style.
Parallel Programming LanguagesWhilst there are quite a few parallel programming languages out there, up to this point much parallel programming (certainly over the large scale) has been done in C linked with a parallel communication library for efficiency. Due to the low level, sequential, nature of C this really is not ideal. Common libraries which can be used from sequential languages (such as C) include for message passing MPICH and Open MPI (both implement MPI) and for shared memory BSPLib, POSIX Threads and OpenMP.
The advantage of using a parallel language is that parallelism is "built in", often resulting in easier to write code. Some which you may be interested learning more about include Mesham (
http://www.mesham.com), High Performance Fortran (HPF), Co-Array Fortran (CAF), Unified Parallel C (UPC), Cilk, ZPL and NESL.... and many more, a google search should highlight quite a few which I have not listed.
Watch This Space
 | Up until the explosion of multi core chips we were all happy to let parallel programming lie within the domain of experts, and little was done to address this. However, now we realize that soon all programmers will need to start writing parallel code there has been much work done in making this easier and simpler to do. I myself am researching parallel programming languages and paradigms and generally this field has started to move very rapidly. In addition to multi core chips, people are also starting to use GPUs for general purpose computing - these often comprise of many cores and in the future we could be also using this resource to provide a huge amount of extra CPU power, the Nvidia's Tesla to the left is really just the start. For more information about GPGPU, have a look at CUDA. |
Images courtesy of Wikipedia
Article Source: http://www.TechnicalTalk.netYou can find more details here
*** Please do not forget to leave your comments about this article ***