Parallel Addition (45 mins)

The processors for the parallel machine are reasonably simple. They can accept a pair of inputs, do a simple operation, and then return a single answer. A given processor can be though of as waiting for all its inputs to arrive, and only then producing an answer. You can picture a simple processor (which multiplies two numbers X = A * B) as:

A --> * <-- B | | OutThe answer can be put into an array, or passed to another processor directly. The answer can also be sent to several locations at once. Inputs can come from an array, or from another processor. For example, we can draw the machine that multiplies four numbers together :

A B X Y | | | | \- * -/ \- * -/ | | \---- * ----/ | OutThis has only taken two time steps, since x*y and a*b have been calculated simultaneously. Doing this on a normal machine would take three time steps.

You should assume, for our model, that a processor can only be used once. So despite the fact that both the processors used in step 1 are free during step 2, a third processor is needed. (Think of it as a hard-wired electronic circuit.)

B[1] = A[1] B[2] = A[1] + A[2] B[3] = A[1] + A[2] + A[3] ... B[n] = A[1] + .... + A[n]For example, if we have

A = [1 2 3 4 5]then we require

B = [1 3 6 10 15].

There is the obvious sequential algorithm:

total := 0 counter := 1 while (counter < n+1) do begin total := total + A[counter]; B[counter] := total; counter := counter + 1; end;

For all the following questions clear diagrams, such as above, will be sufficient to illustrate any methods. Any lines coming out of processors should be labelled with the values you expect to have on them. Furthermore, when you are asked how many processors or time steps are required you do not have to be perfectly spot on (e.g. saying 'n' when the answer is 'n-1' is okay).

In case you have forgotten (and in case you find it useful):

1/2 + 1/4 + 1/8 + 1/16 + ... = 1If

1 + 2 + 4 + ... + 2^k = m then k = log2 (m+1)/2where log2 is the logarithm to base 2.

a) Suppose we only want to calculate B[n], how many time steps can we do it in, and how many processors do we require? You may find it easier to just consider the case when n is a power of 2.

b) One way of solving the problem is to calculate each element of B like this. Using the method in part a) for each element of B, and assuming the processors calculating each element cannot exchange data with the processors calculating another element, how many time steps do we require to calculate all of B, and how many processors?

We do need a little trick to produce the answer, since if we split

[1 2 3 4]into [1 2] and [3 4], solving each half produces [1 3] and [3 7] respectively. The required answer though is [1 3 6 10] and not [1 3 3 7].

a) What is the trick? (If you illustrate this, try to consider the solving of each half as taking place in a "black box" - i.e. a device whose inside you do not need to explain, that takes n/2 elements and produces the correct n/2 prefix additions.)

In the divide and conquer approach these black boxes are also solved by using the same method. So if we solve A[1]..A[16] by first solving A[1]..A[8] and A[9]..A[16] and then applying the trick, we would solve A[1]..A[8] by solving A[1]..A[4], A[5]..A[8] and applying the trick etc... We can, of course, solve a single element instantly (the prefix addition of [i] is [i].)

b) Using this divide and conquer approach how many time steps do we need, and how many processors?

- Pair off the elements and sum them.
- Feed these n/2 sums into a "black box" which produces the expected n/2 prefix additions.
- Do some "trick" to get the required results from the information so far.

The black box is, of course, a miniature version of this algorithm again (and not one from any previous question.)

a) What is the "trick" used for part 3 of this method? b) How many time steps, and how many processors do we need?

Solution to Prefix Addition

Antony Rix

(see contact details from home page)