Cole's parallel merge sort using N/log(N) processors

This parallel sort runs on EREW in O(log(N)^2) time using O(N/log(N)) processors and results in the optimum cost O(N log(N)). There is also a more complicated variant of Cole’s merge sort which is cost and time optimal (runs in O(log(N))).

Overview

This parallel merge sort works in the same manner as the sequential merge sort. It divides the array into halves, sorts them, and then merges the results. Parallelization into halves is easy as we can dedicate half of the processors to each and sort the halves independently. An implementation-wise easier approach may be to do a bottom-up evaluation, sorting pairs, then quadruplets, then each 8 consecutive elements, etc.

It is clear that such an algorithm will perform O(log(N)) division into halves. As the whole algorithm should have time complexity of O(log(N)^2) we have to ensure that the most computationally intensive task – merging – has complexity of O(log(N)).

Assumptions

There are several things we use to make the explanation easier. With some extra work one could prove the later statements without these assumptions.

input array size is a power or 2 – this increases size of the input by less than 2 and as our complexity shall be polynomial we only get an extra multiplicative constant (that hides into the O notation), let N denote this size and let n=log(N)
elements are unique – changing the numbers X into tuples (X,i) where i is its index makes all elements unique
hidden element parameters – when we mention that something is saved for or retrieved from each element we use notation akin to the retrieval of object’s parameter (from programming); this is a shortcut so that we do not end up with many nested array indexing (we wish to avoid things like P[X[M[i]+1]-1])

Let’s get to it.

Merging

We are given two sorted arrays A and B that have the same size which is a power of 2^n. The goal is to create an array C of size 2^(n+1) that contains elements of A and B in sorted order. The algorithm must have time complexity of O(log(N)).

To merge arrays A with B let us split them into even-indexed and odd-indexed elements – call them Ae, Be and Ao, Bo (for even / odd). First, we recursively merge Ae with Be (assume this works well for now, we’ll return to a more precise analysis later). After doing so, we have a sorted array E that contains elements of Ae and Be in the sorted order. The main part of the procedure now focuses on how to get the resulting array C that represents sorted array of elements from A and B.

Let us index the elements from 1, so A consists of elements indexed (A[1], A[2], A[3], …), and similarly for B, E, and C, but also Ae and Ao. To illustrate how to place elements of E to C we present the procedure on a single element:

Assume that E[7] is the element A[4]. As A[4] is the second even-indexed element of A the other 5 elements in E before E[7] are the first five even-indexed elements from B. Every even-indexed element has an odd-indexed element before it. The only element that has unknown relation to E[7] is the odd-indexed element from the other array B[11] (2*7-4+1 = 11). This is because B[10] is smaller than A[4] and B[12] is bigger than A[4], but A[4] was not compared to B[11] at all. So we compare B[11] with A[4] and if it is smaller, then we set C[15] = A[4], otherwise it is bigger so C[14] = A[4].

We can do the operation described above in parallel for all elements of E and place them into C. Note that elements of E must have attached information about their original array and their original index for us to be able to do this.

Now, to determine the final indices of odd-indexed elements of A and B in C we first notice that in C there may be at most two odd-indexed elements after each other. This is caused by C being a result of sequential merging in which we take the smallest elements of A or B and move them to C one-by-one. We give the responsibility of placing odd-indexed elements to the even element that stands just after them in C. Again, by example: an element E[7] that started at A[4] and was placed at C[15] is preceded zero, one, or two odd-indexed elements. First of them is potentially A[3], the second is from B and takes some work to figure out. We look at E[6] (the element which in E precedes our element E[7]) and whether E[6] belonged to B. If not, then there is no other element

test

Cole's parallel merge sort using N/log(N) processors

Overview

Assumptions

Merging

Analysis