Lower Bounds on Sorting

Lower Bounds can also be important within Computer Science. We can use different comparison models to show that with certain algorithms, we can get linear time sometimes with a sort.

Comparison Model:
  • All input items are black-boxes (Abstract Data Types).
  • The only operations allowed are comparisons.

The only way it can really manipulate items is to compare these items against each other.

  • The time cost of an algorithm is equal to the number of comparisons.

From a lower-bound point of view, we can show that we need O(logn) time to search, and O(n) time to sort.

Decision Tree:

Any comparison algorithm can be viewed as a tree of all possible comparisons and their outcomes, and the resulting answer.

This is much clearer with an example:

Binary Search

For n=3, we have our array A = [(0), (1), (2)]

We can envision a decision tree like so:

if a[1] < x:  
    if a[2] < x:
        "a[2] < x"
        "a[1] < x <= a[2]"
    if a[0] < x:
        "a[0] < x <= a[1]"
        "x <= a[0]"

We generally don't want to represent an algorithm this way since the tree increases exponentially. However, it's useful for analysing an algorithm.

When we have an "internal node" in the decision tree, it corresponds to a binary decision in the algorithm (in this model, we only care about comparisons as binary decisions).

A "leaf" in the decision tree represents a successful termination of the algorithm - we've found the solution.

A downward root-to-leaf path represents a single execution of the algorithm. The time taken by the algorithm is the length of that path.

The worst-case running time is the height of the root node, aka the height of the tree.

This is why decision trees are interesting: if we can figure out the height of the decision tree, we can easily translate that into the worst-case running time of the algorithm.

Lower Bounds

Under the comparison model, we can find a universal lower bound on any sorting algorithm, using the idea of decision trees outlined above.

  • Decision trees are binary
  • The number of leaves must be at least the number of possible answers.

Let n be the number of items being sorted.

There are n! permutations of items we should consider in a sorting algorithm, so there are n! possible "answers" we need to worry about. These answers may in fact be duplicated, so there could be more than n!, but n! is the absolute lower bound.

If a tree is binary, then the number of items is at least the number of leaves (it's normally more like double, but this is a convenient lower bound to work with).

The height of a tree is log(l) where l is the number of leaves. Therefore, the height of this tree is at least log(n!).

We can either use Sterling's approximation, or write it out as a sum as follows:

$$ \begin{aligned} height &\ge \log{n!}\\
&= \log{(n \times (n-1) \times (n-2) ... 2 \times 1)}\\ &= \log{n} + \log{(n-1)} + ... + \log{2} + \log{1}\\ &= \sum\limits_{i=1}^n \log{i} \end{aligned} $$

To proceed, we should recall that logn looks like this:

Image of a log graph

And we're trying to find what's essentially the area under the graph - although, since this is discrete maths, we won't be using integrals. We want to find the sum of logi across all the is up to n.

Looking at the second 'half' of logi (remember, as n grows, we'll start to care less and less about the sharp increase at the start), we can see that they more or less approach logn.

We can make our lives easier by discarding the first half, like so:

$$ \begin{aligned} height &\ge \sum\limits_{i=\frac{n}{2}}^n \log{i}
\end{aligned} $$

We can go even further - because n/2 is the lowest value in the sum, we can be sure that the following is also true:

$$ \begin{aligned} height &\ge \sum\limits_{i=\frac{n}{2}}^n \log{n \over 2}
\end{aligned} $$

After this, it's just simple logarithm tricks and factoring:

$$ \begin{aligned} height &\ge \sum\limits_{i=\frac{n}{2}}^n \log{n} - 1\\
&= \frac{n}{2}\log{n} - \frac{n}{2}\\ &= \frac{1}{2}(n\log{n} - n)\\ &= \Omega(n\log{n})\\ \end{aligned} $$

Behold: a universal lower bound on sorting in the comparison model, found by analysing the minimum height of any decision tree within this model to decide the order into which items should be sorted.