Back to Blog List

The many faces of Mergesort

2024-12-27

Introduction

Mergesort is a fundamental algorithm taught in computer science courses, but is often forgotten in day-to-day work. It typically resurfaces when preparing for technical interviews, where knowledge of CS fundamentals is tested. In this blog, I will rebuild the intuition about Mergesort from first principles, discuss Monoids and their relationship to Mergesort, and show how to overload it to achieve more than just sorting.

The Algorithm From First Principles

Let us first revisit how the algorithm works. Consider a set of elements that can be totally ordered (i.e., are comparable to one another) under a defined $<=$

Mergesort sorts the elements of the set based on the comparison function $<=$ or $>=$. It operates in time: $ O(NlogN)$ where $N$ is the length of the set. It belongs to the "Divide and Conquer" set of algorithms and operates as follows:

  1. Split the set to two relatively equal lengths subsets: $\lfloor N/2 \rfloor$ and $N - \lfloor N/2 \rfloor$.
  2. Sort recursively each of the subsets.
  3. Merge the two sorted subsets: We hold two pointers one to each subset, pointing to the first element of each subset. We continuously compare the values pointed to by the two pointers and copy the smaller one into the new set. Then we move the pointer of that value to the next element. We do this until both pointers point beyond the end of their corresponding subsets.If one of the pointers reaches the end of the corresponding subset before the other, we simply copy the elements of the other subset as is into the new set.

We will now prove the correctness of this method using strong induction on $N$ - the size of the set:

Proof by Induction

  1. $P(0)$ - The empty set $\{\}$ is trivially sorted.
  2. $P(1)$ - the set $\{a\}$ of one element is also trivially sorted.
  3. $P(k <= N)$ Hypothesis: Let's assume that sets of lengths $k <= N$can be sorted correctly using Mergesort.
  4. Let's prove $P(N + 1)$ Mergesort sorts correctly a set of size $N + 1$
    • Split the set into two subsets of size $\lfloor (N + 1)/2 \rfloor$ and $ N + 1 - \lfloor (N + 1)/2 \rfloor $
    • Based on our hypothesis (3) we can sort them using Mergesort.
    • Now that we have two sorted sets, we can merge them into a new set based on the algorithm above to get a sorted set of size $N+1$

With the theoretical foundation in place, let’s see how to implement mergesort in C++:

C++ Implementation

#include <vector>
#include <functional>

template <typename ForwardIt>
using Iter_Value = typename std::iterator_traits<ForwardIt>::value_type;

template <std::forward_iterator ForwardIt, typename Compare = std::less<Iter_Value<ForwardIt>>>
requires std::totally_ordered<Iter_Value<ForwardIt>> &&  std::predicate<Compare, Iter_Value<ForwardIt>, Iter_Value<ForwardIt>>
void MergeSort(ForwardIt begin, ForwardIt end, Compare comp = Compare()) {

    std::function<void(ForwardIt,ForwardIt,ForwardIt,Compare)> Merge = [](ForwardIt begin, ForwardIt mid, ForwardIt end, Compare comp){
  
        std::vector<Iter_Value<ForwardIt>> temp;
        auto b = begin;
        auto m = mid;
        for (; b != mid && m != end;) {
            if (comp(*b,*m)) {
                temp.push_back(*b++);
            }else {
                temp.push_back(*m++);
            }
        }

        while (b != mid) {
            temp.push_back(*b++);
        }

        while (m != end) {
            temp.push_back(*m++);
        }

        std::move(temp.begin(), temp.end(), begin);
    };

    if (begin == end || distance(begin,end) < 2) {
        return;
    }

    auto mid = std::next(begin ,distance(begin,end)/2);
    MergeSort(begin, mid, comp);
    MergeSort(mid, end, comp);
    Merge(begin, mid, end, comp);
}

It is important to note that the MergeSort function is a generic function that can be used to sort any set of elements that can be totally ordered. The function does not depend on any specific data structure, and only requires that an iterator to the elements is provided. In addition the comparison function can be any predicate that satisfies the requirements of the comparison function.

Monoids and Sorted Sets

The fact that Mergesort splits the set into two subsets and then merges them back together, comes with a nice perk, that is a property of the sorted subsets. To understand this, we need to understand the concept of a Monoid.

A Monoid is an algebraic structure with a set of elements and a binary operation that satisfies the following properties:

  1. Closure: For all $a, b$ in the set, $a * b$ is also in the set.
  2. Associativity: $(a * b) * c = a * (b * c)$ for all $a, b, c$ in the set.
  3. Identity: There exists an element $e$ in the set such that $a * e = e * a = a$ for all $a$ in the set.

We can show that sorted sets of elements that can be totally ordered, are a Monoid under the operation of merging, as defined above. First, let’s define a sorted set:

$B = \{a_1, a_2, ... a_n...\}$ $ \forall a_i \in B \text{ s.t. } a_i \leq a_{i+1}$

Let $S = \{B_1, B_2, ..., B_n\}$ where each $B_i$ is a sorted set as defined above.

We will now show that $S$ forms a monoid under the definitions above with $*$ being the merge operations as defined previously.

Closure: for any $Bi$ and $Bj$ : $Bi * Bj$ is also a sorted set based on the definition of the merge operations and hence belongs to $S$

Associativity: for $Bi$ $Bj$ and $Bk$ we need to show that: $(Bi*Bj)*Bk = Bi*(Bj*Bk)$
A formal proof of set equality would require demonstrating mutual containment. However, it is evident that the order of operations does not affect the outcome, as the merge operation reorders elements based on the defined total ordering for example:
$\{1, 3\}, \{2, 4\}, \{5, 6\}$

$( \{1, 3\} * \{2, 4\}) * \{5, 6\} = \{1, 2, 3, 4\} * \{5, 6\} = \{1, 2, 3, 4, 5, 6\}$

$\{1, 3\} * ( \{2, 4\} * \{5, 6\} ) = \{1, 3\} * \{2, 4, 5, 6\} = \{1, 2, 3, 4, 5, 6\}$

Identity: we will use the $\{\}$ - empty sorted set as the identity element. Trivially applying merge on any $Bi$ - sorted set with $\{\}$ will yield back $Bi$.


We may notice that the $S$ also has a commutative property where $Bi * Bj = Bj * Bi$ While this specific property is not a requirement from a Monoid, it is still worth mentioning.


Why is this important?

It turns out that Monoids, while being mathematical structures, possess properties that are highly relevant to programming. The most notable one is that types/sets that are Monoidal lend themselves perfectly to concurrency and parallalization. These properties guarantee that computations can be distributed, combined, and re-ordered without affecting the final result.

  • The associativity allows tasks to be split into independent subtasks which can be processed in parallel and can be combined later without the need to worry about the order of combination.
  • The identity element allows tasks to start with a default/empty state and guaranteeing that the result is unaffected by it.
  • The closure property guarantees that we get consistent results after combining the subresults.

One of the famous usages that take advantage of this is K-Way merge which relies on the Monoidal structure of sorted sets. Specifically, imagine you have many files that are too big to fit in to main memory, each containing sorted elements. Our goal is to combine them all into a single file with all the sorted elements from all of them. Since the files can't fit into memory you can split the work by sending groups of files ($K$) to be merged on different machines. After the merge you get back a smaller set (potentially 2) of files and you merge them by reading blocks of fix sized from the files into memory, merging them, and writing them back to the resulting file.

Applications of Parallel Mergesort (an extension of the above)

Big Data Processing:
• Parallel mergesort is used in distributed systems (e.g., Hadoop, Spark) for sorting massive datasets.
• Large data chunks are sorted on different nodes and merged into a global order.

Real-Time Systems:
• Parallelism enables sorting of real-time sensor data or logs without significant latency.

Takeaways:

By leveraging monoidal properties (associativity, identity, and closure), we can perform efficient sorting on modern multi-core or distributed systems. Whether it’s processing massive datasets, merging sorted logs, or optimizing real-time data streams, parallelism and mergesort go hand in hand, showcasing the elegance of merging as a monoidal operation.
Mergesort is a powerful algorithm that can be used to solve a variety of problems beyond just sorting.

Other Interesting Applications of Mergesort

There are some cases where we receive an unsorted set, and are asked to find elements in it based on some property. Moreover, the property requires that the elements are processed based on their relative original location. If the elements were sorted, finding the elements that correspond to this property would have been easy. However, because the elements are not sorted as well as the fact that it is required to apply the property on their relative and original location, it becomes much more difficult. The good news is that we can use Mergesort in many of these cases to achieve an optimal solution. We will show some examples.

Finding a "bad" Stock

Imagine that you receive a list of historical prices of stocks. You are tasked at assessing which stocks are "bad". In this context a stock is "bad" its prices dropped over time by half (or more) from a previos price it had, more than $K$ times.

To simplify the task, let's focus on one particular stock. You receive a list of historical prices. The list is sorted in chronological order (e.g. the first element is the price of the stock at day 0, the second is the price at day 1 etc..). You want to count the number of times that the price dropped by over a half from some previous price it had.

for example, if the stock prices are:

$[80, 90, 40, 100, 50, 45, 300, 150, 70]$

The number of times the stock dropped by over a half are: $(80,40), (90,40), (90,45), (100,50), (100,45), (300,150), (300,70), (150,70)$ hence, we should return 8.

The naive solution to this would be to run two iterations over the list. The first runs over the entire list, the other runs over the list start from the next element after the first iterator. We then count for each element how many times we encounter elements that are half or more than it. The run time for such a naive algorithm would be $O(N^2)$.

We will now show how we can improve the run time and reduce it to $O(NlogN)$ by leveraging mergesort. At first glance, it may be unclear how sorting aids in solving this problem. Additionally, sorting could appear to compromise the original relative positions of the prices, which are integral to this task.

To understand how sorting can help solve this problem, let’s build the intuition using strong induction over the size $N$ of the list:

Proof by Induction

  1. $P(0)$ a list of size 0, we return 0
  2. $P(1)$ a list of size 1, we return 0.
  3. $P(2)$ a list $\{a1,a2\}$ of size 2. If $a1 >= 2*a2$ we return 1, otherwise we return 0
  4. $P(K <= N)$ Let's assume that we can find the count for sets of size $K$ that are less than or equal to $N$. In addition we will add that they need be sorted in ascending order.
  5. We need to show how to solve this for $P(N + 1)$ a list of size $N+1$:
    • Just like with merge sort we will split it into two subsets of size:
      $\lfloor(N + 1/2)\rfloor$ and $ N + 1 - (\lfloor N + 1/2 \rfloor)$ corresponding the first (left) half of the list and the second (right) half of the list.
    • based on our hypothesis (4) we can get the count for each one of these. Let's denote them by $\text{LeftCount}$ and $ \text{RightCount}$ and we also know based on (4) that they will each be sorted in ascending order.
    • At this point we have two sorted lists of prices: $\text{left-list} = {l1,l2,... ln}$ and $\text{right-list} = {r1,r2,r3,r4... rn}$. While they are now sorted and hence the original locations of the prices within each list has shifted, elements in right-list are still to the right of element in left-list. We can take advantage of the fact that they are sorted. Assume we traverse left-list and we are now at $l2$. Assume that we find that $r4$ is the first element in the list such that $l2 <$ $r4 * 2$ this tells us that elements $r1, r2, r3$ which are smaller than $r4$ must correspond to our requirement. Hence for $l2$ we can say that its "bad" count is 3. Furthermore, for any element after $l2$ those 3 are also "bad" counts for them as well. The reason is that if $l2 >=$ $2*{r1,r2,r3}$ then for any element $li >= l2$ it stands that:
      $li >= l2 >= 2*{r1,r2,r3}$ and again we are taking a dependency that left-list is also sorted in ascending order. In other words, for each $li$ in $\text{left-list}$ we need to find the first $rj$ in $\text{right-list}$ that does not satisfy our property and count the number of elements that proceed it. From that point on we needed look at those elements anymore and we only need to care about elements greater than $rj$. Finally, we need to apply our merge algorithm to combine the two lists into one sorted list in ascending order to keep the invariant of the hypothesis. The runtime of this final step is similar to that of merge in merge sort.

By applying mergesort we are able to find the "bad" counts in time proportional to $O(NLogN)$ which is a substantial improvement if we are talking about very large lists.

C++ Implementation

long long FindStockBadCount(std::vector<int> &nums, size_t start, size_t end){
        if(start == end || start + 1 == end){
            return 0;
        }

        size_t med = start + (end - start)/2;
        long long left = FindStockBadCount(nums,start, med);
        long long right = FindStockBadCount(nums,med,end);

        long long  merged = 0;
        int s = start;
        int m = med;

        // For each nums[s], find the smallest nums[m] in the right half such that nums[s] < 2 * nums[m]
        for(;s < med; s++){
            while(m < end && nums[s] >= 2LL*(nums[m])){
                m++;
            }

            merged += m - med;
        }

        std::inplace_merge(nums.begin() + start, nums.begin() + med, nums.begin() + end);
        return merged + left + right;
    }

Conclusion

Mergesort is more than just a fundamental algorithm taught in computer science courses: it's a versatile tool with applications far beyond basic sorting. By revisiting its principles and exploring its connections to monoids, we’ve seen how its divide-and-conquer approach, combined with properties like associativity, closure, and identity, make it well suited for concurrency and parallelism. Whether used for sorting massive datasets in distributed systems, processing real-time analytics, or solving complex problems like the “bad stock” example, mergesort demonstrates the power of understanding an algorithm’s foundational structure. It’s a reminder that algorithms we often take for granted can unlock powerful solutions when revisited with a deeper perspective.

Share this post