Block Merge Sort



Block Merge Sort (a.k.a. Block Sort) is a family of sorting algorithms built around a stable, worst-case $$O(n \log n)$$ time sort that uses $$O(1)$$ extra space. Given the time and space bounds, Block Merge Sort is less trivial than most other algorithms, such as Heapsort (which is in-place but not stable), Mergesort (which is stable but not in-place), or Rotate Merge Sort (which is stable and in-place but $$O(n \log^2 n)$$). There also exist Block Merge variants that allocate some extra space, such as $$O(\sqrt{n })$$ or $$O(\log n)$$ space.

In brief, Block Merge Sort divides the array into smaller, evenly sized sections called "blocks." It then merges the array one block at a time using specialized techniques, such as a blocked Selection Sort to order the blocks and a modified Insertion Sort to find distinct elements to use as merge space within the array itself. For convenience, and for the sake of $$O(1)$$ space implementations, the best choice of block size is $$O(\sqrt{n })$$, though some variants use different block sizes such as $$O(\sqrt[3]{n })$$ or $$O(\log n)$$.

Although Block Merge boasts an in-place, stable, $$O(n)$$ time merge, it is not the fastest in-place merge in practice: it is beaten by Rotate Merge, which is an asymptotically suboptimal $$O(n \log n)$$ merge. This is possibly due to the complicated nature of Block Merge as well as its high overhead and poorer cache utilization.

Algorithm
Block Merge sorts are a class of sorting algorithms that share these qualities: they are stable, (sometimes) in-place, and $$O(n \log n)$$. They always take the form of a bottom-up merge sort, and they make extensive use of $$O(\sqrt{n})$$ "blocks" and buffers of unique elements. Their merge usually consists of two stages:

- Merging, where a unique buffer is used to merge individual $$O(\sqrt{n})$$ blocks.
- Block Selection, where the blocks are sorted with a variant of Selection Sort that uses each block's first/last element as its key. Because Selection Sort is not stable, another unique buffer either "tags" the blocks or imitates their movement to force stability. Cycle Sort can also be used.

Some Block Merge sorts use $$O(\sqrt{n})$$ external buffers, making them not in-place, though they then do not have to ensure that elements are unique, as the external buffers can be overwritten at will. Block Selection and Merging can be done in any order: Selection before Merging (Grail), Selection at the same time as Merging (Wiki), or Selection after Merging (Kota).

Taking advantage of distinct values
Since Block Merge Sort is designed to allocate minimal space, it cannot take advantage of extra space to merge subarrays the way Mergesort does. To make up for this, Block Merge uses a small collection of elements within the main array itself as scratch space, called a merge buffer. Instead of copying elements to an external array, it swaps elements into the merge buffer to merge blocks that fit in the buffer, so a buffered merge has the same $$O(n)$$ complexity as a regular out-of-place merge. However, elements used in the merge buffer lose their original order, so equal-valued elements cannot be used: there would be no way to recover their original order without losing stability.
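The swap-based buffered merge can be sketched as follows. This is a minimal illustration rather than any particular variant's code: `buf` is assumed to point at a region of distinct elements lying outside the two runs being merged.

```python
def buffered_merge(a, buf, start, mid, end):
    """Merge the sorted runs a[start:mid] and a[mid:end] stably,
    using the distinct elements in a[buf : buf + (mid - start)] as
    scratch space.  Elements are only swapped, never copied, so the
    buffer's contents survive (in scrambled order) for later reuse."""
    left_len = mid - start
    # Swap the left run into the buffer.
    for i in range(left_len):
        a[buf + i], a[start + i] = a[start + i], a[buf + i]
    i, j, k = buf, mid, start
    left_end = buf + left_len
    while i < left_end and j < end:
        if a[i] <= a[j]:              # <= keeps the merge stable
            a[k], a[i] = a[i], a[k]
            i += 1
        else:
            a[k], a[j] = a[j], a[k]
            j += 1
        k += 1
    # Swap any leftover left-run elements out of the buffer.
    while i < left_end:
        a[k], a[i] = a[i], a[k]
        i += 1
        k += 1
```

Note that the writer index `k` never catches up to the reader index `j`, so no unmerged element is overwritten; when the left run is exhausted, the remaining right-run elements are already in place.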

Since Selection Sort is unstable, performing Selection Sort on the blocks won't guarantee that the order of the subarray is kept, so Block Merge Sort also uses distinct elements to keep track of the order of the blocks during this phase. Because these elements are distinct, comparing them recovers the blocks' exact original order. This buffer is called the tag buffer or key buffer.
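A simplified sketch of a tag-buffered block selection might look like the following. The names and layout are illustrative assumptions: this version keys each block on its first element only, whereas real variants use first/last elements and handle partial blocks.

```python
def block_select(a, keys, first, block_len, count):
    """Selection Sort on whole blocks.  Sorts `count` blocks of
    length `block_len` starting at a[first] by each block's first
    element.  keys[0:count] is a strictly increasing tag buffer that
    is permuted alongside the blocks; comparing tags breaks ties so
    equal-keyed blocks keep their original relative order."""
    for i in range(count):
        m = i
        for j in range(i + 1, count):
            kj = a[first + j * block_len]
            km = a[first + m * block_len]
            # Break ties with the tags to keep the selection stable.
            if kj < km or (kj == km and keys[j] < keys[m]):
                m = j
        if m != i:
            # Swap the two blocks element by element, and their tags.
            for t in range(block_len):
                p = first + i * block_len + t
                q = first + m * block_len + t
                a[p], a[q] = a[q], a[p]
            keys[i], keys[m] = keys[m], keys[i]
```

After the selection, the permuted tags record where each block came from, which the merge stage can use to restore stability.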

Every Block Merge Sort variant uses both a merge buffer and tag buffer of some kind.

Key Collection
A key collection algorithm is required for any $$O(1)$$ space implementation of Block Merge. Since Block Merge cannot allocate extra space under these constraints, it instead searches the original array and collects distinct elements to use as "space." Because Block Merge Sort is stable, its key collection algorithm must collect keys stably as well. It makes use of rotations to swap ranges of elements and is generally used to find $$O(\sqrt{n })$$ distinct keys, due to the expensive cost of a rotation per key. Other methods of key collection can stably collect more keys (e.g. $$O(n^{2/3 })$$ or $$O(n \log^{-3 } n)$$) in the same runtime, but they are much more complicated and likely of exclusively theoretical interest.

The two methods described below are the common methods for key collection and are used to find $$O(\sqrt{n })$$ keys so that they yield an $$O(n)$$ runtime. Each method affects how its respective Block Merge Sort operates and has its own strengths and weaknesses.

Key collection from a sorted run
The trivial method is to scan adjacent pairs of elements in a sorted run. If the second element of a pair is greater than the first, the algorithm knows that the greater element is a new distinct key. It then rotates its current collection of found keys forward to the position of the newly found key, appending it to the collection. Once it has found enough keys or has fully searched the run, it rotates the keys backwards to the start of the run.

Alternatively, the algorithm can scan the run and count the number of distinct elements before rotating any keys. Once it has counted enough distinct keys, it rotates the keys backwards starting from the position of the last key. This removes the need to rotate the keys both forwards and backwards, but doubles the number of comparisons: without extra space to record the positions of the counted keys, the algorithm must re-compare to find them.

This method of key collection is most well known as being used in Wikisort.

Advantage
Since this method operates on a sorted section of the array, it can be optimized to use fewer comparisons by replacing the linear scan with a binary search. This reduces the worst-case comparisons from $$O(m)$$ to $$O(k \log m)$$, where $$m$$ is the size of the run searched and $$k$$ is the number of keys, which is at most $$O(\sqrt{m })$$. For added adaptivity, the binary search can be replaced with a jump search or exponential search, which brings the best case to $$O(k)$$. This type of key collection also allows for parallelization.
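The counting side of this optimization can be illustrated with a hypothetical helper that uses a binary search to skip every duplicate of a value at once, giving the $$O(k \log m)$$ comparison bound.

```python
from bisect import bisect_right

def count_keys_binary(a, start, end, want):
    """Count up to `want` distinct values in the sorted run
    a[start:end] using O(k log m) comparisons: from each distinct
    value, binary-search past all of its duplicates to the next
    greater value."""
    found = 0
    i = start
    while i < end and found < want:
        found += 1
        # One binary search skips every element equal to a[i].
        i = bisect_right(a, a[i], i, end)
    return found
```

Replacing `bisect_right` with an exponential (galloping) search from `i` would make the scan adaptive, costing only $$O(k)$$ total when duplicates are rare.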

Disadvantage
Performing key collection before every merge step can add considerable overhead, since a key collection costs $$O(m)$$ moves in the worst case. This is mitigated in implementations like Wikisort, which collects keys from one run and reuses those same keys to merge every run instead of re-collecting. However, this approach requires extra logic to decide where to search for keys and to track the buffer's position, since keys cannot be collected from areas with too many equal values. It also requires the keys to be redistributed after every level of merging.

Since the number of keys found is $$O(\sqrt{m })$$, the blocks are also limited to size $$O(\sqrt{m })$$, so block selection can add an extra $$O(\sqrt{m }^2) = O(m)$$ comparisons of overhead per merge.

Analysis
In the worst case, key collection has to rotate sections of the data it collects from in order to append new keys. However, once a section is rotated, its elements are ignored for the rest of the key collection, so each element of the array changes position a constant number of times, giving $$O(n)$$ moves. The entire collection of keys is rotated once per key, adding $$k \cdot k = O(k^2)$$ moves. Since $$k$$ is at most $$O(\sqrt{n })$$, the key collection takes $$O(k^2 + n) = O(\sqrt{n }^2 + n) = O(n)$$ moves in the worst case. Depending on the implementation, the key collection makes $$O(n)$$, $$O(k \log n)$$, or $$O(n \log n)$$ comparisons in the worst case.

Wikisort
Main Page: Wikisort

Grailsort
Main Page: Grailsort

Kotasort
Main Page: Kotasort