Grailsort

Grailsort is a sorting algorithm in the Block Merge Sort family. It runs in $$O(n \log n)$$ time, uses $$O(1)$$ extra space, and is stable but not adaptive.

Key collection


The algorithm starts by collecting $$2\sqrt{n}$$ unique keys, using a binary search over the keys collected so far to check whether an element is already among them. When it finds an element that is not yet among the keys, it rotates the keys (using the Gries-Mills rotation) and puts the element in its sorted place. Once enough keys have been found, they are moved to the beginning of the list, and the merging process begins.

The array now looks like this:

[keys | unsorted data]
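
The key collection step can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the names `collect_keys`, `rotate`, and `swap_blocks` are hypothetical, and the way the key group is carried along the array is an assumption based on the description above.

```python
from bisect import bisect_left

def swap_blocks(a, i, j, k):
    """Swap a[i:i+k] with a[j:j+k] element by element."""
    for t in range(k):
        a[i + t], a[j + t] = a[j + t], a[i + t]

def rotate(a, lo, mid, hi):
    """Gries-Mills rotation: a[lo:mid] + a[mid:hi] becomes a[mid:hi] + a[lo:mid]."""
    left, right = mid - lo, hi - mid
    while left > 0 and right > 0:
        if left <= right:
            # Swap the left part with the start of the right part; it is then in place.
            swap_blocks(a, lo, lo + left, left)
            lo += left
            right -= left
        else:
            # Swap the tail of the left part with the right part; the tail is then in place.
            swap_blocks(a, lo + left - right, lo + left, right)
            left -= right

def collect_keys(a, wanted):
    """Gather up to `wanted` distinct values, sorted, at the front of `a`.

    Returns the number of keys actually found.
    """
    if not a:
        return 0
    start, found = 0, 1                  # the key group is a[start:start+found]
    for i in range(1, len(a)):
        if found == wanted:
            break
        # Binary search for a[i] among the keys collected so far.
        pos = bisect_left(a, a[i], start, start + found)
        if pos == start + found or a[pos] != a[i]:
            # Slide the key group right so that it ends just before a[i].
            rotate(a, start, start + found, i)
            shift = i - (start + found)
            start += shift
            pos += shift
            # Insert a[i] at its sorted position inside the key group.
            rotate(a, pos, i, i + 1)
            found += 1
    # Move the key group to the beginning of the list.
    rotate(a, 0, start, start + found)
    return found
```

For example, `collect_keys` on `[3, 1, 3, 2, 1, 4, 2]` with `wanted = 3` returns 3 and leaves the list as `[1, 2, 3, 3, 1, 4, 2]`.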

Lazy Stable Sort
If fewer than four unique keys have been collected, Lazy Stable Sort is used instead. The algorithm then terminates.
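
The text does not spell out how Lazy Stable Sort works. One common bufferless approach, assumed here purely for illustration, is a bottom-up merge sort whose merge step uses binary search and rotations instead of a buffer. The names `lazy_stable_sort`, `merge_without_buffer`, and `rotate` are hypothetical.

```python
from bisect import bisect_left

def rotate(a, lo, mid, hi):
    """Rotate a[lo:hi] in place so that a[mid:hi] comes first (three reversals)."""
    def reverse(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    reverse(lo, mid)
    reverse(mid, hi)
    reverse(lo, hi)

def merge_without_buffer(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] using rotations only."""
    while lo < mid and mid < hi:
        # Skip left elements that are already in place ('<=' keeps ties stable).
        while lo < mid and a[lo] <= a[mid]:
            lo += 1
        if lo == mid:
            break
        # Right elements strictly smaller than a[lo] are rotated in front of it.
        cut = bisect_left(a, a[lo], mid, hi)
        rotate(a, lo, mid, cut)
        lo += cut - mid
        mid = cut

def lazy_stable_sort(a):
    """Bottom-up merge sort built on merge_without_buffer."""
    width = 1
    while width < len(a):
        for lo in range(0, len(a) - width, 2 * width):
            merge_without_buffer(a, lo, lo + width, min(lo + 2 * width, len(a)))
        width *= 2
```

Rotation-based merging is slow when there are many distinct values, but with only a handful of distinct keys each merge needs few rotations, which is why such a fallback is reasonable in this case.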

Merging

 * Note: To swap with the buffer means to swap with the leftmost element of the buffer; that position changes each time a swap is made.

We use most of the keys as a merge buffer. First, 2 keys are used as a buffer to sort each pair of unsorted elements, from left to right. The elements of a pair are compared, then the minimum is swapped with the buffer, and then the maximum. For instance:

... B B 4 2 3 5 ...

where B stands for a buffer element. 2 is the smaller element of the first pair, so it is swapped with the buffer first, followed by 4:

... 2 B 4 B 3 5 ...
... 2 4 B B 3 5 ...

This is repeated until the end of the list is reached.

... 2 4 3 B B 5 ...
... 2 4 3 5 B B ...

When the element at the end does not have a pair, just swap it with the buffer. The array now looks like this:

[keys | buffer | sorted pairs of data | buffer]
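
The pairwise step above can be sketched as follows. This is a minimal sketch assuming the two buffer elements sit directly to the left of the data; the function name `sort_pairs` and its parameters are hypothetical.

```python
def sort_pairs(a, buf, end):
    """Sort adjacent pairs of a[buf+2:end], shifting the data 2 slots to the left.

    a[buf:buf+2] holds the two buffer elements before the call;
    afterwards they sit in a[end-2:end]. Returns the new buffer start.
    """
    i = buf + 2                               # first element of the current pair
    while i + 1 < end:
        # Find the smaller element; on ties the left one counts as the minimum (stable).
        lo, hi = (i + 1, i) if a[i + 1] < a[i] else (i, i + 1)
        a[buf], a[lo] = a[lo], a[buf]         # minimum goes to the leftmost buffer slot
        buf += 1
        a[buf], a[hi] = a[hi], a[buf]         # then the maximum
        buf += 1
        i += 2
    if i < end:                               # last element without a pair
        a[buf], a[i] = a[i], a[buf]
        buf += 1
    return buf
```

Running it on `['B', 'B', 4, 2, 3, 5]` with `buf = 0` and `end = 6` passes through the same states as the example above and ends with `[2, 4, 3, 5, 'B', 'B']`. Only data elements are ever compared; buffer elements are merely swapped.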

In subsequent merges, a different procedure is used. It is similar to Merge Sort's merging process, using the buffer to merge in place. Take another $$n$$ keys into the buffer, starting with $$n=2$$ (not 4, since 2 + 4 is not a power of two). Compare the leftmost elements of the two sorted runs and swap the smaller one with the buffer, move on to the next element of that run, and compare again, until one of the runs is exhausted. Then swap the remaining element(s) of the other run with the buffer.

... B B 2 4|3 5 ...   2 < 3
... 2 B B 4|3 5 ...   4 > 3
... 2 3 B 4|B 5 ...   4 < 5
... 2 3 4 B|B 5 ...   swap remaining with buffer
... 2 3 4 5 B B ...

Repeat this until the buffer on the right is reached. This process is called merge left. Do this again with $$n=4, 8, \ldots$$ The array now looks like this:

[keys | sorted blocks of data | buffer]
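
A single merge left step can be sketched like this. It is a minimal sketch assuming the buffer lies directly to the left of the two runs and is at least as long as the right run; `merge_left` and its parameters are hypothetical names.

```python
def merge_left(a, buf, left, mid, end):
    """Merge the sorted runs a[left:mid] and a[mid:end] into the space starting at buf.

    a[buf:left] is the buffer and must be at least as long as the right run.
    Afterwards the merged run starts at `buf` and the buffer elements sit
    at the end of the range, directly before `end`.
    """
    out, i, j = buf, left, mid
    while i < mid and j < end:
        if a[j] < a[i]:                       # strict '<' keeps equal elements stable
            a[out], a[j] = a[j], a[out]
            j += 1
        else:
            a[out], a[i] = a[i], a[out]
            i += 1
        out += 1
    # One run is exhausted: swap what is left of the other run with the buffer.
    while i < mid:
        a[out], a[i] = a[i], a[out]
        i += 1
        out += 1
    while j < end:
        a[out], a[j] = a[j], a[out]
        j += 1
        out += 1
```

With `a = ['B', 'B', 2, 4, 3, 5]`, the call `merge_left(a, 0, 2, 4, 6)` reproduces the trace above and leaves `[2, 3, 4, 5, 'B', 'B']`.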

A trick is then used: the next merge goes right instead of left, which works like merging left but in the mirrored direction. This moves the buffer back to its original place:

[keys | buffer | sorted 2 blocks of data]
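
Merging right mirrors merging left: output is written from the right end and the larger element is taken first. The sketch below assumes the buffer lies directly to the right of the two runs and is at least as long as the left run; the tie-breaking rule (take from the right run on ties) is an assumption made here to keep the merge stable, and `merge_right` is a hypothetical name.

```python
def merge_right(a, start, mid, end, buf_end):
    """Merge the sorted runs a[start:mid] and a[mid:end] toward the right.

    a[end:buf_end] is the buffer and must be at least as long as the left run.
    Afterwards the merged run ends at `buf_end` and the buffer elements sit
    at the beginning of the range, starting at `start`.
    """
    out, i, j = buf_end - 1, mid - 1, end - 1
    while i >= start and j >= mid:
        if a[i] > a[j]:                       # strict '>' keeps equal elements stable
            a[out], a[i] = a[i], a[out]
            i -= 1
        else:
            a[out], a[j] = a[j], a[out]
            j -= 1
        out -= 1
    # One run is exhausted: swap what is left of the other run with the buffer.
    while j >= mid:
        a[out], a[j] = a[j], a[out]
        j -= 1
        out -= 1
    while i >= start:
        a[out], a[i] = a[i], a[out]
        i -= 1
        out -= 1
```

For `a = [2, 4, 3, 5, 'B', 'B']`, the call `merge_right(a, 0, 2, 4, 6)` leaves `['B', 'B', 2, 3, 4, 5]`, so the buffer is back on the left.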

Block merging

 * TBA: How it works with insufficient amount of keys



Call the blocks of data that are being merged A and B respectively.

The remaining, leftmost keys are now called the movement imitation buffer. It is used to keep track of which blocks are A blocks and which are B blocks.

The blocks are rolled, which is a fancy term for rearranging the blocks by swapping them. Selection Sort is used, comparing the first element of each block. Keep a pointer to the key of the first B block (denoted as #). Whenever two blocks are swapped, swap the corresponding keys in the movement imitation buffer as well:

KEY BUFFER     BLOCKS*
1 2 3 4  ...   A1 A4 B2 B3
    #          ^
1 2 3 4  ...   A1 A4 B2 B3
    #             ^--^
1 3 2 4  ...   A1 B2 A4 B3
  #                  ^--^
1 3 4 2  ...   A1 B2 B3 A4
  #
* Numbers mean the value of the first element in the block.

It follows that keys in the movement imitation buffer that are less than the key under the pointer belong to A blocks, and the others belong to B blocks.
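
Block rolling and the A/B test can be sketched as below. This is a simplified sketch: the keys are held in a separate list rather than inside the array, ties between equal first elements are broken by key (an assumption, not stated in the text), and the names `roll_blocks`, `classify`, and `swap_blocks` are hypothetical.

```python
def swap_blocks(a, i, j, length):
    """Swap a[i:i+length] with a[j:j+length] element by element."""
    for t in range(length):
        a[i + t], a[j + t] = a[j + t], a[i + t]

def roll_blocks(a, keys, start, block_len, first_b):
    """Selection-sort equal-sized blocks by first element, mirroring swaps in `keys`.

    keys[k] tags the k-th block starting at a[start]; first_b is the index of
    the first B block. Returns the index of the key tagging that block afterwards.
    """
    pointer = first_b
    for k in range(len(keys) - 1):
        best = k
        for m in range(k + 1, len(keys)):
            candidate = (a[start + m * block_len], keys[m])
            current = (a[start + best * block_len], keys[best])
            if candidate < current:           # first element, then key as tie-break
                best = m
        if best != k:
            swap_blocks(a, start + k * block_len, start + best * block_len, block_len)
            keys[k], keys[best] = keys[best], keys[k]
            # Keep the pointer on the key of the original first B block.
            if pointer == best:
                pointer = k
            elif pointer == k:
                pointer = best
    return pointer

def classify(keys, pointer):
    """Label each block: keys below the pointed-to key are A blocks, the rest B."""
    return ["A" if key < keys[pointer] else "B" for key in keys]
```

Using blocks of length 2 whose first elements are 1, 4, 2, 3 (the A1 A4 B2 B3 example), `roll_blocks([1, 1, 4, 5, 2, 2, 3, 6], [1, 2, 3, 4], 0, 2, 2)` reorders the keys to `[1, 3, 4, 2]` and returns pointer index 1, and `classify` then yields `['A', 'B', 'B', 'A']`.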

Using the merge buffer, merge the last A block to the series of B blocks, or vice versa, using a modification of the merge left procedure:

... buffer A1 B2 B3 A4 ...
... A1+B2 buffer A1 buffer B2 B3 A4 ...   merging...
... A1+B2 A1 buffer B3 A4 ...             merged A1 and B2

As the length of each block is equal to the length of the buffer, enough buffer elements have been moved into B2's place in this example. A1 still has remaining elements, but they must be merged with the next B block (B3). Therefore, the buffer is rotated to the left of them so that the merge can continue:

... A1+B2 buffer A1 B3 A4 ...               rotate
... A1+B2+B3 buffer A1 buffer B3 A4 ...     merging...
... A1+B2+B3 buffer B3 A4 ...               fully merged A1

Note that the remaining elements of B have not been moved past the buffer, as some of them may be greater than elements of the A series. Also, in some cases the A block may be fully merged before the last B block in the series is reached, in which case the buffer moves to just before the last B block.

Repeat this process, merging the last remaining B (or A) block with the following series of A (or B) blocks, until the end of the list being merged is reached. This is called block merging. Insertion sort the movement imitation buffer to reset it, and continue block merging the next blocks of data until the end of the whole array is reached. Reset the merge buffer by moving it back to its place. Repeat until there is only a single sorted run, preceded by the buffer. The array should look like this:

[buffer (movement imitation + merge) | sorted data]

Final steps
The algorithm sorts the buffer using insertion sort, and then moves its elements to their appropriate places in the sorted data, using binary searches and rotations. The input is now sorted.
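
The final steps can be sketched as follows. This is a minimal sketch assuming the buffer occupies the first `buf_len` positions and the rest of the array is already sorted; buffer elements are placed before equal data elements, on the assumption that keys were taken from first occurrences. The names `finish`, `insertion_sort`, and `rotate` are hypothetical.

```python
from bisect import bisect_left

def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place with a plain (stable) insertion sort."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i
        while j > lo and x < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x

def rotate(a, lo, mid, hi):
    """Rotate a[lo:hi] in place so that a[mid:hi] comes first (three reversals)."""
    def reverse(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    reverse(lo, mid)
    reverse(mid, hi)
    reverse(lo, hi)

def finish(a, buf_len):
    """Sort the buffer a[:buf_len], then merge it into the sorted data a[buf_len:]."""
    insertion_sort(a, 0, buf_len)
    lo, mid, hi = 0, buf_len, len(a)
    while lo < mid < hi:
        # Data elements strictly smaller than the next buffer element go in front of it.
        cut = bisect_left(a, a[lo], mid, hi)
        rotate(a, lo, mid, cut)
        lo += cut - mid + 1                   # that buffer element is now in place
        mid = cut
```

For instance, `finish` applied to `[7, 3, 5, 1, 2, 4, 6, 8]` with `buf_len = 3` first sorts the buffer to `3 5 7` and then rotates it into the data, giving `[1, 2, 3, 4, 5, 6, 7, 8]`.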

Special case: fewer than 16 elements
The algorithm simply uses insertion sort in this case and then terminates.