How to decide between pre-emptive and non-pre-emptive split, and between pre-emptive and non-pre-emptive merge, for a B-tree?

Question

I am trying to implement a B-tree. As far as I understand, the lower the height of the tree, the faster the search (please correct me if this is wrong).
Which combination is best: pre-emptive split with non-pre-emptive merge, non-pre-emptive split with pre-emptive merge, or one of the other two combinations? Which offers better performance, and why? I searched online but could not find any article that discusses this.
After this I need to implement insert, so I am hoping someone can share some knowledge on which is better.

So far I do not have much code except the following, because I cannot decide which method to choose and why. I want fast insertion, deletion, and search.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
enum TreeProperties {
    MIN_CHILDREN = 2,
    MAX_CHILDREN = 6,
    MAX_KEYS = MAX_CHILDREN - 1,
    MIN_KEYS = MIN_CHILDREN - 1, // except root
    SPLIT_KEYS = MAX_KEYS - MIN_CHILDREN,  // keys moved to the new right node
    SPLIT_KEY_INDEX = MIN_CHILDREN,        // first key moved; keys[MIN_KEYS] is the median
    SPLIT_LINKS = SPLIT_KEYS + 1,
    SPLIT_LINK_INDEX = SPLIT_KEY_INDEX,    // SPLIT_KEY_INDEX + 1 would read past links[MAX_LINKS - 1]
    MAX_LINKS = MAX_CHILDREN
};
typedef struct BTreeNode { 
    struct BTreeNode *links[MAX_LINKS]; 
    int keys[MAX_KEYS];
    size_t keyCount;
}BTreeNode;
#define firstLink links[0]
/* New node allocation */
static inline size_t childrenNum(BTreeNode *node) {
    return node->keyCount + 1;
}
static inline BTreeNode *_createNode(void) {
    BTreeNode *newNode = malloc(sizeof(BTreeNode));
    if(!newNode) {
        fprintf(stderr, "New node init failed: out of memory\n");
        exit(EXIT_FAILURE);
    }
    memset(newNode->links, 0, sizeof(BTreeNode *) * MAX_CHILDREN);
    return newNode;
}
BTreeNode *BTree_createEmptyNode(void) {
    BTreeNode *newNode = _createNode();
    newNode->keyCount = 0;
    return newNode;
}
/* Moves the upper keys (and links, for an internal node) of a full node into a new right sibling.
   The median is left at nodeToSplit->keys[MIN_KEYS] for the caller to promote. */
BTreeNode *BTree_createRightSplitNode(BTreeNode *nodeToSplit) {
    BTreeNode *newNode = _createNode();
    memcpy(newNode->keys, nodeToSplit->keys + SPLIT_KEY_INDEX, sizeof(newNode->keys[0]) * SPLIT_KEYS);
    if(nodeToSplit->firstLink) {
        memcpy(newNode->links, nodeToSplit->links + SPLIT_LINK_INDEX, sizeof(BTreeNode *) * SPLIT_LINKS);
        memset(nodeToSplit->links + SPLIT_LINK_INDEX, 0, sizeof(BTreeNode *) * SPLIT_LINKS);
    }
    newNode->keyCount = SPLIT_KEYS;
    nodeToSplit->keyCount = MIN_KEYS;
    return newNode;
}
/* End new node allocation */
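/* Search */
/* Returns true if key is in current; otherwise *idx is the index of the link to follow next. */
static bool isKeyInNodeOrGiveNextLink(int key, BTreeNode *current, size_t *idx /*, size_t *parentLinkIdxToKeyNode*/) {
    *idx = 0;
    while (*idx < current->keyCount && key > current->keys[*idx]) {
        //parentLinkIdxToKeyNode = linkIdx;
        (*idx)++;
    }
    // The loop stops at the first key >= the search key, so equality must be tested here, not inside the loop.
    return *idx < current->keyCount && current->keys[*idx] == key;
}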
BTreeNode* BTree_search(int key, BTreeNode *current) {
    size_t i;
    while(current) {
        if(isKeyInNodeOrGiveNextLink(key, current, &i)) return current;
        current = current->links[i];
    }
    return NULL;
}

Kenneth Kho · Accepted Answer · 2024-01-31T16:40:52.673

This is rarely debated because the answer follows from why the B-tree, and subsequently the B*-tree, was invented: to reduce the frequency of rebalancing. Splitting and merging both involve rebalancing, so the longer they are delayed, the better.
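To make "delayed" concrete, here is a minimal sketch of a non-pre-emptive (reactive) split on insert: the key is placed first, and a node is split only if it actually overflows. All names (`DemoNode`, `DEMO_MAX_KEYS`, `demoInsert`, `demoSplitCount`, and so on) are hypothetical and not taken from the question's code; each node carries one spare key slot so it can be overfull for a moment, which is an assumption of this sketch rather than a requirement of B-trees.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { DEMO_MAX_KEYS = 5 };                    /* hypothetical: at most 5 keys, 6 children */

typedef struct DemoNode {
    int keys[DEMO_MAX_KEYS + 1];               /* one spare slot for a temporary overflow */
    struct DemoNode *links[DEMO_MAX_KEYS + 2]; /* all NULL in a leaf */
    size_t keyCount;
} DemoNode;

size_t demoSplitCount = 0;                     /* number of splits performed so far */

DemoNode *demoNewNode(void) {
    DemoNode *n = calloc(1, sizeof *n);
    if (!n) abort();
    return n;
}

/* Split an overfull node around its median; returns the new right sibling. */
DemoNode *demoSplitOverfull(DemoNode *n, int *median) {
    size_t mid = n->keyCount / 2;
    DemoNode *right = demoNewNode();
    *median = n->keys[mid];
    right->keyCount = n->keyCount - mid - 1;
    memcpy(right->keys, n->keys + mid + 1, right->keyCount * sizeof n->keys[0]);
    memcpy(right->links, n->links + mid + 1, (right->keyCount + 1) * sizeof n->links[0]);
    n->keyCount = mid;
    demoSplitCount++;
    return right;
}

/* Insert into the subtree rooted at n. Returns NULL if n absorbed the key,
   or the new right sibling (with the promoted median in *up) if n had to split. */
DemoNode *demoInsertRec(DemoNode *n, int key, int *up) {
    size_t i = 0;
    DemoNode *newChild = NULL;
    while (i < n->keyCount && key > n->keys[i]) i++;
    if (n->links[0]) {                         /* internal node: descend first */
        int childMedian;
        newChild = demoInsertRec(n->links[i], key, &childMedian);
        if (!newChild) return NULL;            /* no overflow below: nothing to do here */
        key = childMedian;                     /* otherwise store the promoted median */
    }
    memmove(n->keys + i + 1, n->keys + i, (n->keyCount - i) * sizeof n->keys[0]);
    memmove(n->links + i + 2, n->links + i + 1, (n->keyCount - i) * sizeof n->links[0]);
    n->keys[i] = key;
    n->links[i + 1] = newChild;
    n->keyCount++;
    return n->keyCount > DEMO_MAX_KEYS ? demoSplitOverfull(n, up) : NULL;
}

/* Public entry point; returns the (possibly new) root. */
DemoNode *demoInsert(DemoNode *root, int key) {
    int median;
    DemoNode *right;
    if (!root) root = demoNewNode();
    right = demoInsertRec(root, key, &median);
    if (right) {                               /* root overflowed: tree grows one level */
        DemoNode *top = demoNewNode();
        top->keys[0] = median;
        top->links[0] = root;
        top->links[1] = right;
        top->keyCount = 1;
        root = top;
    }
    return root;
}

bool demoContains(const DemoNode *n, int key) {
    while (n) {
        size_t i = 0;
        while (i < n->keyCount && key > n->keys[i]) i++;
        if (i < n->keyCount && n->keys[i] == key) return true;
        n = n->links[i];
    }
    return false;
}

size_t demoHeight(const DemoNode *n) {
    size_t h = 0;
    for (; n; n = n->links[0]) h++;
    return h;
}
```

In this sketch a full node is left alone until a key really has to go into it, whereas a pre-emptive scheme would split every full node it meets on the way down. The price is that the recursion has to come back up to insert promoted medians, so the insert is no longer a single top-down pass.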

However, bulk loading (Wikipedia) is an exception to the ordinary splitting rules. Like heapify, it runs in $O(n)$:

A common special case is adding a large amount of pre-sorted data into an initially empty B-tree. While it is quite possible to simply perform a series of successive inserts, inserting sorted data results in a tree composed almost entirely of half-full nodes. Instead, a special "bulk loading" algorithm can be used to produce a more efficient tree with a higher branching factor.

When the input is sorted, all insertions are at the rightmost edge of the tree, and in particular any time a node is split, we are guaranteed that no more insertions will take place in the left half. When bulk loading, we take advantage of this, and instead of splitting overfull nodes evenly, split them as unevenly as possible: leave the left node completely full and create a right node with zero keys and one child (in violation of the usual B-tree rules).

At the end of bulk loading, the tree is composed almost entirely of completely full nodes; only the rightmost node on each level may be less than full. Because those nodes may also be less than half full, to re-establish the normal B-tree rules, combine such nodes with their (guaranteed full) left siblings and divide the keys to produce two nodes at least half full. The only node which lacks a full left sibling is the root, which is permitted to be less than half full.
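As a rough illustration, the sketch below builds a B-tree bottom-up from a sorted array in $O(n)$. Note that it is a simplified variant, not the procedure quoted above: instead of splitting unevenly and repairing the rightmost nodes afterwards, it computes the smallest number of nodes each level can have and spreads the keys evenly over them, so nodes end up full or nearly full. The names (`BulkNode`, `BULK_MAX_KEYS`, `bulkLoad`, and so on) are hypothetical, and the minimum-occupancy argument assumes `BULK_MAX_KEYS` is odd.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { BULK_MAX_KEYS = 5 };                    /* hypothetical: at most 5 keys, 6 children */

typedef struct BulkNode {
    int keys[BULK_MAX_KEYS];
    struct BulkNode *links[BULK_MAX_KEYS + 1]; /* all NULL in a leaf */
    size_t keyCount;
} BulkNode;

/* Build a B-tree from n keys in ascending order; returns NULL when n == 0. */
BulkNode *bulkLoad(const int *sorted, size_t n) {
    int *keys;
    BulkNode **children = NULL;                /* nodes of the level below; NULL for leaves */
    BulkNode *root;
    size_t k = n;                              /* keys to place at the current level */
    if (n == 0) return NULL;
    keys = malloc(n * sizeof *keys);
    if (!keys) abort();
    memcpy(keys, sorted, n * sizeof *keys);
    for (;;) {
        /* Fewest nodes that can hold k keys once nodes - 1 separators move up. */
        size_t nodes = (k + 1 + BULK_MAX_KEYS) / (BULK_MAX_KEYS + 1);
        size_t staying = k - (nodes - 1);      /* keys that remain on this level */
        size_t src = 0, child = 0;
        BulkNode **level = malloc(nodes * sizeof *level);
        if (!level) abort();
        for (size_t j = 0; j < nodes; j++) {
            size_t q = staying / nodes + (j < staying % nodes); /* even share */
            BulkNode *nd = calloc(1, sizeof *nd);
            if (!nd) abort();
            nd->keyCount = q;
            memcpy(nd->keys, keys + src, q * sizeof *keys);
            src += q;
            if (children) {                    /* internal node: take q + 1 children */
                memcpy(nd->links, children + child, (q + 1) * sizeof *children);
                child += q + 1;
            }
            if (j + 1 < nodes) keys[j] = keys[src++]; /* separator goes up; j <= src, so in place is safe */
            level[j] = nd;
        }
        free(children);
        children = level;
        if (nodes == 1) break;                 /* this level is the root */
        k = nodes - 1;                         /* separators become the next level's keys */
    }
    root = children[0];
    free(children);
    free(keys);
    return root;
}

bool bulkContains(const BulkNode *n, int key) {
    while (n) {
        size_t i = 0;
        while (i < n->keyCount && key > n->keys[i]) i++;
        if (i < n->keyCount && n->keys[i] == key) return true;
        n = n->links[i];
    }
    return false;
}

size_t bulkHeight(const BulkNode *n) {
    size_t h = 0;
    for (; n; n = n->links[0]) h++;
    return h;
}
```

Each key is copied a constant number of times per level it appears on, and the number of keys shrinks by roughly a factor of `BULK_MAX_KEYS + 1` from one level to the next, which is where the linear bound comes from.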
