Categories
Uncategorized

High performance linked list sorting in JavaScript

Watch and chain image by Eduardo Mueses licensed under CC BY-NC-ND 2.0

In my previous post, I described a schema and set of associated queries to persist and and update arbitrarily ordered items in a SQL database (using a linked list). This approach can scale to very large lists, without degrading performance when adding or rearranging items. But having stored a list, how can it be reproduced in the correct order? This post describes an approach to efficiently sort linked lists from SQL in client-side code. While the below examples are written in JavaScript, you could use the same basic technique in almost any modern language.

Suppose that you select the following (unordered) linked list from a database:

[
    {
        "item_id": 940,
        "item_name": "Second item",
        "previous_item_id": 239
    },
    {
        "item_id": 949,
        "item_name": "Fourth item",
        "previous_item_id": 238
    },
    {
        "item_id": 238,
        "item_name": "Third item",
        "previous_item_id": 940
    },
    {
        "item_id": 239,
        "item_name": "First item",
        "previous_item_id": null
    }
]

A naive approach to sorting

First, let’s consider a not-so-efficient way to sort the linked list:

function naiveSort(linkedList) {
    var sortedList = [];
    var index = 0;
    var previousItemId = null; // first item in list has null previous ID
 
    while (sortedList.length < linkedList.length) {
        var current = linkedList[index];
        if (current.previous_item_id === previousItemId) {
            // found the item referencing the previous item's ID
            previousItemId = current.item_id;
            sortedList.push(current); // append to sorted list
            index = 0; // start over at first element
        } else {
            index += 1; // check the next item
        }
    }
 
    return sortedList;
}

The naiveSort function re-loops through the list from the beginning each time it finds the next item and adds it to a sorted copy of the array. The function will return the correctly sorted list, but the number of required iterations exponentially increases as the list lengthens, following the equation size + size * ((size - 1) / 2). For example, a list containing 100 items would require 5,050 iterations, while a list containing 1,000 items would require 500,500! With this approach, any advantage of the linked list’s efficient insertion and reordering would be lost in lengthy sort times.

Fortunately there’s a much better way.

function mapSort(linkedList) {
    var sortedList = [];
    var map = new Map();
    var currentId = null;

    // index the linked list by previous_item_id
    for (var i = 0; i < linkedList.length; i++) {
        var item = linkedList[i];
        if (item.previous_item_id === null) {
            // first item
            currentId = item.item_id;
            sortedList.push(item);
        } else {
            map.set(item.previous_item_id, i);
        }
    }

    while (sortedList.length < linkedList.length) {
        // get the item with a previous item ID referencing the current item
        var nextItem = linkedList[map.get(currentId)];
        sortedList.push(nextItem);
        currentId = nextItem.item_id;
    }

    return sortedList;
}

An efficient sorting algorithm

The mapSort function starts by looping through the linked list a single time, adding the item array indexes to a map with a key of the item’s previous_item_id property. It then follows the chain of item_id references through the map to build the complete sorted list. This approach requires (size * 2) - 1 iterations, allowing it to scale linearly with list length.

Testing with Node.js on my Core i5 desktop PC, the mapSort function was able to sort a 5,000 item list in an average of 2.3 ms, compared to 68.4 ms for naiveSort. With larger lists, the discrepancy grew even greater. Sorting 100,000 items took an average of over 40 seconds with naiveSort, but just 61.7 ms with mapSort!

There are likely other optimizations that could be implemented to further increase performance, but for most practical purposes this technique should prove sufficient.

Categories
Uncategorized

Implementing a linked list in SQL

Recently I was challenged with enabling users to drag and drop items in a list to sort them in any order, and persisting that order in a SQL database. One way to handle this would be to add an index column to the table, which could be updated when an item is reordered. The downside of this approach is that whenever an item is added or moved, the index of every item beneath it must be updated. This could become a performance bottleneck in very large lists.

A more efficient approach is to use a linked list, where each item contains a reference to the previous item in the list, and the first item has a null reference (you could alternatively reference the next item, with the last item containing a null reference, but this requires the list to be sorted back-to-front, which I find less intuitive).

Let’s start by creating a minimal table for items:

CREATE TABLE ordered_items (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    item_name VARCHAR(100) NOT NULL,
    previous_item_id INT UNSIGNED NULL,
    FOREIGN KEY (previous_item_id) REFERENCES ordered_items(item_id)
);

Next, we’ll write two functions for adding items to the list: one to insert the item and the other to update any item referencing the same previous ID as the new item.

<?php

function addItem($name, $previousId)
{
    $sql = "INSERT INTO ordered_items (item_name, previous_item_id)
            VALUES (?, ?)";

    // The query function is assumed to prepare and execute a SQL statement with
    // an array of bound parameters, and return the insert ID or selected row.
    $itemId = query($sql, [$name, $previousId]);
    setInsertedItemReference($itemId, $previousId);
}

/**
 * If another item in the list has the same previous ID as the
 * inserted item, change it to reference the inserted item.
 */
function setInsertedItemReference($itemId, $itemPreviousId)
{
    $params = [$itemId, $itemId];

    if ($itemPreviousId === null) {
        $condition = 'IS NULL';
    } else {
        $condition = '= ?';
        $params[] = $itemPreviousId;
    }

    $sql = "UPDATE ordered_items
            SET previous_item_id = ?
            WHERE item_id <> ?
            AND previous_item_id {$condition}";

    query($sql, $params);
}

To remove an item from the list, we will again need two functions: one to delete the item row and another to update any item referencing the removed item.

<?php

function deleteItem($itemId)
{
    $previousId = selectItem($itemId)['previous_item_id'];
    closeMovedItemGap($itemId, $previousId);
    query("DELETE FROM ordered_items WHERE item_id = ?", [$itemId]);
}

function selectItem($itemId)
{
    $sql = "SELECT * FROM ordered_items WHERE item_id = ?";
    return query($sql, [$itemId]);
}

/**
 * If any other item has a previous ID referencing the moved item,
 * change it to point to the moved item's original previous ID.
 */
function closeMovedItemGap($itemId, $itemPreviousId)
{
    $sql = "UPDATE ordered_items
            SET previous_item_id = ?
            WHERE previous_item_id = ?";

    query($sql, [$itemPreviousId, $itemId]);
}

Finally, we can add a function to update items (including their sort order):

<?php

function updateItem($id, $name, $previousId)
{
    if ($id === $previousId) {
        throw new Exception('Items cannot reference themselves');
    }

    $originalItem = selectItem($id);

    if ($previousId !== $originalItem['previous_item_id']) {
        // the item was reordered
        closeMovedItemGap($id, $originalItem['previous_item_id']);
        setInsertedItemReference($id, $previousId);
    }

    $sql = "UPDATE ordered_items
            SET item_name = ?,
            previous_item_id = ?
            WHERE item_id = ?";

    query($sql, [$name, $previousId, $id]);
}

As can be seen, whether an item is added, removed, or reordered, at most three rows will need to be updated. This keeps the performance nearly constant, regardless of the size of the list. With the basic database implementation complete, in my next post I’ll share an approach to efficiently sort the linked list in client-side code.