Skip lists are a fascinating data structure: very simple, and yet have the same asymptotic efficiency as much more complicated AVL trees and red-black trees. While many standard libraries for various programming languages provide a sorted set data structure, there are numerous problems that require more control over the internal data structure than a sorted set exposes. In this article, I will discuss the asymptotic efficiency of operations on skip lists, the ideas that make them work, and their interesting use cases. And, of course, I will give you the source code for a skip list in C#.

The time complexity of basic operations on a skip list is as follows:

Operation Time Complexity
Insertion O(log N)
Removal O(log N)
Check if contains O(log N)
Enumerate in order O(N)

This makes skip list a very useful data structure. First, as mentioned earlier, skip list can be used as the underlying storage for a sorted set data structure. But, skip list can be directly used to implement some operations that are not efficient on a typical sorted set:

  • Find the element in the set that is closest to some given value, in O(log N) time.
  • Find the k-th largest element in the set, in O(log N) time. Requires a simple  augmentation of the the skip list with partial counts.
  • Count the number of elements in the set whose values fall into a given range, in O(log N) time. Also requires a simple augmentation of the skip list.

From a singly-linked list to a skip list

Sometimes the best way to understand how something works is to attempt to design it yourself. Let’s try to go through that exercise with skip lists.

First, consider a regular sorted singly-linked list. Here is an example of one:

list

A sorted singly-linked list is not a terribly interesting data structure. The complexity of basic operations looks like this:

Operation Time Complexity
Insertion O(N)
Removal O(N)
Check if contains O(N)
Enumerate in order O(N)

That is pretty unimpressive, actually. The only interesting value here is the O(N) in-order enumeration. For an insertion, removal or search, O(N) is about as bad as it gets. (There are more specialized use cases where sorted linked lists are appropriate, though.)

So, how can we make these operations on a sorted linked list faster? The main problem with a linked list is that it takes so long to get into its middle. That makes insertion, removal and search operations all O(N).

Well, here is an idea: let’s consider a sorted multi-level list. We start out with a regular singly-linked list that connects nodes in-order. Then, we add a level-2 list that skips every other node. And a level-3 list that skips every other node in the level-2 list. And so forth, until we have a list that jumps somewhere past the middle element. Our previous list now looks like this:

multilist

Now, checking whether a particular element is in the set only takes O(log N). The search algorithm is a lot like binary search. We first look in the top-most list and move to the right, making sure that we don’t jump too far. For example, if we are searching for number 8, we will not take the level-3 link from the head node, because we would end up too far right: all the way at 12! If we can’t move further right on a particular level, we drop to the next lower level, which has shorter jumps.

Search for value 8 would proceed like this:

multilist-search

Since we landed on a 7, but we were looking for an 8, that means that 8 is not in the set.

The O(log N) search time is very nice. But, there is a problem: how do we implement insertions and removals efficiently, but in a way so that they maintain the structure of the multi-level list? This turns out to be quite a problem. AVL and red-black trees resolve it by tricky rebalancing operations.

Skip lists take an entirely different approach: a probabilistic one. Instead of ensuring that the level-2 list skips every other node, a skip list is designed in a way that the level-2 list skips one node on average. In some places, it may skip two nodes, and in other places, it may not skip any nodes. But overall, the structure of a skip list is very similar to the structure of a sorted multi-level list.

Here is an example of what a skip list may look like:

skiplist

A skip list looks a bit like a slightly garbled sorted multi-level list. A skip list has many of the nice properties of a sorted multi-level list, such as O(log N) search times, but also allows simple O(log N) insertions and deletions.

Implementation of skip lists

So, given this description of a skip list, would you know how to implement it? It is not very hard, and it may be worth it to spend a couple minutes thinking about it.

  • Insertion: decide how many lists will this node be a part of. With a probability of 1/2, make the node a part of the lowest-level list only. With 1/4 probability, the node will be a part of the lowest two lists. With 1/8 probability, the node will be a part of three lists. And so forth. Insert the node at the appropriate position in the lists that it is a part of.
  • Deletion: remove the node from all sorted lists that it is a part of.
  • Check if contains: we can use the O(log N) algorithm similar described above on multi-level lists.

And, all of these operations are pretty simple to implement in O(log N) time!

Source code

Here is a sample C# implementation for a skip list of integers:

class IntSkipList
{
    private class Node
    {
        public Node[] Next { get; private set; }
        public int Value { get; private set; }

        public Node(int value, int level)
        {
            Value = value;
            Next = new Node[level];
        }
    }

    private Node _head = new Node(0, 33); // The max. number of levels is 33
    private Random _rand = new Random();
    private int _levels = 1;

    /// <summary>
    /// Inserts a value into the skip list.
    /// </summary>
    public void Insert(int value)
    {
        // Determine the level of the new node. Generate a random number R. The number of
        // 1-bits before we encounter the first 0-bit is the level of the node. Since R is
        // 32-bit, the level can be at most 32.
        int level = 0;
        for (int R = _rand.Next(); (R & 1) == 1; R >>= 1)
        {
            level++;
            if (level == _levels) { _levels++; break; }
        }

        // Insert this node into the skip list
        Node newNode = new Node(value, level + 1);
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value > value) break;
            }

            if (i <= level) { newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; }
        }
    }

    /// <summary>
    /// Returns whether a particular value already exists in the skip list
    /// </summary>
    public bool Contains(int value)
    {
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value > value) break;
                if (cur.Next[i].Value == value) return true;
            }
        }
        return false;
    }

    /// <summary>
    /// Attempts to remove one occurence of a particular value from the skip list. Returns
    /// whether the value was found in the skip list.
    /// </summary>
    public bool Remove(int value)
    {
        Node cur = _head;

        bool found = false;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value == value)
                {
                    found = true;
                    cur.Next[i] = cur.Next[i].Next[i];
                    break;
                }

                if (cur.Next[i].Value > value) break;
            }
        }

        return found;
    }

    /// <summary>
    /// Produces an enumerator that iterates over elements in the skip list in order.
    /// </summary>
    public IEnumerable<int> Enumerate()
    {
        Node cur = _head.Next[0];
        while (cur != null)
        {
            yield return cur.Value;
            cur = cur.Next[0];
        }
    }
}

Possible improvements

  • Obviously, a more useful implementation would be generic, so that we can store values other than integers.
  • Nodes can be structs instead of classes. This significantly reduces the number of heap allocations. But, we cannot use the null value anymore to represent the tail, which adds a bit of extra code.
  • The skip list could be made associative, so that each node stores a key/value pair.
  • The implementation I gave above is a multiset. It is pretty simple to change the skip list so that it implements a set instead.

Related:

kick it on DotNetKicks.com

Share/Save/Bookmark