Igor Ostrovsky Blogging » Algorithms

How RAID-6 dual parity calculation works

Igor Ostrovsky — Mon, 29 Sep 2014 05:09:18 +0000

Parity protection is a common technique for reliable data storage on mediums that may fail (HDDs, SSDs, storage servers, etc.) Calculation of parity uses tools from algebra in very interesting ways, in particular in the dual parity case.

This article is for computer engineers who would like to have a high-level understanding of how the dual parity calculation works, without diving into all of the mathematical details.

Single parity

Let’s start with the simple case – single parity. As an example, say that you need to reliably store 5 data blocks of a fixed size (e.g., 4kB). You can put the 5 data blocks on 5 different storage devices, and then store a parity block on a 6th device. If any one device fails, you can still recompute the block from the failed device, and so you haven’t lost any data.

A simple way to calculate the parity P is just to combine the values a, b, c, d, e with the exclusive-or (XOR) operator:

P = a XOR b XOR c XOR d XOR e

Effectively, every parity bit protects the corresponding bits of a, b, c, d, e.

If the first parity bit is 0, you know that the number of 1 bits among the first bits of a, b, c, d, e is even.
Otherwise the number of 1 bits is odd.

Given a, b, c, e and P, it is easy to reconstruct the missing block d:

d = P XOR a XOR b XOR c XOR e

Straightforward so far. Exclusive-or parity is commonly used in storage systems as RAID-5 configuration:

RAID-5 uses the exclusive-or parity approach, except that the placement of parity is rotated among the storage devices. Parity blocks gets more overwrites than data blocks, so it makes sense to distribute them among the devices.

What about double parity?

Single parity protects data against the loss of a single storage device. But, what if you want protection against the loss off two devices? Can we extend the parity idea to protect the data even in that scenario?

It turns out that we can. Consider defining P and Q like this:

P = a + b + c + d + e
Q = 1a + 2b + 3c + 4d + 5e

If you lose any two of { a, b, c, d, e, P, Q }, you can recover them by plugging in the known values into the equations above, and solving for the two unknowns. Simple.

For example, say that { a, b, c, d, e } are { 10, 5, 8, 13, 2 }. Then:

P = 10 + 5 + 8 + 13 + 2 = 38
Q = 10 + 2 × 5 + 3 × 8 + 4 × 13 + 5 × 2 = 106

If you lost c and d, you can recover them by solving the original equations.

How come you can always solve the equations?

To explain why we can always solve the equations to recover two missing blocks, we need to take a short detour into matrix algebra. We can express the relationship between a, b, c, d, e, P, Q as the following matrix-vector multiplication:

We have 7 linear equations of 5 variables. In our example, we lose blocks c and d and end up with 5 equations of 5 variables. Effectively, we lost two rows in the matrix and the two matching rows in the result vector:

This system of equations is solvable because the matrix is invertible: the five rows are all linearly independent.

In fact, if you look at the original matrix, any five rows would have been linearly independent, so we could solve the equations regardless of which two devices were missing.

But, integers are inconvenient…

Calculating P and Q from the formulas above will allow you to recover up to two lost data blocks.

But, there is an annoying technical problem: the calculated values of P and Q can be significantly larger than the original data values. For example, if the data block is 1 byte, its value ranges from 0 to 255. But in that case, the Q value can reach 3825! Ideally, you’d want P and Q to have the same range as the data values themselves. For example, if you are dealing with 1-byte blocks, it would be ideal if P and Q were 1 byte as well.

We’d like to redefine the arithmetic operators +, –, ×, / such that our equations still work, but the values of P and Q stay within the ranges of the input values.

Thankfully, this is a well-studied problem in algebra. One simple approach that works is to pick a prime number M, and change the meaning of +, –, and × so that values "wrap around" at M. For example, if we picked M=7, then:

6 + 1 = 0 (mod 7)

4 + 5 = 2 (mod 7)

3 × 3 = 2 (mod 7)

etc.

Here is how to visualize the field:

Since 7 is a prime number, this is an example of a finite field. Additions, subtractions, multiplications and even divisions work in a finite field, and so our parity equations will work even modulo 7. In some sense, finite fields are more convenient than integers: dividing two integers does not necessarily give you an integer (e.g., 5 / 2), but in a finite field, division of two values always gives you another value in the field (e.g., 5 / 2 = 6 mod 7). Division modulo a prime is not trivial to calculate – you need to use the extended Euclidean algorithm – but it is pretty fast.

But, modular arithmetic forms a finite field only if the modulus is prime. If the modulus is composite, we have a finite ring, but not a finite field, which is not good enough for our equations to work.

Furthermore, in computing we like to deal with bytes or multiples of bytes. And, the range of a byte is 256, which is definitely not a prime number. 257 is a prime number, but that still doesn’t really work for us… if the inputs are bytes, our P and Q values will sometimes come out as 256, which doesn’t fit into a byte.

Finally, Galois Fields

The finite fields we described so far are all of size M, where M is a prime number. It is possible to construct finite fields of size M^k, where M is a prime number and k is a positive integer.

The solution is called a Galois field. The particular Galois field of interest is GF(2⁸) where:

Addition is XOR.
Subtraction is same as addition; also XOR.
Multiplication can be implemented as gf_ilog[gf_log[a] + gf_log[b]], where gf_log is a logarithm in the Galois field, and gf_ilog is its inverse. gf_log and gf_ilog can be tables computed ahead of time.
Division can then calculated as gf_ilog[gf_log[a] – gf_log[b]], so we can reuse the same tables used by multiplication.

The actual calculation of the logarithm and inverse logarithm tables is beyond the scope of the article, but if you would like to learn more about the details ,there are plenty of in-depth articles on Galois fields online.

In the end, a GF(2⁸) field gives us exactly what we wanted. It is a field of size 2⁸, and it gives us efficient +, –, ×, / operators that we can use to calculate P and Q, and if needed recalculate a lost data block from the remaining data blocks and parities. And, each of P and Q perfectly fits into a byte. The resulting coding is called Reed-Solomon error correction, and that’s the method used by RAID 6 configuration:

Final Comments

The article explains the overall approach to implementing a RAID-6 storage scheme based on GF(2⁸). It is possible to use other Galois fields, such as the 16-bit GF(2¹⁶), but the 8-bit field is most often used in practice.

Also, it is possible to calculate additional parities to protect against additional device failures:

R = 1²a + 2²B + 3²C + 4²D + 5²E

S = 1³a + 2³B + 3³C + 4³D + 5³E

…

Finally, the examples in this article used 5 data blocks per stripe, but that’s an arbitrary choice – any number works, as far as the parity calculation is concerned.

Program like a Quake developer

Igor Ostrovsky — Wed, 09 Sep 2009 06:22:52 +0000

Not many developers have the insights of Michael Abrash. He is a game developer with decades of experience building commercial games, including a game you may recognize as "Quake". His Graphics Programming Black Book is years old, but much of it is just as interesting as it was at the time of writing. And, the entire book is available online, for free.

The book consists of 70 chapters on optimization, graphics, and assembly programming. The entire book is insightful and interesting, but my favorite chapters are these:

The Best Optimizer Is between Your Ears
There Ain’t No Such Thing as the Fastest Code
Optimizing for 286 and 386, 486 (continued), and Pentium (continued here and here).
Algorithms for fast drawing of lines, anti-aliased lines, polygons, fast convex polygons, and 3d objects (continued)
Quake’s visible-surface determination, 3D clipping, hidden-surface removal, span sorting, lighting, surface caching and post-mortem.

Enjoy! Again, the entire book is accessible here: Graphics Programming Black Book.

Efficient auto-complete with a ternary search tree

Igor Ostrovsky — Mon, 31 Aug 2009 00:27:54 +0000

Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing.

Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe one simple data structure that can be used to implement auto-complete: a ternary search tree.

Trie: simple but space-inefficient

Before discussing ternary search trees, let’s take a look at a simple data structure that supports a fast auto-complete lookup but needs too much memory: a trie. A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet. Starting at the root node, we can trace a word by following pointers corresponding to the letters in the target word.

Each node could be implemented like this in C#:

class TrieNode
{
    public const int ALPHABET_SIZE = 26;
    public TrieNode[] m_pointers = new TrieNode[ALPHABET_SIZE];
    public bool m_endsString = false;
}

Here is a trie that stores words AB, ABBA, ABCD, and BCD. Nodes that terminate words are marked yellow:

Implementing auto complete using a trie is easy. We simply trace pointers to get to a node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete user’s input.

But, a trie has a major problem that you can see in the diagram above. The diagram only fits on the page because the trie only supports four letters {A,B,C,D}. If we needed to support all 26 English letters, each node would have to store 26 pointers. And, if we need to support international characters, punctuation, or distinguish between lowercase and uppercase characters, the memory usage grows becomes untenable.

Our problem has to do with the memory taken up by all the null pointers stored in the node arrays. We could consider using a different data structure in each node, such as a hash map. However, managing thousands and thousands of hash maps is generally not a good idea, so let’s take a look at a better solution.

Ternary search tree to the rescue

A ternary tree is a data structure that solves the memory problem of tries in a more clever way. To avoid the memory occupied by unnecessary pointers, each trie node is represented as a tree-within-a-tree rather than as an array. Each non-null pointer in the trie node gets its own node in a ternary search tree.

For example, the trie from the example above would be represented in the following way as a ternary search tree:

The ternary search tree contains three types of arrows. First, there are arrows that correspond to arrows in the corresponding trie, shown as dashed down-arrows. Traversing a down-arrow corresponds to “matching” the character from which the arrow starts. The left- and right- arrow are traversed when the current character does not match the desired character at the current position. We take the left-arrow if the character we are looking for is alphabetically before the character in the current node, and the right-arrow in the opposite case.

For example, green arrows show how we’d confirm that the ternary tree contains string ABBA:

And this is how we’d find that the ternary string does not contain string ABD:

Ternary search tree on a server

On the web, a significant chunk of the auto-complete work has to be done by the server. Often, the set of possible completions is large, so it is usually not a good idea to download all of it to the client. Instead, the ternary tree is stored on the server, and the client will send prefix queries to the server.

The client will send a query for words starting with “bin” to the server:

And the server responds with a list of possible words:

Implementation

Here is a simple ternary search tree implementation in C#:

public class TernaryTree
{
    private Node m_root = null;

    private void Add(string s, int pos, ref Node node)
    {
        if (node == null) { node = new Node(s[pos], false); }

        if (s[pos] < node.m_char) { Add(s, pos, ref node.m_left); }
        else if (s[pos] > node.m_char) { Add(s, pos, ref node.m_right); }
        else
        {
            if (pos + 1 == s.Length) { node.m_wordEnd = true; }
            else { Add(s, pos + 1, ref node.m_center); }
        }
    }

    public void Add(string s)
    {
        if (s == null || s == "") throw new ArgumentException();

        Add(s, 0, ref m_root);
    }

    public bool Contains(string s)
    {
        if (s == null || s == "") throw new ArgumentException();

        int pos = 0;
        Node node = m_root;
        while (node != null)
        {
            int cmp = s[pos] - node.m_char;
            if (s[pos] < node.m_char) { node = node.m_left; }
            else if (s[pos] > node.m_char) { node = node.m_right; }
            else
            {
                if (++pos == s.Length) return node.m_wordEnd;
                node = node.m_center;
            }
        }

        return false;
    }
}

And here is the Node class:

class Node
{
    internal char m_char;
    internal Node m_left, m_center, m_right;
    internal bool m_wordEnd;

    public Node(char ch, bool wordEnd)
    {
        m_char = ch;
        m_wordEnd = wordEnd;
    }
}

Remarks

For best performance, strings should be inserted into the ternary tree in a random order. In particular, do not insert strings in the alphabetical order. Each mini-tree that corresponds to a single trie node would degenerate into a linked list, significantly increasing the cost of lookups. Of course, more complex self-balancing ternary trees can be implemented as well.

And, don’t use a fancier data structure than you have to. If you only have a relatively small set of candidate words (say on the order of hundreds) a brute-force search should be fast enough.

Data structure zoo: ordered set

Igor Ostrovsky — Fri, 29 Aug 2008 09:01:16 +0000

This article is the first one in a series titled Data structure zoo. Each article will give you a “working knowledge” of data structures that solve a particular problem. You won’t necessarily know how to implement each one, but you will have a good idea of the main characteristics of each solution and how to pick among them to solve your particular problem.

Today, I am writing about data structures to represent an ordered set. An ordered set is a common data structure that supports O(log N) lookups, insertions and removals. Ordered set is also sometimes used as an alternative to a hash map, for example in STL’s map.

Complexities of various operations on an ordered set are as follows:

O(log N) insertion and removal.
O(log N) check if contains a value.
O(N) enumeration in sorted order.
O(log N) to find the element in the set closest to some value.
O(log N) to find k-th largest element in the set. This operation requires a simple augmentation of the the ordered set with partial counts.
O(log N) to count the number of elements in the set whose values fall into a given range, in O(log N) time. Also requires a simple augmentation of the ordered set.

Now, let’s look at the different ways to implement an ordered set.

AVL Tree

In 1962, Russian mathematicians G.M. Adelson-Velsky and E.M. Landis presented a first solution to the ordered set problem, in a paper called “An algorithm for organization of information”. Given the ambitious title of the paper, it is clear that they knew that they were onto something big. The algorithm of Adelson-Velsky and Landis stores elements of the set in a binary search tree and ensures that the tree always remains perfectly balanced. The tree balances itself after each modification, hence it is an example of a self-balancing tree. Here is an example of an AVL tree:

Notice that the tree is nicely organized so that the maximum number of steps to reach all nodes from the root is as small as possible. That is a very important property, which ensures that the complexity of basic operations such as lookups, insertions and removals is O(log N).

This idea is simple, but its implementation is not. The difficulty in the AVL tree design is inserting and removing nodes, while maintaining the balanced shape. You can read about the algorithm here. These are a few important things to know about how rebalancing works:

Rebalancing the tree consists of O(log N) constant-time operations called rotations.
Consequently, insertion or removal from the tree followed by rebalancing can be done in O(log N) time.
Each rotation is a rearrangement of a small number of neighboring nodes in the tree.

Red-black Tree

Red-black tree is another popular algorithm for a self-balancing binary search tree, but one that makes slightly different trade-offs from the AVL tree. The main difference is that a red-black tree is not quite as neatly organized as an AVL tree. In an AVL tree, the longest path from a leaf to the root is at most one larger than the shortest path from a leaf to the root. In a red-black tree, the longest path to a leaf could be up to twice as long as the shortest path to a leaf.

So, you could say that a red-black is rebalanced more haphazardly. A lookup in a red-black tree may take longer, but on the plus side, the haphazard rebalancing is significantly cheaper. The tree tracks each node as either red or black. The color of the node is used in the complex rebalancing algorithm that I will not detail here, but that you can read all about here.

In practice, red-black trees tend to have slower lookups, but faster insertions and removals than AVL trees. Fast insertions and removals make red-black tree a popular data structure in system programming. For example, according to this article, the Linux kernel uses red-black trees to organize requests in various schedulers, the packet CD/DVD driver and the high-resolution timer, to track directory entries in an ext3 filesystem, and to lookup virtual memory areas, epoll file descriptors, cryptographic keys, and network packets. Also, red-black trees are used by the TreeSet in Java and map in STL.

Here is an example of a red-black tree. Notice that the tree has more levels than it absolutely would have to, but not by much (only by one level in this case):

Treap

One interesting fact about self-balancing trees is that if you only insert random numbers, all of the tricky rebalancing is in fact unnecessarily. It can be mathematically proven that a sequence of random-number insertions is nearly guaranteed to fill the tree uniformly, so that it fairly nicely balanced. Self-balancing trees are only useful for special (but common) cases, such as when elements are inserted in an increasing or nearly-increasing order.

If we could ensure that elements are inserted in a random order, we would not need rebalancing. Of course, we cannot do that because in most scenarios we don’t have control over the order of insertions. But, we can still simulate a random insertion order, which is the main idea behind a treap.

Treap, just like an AVL tree or a red-black tree, is based on a binary search tree. As each value is inserted, treap assigns a priority to that value. Then, treap rearranges the tree as if values had been inserted in the priority order. So, values with low priorities will move closer to the root than values with higher priorities. In other words, a treap has a heap property with respect to priorities.

Treaps are significantly easier to implement than AVL or red-black trees, and have the same expected time complexity of operations. Theoretically, a treap could degenerate so that lookups become O(N) rather than O(log N), but the probability of that happening is extremely small.

Splay Tree

A splay tree is another self-balancing tree, but with an additional interesting property: the element that has been accessed most recently is at the root, and other recently accessed elements are also near the top. If we search for an element that we looked up recently, we will find it very quickly because it is near the top. This is a nice property because many real-world programs will access some values much more frequently than other values.

Unfortunately, there are two penalties that a splay tree pays for the nice cache-like behavior. The first penalty is that in the worst case, a lookup in a splay tree may take O(N) time, even though amortized over many operations, it is guaranteed to be O(log N) on average.

The second – even more serious – penalty is that lookups modify the tree as they perform rotations to move the recently accessed element to the root. The fact that a lookup performs O(log N) writes into the tree significantly adds to its cost on a modern hardware.

The FreeBSD operating system uses splay trees to organize memory pages in it its virtual memory manager. However, it seems that at least some contributors do not consider splay trees to be ideal for this purpose, and proposed a Google Summer of Code project to modify the virtual memory manager to use a different data structure.

Skip List

A skip list is a curiously different solution to the ordered set problem. A relatively new data structure, skip list was discovered in 1990 by professor of computer science at University of Maryland, William Pugh. Like a treap, skip list exploits randomness to implement an ordered set in a simple way. Skip list is essentially an ordered linked list… except that we add more links!

The problem with an ordered linked list is that to get into the middle of the list, we need to scan over half of the list, making the complexity of the lookup O(N). Skip list adds links that move more than one node forward, skipping over some nodes.

We use randomness to decide where to add links that skip over nodes. For a detailed description of how skip lists work and a C# implementation, see my earlier article on skip lists.

Pre-Balanced Tree

If you try to find information on pre-balanced trees, you will probably fail. That is because I don’t know what the real name of “pre-balanced trees” is, or even whether they have one.

It is simply a data structure that occurred to me while thinking about a problem in a programming contest. A pre-balanced tree is another possible solution to the ordered set problem. A pre-balanced tree has one huge advantage and one huge disadvantage over the other data structures mentioned in this article. The advantage is that it is brain-dead simple, even simpler than a treap or a skip list. The disadvantage is that we need to know the values that we will insert into the set ahead of time.

“Know the values that will be inserted in the future? This person is clearly crazy,” you are probably thinking. Well, I may be, but it still turns out that in many real-world problems, you actually know the values that will be inserted ahead of time. For example, a very common use case is to iterate over a sequence of elements, incrementally adding them into the set, and observing the state of the set in between the insertions.

In such cases, we can construct a fully balanced binary search tree that contains all values that will eventually be inserted. Initially, all values are “disabled”. Inserting a node enables it. To look up a value, find the node in the binary tree that contains the value. The set contains that value if such node is in the tree, and it is also enabled.

Here is an example of a pre-balanced tree, with nodes 18, 19 and 45 already inserted:

Skip lists are fascinating!

Igor Ostrovsky — Mon, 21 Jul 2008 06:43:37 +0000

Skip lists are a fascinating data structure: very simple, and yet have the same asymptotic efficiency as much more complicated AVL trees and red-black trees. While many standard libraries for various programming languages provide a sorted set data structure, there are numerous problems that require more control over the internal data structure than a sorted set exposes. In this article, I will discuss the asymptotic efficiency of operations on skip lists, the ideas that make them work, and their interesting use cases. And, of course, I will give you the source code for a skip list in C#.

The time complexity of basic operations on a skip list is as follows:

Operation	Time Complexity
Insertion	O(log N)
Removal	O(log N)
Check if contains	O(log N)
Enumerate in order	O(N)

This makes skip list a very useful data structure. First, as mentioned earlier, skip list can be used as the underlying storage for a sorted set data structure. But, skip list can be directly used to implement some operations that are not efficient on a typical sorted set:

Find the element in the set that is closest to some given value, in O(log N) time.
Find the k-th largest element in the set, in O(log N) time. Requires a simple augmentation of the the skip list with partial counts.
Count the number of elements in the set whose values fall into a given range, in O(log N) time. Also requires a simple augmentation of the skip list.

From a singly-linked list to a skip list

Sometimes the best way to understand how something works is to attempt to design it yourself. Let’s try to go through that exercise with skip lists.

First, consider a regular sorted singly-linked list. Here is an example of one:

A sorted singly-linked list is not a terribly interesting data structure. The complexity of basic operations looks like this:

Operation	Time Complexity
Insertion	O(N)
Removal	O(N)
Check if contains	O(N)
Enumerate in order	O(N)

That is pretty unimpressive, actually. The only interesting value here is the O(N) in-order enumeration. For an insertion, removal or search, O(N) is about as bad as it gets. (There are more specialized use cases where sorted linked lists are appropriate, though.)

So, how can we make these operations on a sorted linked list faster? The main problem with a linked list is that it takes so long to get into its middle. That makes insertion, removal and search operations all O(N).

Well, here is an idea: let’s consider a sorted multi-level list. We start out with a regular singly-linked list that connects nodes in-order. Then, we add a level-2 list that skips every other node. And a level-3 list that skips every other node in the level-2 list. And so forth, until we have a list that jumps somewhere past the middle element. Our previous list now looks like this:

Now, checking whether a particular element is in the set only takes O(log N). The search algorithm is a lot like binary search. We first look in the top-most list and move to the right, making sure that we don’t jump too far. For example, if we are searching for number 8, we will not take the level-3 link from the head node, because we would end up too far right: all the way at 12! If we can’t move further right on a particular level, we drop to the next lower level, which has shorter jumps.

Search for value 8 would proceed like this:

Since we landed on a 7, but we were looking for an 8, that means that 8 is not in the set.

The O(log N) search time is very nice. But, there is a problem: how do we implement insertions and removals efficiently, but in a way so that they maintain the structure of the multi-level list? This turns out to be quite a problem. AVL and red-black trees resolve it by tricky rebalancing operations.

Skip lists take an entirely different approach: a probabilistic one. Instead of ensuring that the level-2 list skips every other node, a skip list is designed in a way that the level-2 list skips one node on average. In some places, it may skip two nodes, and in other places, it may not skip any nodes. But overall, the structure of a skip list is very similar to the structure of a sorted multi-level list.

Here is an example of what a skip list may look like:

A skip list looks a bit like a slightly garbled sorted multi-level list. A skip list has many of the nice properties of a sorted multi-level list, such as O(log N) search times, but also allows simple O(log N) insertions and deletions.

Implementation of skip lists

So, given this description of a skip list, would you know how to implement it? It is not very hard, and it may be worth it to spend a couple minutes thinking about it.

Insertion: decide how many lists will this node be a part of. With a probability of 1/2, make the node a part of the lowest-level list only. With 1/4 probability, the node will be a part of the lowest two lists. With 1/8 probability, the node will be a part of three lists. And so forth. Insert the node at the appropriate position in the lists that it is a part of.
Deletion: remove the node from all sorted lists that it is a part of.
Check if contains: we can use the O(log N) algorithm similar described above on multi-level lists.

And, all of these operations are pretty simple to implement in O(log N) time!

Source code

Here is a sample C# implementation for a skip list of integers:

class IntSkipList
{
    private class Node
    {
        public Node[] Next { get; private set; }
        public int Value { get; private set; }

        public Node(int value, int level)
        {
            Value = value;
            Next = new Node[level];
        }
    }

    private Node _head = new Node(0, 33); // The max. number of levels is 33
    private Random _rand = new Random();
    private int _levels = 1;

    /// 
    /// Inserts a value into the skip list.
    /// 
    public void Insert(int value)
    {
        // Determine the level of the new node. Generate a random number R. The number of
        // 1-bits before we encounter the first 0-bit is the level of the node. Since R is
        // 32-bit, the level can be at most 32.
        int level = 0;
        for (int R = _rand.Next(); (R & 1) == 1; R >>= 1)
        {
            level++;
            if (level == _levels) { _levels++; break; }
        }

        // Insert this node into the skip list
        Node newNode = new Node(value, level + 1);
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value > value) break;
            }

            if (i <= level) { newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; }
        }
    }

    /// 
    /// Returns whether a particular value already exists in the skip list
    /// 
    public bool Contains(int value)
    {
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value > value) break;
                if (cur.Next[i].Value == value) return true;
            }
        }
        return false;
    }

    /// 
    /// Attempts to remove one occurence of a particular value from the skip list. Returns
    /// whether the value was found in the skip list.
    /// 
    public bool Remove(int value)
    {
        Node cur = _head;

        bool found = false;
        for (int i = _levels - 1; i >= 0; i--)
        {
            for (; cur.Next[i] != null; cur = cur.Next[i])
            {
                if (cur.Next[i].Value == value)
                {
                    found = true;
                    cur.Next[i] = cur.Next[i].Next[i];
                    break;
                }

                if (cur.Next[i].Value > value) break;
            }
        }

        return found;
    }

    /// 
    /// Produces an enumerator that iterates over elements in the skip list in order.
    /// 
    public IEnumerable<int> Enumerate()
    {
        Node cur = _head.Next[0];
        while (cur != null)
        {
            yield return cur.Value;
            cur = cur.Next[0];
        }
    }
}

Possible improvements

Obviously, a more useful implementation would be generic, so that we can store values other than integers.
Nodes can be structs instead of classes. This significantly reduces the number of heap allocations. But, we cannot use the null value anymore to represent the tail, which adds a bit of extra code.
The skip list could be made associative, so that each node stores a key/value pair.
The implementation I gave above is a multiset. It is pretty simple to change the skip list so that it implements a set instead.

Related:

Original paper on skip lists [umd.edu]
Skip list [wikipedia.com]
Quicksort killer [igoro.com]
Programming job interview challenge [igoro.com]

Programming job interview challenge

Igor Ostrovsky — Fri, 09 May 2008 07:33:46 +0000

The folks at Dev 102 posted a programming job interview challenge. It is rather easy, but I figured that it may be a nice change of pace, since my last posting was fairly involved.

The challenge is to reverse the bits in each byte, given a large array of bytes. What is the fastest possible solution?

There are only 256 possible values for an 8-bit integer, so it makes sense to pre-compute the reverse of each ahead of time. Then, we can simply scan through the input, and reverse each byte by a simple array lookup. We pay a small initialization cost, but after that, we can reverse bytes nearly as fast as we can read them.

And how to efficiently reverse a byte? There is an obvious solution which uses a for loop and some bit manipulation. My solution below uses a slightly faster approach. It is a nice trick, and not very difficult, so I will leave the details as an exercise to the reader.

Here is a simple solution in C#:

// Reverses bits in each byte in the array 
static void Reverse(byte[] bytes) 
{ 
    // Precompute the value of each reversed byte 
    byte[] reversed = new byte[256]; 
    for (int i = 0; i < 256; i++) reversed[i] = Reverse((byte)i);  

    // Reverse each byte in the input 
    for (int i = 0; i < bytes.Length; i++) bytes[i] = reversed[bytes[i]]; 
}

// Reverses bits in a byte 
static byte Reverse(byte b) 
{ 
    int rev = (b >> 4) | ((b & 0xf) << 4); 
    rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2); 
    rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);  

    return (byte)rev; 
}

If we are going to be reversing multiple arrays, we can hoist the pre-computation into an initialization routine. And, if handling short arrays is important, we can skip the pre-computation step if the array length is less than 256, and just reverse each byte by calling Reverse(byte).

Not sure what else to add… problem solved!

Quicksort killer

Igor Ostrovsky — Mon, 05 May 2008 00:59:18 +0000

What is the time complexity of quicksort? The answer that first pops up in my head is O(N logN). That answer is only partly right: the worst case is in fact O(N²). However, since very few inputs take anywhere that long, a reasonable quicksort implementation will almost never encounter the quadratic case in real life.

I came across a very cool paper that describes how to easily defeat just about any quicksort implementation. The paper describes a simple comparer that decides ordering of elements lazily as the sort executes, and arranges the order so that the sort takes quadratic time. This works even if the quicksort is randomized! Furthermore, if the quicksort is deterministic (not randomized), this algorithm also reveals the input which reliably triggers quadratic behavior for this particular quicksort implementation.

The trick takes a dozen of code, works with nearly any quicksort routine, and only uses the quicksort via its interface! How cool is that? Here is a C# implementation of this idea:

class QuicksortKiller : IComparer<int>
{
    Dictionary<int, int> keys = new Dictionary<int, int>();

    int candidate = 0;
    public int Compare(int x, int y)
    {
        if (!keys.ContainsKey(x) && !keys.ContainsKey(y))
        {
            if (x == candidate) keys[x] = keys.Count;
            else keys[y] = keys.Count;
        }

        if (!keys.ContainsKey(x)) { candidate = x; return 1; }
        if (!keys.ContainsKey(y)) { candidate = y; return -1; }
        return keys[x] - keys[y];
    }
}

This trick works well when applied to the .Net Array.Sort() method. The following chart displays the number of Compare() calls forced by QuicksortKiller when ran on an array of some size, as well as the number of Compare() calls that made by Array.Sort on a randomly-ordered sequence of the same length:

This chart clearly shows that the QuicksortKiller comparer triggers the quadratic behavior in Array.Sort.

How does it work?

QuicksortKiller’s trick is to ensure that the pivot will compare low against nearly all remaining elements. But, how can we detect which element is the pivot? We know quicksort will take the pivot and compare it against all other elements.

So, initially we consider all elements to be unsorted. The paper refers to them as "gas". When we compare two gas elements, we arbitrarily choose one of them, and freeze it to a value larger than any previously frozen values. We remember the other element as the pivot candidate. If the candidate is used in the next comparison with a gas element, we will make sure to freeze the pivot candidate, rather than the other element. That way, the pivot will be frozen within two comparisons against other elements.

Constructing a "Bad" Array

If the quicksort implementation is deterministic, and always takes the same steps on the same input, it is easy to generate an array that reliably triggers the quadratic behavior. The MakeBadArray method constructs an array that triggers the quadratic-time behavior of Array.Sort():

static int[] MakeBadArray(int length)
{
    int[] arr = Enumerable.Range(0, length).ToArray();
    Array.Sort(arr, new QuicksortKiller()); int[] ret = new int[length];

    for (int i = 0; i < length; i++)
    {
        ret[arr[i]] = i;
    }

    return ret;
}

How to defeat the Quicksort killer?

As the original paper explains, the adversary comparer works if the quicksort implementation checks an O(1) number of elements as pivot candidates. But, quicksort implementation that check more than O(1) elements are possible. For example, the quicksort might choose the median element as the pivot at each step, thus always geting a perfect split of the input sequence into two halves. Median can be found deterministically in O(N) running time, and so the total running time is always O(N logN). The median-based quicksort is rarely used in practice because it tends to have a larger constant than other quicksort implementations.

Another solution is to use a regular quicksort algorithm, and degrade to another sorting algorithm if quicksort is not working out. For example, if we reached a recursive depth of 10, and the size of our partition has not reduced by at least a half, we can just ditch quicksort and sort the partition using heapsort.