<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Igor Ostrovsky Blogging &#187; Algorithms</title>
	<atom:link href="http://igoro.com/archive/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://igoro.com</link>
	<description>On programming, technology, and random things of interest</description>
	<lastBuildDate>Fri, 23 Jul 2010 05:24:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Program like a Quake developer</title>
		<link>http://igoro.com/archive/program-like-a-quake-developer/</link>
		<comments>http://igoro.com/archive/program-like-a-quake-developer/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 06:22:52 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=192</guid>
		<description><![CDATA[Not many developers have the insights of Michael Abrash. He is a game developer with decades of experience building commercial games, including a game you may recognize as &#34;Quake&#34;. His Graphics Programming Black Book is years old, but much of it is just as interesting as it was at the time of writing. And, the [...]]]></description>
			<content:encoded><![CDATA[<p>Not many developers have the insights of Michael Abrash. He is a game developer with decades of experience building commercial games, including a game you may recognize as &quot;Quake&quot;. His Graphics Programming Black Book is years old, but much of it is just as interesting as it was at the time of writing. And, the entire book is available <a href="http://www.gamedev.net/reference/articles/article1698.asp">online</a>, for free.</p>
<p>The book consists of 70 chapters on optimization, graphics, and assembly programming. The entire book is insightful and interesting, but my favorite chapters are these:</p>
<ul>
<li><a href="http://downloads.gamedev.net/pdf/gpbb/gpbb1.pdf">The Best Optimizer Is between Your Ears</a></li>
<li><a href="http://downloads.gamedev.net/pdf/gpbb/gpbb16.pdf">There Ain’t No Such Thing as the Fastest Code</a></li>
<li>Optimizing for <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb11.pdf">286 and 386</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb12.pdf">486</a> (<a href="http://downloads.gamedev.net/pdf/gpbb/gpbb13.pdf">continued</a>), and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb19.pdf">Pentium</a> (continued <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb20.pdf">here</a> and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb21.pdf">here</a>).</li>
<li>Algorithms for fast drawing of <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb35.pdf">lines</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb42.pdf">anti-aliased lines</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb38.pdf">polygons</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb39.pdf">fast convex polygons</a>, and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb50.pdf">3D objects</a> (<a href="http://downloads.gamedev.net/pdf/gpbb/gpbb51.pdf">continued</a>).</li>
<li>Quake’s <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb64.pdf">visible-surface determination</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb65.pdf">3D clipping</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb66.pdf">hidden-surface removal</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb67.pdf">span sorting</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb68.pdf">lighting</a>, <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb69.pdf">surface caching</a> and <a href="http://downloads.gamedev.net/pdf/gpbb/gpbb70.pdf">post-mortem</a>.</li>
</ul>
<p>Enjoy! Again, the entire book is accessible here: <a href="http://www.gamedev.net/reference/articles/article1698.asp">Graphics Programming Black Book</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/program-like-a-quake-developer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Efficient auto-complete with a ternary search tree</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/</link>
		<comments>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 00:27:54 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=177</guid>
		<description><![CDATA[Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing. Auto-complete definitely makes for a nice user experience, but it can be a [...]]]></description>
			<content:encoded><![CDATA[<style>
	pre.code {
		margin-left: 40px;
	}</style>
<p>Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing.</p>
<p>Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe one simple data structure that can be used to implement auto-complete: a ternary search tree.</p>
<h3>Trie: simple but space-inefficient</h3>
<p> <span id="more-177"></span>
<p>Before discussing ternary search trees, let&#8217;s take a look at a simple data structure that supports a fast auto-complete lookup but needs too much memory: a <em>trie</em>. A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet. Starting at the root node, we can trace a word by following pointers corresponding to the letters in the target word.</p>
<p>Each node could be implemented like this in C#:</p>
<pre class="code"><span style="color: blue">class </span><span style="color: #2b91af">TrieNode
</span>{
    <span style="color: blue">public const int </span>ALPHABET_SIZE = 26;
    <span style="color: blue">public </span><span style="color: #2b91af">TrieNode</span>[] m_pointers = <span style="color: blue">new </span><span style="color: #2b91af">TrieNode</span>[ALPHABET_SIZE];
    <span style="color: blue">public bool </span>m_endsString = <span style="color: blue">false</span>;
}</pre>
<p>Here is a trie that stores words AB, ABBA, ABCD, and BCD. Nodes that terminate words are marked yellow:</p>
<p>&#160;</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="545" alt="gif_1" src="http://igoro.com/wordpress/wp-content/uploads/2009/08/gif-1.gif" width="577" border="0" /></p>
<p>&#160;</p>
<p>Implementing auto-complete using a trie is easy. We simply trace pointers to get to the node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete the user&#8217;s input.</p>
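<p>As a rough illustration of that lookup, here is a minimal C# sketch of a trie with insertion and prefix enumeration. It assumes that words consist only of the lowercase letters a to z; the SimpleTrie class and its Add and Complete methods are hypothetical names, not part of the original code.</p>
<pre class="code">using System;
using System.Collections.Generic;
using System.Text;

// Minimal trie sketch; assumes words use only the lowercase letters 'a'..'z'.
public class SimpleTrie
{
    private const int ALPHABET_SIZE = 26;

    private class Node
    {
        public Node[] m_pointers = new Node[ALPHABET_SIZE];
        public bool m_endsString = false;
    }

    private readonly Node m_root = new Node();

    public void Add(string word)
    {
        Node node = m_root;
        foreach (char c in word)
        {
            int i = c - 'a'; // which of the 26 pointers to follow
            if (node.m_pointers[i] == null) node.m_pointers[i] = new Node();
            node = node.m_pointers[i];
        }
        node.m_endsString = true;
    }

    // Returns every stored word that starts with 'prefix', in alphabetical order.
    public List&lt;string&gt; Complete(string prefix)
    {
        var results = new List&lt;string&gt;();
        Node node = m_root;
        foreach (char c in prefix)
        {
            node = node.m_pointers[c - 'a'];
            if (node == null) return results; // no stored word has this prefix
        }
        Collect(node, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk below 'node', recording each node that ends a word.
    private static void Collect(Node node, StringBuilder current, List&lt;string&gt; results)
    {
        if (node.m_endsString) results.Add(current.ToString());
        for (int i = 0; i &lt; ALPHABET_SIZE; i++)
        {
            if (node.m_pointers[i] == null) continue;
            current.Append((char)('a' + i));
            Collect(node.m_pointers[i], current, results);
            current.Length--; // undo the append before trying the next letter
        }
    }
}</pre>
<p>For example, after adding ab, abba, abcd and bcd (the words from the diagram, in lowercase), Complete("ab") returns ab, abba and abcd in alphabetical order. Note how every node allocates all 26 pointers, even though most of them stay null.</p>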
<p>But a trie has a major problem, which you can see in the diagram above. The diagram only fits on the page because the trie only supports four letters {A,B,C,D}. If we needed to support all 26 English letters, each node would have to store 26 pointers. And if we need to support international characters, punctuation, or distinguish between lowercase and uppercase characters, the memory usage becomes untenable.</p>
<p>Our problem has to do with the memory taken up by all the null pointers stored in the node arrays. We could consider using a different data structure in each node, such as a hash map. However, managing thousands and thousands of hash maps is generally not a good idea, so let’s take a look at a better solution.</p>
<h3>Ternary search tree to the rescue</h3>
<p>A ternary search tree is a data structure that solves the memory problem of tries in a more clever way. To avoid the memory occupied by unnecessary pointers, each trie node is represented as a tree-within-a-tree rather than as an array. Each non-null pointer in the trie node gets its own node in the ternary search tree.</p>
<p>For example, the trie from the example above would be represented in the following way as a ternary search tree:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image.png" width="183" border="0" /></p>
<p>The ternary search tree contains three types of arrows. First, there are arrows that correspond to the pointers in the trie, shown as dashed down-arrows. Traversing a down-arrow corresponds to “matching” the character from which the arrow starts. The left- and right-arrows are traversed when the current character does not match the desired character at the current position. We take the left-arrow if the character we are looking for is alphabetically before the character in the current node, and the right-arrow in the opposite case.</p>
<p>For example, green arrows show how we’d confirm that the ternary tree contains the string ABBA:</p>
<p>&#160;<img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image1.png" width="183" border="0" /></p>
<p>And this is how we’d find that the ternary search tree does not contain the string ABD:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="375" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image2.png" width="183" border="0" />&#160;</p>
<h3>Ternary search tree on a server</h3>
<p>On the web, a significant chunk of the auto-complete work has to be done by the server. Often, the set of possible completions is large, so it is usually not a good idea to download all of it to the client. Instead, the ternary tree is stored on the server, and the client will send prefix queries to the server.</p>
<p>The client will send a query for words starting with “bin” to the server:</p>
<p>&#160; <img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="218" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image3.png" width="639" border="0" /></p>
<p>And the server responds with a list of possible words:</p>
<p><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="218" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2009/09/image4.png" width="452" border="0" />&#160;</p>
<h3>Implementation</h3>
<p>Here is a simple ternary search tree implementation in C#:</p>
<pre class="code"><span style="color: blue">using </span>System;

<span style="color: blue">public </span><span style="color: blue">class </span><span style="color: #2b91af">TernaryTree
</span>{
    <span style="color: blue">private </span><span style="color: #2b91af">Node </span>m_root = <span style="color: blue">null</span>;

    <span style="color: green">// Inserts s[pos..] into the subtree rooted at node, creating nodes as needed.</span>
    <span style="color: blue">private void </span>Add(<span style="color: blue">string </span>s, <span style="color: blue">int </span>pos, <span style="color: blue">ref </span><span style="color: #2b91af">Node </span>node)
    {
        <span style="color: blue">if </span>(node == <span style="color: blue">null</span>) { node = <span style="color: blue">new </span><span style="color: #2b91af">Node</span>(s[pos], <span style="color: blue">false</span>); }

        <span style="color: green">// A mismatch moves left or right within the same "trie node".</span>
        <span style="color: blue">if </span>(s[pos] &lt; node.m_char) { Add(s, pos, <span style="color: blue">ref </span>node.m_left); }
        <span style="color: blue">else if </span>(s[pos] &gt; node.m_char) { Add(s, pos, <span style="color: blue">ref </span>node.m_right); }
        <span style="color: blue">else
        </span>{
            <span style="color: green">// A match either ends the word here or continues with the next character.</span>
            <span style="color: blue">if </span>(pos + 1 == s.Length) { node.m_wordEnd = <span style="color: blue">true</span>; }
            <span style="color: blue">else </span>{ Add(s, pos + 1, <span style="color: blue">ref </span>node.m_center); }
        }
    }

    <span style="color: blue">public void </span>Add(<span style="color: blue">string </span>s)
    {
        <span style="color: blue">if </span>(s == <span style="color: blue">null </span>|| s == <span style="color: #a31515">&quot;&quot;</span>) <span style="color: blue">throw new </span><span style="color: #2b91af">ArgumentException</span>();

        Add(s, 0, <span style="color: blue">ref </span>m_root);
    }

    <span style="color: blue">public bool </span>Contains(<span style="color: blue">string </span>s)
    {
        <span style="color: blue">if </span>(s == <span style="color: blue">null </span>|| s == <span style="color: #a31515">&quot;&quot;</span>) <span style="color: blue">throw new </span><span style="color: #2b91af">ArgumentException</span>();

        <span style="color: blue">int </span>pos = 0;
        <span style="color: #2b91af">Node </span>node = m_root;
        <span style="color: blue">while </span>(node != <span style="color: blue">null</span>)
        {
            <span style="color: green">// Follow left/right arrows until the current character matches...</span>
            <span style="color: blue">if </span>(s[pos] &lt; node.m_char) { node = node.m_left; }
            <span style="color: blue">else if </span>(s[pos] &gt; node.m_char) { node = node.m_right; }
            <span style="color: blue">else
            </span>{
                <span style="color: green">// ...then take the down-arrow (m_center) to the next character.</span>
                <span style="color: blue">if </span>(++pos == s.Length) <span style="color: blue">return </span>node.m_wordEnd;
                node = node.m_center;
            }
            }
        }

        <span style="color: blue">return false</span>;
    }
}</pre>
<p>And here is the Node class:</p>
<pre class="code"><span style="color: blue">class </span><span style="color: #2b91af">Node
</span>{
    <span style="color: blue">internal char </span>m_char;
    <span style="color: blue">internal </span><span style="color: #2b91af">Node </span>m_left, m_center, m_right;
    <span style="color: blue">internal bool </span>m_wordEnd;

    <span style="color: blue">public </span>Node(<span style="color: blue">char </span>ch, <span style="color: blue">bool </span>wordEnd)
    {
        m_char = ch;
        m_wordEnd = wordEnd;
    }
}</pre>
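<p>TernaryTree above only answers exact-match queries, while auto-complete needs every word that starts with a given prefix. Here is one possible C# sketch of such a prefix query. To keep it self-contained it repeats a compact version of the tree; the PrefixTernaryTree class and its StartsWith method are hypothetical names, and this is an illustration rather than the original post&#8217;s code.</p>
<pre class="code">using System;
using System.Collections.Generic;
using System.Text;

public class PrefixTernaryTree
{
    private class Node
    {
        internal char m_char;
        internal Node m_left, m_center, m_right;
        internal bool m_wordEnd;
        internal Node(char ch) { m_char = ch; }
    }

    private Node m_root;

    public void Add(string s)
    {
        if (string.IsNullOrEmpty(s)) throw new ArgumentException("s must be non-empty");
        m_root = Add(m_root, s, 0);
    }

    // Same insertion logic as TernaryTree.Add, returning the (possibly new) subtree root.
    private static Node Add(Node node, string s, int pos)
    {
        if (node == null) node = new Node(s[pos]);
        if (s[pos] &lt; node.m_char) node.m_left = Add(node.m_left, s, pos);
        else if (s[pos] &gt; node.m_char) node.m_right = Add(node.m_right, s, pos);
        else if (pos + 1 == s.Length) node.m_wordEnd = true;
        else node.m_center = Add(node.m_center, s, pos + 1);
        return node;
    }

    // Returns all stored words that start with 'prefix', sorted by character code.
    public List&lt;string&gt; StartsWith(string prefix)
    {
        if (string.IsNullOrEmpty(prefix)) throw new ArgumentException("prefix must be non-empty");
        var results = new List&lt;string&gt;();

        // Step 1: find the node that matches the last character of the prefix.
        Node node = m_root;
        int pos = 0;
        while (node != null)
        {
            if (prefix[pos] &lt; node.m_char) node = node.m_left;
            else if (prefix[pos] &gt; node.m_char) node = node.m_right;
            else if (++pos == prefix.Length) break;
            else node = node.m_center;
        }
        if (node == null) return results; // no stored word has this prefix

        // Step 2: the prefix may itself be a word; longer completions hang off m_center.
        if (node.m_wordEnd) results.Add(prefix);
        Collect(node.m_center, new StringBuilder(prefix), results);
        return results;
    }

    // In-order walk: smaller characters (left), this character and its
    // completions (center), then larger characters (right).
    private static void Collect(Node node, StringBuilder current, List&lt;string&gt; results)
    {
        if (node == null) return;
        Collect(node.m_left, current, results);
        current.Append(node.m_char);
        if (node.m_wordEnd) results.Add(current.ToString());
        Collect(node.m_center, current, results);
        current.Length--;
        Collect(node.m_right, current, results);
    }
}</pre>
<p>The prefix is located with the same left/right/down walk as Contains; the completions are then collected by an in-order walk of the subtree below the final down-arrow, so they come out sorted. For example, with bing, bin, binary, bind, bit and cat inserted, StartsWith("bin") returns bin, binary, bind and bing.</p>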
<h3>Remarks</h3>
<p>For best performance, strings should be inserted into the ternary tree in random order. In particular, do not insert strings in alphabetical order: each mini-tree that corresponds to a single trie node would degenerate into a linked list, significantly increasing the cost of lookups. Of course, more complex self-balancing ternary trees can be implemented as well.</p>
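<p>If the words come from an already-sorted source, such as a sorted dictionary file, one simple way to follow this advice is to shuffle them before inserting. Here is a minimal C# sketch using a Fisher-Yates shuffle; InsertionOrder and ShuffleForInsertion are hypothetical names.</p>
<pre class="code">using System;
using System.Collections.Generic;

public static class InsertionOrder
{
    // Returns a copy of 'words' in uniformly random order (Fisher-Yates shuffle),
    // so that sorted input does not degenerate the tree's mini-trees into lists.
    public static string[] ShuffleForInsertion(IEnumerable&lt;string&gt; words, Random rng)
    {
        string[] result = new List&lt;string&gt;(words).ToArray();
        for (int i = result.Length - 1; i &gt; 0; i--)
        {
            int j = rng.Next(i + 1); // 0 &lt;= j &lt;= i
            string tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return result;
    }
}</pre>
<p>Usage would look roughly like this: foreach (string w in InsertionOrder.ShuffleForInsertion(words, new Random())) tree.Add(w);</p>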
<p>And don’t use a fancier data structure than you have to. If you only have a relatively small set of candidate words (say, on the order of hundreds), a brute-force search should be fast enough.</p>
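<p>For comparison, a brute-force version can be a single LINQ query over the candidate list. This is a sketch; the case-insensitive matching and the maxResults cap are assumptions, not requirements from the text.</p>
<pre class="code">using System;
using System.Collections.Generic;
using System.Linq;

public static class BruteForceComplete
{
    // Linear scan over every candidate: O(N * prefix length) per query,
    // which is usually fine for a few hundred words.
    public static List&lt;string&gt; Complete(IEnumerable&lt;string&gt; candidates, string prefix, int maxResults)
    {
        return candidates
            .Where(w =&gt; w.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .OrderBy(w =&gt; w, StringComparer.OrdinalIgnoreCase)
            .Take(maxResults)
            .ToList();
    }
}</pre>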
<h3>Further reading</h3>
<p>Another article on tries is available on DDJ (careful, their implementation assumes that no word is a prefix of another):</p>
<p><a title="http://www.ddj.com/windows/184410528" href="http://www.ddj.com/windows/184410528">http://www.ddj.com/windows/184410528</a></p>
<p>If you like this article, also check out these posts on my blog:</p>
<ul>
<li><a href="http://igoro.com/archive/skip-lists-are-fascinating/">Skip lists are fascinating!</a> </li>
<li><a href="http://igoro.com/archive/numbers-that-cannot-be-computed/">Numbers that cannot be computed</a> </li>
<li><a href="http://igoro.com/archive/quicksort-killer/">Quicksort killer</a> </li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Data structure zoo: ordered set</title>
		<link>http://igoro.com/archive/data-structure-zoo-ordered-set/</link>
		<comments>http://igoro.com/archive/data-structure-zoo-ordered-set/#comments</comments>
		<pubDate>Fri, 29 Aug 2008 09:01:16 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/archive/data-structure-zoo-ordered-set/</guid>
		<description><![CDATA[This article is the first one in a series titled Data structure zoo. Each article will give you a &#8220;working knowledge&#8221; of data structures that solve a particular problem. You won&#8217;t necessarily know how to implement each one, but you will have a good idea of the main characteristics of each solution and how to [...]]]></description>
			<content:encoded><![CDATA[<p>This article is the first one in a series titled Data structure zoo. Each article will give you a &#8220;working knowledge&#8221; of data structures that solve a particular problem. You won&#8217;t necessarily know how to implement each one, but you will have a good idea of the main characteristics of each solution and how to pick among them to solve your particular problem.</p>
<p>Today, I am writing about data structures that represent an ordered set. An ordered set is a common data structure that supports O(log N) lookups, insertions and removals. Ordered sets are also sometimes used as an alternative to hash maps; for example, STL&#8217;s map is an ordered structure rather than a hash table.</p>
<p><span id="more-77"></span></p>
<p>Complexities of various operations on an ordered set are as follows:</p>
<ul>
<li>O(log N) insertion and removal.</li>
<li>O(log N) check whether the set contains a value.</li>
<li>O(N) enumeration in sorted order.</li>
<li>O(log N) to find the element in the set closest to some value.</li>
<li>O(log N) to find the k-th largest element in the set. This operation requires a simple augmentation of the ordered set with partial counts.</li>
<li>O(log N) to count the number of elements in the set whose values fall into a given range. This also requires a simple augmentation of the ordered set.</li>
</ul>
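<p>To make the &#8220;partial counts&#8221; augmentation more concrete, here is a C# sketch in which each node also stores the size of its subtree. For brevity this tree is not self-balancing, so each operation costs O(height) rather than a guaranteed O(log N); in any of the balanced trees described below, the same bookkeeping gives O(log N). CountingBst and its members are hypothetical names.</p>
<pre class="code">using System;

// Unbalanced BST of distinct ints; each node also records its subtree size.
public class CountingBst
{
    private class Node
    {
        public int Value, Size = 1;
        public Node Left, Right;
        public Node(int value) { Value = value; }
    }

    private Node m_root;

    private static int SizeOf(Node n) { return n == null ? 0 : n.Size; }

    public int Count { get { return SizeOf(m_root); } }

    public void Add(int value) { m_root = Add(m_root, value); }

    private static Node Add(Node node, int value)
    {
        if (node == null) return new Node(value);
        if (value &lt; node.Value) node.Left = Add(node.Left, value);
        else if (value &gt; node.Value) node.Right = Add(node.Right, value);
        else return node; // already present: sizes unchanged
        node.Size = 1 + SizeOf(node.Left) + SizeOf(node.Right);
        return node;
    }

    // Returns the element with the given 0-based rank in ascending order.
    public int Select(int rank)
    {
        if (rank &lt; 0 || rank &gt;= Count) throw new ArgumentOutOfRangeException("rank");
        Node node = m_root;
        while (true)
        {
            int leftSize = SizeOf(node.Left);
            if (rank &lt; leftSize) node = node.Left;
            else if (rank == leftSize) return node.Value;
            else { rank -= leftSize + 1; node = node.Right; }
        }
    }

    // 1-based: KthLargest(1) is the maximum.
    public int KthLargest(int k) { return Select(Count - k); }

    // Number of elements &lt; value (or &lt;= value when 'inclusive' is true).
    private int CountBelow(int value, bool inclusive)
    {
        int count = 0;
        Node node = m_root;
        while (node != null)
        {
            bool goRight = inclusive ? node.Value &lt;= value : node.Value &lt; value;
            if (goRight) { count += SizeOf(node.Left) + 1; node = node.Right; }
            else node = node.Left;
        }
        return count;
    }

    // Number of elements in the inclusive range [low, high].
    public int CountInRange(int low, int high)
    {
        if (low &gt; high) return 0;
        return CountBelow(high, true) - CountBelow(low, false);
    }
}</pre>
<p>Both queries walk a single root-to-leaf path and use the stored sizes to skip whole subtrees, which is why they cost the same as a lookup.</p>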
<p>Now, let&#8217;s look at the different ways to implement an ordered set.</p>
<p><strong>AVL Tree</strong></p>
<p>In 1962, Russian mathematicians G.M. Adelson-Velsky and E.M. Landis presented the first solution to the ordered set problem, in a paper called &#8220;An algorithm for organization of information&#8221;. Given the ambitious title of the paper, it is clear that they knew they were onto something big. The algorithm of Adelson-Velsky and Landis stores elements of the set in a binary search tree and keeps the tree nearly balanced: for every node, the heights of its left and right subtrees differ by at most one. The tree rebalances itself after each modification, hence it is an example of a <em>self-balancing</em> tree. Here is an example of an AVL tree:</p>
<p>&nbsp;<img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="181" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2008/08/image.png" width="349" border="0"></p>
<p>Notice that the tree is nicely organized, so that the number of steps needed to reach any node from the root stays small: an AVL tree with N nodes has height O(log N). That is a very important property, because it ensures that basic operations such as lookups, insertions and removals take O(log N) time.</p>
<p>This idea is simple, but its implementation is not. The difficulty in the AVL tree design is inserting and removing nodes, while maintaining the balanced shape. You can read about the algorithm <a href="http://en.wikipedia.org/wiki/AVL_tree">here</a>. These are a few important things to know about how rebalancing works:</p>
<ul>
<li>Rebalancing the tree consists of O(log N) constant-time operations called rotations.
<li>Consequently, insertion or removal from the tree followed by rebalancing can be done in O(log N) time.
<li>Each rotation is a rearrangement of a small number of neighboring nodes in the tree. </li>
</ul>
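<p>To make rotations more concrete, here is a minimal C# sketch of AVL insertion for distinct int keys (removal is omitted). Each node caches its height; on the way back up from an insertion, a single or double rotation is applied wherever the two subtree heights differ by more than one. For insertion, at most one such single or double rotation is triggered per operation, because it restores the subtree to its previous height. AvlTree and its members are hypothetical names.</p>
<pre class="code">using System;

// Minimal AVL tree sketch (insertion and lookup) for distinct int keys.
public class AvlTree
{
    private class Node
    {
        public int Key, Height = 1;
        public Node Left, Right;
        public Node(int key) { Key = key; }
    }

    private Node m_root;

    private static int H(Node n) { return n == null ? 0 : n.Height; }
    private static void Update(Node n) { n.Height = 1 + Math.Max(H(n.Left), H(n.Right)); }

    // Right rotation: the left child becomes the root of this subtree.
    private static Node RotateRight(Node y)
    {
        Node x = y.Left;
        y.Left = x.Right;
        x.Right = y;
        Update(y); Update(x);
        return x;
    }

    // Left rotation: mirror image of RotateRight.
    private static Node RotateLeft(Node x)
    {
        Node y = x.Right;
        x.Right = y.Left;
        y.Left = x;
        Update(x); Update(y);
        return y;
    }

    public void Add(int key) { m_root = Add(m_root, key); }

    private static Node Add(Node node, int key)
    {
        if (node == null) return new Node(key);
        if (key &lt; node.Key) node.Left = Add(node.Left, key);
        else if (key &gt; node.Key) node.Right = Add(node.Right, key);
        else return node; // duplicate: nothing to do

        Update(node);
        int balance = H(node.Left) - H(node.Right);
        if (balance &gt; 1) // left side too tall
        {
            if (key &gt; node.Left.Key) node.Left = RotateLeft(node.Left); // left-right case
            return RotateRight(node);
        }
        if (balance &lt; -1) // right side too tall
        {
            if (key &lt; node.Right.Key) node.Right = RotateRight(node.Right); // right-left case
            return RotateLeft(node);
        }
        return node;
    }

    public bool Contains(int key)
    {
        Node node = m_root;
        while (node != null)
        {
            if (key == node.Key) return true;
            node = key &lt; node.Key ? node.Left : node.Right;
        }
        return false;
    }

    public int Height { get { return H(m_root); } }
}</pre>
<p>Inserting the keys 1 to 1000 in increasing order, the worst case for a plain binary search tree, still leaves this tree only a handful of levels deep.</p>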
<p><strong></strong>&nbsp;</p>
<p><strong>Red-black Tree</strong></p>
<p>A red-black tree is another popular self-balancing binary search tree, but one that makes slightly different trade-offs from the AVL tree. The main difference is that a red-black tree is not quite as neatly organized as an AVL tree. In an AVL tree, the heights of the two subtrees of any node differ by at most one. In a red-black tree, the longest path from the root to a leaf can be up to twice as long as the shortest such path.</p>
<p>So, you could say that a red-black tree is rebalanced more haphazardly. A lookup in a red-black tree may take longer, but on the plus side, the haphazard rebalancing is cheaper. The tree marks each node as either red or black. The node colors are used by a complex rebalancing algorithm that I will not detail here, but that you can read all about <a href="http://en.wikipedia.org/wiki/Red-black_tree">here</a>.</p>
<p>In practice, red-black trees tend to have slower lookups, but faster insertions and removals than AVL trees. Fast insertions and removals make the red-black tree a popular data structure in system programming. For example, according to <a href="http://lwn.net/Articles/184495/">this article</a>, the Linux kernel uses red-black trees to organize requests in various schedulers, the packet CD/DVD driver and the high-resolution timer, to track directory entries in an ext3 filesystem, and to look up virtual memory areas, epoll file descriptors, cryptographic keys, and network packets. Red-black trees also underlie Java&#8217;s TreeSet and, in common implementations, STL&#8217;s map.</p>
<p>Here is an example of a red-black tree. Notice that the tree has more levels than it absolutely would have to, but not by much (only by one level in this case):</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="229" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2008/08/image1.png" width="325" border="0">&nbsp;</p>
<p><strong></strong>&nbsp;</p>
<p><strong>Treap</strong></p>
<p>One interesting fact about self-balancing trees is that if you only insert random numbers, all of the tricky rebalancing is in fact unnecessary. It can be mathematically proven that a sequence of random insertions is nearly guaranteed to produce a fairly well-balanced tree, with expected height O(log N). Self-balancing trees are only needed for special (but common) cases, such as when elements are inserted in increasing or nearly-increasing order.</p>
<p>If we could ensure that elements are inserted in a random order, we would not need rebalancing. Of course, we cannot do that because in most scenarios we don&#8217;t have control over the order of insertions. But, we can still simulate a random insertion order, which is the main idea behind a treap.</p>
<p>Like an AVL tree or a red-black tree, a treap is based on a binary search tree. As each value is inserted, the treap assigns it a randomly chosen priority. Then, the treap rearranges the tree as if values had been inserted in priority order. So, values with low priorities will move closer to the root than values with higher priorities. In other words, a treap has the heap property with respect to priorities.</p>
<p>Treaps are significantly easier to implement than AVL or red-black trees, and their operations have the same O(log N) time complexity in expectation. Theoretically, a treap could degenerate so that lookups become O(N) rather than O(log N), but the probability of that happening is extremely small.</p>
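<p>Here is a minimal C# sketch of treap insertion for distinct int keys, assuming random priorities where a lower priority means closer to the root, as described above. The new value is inserted as an ordinary leaf and then rotated upward while its priority is lower than its parent&#8217;s. The Treap class, its seed parameter and its members are hypothetical names.</p>
<pre class="code">using System;

// Minimal treap sketch (insertion and lookup) for distinct int keys.
// Lower priority = closer to the root (min-heap on priorities).
public class Treap
{
    private class Node
    {
        public int Key, Priority;
        public Node Left, Right;
        public Node(int key, int priority) { Key = key; Priority = priority; }
    }

    private readonly Random m_random;
    private Node m_root;

    public Treap(int seed) { m_random = new Random(seed); }

    public void Add(int key) { m_root = Add(m_root, key); }

    private Node Add(Node node, int key)
    {
        // Ordinary BST insertion; the new leaf gets a random priority...
        if (node == null) return new Node(key, m_random.Next());
        if (key &lt; node.Key)
        {
            node.Left = Add(node.Left, key);
            // ...and is rotated up while its priority is lower than its parent's.
            if (node.Left.Priority &lt; node.Priority) node = RotateRight(node);
        }
        else if (key &gt; node.Key)
        {
            node.Right = Add(node.Right, key);
            if (node.Right.Priority &lt; node.Priority) node = RotateLeft(node);
        }
        return node; // duplicates are ignored
    }

    private static Node RotateRight(Node y) { Node x = y.Left; y.Left = x.Right; x.Right = y; return x; }
    private static Node RotateLeft(Node x) { Node y = x.Right; x.Right = y.Left; y.Left = x; return y; }

    public bool Contains(int key)
    {
        Node node = m_root;
        while (node != null)
        {
            if (key == node.Key) return true;
            node = key &lt; node.Key ? node.Left : node.Right;
        }
        return false;
    }

    public int Height { get { return HeightOf(m_root); } }
    private static int HeightOf(Node n) { return n == null ? 0 : 1 + Math.Max(HeightOf(n.Left), HeightOf(n.Right)); }
}</pre>
<p>Compared with the AVL sketch, there is no height bookkeeping and no case analysis: the random priorities alone make the expected shape balanced, even when keys arrive in sorted order.</p>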
<p>&nbsp;</p>
<p><strong>Splay Tree</strong></p>
<p>A splay tree is another self-balancing tree, but with an additional interesting property: the element that has been accessed most recently is at the root, and other recently accessed elements are also near the top. If we search for an element that we looked up recently, we will find it very quickly because it is near the top. This is a nice property because many real-world programs will access some values much more frequently than other values.</p>
<p>Unfortunately, a splay tree pays two penalties for this nice cache-like behavior. The first penalty is that a single lookup may take O(N) time in the worst case, even though the amortized cost over a sequence of operations is guaranteed to be O(log N) per operation.</p>
<p>The second, even more serious, penalty is that lookups modify the tree, performing rotations to move the accessed element to the root. The fact that a lookup performs O(log N) amortized writes into the tree significantly adds to its cost on modern hardware.</p>
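<p>The sketch below shows where those writes come from: Contains itself calls Splay, which restructures the tree with rotations. It is a minimal recursive splay tree for int keys (insertion and lookup only), loosely following a common textbook formulation with zig, zig-zig and zig-zag steps; SplayTree and its members are hypothetical names.</p>
<pre class="code">using System;

// Minimal splay tree sketch (insertion and lookup) for distinct int keys.
public class SplayTree
{
    private class Node
    {
        public int Key;
        public Node Left, Right;
        public Node(int key) { Key = key; }
    }

    private Node m_root;

    public int? RootKey { get { return m_root == null ? (int?)null : m_root.Key; } }

    private static Node RotateRight(Node y) { Node x = y.Left; y.Left = x.Right; x.Right = y; return x; }
    private static Node RotateLeft(Node x) { Node y = x.Right; x.Right = y.Left; y.Left = x; return y; }

    // Moves the node with 'key', or the last node on its search path, to the root.
    private static Node Splay(Node root, int key)
    {
        if (root == null || root.Key == key) return root;
        if (key &lt; root.Key)
        {
            if (root.Left == null) return root;
            if (key &lt; root.Left.Key)        // zig-zig (left-left)
            {
                root.Left.Left = Splay(root.Left.Left, key);
                root = RotateRight(root);
            }
            else if (key &gt; root.Left.Key)   // zig-zag (left-right)
            {
                root.Left.Right = Splay(root.Left.Right, key);
                if (root.Left.Right != null) root.Left = RotateLeft(root.Left);
            }
            return root.Left == null ? root : RotateRight(root); // final zig
        }
        else
        {
            if (root.Right == null) return root;
            if (key &gt; root.Right.Key)       // zig-zig (right-right)
            {
                root.Right.Right = Splay(root.Right.Right, key);
                root = RotateLeft(root);
            }
            else if (key &lt; root.Right.Key)  // zig-zag (right-left)
            {
                root.Right.Left = Splay(root.Right.Left, key);
                if (root.Right.Left != null) root.Right = RotateRight(root.Right);
            }
            return root.Right == null ? root : RotateLeft(root); // final zig
        }
    }

    // Note: even a read-only lookup rewrites the tree.
    public bool Contains(int key)
    {
        m_root = Splay(m_root, key);
        return m_root != null &amp;&amp; m_root.Key == key;
    }

    public void Add(int key)
    {
        if (m_root == null) { m_root = new Node(key); return; }
        m_root = Splay(m_root, key); // root is now key's neighbor in sorted order
        if (m_root.Key == key) return;
        var node = new Node(key);
        if (key &lt; m_root.Key) { node.Right = m_root; node.Left = m_root.Left; m_root.Left = null; }
        else { node.Left = m_root; node.Right = m_root.Right; m_root.Right = null; }
        m_root = node;
    }
}</pre>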
<p>The FreeBSD operating system uses splay trees to organize memory pages in its virtual memory manager. However, it seems that at least some contributors do not consider splay trees to be ideal for this purpose, and proposed a Google Summer of Code <a href="http://code.google.com/soc/2008/freebsd/appinfo.html?csaid=40F1D83FF04D3C45">project</a> to modify the virtual memory manager to use a different data structure.</p>
<p>&nbsp;</p>
<p><strong>Skip List</strong></p>
<p>A skip list is a curiously different solution to the ordered set problem. The skip list is a relatively new data structure: it was invented in 1990 by William Pugh, a professor of computer science at the University of Maryland. Like a treap, a skip list exploits randomness to implement an ordered set in a simple way. A skip list is essentially an ordered linked list&#8230; except that we add more links!</p>
<p>The problem with an ordered linked list is that to get into the middle of the list, we need to scan over half of the list, making the complexity of the lookup O(N). Skip list adds links that move more than one node forward, skipping over some nodes.</p>
<p>We use randomness to decide where to add links that skip over nodes. For a detailed description of how skip lists work and a C# implementation, see my <a href="http://igoro.com/archive/skip-lists-are-fascinating/">earlier article</a> on skip lists.</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="161" alt="skiplist" src="http://igoro.com/wordpress/wp-content/uploads/2008/08/skiplist.png" width="553" border="0"></p>
<p><strong></strong>&nbsp;</p>
<p><strong>Pre-Balanced Tree</strong></p>
<p>If you try to find information on pre-balanced trees, you will probably fail. That is because I don&#8217;t know what the real name of &#8220;pre-balanced trees&#8221; is, or even whether they have one.</p>
<p>It is simply a data structure that occurred to me while thinking about a problem in a programming contest. A pre-balanced tree is another possible solution to the ordered set problem. A pre-balanced tree has one huge advantage and one huge disadvantage over the other data structures mentioned in this article. The advantage is that it is brain-dead simple, even simpler than a treap or a skip list. The disadvantage is that we need to know the values that we will insert into the set ahead of time.</p>
<p>&#8220;Know the values that will be inserted in the future? This person is clearly crazy,&#8221; you are probably thinking. Well, I may be, but it still turns out that in many real-world problems, you actually know the values that will be inserted ahead of time. For example, a very common use case is to iterate over a sequence of elements, incrementally adding them into the set, and observing the state of the set in between the insertions.</p>
<p>In such cases, we can construct a fully balanced binary search tree that contains all values that will eventually be inserted. Initially, all nodes are &#8220;disabled&#8221;. Inserting a value enables its node. To look up a value, find the node in the binary tree that contains the value. The set contains that value if such a node exists and is enabled.</p>
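<p>The idea translates almost directly into code. Here is a minimal C# sketch, assuming int values; PreBalancedSet is a hypothetical name, and having Add throw for a value that was not declared up front is just one possible policy.</p>
<pre class="code">using System;
using System.Collections.Generic;

// Sketch of a "pre-balanced tree": every value that will ever be inserted is known
// up front, so a perfectly balanced BST is built once and insertion only flips a flag.
public class PreBalancedSet
{
    private class Node
    {
        public int Value;
        public bool Enabled; // false until the value is inserted
        public Node Left, Right;
    }

    private readonly Node m_root;

    public PreBalancedSet(IEnumerable&lt;int&gt; futureValues)
    {
        // SortedSet both sorts the values and removes duplicates.
        var sorted = new List&lt;int&gt;(new SortedSet&lt;int&gt;(futureValues));
        m_root = Build(sorted, 0, sorted.Count - 1);
    }

    // The middle element becomes the root, so the height is about log2(N).
    private static Node Build(List&lt;int&gt; sorted, int lo, int hi)
    {
        if (lo &gt; hi) return null;
        int mid = lo + (hi - lo) / 2;
        return new Node
        {
            Value = sorted[mid],
            Left = Build(sorted, lo, mid - 1),
            Right = Build(sorted, mid + 1, hi)
        };
    }

    private Node Find(int value)
    {
        Node node = m_root;
        while (node != null &amp;&amp; node.Value != value)
            node = value &lt; node.Value ? node.Left : node.Right;
        return node;
    }

    // Inserting just enables the pre-built node; no rebalancing is ever needed.
    public void Add(int value)
    {
        Node node = Find(value);
        if (node == null) throw new ArgumentException("value was not declared up front");
        node.Enabled = true;
    }

    public bool Contains(int value)
    {
        Node node = Find(value);
        return node != null &amp;&amp; node.Enabled;
    }
}</pre>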
<p>Here is an example of a pre-balanced tree, with nodes 18, 19 and 45 already inserted:</p>
<p>&nbsp;<img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="181" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2008/08/image2.png" width="349" border="0"></p>
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/data-structure-zoo-ordered-set/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Skip lists are fascinating!</title>
		<link>http://igoro.com/archive/skip-lists-are-fascinating/</link>
		<comments>http://igoro.com/archive/skip-lists-are-fascinating/#comments</comments>
		<pubDate>Mon, 21 Jul 2008 06:43:37 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/archive/skip-lists-are-fascinating/</guid>
		<description><![CDATA[Skip lists are a fascinating data structure: very simple, and yet have the same asymptotic efficiency as much more complicated AVL trees and red-black trees. While many standard libraries for various programming languages provide a sorted set data structure, there are numerous problems that require more control over the internal data structure than a sorted [...]]]></description>
			<content:encoded><![CDATA[<p>Skip lists are a fascinating data structure: very simple, and yet have the same asymptotic efficiency as much more complicated AVL trees and red-black trees. While many standard libraries for various programming languages provide a sorted set data structure, there are numerous problems that require more control over the internal data structure than a sorted set exposes. In this article, I will discuss the asymptotic efficiency of operations on skip lists, the ideas that make them work, and their interesting use cases. And, of course, I will give you the source code for a skip list in C#.</p>
<p><span id="more-70"></span></p>
<p>The time complexity of basic operations on a skip list is as follows:</p>
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="200" valign="top"><em>Operation</em></td>
<td width="200" valign="top"><em>Time Complexity</em></td>
</tr>
<tr>
<td width="200" valign="top">Insertion</td>
<td width="200" valign="top">O(log N)</td>
</tr>
<tr>
<td width="200" valign="top">Removal</td>
<td width="200" valign="top">O(log N)</td>
</tr>
<tr>
<td width="200" valign="top">Check if contains</td>
<td width="200" valign="top">O(log N)</td>
</tr>
<tr>
<td width="200" valign="top">Enumerate in order</td>
<td width="200" valign="top">O(N)</td>
</tr>
</tbody>
</table>
<p>This makes the skip list a very useful data structure. First, as mentioned earlier, a skip list can be used as the underlying storage for a sorted set data structure. But a skip list can also be used directly to implement some operations that are not efficient on a typical sorted set:</p>
<ul>
<li>Find the element in the set that is closest to some given value, in O(log N) time.</li>
<li>Find the k-th largest element in the set, in O(log N) time. Requires a simple augmentation of the skip list with partial counts.</li>
<li>Count the number of elements in the set whose values fall into a given range, in O(log N) time. Also requires a simple augmentation of the skip list.</li>
</ul>
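<p>To make the first bullet concrete, here is a minimal sketch. It assumes a bare-bones integer skip list (ClosestSkipList is a hypothetical name, written in the same spirit as the IntSkipList in the source code section) and breaks ties toward the smaller value, which is an arbitrary choice. The search descends exactly like Contains, stopping at the last node that does not exceed the target; the closest value is either that node or its successor on the lowest level.</p>

```csharp
using System;

// Hypothetical minimal skip list, just enough to show a "closest value" query.
class ClosestSkipList
{
    private sealed class Node
    {
        public readonly int Value;
        public readonly Node[] Next;
        public Node(int value, int levels) { Value = value; Next = new Node[levels]; }
    }

    private const int MaxLevels = 32;
    private readonly Node _head = new Node(0, MaxLevels); // sentinel; its Value is unused
    private readonly Random _rand = new Random(12345);    // fixed seed only for reproducible runs
    private int _levels = 1;

    public void Insert(int value)
    {
        // Coin flips: 1 level with probability 1/2, 2 levels with probability 1/4, ...
        int levels = 1;
        while (levels < MaxLevels && _rand.Next(2) == 1) levels++;
        if (levels > _levels) _levels = levels;

        var node = new Node(value, levels);
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            while (cur.Next[i] != null && cur.Next[i].Value < value) cur = cur.Next[i];
            if (i < levels) { node.Next[i] = cur.Next[i]; cur.Next[i] = node; }
        }
    }

    // Returns the stored value nearest to target (ties go to the smaller value).
    public int FindClosest(int target)
    {
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
            while (cur.Next[i] != null && cur.Next[i].Value <= target) cur = cur.Next[i];

        // cur is the last node <= target (or the head); after is the first node > target.
        Node after = cur.Next[0];
        if (cur == _head && after == null) throw new InvalidOperationException("The list is empty.");
        if (cur == _head) return after.Value;
        if (after == null) return cur.Value;

        // Compare distances in long arithmetic to avoid int overflow.
        return (long)target - cur.Value <= (long)after.Value - target ? cur.Value : after.Value;
    }
}
```

<p>Both candidates come out of a single O(log N) descent, so no second search is needed.</p>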
<p><strong>From a singly-linked list to a skip list</strong></p>
<p>Sometimes the best way to understand how something works is to attempt to design it yourself. Let&#8217;s try to go through that exercise with skip lists.</p>
<p>First, consider a regular sorted singly-linked list. Here is an example of one:</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://igoro.com/wordpress/wp-content/uploads/2008/07/list.png" border="0" alt="list" width="552" height="80" /></p>
<p>A sorted singly-linked list is not a terribly interesting data structure. The complexity of basic operations looks like this:</p>
<table border="0" cellspacing="0" cellpadding="2" width="402">
<tbody>
<tr>
<td width="200" valign="top"><em>Operation</em></td>
<td width="200" valign="top"><em>Time Complexity</em></td>
</tr>
<tr>
<td width="200" valign="top">Insertion</td>
<td width="200" valign="top">O(N)</td>
</tr>
<tr>
<td width="200" valign="top">Removal</td>
<td width="200" valign="top">O(N)</td>
</tr>
<tr>
<td width="200" valign="top">Check if contains</td>
<td width="200" valign="top">O(N)</td>
</tr>
<tr>
<td width="200" valign="top">Enumerate in order</td>
<td width="200" valign="top">O(N)</td>
</tr>
</tbody>
</table>
<p>That is pretty unimpressive, actually. The only interesting value here is the O(N) in-order enumeration. For an insertion, removal or search, O(N) is about as bad as it gets. (There are more specialized use cases where sorted linked lists are appropriate, though.)</p>
<p>So, how can we make these operations on a sorted linked list faster? The main problem with a linked list is that it takes so long to get into its middle. That makes insertion, removal and search operations all O(N).</p>
<p>Well, here is an idea: let&#8217;s consider a sorted multi-level list. We start out with a regular singly-linked list that connects nodes in-order. Then, we add a level-2 list that skips every other node. And a level-3 list that skips every other node in the level-2 list. And so forth, until we have a list that jumps somewhere past the middle element. Our previous list now looks like this:</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://igoro.com/wordpress/wp-content/uploads/2008/07/multilist.png" border="0" alt="multilist" width="553" height="161" /></p>
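<p>The level structure in the picture follows a simple rule: counting nodes from 1, a node joins one extra level for every factor of 2 in its position. The sketch below (the class and method names are hypothetical) builds the lists level by level from a sorted array under that rule.</p>

```csharp
using System;
using System.Collections.Generic;

static class PerfectMultiLevelList
{
    // Number of lists that contain the node at 1-based position pos: every node is on
    // level 1, every 2nd node is also on level 2, every 4th node also on level 3, and so on.
    public static int LevelCount(int pos)
    {
        if (pos <= 0) throw new ArgumentOutOfRangeException(nameof(pos));
        int levels = 1;
        while (pos % 2 == 0) { levels++; pos /= 2; }
        return levels;
    }

    // levels[k] holds, in order, the values on level k + 1 of the multi-level list.
    public static List<List<int>> BuildLevels(int[] sorted)
    {
        var levels = new List<List<int>>();
        for (int i = 0; i < sorted.Length; i++)
        {
            int count = LevelCount(i + 1);
            while (levels.Count < count) levels.Add(new List<int>());
            for (int k = 0; k < count; k++) levels[k].Add(sorted[i]);
        }
        return levels;
    }
}
```

<p>Each level has half as many nodes as the level below it, so there are about log<sub>2</sub> N levels in total.</p>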
<p>Now, checking whether a particular element is in the set only takes O(log N). The search algorithm is a lot like binary search. We first look in the top-most list and move to the right, making sure that we don&#8217;t jump too far. For example, if we are searching for number 8, we will not take the level-3 link from the head node, because we would end up too far right: all the way at 12! If we can&#8217;t move further right on a particular level, we drop to the next lower level, which has shorter jumps.</p>
<p>Search for value 8 would proceed like this:</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://igoro.com/wordpress/wp-content/uploads/2008/07/multilist-search.png" border="0" alt="multilist-search" width="553" height="161" /></p>
<p>The search ends on 7: it is the last node whose value does not exceed 8, and since 7 is not 8, we know that 8 is not in the set.</p>
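<p>Here is the same search as a short sketch. To keep it self-contained, it assumes that the perfect multi-level list is stored implicitly in a sorted array: the node at 1-based position p is on level k (counting from 0) exactly when p is a multiple of 2<sup>k</sup>, so one jump on level k advances 2<sup>k</sup> positions. The class name is hypothetical.</p>

```csharp
static class MultiLevelSearch
{
    // Searches a perfect multi-level list stored implicitly in a sorted array.
    // Position 0 is the head node; positions 1..n hold sorted[0..n-1].
    public static bool Contains(int[] sorted, int target)
    {
        int n = sorted.Length;

        // Highest level that still has a node: the largest k with 2^k <= n.
        int top = 0;
        while ((2L << top) <= n) top++;

        int p = 0; // start at the head node
        for (int k = top; k >= 0; k--)
        {
            int step = 1 << k;
            // Move right on level k as long as the next node does not overshoot the target.
            while (p + step <= n && sorted[p + step - 1] <= target) p += step;
            // We cannot move further right on this level, so drop to level k - 1.
        }

        // p is the last node whose value is <= target (or the head if there is none).
        return p > 0 && sorted[p - 1] == target;
    }
}
```

<p>On the made-up array {1, 3, 5, 7, 9, 12}, a search for 8 also ends on 7 and reports that 8 is missing.</p>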
<p>The O(log N) search time is very nice. But there is a problem: how do we implement insertions and removals efficiently while maintaining the structure of the multi-level list? This turns out to be quite hard. AVL and red-black trees solve the corresponding problem for trees with tricky rebalancing operations.</p>
<p>Skip lists take an entirely different approach: a probabilistic one. Instead of ensuring that the level-2 list skips every other node, a skip list is designed in a way that the level-2 list skips one node on average. In some places, it may skip two nodes, and in other places, it may not skip any nodes. But overall, the structure of a skip list is very similar to the structure of a sorted multi-level list.</p>
<p>Here is an example of what a skip list may look like:</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://igoro.com/wordpress/wp-content/uploads/2008/07/skiplist.png" border="0" alt="skiplist" width="553" height="161" /></p>
<p>A skip list looks a bit like a slightly garbled sorted multi-level list. A skip list has many of the nice properties of a sorted multi-level list, such as O(log N) search times, but also allows simple O(log N) insertions and deletions.</p>
<p><strong>Implementation of skip lists</strong></p>
<p>So, given this description of a skip list, would you know how to implement it? It is not very hard, and it may be worth spending a couple of minutes thinking about it.</p>
<ul>
<li>Insertion: decide how many lists this node will be a part of. With a probability of 1/2, make the node a part of the lowest-level list only. With 1/4 probability, the node will be a part of the lowest two lists. With 1/8 probability, the node will be a part of three lists. And so forth. Insert the node at the appropriate position in each list that it is a part of.</li>
<li>Deletion: remove the node from all sorted lists that it is a part of.</li>
<li>Check if contains: we can use the O(log N) search algorithm described above for multi-level lists.</li>
</ul>
<p>And all of these operations are pretty simple to implement in O(log N) expected time!</p>
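<p>The level-selection rule from the insertion bullet can be written as repeated coin flips. The sketch below shows that coin-flip formulation (the names are hypothetical); the IntSkipList below gets the same distribution more cheaply by counting the trailing 1-bits of a single Random.Next() value, and it additionally caps the level at one above the current top level.</p>

```csharp
using System;

static class SkipListLevels
{
    // Flips a fair coin until it comes up tails (or the cap is hit). The result is the
    // number of lists a new node joins: 1 with probability 1/2, 2 with probability 1/4,
    // 3 with probability 1/8, and so on.
    public static int RandomLevelCount(Random rng, int maxLevels)
    {
        int levels = 1;
        while (levels < maxLevels && rng.Next(2) == 1) levels++;
        return levels;
    }
}
```

<p>Over many insertions, about half of the nodes land only on the lowest list, a quarter on two lists, and so on, which keeps the expected number of levels near log<sub>2</sub> N.</p>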
<p><strong>Source code</strong></p>
<p>Here is a sample C# implementation for a skip list of integers:</p>
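<pre class="code"><span style="color: #0000ff;">using </span>System;
<span style="color: #0000ff;">using </span>System.Collections.Generic;

<span style="color: #0000ff;">class </span><span style="color: #2b91af;">IntSkipList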
</span>{
    <span style="color: #0000ff;">private class </span><span style="color: #2b91af;">Node
    </span>{
        <span style="color: #0000ff;">public </span><span style="color: #2b91af;">Node</span>[] Next { <span style="color: #0000ff;">get</span>; <span style="color: #0000ff;">private set</span>; }
        <span style="color: #0000ff;">public int </span>Value { <span style="color: #0000ff;">get</span>; <span style="color: #0000ff;">private set</span>; }

        <span style="color: #0000ff;">public </span>Node(<span style="color: #0000ff;">int </span>value, <span style="color: #0000ff;">int </span>level)
        {
            Value = value;
            Next = <span style="color: #0000ff;">new </span><span style="color: #2b91af;">Node</span>[level];
        }
    }

    <span style="color: #0000ff;">private </span><span style="color: #2b91af;">Node </span>_head = <span style="color: #0000ff;">new </span><span style="color: #2b91af;">Node</span>(0, 33); <span style="color: #008000;">// The max. number of levels is 33
    </span><span style="color: #0000ff;">private </span><span style="color: #2b91af;">Random </span>_rand = <span style="color: #0000ff;">new </span><span style="color: #2b91af;">Random</span>();
    <span style="color: #0000ff;">private int </span>_levels = 1;

    <span style="color: #808080;">/// &lt;summary&gt;
    /// </span><span style="color: #008000;">Inserts a value into the skip list.
    </span><span style="color: #808080;">/// &lt;/summary&gt;
    </span><span style="color: #0000ff;">public void </span>Insert(<span style="color: #0000ff;">int </span>value)
    {
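        <span style="color: #008000;">// Determine the level of the new node. Generate a random number R. The number of
        // 1-bits before we encounter the first 0-bit is the level of the node. Random.Next()
        // returns a non-negative int, so R has at most 31 significant bits, and the level is
        // also capped at one above the current top level. Either way, it fits in _head.Next.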
        </span><span style="color: #0000ff;">int </span>level = 0;
        <span style="color: #0000ff;">for </span>(<span style="color: #0000ff;">int </span>R = _rand.Next(); (R &amp; 1) == 1; R &gt;&gt;= 1)
        {
            level++;
            <span style="color: #0000ff;">if </span>(level == _levels) { _levels++; <span style="color: #0000ff;">break</span>; }
        }

        <span style="color: #008000;">// Insert this node into the skip list
        </span><span style="color: #2b91af;">Node </span>newNode = <span style="color: #0000ff;">new </span><span style="color: #2b91af;">Node</span>(value, level + 1);
        <span style="color: #2b91af;">Node </span>cur = _head;
        <span style="color: #0000ff;">for </span>(<span style="color: #0000ff;">int </span>i = _levels - 1; i &gt;= 0; i--)
        {
            <span style="color: #0000ff;">for </span>(; cur.Next[i] != <span style="color: #0000ff;">null</span>; cur = cur.Next[i])
            {
                <span style="color: #0000ff;">if </span>(cur.Next[i].Value &gt; value) <span style="color: #0000ff;">break</span>;
            }

            <span style="color: #0000ff;">if </span>(i &lt;= level) { newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; }
        }
    }

    <span style="color: #808080;">/// &lt;summary&gt;
    /// </span><span style="color: #008000;">Returns whether a particular value already exists in the skip list
    </span><span style="color: #808080;">/// &lt;/summary&gt;
    </span><span style="color: #0000ff;">public bool </span>Contains(<span style="color: #0000ff;">int </span>value)
    {
        <span style="color: #2b91af;">Node </span>cur = _head;
        <span style="color: #0000ff;">for </span>(<span style="color: #0000ff;">int </span>i = _levels - 1; i &gt;= 0; i--)
        {
            <span style="color: #0000ff;">for </span>(; cur.Next[i] != <span style="color: #0000ff;">null</span>; cur = cur.Next[i])
            {
                <span style="color: #0000ff;">if </span>(cur.Next[i].Value &gt; value) <span style="color: #0000ff;">break</span>;
                <span style="color: #0000ff;">if </span>(cur.Next[i].Value == value) <span style="color: #0000ff;">return true</span>;
            }
        }
        <span style="color: #0000ff;">return false</span>;
    }

    <span style="color: #808080;">/// &lt;summary&gt;
    /// </span><span style="color: #008000;">Attempts to remove one occurence of a particular value from the skip list. Returns
    </span><span style="color: #808080;">/// </span><span style="color: #008000;">whether the value was found in the skip list.
    </span><span style="color: #808080;">/// &lt;/summary&gt;
    </span><span style="color: #0000ff;">public bool </span>Remove(<span style="color: #0000ff;">int </span>value)
    {
        <span style="color: #2b91af;">Node </span>cur = _head;

        <span style="color: #0000ff;">bool </span>found = <span style="color: #0000ff;">false</span>;
        <span style="color: #0000ff;">for </span>(<span style="color: #0000ff;">int </span>i = _levels - 1; i &gt;= 0; i--)
        {
            <span style="color: #0000ff;">for </span>(; cur.Next[i] != <span style="color: #0000ff;">null</span>; cur = cur.Next[i])
            {
                <span style="color: #0000ff;">if </span>(cur.Next[i].Value == value)
                {
                    found = <span style="color: #0000ff;">true</span>;
                    cur.Next[i] = cur.Next[i].Next[i];
                    <span style="color: #0000ff;">break</span>;
                }

                <span style="color: #0000ff;">if </span>(cur.Next[i].Value &gt; value) <span style="color: #0000ff;">break</span>;
            }
        }

        <span style="color: #0000ff;">return </span>found;
    }

    <span style="color: #808080;">/// &lt;summary&gt;
    /// </span><span style="color: #008000;">Produces an enumerator that iterates over elements in the skip list in order.
    </span><span style="color: #808080;">/// &lt;/summary&gt;
    </span><span style="color: #0000ff;">public </span><span style="color: #2b91af;">IEnumerable</span>&lt;<span style="color: #0000ff;">int</span>&gt; Enumerate()
    {
        <span style="color: #2b91af;">Node </span>cur = _head.Next[0];
        <span style="color: #0000ff;">while </span>(cur != <span style="color: #0000ff;">null</span>)
        {
            <span style="color: #0000ff;">yield return </span>cur.Value;
            cur = cur.Next[0];
        }
    }
}</pre>
<p><strong>Possible improvements</strong></p>
<ul>
<li>Obviously, a more useful implementation would be generic, so that we can store values other than integers.</li>
<li>Nodes can be structs instead of classes. This significantly reduces the number of heap allocations. But, we cannot use the null value anymore to represent the tail, which adds a bit of extra code.</li>
<li>The skip list could be made associative, so that each node stores a key/value pair.</li>
<li>The implementation I gave above is a multiset. It is pretty simple to change the skip list so that it implements a set instead.</li>
</ul>
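<p>As an illustration of the first and last bullets together, here is a sketch of a generic skip list that behaves as a set: it orders elements with an IComparer&lt;T&gt; (Comparer&lt;T&gt;.Default unless one is passed in) and rejects duplicates. The class name SkipSet and its members are hypothetical. Unlike IntSkipList, it records the last node before the target on every level (the update array), which makes the duplicate check and the removal straightforward.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic skip list that stores each value at most once.
class SkipSet<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public readonly Node[] Next;
        public Node(T value, int levels) { Value = value; Next = new Node[levels]; }
    }

    private const int MaxLevels = 32;
    private readonly Node _head = new Node(default(T), MaxLevels); // sentinel
    private readonly IComparer<T> _cmp;
    private readonly Random _rand = new Random();
    private int _levels = 1;

    public SkipSet(IComparer<T> comparer = null) { _cmp = comparer ?? Comparer<T>.Default; }

    // Fills update[i] (if given) with the last node on level i whose value is < value,
    // and returns the first node on the lowest level whose value is >= value.
    private Node FindPredecessors(T value, Node[] update)
    {
        Node cur = _head;
        for (int i = _levels - 1; i >= 0; i--)
        {
            while (cur.Next[i] != null && _cmp.Compare(cur.Next[i].Value, value) < 0) cur = cur.Next[i];
            if (update != null) update[i] = cur;
        }
        return cur.Next[0];
    }

    public bool Contains(T value)
    {
        Node candidate = FindPredecessors(value, null);
        return candidate != null && _cmp.Compare(candidate.Value, value) == 0;
    }

    // Returns false, and changes nothing, if the value is already present.
    public bool Add(T value)
    {
        var update = new Node[MaxLevels];
        Node candidate = FindPredecessors(value, update);
        if (candidate != null && _cmp.Compare(candidate.Value, value) == 0) return false;

        int level = 1;
        while (level < MaxLevels && _rand.Next(2) == 1) level++;
        if (level > _levels)
        {
            // Brand-new levels start out empty, so the head is the predecessor there.
            for (int i = _levels; i < level; i++) update[i] = _head;
            _levels = level;
        }

        var node = new Node(value, level);
        for (int i = 0; i < level; i++)
        {
            node.Next[i] = update[i].Next[i];
            update[i].Next[i] = node;
        }
        return true;
    }

    public bool Remove(T value)
    {
        var update = new Node[MaxLevels];
        Node candidate = FindPredecessors(value, update);
        if (candidate == null || _cmp.Compare(candidate.Value, value) != 0) return false;

        // Values are unique, so on every level the node holds, update[i].Next[i] is the node.
        for (int i = 0; i < candidate.Next.Length; i++) update[i].Next[i] = candidate.Next[i];
        return true;
    }

    public IEnumerable<T> Enumerate()
    {
        for (Node cur = _head.Next[0]; cur != null; cur = cur.Next[0]) yield return cur.Value;
    }
}
```

<p>Turning this set back into a multiset is a matter of dropping the duplicate check in Add.</p>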
<p><strong>Related:</strong></p>
<ul>
<li><a href="ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf">Original paper on skip lists</a> [umd.edu]</li>
<li><a href="http://en.wikipedia.org/wiki/Skip_list">Skip list</a> [wikipedia.org]</li>
<li><a href="http://igoro.com/archive/quicksort-killer/">Quicksort killer</a> [igoro.com]</li>
<li><a href="http://igoro.com/archive/programming-job-interview-challenge/">Programming job interview challenge</a> [igoro.com]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/skip-lists-are-fascinating/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Programming job interview challenge</title>
		<link>http://igoro.com/archive/programming-job-interview-challenge/</link>
		<comments>http://igoro.com/archive/programming-job-interview-challenge/#comments</comments>
		<pubDate>Fri, 09 May 2008 07:33:46 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/archive/programming-job-interview-challenge/</guid>
		<description><![CDATA[The folks at Dev 102 posted a programming job interview challenge. It is rather easy, but I figured that it may be a nice change of pace, since my last posting was fairly involved. The challenge is to reverse the bits in each byte, given a large array of bytes. What is the fastest possible [...]]]></description>
			<content:encoded><![CDATA[<p>The folks at Dev 102 posted a <a href="http://www.dev102.com/2008/05/05/a-programming-job-interview-challenge-2/">programming job interview challenge</a>. It is rather easy, but I figured that it may be a nice change of pace, since my <a href="http://igoro.com/archive/quicksort-killer/">last posting</a> was fairly involved.</p>
<p>The challenge is to reverse the bits in each byte, given a large array of bytes. What is the fastest possible solution?</p>
<p><span id="more-42"></span></p>
<p>There are only 256 possible values for an 8-bit integer, so it makes sense to pre-compute the reverse of each ahead of time. Then, we can simply scan through the input, and reverse each byte by a simple array lookup. We pay a small initialization cost, but after that, we can reverse bytes nearly as fast as we can read them.</p>
<p>And how to efficiently reverse a byte? There is an obvious solution which uses a for loop and some bit manipulation. My solution below uses a slightly faster approach. It is a nice trick, and not very difficult, so I will leave the details as an exercise to the reader. <img src='http://igoro.com/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
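<p>For comparison, here is a sketch of the obvious loop-based approach (the names are hypothetical). It moves bits out of the low end of the input and into the low end of the result one at a time, so the first bit taken out ends up as the highest bit of the result.</p>

```csharp
static class BitReversal
{
    // Reverses the bits of b with a simple loop: 8 iterations, one bit per iteration.
    public static byte ReverseWithLoop(byte b)
    {
        int rev = 0;
        for (int i = 0; i < 8; i++)
        {
            rev = (rev << 1) | (b & 1); // append the lowest remaining bit of b
            b >>= 1;                    // and drop it from b
        }
        return (byte)rev;
    }
}
```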
<p>Here is a simple solution in C#:</p>
<pre class="code"><span style="color: green">// Reverses bits in each byte in the array
</span><span style="color: blue">static void </span>Reverse(<span style="color: blue">byte</span>[] bytes)
{
    <span style="color: green">// Precompute the value of each reversed byte
    </span><span style="color: blue">byte</span>[] reversed = <span style="color: blue">new byte</span>[256];
    <span style="color: blue">for </span>(<span style="color: blue">int </span>i = 0; i &lt; 256; i++) reversed[i] = Reverse((<span style="color: blue">byte</span>)i);  

    <span style="color: green">// Reverse each byte in the input
    </span><span style="color: blue">for </span>(<span style="color: blue">int </span>i = 0; i &lt; bytes.Length; i++) bytes[i] = reversed[bytes[i]];
}</pre>
<pre class="code"><span style="color: green">// Reverses bits in a byte
</span><span style="color: blue">static byte </span>Reverse(<span style="color: blue">byte </span>b)
{
    <span style="color: blue">int </span>rev = (b &gt;&gt; 4) | ((b &amp; 0xf) &lt;&lt; 4);
    rev = ((rev &amp; 0xcc) &gt;&gt; 2) | ((rev &amp; 0x33) &lt;&lt; 2);
    rev = ((rev &amp; 0xaa) &gt;&gt; 1) | ((rev &amp; 0x55) &lt;&lt; 1);  

    <span style="color: blue">return </span>(<span style="color: blue">byte</span>)rev;
}</pre>
<p>If we are going to be reversing multiple arrays, we can hoist the pre-computation into an initialization routine. And, if handling short arrays is important, we can skip the pre-computation step if the array length is less than 256, and just reverse each byte by calling Reverse(byte).</p>
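<p>One way to hoist the pre-computation is sketched below, assuming a static class is acceptable (ByteReverser and its members are hypothetical names). The table lives in a static readonly field, so it is computed once, before the class first uses it, and every later call does only the table lookups. With the table built only once, the short-array special case mostly stops mattering.</p>

```csharp
static class ByteReverser
{
    // Built once, the first time the class is used, instead of on every call.
    private static readonly byte[] Table = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int i = 0; i < 256; i++) table[i] = ReverseOne((byte)i);
        return table;
    }

    // Same nibble, pair, and bit swapping as Reverse(byte) above.
    private static byte ReverseOne(byte b)
    {
        int rev = (b >> 4) | ((b & 0xf) << 4);
        rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
        rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);
        return (byte)rev;
    }

    // Reverses the bits of every byte in place using the shared table.
    public static void ReverseAll(byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++) bytes[i] = Table[bytes[i]];
    }
}
```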
<p>Not sure what else to add&#8230; problem solved!</p>
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/programming-job-interview-challenge/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Quicksort killer</title>
		<link>http://igoro.com/archive/quicksort-killer/</link>
		<comments>http://igoro.com/archive/quicksort-killer/#comments</comments>
		<pubDate>Mon, 05 May 2008 00:59:18 +0000</pubDate>
		<dc:creator>Igor Ostrovsky</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://igoro.com/?p=33</guid>
		<description><![CDATA[What is the time complexity of quicksort? The answer that first pops up in my head is O(N log N). That answer is only partly right: the worst case is in fact O(N^2). However, since very few inputs take anywhere near that long, a reasonable quicksort implementation will almost never encounter the quadratic case in real life. [...]]]></description>
			<content:encoded><![CDATA[<p>What is the time complexity of quicksort? The answer that first pops up in my head is O(N log N). That answer is only partly right: the worst case is in fact O(N<sup>2</sup>). However, since very few inputs take anywhere near that long, a reasonable quicksort implementation will almost never encounter the quadratic case in real life.</p>
<p>I came across a very cool <a href="http://www.cs.dartmouth.edu/~doug/mdmspe.pdf">paper</a> that describes how to easily defeat just about any quicksort implementation. The paper describes a simple comparer that decides ordering of elements lazily as the sort executes, and arranges the order so that the sort takes quadratic time. This works even if the quicksort is randomized! Furthermore, if the quicksort is deterministic (not randomized), this algorithm also reveals the input which reliably triggers quadratic behavior for this particular quicksort implementation.</p>
<p><span id="more-33"></span></p>
<p>The trick takes a dozen lines of code, works with nearly any quicksort routine, and only uses the quicksort via its interface! How cool is that? Here is a C# implementation of this idea:</p>
<pre class="code"><span style="color: blue">using </span>System.Collections.Generic;

<span style="color: blue">class </span><span style="color: #2b91af">QuicksortKiller </span>: <span style="color: #2b91af">IComparer</span>&lt;<span style="color: blue">int</span>&gt;
{
    <span style="color: #2b91af">Dictionary</span>&lt;<span style="color: blue">int</span>, <span style="color: blue">int</span>&gt; keys = <span style="color: blue">new </span><span style="color: #2b91af">Dictionary</span>&lt;<span style="color: blue">int</span>, <span style="color: blue">int</span>&gt;();

    <span style="color: blue">int </span>candidate = 0;
    <span style="color: blue">public int </span>Compare(<span style="color: blue">int </span>x, <span style="color: blue">int </span>y)
    {
        <span style="color: blue">if </span>(!keys.ContainsKey(x) &amp;&amp; !keys.ContainsKey(y))
        {
            <span style="color: blue">if </span>(x == candidate) keys[x] = keys.Count;
            <span style="color: blue">else </span>keys[y] = keys.Count;
        }

        <span style="color: blue">if </span>(!keys.ContainsKey(x)) { candidate = x; <span style="color: blue">return </span>1; }
        <span style="color: blue">if </span>(!keys.ContainsKey(y)) { candidate = y; <span style="color: blue">return </span>-1; }
        <span style="color: blue">return </span>keys[x] - keys[y];
    }
}</pre>
<p>This trick works well when applied to the .NET Array.Sort() method. The following chart displays the number of Compare() calls forced by QuicksortKiller when run on an array of a given size, as well as the number of Compare() calls that Array.Sort makes on a randomly ordered sequence of the same length:</p>
<p><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="291" alt="image" src="http://igoro.com/wordpress/wp-content/uploads/2008/05/image-thumb2.png" width="483" border="0" /></p>
<p>This chart clearly shows that the QuicksortKiller comparer triggers the quadratic behavior in Array.Sort.</p>
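<p>Comparison counts like the ones in the chart can be collected by wrapping the comparer. The sketch below (CountingComparer and CompareCounter are hypothetical names) counts the Compare() calls that Array.Sort makes on a shuffled array; wrapping a QuicksortKiller instead of Comparer&lt;int&gt;.Default gives the adversarial count. Keep in mind that starting with .NET Framework 4.5, Array.Sort uses introsort, so on newer runtimes the adversarial count is not expected to grow quadratically.</p>

```csharp
using System;
using System.Collections.Generic;

// Wraps any IComparer<T> and counts how many times Compare() is called.
class CountingComparer<T> : IComparer<T>
{
    private readonly IComparer<T> _inner;
    public long Calls { get; private set; }

    public CountingComparer(IComparer<T> inner) { _inner = inner; }

    public int Compare(T x, T y)
    {
        Calls++;
        return _inner.Compare(x, y);
    }
}

static class CompareCounter
{
    // Number of comparisons Array.Sort makes on a random permutation of 0..n-1.
    public static long CountOnRandomInput(int n, int seed)
    {
        var rng = new Random(seed);
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) arr[i] = i;

        // Fisher-Yates shuffle.
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
        }

        var counter = new CountingComparer<int>(Comparer<int>.Default);
        Array.Sort(arr, counter);
        return counter.Calls;
    }
}
```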
<h4>How does it work?</h4>
<p>QuicksortKiller&#8217;s trick is to ensure that the pivot will compare low against nearly all remaining elements. But, how can we detect which element is the pivot? We know quicksort will take the pivot and compare it against all other elements.</p>
<p>So, initially we consider all elements to be unfrozen: no value has been assigned to them yet. The paper refers to them as &quot;gas&quot;, and a gas element always compares higher than any frozen element. When we compare two gas elements, we freeze one of them to a value larger than any previously frozen value (but still lower than every gas element), and we remember the other element as the pivot candidate. If the candidate takes part in the next comparison with a gas element, we make sure to freeze the pivot candidate, rather than the other element. That way, the pivot will be frozen within two comparisons against other elements, and since it is frozen below all of the remaining gas, it compares low against nearly everything else in its partition.</p>
<h4>Constructing a &quot;Bad&quot; Array</h4>
<p>If the quicksort implementation is deterministic, and always takes the same steps on the same input, it is easy to generate an array that reliably triggers the quadratic behavior. The MakeBadArray method constructs an array that triggers the quadratic-time behavior of Array.Sort():</p>
<pre class="code"><span style="color: blue">static int</span>[] MakeBadArray(<span style="color: blue">int </span>length)
{
    <span style="color: blue">int</span>[] arr = <span style="color: #2b91af">Enumerable</span>.Range(0, length).ToArray();
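    <span style="color: green">// Let the adversary choose the order: afterwards, arr[i] is the original index
    // of the element that it ranked i-th
    </span><span style="color: #2b91af">Array</span>.Sort(arr, <span style="color: blue">new </span><span style="color: #2b91af">QuicksortKiller</span>());

    <span style="color: green">// Give each original position the rank that the adversary assigned to it
    </span><span style="color: blue">int</span>[] ret = <span style="color: blue">new int</span>[length];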

    <span style="color: blue">for </span>(<span style="color: blue">int </span>i = 0; i &lt; length; i++)
    {
        ret[arr[i]] = i;
    }

    <span style="color: blue">return </span>ret;
}</pre>
<h4>How to defeat the Quicksort killer?</h4>
<p>As the original paper explains, the adversarial comparer works if the quicksort implementation checks an O(1) number of elements as pivot candidates. But quicksort implementations that check more than O(1) elements are possible. For example, the quicksort might choose the median element as the pivot at each step, thus always getting a perfect split of the input sequence into two halves. The median can be found deterministically in O(N) time, so the total running time is always O(N log N). The median-based quicksort is rarely used in practice because it tends to have a larger constant factor than other quicksort implementations.</p>
<p>Another solution is to use a regular quicksort algorithm, and degrade to another sorting algorithm if quicksort is not working out. For example, if we reached a recursive depth of 10, and the size of our partition has not reduced by at least a half, we can just ditch quicksort and sort the partition using heapsort.</p>
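<p>A sketch of this fallback idea is below. Instead of the exact rule above, it uses the depth limit from introsort, a standard quicksort/heapsort hybrid: once the recursion gets deeper than twice floor(log<sub>2</sub> N), the current partition is finished with heapsort, which bounds the total running time at O(N log N). The middle-element pivot and the Lomuto partition scheme are arbitrary choices for brevity, and the class name is hypothetical.</p>

```csharp
static class IntroSortSketch
{
    public static void Sort(int[] a)
    {
        // Depth limit of 2 * floor(log2(N)), computed with integer shifts.
        int depthLimit = 0;
        for (int n = a.Length; n > 1; n >>= 1) depthLimit += 2;
        Sort(a, 0, a.Length - 1, depthLimit);
    }

    private static void Sort(int[] a, int lo, int hi, int depthLimit)
    {
        while (hi > lo)
        {
            // Quicksort is not working out: finish this partition with heapsort.
            if (depthLimit-- == 0) { HeapSort(a, lo, hi); return; }

            int p = Partition(a, lo, hi);
            // Recurse on the smaller side and loop on the larger one.
            if (p - lo < hi - p) { Sort(a, lo, p - 1, depthLimit); lo = p + 1; }
            else { Sort(a, p + 1, hi, depthLimit); hi = p - 1; }
        }
    }

    // Lomuto partition around the middle element; returns the pivot's final index.
    private static int Partition(int[] a, int lo, int hi)
    {
        Swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) Swap(a, i, store++);
        Swap(a, store, hi);
        return store;
    }

    // Heapsort on a[lo..hi] using a max-heap stored in that range.
    private static void HeapSort(int[] a, int lo, int hi)
    {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--)
        {
            Swap(a, lo, lo + end);   // move the current maximum to the end
            SiftDown(a, lo, 0, end); // restore the heap on the remaining elements
        }
    }

    private static void SiftDown(int[] a, int lo, int i, int n)
    {
        while (true)
        {
            int largest = i, l = 2 * i + 1, r = l + 1;
            if (l < n && a[lo + l] > a[lo + largest]) largest = l;
            if (r < n && a[lo + r] > a[lo + largest]) largest = r;
            if (largest == i) return;
            Swap(a, lo + i, lo + largest);
            i = largest;
        }
    }

    private static void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

<p>Inputs with many equal keys make this Lomuto partition degenerate, which is exactly the situation where the heapsort fallback keeps the running time in check.</p>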
]]></content:encoded>
			<wfw:commentRss>http://igoro.com/archive/quicksort-killer/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
