<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Efficient auto-complete with a ternary search tree</title>
	<atom:link href="http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/feed/" rel="self" type="application/rss+xml" />
	<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/</link>
	<description>On programming, technology, and random things of interest</description>
	<lastBuildDate>Thu, 29 Jul 2010 00:28:29 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: John Haugeland</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-679</link>
		<dc:creator>John Haugeland</dc:creator>
		<pubDate>Wed, 03 Feb 2010 02:20:11 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-679</guid>
		<description>Igor: If you want to support nearest match, the easiest thing to do is to store and search against lexemes (say, Metaphone or Soundex or whatever). If you want, you can take the Levenshtein distance between the potential string and the source string as a quick sort criterion for small matches; it wouldn&#039;t scale, though, since you&#039;d have to apply it to every source string available.

For larger matching that scales on machines with a fair amount of memory available, the easiest way to deal with it is to knock out one lexeme from the word at a time, and match the gap with a GADDAG or something.

Also, a suffix trie is a rather simpler way to resolve this quandary you&#039;re approaching.</description>
		<content:encoded><![CDATA[<p>Igor: If you want to support nearest match, the easiest thing to do is to store and search against lexemes (say, Metaphone or Soundex or whatever). If you want, you can take the Levenshtein distance between the potential string and the source string as a quick sort criterion for small matches; it wouldn&#8217;t scale, though, since you&#8217;d have to apply it to every source string available.</p>
<p>For larger matching that scales on machines with a fair amount of memory available, the easiest way to deal with it is to knock out one lexeme from the word at a time, and match the gap with a GADDAG or something.</p>
<p>Also, a suffix trie is a rather simpler way to resolve this quandary you&#8217;re approaching.</p>
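<p>A minimal sketch of the edit-distance ranking mentioned above, assuming a small in-memory candidate list. The function names levenshtein and rank_by_distance are illustrative and not from the article. As noted, this computes the distance against every candidate, so it only suits small sets.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def rank_by_distance(query: str, candidates: list[str]) -> list[str]:
    """Sort candidates by edit distance to query; ties broken alphabetically."""
    return sorted(candidates, key=lambda w: (levenshtein(query, w), w))
```

<p>For example, rank_by_distance("helo", ["help", "hello", "world"]) puts "hello" and "help" (distance 1 each) ahead of "world".</p>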
]]></content:encoded>
	</item>
	<item>
		<title>By: Gallery of Processor Cache Effects</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-656</link>
		<dc:creator>Gallery of Processor Cache Effects</dc:creator>
		<pubDate>Tue, 02 Feb 2010 06:42:47 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-656</guid>
		<description>[...] Efficient auto-complete with a ternary search tree [...]</description>
		<content:encoded><![CDATA[<p>[...] Efficient auto-complete with a ternary search tree [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Igor Ostrovsky</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-647</link>
		<dc:creator>Igor Ostrovsky</dc:creator>
		<pubDate>Thu, 14 Jan 2010 17:11:49 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-647</guid>
		<description>alex, it may look like &quot;AB&quot; is in the diagram, but really it is just &quot;A&quot;. A letter only counts if you follow a down-arrow that starts from that letter.</description>
		<content:encoded><![CDATA[<p>alex, it may look like &#8220;AB&#8221; is in the diagram, but really it is just &#8220;A&#8221;. A letter only counts if you follow a down-arrow that starts from that letter.</p>
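<p>The rule can be sketched in code. This is a minimal ternary search tree, not the article&#8217;s implementation; the names Node, insert and words are assumptions, the stored words are arbitrary, and empty strings are not supported. A node&#8217;s letter is appended to the prefix only when following the middle (&#8220;down&#8221;) child.</p>

```python
class Node:
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # left, middle ("down"), right
        self.is_word = False


def insert(node, word, i=0):
    """Insert a non-empty word; returns the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_word = True
    return node


def words(node, prefix=""):
    """Yield stored words in sorted order."""
    if node is None:
        return
    yield from words(node.lo, prefix)            # sideways: node.ch is not used
    if node.is_word:
        yield prefix + node.ch
    yield from words(node.eq, prefix + node.ch)  # down: node.ch is consumed
    yield from words(node.hi, prefix)            # sideways: node.ch is not used


root = None
for w in ["A", "B", "AC"]:
    root = insert(root, w)
stored = list(words(root))
```

<p>Here the node for B hangs to the right of the node for A, yet "AB" is not among the stored words, because the step from A to B is sideways rather than down.</p>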
]]></content:encoded>
	</item>
	<item>
		<title>By: alex yang</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-646</link>
		<dc:creator>alex yang</dc:creator>
		<pubDate>Thu, 14 Jan 2010 16:56:34 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-646</guid>
		<description>If the ternary tree is structured like the picture above, the word sequence should be &quot;AB&quot;, &quot;ABCD&quot;, &quot;ABBA&quot; &amp; &quot;BCD&quot;.</description>
		<content:encoded><![CDATA[<p>If the ternary tree is structured like the picture above, the word sequence should be &#8220;AB&#8221;, &#8220;ABCD&#8221;, &#8220;ABBA&#8221; &amp; &#8220;BCD&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Igor Ostrovsky</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-553</link>
		<dc:creator>Igor Ostrovsky</dc:creator>
		<pubDate>Fri, 04 Sep 2009 06:52:16 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-553</guid>
		<description>Barry: You could save space by storing the casing as a bitmask rather than a string. Still, even that is some extra space.

Seide: I am not familiar with Lucene. Either way, this article is more about explaining a cool data structure than about giving the most practical solution for auto-complete.</description>
		<content:encoded><![CDATA[<p>Barry: You could save space by storing the casing as a bitmask rather than a string. Still, even that is some extra space.</p>
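<p>A rough sketch of the bitmask idea, assuming the tree is keyed on the lowercased word and that changing case never changes a word&#8217;s length (true for ASCII, not for every Unicode character). The helper names case_mask and apply_case are hypothetical.</p>

```python
def case_mask(word: str) -> int:
    """Bit i is set when character i of word is uppercase."""
    mask = 0
    for i, ch in enumerate(word):
        if ch.isupper():
            mask |= 1 << i
    return mask


def apply_case(folded: str, mask: int) -> str:
    """Rebuild the original spelling from the lowercase key and its mask."""
    return "".join(
        ch.upper() if (mask >> i) & 1 else ch
        for i, ch in enumerate(folded)
    )
```

<p>A terminal node would then hold one small integer per spelling instead of a second copy of the string: apply_case("mcdonald", case_mask("McDonald")) gives back "McDonald".</p>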
<p>Seide: I am not familiar with Lucene. Either way, this article is more about explaining a cool data structure than about giving the most practical solution for auto-complete.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: seide</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-552</link>
		<dc:creator>seide</dc:creator>
		<pubDate>Thu, 03 Sep 2009 10:08:22 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-552</guid>
		<description>Why not just use a full-text search index like Lucene?</description>
		<content:encoded><![CDATA[<p>Why not just use a full-text search index like Lucene?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Barry Kelly</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-551</link>
		<dc:creator>Barry Kelly</dc:creator>
		<pubDate>Thu, 03 Sep 2009 09:15:29 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-551</guid>
		<description>Igor: one reason for using a trie (especially a Patricia trie) for lookup of strings with lots of common prefixes is that it can be far more space-efficient than e.g. a sorted array used with binary search. Storing the full, properly-cased string in the terminal nodes negates this advantage.</description>
		<content:encoded><![CDATA[<p>Igor: one reason for using a trie (especially a Patricia trie) for lookup of strings with lots of common prefixes is that it can be far more space-efficient than e.g. a sorted array used with binary search. Storing the full, properly-cased string in the terminal nodes negates this advantage.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2009-09-02 &#171; Blarney Fellow</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-550</link>
		<dc:creator>links for 2009-09-02 &#171; Blarney Fellow</dc:creator>
		<pubDate>Thu, 03 Sep 2009 01:10:34 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-550</guid>
		<description>[...] Efficient auto-complete with a ternary search tree &#124; Igor Ostrovsky Blogging (tags: tree search strings ui) [...]</description>
		<content:encoded><![CDATA[<p>[...] Efficient auto-complete with a ternary search tree | Igor Ostrovsky Blogging (tags: tree search strings ui) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Igor Ostrovsky</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-549</link>
		<dc:creator>Igor Ostrovsky</dc:creator>
		<pubDate>Wed, 02 Sep 2009 23:21:56 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-549</guid>
		<description>Somejan &amp; Barry: Both very good points. As is usual, the list of caveats and alternate solutions should be a lot longer than the actual article. :-)

Barry: A reasonable solution to the casing problem would be to store the list of properly cased words in each terminal node. If words rarely differ by casing only (as is probably the case in practice), the list would typically be of length 1, maybe 2.</description>
		<content:encoded><![CDATA[<p>Somejan &#038; Barry: Both very good points. As is usual, the list of caveats and alternate solutions should be a lot longer than the actual article. :-)</p>
<p>Barry: A reasonable solution to the casing problem would be to store the list of properly cased words in each terminal node. If words rarely differ by casing only (as is probably the case in practice), the list would typically be of length 1, maybe 2.</p>
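<p>One possible shape for this, sketched with a plain dictionary-based trie rather than the article&#8217;s ternary search tree. The names TrieNode, add and complete are assumptions. Keys are lowercased, and each terminal node keeps the list of original spellings.</p>

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # lowercase char -> TrieNode
        self.spellings = []  # original-cased words ending at this node


def add(root, word):
    """Index word under its lowercased form, remembering the original casing."""
    node = root
    for ch in word.lower():
        node = node.children.setdefault(ch, TrieNode())
    if word not in node.spellings:
        node.spellings.append(word)


def complete(root, prefix):
    """Return original spellings of all words matching prefix, ignoring case."""
    node = root
    for ch in prefix.lower():
        node = node.children.get(ch)
        if node is None:
            return []
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.extend(n.spellings)
        # Push in reverse so that children are popped in ascending key order.
        stack.extend(n.children[k] for k in sorted(n.children, reverse=True))
    return out


root = TrieNode()
for w in ["aBcd", "aBdd", "abbd", "abcd"]:
    add(root, w)
```

<p>With these four words, complete(root, "abc") returns both "aBcd" and "abcd" from a single terminal node, whose list has length 2.</p>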
]]></content:encoded>
	</item>
	<item>
		<title>By: Barry Kelly</title>
		<link>http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/comment-page-1/#comment-548</link>
		<dc:creator>Barry Kelly</dc:creator>
		<pubDate>Wed, 02 Sep 2009 22:11:29 +0000</pubDate>
		<guid isPermaLink="false">http://igoro.com/?p=177#comment-548</guid>
		<description>It&#039;s worth pointing out that tries and similar structures have trouble with case-preserving but case-insensitive operations. Consider the shape of the trie for sequences like:

aBcd
aBdd
abbd
abcd

... and then find all items with the prefix &quot;abc&quot;, considered case-insensitively.

Case-preserving but case-insensitive comparisons normally sort uppercase characters directly before their corresponding lowercase characters. Because of the way tries store the data, they must either (1) normalize and lose the case data, failing the preservation requirement, or (2) store each letter case separately. Option (2) makes it much harder to find all elements with the same prefix (considered case-insensitively): items with the same prefix may exist in multiple descendants, and an in-order traversal of the trie will place the items out of lexicographical order.</description>
		<content:encoded><![CDATA[<p>It&#8217;s worth pointing out that tries and similar structures have trouble with case-preserving but case-insensitive operations. Consider the shape of the trie for sequences like:</p>
<p>aBcd<br />
aBdd<br />
abbd<br />
abcd</p>
<p>&#8230; and then find all items with the prefix &#8220;abc&#8221;, considered case-insensitively.</p>
<p>Case-preserving but case-insensitive comparisons normally sort uppercase characters directly before their corresponding lowercase characters. Because of the way tries store the data, they must either (1) normalize and lose the case data, failing the preservation requirement, or (2) store each letter case separately. Option (2) makes it much harder to find all elements with the same prefix (considered case-insensitively): items with the same prefix may exist in multiple descendants, and an in-order traversal of the trie will place the items out of lexicographical order.</p>
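<p>Option (2) can be illustrated with a small sketch. It uses a case-sensitive trie built from nested dictionaries, where the key "$" marks the end of a word (so words are assumed not to contain "$"). The names build, subtrees and walk are made up for this example.</p>

```python
def build(words):
    """Case-sensitive trie as nested dicts; the key "$" marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def subtrees(root, prefix):
    """All nodes reached by spelling prefix in any mix of upper/lower case."""
    frontier = [("", root)]
    for ch in prefix:
        frontier = [
            (path + variant, node[variant])
            for path, node in frontier
            for variant in sorted({ch.lower(), ch.upper()})
            if variant in node
        ]
    return frontier


def walk(node, path):
    """Yield words below node, visiting children in code-point order."""
    for key in sorted(node):
        if key == "$":
            yield path
        else:
            yield from walk(node[key], path + key)


data = ["aBcd", "aBdd", "abbd", "abcd"]
trie = build(data)
matches = [w for path, node in subtrees(trie, "abc") for w in walk(node, path)]
trie_order = list(walk(trie, ""))
insensitive_order = sorted(data, key=lambda w: (w.lower(), w))
```

<p>The prefix "abc" resolves to two separate subtrees (under "aBc" and "abc"), and a full traversal yields aBcd, aBdd, abbd, abcd, while the case-insensitive order with uppercase first would be abbd, aBcd, abcd, aBdd.</p>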
]]></content:encoded>
	</item>
</channel>
</rss>
