CS20c Progress Report
by Shang-Lin Chen


Title: Sorting and Searching Strings

Introduction

       Efficiently sorting and searching data is one of the most fundamental problems in computer science, both in industrial and in academic research. Many kinds of structures can store, search, and retrieve data, but search trees are the most important and efficient means to date. The paper "Fast Algorithms for Sorting and Searching Strings" by Jon L. Bentley and Robert Sedgewick compares the time complexities of different types of search trees.
       The most common search tree is the binary tree, characterized by nodes that each point to a maximum of two children nodes. A binary search tree's efficiency depends on it height, h. Constructing an n-node binary search tree has is O(n*h), and searching an object is O(h). The height of a carefully constructed binary search tree is no more than log n, which leads to a time complexity of O(log n) for a search. A digital search trie can improve the time complexity of constructing and searching n nodes to O(n). However, a digital search trie requires substantially more memory.
       Bentley and Sedgewick point out that a ternary search tree combines the compact size of a binary search tree with the speed of a digital search trie. Before Bentley and Sedgewick published their paper, ternary search trees had only been used as a theoretical device to prove properties of other topics. Few realized that the characteristics of a ternary search tree actually make it ideal for practical use in sorting and searching data.
 

Current Status

Major Milestones

References

          [Bentley and Saxe79]: "Algorithms on Vector Sets", SIGACT
          News, 11(9):36-39, Fall 1979.

 Back to main page