Progress Report

CS20c Progress Report
by Shang-Lin Chen

Title: Sorting and Searching Strings

Introduction

       Efficiently sorting and searching data is one of the most fundamental problems in computer science, both in industrial and in academic research. Many kinds of structures can store, search, and retrieve data, but search trees are the most important and efficient means to date. The paper "Fast Algorithms for Sorting and Searching Strings" by Jon L. Bentley and Robert Sedgewick compares the time complexities of different types of search trees.
       The most common search tree is the binary tree, characterized by nodes that each point to a maximum of two children nodes. A binary search tree's efficiency depends on it height, h. Constructing an n-node binary search tree has is O(n*h), and searching an object is O(h). The height of a carefully constructed binary search tree is no more than log n, which leads to a time complexity of O(log n) for a search. A digital search trie can improve the time complexity of constructing and searching n nodes to O(n). However, a digital search trie requires substantially more memory.
       Bentley and Sedgewick point out that a ternary search tree combines the compact size of a binary search tree with the speed of a digital search trie. Before Bentley and Sedgewick published their paper, ternary search trees had only been used as a theoretical device to prove properties of other topics. Few realized that the characteristics of a ternary search tree actually make it ideal for practical use in sorting and searching data.

Current Status

I have read the paper and started to study the algorithms and theorems. I have also implemented in LISP the code for binary search trees and the algorithms to sort strings in a binary search tree.
The paper "Fast Algorithms for Sorting and Searching Strings" discusses methods of data storage and retrieval, a fundamental problem in computer science. In the paper, binary trees, digital tries, and ternary trees are compared for efficiency.
Figuring out how to obtain the coefficients in the time complexities in the theorems has presented difficulties. Since some of the theorems come from other papers, I plan to read those papers for more information about their proofs.

Major Milestones

References

[Bentley and Saxe79]: "Algorithms on Vector Sets", SIGACT
News, 11(9):36-39, Fall 1979.

[Bentley and Sedgewick97]:

Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms,

.ps format

.pdf format

[Hoare67]: C.A.R. Hoare, "Quicksort," Computer Journal, 5(1):10-15, April 1962.

Back to main page