CS20c Progress Report
by Shang-Lin Chen
Title: Sorting and Searching Strings
Introduction
Efficiently sorting and searching
data is one of the most fundamental problems in computer science, both
in industrial and in academic research. Many kinds of structures can store,
search, and retrieve data, but search trees are the most important and
efficient means to date. The paper "Fast Algorithms for Sorting and Searching
Strings" by Jon L. Bentley and Robert Sedgewick compares the time complexities
of different types of search trees.
The most common search tree is
the binary tree, characterized by nodes that each point to a maximum of
two children nodes. A binary search tree's efficiency depends on it height,
h. Constructing an n-node binary search tree has is O(n*h),
and searching an
object is O(h). The height of a carefully constructed binary search tree
is no more than log n, which leads to a time complexity of O(log n) for
a search. A digital search trie can improve the time complexity of constructing
and searching n nodes to O(n). However, a digital search trie requires
substantially more memory.
Bentley and Sedgewick point out
that a ternary search tree combines the compact size of a binary search
tree with the speed of a digital search trie. Before Bentley and
Sedgewick published their paper, ternary search
trees had only been used as a theoretical
device to prove properties of other topics. Few realized that the
characteristics
of a ternary search tree actually make it ideal for practical
use in
sorting and searching data.
Current Status
-
I have read the paper and started to study the algorithms and theorems.
I have also implemented in LISP the code for binary search trees and the
algorithms to sort strings in a binary search tree.
-
The paper "Fast Algorithms for Sorting and Searching Strings" discusses
methods of data storage and retrieval, a fundamental problem in computer
science. In the paper, binary trees, digital tries, and ternary trees are
compared for efficiency.
-
Figuring out how to obtain the coefficients in the time complexities in
the theorems has presented difficulties. Since some of the theorems come
from other papers, I plan to read those papers for more information about
their proofs.
Major Milestones
1.Get approval of the paper "Fast Algorithms for Sorting and
Searching Strings." (completed)
2.Read the paper thoroughly. (completed)
3.Understand and prove the theorems presented in the paper.
4.Understand the algorithms provided in the paper.
5.Derive the computational complexities for those algorithms.
6.Implement binary tree and ternary tree algorithms in LISP. Also
consider how to implement them in C or Java. (completed
the binary
tree portion in LISP)
7.Summarize my findings in a final report and presentation.
8.Present my report to the class.
References
[Bentley and
Saxe79]: "Algorithms on Vector Sets",
SIGACT
News, 11(9):36-39,
Fall 1979.
[Bentley and Sedgewick97]: Jon L. Bentley and Robert Sedgewick,
"Fast Algorithms for Sorting and Searching Strings", Proceedings of
the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997.
(text available in .ps
format or .pdf
format )
[Hoare67]: C.A.R. Hoare, "Quicksort," Computer Journal,
5(1):10-15, April 1962.
Back to main page