--------------------

A Lightweight Suffix Array and BWT Construction Algorithm

--------------------

Welcome to the home page of our deep-shallow suffix array and BWT construction algorithm. This algorithm has been developed by Giovanni Manzini (Università del Piemonte Orientale) and Paolo Ferragina (Università di Pisa). ADetails and experiments on this algorithm have been published Algorithmica, 40(1), 2004.

All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm was designed to overcome this unpleasant dichotomy by deploying the following idea: use a ``shallow'' sorter for the suffixes with a small common prefix, and a ``deep'' sorter for the suffixes with a large common prefix. The algorithm is very fast in practice and uses a very small amount of space---in addition to the space required by the suffix array itself (hence, it is named lightweight).

The source code of our lightweight suffix-sorting algorithm is available under GNU GPL. It has been tested only under Linux, but it should work on any platform supporting GCC. You are encouraged to download it and report any problem to the authors. The archive includes also an algorithm to compute the Burrows-Wheeler Tranform of a string (the basic step of the bzip2 compressor) and some algorithms to compute the LCP array (published on SWAT 2004).

Note. A recent paper by Stefan Burkhardt and Juha Karkkainen (Fast lightweight suffix array construction and checking, Proc. CPM '03, Springer-Verlag LNCS n. 2676) describes a new approach to lightweight suffix array construction which is theoretically superior to our approach. Anyone interested in lightweight suffix array construction should definitely look at this paper!

--------------------

Viewable With Any Browser Valid HTML 3.2!

--------------------