Abstract. The size of electronic data is currently growing at
a faster rate than computer memory and disk storage capacities. For this
reason compression appears always as an attractive choice, if not
mandatory. However space overhead is not the only resource to be optimized
when managing large data collections; in fact data turn out to be useful
only when properly indexed to support search operations that efficiently
extract the user-requested information.
Approaches to combine compression and indexing techniques are nowadays
receiving more and more attention. A first step towards the design of a
compressed full-text index achieving guaranteed performance in the worst
case has been recently done in [Ferragina-Manzini
FOCS 2000]. The novelty of that index resides in the careful combination
of the compression algorithm proposed by Burrows and Wheeler with the suffix
array data structure. The index is opportunistic in that, although
no assumption on a particular fixed distribution is made, it takes advantage
of the compressibility of the input data by decreasing the space occupancy
at no significant asymptotic slowdown in the query performance.
In this paper we present an implementation of this index and perform
an extensive set of experiments on various text collections. These experiments
show that our index is compact (its space occupancy is close to the one
achieved by the best known compressors), it is fast in counting the number
of occurrences of a pattern, and the cost of their retrieval is reasonable
when they are few (i.e. in case of a selective query).