The design of learning-based compressed data structures

Abstract

We revisit some fundamental and ubiquitous problems in data structures design, such as predecessor search and rank/select primitives, by exploiting a reduction from the input data to the geometry of points in a Cartesian plane [Ao et al., VLDB 11].
We introduce techniques that discover, or “learn”, the regularities in this Cartesian plane and solve the problems above in an efficient algorithmic way [Ferragina and Vinciguerra, VLDB 20; Boffa et al., ALENEX 21]. Surprisingly, we show that it is possible to both learn the data regularities and retain the space-time complexity bounds of traditional data structures, which translates into robust performance in practice, as we show by experimenting with them on some large datasets.
Motivated by these results, we dig into the core components of these structures and present the first mathematically grounded answer to why learning-based compressed data structures can outperform their traditional counterparts [Ferragina et al., Theor. Comp. Sci. 2021].
We conclude by discussing the plethora of research opportunities that these new approaches to data structures design open up.

Date
16 Aug 2021 14:30 (UTC+02) — 14:50 (UTC+02)
Location
Virtual
Giorgio Vinciguerra
Giorgio Vinciguerra
Research Fellow (RTD-A)