The PGM-index: a multicriteria, compressed and learned approach to data indexing

Abstract

The recent introduction of learned indexes has shaken the foundations of the decades-old field of indexing data structures. Combining, or even replacing, classic design elements such as B-tree nodes with machine learning models has proven to give outstanding improvements in the space footprint and time efficiency of data systems. In this talk, I will give an overview of these new results, discuss their limitations, and introduce the Piecewise Geometric Model index (shortly, PGM-index). Unlike known learned indexes, the PGM-index achieves I/O-optimality in query operations, learns an optimal number of linear models, and its peculiar recursive construction makes it a purely learned data structure, rather than a hybrid of traditional and learned components. I will then introduce (i) a compressed PGM-index that further reduces its succinct space footprint by exploiting the repetitiveness at the level of the learned linear models it is composed of; (ii) a PGM-index that adapts itself to the distribution of the query operations; and (iii) the multicriteria PGM-index whose speciality is to efficiently auto-tune itself in a few seconds over hundreds of millions of keys to the possibly evolving space-time constraints imposed by the application of use.

Date
6 Sep 2019 11:00 (UTC+02) — 11:45 (UTC+02)
Location
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Oktavie-Allee
Wadern, 66687
Germany
Giorgio Vinciguerra
Giorgio Vinciguerra
Research Fellow (RTD-A)