Giorgio Vinciguerra

Research Fellow (RTD-A)

Università di Pisa

I have been a Research Fellow (academic rank in Italy: RTD-A) at the Department of Computer Science of the University of Pisa since January 2023.

My research interests include compact data structures, data compression, and algorithm engineering, with a focus on learned data structures: data structures that use machine learning tools to uncover new regularities in the input data and achieve significantly better space-time trade-offs than traditional ones.
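As a minimal illustration of the idea (a toy sketch, not code taken from any of the libraries listed on this page), the Python snippet below fits a single linear model that maps keys to positions in a sorted array, records the model's maximum prediction error, and answers lookups by searching only inside that error window. The class and method names are hypothetical, and the input is assumed to be a non-empty list of numbers.

```python
import bisect


class ToyLearnedIndex:
    """Toy learned index: one linear model plus a bounded-error correction."""

    def __init__(self, keys):
        # Assumption: keys is a non-empty iterable of numbers.
        self.keys = sorted(keys)
        n = len(self.keys)
        first, last = self.keys[0], self.keys[-1]
        # Line through the first and last key; a piecewise or
        # least-squares fit would usually give a smaller error.
        self.slope = (n - 1) / (last - first) if last != first else 0.0
        self.intercept = -self.slope * first
        # Maximum distance between predicted and true position over all keys.
        self.eps = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Approximate position of key, clamped to a valid index.
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lower_bound(self, key):
        """Return the index of the first stored key >= key."""
        pos = self._predict(key)
        lo = max(pos - self.eps, 0)
        hi = min(pos + self.eps + 1, len(self.keys))
        # The model is monotone, so the answer lies within [lo, hi].
        return bisect.bisect_left(self.keys, key, lo, hi)
```

The index stores only two floats and one integer besides the data, and the binary search touches at most `2 * eps + 1` positions. The smoother the key distribution, the smaller `eps` becomes.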

I obtained my PhD from the University of Pisa in February 2022 with the thesis "Learning-based compressed data structures", which received the Best PhD Thesis in Theoretical Computer Science award from the Italian Chapter of the EATCS. Before my current position, I was a postdoc (2022) and a PhD student (2018–21) at the University of Pisa. I was a visiting researcher at KTH Royal Institute of Technology (2024) and at Harvard University (2020).

In July 2025, I earned the Italian National Scientific Habilitation as Associate Professor in Computer Science.

Results of my research, including my software libraries, have found applications in database systems, information retrieval systems, and bioinformatics tools. Furthermore, I was granted a US and an Italian patent, owned by the University of Pisa.

My research is supported by the EU-funded project SoBigData.it. In the past, I was supported by the SoBigData++ and Multicriteria data structures projects.

Publications

Authors are listed alphabetically for most papers
(2025). Efficiency of learned indexes on genome spectra. ESA.

(2025). On the compressibility of large-scale source code datasets. J. Syst. Softw.

(2025). Learned compression of nonlinear time series with random access. ICDE.

(2025). Two-level massive string dictionaries. Inf. Syst.

(2024). CoCo-trie: data-aware compression and indexing of strings. Inf. Syst.

(2023). Engineering a textbook approach to index massive string dictionaries. SPIRE.

(2023). On nonlinear learned string indexing. IEEE Access.

(2023). Learned monotone minimal perfect hashing. ESA.

(2022). Compressed string dictionaries via data-aware subtrie compaction. SPIRE.

(2022). A learned approach to design compressed rank/select data structures. ACM Trans. Algorithms.

(2022). Learning-based compressed data structures. Ph.D. thesis.

(2021). Repetition- and linearity-aware rank/select dictionaries. ISAAC.

(2021). On the performance of learned data structures. Theor. Comput. Sci.

(2020). Why are learned indexes so effective? ICML.

(2020). Learned data structures. Recent Trends in Learning From Data (Springer).

Awards

Talks

Learned compression of nonlinear time series with random access
Time for learned data structures
Learned monotone minimal perfect hashing
Advances in data-aware compressed-indexing schemes for integer and string keys
Learning-based approaches to compressed data structures design

Teaching & Supervision

Teacher:

Teaching assistant:

Supervised and co-supervised students:

Software

LeMonHash

A monotone minimal perfect hash function that learns and leverages the data smoothness.

LZ$_{\boldsymbol\varepsilon}$

Compressed rank/select dictionary based on Lempel-Ziv and LA-vector compression.

LZ-End

Implementation of two LZ-End parsing algorithms.

PrefixPGM

Proof-of-concept extension of the PGM-index to support fixed-length strings.

RearCodedArray

Compressed string dictionary based on rear-coding.

Block-$\boldsymbol\varepsilon$ tree

Compressed rank/select dictionary exploiting approximate linearity and repetitiveness.

LA-vector

Compressed bitvector/container supporting efficient random access and rank queries.

PyGM

Python library of sorted containers with state-of-the-art query performance and compressed memory usage.

PGM-index

Data structure enabling fast searches in arrays of billions of items using orders of magnitude less space than traditional indexes.

CSS-tree

C++11 implementation of the Cache Sensitive Search tree.

NN Weaver

Python library to build and train feedforward neural networks, with hyperparameter tuning capabilities.
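For readers unfamiliar with the rear-coding technique mentioned in the RearCodedArray entry above, here is a minimal Python sketch of the general scheme. It is not the library's code or API, and the function names are hypothetical. Each string in a sorted list is stored as a pair: the number of characters to drop from the end of the previous string, and the suffix to append.

```python
def rear_encode(sorted_strings):
    """Encode a sorted list of strings as (chars_to_drop, new_suffix) pairs."""
    encoded, prev = [], ""
    for s in sorted_strings:
        # Length of the longest common prefix with the previous string.
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        # Rear coding stores how many trailing characters of prev to remove,
        # rather than the common-prefix length used by front coding.
        encoded.append((len(prev) - lcp, s[lcp:]))
        prev = s
    return encoded


def rear_decode(encoded):
    """Rebuild the original list of strings from the encoded pairs."""
    decoded, prev = [], ""
    for drop, suffix in encoded:
        prev = prev[:len(prev) - drop] + suffix
        decoded.append(prev)
    return decoded


words = ["car", "card", "care", "cat"]
pairs = rear_encode(words)
print(pairs)
print(rear_decode(pairs) == words)
```

Decoding is sequential in this sketch. A practical dictionary would typically split the strings into buckets and store the first string of each bucket uncompressed, so that a lookup decodes only one bucket.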

Knowledge is like a sphere; the greater its volume, the larger its contact with the unknown.

― Blaise Pascal

Contact