I am an algorithmist, primarily specialising in lossless data compression. Since July 2024, I have been working on optimising the compression and efficient indexing of large code archives in collaboration with the Software Heritage team, including Roberto Di Cosmo, David Douard, Martin Kirchgessner and Stefano Zacchiroli. In my doctoral thesis, I investigated compressed formats for matrices and trie structures. Subsequently, I explored various sparse matrix formats that support matrix-vector multiplications (SpMV) in the compressed domain, with a focus on energy efficiency.
For those familiar with the IPA, my name is pronounced: [fraŋ'ʧesko to'zoːni].
Education
I earned a PhD in Computer Science from the University of Pisa, under the supervision of Professors P. Ferragina and G. Manzini. My doctoral dissertation, titled Computation-friendly Compression of Matrices and Tries, focused on efficient data compression techniques. Since 2019, I have been a member of the Acube Laboratory (A³, Advanced Algorithms and Applications), directed by Professor P. Ferragina.
My research interests include lossless data compression, string indexing and stringology, and big data analytics.
I obtained a BSc in Computer and Electronic Engineering from the University of Perugia. I then continued my studies at the University of Pisa, earning an MSc in Computer Science and Networking in 2020, as part of a joint programme with the Sant’Anna School of Advanced Studies. My MSc thesis, Algorithms and Data Structures for Efficient Ride-Sharing Platforms, was awarded the Con.Scienze 2020 Best Thesis Award.
In 2020, I was awarded a scholarship and research grant on "Algorithms and Data Structures for Urban Mobility Platforms" at the University of Pisa. That same year, I obtained the qualification to register as a chartered engineer (Section A, Information Engineering).
From 8 September to 20 December 2022, I was a visiting researcher at Professor Gonzalo Navarro's laboratory at the University of Chile in Santiago. In July 2025, I was a visiting researcher with the Software Heritage team, co-founded by Roberto Di Cosmo, at Inria Paris.
Code Artefacts
Note: For each code artefact, I report the associated publication (c1, j1, j2, j3, and j4) in which the code served for experimental evaluations.
Tool ppc-swh-rocksdb
Efficiently reads large source-code datasets in Parquet format using a PPC solution on top of RocksDB. Achieved over 100 MiB/s insertion throughput and up to 10% compression with zstd.
j4 green-lossless-spmv
Green Lossless Sparse Matrix-Vector Multiplication (SpMV) implementation. It focuses on lossless compression techniques that optimise space, time, and energy for multiplications between binary or ternary matrix formats and real-valued vectors.
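As a rough illustration of why binary and ternary matrices lend themselves to lightweight kernels, the sketch below keeps, for each row, the column indices of the +1 and -1 entries and computes y = Mx with additions and subtractions only. The layout and the names `to_ternary_rows` and `ternary_spmv` are hypothetical and are not taken from the green-lossless-spmv repository, which relies on its own compressed formats.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout (an assumption for illustration):
# one (plus_cols, minus_cols) pair per matrix row.
TernaryRows = List[Tuple[List[int], List[int]]]


def to_ternary_rows(dense: Sequence[Sequence[int]]) -> TernaryRows:
    """Convert a dense matrix with entries in {-1, 0, 1} to index lists."""
    rows: TernaryRows = []
    for row in dense:
        if any(v not in (-1, 0, 1) for v in row):
            raise ValueError("matrix entries must be -1, 0 or 1")
        plus = [j for j, v in enumerate(row) if v == 1]
        minus = [j for j, v in enumerate(row) if v == -1]
        rows.append((plus, minus))
    return rows


def ternary_spmv(rows: TernaryRows, x: Sequence[float]) -> List[float]:
    """Compute y = M x without any multiplication."""
    return [
        sum(x[j] for j in plus) - sum(x[j] for j in minus)
        for plus, minus in rows
    ]


dense = [[1, 0, -1],
         [0, 0, 0],
         [-1, 1, 1]]
x = [0.5, 2.0, 4.0]
y = ternary_spmv(to_ternary_rows(dense), x)
```

A binary matrix is the special case in which every `minus_cols` list is empty.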
j4 zuckerli
Adapted Google's Zuckerli compressed matrix format to carry out computation-friendly matrix-vector multiplication kernels and PageRank computations.
c1, j3 CoCo-trie
A data-aware trie-shaped data structure for indexing and compressing string sets, developed by A³ lab. Implements principled subtree collapsing with optimal encoding scheme selection to minimise space.
j2 mm-repair
Matrix multiplication implementation for RePair-compressed matrices. Efficient computation methods for matrices compressed using grammar-based compression techniques.
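The general idea of multiplying directly on a grammar-compressed matrix can be sketched as follows, under simplifying assumptions. Each row is a sequence of symbols; a terminal symbol stands for a (column, value) pair, and a nonterminal stands for a pair of earlier symbols, as produced by a RePair-style grammar. Every symbol is evaluated once against the vector and then reused by all rows that contain it. The encoding and the name `grammar_matvec` are hypothetical and do not reflect the actual mm-repair data layout.

```python
from typing import List, Sequence, Tuple


def grammar_matvec(
    terminals: Sequence[Tuple[int, float]],  # symbol id i -> (column, value)
    rules: Sequence[Tuple[int, int]],        # symbol id len(terminals)+k -> (left, right)
    rows: Sequence[Sequence[int]],           # each row as a list of symbol ids
    x: Sequence[float],
) -> List[float]:
    """Compute y = M x on a grammar-compressed M (illustrative sketch)."""
    # Contribution of each terminal: value * x[column].
    w = [val * x[col] for col, val in terminals]
    # Rules only reference earlier symbols, so a single pass in rule
    # order evaluates every nonterminal exactly once.
    for left, right in rules:
        w.append(w[left] + w[right])
    # A row's result is the sum of the contributions of its symbols.
    return [sum(w[s] for s in row) for row in rows]


# Dense matrix represented below (an invented example):
# [[2, 0, 3],
#  [2, 0, 3],
#  [0, 5, 3]]
terminals = [(0, 2.0), (2, 3.0), (1, 5.0)]
rules = [(0, 1)]            # symbol 3 = "2 in column 0, then 3 in column 2"
rows = [[3], [3], [2, 1]]
y = grammar_matvec(terminals, rules, rows, [1.0, 2.0, 3.0])
```

In this toy instance the repeated pair is evaluated once and shared by the first two rows, which is where the time savings come from when rows have many repeated patterns.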
Tool Watermark
Implements a C++ multi-threaded data-parallel version based on POSIX threads (pthreads) and fork-join mechanisms, and a FastFlow-enhanced version of an application applying a digital watermark on an image. Includes tools for performance evaluation and visualisation of time statistics. The repository received GitHub's Arctic Code Vault Contributor badge as part of the 2020 GitHub Archive Programme.
Tool PCAP Lab
Contains C/C++ exercises demonstrating the use of the libpcap library for network traffic capture. Features include printing packet metadata, implementing a stateful RPC for packet counting, and identifying IP and TCP packets with their source/destination addresses.
Tool BeepBeep
A microservice-based application for managing challenges based on Strava data. It allows users to create, check, complete, and delete challenges, with specific rules for winning (e.g., longer distance, higher speed).
Related components:
- BeepBeep-dataservice manages core data operations. (GitHub)
- BeepBeep-challenges handles the logic and functionalities related to user challenges. (GitHub)
- BeepBeep-statistics processes and provides user statistics. (GitHub)
- BeepBeep-training-objectives manages training goals and objectives. (GitHub)
- BeepBeep-API-gateway acts as the entry point for external requests to the microservices. (GitHub)
- BeepBeep-emailer manages email notifications. (GitHub)
- BeepBeep-data-pump is responsible for data ingestion or transfer. (GitHub)
Tool We Against Virus — PharmaQ
A pharmacy queue prototype that secured 3rd place in the #WeAgainstVirus hackathon. This Flask-based web portal let users upload pictures from their phones, then automatically detected and displayed the number of customers in a pharmacy's waiting line using Nanonets' API for people detection. It integrates a local DB and a Google Maps interface.
Event Wikimedia Hackathon 2025
- Palermo, Italy | 14–16 Mar 2025
- Contributed to technical enhancements for Wikipedia and sister projects:
Contributions:
- Automatic spelling detection for Template F: Optimised a Lombard Wikipedia template to auto-detect article orthography, eliminating manual configuration. (Enabled dynamic rendering for regional language variants)
- Smart image resizing: Python script leveraging REST APIs to standardise image dimensions across articles, improving page aesthetics. (Reduced visual inconsistencies by automated proportional scaling)
Tool Wikimedia Tool for Translation Tags
Contributed to a web application originally developed by Gopa Vasanth (Indic Wikimedia Technical Committee) for Wikimedia projects, which automates the insertion of translation tags by parsing wikicode. (Pull request currently under review, expected to be merged soon.)
Scholarships
Note: For each scholarship, I report the associated publications (c1, j1, j2, j3, and j4) produced as outcomes.
Postdoctoral researcher
- Project in collaboration with Software Heritage
- Jul 2024 — Jun 2025
- Univ. of Pisa, Italy
Research conducted at the University of Pisa on parallel and I/O-efficient compression and indexing techniques for large source-code archives. I collaborated with the founders, Roberto Di Cosmo and Stefano Zacchiroli, and with other Software Heritage team members.
PhD Scholarship
- PhD in Computer Science, 36th cycle
- Nov 2020 — Oct 2023
- Univ. of Pisa
Recipient of a three-year PhD research grant from the University of Pisa (Department of Computer Science). (c1, j1, j2, j3)
Research Scholarship
- Citypost S.p.A.
- Jun — Oct 2020
- Univ. of Pisa
Title: Algorithms and Data Structures for Urban Mobility Platforms. Duration: five months. Grant: Citypost S.p.A. Researched graph-based algorithmic solutions for vehicle routing and mobility problems, as part of Acube Lab's 2018–2020 research collaboration.
Participation in [inter]national projects
Note: For each project, I report the associated publications (c1, j1, j2, j3, and j4) produced as outcomes.
NextGenerationEU—National Recovery and Resilience Plan (PNRR)
- SoBigData.it-Strengthening the Italian RI for Social Mining and Big Data Analytics — Avviso (3264 del 28/12/2021)
- 2022–ongoing
- Grant IR0000013
Funding for the project "SoBigData.it-Strengthening the Italian RI for Social Mining and Big Data Analytics."
Associated publications: (j3, j4)
European Union-NextGenerationEU-PNRR
- ICSC-Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing
- 2022–ongoing
- Spoke "Future HPC and BigData"
Funding for the Spoke "Future HPC and BigData."
Associated publications: (j3, j4)
European Union-Horizon 2020 Program
- SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics
- 2020–ongoing
- Grant 871042
Funded through the Scheme "INFRAIA-01-2018-2019—Integrating Activities for Advanced Communities."
Associated publications: (j1, j3, j4)
European Union-Horizon 2020 Program
- HumanE AI Network
- 2020–ongoing
- Grant 952026
Funding through the project "HumanE AI Network."
Associated publications: (j1)
NextGenerationEU – PNRR / MUR PRIN Project
- Multicriteria Data Structures and Algorithms: From Compressed to Learned Indexes, and Beyond
- 2019–2023
- Grant n. 2017WR7SHH
Funding from the Italian Ministry of University and Research (MUR) under the "Progetti di Rilevante Interesse Nazionale" (PRIN) programme for the project "Multicriteria Data Structures and Algorithms." The research covered compressed and learned indexes and related areas.
Associated publications: (j1, j3)
MIT-UniPI Grant
- Using Graph Compression for Shortest Path Computation in Urban On-Demand Mobility
- 2019–2021
Grant on "Using Graph Compression for Shortest Path Computation in Urban On-Demand Mobility".
Associated publications: (j2)
Professional internships and traineeships
Research Intern & Software Developer (Software Heritage)
- Inria — Software Heritage
- Jul 2025
- Paris, France
A focused research collaboration on efficient code compression and scalable data retrieval systems for Software Heritage, a leading global source code archive. Developed and optimised solutions for terabyte-scale caching systems and shard format optimisation, directly addressing challenges in managing vast source code archives. This work, conducted within the CodeCommons project, aimed to boost performance and reduce the costs of large-scale software preservation and accessibility. It focused on applying advanced compression algorithms to real-world, industrial-scale data, fostering innovation in data management for digital heritage.
Invited participant
- Bending Spoons
- Sep 2018
- Copenhagen, Denmark
Selected as one of 20 top Italian tech students from a pool of over 400 applicants to participate in this coding challenge event. Engaged directly with Bending Spoons team members and founders, gaining insights into the industry.
Intern
- EPLASS GmbH
- Aug 2014
- Würzburg, Germany
Worked with C# at an internet-based software company specialising in international collaborations.
Intern
- Flyeralarm GmbH
- Aug 2014
- Würzburg, Germany
Supported cross-departmental operations at a pan-European online printing firm.
Speaker

Invited Talk at Software Heritage
- Talk: "Lossless-compressed data storage for SWH: Compressed, tunable & energy-aware"
- 2 Jul 2025
- Inria Paris Centre, Paris
Gave a talk to the Software Heritage team about ongoing research on an I/O-efficient caching system, a terabyte-scale energy-aware solution for source code archival. A second milestone presented was a shard permutation designed to boost compression on the current SWH infrastructure. Shard refers to the file format used in Winery.
Invited Sant'Anna Workshop "Learning from large, complex and structured data: advances in methods and applications"
- Talk: "Toward Greener Matrix Operations by Lossless Compressed Formats"
- 4 Jun 2025
- Sant'Anna School of Advanced Studies
Participated in a two-day workshop held in the Aula Magna of the Sant'Anna School of Advanced Studies (SSSUP) showcasing interdisciplinary research in Economics, Management, Law, and Data Science of young researchers from L'EMbeDS, SMaRT COnSTRUCT project, and the AI for Society PhD programme.
Contributed to the session "The frontiers of Computer Science" (Chair: Prof. Andrea Vandin).
Invited Google Developer Group (GDG) Pisa
- Talk: "Verso operazioni più green su matrici tramite formati compressi senza perdita."
- 27 Feb 2025
- Polo Fibonacci, Univ. of Pisa
Invited talk on lossless compressed formats for greener sparse matrix operations at GDG Pisa. Thanks to Giovanna Rotundo (Women Techmakers Pisa) for the invitation.

2025 Software Heritage Community Workshop, Paris
- Poster: "Measuring impact by extracting knowledge of software assets"
- 30 Jan 2025
- Inria Palace, Paris
Contributed to a collaborative community workshop and presented one of the four community posters titled Measuring impact by extracting knowledge of software assets. The initiative aimed to enhance transparency, improve accessibility, and promote the mission of SWH.
Other posters presented at the workshop:
- Discovering open-source: One-stop shop for software discovery (Zenodo)
- The Library of Alexandria was available, until it was not (Zenodo)
- Repair today, repair tomorrow: Software Heritage (Zenodo)

Invited Software Heritage Kickoff Workshops, Paris
- Presented at the CodeCommons Kickoff
- 28 Jan 2025
- Inria Paris, France
I gave a talk to all research teams about my contribution to enhancing the space-time performance of insertion into and retrieval from the SWH archive. Talk title: "Enhancing SWH Object Storage with Compressed and Dynamic Solutions".
Invited Seminar at the Ca' Foscari University
- Toward Greener Matrix Operations by Lossless Compressed Formats
- 7 Nov 2024
- Ca' Foscari Univ., Venice
Presented before the members of the REGINDEX research group (Ca' Foscari University, Venice), directed by Prof. Nicola Prezza, a preprint I contributed as first author on the relationship between computation-friendly lossless compressed matrix formats for matrix-vector multiplication kernels and energy savings.
Invited Talk, Efficient Machine Learning Reading Group, chaired by TU Graz
- Toward Greener Matrix Operations by Lossless Compressed Formats
- 21 Oct 2024
- virtual
Presented a preprint I contributed as first author on the relationship between computation-friendly lossless compressed matrix formats for matrix-vector multiplication kernels and energy savings. The seminar series is chaired by the Embedded Learning and Sensing Systems research group directed by Prof. Olga Saukh (TU Graz).
VLDB '22 48th International Conference on Very Large Data Bases
- Talk: “Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices”
- 5–9 Sep 2022
- Sydney, Australia
Presented the article (j2) as corresponding author [virtual].
School Lipari School of Computational Complexity and Social Systems
- PhD research summary presentation
- 17–23 Jul 2022
- Lipari, Italy
During the summer school, I presented intermediate results of my PhD research.
Seminar Mauriana Pesaresi Seminar Series 2020/2021, Univ. of Pisa
- Locality Filtering for Efficient Ride Sharing Platforms
- 19 Feb 2021
- Univ. of Pisa, Italy
Presented my work on locality filtering techniques for ride-sharing platforms as part of this PhD student-organised seminar series. The talk discussed approaches to substantially speed up ride-sharing computations while maintaining solution quality.
Participation in conferences
Symposium Software Heritage 2025 Symposium and Summit
- UNESCO headquarters
- 29 Jan 2025
- Paris, France
Engaged with leaders from UNESCO, Inria, and Software Heritage, participating in discussions and panels on critical topics including cybersecurity and regulation (e.g., EU's Cyber Resilience Act), open and transparent AI (with insights from EU AI Office, IBM Research, Open Source Initiative), open science (aligned with UNESCO Recommendation on Open Science), and cultural preservation of software as digital heritage.
Seminar From Software Heritage to Code Commons: A Vision for Transparent and Responsible AI in Code-Based Model Training
- Sant'Anna School of Advanced Studies
- 12 Dec 2024
- Sant'Anna School
I attended a seminar presented by Roberto Di Cosmo (University of Paris Cité, SWH founder), held at the Pilo Boyl Palace, Sant'Anna School of Advanced Studies, Pisa. I gained insights into the ethical and technical challenges of using open codebases for AI model training, with emphasis on transparency, accountability, and the role of SWH in fostering CodeCommons' goals for responsible AI development.
Conference Conference article: “Compressed String Dictionaries via Data-Aware Subtrie Compaction”
- SPIRE '22: 29th International Symposium on String Processing and Information Retrieval
- 8–10 Nov 2022
- Concepción, Chile
Attended SPIRE '22 in person, where my research group contributed the conference article (c1).