Please note: As of August 2025, this page will no longer be updated as I have transferred to Sant'Anna School.

Department of Computer Science

University of Pisa

Francesco Tosoni, PhD

Photo by Studio Schloen, Cologne.

Acube Lab


 P.zza Martiri della Libertà 33, 56127 Pisa PI, Italy

  Complesso edilizio Sede Centrale

  francesco◦tosoni🐌di◦unipi◦it

Former address:
L.go B. Pontecorvo 3, 56127 Pisa PI, Italy
Polo Fibonacci, Building C, second floor, room 308


I am an algorithmist, primarily specialising in lossless data compression. Since July 2024, I have been working on optimising the compression and efficient indexing of large code archives in collaboration with the Software Heritage team, including Roberto Di Cosmo, David Douard, Martin Kirchgessner and Stefano Zacchiroli. In my doctoral thesis, I studied compressed formats for matrices and trie structures; subsequently, I explored various sparse matrix formats that support matrix-vector multiplications (SpMV) in the compressed domain, with a focus on energy efficiency.

For those familiar with the IPA, my name is pronounced: [fraŋ'ʧesko to'zoːni].

Education

I earned a PhD in Computer Science from the University of Pisa, under the supervision of Professors P. Ferragina and G. Manzini. My doctoral dissertation, titled Computation-friendly Compression of Matrices and Tries, focused on efficient data compression techniques. Since 2019, I have been a member of the Acube Laboratory (A³, Advanced Algorithms and Applications), directed by Professor P. Ferragina.

My research interests include lossless data compression, string indexing and stringology, and big data analytics.

I obtained a BSc in Computer and Electronic Engineering from the University of Perugia. I then continued my studies at the University of Pisa, earning an MSc in Computer Science and Networking in 2020, as part of a joint programme with the Sant’Anna School of Advanced Studies. My MSc thesis, Algorithms and Data Structures for Efficient Ride-Sharing Platforms, was awarded the Con.Scienze 2020 Best Thesis Award.

In 2020, I was awarded a scholarship and research grant on "Algorithms and Data Structures for Urban Mobility Platforms" at the University of Pisa. That same year, I obtained the qualification to register as a chartered engineer (Section A, Information Engineering).

From 8 September to 20 December 2022, I was a visiting researcher at Professor Gonzalo Navarro's laboratory at the University of Chile in Santiago. In July 2025, I was a visiting researcher with the Software Heritage team at Inria Paris, co-founded by Roberto Di Cosmo.

Publications

2025

  • F. Tosoni, P. Bille, V. Brunacci, A. De Angelis, P. Ferragina, and G. Manzini. Toward Greener Matrix Operations by Lossless Compressed Formats, IEEE Access, doi: 10.1109/ACCESS.2025.3555119.

2024

  • A. Boffa, P. Ferragina, F. Tosoni, and G. Vinciguerra. CoCo-trie: Data-aware compression and indexing of strings, Information Systems (IS), doi: 10.1016/j.is.2023.102316.

2022

  • A. Boffa, P. Ferragina, F. Tosoni, and G. Vinciguerra. Compressed String Dictionaries via Data-Aware Subtrie Compaction, 29th International Symposium on String Processing and Information Retrieval (SPIRE 2022), doi: 10.1007/978-3-031-20643-6_17.
  • P. Ferragina, G. Manzini, T. Gagie, D. Köppl, G. Navarro, M. Striani, and F. Tosoni. Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices, Proceedings of the VLDB Endowment (PVLDB), 15(10), 2175–2187, 2022, doi: 10.14778/3547305.3547321.
  • F. Tosoni, P. Ferragina, A. Marino, G. Resta, and P. Santi. Locality Filtering for Efficient Ride Sharing Platforms, IEEE Transactions on Intelligent Transportation Systems (IEEE TITS), doi: 10.1109/TITS.2021.3072830.

Awards

  • Sant'Anna School — Best Graduate Award (Master's in Computer Science & Networking)
    • Ranked 2nd in the cohort (2017-2018)
    • High GPA
    • Fastest completion
    Award details
  • con.Scienze — National Best Master's Thesis Award 2020
    • Selected among all Italian technical universities
    • August 2019 - July 2020 cohort
    Award announcement
  • HackTheAlps – #WeAgainstVirus 2020, 3rd Prize
    • Awarded 3rd prize for developing the Pharma-Q application prototype.
    • HackTheAlps focused on proposing software solutions and ideas to support local communities during the COVID-19 emergency. Our team developed an AI-powered web service to monitor queue lengths at pharmacies in Bozen/Bolzano using data acquired through surveillance cameras.
    Event Page | Team: Daniele Gadler, Tajammul Mustafa, Francesco Tosoni
  • First Ascent 2018 Finalist
    • Selected from over 400 applicants to participate in First Ascent 2018.
    • FA18 (Copenhagen, Denmark) was a coding challenge event organised and sponsored by Bending Spoons. The event brought together 20 top Italian tech students from universities in Italy (Bologna, Cagliari, Padua, Pisa, Rome, Trento), England (Cambridge, Oxford, Imperial College London), and Germany (TUM).
    Event Page

Code artefacts

Note: For each code artefact, I report the associated publication (c1, j1, j2, j3, and j4) in which the code served for experimental evaluations.

  • ppc-swh-rocksdb GitHub Efficiently reads large source-code datasets in Parquet format using a PPC solution on top of RocksDB. Achieved over 100 MiB/s insertion throughput and up to 10% compression with zstd. (GitHub site)
  • [j4] green-lossless-spmv GitHub
    Green Lossless Sparse Matrix-Vector Multiplication (SpMV) implementation. It focuses on lossless compression techniques that optimise space, time, and energy for multiplications between binary or ternary matrix formats and real-valued vectors. (GitHub site)
  • [j4] zuckerli GitHub
    Readapted Google's Zuckerli compressed matrix format to carry out computation-friendly matrix-vector multiplication kernels and PageRank computations. (GitHub fork)
  • [c1, j3] CoCo-trie GitHub
    A data-aware trie-shaped data structure for indexing and compressing string sets, developed by A³ lab. Implements principled subtree collapsing with optimal encoding scheme selection to minimise space. (GitHub site)
  • [j2] mm-repair GitLab
    Matrix multiplication implementation for RePair-compressed matrices. Efficient computation methods for matrices compressed using grammar-based compression techniques. (GitLab site)
  • Watermark GitHub Implements a C++ multi-threaded data-parallel version based on POSIX threads (pthreads) and fork-join mechanisms, and a FastFlow-enhanced version of an application applying a digital watermark on an image. Includes tools for performance evaluation and visualisation of time statistics. The repository received GitHub's Arctic Code Vault Contributor badge as part of the 2020 GitHub Archive Programme. (GitHub site)
  • PCAP Lab GitHub Contains C/C++ exercises demonstrating the use of the libpcap library for network traffic capture. Features include printing packet metadata, implementing a stateful RPC for packet counting, and identifying IP and TCP packets with their source/destination addresses. (GitHub site)
  • BeepBeep GitHub A microservice-based application for managing challenges based on Strava data. It allows users to create, check, complete, and delete challenges, with specific rules for winning (e.g., longer distance, higher speed). (GitHub site)
    • BeepBeep-dataservice manages core data operations. (GitHub site)
    • BeepBeep-challenges handles the logic and functionalities related to user challenges. (GitHub site)
    • BeepBeep-statistics processes and provides user statistics. (GitHub site)
    • BeepBeep-training-objectives manages training goals and objectives. (GitHub site)
    • BeepBeep-API-gateway acts as the entry point for external requests to the microservices. (GitHub site)
    • BeepBeep-emailer manages email notifications. (GitHub site)
    • BeepBeep-data-pump is responsible for data ingestion or transfer. (GitHub site)
  • We Against Virus — PharmaQ GitHub A pharmacy queue prototype that secured 3rd place in the #WeAgainstVirus hackathon. This Flask-based web portal allows users to upload pictures from their phones; the service then automatically detects and displays the number of customers in a pharmacy's waiting line, using Nanonets' API for people detection. It integrates a local DB and a Google Maps interface. (GitHub site)
  • Wikimedia Hackathon 2025
    • Palermo, Italy | 14–16 Mar 2025
    • Contributed to technical enhancements for Wikipedia and sister projects:
      • Automatic spelling detection for Template F: Optimised a Lombard Wikipedia template to auto-detect article orthography, eliminating manual configuration. (Enabled dynamic rendering for regional language variants)
      • Smart image resizing: Python script leveraging REST APIs to standardise image dimensions across articles, improving page aesthetics. (Reduced visual inconsistencies by automated proportional scaling)
    Event Page

Scholarships

Note: For each scholarship, I report the associated publications (c1,j1,j2,j3,j4) produced as outcomes.

  • Postdoctoral researcher
    • Project in collaboration with Software Heritage
    • Jul 2024 — Jun 2025
    • Univ. of Pisa, Italy

    Research conducted at the University of Pisa on parallel and I/O-efficient compression and indexing techniques for large source-code archives (website). I collaborated with the founders Roberto Di Cosmo and Stefano Zacchiroli and other Software Heritage team members.

  • PhD Scholarship
    • PhD in Computer Science, 36th cycle
    • Nov 2020 — Oct 2023
    • Univ. of Pisa

    Recipient of a three-year PhD research grant from the University of Pisa (Department of Computer Science). (c1, j1, j2, j3)

  • Research Scholarship
    • Citypost S.p.A.
    • Jun — Oct 2020
    • Univ. of Pisa

    Title: Algorithms and Data Structures for Urban Mobility Platforms. Duration: five months. Grant: Citypost S.p.A. Researched graph-based algorithmic solutions for vehicle routing and mobility problems, as part of Acube Lab's 2018–2020 research collaboration.

Participation in [inter]national projects

Note: For each project, I report the associated publications (c1,j1,j2,j3,j4) produced as outcomes.

  • NextGenerationEU—National Recovery and Resilience Plan (PNRR)
    • SoBigData.it-Strengthening the Italian RI for Social Mining and Big Data Analytics — Avviso (3264 del 28/12/2021)
    • 2022–ongoing
    • Grant IR0000013

    Funding for the project “SoBigData.it-Strengthening the Italian RI for Social Mining and Big Data Analytics.”

    Visit pnrr.sobigdata.it/ (j3, j4)

  • European Union-NextGenerationEU-PNRR
    • ICSC-Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing
    • 2022–ongoing
    • Spoke “Future HPC and BigData”

    Funding for the Spoke “Future HPC and BigData.”

    Learn more at www.supercomputing-icsc.it. (j3, j4)

  • European Union-Horizon 2020 Program
    • SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics
    • 2020–ongoing
    • Grant 871042

    Funded through the Scheme “INFRAIA-01-2018-2019—Integrating Activities for Advanced Communities.” More information can be found at www.sobigdata.eu. (j1, j3, j4)

  • European Union-Horizon 2020 Program
    • HumanE AI Network
    • 2020–ongoing
    • Grant 952026

    Funding through the project “HumanE AI Network.”
    See www.humane-ai.eu for details. (j1)

  • NextGenerationEU – PNRR / MUR PRIN Project
    • Multicriteria Data Structures and Algorithms: From Compressed to Learned Indexes, and Beyond
    • 2019–2023
    • Grant n. 2017WR7SHH

    Funding from the Italian Ministry of University and Research (MUR) under the “Progetti di Rilevante Interesse Nazionale” (PRIN) programme for the project “Multicriteria Data Structures and Algorithms.” Extended research includes compressed/learned indexes and beyond. Project website: learned.di.unipi.it. (j1, j3)

  • MIT-UniPI Grant
    • Using Graph Compression for Shortest Path Computation in Urban On-Demand Mobility
    • 2019–2021

    Grant on “Using Graph Compression for Shortest Path Computation in Urban On-Demand Mobility” (website). (j2)

International research visits

  • Visiting Researcher
    • Software Heritage, Inria
    • 1 Jul — 31 Jul 2025
    • Paris, France

    Collaboration with the Software Heritage team on efficient code compression for storage and retrieval (shard format, terabyte-sized caching system), specifically linked to the CodeCommons project supervised by Prof. Roberto Di Cosmo and Prof. Stefano Zacchiroli. The visit aimed to strengthen the existing collaboration and apply research directly to Software Heritage's vast source code archive. (signed letter)

  • Visiting PhD student
    • University of Chile
    • Sem. I, A.Y. 22/23
    • Santiago de Chile

    Three-month stay at the laboratory directed by Prof. Gonzalo Navarro. I worked on applications of the k²-tree data structure for the storage and operation of large and sparse graphs. (signed letter)

Professional internships and traineeships

  • Research Intern & Software Development (Software Heritage)
    • Inria — Software Heritage
    • Jul 2025
    • Paris, France

    A focused research collaboration on efficient code compression and scalable data retrieval systems for Software Heritage, a leading global source code archive. Developed and optimised solutions for terabyte-scale caching systems and for the shard format, directly addressing challenges in managing vast source code archives. This work, conducted within the CodeCommons project, aimed to enhance the performance and cost-efficiency of large-scale software preservation and accessibility. It focused on applying advanced compression algorithms to real-world, industrial-scale data, fostering innovation in data management for digital heritage.

  • Invited participant
    • Bending Spoons
    • Sep 2018
    • Copenhagen, Denmark

    Selected as one of 20 top Italian tech students from a pool of over 400 applicants to participate in this coding challenge event. Engaged directly with Bending Spoons team members and founders, gaining insights into the industry.

  • Intern
    • EPLASS GmbH
    • Aug 2014
    • Würzburg, Germany

    Worked with C# at an internet-based software company specialising in international collaborations. (attendance certificate)

  • Intern
    • Flyeralarm GmbH
    • Aug 2014
    • Würzburg, Germany

    Supported cross-departmental operations at a pan-European online printing firm. (attendance certificate)

Speaker

  • [Invited] Talk at Software Heritage
    • Talk: “Lossless-compressed data storage for SWH: Compressed, tunable & energy-aware”
    • 2 Jul 2025
    • Inria Paris Centre, Paris

    Gave a talk to the Software Heritage team about ongoing research on an I/O-efficient caching system, a terabyte-scale energy-aware solution for source code archival. A second milestone presented was a shard permutation designed to boost compression on the current SWH infrastructure. Shard refers to the file format used in Winery. (slides)

  • [Invited] Sant'Anna Workshop “Learning from large, complex and structured data: advances in methods and applications”
    • Talk: “Toward Greener Matrix Operations by Lossless Compressed Formats”
    • 4 Jun 2025
    • Sant'Anna School of Advanced Studies

    Participated in a two-day workshop held in the Aula Magna of the Sant'Anna School of Advanced Studies (SSSUP), showcasing interdisciplinary research in Economics, Management, Law, and Data Science by young researchers from L'EMbeDS, the SMaRT COnSTRUCT project, and the AI for Society PhD programme.

    Contributed to the session “The frontiers of Computer Science”, chaired by Prof. Andrea Vandin. (website, slides)

  • [Invited] Google Developer Group (GDG) Pisa
    • Talk: “Verso operazioni più green su matrici tramite formati compressi senza perdita.”
    • 27 Feb 2025
    • Polo Fibonacci, Univ. of Pisa

    Invited talk on lossless compressed formats for greener sparse matrix operations at GDG Pisa. Thanks to Giovanna Rotundo (Women Techmakers Pisa) for the invitation. (website, slides)

  • 2025 Software Heritage Community Workshop, Paris
    • Poster: “Measuring impact by extracting knowledge of software assets”
    • 30 Jan 2025
    • Inria Palace, Paris

    Contributed to a collaborative community workshop and presented one of the four community posters titled Measuring impact by extracting knowledge of software assets (Zenodo link). The initiative aimed to enhance transparency, improve accessibility, and promote the mission of SWH. (workshop website)

    Posters presented by other groups:

    • Discovering open-source: One-stop shop for software discovery (Zenodo link)
    • The Library of Alexandria was available, until it was not (Zenodo link)
    • Repair today, repair tomorrow: Software Heritage (Zenodo link)
  • [Invited] Software Heritage Kickoff Workshops, Paris
    • Presented at the CodeCommons Kickoff
    • 28 Jan 2025
    • Inria Paris, France

    I gave a talk to all the research teams about my contribution to improving the space–time performance of insertion into and retrieval from the SWH archive. Talk title: “Enhancing SWH Object Storage with Compressed and Dynamic Solutions”. (website, slides)

  • [Invited] Seminar at Ca' Foscari University
    • Toward Greener Matrix Operations by Lossless Compressed Formats
    • 7 Nov 2024
    • Ca' Foscari Univ., Venice

    Presented to the members of the REGINDEX research group (Ca' Foscari University, Venice), directed by Prof. Nicola Prezza, a preprint I first-authored on the relationship between computation-friendly lossless compressed matrix formats for matrix-vector multiplication kernels and energy savings. (slides)

  • [Invited] Talk, Efficient Machine Learning Reading Group, chaired by TU Graz
    • Toward Greener Matrix Operations by Lossless Compressed Formats
    • 21 Oct 2024
    • virtual

    Presented a preprint I contributed as first author on the relationship between computation-friendly lossless compressed matrix formats for matrix-vector multiplication kernels and energy savings. The seminar series is chaired by the Embedded Learning and Sensing Systems research group directed by Prof. Olga Saukh (TU Graz). (website, YouTube, slides)

  • [Invited] Complexity Science Hub (CSH) Webtalk, Vienna
    • Improving Matrix-Vector Multiplication via Lossless Grammar-Compressed Matrices
    • 16 Sep 2022
    • Complexity Science Hub (CSH)

    Presented the journal article (j2) at a virtual talk hosted by the CSH. Thanks to Prof. Olga Saukh and Mr. Niraj Kushwaha (CSH) for the invitation. (website, slides)

  • VLDB '22: 48th International Conference on Very Large Databases
    • Talk: “Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices”
    • 5–9 Sep 2022
    • Sydney, Australia

    Presented the article (j2) as corresponding author [virtual]. (PVLDB website, slides)

  • Lipari School of Computational Complexity and Social Systems
    • PhD research summary presentation
    • 17–23 Jul 2022
    • Lipari, Italy

    During the summer school, I presented intermediate results of my PhD research. (Certificate of attendance)

  • Mauriana Pesaresi Seminar Series 2020/2021, Univ. of Pisa
    • Locality Filtering for Efficient Ride Sharing Platforms
    • 19 Feb 2021
    • Univ. of Pisa, Italy

    Presented my work on locality filtering techniques for ride-sharing platforms as part of this PhD student-organised seminar series. The talk discussed approaches to substantially speed up ride sharing computations while maintaining solution quality. (website, slides)

Participation in conferences

  • Software Heritage 2025 Symposium and Summit
    • UNESCO headquarters
    • 29 Jan 2025
    • Paris, France

    Engaged with leaders from UNESCO, Inria, and Software Heritage, participating in discussions and panels on critical topics including cybersecurity and regulation (e.g., EU's Cyber Resilience Act), open and transparent AI (with insights from EU AI Office, IBM Research, Open Source Initiative), open science (aligned with UNESCO Recommendation on Open Science), and cultural preservation of software as digital heritage. (UNESCO website, SWH website)

  • From Software Heritage to Code Commons: A Vision for Transparent and Responsible AI in Code-Based Model Training
    • Sant'Anna School of Advanced Studies
    • 12 Dec 2024
    • Sant'Anna School

    I attended a seminar presented by Roberto Di Cosmo (Université Paris Cité, SWH founder), which was held at the Pilo Boyl Palace, Sant'Anna School of Advanced Studies, Pisa. I gained insights into the ethical and technical challenges of using open codebases for AI model training, with emphasis on transparency, accountability, and the role of SWH in fostering CodeCommons' goals for responsible AI development. (website)

  • Conference article: “Compressed String Dictionaries via Data-Aware Subtrie Compaction”
    • SPIRE '22: 29th International Symposium on String Processing and Information Retrieval
    • 8–10 Nov 2022
    • Concepción, Chile

    Attended SPIRE '22 in person, where my research group contributed the conference article (c1). (website)

last update: 18th August '25