Paolo Ferragina, Full Professor, Department of Computer Science
- I'm leading the Advanced Algorithms and Application Lab (Acube), located at the Department of Computer Science, University of Pisa. We design and implement algorithms for compressing, indexing, searching and mining Big Data. We had/have collaborations on these themes with Senseable City Lab at MIT (Boston, USA), Pinello Lab at Harvard and MassGeneral Hospital (Boston, USA), European Broadcasting Union (EBU), Google (Zurich), Bloomberg (London), ST Microelectronics, Tiscali (Istella's search engine), Yahoo! Research (Barcelona), ENEL Foundation, CERVED Group, Bassilichi, Spazio Dati.
- I have received several international awards for my research activities, including:
- 2023 ESA Test of Time Award, assigned by the European Symposium on Algorithms (ESA) for the paper on "Engineering a Lightweight Suffix Array Construction Algorithm", Procs ESA 2002 and Algorithmica 2004.
- 2023 CIKM Test of Time Award, assigned by the ACM Conference on Information and Knowledge Management (CIKM) for the paper on TAGME (paper, software), 2023.
- ACM Paris Kanellakis from Theory to Practice award, from the Association for Computing Machinery (USA), 2022.
- Google Cloud Research Innovator program, 2022.
- "Cherub" (Cherubino), from the University of Pisa, 2019.
- Three Google research awards in 2010, 2012 and 2016.
- A 5-year Yahoo! research award from 2007 to 2010.
- The "Philip Morris Award on Science and Technology", 1997;
- The "EATCS Doctoral Dissertation Thesis Award", from the EATCS Italian Chapter, 1997.
- About my research [updated on Jan 2023]: my h-index is 38 on Scopus (with more than 6k citations) and 42 on Google Scholar (with more than 11k citations). My work follows four main lines, which are detailed below: Compressed and Learned Data Structures, Data Compression, Search Engines, Sports Analytics.
- I currently hold the following positions:
- Member of the Steering Committee of the SIAM Symposium on Algorithm Engineering and Experiments (ALENEX), 2024-2028.
- Member of the Board of the PhD in Computer Science of the University of Pisa (since 2002),
- Member of the Board of the Italian National PhD program in Artificial Intelligence for Society of the University of Pisa (since 2013/14, cycle 39),
- Member of the Scientific Committee of the CNR Department on "Engineering, ICT, and Technologies for Energy and Mobility" (since 2023),
- co-Director of the J.T. Schwartz International School for Scientific Research, topic "Computational Complex and Social Systems" (since 2016),
- Member of the Editorial Board of the Journal on Graph Algorithms and Applications (JGAA) (since 2011).
- Area Editor of the Encyclopedia of Algorithms (Springer, Editor Ming-Yang Kao, 2016), and of the Encyclopedia of Big Data Technologies (Springer, Editor Zomaya and Sakr, 2018).
- Member of the Advisory Board of the Master in "Big Data Analytics & Artificial Intelligence for Society", promoted by University of Pisa and CNR, and with the support of Scuola Superiore Sant'Anna and Scuola Normale Superiore.
- I'm proud to have served my University as:
- Vice-Rector on ICT (2019-2022).
- President of the PhD in Computer Science which is a joint initiative of the Universities of Pisa, Florence and Siena (2017-2020);
- Vice-rector on "Applied Research and Innovation" (2010-2016);
- President of the IT Center (which is a competence center about Cloud and HPC for Dell and Intel, Xeonphi Centre for Intel, 2009-2016);
- Vice-chairman of the Department of Computer Science (2006-2010).
- I have published more than 180 peer-reviewed papers in international journals and conferences.
- As of January 2023, my h-index is 38 on Scopus (with more than 6k citations) and 42 on Google Scholar (with more than 10k citations). See DBLP for an updated list of my most recent publications.
- My research has received the following main awards: "Best Land Transportation Paper Award" from IEEE Vehicular Technology Society (1995); "EATCS Doctoral Dissertation Thesis Award" (1997); "Philip Morris Award on Science and Technology" (1997); "Research Capital award" from the University of Pisa (2002); Yahoo! faculty award (2007-2010); Working Capital Award (2010); Google research award (2010, 2012 and 2016); Bloomberg Data Science research grant (2017); Google Cloud Research Innovator program (2022); ACM Paris Kanellakis from Theory to Practice award (2022); CIKM 2023 Test of Time award for my paper on TagME (2023).
- My accepted patents and publications include:
- Patents: five in the US (owned by Lucent, University of Pisa and Rutgers, Yahoo!, New York University, Catalog Technologies), one in Italy (owned by University of Pisa, code = 102021000014069), and three more pending in the USA (owned by Yahoo! and University of Pisa).
- Journals: 5 Journal of the ACM, 3 SIAM Journal on Computing, 4 ACM Transactions on Algorithms, 7 Algorithmica, 7 Theoretical Computer Science, 2 ACM Trans. on Information Systems, etc.
- Conferences: 3 ACM STOC, 5 IEEE FOCS, 7 ACM-SIAM SODA, 13 ESA, 6 WWW, 3 ACM WSDM, etc.
- My research addresses four main lines:
- Data Compression: From June 2016 to December 2020, Google supported our research on Information Theory and Data Compression, a topic we have been working on for 15 years. The project was called "An algorithmic analysis of Brotli to personalized data compression", and it received a Google Faculty Award (2016). Our contribution to the design of Brotli, which is one of the compressors used in many browsers and applications worldwide (e.g. Chrome), is described in the paper I co-authored with Jyrki Alakuijala and his group at Google, published in ACM TOIS 4:2018.
- Compressed and Learned Data Structures: Yahoo! Research sponsored my promotion to full professor from 2007 to 2011, awarding my research on compressed data structures. Among these results, I mention the FM-index that got the ACM Kanellakis Award in 2022 (see above). Very recently, I moved my attention to the interplay between Machine Learning and Data Compression, also thanks to the support by MIUR PRIN 2017, and started to design and develop data structures that successfully address the new issues posed by modern applications. A first example is the PGM index.
- Search Engines and Information Retrieval: I've got two Google research awards, in 2010 and 2012, and one Bloomberg Research Grant, in 2017 (round 4), on designing and applying a novel Semantic Annotation technology, known as TAGME (which got the 2023 CIKM Test of Time Award, see above, and won the ERD Challenge at SIGIR 2014), to several IR problems: Classification (ECIR 12), Clustering (WSDM 12), Social Network Analysis (ICWSM 15) and Query Disambiguation (WWW '16). For further information on the suite of entity-linking tools we developed in those years, please look at the Web API publicly available at the SoBigData Platform.
- Sports Analytics: We are developing algorithms and AI techniques for evaluating the performance of soccer players and for planning soccer matches, based on Big Data provided to us by Wyscout. Our first result is PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. PlayeRank has been designed and evaluated on a massive dataset of millions of events pertaining to four seasons of the five prominent European leagues, showing that it is the state of the art. Starting from these scientific achievements, in 2018 I co-founded the start-up PlayeRank, which is currently a spin-off of the University of Pisa.
- Member of the Scientific Committee of the collaboration between University of Pisa and the "Fondazione Toscana Life Sciences" (June 2020 - December 2022).
- Member of the Scientific Committee of the "Fondazione Innovazione e Sviluppo Imprenditoriale" of the Camera Commercio di Pisa (January 2017 - December 2022).
- Member of the 2019 ERC Panel Committee for PE6 - Computer Science and Informatics.
- co-Director of the Lipari PhD School on Computational Complex and Social Systems, Lipari (Italy), since 2014.
- Member of the Steering Committee of the European Symposium on Algorithms (ESA), 2012-2014.
- The Acube Lab had/has collaborations on Algorithmics for BigData with Senseable City Lab at MIT (Boston, USA), Pinello Lab at Harvard and MassGeneral Hospital (Boston, USA), Google (Zurich), Bloomberg (London), European Broadcasting Union (EBU), ST Microelectronics, Tiscali (Istella's search engine), Yahoo! Research (Barcelona), ENEL Foundation, CERVED Group, Bassilichi, Spazio Dati. Currently, I am:
- PI of the project on "Large scale compression of software source code" developed in collaboration with Roberto di Cosmo and Stefano Zacchiroli for the Software Heritage project (since 2022).
- PI of the project on "Biological Knowledge Graphs and AI for BioMed applications" developed in collaboration with FerroLab (University of Catania), selected in the 2022 Google Cloud Research Innovators program.
- Partner of the HumanE-AI project (call H2020-ICT-2019-3), sponsored by the EU, whose main goal is to study, design and experiment with AI systems that augment and empower all Humans by understanding us, our society and the world around us. The project consortium, with 35 partners from 17 countries and 4 large industrial members, has defined details of all aspects necessary to implement a full scale European action to mobilize major scientific, industrial, political and public support for the vision. (See the one day event at the European Parliament, March 2020.)
- Partner of the SoBigData++ EU research infrastructure for Social Mining and Big Data Analytics, started in January 2020. SoBigData++ strives to deliver a distributed, Pan-European, multi-disciplinary research infrastructure for big social data analytics, coupled with the consolidation of a cross-disciplinary European research community, aimed at using social mining and big data to understand the complexity of our contemporary, globally-interconnected society. SoBigData++ is set to advance on such ambitious tasks thanks to SoBigData, the predecessor project that started this construction in 2015.
- National Coordinator of the MUR-PRIN project on "Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond" (call 2017) to collaborate with groups led by R. Giancarlo (UniPA), G. Manzini (UniPO) and M. Frasca (UniMI).
- Coordinator for my University of the joint project with the European Broadcasting Union (EBU) on "algorithms, applications, prototypes and products in the field of Media service Production and Distribution as well as Data Science Applications which are of interest to EBU Members" (2020-).
- PI of the MIT-UniPI project on "Using Graph Compression for Shortest Path Computation in Urban On-Demand Mobility", developed in collaboration with Carlo Ratti's Senseable City Lab at MIT (Boston, USA), 2018-2021.
I am a Professor of Algorithms at the University of Pisa, with some other duties that are listed at the top of this web page. I founded and lead the Acube Lab, where we design algorithms for Big Data, in collaboration with companies worldwide: Google, Bloomberg, European Broadcasting Union, Tiscali, Yahoo!, ST Microelectronics, ENEL, Bassilichi, CERVED, Spazio Dati, etc.
My promotion to full professor was sponsored by Yahoo! Research, from 2007 to 2011. In 2019, I received the honor of "Cherub" (Cherubino) from the University of Pisa. In 2022, I was awarded the ACM Paris Kanellakis award by the Association for Computing Machinery. In 2023, my paper on TagMe received the CIKM 2023 Test of Time award.
From 2019 to 2022, I was the Vice-Rector on ICT of my University.
From 2018 to 2020, I was the President of the PhD in Computer Science which is a joint initiative of the Universities of Pisa, Florence and Siena. This is a regional PhD, named Pegaso, with about 15 students per year (over a total period of three years) and fellowships covered by Italian-MIUR, Tuscan Region, IIT-CNR and ISTI-CNR. I have also been a member of the PhD Steering Committee since 2002.
From 2010 to 2016, I was Vice-Rector on "Applied Research and Innovation" at the University of Pisa. In the same period I was the President of the IT Center, which is a competence center about Cloud and HPC for Dell and Intel, Xeonphi Centre for Intel, and recently Transform Data Center immersion for Microsoft. In this period, among other achievements, we introduced the PhD Plus: a series of lectures on research valorization and entrepreneurship for PhD and Master students and faculty members of the University of Pisa. The PhD Plus originated many start-ups that won some (inter-)national competitions and raised some millions worldwide. The PhD Plus was shortlisted for the QS Reimagine Education Award 2016 (Philadelphia, USA).
From 2006 to 2010 I was the vice-chairman of the Department of Computer Science of the University of Pisa.
Among the other significant institutional duties, I mention: member of the Scientific Committee of the Fondazione "Innovazione e Sviluppo Imprenditoriale"; member of the Patent Committee of the University of Pisa; member of the CdA of Consorzio QUINN; member of the Scientific Committee of the Fondazione Toscana Life Sciences. I taught at the Scuola Normale Superiore (2009-11) and I was one of the scientific coordinators of its research center Signum, specialized on Digital Humanities (2004-07).
I got my Laurea degree (summa cum laude, 1992) and my PhD (1996) in Computer Science from the University of Pisa, and was a post-doc at the Max-Planck-Institut für Informatik (Saarbrücken, 1997-98). From 1998 to 2000, I was Assistant Professor at the University of Pisa, and from 2000 to 2007, I was Associate Professor at the same University. I also spent various periods of research at IBM Research (Rome), AT&T Shannon Lab (NJ), Yahoo! Research (Barcelona), Google (NY), University of North Texas, Max-Planck-Institut für Informatik (DE), Courant Institute at New York University (USA), Harvard (USA) and MIT (USA).
My research is mainly devoted to the design, analysis and experimentation of algorithms and data structures for storing, compressing, mining and retrieving information from Big Data. My research results received five US Patents (owned by Lucent, University of Pisa and Rutgers, Yahoo!, New York University, CatalogDNA) and some international awards: "Best Land Transportation Paper Award" from IEEE Vehicular Technology Society (1995); "EATCS Doctoral Dissertation Thesis Award" (1997); "Philip Morris Award on Science and Technology" (1997); "Research Capital award" from the University of Pisa (2002); Yahoo! faculty award (2007-2010); Working Capital Award (2010); Google research award (2010, 2012 and 2016); Bloomberg Data Science research grant (2017); ACM Paris Kanellakis award (2022).
I've been an invited speaker at many international conferences and workshops on Algorithmics; in particular, I was a keynote speaker at CPM '04, SPIRE '05, ESA/ALGO 2010, SISAP '11, the Industrial Track of ECIR 2012, DFG Priority Program 1307 "Algorithm Engineering" (Germany), the DIITET 2015 National Conference of CNR (Pisa), Satellite Workshop on Quantifying Success @ NETSCI (Paris, 2018), INNS Conference on Big Data and Deep Learning (Venice, 2019), and the Italian Conference on Theoretical Computer Science (Ischia, ICTCS 2020).
I'm serving on the Editorial Board of the Journal of Graph Algorithms and Applications (JGAA). I was on the Steering Committee of the European Symposium on Algorithms (ESA, 2012-2014), and I was one of the Area Editors of the Encyclopedia of Algorithms (Springer, Editor Ming-Yang Kao) for the topics "Data compression, String Algorithms and Data Structures", and of the Encyclopedia of Big Data Technologies (Springer, Editor Zomaya and Sakr, 2018). I served as (co)editor of special issues of the international journals: Theory of Computing Systems (June 2006), Theoretical Computer Science (November 2007), Information Retrieval (August 2008), Theoretical Computer Science (November 2009), Algorithmica (November 2014).
I have served as PC member of many International Conferences on Theoretical Computer Science, specifically in the field of Algorithmics. I've been co-chair of International Conference on FUN with Algorithms (2004), DIMACS Workshop on the Burrows-Wheeler Transform (2004), Symposium on String Processing and Information Retrieval (2006), Symposium on Combinatorial Pattern Matching (2008), European Symposium on Algorithms, Algorithm Engineering Track (2012), ACM Conference on Web Search and Data Mining (2013), and the Indo-Italian School on "Algorithms and Combinatorics" that took place within the 5th Annual International Conference on Algorithms and Discrete Applied Mathematics (IIT Kharagpur, India).
I have (co-)authored more than 180 refereed publications in international conferences and journals on Theoretical Computer Science and Algorithmics. Let me recall some of my publications in journals (5 JACM, 3 SICOMP, 4 ACM Trans. on Algorithms, 2 ACM Trans. on Information Systems, 7 Algorithmica, 7 TCS, 2 ACM J. on Exp. Algorithmics, 2 Information and Computation, etc.) and conferences (3 ACM STOC, 5 IEEE FOCS, 7 ACM-SIAM SODA, 13 ESA, 6 WWW, 6 ACM WSDM, etc.). I have also co-authored two Italian books and two English books: one on Cryptography (Bollati Boringhieri, 2001 and 2007; now UniPI Press, 2015), one on Computational Thinking (Il Mulino, 2017), recently translated into English and published by Springer (2018), and one on Algorithm Engineering published by Cambridge University Press. I also authored some chapters in books: just to mention a few, one chapter on "String search in external memory: Algorithms and data structures" in the Handbook of Computational Molecular Biology (CRC Press, Editor Srinivas Aluru, 2005), one chapter on "Web Search" in the book On the power of algorithms (Springer, Editors Ausiello-Petreschi, 2013), and one on "Computational Biology" in the "Handbook of Data Structures and Applications" (CRC Press, 2017).
Here you can download an updated curriculum vitae, whereas for an updated list of publications look at the CS Bibliographic Database, or via Google Scholar.