CAGE: Cache-Aware Graphlet Enumeration

Abstract

When information is inherently linked (implicitly or explicitly) and is modeled as a network, pattern retrieval can benefit from this linked structure. In networks, "graphlets" (connected induced subgraphs of a given size k) are the counterparts of textual n-grams, as their frequency and shape can give powerful insights into the structure of a network and the role of its nodes. Unlike n-grams, however, the number of graphlets increases dramatically with their size k. We aim to push the exact enumeration of graphlets as far as possible, as enumeration (contrary to counting or approximation) gives the end user the flexibility of arbitrary queries and restrictions on the graphlets found. To this end, we exploit combinatorial and cache-efficient design strategies to cut the computational cost. The resulting algorithm CAGE (Cache-Aware Graphlet Enumeration) outperforms existing enumeration strategies by at least an order of magnitude, exhibiting a low number of L1, L2, and L3 cache misses in the CPU. It is also competitive with the fastest known counting algorithms, without sharing their limitations on k.
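For readers unfamiliar with the task, graphlet enumeration means listing every connected induced subgraph on k vertices. The sketch below illustrates the problem with the classic ESU-style recursion (extend a subgraph only through "exclusive" neighbors with an index larger than the root, so each graphlet is emitted exactly once); it is a minimal illustrative baseline, not the cache-aware CAGE algorithm described in the paper.

```python
def enumerate_graphlets(adj, k):
    """List all connected induced subgraphs of size k.

    adj: adjacency list, adj[v] is the set of neighbors of vertex v.
    Returns each graphlet once, as a sorted tuple of vertex ids.
    This is a plain ESU-style enumerator, shown only for illustration.
    """
    results = []

    def extend(sub, ext, root):
        if len(sub) == k:
            results.append(tuple(sorted(sub)))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Extension candidates: neighbors of w that are new to the
            # subgraph, not adjacent to it yet, and larger than the root
            # (this ordering rule prevents duplicate enumeration).
            new_ext = ext | {
                u for u in adj[w]
                if u > root
                and u not in sub
                and all(u not in adj[s] for s in sub)
            }
            extend(sub | {w}, new_ext, root)

    for v in range(len(adj)):
        extend({v}, {u for u in adj[v] if u > v}, v)
    return results

# Example: a path 0-1-2-3 has two connected induced subgraphs of size 3.
path = [{1}, {0, 2}, {1, 3}, {2}]
print(sorted(enumerate_graphlets(path, 3)))  # [(0, 1, 2), (1, 2, 3)]
```

Even this simple recursion makes the paper's motivation concrete: the number of recursive branches, and hence of graphlets, grows steeply with k, which is why cache-aware design matters for pushing exact enumeration further.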

Type
Publication
In Proceedings of the 30th International Symposium on String Processing and Information Retrieval (SPIRE), 2023