Here you can find a consolidated (a.k.a. slowly updated) list of my publications. A frequently updated (and possibly noisy) list of works is available on my Google Scholar profile.
Please find below a short list of highlight publications for my recent activity.
Serramazza, Davide Italo; Bacciu, Davide
Learning image captioning as a structured transduction task Conference
Proceedings of the 23rd International Conference on Engineering Applications of Neural Networks (EANN 2022), vol. 1600, Communications in Computer and Information Science Springer, 2022.
@conference{Serramazza2022,
title = {Learning image captioning as a structured transduction task},
author = {Davide Italo Serramazza and Davide Bacciu},
doi = {doi.org/10.1007/978-3-031-08223-8_20},
year = {2022},
date = {2022-06-20},
urldate = {2022-06-20},
booktitle = {Proceedings of the 23rd International Conference on Engineering Applications of Neural Networks (EANN 2022)},
volume = {1600},
pages = {235–246},
publisher = {Springer},
series = {Communications in Computer and Information Science },
abstract = {Image captioning is a task typically approached by deep encoder-decoder architectures, where the encoder component works on a flat representation of the image while the decoder considers a sequential representation of natural language sentences. As such, these encoder-decoder architectures implement a simple and very specific form of structured transduction, that is a generalization of a predictive problem where the input data and output predictions might have substantially different structures and topologies. In this paper, we explore a generalization of such an approach by addressing the problem as a general structured transduction problem. In particular, we provide a framework that allows considering input and output information with a tree-structured representation. This allows taking into account the hierarchical nature underlying both images and sentences. To this end, we introduce an approach to generate tree-structured representations from images along with an autoencoder working with this kind of data. We empirically assess our approach on both synthetic and realistic tasks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Image captioning is a task typically approached by deep encoder-decoder architectures, where the encoder component works on a flat representation of the image while the decoder considers a sequential representation of natural language sentences. As such, these encoder-decoder architectures implement a simple and very specific form of structured transduction, that is a generalization of a predictive problem where the input data and output predictions might have substantially different structures and topologies. In this paper, we explore a generalization of such an approach by addressing the problem as a general structured transduction problem. In particular, we provide a framework that allows considering input and output information with a tree-structured representation. This allows taking into account the hierarchical nature underlying both images and sentences. To this end, we introduce an approach to generate tree-structured representations from images along with an autoencoder working with this kind of data. We empirically assess our approach on both synthetic and realistic tasks.