Here you can find a consolidated (a.k.a. slowly updated) list of my publications. A frequently updated (and possibly noisy) list of works is available on my Google Scholar profile.
Below is a short list of highlighted publications from my recent activity.
Cosimo, Della Santina; Visar, Arapi; Giuseppe, Averta; Francesca, Damiani; Gaia, Fiore; Alessandro, Settimi; Giuseppe, Catalano Manuel; Davide, Bacciu; Antonio, Bicchi; Matteo, Bianchi Learning from humans how to grasp: a data-driven architecture for autonomous grasping with anthropomorphic soft hands Journal Article In: IEEE Robotics and Automation Letters, pp. 1-8, 2019, ISSN: 2377-3766, (Also accepted for presentation at ICRA 2019). Arapi, Visar; Santina, Cosimo Della; Bacciu, Davide; Bianchi, Matteo; Bicchi, Antonio DeepDynamicHand: A deep neural architecture for labeling hand manipulation strategies in video sources exploiting temporal information Journal Article In: Frontiers in Neurorobotics, vol. 12, pp. 86, 2018. Davide, Bacciu A Perceptual Learning Model to Discover the Hierarchical Latent Structure of Image Collections PhD Thesis 2008.
@article{ral2019,
  title     = {Learning from humans how to grasp: a data-driven architecture for autonomous grasping with anthropomorphic soft hands},
  author    = {Della Santina, Cosimo and Arapi, Visar and Averta, Giuseppe and Damiani, Francesca and Fiore, Gaia and Settimi, Alessandro and Catalano, Manuel Giuseppe and Bacciu, Davide and Bicchi, Antonio and Bianchi, Matteo},
  url       = {https://ieeexplore.ieee.org/document/8629968},
  doi       = {10.1109/LRA.2019.2896485},
  issn      = {2377-3766},
  year      = {2019},
  date      = {2019-02-01},
  journal   = {IEEE Robotics and Automation Letters},
  pages     = {1--8},
  note      = {Also accepted for presentation at ICRA 2019},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
@article{frontNeurob18,
  title     = {{DeepDynamicHand}: A deep neural architecture for labeling hand manipulation strategies in video sources exploiting temporal information},
  author    = {Arapi, Visar and Della Santina, Cosimo and Bacciu, Davide and Bianchi, Matteo and Bicchi, Antonio},
  url       = {https://www.frontiersin.org/articles/10.3389/fnbot.2018.00086/full},
  doi       = {10.3389/fnbot.2018.00086},
  year      = {2018},
  date      = {2018-12-17},
  urldate   = {2018-12-17},
  journal   = {Frontiers in Neurorobotics},
  volume    = {12},
  pages     = {86},
  abstract  = {Humans are capable of complex manipulation interactions with the environment, relying on the intrinsic adaptability and compliance of their hands. Recently, soft robotic manipulation has attempted to reproduce such an extraordinary behavior, through the design of deformable yet robust end-effectors. To this goal, the investigation of human behavior has become crucial to correctly inform technological developments of robotic hands that can successfully exploit environmental constraint as humans actually do. Among the different tools robotics can leverage on to achieve this objective, deep learning has emerged as a promising approach for the study and then the implementation of neuro-scientific observations on the artificial side. However, current approaches tend to neglect the dynamic nature of hand pose recognition problems, limiting the effectiveness of these techniques in identifying sequences of manipulation primitives underpinning action generation, e.g. during purposeful interaction with the environment. In this work, we propose a vision-based supervised Hand Pose Recognition method which, for the first time, takes into account temporal information to identify meaningful sequences of actions in grasping and manipulation tasks. More specifically, we apply Deep Neural Networks to automatically learn features from hand posture images that consist of frames extracted from grasping and manipulation task videos with objects and external environmental constraints. For training purposes, videos are divided into intervals, each associated to a specific action by a human supervisor. The proposed algorithm combines a Convolutional Neural Network to detect the hand within each video frame and a Recurrent Neural Network to predict the hand action in the current frame, while taking into consideration the history of actions performed in the previous frames.
Experimental validation has been performed on two datasets of dynamic hand-centric strategies, where subjects regularly interact with objects and environment. Proposed architecture achieved a very good classification accuracy on both datasets, reaching performance up to 94%, and outperforming state of the art techniques. The outcomes of this study can be successfully applied to robotics, e.g. for planning and control of soft anthropomorphic manipulators.},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {article}
}
@phdthesis{11568_466874,
  title     = {A Perceptual Learning Model to Discover the Hierarchical Latent Structure of Image Collections},
  author    = {Bacciu, Davide},
  url       = {http://e-theses.imtlucca.it/id/eprint/7},
  doi       = {10.6092/imtlucca/e-theses/7},
  year      = {2008},
  date      = {2008-01-01},
  urldate   = {2008-01-01},
  school    = {IMT Lucca},
  abstract  = {Biology has been an unparalleled source of inspiration for the work of researchers in several scientific and engineering fields including computer vision. The starting point of this thesis is the neurophysiological properties of the human early visual system, in particular, the cortical mechanism that mediates learning by exploiting information about stimuli repetition. Repetition has long been considered a fundamental correlate of skill acquisition and memory formation in biological as well as computational learning models. However, recent studies have shown that biological neural networks have different ways of exploiting repetition in forming memory maps. The thesis focuses on a perceptual learning mechanism called repetition suppression, which exploits the temporal distribution of neural activations to drive an efficient neural allocation for a set of stimuli. This explores the neurophysiological hypothesis that repetition suppression serves as an unsupervised perceptual learning mechanism that can drive efficient memory formation by reducing the overall size of stimuli representation while strengthening the responses of the most selective neurons. This interpretation of repetition is different from its traditional role in computational learning models mainly to induce convergence and reach training stability, without using this information to provide focus for the neural representations of the data. The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning. The model is applied to general problems in the fields of computational intelligence and machine learning. Particular emphasis is placed on validating the model as an effective tool for the unsupervised exploration of bio-medical data.
In particular, it is shown that the repetition suppression mechanism efficiently addresses the issues of automatically estimating the number of clusters within the data, as well as filtering noise and irrelevant input components in highly dimensional data, e.g. gene expression levels from DNA Microarrays. The CoRe model produces relevance estimates for each covariate which is useful, for instance, to discover the best discriminating bio-markers. The description of the model includes a theoretical analysis using Huber's robust statistics to show that the model is robust to outliers and noise in the data. The convergence properties of the model are also studied. It is shown that, besides its biological underpinning, the CoRe model has useful properties in terms of asymptotic behavior. By exploiting a kernel-based formulation for the CoRe learning error, a theoretically sound motivation is provided for the model's ability to avoid local minima of its loss function. To do this a necessary and sufficient condition for global error minimization in vector quantization is generalized by extending it to distance metrics in generic Hilbert spaces. This leads to the derivation of a family of kernel-based algorithms that address the local minima issue of unsupervised vector quantization in a principled way. The experimental results show that the algorithm can achieve a consistent performance gain compared with state-of-the-art learning vector quantizers, while retaining a lower computational complexity (linear with respect to the dataset size). Bridging the gap between the low level representation of the visual content and the underlying high-level semantics is a major research issue of current interest. The second part of the thesis focuses on this problem by introducing a hierarchical and multi-resolution approach to visual content understanding.
On a spatial level, CoRe learning is used to pool together the local visual patches by organizing them into perceptually meaningful intermediate structures. On the semantical level, it provides an extension of the probabilistic Latent Semantic Analysis (pLSA) model that allows discovery and organization of the visual topics into a hierarchy of aspects. The proposed hierarchical pLSA model is shown to effectively address the unsupervised discovery of relevant visual classes from pictorial collections, at the same time learning to segment the image regions containing the discovered classes. Furthermore, by drawing on a recent pLSA-based image annotation system, the hierarchical pLSA model is extended to process and represent multi-modal collections comprising textual and visual data. The results of the experimental evaluation show that the proposed model learns to attach textual labels (available only at the level of the whole image) to the discovered image regions, while increasing the precision/recall performance with respect to flat, pLSA annotation model.},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {phdthesis}
}