Evaluating Transformer Models for Punctuation Restoration in Italian

Alessio Miaschi, Andrea Amelio Ravelli, Felice Dell'Orletta

November 2021

Abstract

In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test the resulting models in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks.

Type

Conference paper

Publication

In Proceedings of the 5th Workshop on Natural Language for Artificial Intelligence (NL4AI @ AIxIA 2021)

Alessio Miaschi

PostDoc in Natural Language Processing