One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document

Bibliographic Details
Main Authors: Antonio Parziale, Giuliana Capriolo, Angelo Marcelli
Format: Article
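Language: English
Published: MDPI AG 2020-10-01
Series: Journal of Imaging
Subjects: keyword spotting; assisted transcription; handwritten documents; training set; automatic document processing; historical documents
Online Access: https://www.mdpi.com/2313-433X/6/10/109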
author Antonio Parziale
Giuliana Capriolo
Angelo Marcelli
collection DOAJ
description Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set.
first_indexed 2024-03-10T15:39:15Z
format Article
id doaj.art-2af61cf1065e434cac8c8777757f7437
institution Directory Open Access Journal
issn 2313-433X
language English
last_indexed 2024-03-10T15:39:15Z
publishDate 2020-10-01
publisher MDPI AG
record_format Article
series Journal of Imaging
spelling doaj.art-2af61cf1065e434cac8c8777757f7437
2023-11-20T16:57:15Z
eng
MDPI AG
Journal of Imaging
2313-433X
2020-10-01
volume 6, issue 10, article 109
10.3390/jimaging6100109
One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document
Antonio Parziale: Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
Giuliana Capriolo: Department of Cultural Heritage, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
Angelo Marcelli: Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
https://www.mdpi.com/2313-433X/6/10/109
keyword spotting; assisted transcription; handwritten documents; training set; automatic document processing; historical documents
title One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document
topic keyword spotting
assisted transcription
handwritten documents
training set
automatic document processing
historical documents
url https://www.mdpi.com/2313-433X/6/10/109