One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document
Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not compa...
Main Authors: | Antonio Parziale, Giuliana Capriolo, Angelo Marcelli |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-10-01 |
Series: | Journal of Imaging |
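Subjects: | keyword spotting; assisted transcription; handwritten documents; training set; automatic document processing; historical documents |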
Online Access: | https://www.mdpi.com/2313-433X/6/10/109 |
_version_ | 1797551050902732800 |
---|---|
author | Antonio Parziale Giuliana Capriolo Angelo Marcelli |
author_sort | Antonio Parziale |
collection | DOAJ |
description | Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set. |
first_indexed | 2024-03-10T15:39:15Z |
format | Article |
id | doaj.art-2af61cf1065e434cac8c8777757f7437 |
institution | Directory Open Access Journal |
issn | 2313-433X |
language | English |
last_indexed | 2024-03-10T15:39:15Z |
publishDate | 2020-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Imaging |
spelling | doaj.art-2af61cf1065e434cac8c8777757f7437 2023-11-20T16:57:15Z eng MDPI AG Journal of Imaging 2313-433X 2020-10-01 6 10 109 10.3390/jimaging6100109 One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document Antonio Parziale 0 Giuliana Capriolo 1 Angelo Marcelli 2 Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy Department of Cultural Heritage, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set. https://www.mdpi.com/2313-433X/6/10/109 keyword spotting assisted transcription handwritten documents training set automatic document processing historical documents |
title | One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document |
topic | keyword spotting assisted transcription handwritten documents training set automatic document processing historical documents |
url | https://www.mdpi.com/2313-433X/6/10/109 |