Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Bibliographic Details
Main Authors: Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
Format: Article
Language: English
Published: Nicolas Turenne 2021-01-01
Series: Journal of Data Mining and Digital Humanities
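Subjects: computer science - computer vision and pattern recognition; computer science - computation and language; computer science - information retrieval; computer science - machine learning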
Online Access: https://jdmdh.episciences.org/7097/pdf
author Raphaël Barman
Maud Ehrmann
Simon Clematide
Sofia Ares Oliveira
Frédéric Kaplan
collection DOAJ
description The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. While the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.
first_indexed 2024-12-16T07:54:49Z
format Article
id doaj.art-8fb6a1f970e74e7dbdcf10837c9c4e2e
institution Directory Open Access Journal
issn 2416-5999
language English
last_indexed 2024-12-16T07:54:49Z
publishDate 2021-01-01
publisher Nicolas Turenne
record_format Article
series Journal of Data Mining and Digital Humanities
spelling doaj.art-8fb6a1f970e74e7dbdcf10837c9c4e2e; 2022-12-21T22:38:46Z; eng; Nicolas Turenne; Journal of Data Mining and Digital Humanities; 2416-5999; 2021-01-01; HistoInformatics; HistoInformatics; jdmdh:7097; Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers; Raphaël Barman; Maud Ehrmann; Simon Clematide; Sofia Ares Oliveira; Frédéric Kaplan; https://jdmdh.episciences.org/7097/pdf; computer science - computer vision and pattern recognition; computer science - computation and language; computer science - information retrieval; computer science - machine learning
title Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
topic computer science - computer vision and pattern recognition
computer science - computation and language
computer science - information retrieval
computer science - machine learning
url https://jdmdh.episciences.org/7097/pdf