Publishing an OCR ground truth data set for reuse in an unclear copyright setting. Two case studies with legal and technical solutions to enable a collective OCR ground truth data set effort
We present an OCR ground truth data set for historical prints and show improvement of recognition results over baselines with training on this data. We reflect on reusability of the ground truth data set based on two experiments that look into the lega...
Main Authors: | David Lassner, Julius Coburger, Clemens Neudecker, Anne Baillot |
---|---|
Format: | Article |
Language: | deu |
Published: | Forschungsverbund Marbach Weimar Wolfenbüttel, 2021-09-01 |
Series: | Zeitschrift für digitale Geisteswissenschaften |
Subjects: | informatik, maschinelles lernen, optische zeichenerkennung, urheberrecht |
Online Access: | https://www.zfdg.de/node/340 |
_version_ | 1797966257055596544 |
---|---|
author | David Lassner; Julius Coburger; Clemens Neudecker; Anne Baillot |
author_facet | David Lassner; Julius Coburger; Clemens Neudecker; Anne Baillot |
author_sort | David Lassner |
collection | DOAJ |
description | We present an OCR ground truth data set for historical prints and show improvement of recognition results over baselines with training on this data. We reflect on reusability of the ground truth data set based on two experiments that look into the legal basis for reuse of digitized document images in the case of 19th century English and German books. We propose a framework for publishing ground truth data even when digitized document images cannot be easily redistributed. |
first_indexed | 2024-04-11T02:11:45Z |
format | Article |
id | doaj.art-6d4f24b55965459fbcebf31cc807db14 |
institution | Directory Open Access Journal |
issn | 2510-1358 |
language | deu |
last_indexed | 2024-04-11T02:11:45Z |
publishDate | 2021-09-01 |
publisher | Forschungsverbund Marbach Weimar Wolfenbüttel |
record_format | Article |
series | Zeitschrift für digitale Geisteswissenschaften |
spelling | doaj.art-6d4f24b55965459fbcebf31cc807db14; 2023-01-03T01:53:59Z; deu; Forschungsverbund Marbach Weimar Wolfenbüttel; Zeitschrift für digitale Geisteswissenschaften; 2510-1358; 2021-09-01; 5610.17175/sb005_0061780168195; Publishing an OCR ground truth data set for reuse in an unclear copyright setting. Two case studies with legal and technical solutions to enable a collective OCR ground truth data set effort; David Lassner (https://orcid.org/0000-0001-9013-0834); Julius Coburger (https://orcid.org/0000-0003-4502-7955); Clemens Neudecker (https://orcid.org/0000-0001-5293-8322); Anne Baillot (https://orcid.org/0000-0002-4593-059X); We present an OCR ground truth data set for historical prints and show improvement of recognition results over baselines with training on this data. We reflect on reusability of the ground truth data set based on two experiments that look into the legal basis for reuse of digitized document images in the case of 19th century English and German books. We propose a framework for publishing ground truth data even when digitized document images cannot be easily redistributed.; https://www.zfdg.de/node/340; informatik maschinelles lernen optische zeichenerkennung urheberrecht |
title | Publishing an OCR ground truth data set for reuse in an unclear copyright setting. Two case studies with legal and technical solutions to enable a collective OCR ground truth data set effort |
title_sort | publishing an ocr ground truth data set for reuse in an unclear copyright setting two case studies with legal and technical solutions to enable a collective ocr ground truth data set effort |
topic | informatik maschinelles lernen optische zeichenerkennung urheberrecht |
url | https://www.zfdg.de/node/340 |