Blýskání na lepší data z českých digitálních knihoven [Glimmers of better data from Czech digital libraries]


Bibliographic Details
Main Author: Boris Lehečka
Format: Article
Language: ces
Published: Univerzita Karlova, Filozofická fakulta 2023-06-01
Series: Časopis pro Moderní Filologii
Subjects: big data; digital library; digital humanities; research infrastructure; copyright law
Online Access: https://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2023/07/Boris_Lehecka_274-291.pdf
_version_ 1797776471548231680
author Boris Lehečka
collection DOAJ
description In the humanities, the analysis of primary and secondary literature is an important area of research. In recent years, digital libraries, which digitized approximately 98.7 million pages in the Czech Republic between 1992 and 2022, can also be considered a suitable source of written texts alongside language corpora. The article presents an example from abroad and gives a brief overview of data sources in the Czech environment. It focuses on the recently completed DL4DH project, which aims to give researchers access to large volumes of data from the Kramerius digital library in standardized formats (plain text, ALTO, CSV/TSV, TEI, JSON), both through a new web application and through a REST API. To make subsequent analysis of the publications as easy as possible, the downloaded data can include enrichment data from the UDPipe and NameTag tools developed and operated by the LINDAT/CLARIAH-CZ research infrastructure.
first_indexed 2024-03-12T22:50:27Z
format Article
id doaj.art-c51809b5037a4bf499dd27ac18df8afd
institution Directory Open Access Journal
issn 0008-7386
2336-6591
language ces
last_indexed 2024-03-12T22:50:27Z
publishDate 2023-06-01
publisher Univerzita Karlova, Filozofická fakulta
record_format Article
series Časopis pro Moderní Filologii
spelling doaj.art-c51809b5037a4bf499dd27ac18df8afd
2023-07-20T12:20:03Z
ces
Univerzita Karlova, Filozofická fakulta
Časopis pro Moderní Filologii
0008-7386
2336-6591
2023-06-01
volume 105, issue 2, pages 274-292
doi 10.14712/23366591.2023.2.7
Blýskání na lepší data z českých digitálních knihoven
Boris Lehečka, https://orcid.org/0000-0003-4893-5537, Moravská zemská knihovna v Brně
https://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2023/07/Boris_Lehecka_274-291.pdf
big data
digital library
digital humanities
research infrastructure
copyright law
title Blýskání na lepší data z českých digitálních knihoven
topic big data
digital library
digital humanities
research infrastructure
copyright law
url https://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2023/07/Boris_Lehecka_274-291.pdf