Blýskání na lepší data z českých digitálních knihoven (Better data from Czech digital libraries on the horizon)
Main Author: | Boris Lehečka |
---|---|
Format: | Article |
Language: | ces (Czech) |
Published: | Univerzita Karlova, Filozofická fakulta, 2023-06-01 |
Series: | Časopis pro Moderní Filologii |
Subjects: | big data; digital library; digital humanities; research infrastructure; copyright law |
Online Access: | https://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2023/07/Boris_Lehecka_274-291.pdf |

_version_ | 1797776471548231680 |
---|---|
author | Boris Lehečka |
collection | DOAJ |
description | In the humanities, the analysis of primary and secondary literature is an important area of research. Besides language corpora, digital libraries, in which approximately 98.7 million pages were digitized in the Czech Republic between 1992 and 2022, have in recent years become a suitable source of written texts. The article presents an example from abroad and gives a brief overview of data sources in the Czech environment. It focuses on the recently completed DL4DH project, which aims to give researchers access to large volumes of data from the Kramerius digital library in standardized formats (plain text, ALTO, CSV/TSV, TEI, JSON), not only through a new web application but also through a REST API. To make subsequent analysis of the publications as easy as possible, the downloaded data can be enriched with the output of the UDPipe and NameTag tools, developed and operated by the LINDAT/CLARIAH-CZ research infrastructure. |
first_indexed | 2024-03-12T22:50:27Z |
format | Article |
id | doaj.art-c51809b5037a4bf499dd27ac18df8afd |
institution | Directory of Open Access Journals |
issn | 0008-7386, 2336-6591 |
language | ces |
last_indexed | 2024-03-12T22:50:27Z |
publishDate | 2023-06-01 |
publisher | Univerzita Karlova, Filozofická fakulta |
record_format | Article |
series | Časopis pro Moderní Filologii |
spelling | doaj.art-c51809b5037a4bf499dd27ac18df8afd; 2023-07-20T12:20:03Z; ces; Univerzita Karlova, Filozofická fakulta; Časopis pro Moderní Filologii; 0008-7386; 2336-6591; 2023-06-01; vol. 105, no. 2, pp. 274-292; doi:10.14712/23366591.2023.2.7; Blýskání na lepší data z českých digitálních knihoven; Boris Lehečka (https://orcid.org/0000-0003-4893-5537), Moravská zemská knihovna v Brně; In the humanities, analysis of primary and secondary literature is an important area of research work. Besides language corpora, digital libraries, which digitized approximately 98.7 million pages in the Czech Republic between 1992 and 2022, can be considered a suitable source of written texts in recent years. The article presents an example from abroad and gives a brief overview of data sources in the Czech environment. It focuses on the recently completed DL4DH project, which aims to offer researchers access to large volumes of data from the Kramerius digital library in standardized formats (plain text, ALTO, CSV/TSV, TEI, JSON) not only through a new web application but also through a REST API. To make the subsequent analysis of the publications as easy as possible, the downloaded data can include enrichment data from the UDPipe and NameTag tools developed and operated by the LINDAT/CLARIAH-CZ research infrastructure.; https://casopispromodernifilologii.ff.cuni.cz/wp-content/uploads/sites/9/2023/07/Boris_Lehecka_274-291.pdf; big data; digital library; digital humanities; research infrastructure; copyright law |
title | Blýskání na lepší data z českých digitálních knihoven |
topic | big data; digital library; digital humanities; research infrastructure; copyright law |