«La Repubblica» Corpus

This paper reviews a large resource of contemporary Italian newspaper language, the «La Repubblica» corpus. The corpus contains articles that appeared in the Italian daily newspaper La Repubblica between 1985 and 2000 and comprises more than 380 million tokens. Apart from being tokenized, it i...

Bibliographic Details
Main Author:	Rebecca Sierig
Format:	Article
Language:	deu
Published:	Institut für Dokumentologie und Editorik e. V. 2017-09-01
Series:	RIDE
Subjects:	20th century; interface; italian; linguistic search; newspaper; pos-tagging; tei; text collection
Online Access:	https://ride.i-d-e.de/issues/issue-6/la-repubblica-corpus/

author	Rebecca Sierig
collection	DOAJ
description	This paper reviews a large resource of contemporary Italian newspaper language, the «La Repubblica» corpus. The corpus contains articles that appeared in the Italian daily newspaper La Repubblica between 1985 and 2000 and comprises more than 380 million tokens. Besides being tokenized, it is PoS-tagged, enriched with TEI-conformant structural mark-up, and categorized by topic and genre. The first part of the paper addresses the data and their preparation; the second part deals with access to the corpus. When the review was written, the corpus could be accessed in two ways: through the ‘old’ interface hosted directly by the Institute of Translational Studies at the University of Bologna (SSLMIT), or through the ‘new’ one hosted on a NoSketch Engine instance. The two are compared in order to point out the changes.
first_indexed	2024-03-08T16:57:29Z
format	Article
id	doaj.art-0aea1776475e4832b11a6592ff546c87
institution	Directory Open Access Journal
issn	2363-4952
language	deu
last_indexed	2024-03-08T16:57:29Z
publishDate	2017-09-01
publisher	Institut für Dokumentologie und Editorik e. V.
record_format	Article
series	RIDE
spelling	doaj.art-0aea1776475e4832b11a6592ff546c87 | 2024-01-04T18:19:37Z | deu | Institut für Dokumentologie und Editorik e. V. | RIDE | 2363-4952 | 2017-09-01 | 6 | 10.18716/ride.a.6.9 | «La Repubblica» Corpus | Rebecca Sierig | 0 | https://orcid.org/0000-0002-5323-4543 | University of Leipzig | This paper reviews a large resource of contemporary Italian newspaper language, the «La Repubblica» corpus. The corpus contains articles that appeared in the Italian daily newspaper La Repubblica between 1985 and 2000 and comprises more than 380 million tokens. Besides being tokenized, it is PoS-tagged, enriched with TEI-conformant structural mark-up, and categorized by topic and genre. The first part of the paper addresses the data and their preparation; the second part deals with access to the corpus. When the review was written, the corpus could be accessed in two ways: through the ‘old’ interface hosted directly by the Institute of Translational Studies at the University of Bologna (SSLMIT), or through the ‘new’ one hosted on a NoSketch Engine instance. The two are compared in order to point out the changes. | https://ride.i-d-e.de/issues/issue-6/la-repubblica-corpus/ | 20th century; interface; italian; linguistic search; newspaper; pos-tagging; tei; text collection
title	«La Repubblica» Corpus
topic	20th century; interface; italian; linguistic search; newspaper; pos-tagging; tei; text collection
url	https://ride.i-d-e.de/issues/issue-6/la-repubblica-corpus/

Similar Items