Introducing DeReKoGram: A Novel Frequency Dataset with Lemma and Part-of-Speech Information for German

We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We descr...

Bibliographic Details
Main Authors: Sascha Wolfer, Alexander Koplenig, Marc Kupietz, Carolin Müller-Spitzer
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/8/11/170