The Corpus of the Danish Dictionary

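A Danish corpus of 40 million words of general language from the period 1983–92 was designed and compiled by DSL (The Society for Danish Language and Literature) to serve as a major source for a new six-volume dictionary of contemporary Danish. The corpus includes written and spoken, private and professional, and general and specialized language, and each of its 44,000 text samples is annotated with formalized information on these and other features of linguistic and sociological importance. The resulting multidimensional text-type specification supports the extraction of (virtual or real) subcorpora as well as statistical analyses. Specialized software has been developed for flexible interactive concordancing and analysis. The corpus is currently accessible only on site at DSL; nevertheless, several scholars and students have used it in their research. The experience gained by the DSL staff is being reused in co-operative language engineering projects within the European Union, and in 1998 a publicly available corpus will be released as an outcome of the PAROLE project.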
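The abstract does not describe the data model of DSL's software, so the following is only a minimal Python sketch, under assumed names, of what a multidimensional text-type specification makes possible: each text sample carries feature values (here the hypothetical keys `medium` and `domain`, echoing the written/spoken and general/specialized dimensions named above), a virtual subcorpus is a filtered view selected by those values, and a key-word-in-context (KWIC) concordance is drawn from it. `TextSample`, `virtual_subcorpus` and `kwic` are illustrative names, not DSL's API, and the sample sentences are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextSample:
    """One annotated text sample (hypothetical structure, not DSL's format)."""
    sample_id: str
    tokens: list[str]
    # Text-type features, e.g. {"medium": "written", "domain": "general"};
    # the key names are assumptions made for this sketch.
    features: dict[str, str] = field(default_factory=dict)


def virtual_subcorpus(samples: list[TextSample], **criteria: str) -> list[TextSample]:
    """Select the samples whose features match every given criterion.

    The result is 'virtual': it refers to the existing sample objects
    instead of copying any text into a new corpus.
    """
    return [
        s for s in samples
        if all(s.features.get(key) == value for key, value in criteria.items())
    ]


def kwic(samples: list[TextSample], keyword: str, width: int = 3) -> list[str]:
    """Build a KWIC concordance: one line per case-insensitive hit,
    with up to `width` tokens of left and right context."""
    target = keyword.lower()
    lines = []
    for s in samples:
        for i, tok in enumerate(s.tokens):
            if tok.lower() == target:
                left = " ".join(s.tokens[max(0, i - width):i])
                right = " ".join(s.tokens[i + 1:i + 1 + width])
                # Right-align the left context so the keywords line up.
                lines.append(f"{s.sample_id}: {left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    # Toy samples invented for illustration; not drawn from the DSL corpus.
    corpus = [
        TextSample("s1", "jeg læste bogen i går".split(),
                   {"medium": "written", "domain": "general"}),
        TextSample("s2", "vi talte om bogen".split(),
                   {"medium": "spoken", "domain": "general"}),
    ]
    written = virtual_subcorpus(corpus, medium="written")
    for line in kwic(written, "bogen"):
        print(line)
```

Run as a script, this prints a single concordance line, because only sample `s1` belongs to the written-language subcorpus. Keeping the subcorpus as a filter over shared objects, rather than a copy, is one way to read the abstract's distinction between virtual and real subcorpora.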

Bibliographic Details
Main Authors: Ole Norling-Christensen, Jørg Asmussen
Format: Article
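Language: Afrikaans
Published: Woordeboek van die Afrikaanse Taal-WAT, 2012-09-01
Series: Lexikos
Subjects: concordance, copyright, corpus, Danish, dictionary, frequency, language engineering, mutual information, SGML, statistics, subcorpus, t-score, text typology, word distribution
Online Access: http://lexikos.journals.ac.za/pub/article/view/955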
ISSN: 1684-4904, 2224-0039
DOI: 10.5788/8-1-955
Collection: DOAJ
Afrikaans title: Die korpus van die Deense Woordeboek
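The subject keywords include mutual information and t-score, two widely used association measures for collocation analysis. The record does not give the formulas behind the statistical analyses mentioned in the abstract, so the Python sketch below assumes the common textbook definitions for an adjacent word pair; the function name `collocation_scores` is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def collocation_scores(tokens: list[str], node: str, collocate: str) -> tuple[float, float] | None:
    """Return (MI, t-score) for the adjacent pair (node, collocate),
    or None if the pair never occurs.

    Assumed textbook definitions (not confirmed as DSL's):
        E       = f(x) * f(y) / N      expected pair frequency
        MI      = log2(f(x,y) / E)
        t-score = (f(x,y) - E) / sqrt(f(x,y))
    N is approximated by the token count, a common simplification.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    observed = bigram[(node, collocate)]
    if observed == 0:
        return None  # MI is undefined (log of zero) for an unseen pair
    expected = unigram[node] * unigram[collocate] / n
    mi = math.log2(observed / expected)
    t_score = (observed - expected) / math.sqrt(observed)
    return mi, t_score
```

For the toy token list `"a b a b c d".split()`, the pair `("a", "b")` occurs twice against an expected 2 × 2 / 6 ≈ 0.667, giving MI = log2(3) ≈ 1.585 and a t-score of about 0.943. The two measures are usually reported together because MI tends to favour rare, exclusive pairs while the t-score rewards pairs that are frequent in absolute terms.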