Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of

Bibliographic Details
Main Author:	Hyun-Seok Park
Format:	Article
Language:	English
Published:	Korea Genome Organization 2018-12-01
Series:	Genomics & Informatics
Subjects:	biomedical text mining; corpus; text analytics
Online Access:	http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf

Description
Summary:	There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.
ISSN:	2234-0742
