Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of

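There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.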

Bibliographic Details
Main Author:	Hyun-Seok Park
Format:	Article
Language:	English
Published:	Korea Genome Organization 2018-12-01
Series:	Genomics & Informatics
Subjects:	biomedical text mining; corpus; text analytics
Online Access:	http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf

author	Hyun-Seok Park
collection	DOAJ
description	There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.
first_indexed	2024-12-10T10:31:13Z
format	Article
id	doaj.art-88d5117789994eb7a87272fc34e213c4
institution	Directory Open Access Journal
issn	2234-0742
language	English
last_indexed	2024-12-10T10:31:13Z
publishDate	2018-12-01
publisher	Korea Genome Organization
record_format	Article
series	Genomics & Informatics
doi	10.5808/GI.2018.16.4.e40
volume	16
issue	4
affiliation	Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
title	Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of
topic	biomedical text mining; corpus; text analytics
url	http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf