Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics
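There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.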
Main Author: | Hyun-Seok Park |
---|---|
Format: | Article |
Language: | English |
Published: | Korea Genome Organization, 2018-12-01 |
Series: | Genomics & Informatics |
Subjects: | biomedical text mining; corpus; text analytics |
Online Access: | http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf |
_version_ | 1818049038333771776 |
---|---|
author | Hyun-Seok Park |
collection | DOAJ |
description | There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation. |
first_indexed | 2024-12-10T10:31:13Z |
format | Article |
id | doaj.art-88d5117789994eb7a87272fc34e213c4 |
institution | Directory Open Access Journal |
issn | 2234-0742 |
language | English |
last_indexed | 2024-12-10T10:31:13Z |
publishDate | 2018-12-01 |
publisher | Korea Genome Organization |
record_format | Article |
series | Genomics & Informatics |
spelling | doaj.art-88d5117789994eb7a87272fc34e213c4; 2022-12-22T01:52:33Z; eng; Korea Genome Organization; Genomics & Informatics; 2234-0742; 2018-12-01; 16; 4; 10.5808/GI.2018.16.4.e40; 542; Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of; Hyun-Seok Park; 0 Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea; There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.; http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf; biomedical text mining; corpus; text analytics |
spellingShingle | Hyun-Seok Park Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics biomedical text mining corpus text analytics |
title | Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics |
title_sort | opinion strategy of semi automatically annotating a full text corpus of |
topic | biomedical text mining corpus text analytics |
url | http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf |
work_keys_str_mv | AT hyunseokpark opinionstrategyofsemiautomaticallyannotatingafulltextcorpusof |