Enhanced normalization approach addressing stop-word complexity in compound-word schema labels

Bibliographic Details
Main Authors: Hossain, Jafreen, Mohd Sani, Nor Fazlida, Affendey, Lilly Suriani, Ishak, Iskandar, Kasmiran, Khairul Azhar
Format: Article
Language: English
Published: Asian Research Publication Network 2017
Online Access: http://psasir.upm.edu.my/id/eprint/61723/1/Enhanced%20normalization%20approach%20addressing%20stop-word%20complexity%20.pdf
author Hossain, Jafreen
Mohd Sani, Nor Fazlida
Affendey, Lilly Suriani
Ishak, Iskandar
Kasmiran, Khairul Azhar
collection UPM
description An extensive review of existing schema matching approaches identified room for improvement in semantic schema matching. Normalization and lexical annotation methods using WordNet have been somewhat successful in general cases. However, in the presence of stop-words these approaches yield poor accuracy. Most earlier studies ignored stop-words, which resulted in false-negative conclusions. This paper proposes NORMSTOP (NORMalizer of schemata having STOP-words), an improved schema normalization approach that addresses the complexity of stop-words (e.g. ‘by’, ‘at’, ‘and’, ‘or’) in Compound Word (CW) schema labels. Using a combined set of WordNet features, NORMSTOP isolates these labels during the preprocessing stage and resets the base form to a relevant WordNet term or an annotatable compound noun. When tested on the same real dataset used for the earlier approach, NORMS (NORMalizer of Schemata), NORMSTOP shows up to 13% improvement in annotation recall. This improvement brings the overall schema matching process a step closer to perfect accuracy, whereas its absence leaves a gap in expectation, especially in today’s databases, where stop-words are abundant.
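The isolation step mentioned in the description can be pictured with a minimal sketch. This is not the authors' NORMSTOP implementation: it only shows one plausible way to split compound-word schema labels into tokens and set aside those containing a stop-word. The function names, the sample labels, and the splitting rules (underscores, hyphens, camelCase) are assumptions for illustration; the stop-word set holds just the four examples named in the abstract, and the WordNet lookup that would follow is left out.

```python
import re

# Illustrative stop-word set: only the four examples given in the abstract.
STOP_WORDS = {"by", "at", "and", "or"}


def tokenize_label(label):
    """Split a schema label on camelCase boundaries, underscores, hyphens and spaces."""
    # Insert a space where a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def isolate_stop_word_labels(labels):
    """Separate compound-word labels that contain a stop-word from all other labels.

    Returns two lists of (label, tokens) pairs: flagged labels and plain labels.
    """
    flagged, plain = [], []
    for label in labels:
        tokens = tokenize_label(label)
        is_compound = len(tokens) > 1
        if is_compound and any(tok in STOP_WORDS for tok in tokens):
            flagged.append((label, tokens))
        else:
            plain.append((label, tokens))
    return flagged, plain


if __name__ == "__main__":
    # Hypothetical labels, not taken from the paper's dataset.
    flagged, plain = isolate_stop_word_labels(["orderedBy", "CustomerName", "ship_at_date"])
    print("with stop-words:", flagged)
    print("without stop-words:", plain)
```

In a fuller pipeline, the flagged labels would be the ones passed to a WordNet-based step that chooses a new base form; that step depends on details the abstract does not give, so it is not sketched here.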
first_indexed 2024-03-06T09:41:09Z
format Article
id upm.eprints-61723
institution Universiti Putra Malaysia
language English
last_indexed 2024-03-06T09:41:09Z
publishDate 2017
publisher Asian Research Publication Network
record_format dspace
spelling upm.eprints-61723 2019-01-10T08:15:53Z http://psasir.upm.edu.my/id/eprint/61723/ Asian Research Publication Network 2017-06 Article PeerReviewed text en
Hossain, Jafreen and Mohd Sani, Nor Fazlida and Affendey, Lilly Suriani and Ishak, Iskandar and Kasmiran, Khairul Azhar (2017) Enhanced normalization approach addressing stop-word complexity in compound-word schema labels. Journal of Theoretical and Applied Information Technology, 95 (12). pp. 2635-2646. ISSN 1992-8645; ESSN: 1817-3195 http://www.jatit.org/volumes/ninetyfive12.php