Extended E-N-DIST Algorithm for Alias Detection

Bibliographic Details
Main Author:	Mohammed Hadwan
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Alias detection; edit distance (ED); Levenshtein distance (LD); E-N-DIST; dynamic programming
Online Access:	https://ieeexplore.ieee.org/document/9312038/

author	Mohammed Hadwan
collection	DOAJ
description	Nowadays, personal names are not the only way to refer to celebrities and experts from different fields; they can also be referred to by their aliases on the web. Associated aliases are of considerable importance for retrieving information about a person from websites. Therefore, disclosing aliases can play an important role in overcoming many real-world challenges. The aim of this research is to explore and propose a reliable algorithm that can detect aliases that arise from the transliteration of Arabic names into English. This paper introduces an extension to the previously published Enhanced N-gram distance algorithm (E-N-DIST). The proposed algorithm is called the Extended Enhanced N-gram distance algorithm (E-E-N-DIST). E-E-N-DIST differs from E-N-DIST in two main changes to how the costs of substitution and transposition are calculated. First, E-E-N-DIST is computed based on 2^(n+1) − 1 states. Second, it uses an edit operation called 'Exchange of Vowels' to account for the common spelling errors that occur when names are transliterated from one language to another. The idea of the exchange of vowels is to search for the vowels ('a', 'e', 'i', 'o', and 'u') and the non-vowel character 'y', which has a vowel sound, or part of one, in other languages, in order to estimate the cost of insertion and deletion operations. The proposed algorithm was tested on a dataset from the literature, and the results were compared with other state-of-the-art algorithms. The proposed algorithm outperforms the other algorithms: it achieved a better average percentage of similarity than all other compared algorithms.
first_indexed	2024-04-11T11:43:56Z
format	Article
id	doaj.art-fa0f645f0dc84e30b5fe6220661ae4b1
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-11T11:43:56Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-fa0f645f0dc84e30b5fe6220661ae4b1 | 2022-12-22T04:25:43Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2021-01-01 | vol. 9 | pp. 7952–7959 | doi:10.1109/ACCESS.2020.3048755 | IEEE document 9312038 | Extended E-N-DIST Algorithm for Alias Detection | Mohammed Hadwan (https://orcid.org/0000-0002-9924-7980), Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia | https://ieeexplore.ieee.org/document/9312038/ | Alias detection; edit distance (ED); Levenshtein distance (LD); E-N-DIST; dynamic programming
title	Extended E-N-DIST Algorithm for Alias Detection
topic	Alias detection; edit distance (ED); Levenshtein distance (LD); E-N-DIST; dynamic programming
url	https://ieeexplore.ieee.org/document/9312038/
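
The description field names the ingredients of the method (an edit distance computed by dynamic programming, plus a cheaper "Exchange of Vowels" treatment of 'a', 'e', 'i', 'o', 'u' and 'y') but not the actual cost values or the n-gram state machine. The Python sketch below is therefore only an illustration under stated assumptions, not the published E-E-N-DIST: the function names weighted_edit_distance and similarity_percent, the parameter vowel_cost, its default of 0.5, and the choice to normalise similarity by the longer name's length are all hypothetical. It leaves out the n-gram component and the 2^(n+1) − 1 states entirely.

```python
# Hypothetical sketch, NOT the published E-E-N-DIST: a Levenshtein-style
# edit distance in which operations on vowel-like characters are discounted.
# The set "aeiouy" follows the abstract; the single discount value and the
# similarity normalisation are illustrative assumptions.

VOWEL_LIKE = frozenset("aeiouy")


def weighted_edit_distance(s: str, t: str, vowel_cost: float = 0.5) -> float:
    """Dynamic-programming edit distance between s and t.

    Inserting or deleting a vowel-like character costs `vowel_cost`
    (assumed value); swapping one vowel-like character for another also
    costs `vowel_cost`. All other insertions, deletions and substitutions
    cost 1. With vowel_cost=1.0 this is ordinary Levenshtein distance.
    """
    s, t = s.lower(), t.lower()

    def indel(ch: str) -> float:
        # Cost of inserting or deleting a single character.
        return vowel_cost if ch in VOWEL_LIKE else 1.0

    m, n = len(s), len(t)
    # d[i][j] holds the cheapest cost of turning s[:i] into t[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel(s[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel(t[j - 1])

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = s[i - 1], t[j - 1]
            if a == b:
                sub = 0.0
            elif a in VOWEL_LIKE and b in VOWEL_LIKE:
                sub = vowel_cost  # "exchange" of one vowel-like letter for another
            else:
                sub = 1.0
            d[i][j] = min(
                d[i - 1][j] + indel(a),  # delete a from s
                d[i][j - 1] + indel(b),  # insert b into s
                d[i - 1][j - 1] + sub,   # keep or substitute
            )
    return d[m][n]


def similarity_percent(s: str, t: str, vowel_cost: float = 0.5) -> float:
    """Similarity in percent, normalised by the longer string (assumption)."""
    longest = max(len(s.lower()), len(t.lower()))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - weighted_edit_distance(s, t, vowel_cost) / longest)


if __name__ == "__main__":
    # Two common English transliterations of the same Arabic name.
    print(similarity_percent("Mohammed", "Muhammad"))                  # 87.5
    print(similarity_percent("Mohammed", "Muhammad", vowel_cost=1.0))  # 75.0
```

Setting vowel_cost=1.0 reduces the function to plain Levenshtein distance, so the two spellings score 75.0 instead of 87.5. The discount on vowel-like letters is what pulls transliteration variants closer together, which is the intuition the abstract describes; the real algorithm's weights may differ.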

Similar Items