DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS
The paper deals with identification and authentication of web users participating in the Internet information processes (based on features of online texts).In digital forensics web user identification based on various linguistic features can be used to discover identity of individuals, criminals or...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
2017-01-01
|
Series: | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
Subjects: | |
Online Access: | http://ntv.ifmo.ru/file/article/16414.pdf |
_version_ | 1819018934558392320 |
---|---|
author | A. A. Vorobeva |
author_facet | A. A. Vorobeva |
author_sort | A. A. Vorobeva |
collection | DOAJ |
description | The paper deals with identification and authentication of web users participating in the Internet information processes (based on features of online texts).In digital forensics web user identification based on various linguistic features can be used to discover identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. Internet could be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification, it can be used to narrow down the suspects, identify a criminal and prosecute him. Feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating Manhattan distance to k-nearest neighbors (Relief-f algorithm). This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different level of class imbalance. Experiment results showed that features relevance varies in different set of web users (probable authors of some text); features selection for each set of web users improves identification accuracy by 4% at the average that is approximately 1% higher than with the use of static set of features. The proposed approach is most effective for a small number of training samples (messages) per user. |
first_indexed | 2024-12-21T03:27:18Z |
format | Article |
id | doaj.art-812c35f16b8e49fea8ff223c80e754fa |
institution | Directory Open Access Journal |
issn | 2226-1494 2500-0373 |
language | English |
last_indexed | 2024-12-21T03:27:18Z |
publishDate | 2017-01-01 |
publisher | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) |
record_format | Article |
series | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
spelling | doaj.art-812c35f16b8e49fea8ff223c80e754fa2022-12-21T19:17:34ZengSaint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki2226-14942500-03732017-01-0117111712810.17586/2226-1494-2017-17-1-117-128DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTSA. A. Vorobeva0assistant, ITMO University, Saint Petersburg, 197101, Russian FederationThe paper deals with identification and authentication of web users participating in the Internet information processes (based on features of online texts).In digital forensics web user identification based on various linguistic features can be used to discover identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. Internet could be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification, it can be used to narrow down the suspects, identify a criminal and prosecute him. Feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating Manhattan distance to k-nearest neighbors (Relief-f algorithm). This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different level of class imbalance. Experiment results showed that features relevance varies in different set of web users (probable authors of some text); features selection for each set of web users improves identification accuracy by 4% at the average that is approximately 1% higher than with the use of static set of features. The proposed approach is most effective for a small number of training samples (messages) per user.http://ntv.ifmo.ru/file/article/16414.pdfweb user identificationforensic linguisticsinformation security |
spellingShingle | A. A. Vorobeva DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki web user identification forensic linguistics information security |
title | DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS |
title_full | DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS |
title_fullStr | DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS |
title_full_unstemmed | DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS |
title_short | DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS |
title_sort | dynamic feature selection for web user identification on linguistic and stylistic features of online texts |
topic | web user identification forensic linguistics information security |
url | http://ntv.ifmo.ru/file/article/16414.pdf |
work_keys_str_mv | AT aavorobeva dynamicfeatureselectionforwebuseridentificationonlinguisticandstylisticfeaturesofonlinetexts |