A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
The development of steganography technology threatens the security of private information in smart campuses. To prevent privacy disclosure, a linguistic steganalysis method based on word embedding is proposed to detect private information hidden in synonyms in texts. With the continuous Skipgr...
Main Authors: | Lingyun Xiang, Jingmin Yu, Chunfang Yang, Daojian Zeng, Xiaobo Shen |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | Steganalysis, steganography, word embedding, Skip-gram language model, TF-IDF |
Online Access: | https://ieeexplore.ieee.org/document/8510794/ |
_version_ | 1818854546611372032 |
---|---|
author | Lingyun Xiang Jingmin Yu Chunfang Yang Daojian Zeng Xiaobo Shen |
author_sort | Lingyun Xiang |
collection | DOAJ |
description | The development of steganography technology threatens the security of private information in smart campuses. To prevent privacy disclosure, a linguistic steganalysis method based on word embedding is proposed to detect private information hidden in synonyms in texts. Using the continuous Skip-gram language model, each synonym and the words in its context are represented as word embeddings, which aim to encode the semantic meanings of words in low-dimensional dense vectors. The context fitness, which characterizes the suitability of a synonym by its semantic correlations with context words, is estimated from the corresponding word embeddings and weighted by the TF-IDF values of the context words. By analyzing the differences between the context fitness values of synonyms in the same synonym set, and between those in cover and stego texts, three features are extracted and fed into a support vector machine classifier for the steganalysis task. The experimental results show that the proposed steganalysis method improves the average F-value by at least 4.8% over two baselines. In addition, the detection performance can be further improved by learning better word embeddings. |
first_indexed | 2024-12-19T07:54:26Z |
format | Article |
id | doaj.art-0587a2057fad41fc878610052612d070 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-19T07:54:26Z |
publishDate | 2018-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0587a2057fad41fc878610052612d070; 2022-12-21T20:30:02Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 64131-64141; DOI 10.1109/ACCESS.2018.2878273; 8510794; A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution; Authors: Lingyun Xiang (0, https://orcid.org/0000-0001-7396-0908), Jingmin Yu (1), Chunfang Yang (2, https://orcid.org/0000-0001-6487-379X), Daojian Zeng (3), Xiaobo Shen (4); Affiliations, in listed order: Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China; Zhengzhou Science and Technology Institute, Zhengzhou, China; Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, China; School of Computer Science and Engineering, Nanyang Technological University, Singapore; https://ieeexplore.ieee.org/document/8510794/; Keywords: Steganalysis, steganography, word embedding, Skip-gram language model, TF-IDF |
title | A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution |
topic | Steganalysis steganography word embedding Skip-gram language model TF-IDF |
url | https://ieeexplore.ieee.org/document/8510794/ |
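The description above says that context fitness is estimated from word embeddings and weighted by the TF-IDF values of context words, but the record does not give the formula. The following is only a minimal sketch of one plausible reading, assuming fitness is a TF-IDF-weighted average of cosine similarities between a candidate synonym and its context words. The function names, the toy three-dimensional embeddings, and the TF-IDF weights are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_fitness(candidate, context_words, embeddings, tfidf):
    """Assumed form: TF-IDF-weighted mean of cosine similarities
    between the candidate synonym and each context word."""
    weighted_sum = 0.0
    weight_total = 0.0
    for word in context_words:
        if word not in embeddings:
            continue  # skip context words without an embedding
        weight = tfidf.get(word, 0.0)
        weighted_sum += weight * cosine(embeddings[candidate], embeddings[word])
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Hypothetical toy data; real embeddings would come from a trained
# Skip-gram model and real weights from corpus TF-IDF statistics.
EMBEDDINGS = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "grand": [0.1, 0.9, 0.2],
    "house": [0.85, 0.15, 0.05],
    "the": [0.3, 0.3, 0.3],
}
TFIDF = {"house": 2.0, "the": 0.1}

# Score every member of one synonym set against the same context.
scores = {
    synonym: context_fitness(synonym, ["the", "house"], EMBEDDINGS, TFIDF)
    for synonym in ["big", "large", "grand"]
}
for synonym, score in scores.items():
    print(f"{synonym}: {score:.3f}")
```

In a full pipeline, statistics derived from such per-synonym scores would form the features passed to a support vector machine; the three specific features used by the authors are not stated in this record, so they are not reproduced here.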