A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution

The development of steganography technology threatens the security of private information in smart campuses. To prevent privacy disclosure, a linguistic steganalysis method based on word embeddings is proposed to detect private information hidden in synonyms in texts. With the continuous Skipgr...

Bibliographic Details
Main Authors: Lingyun Xiang, Jingmin Yu, Chunfang Yang, Daojian Zeng, Xiaobo Shen
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Steganalysis; steganography; word embedding; Skip-gram language model; TF-IDF
Online Access: https://ieeexplore.ieee.org/document/8510794/
_version_ 1818854546611372032
author Lingyun Xiang
Jingmin Yu
Chunfang Yang
Daojian Zeng
Xiaobo Shen
author_facet Lingyun Xiang
Jingmin Yu
Chunfang Yang
Daojian Zeng
Xiaobo Shen
author_sort Lingyun Xiang
collection DOAJ
description The development of steganography technology threatens the security of private information in smart campuses. To prevent privacy disclosure, a linguistic steganalysis method based on word embeddings is proposed to detect private information hidden in synonyms in texts. Using the continuous Skip-gram language model, each synonym and the words in its context are represented as word embeddings, which aim to encode the semantic meanings of words as low-dimensional dense vectors. The context fitness, which characterizes the suitability of a synonym by its semantic correlation with its context words, is estimated from the corresponding word embeddings and weighted by the TF-IDF values of the context words. By analyzing the differences between the context fitness values of synonyms in the same synonym set, and the differences between those values in cover and stego texts, three features are extracted and fed into a support vector machine classifier for the steganalysis task. Experimental results show that the proposed steganalysis method improves the average F-value by at least 4.8% over two baselines. In addition, detection performance can be further improved by learning better word embeddings.
first_indexed 2024-12-19T07:54:26Z
format Article
id doaj.art-0587a2057fad41fc878610052612d070
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T07:54:26Z
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0587a2057fad41fc878610052612d070
2022-12-21T20:30:02Z
eng
IEEE
IEEE Access
2169-3536
2018-01-01
6
64131
64141
10.1109/ACCESS.2018.2878273
8510794
A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
Lingyun Xiang (https://orcid.org/0000-0001-7396-0908), Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, China
Jingmin Yu, School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
Chunfang Yang (https://orcid.org/0000-0001-6487-379X), Zhengzhou Science and Technology Institute, Zhengzhou, China
Daojian Zeng, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, China
Xiaobo Shen, School of Computer Science and Engineering, Nanyang Technological University, Singapore
The development of steganography technology threatens the security of private information in smart campuses. To prevent privacy disclosure, a linguistic steganalysis method based on word embeddings is proposed to detect private information hidden in synonyms in texts. Using the continuous Skip-gram language model, each synonym and the words in its context are represented as word embeddings, which aim to encode the semantic meanings of words as low-dimensional dense vectors. The context fitness, which characterizes the suitability of a synonym by its semantic correlation with its context words, is estimated from the corresponding word embeddings and weighted by the TF-IDF values of the context words. By analyzing the differences between the context fitness values of synonyms in the same synonym set, and the differences between those values in cover and stego texts, three features are extracted and fed into a support vector machine classifier for the steganalysis task. Experimental results show that the proposed steganalysis method improves the average F-value by at least 4.8% over two baselines. In addition, detection performance can be further improved by learning better word embeddings.
https://ieeexplore.ieee.org/document/8510794/
Steganalysis
steganography
word embedding
Skip-gram language model
TF-IDF
spellingShingle Lingyun Xiang
Jingmin Yu
Chunfang Yang
Daojian Zeng
Xiaobo Shen
A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
IEEE Access
Steganalysis
steganography
word embedding
Skip-gram language model
TF-IDF
title A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
title_full A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
title_fullStr A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
title_full_unstemmed A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
title_short A Word-Embedding-Based Steganalysis Method for Linguistic Steganography via Synonym Substitution
title_sort word embedding based steganalysis method for linguistic steganography via synonym substitution
topic Steganalysis
steganography
word embedding
Skip-gram language model
TF-IDF
url https://ieeexplore.ieee.org/document/8510794/
work_keys_str_mv AT lingyunxiang awordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT jingminyu awordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT chunfangyang awordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT daojianzeng awordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT xiaoboshen awordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT lingyunxiang wordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT jingminyu wordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT chunfangyang wordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT daojianzeng wordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
AT xiaoboshen wordembeddingbasedsteganalysismethodforlinguisticsteganographyviasynonymsubstitution
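
The description field above defines context fitness as a measure of how well a synonym suits its context, estimated from word embeddings and weighted by the TF-IDF values of the context words. The record does not give the exact formula, so the following is only a minimal sketch under one assumption: fitness is taken to be the TF-IDF-weighted average of cosine similarities between the candidate synonym and each context word. The function names, the toy two-dimensional embeddings, and the TF-IDF values are all hypothetical; the three features and the SVM classifier mentioned in the abstract are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_fitness(candidate, context, embeddings, tfidf):
    """Assumed formula: TF-IDF-weighted mean cosine similarity to context words.

    Context words without an embedding are skipped; a missing TF-IDF value
    counts as weight 0.0.
    """
    if candidate not in embeddings:
        return 0.0
    weighted_sum = 0.0
    weight_total = 0.0
    for word in context:
        if word not in embeddings:
            continue
        weight = tfidf.get(word, 0.0)
        weighted_sum += weight * cosine(embeddings[candidate], embeddings[word])
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def rank_synonyms(synonym_set, context, embeddings, tfidf):
    """Return (synonym, fitness) pairs sorted from best to worst fit."""
    scores = {s: context_fitness(s, context, embeddings, tfidf) for s in synonym_set}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
EMBEDDINGS = {
    "big": [1.0, 0.0],
    "large": [0.0, 1.0],
    "house": [1.0, 0.0],
    "the": [0.0, 1.0],
}
TFIDF = {"house": 3.0, "the": 1.0}

if __name__ == "__main__":
    for synonym, score in rank_synonyms(["big", "large"], ["the", "house"], EMBEDDINGS, TFIDF):
        print(f"{synonym}: {score:.2f}")
```

In this toy setup the informative context word "house" carries three times the weight of "the", so the synonym aligned with "house" ranks first. A steganalysis feature could then compare the fitness of the synonym actually used in a text against the other members of its synonym set, though how the paper does this is not stated in this record.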