Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data

Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to the Kullback-Leibler (K-L) and Rényi divergences. With a variable parameter, M-I divergence can sharpen the characterization of the distinction between two distributions. Furthermore, by choosing an appropriate parameter, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between nonadjacent ones, so M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from the viewpoint of information theory and then present its major properties. In addition, we design an M-I divergence estimation algorithm by means of an ensemble estimator built on the proposed weighted kernel estimators, which improves the mean-squared-error convergence rate from O(Γ^(-j/d)) to O(Γ^(-1)) for j ∈ (0, d]. We also discuss decision making with M-I divergence for clustering and classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
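This record does not reproduce the paper's definition of M-I divergence, so the minimal sketch below illustrates only the general idea the abstract describes, a divergence with a tunable order parameter, using the classical Rényi divergence D_α(P‖Q) = (1/(α−1)) log Σᵢ pᵢ^α qᵢ^(1−α) as a stand-in; the example distributions and the parameter sweep are illustrative assumptions, not material from the paper.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) for discrete distributions.

    Recovers the Kullback-Leibler divergence in the limit alpha -> 1.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):  # K-L limit
        return float(np.sum(p * np.log(p / q)))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

# Illustrative distributions (not from the paper): q is "adjacent" to p,
# while r is clearly different from p.
p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.28, 0.24, 0.24, 0.24])  # small perturbation of p
r = np.array([0.70, 0.10, 0.10, 0.10])  # far from p

for alpha in (0.5, 1.0, 2.0, 5.0, 10.0):
    d_pq = renyi_divergence(p, q, alpha)
    d_pr = renyi_divergence(p, r, alpha)
    print(f"alpha={alpha:5.1f}  D(p||q)={d_pq:.5f}  D(p||r)={d_pr:.5f}")
```

Sweeping the order parameter shows how a single parametric family can stretch or compress the measured distance between a pair of distributions, which is the behavior the abstract attributes to the M-I parameter.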


Bibliographic Details
Main Authors: Rui She, Shanyun Liu, Pingyi Fan
Format: Article
Language: English
Published: IEEE 2017-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8090523/
_version_ 1830297916671000576
author Rui She
Shanyun Liu
Pingyi Fan
author_facet Rui She
Shanyun Liu
Pingyi Fan
author_sort Rui She
collection DOAJ
description Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to the Kullback-Leibler (K-L) and Rényi divergences. With a variable parameter, M-I divergence can sharpen the characterization of the distinction between two distributions. Furthermore, by choosing an appropriate parameter, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between nonadjacent ones, so M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from the viewpoint of information theory and then present its major properties. In addition, we design an M-I divergence estimation algorithm by means of an ensemble estimator built on the proposed weighted kernel estimators, which improves the mean-squared-error convergence rate from O(Γ^(-j/d)) to O(Γ^(-1)) for j ∈ (0, d]. We also discuss decision making with M-I divergence for clustering and classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
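The record states only the improved O(Γ^(-1)) mean-squared-error rate, not the estimator itself. As a hedged illustration of the general ensemble idea the abstract alludes to, the sketch below combines plug-in kernel divergence estimates computed at several bandwidths with weights that sum to one; the paper chooses such weights to cancel bias terms, whereas the uniform weights, bandwidth grid, evaluation grid, and Gaussian test data here (and the helper kl_plugin) are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl_plugin(x, y, bw, eval_pts):
    """Plug-in K-L divergence estimate D(P||Q) from samples x ~ P, y ~ Q,
    using Gaussian kernel density estimates with bandwidth factor `bw`.
    (Hypothetical helper for illustration; not the paper's M-I estimator.)"""
    p_hat = gaussian_kde(x, bw_method=bw)(eval_pts)
    q_hat = gaussian_kde(y, bw_method=bw)(eval_pts)
    eps = 1e-12                      # guard against log(0)
    w = p_hat / p_hat.sum()          # normalized weights over the grid
    return float(np.sum(w * np.log((p_hat + eps) / (q_hat + eps))))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)   # samples from P = N(0, 1)
y = rng.normal(0.5, 1.0, 2000)   # samples from Q = N(0.5, 1); true KL = 0.125
grid = np.linspace(-5.0, 6.0, 400)

# Ensemble: base estimates at several bandwidths, combined with weights
# summing to one (uniform weights as a stand-in for bias-cancelling ones).
bandwidths = [0.1, 0.2, 0.4, 0.8]
weights = np.full(len(bandwidths), 1.0 / len(bandwidths))
estimates = np.array([kl_plugin(x, y, bw, grid) for bw in bandwidths])
print("per-bandwidth estimates:", np.round(estimates, 4))
print("ensemble estimate:      ", float(weights @ estimates))
```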
first_indexed 2024-12-19T07:41:40Z
format Article
id doaj.art-701d697b7f2a475ba33653816887ae91
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T07:41:40Z
publishDate 2017-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-701d697b7f2a475ba33653816887ae91
Last modified: 2022-12-21T20:30:26Z
Language: eng
Publisher: IEEE
Series: IEEE Access (ISSN 2169-3536)
Published: 2017-01-01, vol. 5, pp. 24105-24119
DOI: 10.1109/ACCESS.2017.2768385
IEEE document number: 8090523
Title: Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
Authors: Rui She; Shanyun Liu; Pingyi Fan (ORCID: https://orcid.org/0000-0002-0658-6079), all with the Department of Electronic Engineering, Tsinghua University, Beijing, China
Abstract: Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to the Kullback-Leibler (K-L) and Rényi divergences. With a variable parameter, M-I divergence can sharpen the characterization of the distinction between two distributions. Furthermore, by choosing an appropriate parameter, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between nonadjacent ones, so M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from the viewpoint of information theory and then present its major properties. In addition, we design an M-I divergence estimation algorithm by means of an ensemble estimator built on the proposed weighted kernel estimators, which improves the mean-squared-error convergence rate from O(Γ^(-j/d)) to O(Γ^(-1)) for j ∈ (0, d]. We also discuss decision making with M-I divergence for clustering and classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
Online access: https://ieeexplore.ieee.org/document/8090523/
Keywords: Message identification (M-I) divergence; discrete distribution estimation; divergence estimation; big data analysis; outlier detection
spellingShingle Rui She
Shanyun Liu
Pingyi Fan
Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
IEEE Access
Message identification (M-I) divergence
discrete distribution estimation
divergence estimation
big data analysis
outlier detection
title Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
title_full Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
title_fullStr Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
title_full_unstemmed Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
title_short Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data
title_sort amplifying inter message distance on information divergence measures in big data
topic Message identification (M-I) divergence
discrete distribution estimation
divergence estimation
big data analysis
outlier detection
url https://ieeexplore.ieee.org/document/8090523/
work_keys_str_mv AT ruishe amplifyingintermessagedistanceoninformationdivergencemeasuresinbigdata
AT shanyunliu amplifyingintermessagedistanceoninformationdivergencemeasuresinbigdata
AT pingyifan amplifyingintermessagedistanceoninformationdivergencemeasuresinbigdata