Information Loss Due to the Data Reduction of Sample Data from Discrete Distributions

Bibliographic Details
Main Authors: Maryam Moghimi, Herbert W. Corley
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Data
Subjects: data reduction; Shannon information; entropy; information loss
Online Access: https://www.mdpi.com/2306-5729/5/3/84
ISSN: 2306-5729
DOI: 10.3390/data5030084
Volume: 5
Issue: 3
Article Number: 84
Affiliation: Center on Stochastic Modeling, Optimization, and Statistics (COSMOS), the University of Texas at Arlington, Arlington, TX 76013, USA (both authors)
Collection: Directory of Open Access Journals (DOAJ)

Description

In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula independent of the parameter for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic but on neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
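The comparison described in the abstract can be illustrated with a short sketch. It is not the paper's formula. It assumes that the lost information is measured as -log2 of the ratio P(X = x) / P(T = T(x)), takes T(x) = sum(x) as the sufficient statistic, and uses Bernoulli and Poisson samples purely as examples, since this record does not name the three distributions the paper uses. All function names are hypothetical. Because T is sufficient, the ratio equals P(X = x | T = t), so the computed loss should not change when the parameter does.

```python
# Hypothetical sketch, not the authors' method: information lost when an
# i.i.d. discrete sample x is reduced to the sufficient statistic T(x) = sum(x),
# measured (by assumption) as -log2[ P(X = x) / P(T = T(x)) ] bits.
from math import comb, exp, factorial, isclose, log2, prod


def bernoulli_sample_prob(x, p):
    """Probability of observing the exact 0/1 sequence x."""
    t = sum(x)
    return p**t * (1 - p) ** (len(x) - t)


def bernoulli_sum_prob(t, n, p):
    """P(T = t) where T = sum of n Bernoulli(p) draws ~ Binomial(n, p)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def poisson_sample_prob(x, lam):
    """Probability of observing the exact sequence of Poisson(lam) counts x."""
    return prod(exp(-lam) * lam**xi / factorial(xi) for xi in x)


def poisson_sum_prob(t, n, lam):
    """P(T = t) where T = sum of n Poisson(lam) draws ~ Poisson(n * lam)."""
    return exp(-n * lam) * (n * lam) ** t / factorial(t)


def info_lost_bits(sample_prob, stat_prob):
    """-log2 of P(X = x) / P(T = t), i.e. -log2 P(X = x | T = t)."""
    return -log2(sample_prob / stat_prob)


def bernoulli_loss_closed(x):
    """Parameter-free form: given T = t, all C(n, t) arrangements are equally likely."""
    return log2(comb(len(x), sum(x)))


def poisson_loss_closed(x):
    """Parameter-free form: given T = t, x is Multinomial(t; 1/n, ..., 1/n)."""
    n, t = len(x), sum(x)
    cond_prob = factorial(t) / prod(factorial(xi) for xi in x) / n**t
    return -log2(cond_prob)


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]  # made-up Bernoulli sample, t = 3
    for p in (0.2, 0.7):
        loss = info_lost_bits(bernoulli_sample_prob(x, p),
                              bernoulli_sum_prob(sum(x), len(x), p))
        assert isclose(loss, bernoulli_loss_closed(x))
        print(f"Bernoulli, p={p}: {loss:.4f} bits")

    y = [2, 0, 1]  # made-up Poisson sample, t = 3
    for lam in (0.5, 3.0):
        loss = info_lost_bits(poisson_sample_prob(y, lam),
                              poisson_sum_prob(sum(y), len(y), lam))
        assert isclose(loss, poisson_loss_closed(y))
        print(f"Poisson, lam={lam}: {loss:.4f} bits")
```

Both Bernoulli runs print 3.3219 bits (log2 10) and both Poisson runs print 3.1699 bits (log2 9). The loss depends on the sample through T and the sample size, but not on p or lam, which is the parameter independence the abstract attributes to sufficient statistics. The sketch does not attempt the paper's entropy measure.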