Information Loss Due to the Data Reduction of Sample Data from Discrete Distributions

Bibliographic Details
Main Authors: Maryam Moghimi, Herbert W. Corley
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/5/3/84
Description
Summary: In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula independent of the parameter for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic but neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
ISSN: 2306-5729
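
The comparison described in the summary can be illustrated with a small sketch. It is not taken from the article: it assumes a Bernoulli sample with the sample sum as the sufficient statistic (this record does not name the three distributions the article treats), and the function names are hypothetical. The lost Shannon information is computed as the difference between the information of the full sample and the information of the statistic's value, which for this assumed case reduces to log2 of a binomial coefficient and so does not depend on the parameter p.

```python
from math import comb, log2

def bernoulli_sample_prob(x, p):
    """Probability of observing the exact 0/1 sequence x with success probability p."""
    t = sum(x)
    return p**t * (1 - p)**(len(x) - t)

def binomial_prob(n, t, p):
    """Probability that the sum of n Bernoulli(p) draws equals t."""
    return comb(n, t) * p**t * (1 - p)**(n - t)

def lost_information_bits(x, p):
    """Shannon information of the sample minus that of its sum, in bits.

    Equals log2(P(T = t) / P(X = x)); for a Bernoulli sample the p terms cancel.
    """
    n, t = len(x), sum(x)
    return log2(binomial_prob(n, t, p)) - log2(bernoulli_sample_prob(x, p))

sample = [1, 0, 1, 1, 0]  # illustrative data, n = 5, sum t = 3
for p in (0.2, 0.5, 0.9):
    # Each value matches log2(comb(5, 3)) up to rounding, whatever p is.
    print(p, lost_information_bits(sample, p), log2(comb(5, 3)))
```

Repeating the calculation with a statistic that is not sufficient (for example, only the first observation) would leave p in the result, which is consistent with the summary's remark about non-sufficient statistics.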