Distribution-Dependent Weighted Union Bound

In this paper, we deal with the classical Statistical Learning Theory problem of bounding, with high probability, the true risk $R(h)$ of a hypothesis $h$ chosen from a set $\mathcal{H}$ of $m$ hypotheses. The Union Bound (UB) allows one to state that $\mathbb{P}\left\{ \mathtt{L}\left(\hat{R}(h), \delta q_h\right) \leq R(h) \leq \mathtt{U}\left(\hat{R}(h), \delta p_h\right) \right\} \geq 1 - \delta$, where $\hat{R}(h)$ is the empirical error, provided it is possible to prove that $\mathbb{P}\{R(h) \geq \mathtt{L}(\hat{R}(h), \delta)\} \geq 1 - \delta$ and $\mathbb{P}\{R(h) \leq \mathtt{U}(\hat{R}(h), \delta)\} \geq 1 - \delta$, and that $h$, $q_h$, and $p_h$ are chosen before seeing the data such that $q_h, p_h \in [0, 1]$ and $\sum_{h \in \mathcal{H}} (q_h + p_h) = 1$. If no a priori information is available, $q_h$ and $p_h$ are set to $\frac{1}{2m}$, i.e., equally distributed. This approach gives poor results, since a learning procedure targets only particular hypotheses, namely those with small empirical error, and disregards the others. In this work we set $q_h$ and $p_h$ in a distribution-dependent way, increasing the probability of being chosen for functions with small true risk. We call this proposal the Distribution-Dependent Weighted UB (DDWUB), and we derive sufficient conditions on the choice of $q_h$ and $p_h$ under which DDWUB outperforms, or in the worst case degenerates into, the UB. Furthermore, theoretical and numerical results show the applicability, validity, and potential of DDWUB.
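To make the weighted UB concrete, here is a minimal sketch of how weights $q_h, p_h$ with $\sum_h (q_h + p_h) = 1$ yield simultaneous bounds at overall confidence $1 - \delta$. Note the assumptions: Hoeffding's inequality is used to instantiate $\mathtt{L}$ and $\mathtt{U}$, and the function names are hypothetical; this is not the paper's DDWUB construction, which chooses the weights in a distribution-dependent way.

```python
import numpy as np

def weighted_union_bound(emp_risks, n, delta, q=None, p=None):
    """Per-hypothesis intervals [L_h, U_h] holding simultaneously with
    probability >= 1 - delta. Instantiates L and U via Hoeffding's
    inequality, P{R(h) >= R_hat(h) + eps} <= exp(-2 n eps^2), run at
    per-hypothesis confidences delta*q_h (lower) and delta*p_h (upper).
    Weights must be fixed before seeing the data, with q_h, p_h in
    [0, 1] and sum_h (q_h + p_h) = 1."""
    m = len(emp_risks)
    if q is None:  # no a priori information: q_h = 1/(2m)
        q = np.full(m, 1.0 / (2 * m))
    if p is None:  # no a priori information: p_h = 1/(2m)
        p = np.full(m, 1.0 / (2 * m))
    assert np.isclose(np.sum(q) + np.sum(p), 1.0)
    # Inverting exp(-2 n eps^2) = delta * weight gives the radius eps.
    lower = emp_risks - np.sqrt(np.log(1.0 / (delta * q)) / (2.0 * n))
    upper = emp_risks + np.sqrt(np.log(1.0 / (delta * p)) / (2.0 * n))
    return lower, upper

# Example: m = 4 hypotheses evaluated on n = 1000 samples.
emp = np.array([0.05, 0.10, 0.30, 0.45])
L, U = weighted_union_bound(emp, n=1000, delta=0.05)
```

Summing the per-hypothesis failure probabilities gives $\sum_{h} (\delta q_h + \delta p_h) = \delta$, which is exactly the union-bound argument; non-uniform weights tighten the intervals of the hypotheses that receive larger $q_h, p_h$ at the price of looser intervals elsewhere.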

Bibliographic Details
Main Authors: Luca Oneto, Sandro Ridella
Format: Article
Language: English
Published: MDPI AG, 2021-01-01
Series: Entropy
ISSN: 1099-4300
DOI: 10.3390/e23010101
Author Affiliations: Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genoa, Via Opera Pia 11a, 16145 Genova, Italy (Oneto); Department of Biophysical and Electronic Engineering, University of Genoa, Via Opera Pia 11a, 16145 Genova, Italy (Ridella)
Subjects: union bound; weighted union bound; distribution-dependent weights; statistical learning theory; finite number of hypotheses
Online Access: https://www.mdpi.com/1099-4300/23/1/101