Distribution-Dependent Weighted Union Bound

In this paper, we deal with the classical Statistical Learning Theory’s problem of bounding, with high probability, the true risk <inline-formula><math display="inline"><semantics><mrow><mi>R</mi><mo>(</mo><mi>h</mi><mo>)<...

Full description

Bibliographic Details
Main Authors: Luca Oneto, Sandro Ridella
Format: Article
Language:English
Published: MDPI AG 2021-01-01
Series:Entropy
Subjects:
Online Access:https://www.mdpi.com/1099-4300/23/1/101
Description
Summary:In this paper, we deal with the classical Statistical Learning Theory’s problem of bounding, with high probability, the true risk <inline-formula><math display="inline"><semantics><mrow><mi>R</mi><mo>(</mo><mi>h</mi><mo>)</mo></mrow></semantics></math></inline-formula> of a hypothesis <i>h</i> chosen from a set <inline-formula><math display="inline"><semantics><mi mathvariant="script">H</mi></semantics></math></inline-formula> of <i>m</i> hypotheses. The Union Bound (UB) allows one to state that <inline-formula><math display="inline"><semantics><mrow><mi mathvariant="double-struck">P</mi><mfenced separators="" open="{" close="}"><mi mathvariant="monospace">L</mi><mfenced separators="" open="(" close=")"><mover accent="true"><mi>R</mi><mo>^</mo></mover><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>,</mo><mi>δ</mi><msub><mi>q</mi><mi>h</mi></msub></mfenced><mo>≤</mo><mi>R</mi><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>≤</mo><mi mathvariant="monospace">U</mi><mfenced separators="" open="(" close=")"><mover accent="true"><mi>R</mi><mo>^</mo></mover><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>,</mo><mi>δ</mi><msub><mi>p</mi><mi>h</mi></msub></mfenced></mfenced><mo>≥</mo><mn>1</mn><mo>−</mo><mi>δ</mi></mrow></semantics></math></inline-formula> where <inline-formula><math display="inline"><semantics><mrow><mover accent="true"><mi>R</mi><mo>^</mo></mover><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow></mrow></semantics></math></inline-formula> is the empirical errors, if it is possible to prove that <inline-formula><math display="inline"><semantics><mrow><mi mathvariant="double-struck">P</mi><mo>{</mo><mi>R</mi><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>≥</mo><mi mathvariant="monospace">L</mi><mrow><mo>(</mo><mover accent="true"><mi>R</mi><mo>^</mo></mover><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>,</mo><mi>δ</mi><mo>)</mo></mrow><mo>}</mo><mo>≥</mo><mn>1</mn><mo>−</mo><mi>δ</mi></mrow></semantics></math></inline-formula> and <inline-formula><math display="inline"><semantics><mrow><mi mathvariant="double-struck">P</mi><mo>{</mo><mi>R</mi><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>≤</mo><mi mathvariant="monospace">U</mi><mrow><mo>(</mo><mover accent="true"><mi>R</mi><mo>^</mo></mover><mrow><mo>(</mo><mi>h</mi><mo>)</mo></mrow><mo>,</mo><mi>δ</mi><mo>)</mo></mrow><mo>}</mo><mo>≥</mo><mn>1</mn><mo>−</mo><mi>δ</mi></mrow></semantics></math></inline-formula>, when <i>h</i>, <inline-formula><math display="inline"><semantics><msub><mi>q</mi><mi>h</mi></msub></semantics></math></inline-formula>, and <inline-formula><math display="inline"><semantics><msub><mi>p</mi><mi>h</mi></msub></semantics></math></inline-formula> are chosen before seeing the data such that <inline-formula><math display="inline"><semantics><mrow><msub><mi>q</mi><mi>h</mi></msub><mo>,</mo><msub><mi>p</mi><mi>h</mi></msub><mo>∈</mo><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow></mrow></semantics></math></inline-formula> and <inline-formula><math display="inline"><semantics><mrow><msub><mo>∑</mo><mrow><mi>h</mi><mo>∈</mo><mi mathvariant="script">H</mi></mrow></msub><mrow><mo>(</mo><msub><mi>q</mi><mi>h</mi></msub><mo>+</mo><msub><mi>p</mi><mi>h</mi></msub><mo>)</mo></mrow><mo>=</mo><mn>1</mn></mrow></semantics></math></inline-formula>. If no <i>a priori</i> information is available <inline-formula><math display="inline"><semantics><msub><mi>q</mi><mi>h</mi></msub></semantics></math></inline-formula> and <inline-formula><math display="inline"><semantics><msub><mi>p</mi><mi>h</mi></msub></semantics></math></inline-formula> are set to <inline-formula><math display="inline"><semantics><mfrac bevelled="true"><mn>1</mn><mrow><mn>2</mn><mi>m</mi></mrow></mfrac></semantics></math></inline-formula>, namely equally distributed. This approach gives poor results since, as a matter of fact, a learning procedure targets just particular hypotheses, namely hypotheses with small empirical error, disregarding the others. In this work we set the <inline-formula><math display="inline"><semantics><msub><mi>q</mi><mi>h</mi></msub></semantics></math></inline-formula> and <inline-formula><math display="inline"><semantics><msub><mi>p</mi><mi>h</mi></msub></semantics></math></inline-formula> in a distribution-dependent way increasing the probability of being chosen to function with small true risk. We will call this proposal Distribution-Dependent Weighted UB (DDWUB) and we will retrieve the sufficient conditions on the choice of <inline-formula><math display="inline"><semantics><msub><mi>q</mi><mi>h</mi></msub></semantics></math></inline-formula> and <inline-formula><math display="inline"><semantics><msub><mi>p</mi><mi>h</mi></msub></semantics></math></inline-formula> that state that DDWUB outperforms or, in the worst case, degenerates into UB. Furthermore, theoretical and numerical results will show the applicability, the validity, and the potentiality of DDWUB.
ISSN:1099-4300