Implicit dynamic regularization in deep networks

Square loss has been observed to perform well in classification tasks, at least as well as cross-entropy. However, a theoretical justification is lacking. Here we develop a theoretical analysis for the square loss that also complements the existing asymptotic analysis for the exponential loss.

Bibliographic Details
Main Authors: Poggio, Tomaso; Liao, Qianli
Format: Technical Report
Published: Center for Brains, Minds and Machines (CBMM) 2020
Online Access: https://hdl.handle.net/1721.1/126653