Published 2021-04-01
“…By using a max-heap data structure within our CD algorithm, we optimally choose the largest weight variable <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msubsup><mi>θ</mi><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msubsup></semantics></math></inline-formula> at each iteration <i>i</i> such that taking the partial derivative of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi mathvariant="script">L</mi><mo>(</mo><mi>θ</mi><mo>)</mo></mrow></semantics></math></inline-formula> with respect to <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msubsup><mi>θ</mi><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msubsup></semantics></math></inline-formula> allows us to attain the next steepest descent minimizing <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi mathvariant="script">L</mi><mo>(</mo><mi>θ</mi><mo>)</mo></mrow></semantics></math></inline-formula> without using a learning rate. …”
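The excerpt is truncated, so the article's actual objective and selection rule are not visible here. As a rough illustration of the general idea only, the sketch below assumes that "CD" stands for coordinate descent, that the objective is a simple quadratic 0.5·xᵀAx − bᵀx with A symmetric positive definite, and that the max-heap is keyed on the absolute partial derivative (a greedy selection rule). The function name `greedy_cd` and all parameters are hypothetical and are not taken from the article. Each coordinate is minimized exactly in closed form, which is why no learning rate appears.

```python
import heapq


def greedy_cd(A, b, tol=1e-10, max_iter=10_000):
    """Greedy coordinate descent for 0.5*x^T A x - b^T x (A symmetric positive definite).

    Illustrative assumption: the coordinate with the largest absolute partial
    derivative is picked via a max-heap, then minimized exactly.
    """
    n = len(b)
    x = [0.0] * n
    grad = [-bi for bi in b]      # gradient A x - b evaluated at x = 0
    version = [0] * n             # stamps used to recognize stale heap entries
    # heapq is a min-heap, so magnitudes are negated to get max-heap behavior.
    heap = [(-abs(grad[j]), j, 0) for j in range(n)]
    heapq.heapify(heap)

    for _ in range(max_iter):     # counts heap pops, including stale ones
        if not heap:
            break
        neg_mag, j, stamp = heapq.heappop(heap)
        if stamp != version[j]:
            continue              # entry predates a later gradient update
        if -neg_mag < tol:
            break                 # largest partial derivative is negligible
        step = -grad[j] / A[j][j]  # exact minimizer along coordinate j
        x[j] += step
        # Only coordinates coupled to j through A see their gradient change.
        for k in range(n):
            if A[k][j] != 0.0:
                grad[k] += A[k][j] * step
                version[k] += 1
                heapq.heappush(heap, (-abs(grad[k]), k, version[k]))
    return x


solution = greedy_cd([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this small system the result should agree with the exact solution (1/11, 7/11) up to the tolerance. Stale heap entries are discarded lazily rather than updated in place, a common workaround because Python's `heapq` has no decrease-key operation; the heap pays off mainly when A is sparse, since each update then touches only a few entries.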
Article