TA-DARTS: Temperature Annealing of Discrete Operator Distribution for Effective Differential Architecture Search


Bibliographic Details
Main Authors: Jiyong Shin, Kyongseok Park, Dae-Ki Kang
Format: Article
Language: English
Published: MDPI AG, 2023-09-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/13/18/10138
Description
Summary: In the realm of machine learning, the optimization of hyperparameters and the design of neural architectures entail laborious and time-intensive endeavors. To address these challenges, considerable research effort has been directed towards Automated Machine Learning (AutoML), with a focus on enhancing these inherent inefficiencies. A pivotal facet of this pursuit is Neural Architecture Search (NAS), a domain dedicated to the automated formulation of neural network architectures. Given the pronounced impact of network architecture on neural network performance, NAS techniques strive to identify architectures that can manifest optimal performance outcomes. A prominent algorithm in this area is Differentiable Architecture Search (DARTS), which transforms discrete search spaces into continuous counterparts using gradient-based methodologies, thereby surpassing prior NAS methodologies. Notwithstanding DARTS' achievements, a discrepancy between discrete and continuously encoded architectures persists. To ameliorate this disparity, we propose TA-DARTS in this study—a temperature annealing technique applied to the Softmax function, utilized for encoding the continuous search space. By leveraging temperature values, architectural weights are judiciously adjusted to alleviate biases in the search process or to align resulting architectures more closely with discrete values. Our findings exhibit advancements over the original DARTS methodology, evidenced by a 0.07%p enhancement in validation accuracy and a 0.16%p improvement in test accuracy on the CIFAR-100 dataset. Through systematic experimentation on benchmark datasets, we establish the superiority of TA-DARTS over the original mixed operator, thereby underscoring its efficacy in automating neural architecture design.
ISSN:2076-3417
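
The core idea the abstract describes — applying a temperature to the Softmax that encodes the continuous search space — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the operator weights and temperature values are hypothetical, and the actual TA-DARTS annealing schedule is defined in the article itself.

```python
import math

def tempered_softmax(alphas, temperature):
    """Softmax with a temperature T, i.e. softmax(alpha_i / T).

    T > 1 flattens the distribution over candidate operators
    (mitigating bias toward early front-runners during search);
    T -> 0 sharpens it toward a one-hot choice, narrowing the gap
    between the continuous encoding and the final discrete architecture.
    """
    scaled = [a / temperature for a in alphas]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical architecture weights for four candidate operators on one edge
alphas = [1.2, 0.4, 0.9, 0.1]

flat = tempered_softmax(alphas, temperature=5.0)   # near-uniform mixture
sharp = tempered_softmax(alphas, temperature=0.1)  # close to one-hot
```

Annealing the temperature from a high to a low value over the course of the search would move the mixed-operator weights from the flat regime toward the sharp one, which is the discretization-gap reduction the abstract refers to.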