PARAMETRIC FLATTEN-T SWISH: AN ADAPTIVE NONLINEAR ACTIVATION FUNCTION FOR DEEP LEARNING

The activation function is a key component in deep learning that performs non-linear mappings between the inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) its negative-cancellation property treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) its inherently predefined nature offers little additional flexibility, expressivity, or robustness to the networks; 3) its mean activation is highly positive, leading to a bias-shift effect in network layers; and 4) its multilinear structure restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved the highest mean rank among the comparison methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
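
The abstract does not give the PFTS formula itself. As a non-authoritative sketch, the snippet below assumes the Flatten-T Swish form, x * sigmoid(x) + T for non-negative inputs and a flat value T for negative inputs, with the threshold T promoted to a learnable parameter as one plausible reading of "parametric". The class name ParametricFlattenTSwish, the shared per-layer parameterisation, and the initial value of -0.20 are illustrative assumptions rather than details taken from the paper.

import torch
import torch.nn as nn


class ParametricFlattenTSwish(nn.Module):
    """Sketch of a Flatten-T Swish-style activation with a learnable threshold T.

    Assumed form: f(x) = x * sigmoid(x) + T for x >= 0, and f(x) = T for x < 0,
    with T trained jointly with the network weights. The exact PFTS
    parameterisation may differ; consult the paper for the definitive form.
    """

    def __init__(self, t_init: float = -0.20):
        super().__init__()
        # A single threshold shared across the layer; letting the optimiser
        # adapt it is what makes this variant "parametric" rather than fixed-T.
        self.t = nn.Parameter(torch.tensor(t_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish-like response for non-negative inputs, flat threshold for
        # negative inputs (instead of ReLU's hard zero, which discards them).
        positive_branch = x * torch.sigmoid(x) + self.t
        return torch.where(x >= 0, positive_branch, self.t.expand_as(x))


if __name__ == "__main__":
    act = ParametricFlattenTSwish()
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))  # negative inputs map to T instead of being zeroed out

Used in place of nn.ReLU() in a feed-forward stack such as the DNN-3A to DNN-7 baselines described above, each layer would then learn its own threshold during backpropagation.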

Bibliographic Details
Main Authors: Hock Hung Chieng (Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia, Malaysia), Noorhaniza Wahid (Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia, Malaysia), Pauline Ong (Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia, Malaysia)
Format: Article
Language: English
Published: UUM Press 2020-11-01
Series: Journal of ICT
ISSN: 1675-414X, 2180-3862
Subjects: Activation function, deep learning, Flatten-T Swish, non-linearity, ReLU
Online Access:https://e-journal.uum.edu.my/index.php/jict/article/view/12398