Comparison of an effectiveness of artificial neural networks for various activation functions

Daniel Florek; Marek Miłosz

Article: Original scientific text

Authors: Daniel Florek ¹, Marek Miłosz ¹

Affiliations

Department of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland

Abstract

Activation functions play an important role in artificial neural networks (ANNs) because they break the linearity of the data transformations performed by models. Thanks to the recent surge of interest in ANNs, new improvements to activation functions keep emerging. The paper presents the results of research on the effectiveness of ANNs with the ReLU, Leaky ReLU, ELU, and Swish activation functions. Four different data sets and three different network architectures were used. The results show that the Leaky ReLU, ELU, and Swish functions work better in deep and more complex architectures, as they are designed to alleviate the vanishing gradient and dead neuron problems. None of the three aforementioned functions comes out ahead in accuracy on all of the data sets used, although the Swish activation speeds up training considerably and ReLU is the fastest during prediction.
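For reference, the four activation functions named in the abstract can be sketched in their commonly cited textbook forms. This is an illustrative sketch only, not the authors' implementation: the default parameters (`alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU, `beta=1.0` for Swish) and the helper `numerical_grad` are assumptions introduced here, and the settings actually used in the paper are not stated in the abstract.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through, outputs 0 otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: small slope `alpha` (assumed default) for negative inputs."""
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: alpha * (exp(x) - 1) for negative inputs (alpha is an assumed default)."""
    return x if x > 0 else alpha * math.expm1(x)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x) (beta is an assumed default)."""
    return x * _sigmoid(beta * x)


def numerical_grad(f, x: float, h: float = 1e-6) -> float:
    """Hypothetical helper: central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Compare values and slopes at a negative input, where the functions differ.
    for f in (relu, leaky_relu, elu, swish):
        print(f.__name__, f(-2.0), numerical_grad(f, -2.0))
```

At a negative input, ReLU has both zero output and zero slope, which is the mechanism behind the dead neuron problem mentioned in the abstract; the other three functions keep a non-zero slope there.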

Polish-language abstract (in English translation): By breaking the linear nature of the transformations that take place in artificial neural networks (ANNs), activation functions make it possible to learn complex patterns present in input data, e.g. in images. The growing interest in ANNs has prompted researchers to study a variety of activations that may give an advantage during both training and prediction, ultimately contributing to new, interesting solutions. The paper presents the results of research on the effectiveness of ANNs for the ReLU, Leaky ReLU, ELU, and Swish functions, using four data sets and three different ANN architectures. The results show that the Leaky ReLU, ELU, and Swish functions perform better in deep and more complex architectures, as they are intended to prevent the vanishing gradient and dead neuron problems. None of the three aforementioned functions has an advantage in accuracy; however, Swish considerably speeds up ANN training, and ReLU is the fastest in the prediction process.

Keywords

activation functions, artificial neural networks, artificial intelligence

