# Activation Functions

Activation Functions in Deep Learning with LaTeX Applications

## Summary

purpose, 37 activation functions are explained both mathematically and visually, and given with their LaTeX implementations due to their common use in scientific articles.

## Excerpt

## Table Of Contents

- Cover
- Title
- Copyright
- About the author
- About the book
- This eBook can be cited
- Table of Contents
- Introduction
- 1. Machine Learning
- 1.1. Types of Machine Learning
- 1.2. Supervised Learning
- 1.2.1. Regression
- 1.2.2. Classification and Logistic Regression
- 1.3. Unsupervised Learning
- 1.3.1. Clustering
- 1.4. Semi-supervised Learning
- 1.5. Reinforcement Learning
- 1.6. Federated Learning
- 1.7. Transfer Learning
- 1.8. Ensemble Learning
- 2. Neural Networks
- 2.1. Single Layer Perceptron
- 2.2. Deep Neural Networks
- 2.3. Architecture Design
- 2.3.1. Feed Forwards
- 2.3.2. Convolutional Neural Networks
- 2.3.3. Sequence Modeling
- 2.3.3.1. Recurrent Neural Networks
- 3. Activation Functions
- 4. Monotonic Activation Functions
- 4.1. Linear Function
- 4.1.1. Identity Function
- 4.1.2. Piecewise Linear Function
- 4.2. Threshold (Unit Heaviside, Binary, Step) Function
- 4.3. Sigmoid Function
- 4.3.1. Bipolar Sigmoid Function
- 4.4. Rectified Linear Unit (ReLU)
- 4.4.1. Leaky Rectified Linear Unit (LReLU)
- 4.4.2. Parametric Rectified Linear Unit (PReLU)
- 4.4.3. Randomized Rectified Linear Unit (RReLU)
- 4.5. Exponential Linear Unit (ELU)
- 4.5.1. Scaled Exponential Linear Unit (SELU)
- 4.6. SoftMax Function
- 4.7. Odd Activation (Signum, Sign) Function
- 4.8. Maxout Function
- 4.9. Softsign Function
- 4.10. Elliott Function
- 4.11. Hyperbolic Tangent (Tanh) Function
- 4.11.1. Arc Tangent Function
- 4.11.2. LeCun’s Hyperbolic Tangent Function
- 4.12. Complementary log-log Function
- 4.13. Softplus Function
- 4.14. Bent Identity Function
- 4.15. Soft Exponential Function
- 5. Periodic Activation Functions
- 5.1. Sinusoidals
- 5.1.1. Sine Wave Function
- 5.1.2. Cardinal Sine Function (Sinc)
- 5.1.3. Fourier Transform (FT, DFT/FFT)
- 5.1.4. Discrete-time Dimensional Fourier Transform (DTFT, Shannon-Nyquist)
- 5.1.5. Short-Time Fourier Transform (Gabor, STFT)
- 5.1.6. Wavelet Transform
- 5.2. Non-sinusoidals
- 5.2.1. Gaussian (Normal) Distribution Function
- 5.2.2. Square Wave Function
- 5.2.3. Triangle Wave Function
- 5.2.4. Sawtooth Wave Function
- 5.2.5. S-shaped Rectified Linear Unit (SReLU)
- 5.2.6. Adaptive Piecewise Linear Unit (APLU)
- 6. Bias Unit
- References
- LaTeX Applications

## Introduction

In recent years, statistics has visibly been renewing itself through advances in both processing and data. Processing technologies are not restricted to the CPU; they also encompass recent developments in other technologies such as GPUs, TPUs and other unclassified processors, used as single or parallel processing units. Data has also been growing day by day with the emergence and widespread use of smartphone technologies, which have accelerated first data creation and then its collection and storage compared to other resources. These two components of technology provide a convenient basis for the birth of the Zeitgeist: Artificial Intelligence (AI hereafter).

AI actually has a long history, which can be traced back to the Mechanical Turk (Standage, 2002), a chess machine controlled by a human hidden inside it, who played against other humans while the machine posed as an intelligent automaton thinking out its moves. The breakthrough in AI was made by Alan Turing while trying to break and translate the codes used by the Axis forces in the Second World War, which led to AI becoming an academic discipline in the late 1950s. This breakthrough was based on the categorical and logical implementation of codes, symbolic reasoning and mathematical expressions to imitate the human brain. The first neural network implementation, called SNARC, was carried out during those years (McCorduck, 1979). Russell et al. (1995) assert that AI is achieved by changing the way of thinking. It has been achieved by making computers more intelligent and more capable of learning than mere code allows, by feeding them consistently from the environment or with data created or provided by humans. So the utopia of creating a machine that thinks, acts and feels like a human has become a reality thanks to this change in the way of thinking: a switch from creating categorical or logical algorithms that shape the problem-solving behaviour of computers to computers using information-based learning. Hence the information, namely data, became the source of learning.

The human brain learns from information that it generates itself or that the environment provides. In order to make machines intelligent, the first thing that should be done is to mimic the algorithm of learning. Previously, the question was whether machines could solve a problem once it had been declared to them, that is, if the problem existed and had been defined for the machine. The word “if” is the key concept of the algorithms before the 21st century. After that, learning could be built on whether the condition of “if” holds, as human brains do. Machines are therefore programmed to learn “if” statements through their own capabilities, and the main condition for that is to feed the machines with information, that is data, together with implicit programming. This process as a whole is called machine learning. Computers can learn from data: they can make their own “if” statements by generating equations, establishing relationships between pieces of information, separating others, etc. Machine learning is a discipline of learning from data in order to make predictions with it.
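The contrast between a hand-written “if” statement and one learned from data can be illustrated with a minimal Python sketch. It is not taken from the book: the function names, the toy data and the simple threshold search are assumptions chosen only for illustration.

```python
def hand_written_rule(x):
    # Classic programming: the programmer fixes the threshold in advance.
    return 1 if x > 5.0 else 0


def learn_threshold(xs, ys):
    """Return the cut point that misclassifies the fewest training examples."""
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct inputs.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_t, best_errors = None, len(xs) + 1
    for t in candidates:
        # Count examples where the rule "x > t" disagrees with the label.
        errors = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical toy data: inputs and their 0/1 labels.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]

# The "if" condition is now derived from the data instead of being typed in.
threshold = learn_threshold(xs, ys)


def learned_rule(x):
    return 1 if x > threshold else 0
```

With this toy data the search settles on the midpoint between 3.0 and 7.0, so `learned_rule` behaves like `hand_written_rule` even though nobody wrote the threshold by hand. Real machine learning methods replace the brute-force search with more general optimisation, but the division of labour is the same.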

## 1. Machine Learning

The best and most general definition of machine learning may come from Mitchell (1997), who explains that “machine learning is a discipline of computer algorithms that improve automatically through experience”. In the word experience there is a hidden and deep meaning that cannot be explained merely by stating the key word, data. As Einstein once said, “learning is experience, everything else is just information”, as if he wanted to endorse the empiricist school (Zalta et al., 2003). Machine learning algorithms are binding these two sides in

## Details

- Pages: 84
- Year: 2022
- ISBN (PDF): 9783631876701
- ISBN (ePUB): 9783631876718
- ISBN (MOBI): 9783631876725
- ISBN (Softcover): 9783631873281
- DOI: 10.3726/b19631
- Language: English
- Publication date: 2022 (May)
- Published: Berlin, Bern, Bruxelles, New York, Oxford, Warszawa, Wien, 2022. 84 pp., 28 fig. b/w, 1 table.