ReLU (Rectified Linear Unit) is the default choice of activation function for the hidden layers. In the output layer we use the sigmoid as activation function, because its output lies in the range between 0 and 1.
With ReLU in the hidden layers, the neural network learns much faster than with sigmoid or tanh, because the slope of sigmoid and tanh approaches 0 when z is a large positive or negative number, which slows down gradient descent. The derivative of ReLU is 1 whenever z > 0.
First we define both functions and plot them:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # Squashes z into the range (0, 1)
    return 1. / (1 + np.exp(-z))

def ReLU(z):
    # Passes positive z through unchanged, clips negative z to 0
    return np.maximum(0, z)

z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z), 'r', label='sigmoid')
plt.plot(z, ReLU(z), 'b', label='ReLU')
plt.legend(fontsize=12)
plt.show()
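To make the vanishing-gradient point concrete, here is a small sketch comparing the two derivatives numerically (the helper names sigmoid_derivative and relu_derivative are introduced here for illustration, not from the original code):

```python
import numpy as np

def sigmoid(z):
    return 1. / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1 - s)

def relu_derivative(z):
    # ReLU'(z) = 1 for z > 0, 0 otherwise (undefined at exactly 0; 0 used here)
    return (np.asarray(z) > 0).astype(float)

# For large |z| the sigmoid gradient nearly vanishes, while ReLU keeps slope 1 for z > 0
print(sigmoid_derivative(10.0))   # tiny value, close to 0
print(relu_derivative(10.0))      # 1.0
```

The near-zero sigmoid gradient at large |z| is what stalls gradient descent in deep hidden layers, while the constant slope of ReLU for positive inputs keeps the updates flowing.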