Our model will have two layers: an input layer with 6 neurons and an output layer with one sigmoid neuron per label. We will use the normal initializer, which generates the initial weight tensors from a normal distribution.
The optimizer we'll use is Adam. It is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data. Adam is popular in deep learning because it achieves good results quickly. The default parameters follow those provided in the original paper.
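To make the update rule concrete, here is a minimal NumPy sketch of a single Adam step (a hypothetical illustration, not the Keras internals; the learning rate is raised to 0.1 so the toy problem converges in a few steps, while beta_1, beta_2, and epsilon keep the paper's defaults):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for scalar weight w with gradient g at step t."""
    m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 starting from w = 5.0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    g = 2 * w                            # gradient of w^2
    w, m, v = adam_step(w, g, m, v, t)
print(w)
```

Note how the update is roughly `lr` in magnitude regardless of the raw gradient scale: the second-moment normalization is what makes Adam robust to poorly scaled gradients.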
To make this work in Keras we need to compile the model. An important choice to make is the loss function. We use the binary_crossentropy loss rather than the categorical_crossentropy loss that is usual in multi-class classification. This might seem unreasonable, but we want to penalize each output node independently: we pick a binary loss and model the output of the network as an independent Bernoulli distribution per label.
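The per-label treatment can be sketched in plain NumPy: binary cross-entropy is just the Bernoulli negative log-likelihood evaluated at each output node separately, then averaged (the three labels and predictions below are made up for illustration):

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])   # three independent binary labels
y_pred = np.array([0.9, 0.2, 0.7])   # sigmoid outputs of the network

# Per-label Bernoulli negative log-likelihood
per_label = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
bce = per_label.mean()               # averaged over the output nodes
print(per_label, bce)
```

Because each label contributes its own term, a wrong prediction on one label is penalized regardless of how well the other labels are predicted, which is exactly the behavior we want for multi-label classification.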
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# create model
def create_model():
    model = Sequential()
    # Input layer: 6 neurons, 2 input features, weights drawn from a normal distribution
    model.add(Dense(6, input_dim=2, kernel_initializer='normal', activation='relu'))
    # Output layer: one sigmoid neuron per label
    model.add(Dense(y_train.T.shape[1], activation='sigmoid'))
    # Compile model with Adam at its default parameters
    model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
    return model
model = create_model()
model.summary()
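For intuition about what summary() describes, the forward pass of this architecture can be sketched in plain NumPy (a hypothetical stand-in, assuming 3 labels and a normal initializer with standard deviation 0.05, similar to Keras's 'normal' initializer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes mirror the Keras model: 2 input features -> 6 hidden ReLU
# units -> n_labels sigmoid outputs. n_labels=3 is an assumption.
n_features, n_hidden, n_labels = 2, 6, 3
W1 = rng.normal(0.0, 0.05, (n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.05, (n_hidden, n_labels))
b2 = np.zeros(n_labels)

def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)           # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid per label

X = rng.normal(size=(4, n_features))           # a batch of 4 samples
probs = forward(X)
print(probs.shape)
```

Each row of `probs` holds one independent probability per label, which is exactly the output the binary cross-entropy loss scores node by node.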