We start with some imports and helper functions.
import keras
from keras import layers
from keras.utils import to_categorical
from keras import initializers
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as pltcol
def target_function(x, y):
    if abs(x) + abs(y) <= 1:
        return 1
    return 0
def generate_data(N):
    x = np.random.uniform(-1.4, 1.4, (N, 2))
    y = np.array([to_categorical(target_function(a, b), num_classes=2) for (a, b) in x])
    return (x, y)
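The labels are one-hot encoded via `to_categorical`: class index 0 or 1 becomes a vector of length 2 with a single 1. In plain numpy the encoding amounts to the following minimal equivalent sketch (the helper name `one_hot` is ours, for illustration only):

```python
import numpy as np

def one_hot(index, num_classes):
    # numpy equivalent of keras' to_categorical for a single label
    v = np.zeros(num_classes)
    v[index] = 1.0
    return v

print(one_hot(0, 2))  # [1. 0.]  -> class 0 (outside the diamond)
print(one_hot(1, 2))  # [0. 1.]  -> class 1 (inside the diamond)
```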
def plot(model, grid_x, grid_y, target_f, title):
    num_layers = len(model.layers)
    num_neurons = [model.layers[n_layer].get_weights()[0].shape[1] for n_layer in range(num_layers)]
    # One (len(grid_y), len(grid_x)) array per neuron: rows indexed by y,
    # columns by x, which is the layout pcolormesh expects.
    plotting_data = [[np.empty((len(grid_y), len(grid_x))) for neuron in range(num_neurons[n_layer])]
                     for n_layer in range(num_layers)]
    data = np.array([[x, y] for x in grid_x for y in grid_y])
    for layer in range(num_layers):
        intermediate_layer_model = keras.Model(inputs=model.input, outputs=model.layers[layer].output)
        output = intermediate_layer_model.predict(data)
        num_out = 0
        for i in range(len(grid_x)):
            for j in range(len(grid_y)):
                for (n, value) in enumerate(output[num_out]):
                    plotting_data[layer][n][j, i] = value
                num_out += 1
    horizontal_subplots = max(num_layers, 2)
    vertical_subplots = max(num_neurons) + 1
    fig, axs = plt.subplots(vertical_subplots, horizontal_subplots)
    plt.subplots_adjust(hspace=0.4)
    for layer_index in range(num_layers):
        for neuron_index in range(num_neurons[layer_index]):
            pcm = axs[neuron_index, layer_index].pcolormesh(grid_x, grid_y,
                                                            plotting_data[layer_index][neuron_index],
                                                            vmin=0, vmax=1)
            axs[neuron_index, layer_index].set_title("Layer {}, neuron {}".format(layer_index, neuron_index))
    plt.colorbar(pcm, ax=axs[:, :])
    target = np.array([[target_f(x, y) for x in grid_x] for y in grid_y])
    axs[vertical_subplots - 1, 0].pcolormesh(grid_x, grid_y, target, vmin=0, vmax=1)
    axs[vertical_subplots - 1, 0].set_title("Target function")
    last_layer_index = num_layers - 1
    num_classes = len(plotting_data[last_layer_index])
    # For every grid point, the predicted class is the output neuron with the largest value.
    result = np.array([[np.argmax([neuron_output[j, i] for neuron_output in plotting_data[last_layer_index]]) / (num_classes - 1)
                        for i in range(len(grid_x))] for j in range(len(grid_y))])
    axs[vertical_subplots - 1, last_layer_index].pcolormesh(grid_x, grid_y, result, vmin=0, vmax=1)
    axs[vertical_subplots - 1, last_layer_index].set_title("argmax(layer {} output)".format(last_layer_index))
    fig.suptitle(title)
First we will take a net with 2 layers, each having 2 neurons.
model = keras.Sequential()
model.add(layers.Dense(2, input_dim=2, activation='sigmoid'))
model.add(layers.Dense(2, activation='sigmoid'))
optimizer = keras.optimizers.SGD(learning_rate=0.1)
model.compile(optimizer=optimizer, loss='mse')
Let us take weights from the example file:
model.layers[0].set_weights([np.array(((-0.2,0.3), (.5,-0.6))).T, np.array((0.1,-0.4))])
model.layers[1].set_weights([np.array(((-0.25,0.35), (.55,-0.65))).T, np.array((0.15,-0.45))])
Let us see how the network behaves before training. On the left, the two top pictures show the outputs of the neurons from the first layer -- for each point $(x,y)$ in the square $[-2,2]^2$, we show the output value by colours from black (0) to yellow (1.0). Note that for training we use only points from the square $[-1.4, 1.4]^2$. The bottom-left panel shows the target function. The two top graphs on the right are the outputs of the neurons from the second layer (i.e., the output of the network), and the bottom-right picture shows the predictions. In this example, every point is predicted to be in class 0 (the black one).
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "before training")
Now let us train the network.
x, y = generate_data(10000)
model.fit(x, y, batch_size=10, epochs=45)
...and see how it behaves after the training. It is much better, but with only 2 neurons in the first layer, it is unable to model the square target shape.
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "after training")
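The limitation is geometric: each first-layer sigmoid neuron implements one soft half-plane, so two neurons can carve out at most two boundaries, while the diamond $|x| + |y| \le 1$ is the intersection of four half-planes. A minimal numpy sketch (hand-picked weights, independent of the Keras model above) shows that four hidden neurons already suffice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four hidden neurons, one per side of the diamond |x| + |y| <= 1.
# Neuron k outputs ~1 when the point lies on the inner side of the
# k-th edge; the factor 10 makes the soft boundaries steep.
W1 = 10.0 * np.array([[-1.0, -1.0],   #  x + y <= 1
                      [ 1.0, -1.0],   # -x + y <= 1
                      [-1.0,  1.0],   #  x - y <= 1
                      [ 1.0,  1.0]])  # -x - y <= 1
b1 = 10.0 * np.ones(4)

def inside_diamond(x, y):
    h = sigmoid(W1 @ np.array([x, y]) + b1)   # hidden layer
    # Output neuron fires only when all four hidden neurons fire.
    return sigmoid(20.0 * h.sum() - 70.0)

print(inside_diamond(0.0, 0.0))   # deep inside: close to 1
print(inside_diamond(1.2, 1.2))   # far outside: close to 0
```

With five trainable neurons in the first layer, gradient descent can find a similar configuration (with one neuron left over), which is what the next experiment shows.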
Let us try with 5 neurons in the first layer.
np.random.seed(2) # should give reproducible results, at least using Theano backend
model = keras.Sequential()
model.add(layers.Dense(5, input_dim=2, activation='sigmoid', kernel_initializer=initializers.RandomNormal(stddev=1), bias_initializer=initializers.Zeros()))
model.add(layers.Dense(2, activation='sigmoid', kernel_initializer=initializers.RandomNormal(stddev=1), bias_initializer=initializers.Zeros()))
optimizer = keras.optimizers.SGD(learning_rate=0.1)
model.compile(optimizer=optimizer, loss='mse')
x, y = generate_data(10000)
model.fit(x, y, batch_size=10, epochs=45)
Let us examine the behaviour of this network after training. This one can recognise the square shape quite well. Note that neuron 1 from layer 0 outputs similar values for all points.
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "after learning")
Now let us come back to the net with just 2 neurons in each layer and initialise weights to zero:
model = keras.Sequential()
model.add(layers.Dense(2, input_dim=2, activation='sigmoid'))
model.add(layers.Dense(2, activation='sigmoid'))
optimizer = keras.optimizers.SGD(learning_rate=0.1)
model.compile(optimizer=optimizer, loss='mse')
model.layers[0].set_weights([np.zeros((2,2)), np.zeros((2,))])
model.layers[1].set_weights([np.zeros((2,2)), np.zeros((2,))])
x, y = generate_data(10000)
model.fit(x, y, batch_size=10, epochs=45)
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "zero weights, after training")
As one can see above, the network performs very poorly. Examining the weights reveals the reason: when the initial weights are all the same within a hidden layer, then, by symmetry, the neurons in that layer receive identical gradient updates and evolve in the same way. As a result, it is as if the layer had only one neuron, which leads to poor performance.
model.get_weights() # compare columns of the first array
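The symmetry argument can be verified without Keras. In the minimal numpy sketch below (hand-rolled gradients on MSE, illustrative toy data, identical rather than zero initial weights so that the first layer actually moves), the two hidden neurons start with equal weight columns and therefore receive identical gradients at every step, so the columns stay equal forever:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 2))                 # toy inputs
t = (np.abs(X).sum(axis=1) <= 1).astype(float)   # diamond labels

# Both hidden neurons get the SAME initial weights (equal columns of W1).
W1 = np.full((2, 2), 0.3)
b1 = np.zeros(2)
w2 = np.full(2, 0.3)
b2 = 0.0

for _ in range(100):                             # plain batch gradient descent
    h = sigmoid(X @ W1 + b1)                     # hidden activations
    p = sigmoid(h @ w2 + b2)                     # network output
    d_out = (p - t) * p * (1 - p)                # dL/dz of output neuron
    d_hid = np.outer(d_out, w2) * h * (1 - h)    # dL/dz of hidden neurons
    W1 -= 0.1 * X.T @ d_hid / len(X)
    b1 -= 0.1 * d_hid.mean(axis=0)
    w2 -= 0.1 * h.T @ d_out / len(X)
    b2 -= 0.1 * d_out.mean()

print(W1)                                 # weights did change, but ...
print(np.allclose(W1[:, 0], W1[:, 1]))    # ... the two columns are still identical
```

Exactly the same symmetry is visible in the Keras weights above: the columns of the first weight array are equal.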
On the other hand, if the weights are too large (with respect to the number of inputs), then the neurons saturate and the training is very slow. Let us see such an example; you should see the weight $-117.57395$ (at least if the seed gives reproducible results).
np.random.seed(2) # should give reproducible results, at least using Theano backend
model = keras.Sequential()
model.add(layers.Dense(2, input_dim=2, activation='sigmoid', kernel_initializer=initializers.RandomNormal(stddev=100), bias_initializer=initializers.Zeros()))
model.add(layers.Dense(2, activation='sigmoid', kernel_initializer=initializers.RandomNormal(stddev=100), bias_initializer=initializers.Zeros()))
optimizer = keras.optimizers.SGD(learning_rate=0.1)
model.compile(optimizer=optimizer, loss='mse')
print(model.layers[0].get_weights()[0])
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "large weights, before training")
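The saturation is explained by the shape of the sigmoid: its derivative $\sigma'(z) = \sigma(z)(1-\sigma(z))$ is at most $1/4$ and vanishes exponentially for large $|z|$. With weights of order 100 and inputs in $[-1.4, 1.4]^2$, the pre-activations are typically of order 100, where the gradient is astronomically small, so SGD can barely move the weights. A quick numerical check (the derivative is written in a numerically stable form so it does not underflow prematurely):

```python
import numpy as np

def sigmoid_grad(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)) = e^{-|z|} / (1 + e^{-|z|})^2
    e = np.exp(-np.abs(z))
    return e / (1.0 + e) ** 2

for z in [0.0, 2.0, 10.0, 100.0]:
    print(f"z = {z:6.1f}  sigma'(z) = {sigmoid_grad(z):.3e}")
```

At $z = 100$ the derivative is of order $10^{-44}$, which explains why the large-weight network below barely changes during training.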
Let us train the network.
x, y = generate_data(10000)
model.fit(x, y, batch_size=10, epochs=45)
Let us examine the same weights again, after training. The weight that had the value $-117.57395$ now has the value $-117.63838$. The other weights are also almost the same. The network stays virtually unchanged despite the training.
print(model.layers[0].get_weights()[0])
plot(model, np.linspace(-2,2,100), np.linspace(-2,2,100), target_function, "large weights, after training")