Previously, Matlab Geeks discussed a simple perceptron, which involves feed-forward learning based on two layers: inputs and outputs. Today we're going to add a little more complexity by including a third layer, a hidden layer, in the network. One reason for doing so is the concept of linear separability. While logic gates like "OR", "AND" or "NAND" can have their 0's and 1's separated by a single line (or hyperplane in higher dimensions), no such linear separation is possible for "XOR" (exclusive OR).
As the two images above demonstrate, a single line can separate values that return 1 and 0 for the “OR” gate, but no such line can be drawn for the “XOR” logic. Therefore, a simple perceptron cannot solve the XOR problem. What we need is a nonlinear means of solving this problem, and that is where multi-layer perceptrons can help.
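To make the separability argument concrete, here is a small sketch (not from the original post) showing that a single hand-picked line classifies OR correctly, while the same linear-threshold form cannot match XOR:

```matlab
% All four binary input pairs
X = [0 0; 0 1; 1 0; 1 1];

% The line x1 + x2 - 0.5 = 0 separates the OR outputs perfectly
or_pred = double(X(:,1) + X(:,2) - 0.5 > 0);   % gives [0; 1; 1; 1]

% For XOR, any rule of the form w1*x1 + w2*x2 + b > 0 would need:
%   b <= 0 (for input 0,0),  w1 + b > 0,  w2 + b > 0,
%   and w1 + w2 + b <= 0 (for input 1,1).
% Adding the middle two gives w1 + w2 + 2b > 0, which together with
% b <= 0 contradicts w1 + w2 + b <= 0 -- hence no separating line exists.
```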
First let’s initialize all of our variables, including the input, desired output, bias, learning coefficient, iterations and randomized weights.
% XOR input for x1 and x2
input = [0 0; 0 1; 1 0; 1 1];
% Desired output of XOR
output = [0; 1; 1; 0];
% Initialize the bias
bias = [-1 -1 -1];
% Learning coefficient
coeff = 0.7;
% Number of learning iterations
iterations = 10000;
% Calculate weights randomly using seed.
rand('state',sum(100*clock));
weights = -1 + 2.*rand(3,3);
Similar to biological neurons, which are activated when a certain threshold is reached, we will once again use a sigmoid transfer function to provide a nonlinear activation for our neural network. As we mentioned in our previous lesson, the sigmoid function 1/(1+e^(-x)) squashes all values into the range between 0 and 1. Multilayer perceptrons, which use backpropagation to learn, also require the activation function to be continuously differentiable, a requirement the sigmoid satisfies.
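The training code below calls a helper named sigma; the original m-file isn't shown in the post, but a minimal version consistent with how it is used would be:

```matlab
function y = sigma(x)
% sigma.m -- sigmoid activation, squashes x into the range (0, 1)
y = 1 ./ (1 + exp(-x));
end
```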
Let's set up our network to have 5 total neurons: 2 inputs, 2 hidden nodes, and 1 output. (If you are interested, you can change the number of hidden nodes, the learning rate, the learning algorithm, and the activation functions as needed. In fact, the artificial neural network toolbox in Matlab allows you to modify all of these as well.)
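As a point of comparison (not part of the original example), the same architecture can be sketched with the toolbox's feedforwardnet; exact behavior depends on your toolbox version:

```matlab
% Hedged sketch using the Neural Network / Deep Learning Toolbox
net = feedforwardnet(2);        % one hidden layer with 2 neurons
net.divideFcn = 'dividetrain';  % use all 4 XOR samples for training
net = train(net, input', output');
y = net(input');                % approximate XOR outputs
```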
Whereas before we stated the delta rule as delta = (desired output) - (network output), we will use a modification that scales this error by the derivative of the sigmoid, which is nicely explained by generation5.
Back-propagation with the delta rule will allow us to modify the weights at each node in the network based on the error at the current level n and at the n+1 level.
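In other words, each delta is the plain error scaled by the sigmoid derivative out*(1-out), and hidden deltas additionally pick up the weight that connects them forward to the output. A small numeric illustration with made-up values:

```matlab
% Output neuron: delta = out*(1-out)*(target - out)
target = 1; out = 0.6;
delta_o = out*(1-out)*(target - out);   % 0.6*0.4*0.4 = 0.096

% Hidden neuron: scale by its own activation derivative
% and by the weight connecting it to the output neuron
x_h = 0.8; w_ho = 0.5;
delta_h = x_h*(1-x_h)*w_ho*delta_o;     % 0.8*0.2*0.5*0.096 = 0.00768
```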
Now for the code with back propagation.
for i = 1:iterations
   out = zeros(4,1);
   numIn = length(input(:,1));
   for j = 1:numIn
      % Hidden layer
      H1 = bias(1,1)*weights(1,1) + input(j,1)*weights(1,2) + input(j,2)*weights(1,3);

      % Send data through sigmoid function 1/(1+e^-x)
      % Note that sigma is a different m file
      % that I created to run this operation
      x2(1) = sigma(H1);
      H2 = bias(1,2)*weights(2,1) + input(j,1)*weights(2,2) + input(j,2)*weights(2,3);
      x2(2) = sigma(H2);

      % Output layer
      x3_1 = bias(1,3)*weights(3,1) + x2(1)*weights(3,2) + x2(2)*weights(3,3);
      out(j) = sigma(x3_1);

      % Adjust delta values of weights
      % For output layer:
      % delta(wi) = xi*delta,
      % delta = (actual output)*(1-actual output)*(desired output - actual output)
      delta3_1 = out(j)*(1-out(j))*(output(j)-out(j));

      % Propagate the delta backwards into hidden layers
      delta2_1 = x2(1)*(1-x2(1))*weights(3,2)*delta3_1;
      delta2_2 = x2(2)*(1-x2(2))*weights(3,3)*delta3_1;

      % Add weight changes to original weights
      % and use the new weights to repeat the process.
      % delta weight = coeff*x*delta
      for k = 1:3
         if k == 1 % Bias cases
            weights(1,k) = weights(1,k) + coeff*bias(1,1)*delta2_1;
            weights(2,k) = weights(2,k) + coeff*bias(1,2)*delta2_2;
            weights(3,k) = weights(3,k) + coeff*bias(1,3)*delta3_1;
         else % When k = 2 or 3, input cases to neurons
            weights(1,k) = weights(1,k) + coeff*input(j,k-1)*delta2_1;
            weights(2,k) = weights(2,k) + coeff*input(j,k-1)*delta2_2;
            weights(3,k) = weights(3,k) + coeff*x2(k-1)*delta3_1;
         end
      end
   end
end
Try to go through each step individually; there are also some additional great tutorials online, including the generation5 website.
As for the final results?
Well, the weights are:
weights =
   -9.8050   -6.0907   -7.0623
   -2.4839   -5.3249   -6.9537
    5.7278   12.1571  -12.8941
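If you want to verify any set of learned weights yourself, a single forward pass (reusing input, bias, and sigma from above) is enough:

```matlab
% Forward pass only -- no learning, just evaluate the trained network
for j = 1:4
    h1 = sigma(bias(1)*weights(1,1) + input(j,1)*weights(1,2) + input(j,2)*weights(1,3));
    h2 = sigma(bias(2)*weights(2,1) + input(j,1)*weights(2,2) + input(j,2)*weights(2,3));
    out(j) = sigma(bias(3)*weights(3,1) + h1*weights(3,2) + h2*weights(3,3));
end
```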
These weights represent just one possible solution to this problem, but based on these results, what is the output? Drumroll please…
out =
    0.0042
    0.9961
    0.9956
    0.0049
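The error figure is easy to reproduce from these numbers:

```matlab
% Mean squared error between desired and actual outputs
err = [0; 1; 1; 0] - [0.0042; 0.9961; 0.9956; 0.0049];
mse = mean(err.^2);   % roughly 1.9e-5
```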
Not bad, as the expected output is [0; 1; 1; 0], which gives us a mean squared error (MSE) of 1.89*10^-5. Of course, we ran this using 100,000 iterations, and while the network could be optimized further, stopped earlier, or modified to have a different architecture as mentioned previously, I'll leave that to you or for a further lesson down the road.