Neural networks can be used to determine relationships and patterns between inputs and outputs. A simple single-layer feed-forward neural network that has the ability to learn and classify data sets is known as a perceptron.
By iteratively “learning” the weights, it is possible for the perceptron to find a solution to linearly separable data (data that can be separated by a hyperplane). In this example, we will run a simple perceptron to determine the solution to a 2-input OR.
The 2-input OR, X1 OR X2, is defined by the following truth table:

X1   X2   X1 OR X2
 0    0       0
 0    1       1
 1    0       1
 1    1       1
If you want to verify this yourself, run the following code in MATLAB. The code can be further modified to fit your needs. We first initialize our variables of interest, including the input, desired output, bias, learning coefficient and weights.
input = [0 0; 0 1; 1 0; 1 1];    % the four possible input patterns
numIn = 4;                       % number of input patterns
desired_out = [0; 1; 1; 1];      % target outputs for a 2-input OR
bias = -1;                       % bias input
coeff = 0.7;                     % learning rate
rand('state', sum(100*clock));   % seed the random number generator
weights = -1 + 2.*rand(3,1);     % three random initial weights in [-1, 1]
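Note that rand('state', ...) is the legacy seeding syntax; on recent MATLAB releases the equivalent, recommended call is:

rng('shuffle');   % seed the random number generator from the current time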
The input and desired_out are self-explanatory, with the bias initialized to a constant. This value can be set to any non-zero number between -1 and 1. The coeff represents the learning rate, which specifies how large of an adjustment is made to the network weights after each iteration. The closer the coefficient is to 1, the larger the weight adjustments; values near 0 modify the weights more conservatively. Finally, the weights are randomly assigned.
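If you want to see this effect yourself, try re-running the training loop below with different learning rates, for example:

coeff = 0.1;   % conservative: small weight updates, slower convergence
coeff = 0.9;   % aggressive: large weight updates, faster but less stable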
One more variable we will set is iterations, which specifies how many times to train, i.e. how many passes to make through the data while modifying the weights.
iterations = 10;
Now for the feed-forward perceptron code:
for i = 1:iterations
   out = zeros(4,1);
   for j = 1:numIn
      % weighted sum of the bias and the two inputs
      y = bias*weights(1,1) + input(j,1)*weights(2,1) + input(j,2)*weights(3,1);
      % sigmoid activation squashes the output into (0,1)
      out(j) = 1/(1 + exp(-y));
      % delta rule: adjust each weight in proportion to the output error
      delta = desired_out(j) - out(j);
      weights(1,1) = weights(1,1) + coeff*bias*delta;
      weights(2,1) = weights(2,1) + coeff*input(j,1)*delta;
      weights(3,1) = weights(3,1) + coeff*input(j,2)*delta;
   end
end
A little explanation of the code. First, for each input pattern the weighted sum y = bias*w1 + x1*w2 + x2*w3 is computed, and then run through a sigmoid function so that ‘out’ is squashed within a [0 1] limit. The weights are then modified iteratively based on the delta rule, in proportion to the error between the desired and actual outputs.
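Once training completes, the learned weights can be applied to any input pattern. As a quick sketch (reusing the bias and weights variables from the code above):

% evaluate the trained perceptron on a single input pattern
x = [1 0];                        % X1 = 1, X2 = 0
y = bias*weights(1,1) + x(1)*weights(2,1) + x(2)*weights(3,1);
out_x = 1/(1 + exp(-y))           % should approach 1 for OR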
When running the perceptron over 10 iterations, the outputs begin to converge, but are still not precisely as expected:
out =
    0.3756
    0.8596
    0.9244
    0.9952

weights =
    0.6166
    3.2359
    2.7409
As the iterations approach 1000, the output converges towards the desired output.
out =
    0.0043
    0.9984
    0.9987
    1.0000

weights =
    5.4423
   12.1084
   11.8823
As the OR logic condition is linearly separable, a solution will be reached after a finite number of loops. Convergence time can also change based on the initial weights, the learning rate, the transfer function (sigmoid, linear, etc.) and the learning rule (in this case the delta rule is used, but other algorithms like Levenberg-Marquardt also exist). If you are interested, try running the same code for other logical conditions like ‘AND’ or ‘NAND’ to see what you get; only the target vector needs to change, as sketched below.
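For example, keeping everything else the same:

% AND: the output is 1 only when both inputs are 1
desired_out = [0; 0; 0; 1];

% NAND: the complement of AND
desired_out = [1; 1; 1; 0];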
While single-layer perceptrons like this can solve simple linearly separable data, they are not suitable for non-separable data, such as XOR. In order to learn such a data set, you will need to use a multi-layer perceptron.
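You can verify this with the same code by changing only the targets; no matter how many iterations you run, the outputs will not converge to the desired values:

% XOR: the points mapping to 1 cannot be separated from those
% mapping to 0 by a single line, so training will not converge
desired_out = [0; 1; 1; 0];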