Neural Networks – A Multilayer Perceptron in Matlab

Previously, Matlab Geeks discussed a simple perceptron, which involves feed-forward learning based on two layers: inputs and outputs. Today we're going to add a little more complexity by including a third layer, or hidden layer, in the network. One reason for doing so is the concept of linear separability. While logic gates like "OR", "AND" or "NAND" can have their 0s and 1s separated by a single line (or hyperplane in higher dimensions), this linear separation is not possible for "XOR" (exclusive OR).

Linear separability for OR and XOR logic gates

As the two images above demonstrate, a single line can separate values that return 1 and 0 for the “OR” gate, but no such line can be drawn for the “XOR” logic. Therefore, a simple perceptron cannot solve the XOR problem. What we need is a nonlinear means of solving this problem, and that is where multi-layer perceptrons can help.
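If you want to see this for yourself, here is a quick sketch (not part of the original post) that plots the four XOR patterns; no single straight line can split the circles from the crosses.

% Plot the four XOR input patterns, colored and shaped by their target value
X = [0 0; 0 1; 1 0; 1 1];
y = [0; 1; 1; 0];
figure; hold on;
plot(X(y==0,1), X(y==0,2), 'ro', 'MarkerSize', 10, 'LineWidth', 2);
plot(X(y==1,1), X(y==1,2), 'bx', 'MarkerSize', 10, 'LineWidth', 2);
axis([-0.5 1.5 -0.5 1.5]); xlabel('x1'); ylabel('x2');
legend('XOR = 0', 'XOR = 1');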

First let’s initialize all of our variables, including the input, desired output, bias, learning coefficient, iterations and randomized weights.

% XOR input for x1 and x2
input = [0 0; 0 1; 1 0; 1 1];
% Desired output of XOR
output = [0;1;1;0];
% Initialize the bias
bias = [-1 -1 -1];
% Learning coefficient
coeff = 0.7;
% Number of learning iterations
iterations = 10000;
% Calculate weights randomly using seed.
rand('state',sum(100*clock));
% One row of weights per neuron (two hidden nodes and one output node);
% column 1 holds the bias weight, columns 2-3 the weights on that neuron's
% two inputs (x1/x2 for the hidden rows, the two hidden outputs for the output row).
weights = -1 + 2.*rand(3,3);

Similar to biological neurons which are activated when a certain threshold is reached, we will once again use a sigmoid transfer function to provide a nonlinear activation of our neural network. As we mentioned in our previous lesson, the sigmoid function 1/(1+e^(-x)) will squash all values between the range of 0 and 1. Also a requirement of the function in multilayer perceptrons, which use backpropagation to learn, is that this sigmoid activation function is continuously differentiable.
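The code that follows calls sigma, a small helper function that is not built into Matlab (several readers ask about it in the comments below). A minimal sigma.m, matching the version a reader posted further down the page, would be:

function x = sigma(net)
% Logistic sigmoid: squashes net into the range (0,1).
% Its derivative, sigma(net)*(1 - sigma(net)), is what appears in the
% delta terms during backpropagation.
x = 1./(1 + exp(-net));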

Let's set up our network to have 5 total neurons. (If you are interested, you can change the number of hidden nodes, the learning rate, the learning algorithm, and the activation functions as needed; in fact, the artificial neural network toolbox in Matlab allows you to modify all of these as well.)
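For comparison only, here is a rough sketch of what the same XOR problem might look like with that toolbox, assuming it is installed (function names vary between releases; feedforwardnet is the newer interface, and this is not the approach used in the rest of this post):

net = feedforwardnet(2);            % one hidden layer with 2 neurons
net = train(net, input', output');  % the toolbox expects one sample per column
pred = net(input')                  % network response to the four XOR patterns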

This is how the network will look, with the subscript numbers also used as indices in the Matlab code.
A multi-layer perceptron with one hidden layer containing 2 nodes.

Whereas before we stated the delta rule as delta = (desired output) - (network output), we will use a modification, which is nicely explained by generation5.
Back-propagation with the delta rule allows us to modify the weights at each node in the network based on the error at the current layer n and at layer n+1.
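Concretely, with the sigmoid activation the update rules implemented in the code below are:

delta_output = (actual output)*(1 - actual output)*(desired output - actual output)
delta_hidden = (hidden output)*(1 - hidden output)*(weight to output node)*delta_output
new weight = old weight + coeff*(input feeding that weight)*delta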

Now for the code with back propagation.

for i = 1:iterations
   out = zeros(4,1);
   numIn = length(input(:,1));
   for j = 1:numIn
      % Hidden layer
      H1 = bias(1,1)*weights(1,1) ...
          + input(j,1)*weights(1,2) ...
          + input(j,2)*weights(1,3);

      % Send data through sigmoid function 1/(1+e^-x)
      % Note that sigma is a separate m-file
      % that I created to run this operation
      x2(1) = sigma(H1);
      H2 = bias(1,2)*weights(2,1) ...
           + input(j,1)*weights(2,2) ...
           + input(j,2)*weights(2,3);
      x2(2) = sigma(H2);

      % Output layer
      x3_1 = bias(1,3)*weights(3,1) ...
             + x2(1)*weights(3,2) ...
             + x2(2)*weights(3,3);
      out(j) = sigma(x3_1);

      % Adjust delta values of weights
      % For output layer:
      % delta(wi) = xi*delta,
      % delta = (actual output)*(1 - actual output)*(desired output - actual output)
      delta3_1 = out(j)*(1-out(j))*(output(j)-out(j));

      % Propagate the delta backwards into hidden layers
      delta2_1 = x2(1)*(1-x2(1))*weights(3,2)*delta3_1;
      delta2_2 = x2(2)*(1-x2(2))*weights(3,3)*delta3_1;

      % Add weight changes to original weights
      % and use the new weights to repeat the process.
      % delta weight = coeff*x*delta
      for k = 1:3
         if k == 1 % Bias weights
            weights(1,k) = weights(1,k) + coeff*bias(1,1)*delta2_1;
            weights(2,k) = weights(2,k) + coeff*bias(1,2)*delta2_2;
            weights(3,k) = weights(3,k) + coeff*bias(1,3)*delta3_1;
         else % k = 2 or 3: weights on input (k-1) to each neuron
            weights(1,k) = weights(1,k) + coeff*input(j,k-1)*delta2_1;
            weights(2,k) = weights(2,k) + coeff*input(j,k-1)*delta2_2;
            weights(3,k) = weights(3,k) + coeff*x2(k-1)*delta3_1;
         end
      end
   end
end

Try to go through each step individually; there are also some great additional tutorials online, including the generation5 website.

As for the final results?
Well the Weights are:

weights =
   -9.8050   -6.0907   -7.0623
   -2.4839   -5.3249   -6.9537
    5.7278   12.1571  -12.8941

These weights represent just one possible solution to this problem, but based on these results, what is the output? Drumroll please…

out =
    0.0042
    0.9961
    0.9956
    0.0049

Not bad, as the expected output is [0; 1; 1; 0], which gives us a mean squared error (MSE) of 1.89*10^-5. Of course, we ran this using 100,000 iterations, and while this could be optimized further, stopped earlier, or modified to have a different architecture as mentioned previously, I'll leave that to you or for a further lesson down the road.
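For reference, the MSE quoted above is a one-line computation on the variables from the code (out is the network output, output is the target):

mse = mean((output - out).^2)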

64 thoughts on “Neural Networks – A Multilayer Perceptron in Matlab”

  1. hi
    i need your help!!!!
    how can I figure out whether to use a perceptron with one layer or multiple layers for solving a problem?

  2. %Created by Arghya Pal
    %Date 09/03/2014
    %M.Tech, Goa University
    % a program for back propagation network for EX-OR gate
    %Machine Learning
    close all, clear all, clc

    x = [0 0 1 1; 0 1 0 1]
    t = [0 1 1 0]
    [ni N] = size(x)
    [no N] = size(t)
    nh = 2

    % wih = .1*ones(nh,ni+1);
    % who = .1*ones(no,nh+1);
    wih = 0.01*randn(nh,ni+1);
    who = 0.01*randn(no,nh+1);

    c = 0;
    while(c < 3000)
        c = c+1;
        % %for i = 1:length(x(1,:))
        for i = 1:N
            for j = 1:nh
                netj(j) = wih(j,1:end-1)*x(:,i)+wih(j,end);
                % %outj(j) = 1./(1+exp(-netj(j)));
                outj(j) = tansig(netj(j));
            end
            % hidden to output layer
            for k = 1:no
                netk(k) = who(k,1:end-1)*outj' + who(k,end);
                outk(k) = 1./(1+exp(-netk(k)));
                delk(k) = outk(k)*(1-outk(k))*(t(k,i)-outk(k));
            end
            % back propagation
            for j = 1:nh
                s = 0;
                for k = 1:no
                    s = s + who(k,j)*delk(k);
                end
                delj(j) = outj(j)*(1-outj(j))*s;
                % %s=0;
            end
            for k = 1:no
                for l = 1:nh
                    who(k,l) = who(k,l)+.5*delk(k)*outj(l);
                end
                who(k,l+1) = who(k,l+1)+1*delk(k)*1;
            end
            for j = 1:nh
                for ii = 1:ni
                    wih(j,ii) = wih(j,ii)+.5*delj(j)*x(ii,i);
                end
                wih(j,ii+1) = wih(j,ii+1)+1*delj(j)*1;
            end
        end
    end

    h = tansig(wih*[x;ones(1,N)])
    y = logsig(who*[h;ones(1,N)])
    e = t-round(y)

  3. hi, please,
    I need Matlab source code for an MLP neural network for character recognition.
    I need Matlab backpropagation source code.
    The character input size is 5*7.
    please help me
    thanks

  4. it was very good. but I think one book, after calculating delta3_1, goes on to update the weights and only after that calculates delta2_1 and delta2_2.

    are both of these orderings correct or not??

  5. Hi,

    I also get these values for the output, instead of the expected ones.

    0.499524196804705
    0.477684866785518
    0.500475803195295
    0.522315133214482

    I replaced sigma by x2(1) = 1/(1+exp(-x)); as everyone else did. Any suggestions? Is it possible to mail me the sigma.m file so I can rerun the code? Good work and thanks in advance.

  6. Bravo! Good approach.
    But you forgot to attach something, which makes your work incomplete. Can you add it here or send me the sigma.m file? Then I can have a look.

  7. thank you a lot for your program, it works well. I want to use a linear function in the output node, but when I integrate the purelin function or my own linear function the execution fails and I get NaN NaN NaN in out. If you can help me with this problem, I'm waiting for your answer.
    thanks

  8. Hi,
    I am a user of artificial neural nets, and I am looking for a multi-layer perceptron and backpropagation. Is there any possibility of helping me write an incremental multilayer perceptron Matlab code?
    thank you

  9. Hi,
    I am a user of neural nets, and I am looking for backpropagation in incremental or stochastic mode. Is there any possibility of helping me write an incremental multilayer perceptron Matlab code for input/output regression?
    thank you

  10. Hi,
    I am a user of neural nets, and I am looking for backpropagation in incremental or stochastic mode. Is there any possibility of helping me write an incremental multilayer perceptron Matlab code?
    thank you

  11. I'm trying to adapt this code to work in a character recognition program. A letter is converted to a binary array of 1s and 0s. The dimensions are 1×100. I have 4 letters that I would like to train the network to recognize, so the size of the input would be 4×100. How can I adapt the code to work for my input size? Also, I would like the network to be trained so that the output is 1, 2, 3 or 4 instead of a 0 or 1 like your code produces.

  12. This is very informative, thanks a lot. What if I want to just evolve a neural network to approximate a function? I don't want to train or use backpropagation, just a straightforward evolution of a neural network. I understand that some people call it NeuroEvolution.
    Thanks in advance

  13. hi…nice work, but as I read in the previous comments, the scheme differs from the code & the answer is not correct for me

    0.499524196804705
    0.477684866785518
    0.500475803195295
    0.522315133214482

    this is the out value, I don't know why????

    i replaced sigma with x2(1) = 1/(1+exp(-H1));

    thanks in advance

  14. Hi,

    thanks for the tutorial, but how do I implement a Matlab MLP NN program to analyze a Network Intrusion Detection System with inputs Protocol ID, DNS, Source MAC, Source IP, Source Port, Dest MAC, Dest IP, Dest Port, ICMP Type, ICMP Code, Raw Data Length, and Raw Data Size, with output Attack or Not Attack, and how do I build the hidden layer for that? Thanks

  15. I seem to have the same problem as Don as well. The output is converging to 0.5;0.5;0.5;0.5 instead of 0;1;1;0. I use the exact code as posted above, but replaced the sigma function
    function x = sigma(net)

    x = 1/(1+exp(-net));
    end

    • hi behnaz, I think I have found a problem in the backpropagation part of this code. I haven't received any replies from the site admin to a few of my questions, but reply to this if you want to collaborate! :) cheers Jonathan

  16. Hi,

    thanks for the intro to MLPs and the example code – it's been very helpful.

    Out of interest, if I am building a net to predict, say, the efficiency of a component (in percent, i.e. 0 – 100%) based on input parameters, how would I change the code to allow for outputs greater than the standard range [0 1] (for the sigmoid function)?

    In other words, all the examples I have seen so far give an output of either 0 or 1 or a value in between, but cannot give outputs outside of this range.

    Kind regards
    Jonathan

    • Jonathan,
      Have you figured out how to get values outside of [0 1]? I was thinking of scaling down the output values by the maximum number in the outputs. So if I have [1 2 3 4], the new outputs would be [.25 .5 .75 1].
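      In Matlab that idea is a one-liner each way (a rough sketch with a hypothetical target vector t, not code from this thread):

      t = [1 2 3 4];       % hypothetical raw targets outside [0 1]
      tmax = max(t);
      tScaled = t ./ tmax; % targets used for training: [0.25 0.5 0.75 1]
      % after training, map the network output back: pred = netOutput .* tmax;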

  17. Hi,

    Thanks for the code, it's cool. But I have problems with "weights=…" and "out=…", I just can't see them. Could you please help me? It's really urgent since I have an exam in a few days.
    When I run the code, it looks like this

    H1 =

    -2.7332

    ans =

    -0.8847

    H2 =

    -2.6884

    ans =

    -3.3642

    x3_1 =

    0.0019

    ans =

    -0.0041

    H1 =

    -2.7336

    ans =

    -0.8851

    H2 =

    -2.6884

    ans =

    -3.3642

    x3_1 =

    0.0893

    ans =
    and so on…((

    • if you type your code like this, your problem will be solved:
      H1 = bias(1,1)*weights(1,1) + input(j,1)*weights(1,2) + input(j,2)*weights(1,3);

      (type in one line)

  18. Hi. This is a good explanation of MLPNN. I have a problem in my code: I have a matrix of 30×11 and I want to classify it into three categories in the 2-d plane. I trained on 60% of the data, validated on 10%, and tested on 30%. The three categories are
    1- x<-1 -1<y<1
    2- 01.8 y>0
    3- x0
    how can this be achieved with MLPNN?

  19. Hello there, I am so grateful I found your website. I really found you by accident while I was browsing on Askjeeve for something else. Regardless, I am here now and would just like to say kudos for a remarkable post and an all-round enjoyable blog (I also love the theme/design). I don't have time to browse it all at the moment, but I have bookmarked it and also added your RSS feeds, so when I have time I will be back to read much more. Please do keep up the superb job.

  20. Hello, I don’t really understand why the weight matrix has 9 values in it. There are 6 in the diagram, are there not? I’m trying to figure out how to create a 4-input hidden layer but I am having difficulty understanding how the weights are derived within the code.

  21. Hi.

    I recently tried your network. Even when I set the iterations to 100,000, the result is still far from the target values. I also tried shorter and longer iteration counts to reconfirm the result. It seems that the error propagated back through the network is not working very well, as I get similar results even with different iteration counts. I hope you can revise it, or am I the only one having this problem?? FYI I am using Matlab 2010b.

    Thanks.

    • I believe you're the only one having this problem. It works fine over here. You may want to:
      clear all; close all; clc
      then copy and paste the code to try it again.

    • Hi,

      I am a newbie when it comes to ANNs, but your site has been of great help! However, I seem to have the same problem as Don. The output is converging to 0.5;0.5;0.5;0.5 instead of 0;1;1;0. I use the exact code as posted above, but replaced the sigma function as suggested by Vipul. Here is the code I am using:

      input = [0 0; 0 1; 1 0; 1 1];
      % Desired output of XOR
      output = [0;1;1;0];
      % Initialize the bias
      bias = [-1 -1 -1];
      % Learning coefficient
      coeff = 0.7;
      % Number of learning iterations
      iterations = 10000;
      % Calculate weights randomly using seed.
      rand('state',sum(100*clock));
      weights = -1 +2.*rand(3,3);

      for i = 1:iterations
      out = zeros(4,1);
      numIn = length (input(:,1));
      for j = 1:numIn
      % Hidden layer
      H1 = bias(1,1)*weights(1,1)
      + input(j,1)*weights(1,2)
      + input(j,2)*weights(1,3);

      % Send data through sigmoid function 1/1+e^-x
      % Note that sigma is a different m file
      % that I created to run this operation
      x2(1) = 1/(1+exp(-H1));

      H2 = bias(1,2)*weights(2,1)
      + input(j,1)*weights(2,2)
      + input(j,2)*weights(2,3);
      x2(2) = 1/(1+exp(-H2));

      % Output layer
      x3_1 = bias(1,3)*weights(3,1)
      + x2(1)*weights(3,2)
      + x2(2)*weights(3,3);
      out(j) = 1/(1+exp(-x3_1));

      % Adjust delta values of weights
      % For output layer:
      % delta(wi) = xi*delta,
      % delta = (1-actual output)*(desired output – actual output)
      delta3_1 = out(j)*(1-out(j))*(output(j)-out(j));

      % Propagate the delta backwards into hidden layers
      delta2_1 = x2(1)*(1-x2(1))*weights(3,2)*delta3_1;
      delta2_2 = x2(2)*(1-x2(2))*weights(3,3)*delta3_1;

      % Add weight changes to original weights
      % And use the new weights to repeat process.
      % delta weight = coeff*x*delta
      for k = 1:3
      if k == 1 % Bias cases
      weights(1,k) = weights(1,k) + coeff*bias(1,1)*delta2_1;
      weights(2,k) = weights(2,k) + coeff*bias(1,2)*delta2_2;
      weights(3,k) = weights(3,k) + coeff*bias(1,3)*delta3_1;
      else % When k=2 or 3 input cases to neurons
      weights(1,k) = weights(1,k) + coeff*input(j,1)*delta2_1;
      weights(2,k) = weights(2,k) + coeff*input(j,2)*delta2_2;
      weights(3,k) = weights(3,k) + coeff*x2(k-1)*delta3_1;
      end
      end
      end
      end

      Hopefully someone can help me with this. Thanks.

      • Berry,

        Your 'out' variable is converging to [0.5; 0.5; 0.5; 0.5]? I copied and pasted your code, and was getting close to [0; 1; 1; 0]. Not sure what the difference is though…

        Can you tell me what your weights end up being? Maybe change the iterations to just 2 or 3 or something small, and then track how the weights and out change over the course of just a couple iterations. The math should be easy to follow using just a couple iterations as well, so you can see where it might be failing.

        Sorry it's not working out properly. If anyone else has suggestions, we're all ears.

        Vipul

        • I have looked at this problem and can replicate both results using the code above. My understanding is that sometimes the gradient descent algorithm, which is used to find the weights that minimize the error, finds a local minimum of the error function, which gives 0.5 / 0.5 / 0.5 / 0.5, rather than the weights at the global minimum of the error function, which give the output 0 / 1 / 1 / 0.

      • I had the same problem the first time I ran it; check that the equations for H1, H2 and x3_1 each fit on one line; that solved it for me. Thanks a lot for the material, Vipul!!

  22. Hi,

    Why does the equation for calculating H1 (and H2) look different from your network representation in the graphic? Shouldn't H1 be calculated as H1 = bias(1,1)*weights(1,1) + input(j,1)*weights(1,2) + input(j,2)*weights(2,2)?
    And also H2 = bias(1,2)*weights(2,1) + input(j,1)*weights(1,3) + input(j,2)*weights(2,3)?

    Of course your code still works, but maybe it would be good to keep it consistent with your picture.

  23. Hi. Nice posting. Thank you very much for this material. I am doing NN for my study. So this is very helpful. Hope you can post more on NN.

  24. The sigma code is a simple function that squashes the data between a certain range. One such sigmoid function example could be something like the following:

    function x = sigma(net)

    x = 1/(1+exp(-net));

  25. Hi,

    thanks for the tutorial, but aren’t you missing the sigma code? I am referring to this in-code comment:

    % Note that sigma is a different m file
    % that I created to run this operation
