The Perceptron


The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.

It can be seen as the simplest kind of feedforward neural network: a linear classifier.

The perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value):


f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}

where w is a vector of real-valued weights, w \cdot x is the dot product (which here computes a weighted sum), and b is the ‘bias’, a constant term that does not depend on any input value.


The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance, in the case of a binary classification problem. If b is negative, then the weighted combination of inputs must produce a positive value greater than |b| in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the decision boundary. The perceptron learning algorithm does not terminate if the training set is not linearly separable.
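For concreteness, here is a minimal sketch of this decision function in C; the function name f, the argument names and the dimension parameter n are illustrative, not part of any standard API:

/* Perceptron decision function: returns 1 if w . x + b > 0, and 0 otherwise. */
int f(const float x[], const float w[], float b, int n)
{
  float s = b;
  int i;
  for ( i = 0 ; i < n ; i++ )
    s += w[i] * x[i];   /* accumulate the dot product w . x */
  return s > 0.0f;
}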



Examples

[Figure 1: a single-layer perceptron and its mathematical model]

[Figure 2: a multi-layer perceptron]
History

Although the perceptron initially seemed promising, it was eventually proved that perceptrons could not be trained to recognise many classes of patterns. This led to the field of neural network research stagnating for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer (also called a single-layer perceptron).

Single-layer perceptrons are only capable of learning linearly separable patterns; in 1969 the famous book Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often believed that they also conjectured (incorrectly) that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. (See the page on Perceptrons for more information.)

Three years later Stephen Grossberg published a series of papers introducing networks capable of modelling differential, contrast-enhancing and XOR functions. (The papers were published in 1972 and 1973; see e.g. Grossberg, "Contour enhancement, short-term memory, and constancies in reverberating neural networks", Studies in Applied Mathematics, 52 (1973), 213-257.) Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. The text was reprinted in 1987 as "Perceptrons – Expanded Edition", in which some errors in the original are shown and corrected.

More recently, interest in the perceptron learning algorithm increased again after Freund and Schapire (1998) presented a voted formulation of the original algorithm (attaining a large margin) to which the kernel trick can be applied.

The Learning Rule

The perceptron is trained to respond to each input vector with a corresponding target output of either 0 or 1. The learning rule has been proven to converge on a solution in finite time if a solution exists.

The learning rule can be summarized in the following two equations:

For all i:

W(i) = W(i) + [T - A] * P(i)

b = b + [T - A]

where W is the vector of weights, P is the input vector presented to the network, T is the correct result that the neuron should have shown, A is the actual output of the neuron, and b is the bias.
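These two equations translate directly into C. Below is a minimal sketch; the identifiers follow the notation above, and the function name learning_rule and the dimension parameter n are illustrative:

/* Apply the perceptron learning rule for one training example.
   When T == A the update is zero, so a correct output leaves the
   weights and bias unchanged. */
void learning_rule(float W[], float *b, const float P[], int T, int A, int n)
{
  int i;
  for ( i = 0 ; i < n ; i++ )
    W[i] += (T - A) * P[i];   /* W(i) = W(i) + [T - A] * P(i) */
  *b += (T - A);              /* b = b + [T - A] */
}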

Training

Vectors from the training set are presented to the network one after another. If the network's output is correct, no change is made. Otherwise, the weights and bias are updated using the perceptron learning rule. An entire pass through all of the input training vectors is called an epoch. When such a pass has occurred without error, training is complete. At this point any training vector may be presented to the network and it will respond with the correct output. If a vector P not in the training set is presented to the network, the network will tend to exhibit generalization by producing an output similar to the target outputs for training vectors close to P.
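In outline, the training procedure is a loop over epochs. A minimal sketch in C, assuming hypothetical helpers count_errors() (the number of misclassified training vectors) and train_one_epoch() (one pass applying the learning rule to every vector), with a cap on epochs in case no solution exists:

/* Keep training whole epochs until an error-free pass, or give up.
   count_errors() and train_one_epoch() are assumed helpers. */
int epoch = 0;
while ( count_errors() > 0 && epoch < MAX_EPOCHS )
  {
    train_one_epoch();   /* present every training vector once */
    epoch++;
  }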

Limitations

Perceptron networks have several limitations. First, the output of a perceptron can take on only one of two values (0 or 1). Second, perceptrons can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the perceptron will find a solution. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.

The most famous example of the perceptron's inability to solve problems with linearly non-separable vectors is the Boolean exclusive-or (XOR) problem.
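To see why, suppose a single perceptron with weights w_1, w_2 and bias b computed XOR. The four input cases would require:

\begin{aligned}
f(0,0)=0 &\implies b \le 0 \\
f(1,0)=1 &\implies w_1 + b > 0 \\
f(0,1)=1 &\implies w_2 + b > 0 \\
f(1,1)=0 &\implies w_1 + w_2 + b \le 0
\end{aligned}

Adding the second and third inequalities gives w_1 + w_2 + 2b > 0, so w_1 + w_2 + b > -b \ge 0, which contradicts the fourth. No choice of weights and bias satisfies all four constraints, so no single perceptron computes XOR.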


Perceptron Implementation in C


/*-----------------------------------------------------------------------*\
|           Perceptron -- A Classic Neural Network Implementation         |
|                                                                         |
\*-----------------------------------------------------------------------*/

/*
 * File name : perceptron.c
 *
 * This program is an implementation of the "Perceptron" - a classic neural
 * network structure. It is used as a helper program invoked from a Tk shell.
 * It reads points from a file and prints to the standard output two x:y
 * pairs that correspond to a line separating the plane into two regions:
 * a region where the perceptron's output is true, and a region where the
 * perceptron's output is false.
 *
 */

#include <stdio.h>
#include <stdlib.h>   /* for exit() */

#define NUM 2         /* Number of input nodes */
#define LIMIT 100     /* Maximum number of inputs the system can handle */
#define SESSIONS 3000 /* Number of training sessions that we'll put the system through */

typedef struct
{
  float p[NUM];
} vector;

vector w, test[LIMIT];   /* weight vector and training inputs */
int hits[LIMIT], total;  /* target outputs (0 or 1) and example count */

float bias = 0.5;        /* bias term, adjusted during training */

/* Debugging helper: print a vector's components. */
void print_vector(vector v)
{
  int i;
  for ( i = 0 ; i < NUM ; i++ ) printf("  %.4f ", v.p[i] );
  printf("\n");
}

/* Sum of a vector's components (a helper; not used below). */
float sum(vector x)
{
  int i;
  float s = 0.0;
  for ( i = 0 ; i < NUM ; i++ )  s += x.p[i];
  return s;
}

/* Dot product of two vectors. */
float scalar_mult(vector x, vector y)
{
  int i;
  float s = 0.0;
  for ( i = 0 ; i < NUM ; i++ )  s += ( x.p[i] * y.p[i] );
  return s;
}

/*-----------------------------------------------------------------------*\
|                                                                         |
|  This function computes the NN's output for a certain input vector      |
|                                                                         |
\*-----------------------------------------------------------------------*/
int net(vector x)
{
  return scalar_mult(x, w) + bias > 0.0;   /* 1 if w . x + bias > 0, else 0 */
}

/*-----------------------------------------------------------------------*\
|                                                                         |
|  Educate the NN with all training inputs                                |
|                                                                         |
\*-----------------------------------------------------------------------*/
void educate_net(void)
{
  vector x;
  int i, j, correct_result;

  for ( i = 0 ; i < total ; i++ )
    {
      x = test[i];
      correct_result = hits[i];

      if ( net(x) != correct_result )
        if ( correct_result == 0 )
          {
            /* False alarm */
            for (j=0;j<NUM;j++) w.p[j] -= x.p[j];
            bias -= 1.0;
          }
        else
          {
            /* Miss */
            for (j=0;j<NUM;j++) w.p[j] += x.p[j];
            bias += 1.0;
          }
    }
}

/*-----------------------------------------------------------------------*\
|                                                                         |
|  Return the number of cases for which the NN returns the correct value  |
|                                                                         |
\*-----------------------------------------------------------------------*/
int check_performance(void)
{
  vector x;
  int j, count=0;
  for ( j = 0 ; j < total ; j++ )
    {
      x = test[j];
      if ( net(x) == hits[j] )
        count++;
    }
  return count;
}

/*-----------------------------------------------------------------------*\
|                                                                         |
|  Get data (read input file)                                             |
|                                                                         |
\*-----------------------------------------------------------------------*/
int get_data()
{
  char* FileName = "/tmp/perceptron-input";
  FILE *fd;
  int i, posnum, negnum;
  float x,y;
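
  /* Expected input file format:
   *   <posnum>, followed by posnum "x y" pairs (positive examples),
   *   then <negnum>, followed by negnum "x y" pairs (negative examples).
   * Coordinates are divided by 1000 below to keep the values small. */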

  /* opens the file  */
  if ( (fd = fopen(FileName,"r")) == NULL )
    {
      printf ("no-input-file");
      exit(10);
    }

  /* Total number of input values */
  total = 0;

  /* read the positive examples */
  fscanf( fd, "%d", &posnum);
  if (posnum > LIMIT)
    {
      printf("Error");
      exit(20);
    }
  for ( i = 0 ; i < posnum ; i++ )
    {
      fscanf( fd, "%f %f", &x, &y);
      test[ total ].p[0] = x / 1000;
      test[ total ].p[1] = y / 1000;
      hits[ total++ ] = 1;  /* 1 for positive examples */
    }

  /* read the negative examples */
  fscanf( fd, "%d", &negnum);
  if ((negnum+total) > LIMIT)
    {
      printf("Error");
      exit(21);
    }
  for ( i = 0 ; i < negnum ; i++ )
    {
      fscanf( fd, "%f %f", &x, &y);
      test[ total ].p[0] = x / 1000;
      test[ total ].p[1] = y / 1000;
      hits[ total++ ] = 0; /* 0 for negative example */
    }

  return (0) ;
}

/*-----------------------------------------------------------------------*\
|                                                                         |
|  Main                                                                   |
|                                                                         |
\*-----------------------------------------------------------------------*/
int main(void)
{
  int i, score = 0;
  float px, py, px1, py1;

  /* Start from zero weights; training adjusts them below. */
  for ( i = 0 ; i < NUM ; i++ ) w.p[i] = 0.0;

  get_data();  /* Read input from file */

  /* educate the net */
  i = 0;   /* epoch counter */
  while ( ((score = check_performance()) != total ) && ( i++ < SESSIONS ) )
      educate_net();

  if ( check_performance() != total)
    {
      printf("fail");
      exit(1);
    }

  /* The inputs were scaled by 1/1000, so in screen coordinates the decision
     boundary is the line  w.p[0]*x + w.p[1]*y + 1000*bias = 0.
     Compute where it crosses the visible box [0..350, 0..350]. */
  px = -1000 * bias / w.p[0];       /* x-intercept (y = 0) */
  py = -1000 * bias / w.p[1];       /* y-intercept (x = 0) */

  px1 = ((py - 350) / py) * px;     /* x where the line crosses y = 350 */
  py1 = py - 350 * py / px;         /* y where the line crosses x = 350 */

  if ( (px  >= 0) && (px  <= 350))  printf("%d %d ", (int)px, 0 );
  if ( (py  >= 0) && (py  <= 350))  printf("%d %d ", 0, (int)py );
  if ( (px1 >= 0) && (px1 <= 350))  printf("%d %d ", (int)px1, 350 );
  if ( (py1 >= 0) && (py1 <= 350))  printf("%d %d ", 350, (int)py1 );
  return 0;
}
