Author: Michael Marsalli 
Additional Credits: 
Funding This module was supported by National Science Foundation Grants #9981217 and #0127561. 

The next major development in neural networks was the concept of a perceptron which was introduced by Frank Rosenblatt in 1958. Essentially the perceptron is an MCP neuron where the inputs are first passed through some "preprocessors," which are called association units. These association units detect the presence of certain specific features in the inputs. In fact, as the name suggests, a perceptron was intended to be a pattern recognition device, and the association units correspond to feature or pattern detectors. Interestingly, there is a biological analogue of the association unit.
In 1962 David Hubel and Torsten Wiesel were studying the visual cortex of the cat. They discovered that there were groups of cells in the cortex that responded to the presence of certain features of objects in the cat's field of vision. For example, one of these groups of cells responded to the presence of an edge in an object. Another group of cells responded to the presence of a line in a specific direction. These groups of cells are in effect the association units of Rosenblatt's perceptron.
In essence an association unit is also an MCP neuron which is 1 if a single specific pattern of inputs is received, and it is 0 for all other possible patterns of inputs. Each association unit will have a certain number of inputs which are selected from all the inputs to the perceptron. So the number of inputs to a particular association unit does not have to be the same as the total number of inputs to the perceptron, but clearly the number of inputs to an association unit must be less than or equal to the total number of inputs to the perceptron. Each association unit's output then becomes the input to a single MCP neuron, and the output from this single MCP neuron is the output of the perceptron. So a perceptron consists of a "layer" of MCP neurons, and all of these neurons send their output to a single MCP neuron.
We've seen that a single MCP neuron can't produce the "exclusive or." But it is possible to construct a perceptron that produces the "exclusive or." First, we construct an MCP neuron that produces a 1 if the first input is 1 and the second input is 0, and it produces a 0 for all other pairs of inputs. So this MCP neuron is an association unit that detects the pattern (1,0). Then we construct a second MCP neuron that produces a 1 if the first input is 0 and the second input is 1, and it produces a 0 for all other pairs of inputs. So this MCP neuron is an association unit that detects the pattern (0,1). Next we construct a third MCP neuron that produces a 0 if both inputs are 0, and produces a 1 for all othe inputs. If we use the outputs of the first two MCP neurons as the inputs to the third MCP neuron, we will have a perceptron that will produce a 1 if the pair of inputs is (1,0) or (0,1), and it will produce a 0 for all other pairs of inputs. In other words, this perceptron will produce the "exclusive or." Another way to think about this perceptron is that it "recognizes" the two patterns (1,0) and (0,1).
Far more interesting patterns can be obtained by looking at arrays of ones and zeroes. For example, consider this array of three rows and three columns.









Clearly, the ones form a T pattern. It is possible to design a perceptron that can "recognize" the above pattern. In other words, the perceptron will produce a 1 for this pattern, and it will produce a 0 for any other pattern of ones and zeroes. Of course, such a perceptron would have nine inputs, one for each of the squares in the array. One could design the perceptron with one nine input association unit for each possible pattern of ones and zeroes, and the final MCP neuron would have output 1 only when the T pattern association unit sends its signal. But this would be very inefficient, because there are 2^9 = 512 possible patterns of ones and zeroes in this array. So our perceptron would have 512 association units, and each association unit would have nine inputs! This is a huge perceptron for such a simple task. Can we construct a smaller, more efficient perceptron that still detects the T pattern? In fact, it is possible to design a much more efficient T detector by cutting down the number of association units and the number of inputs to each association unit. We only need three association units that have three inputs each. The first association unit would detect the pattern (1,1,1) from the three inputs in the first row, the second association unit would detect the pattern (0,1,0) from the three inputs in the second row, and the third association unit would also detect the pattern (0,1,0) but from the three inputs in the third row. So we would have a perceptron with only three association units, and each association unit would have three inputs. This is a much more efficient perceptron than the first one we considered.
One way to measure the efficiency of a perceptron is to consider the largest number of inputs of all the association units. This is called the order of the perceptron. The first T detector perceptron described above, the very inefficient one, has order nine, because every association unit in that perceptron has nine inputs. But the second T detector described above has order three, because each of its association units has only three inputs.Both of these examples of perceptrons have the property that their association units all have the same number of inputs. This is not necessary. Consider a perceptron that has five inputs with three association units A1, A2, and A3. Suppose A1 has three inputs, A2 has two inputs, and A3 has four inputs. Then this perceptron would have order four, because that is the largest number of inputs among all three association units. We will use the order of a perceptron as a measure of its efficiency. Next we will consider the efficiency of perceptrons for certain types of pattern recognition problems.
Any digital black and white image can be considered as an array of ones and zeroes, and our T pattern example above suggests that perceptrons can be designed to "recognize" any pattern in an array of ones and zeroes. So in principle perceptrons really can be used as pattern recognition devices. Of course, the efficiency of the perceptron becomes more important as the size of the array becomes larger. As we saw in the T pattern example, it was possible to design a fairly efficient perceptron that "recognized" the pattern. As we will see, such efficient solutions are not always possible. This was the remarkable and disappointing conclusion of a the book Perceptrons written by Marvin Minsky and Seymour Papert.
In particular, Minsky and Papert proved that there is a simple pattern recognition problem that can not be solved efficiently with a perceptron. The key word here is "efficiently," because the problem can be solved with a perceptron, but the perceptrons that work are very inefficient, like the nine input perceptron that "recognized" the T pattern. Let's take a look at this simple problem.
The problem is fairly easy to explain. The problem is to tell in a given array of ones and zeroes whether the number of ones in the array is odd. This is called the parity problem. So we want to design a perceptron which takes a given size array and has output 1 if the number of ones in the array is odd, or output 0 if the number of ones is even. We would of course have a different perceptron for each different size array. We will just consider arrays with three rows and three columns to explain the basic problem. Here are some examples.









The array above has five ones, which is an odd number of ones, so we would want an output of 1.









The array above has four ones, which is an even number of ones, so we would want an output of 0.









The array above has three ones, which is an odd number of ones, so we would want an output of 1.
It is possible to design a perceptron of order nine that solves this problem, but this is inefficient since the total number of inputs is nine. Unfortunately, Minsky and Papert proved that any perceptron that can solve this problem must have order nine! And they also proved that any perceptron that can solve the parity problem for an array with n rows and n columns must have order n^2, which is the total number of inputs in the array. This means there is no efficient way to solve the parity problem with a perceptron.
Minsky and Papert gave other examples of pattern recognition problems that demonstrated severe limitations on perceptrons. Their work is heavily mathematical, and they give elegant mathematical proofs of their results. But in their proofs they only considered perceptrons with association units and one final MCP neuron, what are now called singlelayer perceptrons. One possible way to avoid the limitations on singlelayered perceptrons would be to link together networks of perceptrons in a multilayered fashioned. But Minsky and Papert suggested that these multilayered perceptrons had their own difficulties, especially the lack of a good "learning" algorithm. For these reasons, interest in perceptrons, and neural networks in general, was greatly diminished for quite some time. However, the difficulties of the multilayer approach were eventually overcome.
A multilayer perceptron consists of a number of inputs x1, x2, . . . , xn, one or more "hidden layers" of neurons, and a layer of output neurons. The inputs are sent to the first layer of neurons, and the output of each layer of neurons becomes the input for the next layer of neurons. Finally, the output layer of neurons produces the outputs y1, y2, . . . , ym of the perceptron. The inputs and outputs are now allowed to be decimal numbers, not just ones and zeroes. Furthermore, the neurons used are similar to MCP neurons, but instead of sending out a 0 or 1, these neurons compute a weighted sum of their inputs and send out a decimal number between 0 and 1 which is a function of the weighted sum.
One of the great advantages of the singlelayer perceptron was the ease with which one could apply the "learning" rule to adjust the weights until a desired output was obtained. At first glance there seems to be no obvious systematic way to adjust the weights between the"hidden" layers in a multilayer perceptron. However, this apparent obstacle was overcome by Rumelhart, Hinton, and Williams in 1986 with the development of the backpropagation rule. Similar rules had been obtained by Werbos in 1974 , Parker in 1985, and LeCun in 1985. But it was the influential work Parallel Distributed Processing edited by Rumelhart and McClelland that made backpropagation the most popular learning rule for multilayer perceptrons. This rule consists of two passes through the layers of the perceptron, a forward pass and a backward pass. A given collection of inputs is fed into the network and the output of the network is observed. The observed output is subtracted from the desired output to produce an error. This error is now fed backward through the system from the output layer to the last hidden layer and so on to the inputs, hence the name backpropagation. At each step the weights are adjusted so as to make the observed output closer to the desired output. By presenting many inputs with the desired outputs and then applying backpropagation of the error, the perceptron is "trained" to produce desired outputs with an increasing degree of correctness. These ideas are the foundation of modern neural networks, that we will call connectionist networks. [See Connectionism: An introduction.]
