EP0433414A1 - Continuous bayesian estimation with a neural network architecture - Google Patents

Continuous bayesian estimation with a neural network architecture

Info

Publication number
EP0433414A1
Authority
EP
European Patent Office
Prior art keywords
novum
output
threshold
prediction
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP90909520A
Other languages
German (de)
French (fr)
Inventor
Robert Leo Dawes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MARTINGALE RESEARCH CORPN.
Original Assignee
MARTINGALE RESEARCH CORPN
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MARTINGALE RESEARCH CORPN filed Critical MARTINGALE RESEARCH CORPN
Publication of EP0433414A1 publication Critical patent/EP0433414A1/en
Withdrawn legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Definitions

  • the present invention pertains in general to a neural network architecture, and more particularly, to an architecture which is designed to perform adaptive, continuous Bayesian estimation on unpreprocessed large dimensional data.
  • Artificial neural systems is the study of dynamical systems that carry out useful information processing by means of their state response to initial or continuous input.
  • one of the goals of artificial neural systems was the development and application of human-made systems that can carry out the kinds of information processing that brains carry out.
  • These technologies sought to develop processing capabilities such as real-time high performance recognition, knowledge recognition for inexact knowledge domains and fast, precise control of robot effector movement. Therefore, this technology was related to artificial intelligence.
  • Cognitive systems in which neural networks are implemented can be viewed in terms of an observed system and a network which are interfaced by sensor and motor transducers.
  • the neural network is a dynamic system which transforms its current state into the subsequent states under the influence of its inputs to produce outputs which generally influence the observed system.
  • a cognitive system generally attempts to anticipate its sensory input patterns by building internal models of the external dynamics and it minimizes the prediction error by employing a prediction error correction scheme, through improvement of its models, or by influencing the evolution of the observed system, or all three. Mathematically, this is a hybrid of three important and well-known problems: system identification, estimation and control. The theoretical solutions to these have been known for several decades. System identification is accomplished analytically by a number of methods, such as the "model reference" method.
  • the Kalman filter provides an iterative estimation of linear plants in Gaussian noise
  • the Kalman-Bucy filter provides continuously evolving estimates.
  • the multi-stage Bayesian or continuous Bayesian estimator can be utilized for non-linear plants in non-Gaussian noise. Control is approached through several routes, including the Hamilton-Jacobi theory and the method of Pontryagin.
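For reference, the iterative estimation the Kalman filter performs can be sketched in a few lines of Python. This is the standard textbook recursion, not the patent's neural implementation, and the plant parameters a, h, q, r are illustrative assumptions; the innovation term computed here is the quantity the novum, described below, is designed to extract.

    # Minimal scalar Kalman filter iteration (standard form, not from the
    # patent). Plant: x[k+1] = a*x[k] + w, w ~ N(0, q); observation:
    # y[k] = h*x[k] + v, v ~ N(0, r). Parameter values are assumptions.
    def kalman_step(x_hat, p, y, a=0.95, h=1.0, q=0.01, r=0.25):
        x_pred = a * x_hat                      # predicted state
        p_pred = a * a * p + q                  # predicted error variance
        gain = p_pred * h / (h * h * p_pred + r)
        innovation = y - h * x_pred             # prediction error ("novelty")
        x_hat_new = x_pred + gain * innovation  # corrected estimate
        p_new = (1.0 - gain * h) * p_pred
        return x_hat_new, p_new, innovation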
  • the present invention disclosed and claimed herein comprises a neural network.
  • the neural network includes an observation input for receiving a timed series of observations.
  • a novelty device is then provided for comparing the observations with an internally generated prediction in accordance with the novelty filtering algorithm.
  • the novelty filter device provides on an output a suboptimal innovations process related to the received observations and the predictions.
  • the output represents a prediction error.
  • a prediction device is provided for generating the prediction for output to the novelty device.
  • This prediction device includes a geometric lattice of nodes. Each of the nodes has associated therewith a memory for storage of spatial patterns which represent a spatial history of the timed series of observations.
  • a plurality of signal inputs is provided for receiving the prediction error from the novelty device and then this received prediction error is filtered through the stored spatial patterns to produce a correlation coefficient that represents the similarity between the stored pattern and the prediction error.
  • a plurality of threshold inputs is provided at each node for receiving threshold output levels from selected other nodes.
  • a threshold memory is provided for storing threshold levels representing the prior probability for the occurrence of the stored spatial patterns prior to receiving the stored spatial patterns.
  • a CPU at each of the nodes computes an updated threshold level in accordance with a differential-difference equation which operates on the stored threshold level, the received threshold levels and the correlation coefficients to define and propagate a quantum mechanical wave particle across the geometric lattice of nodes and also store the updated threshold in the threshold memory.
  • a threshold output is provided from each of the nodes for outputting the updated threshold to other nodes.
  • the CPU computes the internally generated prediction by passing the correlation coefficients through a sigmoid function whose threshold level comprises the updated threshold level.
  • the prediction represents the probability for the occurrence of the stored spatial patterns conditioned upon the prior probability represented by the stored threshold level.
  • the prediction device is adapted such that it is operable to learn by updating the stored spatial patterns so as to correlate the prediction error with the position of the quantum mechanical wave particle over the geometrical lattice. This learning is achieved in accordance with the Hebbian learning law.
  • the novelty device includes an array of nodes with each node having a plurality of signal inputs that receive the observation inputs, and a plurality of prediction inputs for receiving the prediction outputs of the prediction device.
  • a memory is provided for storing temporal patterns that represent a timed history of the timed series of observations. The prediction and observation inputs are then operated upon with a predetermined algorithm that utilizes the stored temporal patterns to provide the prediction error.
  • the novelty device also is adaptive. It learns by updating the stored temporal patterns so as to minimize the prediction error.
  • the learning algorithm utilizes the contraHebbian learning law.
  • Figure 1 illustrates a block diagram of the neural network of the present invention
  • Figure 2 illustrates a block diagram of the PA 12 illustrating the novum 14 as an array of separate neurons and the IG 16 as an array of separate neurons;
  • Figures 3a-3c illustrate the use of traveling wave packets in the IG threshold field
  • Figure 4 illustrates a dual construction of the gamma outstar avalanche, consisting of instars from the pixel array falling on each of a sequence of neurons in a "timing chain";
  • Figure 5 illustrates a recurrent two-layer neural network similar to that utilized by many neural modelers
  • Figure 6 illustrates a block diagram of the parametric avalanche which represents the innovations approach to stochastic filtering
  • Figure 7 illustrates how the novum and the IG of the PA generate and use the innovations process
  • Figures 8a and 8b illustrate schematic representations of the novum neuron and the IG neuron
  • Figure 9 illustrates a more detailed flow from the focal plane in the observation block through the novum and the IG for a two dimensional lattice
  • Figure 10 illustrates a top view of the IG lattice
  • Figure 11 illustrates a tracking system
  • Figure 12 illustrates a block diagram of a control module which employs two PA Kalman Filters for the state estimation functions
  • Figure 13 illustrates a Luenberger observer
  • Figure 14 illustrates the preferred PACM design
  • Figures 15 and 16 illustrate graphs of one example of the PA
  • Figure 17 illustrates the response of the synaptic weights in the IG;
  • Figure 18 illustrates the values of the synaptic weights on each neuron of the novum
  • Figure 19 illustrates a block diagram of one of the neurons in the IG
  • Figure 20 illustrates a block diagram of one of the neurons in the novum
  • Figure 21 illustrates an example of an application of the Parametric Avalanche
  • Figures 22 and 23 illustrate the time evolution of the angle from the vertical for the example of Figure 21 and the corresponding novum output
  • Figure 24 presents program information for use in a neural network in accordance with the invention.
  • the observed system 10 receives control signals u(t) on its input.
  • an observing system 12 is provided to monitor the state of this observed system 10.
  • the observing system 12 is essentially the neural network.
  • the neural network is comprised of a two-layer recurrent network.
  • There is an input layer which is referred to as the novum, illustrated by a block 14.
  • the second layer provides a classification layer and is referred to as the IG as illustrated in a block 16.
  • the abbreviation IG refers to Infinitesimal Generator
  • the two blocks 14 and 16 operate together to continuously process information and provide a retrospective classification and prediction (estimate) of the evolution of the classified observation.
  • the novum 14 provides a prediction assessment of the observed system and receives on one input thereof the output of the observed system 10. The output of the novum 14 provides a prediction error. The novum 14 also receives on the input thereof the output of the IG 16, the output of the IG 16 providing a state estimate in the form of a conditional probability distribution function. The novum 14 essentially extracts the innovations process which will be described in more detail hereinbelow. It decodes the classifications (state estimations) which it receives from the IG 16 and then subtracts that decoded version from the actual received signal to yield the novel residual.
  • the novum 14 contains the internal model of the transition by which the system 10 is observed and is operable to transform the estimated state output by the IG 16 into a prediction of the observed signal received from the observed system 10.
  • the novum 14 operates under a simple learning law wherein the output is zero when the novelty is not present; that is, when the internal model correctly predicts the observed signal, there is no novelty in the observed signal and, therefore, a zero prediction error. Therefore, the novum 14 is driven to maximize the entropy of its output which comprises its state of "homeostasis".
  • the IG 16 operates as a prediction generator. It implements the functions of Kalman gain matrix, the state transition function and the estimation procedure. The output of the IG 16 indicates the probability that prior events have occurred, given the prior history and the current observation.
  • Both the novum 14 and the IG 16 are comprised of a network of processing elements or "neurons". However, the neurons in the novum 14 are arranged in accordance with an observation matrix such that the observed system is mapped directly to the neurons in the novum 14, whereas the IG 16, which is also comprised of a network of neurons, is arranged as a geometric lattice, and the neurons in the IG 16 represent points in an abstract probability space. Each of the neurons in the IG 16 has an activation level, which activation level indicates the likelihood that the events or states which each particular neuron represents have occurred, given only the current measurements.
  • the output of these neurons indicate the likelihood that those events have occurred, conditioned additionally by the prior history of the observations and the dynamical model, as supplied in the threshold level of the output sigmoid function described hereinbelow.
  • the synaptic weights of the IG neurons learn by Hebbian learning.
  • the neural network of the present invention as illustrated by the observing system 12 is referred to as a "parametric avalanche" (PA) which is operable to store dynamic patterns and recall dynamic patterns simultaneously. It performs optimal compression of time varying data and incorporates a "moving target indicator" through its novelty factorization. It can track a non-linear dynamic system subliminally before accumulating sufficient likelihood to declare a detection.
  • the PA 12 therefore possesses an internal inertia dynamics of its own, with which the external dynamics are associated by means of the learning law. This internal dynamics is governed by the Quantum Neurodynamic (QND) theory.
  • QND Quantum Neurodynamic
  • Referring to Figure 2, there is illustrated a block diagram of the PA 12 showing the novum 14 as an array of separate neurons and the IG 16 as an array of separate neurons.
  • the novum 14 receives the observation of the plant from the observed system 10 on an input vector 18.
  • the novum 14 outputs the novelty on an output vector 20 which is input to the IG 16.
  • the IG 16 generates the state estimates of the plant for input to the novum 14 on an output vector 22.
  • the observed system 10 is illustrated in the form of a focal plane wherein an object, illustrated as a rocket 26, traverses the focal plane in a predetermined path. This constitutes the observation. This observation is that of a dynamic system which possesses a system inertia.
  • the focal plane of the observed system 10 is mapped onto the novum 14.
  • the IG 16 makes a prediction as to the state of the observed system and this is input to the novum 14. If the prediction is correct, then the novelty output of the novum 14 will be zero.
  • the rocket 26 traverses its predetermined path and, if the internal system model in the PA 12 is correct, the state estimates on output vector 22 will maintain the novelty output of the novum 14 at a zero state.
  • Each of the neurons in the novum 14 is represented by a neuron 28.
  • Each neuron 28 receives a single input on a line 30 from the observed system 10 and a plurality of inputs on lines 32 from each of the neurons in the IG 16, which constitute state estimates from the IG 16.
  • the neuron 28 generates internal weighting factors for each of the inputs 32, as will be described hereinbelow.
  • the neuron 28 provides a single output to the IG 16, which output goes to each of the neurons in the IG 16.
  • the IG 16 is comprised of a geometrical lattice of neurons.
  • Each of the neurons in the IG 16 is illustrated by a neuron 34.
  • Each of the neurons 34 receives inputs from the neurons 28 in the novum 14 on input lines 36.
  • Each of the neurons 34 is an independent and asynchronous processor which can generate weighting factors for each of the input lines 36 to internally generate an activation level.
  • the weighting factors provide a stored template and the activation level yields the correlation between an input signal vector (i.e., the novum output) and this stored template. It indicates the degree to which the input signal looks like the stored template across the IG 16. If the activation level is zero, it indicates that the input vector does not match the stored template. However, if there is a match, the activation level is relatively high. This activation level is modified by a threshold field which will be described in more detail hereinbelow, which then generates an output that is input to each of the neurons 28 in novum 14.
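A rough sketch of this matching step follows; the names and the plain inner product are illustrative assumptions, not the patent's exact formulation.

    import numpy as np

    def activation_levels(novum_output, templates):
        # templates: one stored synaptic weight vector (row) per IG neuron.
        # An activation near zero means the input vector does not match a
        # neuron's stored template; a relatively high value indicates a match.
        return templates @ novum_output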
  • Each of the neurons 34 in the IG 16 has a threshold associated therewith such that an overall threshold field is provided over the lattice in the IG 16.
  • the threshold levels in the IG threshold field are governed by non-linear lattice differential equations.
  • the "natural mode" of wave propagation in the threshold field favors compact, particle-like depressions in the field which are termed Threshold Field Depressions (TFD's).
  • TFD's Threshold Field Depressions
  • the threshold field operation will be described in more detail hereinbelow. However, it can be stated that one or more particle-like wave depressions are propagated across the geometric lattice of the IG 16. This propagation "tracks" the inertia of the plant, as illustrated in the focal plane of the observed system 10. It is important to note that the TFD operates over a number of neurons in the general area of the threshold field depression. Therefore, it is the interaction of the neighboring neurons that controls the propagation of the TFD.
  • the actual wave propagation is illustrated by an output wave 38 on the surface of the IG 16.
  • the wave 38 has a peak and an indicated path which it travels. Although this path is noted as being arcuate in nature, it is a relatively complex behavior which will be described in more detail hereinbelow.
  • the IG 16 is comprised of an IG' layer 42 and a threshold plane 43.
  • the IG' layer 42 represents the geometrical lattice of neurons which receive on the input thereof the output of the novum 14 on vector 20, with each neuron in the IG' layer 42 providing an activation level in response thereto.
  • each of the neurons 34 generates a weighting factor for each of the input lines from the novum 14 in accordance with a learning law. This results in the generation of an activation level for each neuron 34 which indicates the degree to which the input signal looks like the stored template.
  • the stored template will have been learned in a previous operation from which the weighting factors were derived.
  • the activation across the geometrical lattice of the IG' layer 42 will appear as a distribution of activation levels 44.
  • the activation levels will be virtually zero.
  • This distribution, even at a zero value, has an inertia which is a result of the wavelike motion described above.
  • the distribution of the activation levels appears as illustrated in Figure 3a.
  • there is also a threshold depression 46 in the threshold plane 43.
  • elsewhere in the threshold plane 43, the threshold level is at a high level.
  • a threshold field depression present which has an associated inertia. In Figure 3a, this is illustrated as a threshold field depression (TFD) 46. Since the output of the novum is zero, the estimation provided by the IG 16 is correct. Therefore, the TFD 46 will be directly aligned with respect to the distribution of the activation levels 44.
  • Referring to Figure 3b, a trajectory is illustrated which moves from a beginning point 48 at a time t p-1 to an end point 50 at a time t p+1 .
  • a point 52 is traversed in the center thereof at a time t p .
  • This observation illustrates a sequence of events which occur in a temporal manner.
  • this is a dynamic system with inertia.
  • the system must examine the output of the IG 16 to determine if the state estimates are correct. This is done through the innovations process in the novum 14. At time t p , which represents the next slice in time, the IG 16 must predict what the status of the system will be at this time and these state estimates are again input to the novum 14, which compares them with the observation to determine if the prediction is correct. If so, the output of the novum 14 is zero. This continues on to the next slice in time at time t p+1 and so on. Initially, it is assumed that the system has learned the trajectory from point 48 to point 50, passing through point 52. In accordance with an important aspect of the present invention, it is the generation of the state estimates in a spatio-temporal manner that is accomplished by the IG 16.
  • the TFD in this example has been initiated and is propagating across the geometrical lattice of the IG 16 in a predetermined path.
  • This path corresponds to the learned path; that is, the neurons over which the TFD is propagated are the same neurons over which it was propagated during the learning process, which will be described hereinbelow.
  • a prediction is made only in the area of the TFD, which prediction either allows the TFD to be propagated along its continued path, or which prediction modifies the path of the TFD. This latter situation may occur, for example, when the identical path is being traversed, but it is being traversed at a slower rate. Therefore, the propagation of the TFD directly corresponds to the inertia of the system whereas the geometrical lattice of the IG corresponds to the probability that the point along the path has occurred at a specific time.
  • the TFD occurs at t p-1 to yield a TFD 54 which is illustrated as concentric circles in phantom lines.
  • the concentric circles basically represent the level of the TFD with the center illustrating the lowest value of the depression.
  • Underlying the TFD is the activation level in the IG' layer 42. These two layers are illustrated together in an overlapping manner.
  • the TFD propagates from an area 54 in the IG 16 at time t p-1 to an area 56 at time t p . This propagation continues from the area 56 to an area 58 at time t p+1 .
  • the TFD traversing from area 54 to area 56 to area 58 would track the inertia or speed of the object traversing from point 48 to point 52 to point 50 in the observed system in order for the system to provide the appropriate estimation of the state of the system. It is important to note that the inertia of the system has been embodied in the propagation of the TFD and this TFD in conjunction with the model encoded into the underlying neurons yields the state estimation. Because of the zero activation level in the IG' layer 42 (due to zero novelty output), the wave propagation is not altered.
  • the inertia of the system will be different from that represented by the propagation of the TFD in the threshold plane 43.
  • the novum 14 will output a prediction error which will raise the activation level in front of or behind the TFD to essentially modify the threshold depression and either increase or decrease the propagation rate of the threshold depression. For example, suppose that the inertia of the observed system at a time prior to time t p-1 is equal to the corresponding inertia of the TFD in the threshold field 43.
  • the inertia of the observed system has decreased, thus requiring the inertia of the TFD propagation to decrease or slow down. Therefore, this would result in the activation levels just behind the area 54 increasing, thus causing the propagation rate of the TFD to slow down. This would continue until the output of the novum were zero. At this point, the activation level output by the neurons in the IG would be zero as a result of the zero output of the novum 14, due to the whitening effect thereof.
  • if the inertia of the system becomes a constant value, the inertia of the TFD will be forced to that inertia and the output of the novum 14 will be forced to a zero value.
  • the PA 12 is comprised in part of a number of networks which are integrated together. These will each be described in general and then the way that these networks and learning laws are integrated will be described.
  • the "gamma outstar avalanche" of Grossberg is a well-known neural network architecture which utilizes Hebbian learning and an outstar sequencing scheme to
  • a dual construction of the gamma outstar avalanche can be made which consists of instars from the pixel array falling on each of a sequence of neurons in a "timing chain". This is illustrated in Figure 4.
  • the pixel array is represented by an array 60 with an illuminated pattern thereon.
  • the output of the pixel array 60 is input to a chain of neurons 62- with a pulse 64 represented as traveling down the chain of neurons.
  • Implementation of this type of "instar" avalanche is not a simple task nor is it obvious that by itself it would have any utility.
  • the only difficulty in launching this instar avalanche is that it requires one to send a coherent, compact pulse of activation down the chain of neurons 62 which parameterizes the time axis.
  • the instar avalanche could also be launched with some very simple estimations. However, these estimations become somewhat uncomputable when it is necessary to go to higher dimensional neural lattices with more than one pulse propagated therein.
  • each of the neurons in the neuron chain 62 is encoded with a compact representation of a pattern, and its neighbors encode the causal context in which that pattern occurred.
  • the coding is associatively accessible in that the spatial patterns are concentrated into the synapses of individual neurons in the neuron chain 62.
  • the network is comprised of a first layer 66 and a second layer 68.
  • the first layer is comprised of a plurality of neurons 70 and the second layer 68 is comprised of a plurality of neurons 72.
  • Each of the neurons 70 is labelled a1 through an, with the a1, the an and the ai neuron 70 being illustrated, the ai neuron representing an intermediate neuron.
  • the neurons 72 in layer 68 are labelled b1 through bn, with the b1, the bn and the bj neuron 72 illustrated, the bj neuron 72 being an intermediate neuron.
  • Each of the neurons in the first layer 66 receives an input signal from an input vector 74.
  • Each of the neurons in the first layer 66 also receives an input signal from each of the neurons 72 in the second layer 68 and an associated weighting value with this input signal.
  • each of the neurons in the second layer 68 receives as an input, a signal from each of the neurons in the first layer 66 and associates a weighting value therewith.
  • Each of the neurons in the second layer 68 receives an input from each of the other neurons therein and associates an appropriate weighting factor therewith.
  • the learning objective for this type of network when utilized in prior systems is to build a compact (preferably a single neuron) code in the second layer 68 to represent a class or cluster of patterns that were presented to the first layer 66 in not-necessarily compact distributed form.
  • the recall objective of these networks is to reactivate a pattern of codes in the second layer 68 which identifies the class or set of classes of which the input pattern of the first layer 66 is a representative.
  • sometimes the output objective is itself a distributed pattern, but more often in the prior systems it is to produce a "delta function". This delta function representation is a low entropy distribution of activations, i.e., one that is unlikely to appear by chance.
  • This low entropy distribution of activations amounts to an unequivocal declaration that the input pattern belongs to the class of patterns which the lone active neuron in the second layer 68 represents. Such a representation requires no further pattern processing to communicate its decision to the human user, although it is easy if desired to employ an outstar from the active neuron in the second layer 68 to another layer to generate a distributed picture for the human user. This is essentially what is accomplished in the Hecht-Nielsen counterpropagation network. This low entropy distribution of activations is essentially how prior systems operate.
  • the desired output is also a delta function.
  • this is accomplished by finding the neurons 72 in the second layer 68 with the strongest response to the input pattern and preventing the learning algorithm from applying to any other neuron 72, unless the strongest response is obtained from a neuron 72 which presents a "bad match" to the input pattern, in which case, the learning algorithm is allowed to apply only to some other single neuron 72 in the second layer 68.
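A minimal sketch of this winner-take-all restriction follows; the "bad match" test, its threshold, and the recruitment rule are illustrative assumptions.

    import numpy as np

    def competitive_update(x, W, lr=0.1, match_threshold=0.5):
        # W: one synaptic weight vector (row) per second-layer neuron.
        responses = W @ x
        winner = int(np.argmax(responses))
        # If even the strongest response is a bad match, recruit a
        # different neuron instead (here, simply the weakest responder).
        if responses[winner] < match_threshold * float(np.linalg.norm(x)):
            winner = int(np.argmin(responses))
        W[winner] += lr * (x - W[winner])  # only the winner learns
        return winner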
  • the PA 12 of the present invention utilizes a learning algorithm that is supervised, but in a locally computable neural form.
  • the TFD is propagated as a wave across the IG 16.
  • the propagation of compact, coherent wave-particles over a discrete lattice was discovered by accident by Fermi, Pasta and Ulam in their study of the finite heat conductivity of solids.
  • the differential (in time) difference (in space) equation which they were studying is now called the Fermi-Pasta-Ulam (FPU) equation.
  • FPU Fermi-Pasta-Ulam
  • when the FPU equation is written as a continuum in the spatial coordinate, it is a form of the Korteweg-de Vries (KdV) equation, which has been known for some time to model the shallow water solitary waves of Russell.
  • NLS Non-Linear Schroedinger
  • the NLS equation is a complex wave equation
  • h Planck's constant
  • i is the imaginary unit
  • m is a scaling constant identified with the mass of a particle
  • f is a real valued function chosen to offset the dispersion of the wave
  • U is a real scalar field which may be identified with the refractive index of the propagation medium or with an externally applied force field.
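Assembling the quantities just listed, the NLS equation the text refers to can be reconstructed in its conventional form (a reconstruction; the patent's own typesetting of the equation did not survive extraction):

    i\hbar\,\frac{\partial\psi}{\partial t}
      = -\frac{\hbar^{2}}{2m}\,\nabla^{2}\psi
        + \bigl[\,U(x,t) - f\bigl(|\psi|^{2}\bigr)\bigr]\,\psi

With the logarithmic choice f(s) = b ln s, this admits the Gausson solutions cited in the next item.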
  • the NLS equation will be solved by soliton-like wave particles such as the "Gaussons" described in Bialynicki-Birula [Annals of Physics, 100, pp. 62-93, 1976].
  • the potential field U(x,t) is established by the activation levels L(x,t) of the neurons of the IG 16, which are in turn determined by the signal vector n(t) which is received from the novum 14. It is easy to show, and is described hereinbelow, that after the PA has been entrained, the prediction errors carried by n(t) generate precisely the right potential field U(x,t) whose gradient vector deflects the TFDs toward states of smaller prediction error.
  • TFDs also act as markers when considered in conjunction with Kalman-Bucy filtering because they mark the location of a maximum likelihood state (or feature) estimate.
  • assume that the neural lattice in the second layer 68 of the two-layer network of Figure 5 has randomly initialized synaptic weights, e.g., uniformly in the interval [-1, +1], and that at the current time the threshold field exhibits a single TFD at some location x' in the lattice.
  • the current image will elicit a random response in the activation levels in the second layer 68, but the output signals will be nearly zero everywhere except in the neighborhood of x', because there the threshold is so low that almost anything (except a strong antimatch to the current pattern) will produce a strong output. Therefore, a signal Hebbian learning law (i.e., one in which the weight change is proportional to the product of the presynaptic signal times the output signal of the postsynaptic neuron) will capture the input pattern and store it at, and to a lesser extent near, x'. Therefore, the input pattern is stored in a nearest-neighbor manner. The learning is shut down everywhere else because the baseline threshold levels of the threshold field 43 squelch random responses in the quiet range.
  • TFDs move like particles in a geodesic across the neural lattice of the IG 16, they can do much more than a simple parameterization of a time axis as in an ordinary avalanche. They serve as idealized internal models of the parametric trajectories of features in the observed scene.
  • TFD spatial delta function
  • a single TFD could only encode N levels of the parameter by its position in an N-neuron lattice, because a delta function disappears when its "peak" is between lattice points.
  • a distributed TFD which spans as many as six or seven lattice points at a time can represent a virtual continuum of positions between adjacent neurons.
  • the quantization of the interpolation should be on the order of m times the word-length quantization of the activation of the m neurons under a given TFD; i.e., if a TFD amplitude at each neuron is coded into an n-bit word and the TFD spans m neurons at a time, then the interpolation ability of the peak of the TFD between neurons should be on the order of m times n bits.
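A minimal numeric sketch of the sub-lattice interpolation follows (a barycenter estimate of a distributed bump's peak; the bump shape and width are illustrative assumptions):

    import numpy as np

    def tfd_peak(depth):
        # Barycenter of the depression depths sampled at integer lattice
        # points: resolves the peak to a fraction of one lattice spacing.
        idx = np.arange(len(depth))
        return float((idx * depth).sum() / depth.sum())

    lattice = np.arange(20)
    bump = np.exp(-0.5 * ((lattice - 10.37) / 1.5) ** 2)  # true peak at 10.37
    print(tfd_peak(bump))  # ~10.37, i.e., finer than the one-point grid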
  • There are two relatively simple learning laws: the Hebbian learning law and the contra-Hebbian learning law.
  • the Hebbian learning law is the archetype of almost all of the so-called unsupervised learning laws, yet it is almost never used in the original form because it fails to account for the temporal latencies which characterize causal processes, which include the classical conditioning behavior of animals.
  • Some variants account for the direction of time by convolving one or both of the presynaptic or postsynaptic signals with a one-sided distribution, such as in Klopf's Drive-Reinforcement model.
  • the contra-Hebbian learning law is a special case of the well-known delta rule.
  • the delta rule adjusts the synaptic weight in accordance with Widrow's stochastic gradient method to drive the actual output yj toward a "desired" output dj.
  • the formula for the delta rule is: Δwij = α(dj - yj)xi, where α is the learning rate and xi is the presynaptic signal.
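A side-by-side sketch of the two laws follows; the learning rate and variable names are assumptions, with the delta rule written in its standard Widrow-Hoff form.

    def hebbian_update(w, x_pre, y_post, lr=0.01):
        # Signal Hebbian law: weight change proportional to the product of
        # the presynaptic signal and the postsynaptic output.
        return w + lr * y_post * x_pre

    def contra_hebbian_update(w, x_pre, y_actual, d_desired=0.0, lr=0.01):
        # Contra-Hebbian / delta rule: drive the actual output toward the
        # desired output (zero for the novum, its maximum-entropy state).
        return w + lr * (d_desired - y_actual) * x_pre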
  • the fundamental objective of every filtering problem is to determine (i.e., to estimate) the conditional probability density P(x(t) | Y(t')) for the state or "feature" or "parameter" of the observed system at time t, given all the observations Y(t') = {y(s) : s ≤ t'}.
  • the PA 12 is a two-layer architecture as described above, consisting of novum 14 and the IG 16, this architecture being somewhat similar to that illustrated in Figure 5 with the exception that the output of the second layer 68 corresponding to the IG 16 is also fed back to the first layer 66, which corresponds to the novum 14, as an additional input.
  • the novum 14 provides an approximation to the innovations process of the input time series.
  • the IG 16 stores the differential model of the observed system.
  • Referring to Figure 6, there is illustrated a block diagram of the innovations approach to stochastic filtering, which has been taken from Kailath, T., "An Innovations Approach to Least-Squares Estimation, Part I: Linear Filtering in Additive White Noise", IEEE Trans. Automat. Contr., vol. AC-13, pp. 646-655, Dec. 1968. Superimposed on that block diagram is a partition showing which functions are performed by the novum 14 and which are performed by the IG 16.
  • Previous to Applicant's present invention, all known implementations of the Kalman filter were based on the well-known iterative formulation and refinements thereof. These previous systems are based on Gaussian statistics in either linear or linearized systems. The bulk of the computational burden is taken up by the matrix operations of the Kalman gain matrix (shown as the operator "K") in Figure 6.
  • the novum 14 receives the observation of the plant (i.e., the input) and the IG 16 generates the state estimates of the plant.
  • the "algorithm" of the PA 12- is quite different from that shown in Figure 6. It is based on the more general multi-stage Bayesian estimator as described in Ho, Y.C. and Lee, R.C.K. , "A Bayesian Approach to Problems in Stochastic Estimation and Control", I.E.E.E. Transactions of Automation Control, Vol. AC-9, pp. 333-339, October 1964. In order to describe how the PA 12 operates, the procedure, as described in the Ho and Lee paper will be stepped through to show how the PA 12 accomplishes each step.
  • Step 1: Evaluate P(x_{k+1} | Z_k).
  • the new threshold field T(x, t_{k+1}) represents the conditional likelihood function for the states (features) x, given the prior likelihood function for those states.
  • Step 2: Evaluate P(z_{k+1} | x_{k+1}).
  • z_{k+1} is the new observation vector, which in Figure 7 above is denoted by y(t).
  • current activations L(x) are fed through the threshold field T(x, t_{k+1}) to produce the a priori state estimates E(x | Z_k).
  • This IG output is then passed through the synapses of the novum 14, which implement the Ĥ matrix. Since Ĥ implements the internal model of the observation matrix H, the result is the estimate of the observation as predicted by the IG 16. (Note that the observation itself is treated as a likelihood function over the receiving transducers, so this estimate is itself a likelihood function.) The novum 14 then subtracts this estimate from the current signal to produce the innovations process.
  • Step 3: Evaluate P(x_{k+1}, z_{k+1} | Z_k).
  • F(t) is the state transition operator
  • K(t) is the Kalman gain
  • n(t) is the innovations process.
  • Step 4: Evaluate P(x_{k+1} | Z_{k+1}).
  • the novelty resulting from the new observation is passed through the updated IG synapses (along with any recurrent IG signals) to produce the new activation levels L(x) in the IG 16, and then L(x) is passed through the threshold field to obtain P(x | Z_{k+1}).
  • Step 5: Select the state(s) corresponding to the maximum likelihood estimate(s).
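A minimal numeric sketch of the five steps over a discrete state lattice follows (a generic grid-based Bayes recursion in the spirit of Ho and Lee; the transition matrix and observation likelihood are stand-ins, not the PA's neural realization of them):

    import numpy as np

    def bayes_step(prior, trans, obs_lik):
        # prior[i]   = P(x_k = i | Z_k)
        # trans[i,j] = P(x_{k+1} = j | x_k = i)   (the dynamical model)
        # obs_lik[j] = P(z_{k+1} | x_{k+1} = j)   (the new observation)
        pred = prior @ trans               # Step 1: P(x_{k+1} | Z_k)
        # Step 2 is the evaluation of obs_lik itself, supplied by the caller.
        joint = obs_lik * pred             # Step 3: P(x_{k+1}, z_{k+1} | Z_k)
        posterior = joint / joint.sum()    # Step 4: P(x_{k+1} | Z_{k+1})
        x_mle = int(np.argmax(posterior))  # Step 5: maximum likelihood state
        return posterior, x_mle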
  • the new encoding will not affect the current signal from the IG 16 to the novum 14, nor will it deflect the physical trajectory of the threshold wave particle. But it will deflect the apparent trajectory of the threshold wave particle the next time it crosses its physical trajectory, because the processing elements in its path now encode different features. This provides the improvement in the internal model of the dynamics of the observed system. It has no effect on the current estimation effort, but it will affect the convergence rate for the next observation of the same trajectory.
  • the observer model Ĥ is contained in the IG-to-novum synapses of the novum 14 and it is established with a very short time constant utilizing delta-rule learning.
  • the threshold level over the novum 14 is uniform and does not vary with time. That level defines the maximum information level for the novum 14 activations, i.e., the maximum entropy level. That level is the "desired output" for each processing element (pixel or neuron) of the novum 14, which, for simplicity of computation, has been chosen to equal zero.
  • the observation vector Y(t) is applied to the novum 14 through hard-wired, non-learning synapses, one component Y_j(t) to each novum neuron 28.
  • the feedback signals P(x|T) from the IG 16 enter through learnable synapses on the input vector 22.
  • P(x|T) will be a traveling delta function, so that at any one time, only one of the IG input lines of each pixel will have a signal on it.
  • the delta learning algorithm will mold that synaptic weight into a mirror image of the signal component, Y_i(t), falling on the pixel at the same time.
  • the observation vector y(t) is supplied to the novum through hard-wired, non-learning synapses, one component y_j(t) to each novum neuron.
  • the feedback signals P(x|T) from the IG 16 enter through learnable synapses (i.e., input lines 36).
  • P(x|T) will be a traveling delta function, so that at any one time only one of the IG input lines 36 on each neuron 28 will have a signal on it.
  • the delta-rule learning algorithm will mold that synaptic weight into a mirror image of the signal component, y_j(t), falling on the pixel at the same time (the mirror being at the threshold level).
  • the synaptic weights will be a spatially recorded replica of the signal waveform, and only those synapses which were connected to IG neurons 34 activated by the traveling delta function actually partake in the representation. Others are available for encoding observations of unrelated signal patterns.
  • the motion of the wave-particles which become associated with the dynamical model of the observed system may be achieved in basically two ways. This can be done by using an appropriate bell-shaped depression in an otherwise level threshold field and simply translating it in the desired direction by incrementing indices in the data array, or, if more than one such depression is to be moving simultaneously, by vectoring the data itself. This is appropriate for any implementation of the PA 12 on general-purpose computing equipment or special-purpose uniprocessor/vector processor equipment.
  • the nonlinear Schroedinger (NLS) equation is one route to the extension of the required dynamics to two and three dimensions. This equation describes the motion of photons and phonons in dispersive media, such as Langmuir waves in plasma.
  • the wave-particle solutions propagate in a medium that is characterized by a nonlinear refractive index which need not be spatially uniform and which, therefore, have the requisite properties for control and modulation of the trajectories of the wave-particles.
  • this refractive index can be tied directly to the response of the IG neurons 34 to the novum "error" signal to induce gradients in the refractive index field which will deflect the soliton trajectories toward smaller errors, as required by the Kalman-Bucy filter.
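A compact numeric illustration of that steering effect follows: a 1-D wave packet in a split-step Schrödinger integrator, where the potential plays the role of the refractive-index field. Every constant here is an illustrative assumption, and the nonlinear term is omitted for brevity.

    import numpy as np

    # A wave packet launched rightward in a tilted potential U(x): the
    # gradient decelerates it, i.e., the packet is steered toward lower
    # potential, as a TFD is steered toward smaller prediction errors.
    N, L, dt, steps = 512, 80.0, 0.02, 500
    x = np.linspace(-L / 2, L / 2, N, endpoint=False)
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
    U = 0.05 * x                                   # potential gradient
    psi = np.exp(-0.5 * (x + 10.0) ** 2 + 1j * x)  # packet at -10, momentum +1

    expV = np.exp(-0.5j * U * dt)       # half-step potential propagator
    expK = np.exp(-0.5j * k ** 2 * dt)  # full-step kinetic propagator
    for _ in range(steps):              # Strang-split integration
        psi = expV * np.fft.ifft(expK * np.fft.fft(expV * psi))

    rho = np.abs(psi) ** 2
    print((x * rho).sum() / rho.sum())  # ~ -2.5; would be ~0.0 with U = 0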
  • Referring to Figures 8a and 8b, there are illustrated schematic representations of the novum neuron 28 and the IG neuron 34, as described hereinabove with respect to Figure 2.
  • the novum neuron 28 in Figure 8a receives an input from other neurons in the novum lattice on the lines 32. Additionally, it receives an input from each of the points in the local plane on input lines 30. Weighting factors are associated with each of the input lines 32 and each of the input lines 30.
  • the external input vector Y(t) will be fanned out so that every novum neuron 28 receives every component Y_j(t) of the vector.
  • novum neurons 28 are identified with integer indices (such as "i") and the IG neurons 34 will be identified with the vector indices (such as "x") corresponding to their coordinates in a geometric lattice.
  • the forward flow of signals from the novum 14 to the IG 16 implements an instar avalanche; that is, the novum 14 is a pixel array, while the threshold field of the IG 16 supports the propagation of TFDs on the two- or three-dimensional IG lattice.
  • the threshold function for the IG neuron 34 at a lattice position x is given by:
  • σ_IG(a(x,t); T(x,t)) = [1 + exp{4m(T(x,t) - a(x,t))}]^(-1)
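In code, the reconstructed sigmoid reads as follows (a direct transcription of the expression above; m sets the steepness):

    import numpy as np

    def sigma_ig(a, T, m=1.0):
        # Output approaches 1 where the activation a exceeds the local
        # threshold T (e.g., under a TFD) and 0 where the threshold is high.
        return 1.0 / (1.0 + np.exp(4.0 * m * (T - a)))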
  • the "current" position of T(x,t) is illustrated schematically in the interior of the IG neuron 34 by the coordinant axis 80.
  • the learning law of the IG 16 is the Hebbian law.
  • the output signals of the IG 16 represent the conditional probability density P(x | Y(t)) described above.
  • the feedback flow of signals from the IG 16 to the novum 14 implements an outstar avalanche, except that the learning law in the novum 14 is the contra-Hebbian law. Moreover, the "time" domain is factored through the two- or three-dimensional IG 16 instead of being a simple one-dimensional domain. The result is that when a "pattern" is recalled, it will be the negative of the observed pattern, so that if the recall is executed at the same time that the original pattern is replayed into the sensor array, the output of the novum 14 is zero from all pixels.
  • the threshold function of the novum 14 is given by:
  • a v " waveform 82 represents the output response of the novum 14 in neuron 28.
  • the novum 14 is comprised of an input plane 84 and an output plane 86.
  • the focal plane in the observation block 10 was considered to have a plurality of pixels, with each pixel represented by F(t,y), which maps to a neuron y (reference numeral 83) in the novum input plane 84. Therefore, one input of this novum is the vector output 22 from the IG 16, represented as u(x).
  • Each of the novum neurons y described above has a plurality of weighting factors associated with each of the IG inputs.
  • the IG 16 is comprised of an activation plane 42 and a threshold plane 43.
  • the activation plane is referred to as the IG' 42.
  • the IG' 42 is comprised of an input plane 98 and an output plane 100, the output plane 100 comprising the output of the IG 16.
  • Each of the neurons 34 in the IG 16 is, as described above, arranged in a geometric lattice. A particular one of the neurons 34 utilized for this example is illustrated by a specific neuron 102 in the input plane 98.
  • the neuron 102 has associated therewith a template 104 wherein the weight values are stored. For each of the neurons 34 in the IG 16, there is one weight associated with each of the novum neurons 28. Therefore, the template 104 is illustrated with a single point 106 representing the weighting factor w_IG(x,y).
  • the dot product of the output vector for the novum n(y) and the associated weighting vector w_IG(x,y) is taken to provide a template output 108.
  • the template output 108 is then input to a threshold block 110.
  • the threshold block 110 receives on the other input thereof the threshold function T(x), which is derived from the input vector 20 at the output of the novum 14 by way of the wave equation, which was described hereinabove. This yields the output u(x) for the output from the IG 16.
  • the network behavior of the PA 12 is rather more complicated than is indicated above, because the learning laws are inseparable from the dynamics of the architecture. It is easiest to explain the step function response of the network.
  • assume that the novum 14 is illuminated with an image (applied to the hard-wired synapses), and that there is a single TFD moving along a geodesic in the IG 16.
  • each pixel of the novum receives a constant input signal y n which may be positive, negative, or zero (the latter case being uninteresting) .
  • That signal generates an activation of the same level which is passed through the novum threshold function before being fanned out to the IG 16.
  • Almost all the IG neurons 34 have zero output due to the high threshold level and the synaptic weights of zero. But in the vicinity of the TFD, whose lattice barycenter is at X(t), the threshold is low enough that the output signals P(X(t)+Δx | Y(t)) are significant.
  • the neurons at and near X(0) absorb the input pattern (y_n).
  • the process is that of a feedback control mechanism for the internal model of the "plant", which consists of the soliton wave particles on the IG 16 lattice.
  • the input to this model is the observation, but only after it has been supplemented by the "regulator” in the novum 14.
  • the regulator output is constructed to stabilize the plant, which in this case means that the TFD's are moving along their geodesics with minimum disturbance, i.e., with their own inertia.
  • the effect is that the output of the novum approximates the time derivative of the step function input and therefore, the patterns stored in the synapses of the IG trajectory X(t) record that time derivative.
  • the patterns stored in the synapses of the novum 14 record the negative of that time derivative.
  • IG neurons encode spatial patterns
  • novum neurons encode temporal signals.
  • the time derivative of a step function is also the innovations process of the step function. This does not hold true for more general signals.
  • the current internal representation of that context is the state of the threshold field of the IG 16, i.e., the positions and velocity vectors of the TFD markers. Those TFD's sensitize or condition the IG 16 for the detection of certain states/features in the input.
  • the output of the novum 14 is filtered through the synaptic templates of all the IG neurons 34 which respond with activation levels representing the a-priori (or "context free") estimate of the content of that data. These activation levels are filtered through the nonuniform IG threshold field 43 to produce the IG output distribution, which represents the conditional likelihood for the presence of states/ features in the input.
  • the IG output distribution is treated as a collection of scalar coefficients for the formation of a linear combination of the patterns that are stored in spatially distributed form in the novum synapses.
  • This construction produces the projection of the current observation into the pattern subspace spanned by the prior observations. (It is actually a "fuzzy" projection, since the TFD's are not delta functions over the IG lattice.)
  • This construction also constitutes a decoding of the abstract IG estimate and is easily seen to correspond to the method of "model based vision" in that the features that are detected by the IG 16 are used to reconstruct a model in the novum 14 for comparison against the actual observation. The correspondence even reflects hierarchical model based schemes if one allows that a network of PA modules can achieve a nesting of more and more abstract feature sets as the distance of each module from the sensory array increases.
  • the reconstructed model is NOT an estimate of the current observation, but rather it is an estimate of the observation that will arrive after a time interval t_loop, which is the time required for the signal to propagate forward from the novum to the IG and back to the novum again (because that is how the recording occurred during learning).
  • the feature detection has been performed not as a one-shot pattern recognition operation on an isolated image (though it could clearly do this as a special case), but rather as an integrated historical estimate with the temporal gain of the Kalman-Bucy filter.
  • once the estimated observation is constructed, it is compared against the actual observation to produce an error pattern, which is used both to correct the ongoing estimate (through the Kalman gain operation of the variable refractive index field) and to improve the IG coding for future reduction of the error covariance (through the action of the Hebbian learning law).
  • This error pattern is the only output of the novum 14, and since it consists of the residual after projection of the observation onto the historical subspace, it is rightly called the "innovations process" of the observed stochastic process.
  • it is only a partial or suboptimal innovations process, because no single PA module has the capacity to store the entire, fully differential history of its input. This is an important technicality: a true innovations process is a Brownian motion, useless for control or error correction. But a suboptimal innovations process can be so used, albeit in a computationally intractable form.
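A small numeric illustration of the residual idea follows: an AR(1) process and its one-step predictor, where the residual left after projecting onto the history is the innovations process and is decorrelated where the raw signal is not. The AR(1) model is a stand-in, not the PA.

    import numpy as np

    rng = np.random.default_rng(0)
    a, n = 0.9, 5000
    y = np.zeros(n)
    for t in range(1, n):                 # an AR(1) observed process
        y[t] = a * y[t - 1] + rng.normal()

    prediction = a * y[:-1]               # projection onto the history
    innovations = y[1:] - prediction      # the residual: innovations process
    print(np.corrcoef(y[1:], y[:-1])[0, 1])                      # ~0.9
    print(np.corrcoef(innovations[1:], innovations[:-1])[0, 1])  # ~0.0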
  • since the PA 12 constructs the (estimated) probability density for the state of the system, it contains all the information necessary for achieving any desired control objective so long as the observability and controllability criteria are satisfied.
  • the output of the novum 14 is already adequate to control the evolution of the internal model of the plant, contained in the IG 16, so it only needs a gain transformation to allow it to control the plant itself.
  • the mechanism by which the PA can accomplish automatic target recognition and parameter estimation is described hereinbelow, along with how to train and operate such a system.
  • the training procedure and the result thereof is first described, which for this system is the equivalent of defining the feature set and building the feature detectors for a model based vision scheme.
  • the target recognition and tracking mechanism will be described, which detects the features in the input signal, uses them to build a representation of the estimated target and tracks the target while it moves.
  • Training the PA 12 requires first deciding on a set of basic features which are needed to distinguish targets of interest and selecting training data that is rich in those features and low in confusing or conflicting features.
  • the IG 16 subnetwork will be initialized with "grandmother cells", each of whose synaptic weights match one sample of one key feature of the targets. Some care will have to be given to the hierarchical primacy of these features, because the most primitive features belong in a PA module (which modules will be described hereinbelow) that is closest to the sensor array, while the most abstract features belong in a deeper PA module.
  • the PA 12 will be "imprinted" with training patterns which are rich in “grandmother” images.
  • imprinting occurs when at some time t' one of the key features first appears in the spatiotemporal input pattern. Prior to this time there are no TFD's moving in the IG lattice, because no IG neuron 34 has had a high enough activation level to interact with the threshold field and therefore the threshold field is uniformly flat. But at time t' one of the grandmother cells reacts strongly to the passing image that it is coded for and that reaction "plucks the threshold field" to initiate the first wave motion.
  • Referring to Figure 10, there is illustrated a top view of the IG 16 lattice.
  • consider two grandmother cells G1 and G2 located at x_1 and x_2 respectively in the IG 16 lattice.
  • G2 always follows G1 by a time interval of δt_1
  • F always follows G2 after a time δt_2.
  • the distance between G1 and G2 in the lattice is large enough that the time required for a threshold disturbance to travel between them is greater than δt_1.
  • F is a feature pattern which follows G2. The pattern at that time consists of F+R, where R is random with zero mean. (If any part of R consistently followed G1 and G2, it would have been included in F.) At that time also, the two threshold disturbances are concentrated in hyperspheres.
  • IG neurons 34 which have a lowered threshold due to being in one of the hyperspheres (and being under a TFD) will weakly absorb the synaptic code for the feature F. The random part of the sample patterns will be cancelled by the learning law. But IG neurons 34 in both of the hyperspheres will strongly absorb F because their thresholds will be lower (hence their outputs will be stronger) due to the superposition of pairs of TFD's. Thus, the strength of the code that is learned at any location is determined by the confluence of consistent events in the data.
  • the error signal from the novum 14 will do two things: (1) It will warp the refractive index field (RIF) to further deflect the TFD's in the direction of smaller error, and (2) it will add (via the Hebbian learning law) a correction to the synaptic patterns in the wakes of the TFD's so that a subsequent repetition of this experiment will require less of a correction — i.e., it improves the model.
  • RIF refractive index field
  • the TFD's should have proceeded on course without deflection. If the post-collision trajectories are uncoded by other training, then they will eventually receive duplicates of the coding in the geodesic trajectories. Otherwise, an inconsistency develops which can only be resolved by extending the IG model into higher dimensions. In practice, this cannot be done by physically implementing the IG on a 4-dimensional lattice; but it can be accomplished by networking a second PA module to the first.
  • Recognition occurs when an input pattern drives one or more parametrized feature detectors over their thresholds.
  • the outputs of all feature units are sent back to the novum 14, where they are treated as scalar coefficients in the linear combination of one or more spatial patterns stored in the synapses of the novum neurons 28.
  • this linear combination constitutes the prediction of the next observation, and since there is a small time delay in constructing that prediction, it is active at the time when that next input arrives. There is no problem getting the timing right, because if the delay is not the same as when the patterns were learned in the first place then the resulting error will correct the time base as part of the Kalman gain transformation.
  • as in model-based vision techniques, the observation is processed for matches to a number of abstract features which are coded into the IG neurons 34, and these feature responses are used to regenerate a model of the observation. In this case, however, the regenerated model is not a model of what was seen, but of what will be seen a short time step into the future. By the time the model is regenerated, the next observation is received and ready for comparison and processing of the error vector.
  • When the feature detectors are stimulated by an observation, they tug on the threshold field and initiate the motion of a TFD marker along a trajectory determined by the location of the feature IG neuron 34 and the velocity vector (if any) associated with that feature. (How the velocity vector is determined by the gradient of the "refractive index" field associated with the activation pattern generated by the observation is described hereinbelow.)
  • This marker moves under its own inertia to generate continuing predictions. That is, the IG neurons 34 whose thresholds are lowered by the traveling marker generate an output signal whose intensity is determined by the combination of the synaptic template matching and the depth of the threshold; and this signal fans out to the novum 14 to contribute its decoded template to the current prediction. This prediction is subtracted from the actual observation and the residual error is transmitted from the novum 14 back to the IG 16.
  • Since the refractive index of the threshold medium of the IG 16 is tied directly to the activation levels of the IG 16, the activation pattern in which the TFD marker is moving will warp the medium in just the right direction to deflect the marker into compliance with the observations. This, at least qualitatively, is what is required by the Kalman-Bucy filter.
  • the principal advantage of the continuous estimator over stationary DSP methods and model-based vision is that the latter are "single-shot" decision methods. That is, they must do the best they can with the signal-to-noise ratio that is available in a single frame (which may be the result of the integration of a number of scans) of data.
  • the continuous estimator makes decisions based on the information contained in all the relevant history of observations of the target, thus achieving the gain of massive integration while automatically compensating for (or ignoring) constituent motions in the target image.
  • a neural network architecture based on the Parametric Avalanche Kalman Filter (PAKF) is operable to observe a complex system and issue control signals to cause that system to track a desired reference trajectory.
  • the design employs a PA module to estimate the state of the "plant" and to function as the servocompensator.
  • This PA module has an adaptive feedback gain matrix to transform its state estimates into the required control signal.
  • the adaptive gain matrix monitors the effect of the control signal on the tracking error and adjusts to minimize it, thus allowing appropriate controls to develop even in the event that an actuator motor is cross-wired.
  • the objective is to design a neural network solution to the problem of asymptotic tracking and disturbance rejection. This problem is discussed in the Chen reference.
  • the asymptotic tracking problem is a generalization of the regulator problem.
  • in the regulator problem, a control input to the plant is sought which will stabilize the plant, which usually means driving it to the zero state.
  • in the asymptotic tracking problem, a control is sought which will drive the plant toward a desired trajectory called the reference trajectory, which need not be either zero or constant.
  • the stable state of the PA consists of a (possibly empty) set of TFDs whose trajectories are geodesics on the IG 16 lattice, i.e., a set of TFDs which are not being accelerated by any warping of the refractive index field due to prediction errors or any other induced accelerations.
  • the servocompensator receives the difference between the reference signal and the output of the plant, and that difference modulates the state of the servocompensator in the same way that the sensor input modulates the state of the IG 16 in the Parametric Avalanche.
  • the S/C state is supplied to a gain matrix which transforms it into a control supplement to the state feedback stabilization control (if there is any).
  • the state feedback can be supplied by a state estimator, so long as the plant is observable and controllable. This is called the Separation Theorem, as it allows the state estimation problem to be separated from the control problem.
  • Referring now to Figure 12, there is illustrated a block diagram of a control module 114 which employs two PA Kalman Filters (PAKFs) 116 and 118 for the state estimation functions required in the tracker described above.
  • Each PAKF 116 and 118 is followed by a gain matrix 120 and 122, respectively, to transform the state estimates into control signals.
  • This diagram is functionally the same as the tracking system described above and shown in Figure 11. However, it is a bit deceptive for two reasons. One is that the PAKF which is used for asymptotic state estimation does not employ the available control input to the plant 10 to improve its estimates, as it should. The other is that the gain matrices 120 and 122 cannot be implemented as adaptive neural networks in the positions where they are shown.
  • PAKF 116, which performs state estimation for the feedback stabilization function, sees only the output of the plant 10.
  • In Figure 13, taken from Chapter 7 of the Chen reference, the design of a different asymptotic state estimator is shown which receives both the output of the plant and the control input to the plant.
  • an estimator 124 is illustrated in feedback with the plant 10.
  • the difference between the designs of Figure 12 and Figure 13 is that in the design of the Kalman filter, the plant 10 is assumed to be "driven" by noise. That is, all deviations of the plant trajectory about the geodesic are determined by the equation,
  • the PAKF simply associates incoming patterns with IG neurons 34 in the path of a TFD, so it will build a model of the control during training. But since the control signal tends to be generated independently of the plant, any attempt to train the PAKF on observed trajectories that may be pushed one way at a certain point in one trial and another way at the same point in the next trial will encounter great difficulty in constructing a good model. If, however, that control signal could be made accessible to the PAKF through an appropriate mechanism, then it could serve as an "organizer" of the novelty during training and as an accelerator of estimation convergence during recall.
  • With respect to the gain matrices 120 and 122, their basic function is to move the eigenvalues of the composite system into the left half of the complex plane, and as far left as possible without saturating the controller. What is important here is that a gain is acceptable if the composite system is asymptotically stable (in the sense described above). One gain is better than another if it drives the system toward the reference signal faster.
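For a linear plant this acceptability criterion has a standard concrete form. The sketch below is illustrative only: the two-state plant and the candidate gain are invented for the example, and the test is simply that the closed-loop eigenvalues lie in the left half-plane:

```python
import numpy as np

# Illustrative two-state plant dx/dt = A x + B u with state feedback u = -K x.
A = np.array([[0.0, 1.0],
              [2.0, -0.3]])     # open loop is unstable (one eigenvalue > 0)
B = np.array([[0.0],
              [1.0]])
K = np.array([[8.0, 3.0]])      # candidate feedback gain

eigs = np.linalg.eigvals(A - B @ K)
# The gain is acceptable if every eigenvalue has a negative real part; a
# "better" gain pushes the real parts further left without saturating u.
assert np.all(eigs.real < 0), "composite system is not asymptotically stable"
```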
  • Referring now to Figure 14, there is illustrated the preferred PACM design, in which the gain matrix has disappeared because it is implemented adaptively in the novum 14.
  • Each of the novum neurons has an output n(t) which is input to the IG 16 and also to the plant 10 on a line 126.
  • the plant output, y(t) is input to an error block 128 that subtracts the value of y(t) from an external input r(t) to provide the input value e(t) to the novum 14.
  • each synapse is adjusted according to the product of its input times the output of the neuron 28.
  • the i-th component of that error happens to be available, since it is input to every element of the novum 14.
  • Our learning objective is to minimize the absolute value of this error. We therefore adapt the gain matrix with the following learning law:
  • ΔKij = -η nj(t) (d/dt)|ei(t)| sgn(Kij)
  • the learning law needs to be modified slightly to prevent the control from saturating. Saturation occurs when nj(t) approaches +1 or -1, which are the upper and lower asymptotes of the novum sigmoid function. Pushing the Kij further away from zero will then have negligible effect on the control signal and may cause numeric overflow of the synaptic weights.
  • a solution is to shut down the learning by linking the rate constant η to the magnitude of nj(t).
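A sketch of this adaptation step, combining the learning law above with the saturation shutdown; the function name, the index conventions, and the value of eta0 are assumptions rather than the patent's specification:

```python
import numpy as np

def update_gain(K, e, e_prev, n, dt, eta0=0.05):
    """One adaptation step of the novum gain weights (reconstructed law).

    K      : (m, p) gain weights; row j holds novum neuron j's weights
    e      : (p,) current tracking error; e_prev is the previous sample
    n      : (m,) novum outputs (the control), bounded in (-1, +1)
    """
    d_abs_e = (np.abs(e) - np.abs(e_prev)) / dt      # d/dt of |e_i|
    # Tie the rate constant to the output magnitude so learning shuts down
    # as n_j approaches the sigmoid asymptotes at +1 or -1.
    eta = eta0 * (1.0 - n ** 2)                      # per-neuron rate
    dK = -eta[:, None] * n[:, None] * d_abs_e[None, :] * np.sign(K)
    return K + dK

# Example call with hypothetical values:
K = update_gain(K=np.array([[0.5, -0.2]]),
                e=np.array([0.3, -0.1]), e_prev=np.array([0.4, -0.2]),
                n=np.array([0.6]), dt=0.01)
```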
  • a simple example will be stepped through in detail to illustrate the action of the PACM.
  • the observation is a measure of the elevation angle of the barrel of a rapid-fire gun mounted on a moving platform.
  • the reference signal is supplied to the operator and for this example it is assumed to be initially zero (horizontal). Since this is a one-dimensional example, we suppose that the novum 14 contains a single neuron, although the IG 16 may contain several hundred in a one-dimensional lattice.
  • the PA 12 has already been trained as described hereinabove to observe the measurement and to estimate the elevation angle through normal vehicular motions and during firing of the gun, but without any stabilization.
  • the neurons 34 of the IG 16 have come to be associated with a range of elevation angles. As described hereinabove, even though the IG neurons 34 are on a discrete lattice, the likelihood estimates can interpolate between them, so that the IG estimates are practically continuous.
  • the novum 14 output is then connected to the vertical actuator and a reference signal of zero degrees is supplied so that the input to the novum 14 is the actual elevation angle of the gun.
  • y(t) is the observed elevation angle (positive being above horizontal)
  • Y(t) is the IG estimate of the tracking error, given the history of observations
  • n(t) is the output of the novum, which is also the control signal u(t) to the actuator
  • K(t) is the (scalar) value of the synaptic weight in the novum 14 which receives the input e(t) .
  • CASE 1: n(t) is connected to the actuator "properly", so that the control acceleration of the gun elevation is directly proportional to n(t).
  • CASE 2: Same as Case 1 except the actuator is cross-wired, so that the vertical acceleration of the gun elevation is proportional to -n(t), i.e., the gun moves down when n(t) is positive.
  • the learning law will react to any large magnitude error as if it did not "trust" its gain value(s). That is because such errors are always increasing in magnitude until the control action takes effect, so during that time the matrix is being adapted in the wrong direction. But if the control action is correct, the error will begin decreasing and the gain matrix will return to its trustworthy state.
  • the learning rate constant η controls the time constants for adaptation, so it is necessary to adjust η properly to allow for the latency in the feedback loop.
  • the FPU (Fermi-Pasta-Ulam) equations are anisotropic, so that an initial disturbance results in a positive pulse moving to the left and a negative pulse moving to the right. Both of the nonperiodic boundary conditions caused some degree of reflection of the waves from the ends of the lattice. With the periodic boundary condition, we could run the simulation until the left and right waves collided, and we confirmed that they would emerge from the collision with their shapes intact.
  • Figures 15 and 16 show the result of such an experiment.
  • the boundary conditions are "WRAP", which allows the initial disturbance over neurons number 1-5 to propagate to the right and to the left from waveform 129 at time t0.
  • the leftward disturbance wraps around and re-enters the array from the right as waveform 131 slightly later in time.
  • the waveform 129 moves to the right to form waveform 133 at t1, and waveform 131 moves to the left to form waveform 135 at t1.
  • the disturbance moving to the right is positive and the disturbance moving to the left is negative. That is the opposite of what happens with the usual sign on the nonlinear term of the FPU equation, but the sign was reversed since it is desired that positive waves move to the right.
  • in Figure 15 the experiment proceeds up to, but not beyond, the point of the collision of the right and left waves 133 and 135.
  • Figure 16 shows the two waves 133 and 135 at the time of collision (solid curve 130) and after the collision (dotted curve 132), illustrating one of the key properties of solitons and demonstrating the viability of one of the most important elements of the Parametric Avalanche design.
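The soliton behavior reported in these figures can be reproduced in a few lines. The following sketch assumes the FPU-alpha form of the lattice equation and illustrative values for the lattice size, time step, and nonlinearity coefficient; the patent's exact parameters are not given:

```python
import numpy as np

N, dt, alpha = 100, 0.05, -0.25   # lattice size, time step, nonlinearity;
                                  # the sign of alpha is reversed, as in the
                                  # text, so positive waves move to the right
u = np.zeros(N)                   # displacement of the threshold field
u[0:5] = np.exp(-0.5 * (np.arange(5) - 2.0) ** 2)   # disturbance on sites 1-5
v = np.zeros(N)                   # site velocities

def accel(u):
    # FPU-alpha lattice with "WRAP" (periodic) boundaries via np.roll.
    up, um = np.roll(u, -1), np.roll(u, 1)
    return (up - 2.0 * u + um) + alpha * ((up - u) ** 2 - (u - um) ** 2)

for _ in range(4000):             # the two pulses separate, wrap around the
    v += accel(u) * dt            # ends, collide, and re-emerge with their
    u += v * dt                   # shapes intact (the soliton property)
```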
  • Figures 17 and 18 illustrate the response of the synaptic weights in the IG 16 and the novum 14, respectively, to the onset and the offset of a boxcar function which was input to pixel number 5 (only) of the novum.
  • Figure 17 shows that the onset was recorded most strongly at IG neuron number 25, which is where the moving soliton was shortly after the signal came on (at IG neuron number 20) .
  • the offset was recorded at IG neuron number 88 (the signal was turned off when the soliton was at number 80) .
  • the graph in Figure 17 shows the values of synaptic weight number 5 on each of the 100 IG neurons, and clearly illustrates the way in which the novelty in the temporal signal is distributed spatially over the neurons of the IG. Note that the synaptic weights at the equilibrium point just before the offset of the boxcar do not reach the zero level, for reasons that we discussed in Section 2.3.1.
  • the graph in Figure 18 illustrates the values of the synaptic weights on each neuron 28 (pixel) of the novum 14.
  • a dashed curve 134 shows all the learnable synapses on pixel number 5 of the novum 14.
  • the other pixels, which received no input, are also shown to illustrate that even though their synapses were receiving input from the IG, their weights remained at their initial values near zero (random within the interval [-.01,+.01]). Note that these weights are the negative of those in Figure 17, and that they are partially concentrated on a single neuron, rather than being spatially distributed as in the IG.
  • Figure 17 is almost exactly the activation level of the IG at the onset of the Mexican hat,
  • when the vertical axis is rescaled and relabeled as the "activation" of the neuron whose number appears on the horizontal axis.
  • neuron number 88 responded with the largest positive output, since its template aligned with the negative of the Mexican hat function.
  • the PA has the advantage that the threshold field dynamics not only control the learning of the patterns, but also the recall gain in the presence of a consistent and reinforcing history of observations.
  • Referring now to Figure 19, there is illustrated a block diagram of one of the neurons 34 in the IG 16 which, as described above, comprises a single processing element.
  • Each of these processing elements is arranged in an array of processing elements of, for example, an M x N array for a two-dimensional system or even a higher dimensional system.
  • Each of the processing elements in the array is represented by that illustrated in Figure 19.
  • the processing element in Figure 19 receives on one set of inputs 140 the signal vector inputs from the novum 14.
  • inputs 142 receive adjacent threshold levels from selected nodes, which in the preferred embodiment, are neighboring nodes. However, it should be understood that these threshold levels can be received from selected other nodes or neurons in the IG lattice.
  • Each of the processing elements is comprised of an IG processor 144 and a threshold level 146.
  • the inputs 140 are input to the IG processor 144 and the inputs 142 are input to the threshold level 146.
  • a memory 148 is provided which is interfaced through a bidirectional bus 150 to the processing element to communicate with both the IG processor 144 and the threshold level 146.
  • a block 152 represents the portion of the processing element that computes the activation levels. This resides in the IG plane 144.
  • In the threshold plane 146 there is a block 156 that is provided for updating the threshold values.
  • there is a clock 158 that operates the processing element of Figure 19. As described above, each of the processing elements is asynchronous and operates on its own clock, which is an important aspect of the Parametric Avalanche.
  • the output of the activation block 152 is input to a threshold function block 158 which determines the output on a line 160 as a function of the threshold generated by the threshold computation block 156. As described above, the threshold is low only in the vicinity of the TFD.
  • the output of block 158 comprises the output of the IG neuron or processing element of Figure 19 and this also is fed back to the input of the compute weight update block 154 to determine new weights.
  • the output of the activation block 152 is also input to the threshold level update block 156.
  • Each of the blocks 152, 154 and 156 interfaces with the memory 148, which is essentially a multiport memory. This is so because each of the processes operates independently; that is, the synaptic weights are fetched from memory 148 by the activation computation block 152 while they are being updated.
  • When a signal is received on the lines 140 from the novum, the activation computation block 152 must fetch the weights from the memory 148 in order to compute the activation level. This is then input to the threshold block 158. At the same time, the threshold output levels from each of the interconnected (preferably adjacent) nodes are received to generate the threshold level at that processing element or node. This is utilized to set the threshold level input to the node and, thus, determine the output level. As described above, if the threshold level is low, this will produce an output even if the activation level is very low. However, if the threshold level is high but the activation level is very high, this may also produce an output.
  • The first situation is when the system is initialized and nothing is stored in the template, so that the system must learn. As a soliton wave moves across the processing element and the threshold level goes down, the signal level on the output will go up due to the mismatch and the lowered threshold, and the observed image will be stored in the template in memory 148. In the second situation, a TFD moves across a processing element, but the input signal mismatches with the memory, resulting in a high activation output from activation block 152. In this case, the mismatch either does not produce an output or it does produce an output.
  • If it does not produce an output, the memory template will stay where it is; if it does produce an output, the memory template will be transformed so that it looks like the signal that activated it.
  • the third situation is when the soliton wave passes across the particular processing element, the threshold gets lowered, and the incoming signal actually matches the template. Since it actually matches the template, the output will be high, but since the template already looks like what was originally stored, it will only be reinforced, not changed in form.
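All three situations can be captured by a single output-gated learning rule. The sketch below assumes an outstar-style Hebbian law (the template is drawn toward whatever pattern is present while the neuron fires) with hypothetical gain and learning-rate parameters:

```python
import numpy as np

def ig_neuron_step(w, x, theta, lr=0.1, gain=8.0):
    """One update of an IG processing element as a TFD passes over it.

    w     : stored synaptic template
    x     : current input pattern from the novum
    theta : local threshold (low while the TFD is over this neuron)
    """
    activation = w @ x                                        # template match
    out = 1.0 / (1.0 + np.exp(-gain * (activation - theta)))  # thresholded output
    # Output-gated Hebbian learning covers the three situations: a blank
    # template absorbs the pattern, a mismatched one is transformed toward
    # it, and a matched one is merely reinforced (it already equals x).
    w = w + lr * out * (x - w)
    return w, out
```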
  • Referring now to Figure 20, there is illustrated a block diagram of one of the neurons 28 in the novum 14, which, as described above, comprises a single processing element.
  • Each of these processing elements is arranged in an array of processing elements of, for example, an M x N array for a two dimensional system.
  • Each of the processing elements in the array is represented by that illustrated in Figure 20.
  • the processing element in Figure 20 receives on one set of inputs 170 the signal vector inputs from the output of the observation matrix 10.
  • the processing element of Figure 20 also receives on a second set of inputs 172 the outputs from the IG 16.
  • Each of the processing elements is comprised of a computational unit and a memory 174.
  • the memory 174 is interfaced with the computational unit through a bidirectional bus 176.
  • the computational unit computes the activation energy in a computational block 178.
  • the computational unit also computes the weight updates, as represented by computational block 180.
  • the weight update computational block 180 provides the learning portion of the novum. Both the block 178 and the block 180 receive the inputs from both the signal vector inputs and IG outputs.
  • the output of the activation computation block 178 is input to a threshold function block 182 which was described above and comprises a bi-polar function.
  • the output of the threshold function block 182 is input to the weight update computation block 180 and also provides the novum output on the line 184.
  • a clock 186 is provided which operates the computational unit of the novum.
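A minimal sketch of one novum processing element follows. The use of tanh for the bipolar threshold function and the exact form of the contraHebbian update are assumptions consistent with the description (note that the learned weights come out as the negative of the IG's, as reported for Figure 18):

```python
import numpy as np

def novum_neuron_step(w, obs, ig_out, lr=0.05):
    """One update of a novum processing element (cf. Figure 20).

    w      : synaptic weights on the IG outputs (learns the negated pattern)
    obs    : this neuron's scalar observation input (lines 170)
    ig_out : vector of IG outputs (lines 172)
    """
    activation = obs + w @ ig_out   # w holds the negative of the learned
                                    # pattern, so this subtracts the prediction
    out = np.tanh(activation)       # bipolar threshold function, in (-1, +1)
    # ContraHebbian learning: the weight change opposes the product of input
    # and output, whitening the residual toward zero when the model is right.
    w = w - lr * out * ig_out
    return w, out
```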
  • Referring now to Figure 21, a cart 186 is provided with an upright member 188 disposed on the upper surface thereof and mounted on a pivot point 190 at the lower end thereof.
  • the upper end of the member 188 has a weight 192 disposed thereon.
  • the object of this problem is to maintain the member 188 in a vertical, upright position.
  • the Parametric Avalanche in this example is comprised of 100 neurons in an IG 194 and a single novum neuron 196.
  • the novum neuron receives as inputs the outputs of the 100 neurons in the IG 194 and it also receives a single observation input, representing the angle from the vertical relative to the cart 186.
  • the angle theta is input to the negative input of summing block 198, the positive input of which is connected to a signal REFSIG.
  • the output of the summing block 198 is input to a block 200 which receives on the other input thereof the adaptive gain input.
  • the output of block 200 provides the observation input to the novum neuron 196.
  • the control input to the cart is a horizontal acceleration and is supplied by the network.
  • the output of the novum neuron 196 is input to each of the 100 neurons in the IG 194 on an output line 204.
  • the output of the novum neuron 196 is input through a gain scaling block 206 in the cart 186 to provide a control input.
  • the threshold field is represented by a moving TFD 202 which traverses the IG neurons from left to right.
  • This IG is a one-dimensional IG. Since the novum output is a smooth approximation of the derivative of the input (when the input is entirely novel), the novum serves as a "derivative controller".
  • In Figure 22 there is illustrated the time evolution of the angle from the vertical, and the corresponding novum output.
  • Figure 23 is similar to Figure 22 except that random disturbances have been injected into the system, as will be described hereinbelow.
  • a "virgin" Parametric Avalanche generates a control signal which maintains an inverted pendulum in its upright position through the application of a horizontal acceleration to the pivot point of the pendulum.
  • the simulation consists of a loop on the time variable, in each cycle of which the error (difference) between the actual angle of the inverted pendulum (in radians away from the vertical) and the desired angle of zero radians is multiplied by a gain coefficient and then supplied as the input to the novum neuron 196 of the Parametric Avalanche model.
  • the Parametric Avalanche model is then called upon to advance its state forward one increment of time in accordance with its Quantum Neurodynamics and its learning laws, and to present the output of the novum neuron 196 as the control signal (the horizontal acceleration) for the motion of the pivot point 190.
  • Since the novum output is restricted by the sigmoid threshold function to the range from -1 to +1, it is amplified by a constant positive factor before it reaches the pendulum model.
  • the update subroutine for the adaptive adjoint gain coefficient is called upon to adjust this gain for optimum effect of the control action.
  • the pendulum model is called upon to advance its state forward one increment of time by a simple double integration of the second order difference equations, thus producing the actual pendulum angle for use in the next cycle of the loop.
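The simulation loop just described can be sketched as follows. The constants are illustrative, and the trained Parametric Avalanche is replaced by a stand-in that returns a bounded mix of its input and the input's time derivative, which is the behavior attributed to the novum:

```python
import math

# Illustrative constants; the patent's actual simulation parameters are not given.
dt, g, length = 0.01, 9.8, 1.0
gain_obs, gain_ctrl = 5.0, 15.0       # adaptive adjoint gain; output amplifier
theta, omega = 0.05, 0.0              # pendulum angle (rad) and angular rate
prev_error = (0.0 - theta) * gain_obs

for step in range(2000):
    error = (0.0 - theta) * gain_obs        # reference angle of zero radians
    # Stand-in for the trained PA: a bounded combination of the input and
    # its time derivative, limited to (-1, +1) by the sigmoid.
    n_out = math.tanh(error + 0.2 * (error - prev_error) / dt)
    prev_error = error
    u = gain_ctrl * n_out                   # amplified horizontal acceleration
    # Simple double integration of the second-order pendulum equation,
    # theta'' = (g*sin(theta) + u*cos(theta)) / length  ("Case 1" wiring):
    accel = (g * math.sin(theta) + u * math.cos(theta)) / length
    omega += accel * dt
    theta += omega * dt
```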
  • the Quantum Neurodynamics of the PA model is implemented by the "naive" method rather than by actual integration of the nonlinear Schroedinger equation.
  • a threshold depression (TFD) 202 is propagated along the one-dimensional IG lattice 194 by a global type of algorithm which is capable of interpolating the TFD 202 into one hundred equally spaced positions between each pair of the IG neurons.
  • the TFD 202 moves with a velocity that is specified by the operator at run time. There is no provision for modulation of this velocity by the "warp drive" mechanism (warping of the refractive index field), because it is assumed that the synapses of each IG neuron are randomly initialized prior to the passage of the TFD 202. Of course, those synapses will become programmed as the TFD 202 passes by, in accordance with the learning law.
  • Until that training has taken place, the output of the novum will simply be a noisy version of the input signal, the variance of the noise depending on the variance of the random initialization of the synaptic weights.
  • any velocity in excess of approximately 2 IG neurons per simulation second will produce this kind of output, which is useless for control of the pendulum.
  • At lower velocities, the noise disappears and the phase angle of the novum output begins to lead the phase angle of the input, moving toward the time derivative of the input signal. This allows the novum output to function as a derivative controller of the pendulum.
  • the simulations shown in the graphs of Figures 22 and 23 were produced with a TFD velocity of 1 IG neuron per simulation second.
  • the input data which produced the graph of Figure 22 is contained in Table 1, and the output data which produced the graph in Figure 23 is contained in Table 2, in the file bearing the same name as the graph but with a ".DAT" extension.
  • the main difference between the two is that in Figure 23, a random disturbance was applied to the velocity of the pivot point, as indicated by the nonzero value of the RANGE parameter.
  • Conventionally, a control system is designed by passing a state estimate through a gain factor to "represent" the estimate properly to the input of the plant.
  • By contrast, we adopt an adjoint representation of the gain by placing it between the output of the plant and the input of the state estimator (the PA).
  • the reason for doing this is that the information needed for adaptive gain adjustments is not compatible with neurocomputing methods when the gain is placed between the PA output and the plant input; but it is compatible with neurocomputing methods when it is placed in the observation path.
  • the UPDATE subroutine adjusts the gain so as to favor opposite signs of the input to the novum (the tracking error) and the output of the novum (the control) .
  • Since that output approximates the time derivative of the tracking error, such a gain will result in any tracking error being driven to zero by the control signal.
  • If the gain happens to be negative, then the PA will be learning a reversed image of the observations, but that is what it takes to obtain one signal from the novum that means the same thing to both the plant and the PA's model of the plant in terms of controlling its trajectory.
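One plausible reading of that UPDATE rule, expressed as code; the function name and rate constant are assumptions:

```python
def update_adjoint_gain(gain, error, n_out, rate=0.01):
    # Favor opposite signs of the novum input (the tracking error) and the
    # novum output (the control): a same-sign product drives the gain down,
    # possibly through zero, so a cross-wired actuator ends up with a
    # negative gain and the PA learns a reversed image of the observations.
    return gain - rate * error * n_out
```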

Abstract

A neural network includes an observation system (10) which sends an observation input to a novum (14). The novum (14) produces as its output a suboptimal innovations process relating the observation and prediction inputs it receives. The prediction inputs, arriving on an input vector (22), represent a state estimate. The output from the novum (14) serves as an input to an infinitesimal generator (IG) (16) on an input vector (20). The IG (16) provides state estimates on the vector (22). The novum comprises an array of processing elements or neurons (28), each neuron receiving the state estimates from the IG (16) on lines (32). Similarly, the IG (16) comprises a geometric lattice of neurons (34). Each neuron (34), receiving synaptic inputs from the novum (14) on lines (36), also receives a modifying threshold field input. A quantum-mechanical wave particle is propagated across the geometric lattice so as to produce an output (38) with which an inertia is associated. Each neuron (34) has an associated memory for storing spatial patterns of a timed series of observations. Likewise, each neuron (28) has a memory for storing temporal patterns of the timed series of observations. The IG (16) is adaptive and learns according to the Hebbian law, while the novum (14) is adaptive and learns according to the contraHebbian law.

Description

CONTINUOUS BAYESIAN ESTIMATION WITH A NEURAL NETWORK ARCHITECTURE
TECHNICAL FIELD OF THE INVENTION
The present invention pertains in general to a neural network architecture, and more particularly, to an architecture which is designed to perform adaptive, continuous Bayesian estimation on unpreprocessed large dimensional data.
BACKGROUND OF THE INVENTION
Artificial neural systems is the study of dynamical systems that carry out useful information processing by means of their state response to initial or continuous input. Initially, one of the goals of artificial neural systems was the development and application of human-made systems that can carry out the kinds of information processing that brains carry out. These technologies sought to develop processing capabilities such as real- time high performance recognition, knowledge recognition for inexact knowledge domains and fast, precise control of robot effector movement. Therefore, this technology was related to artificial intelligence.
Artificial neural systems have typically been studied through the use of neural networks which are comprised of a network of processing elements or neurons that are interconnected through information channels which are referred to as "interconnects". Each of these neurons can have multiple input signals, but only one output signal. The behavior of each of the neurons is generally determined by first-order ordinary differential equations in the output signal variable. By providing some of the neurons in the network with the capability to self-adjust some of the coefficients in their governing differential equations by means of additional first-order ordinary differential equations, a network can be termed "adaptive". This allows the network to learn, and this has been one of the primary goals of artificial neural systems.
Cognitive systems in which neural networks are implemented can be viewed in terms of an observed system and a network which are interfaced by sensor and motor transducers. The neural network is a dynamic system which transforms its current state into the subsequent states under the influence of its inputs to produce outputs which generally influence the observed system. A cognitive system generally attempts to anticipate its sensory input patterns by building internal models of the external dynamics and it minimizes the prediction error by employing a prediction error correction scheme, through improvement of its models, or by influencing the evolution of the observed system, or all three. Mathematically, this is a hybrid of three important and well-known problems: system identification, estimation and control. The theoretical solutions to these have been known for several decades. System identification is accomplished analytically by a number of methods, such as the "model reference" method. However, in practice, it is usually accomplished through the art of phenomenological modeling. Estimation is done with, for example, the Kalman filter, assuming that the dynamical equations have been previously identified. This estimation is essentially a monitoring of the state of a complex plant. The Kalman filter provides an iterative estimation of linear plants in Gaussian noise, whereas another filtering approach, the Kalman-Bucy filter, provides continuously evolving estimates. The multi- stage Bayesian or continuous Bayesian estimator can be utilized for non-linear plants in non-Gaussian noise. Control is approached through several routes, including the Hamilton-Jacobi theory and the method of Pontryagian.
Unfortunately, these solutions are not adaptively computable except for systems and observations of fairly small dimensions. Biological systems are vastly superior in most respects, but it must be pointed out that even though a child can extrapolate the trajectory of a frisbee in a visual image comprised of millions of signals — and even put his hands in a position to catch it — the learning process (system identification) is still long and difficult. Typically, the ideal approach is seldom feasible due to the large dimensionality of practical observations and the sampling rate required to effect estimations which together impose an insurmountable computational burden.
SUMMARY OF THE INVENTION
The present invention disclosed and claimed herein comprises a neural network. The neural network includes an observation input for receiving a timed series of observations. A novelty device is then provided for comparing the observations with an internally generated prediction in accordance with the novelty filtering algorithm. The novelty filter device provides on an output a suboptimal innovations process related to the received observations and the predictions. The output represents a prediction error. A prediction device is provided for generating the prediction for output to the novelty device. This prediction device includes a geometric lattice of nodes. Each of the nodes has associated therewith a memory for storage of spatial patterns which represent a spatial history of the timed series of observations. A plurality of signal inputs is provided for receiving the prediction error from the novelty device, and this received prediction error is filtered through the stored spatial patterns to produce a correlation coefficient that represents the similarity between the stored pattern and the prediction error. A plurality of threshold inputs is provided at each node for receiving threshold output levels from selected other nodes. A threshold memory is provided for storing threshold levels representing the prior probability for the occurrence of the stored spatial patterns prior to receiving the stored spatial patterns. A CPU at each of the nodes computes an updated threshold level in accordance with a differential-difference equation which operates on the stored threshold level, the received threshold levels and the correlation coefficients to define and propagate a quantum mechanical wave particle across the geometric lattice of nodes, and also stores the updated threshold in the threshold memory. A threshold output is provided from each of the nodes for outputting the updated threshold to other nodes. The CPU computes the internally generated prediction by passing the correlation coefficients through a sigmoid function whose threshold level comprises the updated threshold level. The prediction represents the probability for the occurrence of the stored spatial patterns conditioned upon the prior probability represented by the stored threshold level.
In another aspect of the present invention, the prediction device is adapted such that it is operable to learn by updating the stored spatial patterns so as to correlate the prediction error with the position of the quantum mechanical wave particle over the geometrical lattice. This learning is achieved in accordance with the Hebbian learning law.
In yet another aspect of the present invention, the novelty device includes an array of nodes with each node having a plurality of signal inputs that receive the observation inputs, and a plurality of prediction inputs for receiving the prediction outputs of the prediction device. A memory is provided for storing temporal patterns that represent a timed history of the timed series of observations. The prediction observation inputs are then operated upon with a predetermined algorithm that utilizes the stored temporal patterns to provide the prediction error.
In yet a further aspect of the present invention, the novelty device also is adaptive. It learns by updating the stored temporal patterns so as to minimize the prediction error. The learning algorithm utilizes the contraHebbian learning law.

BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying Drawings in which:
Figure 1 illustrates a block diagram of the neural network of the present invention;
Figure 2 illustrates a block diagram of the PA 12 illustrating the novum 14 as an array of separate neurons and the IG 16 as an array of separate neurons;
Figures 3a-3c illustrate the use of traveling wave packets in the IG threshold field;
Figure 4 illustrates a dual construction of the gamma outstar avalanche, consisting of instars from the pixel array falling on each of a sequence of neurons in a "timing chain";
Figure 5 illustrates a recurrent two-layer neural network similar to that utilized by many neural modelers;
Figure 6 illustrates a block diagram of the parametric avalanche which represents the innovations approach to stochastic filtering;
Figure 7 illustrates how the novum and the IG of the PA generate and use the innovations process;
Figures 8a and 8b illustrate schematic representations of the novum neuron and the IG neuron;
Figure 9 illustrates a more detailed flow from the focal plane in the observation block through the novum and the IG for a two dimensional lattice;
Figure 10 illustrates a top view of the IG lattice;
Figure 11 illustrates a tracking system;
Figure 12 illustrates a block diagram of a control module which employs two PA Kalman Filters for the state estimation functions;
Figure 13 illustrates a Luenberger observer;
Figure 14 illustrates the preferred PACM design;
Figures 15 and 16 illustrate graphs of one example of the PA;
Figure 17 illustrates the response of the synaptic weights in the IG;
Figure 18 illustrates the values of the synaptic weights on each neuron of the novum;
Figure 19 illustrates a block diagram of one of the neurons in the IG;
Figure 20 illustrates a block diagram of one of the neurons in the novum;
Figure 21 illustrates an example of an application of the Parametric Avalanche;
Figures 22 and 23 illustrate the time evolution of the angle from the vertical for the example of Figure 21 and the corresponding novum output; and
Figure 24 presents program information for use in a neural network in accordance with the invention.

DETAILED DESCRIPTION OF THE INVENTION
Referring now to Figure 1, there is illustrated a block diagram of the neural network of the present invention. In any system, there exists a complex plant which can be referred to as the observed system 10. The observed system 10 receives control signals μ(t) on its input. To monitor the state of this observed system 10, an observing system 12 is provided. The observing system 12 is essentially the neural network. In the present invention, the neural network is comprised of a two-layer recurrent network. There is an input layer which is referred to as the novum, illustrated by a block 14. The second layer provides a classification layer and is referred to as the IG, as illustrated in a block 16. The abbreviation IG refers to Infinitesimal Generator.
The two blocks 14 and 16 operate together to continuously process information and provide a retrospective classification and prediction (estimate) of the evolution of the classified observation.
The novum 14 provides a prediction assessment of the observed system and receives on one input thereof the output of the observed system 10. The output of the novum 14 provides a prediction error. The novum 14 also receives on the input thereof the output of the IG 16, the output of the IG 16 providing a state estimate in the form of a conditional probability distribution function. The novum 14 essentially extracts the innovations process which will be described in more detail hereinbelow. It decodes the classifications (state estimations) which it receives from the IG 16 and then subtracts that decoded version from the actual received signal to yield the novel residual. The novum 14 contains the internal model of the transition by which the system 10 is observed and is operable to transform the estimated state output by the IG 16 into a prediction of the observed signal received from the observed system 10. As will be described in more detail hereinbelow, the novum 14 operates under a simple learning law wherein the output is zero when the novelty is not present; that is, when the internal model correctly predicts the observed signal, there is no novelty in the observed signal and, therefore, a zero prediction error. Therefore, the novum 14 is driven to maximize the entropy of its output which comprises its state of "homeostasis". The IG 16 operates as a prediction generator. It implements the functions of Kalman gain matrix, the state transition function and the estimation procedure. The output of the IG 16 indicates the probability that prior events have occurred, given the prior history and the current observation.
Both the novum 14 and the IG 16 are comprised of a network of processing elements or "neurons". However, the neurons in the novum 14 are arranged in accordance with an observation matrix such that the observed system is mapped directly to the neurons in the novum 14, whereas the IG 16, which is also comprised of a network of neurons, is arranged as a geometric lattice, and the neurons in the IG 16 represent points in an abstract probability space. Each of the neurons in the IG 16 has an activation level, which activation level indicates the likelihood that the events or states which each particular neuron represents have occurred, given only the current measurements. The output of these neurons indicates the likelihood that those events have occurred, conditioned additionally by the prior history of the observations and the dynamical model, as supplied in the threshold level of the output sigmoid function described hereinbelow. The synaptic weights of the IG neurons learn by Hebbian learning. The neural network of the present invention as illustrated by the observing system 12 is referred to as a "parametric avalanche" (PA), which is operable to store dynamic patterns and recall dynamic patterns simultaneously. It performs optimal compression of time-varying data and incorporates a "moving target indicator" through its novelty factorization. It can track a non-linear dynamic system subliminally before accumulating sufficient likelihood to declare a detection. The PA 12 therefore possesses an internal inertia dynamics of its own, with which the external dynamics are associated by means of the learning law. This internal dynamics is governed by the Quantum Neurodynamic (QND) theory.
Referring now to Figure 2, there is illustrated a block diagram of the PA 12 illustrating the novum 14 as an array of separate neurons and the IG 16 as an array of separate neurons. The novum 14 receives the observation of the plant from the observed system 10 on an input vector 18. The novum 14 outputs the novelty on an output vector 20 which is input to the IG 16. The IG 16 generates the state estimates of the plant for input to the novum 14 on an output vector 22.
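One pass around this novum/IG loop can be sketched in a few lines. The shapes, the tanh and sigmoid choices, and the hand-placed TFD below are all hypothetical stand-ins for the full dynamics described hereinbelow:

```python
import numpy as np

rng = np.random.default_rng(1)
W_ig = rng.standard_normal((100, 64)) * 0.01   # IG synapses on the novelty
W_nv = rng.standard_normal((64, 100)) * 0.01   # novum synapses on the estimates

observation = rng.standard_normal(64)          # plant observation (vector 18)
state_estimate = np.zeros(100)                 # IG state estimates (vector 22)

# Novum: subtract the decoded prediction from the observation (vector 20).
novelty = np.tanh(observation + W_nv @ state_estimate)

# IG: correlate the novelty against the stored templates, then gate the
# activation through a sigmoid whose threshold is low only under a TFD.
activation = W_ig @ novelty
threshold = np.full(100, 5.0)                  # high everywhere...
threshold[20:25] = 0.5                         # ...except under a TFD
state_estimate = 1.0 / (1.0 + np.exp(-(activation - threshold)))
```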
The observed system 10 is illustrated in the form of a focal plane wherein an object, illustrated as a rocket 26, traverses the focal plane in a predetermined path. This constitutes the observation. This observation is that of a dynamic system which possesses a system inertia. The focal plane of the observed system 10 is mapped onto the novum 14. At each point in time, the IG 16 makes a prediction as to the state of the observed system and this is input to the novum 14. If the prediction is correct, then the novelty output of the novum 14 will be zero. At each point in time, the rocket 26 traverses its predetermined path and, if the internal system model in the PA 12 is correct, the state estimates on output vector 22 will maintain the novelty output of the novum 14 at a zero state.
Each of the neurons in the novum 14 is represented by a neuron 28. Each neuron 28 receives a single input on a line 30 from the observed system 10 and a plurality of inputs on lines 32 from each of the neurons in the IG 16, which constitute state estimates from the IG 16. The neuron 28 generates internal weighting factors for each of the inputs 32, as will be described hereinbelow. The neuron 28 provides a single output to the IG 16, which output goes to each of the neurons in the IG 16.
The IG 16, as described above, is comprised of a geometrical lattice of neurons. Each of the neurons in the IG 16 is illustrated by a neuron 34. Each of the neurons 34 receives inputs from the neurons 28 in the novum 14 on input lines 36. Each of the neurons 34 is an independent and asynchronous processor which can generate weighting factors for each of the input lines 36 to internally generate an activation level. The weighting factors provide a stored template and the activation level yields the correlation between an input signal vector (i.e., the novum output) and this stored template. It indicates the degree to which the input signal looks like the stored template across the IG 16. If the activation level is zero, it indicates that the input vector does not match the stored template. However, if there is a match, the activation level is relatively high. This activation level is modified by a threshold field which will be described in more detail hereinbelow, which then generates an output that is input to each of the neurons 28 in the novum 14.
Each of the neurons 34 in the IG 16 has a threshold associated therewith such that an overall threshold field is provided over the lattice in the IG 16. The threshold levels in the IG threshold field are governed by non-linear lattice differential equations. The "natural mode" of wave propagation in the threshold field favors compact, particle-like depressions in the field which are termed Threshold Field Depressions (TFD's). These propagating TFD's are subject to short range interactions from the activation of neighboring ones of the neurons 34 which can initiate and subsequently modulate their trajectories. These interactions, which are continuously updated by the changing location of the TFD's on the lattice, implement the effect of the Kalman gain in transforming the novelty into a correction of the trajectory estimate. The threshold field operation will be described in more detail hereinbelow. However, it can be stated that one or more particle-like wave depressions are propagated across the geometric lattice of the IG 16. This propagation "tracks" the inertia of the plant, as illustrated in the focal plane of the observed system 10. It is important to note that the TFD operates over a number of neurons in the general area of the threshold field depression. Therefore, it is the interaction of the neighboring neurons that controls the propagation of the TFD. The actual wave propagation is illustrated by an output wave 38 on the surface of the IG 16. The wave 38 has a peak and an indicated path which it travels. Although this path is noted as being arcuate in nature, it is a relatively complex behavior which will be described in more detail hereinbelow.
To further exemplify the wave motion in the IG 16, reference is made to Figures 3a-3c. In Figure 3a, the IG 16 is comprised of an IG' layer 42 and a threshold plane 43. The IG' layer 42 represents the geometrical lattice of neurons which receive on the input thereof the output from the novum 14 on the output vector 20, with each neuron in the IG' layer 42 providing an activation level in response thereto. As described above, each of the neurons 34 generates a weighting factor for each of the input lines from the novum 14 in accordance with a learning law. This results in the generation of an activation level for each neuron 34 which indicates the degree to which the input signal looks like the stored template.
The stored template will have been learned in a previous operation from which the weighting factors were derived. As illustrated in Figure 3a, at a given point in time, tp, the activation across the geometrical lattice of the IG' layer 42 will appear as a distribution of activation levels 44. Of course, if there is no output from the novum 14 due to a correct prediction by the IG 16, the activation levels will be virtually zero. This distribution, even if a zero value, has an inertia which is a result of the wavelike motion described above. However, at the instant in time tp, the distribution of the activation levels appears as illustrated in Figure 3a. For the purposes of the present example, there is no prediction error and, therefore, the output of the novum 14 is zero and the resulting activation levels are zero.
At the instant in time tp, there is also a threshold depression 46 in the threshold plane 43. Normally, the threshold level in the threshold plane 43 is at a high level. However, when a TFD is propagated across the threshold plane 43, there will be a threshold field depression present which has an associated inertia. In Figure 3a, this is illustrated as a threshold field depression (TFD) 46. Since the output of the novum is zero, the estimation provided by the IG 16 is correct. Therefore, the TFD 46 will be directly aligned with respect to the distribution of the activation levels 44. At the lowest point in the threshold depression 46, there will be a high output signal level from the IG 16, even for a zero activation level, indicating that there is a high probability that the source has achieved a certain state, which state is conditioned by the level of the threshold depression 46.
If there were an error in the prediction process indicating a low probability of a given event occurring, this would result in an output from the novum 14 and the generation of activation levels in the IG' layer 42. In this case, some adjustment must be made to the system either to learn the new observation or to steer the threshold depression 46 in a different direction in the geometrical lattice of the IG 16, which the output of the IG' layer 42 achieves. This will be described in more detail hereinbelow.
In order to more clearly describe the operation of the TFD 46 and the distribution of activation levels 44, reference is now made to Figures 3b and 3c. In Figure 3b, a trajectory is illustrated which moves between a beginning point 48 at a time tp-1 to an end point 50 at a time tp+1. When traversing between the points 48 to 50, a point 52 is traversed in the center thereof at a time tp. This observation illustrates a sequence of events which occur in a temporal manner. Thus, this is a dynamic system with inertia. At time tp-1 the system must examine the output of the
IG 16 to determine if the state estimates are correct. This is done through the innovation process in the novum 14. At time tp, which represents the next slice in time, the IG 16 must predict what the status of the system will be at this time and these state estimates are again input to the novum 14 which compares them with the observation to determine if the observation is correct. If so, the output of the novum 14 is zero. This continues on to the next slice in time at time tp+1 and so on. Initially, it is assumed that the system has learned the trajectory from point 48 to point 50 and passing through point 52. In accordance with an important aspect of the present invention, it is the generation of the state estimates in a spatio-temporal manner that is accomplished by the IG 16. This is illustrated in Figure 3c. The TFD in this example has been initiated and is propagating across the geometrical lattice of the IG 16 in a predetermined path. This path corresponds to the learned path; that is, the neurons over which the TFD is propagated are the same neurons over which it was propagated during the learning process, which will be described hereinbelow. At each slice in time, from a time reference of, for example, t0, a prediction is made only in the area of the TFD, which prediction either allows the TFD to be propagated along its continued path, or which prediction modifies the path of the TFD. This latter situation may occur, for example, when the identical path is being traversed, but it is being traversed at a slower rate. Therefore, the propagation of the TFD directly corresponds to the inertia of the system whereas the geometrical lattice of the IG corresponds to the probability that the point along the path has occurred at a specific time.
In Figure 3c, the TFD occurs at tp-1 to yield a TFD 54 which is illustrated as concentric circles in phantom lines. The concentric circles basically represent the level of the TFD with the center illustrating the lowest value of the depression. Underlying the TFD, of course, is the activation level in the IG' layer 42. These two layers are illustrated together in an overlapping manner. The TFD propagates from an area 54 in the IG 16 at time tp-1 to an area 56 at time tp. This propagation continues from the area 56 to an area 58 at time tp+1. If the observation made by the novum 14 places the object at point 48 at time tp-1, there would be a zero output from the novum 14. At time tp, if the object traversing the path in Figure 3b were observed to be at the point 52, there would be a zero activation level in the neurons in the area 56 resulting in a high output from the IG 16. In a similar manner, if at time tp+1 the object were observed to be at point 50, this would result in a zero activation level in the neurons in area 58, resulting in a high output from the IG 16. Therefore, the TFD traversing from area 54 to area 56 to area 58 would track the inertia or speed of the object traversing from points 48 to point 52 to point 50 in the observed system in order for the system to provide the appropriate estimation of the state of the system. It is important to note that the inertia of the system has been embodied in the propagation of the TFD and this TFD in conjunction with the model encoded into the underlying neurons yields the state estimation. Because of the zero activation level in the IG' layer 42 (due to zero novelty output), the wave propagation is not altered.
In the situation where the object in the observed system moves from the point 48 to the point 52 to the point 50 in the same pattern that was learned but the inertia of the system changes somewhere prior to arriving at the point 48, then the inertia of the system will be different from that represented by the propagation of the TFD in the threshold plane 43. In this situation, the novum 14 will output a prediction error which will raise the activation level in front of or behind the TFD to essentially modify the threshold depression and either increase or decrease the propagation rate of the threshold depression. For example, suppose that the inertia of the observed system at a time prior to time tp-1 is equal to the corresponding inertia of the TFD in the threshold field 43. However, at time tp-1 the inertia of the observed system changes and, as a result, a prediction error is generated by the novum. Since the output of the novum is greater than zero, this output will be distributed across the geometrical lattice of neurons in the IG 16 and will thus raise the activation levels. Of course, the activation levels are of concern only in the vicinity of a TFD, since the threshold elsewhere in the threshold plane 43 is so high as to result in little or no effect on the TFD. However, in the vicinity of the TFD, this increase in activation level will "steer" or "modify" the propagation of the threshold depression. In the present example, the inertia of the observed system has decreased, thus requiring the inertia of the TFD propagation to decrease or slow down. Therefore, this would result in the activation levels just behind the area 54 increasing, thus causing the propagation rate of the TFD to slow down. This would continue until the output of the novum were zero. At this point, the activation level output by the neurons in the IG would be zero as a result of the zero output of the novum 14, due to the whitening effect thereof. Once the inertia of the system becomes a constant value, the inertia of the TFD will be forced to that inertia and the output of the novum 14 will be forced to a zero value.
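A qualitative sketch of this steering effect on a one-dimensional lattice follows; the parameters and the gradient-based form of the deflection are assumptions intended only to illustrate the mechanism:

```python
import numpy as np

n_sites = 100                        # 1-D IG lattice
position, velocity = 20.0, 1.0       # TFD centre (site units) and its inertia

def steer(position, velocity, activation, rate=0.5):
    """One step of TFD motion. `activation` is the per-site activation of
    the IG' layer, which is zero whenever the prediction is correct."""
    grad = np.gradient(activation)
    i = int(round(position)) % n_sites
    # Activation rising ahead of the TFD pulls it forward (speeds it up);
    # activation rising behind it pulls it back (slows it down).
    velocity += rate * grad[i]
    position = (position + velocity) % n_sites
    return position, velocity

# With zero novelty the TFD simply coasts at its current inertia:
position, velocity = steer(position, velocity, np.zeros(n_sites))
```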
The PA 12 is comprised in part of a number of networks which are integrated together. These will each be described in general and then the way that these networks and learning laws are integrated will be described.
Grossberg Avalanche
The "gamma outstar avalanche" of Grossberg is a well-known neural network architecture which utilizes Hebbian learning and an outstar sequencing scheme to
effect the recording of a sequence of images into the synapses of each pixel of a focal plane array of neurons. This is described in Grossberg, S., "Some Networks That Can Learn, Remember and Reproduce Any Number of Complicated Space-Time Patterns, I", J. Math. & Mech., Vol. 19, No. 1, 1969, and Grossberg, S., "Some Networks That Can Learn, Remember and Reproduce Any Number of Complicated Space-Time Patterns, II", Studies in Appl. Math., Vol. XLIX, No. 2, June 1970. In the Grossberg avalanche, once the image sequence has been learned, it can be recalled by turning off the illumination (or the learning law) and replaying the sequence of outstar strobes to the pixel array, whose synaptic weights cause activation of the pixels in such a way as to replay the original motion picture. If one were to look at the vector of synapses on a single pixel neuron of an outstar avalanche after learning had occurred, one would see a spatially coded representation of the time series of illumination that fell on that pixel during the training sequence.
A dual construction of the gamma outstar avalanche can be made which consists of instars from the pixel array falling on each of a sequence of neurons in a "timing chain". This is illustrated in Figure 4. The pixel array is represented by an array 60 with an illuminated pattern thereon. The output of the pixel array 60 is input to a chain of neurons 62, with a pulse 64 represented as traveling down the chain of neurons. Implementation of this type of "instar" avalanche is not a simple task, nor is it obvious that by itself it would have any utility. The only difficulty in launching this instar avalanche is that it requires one to send a coherent, compact pulse of activation down the chain of neurons 62 which parameterize the time axis. This cannot be done with the first order equations used by Grossberg in the activation equations, because the nonlinearities will deform the initially injected pulse and cause "intersymbol interference" between adjacent images. A compact nondispersing pulse could be propagated using a simple shift register algorithm. However, such an algorithm requires a global supervisor which can communicate with all the neurons in the chain. This becomes computationally unmanageable when the chain is generalized to multiple dimensions and the number of neurons becomes large. Therefore, certain nonlinear differential-difference equations are employed to achieve our purpose. These are described below in the section entitled "Quantum Neurodynamics".
The instar avalanche could also be launched with some very simple estimations. However, these estimations become computationally intractable when it is necessary to go to higher dimensional neural lattices with more than one pulse propagating therein.
With further reference to Figure 4, it is noted that each of the neurons in the neuron chain 62 is encoded with a compact representation of a pattern, and its neighbors encode the causal context in which that pattern occurred. Moreover, the coding is associatively accessible in that the spatial patterns are concentrated into the synapses of individual neurons in the neuron chain 62. Thus, if the activation response of those neurons is coupled to the threshold dynamics (as will be described hereinbelow) it is possible to stimulate recall of a time series associatively from the point of greatest correlation with the current pattern. The outstar avalanche of Grossberg cannot do this.
There is a duality between spatio-temporal pattern representations in the instar and the outstar avalanches. In the outstar avalanche, the temporal signals are compactly represented in the synapses of each pixel and the spatial pattern is distributed over the array of pixels. In the instar avalanche, on the other hand, the spatial patterns are compactly represented in the synapses of each timing neuron, and a temporal pattern is distributed over the chain of timing neurons.
Two-Layer Recurrent Network
Referring now to Figure 5, there is illustrated a recurrent two-layer neural network similar to that utilized by many neural modelers. The network is comprised of a first layer 66 and a second layer 68. The first layer is comprised of a plurality of neurons 70 and the second layer 68 is comprised of a plurality of neurons 72. Each of the neurons 70 is labelled a1-an, with the a1, the an and the ai neurons 70 being illustrated, the ai neuron representing an intermediate neuron. In a similar manner, the neurons 72 in layer 68 are labelled b1-bn, with the b1, the bn and the bj neurons 72 illustrated, the bj neuron 72 being an intermediate neuron.
Each of the neurons in the first layer 66 receives an input signal from an input vector 74. Each of the neurons in the first layer 66 also receives an input signal from each of the neurons 72 in the second layer 68, with an associated weighting value for each such input signal. In a similar manner, each of the neurons in the second layer 68 receives as an input a signal from each of the neurons in the first layer 66 and associates a weighting value therewith. Each of the neurons in the second layer 68 also receives an input from each of the other neurons therein and associates an appropriate weighting factor therewith.
The learning objective for this type of network, when utilized in prior systems, is to build a compact (preferably a single neuron) code in the second layer 68 to represent a class or cluster of patterns that were presented to the first layer 66 in not-necessarily compact distributed form. The recall objective of these networks is to reactivate a pattern of codes in the second layer 68 which identifies the class or set of classes of which the input pattern of the first layer 66 is a representative. Sometimes the output objective is itself a distributed pattern, but more often in the prior systems it is to produce a "delta function". This delta function representation is a low entropy distribution of activations, i.e., one that is unlikely to appear by chance. This low entropy distribution of activations amounts to an unequivocal declaration that the input pattern belongs to the class of patterns which the lone active neuron in the second layer 68 represents. Such a representation requires no further pattern processing to communicate its decision to the human user, although it is easy, if desired, to employ an outstar from the active neuron in the second layer 68 to another layer to generate a distributed picture for the human user. This is essentially what is accomplished in the Hecht-Nielsen counterpropagation network, and this low entropy distribution of activations is essentially how prior systems operate.
In the network illustrated in Figure 5, there are a number of learning methods utilized. In general, the "desired output" must be made available to the network. This is done explicitly in the "heteroassociative" schemes, in which the desired output is a distributed output, and it is done covertly in all the rest.
Essentially, all learning is "supervised" learning. In one type of learning in prior systems, the Kohonen self-organized feature map, the input pattern is propagated to the second layer 68, where each neuron 72 responds according to its current synaptic "match" to the input pattern. But this prior learning algorithm is not allowed to apply to all the elements of the second layer 68, because the desired second layer 68 output pattern is a delta function over a single one of the neurons 72 whose synapses represent the center or average of the elements of one pattern cluster. To obtain this desired output, learning is allowed to occur only in the immediate neighborhood of the neuron 72 which has the strongest response to the input pattern. As training proceeds in this type of prior system, the radius of that neighborhood is shrunk to zero so as to achieve the delta function objective.
In another type of prior learning, the Carpenter-Grossberg adaptive resonance network, the desired output is also a delta function. Once again, this is accomplished by finding the neuron 72 in the second layer 68 with the strongest response to the input pattern and preventing the learning algorithm from applying to any other neuron 72, unless the strongest response is obtained from a neuron 72 which presents a "bad match" to the input pattern, in which case the learning algorithm is allowed to apply only to some other single neuron 72 in the second layer 68.
Similar learning effects to those of the prior art systems described above are achieved with "masking fields". However, implementation of these schemes in software is very difficult. The PA 12 of the present invention utilizes a learning algorithm that is supervised, but in a locally computable neural form.
Quantum Neurodynamics

As described above, the TFD is propagated as a wave across the IG 16. The propagation of compact, coherent wave-particles over a discrete lattice was discovered by accident by Fermi, Pasta and Ulam in their study of the finite heat conductivity of solids. The differential (in time) difference (in space) equation which they were studying is now called the Fermi-Pasta-Ulam (FPU) equation. When the FPU equation is written as a continuum in the spatial coordinate, it is a form of the Korteweg-deVries (KdV) equation, which has been known for some time to model the shallow water solitary waves of Russell.
Until recently, very little of the work performed in the field of non-linear waves has been applied to the modeling of soliton waves in more than one spatial dimension. However, recent progress has been made concerning the Non-Linear Schroedinger (NLS) equation as it relates to the modeling of Langmuir waves in plasma, and there are other equations now being extended into higher dimensions.
The NLS equation is a complex wave equation,
ih ∂E(x,t)/∂t + ((h²/2m)∇² + f(|E|))E + U(x,t)E = 0,
wherein h is Planck's constant, i is the imaginary unit, m is a scaling constant identified with the mass of a particle, ∇² is the Laplacian operator, f is a real valued function chosen to offset the dispersion of the wave, and U is a real scalar field which may be identified with the refractive index of the propagation medium or with an externally applied force field. Depending on the form of the function f, the NLS equation will be solved by soliton-like wave particles such as the "Gaussons" described in Bialynicki-Birula and Mycielski [Annals of Physics, 100, pp. 62-93, 1976]. Once they are initiated, these particles propagate in accordance with the ordinary laws of particle physics, subject to the control of gradients in the potential field U(x,t). In the parametric avalanche, the potential field U(x,t) is established by the activation levels L(x,t) of the neurons of the IG 16, which are in turn determined by the signal vector n(t) which is received from the novum 14. It is easy to show, and is described hereinbelow, that after the PA has been entrained, the prediction errors carried by n(t) generate precisely the right potential field U(x,t) whose gradient vector ∇U(x,t) will pull the errant wave particle toward the minimum error (in the sense of maximum entropy) estimator.
In the parametric avalanche, it is convenient to compute the solution E(x,t) locally, directly from a discrete form of the NLS, because then each processor in the IG lattice needs to communicate only with its nearest neighbors to propagate the wave. This solution can then be used to condition or sensitize the neurons at positions x in the lattice by coupling it to the IG threshold field through the equation T(x,t) = 1 - |E(x,t)|, so that the baseline threshold level is 1 except in the vicinity where |E(x,t)| indicates a high probability of finding the wave-particle, and there the threshold is depressed below the baseline level. Note, then, that there are now two ways to stimulate an output signal from a neuron in the IG: one is to supply an input signal to the synaptic template which so strongly matches (correlates with) the template that the resulting activation level exceeds the threshold T; and the other is to pass a conditioning wave particle over the neuron so that its threshold T is lowered below whatever the activation level is at the time. These dynamics can be implemented using a discrete form of the nonlinear wave mechanics described in the paper "Nonlinear Wave Mechanics" by Iwo Bialynicki-Birula and Jerzy Mycielski (Annals of Physics, 100, 1976). As will be described hereinbelow, these TFDs also act as markers when considered in conjunction with the Kalman-Bucy filtering, because they mark the location of a maximum likelihood state (or feature) estimate.
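A minimal numerical sketch of this scheme on a one-dimensional lattice follows. It assumes the standard focusing NLS normalization i dE/dt + d²E/dx² + 2|E|²E = 0 (constants absorbed, rather than the patent's general f and U) and split-step Fourier integration; the sech-shaped envelope soliton stands in for a TFD wave-particle, and the threshold field is recovered from it as T = 1 - |E|.

```python
import numpy as np

# Split-step Fourier integration of i dE/dt + d^2E/dx^2 + 2|E|^2 E = 0 on a ring.
N, L, dt = 256, 40.0, 0.01
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
E = np.exp(1j * x) / np.cosh(x + 10.0)           # unit soliton at x = -10, moving right

for _ in range(1000):                            # integrate to t = 10
    E = np.fft.ifft(np.exp(-1j * k ** 2 * dt) * np.fft.fft(E))   # dispersive step
    E = E * np.exp(2j * np.abs(E) ** 2 * dt)                     # nonlinear phase step

T = 1.0 - np.abs(E)      # threshold depressed only where the wave-particle sits
```

The pulse translates without dispersing, and the depression in T travels with it, which is the locality property claimed above: each lattice site needs only its neighbors (here, implicitly, through the transform) to propagate the wave.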
With the TFDs, a mechanism exists by which to dictate where the learning law will apply in a neural lattice, and at the same time an instar avalanche can be implemented without utilizing rough estimations. Suppose, for example, that the neural lattice in the second layer 68 of the two layer network of Figure 5 has randomly initialized synaptic weights, i.e., uniformly in the interval [-1, +1], and that at the current time the threshold field exhibits a single TFD at some location x' in the lattice. Then the current image will elicit a random response in the activation levels in the second layer 68, but the output signals will be nearly zero everywhere except in the neighborhood of x', because there the threshold is so low that almost anything (except a strong antimatch to the current pattern) will produce a strong output. Therefore, a signal Hebbian learning law (i.e., one in which the weight change is proportional to the product of the presynaptic signal times the output signal of the postsynaptic neuron) will capture the input pattern and store it at, and to a lesser extent near, x'. Therefore, the input pattern is stored in a nearest neighbor manner. The learning is shut down everywhere else because the baseline threshold level of the threshold field 43 squelches random responses in the quiet range.
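A sketch of this gating effect follows; the lattice sizes, learning rate, fan-in normalization, and slope constant m are illustrative assumptions, and the sigmoid is the IG threshold function given later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_ig, m, lr = 64, 100, 2.0, 0.1
W = rng.uniform(-1.0, 1.0, (n_ig, n_pix))   # randomly initialized synapses

T = np.ones(n_ig)                           # baseline threshold field
T[40:47] = 0.1                              # one TFD centered near x' = 43

pattern = rng.standard_normal(n_pix)        # current image from the novum
a = (W @ pattern) / n_pix                   # fan-in-normalized activations (~0 for a
                                            # random template, so baseline T squelches)
out = 1.0 / (1.0 + np.exp(4.0 * m * (T - a)))   # strong only under the depression
W += lr * np.outer(out, pattern)            # signal Hebbian: learning localized at x'
```

Away from the TFD the output (and hence the weight change) is vanishingly small; under it, nearly any pattern fires the neuron and is captured, exactly the nearest-neighbor storage described above.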
Since TFDs move like particles in a geodesic across the neural lattice of the IG 16, they can do much more than a simple parameterization of a time axis as in an ordinary avalanche. They serve as idealized internal models of the parametric trajectories of features in the observed scene. By extending the timing chain of neurons into two and three dimensions, it is possible to obtain the ability to store and recall an enormous number of spatio-temporal patterns in the form of coded trajectories, and the size of that set is larger still if duplicate storage can be avoided by arranging for common patterns to be encoded at the intersection of trajectories (as will be described hereinbelow). Also, it is possible to arrange to store not the whole image sequence as it comes in — which would quickly deplete the storage capacity of this and any other conventional design — but only the novel residual that is left over after everything that has previously been stored in the current context has been subtracted from the scene.
It is important to note that if the TFD were a spatial delta function, a single TFD could only encode N levels of the parameter by its position in an N-neuron lattice, because a delta function disappears when its "peak" is between lattice points. But a distributed TFD which spans as many as six or seven lattice points at a time can represent a virtual continuum of positions between adjacent neurons. In fact, the quantization of the interpolation should be on the order of m times the word-length quantization of the activations of the m neurons under a given TFD; i.e., if the TFD amplitude at each neuron is coded into an n-bit word and the TFD spans m neurons at a time, then the interpolation ability of the peak of the TFD between neurons should be on the order of m times n bits.
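The interpolation claim can be sketched in a few lines: the sub-lattice position of a distributed depression is recoverable as the center of mass of its depth, which a one-point delta function could not provide. The depression width and lattice size here are illustrative.

```python
import numpy as np

i = np.arange(50, dtype=float)                   # lattice coordinates
true_peak = 20.37                                # TFD barycenter between lattice points
T = 1.0 - 0.9 * np.exp(-0.5 * ((i - true_peak) / 2.0) ** 2)   # distributed TFD

depth = 1.0 - T                                  # depression depth at each neuron
estimate = (i * depth).sum() / depth.sum()       # recovers ~20.37, a virtual continuum
```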
Learning Law

There are two relatively simple learning laws, the Hebbian learning law and the contra-Hebbian learning law. The Hebbian learning law is the archetype of almost all of the so-called unsupervised learning laws, yet it is almost never used in the original form because it fails to account for the temporal latencies which characterize causal processes, including the classical conditioning behavior of animals. Some variants account for the direction of time by convolving one or both of the presynaptic or postsynaptic signals with a one-sided distribution, such as in Klopf's Drive
Reinforcement models as defined in Klopf, A.H., "A Neuronal Model of Classical Conditioning," Psychobiology, Vol. 16, pp. 85-125, 1988. Other variants attach significance to positive change by disallowing learning when one or more of the signals is decreasing. One of the two learning laws utilized is the original Hebbian learning law:
δwij = α yi yj,

where α > 0 and yi, yj are the "axonal" signals on both sides of the synapse. Note in particular that one of the other common forms, which utilizes the prethresholded synaptic activations, is not utilized.
The contra-Hebbian learning law is a special case of the well-known delta rule. The delta rule adjusts the synaptic weight in accordance with Widrow's stochastic gradient method to drive the actual output yj toward a "desired" output dj. The formula for the delta rule is:
δwji = -α xi (yj - dj),

and it is easy to notice that in the special case where dj = 0, this is just the negative of the Hebbian rule. This special case is therefore called "contraHebbian" learning. It is well known for its use in the construction of novelty filters. See the book by Kohonen, "Self-Organization and Associative Memory" (Springer, 1984). In the Parametric Avalanche, contraHebbian learning is used for the synapses of the novum which receive the estimates from the IG, and Hebbian learning is used for the synapses of the IG.
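The two laws, sketched side by side as weight-update helpers (the rate α, the vector shapes, and the usage names are illustrative; yi and yj are the post-threshold signals, as the text specifies):

```python
import numpy as np

def hebbian(alpha, y_pre, y_post):
    """IG synapses: delta_w[i,j] = alpha * y_post[i] * y_pre[j], with alpha > 0."""
    return alpha * np.outer(y_post, y_pre)

def contra_hebbian(alpha, x_pre, y_post):
    """Novum synapses: the delta rule with desired output d = 0,
    i.e. delta_w[j,i] = -alpha * y_post[j] * x_pre[i]."""
    return -alpha * np.outer(y_post, x_pre)

# Hypothetical usage:
#   W_ig  += hebbian(0.1, novelty, ig_output)
#   W_nov += contra_hebbian(0.1, ig_estimate, novum_output)
```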
The Integrated PA Architecture
The fundamental objective of every filtering problem is to determine (i.e., to estimate) the conditional probability density P(x(t) | Y(t')) for the state or "feature" or "parameter" of the observed system at time t, given all the observations Y(t') = {y(s) | t0 ≤ s ≤ t'} up to some possibly different time t'.
The PA 12 is a two-layer architecture as described above, consisting of the novum 14 and the IG 16, this architecture being somewhat similar to that illustrated in Figure 5 with the exception that the output of the second layer 68, corresponding to the IG 16, is also fed back to the first layer 66, which corresponds to the novum 14, as an additional input. The novum 14 provides an approximation to the innovations process of the input time series. The IG 16 stores the differential model of the observed system.
Referring now to Figure 6, there is illustrated a block diagram of the innovations approach to stochastic filtering, which has been taken from the paper by Kailath, T., "An Innovations Approach to Least-Squares Estimation, Part I: Linear Filtering in Additive White Noise", IEEE Trans. Automat. Contr., Vol. AC-13, pp. 646-655, Dec. 1968. Superimposed on that block diagram is a partition showing which functions are performed by the novum 14 and which are performed by the IG. Previous to Applicant's present invention, all known implementations of the Kalman filter were based on the well-known iterative formulation and refinements thereof. These previous systems are based on Gaussian statistics in either linear or linearized systems. The bulk of the computational burden is taken up by the matrix operations of the Kalman gain matrix (shown as the operator "K") in Figure 6.
The novum 14 receives the observation of the plant (i.e., the input) and the IG 16 generates the state estimates of the plant. The "algorithm" of the PA 12 is quite different from that shown in Figure 6. It is based on the more general multi-stage Bayesian estimator as described in Ho, Y.C. and Lee, R.C.K., "A Bayesian Approach to Problems in Stochastic Estimation and Control", IEEE Transactions on Automatic Control, Vol. AC-9, pp. 333-339, October 1964. In order to describe how the PA 12 operates, the procedure described in the Ho and Lee paper will be stepped through to show how the PA 12 accomplishes each step. As can be seen from both Figures 6 and 7, the state transition operator F and the observation operator H within the PA 12 have a "hat" disposed thereover. This indicates that the PA 12 "learns" its internal representation of these operators, as will be described in more detail hereinbelow. Since the Ho and Lee paper assumes that these operators are given, for the purpose of description the estimates of these operators are assumed to be "good". The steps followed by the Ho and Lee paper are as follows:
Step 1 — Evaluate P(xk+1 | xk)
Allow the threshold field to evolve forward one increment of time. The new threshold field, T(x,tk+1), represents the conditional likelihood function for the states (features) x given the prior likelihood function for those states.
Step 2 — Evaluate P(zk+1 | xk, xk+1)
In Ho & Lee's terminology, zk+1 is the new observation vector, which in Figure 7 is denoted by y(t). To evaluate this probability, current activations L(x) are fed through the threshold field T(x,tk+1) to produce the a-priori state estimates E(x|T). This IG output is then passed through the synapses of the novum 14, which implement the Ĥ matrix. Since Ĥ implements the internal model of the observation matrix H, the result is the estimate of the observation as predicted by the IG 16. (Note that the observation itself is treated as a likelihood function over the receiving transducers, so this estimate is itself a likelihood function.) The novum 14 then subtracts this estimate from the current signal to produce the innovations process.
Step 3 — Evaluate P(xk+1, zk+1 | Zk)
This is the joint conditional probability for the new state and the new observation, given the history of all the observations up to and including time tk (denoted by the uppercase Zk). This is accomplished in the PA 12 by using the innovations process ("novelty") from the novum 14, together with the covariance connections among the elements of the IG 16, to deflect the trajectory of the threshold field wave-particle(s). The interaction between the innovations process and the threshold wave trajectories is dependent on the way those trajectories are realized. If they are realized using Quantum Neurodynamics, it can be accomplished by warping the underlying refractive index field of the propagation "medium", as will be described hereinbelow in the section entitled "Implementation of the Quantum Neurodynamics". This is qualitatively equivalent to the application of the Kalman gain matrix to the innovations process to obtain the correction vector for the tangent to the plant trajectory. This is defined as:
dx̂(t)/dt = F(t)x̂(t) + K(t)n(t),
where F(t) is the state transition operator, K(t) is the Kalman gain, and n(t) is the innovations process.
Step 4 — Evaluate P(xk+1 | Zk+1)
The novelty resulting from the new observation is passed through the updated IG synapses (along with any recurrent IG signals) to produce the new activation levels L(x) in the IG 16, and then L(x) is passed through the threshold field to obtain P(x|T) for the new set of observations. Note that if the innovations process contains no new information, then it will be orthogonal to the patterns stored in the synapses, and the resulting activation field which "drives" the output through the threshold field will contain white noise. Thus, all the information required to estimate the state of the observed system will already be contained in the threshold field in this case.
Step 5 — Select the state(s) corresponding to the maximum likelihood estimate(s).
If all other processes were held constant at this point, simple iteration of the PA 12 dynamics would result in contrast enhancement of the likelihood function contained in the output signals of the IG elements due to the effect of the sigmoid threshold function. This would approximate the selection of a maximum likelihood state estimate by driving P(x|T) toward an indicator function for the set of states whose L(x) values exceeded the threshold levels T(x) . The other processes are not held constant, and, therefore, although contrast enhancement will occur, it will not converge in general to the limiting case. Whereas these steps are to be performed sequentially in Ho & Lee's description, they can be executed in parallel and asynchronously in the PA 12, thus implementing a true Kalman-Bucy filter.
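The five steps can be collected into a toy iteration, sketched below under heavy simplifying assumptions: the threshold field is merely shifted one lattice site per tick instead of obeying a wave equation, step 3's deflection of the TFD trajectory is omitted, and all sizes, rates, the sigmoid slope m, and the stand-in observation stream are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_ig, m = 8, 32, 2.0
W_ig = 0.1 * rng.standard_normal((n_ig, n_obs))    # novum-to-IG synapses
W_nov = 0.1 * rng.standard_normal((n_obs, n_ig))   # IG-to-novum synapses (H-hat)
T = np.ones(n_ig); T[10:14] = 0.2                  # threshold field with one TFD
a = np.zeros(n_ig)                                 # IG activation levels L(x)

def sigma_ig(a, T):
    return 1.0 / (1.0 + np.exp(4.0 * m * (T - a)))

def pa_step(T, a, y, W_ig, W_nov, lr=0.1):
    T = np.roll(T, 1)                       # step 1: evolve the threshold field
    u = sigma_ig(a, T)                      # a-priori state estimate E(x|T)
    y_hat = W_nov @ u                       # step 2: predicted observation via H-hat
    n = y - y_hat                           # innovations process ("novelty")
    # step 3 (deflecting the TFD trajectory by n) is omitted in this sketch
    a = W_ig @ n                            # step 4: new activation levels L(x)
    W_ig += lr * np.outer(sigma_ig(a, T), n)    # Hebbian recode (system identification)
    W_nov += -lr * np.outer(n, u)               # contraHebbian whitening in the novum
    return T, a, W_ig, W_nov                # step 5: sigma_ig(a, T) sharpens P(x|T)

for t in range(50):
    y = np.sin(0.3 * t + np.arange(n_obs))  # stand-in observation stream
    T, a, W_ig, W_nov = pa_step(T, a, y, W_ig, W_nov)
```

In the sketch, as in the text, the steps run inside a single update rather than as separate sequential passes, which is what permits the parallel, asynchronous execution claimed above.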
System Identification
The five steps above are not all that is going on in the PA 12. They constitute only the "tracking" effort of the PA 12 estimation algorithm. There is also a "system identification" effort going on. For the description above, it was assumed that the system model F̂ and the observer model Ĥ were "good", but in fact they must be continually improved (i.e., learned). In the PA 12, the novelty that resulted from Zk is utilized to recode the synapses of the IG 16 so that those belonging to elements with the strongest outputs are vectored most strongly toward the observation. The novelty is itself the conjugate gradient vector for minimizing the error between the actual observation and the predicted observation that resulted from the state estimate. The new encoding will not affect the current signal from the IG 16 to the novum 14, nor will it deflect the physical trajectory of the threshold wave particle. But it will deflect the apparent trajectory of the threshold wave particle the next time it crosses its physical trajectory, because the processing elements in its path now encode different features. This provides the improvement in the internal model of the dynamics of the observed system. It has no effect on the current estimation effort, but it will affect the convergence rate for the next observation of the same trajectory.
The observer model Ĥ is contained in the IG-to-novum synapses of the novum 14, and it is established with a very short time constant utilizing delta-rule learning. The threshold level over the novum 14 is level and does not vary with time. That level defines the maximum information level for the novum 14 activations, i.e., the maximum entropy level. That level is the "desired output" for each processing element (pixel or neuron) of the novum 14, which, for simplicity of computation, has been chosen to equal zero.
When the PA 12 is used for passive observation and estimation only, the observation vector y(t) is supplied to the novum through hard-wired, nonlearning synapses, one component yj(t) to each novum neuron 28. Feedback signals P(x|T) from the IG 16 enter through learnable synapses (i.e., input lines 36). In the ideal case, P(x|T) will be a traveling delta function, so that at any one time only one of the IG input lines 36 on each neuron will have a signal on it. The delta-rule learning algorithm will mold that synaptic weight into a mirror image of the signal component, yj(t), falling on the pixel at the same time (the mirror being at the threshold level). Thus, the synaptic weights will be a spatially recorded replica of the signal waveform, and only those synapses which were connected to IG neurons 34 activated by the traveling delta function actually partake in the representation. Others are available for encoding observations of unrelated signal patterns.
The delta rule learning law of B. Widrow has the form,
δwji = -αxi(yj - dj),
where yj is the output of the neuron (after thresholding) and dj is the desired output. Therefore, when the "desired output" is zero, the delta rule takes the form,
δwji = -α yj xi,
which is the negative of the Hebbian law. We call this particular learning law the "contraHebbian" learning law. When used in the Parametric Avalanche, it has the following important effects. During the early stages of learning, when the internal model is a poor predictor of the input, the output of the novum 14 (the novelty) is very strong — close to +1 or -1. The inputs to the novum from the IG are zero everywhere except for a few highly localized regions of high probability, and therefore learning is controlled principally by the magnitude of the prediction error. This means that in the early learning phases (or any time that a "surprise" is present in the input), the adaptation of the weights in the novum 14 to incoming patterns will have a very short time constant, whereas after the IG model becomes a good predictor, the weight adaptations will have a long time constant. In every case, the weight changes will tend to record for each IG neuron 34 an "average" of the input patterns that were present when that IG neuron 34 was active, but as the learning improves, the variance in the averaged population becomes small.
Implementation of the Quantum Neurodynamics
The motion of the wave-particles which become associated with the dynamical model of the observed system may be achieved in basically two ways. The first is to use an appropriate bell-shaped depression in an otherwise level threshold field and simply translate it in the desired direction by incrementing indices in the data array, or, if more than one such depression is to be moving simultaneously, by vectoring the data itself. This is appropriate for any implementation of the PA 12 on general-purpose computing equipment or special-purpose uniprocessor/vector processor equipment.
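On a uniprocessor, this first method reduces to translating the depression by incrementing indices into the threshold array, which preserves the pulse shape exactly. A minimal sketch, with an assumed Gaussian depression:

```python
import numpy as np

i = np.arange(256, dtype=float)
T = 1.0 - 0.8 * np.exp(-0.5 * ((i - 30.0) / 4.0) ** 2)   # bell-shaped TFD at site 30

for _ in range(100):
    T = np.roll(T, 1)    # one lattice site per tick; the pulse shape never disperses
```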
When parallel processing equipment is utilized, the foregoing method will not be appropriate or even desirable, because that method is not a "local" algorithm. But it is still possible to generate this wave-particle motion with a local algorithm based on the mathematics of solitary wave propagation.
The nonlinear Schroedinger (NLS) equation is one route to the extension of the required dynamics to two and three dimensions. This equation describes the motion of photons and phonons in dispersive media, such as Langmuir waves in plasma. The wave-particle solutions propagate in a medium that is characterized by a nonlinear refractive index which need not be spatially uniform and which, therefore, has the requisite properties for control and modulation of the trajectories of the wave-particles. As described above, this refractive index can be tied directly to the response of the IG neurons 34 to the novum "error" signal to induce gradients in the refractive index field which will deflect the soliton trajectories toward smaller errors, as required by the Kalman-Bucy filter.
The particle motion that is obtainable with these equations will also have extremely useful nonlinear interactions which will enable networks of PA 12 modules to build inferential models of the observed system, i.e., models which are not based on any prior observation of the system's behavior. When two TFD's collide during "daydreaming" (i.e., not as a result of current observations) , they will be deflected into trajectories in the lattice that may never have been coded by observed successors to the pre-collision trajectory segments. Whatever events lie in the new paths will then be induced to follow from the "gedanken-experiment". That does not mean that the same events will follow from an equivalent laboratory experiment, but if they do not, then the IG model will have to be improved. This is the equivalent of a hypothetical "production rule" involving the interaction of real-world events.
Processing Elements

Referring now to Figures 8a and 8b, there are illustrated schematic representations of the novum neuron 28 and the IG neuron 34, as described hereinabove with respect to Figure 2. The novum neuron 28 in Figure 8a receives an input from other neurons in the novum lattice on the lines 32. Additionally, it receives an input from each of the points in the focal plane on input lines 30. Weighting factors are associated with each of the input lines 32 and each of the input lines 30. In general, the external input vector Y(t) will be fanned out so that every novum neuron 28 receives every component Yj(t) of the vector. This is utilized with respect to learning; however, without addressing the adaptive control problem, it will be adequate to associate each component of Y(t) with exactly one novum neuron and to fix the synaptic weight at +1 (hardwired, not subject to learning). Basically, this is achieved by fixing the synaptic weights on the input lines 30 to be equal to the Kronecker delta function δij. For identification purposes, novum neurons 28 are identified with integer indices (such as "i") and the IG neurons 34 will be identified with vector indices (such as "x") corresponding to their coordinates in a geometric lattice.
The forward flow of signals from the novum 14 to the IG 16 implements an instar avalanche; that is, the novum 14 is a pixel array, while the threshold field of the IG 16 supports the propagation of TFDs on the two or three dimensional IG lattice. In particular, the threshold function for the IG neuron 34 at a lattice position x is given by:
σIG(a(x,t);T(x,t)) = [1 + exp{4m(T(x,t) - a(x,t))}]^-1
where a(x,t) is the activation level of the IG neuron 34 at x; m is the steepest slope, which occurs at a(x,t) = T(x,t); and T(x,t) is governed by a non-linear wave equation which admits soliton-like depressions below the normal level of T(x,t) = 1. The "current" position of T(x,t) is illustrated schematically in the interior of the IG neuron 34 by the coordinate axis 80. As described above, the learning law of the IG 16 is the Hebbian law. The output signals of the IG 16 represent the conditional probability density P(x|Y(t)) for the state of the observed system in the context of the prior observations Y.
The feedback flow of signals from the IG 16 to the novum 14 implements an outstar avalanche except that the learning law in the novum 14 is the contra-Hebbian law. Moreover, the "time" domain is factored through the two or three dimensional IGs 16 instead of being a simple one dimensional domain. The result is that when a "pattern" is recalled, it will be the negative of the observed pattern so that if the recall is executed at the same time that the original pattern is replayed into the sensor array, the output of the novum 14 is zero from all pixels. The threshold function of the novum 14 is given by:
σN(a(t)) = -1 + 2[1 + exp{-2ma(t)}]^-1
independent of position within the novum lattice. A waveform 82 represents the output response of the novum neuron 28.
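Both threshold functions transcribe directly into code; in this sketch the slope constant m is treated as a free parameter:

```python
import numpy as np

def sigma_ig(a, T, m=2.0):
    """IG output: [1 + exp{4m(T - a)}]^-1; near zero wherever a is below T."""
    return 1.0 / (1.0 + np.exp(4.0 * m * (T - a)))

def sigma_novum(a, m=2.0):
    """Novum output: -1 + 2[1 + exp{-2ma}]^-1, a bipolar sigmoid through zero."""
    return -1.0 + 2.0 / (1.0 + np.exp(-2.0 * m * a))
```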
Referring now to Figure 9, there is illustrated a more detailed flow from the focal plane in the observation block 10 through the novum 14 and the IG 16 for a two dimensional lattice. The novum 14 is comprised of an input plane 84 and an output plane 86. The focal plane in the observation block 10 is considered to have a plurality of pixels, with each pixel represented by f(y), which represents a neuron y (reference numeral 83) in the novum input plane 84. One input of this novum neuron is the vector output 22 from the IG 16, represented as u(x). Each of the novum neurons y, described above, has a plurality of weighting factors associated with each of the IG inputs. These are represented by a weighting template, with each point representing a weighting factor wN(y,x) for the novum neuron y and the synaptic input from an IG neuron x. The dot product of u(x) and wN(y,x) is taken to provide an output array 90, which is summed over x for each of the synaptic inputs and each of the novum neurons. This is then summed in addition block 92 with the input vector f(y). The output of the addition block 92 is then passed through the threshold function of the novum 14, as represented by a block 94. This provides the output of the novum n(y) for the specific neuron 83 in the novum 14, represented by a location 96 on the novum output plane 86. The output of the novum 14 is then input to the IG 16 on the vector input 20. The IG 16, described above, is comprised of an activation plane 42 and a threshold plane 43. The activation plane is referred to as the IG' 42. The IG' 42 is comprised of an input plane 98 and an output plane 100, the output plane 100 comprising the output of the IG 16. Each of the neurons 34 in the IG 16 is, as described above, arranged in a geometric lattice. A particular one of the neurons 34 utilized for this example is illustrated by a specific neuron 102 in the input plane 98. The neuron 102 has associated therewith a template 104 wherein the weight values are stored. For each of the neurons 34 in the IG 16, there is one weight associated with each of the novum neurons 28. Therefore, the template 104 is illustrated with a single point 106 representing the weighting factor wIG(x,y). The dot product of the output vector for the novum n(y) and the associated weighting vector wIG(x,y) is taken to provide a template output 108. The template output 108 is then input to a threshold block 110. The threshold block 110 receives on its other input the threshold function T(x), which is derived from the input vector 20 at the output of the novum 14 by way of the wave equation described hereinabove. This yields the output u(x) of the IG 16.
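One pass of the Figure 9 signal flow can be sketched as follows, under assumed sizes and randomly initialized templates; the wave-equation update of T(x) is replaced here by a fixed field with a single hand-placed depression, and the sigmoids are those given above:

```python
import numpy as np

rng = np.random.default_rng(2)
n_y, n_x, m = 16, 64, 2.0                       # novum pixels, IG lattice sites
sig_n = lambda a: -1.0 + 2.0 / (1.0 + np.exp(-2.0 * m * a))
sig_ig = lambda a, T: 1.0 / (1.0 + np.exp(4.0 * m * (T - a)))

f = rng.standard_normal(n_y)                    # focal-plane input f(y)
u = np.zeros(n_x)                               # previous IG output u(x)
W_nov = 0.1 * rng.standard_normal((n_y, n_x))   # novum templates w_N(y, x)
W_ig = 0.1 * rng.standard_normal((n_x, n_y))    # IG templates w_IG(x, y)
T = np.ones(n_x); T[25:30] = 0.2                # threshold field T(x) with one TFD

n = sig_n(f + W_nov @ u)                        # blocks 90-94: feedback sum, threshold
u = sig_ig(W_ig @ n, T)                         # blocks 108-110: template match gated by T
```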
Step Function Response of the PA
The network behavior of the PA 12 is rather more complicated than is indicated above, because the learning laws are inseparable from the dynamics of the architecture. It is easiest to explain the step function response of the network. Suppose that at time t = 0 the novum 14 is illuminated with an image (applied to the hard wired synapses), and that there is a single TFD moving along a geodesic in the IG. Suppose further that all synaptic weights at t = 0 are equal to zero except the hard wired excitatory synapses of the novum. At t = 0 each pixel of the novum receives a constant input signal yn which may be positive, negative, or zero (the latter case being uninteresting). That signal generates an activation of the same level, which is passed through σN before being fanned out to the IG 16. Almost all the IG neurons 34 have zero output due to the high threshold level and the synaptic weights of zero. But in the vicinity of the TFD, whose lattice barycenter is at X(t), the threshold is low enough that the output signals P(X(t)+δx | Y(t)) are high enough to activate the Hebbian learning law. Thus, at t = 0 the neurons at and near X(0) absorb the input pattern {yn | n ∈ novum} into their synapses, with X(0) itself receiving the strongest copy.
But P(X(0)|Y(0)) is also radiating back to the novum, and after a short time — at t = 0+δt — a group of synapses on each novum pixel has absorbed a portion α of the negative -yn of the pattern, which they add to the input, which is still +yn, leaving (1-α)yn to go to the IG 16. Thus, the synapses at X(δt) in the IG 16 absorb a weaker version of the input pattern, and the fading continues exponentially so that after a while the output of the novum 14 drops to an equilibrium level close to zero. (If it were to vanish altogether, there would be no way to continue training the synapses and driving the novum 14 output toward zero.)
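The exponential fading has a one-line closed form in the simplest case: a single pixel and a single IG feedback line carrying a constant unit recall signal. With contraHebbian rate α, the novum output then obeys n(k+1) = (1 - α)n(k), as the toy sketch below shows; the rate and signal values are illustrative.

```python
y_in, w, alpha = 1.0, 0.0, 0.2   # constant pixel input, feedback weight, learning rate
trace = []
for _ in range(30):
    n = y_in + w                  # novum activation: input plus IG feedback recall
    w -= alpha * n * 1.0          # contraHebbian update against a unit IG recall signal
    trace.append(n)               # n decays geometrically as (1 - alpha)**k
```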
The process is that of a feedback control mechanism for the internal model of the "plant", which consists of the soliton wave particles on the IG 16 lattice. The input to this model is the observation, but only after it has been supplemented by the "regulator" in the novum 14. The regulator output is constructed to stabilize the plant, which in this case means that the TFD's are moving along their geodesics with minimum disturbance, i.e., with their own inertia.
The effect is that the output of the novum approximates the time derivative of the step function input and therefore, the patterns stored in the synapses of the IG trajectory X(t) record that time derivative. The patterns stored in the synapses of the novum 14 record the negative of that time derivative. (Note that IG neurons encode spatial patterns, while novum neurons encode temporal signals.) The time derivative of a step function is also the innovations process of the step function. This does not hold true for more general signals.
Reconstruction of Complex Signals

When a complex pattern is presented to the novum, it is always presented in a temporal context. The current internal representation of that context is the state of the threshold field of the IG 16, i.e., the positions and velocity vectors of the TFD markers. Those TFD's sensitize or condition the IG 16 for the detection of certain states/features in the input. The output of the novum 14 is filtered through the synaptic templates of all the IG neurons 34, which respond with activation levels representing the a-priori (or "context free") estimate of the content of that data. These activation levels are filtered through the nonuniform IG threshold field 43 to produce the IG output distribution, which represents the conditional likelihood for the presence of states/features in the input.
When fed back to the novum 14, the IG output distribution is treated as a collection of scalar coefficients for the formation of a linear combination of the patterns that are stored in spatially distributed form in the novum synapses. This construction produces the projection of the current observation into the pattern subspace spanned by the prior observations. (It is actually a "fuzzy" projection, since the TFD's are not delta functions over the IG lattice.) This construction also constitutes a decoding of the abstract IG estimate and is easily seen to correspond to the method of "model based vision", in that the features that are detected by the IG 16 are used to reconstruct a model in the novum 14 for comparison against the actual observation. The correspondence even reflects hierarchical model based schemes, if one allows that a network of PA modules can achieve a nesting of more and more abstract feature sets as the distance of each module from the sensory array increases.
But the correspondence with model based vision is limited: the reconstructed model is NOT an estimate of the current observation, but rather it is an estimate of the observation that will arrive after a time interval tloop, which is the time required for the signal to propagate forward from novum to IG and back to the novum again (because that is how the recording occurred during learning). Moreover, the feature detection has been performed not as a one-shot pattern recognition operation on an isolated image (though it could clearly do this as a special case), but rather as an integrated historical estimate with the temporal gain of the Kalman-Bucy filter.
Once the estimated observation is constructed, it is compared against the actual observation to produce an error pattern, which is used both to correct the ongoing estimate (through the Kalman gain operation of the variable refractive index field) and to improve the IG coding for future reduction of the error covariance (through the action of the Hebbian learning law). This error pattern is the only output of the novum 14, and since it consists of the residual after projection of the observation onto the historical subspace, it is rightly called the "innovations process" of the observed stochastic process. Technically, however, it is only a partial or suboptimal innovations process, because no single PA module has the capacity to store the entire fully differential history of its input. This is an important technicality: a true innovations process is a Brownian motion, useless for control or error correction, but a suboptimal innovations process can be so used, albeit in a computationally intractable form.
Adaptive Control With the PA
In modern control theory, one uses state feedback from the controlled "plant", together with a "reference signal" or desired trajectory, to generate a control input to the plant that will induce its trajectory to converge to the reference. When the state of the plant is not directly available, as is usually the case, it is well known that the control problem can be "separated" from the state estimation problem and the separate solutions can be recombined as if the state had been available. This is called the "Separation Property", and for it to be valid, it is required that the plant be "observable". This is described in C.T. Chen, "Linear System Theory and Design", Holt, Rinehart and Winston, 1984.
Since the PA 12 constructs the (estimated) probability density for the state of the system, it contains all the information necessary for achieving any desired control objective so long as the observability and controllability criteria are satisfied. The output of the novum 14 is already adequate to control the evolution of the internal model of the plant, contained in the IG 16, so it only needs a gain transformation to allow it to control the plant itself. In order to allow the adaptive computation of this gain transformation according to neural network principles, we have incorporated it into the K synapses of the novum. Up to this point, these synapses have been "rigged" to look like the identity matrix for passing the observation y(t) into the novum 14. Now, we employ the more general case to transform the observation into a form which, when orthogonalized and recorded into the IG 16, generates an innovations process which is suitable for direct control of the plant. That is, during training we "rig" the internal representation of the plant in the IG so that when the innovations process is applied as the feedback control to the plant, it has the same effect on the K-transformed observation as it has on the W-transformed estimate of the IG 16.
By way of example, the mechanism by which the PA can accomplish automatic target recognition and parameter estimation is now described, along with how to train and operate such a system. The training procedure and its result are described first, which for this system is the equivalent of defining the feature set and building the feature detectors for a model based vision scheme. Then the target recognition and tracking mechanism will be described, which detects the features in the input signal, uses them to build a representation of the estimated target, and tracks the target while it moves.
Training
Training the PA 12 requires first deciding on a set of basic features which are needed to distinguish targets of interest and selecting training data that is rich in those features and low in confusing or conflicting features. The IG 16 subnetwork will be initialized with "grandmother cells", each of whose synaptic weights match one sample of one key feature of the targets. Some care will have to be given to the hierarchical primacy of these features, because the most primitive features belong in a PA module (which modules will be described hereinbelow) that is closest to the sensor array, while the most abstract features belong in a deeper PA module.
Next, the PA 12 will be "imprinted" with training patterns which are rich in "grandmother" images. In the context of the Parametric Avalanche, imprinting occurs when at some time t' one of the key features first appears in the spatiotemporal input pattern. Prior to this time there are no TFD's moving in the IG lattice, because no IG neuron 34 has had a high enough activation level to interact with the threshold field, and therefore the threshold field is uniformly flat. But at time t' one of the grandmother cells reacts strongly to the passing image that it is coded for, and that reaction "plucks the threshold field" to initiate the first wave motion.
It is believed that the resulting wave motion will probably not propagate initially like a soliton but rather like ripples in a pond, since this first response will supply the initial position but not the initial direction for the threshold field dynamical equations. That is a plus, not a minus for the training procedure. The isotropic propagation of a trough in the threshold field will still supply a feedback signal to the novum 14 to extract the time derivative of the evolving pattern, but that differential representation will simply be stored outward in all directions away from the initial stimulus. (If the threshold field is governed by true soliton equations, the expanding ripple should experience what is called "self-focusing", which will break it up into a starburst of solitons.)
If there were only one grandmother cell in an infinite lattice, there would be nothing to distinguish one trajectory from another. But the lattice is not infinite. It will either have boundaries for the waves to reflect from, or it will be toroidal so that waves will come back around upon themselves. And there will be more than one grandmother cell in the initial encoding. Therefore, asymmetries will result when cells that were once coded for pattern A are subsequently coded during the passage of another TFD for pattern B. The development of these asymmetries is the key to the building of good internal models.
Referring now to Figure 10, there is illustrated a top view of the IG 16 lattice. In Figure 10, there are illustrated two grandmother cells G1 and G2 located at x1 and x2, respectively, in the IG 16 lattice. Suppose that in the training data G2 always follows G1 by a time interval of δt1, and that another feature F always follows G2 after a time δt2. Assume that the distance between G1 and G2 in the lattice is large enough that the time required for a threshold disturbance to travel between them is greater than δt1. And assume that F is not one of the key initial features; in fact, we may suppose that the system has not noticed the feature F before.
Whenever training data in which the pattern G1 appears is presented, the appearance of G1 as an observation will first stimulate the G1 cell, which will initiate the isotropic radiation of a disturbance in the threshold field. The same will occur for the appearance of G2, δt1 seconds later. Then, δt2 seconds after that, the feature F appears in the pattern, i.e., the pattern at that time consists of F+R, where R is random with zero mean. (If any part of R consistently followed G1 and G2, it would have been included in F.) At that time also, the two threshold disturbances are concentrated in hyperspheres which we presume to intersect in a hypercircle. IG neurons 34 which have a lowered threshold due to being in one of the hyperspheres (and being under a TFD) will weakly absorb the synaptic code for the feature F. The random part of the sample patterns will be cancelled by the learning law. But IG neurons 34 in both hyperspheres will strongly absorb F, because their thresholds will be lower (hence their outputs will be stronger) due to the superposition of pairs of TFD's. Thus, the strength of the code that is learned at any location is determined by the confluence of consistent events in the data.
There are many things happening during learning. Suppose, for example, that we had inadvertently initialized the two grandmother cells G1 and G2 too close together. Then the TFD disturbance from G1 will land on G2 when some pattern in the input scene other than G2 is active, and that grandmother cell will lose its coding. But if G2 consistently follows G1, then it will be recoded into a set of neurons at the appropriate distance away from G1.
Consider also the following important phenomenon, which results from the use of nonlinear waves in the threshold field. It is expected that during training for separate, distinct event series, certain features (states) will be common to more than one series. Thus after training, the recall trajectories for (at least) two series will intersect at the common state. Suppose now that a third series is constructed which consists of the conjunction of the first two series, synchronized so that their recall trajectories converge on the shared state at the same time. We consider three cases: (1) this third series is a "gedankenexperiment", i.e., the recall was stimulated not by an actual observation, but by a contrived signal originating perhaps from the "operator" or perhaps from elsewhere in the network; (2) it is a real experiment, and the observed system reacts in a nonlinear fashion to the attempt to superpose the subsystems onto the same state at the same time (i.e., a collision or some other form of interaction occurs); (3) it is a real experiment, but the apparent superposition of states is only an artifact of the training, i.e., the two subsystems behave in conjunction exactly as they did in isolation, with no interaction at any time.
In case #1 (the "gedankenexperiment"), there is no trajectory correction from any observation, so the two TFD solitons which are recalling the trajectory move along their respective geodesics until they collide at the shared location in the IG 16 lattice. After the collision, their trajectories are deflected in accordance with their soliton dynamical equations (which means, among other things, that a number of conserved quantities are in fact conserved). Thus, after the collision, neither TFD recalls to the novum the same estimated events that would have been recalled had the two series been activated in isolation (or had the interaction been linear). What is recalled is a prediction, or
"inference", about what might be observed should the gedankenexperiment be converted to an observed experiment. This allows the PA 12 to provide a way to account for the differences in the appearance and motion of targets when they are extracted from laboratory isolation and involved in real world interactions.
In Case #2 the motion of two TFD's is stimulated and controlled by a real observation. Due to the continuing error correction of the Kalman gain, the TFD's may not move along lattice geodesies, but we suppose that they converge nonetheless upon some common state in the lattice. After their collision, they will emerge along new trajectories, distinct from those they would have taken had they passed the common state at different times. If the neurons under those post-collision trajectories had previously been correctly coded for the ensuing observations, then the predicted observations will be correct and no trajectory corrections will be needed or generated. Otherwise, the error signal from the novum 14 will do two things: (1) It will warp the refractive index field (RIF) to further deflect the TFD's in the direction of smaller error, and (2) it will add (via the Hebbian learning law) a correction to the synaptic patterns in the wakes of the TFD's so that a subsequent repetition of this experiment will require less of a correction — i.e., it improves the model.
In Case #3, the deflected trajectories of the colliding TFD's predict an interaction of the two observed systems, but no interaction is evidenced in the observations. Thus, the trajectory deflections are "false", and the TFD's should have proceeded on course without deflection. If the post-collision trajectories are uncoded by other training, then they will eventually receive duplicates of the coding in the geodesic trajectories. Otherwise, an inconsistency develops which can only be resolved by extending the IG model into higher dimensions. In practice, this cannot be done by physically implementing the IG on a 4-dimensional lattice, but it can be accomplished by networking a second PA module to the first.
The implication of these phenomena for network training for target recognition and tracking is that they provide the adjustment mechanism to drive the internal representation of the environment into a form that is consistent with the environment, even though it can never be a complete representation. "Consistency" in this context actually has a precise mathematical definition in terms of homotopic mappings, and it is known that the brains of animals and humans achieve homotopic representations of the sensory fields in the cortical lattice.
Although these examples were constructed with discrete events, it must be appreciated that all the processes involved are taking place continuously (modulo the effective clock rate of the multiplexed host processors) . The feature F which was inferred above by the learning procedure is only a part of a trajectory F(t) of features which evolve continuously during the motion of the observed systems. Moreover, as the learning procedure imbeds features into their correct causal context, subsequent occurrences of those features in an observation will supply not only the position but also the velocity for a TFD (we have shown details of how the velocity information appears in our simulations) .
This will cause recognition to send a TFD out as a directed particle rather than as a ripple in all directions.
Recognition and Tracking
Learning, recognition, and tracking are not performed as separate operations in the Parametric Avalanche. All processes are continually in effect, so that if the tracking error is large in the current trial, the features are recoded slightly so that it will be smaller in the next trial. However, for the following discussion it will be assumed that learning is in the "fine tuning" stage so that the mechanism for recognition and tracking of an observed system can be concentrated on.
Recognition occurs when an input pattern drives one or more parametrized feature detectors over their thresholds. The outputs of all feature units are sent back to the novum 14, where they are treated as scalar coefficients in the linear combination of one or more spatial patterns stored in the synapses of the novum neurons 28. In the novum 14, this linear combination constitutes the prediction of the next observation, and since there is a small time delay in constructing that prediction, it is active at the time when that next input arrives. There is no problem getting the timing right, because if the delay is not the same as when the patterns were learned in the first place, then the resulting error will correct the time base as part of the Kalman gain transformation.
Note the similarity here to "model-based" vision techniques. The observation is processed for matches to a number of abstract features which are coded into the IG neurons 34, and these feature responses are used to regenerate a model of the observation. In this case, however, the regenerated model is not a model of what was seen, but what will be seen a short time step into the future. By the time the model is regenerated, the next observation is received and ready for comparison and processing of the error vector.
When the feature detectors are stimulated by an observation, they tug on the threshold field and initiate the motion of a TFD marker along a trajectory determined by the location of the feature IG neuron 34 and the velocity vector (if any) associated with that feature. (How the velocity vector is determined by the gradient of the "refractive index" field associated with the activation pattern generated by the observation is described hereinbelow.) This marker moves under its own inertia to generate continuing predictions. That is, the IG neurons 34 whose thresholds are lowered by the traveling marker generate an output signal whose intensity is determined by the combination of the synaptic template matching and the depth of the threshold; and this signal fans out to the novum 14 to contribute its decoded template to the current prediction. This prediction is subtracted from the actual observation and the residual error is transmitted from the novum 14 back to the IG 16.
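The recognition loop just described can be reduced to a short numerical sketch. The following Python fragment is an illustrative simplification only, not the patented implementation: the template matrix W, the rectified threshold gating, and the static threshold vector (with one depressed entry standing in for a TFD) are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 12))            # 50 stored templates over 12-pixel observations

def feature_outputs(obs, thresholds):
    # Template match in the IG, gated by the threshold field: a low (depressed)
    # threshold lets even a modest match produce an output.
    match = W @ obs
    return np.where(match > thresholds, match - thresholds, 0.0)

def predict_next(f):
    # Novum-side prediction: a linear combination of the stored spatial
    # patterns, with the feature outputs as the scalar coefficients.
    return W.T @ f

obs = rng.normal(size=12)
thresholds = np.full(50, 5.0)
thresholds[7] = -1.0                     # a TFD has depressed the threshold at neuron 7

f = feature_outputs(obs, thresholds)
prediction = predict_next(f)
residual = obs - prediction              # innovations fed back from the novum to the IG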
Now, let us suppose that one of the TFDs is moving in slightly the wrong direction, but that the learning was good in the sense that the actual received pattern has a feature that is encoded into an IG neuron that is "close" to where the strongest feature response is occurring. Let P be the predicted pattern vector and let P' be the actually observed pattern; and to make things simpler, suppose that each pattern is identical with the feature that is coded into the synapses of the IG neurons 34. Then the error signal is δP = (P'-P). It is trivial to show that when the error signal from the novum 14 is received by the two IG neurons 34, then the activation at P' is greater than the activation at P:
a(P') = (δP, P') = (δP, P) + (δP, δP) > (δP, P) = a(P),
with equality if and only if δP=0. Therefore, since the refractive index of the threshold medium of the IG 16 is tied directly to the activation levels of the IG 16, the activation pattern in which the TFD marker is moving will warp the medium in just the right direction to deflect the marker into compliance with the observations. This, at least qualitatively, is what is required by the Kalman-Bucy filter.
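The inequality above is easy to verify numerically. A minimal check, with an arbitrary three-component pattern standing in for the feature (all values illustrative):

import numpy as np

P = np.array([1.0, 0.0, 2.0])      # predicted pattern
P_obs = np.array([1.5, 0.2, 2.1])  # actually observed pattern P'
dP = P_obs - P                     # error signal from the novum

a_obs = dP @ P_obs                 # activation at the neuron coding P'
a_pred = dP @ P                    # activation at the neuron coding P
assert np.isclose(a_obs - a_pred, dP @ dP) and a_obs > a_pred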
The principal advantage of the continuous estimator over stationary DSP methods and model based vision is that the latter are "single-shot" decision methods. That is, they must do the best they can with the signal-to-noise ratio that is available in a single frame (which may be the result of the integration of a number of scans) of data. The continuous estimator, on the other hand, makes decisions based on the information contained in all the relevant history of observations of the target, thus achieving the gain of massive integration while automatically compensating for (or ignoring) constituent motions in the target image.
PA Control Module
A neural network architecture based on the Parametric Avalanche Kalman Filter (PAKF) is operable to observe a complex system and issue control signals to cause that system to track a desired reference trajectory. The design employs a PA module to estimate the state of the "plant" and to function as the servocompensator. This PA module has an adaptive feedback gain matrix to transform its state estimates into the required control signal. The adaptive gain matrix monitors the effect of the control signal on the tracking error and adjusts to minimize it, thus allowing appropriate controls to develop even in the event that an actuator motor is cross-wired.
The objective is to design a neural network solution to the problem of asymptotic tracking and disturbance rejection. This problem is discussed in the Chen reference. The asymptotic tracking problem is a generalization of the regulator problem. In the regulator problem, a control input to the plant is sought which will stabilize the plant, which usually means to drive it to the zero state. In the tracking problem, a control is sought which will drive the plant toward a desired trajectory called the reference trajectory, which need not be either zero or constant.
One must be careful about the meaning of stability in the context of the Parametric Avalanche, because the PA does not identify any state as the "zero" state. The PA only associates novel observations with points in a probability space and then estimates the likelihood that those features are present in subsequent observations. The stable state of the PA consists of a (possibly empty) set of TFDs whose trajectories are geodesics on the IG 16 lattice, i.e., a set of TFDs which are not being accelerated by any warping of the refractive index field due to prediction errors or any other induced accelerations.
Asymptotic Tracking and Disturbance Rejection
Referring now to Figure 10, there is illustrated the structure of a robust solution to the problem of asymptotic tracking and disturbance rejection as defined in the Chen reference, pp. 503-504. This is a "state space" solution, which is appropriate for implementation with the Parametric Avalanche. It employs an internal model (called the servocompensator) of the disturbance generator and the reference signal generator, coupled in tandem with the observed system.
The servocompensator (S/C) receives the difference between the reference signal and the output of the plant, and that difference modulates the state of the servocompensator in the same way that the sensor input modulates the state of the IG 16 in the Parametric Avalanche. The S/C state is supplied to a gain matrix which transforms it into a control supplement to the state feedback stabilization control (if there is any) . It is well known — and pointed out on page 506 of Chen — that the state feedback can be supplied by a state estimator, so long as the plant is observable and controllable. This is called the Separation Theorem, as it allows the state estimation problem to be separated from the control problem.
Note that if the output of the plant already matches the reference signal, then the "control" input to the S/C vanishes and its state is allowed to stabilize, thus supplying a benign input to the gain matrix.
Design of the PACM Using the PAKF
Referring now to Figure 12, there is illustrated a block diagram of a control module 114 which employs two PA Kalman Filters (PAKFs) 116 and 118 for the state estimation functions required in the tracker described above. Each PAKF 116 and 118 is followed by a gain matrix 120 and 122, respectively, to transform the state estimates into control signals. This diagram is functionally the same as the tracking system described above and shown in Figure 11. However, it is a bit deceptive for two reasons. One is that the PAKF which is used for asymptotic state estimation does not employ the available control input to the plant 10 to improve its estimates, as it should. The other is that the gain matrices 120 and 122 cannot be implemented as adaptive neural networks in the positions where they are shown.
In the PACM design of Figure 12, PAKF 116, which performs state estimation for the feedback stabilization function, sees only the output of the plant 10. In Figure 13, taken from Chapter 7 of the Chen reference, the design of a different asymptotic state estimator is shown which receives both the output of the plant and the control input to the plant. In the figure, an estimator 124 is illustrated in feedback with the plant 10. The difference between the designs of Figure 12 and Figure 13 is that in the design of the Kalman filter, the plant 10 is assumed to be "driven" by noise. That is, all deviations of the plant trajectory about the geodesic determined by the equation
dx/dt = A(t)x
are accounted for by the plant noise q(t), i.e.,

dx/dt = A(t)x + B'(t)q(t)
But in the design of the asymptotic state estimator, that noise is assumed to have a significant deterministic component, which is accessible to the user in the form of a control signal u(t) :
dx/dt = A(t)x + B(t)u(t) + B'(t)q(t).
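The distinction can be made concrete with a textbook predict step. The sketch below is ordinary discrete-time Kalman prediction (a first-order Euler discretization), not the PA's neurodynamics; all names are illustrative.

import numpy as np

def predict_noise_driven(x, P, A, Bq, Q, dt):
    # dx/dt = A x + B' q : every deviation from the geodesic is charged to q(t).
    F = np.eye(len(x)) + dt * A                # local transition matrix
    return F @ x, F @ P @ F.T + dt * (Bq @ Q @ Bq.T)

def predict_with_control(x, P, A, B, u, Bq, Q, dt):
    # dx/dt = A x + B u + B' q : the deterministic control input is handled
    # explicitly, so only the residual noise inflates the covariance.
    F = np.eye(len(x)) + dt * A
    return F @ x + dt * (B @ u), F @ P @ F.T + dt * (Bq @ Q @ Bq.T)

An estimator of the first kind must explain every control-induced excursion as "noise", which is exactly the training difficulty described below; the second kind subtracts the known input before attributing novelty.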
The PAKF simply associates incoming patterns with IG neurons 34 in the path of a TFD, so it will build a model of the control during training. But since the control signal tends to be generated independently of the plant, any attempt to train the PAKF on observed trajectories that may be pushed one way at a certain point in one trial and another way at the same point in the next trial will encounter great difficulty in constructing a good model. If, however, that control signal could be made accessible to the PAKF through an appropriate mechanism, then it could serve as an "organizer" of the novelty during training and as an accelerator of estimation convergence during recall.
Comparison of Figure 13 with Figure 7 reveals that in fact the only functional difference between the PAKF and the design in Chen (Figure 11) is the accessibility of the control signal to the estimator. That control signal is now made accessible by combining the stabilizer and servocompensator PA's into a single unit and arranging for the innovations process itself to serve as the control for the plant.
With respect to the gain matrices 120 and 122, their basic function is to move the eigenvalues of the composite system into the left half of the complex plane, and as far left as possible without saturating the controller. What is important here is that a gain is acceptable if the composite system is asymptotically stable (in the sense described above). One gain is better than another if it drives the system toward the reference signal faster.
As described hereinabove in the rationale for applying the gain matrix prior to the generation of the innovations process, rather than following it, the information needed to adaptively construct this gain is not available in a form suitable for neural network implementation if one tries to apply the gain at the usual place between the state estimator and the control input to the plant. Mathematically, this transposition of the matrix is easy to do. And from a practical standpoint, it permits us to compute it adaptively, rather than having to solve an eigenvalue problem. Among the obvious advantages of the adaptive computation is the fact that if, for example, someone should cross-wire an actuator motor, the learning law of the gain matrix will detect the fact that its control signal results in an increase in the magnitude of the error between the observation and the reference signal and will adjust the matrix to reverse the signal to that motor.

Referring now to Figure 14, there is illustrated the preferred PACM design, in which the gain matrix has disappeared because it is implemented adaptively in the novum 14. Each of the novum neurons has an output n(t) which is input to the IG 16 and also to the plant 10 on a line 126. The plant output, y(t), is input to an error block 128 that subtracts the value of y(t) from an external input r(t) to provide the input value e(t) to the novum 14.
Normally, each synapse is adjusted according to the product of its input times the output of the neuron 28. But the output of the novum neuron nj(t) is important only for its effect on the tracking error e = r - y, which is the difference between the reference signal and the observation. The j-th component of that error happens to be available, since it is input to every element of the novum 14. Our learning objective is to minimize the absolute value of this error. We therefore adapt the gain matrix with the following learning law:
δKij = -β ej(t) (d/dt)(e·Ki) sgn(Kij),
This means, for example, that if the j-th component of the tracking error is positive and the projection of the error vector onto the input synapse vector is increasing, then Kij will be driven toward zero. If, on the other hand, the tracking error is positive and the projection is decreasing, then Kij will be driven further away from zero. In other words, errors that are increasing in magnitude are bad, and weights that have the wrong sign with respect to the effect of the control need to be pushed across zero to the other side.
The learning law needs to be modified slightly to prevent the control from saturating. Saturation occurs when nj(t) approaches +1 or -1, which are the upper and lower asymptotes of the novum sigmoid function. Pushing the Kij further away from zero will have negligible effect on the control signal and may cause numeric overflow of the synaptic weights. A solution is to shut down the learning by linking the rate constant β to the magnitude of nj(t).
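A sketch of the modified law follows. Since the printed equation is reconstructed from the surrounding discussion, both it and this code are a plausible reading rather than the patented rule; the array shapes and the throttle factor (1 - n^2) are assumptions, the latter being one simple way to link β to the magnitude of the novum output.

import numpy as np

def update_gain(K, e, dproj_dt, n, beta=0.1):
    # K[i, j]: synaptic gain from error component j onto novum neuron i
    # e[j]: tracking error components; dproj_dt[i]: d/dt of the projection e . K[i]
    # n[i]: novum outputs, bounded in (-1, +1) by the sigmoid
    rate = beta * (1.0 - n * n)          # learning shuts down as |n| nears saturation
    dK = -(rate * dproj_dt)[:, None] * e[None, :] * np.sign(K)
    return K + dK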
A simple example will be stepped through in detail to illustrate the action of the PACM. In this example, the observation is a measure of the elevation angle of the barrel of a rapid-fire gun mounted on a moving platform. The reference signal is supplied by the operator and for this example it is assumed to be initially zero (horizontal). Since this is a one-dimensional example, we suppose that the novum 14 contains a single neuron, although the IG 16 may contain several hundred in a one-dimensional lattice.
It is assumed that the PA 12 has already been trained as described hereinabove to observe the measurement and to estimate the elevation angle through normal vehicular motions and during firing of the gun, but without any stabilization. After this training, the neurons 34 of the IG 16 have come to be associated with a range of elevation angles. As described hereinabove, even though the IG neurons 34 are on a discrete lattice, the likelihood estimates can interpolate between them, so that the IG estimates are practically continuous.
The novum 14 output is then connected to the vertical actuator and a reference signal of zero degrees is supplied so that the input to the novum 14 is the actual elevation angle of the gun. In the following description of various cases, y(t) is the observed elevation angle (positive being above horizontal), e(t) = r - y(t) = -y(t) is the tracking error, P(x(t)|Y(t)) is the IG estimate of the tracking error, given the history of observations, n(t) is the output of the novum, which is also the control signal u(t) to the actuator, and K(t) is the (scalar) value of the synaptic weight in the novum 14 which receives the input e(t).
CASE 1: n(t) is connected to the actuator "properly", so that the control acceleration of the gun elevation is directly proportional to n(t). K(0) = +1, x(0) = 0, y(0) = 0, and the maximum likelihood of P(x(0)|Y(0)) is over the IG neuron which encodes the zero elevation. An impulse disturbance is applied at t=1 which raises the elevation angle of the gun.

CASE 2: Same as Case 1 except the actuator is cross wired, so that the vertical acceleration of the gun elevation is inversely proportional to n(t), i.e., the gun moves down when n(t) is positive.
Discussion of Case 1. When the experiment starts, the IG prediction is accurate, so that n(0) = 0. Therefore, there is no force on the gun and no force on the TFD soliton, which is stationary over the zero state in the IG. At t=1, suppose that the gun acquires a constant upward elevation rate due to the disturbance (firing the gun). The rising elevation produces a tracking error e(t) = -y(t), which drives the novum output n(t), and hence the actuator, so as to apply a downward force. This force will decelerate the gun until its velocity is negative and it returns to horizontal. But the observation of that negative velocity reverses the sign of n(t) to prevent overshoot of the reference. The PA response (novum output) to an input that rises and then falls is a bipolar swing, first in the direction of the impulse (if K is positive) and then opposite that direction. Thus, the control action stabilizes the plant at the reference level.
Note that the learning law will react to any large magnitude error as if it did not "trust" its gain value(s). That is because such errors are always increasing in magnitude until the control action takes effect, so during that time the matrix is being adapted in the wrong direction. But if the control action is correct, the error will begin decreasing and the gain matrix will return to its trustworthy state. The learning rate constant β controls the time constants for adaptation, so it is necessary to adjust β properly to allow for the latency in the feedback loop.
Discussion of Case 2. Suppose now that the elevation actuator is cross wired after all the training and gain adaptation from Case 1 had taken place. Then when a disturbance takes place, the control action is in the wrong direction. Thus, the negative derivative on the negative error is magnified rather than corrected, so it persists long enough to drive K from +1 across to some negative value. This "tricks" the novum output into changing sign, thus reversing the torque of the actuator. This in turn not only decelerates the gun, but allows the learning law to continue pushing K further negative (rather than back to zero) even though the feedback latency may not have actually resulted in de/dt becoming negative just yet. If all this can happen fast enough to prevent damage to the plant, then the new adaptive gain K will offset the crosswired actuator and will resume stabilization of the system.
These examples make it clear that the adjustment of the reference signal will also control the gun elevation, since as far as the PACM can tell, raising the reference is equivalent to lowering the gun barrel. This reference may be supplied by the (human) operator of the system or in the form of another control signal from another PACM.
Finally, it is not difficult to observe that there are two more ways to input the reference signal, in addition to the one shown in Figure 14 (in which the reference is assumed to be comparable to the observation y(t)). If one knows the coding of the abstract model in the IG 16, one can synthetically "warp" the refractive index medium as if to steer the solitons in the direction opposite the desired direction. When the resulting false track is decoded in the novum and compared with the observation, the resulting estimation error n(t) will cancel the synthetic warp, but will also steer the plant in the desired direction. The third method is to inject the control signal directly into the novum 14 through hard wired excitatory synapses, assuming that an open loop control signal is already known somehow. All three methods are clearly equivalent in their effect.
A number of key experiments were performed to verify that processes essential to the Parametric Avalanche design would execute as desired and as expected. All of these experiments produced positive results.
Soliton Propagation
Although learning and recall experiments were performed with the "simple" TFD propagation method, simulations of the Fermi-Pasta-Ulam equation were also performed on a one dimensional lattice of neurons. Three types of boundary conditions were used for the simulations: Dirichlet (amplitude held at zero at the lattice endpoints) , Neumann (first derivative held at zero at the lattice endpoints) , and periodic (ends of the lattice tied together) .
The FPU equations are anisotropic, so that an initial disturbance results in a positive pulse moving to the left and a negative pulse moving to the right. Both of the nonperiodic boundary conditions caused some degree of reflection of the waves from the ends of the lattice. With the periodic boundary condition, we could run the simulation until the left and right waves collided, and we confirmed that they would emerge from the collision with their shapes intact.
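A minimal version of such a lattice simulation is sketched below, assuming the alpha-FPU form of the equation, periodic ("WRAP") boundaries via array rotation, and a semi-implicit Euler integrator; the lattice size, step size, and the window-shaped initial disturbance are all illustrative.

import numpy as np

def fpu_step(y, v, dt, alpha=0.25):
    # One step of the alpha-FPU lattice: nearest-neighbor elastic force with a
    # quadratic nonlinearity.  np.roll implements the periodic boundary.
    yl, yr = np.roll(y, 1), np.roll(y, -1)
    a = (yr - 2.0 * y + yl) * (1.0 + alpha * (yr - yl))
    v = v + dt * a
    return y + dt * v, v

n = 200
y, v = np.zeros(n), np.zeros(n)
y[0:5] = np.hanning(5)              # initial disturbance over neurons 1-5

for _ in range(4000):               # the left- and right-moving pulses wrap,
    y, v = fpu_step(y, v, dt=0.05)  # collide, and re-emerge with shapes intact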
Figures 15 and 16 show the result of such an experiment. The boundary conditions are "WRAP", which allows the initial disturbance over neurons number 1-5 to propagate to the right and to the left from waveform 129 at time t0. The leftward disturbance wraps around and re-enters the array from the right as waveform 131 slightly later in time. The waveform 129 moves to the right to form waveform 133 at t1, and waveform 131 moves to the left to form waveform 135 at t1. Notice that the disturbance moving to the right is positive and the disturbance moving to the left is negative. That is the opposite of what happens with the usual sign on the nonlinear term of the FPU equation, but the sign was reversed since it is desired that positive waves move to the right.
In Figure 15, the experiment proceeds up to, but not beyond, the point of the collision of the right and left waves 133 and 135 at time t2. Figure 16 shows the two waves 133 and 135 at the time of collision with solid curve 130 and after the collision by dotted curve 132, illustrating one of the key properties of solitons, and demonstrating the viability of one of the most important elements of the Parametric Avalanche design.
Learning
Simulations demonstrated that the learning algorithm extracts the time derivative of the incoming signal. In these learning experiments, a pulse spanning ten neurons was sent down a one-dimensional IG lattice, and while it was moving, a time series of patterns was input into the novum. The sequence of synaptic weights stored in the IG lattice encoded the time derivative of the input, while the sequence of synaptic weights stored in the novum encoded the negative of the time derivative.
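The effect can be reproduced with a drastically reduced sketch in which the TFD is a marker moving one lattice site per step and the novum keeps a running prediction of its single pixel. Everything here (the learning rate, the boxcar timing, the one-synapse-per-site reduction) is an illustrative assumption, not the reported simulation.

import numpy as np

T, N, lr = 100, 100, 0.5
w_ig = np.zeros(N)                  # one learnable IG synapse per lattice site
pred = 0.0                          # the novum's running prediction of its pixel

signal = np.zeros(T)
signal[20:80] = 1.0                 # boxcar: on while the marker runs from 20 to 80

for t in range(T):
    novelty = signal[t] - pred      # innovations sent from the novum to the IG
    w_ig[t] += lr * novelty         # Hebbian imprint at the marker's position
    pred += lr * novelty            # the prediction catches up, silencing old news

# w_ig now holds a positive peak near site 20 (onset) and a negative peak near
# site 80 (offset): the time derivative of the input, laid out spatially.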
Figures 17 and 18 illustrate the response of the synaptic weights in the IG 16 and the novum 14, respectively, to the onset and the offset of a boxcar function which was input to pixel number 5 (only) of the novum. Figure 17 shows that the onset was recorded most strongly at IG neuron number 25, which is where the moving soliton was shortly after the signal came on (at IG neuron number 20). The offset was recorded at IG neuron number 88 (the signal was turned off when the soliton was at number 80). The graph in Figure 17 shows the values of synaptic weight number 5 on each of the 100 IG neurons, and clearly illustrates the way in which the novelty in the temporal signal is distributed spatially over the neurons of the IG. Note that the synaptic weights at the equilibrium point just before the offset of the boxcar do not reach the zero level, for reasons that we discussed in Section 2.3.1.
The graph in Figure 18 illustrates the values of the synaptic weights on each neuron 28 (pixel) of the novum 14. A dashed curve 134 shows all the learnable synapses on pixel number 5 of the novum 14. The other pixels, which received no input, are also shown to illustrate that even though their synapses were receiving input from the IG, their weights remained at their initial values near zero (random within the interval [-.01,+.01]) . Note that these weights are the negative of those in Figure 17, and that they are partially concentrated on a single neuron, rather than being spatially distributed as in the IG.
These experiments illustrate the ability of the PA to extract the innovations process of the incident signal, thus providing near optimal compression of the internal representation of the time-varying input patterns.
Recall
The experiments have shown that the leading edge of the novel pattern is spatially distributed over the lattice synapses in a pattern that duplicates the original novel input at a location on the TFD trajectory corresponding to when the leading edge occurred. Were it not for the time-differentiation effect of the PA dynamics, the leading edge of a protracted "boxcar" signal would be duplicated all the way along the TFD trajectory. Then when the pattern was recalled, it would activate not just the location where it first occurred, but the whole track. That is obviously a highly undesirable waste of resources.
To illustrate the ability of the PA to associatively identify an input pattern, a learning experiment was conducted similar to the "boxcar" experiment, except that every pixel of the novum 14 was given some input. The input over the twelve pixels had the spatial appearance of a "Mexican hat" function and the temporal form of the boxcar. We tested the recall without the presence of nonuniformities (TFDs) in the threshold field of the IG 16, to obtain the "unconditioned" response. We found that when the Mexican hat function was turned on, neuron number 25 responded with the largest positive output, since its synapses contained the strongest template matching the input. Neuron number 88 experienced the strongest negative output, since its synapses held the negative of the input. (In fact, Figure 17 is almost exactly the activation level of the IG at onset of the Mexican hat. The vertical axis is rescaled and relabeled as the "activation" of the neuron whose number appears on the horizontal axis.) And when the Mexican hat was turned off, neuron number 88 responded with the largest positive output, since its template aligned with the negative of the Mexican hat function.
This experiment has demonstrated that the pattern and feature recognition capability of the PA is the same as any other template matching detector when the preconditioning effect of the threshold field is turned off. But of course, the PA has the advantage that the threshold field dynamics not only control the learning of the patterns, but also the recall gain in the presence of a consistent and reinforcing history of observations.
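The unconditioned response reduces to plain template matching, which the following sketch reproduces; the neuron indices and the modeling of "turned off" as a negated transient are assumptions made for illustration.

import numpy as np

x = np.linspace(-3.0, 3.0, 12)
hat = (1.0 - x**2) * np.exp(-x**2 / 2.0)   # "Mexican hat" over the twelve pixels

templates = np.zeros((100, 12))
templates[25] = hat                        # template imprinted at onset
templates[88] = -hat                       # negated template imprinted at offset

on_response = templates @ hat              # activations when the hat turns on
off_response = templates @ (-hat)          # offset modeled as a negated transient

assert on_response.argmax() == 25 and off_response.argmax() == 88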
Referring now to Figure 19, there is illustrated a block diagram of one of the neurons 34 in the IG 16 which, as described above, comprises a single processing element. Each of these processing elements is arranged in an array of processing elements of, for example, an M x N array for a two-dimensional system or even a higher dimensional system. Each of the processing elements in the array is represented by that illustrated in Figure 19. The processing element in Figure 19 receives on one set of inputs 140 the signal vector inputs N(x,t) which are received from the novum 14. In addition, inputs 142 receive adjacent threshold levels from selected nodes, which in the preferred embodiment, are neighboring nodes. However, it should be understood that these threshold levels can be received from selected other nodes or neurons in the IG lattice.
Each of the processing elements is comprised of an IG processor 144 and a threshold level 146. The inputs 140 are input to the IG processor 144 and the inputs 142 are input to the threshold level 146. A memory 148 is provided which is interfaced through a bidirectional bus 150 to the processing element to communicate with both the IG processor 144 and the threshold level 146. A block 152 represents the portion of the processing element that computes the activation levels. This resides in the IG plane 144. In addition, there is a block 154 that indicates the step of computing weight updates, which also resides in the IG plane 144. In the threshold plane 146, there is a block 156 that is provided for updating the threshold values. In addition, there is a clock 158 that operates the processing element of Figure 19. As described above, each of the processing elements is asynchronous and operates on its own clock, which is an important aspect of the Parametric Avalanche.
The output of the activation block 152 is input to a threshold function block 158 which determines the output on a line 160 as a function of the threshold generated by the threshold computation block 156. As described above, the threshold is low only in the vicinity of the TFD. The output of block 158 comprises the output of the IG neuron or processing element of Figure 19 and this also is fed back to the input of the compute weight update block 154 to determine new weights. The output of the activation block 152 is also input to the threshold level update block 156.
Each of the blocks 152, 154 and 156 interfaces with the memory 148, which is essentially a multiport memory. This is so because each of the processes operates independently; that is, the synaptic weights are fetched from memory 148 by the activation computation block 152 while they are being updated.
In operation, when a signal is received on the lines 140 from the novum, the activation computation block 152 must fetch the weights from the memory 148 in order to compute the activation level. This is then input to the threshold block 158. At the same time, the threshold output levels from each of the interconnected (preferably adjacent) nodes are received to generate the threshold level at that processing element or node. This is utilized to set the threshold level input to the node, and, thus, determine the output level. As described above, if the threshold level is low, this will produce an output even if the activation level is very low. However, if the threshold level is high, but the activation level is very high, this may also produce an output.
In order to update the weights stored in the memory 148, it is necessary to know what the input signal looks like and also to know what the output signal looks like. These provide the terms of the Hebbian weight update, δWij = α Ini Outj.
Whenever the weights are updated, this is referred to as learning. Of course, there has to be an output on the line 160 in order for the weight updates to be computed. This is due to the fact that learning occurs primarily in the region of the TFD. This is not to say that the learning algorithm is not operating, but that any updates to the weight values will be zero. However, when an output signal is produced, the currently stored template in memory 148 will be changed so that it incorporates a copy of the current input pattern.
There are three situations that can occur. The first is when the system is initialized and nothing is stored in the template, such that the system must learn. As a soliton wave moves across the processing element and the threshold level goes down, the signal level on the output will go up due to the lowered threshold, and the observed image will be stored in the template in memory 148. In the second situation, a TFD moves across a processing element, but the input signal mismatches the template in the memory; because the threshold is lowered, the mismatch either does not produce an output or it does produce an output. If it does not produce an output, then the memory template will stay where it is, and if it does produce an output, the memory template will be transformed so that it looks like whatever signal is activated. The third situation is when the soliton wave passes across the particular processing element, the threshold gets lowered, and the incoming signal actually matches the template. Since it actually matches the template, the output will be high, but since the template already looks like what was originally stored, it will only be reinforced but not changed in form.
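These three situations fall out of a compact model of the element: an output gated by the difference between template match and threshold, with an output-gated Hebbian update. The class below is an illustrative reduction (scalar threshold, rectified output), not the full differential-difference dynamics.

import numpy as np

class IGElement:
    # One IG processing element: a threshold-gated match filter with
    # output-gated Hebbian learning.
    def __init__(self, n_inputs, alpha=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.uniform(-0.01, 0.01, n_inputs)  # template, initially random
        self.alpha = alpha

    def step(self, inp, threshold):
        activation = self.w @ inp                 # match of the input to the template
        out = max(activation - threshold, 0.0)    # a passing TFD lowers the threshold,
                                                  # letting even a weak match fire
        if out > 0.0:
            self.w += self.alpha * out * inp      # template moves toward whatever fired it
        return out

With a random template the element fires only under a deep threshold depression and imprints the input (situation one); on a mismatch the depressed threshold decides whether an output, and hence a re-coding, occurs (situation two); on a match the output is high and the update merely reinforces the existing template (situation three).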
Referring now to Figure 20, there is illustrated a block diagram of one of the neurons 28 in the novum 14, which, as described above, comprises a single processing element. Each of these processing elements is arranged in an array of processing elements of, for example, an M x N array for a two-dimensional system. Each of the processing elements in the array is represented by that illustrated in Figure 20. The processing element in Figure 20 receives on one set of inputs 170 the signal vector inputs from the output of the observation matrix 10. The processing element of Figure 20 also receives on a second set of inputs 172 the outputs from the IG 16.
Each of the processing elements is comprised of a computational unit and a memory 174. The memory 174 is interfaced with the computational unit through a bidirectional bus 176. The computational unit computes the activation energy in a computational block 178. In addition, the computational unit also computes the weight updates, as represented by computational block 180. The weight update computational block 180 provides the learning portion of the novum. Both the block 178 and the block 180 receive the inputs from both the signal vector inputs and IG outputs.
The output of the activation computation block 178 is input to a threshold function block 182 which was described above and comprises a bi-polar function. The output of the threshold function block 182 is input to the weight update computation block 180 and also provides the novum output on the line 184. A clock 186 is provided which operates the computational unit of the novum.
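A matching sketch of a single novum processing element follows, under the same caveats; the delta-rule update that drives the prediction toward the observation is one simple stand-in for the novum's error-minimizing learning law.

import numpy as np

class NovumElement:
    # One novum processing element: compares its observation pixel with the
    # prediction decoded from the IG outputs and emits the residual.
    def __init__(self, n_ig, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.k = rng.uniform(-0.01, 0.01, n_ig)   # synapses on the IG feedback lines
        self.beta = beta

    def step(self, obs_pixel, ig_outputs):
        prediction = self.k @ ig_outputs          # linear combination of IG outputs
        out = np.tanh(obs_pixel - prediction)     # residual through the bipolar sigmoid
        self.k += self.beta * out * ig_outputs    # drive the prediction toward the
        return out                                # observation, shrinking the residual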
Control
Referring now to Figure 21, there is illustrated an example of an application of the Parametric Avalanche — the classical "broom balancing" problem. In this problem, a cart 186 is provided with an upright member 188 disposed on the upper surface thereof and mounted on a pivot point 190 at the lower end thereof. The upper end of the member 188 has a weight 192 disposed thereon. The object of this problem is to maintain the member 188 in a vertical and upright direction.
The Parametric Avalanche in this example is comprised of 100 neurons in an IG 194 and a single novum neuron 196. The novum neuron receives as inputs the outputs of the 100 neurons in the IG 194 and it also receives a single observation input, representing the angle from the vertical relative to the cart 186. The angle theta is input to the negative input of summing block 198, the positive input of which is connected to a signal REFSIG. The output is input to a block 200 which receives on the other input thereof the adaptive gain input. The output of block 200 provides the observation input to the novum neuron 196. The control input to the cart is a horizontal acceleration and is supplied by the network. The output of the novum neuron 196 is input to each of the 100 neurons in the IG 194 on an output line 204. In addition, the output of the novum neuron 196 is input through a gain scaling block 206 in the cart 186 to provide a control input.
The threshold field is represented by a moving TFD 202 which traverses the IG neurons from the left to the right. This IG is a one-dimensional IG. Since the novum output is a smooth approximation of the derivative of the input (when the input is entirely novel), the novum serves as a "derivative controller".
Referring now to Figure 22, there is illustrated the time evolution of the angle from the vertical, and the corresponding novum output. Figure 23 is similar to Figure 22 except that random disturbances have been injected into the system, as will be described hereinbelow. In this example, a "virgin" Parametric Avalanche generates a control signal which maintains an inverted pendulum in its upright position through the application of a horizontal acceleration to the pivot point of the pendulum. The simulation consists of a loop on the time variable, in each cycle of which the error (difference) between the actual angle of the inverted pendulum (in radians away from the vertical) and the desired angle of zero radians is multiplied by a gain coefficient and then supplied as the input to the novum neuron 196 of the Parametric Avalanche model. The Parametric Avalanche model is then called upon to advance its state forward one increment of time in accordance with its Quantum Neurodynamics and its learning laws, and to present the output of the novum neuron 196 as the control signal (the horizontal acceleration) for the motion of the pivot point 190. (Since the novum output is restricted by the sigmoid threshold function to being within the range from -1 to +1, it is amplified by a constant positive factor before it reaches the pendulum model.) Next, the update subroutine for the adaptive adjoint gain coefficient is called upon to adjust this gain for optimum effect of the control action. Finally, the pendulum model is called upon to advance its state forward one increment of time by a simple double integration of the second order difference equations, thus producing the actual pendulum angle for use in the next cycle of the loop.

For this simulation, the Quantum Neurodynamics of the PA model is implemented by the "naive" method rather than by actual integration of the nonlinear Schroedinger equation. That is, a threshold depression (TFD) 202 is propagated along the one-dimensional IG lattice 194 by a global type of algorithm which is capable of interpolating the TFD 202 into one hundred equally spaced positions between each pair of the IG neurons. The TFD 202 moves with a velocity that is specified by the operator at run time. There is no provision for modulation of this velocity by the "warp drive" mechanism (warping of the refractive index field), because it is assumed that the synapses of each IG neuron are randomly initialized prior to the passage of the TFD 202. Of course, those synapses will become programmed as the
TFD 202 passes by in accordance with the learning law.
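The loop just described has roughly the following shape. The sketch borrows the parameter names of Table 1 but stubs the Parametric Avalanche itself with a sigmoid-bounded derivative estimate, which, per the discussion below, is what the novum output approaches at a suitable TFD velocity; the pendulum sign conventions and the exact form of the gain update are illustrative assumptions, so this is a structural outline rather than a reproduction of the experiment.

import numpy as np

GAIN, SCALE, DT, CYCLES = 1.1, 30.0, 0.1, 500       # values from Table 1
GMIN, GMAX, BETA, REFSIG = 1.0, 5.0, 0.1, 0.0

def pa_step(e, dt, _state={'prev': 0.0}):
    # Stand-in for the PA: a bounded estimate of the time derivative of its input.
    d = (e - _state['prev']) / dt
    _state['prev'] = e
    return np.tanh(d)

def pendulum_step(theta, dtheta, accel, dt, g=9.8, length=1.0):
    # Simple double integration of the second order difference equations.
    ddtheta = (g * np.sin(theta) - accel * np.cos(theta)) / length
    dtheta += dt * ddtheta
    return theta + dt * dtheta, dtheta

theta, dtheta, gain = 0.5, 0.0, GAIN                # THETA, DTHETA initial values
for cycle in range(CYCLES):
    e = (REFSIG - theta) * gain                     # adjoint gain in the observation path
    n = pa_step(e, DT)                              # novum output doubles as the control
    gain = np.clip(gain - BETA * e * n, GMIN, GMAX) # favor opposite signs of e and n
    theta, dtheta = pendulum_step(theta, dtheta, SCALE * n, DT)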
If the TFD 202 velocity is too large in relation to the learning rates of the novum and IG synapses, then the output of the novum will simply be a noisy version of the input signal, the variance of the noise depending on the variance of the random initialization of the synaptic weights. Experimentation has shown that any velocity in excess of approximately 2 IG neurons per simulation second will produce this kind of output, which is useless for control of the pendulum. As the velocity is reduced below 2 neurons per second, the noise disappears and the phase angle of the novum output begins to lead the phase angle of the input, moving toward the time derivative of the input signal. This allows the novum output to function as a derivative controller of the pendulum.
(It also allows a differential form of the input signal to be stored in the synapses of the IG for subsequent use in recall and prediction of the observed motion; but that is not utilized properly in this simulation.) Further reduction of the TFD velocity below about 0.5 neurons per second results in an output which is unable to stabilize the pendulum, for reasons which have not yet been determined.
The simulations shown in the graphs of Figures 22 and 23 were produced with a TFD velocity of 1 IG neuron per simulation second. The program software listing is contained hereinbelow. The input data which produced the graph of Figure 22 is contained in Table 1 and the input data which produced the graph in Figure 23 is contained in Table 2, in the list bearing the same name as the graph, but with a ".DAT" extension. The main difference between the two is that in Figure 23, a random disturbance was applied to the velocity of the pivot point, as indicated by the nonzero value of the RANGE parameter.
TABLE 1
2 R
.5 ALPHA
.1 BETA
1.1 GAIN
30 SCALE
0 REFSIG
11111 SEED
0 RANGE
.5 THETA
0 DTHETA
.1 DT
500 CYCLES
50 KICK
1 GMIN
5 GMAX
1 VEL
0.1 WTMAX
TABLE 2
2 R
.5 ALPHA
.1 BETA
1 GAIN
30 SCALE
0 REFSIG
11111 SEED
.5 RANGE
.5 THETA
0 DTHETA
.1 DT
500 CYCLES
50 KICK
1 GMIN
25 GMAX
1 VEL
0.1 WTMAX
Ordinarily, a control system is designed by passing a state estimate through a gain factor to "represent" the estimate properly to the input of the plant. In our design, however, we adopt an adjoint representation of the gain by placing it between the output of the plant and the input of the state estimator (the PA). This causes the PA to alter its internal representation of the plant so that the state estimates will be compatible with the control input to the plant without further transformation, except possibly for scaling by a positive amplification. The reason for doing this is that the information needed for adaptive gain adjustments is not compatible with neurocomputing methods when the gain is placed between the PA output and the plant input; but it is compatible with neurocomputing methods when it is placed in the observation path. The UPDATE subroutine adjusts the gain so as to favor opposite signs of the input to the novum (the tracking error) and the output of the novum (the control). When that output approximates the time derivative of the tracking error, such a gain will result in any tracking error being driven to zero by the control signal. Of course, if the gain happens to be negative, then the PA will be learning a reversed image of the observations, but that is what it takes to obtain one signal from the novum that means the same thing to both the plant and the PA's model of the plant in terms of controlling its trajectory.
Although the preferred embodiment has been described in detail, it should be understood that various changes, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

CLAIMS:
1. A neural network, comprising:
an observation input for receiving a time-series of observations;
a novelty device for comparing observations with an internally generated prediction in accordance with a novelty filtering algorithm to provide on an output a sub-optimal innovations process related to said received observations and said prediction, said output representing a prediction error;
a prediction device for generating said prediction for output to said novelty device, said prediction device including a geometric lattice of nodes, each node having associated therewith:
a memory device for storage of spatial patterns which represent a spatial history of said time series of observations,
a plurality of signal inputs for receiving said prediction error from said novelty device,
a filter for match filtering said received prediction error through the stored spatial patterns in said memory device to produce a correlation coefficient representing the similarity between said stored pattern and the prediction error,
a plurality of threshold inputs for receiving threshold output levels from select other of said nodes,
a threshold memory device for storing the threshold levels representing the prior probability for the occurrence of said stored spatial patterns prior to receiving said stored spatial patterns,
a CPU for computing an updated threshold level in accordance with a differential-difference equation which operates on the stored threshold level, said received threshold levels and said correlation coefficients to define and propagate a quantum mechanical wave particle across the geometric lattice of nodes, said CPU storing said updated threshold level in said threshold memory device,
a threshold output for outputting said updated threshold level to other of said nodes, and
said CPU computing said internally generated prediction by passing said correlation coefficient through a sigmoid function whose threshold level comprises said updated threshold level, said prediction representing the probability for the occurrence of said stored spatial pattern conditioned upon the prior probability represented by the stored threshold level.
2. The neural network of Claim 1 wherein said novelty device is adaptive.
3. The neural network of Claim 1 wherein said filter comprises a correlation filter.
4. The neural network of Claim 3 wherein said correlation filter provides the product of the stored spatial patterns and the received prediction error.
5. The neural network of Claim 1 wherein said threshold inputs receive threshold output levels from neighboring ones of said nodes.
6. The neural network of Claim 1 wherein said prediction device further includes learning means for updating the stored spatial patterns so as to correlate said prediction error with the position of the quantum mechanical wave particle.
7. The neural network of Claim 6 wherein said learning means operates in accordance with the Hebbian learning law.
8. The neural network of Claim 1 wherein said novelty device comprises an array of nodes, each node including:
a plurality of signal inputs that are connected to said observation inputs in accordance with a predetermined interconnect pattern;
a plurality of prediction inputs for receiving the prediction output of said prediction device;
a memory for storing temporal patterns which represent a time history of said time series of observations; and
means for operating on said prediction signal inputs with a predetermined algorithm that utilizes said stored temporal patterns to provide said prediction error output.
9. The neural network of Claim 8, further comprising learning means for updating the stored spatial patterns so as to minimize said prediction error.
10. The neural network of Claim 9 wherein said learning means operates in accordance with a contra-Hebbian learning law.
EP90909520A 1989-06-16 1990-06-15 Continuous bayesian estimation with a neural network architecture Withdrawn EP0433414A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36746889A 1989-06-16 1989-06-16
US367468 1994-12-30

Publications (1)

Publication Number Publication Date
EP0433414A1 true EP0433414A1 (en) 1991-06-26

Family

ID=23447304

Family Applications (1)

Application Number Title Priority Date Filing Date
EP90909520A Withdrawn EP0433414A1 (en) 1989-06-16 1990-06-15 Continuous bayesian estimation with a neural network architecture

Country Status (4)

Country Link
EP (1) EP0433414A1 (en)
JP (1) JPH04500738A (en)
AU (1) AU5835990A (en)
WO (1) WO1990016038A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4100500A1 (en) * 1991-01-10 1992-07-16 Bodenseewerk Geraetetech SIGNAL PROCESSING ARRANGEMENT FOR THE CLASSIFICATION OF OBJECTS BASED ON THE SIGNALS OF SENSORS
US6054710A (en) * 1997-12-18 2000-04-25 Cypress Semiconductor Corp. Method and apparatus for obtaining two- or three-dimensional information from scanning electron microscopy
JP5541578B2 (en) 2010-09-14 2014-07-09 株式会社リコー Optical scanning apparatus and image forming apparatus
WO2019018533A1 (en) * 2017-07-18 2019-01-24 Neubay Inc Neuro-bayesian architecture for implementing artificial general intelligence
US11556794B2 (en) * 2017-08-31 2023-01-17 International Business Machines Corporation Facilitating neural networks
US11556343B2 (en) 2017-09-22 2023-01-17 International Business Machines Corporation Computational method for temporal pooling and correlation
US11138493B2 (en) 2017-12-22 2021-10-05 International Business Machines Corporation Approaching homeostasis in a binary neural network
EP3782083A4 (en) * 2018-04-17 2022-02-16 HRL Laboratories, LLC A neuronal network topology for computing conditional probabilities
CN111168680B (en) * 2020-01-09 2022-11-15 中山大学 Soft robot control method based on neurodynamics method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9016038A1 *

Also Published As

Publication number Publication date
JPH04500738A (en) 1992-02-06
WO1990016038A1 (en) 1990-12-27
AU5835990A (en) 1991-01-08


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB IT LI LU NL SE

17P Request for examination filed

Effective date: 19910627

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MARTINGALE RESEARCH CORPN.

17Q First examination report despatched

Effective date: 19940620

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19950103