GB2236608A - Digital neural networks - Google Patents

Digital neural networks

Info

Publication number
GB2236608A
GB2236608A GB8922528A GB8922528A GB2236608A GB 2236608 A GB2236608 A GB 2236608A GB 8922528 A GB8922528 A GB 8922528A GB 8922528 A GB8922528 A GB 8922528A GB 2236608 A GB2236608 A GB 2236608A
Authority
GB
United Kingdom
Prior art keywords
output signal
function
neural
neural processor
processor according
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB8922528A
Other versions
GB8922528D0 (en)
GB2236608B (en)
Inventor
David John Myers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC filed Critical British Telecommunications PLC
Priority to GB8922528A priority Critical patent/GB2236608B/en
Publication of GB8922528D0 publication Critical patent/GB8922528D0/en
Publication of GB2236608A publication Critical patent/GB2236608A/en
Application granted granted Critical
Publication of GB2236608B publication Critical patent/GB2236608B/en
Priority to HK132796A priority patent/HK132796A/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

A digital neuron receiving inputs X1-Xn includes weighting elements W1-Wn, a processor P, and a non-linear compressing element C. Element C uses a piecewise linear approximation to the sigmoidal neuron activation function, which maps compactly into a digital integrated circuit realisation, and involves slopes of powers of two so that it may be implemented by shifting the lower order bits of the neuron output in dependence upon the higher order bits.

Description

DIGITAL NEURAL NETWORKS

This invention relates to digital neurons, and to neural networks comprising a plurality of such neurons, particularly but not exclusively for pattern recognition.
A neuron in this context is a circuit (realised with electrical or optical components) which receives a plurality of inputs and produces an output corresponding to a function (e.g. the sum) of the inputs, each weighted by a respective weighting factor derived during a training phase.
Attempts were made to inter-connect such neurons in layers, the output of one forming an input to another, so as to form a net. However, it was found that the effect of so doing was exactly equivalent to that of a single layer of neurons, unless a non-linear compression stage was included between the layers.
In the implementation of neural networks, there is thus generally a requirement for a non-linear activation function at the output of each neuron. This may take a number of forms, including the simple threshold function, but the most popular activation function is the sigmoid function, given by:

    y = 1 / (1 + e^-x)    (1)

In the reported analogue integrated circuit implementations of neural nets, the non-linearity is usually implicit in one of the other neuron operations, for example in the analogue multiplier [2], or by allowing the analogue summing amplifier to go into saturation and 'hit the rails' [3]. Thus the nature of the non-linear function is rather uncontrolled.
In digital Very Large Scale Integration (VLSI) implementations, by contrast, the activation function can be specified with arbitrary precision. A number of possibilities exist for evaluating the function of eqn. 1. It could be evaluated by summing a truncated series expansion, which is likely to be slow in terms of computation time, or by using a table look-up scheme in which the y value (output value) associated with each x value (input value) is stored in memory (e.g. Random Access Memory (RAM) or Read-Only Memory (ROM)), and the x value is used to address the memory. If x is a 16 bit number and y is an 8 bit number, a simple look-up scheme would require 64 KBytes of memory: this is clearly impractical, in terms of silicon area required, for each neuron of a VLSI neural network, since such a network needs a very large number of neurons. One alternative would be to use a piecewise linear approximation scheme, in which the breakpoints are stored in memory, and table look-up is combined with linear interpolation.
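By way of illustration only (this listing is not part of the patent text), the following Python sketch models such a breakpoint scheme: a handful of stored (x, y) pairs replaces the 64 KByte table, at the cost of an interpolation step. The breakpoint values used here are those of the modified curve introduced later (Table 1).

    import bisect

    # Stored breakpoints (x, y) approximating the sigmoid; values taken from
    # the 'modified curve' column of Table 1 below.
    BREAKPOINTS = [(-8.0, 0.0), (-4.0, 0.0625), (-2.0, 0.125), (-1.0, 0.25),
                   (1.0, 0.75), (2.0, 0.875), (4.0, 0.9375), (8.0, 1.0)]

    def piecewise_lookup(x):
        """Table look-up combined with linear interpolation between breakpoints."""
        xs = [p[0] for p in BREAKPOINTS]
        i = bisect.bisect_right(xs, x)
        if i == 0:
            return BREAKPOINTS[0][1]      # clamp below x = -8
        if i == len(BREAKPOINTS):
            return BREAKPOINTS[-1][1]     # clamp above x = +8
        (x0, y0), (x1, y1) = BREAKPOINTS[i - 1], BREAKPOINTS[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    assert piecewise_lookup(0.0) == 0.5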
According to the invention there is provided a neural processor comprising means for receiving an input signal and means for producing a digital output signal corresponding to a function of the input signal and a weight vector associated with the processor, further comprising means for applying a non-linear function to said output signal to produce a digital compressed output signal, wherein the function consists of a plurality of inclined linear segments with different slopes, the slope of each said segment being equal to a^n, where n is an integer and a is the base of the digital output signal.
This method of executing the activation function can be realised as a combination of a small number of simple logic gates (and shift stages), requiring no large storage table or repetitive calculation, and hence using less silicon area. Scaling to the different slopes is easy to achieve by left or right logical shifts, since the slopes are powers of 2 (in binary arithmetic). It is inspired by the piecewise linear approximation used to implement A-law companding for Pulse Code Modulation (PCM) systems [4]. Other aspects of the invention are as claimed or described herein, with advantages which will be apparent from the following.
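As a concrete illustration of the shift trick (a minimal sketch, not the patented circuit itself): multiplying an integer by a slope of 2^n reduces to a single shift, so no multiplier is needed.

    def scale_by_power_of_two(x, n):
        """Multiply integer x by 2**n using shifts only (n may be negative)."""
        return x << n if n >= 0 else x >> -n

    assert scale_by_power_of_two(40, -3) == 5    # 40 * 1/8
    assert scale_by_power_of_two(5, 2) == 20     # 5 * 4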
One embodiment of the invention will now be illustrated, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows schematically a generalised digital neuron known in the art;
Figure 2a shows the general form of the sigmoid excitation function, and of a standard piecewise-linear A-law curve;
Figure 2b shows an exemplary function curve according to a preferred embodiment of the invention, together with the sigmoid excitation function;
Figure 3 shows schematically part of one example of a circuit for generating a non-linear function in a neuron according to the invention;
Figure 4 shows schematically a further part of that exemplary circuit;
Figure 5 shows part of the circuit of Figure 3 in greater detail;
Figure 6 shows one example of a circuit comprising an element of the circuit of Figure 5; and
Figure 7 shows a circuit, for use with that shown in Figures 3-6, for generating the gradient of the function shown in Figure 2b.
Referring to Figure 1, a digital neuron well known in the art comprises at least one input (X1, X2, X3 ... Xn), means for scaling each input according to the value of a weight (W1, W2, W3 ... Wn), and a processor P which produces a binary digital output Y which is a function (e.g. the sum) of these scaled or weighted inputs.
As discussed above, it is also in general necessary to provide means C for applying a non-linear function to compress the range of the output Y, to give a compressed output. This (together with other neuron outputs) is connected to form an input to a further neuron.
During a training phase, training data signal vectors are presented as input to the neuron, and the weights are altered (e.g. incremented) by training means (not shown) in dependence upon, amongst other things, the derivative of the (compressed) output.
In our initial experiments a 13 segment piecewise-linear A-law curve was used, scaled such that values of neuron output x in the range -8 to +8 mapped to values of compressed output y in the range 0 to +1, to approximate the sigmoid function. Fig. 2a shows a plot of the scaled A-law curve, and for comparison the sigmoid function of eqn. 1. The nature of the piecewise linear A-law curve is explained in Reference 4, for example. However, simulations using this approximation indicated that it did not perform well when used to train Multi-layer Perceptron (MLP) networks using the Backpropagation Algorithm. This is because in the region around x = 0, the first derivative (slope) of the A-law linear curve is much higher than that of the sigmoid function. As a result, when the sigmoid was replaced by the A-law curve, the training was unstable, and failed to converge. It could be made to converge by reducing the learning rate [1], but this resulted in very long learning times compared to simulations of the same network utilising the sigmoid function. It was found that the A-law curve could however be used in the recognition mode for nets trained (i.e. weight values derived) using the sigmoid function without serious degradation in the performance of the net.
A modified curve was therefore developed, similar to the A-law curve in that the gradient of each section can be expressed as a power of 2. This curve, shown in Fig. 2b, has only 7 segments and is a better approximation to the sigmoid function, which is also shown in Fig. 2b for comparison. Note that at x = 0 the modified curve and the sigmoid have the same gradient (= 0.25). Table 1 compares the breakpoints of the A-law and the modified curves.
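(For completeness, the following is a standard identity rather than text from the patent: differentiating eqn. 1 gives dy/dx = y(1 - y), so at x = 0, where y = 1/2, the sigmoid gradient is (1/2)(1 - 1/2) = 1/4 = 2^-2, exactly the slope of the modified curve's central segment.)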
The breakpoints are conveniently at input values which are powers of 2; this enables the particular segment of the curve to which a given input value corresponds to be determined from the higher order bits alone.
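The following Python sketch (a behavioural reconstruction from the breakpoints of Table 1, not circuitry from the patent) shows the resulting 7-segment function; each segment start and each slope is a power of 2:

    def modified_curve(x):
        """7-segment piecewise-linear sigmoid approximation (Fig. 2b)."""
        if x < 0:
            return 1.0 - modified_curve(-x)   # curve is symmetric about (0, 0.5)
        # (segment start, y at start, slope); slopes are all powers of two
        for x0, y0, slope in [(4.0, 0.9375, 1 / 64), (2.0, 0.875, 1 / 32),
                              (1.0, 0.75, 1 / 8), (0.0, 0.5, 1 / 4)]:
            if x >= x0:
                return min(1.0, y0 + slope * (x - x0))

    assert modified_curve(0.0) == 0.5
    assert modified_curve(1.0) == 0.75
    assert modified_curve(-2.0) == 0.125
    assert modified_curve(8.0) == 1.0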
Simulation of the MLP with backpropagation using the modified piecewise linear approximation indicates that it gives comparable performance to simulations based on the true sigmoid, and it is thus useful in trainable neurons and nets thereof.
Table 2 shows a possible truth table for the modified piecewise linear function corresponding to positive values of x, aimed at mapping a 16 bit 2's complement input value (I0-I15) in the range -8 to +8 to an 8 bit 2's complement output value (R0-R7) in the range 0 to 1.
The negative half of the function corresponds in an obvious fashion. This could be used in a digital neural net system with 8 bit data word representation, which allows an additional 8 bits internal to each neuron for bit growth.
One possible hardware implementation of the truth table of Table 2 is shown in Fig. 3. This consists of a 'most significant 1' selector circuit 1, which takes the most significant bits I11-I14 of the neuron output as input, and outputs a 4 bit word Y11-Y14 containing a 1 at the position of the most significant 1 in the input, or else outputs 0 where there is no 1 in I11-I14.
Signals Y11-Y14 of the neuron output are decoded to produce compressed output bits R4-R5. Bit R6 = 1 because the output is always greater than 0.5 in the positive quadrant of the function (shown in Fig. 2b), and bit R7 = 0 because the output is always positive. R7 is therefore ignored (at this stage). Signals Y12-Y14 are also used to control a shifting circuit 2 comprising a bank of 2:1 multiplexer circuits 2a, 2b, 2c, which controllably shift bits of the input word to provide the lower order compressed output bits R0-R3. These multiplexers connect the upper right input to the output if the control signal is 0, and the lower right input to the output otherwise.
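To make the decode-and-shift behaviour concrete, here is a Python model of the positive-half data path (a reading of Table 2's bit groupings, not the gate-level circuit). The input word i represents x = i/4096 and the output word represents y = R/128:

    def compress_positive(i):
        """Map a positive 16 bit word (x = i/4096) to 8 bits (y = r/128)."""
        assert 0 <= i < 2 ** 15          # positive half only (I15 = 0)
        top = (i >> 11) & 0xF            # I14..I11 select the segment
        if top == 0:                     # x < 0.5      : R = 0100 abcd
            return 0x40 | (i >> 7) & 0xF
        if top == 1:                     # 0.5 <= x < 1 : R = 0101 abcd
            return 0x50 | (i >> 7) & 0xF
        if top < 4:                      # 1 <= x < 2   : R = 0110 abcd
            return 0x60 | (i >> 8) & 0xF
        if top < 8:                      # 2 <= x < 4   : R = 0111 0abc
            return 0x70 | (i >> 10) & 0x7
        return 0x78 | (i >> 11) & 0x7    # 4 <= x < 8   : R = 0111 1abc

    assert compress_positive(0) == 0x40            # x = 0 -> y = 0.5
    assert compress_positive(4096) == 0x60         # x = 1 -> y = 0.75
    assert compress_positive(2 ** 15 - 1) == 0x7F  # x -> 8, y -> 1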
Fig. 4 shows one way in which the circuit of Fig. 3 can be generalised to cover both positive and negative values of x. If I15 = 1 the neuron output is negative, and so the input is 2's complemented by a first complementing means 3a before being applied to the circuit of Fig. 3.
The compressed output of the circuit of Fig. 3 (i.e. bits R0-R6) is also passed through second 2's complementing means 3b, which has the effect of mirroring the activation curve about the line y = 0.5, and results in the correct value being output for negative inputs. For positive inputs, both 2's complement circuits 3a, 3b are bypassed.
I15 is used as a control bit to switch 3a and 3b in or out. One's complementing could also be used, at the expense of a slight loss of accuracy, allowing a simpler logic implementation of circuits 3a and 3b.
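A sketch of this arrangement in the same Python model, building on compress_positive above (again a reconstruction; the clamping of the extreme value -8.0, which has no 16 bit positive counterpart, is an added assumption):

    def compress(i16):
        """Full activation: i16 is a raw 16 bit two's complement word."""
        if i16 & 0x8000:                          # I15 = 1: negative input
            mag = min((-i16) & 0xFFFF, 0x7FFF)    # first complementer (3a)
            return (128 - compress_positive(mag)) & 0xFF  # second (3b)
        return compress_positive(i16)             # 3a and 3b bypassed

    assert compress(4096) == 96                   # x = +1 -> y = 0.75
    assert compress((-4096) & 0xFFFF) == 32       # x = -1 -> y = 0.25

Here 128 - r is exactly the 7 bit two's complement of r, which is the mirroring about y = 0.5 described above.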
Referring to Figure 5, the 'select most significant 1' circuit takes I11-I14 as input, and outputs a 4 bit word Y11-Y14 containing a 1 at the position of the most significant 1 in the input, or else outputs 0. This can be described by the following truth table (Table 3):
Table 3

    I14 I13 I12 I11 | Y14 Y13 Y12 Y11
     0   0   0   0  |  0   0   0   0
     0   0   0   1  |  0   0   0   1
     0   0   1   X  |  0   0   1   0
     0   1   X   X  |  0   1   0   0
     1   X   X   X  |  1   0   0   0

    (X = don't care)

This may be implemented in a number of ways. A modular implementation is shown in Figure 5. Each of the modules 1a, 1b, 1c, 1d shown in this figure has two inputs: an input Ii and an input Pi+1. Pi+1 indicates whether there have been any more significant (previous) 1's input, i.e. Pi+1 = 1 if any Ij = 1 where j > i. Each module has two outputs, Yi and Pi. The truth table for each module is as follows (Table 4):

    Pi+1 Ii | Yi Pi
     0    0 |  0  0
     0    1 |  1  1
     1    0 |  0  1
     1    1 |  0  1

The required functions are thus:

    Pi = Ii + Pi+1     (= (Ii'.Pi+1')', as shown in Figure 6)
    Yi = Ii . Pi+1'    (= (Ii' + Pi+1)', as shown in Figure 6)

This suggests one simple implementation of each module 1a, 1b, 1c, 1d using the NOR, NAND and NOT gates shown in Figure 6. Other logically equivalent circuits are easily derived. As can be seen in Figure 5, P15 is set to 0, and P11 is a signal that is 0 if none of the inputs I11-I14 are equal to 1.
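In software terms (an illustrative Python rendering of the cascade, equivalent to the Boolean forms above rather than a gate-level description):

    def select_most_significant_1(bits):
        """bits = [I14, I13, I12, I11]; returns ([Y14..Y11], P11)."""
        p = 0                         # P15 is tied to 0
        y = []
        for i_bit in bits:            # most significant module first
            y.append(i_bit & ~p & 1)  # Yi = Ii . Pi+1'
            p |= i_bit                # Pi = Ii + Pi+1
        return y, p                   # final carry-out is P11

    assert select_most_significant_1([0, 1, 1, 0]) == ([0, 1, 0, 0], 1)
    assert select_most_significant_1([0, 0, 0, 0]) == ([0, 0, 0, 0], 0)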
A further feature of the invention is that from outputs Y11-Y14 and P11 it is possible to derive a value for the gradient of the activation function, in other words the first derivative of the compressed output, with very little additional circuit overhead. This is of course particularly useful during the training phase, since it is required by the back-propagation algorithm.
The table below relates values of Y11-Y14 and P11 to the gradient of the curve, expressed as a 2's complement fractional binary number G0-G6.
Table 5
    Y14 Y13 Y12 Y11 P11 | G6 G5 G4 G3 G2 G1 G0
     1   0   0   0   1  |  0  0  0  0  0  0  1    (1/64)
     0   1   0   0   1  |  0  0  0  0  0  1  0    (1/32)
     0   0   1   0   1  |  0  0  0  1  0  0  0    (1/8)
     0   0   0   1   1  |  0  0  1  0  0  0  0    (1/4)
     0   0   0   0   0  |  0  0  1  0  0  0  0    (1/4)

This gives the simple implementation shown in Figure 7.
The gradient value output from this circuit (G0-G6) is valid regardless of the sign of the original input.
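A one-line-per-row Python equivalent of Table 5 (illustrative only; it returns the fraction that the G word encodes rather than the bit pattern):

    def gradient(y14, y13, y12, y11):
        """Segment gradient per Table 5; the last two rows share 1/4."""
        if y14:
            return 1 / 64
        if y13:
            return 1 / 32
        if y12:
            return 1 / 8
        return 1 / 4                  # Y11 = 1, or no 1 found (P11 = 0)

    assert gradient(0, 1, 0, 0) == 1 / 32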
The circuit of Figs. 3 to 6 can be compactly and elegantly implemented in VLSI, using only simple logic functions which operate at extremely high rates compared to arithmetic processes, and which occupy little chip area. Given the high operating speed, one potential application is in video image recognition, e.g. for a hybrid video coder (for example for use in a video phone). Other applications include speech conversion, industrial robot vision and natural language computing, and the principles behind it can easily be extended to other combinations of input and output wordlength.
It will be understood that the concept of the invention is applicable also to digital optical neural networks, and that it functions analogously with tri-state or other multi-level logic-state arithmetic; for an arithmetic system to base 'a', the slopes of the linear segments are powers of 'a' and the segment breakpoints are powers of 'a'.

References

1. RUMELHART, D.E., and McCLELLAND, J.L. (Eds.): 'Parallel Distributed Processing', Vol. 1 (The MIT Press, Cambridge, Mass., 1986).
2. SCHWARTZ, D.B., and HOWARD, R.E.: 'A Programmable Analog Neural Network Chip', Proc. IEEE Custom Integrated Circuits Conference, 1988.
3. MULLER, P., et al.: 'A General Purpose Analog Neural Computer', Proc. IEEE/INNS Int. Joint Conf. on Neural Networks, 1989.
4. SMITH, D.R.: 'Digital Transmission Systems' (Van Nostrand Reinhold, 1985), pp. 78-88.
Tables

Table 1. Breakpoints of the A-law and modified piecewise linear curves.
Table 2. Piecewise linear activation function truth table (corresponding to Figure 2b) for positive values of input.
Table 3. Truth table for circuit of Figure 5.
Table 4. Truth table for circuit of Figure 6.
Table 5. Truth table for circuit of Figure 7.
BREAKPOINTS

    SCALED A-LAW        MODIFIED CURVE
      x       y           x       y
    -8.0    0.0         -8.0    0.0
    -4.0    0.0625      -4.0    0.0625
    -2.0    0.125       -2.0    0.125
    -1.0    0.1875      -1.0    0.25
    -0.5    0.25          -       -
    -0.25   0.3125        -       -
    -0.125  0.375         -       -
     0.125  0.625         -       -
     0.25   0.6875        -       -
     0.5    0.75          -       -
     1.0    0.8125       1.0    0.75
     2.0    0.875        2.0    0.875
     4.0    0.9375       4.0    0.9375
     8.0    1.0          8.0    1.0
    I15 I14 I13 I12 I11 I10 I9 I8 I7 I6 I5 I4 I3 I2 I1 I0 | R7 R6 R5 R4 R3 R2 R1 R0
     0   0   0   0   0   a  b  c  d  x  x  x  x  x  x  x  |  0  1  0  0  a  b  c  d
     0   0   0   0   1   a  b  c  d  x  x  x  x  x  x  x  |  0  1  0  1  a  b  c  d
     0   0   0   1   a   b  c  d  x  x  x  x  x  x  x  x  |  0  1  1  0  a  b  c  d
     0   0   1   a   b   c  x  x  x  x  x  x  x  x  x  x  |  0  1  1  1  0  a  b  c
     0   1   a   b   c   x  x  x  x  x  x  x  x  x  x  x  |  0  1  1  1  1  a  b  c

Claims (14)

  1. A neural processor comprising means for receiving an input signal and means for producing a digital output signal corresponding to a function of the input signal and a weight value associated with the processor, further comprising means for applying a non-linear function to said output signal to produce a digital compressed output signal, wherein the function consists of a plurality of inclined linear segments with different slopes, the slope of each said segment being equal to a^n, where n is an integer and a is the base of the digital output signal.
  2. A neural processor according to claim 1, in which the intersections of the said segments occur at digital output signal values of a^m, where m is an integer and a is the base of the digital output signal.
  3. A neural processor according to claim 1 or claim 2, in which the said digital output is a binary number, the base a being 2.
  4. A neural processor according to any preceding claim, in which the slope of the function around its centre approximates that of a sigmoid function.
  5. A neural processor according to claim 4, in which the said slope of the function around its centre is 2^-2.
  6. A neural processor according to claim 4 or claim 5, including means for altering the said weight value in dependence upon the compressed output.
  7. A neural processor according to any preceding claim, in which the non-linear function means comprises shifting means for logically shifting at least some lower order digits of said output signal, in dependence upon the value of the higher order digits of said output signal, the said digits thus shifted forming digits of the compressed output signal.
  8. A neural processor according to claim 7, in which the non-linear function means further comprises a logic gate circuit connected to receive said high order digits and to provide an output for controlling said shifting means.
  9. A neural processor according to claim 8, in which the logic gate circuit is also for producing the high order digits of said compressed output signal.
  10. A neural processor according to claim 8 or claim 9, in which the logic gate circuit further includes means for generating a slope signal indicating the slope of the function corresponding to the said output signal.
  11. A neural network comprising a plurality of neural processors according to any preceding claim connected so that the compressed output signals of some processors may form the input signals to others.
  12. A pattern recognition device comprising a neural network according to claim 11.
  13. A visual pattern recognition device comprising a network according to claim 11 arranged to receive signals derived from a video signal as inputs.
  14. A neural processor substantially as described herein.
GB8922528A 1989-10-06 1989-10-06 Digital neural networks Expired - Fee Related GB2236608B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB8922528A GB2236608B (en) 1989-10-06 1989-10-06 Digital neural networks
HK132796A HK132796A (en) 1989-10-06 1996-07-25 Digital neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB8922528A GB2236608B (en) 1989-10-06 1989-10-06 Digital neural networks

Publications (3)

Publication Number Publication Date
GB8922528D0 GB8922528D0 (en) 1989-11-22
GB2236608A true GB2236608A (en) 1991-04-10
GB2236608B GB2236608B (en) 1993-08-18

Family

ID=10664161

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8922528A Expired - Fee Related GB2236608B (en) 1989-10-06 1989-10-06 Digital neural networks

Country Status (2)

Country Link
GB (1) GB2236608B (en)
HK (1) HK132796A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2254177A (en) * 1991-03-08 1992-09-30 Haneef Akhter Fatmi Neural machine
EP0546624A1 (en) * 1991-12-11 1993-06-16 Laboratoires D'electronique Philips S.A.S. Data processing system operating with piecewise non-linear function
FR2685109A1 (en) * 1991-12-11 1993-06-18 Philips Electronique Lab NEURAL DIGITAL PROCESSOR OPERATING WITH APPROXIMATION OF NONLINEAR ACTIVATION FUNCTION.
US5796925A (en) * 1991-12-11 1998-08-18 U.S. Philips Corporation Neural digital processor utilizing an approximation of a non-linear activation function
US5625753A (en) * 1992-04-29 1997-04-29 U.S. Philips Corporation Neural processor comprising means for normalizing data
US5537513A (en) * 1992-04-29 1996-07-16 U.S. Philips Corporation Neural processor which can calculate a norm or a distance
US5548686A (en) * 1992-04-29 1996-08-20 U.S. Philips Corporation Neural comprising means for calculating a norm or a distance
EP0652525A2 (en) * 1993-11-09 1995-05-10 AT&T Corp. High efficiency learning network
EP0652525A3 (en) * 1993-11-09 1995-12-27 At & T Corp High efficiency learning network.
EP1006437A1 (en) * 1998-11-30 2000-06-07 TELEFONAKTIEBOLAGET L M ERICSSON (publ) Digital value processor for estimating the square of a digital value
WO2000033174A1 (en) * 1998-11-30 2000-06-08 Telefonaktiebolaget Lm Ericsson (Publ) Digital value processor
US6463452B1 (en) 1998-11-30 2002-10-08 Telefonaktiebolaget Lm Ericsson Digital value processor
US9292790B2 2012-11-20 2016-03-22 Qualcomm Incorporated Piecewise linear neuron modeling
US9477926B2 (en) 2012-11-20 2016-10-25 Qualcomm Incorporated Piecewise linear neuron modeling
CN108154224A (en) * 2018-01-17 2018-06-12 北京中星微电子有限公司 For the method, apparatus and non-transitory computer-readable medium of data processing

Also Published As

Publication number Publication date
HK132796A (en) 1996-08-02
GB8922528D0 (en) 1989-11-22
GB2236608B (en) 1993-08-18

Similar Documents

Publication Publication Date Title
US6151594A (en) Artificial neuron and method of using same
US4972363A (en) Neural network using stochastic processing
US5506797A (en) Nonlinear function generator having efficient nonlinear conversion table and format converter
US4967388A (en) Truncated product partial canonical signed digit multiplier
Ramamoorthy et al. Bit-serial VLSI implementation of vector quantizer for real-time image coding
GB2236608A (en) Digital neural networks
KR920006793B1 (en) Learning machine
US5857178A (en) Neural network apparatus and learning method thereof
Hopfield The effectiveness of analogue 'neural network' hardware
US5956264A (en) Circuit arrangement for digital multiplication of integers
Abut et al. Vector quantizer architectures for speech and image coding
JPH06314185A (en) Variable logic and arithmetic unit
US5778153A (en) Neural network utilizing logarithmic function and method of using same
KR100326746B1 (en) System and method for approximating nonlinear functions
JP3172278B2 (en) Neural network circuit
US5781128A (en) Data compression system and method
Michel et al. Enhanced artificial neural networks using complex numbers
US20050033785A1 (en) Random number string output apparatus, random number string output method, program, and information recording medium
Vincent Finite Wordlength, Integer Arithmetic Multilayer Perceptron Modelling for Hardware Realization
Bochev Distributed arithmetic implementation of artificial neural networks
Bermak et al. VLSI implementation of a neural network classifier based on the saturating linear activation function
CA2135858A1 (en) Artificial neuron using adder circuit and method of using same
McGinnity et al. Novel architecture and synapse design for hardware implementations of neural networks
CN115952847A (en) Processing method and processing device of neural network model
Kwan Multiplierless designs for artificial neural networks

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20021006