CN110598858A - Chip and method for implementing a binary neural network based on non-volatile memory computing


Info

Publication number
CN110598858A
Authority
CN
China
Prior art keywords
binary
binarization
switching element
signal
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910713408.6A
Other languages
Chinese (zh)
Inventor
Wang Kang (康旺)
Biao Pan (潘彪)
Erya Deng (邓尔雅)
Weisheng Zhao (赵巍胜)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Beijing University of Aeronautics and Astronautics
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Aeronautics and Astronautics filed Critical Beijing University of Aeronautics and Astronautics
Priority to CN201910713408.6A
Publication of CN110598858A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention provides a chip and a method for implementing a binary neural network based on non-volatile memory computing. The chip comprises a nonvolatile operation module used for performing matrix multiply-add operations on a received first binarization data packet and a second binarization data packet prestored in the nonvolatile operation module. Because the weights of a binary neural network are usually fixed during inference while the input features of each network layer usually change with the application, the weights of the binary neural network are prestored in the nonvolatile operation module as the second binarization data packet and the input features are loaded into the nonvolatile operation module, so that the matrix multiply-add operation is realized inside the nonvolatile operation module, thereby solving the power consumption and latency problems caused by data movement.

Description

Chip and method for implementing a binary neural network based on non-volatile memory computing
Technical Field
The invention relates to the technical field of semiconductor integrated circuit applications, and in particular to a chip and a method for implementing a binary neural network based on non-volatile memory computing.
Background
With the development of deep learning theory and improvements in numerical computing hardware, deep neural network technology has advanced rapidly and is widely applied in fields such as computer vision and natural language processing. At present, neural networks generally use floating-point computation, which requires large storage space and long operation time.
A Binary Neural Network (BNN) is a neural network obtained by binarizing both the weight values in the weight matrix and the activation function values (feature values) of a floating-point neural network, i.e., constraining every weight and activation value to 1 or -1. Through binarization, the model parameters occupy far less storage space (in theory, memory consumption drops to 1/32 when going from float32 to 1 bit), and bitwise operations replace the multiply-add operations in the network, greatly reducing operation time. A binary neural network can therefore alleviate the oversized models and excessive computation density encountered when current floating-point neural network models are applied in embedded or mobile scenarios (such as mobile phone terminals, wearable devices, and autonomous vehicles), effectively reducing storage occupation and operation time; owing to its potential advantages of high model compression ratio and high computation speed, it has become a popular research direction in deep learning in recent years.
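As a concrete illustration of the binarization described above, here is a minimal sketch (Python with NumPy; the sign-based binarization and the function name are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def binarize(x: np.ndarray) -> np.ndarray:
    """Map real values to +1/-1 with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

# A float32 weight needs 32 bits; its binarized form needs only 1 bit,
# which is the theoretical 1/32 memory reduction mentioned above.
w = np.random.randn(3, 3).astype(np.float32)
print(binarize(w))  # every entry is +1 or -1
```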
However, although a binary neural network reduces storage space occupation and operation time compared with a floating-point neural network, data still need to be transferred between the memory and the processor, and this frequent data movement still causes considerable power consumption and latency.
Disclosure of Invention
In view of this, the invention provides a chip and a method for implementing a binary neural network based on non-volatile memory computing, which avoid frequent data movement between memory and processor and thereby solve the resulting power consumption and latency problems. In addition, because the binary neural network weight data are stored in nonvolatile storage units, no data are lost on power failure, static power consumption is greatly reduced, and performance and reliability are improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, a chip for implementing a binary neural network based on non-volatile memory computing is provided, comprising: a nonvolatile operation module used for performing matrix multiply-add operations on a received first binarization data packet and a second binarization data packet prestored in the module.
Further, the first binarization data packet comprises at least one binarization characteristic data, and the binarization characteristic data comprises at least one binarization characteristic signal; the second binarization data packet comprises at least one binarization weight data, and the binarization weight data comprises at least one binarization weight signal;
the nonvolatile operation module comprises a plurality of binary operation sub-modules, and the binary operation sub-modules are used for carrying out matrix multiplication and addition operation on binary characteristic data and binary weight data;
the binary operator module comprises: at least one binary exclusive-nor logic operation circuit and a counter connected to the at least one binary exclusive-nor logic operation circuit,
the binary exclusive OR circuit is used for carrying out exclusive OR operation on a binary characteristic signal and a binary weight signal, and the counter is used for counting the number of the binary exclusive OR circuits with the exclusive OR operation result of 1 in the binary operation submodule to be used as a matrix multiplication and addition operation result of the binary characteristic data and the binary weight data.
Further, the binary exclusive nor logic operation circuit includes: the CMOS transistor logic tree reading circuit comprises a nonvolatile storage unit, a control switch connected with the nonvolatile storage unit, a CMOS transistor logic tree unit connected with the nonvolatile storage unit and a reading unit connected with the CMOS transistor logic tree unit;
the nonvolatile storage unit is used for storing the binary weight signal;
the CMOS transistor logic tree unit loads the binary characteristic signal;
the control switch responds to a control signal to control the reading unit to carry out reading operation so as to read out the logical operation result of the same or of the binarization characteristic signal and the binarization weight signal;
the counter counts in response to the control signal.
Further, still include: a write module;
the writing module is connected with the nonvolatile storage unit and is used for writing the binary weight signal into the nonvolatile storage unit.
Further, the number of the nonvolatile memory cells of the binary exclusive-nor logic operation circuit is 2 or more,
the binary exclusive nor logic operation circuit further includes: a multi-way selector switch;
the multi-way selection switch is connected between the nonvolatile memory units and the CMOS transistor logic tree unit and used for selectively connecting one of the nonvolatile memory units to the CMOS transistor logic tree unit.
Further, the nonvolatile memory cell includes: a first switching element, a second switching element, a first nonvolatile memory device, and a second nonvolatile memory device;
the control end of the first switch element is connected with a first node, the first end is connected with a second node, and the second end is connected with one end of the first nonvolatile memory device;
the other end of the first nonvolatile memory device is connected with the second end of the control switch and one end of the second nonvolatile memory device;
the first end of the control switch is connected with a first level, and the control end is connected with a first control signal;
the control end of the second switch element is connected with the first node, the first end is connected with the third node, and the second end is connected with the other end of the second nonvolatile memory device;
wherein the first node receives a signal on a word line, the second node receives a signal on a bit line, and the third node receives a signal opposite to the signal received by the second node, the first non-volatile memory device is used for storing the binary weight signal, and the second non-volatile memory device is used for storing an opposite signal of the binary weight signal.
Further, the CMOS transistor logic tree cell includes: a third switching element, a fourth switching element, a fifth switching element, and a sixth switching element;
the reading unit includes: a seventh switching element, an eighth switching element, a ninth switching element, a tenth switching element, an eleventh switching element, and a twelfth switching element;
the control end of the third switching element is used as a first input end and used for receiving the binarization characteristic signal, the first end is connected with the second node, and the second end is connected with the first end of the fourth switching element and the first end of the seventh switching element;
the control end of the fourth switching element is used as a second input end and is used for receiving an opposite signal of the binarization characteristic signal, and the second end is connected with the third node;
the control end of the fifth switching element is used as a second input end and is used for receiving an opposite signal of the binarization characteristic signal, the first end is connected with the second node, and the second end is connected with the first end of the eighth switching element and the second end of the sixth switching element;
the control end of the sixth switching element is used as a first input end for receiving the binarization characteristic signal, and the first end is connected with the third node;
a second terminal of the seventh switching element is used as a first output terminal for outputting an opposite signal of an exclusive nor logical operation result, and is connected to the second terminal of the tenth switching element, the second terminal of the ninth switching element, the control terminal of the eighth switching element, and the control terminal of the eleventh switching element, and a control terminal of the seventh switching element is used as a second output terminal for outputting the exclusive nor logical operation result, and is connected to the control terminal of the tenth switching element, the second terminal of the eleventh switching element, and the second terminal of the twelfth switching element;
the control end of the ninth switching element is connected with a second control signal, and the first end of the ninth switching element is connected with a second level;
the first end of the tenth switching element is connected to the second level;
a first end of the eleventh switching element is connected to a second level;
the first end of the twelfth switching element is connected to the second level, and the control end is connected to the second control signal.
Further, still include: the post-processing module is connected with the nonvolatile operation module;
the post-processing module is used for carrying out arithmetic operation on the matrix multiplication and addition operation result of the nonvolatile operation module.
In a second aspect, a method for implementing a binary neural network based on non-volatile memory computing is provided, including:
loading at least one binarization characteristic signal contained in binarization characteristic data to a CMOS transistor logic tree unit in at least one binary exclusive-nor logic operation circuit of a binary operation sub-module, wherein a binarization weight signal corresponding to the binarization characteristic signal is prestored in a nonvolatile storage unit corresponding to the CMOS transistor logic tree unit in the binary exclusive-nor logic operation circuit;
loading a control signal to a control switch in the binary exclusive nor logic operation circuit, so that the control switch responds to the control signal to control a reading unit in the binary exclusive nor logic operation circuit to perform reading operation, and reading out an exclusive nor logic operation result of the binarization characteristic signal and the binarization weight signal;
and meanwhile, loading the control signal to a counter of the binary operation sub-module, so that the counter, in response to the control signal, counts the number of binary exclusive-nor circuits in the binary operation sub-module whose exclusive-nor result is 1, as the operation result of the binary operation sub-module.
Further, still include:
and controlling a writing module to write the binary weight signal into the nonvolatile storage unit.
Further, still include:
and controlling a multi-way selection switch in the binary exclusive-nor logic operation circuit to selectively connect one of a plurality of nonvolatile memory cells in the binary exclusive-nor logic operation circuit to the CMOS transistor logic tree unit.
Further, still include:
the plurality of binarization weight signals are written into the plurality of nonvolatile memory cells, respectively.
Further, still include:
dividing a plurality of binary characteristic signals to be operated into at least two groups;
inputting at least two groups of binarization characteristic signals into the binarization operation sub-module in a time-sharing manner so as to perform at least two operations on the at least two groups of binarization characteristic signals in a time-sharing manner by using the binarization operation sub-module, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the binarization operation sub-module;
and taking the accumulated count value of the counter in the binary operation sub-module corresponding to the at least two operations as the operation result of the plurality of binary characteristic signals to be operated.
Further, still include:
dividing a plurality of binary characteristic signals to be operated into at least two groups;
simultaneously inputting at least two groups of binarization characteristic signals into at least two binarization operation sub-modules so as to simultaneously operate the at least two groups of binarization characteristic signals by utilizing the at least two binarization operation sub-modules, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the at least two binarization operation sub-modules;
and taking the sum of the count values of the counters of the at least two binary operation submodules as the operation result of the plurality of binary characteristic signals to be operated.
The invention provides a chip and a method for implementing a binary neural network based on non-volatile memory computing, the chip comprising a nonvolatile operation module used for performing matrix multiply-add operations on a received first binarization data packet and a second binarization data packet prestored in the nonvolatile operation module. Because the weights of a binary neural network are usually fixed during inference while the input features of each network layer usually change with the application, the weights are prestored in the nonvolatile operation module as the second binarization data packet and the input features are loaded into the nonvolatile operation module, so that the matrix multiply-add operation is realized inside the nonvolatile operation module.
In addition, the nonvolatile memory unit has a low write speed but occupies little area and has low cost and low power consumption, whereas the CMOS transistor logic tree unit is fast but occupies more area at higher cost. Storing the rarely changed weight data in the nonvolatile memory units and loading the frequently changed feature data into the CMOS transistor logic tree units therefore exploits the advantages of both kinds of device, balancing speed, area, cost and power consumption.
In addition, the plurality of nonvolatile storage units in each binary exclusive-nor circuit can store different weights; by controlling the multi-way selection switch, different weights are brought into the operation in a time-division-multiplexed manner, realizing multiple operations with different weights or multi-layer operations of the same neural network. This effectively improves hardware utilization and further balances speed, area, cost and power consumption.
Furthermore, since the operation scales of different layers of a binary neural network may differ, suppose the layer with the largest operation scale is 100 × 5, i.e., 100 input feature data and 5 weight data; normally a 100 × 5 binary operation sub-module array would be needed. Because the present invention uses the count value of the counter as the operation result of the binary operation sub-module, the 100 input feature data may be divided into 2 groups of 50 input feature data or 4 groups of 25 input feature data. For synchronous operation, two 50 × 5 binary operation sub-module arrays or four 25 × 5 arrays can be used; for asynchronous operation, one 50 × 5 array can perform the operation twice, or one 25 × 5 array four times. In this way, the operation of a large array is realized through small arrays, improving hardware utilization.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. In the drawings:
FIG. 1 is a diagram of a chip implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
FIG. 2 is a block diagram of a non-volatile computing module according to an embodiment of the present invention;
FIG. 3 is a block diagram of the binary arithmetic sub-module of FIG. 2;
FIG. 4 is a block diagram of a binary exclusive-nor logic circuit according to an embodiment of the present invention;
FIG. 5a is a first circuit diagram of a binary exclusive-nor logic operation circuit according to an embodiment of the present invention;
FIG. 5b is a second circuit diagram of a binary exclusive-nor logic operation circuit according to an embodiment of the present invention;
FIG. 6 is a timing diagram illustrating operation of a binary exclusive-nor logic circuit according to an embodiment of the present invention;
FIG. 7 is a first flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the operation of a binary convolutional neural network;
fig. 9 illustrates a principle of obtaining elements in the convolution kernel according to a preset rule to obtain a plurality of binarization weight signals in the embodiment of the present invention;
fig. 10 illustrates a principle of obtaining a plurality of binarization feature signals corresponding to the plurality of binarization weight signals according to the input feature map, the convolution kernel and the preset rule in the embodiment of the present invention;
FIG. 11 illustrates the principle of implementing a binary convolutional neural network operation in an embodiment of the present invention;
FIG. 12 is a second flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
FIG. 13 is a third flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
FIG. 14 is a fourth flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
FIG. 15 is a fifth flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention;
fig. 16 shows the principle of implementing the operation of a large array by a small array in the embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Unless otherwise specified, the expression "element a is connected to element B" means that element a is "directly" or "indirectly" connected to element B through one or more other elements.
In the prior art, although a binary neural network reduces storage space occupation and operation time compared with a floating-point neural network, data still need to be transferred between the memory and the processor, which brings considerable power consumption and performance overhead and particularly hinders the application of binary neural networks on low-power, low-cost terminal devices.
To solve the foregoing technical problems in the prior art, an embodiment of the present invention provides a chip for implementing a binary neural network based on nonvolatile memory computing, where the chip for implementing a binary neural network based on nonvolatile memory computing may include: and the nonvolatile operation module is used for carrying out matrix multiplication and addition operation on the received first binarization data packet and a second binarization data packet prestored in the nonvolatile operation module.
The weights of the binary neural network are prestored in the nonvolatile operation module as the second binarization data packet, and the input features of the binary neural network are loaded into the nonvolatile operation module, so that the matrix multiply-add operation is realized inside the nonvolatile operation module and the operation result of a simple binary neural network can be obtained directly. The nonvolatile operation module occupies little area and has low cost and low power consumption, and the in-memory computing architecture further solves the power consumption and latency problems caused by data movement, making it very suitable for low-power, low-cost terminal scenarios.
In an optional embodiment, as shown in fig. 1, the chip for implementing a binary neural network based on non-volatile memory computing may further include: a post-processing module 200 connected to an output terminal of the nonvolatile operation module 100. The post-processing module 200 is configured to perform further arithmetic operations on the matrix multiply-add result output by the nonvolatile operation module 100. These arithmetic operations include shifting, biasing, averaging, taking maximum and minimum values, applying activation functions, and the like.
The post-processing module performs arithmetic operations such as shifting, biasing, averaging, maximum and minimum values, activation function and the like, so that the operation result of the complex binary neural network can be obtained.
For example, the activation function can binarize the operation result of the previous neural network layer, so that the chip can directly compute the next layer without transmitting the result to the processor for binarization, thereby reducing the number of data transfers.
In an optional embodiment, the chip for implementing the binary neural network based on the non-volatile memory computing may further include: an input register 300 for registering data to be operated, and an output register 400 for registering an operation result of the nonvolatile operation module 100, wherein the input register 300 is connected to the front end of the nonvolatile operation module 100, and the output register 400 is connected to the output end of the post-processing module 200.
In an optional embodiment, the chip for implementing the binary neural network based on the non-volatile memory computing may further include: an on-chip memory 500 for temporarily storing data to be operated and/or intermediate data and/or operation results, the on-chip memory 500 being connected between the input interface of the chip and the input register 300.
In an optional embodiment, the chip for implementing the binary neural network based on the non-volatile memory computing may further include: and a controller 600 for controlling the operation of each module, wherein the controller 600 is connected with each module of the chip so as to control the operation of each module.
In an optional embodiment, the chip for implementing the binary neural network based on the non-volatile memory computing may further include: and the clock circuit is used for generating a clock signal as a control signal for controlling the switch and the counter. At this time, a clock signal is generated by a clock circuit of the chip itself and supplied as a control signal to the nonvolatile operation module 100, and an external control signal is not necessary.
It will be understood by those skilled in the art that the control signal input to the nonvolatile operation module 100 may include a first control signal and/or a second control signal, and the first control signal and the second control signal may be control signals that are opposite in phase to each other.
In a further embodiment, the clock circuit may further comprise an inverter for generating an inverted clock signal as the inverted control signal.
It is worth mentioning that the first binarization data packet comprises at least one binarization feature data, and the binarization feature data comprises at least one binarization feature signal; the binarized feature signal is "1" or "0". The second binarization data packet comprises at least one binarization weight data, the binarization weight data comprises at least one binarization weight signal, and the binarization weight signal is '1' or '0'.
The binary neural network operation is mainly a matrix multiply-add operation, and a binary matrix multiply-add operation is equivalent to an XNOR (exclusive-NOR) logical operation. Assume a binarization feature data is [ 1, 1, 1, 1, 0 ] and a binarization weight data is [ 0, 0, 1, 1, 1 ], where bit 1 encodes the value +1 and bit 0 encodes the value -1. The element-wise truth table of the multiplication (in the ±1 domain) and of the XNOR logical operation on these data is as follows:

Feature bit A | Weight bit B | Product (±1 domain) | A XNOR B
      1       |      0       |         -1          |    0
      1       |      0       |         -1          |    0
      1       |      1       |         +1          |    1
      1       |      1       |         +1          |    1
      0       |      1       |         -1          |    0

Through analysis it can be found that the multiplication operation has the same truth table as the XNOR operation: the product is +1 exactly when the XNOR result is 1. For the above data, the number of XNOR results equal to 1, counted by a counter, is 2, and this count is used as the matrix multiply-add operation result (the dot product in the ±1 domain is recoverable from the count as 2 × 2 - 5 = -1).
The binarization feature data comprises 5 binarization feature signals, namely 1, 1, 1, 1 and 0, and the binarization weight data comprises 5 binarization weight signals, namely 0, 0, 1, 1 and 1.
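This equivalence is easy to check numerically. The sketch below (Python with NumPy, purely illustrative and not part of the patent) reproduces the table above, the counter value of 2, and the identity that recovers the ±1 dot product from the count:

```python
import numpy as np

feature_bits = np.array([1, 1, 1, 1, 0])  # encodes [+1, +1, +1, +1, -1]
weight_bits  = np.array([0, 0, 1, 1, 1])  # encodes [-1, -1, +1, +1, +1]

# XNOR of two bits is 1 where they agree and 0 where they differ
xnor = (feature_bits == weight_bits).astype(int)   # [0, 0, 1, 1, 0]
count = int(xnor.sum())                            # the counter value: 2

# The +/-1 dot product follows from the count: dot = 2*count - N
n = len(feature_bits)
f = 2 * feature_bits - 1                           # back to +/-1 values
w = 2 * weight_bits - 1
assert 2 * count - n == int(f @ w)                 # 2*2 - 5 == -1
print(count, 2 * count - n)
```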
Based on the above characteristics, the inventors found that a binary neural network operation can be realized by designing an exclusive nor logic operation.
FIG. 2 is a block diagram of a non-volatile computing module 100 according to an embodiment of the invention; as shown in fig. 2, the nonvolatile operation module 100 includes: the device comprises a plurality of binary operation sub-modules and a writing module connected with each binary operation sub-module.
The binary operation sub-module is used for carrying out matrix multiplication and addition operation on binary characteristic data and binary weight data.
And the writing module is used for writing the binary weight data into the corresponding binary operation submodule.
Because the computation load of neural network operations is large, the nonvolatile operation module 100 may include a plurality of binary operation sub-modules so as to perform exclusive-nor logic operations on a plurality of binarization feature data. The number of binary operation sub-modules is greater than or equal to 1; the plurality of binary operation sub-modules may share one write module, each binary operation sub-module may have its own write module, or a subset of the sub-modules may share one write module.
Each binary operation sub-module comprises at least one binary exclusive-nor logic operation circuit and a counter connected with the at least one binary exclusive-nor logic operation circuit, as shown in fig. 3.
The binary exclusive-nor logic operation circuits are arranged in an array, and the circuits in each column are connected to one counter.
Specifically, the binary exclusive-nor circuit is configured to perform an exclusive-nor operation on a binarization feature signal and a binarization weight signal, and the counter is configured to count the number of binary exclusive-nor circuits in the binary operation sub-module whose exclusive-nor result is 1, the count serving as the matrix multiply-add operation result of the binarization feature data and the binarization weight data.
FIG. 4 is a block diagram of a binary exclusive-nor logic operation circuit according to an embodiment of the present invention. As shown in fig. 4, the binary exclusive-nor logic operation circuit includes: a nonvolatile storage unit 3, a control switch 4 connected with the nonvolatile storage unit 3, a CMOS transistor logic tree unit 2 connected with the nonvolatile storage unit 3, and a reading unit 1 connected with the CMOS transistor logic tree unit 2. The nonvolatile storage unit 3 is used for storing a binarization weight signal, and the CMOS transistor logic tree unit 2 is loaded with a binarization feature signal. The control switch 4, in response to a control signal, controls the reading unit 1 to perform a read operation so as to read out the exclusive-nor logic operation result of the binarization feature signal and the binarization weight signal; the counter, in response to the control signal, counts the number of binary exclusive-nor logic operation circuits whose logic operation result is 1 in the binary operation sub-module connected with the counter, as the operation result of that sub-module.
Fig. 5a is a first circuit diagram of a binary exclusive nor logic operation circuit according to an embodiment of the present invention. As shown in fig. 5a, the nonvolatile memory cell 3 may include: a switching element M1, a switching element M2, a first nonvolatile memory device R1, and a second nonvolatile memory device R2;
a control terminal of the switching element M1 is connected to a first node, a first terminal is connected to a second node, and a second terminal is connected to one terminal of the first nonvolatile memory device R1;
the other end of the first nonvolatile memory device R1 is connected to the second end of the control switch Mc and one end of the second nonvolatile memory device R2;
a first end of the control switch Mc is connected to a first level, and a control end is connected to a first control signal;
a control terminal of the switching element M2 is connected to the first node, a first terminal thereof is connected to the third node, and a second terminal thereof is connected to the other terminal of the second nonvolatile memory device R2;
wherein a first node receives a signal on a word line WL, a second node receives a signal on a bit line BL, and a third node receives a signal opposite to the signal received by the second node; the first non-volatile memory device R1 is used for storing the binarization weight signal B, and the second non-volatile memory device R2 is used for storing the inverse signal B̄ of the binarization weight signal.
In addition, the nonvolatile memory device includes a resistive memory device, a spin memory device, a phase change memory device, and the like.
The CMOS transistor logic tree cell includes: a switching element M3, a switching element M4, a switching element M5, and a switching element M6;
the reading unit includes: a switching element M7, a switching element M8, a switching element M9, a switching element M10, a switching element M11, and a switching element M12;
a control terminal of the switching element M3 is used as a first input terminal for receiving the binary characteristic signal a, a first terminal is connected to the second node, and a second terminal is connected to the first terminal of the switching element M4 and the first terminal of the switching element M7;
the control terminal of the switching element M4 is used as a second input terminal for receiving the inverse signal Ā of the binarization feature signal, and its second terminal is connected with the third node;
a control terminal of the switching element M5 is used as a second input terminal for receiving an opposite signal of the binarized feature signal, a first terminal is connected to the second node, and a second terminal is connected to the first terminal of the switching element M8 and the second terminal of the switching element M6;
the control terminal of the switching element M6 serves as a first input terminal for receiving the binarized feature signal, and the first terminal is connected to the third node;
a second terminal of the switch element M7 is used as a first output terminal for outputting an inverse signal of an exclusive nor logic operation result, and is connected to the second terminal of the switch element M10, the second terminal of the switch element M9, the control terminal of the switch element M8, and the control terminal of the switch element M11, and a control terminal of the switch element M7 is used as a second output terminal for outputting the exclusive nor logic operation result, and is connected to the control terminal of the switch element M10, the second terminal of the switch element M11, and the second terminal of the switch element M12;
the control end of the switching element M9 is connected with a second control signal, and the first end is connected with a second level;
the first end of the switching element M10 is switched into a second level;
the first end of the switching element M11 is switched into a second level;
the first terminal of the switching element M12 is connected to the second level, and the control terminal is connected to the second control signal.
The control switch Mc, the switch element M1, the switch element M2, the switch element M3, the switch element M4, the switch element M5, the switch element M6, the switch element M7, and the switch element M8 may be implemented by NMOS transistors, the switch element M9, the switch element M10, the switch element M11, and the switch element M12 may be implemented by PMOS transistors, the first terminal may be a source of a MOS transistor, the second terminal may be a drain of the MOS transistor, and the control terminal may be a gate of the MOS transistor.
It should be noted that, as will be apparent to those skilled in the art, the high and low levels of the various signals are matched with the type of the transistor to achieve the corresponding functions. The high and low level transistors and the N-type and P-type transistors of the above embodiments are only examples, and the opposite combination is also within the scope of the present invention, for example, turning on the P-type transistor requires matching a low level signal, and turning on the N-type transistor requires matching a high level signal.
The transistor provided by the embodiment of the invention may be a field-effect transistor, of either the enhancement or the depletion type. More preferably, a low-temperature polysilicon (LTPS) TFT may be used, which reduces manufacturing cost and product power consumption, offers faster electron mobility and a smaller thin-film circuit area, and improves resolution and stability. An oxide semiconductor TFT may also be employed.
Of course, the first terminal of the transistor provided in the embodiments of the present invention may be a source, and the second terminal is a drain, or vice versa, and the present invention is not limited to this, and may be selected according to the type of the transistor.
In an alternative embodiment, the binary exclusive-nor logic operation circuit may include a plurality of nonvolatile memory cells; in this case the circuit further includes a multiplexing switch MUX (see fig. 5b), arranged between the plurality of nonvolatile memory cells and the CMOS transistor logic tree cell, for selectively connecting one of the nonvolatile memory cells to the CMOS transistor logic tree cell.
Fig. 6 is a timing diagram illustrating the operation of the binary exclusive-nor logic operation circuit according to the embodiment of the present invention. To help those skilled in the art better understand the present invention, the operation principle of the binary exclusive-nor logic operation circuit is described below with reference to the circuit diagram of fig. 5a.
When the control signal CLK is low (after a falling edge), the control switch Mc is turned off and the switching elements M9 and M12 are turned on; the CMOS transistor logic tree unit and the nonvolatile memory cell do not conduct, the readout circuit is in the precharge state, the relevant terminals of the switching elements M9, M10, M11 and M12 are all at high level, and the output terminal XNOR is high regardless of whether the binarization feature signal and the binarization weight signal are high or low. When CLK goes high (after a rising edge), the control switch Mc is turned on and the switching elements M9 and M12 are turned off; the CMOS transistor logic tree unit and the nonvolatile memory cell conduct, the readout circuit performs the read operation, and the output terminal XNOR outputs the result of the exclusive-nor logic operation of A and B.
For example, assume A is 1 and B is 0. The switching elements M3 and M6 are loaded with 1 and turned on, while M4 and M5 are loaded with 0 and turned off; data 1 is stored in the variable resistor R1 and data 0 in the variable resistor R2. The path through the switching element M6 and the variable resistor R2 then conducts, so the complementary output terminal outputs 1 and the XNOR terminal outputs 0, realizing the exclusive-nor logic operation of A and B (1 XNOR 0 = 0).
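The precharge/evaluate behavior just described can be condensed into a small behavioral model (an illustrative sketch only, not a transistor-level simulation; the signal names follow figs. 5a and 6):

```python
def xnor_cell_read(clk: int, a: int, b: int) -> int:
    """Behavioral model of one read: while CLK is low the output is
    precharged high; when CLK is high the complementary paths through the
    logic tree and the memory devices resolve the XNOR of A and B."""
    if clk == 0:
        return 1               # precharge phase: XNOR terminal held high
    return 1 if a == b else 0  # evaluate phase: XNOR of feature and weight

assert xnor_cell_read(1, 1, 0) == 0  # the A = 1, B = 0 example above
assert xnor_cell_read(1, 1, 1) == 1
assert xnor_cell_read(0, 1, 0) == 1  # output is high before evaluation
```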
Fig. 7 is a first flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention. The method for implementing the binary neural network based on the nonvolatile memory computing can be used for the above chip for implementing the binary neural network based on the nonvolatile memory computing, and referring to fig. 7, the method can include the following steps:
step S100: loading at least one binarization characteristic signal contained in binarization characteristic data to a CMOS transistor logic tree unit in at least one binarization exclusive-nor logic operation circuit of a binarization operation sub-module, wherein a binarization weight signal corresponding to the binarization characteristic signal is prestored in a nonvolatile storage unit corresponding to the CMOS transistor logic tree unit in the binarization exclusive-nor logic operation circuit.
Specifically, according to operation requirements, a neural network operation may involve a plurality of binarization feature data, each of which may include a plurality of binarization feature signals. In operation, the plurality of binarization feature data are loaded into a plurality of binary operation sub-modules that store the corresponding binarization weight data, and the binary operation sub-modules respectively perform the matrix multiply-add operation on the received binarization feature data and the corresponding binarization weight data.
For one of the binary operation sub-modules, the plurality of binarization feature signals in the received binarization feature data are loaded into the plurality of binary exclusive-nor logic operation circuits of the sub-module, the plurality of binarization weight signals in the binarization weight data having been written into those circuits; the counter in the sub-module counts the number of binary exclusive-nor logic operation circuits whose operation result is 1, which is the matrix multiply-add operation result of the binarization feature data and the corresponding binarization weight data.
It should be noted that, since the binary exclusive-nor logic operation circuits in a binary operation sub-module may form one or more columns, when a matrix multiply-add operation needs to be performed on one binarization feature data and one binarization weight data, the plurality of binarization feature signals of the feature data are loaded into a single column of circuits, and the plurality of binarization weight signals of the weight data are pre-written into that column.
When matrix multiply-add operations need to be performed on one binarization feature data and a plurality of binarization weight data, the binarization feature signals of the feature data are loaded into multiple columns of binary exclusive-nor logic operation circuits, and the plurality of binarization weight data are respectively pre-written into those columns, so that the matrix multiply-add operations of the feature data with all the weight data are realized at once. This effectively reduces the operation count and improves operation speed and efficiency, which is especially advantageous for convolutional neural networks with multiple convolution kernels.
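As a sketch of this multi-column arrangement (Python with NumPy; the function and data are illustrative assumptions), one binarization feature vector is matched against several weight columns at once, and each column delivers its own counter value:

```python
import numpy as np

def xnor_matvec(feature_bits: np.ndarray, weight_cols: np.ndarray) -> np.ndarray:
    """Match one feature vector of N bits against K weight columns at once,
    as when the same feature signals drive several columns of XNOR circuits.
    feature_bits: shape (N,); weight_cols: shape (N, K). Returns K counts."""
    agree = feature_bits[:, None] == weight_cols  # XNOR per cell, shape (N, K)
    return agree.sum(axis=0)                      # one counter per column

features = np.array([1, 1, 1, 1, 0])
weights = np.array([[0, 1],
                    [0, 1],
                    [1, 0],
                    [1, 1],
                    [1, 0]])                      # two weight columns
print(xnor_matvec(features, weights))             # [2 4]
```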
Step S200: loading a control signal to a control switch in the binary exclusive-nor logic operation circuit, so that the control switch, in response to the control signal, controls a reading unit in the binary exclusive-nor logic operation circuit to perform a read operation, and the exclusive-nor logic operation result of the binarization feature signal and the binarization weight signal is read out.
Step S300, loading the control signal to a counter of the binary operation sub-module, so that the counter counts the number of binary exclusive nor circuits in the binary operation sub-module, of which exclusive nor result is 1, in response to the control signal, as an operation result of the binary operation sub-module.
Whether the counter counts in response to the rising edge or the falling edge of the control signal is determined by the working timing of the reading unit.
In an alternative embodiment, the reading unit performs the reading action on the rising edge of the control signal, and the counter also counts on the rising edge of the control signal.
Wherein, step S200 and step S300 are performed synchronously.
In addition, the binary neural network may be a binary fully-connected neural network, a binary convolutional neural network, or the like; this is not limited in the embodiments of the present invention.
According to the above technical solution, the weight data, which rarely need to change, are stored in the nonvolatile storage units, while the feature data, which may change at any time, are loaded into the CMOS transistor logic tree units. This effectively exploits the advantages of each device, balancing speed, area, cost and power consumption; the operation result is obtained through a simple read operation, which simplifies the hardware implementation, suits parallel large-scale operation, and favors application on low-power, low-cost terminal devices.
For a binary fully-connected neural network, a fully-connected network layer can be realized through a binary exclusive-nor logic operation circuit array, the circuit area and the number of elements are effectively reduced, and integration is facilitated.
Fig. 8 is a working schematic diagram of a binary convolutional neural network. Referring to fig. 8, the input feature map includes 5 × 5 feature units and the convolution kernel includes 3 × 3 elements, with shift steps x = 1 and y = 1. When the convolution kernel is used to perform the convolution operation on the input feature map, the kernel first performs a matrix multiply-add operation with the first three elements of the first, second and third rows, and the result is taken as the first element of the first output row; the kernel then performs a matrix multiply-add operation with the second to fourth elements of the first, second and third rows, and the result is taken as the second element of the first output row; and so on, implementing the convolution operation.
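The sliding-window computation described above can be sketched as follows (Python with NumPy, illustrative; each output entry is the XNOR popcount that the corresponding column counter would deliver, using the 0/1 bit encoding introduced earlier):

```python
import numpy as np

def binary_conv2d(inp: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a binarized kernel over a binarized input map (0/1 bits,
    stride 1) and record the XNOR popcount of every window."""
    h, w = inp.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1), dtype=int)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = inp[i:i + kh, j:j + kw]
            out[i, j] = int((window == kernel).sum())  # XNOR popcount
    return out

inp = np.random.randint(0, 2, (5, 5))     # the 5x5 input map of fig. 8
kernel = np.random.randint(0, 2, (3, 3))  # the 3x3 kernel of fig. 8
print(binary_conv2d(inp, kernel))         # 3x3 map of window popcounts
```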
In an optional embodiment, when the method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention implements a binary convolutional neural network operation, the method further includes the following steps:
step 1: and obtaining elements in the convolution kernel according to a preset rule to obtain a plurality of binarization weight signals, wherein each binarization weight signal corresponds to one element in the convolution kernel.
For example, referring to fig. 9, the elements in the convolution kernel are taken in order from left to right and from top to bottom to obtain 9 binarization weight signals: -1, -1, 1, 1, 1, 1, 1, 1, -1. This can be understood as straightening the two-dimensional convolution kernel into a one-dimensional vector.
Step 2: and obtaining a plurality of binarization feature signals corresponding to the binarization weight signals according to the input feature map, the convolution kernel and the preset rule.
For example, referring to fig. 10, for the first operation (the shaded portion of fig. 8), since the convolution kernel is 3 × 3, 9 binarization feature signals are obtained in the same left-to-right, top-to-bottom order: 1, 1, 1, -1, 1, 1, -1, -1, 1. This can be understood as straightening the corresponding window of the input feature map.
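The straightening of figs. 9 and 10 can be sketched as follows (Python with NumPy; the kernel and window values are copied from the figures as described, and the final ±1-to-bit mapping is an assumption consistent with the 0/1 encoding used earlier):

```python
import numpy as np

kernel = np.array([[-1, -1,  1],
                   [ 1,  1,  1],
                   [ 1,  1, -1]])      # the 3x3 convolution kernel of fig. 9

# Straighten left-to-right, top-to-bottom into 9 weight signals
weight_signals = kernel.reshape(-1)    # [-1, -1, 1, 1, 1, 1, 1, 1, -1]

# The first 3x3 window of the input map (fig. 10) is straightened the same way
window = np.array([[ 1,  1,  1],
                   [-1,  1,  1],
                   [-1, -1,  1]])
feature_signals = window.reshape(-1)   # [1, 1, 1, -1, 1, 1, -1, -1, 1]

# Map the +/-1 values to the 1/0 bits actually written into the circuits
weight_bits = (weight_signals > 0).astype(int)
feature_bits = (feature_signals > 0).astype(int)
```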
Fig. 11 illustrates the principle of implementing the binary convolutional neural network operation in the embodiment of the present invention. Referring to fig. 11, when the operation unit provided by the embodiment of the present invention performs convolution operations on one binarization feature data containing a plurality of binarization feature signals with a plurality of convolution kernels, the convolution kernels are first converted, as described above, into a plurality of binarization weight data, each comprising a plurality of binarization weight signals. The binarization weight data are respectively written into multiple columns of binary exclusive-nor logic operation circuits, each circuit in a column storing one binarization weight signal; the binarization feature signals are then input into every column simultaneously, so that they are operated with the corresponding binarization weight signals. With the method provided by the embodiment of the present invention, convolution of the binarization feature data with multiple convolution kernels is carried out simultaneously, which greatly improves operation speed and efficiency and simplifies the operation.
In an optional embodiment, the method for implementing a binary neural network based on non-volatile memory computing may further include: acquiring a weight signal and a characteristic signal; and carrying out binarization on the weight signal and the characteristic signal to obtain a binarization weight signal and a binarization characteristic signal.
Specifically, when the scheme provided by the invention is used to operate an ordinary (non-binary) neural network, the feature signals and weight signals need to be binarized first, and then the operation is executed.
Fig. 12 is a second flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention. Referring to fig. 12, on the basis of the method shown in fig. 7, the method may further include:
step S10: and controlling a writing module to write the binarization weight signal into the nonvolatile storage unit.
In another alternative embodiment, when the binary exclusive-nor logic operation circuit includes a plurality of nonvolatile memory cells, a multiplexing switch needs to be disposed between the plurality of nonvolatile memory cells and the CMOS transistor logic tree unit. In this case, referring to fig. 13, on the basis of the method shown in fig. 7, the method may further include:
step S10': writing a plurality of binarization weight signals into the plurality of nonvolatile memory cells of the binarization operation sub-module, respectively.
Step S20: controlling a multiplexing switch in the binary exclusive-nor logic operation circuit to selectively connect one of a plurality of non-volatile memory cells in the binary exclusive-nor logic operation circuit to the CMOS transistor logic tree cell.
It is worth noting that by providing and controlling the multi-way selection switch, a plurality of weights can be prestored in the same binary exclusive-nor logic operation circuit; through multiplexing, multiple operations with different weights or multi-layer operations of the same neural network are realized, which effectively improves hardware utilization and further balances speed, area, cost and power consumption.
Fig. 14 is a fourth flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention. Referring to fig. 14, on the basis of the method shown in fig. 7, the method may further include:
step S1000: dividing a plurality of binary characteristic signals to be operated into at least two groups;
step S2000: inputting at least two groups of binarization characteristic signals into the binarization operation sub-module in a time-sharing manner so as to perform at least two operations on the at least two groups of binarization characteristic signals in a time-sharing manner by using the binarization operation sub-module, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the binarization operation sub-module;
step S3000: and taking the accumulated count value of the counter in the binary operation sub-module corresponding to the at least two operations as the operation result of the plurality of binary characteristic signals to be operated.
Wherein, steps S1000 and S2000 are executed before step S100, and step S3000 is executed after step S300.
By adopting the scheme, the operation of the large array can be realized through the small array, so that the utilization rate of hardware is improved.
FIG. 15 is a fifth flowchart of a method for implementing a binary neural network based on non-volatile memory computing according to an embodiment of the present invention. Referring to fig. 15, on the basis of the method shown in fig. 7, the method may further include:
step S1000: dividing a plurality of binary characteristic signals to be operated into at least two groups;
step S2000': and simultaneously inputting at least two groups of binarization characteristic signals into at least two binarization operation sub-modules so as to simultaneously operate the at least two groups of binarization characteristic signals by using the at least two binarization operation sub-modules, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the at least two binarization operation sub-modules.
Step S3000': and taking the sum of the count values of the counters of the at least two binary operation sub-modules as the operation result of the plurality of binary characteristic signals to be operated.
Wherein, steps S1000 and S2000 'are executed before step S100, and step S3000' is executed after step S300.
By adopting the scheme, the operation of the large array can be realized through the small array, so that the utilization rate of hardware is improved.
The operation principle of realizing a large array by small arrays is described below with reference to Fig. 16.
Because the operation scales of different layers of a binary neural network may differ, assume that the layer with the largest operation scale is 100 × 5, that is, there are 100 input feature data and 5 weight data. Normally a 100 × 5 array of binarization operation sub-modules would be needed. However, because the invention uses the count value of the counter as the operation result of the binarization operation sub-module, the 100 input feature data can be divided into 2 groups of 50 or 4 groups of 25. For simultaneous operation, 2 arrays of 50 × 5 sub-modules or 4 arrays of 25 × 5 sub-modules can be used, the counter counting the number of output values equal to 1 across the 2 or 4 arrays. For time-shared operation, a single 50 × 5 array can perform the operation twice, or a single 25 × 5 array four times, the counter accumulating the number of output values equal to 1 over the successive operations. In this way, the operation of a large array is realized by a small array, and hardware utilization is improved.
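The numerical equivalence of the full array and the split arrangements can be checked with a short, purely illustrative Python sketch for a single one of the 5 weight columns; the random data and the function name are assumptions made for the illustration:

import random

def xnor_popcount(features, weights):
    # Counter model: number of exclusive-nor results equal to 1.
    return sum(1 - (f ^ w) for f, w in zip(features, weights))

random.seed(0)
features = [random.randint(0, 1) for _ in range(100)]
column   = [random.randint(0, 1) for _ in range(100)]  # one weight column

full       = xnor_popcount(features, column)                    # 100-wide array
two_halves = sum(xnor_popcount(features[i:i + 50], column[i:i + 50])
                 for i in range(0, 100, 50))                    # 2 arrays of 50
four_parts = sum(xnor_popcount(features[i:i + 25], column[i:i + 25])
                 for i in range(0, 100, 25))                    # 4 arrays of 25
assert full == two_halves == four_parts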
It should be noted that the data to be operated on may be the input data of the neural network, or may be the operation result of a layer of the neural network, which then participates in the operation as the input data of the next layer.
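As a sketch of this chaining, the following Python fragment re-binarizes the counter outputs of one layer before they enter the next; the threshold rule shown here is an assumption made for the illustration and is not prescribed by the present embodiments:

def binarize(counts, threshold):
    # Hypothetical post-processing step: map each counter value back to a
    # 0/1 characteristic signal for the next layer.
    return [1 if c >= threshold else 0 for c in counts]

layer1_counts = [37, 52, 41, 60, 48]         # counter values of 5 sub-modules
layer2_input  = binarize(layer1_counts, 50)  # -> [0, 1, 0, 1, 0]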
The nonvolatile memory cell may be a memory cell of a resistive random access memory (ReRAM), a spin-transfer torque magnetic random access memory (STT-MRAM), a phase-change random access memory (PCRAM), or the like. The resistance of the nonvolatile memory cell has two states, one high and one low, representing data bits "0" and "1" respectively, or vice versa.
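A small Python sketch of this encoding is given below, assuming (as one of the two conventions just mentioned) that low resistance encodes "1", together with the complementary device pair described in claim 6; all names are illustrative:

LOW, HIGH = "low", "high"

def bit_from_resistance(state):
    # Read-out convention assumed for this sketch: low resistance -> 1.
    return 1 if state == LOW else 0

def program_complementary_pair(weight_bit):
    # One device stores the binarization weight signal, the other its
    # opposite, matching the complementary-cell scheme of claim 6.
    first_device  = LOW if weight_bit == 1 else HIGH
    second_device = HIGH if weight_bit == 1 else LOW
    return first_device, second_device

pair = program_complementary_pair(1)
print(pair)                          # -> ('low', 'high')
print(bit_from_resistance(pair[0]))  # -> 1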
In various embodiments of the present application, the control signal may be a clock signal or another type of control signal issued by the controller.
An embodiment of the present invention further provides an application of the nonvolatile operation module. Used in cooperation with compiling software, the module can implement binary neural networks for image processing, voice recognition, and the like, and is particularly suited to devices with offline operation requirements, such as vehicle-mounted navigators, military instruments, and handheld face recognition devices. The nonvolatile operation module may be disposed in an electronic device; specifically, the electronic device may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In summary, the chip and the method for implementing a binary neural network based on non-volatile memory computing according to the embodiments of the present invention perform computation directly in memory by means of the in-memory computing principle, which greatly reduces the volume of data movement and the transmission power consumption, thereby overcoming the architectural bottleneck of the von Neumann computing system and offering great application potential in low-power, low-cost terminal scenarios. In addition, the characteristics of the nonvolatile memory cells and of the CMOS transistors are matched to the characteristics of the weight data and the feature data in a neural network, so that the advantages of each device are effectively exploited and speed, chip area, cost, and power consumption are balanced. Since the operation result is obtained by a read operation, the hardware implementation is simplified and large-scale parallel operation is supported, which favors application in low-power, low-cost terminal devices.
The embodiments in the present specification are described in a progressive manner; for parts that are identical or similar, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for the relevant points, reference may be made to the corresponding description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. A chip for implementing a binary neural network based on non-volatile memory computing, characterized by comprising: a nonvolatile operation module for performing a matrix multiplication and addition operation on a received first binarization data packet and a second binarization data packet prestored in the nonvolatile operation module.
2. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 1, wherein the first binarization data packet comprises at least one binarization characteristic data, and the binarization characteristic data comprises at least one binarization characteristic signal; the second binarization data packet comprises at least one binarization weight data, and the binarization weight data comprises at least one binarization weight signal;
the nonvolatile operation module comprises a plurality of binarization operation sub-modules, each used for performing a matrix multiplication and addition operation on binarization characteristic data and binarization weight data;
the binarization operation sub-module comprises: at least one binary exclusive-nor logic operation circuit and a counter connected to the at least one binary exclusive-nor logic operation circuit,
wherein the binary exclusive-nor logic operation circuit is used for performing an exclusive-nor logic operation on a binarization characteristic signal and a binarization weight signal, and the counter is used for counting the number of binary exclusive-nor logic operation circuits in the binarization operation sub-module whose exclusive-nor logic operation result is 1, the count serving as the matrix multiplication and addition operation result of the binarization characteristic data and the binarization weight data.
3. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 2, wherein
the binary exclusive-nor logic operation circuit includes: a nonvolatile memory cell, a control switch connected to the nonvolatile memory cell, a CMOS transistor logic tree unit connected to the nonvolatile memory cell, and a reading unit connected to the CMOS transistor logic tree unit;
the nonvolatile memory cell is used for storing the binarization weight signal;
the CMOS transistor logic tree unit is loaded with the binarization characteristic signal;
the control switch responds to a control signal to control the reading unit to carry out reading operation so as to read out an exclusive OR logical operation result of the binarization characteristic signal and the binarization weight signal;
the counter counts in response to the control signal.
4. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 3, further comprising: a writing module;
the writing module is connected to the nonvolatile memory cell and is used for writing the binarization weight signal into the nonvolatile memory cell.
5. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 3, wherein the number of nonvolatile memory cells of the binary exclusive-nor logic operation circuit is greater than or equal to 2,
the binary exclusive nor logic operation circuit further includes: a multi-way selector switch;
the multi-way selection switch is connected between the nonvolatile memory units and the CMOS transistor logic tree unit and used for selectively connecting one of the nonvolatile memory units to the CMOS transistor logic tree unit.
6. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 3, wherein the nonvolatile memory cell comprises: a first switching element, a second switching element, a first nonvolatile memory device, and a second nonvolatile memory device;
the control end of the first switching element is connected with a first node, the first end of the first switching element is connected with a second node, and the second end of the first switching element is connected with one end of the first nonvolatile memory device;
the other end of the first nonvolatile memory device is connected with the second end of the control switch and one end of the second nonvolatile memory device;
a first end of the control switch is connected with a first level, and a control end is connected with a first control signal;
the control end of the second switching element is connected with the first node, the first end of the second switching element is connected with a third node, and the second end of the second switching element is connected with the other end of the second nonvolatile memory device;
wherein the first node receives a signal on a word line, the second node receives a signal on a bit line, and the third node receives a signal opposite to the signal received by the second node; the first nonvolatile memory device is used for storing the binarization weight signal, and the second nonvolatile memory device is used for storing the opposite signal of the binarization weight signal.
7. The chip for implementing a binary neural network based on non-volatile memory computing of claim 6, wherein the CMOS transistor logic tree cells comprise: a third switching element, a fourth switching element, a fifth switching element, and a sixth switching element;
the reading unit includes: a seventh switching element, an eighth switching element, a ninth switching element, a tenth switching element, an eleventh switching element, and a twelfth switching element;
a control end of the third switching element is used as a first input end and used for receiving the binarization characteristic signal, a first end is connected with the second node, and a second end is connected with a first end of the fourth switching element and a first end of the seventh switching element;
the control end of the fourth switching element is used as a second input end and is used for receiving an opposite signal of the binarization characteristic signal, and the second end is connected with the third node;
a control terminal of the fifth switching element is used as a second input terminal for receiving an opposite signal of the binarization characteristic signal, a first terminal is connected with the second node, and a second terminal is connected with a first terminal of the eighth switching element and a second terminal of the sixth switching element;
the control terminal of the sixth switching element is used as the first input terminal for receiving the binary characteristic signal, and the first terminal is connected to the third node;
a second terminal of the seventh switching element is used as a first output terminal for outputting an opposite signal of an exclusive nor logical operation result, and is connected to a second terminal of the tenth switching element, a second terminal of the ninth switching element, a control terminal of the eighth switching element, and a control terminal of the eleventh switching element, and a control terminal of the seventh switching element is used as a second output terminal for outputting the exclusive nor logical operation result, and is connected to a control terminal of the tenth switching element, a second terminal of the eleventh switching element, and a second terminal of the twelfth switching element;
a control end of the ninth switching element is connected with a second control signal, and a first end of the ninth switching element is connected with a second level;
a first end of the tenth switching element is connected to a second level;
a first end of the eleventh switching element is connected to a second level;
and the first end of the twelfth switching element is connected to the second level, and the control end of the twelfth switching element is connected to the second control signal.
8. The chip for implementing a binary neural network based on non-volatile memory computing according to claim 1, further comprising: the post-processing module is connected with the nonvolatile operation module;
and the post-processing module is used for performing arithmetic operation on the matrix multiplication and addition operation result of the nonvolatile operation module.
9. A method for implementing a binary neural network based on non-volatile memory computing, comprising:
loading at least one binarization characteristic signal contained in binarization characteristic data to a CMOS transistor logic tree unit in at least one binary exclusive-nor logic operation circuit of a binarization operation sub-module, wherein a binarization weight signal corresponding to the binarization characteristic signal is prestored in a nonvolatile memory cell corresponding to the CMOS transistor logic tree unit in the binary exclusive-nor logic operation circuit;
loading a control signal to a control switch in the binary exclusive nor logic operation circuit, so that the control switch controls a reading unit in the binary exclusive nor logic operation circuit to perform reading operation in response to the control signal, so as to read out an exclusive nor logic operation result of the binarization characteristic signal and the binarization weight signal;
and meanwhile, loading the control signal to a counter of the binarization operation sub-module, so that the counter, in response to the control signal, counts the number of binary exclusive-nor logic operation circuits in the binarization operation sub-module whose exclusive-nor logic operation result is 1, as the operation result of the binarization operation sub-module.
10. The method of implementing a binary neural network based on non-volatile memory computing of claim 9, further comprising:
controlling a writing module to write the binarization weight signal into the nonvolatile memory cell.
11. The method of implementing a binary neural network based on non-volatile memory computing of claim 9, further comprising:
controlling a multi-way selection switch in the binary exclusive-nor logic operation circuit to selectively connect one of a plurality of nonvolatile memory cells in the binary exclusive-nor logic operation circuit to the CMOS transistor logic tree unit.
12. The method of implementing a binary neural network based on non-volatile memory computing of claim 11, further comprising:
writing a plurality of binarized weight signals into the plurality of nonvolatile memory cells, respectively.
13. The method of implementing a binary neural network based on non-volatile memory computing of claim 9, further comprising:
dividing a plurality of binarization characteristic signals to be operated on into at least two groups;
inputting the at least two groups of binarization characteristic signals into the binarization operation sub-module in a time-sharing manner, so as to perform at least two operations on the at least two groups in a time-sharing manner using the binarization operation sub-module, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the binarization operation sub-module;
and taking the count value accumulated by the counter of the binarization operation sub-module over the at least two operations as the operation result of the plurality of binarization characteristic signals to be operated on.
14. The method of implementing a binary neural network based on non-volatile memory computing of claim 9, further comprising:
dividing a plurality of binarization characteristic signals to be operated on into at least two groups;
simultaneously inputting the at least two groups of binarization characteristic signals into at least two binarization operation sub-modules, so as to operate on the at least two groups simultaneously using the at least two binarization operation sub-modules, wherein binarization weight signals corresponding to the received binarization characteristic signals are stored in the at least two binarization operation sub-modules;
and taking the sum of the count values of the counters of the at least two binarization operation sub-modules as the operation result of the plurality of binarization characteristic signals to be operated on.