CN111144558A

CN111144558A - Multi-bit convolution operation module based on time-variable current integration and charge sharing

Info

Publication number: CN111144558A
Application number: CN202010257151.0A
Authority: CN
Inventors: 阿隆索·莫尔加多; 刘洪杰
Original assignee: Shenzhen Jiutian Ruixin Technology Co ltd
Current assignee: Shenzhen Jiutian Ruixin Technology Co ltd
Priority date: 2020-04-03
Filing date: 2020-04-03
Publication date: 2020-05-12
Anticipated expiration: 2040-04-03
Also published as: CN111144558B; WO2021197073A1

Abstract

The invention relates to an analog operation module, in particular to an analog operation module for convolution operations, and provides a set of analog multipliers and accumulators (MAC). Current integration in capacitors implements the multiplication of two multi-bit binary numbers in the convolution process, while charge sharing between the capacitors implements the addition. In the multiplication stage, a PWM signal controls the integration time of the current in the capacitor to be τ, 2τ, 4τ, ..., 2^(B-1)τ for the same clock period τ, so that the bit weight changes with each bit k when a binary multiplier of a given number of bits is multiplied. The approach applies to a range of multi-bit convolutions: it can implement a general convolution with two or more inputs, and the number of binary bits is adjustable. In particular, an array of bias operation units may be added. The invention can be used as a neural network convolution operation unit, or as an in-memory or near-memory operation unit implemented in operation accelerator hardware.

Description

Multi-bit convolution operation module based on time-variable current integration and charge sharing

Technical Field

The present invention relates to analog computation modules, and particularly to an analog computation module for convolution operations and an analog computation method for convolution operations.

Background

For quantization at low signal-to-noise ratio, analog computation is more efficient than conventional digital computation, so digital quantities are often converted into analog quantities for computation. This holds especially for neural networks, where analog computation consumes less energy than medium- and large-scale digital hardware implementations. Conventionally, data are stored on disk and must be fetched into memory for computation, and this process requires a large amount of I/O to conventional memory, which usually accounts for much of the power consumption. With analog in-memory and near-memory computing, the computation can instead be sent to the data and executed locally, which greatly increases operation speed, saves storage area, and reduces data movement and operation power. The invention provides an effective implementation of ultra-low-power analog in-memory or near-memory computation.

The recent paper "A Mixed-Signal Binarized Convolutional-Neural-Network Accelerator Integrating Dense Weight Storage and Multiplication for Reduced Data Movement", Symp. VLSI Circuits, pp. 141-142, 2018, presents an efficient implementation. It stores 1-bit weights in Static Random-Access Memory (SRAM) cells and performs the convolution on a mixed-signal input, covering the multiplications with 1-bit weights in a neural network that passes from an input layer to a convolution layer, then to a pooling layer, and finally to the output. However, the analog operation circuit in that document does not involve a change in the bit weight of the multiplier or multiplicand; it is limited to 1-bit multiplication at the input of the first layer and cannot be used for analog convolution of multi-bit binary numbers.

Only a few multi-bit operations involve a change in the bit weight of the multiplier or multiplicand, as in the following articles:

(1) "In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array", JSSC, pp. 915-924, 2017;
(2) "A 481pJ/decision 3.4M decision/s multifunctional deep in-memory inference processor using standard 6T SRAM array", arXiv:1610.07501, 2016;
(3) "A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing", arXiv:1811.04047, 2018;
(4) "A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning", ISSCC, pp. 396-398, 2018;
(5) "A 42 pJ/Decision 3.12TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training", ISSCC, pp. 490-491, 2018.

However, these multi-bit operations are implemented by modulating the control bus, by capacitive charge sharing, by pulse-width-modulated (PWM) control of SRAM reads and writes, by modifying the SRAM cell, or by complex digital matrix-vector processing using near-memory operations in the current domain. In all of these implementations, the multi-bit analog multiplier and accumulator relies on very complicated digital processing control. For quantization at low signal-to-noise ratio, conventional digital operation consumes far more energy than analog operation, so multi-bit operation under such digital processing control incurs a large energy cost.

In the binarized convolution proposed in CN201910068644, the potential change in the exclusive-OR stage is achieved by modulating a control bus in the SRAM. However, the scheme taught by that patent requires complex digital processing control, places high demands on the control module, and consumes excessive energy. There is therefore a need in the art for a solution that uses analog operation for low signal-to-noise-ratio signals to achieve ultra-low power consumption.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a multi-bit binary convolution analog operation module based on time-variable current integration and charge sharing, with ultra-low power consumption, a compact structure and high operation speed. The module supports general convolution with two or more inputs, and the number of binary bits is adjustable. It is particularly suited for use as a neural network convolution operation unit or as an analog in-memory operation unit implemented in operation accelerator hardware.

In addition to these advantages, implementing the module as a matrix of units suits convolution operation units in or near memory: it reduces the power spent on memory access and makes the physical implementation of the matrix more compact. To achieve this object, the following technical scheme is adopted:

Based on the two stages of the convolution operation, the invention provides a multi-bit convolution operation module based on time-adjustable current integration and charge sharing. The module includes: at least one digital input x_i; at least one digital-to-analog converter (DAC) for converting the digital input into a current Ix_i for transmission in the circuit; at least one weight w_ji, where, when the weight is expressed as a binary number, w_ji,k is the value of its k-th bit; a convolution operation array composed of a plurality of convolution operation units, each convolution operation unit (i, j, k) performing the bit-weighted multiplication of the 1-bit binary w_ji,k with the multi-bit binary x_i, the array completing the multiplication and addition of the convolution operation; and at least one output y_j.

In particular, the current Ix_i is obtained by the DAC converting the digital input x_i according to a given number of bits. The current Ix_i is mirrored or copied into the convolution array, so that the currents in the same j × k plane are identical. This allows a multi-bit signal to be input; the currents can be scaled in the DAC, and they reach the switches at the same time.

In particular, the convolution operation array has a size of i × j × k, and each operation unit (i, j, k) comprises a current Ix_i, a switch, an integration control module, a node a_ji,k and at least one capacitor.

In particular, the integration control module controls the integration time of the current in the capacitor, and, from U = Q/C, the resulting voltage across the capacitor varies with the current integration time. For a weight w_ji, w_ji,k is the value of the k-th bit of the binary representation of w_ji, with k ∈ [1, B]. Each bit w_ji,k corresponds to one convolution operation unit, and the convolution operation units in the k direction are arranged by bit w_ji,k from low to high.

In particular, the output of the AND gate of w_ji,k and the PWM signal in the control module controls the switch: when the output is 1, the switch is closed. The change in bit weight of the multiplicand or multiplier during binary multiplication is implemented in the module by using the PWM signal to control the integration time of the current in the capacitor. The PWM signal durations of the units corresponding to the same bit k of different weights w_ji are the same, and, within one weight, the PWM signal duration of the unit for each bit is 2 times that of the preceding lower bit. One end of the capacitor is grounded, so the voltage across the capacitor is the voltage at its upper plate.

In particular, the logic operation of the integration control module may be an AND gate or an OR gate. The module includes a Static Random-Access Memory (SRAM) cell, which may be implemented by the same 6T SRAM cell or by different SRAM cells, and which stores a bit w_ji,k. The inputs of the logic operation are w_ji,k and a PWM signal modulated according to the bit weight. The PWM signal implements the change in bit weight of the multiplication: its duration doubles with each bit position, i.e. for k = 1, 2, 3 the corresponding PWM signal durations are 1τ, 2τ and 4τ, and the duration for the k-th bit is 2^(k-1)*τ, where τ is the clock period of the PWM signal. The output of the logic operation controls the closing of the switch; in an operation unit with w_ji,k = 0, the current does not pass through the switch to be integrated in the capacitor, and the voltage at the node above the capacitor is 0.

Further, when the logic operation is an and gate, the PWM signal duration refers to a duration of a high level, and when the logic operation is an or gate, the PWM signal duration refers to a duration of a low level.

Further, assuming w_ji,1 = w_ji,B = 1, after the currents have been integrated in the capacitors for their different integration times, the stored charges differ, and the voltage across the capacitor for k = B is 2^(B-1) times the voltage across the capacitor for k = 1.

In particular, the voltage at node a_ji,k represents the multiplication result x_i*w_ji,k*2^(k-1). The node is connected to the upper plate of the capacitor, and its voltage is determined by the value w_ji,k of each bit of the weight and by the duration of the PWM signal. The combined voltage of the 1 × k convolution operation units corresponding to x_i represents the result of x_i*w_ji.

Further, for a given j, y_j is the voltage of the combined node obtained by connecting all nodes a_ji,k of one i × k plane. Because of the discharge characteristic of capacitors, the capacitors in the different operation units share charge through their connected nodes. After charge sharing is complete, every capacitor holds the same amount of charge, while the total charge obtained by current integration in the multiplication stage is unchanged. The accumulated voltage at the combined node represents the result of Σ x_i*w_ji, i.e.

$$ y_j = \sum_{i=1}^{N} x_i \, w_{ji} = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \, w_{ji,k} \, 2^{k-1} $$

which completes the convolution of one convolution kernel with the input matrix.

Further, a module to be used in a neural network operation unit usually needs an added bias. In the invention, the bias b_j is supplied, in addition to the given currents Ix_i, as an additional input with a fixed current I_b, and additional bias operation units are added that operate independently. The bias unit array has a size of j × k, and each operation unit (j, k) comprises a current I_b, a switch, an integration control module, a node a_j,k and a capacitor of value C_u.

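Further, the output y_j including the bias b_j is the sum of the accumulated voltage of all nodes a_j,k of the 1 × k bias units and the accumulated voltage of the convolution operation units, representing

$$ y_j = \sum_{i=1}^{N} x_i \, w_{ji} + b_j $$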
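As a rough illustration of how the bias units could contribute charge, the sketch below models only the ideal arithmetic: each bit of b_j gates the fixed current I_b for 2^(k-1)τ. The function names and the numeric values of `I_B` and `TAU` are assumptions made for this example and do not come from the patent.

```python
def bits_lsb_first(value, num_bits):
    """Return the bits of a non-negative integer, least significant first."""
    return [(value >> k) & 1 for k in range(num_bits)]


def bias_charge(b_j, num_bits, i_b, tau):
    """Charge collected by the 1 x k bias units for bias b_j (ideal model).

    Bit k (1-based) integrates the fixed current i_b for 2**(k-1) * tau
    when the bit is 1, and integrates nothing when the bit is 0.
    """
    return sum(bit * i_b * (2 ** idx) * tau
               for idx, bit in enumerate(bits_lsb_first(b_j, num_bits)))


I_B = 1e-6   # assumed fixed bias current (1 uA), illustrative only
TAU = 1e-9   # assumed PWM clock period (1 ns), illustrative only

# b_j = 5 is binary 101, so bits 1 and 3 integrate for tau and 4*tau.
q = bias_charge(b_j=5, num_bits=3, i_b=I_B, tau=TAU)
```

In this model the bias charge is b_j · I_b · τ, the same form as the charge x_i · w_ji · I_lsb · τ that an input/weight pair would contribute, which is why the bias can simply join the charge-sharing step.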

Further, a counter or clock divider is used to generate the PWM signal from a clock running at maximum speed, which shortens the capacitor integration time.
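One way such a counter could produce the binary-weighted pulses is sketched below. This is an assumed scheme for illustration (the patent does not specify the counter logic): a free-running tick counter is compared against 2^(k-1), so the PWM line for bit k stays high for the first 2^(k-1) clock periods of a cycle. The name `pwm_levels` is hypothetical.

```python
def pwm_levels(num_bits):
    """Yield (tick, levels) for one multiplication cycle.

    levels[k-1] is the PWM level for bit k: high for the first
    2**(k-1) ticks of the cycle, low afterwards.
    """
    cycle_ticks = 2 ** (num_bits - 1)   # longest pulse belongs to bit k = B
    for tick in range(cycle_ticks):     # free-running counter value
        yield tick, [1 if tick < 2 ** (k - 1) else 0
                     for k in range(1, num_bits + 1)]


# Count how many clock periods each PWM line is high for B = 3.
high_ticks = [0, 0, 0]
for _, levels in pwm_levels(3):
    high_ticks = [h + lv for h, lv in zip(high_ticks, levels)]
```

With B = 3 the three lines are high for 1, 2 and 4 ticks, matching the durations τ, 2τ and 4τ stated in the text.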

Further, to reduce kickback or transient effects on the current mirror, the switch is a virtual switch or a current device or a non-switching element.

The invention also comprises a multi-bit convolution analog operation method based on time-variable current integration and charge sharing, which comprises the following steps:

The DAC converts the digital input x_i, according to a given number of bits, into an analog current Ix_i, which is transmitted in the circuit;

The current Ix_i reaches the switch; the integration control module comprises a logic operation whose inputs are the k-th bit w_ji,k of the weight w_ji and a PWM signal modulated according to the bit weight; in the k direction the PWM signal duration of the convolution operation units doubles from each bit to the next higher bit, the duration for the k-th bit being 2^(k-1)τ, where τ is the clock period of the PWM signal; the output of the logic operation controls the closing of the switch;

When the switch is closed, the current Ix_i passes through the node a_ji,k connected to the upper plate of the capacitor, and the voltage across the capacitor is obtained after integration for a period of time; when the switch is open, the current does not pass through the node a_ji,k and the voltage across the capacitor after the integration period is 0; the integration time is the duration of the PWM signal, and the voltage at node a_ji,k represents the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation;

The nodes a_ji,k of all convolution operation units in one i × k plane are short-circuited, and charge sharing among the capacitors of these units yields the combined-node voltage, which represents the convolution result y_j = Σ x_i*w_ji.
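The four steps of the method can be sketched as an idealized numerical model. The sketch assumes a linear DAC (current `i_lsb` per input code), capacitors reset to 0 V, and no parasitics or mismatch; the function name `convolve_analog` and all default component values are assumptions for illustration, not values from the patent.

```python
def convolve_analog(x, w, num_bits, i_lsb=1e-6, tau=1e-9, c_u=1e-14):
    """Ideal model of one output y_j.

    x: list of non-negative integer inputs x_i
    w: list of non-negative integer weights w_ji (same length as x)
    Returns (combined node voltage, integer value recovered from it).
    """
    charges = []
    for x_i, w_ji in zip(x, w):
        current = x_i * i_lsb                      # step 1: DAC, Ix_i proportional to x_i
        for k in range(1, num_bits + 1):
            w_bit = (w_ji >> (k - 1)) & 1          # step 2: weight bit w_ji,k gates the PWM
            t_int = w_bit * (2 ** (k - 1)) * tau   # switch closed only when the bit is 1
            charges.append(current * t_int)        # step 3: Q = I * t on one capacitor C_u
    # Step 4: short all nodes a_ji,k; total charge is conserved and redistributed.
    v_comb = sum(charges) / (len(charges) * c_u)
    # Undo the known scale factors to read back the dot product.
    scale = i_lsb * tau / (len(charges) * c_u)
    return v_comb, round(v_comb / scale)


v, y = convolve_analog(x=[3, 1, 2], w=[5, 7, 4], num_bits=3)
```

Here `y` recovers 3·5 + 1·7 + 2·4 = 30. Note that the combined voltage is the total charge divided by the total capacitance, so it tracks the sum of products up to a fixed scale factor that a downstream ADC or reference would have to account for.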

Drawings

FIG. 1 is a diagram illustrating a circuit implementation of a multiply stage of a convolution operation according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an integration control module according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an output implementation of the convolution addition stage according to an embodiment of the present invention (the ADC is not shown in the figure; when y_j needs to be converted to a digital output, an ADC can be added before each output y_j);

FIG. 4 is a schematic diagram of a reuse of an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating an implementation of adding offset unit multiplication for convolution operation according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the output after the bias is added according to an embodiment of the present invention.

Description of the main reference symbols:

Element	Symbol
Module	10
Digital-to-analog converter	101
Convolution operation unit	102
Integration control module	103
PWM signal	1031
Static random-access memory	1032
AND gate	1033
Switch	1021
Capacitor	1022
Multiplexer	104
Attenuation capacitor	105
Bias unit array	106
Bias operation unit	1061
Bias integration control module	1062
Digital input	x_i
Current	Ix_i
Weight	w_ji

Detailed Description

In order to make the objects, principles, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

It is to be understood that the specific embodiments described herein are for purposes of illustration, but the invention may be practiced otherwise than as specifically described and that there may be variations which will occur to those skilled in the art without departing from the spirit of the invention and therefore the scope of the invention is not limited to the specific embodiments disclosed below.

Referring to FIG. 1, consider a general convolution operation as follows:

The multi-bit binary numbers x_i form an input matrix, with i from 1 to N; a plurality of weights w_ji form a convolution kernel, also called a weight matrix, where j denotes the j-th window corresponding to a given i; for an n × n input matrix and an m × m convolution kernel (weight matrix), j runs from 1 to n-m+1 (n > m, the window moves); the outputs are y_j, and all y_j together form the convolution result, i.e. one layer of feature extraction of the neural network;

When w_ji is represented as a multi-bit binary number, w_ji,k is the value of the k-th bit of w_ji; the convolution Σ x_i*w_ji of two multi-bit binary numbers is divided into two stages:

Multiplication stage: the input x_i is multiplied by each bit of the weight w_ji and by the bit weight 2^(k-1) of that bit, i.e. x_i*w_ji,k*2^(k-1), where w_ji,k is 0 or 1.

Addition stage: the results of all multiplications in the multiplication stage are accumulated to obtain the output y_j.

For the output y_j, with the size of the convolution kernel fixed, when the module of the invention is used for the convolution calculation of a neural network, the weight matrix formed by the weights w_ji in the multiplication stage is shared, i.e. as j goes from 1 to n-m+1, w_1i = w_2i = w_3i = ... = w_ji.

For the above convolution with multi-bit binary numbers, the invention must handle the change in bit weight when the multiplicand is multiplied by each bit of the multiplier in the multiplication stage, and the accumulation of the multiplication results in the addition stage.
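The two-stage decomposition with a shared kernel can be checked with plain integer arithmetic. The sketch below is a purely digital reference, simplified to a one-dimensional input and kernel (an assumption made to keep the example short; the text describes n × n inputs and m × m kernels). The name `conv1d_bitwise` is hypothetical.

```python
def conv1d_bitwise(x, kernel, num_bits):
    """Sliding-window convolution computed bit by bit over the weights.

    x: list of non-negative integer inputs
    kernel: list of non-negative integer weights shared by every window
    Returns the list of outputs y_j for j = 1 .. n - m + 1.
    """
    n, m = len(x), len(kernel)
    outputs = []
    for j in range(n - m + 1):                  # window index j
        acc = 0
        for i in range(m):                      # position inside the window
            for k in range(1, num_bits + 1):
                w_bit = (kernel[i] >> (k - 1)) & 1      # w_ji,k is 0 or 1
                acc += x[j + i] * w_bit * 2 ** (k - 1)  # multiplication stage
        outputs.append(acc)                     # addition stage result y_j
    return outputs


y = conv1d_bitwise(x=[1, 2, 3, 4], kernel=[2, 5], num_bits=3)
```

Each output equals the ordinary dot product of the window with the kernel (12, 19 and 26 here), which confirms that summing the per-bit partial products x_i*w_ji,k*2^(k-1) reproduces Σ x_i*w_ji.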

The embodiment of the invention provides an operation module 10 that implements the above multi-bit convolution operation based on time-adjustable current integration and charge accumulation. The module 10 comprises: at least one digital input x_i; at least one digital-to-analog converter (DAC) 101, which converts the digital input into a current Ix_i for transmission in the circuit; at least one weight w_ji, where, when the weight is expressed as a binary number, w_ji,k is the value of its k-th bit; a convolution operation array comprising a plurality of convolution operation units 102, the array having a size of i × j × k and each convolution operation unit 102 (i, j, k) including a current Ix_i, a switch 1021, an integration control module 103, a node a_ji,k and a capacitor 1022 of value C_u, one end of which is grounded and which must be reset to a given DC voltage before the convolution operation, the array performing the multiplication and addition of the convolution operation; and at least one output y_j.

In the multiplication stage, as shown in FIG. 1, an AND operation with the PWM signal 1031 implements the multi-bit weighting. In this embodiment, the convolution operation units are implemented as a matrix of units in or near memory, which reduces the power spent on memory access and makes the physical implementation of the matrix more compact. Specifically, the digital-to-analog converter 101 converts the digital input x_i into an analog current Ix_i according to a given number of bits; the resolution of the DAC matches the number of bits of the digital input x_i. The current Ix_i is mirrored or copied to the j × k convolution operation units 102 corresponding to the same i, so that the current integration in the convolution operation units 102 of the different i × k planes along the j direction can proceed simultaneously. In the k direction, the bit position of the weight w_ji increases, and the corresponding convolution operation units 102 are arranged by bit w_ji,k from low to high. The current Ix_i converted by the DAC can, as required, be scaled in the DAC before being transmitted in the circuit, so that the current value does not exceed a given threshold and the transmission power loss is reduced. The current Ix_i then passes through the switch 1021. To reduce kickback or transient effects on the current mirror, the switch 1021 may be a virtual switch, a current device, or a non-switching element such as a current device or a virtual load.

The integration control module 103 controls the switch 1021. For example, the logic operation of the module may be an AND gate 1033. The module includes a Static Random-Access Memory (SRAM) cell 1032; across the whole convolution array, the SRAM cells may be the same 6T SRAM cell or different SRAM cells. The cell stores one bit w_ji,k of the binary number w_ji, the k direction running from the low bits to the high bits of the weight w_ji. The inputs of the AND gate 1033 are w_ji,k and the PWM signal 1031 modulated according to the bit weight, and the output of the AND gate 1033 controls the opening and closing of the switch 1021, thereby implementing the change in bit weight when the multiplicand is multiplied by each bit of the multiplier in the binary multiplication stage. In particular, the PWM signal 1031 at one input of the AND gate 1033 is modulated according to the bit position of the weight w_ji to which the unit corresponds. In the k direction, the duration of the PWM signal 1031 of the i × j units doubles from each bit to the adjacent higher bit: for example, for k = 1, 2, 3 the durations of the corresponding PWM signals 1031 are 1τ, 2τ and 4τ, and the duration for the k-th bit is 2^(k-1)τ, where τ is the clock period of the PWM signal 1031.
Note that, in this embodiment, the duration of the PWM signal 1031 refers to the duration of its high level. When the current bit w_ji,k is 1 and the PWM signal 1031 is high, the AND gate 1033 outputs 1, the switch 1021 closes, and the current Ix_i flows through the switch 1021 into the capacitor 1022, which begins to store charge by integration. When the high-level duration of the PWM signal 1031 has elapsed, the signal goes low, the switch opens, the current Ix_i no longer flows, and integration in the capacitor 1022 stops; no new charge accumulates after the switch 1021 opens, and the stored charge is that accumulated during the high level. Thus, from U = Q/C, for a convolution operation unit 102 with w_ji,k = 1, the voltage across the capacitor 1022 is determined by the charge stored by current integration. When w_ji,k is 0, the AND gate 1033 outputs 0 regardless of whether the PWM signal 1031 is high; the switch 1021 stays open, the current Ix_i does not flow, no current is integrated in the capacitor 1022, the stored charge is 0, and the voltage across the capacitor 1022 is 0. On the same principle, the logic operation of the integration control module 103 can be an OR gate in another embodiment, in which case the duration of the PWM signal 1031 refers to its low-level duration and the PWM signal 1031 is ORed with w_ji,k. In other embodiments, a counter or clock divider generates the PWM signal 1031 from a clock running at maximum speed, i.e. making τ as small as possible, which shortens the integration time of the capacitor 1022 and hence the time required for each multiplication. Control by the PWM signal 1031 is used in order to improve the flexibility of the system.
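The AND-gated integration in a single unit can be illustrated with a small time-stepped model. This is a sketch under stated assumptions: the capacitor is reset to 0 V, the current is constant while the switch is closed, and one loop iteration stands for one clock period τ. The function name `integrate_unit` and the values of `I_X`, `TAU` and `C_U` are illustrative and not taken from the patent.

```python
def integrate_unit(w_bit, k, current, tau, c_u, total_ticks):
    """Step one unit through a cycle and return the final capacitor voltage."""
    charge = 0.0                                   # capacitor reset before the cycle (assumed 0 V)
    for tick in range(total_ticks):
        pwm = 1 if tick < 2 ** (k - 1) else 0      # PWM 1031 high for 2**(k-1) ticks
        switch_closed = w_bit & pwm                # AND gate 1033 drives switch 1021
        if switch_closed:
            charge += current * tau                # one clock period of integration
    return charge / c_u                            # U = Q / C


I_X, TAU, C_U = 1e-6, 1e-9, 1e-14                  # assumed example values (1 uA, 1 ns, 10 fF)

# Weight bit 1 for k = 1, 2, 3, and weight bit 0 for k = 3.
v_on = [integrate_unit(1, k, I_X, TAU, C_U, total_ticks=4) for k in (1, 2, 3)]
v_off = integrate_unit(0, 3, I_X, TAU, C_U, total_ticks=4)
```

Under these assumed values the units with w_ji,k = 1 end near 0.1 V, 0.2 V and 0.4 V, doubling per bit, while the unit with w_ji,k = 0 stays at 0 V regardless of the PWM level.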

Specifically, when the switch 1021 is closed, the current Ix_i flows through the switch 1021 to the node a_ji,k, which is connected to the upper plate of the capacitor 1022, and then into the capacitor 1022. For each convolution operation, the capacitor 1022 must be reset to a given DC voltage before the current Ix_i flows in, clearing the previous result. The other end of the capacitor 1022 is grounded, so the voltage across the capacitor 1022 is the voltage at node a_ji,k. Once current flows into the capacitor 1022, the stored charge increases with integration time: while the switch 1021 is closed the current is integrated continuously and the voltage across the capacitor 1022 rises gradually. The integration time is the time for which the switch 1021 is closed.

For example, assume that each bit w_ji,k of the binary representation of the weight w_ji has its own convolution operation unit and that w_ji,1 = w_ji,2 = w_ji,3 = ... = 1 for the same subscripts i, j. For k = 1, 2, 3 the durations of the PWM signal 1031 are τ, 2τ and 4τ respectively; the duration for the k-th bit is 2^(k-1)τ, and the duration for the most significant bit is 2^(B-1)τ. The capacitors 1022 in the convolution operation units 102 have the same capacitance. After the current Ix_i has been integrated in each capacitor 1022 for its integration time t, the stored charge follows from

$$ Q = \int_0^{t} I_{x_i} \, dt = I_{x_i} \, t $$

It can be seen that, for the same current Ix_i, the charge stored in the capacitor 1022 is proportional to the integration time of the current and doubles with each higher bit, i.e. the charges stored in the capacitors 1022 for k = 1, 2, 3 are Q, 2Q and 4Q respectively. Further, from U = Q/C, when the capacitors 1022 have the same capacitance, the voltage across each capacitor 1022 is proportional to its stored charge, so the corresponding voltages are U, 2U and 4U: each higher bit is 2 times the next lower bit, and the voltage of the capacitor 1022 in the k = B convolution operation unit 102 is 2^(B-1) times that in the k = 1 unit. This implements the bit weighting when each bit of the weight w_ji (the multiplier) is multiplied by the input x_i (the multiplicand). Note that the above is only the example in which every bit of w_ji is 1. In fact, whether w_ji,k is 0 or 1, the current integration time in the corresponding convolution operation unit 102 equals the duration of the PWM signal 1031; for w_ji,k = 0 the unit integrates a current of 0, and for w_ji,k = 1 it integrates the current Ix_i. The duration of the PWM signal 1031 changes only by a factor of 2 per bit and does not depend on whether w_ji,k is 0 or 1.
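As a worked example of the relations Q = I·t and U = Q/C, take the assumed values Ix_i = 1 µA, τ = 1 ns and C_u = 10 fF (illustrative numbers only, not from the patent) with w_ji,k = 1 for k = 1, 2, 3:

```latex
\begin{aligned}
Q_k &= \int_0^{2^{k-1}\tau} I_{x_i}\,dt = I_{x_i}\,2^{k-1}\tau, &
U_k &= \frac{Q_k}{C_u},\\
Q_1 &= 1\ \text{fC},\quad Q_2 = 2\ \text{fC},\quad Q_3 = 4\ \text{fC}, &
U_1 &= 0.1\ \text{V},\quad U_2 = 0.2\ \text{V},\quad U_3 = 0.4\ \text{V}.
\end{aligned}
```

The ratio U_3 / U_1 = 4 = 2^(3-1) is the bit weight of the most significant bit for B = 3.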

After the current integration is finished, because one end of the capacitor 1022 is grounded, the voltage at node a_ji,k in each convolution operation unit 102 is the voltage across the capacitor 1022, and this voltage represents the multiplication result x_i*w_ji,k*2^(k-1).

In the addition stage, as shown in FIG. 3, the convolution output is obtained through charge sharing. After all convolution operation units 102 have completed the current integration of the multiplication stage, consider j = 1: the k units corresponding to x₁ complete one x₁*w₁₁ operation. The product x₁*w₁₁ is decomposed into the input x₁ multiplied by each bit w_11,k of the weight w₁₁ and by the bit weight 2^(k-1) of that bit, i.e. x₁*w_11,k*2^(k-1), and the individual results are then added. In the same way, the k units corresponding to x_i complete one x_i*w_1i operation. Thus, for j = 1 and i ∈ [1, N], the corresponding i × 1 × k array completes the multiplications of one convolution window, and the voltage at node a_ji,k of each convolution operation unit 102 of this array is a multiplication result. After the multiplications are complete, the capacitors 1022 are short-circuited: the nodes a_ji,k above all capacitors 1022 in the array corresponding to j = 1 are shorted together, so that all capacitors in that array are connected in parallel. Because the capacitors 1022 in the units hold different amounts of charge, and because of the discharge characteristic of the capacitors 1022, the capacitors 1022 in the shorted array share charge; afterwards each capacitor 1022 stores the same amount of charge, while the total charge is unchanged. The voltage of the resulting combined node represents the sum of the voltages of the multiplication result nodes a_ji,k of the multiplication stage, and is the output y₁. In a further embodiment, for a convolutional neural network in which the weight matrix is shared, the convolution kernels for different windows are the same, i.e. the weight matrices formed by the weights w_ji are identical when the convolution results of different windows are computed, w_1i = w_2i = w_3i = ... = w_ji, which reduces the number of parameters participating in the operation.
Similarly, the other outputs y_j are obtained by short-circuiting the arrays corresponding to the other j, as in Equation 1 below:

$$ y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \, w_{ji,k} \, 2^{k-1} \qquad (1) $$
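Charge sharing itself follows from charge conservation: shorting the top plates of grounded capacitors gives a common voltage equal to the total charge divided by the total capacitance. The sketch below assumes ideal, equal unit capacitors and no parasitic capacitance; the name `share_charge` and the example voltages are illustrative.

```python
def share_charge(voltages, capacitances):
    """Voltage after shorting the top plates of grounded capacitors."""
    total_charge = sum(v * c for v, c in zip(voltages, capacitances))
    return total_charge / sum(capacitances)


C_U = 1e-14                                  # assumed unit capacitance (10 fF)
node_v = [0.1, 0.0, 0.4, 0.1, 0.2, 0.0]      # example a_ji,k voltages before sharing
v_comb = share_charge(node_v, [C_U] * len(node_v))
```

With equal capacitors the result is the mean of the node voltages, that is, their sum divided by the number of capacitors in the shorted array. The combined voltage therefore represents the sum in Equation 1 up to a known constant factor.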

Optionally, the output y_j is converted. The output y_j produced by the convolution operation array after accumulating the analog multiplications is an analog signal. When y_j is required as a digital signal, an Analog-to-Digital Converter (ADC) is added before the output, and the resulting y_j is a digital signal. For example, when the convolution module is applied to a convolutional neural network, the digital output y_j can be fed back as a digital input to a convolution operation array to perform the convolution of the second layer of the neural network. Furthermore, if the swing of the accumulated voltage is too high for the input range of the analog-to-digital converter, one option is to increase the unit capacitance C_u in the multiplication stage shown in FIG. 1; however, this increases the capacitance required for each group of convolution operation units 102 and requires a larger physical area, which is disadvantageous for miniaturization of the device. Instead, when the combined node is connected, an additional attenuation capacitor 105 of value C_att is connected to the combined node at the same time, which scales the accumulated voltage into a range that satisfies the input range of the analog-to-digital converter. Whenever y_j is output, the node a_att,j above the attenuation capacitor 105 is connected to the original nodes a_ji,k. This solution makes more efficient use of the physical area of the module.
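The effect of the attenuation capacitor can be estimated with the same charge-conservation model. The sketch assumes that C_att is fully discharged (0 V) before it is connected and that all capacitors are ideal; the function names and the numeric values are hypothetical and chosen only to illustrate the scaling.

```python
def combined_voltage(node_voltages, c_u, c_att=0.0):
    """Combined-node voltage with an optional, initially discharged C_att."""
    total_charge = sum(v * c_u for v in node_voltages)
    return total_charge / (len(node_voltages) * c_u + c_att)


def c_att_for_target(node_voltages, c_u, v_target):
    """C_att that scales the combined voltage down to v_target."""
    v_plain = combined_voltage(node_voltages, c_u)
    if not 0 < v_target <= v_plain:
        raise ValueError("target must be positive and not above the unattenuated voltage")
    return len(node_voltages) * c_u * (v_plain / v_target - 1.0)


C_U = 1e-14                        # assumed unit capacitance (10 fF)
node_v = [0.8, 0.8, 0.8, 0.8]      # example worst-case node voltages
c_att = c_att_for_target(node_v, C_U, v_target=0.4)
v_scaled = combined_voltage(node_v, C_U, c_att)
```

In this example a single 40 fF attenuation capacitor halves the combined voltage from 0.8 V to 0.4 V. Reaching the same voltage by enlarging C_u instead would require doubling every unit capacitor, which illustrates the area argument made in the text.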

The convolution operation module supports unit reuse. In the physical implementation of the two-stage convolution operation described above, the number of bits of the weight w_ji is generally fixed, i.e. the size of k is fixed. When the input or the weight w_ji has fewer bits in its binary representation, the high-bit units do not participate in the operation, and keeping the convolution operation units 102 for the high bits connected increases the power consumption of the circuit. A simple approach for the units that do not participate is therefore, when computing y_j, to switch off the array units corresponding to the unused high bits of the binary weight w_ji and to connect only the convolution operation units 102 that participate in computing y_j, which reduces power consumption. However, this leaves physical area unused, especially when the weights w_ji used in the operation have a low number of bits. The units are therefore reconfigured according to the number of bits of the input and of the weight w_ji, providing operational flexibility for the internal quantization of the matrix inputs and weights and allowing unused units to be reused. The reconfiguration proceeds as follows:

As shown in FIG. 4, a group of units associated with bit k of the weight is reused for input x_i or input x_ii, whose corresponding currents are Ix_i and Ix_ii and whose corresponding voltage signals are Vgx_i and Vgx_ii, respectively. A multiplexer, whose control signal depends on bit k, selects the voltage signal for the units of the remaining unused bits, i.e. the selected voltage V'gx_i equals either Vgx_i or Vgx_ii. The current I'x_i in the unit corresponding to bit k then equals Ix_i or Ix_ii accordingly. For example, assume a convolution operation module that supports 8-bit weights w_ji is used for a convolution whose weight w_ji has only 1 bit. Then 7 (= 8 - 1) convolution operation units 102 do not take part in the operation, and these 7 units can be used to perform 7 further convolutions with 1-bit weights on the same input x_i (i.e. I'x_i = Ix_i). When the original input x_i or the original weight w_ji has 5 bits, the remaining 3 groups of units clearly cannot perform a convolution of the same size as the original one; instead, they can perform a convolution of another input Ix_ii with a weight of 3 bits or fewer, in which case I'x_i = Ix_ii. Another form of reuse exploits the fact that each group of units is independent in the i direction: for a given input x_i, when i is small, the unused units receive no current and consume no power; when i is large and the weights w_ji have few bits, the excess inputs x_i can be routed to the convolution operation units 102 of the weight bits that are not used by other inputs.
In other embodiments, the current may be controlled by the voltage V'gx_i through a diode-connected device in a current mirror. The DAC may be reconfigured for a given number of input bits, and the ADC may be reconfigured for the output y_j, so that the DAC or ADC resolution matches the corresponding number of bits.
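
The multiplexer selection described for FIG. 4 can be sketched as a per-slice lookup. This is only an illustration under stated assumptions: the function name `mux_select`, the string labels, and the idea of returning `None` for idle slices are hypothetical, and the sketch covers only the case of one original weight plus one reused weight.

```python
def mux_select(physical_bits, first_bits, second_bits=0):
    """Per bit slice, report which input drives it.

    Assumption: slices 1..first_bits serve the original input x_i, the
    next second_bits slices are reused for another input x_ii, and any
    remaining slices stay idle (no current, so no power is drawn).
    """
    if not 1 <= first_bits <= physical_bits:
        raise ValueError("original weight does not fit the physical array")
    spare = physical_bits - first_bits
    if not 0 <= second_bits <= spare:
        raise ValueError("reused weight needs more slices than are spare")
    idle = spare - second_bits
    return ["x_i"] * first_bits + ["x_ii"] * second_bits + [None] * idle


# 8 physical slices, 5-bit original weight, 3-bit reused weight.
selection = mux_select(8, 5, 3)
```

With an 8-bit array and a 5-bit original weight, at most 3 slices are spare, so a request for a 4-bit reused weight raises `ValueError`, matching the "3 bits or fewer" limit in the example above.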

After the multiplexer selects the matching input I'x_i, the duration of the PWM signal 1031 associated with the weight w_ji is reconfigured. The unused units of the original physical implementation carry PWM signals 1031 that correspond to their original bit weights. To reuse these units, the bit weights must be changed, i.e. the durations of the corresponding PWM signals 1031 must be changed so that the multiplication associated with bit k is associated with input x_i or input x_ii. Two extreme examples illustrate this reconfiguration capability. First, assume the physical implementation operates with the maximum number of weight bits it supports, i.e. k = 8, with the full convolution operation array as shown in FIG. 1; the durations of the PWM signals 1031 in the array then range from τ to 2^(B-1)τ. In contrast, when the number of weight bits is k = 1, the units corresponding to bits 2 to 8 can be reused for inputs x_i, so that up to 8 inputs can be processed in parallel. In this case every weight is pulse-width modulated (PWM) with the same pulse width, i.e. the duration of every PWM signal 1031 is τ, and each weight is quantized to a single bit, rather than each bit of an 8-bit weight w_ji being quantized as in the former case.
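
The two extreme cases can be reproduced with a short sketch that lists the PWM duration of every bit slice for a given partition of the array. The function name `pwm_durations` and the `segments` parameter are hypothetical; the sketch assumes that each reused group of slices restarts its durations at τ, which the text implies for the 1-bit case but does not state for mixed partitions.

```python
def pwm_durations(segments, tau=1.0):
    """PWM high-level duration for each physical bit slice.

    segments lists the bit width of each weight mapped onto the array,
    e.g. [8] for one 8-bit weight or [1] * 8 for eight 1-bit weights.
    Assumption: within each segment the k-th bit lasts 2**(k-1) * tau.
    """
    durations = []
    for bits in segments:
        durations.extend(tau * 2 ** k for k in range(bits))
    return durations


full_width = pwm_durations([8])      # tau ... 128*tau, one 8-bit weight
single_bit = pwm_durations([1] * 8)  # eight 1-bit weights, all tau
mixed = pwm_durations([5, 3])        # 5-bit weight plus a reused 3-bit weight
```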

FIG. 5 and FIG. 6 show an embodiment in which an offset operation unit 1061 is added when the convolution operation unit 102 of the present invention is used for convolutional neural network operations. Adding a bias b to the convolution makes the operation more effective and accurate; typically a binary bias b_j is added to a given output y_j. The corresponding convolution output y_j then changes from Equation 1 to Equation 2 below.

FIG. 5 illustrates how this additional function is added in the multiplication stage. Because the bias bits are quantized in the same way as the weights in FIG. 1 or FIG. 2, the bias is implemented as a fixed current I_b supplied in addition to the given currents Ix_i.

In the present invention, the bias b_j is implemented as a fixed current I_b, supplied in addition to the given currents Ix_i, by adding offset operation units 1061. The offset operation units 1061 form an offset operation array 106 of size j × k, and each offset operation unit 1061 (j, k) comprises the current I_b, a switch 1021, an offset-unit integration control module 1062, and a capacitor 1022 of value C_u connected to node a_j,k. The current I_b is integrated in the capacitor 1022 as in the convolution stage, with the weight w_ji replaced by b_j. The inputs of the offset AND gate in the offset-unit integration control module 1062 are b_j,k and the PWM signal 1031 modulated according to the bit weight of b_j,k. The output of the offset AND gate controls the closing time of the switch 1021, i.e. the integration time of the current in the capacitor 1022 of offset operation unit (j, k) 1061 is b_j,k*2^(k-1)τ. For the same k, the PWM signal 1031 of the offset operation unit 1061 is identical to the PWM signal 1031 of the weight bit w_ji,k in the convolution operation unit 102. Note that, in this embodiment, the duration of the PWM signal 1031 refers to the duration of its high level. When bit b_j,k is 1 and the PWM signal 1031 is high, the offset AND gate outputs 1, the switch 1021 is closed, and the current I_b flows through the switch and is integrated in the capacitor 1022, which stores charge. When the high-level period of the PWM signal 1031 ends, the signal goes low, the switch 1021 opens, and the current I_b no longer flows; integration in the capacitor 1022 stops, no new charge accumulates after the switch 1021 opens, and the stored charge is the charge accumulated during the high-level period. When b_j,k is 0, the offset AND gate outputs 0, the switch 1021 stays open, and the current I_b does not flow, so no current is integrated in the capacitor 1022 and the stored charge is 0.
Similarly, the voltage across the capacitor 1022 is the result of the multiplication stage of the offset operation unit 1061.
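
As a rough numerical illustration of the offset multiplication stage, the following sketch computes the voltage on each offset node from the integration time b_j,k*2^(k-1)τ. It assumes an ideal constant current and an ideal capacitor that starts discharged, so that V = I_b * t / C_u; the function name `bias_node_voltages` and the unit values are hypothetical.

```python
def bias_node_voltages(b_j, num_bits, i_b=1.0, tau=1.0, c_u=1.0):
    """Voltage on each offset node a_j,k after the multiplication stage.

    Assumption: the capacitor starts at 0 V and integrates a constant
    current i_b only while (b_j,k AND PWM) is high.
    """
    voltages = []
    for k in range(1, num_bits + 1):
        bit = (b_j >> (k - 1)) & 1           # b_j,k, least significant bit is k = 1
        t_int = bit * 2 ** (k - 1) * tau      # switch-closed time set by the AND gate
        voltages.append(i_b * t_int / c_u)    # V = Q / C = I * t / C
    return voltages


# Bias b_j = 5 (binary 101) on a 3-bit offset array.
v_bias = bias_node_voltages(5, 3)
```

The node for bit 2 stays at 0 V because b_j,2 = 0 keeps its switch open for the whole cycle.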

Fig. 6 illustrates that during the accumulation phase, an additional capacitor 1022 needs to be added for charge sharing and node accumulation.

Similarly, the k unit nodes a_j,k corresponding to a given j are short-circuited. Owing to the discharge behavior of the capacitors 1022, the capacitors 1022 in the short-circuited array share charge: each capacitor 1022 ends up storing the same amount of charge while the total charge is unchanged. The voltage of the resulting combined node is the accumulated sum of the node voltages a_j,k obtained in the multiplication stage, i.e. the bias b_j of y_j is the accumulated voltage of all nodes a_j,k of the 1 × k group of units. The physical implementations of the convolution operation units and the offset operation units are independent, as shown in FIG. 6. However, when the convolution result with the bias added is output, the corresponding nodes of the convolution operation units 102 and the offset operation units 1061 can be connected, and the voltage of the resulting combined node is the convolution result with the bias added.

The digital-to-analog converter 101 converts the digital input x_i, with a given number of bits, into an analog current Ix_i that is transmitted in the circuit.

When the current Ix_i reaches the switch, a logic operation is performed in the integration control module 103. The inputs of the logic operation are the k-th bit w_ji,k of the weight w_ji and the PWM signal 1031 modulated according to the bit weight. The duration of the PWM signal 1031 in the convolution operation units along the k direction doubles from each bit to the next higher bit, the duration of the PWM signal 1031 for the k-th bit being 2^(k-1)τ, where τ is the clock period of the PWM signal. The output of the logic operation controls the closing of the switch 1021. While the switch 1021 is closed, the current Ix_i flows through node a_ji,k, which is connected to the upper plate of the capacitor, into the capacitor 1022 and is integrated there, producing a voltage across the capacitor after the integration time. While the switch is open, the current does not pass through node a_ji,k; if the switch is never closed, the voltage across the capacitor 1022 is 0. The integration time is the duration of the PWM signal 1031, and the voltage at node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation. All nodes a_ji,k of the convolution operation units 102 in one i × k plane are then short-circuited, the capacitors 1022 of these convolution operation units 102 share charge, and the voltage of the resulting combined node is the result y_j of the convolution operation.
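
The two stages of the method can be combined into a small behavioral model. This is a sketch under idealized assumptions rather than a description of the actual circuit: the DAC is taken to be linear (current = x_i times a hypothetical unit current `i_unit`), all capacitors are equal and start discharged, and switches are ideal. The function names are hypothetical.

```python
def node_voltage(x, w_bit, k, i_unit=1.0, tau=1.0, c_u=1.0):
    """Multiplication stage for one unit: V = Ix_i * w_ji,k * 2**(k-1) * tau / C_u."""
    current = x * i_unit                    # assumed linear DAC
    t_int = w_bit * 2 ** (k - 1) * tau      # AND of weight bit and PWM signal
    return current * t_int / c_u


def mac_output(inputs, weights, num_bits, **circuit):
    """Accumulation stage: short all nodes of one i x k plane and share charge."""
    nodes = [
        node_voltage(x, (w >> (k - 1)) & 1, k, **circuit)
        for x, w in zip(inputs, weights)
        for k in range(1, num_bits + 1)
    ]
    # With equal capacitors, charge sharing yields the mean node voltage,
    # which is proportional to the sum of x_i * w_ji.
    return sum(nodes) / len(nodes)


# Two inputs with 2-bit weights: 1*3 + 2*1 = 5, spread over 4 unit capacitors.
y = mac_output([1, 2], [3, 1], num_bits=2)
```

In this model the combined voltage equals the dot product divided by the number of shared capacitors, so the result carries a fixed, known scale factor that a later ADC stage would need to account for.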

It should be noted that, in the foregoing embodiment, each included module is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A multi-bit convolution operation module based on time-variable current integration and charge sharing, characterized by comprising:

at least one digital input x_i, at least one digital-to-analog converter (DAC), at least one binary weight w_ji, a convolution operation array composed of a plurality of convolution operation units, and at least one output y_j;

the digital input x_i is converted by the DAC, according to a given number of bits, into an analog current Ix_i that is transmitted in the circuit;

in the binary weight w_ji, j is the index of the convolution window to which the weight belongs, w_ji,k is the value of the k-th bit of the weight w_ji, w_ji,k is 0 or 1, and k ∈ [1, B], where B is the highest bit of the binary representation, each bit w_ji,k corresponding to one convolution operation unit;

the convolution operation array has a size of i × j × k, where the i direction is the input direction, the j direction is the convolution window direction, and the convolution operation units along the k direction are arranged in order from the low bit to the high bit of the bits w_ji,k of the weight w_ji;

each convolution operation unit comprises an input current Ix_i, a switch, an integration control module, and a capacitor connected to node a_ji,k, one end of the capacitor being grounded;

the integration control module performs a given logic operation whose inputs are w_ji,k and a PWM signal modulated according to the bit weight of w_ji,k; the duration of the PWM signal in the convolution operation units along the k direction doubles from each bit to the next higher bit, the duration of the PWM signal for the k-th bit being 2^(k-1)τ, where τ is the clock period of the PWM signal, and the output of the integration control module controls the closing of the switch;

when the switch is closed, the current Ix_i flows through node a_ji,k, which is connected to the upper plate of the capacitor, into the capacitor and is integrated there; when the switch is open, the current Ix_i does not pass through node a_ji,k; the integration time is the duration of the PWM signal, and the voltage at node a_ji,k is the multiplication result x_i*w_ji,k*2^(k-1) of the convolution operation;

the output y_j is obtained by short-circuiting all nodes a_ji,k of the convolution operation units in one i × k plane so that the capacitors of these convolution operation units share charge, the voltage of the resulting combined node being the result of the convolution operation.

2. The module of claim 1, wherein the combined voltage of the 1 × k convolution operation units corresponding to x_i is the result of x_i*w_ji, and the voltage at the combined node of the convolution operation units of one i × k plane is

thereby completing the convolution of the convolution kernel with the input matrix.

3. The module of claim 2, wherein the input x_i is a binary number of at least one bit, and the resolution of the DAC that converts the input x_i is adjustable.

4. The module of claim 3, wherein the current Ix_i is mirrored or copied into the convolution operation array by a current mirror, the currents within the same j × k plane are identical, and the current Ix_i can be scaled in the digital-to-analog converter.

5. The module of claim 4, wherein the logic operation of the integration control module is an AND gate, one input of the AND gate being the bit w_ji,k stored in an SRAM cell and the other being a PWM signal whose duration, in units of τ, doubles with each increment of bit k, the output of the AND gate controlling the closing of the switch; convolution operation units corresponding to the same bit k of different weights w_ji have the same PWM signal duration, and convolution operation units corresponding to different bits of the same weight w_ji have different PWM signal durations, each being 2^(k-1)*τ.

6. The module of claim 5, wherein a counter or clock divider is used to generate the fastest PWM clock signal so as to increase the integration speed of the capacitors.

7. The module of any of claims 1 to 6, wherein the switches in the convolution operation unit are non-switching elements such as virtual switches or current devices to reduce kickback or transient effects on the current mirror.

8. The module of claim 7, wherein the number of bits of the input x_i and of the weight w_ji can be reconfigured for inputting x_i again or inputting a new input x_ii, comprising:

a multiplexer receives the voltage signals of the re-input x_i and of x_ii and, according to the convolution operation units corresponding to the remaining unused bits of the weight w_ji, selects the input voltage signal appropriate for the unused units, the output voltage signal entering the convolution operation units;

the PWM signal durations corresponding to the bit weights in the unused convolution operation units to be reused are reconfigured.

9. The module of claim 8, wherein, in the reuse stage, the number of bits of at least one multiplexer matches the number of bits of the weight code, and the output of the multiplexer is controlled by the weight bit k.

10. The module of claim 9, wherein the convolution operation array further comprises a bias module, the bias module comprising:

an offset operation array comprising a plurality of offset operation units, the array having a size of j × k, each offset operation unit (j, k) comprising a current I_b, a switch, an integration control module, and a capacitor of value C_u connected to node a_j,k;

the bias current I_b is a fixed current supplied in addition to the current Ix_i;

b_j,k is the k-th bit of the multi-bit binary bias b_j, and the integration time of the current in the capacitor of offset operation unit (j, k) is b_j,k*2^(k-1)τ;

in the integration control module, b_j,k and the PWM signal modulated according to the bit weight of b_j,k are combined by an AND gate, whose output controls the closing of the switch and hence the integration time of the bias current I_b in the capacitor of the offset operation unit;

the bias of y_j is the accumulated sum of the voltages of all nodes a_j,k of the 1 × k group of offset operation units.

11. The module of claim 10, wherein, when the accumulated voltage swing at the combined node exceeds the input range of the analog-to-digital converter or exceeds a threshold value, an attenuation capacitor is connected in parallel before the output y_j is connected to the analog-to-digital converter, so as to adjust the full-scale range of the accumulated voltage.

12. A multi-bit convolution operation method based on time-variable current integration and charge sharing, characterized by comprising the following steps:

when the current Ix_i reaches the switch, a logic operation is performed whose inputs are the k-th bit w_ji,k of the weight w_ji and a PWM signal modulated according to the bit weight of w_ji,k; the duration of the PWM signal in the convolution operation units along the k direction doubles from each bit to the next higher bit, the duration of the PWM signal for the k-th bit being 2^(k-1)τ, where τ is the clock period of the PWM signal, and the output of the logic operation controls the closing of the switch;

the result y_j.

13. The method of claim 12, wherein the resolution of the DAC is adjusted before the DAC converts the digital input x_i.

14. The method of claim 13, wherein prior to performing the logical operation, a counter or clock divider is used to generate a fastest speed PWM clock signal to increase an integration speed of the current.

15. The method of claim 14, wherein, after x_i has been input once, the unused convolution operation units are reused, comprising:

receiving, with a multiplexer, the voltage signals of the re-input x_i and of x_ii; selecting, according to the convolution operation units corresponding to the remaining unused bits of the weight w_ji, the input voltage signal appropriate for the unused units, the output voltage signal entering the convolution operation units; and, after the input voltage signal is selected, reconfiguring the PWM signal durations corresponding to the bit weights in the unused convolution operation units.

16. The method of claim 15, wherein, before the output y_j is connected to the ADC, an attenuation capacitor is connected in parallel to adjust the full-scale range of the accumulated voltage so that the accumulated voltage swing at the combined node is lower than the input range of the analog-to-digital converter.