CN111144558A - Multi-bit convolution operation module based on time-variable current integration and charge sharing

Publication number
CN111144558A
Authority
CN
China
Prior art keywords
bit
convolution operation
current
convolution
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010257151.0A
Other languages
Chinese (zh)
Other versions
CN111144558B (en)
Inventor
阿隆索·莫尔加多
刘洪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiutian Ruixin Technology Co ltd
Original Assignee
Shenzhen Jiutian Ruixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiutian Ruixin Technology Co ltd
Priority to CN202010257151.0A (patent CN111144558B)
Publication of CN111144558A
Application granted
Publication of CN111144558B
Priority to PCT/CN2021/081322 (patent WO2021197073A1)
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065Analogue means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Analogue/Digital Conversion (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an analog operation module, in particular one for convolution operations, and provides a group of analog multipliers and accumulators (MAC). Current integration in capacitors performs the multiplication of two multi-bit binary numbers in the convolution, while charge sharing between the capacitors performs the addition. In the multiplication stage, a PWM signal controls the integration time of the current in each capacitor to be τ, 2τ, 4τ, …, 2^{B-1}τ for a common clock period τ, so that each bit k of a binary multiplier with a given number of bits receives its bit weight during multiplication. The idea applies to a range of multi-bit convolutions: it can implement a general convolution with two or more inputs, and the number of bits is adjustable. In particular, an array of bias operation units may be added. The invention can be used as an in-memory or near-memory operation unit in a neural network convolution operation unit or in operation accelerator hardware.

Description

Multi-bit convolution operation module based on time-variable current integration and charge sharing
Technical Field
The present invention relates to analog computation modules, and particularly to an analog computation module for convolution operations and an analog computation method for convolution operations.
Background
For quantization at low signal-to-noise ratio, analog operation is more efficient than traditional digital operation, so digital quantities are often converted into analog quantities for computation. This holds especially for neural networks, where analog operation consumes less energy than conventional medium- and large-scale hardware implementations: conventionally the data is kept on disk and must be fetched into memory for each operation, and this requires a large amount of I/O between storage and memory, which usually accounts for much of the power consumption. With analog in-memory and near-memory operation, the computation is instead carried out locally, where the data resides, which greatly increases operation speed, saves storage area, and reduces the power spent on data transfer and computation. The invention provides an effective way to implement ultra-low-power analog in-memory or near-memory operation.
The recent paper "A Mixed-Signal Binary Weighted Storage and Multiplication for Reduced Data Movement", Symp. VLSI Circuits, pp. 141-142, 2018, presents an efficient implementation in which a 1-bit weight is stored in a static random-access memory (SRAM) cell and convolved with a mixed-signal input. The method concerns multiplication with 1-bit weights as data propagates through a neural network, i.e. from the input layer to the convolutional layer, then to the pooling layer, and finally to the output. However, the analog operation circuit in that paper does not handle the bit weights of the multiplier or multiplicand; it is limited to 1-bit multiplication at the input of the first layer and cannot be used for analog convolution of multi-bit binary numbers.
Very few multi-bit operations involve the bit weights of the multiplier or multiplicand. Examples are:
(1) "In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array", JSSC, pp. 915-924, 2017;
(2) "A 481pJ/decision 3.4M decision/s multifunctional deep in-memory inference processor using standard 6T SRAM array", arXiv:1610.07501, 2016;
(3) "A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing", arXiv:1811.04047, 2018;
(4) "A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning", ISSCC, pp. 396-398, 2018;
(5) "A 42 pJ/Decision 3.12TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training", ISSCC, pp. 490-491, 2018.
However, these multi-bit operations are implemented by modulating the control bus, by capacitive charge sharing, by pulse-width-modulated (PWM) control of SRAM reads and writes, by modifying the SRAM cell, or by complex digital matrix-vector processing with near-memory operation in the current domain. In all of these implementations the multi-bit analog multiplier and accumulator relies on very complicated digital processing control. Yet for quantization at low signal-to-noise ratio, digital operation is far less efficient than analog operation, so multi-bit operation under digital processing control consumes a great deal of energy.
In the XOR stage of the binarized convolution proposed in CN201910068644, the potential change is produced by modulating a control bus in the SRAM. The scheme taught by that patent, however, requires complex digital processing control, places high demands on the control module, and consumes excessive energy. The art therefore needs a solution that applies analog operation to signals with low signal-to-noise ratio in order to achieve ultra-low power consumption.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a multi-bit binary convolution analog operation module, based on time-variable current integration and charge sharing, that has ultra-low power consumption, a compact structure and high operation speed. It supports general convolution of two or more inputs with an adjustable number of bits, and is particularly suited to a neural network convolution operation unit or to an analog in-memory operation unit implemented in operation accelerator hardware.
In addition, implementing the module as a matrix of units suits convolution operation units located in or near memory: it reduces the power spent on memory access and makes the physical implementation of the matrix more compact. To achieve this object, the following technical scheme is adopted:
Based on the two stages of the convolution operation, the invention provides a multi-bit convolution operation module based on time-adjustable current integration and charge sharing. The module includes: at least one digital input x_i; at least one digital-to-analog converter (DAC) that converts the digital input into a current I_{x_i} carried in the circuit; at least one weight w_{ji}, where w_{ji,k} is the value of the k-th bit when the weight is written as a binary number; a convolution operation array composed of a plurality of convolution operation units, each unit (i, j, k) multiplying one weight bit w_{ji,k}, with its bit weight, by one multi-bit binary input x_i, and the array as a whole completing the multiplications and additions of the convolution; and at least one output y_j:
$$y_j = \sum_{i} x_i \, w_{ji}$$
In particular, the DAC converts the digital input x_i into the current I_{x_i} according to its given number of bits, and I_{x_i} is mirrored or copied into the convolution array. All units in the same j × k plane receive the same current. The multi-bit input signal and its current may be scaled in the DAC, and the currents reach the switches at the same time.
In particular, the convolution operation array has size i × j × k, and each operation unit (i, j, k) comprises the current I_{x_i}, a switch, an integration control module, a node a_{ji,k} and at least one capacitor.
In particular, the integration control module controls how long the current is integrated in the capacitor; by U = Q/C, the voltage across the capacitor changes with the integration time. For a weight w_{ji}, w_{ji,k} is the value of the k-th bit of its binary representation, k ∈ [1, B]. Each bit w_{ji,k} corresponds to one convolution operation unit, and the units along the k direction are arranged from the low bit to the high bit of w_{ji}.
In particular, the output of an AND gate that combines w_{ji,k} with the PWM signal controls the switch: when the output is 1, the switch is closed. The bit weight that each bit of the multiplicand or multiplier carries in a binary multiplication is realized by using the PWM signal to control the integration time of the current in the capacitor. Units that correspond to the same bit k of different weights w_{ji} have the same PWM duration, and within one weight the PWM duration of a unit is 2 times that of the unit for the next lower bit. One end of the capacitor is grounded, so the voltage across the capacitor is the voltage at its upper plate.
In particular, the logic operation of the integration control module may be an AND gate or an OR gate. The module includes a static random-access memory (SRAM) cell, which may be the same 6T SRAM cell throughout or different SRAM cells, and which stores the bit w_{ji,k}. The inputs of the logic operation are w_{ji,k} and a PWM signal modulated according to the bit weight. The PWM signal realizes the bit weight of the multiplication: its duration doubles with each bit position, so for k = 1, 2, 3 the durations are 1τ, 2τ, 4τ, and for bit k the duration is 2^{k-1}·τ, where τ is the clock period of the PWM signal. The output of the logic operation controls the closing of the switch. In an operation unit with w_{ji,k} = 0 the switch stays open, no current is integrated in the capacitor, and the voltage at the node above the capacitor is 0.
Further, when the logic operation is an AND gate, the PWM duration is the duration of the high level; when it is an OR gate, the PWM duration is the duration of the low level.
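The bit-weighted timing described above can be sketched in a few lines of Python. This is only an illustrative model of the AND-gate case; the function names `pwm_duration` and `switch_on_time` and the 1 ns clock period are assumptions made for the example and do not come from the patent.

```python
def pwm_duration(k: int, tau: float) -> float:
    """Duration of the PWM pulse for bit position k (k = 1 is the LSB): 2^(k-1) * tau."""
    if k < 1:
        raise ValueError("bit positions are numbered from 1")
    return (2 ** (k - 1)) * tau


def switch_on_time(w_bit: int, k: int, tau: float) -> float:
    """Time the switch stays closed when an AND gate combines w_bit with the PWM pulse."""
    return pwm_duration(k, tau) if w_bit == 1 else 0.0


tau = 1e-9  # assumed clock period of 1 ns, for illustration only
weight_bits = [1, 0, 1]  # w_ji,1 .. w_ji,3, LSB first (binary 101 = 5)
on_times = [switch_on_time(b, k, tau) for k, b in enumerate(weight_bits, start=1)]
```

With these bits the switch of the k = 2 unit never closes, while the k = 3 unit integrates four times as long as the k = 1 unit.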
Further, assume w_{ji,1} = w_{ji,B} = 1. After the currents have been integrated for their different times, the capacitors store different amounts of charge, and the voltage across the capacitor for k = B is 2^{B-1} times the voltage across the capacitor for k = 1.
In particular, the voltage at node a_{ji,k} represents x_i · w_{ji,k} · 2^{k-1}. The node is connected to the upper plate of the capacitor, and this multiplication result is determined by the weight bit w_{ji,k} and the duration of the PWM signal. The combined voltage of the 1 × k convolution operation units that belong to x_i represents the result x_i · w_{ji}.
Further, y_j is the voltage of the combined node obtained, for a given j, by connecting all nodes a_{ji,k} of one i × k plane. Through the connected nodes the capacitors of the different operation units share charge. After charge sharing every capacitor holds the same amount of charge, while the total charge obtained by current integration in the multiplication stage is unchanged. The accumulated voltage at the combined node is the result of
$$\sum_{i} \sum_{k=1}^{B} x_i \, w_{ji,k} \, 2^{k-1}$$
that is,
$$y_j = \sum_{i} x_i \, w_{ji}$$
which completes the convolution of one convolution kernel with the input matrix.
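The charge-sharing step can be illustrated with an idealized model. The sketch below assumes ideal grounded capacitors with no leakage or parasitics; the helper name `share_charge` and the numeric values are hypothetical. For n equal unit capacitors the shared voltage is the sum of the node voltages divided by n, so it is proportional to the accumulated result rather than numerically equal to it.

```python
def share_charge(voltages, capacitances):
    """Voltage after shorting the top plates of grounded capacitors together.

    Charge is conserved: V_shared = sum(C_n * V_n) / sum(C_n).
    """
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    total_capacitance = sum(capacitances)
    return total_charge / total_capacitance


c_u = 1.0  # unit capacitance in arbitrary units (assumed)
node_voltages = [1.0, 0.0, 4.0, 2.0]  # multiplication results on four cells
v_shared = share_charge(node_voltages, [c_u] * 4)
```

Multiplying `v_shared` by the number of capacitors recovers the sum of the individual node voltages, which is the quantity the addition stage is meant to represent.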
Further, a module used in a neural network operation unit usually needs a bias. In the invention the bias b_j is supplied as an additional fixed input current I_b, handled in the same way as a given current I_{x_i}, and additional bias operation units are added that operate independently. The bias unit array has size j × k, and each bias operation unit (j, k) comprises the current I_b, a switch, an integration control module, a node a_{j,k} and a capacitor of value C_u.
Further, the bias b_j of y_j is the accumulated voltage of all nodes a_{j,k} of the 1 × k bias units.
Further, a counter or clock divider generates the PWM signal from a clock running at maximum speed, which speeds up integration in the capacitor.
Further, to reduce kickback or transient effects on the current mirror, the switch may be a virtual switch, a current device or a non-switching element.
The invention also provides a multi-bit convolution analog operation method based on time-variable current integration and charge sharing, comprising the following steps:
the DAC converts the digital input x_i, according to its given number of bits, into an analog current I_{x_i} that is carried in the circuit;
when the current I_{x_i} reaches the switch, the integration control module applies a logic operation whose inputs are the k-th bit w_{ji,k} of the weight w_{ji} and a PWM signal modulated according to the bit weight; along the k direction the PWM duration doubles from each bit to the next higher bit, the duration for bit k being 2^{k-1}·τ, where τ is the clock period of the PWM signal; the output of the logic operation controls the closing of the switch;
while the switch is closed, the current I_{x_i} flows through node a_{ji,k}, which is connected to the upper plate of the capacitor, and is integrated for the duration of the PWM signal, which sets the voltage across the capacitor; while the switch is open, no current flows through node a_{ji,k} and the voltage across the capacitor remains 0; the voltage at node a_{ji,k} is the multiplication result x_i · w_{ji,k} · 2^{k-1} of the convolution operation;
the nodes a_{ji,k} of all convolution operation units in one i × k plane are shorted together, the capacitors of these units share charge, and the voltage of the combined node is the convolution result y_j:
$$y_j = \sum_{i} x_i \, w_{ji}$$
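The method steps above can be traced with a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: the DAC is taken as ideal and linear (current x_i · `i_lsb`), capacitors are reset to 0 V, all components are ideal, and the name `analog_mac` and its parameters are invented for the example.

```python
def analog_mac(inputs, weights, n_bits, i_lsb=1.0, tau=1.0, c_u=1.0):
    """Idealized model of one output y_j: integrate current, then share charge.

    inputs  -- non-negative integers x_i (DAC current assumed to be x_i * i_lsb)
    weights -- non-negative integers w_ji, each below 2**n_bits
    Returns the shared node voltage and the number of unit capacitors.
    """
    charges = []
    for x, w in zip(inputs, weights):
        current = x * i_lsb                      # ideal linear DAC (assumption)
        for k in range(1, n_bits + 1):
            w_bit = (w >> (k - 1)) & 1           # k-th bit, k = 1 is the LSB
            t_on = w_bit * (2 ** (k - 1)) * tau  # AND of weight bit and PWM pulse
            charges.append(current * t_on)       # Q = I * t stored on one C_u
    n_caps = len(charges)
    v_shared = sum(charges) / (n_caps * c_u)     # charge sharing over equal caps
    return v_shared, n_caps


x = [3, 1, 2]
w = [5, 6, 1]
v, n = analog_mac(x, w, n_bits=3)
recovered = v * n  # undo the 1/n scaling (i_lsb = tau = c_u = 1 here)
expected = sum(xi * wi for xi, wi in zip(x, w))
```

Up to floating-point rounding, `recovered` matches the integer dot product `expected`, which shows that in this ideal model the shared voltage is a scaled copy of Σ x_i · w_ji.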
Drawings
FIG. 1 is a diagram illustrating a circuit implementation of a multiply stage of a convolution operation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an integration control module according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an output implementation of the convolution addition stage according to an embodiment of the present invention (the ADC is not shown; when y_j must be converted to a digital output, an ADC can be added before each output y_j);
FIG. 4 is a schematic diagram of a reuse of an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an implementation of adding offset unit multiplication for convolution operation according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an output after being biased according to an embodiment of the present invention.
The main elements are indicated by the following symbols.
Element                              Symbol
Module                               10
Digital-to-analog converter          101
Convolution operation unit           102
Integration control module           103
PWM signal                           1031
Static random-access memory          1032
AND gate                             1033
Switch                               1021
Capacitor                            1022
Multiplexer                          104
Attenuation capacitor                105
Bias unit array                      106
Bias operation unit                  1061
Bias integration control module      1062
Digital input                        x_i
Current                              I_{x_i}
Weight                               w_{ji}
Detailed Description
In order to make the objects, principles, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.
It is to be understood that the specific embodiments described herein are for purposes of illustration, but the invention may be practiced otherwise than as specifically described and that there may be variations which will occur to those skilled in the art without departing from the spirit of the invention and therefore the scope of the invention is not limited to the specific embodiments disclosed below.
Referring to fig. 1, consider a general convolution operation as follows:
Multi-bit binary numbers x_i, with i from 1 to N, form the input matrix. A plurality of weights w_{ji} form a convolution kernel, also called a weight matrix, where j denotes the j-th window for a given i. If the input matrix is n × n and the convolution kernel is an m × m weight matrix, j runs from 1 to n-m+1 (n > m, as the window moves). The outputs are y_j, and all y_j together form the convolution result, i.e. one layer of extracted neural network features.
When w_{ji} is written as a multi-bit binary number, w_{ji,k} is the value of its k-th bit. The convolution Σ x_i · w_{ji} of two multi-bit binary numbers is divided into two stages:
Multiplication stage: the input x_i is multiplied by each bit of the weight w_{ji} and by the bit weight 2^{k-1} of that bit, i.e. x_i · w_{ji,k} · 2^{k-1}, where w_{ji,k} is 0 or 1.
Addition stage: the results of all multiplications in the multiplication stage are accumulated to obtain the output y_j.
For a fixed convolution kernel size, when the module is used for the convolution calculation of a neural network, the weight matrix formed by the weights w_{ji} in the multiplication stage is shared across the outputs y_j: as j changes from 1 to n-m+1, w_{1i} = w_{2i} = w_{3i} = … = w_{ji}.
For this multi-bit binary convolution, the invention must therefore handle two things: the bit weight applied when the multiplicand is multiplied by each bit of the multiplier in the multiplication stage, and the accumulation of the multiplication results in the addition stage.
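The two-stage decomposition can be checked with plain integer arithmetic, independent of any circuit. In the sketch below the helper `bitwise_products` is a hypothetical name; it lists the partial products of the multiplication stage, and summing them is the addition stage for a single input and weight.

```python
def bitwise_products(x: int, w: int, n_bits: int) -> list[int]:
    """Partial products x * w_k * 2^(k-1) for k = 1..n_bits (k = 1 is the LSB)."""
    return [x * ((w >> (k - 1)) & 1) * 2 ** (k - 1) for k in range(1, n_bits + 1)]


partials = bitwise_products(x=6, w=0b1011, n_bits=4)  # w = 11
total = sum(partials)  # addition stage for one (x, w) pair
```

Here the partial products are 6, 12, 0 and 48, and their sum 66 equals 6 × 11.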
An embodiment of the invention provides an operation module 10 that implements the multi-bit convolution operation described above, based on time-adjustable current integration and charge accumulation. The module 10 comprises: at least one digital input x_i; at least one digital-to-analog converter (DAC) 101 that converts the digital input into a current I_{x_i} carried in the circuit; at least one weight w_{ji}, where w_{ji,k} is the value of the k-th bit when the weight is written as a binary number; a convolution operation array composed of a plurality of convolution operation units 102, the array having size i × j × k and each convolution operation unit 102 (i, j, k) comprising the current I_{x_i}, a switch 1021, an integration control module 103, a node a_{ji,k} and a capacitor 1022 of value C_u; and at least one output y_j. One end of the capacitor 1022 is grounded, and the capacitor 1022 must be reset to a given DC voltage before each convolution operation. The array performs the multiplications and additions of the convolution.
In the multiplication stage, shown in fig. 1, an AND operation with the PWM signal 1031 applies the bit weights of the multi-bit weight. In this embodiment the convolution operation units are implemented as a matrix in or near memory, which reduces the power spent on memory access and makes the physical implementation more compact. Specifically, the digital-to-analog converter 101 converts the digital input x_i into an analog current I_{x_i} according to the given number of bits; the resolution of the DAC matches the number of bits of x_i. The current I_{x_i} is mirrored or copied to the j × k convolution operation units 102 that share the same i, so that the units 102 in different i × k planes along the j direction can integrate current simultaneously. Along the k direction the bit position of the weight w_{ji} increases, and the corresponding convolution operation units 102 are arranged from the low bit to the high bit of w_{ji,k}. If required, the current I_{x_i} can be scaled in the DAC before it is carried in the circuit, so that its value does not exceed a chosen threshold and transmission power loss is reduced. The current I_{x_i} then passes through the switch 1021. To reduce kickback or transient effects on the current mirror, the switch 1021 may be a virtual switch, or a non-switching element such as a current device or a virtual load.
The integration control module 103 controls the switch 1021. For example, the logic operation of the module 103 may be an AND gate 1033. The module includes a static random-access memory (SRAM) cell 1032; across the whole convolution array the SRAM cells may be the same 6T SRAM cell or different SRAM cells. Each cell stores one bit w_{ji,k} of the binary number w_{ji}, and the k direction runs from the low bit to the high bit of w_{ji}. The inputs of the AND gate 1033 are w_{ji,k} and the PWM signal 1031 modulated according to the bit weight, and the output of the AND gate 1033 switches the switch 1021 on and off; this realizes the bit weight applied when the multiplicand is multiplied by each bit of the multiplier in the binary multiplication stage. In particular, the PWM signal 1031 fed to the AND gate 1033 depends on the bit position, within the weight w_{ji}, of the unit in which the gate sits: along the k direction, the PWM duration of the i × j units for each bit is 2 times that of the units for the adjacent lower bit. For example, for k = 1, 2, 3 the durations of the PWM signal 1031 are 1τ, 2τ, 4τ, and the duration for the k-th bit is 2^{k-1}·τ, where τ is the clock period of the PWM signal 1031.
It should be noted that, in this embodiment, the duration of the PWM signal 1031 means the duration of its high level. When the bit w_{ji,k} is 1 and the PWM signal 1031 is high, the AND gate 1033 outputs 1, the switch 1021 closes, and the current I_{x_i} flows through the switch 1021 into the capacitor 1022, which begins to store charge by integration. When the high level of the PWM signal 1031 ends, the signal goes low, the switch opens and the current I_{x_i} is blocked; integration in the capacitor 1022 stops, no new charge accumulates after the switch 1021 opens, and the stored charge is that accumulated during the high level. By U = Q/C, in a convolution operation unit 102 with w_{ji,k} = 1 the voltage across the capacitor 1022 is therefore determined by the charge stored through current integration. When w_{ji,k} is 0, the AND gate 1033 outputs 0 whether or not the PWM signal 1031 is high; the switch 1021 stays open, the current I_{x_i} is blocked, no current is integrated in the capacitor 1022, the stored charge is 0 and the voltage across the capacitor 1022 is 0. On the same principle, in another embodiment the logic operation of the integration control module 103 can be an OR gate; the duration of the PWM signal 1031 is then the duration of its low level, and the PWM signal 1031 and w_{ji,k} are combined by an OR operation. In other embodiments a counter or a clock divider generates the PWM signal 1031 from the maximum-speed clock, i.e. τ is made as small as possible, which speeds up integration in the capacitor 1022 and shortens the time each multiplication takes. Control by the PWM signal 1031 is used to improve the flexibility of the system.
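The text does not spell out how the OR-gate variant is wired. One way to obtain the same switch behavior, shown here purely as an assumption, is to feed the OR gate the inverted weight bit and an active-low PWM pulse and to close the switch when the gate output is 0; by De Morgan's law this matches the AND-gate control. The function names below are hypothetical.

```python
def switch_closed_and(w_bit: int, pwm_high: int) -> bool:
    """AND-gate control: the switch closes while w_bit = 1 and the PWM is high."""
    return bool(w_bit & pwm_high)


def switch_closed_or(w_bit: int, pwm_low_active: int) -> bool:
    """OR-gate control (assumed wiring): inverted weight bit, active-low PWM,
    and a switch that closes when the gate output is 0."""
    gate_out = (1 - w_bit) | pwm_low_active
    return gate_out == 0


# pwm_low_active is the complement of pwm_high: it is 0 during the pulse.
agree = all(
    switch_closed_and(w, p) == switch_closed_or(w, 1 - p)
    for w in (0, 1)
    for p in (0, 1)
)
```

Under this assumed wiring both variants close the switch only when the weight bit is 1 and the pulse is active, so the integration time per unit is unchanged.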
Specifically, when the switch 1021 is closed, the current I_{x_i} flows through the switch 1021 to node a_{ji,k}, which is connected to the upper plate of the capacitor 1022, and then into the capacitor 1022. Before each convolution operation, and before the current I_{x_i} flows in, the capacitor 1022 must be reset to a given DC voltage to clear the previous result. The other plate of the capacitor 1022 is grounded, so the voltage across the capacitor 1022 is the voltage at node a_{ji,k}. Once current enters the capacitor 1022, the stored charge grows with the integration time: while the switch 1021 is closed the current is integrated continuously and the voltage across the capacitor 1022 rises. The integration time is the time for which the switch 1021 is closed.
For example, let each bit w_{ji,k} of the binary representation of the weight w_{ji} correspond to one convolution operation unit, and assume w_{ji,1} = w_{ji,2} = w_{ji,3} = … = 1. For the same subscripts i and j, the durations of the PWM signal 1031 for k = 1, 2, 3 are τ, 2τ, 4τ; the duration for the k-th bit is 2^{k-1}·τ, and the duration for the most significant bit is 2^{B-1}·τ. The capacitors 1022 in the convolution operation units 102 all have the same capacitance, and after the current I_{x_i} has been integrated in each capacitor 1022 for its respective time, the stored charge is
$$Q = \int I_{x_i} \, dt = I_{x_i} \cdot 2^{k-1} \tau$$
It follows that, for the same current I_{x_i}, the charge stored in the capacitor 1022 is proportional to the integration time of the current and doubles with each higher bit: the charges stored in the capacitors 1022 for k = 1, 2, 3 are Q, 2Q and 4Q. Further, by U = Q/C, when the capacitors 1022 have the same capacitance the voltage across each is proportional to its stored charge, so the corresponding voltages are U, 2U and 4U; each bit gives 2 times the voltage of the next lower bit, and the voltage of the capacitor 1022 in the convolution unit 102 for k = B is 2^{B-1} times that of the capacitor 1022 in the unit for k = 1. This realizes the bit weight with which each bit of the weight w_{ji} (the multiplier) multiplies the input x_i (the multiplicand). Note that the above treats only the case in which every bit of w_{ji} is 1. In fact, whether w_{ji,k} is 0 or 1, the current integration time in its convolution operation unit 102 corresponds to the duration of the PWM signal 1031; with w_{ji,k} = 0 the unit 102 integrates a current of 0, and with w_{ji,k} = 1 it integrates the current I_{x_i}. The duration of the PWM signal 1031 changes only by the factor 2 per bit position and does not depend on whether w_{ji,k} is 0 or 1.
After current integration ends, because one end of the capacitor 1022 is grounded, the voltage at node a_{ji,k} in each convolution operation unit 102 is the voltage across the capacitor 1022, and this voltage is defined as the multiplication result x_i · w_{ji,k} · 2^{k-1}.
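The chain from integration time to node voltage can be written out as a short derivation. It assumes an ideal constant current I_{x_i} during the pulse, a capacitor reset to 0 V (the text only says "a given DC voltage"), and a DAC current proportional to x_i with a constant α that is introduced here only for illustration.

```latex
% Charge integrated on the unit capacitor of cell (i, j, k)
Q_{ji,k} = \int_{0}^{w_{ji,k}\, 2^{k-1}\tau} I_{x_i}\, dt
         = I_{x_i}\, w_{ji,k}\, 2^{k-1}\, \tau

% Node voltage from U = Q / C with C = C_u
U(a_{ji,k}) = \frac{Q_{ji,k}}{C_u}
            = \frac{I_{x_i}\,\tau}{C_u}\; w_{ji,k}\, 2^{k-1}

% With an ideal linear DAC, I_{x_i} = \alpha\, x_i (\alpha is an assumed constant)
U(a_{ji,k}) = \frac{\alpha\,\tau}{C_u}\; x_i\, w_{ji,k}\, 2^{k-1}
```

The prefactor ατ/C_u is common to all cells, so each node voltage is the product x_i · w_{ji,k} · 2^{k-1} up to one shared scale factor.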
The addition phase, as in fig. 3, results in a convolution output through charge sharing. After all convolution operation units 102 of the present invention complete the current integration operation in the multiplication stage, for j =1, x1The corresponding k units finish x once1*w11Operation of (a), x1*w11Is broken to see the input x1Are respectively multiplied by the weight w11Each bit w of11,kAnd the bit weight of the bit 2(k-1)I.e. x1*w11,k*2(k-1)And then the results obtained respectively are added. For the same reason, xiCorresponding k units complete x oncei* wi1Operation, then j =1, i ∈ [1, N ∈]All the corresponding i x 1 x k arrays complete the multiplication of one convolution window, and the node a of each convolution operation unit 102 of the i x 1 x k arraysji,kThe voltage is the multiplication result, after the multiplication operation is completed, the capacitor 1022 is short-circuited, and the short circuit j =1 corresponds to the node a above all the capacitors 1022 in the arrayji,kAt this time, all the capacitors in the corresponding array are connected in parallel, due to the different charge amounts stored in the capacitors 1022 in each unit and the discharging characteristics of the capacitors 1022, the capacitors 1022 in the shorted array perform charge sharing, the charge amounts stored in each capacitor 1022 are the same, but the total charge value is unchanged, and the voltage of the obtained combined node is the voltage of each multiplication result node a in the multiplication stageji,kThe sum of the voltages being the output y1. In a further embodiment, for a convolutional neural network, where the weight matrix is shared, the convolution kernels for different windows are the same, i.e., the multiplicand (weight w) when different window convolution results are computedji) The weight matrices formed are identical, wj1=wj2=wj3=.....=wjiThe number of parameters participating in the operation is reduced. 
Similarly, the other outputs y_j are obtained by shorting the arrays corresponding to the other values of j, as given by Equation 1 below:

$$y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1} \qquad (1)$$
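The accumulation described for Equation 1, a sum over inputs i and weight bits k of x_i * w_{ji,k} * 2^(k-1), can be checked numerically. The following is a minimal sketch under idealized assumptions (equal unit capacitors, no parasitics, node voltages expressed in normalized units); the helper names are hypothetical. In this ideal model, charge conservation makes the shared voltage equal to the sum of the node voltages divided by the number of capacitors, so it is proportional to the sum in Equation 1.

```python
from fractions import Fraction


def weight_bits(w, n_bits):
    """Return [w_{ji,1}, ..., w_{ji,B}], least significant bit first."""
    if not 0 <= w < 2 ** n_bits:
        raise ValueError("weight does not fit in n_bits")
    return [(w >> (k - 1)) & 1 for k in range(1, n_bits + 1)]


def node_values(xs, ws, n_bits):
    """Normalized node voltages x_i * w_{ji,k} * 2^(k-1) for one window j."""
    return [
        x * bit * 2 ** (k - 1)
        for x, w in zip(xs, ws)
        for k, bit in enumerate(weight_bits(w, n_bits), start=1)
    ]


def share_charge(values):
    """Ideal charge sharing between equal capacitors.

    Total charge is conserved, so the shared voltage is the sum of the
    node voltages divided by the number of capacitors.
    """
    if not values:
        raise ValueError("at least one node is required")
    return Fraction(sum(values), len(values))
```

With `xs = [3, 5, 2]` and 3-bit weights `ws = [6, 1, 7]`, the node values sum to the ordinary dot product of `xs` and `ws`, and `share_charge` returns that sum divided by the nine capacitors involved.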
Optionally, the output y_j is converted. The y_j produced after the convolution operation array performs the analog multiply-accumulate operation is an analog signal; when y_j is required as a digital signal, an analog-to-digital converter (ADC) is added before the output, and the resulting y_j is a digital signal. For example, when the convolution module is applied to a convolutional neural network, the digital output y_j can serve as the digital input to a convolution operation array that performs the convolution of the second layer of the network. Furthermore, if the swing of the accumulated voltage is too high for the input range of the analog-to-digital converter, one option is to increase the unit capacitance C_u used in the multiplication stage shown in FIG. 1. However, this increases the number of capacitors required for each group of convolution operation units 102 and requires a larger physical area, which is disadvantageous for miniaturization of the device. Instead, when the combined node is formed, an additional attenuation capacitor 105 of value C_att is connected to the combined node at the same time, which scales the accumulated voltage into a range that satisfies the input range of the analog-to-digital converter. Whenever y_j is output, the node a_{att,j} above the attenuation capacitor 105 is connected to the original nodes a_{ji,k}; this solution makes more efficient use of the physical area of the module.
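The scaling effect of the attenuation capacitor follows from charge conservation. The sketch below is an idealized illustration that assumes equal unit capacitors C_u and an attenuation capacitor C_att that is fully discharged before it is connected; the function names and the sizing helper `attenuation_for_target` are hypothetical and do not come from the patent.

```python
def combined_voltage(node_voltages, c_u, c_att=0.0):
    """Voltage of the combined node after ideal charge sharing.

    node_voltages -- voltages at the nodes a_{ji,k} before shorting
    c_u           -- unit capacitance C_u
    c_att         -- attenuation capacitance C_att (assumed discharged)
    """
    if not node_voltages:
        raise ValueError("at least one node is required")
    total_charge = c_u * sum(node_voltages)
    total_capacitance = c_u * len(node_voltages) + c_att
    return total_charge / total_capacitance


def attenuation_for_target(node_voltages, c_u, v_max):
    """Smallest C_att that keeps the combined voltage at or below v_max."""
    if v_max <= 0:
        raise ValueError("v_max must be positive")
    v_shared = combined_voltage(node_voltages, c_u)
    if v_shared <= v_max:
        return 0.0  # already inside the ADC input range
    # Solve Q / (n * C_u + C_att) = v_max for C_att.
    return c_u * len(node_voltages) * (v_shared / v_max - 1.0)
```

Adding C_att leaves the total charge unchanged and only enlarges the capacitance it is spread over, which is why the accumulated voltage is scaled down without changing the unit capacitors.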
The convolution operation module also supports unit reuse. In the physical implementation of the two-stage convolution operation described above, the number of bits of the weight w_ji is generally fixed, i.e. the size of k is fixed. When the binary representation of the input or of the weight w_ji has few bits, the high-order units do not participate in the operation, yet leaving the convolution operation units 102 corresponding to the high-order bits connected to the circuit increases power consumption. A simple approach for the units that do not participate is, when computing y_j, to disconnect the array units corresponding to the unused high-order bits of the binary weight w_ji and to connect only the convolution operation units 102 that participate in computing y_j, which reduces power consumption. However, this leaves area unused, especially when the physical units are used for operations whose weights w_ji have few bits. Therefore, the number of bits of the input and of the weight w_ji is made reconfigurable, which provides flexibility in the internal quantization of the input matrix and the weights and allows unused units to be reused. The reconfiguration proceeds as follows:
As shown in FIG. 4, a group of cells associated with bit k of the weight is reused for input x_i or input x_ii, whose corresponding currents are Ix_i and Ix_ii and whose corresponding voltage signals are Vgx_i and Vgx_ii, respectively. A multiplexer, controlled by a signal derived from bit k, selects the voltage signal for each remaining unused cell; that is, the selected voltage V'gx_i equals either Vgx_i or Vgx_ii, and the current I'x_i in the cell corresponding to bit k then equals Ix_i or Ix_ii accordingly. For example, suppose a convolution operation module that supports 8-bit weights w_ji is needed only for convolution with a 1-bit weight w_ji. Then 7 (= 8 - 1) convolution operation units 102 do not participate in the operation, and these 7 units can be fed the same input as x_i (i.e. I'x_i = Ix_i) to perform 7 further convolution operations with 1-bit weights. When the original input x_i or the original weight w_ji has 5 bits, the remaining 3 groups of cells clearly cannot perform the same convolution operation as the original input; they can instead perform an operation with another input Ix_ii and another weight of at most 3 bits, in which case I'x_i = Ix_ii. In another form of reuse, since each group of cells is independent in the i direction, when the number i of inputs x_i is small, the unused units receive no input current and dissipate no power; when i is large and the weights w_ji are small, the excess inputs x_i can be routed to the convolution operation units 102 corresponding to weight bits that are not used by the other inputs.
In other embodiments, the current may be controlled by the voltage V'gx_i through a diode in a current mirror. The DAC can be reconfigured for a given number of input bits and the ADC for the output y_j, so that the DAC or ADC resolution matches the number of bits of the corresponding input or output.
After the multiplexer has selected the matching input I'x_i, the duration of the PWM signal 1031 associated with the weight w_ji is reconfigured. The unused cells of the original physical implementation have PWM signals 1031 corresponding to their original bit weights, so to reuse these cells the corresponding bit weight, i.e. the duration of the corresponding PWM signal 1031, must be changed so that the multiplication associated with bit k is associated with input x_i or input x_ii. Two extreme examples illustrate this reconfiguration capability. First, assume a physical implementation that operates with the maximum supported number of weight bits, k = 8, using the full convolution operation array shown in FIG. 1; the durations of the array's PWM signals 1031 then range from τ to 2^(B-1) τ. By contrast, when the number of weight bits is k = 1, the cells corresponding to bits 2 to 8 can be reused for inputs x_i, so that up to 8 inputs are processed in parallel. In this case every weight is pulse-width modulated (PWM) with the same pulse width, i.e. the duration of every PWM signal 1031 is τ, and each weight is quantized to a single bit rather than to the 8 bits of the weight w_ji as in the former case.
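The two extreme cases above, and the mixed 5-bit plus 3-bit case mentioned earlier, can be tabulated with a small helper. This is only a sketch: the function `pwm_plan` and the idea of describing a configuration as a list of weight bit widths are illustrative assumptions that generalize the examples in the text, and durations are expressed in units of τ.

```python
def pwm_plan(n_cells, group_bits):
    """Assign PWM durations (in units of tau) to a row of physical cells.

    n_cells    -- number of physical cells along the k direction
    group_bits -- weight bit widths mapped onto the cells, one per input
    Returns a list of (input_index, k, duration) tuples, one per used cell.
    """
    if any(bits < 1 for bits in group_bits):
        raise ValueError("each weight needs at least one bit")
    if sum(group_bits) > n_cells:
        raise ValueError("not enough physical cells")
    plan = []
    for input_index, bits in enumerate(group_bits):
        # Inside each group the bit weight restarts at the least
        # significant bit, so the duration restarts at tau.
        for k in range(1, bits + 1):
            plan.append((input_index, k, 2 ** (k - 1)))
    return plan
```

Calling `pwm_plan(8, [8])` models the full 8-bit weight, `pwm_plan(8, [1] * 8)` models eight parallel inputs with 1-bit weights, and `pwm_plan(8, [5, 3])` models one 5-bit weight sharing the row with a second input that uses a 3-bit weight.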
FIG. 5 and FIG. 6 show an embodiment of the present invention in which a bias operation unit 1061 is added when the convolution operation unit 102 is used for convolutional neural network operations. Because adding a bias b makes the convolution operation more effective and accurate, a binary bias b_j is typically added to a given output y_j. The corresponding convolution output y_j then changes from Equation 1 to Equation 2 below.
$$y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1} + \sum_{k=1}^{B} b_{j,k} \cdot 2^{k-1} \qquad (2)$$
FIG. 5 illustrates how this additional function is added in the multiplication stage. Since the bias bits are quantized in a manner similar to the weights in FIG. 1 or FIG. 2, the bias is implemented as a fixed current I_b supplied as an additional input alongside the given currents Ix_i.
In the present invention, the bias b_j is converted into a fixed current I_b, an additional input alongside the given currents Ix_i, and is computed by adding bias operation units 1061. The bias operation units 1061 form a bias operation array 106 of size j x k, and each bias operation unit 1061 (j, k) includes a current I_b, a switch 1021, a bias operation unit integration control module 1062, and a capacitor 1022 of value C_u connected to node a_{j,k}. The current I_b is integrated in the capacitor 1022 as in the convolution stage, with the weight w_ji replaced by b_j: the inputs of the bias AND gate in the bias operation unit integration control module 1062 are b_{j,k} and the PWM signal 1031 modulated according to the bit weight of b_{j,k}. The output of the bias AND gate controls how long the switch 1021 is closed, so the integration time of the current in the capacitor 1022 of bias operation unit (j, k) 1061 is b_{j,k} * 2^(k-1) τ. The PWM signal 1031 of a bias operation unit 1061 is the same as the PWM signal 1031 for the weight bit w_{ji,k} in the convolution operation units 102 with the same k. Note that in this embodiment the duration of the PWM signal 1031 means the duration of its high level. When bit b_{j,k} is 1 and the PWM signal 1031 is high, the bias AND gate outputs 1, the switch 1021 closes, and the current I_b flows through the switch and is integrated in the capacitor 1022, which stores charge. When the high-level duration of the PWM signal 1031 has elapsed, the signal goes low, the switch 1021 opens, the current I_b no longer flows, integration in the capacitor 1022 stops, and no new charge is added; the stored charge is the charge accumulated while the signal was high. When b_{j,k} is 0, the bias AND gate outputs 0, the switch 1021 stays open, the current I_b does not flow, no current is integrated in the capacitor 1022, and the stored charge is 0.
Similarly, the voltage across the capacitor 1022 is the result of the multiplication stage of the bias operation unit 1061.
FIG. 6 illustrates that, during the accumulation stage, additional capacitors 1022 are added for charge sharing and node accumulation.
Similarly, the k unit nodes a_{j,k} corresponding to a given j are shorted together. The capacitors 1022 in the shorted array share charge: every capacitor 1022 ends up storing the same amount of charge while the total charge is unchanged, and the voltage of the resulting combined node represents the sum of the voltages of the multiplication-result nodes a_{j,k} from the multiplication stage. That is, the bias of y_j is the accumulated sum of the voltages of all nodes a_{j,k} of the 1 x k group of units. As shown in FIG. 6, the convolution operation units and the bias operation units are physically implemented independently, but when the convolution result with the bias added is output, the corresponding nodes of the convolution operation units 102 and the bias operation units 1061 can be connected, and the voltage of the resulting combined node is the convolution result with the bias added.
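The bias path can be sketched as a digital reference model. The code below computes the per-bit integration times b_{j,k} * 2^(k-1) in units of τ and adds the resulting bias to the convolution sum. It assumes, purely for illustration, that the fixed current I_b equals the current of an input of value 1, so that the bias and the products share the same normalized unit; the function names are hypothetical.

```python
def bit_list(value, n_bits):
    """Bits of value, least significant first (k = 1 .. n_bits)."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return [(value >> (k - 1)) & 1 for k in range(1, n_bits + 1)]


def bias_integration_times(b, n_bits):
    """Integration time b_{j,k} * 2^(k-1), in units of tau, per bias cell."""
    return [
        bit * 2 ** (k - 1)
        for k, bit in enumerate(bit_list(b, n_bits), start=1)
    ]


def output_with_bias(xs, ws, b, n_bits):
    """Normalized y_j: convolution sum plus bias (I_b assumed to be 1 unit)."""
    conv = sum(
        x * bit * 2 ** (k - 1)
        for x, w in zip(xs, ws)
        for k, bit in enumerate(bit_list(w, n_bits), start=1)
    )
    # With a unit bias current, the accumulated bias equals the total
    # integration time of the bias cells.
    return conv + sum(bias_integration_times(b, n_bits))
```

For a 3-bit bias of 5 (binary 101), only the first and third bias cells integrate, for 1 τ and 4 τ respectively, and their total of 5 is added to the convolution sum.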
The invention also comprises a multi-bit convolution analog operation method based on time-variable current integration and charge sharing, which comprises the following steps:
The digital-to-analog converter 101 converts the digital input x_i, with a given number of bits, into an analog current Ix_i that is transmitted in the circuit.
When the current Ix_i reaches the switch, a logic operation is performed in the integration control module 103. The inputs of the logic operation are the k-th bit w_{ji,k} of the weight w_ji and the PWM signal 1031 modulated according to the bit weight. The duration of the PWM signal 1031 in the convolution operation units along the k direction doubles with each bit from low bit to high bit, so the duration of the PWM signal 1031 for the k-th bit is 2^(k-1) τ, where τ is the clock period of the PWM signal, and the output of the logic operation controls the closing of the switch 1021. After the switch 1021 closes, the current Ix_i enters the capacitor 1022 through node a_{ji,k}, which is connected to the upper plate of the capacitor, and is integrated for a period of time, producing the voltage across the capacitor. When the switch is open, the current does not pass through node a_{ji,k}, and the voltage across the capacitor 1022 after the integration period is 0. The integration time is the duration of the PWM signal 1031, and the voltage at node a_{ji,k} is the multiplication result x_i * w_{ji,k} * 2^(k-1) of the convolution operation. All nodes a_{ji,k} in the convolution operation units 102 of one i x k plane are then shorted together, the capacitors 1022 of these units share charge, and the voltage of the resulting combined node is the result y_j of the convolution operation

$$y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1}$$
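The steps of the method can also be followed tick by tick. The sketch below is a time-stepped model under idealized assumptions: time advances in steps of τ, the input current is normalized so that an input of value x deposits x units of charge per tick, and all capacitors are equal. The function name `simulate_window` and the normalization are assumptions made for illustration only.

```python
from fractions import Fraction


def simulate_window(xs, ws, n_bits):
    """Simulate one convolution window in ticks of tau.

    Returns (total_charge, shared_voltage) in normalized units, where the
    shared voltage assumes unit capacitors of value 1.
    """
    if len(xs) != len(ws) or not xs:
        raise ValueError("xs and ws must be non-empty and of equal length")
    n_ticks = 2 ** (n_bits - 1)  # longest PWM pulse, for bit k = n_bits
    charges = []
    for x, w in zip(xs, ws):
        for k in range(1, n_bits + 1):
            bit = (w >> (k - 1)) & 1
            charge = 0
            for t in range(n_ticks):
                pwm_high = t < 2 ** (k - 1)  # pulse lasts 2^(k-1) ticks
                if bit and pwm_high:  # AND gate closes the switch
                    charge += x  # constant current integrated for one tick
            charges.append(charge)
    total = sum(charges)
    # Shorting the nodes shares the total charge across all capacitors.
    return total, Fraction(total, len(charges))
```

Because each cell integrates for exactly w_{ji,k} * 2^(k-1) ticks, the total charge equals the digital dot product of the inputs and weights, and the shared voltage is that total divided by the number of cells.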
It should be noted that, in the foregoing embodiment, each included module is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (16)

1. A multi-bit convolution operation module based on time-variable current integration and charge sharing, characterized by comprising:
at least one digital input x_i, at least one digital-to-analog converter (DAC), at least one binary weight w_ji, a convolution operation array composed of a plurality of convolution operation units, and at least one output y_j;
the digital input x_i is converted by the DAC, according to a given number of bits, into an analog current Ix_i that is transmitted in the circuit;
in the binary weight w_ji, j indicates that the weight belongs to the j-th window, w_{ji,k} is the value of the weight w_ji at bit k, w_{ji,k} is 0 or 1, and k ∈ [1, B], where B is the highest binary bit; each bit w_{ji,k} corresponds to one convolution operation unit;
the convolution operation array has size i x j x k, where the i direction is the input direction, the j direction is the convolution window direction, and the convolution operation units along the k direction are arranged in order from low bit to high bit according to the bits w_{ji,k} of the weight w_ji;
each convolution operation unit comprises an input current Ix_i, a switch, an integration control module, and a capacitor connected to node a_{ji,k}, one end of the capacitor being grounded;
the integration control module performs a logic operation whose inputs are w_{ji,k} and a PWM signal modulated according to the bit weight of w_{ji,k}; the duration of the PWM signal in the convolution operation units along the k direction doubles with each bit from low bit to high bit, the duration of the PWM signal for bit k being 2^(k-1) τ, where τ is a clock period of the PWM signal; the output of the integration control module controls the closing of the switch;
when the switch is closed, the current Ix_i enters the capacitor through node a_{ji,k}, which is connected to the upper plate of the capacitor, and is integrated; when the switch is open, the current Ix_i does not pass through node a_{ji,k}; the integration time is the duration of the PWM signal, and the voltage at node a_{ji,k} is the multiplication result x_i * w_{ji,k} * 2^(k-1) of the convolution operation;
y_j is obtained by shorting all nodes a_{ji,k} in the convolution operation units of one i x k plane so that the capacitors of these units share charge, the voltage of the resulting combined node being the result of the convolution operation.
2. The module of claim 1, wherein the combined voltage of the 1 x k convolution operation units corresponding to x_i is the result of x_i * w_ji, and the voltage at the combined node of the convolution operation units of one i x k plane is

$$y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1}$$

which completes the convolution of the convolution kernel with the input matrix.
3. The module of claim 2, wherein the input x_i is a binary number of at least one bit, and the resolution of the DAC that converts the input x_i is adjustable.
4. The module of claim 3, wherein the current Ix_i is mirrored or copied into the convolution operation array by a current mirror, the currents within the same j x k plane are the same, and the current Ix_i can be scaled in the digital-to-analog converter.
5. The module of claim 4, wherein the logic operation of the integration control module is an AND gate; one input of the AND gate is the bit w_{ji,k} stored in an SRAM cell, and the other input is a PWM signal whose duration, based on τ, doubles with each increment of the bit k; the output of the AND gate controls the closing of the switch; the convolution operation units corresponding to the same bit k of different weights w_ji have the same PWM signal duration, and the convolution operation units corresponding to different bits of the same weight w_ji have different PWM signal durations, namely 2^(k-1) * τ.
6. The module of claim 5, wherein a counter or clock divider is used to generate the PWM clock signal of the fastest speed, to speed up the integration in the capacitor.
7. The module of any of claims 1 to 6, wherein the switches in the convolution operation units are non-switching elements, such as virtual switches or current devices, to reduce kickback or transient effects on the current mirror.
8. The module of claim 7, wherein the numbers of bits of the input x_i and of the weight w_ji can be reconfigured, for re-inputting x_i or inputting a new input x_ii, comprising:
a multiplexer that receives the re-input x_i and x_ii and, according to the convolution operation units corresponding to the remaining unused bits of the weight w_ji, selects the input voltage signal matching the unused units, the output voltage signal entering those convolution operation units;
the PWM signal durations corresponding to the bit weights in the unused convolution operation units that are reused being reconfigured.
9. The module of claim 8, wherein, in the reuse stage, the number of bits of at least one multiplexer is adapted to the number of bits of the weight code, and the output of the multiplexer is controlled by the weight bit k.
10. The module of claim 9, wherein the convolution operation array further comprises a bias module, the bias module comprising:
a bias operation unit array composed of a plurality of bias operation units, the array having size j x k, each bias operation unit (j, k) comprising a current I_b, a switch, an integration control module, and a capacitor of value C_u connected to node a_{j,k};
the bias current I_b is a fixed current additional to the currents Ix_i;
b_{j,k} is the k-th bit of the multi-bit binary bias b_j, and the integration time of the current in the capacitor of bias operation unit (j, k) is b_{j,k} * 2^(k-1) τ;
in the integration control module, b_{j,k} and the PWM signal modulated according to the bit weight of b_{j,k} are combined by an AND gate, whose output controls the closing of the switch and thus the integration time of the bias current I_b in the capacitor of the bias operation unit;
the bias of y_j is the accumulated sum of the voltages of all nodes a_{j,k} of the 1 x k group of units.
11. The module of claim 10, wherein, when the accumulated voltage swing at the combined node is above the input range of the analog-to-digital converter or above a threshold, an attenuation capacitor is connected in parallel before the output y_j is connected to the analog-to-digital converter, to adjust the full-scale range of the accumulated voltage.
12. A multi-bit convolution operation method based on time-variable current integration and charge sharing, characterized by comprising the following steps:
a DAC converts a digital input x_i, according to a given number of bits, into an analog current Ix_i that is transmitted in the circuit;
when the current Ix_i reaches the switch, a logic operation is performed whose inputs are the k-th bit w_{ji,k} of the weight w_ji and a PWM signal modulated according to the bit weight of w_{ji,k}; the duration of the PWM signal in the convolution operation units along the k direction doubles with each bit from low bit to high bit, the duration of the PWM signal for bit k being 2^(k-1) τ, where τ is the clock period of the PWM signal; the output of the logic operation controls the closing of the switch;
after the switch closes, the current Ix_i enters the capacitor through node a_{ji,k}, which is connected to the upper plate of the capacitor, and is integrated for a period of time, producing the voltage across the capacitor; when the switch is open, the current does not pass through node a_{ji,k} and the voltage across the capacitor after the integration period is 0; the integration time is the duration of the PWM signal, and the voltage at node a_{ji,k} is the multiplication result x_i * w_{ji,k} * 2^(k-1) of the convolution operation;
all nodes a_{ji,k} in the convolution operation units of one i x k plane are shorted together, the capacitors of these units share charge, and the voltage of the resulting combined node is the result y_j of the convolution operation

$$y_j = \sum_{i=1}^{N} \sum_{k=1}^{B} x_i \cdot w_{ji,k} \cdot 2^{k-1}$$
13. The method of claim 12, wherein the resolution of the DAC is adjusted before the DAC converts the digital input x_i.
14. The method of claim 13, wherein, before the logic operation is performed, a counter or clock divider is used to generate the PWM clock signal of the fastest speed, to increase the integration speed of the current.
15. The method of claim 14, wherein, after x_i has been input once, the unused convolution operation units are reused, comprising:
receiving the re-input x_i and x_ii with a multiplexer, and selecting, according to the convolution operation units corresponding to the remaining unused bits of the weight w_ji, the input voltage signal matching the unused units, the output voltage signal entering those convolution operation units; and, after the input voltage signal has been selected, reconfiguring the PWM signal durations corresponding to the bit weights in the unused convolution operation units.
16. The method of claim 15, wherein, before the output y_j is connected to the ADC, an attenuation capacitor is connected in parallel to adjust the full-scale range of the accumulated voltage, so that the accumulated voltage swing at the combined node is lower than the input range of the analog-to-digital converter.
CN202010257151.0A 2020-04-03 2020-04-03 Multi-bit convolution operation module based on time-variable current integration and charge sharing Active CN111144558B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010257151.0A CN111144558B (en) 2020-04-03 2020-04-03 Multi-bit convolution operation module based on time-variable current integration and charge sharing
PCT/CN2021/081322 WO2021197073A1 (en) 2020-04-03 2021-03-17 Multi-bit convolution operation module based on time-variable current integration and charge sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010257151.0A CN111144558B (en) 2020-04-03 2020-04-03 Multi-bit convolution operation module based on time-variable current integration and charge sharing

Publications (2)

Publication Number Publication Date
CN111144558A true CN111144558A (en) 2020-05-12
CN111144558B CN111144558B (en) 2020-08-18

Family

ID=70528805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257151.0A Active CN111144558B (en) 2020-04-03 2020-04-03 Multi-bit convolution operation module based on time-variable current integration and charge sharing

Country Status (2)

Country Link
CN (1) CN111144558B (en)
WO (1) WO2021197073A1 (en)


Also Published As

Publication number Publication date
CN111144558B (en) 2020-08-18
WO2021197073A1 (en) 2021-10-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant