CN116504281A

CN116504281A - Computing unit, array and computing method

Info

Publication number: CN116504281A
Application number: CN202210054491.2A
Authority: CN
Inventors: 赵先成
Original assignee: Zhejiang Lide Instrument Co ltd
Current assignee: Zhejiang Lide Instrument Co ltd
Priority date: 2022-01-18
Filing date: 2022-01-18
Publication date: 2023-07-28

Abstract

The invention discloses a computing unit, a computing unit array, and a computing method that uses the computing unit for highly parallel computation. It mainly addresses two problems in the prior art: low accuracy caused by nonlinear effects in the device current-voltage relationship, and high power consumption during parallel computation. The computing unit includes: a tree network comprising a parent node and at least two child nodes, with conduction capability between the parent node and at least one child node; at least one forward current driving stage connected to the parent node of the network; and a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence. By using current distribution, the invention decouples, at the device level, the multiplication in linear operations, and can be applied to large-scale analog computing systems that integrate storage and computation.

Description

Computing unit, array and computing method

Technical Field

The present invention relates to the field of integrated circuits, and more particularly to architecture and operation of integrated circuit computing chips.

Background

With the development of the Internet of Things and cloud computing, computing tasks grow more complex by the day, placing ever greater pressure on computing chips. The main performance metrics of a computing chip fall into two groups: power consumption and computing power. Power consumption is critical both where the energy supply is limited, as in the Internet of Things, and where total energy use is large, as in cloud computing. How to further reduce chip power consumption while delivering a given computing power is therefore an important research direction for computing chips.

The power consumption of a computing chip comes mainly from two kinds of operation: data reads and writes, and data computation. Some workloads, such as the inference and training computations of deep neural networks, involve a large number of data read and write operations. The target memory of these operations is either on-chip or off-chip. Static random access memory (SRAM), which is widely used for on-chip storage, has low density, so on-chip storage capacity tends to be small, and large volumes of reads and writes require access to off-chip storage. In a data read or write operation, charging and discharging the parasitic capacitance around the interconnect lines is the main source of energy consumption. Because off-chip storage is accessed through interconnect lines and pins of larger pitch, it consumes far more power than on-chip storage. In view of this, placing as much data as possible in on-chip storage, and further reducing data accesses to that on-chip storage, is an important strategy for optimizing the energy efficiency of a computing chip.

The in-memory computing architecture is one concrete implementation of this strategy. By constructing high-density computing units based on memristors, this architecture can partially fix the computation data inside the computing units, thereby reducing the number of data access operations. Because it computes in the analog domain, it also tolerates a certain amount of imprecision in the stored data, so larger-scale on-chip storage becomes feasible. In a typical in-memory computing method, a voltage vector V is applied through digital-to-analog converters (DACs) on one side of a memristor array whose conductance values form a matrix G, and the output current vector I is measured on the orthogonal side, which implements the matrix-vector multiplication I = GV. In neural network inference, G corresponds to the network weights, which are updated rarely, while V corresponds to the feature maps of the data, which are updated frequently.
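By way of non-limiting illustration, the conventional voltage-driven computation I = GV described above can be sketched as follows. The function name `crossbar_mvm` and all numerical values are assumptions chosen for the example; the devices are idealized as perfectly linear conductances.

```python
def crossbar_mvm(G, V):
    """Ideal crossbar read-out: I[i] = sum over j of G[i][j] * V[j].

    G is a list of rows of conductances (siemens), V a list of input
    voltages (volts). Each output current is the sum of the Ohm's-law
    currents collected on one output line.
    """
    return [sum(g * v for g, v in zip(row, V)) for row in G]


# Illustrative values only (not taken from the specification).
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
I = crossbar_mvm(G, V)   # output currents in amperes
```

The sketch holds only as long as each device current is proportional to the applied voltage, which is precisely the linearity requirement discussed in the following paragraph.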

This typical scheme places a fundamental requirement on the memristor devices: they must exhibit a highly linear current-voltage relationship. Existing memory technologies have only a small linear region, so meeting this requirement means limiting the input voltage to a small range, which in turn poses a significant challenge for the input circuit. Moreover, in fully parallel operation every lateral interconnect line of the array carries an input signal, so the parasitic capacitance of each line must be charged to its input voltage, which consumes a large amount of energy.
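The effect of a limited linear region can be illustrated with a toy device model. The sinh-type current-voltage curve below, and the parameters `g` and `v0`, are hypothetical and are used only because such a curve is linear at small voltages and increasingly nonlinear at larger ones; the specification does not prescribe any particular device model.

```python
import math


def device_current(v, g=1e-6, v0=0.25):
    """Hypothetical sinh-type I-V curve; tends to g * v when |v| << v0."""
    return g * v0 * math.sinh(v / v0)


def relative_nonlinearity(v, g=1e-6, v0=0.25):
    """Relative deviation of the device current from the ideal value g * v."""
    ideal = g * v
    return (device_current(v, g, v0) - ideal) / ideal


small = relative_nonlinearity(0.01)  # 10 mV input: deviation well below 0.1 %
large = relative_nonlinearity(0.5)   # 0.5 V input: deviation above 50 %
```

Under this assumed model, keeping the device voltage small is what keeps the multiplication accurate, which is why the input range must be restricted in the voltage-driven scheme.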

Disclosure of Invention

The invention aims to solve the above technical problems by providing a computing unit, an array and a computing method.

The object of the invention is achieved by the following technical solutions:

In a first aspect, there is provided a computing unit comprising:

a tree network comprising a parent node and at least two child nodes, with conduction capability between the parent node and at least one child node;

at least one forward current driving stage connected to the parent node of the network;

and a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence.

Further, the current driving stage comprises a voltage digital-to-analog converter and a current limiting resistor which are connected in series.

Further, the conduction capability of the network can be adjusted by means including, but not limited to, the following:

Mode one: the computing unit comprises a storage structure having at least two ports, a first port and a second port, that satisfy the following condition: when fixed, mutually different inputs are applied to the first port and the second port, the current between the two ports differs according to the data stored;

each child node is connected to the first port or the second port of at least one storage structure, and the port of that storage structure which is not connected to the child node is connected to the parent node; the conduction capability of the network is adjusted by writing data into the storage structure;

in this mode, the storage structure may be a memristor, and the conduction capability of the network is adjusted by adjusting the state of the memristor.

Mode two: one or more of a magnetoresistive memory, a resistive memory, a phase change memory, a ferroelectric memory, a static random access memory, a dynamic random access memory, a floating gate memory and a charge trapping memory are arranged between the parent node and the child node to adjust the conduction capability of the network.

In a second aspect, there is provided a computing method for the computing unit of the first aspect, comprising:

step one, setting the potential of the input of each current sampling stage equal to the zero potential of the current driving stage;

step two, applying a current at the current driving stage;

step three, sampling the current at the current sampling stage;

wherein the steps satisfy the following order: step two precedes step three, and step one precedes step three.
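The ordering constraint of this method can be sketched in software as follows. The class `ComputeUnit` and its method names are hypothetical and serve only to make the constraint explicit: steps one and two may occur in either order, but both must precede step three. The sampled values assume ideal, linear branch conductances.

```python
class ComputeUnit:
    """Hypothetical behavioral model of one computing unit."""

    def __init__(self, conductances):
        self.g = list(conductances)  # parent-to-child conductances (siemens)
        self.clamped = False         # True once step one has been performed
        self.i_in = None             # set once step two has been performed

    def clamp_sampling_inputs(self):
        # Step one: hold every sampling-stage input at the driver's zero potential.
        self.clamped = True

    def apply_current(self, i_in):
        # Step two: drive the parent node with the input current (amperes).
        self.i_in = i_in

    def sample(self):
        # Step three: valid only after steps one and two.
        if not self.clamped or self.i_in is None:
            raise RuntimeError("steps one and two must precede step three")
        total = sum(self.g)
        return [self.i_in * g / total for g in self.g]


unit = ComputeUnit([1e-6, 2e-6, 1e-6])  # illustrative conductances
unit.apply_current(4e-6)                # step two may come before step one
unit.clamp_sampling_inputs()            # step one
currents = unit.sample()                # step three
```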

In a third aspect, a computing unit array for implementing parallel computing is provided, in one of the following modes:

Mode one: comprising a plurality of computing units as described in the first aspect;

one back-end current sampling stage is connected to child nodes in at least two different computing units.

Mode two: comprising a plurality of computing units as described in the first aspect;

one forward current driving stage is connected to the parent nodes of at least two different computing units.

Mode three: comprising a plurality of computing units as described in the first aspect;

one back-end current sampling stage is connected to at least two child nodes in different computing units;

In a fourth aspect, there is provided a computing method for the computing unit array of the third aspect, comprising:

step two, applying a current at the current driving stage;

step three, sampling the current at the current sampling stage;

The beneficial effects of the invention are as follows. The invention provides a new computing principle under which, with an input current in a suitable range, the devices can operate at low voltage, and the current is easy to control during array computation. This addresses both the nonlinearity of the current-voltage relationship of memristor devices and the high power consumption of array-scale parallel operation.

Those skilled in the art will recognize additional features and advantages upon reading the following detailed description, and upon viewing the accompanying drawings.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 illustrates a structure and a calculation method of a conventional in-memory calculation;

FIG. 2 illustrates an implementation of an analog computing unit and its corresponding peripheral circuitry;

FIG. 3 shows the basic principle by which an analog computing unit implements computation;

FIG. 4 illustrates an implementation of a current drive stage;

FIG. 5 illustrates an implementation of an analog computing unit and its corresponding peripheral circuitry;

FIG. 6 shows the basic principle by which an analog computing unit implements computation;

FIG. 7 illustrates an implementation of an analog computation unit including a magnetic tunnel junction and its corresponding peripheral circuitry;

FIG. 8 illustrates a charge storage based structure;

FIG. 9 shows a static random access memory based architecture;

FIG. 10 shows a dynamic random access memory based architecture.

Detailed Description

In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. For example, features illustrated or described with respect to one embodiment can be used on or in conjunction with other embodiments to yield yet a further embodiment. It is intended that the present invention encompass such modifications and variations. The embodiments are described using specific language (which should not be construed as limiting the scope of the appended claims). The figures are not drawn to scale and are for illustrative purposes only.

The primary technical problem to be solved by the invention is how to relax the linearity requirement of conventional in-memory computing.

Referring to FIG. 1, in the mainstream in-memory computing scheme, a voltage vector V is applied through digital-to-analog converters (DACs) on one side of a memristor array whose conductance values form a matrix G, and the matrix-vector multiplication I = GV is implemented by measuring the output current vector I on the orthogonal side. This scheme places a fundamental requirement on the memristor devices: they must exhibit a highly linear current-voltage relationship. However, existing memory technologies have only a small linear region, so meeting this requirement means limiting the input voltage to a small range, which in turn poses a significant challenge for the input circuit. With voltage inputs, holding the voltage at the millivolt level exposes the signal to circuit noise and circuit nonlinearity, which presents considerable difficulty.

To solve this problem, the invention provides a new in-memory computing scheme in which multiplication is realized through current distribution. The scheme is based on a new analog computing unit, comprising:

a tree-like network comprising a parent node and at least two child nodes, said parent node and at least one child node having a certain conduction capability therebetween;

The basic idea is that, unlike a voltage signal, a current corresponds to a definite quantity of charge and is therefore more robust to noise. Moreover, the current range involved in parallel computation with existing memories is convenient for circuit design, so the current can be controlled linearly and stably within the target range.

FIG. 2 shows a network that includes a parent node 210 and three child nodes 231, 232 and 233. The parent node is connected to the child nodes through adjustable resistors 221, 222 and 223, respectively. By configuring the nodes and the input current to suit the array conductance, the voltage drop across each device can be kept low over a wide range of currents, so that the device conductance stays close to constant. This configuration brings two further benefits. First, as the voltage decreases, the energy spent charging and discharging the wire capacitance is reduced. Second, the current through the array memory devices is kept low, which reduces Joule heating. Together, these two effects can greatly reduce the computing power consumption.

FIG. 3 further illustrates the principle of this computation. Consider the following configuration: the parent node 310 receives an input current $I_0$; the adjustable on-resistances 321, 322 and 323 have conductance values $G_1$, $G_2$ and $G_3$, respectively; and the potentials of the child nodes 331, 332 and 333 are each set to 0 relative to the zero potential of the parent node 310. Under this configuration, the currents at the current sampling stages corresponding to the child nodes are

$$I_1 = I_0\,\frac{G_1}{G_1+G_2+G_3},\qquad I_2 = I_0\,\frac{G_2}{G_1+G_2+G_3},\qquad I_3 = I_0\,\frac{G_3}{G_1+G_2+G_3}.$$

Thus child node 331 yields the product of $I_0$ and $G_1/(G_1+G_2+G_3)$, child node 332 yields the product of $I_0$ and $G_2/(G_1+G_2+G_3)$, and child node 333 yields the product of $I_0$ and $G_3/(G_1+G_2+G_3)$. As described above, conventional in-memory computing requires the storage device to have a conduction capability with resistive characteristics. By decoupling the resistive characteristic from the conduction capability, the structure provided by this application relaxes that requirement to merely having a certain conduction capability, which greatly improves the applicability of memory devices to in-memory computing architectures.
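The current division described above can be checked numerically with a minimal sketch. The function names and the numerical values are assumptions made for the example; the parent-node voltage follows from Kirchhoff's current law when all child nodes are held at zero potential and the branches are treated as ideal conductances.

```python
def divide_current(i0, conductances):
    """Split input current i0 (A) among branches in proportion to conductance (S)."""
    total = sum(conductances)
    if total <= 0:
        raise ValueError("at least one branch must conduct")
    return [i0 * g / total for g in conductances]


def parent_voltage(i0, conductances):
    """Voltage of the parent node above the zero-potential child nodes (V)."""
    return i0 / sum(conductances)


g = [1e-6, 2e-6, 3e-6]                  # G1, G2, G3 in siemens (illustrative)
i1, i2, i3 = divide_current(60e-9, g)   # approximately 10 nA, 20 nA, 30 nA
v = parent_voltage(60e-9, g)            # approximately 10 mV across every device
```

In this example the branch currents sum to the input current, and the voltage across each device stays at about 10 mV, which is consistent with the low-voltage operation described for FIG. 2.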

As shown in FIG. 4, in at least one embodiment, the current driving stage 400 may comprise at least one voltage digital-to-analog converter 401 and at least one current limiting resistor 402, wherein the conductance of the current limiting resistor is much smaller than the load conductance of the driven child nodes. In the structure of FIG. 4, $G_0 \ll G_{11}+G_{12}+G_{13}$. Because voltage digital-to-analog converters are more common than current digital-to-analog converters, using the former as the basic unit is in some cases easier to implement. Conventionally, general-purpose voltage-to-current conversion requires complex analog circuitry. By exploiting the particular load resistance of the tree network in this embodiment, the current driving stage 400 based on the current limiting resistor 402 can perform this function with less power consumption, area and parasitic capacitance, thereby improving the overall performance of the circuit.
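A short sketch shows why a small limiting conductance makes the voltage converter behave approximately as a current source. It assumes that G0 denotes the conductance of the current limiting resistor 402 (an interpretation of the inequality above), that the child nodes are held at zero potential so the tree presents a load equal to the sum of its branch conductances, and that all elements are ideal; the names and values are illustrative.

```python
def drive_current(v_dac, g_limit, g_children):
    """Current delivered to the parent node by a voltage DAC in series with g_limit.

    The limiting conductance and the tree load (sum of branch conductances)
    are in series, so the current is v_dac times their series combination.
    """
    g_load = sum(g_children)
    return v_dac * g_limit * g_load / (g_limit + g_load)


v_dac = 0.5                       # DAC output voltage in volts (illustrative)
g_limit = 1e-7                    # assumed G0, in siemens
g_children = [1e-5, 2e-5, 3e-5]   # assumed G11, G12, G13, in siemens

i0 = drive_current(v_dac, g_limit, g_children)
ideal = v_dac * g_limit           # current if the load were a perfect short
rel_error = (ideal - i0) / ideal  # equals g_limit / (g_limit + g_load)
```

With these values the delivered current differs from `v_dac * g_limit` by under 0.2 %, so the input current is set almost entirely by the DAC voltage and the limiting resistor, largely independent of the stored conductances.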

In linear algebra, the two basic vector operations are scalar multiplication of a vector, $\mathbf{c} = a\mathbf{b}$, and vector addition, $\mathbf{c} = \mathbf{a} + \mathbf{b}$. The above embodiment implements scalar multiplication of a vector, i.e. $\mathbf{I} = I_0\mathbf{G}$, where $\mathbf{I} = [I_1, I_2, I_3]$ and $\mathbf{G} = [G_1, G_2, G_3]$, with each $G_k$ normalized by $G_1+G_2+G_3$ as in the expressions for $I_1$, $I_2$ and $I_3$. Further, the embodiment shown in FIG. 5 provides a structure that combines scalar multiplication of vectors with addition between vectors, including:

a plurality of the analog computing units, of which two, 511 and 512, are shown in FIG. 5;

one back-end current sampling stage connected to at least two child nodes in different analog computing units;

specifically, the child nodes 520 and 521 are connected to the same current sampling stage 531, the child nodes 522 and 523 are connected to the same current sampling stage 532, and the child nodes 524 and 525 are connected to the same current sampling stage 533.

By extending the embodiment of FIG. 5, that is, by using a suitable number of analog computing units each containing a suitable number of child nodes, any linear combination of vectors can be implemented, thereby realizing general linear algebraic computation.

FIG. 6 further illustrates the principle of this computation. Consider the following configuration: parent node 601 receives an input current $I_{10}$ and parent node 602 receives an input current $I_{20}$; the adjustable on-resistances 611, 612, 613, 614, 615 and 616 have conductance values $G_{11}$, $G_{21}$, $G_{12}$, $G_{22}$, $G_{13}$ and $G_{23}$, respectively; and all child node potentials are set to the common zero potential of parent nodes 601 and 602. Under this configuration, the currents at the current sampling stages corresponding to the child nodes are

$$I_1 = I_{10}\,\frac{G_{11}}{G_{11}+G_{12}+G_{13}} + I_{20}\,\frac{G_{21}}{G_{21}+G_{22}+G_{23}},$$

$$I_2 = I_{10}\,\frac{G_{12}}{G_{11}+G_{12}+G_{13}} + I_{20}\,\frac{G_{22}}{G_{21}+G_{22}+G_{23}},$$

$$I_3 = I_{10}\,\frac{G_{13}}{G_{11}+G_{12}+G_{13}} + I_{20}\,\frac{G_{23}}{G_{21}+G_{22}+G_{23}}.$$

It can be seen that the above structure and method multiply a vector $X$ by a matrix $G$ to give a vector $Y$, i.e. $Y = GX$, where $X = [I_{10}, I_{20}]^T$, $G = [G_{11}, G_{12}, G_{13};\ G_{21}, G_{22}, G_{23}]^T$ with each entry $G_{ij}$ normalized by its row sum $G_{i1}+G_{i2}+G_{i3}$, and $Y = [I_1, I_2, I_3]^T$.
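The two-unit computation of FIG. 6 can be sketched as follows, assuming ideal conductances and child nodes held at zero potential. The function names and numerical values are illustrative only. Each computing unit divides its own input current among its branches, and each shared sampling stage adds the contributions arriving from the different units.

```python
def unit_outputs(i_in, g_row):
    """Currents leaving one computing unit, one per child node."""
    total = sum(g_row)
    return [i_in * g / total for g in g_row]


def array_outputs(inputs, g_rows):
    """Sum, per sampling stage, the branch currents from every computing unit."""
    out = [0.0] * len(g_rows[0])
    for i_in, row in zip(inputs, g_rows):
        for k, part in enumerate(unit_outputs(i_in, row)):
            out[k] += part
    return out


X = [4e-6, 6e-6]              # I10, I20 in amperes (illustrative)
G = [[1e-6, 1e-6, 2e-6],      # G11, G12, G13 in siemens (illustrative)
     [1e-6, 2e-6, 3e-6]]      # G21, G22, G23 in siemens (illustrative)
Y = array_outputs(X, G)       # I1, I2, I3: about 2 uA, 3 uA, 5 uA
```

In this example the output currents sum to the total input current, since every unit only redistributes the current it receives.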

In at least one embodiment, the memory may be a memristor. Analog adjustment of its conduction characteristics allows the computing function to be realized more efficiently.

In at least one embodiment, the memory may include at least one magnetic tunnel junction (MTJ). The magnetic tunnel junction stores information in a non-volatile manner, so the network does not need to be reconfigured after each power-down.

As shown in FIG. 7, in at least one embodiment, the memory may consist of a gating device and a magnetic tunnel junction. For example, memory 761 consists of a magnetic tunnel junction 733 and a gating device 743 connected in series. Optionally, the gating device may be controlled by a signal line; for example, gating device 741 is controlled by signal line 711. The conduction capability between a child node and the parent node is adjusted by writing different values into the memory: writing data 0 sets the magnetic tunnel junction to a high-resistance state, and writing data 1 sets it to a low-resistance state. The gating device may be a transistor, a transmission gate, or a diode. During computation, the gating devices of the memories taking part in the computation are set to the on state, and the input port potentials of the current analog-to-digital converters 751, 752 and 753 are set to zero. The current digital-to-analog converter then inputs a current signal, the current analog-to-digital converters sample the signals, and the computed value is obtained after some signal processing. Embodiments using phase change memory or resistive memory are similar.
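A behavioral sketch of the gated magnetic tunnel junction cells is given below. The resistance values `R_LOW` and `R_HIGH` are hypothetical, the on-resistance of the gating device is neglected, and a cell whose gating device is off is modeled as carrying no current; none of these simplifications are specified in the description.

```python
R_LOW = 5_000.0     # ohms, assumed low-resistance state (data 1)
R_HIGH = 10_000.0   # ohms, assumed high-resistance state (data 0)


def cell_conductance(bit, gate_on):
    """Conductance of one gated cell; zero when its gating device is off."""
    if not gate_on:
        return 0.0
    return 1.0 / (R_LOW if bit else R_HIGH)


def branch_currents(i0, bits, gates):
    """Divide input current i0 among the cells that take part in the computation."""
    g = [cell_conductance(b, on) for b, on in zip(bits, gates)]
    total = sum(g)
    if total == 0.0:
        raise ValueError("no cell is gated on")
    return [i0 * x / total for x in g]


all_on = branch_currents(5e-6, bits=[1, 0, 1], gates=[True, True, True])
two_on = branch_currents(5e-6, bits=[1, 0, 1], gates=[True, True, False])
```

With all three gating devices on, the cells storing 1 each carry twice the current of the cell storing 0; switching one gating device off removes that cell from the division and redistributes the input current among the remaining cells.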

As shown in FIG. 8, in at least one embodiment, the storage structure may be a ferroelectric memory, a floating gate memory, or a charge trapping memory, in which the parent node is connected to interconnect 802 and the child node is connected to interconnect 803. Information storage and the corresponding adjustment of conduction capability may be achieved by an isolated charge storage structure 811.

As shown in FIG. 9, in at least one embodiment, the storage structure may be a static random access memory, in which the parent node is connected to interconnect 902 and the child node is connected to interconnect 904. Information storage and the corresponding adjustment of conduction capability may be achieved by a latch 911 similar to that of a static random access memory.

As shown in FIG. 10, in at least one embodiment, the storage structure may be a dynamic random access memory, in which the parent node is connected to interconnect 1002 and the child node is connected to interconnect 1004. Information storage and the corresponding adjustment of conduction capability may be achieved by an isolatable capacitor structure 1011.

In this specification, the embodiments are described progressively: identical or similar parts may be understood by cross-reference between embodiments, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.

The foregoing is merely a preferred embodiment of the present invention; although the invention has been disclosed above by way of the preferred embodiment, it is not limited thereto. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent variation or adaptation of the above embodiments made according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims

1. A computing unit, comprising:

2. The computing unit of claim 1, characterized in that

the current driving stage comprises a voltage digital-to-analog converter and a current limiting resistor connected in series.

3. The computing unit of claim 1, characterized in that

the conduction capability of the network is adjustable.

4. The computing unit of claim 3, characterized in that

the computing unit comprises a storage structure having at least two ports, a first port and a second port, that satisfy the following condition: when fixed, mutually different inputs are applied to the first port and the second port, the current between the two ports differs according to the data stored;

each child node is connected to the first port or the second port of at least one storage structure, and the port of that storage structure which is not connected to the child node is connected to the parent node; and the conduction capability of the network is adjusted by writing data into the storage structure.

5. The computing unit of claim 4, characterized in that

the storage structure is a memristor, and the conduction capability of the network is adjusted by adjusting the state of the memristor.

6. The computing unit of claim 4, characterized in that

one or more of a magnetoresistive memory, a resistive memory, a phase change memory, a ferroelectric memory, a static random access memory, a dynamic random access memory, a floating gate memory and a charge trapping memory are arranged between the parent node and the child node to adjust the conduction capability of the network.

7. A computing method of a computing unit according to any one of claims 1-6, comprising:

step two, applying current to the current driving stage;

step three, current sampling is carried out in the current sampling stage;

8. An array of computational cells for performing parallel computation, characterized in that,

comprising a plurality of computing units according to any of claims 1-6;

9. An array of computational cells for performing parallel computation, characterized in that,

comprising a plurality of computing units according to any of claims 1-6;

10. An array of computational cells for performing parallel computation, characterized in that,

comprising a plurality of computing units according to any of claims 1-6;

11. A computing method of the computing unit array according to any one of claims 8 to 10, comprising:

step two, applying current to the current driving stage;

step three, current sampling is carried out in the current sampling stage;