CN116504281A - Computing unit, array and computing method - Google Patents

Info

Publication number
CN116504281A
CN116504281A CN202210054491.2A CN202210054491A CN116504281A CN 116504281 A CN116504281 A CN 116504281A CN 202210054491 A CN202210054491 A CN 202210054491A CN 116504281 A CN116504281 A CN 116504281A
Authority
CN
China
Prior art keywords
current
computing unit
computing
network
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210054491.2A
Other languages
Chinese (zh)
Inventor
赵先成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lide Instrument Co ltd
Original Assignee
Zhejiang Lide Instrument Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lide Instrument Co ltd
Priority to CN202210054491.2A
Publication of CN116504281A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C5/00: Details of stores covered by group G11C11/00
    • G11C5/06: Arrangements for interconnecting storage elements electrically, e.g. by wiring
    • G11C5/063: Voltage and signal distribution in integrated semi-conductor memory access lines, e.g. word-line, bit-line, cross-over resistance, propagation delay
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C5/00: Details of stores covered by group G11C11/00
    • G11C5/14: Power supply arrangements, e.g. power down, chip selection or deselection, layout of wirings or power grids, or multiple supply levels
    • G11C5/147: Voltage reference generators, voltage or current regulators; Internally lowered supply levels; Compensation for voltage drops
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C7/00: Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/16: Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a computing unit, a computing unit array, and a computing method that uses the computing unit for highly parallel computation. It mainly addresses two problems in the prior art: low accuracy caused by nonlinearity in the current-voltage relationship of the devices, and high power consumption during parallel computation. The computing unit includes: a tree-like network comprising a parent node and at least two child nodes, with conduction capability between the parent node and at least one child node; at least one front-end current drive stage connected to the parent node of the network; and a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence. By using current distribution, the invention decouples the multiplication operation in linear computation at the device level, and can be applied to large-scale analog computing systems with integrated memory and computation.

Description

Computing unit, array and computing method
Technical Field
The present invention relates to the field of integrated circuits, and more particularly to architecture and operation of integrated circuit computing chips.
Background
With the development of the Internet of Things and cloud computing, computing tasks grow more complex by the day, placing greater pressure on computing chips. The main performance metrics of a computing chip are power consumption and computing power. Power consumption is critical both in energy-constrained scenarios such as the Internet of Things and in energy-intensive scenarios such as cloud computing. How to further reduce chip power consumption while delivering a given computing power is therefore an important research direction for computing chips.
The power consumption of a computing chip comes mainly from two kinds of operation: data reads and writes, and data computation. Certain workloads, such as the inference and training of deep neural networks, involve a large number of data read and write operations. The target memory of these operations is either on-chip or off-chip. On-chip storage widely uses static random access memory (SRAM), whose density is low, so its capacity tends to be small. When a large number of reads and writes occur, off-chip storage must be accessed. In data read and write operations, charging and discharging the parasitic capacitance around the interconnect lines is the main source of energy consumption. Because off-chip storage is accessed through larger-pitch interconnect lines and pins, it consumes far more power than on-chip storage. Deploying as much data as possible in on-chip storage, and thereby reducing accesses to off-chip storage, is therefore an important strategy for optimizing the energy efficiency of computing chips.
The in-memory computing architecture, which integrates memory and computation, is one specific implementation of this strategy. By constructing high-density computing units based on memristors, this architecture can partially fix the computation data inside the computing unit, thereby reducing data access operations. Because it computes in the analog domain, it also tolerates some imprecision in the stored data, so larger-scale on-chip storage can be realized. A typical in-memory computation inputs a voltage vector V through digital-to-analog converters (DACs) on one side of a memristor array whose conductance values form a matrix G, and measures the output current vector I on the other, orthogonal side, thereby implementing the matrix-vector multiplication I = GV. In neural network inference, G corresponds to the network weights, which are updated rarely, and V corresponds to the feature map of the data, which is updated often.
This typical scheme places a fundamental requirement on the memristor devices: they must exhibit a highly linear current-voltage relationship. Existing memory technologies have only a small linear region, so to meet this requirement the input voltage must be confined to a small range, which in turn poses a significant challenge for the input circuit. Second, in fully parallel operation every lateral interconnect line of the array carries an input signal, so the parasitic capacitance of each line must be charged to the input voltage. This process consumes a large amount of energy.
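As a point of reference, the conventional voltage-input scheme can be sketched in a few lines of Python. This is an idealized model that assumes perfectly linear devices; the function name `crossbar_mvm` and all numeric values are illustrative and do not come from the patent.

```python
def crossbar_mvm(G, V):
    """Ideal crossbar read-out: I[j] = sum over i of G[j][i] * V[i].

    G is a list of rows of conductances in siemens, V a list of input
    voltages in volts. Linear (ohmic) devices are assumed.
    """
    return [sum(g * v for g, v in zip(row, V)) for row in G]


# Illustrative values only: a 3x2 conductance matrix and two input voltages.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2]

I = crossbar_mvm(G, V)
print(I)  # output currents in amperes
```

In this model every input line is driven to its voltage at the same time, which is where the linearity requirement and the capacitance-charging cost discussed above arise.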
Disclosure of Invention
The invention aims to solve the above technical problems and provides a computing unit, an array, and a computing method.
The object of the invention is achieved by the following technical solutions:
In a first aspect, there is provided a computing unit comprising:
a tree-like network comprising a parent node and at least two child nodes, with conduction capability between the parent node and at least one child node;
at least one front-end current drive stage connected to the parent node of the network;
a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence.
Further, the current drive stage comprises a voltage digital-to-analog converter and a current limiting resistor connected in series.
Further, the conduction capability of the network can be adjusted in ways including, but not limited to, the following:
Mode one: the computing unit comprises a storage structure having at least two ports, port one and port two, that satisfy the following condition: when different data are stored and the same fixed inputs are applied to port one and port two, the current between the two ports differs;
each child node is connected to port one or port two of at least one storage structure, and the port of that storage structure that is not connected to the child node is connected to the parent node; the conduction capability of the network is adjusted by writing data into the storage structure;
in this mode, the storage structure may be a memristor, and the conduction capability of the network is adjusted by adjusting the state of the memristor.
Mode two: one or more of a magnetoresistive memory, a resistive memory, a phase change memory, a ferroelectric memory, a static random access memory, a dynamic random access memory, a floating gate memory, and a charge trapping memory are arranged between the parent node and the child node to adjust the conduction capability of the network.
In a second aspect, there is provided a computing method for the computing unit according to the first aspect, comprising:
step one, setting the potential of the input end of the current sampling stage to be the same as the zero potential of the current drive stage;
step two, applying a current with the current drive stage;
step three, sampling the current in the current sampling stage;
wherein the steps satisfy the following order: step two precedes step three, and step one precedes step three.
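The ordering constraint on the three steps can be sketched as a small Python model. The class and method names are hypothetical, and the read-out assumes ideal current division among linear branches, which is a simplification rather than a statement of the patented circuit.

```python
class ComputingUnit:
    """Toy model of one tree network: one parent node, several child nodes."""

    def __init__(self, conductances):
        self.conductances = list(conductances)  # parent-to-child branches, in siemens
        self.inputs_at_zero = False             # set by step one
        self.drive_current = None               # set by step two

    def set_sampling_inputs_to_zero(self):
        """Step one: hold every sampling-stage input at the drive stage's zero potential."""
        self.inputs_at_zero = True

    def apply_current(self, i0):
        """Step two: inject current i0 (amperes) at the parent node."""
        self.drive_current = i0

    def sample(self):
        """Step three: read the child-node currents; requires steps one and two."""
        if not self.inputs_at_zero or self.drive_current is None:
            raise RuntimeError("steps one and two must precede step three")
        total = sum(self.conductances)
        return [self.drive_current * g / total for g in self.conductances]


# Steps one and two may come in either order, as long as both precede step three.
unit_a = ComputingUnit([1e-6, 3e-6])
unit_a.set_sampling_inputs_to_zero()
unit_a.apply_current(4e-6)
result_a = unit_a.sample()

unit_b = ComputingUnit([1e-6, 3e-6])
unit_b.apply_current(4e-6)
unit_b.set_sampling_inputs_to_zero()
result_b = unit_b.sample()

print(result_a, result_b)
```

The model only enforces what the method states: no relative order is imposed between step one and step two.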
In a third aspect, there is provided a computing unit array for implementing parallel computation, in one of the following modes:
Mode one: the array comprises a plurality of computing units according to the first aspect;
one back-end current sampling stage is connected to child nodes in at least two different computing units.
Mode two: the array comprises a plurality of computing units according to the first aspect;
one front-end current drive stage is connected to parent nodes in at least two different computing units.
Mode three: the array comprises a plurality of computing units according to the first aspect;
one back-end current sampling stage is connected to child nodes in at least two different computing units;
one front-end current drive stage is connected to parent nodes in at least two different computing units.
In a fourth aspect, there is provided a computing method for the computing unit array according to the third aspect, comprising:
step one, setting the potential of the input end of the current sampling stage to be the same as the zero potential of the current drive stage;
step two, applying a current with the current drive stage;
step three, sampling the current in the current sampling stage;
wherein the steps satisfy the following order: step two precedes step three, and step one precedes step three.
The beneficial effects of the invention are as follows: the invention provides a new computing principle. Based on this principle, a current input in a suitable range lets the devices operate at low voltage, and the current remains easy to control during array computation. This addresses both the nonlinearity in the current-voltage relationship of memristor devices and the high power consumption of array-scale parallel operation.
Those skilled in the art will recognize additional features and advantages upon reading the following detailed description, and upon viewing the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates the structure and computing method of conventional in-memory computing;
FIG. 2 illustrates an implementation of an analog computing unit and its corresponding peripheral circuitry;
FIG. 3 shows the basic principle by which an analog computing unit implements computation;
FIG. 4 illustrates an implementation of a current drive stage;
FIG. 5 illustrates an implementation of an analog computing unit and its corresponding peripheral circuitry;
FIG. 6 shows the basic principle by which an analog computing unit implements computation;
FIG. 7 illustrates an implementation of an analog computing unit including a magnetic tunnel junction and its corresponding peripheral circuitry;
FIG. 8 illustrates a charge storage based structure;
FIG. 9 shows a static random access memory based architecture;
FIG. 10 shows a dynamic random access memory based architecture.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. For example, features illustrated or described with respect to one embodiment can be used on or in conjunction with other embodiments to yield yet a further embodiment. It is intended that the present invention encompass such modifications and variations. The embodiments are described using specific language (which should not be construed as limiting the scope of the appended claims). The figures are not drawn to scale and are for illustrative purposes only.
The primary technical problem to be solved by the invention is how to relax the linearity requirement of the conventional in-memory computing mode.
Referring to FIG. 1, the mainstream in-memory computing mode inputs a voltage vector V through digital-to-analog converters (DACs) on one side of a memristor array whose conductance values form a matrix G; measuring the output current vector I on the other, orthogonal side implements the matrix-vector multiplication I = GV. This scheme places a fundamental requirement on the memristor devices: they must exhibit a highly linear current-voltage relationship. However, existing memory technologies have only a small linear region, so to meet this requirement the input voltage must be confined to a small range, which in turn poses a significant challenge for the input circuit. For voltage inputs, holding the voltage at the millivolt level is hampered by circuit noise and circuit nonlinearity, which presents considerable difficulty.
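The effect of a small linear region can be illustrated with a toy device model. The sinh-shaped current-voltage law, the function names, and the parameter values below are assumptions chosen only for illustration; the patent does not specify a device model.

```python
import math


def device_current(v, g_small_signal=1e-6, v_scale=0.25):
    """Hypothetical nonlinear device: I = g * v_scale * sinh(v / v_scale).

    For v much smaller than v_scale this reduces to the linear law I = g * v.
    """
    return g_small_signal * v_scale * math.sinh(v / v_scale)


def relative_error_vs_linear(v, g_small_signal=1e-6, v_scale=0.25):
    """Relative deviation of the device current from the ideal linear value g * v."""
    ideal = g_small_signal * v
    return (device_current(v, g_small_signal, v_scale) - ideal) / ideal


for v in (0.01, 0.05, 0.25, 0.5):
    print(f"V = {v:.2f} V, deviation from linear = {relative_error_vs_linear(v):.2%}")
```

Under this assumed model the deviation grows quickly with the applied voltage, which is why the conventional scheme must keep its input voltages small.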
To solve this problem, the invention provides a new in-memory computing mode in which multiplication is realized through current distribution. This mode is based on a new analog computing unit, comprising:
a tree-like network comprising a parent node and at least two child nodes, with a certain conduction capability between the parent node and at least one child node;
at least one front-end current drive stage connected to the parent node of the network;
a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence.
The basic idea is that, unlike a voltage signal, a current corresponds to a definite amount of charge and is therefore more robust to noise. Meanwhile, the current range required for parallel computation with existing memories is friendly to circuit design, so the current can be regulated linearly and stably within the target range.
FIG. 2 shows a network that includes a parent node 210 and three child nodes 231, 232, and 233. The parent node is connected to the child nodes through adjustable resistors 221, 222, and 223, respectively. By configuring the nodes and the current input to suit the array conductance, the voltage drop across the devices can be kept low over a wide range of currents, ensuring that their conductance stays close to constant. This configuration brings two additional benefits. First, as the voltage decreases, the energy spent charging and discharging the wire capacitance is reduced. Second, the current through the memory devices in the array is kept low, reducing Joule heating. Together, these two points can greatly reduce the computation power consumption.
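The two benefits can be made concrete with a rough estimate. The relations below are a sketch under simplifying assumptions that are not stated in the patent: the branches behave as linear conductances $G_k$, the child nodes sit at zero potential, and the wiring is lumped into a single capacitance $C$.

```latex
\begin{align}
U &= \frac{I_0}{\sum_k G_k}
  && \text{(parent-node voltage for an injected current } I_0\text{)} \\
E_{\mathrm{wire}} &\propto C\,U^2
  && \text{(energy to charge and discharge the lumped wire capacitance)} \\
P_{\mathrm{Joule}} &= U I_0 = \frac{I_0^2}{\sum_k G_k}
  && \text{(heat dissipated in the branches)}
\end{align}
```

Under these assumptions, choosing the input current so that $U$ stays small reduces the capacitive energy quadratically and the Joule heating along with it.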
FIG. 3 further illustrates the principle of this computation. Consider the following configuration: parent node 310 receives an input current $I_0$; the adjustable on-resistances 321, 322, and 323 have conductance values $G_1$, $G_2$, and $G_3$, respectively; and child nodes 331, 332, and 333 are each held at the zero potential defined for parent node 310. Under this configuration, the currents at the current sampling stages corresponding to the child nodes are $I_1 = I_0 G_1/(G_1+G_2+G_3)$, $I_2 = I_0 G_2/(G_1+G_2+G_3)$, and $I_3 = I_0 G_3/(G_1+G_2+G_3)$. Child node 331 thus yields the product of $I_0$ and $G_1/(G_1+G_2+G_3)$, child node 332 the product of $I_0$ and $G_2/(G_1+G_2+G_3)$, and child node 333 the product of $I_0$ and $G_3/(G_1+G_2+G_3)$. As described above, conventional in-memory computing requires the memory device to provide conduction capability with a resistive (linear) characteristic. By decoupling the resistive characteristic from the conduction capability, the structure proposed in this application relaxes that requirement to merely having a certain conduction capability, which greatly broadens the range of memory devices suitable for in-memory computing architectures.
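A short numeric example of this current division follows. It is a minimal sketch that treats the three branches as ideal conductances to zero potential; the function name and the values are illustrative only.

```python
def distribute_current(i0, conductances):
    """Split current i0 among parallel branches held at zero potential.

    Returns the parent-node voltage and the list of branch currents.
    """
    total = sum(conductances)
    parent_voltage = i0 / total                      # U = I0 / (G1 + G2 + G3)
    branch_currents = [g * parent_voltage for g in conductances]
    return parent_voltage, branch_currents


i0 = 8e-6                      # input current in amperes (illustrative)
g = [100e-6, 200e-6, 500e-6]   # G1, G2, G3 in siemens (illustrative)

u, currents = distribute_current(i0, g)
print(f"parent voltage: {u:.4f} V")
print("branch currents:", currents)
```

With these values the branch currents are in the ratio 1:2:5 and add up to the input current, while the parent node sits only about 10 mV above the child nodes.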
As shown in FIG. 4, in at least one embodiment, the current drive stage 400 may comprise at least one voltage digital-to-analog converter 401 and at least one current limiting resistor 402, where the conductance of the current limiting resistor is much smaller than that of the driven child-node load. In the structure of FIG. 4, $G_0 \ll G_{11}+G_{12}+G_{13}$. Because voltage digital-to-analog converters are more common than current digital-to-analog converters, using the former as the basic unit is in some cases easier to implement. In conventional implementations, general-purpose voltage-to-current conversion requires complex analog circuitry. By exploiting the particular properties of the tree network's load resistance in this embodiment, the current drive stage 400 based on the current limiting resistor 402 can achieve this function with less power, area, and parasitic capacitance, improving the overall performance of the circuit.
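Why the condition on the current limiting resistor matters can be checked numerically. The sketch below models the drive stage as an ideal voltage source in series with a conductance $G_0$ feeding the parallel branch load; the function name and all values are assumptions made for illustration.

```python
def drive_stage_current(v_dac, g_limit, load_conductances):
    """Current delivered by a voltage DAC in series with a current limiting resistor.

    The load is the parallel combination of the branches to zero potential.
    """
    g_load = sum(load_conductances)
    g_series = g_limit * g_load / (g_limit + g_load)  # series combination
    return v_dac * g_series


v_dac = 1.0        # DAC output voltage in volts (illustrative)
g_limit = 1e-6     # G0 in siemens, much smaller than the load (illustrative)
load_a = [100e-6, 200e-6, 500e-6]
load_b = [500e-6, 500e-6, 1000e-6]

ideal = v_dac * g_limit  # current of an ideal source set only by the DAC and G0
errors = []
for load in (load_a, load_b):
    actual = drive_stage_current(v_dac, g_limit, load)
    errors.append((ideal - actual) / ideal)
    print(f"load sum = {sum(load):.1e} S, I0 = {actual:.6e} A, error = {errors[-1]:.3%}")
```

In this model the relative error equals $G_0/(G_0 + \sum_k G_k)$, so the stage behaves like a current source whose output barely depends on the stored conductances as long as $G_0$ is much smaller than the load.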
In linear algebra, the two basic vector operations are scalar multiplication of a vector, $\mathbf{c} = a\mathbf{b}$, and vector addition, $\mathbf{c} = \mathbf{a} + \mathbf{b}$. The above embodiment implements scalar multiplication of a vector, i.e. $\mathbf{I} = I_0\mathbf{G}$, where $\mathbf{I} = [I_1, I_2, I_3]$ and $\mathbf{G} = [G_1, G_2, G_3]$, with the conductances understood as normalized by $G_1+G_2+G_3$. Further, the embodiment shown in FIG. 5 provides a structure that combines scalar multiplication of vectors with vector addition, comprising:
a plurality of the analog computing units, of which two, 511 and 512, are shown in FIG. 5;
wherein one back-end current sampling stage is connected to child nodes in at least two different analog computing units;
specifically, child nodes 520 and 521 are connected to the same current sampling stage 531, child nodes 522 and 523 are connected to the same current sampling stage 532, and child nodes 524 and 525 are connected to the same current sampling stage 533.
By further extending the embodiment of FIG. 5, that is, by using a suitable number of analog computing units each containing a suitable number of child nodes, any linear combination of vectors can be implemented, thereby realizing general linear algebraic computation.
FIG. 6 further illustrates the principle of this computation. Consider the following configuration: parent node 601 receives input current $I_{10}$ and parent node 602 receives input current $I_{20}$; the adjustable on-resistances 611, 612, 613, 614, 615, and 616 have conductance values $G_{11}$, $G_{21}$, $G_{12}$, $G_{22}$, $G_{13}$, and $G_{23}$, respectively; and all child node potentials are set to the common zero potential of parent nodes 601 and 602. Under this configuration, the currents at the current sampling stages corresponding to the child nodes are:
$I_1 = I_{10} G_{11}/(G_{11}+G_{12}+G_{13}) + I_{20} G_{21}/(G_{21}+G_{22}+G_{23})$,
$I_2 = I_{10} G_{12}/(G_{11}+G_{12}+G_{13}) + I_{20} G_{22}/(G_{21}+G_{22}+G_{23})$,
$I_3 = I_{10} G_{13}/(G_{11}+G_{12}+G_{13}) + I_{20} G_{23}/(G_{21}+G_{22}+G_{23})$.
It can be seen that the above computing structure and method multiply the vector X by the matrix G to give the vector Y, i.e. $Y = GX$, where $X = [I_{10}, I_{20}]^T$, $G = [G_{11}, G_{12}, G_{13}; G_{21}, G_{22}, G_{23}]^T$ with each $G_{jk}$ normalized by $G_{j1}+G_{j2}+G_{j3}$, and $Y = [I_1, I_2, I_3]^T$.
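The array computation can be sketched as follows: each unit divides its own input current among its branches, and each shared sampling stage adds the contributions it receives. The function name and values are illustrative, and ideal linear branches at zero potential are assumed.

```python
def array_output_currents(input_currents, conductance_rows):
    """Currents at the shared sampling stages of a computing unit array.

    input_currents[j] is the current injected at parent node j, and
    conductance_rows[j][k] is the conductance from parent j to the child
    node wired to sampling stage k.
    """
    n_outputs = len(conductance_rows[0])
    outputs = [0.0] * n_outputs
    for i_in, row in zip(input_currents, conductance_rows):
        total = sum(row)  # normalizing sum for this unit
        for k, g in enumerate(row):
            outputs[k] += i_in * g / total
    return outputs


# Two units feeding three sampling stages (illustrative values).
X = [4e-6, 5e-6]                 # I10, I20 in amperes
G = [[1e-6, 1e-6, 2e-6],         # G11, G12, G13 in siemens
     [3e-6, 1e-6, 1e-6]]         # G21, G22, G23 in siemens

Y = array_output_currents(X, G)
print(Y)
```

Because each unit only redistributes its input, the output currents always sum to the total injected current, which gives a simple consistency check on any sampled result.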
In at least one embodiment, the memory may be a memristor. Analog adjustment of its conduction characteristics allows the computing function to be realized more efficiently.
In at least one embodiment, the memory may include at least one magnetic tunnel junction (MTJ). The magnetic tunnel junction stores information in a non-volatile manner, so the network does not need to be reconfigured after each power-down.
As shown in FIG. 7, in at least one embodiment, the memory may consist of a gating device and a magnetic tunnel junction. For example, memory 761 consists of a magnetic tunnel junction 733 and a gating device 743 connected in series. Optionally, the gating device may be controlled by a signal line; for example, gating device 741 is controlled by signal line 711. The conduction capability between a child node and the parent node is adjusted by writing different values into the memory. Specifically, writing data 0 sets the magnetic tunnel junction to a high-resistance state, and writing data 1 sets it to a low-resistance state. The gating device may be a transistor, a transmission gate, or a diode. During computation, the gating devices in the memories involved in the computation are set to the on state, and the input port potentials of the current analog-to-digital converters 751, 752, and 753 are set to zero. The current digital-to-analog converters input current signals, the current analog-to-digital converters sample the resulting signals, and the computed value is obtained after some signal processing. Embodiments using phase change memory or resistive memory are similar.
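The binary case can be sketched in Python. The two conductance values, the helper names, and the treatment of a switched-off gating device as zero conductance are all assumptions for illustration; real devices have finite off-state leakage and device-specific resistances.

```python
G_LOW_RESISTANCE = 200e-6   # hypothetical conductance when data 1 is stored, in siemens
G_HIGH_RESISTANCE = 100e-6  # hypothetical conductance when data 0 is stored, in siemens


def branch_conductance(bit, gate_on=True):
    """Conductance of one gating-device-plus-MTJ branch for a stored bit."""
    if not gate_on:
        return 0.0  # idealized: a switched-off gating device blocks the branch
    return G_LOW_RESISTANCE if bit == 1 else G_HIGH_RESISTANCE


def sampled_currents(i0, bits, gates=None):
    """Currents seen by the sampling stages for input current i0 and stored bits."""
    if gates is None:
        gates = [True] * len(bits)
    g = [branch_conductance(b, on) for b, on in zip(bits, gates)]
    total = sum(g)
    if total == 0.0:
        raise ValueError("at least one branch must conduct")
    return [i0 * gk / total for gk in g]


currents = sampled_currents(5e-6, [1, 0, 1])
gated = sampled_currents(5e-6, [1, 0, 1], gates=[True, True, False])
print(currents)
print(gated)
```

Switching a gating device off removes its branch from the division, so the same input current is shared among the remaining branches only.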
As shown in FIG. 8, in at least one embodiment, the storage structure may be a ferroelectric memory, a floating gate memory, or a charge trapping memory. The parent node is connected to interconnect 802 and the child node to interconnect 803. Information storage and the corresponding adjustment of the conduction capability may be achieved by an isolated charge storage structure 811.
As shown in FIG. 9, in at least one embodiment, the storage structure may be a static random access memory. The parent node is connected to interconnect 902 and the child node to interconnect 904. Information storage and the corresponding adjustment of the conduction capability may be achieved by a latch 911 similar to the mechanism of a static random access memory.
As shown in FIG. 10, in at least one embodiment, the storage structure may be a dynamic random access memory. The parent node is connected to interconnect 1002 and the child node to interconnect 1004. Information storage and the corresponding adjustment of the conduction capability may be achieved by an isolatable capacitor structure 1011.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The foregoing is merely a preferred embodiment of the present invention, and the present invention has been disclosed in the above description of the preferred embodiment, but is not limited thereto. Any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments using the methods and technical contents disclosed above, without departing from the scope of the technical solution of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (11)

1. A computing unit, comprising:
a tree-like network comprising a parent node and at least two child nodes, with conduction capability between the parent node and at least one child node;
at least one front-end current drive stage connected to the parent node of the network;
a plurality of back-end current sampling stages connected to the child nodes of the network in one-to-one correspondence.
2. The computing unit of claim 1, characterized in that
the current drive stage comprises a voltage digital-to-analog converter and a current limiting resistor connected in series.
3. The computing unit of claim 1, characterized in that
the conduction capability of the network is adjustable.
4. The computing unit of claim 3, characterized in that
the computing unit comprises a storage structure having at least two ports, port one and port two, that satisfy the following condition: when different data are stored and the same fixed inputs are applied to port one and port two, the current between the two ports differs;
each child node is connected to port one or port two of at least one storage structure, and the port of that storage structure that is not connected to the child node is connected to the parent node; the conduction capability of the network is adjusted by writing data into the storage structure.
5. The computing unit of claim 4, characterized in that
the storage structure is a memristor, and the conduction capability of the network is adjusted by adjusting the state of the memristor.
6. The computing unit of claim 4, characterized in that
one or more of a magnetoresistive memory, a resistive memory, a phase change memory, a ferroelectric memory, a static random access memory, a dynamic random access memory, a floating gate memory, and a charge trapping memory are arranged between the parent node and the child node to adjust the conduction capability of the network.
7. A computing method for the computing unit according to any one of claims 1-6, comprising:
step one, setting the potential of the input end of the current sampling stage to be the same as the zero potential of the current drive stage;
step two, applying a current with the current drive stage;
step three, sampling the current in the current sampling stage;
wherein the steps satisfy the following order: step two precedes step three, and step one precedes step three.
8. A computing unit array for implementing parallel computation, characterized in that
the array comprises a plurality of computing units according to any one of claims 1-6;
one back-end current sampling stage is connected to child nodes in at least two different computing units.
9. A computing unit array for implementing parallel computation, characterized in that
the array comprises a plurality of computing units according to any one of claims 1-6;
one front-end current drive stage is connected to parent nodes in at least two different computing units.
10. A computing unit array for implementing parallel computation, characterized in that
the array comprises a plurality of computing units according to any one of claims 1-6;
one back-end current sampling stage is connected to child nodes in at least two different computing units;
one front-end current drive stage is connected to parent nodes in at least two different computing units.
11. A computing method for the computing unit array according to any one of claims 8 to 10, comprising:
step one, setting the potential of the input end of the current sampling stage to be the same as the zero potential of the current drive stage;
step two, applying a current with the current drive stage;
step three, sampling the current in the current sampling stage;
wherein the steps satisfy the following order: step two precedes step three, and step one precedes step three.
CN202210054491.2A 2022-01-18 2022-01-18 Computing unit, array and computing method Pending CN116504281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210054491.2A CN116504281A (en) 2022-01-18 2022-01-18 Computing unit, array and computing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210054491.2A CN116504281A (en) 2022-01-18 2022-01-18 Computing unit, array and computing method

Publications (1)

Publication Number Publication Date
CN116504281A true CN116504281A (en) 2023-07-28

Family

ID=87317110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210054491.2A Pending CN116504281A (en) 2022-01-18 2022-01-18 Computing unit, array and computing method

Country Status (1)

Country Link
CN (1) CN116504281A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011166360A (en) * 2010-02-08 2011-08-25 Nec Corp Multicast-tree calculation device, calculation method, and network system
KR20140029644A (en) * 2012-08-29 2014-03-11 삼성에스디에스 주식회사 Distributed computing system and recovery method thereof
US20190043560A1 (en) * 2018-09-28 2019-02-07 Intel Corporation In-memory multiply and accumulate with global charge-sharing
CN109669772A (en) * 2018-12-28 2019-04-23 第四范式(北京)技术有限公司 Calculate the parallel execution method and apparatus of figure
CN110196709A (en) * 2019-06-04 2019-09-03 浙江大学 A kind of non-volatile 8 booth multipliers based on RRAM
US20200211649A1 (en) * 2018-12-31 2020-07-02 Samsung Electronics Co., Ltd. Nonvolatile memory device and method of processing in memory (pim) using the same
CN111460365A (en) * 2020-03-10 2020-07-28 华中科技大学 Equation set solver based on memristive linear neural network and operation method thereof
US20200242461A1 (en) * 2019-01-29 2020-07-30 Silicon Storage Technology, Inc. Algorithms and circuitry for verifying a value stored during a programming operation of a non-volatile memory cell in an analog neural memory in deep learning artificial neural network
WO2021163866A1 (en) * 2020-02-18 2021-08-26 杭州知存智能科技有限公司 Neural network weight matrix adjustment method, writing control method, and related device
CN113517009A (en) * 2021-06-10 2021-10-19 上海新氦类脑智能科技有限公司 Storage and calculation integrated intelligent chip, control method and controller

Similar Documents

Publication Publication Date Title
US10297315B2 (en) Resistive memory accelerator
Cai et al. Proposal of analog in-memory computing with magnified tunnel magnetoresistance ratio and universal STT-MRAM cell
US20200279012A1 (en) Resistive Memory Device For Matrix-Vector Multiplications
CN110390074B (en) Computing system of resistance type memory
CN110007895B (en) Analog multiplication circuit, analog multiplication method and application thereof
US10261977B2 (en) Resistive memory accelerator
WO2024104427A1 (en) All-analog vector matrix multiplication processing-in-memory circuit and operation method thereof, computer device, and computer-readable storage medium
Ma et al. In-memory computing: The next-generation ai computing paradigm
Lepri et al. In-memory computing for machine learning and deep learning
CN116504281A (en) Computing unit, array and computing method
CN113553028B (en) Problem solving and optimizing method and system based on probability bit circuit
CN114093394B (en) Rotatable internal computing circuit and implementation method thereof
Zang et al. 282-to-607 TOPS/W, 7T-SRAM based CiM with reconfigurable column SAR ADC for neural network processing
CN115954029A (en) Multi-bit operation module and in-memory calculation circuit structure using the same
Gou et al. 2T1C DRAM based on semiconducting MoS2 and semimetallic graphene for in-memory computing
CN114898792A (en) Multi-bit memory inner product and exclusive-or unit, exclusive-or vector and operation method
CN114627937A (en) Memory computing circuit and method based on nonvolatile memory device
CN115312090A (en) Memory computing circuit and method
Luo et al. SpinCIM: Spin orbit torque memory for ternary neural networks based on the computing-in-memory architecture
Gao et al. Current research status and future prospect of the in-memory computing
Wang et al. Sparsity-aware clamping readout scheme for high parallelism and low power nonvolatile computing-in-memory based on resistive memory
Numan Integrated Circuit Blocks for In-Memory Computing
US20240153552A1 (en) Memory array for compute-in-memory and the operating method thereof
CN116504285A (en) Computing system and computing method
CN115906968B (en) Dual signed operand nonvolatile memory integrated unit, array and operation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination