US20210064379A1 - Refactoring MAC Computations for Reduced Programming Steps - Google Patents
- Publication number
- US20210064379A1 (application Ser. No. 16/556,101)
- Authority
- US
- United States
- Prior art keywords
- array
- crossbar
- volatile memory
- architecture
- elements
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/487—Multiplying; Dividing
- G06F7/4876—Multiplying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
- G06F9/30014—Arithmetic instructions with variable precision
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/48—Indexing scheme relating to groups G06F7/48 - G06F7/575
- G06F2207/4802—Special implementations
- G06F2207/4814—Non-logic devices, e.g. operational amplifiers
Definitions
- the present disclosure relates generally to machine-learning accelerators, and more particularly, to multiply-and-accumulate (MAC) acceleration for improving the efficiency of machine learning operations.
- Non-Volatile Memory (NVM)-based crossbar architectures provide an alternative mechanism for performing MAC operations in machine-learning algorithms, particularly, neural-networks.
- the mixed-signal approach using NVM-bit cells relies upon Ohm's law to implement multiply operations by taking advantage of the resistive nature of emerging NVM technologies (e.g., phase change memory (PCM), resistive random-access memory (RRAM), correlated electron random access memory (CeRAM), and the like).
- An application of a voltage-bias across an NVM-bit cell generates a current that is proportional to the product of the conductance of the NVM element and the voltage-bias across the cell.
- MAC acceleration utilizing NVM crossbars requires programming NVM elements with precision conductance levels that represent a multi-bit weight parameter. Due to inherent device limitations, the bit-precision that can be represented is limited to 4 or 5 bits, which provides 16 to 32 distinct conductance levels. This complicates the weight programming step since the entire crossbar array of NVM bits needs to be precisely programmed (capacities of 1-10 Mb are typical).
- FIG. 1 depicts a high-level block diagram of a multiplication-accumulation (MAC) operation, in accordance with an embodiment of the disclosure.
- FIG. 2 depicts a diagram of a convolution operation within a single-layer of a convolutional neural network, in accordance with an embodiment of the disclosure.
- FIG. 3 depicts an architecture of crossbars having non-volatile memory elements for performing the operation shown in FIG. 2 , in accordance with an embodiment of the disclosure.
- FIG. 4 depicts a schematic of a K-bit precision architecture for implementing refactored matrix multiplication, in accordance with an embodiment of the disclosure.
- FIG. 5 depicts a flow diagram of a non-precision programming process for performing MAC acceleration, in accordance with an embodiment of the disclosure.
- FIG. 6 depicts a flow diagram of a precision programming process for performing MAC acceleration, in accordance with an embodiment of the disclosure.
- FIG. 7 depicts a flow diagram of a method of performing MAC acceleration in a neural network, in accordance with an embodiment of the disclosure.
- the terms “a” or “an,” as used herein, are defined as one or more than one.
- the term “plurality,” as used herein, is defined as two or more than two.
- the term “another,” as used herein, is defined as at least a second or more.
- the terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language).
- the term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically.
- the term “providing” is defined herein in its broadest sense, e.g., bringing/coming into physical existence, making available, and/or supplying to someone or something, in whole or in multiple parts at once or over a period.
- any references to the term “longitudinal” should be understood to mean in a direction corresponding to an elongated direction of a personal computing device from one terminating end to an opposing terminating end.
- FIG. 1 depicts a high-level block diagram 100 of a multiplication-accumulation (MAC) operation that computes the product of two numbers and adds that product to an accumulator, in accordance with an embodiment of the disclosure.
- the composition of a group of MACs may represent dot-products and vector-matrix multiplication.
- MAC operations are utilized in Machine Learning (ML) applications, and more specifically Deep Neural Networks (DNN).
- FIG. 1 represents a simple multi-layer fully connected neural network 100 .
- Each of the neurons in the first layer computes the MAC between the input vector 102 and the corresponding weights in matrix 104 .
- An activation function is applied to the partial result, and an output vector 106 is generated. This process takes place for every neuron present in a plurality of layers.
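As a concrete illustration, the per-neuron MAC of such a fully connected layer can be sketched in a few lines. The sizes, the ReLU activation, and the random values below are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

# Illustrative sizes; the disclosure does not fix dimensions.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)        # input vector (102)
W = rng.standard_normal((4, 8))   # weight matrix (104): one row per neuron

def relu(z):
    # Example activation function; any nonlinearity could stand in here.
    return np.maximum(z, 0.0)

# Each neuron computes a MAC: the dot product of the input vector with its
# row of weights; an activation is then applied to form the output (106).
y = relu(W @ x)
assert y.shape == (4,)
```

Stacking several such layers, with each layer's output vector feeding the next layer's MACs, gives the multi-layer network of FIG. 1.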
- FIG. 2 depicts a convolution operation within a single-layer of a convolutional neural network (CNN 200 ), in accordance with an embodiment of the disclosure.
- the CNN 200 includes multiple “M” filters and multiple “C” input channels.
- a single filter is convolved across input feature maps in different channels to produce an output feature map corresponding to that filter; the M filters together produce M output feature maps, each corresponding to an individual filter.
- Convolutional layers require the movement of large amounts of data, generate a significant computational load, and require buffers of considerable size to store intermediate values.
- a single filter is convolved across the "C" input feature maps 204_1, 204_2, . . . , 204_C in different channels, and the "M" filters produce output feature maps 206_1, 206_2, . . . , 206_M, each corresponding to a single filter.
- the dimensions of the filters are 2×2 and the dimensions of the input feature map are 5×5.
- the total number of operations is 2×2×C×(5×5)×M for the specific arrangement shown in FIG. 2.
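The operation count stated above can be checked with a small calculation; the channel and filter counts C and M below are assumed values for illustration:

```python
# One multiply-accumulate per filter tap, per position, per channel, per
# filter; the text counts 5x5 positions for the 5x5 input feature map.
filter_h, filter_w = 2, 2        # 2x2 filter
feat_h, feat_w = 5, 5            # 5x5 input feature map
C, M = 3, 4                      # assumed channel and filter counts

ops = filter_h * filter_w * C * (feat_h * feat_w) * M
print(ops)  # 2 * 2 * 3 * 25 * 4 = 1200
```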
- FIG. 3 depicts an architecture 300 showing how such an operation can be implemented using NVM crossbar 302 , in accordance with an embodiment of the disclosure. Reference numerals identifying like components are repeated from FIG. 2 .
- in the NVM crossbar 302, weights for a convolutional filter are programmed into individual bit cells W0_11, W0_12, W0_21, W0_22.
- the crossbar 302 includes a plurality of row signal lines 308_1, 308_2, . . . , 308_N and column signal lines 312_1, 312_2, . . . , 312_M.
- a plurality of neural network nodes are represented by 314_11, . . .
- the LRS and HRS conductances are represented by G0_11, G0_12, G0_21, G0_22, . . .
- the ratio of HRS to LRS is at least two orders of magnitude; therefore, encoding a 4-bit (16-level) resistance is possible.
- a digital-to-analog converter (DAC) includes converter elements 310_1, 310_2, . . . , 310_N.
- bias+activation may be represented by:
- in FIG. 3, a mapping of a CNN to NVM crossbars with M filters and C input channels is depicted.
- the weights ω are stationary, i.e., they are programmed into the crossbar once and do not change during the course of inference operations.
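The Ohm's-law/Kirchhoff's-law arithmetic of the crossbar can be simulated as a simple vector-matrix product; the voltages and conductances below are invented purely for illustration:

```python
import numpy as np

# Each bit cell passes I = G * V (Ohm's law); the column wire sums the cell
# currents (Kirchhoff's current law), so each column current is the dot
# product of the row voltages with that column's conductances.
V = np.array([0.3, 0.5, 0.2])        # DAC-driven row voltages (activations)
G = np.array([[1e-6, 4e-6],          # conductances in siemens; each column
              [2e-6, 1e-6],          #   holds one filter's weights
              [3e-6, 2e-6]])

I_col = V @ G                        # column currents = accumulated MACs
assert np.allclose(I_col, [1.9e-6, 2.1e-6])
```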
- typical NVM elements (for instance, phase-change memory and resistive RAM) have limited write endurance.
- other NVM elements (such as magnetic RAM and CeRAM) demonstrate the promise of relatively higher endurance (near 10^12 cycles), but continuous operation still leads to a limited lifetime.
- such arrangements differ from SRAM behavior (which has significantly higher write endurance) and are not amenable to reprogramming the weights during inference.
- the entire network needs to be unrolled into an on-chip crossbar and fixed during inference. While this has the advantage of eliminating DRAM power consumption, it undesirably limits the maximum size of the network that can be programmed on-chip. Further, it also incurs an area penalty as mapping larger networks requires instantiation of crossbars that are megabits in capacity. This consumes higher area and increases susceptibility to chip-failures due to yield loss.
- instantiating multiple crossbars requires instantiation of multiple ADCs/DACs, all of which need to be programmed, trimmed and compensated for drift.
- An NVM/CeRAM element is a particular type of random access memory formed (wholly or in part) from a correlated electron material.
- the CeRAM may exhibit an abrupt conductive or insulative state transition arising from electron correlations rather than solid state structural phase changes such as, for example, filamentary formation and conduction in resistive RAM devices.
- An abrupt conductor/insulator transition in a CeRAM may be responsive to a quantum mechanical phenomenon, in contrast to melting/solidification or filament formation.
- a quantum mechanical transition of a CeRAM between an insulative state and a conductive state may be understood in terms of a Mott transition.
- a material may switch from an insulative state to a conductive state if a Mott transition condition occurs.
- if a critical carrier concentration is achieved such that the Mott criterion is met, the Mott transition will occur and the state will change from high resistance/impedance (or capacitance) to low resistance/impedance (or capacitance).
- a “state” or “memory state” of the CeRAM element may be dependent on the impedance state or conductive state of the CeRAM element.
- the “state” or “memory state” means a detectable state of a memory device that is indicative of a value, symbol, parameter or condition, just to provide a few examples.
- a memory state of a memory device may be detected based, at least in part, on a signal detected on terminals of the memory device in a read operation.
- a memory device may be placed in a particular memory state to represent or store a particular value, symbol or parameter by application of one or more signals across terminals of the memory device in a “write operation.”
- a CeRAM element may comprise material sandwiched between conductive terminals. By applying a specific voltage and current between the terminals, the material may transition between the aforementioned conductive and insulative states.
- the material of a CeRAM element sandwiched between conductive terminals may be placed in an insulative state by application of a first programming signal across the terminals having a reset voltage and reset current at a reset current density, or placed in a conductive state by application of a second programming signal across the terminals having a set voltage and set current at set current density.
- the NVM equivalent is represented by:
- Refactoring as represented by Eq. 10 leads to a simpler implementation in which the input activations are first conditionally added together, depending on whether they factor into the MAC operation with a specific weight value.
- the initial addition operation can be done using NVM elements. However, in accordance with embodiments of the disclosure, these NVM elements need not be precisely programmed.
- a binary weight encoding (R_ON/R_OFF) is utilized to connect an input activation to a weight value without need for precision programming.
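A minimal numerical sketch of this refactoring (the activation values, weight levels, and K below are assumptions) shows that conditionally summing activations per weight level, then multiplying each partial sum once, reproduces the conventional MAC:

```python
import numpy as np

K = 2
levels = np.array([-1.0, -0.5, 0.5, 1.0])   # 2^K distinct weight values
a = np.array([0.2, 0.7, 0.1, 0.4, 0.3])     # input activations
idx = np.array([0, 3, 1, 3, 2])             # weight level used by each input

direct = np.sum(a * levels[idx])            # conventional MAC

# Binary "summing" stage: an on/off connection (R_ON/R_OFF) routes each
# activation to the partial sum for its weight level; no precision needed.
partial = np.array([a[idx == k].sum() for k in range(2**K)])

# Precision stage: only 2^K multiplies by precisely programmed weights.
refactored = np.dot(partial, levels)
assert np.isclose(direct, refactored)
```

The precision-programming burden thus shifts from one element per weight to one element per weight *level*.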
- FIG. 4 depicts a schematic of a K-bit precision architecture 400 for implementing refactored matrix multiplication, in accordance with an embodiment of the disclosure.
- the refactored matrix G can be deployed in a system composed of a block 402 of M digital-to-analog converters (DACs) 403_1, . . . , 403_M, an M×N×K crossbar 404 and a multiplier/scaling module 406.
- the crossbar 404 includes a plurality of neural network nodes 408_11, . . . , 408_MN, provided at each junction of row signal lines 410_1, 410_2, 410_3, . . .
- each respective node 408_11, . . . , 408_MN includes one or more NVM elements to store the weights associated with that node.
- a node is switchable between a first impedance state and a second impedance state (R_ON/R_OFF).
- the multiplier/scaling module 406 includes nodes 407_0, . . . , 407_(K-1) corresponding to weights w_0, w_1, . . . , w_(K-1).
- the input activations a_0, a_1, . . . , a_(M-1) along respective row signal lines 410_1, 410_2, 410_3, . . . , 410_M are first summed together in the crossbar 404 by placing the nodes 408_11, . . . , 408_MN into R_ON/R_OFF states as described above, where the NVM elements do not require precision programming.
- certain nodes among the plurality of nodes 408_11, . . . , 408_MN are switched between impedance states as shown.
- a summed signal from each column N is input to a respective node 407_0, . . . , 407_(K-1) of the multiplier/scaling module 406, and the final MAC computation occurs in a single column in the multiplier/scaling module 406, where all elements are precisely programmed (K elements in total).
- FIG. 5 depicts a flow-diagram 500 of a non-precision programming process for performing multiply-accumulate acceleration, in accordance with an embodiment of the disclosure.
- for an M×N crossbar having K-bit weights, M×N×2^K elements are binary programmed and 2^K×N elements are precision programmed.
- the output is given by:
- K bits provide 2^K levels.
- the process is initialized and proceeds to block 504 of the low-precision write loop.
- at block 504, the indices i, j, k are updated.
- the g_ijk (binary) resistance is read.
- g_ijk (binary) resistance is written.
- the process terminates at block 510 .
- the M×N×2^K cells are programmed to either a "0" (R_OFF) or a "1" (R_ON), where 2^K defines the number of levels. From here, 2^K×N (≪ M×N) non-volatile memory cells are precision programmed.
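The low-precision write loop of FIG. 5 amounts to a single binary write per cell, as sketched below. The dimensions, resistance values, and connection rule are illustrative stand-ins for a trained network's weights:

```python
# Every cell g_ijk is driven to R_ON ("1") or R_OFF ("0") in one shot; no
# iterative verify is needed because only two states must be resolved.
M, N, K = 4, 3, 2
R_ON, R_OFF = 1e3, 1e8            # assumed on/off resistances in ohms

def connected(i, j, k):
    # Hypothetical rule: connect cell (i, j, k) iff weight (i, j) uses
    # level k; a real deployment derives this from the trained weights.
    return (i + j) % (2**K) == k

crossbar = {
    (i, j, k): (R_ON if connected(i, j, k) else R_OFF)
    for i in range(M) for j in range(N) for k in range(2**K)
}
assert len(crossbar) == M * N * 2**K   # M*N*2^K binary-programmed cells
```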
- FIG. 6 depicts a flow diagram 600 of a precision programming process of the 2 K ⁇ N non-volatile memory cells, in accordance with an embodiment of the disclosure.
- the process is initialized at block 602 and proceeds to block 604 of a high-precision write loop.
- at block 604, the indices i, j are updated.
- g_ij are read (non-binary).
- a high-precision operation changes the resistivity of a single non-volatile memory cell to a very precise known value among 2^4 possible conductances.
- the multilevel resistance g_ij is tuned.
- the multilevel resistance g_ij is read, and the correct multilevel resistance verified in block 612 .
- the process loops back to block 604 if the multilevel resistance is correct, otherwise, the process proceeds back to block 608 for further tuning.
- the process terminates at block 614 .
- the output is given by:
- v_i and g_ij are high-precision numbers quantized to 2^K levels.
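The precision write loop of FIG. 6 is a classic program-and-verify iteration. A sketch under an invented device model (the ±5% write noise and 1% tolerance are assumptions, not device data) looks like this:

```python
import random

random.seed(1)

def write(target):
    # Invented device model: each write lands within ±5% of the target.
    return target * (1.0 + random.uniform(-0.05, 0.05))

def program_verify(target, tol=0.01, max_iters=1000):
    # Tune, read back, and re-tune until the cell is within tolerance
    # (the tune/read/verify cycle of blocks 608-612 in FIG. 6).
    g = write(target)
    for _ in range(max_iters):
        if abs(g - target) / target <= tol:   # verify read
            return g
        g = write(target)                     # further tuning
    raise RuntimeError("cell failed to converge")

g = program_verify(2.5e-6)                    # target conductance in siemens
assert abs(g - 2.5e-6) / 2.5e-6 <= 0.01
```

Because only the 2^K×N multiplier cells need this slow iterative tuning, the bulk of the array escapes it entirely.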
- FIG. 7 depicts a flow-diagram 700 of a method for performing multiply-accumulate acceleration in a neural network ( 100 , FIG. 1 ), in accordance with an embodiment of the disclosure.
- the method is initialized at block 702 , and proceeds to block 704 which corresponds to the low-precision write loop 500 shown in FIG. 5 .
- the input activations a_0, a_1, . . . , a_(M-1) are applied to a summing array (crossbar 404, FIG. 4).
- a summed signal is generated by each column of NVM elements in the summing array shown in FIG. 4, the elements having been programmed by the low-precision write loop 500.
- the process proceeds to block 706 , which corresponds to the high-precision write loop 600 shown in FIG. 6 .
- the summed signal is input to a multiplying array (module 406 , FIG. 4 ) having a plurality of NVM elements.
- each NVM element in the multiplying array is precisely programmed to a conductance level proportional to a weight in the neural network as shown in the flow diagram of FIG. 6 .
- the process terminates at block 708 .
- Embodiments of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- a method of performing multiply-accumulate acceleration in a neural network includes generating, in a summing array having a plurality of non-volatile memory elements arranged in columns, a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and inputting the summed signal from the summing array to a multiplying array having a plurality of non-volatile memory elements, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
- the summing array and multiplying array form an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
- a plurality of input activations are conditionally summed depending upon specific weight values.
- the number of input activations is significantly greater than the number of distinct weight values.
- summing array comprises a plurality of high and low resistance levels.
- the method further comprises programming the M×N×2^K elements to either a resistance "off" state or a resistance "on" state for 2^K levels.
- 2^K×N (≪ M×N) non-volatile memory cells are fine-tuned.
- the method further comprises scaling an output in a multiplier/scaling module.
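The payoff claimed above can be made concrete with a quick count (the sizes M, N, K are assumed for illustration): the refactored scheme trades cheap single-shot binary writes for a drastic reduction in slow precision-programming steps:

```python
M, N, K = 1024, 256, 4

conventional_precision = M * N            # every cell precision-tuned
refactored_precision = (2**K) * N         # only the multiplier elements
refactored_binary = M * N * (2**K)        # single-shot on/off writes

assert refactored_precision < conventional_precision
print(conventional_precision, refactored_precision)  # 262144 4096
```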
- an architecture for performing multiply-accumulate operations in a neural network includes a summing array having a plurality of non-volatile memory elements arranged in columns, the summing array generating a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and a multiplying array having a plurality of non-volatile memory elements that receive a summed signal from the summing array, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
- the summing array and multiplying array form an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
- a plurality of input activations is conditionally summed depending upon specific weight values.
- the number of input activations is significantly greater than the number of distinct weight values.
- the architecture further comprises a plurality of resistors and where the summing array comprises a plurality of high and low resistance levels.
- the architecture further comprises the M×N×2^K elements programmed to either a resistance "off" state or a resistance "on" state for 2^K levels.
- 2^K×N (≪ M×N) non-volatile memory cells are fine-tuned.
- the architecture further comprises a multiplier/scaling module for scaling an output.
- an architecture for performing multiply-accumulate operations in a neural network includes a crossbar including a plurality of crossbar nodes arranged in an array of rows and columns, each crossbar node being programmable to a first resistance level or a second resistance level, the crossbar being configured to sum a plurality of analog input activation signals over each column of crossbar nodes and output a plurality of summed activation signals; and a multiplier, coupled to the crossbar, including a plurality of multiplier nodes, each multiplier node being programmable to a resistance level proportional to one of a plurality of neural network weights, the multiplier being configured to sum the plurality of summed activation signals over the multiplier nodes and output an analog output activation signal.
- each crossbar node includes one or more non-volatile memory (NVM) elements, and each multiplier node includes a plurality of NVM elements.
- the crossbar includes M rows, N columns, K-bit weights and M×N×2^K programmable NVM elements.
- the multiplier includes N multiplier nodes and N×2^K programmable NVM elements.
- the architecture further comprises a plurality of digital-to-analog converters (DACs) coupled to the crossbar, each DAC being configured to receive a plurality of digital input activation signals and output the plurality of analog input activation signals.
Abstract
A method and architecture for performing multiply-accumulate operations in a neural network is disclosed. The architecture includes a crossbar having a plurality of non-volatile memory elements. A plurality of input activations is applied to the crossbar, which are then summed by binary weight encoding a plurality of the non-volatile memory elements to connect the input activations to weight values. At least one of the plurality of non-volatile memory elements is then precision programmed.
Description
- Currents from multiple bit cells are added in parallel to implement an accumulated sum. Thus, a combination of Ohm's law and Kirchhoff's current law implements multiple MAC operations in parallel. Such operations, by contrast, can be energy-intensive when implemented using explicit multipliers and adders in the digital domain.
- In neural networks, MAC acceleration utilizing NVM crossbars requires programming NVM elements with precision conductance levels that represent a multi-bit weight parameter. Due to inherent device limitations, the bit-precision that can be represented is limited to 4 or 5 bits, which provides 16 to 32 distinct conductance levels. This complicates the weight programming step since the entire crossbar array of NVM bits needs to be precisely programmed (capacities of 1-10 Mb are typical).
- In accordance with the present disclosure, there is provided an improved technique for refactoring MAC operations to reduce programming steps in such systems.
- The present disclosure is illustrated by way of example and is not limited by the accompanying figures, in which like reference numerals indicate similar elements.
-
FIG. 1 depicts a high-level block diagram of a multiplication-accumulation (MAC) operation, in accordance with an embodiment of the disclosure. -
FIG. 2 depicts a diagram of a convolution operation within a single-layer of a convolutional neural network, in accordance with an embodiment of the disclosure. -
FIG. 3 depicts an architecture of crossbars having non-volatile memory elements for performing the operation shown in FIG. 2, in accordance with an embodiment of the disclosure. -
FIG. 4 depicts a schematic of a K-bit precision architecture for implementing refactored matrix multiplication, in accordance with an embodiment of the disclosure. -
FIG. 5 depicts a flow diagram of a non-precision programming process for performing MAC acceleration, in accordance with an embodiment of the disclosure. -
FIG. 6 depicts a flow diagram of a precision programming process for performing MAC acceleration, in accordance with an embodiment of the disclosure. -
FIG. 7 depicts a flow diagram of a method of performing MAC acceleration in a neural network, in accordance with an embodiment of the disclosure. - Specific embodiments of the disclosure will now be described in detail regarding the accompanying figures. For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.
- It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.
- In the following detailed description of embodiments of the disclosure, numerous specific details are set forth in order to provide a more thorough understanding of the disclosure. However, it will be apparent to those skilled in the art that the disclosure may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
- It is to be understood that the terminology used herein is for the purposes of describing various embodiments in accordance with the present disclosure and is not intended to be limiting. The terms “a” or “an,” as used herein, are defined as one or more than one. The term “plurality,” as used herein, is defined as two or more than two. The term “another,” as used herein, is defined as at least a second or more. The terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language). The term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The term “providing” is defined herein in its broadest sense, e.g., bringing/coming into physical existence, making available, and/or supplying to someone or something, in whole or in multiple parts at once or over a period.
- As used herein, the terms “about” or “approximately” apply to all numeric values, irrespective of whether these are explicitly indicated. Such terms generally refer to a range of numbers that one of skill in the art would consider equivalent to the recited values (i.e., having the same function or result). These terms may include numbers that are rounded to the nearest significant figure. In this document, any references to the term “longitudinal” should be understood to mean in a direction corresponding to an elongated direction of a personal computing device from one terminating end to an opposing terminating end.
-
FIG. 1 depicts a high-level block diagram 100 of a multiplication-accumulation (MAC) operation that computes the product of two numbers and adds that product to an accumulator, in accordance with an embodiment of the disclosure. -
a ← a + (b × c) (Eq. 1) - The composition of a group of MACs may represent dot-products and vector-matrix multiplication. MAC operations are utilized in Machine Learning (ML) applications, and more specifically in Deep Neural Networks (DNNs).
FIG. 1 represents a simple multi-layer fully connected neural network 100. Each of the neurons in the first layer computes the MAC between the input vector 102 and the corresponding weights in matrix 104. An activation function is applied to the partial result, and an output vector 106 is generated. This process takes place for every neuron present in a plurality of layers. -
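The per-neuron MAC-plus-activation just described can be sketched as follows; the layer sizes, the choice of ReLU as activation, and the function name are illustrative assumptions, not from the disclosure:

```python
# Sketch of one fully connected layer: each output neuron performs a MAC
# of the input vector with its weight row, then applies an activation
# function (ReLU here, an illustrative choice).

def fc_layer(inputs, weights):
    outputs = []
    for row in weights:                   # one weight row per neuron
        acc = 0.0
        for x, w in zip(inputs, row):     # multiply-accumulate (Eq. 1)
            acc += x * w
        outputs.append(max(0.0, acc))     # activation function
    return outputs

x = [1.0, -2.0, 0.5]
W = [[0.2, 0.1, 0.4],    # weights for neuron 0
     [1.0, 0.5, -1.0]]   # weights for neuron 1
y = fc_layer(x, W)       # neuron 1's negative sum is clipped to 0.0
```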
FIG. 2 depicts a convolution operation within a single layer of a convolutional neural network (CNN 200), in accordance with an embodiment of the disclosure. Convolutional layers require the movement of large amounts of data, generate a significant computational load, and require buffers of considerable size to store intermediate values. In this example, there are multiple filters "M" 202 1, 202 2, . . . , 202 M and multiple input channels. A single filter is convolved across "C" input feature maps 204 1, 204 2, . . . , 204 N in different channels to produce "M" output feature maps 206 1, 206 2, . . . , 206 N, each corresponding to an individual filter. In the illustration of FIG. 2, it is assumed that the dimensions of the filters are 2×2 and the dimensions of the input feature map are 5×5. Thus, the total number of operations is 2×2×C×(5×5)×M for the specific arrangement shown in FIG. 2. -
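The operation count stated above can be checked with a one-line formula; the channel and filter counts chosen below are arbitrary examples:

```python
# Total MAC count for the FIG. 2 arrangement: filter area x channels x
# output positions x filters. C and M below are arbitrary examples.

def conv_mac_count(filter_h, filter_w, channels, map_h, map_w, filters):
    return filter_h * filter_w * channels * (map_h * map_w) * filters

ops = conv_mac_count(2, 2, 3, 5, 5, 4)   # 2*2*3*25*4 = 1200
```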
FIG. 3 depicts an architecture 300 showing how such an operation can be implemented using NVM crossbar 302, in accordance with an embodiment of the disclosure. Reference numerals identifying like components are repeated from FIG. 2. In NVM crossbar 302, weights for a convolutional filter are programmed into individual bit-cells W0 11, W0 12, W0 21, W0 22. The crossbar 302 includes a plurality of row signal lines 308 1, 308 2, . . . , 308 N and column signal lines 312 1, 312 2, . . . , 312 M. A plurality of neural network nodes are represented by 314 11, . . . , 314 NM, each of which includes a plurality of NVM elements that can be programmed between resistance states. It is generally useful to encode multiple linearly-separated resistance levels within an individual bit cell due to the wide separation between the low-resistance state (LRS) RON and the high-resistance state (HRS) ROFF. The LRS and HRS conductances are represented by G0 11, G0 12, G0 21, G0 22 . . . . In the case of correlated electron material RAM (CeRAM), the ratio of HRS/LRS is at least 2 orders of magnitude; therefore, encoding a 4-bit (16-level) resistance is possible. Digital words from the input feature maps 306 1 . . . N are input to the crossbar 302 via corresponding inputs I0 11, I0 12, . . . , IC 22 along the row signal lines 308 11, 308 12, . . . , 308 22 of crossbar 302, converted to analog voltages V0 11, V0 12, . . . , VC 22 using a digital-to-analog converter (DAC), and applied across the NVM cells. The DAC includes converter elements 310 1, 310 2, . . . , 310 N. The column signal lines 312 1, 312 2, . . . , 312 M apply corresponding signals BLK0, BLK1, . . . , BLKM. The resultant current is therefore proportional to the dot-product of the input word and the weight. These individual currents are then accumulated in parallel on a bit line. 
Once the accumulated current signal develops on the bit line, it can then be digitized again using an analog-to-digital converter (ADC) and bias-addition, scaling and activation-functions can be applied on the resulting digital word to obtain output activation. The bias+activation may be represented by: -
BL_K0 = Σ_{c=1}^{C} [G0_11 × Vc_11 + . . . + G0_22 × Vc_22] (Eq. 2) - In
FIG. 3, a mapping of a CNN to NVM crossbars with M filters and C input channels is depicted. In certain embodiments, it is assumed that the weights are stationary, i.e., they are programmed into the crossbar once and do not change during the course of inference operations. Typical NVM elements (for instance, Phase-Change Memory and Resistive RAM) have a limited write "endurance", i.e., it is possible to write to them a limited number of times (say, approximately 10^8 times), after which the devices exhibit functional failure. Other NVM elements (such as Magnetic RAM and CeRAM) show promise for relatively higher endurance (near 10^12 writes), but continuous operation still leads to a limited lifetime. Hence, such lifetime limits pose significant constraints on accelerator architectures that rely upon updating weights on every inference cycle. For example, for an IoT-class accelerator operating at 100 MHz, an accelerator with an endurance of 10^8 has a lifetime of 1 second, and an accelerator with an endurance of 10^12 has a lifetime of 10,000 seconds, i.e., under 3 hours (worst-case, peak usage). Therefore, such weights cannot practically be streamed from an external DRAM and must be fixed on-chip. Further, NVM bit cells suffer from high write powers, and consequently expensive power consumption, to perform update operations. Thus, the write phase can be problematic and take a long time to complete. - Such arrangements differ from SRAM behavior (SRAM has significantly higher write endurance) and are not amenable to reprogramming the weights during inference. As a consequence, the entire network needs to be unrolled into an on-chip crossbar and fixed during inference. While this has the advantage of eliminating DRAM power consumption, it undesirably limits the maximum size of the network that can be programmed on-chip. Further, it also incurs an area penalty, as mapping larger networks requires instantiation of crossbars that are megabits in capacity. 
This consumes more area and increases susceptibility to chip failures due to yield loss. Moreover, instantiating multiple crossbars requires instantiating multiple ADCs/DACs, all of which need to be programmed, trimmed, and compensated for drift.
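The worst-case lifetime figures above follow from dividing write endurance by write rate; a minimal sketch of that arithmetic, using the clock rate and endurance values from the text:

```python
# Worst-case lifetime if weights were rewritten on every cycle:
# lifetime = endurance (total writes) / write rate (writes per second).

def lifetime_seconds(endurance_writes, write_rate_hz):
    return endurance_writes / write_rate_hz

# 100 MHz accelerator, per the example in the text:
low  = lifetime_seconds(10**8, 100_000_000)    # 1 second
high = lifetime_seconds(10**12, 100_000_000)   # 10,000 seconds
```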
- An NVM/CeRAM element is a particular type of random access memory formed (wholly or in part) from a correlated electron material. The CeRAM may exhibit an abrupt conductive or insulative state transition arising from electron correlations rather than solid state structural phase changes such as, for example, filamentary formation and conduction in resistive RAM devices. An abrupt conductor/insulator transition in a CeRAM may be responsive to a quantum mechanical phenomenon, in contrast to melting/solidification or filament formation.
- A quantum mechanical transition of a CeRAM between an insulative state and a conductive state may be understood in terms of a Mott transition. In a Mott transition, a material may switch from an insulative state to a conductive state if a Mott transition condition occurs. When a critical carrier concentration is achieved such that a Mott criteria is met, the Mott transition will occur and the state will change from high resistance/impedance (or capacitance) to low resistance/impedance (or capacitance).
- A “state” or “memory state” of the CeRAM element may be dependent on the impedance state or conductive state of the CeRAM element. In this context, the “state” or “memory state” means a detectable state of a memory device that is indicative of a value, symbol, parameter or condition, just to provide a few examples. In a particular implementation, a memory state of a memory device may be detected based, at least in part, on a signal detected on terminals of the memory device in a read operation. In another implementation, a memory device may be placed in a particular memory state to represent or store a particular value, symbol or parameter by application of one or more signals across terminals of the memory device in a “write operation.”
- A CeRAM element may comprise material sandwiched between conductive terminals. By applying a specific voltage and current between the terminals, the material may transition between the aforementioned conductive and insulative states. The material of a CeRAM element sandwiched between conductive terminals may be placed in an insulative state by application of a first programming signal across the terminals having a reset voltage and reset current at a reset current density, or placed in a conductive state by application of a second programming signal across the terminals having a set voltage and set current at set current density.
- In accordance with embodiments of the disclosure, a vector-matrix multiplication performs the following MAC operations, where the input vector is V={v_i}, i∈[0, M−1], the matrix is W={w_ij}, j∈[0, N−1], and the output vector O is composed of:
-
O_j = Σ_i w_ij × a_i (Eq. 3) - The NVM equivalent is represented by:
-
I_j = Σ_i g_ij × v_i (Eq. 4) - where I represents the currents flowing through the bitlines, V is the vector of input voltages, and g is the conductance of the NVM elements. For a K-bit weight representation, there can be only 2^K unique weight values. For low-precision weight encoding (3- or 4-bit values), this leads to only 8 or 16 such unique weight values:
-
g_ij = Σ_{k=0}^{K−1} g′_ijk (R_ON + Δ·k) (Eq. 5) - where g′_ijk ∈ {0,1}, k∈[0, K−1], and Δ = (R_OFF − R_ON)/2^K.
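A minimal sketch of Eq. 5, treating g′_ijk as a vector of binary selectors over the K linearly spaced levels; the R_ON/R_OFF values and function name below are illustrative assumptions, not from the disclosure:

```python
# Eq. 5 sketch: the cell value is composed from binary selectors
# g'_k in {0, 1} over K linearly spaced levels with step
# delta = (R_OFF - R_ON) / 2**K. Values are arbitrary examples.

def g_value(selectors, r_on, r_off):
    """g = sum_k g'_k * (R_ON + delta * k), per Eq. 5."""
    K = len(selectors)
    delta = (r_off - r_on) / 2**K
    return sum(s * (r_on + delta * k) for k, s in enumerate(selectors))

# With R_ON = 1.0, R_OFF = 17.0 and K = 4, delta = 1.0:
one_hot_k0 = [1, 0, 0, 0]   # only the k = 0 term contributes
one_hot_k3 = [0, 0, 0, 1]   # only the k = 3 term contributes
```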
- Voltages V′ = {v′_ik} are defined as follows:
-
- v′_ik = v_i if activation a_i has g_ijk as a multiplicand (i.e., g′_ijk = 1), and v′_ik = 0 otherwise. With this definition, substituting Eq. 5 into Eq. 4 allows the bitline current to be rewritten as:
-
- I_j = Σ_{k=0}^{K−1} (R_ON + Δ·k) Σ_i v′_ik (Eq. 10)
- given that g′_ijk = 0 implies v′_ik = 0.
- Refactoring as represented by Eq. 10 leads to a simpler implementation in which the input activations are first conditionally added together, depending on whether they factor into the MAC operation with a specific weight value. This initial addition can be done using NVM elements. However, in accordance with embodiments of the disclosure, these NVM elements need not be precisely programmed: a binary weight encoding (RON/ROFF) is utilized to connect an input activation to a weight value without the need for precision programming.
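The refactoring can be sketched end-to-end: a binary (on/off) summing stage groups activations by weight level, and a single precision multiply per level completes the MAC. The function names, level values, and level assignments below are illustrative assumptions:

```python
# Sketch of the refactored MAC: stage 1 sums activations sharing a
# weight level using only on/off (binary) cells; stage 2 applies one
# precisely programmed multiply per level. Values are arbitrary examples.

def direct_mac(activations, weights):
    return sum(a * w for a, w in zip(activations, weights))

def refactored_mac(activations, levels, level_of):
    # Stage 1: binary summing array -- no precision programming needed.
    partial = [0.0] * len(levels)
    for a, k in zip(activations, level_of):
        partial[k] += a                  # cell for level k is simply "on"
    # Stage 2: one precision-programmed multiply per weight level.
    return sum(p * w for p, w in zip(partial, levels))

levels   = [0.0, 0.5, 1.0, 1.5]   # 2^K = 4 precise weight values
level_of = [1, 3, 1, 2]           # level index assigned to each input
acts     = [2.0, 1.0, 4.0, -1.0]
assert refactored_mac(acts, levels, level_of) == \
       direct_mac(acts, [levels[k] for k in level_of])
```

Note that only `len(levels)` cells carry precise values; every cell in the summing stage is merely on or off.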
-
FIG. 4 depicts a schematic of a K-bit precision architecture 400 for implementing refactored matrix multiplication, in accordance with an embodiment of the disclosure. In accordance with an embodiment, the refactored matrix G can be deployed in a system composed of block 402 of M digital-to-analog converters (DACs) 403 1, . . . , 403 M, an M×N×K crossbar 404, and a multiplier/scaling module 406. The crossbar 404 includes a plurality of neural network nodes 408 11, . . . , 408 MN, provided at each junction of row signal lines 410 1, 410 2, 410 3, . . . , 410 M and column signal lines 412 1, 412 2, 412 3, . . . , 412 N as shown. Each respective node 408 11, . . . , 408 MN includes one or more NVM elements to store the weights associated with that node. As described above, a node is switchable between a first impedance state and a second impedance state (RON/ROFF). The multiplier/scaling module 406 includes nodes 407 0, . . . , 407 K-1 corresponding to weights w_0, w_1, . . . , w_K−1. Input activations a_0, a_1, . . . , a_M−1 along respective row signal lines 410 1, 410 2, 410 3, . . . , 410 M are first summed together in the crossbar 404 by placing the nodes 408 11, . . . , 408 MN into RON/ROFF states as described above, where the NVM elements do not require precision programming. In the example illustrated in FIG. 4, certain nodes among the plurality of nodes 408 11, . . . , 408 MN are switched between impedance states as shown. A summed signal from each column N is input to a respective node 407 0, . . . , 407 K-1 of the multiplier/scaling module 406, and the final MAC computation occurs in a single column in the multiplier/scaling module 406, where all elements are precisely programmed (K elements in total). -
FIG. 5 depicts a flow-diagram 500 of a non-precision programming process for performing multiply-accumulate acceleration, in accordance with an embodiment of the disclosure. For an M×N crossbar having K-bit weights, M×N×2^K elements are binary programmed and 2^K×N elements are precision programmed. The output is given by: -
output_j = sum(g_ijk) × sum(v′_ik) (Eq. 11) - where K provides 2^K levels. In
block 502, the process is initialized and proceeds to block 504 of the low-precision write loop. In block 504, the indices i, j, k are updated. In block 506, the g_ijk (binary) resistance is read. Then in block 508, the g_ijk (binary) resistance is written. The process terminates at block 510. In this implementation, the M×N×2^K cells are programmed to either a “0” (ROFF) or a “1” (RON), where 2^K defines the number of levels. From here, 2^K×N << M×N non-volatile memory cells are precision programmed. -
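The split between cheap binary writes and expensive precision writes can be quantified directly; the crossbar dimensions below are arbitrary examples, not from the disclosure:

```python
# Cell counts for an M x N crossbar with K-bit weights: M*N*2**K cells
# take a fast binary write, while only 2**K * N cells need the slow
# program-verify flow. Dimensions below are arbitrary examples.

def binary_programmed(M, N, K):
    return M * N * 2**K

def precision_programmed(N, K):
    return 2**K * N

M, N, K = 256, 64, 4
ratio = binary_programmed(M, N, K) // precision_programmed(N, K)  # = M
```

The number of precision-programmed cells is smaller than the number of binary-programmed cells by a factor of M, independent of K.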
FIG. 6 depicts a flow diagram 600 of a precision programming process of the 2^K×N non-volatile memory cells, in accordance with an embodiment of the disclosure. The process is initialized at block 602 and proceeds to block 604 of a high-precision write loop. In block 604, the indices i, j are updated. Next, in block 606, g_ij is read (non-binary). A high-precision operation changes the resistivity of a single non-volatile memory cell to a very precise known value among 2^4 possible conductances. In block 608, the multilevel resistance g_ij is tuned. Then, in block 610, the multilevel resistance g_ij is read, and the correct multilevel resistance is verified in block 612. The process loops back to block 604 if the multilevel resistance is correct; otherwise, the process proceeds back to block 608 for further tuning. The process terminates at block 614. The output is given by: -
output_j = sum(v_i × g_ij) (Eq. 12) - where v_i and g_ij are high-precision numbers with K levels.
-
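The tune/read/verify loop of FIG. 6 can be sketched as a program-verify routine; the read/write callables and the convergence model below are hypothetical stand-ins for the analog hardware, not part of the disclosure:

```python
# Sketch of the FIG. 6 program-verify loop: tune a cell's multilevel
# resistance, read it back, and verify it is within tolerance of the
# target. The read/write callables and the 50%-step convergence model
# are hypothetical stand-ins for the analog hardware.

def tune_cell(target, read, write, tolerance=0.01, max_iters=100):
    for _ in range(max_iters):
        value = read()                         # read multilevel resistance
        if abs(value - target) <= tolerance:   # verify step (block 612)
            return value
        write(value + 0.5 * (target - value))  # further tuning (block 608)
    raise RuntimeError("cell failed to converge")

# Toy model of a single NVM element:
cell = {"g": 0.0}
result = tune_cell(1.0, read=lambda: cell["g"],
                   write=lambda v: cell.update(g=v))
```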
FIG. 7 depicts a flow-diagram 700 of a method for performing multiply-accumulate acceleration in a neural network (100, FIG. 1), in accordance with an embodiment of the disclosure. The method is initialized at block 702 and proceeds to block 704, which corresponds to the low-precision write loop 500 shown in FIG. 5. In the low-precision write loop 500, the input activations a_0, a_1, . . . , a_M−1 are applied to a summing array (crossbar 404, FIG. 4). A summed signal is generated by each column of NVM elements in the summing array, as shown in FIG. 4, by the low-precision write loop 500. The process proceeds to block 706, which corresponds to the high-precision write loop 600 shown in FIG. 6. In the high-precision write loop 600, the summed signal is input to a multiplying array (module 406, FIG. 4) having a plurality of NVM elements. In block 706, each NVM element in the multiplying array is precisely programmed to a conductance level proportional to a weight in the neural network, as shown in the flow diagram of FIG. 6. The process terminates at block 708. - Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the system. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Embodiments of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- Some portions of the detailed descriptions, like the processes, may be presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. An algorithm may be generally conceived as a sequence of steps leading to a desired result. The steps are those requiring physical transformations or manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “deriving” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- The operations described herein can be performed by an apparatus. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Accordingly, embodiments and features of the present disclosure include, but are not limited to, the following combinable embodiments.
- In one embodiment, a method of performing multiply-accumulate acceleration in a neural network includes generating, in a summing array having a plurality of non-volatile memory elements arranged in columns, a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and inputting the summed signal from the summing array to a multiplying array having a plurality of non-volatile memory elements, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
- In another embodiment, the summing array and multiplying array is an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
- In another embodiment, a plurality of input activations are conditionally summed depending upon specific weight values.
- In another embodiment, a plurality of input activations are significantly greater than a plurality of weight values.
- In another embodiment, the summing array comprises a plurality of high and low resistance levels.
- In another embodiment, the method further comprises the M×N×2^K elements programmed to either a resistance “off” state or a resistance “on” state for 2^K levels.
- In another embodiment, 2^K×N << M×N non-volatile memory cells are fine-tuned.
- In another embodiment, the method further comprises scaling an output in a multiplier/scaling module.
- In a further embodiment, an architecture for performing multiply-accumulate operations in a neural network includes a summing array having a plurality of non-volatile memory elements arranged in columns, the summing array generating a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and a multiplying array having a plurality of non-volatile memory elements that receive a summed signal from the summing array, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
- In another embodiment, the summing array and multiplying array is an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
- In another embodiment, a plurality of input activations is conditionally summed depending upon specific weight values.
- In another embodiment, a plurality of input activations is significantly greater than a plurality of weight values.
- In another embodiment, the architecture further comprises a plurality of resistors and where the summing array comprises a plurality of high and low resistance levels.
- In another embodiment, the architecture further comprises the M×N×2^K elements programmed to either a resistance “off” state or a resistance “on” state for 2^K levels.
- In another embodiment, 2^K×N << M×N non-volatile memory cells are fine-tuned.
- In another embodiment, the architecture further comprises a multiplier/scaling module for scaling an output.
- In another further embodiment, an architecture for performing multiply-accumulate operations in a neural network includes a crossbar including a plurality of crossbar nodes arranged in an array of rows and columns, each crossbar node being programmable to a first resistance level or a second resistance level, the crossbar being configured to sum a plurality of analog input activation signals over each column of crossbar nodes and output a plurality of summed activation signals; and a multiplier, coupled to the crossbar, including a plurality of multiplier nodes, each multiplier node being programmable to a resistance level proportional to one of a plurality of neural network weights, the multiplier being configured to sum the plurality of summed activation signals over the multiplier nodes and output an analog output activation signal.
- In another embodiment, each crossbar node includes one or more non-volatile elements (NVMs), and each multiplier node includes a plurality of NVMs.
- In another embodiment, the crossbar includes M rows, N columns, K-bit weights and M×N×2^K programmable NVMs, and the multiplier includes N multiplier nodes and N×2^K programmable NVMs.
- In another embodiment, the architecture further comprises a plurality of digital-to-analog converters (DACs) coupled to the crossbar, each DAC being configured to receive a plurality of digital input activation signals and output the plurality of analog input activation signals.
- In accordance with the foregoing, a method and architecture for performing multiply-accumulate acceleration is disclosed. Having thus described the disclosure of the present application in detail and by reference to embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope defined in the appended claims as follows:
Claims (20)
1. A method of performing multiply-accumulate acceleration in a neural network, comprising:
generating, in a summing array having a plurality of non-volatile memory elements arranged in columns, a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and
inputting the summed signal from the summing array to a multiplying array having a plurality of non-volatile memory elements, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
2. The method of claim 1, where the summing array and multiplying array is an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
3. The method of claim 2 , where a plurality of input activations are conditionally summed depending upon specific weight values.
4. The method of claim 2 , where a plurality of input activations are significantly greater than a plurality of weight values.
5. The method of claim 2, where the summing array comprises a plurality of high and low resistance levels.
6. The method of claim 5, further comprising the M×N×2^K elements programmed to either a resistance “off” state or a resistance “on” state for 2^K levels.
7. The method of claim 6, where 2^K×N << M×N non-volatile memory cells are fine-tuned.
8. The method of claim 2 , further comprising scaling an output in a multiplier/scaling module.
9. An architecture for performing multiply-accumulate operations in a neural network, comprising:
a summing array having a plurality of non-volatile memory elements arranged in columns, the summing array generating a summed signal by the columns of non-volatile memory elements in the summing array, each non-volatile memory element in the summing array being programmed to either a high or low resistance state; and
a multiplying array having a plurality of non-volatile memory elements that receive a summed signal from the summing array, each non-volatile memory element in the multiplying array being precisely programmed to a conductance level proportional to a weight in the neural network.
10. The architecture of claim 9, where the summing array and multiplying array is an M×N crossbar having K-bit weights, where M×N×2^K elements are programmed to either the high or low resistance state and 2^K×N elements are precisely programmed to the conductance level proportional to the weight in the neural network.
11. The architecture of claim 10 , where a plurality of input activations is conditionally summed depending upon specific weight values.
12. The architecture of claim 10 , where a plurality of input activations is significantly greater than a plurality of weight values.
13. The architecture of claim 10 , further comprising a plurality of resistors and where the summing array comprises a plurality of high and low resistance levels.
14. The architecture of claim 13 , further comprising the M×N×2K elements programmed to either a resistance “off” state or a resistance “on” state for 2K levels.
15. The architecture of claim 14 , where 2K×N<<M×N non-volatile memory cells are fine-tuned.
16. The architecture of claim 10 , further comprising a multiplier/scaling module for scaling an output.
17. An architecture for performing multiply-accumulate operations in a neural network, comprising:
a crossbar including a plurality of crossbar nodes arranged in an array of rows and columns, each crossbar node being programmable to a first resistance level or a second resistance level, the crossbar being configured to sum a plurality of analog input activation signals over each column of crossbar nodes and output a plurality of summed activation signals; and
a multiplier, coupled to the crossbar, including a plurality of multiplier nodes, each multiplier node being programmable to a resistance level proportional to one of a plurality of neural network weights, the multiplier being configured to sum the plurality of summed activation signals over the multiplier nodes and output an analog output activation signal.
18. The architecture of claim 17, where each crossbar node includes one or more non-volatile memory elements (NVMs), and each multiplier node includes a plurality of NVMs.
19. The architecture of claim 18, where the crossbar includes M rows, N columns, K-bit weights and M×N×2^K programmable NVMs, and the multiplier includes N multiplier nodes and N×2^K programmable NVMs.
20. The architecture of claim 17, further comprising:
a plurality of digital-to-analog converters (DACs) coupled to the crossbar, each DAC being configured to receive a plurality of digital input activation signals and output the plurality of analog input activation signals.
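The refactoring the claims describe — a binary summing crossbar that conditionally accumulates activations per weight level, followed by a small array of precisely programmed cells that scales each partial sum by its weight value — can be sketched numerically. Below is a minimal NumPy sketch under the assumption that one column's K-bit weights take integer values 0…2^K−1; the function name `refactored_mac` is illustrative, not from the patent.

```python
import numpy as np

def refactored_mac(activations, weights, k):
    """Dot product refactored into two stages, per the claimed architecture:
    a binary (on/off) summing stage and a small precise multiplying stage."""
    levels = 2 ** k
    # Summing stage: for each of the 2^K weight levels, on/off cells
    # conditionally sum the activations whose weight equals that level.
    partials = np.array([activations[weights == v].sum() for v in range(levels)])
    # Multiplying stage: 2^K precisely programmed cells scale each partial
    # sum by its level and accumulate into the output activation.
    return float(np.dot(partials, np.arange(levels)))

rng = np.random.default_rng(0)
M, N, K = 64, 32, 3
a = rng.random(M)                               # input activations (one column)
w = rng.integers(0, 2 ** K, size=M)             # K-bit weights
assert np.isclose(refactored_mac(a, w, K), float(np.dot(a, w)))
# Programming cost: only 2^K×N cells need fine-tuning, versus M×N
# precisely programmed cells in a conventional analog crossbar.
fine_tuned_refactored = (2 ** K) * N            # 256 cells here
fine_tuned_conventional = M * N                 # 2048 cells here
```

The equality holds because Σᵢ aᵢwᵢ = Σᵥ v·Σ_{wᵢ=v} aᵢ: grouping activations by weight value moves the precise multiplications out of the large array and into 2^K cells per column, which is where the reduction in programming steps comes from when 2^K << M.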
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/556,101 US20210064379A1 (en) | 2019-08-29 | 2019-08-29 | Refactoring MAC Computations for Reduced Programming Steps |
PCT/GB2020/050720 WO2021038182A2 (en) | 2019-08-29 | 2020-03-19 | Refactoring mac computations for reduced programming steps |
EP20765332.0A EP4022426A1 (en) | 2019-08-29 | 2020-08-27 | Refactoring mac operations |
PCT/GB2020/052053 WO2021038228A1 (en) | 2019-08-29 | 2020-08-27 | Refactoring mac operations |
CN202080061121.0A CN114365078A (en) | 2019-08-29 | 2020-08-27 | Reconstructing MAC operations |
TW109129533A TW202111703A (en) | 2019-08-29 | 2020-08-28 | Refactoring mac operations |
US17/674,503 US11922169B2 (en) | 2019-08-29 | 2022-02-17 | Refactoring mac operations |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2020/052053 Continuation WO2021038228A1 (en) | 2019-08-29 | 2020-08-27 | Refactoring mac operations |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210064379A1 (en) | 2021-03-04 |
Family
ID=70050154
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/556,101 Abandoned US20210064379A1 (en) | 2019-08-29 | 2019-08-29 | Refactoring MAC Computations for Reduced Programming Steps |
US17/674,503 Active 2040-01-12 US11922169B2 (en) | 2019-08-29 | 2022-02-17 | Refactoring mac operations |
Country Status (5)
Country | Link |
---|---|
US (2) | US20210064379A1 (en) |
EP (1) | EP4022426A1 (en) |
CN (1) | CN114365078A (en) |
TW (1) | TW202111703A (en) |
WO (2) | WO2021038182A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11393516B2 (en) * | 2020-10-19 | 2022-07-19 | Western Digital Technologies, Inc. | SOT-based spin torque oscillators for oscillatory neural networks |
WO2022212282A1 (en) * | 2021-03-29 | 2022-10-06 | Infineon Technologies LLC | Compute-in-memory devices, systems and methods of operation thereof |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230022516A1 (en) * | 2021-07-23 | 2023-01-26 | Taiwan Semiconductor Manufacturing Company, Ltd. | Compute-in-memory systems and methods with configurable input and summing units |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150067273A1 (en) | 2013-08-30 | 2015-03-05 | Microsoft Corporation | Computation hardware with high-bandwidth memory interface |
WO2018148293A1 (en) | 2017-02-07 | 2018-08-16 | The Regents Of The University Of Michigan | Systems and methods for mixed-signal computing |
US20190042915A1 (en) | 2018-03-30 | 2019-02-07 | Intel Corporation | Procedural neural network synaptic connection modes |
US11144824B2 (en) | 2019-01-29 | 2021-10-12 | Silicon Storage Technology, Inc. | Algorithms and circuitry for verifying a value stored during a programming operation of a non-volatile memory cell in an analog neural memory in deep learning artificial neural network |
KR20230090849A (en) * | 2021-12-15 | 2023-06-22 | 삼성전자주식회사 | Neural network apparatus and electronic system including the same |
2019
- 2019-08-29: US US16/556,101 patent/US20210064379A1/en not_active Abandoned

2020
- 2020-03-19: WO PCT/GB2020/050720 patent/WO2021038182A2/en active Application Filing
- 2020-08-27: CN CN202080061121.0A patent/CN114365078A/en active Pending
- 2020-08-27: WO PCT/GB2020/052053 patent/WO2021038228A1/en unknown
- 2020-08-27: EP EP20765332.0A patent/EP4022426A1/en active Pending
- 2020-08-28: TW TW109129533A patent/TW202111703A/en unknown

2022
- 2022-02-17: US US17/674,503 patent/US11922169B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
TW202111703A (en) | 2021-03-16 |
CN114365078A (en) | 2022-04-15 |
WO2021038182A2 (en) | 2021-03-04 |
EP4022426A1 (en) | 2022-07-06 |
US11922169B2 (en) | 2024-03-05 |
WO2021038228A1 (en) | 2021-03-04 |
US20220179658A1 (en) | 2022-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11886987B2 (en) | Non-volatile memory-based compact mixed-signal multiply-accumulate engine | |
US10534840B1 (en) | Multiplication using non-volatile memory cells | |
CN109416760B (en) | Artificial neural network | |
US20210064379A1 (en) | Refactoring MAC Computations for Reduced Programming Steps | |
US20200372330A1 (en) | Control circuit for multiply accumulate circuit of neural network system | |
US9152827B2 (en) | Apparatus for performing matrix vector multiplication approximation using crossbar arrays of resistive memory devices | |
US11544540B2 (en) | Systems and methods for neural network training and deployment for hardware accelerators | |
CN113325909B (en) | Product accumulation circuit applied to binary neural network system | |
CN114424198A (en) | Multiplication accumulator | |
Ananthakrishnan et al. | All-passive hardware implementation of multilayer perceptron classifiers | |
US11226763B2 (en) | Device for high dimensional computing comprising an associative memory module | |
TWI771014B (en) | Memory circuit and operating method thereof | |
Papandroulidakis et al. | Multi-state memristive nanocrossbar for high-radix computer arithmetic systems | |
US20220309328A1 (en) | Compute-in-memory devices, systems and methods of operation thereof | |
US20200395053A1 (en) | Integrated circuits | |
US11183238B2 (en) | Suppressing outlier drift coefficients while programming phase change memory synapses | |
US10754921B2 (en) | Resistive memory device with scalable resistance to store weights | |
Mondal et al. | Current comparator-based reconfigurable adder and multiplier on hybrid memristive crossbar | |
US11127446B2 (en) | Stochastic memristive devices based on arrays of magnetic tunnel junctions | |
Strukov | 3D hybrid CMOS/memristor circuits: basic principle and prospective applications | |
KR20210141090A (en) | Artificial neural network system using phase change material | |
CN117577151A (en) | Memory cell, memristor array, memory calculation integrated circuit and operation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: ARM LIMITED, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MATTINA, MATTHEW; DAS, SHIDHARTHA; ROSENDALE, GLEN ARNOLD; AND OTHERS; SIGNING DATES FROM 20190805 TO 20190828; REEL/FRAME: 050837/0722 |
| STCB | Information on status: application discontinuation | Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION |