WO2020131868A1 - System and method for training artificial neural networks - Google Patents

System and method for training artificial neural networks

Info

Publication number
WO2020131868A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2019/066847
Other languages
English (en)
French (fr)
Inventor
Michail TZOUFRAS
Marcin GAJEK
Original Assignee
Spin Memory, Inc.
Priority claimed from US16/223,058, published as US11586906B2
Priority claimed from US16/223,055, published as US20200193282A1
Application filed by Spin Memory, Inc.
Priority to CN201980092228.9A, published as CN113841165A
Publication of WO2020131868A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/005 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor comprising combined but independently operative RAM-ROM, RAM-PROM, RAM-EPROM cells
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C 11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron

Definitions

  • This relates generally to the field of memory applications, including but not limited to training artificial neural networks using magnetic memory.
  • Modern artificial neural networks train on massive amounts of data to optimize their internal parameters (e.g., their weights and biases) before they can be deployed.
  • The optimization process (i.e., training) involves a large number of iterations (i.e., epochs) through the training data.
  • The training requires a large amount of energy due to extensive transfer of training data to and from the chip and due to energy leakage of the on-chip memory.
  • a related issue is the large footprint of SRAM, which limits the available on-chip memory thereby increasing the need for data transfers.
  • some embodiments of the present disclosure train an ANN, at least partially, using error-prone memory.
  • the error-prone memory requires less energy than error-free memory and has a potentially smaller on-chip area footprint.
  • using an error-prone memory for part of an ANN training process does not affect the efficacy of the training process, and in fact can provide a beneficial degree of stochasticity for an initial "high-entropy" phase of the ANN training process (e.g., search of the ANN parameter space).
  • Such ANNs include, but are not limited to, fully connected networks (FNNs) and convolutional neural networks (CNNs).
  • a method is performed at a computing device that includes one or more processors, a first random access memory (RAM) comprising magnetic random access memory (MRAM), a second random access memory of a type distinct from MRAM, and a non-transitory computer-readable storage medium storing instructions for execution by the one or more processors.
  • the non-transitory computer-readable storage medium includes instructions for executing the method.
  • the method includes receiving first data on which to train an artificial neural network (ANN).
  • the method includes training the ANN by, using the first RAM comprising the MRAM, performing a first set of training iterations to train the ANN using the first data and, after performing the first set of training iterations, using the second RAM of the type distinct from MRAM, performing a second set of training iterations to train the ANN using the first data.
  • the method further includes storing values for the trained ANN, wherein the trained ANN is configured to classify second data based on the stored values.
  • each of the first set of training iterations includes: reading values for a set of weights of the ANN from the first RAM comprising the MRAM; using the one or more processors, performing a set of arithmetic operations to update the values for the set of weights of the ANN; and writing the updated set of weights of the ANN to the first RAM comprising the MRAM.
  • each of the first set of training iterations includes reading values for a set of biases of the ANN from the first RAM comprising the MRAM; using the one or more processors, performing a set of arithmetic operations to update the values for the set of biases of the ANN; and writing the updated set of biases of the ANN to the first RAM comprising the MRAM.
  • each of the second set of training iterations includes reading values for the set of weights and/or biases of the ANN from the second RAM of the type distinct from MRAM, using the one or more processors, performing the set of arithmetic operations to update the values for the set of weights and/or biases of the ANN; and writing the updated set of weights and/or biases of the ANN to the second RAM of the type distinct from the MRAM.
  • each of the second set of training iterations includes: reading values for the set of activations of the ANN from the second RAM of the type distinct from MRAM; using the one or more processors, performing the set of arithmetic operations to update the values for the set of activations of the ANN; and writing the updated set of activations of the ANN to the second RAM of the type distinct from the MRAM.
  • the first RAM comprising the MRAM is on the same chip as the one or more processors.
  • the first RAM is operated, during the first set of training iterations, as error-prone memory.
  • the stored values of the trained ANN comprise stored weights. The method further comprises, during the first set of training iterations, performing error detection that includes detecting an error in a respective weight stored in the first RAM, and replacing a value stored in the respective weight with a zero value prior to using the respective weight.
  • the first RAM has a bit error-rate below a threshold for convergence of the first set of training iterations.
  • the threshold for convergence is greater than: 10^-3, 10^-5, or 10^-7.
  • the bit error rate is greater than: 10^-4, 10^-6, or 10^-8.
  • the second RAM comprises static RAM (SRAM).
  • the first set of training iterations includes more than 20%, 40%, 60%, 80%, or 95% of a total number of training iterations used for training the ANN.
  • the method further comprises, after training the ANN, receiving second data and assigning scores to the second data using the stored values of the trained ANN.
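  • As an illustration of the flow summarized above, the following sketch simulates the two-memory scheme in plain Python/NumPy. The class names, the per-read bit-flip error model, the masking rule, and the 80/20 phase split are hypothetical stand-ins rather than the patented hardware: phase one keeps the weights in a store whose reads suffer random bit flips (standing in for MRAM operated as error-prone memory), and phase two switches to an exact store (standing in for the second RAM, e.g., SRAM or DRAM).

```python
# Illustrative simulation of the two-phase training flow; names and error model
# are hypothetical, not taken from the patent.
import numpy as np

class ExactStore:
    """Stand-in for essentially error-free RAM (e.g., SRAM/DRAM)."""
    def __init__(self, values):
        self.values = values.copy()
    def write(self, values):
        self.values = values.astype(np.float32)
    def read(self):
        return self.values.copy()

class ErrorProneStore(ExactStore):
    """Stand-in for MRAM operated error-prone: reads suffer random bit flips."""
    def __init__(self, values, bit_error_rate, rng):
        super().__init__(values)
        self.ber, self.rng = bit_error_rate, rng
    def read(self):
        bits = self.values.copy().view(np.uint32)
        flips = self.rng.random((bits.size, 32)) < self.ber                   # per-bit error events
        powers = np.left_shift(np.uint32(1), np.arange(32, dtype=np.uint32))  # 1, 2, 4, ..., 2**31
        masks = (flips * powers).sum(axis=1, dtype=np.uint32)
        return (bits ^ masks).view(np.float32)

def train_two_phase(x, y, epochs_noisy=80, epochs_clean=20, lr=0.1, ber=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, size=x.shape[1]).astype(np.float32)
    store = ErrorProneStore(w, ber, rng)                              # first set of iterations
    for epoch in range(epochs_noisy + epochs_clean):
        if epoch == epochs_noisy:                                     # second set of iterations
            store = ExactStore(store.read())
        w = store.read()
        w = np.where(np.isfinite(w) & (np.abs(w) < 1e3), w, 0.0)      # detect/mask suspect values with zeros
        z = np.clip(x @ w, -30.0, 30.0)                               # keep the sigmoid numerically tame
        pred = 1.0 / (1.0 + np.exp(-z))                               # a single logistic "layer"
        grad = x.T @ (pred - y) / len(y)
        store.write((w - lr * grad).astype(np.float32))
    return store.read()                                               # stored values for the trained model

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(256, 8)).astype(np.float32)
    y = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(np.float32)
    w = train_two_phase(x, y)
    acc = ((1.0 / (1.0 + np.exp(-np.clip(x @ w, -30, 30))) > 0.5) == y).mean()
    print(f"training accuracy after two-phase training: {acc:.2f}")
```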
  • an electronic system comprises one or more processors, a first random access memory (RAM) comprising magnetic random access memory (MRAM), a second random access memory of a type distinct from MRAM, and a non-transitory computer-readable storage medium storing instructions executable by the one or more processors.
  • the instructions include instructions for training an artificial neural network (ANN) using first data by performing a first set of training iterations using the first RAM comprising MRAM, training the ANN using the first data by performing a second set of training iterations using the second RAM of a type distinct from MRAM, and storing values for the trained ANN, wherein the trained ANN is configured to classify second data based on the stored values.
  • the electronic system includes a chip.
  • In some implementations, the electronic system is configured to perform any of the methods (A1)-(A14) described above.
  • a method performed at a computing device that includes one or more processors, magnetic random access memory (MRAM), and a non-transitory computer-readable storage medium storing instructions.
  • the non-transitory computer-readable storage medium includes instructions for executing the method.
  • the method includes receiving first data on which to train an artificial neural network (ANN).
  • the method further includes, using the MRAM, training the ANN by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing values for a set of weights of the ANN to the MRAM using first write parameters corresponding to a first write error rate.
  • Training the ANN further includes, after performing the first set of iterations, performing a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing values for the set of weights of the ANN to the MRAM using second write parameters corresponding to a second write error rate.
  • the second write error rate is lower than the first write error rate.
  • the method further includes storing values for the trained ANN, wherein the trained ANN is configured to classify second data based on the stored values.
  • each of the first set of iterations includes writing values for a set of biases and a set of activations of the ANN and each of the second set of iterations includes writing values for the set of biases and the set of activations of the ANN.
  • the first write parameters include a first write pulse duration and the second write parameters include a second write pulse duration that is longer than the first write pulse duration.
  • the first write parameters include a first write current and the second write parameters include a second write current that is greater than the first write current.
  • writing the values for the set of weights of the ANN to the MRAM using the first write parameters corresponding to the first write error rate includes writing the values without using an error-correcting code, and writing the values for the set of weights of the ANN to the MRAM using the second write parameters corresponding to the second write error rate includes writing the values using an error-correcting code.
  • each of the first set of training iterations includes reading the values for the set of weights of the ANN from the MRAM and, using the one or more processors, performing a set of arithmetic operations to update the values for the set of weights of the ANN.
  • the values for the set of weights of the ANN written to the MRAM for the iteration are the updated set of weights.
  • each of the second set of training iterations includes reading the values for the set of weights of the ANN from the MRAM, and, using the one or more processors, performing the set of arithmetic operations to update the values for the set of weights of the ANN.
  • the values for the set of weights of the ANN written to the MRAM for the iteration are the updated set of weights.
  • the MRAM is on the same chip as the one or more processors.
  • the method further includes, during the first set of training iterations, performing error detection that includes detecting an error in a respective weight stored in the MRAM and replacing a value stored for the respective weight with a zero value prior to using the respective weight.
  • the first write parameters correspond to a bit error-rate below a threshold for convergence of the first set of training iterations.
  • the bit error rate threshold for convergence is greater than: 10^-3, 10^-5, or 10^-7.
  • the first write error rate is greater than: 10^-4, 10^-6, or 10^-8.
  • the first set of training iterations includes more than 20%, 40%, 60%, 80%, or 95% of a total number of training iterations used for training the ANN.
  • each of the first set of iterations includes reading values for the set of weights of the ANN from the MRAM using first read parameters corresponding to a first read error rate, and each of the second set of iterations includes reading values for the set of weights of the ANN from the MRAM using second read parameters corresponding to a second read error rate.
  • the second read error rate is lower than the first read error rate.
  • the first read parameters include a first read pulse duration and the second read parameters include a second read pulse duration that is longer than the first read pulse duration.
  • the first read parameters include a first read current and the second read parameters include a second read current that is greater than the first read current.
  • the method further includes, after training the ANN, receiving second data and assigning scores to the second data using the stored values of the trained ANN.
  • a system having one or more processors, magnetic random access memory (MRAM), write circuitry configured to write data to the MRAM, and a non-transitory computer-readable storage medium.
  • the non-transitory computer-readable storage medium stores instructions for execution by the one or more processors, including instructions for receiving first data on which to train an artificial neural network (ANN).
  • the instructions further include instructions for, using the MRAM, training the ANN by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing, using the write circuitry, values for a set of weights of the ANN to the MRAM using first write parameters corresponding to a first write error rate.
  • the instructions include instructions for, after performing the first set of iterations, performing a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing, using the write circuitry, values for the set of weights of the ANN to the MRAM using second write parameters corresponding to a second write error rate.
  • the second write error rate is lower than the first write error rate.
  • the instructions further include instructions for storing values for the trained ANN.
  • the trained ANN is configured to classify second data based on the stored values.
  • the electronic system includes a chip.
  • the electronic system is configured to perform any of the methods (B1)-(B17) described above.
  • Although ANNs are used as a specific example of a training process that may benefit from the methods and devices described herein, it should be noted that any resource-intensive optimization process (e.g., a statistical process) may also benefit from the methods and devices described herein.
  • Some embodiments of the present disclosure apply not only to ANNs, but to any sort of optimization (e.g., statistical) process.
  • Some embodiments of the present disclosure apply to a machine learning process.
  • Such machine learning processes may include supervised learning (e.g., classification, regression), clustering (e.g., latent Dirichlet allocation), dimensionality reduction, structured prediction, anomaly detection, and reinforcement learning.
  • training a machine learning process may include training a model for any of the above applications.
  • a method is performed at a computing device that includes one or more processors, a first random access memory (RAM) comprising magnetic random access memory (MRAM), a second random access memory of a type distinct from MRAM, and a non-transitory computer-readable storage medium storing instructions for execution by the one or more processors.
  • the method includes receiving first data on which to train a machine learning process.
  • the method includes training the machine learning process by, using the first RAM comprising the MRAM, performing a first set of training iterations to train the machine learning process using the first data and, after performing the first set of training iterations, using the second RAM of the type distinct from MRAM, performing a second set of training iterations to train the machine learning process using the first data.
  • the method further includes storing values for the machine learning process based on the training. The values for the machine learning process are used to re-configure a machine (e.g., cause a machine to operate differently than before the machine was re-configured using the values).
  • a method performed at a computing device that includes one or more processors, magnetic random access memory (MRAM), and a non-transitory computer-readable storage medium storing instructions for execution by the one or more processors.
  • the instructions include instructions for receiving first data on which to train a machine learning process.
  • the instructions further include instructions for, using the MRAM, training the machine learning process by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing values for the machine learning process to the MRAM using first write parameters corresponding to a first write error rate.
  • Training the machine learning process further includes, after performing the first set of iterations, performing a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing values for the machine learning process to the MRAM using second write parameters corresponding to a second write error rate.
  • the second write error rate is lower than the first write error rate.
  • the instructions further include instructions for storing values for the machine learning process.
  • the values for the machine learning process are used to re-configure a machine (e.g., cause a machine to operate differently than before the machine was re-configured using the values).
  • Figure 1 illustrates a schematic diagram of a chip structure in accordance with some implementations.
  • Figure 2 illustrates a schematic diagram of a chip structure in accordance with some implementations.
  • Figure 3A illustrates a graph of a loss function in accordance with some implementations.
  • Figure 3B illustrates a graph of training loss for a plurality of iterations in accordance with some implementations.
  • Figures 4A-4B illustrate a method for training an artificial neural network in accordance with some implementations.
  • Figures 5A-5C illustrate a method for training an artificial neural network in accordance with some implementations.
  • Figure 6 is a block diagram of a computer system for training an artificial neural network, according to some embodiments.
  • Like reference numerals refer to corresponding parts throughout the several views of the drawings.
  • a fully connected artificial neural network can be represented mathematically as the following:
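  • A minimal rendering of the referenced equations, written here in the standard fully connected form and consistent with the symbol definitions that follow (the exact notation in the original filing may differ; f denotes the element-wise activation function, which is an assumption not spelled out in the surviving text):

```latex
\begin{aligned}
A_1 &= f\left(W_1 X + b_1\right) && \text{(1a)}\\
A_l &= f\left(W_l A_{l-1} + b_l\right), \quad l = 2, \dots, n-1 && \text{(1b)}\\
O   &= f\left(W_n A_{n-1} + b_n\right) && \text{(1c)}
\end{aligned}
```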
  • In Equations (1a)-(1c) above, A_l represents the activations, W_l represents the weights, and b_l represents the biases.
  • the variable X represents the input data.
  • the parameter n represents the number of layers in the neural network.
  • the last layer of activations, O, is referred to as the outputs.
  • An ANN training process aims to optimize certain network parameters (e.g., weights and biases) through an iterative process.
  • the total number of parameters often exceeds 10^6 and can reach 10^9.
  • Finding a global minimum in such a multidimensional space is a huge challenge.
  • the process starts by initializing the network parameters with random values.
  • the network parameters are adjusted to reduce an error metric over the 10^x-dimensional landscape (e.g., ideally to find a global minimum of the error metric). The starting point is usually far from the minimum, and there is a danger that the iterative process may get trapped in a local minimum.
  • the ANN training process is split into two parts: a first set of iterations and a second set of iterations.
  • some implementations replace highly-accurate memory in an ANN training process with an error-prone memory.
  • the detriment to the efficacy is small and often worth the gains in energy cost, chip area, and speed. This is especially true when the errors are detectable and can be constrained.
  • the first set of iterations may comprise most iterations (e.g., epochs), from the beginning of training until almost the end (e.g., until the last 10 epochs).
  • an error-prone energy-efficient memory is used, such as MRAM or a combination of MRAM and small SRAM buffers.
  • During the second set of iterations (e.g., the final epochs), an essentially error-free memory is used, such as SRAM, DRAM, or MRAM with error correction. This error-free memory enables the process to hone in on a minimum of the loss function for the network parameters (e.g., a global minimum, or at least a local minimum).
  • MRAM can be used as error-prone memory or an essentially-error-free memory depending on read/write times (e.g., durations of electrical pulses used to read/write the MRAM).
  • During the first set of iterations, the MRAM can be operated with faster read/write times (e.g., as compared to error-free MRAM) because the error rates associated with short read/write times do not affect the result.
  • MRAM is used for all of the iterations to train the ANN, but the read and/or write speed is reduced during the final stages of the calculation. This makes read and write operations less prone to errors as the minimum is approached.
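  • One way to picture this is as a write-parameter schedule keyed to training progress. The pulse durations, error rates, and 90% phase split in the sketch below are invented numbers for illustration, not values from the specification: early epochs use short, fast write pulses that tolerate a higher error rate, and the final epochs use longer pulses for essentially error-free writes.

```python
# Hypothetical write-parameter schedule: fast, error-prone writes for most of
# training, slower, essentially error-free writes for the final epochs.
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteParams:
    pulse_ns: float        # write pulse duration
    expected_ber: float    # expected bit error rate at this pulse duration

FAST_WRITE = WriteParams(pulse_ns=3.0, expected_ber=1e-4)    # first set of iterations
SAFE_WRITE = WriteParams(pulse_ns=20.0, expected_ber=1e-9)   # second set of iterations

def write_params_for_epoch(epoch: int, total_epochs: int,
                           error_prone_fraction: float = 0.9) -> WriteParams:
    """Return the write parameters to use for a given epoch."""
    return FAST_WRITE if epoch < error_prone_fraction * total_epochs else SAFE_WRITE

if __name__ == "__main__":
    total = 100
    for epoch in (0, 50, 89, 90, 99):
        p = write_params_for_epoch(epoch, total)
        print(f"epoch {epoch:3d}: pulse {p.pulse_ns:5.1f} ns, expected BER {p.expected_ber:.0e}")
```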
  • Figure 1 illustrates a schematic diagram of an electronic system that includes a chip 102.
  • the system includes a first type of RAM (e.g., error-prone MRAM) and a second type of RAM (e.g., error-free RAM, which may comprise one or more external (e.g., off-chip) devices, such as RAM chips 100).
  • RAM chips 100 comprise DRAM or another form of memory (e.g., that is distinct from magnetic RAM).
  • the chip 102 includes a RAM interface 104 (e.g., a DDR3 interface) that facilitates communication between the chip 102 and the external RAM chips 100.
  • the chip 102 includes SRAM buffer(s)/cache(s) 108 for buffering data to be used by processors 112 during an ANN training process.
  • SRAM buffer(s)/cache(s) 108 buffers data stored off-chip (e.g., in external RAM 100) and/or data stored in MRAM 110 (e.g., error-prone memory).
  • data used to train the ANN is stored in MRAM 110 (e.g., all of the data needed to perform a first set of iterations of an ANN training process is stored in MRAM 110).
  • Data is cached from MRAM 110 as needed by SRAM buffer(s)/cache(s) 108 so that it is available to the processor 112 (e.g., an arithmetic logic unit (ALU)), which performs the calculations necessary to train the ANN.
  • the data includes values, such as weights, activations, and biases, of the ANN.
  • the data includes intermediate values (e.g., during the training of the ANN).
  • During a second set of iterations of the ANN training process, data used to train the ANN is stored in RAM chips 100.
  • the data is bussed on and off the chip 102 through RAM interface 104.
  • Data is cached from RAM chips 100 as needed by SRAM buffer(s)/cache(s) 108 so that it is available to the processor 112.
  • the second set of iterations produces final values of the ANN parameters (e.g., values of weights and biases after the ANN has been fully trained), which are exported to a host computer via host interface 106.
  • processing unit(s) 112 executes instructions for training an ANN (e.g., instructions for performing the process 400).
  • the instructions executable by the one or more processor unit(s) 112 are stored in a non-transitory computer-readable storage medium.
  • the instructions are stored on chip 102.
  • the instructions are stored off-chip (e.g., in RAM chips 100).
  • chip 102 includes two distinct types of memory, including MRAM 110 and a second type of memory distinct from MRAM.
  • RAM chips 100 are illustrated as separate from chip 102, it is to be understood that in some implementations, the data stored on RAM chips 100 is stored on chip 102.
  • the first RAM for the first set of iterations (e.g., MRAM 110) and the second RAM, distinct from the first RAM, for the second set of iterations, reside on the chip 102.
  • the one or more processors (e.g., processor unit(s) 112) reside on the chip.
  • a non-transitory computer readable storage medium storing instructions for training the ANN resides on the chip 102.
  • the non-transitory computer-readable storage medium is loaded (e.g., written) with the instructions (e.g., from a host computer) when the chip 102 is powered-up.
  • the non-transitory computer-readable storage medium comprises a portion of first RAM or second RAM.
  • chip 102 is used to classify untrained second data.
  • the off-chip memory (e.g., RAM chips 100) stores some or all of the second data.
  • Figure 2 illustrates a schematic diagram of a computing device (e.g., chip 202) in accordance with some implementations. In particular, Figure 2 illustrates a computing device in which both the first set of ANN training iterations and the second set of ANN training iterations are performed using the same RAM (e.g., MRAM 210).
  • During the first set of ANN training iterations, the first RAM is operated in an error-prone mode, and during the second set of ANN training iterations, the first RAM is operated in an error-free mode.
  • the chip 202 includes a host interface.
  • the MRAM 210 is communicatively coupled with write circuitry 214 for writing data (e.g., ANN weights calculated during training iterations for the ANN) to the MRAM and read circuitry 216 for reading data (e.g., values for the ANN weights) from the MRAM.
  • the write circuitry includes word lines and bit lines (e.g., wires) and sets of corresponding transistors (e.g., for activating the word lines and bit lines).
  • the write circuitry includes or is coupled with memory storing the first write parameters and second write parameters.
  • the read circuitry is configured to modify read parameters for reading values from the MRAM.
  • the write circuitry is configured to modify write parameters for writing values to the MRAM.
  • chip 102 also includes similar write circuitry and read circuitry, but for simplicity, those features are not shown in Figure 1.
  • the chip 202 includes a non-transitory computer-readable storage medium storing instructions for receiving first data on which to train an artificial neural network (ANN).
  • the instructions further include instructions for, using the MRAM 210, training the ANN by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing, using the write circuitry, values for a set of weights of the ANN to the MRAM using first write parameters corresponding to a first write error rate.
  • the instructions include instructions for, after performing the first set of iterations, performing a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing, using the write circuitry, values for the set of weights of the ANN to the MRAM using second write parameters corresponding to a second write error rate.
  • the second write error rate is lower than the first write error rate.
  • the instructions further include instructions for storing values for the trained ANN.
  • the trained ANN is configured to classify second data based on the stored values.
  • chip 202 trains the ANN using MRAM (e.g., for all of the iterations), reads/writes parameters during the first set of iterations for training the ANN with high read/write error rates, and reads/writes parameters during the second set of iterations using a lower read/write error rate than the read/write error rate for the first set of iterations.
  • chip 202 includes any or all of the modules of chip 102, described above with reference to Figure 1.
  • Figure 3A illustrates a graph of a loss function 300 of an ANN. The graph of the loss function 300 is simplified to show error as a function of a single parameter. It should be noted, however, that the total number of parameters often exceeds 10^6 and can reach 10^9.
  • a first set of iterations (e.g., represented by dashed lines) are performed on error-prone memory (e.g., MRAM).
  • the first set of training iterations begins with iteration 302.
  • a second set of iterations 304 (e.g., represented by solid lines, performed after the first set of iterations are performed) are performed on error-free memory such as SRAM, DRAM, or MRAM with error correction.
  • the first set of training iterations includes more than 20%, 40%, 60%, 80%, or 95% of a total number of training iterations used for training the ANN.
  • the second set of training iterations 304 are performed as the training iterations converge to a minimum of the loss function 300.
  • a set of weights, biases and/or activations are updated.
  • Figure 3B illustrates a graph of the training loss for the first set of iterations.
  • the errors from the first set of iterations are not too large and the ANN is still able to be trained (e.g., to calculate and store weights to be applied to untrained data).
  • the first set of iterations are performed on MRAM (e.g., or a combination of MRAM and SRAM), and the second set of iterations are performed on SRAM (e.g., or a combination of SRAM and DRAM).
  • errors can take place at some rate that is not too high, e.g., 0.01% - 50%. Then, error detection and masking can be performed such that when an erroneous number is detected, it is either replaced (e.g., masked) with a zero value or its value is constrained in order not to overwhelm the rest of the data.
  • the resulting information contains a small number of constrained errors, e.g., 0.01%-50% erroneous zeros. The training continues as usual despite these errors.
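  • A compact sketch of the detect-and-mask step described above, written in NumPy; the error flags are assumed to come from whatever detection mechanism the memory provides (e.g., a parity check), and the helper name is illustrative rather than part of the patented circuitry:

```python
# Illustrative detect-and-mask step: values flagged as erroneous are replaced
# with zeros (or constrained) so they do not overwhelm the rest of the data.
from typing import Optional
import numpy as np

def mask_detected_errors(values: np.ndarray, error_flags: np.ndarray,
                         clamp: Optional[float] = None) -> np.ndarray:
    """Zero out (or constrain) entries whose error flag is set, before they are used in training."""
    cleaned = values.copy()
    if clamp is None:
        cleaned[error_flags] = 0.0                                           # mask with zeros
    else:
        cleaned[error_flags] = np.clip(cleaned[error_flags], -clamp, clamp)  # constrain instead
    return cleaned

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=8).astype(np.float32)
    flags = np.zeros(8, dtype=bool)
    flags[[2, 5]] = True                             # pretend the memory flagged two bad reads
    print(mask_detected_errors(weights, flags))
```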
  • FIGS 4A-4B illustrate a method 400 for training an artificial neural network in accordance with some implementations.
  • Method 400 is performed at a computing device (402) (e.g., the computing device shown in Figure 1) that includes one or more processors (e.g., processor unit(s) 112), a first random access memory (RAM) comprising magnetic random access memory (MRAM) (e.g., MRAM 110), a second random access memory of a type distinct from MRAM (e.g., RAM chips 100), and a non-transitory computer-readable storage medium having instructions for execution by the one or more processors.
  • the first RAM comprising the MRAM is (404) on the same chip as the one or more processors (e.g., as shown in Figure 1).
  • MRAM allows for an increase in on-chip memory capacity and thus reduces the need for data movement on and off the chip. Data movement requires significant energy, and thus using MRAM in method 400, as described herein, is more efficient than conventional ANN training methods that use conventional types of memory.
  • the second RAM of the type distinct from MRAM is on the same chip as the first RAM comprising the MRAM.
  • In some embodiments, the first RAM comprising MRAM is on a first chip that includes one or more first processors, and the second RAM distinct from MRAM is on one or more second chips (e.g., RAM chips 100), distinct from the first chip, that include one or more second processors, distinct from the one or more first processors.
  • the second RAM comprises (406) static RAM (SRAM).
  • the second RAM comprises dynamic RAM (DRAM). In some embodiments, the second RAM comprises high-bandwidth memory (HBM).
  • Method 400 includes receiving (408) first data on which to train an artificial neural network (ANN).
  • the first data is stored in a database (e.g., stored on-chip or off-chip).
  • the first data is received in response to an instruction from a host (e.g., via host interface 106).
  • the method further includes training (410) the ANN by, using the first RAM comprising the MRAM, performing (412) a first set of training iterations (e.g., epochs) to train the ANN using the first data.
  • the first set of training iterations includes (at least) the first half of a total number of training iterations.
  • the first set of training iterations includes all of the training iterations except for a last N iterations, where N is an integer.
  • each of the first set of training iterations includes (414) reading values for a set of weights of the ANN from the first RAM comprising the MRAM and, using the one or more processors, performing a set of arithmetic operations to update the values for the set of weights of the ANN.
  • the computing device writes the updated set of weights of the ANN to the first RAM comprising the MRAM.
  • the updated values of the set of weights written to the first RAM comprise intermediate values (e.g., not final values to be applied to untrained data).
  • the updated values written during the first set of training iterations are intermediate values because the final values are determined after performing the second set of training iterations.
  • each of the first set of training iterations includes (416) reading values for a set of biases (and/or activations) of the ANN from the first RAM comprising the MRAM, and, using the one or more processors, performing a set of arithmetic operations to update the values for the set of biases (and/or activations) of the ANN.
  • the computing device writes the updated set of biases (and/or activations) of the ANN to the first RAM comprising the MRAM.
  • the MRAM comprises magnetic tunnel junctions (MTJs).
  • the MRAM comprises a spin-transfer torque (STT) memory.
  • Data stored in the MRAM is encoded using the relative orientation of two or more magnetic layers (e.g., a free layer and a reference layer) in a bit.
  • The value of the bit (e.g., the value of the data stored in the bit) is read using the magnetoresistance effect (e.g., an anti-parallel arrangement of the respective magnetizations of the free layer and the fixed layer has a different resistance than a parallel arrangement of the same).
  • An MRAM bit is written by applying a current pulse (e.g., a write pulse having an amplitude and temporal length) to the MRAM bit, to switch the bit under the action of spin-transfer torque.
  • the MRAM bit is read by applying a smaller pulse (e.g., a read pulse having a smaller amplitude and/or shorter temporal length) to the bit to determine its resistance.
  • the voltage (and thus current) of the read pulse should be high enough and applied for long enough to allow the MTJ state to be determined (e.g., to allow the bit's resistance to be determined by a sense amplifier), but the voltage should not be so high, or the read pulse so long, that the data is disturbed (e.g., through the action of STT).
  • For write pulses, the voltage should be high enough and applied for long enough so that the information is correctly and reliably written, but not so high or so long that the write pulse would stress or break the MTJ.
  • Write errors occur when the write pulse voltage amplitude is not high enough (or the write pulse not long enough) to write the data to the MRAM.
  • Breakdown errors occur when the write voltage amplitude for writing is so high (or the write pulse so long) that the MRAM bit is damaged due to breakdown of the tunnel oxide barrier.
  • The probability that the data is retained correctly can be calculated from the MRAM characteristics.
  • Read errors occur when the applied voltage amplitude is not high enough (or the read pulse is not long enough) to detect the resistance state of the MRAM bit. These errors arise due to the sense amplifier, not the MRAM.
  • Read disturb errors occur when the read voltage is so high (or the read pulse is so long) that it disturbs the state of the MRAM bit (e.g., effectively writes the MRAM bit) while attempting to read it.
  • the read disturb probability can be calculated from the read pulse and the MRAM characteristics.
  • read errors are preferable to read disturb errors.
  • the read operations described herein are performed using read parameters that limit the number of read disturb errors (e.g., a read voltage and/or pulse length is below a respective threshold for causing read disturb errors, even at the cost of a greater number of read errors).
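  • The preference described above can be phrased as a simple selection rule: discard candidate read pulses that exceed the disturb thresholds, then take the strongest remaining pulse to minimize ordinary read errors. The candidate pulses and thresholds in the sketch below are invented for illustration:

```python
# Hypothetical read-pulse selection: stay below the read-disturb thresholds,
# then pick the strongest remaining pulse to give the sense amplifier margin.
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadPulse:
    voltage_mv: float
    duration_ns: float

def choose_read_pulse(candidates, disturb_voltage_mv, disturb_duration_ns):
    safe = [p for p in candidates
            if p.voltage_mv < disturb_voltage_mv and p.duration_ns < disturb_duration_ns]
    if not safe:
        raise ValueError("no candidate read pulse stays below the disturb thresholds")
    # A stronger (higher-voltage, longer) pulse lowers plain read errors, so among
    # the safe pulses take the one with the largest voltage-duration product.
    return max(safe, key=lambda p: p.voltage_mv * p.duration_ns)

if __name__ == "__main__":
    pulses = [ReadPulse(100, 2), ReadPulse(150, 4), ReadPulse(250, 8), ReadPulse(400, 12)]
    print(choose_read_pulse(pulses, disturb_voltage_mv=300, disturb_duration_ns=10))
```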
  • the first RAM is operated (418), during the first set of training iterations, as error-prone memory.
  • the computing device does not perform error-correction (or performs minimal error-correction) to the MRAM during the first set of training iterations.
  • the stored values of the trained ANN comprise stored weights and the method further comprises, during the first set of training iterations, performing (420) error detection that includes detecting an error in a respective weight (or an activation or bias) stored in the first RAM.
  • a value for a weight may be incorrect (e.g., erroneous) because (I) the stored value for the weight is wrong (e.g., a write error, a retention error, or a read disturb error); or (II) the stored value for the weight is correct but it was read incorrectly.
  • a zero value is used in its place for the iteration (e.g., an iteration starts with an attempt to read the set of weights, errors are detected in the weights, and the calculation for the iteration is performed with the zero values replacing the errors).
  • the computing device replaces a value stored in the respective weight with a zero value prior to using the respective weight (e.g., in forward or back propagation).
  • the method includes determining whether the erroneous value was read correctly (e.g., determining whether the value was stored incorrectly or read incorrectly).
  • the replacing of the value stored in the respective weight with the zero value is performed in accordance with a determination that the erroneous value was read correctly.
  • detecting and masking errors with zero values significantly relaxes the requirements for convergence (e.g., the first set of training iterations can tolerate a much higher bit error rate and still converge). Errors can occur with respect to stored biases, activations, and possibly other network parameters, in addition to stored weights. The above discussion of errors applies generally to any network parameters.
  • the first RAM has (422) a bit error-rate below a threshold for convergence of the first set of training iterations.
  • the bit error-rate can be configured for the first RAM.
  • the bit error rate of MRAM depends on a write pulse duration and/or a write current, and the write pulse duration and/or write current are selected (e.g., from a calibration curve) to operate the first RAM with a bit error rate below the threshold for convergence during the first set of training iterations.
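  • A sketch of the calibration-curve selection just mentioned: interpolate a measured bit-error-rate-versus-pulse-duration curve and return the shortest duration whose predicted error rate falls below the convergence threshold. The calibration points below are invented for illustration:

```python
# Hypothetical calibration-curve lookup: choose the shortest write pulse whose
# predicted bit error rate is below the convergence threshold.
import numpy as np

# Invented calibration data: measured BER at a few write pulse durations (ns).
CAL_PULSE_NS = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
CAL_BER      = np.array([1e-2, 1e-3, 1e-5, 1e-7, 1e-9])

def shortest_pulse_below(ber_threshold: float, n_grid: int = 1000) -> float:
    """Interpolate the calibration curve (log-log) and return the shortest pulse meeting the threshold."""
    grid = np.linspace(CAL_PULSE_NS[0], CAL_PULSE_NS[-1], n_grid)
    ber = 10 ** np.interp(np.log10(grid), np.log10(CAL_PULSE_NS), np.log10(CAL_BER))
    ok = grid[ber < ber_threshold]
    if ok.size == 0:
        raise ValueError("no calibrated pulse duration meets the requested error rate")
    return float(ok[0])

if __name__ == "__main__":
    for threshold in (1e-3, 1e-5, 1e-7):
        print(f"BER < {threshold:.0e}: shortest pulse ~ {shortest_pulse_below(threshold):.1f} ns")
```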
  • the threshold for convergence is greater than (424): 10^-3, 10^-5, or 10^-7 (e.g., in terms of a bit error rate).
  • the method is performed without error detection and masking, and the threshold for convergence is greater than 10^-2, 10^-3, or 10^-4.
  • the bit error rate is greater than (426): 10^-4, 10^-6, or 10^-8.
  • the first set of training iterations includes (428) more than 20%, 40%, 60%, 80%, or 95% of a total number of training iterations used for training the ANN.
  • the method includes, after performing the first set of training iterations, using the second RAM of the type distinct from MRAM, performing (430) a second set of training iterations to train the ANN using the first data.
  • the second set of training iterations include all of the remaining iterations that are not included in the first set of training iterations.
  • the total training iterations for training the ANN comprise the first set of training iterations and the second set of training iterations.
  • each of the second set of training iterations includes (432) reading values for the set of weights and/or biases of the ANN from the second RAM of the type distinct from MRAM and, using the one or more processors, performing the set of arithmetic operations to update the values for the set of weights and/or biases of the ANN.
  • each of the second set of training iterations further includes writing the updated set of weights and/or biases of the ANN to the second RAM of the type distinct from the MRAM.
  • the updated values for the set of weights and/or biases written during the second set of training iterations comprise intermediate values.
  • the updated values for the set of weights and/or biases (and/or activations) of the ANN comprise final values, wherein the final values are stored such that the computing device can apply the final stored values to received second data (e.g., unclassified second data) in order to classify the second data.
  • the intermediate stored values are stored in MRAM (e.g., on the chip) and the final stored values are stored in non-volatile off-chip memory.
  • all intermediate stored values are stored in (e.g., available from) MRAM, and there is no need to bus data on and/or off chip.
  • only a sub-portion of the intermediate stored values are stored in MRAM.
  • the method further includes storing (434) values (e.g., weights and/or biases) for the trained ANN, wherein the trained ANN is configured to classify second data based on the stored values (e.g., classifying the second data by assigning classification scores).
  • the computing device receives (436) second data (e.g., untrained data); and assigns scores to the second data using the stored values of the trained ANN.
  • after assigning scores to (e.g., classifying) the second data using the stored (final) values of the trained ANN, the computing device provides the assigned scores to a host device (e.g., via host interface 106).
  • an electronic system (e.g., system shown in Fig. 1) is provided.
  • the electronic system includes one or more processors, a first random access memory (RAM) comprising magnetic random access memory (MRAM), a second random access memory of a type distinct from MRAM and a non-transitory computer-readable storage medium storing instructions executable by the one or more processors.
  • the instructions include instructions for training an artificial neural network (ANN) using first data by performing a first set of training iterations using the first RAM comprising MRAM, training the ANN using the first data by performing a second set of training iterations using the second RAM of a type distinct from MRAM, and storing values for the trained ANN.
  • the trained ANN is configured to classify second data based on the stored values.
  • the electronic system includes a chip (e.g., chip 102).
  • the first RAM (e.g., MRAM 110) and the second RAM (e.g., at least a portion of the second RAM stored in SRAM buffer(s)/cache(s) 108) reside on the chip 102.
  • the one or more processors (e.g., processor unit(s) 112) reside on the chip.
  • the one or more processors comprise an arithmetic logic unit (ALU).
  • a non-transitory computer readable storage medium resides on the chip.
  • the non-transitory computer-readable storage medium is loaded (e.g., written) with the instructions when the chip is powered-up.
  • the non-transitory computer-readable storage medium comprises a portion of first RAM or second RAM.
  • the electronic system includes an off-chip memory (e.g., DRAM, HBM, RAM chips 100) that holds some or all of the first data during the first set of training iterations and/or the second set of training iterations (e.g., the first data is bussed on and off the chip as needed during the first and second sets of iterations).
  • After receiving the second data, the off-chip memory stores some or all of the second data (e.g., while the scores are being assigned to the second data).
  • an off-chip memory (e.g., a non-volatile memory) stores the instructions when the chip is powered off.
  • the chip includes a buffer (e.g., SRAM buffer(s)/cache(s) 108) that is communicatively coupled with the off-chip memory.
  • the buffer comprises a portion of the first RAM or the second RAM.
  • the electronic system is configured to perform any of the operations of method 400.
  • FIGs 5A-5B illustrate a method 500 for training an artificial neural network.
  • the method 500 is performed (502) at a computing device (e.g., chip 202 as shown in Figure 2) that includes one or more processors, magnetic random access memory (MRAM), and a non-transitory computer-readable storage medium storing instructions for execution by the one or more processors.
  • the MRAM is (504) on the same chip as the one or more processors.
  • the method includes receiving (506) first data on which to train an artificial neural network (ANN).
  • the computing device trains (508) the ANN by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing values for a set of weights of the ANN to the MRAM using first write parameters corresponding to a first write error rate.
  • Various types of errors are discussed above with reference to method 400 (Figures 4A-4B). For brevity, those details are not repeated here.
  • the error rates described herein can refer to, in accordance with various embodiments, a specific error rate (e.g., an error rate for a specific type of error) or a net error rate (e.g., a rate based on the combination of errors in which (I) the stored value for the weight is wrong (e.g., a write error, a retention error, or a read disturb error) or (II) the stored value for the weight is correct but it was read incorrectly).
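  • Under an independence assumption (a framing assumed here, not a formula given in the specification), the net error rate can be combined from the two mechanisms as follows:

```python
def net_error_rate(p_stored_wrong: float, p_read_wrong: float) -> float:
    """Probability a value comes back wrong, assuming the two mechanisms are independent
    and ignoring the (rare) case where a read error cancels a storage error."""
    return 1.0 - (1.0 - p_stored_wrong) * (1.0 - p_read_wrong)

# e.g. net_error_rate(1e-6, 1e-4) is approximately 1.01e-4
```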
  • each of the first set of training iterations includes (510) reading the values for the set of weights of the ANN from the MRAM and, using the one or more processors, performing a set of arithmetic operations to update the values for the set of weights of the ANN.
  • the values for the set of weights of the ANN written to the MRAM for the iteration are the updated set of weights.
  • the computing device performs (512) error detection that includes detecting an error in a respective weight (or an activation or bias) stored in the MRAM and replaces a value stored for the respective weight with a zero value prior to using the respective weight (e.g., in forward or back propagation).
  • detecting and masking errors with zero values significantly relaxes the requirements for convergence (e.g., the first set of training iterations can tolerate a much higher bit error rate and still converge).
  • the first write parameters correspond to (514) a bit error-rate below a threshold for convergence of the first set of training iterations (e.g., generate errors during a read process at a bit error rate that is below the threshold for convergence).
  • the bit error rate can be configured for the MRAM.
  • the bit error rate of MRAM depends on a write pulse duration and/or a write current, and the write pulse duration and/or write current are selected (e.g., from a calibration curve) to operate the MRAM with a bit error rate below the threshold for convergence during the first set of training iterations.
  • the threshold for convergence is (516) greater than: 10^-3, 10^-5, or 10^-7.
  • the method is performed without error detection and masking, and the threshold for convergence is greater than 10^-2, 10^-3, or 10^-4.
  • the first write error rate is (518) greater than: 10^-4, 10^-6, or 10^-8.
  • the first set of training iterations includes (520) more than 20%, 40%, 60%, 80%, or 95% of a total number of training iterations used for training the ANN.
  • After performing the first set of iterations, the computing device performs (522) a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing values for the set of weights of the ANN to the MRAM using second write parameters corresponding to a second write error rate (e.g., generate errors during a read process at a bit error rate that is below the threshold for convergence).
  • the second write error rate is lower than the first write error rate.
  • the write error rate is gradually reduced.
  • there are more than two sets of iterations including a third set of iterations.
  • Each of the third set of iterations includes writing values for the set of weights of the ANN to the MRAM using third write parameters corresponding to a third write error rate.
  • the third write error rate is lower than the first write error rate and the second write error rate.
  • each set of iterations includes a single iteration. In some embodiments, each set of iterations includes a plurality of iterations.
  • each of the first set of iterations includes (524) writing values for a set of biases and a set of activations of the ANN and each of the second set of iterations includes writing values for the set of biases and the set of activations of the ANN.
  • the first write parameters include (526) a first write pulse duration, and the second write parameters include a second write pulse duration that is longer than the first write pulse duration.
  • the first write parameters include (528) a first write current, and the second write parameters include a second write current that is greater than the first write current.
  • writing the values for the set of weights of the ANN to the MRAM using the write parameters corresponding to the first write error rate includes (530) writing the values without using an error-correcting code, and writing the values for the set of weights of the ANN to the MRAM using the second write parameters corresponding to the second write error rate includes writing the values using an error-correcting code.
  • the first set of iterations includes error detection and masking, as described above, but not error correction.
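  • To make the ECC contrast concrete, the sketch below uses a textbook Hamming(7,4) code: first-phase writes store raw data bits (errors can only be detected and masked, as described above), while second-phase writes append parity bits so that a single-bit error is corrected on read. The particular code and helper names are illustrative; the specification does not mandate a specific error-correcting code.

```python
# Hamming(7,4) sketch: ECC off for the error-prone first phase, on for the
# essentially error-free second phase (single-bit errors corrected on read).
def hamming74_encode(d):                      # d: 4 data bits (d1..d4)
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]       # codeword positions 1..7

def hamming74_decode(c):                      # c: 7-bit codeword, possibly with one flipped bit
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]            # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]            # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]            # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4
    c = list(c)
    if syndrome:                              # flip the single erroneous bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]           # recover d1..d4

def write_nibble(nibble, use_ecc):
    """Phase 1: store raw bits. Phase 2: store the 7-bit Hamming codeword."""
    return hamming74_encode(nibble) if use_ecc else list(nibble)

def read_nibble(stored, use_ecc):
    return hamming74_decode(stored) if use_ecc else list(stored)

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    cw = write_nibble(data, use_ecc=True)
    cw[5] ^= 1                                    # inject a single bit error
    assert read_nibble(cw, use_ecc=True) == data  # corrected with ECC (second phase)
    raw = write_nibble(data, use_ecc=False)
    raw[2] ^= 1                                   # same error without ECC goes uncorrected
    assert read_nibble(raw, use_ecc=False) != data
    print("ECC demo passed")
```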
  • each of the second set of training iterations includes (532) reading the values for the set of weights of the ANN from the MRAM and, using the one or more processors, performing the set of arithmetic operations to update the values for the set of weights of the ANN.
  • the values for the set of weights of the ANN written to the MRAM for the iteration are the updated set of weights.
  • each of the first set of iterations includes (534) reading values for the set of weights of the ANN from the MRAM using first read parameters corresponding to a first read error rate and each of the second set of iterations includes reading values for the set of weights of the ANN from the MRAM using second read parameters corresponding to a second read error rate.
  • the second read error rate is lower than the first read error rate.
  • each of the third set of iterations includes reading values for the set of weights of the ANN from the MRAM using third read parameters corresponding to a third read error rate.
  • the third read error rate is lower than the first read error rate and the second read error rate (e.g., the read error rate is gradually reduced as the training progresses).
  • the first read parameters include (536) a first read pulse duration and the second read parameters include a second read pulse duration that is longer than the first read pulse duration.
  • the first read parameters include (538) a first read current and the second read parameters include a second read current that is greater than the first read current.
  • the computing device stores (540) values for the trained ANN.
  • the trained ANN is configured to classify second data (e.g., untrained data) based on the stored values.
  • the computing device receives (542) second data and assigns scores to the second data using the stored values of the trained ANN.
  • an electronic system includes one or more processors, magnetic random access memory (MRAM), write circuitry configured to write data to the MRAM and a non-transitory computer-readable storage medium storing instructions for execution by the one or more processors.
  • the write circuitry includes word lines and bit lines (e.g., wires) and sets of corresponding transistors (e.g., for activating the word lines and bit lines).
  • the write circuitry includes or is coupled with memory storing the first write parameters and second write parameters.
  • the stored instructions include instructions for receiving first data on which to train an artificial neural network (ANN).
  • the instructions further include instructions for, using the MRAM, training the ANN by performing a first set of training iterations on the first data.
  • Each of the first set of iterations includes writing, using the write circuitry, values for a set of weights of the ANN to the MRAM using first write parameters corresponding to a first write error rate.
  • the instructions include instructions for, after performing the first set of iterations, performing a second set of training iterations on the first data.
  • Each of the second set of iterations includes writing, using the write circuitry, values for the set of weights of the ANN to the MRAM using second write parameters corresponding to a second write error rate.
  • the second write error rate is lower than the first write error rate.
  • the instructions further include instructions for storing values for the trained ANN.
  • the trained ANN is configured to classify second data based on the stored values.
  • the electronic system includes a chip (e.g., chip 202).
  • the MRAM resides on the chip.
  • the one or more processors reside on the chip.
  • the one or more processors comprise an arithmetic logic unit (ALU).
  • a non-transitory computer readable storage medium resides on the chip.
  • the non-transitory computer-readable storage medium is loaded (e.g., written) with the instructions when the chip is powered-up.
  • the non-transitory computer-readable storage medium comprises a portion of MRAM.
  • the electronic system includes an off-chip memory (e.g., DRAM, HBM) that holds some or all of the first data during the first set of training iterations and/or the second set of training iterations (e.g., the first data is bussed on and off the chip as needed during the first and second sets of iterations).
  • After receiving the second data, the off-chip memory stores some or all of the second data (e.g., while the scores are being assigned to the second data).
  • an off-chip memory (e.g., a non-volatile memory) stores the instructions when the chip is powered off.
  • the chip includes a buffer that is communicatively coupled with the off-chip memory.
  • the buffer comprises a portion of the MRAM.
  • the buffer comprises a memory of type distinct from MRAM (e.g., SRAM).
  • the electronic system is configured to perform any of the operations described with reference to method 500.
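To make the two-phase write schedule described above concrete, the following is a minimal software-level sketch, assuming a simulated MRAM whose writes flip each stored bit with a phase-dependent probability. The toy logistic-regression "ANN," the fixed-point format, the error rates, and the data are all illustrative assumptions; the specification describes a hardware mechanism, and this simulation only mimics its effect on the stored weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def write_to_mram(weights: np.ndarray, write_error_rate: float) -> np.ndarray:
    """Simulate an MRAM write: quantize to 16-bit fixed point and flip each
    stored bit with probability `write_error_rate` (illustrative model only)."""
    scale = 4.0 / 32768.0                          # assumed range of roughly [-4, 4)
    q = np.clip(np.round(weights / scale), -32768, 32767).astype(np.int16)
    bits = np.unpackbits(q.view(np.uint8))         # 16 bits per stored weight
    flips = (rng.random(bits.shape) < write_error_rate).astype(np.uint8)
    noisy = np.packbits(bits ^ flips).view(np.int16)
    return noisy.astype(np.float64) * scale

# Toy, linearly separable training data and a logistic-regression "ANN".
X = rng.normal(size=(512, 8))
y = (X @ rng.normal(size=8) > 0).astype(np.float64)

def train(first_iters=200, second_iters=200, p_first=1e-3, p_second=1e-6, lr=0.1):
    w = np.zeros(8)
    for step in range(first_iters + second_iters):
        # First set of iterations: relaxed writes; second set: precise writes.
        p_write = p_first if step < first_iters else p_second
        pred = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        grad = X.T @ (pred - y) / len(y)
        w = write_to_mram(w - lr * grad, p_write)  # every update is a noisy write
    return w

w_final = train()
accuracy = np.mean(((X @ w_final) > 0) == (y > 0.5))
print(f"training accuracy with two-phase MRAM writes: {accuracy:.3f}")
```

With these assumed error rates, the first, error-tolerant set of iterations does most of the optimization using relaxed writes, and the second, low-error-rate set removes the residual write noise before the values of the trained ANN are stored for classification.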
  • Figure 6 is a block diagram of a computer system 630 for training an artificial neural network, according to some embodiments.
  • Computer system 630 typically includes one or more processors (sometimes called CPUs) 602 for executing programs or instructions; memory 610; one or more communications interfaces 606; and one or more communication buses 605 for interconnecting these components.
  • processors 602 include the chips 102/202 shown and described with reference to Figures 1-2.
  • Computer system 630 optionally includes a user interface 609 comprising a display device 611 and one or more input devices 613 (e.g., one or more of a keyboard, mouse, touch screen, keypad, etc.) coupled to other components of computer system 630 by the one or more communication buses 605.
  • the one or more communication buses 605 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • Communication interface 606 is used by computer system 630 to convey information to external systems, and to receive communications from external systems, such as external database 652 (e.g., which may store ANN training data or data to be classified by a trained ANN).
  • the connection between computer system 630 and external database 652 may include a communication network such as the internet or a public or proprietary wireless network.
  • Memory 610 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 610 optionally includes one or more storage devices remotely located from the CPU(s) 602.
  • Memory 610, or alternatively the non-volatile memory device(s) within memory 610, comprises a computer-readable storage medium.
  • memory 610 or the computer-readable storage medium of memory 610 stores the following programs, modules, and data structures, or a subset thereof:
  • an operating system 612 that includes procedures for handling various basic system services and for performing hardware-dependent tasks;
  • a module that uses communication interface 606 to handle communications between computer system 630 and external systems;
  • a user interface module 616 for receiving information from the one or more input devices 613 of user interface 609, and for conveying information to a user of computer system 630 via the one or more display or output devices 611;
  • an ANN training module 618 for training an artificial neural network (e.g., causing the system to perform any of the ANN training methods described herein);
  • ANN training data 620 used for training artificial neural networks (e.g., sets of inputs and labels indicating correct classifications). An organizational sketch of how these stored modules fit together follows immediately below.
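Read as software, the memory-610 contents above form a small module registry. The sketch below shows one plausible way the ANN training module 618 and the ANN training data 620 could be organized and invoked; only the reference numerals come from the description, while every class name, type signature, and data value is a hypothetical choice for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

@dataclass
class AnnTrainingData:                       # stand-in for ANN training data 620
    samples: List[Tuple[Sequence[float], int]] = field(default_factory=list)

@dataclass
class AnnTrainingModule:                     # stand-in for ANN training module 618
    data: AnnTrainingData
    # A training routine such as the two-phase sketch above, supplied by the caller.
    train_fn: Callable[[List[Tuple[Sequence[float], int]]], object]

    def run(self) -> object:
        # Delegate to the routine that performs the MRAM-backed weight updates.
        return self.train_fn(self.data.samples)

# Usage: wire illustrative training data into the module and run it.
module = AnnTrainingModule(
    data=AnnTrainingData(samples=[([0.0, 1.0], 1), ([1.0, 0.0], 0)]),
    train_fn=lambda samples: {"trained_on": len(samples)},
)
print(module.run())                          # {'trained_on': 2}
```

In use, the routine passed as train_fn would be the MRAM-backed, two-phase procedure sketched earlier; here a stub is substituted so the example runs on its own.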
  • Operating system 612 and each of the above identified modules and applications correspond to a set of instructions for performing a function described above.
  • the set of instructions can be executed by the one or more processors 602 of computer system 630.
  • the above identified modules, applications, or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and various subsets of these modules may be combined or otherwise rearranged in various embodiments.
  • memory 610 stores a subset of the modules and data structures identified above.
  • memory 610 optionally stores additional modules and data structures not described above.
  • Figure 6 is intended more as a functional description of the various features which may be present in a computer system 630 than as a structural schematic of the embodiments described herein.
  • items shown separately could be combined and some items could be separated.
  • some items shown separately in Figure 6 could be combined into a single module or component, and single items could be implemented using two or more modules or components.
  • the actual number of modules and components, and how features are allocated among them will vary from one implementation to another.
  • CPUs 602 include specialized hardware for performing these and other tasks.
  • stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
  • although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • a first electronic device could be termed a second electronic device, and, similarly, a second electronic device could be termed a first electronic device, without departing from the scope of the various described implementations.
  • the first electronic device and the second electronic device are both electronic devices, but they are not the same type of electronic device.
  • the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context.
  • the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Computer Hardware Design (AREA)
  • Mram Or Spin Memory Techniques (AREA)
PCT/US2019/066847 2018-12-17 2019-12-17 System and method for training artificial neural networks WO2020131868A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201980092228.9A CN113841165A (zh) 2018-12-17 2019-12-17 用于训练人工神经网络的***和方法

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/223,055 2018-12-17
US16/223,058 US11586906B2 (en) 2018-12-17 2018-12-17 System and method for training artificial neural networks
US16/223,058 2018-12-17
US16/223,055 US20200193282A1 (en) 2018-12-17 2018-12-17 System and Method for Training Artificial Neural Networks

Publications (1)

Publication Number Publication Date
WO2020131868A1 true WO2020131868A1 (en) 2020-06-25

Family

ID=69326626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/066847 WO2020131868A1 (en) 2018-12-17 2019-12-17 System and method for training artificial neural networks

Country Status (2)

Country Link
CN (1) CN113841165A (zh)
WO (1) WO2020131868A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022232066A1 (en) * 2021-04-27 2022-11-03 Micron Technology, Inc. Decoders and systems for decoding encoded data using neural networks
US11563449B2 (en) 2021-04-27 2023-01-24 Micron Technology, Inc. Systems for error reduction of encoded data using neural networks
US11599773B2 (en) 2018-12-27 2023-03-07 Micron Technology, Inc. Neural networks and systems for decoding encoded data
US11755408B2 (en) 2021-10-07 2023-09-12 Micron Technology, Inc. Systems for estimating bit error rate (BER) of encoded data using neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154225B (zh) * 2016-12-06 2021-09-03 上海磁宇信息科技有限公司 一种使用模拟计算的神经网络芯片
US20180322386A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Fine-grain compute communication execution for deep learning frameworks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IMANI MOHSEN ET AL: "CANNA: Neural network acceleration using configurable approximation on GPGPU", 2018 23RD ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), IEEE, 22 January 2018 (2018-01-22), pages 682 - 689, XP033323883, DOI: 10.1109/ASPDAC.2018.8297401 *
LOCATELLI NICOLAS ET AL: "Use of Magnetoresistive Random-Access Memory as Approximate Memory for Training Neural Networks", 2018 25TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), IEEE, 9 December 2018 (2018-12-09), pages 553 - 556, XP033504086, DOI: 10.1109/ICECS.2018.8617952 *
VENKATARAMANI SWAGATH ET AL: "Approximate computing and the quest for computing efficiency", 2015 52ND ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), IEEE, 8 June 2015 (2015-06-08), pages 1 - 6, XP033181631, DOI: 10.1145/2744769.2744904 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11599773B2 (en) 2018-12-27 2023-03-07 Micron Technology, Inc. Neural networks and systems for decoding encoded data
WO2022232066A1 (en) * 2021-04-27 2022-11-03 Micron Technology, Inc. Decoders and systems for decoding encoded data using neural networks
US11563449B2 (en) 2021-04-27 2023-01-24 Micron Technology, Inc. Systems for error reduction of encoded data using neural networks
US11973513B2 (en) 2021-04-27 2024-04-30 Micron Technology, Inc. Decoders and systems for decoding encoded data using neural networks
US11755408B2 (en) 2021-10-07 2023-09-12 Micron Technology, Inc. Systems for estimating bit error rate (BER) of encoded data using neural networks

Also Published As

Publication number Publication date
CN113841165A (zh) 2021-12-24

Similar Documents

Publication Publication Date Title
WO2020131868A1 (en) System and method for training artificial neural networks
US11501109B2 (en) Non-volatile memory die with on-chip data augmentation components for use with machine learning
US11914860B2 (en) Data storage for artificial intelligence-based applications
US11507843B2 (en) Separate storage and control of static and dynamic neural network data within a non-volatile memory array
US9972382B2 (en) Non-volatile memory device
US9086993B2 (en) Memory device with internal signal processing unit
US9128822B2 (en) On-chip bad block management for NAND flash memory
US11520521B2 (en) Storage controller having data augmentation components for use with non-volatile memory die
US10740165B2 (en) Extending the error correction capability of a device using a neural network
US11216696B2 (en) Training data sample selection for use with non-volatile memory and machine learning processor
US20200193282A1 (en) System and Method for Training Artificial Neural Networks
US10804935B2 (en) Techniques for reducing latency in the detection of uncorrectable codewords
KR20210024188A (ko) 기입 버퍼 관리
US11836607B2 (en) System and method for classifying data using neural networks with errors
KR20210028265A (ko) 고속 비휘발성 스토리지 장치 복구 기술
US11574194B2 (en) System and method for training neural networks with errors
US11586906B2 (en) System and method for training artificial neural networks
US20130246847A1 (en) Method of detecting error in write data and data processing system to perform the method
US20200125932A1 (en) Method and apparatus for defect-tolerant memory-based artificial neural nework
US20220139453A1 (en) Memory management device, system and method
CN117693758A (zh) 与递归神经网络一起使用的非易失性存储器(nvm)设备的混合存储器管理
US20220044102A1 (en) Fault tolerant artificial neural network computation in deep learning accelerator having integrated random access memory
US11972122B2 (en) Memory read operation using a voltage pattern based on a read command type
CN117808059A (zh) 神经网络中用于灵活的二次幂计算的非均匀量化

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19842916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19842916

Country of ref document: EP

Kind code of ref document: A1