CN113378115A - Near-memory sparse vector multiplier based on magnetic random access memory - Google Patents

Near-memory sparse vector multiplier based on magnetic random access memory

Info

Publication number
CN113378115A
Authority
CN
China
Prior art keywords
vector
sparse
memory
data
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110689836.7A
Other languages
Chinese (zh)
Other versions
CN113378115B (en)
Inventor
蔡浩
陈骏通
张优优
郭亚楠
周永亮
刘波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202110689836.7A
Publication of CN113378115A
Application granted
Publication of CN113378115B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/325 Power saving in peripheral device
    • G06F1/3275 Power saving in memory, e.g. RAM, cache
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/02 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C11/16 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements in which the storage effect is based on magnetic spin effect
    • G11C11/165 Auxiliary circuits

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a near-memory sparse vector multiplier based on Magnetic Random Access Memory (MRAM), belonging to the field of integrated circuit design. It comprises a sparse flag generator, an input unit, a controller, a near-memory multiply accumulator, near-memory processing units, a core memory array, cache memory arrays, sense amplifiers and a shift adder tree. The multiplier computes the product of two signed integer vectors and automatically skips zero vectors. MRAM is non-volatile and has extremely low standby power consumption; in addition, a sparse flag bit is introduced and the computation is performed at the output stage of the memory, which reduces data-transfer power consumption and switching power consumption respectively. Compared with a traditional von Neumann neural network accelerator, the design effectively improves the energy efficiency of vector multiplication.

Description

Near-memory sparse vector multiplier based on magnetic random access memory
Technical Field
The invention relates to the field of integrated circuits, in particular to a magnetic random access memory-based near-memory sparse vector multiplier.
Background
In recent years, neural networks have been widely applied in fields such as computer vision and natural language processing, driving a new wave of artificial intelligence. A neural network is composed of layers with different functions; mainstream designs include convolution layers, fully-connected layers, activation function layers, normalization layers, attention layers, etc. In application, the core computation can be abstracted as vector multiplication, as shown in formula (1):
$y = \vec{a} \cdot \vec{w} = \sum_{n} a_n w_n \qquad (1)$

where $\vec{a}$ is the activation vector (the input, or the result computed by each layer), which changes continuously throughout the network computation, and $\vec{w}$ is the weight vector, which is fixed and does not change.
At present, to effectively reduce the consumption of hardware resources, especially in embedded mobile devices, one idea is quantization: converting activation values and weights from 32-bit floating-point numbers to 8-bit integers, which greatly reduces storage requirements and computation without losing application performance, thereby improving energy efficiency. Another idea is to exploit the sparsity of activation values or weights, as in the following example:
$\vec{a} = (0,\ 0,\ 0,\ 0,\ a_4,\ a_5,\ a_6,\ a_7)$

For such a vector $\vec{a}$, the product of its first four elements with the corresponding elements of any vector is 0, so skipping zero-vector multiplications can effectively reduce power consumption. Current approaches to sparse vector multiplication mostly read the data first and then test it; this reduces computation power, but the memory accesses still take place, and since each element occupies a bit width of 8 bits, memory-access power remains the dominant factor, so there is still room for optimization.
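To make the saving concrete, here is a minimal software sketch (ours, not from the patent) of zero-block skipping in a dot product; the flag test stands in for the sparse flag bit that lets the hardware avoid reading a block at all:

```python
# A minimal sketch of zero-block skipping in a dot product: the flag for
# each 8-element block is checked before any block data is touched,
# mirroring how a sparse flag bit avoids the memory access entirely.

def sparse_dot(activations, weights, block=8):
    """Dot product that skips blocks whose activation slice is all zero."""
    assert len(activations) == len(weights)
    total = 0
    for start in range(0, len(activations), block):
        a_blk = activations[start:start + block]
        if not any(a_blk):          # sparse flag: whole block is zero
            continue                # skip both the read and the multiply
        w_blk = weights[start:start + block]
        total += sum(a * w for a, w in zip(a_blk, w_blk))
    return total

print(sparse_dot([0, 0, 0, 0, 1, 2, 3, 4], [5, 6, 7, 8, 1, 1, 1, 1]))  # 10
```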
In the conventional von Neumann architecture, the memory and the computing unit are separate: when a computation is required, data must first be moved into the computing unit's cache, usually built from Static Random Access Memory (SRAM) or flip-flops, and the result then moved back to memory, which consumes a great deal of energy on data movement and cache updates. Near-Memory Computing (NMC) breaks with this architecture by integrating the computing circuits with the memory, greatly reducing data-movement and memory-access power. Since NMC usually pairs a memory array with a digital processing unit, computation accuracy is guaranteed, but further reducing the power consumption of these two circuits remains a key challenge for NMC architectures. Most NMC technologies are based on Dynamic Random Access Memory (DRAM), which requires frequent refresh operations to retain data, or on Flash, whose low speed is a bottleneck for neural network applications with heavy data computation. The emerging non-volatile memory MRAM retains data when powered off, greatly reducing data-retention and leakage power, and its access speed is high enough to meet the computation requirements of neural networks; an MRAM-based near-memory sparse vector multiplier therefore has clear advantages over other NMC technologies.
Disclosure of Invention
The technical problem is as follows: addressing the shortcomings of the prior art, the invention discloses a near-memory sparse vector multiplier based on Magnetic Random Access Memory (MRAM). A sparse flag bit is written alongside each data write, and the near-memory processing unit uses the sparse flag information to skip the memory-access and computation stages, realizing near-memory sparse vector multiplication. The multiplier is power-optimized at both the circuit level and the network-structure level, addressing the low speed and high energy consumption of existing NMC technologies.
The technical scheme is as follows: the invention relates to a near-memory sparse vector multiplier based on a magnetic random access memory, comprising a sparse flag generator, an input unit, a near-memory multiply accumulator and a controller;
the sparse flag generator is connected with the input unit; it judges through a logic circuit whether the input data is 0, generates a sparse flag bit, and passes the data and the sparse flag bit to the input unit; the input data comprise a weight vector and an activation vector;
the input unit is connected with the near-memory multiply accumulator; the near-memory multiply accumulator receives data from the input unit and performs near-memory multiply-accumulate computation, skipping the memory access and computation of zero vectors during that computation;
the controller is connected to the sparse flag generator, the input unit and the near-memory multiply accumulator respectively; it controls the functions of the sparse flag generator, the input unit and the near-memory multiply accumulator, and generates the address signals for reading and storing data.
Further, the sparse flag generator comprises six two-input OR gates and one two-input NOR gate, and judges whether all 8 data bits are 0 to generate the sparse flag bit of the datum. The six OR gates are denoted the first to sixth two-input OR gates: the inputs of the first to fourth OR gates form the input of the sparse flag generator; the outputs of the first and second OR gates feed the fifth OR gate; the outputs of the third and fourth OR gates feed the sixth OR gate; the outputs of the fifth and sixth OR gates feed the two-input NOR gate, whose output is the output of the sparse flag generator.
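The gate tree described above can be sketched behaviorally as follows; this is our model of that logic, with invented helper names, showing that the flag is 1 exactly when all eight data bits are 0:

```python
# Gate-level sketch of the sparse flag generator: four first-stage OR
# gates, two second-stage OR gates, and one final NOR.

def or2(a, b):
    return a | b

def sparse_flag(bits):  # bits: eight 0/1 values, i.e. one 8-bit datum
    assert len(bits) == 8
    s1 = [or2(bits[0], bits[1]), or2(bits[2], bits[3]),
          or2(bits[4], bits[5]), or2(bits[6], bits[7])]   # 4 OR gates
    s2 = [or2(s1[0], s1[1]), or2(s1[2], s1[3])]           # 2 OR gates
    return 1 - or2(s2[0], s2[1])                          # final NOR

print(sparse_flag([0] * 8))                   # 1: datum is zero
print(sparse_flag([0, 0, 0, 1, 0, 0, 0, 0]))  # 0: datum is non-zero
```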
Further, the input unit receives the input data and the sparse flag bits from the sparse flag generator: in each cycle it receives 8 bits of write data together with that datum's sparse flag bit, and updates the current sparse flag after each reception; after eight cycles it has received 64 bits of write data and outputs them together with a single 1-bit sparse flag.
As shown in formula (4), the sparse flag bit F characterizes whether the length-8, bit-width-8 vector is zero, with $F_i$ indicating whether the datum written in the i-th cycle is zero:

$F = F_0 \wedge F_1 \wedge \cdots \wedge F_7 \qquad (4)$
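A behavioral sketch of this input-unit operation, under our reading of formula (4) as the AND of the per-cycle flags, might look like this (names are illustrative, not from the patent):

```python
# Behavioral sketch of the input unit: one 8-bit datum and its flag
# arrive per cycle; after eight cycles the unit holds 64 data bits plus
# one flag that is 1 only if every datum was zero.

def input_unit(data_bytes):          # eight values, each 0..255
    assert len(data_bytes) == 8
    word, flag = 0, 1
    for i, d in enumerate(data_bytes):
        f_i = 1 if d == 0 else 0     # per-cycle flag from the generator
        flag &= f_i                  # F = F0 & F1 & ... & F7
        word |= d << (8 * i)         # append the byte to the 64-bit word
    return word, flag

print(input_unit([0] * 8))                      # (0, 1): zero vector, flag set
print(input_unit([0, 7, 0, 0, 0, 0, 0, 0])[1])  # 0: not a zero vector
```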
Furthermore, the near-memory multiply accumulator comprises near-memory processing units (PE) and a partial-sum accumulator; each near-memory processing unit PE computes in parallel, and the final result is accumulated by the partial-sum accumulator.
The near-memory processing unit comprises an address decoder, a core array MRAM1, a cache array MRAM2, a cache array MRAM3, a first sense amplifier, a second sense amplifier, a shift adder tree and a logical AND module.
The address decoder is connected to the core array MRAM1, the cache array MRAM2 and the cache array MRAM3 respectively; it decodes the address signals output by the controller and stores data to the corresponding addresses, or reads out the data participating in the computation.
The core array MRAM1 stores the weight vectors, the cache array MRAM2 stores the activation vectors, and the cache array MRAM3 stores the output vectors.
The first sense amplifier is connected to the core array MRAM1 and reads the sparse flag bit F0 of the weight vector and its data bits; the second sense amplifier is connected to the cache array MRAM2 and reads the sparse flag bit F1 of the activation vector and its data bits; both sense amplifiers are sensitive to the sparse flag signals.
The first and second sense amplifiers first read the sparse flag bits of the weight vector and the activation vector. F0 and F1 interact and feed back to the two sense amplifiers: if F0 | F1 is true, at least one of the weight vector and the activation vector is zero, so both sense amplifiers are turned off entirely and the memory access for the zero vector is skipped; if F0 | F1 is false, the weight vector and the activation vector are multiplied by the logical AND module and sent to the shift adder tree.
The shift adder tree is sensitive to the sparse flag signals: it receives the sparse flag bits from the first and second sense amplifiers, and if they indicate that a zero vector is present among the vectors to be multiplied, it skips the computation, holds all its data unchanged, and forces its output to 0 through combinational logic, reducing switching power. Otherwise, the outputs of the first and second sense amplifiers are multiplied by the logical AND and sent into the shift adder tree for shift-and-add.
The logical AND module computes the product of the activation vector and the weight vector: each cycle it computes a (1 bit × 8 bits) product and sends the result to the shift adder tree, so an (8 bits × 8 bits) computation completes in 8 cycles.
The near-memory multiply accumulator operates as a three-stage pipeline: PE computation, partial-sum accumulation, and write-back. Vector multiplication is performed inside each PE; the accumulated results of the 48 PEs are then sent to the partial-sum accumulator, where the accumulation is performed and the result is shifted to restore the data to 8 bits; finally the 8-bit data is written back to the cache array MRAM3. Throughout this process, read operations occur in the core array MRAM1 and the cache array MRAM2, and write operations occur in the cache array MRAM3.
Further, the core array MRAM1 stores the weight vectors; the weight matrix M is mapped into the near-memory processing unit PE core array MRAM1 as shown in formula (2), where row i of the array is

$\left[\, m_{i,0}[7{:}0] \;\; m_{i,1}[7{:}0] \;\cdots\; m_{i,7}[7{:}0] \;\; f_{wi} \,\right] \qquad (2)$

The mapping expands each element of the weight matrix M into an 8-bit binary number, and each row additionally carries a sparse flag bit indicating whether that row's vector is zero.
Further, the cache array MRAM2 stores the activation vector $\vec{a}$. Its mapping in the cache array MRAM2 disclosed by the invention is shown in formula (3), where row x of the array is

$\left[\, a_0[x] \;\; a_1[x] \;\cdots\; a_7[x] \;\; f_{ax} \,\right] \qquad (3)$

The mapping expands each element of the activation vector $\vec{a}$ into an 8-bit binary number; each row holds the same address bit of the eight operands together with one sparse flag bit, which indicates whether that row and all previous rows are zero. For example, fa7 indicates whether its own row is zero and whether fa0 through fa6 are also zero, so fa7 indicates whether the whole activation vector $\vec{a}$ is a zero vector.
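The bit-plane layout and the cumulative flags can be modeled as below; this is our interpretation of formula (3), not circuit-accurate code:

```python
# Sketch of the MRAM2 bit-plane layout: row b stores bit b of all eight
# activations; its flag is cumulative, so the flag of the last row
# marks whether the whole vector is zero.

def map_activations(acts):           # eight activation bytes, each 0..255
    rows = []
    flag = 1
    for b in range(8):               # one row per bit position
        bits = [(a >> b) & 1 for a in acts]
        row_zero = 1 if not any(bits) else 0
        flag &= row_zero             # this row zero AND all earlier rows zero
        rows.append((bits, flag))
    return rows                      # rows[7][1] == 1 iff the vector is zero

print(map_activations([0] * 8)[7][1])                    # 1: zero vector
print(map_activations([0, 0, 3, 0, 0, 0, 0, 0])[7][1])   # 0: non-zero element present
```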
Beneficial effects: by adopting the above technical scheme, the invention has the following advantages:
(1) The invention builds the near-memory sparse vector multiplier on MRAM: data stored in the MRAM array is not lost when power is cut, matching the storage requirement of neural network applications in which the large weight sets are rarely updated, and effectively reducing data-retention power; at the same time, near-memory computation greatly reduces data-movement power, improving overall energy efficiency.
(2) The invention uses the sparse flag generator to judge the sparsity of input data, recording sparsity with only 1.6% storage overhead, and overcomes the drawback that all data must still be accessed when operating on sparse vectors.
(3) The invention uses the near-memory sparse vector multiplier to realize fully-connected 8-bit quantized neural network computation, skipping the memory-access and computation stages based on the sparse flag bits and reducing both memory-access power and computation power.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below; obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a structure for implementing MNIST handwritten digit recognition by using a magnetic random access memory-based near-memory sparse vector multiplier according to an embodiment of the present invention;
FIG. 2 is a block diagram of a magnetic random access memory-based near-memory sparse vector multiplier according to an embodiment of the present invention;
FIG. 3 is a circuit diagram of a sparse flag generator provided by an embodiment of the present invention;
FIG. 4 is a block diagram of a near memory multiply accumulator according to an embodiment of the present invention;
FIG. 5 is a block diagram of a near memory processing unit according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a shift adder according to an embodiment of the present invention;
FIG. 7 is a timing diagram illustrating operation of a near memory processing unit according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a working pipeline of a magnetic random access memory-based near-memory sparse vector multiplier according to an embodiment of the present invention;
FIG. 9 is a comparison diagram of power consumption of a near-memory sparse vector multiplication provided by an embodiment of the present invention;
fig. 10 is a statistical result of sparsity of a neural network in an MNIST handwriting database application according to an embodiment of the present invention;
fig. 11 is a block diagram of the multiplier of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings of the embodiments; obviously, the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a structural block diagram of MNIST handwritten digit recognition implemented with the magnetic-random-access-memory-based near-memory sparse vector multiplier of an embodiment of the present invention. The picture to be recognized is converted into an input vector; the circles in the box represent weight vectors; a set of probability values is obtained through repeated vector multiplications, and the digit corresponding to the maximum value in this probability vector is the recognition result. The vector multiplications are realized with the near-memory sparse vector multiplier.
As shown in fig. 2 and fig. 11, the near-memory sparse vector multiplier based on the magnetic random access memory of the present invention includes a sparse flag generator, an input unit, a controller and a near-memory multiply accumulator.
The sparse flag generator is connected with the input unit; it judges through a logic circuit whether the input data is 0, generates a sparse flag bit, and passes the data and the sparse flag bit to the input unit. The input data comprise a weight vector and an activation vector.
The input unit is connected with the near-memory multiply accumulator; the near-memory multiply accumulator receives data from the input unit and performs near-memory multiply-accumulate computation, skipping the memory access and computation of zero vectors during that computation.
The controller is connected to the sparse flag generator, the input unit and the near-memory multiply accumulator respectively, and controls the functions of the sparse flag generator, the input unit and the near-memory multiply accumulator.
As shown in fig. 3, the sparse flag generator is a combinational logic circuit composed of 6 two-input OR gates and 1 two-input NOR gate; it judges whether all 8 data bits are 0, implementing the logical operation of formula (5) to generate the datum's sparse flag bit:

$F = \overline{d_7 + d_6 + d_5 + d_4 + d_3 + d_2 + d_1 + d_0} \qquad (5)$

where $d_7 \ldots d_0$ are the 8 bits of the input datum and + denotes logical OR.
In this embodiment, a 64 × 384 fully-connected layer is used as the design object: the weight data is a 64 × 384 matrix, the input activation vector is a 1 × 384 ordered array, and the output activation vector is a 1 × 64 ordered array. The system completes the computation of formula (6), where -128 ≤ i, w ≤ 127:

$o_k = \sum_{n=0}^{383} i_n \, w_{k,n}, \qquad k = 0, 1, \ldots, 63 \qquad (6)$
As shown in fig. 4, the near-memory multiply accumulator provided by the embodiment of the present invention comprises 48 near-memory processing units PE and a partial-sum accumulator; each near-memory processing unit PE computes in parallel, and the computation results are accumulated in the partial-sum accumulator. The 64 × 384 weight array is therefore divided into 48 groups of 64 × 8 data, one group per PE, and the input activation vector is likewise divided into 48 groups of 1 × 8 data, one group per PE. Equation (6) can thus be rewritten as formula (7), where j denotes the j-th PE unit:

$o_k = \sum_{j=0}^{47} \sum_{n=0}^{7} i_{j,n} \, w_{k,j,n}, \qquad k = 0, 1, \ldots, 63 \qquad (7)$
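The equivalence between formula (6) and its 48-PE decomposition in formula (7) can be checked with a short script (our notation; indices zero-based):

```python
# Check that splitting the 384-element inner product across 48 PEs of 8
# elements each, as in formula (7), matches the direct 64x384
# matrix-vector product of formula (6).

import random

random.seed(0)
W = [[random.randint(-128, 127) for _ in range(384)] for _ in range(64)]
i_vec = [random.randint(-128, 127) for _ in range(384)]

direct = [sum(W[k][n] * i_vec[n] for n in range(384)) for k in range(64)]

split = []
for k in range(64):
    acc = 0
    for j in range(48):                      # one 64x8 slice per PE
        lo = 8 * j
        acc += sum(W[k][lo + n] * i_vec[lo + n] for n in range(8))
    split.append(acc)

print(direct == split)                       # True
```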
As shown in fig. 5, the near-memory processing unit includes an address decoder, a core array MRAM1, a cache array MRAM2, a cache array MRAM3, a first sense amplifier, a second sense amplifier, a shift adder tree, and a logical AND module.
The address decoder is connected to the core array MRAM1, the cache array MRAM2 and the cache array MRAM3 respectively; it decodes the address signals output by the controller and stores data to the corresponding addresses, or reads out the data participating in the computation.
The core array MRAM1 stores the weight vectors, the cache array MRAM2 stores the activation vectors, and the cache array MRAM3 stores the output vectors.
The first sense amplifier is connected to the core array MRAM1 and reads the sparse flag bit F0 of the weight vector and its data bits; the second sense amplifier is connected to the cache array MRAM2 and reads the sparse flag bit F1 of the activation vector and its data bits. F0 and F1 interact and feed back to the two sense amplifiers: if F0 | F1 is true, at least one of the weight vector and the activation vector is zero, so both sense amplifiers are turned off entirely and the computation cycle is skipped. If F0 | F1 is false, the weight vector and the activation vector are multiplied by the logical AND module and sent to the shift adder tree.
The logical AND module computes the product of the activation vector and the weight vector: each cycle it computes a (1 bit × 8 bits) product and sends the result to the shift adder tree, completing an (8 bits × 8 bits) computation in 8 cycles.
The mapping of the weight matrix $W_j$ of equation (7) into the core array MRAM1 of each PE is shown in formula (8): each element of the weight matrix W is expanded into an 8-bit binary number, and each row additionally carries a sparse flag bit $f_{wjx}$ (j being the j-th PE, x the x-th operand of that PE). The controller provided in this embodiment of the present invention generates a weight-vector write signal to write the weight vectors into the core array MRAM1; since the MRAM used in this embodiment retains data when power is off, the weights need to be uploaded only once for the MNIST handwriting recognition application of this embodiment. Row x of the array is

$\left[\, w_{j,x,0}[7{:}0] \;\; w_{j,x,1}[7{:}0] \;\cdots\; w_{j,x,7}[7{:}0] \;\; f_{wjx} \,\right] \qquad (8)$
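A sketch of the 65-bit row packing we infer from formula (8) follows, with illustrative field names; the flag convention (1 = all-zero row) matches the sparse flag generator above:

```python
# Sketch of the per-row weight layout: eight 8-bit weights in two's
# complement plus one sparse flag bit, 65 bits in all.

def map_weight_row(weights):         # eight signed weights, -128..127
    assert len(weights) == 8
    row_bits = 0
    for n, w in enumerate(weights):
        byte = w & 0xFF              # two's-complement encoding
        row_bits |= byte << (8 * n)
    flag = 1 if all(w == 0 for w in weights) else 0
    return row_bits, flag            # 64 data bits + 1-bit sparse flag

bits, f = map_weight_row([0, 0, 0, 0, 0, 0, 0, 0])
print(f)                             # 1: the whole row can be skipped
bits, f = map_weight_row([1, -2, 3, -4, 5, -6, 7, -8])
print(f)                             # 0: row participates in computation
```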
The activation vector mapping is shown in formula (9), where $\vec{i}_j$ denotes the input vector corresponding to the j-th PE: each element is expanded into an 8-bit binary number, and each row holds the same address bit of the eight operands together with one sparse flag bit $f_{ijx}$ (j being the j-th PE, x the x-th operand of that PE). The controller provided by the embodiment of the present invention generates an activation-vector write signal to write the activation vector into the cache array MRAM2. Row x of the array is

$\left[\, i_{j,0}[x] \;\; i_{j,1}[x] \;\cdots\; i_{j,7}[x] \;\; f_{ijx} \,\right] \qquad (9)$
The intra-PE computation is thus as shown in formula (10):

$S_{k,j} = \sum_{n=0}^{7} i_{j,n} \, w_{k,j,n} \qquad (10)$
Fig. 6 is a schematic diagram of the shift adder provided by the embodiment of the present invention: the 8 data of bit width 8 are added pairwise, and the final result is computed and stored in $S_{reg}$. The shift adder is sensitive to the sparse flag bits: if the input vector is a zero vector, $S_{reg}$ remains unchanged and the output is set to 0; otherwise it performs the shift-and-add.
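The bit-serial multiply of formula (10) together with the pairwise adder tree of fig. 6 can be modeled as follows; the two's-complement handling of the sign bit-plane is our assumption, since the patent states only that signed 8-bit vectors are supported:

```python
# Bit-serial sketch of the intra-PE computation: each cycle ANDs one
# activation bit-plane (MSB first) with the full weight row, reduces it
# through a pairwise adder tree, and shift-accumulates the partial sum.

def adder_tree(values):              # pairwise reduction of 8 partial products
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def pe_dot(acts, weights):           # eight signed activations and weights
    s_reg = 0
    for b in range(7, -1, -1):       # MSB (bit 7) first, as in the timing diagram
        plane = [(a >> b) & 1 for a in [x & 0xFF for x in acts]]
        partial = adder_tree([w if bit else 0 for w, bit in zip(weights, plane)])
        if b == 7:
            partial = -partial       # sign bit-plane carries weight -2^7
        s_reg = (s_reg << 1) + partial if b < 7 else partial
    return s_reg

acts, weights = [3, -1, 0, 7, -8, 2, 0, 5], [-2, 4, 9, -1, 3, 0, 6, -5]
print(pe_dot(acts, weights), sum(a * w for a, w in zip(acts, weights)))  # -66 -66
```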
As shown in fig. 7, the timing of the near-memory processing unit is as follows: the controller generates the read-enable signal SAE; on the falling edge of SAE, the core array MRAM1 reads out the weight sparse flag F0 and the cache array MRAM2 reads out the input sparse flag F1.
On the rising edge of SAE, if both sparse flag bits are 0, neither the activation vector nor the weight vector is 0, so the unit prepares to enter the computation, which comprises the following simultaneous steps:
a) according to the stored data mapping, all data bits of the weight vector (8 × 8 bits) are read out on the rising edge together with the most significant bit-plane of the activation vector (8 × 1 bit); the two are ANDed, producing the product (8 × 8 bits) of the weight vector with the activation vector's most significant bits, which is sent into the shift adder tree;
b) the output of the shift adder tree is reset on the rising edge, and it outputs S0 in the next cycle;
c) the read enable of the first sense amplifier is turned off (its data is held in a register inside the sense amplifier, so no switching or read power is consumed) while the read enable of the second sense amplifier is kept on; in the next cycle the next-most-significant bit-plane of the activation vector is read out, ANDed with the weight vector and sent into the shift adder tree, and S0 is shifted left by one bit and accumulated with the current result;
d) operation c) is repeated until the least significant bit-plane of the activation vector has been read out, and the shift adder tree outputs the final accumulated result S7.
If either sparse flag bit is 1, there is a zero vector among the activation vector and weight vector at that address; neither the weight and activation vectors nor the value of the register in the shift adder (PSUM) is updated during the next eight cycles, and the output of the shift adder is set to 0 by the combinational logic.
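The per-vector control decision can be summarized in a few lines (function names are ours, not the patent's):

```python
# Control-flow sketch of one vector slot in the PE: read the two flag
# bits first; if either is set, hold the registers, force the output to
# 0, and skip the eight compute cycles.

def pe_slot(f_w, f_a, compute_fn):
    """f_w, f_a: sparse flags of the weight and activation vectors."""
    if f_w | f_a:                    # F0 | F1: at least one vector is zero
        return 0, 'skipped'          # sense amplifiers off, PSUM held, output 0
    return compute_fn(), 'computed'  # eight bit-serial cycles as above

print(pe_slot(1, 0, lambda: 42))     # (0, 'skipped')
print(pe_slot(0, 0, lambda: 42))     # (42, 'computed')
```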
Fig. 8 is a schematic diagram of the working pipeline of the magnetic-random-access-memory-based near-memory sparse vector multiplier of the embodiment. After the weight matrix has been uploaded, the near-memory multiply accumulator enters the inference stage: vector multiply-accumulation is performed inside each PE, the accumulated results of the 48 PEs are sent to the partial-sum accumulator, the accumulation is performed and the result is shifted to restore the data to 8 bits, and finally the 8-bit data is written back to the cache array MRAM3. Throughout the process, read operations occur in the core array MRAM1 and the cache array MRAM2, and write operations occur in the cache array MRAM3.
Fig. 9 compares the power consumption of the near-memory sparse vector multiplication of the embodiment. From the computation process above, when a single PE computes a non-zero vector it must read 130 bits of data (8 × 8 bits of weight vector plus a 1-bit sparse flag, and 8 × 8 bits of activation vector plus a 1-bit sparse flag); since the activation vector is read only 8 bits at a time, 93 bits of registers are needed (8 × 8 bits of weights plus a 1-bit flag, 8 bits of activation plus a 1-bit flag, and 19 bits for the accumulated sum), plus some combinational logic. Thanks to the sparse flag bits, when either the activation vector or the weight vector processed by the PE is a zero vector, only the 2 flag bits need to be read in the first cycle; all sense amplifiers are then turned off, the registers hold their previous values, and the output is set to 0 by combinational logic. In this embodiment, register toggling and the read stage of the sense amplifiers consume more than 80% of the energy, so this scheme effectively reduces power consumption and improves energy efficiency.
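These bit counts, and the roughly 1.6% flag overhead cited earlier, can be checked with simple arithmetic (ours):

```python
# Quick arithmetic check of the access and register counts quoted above,
# and of the storage overhead of the sparse flag bits.

bits_nonzero = (8 * 8 + 1) + (8 * 8 + 1)  # both vectors, each with its flag
bits_skipped = 2                          # only the two flag bits are read
print(bits_nonzero, bits_skipped)         # 130 2

regs = (8 * 8 + 1) + (8 + 1) + 19         # weights+flag, one activation byte+flag, 19-bit sum
print(regs)                               # 93

print(100 * 1 / 64)                       # flag bits per 64 data bits: 1.5625 (~1.6%)
```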
Fig. 10 shows the sparsity statistics of the neural network in the MNIST handwriting data set application of the embodiment, obtained by analyzing the weights and input data uploaded to the PE units. The statistics map onto the structure of the near-memory vector multiplier: each box represents a PE unit, with darker color meaning lower sparsity and lighter color higher sparsity. The overall average sparsity is 61.2%, i.e. on average more than six of every ten computations are skipped. The near-memory sparse vector multiplier of this embodiment therefore saves power and improves energy efficiency by recognizing sparsity and skipping the computation process.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed by the present invention falls within the protection scope of the present invention. The protection scope of the present invention shall therefore be defined by the claims.

Claims (6)

1. A near-memory sparse vector multiplier based on a magnetic random access memory, characterized by comprising a sparse flag generator, an input unit, a near-memory multiply accumulator and a controller;
the sparse flag generator is connected with the input unit; it judges through a logic circuit whether the input data is 0, generates a sparse flag bit, and passes the data and the sparse flag bit to the input unit; the input data comprise a weight vector and an activation vector;
the input unit is connected with the near-memory multiply accumulator; the near-memory multiply accumulator receives data from the input unit and performs near-memory multiply-accumulate computation, skipping the memory access and computation of zero vectors during that computation;
the controller is connected to the sparse flag generator, the input unit and the near-memory multiply accumulator respectively; it controls the functions of the sparse flag generator, the input unit and the near-memory multiply accumulator, and generates the address signals for reading and storing data.
2. The magnetic-random-access-memory-based near-memory sparse vector multiplier of claim 1, wherein the sparse flag generator comprises six two-input OR gates and one two-input NOR gate, judges whether all 8 data bits are 0, and generates the sparse flag bit of the datum.
3. The magnetic-random-access-memory-based near-memory sparse vector multiplier of claim 1, wherein the input unit receives the input data and the sparse flag bits from the sparse flag generator: in each cycle it receives 8 bits of write data together with that datum's sparse flag bit and updates the current sparse flag after each reception; after eight cycles it has received 64 bits of write data and outputs them together with a single 1-bit sparse flag;

$F = F_0 \wedge F_1 \wedge \cdots \wedge F_7 \qquad (4)$

as shown in formula (4), the sparse flag bit F characterizes whether the length-8, bit-width-8 vector is zero, with $F_i$ indicating whether the datum written in the i-th cycle is zero.
4. The magnetic-random-access-memory-based near-memory sparse vector multiplier of claim 1, wherein the near-memory multiply accumulator comprises near-memory processing units (PE) and a partial-sum accumulator; each near-memory processing unit PE in the near-memory multiply accumulator computes in parallel, and the final result is accumulated by the partial-sum accumulator;
the near-memory processing unit comprises an address decoder, a core array MRAM1, a cache array MRAM2, a cache array MRAM3, a first sense amplifier, a second sense amplifier, a shift adder tree and a logical AND module;
the address decoder is connected to the core array MRAM1, the cache array MRAM2 and the cache array MRAM3 respectively; it decodes the address signals output by the controller and stores data to the corresponding addresses, or reads out the data participating in the computation;
the core array MRAM1 stores the weight vectors, the cache array MRAM2 stores the activation vectors, and the cache array MRAM3 stores the output vectors;
the first sense amplifier is connected to the core array MRAM1 and reads the sparse flag bit F0 of the weight vector and its data bits; the second sense amplifier is connected to the cache array MRAM2 and reads the sparse flag bit F1 of the activation vector and its data bits; both sense amplifiers are sensitive to the sparse flag signals;
the first and second sense amplifiers first read the sparse flag bits of the weight vector and the activation vector; F0 and F1 interact and feed back to the two sense amplifiers: if F0 | F1 is true, at least one of the weight vector and the activation vector is zero, so both sense amplifiers are turned off entirely and the memory access for the zero vector is skipped; if F0 | F1 is false, the weight vector and the activation vector are multiplied by the logical AND module and sent to the shift adder tree;
the shift adder tree is sensitive to the sparse flag signals: it receives the sparse flag bits from the first and second sense amplifiers, and if they indicate that a zero vector is present among the vectors to be multiplied, it skips the computation, holds all its data unchanged, and forces its output to 0 through combinational logic, reducing switching power; otherwise the outputs of the first and second sense amplifiers are multiplied by the logical AND and sent into the shift adder tree for shift-and-add;
after the multiplication of the weight vector and the activation vector is completed in the near-memory processing unit PE, the accumulated result of each PE is sent to the partial-sum accumulator; after the accumulation a shift restores the data to 8 bits, and finally the 8-bit output vector is written back to the cache array MRAM3.
5. The magnetic-random-access-memory-based near-memory sparse vector multiplier of claim 4, wherein the core array MRAM1 stores the weight vectors, the weight matrix M being mapped into the near-memory processing unit PE core array MRAM1 as shown in formula (2), where row i of the array is

$\left[\, m_{i,0}[7{:}0] \;\; m_{i,1}[7{:}0] \;\cdots\; m_{i,7}[7{:}0] \;\; f_{wi} \,\right] \qquad (2)$

the mapping expanding each element of the weight matrix M into an 8-bit binary number, each row additionally carrying a sparse flag bit that indicates whether that row's vector is zero.
6. The magnetic-random-access-memory-based near-memory sparse vector multiplier of claim 4, wherein the cache array MRAM2 stores the activation vector $\vec{a}$, its mapping in the cache array MRAM2 being shown in formula (3), where row x of the array is

$\left[\, a_0[x] \;\; a_1[x] \;\cdots\; a_7[x] \;\; f_{ax} \,\right] \qquad (3)$

each element being expanded into an 8-bit binary number, each row holding the same address bit of the eight operands together with one sparse flag bit that indicates whether that row and all previous rows are zero; for example, fa7 indicates whether its own row is zero and whether fa0 through fa6 are also zero, so fa7 indicates whether the whole activation vector $\vec{a}$ is a zero vector.
CN202110689836.7A 2021-06-22 2021-06-22 Near-memory sparse vector multiplier based on magnetic random access memory Active CN113378115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110689836.7A CN113378115B (en) 2021-06-22 2021-06-22 Near-memory sparse vector multiplier based on magnetic random access memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110689836.7A CN113378115B (en) 2021-06-22 2021-06-22 Near-memory sparse vector multiplier based on magnetic random access memory

Publications (2)

Publication Number Publication Date
CN113378115A true CN113378115A (en) 2021-09-10
CN113378115B CN113378115B (en) 2024-04-09

Family

ID=77578375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110689836.7A Active CN113378115B (en) 2021-06-22 2021-06-22 Near-memory sparse vector multiplier based on magnetic random access memory

Country Status (1)

Country Link
CN (1) CN113378115B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115981751A (en) * 2023-03-10 2023-04-18 之江实验室 Near memory computing system, near memory computing method, device, medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN110889259A (en) * 2019-11-06 2020-03-17 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN110889259A (en) * 2019-11-06 2020-03-17 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
付世航 (Fu Shihang): "Deep convolution algorithm optimization and hardware acceleration" (深度卷积算法优化与硬件加速), China Master's Theses Full-text Database, Information Science and Technology Series, 15 December 2019 (2019-12-15) *
刘世培 (Liu Shipei) et al.: "An efficient FPGA-based sparse matrix multiplier" (一种基于FPGA的稀疏矩阵高效乘法器), Microelectronics (微电子学), no. 02, 20 April 2013 (2013-04-20) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115981751A (en) * 2023-03-10 2023-04-18 之江实验室 Near memory computing system, near memory computing method, device, medium and equipment

Also Published As

Publication number Publication date
CN113378115B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
US11625584B2 (en) Reconfigurable memory compression techniques for deep neural networks
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
Imani et al. Acam: Approximate computing based on adaptive associative memory with online learning
WO2021046567A1 (en) Methods for performing processing-in-memory operations on serially allocated data, and related memory devices and systems
US11934826B2 (en) Vector reductions using shared scratchpad memory
CN114072876B (en) Memory processing unit and method for calculating dot product
US11934824B2 (en) Methods for performing processing-in-memory operations, and related memory devices and systems
CN111459552B (en) Method and device for parallelization calculation in memory
CN117574970A (en) Inference acceleration method, system, terminal and medium for large-scale language model
Garzón et al. AIDA: Associative in-memory deep learning accelerator
CN113378115B (en) Near-memory sparse vector multiplier based on magnetic random access memory
CN115394336A (en) Storage and computation FPGA (field programmable Gate array) framework
CN111124999A (en) Dual-mode computer framework supporting in-memory computation
CN117234720A (en) Dynamically configurable memory computing fusion data caching structure, processor and electronic equipment
CN109978143B (en) Stack type self-encoder based on SIMD architecture and encoding method
CN110085270B (en) Storage operation circuit module and processor
CN115879530A (en) Method for optimizing array structure of RRAM (resistive random access memory) memory computing system
KR20240036594A (en) Subsum management and reconfigurable systolic flow architectures for in-memory computation
CN115586885A (en) Memory computing unit and acceleration method
CN114267391A (en) Machine learning hardware accelerator
CN116129973A (en) In-memory computing method and circuit, semiconductor memory and memory structure
Chen et al. An efficient ReRAM-based inference accelerator for convolutional neural networks via activation reuse
KR102154834B1 (en) In DRAM Bitwise Convolution Circuit for Low Power and Fast Computation
CN115658011B (en) SRAM in-memory computing device of vector multiply adder and electronic equipment
CN115658012B (en) SRAM analog memory computing device of vector multiply adder and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant