CN113345484A - Data operation circuit and storage and calculation integrated chip - Google Patents

Data operation circuit and storage and calculation integrated chip

Info

Publication number
CN113345484A
Authority
CN
China
Prior art keywords
circuit
data
multiplicand
input
multiplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110705287.8A
Other languages
Chinese (zh)
Inventor
佘一奇
吴守道
郑坚斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Zhaoxin Semiconductor Technology Co ltd
Original Assignee
Suzhou Zhaoxin Semiconductor Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Zhaoxin Semiconductor Technology Co ltd
Priority to CN202110705287.8A
Publication of CN113345484A

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/12 Bit line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, equalising circuits, for bit lines
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1051 Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
    • G11C7/106 Data output latches
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1078 Data input circuits, e.g. write amplifiers, data input buffers, data input registers, data input level conversion circuits
    • G11C7/1087 Data input latches
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C8/00 Arrangements for selecting an address in a digital store
    • G11C8/08 Word line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, for word lines

Landscapes

  • Engineering & Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a data operation circuit and a storage and calculation integrated chip. The data operation circuit comprises a decoding circuit and a look-up table array. The decoding circuit comprises a multiplicand input end, a multiplier input end and a decoding output end; the multiplicand input end has a bit width of N1 and can input 2^N1 kinds of multiplicands; the multiplier input end has a bit width of N2 and can input 2^N2 kinds of multipliers; the decoding output end has a bit width of 2^(N1+N2) and outputs 2^(N1+N2) kinds of decoded output signals, each corresponding to one combination of a multiplicand and a multiplier. The look-up table array comprises a storage array connected with the decoding output end, and a readout circuit; the storage array stores 2^(N1+N2) operation results, each being the product of one multiplicand and multiplier combination; the readout circuit reads from the storage array the operation result corresponding to the decoded output signal. The number of opened word lines is reduced, which reduces interference with write operations. Because no large amount of computation is required, the operation period is shortened, energy consumption is reduced, and operation efficiency is improved.

Description

Data operation circuit and storage and calculation integrated chip
Technical Field
The invention relates to the technical field of semiconductors, in particular to a data operation circuit and a storage and calculation integrated chip.
Background
PIM (Processing in Memory) is an approach in which some operations are performed in the memory and the remaining operations are performed in a processor. Compared with the conventional approach, in which all data required for an operation is placed in memory and all operations are completed by the processor, in-memory operation reduces the energy consumed in moving data between the memory and the cache and between the cache and the CPU (central processing unit), and improves system performance.
Existing in-memory operation is mainly implemented in analog form. The main disadvantages of analog in-memory operation are as follows: multiple word lines must be opened simultaneously to operate on the values of the memory cells, and because many word lines are open, interference with write operations is severe. In addition, the multiplication period is long, the energy loss is large, and the efficiency is low.
Disclosure of Invention
The invention provides a data operation circuit and a storage and computation integrated chip, which are used for reducing the number of open word lines and reducing the interference on write operation; meanwhile, the operation period is shortened, the energy consumption is reduced, and the operation efficiency is improved.
In a first aspect, the present invention provides a data operation circuit comprising a decoding circuit and a look-up table array. The decoding circuit comprises a multiplicand input end, a multiplier input end and a decoding output end. The multiplicand input end has a bit width of N1 and is used for inputting 2^N1 kinds of multiplicands; the multiplier input end has a bit width of N2 and is used for inputting 2^N2 kinds of multipliers; the decoding output end has a bit width of 2^(N1+N2) and is used for outputting 2^(N1+N2) kinds of decoded output signals, each corresponding to one combination of a multiplicand and a multiplier. The look-up table array comprises a storage array connected with the decoding output end and a readout circuit connected with the storage array. The storage array stores 2^(N1+N2) operation results; each operation result is the product of one multiplicand and multiplier combination and corresponds to one decoded output signal. The readout circuit is used for reading, according to the decoded output signal transmitted to the storage array, the operation result in the storage array that corresponds to that decoded output signal.
In the above scheme, the multiplication circuit is composed of a decoding circuit and a look-up table array. The decoding circuit can receive 2^N1 kinds of multiplicands through the multiplicand input end and 2^N2 kinds of multipliers through the multiplier input end, and the products of all 2^(N1+N2) combinations of a multiplicand and a multiplier are stored in the storage array in advance. To perform a multiplication, the decoding circuit only needs to generate the decoded output signal corresponding to the input combination of multiplicand and multiplier and transmit it to the storage array; the readout circuit then reads out the product of that combination, which was stored in the storage array in advance. Compared with the analog in-memory operation of the prior art, the present scheme addresses and reads the stored operation result using only the decoded output signal generated by the decoding circuit, so there is no need to open multiple word lines to operate on the values in the memory cells. The number of opened word lines is therefore reduced, which reduces interference with write operations. Multiplication is replaced by storing the operation results in the storage array in advance, so the product of a multiplicand and a multiplier can be obtained by look-up without a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency.
In application, for multiplicands and multipliers with fixed bit widths, all products are reduced to small-scale multiplication operations. Where a large number of repeated multiplication operations is required, the look-up table array replaces them, which can greatly reduce the amount of computation and improve operation efficiency. Compared with prior-art schemes, the scheme of the present application also supports combinations in which the bit width N1 of the multiplicand input end and the bit width N2 of the multiplier input end are unequal, and is therefore more flexible.
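As a purely illustrative sketch, the following Python fragment shows one way a wider product could be reduced to small-scale multiplications that are each served by a precomputed table. The chunking scheme, the shift-and-add recombination, and the names `LUT`, `split_chunks` and `lut_multiply` are assumptions made for illustration; the source does not specify how wider operands are partitioned.

```python
# Illustrative sketch only: the partitioning scheme and all names are assumptions.
N1, N2 = 2, 2  # bit widths of the multiplicand and multiplier inputs

# Precompute all 2^(N1+N2) products once, as the storage array would hold them.
LUT = {(d, w): d * w for d in range(1 << N1) for w in range(1 << N2)}


def split_chunks(value, width, count):
    """Split value into `count` chunks of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]


def lut_multiply(a, b, a_bits=8, b_bits=8):
    """Multiply a by b using only table look-ups, shifts and additions.

    a_bits must be a multiple of N1 and b_bits a multiple of N2.
    """
    a_chunks = split_chunks(a, N1, a_bits // N1)
    b_chunks = split_chunks(b, N2, b_bits // N2)
    total = 0
    for i, d in enumerate(a_chunks):
        for j, w in enumerate(b_chunks):
            # Each partial product is a look-up; its weight is set by the chunk positions.
            total += LUT[(d, w)] << (i * N1 + j * N2)
    return total


print(lut_multiply(173, 94))  # prints 16262
```

In this sketch every partial product comes from the same 16-entry table, which mirrors the idea that repeated small multiplications are replaced by look-ups.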
In one particular embodiment, the storage array comprises at least 2^(N1+N2) word lines, at least (N1+N2) bit lines, and a memory cell formed at the intersection of each word line and each bit line. The decoding output end is connected with the 2^(N1+N2) word lines so as to transmit the 2^(N1+N2) kinds of decoded output signals to them. The 2^(N1+N2) operation results correspond one-to-one to the 2^(N1+N2) word lines, and each operation result is stored, from high order to low order, in the (N1+N2) memory cells of its corresponding word line. The readout circuit comprises (N1+N2) readout circuit units, which correspond one-to-one to the (N1+N2) bit lines; each readout circuit unit is connected with its corresponding bit line so as to read, according to the decoded output signal transmitted to the 2^(N1+N2) word lines, the operation result stored in the (N1+N2) memory cells on the corresponding word line. Because each of the 2^(N1+N2) operation results is stored in the memory cells of a single word line, the readout circuit needs to open only one word line to read a result, which reduces the number of opened word lines and the interference with write operations. Using at least (N1+N2) memory cells for each operation result ensures that a result is never too wide to be stored.
In a specific embodiment, the 2^(N1+N2) bits of the decoding output end correspond one-to-one to the 2^(N1+N2) word lines, and each decoded output signal comprises a signal that opens one of the 2^(N1+N2) word lines and closes the others. Because one bit of the decoding output end corresponds to one word line, setting each bit of the decoding output end to 0 or 1 opens or closes the corresponding word line, which makes it convenient to address the corresponding operation result.
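The one-bit-per-word-line decoding described above can be sketched in Python as a one-hot encoder. The function name `decode_one_hot` and the ordering of word lines (multiplicand bits above multiplier bits) are assumptions for illustration; the source states only that each combination opens exactly one word line.

```python
# Illustrative sketch only: the word-line ordering is an assumption.
def decode_one_hot(multiplicand, multiplier, n1=2, n2=2):
    """Return 2^(n1+n2) word-line enable bits with exactly one bit set."""
    if not 0 <= multiplicand < (1 << n1):
        raise ValueError("multiplicand does not fit in n1 bits")
    if not 0 <= multiplier < (1 << n2):
        raise ValueError("multiplier does not fit in n2 bits")
    # Assumed mapping: concatenate multiplicand (high bits) and multiplier (low bits).
    selected = (multiplicand << n2) | multiplier
    return [1 if line == selected else 0 for line in range(1 << (n1 + n2))]


word_lines = decode_one_hot(0b01, 0b11)
print(word_lines.index(1), sum(word_lines))  # prints: 7 1
```

Whatever ordering is chosen, the property that matters for the scheme is that the enable bits sum to one, so only a single word line is opened per multiplication.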
In one embodiment, the storage array comprises at least (N1+N2+N3) bit lines, and each word line comprises (N1+N2+N3) memory cells. From high order to low order, the (N1+N2+N3) memory cells store N3 carry compensation bits followed by the (N1+N2)-bit operation result. The readout circuit comprises (N1+N2+N3) readout circuit units, which correspond one-to-one to the (N1+N2+N3) bit lines, and each readout circuit unit is connected with its corresponding bit line. Storing N3 additional carry compensation bits with each operation result prevents overflow in subsequent addition operations.
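To give a sense of what the N3 carry compensation bits buy, the following Python sketch computes how many worst-case products can be summed before an (N1+N2+N3)-bit value overflows. The function name `max_safe_accumulations` and the worst-case sizing rule are assumptions for illustration; the source does not state how N3 is chosen.

```python
# Illustrative sketch only: the sizing rule is an assumption, not taken from the source.
def max_safe_accumulations(n1, n2, n3):
    """Number of worst-case products whose sum still fits in n1 + n2 + n3 bits."""
    max_product = ((1 << n1) - 1) * ((1 << n2) - 1)  # largest table entry
    max_value = (1 << (n1 + n2 + n3)) - 1            # largest representable sum
    return max_value // max_product


for n3 in range(3):
    print(n3, max_safe_accumulations(2, 2, n3))
# prints:
# 0 1
# 1 3
# 2 7
```

With N1 = N2 = 2 the largest product is 9, so each extra compensation bit roughly doubles the number of worst-case products that can be accumulated without overflow.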
In one specific embodiment, the data operation circuit further comprises (N1+N2+N3) full adders and (N1+N2+N3) D flip-flops. The (N1+N2+N3) full adders correspond one-to-one to the (N1+N2+N3) readout circuit units. Each full adder has a first input end, a second input end, a low-order carry input end, a high-order carry output end and a local summation output end, and the first input end of each full adder is connected with its corresponding readout circuit unit. The (N1+N2+N3) D flip-flops correspond one-to-one to the (N1+N2+N3) full adders. Each D flip-flop has a data input end, a data output end and a clock signal end; its data input end is connected with the local summation output end of the corresponding full adder, and its data output end is connected with the second input end of that full adder. The low-order carry input end of each higher-order full adder is connected with the high-order carry output end of the adjacent lower-order full adder; the low-order carry input end of the lowest-order full adder is connected to a low level, and the high-order carry output end of the highest-order full adder is connected to a low level. A single group of full adders and D flip-flops thus implements self-accumulating addition, so multi-bit addition does not require the prior-art approach of building an adder tree from multiple adders. This reduces the number of addition stages and improves operation efficiency.
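The self-accumulating structure described above can be modelled behaviourally in Python as a ripple-carry chain of full adders whose sum bits are fed back through a register. The class name `Accumulator`, the helper `to_bits`, and the LSB-first bit ordering are assumptions for illustration; timing and the internal structure of the flip-flops are not modelled.

```python
# Illustrative behavioural sketch only: names and bit ordering are assumptions.
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (local sum, carry out)."""
    local_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return local_sum, carry_out


def to_bits(value, width):
    """Convert an integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


class Accumulator:
    """One full adder and one D flip-flop per bit, with the sum fed back."""

    def __init__(self, width):
        self.width = width
        self.q = [0] * width  # D flip-flop outputs, least significant bit first

    def clock(self, read_bits):
        """Add one read-out value to the stored sum on a clock event."""
        carry = 0  # the lowest carry input is tied low
        next_q = []
        for a, b in zip(read_bits, self.q):
            local_sum, carry = full_adder(a, b, carry)
            next_q.append(local_sum)
        self.q = next_q  # the carry out of the highest bit is not used
        return self.value()

    def value(self):
        return sum(bit << i for i, bit in enumerate(self.q))


acc = Accumulator(width=5)  # N1 + N2 + N3 with N1 = N2 = 2 and N3 = 1
for product in (3, 9, 6, 2):
    acc.clock(to_bits(product, 5))
print(acc.value())  # prints 20
```

A single row of adders is reused on every cycle here, which is the behaviour the text contrasts with an adder tree.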
In a specific embodiment, the data operation circuit further comprises a control circuit. The control circuit is connected to the clock signal end to transmit a control signal to it, and is also connected to the readout circuit. The control circuit can thereby control the clock signal end to start or stop the self-accumulating addition.
In one specific embodiment, each D flip-flop comprises a first latch connected with the data input end and a second latch connected with the first latch; the second latch is also connected with the data output end. Under the control of the clock signal, the first latch latches input data from the data input end and transmits it to the second latch, and the second latch latches the data from the first latch and transmits it from the data output end to the corresponding full adder. With two latches, the two most recent results output by the local summation output end can be stored, which makes it convenient to cache several historical addition results.
In a specific embodiment, each D flip-flop further comprises a first transmission gate, a second transmission gate, a third transmission gate and a fourth transmission gate. The first transmission gate is connected between the data input end and the first latch, the second transmission gate is connected within the first latch, the third transmission gate is connected between the first latch and the second latch, and the fourth transmission gate is connected within the second latch. All four transmission gates are connected with the clock signal end. When the clock signal end receives a first level signal, the first and fourth transmission gates are turned off and the second and third transmission gates are turned on; when the clock signal end receives a second level signal, the first and fourth transmission gates are turned on and the second and third transmission gates are turned off. In this way, the input supplied to the full adder through the second input end is held unchanged while the data latched by the two latches is updated.
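The two-latch, four-transmission-gate behaviour described above can be approximated by the following Python model. It is a simplified sketch: the class name `DFlipFlop` and the level labels "first" and "second" are illustrative, signal inversion inside the latches is ignored, and the source does not say which level is high or low.

```python
# Illustrative behavioural sketch only: polarity and inverters are not modelled.
class DFlipFlop:
    """Two cascaded latches steered by the clock level."""

    def __init__(self):
        self.first_latch = 0
        self.second_latch = 0

    def apply(self, level, d):
        """Apply a clock level with data input d; return the data output."""
        if level == "second":
            # Gates 1 and 4 conduct: the first latch follows the data input,
            # while the second latch holds its value through its feedback path.
            self.first_latch = d
        elif level == "first":
            # Gates 2 and 3 conduct: the first latch holds through its feedback
            # path and passes its value on to the second latch.
            self.second_latch = self.first_latch
        else:
            raise ValueError("level must be 'first' or 'second'")
        return self.second_latch


ff = DFlipFlop()
print(ff.apply("second", 1))  # prints 0: output unchanged while the input is sampled
print(ff.apply("first", 0))   # prints 1: sampled value reaches the output
```

The output changes only when the clock moves to the first level, so the value fed back to the full adder stays stable while the next sum is being captured.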
In a second aspect, the present invention further provides a storage and computation integrated chip comprising any one of the data operation circuits described above. The multiplication circuit is composed of a decoding circuit and a look-up table array. The decoding circuit can receive 2^N1 kinds of multiplicands through the multiplicand input end and 2^N2 kinds of multipliers through the multiplier input end, and the products of all 2^(N1+N2) combinations of a multiplicand and a multiplier are stored in the storage array in advance. To perform a multiplication, the decoding circuit only needs to generate the decoded output signal corresponding to the input combination of multiplicand and multiplier and transmit it to the storage array; the readout circuit then reads out the product of that combination, which was stored in the storage array in advance. Compared with the analog in-memory operation of the prior art, the present scheme addresses and reads the stored operation result using only the decoded output signal generated by the decoding circuit, so there is no need to open multiple word lines to operate on the values in the memory cells. The number of opened word lines is therefore reduced, which reduces interference with write operations.
Multiplication is replaced by storing the operation results in the storage array in advance, so the product of a multiplicand and a multiplier can be obtained by look-up without a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency. In application, for multiplicands and multipliers with fixed bit widths, all products are reduced to small-scale multiplication operations. Where a large number of repeated multiplication operations is required, the look-up table array replaces them, which can greatly reduce the amount of computation and improve operation efficiency. Compared with prior-art schemes, the scheme of the present application also supports combinations in which the bit width N1 of the multiplicand input end and the bit width N2 of the multiplier input end are unequal, and is therefore more flexible.
In one specific embodiment, the storage and computation integrated chip is a memory. The decoding circuit is a decoder in the memory, the storage array is the memory cell array of the memory, and the readout circuit is the circuit used to read the memory cell array. Using the memory's own decoder, memory cell array and readout circuit as the components of the data operation circuit makes it convenient to integrate storage and computation within the memory.
Drawings
FIG. 1 is a block diagram of a data operation circuit according to an embodiment of the present invention;
FIG. 2 is a block diagram of another data operation circuit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a circuit connection of a portion of a data operation circuit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a circuit connection between a full adder and a D flip-flop according to an embodiment of the present invention;
FIG. 5 is a schematic circuit connection diagram of a D flip-flop according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the variation of the signals at the multiplicand input, the multiplier input, the memory array, the readout circuit, and the local summation output of the full adder in four cycles according to the present invention.
Reference numerals:
10: decoding circuit; 11: multiplicand input; 12: multiplier input;
13: decoding output; 20: storage array; 30: reading circuit;
31: readout circuit unit; 40: full adder; 50: D flip-flop;
501: first latch; 502: second latch; 503: inverter;
51: first transmission gate; 52: second transmission gate; 53: third transmission gate;
54: fourth transmission gate; 60: control circuit
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To facilitate understanding of the data operation circuit provided in the embodiment of the present invention, an application scenario of the data operation circuit provided in the embodiment of the present invention is first described below, where the data operation circuit is applied in a data operation process for obtaining an operation value. The data operation circuit will be described in detail below with reference to the drawings.
Referring to FIG. 1, a data operation circuit according to an embodiment of the present invention includes a decoding circuit 10 and a look-up table array. The decoding circuit 10 includes a multiplicand input terminal 11, a multiplier input terminal 12, and a decoding output terminal 13. The multiplicand input terminal 11 has a bit width N1; when a multiplicand is input through it, each of the N1 bits takes the value "0" or "1", so 2^N1 kinds of multiplicands can be input through the multiplicand input terminal 11. The multiplier input terminal 12 has a bit width N2; when a multiplier is input through it, each of the N2 bits takes the value "0" or "1", so 2^N2 kinds of multipliers can be input through the multiplier input terminal 12. The multiplicand and multiplier input through the two terminals can therefore form 2^(N1+N2) combinations. The decoding output terminal 13 has a bit width of 2^(N1+N2) and is used for outputting 2^(N1+N2) kinds of decoded output signals, which correspond one-to-one to the 2^(N1+N2) multiplicand and multiplier combinations.
As shown in FIG. 1, the look-up table array comprises a memory array 20 connected to the decoding output terminal 13 and a readout circuit 30 connected to the memory array 20. The memory array 20 stores 2^(N1+N2) operation results, each being the product of one multiplicand and multiplier combination, so the 2^(N1+N2) operation results correspond one-to-one to the 2^(N1+N2) combinations. Each operation result also corresponds to one decoded output signal, so that the corresponding operation result in the memory array 20 can be addressed by that signal. The readout circuit 30 is configured to read, according to the decoded output signal transmitted to the memory array 20, the operation result in the memory array 20 that corresponds to that signal.
In the above scheme, the multiplication circuit is composed of the decoding circuit 10 and the look-up table array. The decoding circuit 10 can receive 2^N1 kinds of multiplicands through the multiplicand input terminal 11 and 2^N2 kinds of multipliers through the multiplier input terminal 12, and the products of all 2^(N1+N2) combinations of a multiplicand and a multiplier are stored in the memory array 20 in advance. To perform a multiplication, the decoding circuit 10 only needs to generate the decoded output signal corresponding to the input combination of multiplicand and multiplier and transmit it to the memory array 20; the readout circuit 30 then reads out the product of that combination, which was stored in the memory array 20 in advance. Compared with the analog in-memory operation of the prior art, the present scheme addresses and reads the operation result stored in the memory array 20 using only the decoded output signal generated by the decoding circuit 10, so there is no need to open multiple word lines to operate on the values in the memory cells. The number of opened word lines is therefore reduced, which reduces interference with write operations. Multiplication is replaced by storing the operation results in the memory array 20 in advance, so the product of a multiplicand and a multiplier can be obtained by look-up without a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency.
In application, for multiplicands and multipliers with fixed bit widths, all products are reduced to small-scale multiplication operations. Where a large number of repeated multiplication operations is required, the look-up table array replaces them, which can greatly reduce the amount of computation and improve operation efficiency. Compared with prior-art schemes, the scheme of the present application also supports combinations in which the bit width N1 of the multiplicand input terminal 11 and the bit width N2 of the multiplier input terminal 12 are unequal, and is therefore more flexible. Each of the above circuit components is described in detail below with reference to the drawings.
When the decoding circuit 10 is provided, referring to FIG. 3, the decoding circuit 10 comprises at least three ports: a multiplicand input terminal 11, a multiplier input terminal 12, and a decoding output terminal 13. The multiplicand input terminal 11 has a bit width N1; when a multiplicand is input through it, each of the N1 bits takes the value "0" or "1", so 2^N1 kinds of multiplicands can be input. The bit width N1 may be any positive integer, such as 1, 2, 3 or 4. For example, referring to FIG. 3, N1 may be equal to 2, in which case the multiplicand input terminal 11 is D[1:0] as shown in FIG. 3 and can input the four multiplicands 00, 01, 10 and 11. The multiplier input terminal 12 has a bit width N2; when a multiplier is input through it, each of the N2 bits takes the value "0" or "1", so 2^N2 kinds of multipliers can be input. The bit width N2 may likewise be any positive integer, such as 1, 2, 3 or 4. For example, referring to FIG. 3, N2 may be equal to 2, in which case the multiplier input terminal 12 is W[1:0] as shown in FIG. 3 and can input the four multipliers 00, 01, 10 and 11. The multiplicand and multiplier input through the two terminals can therefore form 2^(N1+N2) combinations. When the bit widths of the multiplicand input terminal 11 and the multiplier input terminal 12 are both 2, 2^4 = 16 combinations of multiplicand and multiplier can be input; the specific combinations are shown in Table 1 below. It should be noted that the multiplicand bit width N1 and the multiplier bit width N2 may be equal or different.
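The look-up scheme, including unequal bit widths, can be sketched in Python as follows. The names `build_table` and `lookup_multiply` and the row ordering are assumptions for illustration; the sketch models only the stored values and the addressing, not the circuit.

```python
# Illustrative sketch only: names and row ordering are assumptions.
def build_table(n1, n2):
    """Precompute the 2^(n1+n2) products, one entry per multiplicand/multiplier pair."""
    return [d * w for d in range(1 << n1) for w in range(1 << n2)]


def lookup_multiply(table, multiplicand, multiplier, n2):
    """Fetch a product by address; no multiplication happens at look-up time."""
    return table[(multiplicand << n2) | multiplier]


# Unequal widths: a 3-bit multiplicand and a 2-bit multiplier.
table = build_table(3, 2)
print(len(table), lookup_multiply(table, 5, 3, 2))  # prints: 32 15
```

The only cost of changing N1 or N2 in this model is the table size, 2^(N1+N2) entries, which is why unequal widths need no special handling.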
TABLE 1 multiplicand and multiplier combination and operation results stored in memory array 20
[Table 1 appears as an image in the original publication: Figure BDA0003130935750000051]
When the decoding output end 13 is set, the bit width of the decoding output end 13 is 2N1+N2For outputting 2N1+N2A decoded output signal, and 2N1+N2A decoded output signal and 2N1+N2There is a one-to-one correspondence between the multiplicand and multiplier combinations, and each decoded output signal corresponds to a multiplicand and multiplier combination. For example, the bit width of the decoding output 13 shown in fig. 3 and table 1 is 16 bits (N1 ═ N2 ═ 2), specifically WL [15:0 ═ 2]。
When the memory array 20 is provided, referring to fig. 1 and 3, the memory array 20 is connected to the decode output terminal 13, and 2 is stored in the memory array 20N1+N2Operation results, each operation result being an operation result obtained by multiplying a multiplicand and a multiplier combination, so that 2N1+N2Seed operation result sum2N1+N2There is a one-to-one correspondence between the various multiplicands and multiplier combinations. And each operation result corresponds to a decoding output signal so as to address the corresponding operation result in the memory array 20 through the decoding output signal. For example, when the bit widths of the multiplicand input terminal 11 and multiplier input terminal 12 in the foregoing example are both 2,the operation results stored in the memory array 20 are 16 types, and each operation result is an operation result multiplied by a combination of a multiplicand and a multiplier in a manner as shown in table 1.
When the readout circuit 30 is provided, the readout circuit 30 is connected to the memory array 20, and is configured to read an operation result corresponding to one kind of decoding output signal in the memory array 20 according to the one kind of decoding output signal transmitted to the memory array 20. For example, as shown in table 1, when the multiplicand input to the input/output terminal is binary "01" and the multiplier input to the multiplier input terminal 12 is binary "11", the operation result read from the memory array 20 by the read circuit 30 is binary "00011". The multiplying circuit is composed by adopting a decoding circuit 10 and a look-up table array, and the decoding circuit 10 receives 2 through a multiplicand input end 11N1A kind of multiplicand received by multiplier input terminal 12 2N2Seed multiplier, and will be 2N1Kind of multiplicand and 2N22 composed of various multipliersN1+N2All the operation results multiplied by the multiplicand and multiplier combination are stored in the memory array 20 in advance. In the multiplication operation, the decoding circuit 10 only needs to generate a corresponding decoding output signal according to the input combination of the multiplicand and the multiplier, transmit the decoding output signal to the memory array 20, and then read out the result of the multiplication of the combination of the multiplicand and the multiplier, which is stored in the memory array 20 in advance, by the readout circuit 30. Compared with the mode of simulating memory operation in the prior art, the mode of the application only needs to address and read the operation result stored in the storage array 20 through the decoding output signal generated by the decoding circuit 10, so that the operation on the numerical value in the storage unit is not needed to be carried out by opening a plurality of word lines, the number of the opened word lines is reduced, and the interference on the writing operation is reduced. 
The multiplication operation is replaced by storing the operation results in the memory array 20 in advance, so that the product of a multiplicand and a multiplier can be obtained by look-up without carrying out a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency. In application, for multiplicands and multipliers with a fixed number of bits, all products are converted into small-scale multiplication operations, and for a large number of repeated multiplication operations, the look-up table array replaces the repeated multiplications, so that the amount of computation can be greatly reduced and the operation efficiency improved. Compared with the schemes in the prior art, the scheme in the present application can support combinations in which the bit width N1 of the multiplicand input terminal 11 and the bit width N2 of the multiplier input terminal 12 are unequal, and is therefore more flexible.
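The look-up approach described above can be sketched in software. The following is a minimal illustrative model, not the circuit of the application: the function names are hypothetical, and the row address is assumed to be formed with the multiplicand as the high-order bits and the multiplier as the low-order bits, which agrees with the word lines named in the fig. 6 walk-through (for example, multiplicand "10" and multiplier "01" select WL9).

```python
def build_lut(n1: int, n2: int) -> list[int]:
    """Precompute all 2^(n1+n2) products.

    Row index = (multiplicand << n2) | multiplier (assumed address order).
    """
    return [d * w for d in range(1 << n1) for w in range(1 << n2)]


def lut_multiply(lut: list[int], n2: int, multiplicand: int, multiplier: int) -> int:
    """Return the stored product instead of computing a multiplication."""
    address = (multiplicand << n2) | multiplier
    return lut[address]


N1, N2 = 2, 2
LUT = build_lut(N1, N2)

product = lut_multiply(LUT, N2, 0b01, 0b11)
print(format(product, "05b"))  # 00011
```

The bit widths N1 and N2 need not be equal; `build_lut(3, 2)`, for instance, yields 32 rows.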
To store the 2^(N1+N2) operation results in the memory array 20, the memory array 20 may be provided with at least 2^(N1+N2) word lines, at least (N1+N2) bit lines, and memory cells formed at the intersection of each word line and each bit line.
The decode output terminal 13 is connected to the 2^(N1+N2) word lines to transmit the 2^(N1+N2) decoding output signals to the 2^(N1+N2) word lines. The 2^(N1+N2) operation results are stored in one-to-one correspondence with the 2^(N1+N2) word lines, and each operation result is stored in the (N1+N2) memory cells of the corresponding word line from the high-order bit to the low-order bit, so that the 2^(N1+N2) operation results are stored in the memory array 20. When an operation result is subsequently read, only one word line needs to be opened at a time, which reduces the number of word lines opened during operation and reduces the interference with write operations. Using at least (N1+N2) memory cells for each operation result prevents the case in which an operation result cannot be stored because it occupies more bits than are available. When N1 = N2 = 2, the number of word lines is at least 16, such as the 16 word lines WL0 to WL15 shown in fig. 3, the number of bit lines is at least 4, such as BL0 to BL3 shown in fig. 3, each word line has at least 4 memory cells, and each operation result is stored in the at least 4 memory cells.
At this time, referring to fig. 3, the readout circuit 30 may include (N1+N2) readout circuit units 31 (each DO in fig. 3 represents one readout circuit unit 31). The (N1+N2) readout circuit units 31 correspond one-to-one to the (N1+N2) bit lines, and each readout circuit unit 31 is connected to its corresponding bit line, so that, according to the decoding output signal transmitted to the 2^(N1+N2) word lines, the values stored in the (N1+N2) memory cells on the corresponding word line are read to obtain the corresponding operation result in the memory array 20. For example, when N1 = N2 = 2, the number of readout circuit units 31 is at least 4, and one bit line is connected to each readout circuit unit 31. Each readout circuit unit 31 may include a sense amplifier circuit or the like to implement the read function for the memory array 20.
To connect the decode output terminal 13 to the word lines in the memory array 20, the 2^(N1+N2) bits of the decode output terminal 13 may correspond one-to-one to the 2^(N1+N2) word lines, so that each bit of the decode output terminal 13 corresponds to one word line, and the corresponding word line is opened or closed by setting the value of that bit to 0 or 1, which makes it convenient to address the corresponding operation result. When the decoding output signals output from the decode output terminal 13 control the 2^(N1+N2) word lines, each decoding output signal may turn on one of the 2^(N1+N2) word lines and turn off all the other word lines. Therefore, only one word line is opened at a time during operation, which greatly reduces the number of opened word lines and reduces the interference with write operations. Of course, other control methods may also be used to transmit the decoding output signals between the decode output terminal 13 and the memory array 20 and to control the 2^(N1+N2) word lines.
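The one-word-line-at-a-time control described above amounts to a one-hot decoder. A minimal sketch follows, assuming (as an illustration only) that a bit value of 1 opens a word line and that the multiplicand forms the high-order address bits; the function name `decode_one_hot` is hypothetical.

```python
def decode_one_hot(multiplicand: int, multiplier: int, n1: int, n2: int) -> list[int]:
    """Return the 2^(n1+n2) word-line levels; exactly one entry is 1 (open)."""
    if not (0 <= multiplicand < (1 << n1) and 0 <= multiplier < (1 << n2)):
        raise ValueError("operand does not fit the declared bit width")
    selected = (multiplicand << n2) | multiplier  # assumed address order
    return [1 if index == selected else 0 for index in range(1 << (n1 + n2))]


word_lines = decode_one_hot(0b11, 0b11, 2, 2)
print(word_lines.index(1))  # 15
```

Because exactly one entry is 1, a read never opens more than one word line, which is the property the description relies on to limit write disturbance.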
In addition, referring to fig. 3, N3 additional bit lines may be provided in the memory array 20, so that more memory cells are formed on each word line to store the operation result. At this time, the memory array 20 may include at least (N1+N2+N3) bit lines, and each word line has (N1+N2+N3) memory cells. When each operation result is stored, the N3 carry compensation bits and the (N1+N2)-bit operation result can be stored in the (N1+N2+N3) memory cells from the high-order bit to the low-order bit. Adding the N3 carry compensation bits to the stored operation result prevents overflow when the addition operation is subsequently performed. The bit width N3 of the carry compensation bits may be any positive integer such as 1, 2, 3, or 4, and the actual value of N3 may be selected according to the actual scale of the addition operation. For example, when N3 = 1 as shown in fig. 3, that is, the carry compensation bit shown in fig. 3 is 1 bit wide, and N1 = N2 = 2, there are at least 5 bit lines, such as BL0 to BL4 shown in fig. 3. When each operation result is stored, referring to table 1 above, each carry compensation bit can be set to "0", so that the numerical value of each operation result is not affected and overflow during the subsequent addition operation can be prevented. Continuing to refer to fig. 3, at this time the readout circuit 30 includes (N1+N2+N3) readout circuit units 31, the (N1+N2+N3) readout circuit units 31 correspond one-to-one to the (N1+N2+N3) bit lines, and each readout circuit unit 31 is connected to its corresponding bit line. That is, when the number of bit lines is increased by N3, the number of readout circuit units 31 in the readout circuit 30 also needs to be increased by N3, so that all values of the carry compensation bits can be read.
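As a rough illustration of how the carry compensation bits relate to the scale of the addition, the sketch below pads each stored product with N3 leading zeros and estimates how many worst-case products can be accumulated before an (N1+N2+N3)-bit sum overflows. The worst-case bound is simple arithmetic added here for illustration and is not stated in the source; the function names are hypothetical.

```python
def row_bits(product: int, n1: int, n2: int, n3: int) -> str:
    """Stored row, high-order bit first: n3 zero carry bits, then the product."""
    width = n1 + n2 + n3
    return format(product, f"0{width}b")


def max_safe_accumulations(n1: int, n2: int, n3: int) -> int:
    """Number of worst-case products that fit in an (n1+n2+n3)-bit sum."""
    max_product = ((1 << n1) - 1) * ((1 << n2) - 1)
    max_sum = (1 << (n1 + n2 + n3)) - 1
    return max_sum // max_product


print(row_bits(9, 2, 2, 1))             # 01001
print(max_safe_accumulations(2, 2, 1))  # 3
```

Under these assumptions, with N1 = N2 = 2 and N3 = 1 the guaranteed headroom is three worst-case products (3 × 9 = 27 ≤ 31); sums of smaller products, such as 0 + 9 + 2 + 2 = 13, also fit.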
Referring to fig. 2 and 3, a circuit configuration for processing addition may be further provided in the data operation circuit, and in this case the self-accumulation full-addition operation may be implemented by a combination of the full adders 40 and the D flip-flops 50. Specifically, the data operation circuit may further include (N1+N2+N3) full adders 40 and (N1+N2+N3) D flip-flops 50. The (N1+N2+N3) full adders 40 correspond one-to-one to the (N1+N2+N3) readout circuit units 31, and the (N1+N2+N3) D flip-flops 50 correspond one-to-one to the (N1+N2+N3) full adders 40; that is, each readout circuit unit 31 is connected to one full adder 40, and each full adder 40 is connected to one D flip-flop 50. In the specific connection, referring to fig. 3 and 4, each full adder 40 includes at least 5 ports: a first input terminal (the terminal A in fig. 4), a second input terminal (the terminal B in fig. 4), a low carry input terminal (the terminal CIN in fig. 3 and 4), a high carry output terminal (the terminal COUT in fig. 3 and 4), and a local summation output terminal (the terminal SUM in fig. 3 and 4). The first input terminal and the second input terminal are used for inputting the summand and the addend. The low carry input terminal receives the carry from the lower-order full adder 40 when the sum at the lower-order full adder 40 is greater than binary "1". The high carry output terminal carries to the higher-order full adder 40 when the sum at the current bit is greater than binary "1".
The result output by the local summation output terminal and the high carry output terminal equals the sum of the summand and the addend input at the first and second input terminals and the carry value input at the low carry input terminal. Referring to fig. 4 and 5, each D flip-flop 50 includes at least 3 ports: a data input terminal (the D terminal in fig. 4 and 5), a data output terminal (the Q terminal in fig. 4 and 5), and a clock signal terminal (the CLK terminal in fig. 4 and 5).
In particular, when each full adder 40 and D flip-flop 50 are connected, referring to fig. 4, the first input terminal of each full adder 40 is connected to the corresponding sense circuit unit 31 to receive the value of the local output after the multiplication, and the output D0 of each sense circuit unit 31 shown in fig. 3 is connected to the a terminal of the full adder 40 below the output D0. The data output of each D flip-flop 50 is connected to the second input of the corresponding full adder 40. The data input of each D flip-flop 50 is connected to the local summing output of the corresponding full adder 40. And between two adjacent full adders 40 in high and low order, the low carry input terminal of the high full adder 40 is connected with the high carry output terminal of the low full adder 40. The carry low input terminal of the full adder 40 with the lowest bit number is connected to the low level, so that the carry low input terminal of the full adder 40 with the lowest bit number has a binary "0" value, which means that the full adder 40 with the lowest bit number has no carry operation. The high carry output terminal of the full adder 40 with the highest bit number is connected to the low level, so that the value output from the high carry output terminal of the full adder 40 with the highest bit number is initially "0" in binary. The self-accumulation addition operation can be realized through a group of full adders 40 and D flip-flops 50, and the multi-bit addition can be realized without adopting the mode that adder trees are formed by a plurality of adders in the prior art. The addition operation level is reduced, and the operation efficiency is improved.
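The self-accumulating connection described above (readout value on A, stored value on B, carries chained from low to high) can be sketched as a ripple-carry adder feeding a register. This is a behavioral illustration only: the helper names are hypothetical, bits are listed low-order first, and the clocked D flip-flops are simplified to a plain list that is overwritten after each addition.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (SUM, COUT)."""
    total = a + b + cin
    return total & 1, total >> 1


def ripple_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Chain full adders, low-order bit first; the lowest CIN is tied to 0."""
    carry = 0
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


WIDTH = 5  # N1 + N2 + N3 with N1 = N2 = 2, N3 = 1
register = to_bits(0, WIDTH)  # stands in for the D flip-flop outputs (B inputs)
for product in (9, 2, 2):     # values arriving on the A inputs
    register, _ = ripple_add(to_bits(product, WIDTH), register)

print(from_bits(register))  # 13
```

A single row of adders is reused on every cycle, which is the contrast the description draws with an adder tree built from several adders.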
Referring to fig. 2, the data operation circuit may further include a control circuit 60 connected to the clock signal terminal to transmit a control signal to the clock signal terminal, so as to control the start or stop of the self-accumulation addition operation of the clock signal terminal by the control circuit 60. In addition, the control circuit 60 may be connected to the readout circuit 30 to control the readout circuit 30 to read the operation result in the memory array 20.
In the setting of each D flip-flop 50, as shown in fig. 5, in a D flip-flop 50, the D flip-flop 50 includes two latches, a first latch 501 connected to a data input terminal and a second latch 502 connected to the first latch 501, and the second latch 502 is also connected to a data output terminal. Wherein the first latch 501 is used to latch input data from the data input terminal and transfer the input data to the second latch 502 under the control of the clock signal, and the second latch 502 is used to latch data from the first latch 501 and transfer the data from the data output terminal to the corresponding full adder 40 under the control of the clock signal. By arranging the two latches, the operation results of the first two times output by the home position summation output end can be stored, and a plurality of historical addition operation results can be cached conveniently.
With continued reference to fig. 5, a plurality of transmission gates may also be provided in the D flip-flop 50, such that the clock signal controls the data input and data output in the two latches by controlling the turning off or on of the transmission gates. In a specific arrangement, as shown in fig. 5, each D flip-flop 50 may have 4 transmission gates, where the 4 transmission gates are a first transmission gate 51(PG1), a second transmission gate 52(PG2), a third transmission gate 53(PG3), and a fourth transmission gate 54(PG 4). Wherein the first transmission gate 51 may be connected between the first latch 501 and the data input terminal, the second transmission gate 52 may be connected within the first latch 501, the third transmission gate 53 may be connected between the first latch 501 and the second latch 502, and the fourth transmission gate 54 may be connected within the second latch 502. The four transmission gates are all connected with a clock signal end. When the clock signal specifically controls the transmission gates to be turned off or on, it may be set that when the clock signal end receives a first level signal, the first transmission gate 51 and the fourth transmission gate 54 are turned off, and the second transmission gate 52 and the third transmission gate 53 are turned on; when the clock signal terminal receives the second level signal, the first transmission gate 51 and the fourth transmission gate 54 are turned on, and the second transmission gate 52 and the third transmission gate 53 are turned off. So as to control the input to the full adder 40 through the second input to remain unchanged and update the data latched by the two latches. 
The first level signal and the second level signal may be a low level signal and a high level signal, respectively, and specifically, the first level signal may be a low level signal, and the second level signal may be a high level signal; the first level signal may be a high level signal and the second level signal may be a low level signal.
For example, when the first level signal is a low level signal and the second level signal is a high level signal, the control process is as follows. When the clock signal CLK of the D flip-flop 50 is a low level signal, the first transmission gate 51 is turned off, the second transmission gate 52 is turned on, and the Q0 signal transmitted from the data input terminal is latched in the first latch 501. The initial state of Q0 may be set high. At this time, the fourth transmission gate 54 is turned off, the third transmission gate 53 is turned on, and the Q terminal is kept at a low level, which means that the other input terminal B of the full adder 40 is "0" for the first addition. The addition operation is performed at this time, and the local summation result output from the local summation output terminal SUM is transmitted to the D terminal; since the first transmission gate 51 is turned off, the new local summation result is not transmitted into the D flip-flop 50. When the clock signal of the D flip-flop 50 is at a high level, the third transmission gate 53 is turned off, the fourth transmission gate 54 is turned on, and the Q signal is latched in the second latch 502, so that the input of the second input terminal of the full adder 40 remains unchanged and the data output terminal Q of the D flip-flop 50 remains unchanged. At this time, the second transmission gate 52 is turned off, the first transmission gate 51 is turned on, the data at the data input terminal D of the D flip-flop 50 is transmitted into the D flip-flop 50, and Q0 is written with a new value and is maintained at the new level. It should be noted that the transmission gates may also be arranged in ways other than the arrangement described above.
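The two-phase behavior described above can be sketched as a small behavioral model. It assumes the first level signal is low and the second is high, ignores the signal inversions introduced by the latches and inverters (so stored values are treated as non-inverted), and uses a hypothetical class name; it is an illustration rather than a transistor-level description of fig. 5.

```python
class MasterSlaveDFF:
    """Two-latch flip-flop: Q0 is the first latch, Q is the second latch."""

    def __init__(self, initial: int = 0) -> None:
        self.q0 = initial  # value held by the first latch 501
        self.q = initial   # value held by the second latch 502 (Q terminal)

    def apply(self, clk: int, d: int) -> int:
        """Apply one clock level with data input d; return the Q output."""
        if clk == 1:
            # PG1 and PG4 on, PG2 and PG3 off:
            # the first latch takes the new D value, the second latch holds Q.
            self.q0 = d
        else:
            # PG1 and PG4 off, PG2 and PG3 on:
            # the first latch holds Q0, the second latch passes Q0 to Q.
            self.q = self.q0
        return self.q


ff = MasterSlaveDFF()
print(ff.apply(1, 1))  # 0  (Q unchanged while CLK is high; Q0 becomes 1)
print(ff.apply(0, 0))  # 1  (Q takes Q0 when CLK goes low; D is ignored)
```

In this model Q changes only when the clock returns to the low level, so the B input of the full adder stays stable while the new sum is being captured.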
In addition, referring to fig. 5, an inverter 503 may be provided between the D terminal and the first transmission gate 51 to adjust the phase. Of course, an inverter 503 may be provided between the Q terminal and the second latch 502 to adjust the phase.
An example of the operation of the data operation circuit in performing the multiplication operation and the self-accumulation addition operation is given below with reference to fig. 6. The operation shown in fig. 6 is based on a circuit configuration in which N1 = N2 = 2 and N3 = 1. Fig. 6 shows 4 operation cycles, each comprising a multiplication operation and an addition operation; each input bit, word line, readout circuit 30 output, etc. displays its binary value by a high or low level, wherein the high level indicates binary "1" and the low level indicates binary "0". For ease of distinction, the first cycle, the second cycle, the third cycle, and the fourth cycle are shown in sequence from left to right in fig. 6.
In the first cycle, referring to fig. 6, the inputs at the multiplicand input terminal 11 and the multiplier input terminal 12 are both binary "00", i.e. D[1:0] and W[1:0] are "00" and "00", respectively. The decoding output signal output by the decoding circuit 10 indicates that WL0 in the memory array 20 is open, the operation result stored in the 5 memory cells in row 0 is binary "00000", and "00000" is read out by the readout circuit 30, i.e. the operation result when both the multiplicand and the multiplier are "0" is "0". The control circuit 60 controls the clock signal to be in a low level state, and each local summation output terminal is binary "0". When the addition operation is performed, Q0 of the 0th bit D flip-flop 50 latches 0, and the output Q terminal is 0; the CIN terminal of the 0th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM0 terminal is 0 and the COUT terminal is 0; Q0 of the 1st bit D flip-flop 50 latches 0, and the output Q terminal is 0; the CIN terminal of the 1st bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM1 terminal is 0 and the COUT terminal is 0; Q0 of the 2nd bit D flip-flop 50 latches 0, and the output Q terminal is 0; the CIN terminal of the 2nd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM2 terminal is 0 and the COUT terminal is 0; Q0 of the 3rd bit D flip-flop 50 latches 0, and the output Q terminal is 0; the CIN terminal of the 3rd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM3 terminal is 0 and the COUT terminal is 0; Q0 of the 4th bit D flip-flop 50 latches 0, and the output Q terminal is 0; the CIN terminal of the 4th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM4 terminal is 0 and the COUT terminal is 0. The corresponding addition at this point is 0000 + 0000 = 0000, decimal 0 + 0 = 0.
Then the clock signal is switched to a high level state, the SUM value is transferred into Q0 of each D flip-flop 50, and the Q terminal latches the value from the previous time, so that the second input terminal and the local summation output terminal of each full adder 40 both remain unchanged. Specifically: Q0 of the 0th bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 1st bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 2nd bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 3rd bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 4th bit D flip-flop 50 is 0, and the output Q terminal latches 0.
In the second cycle, referring to fig. 6, the inputs at the multiplicand input terminal 11 and the multiplier input terminal 12 are both binary "11", i.e. D[1:0] and W[1:0] are "11" and "11", respectively. The decoding output signal output by the decoding circuit 10 indicates that WL15 in the memory array 20 is open, and the operation result stored in the 5 memory cells in the 15th row is read out as "01001" by the readout circuit 30, i.e. the operation result when the multiplicand and the multiplier are both binary "11" is binary "01001". The control circuit 60 controls the clock signal to be in a low level state, and the result after the self-accumulation operation is as follows: the CIN terminal of the 0th bit full adder 40 is 0, the A terminal is 1, and the B terminal is 0, so the SUM0 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 1st bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM1 terminal is 0 and the COUT terminal is 0; the CIN terminal of the 2nd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM2 terminal is 0 and the COUT terminal is 0; the CIN terminal of the 3rd bit full adder 40 is 0, the A terminal is 1, and the B terminal is 0, so the SUM3 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 4th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM4 terminal is 0 and the COUT terminal is 0. The corresponding multiplication at this point is 11 × 11 = 1001, decimal 3 × 3 = 9. The corresponding addition at this point is 0000 + 1001 = 1001, decimal 0 + 9 = 9. Then the control circuit 60 switches the clock signal to a high level state, the SUM value is transferred into Q0 of each D flip-flop 50, and the Q terminal latches the previous value, so that the input and output of each full adder 40 remain unchanged.
Specifically: Q0 of the 0th bit D flip-flop 50 is 1, and the output Q terminal latches 0; Q0 of the 1st bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 2nd bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 3rd bit D flip-flop 50 is 1, and the output Q terminal latches 0; Q0 of the 4th bit D flip-flop 50 is 0, and the output Q terminal latches 0.
In the third cycle, referring to fig. 6, the binary number input to the multiplicand input terminal 11 is "10" and the binary number input to the multiplier input terminal 12 is "01", i.e. D[1:0] and W[1:0] are "10" and "01", respectively. The decoding output signal output by the decoding circuit 10 indicates that WL9 in the memory array 20 is open, and the operation result stored in the 5 memory cells in the 9th row is read out as "00010" by the readout circuit 30, i.e. the operation result obtained by multiplying the multiplicand "10" by the multiplier "01" is binary "00010". The control circuit 60 controls the clock signal to be in a low level state, and the result after the self-accumulation operation is as follows: the CIN terminal of the 0th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 1, so the SUM0 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 1st bit full adder 40 is 0, the A terminal is 1, and the B terminal is 0, so the SUM1 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 2nd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM2 terminal is 0 and the COUT terminal is 0; the CIN terminal of the 3rd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 1, so the SUM3 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 4th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM4 terminal is 0 and the COUT terminal is 0. The corresponding multiplication at this point is 10 × 01 = 0010, decimal 2 × 1 = 2. The corresponding addition at this point is 1001 + 0010 = 1011, decimal 9 + 2 = 11. Then the control circuit 60 switches the clock signal to a high level state, the SUM value is transferred into Q0 of each D flip-flop 50, and the Q terminal latches the previous value, so that the input and output of each full adder 40 remain unchanged.
Specifically: Q0 of the 0th bit D flip-flop 50 is 1, and the output Q terminal latches 1; Q0 of the 1st bit D flip-flop 50 is 1, and the output Q terminal latches 0; Q0 of the 2nd bit D flip-flop 50 is 0, and the output Q terminal latches 0; Q0 of the 3rd bit D flip-flop 50 is 1, and the output Q terminal latches 1; Q0 of the 4th bit D flip-flop 50 is 0, and the output Q terminal latches 0.
In the fourth cycle, referring to fig. 6, the binary number input to the multiplicand input terminal 11 is "01" and the binary number input to the multiplier input terminal 12 is "10", i.e. D[1:0] and W[1:0] are "01" and "10", respectively. The decoding output signal output by the decoding circuit 10 indicates that WL6 in the memory array 20 is open, and the operation result stored in the 5 memory cells in the 6th row is read out as "00010" by the readout circuit 30, i.e. the operation result obtained by multiplying the multiplicand "01" by the multiplier "10" is binary "00010". The control circuit 60 controls the clock signal to be in a low level state, and the result after the self-accumulation operation is as follows: the CIN terminal of the 0th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 1, so the SUM0 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 1st bit full adder 40 is 0, the A terminal is 1, and the B terminal is 1, so the SUM1 terminal is 0 and the COUT terminal is 1; the CIN terminal of the 2nd bit full adder 40 is 1, the A terminal is 0, and the B terminal is 0, so the SUM2 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 3rd bit full adder 40 is 0, the A terminal is 0, and the B terminal is 1, so the SUM3 terminal is 1 and the COUT terminal is 0; the CIN terminal of the 4th bit full adder 40 is 0, the A terminal is 0, and the B terminal is 0, so the SUM4 terminal is 0 and the COUT terminal is 0. The corresponding multiplication at this point is 01 × 10 = 0010, decimal 1 × 2 = 2. The corresponding addition at this point is 1011 + 0010 = 1101, decimal 11 + 2 = 13. Then the control circuit 60 switches the clock signal to a high level state, the SUM value is transferred into Q0 of each D flip-flop 50, and the Q terminal latches the previous value, so that the input and output of each full adder 40 remain unchanged.
Specifically: Q0 of the 0th bit D flip-flop 50 is 1, and the output Q terminal latches 1; Q0 of the 1st bit D flip-flop 50 is 0, and the output Q terminal latches 1; Q0 of the 2nd bit D flip-flop 50 is 1, and the output Q terminal latches 0; Q0 of the 3rd bit D flip-flop 50 is 1, and the output Q terminal latches 1; Q0 of the 4th bit D flip-flop 50 is 0, and the output Q terminal latches 0.
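The four cycles walked through above can be reproduced with a short behavioral simulation. The sketch below is illustrative only: the names are hypothetical, the word-line index is assumed to be formed with D[1:0] as the high-order bits and W[1:0] as the low-order bits, and integer addition masked to 5 bits stands in for the row of full adders.

```python
N1, N2, N3 = 2, 2, 1
WIDTH = N1 + N2 + N3
MASK = (1 << WIDTH) - 1

# Precomputed products, one row per word line (assumed address order).
LUT = [d * w for d in range(1 << N1) for w in range(1 << N2)]


def run_cycles(inputs: list[tuple[int, int]]) -> list[tuple[int, int, int, int]]:
    """Return (word line, readout A, held Q, new Q0) for each cycle."""
    q0 = 0  # first-latch value across all flip-flops
    q = 0   # second-latch value, driving the B inputs
    trace = []
    for d, w in inputs:
        word_line = (d << N2) | w
        a = LUT[word_line]
        q = q0                  # CLK low: Q takes the previously captured sum
        total = (a + q) & MASK  # full adders compute A + B
        q0 = total              # CLK high: Q0 captures SUM while Q holds
        trace.append((word_line, a, q, q0))
    return trace


TRACE = run_cycles([(0b00, 0b00), (0b11, 0b11), (0b10, 0b01), (0b01, 0b10)])
for word_line, a, q, q0 in TRACE:
    print(f"WL{word_line}: A={a:05b} Q={q:05b} Q0={q0:05b}")
```

Under these assumptions the word lines opened are WL0, WL15, WL9, and WL6, and Q0 takes the values 0, 9, 11, and 13 in turn, matching the per-bit values listed for each cycle.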
The multiplication circuit is composed of the decoding circuit 10 and the look-up table array. The decoding circuit 10 receives 2^N1 kinds of multiplicands through the multiplicand input terminal 11 and 2^N2 kinds of multipliers through the multiplier input terminal 12, and the products of all 2^(N1+N2) multiplicand and multiplier combinations formed from them are stored in the memory array 20 in advance. In the multiplication operation, the decoding circuit 10 only needs to generate the decoding output signal corresponding to the input multiplicand and multiplier combination and transmit it to the memory array 20; the readout circuit 30 then reads out the product of that combination, which was stored in the memory array 20 in advance. Compared with the analog in-memory operation used in the prior art, the present application only needs to address and read the operation result stored in the memory array 20 through the decoding output signal generated by the decoding circuit 10, so that there is no need to open a plurality of word lines to operate on the values in the memory cells. The number of opened word lines is reduced, and the interference with write operations is reduced. The multiplication operation is replaced by storing the operation results in the memory array 20 in advance, so that the product of a multiplicand and a multiplier can be obtained by look-up without carrying out a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency.
In application, for multiplicands and multipliers with a fixed number of bits, all products are converted into small-scale multiplication operations, and for a large number of repeated multiplication operations, the look-up table array replaces the repeated multiplications, so that the amount of computation can be greatly reduced and the operation efficiency improved. Compared with the schemes in the prior art, the scheme in the present application can support combinations in which the bit width N1 of the multiplicand input terminal 11 and the bit width N2 of the multiplier input terminal 12 are unequal, and is therefore more flexible.
In addition, an embodiment of the invention also provides a storage and calculation integrated chip, which comprises any one of the data operation circuits described above. The multiplying circuit is composed of the decoding circuit 10 and the look-up table array. The decoding circuit 10 receives 2^N1 kinds of multiplicands through the multiplicand input terminal 11 and 2^N2 kinds of multipliers through the multiplier input terminal 12, and the products of all 2^(N1+N2) multiplicand and multiplier combinations formed from them are stored in the memory array 20 in advance. In the multiplication operation, the decoding circuit 10 only needs to generate the decoding output signal corresponding to the input multiplicand and multiplier combination and transmit it to the memory array 20; the readout circuit 30 then reads out the product of that combination, which was stored in the memory array 20 in advance. Compared with the analog in-memory operation used in the prior art, the present application only needs to address and read the operation result stored in the memory array 20 through the decoding output signal generated by the decoding circuit 10, so that there is no need to open a plurality of word lines to operate on the values in the memory cells. The number of opened word lines is reduced, and the interference with write operations is reduced. The multiplication operation is replaced by storing the operation results in the memory array 20 in advance, so that the product of a multiplicand and a multiplier can be obtained by look-up without carrying out a large amount of computation, which shortens the operation period, reduces energy consumption, and improves operation efficiency.
In application, for multiplicands and multipliers with a fixed number of bits, all products are converted into small-scale multiplication operations, and for a large number of repeated multiplication operations, the look-up table array replaces the repeated multiplications, so that the amount of computation can be greatly reduced and the operation efficiency improved. Compared with the schemes in the prior art, the scheme in the present application can support combinations in which the bit width N1 of the multiplicand input terminal 11 and the bit width N2 of the multiplier input terminal 12 are unequal, and is therefore more flexible.
When provided, the storage and calculation integrated chip may be a memory. The memory has a decoder, a memory cell array, and a readout circuit for reading the memory cell array. The decoder in the memory may be used as the decoding circuit 10, the memory cell array in the memory may be used as the memory array 20, and the readout circuit for reading the memory cell array in the memory may be used as the readout circuit 30 in the data operation circuit. Using the decoder, the memory cell array, and the readout circuit of the memory as the corresponding devices of the data operation circuit makes it convenient to integrate storage and calculation in the memory. As for the type of memory, the memory may be one in which each memory cell in the memory cell array stores a single bit. For example, the memory may be a Static Random Access Memory (SRAM), or another type of memory such as a Read Only Memory (ROM). The memory cell array of a standard memory can conveniently serve as the memory array 20 in the data operation circuit, which facilitates standardized manufacture and improves scalability.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data arithmetic circuit, comprising:
a decoding circuit comprising a multiplicand input end, a multiplier input end, and a decoding output end; wherein the bit width of the multiplicand input end is N1, and the multiplicand input end is used for inputting 2^N1 kinds of multiplicand; the bit width of the multiplier input end is N2, and the multiplier input end is used for inputting 2^N2 kinds of multiplier; the bit width of the decoding output end is 2^(N1+N2), for outputting 2^(N1+N2) decoding output signals, each decoding output signal corresponding to one multiplicand and multiplier combination;
a look-up table array comprising a storage array connected to the decoding output end and a readout circuit connected to the storage array; wherein 2^(N1+N2) operation results are stored in the storage array, each operation result being the result of multiplying one multiplicand and multiplier combination and corresponding to one decoding output signal; the readout circuit is used for reading, according to a decoding output signal transmitted to the storage array, the operation result in the storage array that corresponds to that decoding output signal.
2. The data arithmetic circuit of claim 1, wherein the storage array comprises at least 2^(N1+N2) word lines, at least (N1+N2) bit lines, and memory cells formed at the intersections of any one word line and any one bit line;
the decoding output end is connected to the 2^(N1+N2) word lines to transmit the 2^(N1+N2) decoding output signals to the 2^(N1+N2) word lines;
the 2^(N1+N2) operation results are in one-to-one correspondence with the 2^(N1+N2) word lines, and each operation result is stored, from high order to low order, in (N1+N2) memory cells on the corresponding word line;
the readout circuit comprises (N1+N2) readout circuit units in one-to-one correspondence with the (N1+N2) bit lines, each readout circuit unit being connected to the corresponding bit line and used for reading, according to the decoding output signal transmitted to the 2^(N1+N2) word lines, the operation result stored in the (N1+N2) memory cells on the corresponding word line.
3. The data arithmetic circuit of claim 2, wherein the 2^(N1+N2) bits of the decoding output end are in one-to-one correspondence with the 2^(N1+N2) word lines;
each decoding output signal comprises a signal that turns on one of the 2^(N1+N2) word lines and turns off the other word lines.
4. The data arithmetic circuit of claim 2, wherein the storage array comprises at least (N1+N2+N3) bit lines, and each word line has (N1+N2+N3) memory cells; wherein the (N1+N2+N3) memory cells store, in order from high order to low order, N3 carry compensation bits and the (N1+N2)-bit operation result;
the readout circuit comprises (N1+N2+N3) readout circuit units in one-to-one correspondence with the (N1+N2+N3) bit lines, each readout circuit unit being connected to the corresponding bit line.
5. The data arithmetic circuit of claim 4, further comprising:
(N1+N2+N3) full adders in one-to-one correspondence with the (N1+N2+N3) readout circuit units; each full adder has a first input end, a second input end, a low-order carry input end, a high-order carry output end, and a local sum output end; the first input end of each full adder is connected to the corresponding readout circuit unit;
(N1+N2+N3) D flip-flops in one-to-one correspondence with the (N1+N2+N3) full adders; each D flip-flop has a data input end, a data output end, and a clock signal end; the data input end of each D flip-flop is connected to the local sum output end of the corresponding full adder, and the data output end of each D flip-flop is connected to the second input end of the corresponding full adder;
wherein the low-order carry input end of each higher-order full adder is connected to the high-order carry output end of the adjacent lower-order full adder; the low-order carry input end of the lowest-order full adder is connected to a low level, and the high-order carry output end of the highest-order full adder is connected to a low level.
6. The data arithmetic circuit of claim 5, further comprising: a control circuit connected to the clock signal end to transmit a control signal to the clock signal end, the control circuit being further connected to the readout circuit.
7. The data arithmetic circuit of claim 6, wherein each D flip-flop comprises: a first latch connected to the data input end, and a second latch connected to the first latch, the second latch being further connected to the data output end;
wherein the first latch is configured to latch input data from the data input end and to transfer the input data to the second latch under control of a clock signal; the second latch is configured to latch the data from the first latch and to transmit the data from the data output end to the corresponding full adder under control of the clock signal.
8. The data arithmetic circuit of claim 7, wherein each D flip-flop further comprises:
a first transmission gate connected between the first latch and the data input end;
a second transmission gate connected within the first latch;
a third transmission gate connected between the first latch and the second latch;
a fourth transmission gate connected within the second latch;
wherein the four transmission gates are connected to the clock signal end; when the clock signal end receives a first level signal, the first transmission gate and the fourth transmission gate are turned off, and the second transmission gate and the third transmission gate are turned on; when the clock signal end receives a second level signal, the first transmission gate and the fourth transmission gate are turned on, and the second transmission gate and the third transmission gate are turned off.
9. A storage and calculation integrated chip comprising the data arithmetic circuit according to any one of claims 1 to 8.
10. The storage and calculation integrated chip of claim 9, wherein the storage and calculation integrated chip is a memory;
the decoding circuit is a decoder in the memory, and the storage array is a memory cell array in the memory; the readout circuit is a readout circuit in the memory for reading the memory cell array.
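The accumulation structure recited in claims 4 and 5 (one full adder and one D flip-flop per bit line, with N3 carry compensation bits) can be sketched behaviourally as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, bits are indexed from the low order for convenience, the carry compensation bits are assumed to be stored as zeros, and the carry out of the highest full adder is simply discarded.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (local sum, carry out)."""
    local_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return local_sum, carry_out


def to_bits(value, width):
    """Return `value` as `width` bits, index 0 being the lowest-order bit."""
    return [(value >> i) & 1 for i in range(width)]


class Accumulator:
    """Ripple-carry adder whose sum outputs feed a register of D flip-flops."""

    def __init__(self, width):
        self.width = width
        self.register = [0] * width  # D flip-flop outputs, index 0 = lowest order

    def clock(self, row_bits):
        """Add one row read from the storage array to the stored sum."""
        carry = 0  # carry input of the lowest full adder is tied to a low level
        sums = []
        for read_bit, stored_bit in zip(row_bits, self.register):
            local_sum, carry = full_adder(read_bit, stored_bit, carry)
            sums.append(local_sum)
        self.register = sums  # captured by the flip-flops; the final carry is dropped

    def value(self):
        """Return the accumulated sum as an integer."""
        return sum(bit << i for i, bit in enumerate(self.register))
```

With N1 = N2 = 2 and N3 = 2 the register is 6 bits wide, so in this sketch up to 2^N3 = 4 maximal products (3 × 3 = 9 each) can be accumulated without overflow, because each product is smaller than 2^(N1+N2).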
CN202110705287.8A 2021-06-24 2021-06-24 Data operation circuit and storage and calculation integrated chip Pending CN113345484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110705287.8A CN113345484A (en) 2021-06-24 2021-06-24 Data operation circuit and storage and calculation integrated chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110705287.8A CN113345484A (en) 2021-06-24 2021-06-24 Data operation circuit and storage and calculation integrated chip

Publications (1)

Publication Number Publication Date
CN113345484A true CN113345484A (en) 2021-09-03

Family

ID=77478479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110705287.8A Pending CN113345484A (en) 2021-06-24 2021-06-24 Data operation circuit and storage and calculation integrated chip

Country Status (1)

Country Link
CN (1) CN113345484A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114937470A (en) * 2022-05-20 2022-08-23 电子科技大学 Fixed point full-precision memory computing circuit based on multi-bit SRAM unit
CN117235003A (en) * 2023-09-26 2023-12-15 海光信息技术(苏州)有限公司 Memory readout circuit, data operation method in memory and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN112997147A (en) * 2018-10-10 2021-06-18 美光科技公司 Vector register implemented in memory

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN112997147A (en) * 2018-10-10 2021-06-18 美光科技公司 Vector register implemented in memory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林其芃 et al.: "A fast logic circuit based on multi-valued RRAM" (一种基于多值RRAM的快速逻辑电路), Microelectronics (《微电子学》) *

Similar Documents

Publication Publication Date Title
CN102197436B (en) Data path for multi-level cell memory, methods for storing and methods for utilizing a memory array
TWI620058B (en) Swap operations in memory
US4044243A (en) Information processing system
US5261068A (en) Dual path memory retrieval system for an interleaved dynamic RAM memory unit
CN110058839B (en) Circuit structure based on static random access memory internal subtraction method
CN113345484A (en) Data operation circuit and storage and calculation integrated chip
CN113467751B (en) Analog domain memory internal computing array structure based on magnetic random access memory
US5333119A (en) Digital signal processor with delayed-evaluation array multipliers and low-power memory addressing
EP3786956A1 (en) Memory reads of weight values
US20190189166A1 (en) System comprising a memory capable of implementing calculation operations
CN114937470B (en) Fixed point full-precision memory computing circuit based on multi-bit SRAM unit
US11211115B2 (en) Associativity-agnostic in-cache computing memory architecture optimized for multiplication
CN114360595A (en) Subtraction calculation circuit structure based on row and column bi-direction in 8T SRAM memory
US20140082282A1 (en) Multi-granularity parallel storage system and storage
CN110737612A (en) processors with in-memory computation
CN111627479B (en) Coding type flash memory device, system and coding method
CN110085270B (en) Storage operation circuit module and processor
CN112216323B (en) Memory cell and static random access memory
Chen et al. An INT8 Charge-Digital Hybrid Compute-In-Memory Macro with CNN-Friendly Shift-Feed Register Design
CN112951290B (en) Memory computing circuit and device based on nonvolatile random access memory
CN117608519B (en) Signed multiplication and multiply-accumulate operation circuit based on 10T-SRAM
CN114647398B (en) Carry bypass adder-based in-memory computing device
US11935586B2 (en) Memory device and method for computing-in-memory (CIM)
US20240257865A1 (en) Memory device and method for computing-in-memory (cim)
US20230153067A1 (en) In-memory computing method and circuit, semiconductor memory, and memory structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210903)