CN116090513A - Operation method and device for matrix multiplication - Google Patents

Publication number
CN116090513A
Authority
CN
China
Prior art keywords
bit
precision
bits
data
sign
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111294888.0A
Other languages
Chinese (zh)
Inventor
雷洪
甄德根
吴桐庆
孔德辉
徐科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd
Priority to CN202111294888.0A
Priority to PCT/CN2022/129619 (WO2023078364A1)
Publication of CN116090513A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06G ANALOGUE COMPUTERS
    • G06G7/00 Devices in which the computing operation is performed by varying electric or magnetic quantities
    • G06G7/12 Arrangements for performing computing operations, e.g. operational amplifiers
    • G06G7/16 Arrangements for performing computing operations, e.g. operational amplifiers for multiplication or division
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Power Engineering (AREA)
  • Nonlinear Science (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiment of the invention provides an operation method and device for matrix multiplication. The operation method comprises: splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits; and performing a matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication, performing a matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication, and multiplexing a multiplication unit and an addition unit between the matrix multiplication operations of the floating-point data and the integer data. In the invention, input data of different data types are split so that the multiplication and addition resources of the accelerator can be multiplexed during matrix multiplication, thereby greatly reducing the chip area of the accelerator and lowering the cost.

Description

Operation method and device for matrix multiplication
Technical Field
The embodiment of the invention relates to the field of matrix multiplication, in particular to an operation method and device for matrix multiplication.
Background
With the progress of technology, neural networks in artificial intelligence place an increasing demand on the convolution and fully connected operation capabilities of accelerators, and both convolution and fully connected operations can be converted into matrix multiplication operations. Matrix multiplication consists of multiplications and additions, and the multiply-add computing power of existing accelerators has already risen from GOPS to TOPS. Raising the multiply-add computing power of an accelerator requires more arithmetic units. Chip designers therefore need to support more arithmetic units with as small an area and cost as possible, thereby achieving greater computing power.
Existing AI accelerators mainly support input data types such as INT8, INT16, INT32, FP16, FP32 and FP64. To implement an AI accelerator supporting these 6 data types, 6 independent operation units are needed, one for each input type. The disadvantage is that a given neural network generally uses one input data type, so only one operation unit is working at any one time, yet a plurality of independent operation units are still required, which increases chip area and cost.
Disclosure of Invention
The embodiment of the invention provides a matrix multiplication operation method and a matrix multiplication operation device, which at least solve the problem in the related art that chip area and cost increase because an accelerator needs independent operation units for multiple input data types.
According to an embodiment of the present invention, there is provided an operation method for matrix multiplication, including: splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits; and performing a matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication, performing a matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication, and multiplexing a multiplication unit and an addition unit between the matrix multiplication operations of the floating-point data and the integer data.
In one exemplary embodiment, performing the matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication includes: adding the exponent bits of the first floating-point data and the exponent bits of the second floating-point data, performing an exclusive OR operation on the sign bits of the first floating-point data and the sign bits of the second floating-point data, and multiplying the precision bits of the first floating-point data by the precision bits of the second floating-point data.
In one exemplary embodiment, performing the matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication includes: performing an exclusive OR operation on the sign bit of the first integer data and the sign bit of the second integer data, and multiplying the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result containing a sign bit and precision bits; performing an exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and multiplying the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result containing a sign bit and precision bits; and adding the first operation result and the second operation result.
In one exemplary embodiment, splitting two 2N-bit floating point data into corresponding sign bits, precision bits, and exponent bits, and splitting four N-bit integer data into corresponding sign bits and precision bits, includes: two 16-bit floating point type data are respectively split into a 1-bit sign bit, an 11-bit precision bit and a 4-bit exponent bit, and four 8-bit integer type data are respectively split into a 1-bit sign bit and a 7-bit precision bit.
In one exemplary embodiment, performing the matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication includes: multiplying the first floating-point data, consisting of a 1-bit sign bit and 11-bit precision bits, by the second floating-point data, consisting of a 1-bit sign bit and 11-bit precision bits, to obtain a 1-bit sign bit and a 22-bit sign-magnitude code, and converting the result into two's complement to obtain a 1-bit sign bit and a 22-bit two's-complement code.
In one exemplary embodiment, performing the matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication includes: multiplying the first integer data, consisting of a 1-bit sign bit and 7-bit precision bits, by the second integer data, consisting of a 1-bit sign bit and 7-bit precision bits, to obtain a first operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code; multiplying the third integer data, consisting of a 1-bit sign bit and 7-bit precision bits, by the fourth integer data, consisting of a 1-bit sign bit and 7-bit precision bits, to obtain a second operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code; and adding the first operation result and the second operation result to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, and converting the sign-magnitude code into two's complement to obtain a 1-bit sign bit and a 15-bit two's-complement code.
In one exemplary embodiment, the addition operation in the matrix multiplication operation of floating-point data includes: selecting the maximum from the split exponents; calculating the exponent difference of each exponent relative to the maximum; right-shifting the product data bits by the corresponding exponent difference; and adding the shifted product data.
According to another embodiment of the present invention, there is provided an operation device for matrix multiplication, including: a splitting module, configured to split each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and to split each of four N-bit integer data into corresponding sign bits and precision bits; and an operation module, configured to perform a matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication, to perform a matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication, and to multiplex the multiplication unit and the addition unit between the matrix multiplication operations of the floating-point data and the integer data.
In an exemplary embodiment, the operation module includes: and the first operation unit is used for carrying out addition operation on the exponent bits of the first floating point type data and the exponent bits of the second floating point type data, carrying out exclusive OR operation on the sign bits of the first floating point type data and the sign bits of the second floating point type data, and carrying out multiplication operation on the precision bits of the first floating point type data and the precision bits of the second floating point type data.
In an exemplary embodiment, the operation module further includes: a second operation unit, configured to perform an exclusive OR operation on the sign bit of the first integer data and the sign bit of the second integer data, and to multiply the precision bits of the first integer data by the precision bits of the second integer data, so as to obtain a first operation result containing a sign bit and precision bits; to perform an exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and to multiply the precision bits of the third integer data by the precision bits of the fourth integer data, so as to obtain a second operation result containing a sign bit and precision bits; and to add the first operation result and the second operation result.
According to a further embodiment of the invention, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In the embodiment of the invention, the input data with different data types are split, so that multiplication and addition operation resources of the accelerator can be multiplexed in the matrix multiplication process, thereby greatly reducing the chip area of the accelerator and lowering the cost.
Drawings
FIG. 1 is a flow chart of a method of operation of matrix multiplication according to an embodiment of the present invention;
FIG. 2 is a block diagram of an operation device of matrix multiplication according to an embodiment of the present invention;
FIG. 3 is a block diagram of an operation device for matrix multiplication according to another embodiment of the present invention;
FIG. 4 is a schematic diagram of 4 pairs of FP16 multiply add and 8 pairs of INT8 multiply add multiplexing according to an embodiment of the invention;
FIG. 5 is a schematic diagram of preprocessing prior to FP16 and INT8 multiplication according to an embodiment of the invention;
FIG. 6 is a schematic diagram of FP16 and INT8 implementing multiplication operations separately and conversion to complement form according to an embodiment of the invention;
FIG. 7 is a schematic diagram of FP16 and INT8 multiplication split multiplexing according to an embodiment of the invention;
FIG. 8 is a schematic diagram of FP16 and INT8 addition multiplexing according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
For accelerators that support multiple input data types, it is common to include a matrix multiplication unit for each input data type. Since a neural network computation generally uses only one input data type, only one of these matrix multiplication units is in a working state at any given time, yet the matrix multiplication units for all the input data types must exist in the accelerator.
In order to solve the above problems, an embodiment of the present invention provides a method for performing matrix multiplication. The core of the matrix multiplication operation is an adder and a multiplier, and the operation mode of the matrix multiplication provided by the embodiment mainly multiplexes the multiplier and the adder in the matrix multiplication, so that the area consumption can be greatly reduced under the condition of meeting the function realization.
In this embodiment, multiplication multiplexing is implemented by multiplying after the data are split. The multiplexing principle is that 2 multiplications and 1 addition of N-bit integer data share resources with 1 multiplication of 2N-bit floating-point data. For example, the resources of INT8×INT8 + INT8×INT8 are multiplexed with the resources of FP16, the resources of INT16×INT16 + INT16×INT16 are multiplexed with the resources of FP32, and the resources of INT32×INT32 + INT32×INT32 are multiplexed with the resources of FP64.
FIG. 1 is a flow chart of an operation method of matrix multiplication according to an embodiment of the present invention, as shown in FIG. 1, the flow includes the following steps:
Step S102: splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits;
Step S104: performing a matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication, performing a matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication, and multiplexing a multiplication unit and an addition unit between the matrix multiplication operations of the floating-point data and the integer data.
In an exemplary embodiment, performing matrix multiplication on the split two floating-point data through addition of the exponent bits, exclusive or of the sign bits, and multiplication of the precision bits in step S104 includes: adding the exponent bits of the first floating point type data and the exponent bits of the second floating point type data, performing exclusive OR operation on the sign bits of the first floating point type data and the sign bits of the second floating point type data, and performing multiplication operation on the precision bits of the first floating point type data and the precision bits of the second floating point type data.
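The three operations named above can be sketched in software as follows. This is only an illustration under stated assumptions, not the patented circuit: it uses the IEEE 754 binary16 layout (1 sign bit, 5-bit biased exponent with bias 15, 10 stored fraction bits plus an implicit leading 1, giving the 11 precision bits), handles normal numbers only, and the function names `fp16_fields` and `fp16_product` are hypothetical.

```python
import struct

def fp16_fields(x):
    """Split a normal binary16 value into (sign, biased exponent, 11-bit precision)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    exp = (bits >> 10) & 0x1F           # 5-bit biased exponent (IEEE 754 assumption)
    frac = bits & 0x3FF                 # 10 stored fraction bits
    assert 0 < exp < 31, "sketch handles normal numbers only"
    return sign, exp, (1 << 10) | frac  # prepend the implicit leading 1

def fp16_product(a, b):
    """Multiply two binary16 values via sign XOR, exponent add, precision multiply."""
    sa, ea, ma = fp16_fields(a)
    sb, eb, mb = fp16_fields(b)
    sign = sa ^ sb                      # sign-bit exclusive OR
    exp = (ea - 15) + (eb - 15)         # exponent addition with the bias removed
    mant = ma * mb                      # 11-bit x 11-bit product, at most 22 bits
    # Each precision value carries 10 fraction bits, so the product carries 20.
    value = mant * 2.0 ** (exp - 20)
    return -value if sign else value
```

Because the result is kept unrounded here, it equals the exact product of the two inputs; a hardware unit would additionally normalize and round.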
In an exemplary embodiment, performing matrix multiplication on the split four integer data two by two through sign bit exclusive or and precision bit multiplication in step S104 includes: performing exclusive OR operation on sign bits of first integer data and sign bits of second integer data, and performing multiplication operation on precision bits of the first integer data and the second integer data to obtain a first operation result containing the sign bits and the precision bits; performing exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and performing multiplication operation on the precision bit of the third integer data and the precision bit of the fourth integer data to obtain a second operation result containing the sign bit and the precision bit; and adding the first operation result and the second operation result.
In an exemplary embodiment in which the resources of INT8×INT8 + INT8×INT8 are multiplexed with the resources of FP16, step S102 comprises: splitting each of two 16-bit floating-point data into a 1-bit sign bit, 11-bit precision bits and 4-bit exponent bits, and splitting each of four 8-bit integer data into a 1-bit sign bit and 7-bit precision bits.
In an exemplary embodiment in which the resources of INT8×INT8 + INT8×INT8 are multiplexed with the resources of FP16, performing the matrix multiplication operation on the two split floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication comprises: multiplying the first floating-point data, consisting of a 1-bit sign bit and 11-bit precision bits, by the second floating-point data, consisting of a 1-bit sign bit and 11-bit precision bits, to obtain a 1-bit sign bit and a 22-bit sign-magnitude code, and converting the result into two's complement to obtain a 1-bit sign bit and a 22-bit two's-complement code.
In one exemplary embodiment, performing the matrix multiplication operation on the four INT8 integer data two by two through sign-bit exclusive OR and precision-bit multiplication comprises: multiplying the first integer data, consisting of a 1-bit sign bit and 7-bit precision bits, by the second integer data, consisting of a 1-bit sign bit and 7-bit precision bits, to obtain a first operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code; multiplying the third integer data, consisting of a 1-bit sign bit and 7-bit precision bits, by the fourth integer data, consisting of a 1-bit sign bit and 7-bit precision bits, to obtain a second operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code; and adding the first operation result and the second operation result to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, and converting the sign-magnitude code into two's complement to obtain a 1-bit sign bit and a 15-bit two's-complement code.
In one exemplary embodiment, the addition operation in the matrix multiplication operation of floating-point data includes: selecting the maximum from the split exponents; calculating the exponent difference of each exponent relative to the maximum; right-shifting the product data bits by the corresponding exponent difference; and adding the shifted product data.
The resource multiplexing method of the present embodiment may be applied to, but is not limited to, a 4-row, 4-column matrix A of N-bit integer data multiplied by a 4-row, 4-column matrix B, or a 4-row, 8-column matrix of 2N-bit floating-point data multiplied by an 8-row, 4-column matrix.
The embodiment also provides an operation device for matrix multiplication, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function, for example an operator consisting of a multiplier and an adder.
Fig. 2 is a block diagram of an operation device for matrix multiplication according to an embodiment of the present invention, and as shown in fig. 2, the operation device 100 includes a splitting module 10 and an operation module 20.
The splitting module 10 is configured to split two 2N-bit floating point data into corresponding sign bits, precision bits and exponent bits, and split four N-bit integer data into corresponding sign bits and precision bits.
The operation module 20 is configured to perform a matrix multiplication operation on the two floating-point data through exponent-bit addition, sign-bit exclusive OR and precision-bit multiplication, to perform a matrix multiplication operation on the four integer data two by two through sign-bit exclusive OR and precision-bit multiplication, and to multiplex a multiplication unit and an addition unit between the matrix multiplication operations of the floating-point data and the integer data.
FIG. 3 is a block diagram of an operation device for matrix multiplication according to another embodiment of the present invention. As shown in FIG. 3, the operation device 100 includes all the modules shown in FIG. 2, and the operation module 20 includes a first operation unit 11 and a second operation unit 12.
A first operation unit 11, configured to add the exponent bits of the first floating point type data and the exponent bits of the second floating point type data, perform an exclusive-or operation on the sign bits of the first floating point type data and the sign bits of the second floating point type data, and perform a multiplication operation on the precision bits of the first floating point type data and the precision bits of the second floating point type data.
A second operation unit 12, configured to perform an exclusive OR operation on the sign bit of the first integer data and the sign bit of the second integer data, and to multiply the precision bits of the first integer data by the precision bits of the second integer data, so as to obtain a first operation result containing a sign bit and precision bits; to perform an exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and to multiply the precision bits of the third integer data by the precision bits of the fourth integer data, so as to obtain a second operation result containing a sign bit and precision bits; and to add the first operation result and the second operation result.
In the computing device provided by the embodiment, the input data with different data types are split, so that multiplication and addition operation resources of the accelerator can be multiplexed in the matrix multiplication process, and the chip area of the accelerator is greatly reduced and the cost is reduced.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
To facilitate an understanding of the present invention, the multiplexing of INT8×INT8 + INT8×INT8 resources with FP16 resources is illustrated in FIG. 4. In the figure, the 4 multiplications and 3 additions of the FP16 data type are multiplexed with the 8 multiplications and 7 additions of the INT8 data type.
In this embodiment, the parts of the matrix multiplication that are multiplexed are mainly the multiplications and the additions, and the operation flow is divided into 3 stages: input data preprocessing, multiplication, and addition.
First, input data is preprocessed.
Specifically, in this embodiment, the input fp16 and int8 data are first converted to a fixed format, mainly so that the subsequent multiplications can be multiplexed. As shown in FIG. 5, fp16 is split into a fix12 part and an exponent part, and int8 is converted into a format of 1 sign bit plus a 7-bit sign-magnitude code.
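A minimal sketch of this preprocessing step is given below. Several details are assumptions rather than statements from the text: the fix12 word is taken to be the sign bit followed by the 11-bit magnitude (implicit leading 1 plus the 10 stored fraction bits), the exponent is read as the 5-bit biased field that IEEE 754 binary16 defines (the text itself mentions a 4-bit exponent), only normal fp16 values are handled, and int8 inputs are restricted to -127..127 because -128 has no 7-bit magnitude. The names `split_fp16` and `split_int8` are hypothetical.

```python
import struct

def split_fp16(x):
    """Return (fix12, exponent) for a normal binary16 value.

    fix12 = 1 sign bit followed by an 11-bit magnitude (sign-magnitude form).
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F           # biased exponent field
    magnitude = (1 << 10) | (bits & 0x3FF)   # implicit 1 + 10 fraction bits
    return (sign << 11) | magnitude, exponent

def split_int8(v):
    """Return fix8: 1 sign bit followed by a 7-bit magnitude."""
    if not -127 <= v <= 127:
        raise ValueError("sketch assumes int8 values in -127..127")
    sign = 1 if v < 0 else 0
    return (sign << 7) | abs(v)
```

For example, `split_fp16(-1.5)` yields the fix12 word `0xE00` with biased exponent 15, and `split_int8(-5)` yields `0x85`.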
Second, multiplication is performed. Each multiplication unit takes as input either 2 fp16 values to be multiplied together or 4 int8 values to be multiplied two by two. The 2 fp16 values are multiplied as follows: the two fix12 values, each consisting of 1 sign bit and an 11-bit sign-magnitude code, are multiplied to obtain 1 sign bit and a 22-bit sign-magnitude code, which is then converted into two's complement to finally obtain 1 sign bit and a 22-bit two's-complement code. The 4 int8 values are multiplied as follows: two fix8 values, each consisting of 1 sign bit and a 7-bit sign-magnitude code, are multiplied to obtain 1 sign bit and a 14-bit sign-magnitude code; the 2 resulting products are added to obtain 1 sign bit and a 15-bit sign-magnitude code, which is converted into two's complement to finally obtain 1 sign bit and a 15-bit two's-complement code.
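The bit widths in this stage can be checked with a short sketch. It assumes the fix12 and fix8 words place the sign bit directly above the magnitude, and it performs the addition of the two int8 products on signed Python integers before re-encoding, which is a software shortcut rather than the circuit described. All function names are hypothetical.

```python
def sm_multiply(a, b, mag_bits):
    """Multiply two sign-magnitude words that each have `mag_bits` magnitude bits.

    Returns (sign, magnitude); the magnitude needs at most 2 * mag_bits bits.
    """
    mask = (1 << mag_bits) - 1
    sign = (a >> mag_bits) ^ (b >> mag_bits)   # sign-bit exclusive OR
    return sign, (a & mask) * (b & mask)       # magnitude multiplication

def to_twos_complement(sign, magnitude, mag_bits):
    """Encode (sign, magnitude) as a (1 + mag_bits)-bit two's-complement word."""
    value = -magnitude if sign else magnitude
    return value & ((1 << (mag_bits + 1)) - 1)

def fp16_precision_product(fix12_a, fix12_b):
    """fix12 x fix12 -> 1 sign bit + 22-bit two's-complement code."""
    sign, mag = sm_multiply(fix12_a, fix12_b, 11)
    return to_twos_complement(sign, mag, 22)

def int8_pair_sum(a, b, c, d):
    """fix8 a*b + c*d -> 1 sign bit + 15-bit two's-complement code."""
    s1, m1 = sm_multiply(a, b, 7)
    s2, m2 = sm_multiply(c, d, 7)
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    return to_twos_complement(1 if total < 0 else 0, abs(total), 15)
```

The widths are sufficient: the largest fix8 pair sum is 2 × 127 × 127 = 32258, which fits in 15 magnitude bits, and the largest fix12 product is 2047 × 2047, which fits in 22 bits.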
In another embodiment, if the FP16 and INT8 multiplications are handled separately, they are implemented as in FIG. 6, with the two INT8 products finally added together to reduce the amount of output data.
As shown in FIG. 6, the multiplication of two FP16 values can be split into 3 operations: exponent addition, sign-bit exclusive OR, and 11-bit precision-bit multiplication. The pairwise multiplication of four INT8 values can be expressed as 4 operations: 2 sign-bit exclusive ORs and 2 7-bit precision-bit multiplications.
In the present embodiment, in order to sufficiently multiplex the multiplication resources of fp16 and int8 here, a method for implementing multiplication operation by subsequent resource multiplexing is proposed.
In this embodiment, the multiplication of fp16 and the multiplication of int8 may be split in the manner shown in FIG. 7. Specifically, as shown in fig. 6, operation 3 of fp16 is split into smaller 7-bit and 4-bit granularities, and operation D of int8 is split into 7-bit and 4-bit forms. This ultimately allows 3 multipliers, DSP7 x 4 and DSP7 x 7, to be multiplexed.
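As an illustration of how an 11-bit multiplication can be built from 7-bit and 4-bit pieces, the following Python sketch splits each 11-bit magnitude into a 7-bit high part and a 4-bit low part and recombines the partial products. The exact partition and multiplier allocation of FIG. 7 cannot be recovered from the text, so this decomposition, and the names `split_7_4` and `mul11_from_parts`, are assumptions.

```python
def split_7_4(m):
    """Split an 11-bit magnitude into (7-bit high part, 4-bit low part)."""
    assert 0 <= m < (1 << 11)
    return m >> 4, m & 0xF

def mul11_from_parts(a, b):
    """Compute a * b for 11-bit magnitudes from narrow partial products."""
    a_hi, a_lo = split_7_4(a)
    b_hi, b_lo = split_7_4(b)
    p_hh = a_hi * b_hi    # 7 x 7 partial product
    p_hl = a_hi * b_lo    # 7 x 4 partial product
    p_lh = b_hi * a_lo    # 7 x 4 partial product
    p_ll = a_lo * b_lo    # 4 x 4 partial product
    # Weight each partial product by the position of its operands.
    return (p_hh << 8) + ((p_hl + p_lh) << 4) + p_ll
```

In such a scheme the 7 x 7 block is wide enough to serve a 7-bit int8 magnitude product in integer mode, which is the kind of sharing the embodiment aims for.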
In this embodiment, the resources that implement the addition in the fp16 and int8 matrix multiplications are also fully multiplexed, so the following method of implementing the addition with resource multiplexing is proposed.
In this embodiment, the addition in fp16 matrix multiplication is performed as in FIG. 8.
First, the maximum exponent is found among the 4 exponents obtained after splitting. The 4 exponents are compared in pairs and the larger of each pair is selected, giving 2 exponents; these 2 are then compared, and the larger is the maximum of the 4 exponents.
Second, the exponent differences are calculated: the difference between the maximum exponent from the first step and each of the 4 exponents is computed, giving the exponent difference of each of the 4 numbers relative to the maximum exponent.
Third, shifting: the 4 product data are shifted right, each by the exponent difference calculated in the second step.
Fourth, addition: the 4 numbers are first added in pairs to obtain 2 numbers, and these 2 numbers are then added to obtain 1 number; add_0_3 is the final result.
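The four steps can be sketched in Python as follows. The sketch treats the products as signed integers (two's complement in hardware) and uses an arithmetic right shift that simply truncates; the rounding behavior and the intermediate names `add_0_1` and `add_2_3` are assumptions, while `add_0_3` follows the text.

```python
def align_and_add(products, exponents):
    """Align 4 signed products to the largest exponent and sum them.

    Returns (add_0_3, e_max): the sum and the exponent it is scaled by.
    """
    assert len(products) == len(exponents) == 4
    # Step 1: two-level comparison tree to find the maximum exponent.
    m01 = max(exponents[0], exponents[1])
    m23 = max(exponents[2], exponents[3])
    e_max = max(m01, m23)
    # Step 2: exponent difference of each term relative to the maximum.
    diffs = [e_max - e for e in exponents]
    # Step 3: arithmetic right shift of each product by its difference.
    shifted = [p >> d for p, d in zip(products, diffs)]
    # Step 4: two-level adder tree.
    add_0_1 = shifted[0] + shifted[1]
    add_2_3 = shifted[2] + shifted[3]
    add_0_3 = add_0_1 + add_2_3
    return add_0_3, e_max
```

For instance, `align_and_add([8, 8, -8, 4], [3, 1, 2, 3])` shifts the terms to `[8, 2, -4, 4]` and returns `(10, 3)`.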
In this embodiment, the addition in the int8 matrix multiplication is performed as shown in fig. 8.
That is, the 4 numbers are first added in pairs to obtain 2 numbers, the 2 numbers are then added to obtain 1 number, and add_0_7 is the final result.
In this embodiment, the addition of fp16 and int8 is accomplished by multiplexing the following parts: 8 adders for the first addition, 4 adders for the second addition, 2 adders for the third addition, and adders for the fourth addition.
In this embodiment, the matrix multiplication unit can be multiplexed across multiple operation precisions, which greatly reduces area consumption while preserving functionality. That is, with limited chip area, matrix multiplication at more precisions can be implemented, so that the artificial intelligence accelerator can support more precisions. This improves the computing power of the artificial intelligence accelerator and broadens its application scenarios.
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic apparatus may further include a transmission device connected to the processor, and an input/output device connected to the processor.
For specific examples of this embodiment, refer to the examples described in the foregoing embodiments and exemplary implementations; they are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices, and they may be implemented in program code executable by computing devices, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from that shown or described herein. Alternatively, the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above describes only preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of operation for matrix multiplication, comprising:
respectively splitting two 2N-bit floating point data into corresponding sign bits, precision bits and exponent bits, and respectively splitting four N-bit integer data into corresponding sign bits and precision bits;
and performing a matrix multiplication operation on the two floating point type data through exponent bit addition, sign bit exclusive OR and precision bit multiplication, performing a matrix multiplication operation on the four integer type data two by two through sign bit exclusive OR and precision bit multiplication, and multiplexing a multiplication unit and an addition unit between the matrix multiplication operations of the floating point type data and the integer type data.
2. The method of claim 1, wherein matrix multiplying the two floating point data by exponent bit addition, sign bit exclusive or, and precision bit multiplication comprises:
adding the exponent bits of the first floating point type data and the exponent bits of the second floating point type data, performing exclusive OR operation on the sign bits of the first floating point type data and the sign bits of the second floating point type data, and performing multiplication operation on the precision bits of the first floating point type data and the precision bits of the second floating point type data.
3. The method of claim 1, wherein matrix multiplying the four integer data two by sign bit exclusive or and precision bit multiplication comprises:
performing exclusive OR operation on sign bits of first integer data and sign bits of second integer data, and performing multiplication operation on precision bits of the first integer data and the second integer data to obtain a first operation result containing the sign bits and the precision bits;
performing exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and performing multiplication operation on the precision bit of the third integer data and the precision bit of the fourth integer data to obtain a second operation result containing the sign bit and the precision bit;
and adding the first operation result and the second operation result.
4. The method of claim 1, wherein splitting two 2N bits of floating point type data into corresponding sign bits, precision bits, and exponent bits, and splitting four N bits of integer type data into corresponding sign bits and precision bits, comprises:
two 16-bit floating point type data are respectively split into a 1-bit sign bit, an 11-bit precision bit and a 4-bit exponent bit, and four 8-bit integer type data are respectively split into a 1-bit sign bit and a 7-bit precision bit.
5. The method of claim 4, wherein matrix multiplying the two floating point data by exponent bit addition, sign bit exclusive or, and precision bit multiplication comprises:
multiplying the first floating point type data consisting of the 1-bit sign bit and the 11-bit precision bit with the second floating point type data consisting of the 1-bit sign bit and the 11-bit precision bit to obtain a 1-bit sign bit and a 22-bit sign-magnitude code (original code), and converting the 1-bit sign bit and the 22-bit sign-magnitude code into two's complement to obtain a 1-bit sign bit and a 22-bit two's complement code.
6. The method of claim 4, wherein matrix multiplying the four integer data by sign bit exclusive or and precision bit multiplication comprises:
multiplying first integer data consisting of a 1-bit sign bit and a 7-bit precision bit with second integer data consisting of a 1-bit sign bit and a 7-bit precision bit to obtain a first operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code (original code);
multiplying third integer data consisting of a 1-bit sign bit and a 7-bit precision bit with fourth integer data consisting of a 1-bit sign bit and a 7-bit precision bit to obtain a second operation result consisting of a 1-bit sign bit and a 14-bit sign-magnitude code;
and adding the first operation result and the second operation result to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, and converting the sign-magnitude code into two's complement to obtain a 1-bit sign bit and a 15-bit two's complement code.
7. The method of claim 4, wherein the adding in the matrix multiplication of floating point data comprises:
selecting the maximum exponent from the split exponents;
respectively calculating the exponent difference of each exponent relative to the maximum exponent;
right-shifting the product data according to the exponent difference;
the shifted product data is added.
8. An arithmetic device for matrix multiplication, comprising:
the splitting module is used for splitting two 2N-bit floating point data into corresponding sign bits, precision bits and exponent bits respectively, and splitting four N-bit integer data into corresponding sign bits and precision bits respectively;
the operation module is used for carrying out matrix multiplication operation on the two floating point type data through exponent bit addition, sign bit exclusive OR and precision bit multiplication, carrying out matrix multiplication operation on the four integer type data two by two through sign bit exclusive OR and precision bit multiplication, and multiplexing the multiplication unit and the addition unit in the matrix multiplication operation of the floating point type data and the integer type data.
9. The apparatus of claim 8, wherein the operation module comprises:
the first operation unit is used for carrying out addition operation on the exponent bits of the first floating point type data and the exponent bits of the second floating point type data, carrying out exclusive OR operation on the sign bits of the first floating point type data and the sign bits of the second floating point type data, and carrying out multiplication operation on the precision bits of the first floating point type data and the precision bits of the second floating point type data.
10. The apparatus of claim 8, wherein the operation module further comprises:
the second operation unit is used for performing exclusive OR operation on the sign bit of the first integer data and the sign bit of the second integer data and performing multiplication operation on the precision bit of the first integer data and the precision bit of the second integer data so as to obtain a first operation result comprising the sign bit and the precision bit; performing exclusive OR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and performing multiplication operation on the precision bit of the third integer data and the precision bit of the fourth integer data to obtain a second operation result containing the sign bit and the precision bit; and adding the first operation result and the second operation result.
11. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 7.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
CN202111294888.0A 2021-11-03 2021-11-03 Operation method and device for matrix multiplication Pending CN116090513A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111294888.0A CN116090513A (en) 2021-11-03 2021-11-03 Operation method and device for matrix multiplication
PCT/CN2022/129619 WO2023078364A1 (en) 2021-11-03 2022-11-03 Operation method and apparatus for matrix multiplication

Publications (1)

Publication Number Publication Date
CN116090513A true CN116090513A (en) 2023-05-09

Family

ID=86208771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111294888.0A Pending CN116090513A (en) 2021-11-03 2021-11-03 Operation method and device for matrix multiplication

Country Status (2)

Country Link
CN (1) CN116090513A (en)
WO (1) WO2023078364A1 (en)

Also Published As

Publication number Publication date
WO2023078364A1 (en) 2023-05-11

Legal Events

Date Code Title Description
PB01 Publication