CN109726357A

CN109726357A - Matrix multiplication calculation method and computing device

Info

Publication number: CN109726357A
Application number: CN201711030226.6A
Authority: CN
Inventors: 贾喆; 陈凯
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2019-05-07
Anticipated expiration: 2037-10-27
Also published as: CN109726357B

Abstract

This application discloses a matrix multiplication calculation method and a computing device. The method comprises: providing a plurality of matrix multiplication implementations, each corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix; determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of a first target matrix and the column parameter of a second target matrix; and calculating the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation. The disclosed scheme allows a program to always select the optimal kernel while it runs. When the row and column parameters of the matrices being multiplied are relatively fixed, the optimal implementation can be selected automatically, which reduces the workload of the executing entity (for example, a GPU), preserves its processing performance, and improves its processing speed.

Description

Matrix multiplication calculation method and computing device

Technical field

This application relates to the computer field, and in particular to a matrix multiplication calculation method and a computing device.

Background art

As a single-instruction multiple-data (SIMD) processing device, the graphics processing unit (GPU) has a natural advantage in computing matrix multiplication. In recent years, with the substantial improvement of GPU performance and the continued development of general-purpose computing on graphics processing units (GPGPU), more and more applications have begun to perform matrix multiplication on the GPU. Deep learning applications spend most of their computation time on operations related to matrix multiplication, so improving the performance of matrix multiplication can improve the performance of many applications across a wide range, with a significant impact on production and efficiency.

When a GPU performs matrix multiplication, multiple thread blocks are usually used, and each thread block is responsible for computing one part of the result matrix. Specifically, when two 1024x1024 matrices are multiplied, the result matrix is also 1024x1024. It can be computed with a 4x4 grid of thread blocks, in which case each thread block computes a 256x256 part of the result matrix; alternatively, an 8x8 grid of thread blocks can be used, in which case each thread block computes a 128x128 part. The code responsible for computing a 256x256 or 128x128 sub-matrix is called a kernel (i.e., an implementation of the matrix computation). Choosing between the 8x8 and the 4x4 thread-block configuration is the problem this patent addresses. In actual calls, a specific kernel is usually selected according to the matrix size, and existing kernel selection methods typically make a simple judgment based on the matrix size and pick a kernel accordingly. In practice, however, because the possible combinations of row and column parameters are too numerous, such methods generally fail to select the optimal kernel, which prolongs GPU computation time and degrades the performance of the whole computing device.
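The thread-block arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: `grid_for_tile` is a hypothetical helper name, and it assumes square tiles, with ceiling division so that a partial tile at the edge still receives its own block.

```python
from typing import Tuple


def grid_for_tile(m: int, k: int, tile: int) -> Tuple[int, int]:
    """Grid of thread blocks needed when each block computes one
    tile x tile piece of an m x k result matrix."""
    # Ceiling division: a partial tile at the edge still needs its own block.
    return (-(-m // tile), -(-k // tile))


# The two configurations discussed for a 1024x1024 by 1024x1024 product.
for tile in (256, 128):
    rows, cols = grid_for_tile(1024, 1024, tile)
    print(f"{tile}x{tile} tiles -> {rows}x{cols} grid ({rows * cols} thread blocks)")
```

This prints a 4x4 grid (16 blocks) for 256x256 tiles and an 8x8 grid (64 blocks) for 128x128 tiles, matching the two options described above.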

Summary of the invention

In view of the problems in the prior art, an object of the present invention is to propose a matrix multiplication calculation method that solves the prior-art problem that the kernel cannot be selected correctly, which prolongs GPU computation time and degrades overall program performance.

To solve the above problem, an embodiment of the application discloses a matrix multiplication calculation method, comprising:

Providing a plurality of matrix multiplication implementations, each matrix multiplication implementation corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

Determining a row parameter and a column parameter of a first target matrix and a column parameter of a second target matrix;

Determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

Calculating the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.
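A minimal Python sketch of these four steps, assuming implementations are plain callables registered under their (m, n, k) triple. `REGISTRY`, `naive_matmul` and `multiply` are hypothetical names, the registered triples are placeholders, and the pure-Python product merely stands in for a real GPU kernel.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Params = Tuple[int, int, int]  # (m, n, k): rows and cols of A, cols of B
Kernel = Callable[[Matrix, Matrix], Matrix]


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple-loop product; stands in for a real GPU kernel."""
    n, k = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(n)) for j in range(k)] for row in a]


# Step 1: provide implementations, each tied to an (m, n, k) triple.
# The triples and the single reference kernel are illustrative placeholders.
REGISTRY: Dict[Params, Kernel] = {
    (2, 3, 2): naive_matmul,
    (4, 4, 4): naive_matmul,
}


def multiply(a: Matrix, b: Matrix) -> Matrix:
    # Step 2: determine m, n of the first matrix and k of the second.
    params = (len(a), len(a[0]), len(b[0]))
    # Step 3: determine the implementation registered for these parameters.
    kernel = REGISTRY.get(params)
    if kernel is None:
        raise KeyError(f"no implementation registered for {params}")
    # Step 4: compute the product with the chosen implementation.
    return kernel(a, b)
```

Here a 2 × 3 by 3 × 2 product dispatches to the kernel registered under (2, 3, 2); any unregistered shape raises `KeyError`, which is where the evaluation path described later would take over.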

In an embodiment of the matrix multiplication calculation method of the present invention, before the step of determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix, the method further includes:

Assigning an identification ID to each matrix multiplication implementation;

Storing, in a lookup table, the identification ID of the matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.

In an embodiment of the matrix multiplication calculation method of the present invention, the step of storing, in the lookup table, the identification ID of the matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation includes:

Generating a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation;

Storing the identification ID of the matrix multiplication implementation and the key as a key-value pair in the lookup table.

In an embodiment of the matrix multiplication calculation method of the present invention, the step of determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix includes:

Searching the lookup table for a matrix multiplication implementation whose parameters are identical to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix.
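The key generation, key-value storage and lookup described above might look like the following sketch. The `m_n_k` string key and the helper names `make_key`, `store` and `find` are assumptions for illustration; the application does not fix a key format.

```python
from typing import Dict, Optional


def make_key(m: int, n: int, k: int) -> str:
    """Generate the lookup key from the (m, n, k) parameters.

    The "m_n_k" string format is an assumption; any one-to-one encoding
    (for example a tuple) would serve equally well."""
    return f"{m}_{n}_{k}"


# Lookup table: key -> identification ID of the matrix multiplication implementation.
lookup_table: Dict[str, int] = {}


def store(impl_id: int, m: int, n: int, k: int) -> None:
    """Store the (key, ID) key-value pair."""
    lookup_table[make_key(m, n, k)] = impl_id


def find(m: int, n: int, k: int) -> Optional[int]:
    """Return the ID registered for (m, n, k), or None when nothing matches."""
    return lookup_table.get(make_key(m, n, k))


# Hypothetical entry: implementation ID 3 is registered for 1024 x 1024 x 1024.
store(3, 1024, 1024, 1024)
```

A `None` result from `find` corresponds to the "not found" case handled in the next embodiment.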

In an embodiment of the matrix multiplication calculation method of the present invention, the method further includes:

When no corresponding matrix multiplication implementation is found, evaluating a plurality of candidate implementations and selecting a preferred implementation from the candidate implementations according to their computation times.

In an embodiment of the matrix multiplication calculation method of the present invention, the step of evaluating a plurality of candidate implementations and selecting a preferred implementation from them according to computation time includes:

Generating a first test matrix and a second test matrix that have the same row and column parameters as the first target matrix and the second target matrix, respectively;

Computing the product of the first test matrix and the second test matrix with each of the candidate implementations, and obtaining the computation time of each candidate implementation;

Selecting the preferred implementation from the candidate implementations according to the computation times.
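One way to sketch the evaluation step in Python, assuming each candidate implementation is a callable and that the best-of-several wall-clock time is an acceptable measure. The two loop-order variants are hypothetical stand-ins for GPU kernels: they compute the same product with different memory access orders. `time_candidates` and `select_preferred` are likewise invented names.

```python
import random
import time
from typing import Callable, Dict, List

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


def matmul_ijk(a: Matrix, b: Matrix) -> Matrix:
    """Candidate 1: inner-product loop order."""
    n, k = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(n)) for j in range(k)] for row in a]


def matmul_ikj(a: Matrix, b: Matrix) -> Matrix:
    """Candidate 2: row-streaming loop order (same result, different access pattern)."""
    k = len(b[0])
    out = [[0.0] * k for _ in a]
    for row, out_row in zip(a, out):
        for a_ip, b_row in zip(row, b):
            for j in range(k):
                out_row[j] += a_ip * b_row[j]
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def time_candidates(m: int, n: int, k: int, candidates: Dict[str, Kernel],
                    repeats: int = 3, seed: int = 0) -> Dict[str, float]:
    """Build random test matrices with the target shapes (m x n, n x k) and
    record the best-of-`repeats` wall time of each candidate."""
    rng = random.Random(seed)
    a, b = random_matrix(m, n, rng), random_matrix(n, k, rng)
    times: Dict[str, float] = {}
    for name, kernel in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(a, b)
            best = min(best, time.perf_counter() - start)
        times[name] = best
    return times


def select_preferred(times: Dict[str, float]) -> str:
    """The preferred implementation is the one with the shortest time."""
    return min(times, key=times.get)
```

Taking the minimum over repeats, rather than the mean, is a common way to damp timing noise; the application itself only says the selection is made according to computation time.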

In an embodiment of the matrix multiplication calculation method of the present invention, after the step of selecting the preferred implementation from the candidate implementations according to computation time, the method further includes:

Uploading the preferred implementation to the GPU;

Generating a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the preferred implementation;

Storing the identification ID of the preferred implementation and the key as a key-value pair in the lookup table.
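After the preferred implementation is chosen, registering it could be sketched as follows. `KernelRegistry` and its `device_kernels` dictionary are hypothetical: the dictionary only stands in for uploading code to the GPU, which a real system would do through its GPU runtime.

```python
import itertools
from typing import Callable, Dict, List

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


class KernelRegistry:
    """Holds tuned implementations.

    `device_kernels` is a hypothetical stand-in for kernels uploaded to the
    GPU; a real system would load compiled code onto the device instead of
    keeping Python callables."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.device_kernels: Dict[int, Kernel] = {}
        self.lookup_table: Dict[str, int] = {}

    def register_preferred(self, kernel: Kernel, m: int, n: int, k: int) -> int:
        impl_id = next(self._ids)              # assign a unique identification ID
        self.device_kernels[impl_id] = kernel  # "upload" the preferred implementation
        key = f"{m}_{n}_{k}"                   # generate the key from (m, n, k)
        self.lookup_table[key] = impl_id       # store the key-value pair
        return impl_id
```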

To achieve the above object, an embodiment of the invention further proposes a matrix multiplication computing device, comprising:

A providing module, configured to provide a plurality of matrix multiplication implementations, each corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

A parameter determination module, configured to determine the row parameter and column parameter of a first target matrix and the column parameter of a second target matrix;

A scheme determination module, configured to determine the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

A computing module, configured to calculate the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

In an embodiment of the matrix multiplication computing device of the present invention, the device further includes:

An allocation module, configured to assign an identification ID to each matrix multiplication implementation;

A storage module, configured to store, in a lookup table, the identification ID of the matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.

In an embodiment of the matrix multiplication computing device of the present invention, the storage module further includes:

A key generation submodule, configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation;

A storage submodule, configured to store the identification ID of the matrix multiplication implementation and the key as a key-value pair in the lookup table.

In an embodiment of the matrix multiplication computing device of the present invention, the scheme determination module includes:

A preferred-scheme acquisition module, configured to, when no corresponding matrix multiplication implementation is found, evaluate a plurality of candidate implementations and select a preferred implementation from them according to computation time.

In an embodiment of the matrix multiplication computing device of the present invention, the preferred-scheme acquisition module includes:

A test matrix generation submodule, configured to generate a first test matrix and a second test matrix that have the same row and column parameters as the first target matrix and the second target matrix, respectively;

A computation time acquisition submodule, configured to compute the product of the first test matrix and the second test matrix with each candidate implementation and obtain the computation time of each candidate implementation;

A selection submodule, configured to select the preferred implementation from the candidate implementations according to computation time.

An uploading module, configured to upload the preferred implementation to the GPU;

A key generation module, configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the preferred implementation;

A key-value pair storage module, configured to store the identification ID of the preferred implementation and the key as a key-value pair in the lookup table.

An embodiment of the application further proposes an electronic device, comprising:

A memory storing computer-readable program code; and

A processor, wherein when the processor executes the stored computer-readable program code, the electronic device performs the following operations:

As can be seen from the above, the matrix multiplication calculation method and matrix multiplication computing device proposed in the embodiments of the application have at least the following advantages:

Embodiments of the present invention propose a matrix multiplication calculation method and a matrix multiplication computing device. By presetting matrix multiplication implementations, the optimal implementation can be selected automatically from the row and column parameters when the row and column parameters of the matrices being multiplied are relatively fixed. This reduces the workload of the computing device, preserves the processing performance of the executing entity (for example, a GPU), and improves its processing speed. The invention is particularly suitable for scenarios in which multiplications with the same fixed matrix parameters are computed repeatedly, such as deep learning inference and training, where it can significantly improve training and inference performance, reduce the computational load of the computing device, and increase its processing speed.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the embodiments or for the description of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the matrix multiplication calculation method of the first embodiment of the application.

Fig. 2 is a flowchart of the matrix multiplication calculation method of the second embodiment of the application.

Fig. 3 is a flowchart of the sub-steps included in step S206 of Fig. 2.

Fig. 4 is a block diagram of the matrix multiplication computing device of the third embodiment of the application.

Fig. 5 is a block diagram of the matrix multiplication computing device of the fourth embodiment of the application.

Fig. 6 is a schematic diagram of the execution of the method steps of one embodiment of the application.

Fig. 7 is a block diagram of the preferred-scheme acquisition module of the fourth embodiment of the application.

Detailed description of embodiments

The technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application fall within the protection scope of the application.

One of the core ideas of the application is a matrix multiplication calculation method and a matrix multiplication computing device that use the row and column parameters of the matrices being multiplied to look up a pre-stored optimal implementation, avoiding the prior-art need to search for an implementation repeatedly.

Fig. 6 is a schematic diagram of the execution of the method steps of one embodiment of the application. As shown in Fig. 6, the goal of the invention is to find the pre-stored best implementation for a particular matrix multiplication. In an embodiment of the invention, the row parameter m and column parameter n of the first matrix and the column parameter k of the second matrix of the matrices to be multiplied are input first, and the stored lookup table is searched for the ID of the implementation corresponding to these parameters; that implementation is the one that earlier computations have verified as best for multiplying matrices with these parameters. When the lookup table contains such an implementation, its ID is used to find the corresponding implementation (kernel), and the executing entity uses that implementation to compute the product of matrices with the corresponding parameters. In this scheme, the lookup table may be stored in a storage area such as memory or a hard disk; alternatively, no lookup table is used and the implementation is looked up directly by its parameters.
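The Fig. 6 flow, with its two lookups (parameters to ID, then ID to kernel), can be sketched as below, together with the table-free alternative of searching the implementations directly by parameters. All table contents are hypothetical placeholders, and kernels are represented by name strings rather than executable code.

```python
from typing import Dict, List, Optional, Tuple

Params = Tuple[int, int, int]  # (m, n, k)

# Hypothetical contents: ID 3 was earlier verified as best for 1024 x 1024 x 1024.
LOOKUP_TABLE: Dict[Params, int] = {(1024, 1024, 1024): 3}
KERNELS_BY_ID: Dict[int, str] = {1: "kernel-1", 2: "kernel-2", 3: "kernel-3"}


def resolve(m: int, n: int, k: int) -> Optional[str]:
    """Two-stage lookup: (m, n, k) -> implementation ID -> kernel."""
    impl_id = LOOKUP_TABLE.get((m, n, k))
    if impl_id is None:
        return None  # miss: the caller must evaluate candidates instead
    return KERNELS_BY_ID[impl_id]


# Table-free alternative: each implementation carries its own parameters.
# These (params, kernel) associations are illustrative, not from the source.
IMPLEMENTATIONS: List[Tuple[Params, str]] = [
    ((64, 64, 64), "kernel-1"),
    ((1024, 1024, 1024), "kernel-3"),
]


def resolve_direct(m: int, n: int, k: int) -> Optional[str]:
    """Linear search over implementations by their (m, n, k) parameters."""
    for params, kernel in IMPLEMENTATIONS:
        if params == (m, n, k):
            return kernel
    return None
```

The table version answers in constant time per query, while the direct search scales with the number of stored implementations; with only a handful of kernels either is adequate.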

First embodiment

The first embodiment of the invention proposes a matrix multiplication calculation method, which may be applied, for example, on a GPU. Fig. 1 is a flowchart of the steps of the matrix multiplication calculation method of the first embodiment of the invention. As shown in Fig. 1, the method includes the following steps:

S101: providing a plurality of matrix multiplication implementations, each corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

The matrix multiplication calculation method proposed in the embodiments of the invention can be implemented on a computing device that includes a GPU and a CPU, such as a server, a PC, or a mobile terminal.

In this step, when the product of two matrices needs to be computed on the GPU, the executing entity (for example, the server above) may store a plurality of matrix multiplication implementations (kernels) on the GPU in advance. Each matrix multiplication implementation is the optimal scheme for computing matrices with specific row and column parameters. The optimal scheme may be, for example, the scheme with the shortest computation time, determined after repeated verification and comparison.

Each implementation corresponds to a row parameter and a column parameter of the first matrix and a column parameter of the second matrix. For example, if the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to one implementation (kernel) are (m, n, k), a mapping between these three parameters and the implementation can be generated and stored, for example in memory. The implementations may be stored in advance in the corresponding storage space, or may be fetched on demand from another storage space, such as the cloud; this is not limited here.

S102: determining the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

In this step, the numbers of rows and columns of the two target matrices to be multiplied are obtained. Because two matrices can be multiplied only if the number of columns of the first target matrix (e.g., the left matrix) equals the number of rows of the second target matrix (e.g., the right matrix), only the row and column parameters of the first target matrix and the column parameter of the second target matrix need to be obtained.
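The observation that three parameters suffice can be made concrete with a small helper that derives (m, n, k) from two shapes and rejects non-conformable pairs. The function name `multiplication_params` is hypothetical.

```python
from typing import Tuple


def multiplication_params(a_shape: Tuple[int, int],
                          b_shape: Tuple[int, int]) -> Tuple[int, int, int]:
    """Return (m, n, k) for an (m x n) by (n x k) product.

    The second matrix's row count is redundant (it must equal n), so it is
    only checked, never returned."""
    m, n = a_shape
    n_b, k = b_shape
    if n != n_b:
        raise ValueError(f"cannot multiply {m}x{n} by {n_b}x{k}: inner dimensions differ")
    return m, n, k
```

For two 1024 × 1024 matrices this returns (1024, 1024, 1024), the triple used as the lookup parameters in the examples below.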

The row and column parameters of the target matrices may be entered in advance by the user, for example in a provided interactive interface, or may be obtained automatically from the input matrices; this is not limited here.

S103: determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

In this step, the corresponding target matrix multiplication implementation is determined from the row and column parameters of the first target matrix and the column parameter of the second target matrix. For example, if the row parameter of the first target matrix is m, its column parameter is n, and the column parameter of the second target matrix is k, the corresponding implementation (kernel) can be looked up from the three parameters m, n, k. This implementation is usually the one that performance evaluation has shown to be optimal for multiplying matrices with these parameters. The implementation is, for example, a piece of computation code with which the executing entity can compute the matrix product in an optimal manner.

For example, the first of the two matrices to be multiplied is an m × n matrix and the second is an n × k matrix, so the parameters of the multiplication are written (m, n, k). A variety of implementations for different matrix multiplications, i.e., multiple kernels, are pre-stored on the GPU. For example, kernel-1 is the preferred scheme for multiplying matrices with parameters (m1, n1, k1); kernel-2 is the preferred scheme for parameters (m2, n2, k2); and kernel-3 is the preferred scheme for parameters (m, n, k). In step S103, according to the parameters (m, n, k) of the matrices being multiplied, kernel-3 can be selected as the implementation for this multiplication.

For example, when the matrices being multiplied are two 1024 × 1024 matrices (i.e., m, n and k are all 1024), kernel-1 is, for example, a scheme that computes products of 64 × 64 matrices, kernel-2 a scheme that computes products of 128 × 128 matrices, and kernel-3 a scheme that computes products of 256 × 256 matrices.

When a GPU or CPU performs matrix multiplication, multiple thread blocks can be used, each responsible for computing one part of the result matrix. Specifically, when two 1024x1024 matrices are multiplied, the result matrix is also 1024x1024. It can be computed with a 4x4 grid of thread blocks, in which case each thread block computes a 256x256 part of the result matrix; alternatively, an 8x8 grid can be used, in which case each thread block computes a 128x128 part. In other words, a kernel is the part responsible for computing a 256x256 or 128x128 sub-matrix.

In matrix multiplication, a large matrix (such as the 1024 × 1024 matrix above) can be divided into multiple small matrices (such as the 64 × 64 matrices above), and the implementations on the GPU compute these small matrices after partitioning, with each thread block on the GPU responsible for one small matrix. The input matrices may be very large or very small, but the small matrices they are finally divided into generally have only a limited number of possibilities. Suppose earlier processing has shown that dividing the two 1024 × 1024 matrices into multiple 256 × 256 matrices and computing the products with kernel-3 (the scheme for products of 256 × 256 matrices) yields the product of the two 1024 × 1024 matrices quickly. Then the row parameter and column parameter of the first matrix and the column parameter of the second matrix, (1024, 1024, 1024), can be associated with kernel-3 in the lookup table. The next time two 1024 × 1024 target matrices are multiplied, kernel-3 is found directly from the parameters (1024, 1024, 1024).
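To illustrate how each thread block owns one small matrix of the result, the sketch below partitions the output into square tiles and computes each tile independently on the CPU. It is a simplification under stated assumptions: real GPU kernels also tile the inner dimension through shared memory and run the blocks in parallel, which this sequential Python loop does not attempt. `compute_tile` and `tiled_matmul` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def compute_tile(a: Matrix, b: Matrix, out: Matrix, row0: int, col0: int, tile: int) -> None:
    """Work assigned to one thread block: the tile x tile region of the
    result whose top-left corner is (row0, col0). Edge tiles are clipped."""
    m, n, k = len(a), len(b), len(b[0])
    for i in range(row0, min(row0 + tile, m)):
        for j in range(col0, min(col0 + tile, k)):
            out[i][j] = sum(a[i][p] * b[p][j] for p in range(n))


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Partition the m x k result into tiles and compute each one independently,
    as the thread blocks of a grid would (here sequentially, on the CPU)."""
    m, k = len(a), len(b[0])
    out: Matrix = [[0.0] * k for _ in range(m)]
    # Each (row0, col0) pair plays the role of one thread block in the grid.
    for row0 in range(0, m, tile):
        for col0 in range(0, k, tile):
            compute_tile(a, b, out, row0, col0, tile)
    return out
```

Every tile size yields the same product; only the number of blocks changes (for example 16 tiles of 256 × 256 versus 64 tiles of 128 × 128 for a 1024 × 1024 result), which is exactly the choice the stored kernels encode.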

S104: calculating the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

In this step, if the pre-stored implementations include a matrix multiplication implementation whose parameters match the three parameters m, n, k, an implementation for the multiplication currently required is considered to exist, and it is used to compute the product of the first target matrix and the second target matrix. The computation can be executed on the GPU: by calling the corresponding implementation on the GPU, the multiplication is performed in the optimal manner.

As can be seen from the above, the first embodiment of the invention proposes a matrix multiplication calculation method. By presetting matrix multiplication implementations, the optimal implementation can be selected automatically from the row and column parameters when the row and column parameters of the matrices being multiplied are relatively fixed. This reduces the workload of the computing device, preserves the processing performance of the executing entity (for example, a GPU), and improves its processing speed. The invention is particularly suitable for scenarios in which multiplications with fixed matrix parameters are computed repeatedly, such as deep learning inference and training, where it can significantly improve training and inference performance, reduce the computational load of the computing device, and increase its processing speed.

Second embodiment

The second embodiment of the invention proposes a matrix multiplication calculation method. Fig. 2 is a flowchart of the steps of the matrix multiplication calculation method of the second embodiment of the invention. As shown in Fig. 2, the method includes the following steps:

S202: providing a plurality of matrix multiplication implementations, each corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

S203: determining the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

S204: determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

S205: calculating the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

Steps S202 to S205 are the same as or similar to steps S101 to S104 of the previous embodiment and are not repeated here; for details, refer to steps S101 to S104. This embodiment focuses on the differences from the previous embodiment.

In this embodiment, optionally, the lookup table may be stored in memory or on a hard disk. After the row parameter m and column parameter n of the first target matrix and the column parameter k of the second target matrix are received in step S203, these three parameters can be used to look up the corresponding scheme in the lookup table.

In addition to the above parameters, the lookup table may also store a number for each implementation, such as its identification ID. Therefore, before step S204 (determining the corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix), the method may further include the following steps:

S200: assigning an identification ID to each matrix multiplication implementation;

In this step, a number, for example an identification ID that uniquely corresponds to the implementation, may be assigned to each matrix multiplication implementation.

S201: storing, in the lookup table, the identification ID of the matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.

In this step, the identification ID of each matrix multiplication implementation can be written into the lookup table in memory or on the hard disk, and the row and column parameters of the corresponding first matrix and the column parameter of the second matrix are stored in the lookup table at the same time.

Step S201 (storing, in the lookup table, the identification ID of the matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation) may specifically include the following sub-steps:

S2011: generating a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation;

S2012: storing the identification ID of the matrix multiplication implementation and the key as a key-value pair in the lookup table.

Optionally, after the computation is complete, the method may further include the following step:

Reclaiming the lookup table.

In this step, when the lookup table is stored in memory, it can be reclaimed after the computation is complete, i.e., removed from memory. Reclaiming the lookup table frees memory, while the copy on disk is not removed, so the next time it is needed it only has to be loaded from disk into memory. This frees memory and maintains the running speed of the computing device. When the lookup table is stored on a hard disk, reclaiming it after the computation frees hard-disk space and increases the available storage capacity.
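Reclaiming the in-memory table while keeping the disk copy could be sketched as follows, assuming a JSON file as the on-disk format. `LookupTableStore` and its method names are hypothetical; `purge` covers the case where the disk copy is released as well.

```python
import json
import os
from typing import Dict, Optional


class LookupTableStore:
    """Lookup table (key "m_n_k" -> implementation ID) backed by a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.table: Optional[Dict[str, int]] = None  # None means not resident in memory

    def load(self) -> Dict[str, int]:
        """Return the in-memory table, reading it from disk if needed."""
        if self.table is None:
            if os.path.exists(self.path):
                with open(self.path, "r", encoding="utf-8") as f:
                    self.table = json.load(f)
            else:
                self.table = {}
        return self.table

    def reclaim(self) -> None:
        """Write the table to disk, then release the in-memory copy."""
        if self.table is not None:
            with open(self.path, "w", encoding="utf-8") as f:
                json.dump(self.table, f)
        self.table = None

    def purge(self) -> None:
        """Release the table in memory and delete the on-disk copy as well."""
        self.table = None
        if os.path.exists(self.path):
            os.remove(self.path)
```

After `reclaim()`, the next `load()` transparently reads the table back from disk, which mirrors the "load it from disk next time" behavior described above.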

Optionally, this embodiment may also include processing steps for the case in which no corresponding implementation is found, specifically:

S206: when no corresponding matrix multiplication implementation is found, evaluating a plurality of candidate implementations and selecting a preferred implementation from them according to computation time.

Fig. 3 is a flowchart of the sub-steps included in step S206 of Fig. 2. As shown in Fig. 3, step S206 may optionally include the following sub-steps:

S2061: generating a first test matrix and a second test matrix that have the same row and column parameters as the first target matrix and the second target matrix, respectively;

S2062: computing the product of the first test matrix and the second test matrix with each of the candidate implementations, and obtaining the computation time of each candidate implementation;

S2063: selecting the preferred implementation from the candidate implementations according to the computation times.

In the above steps, when no corresponding implementation is found, multiple candidate implementations need to be evaluated to obtain the preferred one. Specifically, in one possible approach, two matrices with the same sizes as the input matrices but with random contents are generated first and used to verify the performance of the candidate implementations. For example, when the two matrices to be multiplied are m × n and n × k, two matrices of shapes m × n and n × k with random contents can be generated from the input row parameter m and column parameter n of the first matrix and the column parameter k of the second matrix. Next, the different ways in which the product of these two matrices can be divided into small matrices are obtained, with each thread on the GPU responsible for computing a different small matrix. Because only a few ways of dividing into small matrices are possible, i.e., there are few types of candidate implementations, all implementations can be traversed, the time taken by each measured, and the implementation with the shortest time selected as the optimal implementation.
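The traversal of a small set of sub-matrix sizes might be sketched like this. The candidate menu `CANDIDATE_TILES` and the rule that a tile must divide the result evenly are assumptions, not taken from the application, and all function names are hypothetical. In pure Python the tile size barely changes the run time, so the sketch demonstrates only the enumerate, time and select loop, not real GPU behavior.

```python
import random
import time
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical, deliberately small menu of sub-matrix (tile) sizes.
CANDIDATE_TILES = (2, 4, 8, 16)


def candidate_tiles(m: int, k: int) -> List[int]:
    """Tile sizes that partition the m x k result evenly (an assumed rule)."""
    return [t for t in CANDIDATE_TILES if m % t == 0 and k % t == 0]


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Compute the product tile by tile; each tile stands for one thread block."""
    m, n, k = len(a), len(b), len(b[0])
    out = [[0.0] * k for _ in range(m)]
    for r0 in range(0, m, tile):
        for c0 in range(0, k, tile):
            for i in range(r0, min(r0 + tile, m)):
                for j in range(c0, min(c0 + tile, k)):
                    out[i][j] = sum(a[i][p] * b[p][j] for p in range(n))
    return out


def benchmark_tiles(m: int, n: int, k: int, seed: int = 0) -> Dict[int, float]:
    """Time every candidate tiling on random m x n and n x k test matrices."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(m)]
    b = [[rng.random() for _ in range(k)] for _ in range(n)]
    timings: Dict[int, float] = {}
    for tile in candidate_tiles(m, k):
        start = time.perf_counter()
        tiled_matmul(a, b, tile)
        timings[tile] = time.perf_counter() - start
    return timings


def best_tile(m: int, n: int, k: int) -> int:
    """Traverse all candidates and return the tile size with the shortest time."""
    timings = benchmark_tiles(m, n, k)
    if not timings:
        raise ValueError(f"no candidate tile divides a {m} x {k} result")
    return min(timings, key=timings.get)
```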

Optionally, after step S2063, i.e., the step of selecting a preferred implementation from the candidate implementations according to the computation time, the method further includes:

S2064: Upload the preferred implementation to the GPU;

S2065: Generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the preferred implementation;

S2066: Form a key-value pair from the key and the identification ID of the preferred implementation, and store it in the lookup table.

In the above steps, a number may be assigned to the obtained preferred implementation, for example an identification ID that uniquely corresponds to it. The preferred implementation may then be uploaded to the GPU; once it is uploaded, the implementation can be retrieved through its identification ID for subsequent calls. The identification ID of each preferred implementation uploaded to the GPU may then be written into the lookup table in memory or on the hard disk, and the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that preferred implementation are stored in the lookup table at the same time.
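Steps S2065 and S2066 can be sketched as follows. This is a hypothetical illustration: the `"m x n x k"` string key format, the function names, and the use of a Python `dict` as the lookup table are assumptions; uploading to the GPU (S2064) is not modelled, and the identification ID is simply an integer.

```python
from typing import Dict, Optional

LookupTable = Dict[str, int]  # key -> identification ID (assumed representation)


def make_key(m: int, n: int, k: int) -> str:
    """S2065: build one key from the first matrix's row (m) and column (n)
    parameters and the second matrix's column parameter (k)."""
    return f"{m}x{n}x{k}"


def store_preferred(table: LookupTable, m: int, n: int, k: int, impl_id: int) -> None:
    """S2066: store the (key, identification ID) pair in the lookup table."""
    table[make_key(m, n, k)] = impl_id


def find_impl_id(table: LookupTable, m: int, n: int, k: int) -> Optional[int]:
    """Return the stored ID for this shape, or None if the shape has not been evaluated yet."""
    return table.get(make_key(m, n, k))
```

The separator between the three numbers matters: without it, shapes such as (1, 23, 4) and (12, 3, 4) would produce the same key `"1234"`.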

As can be seen from the above, the second embodiment of the invention proposes a matrix multiplication calculation method. By providing matrix multiplication implementations, the optimal implementation can be selected automatically from the row and column parameters when the row and column parameters of the matrices being multiplied are relatively fixed. This reduces the workload of the computing device, maintains the processing performance of the executing entity, and improves its processing speed. The invention is particularly suitable for scenarios in which multiplications with fixed matrix parameters are computed repeatedly, such as deep learning inference and training, where it can significantly improve training and inference performance, reduce the computational workload of the computing device, and improve its processing speed.

In addition, in the matrix multiplication calculation method of this embodiment, when the executing entity is a GPU, implementations of various matrix multiplications may be stored in the GPU in advance and a lookup table preset, so that the optimal matrix multiplication implementation can be found conveniently and efficiently. When no corresponding implementation exists in the lookup table, all candidate implementations for that matrix size can be evaluated, a preferred implementation selected according to computation time and uploaded to the GPU, and the identification ID of the preferred implementation stored in the lookup table together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix. Through this learning and optimization, the set of implementations prestored in the GPU grows as the number of processed multiplications increases, making it more likely that each computation obtains the optimal processing method. In addition, the lookup table may be reclaimed after computation ends, so that it does not keep occupying memory or hard-disk storage space.
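The learn-as-you-go behaviour described above (look up, fall back to evaluation on a miss, remember the result, reclaim the table when done) can be sketched as a small dispatcher. The class name, the `tuner` callback and the tuple key are hypothetical; a real system would hold GPU kernels rather than Python functions, and the tuner would typically be the timing procedure of step S206.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]
Tuner = Callable[[int, int, int, Dict[int, Kernel]], int]


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference kernel used as a stand-in for a prestored implementation."""
    n, k = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(n)) for j in range(k)] for row in a]


class MatmulDispatcher:
    def __init__(self, kernels: Dict[int, Kernel], tuner: Tuner) -> None:
        self.kernels = kernels  # identification ID -> implementation
        self.tuner = tuner      # picks an ID for an unseen (m, n, k)
        self.table: Dict[Tuple[int, int, int], int] = {}  # the lookup table
        self.tuning_runs = 0

    def multiply(self, a: Matrix, b: Matrix) -> Matrix:
        m, n, k = len(a), len(a[0]), len(b[0])
        key = (m, n, k)
        impl_id = self.table.get(key)
        if impl_id is None:
            # Miss: evaluate candidates once for this shape and remember the winner.
            impl_id = self.tuner(m, n, k, self.kernels)
            self.table[key] = impl_id
            self.tuning_runs += 1
        return self.kernels[impl_id](a, b)

    def reclaim(self) -> None:
        """Release the lookup table once computation has finished."""
        self.table.clear()
```

Repeated multiplications with the same shape then pay the evaluation cost only once, which is the fixed-shape scenario (deep learning training and inference) that the description targets.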

Third embodiment

The third embodiment of the invention proposes a matrix multiplication computing device, which may be applied in a GPU, for example. Fig. 4 is a block diagram of the matrix multiplication computing device of the third embodiment. As shown in Fig. 4, the matrix multiplication computing device of the GPU includes:

A providing module 301, configured to provide multiple matrix multiplication implementations, each matrix multiplication implementation corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

In this embodiment, when the product of two matrices needs to be computed on the GPU, the providing module 301 can provide multiple matrix multiplication implementations. For example, multiple implementations (kernels) are prestored in the GPU of the electronic device, each being the optimal scheme for computing matrices with particular row and column parameters. Such an optimal scheme may be, for example, the scheme with the shortest computation time determined through repeated verification and comparison. The providing module 301 can obtain the corresponding implementations from the GPU and supply them to the subsequent modules.

A parameter determination module 302, configured to determine the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

The parameter determination module 302 can obtain the numbers of rows and columns of the two target matrices to be multiplied. Because two matrices can be multiplied only when the number of columns of the first target matrix (e.g., the left matrix) equals the number of rows of the second target matrix (e.g., the right matrix), it suffices to obtain the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix.
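The dimension check performed by the parameter determination module can be sketched as follows; the function name and the list-of-rows matrix representation are illustrative assumptions. Only three numbers are needed because, for an m × n matrix A and an n × k matrix B, the product AB is m × k and the whole computation is fully described by m, n and k.

```python
from typing import Sequence, Tuple


def determine_params(a: Sequence[Sequence[float]],
                     b: Sequence[Sequence[float]]) -> Tuple[int, int, int]:
    """Return (m, n, k) for A (m x n) times B (n x k); raise if they cannot be multiplied."""
    if not a or not a[0] or not b or not b[0]:
        raise ValueError("matrices must be non-empty")
    m, n = len(a), len(a[0])
    rows_b, k = len(b), len(b[0])
    if n != rows_b:
        raise ValueError(f"cannot multiply {m}x{n} by {rows_b}x{k}: inner dimensions differ")
    # The row count of B must equal n, so it need not be stored separately.
    return m, n, k
```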

A scheme determination module 303, configured to determine a corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

In this embodiment, the scheme determination module 303 can determine the corresponding target matrix multiplication implementation by matching the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix against the row parameter and column parameter of the first matrix and the column parameter of the second matrix of each implementation. For example, if the row parameter of the first matrix is m, its column parameter is n, and the column parameter of the second matrix is k, the corresponding implementation can be looked up using the three parameters m, n, k; this implementation is, for example, the optimal scheme for this matrix multiplication as determined by performance evaluation. The implementation is, for example, a piece of computation code with which the executing entity can compute the matrix product in an optimal way.

A computing module 304, configured to compute the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

In this embodiment, if the prestored implementations include a matrix multiplication implementation whose parameters match the three parameters m, n, k, an implementation corresponding to the multiplication currently required is deemed to exist, and the computing module 304 can use it to compute the product of the first target matrix and the second target matrix. The computation may be executed on the GPU; by calling the corresponding implementation in the GPU, the multiplication is executed in the optimal way.
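A minimal sketch of the exact-match behaviour of the computing module, assuming a lookup table keyed by the tuple (m, n, k) and a registry mapping identification IDs to callables (both hypothetical). An implementation selected for one shape is used only when all three parameters match; any other shape is reported as a miss so that the caller can fall back to evaluation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


def compute_product(a: Matrix, b: Matrix,
                    table: Dict[Tuple[int, int, int], int],
                    kernels: Dict[int, Kernel]) -> Optional[Matrix]:
    """Use the stored implementation whose (m, n, k) matches exactly, else return None."""
    m, n, k = len(a), len(a[0]), len(b[0])
    impl_id = table.get((m, n, k))
    if impl_id is None:
        return None  # no matching implementation: fall back to evaluation (step S206)
    return kernels[impl_id](a, b)
```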

As can be seen from the above, the third embodiment of the invention proposes a matrix multiplication computing device. By providing matrix multiplication implementations, the optimal implementation can be selected automatically from the row and column parameters when the row and column parameters of the matrices being multiplied are relatively fixed. This reduces the workload of the computing device, maintains the processing performance of the GPU, and improves its processing speed. The invention is particularly suitable for scenarios in which multiplications with fixed matrix parameters are computed repeatedly, such as deep learning inference and training, where it can significantly improve training and inference performance, reduce the computational workload of the computing device, and improve its processing speed.

Fourth embodiment

The fourth embodiment of the invention proposes a matrix multiplication computing device. Fig. 5 is a block diagram of the matrix multiplication computing device of the GPU according to the fourth embodiment. As shown in Fig. 5, the matrix multiplication computing device includes:

A providing module 402, configured to provide multiple matrix multiplication implementations, each matrix multiplication implementation corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

In this embodiment, when the product of two matrices needs to be computed on the GPU, the providing module 402 can provide multiple matrix multiplication implementations. Multiple implementations (kernels) have been prestored in the GPU, each being the optimal scheme for computing matrices with particular row and column parameters. Such an optimal scheme may be, for example, the scheme with the shortest computation time determined through repeated verification and comparison.

A parameter determination module 403, configured to determine the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

In this embodiment, the parameter determination module 403 can obtain the numbers of rows and columns of the two target matrices to be multiplied. Because two matrices can be multiplied only when the number of columns of the first target matrix (e.g., the left matrix) equals the number of rows of the second target matrix (e.g., the right matrix), it suffices to obtain the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix.

A scheme determination module 404, configured to determine a corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

In this embodiment, the scheme determination module 404 can determine the corresponding target matrix multiplication implementation by matching the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix against the row parameter and column parameter of the first matrix and the column parameter of the second matrix of each implementation. For example, if the row parameter of the first matrix is m, its column parameter is n, and the column parameter of the second matrix is k, the corresponding implementation can be looked up using the three parameters m, n, k; this implementation may be, for example, the optimal scheme for this matrix multiplication as determined by performance evaluation.

A computing module 405, configured to compute the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

In this embodiment, if the prestored implementations include a matrix multiplication implementation whose parameters match the three parameters m, n, k, an implementation corresponding to the multiplication currently required is deemed to exist, and the product of the first target matrix and the second target matrix can be computed using that implementation. The computation may be executed on the GPU; by calling the corresponding implementation in the GPU, the multiplication is executed in the optimal way.

Modules 402 to 405 above are the same as or similar to modules 301 to 304 of the previous embodiment and are not described again here; for related content, refer to modules 301 to 304 of the previous embodiment. This embodiment focuses on the differences from the previous embodiment.

In one embodiment, the scheme determination module 404 may use a lookup table to find the corresponding implementation. The lookup table is, for example, a table stored in memory.

In one embodiment, the device may further include, for example, an allocation module 400 and a storage module 401. The allocation module 400 is configured to allocate an identification ID to each matrix multiplication implementation; the storage module 401 is configured to store, in the lookup table, the identification ID of each implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.
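The allocation module 400 and storage module 401 can be sketched as a small registry that hands out identification IDs and records which ID serves which (m, n, k). Sequential integer IDs, the class name and the method names are assumptions made for illustration; the description only requires that each implementation receive a unique ID.

```python
import itertools
from typing import Callable, Dict, Tuple


class KernelRegistry:
    def __init__(self) -> None:
        self._ids = itertools.count()                     # source of fresh IDs
        self.kernels: Dict[int, Callable] = {}            # ID -> implementation
        self.table: Dict[Tuple[int, int, int], int] = {}  # (m, n, k) -> ID

    def allocate(self, kernel: Callable) -> int:
        """Allocation module: give the implementation a new, unique ID."""
        impl_id = next(self._ids)
        self.kernels[impl_id] = kernel
        return impl_id

    def store(self, impl_id: int, m: int, n: int, k: int) -> None:
        """Storage module: record that impl_id is the implementation for (m, n, k)."""
        if impl_id not in self.kernels:
            raise KeyError(f"unknown implementation ID {impl_id}")
        self.table[(m, n, k)] = impl_id
```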

In one embodiment, the storage module 401 may include a key generation submodule and a storage submodule. The key generation submodule is configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation; the storage submodule is configured to form a key-value pair from the key and the identification ID of the matrix multiplication implementation and store it in the lookup table. The lookup table may be stored, for example, on a hard disk or in memory.

In one embodiment, the device may further include a reclaiming module configured to reclaim the lookup table.

In one embodiment, the device may further include a preferred-scheme obtaining module 406, configured to, when no corresponding implementation is found, evaluate multiple candidate implementations for the given matrix size and select a preferred implementation from the candidate implementations according to computation time.

In one embodiment, as shown in Fig. 7, the preferred-scheme obtaining module 406 may include the following submodules:

A test matrix generation submodule 4061, configured to generate a first test matrix and a second test matrix having the same row and column parameters as the first target matrix and the second target matrix, respectively;

A computation time acquisition submodule 4062, configured to compute the product of the first test matrix and the second test matrix with each of the candidate implementations and obtain the computation time of each candidate implementation;

A selection submodule 4063, configured to select a preferred implementation from the candidate implementations according to computation time.

In this embodiment, when no corresponding implementation is found, multiple candidate implementations must be evaluated to obtain a preferred implementation. Specifically, in one implementation, two matrices with the same sizes as the input matrices but random contents may first be generated and used to verify the performance of the candidate implementations. For example, when the two matrices to be multiplied are m × n and n × k, two matrices of shapes m × n and n × k with random contents may be generated from the input parameters, namely the row parameter m and column parameter n of the first matrix and the column parameter k of the second matrix. Next, the different ways in which the product of the two matrices can be divided into smaller sub-matrices are enumerated, with each thread on the GPU responsible for computing a different sub-matrix. Because the number of possible divisions into sub-matrices, i.e., the number of candidate implementation types, is small, all implementations can be traversed, the time taken by each one measured, and the implementation with the shortest time selected as the optimal implementation.

Optionally, the preferred-scheme obtaining module 406 may further include:

An upload submodule 4064, configured to upload the preferred implementation to the GPU;

A key generation submodule 4065, configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the preferred implementation;

A key-value pair storage submodule 4066, configured to form a key-value pair from the key and the identification ID of the preferred implementation and store it in the lookup table.

As can be seen from the above, the fourth embodiment of the invention proposes a matrix multiplication computing device. By providing matrix multiplication implementations, the optimal implementation can be selected automatically from the row and column parameters when the row and column parameters of the matrices being multiplied are relatively fixed. This reduces the workload of the computing device, maintains the processing performance of the GPU, and improves its processing speed. The invention is particularly suitable for scenarios in which multiplications with fixed matrix parameters are computed repeatedly, such as deep learning inference and training, where it can significantly improve training and inference performance, reduce the computational workload of the computing device, and improve its processing speed.

In addition, in the matrix multiplication computing device of this embodiment, implementations of various matrix multiplications may be stored in advance and a lookup table preset, so that the optimal matrix multiplication implementation can be found conveniently and efficiently. When no corresponding implementation exists in the lookup table, all candidate implementations for that matrix size can be evaluated, a preferred implementation selected according to computation time and uploaded to the GPU, and the identification ID of the preferred implementation stored in the lookup table together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix. Through this learning and optimization, the set of implementations prestored in the GPU grows as the number of processed multiplications increases, making it more likely that each computation obtains the optimal processing method.

Because the device embodiments are essentially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.

The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the parts that are the same or similar, the embodiments may be referred to one another.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory among computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media that can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.

Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.

The matrix multiplication calculation method and matrix multiplication computing device provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are intended only to help understand the method and core idea of the present application. Meanwhile, those of ordinary skill in the art may, according to the idea of the present application, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A matrix multiplication calculation method, characterized by comprising:

providing multiple matrix multiplication implementations, each matrix multiplication implementation corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

determining a corresponding target matrix multiplication implementation according to the row parameter and column parameter of a first target matrix and the column parameter of a second target matrix;

computing the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

2. The method of claim 1, characterized in that, before the step of determining a corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix, the method further comprises:

storing, in a lookup table, the identification ID of each matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.

3. The method of claim 2, characterized in that the step of storing, in a lookup table, the identification ID of each matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation comprises:

generating a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation;

4. The method of claim 2, characterized in that the step of determining a corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix comprises:

searching the lookup table for a matrix multiplication implementation whose parameters are identical to the row parameter and column parameter of the first matrix and the column parameter of the second matrix.

5. The method of claim 1, characterized in that the method further comprises:

when no corresponding matrix multiplication implementation is found, evaluating multiple candidate implementations and selecting a preferred implementation from the candidate implementations according to computation time.

6. The method of claim 5, characterized in that the step of evaluating multiple candidate implementations and selecting a preferred implementation from the candidate implementations according to computation time comprises:

generating a first test matrix and a second test matrix having the same row and column parameters as the first target matrix and the second target matrix;

computing the product of the first test matrix and the second test matrix with each of the candidate implementations, and obtaining the computation time of each candidate implementation;

7. The method of claim 6, characterized in that, after the step of selecting a preferred implementation from the candidate implementations according to computation time, the method further comprises:

uploading the preferred implementation to the GPU;

8. A matrix multiplication computing device, characterized by comprising:

a providing module, configured to provide multiple matrix multiplication implementations, each matrix multiplication implementation corresponding to a row parameter and a column parameter of a first matrix and a column parameter of a second matrix;

a parameter determination module, configured to determine the row parameter and column parameter of a first target matrix and the column parameter of a second target matrix;

a scheme determination module, configured to determine a corresponding target matrix multiplication implementation according to the row parameter and column parameter of the first target matrix and the column parameter of the second target matrix;

a computing module, configured to compute the product of the first target matrix and the second target matrix using the determined target matrix multiplication implementation.

9. The device of claim 8, characterized in that the device further comprises:

a storage module, configured to store, in a lookup table, the identification ID of each matrix multiplication implementation together with the row parameter and column parameter of the first matrix and the column parameter of the second matrix corresponding to that implementation.

10. The device of claim 9, characterized in that the storage module further comprises:

a key generation submodule, configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the matrix multiplication implementation;

a storage submodule, configured to form a key-value pair from the key and the identification ID of the matrix multiplication implementation and store it in the lookup table.

11. The device of claim 9, characterized in that the scheme determination module is configured to:

12. The device of claim 8, characterized in that the device further comprises:

a preferred-scheme obtaining module, configured to, when no corresponding matrix multiplication implementation is found, evaluate multiple candidate implementations and select a preferred implementation from the candidate implementations according to computation time.

13. The device of claim 12, characterized in that the preferred-scheme obtaining module comprises:

a test matrix generation submodule, configured to generate a first test matrix and a second test matrix having the same row and column parameters as the first target matrix and the second target matrix;

a computation time acquisition submodule, configured to compute the product of the first test matrix and the second test matrix with each of the candidate implementations and obtain the computation time of each candidate implementation;

14. The device of claim 13, characterized in that the device further comprises:

a key generation module, configured to generate a key from the row parameter and column parameter of the first matrix and the column parameter of the second matrix of the preferred implementation;

a key-value pair storage module, configured to form a key-value pair from the key and the identification ID of the preferred implementation and store it in the lookup table.

15. An electronic device, comprising:

a memory storing computer-readable program code; and

a processor, wherein, when the processor executes the stored computer-readable program code, the electronic device performs the following operations: