CN102043605B

CN102043605B - Multimedia transformation multiplier and processing method thereof

Info

Publication number: CN102043605B
Application number: CN201010603133A
Authority: CN
Inventors: 胡伟武; 刘宏伟; 陈云霁
Original assignee: Loongson Technology Corp Ltd
Current assignee: Loongson Technology Corp Ltd
Priority date: 2010-12-23
Filing date: 2010-12-23
Publication date: 2012-10-24
Anticipated expiration: 2030-12-23
Also published as: CN102043605A

Abstract

The invention relates to a multimedia transformation multiplier and a processing method thereof. The multimedia transformation multiplier comprises a matrix multiplication module and an operation control module, wherein the matrix multiplication module is used for carrying out a matrix multiplication operation on data of a first matrix and data of a second matrix to obtain data of an intermediate result matrix; and the operation control module is used for reading operation control parameter values and carrying out an operation on the data of the intermediate result matrix in accordance with the operation control parameter values so as to obtain data of a result matrix. The multimedia transformation multiplier can accelerate the multimedia processing procedure, has fine universality, realizes commutative operation procedures for different requirements and can complete the multimedia commutative operation at low hardware cost.

Description

Multimedia transformation multiplier and processing method thereof

Technical Field

The present invention relates to the field of computer technology, and more particularly, to a multimedia transform multiplier and a method for processing multimedia files.

Background

With the development of processor technology, the range of processor applications keeps expanding. In particular, as demand grows for multimedia and scientific computation, general-purpose processors are increasingly adding single instruction, multiple data (SIMD) instruction sets.

In the multimedia field, SIMD instructions can greatly increase the speed of multimedia processing. Transform operations are very common in multimedia processing, because most images share a common feature: flat and slowly changing areas make up the majority, while detail and abrupt changes in content make up the minority. Stated another way, the DC and low-frequency components account for most of the image, while the high-frequency components account for a small portion. The spatial-domain image can therefore be transformed to the frequency domain, or to another transform domain, to produce transform coefficients with low correlation. Various kinds of processing can then be carried out conveniently on these coefficients, such as direct image processing, or compression encoding (so-called transform coding) to achieve compression.

Generally, there is a class of transforms called orthogonal transforms that can be used for image coding, such as the fast Fourier transform, the K-L transform, the discrete cosine transform, and the like. These transforms share a relatively general format.

The general form of a one-dimensional transform is:

F＝A×f

where A is the transformation matrix, f is the matrix of original image values, and F is the matrix of transformed coefficients. Correspondingly, the inverse transform is f＝A^T×F.

For two-dimensional transformation, it can be generally understood that one-dimensional transformation is performed on each row and then one-dimensional transformation is performed on each column, and the matrix form is written as follows:

F＝A×f×A^T

The corresponding inverse transform is: f＝A^T×F×A
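The two-dimensional form and its inverse can be checked numerically with a small sketch. The matrix A below is an assumption chosen for illustration (a 4x4 Hadamard matrix scaled by 1/2 so that A×A^T is the identity); the helper names are hypothetical and not part of the original text.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

# Assumed example of an orthogonal A: the 4x4 Hadamard matrix scaled by 1/2.
A = [[0.5 * v for v in row] for row in
     [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]]

def forward_2d(f):
    """F = A x f x A^T (transform the columns, then the rows)."""
    return matmul(matmul(A, f), transpose(A))

def inverse_2d(F):
    """f = A^T x F x A."""
    return matmul(matmul(transpose(A), F), A)

# A mostly flat block: the energy concentrates in the first coefficient.
f = [[10, 10, 10, 10],
     [10, 10, 10, 10],
     [10, 10, 12, 12],
     [10, 10, 12, 12]]
F = forward_2d(f)
print(F[0][0])             # DC coefficient
print(inverse_2d(F) == f)  # the inverse recovers the original block
```

Because this A has entries of ±0.5, the floating-point arithmetic here happens to be exact; with a general floating-point A that is not guaranteed, which is the mismatch that motivates integer transforms.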

Integer transforms are mostly used in the multimedia field, because floating-point arithmetic introduces precision errors, which cause mismatch problems in the inverse transform.

In some specific media formats, addition and shift operations are also performed during the integer transform, which amounts to a rounding step. For example, the calculation of F＝A×f×A^T takes the following form:

E＝(f×A^T+2^(shift1-1))＞＞shift1

F＝(A×E+2^(shift2-1))＞＞shift2
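The add-then-shift step can be illustrated with a short sketch. The function name `round_shift` is hypothetical; the code assumes shift is at least 1 and relies on Python's arithmetic (sign-preserving) right shift.

```python
def round_shift(x, shift):
    """Return (x + 2^(shift-1)) >> shift, i.e. x / 2^shift rounded to nearest.

    Assumes shift >= 1. Ties round toward positive infinity.
    """
    return (x + (1 << (shift - 1))) >> shift

print(round_shift(20, 3))   # 20 / 8 = 2.5, rounded
print(20 >> 3)              # plain shift, truncated toward minus infinity
print(round_shift(-20, 3))  # -20 / 8 = -2.5, rounded
```

Adding 2^(shift-1) before the shift is what turns a truncating division by 2^shift into a round-to-nearest division.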

The following takes the codec processes of several current mainstream media formats as examples to explain the general form presented above. It is to be understood that this merely illustrates usage scenarios and is not a limitation.

The method can be used for the integer discrete cosine transform, the Hadamard transform and their inverses in multimedia transform operations. Media formats compress the spatially unimportant information in the media by using transforms in conjunction with quantization. The transform operation is therefore an important step in media encoding and decoding, and also a step that involves a large amount of computation and occupies a large amount of processor time. Popular media formats such as H.264, VC-1, AVS, RMVB and MPEG-4 all contain such steps.

Taking multimedia decoding as an example, the basic form of the inverse discrete cosine transform is as follows:

f＝T′×X×T, where X, T and T′ are all matrices, typically 8×8 or 4×4. In particular media formats, such as AVS, H.264 and VC-1, the specific inverse discrete cosine transforms are as follows:

(I) AVS format

The process of converting an 8×8 transform coefficient matrix CoeffMatrix into an 8×8 residual sample matrix ResidueMatrix in AVS includes the following steps:

First, the transform coefficient matrix is subjected to the following horizontal inverse transform with a rounding shift:

E_8x8＝(CoeffMatrix×T₈^T+4)＞＞3

where T₈ is the 8x8 inverse transform matrix, T₈^T is the transpose of T₈, and CoeffMatrix×T₈^T is the intermediate result of the horizontal inverse transform. For a bitstream conforming to this part, the elements of CoeffMatrix×T₈^T should lie in the range -2¹⁵ to 2¹⁵-5.

T_{8} = [\begin{matrix} 8 & 10 & 10 & 9 & 8 & 6 & 4 & 2 \\ 8 & 9 & 4 & - 2 & - 8 & - 10 & - 10 & - 6 \\ 8 & 6 & - 4 & - 10 & - 8 & 2 & 10 & 9 \\ 8 & 2 & - 10 & - 6 & 8 & 9 & - 4 & - 10 \\ 8 & - 2 & - 10 & 6 & 8 & - 9 & - 4 & 10 \\ 8 & - 6 & - 4 & 10 & - 8 & - 2 & 10 & - 9 \\ 8 & - 9 & 4 & 2 & - 8 & 10 & - 10 & 6 \\ 8 & - 10 & 10 & - 9 & 8 & - 6 & 4 & - 2 \end{matrix}]

Next, the following vertical inverse transform with a rounding shift is performed on the matrix E_8x8:

R_8x8＝(T₈×E_8x8+2⁶)＞＞7

where T₈×E_8x8 is the 8x8 matrix after the vertical inverse transform. For a bitstream conforming to this part, the elements of T₈×E_8x8 should lie in the range -2¹⁵ to 2¹⁵-65. The resulting R_8x8 is the residual sample matrix ResidueMatrix.
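The two AVS steps above can be sketched in software as follows. This is only an illustrative model of the formulas as written here, not a conformant decoder; the function names are hypothetical and no range checking is performed.

```python
# AVS 8x8 inverse transform matrix T8, copied from the text above.
T8 = [
    [8, 10, 10, 9, 8, 6, 4, 2],
    [8, 9, 4, -2, -8, -10, -10, -6],
    [8, 6, -4, -10, -8, 2, 10, 9],
    [8, 2, -10, -6, 8, 9, -4, -10],
    [8, -2, -10, 6, 8, -9, -4, 10],
    [8, -6, -4, 10, -8, -2, 10, -9],
    [8, -9, 4, 2, -8, 10, -10, 6],
    [8, -10, 10, -9, 8, -6, 4, -2],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def avs_inverse_8x8(coeff):
    """Apply the horizontal step, then the vertical step, each with rounding."""
    # E = (CoeffMatrix x T8^T + 4) >> 3
    e = [[(v + 4) >> 3 for v in row] for row in matmul(coeff, transpose(T8))]
    # R = (T8 x E + 2^6) >> 7
    return [[(v + 64) >> 7 for v in row] for row in matmul(T8, e)]

# A block with only a DC coefficient decodes to a flat residual block.
dc_only = [[0] * 8 for _ in range(8)]
dc_only[0][0] = 64
residual = avs_inverse_8x8(dc_only)
print(residual[0])
```

Each step is one matrix multiplication followed by an add and a shift, which is the pattern the rest of the document generalizes.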

(II) VC-1 Format

The units of the inverse transform in VC-1 are 8×8, 8×4, 4×8 and 4×4. The inverse-quantized coefficients are 12-bit signed numbers and the inverse-transformed coefficients are 10-bit signed numbers. The inverse transform steps of VC-1 are as follows:

E_{M \times N} = (D_{M \times N} \cdot T_{M} + 4) \gg 3

R_{M \times N} = (T_{N}^{\prime} \cdot E_{M \times N} + C_{N} \cdot 1_{M} + 64) \gg 7

where D_M×N is the input matrix, i.e. the inverse-quantized coefficient matrix; E_M×N is the intermediate result matrix, whose elements are 13-bit signed numbers; and R_M×N is the output matrix, i.e. the coefficient matrix after the inverse transform. M and N may each be 4 or 8; T₈ and T₄ are respectively as follows:

T_{8} = [\begin{matrix} 12 & 12 & 12 & 12 & 12 & 12 & 12 & 12 \\ 16 & 15 & 9 & 4 & - 4 & - 9 & - 15 & - 16 \\ 16 & 6 & - 6 & - 16 & - 16 & - 6 & 6 & 16 \\ 15 & - 4 & - 16 & - 9 & 9 & 16 & 4 & - 15 \\ 12 & - 12 & - 12 & 12 & 12 & - 12 & - 12 & 12 \\ 9 & - 16 & 4 & 15 & - 15 & - 4 & 16 & - 9 \\ 6 & - 16 & 16 & - 6 & - 6 & 16 & - 16 & 6 \\ 4 & - 9 & 15 & - 16 & 16 & - 15 & 9 & - 4 \end{matrix}]

T_{4} = [\begin{matrix} 17 & 17 & 17 & 17 \\ 22 & 10 & - 10 & - 22 \\ 17 & - 17 & - 17 & 17 \\ 10 & - 22 & 22 & - 10 \end{matrix}]

C₈＝(0 0 0 0 1 1 1 1)^T，C₄＝(0 0 0 0)^T。

In practice, a relatively large matrix multiplication, such as 8x8, can be realized by the block matrix multiplication method using 4x4 matrix multiplication instructions. The implementation is not described in detail here.

In the second step of the transform T′×(X×T), the matrix T′ is the transpose of the transformation matrix of the corresponding dimension. For example, for a 4x8 matrix transform:

T′＝T₄ ^T，T＝T₈。
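For the 8x8 case, the two VC-1 steps can be sketched as below. The sketch assumes that 1_M denotes a row vector of M ones, so that C₈·1₈ adds C₈[i] to every element of row i; that reading, and all function names, are assumptions for illustration and not a conformant decoder.

```python
# VC-1 8x8 transform matrix T8 and vector C8, copied from the text above.
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]
C8 = [0, 0, 0, 0, 1, 1, 1, 1]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def vc1_inverse_8x8(d):
    """First step: E = (D x T8 + 4) >> 3. Second step uses T8' = T8^T."""
    e = [[(v + 4) >> 3 for v in row] for row in matmul(d, T8)]
    te = matmul(transpose(T8), e)
    # Assumed meaning of C8 x 1_8: add C8[i] to every element of row i.
    return [[(te[i][j] + C8[i] + 64) >> 7 for j in range(8)]
            for i in range(8)]

dc_only = [[0] * 8 for _ in range(8)]
dc_only[0][0] = 16
out = vc1_inverse_8x8(dc_only)
print(out[0], out[7])
```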

(III) H.264 format

The transformation matrices T in H.264 are as follows.

4x4 inverse transform matrix:

T_{4} = [\begin{matrix} 2 & 2 & 2 & 2 \\ 2 & 1 & - 1 & - 2 \\ 2 & - 2 & - 2 & 2 \\ 1 & - 2 & 2 & - 1 \end{matrix}]

8x8 inverse transform matrix:

T_{8} = [\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 12 & 10 & 6 & 3 & - 3 & - 6 & - 10 & - 12 \\ 8 & 4 & - 4 & - 8 & - 8 & - 4 & 4 & 8 \\ 10 & - 3 & - 12 & - 6 & 6 & 12 & 3 & - 10 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 12 & 3 & 10 & - 10 & - 3 & 12 & - 6 \\ 4 & - 8 & 8 & - 4 & - 4 & 8 & - 8 & 4 \\ 3 & - 6 & 10 & - 12 & 12 & - 10 & 6 & - 3 \end{matrix}]

For the 8x8 inverse transform, the process is as follows:

E_8x8＝(CoeffMatrix×T₈+4)＞＞3

H_8x8＝(T₈ ^T×E_8x8+4)＞＞3

R_8x8＝(H_8x8+2⁵)＞＞6

For the encoding process, i.e. the forward transform, the form is X＝T×F×T^T. For the H.264 format the transformation matrix takes its own specific values, but the mode of operation is identical to the general form.

The same holds for other transforms, such as the Hadamard transform applied to the luminance block W_D in H.264, whose form is:

Y_{D} = ([\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & - 1 & - 1 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 1 & 1 & - 1 \end{matrix}] W_{D} [\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & - 1 & - 1 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 1 & 1 & - 1 \end{matrix}]) / 2

This is also in accordance with the general form described above.

In summary, matrix multiplication, addition and shifting are the main parts of the transform operations in multimedia processing. A large number of multiply and add instructions must be executed in repeated loops to realize a single encoding or decoding transform, so the processing consumes considerable processor time, is complex to implement, and lacks generality.

Disclosure of Invention

The present invention is directed to overcome the drawbacks of the prior art, and provides a multimedia transform multiplier and a processing method thereof, which accelerate the multimedia processing process, have good versatility, and can complete the transform operation of multimedia data with less hardware cost.

The multimedia transform multiplier provided for realizing the purpose of the invention comprises a matrix multiplication module and an operation control module;

the matrix multiplication module is used for carrying out matrix multiplication operation on the data of the first matrix and the data of the second matrix to obtain the data of an intermediate result matrix;

and the operation control module is used for reading the operation control parameter values and controlling the data of the intermediate result matrix to perform operation according to the operation control parameter values to obtain the data of the result matrix.

Preferably, the multimedia transform multiplier further comprises a parameter loading module for loading the data of the first matrix, the data of the second matrix and the operation control parameter value.

Preferably, the data of the first matrix is data of a coefficient matrix of a multimedia transformation operation; and the data of the second matrix is the data of a transformation matrix for multimedia transformation operation.

Preferably, the data of the first matrix is data of a transpose matrix of a coefficient matrix of the multiplier which performs the multimedia transform operation last time; and the data of the second matrix is the data of a result matrix obtained after the last operation of the operation control module of the multiplier.

Preferably, the multimedia transform multiplier further includes a transposer for performing a transposing operation on the coefficient matrix of the multimedia transform operation to obtain data of the first matrix.

Preferably, the operation control parameter values include operation mode parameter values and operation parameter values;

the operation control module comprises a judgment module and an operation module;

the judging module is used for reading the operation mode parameter value loaded by the parameter loading module and determining the operation mode;

and the operation module is used for reading the operation parameter values loaded by the parameter loading module and controlling the intermediate result matrix to carry out the corresponding operation, according to the operation parameter values, in the operation mode determined by the judging module.

Preferably, the judging module comprises a bit-precision bit and an operation mode bit; the judging module determines the operation mode from the read operation mode parameter value by means of the bit-precision bit and the operation mode bit; the bit-precision bit and the operation mode bit are each expressed as binary digits.

Preferably, the bit-precision bit indicates whether the data of the intermediate matrix requires more than 16 significant bits or no more than 16 significant bits;

when the data of the intermediate matrix requires no more than 16 significant bits, the operation mode bit indicates whether an addition operation is carried out; when it requires more than 16 significant bits, the operation mode bit indicates whether the lower half or the upper half of the intermediate matrix is taken out.

Preferably, the operation module includes an operation control bit representing the number of shift bits.

Preferably, the multimedia transform multiplier further comprises a first storage module, a second storage module, a third storage module and a fourth storage module; wherein:

the first storage module is used for storing the data of the first matrix;

the second storage module is used for storing the data of the second matrix;

the third storage module is used for storing the data of the intermediate result matrix obtained after the matrix multiplication module carries out matrix multiplication operation;

and the fourth storage module is used for storing the data of the result matrix obtained after the operation of the operation control module.

To achieve the object of the present invention, there is also provided a processing method of a multimedia transform multiplier, comprising the steps of:

step A, performing matrix multiplication operation on data of a first matrix and data of a second matrix to obtain data of an intermediate result matrix;

and step B, controlling the data of the intermediate result matrix to carry out operation according to the loaded operation control parameter values to obtain the data of the result matrix.

Preferably, the step a is preceded by the following steps:

step A', loading the data of the first matrix, the data of the second matrix, and the operation control parameter values.

Preferably, the data of the first matrix is data of a coefficient matrix of a multimedia transformation operation, and the data of the second matrix is data of a transformation matrix of the multimedia transformation operation.

Preferably, the data of the first matrix is data of a transpose matrix of the coefficient matrix with which the multiplier last performed the multimedia transform operation; and the data of the second matrix is the data of the result matrix obtained after the last operation of the operation control module of the multiplier.

Preferably, step A further comprises the following step:

step A', transposing the coefficient matrix of the last multimedia transform operation to obtain the data of the first matrix.

Preferably, the operation control parameter values include operation mode parameter values and operation parameter values, and step B includes the following steps:

step B1, reading the parameter value of the operation mode, and determining the operation mode;

and step B2, reading the operation parameter values, and controlling the intermediate result matrix to perform corresponding operation according to the operation parameter values in the operation mode determined in the step B1 to obtain the data of the final result matrix.

The invention has the beneficial effects that: the multimedia transformation multiplier and the processing method thereof can quickly realize matrix multiplication, can greatly accelerate the speed of transformation operation in multimedia processing, accelerate the multimedia processing process, simultaneously have good universality, realize the transformation operation process under different requirements, can finish the multimedia transformation operation with smaller hardware cost, and can save the operation time of matrix transposition in the transformation processing and simultaneously reduce the occupation of the number of computer registers.

Drawings

FIG. 1 is a diagram illustrating a multimedia transform multiplier according to an embodiment of the present invention;

FIG. 2 is a flow chart of a processing method of a multimedia transform multiplier according to a third embodiment;

FIG. 3 is a flowchart of a processing method of the multimedia transform multiplier according to the fourth embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the multimedia transform multiplier and the processing method thereof according to the present invention are further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting.

In the transformation operation process of multimedia processing, multiplication, addition and shift are main parts, a large number of multiplication and addition loop iterations are needed to realize one-time transformation operation, and the transformation operation process can be realized by the cascade cooperation of a single or a small number of multimedia transformation multipliers.

From the existing transformation processes for different multimedia processes, a common matrix multiplication process can be abstracted as follows:

E_mxm＝(CoeffMatrix×T_m+2^(shift-1))＞＞shift (I)

or

E_mxm＝CoeffMatrix×T_m (II)

where CoeffMatrix is the input matrix required by the multimedia processing (taking image transforms as an example, it holds the original image values for a forward transform and the coded coefficients for an inverse transform); shift is the number of bits shifted; E_m×m is the output matrix, i.e. the matrix multiplication result; and T_m is the specific coefficient matrix required for an m×m (m＝4, 8) transform, whose values are fixed but differ between the transforms of different multimedia formats, where:

A) in the discrete cosine transform of the AVS format, m is 8 and T₈ is as follows:

T_{8} = [\begin{matrix} 8 & 10 & 10 & 9 & 8 & 6 & 4 & 2 \\ 8 & 9 & 4 & - 2 & - 8 & - 10 & - 10 & - 6 \\ 8 & 6 & - 4 & - 10 & - 8 & 2 & 10 & 9 \\ 8 & 2 & - 10 & - 6 & 8 & 9 & - 4 & - 10 \\ 8 & - 2 & - 10 & 6 & 8 & - 9 & - 4 & 10 \\ 8 & - 6 & - 4 & 10 & - 8 & - 2 & 10 & - 9 \\ 8 & - 9 & 4 & 2 & - 8 & 10 & - 10 & 6 \\ 8 & - 10 & 10 & - 9 & 8 & - 6 & 4 & - 2 \end{matrix}]

B) in the discrete cosine transform of the VC-1 format, m is 8 or 4, and T₈ and T₄ are respectively as follows:

T_{8} = [\begin{matrix} 12 & 12 & 12 & 12 & 12 & 12 & 12 & 12 \\ 16 & 15 & 9 & 4 & - 4 & - 9 & - 15 & - 16 \\ 16 & 6 & - 6 & - 16 & - 16 & - 6 & 6 & 16 \\ 15 & - 4 & - 16 & - 9 & 9 & 16 & 4 & - 15 \\ 12 & - 12 & - 12 & 12 & 12 & - 12 & - 12 & 12 \\ 9 & - 16 & 4 & 15 & - 15 & - 4 & 16 & - 9 \\ 6 & - 16 & 16 & - 6 & - 6 & 16 & - 16 & 6 \\ 4 & - 9 & 15 & - 16 & 16 & - 15 & 9 & - 4 \end{matrix}]

T_{4} = [\begin{matrix} 17 & 17 & 17 & 17 \\ 22 & 10 & - 10 & - 22 \\ 17 & - 17 & - 17 & 17 \\ 10 & - 22 & 22 & - 10 \end{matrix}]

C) in the discrete cosine transform of the H.264 format, m is 8 or 4, and T₈ and T₄ are respectively as follows:

T_{8} = [\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 12 & 10 & 6 & 3 & - 3 & - 6 & - 10 & - 12 \\ 8 & 4 & - 4 & - 8 & - 8 & - 4 & 4 & 8 \\ 10 & - 3 & - 12 & - 6 & 6 & 12 & 3 & - 10 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 12 & 3 & 10 & - 10 & - 3 & 12 & - 6 \\ 4 & - 8 & 8 & - 4 & - 4 & 8 & - 8 & 4 \\ 3 & - 6 & 10 & - 12 & 12 & - 10 & 6 & - 3 \end{matrix}]

T_{4} = [\begin{matrix} 2 & 2 & 2 & 2 \\ 2 & 1 & - 1 & - 2 \\ 2 & - 2 & - 2 & 2 \\ 1 & - 2 & 2 & - 1 \end{matrix}]

D) in the Hadamard transform of the H.264 format, m is 4 and T₄ is as follows:

T_{4} = [\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & - 1 & - 1 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 1 & 1 & - 1 \end{matrix}]

The multimedia transform multiplier disclosed by the invention can, as one implementation, complete the matrix multiplication operation of formula (I) or formula (II).
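Formulas (I) and (II) differ only in the optional add-and-shift step, which a software sketch can express with one function. The name `transform_multiply` and the convention that `shift=None` selects formula (II) are assumptions made for illustration; the H.264 T₄ matrix is copied from item C) above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transform_multiply(coeff, t, shift=None):
    """Formula (II) when shift is None; formula (I) when shift >= 1."""
    product = matmul(coeff, t)
    if shift is None:
        return product                       # E = CoeffMatrix x T
    offset = 1 << (shift - 1)                # 2^(shift-1)
    return [[(v + offset) >> shift for v in row] for row in product]

T4_H264 = [[2, 2, 2, 2],
           [2, 1, -1, -2],
           [2, -2, -2, 2],
           [1, -2, 2, -1]]
coeff = [[4, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

print(transform_multiply(coeff, T4_H264)[0])           # formula (II)
print(transform_multiply(coeff, T4_H264, shift=3)[0])  # formula (I)
```

Only the matrix T and the shift amount change between formats, which is why a single parameterized unit can serve all of them.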

Since 8 × 8 matrix multiplication can be realized by using 4 × 4 matrix to perform block matrix multiplication, in the embodiment of the present invention, 4 × 4 matrix multiplication is described first; the process of implementing 8 × 8 matrix multiplication by block matrix multiplication using a 4 × 4 matrix is then described.
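The block decomposition mentioned above can be sketched as follows: an 8x8 product is assembled from eight 4x4 products and four 4x4 additions. The function names are hypothetical, and `matmul8_direct` is included only as a reference to compare against.

```python
def matmul4(a, b):
    """4x4 matrix product, standing in for one multiplier operation."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def block(m, r, c):
    """Return the 4x4 sub-block of an 8x8 matrix at block row r, column c."""
    return [row[4 * c:4 * c + 4] for row in m[4 * r:4 * r + 4]]

def add4(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul8_blocked(a, b):
    """C[r][c] = A[r][0] x B[0][c] + A[r][1] x B[1][c] for each 4x4 block."""
    out = [[0] * 8 for _ in range(8)]
    for r in range(2):
        for c in range(2):
            s = add4(matmul4(block(a, r, 0), block(b, 0, c)),
                     matmul4(block(a, r, 1), block(b, 1, c)))
            for i in range(4):
                out[4 * r + i][4 * c:4 * c + 4] = s[i]
    return out

def matmul8_direct(a, b):
    """Reference 8x8 product used only for comparison."""
    return [[sum(a[i][k] * b[k][j] for k in range(8)) for j in range(8)]
            for i in range(8)]

a = [[i * 8 + j - 30 for j in range(8)] for i in range(8)]
b = [[(i - j) * 3 + 1 for j in range(8)] for i in range(8)]
print(matmul8_blocked(a, b) == matmul8_direct(a, b))
```

Note that the partial products are added before any rounding or shifting, so they have to be kept at full precision until the sum is formed.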

As shown in fig. 1, the multimedia transform multiplier disclosed in the present invention includes a loading parameter module 1, a first storage module 2, a second storage module 3, a matrix multiplication module 4, a third storage module 5, a fourth storage module 7 and an extraction control module 6.

Wherein:

the loading parameter module 1 is used for loading the data of the transformation matrix and of the coefficient matrix for the multimedia transform operation and storing them in the second storage module 3 and the first storage module 2 respectively, and for loading the extraction control parameter values and storing them in the extraction control module 6;

the second storage module 3 and the first storage module 2 are used for storing the data of the input transformation matrix CoeffMatrix and the data of the coefficient matrix T, respectively;

as an implementation, each element of the input transformation matrix CoeffMatrix is a 16-bit signed integer, and the elements are stored in the second storage module 3 in row-first order, from the low-order bits to the high-order bits.

As an implementable way, the second storage module 3 comprises at least 16 16-bit registers for storing the 16 signed numbers of the transformation matrix.
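One possible reading of this storage layout is sketched below: the 16 elements of a 4x4 matrix are packed row by row into a 256-bit word, with element (0,0) in the lowest 16 bits. That bit assignment is an assumption drawn from the phrase "from the low-order bits to the high-order bits", and the function names are hypothetical.

```python
def pack16(matrix):
    """Pack a 4x4 matrix of 16-bit signed integers into one 256-bit integer.

    Assumed layout: row-first, element (0,0) in bits 0..15.
    """
    word = 0
    for idx, v in enumerate(x for row in matrix for x in row):
        word |= (v & 0xFFFF) << (16 * idx)   # two's complement, 16 bits
    return word

def unpack16(word):
    """Inverse of pack16: recover the 4x4 matrix of signed 16-bit values."""
    vals = []
    for idx in range(16):
        raw = (word >> (16 * idx)) & 0xFFFF
        vals.append(raw - 0x10000 if raw & 0x8000 else raw)
    return [vals[4 * r:4 * r + 4] for r in range(4)]

m = [[1, -2, 3, -4],
     [5, -6, 7, -8],
     [32767, -32768, 0, 9],
     [10, 11, -12, 13]]
print(unpack16(pack16(m)) == m)
```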

The first storage module 2 is used to store each element of the coefficient matrix T.

As an implementable way, the first storage module 2 stores each element T_ij of the coefficient matrix T in the same manner as the second storage module 3, so this is not described in detail in the embodiments of the present invention.

The matrix multiplication module 4 is configured to take out each element of the transformation matrix stored in the second storage module 3 and each element of the coefficient matrix T stored in the first storage module 2, according to the storage start address parameters of the first storage module 2 and the second storage module 3 loaded by the parameter loading module, perform the matrix multiplication operation, and store the resulting product in the third storage module 5, thereby completing CoeffMatrix×T.

It should be noted that matrix multiplication on a computer is prior art and is therefore not described in detail in the embodiments of the present invention.

And a third storage module 5 for storing each element of the intermediate result matrix multiplied by the matrix multiplication module.

As an implementable way, each element of the intermediate result matrix stored by the third storage module 5 is a 32-bit signed integer. The third storage module 5 comprises at least 16 32-bit registers for storing the 16 32-bit signed integers of the intermediate result matrix.

The extraction control module 6 is used for reading the extraction control parameter values loaded in the loading parameter module, performing corresponding addition and shift operations by using the data of the intermediate result matrix in the third storage module 5, and storing the data of the result matrix in the fourth storage module 7;

and the fourth storage module 7 is used for storing each element of the final multiplication result matrix obtained by extraction by the extraction control module 6.

As an implementation manner, the fourth storage module 7 stores each element of the final multiplication result matrix extracted by the extraction control module 6 according to the storage head address parameter loaded by the parameter loading module, and each element of the final multiplication result matrix is stored the same as that of the first storage module 2.

The third storage module 5 temporarily stores intermediate result matrix data obtained by multiplying the input transformation matrix data and the coefficient matrix data, and the intermediate result matrix data is extracted by the extraction control module 6 to obtain a required result and then stored in the fourth storage module 7.

Preferably, the multimedia transform multiplier according to the embodiment of the present invention further includes a transposer 8, configured to perform a transposing operation on a coefficient matrix of the multimedia transform operation to obtain data of the first matrix.

Preferably, as an implementation manner, the extraction control module 6 of the present invention includes a judging module 61 and a shifting module 62;

wherein:

the judgment module 61 is provided with an extraction mode bit and an operation control bit, and is used for reading an extraction mode value and an operation control value loaded in the loading parameter module and determining an extraction mode;

the shift module 62 is provided with a shift control bit, and is used for reading the shift parameter value loaded in the loading parameter module and performing shift operation on the extracted intermediate result matrix according to the shift parameter value;

the extraction control parameter values include extraction mode values, operation control values and shift parameter values.

The extraction mode is one of four modes (first, second, third or fourth), selected by setting the extraction mode value to 1 or 0: for example, when the extraction mode value is set to 1 the first or second mode is adopted, and when it is 0 the third or fourth mode is adopted;

next, an operation control value is set in the extraction control module 6; setting the operation control value to 1 or 0 determines whether a rounding operation is required in the first and second extraction modes, or whether the upper half or the lower half of the intermediate result matrix is extracted in the third and fourth extraction modes.

Denoting the extraction mode value by m and the operation control value by n: if m is 1 and n is 1, a rounding operation is performed, which is the first extraction mode; if m is 1 and n is 0, no rounding operation is performed, which is the second extraction mode; if m is 0 and n is 1, the lower half of the intermediate result matrix is taken, which is the third extraction mode; if m is 0 and n is 0, the upper half of the intermediate result matrix is taken, which is the fourth extraction mode.

Preferably, as an implementation, there are four extraction modes:

1) the extraction mode is 11, for transform processes that require addition and shifting after the matrix multiplication. This mode first performs the addition (i.e., adds 2^(shift-1)) and then extracts the desired 16-bit result by shifting (i.e., shifting right by shift bits). It is generally applicable to transform operations whose results have no more than 16 significant bits and require rounding. See example 1 for an application;

2) the extraction mode is 10, for transform processes that do not require rounding. This mode performs no addition and directly shifts (i.e., shifts right by shift bits) to extract the desired 16-bit result. It applies to transform processes whose results have no more than 16 significant bits and which either do not require rounding or use a simplified approximation to reduce the number of operations.

3) the extraction mode is 01, in which the lower half of the intermediate result matrix, i.e. the lower two rows of the intermediate result matrix, is extracted directly, for a total of 16 32-bit elements. Together with mode 4), this saves the entire matrix multiplication result at 32-bit precision, and it can be used for transform processes whose result precision is higher than 16 bits or for the sub-block multiplications of a block matrix multiplication. When the result precision is higher than 16 bits, the lower half and the upper half of the result matrix must be obtained separately, and the corresponding 32-bit addition and shift operations are then performed on them to obtain the result. When the block size is larger than 4×4, the transform operation must be realized by block matrix multiplication, and because an addition follows the sub-matrix multiplications, the sub-matrix products must be kept at 32-bit precision: although the sum C00×T00+C01×T10 fits in 16 bits, the two addends C00×T00 and C01×T10 may each occupy 32 bits, since the operands are signed numbers. This mode is widely used in block matrix multiplication, as can be seen in example 4;

4) the extraction mode is 00, in which the upper half of the intermediate result matrix is extracted. Together with mode 3), this saves the entire matrix multiplication result at 32-bit precision, and it can be used for transform processes that require high precision in the intermediate result or for the sub-block multiplications of a block matrix multiplication.
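The four modes listed above can be modelled with a short sketch. Several points are assumptions made for illustration: the function and parameter names are hypothetical, the "lower half" is taken to be rows 0 and 1 (the rows held in the low-order bits under a row-first layout) and the "upper half" rows 2 and 3, and the shifted results are assumed to fit in 16 bits so no truncation is modelled.

```python
def extract(temp, mode_bit, op_bit, shift=0):
    """Extract a result from a 4x4 intermediate matrix of 32-bit values.

    mode_bit=1, op_bit=1: add 2^(shift-1), then shift right (mode 11).
    mode_bit=1, op_bit=0: shift right without rounding (mode 10).
    mode_bit=0, op_bit=1: return the lower half unchanged (mode 01).
    mode_bit=0, op_bit=0: return the upper half unchanged (mode 00).
    """
    if mode_bit == 1:
        offset = (1 << (shift - 1)) if op_bit == 1 and shift > 0 else 0
        return [[(v + offset) >> shift for v in row] for row in temp]
    # Assumption: "lower" means rows 0-1, "upper" means rows 2-3.
    return [row[:] for row in (temp[:2] if op_bit == 1 else temp[2:])]

temp = [[20, -20, 7, 0],
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]

print(extract(temp, 1, 1, 3)[0])  # mode 11: rounded
print(extract(temp, 1, 0, 3)[0])  # mode 10: truncated
print(extract(temp, 0, 1))        # mode 01: lower half at full precision
print(extract(temp, 0, 0))        # mode 00: upper half at full precision
```

Modes 01 and 00 together return every element of the intermediate matrix untouched, so a caller can add sub-block products at 32-bit precision before rounding.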

Preferably, the shift parameter is set in shift control bits, for which a sufficient number of bits is specified. For example, a shift of 7 bits needs at least three binary digits to represent, so no fewer than three bits must be specified as the shift control bits.
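The width of the shift control field follows directly from the largest shift it must encode. A minimal sketch, with a hypothetical helper name:

```python
def shift_control_bits(max_shift):
    """Minimum number of binary digits needed to encode shifts 0..max_shift."""
    return max(1, max_shift.bit_length())

print(shift_control_bits(7))   # 7 is binary 111
print(shift_control_bits(31))  # 31 is binary 11111
```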

The multimedia transformation multiplier provided by the invention realizes the transformation of multimedia data with different formats, has good universality, can greatly accelerate the media decoding speed and save computer resources.

The processing method of the multimedia transform multiplier provided by the invention is further explained by combining a matrix multiplication formula (I).

Let vs be the matrix CoeffMatrix stored in the second memory module 3, vt be the coefficient matrix T stored in the first memory module 2, temp be the intermediate result matrix temp stored in the third memory module 5, the elements of the vs and vt matrices are 16-bit signed integers, and each element of temp is 32-bit signed integer, then

The extraction process is represented as follows:

vd is the result matrix stored in the fourth memory block 7.
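The product of vs and vt described above can be sketched in Python as follows. This is only an illustrative model: the function names are hypothetical, and wrapping the accumulated sum to 32 bits is an assumption, since the text states only that each element of temp is a 32-bit signed integer.

```python
def wrap32(x):
    """Wrap an integer to signed 32-bit two's complement (assumed overflow rule)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def matmul4x4(vs, vt):
    """Return temp = vs x vt for 4x4 matrices of signed 16-bit integers."""
    for matrix in (vs, vt):
        if len(matrix) != 4 or any(len(row) != 4 for row in matrix):
            raise ValueError("expected a 4x4 matrix")
        if any(not -32768 <= x <= 32767 for row in matrix for x in row):
            raise ValueError("elements must be signed 16-bit integers")
    # temp[i][j] is the dot product of row i of vs and column j of vt.
    return [[wrap32(sum(vs[i][k] * vt[k][j] for k in range(4)))
             for j in range(4)]
            for i in range(4)]


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
coeff = [[4 * i + j - 8 for j in range(4)] for i in range(4)]
print(matmul4x4(coeff, identity) == coeff)
```

Note that four products of 16-bit values can in the extreme case exceed the signed 32-bit range, which is why the sketch makes its overflow rule explicit.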

In an implementation, one byte is set as the mode control module, the extraction mode bit is the 7 th bit, i.e., m is 6, the operation control bit is the 6 th bit, i.e., n is 5, and the 1 st to 5 th bits are the shift control bits, which may indicate a shift of 31 bits at maximum (11111).
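The byte layout just described can be illustrated with a short Python sketch. The function name decode_imm8 and the returned tuple layout are hypothetical; the bit positions follow the text (bit 6 for the extraction mode bit, bit 5 for the operation control bit, bits 4 to 0 for the shift amount).

```python
def decode_imm8(imm8):
    """Split the control byte into (extraction_mode_bit, operation_bit, shift)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    mode_bit = (imm8 >> 6) & 1  # the "7th bit", m = 6
    op_bit = (imm8 >> 5) & 1    # the "6th bit", n = 5
    shift = imm8 & 0x1F         # the 1st to 5th bits, at most 31
    return mode_bit, op_bit, shift


# Control bytes that appear in the examples of this description.
for value in (0x63, 0x67, 0x43, 0x47, 0x40, 0x20, 0x00):
    print(hex(value), decode_imm8(value))
```

For instance, 0x63 decodes to extraction mode bit 1, operation control bit 1, and a shift of 3 bits.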

Correspondingly, the invention also provides a processing method of the multimedia transform multiplier, which comprises the following steps:

Step S100: load the data of the first matrix, the data of the second matrix, and the operation control parameter values.

As an implementable manner, the processing method of the multimedia transform multiplier according to the embodiment of the present invention loads the storage start address parameters of the first storage module, the second storage module, and the fourth storage module, together with the extraction mode parameter and the shift bit number parameter;

Step S200: perform a matrix multiplication on the data of the first matrix and the data of the second matrix to obtain the data of an intermediate result matrix.

As an implementable manner, the processing method of the multimedia transform multiplier according to the embodiment of the present invention fetches, according to the storage start address parameters of the first storage module and the second storage module loaded by the parameter loading module, the elements of the transform matrix stored in the second storage module and the elements of the coefficient matrix T stored in the first storage module, performs the matrix multiplication, and stores the multiplication result in the third storage module.

Step S300: control the data of the intermediate result matrix to be operated on according to the loaded operation control parameter values to obtain the data of the result matrix.

As an implementable manner, the processing method of the multimedia transform multiplier according to the embodiment of the present invention performs extraction and shifting on the intermediate result matrix stored in the third storage module according to the extraction mode parameter and the shift bit number parameter loaded by the parameter loading module, and stores the final multiplication result in the fourth storage module. Preferably, the roles of the first and second storage modules in the multiplier are identical, so that either the transform matrix or the coefficient matrix can be stored in the second storage module or the first storage module.

Preferably, the step S300 includes the steps of:

Step S310: read the operation mode parameter value and determine the operation mode;

As an implementable manner, a different extraction mode is applied to the intermediate result matrix depending on the extraction mode parameter loaded by the parameter loading module;

Step S320: read the operation parameter value and, in the operation mode determined in step S310, control the intermediate result matrix to perform the corresponding operation according to the operation parameter value, to obtain the data of the final result matrix.

As an implementation manner, after the intermediate result matrix is shifted according to different extraction modes and shift parameters, each element of the final multiplication result matrix is obtained.

As an implementable manner, the following instructions may be utilized:

load $v0, CoeffMatrix: loads the 4x4 matrix CoeffMatrix into the vector register $v0;

vaddh $v2, $v1, $v0: performs signed addition on each pair of corresponding 16-bit units of the vector registers $v0 and $v1, and stores the result in the corresponding position of $v2;

vaddw $v2, $v1, $v0: performs signed addition on each pair of corresponding 32-bit units of the vector registers $v0 and $v1, and stores the result in the corresponding position of $v2;

vshifth $v1, $v0, imm: shifts each 16-bit unit in $v0 by imm bits;

vshiftw $v1, $v0, imm: shifts each 32-bit unit in $v0 by imm bits;

convert32to16 $v2, $v1, $v0: packs each 32-bit element in $v1 and $v0 into a 16-bit halfword, with saturation.
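The saturating pack performed by convert32to16 can be sketched in Python as follows. This is an illustrative model under stated assumptions: each source register is taken to hold eight 32-bit elements, and the order in which the two sources are placed in the destination is assumed, since the text does not specify it. The names saturate16 and the Python function convert32to16 are hypothetical stand-ins for the instruction.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(x):
    """Clamp a signed 32-bit value into the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def convert32to16(src1, src0):
    """Pack two lists of eight 32-bit values into sixteen saturated 16-bit values.

    Placing src0 before src1 in the result is an assumption of this sketch.
    """
    if len(src1) != 8 or len(src0) != 8:
        raise ValueError("each source register holds eight 32-bit elements")
    return [saturate16(x) for x in list(src0) + list(src1)]


packed = convert32to16([0] * 8, [40000, -40000, 5, 0, 0, 0, 0, 0])
print(packed[:3])
```

Saturation differs from truncation: a value such as 40000 is clamped to 32767 rather than wrapped to a negative number.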

As an implementable manner, the matrix multiplication in the multimedia transform multiplier is implemented as follows:

temp00[31:0]＝vs[15:0]*vt[15:0]+vs[31:16]*vt[79:64]+vs[47:32]*vt[143:128]+vs[63:48]*vt[207:192]；

temp01[31:0]＝vs[15:0]*vt[31:16]+vs[31:16]*vt[95:80]+vs[47:32]*vt[159:144]+vs[63:48]*vt[223:208]；

temp02[31:0]＝vs[15:0]*vt[47:32]+vs[31:16]*vt[111:96]+vs[47:32]*vt[175:160]+vs[63:48]*vt[239:224]；

temp03[31:0]＝vs[15:0]*vt[63:48]+vs[31:16]*vt[127:112]+vs[47:32]*vt[191:176]+vs[63:48]*vt[255:240]；

temp10[31:0]＝vs[79:64]*vt[15:0]+vs[95:80]*vt[79:64]+vs[111:96]*vt[143:128]+vs[127:112]*vt[207:192]；

temp11[31:0]＝vs[79:64]*vt[31:16]+vs[95:80]*vt[95:80]+vs[111:96]*vt[159:144]+vs[127:112]*vt[223:208]；

temp12[31:0]＝vs[79:64]*vt[47:32]+vs[95:80]*vt[111:96]+vs[111:96]*vt[175:160]+vs[127:112]*vt[239:224]；

temp13[31:0]＝vs[79:64]*vt[63:48]+vs[95:80]*vt[127:112]+vs[111:96]*vt[191:176]+vs[127:112]*vt[255:240]；

temp20[31:0]＝vs[143:128]*vt[15:0]+vs[159:144]*vt[79:64]+vs[175:160]*vt[143:128]+vs[191:176]*vt[207:192]；

temp21[31:0]＝vs[143:128]*vt[31:16]+vs[159:144]*vt[95:80]+vs[175:160]*vt[159:144]+vs[191:176]*vt[223:208]；

temp22[31:0]＝vs[143:128]*vt[47:32]+vs[159:144]*vt[111:96]+vs[175:160]*vt[175:160]+vs[191:176]*vt[239:224]；

temp23[31:0]＝vs[143:128]*vt[63:48]+vs[159:144]*vt[127:112]+vs[175:160]*vt[191:176]+vs[191:176]*vt[255:240]；

temp30[31:0]＝vs[207:192]*vt[15:0]+vs[223:208]*vt[79:64]+vs[239:224]*vt[143:128]+vs[255:240]*vt[207:192]；

temp31[31:0]＝vs[207:192]*vt[31:16]+vs[223:208]*vt[95:80]+vs[239:224]*vt[159:144]+vs[255:240]*vt[223:208]；

temp32[31:0]＝vs[207:192]*vt[47:32]+vs[223:208]*vt[111:96]+vs[239:224]*vt[175:160]+vs[255:240]*vt[239:224]；

temp33[31:0]＝vs[207:192]*vt[63:48]+vs[223:208]*vt[127:112]+vs[239:224]*vt[191:176]+vs[255:240]*vt[255:240]；

if (imm8[6])  // imm8 is the mode control byte
{
    if (imm8[5])  // bit 7 is 1 and bit 6 is 1: first extraction pattern
        dij = (tij + (1 << (imm8[4:0] - 1))) >> imm8[4:0];  // rounding operation
    else  // bit 7 is 1 and bit 6 is 0: second extraction pattern
        dij = tij >> imm8[4:0];
    // i and j take values from 0 to 3
}
else
{
    if (imm8[5])  // bit 7 is 0 and bit 6 is 1: third extraction pattern
    {
        d01, d00 = t00;
        d03, d02 = t01;
        ...
        d33, d32 = t13;
    }
    else  // bit 7 is 0 and bit 6 is 0: fourth extraction pattern
    {
        d01, d00 = t20;
        d03, d02 = t21;
        ...
        d33, d32 = t33;
    }
}
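The four extraction patterns can be modelled in Python as follows. This is a sketch under assumptions, not the hardware implementation: temp is taken as a 4x4 list of 32-bit integers, the 16-bit results of the first two patterns are truncated to their low 16 bits, and a shift amount of 0 in the rounding pattern is treated as "no rounding term". The names wrap16 and extract are hypothetical.

```python
def wrap16(x):
    """Keep the low 16 bits of x as a signed value (assumed truncation rule)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def extract(temp, imm8):
    """Apply the extraction step selected by imm8 to a 4x4 matrix of 32-bit values.

    Patterns 11 and 10 return sixteen 16-bit values in row-major order;
    patterns 01 and 00 return the eight 32-bit values of two rows.
    """
    mode_bit = (imm8 >> 6) & 1
    op_bit = (imm8 >> 5) & 1
    shift = imm8 & 0x1F
    if mode_bit:
        flat = [x for row in temp for x in row]
        if op_bit and shift > 0:
            # First pattern: add 2^(shift-1) before shifting (rounding).
            flat = [(x + (1 << (shift - 1))) >> shift for x in flat]
        else:
            # Second pattern: plain arithmetic right shift.
            flat = [x >> shift for x in flat]
        return [wrap16(x) for x in flat]
    # Third pattern takes t00..t13 (rows 0 and 1); fourth takes t20..t33.
    rows = temp[0:2] if op_bit else temp[2:4]
    return [x for row in rows for x in row]


temp = [[(4 * i + j) * 10 for j in range(4)] for i in range(4)]
print(extract(temp, 0x63)[:4], extract(temp, 0x43)[:4])
```

Calling extract with 0x20 and then 0x00 on the same temp yields the two halves that together hold the full 32-bit result.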

If the operation of the multimedia transform multiplier is integrated into a single instruction, named vmtxmulh, the instruction takes the following operands:

vmtxmulh: the first-element address of the first storage module, the first-element address of the second storage module, the first-element address of the fourth storage module, the extraction mode parameter, and the shift parameter;

Taking a specific application environment as an example, one operation of the multimedia transform multiplier can be regarded as a one-dimensional transform. Processing an image, for example, requires a two-step (i.e. two-dimensional) transformation, horizontal and vertical, which can be performed by combining the above multimedia transform multipliers. The two-dimensional transformation realized by the multimedia transform multiplier and its processing method is further explained below with reference to the attached drawings.

As an implementable mode, the processing method of the multiplier provided by the invention realizes two-dimensional transformation. The two-step multiplication operations that the two-dimensional transformation needs to complete are as follows:

E = (f × A^T + 2^(shift1-1)) >> shift1 (*)

F = (A × E + 2^(shift2-1)) >> shift2

Therefore, as an implementable manner, when performing a two-dimensional transformation, the processing method of the multimedia transform multiplier according to the embodiment of the present invention further includes the following step before step S200:

Step S200': transpose the coefficient matrix used in the previous multimedia transformation operation to obtain the data of the first matrix.

When the two-dimensional transformation is carried out, the coefficient matrix A is transposed to obtain the transposed matrix A^T; then the result matrix obtained from the previous operation of the operation control module of the multiplier and the transposed coefficient matrix A^T are loaded into the second and first storage modules, respectively;

then, repeating the step S200 and the step S300 to carry out iterative operation to obtain a final result matrix E;

As an implementable manner, the result matrix E obtained from the previous operation of the operation control module of the multiplier is sent to the second storage module of the multimedia transform multiplier to take part in the iterative multiplication, so as to obtain the final result matrix E.

Preferably, after the iterative operation is finished, rounding operation is further performed according to the requirement of the data bit, so as to obtain a final result matrix E.

As another possible implementation, the matrices E and A can be used as new input transformation matrix and coefficient matrix, respectively, and reloaded into another multimedia transformation multiplier, so as to complete the second step operation serially or iteratively.

As this embodiment shows, a two-dimensional multimedia transformation can be completed with the invention either by repeatedly iterating on the same multimedia transform multiplier or by operating two multimedia transform multipliers serially or iteratively; multi-dimensional multimedia transformations can be obtained in the same way.
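The two multiplication steps of the two-dimensional transformation can be sketched in Python as follows. This is a plain arithmetic model using unbounded integers, so it ignores the 16-bit and 32-bit limits of the hardware; the function names are hypothetical and both shift amounts are assumed to be at least 1.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def round_shift_matrix(m, shift):
    """Apply (x + 2^(shift-1)) >> shift to every element, for shift >= 1."""
    return [[(x + (1 << (shift - 1))) >> shift for x in row] for row in m]


def transform_2d(f, A, shift1, shift2):
    """First step: E from f and A^T; second step: F from A and E."""
    E = round_shift_matrix(matmul(f, transpose(A)), shift1)
    F = round_shift_matrix(matmul(A, E), shift2)
    return F


# Hypothetical check: with A = 8 * identity and both shifts equal to 3,
# each step multiplies by 8 and divides by 8, so f is returned unchanged.
A = [[8 if i == j else 0 for j in range(4)] for i in range(4)]
f = [[4 * i + j - 5 for j in range(4)] for i in range(4)]
print(transform_2d(f, A, 3, 3) == f)
```

The same function body maps onto one multiplier used twice in sequence or onto two multipliers used one after the other.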

Therefore, the operational flexibility of the multimedia transform multiplier can be greatly improved by means of the processor's transposer and iterative or serial operation. For example, in the above embodiment, the coefficient matrix is transposed before it is written to the multiplier and multiplied by the input transform matrix; to distinguish this multiplication from the one described previously, the instruction may be named vmtxinvmulh.

Similar to the vmtxmulh instruction, the vmtxinvmulh instruction takes the following operands:

vmtxinvmulh: the first-element address of the first storage module, the first-element address of the second storage module, the first-element address of the fourth storage module, the extraction mode parameter, and the shift parameter;

It should be noted that the above formula is only used to illustrate the transformation process of the present invention. Different transformation processes may or may not include a transposition, and may require more than two iterations; that is, with the multiplier and the multimedia processing transformation method provided by the present invention, one-dimensional, two-dimensional, or higher-dimensional transformations can be realized. Formula (*) is given only to better illustrate the transformation process of the present invention and does not limit it. In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.

Example 1:

As an implementable embodiment, the two-dimensional transformation of formula (*) is implemented with a combination of the vmtxmulh and vmtxinvmulh instructions of the multiplier as follows:

load $v0, f; ($v0 holds the first matrix data; this statement loads matrix f into $v0)

load $v1, A; ($v1 holds the second matrix data; this statement loads matrix A into $v1)

vmtxmulh $v2, $v0, $v1, 0x63; (first extraction mode, shift 3 bits)

vmtxinvmulh $v3, $v1, $v2, 0x67; (first extraction mode, shift 7 bits)

Example 2:

for some decoders with less stringent accuracy requirements, rounding can be omitted from the operation to reduce the amount of operation, thereby increasing the operation speed. Also based on the above example, the following procedure can be obtained:

E_4x4 = (CoeffMatrix × T4) >> 3

R_4x4 = (T4 × E_4x4) >> 7 (**)

The instructions that can be used in the present invention are as follows:

load $v0, CoeffMatrix;

load $v1, T4;

vmtxmulh $v2, $v0, $v1, 0x43; (second extraction mode, shift 3 bits)

// imm8[6] = 1, imm8[5] = 0; imm8[4:0] = 3

vmtxinvmulh $v3, $v1, $v2, 0x47;

// imm8[6] = 1, imm8[5] = 0; imm8[4:0] = 7

Example 3:

For matrices larger than 4x4 whose intermediate results can be represented in 16 bits:

For a block such as 8x8, since the above instruction can only complete one 4x4 matrix multiplication, block matrix multiplication can be used. The block matrix multiplication method is as follows:

T8^T × E_8x8 can be calculated as follows:

Owing to the nature of media applications, the intermediate result generally does not exceed 16 bits. Taking VC-1 as an example, the computation can be implemented by the following procedure, whose flow chart is shown in fig. 2.

1：

load $v0，CoeffMatrix00；

load $v1，CoeffMatrix01；

load $v2，CoeffMatrix10；

load $v3，CoeffMatrix11；

load $v4，T00；

load $v5，T01；

load $v6，T10；

load $v7，T11；

2：

vmtxmulh $v8，$v0，$v4，0x40

//imm8[6]＝1，imm8[5]＝0；imm8[4:0]＝0

vmtxmulh $v9，$v1，$v6，0x40

vmtxmulh $v10，$v0，$v5，0x40

vmtxmulh $v11，$v1，$v7，0x40

vmtxmulh $v12，$v2，$v4，0x40

vmtxmulh $v13，$v3，$v6，0x40

vmtxmulh $v14，$v2，$v5，0x40

vmtxmulh $v15，$v3，$v7，0x40

3：

vaddh $v8，$v8，$v9

vaddh $v9，$v10，$v11

vaddh $v10，$v12，$v13

vaddh $v11，$v14，$v15

4：

load $v12，{4(16bits)<repeat 16 times>}；

5：

vaddh $v8，$v8，$v12

vaddh $v9，$v9，$v12

vaddh $v10，$v10，$v12

vaddh $v11，$v11，$v12

6：

vsrahi $v8，$v8，3

vsrahi $v9，$v9，3

vsrahi $v10，$v10，3

vsrahi $v11，$v11，3

7：

vmtxinvmulh $v12，$v4，$v8，0x40

//imm8[6]＝1，imm8[5]＝0；imm8[4:0]＝0

vmtxinvmulh $v13，$v6，$v10，0x40

vmtxinvmulh $v14，$v4，$v9，0x40

vmtxinvmulh $v15，$v6，$v11，0x40

vmtxinvmulh $v16，$v5，$v8，0x40

vmtxinvmulh $v17，$v7，$v10，0x40

vmtxinvmulh $v18，$v5，$v9，0x40

vmtxinvmulh $v19，$v7，$v11，0x40

8：

vaddh $v8，$v12，$v13

vaddh $v9，$v14，$v15

vaddh $v10，$v16，$v17

vaddh $v11，$v18，$v19

9：

load $v12，{64(16bits)<repeat 16 times>}；

10：

vaddh $v8，$v8，$v12

vaddh $v9，$v9，$v12

vaddh $v10，$v10，$v12

vaddh $v11，$v11，$v12

11：

vsrahi $v8，$v8，7

vsrahi $v9，$v9，7

vsrahi $v10，$v10，7

vsrahi $v11，$v11，7；
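The block decomposition used in steps 2 and 3 above can be checked with a short Python sketch. It models only the arithmetic of splitting an 8x8 product into 4x4 sub-block products and sums, using unbounded integers; the function names and test matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def split_blocks(m):
    """Split an 8x8 matrix into its 4x4 sub-blocks (m00, m01, m10, m11)."""
    def block(r, c):
        return [row[c:c + 4] for row in m[r:r + 4]]
    return block(0, 0), block(0, 4), block(4, 0), block(4, 4)


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_matmul_8x8(c, t):
    """Compute c x t from eight 4x4 products, as in the instruction sequence."""
    c00, c01, c10, c11 = split_blocks(c)
    t00, t01, t10, t11 = split_blocks(t)
    r00 = add(matmul(c00, t00), matmul(c01, t10))
    r01 = add(matmul(c00, t01), matmul(c01, t11))
    r10 = add(matmul(c10, t00), matmul(c11, t10))
    r11 = add(matmul(c10, t01), matmul(c11, t11))
    top = [left + right for left, right in zip(r00, r01)]
    bottom = [left + right for left, right in zip(r10, r11)]
    return top + bottom


c = [[(8 * i + j) % 7 - 3 for j in range(8)] for i in range(8)]
t = [[(i + 2 * j) % 5 - 2 for j in range(8)] for i in range(8)]
print(block_matmul_8x8(c, t) == matmul(c, t))
```

The pairing of sub-blocks in block_matmul_8x8 mirrors the register pairs of step 2, for example $v0 with $v4 and $v1 with $v6 for the top-left result block.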

Example 4:

For matrices larger than 4x4 (here 8x8) whose intermediate results are represented in 32 bits:

Because each intermediate sub-block product A × B has 32-bit elements as a result of the multiplication, each vector register can store only half of the product matrix. The flow chart is shown in fig. 3, and the implementation is as follows:

1：

load $v0，CoeffMatrix00；

load $v1，CoeffMatrix01；

load $v2，CoeffMatrix10；

load $v3，CoeffMatrix11；

load $v4，T00；

load $v5，T01；

load $v6，T10；

load $v7，T11；

2：

vmtxmulh $v8，$v0，$v4，0x0；

vmtxmulh $v9，$v0，$v4，0x20；

vmtxmulh $v10，$v1，$v6，0x0；

vmtxmulh $v11，$v1，$v6，0x20；

vmtxmulh $v12，$v0，$v5，0x0；

vmtxmulh $v13，$v0，$v5，0x20；

vmtxmulh $v14，$v1，$v7，0x0；

vmtxmulh $v15，$v1，$v7，0x20；

vmtxmulh $v16，$v2，$v4，0x0；

vmtxmulh $v17，$v2，$v4，0x20；

vmtxmulh $v18，$v3，$v6，0x0；

vmtxmulh $v19，$v3，$v6，0x20；

vmtxmulh $v20，$v2，$v5，0x0；

vmtxmulh $v21，$v2，$v5，0x20；

vmtxmulh $v22，$v3，$v7，0x0；

vmtxmulh $v23，$v3，$v7，0x20；

3：

vaddw $v8，$v8，$v10

vaddw $v9，$v9，$v11

vaddw $v10，$v12，$v14

vaddw $v11，$v13，$v15

vaddw $v12，$v16，$v18

vaddw $v13，$v17，$v19

vaddw $v14，$v20，$v22

vaddw $v15，$v21，$v23

4：

load $v16，{4(32bits)<repeat 8 times>}；

5：

vaddw $v8，$v8，$v16

vaddw $v9，$v9，$v16

vaddw $v10，$v10，$v16

vaddw $v11，$v11，$v16

vaddw $v12，$v12，$v16

vaddw $v13，$v13，$v16

vaddw $v14，$v14，$v16

vaddw $v15，$v15，$v16

6：

vsrawi $v8，$v8，3

vsrawi $v9，$v9，3

vsrawi $v10，$v10，3

vsrawi $v11，$v11，3

vsrawi $v12，$v12，3

vsrawi $v13，$v13，3

vsrawi $v14，$v14，3

vsrawi $v15，$v15，3

7：

convert32to16 $v8，$v8，$v9

convert32to16 $v9，$v10，$v11

convert32to16 $v10，$v12，$v13

convert32to16 $v11，$v14，$v15

8：

vmtxinvmulh $v12，$v4，$v8，0x0；

vmtxinvmulh $v13，$v4，$v8，0x20；

vmtxinvmulh $v14，$v6，$v10，0x0；

vmtxinvmulh $v15，$v6，$v10，0x20；

vmtxinvmulh $v16，$v4，$v9，0x0；

vmtxinvmulh $v17，$v4，$v9，0x20；

vmtxinvmulh $v18，$v6，$v11，0x0；

vmtxinvmulh $v19，$v6，$v11，0x20；

vmtxinvmulh $v20，$v5，$v8，0x0；

vmtxinvmulh $v21，$v5，$v8，0x20；

vmtxinvmulh $v22，$v7，$v10，0x0；

vmtxinvmulh $v23，$v7，$v10，0x20；

vmtxinvmulh $v24，$v5，$v9，0x0；

vmtxinvmulh $v25，$v5，$v9，0x20；

vmtxinvmulh $v26，$v7，$v11，0x0；

vmtxinvmulh $v27，$v7，$v11，0x20；

9：

vaddw $v8，$v12，$v14

vaddw $v9，$v13，$v15

vaddw $v10，$v16，$v18

vaddw $v11，$v17，$v19

vaddw $v12，$v20，$v22

vaddw $v13，$v21，$v23

vaddw $v14，$v24，$v26

vaddw $v15，$v25，$v27

10：

load $v16，{64(32bits)<repeat 8 times>}；

11：

vaddw $v8，$v8，$v16

vaddw $v9，$v9，$v16

vaddw $v10，$v10，$v16

vaddw $v11，$v11，$v16

vaddw $v12，$v12，$v16

vaddw $v13，$v13，$v16

vaddw $v14，$v14，$v16

vaddw $v15，$v15，$v16

12：

vsrawi $v8，$v8，7

vsrawi $v9，$v9，7

vsrawi $v10，$v10，7

vsrawi $v11，$v11，7

vsrawi $v12，$v12，7

vsrawi $v13，$v13，7

vsrawi $v14，$v14，7

vsrawi $v15，$v15，7

convert32to16 $v8，$v8，$v9

convert32to16 $v9，$v10，$v11

convert32to16 $v10，$v12，$v13

convert32to16 $v11，$v14，$v15

Example 5:

for a general matrix multiplication process, obtaining a 16-bit truncated matrix multiplication value may be implemented by the following procedure:

load $v0，M1；

load $v1，M2；

vmtxmulh $v2，$v0，$v1，0x40

//imm8[6]＝1，imm8[5]＝0；imm8[4:0]＝0

The multimedia transform multiplier and its processing method greatly improve the speed at which a processor encodes and decodes streaming multimedia. For the 4x4 inverse transform of the VC-1 format described above, the assembly produced by GCC with -O2 optimization shows that the core loop requires roughly 110 instructions, whereas the method of the present invention needs only a few instructions. As for execution time, the core loop is executed four times in the general method, which takes about 450 cycles under ideal fully pipelined conditions, whereas the present method takes on the order of ten to a few tens of cycles under the same conditions. In practice neither method can achieve ideal full pipelining, so even when memory access latency and similar effects are taken into account, the present method still achieves a very large speedup. For other formats, the results are essentially the same as in this example because the inverse transforms are similar in form.

Finally, it should be noted that it is obvious that various changes and modifications can be made to the present invention by those skilled in the art without departing from the spirit and scope of the present invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

1. A multimedia transform multiplier is characterized by comprising a matrix multiplication module and an operation control module;

the operation control module is used for reading the operation control parameter values and controlling the data of the intermediate result matrix to perform operation according to the operation control parameter values to obtain the data of the result matrix;

the data of the first matrix is data of a coefficient matrix of multimedia transformation operation; the data of the second matrix is data of a transformation matrix for multimedia transformation operation;

or,

the data of the first matrix is the data of a transposed matrix of a coefficient matrix of the multiplier which carries out multimedia transformation operation last time; and the data of the second matrix is the data of a result matrix obtained after the last operation of the operation control module of the multiplier.

2. The multimedia transform multiplier of claim 1, further comprising a parameter loading module for loading data of the first matrix, data of the second matrix, and operation control parameter values.

3. The multimedia transform multiplier of claim 1, further comprising a transposer for transposing the coefficient matrix of the multimedia transform operation to obtain data of the first matrix.

4. The multimedia transform multiplier device according to any of claims 1 to 3, wherein the operation control parameter values comprise an operation mode parameter value and an operation parameter value;

5. The multimedia transform multiplier of claim 4 wherein the decision module comprises a bit precision bit and an operation mode bit; the judging module determines an operation mode according to the read operation mode parameter value through a bit precision bit and an operation mode bit; the digit precision bit and the operation mode bit are respectively expressed by binary numbers.

6. The multimedia transform multiplier of claim 5, wherein the bit precision bits are bit precision bits requiring more than 16 bits or less than 16 bits for the data precision significant bits of the intermediate matrix;

7. The multimedia transform multiplier of claim 4, wherein the operation module comprises operation control bits, the operation control bits representing the number of shift bits.

8. The multimedia transform multiplier of any of claims 1 to 3, further comprising a first storage module, a second storage module, a third storage module and a fourth storage module; wherein:

the first storage module is used for storing the data of the first matrix;

the second storage module is used for storing the data of the second matrix;

9. A method for processing a multimedia transform multiplier, comprising the steps of:

b, controlling the data of the intermediate result matrix to carry out operation according to the loaded operation control parameter values to obtain the data of the result matrix;

the data of the first matrix is data of a coefficient matrix of multimedia transformation operation, and the data of the second matrix is data of a transformation matrix of multimedia transformation operation;

or,

the first matrix is data of a transposed matrix of a coefficient matrix of the multiplier which carries out multimedia transformation operation last time; and the data of the second matrix is the data of a result matrix obtained after the last generation operation of the operation control module of the multiplier.

10. The method of claim 9, wherein said step a is preceded by the steps of:

11. The method of claim 9, wherein step a is preceded by the steps of:

12. The processing method of the multimedia transform multiplier of claim 9, wherein the operation control parameter values comprise operation mode parameter values and operation parameter values, and the step B comprises the steps of: