CN116755663A

CN116755663A - High-level synthesis HLS library

Info

Publication number: CN116755663A
Application number: CN202310650897.1A
Authority: CN
Inventors: 张川; 赵舞穹; 郭祉辰; 陈宣伯; 计奥; 李昌瀚; 冀贞昊; 尤优; 葛荧萌; 黄永明; 尤肖虎
Original assignee: Network Communication and Security Zijinshan Laboratory
Current assignee: Network Communication and Security Zijinshan Laboratory
Priority date: 2023-06-02
Filing date: 2023-06-02
Publication date: 2023-09-15

Abstract

The invention provides a high-level synthesis (HLS) library for the field of digital signal processing, comprising a MatView module, a Mat module and a matrix operation implementation module. The MatView module is implemented based on a MatView class and determines the matrix storage address, the matrix reading mode and the matrix operation type of a digital signal processing matrix; it sends the storage address and the reading mode to the Mat module, and the operation type to the matrix operation implementation module. The Mat module is implemented based on a Mat class and stores matrices together with their matrix types; based on the storage address and the reading mode, it determines the digital signal processing matrix to be operated on and its matrix type, and sends both to the matrix operation implementation module. The matrix operation implementation module performs the matrix operation according to the matrix to be operated on, its matrix type and the matrix operation type, thereby avoiding wasted hardware resources.

Description

High-level synthesis HLS library

Technical Field

The invention relates to the technical field of digital signal processing, in particular to a high-level synthesis HLS library.

Background

For decades, efficient hardware designs for digital signal processing (e.g., baseband signal processing) have been pursued. However, traditional design workflows based on hardware description languages (HDLs) can be very time-consuming. To address this problem, high-level synthesis (HLS) has in recent years been used to assist hardware design, allowing designs written in other programming languages (e.g., C++) to be translated into hardware.

Existing HLS libraries represent all matrix operations in a uniform way, regardless of the structure of the matrices involved, which wastes a great deal of hardware resources.

Disclosure of Invention

To address these problems in the prior art, the invention provides a high-level synthesis HLS library.

In a first aspect, the present invention provides a high-level synthesis HLS library for use in the field of digital signal processing, comprising:

a MatView module, a Mat module and a matrix operation realization module;

the MatView module is implemented based on a MatView class and is used for determining the matrix storage address, the matrix reading mode and the matrix operation type of a digital signal processing matrix, sending the matrix storage address and the matrix reading mode to the Mat module, and sending the matrix operation type to the matrix operation realization module;

the Mat module is implemented based on a Mat class and is used for storing matrices and their matrix types, and for determining, based on the matrix storage address and the matrix reading mode sent by the MatView module, the digital signal processing matrix to be operated on and its matrix type, both of which are sent to the matrix operation realization module;

the matrix operation realization module is used for executing the matrix operation based on the to-be-operated digital signal processing matrix and its matrix type sent by the Mat module, and the matrix operation type sent by the MatView module;

The digital signal processing matrix comprises at least one of a received signal matrix, a channel matrix and an intermediate computation variable matrix arising in the digital signal processing.

Optionally, performing the matrix operation based on the to-be-operated digital signal processing matrix and its matrix type sent by the Mat module, and the matrix operation type sent by the MatView module, includes:

when the matrix operation type is matrix addition, determining a plurality of first position indexes, a first position index being a position at which the elements of both matrices to be operated on are non-zero;

feeding the two elements at the same first position index of the two matrices into the same adder to obtain the element of the result matrix at that first position index; the element of the result matrix at each second position index is obtained from the elements at the corresponding position of the two matrices, a second position index being a position at which at least one of the two elements is zero;

when the matrix operation type is matrix multiplication and exactly one of the two matrices to be operated on is a scalar matrix, determining third position indexes corresponding to the non-zero elements of the non-scalar matrix, and feeding each element of the non-scalar matrix at a third position index, together with the scalar of the scalar matrix, into a multiplier, to obtain the element at each third position index of the result matrix of the matrix multiplication; or

when the matrix operation type is matrix multiplication and both matrices to be operated on are scalar matrices, feeding the scalars of the two scalar matrices into a multiplier and assigning the multiplier output to each element on the main diagonal of the result matrix of the matrix multiplication;

when the matrix operation type is matrix multiplication and both matrices are diagonal matrices, feeding the main-diagonal elements with the same position index in the two diagonal matrices into the same multiplier, to obtain the main-diagonal elements of the result matrix of the matrix multiplication;

when the matrix operation type is matrix multiplication, one of the two matrices is an upper triangular, lower triangular, strictly upper triangular or strictly lower triangular matrix, and the other is not a scalar matrix, determining the values of r, c and i for the m-th execution of the key statement from a lookup table that corresponds to the multiplication of these two matrices and records the relation between m and r, c, i; feeding the element at row r, column i of the left multiplication matrix and the element at row i, column c of the right multiplication matrix into the same multiplier; and feeding the multiplier outputs of all key-statement executions that share the same r and c into the same adder, to obtain the element at row r, column c of the result matrix of the matrix multiplication;

The key statement is Mat(r, c) += MatLeft(r, i) × MatRight(i, c), where Mat(r, c) is the element at row r, column c of the result matrix of the matrix multiplication, MatLeft(r, i) is the element at row r, column i of the left multiplication matrix, and MatRight(i, c) is the element at row i, column c of the right multiplication matrix; m is an integer with 1 ≤ m ≤ N, where N is determined by the numbers of rows and columns of the two matrices; r, c and i are integers with 1 ≤ r, c, i ≤ M; and both matrices are M×M.

when the matrix operation type is matrix multiplication, the right multiplication matrix is a diagonal matrix and the left multiplication matrix is not a first-type matrix, feeding the element at row s, column j of the left multiplication matrix and the element at row j, column j of the diagonal matrix into the same multiplier, to obtain the element at row s, column j of the result matrix of the matrix multiplication, where s and j are integers with 1 ≤ s, j ≤ O and both matrices are O×O; or

when the matrix operation type is matrix multiplication, the left multiplication matrix is a diagonal matrix and the right multiplication matrix is not a first-type matrix, feeding the element at row k, column k of the diagonal matrix and the element at row k, column t of the right multiplication matrix into the same multiplier, to obtain the element at row k, column t of the result matrix of the matrix multiplication, where k and t are integers with 1 ≤ k, t ≤ P and both matrices are P×P;

the first type of matrix includes a scalar matrix, a diagonal matrix, an upper triangular matrix, a lower triangular matrix, a strictly upper triangular matrix, and a strictly lower triangular matrix.

Optionally, the matrix reading mode includes:

at least one of: matrix read-only, matrix inverse read-only, matrix transpose read-only, matrix diagonal read-only as a column vector, matrix diagonal read-only as a row vector, matrix off-diagonal read-only, matrix single-row selection read-only, matrix single-column selection read-only, matrix multi-row selection read-only and matrix multi-column selection read-only operations.

Optionally, the matrix operation type includes:

at least one of matrix addition operation, matrix multiplication operation, matrix inversion operation, matrix transposition operation, matrix diagonal operation, matrix row operation and matrix column operation.

Optionally, the matrix operation implementation module implements matrix multiplication operations based on systolic arrays.
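The text only states that matrix multiplication is built on a systolic array. As a hedged illustration, and not necessarily the organization shown in FIG. 3, the following cycle-level C++ model assumes a common output-stationary M×M array: each processing element (PE) keeps one element of the product, rows of the left matrix enter from the left edge and columns of the right matrix from the top edge, each skewed by one cycle per row or column. The names Square and systolicMultiply are hypothetical.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

// Cycle-by-cycle model of an M x M output-stationary systolic array computing A * B.
// PE(r, c) accumulates C(r, c); A values move right, B values move down, one PE per cycle.
template <typename T, std::size_t M>
Square<T, M> systolicMultiply(const Square<T, M>& A, const Square<T, M>& B) {
    Square<T, M> C{}, aReg{}, bReg{};
    const int n = static_cast<int>(M);
    for (int t = 0; t < 3 * n - 2; ++t) {  // last useful cycle is t = 3M - 3
        Square<T, M> aNext{}, bNext{};
        for (int r = 0; r < n; ++r) {
            for (int c = 0; c < n; ++c) {
                // Left edge: row r of A enters skewed by r cycles; elsewhere take the left neighbour.
                if (c == 0) {
                    const int k = t - r;
                    aNext[r][c] = (k >= 0 && k < n) ? A[r][k] : T{};
                } else {
                    aNext[r][c] = aReg[r][c - 1];
                }
                // Top edge: column c of B enters skewed by c cycles; elsewhere take the upper neighbour.
                if (r == 0) {
                    const int k = t - c;
                    bNext[r][c] = (k >= 0 && k < n) ? B[k][c] : T{};
                } else {
                    bNext[r][c] = bReg[r - 1][c];
                }
            }
        }
        aReg = aNext;  // all PE registers update on the same clock edge
        bReg = bNext;
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c) C[r][c] += aReg[r][c] * bReg[r][c];  // one MAC per PE
    }
    return C;
}
```

Because of the skew, PE(r, c) sees A(r, k) and B(k, c) together at cycle r + c + k, so the M² PEs finish all multiply-accumulates within 3M - 2 cycles; zero padding outside the valid range contributes nothing to the sums.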

Optionally, the HLS library further comprises:

a sorting module, configured to determine the indices of the n largest or the n smallest elements in a matrix, where n is an integer greater than or equal to 1.
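FIG. 4 refers to a sorting network, but its exact topology is not given in this section. As a hedged stand-in, the sketch below uses odd-even transposition sort, a simple sorting network whose compare-exchange schedule does not depend on the data (L rounds suffice for L inputs), and returns the indices of the n largest or smallest values. The function name topNIndices and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sort indices with an odd-even transposition network, then keep the first n.
// largest == true ranks by descending value, false by ascending value.
template <typename T, std::size_t L>
std::vector<std::size_t> topNIndices(const std::array<T, L>& v, std::size_t n, bool largest) {
    std::array<std::size_t, L> idx;
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    for (std::size_t round = 0; round < L; ++round) {
        // Even rounds compare pairs (0,1),(2,3),...; odd rounds compare (1,2),(3,4),...
        for (std::size_t j = round % 2; j + 1 < L; j += 2) {
            const bool outOfOrder = largest ? v[idx[j]] < v[idx[j + 1]]
                                            : v[idx[j]] > v[idx[j + 1]];
            if (outOfOrder) std::swap(idx[j], idx[j + 1]);  // one compare-exchange unit
        }
    }
    return std::vector<std::size_t>(idx.begin(),
                                    idx.begin() + static_cast<std::ptrdiff_t>(std::min(n, L)));
}
```

In hardware, every compare-exchange within a round is independent, so a round maps to parallel comparators and the whole network to a fixed pipeline of L stages.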

In the high-level synthesis HLS library provided by the invention, the digital signal processing matrix to be operated on is determined from the matrix storage address and the matrix reading mode, and the matrix operation matching the matrix operation type is executed on it according to its matrix type, thereby avoiding wasted hardware resources.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a first schematic structural diagram of the high-level synthesis HLS library provided by the present invention;

FIG. 2 is a second schematic structural diagram of the high-level synthesis HLS library provided by the present invention;

FIG. 3 is a schematic diagram of a systolic array provided by the present invention;

FIG. 4 is a schematic diagram of the sorting flow of the sorting network provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a schematic structural diagram of the high-level synthesis HLS library provided by the present invention. As shown in Fig. 1, the HLS library includes:

a MatView module 100, a Mat module 110, and a matrix operation implementation module 120;

the MatView module 100 is realized based on MatView class, and is used for determining a matrix storage address, a matrix reading mode and a matrix operation type of the digital signal processing matrix, sending the matrix storage address and the matrix reading mode to the Mat module, and sending the matrix operation type to the matrix operation realization module;

The Mat module 110 is realized based on Mat class and is used for storing matrixes and matrix types, and determining a to-be-operated digital signal processing matrix and matrix types of the to-be-operated digital signal processing matrix sent to the matrix operation realization module based on a matrix storage address and a matrix reading mode sent by the MatView module;

the matrix operation implementation module 120 is configured to perform matrix operations based on the to-be-operated digital signal processing matrix and its matrix type sent by the Mat module, and the matrix operation type sent by the MatView module;

Specifically, the MatView module may be implemented with a C++ MatView class, and the Mat module with a C++ Mat class.

When a linear algebra operation is required, the MatView module first determines the matrix storage address, the matrix reading mode and the matrix operation type of the digital signal processing matrix to be read. It then sends the storage address and the reading mode to the Mat module, and the operation type to the matrix operation implementation module.

The Mat module stores the actual values of multiple matrices together with their matrix types. On receiving the matrix storage address and the matrix reading mode from the MatView module, it determines the values of all elements of the digital signal processing matrix to be operated on, together with that matrix's type, and sends both to the matrix operation implementation module.

On receiving these element values and the matrix type, the matrix operation implementation module performs the matrix operation on the digital signal processing matrix according to the rule that corresponds to the matrix operation type sent by the MatView module, and determines and stores the result of the operation together with the matrix type of the result.
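To make the division of labour between the modules concrete, the following is a minimal C++ sketch under stated assumptions, not the library's actual code. The enums MatType and ReadMode, the members data, src, mode and at(), and the add() dispatch are all hypothetical; only two reading modes are modelled, and a view simply holds a pointer to the Mat it reads, standing in for the "matrix storage address".

```cpp
#include <array>
#include <cstddef>

// Hypothetical structural tags for the matrix type stored by the Mat module.
enum class MatType { General, Scalar, Diagonal, Upper, Lower, StrictUpper, StrictLower };

// Hypothetical reading modes: a small subset of those listed for MatView.
enum class ReadMode { Normal, Transpose };

// Mat: owns the element storage plus the matrix type tag.
template <typename T, std::size_t M>
struct Mat {
    std::array<std::array<T, M>, M> data{};
    MatType type = MatType::General;
};

// MatView: refers to a Mat and a reading mode; no element is copied.
template <typename T, std::size_t M>
struct MatView {
    const Mat<T, M>* src;
    ReadMode mode;

    T at(std::size_t r, std::size_t c) const {  // 0-based read through the view
        return mode == ReadMode::Transpose ? src->data[c][r] : src->data[r][c];
    }

    MatType type() const {  // matrix type as seen through this view
        if (mode != ReadMode::Transpose) return src->type;
        switch (src->type) {
            case MatType::Upper: return MatType::Lower;
            case MatType::Lower: return MatType::Upper;
            case MatType::StrictUpper: return MatType::StrictLower;
            case MatType::StrictLower: return MatType::StrictUpper;
            default: return src->type;  // general, scalar and diagonal are unchanged
        }
    }
};

// Operation module: addition whose loop structure depends on the operand types.
template <typename T, std::size_t M>
Mat<T, M> add(const MatView<T, M>& x, const MatView<T, M>& y) {
    Mat<T, M> out;
    const auto isDiag = [](MatType t) { return t == MatType::Diagonal || t == MatType::Scalar; };
    if (isDiag(x.type()) && isDiag(y.type())) {
        for (std::size_t k = 0; k < M; ++k) out.data[k][k] = x.at(k, k) + y.at(k, k);  // M adds
        out.type = MatType::Diagonal;
    } else {
        for (std::size_t r = 0; r < M; ++r)
            for (std::size_t c = 0; c < M; ++c) out.data[r][c] = x.at(r, c) + y.at(r, c);
    }
    return out;
}
```

The point of the split is that the view decides how elements are read and what type they form, so the operation module can pick a cheaper loop (here M additions instead of M² for two diagonal operands) without copying any data.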

The digital signal processing matrix may comprise the received signal matrix, the channel matrix and intermediate computation variable matrices arising in the digital signal processing. An intermediate computation variable matrix is a matrix produced while operating on the received signal matrix and the channel matrix that has no direct physical meaning of its own.

When the HLS library is used to implement digital signal processing, the algorithm, described in terms of matrices, is written with the classes and functions of the library's C/C++ code; serial/parallel parameters are configured through pragmas (PRAGMA); register transfer level (RTL) code is generated by HLS; and a field programmable gate array (FPGA) design or other hardware circuit is built from the resulting RTL code to carry out the digital signal processing.
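As a hedged illustration of configuring serial/parallel parameters through pragmas, the sketch below assumes AMD (Xilinx) Vitis HLS as the tool, whose `#pragma HLS UNROLL` and `#pragma HLS PIPELINE II=1` directives request a fully parallel loop and a pipelined loop that starts one iteration per clock cycle, respectively. The function names and the size kDiagLen are hypothetical. An ordinary C++ compiler ignores these pragmas (at most warning about them), so the same source can be tested in software before synthesis; the parallelism actually achieved in hardware also depends on how the arrays are mapped to memory ports.

```cpp
#include <cstddef>

constexpr std::size_t kDiagLen = 4;  // hypothetical diagonal length

// Parallel configuration: UNROLL asks the tool to replicate the loop body,
// i.e. one multiplier per diagonal element.
void diagMulParallel(const int d1[kDiagLen], const int d2[kDiagLen], int out[kDiagLen]) {
    for (std::size_t k = 0; k < kDiagLen; ++k) {
#pragma HLS UNROLL
        out[k] = d1[k] * d2[k];
    }
}

// Serial configuration: PIPELINE II=1 keeps a single pipelined datapath that
// accepts a new iteration every clock cycle.
void diagMulSerial(const int d1[kDiagLen], const int d2[kDiagLen], int out[kDiagLen]) {
    for (std::size_t k = 0; k < kDiagLen; ++k) {
#pragma HLS PIPELINE II=1
        out[k] = d1[k] * d2[k];
    }
}
```

Both functions compute the same values; only the pragma changes the area/latency trade-off the tool is asked to make.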

Optionally, performing the matrix operation based on the to-be-operated digital signal processing matrix and its matrix type sent by the Mat module, and the matrix operation type sent by the MatView module, includes:

under the condition that the matrix operation type is matrix addition operation, determining a plurality of first position indexes, wherein two elements corresponding to the same first position index in two matrices to be operated are non-zero elements;

feeding the two elements at the same first position index of the two matrices to be operated on into the same adder to obtain the element of the result matrix at each first position index, and obtaining the element of the result matrix at each second position index from the elements at the corresponding position of the two matrices, a second position index being a position at which at least one of the two elements is zero.

Specifically, when the matrix operation implementation module determines that the matrix operation type is matrix addition, it may take as first position indexes the positions at which the elements of both matrices to be operated on are non-zero, and as second position indexes the positions at which at least one of the two elements is zero.

After the first position indexes are determined, the two elements at each first position index are fed into the same adder, and the adder output is used as the element of the result matrix at that position. The element pairs may be fed sequentially into a single adder, or several pairs may be fed simultaneously into several adders.

For the second position indexes: when exactly one of the two elements at a second position index is zero, the non-zero element is assigned to the result matrix at that position; when both elements are zero, the result element at that position is 0.

With matrix addition performed in this way, additions are carried out only where both elements are non-zero, which reduces the number of loop iterations and saves hardware resources.
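The addition rule above can be sketched in C++ as follows. In hardware the zero pattern would be known from the matrix types, so the choice between adding and copying would be fixed at synthesis time; this software sketch tests for zeros at run time only to make the two kinds of position index explicit. The names Square, sparseAwareAdd and adderUses are hypothetical.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

// Element-wise sum that uses an adder only at first position indexes (both inputs
// non-zero); at second position indexes the result is copied from the inputs.
// adderUses reports how many additions were actually performed.
template <typename T, std::size_t M>
Square<T, M> sparseAwareAdd(const Square<T, M>& x, const Square<T, M>& y, int& adderUses) {
    Square<T, M> out{};
    adderUses = 0;
    for (std::size_t r = 0; r < M; ++r) {
        for (std::size_t c = 0; c < M; ++c) {
            const T a = x[r][c];
            const T b = y[r][c];
            if (a != T{} && b != T{}) {
                out[r][c] = a + b;  // first position index: needs an adder
                ++adderUses;
            } else {
                out[r][c] = (a != T{}) ? a : b;  // second position index: copy (or 0)
            }
        }
    }
    return out;
}
```

For example, adding [[1, 0], [0, 2]] and [[3, 4], [0, 0]] needs only one addition, at position (1, 1) in 1-based indexing; the other three results are copies.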

when the matrix operation type is matrix multiplication and exactly one of the two matrices to be operated on is a scalar matrix, determining third position indexes, i.e., the positions of the non-zero elements of the non-scalar matrix, and feeding each element of the non-scalar matrix at a third position index, together with the scalar of the scalar matrix, into a multiplier, to obtain the element of the result matrix at each third position index; or

when the matrix operation type is matrix multiplication and both matrices to be operated on are scalar matrices, feeding the scalars of the two scalar matrices into a multiplier and assigning the multiplier output to every element on the main diagonal of the result matrix.

Specifically, when the matrix operation implementation module determines that the matrix operation type is matrix multiplication and exactly one of the two matrices is a scalar matrix, it takes the position indexes of the non-zero elements of the non-scalar matrix as third position indexes, feeds each element at a third position index into a multiplier together with the scalar of the scalar matrix, and uses the multiplier output as the element of the result matrix at that position. The element pairs may be fed sequentially into a single multiplier, or several pairs may be fed simultaneously into several multipliers.

When the matrix operation implementation module determines that both matrices to be operated on are scalar matrices, it feeds their two scalars into a multiplier and assigns the multiplier output to every element on the main diagonal of the result matrix. A single multiplication thus suffices to obtain the product of the two scalars.

Handling the multiplication of these special matrices with dedicated steps reduces the number of loop iterations and avoids wasting hardware resources.
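The two scalar-matrix cases can be sketched as follows, using the identities (sI)A = sA and (s_a I)(s_b I) = (s_a s_b) I. The names Square, scalarTimesMatrix, scalarTimesScalar and mulUses are hypothetical, and the run-time zero test again stands in for a pattern that hardware would know from the matrix type.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

// Scalar matrix sI times a non-scalar matrix A: one multiplication per non-zero
// element of A (the third position indexes); zero elements stay zero.
template <typename T, std::size_t M>
Square<T, M> scalarTimesMatrix(T s, const Square<T, M>& a, int& mulUses) {
    Square<T, M> out{};
    mulUses = 0;
    for (std::size_t r = 0; r < M; ++r)
        for (std::size_t c = 0; c < M; ++c)
            if (a[r][c] != T{}) {
                out[r][c] = s * a[r][c];
                ++mulUses;
            }
    return out;
}

// (sa I)(sb I) = (sa * sb) I: a single multiplication, broadcast to the main diagonal.
template <typename T, std::size_t M>
Square<T, M> scalarTimesScalar(T sa, T sb) {
    const T p = sa * sb;
    Square<T, M> out{};
    for (std::size_t k = 0; k < M; ++k) out[k][k] = p;
    return out;
}
```

A dense product would spend M³ multiplications in both cases; here the cost drops to the number of non-zero elements of A, or to exactly one.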

when the matrix operation type is matrix multiplication and both matrices are diagonal matrices, feeding the main-diagonal elements with the same position index in the two diagonal matrices into the same multiplier, to obtain the main-diagonal elements of the result matrix.

Specifically, when the matrix operation implementation module determines that the matrix operation type is matrix multiplication and both matrices to be operated on are diagonal matrices, the main-diagonal elements of the two matrices at the same position index are fed into the same multiplier, and the multiplier output is used as the main-diagonal element at that position index in the result matrix. The element pairs may be fed sequentially into a single multiplier, or several pairs may be fed simultaneously into several multipliers.

For example, the elements at row 1, column 1 of the two diagonal matrices are fed into a multiplier to obtain the element at row 1, column 1 of the result matrix; the elements at row 2, column 2 are fed into a multiplier to obtain the element at row 2, column 2 of the result matrix; and so on.
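A minimal sketch of the diagonal-times-diagonal rule, assuming both operands are stored as full M×M arrays (a compact diagonal-only storage would work equally well). The names Square and diagTimesDiag are hypothetical.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

// Product of two diagonal matrices: only the M main-diagonal positions are computed,
// one multiplication each; all off-diagonal entries of the result stay zero.
template <typename T, std::size_t M>
Square<T, M> diagTimesDiag(const Square<T, M>& d1, const Square<T, M>& d2) {
    Square<T, M> out{};
    for (std::size_t k = 0; k < M; ++k) out[k][k] = d1[k][k] * d2[k][k];
    return out;
}
```

This needs M multipliers (or M uses of one multiplier) instead of the M³ multiplications of a dense product.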

when the matrix operation type is matrix multiplication, one of the two matrices is an upper triangular, lower triangular, strictly upper triangular or strictly lower triangular matrix, and the other is not a scalar matrix, determining the values of r, c and i for the m-th execution of the key statement from a lookup table that corresponds to the multiplication of these two matrices and records the relation between m and r, c, i; feeding the element at row r, column i of the left multiplication matrix and the element at row i, column c of the right multiplication matrix into the same multiplier; and feeding the multiplier outputs of all key-statement executions that share the same r and c into the same adder, to obtain the element at row r, column c of the result matrix;

The key statement is Mat(r, c) += MatLeft(r, i) × MatRight(i, c), where Mat(r, c) is the element at row r, column c of the result matrix, MatLeft(r, i) is the element at row r, column i of the left multiplication matrix, and MatRight(i, c) is the element at row i, column c of the right multiplication matrix; m is an integer with 1 ≤ m ≤ N, where N is determined by the numbers of rows of the two matrices; r, c and i are integers with 1 ≤ r, c, i ≤ M; and both matrices are M×M.

Specifically, when the matrix operation implementation module determines that the matrix operation type is matrix multiplication, one of the two matrices is an upper triangular, lower triangular, strictly upper triangular or strictly lower triangular matrix, and the other is not a scalar matrix, the values of r, c and i for the m-th execution of the key statement can be read from a lookup table that records, for the multiplication of these two matrices, the relation between m and r, c, i. Each combination of matrix types has its own lookup table, e.g., upper triangular matrix × diagonal matrix, lower triangular matrix × symmetric matrix, and so on.

Once r, c and i for the m-th execution of the key statement have been read from the corresponding lookup table, the element at row r, column i of the left multiplication matrix and the element at row i, column c of the right multiplication matrix are fed into the same multiplier to obtain the corresponding product. The products of all key-statement executions that share the same r and c are then fed into the same adder, and the adder output is used as the element at row r, column c of the result matrix.

The element pairs may be fed sequentially into a single multiplier, or several pairs may be fed simultaneously into several multipliers. Likewise, the products may be fed sequentially into a single adder, or several groups of products may be fed simultaneously into several adders.
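A hedged sketch of the lookup-table approach for one case, an upper triangular left matrix times a general right matrix. Here the table is simply generated by enumerating the (r, c, i) triples whose left factor MatLeft(r, i) is structurally non-zero (i ≥ r, 0-based); the patent builds its tables with the operators defined later in this section, which this sketch does not reproduce. The names Triple, buildUpperTimesGeneralLut and multiplyWithLut are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

struct Triple { int r, c, i; };  // 0-based indices for one key-statement execution

// Entry m of the table holds (r, c, i) for the m-th execution; only products whose
// left factor can be non-zero (i >= r for an upper triangular left matrix) are listed.
template <std::size_t M>
std::vector<Triple> buildUpperTimesGeneralLut() {
    std::vector<Triple> lut;
    for (int r = 0; r < static_cast<int>(M); ++r)
        for (int c = 0; c < static_cast<int>(M); ++c)
            for (int i = r; i < static_cast<int>(M); ++i) lut.push_back({r, c, i});
    return lut;
}

// Execute the key statement Mat(r, c) += MatLeft(r, i) * MatRight(i, c) once per entry.
template <typename T, std::size_t M>
Square<T, M> multiplyWithLut(const Square<T, M>& left, const Square<T, M>& right,
                             const std::vector<Triple>& lut) {
    Square<T, M> out{};
    for (const Triple& t : lut) out[t.r][t.c] += left[t.r][t.i] * right[t.i][t.c];
    return out;
}
```

For M = 3 the table has 18 entries instead of the 27 of a dense product. In a synthesizable version the table would typically be a constant array rather than a std::vector, so the loop trip count is fixed at compile time.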

when the matrix operation type is matrix multiplication, the right multiplication matrix is a diagonal matrix and the left multiplication matrix is not a first-type matrix, feeding the element at row s, column j of the left multiplication matrix and the element at row j, column j of the diagonal matrix into the same multiplier, to obtain the element at row s, column j of the result matrix, where s and j are integers with 1 ≤ s, j ≤ O and both matrices are O×O; or

when the matrix operation type is matrix multiplication, the left multiplication matrix is a diagonal matrix and the right multiplication matrix is not a first-type matrix, feeding the element at row k, column k of the diagonal matrix and the element at row k, column t of the right multiplication matrix into the same multiplier, to obtain the element at row k, column t of the result matrix, where k and t are integers with 1 ≤ k, t ≤ P and both matrices are P×P;

The first type of matrix includes scalar matrices, diagonal matrices, upper triangular matrices, lower triangular matrices, strictly upper triangular matrices, and strictly lower triangular matrices.

Specifically, a first type of matrix may be determined prior to performing the matrix multiplication operation, the first type of matrix including a scalar matrix, a diagonal matrix, an upper triangular matrix, a lower triangular matrix, a strictly upper triangular matrix, and a strictly lower triangular matrix.

When the matrix operation implementation module determines that the matrix operation type is matrix multiplication, the right multiplication matrix is a diagonal matrix and the left multiplication matrix is not a first-type matrix, the element at row s, column j of the left multiplication matrix and the element at row j, column j of the diagonal matrix are fed into the same multiplier, and the multiplier output is used as the element at row s, column j of the result matrix.

When the matrix operation implementation module determines that the matrix operation type is matrix multiplication, the left multiplication matrix is a diagonal matrix and the right multiplication matrix is not a first-type matrix, the element at row k, column k of the diagonal matrix and the element at row k, column t of the right multiplication matrix are fed into the same multiplier, and the multiplier output is used as the element at row k, column t of the result matrix.

The element pairs may be fed sequentially into a single multiplier, or several pairs may be fed simultaneously into several multipliers.
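The two diagonal cases amount to column scaling (A·D) and row scaling (D·B), each needing one multiplication per result element. A minimal sketch, with hypothetical names Square, timesDiagonalRight and diagonalTimes, and 0-based indices s, j, k, t as in the text above:

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t M>
using Square = std::array<std::array<T, M>, M>;

// A * D with D diagonal: result(s, j) = A(s, j) * D(j, j), i.e. column j is scaled by D(j, j).
template <typename T, std::size_t M>
Square<T, M> timesDiagonalRight(const Square<T, M>& a, const Square<T, M>& d) {
    Square<T, M> out{};
    for (std::size_t s = 0; s < M; ++s)
        for (std::size_t j = 0; j < M; ++j) out[s][j] = a[s][j] * d[j][j];
    return out;
}

// D * B with D diagonal: result(k, t) = D(k, k) * B(k, t), i.e. row k is scaled by D(k, k).
template <typename T, std::size_t M>
Square<T, M> diagonalTimes(const Square<T, M>& d, const Square<T, M>& b) {
    Square<T, M> out{};
    for (std::size_t k = 0; k < M; ++k)
        for (std::size_t t = 0; t < M; ++t) out[k][t] = d[k][k] * b[k][t];
    return out;
}
```

Each case uses M² multiplications and no additions, compared with M³ multiplications and M²(M - 1) additions for a dense product.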

Optionally, the lookup table for representing the relation between m and r, c and i is constructed based on at least one of a first operator, a second operator, a third operator and a fourth operator;

the first operator performs the following operation: $q = A \mathbin{\#} B[a, b]$, where # denotes the first operator and q the operation result; A is an integer greater than or equal to 1; B is a function of i; and [a, b] indicates that i steps from a to b one value at a time. The initial value of q is 0. As i steps through its values, A is compared with B: whenever A is greater than or equal to B, q is incremented by 1, A is updated by subtracting the current value of B, and B is updated using the next value of i. The comparison of A with B and the updating of q, A and B are repeated until A is less than B, which gives the final value of q. Here a and b are integers greater than or equal to 1, and a is less than b;

the second operator performs the following operation: $\mathrm{Re} = A \mathbin{!} B[a, b]$, where ! denotes the second operator and Re the operation result; A is an integer greater than or equal to 1; B is a function of i; and [a, b] indicates that i steps from a to b one value at a time. As i steps through its values, A is compared with B: whenever A is greater than or equal to B, A is updated by subtracting the current value of B, and B is updated using the next value of i. The comparison and updates are repeated until A is less than B, and the final value of A is assigned to Re. Here a and b are integers greater than or equal to 1, and a is less than b;

the third operator performs the following operation: $d = C \mathbin{\text{'/'}} D$, where '/' denotes the third operator, d the operation result, and C and D are integers greater than or equal to 1. When C is exactly divisible by D (remainder 0), d is the integer quotient of C by D minus 1; otherwise d is the integer quotient of C by D;

the fourth operator performs the following operation: $e = C \mathbin{\text{'\%'}} D$, where '%' denotes the fourth operator, e the operation result, and C and D are integers greater than or equal to 1. When C is exactly divisible by D, e equals D; otherwise e is the remainder of C divided by D.
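The four operators can be written as small C++ helpers, implemented literally from the definitions above. The names SharpBang, sharpBang, slashOp and percentOp are hypothetical; # and ! are computed together because they perform the same subtraction sequence and differ only in which value they return.

```cpp
#include <cassert>
#include <functional>

// Result of one pass of the subtraction sequence: q for #, re for !.
struct SharpBang { int q; int re; };

// q = A # B(i)[a, b] and Re = A ! B(i)[a, b]: starting at i = a, subtract
// B(a), B(a+1), ... from A while A >= B(i); q counts the subtractions and
// the value left in A is Re.
inline SharpBang sharpBang(int A, const std::function<int(int)>& B, int a, int b) {
    int q = 0;
    for (int i = a; i <= b && A >= B(i); ++i) {
        A -= B(i);
        ++q;
    }
    return {q, A};
}

// C '/' D: integer quotient, reduced by 1 when the division is exact (C, D >= 1).
inline int slashOp(int C, int D) { return (C % D == 0) ? C / D - 1 : C / D; }

// C '%' D: remainder, except that an exact division yields D instead of 0.
inline int percentOp(int C, int D) { return (C % D == 0) ? D : C % D; }
```

For C ≥ 1, C '/' D equals (C − 1) / D and C '%' D equals (C − 1) % D + 1 in ordinary integer arithmetic, so the pair turns a 1-based position C into a 0-based block number and a 1-based offset inside a block of size D.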

Specifically, when one of the two matrices to be multiplied is an upper triangular, lower triangular, strictly upper triangular or strictly lower triangular matrix and the other is not a scalar matrix, the expression relating m to r, c and i for that multiplication may be determined with the newly defined operators, and a single-level for loop is used to perform the N key-statement operations. Because this expression lets the N key-statement executions be driven by one loop with a fixed trip count, the number of loop iterations is reduced and hardware resources are not wasted.

For example, when both matrices to be multiplied are n×n upper triangular matrices, the total number of loop executions is $N = \sum_{i=1}^{n} i(n+1-i) = \frac{n(n+1)(n+2)}{6}$. With the newly defined operators, $q = m \mathbin{\#} i(n+1-i)[1, n]$, $\mathrm{Re} = m \mathbin{!} i(n+1-i)[1, n]$, $b = \mathrm{Re} \mathbin{\text{'/'}} (q+1)$ and $a = \mathrm{Re} \mathbin{\text{'\%'}} (q+1)$, which gives the relation between m and r, c, i as $r = a$, $c = q + a$, $i = a + b + 1$; the key-statement operation is then performed according to this relation.

Optionally, the matrix reading mode includes:

Specifically, the matrix reading mode sent by the MatView module to the Mat module may include at least one of the following:

matrix read-only, matrix negation read-only, matrix transpose read-only, matrix diagonal read-only as a column vector, matrix diagonal read-only as a row vector, matrix off-diagonal read-only, matrix single-row read-only, matrix single-column read-only, matrix multi-row selection read-only and matrix multi-column selection read-only operations.

Reading the matrix through these different read-only modes avoids the resource waste caused by repeatedly copying the matrix when values are returned.

Optionally, the matrix operation type includes:

Specifically, the matrix operation type sent by the MatView module to the matrix operation implementation module may include at least one of the following operations:

matrix addition, matrix multiplication, matrix inversion, matrix transposition, diagonal extraction, row extraction and column extraction.

In one embodiment, the row- and column-extraction operations may take a specific row or column, take consecutive rows or columns, take discrete rows or columns specified by a container, or take a submatrix with consecutive or discrete row and column indices. The inversion, transposition, diagonal, row and column operations may also be performed as copying operations that produce the corresponding inverted, transposed, diagonal, row or column result. Defining these matrix operation types lets the HLS library support linear algebra operations better.

Specifically, the systolic array module may adopt the same structure as an existing systolic array. Using the systolic array for matrix multiplication provides systolic reading of the operands, which reduces the number of memory accesses.

Optionally, the HLS library further comprises:

Specifically, the HLS library provided by the present invention further includes a sorting module, which may be used to determine the indices of the n largest or n smallest elements in a matrix. The specific architecture of the sorting module is not limited herein.

In one embodiment, the sorting module may determine the indices of the n largest or n smallest elements in the matrix by heap sort on a multi-way tree.

For example, the indices of the n largest or n smallest elements in the matrix may be determined by heap sort on a 32-ary tree. When the indices of the 2 largest elements are needed, the elements are first sorted in pairs, the 2 larger elements of every group of 4 are then kept by merging, and this is repeated until the indices of the 2 largest elements of the matrix are obtained.

Determining these indices by heap sort on a multi-way tree makes full use of hardware parallelism, because the pairwise sorting and merging operations can be carried out with maximum parallelism.
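The pairwise-sort-then-merge step can be sketched as follows, assuming the 32 values are the keys of the 32 children of one heap node and that the goal is the index of the largest child (the index of the second largest comes out as well). Top2, merge2 and top2of32 are hypothetical names. Within each stage all comparisons are independent, which is what allows hardware to evaluate a stage in parallel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Indices of the largest and second-largest value of a group.
struct Top2 { std::size_t first, second; };

// Merge two sorted top-2 lists (4 candidates) and keep the two largest.
template <typename T, std::size_t K>
Top2 merge2(const std::array<T, K>& v, Top2 x, Top2 y) {
    if (v[x.first] >= v[y.first])
        return {x.first, v[x.second] >= v[y.first] ? x.second : y.first};
    return {y.first, v[y.second] >= v[x.first] ? y.second : x.first};
}

// Stage 1 sorts the 32 inputs in pairs; stages 2..5 merge 16 -> 8 -> 4 -> 2 -> 1
// top-2 lists, so the largest child is found in 5 stages.
template <typename T>
Top2 top2of32(const std::array<T, 32>& v) {
    std::array<Top2, 16> level{};
    for (std::size_t p = 0; p < 16; ++p) {                 // pairwise sort
        std::size_t a = 2 * p, b = 2 * p + 1;
        level[p] = v[a] >= v[b] ? Top2{a, b} : Top2{b, a};
    }
    for (std::size_t width = 16; width > 1; width /= 2)    // merge stages
        for (std::size_t p = 0; p < width / 2; ++p)
            level[p] = merge2(v, level[2 * p], level[2 * p + 1]);
    return level[0];
}
```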

The high-level synthesis HLS library provided by the present invention is illustrated below with a specific application scenario.

Fig. 2 is a second schematic structural diagram of the high-level synthesis HLS library provided by the present invention. As shown in Fig. 2, the HLS library includes a Mat class and a MatView class, and matrix operations are implemented on the basis of these two classes.

The present invention provides a flexible linear algebra hardware high-level synthesis library, FLAMES (Flexible Linear Algebra with Matrix-Empowered Synthesis).

FLAMES provides operation modules for linear algebra on matrices, such as matrix addition, matrix multiplication, systolic-array matrix multiplication, matrix inversion, matrix transposition, diagonal extraction, row extraction, column extraction and matrix maximum search, which makes digital signal processing modules convenient to design.

FLAMES includes a MatView class that enables return value optimization for matrix operations, which makes the hardware implementation more resource-friendly.

FLAMES makes flexible use of the PRAGMA optimization directives provided by the HLS tool to describe hardware details that C++ cannot express, and uses them to configure the related operations and their parallelism, making digital signal processing more efficient and flexible.

1. Design rationale

Designing efficient digital signal processing hardware with a hardware description language places high demands on the designer and on the design cycle. HLS tools address this by letting the designer obtain a register transfer level (RTL) design from a design written with high-level C++ classes, without attending to low-level details. To make up for the shortcomings of HLS in linear algebra operations and further improve design efficiency, FLAMES builds linear algebra modules and packages them into classes that can be called directly. This makes the design process more convenient and flexible, and makes the hardware synthesized by HLS more resource-friendly, flexible and efficient.

2. Overall framework

FLAMES is a header-only C++ library based on Vitis HLS, with a design optimized for hardware. The library contains a Mat class and a MatView class, both of which can implement matrix operations; the MatView class provides return value optimization for the Mat class. The overall architecture of the FLAMES library is shown in Fig. 2. FLAMES is compatible with the various HLS data types (fixed point and floating point) and supports matrix types including the common matrix, diagonal matrix, upper triangular matrix, lower triangular matrix, strictly upper triangular matrix, strictly lower triangular matrix, symmetric matrix and antisymmetric matrix. These special matrix types are introduced to save the storage space they occupy, and the Mat and MatView classes apply different optimizations according to the matrix type and the data type of the matrix elements.

3. Linear algebra operation modules

FLAMES overloads addition and multiplication for different matrix types and matrix element data types. Wherever loops are used, the relevant PRAGMA (a C++ hardware synthesis directive) is set so that different degrees of parallelism can be configured. For a single-level for loop, the UNROLL PRAGMA configures the degree of parallelism; for nested for loops, the FLATTEN PRAGMA is used to flatten the inner loops, and UNROLL is likewise used to configure the degree of parallelism.

When adding common matrices, FLAMES uses a for loop to add the matrix elements at corresponding positions.
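A minimal sketch of common-matrix addition with the loop directives described above follows. It assumes a fixed 4×4 float matrix (Mat44f and addCommon are hypothetical names); the directive placement and the unroll factor of 2 are illustrative choices that should be checked against the Vitis HLS documentation, not settings taken from FLAMES.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 4, kCols = 4;
using Mat44f = std::array<std::array<float, kCols>, kRows>;

// Element-wise sum of two common (dense) matrices.
// The "#pragma HLS" lines are Vitis HLS directives: LOOP_FLATTEN asks the tool to
// merge the loop nest into a single loop, and UNROLL factor=2 asks for two adders
// per iteration. A standard C++ compiler ignores unknown pragmas, so the same
// function also runs as ordinary CPU code.
Mat44f addCommon(const Mat44f& a, const Mat44f& b) {
    Mat44f out{};
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
#pragma HLS LOOP_FLATTEN
#pragma HLS UNROLL factor=2
            out[r][c] = a[r][c] + b[r][c];
        }
    }
    return out;
}
```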

When adding special matrix types, FLAMES exploits the fact that elements at certain positions of certain special matrices (antisymmetric, symmetric, diagonal, scalar, upper triangular, lower triangular, strictly upper triangular and strictly lower triangular matrices) are 0, which reduces the number of additions.

For example, when a diagonal matrix is added to a matrix of another type, only the diagonal elements of the diagonal matrix need to be added to the corresponding elements of the other matrix. Because the off-diagonal elements of the diagonal matrix are 0, no addition is needed at those positions, which reduces both the number of adders and the number of loop iterations.
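The diagonal-plus-other case can be sketched as follows, assuming the diagonal matrix is stored only as its n diagonal entries, a storage-saving layout in the spirit of the special matrix types; Square, DiagStore and addDiagonal are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kN = 4;
using Square = std::array<std::array<int, kN>, kN>;
using DiagStore = std::array<int, kN>;   // diagonal matrix kept as its kN diagonal entries

// D + A for a diagonal D: only the kN diagonal positions need an adder, instead of
// kN * kN; the off-diagonal elements of A pass through unchanged.
Square addDiagonal(const DiagStore& d, Square a) {
    for (std::size_t k = 0; k < kN; ++k)
        a[k][k] += d[k];
    return a;
}
```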

When multiplying common matrices, FLAMES uses three nested for loops, which respectively traverse the row vectors of the left-hand matrix, traverse the column vectors of the right-hand matrix, and perform the vector multiplication.

Mat(r, c) += MatLeft(r, i) * MatRight(i, c) is the key statement executed, where Mat(r, c) is the element at row r, column c of the product matrix, MatLeft(r, i) is the element at row r, column i of the left-hand matrix, and MatRight(i, c) is the element at row i, column c of the right-hand matrix.
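The three-level loop and its key statement can be written as the following C++ sketch for fixed-size matrices (MatF and mulCommon are hypothetical names, not the FLAMES API).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size matrix type for the sketch.
template <std::size_t R, std::size_t C>
using MatF = std::array<std::array<double, C>, R>;

// Common-matrix product with three nested loops: r walks the rows of the left
// matrix, c the columns of the right matrix, and i the inner (dot-product) index.
template <std::size_t R, std::size_t K, std::size_t C>
MatF<R, C> mulCommon(const MatF<R, K>& left, const MatF<K, C>& right) {
    MatF<R, C> mat{};                                   // zero-initialised result
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t i = 0; i < K; ++i)
                mat[r][c] += left[r][i] * right[i][c];  // key statement
    return mat;
}
```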

When multiplying special matrix types, FLAMES, as with special matrix addition, exploits the fact that elements at certain positions of certain special matrices are 0 to reduce the number of multiplications. However, if a conventional nested for-loop algorithm simply skips the multiplications by zero elements, the trip counts of the for loops are no longer constant, which causes timing errors during Vitis HLS synthesis. For a known matrix size, however, the number N of executions of the key statement Mat(r, c) += MatLeft(r, i) * MatRight(i, c) is fixed, and the executions have no timing relationship with one another. Using this property, FLAMES only needs to determine the relation between the m-th execution and the subscripts r, c, i, store this relation in a lookup table, and perform the matrix multiplication with a single for loop whose trip count is fixed.
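The single-loop idea can be illustrated with a lookup table built offline and a run-time loop with a fixed trip count. This is a sketch under stated assumptions: the table is filled by direct enumeration in row-major order rather than through the closed-form relations FLAMES derives with its operators, and a std::vector stands in for what an HLS design would hold as a constant array of N entries; KeyIdx, upperNZ, buildTable and mulWithTable are hypothetical names. For two n×n upper triangular operands the table has n(n+1)(n+2)/6 entries. Because the table order only changes the order in which terms are accumulated into each Mat(r, c), integer results are unaffected, while floating-point results may differ in rounding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Subscripts used by one execution of Mat(r, c) += MatLeft(r, i) * MatRight(i, c).
struct KeyIdx { int r, c, i; };

constexpr int n = 4;                                  // matrix size, fixed at design time
using Sq4 = std::array<std::array<int, n>, n>;

// Structural nonzero pattern of an upper triangular matrix (0-based indices).
inline bool upperNZ(int row, int col) { return row <= col; }

// Offline step: list every product whose two operands are structurally nonzero.
// The table length N depends only on n and on the two matrix types.
template <typename LeftNZ, typename RightNZ>
std::vector<KeyIdx> buildTable(LeftNZ leftNZ, RightNZ rightNZ) {
    std::vector<KeyIdx> table;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            for (int i = 0; i < n; ++i)
                if (leftNZ(r, i) && rightNZ(i, c)) table.push_back({r, c, i});
    return table;
}

// Run-time step: one loop with a fixed trip count N; iteration m reads its
// subscripts from the table instead of from nested loop counters.
Sq4 mulWithTable(const Sq4& left, const Sq4& right, const std::vector<KeyIdx>& table) {
    Sq4 mat{};
    for (std::size_t m = 0; m < table.size(); ++m) {
        const KeyIdx& k = table[m];
        mat[k.r][k.c] += left[k.r][k.i] * right[k.i][k.c];   // key statement
    }
    return mat;
}
```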

To describe this relation conveniently, four new binary operators are defined: #, !, '/' and '%'.

(1) The operator # performs the following operation: $q = A \mathbin{\#} B[a, b]$, where q is the operation result; A is an integer greater than or equal to 1; B is a function of i; and [a, b] indicates that i steps from a to b one value at a time. The initial value of q is 0. As i steps through its values, A is compared with B: whenever A is greater than or equal to B, q is incremented by 1, A is updated by subtracting the current value of B, and B is updated using the next value of i. These steps are repeated until A is less than B, which gives the final value of q. Here a and b are integers greater than or equal to 1, and a is less than b.

(2) The operator ! performs the following operation: $\mathrm{Re} = A \mathbin{!} B[a, b]$, where Re is the operation result; A is an integer greater than or equal to 1; B is a function of i; and [a, b] indicates that i steps from a to b one value at a time. Whenever A is greater than or equal to B, A is updated by subtracting the current value of B, and B is updated using the next value of i. These steps are repeated until A is less than B, and the final value of A is assigned to Re. Here a and b are integers greater than or equal to 1, and a is less than b.

(3) The operator '/' performs the following operation: $d = C \mathbin{\text{'/'}} D$, where d is the operation result and C and D are integers greater than or equal to 1. When C is exactly divisible by D, d is the integer quotient of C by D minus 1; otherwise d is the integer quotient of C by D.

(4) The operator '%' performs the following operation: $e = C \mathbin{\text{'\%'}} D$, where e is the operation result and C and D are integers greater than or equal to 1. When C is exactly divisible by D, e equals D; otherwise e is the remainder of C divided by D.

For example: 1. In the special matrix multiplication of an n×n upper triangular matrix by an n×n upper triangular matrix, each element of the product matrix Mat requires a certain number of key-statement executions: with 0-based indices, element (r, c) requires c − r + 1 executions when r ≤ c and none when r > c.

It can be seen that these counts are equal along each diagonal parallel to the main diagonal and increase toward the upper right; from this, the total number of executions is $N = \sum_{i=1}^{n} i(n+1-i) = \frac{n(n+1)(n+2)}{6}$. FLAMES orders the N executions along this direction. With the defined operators, $q = m \mathbin{\#} i(n+1-i)[1, n]$, $\mathrm{Re} = m \mathbin{!} i(n+1-i)[1, n]$, $b = \mathrm{Re} \mathbin{\text{'/'}} (q+1)$ and $a = \mathrm{Re} \mathbin{\text{'\%'}} (q+1)$. The correspondence is then: $r = a$, $c = q + a$, $i = a + b + 1$.

2. In the special matrix multiplication of an n×n upper triangular matrix by an n×n lower triangular matrix, element (r, c) of Mat (0-based indices) requires n − max(r, c) key-statement executions.

It can be seen that these counts are equal on each L-shaped set of positions sharing the same max(r, c) and increase toward the upper left; from this, the total number of executions is $N = \sum_{i=1}^{n} i(2n+1-2i) = \frac{n(n+1)(2n+1)}{6}$. FLAMES orders the N executions along this direction. With the defined operators, $q = m \mathbin{\#} i(2n+1-2i)[1, n]$, $\mathrm{Re} = m \mathbin{!} i(2n+1-2i)[1, n]$, $b = \mathrm{Re} \mathbin{\text{'/'}} (q+1)$ and $a = \mathrm{Re} \mathbin{\text{'\%'}} (q+1)$, and the correspondence is: if $b > n-q-1$, then $r = n-q-1$, $c = 2n-2q-b-2$ and $i = \max(r, c) + a - 1$; if $b \le n-q-1$, then $r = b$, $c = n-q-1$ and $i = \max(r, c) + a - 1$.
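As a check on example 2, the sketch below evaluates the stated formulas for every m and confirms that they visit each nonzero product of an upper triangular and a lower triangular matrix exactly once. It rests on readings chosen for the sketch rather than stated in the text: m runs from 1 to N, r, c and i are 0-based, and a group is skipped only while the running value is strictly greater than the group size, so that Re stays between 1 and the group size. The names Sub and upperTimesLowerSub are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Sub { int r, c, i; };

// Maps the m-th key-statement execution (m = 1..N) of upper x lower to (r, c, i)
// using the example-2 formulas. Group g (g = 1..n) holds g*(2n+1-2g) executions.
Sub upperTimesLowerSub(int m, int n) {
    int q = 0, re = m;
    for (int g = 1; g <= n && re > g * (2 * n + 1 - 2 * g); ++g) {
        re -= g * (2 * n + 1 - 2 * g);   // running value -> Re = m ! i(2n+1-2i)[1, n]
        ++q;                             // group count   -> q  = m # i(2n+1-2i)[1, n]
    }
    const int b = (re - 1) / (q + 1);      // b = Re '/' (q+1), for Re >= 1
    const int a = (re - 1) % (q + 1) + 1;  // a = Re '%' (q+1), for Re >= 1
    int r, c;
    if (b > n - q - 1) { r = n - q - 1; c = 2 * n - 2 * q - b - 2; }
    else               { r = b;         c = n - q - 1; }
    return {r, c, std::max(r, c) + a - 1};
}
```

Since exactly N products satisfy i ≥ max(r, c), visiting N distinct valid triples means every product of the upper-by-lower multiplication is computed once.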

3. In the special matrix multiplication of an n×n upper triangular matrix by a common matrix or a symmetric matrix, element (r, c) of Mat (0-based indices) requires n − r key-statement executions.

It can be seen that the counts in every column sum to the same total, $n(n+1)/2$, so the total number of executions is $N = \frac{n^2(n+1)}{2}$. FLAMES orders the N executions down each column and from left to right. With the defined operators, $b = 2m \mathbin{\text{'/'}} n(n+1)$, $a = 2m \mathbin{\text{'\%'}} n(n+1)$, $q = b \mathbin{\#} i[1, n]$ and $\mathrm{Re} = b \mathbin{!} i[1, n]$; the correspondence is then: $c = b$, $r = n-q-1$, $i = r + \mathrm{Re}$.

4. In the special matrix multiplication of an n×n upper triangular matrix by an n×n strictly lower triangular matrix, element (r, c) of Mat (0-based indices) requires n − max(r, c + 1) key-statement executions for c ≤ n − 2 and none for c = n − 1.

It can be seen that these counts are equal on each L-shaped set of positions sharing the same max(r, c + 1) and increase toward the upper left; from this, the total number of executions is $N = \sum_{i=1}^{n-1} i(2n-2i) = \frac{(n-1)n(n+1)}{3}$. FLAMES orders the N executions along this direction. With the defined operators, $q = m \mathbin{\#} i(2n-2i)[1, n-1]$, $\mathrm{Re} = m \mathbin{!} i(2n-2i)[1, n-1]$, $b = \mathrm{Re} \mathbin{\text{'/'}} (q+1)$ and $a = \mathrm{Re} \mathbin{\text{'\%'}} (q+1)$, and the correspondence is: if $b > n-q-2$, then $r = 2n-2q-b-3$, $c = n-q-2$ and $i = \max(r, c+1) + a - 1$; if $b \le n-q-2$, then $r = n-q-1$, $c = b$ and $i = \max(r, c+1) + a - 1$.

The above are typical cases in which a single for loop is used for special-matrix multiplication and the relation between the loop index m and the matrix subscripts is determined. Other cases are handled similarly: the core idea is to determine the total number of loop iterations N, order the N key-statement executions appropriately, and thereby determine the relation between m and the subscripts r, c, i.

The following table lists, for the various types of matrix multiplication, the total number of loop iterations and the subscript relations required when a single loop is used, as well as the matrix multiplications that can only be implemented with nested loops.


FLAMES also provides systolic-array matrix multiplication implemented with data streams. Fig. 3 is a schematic structural diagram of the systolic array provided by the present invention; as shown in Fig. 3, the array provides systolic reading, which reduces the number of memory accesses.

Implementing matrix multiplication with a systolic array makes execution of the key statement Mat(r, c) += MatLeft(r, i) * MatRight(i, c) more hardware-friendly.
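The data movement of a systolic array can be illustrated with a cycle-level software model. The sketch assumes an output-stationary array in which processing element (r, c) keeps the accumulator for Mat(r, c), values of the left matrix move one element to the right per cycle and values of the right matrix move one element down, so each input element is read from memory only once, at the array edge. The patent does not specify the processing-element design here, and systolicMul is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Cycle-level model of an output-stationary R x C systolic array computing A (R x K) * B (K x C).
// At cycle t, PE(r, c) holds A[r][k] and B[k][c] with k = t - r - c, thanks to the input skew.
template <std::size_t R, std::size_t K, std::size_t C>
std::array<std::array<long, C>, R> systolicMul(const std::array<std::array<long, K>, R>& A,
                                               const std::array<std::array<long, C>, K>& B) {
    std::array<std::array<long, C>, R> acc{}, aReg{}, bReg{};
    const std::size_t cycles = R + C + K - 2;       // last product arrives at t = R + C + K - 3
    for (std::size_t t = 0; t < cycles; ++t) {
        const auto aPrev = aReg, bPrev = bReg;       // register contents from the previous cycle
        for (std::size_t r = 0; r < R; ++r) {
            for (std::size_t c = 0; c < C; ++c) {
                // Left edge: row r of A enters delayed by r cycles; inner PEs shift right.
                if (c == 0) aReg[r][c] = (t >= r && t - r < K) ? A[r][t - r] : 0;
                else        aReg[r][c] = aPrev[r][c - 1];
                // Top edge: column c of B enters delayed by c cycles; inner PEs shift down.
                if (r == 0) bReg[r][c] = (t >= c && t - c < K) ? B[t - c][c] : 0;
                else        bReg[r][c] = bPrev[r - 1][c];
                acc[r][c] += aReg[r][c] * bReg[r][c];   // key statement inside PE(r, c)
            }
        }
    }
    return acc;
}
```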

FLAMES provides matrix inversion by Neumann series approximation iteration and by modified Neumann series approximation iteration. It also provides negation, transposition, diagonal, row and column operations, including taking specific rows and columns, taking discrete rows and columns specified by a container, and taking submatrices with continuous or discrete row indices. Corresponding copying versions of the negation, transposition, diagonal, row and column operations are also available.
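A plain truncated Neumann series inverse can be sketched as follows. It splits A into its diagonal D and off-diagonal part E and sums the first terms of the series sum over k of (−D⁻¹E)^k D⁻¹; the modified Neumann variant mentioned above is not described in this section and is not shown. kDim, Sq3, matMul3 and neumannInverse are hypothetical names, and the number of terms is an illustrative parameter.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kDim = 3;
using Sq3 = std::array<std::array<double, kDim>, kDim>;

Sq3 matMul3(const Sq3& x, const Sq3& y) {
    Sq3 z{};
    for (std::size_t r = 0; r < kDim; ++r)
        for (std::size_t c = 0; c < kDim; ++c)
            for (std::size_t i = 0; i < kDim; ++i) z[r][c] += x[r][i] * y[i][c];
    return z;
}

// Truncated Neumann series with the diagonal split A = D + E:
//   A^-1 ~= sum_{k=0}^{terms-1} (-D^-1 E)^k D^-1,
// which converges when the spectral radius of D^-1 E is below 1
// (for example, when A is strictly diagonally dominant).
Sq3 neumannInverse(const Sq3& a, std::size_t terms) {
    Sq3 dInv{}, step{};                          // step = -D^-1 E
    for (std::size_t r = 0; r < kDim; ++r) {
        dInv[r][r] = 1.0 / a[r][r];
        for (std::size_t c = 0; c < kDim; ++c)
            if (c != r) step[r][c] = -a[r][c] / a[r][r];
    }
    Sq3 term = dInv, sum = dInv;                 // k = 0 term is D^-1
    for (std::size_t k = 1; k < terms; ++k) {
        term = matMul3(step, term);              // (-D^-1 E)^k D^-1
        for (std::size_t r = 0; r < kDim; ++r)
            for (std::size_t c = 0; c < kDim; ++c) sum[r][c] += term[r][c];
    }
    return sum;
}
```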

4. MatView class

FLAMES provides the MatView class family, namely MatView, MatViewOpp, MatViewT, MatViewDiagMat, MatViewDiagVec, MatViewDiagRowVec, MatViewOffDiag, MatViewCol, MatViewRow, MatViewCols and MatViewRows, which provide read-only operations on a matrix: plain read-only access, negation, transposition, diagonal as a column vector, diagonal as a row vector, off-diagonal, single-row selection, single-column selection, multi-row selection, multi-column selection, and so on. Besides constructors taking a MatView object or a Mat object, the classes provide a conversion from a MatView into an actual Mat, and they provide different read-only modes for different matrix types (common, antisymmetric, symmetric, diagonal, scalar, upper triangular, lower triangular, strictly upper triangular and strictly lower triangular), so that the properties of a special matrix can be used to optimize storage and subsequent operations when a MatView is converted into a Mat. Therefore, when a matrix operation is needed, using the operation functions of the corresponding MatView class avoids wasting resources on repeated copying of the matrix when values are returned, which solves the problem of return value optimization in HLS.
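The idea behind a read-only view can be sketched in C++: the view holds only a reference to its source and remaps indices on each read, and an explicit conversion produces an owning matrix only when one is really needed. Matrix, TransposeView, transposeView and toMatrix are hypothetical stand-ins, not the FLAMES MatView or Mat API, and the sketch ignores the per-matrix-type storage optimizations described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical owning matrix (stand-in for a Mat-like class), row-major storage.
template <typename T, std::size_t R, std::size_t C>
struct Matrix {
    std::array<T, R * C> data{};
    T& operator()(std::size_t r, std::size_t c) { return data[r * C + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data[r * C + c]; }
};

// Hypothetical read-only transpose view: it stores only a reference, so returning
// it from a function copies no matrix elements; indices are swapped on each read.
template <typename T, std::size_t R, std::size_t C>
struct TransposeView {
    const Matrix<T, R, C>& src;
    T operator()(std::size_t r, std::size_t c) const { return src(c, r); }

    // Explicit conversion to an owning C x R matrix, performed only on request.
    Matrix<T, C, R> toMatrix() const {
        Matrix<T, C, R> out;
        for (std::size_t r = 0; r < C; ++r)
            for (std::size_t c = 0; c < R; ++c) out(r, c) = src(c, r);
        return out;
    }
};

template <typename T, std::size_t R, std::size_t C>
TransposeView<T, R, C> transposeView(const Matrix<T, R, C>& m) { return {m}; }
```

As with any reference-holding view, the view must not outlive the matrix it refers to.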

5. Sorting network

FLAMES provides a hardware-friendly sorting network that efficiently obtains the subscripts of the n largest or n smallest elements of a Mat-class matrix.

The sorting network may have various architectures; for example, the indices of the n largest or n smallest elements in a matrix may be determined by heap sort on a multi-way tree. The sorting network of this embodiment is built on heap sort over a 32-ary tree. Fig. 4 is a schematic diagram of the sorting flow of the sorting network provided by the present invention. As shown in Fig. 4, finding the maximum among the 32 child nodes first sorts the children in pairs, then keeps the two larger elements out of every four by merging, and finally obtains the subscript of the largest child.

The sorting network generally uses 32-ary tree heap sort; the tree usually has no more than four levels, so building the heap is very efficient. This approach makes full use of hardware parallelism, because the pairwise sorting and merging operations can be carried out with maximum parallelism.

The max function, which takes the largest n elements, provides a reference-array argument and a continuous-mode interface. The indices of the n largest elements are written into the array passed by the user as the actual argument. The next time the user takes the maximum values, if the matrix elements have not changed in between, the continuous flag can be set to true so that the previous sorting result is reused, which improves efficiency.
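A software sketch of such a max-n interface is shown below. It assumes the matrix elements have been flattened into a vector and that the continuous mode simply returns the cached result of the previous call; the class TopN and its members are hypothetical. The heap gives each node 32 children as described above, but the child scan is written as a sequential loop here, whereas the hardware would use the pairwise-sort-and-merge network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical top-n selector based on heap sort over a 32-ary tree.
class TopN {
public:
    static constexpr std::size_t kArity = 32;

    // Writes the indices of the n largest values (largest first) into idx.
    // With continuous == true the previous result is reused; the caller must
    // guarantee that the values have not changed since it was computed.
    void maxN(const std::vector<int>& v, std::size_t n,
              std::vector<std::size_t>& idx, bool continuous) {
        if (!continuous || cached_.size() < n) cached_ = select(v, n);
        idx.assign(cached_.begin(), cached_.begin() + std::min(n, cached_.size()));
    }

private:
    std::vector<std::size_t> cached_;

    // Restore the max-heap property below pos; children of p are 32p+1 .. 32p+32.
    static void siftDown(const std::vector<int>& v, std::vector<std::size_t>& h,
                         std::size_t pos, std::size_t size) {
        for (;;) {
            std::size_t best = pos;
            for (std::size_t k = 1; k <= kArity; ++k) {      // largest child
                std::size_t child = kArity * pos + k;
                if (child < size && v[h[child]] > v[h[best]]) best = child;
            }
            if (best == pos) return;
            std::swap(h[pos], h[best]);
            pos = best;
        }
    }

    static std::vector<std::size_t> select(const std::vector<int>& v, std::size_t n) {
        std::vector<std::size_t> h(v.size());
        for (std::size_t i = 0; i < h.size(); ++i) h[i] = i;
        for (std::size_t p = h.size(); p-- > 0;) siftDown(v, h, p, h.size());  // build heap
        std::vector<std::size_t> out;
        for (std::size_t size = h.size(); out.size() < n && size > 0; --size) {
            out.push_back(h[0]);                      // current maximum
            std::swap(h[0], h[size - 1]);             // move it out of the heap
            siftDown(v, h, 0, size - 1);
        }
        return out;
    }
};
```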

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A high-level synthesis HLS library for use in the field of digital signal processing, comprising:

a MatView module, a Mat module and a matrix operation realization module;

2. The high-level synthesis HLS library of claim 1, wherein the performing a matrix operation based on the matrix type of the digital signal processing matrix to be operated and the digital signal processing matrix to be operated sent by the Mat module, and the matrix operation type sent by the MatView module, comprises:

3. The high-level synthesis HLS library of claim 1, wherein the performing a matrix operation based on the matrix type of the digital signal processing matrix to be operated and the digital signal processing matrix to be operated sent by the Mat module, and the matrix operation type sent by the MatView module, comprises:

4. The high-level synthesis HLS library of claim 1, wherein the performing a matrix operation based on the matrix type of the digital signal processing matrix to be operated and the digital signal processing matrix to be operated sent by the Mat module, and the matrix operation type sent by the MatView module, comprises:

5. The high-level synthesis HLS library of claim 1, wherein the performing a matrix operation based on the matrix type of the digital signal processing matrix to be operated and the digital signal processing matrix to be operated sent by the Mat module, and the matrix operation type sent by the MatView module, comprises:

6. The high-level synthesis HLS library of claim 1, wherein the performing a matrix operation based on the matrix type of the digital signal processing matrix to be operated and the digital signal processing matrix to be operated sent by the Mat module, and the matrix operation type sent by the MatView module, comprises:

7. The high-level synthesis HLS library of claim 1, wherein the matrix reading mode comprises:

8. The high-level synthesis HLS library of claim 1, wherein the matrix operation types comprise:

9. The high-level synthesis HLS library of claim 1, wherein the matrix operation implementation module implements matrix multiplication operations based on systolic arrays.

10. The high-level synthesis HLS library of claim 1, further comprising: