CN111737169B - EDMA-based implementation method of high-capacity high-speed line-row output cache structure - Google Patents


Info

Publication number
CN111737169B
Authority
CN
China
Prior art keywords: data, address, cache, new data, row
Prior art date
Legal status
Active
Application number
CN202010702851.6A
Other languages
Chinese (zh)
Other versions
CN111737169A (en)
Inventor
钟国波
Current Assignee
Chengdu Zhimingda Electronic Co ltd
Original Assignee
Chengdu Zhimingda Electronic Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Zhimingda Electronic Co ltd
Priority to CN202010702851.6A
Publication of CN111737169A
Application granted
Publication of CN111737169B


Classifications

    • G06F12/0893: Caches characterised by their organisation or structure
    • G06F12/0877: Cache access modes
    • G06F13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal


Abstract

The invention discloses an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, relating to the technical field of high-speed row-column output cache structures. The method comprises an operation of storing original data into a new data cache and an operation of reading data from the new data cache. The store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache; the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data needed for subsequent use. In large-scale data-processing applications this gives access to every row and column of an image of any resolution, reducing data-access time and greatly increasing data-access speed.

Description

EDMA-based implementation method of high-capacity high-speed line-row output cache structure
Technical Field
The invention relates to the technical field of high-speed row-column output cache structures, and in particular to an implementation method of a high-capacity high-speed row-column output cache structure based on EDMA (enhanced direct memory access).
Background
In data processing, certain special applications, such as common image processing and SAR radar imaging, generally need to process an arbitrary row or column of data in an image, for example performing an FFT on that row or column; before such processing, the row or column of data must first be read.
A processor conventionally reads data with code (CPU load instructions). Reading externally transferred data that must be updated frequently this way is very inefficient, and for bytes at non-contiguous addresses it is extremely so: throughput can reach only about one percent of an EDMA read.
Reading a column of data means accessing non-contiguous addresses, for which code reads are very slow, so an EDMA interval read (stride equal to the row length) is usually adopted. However, for strided reads over many points, EDMA generally supports at most 32767 points per interval read, i.e., column reads only for images whose rows are shorter than 32767 points; columns of images with longer rows cannot be read this way. In addition, to exploit EDMA transfer efficiency, each transfer must exceed a certain length.
In real-time applications such as imaging and SAR radar, data arrive continuously from a camera, radar, or similar source, and a fixed amount of data must generally be processed within each specified time interval. Where rows and columns must be processed together, the read/write times of row and column data, and their processing times, should be as equal and as controllable as possible, so that the application can run strictly on its predetermined schedule without interrupting the original flow. To leave the maximum time for algorithm computation, data read/write time must be minimized; in other words, an efficient data read/write mechanism and a cache structure are needed to guarantee efficient data access, equal row and column processing times, and a controllable application flow.
To solve these problems, an EDMA-based high-capacity high-speed row-column output cache structure and an implementation method are provided, so that in large-scale data-processing applications such as SAR radar imaging, every row and column of an image of any resolution, including images whose rows exceed 32767 points, can be accessed, reducing data-access time.
Disclosure of Invention
The present invention is directed to an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, so as to solve the problems noted in the background art.
To achieve this purpose, the invention provides the following technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:
an operation of storing original data into a new data cache, and an operation of reading data from the new data cache;
the store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache;
the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data that are needed for subsequent use;
the new data cache refers to a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor) provides;
the original data matrix is an M-row, N-column matrix, and a sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; storing row i of the original data matrix into the new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
Preferably, in steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced.
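The address arithmetic of steps S1 to S5 can be sketched in pure Python, with the EDMA transfers modelled as list writes. The names `new_cache` and `store_row` are illustrative, not from the patent, and two reading assumptions are made explicit in the comments: addresses are 0-based here, the 1-based in-line first address b×(i-1)+1 is interpreted modulo the sub-matrix line (the k1 term already selects the block row), and the even-y destination interval is taken as a stride of a, which coincides with the stated b-1 interval when a = b, as in the 4×4 embodiment.

```python
# Illustrative model of steps S1-S5 (names are not from the patent).
# Assumptions: 0-based addresses; the in-line first address b*(i-1)+1 is
# read as b*((i-1) % a)+1, since k1 already selects the block row; the
# even-y destination interval of b-1 is read as a stride of a, which
# coincides with the text when a == b (as in the 4x4 embodiment).

def new_cache(M, N, a, b):
    """One cache line per a x b sub-matrix, (M/a)*(N/b) lines in all."""
    return [[None] * (a * b) for _ in range((M // a) * (N // b))]

def store_row(cache, row, i, a, b, N):
    """Scatter 1-based row i of the original matrix into the cache."""
    k1 = ((i - 1) // a) * (N // b)      # block-row offset in cache lines
    r = (i - 1) % a                     # row index inside the sub-matrix
    for y in range(1, N // b + 1):      # sub-matrix column index
        line = cache[y + k1 - 1]
        seg = row[(y - 1) * b : y * b]  # the b data for this sub-matrix
        if y % 2 == 1:                  # S1/S3: contiguous destination
            line[r * b : r * b + b] = seg
        else:                           # S2/S4: strided destination
            for j in range(b):
                line[r + j * a] = seg[j]
```

Storing a 4×4 matrix with values 1 to 16 and a = b = 2 yields cache lines [1, 2, 5, 6], [3, 7, 4, 8], [9, 10, 13, 14], [11, 15, 12, 16]: odd sub-matrix columns stay row-major while even ones are transposed, which is what later makes every row and every column read either contiguous or evenly strided.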
Preferably, the operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache.
Preferably, reading row i of the original data from the reorganized new data cache comprises the following specific steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
Preferably, in step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a. Flows s1 and s3 are performed with EDMA transfers whose source and destination addresses are both contiguous; flows s2 and s4 with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
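The row-read flow s1 to s5 can be sketched the same way, with the store side repeated so the round trip can be checked; the function names are illustrative, and the same reading assumptions apply as in the store sketch (0-based addresses, in-line first address b×((i-1)%a), even-y stride a, which equals the stated b-1 interval when a = b).

```python
# Illustrative sketch of the row-read flow s1-s5; store_matrix repeats
# the store model so this runs standalone. 0-based addresses; even-y
# stride a (equal to the text's b-1 interval when a == b).

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # S1/S3: contiguous
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # S2/S4: strided
    return cache

def read_row(cache, i, a, b, N):
    """Gather 1-based original row i back from the reorganized cache."""
    k2, r = ((i - 1) // a) * (N // b), (i - 1) % a
    out = []
    for m in range(1, N // b + 1):       # m runs over the sub-matrix columns
        line = cache[m + k2 - 1]
        if m % 2 == 1:                   # s1/s3: contiguous source
            out += line[r * b : r * b + b]
        else:                            # s2/s4: evenly strided source
            out += [line[r + j * a] for j in range(b)]
    return out
```

Because the read steps mirror the store steps line by line, `read_row(cache, i, a, b, N)` returns exactly row i of the original matrix.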
Preferably, reading columns of the original data from the reorganized new data cache covers the following cases:
reading column j of an odd-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
Reading column j of an odd-column sub-matrix (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b) from the reorganized new data cache proceeds as follows:
(1) beginning at first address j of row y of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(2) beginning at first address j of row y+N/b of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(3) beginning at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b-1 data until b data have been fetched; take n1 from small to large over its whole range until all data have been fetched, where 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
Steps (1) to (3) are performed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
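The odd-column flow (1) to (3) is an evenly strided gather: for odd y the sub-matrix is stored row-major, so column j sits at stride b inside each cache line, and the lines involved are y, y+N/b, y+2×(N/b), and so on. A minimal sketch, with the store model repeated so it runs standalone (`read_col_odd` is an illustrative name; same 0-based conventions as before):

```python
# Illustrative sketch of the odd-column read flow (1)-(3).
# Assumptions as in the store sketch: 0-based addresses, even-y stride a.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_col_odd(cache, y, j, a, b, M, N):
    """Gather 1-based column j of odd sub-matrix column y."""
    out = []
    for n1 in range(M // a):                 # n1 = 0 is step (1), 1 is (2), ...
        line = cache[y - 1 + n1 * (N // b)]
        out += [line[(j - 1) + t * b] for t in range(a)]  # one datum every b-1
    return out
```

On a 4×4 matrix with values 1 to 16 and a = b = 2, `read_col_odd(cache, 1, 1, ...)` recovers original column 1, i.e. [1, 5, 9, 13].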
Preferably, reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, from the reorganized new data cache proceeds as follows:
1) fetch b consecutive data from row y of the new data cache, beginning at first address b×(j-1)+1;
2) fetch b consecutive data from row y+N/b of the new data cache, beginning at first address b×(j-1)+1;
3) fetch b consecutive data from row y+(N/b)×n2 of the new data cache, beginning at first address b×(j-1)+1; take n2 from small to large over its whole range until all data have been fetched, where 2 ≤ n2 ≤ M/a-1 and n2 is an integer. Steps 1) to 3) are performed with EDMA transfers whose source and destination addresses are both contiguous.
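The even-column flow 1) to 3) needs no striding at all: for even y the sub-matrix was stored transposed, so original column j occupies consecutive addresses inside each cache line. A minimal sketch under the same illustrative conventions (0-based addresses; the first address is a×(j-1) here, which agrees with the text's b×(j-1)+1 when a = b):

```python
# Illustrative sketch of the even-column read flow 1)-3): plain
# contiguous copies, as an EDMA channel with contiguous source and
# destination would perform. store_matrix repeats the store model.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_col_even(cache, y, j, a, b, M, N):
    """Gather 1-based column j of even sub-matrix column y."""
    out = []
    for n2 in range(M // a):                 # n2 = 0 is step 1), 1 is 2), ...
        line = cache[y - 1 + n2 * (N // b)]
        out += line[(j - 1) * a : j * a]     # contiguous inside the line
    return out
```

On the same 4×4 example, `read_col_even(cache, 2, 1, ...)` recovers original column 3, i.e. [3, 7, 11, 15].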
Compared with the prior art, the invention has the following beneficial effects: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure is provided, comprising an operation of storing original data into a new data cache and an operation of reading data from the new data cache. The store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache; the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data needed for subsequent use. In large-scale data-processing applications this gives access to every row and column of an image of any resolution, including images whose rows exceed 32767 points, and reduces data-access time.
Detailed Description
The described embodiments are only some embodiments of the invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:
an operation of storing original data into a new data cache, and an operation of reading data from the new data cache;
the store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache;
the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data that are needed for subsequent use;
the new data cache refers to a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor) provides;
the original data matrix is an M-row, N-column matrix, and a sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; storing row i of the original data matrix into the new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
In steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced.
The operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache.
Reading row i of the original data from the reorganized new data cache comprises the following specific steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
In step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a. Flows s1 and s3 are performed with EDMA transfers whose source and destination addresses are both contiguous; flows s2 and s4 with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
Reading columns of the original data from the reorganized new data cache covers the following cases:
reading column j of an odd-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
Reading column j of an odd-column sub-matrix (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b) from the reorganized new data cache proceeds as follows:
(1) beginning at first address j of row y of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(2) beginning at first address j of row y+N/b of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(3) beginning at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b-1 data until b data have been fetched; take n1 from small to large over its whole range until all data have been fetched, where 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
Steps (1) to (3) are performed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
Reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, from the reorganized new data cache proceeds as follows:
1) fetch b consecutive data from row y of the new data cache, beginning at first address b×(j-1)+1;
2) fetch b consecutive data from row y+N/b of the new data cache, beginning at first address b×(j-1)+1;
3) fetch b consecutive data from row y+(N/b)×n2 of the new data cache, beginning at first address b×(j-1)+1; take n2 from small to large over its whole range until all data have been fetched, where 2 ≤ n2 ≤ M/a-1 and n2 is an integer. Steps 1) to 3) are performed with EDMA transfers whose source and destination addresses are both contiguous.
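The store and read flows above fit together into one verifiable property: after the store pass, every row and every column of the original matrix can be recovered from the new data cache. A minimal end-to-end sketch under the same illustrative conventions (0-based addresses, even-y stride a; `read_col` is an illustrative helper that maps an original column c to its sub-matrix column y and in-block column j, then dispatches on the parity of y):

```python
# End-to-end property check of the scheme as modelled in the sketches
# above: store, then recover every row and every column. Illustrative
# names; 0-based addresses; even-y stride a (equals b-1 interval if a==b).

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_row(cache, i, a, b, N):
    """Gather 1-based original row i (flow s1-s5)."""
    k2, r = ((i - 1) // a) * (N // b), (i - 1) % a
    out = []
    for m in range(1, N // b + 1):
        line = cache[m + k2 - 1]
        if m % 2 == 1:
            out += line[r * b : r * b + b]
        else:
            out += [line[r + j * a] for j in range(b)]
    return out

def read_col(cache, c, a, b, M, N):
    """Gather 1-based original column c (flows (1)-(3) and 1)-3))."""
    y, j = (c - 1) // b + 1, (c - 1) % b + 1
    out = []
    for n in range(M // a):
        line = cache[y - 1 + n * (N // b)]
        if y % 2 == 1:                       # odd y: strided gather
            out += [line[(j - 1) + t * b] for t in range(a)]
        else:                                # even y: contiguous copy
            out += line[(j - 1) * a : j * a]
    return out
```

Every gather here is either contiguous or evenly strided, which is exactly the class of transfer a two-dimensional EDMA channel can perform in one shot; that is the point of the reorganized layout.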
Example (b):
A. Let the original data matrix be an M-row, N-column matrix and a sub-matrix be an a-row, b-column matrix; divide the original data matrix into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767. The original data matrix is divided into 4×4 sub-matrices, as in the following table:
[Table: the original data matrix divided into 4×4 sub-matrices]
The operation of storing original data into a new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
In steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced. The original data, divided into a×b sub-matrix data blocks, is thus stored into the new data cache; the contents of the new data cache are shown in the following table:
[Table: contents of the new data cache after the original data is stored as a×b sub-matrix blocks]
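As a stand-in for the table, the layout can be reproduced numerically. The snippet below uses an assumed toy input, an 8×8 matrix with values 1 to 64 divided into 4×4 blocks as in the embodiment (the patent's own table is not reproduced here), under the same illustrative 0-based conventions as the earlier sketches:

```python
# Concrete illustration of the new-cache layout for 4x4 sub-matrix
# blocks, on an assumed 8x8 toy matrix with values 1..64. Each cache
# line holds one 4x4 block: odd block columns stay row-major, even
# block columns are stored transposed.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

orig = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]
cache = store_matrix(orig, 4, 4)
```

Line 0 then holds block (1,1) row by row, [1, 2, 3, 4, 9, 10, 11, 12, ...], while line 1 holds block (1,2) column by column, [5, 13, 21, 29, 6, 14, 22, 30, ...]: the transposed even blocks are what make even-column reads contiguous.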
The operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache:
Reading row i of the original data from the reorganized new data cache comprises the following steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
In step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a.
The flows s1 and s3 are completed with EDMA transfers whose source and destination addresses are both continuous. The flows s2 and s4 are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
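Taken together, flows s1–s5 amount to one continuous fetch per odd cache row and one fixed-stride fetch per even cache row. The Python sketch below is a software simulation of one consistent reading of these flows, not EDMA itself; helper names are hypothetical, indices are 0-based, the toy submatrices are square (a = b, so the interval b-1 and the count b apply literally, whereas the patent requires 1000 ≤ b ≤ 32767), and the start address b×(i-1)+1 is interpreted within the a-row block, i.e. as b×((i-1) mod a)+1:

```python
def build_cache(orig, M, N, a, b):
    """Pack the M x N matrix into the reorganized cache (0-based indices).
    Submatrix (x, y) lands in cache row y + (x-1)*(N/b); odd-y submatrices
    are stored row-major, even-y submatrices transposed (column-major)."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_row(cache, i, M, N, a, b):
    """Flows s1-s5: reassemble original row i from the reorganized cache."""
    xb, r = divmod(i, a)
    k2 = xb * (N // b)                     # k2 = floor((i-1)/a) * (N/b)
    out = []
    for yb in range(N // b):
        line = cache[yb + k2]
        if yb % 2 == 0:                    # s1/s3: continuous source addresses
            out += line[b * r : b * r + b]
        else:                              # s2/s4: one datum every b-1 items
            out += [line[r + a * c] for c in range(b)]
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only; the patent wants b >= 1000
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert all(read_row(cache, i, M, N, a, b) == orig[i] for i in range(M))
```

In hardware, each branch of `read_row` would be one 2-D EDMA transfer per cache row; the simulation only checks that the address arithmetic round-trips.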
Reading a column corresponding to the original data from the reorganized new data cache comprises the following cases:
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b; the odd-column and even-column submatrices are defined in the following chart:
[Figure GDA0002744345450000131: chart defining the odd-column and even-column submatrices of the original data matrix]
The reading flow is as follows:
(1) starting at address j of row y of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(2) starting at address j of row y+N/b of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(3) starting at address j of row y+(N/b)×n1 of the new data cache, fetching one data item every b-1 items until b items have been fetched; n1 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
the steps (1) to (3) are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
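The odd-column steps (1)–(3) can be sketched in Python as follows. This is a software simulation rather than EDMA; helper names are hypothetical, indices are 0-based, the toy submatrices are square (a = b, so the interval b-1 and the count b coincide), and `build_cache` reproduces the store flows with odd-y submatrices held row-major and even-y submatrices transposed:

```python
def build_cache(orig, M, N, a, b):
    """Store layout: odd-y submatrices row-major, even-y transposed."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_odd_col(cache, J, M, N, a, b):
    """Steps (1)-(3): gather original column J (0-based) when it falls in an
    odd-column submatrix; strided source, continuous destination."""
    yb, j = divmod(J, b)
    assert yb % 2 == 0                     # 1-based y = yb + 1 must be odd
    out = []
    for xb in range(M // a):               # n1 sweeps the M/a block rows
        line = cache[yb + xb * (N // b)]
        out += [line[j + b * r] for r in range(a)]   # every b-th address
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert read_odd_col(cache, 1, M, N, a, b) == [orig[i][1] for i in range(M)]
```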
Reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
1) starting at address b×(j-1)+1 of row y of the new data cache, reading b data items consecutively;
2) starting at address b×(j-1)+1 of row y+N/b of the new data cache, reading b data items consecutively;
3) starting at address b×(j-1)+1 of row y+(N/b)×n2 of the new data cache, reading b data items consecutively; n2 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n2 ≤ M/a-1 and n2 is an integer; the steps 1) to 3) are completed with EDMA transfers whose source and destination addresses are both continuous.
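Because even-column submatrices are stored transposed, a column read becomes a continuous fetch. A hedged Python simulation of steps 1)–3) (hypothetical helper names; 0-based indices; square toy submatrices a = b, so the start address b×(j-1)+1 equals a×(j-1)+1; `build_cache` reproduces the store layout with odd-y submatrices row-major and even-y submatrices transposed):

```python
def build_cache(orig, M, N, a, b):
    """Store layout: odd-y submatrices row-major, even-y transposed."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_even_col(cache, J, M, N, a, b):
    """Steps 1)-3): gather original column J (0-based) when it falls in an
    even-column submatrix; transposed storage makes the source continuous."""
    yb, j = divmod(J, b)
    assert yb % 2 == 1                     # 1-based y = yb + 1 must be even
    out = []
    for xb in range(M // a):               # n2 sweeps the M/a block rows
        line = cache[yb + xb * (N // b)]
        out += line[a * j : a * j + a]     # b consecutive items (a = b here)
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert read_even_col(cache, 4, M, N, a, b) == [orig[i][4] for i in range(M)]
```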
Although embodiments of the present invention have been shown and described, those skilled in the art will appreciate that changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (7)

1. An implementation method of an EDMA-based high-capacity high-speed line-row output cache structure, characterized by comprising:
an operation of storing original data into a new data cache and an operation of reading data from the new data cache;
the operation of storing the original data into the new data cache stores each row of the original data, row by row, into the new data cache in a specific pattern;
the operation of reading data from the new data cache fetches the required row and/or column data corresponding to the original data from the new data cache in a specific pattern for subsequent use;
the new data cache is a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor);
the original data form a matrix of M rows and N columns; a submatrix is a matrix of a rows and b columns; the original data matrix is divided into submatrix data blocks of a×b each, the (x, y)-th submatrix, wherein x is an integer with 1 ≤ x ≤ M/a, and y is an integer with 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; the operation of storing row i of the original data matrix into the new data cache comprises the following steps:
S1, storing the b data at addresses 1 to b of row i of the original data matrix sequentially into the b consecutive addresses of row 1+k1 of the new data cache beginning at start address b×(i-1)+1;
S2, storing the b data at addresses b+1 to 2×b of row i of the original data matrix sequentially into addresses of row 2+k1 of the new data cache spaced b-1 apart, beginning at start address ((i-1) mod a)+1;
S3, storing the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix sequentially into the b consecutive addresses of row y+k1 of the new data cache beginning at start address b×(i-1)+1, wherein y is odd and 3 ≤ y ≤ N/b;
S4, storing the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix sequentially into addresses of row y+k1 of the new data cache spaced b-1 apart, beginning at start address ((i-1) mod a)+1, wherein y is even and 4 ≤ y ≤ N/b;
S5, alternating S3 and S4 in turn over the whole value range of y from small to large, and then repeating the flows S1 to S4 for each value of i from small to large until all data of the original data matrix have been stored in the new data cache, wherein 1 ≤ i ≤ M and i is an integer.
2. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 1, wherein: in the steps S1 to S5, k1 = floor((i-1)/a) × (N/b), where floor((i-1)/a) denotes the quotient of (i-1) divided by a with the fractional part discarded, i.e. the largest integer not greater than (i-1)/a; ((i-1) mod a), written (i-1)%a, denotes the remainder of (i-1) divided by a; the flows S1 and S3 are completed with EDMA transfers whose source and destination addresses are both continuous; the flows S2 and S4 are completed with EDMA transfers whose source addresses are continuous and whose destination addresses are equally spaced.
3. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 2, wherein: the operation of reading data from the new data cache comprises an operation of reading a row corresponding to the original data from the reorganized new data cache and an operation of reading a column corresponding to the original data from the reorganized new data cache.
4. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 3, wherein reading row i corresponding to the original data from the reorganized new data cache comprises the following specific steps:
s1, starting at address b×(i-1)+1 of row 1+k2 of the new data cache, fetching b data items consecutively;
s2, starting at address ((i-1) mod a)+1 of row 2+k2 of the new data cache, fetching one data item every b-1 items until b items have been fetched;
s3, starting at address b×(i-1)+1 of row 3+k2 of the new data cache, fetching b data items consecutively;
s4, starting at address ((i-1) mod a)+1 of row 4+k2 of the new data cache, fetching one data item every b-1 items until b items have been fetched;
s5, from row m+k2 of the new data cache onward, continuing with flows analogous to s1 to s4, stepping m through its whole value range from small to large until all data have been fetched, wherein 5 ≤ m ≤ N/b and m is an integer.
5. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 4, wherein: in step s5, when row i corresponding to the original data is read from the reorganized new data cache, then from row m+k2 of the new data cache onward the start address of each cache row is set, in the odd-numbered steps such as s1 and s3, to address b×(i-1)+1 of that row, and in the even-numbered steps such as s2 and s4, to address ((i-1) mod a)+1 of that row, wherein (i-1) mod a denotes the remainder of (i-1) divided by a; i is the index of the required original row, 1 ≤ i ≤ M, and i is an integer, e.g. i = 2 means that row 2 corresponding to the original data is fetched from the new data cache; k2 = floor((i-1)/a) × (N/b), wherein floor((i-1)/a) denotes the quotient of (i-1) divided by a with the fractional part discarded, i.e. the largest integer not greater than (i-1)/a; the flows s1 and s3 are completed with EDMA transfers whose source and destination addresses are both continuous; the flows s2 and s4 are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
6. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 5, wherein reading a column corresponding to the original data from the reorganized new data cache comprises the following cases:
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b;
reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b;
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
(1) starting at address j of row y of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(2) starting at address j of row y+N/b of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(3) starting at address j of row y+(N/b)×n1 of the new data cache, fetching one data item every b-1 items until b items have been fetched; n1 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
and the steps (1) to (3) are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
7. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 6, wherein reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
1) starting at address b×(j-1)+1 of row y of the new data cache, reading b data items consecutively;
2) starting at address b×(j-1)+1 of row y+N/b of the new data cache, reading b data items consecutively;
3) starting at address b×(j-1)+1 of row y+(N/b)×n2 of the new data cache, reading b data items consecutively; n2 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n2 ≤ M/a-1 and n2 is an integer; the steps 1) to 3) are completed with EDMA transfers whose source and destination addresses are both continuous.
CN202010702851.6A 2020-07-21 2020-07-21 EDMA-based implementation method of high-capacity high-speed line-row output cache structure Active CN111737169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010702851.6A CN111737169B (en) 2020-07-21 2020-07-21 EDMA-based implementation method of high-capacity high-speed line-row output cache structure


Publications (2)

Publication Number Publication Date
CN111737169A CN111737169A (en) 2020-10-02
CN111737169B true CN111737169B (en) 2020-11-27

Family

ID=72656064


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879584B (en) * 2022-07-05 2022-10-28 成都智明达电子股份有限公司 DMA controller boundary alignment method based on FPGA and circuit thereof

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102262553A (en) * 2011-08-03 2011-11-30 中国科学技术大学 Method for optimizing linear system software package based on loongson 3B
CN102930636A (en) * 2012-11-15 2013-02-13 广州广电运通金融电子股份有限公司 Recognition device and recognition method for paper money number
CN105739874A (en) * 2016-03-11 2016-07-06 沈阳聚德视频技术有限公司 EDMA achieving method in image rotation based on DSP
CN106303582A (en) * 2016-08-20 2017-01-04 航天恒星科技有限公司 A kind of Joint Source Channel decoding method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7673076B2 (en) * 2005-05-13 2010-03-02 Texas Instruments Incorporated Concurrent read response acknowledge enhanced direct memory access unit


Non-Patent Citations (1)

Title
Optimized implementation of a multi-channel video encoder based on a general-purpose DSP; Li Bo et al.; Acta Electronica Sinica; 2006-11-30 (No. 11); pp. 2103-2108 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant