CN111737169B - EDMA-based implementation method of high-capacity high-speed line-row output cache structure - Google Patents


Info

Publication number
CN111737169B
Authority
CN
China
Prior art keywords: data, address, cache, new data, row
Prior art date
Legal status
Active
Application number
CN202010702851.6A
Other languages
Chinese (zh)
Other versions
CN111737169A (en)
Inventor
钟国波
Current Assignee
Chengdu Zhimingda Electronic Co ltd
Original Assignee
Chengdu Zhimingda Electronic Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Zhimingda Electronic Co ltd
Priority to CN202010702851.6A
Publication of CN111737169A
Application granted
Publication of CN111737169B


Classifications

    • G06F12/0893: Caches characterised by their organisation or structure
    • G06F12/0877: Cache access modes
    • G06F13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal


Abstract

The invention discloses an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, relating to the technical field of high-speed row-column output cache structures. The method comprises an operation of storing original data into a new data cache and an operation of reading data from the new data cache. The store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache; the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data needed for subsequent use. In large-scale data-processing applications this gives access to every row and column of an image of any resolution, reducing data-access time and greatly increasing data-access speed.

Description

EDMA-based implementation method of high-capacity high-speed line-row output cache structure
Technical Field
The invention relates to the technical field of high-speed row-column output cache structures, and in particular to an implementation method of a high-capacity high-speed row-column output cache structure based on EDMA (enhanced direct memory access).
Background
In data processing, certain special applications, such as common image processing and SAR radar imaging, generally need to process an arbitrary row or column of data in an image, for example performing an FFT on that row or column; before such processing, the row or column of data must first be read.
A processor conventionally reads data with code (CPU load instructions). Reading externally transferred data that must be updated frequently this way is very inefficient, and for bytes at non-contiguous addresses it is extremely so: throughput can reach only about one percent of an EDMA read.
Reading a column of data means accessing non-contiguous addresses, for which code reads are very slow, so an EDMA interval read (stride equal to the row length) is usually adopted. However, for strided reads over many points, EDMA generally supports at most 32767 points per interval read, i.e., column reads only for images whose rows are shorter than 32767 points; columns of images with longer rows cannot be read this way. In addition, to exploit EDMA transfer efficiency, each transfer must exceed a certain length.
In real-time applications such as imaging and SAR radar, data arrive continuously from a camera, radar, or similar source, and a fixed amount of data must generally be processed within each specified time interval. Where rows and columns must be processed together, the read/write times of row and column data, and their processing times, should be as equal and as controllable as possible, so that the application can run strictly on its predetermined schedule without interrupting the original flow. To leave the maximum time for algorithm computation, data read/write time must be minimized; in other words, an efficient data read/write mechanism and a cache structure are needed to guarantee efficient data access, equal row and column processing times, and a controllable application flow.
To solve these problems, an EDMA-based high-capacity high-speed row-column output cache structure and an implementation method are provided, so that in large-scale data-processing applications such as SAR radar imaging, every row and column of an image of any resolution, including images whose rows exceed 32767 points, can be accessed, reducing data-access time.
Disclosure of Invention
The present invention is directed to an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, so as to solve the problems noted in the background art.
To achieve this purpose, the invention provides the following technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:
an operation of storing original data into a new data cache, and an operation of reading data from the new data cache;
the store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache;
the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data that are needed for subsequent use;
the new data cache refers to a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor) provides;
the original data matrix is an M-row, N-column matrix, and a sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; storing row i of the original data matrix into the new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
Preferably, in steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced.
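The address arithmetic of steps S1 to S5 can be sketched in pure Python, with the EDMA transfers modelled as list writes. The names `new_cache` and `store_row` are illustrative, not from the patent, and two reading assumptions are made explicit in the comments: addresses are 0-based here, the 1-based in-line first address b×(i-1)+1 is interpreted modulo the sub-matrix line (the k1 term already selects the block row), and the even-y destination interval is taken as a stride of a, which coincides with the stated b-1 interval when a = b, as in the 4×4 embodiment.

```python
# Illustrative model of steps S1-S5 (names are not from the patent).
# Assumptions: 0-based addresses; the in-line first address b*(i-1)+1 is
# read as b*((i-1) % a)+1, since k1 already selects the block row; the
# even-y destination interval of b-1 is read as a stride of a, which
# coincides with the text when a == b (as in the 4x4 embodiment).

def new_cache(M, N, a, b):
    """One cache line per a x b sub-matrix, (M/a)*(N/b) lines in all."""
    return [[None] * (a * b) for _ in range((M // a) * (N // b))]

def store_row(cache, row, i, a, b, N):
    """Scatter 1-based row i of the original matrix into the cache."""
    k1 = ((i - 1) // a) * (N // b)      # block-row offset in cache lines
    r = (i - 1) % a                     # row index inside the sub-matrix
    for y in range(1, N // b + 1):      # sub-matrix column index
        line = cache[y + k1 - 1]
        seg = row[(y - 1) * b : y * b]  # the b data for this sub-matrix
        if y % 2 == 1:                  # S1/S3: contiguous destination
            line[r * b : r * b + b] = seg
        else:                           # S2/S4: strided destination
            for j in range(b):
                line[r + j * a] = seg[j]
```

Storing a 4×4 matrix with values 1 to 16 and a = b = 2 yields cache lines [1, 2, 5, 6], [3, 7, 4, 8], [9, 10, 13, 14], [11, 15, 12, 16]: odd sub-matrix columns stay row-major while even ones are transposed, which is what later makes every row and every column read either contiguous or evenly strided.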
Preferably, the operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache.
Preferably, reading row i of the original data from the reorganized new data cache comprises the following specific steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
Preferably, in step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a. Flows s1 and s3 are performed with EDMA transfers whose source and destination addresses are both contiguous; flows s2 and s4 with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
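The row-read flow s1 to s5 can be sketched the same way, with the store side repeated so the round trip can be checked; the function names are illustrative, and the same reading assumptions apply as in the store sketch (0-based addresses, in-line first address b×((i-1)%a), even-y stride a, which equals the stated b-1 interval when a = b).

```python
# Illustrative sketch of the row-read flow s1-s5; store_matrix repeats
# the store model so this runs standalone. 0-based addresses; even-y
# stride a (equal to the text's b-1 interval when a == b).

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # S1/S3: contiguous
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # S2/S4: strided
    return cache

def read_row(cache, i, a, b, N):
    """Gather 1-based original row i back from the reorganized cache."""
    k2, r = ((i - 1) // a) * (N // b), (i - 1) % a
    out = []
    for m in range(1, N // b + 1):       # m runs over the sub-matrix columns
        line = cache[m + k2 - 1]
        if m % 2 == 1:                   # s1/s3: contiguous source
            out += line[r * b : r * b + b]
        else:                            # s2/s4: evenly strided source
            out += [line[r + j * a] for j in range(b)]
    return out
```

Because the read steps mirror the store steps line by line, `read_row(cache, i, a, b, N)` returns exactly row i of the original matrix.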
Preferably, reading columns of the original data from the reorganized new data cache covers the following cases:
reading column j of an odd-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
Reading column j of an odd-column sub-matrix (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b) from the reorganized new data cache proceeds as follows:
(1) beginning at first address j of row y of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(2) beginning at first address j of row y+N/b of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(3) beginning at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b-1 data until b data have been fetched; take n1 from small to large over its whole range until all data have been fetched, where 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
Steps (1) to (3) are performed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
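The odd-column flow (1) to (3) is an evenly strided gather: for odd y the sub-matrix is stored row-major, so column j sits at stride b inside each cache line, and the lines involved are y, y+N/b, y+2×(N/b), and so on. A minimal sketch, with the store model repeated so it runs standalone (`read_col_odd` is an illustrative name; same 0-based conventions as before):

```python
# Illustrative sketch of the odd-column read flow (1)-(3).
# Assumptions as in the store sketch: 0-based addresses, even-y stride a.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_col_odd(cache, y, j, a, b, M, N):
    """Gather 1-based column j of odd sub-matrix column y."""
    out = []
    for n1 in range(M // a):                 # n1 = 0 is step (1), 1 is (2), ...
        line = cache[y - 1 + n1 * (N // b)]
        out += [line[(j - 1) + t * b] for t in range(a)]  # one datum every b-1
    return out
```

On a 4×4 matrix with values 1 to 16 and a = b = 2, `read_col_odd(cache, 1, 1, ...)` recovers original column 1, i.e. [1, 5, 9, 13].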
Preferably, reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, from the reorganized new data cache proceeds as follows:
1) fetch b consecutive data from row y of the new data cache, beginning at first address b×(j-1)+1;
2) fetch b consecutive data from row y+N/b of the new data cache, beginning at first address b×(j-1)+1;
3) fetch b consecutive data from row y+(N/b)×n2 of the new data cache, beginning at first address b×(j-1)+1; take n2 from small to large over its whole range until all data have been fetched, where 2 ≤ n2 ≤ M/a-1 and n2 is an integer. Steps 1) to 3) are performed with EDMA transfers whose source and destination addresses are both contiguous.
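The even-column flow 1) to 3) needs no striding at all: for even y the sub-matrix was stored transposed, so original column j occupies consecutive addresses inside each cache line. A minimal sketch under the same illustrative conventions (0-based addresses; the first address is a×(j-1) here, which agrees with the text's b×(j-1)+1 when a = b):

```python
# Illustrative sketch of the even-column read flow 1)-3): plain
# contiguous copies, as an EDMA channel with contiguous source and
# destination would perform. store_matrix repeats the store model.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_col_even(cache, y, j, a, b, M, N):
    """Gather 1-based column j of even sub-matrix column y."""
    out = []
    for n2 in range(M // a):                 # n2 = 0 is step 1), 1 is 2), ...
        line = cache[y - 1 + n2 * (N // b)]
        out += line[(j - 1) * a : j * a]     # contiguous inside the line
    return out
```

On the same 4×4 example, `read_col_even(cache, 2, 1, ...)` recovers original column 3, i.e. [3, 7, 11, 15].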
Compared with the prior art, the invention has the following beneficial effects: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure is provided, comprising an operation of storing original data into a new data cache and an operation of reading data from the new data cache. The store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache; the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data needed for subsequent use. In large-scale data-processing applications this gives access to every row and column of an image of any resolution, including images whose rows exceed 32767 points, and reduces data-access time.
Detailed Description
The described embodiments are only some embodiments of the invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a technical scheme: an implementation method of an EDMA-based high-capacity high-speed row-column output cache structure, comprising the following contents:
an operation of storing original data into a new data cache, and an operation of reading data from the new data cache;
the store operation writes each row of the original data, row by row and in a specific pattern, into the new data cache;
the read operation fetches from the new data cache, in a specific pattern, the row and/or column data of the original data that are needed for subsequent use;
the new data cache refers to a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor) provides;
the original data matrix is an M-row, N-column matrix, and a sub-matrix is an a-row, b-column matrix; the original data matrix is divided into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; storing row i of the original data matrix into the new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
In steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced.
The operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache.
Reading row i of the original data from the reorganized new data cache comprises the following specific steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
In step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a. Flows s1 and s3 are performed with EDMA transfers whose source and destination addresses are both contiguous; flows s2 and s4 with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
Reading columns of the original data from the reorganized new data cache covers the following cases:
reading column j of an odd-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y odd, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b;
Reading column j of an odd-column sub-matrix (y odd, 1 ≤ j ≤ b, 1 ≤ y ≤ N/b) from the reorganized new data cache proceeds as follows:
(1) beginning at first address j of row y of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(2) beginning at first address j of row y+N/b of the new data cache, fetch one datum every b-1 data until b data have been fetched;
(3) beginning at first address j of row y+(N/b)×n1 of the new data cache, fetch one datum every b-1 data until b data have been fetched; take n1 from small to large over its whole range until all data have been fetched, where 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
Steps (1) to (3) are performed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are contiguous.
Reading column j of an even-column sub-matrix, i.e., of the (x, y)-th sub-matrix of the original data with y even, where 1 ≤ j ≤ b and 1 ≤ y ≤ N/b, from the reorganized new data cache proceeds as follows:
1) fetch b consecutive data from row y of the new data cache, beginning at first address b×(j-1)+1;
2) fetch b consecutive data from row y+N/b of the new data cache, beginning at first address b×(j-1)+1;
3) fetch b consecutive data from row y+(N/b)×n2 of the new data cache, beginning at first address b×(j-1)+1; take n2 from small to large over its whole range until all data have been fetched, where 2 ≤ n2 ≤ M/a-1 and n2 is an integer. Steps 1) to 3) are performed with EDMA transfers whose source and destination addresses are both contiguous.
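The store and read flows above fit together into one verifiable property: after the store pass, every row and every column of the original matrix can be recovered from the new data cache. A minimal end-to-end sketch under the same illustrative conventions (0-based addresses, even-y stride a; `read_col` is an illustrative helper that maps an original column c to its sub-matrix column y and in-block column j, then dispatches on the parity of y):

```python
# End-to-end property check of the scheme as modelled in the sketches
# above: store, then recover every row and every column. Illustrative
# names; 0-based addresses; even-y stride a (equals b-1 interval if a==b).

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

def read_row(cache, i, a, b, N):
    """Gather 1-based original row i (flow s1-s5)."""
    k2, r = ((i - 1) // a) * (N // b), (i - 1) % a
    out = []
    for m in range(1, N // b + 1):
        line = cache[m + k2 - 1]
        if m % 2 == 1:
            out += line[r * b : r * b + b]
        else:
            out += [line[r + j * a] for j in range(b)]
    return out

def read_col(cache, c, a, b, M, N):
    """Gather 1-based original column c (flows (1)-(3) and 1)-3))."""
    y, j = (c - 1) // b + 1, (c - 1) % b + 1
    out = []
    for n in range(M // a):
        line = cache[y - 1 + n * (N // b)]
        if y % 2 == 1:                       # odd y: strided gather
            out += [line[(j - 1) + t * b] for t in range(a)]
        else:                                # even y: contiguous copy
            out += line[(j - 1) * a : j * a]
    return out
```

Every gather here is either contiguous or evenly strided, which is exactly the class of transfer a two-dimensional EDMA channel can perform in one shot; that is the point of the reorganized layout.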
Example (b):
A. Let the original data matrix be an M-row, N-column matrix and a sub-matrix be an a-row, b-column matrix; divide the original data matrix into a×b sub-matrix data blocks, the (x, y)-th sub-matrix, where x is an integer, 1 ≤ x ≤ M/a; y is an integer, 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767. The original data matrix is divided into 4×4 sub-matrices, as in the following table:
[Table: the original data matrix divided into 4×4 sub-matrices]
The operation of storing original data into a new data cache comprises the following steps:
S1: store the b data at addresses 1 to b of row i of the original data matrix, in order, into b consecutive addresses of row 1+k1 of the new data cache, beginning at first address b×(i-1)+1;
S2: store the b data at addresses b+1 to 2×b of row i of the original data matrix, in order, into row 2+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1;
S3: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into b consecutive addresses of row y+k1 of the new data cache, beginning at first address b×(i-1)+1, where y is odd and 3 ≤ y ≤ N/b;
S4: store the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix, in order, into row y+k1 of the new data cache at intervals of b-1 addresses, beginning at first address ((i-1)%a)+1, where y is even and 4 ≤ y ≤ N/b;
S5: perform S3 and S4 alternately for all values of y, from small to large; then repeat the S1-to-S4 process for all values of i, from small to large, until all data of the original data matrix have been stored in the new data cache, where 1 ≤ i ≤ M and i is an integer.
In steps S1 to S5, k1 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a; ((i-1)%a) is the remainder of (i-1) divided by a. Steps S1 and S3 are performed with EDMA transfers whose source and destination addresses are both contiguous; steps S2 and S4 with EDMA transfers whose source addresses are contiguous and whose destination addresses are equally spaced. The original data, divided into a×b sub-matrix data blocks, is thus stored into the new data cache; the contents of the new data cache are shown in the following table:
[Table: contents of the new data cache after the original data is stored as a×b sub-matrix blocks]
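As a stand-in for the table, the layout can be reproduced numerically. The snippet below uses an assumed toy input, an 8×8 matrix with values 1 to 64 divided into 4×4 blocks as in the embodiment (the patent's own table is not reproduced here), under the same illustrative 0-based conventions as the earlier sketches:

```python
# Concrete illustration of the new-cache layout for 4x4 sub-matrix
# blocks, on an assumed 8x8 toy matrix with values 1..64. Each cache
# line holds one 4x4 block: odd block columns stay row-major, even
# block columns are stored transposed.

def store_matrix(orig, a, b):
    """Scatter the whole original matrix into the new-cache model."""
    M, N = len(orig), len(orig[0])
    cache = [[None] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(1, M + 1):
        k1, r = ((i - 1) // a) * (N // b), (i - 1) % a
        for y in range(1, N // b + 1):
            seg = orig[i - 1][(y - 1) * b : y * b]
            line = cache[y + k1 - 1]
            if y % 2 == 1:
                line[r * b : r * b + b] = seg      # odd y: row-major
            else:
                for j in range(b):
                    line[r + j * a] = seg[j]       # even y: transposed
    return cache

orig = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]
cache = store_matrix(orig, 4, 4)
```

Line 0 then holds block (1,1) row by row, [1, 2, 3, 4, 9, 10, 11, 12, ...], while line 1 holds block (1,2) column by column, [5, 13, 21, 29, 6, 14, 22, 30, ...]: the transposed even blocks are what make even-column reads contiguous.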
The operation of reading data from the new data cache comprises reading rows of the original data from the reorganized new data cache and reading columns of the original data from the reorganized new data cache:
Reading row i of the original data from the reorganized new data cache comprises the following steps:
s1: fetch b consecutive data from row 1+k2 of the new data cache, beginning at first address b×(i-1)+1;
s2: beginning at first address ((i-1)%a)+1 of row 2+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s3: fetch b consecutive data from row 3+k2 of the new data cache, beginning at first address b×(i-1)+1;
s4: beginning at first address ((i-1)%a)+1 of row 4+k2 of the new data cache, fetch one datum every b-1 data until b data have been fetched;
s5: beginning at row m+k2 of the new data cache, continue the pattern of s1 to s4, taking m from small to large over its whole range until all data have been fetched, where 5 ≤ m ≤ N/b and m is an integer.
In step s5, when reading row i of the original data from the reorganized new data cache beginning at row m+k2, the odd steps (s1, s3, and so on) set the first address of the corresponding row to address b×(i-1)+1 of that row, and the even steps (s2, s4, and so on) set it to address ((i-1)%a)+1 of that row, where ((i-1)%a) is the remainder of (i-1) divided by a. Here i is the required row number of the original data, 1 ≤ i ≤ M, i an integer; for example, i = 2 means row 2 of the original data is fetched from the new data cache. k2 = floor((i-1)/a)×(N/b), where floor((i-1)/a) is the quotient of (i-1) divided by a with the fractional part truncated, i.e., the largest integer not greater than (i-1)/a.
The flows s1 and s3 are completed with EDMA transfers whose source and destination addresses are both continuous. The flows s2 and s4 are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
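Taken together, flows s1–s5 amount to one continuous fetch per odd cache row and one fixed-stride fetch per even cache row. The Python sketch below is a software simulation of one consistent reading of these flows, not EDMA itself; helper names are hypothetical, indices are 0-based, the toy submatrices are square (a = b, so the interval b-1 and the count b apply literally, whereas the patent requires 1000 ≤ b ≤ 32767), and the start address b×(i-1)+1 is interpreted within the a-row block, i.e. as b×((i-1) mod a)+1:

```python
def build_cache(orig, M, N, a, b):
    """Pack the M x N matrix into the reorganized cache (0-based indices).
    Submatrix (x, y) lands in cache row y + (x-1)*(N/b); odd-y submatrices
    are stored row-major, even-y submatrices transposed (column-major)."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_row(cache, i, M, N, a, b):
    """Flows s1-s5: reassemble original row i from the reorganized cache."""
    xb, r = divmod(i, a)
    k2 = xb * (N // b)                     # k2 = floor((i-1)/a) * (N/b)
    out = []
    for yb in range(N // b):
        line = cache[yb + k2]
        if yb % 2 == 0:                    # s1/s3: continuous source addresses
            out += line[b * r : b * r + b]
        else:                              # s2/s4: one datum every b-1 items
            out += [line[r + a * c] for c in range(b)]
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only; the patent wants b >= 1000
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert all(read_row(cache, i, M, N, a, b) == orig[i] for i in range(M))
```

In hardware, each branch of `read_row` would be one 2-D EDMA transfer per cache row; the simulation only checks that the address arithmetic round-trips.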
Reading a column corresponding to the original data from the reorganized new data cache comprises the following cases:
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b; the odd-column and even-column submatrices are defined in the following chart:
[Figure GDA0002744345450000131: chart defining the odd-column and even-column submatrices of the original data matrix]
The reading flow is as follows:
(1) starting at address j of row y of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(2) starting at address j of row y+N/b of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(3) starting at address j of row y+(N/b)×n1 of the new data cache, fetching one data item every b-1 items until b items have been fetched; n1 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
the steps (1) to (3) are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
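The odd-column steps (1)–(3) can be sketched in Python as follows. This is a software simulation rather than EDMA; helper names are hypothetical, indices are 0-based, the toy submatrices are square (a = b, so the interval b-1 and the count b coincide), and `build_cache` reproduces the store flows with odd-y submatrices held row-major and even-y submatrices transposed:

```python
def build_cache(orig, M, N, a, b):
    """Store layout: odd-y submatrices row-major, even-y transposed."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_odd_col(cache, J, M, N, a, b):
    """Steps (1)-(3): gather original column J (0-based) when it falls in an
    odd-column submatrix; strided source, continuous destination."""
    yb, j = divmod(J, b)
    assert yb % 2 == 0                     # 1-based y = yb + 1 must be odd
    out = []
    for xb in range(M // a):               # n1 sweeps the M/a block rows
        line = cache[yb + xb * (N // b)]
        out += [line[j + b * r] for r in range(a)]   # every b-th address
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert read_odd_col(cache, 1, M, N, a, b) == [orig[i][1] for i in range(M)]
```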
Reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
1) starting at address b×(j-1)+1 of row y of the new data cache, reading b data items consecutively;
2) starting at address b×(j-1)+1 of row y+N/b of the new data cache, reading b data items consecutively;
3) starting at address b×(j-1)+1 of row y+(N/b)×n2 of the new data cache, reading b data items consecutively; n2 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n2 ≤ M/a-1 and n2 is an integer; the steps 1) to 3) are completed with EDMA transfers whose source and destination addresses are both continuous.
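Because even-column submatrices are stored transposed, a column read becomes a continuous fetch. A hedged Python simulation of steps 1)–3) (hypothetical helper names; 0-based indices; square toy submatrices a = b, so the start address b×(j-1)+1 equals a×(j-1)+1; `build_cache` reproduces the store layout with odd-y submatrices row-major and even-y submatrices transposed):

```python
def build_cache(orig, M, N, a, b):
    """Store layout: odd-y submatrices row-major, even-y transposed."""
    cache = [[0] * (a * b) for _ in range((M // a) * (N // b))]
    for i in range(M):
        xb, r = divmod(i, a)               # block row / row inside the submatrix
        for j in range(N):
            yb, c = divmod(j, b)           # block col / col inside the submatrix
            addr = b * r + c if yb % 2 == 0 else a * c + r
            cache[yb + xb * (N // b)][addr] = orig[i][j]
    return cache

def read_even_col(cache, J, M, N, a, b):
    """Steps 1)-3): gather original column J (0-based) when it falls in an
    even-column submatrix; transposed storage makes the source continuous."""
    yb, j = divmod(J, b)
    assert yb % 2 == 1                     # 1-based y = yb + 1 must be even
    out = []
    for xb in range(M // a):               # n2 sweeps the M/a block rows
        line = cache[yb + xb * (N // b)]
        out += line[a * j : a * j + a]     # b consecutive items (a = b here)
    return out

M, N, a, b = 6, 6, 3, 3                    # toy sizes only
orig = [[10 * i + j for j in range(N)] for i in range(M)]
cache = build_cache(orig, M, N, a, b)
assert read_even_col(cache, 4, M, N, a, b) == [orig[i][4] for i in range(M)]
```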
Although embodiments of the present invention have been shown and described, those skilled in the art will appreciate that changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (7)

1. An implementation method of an EDMA-based high-capacity high-speed line-row output cache structure, characterized by comprising:
an operation of storing original data into a new data cache and an operation of reading data from the new data cache;
the operation of storing the original data into the new data cache stores each row of the original data, row by row, into the new data cache in a specific pattern;
the operation of reading data from the new data cache fetches the required row and/or column data corresponding to the original data from the new data cache in a specific pattern for subsequent use;
the new data cache is a memory, accessible by EDMA (enhanced direct memory access), that stores a whole frame of data; the EDMA must support two-dimensional transfers, such as the EDMA of a DSP (digital signal processor);
the original data form a matrix of M rows and N columns; a submatrix is a matrix of a rows and b columns; the original data matrix is divided into submatrix data blocks of a×b each, the (x, y)-th submatrix, wherein x is an integer with 1 ≤ x ≤ M/a, and y is an integer with 1 ≤ y ≤ N/b; M is divisible by a, with a ≥ 2; N is divisible by b, with 1000 ≤ b ≤ 32767; the operation of storing row i of the original data matrix into the new data cache comprises the following steps:
S1, storing the b data at addresses 1 to b of row i of the original data matrix sequentially into the b consecutive addresses of row 1+k1 of the new data cache beginning at start address b×(i-1)+1;
S2, storing the b data at addresses b+1 to 2×b of row i of the original data matrix sequentially into addresses of row 2+k1 of the new data cache spaced b-1 apart, beginning at start address ((i-1) mod a)+1;
S3, storing the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix sequentially into the b consecutive addresses of row y+k1 of the new data cache beginning at start address b×(i-1)+1, wherein y is odd and 3 ≤ y ≤ N/b;
S4, storing the b data at addresses (y-1)×b+1 to y×b of row i of the original data matrix sequentially into addresses of row y+k1 of the new data cache spaced b-1 apart, beginning at start address ((i-1) mod a)+1, wherein y is even and 4 ≤ y ≤ N/b;
S5, alternating S3 and S4 in turn over the whole value range of y from small to large, and then repeating the flows S1 to S4 for each value of i from small to large until all data of the original data matrix have been stored in the new data cache, wherein 1 ≤ i ≤ M and i is an integer.
2. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 1, wherein: in the steps S1 to S5, k1 = floor((i-1)/a) × (N/b), where floor((i-1)/a) denotes the quotient of (i-1) divided by a with the fractional part discarded, i.e. the largest integer not greater than (i-1)/a; ((i-1) mod a), written (i-1)%a, denotes the remainder of (i-1) divided by a; the flows S1 and S3 are completed with EDMA transfers whose source and destination addresses are both continuous; the flows S2 and S4 are completed with EDMA transfers whose source addresses are continuous and whose destination addresses are equally spaced.
3. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 2, wherein: the operation of reading data from the new data cache comprises an operation of reading a row corresponding to the original data from the reorganized new data cache and an operation of reading a column corresponding to the original data from the reorganized new data cache.
4. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 3, wherein reading row i corresponding to the original data from the reorganized new data cache comprises the following specific steps:
s1, starting at address b×(i-1)+1 of row 1+k2 of the new data cache, fetching b data items consecutively;
s2, starting at address ((i-1) mod a)+1 of row 2+k2 of the new data cache, fetching one data item every b-1 items until b items have been fetched;
s3, starting at address b×(i-1)+1 of row 3+k2 of the new data cache, fetching b data items consecutively;
s4, starting at address ((i-1) mod a)+1 of row 4+k2 of the new data cache, fetching one data item every b-1 items until b items have been fetched;
s5, from row m+k2 of the new data cache onward, continuing with flows analogous to s1 to s4, stepping m through its whole value range from small to large until all data have been fetched, wherein 5 ≤ m ≤ N/b and m is an integer.
5. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 4, wherein: in step s5, when row i corresponding to the original data is read from the reorganized new data cache, then from row m+k2 of the new data cache onward the start address of each cache row is set, in the odd-numbered steps such as s1 and s3, to address b×(i-1)+1 of that row, and in the even-numbered steps such as s2 and s4, to address ((i-1) mod a)+1 of that row, wherein (i-1) mod a denotes the remainder of (i-1) divided by a; i is the index of the required original row, 1 ≤ i ≤ M, and i is an integer, e.g. i = 2 means that row 2 corresponding to the original data is fetched from the new data cache; k2 = floor((i-1)/a) × (N/b), wherein floor((i-1)/a) denotes the quotient of (i-1) divided by a with the fractional part discarded, i.e. the largest integer not greater than (i-1)/a; the flows s1 and s3 are completed with EDMA transfers whose source and destination addresses are both continuous; the flows s2 and s4 are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
6. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 5, wherein reading a column corresponding to the original data from the reorganized new data cache comprises the following cases:
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b;
reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b;
reading the j-th column of an odd-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y odd, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
(1) starting at address j of row y of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(2) starting at address j of row y+N/b of the new data cache, fetching one data item every b-1 items until b items have been fetched;
(3) starting at address j of row y+(N/b)×n1 of the new data cache, fetching one data item every b-1 items until b items have been fetched; n1 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n1 ≤ M/a-1 and n1 is an integer;
and the steps (1) to (3) are completed with EDMA transfers whose source addresses are equally spaced and whose destination addresses are continuous.
7. The implementation method of an EDMA-based high-capacity high-speed line-row output cache structure according to claim 6, wherein reading the j-th column of an even-column submatrix, i.e. of the (x, y)-th submatrix of the original data with y even, wherein 1 ≤ j ≤ b, 1 ≤ y ≤ N/b, proceeds as follows:
1) starting at address b×(j-1)+1 of row y of the new data cache, reading b data items consecutively;
2) starting at address b×(j-1)+1 of row y+N/b of the new data cache, reading b data items consecutively;
3) starting at address b×(j-1)+1 of row y+(N/b)×n2 of the new data cache, reading b data items consecutively; n2 steps through its whole value range from small to large until all data have been fetched, wherein 2 ≤ n2 ≤ M/a-1 and n2 is an integer; the steps 1) to 3) are completed with EDMA transfers whose source and destination addresses are both continuous.
CN202010702851.6A 2020-07-21 2020-07-21 EDMA-based implementation method of high-capacity high-speed line-row output cache structure Active CN111737169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010702851.6A CN111737169B (en) 2020-07-21 2020-07-21 EDMA-based implementation method of high-capacity high-speed line-row output cache structure


Publications (2)

Publication Number Publication Date
CN111737169A CN111737169A (en) 2020-10-02
CN111737169B true CN111737169B (en) 2020-11-27

Family

ID=72656064


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879584B (en) * 2022-07-05 2022-10-28 成都智明达电子股份有限公司 DMA controller boundary alignment method based on FPGA and circuit thereof

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102262553A (en) * 2011-08-03 2011-11-30 中国科学技术大学 Method for optimizing linear system software package based on loongson 3B
CN102930636A (en) * 2012-11-15 2013-02-13 广州广电运通金融电子股份有限公司 Recognition device and recognition method for paper money number
CN105739874A (en) * 2016-03-11 2016-07-06 沈阳聚德视频技术有限公司 EDMA achieving method in image rotation based on DSP
CN106303582A (en) * 2016-08-20 2017-01-04 航天恒星科技有限公司 A kind of Joint Source Channel decoding method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7673076B2 (en) * 2005-05-13 2010-03-02 Texas Instruments Incorporated Concurrent read response acknowledge enhanced direct memory access unit


Non-Patent Citations (1)

Title
Optimized implementation of a multi-channel video encoder based on a general-purpose DSP; Li Bo et al.; Acta Electronica Sinica; 2006-11-30 (No. 11); pp. 2103-2108 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant