CN116010313A - Universal and configurable image filtering calculation multi-line output system and method

Info

Publication number: CN116010313A
Application number: CN202211506014.1A
Authority: CN
Inventors: 黄明强; 陈嘉豪; 马文凌
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-04-25
Also published as: WO2024114505A1

Abstract

The invention relates to a universal, configurable multi-line output system and method for image filtering calculation. In the system and method, a CPU stores control parameters related to an input image in a parameter configuration module. After acquiring the control parameters, the parameter configuration module uses them to control the access operations of the direct memory access module and the convolution calculation of the linear filtering calculation module. According to the control parameters sent by the parameter configuration module, the direct memory access module fetches the matching image input data from the on-chip cache area and sends it to the linear filtering calculation module. The linear filtering calculation module performs the convolution calculation matching the control parameters and sends the result through the direct memory access module to the on-chip cache area, which stores it. The invention solves the technical problem that existing filtering calculations cannot output multiple rows of results at the same time.

Description

Universal and configurable image filtering calculation multi-line output system and method

Technical Field

The invention relates to the field of image filtering, and in particular to a universal, configurable multi-line output system and method for image filtering calculation.

Background

Filtering is a common operation in image processing. Its main effect is to remove specific frequency bands from an image, which reduces or eliminates the influence of environmental noise, or of noise generated during signal extraction, and so improves image quality. The image information in the retained frequency bands is then available for subsequent processing. Depending on the band that is filtered, filtering serves two functions: (1) reducing the effect of noise in the image, which is typically achieved by low-pass filtering because noise is usually a high-frequency signal relative to the image; and (2) extracting key information from the image.

Existing approaches filter the input image with a filter kernel of a single size. In existing image filtering schemes the data read address jumps frequently, efficient burst transmission is not used, and the data flow is neither efficient nor flexible. A fixed 5*5 filter kernel is used, so images of multiple types cannot be filtered and the scheme lacks universality. When such a scheme is implemented in a memristor array, the utilization of the memristors is low.

Disclosure of Invention

The embodiments of the invention provide a universal, configurable multi-line output system and method for image filtering calculation, which at least solve the technical problem in the prior art that a filtering calculation cannot output multiple rows of results at the same time.

According to an embodiment of the present invention, there is provided a universal, configurable multi-line output system for image filtering calculation, comprising a direct memory access module, a parameter configuration module, an on-chip cache area and a linear filtering calculation module, wherein:

the CPU stores control parameters related to the input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters;

the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result.

Further, the CPU stores control parameters related to the input image to a parameter configuration module through an AXI-lite bus; after the parameter configuration module obtains the control parameters from the CPU, the control parameters are utilized to control the access operation of the direct memory access module and control the convolution calculation of the linear filtering calculation module; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.

Further, the linear filtering calculation module comprises a buffer, a control unit and a calculation unit. The buffer receives the input data fetched from the off-chip DDR memory and sends it to the calculation unit. The calculation unit receives the input data from the buffer, generates the calculated output result, and sends it through the direct memory access module to the off-chip DDR memory for storage.

Further, the control parameters calculated by the parameter configuration module include several parameters related to the on-chip buffer space, including the width Wout of the output feature, the height Hout of the output feature, the number of input channels CHout, and so on.

According to another embodiment of the present invention, there is provided a general-purpose, configurable image filtering calculation multi-line output method including the steps of:

Further, calculating the width-direction parameters for single-row and multi-row output includes:

step one, inputting the burst transmission length burst_len;

step two, setting the width-direction depth x_depth = burst_len;

step three, calculating the number of width-direction outputs width_out_test;

step four, comparing width_out_test with Wout, and padding on the right if width_out_test is smaller than Wout;

step five, calculating the number of width-direction inputs width_in_test;

step six, judging whether width_in_test is greater than the width-direction depth x_depth; if so, continuing to step seven; if not, the number of burst transmissions required in the width direction is split_w_num = 1, the first and last width-direction output numbers first_width_out and last_width_out are both equal to Wout, and the first and last width-direction input numbers first_width_in and last_width_in are both equal to width_in_test;

step seven, since the number of width-direction inputs is greater than the width-direction depth, determining the first width-direction input number first_width_in, the remaining width-direction input number res_width_in, the first width-direction output number first_width_out and the remaining width-direction output number res_width_out;

step eight, judging whether the remaining width-direction input number res_width_in is greater than the width-direction depth x_depth; if so, continuing to step nine; if not, the number of burst transmissions required in the width direction is split_w_num = 2, the last width-direction input number last_width_in = res_width_in, and the last width-direction output number last_width_out = res_width_out;

step nine, determining the maximum output number max_out_width of one burst transmission in the width direction, and determining the number of burst transmissions split_w_num in the width direction;

step ten, setting the middle-transmission width-direction output number middle_width_out = max_out_width, and determining the middle-transmission width-direction input number middle_width_in;

step eleven, judging whether (Wout - first_width_out) is divisible by the middle-transmission width-direction output number middle_width_out; if so, continuing to step twelve; if not, determining the output number of the last burst transmission and going to step thirteen;

step twelve, setting the last width-direction output number last_width_out = middle_width_out;

step thirteen, determining last_width_in, the number of feature values input in the last burst transmission in the width direction.

Further, calculating the height-direction parameters for multi-row output includes:

step one, inputting the burst transmission length burst_len;

step two, setting the height-direction depth y_depth = 7;

step three, calculating the number of height-direction outputs height_out_test;

step four, comparing height_out_test with Hout, and padding if height_out_test is smaller than Hout;

step five, calculating the number of height-direction inputs height_in_test;

step six, judging whether height_in_test is greater than the height-direction depth y_depth; if so, continuing to step seven; if not, the number of burst transmissions required in the height direction is split_h_num = 1, the first and last height-direction output numbers first_height_out and last_height_out are both equal to Hout, and the first and last height-direction input numbers first_height_in and last_height_in are both equal to height_in_test;

step seven, since the number of height-direction inputs is greater than the height-direction depth, determining the first height-direction input number first_height_in, the remaining height-direction input number res_height_in, the first height-direction output number first_height_out and the remaining height-direction output number res_height_out;

step eight, judging whether the remaining height-direction input number res_height_in is greater than the height-direction depth y_depth; if so, continuing to step nine; if not, the number of burst transmissions required in the height direction is split_h_num = 2, the last height-direction input number last_height_in = res_height_in, and the last height-direction output number last_height_out = res_height_out;

step nine, determining the maximum output number max_out_height of one burst transmission in the height direction, and determining the number of burst transmissions split_h_num in the height direction;

step ten, setting the middle-transmission height-direction output number middle_height_out = max_out_height, and determining the middle-transmission height-direction input number middle_height_in;

step eleven, judging whether (Hout - first_height_out) is divisible by the middle-transmission height-direction output number middle_height_out; if so, continuing to step twelve; if not, determining the output number of the last burst transmission and going to step thirteen;

step twelve, setting the last height-direction output number last_height_out = middle_height_out;

step thirteen, determining last_height_in, the number of feature values input in the last burst transmission in the height direction.
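The width-direction and height-direction procedures above share the same shape, so they can be illustrated with one routine. The following is a minimal sketch, not the flow charts of figs. 3 and 4 themselves: the text does not give the formulas for the output and input counts, so the sketch assumes the standard convolution relations out = floor((in + 2*pad - k) / stride) + 1 and in = (out - 1) * stride + k, treats padding as part of the input stream, and makes the first tile the same size as a middle tile. The names `split_axis`, `AxisSplit` and `inputs_needed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisSplit:
    split_num: int   # number of burst tiles along this axis
    first_out: int   # outputs produced by the first tile
    middle_out: int  # outputs produced by each middle tile
    last_out: int    # outputs produced by the last tile
    first_in: int    # inputs consumed by the first tile
    middle_in: int   # inputs consumed by each middle tile
    last_in: int     # inputs consumed by the last tile


def inputs_needed(n_out: int, k: int, stride: int) -> int:
    """Inputs required to produce n_out outputs (assumed standard relation)."""
    return (n_out - 1) * stride + k


def split_axis(n_in: int, k: int, stride: int, pad: int, depth: int) -> AxisSplit:
    """Split one axis into tiles that each fit in a buffer of `depth` entries."""
    if depth < k:
        raise ValueError("buffer depth must hold at least one kernel window")
    total_out = (n_in + 2 * pad - k) // stride + 1
    total_in = inputs_needed(total_out, k, stride)
    if total_in <= depth:
        # Everything fits in one burst (the split_num = 1 branch of step six).
        return AxisSplit(1, total_out, total_out, total_out,
                         total_in, total_in, total_in)
    max_out = (depth - k) // stride + 1          # most outputs one tile can yield
    max_in = inputs_needed(max_out, k, stride)
    split_num = -(-total_out // max_out)         # ceiling division
    # If total_out divides evenly, last_out equals max_out (step twelve);
    # otherwise the last tile carries the remainder (step thirteen).
    last_out = total_out - (split_num - 1) * max_out
    last_in = inputs_needed(last_out, k, stride)
    return AxisSplit(split_num, max_out, max_out, last_out,
                     max_in, max_in, last_in)


# Height direction of an image with Hin=24, Ky=2, Sy=1, Py=1 and y_depth=7.
height = split_axis(n_in=24, k=2, stride=1, pad=1, depth=7)
# Width direction with Win=23, Kx=3, Sx=2, Px=2; depth 7 stands in for burst_len.
width = split_axis(n_in=23, k=3, stride=2, pad=2, depth=7)
print(height)
print(width)
```

Under these assumptions the height direction needs five tiles, four producing six output rows each and a last one producing a single row. The actual hardware may treat the first tile and the padding differently, which is why the text keeps separate first, middle and last parameters.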

Further, the size of the on-chip buffer space depends on the length of the burst transfer.

Further, the characteristic data transmitted into the hardware are subjected to parallel conversion, the characteristic values of a plurality of different channels are spliced together to form a new characteristic matrix, and then the data is transmitted;

and filtering and calculating the parallel image data by using the same linear filter kernel to obtain corresponding output image data.

Further, when the image data is transferred to the on-chip buffer, the data is extracted in an order defined by five nested loops:

the first (innermost) loop is the burst transfer length loop of the AXI bus; transmission starts from the first several data of the first row of the first input channel;

the second loop is the loop in the Ky direction;

the third loop is the loop in the width direction of the input image;

the fourth loop is the loop in the height direction of the input image; the third and fourth loops together completely traverse the image of one channel;

the fifth (outermost) loop is the loop in the channel direction after parallel conversion, through which the entire image is completely traversed.

Further, the buffer is used to store image data from the DDR. The first input data are 1/24/47/70/93/116/139 in the Ky direction, that is, the first column; the next input data are 2/25/48/71/94/117/140, the second column; the input then continues to expand in the width direction, and so on;

at most 7 data can be input at a time in the Ky direction; once the data has been expanded 3 times in the width direction, 6 output results in the Hout direction are calculated at the same time, and after all data in the width direction has been traversed, the requirement of multi-row output is met.

Further, the convolution calculation of the linear filtering calculation module has three stages, namely an initialization stage, a data writing stage and a data reading stage:

the first stage is the set stage, that is, the initialization stage of the memristor; initialization consists of applying a large external forward voltage to the memristor array;

the second stage is the write stage; in the array, the resistance values of the memristors serve as the convolution kernel data, so a large negative voltage is applied to the memristor array according to the convolution kernel data;

the third stage is the read stage, in which the convolution calculation is carried out; the external image data is first converted by the digital-to-analog converter (DAC) into small forward voltages that are input to the array and operate on the conductance values written in the previous stage; the resulting current values then pass through the analog-to-digital converter (ADC), and the output is the filtered image.

A storage medium storing a program file capable of implementing any one of the above-described general-purpose, configurable image filtering calculation multi-line output methods.

A processor for running a program, wherein the program when run performs the generic, configurable image filtering computation multi-line output method of any of the above.

The invention relates to a universal, configurable multi-line output system and method for image filtering calculation, in which a CPU stores control parameters related to an input image in a parameter configuration module. After acquiring the control parameters, the parameter configuration module uses them to control the access operations of the direct memory access module and the convolution calculation of the linear filtering calculation module. According to the control parameters sent by the parameter configuration module, the direct memory access module fetches the matching image input data from the on-chip cache area and sends it to the linear filtering calculation module. The linear filtering calculation module performs the convolution calculation matching the control parameters and sends the result through the direct memory access module to the on-chip cache area, which stores it. The invention uses burst transmission and supports filter kernels of sizes 2 to 7, which greatly enhances the universality of the scheme.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a block diagram of a generic, configurable image filtering computing multi-line output system of the present invention;

FIG. 2 is a block diagram illustrating a filtering process analysis in accordance with the present invention;

wherein FIG. 2 (a) is a list of feature parameters; FIG. 2 (b) is a diagram of the feature data before and after filtering, in which Loop1 to Loop5 represent the five-loop transmission strategy of the DMA;

FIG. 3 is a flow chart of the configuration of the single-row output width direction parameter for filtering calculation in the invention;

FIG. 4 is a flow chart of the configuration of the filtering calculation multi-row output height direction parameters according to the present invention;

FIG. 5 is a data flow diagram of the present invention;

FIG. 6 is a graph of data comparison of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

To solve the technical problem that image filtering cannot output multiple rows of results at the same time, the invention provides a universal, configurable multi-line output system and method for image filtering calculation. The system comprises a direct memory access module, a parameter configuration module, an on-chip buffer and a linear filtering calculation module. The invention uses burst transmission and supports filter kernels of sizes 2 to 7, which greatly enhances the universality of the scheme.

Referring to fig. 1-2, a general-purpose, configurable image filtering computing multi-line output system includes a direct memory access module, a parameter configuration module, an on-chip buffer (preferably an off-chip memory DDR), a linear filtering computing module; the CPU stores control parameters related to the input image into a parameter configuration module through an AXI-lite bus, the parameter configuration module stores the control parameters calculated in advance after acquiring data from the CPU, and the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module are controlled by utilizing the control parameters calculated in advance; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.

The linear filtering calculation module comprises a buffer, a control unit and a calculation unit. The buffer receives the input data fetched from the off-chip DDR memory and sends it to the calculation unit. The calculation unit receives the input data from the buffer, generates the calculated output result, and sends it through the direct memory access module to the off-chip DDR memory for storage.

The control parameters calculated by the parameter configuration module of the upper computer comprise a plurality of parameters related to the on-chip cache space, wherein the plurality of parameters comprise the width Wout of the output characteristic, the height Hout of the output characteristic, the number of input channels CHout and the like.

The size of the on-chip buffer space depends on the length of the burst transfer.

The invention designs a general and configurable image filtering calculation multi-line output method.

The universal, configurable multi-line output method for image filtering calculation is shown in figs. 3 and 4. Calculating the width-direction parameters for single-row and multi-row output includes:

step one, inputting the burst transmission length burst_len;

step two, setting the width-direction depth x_depth = burst_len;

step five, calculating the number of width-direction inputs width_in_test;

Calculating the height-direction parameters for multi-row output includes:

step one, inputting the burst transmission length burst_len;

step two, setting the height-direction depth y_depth = 7;

step five, calculating the number of height-direction inputs height_in_test;

1. Image filtering analysis

As shown in fig. 2, the goal of image filtering is to downsample the input features (the output matrix of the image or hidden layer in the CNN network) while reducing the dimensions of the input features.

As can be seen from the figure, to accelerate the convolution filtering calculation, the feature data transmitted into the hardware must first undergo parallel conversion: the feature values of Tin different channels (Tin is called the parallelism) are spliced together to form a new feature matrix Feature[number of input image channels / Tin][input image height][input image width][Tin], and the data is then transmitted. As shown in fig. 2, the parallel image data is filtered by the same linear filter kernel to obtain the corresponding output image data; this is the function realized by the overall hardware design. The DMA module is responsible for transferring the image data in the DDR to the on-chip cache, and the figure also shows the order in which data is extracted, which consists of five loops.
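The parallel conversion described above can be sketched as a layout change from [channel][row][column] to [channel group][row][column][lane]. This is an illustrative sketch only: the function name `to_parallel_layout` is hypothetical, and zero-filling the unused lanes when the channel count is not a multiple of Tin is an assumption that the text does not state.

```python
def to_parallel_layout(image, tin):
    """Repack image[ch][y][x] into packed[ch // tin][y][x][ch % tin].

    Channels beyond the last real channel are filled with 0 (assumption).
    """
    ch = len(image)
    h = len(image[0])
    w = len(image[0][0])
    groups = -(-ch // tin)  # ceiling division: number of channel groups
    packed = []
    for g in range(groups):
        plane = []
        for y in range(h):
            row = []
            for x in range(w):
                # Splice the values of tin neighbouring channels at one pixel.
                lanes = [
                    image[g * tin + t][y][x] if g * tin + t < ch else 0
                    for t in range(tin)
                ]
                row.append(lanes)
            plane.append(row)
        packed.append(plane)
    return packed


# A 3-channel 2x2 image whose values encode (channel, row, column).
image = [[[c * 100 + y * 10 + x for x in range(2)] for y in range(2)]
         for c in range(3)]
packed = to_parallel_layout(image, tin=2)
print(packed[0][1][0])  # channels 0 and 1 at pixel (row 1, column 0)
print(packed[1][0][1])  # channel 2 plus one zero-filled lane
```

With this layout, one burst word carries the same pixel position for Tin channels, so a single filter kernel can be applied to all lanes in parallel.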

The first loop is the burst transfer length loop of the AXI bus. Data is transferred from the off-chip memory to the on-chip memory over the AXI bus, so several parallel data packets (equal in number to the burst transfer length) are transferred at once, starting from the first several data of the first row of the first input channel.

The second loop is the loop in the Ky direction. It allows multiple rows in the height direction to be calculated together, so that the resulting output can be transmitted.

The third loop is the loop in the width direction of the input image and the fourth loop is the loop in the height direction of the input image; together they completely traverse the image of one channel. The last loop is the loop in the channel direction after parallel conversion, through which the entire image is completely traversed.

As shown in fig. 5, the input image parameters are: win=23, hin=24, chi=3, ky=2, kx=3, sx=2, sy=1, px=2, py=1. The first block of data to be transferred belongs to first_height_in: it spans 1/2/3/4/5/6/7 in the width direction and 1/24/47/70/93/116/139 in the height direction, seven rows and seven columns in total. The second block spans 7/8/9/10/11/12/13 in the width direction and 7/30/53/76/99/122/145 in the height direction, again seven rows and seven columns. Transfer proceeds in the same way through first_width_in/middle_width_in/last_width_in until the data in the width direction has been transmitted. The transfer then extends in the height direction, traversing first_width_in through last_width_in in the width direction at each stage, until the middle_height_in data has been reached. Finally, the first_width_in/middle_width_in/last_width_in data of last_height_in are traversed, completing the whole input image.
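The five-loop extraction order can be sketched as a generator of element indices, using the 1-based numbering of fig. 5 (element 24 is the first element of the second row because win=23). This is a simplified sketch under stated assumptions: padding is ignored, tiles at the right and bottom edges are simply clipped, and the tile advances `x_step` and `y_step` are hypothetical parameters chosen here as 6 so that neighbouring tiles overlap by one column and one row, as in the example above.

```python
def dma_order(win, hin, groups, x_depth, y_depth, x_step, y_step):
    """Yield (channel_group, element_index) in the five-loop transfer order."""
    for g in range(groups):                      # loop 5: channel groups
        for y0 in range(0, hin, y_step):         # loop 4: tiles along the height
            for x0 in range(0, win, x_step):     # loop 3: tiles along the width
                for ky in range(y_depth):        # loop 2: rows inside one tile
                    y = y0 + ky
                    if y >= hin:
                        break                    # clip at the bottom edge
                    for b in range(x_depth):     # loop 1: one burst along a row
                        x = x0 + b
                        if x >= win:
                            break                # clip at the right edge
                        yield g, y * win + x + 1


seq = [idx for _, idx in dma_order(win=23, hin=24, groups=1,
                                   x_depth=7, y_depth=7, x_step=6, y_step=6)]
print(seq[:7])     # first burst: first seven elements of the first row
print(seq[7:14])   # second burst: the same columns of the second row
print(seq[49:56])  # first burst of the second tile in the width direction
```

The first tile is filled by seven bursts of seven elements, after which the generator moves to the tile that starts at element 7, matching the second block described above.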

2. Linear filtering calculation module

The buffer stores the image data from the DDR. To realize multi-line output of the calculated data, the calculation of the invention proceeds along the Ky direction. Fig. 5 (a) shows the first block of input data from fig. 5 (b), with convolution kernel size ky=2, kx=3, step sy=1 in the Ky direction and step sx=2 in the Kx direction. During calculation, the first input data are 1/24/47/70/93/116/139 in the Ky direction, that is, all data in the first column; the next input data are 2/25/48/71/94/117/140, the second column; the input then continues to expand in the width direction, and so on. At most 7 data can be input at a time in the Ky direction. Once the data has been expanded 3 times in the width direction, that is, after 7x3 data have been input, 6 output results in the Hout direction can be calculated at the same time; after all data in the width direction has been traversed, the requirement of multi-row output is met.
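To make the column-by-column calculation concrete, the sketch below accumulates six output rows while three columns of seven values arrive one after another. It is an illustration under assumptions: the function name `stream_columns` is hypothetical, and an all-ones kernel is used only so that the results are easy to check by hand.

```python
def stream_columns(columns, kernel, sy):
    """Accumulate one output per row window as kernel-width columns arrive.

    columns[c][r] is the value at row r of the c-th input column;
    kernel[j][c] is the weight at kernel row j and kernel column c.
    """
    ky = len(kernel)
    kx = len(kernel[0])
    depth = len(columns[0])
    n_out = (depth - ky) // sy + 1      # 7 rows, ky=2, sy=1 -> 6 outputs
    acc = [0] * n_out
    for c in range(kx):                 # one input column per step
        col = columns[c]
        for r in range(n_out):          # every output row is updated at once
            for j in range(ky):
                acc[r] += kernel[j][c] * col[r * sy + j]
    return acc


win = 23
# Columns 1/24/47/..., 2/25/48/... and 3/26/49/... in the fig. 5 numbering.
columns = [[row * win + c + 1 for row in range(7)] for c in range(3)]
kernel = [[1, 1, 1],
          [1, 1, 1]]                    # hypothetical weights, ky=2, kx=3
outputs = stream_columns(columns, kernel, sy=1)
print(outputs)
```

After the third column, `outputs` holds six values, one per output row; the first is 1+2+3+24+25+26 = 81.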

3. Implementing filtering computation in memristors

In the invention, the calculation unit is implemented by a memristor array.

The operation of the memristor has three stages: an initialization stage, a data writing stage and a data reading stage.

The first stage, the set stage, is the initialization stage of the memristor. Before the memristor can work, the device must be initialized and woken up. Initialization consists of applying a large external forward voltage to the memristor array so that the resistance of each memristor is at its lowest and its conductance at its highest; after this stage the memristor can work normally.

The second stage is the write stage. In the array, the resistance values of the memristors serve as the convolution kernel data. A large negative voltage is applied to the memristor array according to the convolution kernel data, after which the conductance value of each memristor represents the magnitude of the corresponding convolution kernel parameter.

The third stage, the read stage, is where the convolution calculation is carried out. The external image data is first converted by the digital-to-analog converter (DAC) into small forward voltages that are input to the array and operate on the conductance values written in the previous stage. The resulting current values then pass through the analog-to-digital converter (ADC), and the output is the filtered image.
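The read stage can be illustrated with an idealized crossbar model in which each column current is the sum of input voltage times conductance. This is a sketch under assumptions, not the circuit of the invention: the scaling constants `V_LSB` and `G_UNIT` are hypothetical, device non-idealities are ignored, and only non-negative kernel weights are modelled.

```python
V_LSB = 0.01    # volts per pixel level produced by the DAC (hypothetical)
G_UNIT = 1e-6   # siemens per unit of kernel weight (hypothetical)


def crossbar_read(voltages, conductances):
    """Ideal crossbar: current of column j = sum_i voltages[i] * conductances[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


# One 2x3 input patch (rows 1/2/3 and 24/25/26) flattened onto six word lines.
patch = [1, 2, 3, 24, 25, 26]
weights = [1, 1, 1, 1, 1, 1]                  # kernel written in the write stage
voltages = [p * V_LSB for p in patch]         # DAC: pixel value -> small voltage
conductances = [[w * G_UNIT] for w in weights]
currents = crossbar_read(voltages, conductances)
# ADC: scale the column current back to a digital filter output.
result = round(currents[0] / (V_LSB * G_UNIT))
print(result)
```

The multiply-accumulate happens in the analog domain: each memristor contributes a current proportional to its input voltage and conductance, and the column wire sums those currents.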

Compared with the prior art, the invention has the beneficial effects that:

To reduce the circuit area, the invention uses multi-row output, meaning that the final result the calculation unit returns to the external DDR comprises multiple rows. When setting the parameters for multi-row output, the first consideration is that the maximum depth of the buffer unit equals 7, because current filtering operations use convolution kernels ranging in size from 2 to 7. The storage depth in the height direction is therefore set to 7; the depth of the buffer unit determines the maximum number of output rows and thereby the number of input rows.

Compared with conventional convolution multi-row output, the innovation of the invention is the ability to transmit multiple rows of data at a time. The conventional method determines the number of rows to carry in each time, in the height direction of the input image, from the height Ky of the convolution kernel. When Ky is not equal to the height-direction step Sy, data overlap arises in the height direction (overlap_y), that is, the same row of data is carried multiple times.

Fig. 5 shows an input image of 23x24 with the parameters win=23, hin=24, chi=3, ky=2, kx=3, sx=2, sy=1, px=2, py=1. Then overlap_y = 2 - 1 = 1; that is, 2 rows of data are carried each time, one of which repeats a row carried the previous time. In fig. 5, calculating the first pixel of the output image requires carrying two rows of data, 1/2/3 and 24/25/26; calculating the next pixel in the Ky direction requires carrying 24/25/26 and 47/48/49. The data 24/25/26 is thus carried twice. Once the parameters of the input image are determined, the height Hout of the output image can be calculated. In the example of fig. 5, Hout = 25, so the conventional multi-row output needs to carry 25×2 = 50 rows of data.
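The row counts in this example can be reproduced with a few lines of arithmetic. The sketch assumes the standard output-size relation Hout = floor((Hin + 2*Py - Ky) / Sy) + 1, which the text does not state but which agrees with the Hout = 25 given for fig. 5; the function names are hypothetical.

```python
def out_size(n_in, k, stride, pad):
    """Output length along one axis (assumed standard convolution relation)."""
    return (n_in + 2 * pad - k) // stride + 1


def rows_carried(hin, ky, sy, py):
    """Return (hout, overlap_y, rows carried conventionally, rows carried here)."""
    hout = out_size(hin, ky, sy, py)
    overlap_y = ky - sy
    conventional = hout * ky   # ky rows are fetched for every output row
    proposed = hin             # every input row is fetched exactly once
    return hout, overlap_y, conventional, proposed


print(rows_carried(hin=24, ky=2, sy=1, py=1))   # the fig. 5 example
print(rows_carried(hin=24, ky=2, sy=2, py=0))   # a case with overlap_y = 0
```

For the fig. 5 parameters this gives 50 rows for the conventional scheme against 24 for the multi-row scheme. In the second call, where Ky equals Sy and no padding is used, both schemes carry 24 rows.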

The key point of the multi-line output realized by the invention is that the size of a convolution kernel is not required to be concerned during data transmission, but the multi-line data is directly carried, and then the multi-line output is realized by the operation in the Ky direction in a calculation module. As shown in fig. 5, the number of data lines to be carried is 50 in the conventional multi-line output, and the method of the present invention only needs to carry hin=24 lines of data.

FIG. 6 is a table comparing, for input images of various sizes, the number of data lines to be carried by conventional multi-line output with the number of data lines carried by the present invention. As can be seen from examples 1 and 4 in the table, when the size Ky of the convolution kernel in the height direction equals the stride Sy in the height direction, that is, when overlap_y is 0, the two methods carry the same amount of data. When overlap_y is non-zero, as can be seen from examples 2, 3 and 5, the method of the invention carries far less data than the conventional method. Therefore, compared with the prior method, the multi-line output method of the invention reduces repeated carrying of data and improves memory access efficiency.
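The relationship described for FIG. 6 can be checked for a few configurations. The values below are hypothetical and are not the entries of FIG. 6; the sketch assumes no padding, an input height that Ky divides evenly in the zero-overlap cases, and the usual convolution output-size formula.

```python
def lines_carried(hin: int, ky: int, sy: int) -> tuple[int, int]:
    """Return (conventional, proposed) line counts, assuming no padding."""
    hout = (hin - ky) // sy + 1      # assumed output-height formula
    return hout * ky, hin            # Ky lines per output line vs. each line once


# (Hin, Ky, Sy) triples chosen for illustration only.
configs = [(24, 2, 2), (24, 3, 3), (24, 2, 1), (24, 3, 1), (24, 5, 2)]

for hin, ky, sy in configs:
    conventional, proposed = lines_carried(hin, ky, sy)
    print(f"Hin={hin} Ky={ky} Sy={sy} overlap_y={ky - sy} "
          f"conventional={conventional} proposed={proposed}")
```

Under these assumptions the two counts coincide whenever Ky equals Sy, and the conventional count grows with overlap_y while the proposed count stays at Hin.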

Simulation shows that the invention is feasible, and the results show that efficiency is greatly improved compared with the original scheme. The invention can also be extended to use the input channels of the image as the parallel dimension of the filtering calculation.

Example 2

Example 3

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for any portion not described in detail in a given embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The system embodiments described above are merely exemplary. For example, the division of units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be made through some interfaces, units or modules, and may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing program code.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of the present invention.

Claims

1. A universal, configurable image filtering computing multi-line output system, comprising: the system comprises a direct memory access module, a parameter configuration module, an on-chip cache region and a linear filtering calculation module; wherein:

2. The universal, configurable image filtering computing multi-line output system of claim 1 wherein the CPU stores control parameters associated with the input image to the parameter configuration module via an AXI-lite bus; after the parameter configuration module obtains the control parameters from the CPU, the control parameters are utilized to control the access operation of the direct memory access module and control the convolution calculation of the linear filtering calculation module; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.

3. The universal, configurable image filtering computing multi-line output system of claim 1 wherein the linear filtering computing module comprises a buffer, a control unit and a computing unit; the buffer receives and extracts input data from the off-chip memory DDR and sends the input data to the computing unit; the computing unit receives the input data from the buffer, generates a computed output result, and sends the computed output result to the off-chip memory DDR for storage through the direct memory access module.

4. A universal, configurable image filtering computation multi-line output method, comprising the steps of:

5. The universal configurable image filtering computing multi-line output method of claim 4 wherein computing single-line, multi-line output width direction parameters comprises:

step one, inputting a burst transfer length burst_len;

step two, setting the width-direction depth x_depth equal to the burst transfer length burst_len;

step five, calculating the number of width-direction inputs width_in_test;

6. The universal configurable image filtering computing multi-line output method of claim 4 wherein computing multi-line output height direction parameters comprises:

step one, inputting a burst transfer length burst_len;

step two, setting the height-direction depth y_depth=7;

step five, calculating the number of height-direction inputs height_in_test;

7. The method for computing multi-line output by general-purpose configurable image filtering according to claim 4, wherein the feature data transmitted into hardware are converted in parallel, the feature values of a plurality of different channels are spliced together to form a new feature matrix, and then the data is transmitted;

8. The method of claim 7, wherein the image data is transferred to the on-chip buffer, comprising five cycles according to the order in which the data is extracted:

the second layer cycle is a cycle in the Ky direction;

the third layer cycle is a cycle in the input image width direction;

9. The method of claim 4, wherein a buffer is used to store image data from the DDR; the first input data are 1/24/47/70/93/116/139 along the Ky direction, the second column of input data is 2/25/48/71/94/117/140, and the data continue to expand in the width direction in the same manner;

10. The method of claim 4, wherein the convolution calculation of the linear filter calculation module has three stages, namely an initialization stage, a data writing stage and a data reading stage: