CN116010313A - Universal and configurable image filtering calculation multi-line output system and method - Google Patents

Publication number
CN116010313A
CN116010313A CN202211506014.1A CN202211506014A CN116010313A CN 116010313 A CN116010313 A CN 116010313A CN 202211506014 A CN202211506014 A CN 202211506014A CN 116010313 A CN116010313 A CN 116010313A
Authority
CN
China
Prior art keywords
height
width
width direction
last
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211506014.1A
Other languages
Chinese (zh)
Inventor
黄明强
陈嘉豪
马文凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202211506014.1A
Publication of CN116010313A
Priority to PCT/CN2023/133763 (WO2024114505A1)
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to a general-purpose, configurable system and method for image filtering calculation with multi-line output. In the system and the method, a CPU stores control parameters related to an input image in a parameter configuration module; after acquiring the control parameters, the parameter configuration module uses them to control the access operations of the direct memory access module and the convolution calculation of the linear filtering calculation module; the direct memory access module acquires image input data matching the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs the convolution calculation matching the control parameters sent by the parameter configuration module and sends the calculation result through the direct memory access module to the on-chip cache area, which stores it. The invention solves the technical problem that existing filtering calculations cannot output multiple rows of results at the same time.

Description

Universal and configurable image filtering calculation multi-line output system and method
Technical Field
The invention relates to the field of image filtering, in particular to a general configurable image filtering calculation multi-line output system and method.
Background
Filtering is a common operation in image processing. Its main effect is to filter out specific frequency bands of the image, which helps reduce or eliminate the influence of environmental noise, or of noise generated during signal extraction, on the image. This improves image quality and eases subsequent image processing, which then operates on the image information in the retained frequency bands. According to the band being filtered, filtering serves two functions: (1) reducing the effect of noise in the image, which is typically achieved by low-pass filtering, since noise is usually a high-frequency signal relative to the image; (2) extracting key information from the image.
Existing similar schemes filter the input image with a filter kernel of a single size. In existing image filtering schemes the data read address jumps frequently, efficient burst transmission is not used, and the data flow is neither efficient nor flexible. Filtering uses a fixed 5*5 kernel, so images of several types cannot be filtered and the scheme lacks generality. When such a scheme is implemented on a memristor array, memristor utilization is low.
Disclosure of Invention
The embodiment of the invention provides a general and configurable image filtering calculation multi-line output system and method, which at least solve the technical problem that multi-line results cannot be output at the same time when filtering calculation is performed in the prior art.
According to an embodiment of the present invention, there is provided a general purpose, configurable image filtering computing multi-line output system, including: the system comprises a direct memory access module, a parameter configuration module, an on-chip cache region and a linear filtering calculation module; wherein:
the CPU stores control parameters related to the input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters;
the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result.
Further, the CPU stores control parameters related to the input image to a parameter configuration module through an AXI-lite bus; after the parameter configuration module obtains the control parameters from the CPU, the control parameters are utilized to control the access operation of the direct memory access module and control the convolution calculation of the linear filtering calculation module; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.
Further, the linear filtering calculation module comprises a buffer zone, a control unit and a calculation unit; the buffer area receives and extracts input data from the DDR (double data rate) of the off-chip memory and sends the input data to the computing unit; the computing unit receives the input data of the buffer area, generates a computed output result, and sends the computed output result to the off-chip memory DDR for storage through the direct memory access module.
Further, the control parameters calculated by the parameter configuration module include several parameters related to the on-chip buffer space, including the width Wout of the output feature, the height Hout of the output feature, the number of input channels CHout, and so on.
According to another embodiment of the present invention, there is provided a general-purpose, configurable image filtering calculation multi-line output method including the steps of:
the CPU stores control parameters related to the input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters;
the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result.
Further, calculating the width-direction parameters for single-row and multi-row output includes:
Step one, inputting the burst transmission length burst_len;
Step two, setting the width-direction depth x_depth = burst_len;
Step three, calculating the width-direction output number width_out_test;
Step four, comparing width_out_test with Wout, and if width_out_test is smaller than Wout, padding to the right;
Step five, calculating the width-direction input number width_in_test;
Step six, judging whether width_in_test is not greater than x_depth; if it is greater, continuing to step seven; if it is not greater, the number of burst transmissions in the width direction is split_w_num = 1, the first and last width-direction output numbers first_width_out and last_width_out both equal Wout, and the first and last width-direction input numbers first_width_in and last_width_in both equal width_in_test;
Step seven, when width_in_test is greater than x_depth, determining the first width-direction input number first_width_in, the remaining width-direction input number res_width_in, the first width-direction output number first_width_out and the remaining width-direction output number res_width_out;
Step eight, judging whether res_width_in is not greater than x_depth; if it is greater, continuing to step nine; if it is not greater, the number of burst transmissions in the width direction is split_w_num = 2, the last width-direction input number last_width_in = res_width_in, and the last width-direction output number last_width_out = res_width_out;
Step nine, determining the maximum output number max_out_width of one burst transmission in the width direction, and determining the number of burst transmissions split_w_num in the width direction;
Step ten, setting the middle-transmission width-direction output number middle_width_out = max_out_width, and determining the middle-transmission width-direction input number middle_width_in;
Step eleven, judging whether (Wout - first_width_out) is divisible by middle_width_out; if so, continuing to step twelve; if not, determining the output number of the last burst transmission and going to step thirteen;
Step twelve, setting the last width-direction output number last_width_out = middle_width_out;
Step thirteen, determining the number last_width_in of feature values input by the last burst transmission in the width direction.
Further, calculating the height-direction parameters for multi-row output includes:
Step one, inputting the burst transmission length burst_len;
Step two, setting the height-direction depth y_depth = 7;
Step three, calculating the height-direction output number height_out_test;
Step four, comparing height_out_test with Hout, and if height_out_test is smaller than Hout, padding;
Step five, calculating the height-direction input number height_in_test;
Step six, judging whether height_in_test is not greater than y_depth; if it is greater, continuing to step seven; if it is not greater, the number of burst transmissions in the height direction is split_h_num = 1, the first and last height-direction output numbers first_height_out and last_height_out both equal Hout, and the first and last height-direction input numbers first_height_in and last_height_in both equal height_in_test;
Step seven, when height_in_test is greater than y_depth, determining the first height-direction input number first_height_in, the remaining height-direction input number res_height_in, the first height-direction output number first_height_out and the remaining height-direction output number res_height_out;
Step eight, judging whether res_height_in is not greater than y_depth; if it is greater, continuing to step nine; if it is not greater, the number of burst transmissions in the height direction is split_h_num = 2, the last height-direction input number last_height_in = res_height_in, and the last height-direction output number last_height_out = res_height_out;
Step nine, determining the maximum output number max_out_height of one burst transmission in the height direction, and determining the number of burst transmissions split_h_num in the height direction;
Step ten, setting the middle-transmission height-direction output number middle_height_out = max_out_height, and determining the middle-transmission height-direction input number middle_height_in;
Step eleven, judging whether (Hout - first_height_out) is divisible by middle_height_out; if so, continuing to step twelve; if not, determining the output number of the last burst transmission and going to step thirteen;
Step twelve, setting the last height-direction output number last_height_out = middle_height_out;
Step thirteen, determining the number last_height_in of feature values input by the last burst transmission in the height direction.
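The width and height procedures above follow the same pattern, so they can be illustrated with a single helper. The following is a minimal Python sketch, not the patented implementation. The text does not give the size formulas, so the sketch assumes an unpadded convolution in which n outputs need (n - 1) * stride + kernel inputs; it reads step six as "one transfer when all inputs fit within the depth"; and it skips the padding of steps three and four. The names split_axis and inputs_for are hypothetical.

```python
from math import ceil


def inputs_for(n_out, kernel, stride=1):
    """Inputs needed along one axis to produce n_out outputs (assumes no padding)."""
    return (n_out - 1) * stride + kernel


def split_axis(out_total, kernel, depth, stride=1):
    """Split one axis into burst transfers that each fit a buffer of `depth` inputs.

    Assumed reading of steps five to thirteen; returns the first/middle/last
    input and output counts and the number of transfers.
    """
    assert depth >= kernel, "buffer depth must hold at least one kernel window"
    in_total = inputs_for(out_total, kernel, stride)              # step five
    if in_total <= depth:                                         # step six
        return {"split_num": 1,
                "first_in": in_total, "first_out": out_total,
                "middle_in": 0, "middle_out": 0,
                "last_in": in_total, "last_out": out_total}

    max_out = (depth - kernel) // stride + 1                      # step nine
    first_out = max_out                                           # step seven
    first_in = inputs_for(first_out, kernel, stride)
    res_out = out_total - first_out
    res_in = inputs_for(res_out, kernel, stride)
    if res_in <= depth:                                           # step eight
        return {"split_num": 2,
                "first_in": first_in, "first_out": first_out,
                "middle_in": 0, "middle_out": 0,
                "last_in": res_in, "last_out": res_out}

    middle_out = max_out                                          # step ten
    middle_in = inputs_for(middle_out, kernel, stride)
    remainder = res_out % middle_out                              # step eleven
    last_out = middle_out if remainder == 0 else remainder        # step twelve
    last_in = inputs_for(last_out, kernel, stride)                # step thirteen
    return {"split_num": 1 + ceil(res_out / middle_out),
            "first_in": first_in, "first_out": first_out,
            "middle_in": middle_in, "middle_out": middle_out,
            "last_in": last_in, "last_out": last_out}
```

Under these assumptions, split_axis(40, kernel=3, depth=16) models a width direction with burst_len = 16 and yields three transfers producing 14, 14 and 12 outputs. split_axis(20, kernel=3, depth=7) models the height direction with y_depth = 7 and yields four transfers of 5 output rows each.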
Further, the size of the on-chip buffer space depends on the length of the burst transfer.
Further, the characteristic data transmitted into the hardware are subjected to parallel conversion, the characteristic values of a plurality of different channels are spliced together to form a new characteristic matrix, and then the data is transmitted;
and filtering and calculating the parallel image data by using the same linear filter kernel to obtain corresponding output image data.
Further, when the image data is transferred to the on-chip buffer, five loops are included according to the order of extracting the data:
the first layer cycle is a burst transfer length cycle of the AXI bus; data is transmitted from the first several data of the first row of the first input channel;
the second layer cycle is a cycle in the Ky direction;
the third layer cycle is a cycle in the input image width direction;
the fourth layer cycle is a cycle in the height direction of the input image; through the third and fourth layer cycles the image of one channel is completely traversed;
the last layer is a cycle in the channel direction after parallelization, through which the entire image is completely traversed.
Further, a buffer is used to store the image data from the DDR. The first input data are 1/24/47/70/93/116/139 along the Ky direction, the second column of input data is 2/25/48/71/94/117/140, and the expansion then continues in the width direction, and so on;
at most 7 data can be input at a time in the Ky direction. Once the data have been expanded 3 times in the width direction, 6 output results in the Hout direction are calculated at the same time, and after all data in the width direction have been traversed, the requirement of multi-row output is met.
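The index sequences quoted above can be reproduced with a short Python sketch. It assumes a row-major image with 1-based element numbering and a row width of 23, which is inferred only from the spacing of the quoted indices (24 - 1 = 23). The function names are hypothetical. The rows_per_pass helper further assumes a stride of 1 and no padding, which the text does not state.

```python
def ky_read_order(image_width, y_depth, n_columns):
    """Return, per column, the 1-based row-major indices read along the Ky direction."""
    return [[col + row * image_width for row in range(y_depth)]
            for col in range(1, n_columns + 1)]


def rows_per_pass(y_depth, ky, stride=1):
    """Output rows obtainable from y_depth buffered input rows (assumed: no padding)."""
    return (y_depth - ky) // stride + 1


columns = ky_read_order(image_width=23, y_depth=7, n_columns=2)
print(columns[0])  # [1, 24, 47, 70, 93, 116, 139]
print(columns[1])  # [2, 25, 48, 71, 94, 117, 140]
```

Under the same assumptions, rows_per_pass(7, 2) evaluates to 6. That is one reading consistent with the 6 simultaneous output rows mentioned above; the text itself does not state the kernel size used in this example.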
Further, the convolution calculation of the linear filtering calculation module has three stages, namely an initialization stage, a data writing stage and a data reading stage:
the first stage is the set stage, that is, the initialization stage of the memristor; specifically, a large external forward voltage is applied to the memristor array;
the second stage is the write stage; in the array, the resistance values of the memristors serve as the convolution kernel data for the operation, so a large negative voltage is applied again to the memristor array according to the convolution kernel data;
the third stage is the read stage, in which the convolution calculation is carried out: external image data are converted by the digital-to-analog converter (DAC) into small forward voltages that are input to the array and operated on with the conductance values written in the previous stage; the resulting current values pass through the analog-to-digital converter (ADC), and the output result is the filtered image.
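The read stage can be illustrated numerically. In an ideal crossbar, each column current is the sum of the input voltages multiplied by the conductances on that column (Ohm's law plus Kirchhoff's current law). The Python sketch below models only that idealized multiply-accumulate. It ignores DAC/ADC quantization and device non-idealities, and the function name and the example values are hypothetical.

```python
def crossbar_read(voltages, conductances):
    """Ideal crossbar read: column current j = sum_i voltages[i] * conductances[i][j].

    voltages     : one input voltage per row (a flattened image window)
    conductances : conductances[i][j] for row i and column j (one flattened kernel per column)
    """
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


# A 2x2 image window flattened to four row voltages, and a 2x2 mean
# filter stored as one column of equal conductances (illustrative values).
window = [1.0, 2.0, 3.0, 4.0]
mean_kernel = [[0.25], [0.25], [0.25], [0.25]]
print(crossbar_read(window, mean_kernel))  # [2.5]
```

Adding more columns to the conductance matrix would evaluate several kernels on the same window in one read.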
A storage medium storing a program file capable of implementing any one of the above-described general-purpose, configurable image filtering calculation multi-line output methods.
A processor for running a program, wherein the program when run performs the generic, configurable image filtering computation multi-line output method of any of the above.
The invention relates to a general configurable image filtering calculation multi-line output system and a method, wherein a CPU stores control parameters related to an input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters; the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result. The invention uses burst transmission and can support the filtering of the filtering core with the size of 2-7, thus greatly enhancing the universality of the scheme.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a block diagram of a generic, configurable image filtering computing multi-line output system of the present invention;
FIG. 2 is a block diagram illustrating a filtering process analysis in accordance with the present invention;
wherein FIG. 2(a) is a list of feature parameters; FIG. 2(b) shows the feature data before and after filtering, in which Loop1 to Loop5 denote the five-loop transmission strategy of the DMA;
FIG. 3 is a flow chart of the configuration of the single-row output width direction parameter for filtering calculation in the invention;
FIG. 4 is a flow chart of the configuration of the filtering calculation multi-row output height direction parameters according to the present invention;
FIG. 5 is a data flow diagram of the present invention;
FIG. 6 is a graph of data comparison of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The invention provides a general and configurable image filtering calculation multi-line output system and method for solving the technical problem that image filtering can not output results in a plurality of lines at the same time, wherein the system comprises a direct memory access module, a parameter configuration module, an on-chip buffer area and a linear filtering calculation module; in the invention, burst transmission is used, and filtering cores with the size of 2-7 can be supported for filtering, so that the universality of the scheme is greatly enhanced.
Referring to fig. 1-2, a general-purpose, configurable image filtering computing multi-line output system includes a direct memory access module, a parameter configuration module, an on-chip buffer (preferably an off-chip memory DDR), a linear filtering computing module; the CPU stores control parameters related to the input image into a parameter configuration module through an AXI-lite bus, the parameter configuration module stores the control parameters calculated in advance after acquiring data from the CPU, and the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module are controlled by utilizing the control parameters calculated in advance; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.
The linear filtering calculation module comprises a buffer zone, a control unit and a calculation unit; the buffer area receives and extracts input data from the DDR (double data rate) of the off-chip memory and sends the input data to the computing unit; the computing unit receives the input data of the buffer area, generates a computed output result, and sends the computed output result to the off-chip memory DDR for storage through the direct memory access module.
The control parameters calculated by the parameter configuration module of the upper computer comprise a plurality of parameters related to the on-chip cache space, wherein the plurality of parameters comprise the width Wout of the output characteristic, the height Hout of the output characteristic, the number of input channels CHout and the like.
The size of the on-chip buffer space depends on the length of the burst transfer.
The invention designs a general and configurable image filtering calculation multi-line output method.
The general, configurable image filtering calculation multi-line output method is shown in FIG. 3 and FIG. 4. Calculating the width-direction parameters for single-row and multi-row output includes:
Step one, inputting the burst transmission length burst_len;
Step two, setting the width-direction depth x_depth = burst_len;
Step three, calculating the width-direction output number width_out_test;
Step four, comparing width_out_test with Wout, and if width_out_test is smaller than Wout, padding to the right;
Step five, calculating the width-direction input number width_in_test;
Step six, judging whether width_in_test is not greater than x_depth; if it is greater, continuing to step seven; if it is not greater, the number of burst transmissions in the width direction is split_w_num = 1, the first and last width-direction output numbers first_width_out and last_width_out both equal Wout, and the first and last width-direction input numbers first_width_in and last_width_in both equal width_in_test;
Step seven, when width_in_test is greater than x_depth, determining the first width-direction input number first_width_in, the remaining width-direction input number res_width_in, the first width-direction output number first_width_out and the remaining width-direction output number res_width_out;
Step eight, judging whether res_width_in is not greater than x_depth; if it is greater, continuing to step nine; if it is not greater, the number of burst transmissions in the width direction is split_w_num = 2, the last width-direction input number last_width_in = res_width_in, and the last width-direction output number last_width_out = res_width_out;
Step nine, determining the maximum output number max_out_width of one burst transmission in the width direction, and determining the number of burst transmissions split_w_num in the width direction;
Step ten, setting the middle-transmission width-direction output number middle_width_out = max_out_width, and determining the middle-transmission width-direction input number middle_width_in;
Step eleven, judging whether (Wout - first_width_out) is divisible by middle_width_out; if so, continuing to step twelve; if not, determining the output number of the last burst transmission and going to step thirteen;
Step twelve, setting the last width-direction output number last_width_out = middle_width_out;
Step thirteen, determining the number last_width_in of feature values input by the last burst transmission in the width direction.
Calculating the multi-row output elevation direction parameter includes:
step one, inputting a BURST transmission length burst_len;
step two, DEPTH y_depth=7 in the height direction;
step three, calculating the number height_out_test output in the height direction;
step four, comparing the height_out_test with the Hout, and if the height_out_test is smaller than the Wout, filling the number if the height_out_test is smaller than the Wout;
step five, calculating the number height_in_test input in the height direction;
step six, judging whether the number of height-direction inputs height_in_test is greater than the height-direction depth; if so, continuing to step seven; if not, the number of burst transmissions in the height direction is split_h_num=1, the first height-direction output number first_height_out and the last height-direction output number last_height_out are both equal to Hout, and the first height-direction input number first_height_in and the last height-direction input number last_height_in are both equal to height_in_test;
step seven, if the number of height-direction inputs is greater than the height-direction depth, determining the first height-direction input number first_height_in, the residual height-direction input number res_height_in, the first height-direction output number first_height_out and the residual height-direction output number res_height_out;
step eight, judging whether the residual height-direction input number res_height_in is greater than the height-direction depth Y_DEPTH; if so, continuing to step nine; if not, the number of burst transmissions in the height direction is split_h_num=2, the last height-direction input number last_height_in = the residual height-direction input number res_height_in, and the last height-direction output number last_height_out = the residual height-direction output number res_height_out;
step nine, determining the maximum output number max_out_height of one burst transmission in the height direction, and determining the number split_h_num of burst transmissions in the height direction;
step ten, setting the middle-transmission height-direction output number middle_height_out = the maximum output number of one burst transmission max_out_height, and determining the middle-transmission height-direction input number middle_height_in;
step eleven, judging whether the total output number Hout minus the first height-direction output number first_height_out is divisible by the middle-transmission height-direction output number middle_height_out; if so, continuing to step twelve; if not, determining the number of outputs of the last burst transmission and going to step thirteen;
step twelve, setting the last height-direction output number last_height_out = the middle-transmission height-direction output number middle_height_out;
step thirteen, determining the number last_height_in of feature-value inputs of the last burst transmission in the height direction.
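The height-direction steps above do not spell out the formulas for the individual counts. The following Python sketch is one possible reading, not the patent's definitive procedure. It assumes that padding has already been added to the input height, that one burst holds at most `depth` input rows, and that `n` input rows yield `(n - k) // s + 1` output rows. The names `rows_out`, `rows_in_for` and `split_height` are hypothetical.

```python
def rows_out(rows_in, k, s):
    """Output rows produced by rows_in input rows (kernel height k, stride s)."""
    return (rows_in - k) // s + 1


def rows_in_for(out_rows, k, s):
    """Input rows needed to produce out_rows output rows."""
    return (out_rows - 1) * s + k


def split_height(h_in_padded, k, s, depth=7):
    """Split the height direction into bursts of at most `depth` input rows.

    Returns a list of (input_rows, output_rows) pairs, one per burst.
    Assumption: h_in_padded already includes any padding rows.
    """
    h_out = rows_out(h_in_padded, k, s)
    if h_in_padded <= depth:
        # Everything fits into a single burst (the split_h_num=1 case).
        return [(h_in_padded, h_out)]
    max_out = rows_out(depth, k, s)  # most output rows one burst can give
    bursts = []
    remaining = h_out
    while remaining > 0:
        out_rows = min(max_out, remaining)
        bursts.append((rows_in_for(out_rows, k, s), out_rows))
        remaining -= out_rows
    return bursts
```

Under these assumptions, a padded input height of 26 with a kernel height of 2 and a stride of 1 is split into four bursts of 7 input rows and a final burst of 2 input rows. Consecutive bursts overlap by `k - s` input rows in this reading.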
1. Image filtering analysis
As shown in Fig. 2, the goal of image filtering is to downsample the input features (the image itself, or the output matrix of a hidden layer in a CNN), thereby reducing their dimensions.
As the figure shows, to accelerate the convolution filtering calculation, the feature data passed to the hardware must first be parallelized: the feature values of Tin different channels (Tin is called the parallelism) are spliced together to form a new feature matrix Feature[number of input image channels / parallelism][input image height][input image width][parallelism], and the data is then transmitted. As shown in Fig. 2, the parallelized image data is filtered with the same linear filter kernel to obtain the corresponding output image data; this is the function implemented by the hardware design as a whole. The DMA module is responsible for transferring the image data from the DDR to the on-chip cache, and the figure also shows the order in which the data is fetched, which comprises five nested loops.
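The channel-parallel layout described above can be sketched in Python as follows. This is a minimal illustration rather than the hardware's actual data path; the function name `parallelize` is hypothetical, and zero-padding the channels up to a multiple of Tin is an assumption, since the text does not say how a partial channel group is handled.

```python
def parallelize(feature, tin):
    """Rearrange feature[c][h][w] into out[c // tin][h][w][c % tin].

    Assumption: channels are zero-padded up to a multiple of tin.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    groups = -(-channels // tin)  # ceiling division
    out = [[[[0] * tin for _ in range(width)]
            for _ in range(height)]
           for _ in range(groups)]
    for c in range(channels):
        for h in range(height):
            for w in range(width):
                # Tin channel values for one pixel end up side by side.
                out[c // tin][h][w][c % tin] = feature[c][h][w]
    return out
```

With this layout, the Tin channel values of one pixel are contiguous, so a single transfer can deliver them to the calculation module together.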
The first loop is over the burst transmission length of the AXI bus. Data is transferred from the off-chip memory to the on-chip memory over the AXI bus several parallel data packets at a time (as many as the burst transmission length), starting with the first few data of the first row of the first input channel.
The second loop is over the Ky direction. It allows several rows in the height direction to be computed together, so that multiple rows of output are produced.
The third loop is over the width of the input image and the fourth loop is over its height; together, these two loops completely traverse the image of one channel. The last loop is over the channel direction after parallelization, and through it the entire image is completely traversed.
As shown in Fig. 5, the input image parameters are: Win=23, Hin=24, chi=3, Ky=2, Kx=3, Sx=2, Sy=1, px=2, py=1. Transmission begins with the first_height_in rows. The first block transferred spans 1/2/3/4/5/6/7 in the width direction and 1/24/47/70/93/116/139 in the height direction, seven rows by seven columns in total. The second block spans 7/8/9/10/11/12/13 in the width direction and 7/30/53/76/99/122/145 in the height direction, again seven rows by seven columns. The width direction continues in the same way through first_width_in/middle_width_in/last_width_in until all data in the width direction has been transmitted. The transmission then advances in the height direction to the middle_height_in rows, traversing first_width_in/last_width_in in the width direction at this stage. Finally, the first_width_in/middle_width_in/last_width_in data of the last_height_in rows is traversed, which completes the whole input image.
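The fetch order described in this section can be sketched as a Python generator. This is an illustration under assumptions: the burst loop is taken to be the innermost loop and the channel loop the outermost, and the tile boundaries are passed in as `(start, count)` pairs instead of being derived from the first/middle/last counts. The name `fetch_order` is hypothetical.

```python
def fetch_order(groups, row_tiles, col_tiles):
    """Yield (group, row, col) in the assumed five-loop fetch order.

    row_tiles and col_tiles are lists of (start, count) pairs,
    using zero-based row and column indices.
    """
    for g in range(groups):                        # fifth loop: channel groups
        for row_start, row_count in row_tiles:     # fourth loop: image height
            for col_start, col_count in col_tiles: # third loop: image width
                for r in range(row_start, row_start + row_count):  # second loop: Ky direction
                    for c in range(col_start, col_start + col_count):  # first loop: one burst
                        yield g, r, c
```

With row-major numbering and 23 elements per row, as in the Fig. 5 example, mapping each `(row, col)` to `row * 23 + col + 1` gives 1 to 7 for the first burst and 24 to 30 for the second. A second column tile starting at column index 6 begins with element 7, matching the second block in the text.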
2. Linear filtering calculation module
The buffer stores the image data from the DDR. To realize multi-line output of the computed data, the computation in the present invention proceeds along the Ky direction. Fig. 5(a) shows the first block of input data from Fig. 5(b), where the convolution kernel size is Ky=2, Kx=3, the stride in the Ky direction is Sy=1, and the stride in the Kx direction is Sx=2. During computation, the first input is 1/24/47/70/93/116/139 along the Ky direction, that is, all the data in the first column; the next input is the second column, 2/25/48/71/94/117/140; the input then continues to extend in the width direction, and so on. At most 7 data can be input at a time in the Ky direction. After extending 3 times in the width direction, that is, after 7x3 data have been input, 6 output results in the Hout direction can be computed simultaneously; once all the data in the width direction has been traversed, the requirement of multi-line output is met.
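The 7x3 example above can be checked with a short Python sketch. It is a software illustration only, not the hardware implementation: the function name `filter_column_block` and the all-ones kernel in the usage note are hypothetical.

```python
def filter_column_block(block, kernel, sy):
    """Apply a Ky x Kx kernel to a block that is exactly Kx columns wide.

    Returns one output per vertical kernel position, stepping by sy rows.
    """
    ky = len(kernel)
    kx = len(kernel[0])
    outputs = []
    for top in range(0, len(block) - ky + 1, sy):
        acc = 0
        for i in range(ky):
            for j in range(kx):
                acc += block[top + i][j] * kernel[i][j]
        outputs.append(acc)
    return outputs
```

For a block of 7 rows and 3 columns with Ky=2 and Sy=1, the function returns 6 values, which agrees with the 6 simultaneous results in the Hout direction stated in the text.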
3. Implementing filtering computation in memristors
In the present invention, the computation of the computing unit is carried out by a memristor array.
The operation of the memristor comprises three phases: an initialization phase, a data writing phase and a data reading phase.
The first phase, the set phase, initializes the memristor. Before the memristor can operate, the device must be initialized and woken up. Specifically, a large external forward voltage is applied to the memristor array, so that the resistance of the memristor is at its lowest and its conductance at its highest. After this phase the memristor can operate normally.
The second phase is the write phase. In the array, the resistance of the memristor serves as the convolution kernel data for the operation. A large negative voltage is applied to the memristor array according to the convolution kernel data, after which the conductance of each memristor represents the value of the corresponding convolution kernel parameter.
The third phase, the read phase, is where the convolution is actually computed. External image data is first converted by the digital-to-analog converter (DAC) into small forward voltages that are fed into the array, where they operate on the conductance values written in the previous phase. The resulting currents then pass through the analog-to-digital converter (ADC), and the output is the filtered image.
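The read phase can be illustrated with an idealized crossbar model in Python. This is a sketch under strong assumptions, not a description of the device in the patent: it treats each column current as the sum of voltage times conductance (Ohm's law plus Kirchhoff's current law) and ignores DAC/ADC quantization, wire resistance and device nonlinearity. The name `crossbar_read` is hypothetical.

```python
def crossbar_read(voltages, conductances):
    """Ideal crossbar read.

    voltages[i] is the input voltage on row i.
    conductances[i][j] is the conductance at row i, column j.
    Returns the current on each column: sum_i voltages[i] * conductances[i][j].
    """
    columns = len(conductances[0])
    currents = [0.0] * columns
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            currents[j] += v * g
    return currents
```

In this model, each column of conductances plays the role of one set of kernel weights, and each column current is the corresponding multiply-accumulate result.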
Compared with the prior art, the invention has the beneficial effects that:
To reduce circuit area, the invention uses multi-line output, meaning that the final result the computing unit returns to the external DDR comprises several rows. In setting the parameters for multi-line output, the first consideration is that the maximum depth of the buffer unit equals 7, because current filtering operations use convolution kernels ranging in size from 2 to 7. The storage depth in the height direction is therefore set to 7; the depth of the buffer unit determines the maximum number of output rows and hence the number of input rows.
Compared with conventional convolution multi-line output, the innovation of the present invention is the ability to transmit multiple rows of data at a time. The conventional method determines the number of rows carried each time in the height direction of the input image from the convolution kernel height Ky. When Ky is not equal to the height-direction stride Sy, the data overlaps in the height direction (overlap_y), that is, the same row of data is carried more than once.
Fig. 5 shows a 23x24 input image with the parameters Win=23, Hin=24, chi=3, Ky=2, Kx=3, Sx=2, Sy=1, px=2, py=1. Then overlap_y=2-1=1; that is, 2 rows of data are carried each time, one of which repeats a row from the previous transfer. In Fig. 5, computing the first pixel of the output image requires two rows of data, 1/2/3 and 24/25/26; computing the next pixel in the Ky direction requires 24/25/26 and 47/48/49. The row 24/25/26 is thus carried twice. Once the parameters of the input image are fixed, the height Hout of the output image can be calculated. In the example of Fig. 5, Hout=25, so conventional multi-line output must carry 25×2=50 rows of data.
The key to the multi-line output of the invention is that data transmission need not take the convolution kernel size into account: multiple rows of data are carried directly, and multi-line output is then achieved by operating along the Ky direction in the calculation module. In the example of Fig. 5, conventional multi-line output carries 50 rows of data, whereas the method of the invention carries only Hin=24 rows.
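The row counts quoted for Fig. 5 can be reproduced with a short Python calculation. This is a sketch: the text gives Hout=25 without a formula, so the standard convolution output-size formula with symmetric padding py is assumed here, and the function names are hypothetical.

```python
def output_height(h_in, ky, sy, py):
    """Output height, assuming the standard formula with symmetric padding."""
    return (h_in + 2 * py - ky) // sy + 1


def rows_carried(h_in, ky, sy, py):
    """Return (conventional_rows, proposed_rows) carried from memory.

    Conventional: Ky rows are carried for every output row.
    Proposed: each input row is carried once, as stated in the text.
    """
    h_out = output_height(h_in, ky, sy, py)
    return h_out * ky, h_in
```

For the Fig. 5 parameters (Hin=24, Ky=2, Sy=1, py=1) this gives an output height of 25 and row counts of 50 and 24, matching the figures in the text.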
Fig. 6 is a table comparing the number of data rows carried by conventional multi-line output with the number carried by the present invention, for input images of various sizes. As examples 1 and 4 in the table show, when the convolution kernel height Ky equals the height-direction stride Sy, that is, when overlap_y is 0, both methods carry the same amount of data. When overlap_y is nonzero, as in examples 2, 3 and 5, the method of the invention carries far less data than the conventional method. The multi-line output method therefore reduces repeated carrying of data and improves memory access efficiency.
Simulation confirms that the invention is feasible, and the results show a substantial efficiency improvement over the original scheme. The invention can also be extended to use the input channels of the image as the parallel dimension of the filtering calculation.
Example 2
A storage medium storing a program file capable of implementing any one of the above-described general-purpose, configurable image filtering calculation multi-line output methods.
Example 3
A processor for running a program, wherein the program when run performs the generic, configurable image filtering computation multi-line output method of any of the above.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The system embodiments described above are merely exemplary, and for example, the division of units may be a logic function division, and there may be another division manner in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (10)

1. A universal, configurable image filtering computing multi-line output system, comprising: the system comprises a direct memory access module, a parameter configuration module, an on-chip cache region and a linear filtering calculation module; wherein:
the CPU stores control parameters related to the input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters;
the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result.
2. The universal, configurable image filtering computing multi-line output system of claim 1 wherein the CPU stores control parameters associated with the input image to the parameter configuration module via an AXI-lite bus; after the parameter configuration module obtains the control parameters from the CPU, the control parameters are utilized to control the access operation of the direct memory access module and control the convolution calculation of the linear filtering calculation module; the direct memory access module acquires image input data matched with the control parameters from the off-chip memory DDR through an AXI4 bus according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the off-chip memory DDR through the direct memory access module, and the off-chip memory DDR stores the calculation result.
3. The universal, configurable image filtering computing multi-line output system of claim 1 wherein the linear filtering computing module comprises a buffer, a control unit, a computing unit; the buffer area receives and extracts input data from the DDR (double data rate) of the off-chip memory and sends the input data to the computing unit; the computing unit receives the input data of the buffer area, generates a computed output result, and sends the computed output result to the off-chip memory DDR for storage through the direct memory access module.
4. A universal, configurable image filtering computation multi-line output method, comprising the steps of:
the CPU stores control parameters related to the input image into a parameter configuration module; the parameter configuration module is used for controlling the access operation of the direct memory access module and the convolution calculation of the linear filtering calculation module by utilizing the control parameters after acquiring the control parameters;
the direct memory access module acquires image input data matched with the control parameters from the on-chip cache area according to the control parameters sent by the parameter configuration module, and sends the input data to the linear filtering calculation module; the linear filtering calculation module performs convolution calculation matched with the control parameters according to the control parameters sent by the parameter configuration module, and sends the calculation result to the on-chip cache area through the direct memory access module, and the on-chip cache area stores the calculation result.
5. The universal configurable image filtering computing multi-line output method of claim 4 wherein computing single-line, multi-line output width direction parameters comprises:
step one, inputting the burst transmission length burst_len;
step two, setting the width-direction depth x_depth = the burst transmission length burst_len;
step three, calculating the number of width-direction outputs width_out_test;
step four, comparing width_out_test with Wout, and if width_out_test is smaller than Wout, padding on the right to make up the difference;
step five, calculating the number of width-direction inputs width_in_test;
step six, judging whether the number of width-direction inputs width_in_test is greater than the width-direction depth; if so, continuing to step seven; if not, the number of burst transmissions in the width direction is split_w_num=1, the first width-direction output number first_width_out and the last width-direction output number last_width_out are both equal to Wout, and the first width-direction input number first_width_in and the last width-direction input number last_width_in are both equal to width_in_test;
step seven, if the number of width-direction inputs is greater than the width-direction depth, determining the first width-direction input number first_width_in, the residual width-direction input number res_width_in, the first width-direction output number first_width_out and the residual width-direction output number res_width_out;
step eight, judging whether the residual width-direction input number res_width_in is greater than the width-direction depth x_depth; if so, continuing to step nine; if not, the number of burst transmissions in the width direction is split_w_num=2, the last width-direction input number last_width_in = the residual width-direction input number res_width_in, and the last width-direction output number last_width_out = the residual width-direction output number res_width_out;
step nine, determining the maximum output number max_out_width of one burst transmission in the width direction, and determining the number split_w_num of burst transmissions in the width direction;
step ten, setting the middle-transmission width-direction output number middle_width_out = the maximum output number of one burst transmission max_out_width, and determining the middle-transmission width-direction input number middle_width_in;
step eleven, judging whether the total output number Wout minus the first width-direction output number first_width_out is divisible by the middle-transmission width-direction output number middle_width_out; if so, continuing to step twelve; if not, determining the number of outputs of the last burst transmission and going to step thirteen;
step twelve, setting the last width-direction output number last_width_out = the middle-transmission width-direction output number middle_width_out;
step thirteen, determining the number last_width_in of feature-value inputs of the last burst transmission in the width direction.
6. The universal configurable image filtering computing multi-line output method of claim 4 wherein computing multi-line output height direction parameters comprises:
step one, inputting the burst transmission length burst_len;
step two, setting the height-direction depth Y_DEPTH=7;
step three, calculating the number of height-direction outputs height_out_test;
step four, comparing height_out_test with Hout, and if height_out_test is smaller than Hout, padding to make up the difference;
step five, calculating the number of height-direction inputs height_in_test;
step six, judging whether the number of height-direction inputs height_in_test is greater than the height-direction depth; if so, continuing to step seven; if not, the number of burst transmissions in the height direction is split_h_num=1, the first height-direction output number first_height_out and the last height-direction output number last_height_out are both equal to Hout, and the first height-direction input number first_height_in and the last height-direction input number last_height_in are both equal to height_in_test;
step seven, if the number of height-direction inputs is greater than the height-direction depth, determining the first height-direction input number first_height_in, the residual height-direction input number res_height_in, the first height-direction output number first_height_out and the residual height-direction output number res_height_out;
step eight, judging whether the residual height-direction input number res_height_in is greater than the height-direction depth Y_DEPTH; if so, continuing to step nine; if not, the number of burst transmissions in the height direction is split_h_num=2, the last height-direction input number last_height_in = the residual height-direction input number res_height_in, and the last height-direction output number last_height_out = the residual height-direction output number res_height_out;
step nine, determining the maximum output number max_out_height of one burst transmission in the height direction, and determining the number split_h_num of burst transmissions in the height direction;
step ten, setting the middle-transmission height-direction output number middle_height_out = the maximum output number of one burst transmission max_out_height, and determining the middle-transmission height-direction input number middle_height_in;
step eleven, judging whether the total output number Hout minus the first height-direction output number first_height_out is divisible by the middle-transmission height-direction output number middle_height_out; if so, continuing to step twelve; if not, determining the number of outputs of the last burst transmission and going to step thirteen;
step twelve, setting the last height-direction output number last_height_out = the middle-transmission height-direction output number middle_height_out;
step thirteen, determining the number last_height_in of feature-value inputs of the last burst transmission in the height direction.
7. The method for computing multi-line output by general-purpose configurable image filtering according to claim 4, wherein the feature data transmitted into hardware are converted in parallel, the feature values of a plurality of different channels are spliced together to form a new feature matrix, and then the data is transmitted;
and the parallelized image data is filtered with the same linear filter kernel to obtain the corresponding output image data.
8. The method of claim 7, wherein the image data is transferred to the on-chip buffer, comprising five cycles according to the order in which the data is extracted:
the first layer cycle is a burst transfer length cycle of the AXI bus; data is transmitted from the first several data of the first row of the first input channel;
the second layer cycle is a cycle in the Ky direction;
the third layer cycle is a cycle in the input image width direction;
the fourth layer of circulation is a circulation in the height direction of the input image, and the image of one channel is completely traversed through the two layers of circulation;
the last layer is a loop in the channel direction after parallelism through which the entire image is completely traversed.
9. The method of claim 4, wherein buffer is used to store image data from DDR, first input data is 1/24/47/70/93/116/139 in Ky direction, then input data is 2/25/48/71/94/117/140 in second column, then continue expanding in width direction, and so on;
7 data can be input at most once in the Ky direction, when the data is expanded for 3 times in the width direction, 6 output results in the Hout direction are calculated at the same time, and after all the data in the width direction are traversed, the requirement of multi-row output is met.
10. The method of claim 4, wherein the convolution calculation of the linear filter calculation module has three stages, namely an initialization stage, a data writing stage and a data reading stage:
the first phase is the set phase, which is the initialization phase of the memristor; specifically, a large external forward voltage is applied to the memristor array;
the second phase is the write phase; in the array, the resistance of the memristor serves as the convolution kernel data for the operation, and a large negative voltage is applied to the memristor array according to the convolution kernel data;
the third stage is read stage, which is the stage of realizing convolution calculation process, firstly, external image data is converted into forward small voltage input array by digital-to-analog module DAC, and is operated with the conductance value written in the previous stage, and the obtained current value is passed through analog-to-digital module ADC, and the output result is the filtered image.
CN202211506014.1A 2022-11-29 2022-11-29 Universal and configurable image filtering calculation multi-line output system and method Pending CN116010313A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211506014.1A CN116010313A (en) 2022-11-29 2022-11-29 Universal and configurable image filtering calculation multi-line output system and method
PCT/CN2023/133763 WO2024114505A1 (en) 2022-11-29 2023-11-23 Universal and configurable system and method for image filtering computation and multi-row output

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211506014.1A CN116010313A (en) 2022-11-29 2022-11-29 Universal and configurable image filtering calculation multi-line output system and method

Publications (1)

Publication Number Publication Date
CN116010313A true CN116010313A (en) 2023-04-25

Family

ID=86030717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211506014.1A Pending CN116010313A (en) 2022-11-29 2022-11-29 Universal and configurable image filtering calculation multi-line output system and method

Country Status (2)

Country Link
CN (1) CN116010313A (en)
WO (1) WO2024114505A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024114505A1 (en) * 2022-11-29 2024-06-06 中国科学院深圳先进技术研究院 Universal and configurable system and method for image filtering computation and multi-row output

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041264A1 (en) * 2017-08-31 2019-03-07 深圳市大疆创新科技有限公司 Image processing apparatus and method, and related circuit
KR102126857B1 (en) * 2018-05-10 2020-06-25 서울대학교산학협력단 Neural network processor based on row operation and data processing method using thereof
CN110390384B (en) * 2019-06-25 2021-07-06 东南大学 Configurable general convolutional neural network accelerator
US11726950B2 (en) * 2019-09-28 2023-08-15 Intel Corporation Compute near memory convolution accelerator
CN111767986A (en) * 2020-06-24 2020-10-13 深兰人工智能芯片研究院(江苏)有限公司 Operation method and device based on neural network
CN114265801B (en) * 2021-12-21 2023-07-25 中国科学院深圳先进技术研究院 Universal and configurable high-energy-efficiency pooling calculation multi-line output method
CN114372012B (en) * 2021-12-21 2024-02-20 中国科学院深圳先进技术研究院 Universal and configurable high-energy-efficiency pooling calculation single-row output system and method
CN116010313A (en) * 2022-11-29 2023-04-25 中国科学院深圳先进技术研究院 Universal and configurable image filtering calculation multi-line output system and method


Also Published As

Publication number Publication date
WO2024114505A1 (en) 2024-06-06


Legal Events

Date Code Title Description
PB01 Publication