CN116828130A

CN116828130A - Method for realizing high throughput rate median filter based on FPGA

Info

Publication number: CN116828130A
Application number: CN202310865178.1A
Authority: CN
Inventors: 王岩; 曲春辉; 杜江; 尹航; 艾占杨; 岳海涛
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2023-09-29

Abstract

The invention provides a method for implementing a high-throughput median filter on an FPGA (field programmable gate array). The design is a 32-channel parallel median filter whose two-dimensional template is a 5*5 rectangular window: 32 8-bit pixel values (256 bits in total) are input per clock, 32-channel parallel median filtering is performed, and 32 filtered 8-bit pixel values are output per clock. The processing is divided into two main modules, a RAM cache module and a parallel median filtering network module. The invention improves throughput through 32-channel parallel computation and, because the median computations of adjacent pixels are similar, reuses part of the intermediate results to save processing resources.

Description

Method for realizing high throughput rate median filter based on FPGA

Technical Field

The invention belongs to the field of digital image processing, and particularly relates to a method for implementing a high-throughput median filter based on an FPGA (Field Programmable Gate Array).

Background

The median filter is a nonlinear filter based on order statistics and is widely used to remove impulse noise from images. It is more reliable than traditional linear filtering because it preserves sharp edges, although its computational cost is much higher.

The basic principle of median filtering is to replace the value of a point in a digital image with the median of the values in a neighborhood of that point, thereby eliminating isolated noise points. The pixels inside a two-dimensional sliding template of a given shape are sorted by pixel value into a monotonically ascending (or descending) data sequence, and the middle value of the sorted sequence is taken as the output value. The template then slides from left to right and from top to bottom, and one median filtering operation is performed at each position. The template is usually a 3*3 or 5*5 region, and it can also take other shapes, such as a line, a circle, a cross, or a ring.
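The principle described above can be sketched in software as follows. This is a minimal brute-force reference model, not the FPGA design: the function name `median_filter_5x5` is hypothetical, and leaving the two-pixel border unfiltered is an assumption, since the text does not say how image borders are handled.

```python
def median_filter_5x5(image):
    """Brute-force 5*5 median filter on a list-of-lists grayscale image.

    Border pixels (within 2 pixels of an edge) are copied unchanged;
    this border policy is an assumption, not taken from the source.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(2, rows - 2):
        for c in range(2, cols - 2):
            # Gather the 25 pixels of the window centred on (r, c).
            window = [image[r + dr][c + dc]
                      for dr in range(-2, 3)
                      for dc in range(-2, 3)]
            window.sort()
            out[r][c] = window[12]  # 13th of 25 sorted values is the median
    return out


# An isolated impulse in a flat region is replaced by the background value.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
filtered = median_filter_5x5(noisy)
```

Sorting all 25 values for every output pixel is what makes the direct approach expensive, which motivates the partial-sorting flow discussed next.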

The flow of a 5*5 median filter proposed in the prior art, which can be executed in a pipelined manner, is shown in fig. 1 and is divided into four main steps. Each solid point in the figure represents a pixel. A line segment with an arrow indicates that the pixels on the segment are sorted, with the minimum value at the arrowhead and the maximum value at the tail. A circled solid point marks a pixel value that must be computed in a partial sort, and an uncircled solid point marks one that need not be computed. Panel a of fig. 1 shows the original 5*5 pixel values, panel b shows the sorting of each column, panel c shows the partial sorting of each row, panel d shows the partial sorting along the diagonals with slope k=1, and panel e shows the partial sorting along the diagonals with slope k=1/2. The result of the processing is the filtered value of the center pixel in panel a of fig. 1.

An ordinary sorting network sorts its inputs completely: as many sorted values are output as values are input. In some cases, however, not all outputs are needed. For example, after sorting 5 input values, only the two largest may be of interest. Sorting only some of the values is called partial sorting, and it can save resources compared with full sorting. The sorting steps in panels c, d, and e of fig. 1 are partial sorts.
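The saving can be illustrated with the example from the text, finding the two largest of 5 values. The sketch below is an assumption chosen for illustration (two bubble passes of compare-exchange steps), not a network taken from fig. 1; the function names are hypothetical. It uses 7 compare-exchange steps, fewer than the 9 basic units this description uses for a full 5-input sort.

```python
def compare_exchange(values, i, j):
    """Put the smaller of values[i], values[j] at i and the larger at j."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def two_largest_of_five(values):
    """Return (second largest, largest) of 5 values using 7 compare-exchanges."""
    v = list(values)
    # First pass: 4 steps push the maximum to position 4.
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
        compare_exchange(v, i, j)
    # Second pass: 3 steps push the maximum of the rest to position 3.
    for i, j in [(0, 1), (1, 2), (2, 3)]:
        compare_exchange(v, i, j)
    # Positions 0..2 are left unsorted because they are not needed.
    return v[3], v[4]
```

The three smallest values are never put in order, which is exactly the work a partial sort avoids.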

Existing schemes mainly use a 3*3 two-dimensional template for median filtering, and the 5*5 template has been studied less, mainly because 5*5 median filtering is significantly more complex than 3*3. With a 5*5 template, the median of the 25 pixels in the window must be computed for every output pixel, so the total amount of computation is proportional to the number of pixels in the whole image. In addition, many application scenarios demand high median-filtering throughput, and when the image is large this places a heavy burden on the processor.

Disclosure of Invention

To solve the above technical problems, the invention provides a method for implementing a high-throughput median filter based on an FPGA. The selected two-dimensional template is a 5*5 rectangular window: 32 8-bit pixel values (256 bits in total) are input per clock, 32-channel parallel median filtering is performed, and 32 filtered 8-bit pixel values are output per clock. The processing is divided into two main modules, a RAM cache module and a parallel median filtering network module. The invention improves throughput through 32-channel parallel computation and, because the median computations of adjacent pixels are similar, reuses part of the intermediate results to save processing resources.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A method for implementing a high-throughput median filter based on an FPGA selects a 5*5 rectangular window as the two-dimensional template, performs 32-channel parallel median filtering on the 32 8-bit pixel values input per clock, and outputs 32 filtered 8-bit pixel values per clock. The method is implemented by a RAM cache module and a parallel median filtering network module. The RAM cache module stores the first 4 rows of data and, once the 5th row arrives, reads out the 5 rows simultaneously and sends them to the following parallel median filtering network module. The parallel median filtering network module receives the 5 rows of data output by the RAM cache module, performs the parallel median filtering computation, and outputs the filtering result.

Further, the RAM cache module comprises a write control module and a read control module.

Further, the write control module writes data into the 5 RAM blocks in turn, one image row per RAM block and 32 pixel values per clock; the read control module reads data from the 5 RAM blocks simultaneously and outputs the 5 rows of data read to the parallel median filtering network module.

Further, the data selection flow of the parallel median filtering network module is as follows: the leftmost 5 columns of pixels are selected as the input of the first median filtering channel, columns 2 to 6 as the input of the second channel, and so on, until the inputs of all 32 channels are selected.

Further, the parallel median filtering network module consists of 32 common sorting networks, 32 independent sorting networks, and one data distribution module.

Further, each common sorting network sorts its 5 input pixel values from small to large. The results of every 5 adjacent common sorting networks form the column-sorted result of one 5*5 rectangular window, so the result of each common sorting network is shared by several pixels. The data distribution module selects 5 adjacent columns of data for each independent sorting network, and the data then enter that independent sorting network.

Further, the common sorting network consists of 9 basic sorting units, with one stage of pipeline registers inserted after the six basic sorting units on the left; any 5 pixel values entering the common sorting network emerge, after a delay of one clock, as 5 values sorted from small to large.

Further, the independent sorting network takes 25 input pixel values and comprises a partial row sort, a partial diagonal sort (k=1), and a partial diagonal sort (k=1/2). Four stages of pipeline registers are inserted: the first in the middle of the partial row sort, the second after the partial row sort, the third in the middle of the partial diagonal sort (k=1), and the fourth after the partial diagonal sort (k=1). The independent sorting network therefore outputs the filtering result after a delay of 4 clocks.

Further, the basic sorting unit compares two 8-bit pixel values and swaps them into order. It consists of an 8-bit comparator and two multiplexers: given two 8-bit pixel values A and B, the output of the comparator controls whether each multiplexer outputs A or B.

The beneficial effects are that:

1. The invention makes full use of the strong parallel computing capability of the FPGA; at a processing clock of 200 MHz, the throughput of the design reaches 6.4 GB/s.

2. All modules are carefully optimized, and the sorting network structure minimizes the number of comparisons required and thus the resource consumption.
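The 6.4 GB/s figure in item 1 follows directly from the interface width and the clock rate. A small check, assuming 1 GB = 10^9 bytes (the function name is hypothetical):

```python
def throughput_bytes_per_second(pixels_per_clock, bits_per_pixel, clock_hz):
    """Bytes processed per second for a fully pipelined datapath."""
    bytes_per_clock = pixels_per_clock * bits_per_pixel // 8
    return bytes_per_clock * clock_hz


# 32 pixels x 8 bits = 256 bits = 32 bytes per clock, at 200 MHz.
rate = throughput_bytes_per_second(32, 8, 200_000_000)
print(rate / 1e9, "GB/s")
```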

Drawings

FIG. 1 is a flow chart of 5*5 median filtering for a single pixel;

FIG. 2 is a schematic diagram of a method of implementing a high throughput median filter based on an FPGA of the present invention;

FIG. 3 is a diagram showing the internal design of a RAM cache module;

FIG. 4 is a schematic diagram of a parallel median filter network module data selection process;

FIG. 5 is a diagram of the internal design of a parallel median filter network module;

FIG. 6 shows the basic sorting unit and its simplified symbol;

FIG. 7 is a diagram of the internal structure of the common sorting network;

FIG. 8 is a diagram of the internal structure of the independent sorting network.

Detailed Description

The present invention is described in further detail below with reference to the drawings and embodiments, so that its objects, technical solutions, and advantages become clearer. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.

The two-dimensional template selected by the invention is a 5*5 rectangular window. 32 8-bit pixel values (256 bits in total) are input per clock, 32-channel parallel median filtering is performed according to the processing flow of fig. 2, and 32 filtered 8-bit pixel values are output per clock. The processing is divided into two main modules: a RAM (random access memory) cache module and a parallel median filtering network module.

(1) RAM cache module design:

For a grayscale image of size n×m (n rows and m columns), the image data is generally stored row by row in memory outside the FPGA. When image processing starts, the image is read out row by row and fed to the processing module. Median filtering with a 5*5 rectangular window needs 5 rows of data at the same time before processing can start, so the RAM cache module stores the first 4 rows, waits for the 5th row to arrive, and then reads out the 5 rows simultaneously and sends them to the following parallel median filtering network module, as shown in fig. 3.

The RAM cache module comprises a write control module and a read control module. The write control module writes data into the 5 RAM blocks (RAM_1 to RAM_5) in turn, each RAM block holding one row of the image: row 1 is written into RAM_1, row 2 into RAM_2, and so on up to row 5 into RAM_5, after which row 6 is written into RAM_1 again, and so on. Each write carries 32 pixel values, since 32 pixel values arrive per clock.

The read control module reads data from the 5 RAM blocks simultaneously and outputs the 5 rows of data read to the parallel median filtering network module.
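The round-robin write and simultaneous read can be modelled in software as below. This is a row-level sketch under simplifying assumptions: the class `LineBuffer` and its methods are hypothetical names, whole rows are written at once rather than 32 pixels per clock, and returning the rows oldest first is a choice made here for readability (the median does not depend on row order).

```python
class LineBuffer:
    """Software model of 5 row RAMs written in round-robin order."""

    def __init__(self, num_rams=5):
        self.rams = [None] * num_rams
        self.rows_written = 0

    def write_row(self, row):
        # Row 1 -> RAM_1, ..., row 5 -> RAM_5, row 6 -> RAM_1 again.
        self.rams[self.rows_written % len(self.rams)] = list(row)
        self.rows_written += 1

    def read_window_rows(self):
        """Return the 5 most recent rows (oldest first), or None if not full."""
        n = len(self.rams)
        if self.rows_written < n:
            return None
        oldest = self.rows_written % n  # slot that will be overwritten next
        return [self.rams[(oldest + k) % n] for k in range(n)]


buf = LineBuffer()
for r in range(4):
    buf.write_row([r] * 4)
not_ready = buf.read_window_rows()   # only 4 rows stored so far
buf.write_row([4] * 4)
first_window = buf.read_window_rows()
buf.write_row([5] * 4)
second_window = buf.read_window_rows()
```

After the sixth row is written, the oldest row has been overwritten and the window has moved down by one row, which matches the sliding of the template from top to bottom.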

(2) Parallel median filtering network module design:

The data selection flow of the parallel median filtering network module is shown in fig. 4. Each cell in the figure represents an 8-bit pixel value: d1_1 is the pixel value at row 1, column 1 of the image, d2_3 is the pixel value at row 2, column 3, and so on. Each group of 5 adjacent columns of cells forms a 5*5 rectangular window on which median filtering is performed, and dout_1 to dout_32 are the 32 parallel median filtering outputs of each clock. In addition, the two leftmost columns of shaded cells in fig. 4 are the last two columns of pixel values input in the previous clock, and the two rightmost columns of shaded cells are the first two columns of pixel values input in the next clock.
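The window selection of fig. 4 can be sketched as follows. The function `select_windows` and its argument names are hypothetical; the sketch simply joins the two columns carried over from the previous clock, the 32 current columns, and the two columns from the next clock into a 36-column strip, then slides a 5-column window across it. How the first and last beats of an image row are padded is not stated in the text and is left to the caller.

```python
def select_windows(prev_tail, current, next_head):
    """Build the 32 overlapping 5*5 windows for one clock.

    prev_tail: 5 rows x 2 columns from the previous clock
    current:   5 rows x 32 columns from the current clock
    next_head: 5 rows x 2 columns from the next clock
    Returns a list of windows; window k is centred on current column k.
    """
    # 5 rows x 36 columns.
    strip = [p + c + n for p, c, n in zip(prev_tail, current, next_head)]
    width = len(current[0])
    return [[row[k:k + 5] for row in strip] for k in range(width)]


# Column indices are used as pixel values so the selection is easy to read.
cur = [[c for c in range(32)] for _ in range(5)]
prev = [[-2, -1] for _ in range(5)]
nxt = [[32, 33] for _ in range(5)]
windows = select_windows(prev, cur, nxt)
```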

The internal design of the parallel median filtering network module is shown in fig. 5. It consists mainly of 32 common sorting networks, 32 independent sorting networks, and a data distribution module. The left column in fig. 5 contains the 32 common sorting networks, each of which sorts its five input pixel values from small to large. The results of every 5 adjacent common sorting networks form the column-sorted result of one 5*5 rectangular window, as in panel b of fig. 1, so the result of each common sorting network is shared by several pixels. The data distribution module selects 5 adjacent columns of data for each independent sorting network. The data then enter the independent sorting network, which carries out the steps from panel c to panel e of fig. 1; its output is the median filtering result.
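The resource saving from sharing column sorts can be sketched in software. All names below are hypothetical, and a plain full sort stands in for the independent sorting network, so the sketch shows only the reuse pattern: each column is sorted once and then handed to every window that contains it, instead of being sorted again for each of up to 5 windows.

```python
def sort_columns_once(strip):
    """Sort every column of a 5-row strip once; returns one list per column."""
    num_cols = len(strip[0])
    return [sorted(row[c] for row in strip) for c in range(num_cols)]


def distribute(sorted_columns, num_outputs):
    """Data distribution: window k reuses sorted columns k .. k+4."""
    return [sorted_columns[k:k + 5] for k in range(num_outputs)]


def median_from_sorted_columns(window_columns):
    # Stand-in for the independent sorting network (assumption):
    # a full sort of the 25 values, taking the 13th.
    return sorted(v for col in window_columns for v in col)[12]


# A small 5 x 8 strip yields 4 windows but needs only 8 column sorts.
strip = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(5)]
columns = sort_columns_once(strip)
windows = distribute(columns, 4)
medians = [median_from_sorted_columns(w) for w in windows]
```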

The specific designs of the common sorting network and the independent sorting network are described below, starting with the basic sorting unit. The basic sorting unit, shown in fig. 6, compares two 8-bit pixel values and swaps them into order. It consists of an 8-bit comparator and two MUXs (multiplexers): given two input 8-bit pixel values A and B, the output of the comparator controls whether each multiplexer outputs A or B. L is the smaller of A and B, and H is the larger. The right half of fig. 6 is the simplified symbol of the basic sorting unit.
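A behavioural model of the basic sorting unit, as a minimal sketch: one comparison result drives two selections, mirroring the comparator and the two multiplexers. The function name and the choice of `A > B` as the comparator condition are assumptions; the text does not say which comparison the hardware uses.

```python
def basic_sorting_unit(a, b):
    """Return (L, H) for two 8-bit values A and B."""
    a_greater = a > b               # comparator output
    low = b if a_greater else a     # first multiplexer selects L
    high = a if a_greater else b    # second multiplexer selects H
    return low, high
```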

The internal structure of the common sorting network is shown in fig. 7; it consists of 9 basic sorting units in total. Because the basic sorting units are purely combinational logic, one stage of pipeline registers is inserted to raise the system frequency, at the position of the dotted line in fig. 7. Any five pixel values entering the network on in1 to in5 emerge, after a delay of one clock, as 5 values sorted from small to large.
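For reference, a 5-input sort with exactly 9 compare-exchange units can be written as below. The comparator sequence is a widely used 9-comparator network and is an assumption here: the wiring of fig. 7 is not described in the text and may differ, and the pipeline register is not modelled.

```python
# Each pair (i, j) is one basic sorting unit: min goes to i, max goes to j.
NETWORK_5 = [(0, 1), (3, 4), (2, 4), (2, 3), (0, 3),
             (0, 2), (1, 4), (1, 3), (1, 2)]


def sort5(values):
    """Sort 5 values in ascending order with a fixed 9-comparator network."""
    v = list(values)
    for i, j in NETWORK_5:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v
```

Because the comparison sequence is fixed and data independent, the network maps directly to hardware. By the zero-one principle, checking all 32 inputs made of 0s and 1s is enough to confirm that it sorts every input.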

The internal structure of the independent sorting network is shown in fig. 8, where in1 to in25 are the 25 input pixel values. The network is divided into three main parts: a partial row sort (corresponding to panel c of fig. 1), a partial diagonal sort (k=1) (corresponding to panel d of fig. 1), and a partial diagonal sort (k=1/2) (corresponding to panel e of fig. 1), where k is the slope of the straight line on which the pixels sorted in each step of fig. 1 lie. The vertical dashed lines in fig. 8 mark the positions of the pipeline registers. Four stages of pipeline registers are inserted in total, so the independent sorting network outputs the filtering result after a delay of 4 clocks.
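Why partial sorting is enough after the column and row steps can be illustrated in software. The sketch below is not a reconstruction of fig. 8: it uses full column and row sorts where the patent uses partial ones, and it replaces the two diagonal steps with a plain sort of the surviving candidates. The function name is hypothetical. Once columns and then rows are sorted, every entry is known to be no smaller than the entries above and to its left, and no larger than those below and to its right, so entries with 14 or more values on one side cannot be needed to find the 13th value.

```python
def median_5x5_pruned(window):
    """Median of a 5 x 5 window via column sort, row sort, and pruning."""
    # Step 1: sort each column so values ascend from top to bottom.
    cols = [sorted(window[r][c] for r in range(5)) for c in range(5)]
    grid = [[cols[c][r] for c in range(5)] for r in range(5)]
    # Step 2: sort each row; the columns remain sorted afterwards.
    grid = [sorted(row) for row in grid]
    candidates = []
    for r in range(5):
        for c in range(5):
            not_larger = (r + 1) * (c + 1)    # entries known <= grid[r][c]
            not_smaller = (5 - r) * (5 - c)   # entries known >= grid[r][c]
            if not_larger <= 13 and not_smaller <= 13:
                candidates.append(grid[r][c])
    # Six entries are dropped on each side, so 13 candidates remain
    # and their middle value is the median of all 25.
    return sorted(candidates)[len(candidates) // 2]
```

Only 13 of the 25 values remain candidates after the first two steps, which is the kind of saving that lets the later stages operate on subsets of the pixels.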

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A method for implementing a high-throughput median filter based on an FPGA, characterized in that a 5*5 rectangular window is selected as the two-dimensional template, 32 8-bit pixel values are input per clock for 32-channel parallel median filtering, and 32 filtered 8-bit pixel values are output per clock after processing; the method is implemented by a RAM cache module and a parallel median filtering network module; the RAM cache module is used for storing the first 4 rows of data and, after the 5th row of data arrives, reading out the 5 rows of data simultaneously and sending them to the following parallel median filtering network module; the parallel median filtering network module is used for receiving the 5 rows of data output by the RAM cache module, completing the parallel median filtering computation, and then outputting the filtering result.

2. The method for implementing a high throughput median filter based on FPGA of claim 1, wherein the RAM cache module comprises a write control module and a read control module.

3. The method for implementing a high-throughput median filter based on an FPGA of claim 2, wherein the write control module is configured to write data into 5 RAM blocks in turn, writing one row of the image into each RAM block and 32 pixel values per clock; the read control module is configured to read data from the 5 RAM blocks simultaneously and then output the 5 rows of data read to the parallel median filtering network module.

4. The method for implementing a high-throughput median filter based on an FPGA of claim 1, wherein the data selection flow of the parallel median filtering network module is as follows: the leftmost 5 columns of pixels are selected as the input of the first median filtering channel, columns 2 to 6 as the input of the second channel, and so on, until the inputs of all 32 channels are selected.

5. The method for implementing a high-throughput median filter based on an FPGA of claim 4, wherein the parallel median filtering network module consists of 32 common sorting networks, 32 independent sorting networks, and a data distribution module.

6. The method for implementing a high-throughput median filter based on an FPGA of claim 5, wherein each common sorting network is configured to sort its 5 input pixel values from small to large; the results of every 5 common sorting networks form the column-sorted result of one 5*5 rectangular window, and the result of each common sorting network is shared by a plurality of pixels; the data distribution module selects 5 adjacent columns of data for each independent sorting network, and the data then enter the independent sorting network.

7. The method for implementing a high-throughput median filter based on an FPGA of claim 5, wherein the common sorting network consists of 9 basic sorting units, one stage of pipeline registers is inserted after the six basic sorting units on the left, and any 5 pixel values entering the common sorting network become, after a delay of one clock, 5 values sorted from small to large.

8. The method for implementing a high-throughput median filter based on an FPGA of claim 5, wherein the independent sorting network takes 25 input pixel values and comprises a partial row sort, a partial diagonal sort (k=1), and a partial diagonal sort (k=1/2); four stages of pipeline registers are inserted, the first stage in the middle of the partial row sort, the second stage after the partial row sort, the third stage in the middle of the partial diagonal sort (k=1), and the fourth stage after the partial diagonal sort (k=1), so that the independent sorting network outputs the filtering result after a delay of 4 clocks.

9. The method for implementing a high-throughput median filter based on an FPGA of claim 7 or 8, wherein the basic sorting unit is configured to compare two 8-bit pixel values and swap them into order; the basic sorting unit consists of an 8-bit comparator and two multiplexers, and, given two 8-bit pixel values A and B, the output of the 8-bit comparator controls whether each multiplexer outputs A or B.