WO2021036424A1 - Efficient parallel computing method for box filter - Google Patents


Info

Publication number
WO2021036424A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
filter core
pixels
filter
average value
Application number
PCT/CN2020/096461
Inventor
刘心哲
陈富鹏
哈亚军
Original Assignee
上海科技大学 (ShanghaiTech University)
Application filed by 上海科技大学 (ShanghaiTech University)
Priority to US17/054,169 (US11094071B1)
Publication of WO2021036424A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T5/00 Image enhancement or restoration
                    • G06T5/70 Denoising; Smoothing
                    • G06T5/20 Image enhancement or restoration using local operators
                • G06T7/00 Image analysis
                    • G06T7/40 Analysis of texture
                        • G06T7/41 Analysis of texture based on statistical description of texture
                            • G06T7/44 Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
                    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
                        • G06F1/10 Distribution of clock signals, e.g. skew
                • G06F9/00 Arrangements for program control, e.g. control units
                    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
                        • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
                            • G06F9/30098 Register arrangements
                                • G06F9/30105 Register structure
                                    • G06F9/30109 Register structure having multiple operands in a single register
                            • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
                                • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
                                    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
                        • G06F9/46 Multiprogramming arrangements
                            • G06F9/54 Interprogram communication
                                • G06F9/545 Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

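An efficient parallel computing method for a box filter. The method proposes two architectures for implementing a box filter in parallel and reduces the required resources by constructing an addition tree; by using the addition tree to reuse all intermediate results, the method greatly reduces the computing resources required to parallelize the algorithm.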

Description

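Efficient parallel computing method for box filter
Technical Field
The present invention relates to a fast and efficient method for computing box filters. Box filters are commonly used in various image and video processing applications, and they are also widely used to implement other algorithms.
Background
In the field of computer vision, an algorithm may need to use a large number of box filters. A box filter is a smoothing filter that computes the average of all pixels in its kernel. Its implementation should therefore be fast enough not to take too much time even when it is used extensively, and for the same reason economical enough not to consume too many resources.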
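Previous research projects have tried to accelerate box filters on different computing platforms, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). Some studies (Christoph Rhemann, Asmaa Hosni, Michael Bleyer, Carsten Rother, and Margrit Gelautz. Fast cost-volume filtering for visual correspondence and beyond. CVPR 2011, pages 3017-3024, 2011; Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, and Enhua Wu. Constant time weighted median filtering for stereo matching and beyond. 2013 IEEE International Conference on Computer Vision, pages 49-56, 2013; H. Gupta, D. S. Antony, and R. G. N. Implementation of Gaussian and box kernel-based approximation of bilateral filter using OpenCL. In 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1-5, Nov 2015) use GPUs to accelerate a series of box filters, benefiting from the greater accessibility offered by CUDA/OpenCL. For power-constrained systems, however, FPGAs are becoming an increasingly competitive alternative. Another study (Hadi Parandeh-Afshar, Arkosnato Neogy, Philip Brisk, and Paolo Ienne. Compressor tree synthesis on commercial high-performance FPGAs. TRETS, 4:39:1-39:19, 2011) discloses an original method of implementing box filters on FPGAs. Unfortunately, although existing research has improved the speed of box filters, it has not fully exploited the parallelization capability of FPGAs. Unlike ordinary algorithms, this zigzag scanning method requires additional line buffers in order to cooperate with other algorithms, which wastes resources and limits generality.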
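Box filters are commonly computed with one of the following methods:
General method: simply add up the values in the kernel one by one. Each round is computed as:
$$F(x,y) = \frac{1}{(2r+1)^2} \sum_{dx=-r}^{r} \sum_{dy=-r}^{r} I(X+dx, Y+dy)$$
This method has no dependencies between operations, so it can be parallelized arbitrarily to achieve high speed. However, its computational complexity is too high: each pixel requires on average $(2r+1)^2$ operations, which makes an implementation consume a large amount of resources.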
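The general method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from the patent: the image is a list of rows indexed as `img[y][x]` ($x$ being the column), only kernel positions lying entirely inside the image are computed (the text does not specify border handling), and the function name `box_filter_naive` is hypothetical. The two innermost loops touch all $(2r+1)^2$ pixels of every kernel, and no iteration depends on another.

```python
def box_filter_naive(img, r):
    """General method: add up all (2r+1)^2 pixels of every kernel position.

    img is a list of rows (img[y][x], x = column). Only centres whose kernel lies
    fully inside the image are computed; border handling is an assumption here.
    Returns the pixel averages as floats.
    """
    h, w = len(img), len(img[0])
    k = 2 * r + 1
    out = []
    for y in range(r, h - r):            # kernel centre row Y
        row = []
        for x in range(r, w - r):        # kernel centre column X
            total = 0
            for dy in range(-r, r + 1):  # (2r+1) x (2r+1) independent additions
                for dx in range(-r, r + 1):
                    total += img[y + dy][x + dx]
            row.append(total / (k * k))
        out.append(row)
    return out
```

For the 4×4 image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]` and `r = 1`, the sketch returns `[[6.0, 7.0], [10.0, 11.0]]`.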
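Integral image method: the integral image is defined as the sum of all pixels from the coordinate origin to the current position. This method first computes the integral image:
$$\mathrm{Box}(x,y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i,j)$$
and then uses it to compute the final result: $F(x,y) = \mathrm{Box}(x+r,y+r) - \mathrm{Box}(x-r-1,y+r) - \mathrm{Box}(x+r,y-r-1) + \mathrm{Box}(x-r-1,y-r-1)$. This method requires the integral image to be cached for subsequent computations: at least $(2r+1)\times W+(2r+1)$ intermediate results must be buffered, each requiring [Figure PCTCN2020096461-appb-000003] bits. This is too expensive to be acceptable for a hardware implementation.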
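A minimal sketch of the integral-image method, under the same assumptions as above (list-of-rows image indexed as `img[y][x]`); the helper names `integral_image` and `window_sum` are hypothetical. `Box` at a negative coordinate is taken as 0 so that the four-corner formula also works at the image border. The sketch keeps the whole table in memory; a streaming hardware version would still have to buffer at least the $(2r+1)\times W+(2r+1)$ entries mentioned above, which is the cost the text objects to.

```python
def integral_image(img):
    """Box(x, y): sum of all pixels from the origin (0, 0) to (x, y) inclusive.

    Stored as box[y][x], with x the column and y the row.
    """
    h, w = len(img), len(img[0])
    box = [[0] * w for _ in range(h)]
    for y in range(h):
        running = 0                      # sum of img[y][0..x]
        for x in range(w):
            running += img[y][x]
            box[y][x] = running + (box[y - 1][x] if y > 0 else 0)
    return box


def window_sum(box, x, y, r):
    """Four-corner lookup for the kernel centred at column x, row y:
    Box(x+r,y+r) - Box(x-r-1,y+r) - Box(x+r,y-r-1) + Box(x-r-1,y-r-1).
    """
    def b(cx, cy):
        # Box at a negative coordinate is an empty region, i.e. 0.
        return box[cy][cx] if cx >= 0 and cy >= 0 else 0

    return b(x + r, y + r) - b(x - r - 1, y + r) - b(x + r, y - r - 1) + b(x - r - 1, y - r - 1)
```

With the 4×4 image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]`, `window_sum(integral_image(img), 2, 2, 1)` gives the 3×3 window sum 99.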
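Partial-sum method: first compute the partial sums, then use them to compute the final result. This saves resources, but the computation has strong data dependencies and cannot be accelerated by parallelization.
Summary of the Invention
The purpose of the present invention is to provide a fast and efficient method for computing a box filter.
To achieve the above purpose, the technical solution of the present invention provides an efficient parallel computing method for a box filter. The filter kernel moves from left to right and from top to bottom, starting at the upper-left corner of the box filter. If the radius of the filter kernel is $r$, the filter kernel consists of $2r+1$ columns of pixels, each column consisting of $2r+1$ rows of pixels; the kernel moves one column at a time when moving from left to right and one row at a time when moving from top to bottom. The average of all pixels within the filter kernel is defined as the pixel average, and the pixel average of the kernel is computed after every move; the pixel averages of all kernel positions together constitute the result of the box filter. The method is characterized in that computing the averages of all kernel positions comprises the following steps:
Step 1: for a given degree of parallelism $N$ and filter kernel radius $r$, build two parallel architectures, namely an architecture that does not require additional registers and an architecture that requires additional registers, wherein:
in the architecture that does not require additional registers, $N$ of the pixel averages formed by the filter kernel during one left-to-right pass are computed in parallel in each clock cycle; each pixel average is obtained directly by adding up all of its partial sums, a partial sum being the sum of the pixel values of one column of pixels;
in the architecture that requires additional registers, $N$ of the pixel averages formed by the filter kernel during one left-to-right pass are computed in parallel in each clock cycle. For the current clock cycle $T$, the registers hold the pixel averages at the positions of the $N$ filter kernels of the previous clock cycle $T-1$. Any kernel position $F_T(x,y)$ in the current cycle $T$ is obtained by moving the kernel position $F_{T-1}(x,y)$ of the previous cycle $N$ times from left to right, i.e. $F_{T-1}(x,y) = F_T(x-N,y)$. Let the pixel average at kernel position $F_T(x,y)$ of the current cycle $T$ be $F_T(x,y)$, and let the pixel average at kernel position $F_{T-1}(x,y)$ of the previous cycle, stored in the registers, be $F_{T-1}(x,y)$. Then:
$F_T(x,y) = F_{T-1}(x,y) - SS^- + SS^+$, where $SS^-$ is the sum of the pixel values of the $N$ columns of pixels swept by the left edge of kernel position $F_{T-1}(x,y)$ as it moves $N$ times from left to right, the sum of the pixel values of each column being defined as one partial sum; and $SS^+$ is the sum of the pixel values of the $N$ columns of pixels swept by the right edge of kernel position $F_{T-1}(x,y)$ as it moves $N$ times from left to right, the sum of the pixel values of each column again being defined as one partial sum;
Step 2: build an addition tree for each of the two architectures of Step 1, defined as addition tree one and addition tree two respectively; two instances of addition tree two are built, one for computing each $SS^-$ and one for computing each $SS^+$;
Step 3: search addition tree one and addition tree two from top to bottom and compute the pixel average of each filter kernel through them, reusing, when computing the current kernel, the partial sums it shares with previously computed kernels; tally the resources required by each of the two architectures of Step 1;
Step 4: select the architecture of Step 1 that consumes fewer resources to compute the box filter.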
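Preferably, the constraints for building addition tree one are: 1) the inputs of addition tree one are $2r+N$ of the partial sums; 2) addition tree one computes $N$ outputs simultaneously, each output being the pixel average of one filter kernel; 3) each output is the sum of $2r+1$ adjacent inputs;
Addition tree one is built as follows: 1) it is a combination of several binary trees with $\lfloor\log_2(2r+1)\rfloor+1$ layers, numbered layer 0 to layer $\lfloor\log_2(2r+1)\rfloor$; 2) each node of layer 0 is one partial sum, and in every layer from layer 1 to layer $\lfloor\log_2(2r+1)\rfloor$ the starting elements of adjacent nodes differ by 2 in input index; each node of a layer contains $2^k$ elements, $k$ being the layer number;
The nodes of the layers of addition tree one are combined into outputs as follows: express $2r+1$ as an $M$-bit binary number whose 1st to $M$-th bits, from right to left, correspond to layers 0 to $M-1$ of addition tree one; in the layers corresponding to the bits equal to 1, search for the required nodes in order from high to low and from left to right, and combine them to form an output.
Preferably, the constraints for building addition tree two, used to compute each $SS^-$ or each $SS^+$, are: 1) the inputs of addition tree two are $2N-1$ of the partial sums; 2) each output of addition tree two is the sum of $N$ adjacent inputs; 3) addition tree two computes $N$ outputs simultaneously, each output being one $SS^-$ or one $SS^+$;
Addition tree two is built as follows: 1) it is a combination of several binary trees with $\log_2 N+1$ layers, numbered layer 0 to layer $\log_2 N$; 2) each node of layer 0 is one partial sum, and from layer 1 onward the starting elements of adjacent nodes in each layer differ by 2 in input index; 3) the $N$-th node of layer 0 is combined with the $(N/2-1)$-th and the $(N/2)$-th nodes of layer 1 respectively; each newly generated node contains all elements of its parent nodes, stays in layer 1, and occupies the same position as its parent in layer 1; 4) when computing layer 2, nodes of layer 1 that contain three elements are preferred, unless condition 2) cannot then be satisfied; 5) the remaining layers are built in the same way as addition tree one;
The nodes of the layers of addition tree two are combined into outputs as follows: search for nodes to combine from top to bottom and from left to right, moving on to the next layer when the current layer has no suitable element to combine.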
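The present invention proposes two architectures for implementing a box filter in parallel and reduces the required resources by constructing addition trees. The invention has the following features:
(1) Based on the "partial sum" algorithm, the invention proposes two architectures suited to different parameter combinations; they eliminate the inherent data dependency of the algorithm and can therefore be parallelized arbitrarily;
(2) By using addition trees to reuse all intermediate results, the invention greatly reduces the computing resources required to parallelize the algorithm;
(3) The invention provides a program that automatically generates box filter code from input parameters; the generated code is accepted by high-level synthesis tools.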
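Brief Description of the Drawings
Figure 1 is a schematic diagram of the design of the box filter;
Figure 2 is a schematic diagram of the computation of $S_y(x+r,y)$;
Figure 3 is a schematic diagram of the computation of $F(x,y)$;
Figure 4 is a schematic diagram of computing the pixel average of the filter kernel in the architecture that does not require additional registers;
Figure 5 is a schematic diagram of computing the pixel average of the filter kernel in the architecture that requires additional registers;
Figure 6 is a schematic diagram of the construction of addition tree one;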
Figure 7 is a schematic diagram of the construction of addition tree two.
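Detailed Description of the Embodiments
The invention is further described below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope. It should further be understood that, after reading the teachings of the invention, those skilled in the art may make various changes or modifications to it, and such equivalents likewise fall within the scope defined by the claims appended to this application.
As shown in Figure 1, the outer frame is equivalent to the box filter and the small frame at the center is equivalent to the filter kernel. $r$ is the radius of the filter kernel and is an external input when the box filter is computed; the current position $F(x,y)$ of the filter kernel consists of $2r+1$ columns of pixels, each consisting of $2r+1$ rows of pixels. The average of all pixels within the filter kernel is defined as the pixel average. With XY coordinate axes established using the center pixel of the box filter as the origin, the pixel average at the current kernel position $F(x,y)$ is $F(x,y)$, given by: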
$$F(x,y) = \frac{1}{(2r+1)^2} \sum_{dx=-r}^{r} \sum_{dy=-r}^{r} I(X+dx, Y+dy)$$
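where $I(x,y)$ is the pixel value of the pixel at column $x$, row $y$, i.e. at coordinate $(x,y)$, within the current kernel position $F(x,y)$; $(X,Y)$ is the coordinate of the center pixel of the current kernel position $F(x,y)$; and $dx$ and $dy$ are the offsets relative to $X$ and $Y$ respectively.
When the box filter is computed, the filter kernel moves from left to right and from top to bottom, starting at the upper-left corner of the box filter: one column at a time when moving from left to right, and one row at a time when moving from top to bottom. Each move yields a different pixel average, and the goal of computing the box filter is to compute all of these pixel averages.
Based on the existing "partial sum" algorithm, the invention proposes two architectures suited to different parameter combinations; they eliminate the inherent data dependency of the algorithm and can therefore be parallelized arbitrarily. The "partial sum" algorithm is briefly introduced below: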
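For a filter kernel of radius $r$, this method first computes, for each column of pixels, the sum of the pixel values of its $2r+1$ rows. For example, for column $x+r$, the sum of the pixel values of its $2r+1$ rows is defined as $S_y(x+r,y)$, as shown in Figure 2. $S_y(x+r,y)$ is computed from the previous result $S_y(x+r,y-1)$, which is the sum of the pixel values of the column obtained by moving the current column $x+r$ up by one row: $S_y(x+r,y) = S_y(x+r,y-1) + I(x+r,y+r) - I(x+r,y-r-1)$. In other words, the current column $x+r$ is regarded as the column at the same position moved down by one row; as that column moves down, its upper edge passes the pixel with value $I(x+r,y-r-1)$ and its lower edge passes the pixel with value $I(x+r,y+r)$. The column sums are then used to compute the pixel average of the filter kernel.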
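The column-sum recurrence can be sketched as follows (the function name `column_sums` is hypothetical; only interior centre rows are produced, and the image is assumed to have at least $2r+1$ rows). Each column keeps a running sum that is updated with one added and one subtracted pixel per row instead of re-adding $2r+1$ pixels.

```python
def column_sums(img, r):
    """Partial sums S_y(x, y) for every column x and centre row y = r .. h-r-1.

    The first row is summed directly; each following row updates every column
    with S_y(x, y) = S_y(x, y-1) + I(x, y+r) - I(x, y-r-1).
    img[y][x] is the pixel in column x, row y; returns S[y - r][x].
    """
    h, w = len(img), len(img[0])
    current = [sum(img[dy][x] for dy in range(2 * r + 1)) for x in range(w)]
    S = [current[:]]
    for y in range(r + 1, h - r):
        for x in range(w):
            # Lower edge enters row y+r, upper edge leaves row y-r-1.
            current[x] += img[y + r][x] - img[y - r - 1][x]
        S.append(current[:])
    return S
```

For the 4×4 image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]` with `r = 1`, the result is `[[15, 18, 21, 24], [27, 30, 33, 36]]`; the second row is obtained from the first as, for example, $15 + 13 - 1 = 27$.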
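As shown in Figure 3, the pixel average within the current kernel position $F(x,y)$ is $F(x,y) = F(x-1,y) + S_y(X+r,y) - S_y(X-r-1,y)$, where $F(x-1,y)$ is the average of all pixels within the previous kernel position $F(x-1,y)$, i.e. the position before the kernel moved from left to right to $F(x,y)$; $(X,Y)$ is the coordinate of the center pixel of the current position $F(x,y)$; $S_y(X+r,y)$ is the sum of the pixel values of column $X+r$; and $S_y(X-r-1,y)$ is the sum of the pixel values of column $X-r-1$. That is, when the previous kernel position $F(x-1,y)$ moves one column from left to right to form the current position $F(x,y)$, $S_y(X+r,y)$ is the sum of the pixel values of the column swept by its right edge, and $S_y(X-r-1,y)$ is the sum of the pixel values of the column swept by its left edge.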
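We want to compute several pixel averages in parallel to increase the computation speed. Assuming a degree of parallelism $N$, we want to obtain $N$ pixel averages simultaneously: $F(x,y), F(x+1,y), F(x+2,y), \dots, F(x+N-1,y)$. This requires the computations of these final results to be independent of one another. In the original partial-sum formula $F(x,y) = F(x-1,y) + S_y(X+r,y) - S_y(X-r-1,y)$, however, the computation of $F(x,y)$ depends on the result of $F(x-1,y)$.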
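The row recurrence can be sketched as below to make the dependency concrete. The sketch takes one row of column sums $S_y$, returns window sums $F$ (dividing by $(2r+1)^2$ would give the averages), and uses a hypothetical function name. Each iteration reads `F[-1]`, so the $N$ results of a row cannot be produced independently in this form.

```python
def row_sums_sequential(S_row, r):
    """Window sums along one row with F(X) = F(X-1) + S_y(X+r) - S_y(X-r-1).

    S_row holds the column sums of one row; the first window (centre X = r)
    is summed directly. Every later value reads the previous one (F[-1]),
    which is the data dependency that prevents parallel evaluation.
    """
    k = 2 * r + 1
    F = [sum(S_row[:k])]
    for X in range(r + 1, len(S_row) - r):
        F.append(F[-1] + S_row[X + r] - S_row[X - r - 1])
    return F
```

For example, `row_sums_sequential([15, 18, 21, 24], 1)` returns `[54, 63]`.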
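To this end, for a given degree of parallelism $N$ and kernel radius $r$, the invention designs two parallelizable architectures, namely an architecture that does not require additional registers and an architecture that requires additional registers, wherein:
In the architecture that does not require additional registers, each final result is computed by adding all of its partial sums directly (a partial sum being the sum of the pixel values of one column of pixels), without using previous results. Example: with parallelism $N=2$, the pixel averages $F_0$ and $F_1$ must be computed at the same time, as shown in Figure 4. In Figure 4, $S_a$ to $S_h$ are the sums of the pixel values of different columns; $F_0$ is the accumulation of $S_a$ through $S_g$, and $F_1$ is the accumulation of $S_b$ through $S_h$. In this architecture, the pixel average is computed as: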
$$F(X,Y) = \sum_{dx=-r}^{r} S_y(X+dx, Y)$$
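where $S_y(x,y)$ is the sum of the pixel values of column $x$, and $(X,Y)$ is the center pixel of the current kernel position. As in the examples, the constant factor $1/(2r+1)^2$ is omitted, so $F$ here denotes the window sum.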
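One clock cycle of the architecture without additional registers can be sketched as follows, assuming $F$ denotes the window sum and that the $N$ kernels of a cycle have their leftmost columns at `start`, ..., `start + N - 1` within a row of column sums (the function name and argument layout are hypothetical). Every output reads only the $2r+N$ partial sums of the cycle and none reads another output, so in hardware the $N$ sums could be evaluated side by side; the returned count $N \cdot 2r$ is the adder cost before any sharing.

```python
def cycle_no_registers(S_row, start, n, r):
    """One clock cycle of the architecture without additional registers.

    Computes the window sums F_0 .. F_{N-1} of the N kernels whose leftmost
    columns are start, start+1, ..., start+n-1. Only the 2r+N partial sums
    S_row[start : start+2r+n] are read, and no output uses another output.
    Returns (outputs, adders) where adders = N * 2r is the cost without sharing.
    """
    k = 2 * r + 1
    window = S_row[start:start + 2 * r + n]          # the 2r+N inputs of this cycle
    outputs = [sum(window[i:i + k]) for i in range(n)]
    return outputs, n * (k - 1)
```

With $N=2$, $r=3$ and $S_a, \dots, S_h = 1, \dots, 8$ (the setting of Figure 4), `cycle_no_registers(list(range(1, 9)), 0, 2, 3)` returns `([28, 35], 12)`: $F_0 = S_a + \dots + S_g$ and $F_1 = S_b + \dots + S_h$.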
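In the architecture that requires additional registers, the pixel averages of $N$ filter kernels in the same row are computed at once in each clock cycle. For the current clock cycle $T$, the registers hold the pixel averages of the $N$ filter kernels of the previous clock cycle $T-1$. Any kernel position $F_T(x,y)$ of the current cycle $T$ is obtained by moving the kernel position $F_{T-1}(x,y)$ of the previous cycle $N$ times from left to right, i.e. $F_{T-1}(x,y) = F_T(x-N,y)$. Let the pixel average at kernel position $F_T(x,y)$ of the current cycle be $F_T(x,y)$, and let the pixel average at kernel position $F_{T-1}(x,y)$ of the previous cycle, stored in the registers, be $F_{T-1}(x,y)$. Then:
$F_T(x,y) = F_{T-1}(x,y) - SS^- + SS^+$, where $SS^-$ is the sum of the pixel values of the $N$ columns of pixels swept by the left edge of kernel position $F_{T-1}(x,y)$ as it moves $N$ times from left to right, the sum of the pixel values of each column being defined as one partial sum:
$$SS^- = \sum_{i=1}^{N} S_y(X-r-i, Y)$$
and $SS^+$ is the sum of the pixel values of the $N$ columns of pixels swept by the right edge of kernel position $F_{T-1}(x,y)$ as it moves $N$ times from left to right, the sum of the pixel values of each column again being defined as one partial sum:
$$SS^+ = \sum_{i=0}^{N-1} S_y(X+r-i, Y)$$
where $(X,Y)$ is the center pixel of the current kernel position $F_T(x,y)$.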
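Example: with parallelism $N=2$, the pixel averages $F_0$ and $F_1$ must be computed at the same time, as shown in Figure 5. In Figure 5, $S_a$ to $S_j$ are the sums of the pixel values of different columns, and $F_0'$ and $F_1'$ are the pixel averages of the previous clock cycle stored in the registers. Then $F_0 = F_0' - S_a - S_b + S_h + S_i$ and $F_1 = F_1' - S_b - S_c + S_i + S_j$.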
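One clock cycle of the architecture with additional registers can be sketched as follows, under an assumed indexing that is not taken from the patent: `prev[i]` is the stored window sum $F_i'$ whose kernel has its leftmost column at `start - N + i`, and the new kernels lie $N$ columns further right. $SS^-$ covers the $N$ columns leaving on the left and $SS^+$ the $N$ columns entering on the right, so each update needs only $F'$, $SS^-$ and $SS^+$ and no other output of the same cycle. The function name is hypothetical.

```python
def cycle_with_registers(prev, S_row, start, n, r):
    """One clock cycle of the architecture with additional registers.

    prev[i] is the window sum F'_i stored in the previous cycle; its kernel
    starts at column start - n + i of S_row (an assumed layout). Returns the
    new window sums F_i = F'_i - SS^-_i + SS^+_i for kernels starting at start + i.
    """
    k = 2 * r + 1
    out = []
    for i in range(n):
        old = start - n + i                          # leftmost column of F'_i
        ss_minus = sum(S_row[old:old + n])           # N columns leaving on the left
        ss_plus = sum(S_row[old + k:old + k + n])    # N columns entering on the right
        out.append(prev[i] - ss_minus + ss_plus)
    return out
```

With $N=2$, $r=3$ and $S_a, \dots, S_j = 1, \dots, 10$, `cycle_with_registers([28, 35], list(range(1, 11)), 2, 2, 3)` returns `[42, 49]`, i.e. $F_0 = 28 - 1 - 2 + 8 + 9$ and $F_1 = 35 - 2 - 3 + 9 + 10$, matching the two equations above.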
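Suppose the degree of parallelism is $N=6$ and the kernel radius is $r=4$. Computing the final results $F(x,y)$ from the partial sums $S_y(x,y)$ involves the following operations:
A) For the architecture that does not require additional registers, following the formula of Figure 4, the following six equations must be evaluated simultaneously, requiring $2Nr = 48$ adders in total:
i. $F_0 = S_a + S_b + S_c + S_d + S_e + S_f + S_g + S_h + S_i$
ii. $F_1 = S_b + S_c + S_d + S_e + S_f + S_g + S_h + S_i + S_j$
iii. $F_2 = S_c + S_d + S_e + S_f + S_g + S_h + S_i + S_j + S_k$
iv. $F_3 = S_d + S_e + S_f + S_g + S_h + S_i + S_j + S_k + S_l$
v. $F_4 = S_e + S_f + S_g + S_h + S_i + S_j + S_k + S_l + S_m$
vi. $F_5 = S_f + S_g + S_h + S_i + S_j + S_k + S_l + S_m + S_n$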
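B) For the architecture that requires additional registers, following the formula of Figure 5, the following 12 equations must be evaluated simultaneously, requiring $2N(N-1) = 60$ adders in total: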
Here $S_a, S_b, \dots, S_t$ are consecutive column sums, $S_a$ being the leftmost column of $F_0'$:
i. $SS^-_0 = S_a + S_b + S_c + S_d + S_e + S_f$
ii. $SS^-_1 = S_b + S_c + S_d + S_e + S_f + S_g$
iii. $SS^-_2 = S_c + S_d + S_e + S_f + S_g + S_h$
iv. $SS^-_3 = S_d + S_e + S_f + S_g + S_h + S_i$
v. $SS^-_4 = S_e + S_f + S_g + S_h + S_i + S_j$
vi. $SS^-_5 = S_f + S_g + S_h + S_i + S_j + S_k$
vii. $SS^+_0 = S_j + S_k + S_l + S_m + S_n + S_o$
viii. $SS^+_1 = S_k + S_l + S_m + S_n + S_o + S_p$
ix. $SS^+_2 = S_l + S_m + S_n + S_o + S_p + S_q$
x. $SS^+_3 = S_m + S_n + S_o + S_p + S_q + S_r$
xi. $SS^+_4 = S_n + S_o + S_p + S_q + S_r + S_s$
xii. $SS^+_5 = S_o + S_p + S_q + S_r + S_s + S_t$
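The operand lists of A) and B) and their adder counts can be generated mechanically. This sketch labels the column sums with consecutive letters starting at the leftmost column any equation uses (for B, the left edge of $F_0'$), so it only works while at most 26 columns are involved; the helper names are hypothetical. A sum of $m$ operands costs $m-1$ two-input adders.

```python
from string import ascii_lowercase


def no_register_terms(n, r):
    """Operand letters of F_0 .. F_{N-1} in step A): 2r+1 adjacent column sums each."""
    k = 2 * r + 1
    return [ascii_lowercase[i:i + k] for i in range(n)]


def register_terms(n, r):
    """Operand letters of SS^-_i and SS^+_i in step B), column 'a' at the left edge of F'_0."""
    k = 2 * r + 1
    minus = [ascii_lowercase[i:i + n] for i in range(n)]        # columns leaving F'_i
    plus = [ascii_lowercase[i + k:i + k + n] for i in range(n)]  # columns entering F_i
    return minus, plus


def adder_count(groups):
    """A sum of m operands needs m - 1 two-input adders."""
    return sum(len(g) - 1 for g in groups)
```

For $N=6$, $r=4$, `adder_count(no_register_terms(6, 4))` is 48; with `minus, plus = register_terms(6, 4)`, `minus` runs from `"abcdef"` to `"fghijk"`, `plus` from `"jklmno"` to `"opqrst"`, and `adder_count(minus + plus)` is 60. Since $2N(N-1)$ already equals 60, the $2N$ operations that then form each $F$ from $F'$, $SS^-$ and $SS^+$ appear not to be included in that figure.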
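By using addition trees to reuse all intermediate results of the above architectures, the invention greatly reduces the computing resources required to parallelize the algorithm.
Addition tree one, suited to the architecture that does not require additional registers, performs the operations of step A) above. With the addition tree, the number of adders required drops to 20.
Addition tree one is built according to the following aspects:
First) Requirements
1) $2r+N$ inputs are needed; in step A) above, with $N=6$ and $r=4$, the inputs are the 14 sums $S_a$ to $S_n$;
2) each output is the sum of $2r+1$ adjacent inputs; in step A) above, with $r=4$, each output $F$ is the sum of 9 $S$ terms;
3) $N$ outputs are computed simultaneously; in step A) above, with $N=6$, the 6 outputs $F_0$ to $F_5$ are computed simultaneously.
Second) Construction
1) it is a combination of several binary trees with $\lfloor\log_2(2r+1)\rfloor+1$ layers, numbered layer 0 to layer $\lfloor\log_2(2r+1)\rfloor$;
2) in each layer, the starting elements of adjacent nodes differ by 2 in input index;
3) each node of layer $k$ contains $2^k$ elements.
As shown in Figure 6, $a$ to $n$ are the sums of the pixel values of different columns, i.e. the inputs, which form the nodes of layer 0; the nodes of layer 1 contain the elements $bc$, $de$, and so on, and the other layers follow by analogy.
Third) Combination
Express $2r+1$ in binary; in the layers corresponding to the bits equal to 1, search for the required nodes in order from high to low and from left to right, and combine them.
Example: $2r+1 = 19 = 10011_2$, so nodes are taken from layers 4, 1 and 0 and combined to obtain the output.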
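The construction and combination rules of addition tree one can be sketched as follows. One detail is inferred from the worked example of Figure 6 rather than stated as a rule: every node above layer 0 starts at an odd input index (nodes $bc$, $de$, ..., $bcde$, ...). Because $2r+1$ is always odd, each output then consists of one layer-0 input plus one node per 1-bit of $2r$, placed from high to low and left to right. Shared nodes are created once, and the adder count is the number of distinct nodes plus the additions that join pieces into outputs. The function name and the `(start, size)` representation are hypothetical.

```python
def build_tree_one(n, r):
    """Sketch of addition tree one for N outputs of width 2r+1.

    Returns (outputs, adders): outputs[k] lists the (start, size) pieces whose
    sums form F_k; a piece of size 1 is an input S[start], larger pieces are
    tree nodes. Nodes above layer 0 start at odd input indices (assumed from
    the Figure 6 example: bc, de, ..., bcde, ...).
    """
    nodes = set()  # distinct internal nodes; each costs one adder

    def add_node(start, size):
        # A node of 2^k inputs (k >= 1) adds two children of 2^(k-1) inputs.
        if size == 1 or (start, size) in nodes:
            return
        nodes.add((start, size))
        add_node(start, size // 2)
        add_node(start + size // 2, size // 2)

    two_r = 2 * r
    # 1-bits of 2r, high to low (bit 0 of 2r+1 is always the single input).
    powers = [1 << b for b in reversed(range(two_r.bit_length())) if two_r >> b & 1]
    outputs = []
    for k in range(n):
        pieces, pos = [], k
        if k % 2 == 0:                  # even start: single input on the left
            pieces.append((k, 1))
            pos += 1
        for p in powers:                # odd-aligned nodes, left to right
            add_node(pos, p)
            pieces.append((pos, p))
            pos += p
        if k % 2 == 1:                  # odd start: single input on the right
            pieces.append((pos, 1))
        outputs.append(pieces)
    adders = len(nodes) + sum(len(p) - 1 for p in outputs)
    return outputs, adders
```

For $N=6$, $r=4$ the sketch reports 20 adders, and the pieces of $F_0$ and $F_1$ are `[(0, 1), (1, 8)]` and `[(1, 8), (9, 1)]`, i.e. $S_a$ plus the node $bcdefghi$, and the node $bcdefghi$ plus $S_j$.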
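Addition tree two, suited to the architecture that requires additional registers, is used to compute each $SS^-$ and each $SS^+$, for example to perform the computation of $SS^-_0$ to $SS^-_5$ in step B), or the computation of $SS^+_0$ to $SS^+_5$ in step B). With the addition trees, the number of adders required drops to 32.
Addition tree two for computing $SS^-$ is built according to the following aspects (addition tree two for computing $SS^+$ is built in the same way):
First) Requirements
1) $2N-1$ inputs are needed; in step B) above, with $N=6$ and $r=4$, the inputs are the 11 sums $S_a$ to $S_k$;
2) each output is the sum of $N$ adjacent inputs; in step B) above, with $N=6$, each output $SS$ is the sum of 6 $S$ terms;
3) $N$ outputs are computed simultaneously; in step B) above, with $N=6$, the 6 outputs $SS^-_0$ to $SS^-_5$ are computed simultaneously.
Second) Construction
1) it is a combination of several binary trees with $\log_2 N+1$ layers, numbered layer 0 to layer $\log_2 N$;
2) in each layer, the starting elements of adjacent nodes differ by 2 in input index;
3) the $N$-th input element (in layer 0) is combined with the $(N/2-1)$-th and the $(N/2)$-th nodes of layer 1 respectively. Each newly generated node contains all elements of its parent nodes, stays in layer 1, and occupies the same position as its parent in layer 1;
4) when computing layer 2, nodes of layer 1 that contain three elements are preferred, unless condition 2) cannot then be satisfied;
5) the remaining layers are built in the same way as addition tree one of the architecture that does not require additional registers.
As shown in Figure 7, $a$ to $k$ are the sums of the pixel values of different columns, i.e. the inputs, which form the nodes of layer 0.
Third) Combination
Search for nodes to combine from top to bottom and from left to right, moving on to the next layer when the current layer has no suitable element to combine.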
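After the addition trees are applied, the operations in step A) and step B) become as follows.
For the architecture that does not require additional registers, the following operations are performed, requiring 20 adders in total:
i. tmp_bc = S_b + S_c; tmp_de = S_d + S_e; tmp_fg = S_f + S_g;
ii. tmp_hi = S_h + S_i; tmp_jk = S_j + S_k; tmp_lm = S_l + S_m;
iii. tmp_bcde = tmp_bc + tmp_de; tmp_defg = tmp_de + tmp_fg;
iv. tmp_fghi = tmp_fg + tmp_hi; tmp_hijk = tmp_hi + tmp_jk;
v. tmp_jklm = tmp_jk + tmp_lm;
vi. tmp_bcdefghi = tmp_bcde + tmp_fghi;
vii. tmp_defghijk = tmp_defg + tmp_hijk;
viii. tmp_fghijklm = tmp_fghi + tmp_jklm;
ix. F_0 = S_a + tmp_bcdefghi; F_1 = tmp_bcdefghi + S_j;
x. F_2 = S_c + tmp_defghijk; F_3 = tmp_defghijk + S_l;
xi. F_4 = S_e + tmp_fghijklm; F_5 = tmp_fghijklm + S_n;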
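The 20-operation schedule can be checked with a short script that performs exactly those additions on arbitrary column sums and compares each $F_k$ with the direct sum of nine adjacent inputs. The dictionary keys reuse the `tmp_` node names without the prefix; the function name `tree_one_schedule` is hypothetical.

```python
def tree_one_schedule(S):
    """Run the 20-adder schedule for N = 6, r = 4 on column sums S['a'] .. S['n'].

    Returns ([F_0, ..., F_5], number_of_additions).
    """
    count = 0

    def add(x, y):
        nonlocal count
        count += 1
        return x + y

    t = {}
    for p in ("bc", "de", "fg", "hi", "jk", "lm"):                  # steps i-ii
        t[p] = add(S[p[0]], S[p[1]])
    for a, b in (("bc", "de"), ("de", "fg"), ("fg", "hi"),
                 ("hi", "jk"), ("jk", "lm")):                        # steps iii-v
        t[a + b] = add(t[a], t[b])
    for a, b in (("bcde", "fghi"), ("defg", "hijk"), ("fghi", "jklm")):  # vi-viii
        t[a + b] = add(t[a], t[b])
    F = [add(S["a"], t["bcdefghi"]), add(t["bcdefghi"], S["j"]),     # step ix
         add(S["c"], t["defghijk"]), add(t["defghijk"], S["l"]),     # step x
         add(S["e"], t["fghijklm"]), add(t["fghijklm"], S["n"])]     # step xi
    return F, count
```

For any values of $S_a, \dots, S_n$ the function performs 20 additions, and $F_k$ equals the sum of the nine inputs starting at the $k$-th letter; in particular $F_1$ must end with $S_j$, not $S_i$.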
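The invention also provides a program that automatically generates box filter code from the input parameters; the generated code is accepted by high-level synthesis tools:
a) the program contains two hand-written code templates, each using one of the two parallel architectures described above;
b) the program reads the input parameters (degree of parallelism $N$ and kernel radius $r$) and then generates the addition trees of the two architectures according to the rules above;
c) the program searches the addition trees from top to bottom, generates code, and tallies the resources required by each of the two architectures;
d) the program compares the two solutions, selects the one that consumes fewer resources, fills in the corresponding code template, and outputs C++ code that high-level synthesis tools accept.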
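Steps b) and c) can be illustrated for addition tree one only. This is a hypothetical sketch, not the authors' program: it does not build addition tree two, omits the hand-written templates and the comparison of step d), and names inputs `S[i]`, internal nodes `t_<start>_<size>` and outputs `F[k]`, with `int` as a placeholder type. It follows the combination rule of the worked example (a single input plus power-of-two nodes starting at odd indices), emits each shared node once with its children first, and counts the adders.

```python
def generate_tree_one_code(n, r):
    """Hypothetical generator: emit C++-style statements for addition tree one.

    Inputs are S[0] .. S[2r+n-1], internal nodes t_<start>_<size>, outputs F[k].
    Returns (lines, adders). 'int' is only a placeholder type.
    """
    lines, done = [], set()

    def name(start, size):
        return f"S[{start}]" if size == 1 else f"t_{start}_{size}"

    def emit(start, size):
        # Post-order traversal: children are declared before their parent.
        if size == 1 or (start, size) in done:
            return
        half = size // 2
        emit(start, half)
        emit(start + half, half)
        done.add((start, size))
        lines.append(f"int {name(start, size)} = {name(start, half)} + {name(start + half, half)};")

    two_r = 2 * r
    powers = [1 << b for b in reversed(range(two_r.bit_length())) if two_r >> b & 1]
    combine = 0
    for k in range(n):
        pieces, pos = [], k
        if k % 2 == 0:                  # even start: single input first
            pieces.append((k, 1))
            pos += 1
        for p in powers:                # shared power-of-two nodes, left to right
            emit(pos, p)
            pieces.append((pos, p))
            pos += p
        if k % 2 == 1:                  # odd start: single input last
            pieces.append((pos, 1))
        combine += len(pieces) - 1
        lines.append(f"F[{k}] = " + " + ".join(name(s, z) for s, z in pieces) + ";")
    return lines, len(done) + combine
```

For $N=6$, $r=4$ it emits 14 node statements and six output statements, starting with `int t_1_2 = S[1] + S[2];` and including `F[1] = t_1_8 + S[9];`, and reports 20 adders, matching the schedule listed above.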

Claims (3)

  1. 一种盒式滤波器并行高效计算方法,将滤波器核自盒式滤波器的左上角开始自左向右、自上向下移动,设滤波器核的半径为r,则滤波器核由(2r+1)列像素组成,每一列像素由(2r+1)行像素组成,则滤波器核自左向右移动时每次移动一列,滤波器核自上向下移动时每次移动一行,将滤波器核内所有像素的平均值定义为像素平均值,计算每次移动后的滤波器核所对应的像素平均值,所有滤波器核所对应的像素平均值即为盒式滤波器的计算结果,其特征在于,计算所有滤波器核所对应的平均值包括以下步骤:
    步骤1、针对给定的并行度N及滤波器核的半径r建立两种并行的架构,分别为不需要额外寄存器的架构及需要额外寄存器的架构,其中:
    在不需要额外寄存器的架构中,每个时钟周期内并行计算滤波器核在某次自左向右移动过程中所形成的所有像素平均值中N个像素平均值,每个像素平均值直接由所有部分和相加而得,部分和为一列像素的像素值的和;
    在需要额外寄存器的架构中,每个时钟周期内并行计算滤波器核在某次自左向右移动过程中所形成的所有像素平均值中N个像素平均值,对于当前时钟周期T而言,在寄存器中存储有上一时钟周期(T-1)的N个滤波器核的像素平均值,则当前时钟周期T的任一滤波器核所在位置F T(x,y)由上一时钟周期的滤波器核所在位置F T-1(x,y)自左向右移动N次得到,即F T-1(x,y)=F T(x-N,y),设当前时钟周期T的滤波器核所在位置F T(x,y)的像素平均值为F T(x,y),存储于寄存器内的上一时钟周期的滤波器核所在位置F T-1(x,y)的像素平均值为F T-1(x,y),则有:
    F T(x,y)=F T-1(x,y)-SS -+SS +,式中,SS -表示滤波器核所在位置F T-1(x,y)自左向右移动N次时,滤波器核所在位置F T-1(x,y)的左侧边缘所经过的N列像素的像素值的和,每一列像素的像素值的和定义为一个部分和;SS +表示滤波器核所在位置F T-1(x,y)自左向右移动N次时,滤波器核所在位置F T-1(x,y)的右侧边缘所经过的N列像素的像素值的和,每一列像素的像素值的和定义为一个部分和;
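    Step 2: build adder trees for the two architectures of step 1, defined as adder tree 1 and adder tree 2 respectively, wherein two instances of adder tree 2 are built, one for computing each $SS^{-}$ and one for computing each $SS^{+}$;
    Step 3: search adder tree 1 and adder tree 2 top-down, compute the pixel average corresponding to each filter kernel through adder tree 1 and adder tree 2 respectively, reusing, when computing the current filter kernel, the same partial sums already used when computing previous filter kernels, and count the resources required by each of the two architectures built in step 1;
    Step 4: select whichever of the two architectures built in step 1 consumes fewer resources to compute the box filter.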
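  2. The efficient parallel computing method for a box filter according to claim 1, characterized in that the constraints for building adder tree 1 comprise: 1) the inputs of adder tree 1 are (2r+N) of the partial sums; 2) adder tree 1 computes N outputs simultaneously, each output being the pixel average of one filter kernel; 3) each output is the sum of 2r+1 adjacent inputs;
    adder tree 1 is built as follows: 1) it consists of several binary trees combined together, with $\lfloor \log_2(2r+1) \rfloor + 1$ layers in total, numbered layer 0 to layer $\lfloor \log_2(2r+1) \rfloor$; 2) each node of layer 0 is one of the partial sums; in each layer from layer 1 to layer $\lfloor \log_2(2r+1) \rfloor$, the input indices of the starting elements of adjacent nodes differ by 2; each node of layer $k$ contains $2^k$ elements;
    the nodes of the layers of adder tree 1 are combined into the outputs as follows: express 2r+1 as an M-bit binary number, bits 1 to M counted from right to left corresponding to layers 0 to M−1 of adder tree 1; on the layers corresponding to the bits whose value is 1, search for the required nodes from higher to lower layers and from left to right, and combine them to form the outputs.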
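  3. The efficient parallel computing method for a box filter according to claim 2, characterized in that the constraints for building adder tree 2, used for computing each $SS^{-}$ or each $SS^{+}$, are: 1) the inputs of adder tree 2 are (2N−1) of the partial sums; 2) each output of adder tree 2 is the sum of N adjacent inputs; 3) adder tree 2 computes N outputs simultaneously, each output being one $SS^{-}$ or one $SS^{+}$;
    adder tree 2 is built as follows: 1) it consists of several binary trees combined together, with $\log_2 N + 1$ layers in total, numbered layer 0 to layer $\log_2 N$; 2) each node of layer 0 is one of the partial sums, and from layer 1 onward, the input indices of the starting elements of adjacent nodes in each layer differ by 2; 3) the N-th node of layer 0 is combined with the (N/2−1)-th node and with the (N/2)-th node of layer 1, respectively; each newly generated node contains all elements of its parent nodes, remains in layer 1, and occupies the same position as its parent node in layer 1; 4) when computing layer 2, nodes of layer 1 that contain three elements are preferred, unless condition 2) could then not be satisfied; 5) the remaining layers are built in the same way as adder tree 1;
    the nodes of the layers of adder tree 2 are combined into the outputs as follows: nodes are searched top-down and from left to right and combined; when the current layer contains no element suitable for combination, the search proceeds to the next layer.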
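The two architectures of claim 1 compute the same window sums by different routes. The Python sketch below is a behavioral model only, not HLS code, and it rests on assumptions of its own: one band of rows is processed, image borders are ignored, sums are divided by (2r+1)² only at the end, and all function names are hypothetical. It checks that the register update $F_T(x,y) = F_{T-1}(x,y) - SS^{-} + SS^{+}$ reproduces direct summation of 2r+1 partial sums.

```python
def column_partial_sums(img, y, r):
    """Partial sum of each column: the 2r+1 pixels in rows y-r..y+r."""
    return [sum(img[row][col] for row in range(y - r, y + r + 1))
            for col in range(len(img[0]))]


def direct_sums(P, r, xs):
    """No extra registers: add the 2r+1 partial sums of each window."""
    return [sum(P[x - r:x + r + 1]) for x in xs]


def register_update(prev, P, r, n, xs):
    """Extra registers: F_T(x) = F_{T-1}(x - N) - SS- + SS+ for each window."""
    result = []
    for f_prev, x in zip(prev, xs):
        assert x - n - r >= 0, "the window N columns to the left must exist"
        ss_minus = sum(P[x - n - r:x - r])         # N columns left behind by the left edge
        ss_plus = sum(P[x - n + r + 1:x + r + 1])  # N columns entered by the right edge
        result.append(f_prev - ss_minus + ss_plus)
    return result


r, n = 2, 4
img = [[(7 * row + 3 * col) % 11 for col in range(16)] for row in range(5)]
P = column_partial_sums(img, y=2, r=r)

xs_prev = list(range(r, r + n))       # cycle T-1: windows centred at x = 2..5
xs_now = [x + n for x in xs_prev]     # cycle T: each window moved N columns right
F_prev = direct_sums(P, r, xs_prev)   # the values held in the registers
F_now = register_update(F_prev, P, r, n, xs_now)
assert F_now == direct_sums(P, r, xs_now)
averages = [s / (2 * r + 1) ** 2 for s in F_now]
```

In this model the direct architecture touches 2r+1 partial sums per output, while the register update touches 2N of them (N for $SS^{-}$, N for $SS^{+}$); which is cheaper depends on r and N, which is why step 4 compares the counted resources.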
PCT/CN2020/096461 2019-08-26 2020-06-17 Efficient parallel computing method for box filter WO2021036424A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/054,169 US11094071B1 (en) 2019-08-26 2020-06-17 Efficient parallel computing method for box filter

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910788715.0 2019-08-26
CN201910788715.0A CN110648287B (zh) 2019-08-26 2019-08-26 Efficient parallel computing method for box filter

Publications (1)

Publication Number Publication Date
WO2021036424A1 true WO2021036424A1 (zh) 2021-03-04

Family

ID=69009733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096461 WO2021036424A1 (zh) 2019-08-26 2020-06-17 Efficient parallel computing method for box filter

Country Status (3)

Country Link
US (1) US11094071B1 (zh)
CN (1) CN110648287B (zh)
WO (1) WO2021036424A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110648287B (zh) * 2019-08-26 2022-11-25 ShanghaiTech University Efficient parallel computing method for box filter

Citations (5)

Publication number Priority date Publication date Assignee Title
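US20110116727A1 (en) * 2009-11-13 2011-05-19 Daegu Gyeongbuk Institute of Science and Technology Method and apparatus for performing parallel box filtering in region based image processing
US8781234B2 (en) * 2010-10-01 2014-07-15 Intel Corporation Optimized fast hessian matrix computation architecture
CN104571401A (zh) * 2013-10-18 2015-04-29 No. 8358 Research Institute of the Third Academy of China Aerospace Science and Industry Corporation Device for implementing a high-speed guided filter on an FPGA platform
CN110097514A (zh) * 2019-04-18 2019-08-06 Nanjing University of Science and Technology Sparse-approximation-accelerated bilateral filtering method based on a learned cosine dictionary
CN110648287A (zh) * 2019-08-26 2020-01-03 ShanghaiTech University Efficient parallel computing method for box filter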

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8005881B1 (en) * 2007-03-02 2011-08-23 Xilinx, Inc. Scalable architecture for rank order filtering
US7889923B1 (en) * 2007-05-31 2011-02-15 Adobe Systems Incorporated System and method for sparse histogram merging
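CN105787910B (zh) * 2015-12-24 2019-01-11 Wuhan Hongruida Information Technology Co., Ltd. Computation optimization method, based on a heterogeneous platform, for a face-region filtering method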

Also Published As

Publication number Publication date
CN110648287B (zh) 2022-11-25
US20210248764A1 (en) 2021-08-12
US11094071B1 (en) 2021-08-17
CN110648287A (zh) 2020-01-03

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20857213; country of ref document: EP; kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20857213; country of ref document: EP; kind code of ref document: A1