CN112750101A - FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit) - Google Patents

Publication number
CN112750101A
Authority
CN
China
Prior art keywords
gpu
fft
algorithm
calculation
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011256011.8A
Other languages
Chinese (zh)
Inventor
Inventor name not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Pingheng Intelligent Technology Co ltd
Original Assignee
Beijing Pingheng Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pingheng Intelligent Technology Co ltd
Priority to CN202011256011.8A
Publication of CN112750101A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a radix-2 FFT-based GPU parallel detection algorithm, chiefly a fast computation scheme for inspecting large-scale images. The three nested loops of the original FFT butterfly algorithm are handled as follows: the outermost loop is computed serially, while the two inner loops are unified under a single formula and computed in parallel. The outermost loop runs a number of times equal to the logarithm of the data size, similar to the depth of a binary tree, so the serial outer workload is small. The two inner loops, which carry most of the computation, are computed in parallel on the GPU using the unified formula. This achieves fast FFT computation.

Description

FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit)
Technical Field
The invention mainly relates to industrial-grade, high-precision, real-time inspection, and in particular to an image processing technique for detecting small defects in ultra-large images with high precision.
Background
Modern industrial products are manufactured with increasing refinement and at high speed, so the demand for product inspection keeps growing. Early manual inspection met the needs of its time, but it has become less and less able to satisfy requirements on either inspection speed or inspection accuracy. With the development of image processing and object detection, automatic image-based inspection has gradually entered industrial inspection and is replacing manual inspection. It is mainly used for dimension measurement, detection of various defects, and the like.
Industrial image processing requires removing noise and similar interference from the image information, which calls for Fourier frequency-domain filtering. The Fourier transform decomposes a signal into trigonometric components, converting it from the time domain to the frequency domain. The relevant information is then processed and filtered in the frequency domain and transformed back to the time domain to remove the interference, which supports an effective search for defects.
Existing discrete Fourier transform implementations generally use the fast Fourier transform for acceleration. However, optimization preparation is required before the acceleration takes effect, and this takes a long time, so the real-time inspection requirements of a factory cannot be met in normal use; the performance of CUDA's cuFFT is also not ideal.
Industrial products require detecting small defects (smaller than 0.01 mm²) in ultra-large images (for example 8K × 13K or 16K × 30K) in real time (300 ms), so detection speed is a very important index limiting the performance of the algorithm.
Disclosure of Invention
Therefore, to address the FFT's large computational load and the demand for high speed, the invention designs a GPU parallel algorithm on the basis of the FFT to meet the above needs.
The technical scheme adopted by the invention is as follows.
The method is based on the radix-2 FFT: input whose length is not a power of two (2^n) is extended to the nearest power of two.
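As a worked illustration of the length extension, assuming "nearest" means the next power of two at or above the original length n (the symbol n' and the sample value 13000 are illustrative only):

```latex
n' = 2^{\lceil \log_2 n \rceil},
\qquad \text{e.g. } n = 13000 \;\Rightarrow\; n' = 2^{14} = 16384 .
```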
To run the one-dimensional FFT in parallel on the GPU, the FFT transform coefficients W are first stored in an array and transferred to GPU shared memory.
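A minimal host-side sketch of such a coefficient table follows. It assumes the common forward-transform convention W[j] = exp(-2πi·j/N), which the source does not state, and the helper name `twiddle_table` is hypothetical. Only N/2 entries are needed for a radix-2 transform; the transfer to GPU shared memory is not shown.

```python
import cmath


def twiddle_table(n: int):
    """Return W[j] = exp(-2*pi*i*j/n) for j = 0 .. n/2 - 1.

    Assumption: forward-transform sign convention; n is a power of two.
    """
    return [cmath.exp(-2j * cmath.pi * j / n) for j in range(n // 2)]


if __name__ == "__main__":
    for j, w in enumerate(twiddle_table(8)):
        print(j, w)
```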
Inside the nested loops, whether an element yields an odd-type or an even-type result is a branch condition, which is recorded in a bool-type array flag. The flag array is computed with branches on the CPU and then transferred to GPU shared memory for use.
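One possible way to precompute such a flag array on the CPU is sketched below. Everything here is an assumption made for illustration: the name `build_flags`, the row-major layout (one row of 2^r entries per outer-loop stage k), and the rule that `True` marks a thread that adds two elements while `False` marks a thread that subtracts and multiplies by W.

```python
def build_flags(r: int):
    """Precompute one boolean per (stage k, thread t) for an FFT of size 2**r.

    Assumed layout: flags[k * 2**r + t].
    Assumed meaning: True  -> thread t adds two elements at stage k,
                     False -> thread t subtracts and multiplies by W.
    """
    n = 1 << r
    flags = [False] * (r * n)
    for k in range(r):
        half = 1 << (r - k - 1)  # butterfly span at stage k
        for t in range(n):
            flags[k * n + t] = (t // half) % 2 == 0
    return flags


if __name__ == "__main__":
    print(build_flags(2))
```

With this layout the GPU side only needs a table lookup per thread instead of evaluating the branch condition itself.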
Of the three nested loops (k, i, j) of the FFT algorithm, the two inner loops (i, j) are merged and parallelized, and the outermost loop is kept. The outermost loop runs log2(m) times, where m is the data size, similar to the depth of a binary tree, so the outer loop workload is small.
The merged inner loops contain odd-type and even-type terms in parallel. They are distinguished by a condition on the flag array from step [0010], and the parallel subscript into the flag array is [(tid % nNum) + k * (1 << r)].
Odd-type terms add two elements of the array f; the subscripts of the two elements are optimized to f[tid] and f[tid + (1 << (r-k-1))].
Even-type terms subtract two elements of the array and then multiply by a transform coefficient W; the subscripts of the two subtracted elements are arranged as f[tid] and f[tid + (1 << (r-k-1))], and the transform coefficient W is indexed by [(tid * (1 << k)) % (1 << (r-1))].
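One way to read the add and subtract-multiply steps is as the decimation-in-frequency butterfly written out below. This is an interpretation, not a formula given in the source: it assumes N = 2^r points, stage index k = 0, …, r-1, span h = 2^{r-k-1}, the convention W_N^j = e^{-2πij/N}, and thread indices t for which ⌊t/h⌋ is even.

```latex
\begin{aligned}
f_{k+1}[t]     &= f_k[t] + f_k[t+h], \\
f_{k+1}[t+h]   &= \bigl(f_k[t] - f_k[t+h]\bigr)\, W_N^{\,(t \cdot 2^{k}) \bmod 2^{\,r-1}},
\qquad h = 2^{\,r-k-1}.
\end{aligned}
```

Under these assumptions the exponent reduces to (t mod h)·2^k, which is the usual twiddle exponent for stage k.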
After each pass of the outer loop, the computed odd-type and even-type results are assigned to the input array of the next pass.
After the outer loop finishes, the subscripts are reordered to obtain the transformed array in the normal point order.
Steps [0008] to [0016] complete the one-dimensional parallel fast Fourier algorithm.
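The one-dimensional procedure can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the patented kernel: the inner `for tid` loop stands in for independent GPU threads, the forward sign convention exp(-2πi·j/N) is assumed, and threads in the second half of each butterfly read `f[tid - half]` and `f[tid]`, a thread mapping chosen here so that every thread writes exactly one output. The names `fft_stagewise` and `bit_reverse` are hypothetical.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (i & 1)
        i >>= 1
    return rev


def fft_stagewise(x):
    """Radix-2 FFT: serial outer loop over stages, independent inner work items."""
    n = len(x)
    r = n.bit_length() - 1
    if n < 1 or (1 << r) != n:
        raise ValueError("length must be a power of two")
    w = [cmath.exp(-2j * cmath.pi * j / n) for j in range(n // 2)]
    f = [complex(v) for v in x]
    for k in range(r):                      # outer loop: log2(n) serial stages
        half = 1 << (r - k - 1)
        out = [0j] * n
        for tid in range(n):                # each iteration is independent (one thread)
            if (tid // half) % 2 == 0:      # "add" branch
                out[tid] = f[tid] + f[tid + half]
            else:                           # "subtract, then multiply by W" branch
                wi = (tid << k) % (n >> 1)
                out[tid] = (f[tid - half] - f[tid]) * w[wi]
        f = out                             # result becomes the next stage's input
    result = [0j] * n
    for i in range(n):                      # final subscript reordering
        result[bit_reverse(i, r)] = f[i]
    return result


if __name__ == "__main__":
    print(fft_stagewise([1, 2, 3, 4]))
```

Writing each stage into a separate output buffer avoids read-after-write hazards between work items, which is one reason the result is copied to the input of the next pass.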
The method is extended from one dimension to two by choosing one direction, for example the height direction: each line is an independent one-dimensional parallel Fourier transform, so all lines can be processed in parallel.
After every row in the height direction has been computed, computation of the columns begins. Once all rows and columns have been computed, the Fourier transform of one image is complete.
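The row-then-column decomposition can be sketched as below. To keep the example short, a naive O(n²) DFT (`dft_1d`, a stand-in, not the parallel FFT of the invention) is used for each line; the function names are hypothetical and the final transpose back to the original layout is an assumption.

```python
import cmath


def dft_1d(x):
    """Naive 1-D DFT, used here only as a stand-in for the 1-D FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transpose(matrix):
    """Exchange rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def fft_2d(image, fft_1d=dft_1d):
    """2-D transform: 1-D transform of every row, transpose, repeat, transpose back."""
    rows_done = [fft_1d(row) for row in image]                  # each row is independent
    cols_done = [fft_1d(row) for row in transpose(rows_done)]   # columns, as rows
    return transpose(cols_done)


if __name__ == "__main__":
    print(fft_2d([[1, 1], [1, 1]]))
```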
Drawings
The following drawings help in reading the description and in understanding the invention and its application scenario.
FIG. 1 is a logic flow diagram of a one-dimensional parallel algorithm of the present invention.
Figure 1 shows the algorithm implementation steps and the parallel conditions in detail.
FIG. 2 is a general flow chart of the application steps of the present invention using a parallel algorithm, explicitly identifying the input data, the calculation steps and the output data of the present invention.
Detailed Description
Import an image and obtain its width and height. Check whether the width and the height are each a power of 2. If so, keep them unchanged and go to the next step; otherwise, pad the image with zero-valued pixels toward the lower right until both the width and the height are powers of 2.
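A minimal sketch of this padding step, assuming the image is held as a list of equal-length rows and that each dimension is rounded up to the next power of two (the helper names are hypothetical):

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def pad_image(image):
    """Zero-pad toward the right and bottom so width and height are powers of two."""
    height, width = len(image), len(image[0])
    padded_h, padded_w = next_power_of_two(height), next_power_of_two(width)
    padded = [list(row) + [0] * (padded_w - width) for row in image]
    padded += [[0] * padded_w for _ in range(padded_h - height)]
    return padded


if __name__ == "__main__":
    for row in pad_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
        print(row)
```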
Allocate storage for the image data on the GPU, flatten the CPU-side image data into a one-dimensional array, and transfer it to the GPU.
Expand the one-dimensional real array on the GPU into a complex array.
Set the Grid and Block parameters, and pass the one-dimensional complex array from step [0009] into the kernel function to run the one-dimensional FFT in parallel on the GPU.
Allocate two-dimensional threads through Grid and Block, and transform along the image width direction. Threads within a thread block must be synchronized with __syncthreads() so that the data is computed correctly.
When one loop pass finishes, assign the result to the input for the next loop iteration.
After all loop passes finish, reorder the subscripts using the bit-reversal method.
Transpose the image data, exchanging rows and columns, and then run the GPU-parallel one-dimensional FFT again in the same way.
After these steps, the Fourier transform of the image is obtained.
Obtain the central filter kernel from a Gaussian function, then obtain the peripheral filter through four branch conditions, and remove part of the frequency information on the peripheral filter to obtain the required filter.
Then combine the Fourier-transformed image from step [0032] with the filter to obtain the filtered image.
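The source does not give the filter formula, so the following is only one plausible reading. It assumes a Gaussian low-pass mask laid out for an unshifted spectrum, where low frequencies sit in the four corners (which may be what the four branch conditions refer to), and it assumes the filter is applied by element-wise multiplication. The names `gaussian_lowpass`, `apply_filter` and the parameter `sigma` are hypothetical.

```python
import math


def gaussian_lowpass(height: int, width: int, sigma: float):
    """Gaussian mask for an unshifted spectrum (zero frequency at index [0][0]).

    The wrapped distances min(u, height - u) and min(v, width - v) treat all
    four corners as low-frequency regions.
    """
    mask = []
    for u in range(height):
        du = min(u, height - u)
        row = []
        for v in range(width):
            dv = min(v, width - v)
            row.append(math.exp(-(du * du + dv * dv) / (2.0 * sigma * sigma)))
        mask.append(row)
    return mask


def apply_filter(spectrum, mask):
    """Element-wise product of a spectrum and a real-valued mask."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrum, mask)]


if __name__ == "__main__":
    for row in gaussian_lowpass(4, 4, 1.0):
        print([round(v, 3) for v in row])
```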
Perform a one-dimensional inverse Fourier transform on the filtered image along the row direction.
Transpose the image obtained in step [0035], and again perform a one-dimensional inverse Fourier transform along the row direction.
Transpose the image obtained in step [0036] and take the amplitude to obtain the filtered image.
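The inverse path can be sketched in the same row, transpose, row, transpose pattern. This sketch assumes the inverse is obtained from a forward transform by the standard conjugation identity and, for brevity, uses a naive DFT as a stand-in for the parallel FFT; all function names are hypothetical.

```python
import cmath


def dft_1d(x):
    """Naive forward DFT, a stand-in for the 1-D FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_1d(spectrum):
    """Inverse transform via conj(forward(conj(X))) / n."""
    n = len(spectrum)
    y = dft_1d([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def inverse_2d_magnitude(spectrum):
    """Row-wise inverse, transpose, row-wise inverse, transpose, then amplitude."""
    rows = [idft_1d(row) for row in spectrum]
    cols = [idft_1d(row) for row in transpose(rows)]
    return [[abs(v) for v in row] for row in transpose(cols)]


if __name__ == "__main__":
    print(inverse_2d_magnitude([[4, 0], [0, 0]]))
```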
Transfer the transformed image from the GPU to the CPU.
The desired result is obtained.

Claims (4)

1. An FFT-based GPU parallel algorithm for detecting OCA defects in ultra-large images, aimed mainly at visual inspection tasks on industrial products that require high precision and real-time performance on ultra-large images, which increases parallelism on the basis of the fast Fourier transform to achieve fast computation, and which has the following main innovation points.
2. The algorithm of claim 1, wherein the inner nested loops are modified to expose more parallelism, using shift operations on the indices so that the arrays are indexed with a uniform expression.
3. The algorithm of claim 2, wherein the odd/even cases in the nested loops are expressed as branch-condition statements to facilitate GPU parallelism. The branch outcome is recorded in a bool-type array flag. Because branch evaluation performs worse on the GPU than on the CPU, the flag array is computed and assigned on the CPU and then transferred into GPU shared memory to improve access efficiency. During GPU computation, the flag array can be queried directly to obtain the result quickly, which makes it easy to unify the subscripts and compute with multiple threads in parallel.
4. The algorithm of claim 1, wherein the Fourier transform coefficients are precomputed on the CPU and transferred to GPU shared memory once, thereby improving computational efficiency and avoiding a transfer for every computation.
CN202011256011.8A 2020-11-11 2020-11-11 FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit) Pending CN112750101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011256011.8A CN112750101A (en) 2020-11-11 2020-11-11 FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit)

Publications (1)

Publication Number Publication Date
CN112750101A true CN112750101A (en) 2021-05-04

Family

ID=75648902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011256011.8A Pending CN112750101A (en) 2020-11-11 2020-11-11 FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit)

Country Status (1)

Country Link
CN (1) CN112750101A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101937343A (en) * 2010-09-17 2011-01-05 上海交通大学 Method for realizing rear-end translation framework of heterogeneous multi-core virtual execution environment
CN104731563A (en) * 2015-04-03 2015-06-24 中国科学院软件研究所 FFT-based large integer multiplication SSA algorithm multi-core parallel implementation method
CN109493318A (en) * 2018-10-09 2019-03-19 广东仙童智能机器人科技有限公司 A kind of image parallel processing method, device and computer storage medium
CN111786688A (en) * 2020-06-16 2020-10-16 重庆邮电大学 Broadband parallel channelization receiving method based on embedded GPU
CN111858066A (en) * 2020-07-30 2020-10-30 中国空气动力研究与发展中心超高速空气动力研究所 CPU + GPU heterogeneous parallel optimization method in pneumatic theory unified algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
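丁丽丽; 李雁冰; 张素平; 王鹏翔; 张庆花: "Research on automatic parallelization of nested loops with branches", Computer Science (计算机科学), no. 05 *
周叶江; 郑彬; 赵永廷: "Application of GPU in rapid dimension inspection of piston pins", Computer Applications and Software (计算机应用与软件), no. 01, pages 0 - 1 *
彭自然; 王国军: "A fast parallel algorithm for implementing FFT on a multi-core embedded platform", Application Research of Computers (计算机应用研究), pages 1 - 3 *
狄鹏; 胡长军; 李建江: "Research and implementation of an efficient Jacobi iteration algorithm on GPU", Journal of Chinese Computer Systems (小型微型计算机系统), no. 09 *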

Similar Documents

Publication Publication Date Title
CN106875011B (en) Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
JP6314628B2 (en) Arithmetic processing unit
WO2018074012A1 (en) Operation processing circuit and recognition system
CN104807534B (en) Equipment eigentone self study recognition methods based on on-line vibration data
CN106156851A (en) The accelerator pursued one's vocational study towards the degree of depth and method
CN109872396B (en) Rapid cross-section contour generation method suitable for triangular mesh model
CN109146065A (en) The convolution algorithm method and device of 2-D data
CN106484532B (en) GPGPU parallel calculating method towards SPH fluid simulation
CN112750101A (en) FFT (fast Fourier transform) -based algorithm for parallel detection of OCA (optical clear array) defects by using super-large graph GPU (graphic processing Unit)
CN108388102B (en) Low-frequency-suppression random multivariate search binary phase hologram generation method
CN112288847B (en) Light field three-dimensional reconstruction method based on fast Fourier transform
CN117496352A (en) Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features
CN113496248A (en) Method and apparatus for training computer-implemented models
CN108920097B (en) Three-dimensional data processing method based on interleaving storage
CN106991638A (en) A kind of method of many granularity parallel optimizations based on sequential images Harris DOG feature extractions
CN108269246B (en) Image equalization enhancement method for low-frequency wavelet coefficient interpolation
CN116309429A (en) Chip defect detection method based on deep learning
CN115730438A (en) Parallel processing method for inverse solution of GPU (graphics processing Unit) of NURBS (non-Uniform rational B-spline) surface mapping of product
CN102279415B (en) Method for calculating Fourier integral one-way wave depth migration based on graphics processor
CN113112435B (en) Variable contrast enhancement method and device for wavelet domain positive and negative image fusion
CN115600666A (en) Self-learning method and device for power transmission and distribution line defect detection model
CN117435547A (en) Artificial intelligent chip, method, equipment and medium for flexibly accessing data
CN113052292B (en) Convolutional neural network technique method, device and computer readable storage medium
CN114926352A (en) Image reflection removing method, system, device and storage medium
CN117474797B (en) Image denoising method and device for multi-scale complementary learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination