CN112750101A - FFT-based algorithm for GPU-parallel detection of OCA defects in ultra-large images - Google Patents
- Publication number
- CN112750101A (application CN202011256011.8A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- fft
- algorithm
- calculation
- array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Discrete Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a radix-2 FFT-based GPU-parallel detection algorithm, chiefly a fast computation scheme for inspecting large-scale images. The method treats the three nested loops of the original FFT butterfly algorithm as follows: the outermost loop is computed serially, while the two inner loops are unified under a single index formula and computed in parallel. The number of outermost iterations is the base-2 logarithm of the computation size, comparable to the depth of a binary tree, so the serial portion is small. The two inner loops, which carry most of the computation, are executed in parallel on the GPU using the unified formula. The result is a fast FFT computation.
Description
Technical Field
The invention relates to industrial-grade, high-precision, real-time inspection, and in particular to an image processing technique for detecting small defects in ultra-large images with high precision.
Background
Modern industrial manufacturing is increasingly refined and fast, and the demand for product inspection keeps growing. Early manual inspection no longer meets current requirements, and over time it falls further behind in both speed and accuracy. With the development of image processing and target detection, automatic image-based inspection has gradually entered the industrial inspection field and replaced manual inspection, mainly for dimension measurement and various kinds of defect detection.
Industrial image processing often requires removing noise points from the image, which calls for filtering in the Fourier frequency domain. The Fourier transform decomposes a signal into trigonometric components, converting it from the time domain to the frequency domain. The relevant information is then processed and filtered in the frequency domain and transformed back to the time domain, which removes interference and supports an effective search for defects.
Existing discrete Fourier transform implementations generally rely on the fast Fourier transform for acceleration. However, they require optimization and preparation before the accelerated computation, which takes a long time, so ordinary use cannot meet a factory's real-time inspection requirements; the performance of CUDA's cuFFT is also not ideal.
For industrial products, when small defects (smaller than 0.01 mm²) must be detected in ultra-large images (for example 8K × 13K or 16K × 30K) in real time (300 ms), detection speed becomes a very important index limiting the performance of the algorithm.
Disclosure of Invention
Therefore, to address the FFT's large amount of computation and the need for high speed, the invention designs a GPU-parallel algorithm on the basis of the FFT.
The technical scheme adopted by the invention is as follows.
The algorithm is based on the radix-2 FFT: data whose length is not a power of two ($2^n$) is extended to the nearest power of two.
To run the one-dimensional FFT in parallel on the GPU, the FFT transformation coefficients W are first stored in an array and transferred to GPU shared memory.
The odd-type and even-type cases inside the nested loops are represented, as branch conditions, by a bool-type array flag. The flag array is computed with branches on the CPU and then transferred to GPU shared memory for use.
Of the three nested loops (k, i, j) of the FFT algorithm, the two inner loops (i, j) are merged and parallelized, and the outermost loop is retained. The number of outermost iterations is the base-2 logarithm of the computation size, $\log_2 m$, comparable to the depth of a binary tree, so the outer loop count is small.
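As a worked illustration of why the serial part stays small, consider one example size. The value m = 8192 is chosen here only for illustration and does not come from the patent, and one element update per output element per pass is assumed.

```latex
\begin{align*}
m &= 8192 = 2^{13} \\
\text{serial outer iterations} &= \log_2 m = 13 \\
\text{parallel element updates per iteration} &= m = 8192 \\
\text{total element updates} &= m \log_2 m = 106{,}496 \\
\text{terms in a direct DFT, for comparison} &= m^2 = 67{,}108{,}864
\end{align*}
```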
In the parallelized inner two loops, odd-type and even-type terms occur side by side. They are distinguished by a conditional test on the flag array from step [0010], and the parallel subscript into the flag array is [(tid % nNum) + k * (1 << r)].
For odd-type terms, two array elements are added; the subscripts of the two operands are optimized to f[tid] and f[tid + (1 << (r-k-1))].
For even-type terms, two array elements are subtracted and the difference is multiplied by the transformation coefficient W; the subscripts of the two operands are optimized to f[tid] and f[tid + (1 << (r-k-1))], and W is indexed by [(tid << k) % (1 << (r-1))].
Each time the outer loop completes a pass, the odd-type and even-type results just computed are assigned to the input array of the next pass.
After the outer loop finishes, the subscripts are reordered to obtain the transformed array in natural point order.
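The final subscript reordering can be illustrated with a bit-reversal permutation, which is the usual reordering step for radix-2 FFTs. That the patent means bit reversal is an assumption, and the function names below are hypothetical.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(values):
    """Reorder a power-of-two-length list so element i comes from bit_reverse(i)."""
    n = len(values)
    bits = n.bit_length() - 1  # log2(n) when n is a power of two
    return [values[bit_reverse(i, bits)] for i in range(n)]
```

For eight points the permutation swaps positions 1 and 4 and positions 3 and 6, and leaves the others in place.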
Steps [0008] to [0016] complete the one-dimensional parallel fast Fourier algorithm.
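The one-dimensional steps can be sketched on the CPU as follows. This is a minimal sketch, not the patent's kernel: it uses the textbook decimation-in-frequency form, so the operand subscripts in the subtract branch may differ from the patent's, and the assignment of array positions to the add and subtract branches is an assumption. The name `fft_dif` is hypothetical. The loop over `tid` stands in for GPU threads, since its iterations are independent of one another.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    r = n.bit_length() - 1                 # number of serial outer passes, log2(n)
    if n == 0 or n != 1 << r:
        raise ValueError("length must be a power of two")
    # Coefficient table W, computed once up front.
    w = [cmath.exp(-2j * cmath.pi * j / n) for j in range(max(n // 2, 1))]
    f = [complex(v) for v in x]
    for k in range(r):                     # outer loop: kept serial
        half = 1 << (r - k - 1)
        # flag[tid] is True when element tid takes the subtract-and-multiply branch.
        flag = [bool(tid & half) for tid in range(n)]
        out = [0j] * n
        for tid in range(n):               # merged inner loops: independent iterations
            if not flag[tid]:
                out[tid] = f[tid] + f[tid + half]
            else:
                out[tid] = (f[tid - half] - f[tid]) * w[(tid << k) % (n // 2)]
        f = out                            # result becomes the next pass's input
    # Final subscript reordering (bit reversal) restores natural order.
    result = [0j] * n
    for i in range(n):
        rev, t = 0, i
        for _ in range(r):
            rev, t = (rev << 1) | (t & 1), t >> 1
        result[i] = f[rev]
    return result
```

The coefficient index `(tid << k) % (n // 2)` is the same expression the text gives for W, with `n // 2` equal to `1 << (r-1)`.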
The method is extended from one dimension to two. One direction is chosen first, for example the height direction; each line is processed by the one-dimensional parallel Fourier algorithm, and all lines can run in parallel together.
After every row in the height direction has been computed, the computation of each column begins. Once all rows and columns are done, the Fourier transform of one image is complete.
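The row-then-column extension can be sketched as below. A naive DFT stands in for the one-dimensional parallel FFT so that the block stays self-contained; any one-dimensional transform can be passed as `fft1d`. The function names are hypothetical.

```python
import cmath


def dft(row):
    """Naive DFT, standing in for the one-dimensional parallel FFT."""
    n = len(row)
    return [sum(row[t] * cmath.exp(-2j * cmath.pi * t * q / n) for t in range(n))
            for q in range(n)]


def transpose(matrix):
    """Exchange rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*matrix)]


def fft2d(image, fft1d=dft):
    """Two-dimensional transform: every row first, then every column."""
    rows_done = [fft1d(row) for row in image]                  # pass 1: each row
    cols_done = [fft1d(col) for col in transpose(rows_done)]   # pass 2: each column
    return transpose(cols_done)                                # back to row-major layout
```

Each row in pass 1, and each column in pass 2, is independent of the others, which is what allows all lines to be processed in parallel.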
Drawings
The following drawings help in reading the description and understanding the application scenario of the invention.
FIG. 1 is a logic flow diagram of the one-dimensional parallel algorithm of the invention.
FIG. 1 shows the implementation steps of the algorithm and the parallel conditions in detail.
FIG. 2 is a general flow chart of the application steps of the parallel algorithm, explicitly identifying the input data, the calculation steps and the output data of the invention.
Detailed Description
Import a picture and obtain its width and height. Check whether the width and the height are each a power of 2. If so, keep them unchanged and go to the next step; otherwise, pad the picture with zero-valued pixels toward the lower right corner until both the width and the height are powers of 2.
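The padding step can be sketched as follows, assuming the image is held as a list of rows of pixel values. The helper names `next_pow2` and `pad_image` are hypothetical.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def pad_image(image):
    """Zero-pad to the right and bottom so width and height become powers of two."""
    height, width = len(image), len(image[0])
    padded_h, padded_w = next_pow2(height), next_pow2(width)
    padded = [list(row) + [0] * (padded_w - width) for row in image]
    padded += [[0] * padded_w for _ in range(padded_h - height)]
    return padded
```

An image whose sides are already powers of two is returned with its dimensions unchanged.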
Allocate storage for the image data on the GPU, flatten the image data on the CPU into a one-dimensional array, and transfer it to the GPU.
Expand the one-dimensional real array on the GPU into a complex array.
Set the Grid and Block parameters, and pass the one-dimensional complex array from step [0009] into the kernel function to run the one-dimensional FFT in parallel on the GPU.
Allocate two-dimensional threads through Grid and Block, and transform along the image width direction. Threads within one thread block must be synchronized with __syncthreads so that the data are computed correctly.
When one loop pass finishes, assign the calculation result to the input for the next loop iteration.
After all loop passes are finished, reorder the subscripts using the bit-reversal method.
Transpose the image data, exchanging rows and columns, and then run the GPU-parallel one-dimensional FFT again in the same way.
After these steps, the Fourier transform of the image is obtained.
Obtain the central filter kernel from a Gaussian function, then obtain the peripheral filter through four branch conditions, and remove part of the frequency information at the periphery to obtain the required filter.
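One possible reading of this step is sketched below; the patent does not spell out the filter. The sketch assumes a Gaussian low-pass mask applied to an unshifted spectrum, where the zero frequency sits at the four corners, so the four branches select which corner a frequency bin is measured from. The names `gaussian_mask` and `apply_mask` and the parameter `sigma` are hypothetical.

```python
import math


def gaussian_mask(height, width, sigma):
    """Gaussian weights measured from the nearest corner of an unshifted spectrum."""
    mask = [[0.0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Four branches, one per quadrant, give the wrapped distance to zero frequency.
            if v <= height // 2 and u <= width // 2:
                dv, du = v, u
            elif v <= height // 2:
                dv, du = v, width - u
            elif u <= width // 2:
                dv, du = height - v, u
            else:
                dv, du = height - v, width - u
            mask[v][u] = math.exp(-(du * du + dv * dv) / (2.0 * sigma * sigma))
    return mask


def apply_mask(spectrum, mask):
    """Multiply each frequency bin by its filter weight."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrum, mask)]
```

With this layout the weight is 1 at the zero-frequency corner and smallest at the centre of the array, which holds the highest frequencies.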
Then combine the Fourier-transformed image from step [0032] with the filter to obtain the filtered image.
Perform a one-dimensional inverse Fourier transform on the filtered image along the row direction.
Transpose the image obtained in step [0035], and again perform a one-dimensional inverse Fourier transform along the row direction.
Transpose the image obtained in step [0036] and take the amplitude to obtain the filtered image.
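The inverse steps (rows, transpose, rows, transpose, amplitude) can be sketched as follows. The patent does not say how the one-dimensional inverse is computed; this sketch assumes the common conjugation identity, which reuses a forward transform. A naive DFT stands in for the parallel FFT, and all names are hypothetical.

```python
import cmath


def dft(x):
    """Naive forward DFT, standing in for the parallel one-dimensional FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * q / n) for t in range(n))
            for q in range(n)]


def inverse_1d(x, forward=dft):
    """Inverse transform computed as conj(forward(conj(x))) / n."""
    n = len(x)
    y = forward([complex(v).conjugate() for v in x])
    return [v.conjugate() / n for v in y]


def transpose(matrix):
    """Exchange rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*matrix)]


def inverse_2d_amplitude(spectrum):
    """Row inverse, transpose, row inverse, transpose back, then amplitude."""
    rows = [inverse_1d(row) for row in spectrum]
    cols = [inverse_1d(row) for row in transpose(rows)]
    return [[abs(v) for v in row] for row in transpose(cols)]
```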
Transfer the transformed image from the GPU side to the CPU side.
The desired result is obtained.
Claims (4)
1. An FFT-based GPU-parallel algorithm for detecting OCA defects in ultra-large images, directed mainly at visual inspection tasks that require high precision and real-time performance on ultra-large images of industrial products, which improves parallelism on the basis of the fast Fourier transform to achieve fast computation, and whose main innovations are as follows.
2. The algorithm of claim 1, wherein the inner multi-level loops are modified to expose more parallelism, using shift operations on the indices so that the arrays are indexed by a uniform pattern.
3. The algorithm of claim 2, wherein the odd/even patterns in the multi-level loop are replaced by branch-condition tests to facilitate GPU parallelism. The outcome of each conditional branch is recorded in a bool-type array flag. Because branch evaluation performs worse on the GPU than on the CPU, the flag array is computed and assigned on the CPU and then transferred into GPU shared memory to improve access efficiency. During GPU computation the flag array can be queried directly to obtain the result quickly, which makes it convenient to unify the subscripts and compute in a multi-threaded parallel manner.
4. The algorithm of claim 1, wherein the coefficients of the Fourier transform are precomputed on the CPU and then transferred to GPU shared memory once, which improves computational efficiency and reduces the transfer time per computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011256011.8A CN112750101A (en) | 2020-11-11 | 2020-11-11 | FFT-based algorithm for GPU-parallel detection of OCA defects in ultra-large images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112750101A (en) | 2021-05-04 |
Family
ID=75648902
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202011256011.8A CN112750101A (en) | Pending | 2020-11-11 | 2020-11-11 | FFT-based algorithm for GPU-parallel detection of OCA defects in ultra-large images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750101A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101937343A (en) * | 2010-09-17 | 2011-01-05 | 上海交通大学 | Method for realizing rear-end translation framework of heterogeneous multi-core virtual execution environment |
CN104731563A (en) * | 2015-04-03 | 2015-06-24 | 中国科学院软件研究所 | FFT-based large integer multiplication SSA algorithm multi-core parallel implementation method |
CN109493318A (en) * | 2018-10-09 | 2019-03-19 | 广东仙童智能机器人科技有限公司 | A kind of image parallel processing method, device and computer storage medium |
CN111786688A (en) * | 2020-06-16 | 2020-10-16 | 重庆邮电大学 | Broadband parallel channelization receiving method based on embedded GPU |
CN111858066A (en) * | 2020-07-30 | 2020-10-30 | 中国空气动力研究与发展中心超高速空气动力研究所 | CPU + GPU heterogeneous parallel optimization method in pneumatic theory unified algorithm |
-
2020
- 2020-11-11 CN CN202011256011.8A patent/CN112750101A/en active Pending
Non-Patent Citations (4)
Title |
---|
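丁丽丽; 李雁冰; 张素平; 王鹏翔; 张庆花: "Research on automatic parallelization of branch nested loops" (分支嵌套循环的自动并行化研究), 计算机科学, no. 05 * |
周叶江; 郑彬; 赵永廷: "Application of GPU to rapid dimension inspection of piston pins" (GPU在活塞销尺寸快速检测中的应用研究), 计算机应用与软件, no. 01, pages 0 - 1 * |
彭自然; 王国军: "A fast parallel algorithm for implementing FFT on a multi-core embedded platform" (一种在多核嵌入式平台上实现FFT的快速并行算法), 计算机应用研究, pages 1 - 3 * |
狄鹏; 胡长军; 李建江: "Research and implementation of an efficient Jacobi iteration algorithm on GPU" (GPU上高效Jacobi迭代算法的研究与实现), 小型微型计算机***, no. 09 * |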
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||