KR101841547B1 - Optimization method for the scale space construction on a mobile GPU - Google Patents
- Publication number
- KR101841547B1
- Authority
- KR
- South Korea
- Prior art keywords
- scale
- image
- scale space
- present
- gaussian filtering
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/423—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
- H04N19/426—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- H04N5/232—
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the image of the lowest scale among a plurality of scale images constituting a scale space; a second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.
According to the present invention, when a plurality of scale Gaussian images are generated, the necessary data are read at one time from the image of the lowest scale and Gaussian filtering is performed in a single pass. This greatly reduces memory access and the number of threads required to process one image, thereby markedly improving scale space generation performance.
Description
The present invention relates to a method for optimizing scale space generation for a mobile GPU, and more particularly, to a method for optimizing the scale space generation process using a GPU (Graphics Processing Unit) in a mobile device such as a smartphone.
Recently, as smartphones have become popular, feature extraction algorithms using cameras have come into wide use. Various techniques have been proposed for extracting such feature points; among them, the SIFT algorithm, as proposed in Japanese Patent Registration No. 10-1076487 (Reference 1), is commonly used.
SIFT extracts attributes such as position, scale, and direction of a feature in consideration of local image characteristics. As suggested in
Here, a scale space is generally a space defined along a scale axis and is composed of progressively smoothed Gaussian images.
In computer vision, including on smartphones, scale spaces are widely used to extract edges and feature points, and algorithms built on them can account for image features of various sizes.
However, when a camera-based feature extraction algorithm is used on a mobile device such as a smartphone, the high computational complexity of scale space generation makes real-time processing difficult.
Therefore, studies have been conducted to parallelize scale space generation using the OpenCL language on the mobile GPU of a smartphone.
As suggested in Ref. 3, packing the gray image into a GPU texture reduces the amount of memory access, and a processing time of about 40 ms was reported for a 320 × 256 image.
Nevertheless, recent camera resolutions are much higher, and because scale space generation is a preprocessing step for most such algorithms, further optimization is needed.
Accordingly, the present invention has been made to solve the above problems, and its object is to provide a method of optimizing the scale space generation process using a mobile GPU (Graphics Processing Unit).
In particular, an object of the present invention is to provide a method for optimizing scale space generation for a mobile GPU that optimizes thread operation so as to reduce the amount of memory access and increase parallel core utilization.
In order to solve this technical problem, the present invention provides a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the image of the lowest scale among a plurality of scale images constituting a scale space; a second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.
In this case, the scale space consists of six scale images.
In the first step, the plurality of scale images are stored in a GPU texture format.
According to the present invention, when a plurality of scale Gaussian images are generated, the necessary data are read at one time from the image of the lowest scale and Gaussian filtering is performed in a single pass. This greatly reduces memory access and the number of threads required to process one image, thereby markedly improving scale space generation performance.
The method according to the present invention optimizes scale space generation with respect to the memory access of the mobile GPU and the use of its parallel cores, and is useful when applied to feature point extraction algorithms.
FIG. 1 is a diagram illustrating a scale space generation structure for optimizing scale space generation for a mobile GPU according to the present invention.
FIG. 2 is a diagram illustrating an example of SIFT minutia depth variation measurement according to optimization of scale space generation for a mobile GPU according to the present invention.
FIG. 3 is a table showing a result of a scale space generation speed according to the optimization method according to the present invention.
Hereinafter, a method for optimizing scale space generation for a mobile GPU according to the present invention will be described in detail with reference to the accompanying drawings.
Prior to this, the terms and words used in the present specification and claims should not be construed as limited to their ordinary or dictionary meanings; rather, based on the principle that an inventor may appropriately define the concepts of terms, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.
Therefore, the embodiments described in the present specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical ideas; it should be understood that various equivalents and modifications may exist.
Referring to FIG. 1, the method for optimizing scale space generation for a mobile GPU according to the present invention optimizes the process of generating a scale space on a mobile GPU, and can optimize thread behavior so as to reduce memory access and increase parallel core utilization.
The method generates a plurality of scale Gaussian images by applying Gaussian smoothing to a source image with a Gaussian filter, and comprises: a first step of selecting the image S0 of the lowest scale among the plurality of scale images S0 to S5 to be generated; a second step of reading the necessary data at one time and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.
This method greatly reduces the number of kernels while requiring a similar amount of memory access to existing methods.
Hereinafter, the present invention will be described in more detail with reference to Fig.
First, a scale space is generally a space defined along a scale axis and is composed of progressively smoothed Gaussian images.
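As a minimal sketch of what "progressively smoothed Gaussian images" means, the following pure-Python example blurs one image with increasing standard deviations. The sigma values, the 3-sigma kernel truncation, the border clamping, and all function names are assumptions made for illustration; the patent does not specify them.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel as a list of floats."""
    if radius is None:
        # 3-sigma truncation is a common convention, assumed here.
        radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_1d(signal, kernel):
    """Convolve a 1-D signal with a symmetric kernel, clamping at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = min(max(x + k - radius, 0), n - 1)
            acc += w * signal[src]
        out.append(acc)
    return out

def blur_image(image, sigma):
    """Separable Gaussian blur: vertical pass, then horizontal pass."""
    kernel = gaussian_kernel(sigma)
    height, width = len(image), len(image[0])
    # Vertical pass: filter every column.
    columns = [convolve_1d([image[y][x] for y in range(height)], kernel)
               for x in range(width)]
    vertical = [[columns[x][y] for x in range(width)] for y in range(height)]
    # Horizontal pass: filter every row of the vertical result.
    return [convolve_1d(row, kernel) for row in vertical]

def build_scale_space(image, sigmas):
    """Return one blurred image per sigma, ordered from least to most smoothed."""
    return [blur_image(image, s) for s in sigmas]
```

Applied to a single bright pixel, the peak value drops as sigma grows while the total intensity is preserved, which is the progressive smoothing the text describes.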
In the present invention, the scale space is assumed to consist of six scale images (S0 to S5), and the scale images (S0 to S5) are stored in GPU texture format, a conventional image processing method described in Reference 3 (G. Wang, B. Rister, and J. R. Cavallaro, "Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone," in Proc. GlobalSIP, 2013, pp. 759-762). When a texture pixel is read from memory, four consecutive gray pixels are read.
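The packing idea can be illustrated with a small sketch: four consecutive gray pixels are stored in the four channels of one texel, so one texture fetch returns four pixels. The function names and the requirement that the width be a multiple of four are assumptions of this sketch, not details taken from the patent or from Reference 3.

```python
def pack_gray_row(row):
    """Pack a row of gray pixels into 4-channel texels, four pixels per texel."""
    if len(row) % 4 != 0:
        raise ValueError("row width must be a multiple of 4 in this sketch")
    return [tuple(row[i:i + 4]) for i in range(0, len(row), 4)]

def read_texel(texels, texel_index):
    """Model one texture fetch: it returns four consecutive gray pixels."""
    return texels[texel_index]

def fetches_per_row(width, packed):
    """Number of fetches needed to read one full row of gray pixels."""
    return width // 4 if packed else width
```

For a 1280-pixel-wide row, this model needs 320 fetches instead of 1280.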
A conventional method of generating a scale space uses the Gaussian image of the previous scale to generate the Gaussian image of the next scale. With this approach the Gaussian filter is small, but many operations must wait for the previous convolution to complete, and therefore the utilization of the parallel cores is greatly reduced.
Accordingly, the present invention uses only the image S0 of the lowest scale to generate Gaussian images of various scales as shown in FIG. The black and white arrows indicate Gaussian filters (1, 2) in the vertical and horizontal directions, respectively. The numbers above the arrows indicate the order of execution (①, ②).
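That every scale can be computed from S0 alone follows from the cascade (semigroup) property of Gaussian convolution. In the sketch below, \sigma_k denotes the blur level of scale image S_k and G_\sigma a Gaussian kernel of standard deviation \sigma; these symbols are introduced here for illustration, since the patent gives no \sigma values.

```latex
% Conventional (sequential) construction: each scale waits for the previous one.
S_k = G_{\sqrt{\sigma_k^2 - \sigma_{k-1}^2}} * S_{k-1}, \qquad k = 1, \dots, 5

% Since G_a * G_b = G_{\sqrt{a^2 + b^2}}, the chain telescopes to a filter on S_0:
S_k = G_{\sqrt{\sigma_k^2 - \sigma_0^2}} * S_0, \qquad k = 1, \dots, 5
```

The second form removes the dependency between scales, at the cost of wider filters for larger k.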
Accordingly, Gaussian filtering in the vertical direction using the
That is, executions ① and ② are performed in order, so a total of two kernel operations are required. The data needed for the first (vertical) Gaussian filtering can be read at one time and shared by the Gaussian filter operations corresponding to scales S1 to S5.
Thus, the method according to the present invention greatly reduces the number of kernels while requiring a similar amount of memory access to existing methods.
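The two-kernel structure can be sketched on the CPU as follows. This is one reading of the description, not the patented OpenCL implementation: the first function models kernel ①, where each column window of S0 is read once and reused for every scale, and the second models kernel ②, the horizontal pass over each vertical result. The function names, sigma values, radii, and border clamping are all assumptions.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def vertical_pass_shared(image, kernels):
    """Kernel 1 (sketch): read each column window of S0 once, reuse it for all scales."""
    height, width = len(image), len(image[0])
    radius = max(len(k) for k in kernels) // 2  # widest filter sets the window
    out = [[[0.0] * width for _ in range(height)] for _ in kernels]
    reads = 0
    for y in range(height):
        for x in range(width):
            # Single read of the widest window, clamped at the image border.
            window = [image[min(max(y + d, 0), height - 1)][x]
                      for d in range(-radius, radius + 1)]
            reads += len(window)
            for s, kernel in enumerate(kernels):
                r = len(kernel) // 2
                out[s][y][x] = sum(w * window[radius - r + i]
                                   for i, w in enumerate(kernel))
    return out, reads

def horizontal_pass(vertical_images, kernels):
    """Kernel 2 (sketch): filter each scale's vertical result along its rows."""
    result = []
    for img, kernel in zip(vertical_images, kernels):
        r = len(kernel) // 2
        width = len(img[0])
        result.append([[sum(w * row[min(max(x + i - r, 0), width - 1)]
                            for i, w in enumerate(kernel))
                        for x in range(width)]
                       for row in img])
    return result
```

In this model the number of input reads in the vertical pass depends only on the widest filter, not on how many scales are produced, and only two passes are launched regardless of the number of scales.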
On the other hand, in a conventional Gaussian filter, many memory accesses overlap with those of neighboring threads. In the present invention, overlapping memory accesses are reduced by increasing the number of pixels processed by one thread.
FIG. 2 shows an example in which one thread processes two vertical pixels P in a
In other words, the more pixels a thread processes, the fewer memory accesses and the fewer threads are required. However, the number of registers required increases, and the number of registers available in a GPU core is limited.
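The trade-off can be made concrete with a simplified cost model, given here as a sketch. For a vertical filter of radius r, a thread that produces n consecutive outputs needs n + 2r inputs, whereas n single-pixel threads would read n(2r + 1) inputs in total. The function names and the model itself are illustrative assumptions; real register pressure and occupancy depend on the GPU.

```python
import math

def filter_column_block(column, start, pixels_per_thread, kernel):
    """Model one thread: read a shared window once, emit several vertical outputs."""
    radius = len(kernel) // 2
    n = len(column)
    # One read of pixels_per_thread + 2 * radius inputs (these occupy registers).
    window = [column[min(max(start + d, 0), n - 1)]
              for d in range(-radius, pixels_per_thread + radius)]
    outputs = [sum(w * window[j + i] for i, w in enumerate(kernel))
               for j in range(pixels_per_thread)]
    return outputs, len(window)

def reads_per_output(pixels_per_thread, radius):
    """Average number of input reads per output pixel in this model."""
    return (pixels_per_thread + 2 * radius) / pixels_per_thread

def threads_needed(height, width, pixels_per_thread):
    """Threads required when each thread covers a vertical run of pixels."""
    return math.ceil(height / pixels_per_thread) * width
```

With radius 4, the model gives 9 reads per output for one pixel per thread, 5 for two, and 2 for eight, while the window held by each thread grows from 9 to 16 values.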
Also, as the number of registers used by a single thread increases, occupancy decreases and the latency-hiding effect between threads is weakened. In the present invention, the speed was measured while increasing the number of pipeline stages, and it was confirmed to be fastest at eight stages.
Hereinafter, the experimental result of optimizing scale space generation for a mobile GPU according to the present invention will be described with reference to FIG.
For the experiment, a 1280 × 720 image and a Qualcomm Snapdragon 805 processor with an Adreno 420 GPU were used; the table in FIG. 3 shows the degree of performance improvement when the scale space generation optimization method was applied.
According to the table, reducing kernel dependency (O_KD) improves the speed by 1.5 times, and reducing the memory access of the Gaussian filter (O_RM) improves it by 1.7 times.
The present invention can be applied to a feature point extraction algorithm by optimizing the generation of a scale space in terms of memory access and parallel core utilization of a mobile GPU.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. The scope of protection of the present invention should be construed under the following claims, and all technical ideas within the scope of equivalents thereof should be construed as being included in the scope of the present invention.
1: Vertical direction Gaussian filter
2: Horizontal direction Gaussian filter
P: pixel
S0 ~ S5: Scale image
Claims (3)
A second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and
A third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales,
Wherein a plurality of necessary pixels are read from memory and the convolution over the plurality of pixels is processed in pipeline form to increase the utilization of the parallel cores.
Wherein the plurality of scale images are stored in a GPU texture format in the first step.
The method for optimizing scale space generation for a mobile GPU, wherein the number of scale images constituting the scale space is six.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160006114A KR101841547B1 (en) | 2016-01-18 | 2016-01-18 | Optimization method for the scale space construction on a mobile GPU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160006114A KR101841547B1 (en) | 2016-01-18 | 2016-01-18 | Optimization method for the scale space construction on a mobile GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170086365A (en) | 2017-07-26 |
KR101841547B1 (en) | 2018-03-23 |
Family
ID=59427119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160006114A KR101841547B1 (en) | 2016-01-18 | 2016-01-18 | Optimization method for the scale space construction on a mobile GPU |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101841547B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107657582A (en) * | 2017-09-29 | 2018-02-02 | 郑州云海信息技术有限公司 | A kind of information acquisition method and acquisition device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090324087A1 (en) | 2008-06-27 | 2009-12-31 | Palo Alto Research Center Incorporated | System and method for finding stable keypoints in a picture image using localized scale space properties |
-
2016
- 2016-01-18 KR KR1020160006114A patent/KR101841547B1/en active IP Right Grant
Non-Patent Citations (3)
Title |
---|
E. S. Kim, et al. A novel hardware design for SIFT generation with reduced memory requirement. Journal of Semiconductor Technology and Science. Apr. 2013. Vol.13, No.2, pp.157-169* |
G. Wang, et al. Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone. 2013 IEEE Global Conference on Signal and Information Processing. Dec. 2013, pp.759-762* |
Optimizing Gaussian blurs on a mobile GPU. Oct. 21, 2013. Internet: http://www.sunsetlakesoftware.com/2013/10/21/optimizing-gaussian-blurs-mobile-gpu* |
Also Published As
Publication number | Publication date |
---|---|
KR20170086365A (en) | 2017-07-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |