KR101841547B1 - Optimization method for the scale space construction on a mobile GPU - Google Patents

Info

Publication number
KR101841547B1
Authority
KR
South Korea
Prior art keywords
scale
image
scale space
present
gaussian filtering
Prior art date
Application number
KR1020160006114A
Other languages
Korean (ko)
Other versions
KR20170086365A (en)
Inventor
이채은
Original Assignee
인하대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 인하대학교 산학협력단
Priority to KR1020160006114A
Publication of KR20170086365A
Application granted
Publication of KR101841547B1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/426 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • H04N5/232

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the image of the lowest scale among a plurality of scale images constituting a scale space; a second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.
According to the present invention, when a plurality of Gaussian images of different scales are generated, the necessary data are read at one time from the image of the lowest scale and Gaussian filtering is performed in a single pass. This greatly reduces memory access and the number of threads required to process one image, thereby markedly improving scale space generation performance.

Figure 112016005556642-pat00001

Description

[0001] Method for optimizing scale space generation for a mobile GPU

The present invention relates to a method for optimizing scale space generation for a mobile GPU, and more particularly, to a method for optimizing the scale space generation process using the GPU (Graphics Processing Unit) of a mobile device such as a smartphone.

Recently, as smartphones have become widespread, feature point extraction algorithms that use the camera have come into wide use. Various techniques have been proposed for extracting feature points, among them the SIFT algorithm, as described in Patent Registration No. 10-1076487 (Reference 1).

SIFT extracts attributes of a feature such as position, scale, and direction in consideration of local image characteristics. As described in Reference 2, SIFT consists of: selecting sample points (candidate pixels) by searching for maxima or minima in a scale space created through a Difference-of-Gaussian (DoG) function; selecting keypoints based on a stability measure; assigning one or more directions to each keypoint; and creating a keypoint descriptor using local image gradients.

Here, a scale space is a space formed along a scale axis and consists of progressively smoothed Gaussian images.
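
To make the idea of progressively smoothed Gaussian images concrete, the following is a minimal sketch on a 1-D signal rather than a full image. The function names, the kernel radius of three standard deviations, the clamped border handling, and the sigma values in the usage note are all illustrative assumptions and are not taken from the patent.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel; radius of ceil(3 * sigma) is an assumption."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Filter a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out


def scale_space(signal, sigmas):
    """One smoothed copy of the signal per sigma, ordered along the scale axis."""
    return [smooth(signal, sigma) for sigma in sigmas]
```

Calling `scale_space(signal, [1.0, 2.0, 3.0])` on a single impulse yields three copies whose peak becomes lower and wider as sigma grows, which is the progressive smoothing described above.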

In computer vision applications, such as those on smartphones, the scale space is widely used to extract edges and feature points, and algorithms built on it can account for features of various sizes in the image.

However, when a camera-based feature extraction algorithm is used on a mobile device such as a smartphone, the high computational complexity of scale space generation makes real-time processing difficult.

Therefore, studies have been conducted on parallelizing scale space generation on the mobile GPU of a smartphone using the OpenCL language.

In Reference 3, the amount of memory access was reduced by packing the gray image into a GPU texture, and a processing time of about 40 ms was reported for a 320 × 256 image.

Nevertheless, camera resolutions have since become much higher, and further optimization is needed given that scale space generation is a preprocessing step for most such algorithms.

Reference 1: Registration No. 10-1076487

Reference 2: Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110

Reference 3: G. Wang, B. Rister, and J. R. Cavallaro, "Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone," in Proc. GlobalSIP, 2013, pp. 759-762

Accordingly, the present invention has been made to solve the above problems, and an object of the present invention is to provide a method of optimizing the scale space generation process using a mobile GPU (Graphics Processing Unit).

In particular, an object of the present invention is to provide a method for optimizing scale space generation for a mobile GPU that optimizes thread operation so as to reduce the amount of memory access and increase parallel core utilization.

To solve this technical problem, the present invention provides a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the image of the lowest scale among a plurality of scale images constituting a scale space; a second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.

Here, the number of scale images constituting the scale space is six.

In the first step, the plurality of scale images are stored in a GPU texture format.

According to the present invention, when a plurality of Gaussian images of different scales are generated, the necessary data are read at one time from the image of the lowest scale and Gaussian filtering is performed in a single pass. This greatly reduces memory access and the number of threads required to process one image, thereby markedly improving scale space generation performance.

The method according to the present invention optimizes scale space generation in terms of the memory access and parallel core utilization of a mobile GPU, and is useful when applied to feature point extraction algorithms.

FIG. 1 is a diagram illustrating a scale space generation structure for optimizing scale space generation for a mobile GPU according to the present invention.
FIG. 2 is a diagram illustrating an example in which one thread processes a plurality of pixels in the vertical Gaussian filter according to the optimization of scale space generation for a mobile GPU according to the present invention.
FIG. 3 is a table showing a result of a scale space generation speed according to the optimization method according to the present invention.

Hereinafter, a method for optimizing scale space generation for a mobile GPU according to the present invention will be described in detail with reference to the accompanying drawings.

Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that the inventor may appropriately define the concepts of terms.

Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas. It should be understood that various equivalents and modifications may exist.

Referring to FIG. 1, the method for optimizing scale space generation for a mobile GPU according to the present invention optimizes the process of generating a scale space along the scale axis using a mobile GPU, and optimizes thread behavior with a view to reducing memory access and increasing parallel core utilization.

The method generates a plurality of Gaussian images of different scales by applying Gaussian smoothing to a source image with a Gaussian filter, and comprises: a first step of selecting the image S0 of the lowest scale among the plurality of scale images S0 to S5; a second step of reading the necessary data at one time and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.

This method greatly reduces the number of kernels while requiring an amount of memory access similar to that of existing methods.

Hereinafter, the present invention will be described in more detail with reference to Fig.

First, a scale space is a space formed along a scale axis and consists of progressively smoothed Gaussian images.

In the present invention, the number of scale images constituting the scale space is assumed to be six (S0 to S5), and the scale images (S0 to S5) are stored in a GPU texture format, the conventional image handling approach described in Reference 3 (G. Wang, B. Rister, and J. R. Cavallaro, "Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone," in Proc. GlobalSIP, 2013, pp. 759-762). When a texture pixel is read from memory, four consecutive gray pixels are read.
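
The texture packing described above can be sketched on the host side as follows. This is only an illustration under assumptions: the text states that one texture read returns four consecutive gray pixels, while the choice of horizontally adjacent pixels, the RGBA channel order, and the helper names `pack_gray_to_rgba` and `read_pixel` are hypothetical.

```python
def pack_gray_to_rgba(gray, width, height):
    """Pack a row-major 8-bit gray image into texels of 4 gray pixels each.

    Assumption: the 4 pixels in a texel are horizontally adjacent and the
    image width is a multiple of 4.
    """
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4 in this sketch")
    texture = []
    for y in range(height):
        row = gray[y * width:(y + 1) * width]
        texture.append([tuple(row[4 * t:4 * t + 4]) for t in range(width // 4)])
    return texture


def read_pixel(texture, x, y):
    """Fetch one texel (4 gray pixels in a single read) and pick one channel."""
    texel = texture[y][x // 4]
    return texel[x % 4]
```

With this layout a 320-pixel-wide row needs 80 texel fetches instead of 320 single-pixel reads, which is the kind of memory access reduction the packing is meant to provide.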

A conventional method of generating a scale space uses the Gaussian image of the previous scale to generate the Gaussian image of the next scale. With this approach the Gaussian filter is small, but each convolution must often wait for the previous convolution to complete, so parallel core utilization is greatly reduced.

Accordingly, the present invention uses only the image S0 of the lowest scale to generate the Gaussian images of the various scales, as shown in FIG. 1. The black and white arrows indicate the Gaussian filters (1, 2) in the vertical and horizontal directions, respectively, and the numbers above the arrows indicate the order of execution (①, ②).
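
The trade-off between chaining scales and filtering everything from S0 can be illustrated with the standard property that two successive Gaussian blurs with standard deviations a and b equal one blur with sqrt(a² + b²). The sketch below is a rough comparison only: the sigma schedule in the usage note, the assumption that S0 itself carries a known blur, and the tap-count rule of three standard deviations per side are not stated in the patent.

```python
import math


def kernel_taps(sigma):
    """Filter length when the kernel spans 3 sigma on each side (an assumption)."""
    return 2 * math.ceil(3 * sigma) + 1


def cascade_plan(sigmas):
    """Conventional chain: extra blur needed to go from each scale to the next."""
    return [math.sqrt(b * b - a * a) for a, b in zip(sigmas, sigmas[1:])]


def direct_plan(sigmas):
    """Direct approach: extra blur needed to reach every scale from the lowest one."""
    base = sigmas[0]
    return [math.sqrt(s * s - base * base) for s in sigmas[1:]]
```

For a hypothetical schedule such as `[1.6 * 2 ** (i / 3) for i in range(6)]`, `cascade_plan` gives five small blurs that must run one after another, while `direct_plan` gives five larger blurs that all depend only on S0 and can therefore be computed together. Comparing `kernel_taps` of the two plans shows the cost: the direct filters are longer.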

That is, in FIG. 1, Gaussian filtering in the vertical direction is performed first using the vertical Gaussian filter 1 (①), and Gaussian filtering in the horizontal direction is then performed using the horizontal Gaussian filter 2 (②).

Executions ① and ② are performed in order, so a total of two kernel operations are required. The data needed for the first, vertical Gaussian filtering can be read at one time and shared by the Gaussian filter operations corresponding to scales S1 to S5.
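
The two-pass structure can be sketched on the CPU as below. This is one possible reading of the description, not the patented GPU kernels: each of the two functions stands in for one kernel launch, the column window read from S0 is reused for every scale in the first pass, and all kernels share the radius of the largest sigma. The function names, border clamping, and sigma values are assumptions.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with a caller-chosen radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def clamp(value, low, high):
    return max(low, min(value, high))


def vertical_pass(s0, kernels, radius):
    """Pass 1: read each column window of S0 once and filter it for every scale."""
    height, width = len(s0), len(s0[0])
    outs = [[[0.0] * width for _ in range(height)] for _ in kernels]
    for y in range(height):
        for x in range(width):
            # Shared read: this window serves all scales.
            window = [s0[clamp(y + d, 0, height - 1)][x]
                      for d in range(-radius, radius + 1)]
            for s, kernel in enumerate(kernels):
                outs[s][y][x] = sum(w * p for w, p in zip(kernel, window))
    return outs


def horizontal_pass(vertical, kernels, radius):
    """Pass 2: filter each scale's intermediate image along its rows."""
    offsets = range(-radius, radius + 1)
    results = []
    for image, kernel in zip(vertical, kernels):
        height, width = len(image), len(image[0])
        results.append([[sum(w * image[y][clamp(x + d, 0, width - 1)]
                             for w, d in zip(kernel, offsets))
                         for x in range(width)]
                        for y in range(height)])
    return results


def build_scales(s0, sigmas):
    """Generate one blurred image per sigma using only S0 and two passes."""
    radius = math.ceil(3 * max(sigmas))
    kernels = [gaussian_kernel(sigma, radius) for sigma in sigmas]
    return horizontal_pass(vertical_pass(s0, kernels, radius), kernels, radius)
```

Calling `build_scales(s0, [1.0, 1.5, 2.0, 2.5, 3.0])` returns five images corresponding to S1 to S5 in this sketch. No output depends on another scale's result, so nothing waits on a previous convolution.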

Thus, the method according to the present invention greatly reduces the number of kernels while requiring an amount of memory access similar to that of existing methods.

On the other hand, a conventional Gaussian filter has many memory accesses that overlap with those of neighboring threads. In the present invention, overlapping memory accesses are reduced by increasing the number of pixels processed by one thread.

FIG. 2 shows an example in which one thread processes two vertical pixels P in the vertical Gaussian filter 1. The convolutions for the two pixels P are processed in pipeline form as the necessary pixels P are read from memory, which requires two registers as a resource. This reduces duplicate memory accesses by a factor of two and also reduces the number of threads required to process a single image.
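
The effect of letting one thread produce several vertically adjacent outputs can be modeled with a simple read counter. In the sketch below, each loop iteration over `start` stands in for one thread, each entry of `acc` stands in for one accumulator register, and every input pixel the thread needs is read once and fed to all accumulators it contributes to. The function name, border clamping, and the read-counting model are assumptions for illustration, not the patented kernel.

```python
def filter_column_pipelined(column, kernel, pixels_per_thread):
    """Filter one image column, with each simulated thread producing
    `pixels_per_thread` adjacent outputs. Returns (outputs, total reads)."""
    taps = len(kernel)
    radius = taps // 2
    n = len(column)
    out = [0.0] * n
    reads = 0
    for start in range(0, n, pixels_per_thread):      # one iteration = one thread
        count = min(pixels_per_thread, n - start)
        acc = [0.0] * count                           # one register per output
        for offset in range(-radius, count + radius):
            pixel = column[min(max(start + offset, 0), n - 1)]
            reads += 1                                # each needed pixel read once
            for p in range(count):
                k = offset - p + radius               # tap index for output p
                if 0 <= k < taps:
                    acc[p] += kernel[k] * pixel
        out[start:start + count] = acc
    return out, reads
```

For a 16-pixel column and a 5-tap kernel, this model counts 80 reads with one output per thread and 48 reads with two outputs per thread, using 16 and 8 threads respectively. In this model a thread reads `taps + pixels_per_thread - 1` pixels, so the saving approaches a factor of two only for long kernels.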

In other words, the more pixels one thread processes, the fewer memory accesses and threads are needed. However, the number of registers required increases, and the number of registers available in a GPU core is limited.

Also, as the number of registers used by a single thread increases, occupancy decreases and the latency hiding effect between threads diminishes. In the present invention, the speed was measured while increasing the number of pipeline stages, and the fastest speed was observed with 8 stages.

Hereinafter, the experimental result of optimizing scale space generation for a mobile GPU according to the present invention will be described with reference to FIG.

In the experiment, a 1280 × 720 image and the Adreno 420 GPU of a Qualcomm Snapdragon 805 processor were used. The table of FIG. 3 shows the degree of performance improvement when the scale space generation optimization method was applied.

According to the table, speed improved 1.5 times when kernel dependency was reduced (O_KD) and 1.7 times when the memory access of the Gaussian filter was reduced (O_RM).

The present invention can be applied to a feature point extraction algorithm by optimizing the generation of a scale space in terms of memory access and parallel core utilization of a mobile GPU.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. The scope of protection of the present invention should be construed under the following claims, and all technical ideas within the scope of equivalents thereof should be construed as being included in the scope of the present invention.

1: Vertical direction Gaussian filter
2: Horizontal direction Gaussian filter
P: pixel
S0 ~ S5: Scale image

Claims (3)

1. A method for optimizing scale space generation for a mobile GPU, comprising:
a first step of selecting the image of the lowest scale among a plurality of scale images constituting a scale space;
a second step of reading the necessary data at one time from the image of the lowest scale and performing vertical Gaussian filtering; and
a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales,
wherein a plurality of necessary pixels are read from memory and the convolution for a plurality of pixels is processed in pipeline form to increase parallel core utilization, and
wherein the plurality of scale images are stored in a GPU texture format in the first step.
2. The method according to claim 1, wherein the number of scale images constituting the scale space is six.
3. (Deleted)
KR1020160006114A 2016-01-18 2016-01-18 Optimization method for the scale space construction on a mobile GPU KR101841547B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160006114A KR101841547B1 (en) 2016-01-18 2016-01-18 Optimization method for the scale space construction on a mobile GPU

Publications (2)

Publication Number Publication Date
KR20170086365A KR20170086365A (en) 2017-07-26
KR101841547B1 true KR101841547B1 (en) 2018-03-23

Family

ID=59427119

Country Status (1)

Country Link
KR (1) KR101841547B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657582A (en) * 2017-09-29 2018-02-02 郑州云海信息技术有限公司 A kind of information acquisition method and acquisition device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090324087A1 (en) 2008-06-27 2009-12-31 Palo Alto Research Center Incorporated System and method for finding stable keypoints in a picture image using localized scale space properties

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
E. S. Kim, et al. A novel hardware design for SIFT generation with reduced memory requirement. Journal of Semiconductor Technology and Science. Apr. 2013. Vol.13, No.2, pp.157-169*
G. Wang, et al. Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone. 2013 IEEE Global Conference on Signal and Information Processing. Dec. 2013, pp.759-762*
Optimizing Gaussian blurs on a mobile GPU. Oct. 21, 2013. Internet: http://www.sunsetlakesoftware.com/2013/10/21/optimizing-gaussian-blurs-mobile-gpu*

Similar Documents

Publication Publication Date Title
US10936911B2 (en) Logo detection
Zamberletti et al. Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions
JP2020095713A (en) Method and system for information extraction from document images using conversational interface and database querying
JP2015513754A (en) Face recognition method and device
US11551027B2 (en) Object detection based on a feature map of a convolutional neural network
US9824421B2 (en) Content-aware image resizing using superpixels
JP6161266B2 (en) Information processing apparatus, control method therefor, electronic device, program, and storage medium
WO2020125062A1 (en) Image fusion method and related device
CN111640123B (en) Method, device, equipment and medium for generating background-free image
US20210312215A1 (en) Method for book recognition and book reading device
TW201939356A (en) Code-scanning image recognition method, apparatus and device
CN113498521A (en) Text detection method and device and storage medium
JP2022160662A (en) Character recognition method, device, apparatus, storage medium, smart dictionary pen, and computer program
US20150242703A1 (en) Method and apparatus for extracting image feature
Chen et al. Adaptive fusion network for RGB-D salient object detection
CN114238904A (en) Identity recognition method, and training method and device of two-channel hyper-resolution model
Park et al. Robust keypoint detection using higher-order scale space derivatives: application to image retrieval
WO2013112065A1 (en) Object selection in an image
Song et al. Residual network with dense block
KR101841547B1 (en) Optimization method for the scale space construction on a mobile GPU
Wicht et al. Mixed handwritten and printed digit recognition in Sudoku with Convolutional Deep Belief Network
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
JP2013120517A (en) Image processing device
Bampis et al. Real-time indexing for large image databases: color and edge directivity descriptor on GPU
Zhu et al. Scene text detection with selected anchors

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant