KR101841547B1

KR101841547B1 - Optimization method for the scale space construction on a mobile GPU

Info

Publication number: KR101841547B1
Application number: KR1020160006114A
Authority: KR
Inventors: 이채은
Original assignee: 인하대학교 산학협력단
Priority date: 2016-01-18
Filing date: 2016-01-18
Publication date: 2018-03-23
Also published as: KR20170086365A

Abstract

The present invention relates to a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the lowest-scale image among a plurality of scale images constituting a scale space; a second step of reading the necessary data at once from the lowest-scale image and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.
According to the present invention, when a plurality of scale Gaussian images are generated, the necessary data are read at once from the lowest-scale image and Gaussian filtering is performed in a single pass. Memory access is greatly reduced, and the number of threads required to process one image is greatly reduced, thereby markedly improving the performance of scale space generation.

Description

[0001] The present invention relates to a method for optimizing scale space generation for a mobile GPU and, more particularly, to a method for optimizing the scale space generation process using the GPU (Graphics Processing Unit) of a mobile device such as a smartphone.

Recently, as smartphones have become popular, feature extraction algorithms that use the camera have come into wide use. Various techniques have been proposed for extracting such feature points, and among them the SIFT algorithm is used, as proposed in Patent Registration No. 10-1076487 (Reference 1).

SIFT extracts attributes of a feature, such as its position, scale, and orientation, in consideration of local image characteristics. As described in Reference 2, SIFT consists of creating a scale space through the Difference-of-Gaussians (DoG), selecting sample points (candidate pixels) by searching for maxima or minima in the scale space, selecting keypoints based on a stability measure, assigning one or more orientations to each keypoint, and creating a keypoint descriptor from local image gradients.

Here, the scale space is generally a space formed along a scale axis and is composed of Gaussian images that are progressively smoothed.

In computer vision applications such as those on smartphones, the scale space is widely used to extract edges and feature points, and algorithms built on it can take into account features of various sizes in the image.

However, when a camera-based feature extraction algorithm is used on a mobile device such as a smartphone, the high computational complexity of the scale space generation process makes real-time processing difficult.

Therefore, studies have been conducted to parallelize the scale space generation process using the OpenCL language on the mobile GPU of a smartphone.

As described in Reference 3, the amount of memory access was reduced by packing the gray image into a GPU texture, and a processing time of about 40 ms was reported for a 320 × 256 image.

Nevertheless, recent cameras have much higher resolutions, and further optimization studies are needed considering that scale space generation is a preprocessing step for most such algorithms.

Reference 1: Registration No. 10-1076487

Reference 2: Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.

Reference 3: G. Wang, B. Rister, and J. R. Cavallaro, "Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone," in Proc. GlobalSIP, 2013, pp. 759-762.

Accordingly, the present invention has been made to solve the above problems, and it is an object of the present invention to provide a method of optimizing a scale space generation process using a mobile GPU (Graphic Processing Unit).

In particular, it is an object of the present invention to provide a method for optimizing scale space generation for a mobile GPU capable of optimizing thread operation from the viewpoint of decreasing the memory access amount and enhancing the parallel core utilization.

In order to solve this technical problem, the present invention provides a method for optimizing scale space generation for a mobile GPU, comprising: a first step of selecting the lowest-scale image among a plurality of scale images constituting a scale space; a second step of reading the necessary data at once from the lowest-scale image and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.

In this case, the number of scale images constituting the scale space is six.

In the first step, the plurality of scale images are stored in a GPU texture format.

According to the present invention, when a plurality of scale Gaussian images are generated, the necessary data are read at once from the lowest-scale image and Gaussian filtering is performed in a single pass. Memory access is greatly reduced, and the number of threads required to process one image is greatly reduced, thereby markedly improving the performance of scale space generation.

The method according to the present invention can optimize the scale space generation from the viewpoint of the memory access of the mobile GPU and the use of the parallel core, and is useful when applying it to the feature point extraction algorithm.

FIG. 1 is a diagram illustrating a scale space generation structure for optimizing scale space generation for a mobile GPU according to the present invention.
FIG. 2 is a diagram illustrating an example of SIFT minutia depth variation measurement according to optimization of scale space generation for a mobile GPU according to the present invention.
FIG. 3 is a table showing a result of a scale space generation speed according to the optimization method according to the present invention.

Hereinafter, a method for optimizing scale space generation for a mobile GPU according to the present invention will be described in detail with reference to the accompanying drawings.

Prior to this, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that an inventor may appropriately define the concepts of terms, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Therefore, the embodiments described in this specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical ideas. It should be understood that various equivalents and modifications may exist.

Referring to FIG. 1, the method for optimizing scale space generation for a mobile GPU according to the present invention optimizes the process of generating a scale space, formed along a scale axis, on a mobile GPU. It optimizes thread operation with a view to reducing memory access and increasing parallel core utilization.

In this method, a plurality of scale Gaussian images are generated by Gaussian smoothing a source image with Gaussian filters. The method comprises: a first step of selecting the lowest-scale image S0 among the plurality of scale images S0 to S5; a second step of reading the necessary data at once and performing vertical Gaussian filtering; and a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales.

This method greatly reduces the number of kernels while requiring a similar amount of memory access to existing methods.

Hereinafter, the present invention will be described in more detail with reference to FIG. 1.

First, a scale space is generally a space formed along a scale axis and is composed of Gaussian images that are progressively smoothed.
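As a minimal illustration of the progressively smoothed Gaussian images described above, the following Python sketch builds one normalized 1-D Gaussian kernel per scale. The base sigma of 1.6, the step of 2^(1/3) between scales, and the kernel radius of three sigma are assumptions borrowed from common SIFT practice; this specification does not state the values actually used.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = math.ceil(3.0 * sigma)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed values (not given in the specification).
BASE_SIGMA = 1.6
STEP = 2.0 ** (1.0 / 3.0)

sigmas = [BASE_SIGMA * STEP ** i for i in range(6)]   # one sigma per S0 .. S5
kernels = [gaussian_kernel(s) for s in sigmas]

for i, (s, k) in enumerate(zip(sigmas, kernels)):
    print(f"S{i}: sigma={s:.3f}, taps={len(k)}")
```

Larger scales need longer kernels, which is why the cost of each scale grows along the scale axis.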

In the present invention, the number of scale images constituting the scale space is assumed to be six (S0 to S5), and the scale images (S0 to S5) are stored in the GPU texture format, following the conventional image processing method described in Reference 3 (G. Wang, B. Rister, and J. R. Cavallaro, "Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone," in Proc. GlobalSIP, 2013, pp. 759-762). Consequently, reading one texture pixel from memory reads four consecutive gray pixels.
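The effect of the texture packing can be sketched in plain Python as follows. The layout, in which four horizontally consecutive gray pixels occupy the four channels of one texel, and the helper names `pack_row` and `read_texel` are assumptions made for illustration; an actual implementation would use GPU image objects.

```python
def pack_row(gray_row):
    """Pack a row of gray pixels into texels, four pixels per texel.

    The row is zero-padded to a multiple of four (an assumed convention).
    """
    padded = list(gray_row) + [0] * (-len(gray_row) % 4)
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]

def read_texel(texels, x):
    """One texel fetch returns four consecutive gray pixels."""
    return texels[x]

row = [10, 20, 30, 40, 50, 60, 70, 80]
texels = pack_row(row)
print(len(row), "gray pixels are covered by", len(texels), "texel fetches")
print(read_texel(texels, 1))
```

With this layout the number of memory fetches for a row drops to a quarter of the number of gray pixels.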

A conventional method of generating a scale space uses the Gaussian image of the previous scale to generate the Gaussian image of the next scale. In such a method the Gaussian filter is small, but each convolution must often wait for the previous convolution to finish, and therefore the utilization of the parallel cores is greatly reduced.
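The dependency described above can be made concrete with the standard property that blurring with sigma a and then with sigma b is equivalent to a single blur with sigma sqrt(a² + b²). The sketch below compares the incremental sigma needed by the conventional cascade with the sigma needed when every scale is computed from S0. The scale values are assumptions, and the specification does not state how the direct filters are parameterized.

```python
import math

# Assumed scale values: the specification does not list them.
sigmas = [1.6 * 2.0 ** (i / 3.0) for i in range(6)]   # S0 .. S5

def extra_sigma(sigma_from, sigma_to):
    """Sigma of the Gaussian that takes blur sigma_from to blur sigma_to."""
    return math.sqrt(sigma_to ** 2 - sigma_from ** 2)

# Conventional cascade: S(i) is computed from S(i-1), so the steps run serially.
cascade = [extra_sigma(sigmas[i - 1], sigmas[i]) for i in range(1, 6)]

# Direct scheme: every S(i) is computed from S0, so the steps are independent.
direct = [extra_sigma(sigmas[0], sigmas[i]) for i in range(1, 6)]

for i, (c, d) in enumerate(zip(cascade, direct), start=1):
    print(f"S{i}: cascade sigma={c:.3f} (waits for S{i - 1}), "
          f"direct sigma={d:.3f} (needs only S0)")
```

The direct filters are wider than the cascaded ones, which is the price paid for removing the wait between scales.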

Accordingly, the present invention uses only the lowest-scale image S0 to generate the Gaussian images of the various scales, as shown in FIG. 1. The black and white arrows indicate the Gaussian filters (1, 2) in the vertical and horizontal directions, respectively. The numbers above the arrows indicate the order of execution (①, ②).

That is, in FIG. 1, Gaussian filtering in the vertical direction using the vertical Gaussian filter 1 is performed first (①), and Gaussian filtering in the horizontal direction using the horizontal Gaussian filter 2 is performed next (②).

Executions ① and ② are performed in order, so a total of two kernel launches are required. The data needed for the first, vertical Gaussian filtering can be read at once and shared by the Gaussian filter operations corresponding to scales S1 to S5.
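The two-kernel structure can be sketched in plain Python as follows. This is a CPU illustration under stated assumptions, not the patented GPU kernels: the function names, the odd-length kernel coefficients, and the border handling by clamping are all hypothetical. The first function reads each source pixel of a column window once and reuses it for every scale, and the second applies the horizontal filter to each vertically filtered plane.

```python
def clamp(i, n):
    """Clamp index i to [0, n - 1] (border replication, an assumed policy)."""
    return min(max(i, 0), n - 1)

def vertical_pass_shared(image, kernels):
    """Kernel 1: vertical filtering for all scales with shared reads."""
    height, width = len(image), len(image[0])
    max_r = max(len(k) // 2 for k in kernels)
    out = [[[0.0] * width for _ in range(height)] for _ in kernels]
    reads = 0
    for y in range(height):
        for x in range(width):
            for dy in range(-max_r, max_r + 1):
                pixel = image[clamp(y + dy, height)][x]   # read once ...
                reads += 1
                for s, k in enumerate(kernels):           # ... reuse per scale
                    r = len(k) // 2
                    if abs(dy) <= r:
                        out[s][y][x] += k[dy + r] * pixel
    return out, reads

def horizontal_pass(planes, kernels):
    """Kernel 2: horizontal filtering of each vertically filtered plane."""
    result = []
    for plane, k in zip(planes, kernels):
        r, width = len(k) // 2, len(plane[0])
        result.append([[sum(k[dx + r] * row[clamp(x + dx, width)]
                            for dx in range(-r, r + 1))
                        for x in range(width)]
                       for row in plane])
    return result

# Tiny example with two assumed kernels standing in for two of the scales.
image = [[7.0] * 4 for _ in range(3)]
kernels = [[0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]
planes, reads = vertical_pass_shared(image, kernels)
scales = horizontal_pass(planes, kernels)
separate_reads = sum(len(k) for k in kernels) * len(image) * len(image[0])
print("shared reads:", reads, "separate reads:", separate_reads)
```

In this sketch the shared pass reads only as many pixels as the widest kernel needs, whereas filtering each scale separately would read the sum of all kernel lengths per output pixel.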

Thus, the method according to the present invention greatly reduces the number of kernels while requiring a similar amount of memory access to existing methods.

On the other hand, in a conventional Gaussian filter many memory accesses overlap with those of neighboring threads. In the present invention, these overlapping memory accesses are reduced by increasing the number of pixels processed by one thread.

FIG. 2 shows an example in which one thread processes two vertical pixels P in the vertical Gaussian filter 1. The convolutions for the two pixels P are processed in pipeline form while the necessary pixels P are read from memory, which requires two registers as a resource. This can reduce duplicate memory accesses by a factor of two and also reduces the number of threads required to process a single image.
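A minimal sketch of this pipelined processing is given below, assuming an odd-length kernel and clamped borders. The function name `filter_column_block` is hypothetical. One call plays the role of one thread: it reads each needed source pixel once and adds it into every accumulator (register) whose filter window covers that pixel.

```python
def filter_column_block(column, kernel, y0, n_out):
    """Compute n_out consecutive vertical outputs starting at row y0.

    Returns the outputs and the number of pixel reads performed.
    """
    r = len(kernel) // 2
    acc = [0.0] * n_out                     # one register per output pixel
    reads = 0
    for yy in range(y0 - r, y0 + n_out + r):
        pixel = column[min(max(yy, 0), len(column) - 1)]   # single read
        reads += 1
        for j in range(n_out):
            tap = yy - (y0 + j) + r         # kernel index for output y0 + j
            if 0 <= tap < len(kernel):
                acc[j] += kernel[tap] * pixel
    return acc, reads

column = [float(v) for v in range(1, 11)]   # assumed sample data
kernel = [0.25, 0.5, 0.25]                  # assumed coefficients

pair, pair_reads = filter_column_block(column, kernel, 4, 2)
first, first_reads = filter_column_block(column, kernel, 4, 1)
second, second_reads = filter_column_block(column, kernel, 5, 1)
print(pair, pair_reads, first_reads + second_reads)
```

For a kernel with K taps, a thread that produces N outputs reads N + K - 1 pixels instead of N × K, so the saving approaches the factor of two mentioned above when N = 2 and K is large.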

In other words, the more pixels one thread processes, the less memory access is needed and the fewer threads are required. However, the number of registers required increases, and the number of registers available in a GPU core is limited.

Also, as the number of registers used by a single thread increases, the occupancy decreases and the latency-hiding effect between threads diminishes. In the present invention, the speed was measured while increasing the number of pipeline stages, and the speed was confirmed to be fastest with eight stages.
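The first half of this trade-off can be tabulated with a simple counting model. The sketch below assumes one thread per block of N vertical pixels in each column, a 9-tap filter, and no texel packing; all three are illustrative assumptions. It counts reads and threads only and does not model occupancy or latency hiding, so it cannot by itself predict the measured optimum.

```python
WIDTH, HEIGHT = 1280, 720   # image size used in the experiment
TAPS = 9                    # assumed filter length; not stated in the source

def cost(pixels_per_thread):
    """Return (threads, reads per thread, total reads) for one vertical pass."""
    blocks = -(-HEIGHT // pixels_per_thread)        # ceiling division
    threads = WIDTH * blocks
    reads_per_thread = pixels_per_thread + TAPS - 1
    return threads, reads_per_thread, threads * reads_per_thread

for n in (1, 2, 4, 8):
    threads, per_thread, total = cost(n)
    print(f"N={n}: threads={threads}, accumulators={n}, "
          f"reads/thread={per_thread}, total reads={total}")
```

Both the thread count and the total number of reads fall as N grows, while the number of accumulators per thread rises, which is the register pressure the paragraph above describes.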

Hereinafter, the experimental results of optimizing scale space generation for a mobile GPU according to the present invention will be described with reference to FIG. 3.

In the experiment, a 1280 × 720 image and the Adreno 420 GPU of a Qualcomm Snapdragon 805 processor were used. The table in FIG. 3 shows the degree of performance improvement obtained when the scale space generation optimization method was applied.

According to the table, the speed improves by a factor of 1.5 when the kernel dependency is reduced (O_KD), and by a factor of 1.7 when the memory access of the Gaussian filter is reduced (O_RM).

The present invention can be applied to a feature point extraction algorithm by optimizing the generation of a scale space in terms of memory access and parallel core utilization of a mobile GPU.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. The scope of protection of the present invention should be construed under the following claims, and all technical ideas within the scope of equivalents thereof should be construed as being included in the scope of the present invention.

1: Vertical direction Gaussian filter
2: Horizontal direction Gaussian filter
P: pixel
S0 ~ S5: Scale image

Claims

A method for optimizing scale space generation for a mobile GPU, comprising:
a first step of selecting the lowest-scale image among a plurality of scale images constituting a scale space;
a second step of performing vertical Gaussian filtering by reading the necessary data at once from the lowest-scale image; and
a third step of performing horizontal Gaussian filtering on the result of the vertical Gaussian filtering of the second step, with the read data shared among the remaining scales,
wherein a plurality of necessary pixels are read from memory and the convolution for the plurality of pixels is processed in pipeline form to increase the utilization of the parallel cores, and
wherein, in the first step, the plurality of scale images are stored in a GPU texture format.

The method according to claim 1,
wherein the number of scale images constituting the scale space is six.

delete