KR20170077429A - Saliency Map Generation Method and System based on Video Analysis - Google Patents

Saliency Map Generation Method and System based on Video Analysis

Info

Publication number
KR20170077429A
Authority
KR
South Korea
Prior art keywords
map
interest
attention
motion
extracting
Prior art date
Application number
KR1020150187306A
Other languages
Korean (ko)
Inventor
송혁
고민수
Original Assignee
Korea Electronics Technology Institute (전자부품연구원)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Electronics Technology Institute (전자부품연구원)
Priority to KR1020150187306A priority Critical patent/KR20170077429A/en
Publication of KR20170077429A publication Critical patent/KR20170077429A/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/14: Picture signal circuitry for video frequency region
    • H04N 5/144: Movement detection
    • H04N 5/147: Scene change detection
    • H04N 5/2351

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

A saliency map generation method and system based on video analysis are provided. According to an embodiment of the present invention, the saliency map generation method extracts a saliency map based on spatial features, extracts a saliency map based on motion, and combines the extracted maps. A video-analysis-based saliency map can thus be generated, and important regions in a moving image can be extracted automatically, without manual selection by the user.

Description

[0001] The present invention relates to a saliency map generation method and system based on video analysis.

The present invention relates to an image processing technique, and more particularly, to a method and system for estimating an area of interest in a moving image using various image analysis techniques.

Humans tend to focus on specific areas, such as fast-moving objects or bright regions, when viewing images; saliency map extraction techniques were developed to apply this human visual attention mechanism to the field of computer vision.

Conventional saliency map extraction techniques were designed for single-frame images, using spatial feature information such as brightness, color, and color histograms.

However, because existing still-image-based techniques use only the spatial information of an image, the saliency maps they produce for consecutive frames of a moving image lack temporal consistency.

Therefore, a saliency map generation technique suited to moving images rather than still images is needed.

SUMMARY OF THE INVENTION

The present invention has been made to solve the above problems, and it is an object of the present invention to provide a saliency map generation method and system capable of generating a saliency map from a moving image using video analysis techniques such as image segmentation and motion estimation.

It is another object of the present invention to automatically extract important regions in a moving image through a video-analysis-based saliency map, so that the result can be utilized in various fields such as image editing and object extraction.

According to an aspect of the present invention, there is provided a saliency map generation method comprising: a first extraction step of extracting a first saliency map based on spatial features; a second extraction step of extracting a second saliency map based on motion; and a combining step of combining the first saliency map with the second saliency map.

The first extraction step may include: dividing an image into a plurality of regions; calculating the average color value of each divided region; computing the color difference of each region from its surrounding regions using the average color values, to calculate the similarities of the divided regions; and generating the first saliency map based on the calculated similarities.

The second extraction step may include: detecting motion regions in an image; and generating the second saliency map based on the motion magnitudes of the detected regions.

In the combining step, the first saliency map and the second saliency map may be combined using adaptive weights.

The method may further include: searching for an important region in the final saliency map generated in the combining step; and tracking the important region found in the searching step.

According to another aspect of the present invention, there is provided a saliency map generation system comprising: a first extraction unit for extracting a first saliency map based on spatial features; a second extraction unit for extracting a second saliency map based on motion; and a combining unit for combining the first saliency map and the second saliency map.

As described above, according to embodiments of the present invention, important regions in a moving image can be extracted automatically, without manual selection by the user, through a video-analysis-based saliency map.

In addition, according to embodiments of the present invention, meaningful information in a moving image is extracted automatically, so that it can be utilized in various fields and services such as image editing and object extraction.

FIG. 1 is a flowchart illustrating a saliency map generation method based on video analysis according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating region segmentation;
FIG. 3 is a diagram illustrating motion estimation;
FIG. 4 is a diagram illustrating the final saliency map; and
FIG. 5 is a block diagram of a saliency map generation system according to another embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

1. Saliency map generation method based on video analysis

FIG. 1 is a flowchart illustrating a saliency map generation method based on video analysis according to an embodiment of the present invention.

The saliency map generation method according to an embodiment of the present invention generates a saliency map by estimating regions of interest in a moving image using image analysis techniques such as motion estimation and image segmentation. In that it analyzes images using the motion information of a moving image, it differs from conventional still-image-based saliency map generation.

Furthermore, the saliency map generation method according to the embodiment of the present invention can automatically extract important regions in a moving image using the extracted saliency map. This differs from existing techniques, in which the user must manually select the important regions in the video.

To generate the saliency map, as shown in FIG. 1, a space-based saliency map is first extracted through region segmentation of the current frame (S110). Then, motion between the current frame and the subsequent frames of a predetermined interval is estimated, and a motion-based saliency map is extracted using this information (S120).

Next, a final saliency map is generated by combining the saliency map generated in step S110 and the saliency map generated in step S120 using adaptive weights (S130).

Hereinafter, each of the steps constituting FIG. 1 will be described in detail.

2. Space-based saliency map extraction using region segmentation (S110)

First, region segmentation is performed on the current frame image using the SLIC superpixel method. Compared with other schemes, the SLIC superpixel method preserves boundaries well and produces relatively homogeneous regions.

The left side of FIG. 2 is the original image, and the right side is the result of applying the region segmentation technique to the original image.

Next, the average color value of each divided region is calculated, and the similarity of each region is computed from the color difference between the region and its surrounding regions. The lower the similarity, the higher the saliency value; the higher the similarity, the lower the saliency value.
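
A minimal sketch of this step, using scikit-image's SLIC implementation, is shown below. The segment count and compactness values are illustrative assumptions, and for brevity each region is compared against all other regions rather than only its spatial neighbors as described above.

```python
# Minimal sketch of S110: SLIC segmentation plus average-color contrast.
# Parameter values are illustrative, not taken from the patent.
import numpy as np
from skimage.segmentation import slic

def spatial_saliency(frame, n_segments=200):
    labels = slic(frame, n_segments=n_segments, compactness=10)
    region_ids = np.unique(labels)
    # Average color value of each divided region.
    means = np.array([frame[labels == r].mean(axis=0) for r in region_ids])
    saliency = np.zeros(frame.shape[:2], dtype=np.float32)
    for i, r in enumerate(region_ids):
        # Low similarity to other regions -> high saliency value.
        # (The patent compares against surrounding regions; all regions
        # are used here for brevity.)
        diff = np.linalg.norm(means[i] - means, axis=1).mean()
        saliency[labels == r] = diff
    return saliency / (saliency.max() + 1e-8)  # normalize to [0, 1]
```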

3. Motion-based saliency map extraction using motion estimation (S120)

First, motion vectors are calculated between the current frame and the next frame of a predetermined interval using a motion estimation technique.

If motion is estimated using only the current frame and the next frame, momentary illumination changes or subtle movements can produce erroneous motion estimates. To reduce such errors, motion estimation is performed over several frames within a predetermined interval, and the estimated motion is corrected accordingly.

The left side of FIG. 3 is the current frame image, and the right side is the motion estimation result for that frame. The colored areas are motion regions; areas marked in red have greater motion than areas marked in blue.

A region with large motion receives a high saliency value, a region with small motion receives a low saliency value, and a region with no motion receives a saliency value of zero.
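
As a rough sketch of this step, dense optical flow can serve as the motion estimator; OpenCV's Farneback flow is used below as an assumed stand-in, since the patent does not name a specific motion estimation technique. Averaging the flow magnitude over a small window of frames approximates the multi-frame error correction described above.

```python
# Minimal sketch of S120: motion-based saliency from dense optical flow.
# Farneback flow is an assumed stand-in for the unspecified estimator.
import cv2
import numpy as np

def motion_saliency(gray_frames):
    """gray_frames: list of consecutive single-channel (grayscale) frames."""
    mags = []
    for prev, nxt in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(np.linalg.norm(flow, axis=2))  # per-pixel magnitude
    # Averaging over the window damps errors from momentary changes.
    mag = np.mean(mags, axis=0)
    # No motion -> zero saliency; larger motion -> higher saliency.
    return mag / (mag.max() + 1e-8)
```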

4. Final saliency map generation by saliency map combination (S130)

Here, the two saliency maps generated above are combined using adaptive weights. Equation (1) shows how the final saliency map is calculated.

S_f = α1·S_s + α2·S_m + α3·S_s·S_m, where α3 = (α1 + α2) / 2   (1)

Here, S_f is the final saliency map, S_s is the space-based saliency map, S_m is the motion-based saliency map, and α1, α2, and α3 are the weights used for the combination.
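
A direct transcription of Equation (1) follows; the weight values α1 = α2 = 0.5 are illustrative assumptions, since the patent states only that the weights are adaptive.

```python
# Minimal sketch of S130: combining the two maps per Equation (1).
def combine_saliency(s_s, s_m, a1=0.5, a2=0.5):
    """s_s, s_m: space- and motion-based saliency maps (arrays in [0, 1])."""
    a3 = (a1 + a2) / 2.0           # alpha_3 = (alpha_1 + alpha_2) / 2
    s_f = a1 * s_s + a2 * s_m + a3 * s_s * s_m
    return s_f / (s_f.max() + 1e-8)
```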

The left side of FIG. 4 shows the original images, and the right side shows the final saliency maps for those images. In addition to regions with a large color difference from their surroundings, regions with large motion also appear as salient regions, and the saliency value is larger when both conditions are satisfied.

5. Saliency map generation system based on video analysis

FIG. 5 is a block diagram of a saliency map generation system according to another embodiment of the present invention. The saliency map generation system generates a saliency map according to the algorithm shown in FIG. 1.

As shown in FIG. 5, the saliency map generation system includes a video input unit 210, a space-based saliency map extraction unit 220, a motion-based saliency map extraction unit 230, a saliency map combining unit 240, and an important region search/tracking unit 250.

The video input unit 210 receives a moving image generated through a camera, or receives moving images from a storage medium, an external device, or an external network.

The space-based saliency map extraction unit 220 extracts a space-based saliency map through region segmentation of the current frame of the moving image received through the video input unit 210.

The motion-based saliency map extraction unit 230 estimates motion between the current frame and the subsequent frames of the moving image received through the video input unit 210, and extracts a motion-based saliency map using this information.

The saliency map combining unit 240 generates a final saliency map by adaptively weighting the saliency maps generated by the space-based extraction unit 220 and the motion-based extraction unit 230.

The important region search/tracking unit 250 searches for important regions in the final saliency map generated by the saliency map combining unit 240 and tracks the regions found.
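
The units of FIG. 5 can be wired together as in the hypothetical driver below, which reuses the sketch functions from the earlier sections; the 5-frame window is an assumption, and the search/tracking unit 250 is indicated only as a comment.

```python
# Hypothetical end-to-end wiring of the FIG. 5 units, reusing the
# sketch functions above; the 5-frame window is an assumption.
import cv2

def run_saliency_system(video_path, window=5):
    cap = cv2.VideoCapture(video_path)            # 210: video input unit
    frames = []
    while len(frames) < window:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    gray = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    s_s = spatial_saliency(frames[0])             # 220: space-based map
    s_m = motion_saliency(gray)                   # 230: motion-based map
    s_f = combine_saliency(s_s, s_m)              # 240: adaptive combination
    # 250: important regions could be found by thresholding s_f and
    # tracked across subsequent frames (not shown).
    return s_f
```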

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments; it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

210: Video input unit
220: Space-based saliency map extraction unit
230: Motion-based saliency map extraction unit
240: Saliency map combining unit
250: Important region search/tracking unit

Claims (6)

1. A saliency map generation method comprising:
a first extraction step of extracting a first saliency map based on spatial features;
a second extraction step of extracting a second saliency map based on motion; and
a combining step of combining the first saliency map with the second saliency map.

2. The method according to claim 1, wherein the first extraction step comprises:
dividing an image into a plurality of regions;
calculating the average color value of each divided region;
computing the color difference of each region from its surrounding regions using the average color values, to calculate the similarities of the divided regions; and
generating the first saliency map based on the calculated similarities.

3. The method according to claim 1, wherein the second extraction step comprises:
detecting motion regions in an image; and
generating the second saliency map based on the motion magnitudes of the detected motion regions.

4. The method according to claim 1, wherein, in the combining step, the first saliency map and the second saliency map are combined using adaptive weights.

5. The method according to claim 1, further comprising:
searching for an important region in the final saliency map generated in the combining step; and
tracking the important region found in the searching step.

6. A saliency map generation system comprising:
a first extraction unit for extracting a first saliency map based on spatial features;
a second extraction unit for extracting a second saliency map based on motion; and
a combining unit for combining the first saliency map and the second saliency map.
KR1020150187306A 2015-12-28 2015-12-28 Saliency Map Generation Method and System based on Video Analysis KR20170077429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150187306A KR20170077429A (en) 2015-12-28 2015-12-28 Saliency Map Generation Method and System based on Video Analysis

Publications (1)

Publication Number Publication Date
KR20170077429A true KR20170077429A (en) 2017-07-06

Family

ID=59354483

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150187306A KR20170077429A (en) 2015-12-28 2015-12-28 Saliency Map Generation Method and System based on Video Analysis

Country Status (1)

Country Link
KR (1) KR20170077429A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020159036A1 (en) * 2019-01-30 2020-08-06 삼성전자주식회사 Electronic device generating caption information for video sequence and operation method thereof

