CN110189294B

CN110189294B - RGB-D image significance detection method based on depth reliability analysis

Info

Publication number: CN110189294B
Application number: CN201910298984.9A
Authority: CN
Inventors: 周洋; 刘晓琪; 尉婉丽; 梁文青
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Eyecloud Technology Co ltd
Priority date: 2019-04-15
Filing date: 2019-04-15
Publication date: 2021-05-07
Anticipated expiration: 2039-04-15
Also published as: CN110189294A

Abstract

The invention discloses an RGB-D image significance detection method based on depth reliability analysis. The method comprises the steps of firstly, evaluating the reliability of scene far and near degree information reflected by a depth map, leading out a depth reliability factor lambda as a main characteristic index for measuring the significance of a stereo image, and when the depth map can accurately reflect the scene information, namely lambda is less than 0.45, carrying out stereo image significance detection only through depth characteristics; when the depth map reliability is low or the scene is fuzzy, namely lambda is more than or equal to 0.45, and the significance detection is carried out by combining other characteristics such as color and the like. The method fully considers the contribution degree of the clues of the depth map to the stereoscopic vision significance detection, and judges whether the depth map can accurately reflect the scene distance or whether the depth map has distortion or not by calculating the credibility factor of the depth map. The method has low computational complexity, the obtained three-dimensional image saliency map has high quality, and the method can be directly applied to the engineering fields of 3D image perception, 3D quality evaluation, object identification and the like.

Description

RGB-D image significance detection method based on depth reliability analysis

Technical Field

The invention belongs to the technical field of image processing, particularly relates to the technical field of stereo image processing, and relates to a RGB-D image significance detection method based on depth reliability analysis.

Background

With the advent of the "explosion-type" of multimedia tools, huge amounts of information in various ways are emerging in front of people, and images and videos are favored due to the characteristics of abundant expressive force, vivid content and the like, and become the mainstream way of information expression and transmission. How to accurately and efficiently mine significant information with high attention of human eyes in images/videos to reduce the burden of machine computing processing and storage becomes a common research hotspot in the fields of video image processing and computer vision. Most of the existing calculation methods are mainly used for monocular saliency detection, and accurate detection of color images is realized, but in natural scenes, the human visual system can process the distance of the scenes when perceiving information such as colors and shapes, namely perceiving depth information. The traditional significance model ignores the influence of depth features on target detection to a certain extent. In recent years, scholars have developed stereo image saliency detection in combination with depth features in the field of binocular stereo vision. In the human eye vision system, the binocular disparity makes the images of the natural scene projected into the eyes slightly different, the disparity is mainly focused on the horizontal dimension, namely parallax, and the disparity is an important channel for human perception of the stereoscopic depth of the scene. In saliency detection, depth information is generally expressed as a grayscale image, and a pixel value thereof represents a distance from a camera to an object projected in an image plane, reflects a distance of an object from human eyes, and is one of important perceptual features of a stereoscopic image.

In the method for detecting the significance of the three-dimensional scene, students highlight a significant target area by adopting different calculation models and combining plane features such as colors, textures and the like and depth features. In these methods, some depth maps may generate distortion or scene blurring due to different depth map acquisition modes, thereby affecting the significance detection result to generate a certain deviation.

Disclosure of Invention

The invention aims to provide an RGB-D image significance detection method based on depth reliability analysis aiming at the defects of the prior art.

The method comprises the steps of firstly, evaluating the reliability of the information of the degree of distance of a scene reflected by a depth map to obtain a reliability factor lambda of the depth map, taking lambda as a characteristic index for measuring the significance of a three-dimensional image, carrying out three-dimensional visual significance detection based on the depth information when the lambda is less than 0.45, and carrying out three-dimensional visual significance detection by combining color characteristics when the lambda is more than or equal to 0.45; the method comprises three modules: analyzing depth reliability, performing stereoscopic vision significance detection based on depth information, and performing stereoscopic vision significance detection by combining color characteristics; the method comprises the following specific steps:

(1) first, a depth confidence analysis is performed: inputting a depth image: firstly, image segmentation is carried out by adopting a linear iterative clustering SLIC superpixel segmentation algorithm, and a graph G (V, E) is constructed, wherein V represents a node set, and each node in V corresponds to a superpixel block V_iThe edge E connects adjacent superpixel blocks; superpixel block v from depth map feature analysis_iThe depth confidence of (d) is:

wherein m and s represent the mean and standard deviation of pixel values in the whole depth map, respectively, and m_iRepresenting superpixel blocks v_iH represents the image entropy,

wherein L is the gray level of the depth map, P_jIs the probability of the occurrence of the jth gray level in the depth map; c₀Is an intensity adjustment factor, C is more than or equal to 1₀≤10；

Then fuse each superpixel block v_iDepth confidence value λ of_iObtaining the credibility factor of the whole depth map

Wherein N is_sRepresenting the number of superpixels, N_s∈[100,200,300](ii) a When lambda is less than 0.45, stereoscopic vision significance detection is carried out based on the depth information; and when the lambda is more than or equal to 0.45, performing stereoscopic vision significance detection by combining color characteristics.

(2) The specific steps of performing stereoscopic vision saliency detection based on depth information are as follows:

(2-1) coarse background filtering: comparing the average depth difference degree of the pixel points in the depth map with the average depth difference degree of the lines where the pixel points are located to obtain a preliminary depth map front background distinction:

wherein, I_kAnd l'_kRespectively representing the depth values of the pixel point k in the original depth image and the coarsely filtered image,

representing the average depth value of the line where the pixel point k is located;

(2-2) depth compactness significance analysis: in combination with the depth confidence factor, the superpixel block v is first analyzed_iDepth-based compactness of (2):

where Sdc (vi) represents the compact saliency value, n, of each super-pixel block_jIs a super pixel v_jNumber of pixels, λ, contained in_jRepresenting calculated superpixel blocks v_jThe depth confidence value of (a) is,

is a super pixel block v_jThe coordinates of the center of mass of the image,

representing the centroid position of the entire depth image; a is_ijRepresenting the similarity between two superpixel blocks in the preprocessed depth map,

m′_iand m'_jRepresenting a superpixel v_i'and v'_jAverage value of (d);

in order to control the constant amount of the liquid,

σ²to control the affinity constant, σ, of the similarity matrix²0.1; significance map S for depth compactness calculation_comThe final realization is as follows: s_com(vi) 1-norm (Sdc (vi)), norm (. cndot.) as a normalization function, remapping significant values to [0,255 [ (. cndot.) ]]Obtaining a depth compactness saliency map from the range of (1);

(2-3) depth contrast significance analysis: saliency value S of pixel k_con(k) Based on its contrast with all other pixels in the depth image:

f_lrepresenting the frequency of the appearance of different depth values in the depth map after the rough background filtering; wherein D (I'_k,I′_l)＝||I′_k-I′_lL, representing image pixel I'_kAnd other image pixel points I'_lSpatial distance between, pixel point I'_kAnd l'_lDepth value of [0,255 ]]Within the range;

(2-4) significance of depth compactness result S_comAnd depth contrast saliency result S_conAnd (3) fusion, wherein the acquired saliency map based on the depth information is as follows: s_depth＝θS_com+(1-θ)S_conAnd theta is a positive control parameter between two significance terms, and theta is 0.5.

(3) The method for detecting the stereoscopic vision significance by combining the color characteristics comprises the following specific steps of:

(3-1) parallel structure based on background priors:

firstly, the influence of the background is minimized by eliminating the false boundary, and the realization method comprises the following steps:

D_color(I_p,I_q) Representing the color distance difference between different sides, R/G/B are respectively red, green and blue channels, p and q are any two sides of the four boundaries,

is the mean of the characteristics of the three channels on the boundary p,

is the mean value of the characteristics of the three channels on the boundary q; the 4 x 4 matrix A is obtained by calculating the distances D between all the boundaries and normalized, at the sum of the maximum columns

Sum of minimum column

Satisfies the conditions

When the conditions are met, defining the boundary corresponding to the column and the maximum value as a false boundary, and removing the false boundary; tau is_CTo set the threshold, 0.1 ≦ τ_C≤1.0；

Then, respectively carrying out significance sorting based on the background and the foreground; calculating a saliency map based on the edge background for other edges after filtering the false boundaries;

finally, the background prior based significance ranking results are:

wherein the content of the first and second substances,

in correspondence with the remaining boundary(s),

representing a significant result graph based on each background edge, and obtaining a significant sequencing result S based on background prior through final multiplicative fusion_b(i) (ii) a The significance implementation of the analysis based on the foreground is expressed as:

after respectively solving significance graphs based on background and foreground, multiplicative fusion is carried out to obtain a primary significant target area S_initial(i)＝S_b(i)·S_f(i)；

(3-2) after obtaining a primary significant result, performing feature optimization by taking the foreground area in the graph as a seed point, and redefining a popular ranking function indication vector as follows:

calculating to obtain a final saliency map based on RGB color features after the saliency features are optimized;

(3-3) significance update in combination with depth information:

the significance results are updated by using an iterative method of a cellular automaton, and the method is modified according to the actual situation of the invention:

wherein, | | d_i,d_jI | represents the depth distance between superpixel blocks i and j, N_iIs a neighborhood set of superpixels i, f_ijRepresenting the similarity, δ, between different superpixel blocks²Indicating a parameter controlling similar intensity, δ²＝0.1；

Based on the depth feature similarity, the significance value of each super pixel is determined by the significance feature value of the super pixel and the neighborhood feature value of the super pixel. And setting the iterative propagation times of the cell machine as K times, wherein K is 5-50, and obtaining a more accurate RGB-D significance map by combining significance updating of depth information.

The method of the invention provides a novel stereo image significance detection technology in the field of stereo image processing research, and can simulate the human eye visual attention mechanism as much as possible. The method improves the existing stereo image salient object detection algorithm, and fully utilizes the important function of depth information in stereo vision salient object detection. The method introduces a novel discrimination mode, namely a depth map credibility evaluation factor, and can accurately measure whether scene information in the depth map has blur or distortion, and further judge whether the three-dimensional image significance detection can be carried out only through the depth map information. When the depth map is fuzzy or distorted, the scene characteristics can not be accurately judged only through the depth map, and the stereoscopic vision significance is analyzed by combining color information. The three-dimensional image saliency model can be directly applied to the engineering fields of 3D video processing, 3D quality evaluation, object recognition and the like.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a stereo image depth map;

fig. 3 is a stereo image depth map SLIC processing result;

FIG. 4 is a depth map pre-processing result diagram;

FIG. 5 is a depth map compact salient results image;

FIG. 6 is a depth map versus saliency results image;

FIG. 7 is a perspective image saliency map result based on depth information;

FIG. 8 is a perspective image saliency map result incorporating color information;

fig. 9 is a diagram of detection results of different stereoscopic image sequences.

Detailed Description

As shown in fig. 1, in the RGB-D image saliency detection method based on depth reliability analysis, firstly, the reliability of the information of the degree of distance of a scene reflected by a depth map is evaluated, a depth reliability factor λ is introduced as a main characteristic index for measuring the saliency of a stereo image, and when the depth map can more accurately reflect the scene information, that is, λ is less than 0.45, the saliency of the stereo image is detected only by depth characteristics; conversely, when the depth map is low in credibility or the scene is fuzzy, the saliency detection is performed by combining other features such as colors and the like, so that the saliency detection complexity of the stereo image is reduced on the whole.

The method comprises three modules: depth reliability analysis, stereoscopic vision significance detection based on depth information, and stereoscopic vision significance detection based on color characteristics. The method comprises the following specific steps:

(1) first, a depth confidence analysis is performed:

as shown in fig. 2, the depth image is input: firstly, the image segmentation is carried out by adopting the existing simple linear iterative clustering SLIC superpixel segmentation algorithm to construct(V, E), where V represents a set of nodes, each node in V corresponding to a superpixel block V_iThe edge E connects adjacent superpixel blocks; superpixel block v from depth map feature analysis_iThe depth confidence of (d) is:

wherein L is the gray level of the depth map, P_jIs the probability of the j-th gray level occurring in the depth map. C₀Is an intensity adjustment factor, C is more than or equal to 1₀≦ 10, example C ₀2. The image depth map SLIC processing results are shown in fig. 3.

Wherein N is_sRepresenting the number of superpixels, N_s∈[100,200,300]In the present invention, 200 is taken. The smaller lambda is, the higher the reliability of the depth map is, and the scene target information can be extracted more accurately from the depth map. When lambda is less than 0.45, a salient target area is accurately extracted through scene information reflected by the depth map, and stereoscopic vision saliency detection is carried out by adopting depth-based information; when the lambda is more than or equal to 0.45, processing is carried out by combining with other scene characteristics, and the method carries out stereoscopic vision significance detection by combining with color characteristics.

(2-1) coarse background filtering: in order to reduce the interference of a non-significant area with a large depth value in a depth map, the invention provides a simple rough background filtering mode, and the purpose of reducing the interference is realized by comparing the average depth difference degree of pixel points and the lines in the depth map, thereby obtaining a preliminaryDepth map front background discrimination:

and the average depth value of the line where the pixel point k is located is represented. The depth map obtained by the coarse background filtering is shown in fig. 4.

is a super pixel block v_jThe coordinates of the center of mass of the image,

m′_iand m'_jRepresenting a superpixel v_i'and v'_jAverage value of (d);

in order to control the constant amount of the liquid,

σ²to control the affinity constant of the similarity matrix,σ²＝0.1；

significance map S for depth compactness calculation_comThe final realization is as follows: s_com(vi) 1-norm (Sdc (vi)), norm (. cndot.) as a normalization function, remapping significant values to [0,255 [ (. cndot.) ]]The range of (a) yields a depth compactness saliency map. The results are shown in FIG. 5.

f_lindicating the frequency of occurrence of different depth values in the depth map after the rough background filtering. Wherein D (I'_k,I′_l)＝||I′_k-I′_lL, representing image pixel I'_kAnd other image pixel points I'_lSpatial distance between, pixel point I'_kAnd l'_lDepth value of [0,255 ]]Within the range. The results are shown in FIG. 6.

(2-4) significance of depth compactness result S_comAnd depth contrast saliency result S_conAnd (3) fusion, wherein the acquired saliency map based on the depth information is as follows: s_depth＝θS_com+(1-θ)S_con(ii) a θ is a positive control parameter between two significance terms, and θ is 0.5. The results are shown in FIG. 7.

(3-1) parallel structure based on background priors:

the effect of the background is first minimized by eliminating false boundaries to improve the accuracy of the detection. The realization method comprises the following steps:

is the mean of the characteristics of the three channels on the boundary p,

is the mean of the characteristics of the three channels on the boundary q. The 4 x 4 matrix a is obtained by calculating the distance D between all the boundaries and normalized. When a column in the matrix a is maximum, the corresponding boundary has a large characteristic difference from the boundaries of other images, and may contain foreground objects. Sum in maximum column

Sum of minimum column

Satisfies the conditions

When the conditions are met, defining the boundary corresponding to the column and the maximum value as a false boundary, and removing the false boundary; tau is_CTo set the threshold, 0.1 ≦ τ_C1.0. ltoreq. in this example τ_C＝0.4。

And (3) taking the existing popular ranking algorithm as a support, and respectively performing significance ranking based on the background and the foreground by adopting the general popular ranking algorithm function identification. The method for calculating the saliency map based on the edge background for the other edges after the false boundary filtering is implemented by taking the left boundary as an example, and comprises the following steps:

S_lfor a significant result calculated with the left boundary of the depth map as the background seed point,

is a normalized vector and i denotes a superpixel block index. And calculating the significance result based on other boundaries as the query seed point by adopting the same method.

Finally, the background prior based significance ranking results are:

wherein the content of the first and second substances,

in correspondence with the remaining boundary(s),

representing a significant result graph based on each background edge, and obtaining a significant sequencing result S based on background prior through final multiplicative fusion_b(i) In that respect Removing false boundaries may improve the accuracy of salient object detection when one or more boundaries are adjacent to a foreground object.

The same foreground-based significance implementation analyzed using the popularity ranking algorithm is shown as:

after the significance maps based on the background and the foreground are respectively obtained, multiplicative fusion is carried out, and the primary significant target area is obtained as follows:

S_initial(i)＝S_b(i)·S_f(i)。

and calculating to obtain a final saliency map based on the RGB color features after the saliency features are optimized.

(3-3) significance update in combination with depth information: the contribution of spatial scene position information to saliency is considered in combination with depth information to optimize the initial result. The significance results are updated by using a cell Automata iterative method (SCA), and the method is modified as follows according to the actual situation of the invention:

wherein, | | d_i,d_jI | represents the depth distance between superpixel blocks i and j, N_iIs a neighborhood set of superpixels i, f_ijRepresenting the similarity, δ, between different superpixel blocks²Indicating a parameter controlling similar intensity, δ²＝0.1。

Based on the depth feature similarity, the significance value of each super pixel is determined by the significance feature value of the super pixel and the neighborhood feature value of the super pixel. The number of iterative propagation times of the cell machine is set to be K times, wherein K is 5-50, and K is 10 in the embodiment, and a more accurate RGB-D significance map is obtained by combining significance updating of depth information. The final test results are shown in fig. 8.

In consideration of the detection results of the invention in different scenes, 5 groups of RGB-D image test sequences in different scenes are also selected for experiments to perform detection by using the stereo video saliency detection method of the invention, and the detection results are shown in fig. 9. The experimental result proves that the method can effectively detect the significant region of the RGB-D stereo image.

The protection content of the present invention is not limited to the above examples. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.

Claims

1. The RGB-D image significance detection method based on depth reliability analysis is characterized by comprising the following steps of: the method comprises the steps of firstly, evaluating the reliability of the information of the degree of distance of a scene reflected by a depth map to obtain a reliability factor lambda of the depth map, taking lambda as a characteristic index for measuring the significance of a three-dimensional image, carrying out three-dimensional visual significance detection based on the depth information when the lambda is less than 0.45, and carrying out three-dimensional visual significance detection by combining color characteristics when the lambda is more than or equal to 0.45; the method comprises three modules: analyzing depth reliability, performing stereoscopic vision significance detection based on depth information, and performing stereoscopic vision significance detection by combining color characteristics; the method comprises the following specific steps:

(1) first, a depth confidence analysis is performed: inputting a depth image: first using linear iterationThe clustering SLIC superpixel segmentation algorithm carries out image segmentation and constructs a graph G ═ V, E, wherein V represents a node set, and each node in V corresponds to a superpixel block V_iThe edge E connects adjacent superpixel blocks; superpixel block v from depth map feature analysis_iThe depth confidence of (d) is:

wherein L is the gray level of the depth map, P_jIs the probability of the occurrence of the jth gray level in the depth map; c₀Is an intensity adjustment factor;

Wherein N is_sRepresenting the number of superpixels, N_s∈[100,200,300](ii) a When lambda is less than 0.45, stereoscopic vision significance detection is carried out based on the depth information; when the lambda is more than or equal to 0.45, stereoscopic vision significance detection is carried out by combining color characteristics;

is a super pixel block v_jThe coordinates of the center of mass of the image,

m′_iand m'_jRepresents superpixel v'_iAnd v'_jAverage value of (d);

to control constant, σ²To control the affinity constant of the similarity matrix;

significance map S for depth compactness calculation_comThe final realization is as follows: s_com(vi) 1-norm (Sdc (vi)), norm (. cndot.) as a normalization function, remapping significant values to [0,255 [ (. cndot.) ]]Obtaining a depth compactness saliency map from the range of (1);

(2-4) significance of depth compactness result S_comAnd depth contrast saliency result S_conAnd (3) fusion, wherein the acquired saliency map based on the depth information is as follows: s_depth＝θS_com+(1-θ)S_conθ is a positive control parameter between two significant terms;

(3-1) parallel structure based on background priors:

is the mean of the characteristics of the three channels on the boundary p,

Sum of minimum column

Satisfies the conditions

When the conditions are met, defining the boundary corresponding to the column and the maximum value as a false boundary, and removing the false boundary; tau is_CSetting a threshold value;

finally, the background prior based significance ranking results are:

wherein the content of the first and second substances,

in correspondence with the remaining boundary(s),

S_initial(i)＝S_b(i)·S_f(i)；

calculating to obtain final saliency based on RGB color features after the saliency feature optimizationA sex map;

(3-3) significance update in combination with depth information:

updating the significance result by using a cellular automaton iteration method, and modifying the method according to the actual situation:

wherein, | | d_i,d_jI | represents the depth distance between superpixel blocks i and j, N_iIs a neighborhood set of superpixels i, f_ijRepresenting the similarity, δ, between different superpixel blocks²A parameter indicating control of similar intensity;

based on the depth feature similarity, the significance value of each super pixel is determined by the significance feature value of each super pixel and the neighborhood feature value of each super pixel; and setting the iterative propagation times of the cell machine as K times, and obtaining a more accurate RGB-D significance map by combining significance updating of depth information.

2. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: 1 is less than or equal to C₀≤10。

3. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by:

4. the RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: sigma²＝0.1。

5. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: n is a radical of_s＝200。

6. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: θ is 0.5.

7. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: tau is not less than 0.1_C≤1.0。

8. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: delta²＝0.1。

9. The RGB-D image saliency detection method based on depth credibility analysis as claimed in claim 1, characterized by: k is 5-50.