CN110991547A

CN110991547A - Image significance detection method based on multi-feature optimal fusion

Info

Publication number: CN110991547A
Application number: CN201911276766.1A
Authority: CN
Inventors: 李建平; 顾小丰; 胡健; 王青松; 蒋涛; 陈强强; 贺喜; 李天凯
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-04-10

Abstract

The invention discloses an image saliency detection method based on multi-feature optimal fusion, aiming at the problem that the existing saliency detection algorithm only selects the bottom layer features of color, texture, direction and the like of an image in a space domain or only analyzes the singleness of a problem from the angle of a frequency domain, and the saliency of an object is calculated by selecting the features which can show the saliency of the object through comparison between the space domain and the frequency domain; the method can set different weights for various characteristics according to the importance degree of the various characteristics of the image to the significance detection by learning the training data through the support vector machine; the invention provides a saliency detection algorithm capable of obtaining a more accurate and clear saliency map so as to allocate more computer resources to the extracted saliency areas in various computer vision field tasks and efficiently process various vision field tasks.

Description

Image significance detection method based on multi-feature optimal fusion

Technical Field

The invention belongs to the technical field of saliency detection, and particularly relates to an image saliency detection method based on multi-feature optimal fusion.

Background

In recent decades, the field of significance detection has been developed vigorously, and many significance detection models have been proposed by scholars. The main idea of these models is based on a feature fusion theory and a visual attention theory, and through a Center-periphery (Center-around) mechanism, color, brightness and direction features of an image are calculated to simulate a bottom-up visual attention mechanism of a human visual system, and a saliency detection model is established by calculating contrast of color features.

These algorithms generally work well by mimicking the mechanism by which the human visual system can quickly locate objects of interest, but there is still some room for improvement.

However, at present, the knowledge of the attention mechanism of the visual system is not perfect, and the diversity of images and the interference of noise cause that many methods can only start from the primary features without paying attention to the importance of advanced features such as image semantic information, and many methods have low calculation efficiency and the accuracy of the calculation result needs to be improved. And because many methods are proposed aiming at some specific tasks, the universality of the method is not strong, and some methods have good effect on a standard data set, but when the foreground and the background of the image are relatively close and the like, the deviation between the position of the obtained saliency map and the true value is large, and the saliency region is difficult to accurately extract.

Therefore, the significance detection algorithm capable of obtaining more accurate and fresh significance map is provided, and the significance detection algorithm has great significance for fully utilizing computer resources and efficiently processing various visual field tasks.

Disclosure of Invention

Aiming at the defects in the prior art, the image saliency detection method based on multi-feature optimal fusion provided by the invention solves the problems that the conventional saliency detection method is single in feature analysis angle and is difficult to accurately reflect the image saliency.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that: an image saliency detection method based on multi-feature optimal fusion comprises the following steps:

s1, acquiring an image to be detected;

s2, preprocessing an image to be detected through a linear iterative clustering segmentation algorithm to obtain a plurality of super pixels formed by adjacent pixels;

s3, extracting color features and texture features of the super pixels, and extracting frequency domain features of the image to be detected;

and S4, inputting the extracted color features, texture features and frequency domain features into a trained support vector machine to obtain corresponding saliency values, forming a final saliency map, and realizing image saliency detection.

Further, in step S3, the method for extracting color features of the super-pixels specifically includes:

establishing a color histogram for each super pixel, taking each super pixel as a target super pixel in sequence, and performing super pixel color comparison on the target super pixel and all other super pixels to obtain a color comparison significant value of the target super pixel as a color feature of the super pixel.

Further, the color contrast saliency value S (r) of said target superpixel_k) The calculation formula of (2) is as follows:

in the formula, D_s(r_k,r_i) Is a target super pixel r_iAnd a super pixel r_kThe spatial distance between；

w(r_i) Is a target super pixel r_iThe spatial weight of (2);

δ_sfor the influencing variable of the spatial weighting for the calculation of the color contrast saliency value, said δ_sThe smaller the space weight value is, the greater the calculation effect on the color significant contrast is;

D_r(r_k,r_i) Is a target super pixel r_iAnd a super pixel r_kThe calculation formula of the color space distance between the two is specifically as follows:

in the formula, f (c)₁I) is the i-th color in the super pixel c₁The probability of occurrence of (a);

f(c₂j) is the jth color in the super pixel c₂The probability of occurrence of (a);

D(c_1,i,c_2,j) Is a super pixel c₁And the ith color of (c) and the super pixel₂The spatial distance of the jth color in (a);

n₁,n₂are respectively a super pixel c₁And super pixel c₁Total number of colors in (1).

Further, in step S3, the method for extracting texture features of the super pixels specifically includes:

a1 calculating the super pixel R by two-dimensional Gabor filter_jIn each pixel I_iFeature vector G (R)_j,I_i)；

A2, based on feature vector G (R)_j,I_i) Calculating the superpixel R_jEach pixel of (1)_iTexture feature vector G (R)_j)；

A3, calculating all pixels I_iTexture feature vector G (R)_j) Average value to obtain super pixel R_jTexture saliency value S of_t(j) As a super pixel R_jThe texture feature of (1).

Further, the expression of the two-dimensional Gabor filter in the step a1 is:

wherein, the real number part of the two-dimensional Gabor filter is:

the imaginary part of the two-dimensional Gabor filter is:

wherein x '═ x cos θ + y sin θ, y' ═ x sin θ + y cos θ;

x is the value of a pixel on the x-axis in two-dimensional space;

y is the value of the pixel on the y-axis in two-dimensional space;

λ is the wavelength of the sine function;

θ represents the direction of the Gabor kernel function;

ψ denotes a corresponding phase shift amount;

σ represents the standard deviation of the Gaussian function;

γ represents the width to height ratio of the space;

the super-pixel R in the step A1_jIn each pixel I_iFeature vector G (R)_j,I_i) Comprises the following steps:

in the formula, G_i(s, o) is the pixel I at a certain scale and orientation_iThe characteristic vector after being filtered by a two-dimensional Gabor filter, s is a first-dimensional characteristic vector, and o is a second-dimensional characteristic vector;

in the step A2, the super-pixel R_jEach pixel of (1)_iTexture feature vector G (R)_j) Comprises the following steps:

in the formula, N_iIs a super pixel R_jThe total number of pixels in;

in the step A3, the super-pixel R_jTexture saliency value S of_t(j) Comprises the following steps:

in the formula, N_tThe number of superpixels in the image to be detected is shown;

D(G(R_i),G(R_j) Is a super pixel R_jAnd a super pixel R_iThe euclidean distance between.

Further, in step S3, the method for extracting the frequency domain feature of the image to be detected i (x) specifically includes:

b1, converting the image I (x) to be detected in the original space domain into a frequency domain through Fourier transform, and calculating a phase spectrum and a magnitude spectrum of the image in the frequency domain;

b2, calculating a log spectrum of the amplitude spectrum, and filtering the log spectrum;

b3, calculating spectrum residual error information corresponding to the filtered log spectrum;

b4, performing inverse Fourier transform on the frequency spectrum residual error information and the magnitude spectrum, and performing Gaussian smoothing on the inverse Fourier transform result to obtain a saliency map S_f(x) And taking the frequency domain feature of the image to be detected as the frequency domain feature of the image to be detected.

Further, the calculation formula of the amplitude spectrum p (f) in the step B1 is:

P(f)＝S(F[I(x)])

wherein S (-) is a function of amplitude as a function of angular frequency;

f [. cndot. ] is a Fourier transform;

phase spectrum a (f) is:

A(f)＝R(F[I(x)])

wherein R (-) is a function of phase with angular frequency;

in step B2, the log spectrum l (f) of the amplitude spectrum p (f) is:

L(f)＝log(A(f))

in the formula, log (-) is a logarithm operator;

in step B3, the spectrum residual information r (f) is:

R(f)＝L(f)-h_n(f)*L(f)

in the formula, h_n(f) Is an average filter, h_n(f) L (f) is a filtering operation performed on the log spectrum l (f) by a mean filter;

in the step B4, the saliency map S_f(x) Comprises the following steps:

S_f(x)＝g(x)*F^-1[exp(R(f)+P(f))]²

wherein g (x) is a Gaussian smoothing filter;

F^-1[·]an inverse of a Fourier transform;

exp (. cndot.) is an exponential function.

Further, in step S4, the training data set used for training the support vector machine is a super-pixel set T { (x)₁,y₁),(x₂,y₂),.....(x_N,y_N) In which x_iFor each super pixel a feature vector (c)_i,t_i,f_i)，c_iTo correspond to the average color characteristic of a superpixel, t_iAverage texture feature of corresponding superpixel f_iFor the average frequency domain characteristics of the corresponding image, y_iControl variable, y, for the class of the corresponding superpixel_iWhen 1, y_iIndicating that the corresponding superpixel is a salient target region, y_iWhen equal to 0, y_iRepresenting the corresponding superpixel as a background area;

in step S4, the method for training the support vector machine specifically includes:

c1, segmenting data in the training data set through a linear hyperplane to obtain a corresponding hyperplane;

and C2, determining the hyperplane with the maximum interval in all hyperplanes to obtain the trained support vector machine.

Further, in the step C1, the expression when the training data set is segmented by the linear hyperplane is as follows:

h(x)＝ω^Tx+b

wherein h (x) is a hyperplane after division;

omega is a normal vector and is used for determining the direction of the hyperplane;

b is a displacement term used for the distance between the absolute hyperplane and the origin;

wherein (ω, b) denotes a hyperplane, and the hyperplane satisfies the condition:

h(x_i)y_i≥1。

further, the step S4 is specifically:

and learning input data through a trained support vector machine to respectively obtain optimal fusion coefficients of color features, texture features and frequency domain features, calculating a significant value of each super pixel based on the fusion coefficients, further obtaining a significant graph, and realizing image significance detection.

The invention has the beneficial effects that: the invention discloses an image saliency detection method based on multi-feature optimal fusion, aiming at the problem that the existing saliency detection algorithm only selects the bottom layer features of color, texture, direction and the like of an image in a space domain or only analyzes the singleness of a problem from the angle of a frequency domain, and the saliency of an object is calculated by selecting the features which can show the saliency of the object through comparison between the space domain and the frequency domain; the method can set different weights for various characteristics according to the importance degree of the various characteristics of the image to the significance detection by learning the training data through the support vector machine; the invention provides a saliency detection algorithm capable of obtaining a more accurate and clear saliency map so as to allocate more computer resources to the extracted saliency areas in various computer vision field tasks and efficiently process various vision field tasks.

Drawings

Fig. 1 is a flowchart of an image saliency detection method based on multi-feature fusion provided by the invention.

Fig. 2 is a schematic comparison diagram of a saliency map of each algorithm of the MSRA-1000 dataset according to an embodiment of the present invention.

Fig. 3 is a schematic diagram illustrating a saliency map comparison of each algorithm of the SED2 data set in an embodiment provided by the present invention.

FIG. 4 is a diagram illustrating PR curves of each algorithm on an SOD data set according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating a comparison of PR curves of algorithms on MSRA-1000 in an embodiment of the present invention.

FIG. 6 is a diagram illustrating a comparison of PR curves of algorithms on the SED2 according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating PR curves of algorithms on SOD according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate the understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and it will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined and defined in the appended claims, and all matters produced by the invention using the inventive concept are protected.

As shown in fig. 1, an image saliency detection method based on multi-feature optimal fusion includes the following steps:

s1, acquiring an image to be detected;

In the above step S2, in order to facilitate the extraction of the features of the subsequent image, it is first necessary to process the image by using a super-pixel segmentation technique of the image to obtain a super-pixel block for the subsequent processing, where the pixel is the minimum representation unit of the image, most of the previous image saliency detection techniques use the pixel as a basic processing unit for processing, but since the subsequent image size is larger and larger, the calculation complexity is higher, and the image processing using the pixel as a basic unit cannot well utilize the local structural features of the image, in this context, the proposed super-pixel concept is favored by researchers, the super-pixel refers to a process in which adjacent pixels cluster these pixels into pixel blocks having a certain visual meaning but irregular shape according to the similarity of the features such as color, texture, brightness, etc., the super-pixel segmentation aims to cluster the pixels into the super-pixel block according to the similarity of the features, due to the fact that the local structure information of the image can be well obtained through superpixel segmentation, the efficiency of subsequent calculation is greatly improved, and therefore the method becomes a preprocessing step of multiple processes.

At present, various superpixel segmentation algorithms have appeared, each algorithm has corresponding characteristics, and the superpixel segmentation algorithms can be divided into two types according to different principles of segmenting an image to generate superpixels: graph theory based algorithms and gradient based algorithms. The first class of algorithms generally converts the image to be processed into a weighted undirected graph structure, the relationship of any adjacent pixels is mapped to the edges of the undirected graph, the weights on the edges represent the similarity of the corresponding pixel characteristics, then the undirected graph is divided, and an objective cost function is constructed, the optimal value of which is calculated to obtain the superpixels. Based on the gradient descent segmentation algorithm, part of pixel points are usually selected as seed points, then clustering is carried out by taking the pixel points as centers, and then the corresponding clustering results are simultaneously modified in the continuous iteration process until the corresponding convergence conditions are reached, and the operation is stopped. The invention processes images using a SLIC superpixel segmentation algorithm with high accuracy.

In the step S3, when extracting the color feature, the contrast of the color generally refers to the degree of difference between the color of a certain region in the image and the color of the surrounding region, and generally, the larger the difference in color, the more noticeable the human is. Since the global contrast can better show the whole saliency region, the algorithm uses the global contrast to calculate the saliency value of the target region.

Therefore, the method for extracting the color feature of the super-pixel specifically comprises the following steps:

Wherein the color contrast saliency value S (r) of the target superpixel_k) The calculation formula of (2) is as follows:

in the formula, D_s(r_k,r_i) Is a target super pixel r_iAnd a super pixel r_kThe spatial distance therebetween;

w(r_i) Is a target super pixel r_iThe spatial weight of (2);

D(c_1,i,c_2,j) Is a super pixel c₁And the ith color of (c) and the super pixel₂The spatial distance of the jth color in (1)Separating;

Color contrast saliency value S (r) at target superpixel_k) The calculation formula (2) takes into account the influence of the spatial distance between the super-pixel blocks on the contrast of the super-pixel blocks to enhance the enhancement effect of the closer distance on the contrast calculation and inhibit the inhibition effect of the farther distance on the contrast calculation.

In the step S3, the texture features can reflect the uniform or non-uniform variation degree of a certain structure in a certain image region, and can visually reflect the characteristics of the object, and the method for extracting the texture features of the image includes a model method, a statistical method, a spectrum method and a structure analysis method. The Gabor filter can analyze image information in a frequency domain and a space domain simultaneously, the expression of the frequency and the direction of the image information is very similar to the human visual system, and the texture information of the image can be well extracted. Therefore, the method for extracting the texture feature of the super pixel in step S3 specifically includes:

Specifically, a Gabor filter is used for carrying out filtering processing on the segmented super-pixel block, the filter is set to use 4 directions and 4 scale parameters to obtain 4 groups of 4 different scales and 16 different Gabor characteristic vectors in 4 different directions, and a pixel I under a certain scale and direction is set_iThe Gabor filtered feature vector of (1) is G_i(s, o) to obtain a super pixel R_jIn each pixel I_iFeature vector G (R)_j,I_i)；

Wherein, the expression of the two-dimensional Gabor filter in the step a1 is:

wherein, the real number part of the two-dimensional Gabor filter is:

the imaginary part of the two-dimensional Gabor filter is:

wherein x '═ x cos θ + y sin θ, y' ═ x sin θ + y cos θ;

x is the value of a pixel on the x-axis in two-dimensional space;

y is the value of the pixel on the y-axis in two-dimensional space;

λ is the wavelength of the sine function;

θ represents the direction of the Gabor kernel function;

ψ denotes a corresponding phase shift amount;

σ represents the standard deviation of the Gaussian function;

γ represents the width to height ratio of the space;

in the formula, G_i(s, o) is the pixel I at a certain scale and orientation_iThe feature vector after being filtered by a two-dimensional Gabor filter, s is a first-dimensional feature vector, and o is a second-dimensional feature vector；

in the formula, N_iIs a super pixel R_jThe total number of pixels in;

In the step S3, when performing frequency domain feature extraction on the image i (x) to be detected, the saliency target of the image can be efficiently analyzed by converting the image information into the frequency domain and calculating the logarithmic frequency of the image. The main principle of image significance analysis from the angle of a frequency domain is that a human visual system is sensitive to comparative special information and can inhibit frequently-occurring information, so that image information can be divided into redundant information and special information from the angle of information theory, and the redundant information of the image is removed to obtain the significance information of the remaining image. Therefore, the method for extracting the frequency domain feature of the image i (x) to be detected in step S3 specifically includes:

The calculation formula of the amplitude spectrum p (f) in the step B1 is:

P(f)＝S(F[I(x)])

wherein S (-) is a function of amplitude as a function of angular frequency;

f [. cndot. ] is a Fourier transform;

phase spectrum a (f) is:

A(f)＝R(F[I(x)])

wherein R (-) is a function of phase with angular frequency;

in step B2, the log spectrum l (f) of the amplitude spectrum p (f) is:

L(f)＝log(A(f))

in the formula, log (-) is a logarithm operator;

in step B3, the spectrum residual information r (f) is:

R(f)＝L(f)-h_n(f)*L(f)

in the step B4, the saliency map S_f(x) Comprises the following steps:

S_f(x)＝g(x)*F^-1[exp(R(f)+P(f))]²

wherein g (x) is a Gaussian smoothing filter;

F^-1[·]an inverse of a Fourier transform;

exp (. cndot.) is an exponential function.

Most of the traditional image saliency detection methods simply add or multiply the extracted feature maps linearly, so that different importance degrees of the extracted feature maps to the finally formed saliency map are not considered, and different contribution degrees of the extracted different features to the saliency map generation can be better distinguished by learning the super-pixel features of the target region and the background region by using a support vector machine, and meanwhile, in a classification model of an image, the saliency detection problem can be regarded as a problem of dividing the image region into a target and a background. The support vector machine is a model for processing classification and regression problems in a learning mode under supervision, is widely applied to the fields of pattern recognition, text classification and the like, and has high accuracy in small sample and high-dimensional data classification. The method has the advantages of high robustness and the like, can process significance detection tasks, and mainly comprises two stages by using a support vector machine to carry out significance detection; the first stage is to use an initial support vector machine to learn the characteristics of the super-pixels of the salient object or the background, and obtain a classifier through a continuous iterative optimization process; the second stage is to use the classifier to compute its saliency values for the features of the superpixels of the input image, forming the final saliency image. The basic idea of the support vector machine is that for given data, if the data are linearly separable, a hyperplane needs to be found to distinguish the data, but the hyperplane is often many, so that a separating hyperplane which maximizes two types of support vectors needs to be selected, and if the data are linearly inseparable, the current data needs to be mapped to a high-dimensional space for processing. Based on this, when the saliency of the image is detected by the support vector machine in step S4,

the training data set used to train the support vector machine is the superpixel set T { (x)₁,y₁),(x₂,y₂),.....(x_N,y_N) In which x₁For each super pixel a feature vector (c)_i,t_i,f_i)，c_iTo correspond to the average color characteristic of a superpixel, t_iAverage texture feature of corresponding superpixel f_iFor the average frequency domain characteristics of the corresponding image, y_iControl variable, y, for the class of the corresponding superpixel_iWhen 1, y_iRepresenting corresponding superThe pixel is a salient target region, y_iWhen equal to 0, y_iRepresenting the corresponding superpixel as a background area;

the method for training the support vector machine specifically comprises the following steps:

In step C1, the expression when the training data set is segmented by the linear hyperplane is:

h(x)＝ω^Tx+b

wherein h (x) is a hyperplane after division;

(ω, b) to represent the hyperplane that can correctly classify the training data, then the hyperplane satisfies the condition:

h(x_i)y_i≥1；

meanwhile, in order to search for the hyperplane having the maximum interval, it is necessary to solve (ω, b) that the sum of the distances from the support vector to the hyperplane, where the equal sign of the above equation holds, is the maximum, and assuming that the sum of the distances from the support vector to the hyperplane is represented by γ, the calculation formula is as follows:

the problem of finding the maximum spacing then translates into finding (ω, b) such that

At the most, the specific requirements are as follows:

meanwhile, the above formula can be transformed into the following problem, and the appropriate (ω, b) is obtained to minimize | | ω |, and the specific requirement is as follows:

corresponding (omega) is obtained by solving the minimization of the above formula^*,b^*) The corresponding separation hyperplane is:

(ω^*)^Tx+b^*＝0

after the trained support vector machine (classifier) is obtained, the feature vector x of each super pixel can be input_iAnd obtaining the significant value of the super pixel, and finally obtaining a corresponding significant image.

Therefore, the step S4 is specifically:

Example 2:

in one embodiment of the invention, the method of the invention is compared with the existing 8 classical algorithms, the algorithms are IT, LC, HC, SR, FT, AC, CA and GB respectively, the saliency maps generated by the classical algorithms are generated by the codes of the corresponding algorithms, and the validity of the algorithms is verified by combining the true value maps and the evaluated indexes;

(1) experimental Environment

The effectiveness of the multi-feature fusion based algorithm was verified experimentally, and the programming environment used was Matlab2014 a. The computer is configured as Windows7(64 bits), Intel [email protected], 8G memory. Experiments were performed in this section on the MSRA-1000 dataset, the SED2 dataset and the SOD dataset.

(2) Subjective effect comparison

In MSRA-1000, SED2 and SOD data set random drawing picture, using the invention proposed based on the multi-feature optimal fusion significance detection algorithm generated significance map, and the other mentioned in the above section of the significance map generated by some other classical algorithms to qualitatively compare, the classical algorithm obtained significance map and the algorithm obtained significance map is shown in the lower graph. In fig. 2 to 4, the original image, the saliency maps obtained by the IT, LC, HC, SR, FT, AC, CA, and GB algorithms, and the saliency map and the truth map obtained by the present algorithm are shown in order from left to right.

By comparing the significance detection effects of the algorithms on different data sets, the results of most algorithms on the MSRA-1000 data set are the best, the results on the SED2 data set are good, and the results on the SOD data set are poor, which is mainly because the images in the MSRA-1000 data set only have one target area, the difference between foreground and background characteristics is large, the targets are clear, most of the target areas are located in the center of the images, and the difficulty of detecting the target areas is low. The image in the SED2 dataset typically has two target regions and the target regions are typically located relatively far from the centre and are difficult to detect. However, the target area of the image in the SOD data set is usually not clear enough, and the background is complex, so that the interference to foreground detection is large, and the salient area is not easy to detect. The target area obtained by the IT algorithm is found to be incomplete and poor in effect by comparing the effect of the IT algorithm with that of other algorithms on a single data set; the HC algorithm can better resist the interference of background noise, but can not highlight a salient object; the SR algorithm and the GB algorithm can only detect the boundary of an object, and the comparison cavity inside the object is obvious; the CA algorithm can only detect the edge of an object and cannot highlight the inner area of a salient object; LC algorithms, AC algorithms and FT algorithms are often prone to false detection of the background as a target area when the foreground and target are relatively close. In general, a classical saliency detection algorithm can detect saliency objects, but the problems that a generated target region is incomplete or background noise is easy to detect as a saliency object exist, and the like, while the algorithm provided in this chapter can completely detect the saliency target region, obtain clearer object edges and highlight target regions, inhibit the interference of background noise, and show better effect in the detection of natural scene pictures.

(3) Objective effect comparison

Comparing the performance of the algorithm of the chapter with 8 classical algorithms on public data sets MSRA-1000, SED2 and SOD, and mainly using the following performance indexes: PR curve, F-measure value, MAE value, and AUC value. The PR curve is mainly used for describing the relation between the precision ratio and the recall ratio; the F-measure value comprehensively reflects the relationship between the accuracy and the recall rate; MAE represents the error degree between the saliency map obtained by the algorithm and the artificially labeled real map; AUC represents the probability that, for one positive and one negative case of randomness, the probability value predicted as a positive case by the algorithm is greater than the probability value predicted as a negative case. The PR curve image, F-measure image, MAE value image and AUC image at a fixed threshold for the algorithm of the invention and the other 8 classical algorithms were compared, respectively, mainly in three sets of data sets MSRA-1000, SED2 and SOD.

Fig. 5-7 show PR curve comparisons between the algorithm of the present invention and 8 other algorithms on different datasets at a fixed threshold, respectively.

As can be seen from fig. 5-7, the PR curves for most algorithms are higher than the IT algorithms proposed earlier, and the PR curves for the various algorithms are highest on the MSRA-1000 dataset, lower on the SED2 dataset, and lowest on the SOD dataset. The reason is that the picture in the MSRA-1000 generally has only one target area, the difference between the foreground characteristic and the background characteristic is large, the target is clear, most of the target area is located in the center of the image, and the difficulty of detecting the target area is low. It is for this reason that the image in the SED2 dataset typically has two target regions and the target is typically located relatively far off-center, making it difficult to detect the target regions, which results in the algorithm being less accurate on the SED2 dataset than it is on the MSRA-1000. The target area of the image in the SOD dataset is usually not clear enough, and the foreground and background image features are relatively close, and are relatively disturbed by background noise, so that the salient area is not easy to detect, and therefore the PR curve of an algorithm on the dataset is much lower than that of the other datasets. The PR curve of the algorithm is positioned above other PR curves on the MSRA-1000 data set, the accuracy rate of the algorithm under the same recall rate is almost higher than that of all other algorithms, and the algorithm can well detect a significant target; at the same accuracy, the recall rate of the algorithm in this chapter is almost close to or higher than that of all other algorithms. The comparison proves that the algorithm has better significance detection performance, and the obtained significance map is closer to the real map. Even in the SED2 data set and the SOD data set with high difficulty in detecting the obvious target, the PR curve of the algorithm is slightly higher than that of other classical algorithms, because most of the algorithms only simply fuse the feature maps of the images, and the algorithm fuses the image features by using the optimal weight by using a support vector machine, so that a more accurate target region can be obtained.

Claims

1. An image saliency detection method based on multi-feature optimal fusion is characterized by comprising the following steps:

s1, acquiring an image to be detected;

2. The method for detecting image saliency based on multi-feature optimal fusion according to claim 1, wherein in the step S3, the method for extracting color features of the super-pixels specifically comprises:

3. The image saliency detection method based on multi-feature optimal fusion according to claim 2, characterized in that the color contrast saliency value S (r) of the target superpixel_k) The calculation formula of (2) is as follows:

w(r_i) Is a target super pixel r_iThe spatial weight of (2);

4. The method for detecting image saliency based on multi-feature optimal fusion according to claim 1, wherein in the step S3, the method for extracting texture features of the superpixels specifically comprises:

5. The method for detecting the image saliency based on multi-feature optimal fusion according to claim 4, wherein the expression of the two-dimensional Gabor filter in the step A1 is as follows:

wherein, the real number part of the two-dimensional Gabor filter is:

the imaginary part of the two-dimensional Gabor filter is:

wherein x '═ xcos θ + ysin θ, y' ═ xsin θ + ycos θ;

x is the value of a pixel on the x-axis in two-dimensional space;

y is the value of the pixel on the y-axis in two-dimensional space;

λ is the wavelength of the sine function;

θ represents the direction of the Gabor kernel function;

ψ denotes a corresponding phase shift amount;

σ represents the standard deviation of the Gaussian function;

γ represents the width to height ratio of the space;

in the formula, N_iIs a super pixel R_jThe total number of pixels in;

6. The image saliency detection method based on multi-feature optimal fusion according to claim 1, wherein in the step S3, the method for extracting the frequency domain features of the image to be detected i (x) specifically comprises:

7. The image saliency detection method based on multi-feature optimal fusion according to claim 6, characterized in that the calculation formula of the magnitude spectrum P (f) in the step B1 is as follows:

P(f)＝S(F[I(x)])

wherein S (-) is a function of amplitude as a function of angular frequency;

f [. cndot. ] is a Fourier transform;

phase spectrum a (f) is:

A(f)＝R(F[I(x)])

wherein R (-) is a function of phase with angular frequency;

in step B2, the log spectrum l (f) of the amplitude spectrum p (f) is:

L(f)＝log(A(f))

in the formula, log (-) is a logarithm operator;

in step B3, the spectrum residual information r (f) is:

R(f)＝L(f)-h_n(f)*L(f)

in the step B4, the saliency map S_f(x) Comprises the following steps:

S_f(x)＝g(x)*F^-1[exp(R(f)+P(f))]²

wherein g (x) is a Gaussian smoothing filter;

F^-1[·]an inverse of a Fourier transform;

exp (. cndot.) is an exponential function.

8. The method for detecting image saliency based on multi-feature optimal fusion of claim 1, characterized in that in step S4, the training data set for training support vector machine is a super-pixel set T { (x)₁,y₁),(x₂,y₂),.....(x_N,y_N) In which x_iFor each super pixel a feature vector (c)_i,t_i,f_i)，c_iTo correspond to the average color characteristic of a superpixel, t_iAverage texture feature of corresponding superpixel f_iFor the average frequency domain characteristics of the corresponding image, y_iControl variable, y, for the class of the corresponding superpixel_iWhen 1, y_iIndicating that the corresponding superpixel is a salient target region, y_iWhen equal to 0, y_iRepresenting the corresponding superpixel as a background area;

9. The method for detecting image saliency based on multi-feature optimal fusion according to claim 8, wherein in the step C1, the expression when the training data set is segmented by the linear hyperplane is as follows:

h(x)＝ω^Tx+b

wherein h (x) is a hyperplane after division;

h(x_i)y_i≥1。

10. the method for detecting image saliency based on multi-feature optimal fusion according to claim 1, wherein the step S4 specifically includes: