CN111242941A - Salient region detection method and device based on visual attention - Google Patents

Salient region detection method and device based on visual attention

Info

Publication number
CN111242941A
Authority
CN
China
Prior art keywords
super
significance
pixel
superpixel
selecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010065654.8A
Other languages
Chinese (zh)
Other versions
CN111242941B (en)
Inventor
Jian Yang
Yang Shen
Yuhui Shi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology filed Critical Southern University of Science and Technology
Priority to CN202010065654.8A priority Critical patent/CN111242941B/en
Publication of CN111242941A publication Critical patent/CN111242941A/en
Application granted granted Critical
Publication of CN111242941B publication Critical patent/CN111242941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/10 Segmentation; Edge detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting a salient region based on visual attention. The method comprises: performing superpixel segmentation on an input scene to obtain a plurality of superpixels; performing saliency evaluation using a brainstorm optimization algorithm, selecting superpixels and calculating their saliency, and classifying the superpixels according to saliency; and selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene. The clustering characteristic of the brainstorm optimization algorithm guides the search to converge toward the salient region over the iterations, so the whole image or scene need not be traversed: only a subset of superpixels is selected for saliency evaluation, and an incomplete saliency map is finally output. The method thus closely simulates the human visual attention mechanism, realizes bottom-up salient region detection, and quickly locates the salient region in the input scene. It can be widely applied in the field of visual inspection.

Description

Salient region detection method and device based on visual attention
Technical Field
The invention relates to the field of computer software, in particular to a method and a device for detecting a salient region based on visual attention.
Background
Research on using a visual attention mechanism to simulate the perceptual principles of human vision, rapidly locating a salient region in a scene for further processing and thereby reducing the amount of data to be processed, is steadily increasing; visual attention computation models have been applied to fields such as target detection, recognition, and tracking.
The existing main visual attention detection methods include:
1) Cognitive attention models. For example, the Itti model extracts primary features such as color, brightness, and orientation from an input image according to the behavior and network structure of the early visual system, uses center-surround operations to generate feature maps representing saliency measures at multiple scales, and finally combines the feature maps into a final saliency map.
2) The discriminative saliency model proposed by Gao et al., which treats the saliency problem as an optimal decision between two classes of visual stimuli, computed as a discriminant classification problem: the saliency at each location in the visual field equals the discriminant power of the visual feature set at that location, and the smaller the expected classification error at a particular location, the greater its saliency.
3) Saliency models based on frequency-domain analysis, which compute image saliency in the frequency domain, such as the spectral residual method, the phase spectrum of Fourier transform method, and the frequency-tuned method.
4) Visual attention models based on graph theory, which regard eye-movement data in visual attention experiments as a time sequence; because a large number of hidden variables influence eye movement, these models are realized with methods such as hidden Markov models, dynamic Bayesian networks, or conditional random fields.
However, all of the above methods traverse the whole scene or image to compute a saliency value for every region or pixel and finally output a complete saliency map. This not only increases the amount of computation but also fails to conform to the selectivity of real visual attention. There is therefore a need for a visual-attention-based salient region detection method that adequately simulates the human visual attention mechanism and quickly locates the salient region in a scene without traversing the whole image or scene.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, the invention aims to provide a salient region detection method based on visual attention.
The technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a method for detecting a salient region based on visual attention, comprising:
performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
performing saliency evaluation by using a swarm intelligence optimization algorithm, selecting superpixels and calculating their saliency, and performing superpixel classification according to the saliency;
and selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene.
Further, calculating the saliency of the superpixels specifically includes:
selecting a first preset number of superpixels as superpixels to be evaluated within a selection range centered at the center of the input scene with a first preset distance as radius;
calculating the spatial saliency of the superpixel to be evaluated by combining the attenuation coefficient and the feature vector of the superpixel to be evaluated;
and calculating the saliency of the superpixel to be evaluated according to the focus deviation and the spatial saliency.
Further, the superpixels to be evaluated are sorted according to saliency and classified into a first class of superpixels and a second class of superpixels according to a second preset probability.
Further, selecting a new super-pixel according to the classification result specifically includes: primary selection and/or secondary selection;
the primary selection is as follows:
selecting a new superpixel from the first class or the second class of superpixels according to a first preset rule;
the secondary selection is as follows:
determining whether to use one superpixel or a plurality of superpixels according to a second preset rule;
if one superpixel is selected, using it as a central superpixel and selecting a new superpixel within the circle centered at the central superpixel with a second preset distance as radius;
and if a plurality of superpixels are selected, selecting one superpixel as a central superpixel within the range of the line connecting the centers of the plurality of superpixels, and selecting a new superpixel within the circle centered at the central superpixel with a third preset distance as radius.
Further, the method further includes filling blind areas in the saliency map, and specifically includes:
acquiring the saliency of the evaluated superpixels adjacent to a superpixel that has not been subjected to saliency evaluation;
and taking the average of these saliency values as the saliency of the superpixel that has not been subjected to saliency evaluation.
Further, normalizing the saliency map is further included.
Further, before the superpixel segmentation, the method further comprises: initializing the superpixel extraction parameters and the parameters of the brainstorm optimization algorithm.
In a second aspect, the present invention also provides a salient region detection apparatus based on visual attention, including:
a superpixel segmentation module: used for performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
a saliency evaluation module: used for performing saliency evaluation with a swarm intelligence optimization algorithm, selecting superpixels and calculating their saliency, and classifying the superpixels according to saliency;
a saliency map generation module: used for selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene.
In a third aspect, the present invention provides a salient region detecting apparatus based on visual attention, comprising:
at least one processor, and a memory communicatively coupled to the at least one processor;
wherein the processor is adapted to perform the method of any of the first aspects by invoking a computer program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any of the first aspects.
The invention has the beneficial effects that:
the method comprises the steps of carrying out superpixel segmentation on an input scene to obtain a plurality of superpixels, carrying out significance evaluation by using a brainstorm optimization algorithm, selecting and calculating the significance of the superpixels, carrying out superpixel classification according to the significance, selecting new superpixels according to classification results to carry out significance evaluation, iterating for multiple times until a termination condition is reached, and generating a significance map of the input scene. The clustering characteristic of the head-to-head Feng-Bai optimization algorithm is utilized to guide the search of the salient region to converge towards the salient region in the iteration process, all regions of the image or the scene do not need to be traversed, partial super pixels are selected for carrying out the significance evaluation, the incomplete salient image is finally output, the visual attention mechanism of human is fully simulated, the salient region detection from bottom to top is realized, and the salient region in the input scene is quickly positioned. Can be widely applied to the field of visual inspection.
Drawings
FIG. 1 is a flow chart of an implementation of a salient region detection method based on visual attention according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the process of calculating saliency according to an embodiment of the method for detecting a salient region based on visual attention in the present invention;
FIG. 3 is a diagram of selecting a new super-pixel according to the classification result according to an embodiment of the salient region detection method based on visual attention in the present invention;
FIG. 4 is a flowchart illustrating a method for detecting a salient region based on visual attention according to an embodiment of the present invention;
fig. 5 is a block diagram of a salient region detection apparatus based on visual attention according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The first embodiment is as follows:
the first embodiment of the present invention provides a method for detecting a salient region based on visual attention, and the following terms used in this embodiment are used for explanation.
1) Visual attention: the mechanism by which the visual information that is most important, most useful, and most relevant to the current purpose is quickly selected from the abundant visual information in the field of view.
2) Swarm intelligence optimization algorithm: a class of algorithms that mainly simulate the group behaviors of insect colonies, herds, bird flocks, and fish schools, in which the group searches for food cooperatively and each member continuously adjusts its search direction by learning from its own experience and that of the other members; any algorithm or distributed problem-solving strategy inspired by the social behavior mechanisms of insect colonies or other animals belongs to this class.
3) Brain Storm Optimization (BSO) algorithm: a novel swarm intelligence optimization method that takes the human brainstorming process of problem solving as its prototype, extracts the pattern by which problems are solved, and abstracts it into an intelligent optimization algorithm.
4) Saliency map: an image used to represent the degree of saliency of each region (or pixel) in a scene.
The swarm intelligence optimization algorithms usable in this embodiment include the brainstorm optimization algorithm, the particle swarm optimization method, the ant colony algorithm, and the like, all of which can use swarm intelligence to search for the salient region; the brainstorm optimization algorithm is taken as the example below.
Fig. 1 is a flowchart of an implementation of the method for detecting a salient region based on visual attention according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
s1: and performing superpixel segmentation on the input scene to obtain a plurality of superpixels.
S2: and performing significance evaluation by using a brain storm optimization algorithm, selecting and calculating the significance of the superpixels, and classifying the superpixels according to the significance.
S3: and selecting new super pixels according to the classification result to carry out significance evaluation, and iterating for multiple times until a termination condition is reached to generate a significance map of the input scene.
Further, the method also comprises step S0: initializing the superpixel extraction parameters and the parameters of the brainstorm optimization algorithm. In this embodiment, the superpixel extraction parameters comprise the number of superpixels N, and the parameters of the brainstorm optimization algorithm comprise: a first preset number N_s (i.e., the population size of the brainstorm optimization algorithm), a second preset probability perc_e, a first probability value p_e, a second probability value p_one, a first preset distance R_1, a second preset distance R_2, and a third preset distance R_3.
The above steps are described in detail below.
Specifically, in step S1, in order to reduce the computational cost of salient region extraction, this embodiment performs superpixel segmentation on the input scene to obtain a plurality of superpixels. A superpixel is an irregular block of adjacent pixels with similar characteristics such as color, brightness, and texture; it carries a certain visual significance, mostly retains the information useful for further image segmentation, and generally does not cross object boundaries in the image. Superpixel segmentation groups pixels by feature similarity and replaces a large number of pixels with a small number of superpixels to express image characteristics, which greatly reduces the complexity of subsequent image processing; the subsequent steps of this embodiment therefore operate in units of superpixels. Many common algorithms can perform superpixel segmentation, for example simple linear iterative clustering (SLIC).
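By way of illustration, step S1 could be realized with the SLIC implementation in scikit-image. The following minimal sketch also extracts mean Lab color as each superpixel's feature vector and the centroid as its position; both choices are assumptions of the sketch, not requirements of the embodiment.

```python
# Illustrative sketch of step S1: SLIC superpixel segmentation plus simple
# per-superpixel features (mean Lab color) and centroid positions.
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def segment_superpixels(image_rgb, n_superpixels=200):
    labels = slic(image_rgb, n_segments=n_superpixels, start_label=0)
    lab = rgb2lab(image_rgb)
    n = labels.max() + 1
    feats = np.zeros((n, 3))    # mean Lab color per superpixel
    centers = np.zeros((n, 2))  # (row, col) centroid per superpixel
    for k in range(n):
        mask = labels == k
        feats[k] = lab[mask].mean(axis=0)
        ys, xs = np.nonzero(mask)
        centers[k] = ys.mean(), xs.mean()
    return labels, feats, centers
```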
In step S2, calculating the saliency of the superpixels specifically includes:
S21: within the selection range centered at the center of the input scene with the first preset distance R_1 as radius, randomly selecting a first preset number N_s of superpixels as the superpixels to be evaluated.
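A sketch of step S21, continuing the assumed helper names of the sketch above; the signature is illustrative.

```python
# Illustrative sketch of step S21: randomly draw Ns superpixels whose
# centroids fall within radius R1 of the scene center.
import numpy as np

def select_initial(centers, image_shape, Ns, R1, rng):
    scene_center = np.array(image_shape[:2]) / 2.0
    dists = np.linalg.norm(centers - scene_center, axis=1)
    candidates = np.nonzero(dists <= R1)[0]
    return rng.choice(candidates, size=min(Ns, candidates.size), replace=False)
```

Here rng would be, for example, numpy.random.default_rng().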
S22: calculating the spatial saliency of the superpixel to be evaluated by combining the attenuation coefficient and the feature vectors of the superpixels, specifically as follows:
Let f_i and f_j denote the feature vectors of superpixels sp_i and sp_j, and let w(i, j) denote an attenuation coefficient expressing that a region farther away from the current superpixel has a smaller influence on the saliency of the current superpixel region. The spatial saliency of superpixel sp_i is then expressed as:

S_s(sp_i) = Σ_{j ≠ i} w(i, j) · ||f_i − f_j||

w(i, j) = exp(−D_s[sp_i, sp_j] / σ²)

wherein D_s[sp_i, sp_j] denotes the spatial distance between superpixels sp_i and sp_j, and σ² denotes the decay rate, which may be preset.
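Under the formulas above, the spatial saliency computation could be sketched as follows; normalizing spatial distances by the image diagonal and using the Euclidean feature distance are assumptions of this sketch.

```python
# Illustrative sketch of step S22: feature contrast attenuated by spatial
# distance, following S_s(sp_i) = sum_j w(i, j) * ||f_i - f_j||.
import numpy as np

def spatial_saliency(idx, evaluated, feats, centers, diag, sigma2=0.25):
    others = [j for j in evaluated if j != idx]
    # spatial distances D_s, normalized by the image diagonal (an assumption)
    d_s = np.linalg.norm(centers[others] - centers[idx], axis=1) / diag
    w = np.exp(-d_s / sigma2)  # attenuation coefficients w(i, j)
    d_f = np.linalg.norm(feats[others] - feats[idx], axis=1)  # ||f_i - f_j||
    return float(np.sum(w * d_f))
```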
S23: according to the focus deviation and the spatial saliency, the saliency of the superpixel to be evaluated is calculated. Because existing schemes introduce a center bias when generating a saliency map, that is, they regard the center of the image or scene as more salient than the periphery, which is inconsistent with actual human visual attention, this embodiment introduces a focus deviation into the saliency evaluation of the superpixel, expressed as:

S(sp_i) = exp(−||X_i − X_msr(t)|| / Z) · S_s(sp_i)

wherein X_i denotes the center position of superpixel sp_i, X_msr(t) denotes the position of the region with the highest saliency at the t-th iteration (at the initial iteration, X_msr(t) is the center position X_c of the input scene), and Z denotes the diagonal length.
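A matching sketch of step S23, again under the reconstructed exponential form of the focus deviation:

```python
# Illustrative sketch of step S23: weight spatial saliency by closeness to
# the current focus X_msr(t); at the first iteration X_msr is the scene
# center X_c, and z_diag is the diagonal length Z.
import numpy as np

def focus_biased_saliency(s_spatial, center_i, x_msr, z_diag):
    focus_deviation = np.exp(-np.linalg.norm(center_i - x_msr) / z_diag)
    return focus_deviation * s_spatial
```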
Fig. 2 is a schematic view of the saliency calculation process in this embodiment. As the figure shows, the superpixels to be evaluated are selected after parameter initialization; the attenuation coefficient between each superpixel to be evaluated and the other superpixels to be evaluated within the selection range of radius R_1 is calculated; the spatial saliency of each superpixel to be evaluated is then computed, and its saliency is finally calculated from the focus deviation and the spatial saliency.
S24: sorting the superpixels to be evaluated according to saliency, and classifying them into a first class and a second class of superpixels according to the second preset probability.
In one specific scenario, the N_s superpixels to be evaluated are sorted according to saliency; the top N_s × perc_e % by saliency are selected as the first class of superpixels (also called elite superpixels), and the remaining N_s × (1 − perc_e %) are taken as the second class of superpixels (also called normal superpixels).
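The classification of step S24 could be sketched as below; perc_e follows the parameter list of step S0 and is treated here as a percentage.

```python
# Illustrative sketch of step S24: split evaluated superpixels into the
# elite class (top perc_e percent by saliency) and the normal class.
def classify(evaluated, saliency, perc_e=20.0):
    order = sorted(evaluated, key=lambda k: saliency[k], reverse=True)
    n_elite = max(1, round(len(order) * perc_e / 100.0))
    return order[:n_elite], order[n_elite:]  # (elite, normal)
```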
In step S3, selecting new superpixels according to the classification result specifically includes a primary selection and/or a secondary selection: in this embodiment, only the primary selection or only the secondary selection may be adopted, or the secondary selection may be performed after the primary selection.
The primary selection is as follows: a new superpixel is selected from the first class or the second class of superpixels according to a first preset rule.
The secondary selection is as follows: whether to use one superpixel or a plurality of superpixels is determined according to a second preset rule. If one superpixel is selected, it is used as the central superpixel, and a new superpixel is selected within the circle centered at the central superpixel with the second preset distance as radius; if a plurality of superpixels are selected, one superpixel is chosen as the central superpixel within the range of the line connecting their centers, and a new superpixel is selected within the circle centered at the central superpixel with the third preset distance as radius.
Optionally, when only the primary selection is performed, one superpixel is chosen as the central superpixel according to the judgment result of the first preset rule, a selection radius is set with it as the circle center, and a corresponding number of superpixels is selected for saliency evaluation. Optionally, when only the secondary selection is performed, one or more superpixels are randomly chosen from the elite superpixels or the normal superpixels.
Fig. 3 is a schematic diagram of selecting new superpixels according to the classification result in this embodiment. In this implementation scenario, the secondary selection is performed according to the result of the primary selection. After parameter initialization, a random number rand is generated, and the first preset rule is described as: judging whether rand is less than the first probability value p_e; if so, a new superpixel is selected from the elite superpixels for saliency evaluation, otherwise a new superpixel is selected from the normal superpixels for saliency evaluation.
The second preset rule is described as: judging whether rand is less than the second probability value p_one. If so, one superpixel is selected as the central superpixel and, within the circle centered at it with the second preset distance R_2 as radius, N_s superpixels (the number is not fixed and can be changed according to actual needs) are selected for saliency evaluation. Otherwise, a plurality of superpixels are selected, one superpixel is chosen as the central superpixel within the range of the line connecting their centers, and, within the circle centered at it with the third preset distance R_3 as radius, N_s superpixels (again, the number can be changed according to actual needs) are selected for saliency evaluation.
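The two preset rules of fig. 3 could be sketched as follows; the helper select_near, the choice of exactly two superpixels for the "plurality" case, and the interpolation along their center line are assumptions of this sketch.

```python
# Illustrative sketch of the primary and secondary selection of fig. 3.
import numpy as np

def select_near(center_xy, centers, Ns, radius, rng):
    """Randomly pick up to Ns superpixels whose centroids lie within radius."""
    d = np.linalg.norm(centers - center_xy, axis=1)
    cand = np.nonzero(d <= radius)[0]
    return rng.choice(cand, size=min(Ns, cand.size), replace=False)

def select_new(elite, normal, centers, Ns, p_e, p_one, R2, R3, rng):
    # primary selection: rand < p_e -> elite class, otherwise normal class
    pool = elite if rng.random() < p_e else (normal or elite)
    # secondary selection: rand < p_one -> search around a single superpixel
    if rng.random() < p_one:
        center = centers[rng.choice(pool)]
        return select_near(center, centers, Ns, R2, rng)
    # otherwise pick a point on the line between two chosen superpixel centers
    a, b = centers[rng.choice(pool, size=2, replace=len(pool) < 2)]
    center = a + rng.random() * (b - a)
    return select_near(center, centers, Ns, R3, rng)
```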
Further, in this embodiment the first preset distance R_1, the second preset distance R_2, and the third preset distance R_3 may take the same value.
Step S3 further includes filling blind areas. Under this search strategy, a superpixel whose saliency has been evaluated carries a corresponding saliency value, while a superpixel whose saliency has not been evaluated is called a blind-area superpixel. Since adjacent regions tend to have similar saliency characteristics, the saliency of a blind-area superpixel is obtained from the similarity of its neighborhood. Filling the blind areas of the saliency map in this embodiment therefore specifically includes: acquiring the saliency of the evaluated superpixels adjacent to the blind-area superpixel, and taking the average of these values as the saliency of the blind-area superpixel, expressed as:

S′(sp_b) = (1 / N_adj) · Σ_{sp_adj} S(sp_adj)

In the above formula, S′(sp_b) denotes the saliency of the blind-area superpixel sp_b, sp_adj ranges over the superpixels adjacent to sp_b that carry a saliency value, and N_adj denotes their number.
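The blind-area filling could be sketched as below; deriving adjacency from horizontally and vertically neighboring pixel pairs in the label map is an implementation choice of this sketch.

```python
# Illustrative sketch of blind-area filling: a blind superpixel inherits the
# mean saliency of its evaluated neighbors (neighbors that are themselves
# blind are simply skipped, so the map may remain incomplete).
import numpy as np

def fill_blind_areas(labels, saliency):
    pairs = set()
    pairs.update(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()))
    pairs.update(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
    neighbors = {}
    for a, b in pairs:
        if a != b:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    filled = dict(saliency)
    for sp in range(labels.max() + 1):
        if sp not in saliency:
            vals = [saliency[n] for n in neighbors.get(sp, ()) if n in saliency]
            if vals:
                filled[sp] = float(np.mean(vals))
    return filled
```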
It should be noted that, since some neighbors of a blind-area superpixel may themselves be blind areas, the finally generated saliency map may still be an incomplete map containing blind areas even after the filling performed in this embodiment.
Further, step S3 includes normalizing the saliency map for convenience of display. The obtained saliency of each superpixel is normalized as:

S_N(sp_i) = (S(sp_i) − S_min) / (S_max − S_min)

In the above formula, S_N(sp_i) denotes the normalized saliency of superpixel sp_i, S(sp_i) denotes its saliency before normalization, and S_max and S_min denote the maximum and minimum saliency values in the non-normalized saliency map, respectively.
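The min-max normalization itself is straightforward; a sketch over the superpixels that carry a saliency value:

```python
# Illustrative sketch of the normalization S_N = (S - S_min) / (S_max - S_min).
import numpy as np

def normalize(saliency):
    vals = np.array(list(saliency.values()))
    s_min, s_max = float(vals.min()), float(vals.max())
    span = (s_max - s_min) or 1.0  # guard against a constant map
    return {k: (v - s_min) / span for k, v in saliency.items()}
```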
The termination conditions in step S3 include: 1) the preset number of iterations is reached; or 2) the saliency map no longer changes over a preset number of consecutive iterations, even though the maximum iteration count has not been reached. When a termination condition is met, the iteration stops and the saliency map is generated.
Fig. 4 is a specific flowchart of the salient region detection method based on visual attention in this embodiment. As the figure shows, the population of the brainstorm optimization algorithm is first initialized and the superpixels to be evaluated in the first iteration are selected; their saliency is evaluated, the superpixels are classified according to the evaluation result, and new superpixels are selected according to the classification result for further saliency evaluation. During this process, the superpixel classification is updated and the most salient region is updated accordingly. Whether the termination condition is reached is then judged: if not, the saliency evaluation continues; otherwise, the blind areas of the generated saliency map are filled, and the filled saliency map is normalized to obtain the final saliency map.
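Tying the sketches above together, the overall flow of fig. 4 might be wired as follows. All parameter defaults and helper names are assumptions of this illustrative skeleton, not the patent's reference implementation.

```python
# Illustrative end-to-end skeleton of fig. 4, reusing the helper sketches
# above (segment_superpixels, select_initial, spatial_saliency,
# focus_biased_saliency, classify, select_new, fill_blind_areas, normalize).
import numpy as np

def detect_salient_region(image_rgb, Ns=20, perc_e=20.0, p_e=0.8, p_one=0.5,
                          max_iter=50):
    labels, feats, centers = segment_superpixels(image_rgb)
    h, w = image_rgb.shape[:2]
    diag = float(np.hypot(h, w))
    R1 = R2 = R3 = diag / 4.0               # the text allows R1 = R2 = R3
    rng = np.random.default_rng()
    x_msr = np.array([h / 2.0, w / 2.0])    # focus starts at the scene center
    saliency, prev = {}, None
    batch = select_initial(centers, image_rgb.shape, Ns, R1, rng)
    for _ in range(max_iter):
        evaluated = list(saliency) + [int(i) for i in batch]
        for i in batch:                     # evaluate the current batch
            s = spatial_saliency(i, evaluated, feats, centers, diag)
            saliency[int(i)] = focus_biased_saliency(s, centers[i], x_msr, diag)
        # update the most salient region used by the focus deviation
        x_msr = centers[max(saliency, key=saliency.get)]
        if saliency == prev:                # early stop: map no longer changes
            break
        prev = dict(saliency)
        elite, normal = classify(list(saliency), saliency, perc_e)
        batch = select_new(elite, normal, centers, Ns, p_e, p_one, R2, R3, rng)
    return normalize(fill_blind_areas(labels, saliency))
```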
In this embodiment, the salient region is searched repeatedly and iteratively with the brainstorm optimization algorithm; the random selection mode gradually guides the search to converge toward the salient region, so that not all regions need to be traversed. At the same time, the focus deviation is introduced, so that each search step treats the currently most salient region rather than the image center as the focus, which better matches the characteristics of real visual attention and reduces the amount of computation.
Example two:
the present embodiment provides a salient region detection apparatus based on visual attention, as shown in fig. 5, which is a block diagram of a structure of the salient region detection apparatus based on visual attention of the present embodiment, and includes:
superpixel segmentation module 10: used for performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
saliency evaluation module 20: used for performing saliency evaluation with a swarm intelligence optimization algorithm, selecting superpixels and calculating their saliency, and classifying the superpixels according to saliency;
saliency map generation module 30: used for selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene.
The specific details of the modules of the above salient region detection apparatus based on visual attention have already been described in detail in the embodiment of the corresponding salient region detection method based on visual attention, and are therefore not repeated here.
Example three:
the present invention also provides a salient region detecting apparatus based on visual attention, comprising:
at least one processor, and a memory communicatively coupled to the at least one processor;
wherein the processor is configured to perform the method according to embodiment one by calling a computer program stored in the memory, that is, program code which, when run on the salient region detection apparatus based on visual attention, causes the apparatus to perform the steps of the salient region detection method based on visual attention described in the foregoing embodiment of this specification.
In addition, the present invention also provides a computer-readable storage medium, which stores computer-executable instructions for causing a computer to perform the method according to the first embodiment.
The method performs superpixel segmentation on an input scene to obtain a plurality of superpixels, performs saliency evaluation using the brainstorm optimization algorithm, selects superpixels and calculates their saliency, classifies the superpixels according to saliency, selects new superpixels for saliency evaluation according to the classification result, and iterates until a termination condition is reached, generating a saliency map of the input scene. The clustering characteristic of the brainstorm optimization algorithm guides the search to converge toward the salient region over the iterations, so the whole image or scene need not be traversed: only a subset of superpixels is selected for saliency evaluation, and an incomplete saliency map is finally output. The method thus closely simulates the human visual attention mechanism, realizes bottom-up salient region detection, and quickly locates the salient region in the input scene. It can be widely applied in the field of visual inspection.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being covered by the claims and description.

Claims (10)

1. A method for detecting a salient region based on visual attention, comprising:
performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
performing saliency evaluation by using a swarm intelligence optimization algorithm, selecting superpixels and calculating their saliency, and performing superpixel classification according to the saliency;
and selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene.
2. A method as claimed in claim 1, wherein calculating the saliency of the super-pixels specifically comprises:
selecting a first preset number of superpixels as superpixels to be evaluated within a selection range centered at the center of the input scene with a first preset distance as radius;
calculating the spatial saliency of the superpixel to be evaluated by combining the attenuation coefficient and the feature vector of the superpixel to be evaluated;
and calculating the saliency of the superpixel to be evaluated according to the focus deviation and the spatial saliency.
3. A method as claimed in claim 2, wherein the superpixels to be evaluated are sorted according to the saliency and classified into a first class and a second class of superpixels according to a second preset probability.
4. A method as claimed in claim 3, wherein selecting a new super-pixel according to the classification result comprises: primary selection and/or secondary selection;
the primary selection is as follows:
selecting a new superpixel from the first class or the second class of superpixels according to a first preset rule;
the secondary selection is as follows:
determining whether to use one superpixel or a plurality of superpixels according to a second preset rule;
if one superpixel is selected, using it as a central superpixel and selecting a new superpixel within the circle centered at the central superpixel with a second preset distance as radius;
and if a plurality of superpixels are selected, selecting one superpixel as a central superpixel within the range of the line connecting the centers of the plurality of superpixels, and selecting a new superpixel within the circle centered at the central superpixel with a third preset distance as radius.
5. The method according to claim 1, further comprising filling blind areas in the saliency map, specifically comprising:
acquiring the saliency of the evaluated superpixels adjacent to a superpixel that has not been subjected to saliency evaluation;
and taking the average of these saliency values as the saliency of the superpixel that has not been subjected to saliency evaluation.
6. A method as claimed in claim 1, further comprising normalizing the saliency map.
7. A salient region detection method based on visual attention according to any one of claims 1 to 6, characterized in that the swarm intelligence optimization algorithm comprises any one of the following: a brainstorm optimization algorithm, a particle swarm optimization method, and an ant colony algorithm;
before the super-pixel segmentation, the method further comprises the following steps: and initializing super-pixel extraction parameters and parameters of the group intelligent optimization algorithm.
8. A salient region detection apparatus based on visual attention, comprising:
a superpixel segmentation module: used for performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
a saliency evaluation module: used for performing saliency evaluation with a swarm intelligence optimization algorithm, selecting superpixels and calculating their saliency, and classifying the superpixels according to saliency;
a saliency map generation module: used for selecting new superpixels for saliency evaluation according to the classification result, iterating until a termination condition is reached, and generating a saliency map of the input scene.
9. A salient region detection apparatus based on visual attention, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the processor is operable to perform the method of any one of claims 1 to 7 by invoking a computer program stored in the memory.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202010065654.8A 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention Active CN111242941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065654.8A CN111242941B (en) 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010065654.8A CN111242941B (en) 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention

Publications (2)

Publication Number Publication Date
CN111242941A (en) 2020-06-05
CN111242941B (en) 2023-05-30

Family

ID=70872830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065654.8A Active CN111242941B (en) 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention

Country Status (1)

Country Link
CN (1) CN111242941B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813660A (en) * 2020-06-12 2020-10-23 北京邮电大学 Visual cognition search simulation method, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447679A (en) * 2016-10-17 2017-02-22 大连理工大学 Obviousness detection method based on grabcut and adaptive cluster clustering
CN107992874A (en) * 2017-12-20 2018-05-04 武汉大学 Image well-marked target method for extracting region and system based on iteration rarefaction representation
CN109685806A (en) * 2018-11-14 2019-04-26 武汉科技大学 Image significance detection method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447679A (en) * 2016-10-17 2017-02-22 大连理工大学 Obviousness detection method based on grabcut and adaptive cluster clustering
CN107992874A (en) * 2017-12-20 2018-05-04 武汉大学 Image well-marked target method for extracting region and system based on iteration rarefaction representation
CN109685806A (en) * 2018-11-14 2019-04-26 武汉科技大学 Image significance detection method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813660A (en) * 2020-06-12 2020-10-23 北京邮电大学 Visual cognition search simulation method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111242941B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
CN107403426B (en) Target object detection method and device
CN109858569A (en) Multi-tag object detecting method, system, device based on target detection network
KR20060097074A (en) Apparatus and method of generating shape model of object and apparatus and method of automatically searching feature points of object employing the same
CN108549835A (en) Crowd counts and its method, terminal device and the storage medium of model construction
EP3901830A1 (en) Determining an explanation of a classification
CN111652317A (en) Hyper-parameter image segmentation method based on Bayesian deep learning
CN111091147B (en) Image classification method, device and equipment
CN112418261B (en) Human body image multi-attribute classification method based on prior prototype attention mechanism
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN107169958B (en) Visual saliency detection method combining machine learning, background suppression and positive perception feedback
CN110135435B (en) Saliency detection method and device based on breadth learning system
US8428369B2 (en) Information processing apparatus, information processing method, and program
CN110222568B (en) Cross-visual-angle gait recognition method based on space-time diagram
CN111242941A (en) Salient region detection method and device based on visual attention
CN110163103A (en) A kind of live pig Activity recognition method and apparatus based on video image
CN117152203A (en) Target tracking method and device based on multi-core correlation filter and electronic equipment
CN110110651B (en) Method for identifying behaviors in video based on space-time importance and 3D CNN
Hammouch et al. A two-stage deep convolutional generative adversarial network-based data augmentation scheme for agriculture image regression tasks
Rajalaxmi et al. An Improved MangoNet Architecture Using Harris Hawks Optimization for Fruit Classification with Uncertainty Estimation
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
Yu Research progress of crop disease image recognition based on wireless network communication and deep learning
CN114693987A (en) Model generation method, model generation device, storage medium, face recognition method and face recognition device
Balaaditya et al. Analysis of the Effect of Adversarial Training in Defending EfficientNet-B0 Model from DeepFool Attack

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant