CN111242941B - Salient region detection method and device based on visual attention - Google Patents


Info

Publication number
CN111242941B
CN111242941B (application CN202010065654.8A)
Authority
CN
China
Prior art keywords
super
saliency
pixels
pixel
selecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010065654.8A
Other languages
Chinese (zh)
Other versions
CN111242941A (en)
Inventor
Jian Yang (杨剑)
Yang Shen (沈阳)
Yuhui Shi (史玉回)
Current Assignee
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southern University of Science and Technology filed Critical Southern University of Science and Technology
Priority to CN202010065654.8A priority Critical patent/CN111242941B/en
Publication of CN111242941A publication Critical patent/CN111242941A/en
Application granted granted Critical
Publication of CN111242941B publication Critical patent/CN111242941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G06T7/10: Segmentation; Edge detection
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a salient region detection method and device based on visual attention. The method obtains a plurality of superpixels by performing superpixel segmentation on an input scene, performs saliency evaluation with a brain storm optimization algorithm by selecting superpixels and calculating their saliency, classifies the superpixels according to saliency, selects new superpixels according to the classification result for further saliency evaluation, and iterates until a termination condition is reached, generating a saliency map of the input scene. The clustering characteristic of the brain storm optimization algorithm guides the search to converge toward salient regions over the iterations, so the whole image or scene need not be traversed; only part of the superpixels are selected for saliency evaluation. The finally output incomplete saliency map closely simulates the human visual attention mechanism, realizes bottom-up salient region detection, and rapidly locates the salient regions in the input scene. The method can be widely applied in the field of visual detection.

Description

Salient region detection method and device based on visual attention
Technical Field
The invention relates to the field of computer software, in particular to a method and a device for detecting a salient region based on visual attention.
Background
Research increasingly simulates the perception principle of human vision through a visual attention mechanism in order to rapidly locate salient regions in a scene for further processing and thereby reduce the amount of data to be processed; visual attention computing models have already been applied in fields such as target detection, recognition, and tracking.
The existing main visual attention detection methods include: 1) Cognitive attention models, such as the Itti model, which extract primary features such as color, brightness, and orientation from the input image according to the behavior and network structure of the early visual system, generate feature maps reflecting saliency measures using center-surround operations at multiple scales, and finally combine the feature maps into a final saliency map. 2) The discriminant saliency model proposed by Gao et al., which treats the saliency problem as an optimal two-class decision between visual stimuli of interest and the non-salient background; the saliency at each location in the visual field is equivalent to the discriminant power of a set of visual features at that location, and the smaller the expected classification error at a particular location, the greater its saliency. 3) Saliency models based on frequency-domain analysis, which compute image saliency in the frequency domain, such as the spectral residual method, the phase spectrum of Fourier transform method, and the frequency-tuned method. 4) Visual attention models based on graph theory, which treat eye-movement data from visual attention experiments as a time series; since eye movement is influenced by many hidden variables, such models are realized with hidden Markov models, dynamic Bayesian networks, conditional random fields, and the like.
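As a small illustration of the frequency-domain family mentioned above, the following is a minimal NumPy sketch in the style of the spectral residual method (a hedged toy, not this invention's method): saliency is recovered from the residual between the log amplitude spectrum and its local average while keeping the original phase.

```python
import numpy as np

def spectral_residual_saliency(image):
    """Spectral-residual-style saliency sketch: inverse-transform the
    residual log spectrum with the original phase."""
    f = np.fft.fft2(image.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # local average of the log spectrum via a simple 3x3 box filter
    pad = np.pad(log_amp, 1, mode="edge")
    h, w = log_amp.shape
    avg = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    residual = log_amp - avg
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return saliency / saliency.max()

demo = np.zeros((32, 32))
demo[12:20, 12:20] = 1.0          # a bright square on a dark background
sal = spectral_residual_saliency(demo)
```

Note that, unlike the method of this patent, such a frequency-domain pass still processes every pixel of the image.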
However, each of these methods computes a saliency value for every region or pixel by traversing the whole scene or image, and the final output is a complete saliency map; this not only increases the amount of computation but also fails to match the selective processing mechanism of real visual attention. There is therefore a need for a visual attention-based salient region detection method that fully simulates the human visual attention mechanism and quickly locates salient regions in a scene without traversing the whole image or scene.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent. To this end, it is an object of the present invention to provide a salient region detection method based on visual attention.
The technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a method for detecting a salient region based on visual attention, comprising:
performing super-pixel segmentation on an input scene to obtain a plurality of super-pixels;
performing significance evaluation by using a group intelligent optimization algorithm, selecting and calculating the significance of the super pixels, and performing super pixel classification according to the significance;
and selecting a new superpixel to carry out saliency assessment according to the classification result, iterating for a plurality of times until reaching a termination condition, and generating a saliency map of the input scene.
Further, calculating the saliency of the superpixel specifically includes:
selecting a first preset number of super pixels as super pixels to be evaluated in a selection range taking the center of the input scene as a circle center and a first preset distance as a radius;
combining the attenuation coefficient and the characteristic vector of the super pixel to be evaluated, and calculating to obtain the spatial significance of the super pixel to be evaluated;
and calculating the saliency of the super pixel to be evaluated according to the focus deviation and the spatial saliency.
Further, sorting the super-pixels to be evaluated according to the saliency, and classifying the super-pixels to be evaluated into a first type super-pixel and a second type super-pixel according to a second preset probability.
Further, selecting a new superpixel according to the classification result specifically includes: primary selection and/or secondary selection;
the primary selection is as follows:
selecting new super pixels from the first type super pixels or the second type super pixels according to a first preset rule;
the secondary selection is as follows:
judging and utilizing one super pixel or a plurality of super pixels according to a second preset rule;
if one superpixel is selected, the superpixel is used as a central superpixel, the central superpixel is used as a circle center, a second preset distance is used as a radius, and a new superpixel is selected;
if a plurality of super pixels are selected, selecting one super pixel as a central super pixel in the central connecting line range of the plurality of super pixels, and selecting a new super pixel by taking the central super pixel as a circle center and taking a third preset distance as a radius.
Further, the method further comprises the step of filling the blind area in the saliency map, and specifically comprises the following steps:
acquiring the saliency of the super pixel subjected to the saliency evaluation adjacent to the super pixel not subjected to the saliency evaluation;
and taking the value obtained by averaging the saliency as the saliency of the super pixel without saliency evaluation.
Further, normalizing the saliency map is included.
Further, before the super-pixel segmentation, the method further comprises: initializing super-pixel extraction parameters and parameters of the brain storm optimization algorithm.
In a second aspect, the present invention also provides a salient region detection device based on visual attention, including:
a super-pixel segmentation module: the method comprises the steps of performing super-pixel segmentation on an input scene to obtain a plurality of super-pixels;
the saliency assessment module: the method comprises the steps of performing significance evaluation by using a population intelligent optimization algorithm, selecting and calculating the significance of the super pixels, and performing super pixel classification according to the significance;
and a saliency map generation module: and selecting a new super pixel to perform saliency assessment according to the classification result, iterating for a plurality of times until reaching a termination condition, and generating a saliency map of the input scene.
In a third aspect, the present invention provides a salient region detection device based on visual attention, comprising:
at least one processor, and a memory communicatively coupled to the at least one processor;
wherein the processor is adapted to perform the method according to any of the first aspects by invoking a computer program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of any one of the first aspects.
The beneficial effects of the invention are as follows:
according to the invention, a plurality of super pixels are obtained by carrying out super pixel segmentation on an input scene, the saliency evaluation is carried out by using a brain storm optimization algorithm, the saliency of the super pixels is selected and calculated, the super pixels are classified according to the saliency, a new super pixel is selected according to the classification result to carry out the saliency evaluation, and the iteration is carried out for a plurality of times until reaching a termination condition, so that a saliency map of the input scene is generated. The clustering characteristic of the brain-Feng-Bai optimization algorithm is utilized, the search of the salient region is guided to converge towards the salient region in the iterative process, the whole region of an image or a scene does not need to be traversed, partial superpixels are selected for carrying out saliency evaluation, the finally output incomplete salient map fully simulates the visual attention mechanism of human beings, the salient region detection from bottom to top is realized, and the salient region in the input scene is rapidly positioned. Can be widely applied to the field of visual detection.
Drawings
FIG. 1 is a flow chart of an implementation of one embodiment of a visual attention-based salient region detection method in the present invention;
FIG. 2 is a schematic diagram of a saliency calculation process according to an embodiment of a saliency detection method according to the present invention;
FIG. 3 is a schematic diagram of selecting new superpixels according to classification results according to an embodiment of a salient region detection method based on visual attention in the present invention;
FIG. 4 is a flowchart illustrating an embodiment of a salient region detection method based on visual attention in accordance with the present invention;
fig. 5 is a block diagram of a specific embodiment of a salient region detection device based on visual attention in the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiment one:
an embodiment of the present invention provides a salient region detection method based on visual attention, and the following is a noun explanation used in this embodiment.
1) Visual attention: rapidly selecting, from the abundant visual information in the field of view, the information most important, useful, and relevant to the current purpose. 2) Swarm intelligence optimization algorithm: mainly simulates the group behavior of insect swarms, herds, bird flocks, and fish schools, in which the group searches for food cooperatively and each member continually adjusts its search direction by learning from its own experience and the experience of other members; any algorithm or distributed problem-solving strategy inspired by swarm or other animal social-behavior mechanisms belongs to this class. 3) Brain storm optimization, i.e. the Brain Storm Optimization (BSO) algorithm: an emerging swarm intelligence optimization method that takes human collective brainstorming as its prototype, extracts the pattern by which such groups solve problems, and abstracts it into an intelligent optimization algorithm. 4) Saliency map: an image representing the degree of saliency of each region (or pixel) in a scene.
The swarm intelligence optimization algorithms applicable in this embodiment include the brain storm optimization algorithm, particle swarm optimization, the ant colony algorithm, and the like, all of which use a swarm to search for the salient regions intelligently; the brain storm optimization algorithm is taken as the example in the following description.
Fig. 1 is a flowchart of an implementation of a salient region detection method based on visual attention according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
s1: and performing superpixel segmentation on the input scene to obtain a plurality of superpixels.
S2: and performing significance evaluation by using a brain storm optimization algorithm, selecting and calculating the significance of the superpixels, and performing superpixel classification according to the significance.
S3: and selecting a new superpixel to carry out saliency evaluation according to the classification result, iterating for a plurality of times until reaching a termination condition, and generating a saliency map of the input scene.
Further, S0: initializing the superpixel extraction parameters and the parameters of the brain storm optimization algorithm. In this embodiment the superpixel extraction parameters comprise the superpixel number N, and the parameters of the brain storm optimization algorithm comprise: a first preset number N_s (i.e. the population size of the brain storm optimization algorithm), a second preset probability perc_e, a first probability value p_e, a second probability value p_one, a first preset distance R_1, a second preset distance R_2, and a third preset distance R_3.
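For illustration only, the initialization step S0 might collect these parameters as follows; every concrete value below is an assumption for the sketch, not a value prescribed by this embodiment.

```python
# Illustrative initialization of the parameters named above (values assumed).
params = {
    "N": 500,        # superpixel number for segmentation
    "N_s": 20,       # first preset number: BSO population size
    "perc_e": 20,    # second preset probability: elite percentage
    "p_e": 0.8,      # first probability value: elite vs. normal pool
    "p_one": 0.5,    # second probability value: one centre vs. several
    "R_1": 50.0,     # first preset distance (initial selection radius)
    "R_2": 30.0,     # second preset distance
    "R_3": 30.0,     # third preset distance
    "sigma2": 0.5,   # damping rate used by the attenuation coefficient
}
```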
The above steps are described in detail below.
Specifically, in step S1, to reduce the amount of computation for salient region extraction, this embodiment performs superpixel segmentation on the input scene to obtain a plurality of superpixels. A superpixel is an irregular block of pixels with a certain visual meaning, formed by a series of adjacent pixels with similar features such as color, brightness, and texture; superpixels retain most of the information useful for further image segmentation and generally do not break the boundaries of objects in the image. By grouping pixels according to feature similarity, a small number of superpixels replace a large number of pixels to express the picture features, greatly reducing the complexity of subsequent image processing; all subsequent steps in this embodiment operate on superpixels. Many common algorithms can realize superpixel segmentation, for example SLIC (simple linear iterative clustering).
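As a toy stand-in for this step, the sketch below tiles the image into roughly N rectangular blocks and returns an integer label map, which is the kind of output superpixel segmentation produces; a real implementation would instead call an algorithm such as SLIC (e.g. skimage.segmentation.slic), which clusters on color and position.

```python
import numpy as np

def grid_superpixels(h, w, n):
    """Regular-grid 'superpixel' label map: a hedged placeholder for SLIC."""
    side = max(1, int(round(np.sqrt(h * w / n))))   # approximate block side
    rows = (np.arange(h) // side)[:, None]          # block row index per pixel
    cols = (np.arange(w) // side)[None, :]          # block column index per pixel
    n_cols = int(np.ceil(w / side))
    return rows * n_cols + cols                     # one integer label per pixel

labels = grid_superpixels(64, 64, 256)   # about 256 superpixels for a 64x64 scene
```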
In step S2, calculating the saliency of the superpixel specifically includes:
s21: taking the center of the input scene as the center of a circle and taking the first preset distance R as the center of the circle 1 Selecting a first preset number N for the selection range of the radius s As a super-pixel to be evaluated, further, selecting the super-pixel to be evaluated by a random selection manner.
S22: combining the attenuation coefficient and the characteristic vector of the super pixel to be evaluated, and calculating to obtain the spatial saliency of the super pixel to be evaluated, wherein the spatial saliency is specifically as follows:
is provided with
Figure BDA0002375891760000051
Representing superpixel sp i And sp (sp) j Feature vector +.>
Figure BDA0002375891760000052
And->
Figure BDA0002375891760000053
The difference between the two is that w (i, j) represents an attenuation coefficient for representing that the region farther from the current superpixel has smaller influence on the saliency of the current superpixel region, the superpixel sp i Is expressed as:
Figure BDA0002375891760000054
Figure BDA0002375891760000055
wherein D is s [sp i ,sp j ]Representing superpixel sp i And sp (sp) j The spatial distance between sigma 2 The damping rate is indicated and may be set in advance.
S23: according to the focus deviation and the spatial saliency, the saliency of the super pixel to be evaluated is obtained through calculation, and because the center offset is introduced when the saliency map is generated by the existing scheme, namely the center of the image or the scene is considered to be more remarkable than the periphery, which is not in agreement with the actual human visual attention, the focus deviation is introduced to evaluate the saliency of the super pixel in the embodiment, which is expressed as follows:
Figure BDA0002375891760000061
/>
wherein X is i Representing the central position of the salient region, X msr (t) represents the position of the region with the highest significance at the t-th iteration, X at the initial iteration msr (t) is the center position X of the input scene c Z represents the diagonal length.
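A minimal sketch of the focus bias, under the assumption that it decays linearly with the distance from the currently most salient position X_msr and is bounded by the scene diagonal Z (the exact functional form here is an illustrative assumption):

```python
import numpy as np

def focus_bias(centers, x_msr, diag):
    """Closer to the current most-salient position -> less attenuation."""
    return 1.0 - np.linalg.norm(centers - x_msr, axis=1) / diag

centers = np.array([[10.0, 10.0], [50.0, 50.0]])   # two superpixel centres
x_msr = np.array([10.0, 10.0])                     # most salient position so far
diag = np.hypot(64.0, 64.0)                        # diagonal of a 64x64 scene
b = focus_bias(centers, x_msr, diag)
```

The final saliency would then weight each spatial saliency by the corresponding bias value, shifting attention toward the current focus rather than the fixed image center.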
Fig. 2 is a schematic diagram of the saliency calculation flow in this embodiment. As shown in the figure, after the parameters are initialized, the superpixels to be evaluated are selected within the selection range of radius R_1; the attenuation coefficient between each superpixel to be evaluated and the other superpixels to be evaluated is calculated; and the saliency of each superpixel to be evaluated is then calculated from the focus bias and the spatial saliency.
S24: the super pixels to be evaluated are ranked according to the saliency, and the super pixels to be evaluated are classified into first super pixels and second super pixels according to the second preset probability.
In one specific scenario, the N_s superpixels to be evaluated are sorted by saliency; the portion whose saliency ranks in the top N_s × perc_e % forms the first type of superpixel (also called elite superpixels), and the remaining N_s × (1 - perc_e %) form the second type (also called normal superpixels).
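The elite/normal split can be sketched as follows (the function name and the perc_e value are illustrative):

```python
import numpy as np

def classify(saliency, perc_e=20):
    """Sort evaluated superpixels by saliency; top perc_e% are elite."""
    order = np.argsort(saliency)[::-1]            # most salient first
    n_elite = max(1, int(len(saliency) * perc_e / 100))
    return order[:n_elite], order[n_elite:]       # elite indices, normal indices

sal = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
elite, normal = classify(sal, perc_e=40)
```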
In step S3, selecting a new superpixel according to the classification result specifically includes: primary selection and/or secondary selection. In this embodiment, only one selection or two selections may be used, or two selections may be performed after one selection.
The primary selection is as follows: and selecting a new super pixel from the first super pixel type or the second super pixel type according to a first preset rule.
The secondary selection is: judging, according to the second preset rule, whether to use one superpixel or several superpixels. If one superpixel is selected, it is taken as the central superpixel, and new superpixels are selected with it as the circle center and the second preset distance as the radius; if several superpixels are selected, one superpixel is chosen as the central superpixel within the range of the line connecting their centers, and new superpixels are selected with it as the circle center and the third preset distance as the radius.
Optionally, when only the primary selection is performed, one superpixel is chosen as the center pixel according to the judgment of the first preset rule, a selection radius is set around it as the circle center, and a corresponding number of superpixels are selected for saliency evaluation. Alternatively, when only the secondary selection is performed, one or more superpixels are randomly selected from the elite or normal superpixels.
Fig. 3 is a schematic diagram of selecting new superpixels according to the classification result in this embodiment. In this implementation scenario, the secondary selection is performed after the primary selection, according to its result. After the parameters are initialized, a random number rand is generated, and the first preset rule is described as: judge whether rand is smaller than the first probability value p_e; if so, select new superpixels from the elite superpixels for saliency evaluation, otherwise select new superpixels from the normal superpixels.

The second preset rule is described as: judge whether rand is smaller than the second probability value p_one. If so, select one superpixel as the central superpixel and, taking it as the circle center and the second preset distance R_2 as the radius, select N_s new superpixels (the number is not limited and can be changed according to actual needs) for saliency evaluation. Otherwise, select several superpixels, choose one superpixel as the central superpixel within the range of the line connecting their centers, and, taking the third preset distance R_3 as the radius, select N_s new superpixels (likewise not limited in number) for saliency evaluation.
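The two-stage selection logic can be sketched as below. The helper pick_around and all parameter values are hypothetical placeholders: a real implementation would draw actual superpixels lying within the given radius rather than synthesizing coordinates.

```python
import random

def pick_around(center, radius, count, rng):
    """Placeholder: return `count` synthetic picks near `center`."""
    return [(center[0] + rng.uniform(-radius, radius),
             center[1] + rng.uniform(-radius, radius)) for _ in range(count)]

def select_new(elite, normal, p_e, p_one, R2, R3, n_s, rng):
    pool = elite if rng.random() < p_e else normal    # primary selection
    if rng.random() < p_one:                          # secondary: one centre
        center = rng.choice(pool)
        return pick_around(center, R2, n_s, rng)
    a, b = rng.sample(pool, 2)                        # several centres:
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)   # point on their join line
    return pick_around(center, R3, n_s, rng)

rng = random.Random(0)
elite = [(5.0, 5.0), (6.0, 5.0), (7.0, 8.0)]
normal = [(20.0, 20.0), (30.0, 10.0)]
picks = select_new(elite, normal, p_e=0.8, p_one=0.5,
                   R2=3.0, R3=3.0, n_s=4, rng=rng)
```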
Further, in this embodiment, the first preset distance R_1, the second preset distance R_2, and the third preset distance R_3 may take the same value.
Step S3 further includes filling in blind zones. Under the above search strategy, a superpixel whose saliency has been evaluated carries a corresponding saliency value, while a superpixel whose saliency has not been evaluated is called a blind-zone superpixel. Since adjacent regions tend to share similar salient features, the saliency of a blind-zone superpixel can be obtained from the similarity of its surrounding region. In this embodiment, filling in the blind zones of the saliency map specifically includes: obtaining the saliency of the evaluated superpixels adjacent to a blind-zone superpixel, and taking the average of these saliencies as the saliency of the blind-zone superpixel, expressed as:
    S'(sp_b) = (1 / N_adj) · Σ S(sp_adj)

where S'(sp_b) denotes the saliency of the blind-zone superpixel, sp_adj denotes a superpixel adjacent to the blind-zone superpixel sp_b that has a saliency value, and N_adj denotes the number of such superpixels.
It should be noted that, since some blind-zone superpixels may be adjacent mostly to other blind zones, the finally generated saliency map may remain an incomplete map with blind zones even after the filling of this embodiment.
Further, step S3 also includes normalizing the saliency map for convenience of display; the obtained saliency of each superpixel is normalized as:
    S_N(sp_i) = (S(sp_i) - S_min) / (S_max - S_min)

where S_N(sp_i) denotes the normalized saliency, N denotes the number of superpixels with saliency values, S(sp_i) denotes the saliency of superpixel sp_i, and S_max and S_min denote the maximum and minimum saliency in the unnormalized saliency map, respectively.
The termination conditions in step S3 include: 1) a preset number of iterations is reached; or 2) the saliency map no longer changes over a preset number of consecutive iterations, even though the maximum iteration count has not been reached. When a termination condition is met, iteration stops and the saliency map is generated.
Fig. 4 is a specific flowchart of the visual attention-based salient region detection method of this embodiment. First, the population of the brain storm optimization algorithm is initialized and the superpixels to be evaluated in the initial iteration are selected; their saliency is evaluated, and the superpixels are then classified according to the evaluation result. New superpixels are selected according to the classification result and saliency evaluation continues, with the superpixel classification and the corresponding most salient region updated in the process. It is then judged whether a termination condition has been reached: if not, saliency evaluation continues; otherwise, the blind zones of the generated saliency map are filled in, and the filled map is normalized to obtain the final saliency map.
This method uses the brain storm optimization algorithm to search for salient regions over repeated iterations, gradually guiding the search to converge toward the salient regions through random selection without traversing all regions; at the same time, the focus bias is introduced so that each search step treats the currently most salient position as its reference, which better matches real visual attention characteristics and reduces the amount of computation.
Embodiment two:
the present embodiment provides a visual attention-based salient region detection device, as shown in fig. 5, which is a structural block diagram of the visual attention-based salient region detection device of the present embodiment, and includes:
super pixel segmentation module 10: the method comprises the steps of performing super-pixel segmentation on an input scene to obtain a plurality of super-pixels;
saliency assessment module 20: the method comprises the steps of performing significance evaluation by using a population intelligent optimization algorithm, selecting and calculating the significance of the superpixels, and performing superpixel classification according to the significance;
the generate saliency map module 30: and selecting a new superpixel to perform saliency evaluation according to the classification result, iterating for a plurality of times until reaching a termination condition, and generating a saliency map of the input scene.
The specific details of the above modules of the visual attention-based salient region detection device have been described in the corresponding method embodiment and are therefore not repeated here.
Embodiment III:
the present invention also provides a visual attention-based salient region detection device, comprising:
at least one processor, and a memory communicatively coupled to the at least one processor;
wherein the processor is configured to perform the method according to embodiment one by invoking a computer program stored in the memory, i.e. program code which, when run on the visual attention-based salient region detection device, causes the device to perform the steps of the visual attention-based salient region detection method described in the above embodiments.
In addition, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions for causing a computer to execute the method according to the first embodiment.
According to the invention, a plurality of superpixels are obtained by performing superpixel segmentation on an input scene; saliency evaluation is performed with a brain storm optimization algorithm by selecting superpixels and calculating their saliency; the superpixels are classified according to saliency; new superpixels are selected according to the classification result for further saliency evaluation; and the process iterates until a termination condition is reached, generating a saliency map of the input scene. The clustering characteristic of the brain storm optimization algorithm guides the search to converge toward salient regions over the iterations, so the whole image or scene need not be traversed; only part of the superpixels are selected for saliency evaluation. The finally output incomplete saliency map closely simulates the human visual attention mechanism, realizes bottom-up salient region detection, and rapidly locates the salient regions in the input scene. The method can be widely applied in the field of visual detection.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the appended claims and description.

Claims (7)

1. A method for detecting salient regions based on visual attention, comprising:
performing superpixel segmentation on an input scene to obtain a plurality of superpixels;
performing saliency evaluation by using a swarm intelligence optimization algorithm: selecting superpixels, calculating the saliency of the selected superpixels, and classifying the superpixels according to the saliency; and
selecting new superpixels for saliency evaluation according to the classification result, and iterating until a termination condition is reached, so as to generate a saliency map of the input scene;
wherein calculating the saliency of the superpixels specifically comprises:
selecting a first preset number of superpixels as superpixels to be evaluated within a selection range that takes the center of the input scene as its circle center and a first preset distance as its radius;
calculating the spatial saliency of the superpixels to be evaluated by combining an attenuation coefficient with the feature vectors of the superpixels to be evaluated;
calculating the saliency of the superpixels to be evaluated according to a focus bias and the spatial saliency; and
sorting the superpixels to be evaluated according to the saliency, and classifying them into first-class superpixels and second-class superpixels according to a second preset probability;
wherein selecting new superpixels according to the classification result specifically comprises primary selection and/or secondary selection;
the primary selection comprising:
selecting new superpixels from the first-class superpixels or the second-class superpixels according to a first preset rule;
the secondary selection comprising:
determining, according to a second preset rule, whether one superpixel or a plurality of superpixels are to be used;
if one superpixel is selected, taking that superpixel as a central superpixel, and selecting new superpixels with the central superpixel as the circle center and a second preset distance as the radius; and
if a plurality of superpixels are selected, selecting one superpixel as a central superpixel within the range of the line connecting the centers of the plurality of superpixels, and selecting new superpixels with the central superpixel as the circle center and a third preset distance as the radius.
2. The visual attention-based salient region detection method of claim 1, further comprising filling blind zones in the saliency map, which specifically comprises:
obtaining the saliency of evaluated superpixels adjacent to a superpixel that has not undergone saliency evaluation; and
taking the average of the obtained saliency values as the saliency of the superpixel that has not undergone saliency evaluation.
3. The visual attention-based salient region detection method of claim 1, further comprising normalizing the saliency map.
4. The visual attention-based salient region detection method of any one of claims 1 to 3, wherein the swarm intelligence optimization algorithm comprises any one of: a brain storm optimization algorithm, a particle swarm optimization algorithm, and an ant colony algorithm;
the method further comprising, before the superpixel segmentation: initializing superpixel extraction parameters and parameters of the swarm intelligence optimization algorithm.
5. A visual attention-based salient region detection device, comprising:
a superpixel segmentation module, configured to perform superpixel segmentation on an input scene to obtain a plurality of superpixels;
a saliency evaluation module, configured to perform saliency evaluation by using a swarm intelligence optimization algorithm: selecting superpixels, calculating the saliency of the selected superpixels, and classifying the superpixels according to the saliency; and
a saliency map generation module, configured to select new superpixels for saliency evaluation according to the classification result, and iterate until a termination condition is reached, so as to generate a saliency map of the input scene;
wherein calculating the saliency of the superpixels specifically comprises:
selecting a first preset number of superpixels as superpixels to be evaluated within a selection range that takes the center of the input scene as its circle center and a first preset distance as its radius;
calculating the spatial saliency of the superpixels to be evaluated by combining an attenuation coefficient with the feature vectors of the superpixels to be evaluated;
calculating the saliency of the superpixels to be evaluated according to a focus bias and the spatial saliency; and
sorting the superpixels to be evaluated according to the saliency, and classifying them into first-class superpixels and second-class superpixels according to a second preset probability;
wherein selecting new superpixels according to the classification result specifically comprises primary selection and/or secondary selection;
the primary selection comprising:
selecting new superpixels from the first-class superpixels or the second-class superpixels according to a first preset rule;
the secondary selection comprising:
determining, according to a second preset rule, whether one superpixel or a plurality of superpixels are to be used;
if one superpixel is selected, taking that superpixel as a central superpixel, and selecting new superpixels with the central superpixel as the circle center and a second preset distance as the radius; and
if a plurality of superpixels are selected, selecting one superpixel as a central superpixel within the range of the line connecting the centers of the plurality of superpixels, and selecting new superpixels with the central superpixel as the circle center and a third preset distance as the radius.
6. A visual attention-based salient region detection device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the processor is configured to perform the method of any one of claims 1 to 4 by invoking a computer program stored in the memory.
7. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 4.
CN202010065654.8A 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention Active CN111242941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065654.8A CN111242941B (en) 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention


Publications (2)

Publication Number Publication Date
CN111242941A (en) 2020-06-05
CN111242941B (en) 2023-05-30

Family

ID=70872830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065654.8A Active CN111242941B (en) 2020-01-20 2020-01-20 Salient region detection method and device based on visual attention

Country Status (1)

Country Link
CN (1) CN111242941B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813660B * 2020-06-12 2021-10-12 Beijing University of Posts and Telecommunications Visual cognition search simulation method, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447679A * 2016-10-17 2017-02-22 Dalian University of Technology Saliency detection method based on GrabCut and adaptive clustering
CN107992874A * 2017-12-20 2018-05-04 Wuhan University Image salient target region extraction method and system based on iterative sparse representation
CN109685806A * 2018-11-14 2019-04-26 Wuhan University of Science and Technology Image saliency detection method and device


Also Published As

Publication number Publication date
CN111242941A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
Pan et al. Salgan: Visual saliency prediction with generative adversarial networks
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
EP3388978B1 (en) Image classification method, electronic device, and storage medium
Pan et al. Salgan: Visual saliency prediction with adversarial networks
US7949173B2 (en) Method and system for regression-based object detection in medical images
Lewis et al. Generative adversarial networks for SAR image realism
Kontschieder et al. Structured labels in random forests for semantic labelling and object detection
CN112232355B (en) Image segmentation network processing method, image segmentation device and computer equipment
EP3745309A1 (en) Training a generative adversarial network
US11138464B2 (en) Image processing device, image processing method, and image processing program
Yang et al. Face parts localization using structured-output regression forests
CN111091147B (en) Image classification method, device and equipment
CN112215119A (en) Small target identification method, device and medium based on super-resolution reconstruction
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN111260655A (en) Image generation method and device based on deep neural network model
CN111242941B (en) Salient region detection method and device based on visual attention
Yifei et al. Flower image classification based on improved convolutional neural network
CN117152203A (en) Target tracking method and device based on multi-core correlation filter and electronic equipment
Woloszynski et al. On a new measure of classifier competence applied to the design of multiclassifier systems
CN116129417A (en) Digital instrument reading detection method based on low-quality image
Yang et al. Adaptive wavelet transform based on artificial fish swarm optimization and fuzzy C-means method for noisy image segmentation
CN115311550A (en) Method and device for detecting semantic change of remote sensing image, electronic equipment and storage medium
CN117853875B (en) Fine-granularity image recognition method and system
CN117058498B (en) Training method of segmentation map evaluation model, and segmentation map evaluation method and device
CN111667021B (en) Front-end performance problem detection method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant