CN108665455B - Method and device for evaluating image significance prediction result - Google Patents
- Publication number
- CN108665455B (application number CN201810457947.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- prediction
- result
- subjective test
- prediction result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration using histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Abstract
The application provides a method and a device for evaluating an image saliency prediction result. The method comprises the following steps: obtaining the salient-region prediction results of a plurality of salient-region prediction methods on a plurality of image sets and preprocessing the prediction results; carrying out a subjective test experiment with the preprocessed results to obtain the subjective relative saliency relation between the two prediction maps generated by any two prediction methods, and constructing subjective test data result pairs; supplementing salient-region ground-truth-map and random-map data pairs to construct a subjective test result data set; constructing, for this data set, a convolutional neural network model based on the relative saliency relation; and training the convolutional neural network model, from which the evaluation of an image saliency prediction result is obtained.
Description
Technical Field
The invention relates to the field of computer vision and image processing, in particular to a method and a device for evaluating an image significance prediction result.
Background
Predicting the salient region of an image is a fundamental problem in computer vision, and evaluating such predictions is an equally important problem. In the existing literature, researchers have proposed numerous saliency prediction methods and corresponding evaluation methods.
Existing image salient-region prediction methods include the IT method, the GB method, the CA method and others, all of which can produce good salient-region prediction results. For evaluating such predictions, about ten evaluation indexes such as AUC, PRE and NSS are widely used, and various models have been established to evaluate image saliency prediction results.
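By way of illustration, one of the indexes mentioned above, NSS (Normalized Scanpath Saliency), is commonly computed by z-scoring the prediction map and averaging it over the ground-truth fixation locations. The sketch below is illustrative only and not part of the claimed method:

```python
import numpy as np

def nss(saliency_map, fixation_mask):
    """Normalized Scanpath Saliency: z-score the predicted saliency
    map, then average it over the ground-truth fixation points.
    Higher values indicate better agreement with human fixations."""
    s = np.asarray(saliency_map, dtype=np.float64)
    s = (s - s.mean()) / s.std()
    return float(s[np.asarray(fixation_mask, dtype=bool)].mean())
```

AUC and PRE follow the same general pattern of comparing a prediction map against fixation ground truth, differing only in how the comparison is scored.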
However, although the above evaluation methods do evaluate salient-region prediction results, none of them covers the attributes of all prediction methods, and they are easily affected by human visual perception; therefore, an evaluation of image saliency prediction results that is consistent with human visual perception cannot be achieved.
Disclosure of Invention
The embodiments of the invention provide a method and a device for evaluating an image saliency prediction result, which solve the problems that existing evaluation methods neither cover all attributes of the prediction methods nor resist the influence of human visual cognition, thereby realizing an evaluation of image saliency prediction results consistent with human visual cognition.
The first aspect of the embodiments of the present invention provides an evaluation method for an image saliency prediction result, including:
carrying out image salient region prediction on an image to obtain a salient region prediction result of the image;
adopting an image significance prediction result evaluation model obtained according to pre-training to obtain the evaluation of the significance prediction result of the image;
the evaluation model of the image significance prediction result is obtained by training a mathematical model of a convolutional neural network structure, which is learned based on human evaluation standards of relative significance relations, according to a subjective test data set.
In one particular implementation of the method of the invention,
before the image saliency prediction result evaluation model obtained according to the pre-training is adopted to obtain the evaluation result of the image saliency prediction result according to the prediction result of the salient region, the method further comprises the following steps:
acquiring subjective test data sets according to the salient region prediction results of a plurality of salient region prediction methods on a plurality of image data sets;
and designing a mathematical model of a convolutional neural network structure based on a relatively significant relation for learning human evaluation standards according to the subjective test data set, and training the mathematical model to obtain an evaluation model of the image significance prediction result.
In one particular implementation of the method of the invention,
the acquiring of the subjective test data set according to the significant region prediction results of the plurality of significant region prediction methods on the plurality of image data sets includes:
acquiring the prediction results of the salient regions of the acquired multiple salient region prediction methods on multiple image data sets;
preprocessing the prediction result, performing a subjective test experiment, and constructing a subjective test data result pair;
and analyzing the subjective test data result pair, supplementing a salient region true value image and a random image data pair, and constructing the subjective test data set.
In a specific implementation manner, the preprocessing the prediction result includes:
and carrying out histogram equalization processing on the prediction result.
In one particular implementation of the method of the invention,
the analyzing the subjective test data result pair, supplementing a salient region true value graph and a random graph, and acquiring the subjective test data set includes:
supplementing the true value image and random image data pair of the image salient region into the subjective test data result pair to obtain the subjective test data set; wherein the true value map is the best result generated by each significant region prediction method for each significant region, and the random map is the worst result generated by each significant region prediction method for each significant region.
A second aspect of the embodiments of the present invention provides an apparatus for evaluating an image saliency prediction result, including:
the acquisition module is used for predicting the salient region of the image to obtain a prediction result of the salient region of the image;
the processing module is used for acquiring an evaluation result of the image significance prediction result by adopting an evaluation model of the image significance prediction result acquired according to pre-training according to the significant region prediction result;
the evaluation model of the image significance prediction result is obtained by training a mathematical model of a convolutional neural network structure, which is learned based on human evaluation standards of relative relations, according to a subjective test data set.
Optionally, the obtaining module is specifically configured to:
acquiring subjective test data sets according to the salient region prediction results of a plurality of salient region prediction methods on a plurality of image data sets;
the processing module is specifically configured to: and designing a mathematical model of a convolutional neural network structure based on a relatively significant relation for learning human evaluation standards according to the subjective test data set, and training the mathematical model to obtain an evaluation model of the image significance prediction result.
Optionally, the obtaining module is specifically configured to:
acquiring salient regions of a plurality of acquired salient region prediction methods on a plurality of image data sets;
preprocessing the prediction result, performing a subjective test experiment, and constructing a subjective test data result pair;
and analyzing the subjective test data result pair, supplementing a salient region true value image and a random image data pair, and acquiring the subjective test data set.
Optionally, the obtaining module is specifically configured to:
and preprocessing the prediction result by histogram equalization processing.
Optionally, the obtaining module is specifically configured to:
supplementing the true value image and random image data pair of the image into the subjective test data result pair to obtain the subjective test data set; wherein the true value map is the best result generated by each significant region prediction method for each significant region, and the random map is the worst result generated by each significant region prediction method for each significant region.
A third aspect of embodiments of the present invention provides an apparatus, including: a memory and a processor;
the memory is to store computer instructions; the processor is configured to execute the computer instructions stored by the memory to implement a saliency evaluation of an image consistent with human visual perception.
A fourth aspect of the embodiments of the present invention provides a storage medium, including: a readable storage medium and computer instructions stored in the readable storage medium for enabling evaluation of image saliency prediction results consistent with human visual perception.
According to the method and the device for evaluating an image saliency prediction result provided herein, salient-region prediction is performed on an image to obtain its salient-region prediction result, and this result is substituted as input into the evaluation model of image saliency prediction results obtained by pre-training, so that the evaluation result of the image saliency prediction result is obtained. The evaluation model is obtained by training, on a subjective test data set, a mathematical model with a convolutional neural network structure that learns human evaluation criteria based on relative saliency relations; the evaluation of image saliency prediction results consistent with human visual cognition is thus realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flowchart illustrating a first embodiment of a method for evaluating a result of an image saliency prediction according to the present invention;
FIG. 2 is a schematic diagram of prediction result preprocessing according to an embodiment of the method for evaluating an image saliency prediction result of the present invention;
FIG. 3 is a schematic diagram of a subjective test data set according to an embodiment of the method for evaluating an image saliency prediction result of the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network model based on a relative saliency relationship according to an embodiment of the method for evaluating an image saliency prediction result of the present invention;
FIG. 5 is a schematic diagram of an image saliency prediction result evaluation model according to an embodiment of the image saliency prediction result evaluation method of the present invention;
fig. 6 is a schematic structural diagram of a first embodiment of the apparatus for evaluating an image saliency prediction result according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a first embodiment of a method for evaluating an image saliency prediction result according to the present invention, as shown in fig. 1, the method for evaluating an image saliency prediction result includes:
s101, acquiring subjective test result data sets according to the salient region prediction results of the plurality of salient region prediction methods on the plurality of image data sets.
In a specific implementation, the salient-region prediction results generated on a plurality of image data sets by existing salient-region prediction methods are obtained and preprocessed. Owing to the characteristics of human visual cognition, a subjective test experiment is carried out with any two prediction results, the subjective relative saliency relation of the two prediction maps generated by the two methods is obtained, and a subjective test data result pair is constructed. Each result pair is a quadruple comprising the salient-region ground-truth map, the prediction results generated by the two methods, and the relative relation of the two prediction results. Ground-truth-map and random-map data pairs of the salient region are then supplemented; such a pair is also a quadruple, comprising the salient-region ground-truth map, the ground-truth map and the random map as the two results, and their relative saliency relation, which is fixed at 1. The subjective test result data set is constructed in this way.
The salient-region prediction methods may be prior-art methods such as the IT method and the mean-map method. The data sets may be three data sets including the Toronto data set; together they reflect salient-region prediction results for different scenes from different angles, so that the original images whose salient regions are to be predicted cover scenes of all angles and images with various characteristics.
In this step, the image salient region may be the region in the image that most interests the user and best represents the image content. The region of interest is highly subjective: for the same image, different users may select different regions because their tasks and knowledge backgrounds differ. Nevertheless, owing to the commonality of the human visual system and its attention mechanism, some regions of an image always attract attention markedly, and these regions usually carry rich information. The salient regions of an image can therefore be judged approximately from certain low-level features of the image, following the characteristics of the human visual system and the general rules of the human cognitive process; for example, a human can determine the salient regions of an image according to color, brightness and the like.
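To make the low-level-feature idea above concrete, the following is a deliberately minimal sketch — not one of the actual methods such as IT, GB or CA — that marks pixels whose brightness departs from the global mean as salient. The function name and the normalization are illustrative assumptions:

```python
import numpy as np

def naive_saliency(gray_img):
    """Minimal low-level saliency sketch: pixels whose intensity is far
    from the global mean are marked salient, illustrating how a single
    low-level cue (brightness contrast) can approximate salient regions.
    The result is normalized to [0, 1]."""
    img = np.asarray(gray_img, dtype=np.float64)
    contrast = np.abs(img - img.mean())
    span = contrast.max() - contrast.min()
    if span == 0:
        return np.zeros_like(contrast)  # uniform image: nothing salient
    return (contrast - contrast.min()) / span
```

Real prediction methods combine many such cues at multiple scales; this sketch only shows the single-cue principle.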
The image significant region prediction graph may be a prediction result generated by using an existing significant region prediction method for an image, and optionally, the image significant region prediction result may be an image between a truth graph and a random graph of the significant region, where the truth graph is a best result generated by using the significant region prediction method for predicting the image significant region, and the random graph is a worst result generated by using the significant region prediction method for predicting the image significant region.
The preprocessing of the prediction results may be histogram equalization of the prediction maps. Analysis of the original salient-region prediction maps shows that small details in them are not well perceptible — for example, the branches of a tree are almost invisible to the naked eye — which makes it hard to carry out the subsequent human subjective test experiments well. Among existing methods, rendering the original salient-region prediction map in Jet color is one good option, while the histogram equalization method proposed by Bruce et al. approaches the state of human visual perception more closely. The invention therefore finally adopts histogram equalization to process the prediction results.
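A minimal sketch of such histogram equalization on an 8-bit saliency map follows, assuming the standard cumulative-distribution-function remapping; the exact implementation used by Bruce et al. is not reproduced here:

```python
import numpy as np

def equalize_histogram(saliency_map):
    """Histogram-equalize an 8-bit grayscale saliency prediction map
    so that low-contrast details become visible to subjects."""
    img = np.asarray(saliency_map, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Minimum of the CDF over the gray levels actually present.
    cdf_min = cdf[np.nonzero(hist)[0][0]]
    total = img.size
    # Standard equalization lookup table mapping each level into [0, 255].
    lut = np.round((cdf - cdf_min) / max(total - cdf_min, 1) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]
```

After this step, two prediction maps shown side by side in a subjective test occupy comparable dynamic ranges, so subjects compare content rather than contrast.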
As shown in fig. 2, which is a schematic diagram of prediction result preprocessing according to an embodiment of the method, assume a prediction result preprocessing module 20, a unit 21 without any processing, a unit 22 after Jet-color processing, and a unit 23 after histogram equalization; each unit contains the original picture, the salient-region ground-truth map, and a different prediction result. Because a prediction result without any processing cannot restore the details of the salient region of the image well, it is difficult to carry out the subsequent human subjective test on it.
Further, the subjective test experiment is carried out on the prediction results, possibly in crowdsourcing mode: the subjects compare the relative quality of the prediction results generated for the same image by two different salient-region prediction methods, and the relative saliency relation of the two prediction results is obtained from the judgments of the different subjects.
A subjective test data result pair is then constructed as a quadruple comprising: the salient-region ground-truth map, the prediction results generated by the two methods, and the relative relation of the two prediction results. Ground-truth-map and random-map data pairs of the salient region are supplemented; each such pair is also a quadruple, comprising the salient-region ground-truth map, the ground-truth map and the random map as the two results, and their relative saliency relation, which is 1. The subjective test result data set is thus constructed, the value of the relative relation being a real number normalized into [-1, 1].
Referring to fig. 3, fig. 3 is a schematic diagram of a subjective test data set according to an embodiment of the method for evaluating a prediction result of a salient region of the present invention.
In a specific implementation, taking the prediction results for one image as an example, 20 subjects are organized to carry out the human subjective test experiment. In the subjective test result data set module 30, the prediction result generated by prediction method A is 30a and that generated by prediction method B is 30b. In the test, 6 of the 20 subjects judge the result 30a better than 30b, and 14 of the 20 judge 30b better than 30a, so the relation of the two prediction methods may be 6:14; normalized, this is 0.43:1, and after the difference operation the relative saliency relation is 0.57.
Constructing a subjective test data result pair quadruple, wherein the quadruple comprises: the significant region true value map, the prediction results generated by the two methods, respectively, and the relative significant relationship between the two prediction results, as shown in fig. 3, the quadruple includes a significant region true value map 30c, a prediction result 30a generated by the a prediction method, a prediction result 30B generated by the B prediction method, and a relative significant relationship 30d between the prediction results 30a and 30B.
And supplementing the truth map and random map data pair of the salient region into a subjective test data result pair, wherein the data pair comprises a true value map 30c, a random map 30e and a relative salient relation 30f of the true value map and the random map, and the relative salient relation 30f is 1, so that a subjective test result data set is obtained.
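The normalization and difference operation of the worked example above (6:14 → 0.43:1 → 0.57) can be sketched as follows. The function name and the sign convention (positive favours the second method) are illustrative assumptions, since the text does not fix them:

```python
def relative_saliency(votes_a, votes_b):
    """Relative saliency relation of two prediction maps from subject
    votes. The vote counts are scaled so the larger side becomes 1,
    and the difference of the two normalized scores gives a value in
    [-1, 1]; at least one vote is assumed."""
    hi = max(votes_a, votes_b)
    score_a = votes_a / hi
    score_b = votes_b / hi
    return score_b - score_a
```

For a supplemented ground-truth/random pair the relation is pinned to 1 by construction rather than computed from votes, since the ground-truth map is by definition the best result and the random map the worst.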
S102, designing, for the subjective test result data set, a mathematical model with a convolutional neural network structure that learns human evaluation criteria based on relative saliency relations.
In a specific implementation, analysis of human cognition shows that human saliency evaluation of an image is a relative process: a relative relation between two images is obtained, and learning this relation better simulates human cognition. A convolutional neural network model based on the relative saliency relation, i.e. the main network, is therefore constructed. The model is a two-branch parallel parameter-sharing model; the inputs of its two branches are, respectively, the prediction result generated by prediction method A together with the ground-truth map, and the prediction result generated by prediction method B together with the ground-truth map. The parameter-sharing module of the model uses a network architecture proposed in the prior art. Before the two branches converge, a scoring function scores each branch with a value in [0, 1]; on the basis of the subjective test result data set, the convolutional neural network based on the relative relation learns these scores autonomously.
A difference operation on the two scores at the junction of the branches then yields the relative saliency relation, a value in [-1, 1].
In the convolutional neural network model based on the relative saliency relation, a loss function for auxiliary training is set: the salient-region ground-truth map and the random map are used, respectively, as the prediction result of method A and the prediction result of method B — i.e. as the inputs of the two branches — and the convolutional neural network model is trained so that the output result and the difference result of each branch better approximate the subjective test results.
Referring to fig. 4, fig. 4 is a schematic diagram of the convolutional neural network model based on the relative relation according to an embodiment of the evaluation method. The model may be a two-branch parallel parameter-sharing reference model 40 with branch sub-networks 401 and 402; the inputs of the branches are, respectively, the prediction result generated by prediction method A with the ground-truth map, and the prediction result generated by prediction method B with the ground-truth map. After calculation by the convolutional neural network unit 403, each branch ends with a value in the interval [0, 1], namely Φ_CPJ(A, G) and Φ_CPJ(B, G); these values are learned autonomously by the relative-relation convolutional neural network on the basis of the subjective test result data set.
The difference operation then yields the output of the model, i.e. the evaluation of the image saliency prediction result, a value in the interval [-1, 1]. Specifically, B1–B5 in the convolutional neural network are convolution modules and F6–F8 are connection-weight (fully connected) modules; the latter memorize information during network training as a series of values, each of which is adjusted continuously during training until the difference output of the model is consistent with human visual cognition. In the example of fig. 3, the result after the difference operation is 0.57, so the parameters of the convolutional neural network are assigned accordingly, and the relative-relation model is trained so that its output is 0.57.
After the model parameters are determined, the salient-region ground-truth map and the random map are taken as input and substituted into the model, for which the loss function is set in this part. Theoretically, the outputs of the two branches should be 1 and 0, and the difference result between the ground-truth map and the random map should be 1; in practice, after calculation by the convolutional neural network, the branch outputs approximate 1 and 0 and the difference approximates 1. Combining these three auxiliary training results, the existing loss function is supplemented into the convolutional neural network, and results consistent with human visual cognition are obtained.
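The two-branch parameter-sharing structure and the difference operation described above can be sketched as follows. The scoring branch here is a toy linear stand-in for the convolutional sub-network Φ_CPJ (B1–B5 plus F6–F8), so the class name, function names and sign convention are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedScorer:
    """Toy stand-in for the shared-parameter branch Φ_CPJ. A single
    linear layer with a sigmoid replaces the real convolutional
    sub-network, purely to illustrate parameter sharing and the
    [0, 1] score range."""

    def __init__(self, dim):
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def score(self, prediction, truth):
        # Both branches call this same object, so parameters are shared.
        x = np.concatenate([np.ravel(prediction), np.ravel(truth)])
        z = float(x @ self.w + self.b)
        return 1.0 / (1.0 + np.exp(-z))  # score in [0, 1]

def relative_relation(scorer, pred_a, pred_b, truth):
    """Difference of the two branch scores: a value in [-1, 1]."""
    return scorer.score(pred_a, truth) - scorer.score(pred_b, truth)
```

Training would push `relative_relation(scorer, truth, random_map, truth)` towards 1 (the auxiliary target) and push the output for each subjective quadruple towards its measured relation, e.g. 0.57 in the example of fig. 3.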
S103, extracting a sub-network from the convolutional neural network model based on the relative relation to obtain the evaluation model of the image saliency prediction result.
After the convolutional neural network model based on the relative relation has been trained, the evaluation sub-network for image saliency prediction results is extracted from the model, yielding the evaluation method. Its inputs are a prediction result generated by any prediction method and the salient-region ground-truth map; after calculation by the convolutional neural network, it outputs a value in [0, 1], which represents the model's evaluation of the prediction result.
Referring to fig. 5, fig. 5 shows the saliency prediction result evaluation model 50 of an embodiment of the evaluation method, i.e. the evaluation sub-network extracted from the relative-relation convolutional neural network model (401 or 402 in fig. 4 above). The inputs of the model are a prediction result generated by a prediction method and the salient-region ground-truth map; after calculation by the convolutional neural network, the model outputs a value in [0, 1], which is the evaluation of the image saliency prediction result.
In the method for evaluating an image saliency prediction result provided by this embodiment, the salient-region prediction results generated by existing prediction methods on a plurality of image data sets are obtained and preprocessed; a subjective test experiment on the prediction results yields the relative relation between any two of them; subjective test result data pair quadruples are constructed and supplemented with salient-region ground-truth-map and random-map data pairs to form the subjective test result data set; for this data set, a mathematical model with a convolutional neural network structure that learns human evaluation criteria based on relative saliency relations is designed and trained; and after the relative-saliency convolutional neural network model has been trained, the saliency prediction result evaluation sub-network is extracted from it, yielding the evaluation method for image saliency prediction results. The technical scheme thus constructs a method that realizes the evaluation of image saliency prediction results.
Fig. 6 is a schematic structural diagram of an embodiment of an evaluation apparatus for an image saliency prediction result according to the present invention, and as shown in fig. 6, the image saliency evaluation apparatus 60 includes:
an obtaining module 601, configured to perform image salient region prediction on an image to obtain a salient region prediction map of the image;
the processing module 602 is configured to obtain, according to the salient region prediction result, an evaluation of the image saliency prediction result by using an evaluation model of the image saliency prediction result obtained by pre-training;
wherein the evaluation model of the image saliency prediction result is obtained by training, according to a subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative saliency relationships.
The image saliency evaluation device of this embodiment is used to execute the technical solutions provided by the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and are not described herein again.
On the basis of the foregoing embodiment, the obtaining module 601 is specifically configured to:
acquiring a subjective test data set according to salient region prediction results of a plurality of salient region prediction methods on a plurality of image data sets;
the processing module 602 is specifically configured to: design, for the subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative saliency relationships, and train the mathematical model to obtain the evaluation model of the image saliency prediction result.
Optionally, the obtaining module 601 is specifically configured to:
acquiring salient region prediction results generated by the plurality of salient region prediction methods on the plurality of image data sets;
preprocessing the prediction result, performing a subjective test experiment, and constructing a subjective test data result pair;
and analyzing the result pair of the subjective test data, supplementing a true value graph and a random graph of the salient region, and acquiring the subjective test data set.
Optionally, the obtaining module 601 is specifically configured to: perform histogram equalization preprocessing on the prediction results.
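The histogram-equalization preprocessing named above can be sketched in pure Python for an 8-bit map. This is a hedged stand-in: a real pipeline would typically call OpenCV's `cv2.equalizeHist` on the saliency map, and the flat pixel list here is purely illustrative.

```python
def equalize_histogram(pixels, levels=256):
    """Map pixel intensities through the normalized CDF of their histogram."""
    n = len(pixels)
    # Build the intensity histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to spread out
        return list(pixels)
    # Classic formula: round((cdf - cdf_min) / (n - cdf_min) * (levels - 1)).
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]

# A low-contrast map clustered around mid-gray spreads across the full range.
flat = [100, 100, 101, 101, 102, 102, 103, 103]
print(equalize_histogram(flat))  # [0, 0, 85, 85, 170, 170, 255, 255]
```

Equalization makes prediction maps from different methods comparable in contrast before they are shown side by side in the subjective test.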
Optionally, the obtaining module 601 is specifically configured to:
supplementing the truth map and random map data pairs obtained when each image salient region prediction method performs prediction into the subjective test data result pairs, so as to construct the subjective test data set; wherein the truth map is taken as the best possible result for each salient region, and the random map as the worst possible result for each salient region.
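The supplementing step just described can be sketched as follows (all names invented for illustration): each subjective comparison becomes a labelled pair, and for every image a (truth map, random map) anchor pair is appended, the truth map playing the role of the best possible prediction and the random map the worst.

```python
import random

def build_subjective_dataset(subjective_pairs, truth_maps, map_shape=(4, 4), seed=0):
    """subjective_pairs: (image_id, pred_a, pred_b, preferred) tuples from the
    subjective test; truth_maps: {image_id: truth_map}. Returns the supplemented
    data set used to train the relative-relationship model."""
    rng = random.Random(seed)
    dataset = list(subjective_pairs)
    h, w = map_shape
    for image_id, truth in truth_maps.items():
        # Random map: uniform noise, standing in for the worst possible result.
        random_map = [[rng.random() for _ in range(w)] for _ in range(h)]
        # Anchor pair: the truth map (best possible result) is always preferred.
        dataset.append((image_id, truth, random_map, "a"))
    return dataset

pairs = [("img1", "mapA", "mapB", "a")]
truths = {"img1": "gt1", "img2": "gt2"}
print(len(build_subjective_dataset(pairs, truths)))  # 3
```

These anchor pairs give the model fixed upper and lower reference points, so its output scores in [0,1] are calibrated rather than only ordinal.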
The present invention also provides a terminal, comprising a memory and a processor. The memory is configured to store computer instructions; the processor is configured to execute the computer instructions stored in the memory, so as to enable evaluation of image saliency prediction results consistent with human visual perception.
An embodiment of the present invention further provides a storage medium, comprising a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are configured to enable evaluation of image saliency prediction results consistent with human visual perception.
In any of the above embodiments of the apparatus, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape (magnetic tape), floppy disk (flexible disk), optical disk (optical disk), and any combination thereof.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (4)
1. A method for evaluating an image significance prediction result is characterized by comprising the following steps:
carrying out image salient region prediction on an image to obtain a prediction result of the image salient region;
obtaining the evaluation of the image significance prediction result by using an evaluation model of the image significance prediction result obtained by pre-training;
the evaluation model of the image significance prediction result is obtained by training, according to a subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative significance relationships;
before the evaluation of the image significance prediction result is obtained, according to the salient region prediction result, by using the pre-trained evaluation model of the image significance prediction result, the method further comprises:
acquiring a subjective test data set according to salient region prediction results of a plurality of salient region prediction methods on a plurality of image data sets;
designing, for the subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative significance relationships, and training the mathematical model to obtain the evaluation model of the image significance prediction result;
the acquiring of the subjective test data set according to the significant region prediction results of the plurality of significant region prediction methods on the plurality of image data sets includes:
acquiring salient region prediction results generated by the plurality of salient region prediction methods on the plurality of image data sets;
preprocessing the prediction result, performing a subjective test experiment, and constructing a subjective test data result pair;
analyzing the subjective test data result pairs, supplementing salient region truth map and random map data pairs, and constructing the subjective test data set;
the analyzing the subjective test data result pairs, supplementing the salient region truth map and random map data pairs, and constructing the subjective test data set comprises:
supplementing the truth map and random map data pair of each image salient region into the subjective test data result pairs to obtain the subjective test data set; wherein the truth map is taken as the best possible result for each salient region, and the random map as the worst possible result for each salient region.
2. The method of claim 1, wherein the pre-processing the prediction result comprises:
and carrying out histogram equalization processing on the prediction result.
3. An apparatus for evaluating an image saliency prediction result, comprising:
the acquisition module is used for predicting the image salient region of the image to obtain a prediction result of the image salient region;
the processing module is configured to obtain, according to the salient region prediction result, the evaluation of the image significance prediction result by using an evaluation model of the image significance prediction result obtained by pre-training;
the evaluation model of the image significance prediction result is obtained by training, according to a subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative significance relationships;
the acquisition module is specifically configured to:
acquiring a subjective test data set according to salient region prediction results of a plurality of salient region prediction methods on a plurality of image data sets;
the processing module is specifically configured to: design, for the subjective test data set, a mathematical model of a convolutional neural network structure that learns human evaluation criteria based on relative significance relationships, and train the mathematical model to obtain the evaluation model of the image significance prediction result;
the acquisition module is specifically configured to:
acquiring salient region prediction results generated by the plurality of salient region prediction methods on the plurality of image data sets;
preprocessing the prediction result, performing a subjective test experiment, and constructing a subjective test data result pair;
analyzing the subjective test data result pair, supplementing a salient region true value image and a random image data pair, and constructing the subjective test data set;
the acquisition module is specifically configured to:
supplementing the truth map and random map data pair of each image salient region into the subjective test data result pairs to obtain the subjective test data set; wherein the truth map is taken as the best possible result for each salient region, and the random map as the worst possible result for each salient region.
4. The apparatus of claim 3, wherein the obtaining module is specifically configured to: perform histogram equalization preprocessing on the prediction results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810457947.3A CN108665455B (en) | 2018-05-14 | 2018-05-14 | Method and device for evaluating image significance prediction result |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108665455A CN108665455A (en) | 2018-10-16 |
CN108665455B true CN108665455B (en) | 2022-04-26 |
Family
ID=63779437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810457947.3A Active CN108665455B (en) | 2018-05-14 | 2018-05-14 | Method and device for evaluating image significance prediction result |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108665455B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109712105B (en) * | 2018-12-24 | 2020-10-27 | 浙江大学 | Image salient object detection method combining color and depth information |
CN115205188A (en) * | 2021-04-13 | 2022-10-18 | 腾讯科技(深圳)有限公司 | Method and related device for evaluating image video quality based on approximation value |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106447654A (en) * | 2016-09-12 | 2017-02-22 | 中国科学技术大学 | Image redirection quality evaluation method based on statistic similarity and bidirectional significance fidelity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016025635A (en) * | 2014-07-24 | 2016-02-08 | キヤノン株式会社 | Image processing system and method of the same |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106447654A (en) * | 2016-09-12 | 2017-02-22 | 中国科学技术大学 | Image redirection quality evaluation method based on statistic similarity and bidirectional significance fidelity |
Non-Patent Citations (1)
Title |
---|
A New Method for Salient Region Extraction; Ye Cong et al.; Journal of Nanjing Normal University (Natural Science Edition); 2012-09-30; Vol. 35, No. 3; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110533097B (en) | Image definition recognition method and device, electronic equipment and storage medium | |
CN108615071B (en) | Model testing method and device | |
KR20200145827A (en) | Facial feature extraction model learning method, facial feature extraction method, apparatus, device, and storage medium | |
WO2017206400A1 (en) | Image processing method, apparatus, and electronic device | |
CN110807757B (en) | Image quality evaluation method and device based on artificial intelligence and computer equipment | |
CN110148088B (en) | Image processing method, image rain removing method, device, terminal and medium | |
CN110096617B (en) | Video classification method and device, electronic equipment and computer-readable storage medium | |
CN108197669B (en) | Feature training method and device of convolutional neural network | |
CN111047543A (en) | Image enhancement method, device and storage medium | |
CN110782448A (en) | Rendered image evaluation method and device | |
CN113505854A (en) | Method, device, equipment and medium for constructing facial image quality evaluation model | |
CN113971644A (en) | Image identification method and device based on data enhancement strategy selection | |
CN108665455B (en) | Method and device for evaluating image significance prediction result | |
CN113420871B (en) | Image quality evaluation method, image quality evaluation device, storage medium, and electronic device | |
CN110570375A (en) | image processing method, image processing device, electronic device and storage medium | |
CN117635418A (en) | Training method for generating countermeasure network, bidirectional image style conversion method and device | |
CN111429414B (en) | Artificial intelligence-based focus image sample determination method and related device | |
CN112348809A (en) | No-reference screen content image quality evaluation method based on multitask deep learning | |
CN110210523B (en) | Method and device for generating image of clothes worn by model based on shape graph constraint | |
CN116701706A (en) | Data processing method, device, equipment and medium based on artificial intelligence | |
CN110163049B (en) | Face attribute prediction method, device and storage medium | |
CN116844008A (en) | Attention mechanism guided content perception non-reference image quality evaluation method | |
CN113962332B (en) | Salient target identification method based on self-optimizing fusion feedback | |
CN114841887A (en) | Image restoration quality evaluation method based on multi-level difference learning | |
CN113569809A (en) | Image processing method, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |