CN112487225A - Saliency image generation method and device and server

Saliency image generation method and device and server

Info

Publication number
CN112487225A
Authority
CN
China
Prior art keywords
group
sample image
forgetting
prediction model
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011453179.8A
Other languages
Chinese (zh)
Other versions
CN112487225B (en)
Inventor
胡屹凛
高星宇
方小刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Zhejiang Industrial Internet Co Ltd
Original Assignee
China Unicom Zhejiang Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Zhejiang Industrial Internet Co Ltd filed Critical China Unicom Zhejiang Industrial Internet Co Ltd
Priority to CN202011453179.8A
Publication of CN112487225A
Application granted
Publication of CN112487225B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/535Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a saliency image generation method, device and server. The method includes: acquiring the forgetting degree features of all test users, and dividing the test users into a plurality of groups according to those features, where the forgetting degree features within each group are the same; acquiring a clear sample image set and a corresponding annotated sample image set for each group; training a neural network model on each group's annotated sample image set and clear sample images to construct each group's prediction model; and acquiring the forgetting degree feature of a target user, determining a target prediction model according to the target user's forgetting degree feature and the prediction models, and inputting a target image into the target prediction model to generate a saliency image, thereby improving the matching degree between the generated saliency image and the saliency of the user's actual visual system.

Description

Saliency image generation method and device and server
Technical Field
The invention relates to the field of computer vision, in particular to a method and a device for generating a saliency image and a server.
Background
The saliency of the human visual system enables people to quickly search for and locate objects of interest; when facing natural scenes, people quickly filter out unimportant information according to the saliency of their visual system, so that attention focuses on regions of interest. Therefore, in image-pushing application scenarios, predicting the visual-system saliency of different users and generating the image to be pushed according to the predicted image saliency improves the user's interest in the pushed image.
In the prior art, saliency detection methods mostly adopt deep learning: training is performed on saliency image samples submitted by users, that is, a deep convolutional neural network combining convolution and pooling operations extracts image features to obtain a saliency prediction model, and the saliency image to be pushed is obtained from that model.
However, in the prior art, a user's forgetting of an image may cause some of its feature information to be missing from the saliency image samples the user submits, so the accuracy of the resulting saliency prediction model is low and the generated saliency image does not match the saliency of the user's actual visual system.
Disclosure of Invention
The invention aims to provide a saliency image generation method, device and server, so as to improve the matching degree between the generated saliency image and the saliency of the user's actual visual system.
In a first aspect, the present invention provides a saliency image generation method, comprising:
acquiring the forgetting degree features of all test users, and dividing the test users into a plurality of groups according to the test users' forgetting degree features, wherein the forgetting degree features within each group are the same;
acquiring a clear sample image set and an annotated sample image set of each group, wherein the annotated sample image set corresponds to the clear sample image set;
training a neural network model on the annotated sample image set and the clear sample image set of each group, to construct a prediction model for each group;
acquiring the forgetting degree feature of a target user, determining a target prediction model according to the target user's forgetting degree feature and the prediction models, and inputting a target image into the target prediction model to generate a saliency image.
In one possible design, the dividing of the test users into a plurality of groups according to their forgetting degree features includes:
dividing the test users into a first group, a second group and a third group according to the test users' forgetting degree features, where the forgetting degree of the first group's test users is high, the forgetting degree of the second group's test users is normal, and the forgetting degree of the third group's test users is low.
In a possible design, the training of the neural network model on the annotated sample image set and the clear sample image set of each group to construct each group's prediction model includes:
obtaining a first training set, a second training set and a third training set from the annotated sample image sets and clear sample image sets of the first, second and third groups, respectively, and performing convolutional neural network training on each of the three training sets with the neural network model to construct a first prediction model, a second prediction model and a third prediction model;
wherein the convolutional neural network training step is:
constructing a convolutional neural network model, inputting the clear sample image set into the convolutional neural network model for iterative training, and obtaining a saliency prediction image set;
and determining a loss function set according to the annotated sample image set and the saliency prediction image set, and determining the prediction model according to the minimum loss function in the loss function set.
In one possible design, the obtaining of a first training set, a second training set and a third training set from the annotated sample image sets and clear sample image sets of the first, second and third groups, respectively, and the constructing of a first prediction model, a second prediction model and a third prediction model from the first, second and third training sets, respectively, includes:
performing convolutional neural network model training according to the first training set, the second training set and the third training set respectively to construct a first prediction model, a second prediction model and a third prediction model, wherein the convolutional neural network model training comprises the following steps:
constructing a convolutional neural network model, inputting the clear sample image set into the convolutional neural network model for iterative training, and obtaining a visual prediction map set;
and determining a loss function set according to the annotated sample image set and the visual prediction map set, and determining the prediction model according to the minimum loss function in the loss function set.
In one possible design, the obtaining the set of clear sample images and the set of labeled sample images for each population includes:
acquiring a clear sample image set, and performing Gaussian blur processing on the clear sample image set to obtain a test image set;
and obtaining the annotated sample image sets of the users in different groups according to those users' annotation data on the test image set, wherein the annotation data are coordinate data.
In one possible design, the acquiring of the forgetting degree features of all test users includes:
acquiring forgetting degree evaluation data of all test users;
obtaining the forgetting degree feature of a test user according to the forgetting degree evaluation data, wherein the forgetting degree evaluation data comprise at least one of: psychological scale assessment data, rapid test questionnaire assessment data, and self-assessment data.
In a second aspect, an embodiment of the present invention provides a saliency image generation apparatus, based on the saliency image generation method in any one of the first aspects, including:
an acquisition module, configured to acquire the forgetting degree features of all test users and divide the test users into a plurality of groups according to those features, wherein the forgetting degree features within each group are the same;
the system comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring a clear sample image set and an annotated sample image set of each group, and the annotated sample image set corresponds to the clear sample image set;
a construction module, configured to train a neural network model on the annotated sample image set and the clear sample image set of each group, to construct each group's prediction model;
and a generation module, configured to acquire the forgetting degree feature of a target user, determine a target prediction model according to the target user's forgetting degree feature and the prediction models, and input a target image into the target prediction model to generate a saliency image.
In one possible design, the obtaining module is specifically configured to:
dividing the test users into a first group, a second group and a third group according to the test users' forgetting degree features, where the forgetting degree of the first group's test users is high, the forgetting degree of the second group's test users is normal, and the forgetting degree of the third group's test users is low.
In a third aspect, an embodiment of the present invention provides a server, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the saliency image generation method of any one of the first aspects.
in a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the salient image generation method according to any one of the first aspects is implemented.
In a fifth aspect, an embodiment of the present invention provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for generating a saliency image according to any one of the first aspect is implemented.
According to the saliency image generation method, device and server provided by the embodiments of the invention, the test users are divided into a plurality of groups according to their forgetting degree features; each group's training set is obtained from its annotated sample image set and clear sample image set; a plurality of prediction models are constructed from the groups' training sets; a target prediction model is determined according to the target user's forgetting degree feature and the prediction models; and the target image is input into the target prediction model to generate a saliency image, improving the matching degree between the generated saliency image and the saliency of the user's actual visual system.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic view of an application scenario of a saliency image generation method according to an embodiment of the present invention;
fig. 2 is a flowchart of a salient image generation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a clear sample image provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a blur test picture according to an embodiment of the present invention;
fig. 5 is a flowchart of a significant image generation method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a neural network structure employed in an embodiment of the present invention;
fig. 7 is a schematic diagram of an upsampling process provided by an embodiment of the present invention;
fig. 8 is a schematic diagram of an FCN hopping structure provided in an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a saliency image generation device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The above drawings illustrate specific embodiments of the invention, which are described in more detail below. The drawings and description are not intended to limit the scope of the inventive concept in any way, but to illustrate it for those skilled in the art with reference to specific embodiments.
With the wide spread of massive images on the internet, applying saliency analysis of the human visual system and arranging and designing image features accordingly, so as to improve the effectiveness of image information transfer, has become a development direction in the field of computer vision. The saliency of the human visual system enables people to quickly search for and locate objects of interest; when facing natural scenes, people quickly filter out unimportant information according to the saliency of their visual system, so that attention focuses on regions of interest. Therefore, in image-pushing application scenarios, predicting the visual-system saliency of different users and generating the image to be pushed according to the predicted image saliency improves the user's interest in the pushed image.
In the prior art, saliency detection methods mostly adopt deep learning: training is performed on saliency image samples submitted by users, that is, a deep convolutional neural network combining convolution and pooling operations extracts image features to obtain a saliency prediction model, and the saliency image to be pushed is obtained from that model. However, a user's forgetting of an image may cause some of its feature information to be missing from the saliency image samples the user submits, so the accuracy of the resulting saliency prediction model is low and the generated saliency image does not match the saliency of the user's actual visual system.
To solve the above technical problem, an embodiment of the present invention provides a saliency image generation method: the test users are divided into a plurality of groups according to their forgetting degree features; each group's training set is obtained from its annotated sample image set and clear sample image set; a plurality of prediction models are constructed from the groups' training sets; a target prediction model is determined according to the target user's forgetting degree feature and the plurality of prediction models; and the target image is input into the target prediction model to generate a saliency image, improving the matching degree between the generated saliency image and the saliency of the user's actual visual system.
Fig. 1 is a schematic view of an application scenario of the saliency image generation method provided by an embodiment of the present invention. As shown in Fig. 1, the application scenario in the embodiment of the present invention includes a terminal 10 for collecting the forgetting degree features of test users and a server 20 for generating saliency images. The terminal 10 includes, but is not limited to: a desktop computer, a notebook computer, a tablet computer, a mobile phone, and the like. The terminal 10 may run an application client for acquiring forgetting degree features; a test user takes a test on the application client to produce the test user's forgetting degree evaluation data. In addition, the terminal 10 collects the test users' annotated sample images, and an annotated sample image set is obtained from the annotated sample images of all test users. Each terminal 10 sends the collected forgetting degree evaluation data and annotated sample images to the server. The server 20 may be a single server, a server cluster composed of several servers, or a cloud computing service platform. The server 20 analyzes each test user's forgetting degree from the forgetting degree evaluation data and obtains the group-specific saliency image models from the annotated sample image sets. The following embodiments are given by way of illustration.
Fig. 2 is a first flowchart of a saliency image generation method according to an embodiment of the present invention. The execution subject of the embodiment of the present invention may be the server 20 shown in fig. 1. As shown in fig. 2, the method for generating a saliency image according to the embodiment of the present invention includes the following steps:
s201: and acquiring the forgetting characteristics of all the test users, and dividing the test users into a plurality of groups according to the forgetting characteristics of the test users, wherein the forgetting characteristics of each group are the same.
In the embodiment of the invention, illustratively, forgetting degree evaluation data of all test users are acquired, and each test user's forgetting degree feature is obtained from the forgetting degree evaluation data, where the forgetting degree evaluation data comprise at least one of: psychological scale assessment data, rapid test questionnaire assessment data, and self-assessment data.
In the embodiment of the present invention, specifically, the forgetting degree evaluation data of all test users are collected through a forgetting degree test program on the terminal, where the evaluation data may be at least one of psychological scale assessment data, rapid test questionnaire assessment data, and self-assessment data. After the terminal obtains a test user's forgetting degree evaluation data, the forgetting degree of the test user is determined by analyzing the evaluation data.
In the embodiment of the present invention, for example, after the forgetting degree evaluation data of all test users are obtained, the test users may be divided into a first group, a second group and a third group according to their forgetting degree features, where the forgetting degree of the first group's test users is high, that of the second group's is normal, and that of the third group's is low. Dividing test users with the same forgetting degree into the same group and analyzing each group's saliency images improves the match between the predicted saliency image and each test user's forgetting degree.
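As a concrete illustration, a minimal Python sketch of this three-way grouping follows; the numeric score scale and thresholds are hypothetical assumptions, since the patent does not specify how the forgetting degree evaluation data are scored:

```python
def assign_group(forgetting_score: float) -> str:
    """Map a test user's forgetting-degree score to one of the three groups.

    Assumes scores are normalized to [0, 1]; the 0.7/0.3 cut-offs are
    hypothetical, not values given in the patent.
    """
    if forgetting_score >= 0.7:
        return "first"   # high forgetting degree
    if forgetting_score >= 0.3:
        return "second"  # normal forgetting degree
    return "third"       # low forgetting degree
```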
S202: acquiring a clear sample image set and an annotated sample image set for each group, where the annotated sample image set corresponds to the clear sample image set.
In the embodiment of the invention, illustratively, a clear sample image set is acquired, and a test image set is obtained by applying Gaussian blur to the clear sample image set; the annotated sample image sets of the users in different groups are then obtained from those users' annotation data on the test image set, where the annotation data are coordinate data.
With the development of eye-tracking technology, user gaze data can be used in fields such as behavioral analysis and psychological analysis. Several studies have shown that mouse data correlate strongly with eye-movement data, so visual-attention research can be carried out with mouse data in place of eye-movement data. The embodiment of the present invention therefore uses mouse click data instead of eye-tracking data. Specifically, Fig. 3 is a schematic diagram of a clear sample image provided by the embodiment of the present invention. After a test user views a clear sample image from the clear sample image set for a period of time, the clear sample image is blurred to obtain a blur test picture, as shown in Fig. 4, a schematic diagram of the blur test picture provided by the embodiment of the present invention. The test user is asked to recall all the regions viewed before and to click, with the mouse, the regions of the blur test picture they remember; when the test user clicks, a region of a certain radius around the click is displayed clearly. All coordinates clicked by the test user are recorded, and the coordinate data are Gaussian-blurred to generate the test user's annotated sample image, each annotated sample image being an annotated hot-zone image.
In the embodiment of the present invention, for example, the clear sample image set contains 200 images. Following the above method for obtaining annotated sample images, all annotation coordinate data from each group's test users for the same clear sample image are aggregated into an annotation coordinate data set. The mouse coordinates are Gaussian-blurred with σ = 32 to match the visual angle of the human eye; blurring the annotation coordinate data set with a Gaussian function generates the annotated sample image set corresponding to that group of test users. The images appear as fog-like shadows, and regions of higher transparency are the regions the test users attended to more. The annotated sample images are generated with a Gaussian blur algorithm in which each annotated point affects all pixels of the image, so that the annotated sample image transitions more naturally.
In the embodiment of the present invention, specifically, the Gaussian function is calculated as in formula (1):

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)    (1)

where x is a random variable following a normal distribution, μ is the mean of x, and σ² is the variance of x (σ is also the radius of the Gaussian blur). Let I be the hotspot map generated from the distribution of the recalled hotspot mouse annotation data. The pixel value of the i-th point is calculated as in formula (2):

I_i = \sum_{j=1}^{n} \exp\left(-\frac{d_{ij}^2}{2S}\right)    (2)

where I is the distribution matrix of the hotspot map generated from one picture's user annotation coordinates, initialized as a zero matrix of the same size as the picture; i is a pixel in the matrix and n is the number of annotated data points; S = σ², where σ is the influence factor in the Gaussian function (32 in the embodiment of the present invention); and d_ij is the Euclidean distance between pixel i and annotation point j.
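A minimal Python sketch of this heat-map generation under the definitions above (the patent provides no code, so the function and parameter names are illustrative):

```python
import numpy as np

def annotation_heatmap(clicks, height, width, sigma=32.0):
    """Build an annotated hot-zone map from mouse-click coordinates,
    Gaussian-blurring each click over the whole image as in formula (2)."""
    s = sigma ** 2                                       # S = sigma^2
    ys, xs = np.mgrid[0:height, 0:width]                 # pixel grid
    heat = np.zeros((height, width), dtype=np.float64)   # I initialized to 0
    for cx, cy in clicks:                                # n annotated data points
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2             # squared Euclidean distance d^2
        heat += np.exp(-d2 / (2.0 * s))                  # every click affects all pixels
    peak = heat.max()
    return heat / peak if peak > 0 else heat             # normalize to [0, 1]
```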
S203: training a neural network model on the annotated sample image set and the clear sample image set of each group, respectively, to construct each group's prediction model.
In an embodiment of the present invention, illustratively, a first training set, a second training set and a third training set are obtained from the annotated sample image sets and clear sample image sets of the first, second and third groups, respectively, and convolutional neural network training is performed on each of the three training sets with a neural network model, so as to construct a first prediction model, a second prediction model and a third prediction model.
In the embodiment of the invention, test users with different forgetting degrees produce different visual attention distributions when viewing pictures. Test users with the same forgetting degree are therefore placed in one group, user annotation data sets are established for each forgetting degree, and the annotated sample image set and clear sample image set of each group are trained separately to construct each group's prediction model.
In the embodiment of the invention, the neural network model adopts a fully convolutional network, which produces end-to-end, pixel-to-pixel image prediction results. In the training process for each group, the user forgetting detection model based on user fixation-point recall data takes the group's annotated sample image set and clear sample image set as the training samples of the model. The samples are divided into a training set, a validation set and a test set at 60%, 20% and 20%; the model is trained on the training-set pictures and evaluated on the test set, and training on the clear sample images yields the prediction model.
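For illustration, a short Python sketch of the 60%/20%/20% split described above; the shuffling and seeding scheme is an assumption, as the patent does not specify one:

```python
import random

def split_dataset(pairs, seed=0):
    """Split (clear image, annotated truth map) pairs into
    train / validation / test sets at a 60/20/20 ratio."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)       # deterministic shuffle for reproducibility
    n = len(pairs)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (pairs[:n_train],                 # 60% training set
            pairs[n_train:n_train + n_val],  # 20% validation set
            pairs[n_train + n_val:])         # 20% test set
```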
S204: acquiring the forgetting degree feature of the target user, determining a target prediction model according to the target user's forgetting degree feature and the prediction models, and inputting the target image into the target prediction model to generate a saliency image.
In the embodiment of the invention, to predict the target user's saliency image, the forgetting degree feature of the target user is obtained first. Illustratively, the target user's forgetting degree evaluation data are obtained through the terminal, and the forgetting degree feature of the target user is obtained by analyzing the evaluation data. The target prediction model corresponding to the target user is then determined from the target user's forgetting degree feature. Illustratively, if the target user's forgetting degree feature indicates a high forgetting degree, the first prediction model is used as the target prediction model. The target image is used as the input of the target prediction model, and the output is the predicted saliency image of the target user for the target image. The generated saliency image matches the target user's visual saliency and also matches the target user's forgetting degree.
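A minimal sketch of this model selection step, assuming the three trained models are held in a dictionary keyed by group name (names are placeholders, not from the patent):

```python
from typing import Callable, Dict

def generate_saliency(target_image,
                      target_group: str,
                      models: Dict[str, Callable]):
    """Pick the prediction model matching the target user's
    forgetting-degree group and run the target image through it."""
    target_model = models[target_group]   # e.g. "first" -> first prediction model
    return target_model(target_image)     # predicted saliency image
```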
It can be seen from the foregoing embodiment that, by dividing the test users into a plurality of groups according to their forgetting degree features, obtaining each group's training set from its annotated sample image set and clear sample image set, constructing a plurality of prediction models from those training sets, determining the target prediction model according to the target user's forgetting degree feature and the prediction models, and inputting the target image into the target prediction model to generate a saliency image, the matching degree between the generated saliency image and the saliency of the user's actual visual system is improved.
Fig. 5 is a flowchart of the saliency image generation method provided by an embodiment of the present invention. Based on the embodiment of Fig. 2, and as shown in Fig. 5, the convolutional neural network training steps provided by an embodiment of the present invention are as follows:
S501: constructing a convolutional neural network model, inputting the clear sample image set into the convolutional neural network model for iterative training, and obtaining a saliency prediction image set.
In the embodiment of the invention, convolutional layers replace the fully connected layers of the VGG network, and this fully convolutional VGG network is used as the deep neural network for training. In a convolutional neural network, the image shrinks as it passes through each pooling layer, so the pooling layer acts as downsampling. Because the visual saliency prediction task must output a segmentation map of the same size as the input picture, upsampling is usually adopted to restore the prediction output to the input image size. Upsampling refers to techniques that restore an image to a higher resolution, and deconvolution is a learnable form of upsampling. Specifically, Fig. 6 is a schematic diagram of the neural network structure adopted in the embodiment of the present invention. As shown in Fig. 6, the FCN32s network is reduced through multiple pooling layers to 1/32 of the original image size, and the network output is restored by upsampling to a prediction of the same size as the original image.
Fig. 7 is a schematic diagram of the upsampling process provided by an embodiment of the present invention. As shown in Fig. 7, the fully convolutional network model provided by the embodiment of the present invention upsamples by deconvolution. Specifically, taking a 3 × 3 input as an example, a deconvolution operation with a 3 × 3 kernel, a stride of 2 and a padding of 1 produces an enlarged 5 × 5 output. The result of upsampling the output of the seventh convolutional layer 32 times in the fully convolutional network provided by the embodiment of the present invention is called FCN32s. Illustratively, Fig. 8 is a schematic diagram of the FCN skip structure provided in an embodiment of the present invention. As shown in Fig. 8, a skip structure improves the detail of the output image: FCN16s is generated by upsampling the conv7 prediction 2×, fusing it with the pool4 prediction, and then upsampling 16×. By the same principle, the conv7 output is upsampled 4×, fused with the prediction of the fourth pooling layer (already upsampled 2×) and the prediction of the third pooling layer, and then upsampled 8× to generate FCN8s. The skip structure fills in missing detail and improves the precision of the FCN prediction, making the prediction more accurate. Illustratively, FCN8s is adopted as the fully convolutional network model in the embodiment of the present invention, improving the accuracy of the prediction model.
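The deconvolution example above can be reproduced with a few lines of PyTorch (the framework choice is an assumption; the patent names no library):

```python
import torch
import torch.nn as nn

# 3x3 input, 3x3 kernel, stride 2, padding 1 -> 5x5 output,
# matching the upsampling example described for Fig. 7.
x = torch.randn(1, 1, 3, 3)
deconv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                            kernel_size=3, stride=2, padding=1)
print(deconv(x).shape)  # torch.Size([1, 1, 5, 5])
```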
S502: determining a loss function set according to the annotated sample image set and the saliency prediction image set, and determining the prediction model according to the minimum loss function in the loss function set.
In the embodiment of the invention, the clear sample image set and the corresponding annotated sample image set are input into the visual saliency model for training. During training, by minimizing the loss value between the ground-truth image and the prediction result, the model automatically learns human visual perception characteristics from the pictures and the corresponding crowdsourced eye-movement data. Illustratively, the saliency images annotated by user populations of different forgetting degrees are grayscale maps; the grayscale range [0, 255] is mapped to [0, 1], and the [0, 1] value of each pixel of the saliency map is taken as that pixel's ground-truth value, with values closer to 1 representing higher saliency.
In an embodiment of the present invention, the loss function of the visual saliency prediction model is as follows. Taking semantic segmentation with an FCN as an example, when there are 2 segmentation classes the prediction for each pixel is in {0, 1}. For the visual saliency prediction task, by contrast, the prediction result for each pixel is a value Q_i ∈ [0, 1], where 1 represents the highest degree of visual saliency and 0 the lowest. For this reason, the loss function of the visual saliency prediction model in the embodiment of the present invention is calculated as in formula (3):
Loss = -\sum_{i}\left[\, Q_i \log P_i + (1 - Q_i) \log (1 - P_i) \,\right]    (3)
where P_i = σ(f_i(Θ)) is the sigmoid of the predicted output f_i(Θ) of the FCN network, with σ(x) = (1 + exp(−x))⁻¹. The loss in formula (3) is commonly used for two-class problems, in which Q_i ∈ {0, 1}. For the visual saliency prediction task a similar loss calculation is used, the only change being that Q_i is defined on the interval 0 to 1, i.e. Q_i ∈ [0, 1]. This loss function better matches predicted values in [0, 1], and minimizing it also optimizes the relative-entropy loss (KL for short), one of the evaluation indexes of the visual saliency model.
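In PyTorch terms (again an assumed framework), formula (3) is binary cross-entropy with soft targets in [0, 1], which can be sketched as:

```python
import torch
import torch.nn as nn

# BCEWithLogitsLoss applies P_i = sigmoid(f_i(theta)) internally and
# accepts soft targets Q_i in [0, 1], matching formula (3) up to the
# mean reduction it applies by default.
criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(4, 1, 224, 224)   # dummy FCN output f(theta)
targets = torch.rand(4, 1, 224, 224)   # dummy truth maps Q in [0, 1]
loss = criterion(logits, targets)
print(loss.item())
```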
Illustratively, the momentum parameter of model training is 0.9, and the initial learning rate is set according to the number of pictures in the data set. For example, with 1000 pictures, the initial learning rate may be set to 10⁻⁵, decaying by a factor of 10 every 2 iterations. All input pictures are resized so that the longest edge has a fixed number of pixels. Specifically, the evaluation indexes are the linear correlation coefficient (CC for short) and the KL index, two of the indicators commonly used in visual saliency studies; minimizing the model loss is equivalent to optimizing the KL loss, one of the evaluation indexes. The visual saliency model is evaluated by comparing its prediction output on test-set pictures with the collected ground-truth user annotation maps, using the CC and KL indexes. CC measures the correlation between the prediction output and the user-annotated grayscale map, and treats false positives and false negatives symmetrically. A larger CC indicates that the two distributions are closer. CC is calculated as in formula (4):
CC(P, Q) = \frac{\mathrm{cov}(P, Q)}{\sigma_P \, \sigma_Q}    (4)
where Q is the distribution of the ground-truth values and P is the map being compared; here, Q is the hot-zone map of the user's mouse annotations and P is the prediction output of the visual saliency model. cov(P, Q) is the covariance of P and Q, and σ_P and σ_Q are the standard deviations of P and Q, respectively.
the CC ranges from-1 to 1, and when the value of CC is 1, it means that the correlation between the two picture distributions is the largest.
In the present embodiment, KL measures the difference between the predicted distribution and the ground truth. Illustratively, in the embodiment of the invention the ground truth is the annotated sample image. The value of each pixel in the annotated sample image set can be taken as a measure of that pixel's importance. KL heavily penalizes wrong predictions, so a higher KL indicates a weaker prediction capability of the saliency image prediction model; if the prediction differs greatly from the ground truth, the KL value becomes large. With Q the ground-truth map and P the predicted saliency map, KL is calculated as in formula (5):
KL(P, Q) = \sum_{i} Q_i \log \frac{Q_i}{P_i}    (5)
Formula (5) equals the cross-entropy between the ground truth and the prediction, −Σ_i Q_i log P_i, minus the entropy of the ground truth, −Σ_i Q_i log Q_i. A larger KL indicates a larger difference between the two distributions; when KL(P, Q) = 0 the two maps are identical, indicating that the current saliency image prediction model has strong prediction capability, and the prediction model can then be determined.
According to the above embodiment, a prediction model is constructed with a convolutional neural network: the clear sample image set is input into the convolutional neural network model for iterative training to obtain a saliency prediction image set, and the prediction model is determined from the annotated sample image set and the saliency prediction image set, improving the matching degree between the generated saliency image and the saliency of the user's actual visual system.
Fig. 9 is a schematic structural diagram of a saliency image generation device according to an embodiment of the present invention. As shown in fig. 9, the saliency image generation device includes: an obtaining module 901, an obtaining module 902, a constructing module 903 and a generating module 904.
An obtaining module 901, configured to obtain forgetting characteristics of all test users, and divide the test users into multiple groups according to the forgetting characteristics of the test users, where the forgetting characteristics of each group are the same;
an obtaining module 902, configured to obtain a clear sample image set and an annotated sample image set of each group, where the annotated sample image set corresponds to the clear sample image set;
a building module 903, configured to train the labeled sample image set and the clear sample image of each group respectively by using a neural network model, and build a prediction model of each group;
a generating module 904, configured to acquire the forgetting degree feature of a target user, determine a target prediction model according to the target user's forgetting degree feature and the prediction models, and input a target image into the target prediction model to generate a saliency image.
In a possible implementation manner, the obtaining module is specifically configured to:
dividing the test users into a first group, a second group and a third group according to the test users' forgetting degree features, where the forgetting degree of the first group's test users is high, the forgetting degree of the second group's test users is normal, and the forgetting degree of the third group's test users is low.
The apparatus provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in fig. 10, the server 100 of the present embodiment includes: a processor 1001 and a memory 1002; wherein:
a memory 1002 for storing computer-executable instructions;
the processor 1001 is configured to execute the computer executable instructions stored in the memory to implement the steps performed by the server in the above embodiments.
Reference may be made in particular to the description relating to the method embodiments described above.
In one possible design, the memory 1002 may be separate or integrated with the processor 1001.
When the memory 1002 is provided separately, the server further includes a bus 1003 for connecting the memory 1002 and the processor 1001.
The embodiment of the present invention further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for generating a saliency image as described above is implemented.
Embodiments of the present invention further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for generating a saliency image as described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to implement the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods described in the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A saliency image generation method characterized by comprising:
acquiring the forgetting degree features of all test users, and dividing the test users into a plurality of groups according to the test users' forgetting degree features, wherein the forgetting degree features within each group are the same;
acquiring a clear sample image set and an annotated sample image set of each group, wherein the annotated sample image set corresponds to the clear sample image set;
training a neural network model on the annotated sample image set and the clear sample image set of each group, to construct a prediction model for each group;
acquiring the forgetting degree feature of a target user, determining a target prediction model according to the target user's forgetting degree feature and the prediction models, and inputting a target image into the target prediction model to generate a saliency image.
2. The method of claim 1, wherein the dividing of the test users into a plurality of groups according to their forgetting degree features comprises:
dividing the test users into a first group, a second group and a third group according to the test users' forgetting degree features, wherein the forgetting degree of the first group's test users is high, the forgetting degree of the second group's test users is normal, and the forgetting degree of the third group's test users is low.
3. The method of claim 2, wherein the training of the neural network model on the annotated sample image set and the clear sample image set of each group to construct each group's prediction model comprises:
obtaining a first training set, a second training set and a third training set from the annotated sample image sets and clear sample image sets of the first, second and third groups, respectively, and performing convolutional neural network training on each of the three training sets with the neural network model to construct a first prediction model, a second prediction model and a third prediction model;
wherein the convolutional neural network training step is:
constructing a convolutional neural network model, inputting the clear sample image set into the convolutional neural network model for iterative training, and obtaining a saliency prediction image set;
and determining a loss function set according to the annotated sample image set and the saliency prediction image set, and determining the prediction model according to the minimum loss function in the loss function set.
4. The method of claim 1, wherein the acquiring of the clear sample image set and the annotated sample image set of each group comprises:
acquiring a clear sample image set, and performing Gaussian blur processing on the clear sample image set to obtain a test image set;
and obtaining the annotated sample image sets of the users in different groups according to those users' annotation data on the test image set, wherein the annotation data are coordinate data.
5. The method according to any one of claims 1 to 4, wherein the acquiring of the forgetting degree features of all test users comprises:
acquiring forgetting degree evaluation data of all test users;
obtaining the forgetting degree feature of a test user according to the forgetting degree evaluation data, wherein the forgetting degree evaluation data comprise at least one of: psychological scale assessment data, rapid test questionnaire assessment data, and self-assessment data.
6. A saliency image generation device characterized by comprising:
an acquisition module, configured to acquire the forgetting degree features of all test users and divide the test users into a plurality of groups according to those features, wherein the forgetting degree features within each group are the same;
an acquisition module, configured to acquire a clear sample image set and an annotated sample image set of each group, wherein the annotated sample image set corresponds to the clear sample image set;
a construction module, configured to train a neural network model on the annotated sample image set and the clear sample image set of each group, to construct each group's prediction model;
and a generation module, configured to acquire the forgetting degree feature of a target user, determine a target prediction model according to the target user's forgetting degree feature and the prediction models, and input a target image into the target prediction model to generate a saliency image.
7. The apparatus of claim 6, wherein the obtaining module is specifically configured to:
dividing the test users into a first group, a second group and a third group according to the test users' forgetting degree features, wherein the forgetting degree of the first group's test users is high, the forgetting degree of the second group's test users is normal, and the forgetting degree of the third group's test users is low.
8. A server, comprising a memory and at least one processor;
the memory is configured to store computer-executable instructions;
the at least one processor is configured to execute the computer-executable instructions stored by the memory, causing the at least one processor to perform the saliency image generation method of any one of claims 1 to 5.
9. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement the saliency image generation method of any one of claims 1 to 5.
10. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the saliency image generation method of any of claims 1 to 5.
CN202011453179.8A 2020-12-11 2020-12-11 Saliency image generation method and device and server Active CN112487225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011453179.8A CN112487225B (en) 2020-12-11 2020-12-11 Saliency image generation method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011453179.8A CN112487225B (en) 2020-12-11 2020-12-11 Saliency image generation method and device and server

Publications (2)

Publication Number Publication Date
CN112487225A (en) 2021-03-12
CN112487225B (en) 2022-07-08

Family

ID=74916674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011453179.8A Active CN112487225B (en) 2020-12-11 2020-12-11 Saliency image generation method and device and server

Country Status (1)

Country Link
CN (1) CN112487225B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2012268887A1 (en) * 2012-12-24 2014-07-10 Canon Kabushiki Kaisha Saliency prediction method
US20150117783A1 (en) * 2013-10-24 2015-04-30 Adobe Systems Incorporated Iterative saliency map estimation
CN105913064A (en) * 2016-04-12 2016-08-31 福州大学 Image visual saliency detection fitting optimization method
US20180181593A1 (en) * 2016-12-28 2018-06-28 Shutterstock, Inc. Identification of a salient portion of an image
CN108492322A (en) * 2018-04-04 2018-09-04 南京大学 A method of user's visual field is predicted based on deep learning
CN110570490A (en) * 2019-09-06 2019-12-13 北京航空航天大学 saliency image generation method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JE-JIN RYU等: "Saliency Map Generation Based on the Meanshift Clustering and the Pseudo Background", 《THE JOURNAL OF KOREAN INSTITUTE OF INFORMATION TECHNOLOGY》 *
LI Deren et al.: "Saliency Map Generation and Object Detection Based on Visual Contrast", Journal of Wuhan University *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648672A (en) * 2022-02-25 2022-06-21 北京百度网讯科技有限公司 Method and device for constructing sample image set, electronic equipment and readable storage medium
CN114549863A (en) * 2022-04-27 2022-05-27 西安电子科技大学 Light field saliency target detection method based on pixel-level noise label supervision
CN114549863B (en) * 2022-04-27 2022-07-22 西安电子科技大学 Light field saliency target detection method based on pixel-level noise label supervision

Also Published As

Publication number Publication date
CN112487225B (en) 2022-07-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant