CN109886317B - General image aesthetic evaluation method, system and equipment based on attention mechanism - Google Patents

General image aesthetic evaluation method, system and equipment based on attention mechanism Download PDF

Info

Publication number
CN109886317B
CN109886317B CN201910086789.XA CN201910086789A CN109886317B CN 109886317 B CN109886317 B CN 109886317B CN 201910086789 A CN201910086789 A CN 201910086789A CN 109886317 B CN109886317 B CN 109886317B
Authority
CN
China
Prior art keywords
aesthetic
image
square image
image block
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910086789.XA
Other languages
Chinese (zh)
Other versions
CN109886317A (en
Inventor
盛柯恺
董未名
马重阳
梅星
胡包钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910086789.XA priority Critical patent/CN109886317B/en
Publication of CN109886317A publication Critical patent/CN109886317A/en
Application granted granted Critical
Publication of CN109886317B publication Critical patent/CN109886317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition and machine learning, in particular to a general image aesthetic evaluation method, system and device based on an attention mechanism, and aims to improve the accuracy of an evaluation result. The evaluation method of the present invention includes: scaling the size of the image to be evaluated, and enabling the shortest side length of the scaled image to be equal to a preset first length; randomly cutting a preset number of square image blocks on the zoomed image, and enabling the side length of each square image block to be equal to a preset second length; inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling level confidence coefficient vector; calculating the mean value of confidence vectors of the preset number of two-dimensional aesthetic feeling levels; and performing aesthetic evaluation on the image to be evaluated according to the mean value. The accuracy of the invention is obviously higher than that of the prior technical scheme; no additional information by means of the image is required; the evaluation process consumes less time and the model occupies less space.

Description

General image aesthetic evaluation method, system and equipment based on attention mechanism
Technical Field
The invention relates to the technical field of image recognition and machine learning, in particular to a general image aesthetic evaluation method, system and device based on an attention mechanism.
Background
The general image aesthetic evaluation aims to intelligently judge the aesthetic degree of an input image by using a computer system, and requires that the judgment given by the system has higher consistency with the judgment made by a human expert with good aesthetic quality. General image aesthetic evaluation is one of the bases of multiple technologies such as image recommendation, image post-processing and the like, and is also a cross subject (including cognitive psychology, computer vision, machine learning and the like), so that effective evaluation of the aesthetic degree of any input image is an important issue worthy of attention and investment.
Currently, mainstream general image aesthetic evaluation methods all utilize additional information of an image (e.g., object type, image scene type, image attribute information, etc. contained in the image), and have the following two technical schemes:
the first technical scheme is as follows: the network model is designed to be a multitask output mode (i.e., Multi-task learning) in combination with the image aesthetic rating labels and the image additional information.
The second technical scheme is as follows: first training a model of image aesthetic evaluation and a model of a plurality of related tasks, given image aesthetic rating labels and additional information; the representations of certain hidden layers of the models are then stitched together in a certain designed way and based on this a model for aesthetic assessment tasks is trained.
The first scheme adopts a multi-task training design scheme, and aims to improve the utilization rate of data through a multi-task training mode and inject more information related to image aesthetic evaluation into a model. The training method needs to balance primary and secondary relations among a plurality of tasks, and cannot ensure that a multi-task mode can achieve the purpose.
The second solution adopts a module based on characterization aggregation (for example, taking statistics of the characterization vector as an input of the aesthetic evaluation module), and aims to improve the effect of aesthetic evaluation of the image by effectively combining various attribute information (for example, scene information of the image, object information contained in the image, and the like) about the image. Such design methods bring a large amount of training, and are not an End-to-End (End-to-End) training mode, and cannot effectively complete the training task from the data plane.
Both the two schemes need a large amount of manpower to label the extra information of the image, and the kind of the extra information depends on the design of experts, so that the time and the labor are consumed, and the maintenance and the expansion are not easy.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a general image aesthetic evaluation method, system and device based on an attention mechanism, which not only improves the classification accuracy, but also has faster evaluation speed.
In a first aspect of the present invention, a general image aesthetic evaluation method based on an attention mechanism is provided, the evaluation method comprising:
step A1, scaling the size of the image to be evaluated, and making the shortest side length of the scaled image equal to a preset first length;
a2, randomly cutting a preset number of square image blocks on the zoomed image, wherein the side length of each square image block is equal to a preset second length;
step A3, inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling grade confidence coefficient vector;
step A4, calculating the mean value of the confidence vectors of the two-dimensional aesthetic sense levels of the preset number;
and A5, performing aesthetic evaluation on the image to be evaluated according to the average value.
Preferably, the convolutional neural network model comprises a backbone network, a full connection layer and a softmax module which are connected in sequence;
wherein,
the main network is used for receiving the square image blocks and outputting the characterization vectors with dimension H multiplied by 1 corresponding to the square image blocks;
the dimension of the full connection layer is (2+ K) multiplied by H, and the aesthetic semantic vector with the dimension of (2+ K) multiplied by 1 is calculated according to the representation vector;
the softmax module is used for calculating an aesthetic discrimination confidence coefficient vector with the dimension of (2+ K) multiplied by 1 according to the aesthetic semantic vector; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic level confidence vector;
k is the number of the added random vector lines and a preset value; h is the number of rows of the characterization vector.
Preferably, the training method of the convolutional neural network model includes:
step B1, randomly extracting a preset number of images from the training set, carrying out size scaling on each image according to the requirement that the shortest side length is equal to the preset first length, and randomly cutting a square image block with the side length being the preset second length from each scaled image;
step B2, respectively inputting each square image block obtained by cutting into the convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block;
step B3, calculating the training weight omega corresponding to each square image block according to the following formula respectively according to the two-dimensional aesthetic sense level confidence coefficient vector corresponding to each square image blockp
Figure BDA0001962045660000031
Wherein,
Figure BDA0001962045660000032
representing an aesthetic class prediction made by the convolutional neural network model on a square image block p;
Figure BDA0001962045660000033
representing the aesthetic category of the artificial label corresponding to the square image block p;
Figure BDA0001962045660000034
representing the given model parameters theta and the input square image block p,
Figure BDA0001962045660000035
the probability of (d); β represents a weight control factor;
step B4, training weight omega corresponding to each square image blockpThe weighted cross entropy losses are calculated according to the following equations, respectively:
Figure BDA0001962045660000036
step B5, according to the weighted cross entropy loss, carrying out gradient back transmission and model parameter updating;
and step B6, repeatedly executing the iterative training steps from the step B1 to the step B5 until the preset number of optimization iterative rounds is completed or the optimization process reaches a convergence state.
Preferably, the step of "performing gradient back-transfer and model parameter update according to the weighted cross-entropy loss" in step B5 includes:
calculating the model parameters to be updated according to the weighted cross entropy loss and the following formula:
Figure BDA0001962045660000041
wherein, theta' is a model parameter to be updated; λ represents a learning rate for controlling a step size of each parameter update; b represents a set of square image blocks with preset numbers obtained by cutting;
and carrying out gradient back transmission and updating the parameter of the convolutional neural network model according to the model parameter to be updated.
Preferably, the aesthetic category of the manual annotation has two values: 0 indicates that the image is low in aesthetic sense, and 1 indicates that the image is high in aesthetic sense;
accordingly, the two-dimensional aesthetic level confidence vector is represented as:
Figure BDA0001962045660000042
wherein,
Figure BDA0001962045660000043
representing the aesthetic class prediction made by the convolutional neural network model on the square image block p, wherein theta is a parameter of the convolutional neural network model; first dimension element
Figure BDA0001962045660000044
And a second dimension element
Figure BDA0001962045660000045
Representing the given model parameter theta and the input square image block p respectively,
Figure BDA0001962045660000046
probability of and
Figure BDA0001962045660000047
the probability of (c).
Preferably, the numerical matrices of the square image blocks are normalized, whitened, and divided by the variance.
In a second aspect of the present invention, an attention-based general image aesthetics evaluation system is presented, the evaluation system comprising:
a scaling module configured to: scaling the size of the image to be evaluated, and enabling the shortest side length of the scaled image to be equal to a preset first length;
a cropping module configured to: randomly cutting a preset number of square image blocks on the zoomed image, wherein the side length of each square image block is equal to a preset second length;
a confidence vector generation module configured to: inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling grade confidence coefficient vector;
a mean calculation module configured to: calculating the average value of the confidence vectors of the two-dimensional aesthetic feeling levels of the preset number;
an evaluation module configured to: and performing aesthetic evaluation on the image to be evaluated according to the average value.
Preferably, the convolutional neural network model comprises a backbone network, a full connection layer and a softmax module which are connected in sequence;
wherein,
the main network is used for receiving the square image blocks and outputting the characterization vectors with dimension H multiplied by 1 corresponding to the square image blocks;
the dimension of the full connection layer is (2+ K) multiplied by H, and the aesthetic semantic vector with the dimension of (2+ K) multiplied by 1 is calculated according to the representation vector;
the softmax module is used for calculating an aesthetic discrimination confidence coefficient vector with the dimension of (2+ K) multiplied by 1 according to the aesthetic semantic vector; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic level confidence vector;
k is the number of the added random vector lines and a preset value; h is the number of rows of the characterization vector.
Preferably, the evaluation system further comprises:
a training module configured to train the convolutional neural network model;
the training module comprises:
a scaling and cropping unit configured to: randomly extracting a preset number of images from a training set, carrying out size scaling on each image according to the requirement that the shortest side length is equal to the preset first length, and randomly cutting a square image block with the side length being the preset second length from each scaled image;
a confidence vector generation unit configured to: respectively inputting each square image block obtained by cutting into the convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block;
a weight calculation unit configured to: according to the two-dimensional aesthetic feeling level confidence coefficient vector corresponding to each square image block, the training weight omega corresponding to the square image block is calculated according to the following formulap
Figure BDA0001962045660000051
Wherein,
Figure BDA0001962045660000052
representing an aesthetic class prediction made by the convolutional neural network model on a square image block p;
Figure BDA0001962045660000053
representing the aesthetic category of the artificial label corresponding to the square image block p;
Figure BDA0001962045660000054
representing the given model parameters theta and the input square image block p,
Figure BDA0001962045660000055
the probability of (d); β represents a weight control factor;
a cross entropy loss calculation unit configured to: according to the training weight omega corresponding to each square image blockpThe weighted cross entropy losses are calculated according to the following equations, respectively:
Figure BDA0001962045660000061
a parameter updating unit configured to: according to the weighted cross entropy loss, carrying out gradient back transmission and model parameter updating;
a control unit configured to: and repeatedly calling the scaling and cutting unit, the confidence coefficient vector generating unit, the weight calculating unit, the cross entropy loss calculating unit and the parameter updating unit to perform iterative training until a preset number of iterative rounds of optimization is completed or the optimization process reaches a convergence state.
Preferably, the parameter updating unit includes:
a parameter calculating subunit, configured to calculate, according to the weighted cross entropy loss, a model parameter to be updated according to the following formula:
Figure BDA0001962045660000062
wherein, theta' is a model parameter to be updated; λ represents a learning rate for controlling a step size of each parameter update; b represents a set of square image blocks with preset numbers obtained by cutting;
and the parameter updating subunit is configured to perform gradient return and update the parameter of the convolutional neural network model according to the model parameter to be updated.
Preferably, the aesthetic category of the manual annotation has two values: 0 indicates that the image is low in aesthetic sense, and 1 indicates that the image is high in aesthetic sense;
accordingly, the two-dimensional aesthetic level confidence vector is represented as:
Figure BDA0001962045660000063
wherein,
Figure BDA0001962045660000064
representing the aesthetic class prediction made by the convolutional neural network model on the square image block p, wherein theta is a parameter of the convolutional neural network model;
Figure BDA0001962045660000065
and
Figure BDA0001962045660000066
representing the given model parameter theta and the input square image block p respectively,
Figure BDA0001962045660000067
probability of and
Figure BDA0001962045660000068
the probability of (c).
Preferably, the value matrix of each square image block needs to be normalized, whitened, and divided by the variance.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned attention-based general image aesthetics evaluation method.
In a fourth aspect of the present invention, a control apparatus is provided, including:
a processor adapted to load a program;
a memory adapted to store the program;
the program is adapted to be loaded and executed by the processor to implement the above-described attention-based general image aesthetics evaluation method.
Compared with the closest prior art, the invention has the following beneficial effects:
(1) the accuracy of the image aesthetic evaluation is obviously higher than that of the prior technical scheme;
(2) the method does not need to use extra information of the image, and directly marks the image level from the aesthetic level to train the convolutional neural network;
(3) by adding random vector rows in the full-connection layer, the problem of supersaturation of confidence coefficient in the aesthetic two-classification learning process is effectively avoided;
(4) the aesthetic evaluation function of the system on a single image takes less than 0.1 ms and the space occupied by the model (approx. 40MB) is smaller in the prior art solutions.
Drawings
FIG. 1 is a schematic diagram of the main steps of an embodiment of the general image aesthetic assessment method based on attention mechanism of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network model in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main steps of an embodiment of the training method of the convolutional neural network model of the present invention;
FIG. 4 is an example of the results of an aesthetic evaluation on an AVA data set by an embodiment of the evaluation method of the present invention;
FIG. 5 is a diagram illustrating the results of an aesthetic evaluation of images obtained by different scaling methods according to an embodiment of the evaluation method of the present invention;
FIG. 6 is a schematic diagram of the main components of an embodiment of the general image aesthetic evaluation system based on attention mechanism of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the terms "first" and "second" in the description of the present invention are used for convenience of description only and do not indicate or imply relative importance of the devices, elements or parameters, and therefore should not be construed as limiting the present invention.
According to the method, corresponding training weights are given according to expected category confidence coefficients of different square image blocks in the process of training the model through a machine learning method and a neural network algorithm, so that the purpose of quickly and efficiently optimizing network parameters without additional image information and obtaining a convolutional neural network model with a good aesthetic evaluation effect is achieved.
In order to achieve the purpose, the method is based on a convolutional neural network model in deep learning and a loss function based on an attention mechanism, and different training weights are given to different square image blocks of the same image in the training process, so that a user can be helped to efficiently train and obtain a general image aesthetic evaluation model with good judgment performance under the condition of missing an image additional information label.
In the embodiment of the invention, the aesthetic category of the artificial annotation has two values of 0 and 1, wherein 0 represents that the aesthetic feeling of the image is low, and 1 represents that the aesthetic feeling of the image is high. Accordingly, the two-dimensional aesthetic level confidence vector output by the convolutional neural network is as shown in equation (1):
Figure BDA0001962045660000081
wherein,
Figure BDA0001962045660000082
representing the aesthetic class prediction of the square image block p by the convolutional neural network model, wherein theta is a parameter of the convolutional neural network model; first dimension element
Figure BDA0001962045660000083
And a second dimension element
Figure BDA0001962045660000084
Representing the given model parameter theta and the input square image block p respectively,
Figure BDA0001962045660000085
probability of and
Figure BDA0001962045660000086
the probability of (c).
FIG. 1 is a schematic diagram of the main steps of an embodiment of the general image aesthetic assessment method based on attention mechanism of the present invention. As shown in FIG. 1, the evaluation method of the present embodiment includes steps A1-A5:
step a1, performing size scaling on the image to be evaluated, so that the shortest side length of the scaled image is equal to a preset first length (256 in this embodiment).
Step a2, randomly cropping a preset number (10 in this embodiment) of square image blocks on the scaled image, where the side length of each square image block is equal to a preset second length (224 in this embodiment), and the preset second length is less than or equal to the preset first length.
And step A3, inputting each square image block into the trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic sense level confidence coefficient vector.
Step A4, calculating the mean value of the confidence vectors of the two-dimensional aesthetic sense levels of the preset number.
Step a5, performing aesthetic evaluation on the image to be evaluated according to the mean value.
In this step, the numerical value of the second-dimension element in the mean vector is taken, and the decision confidence coefficient for determining that the image to be evaluated is high in aesthetic sense degree can be obtained. And if the numerical value of the second dimension element is larger than or equal to the numerical value of the first dimension element, the aesthetic feeling degree of the image to be evaluated is considered to be high, otherwise, the aesthetic feeling degree of the image to be evaluated is considered to be low. Also, the larger the value of the second-dimension element is than the value of the first-dimension element, it means that the aesthetic degree of the image to be evaluated is higher.
Fig. 2 is a schematic diagram of the structure of the convolutional neural network model in the embodiment of the present invention. As shown in fig. 2, the convolutional neural network model in the embodiment of the present invention includes a backbone network, a full connection layer, and a softmax module, which are connected in sequence.
Wherein, a backbone network (Back-bone network, a common implementation scheme is such as VGG, or ResNet, etc.) is used to receive a square image block and output a Representation vector (Representation vector) with dimension H × 1 corresponding to the square image block; the dimension of the Fully connected layer (full connected layer) is (2+ K) × H, and is used to calculate an aesthetic semantic vector (aesthetical vector) with dimension (2+ K) × 1 from the characterization vector: { z1,z2,…,z2+K}; the softmax module is used for calculating an aesthetic discrimination confidence coefficient vector (Confidences) with the dimension of (2+ K) multiplied by 1 according to the aesthetic semantic vector. The calculation method of the output value of each dimension of the softmax module is shown as the formula (2):
Figure BDA0001962045660000091
zivalue of ith dimension in aesthetic semantic vector output for fully connected layer, σ (z)iThe value of the ith dimension, i ═ 1,2, …, K, in the aesthetic discrimination confidence vector output by the softmax module.
And combining the value of the first dimension and the value of the second dimension of the aesthetic judgment confidence coefficient vector into a two-dimensional aesthetic level confidence coefficient vector corresponding to the square image block. Wherein the value of the first dimension represents a discrimination confidence level with a low aesthetic measure; the value of the second dimension represents a discrimination confidence that the aesthetic measure is high; k is the number of the added random vector lines and a preset value; h is the number of rows characterizing the vector.
It should be noted that: unlike the conventional two-classification aesthetic evaluation model design, we introduce K random vectors at the full-connected layer, and the conventional two-classification model is shown in formula (3):
Figure BDA0001962045660000092
after adding K random vectors, as shown in equation (4):
Figure BDA0001962045660000101
the method weakens the numerical value of the two dimensions in front of the aesthetic judgment confidence coefficient vector, thereby preventing the problem of over confidence level saturation (over confidence level) in the aesthetic two-classification learning process to a certain extent. The effectiveness of this design has been validated by our experimental results.
FIG. 3 is a schematic diagram of the main steps of an embodiment of the training method of the convolutional neural network model of the present invention. As shown in FIG. 3, the training method of the present embodiment includes steps B1-B5:
step B1, randomly extracting a preset number of images (32 in this embodiment) from the training set, scaling each image according to the requirement that the shortest side length is equal to the preset first length (256 in this embodiment), and randomly cutting a square image block with a side length of the preset second length (224 in this embodiment) from each scaled image.
And step B2, respectively inputting each square image block obtained by cutting into a convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block, as shown in formula (1).
Step B3, calculating the training weight omega corresponding to each square image block according to the formula (5) and the two-dimensional aesthetic sense level confidence coefficient vector corresponding to each square image blockp
Figure BDA0001962045660000102
Wherein,
Figure BDA0001962045660000103
representing an aesthetic class prediction made by the convolutional neural network model on the square image block p;
Figure BDA0001962045660000104
representing persons to which square image blocks p correspondThe aesthetic category of the manual labeling has two values: 0 indicates that the image is low in aesthetic sense, and 1 indicates that the image is high in aesthetic sense;
Figure BDA0001962045660000105
representing the given model parameters theta and the input square image block p,
Figure BDA0001962045660000106
the probability of (d); β represents a weight control factor.
Step B4, training weight omega corresponding to each square image blockpThe weighted cross entropy losses are calculated according to equation (6) respectively:
Figure BDA0001962045660000107
and step B5, carrying out gradient back transmission and model parameter updating according to the weighted cross entropy loss. The method specifically comprises the following steps:
step B51, calculating the model parameters to be updated according to formula (7) according to the weighted cross entropy loss:
Figure BDA0001962045660000111
wherein, theta' is a model parameter to be updated; λ represents a learning rate for controlling a step size of each parameter update; b represents a set of square image blocks with preset numbers obtained by cutting;
and step B52, according to the model parameters to be updated, carrying out gradient back transmission and updating the parameters of the convolutional neural network model.
And step B6, repeatedly executing the iterative training steps from the step B1 to the step B5 until the preset number of optimization iterative rounds is completed or the optimization process reaches a convergence state.
The square image block value matrices input in step a3 and step B2 need to be divided by 255 to normalize the value range to [0, 1], and then perform whitening operation, i.e. subtract the mean value of each image channel and divide by the square difference to make the mean value zero and the variance one.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
FIG. 4 is an example of the result of an aesthetic assessment on an AVA (Audio Visual actions) dataset by an embodiment of the assessment method of the present invention. As shown in fig. 4, the left side of the dotted line is an image determined as having low aesthetic sense by the model, the right side of the dotted line is an image determined as having high aesthetic sense by the model, and in the average value of the confidence vectors of the two-dimensional aesthetic sense levels calculated in step a4, the higher the value of the first-dimensional element is, the lower the aesthetic sense degree is represented, and the higher the value of the second-dimensional element is, the higher the aesthetic sense degree is represented.
FIG. 5 is a diagram illustrating the results of aesthetic evaluation of images obtained by different scaling methods according to an embodiment of the evaluation method of the present invention. As shown in fig. 5, in each image pair, the left image is obtained by uniform scaling (i.e., scaling the shorter side to the target value while keeping the aspect ratio of the image constant), and the right image is obtained by global scaling (i.e., scaling both the length and width of the image to the target values). Two values separated by "/" are plotted below each picture, the first value representing the assessment category of the present invention and the second value representing the confidence level of the assessment category. For example, the first image pair in the first row, the left and right images are labeled with "1/0.740" and "1/0.562", respectively, which indicates that the final evaluation category given by the model after the two scaling methods is "high" in aesthetic sense, but the right image has relatively low confidence because it is compressed up and down. If there is a second image pair in the first row, the left and right images are labeled with "1/0.670" and "0/0.521", respectively, indicating that the predicted aesthetic level of the left image is "high", and the predicted aesthetic level of the right image is "low" due to severe distortion.
As can be seen from fig. 5: even if no image sample after global scaling is input in the model training stage, the trained model can reasonably detect different image aesthetic losses caused by uniform scaling and global scaling: by giving different assessment categories or different confidences.
Compared with the statistical experimental results (classification accuracy) of other evaluation methods on the AVA data set, the method disclosed by the invention is shown in the following table 1:
TABLE 1
Name of method Experimental results on the AVA dataset
VGG-Scale 73.8
VGG-Pad 72.9
VGG-Crop 71.2
SPP 76.0
DMA-Net 75.41
MNA-CNN 77.1
RAPID 75.42
A&C CNN 74.51
MTCNN 78.56
MTRLCNN 79.08
BDN 78.08
A-Lamp 82.5
NIMA 81.51
The invention 83.03
The comparison methods listed in table 1 are representative methods in the field of image aesthetic evaluation research, and thus the comparison results are of great significance. The comparison method comprises the following steps: VGG-Scale (Very Deep CNN with Scaled Image), VGG-Pad (Very Deep CNN with Scaled Image), VGG-crop (Very Deep CNN with Scaled Image), SPP (CNN with Spatial gradient Point), DMA-Net (Deep Multi-Page Aggregation Network), MNA-CNN (Multi-Net Adaptive Spatial Neural Network), RAPID (Two-column CNN for random graphical analysis), A & C (atomic resolution with Spatial gradients) and multimedia analysis, Multi-skin NN (MTJN) and branched texture Image, hierarchical texture map (hierarchical map-texture map) and hierarchical map (hierarchical map-texture map) are used. As can be seen from the experimental results in Table 1, the classification accuracy of the present invention is significantly higher than that of the other comparative methods.
Based on the same technical concept as the above evaluation method, the invention also provides an evaluation system, which is specifically described below.
FIG. 6 is a schematic diagram of the main components of an embodiment of the general image aesthetic evaluation system based on attention mechanism of the present invention. As shown in fig. 6, the evaluation system 1 of the present embodiment includes: scaling module 10, clipping module 20, confidence vector generation module 30, mean calculation module 40, evaluation module 50, and training module 60.
Wherein the scaling module 10 is configured to: scaling the size of the image to be evaluated, and enabling the shortest side length of the scaled image to be equal to a preset first length; the cropping module 20 is configured to: randomly cutting a preset number of square image blocks on the zoomed image, wherein the side length of the square image blocks is equal to a preset second length; the confidence vector generation module 30 is configured to: inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling level confidence coefficient vector; the mean calculation module 40 is configured to: calculating the mean value of confidence vectors of the preset number of two-dimensional aesthetic feeling levels; the evaluation module 50 is configured to: performing aesthetic evaluation on the image to be evaluated according to the mean value; the training module 60 is configured to train the convolutional neural network model.
In this embodiment, the training module 60 includes: the device comprises a scaling and clipping unit, a confidence coefficient vector generation unit, a weight calculation unit, a cross entropy loss calculation unit, a parameter updating unit and a control unit.
Wherein the scaling and cropping unit is configured to: randomly extracting a preset number of images from a training set, carrying out size scaling on each image according to the requirement that the shortest side length is equal to a preset first length, and randomly cutting a square image block with the side length being a preset second length from each scaled image; the confidence vector generation unit is configured to: respectively inputting each square image block obtained by cutting into a convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block;the weight calculation unit is configured to: according to the two-dimensional aesthetic feeling level confidence coefficient vector corresponding to each square image block, respectively calculating the training weight omega corresponding to the square image block according to a formula (5)p(ii) a The cross entropy loss calculation unit is configured to: according to the training weight omega corresponding to each square image blockpCalculating weighted cross entropy losses according to equation (6) respectively; the parameter updating unit is configured to: according to the weighted cross entropy loss, carrying out gradient back transmission and model parameter updating; the control unit is configured to: and repeatedly calling the scaling and clipping unit, the confidence coefficient vector generation unit, the weight calculation unit, the cross entropy loss calculation unit and the parameter updating unit to perform iterative training until a preset optimization iteration round number is completed or the optimization process reaches a convergence state.
Specifically, the parameter updating unit in this embodiment includes: a parameter calculating subunit and a parameter updating subunit.
Wherein the parameter calculation subunit is configured to calculate the model parameters to be updated according to formula (7) based on the weighted cross entropy loss; the parameter updating subunit is configured to perform gradient back transmission and update the parameter of the convolutional neural network model according to the model parameter to be updated.
Further, the present invention also proposes an embodiment of a storage device, in which a plurality of programs are stored, said programs being adapted to be loaded and executed by a processor to implement the above-mentioned attention-based general image aesthetics evaluation method.
Further, an embodiment of a control device is also presented that includes a processor and a memory. Wherein the processor is adapted to load a program and the memory is adapted to store said program, said program being adapted to be loaded and executed by said processor to implement the above-described attention-based general image aesthetics evaluation method.
Those of skill in the art will appreciate that the various illustrative method steps, modules, elements described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (12)

1. A general image aesthetic assessment method based on attention mechanism, characterized in that the assessment method comprises:
step A1, scaling the size of the image to be evaluated, and making the shortest side length of the scaled image equal to a preset first length;
a2, randomly cutting a preset number of square image blocks on the zoomed image, wherein the side length of each square image block is equal to a preset second length;
step A3, inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling grade confidence coefficient vector;
step A4, calculating the mean value of the confidence vectors of the two-dimensional aesthetic sense levels of the preset number;
step A5, performing aesthetic evaluation on the image to be evaluated according to the average value;
the convolutional neural network model comprises a backbone network, a full connection layer and a softmax module which are connected in sequence;
wherein,
the main network is used for receiving the square image blocks and outputting the characterization vectors with dimension H multiplied by 1 corresponding to the square image blocks;
the dimension of the full connection layer is (2+ K) multiplied by H, and the aesthetic semantic vector with the dimension of (2+ K) multiplied by 1 is calculated according to the representation vector;
the softmax module is used for calculating an aesthetic discrimination confidence coefficient vector with the dimension of (2+ K) multiplied by 1 according to the aesthetic semantic vector; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic level confidence vector;
k is the number of the added random vector lines and a preset value; h is the number of rows of the characterization vector.
2. The attention mechanism-based general image aesthetic evaluation method according to claim 1, wherein the training method of the convolutional neural network model comprises:
step B1, randomly extracting a preset number of images from the training set, carrying out size scaling on each image according to the requirement that the shortest side length is equal to the preset first length, and randomly cutting a square image block with the side length being the preset second length from each scaled image;
step B2, respectively inputting each square image block obtained by cutting into the convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block;
step B3, calculating the training weight omega corresponding to each square image block according to the following formula respectively according to the two-dimensional aesthetic sense level confidence coefficient vector corresponding to each square image blockp
Figure FDA0002797428340000021
Wherein,
Figure FDA0002797428340000022
representing an aesthetic class prediction made by the convolutional neural network model on a square image block p;
Figure FDA0002797428340000023
representing the aesthetic category of the artificial label corresponding to the square image block p;
Figure FDA0002797428340000024
Figure FDA0002797428340000025
representing the given model parameters theta and the input square image block p,
Figure FDA0002797428340000026
the probability of (d); β represents a weight control factor;
step B4, training weight omega corresponding to each square image blockpThe weighted cross entropy losses are calculated according to the following equations, respectively:
Figure FDA0002797428340000027
step B5, according to the weighted cross entropy loss, carrying out gradient back transmission and model parameter updating;
and step B6, repeatedly executing the iterative training steps from the step B1 to the step B5 until the preset number of optimization iterative rounds is completed or the optimization process reaches a convergence state.
3. The method for universal image aesthetic assessment based on attention mechanism according to claim 2, wherein the step of "performing gradient back-transmission and model parameter update according to the weighted cross entropy loss" in step B5 comprises:
calculating the model parameters to be updated according to the weighted cross entropy loss and the following formula:
Figure FDA0002797428340000028
wherein, theta' is a model parameter to be updated; λ represents a learning rate for controlling a step size of each parameter update; b represents a set of square image blocks with preset numbers obtained by cutting;
and carrying out gradient back transmission and updating the parameter of the convolutional neural network model according to the model parameter to be updated.
4. The attention mechanism-based general image aesthetic assessment method according to claim 2, wherein the aesthetic category of the manual annotation has two values: 0 indicates that the image is low in aesthetic sense, and 1 indicates that the image is high in aesthetic sense;
accordingly, the two-dimensional aesthetic level confidence vector is represented as:
Figure FDA0002797428340000031
wherein,
Figure FDA0002797428340000032
representing the aesthetic class prediction made by the convolutional neural network model on the square image block p, wherein theta is a parameter of the convolutional neural network model; first dimension element
Figure FDA0002797428340000033
And a second dimension element
Figure FDA0002797428340000034
Representing the given model parameter theta and the input square image block p respectively,
Figure FDA0002797428340000035
probability of and
Figure FDA0002797428340000036
the probability of (c).
5. The attention-based universal image aesthetics evaluation method according to any one of claims 1-4, wherein the numerical matrices of the square image blocks are all subjected to normalization, whitening operations, and divided by the variance.
6. A universal image aesthetics evaluation system based on an attention mechanism, said evaluation system comprising:
a scaling module configured to: scaling the size of the image to be evaluated, and enabling the shortest side length of the scaled image to be equal to a preset first length;
a cropping module configured to: randomly cutting a preset number of square image blocks on the zoomed image, wherein the side length of each square image block is equal to a preset second length;
a confidence vector generation module configured to: inputting each square image block into a trained convolutional neural network model, and outputting a corresponding two-dimensional aesthetic feeling grade confidence coefficient vector;
a mean calculation module configured to: calculating the average value of the confidence vectors of the two-dimensional aesthetic feeling levels of the preset number;
an evaluation module configured to: performing aesthetic evaluation on the image to be evaluated according to the mean value;
the convolutional neural network model comprises a backbone network, a full connection layer and a softmax module which are connected in sequence;
wherein,
the main network is used for receiving the square image blocks and outputting the characterization vectors with dimension H multiplied by 1 corresponding to the square image blocks;
the dimension of the full connection layer is (2+ K) multiplied by H, and the aesthetic semantic vector with the dimension of (2+ K) multiplied by 1 is calculated according to the representation vector;
the softmax module is used for calculating an aesthetic discrimination confidence coefficient vector with the dimension of (2+ K) multiplied by 1 according to the aesthetic semantic vector; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic level confidence vector;
k is the number of the added random vector lines and a preset value; h is the number of rows of the characterization vector.
7. The attention mechanism-based universal image aesthetics evaluation system according to claim 6, further comprising:
a training module configured to train the convolutional neural network model;
the training module comprises:
a scaling and cropping unit configured to: randomly extracting a preset number of images from a training set, carrying out size scaling on each image according to the requirement that the shortest side length is equal to the preset first length, and randomly cutting a square image block with the side length being the preset second length from each scaled image;
a confidence vector generation unit configured to: respectively inputting each square image block obtained by cutting into the convolutional neural network model to obtain a two-dimensional aesthetic feeling level confidence coefficient vector corresponding to the square image block;
a weight calculation unit configured to: according to the two-dimensional aesthetic feeling level confidence coefficient vector corresponding to each square image block, the training weight omega corresponding to the square image block is calculated according to the following formulap
Figure FDA0002797428340000041
Wherein,
Figure FDA0002797428340000042
representing an aesthetic class prediction made by the convolutional neural network model on a square image block p;
Figure FDA0002797428340000043
representing the aesthetic category of the artificial label corresponding to the square image block p;
Figure FDA0002797428340000044
Figure FDA0002797428340000045
representing the given model parameters theta and the input square image block p,
Figure FDA0002797428340000046
the probability of (d); β represents a weight control factor;
a cross entropy loss calculation unit configured to: according to the training weight omega corresponding to each square image blockpThe weighted cross entropy losses are calculated according to the following equations, respectively:
Figure FDA0002797428340000047
a parameter updating unit configured to: according to the weighted cross entropy loss, carrying out gradient back transmission and model parameter updating;
a control unit configured to: and repeatedly calling the scaling and cutting unit, the confidence coefficient vector generating unit, the weight calculating unit, the cross entropy loss calculating unit and the parameter updating unit to perform iterative training until a preset number of iterative rounds of optimization is completed or the optimization process reaches a convergence state.
8. The attention mechanism-based universal image aesthetics evaluation system according to claim 7, wherein said parameter update unit comprises:
a parameter calculating subunit, configured to calculate, according to the weighted cross entropy loss, a model parameter to be updated according to the following formula:
Figure FDA0002797428340000051
wherein, theta' is a model parameter to be updated; λ represents a learning rate for controlling a step size of each parameter update; b represents a set of square image blocks with preset numbers obtained by cutting;
and the parameter updating subunit is configured to perform gradient return and update the parameter of the convolutional neural network model according to the model parameter to be updated.
9. The attention-based universal image aesthetic evaluation system according to claim 7, wherein the aesthetic category of the manual annotation has two values: 0 indicates that the image is low in aesthetic sense, and 1 indicates that the image is high in aesthetic sense;
accordingly, the two-dimensional aesthetic level confidence vector is represented as:
Figure FDA0002797428340000052
wherein,
Figure FDA0002797428340000053
representing the aesthetic class prediction made by the convolutional neural network model on the square image block p, wherein theta is a parameter of the convolutional neural network model;
Figure FDA0002797428340000054
and
Figure FDA0002797428340000055
Figure FDA0002797428340000056
representing the given model parameter theta and the input square image block p respectively,
Figure FDA0002797428340000057
probability of and
Figure FDA0002797428340000058
the probability of (c).
10. The attention-based universal image aesthetics evaluation system according to any one of claims 6-9, wherein the numerical matrix of each of said square image patches is subjected to a normalization, whitening operation, and division by a variance.
11. A storage device having stored therein a plurality of programs, characterized in that said programs are adapted to be loaded and executed by a processor to implement the attention mechanism based universal image aesthetics evaluation method according to any one of claims 1-5.
12. A control device, comprising:
a processor adapted to load a program;
a memory adapted to store the program;
characterized in that said program is adapted to be loaded and executed by said processor to implement the attention mechanism based general image aesthetics evaluation method of any one of claims 1-5.
CN201910086789.XA 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism Active CN109886317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910086789.XA CN109886317B (en) 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910086789.XA CN109886317B (en) 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism

Publications (2)

Publication Number Publication Date
CN109886317A CN109886317A (en) 2019-06-14
CN109886317B true CN109886317B (en) 2021-04-27

Family

ID=66927190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910086789.XA Active CN109886317B (en) 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism

Country Status (1)

Country Link
CN (1) CN109886317B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242176B (en) * 2019-12-31 2023-10-13 北京迈格威科技有限公司 Method and device for processing computer vision task and electronic system
CN112287965A (en) * 2020-09-21 2021-01-29 卓尔智联(武汉)研究院有限公司 Image quality detection model training method and device and computer equipment
CN116681583A (en) * 2023-06-13 2023-09-01 上海数莅科技有限公司 Automatic picture composition method and system based on depth aesthetic network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN106803067A (en) * 2016-12-28 2017-06-06 浙江大华技术股份有限公司 A kind of quality of human face image appraisal procedure and device
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
CN107330455A (en) * 2017-06-23 2017-11-07 云南大学 Image evaluation method
CN107392244A (en) * 2017-07-18 2017-11-24 厦门大学 The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 A kind of image cropping method and device based on aesthetics

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894025A (en) * 2016-03-30 2016-08-24 中国科学院自动化研究所 Natural image aesthetic feeling quality assessment method based on multitask deep learning
CN106651830A (en) * 2016-09-28 2017-05-10 华南理工大学 Image quality test method based on parallel convolutional neural network
CN106920229B (en) * 2017-01-22 2021-01-05 北京奇艺世纪科技有限公司 Automatic detection method and system for image fuzzy area
KR101880901B1 (en) * 2017-08-09 2018-07-23 펜타시큐리티시스템 주식회사 Method and apparatus for machine learning
CN107610123A (en) * 2017-10-11 2018-01-19 中共中央办公厅电子科技学院 A kind of image aesthetic quality evaluation method based on depth convolutional neural networks
CN108417201B (en) * 2018-01-19 2020-11-06 苏州思必驰信息科技有限公司 Single-channel multi-speaker identity recognition method and system
CN108388925A (en) * 2018-03-06 2018-08-10 天津工业大学 The anti-pattern collapse robust image generation method for generating network is fought based on New Conditions
CN108492294B (en) * 2018-03-23 2022-04-12 北京邮电大学 Method and device for evaluating harmony degree of image colors

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN106803067A (en) * 2016-12-28 2017-06-06 浙江大华技术股份有限公司 A kind of quality of human face image appraisal procedure and device
CN107330455A (en) * 2017-06-23 2017-11-07 云南大学 Image evaluation method
CN107392244A (en) * 2017-07-18 2017-11-24 厦门大学 The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 A kind of image cropping method and device based on aesthetics

Also Published As

Publication number Publication date
CN109886317A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
Bucak et al. Incremental subspace learning via non-negative matrix factorization
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN109886317B (en) General image aesthetic evaluation method, system and equipment based on attention mechanism
US20220222796A1 (en) Image processing method and apparatus, server, and storage medium
WO2015062209A1 (en) Visualized optimization processing method and device for random forest classification model
CN111488985A (en) Deep neural network model compression training method, device, equipment and medium
CN109816438B (en) Information pushing method and device
CN107506792B (en) Semi-supervised salient object detection method
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN114490065A (en) Load prediction method, device and equipment
CN110956655A (en) Dense depth estimation method based on monocular image
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN115564194A (en) Method and system for constructing metering abnormality diagnosis information generation model of smart power grid
CN111368707A (en) Face detection method, system, device and medium based on feature pyramid and dense block
CN114782742A (en) Output regularization method based on teacher model classification layer weight
CN117788629A (en) Image generation method, device and storage medium with style personalization
CN112785479B (en) Image invisible watermark universal detection method based on few sample learning
CN112767038B (en) Poster CTR prediction method and device based on aesthetic characteristics
Li et al. Real-time crowd density estimation based on convolutional neural networks
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image
CN116541593A (en) Course recommendation method based on hypergraph neural network
CN114387524B (en) Image identification method and system for small sample learning based on multilevel second-order representation
CN113962332B (en) Salient target identification method based on self-optimizing fusion feedback
CN115035408A (en) Unmanned aerial vehicle image tree species classification method based on transfer learning and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant