CN109886317B - General image aesthetic evaluation method, system and equipment based on attention mechanism - Google Patents

General image aesthetic evaluation method, system and equipment based on attention mechanism

Info

Publication number
CN109886317B
Authority
CN
China
Prior art keywords
aesthetic
image
square image
image block
vector
Prior art date
Legal status
Active
Application number
CN201910086789.XA
Other languages
Chinese (zh)
Other versions
CN109886317A (en)
Inventor
Sheng Kekai
Dong Weiming
Ma Chongyang
Mei Xing
Hu Baogang
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910086789.XA priority Critical patent/CN109886317B/en
Publication of CN109886317A publication Critical patent/CN109886317A/en
Application granted granted Critical
Publication of CN109886317B publication Critical patent/CN109886317B/en

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition and machine learning, and in particular to a general image aesthetic evaluation method, system and device based on an attention mechanism, aiming to improve the accuracy of evaluation results. The evaluation method of the present invention includes: scaling the image to be evaluated so that its shortest side length equals a preset first length; randomly cropping a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length; inputting each square image block into a trained convolutional neural network model and outputting the corresponding two-dimensional aesthetic-level confidence vector; calculating the mean of the preset number of two-dimensional aesthetic-level confidence vectors; and performing aesthetic evaluation on the image to be evaluated according to the mean. The accuracy of the invention is significantly higher than that of prior technical schemes; no additional image information is required; the evaluation process is fast and the model occupies little space.

Description

General image aesthetic evaluation method, system and equipment based on attention mechanism
Technical Field
The invention relates to the technical field of image recognition and machine learning, in particular to a general image aesthetic evaluation method, system and device based on an attention mechanism.
Background
General image aesthetic evaluation aims to use a computer system to intelligently judge the aesthetic quality of an input image, and requires the system's judgment to be highly consistent with that of human experts with good aesthetic taste. General image aesthetic evaluation underlies multiple technologies such as image recommendation and image post-processing, and is also an interdisciplinary subject (involving cognitive psychology, computer vision, machine learning, etc.), so effectively evaluating the aesthetic quality of an arbitrary input image is an important issue worthy of attention and investment.
Currently, mainstream general image aesthetic evaluation methods all utilize additional information about the image (e.g., the types of objects contained in the image, the image scene type, image attribute information, etc.), and follow one of two technical schemes:
The first technical scheme: the network model is designed with multi-task outputs (i.e., multi-task learning), combining the image aesthetic rating labels with the additional image information.
The second technical scheme: given image aesthetic rating labels and additional information, first train a model for image aesthetic evaluation and models for several related tasks; then stitch together the representations of certain hidden layers of these models in some designed way, and on this basis train a model for the aesthetic evaluation task.
The first scheme adopts a multi-task training design, aiming to improve data utilization through multi-task training and to inject more information related to image aesthetic evaluation into the model. However, this training method must balance the relative importance of multiple tasks, and cannot guarantee that the multi-task mode achieves this purpose.
The second scheme adopts a module based on representation aggregation (for example, taking statistics of the representation vectors as input to the aesthetic evaluation module), aiming to improve image aesthetic evaluation by effectively combining various attribute information about the image (for example, scene information, the objects it contains, and the like). Such designs require a large amount of training, are not end-to-end, and cannot effectively complete the training task at the data level.
Both schemes require a large amount of manpower to label the additional image information, and the kinds of additional information depend on expert design; they are therefore time-consuming and labor-intensive, and are not easy to maintain or extend.
Disclosure of Invention
To solve the above problems in the prior art, the invention provides a general image aesthetic evaluation method, system and device based on an attention mechanism, which improve classification accuracy while also evaluating faster.
In a first aspect of the present invention, a general image aesthetic evaluation method based on an attention mechanism is provided, the evaluation method comprising:
Step A1, scaling the image to be evaluated so that the shortest side length of the scaled image equals a preset first length;
Step A2, randomly cropping a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length;
Step A3, inputting each square image block into a trained convolutional neural network model and outputting the corresponding two-dimensional aesthetic-level confidence vector;
Step A4, calculating the mean of the preset number of two-dimensional aesthetic-level confidence vectors;
Step A5, performing aesthetic evaluation on the image to be evaluated according to the mean.
Preferably, the convolutional neural network model comprises a backbone network, a fully connected layer, and a softmax module connected in sequence;
wherein,
the backbone network is used for receiving a square image block and outputting the corresponding representation vector of dimension H×1;
the fully connected layer has dimension (2+K)×H and computes from the representation vector an aesthetic semantic vector of dimension (2+K)×1;
the softmax module is used for computing from the aesthetic semantic vector an aesthetic discrimination confidence vector of dimension (2+K)×1; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic-level confidence vector;
K is the number of added random vector rows, a preset value; H is the number of rows of the representation vector.
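As an illustration only (H = 512 and K = 8 are assumed values for the example, not taken from the patent, which leaves both open): with these choices the fully connected layer is a 10×512 matrix W, and for a representation vector r the forward computation is

$$z = W r \in \mathbb{R}^{10}, \qquad \sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}},$$

of whose ten confidences only the first two are kept as the two-dimensional aesthetic-level confidence vector.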
Preferably, the training method of the convolutional neural network model includes:
Step B1, randomly extracting a preset number of images from the training set, scaling each image so that the shortest side length equals the preset first length, and randomly cropping from each scaled image a square image block whose side length is the preset second length;
Step B2, inputting each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block;
Step B3, according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculating the training weight ω_p of each square image block according to the following formula:

$$\omega_p = \bigl(1 - P(\hat{y}_p = y_p \mid p, \theta)\bigr)^{\beta}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p; y_p represents the manually labeled aesthetic category corresponding to square image block p; P(ŷ_p = y_p | p, θ) represents the probability of ŷ_p = y_p given the model parameters θ and the input square image block p; and β represents a weight control factor;
Step B4, according to the training weight ω_p of each square image block, calculating the weighted cross entropy loss according to the following formula:

$$\ell_p = -\,\omega_p \log P(y_p \mid p, \theta)$$
Step B5, performing gradient back-propagation and model parameter updating according to the weighted cross entropy loss;
Step B6, repeating the iterative training steps B1 to B5 until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
Preferably, the step of "performing gradient back-propagation and model parameter updating according to the weighted cross entropy loss" in step B5 includes:
calculating the model parameters to be updated according to the weighted cross entropy loss and the following formula:

$$\theta' = \theta - \lambda \cdot \frac{1}{|B|} \sum_{p \in B} \frac{\partial \ell_p}{\partial \theta}$$

wherein θ′ is the model parameter to be updated; λ represents the learning rate, which controls the step size of each parameter update; and B represents the set of the preset number of cropped square image blocks;
and performing gradient back-propagation and updating the parameters of the convolutional neural network model according to the model parameters to be updated.
Preferably, the manually labeled aesthetic category has two values: 0 indicates that the image has low aesthetic quality, and 1 indicates that the image has high aesthetic quality;
accordingly, the two-dimensional aesthetic-level confidence vector is represented as:

$$\hat{\mathbf{y}}_p = \bigl[\,P(\hat{y}_p = 0 \mid p, \theta),\; P(\hat{y}_p = 1 \mid p, \theta)\,\bigr]^{\top}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p, and θ is a parameter of the convolutional neural network model; the first-dimension element P(ŷ_p = 0 | p, θ) and the second-dimension element P(ŷ_p = 1 | p, θ) represent, given the model parameters θ and the input square image block p, the probability of ŷ_p = 0 and the probability of ŷ_p = 1, respectively.
Preferably, the numerical matrix of each square image block is normalized and whitened, i.e., zero-centered and divided by the standard deviation.
In a second aspect of the present invention, a general image aesthetic evaluation system based on an attention mechanism is provided, the evaluation system comprising:
a scaling module configured to: scale the image to be evaluated so that the shortest side length of the scaled image equals a preset first length;
a cropping module configured to: randomly crop a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length;
a confidence vector generation module configured to: input each square image block into a trained convolutional neural network model and output the corresponding two-dimensional aesthetic-level confidence vector;
a mean calculation module configured to: calculate the mean of the preset number of two-dimensional aesthetic-level confidence vectors;
an evaluation module configured to: perform aesthetic evaluation on the image to be evaluated according to the mean.
Preferably, the convolutional neural network model comprises a backbone network, a fully connected layer, and a softmax module connected in sequence;
wherein,
the backbone network is used for receiving a square image block and outputting the corresponding representation vector of dimension H×1;
the fully connected layer has dimension (2+K)×H and computes from the representation vector an aesthetic semantic vector of dimension (2+K)×1;
the softmax module is used for computing from the aesthetic semantic vector an aesthetic discrimination confidence vector of dimension (2+K)×1; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic-level confidence vector;
K is the number of added random vector rows, a preset value; H is the number of rows of the representation vector.
Preferably, the evaluation system further comprises:
a training module configured to train the convolutional neural network model;
the training module comprises:
a scaling and cropping unit configured to: randomly extract a preset number of images from the training set, scale each image so that the shortest side length equals the preset first length, and randomly crop from each scaled image a square image block whose side length is the preset second length;
a confidence vector generation unit configured to: input each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block;
a weight calculation unit configured to: according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculate the training weight ω_p of that square image block according to the following formula:

$$\omega_p = \bigl(1 - P(\hat{y}_p = y_p \mid p, \theta)\bigr)^{\beta}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p; y_p represents the manually labeled aesthetic category corresponding to square image block p; P(ŷ_p = y_p | p, θ) represents the probability of ŷ_p = y_p given the model parameters θ and the input square image block p; and β represents a weight control factor;
a cross entropy loss calculation unit configured to: according to the training weight ω_p of each square image block, calculate the weighted cross entropy loss according to the following formula:

$$\ell_p = -\,\omega_p \log P(y_p \mid p, \theta)$$
a parameter updating unit configured to: perform gradient back-propagation and model parameter updating according to the weighted cross entropy loss;
a control unit configured to: repeatedly invoke the scaling and cropping unit, the confidence vector generation unit, the weight calculation unit, the cross entropy loss calculation unit and the parameter updating unit for iterative training, until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
Preferably, the parameter updating unit includes:
a parameter calculation subunit configured to calculate, according to the weighted cross entropy loss, the model parameters to be updated according to the following formula:

$$\theta' = \theta - \lambda \cdot \frac{1}{|B|} \sum_{p \in B} \frac{\partial \ell_p}{\partial \theta}$$

wherein θ′ is the model parameter to be updated; λ represents the learning rate, which controls the step size of each parameter update; and B represents the set of the preset number of cropped square image blocks;
and a parameter updating subunit configured to perform gradient back-propagation and update the parameters of the convolutional neural network model according to the model parameters to be updated.
Preferably, the manually labeled aesthetic category has two values: 0 indicates that the image has low aesthetic quality, and 1 indicates that the image has high aesthetic quality;
accordingly, the two-dimensional aesthetic-level confidence vector is represented as:

$$\hat{\mathbf{y}}_p = \bigl[\,P(\hat{y}_p = 0 \mid p, \theta),\; P(\hat{y}_p = 1 \mid p, \theta)\,\bigr]^{\top}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p, and θ is a parameter of the convolutional neural network model; the first-dimension element P(ŷ_p = 0 | p, θ) and the second-dimension element P(ŷ_p = 1 | p, θ) represent, given the model parameters θ and the input square image block p, the probability of ŷ_p = 0 and the probability of ŷ_p = 1, respectively.
Preferably, the numerical matrix of each square image block is normalized and whitened, i.e., zero-centered and divided by the standard deviation.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned attention-based general image aesthetics evaluation method.
In a fourth aspect of the present invention, a control apparatus is provided, including:
a processor adapted to load a program;
a memory adapted to store the program;
the program is adapted to be loaded and executed by the processor to implement the above-described attention-based general image aesthetics evaluation method.
Compared with the closest prior art, the invention has the following beneficial effects:
(1) the accuracy of image aesthetic evaluation is significantly higher than that of prior technical schemes;
(2) no additional image information is needed; the convolutional neural network is trained directly from image-level aesthetic labels;
(3) by adding random vector rows to the fully connected layer, the problem of confidence over-saturation in the aesthetic two-class learning process is effectively avoided;
(4) aesthetic evaluation of a single image takes less than 0.1 ms, and the model occupies less space (approx. 40 MB) than prior technical schemes.
Drawings
FIG. 1 is a schematic diagram of the main steps of an embodiment of the general image aesthetic assessment method based on attention mechanism of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network model in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main steps of an embodiment of the training method of the convolutional neural network model of the present invention;
FIG. 4 is an example of the results of an aesthetic evaluation on an AVA data set by an embodiment of the evaluation method of the present invention;
FIG. 5 is a diagram illustrating the results of an aesthetic evaluation of images obtained by different scaling methods according to an embodiment of the evaluation method of the present invention;
FIG. 6 is a schematic diagram of the main components of an embodiment of the general image aesthetic evaluation system based on attention mechanism of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the terms "first" and "second" in the description of the present invention are used for convenience of description only and do not indicate or imply relative importance of the devices, elements or parameters, and therefore should not be construed as limiting the present invention.
In training the model by a machine learning method and a neural network algorithm, the invention assigns corresponding training weights according to the expected category confidences of different square image blocks, so that network parameters can be optimized quickly and efficiently without additional image information, yielding a convolutional neural network model with good aesthetic evaluation performance.
To this end, the method builds on a convolutional neural network model from deep learning and a loss function based on an attention mechanism, assigning different training weights to different square image blocks of the same image during training; this helps the user efficiently train a general image aesthetic evaluation model with good judgment performance even when image additional-information labels are missing.
In the embodiment of the invention, the manually labeled aesthetic category takes two values, 0 and 1, where 0 indicates that the image has low aesthetic quality and 1 indicates that it has high aesthetic quality. Accordingly, the two-dimensional aesthetic-level confidence vector output by the convolutional neural network is as shown in equation (1):

$$\hat{\mathbf{y}}_p = \bigl[\,P(\hat{y}_p = 0 \mid p, \theta),\; P(\hat{y}_p = 1 \mid p, \theta)\,\bigr]^{\top} \tag{1}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p, and θ is a parameter of the convolutional neural network model; the first-dimension element P(ŷ_p = 0 | p, θ) and the second-dimension element P(ŷ_p = 1 | p, θ) represent, given the model parameters θ and the input square image block p, the probability of ŷ_p = 0 and the probability of ŷ_p = 1, respectively.
FIG. 1 is a schematic diagram of the main steps of an embodiment of the general image aesthetic assessment method based on attention mechanism of the present invention. As shown in FIG. 1, the evaluation method of the present embodiment includes steps A1-A5:
Step A1, scaling the image to be evaluated so that the shortest side length of the scaled image equals a preset first length (256 in this embodiment).
Step A2, randomly cropping a preset number (10 in this embodiment) of square image blocks from the scaled image, where the side length of each square image block equals a preset second length (224 in this embodiment); the preset second length is less than or equal to the preset first length.
Step A3, inputting each square image block into the trained convolutional neural network model and outputting the corresponding two-dimensional aesthetic-level confidence vector.
Step A4, calculating the mean of the preset number of two-dimensional aesthetic-level confidence vectors.
Step A5, performing aesthetic evaluation on the image to be evaluated according to the mean.
In this step, the value of the second-dimension element of the mean vector gives the decision confidence that the image to be evaluated has high aesthetic quality. If the value of the second-dimension element is greater than or equal to that of the first-dimension element, the image to be evaluated is considered to have high aesthetic quality; otherwise it is considered to have low aesthetic quality. Moreover, the more the second-dimension value exceeds the first-dimension value, the higher the aesthetic quality of the image to be evaluated.
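For illustration, a minimal sketch of steps A1-A5 follows, assuming PyTorch/torchvision and a PIL input image; the function name evaluate_image, the variable names, and the assumption that the whitening of the inputs (described further below) happens inside the model are ours, not the patent's:

```python
import torch
import torchvision.transforms.functional as TF

def evaluate_image(image, model, n_crops=10, first_len=256, second_len=224):
    """Steps A1-A5: scale, randomly crop, predict per block, average, decide."""
    # Step A1: scale so that the shortest side equals the preset first length (256).
    w, h = image.size
    scale = first_len / min(w, h)
    image = TF.resize(image, [round(h * scale), round(w * scale)])

    # Step A2: randomly crop a preset number (10) of square blocks of side 224.
    patches = []
    for _ in range(n_crops):
        top = torch.randint(0, image.height - second_len + 1, (1,)).item()
        left = torch.randint(0, image.width - second_len + 1, (1,)).item()
        patch = TF.crop(image, top, left, second_len, second_len)
        patches.append(TF.to_tensor(patch))  # scales values to [0, 1]
    batch = torch.stack(patches)             # whitening assumed inside the model

    # Step A3: each block -> two-dimensional aesthetic-level confidence vector.
    with torch.no_grad():
        conf2d = model(batch)                # shape (n_crops, 2)

    # Step A4: mean of the preset number of confidence vectors.
    mean_conf = conf2d.mean(dim=0)           # shape (2,)

    # Step A5: high aesthetic quality iff the second dimension >= the first.
    verdict = "high" if mean_conf[1] >= mean_conf[0] else "low"
    return verdict, mean_conf[1].item()      # category and its decision confidence
```

Returning the second-dimension value as the decision confidence mirrors the reading of the mean vector described in step A5 above.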
Fig. 2 is a schematic diagram of the structure of the convolutional neural network model in the embodiment of the present invention. As shown in fig. 2, the convolutional neural network model in the embodiment of the present invention includes a backbone network, a full connection layer, and a softmax module, which are connected in sequence.
Wherein the backbone network (common implementations include VGG, ResNet, etc.) is used to receive a square image block and output the corresponding representation vector of dimension H×1; the fully connected layer has dimension (2+K)×H and is used to compute from the representation vector an aesthetic semantic vector of dimension (2+K)×1: {z_1, z_2, …, z_{2+K}}; the softmax module is used to compute from the aesthetic semantic vector an aesthetic discrimination confidence vector of dimension (2+K)×1. The output value of each dimension of the softmax module is calculated as in equation (2):

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{2+K} e^{z_j}}, \qquad i = 1, 2, \ldots, 2+K \tag{2}$$

wherein z_i is the value of the i-th dimension of the aesthetic semantic vector output by the fully connected layer, and σ(z)_i is the value of the i-th dimension of the aesthetic discrimination confidence vector output by the softmax module.
The values of the first and second dimensions of the aesthetic discrimination confidence vector are combined into the two-dimensional aesthetic-level confidence vector corresponding to the square image block, wherein the first-dimension value is the discrimination confidence that the aesthetic quality is low and the second-dimension value is the discrimination confidence that the aesthetic quality is high. K is the number of added random vector rows, a preset value; H is the number of rows of the representation vector.
It should be noted that, unlike the conventional two-class aesthetic evaluation model design, K random vectors are introduced at the fully connected layer. The conventional two-class model is shown in equation (3):

$$\sigma(z)_i = \frac{e^{z_i}}{e^{z_1} + e^{z_2}}, \qquad i = 1, 2 \tag{3}$$

After adding the K random vectors, it becomes equation (4):

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{2+K} e^{z_j}}, \qquad i = 1, 2, \ldots, 2+K \tag{4}$$

The enlarged denominator damps the values of the first two dimensions of the aesthetic discrimination confidence vector, which prevents, to a certain extent, the problem of confidence over-saturation (over-confidence) in the aesthetic two-class learning process. The effectiveness of this design has been validated by our experimental results.
FIG. 3 is a schematic diagram of the main steps of an embodiment of the training method of the convolutional neural network model of the present invention. As shown in FIG. 3, the training method of the present embodiment includes steps B1-B6:
Step B1, randomly extracting a preset number of images (32 in this embodiment) from the training set, scaling each image so that the shortest side length equals the preset first length (256 in this embodiment), and randomly cropping from each scaled image a square image block whose side length is the preset second length (224 in this embodiment).
Step B2, inputting each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block, as shown in equation (1).
Step B3, according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculating the training weight ω_p of each square image block according to equation (5):

$$\omega_p = \bigl(1 - P(\hat{y}_p = y_p \mid p, \theta)\bigr)^{\beta} \tag{5}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p; y_p represents the manually labeled aesthetic category corresponding to square image block p, which has two values: 0 indicates that the image has low aesthetic quality and 1 indicates that it has high aesthetic quality; P(ŷ_p = y_p | p, θ) represents the probability of ŷ_p = y_p given the model parameters θ and the input square image block p; and β represents a weight control factor.
Step B4, according to the training weight ω_p of each square image block, calculating the weighted cross entropy loss according to equation (6):

$$\ell_p = -\,\omega_p \log P(y_p \mid p, \theta) \tag{6}$$
and step B5, carrying out gradient back transmission and model parameter updating according to the weighted cross entropy loss. The method specifically comprises the following steps:
Step B51, according to the weighted cross entropy loss, calculating the model parameters to be updated according to equation (7):

$$\theta' = \theta - \lambda \cdot \frac{1}{|B|} \sum_{p \in B} \frac{\partial \ell_p}{\partial \theta} \tag{7}$$

wherein θ′ is the model parameter to be updated; λ represents the learning rate, which controls the step size of each parameter update; and B represents the set of the preset number of cropped square image blocks.
Step B52, according to the model parameters to be updated, performing gradient back-propagation and updating the parameters of the convolutional neural network model.
Step B6, repeating the iterative training steps B1 to B5 until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
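Under the same assumptions as the sketches above (PyTorch, our own function and variable names), one training iteration covering steps B2-B5 could look as follows; treating the weight of equation (5) as a constant during back-propagation is one reasonable reading of the patent, not something it states:

```python
import torch

def train_step(model, optimizer, patches, labels, beta=1.0):
    """One iteration of steps B2-B5 on a batch B of cropped square image blocks.

    patches: (|B|, 3, 224, 224) float tensor; labels: (|B|,) long tensor of 0/1 labels.
    """
    conf2d = model(patches)                                   # step B2: (|B|, 2)
    p_true = conf2d.gather(1, labels.view(-1, 1)).squeeze(1)  # P(y_hat = y | p, theta)

    # Step B3: attention weight per block, equation (5): harder blocks weigh more.
    with torch.no_grad():
        weights = (1.0 - p_true) ** beta

    # Step B4: weighted cross entropy loss, equation (6), averaged over the batch.
    loss = -(weights * torch.log(p_true.clamp_min(1e-8))).mean()

    # Step B5: gradient back-propagation and parameter update, cf. equation (7).
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```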
The value matrices of the square image blocks input in step A3 and step B2 are divided by 255 to normalize the value range to [0, 1], and then whitened, i.e., the mean value of each image channel is subtracted and the result is divided by the standard deviation, so that the mean is zero and the variance is one.
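A sketch of this preprocessing on a single block, with per-channel statistics computed from the block itself (the patent does not say whether the statistics are per block or per dataset, so that choice is an assumption):

```python
import torch

def whiten(block_u8):
    """block_u8: (3, 224, 224) uint8 tensor -> zero-mean, unit-variance float tensor."""
    x = block_u8.float() / 255.0             # normalize the value range to [0, 1]
    mean = x.mean(dim=(1, 2), keepdim=True)  # mean value of each image channel
    std = x.std(dim=(1, 2), keepdim=True)    # per-channel standard deviation
    return (x - mean) / std.clamp_min(1e-6)  # whitening: zero mean, variance one
```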
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
FIG. 4 is an example of the results of an aesthetic evaluation on the AVA (Aesthetic Visual Analysis) dataset by an embodiment of the evaluation method of the present invention. As shown in FIG. 4, images to the left of the dotted line are judged by the model to have low aesthetic quality, and images to the right to have high aesthetic quality; in the mean of the two-dimensional aesthetic-level confidence vectors calculated in step A4, a larger first-dimension value indicates lower aesthetic quality and a larger second-dimension value indicates higher aesthetic quality.
FIG. 5 is a diagram illustrating the results of aesthetic evaluation of images obtained by different scaling methods according to an embodiment of the evaluation method of the present invention. As shown in FIG. 5, in each image pair, the left image is obtained by uniform scaling (i.e., scaling the shorter side to the target value while keeping the aspect ratio constant), and the right image by global scaling (i.e., scaling both the length and the width to the target values). Two values separated by "/" appear below each picture: the first is the evaluation category given by the invention, and the second the confidence of that category. For example, in the first image pair of the first row, the left and right images are labeled "1/0.740" and "1/0.562" respectively, indicating that the final evaluation category given by the model under both scaling methods is "high" aesthetic quality, but the right image, compressed vertically, has relatively lower confidence. In the second image pair of the first row, the left and right images are labeled "1/0.670" and "0/0.521" respectively, indicating that the predicted aesthetic level of the left image is "high" while that of the right image, due to severe distortion, is "low".
As can be seen from FIG. 5, even though no globally scaled image samples are input during model training, the trained model can reasonably detect the different aesthetic losses caused by uniform scaling versus global scaling, by giving different evaluation categories or different confidences.
Table 1 below compares the statistical experimental results (classification accuracy) of the disclosed method with those of other evaluation methods on the AVA dataset:
TABLE 1

Method          Classification accuracy on AVA (%)
VGG-Scale       73.8
VGG-Pad         72.9
VGG-Crop        71.2
SPP             76.0
DMA-Net         75.41
MNA-CNN         77.1
RAPID           75.42
A&C CNN         74.51
MTCNN           78.56
MTRLCNN         79.08
BDN             78.08
A-Lamp          82.5
NIMA            81.51
The invention   83.03
The comparison methods listed in Table 1 are representative methods in the field of image aesthetic evaluation research, so the comparison results are meaningful. The comparison methods include: VGG-Scale, VGG-Pad and VGG-Crop (very deep CNNs fed with scaled, padded and cropped input images, respectively), SPP (CNN with Spatial Pyramid Pooling), DMA-Net (Deep Multi-patch Aggregation Network), MNA-CNN (Multi-Net Adaptive spatial pooling CNN), RAPID (RAting PIctorial aesthetics using Deep learning), A&C CNN (Attributes and Content adaptive CNN), MTCNN (Multi-Task CNN), MTRLCNN (Multi-Task Relation Learning CNN), BDN (Brain-inspired Deep Network), A-Lamp (Adaptive Layout-aware Multi-patch network), and NIMA (Neural IMage Assessment). As can be seen from the experimental results in Table 1, the classification accuracy of the present invention is significantly higher than that of the other comparison methods.
Based on the same technical concept as the above evaluation method, the invention also provides an evaluation system, which is specifically described below.
FIG. 6 is a schematic diagram of the main components of an embodiment of the general image aesthetic evaluation system based on attention mechanism of the present invention. As shown in fig. 6, the evaluation system 1 of the present embodiment includes: scaling module 10, clipping module 20, confidence vector generation module 30, mean calculation module 40, evaluation module 50, and training module 60.
Wherein the scaling module 10 is configured to: scale the image to be evaluated so that the shortest side length of the scaled image equals a preset first length; the cropping module 20 is configured to: randomly crop a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length; the confidence vector generation module 30 is configured to: input each square image block into a trained convolutional neural network model and output the corresponding two-dimensional aesthetic-level confidence vector; the mean calculation module 40 is configured to: calculate the mean of the preset number of two-dimensional aesthetic-level confidence vectors; the evaluation module 50 is configured to: perform aesthetic evaluation on the image to be evaluated according to the mean; and the training module 60 is configured to train the convolutional neural network model.
In this embodiment, the training module 60 includes: a scaling and cropping unit, a confidence vector generation unit, a weight calculation unit, a cross entropy loss calculation unit, a parameter updating unit, and a control unit.
Wherein the scaling and cropping unit is configured to: randomly extract a preset number of images from the training set, scale each image so that the shortest side length equals the preset first length, and randomly crop from each scaled image a square image block whose side length is the preset second length. The confidence vector generation unit is configured to: input each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block. The weight calculation unit is configured to: according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculate the training weight ω_p of that square image block according to equation (5). The cross entropy loss calculation unit is configured to: according to the training weight ω_p of each square image block, calculate the weighted cross entropy loss according to equation (6). The parameter updating unit is configured to: perform gradient back-propagation and model parameter updating according to the weighted cross entropy loss. The control unit is configured to: repeatedly invoke the scaling and cropping unit, the confidence vector generation unit, the weight calculation unit, the cross entropy loss calculation unit and the parameter updating unit for iterative training, until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
Specifically, the parameter updating unit in this embodiment includes: a parameter calculating subunit and a parameter updating subunit.
Wherein the parameter calculation subunit is configured to calculate the model parameters to be updated according to equation (7) based on the weighted cross entropy loss; the parameter updating subunit is configured to perform gradient back-propagation and update the parameters of the convolutional neural network model according to the model parameters to be updated.
Further, the present invention also proposes an embodiment of a storage device, in which a plurality of programs are stored, said programs being adapted to be loaded and executed by a processor to implement the above-mentioned attention-based general image aesthetics evaluation method.
Further, an embodiment of a control device is also presented that includes a processor and a memory. Wherein the processor is adapted to load a program and the memory is adapted to store said program, said program being adapted to be loaded and executed by said processor to implement the above-described attention-based general image aesthetics evaluation method.
Those of skill in the art will appreciate that the various illustrative method steps, modules, elements described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (12)

1. A general image aesthetic evaluation method based on an attention mechanism, characterized in that the evaluation method comprises:
Step A1, scaling the image to be evaluated so that the shortest side length of the scaled image equals a preset first length;
Step A2, randomly cropping a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length;
Step A3, inputting each square image block into a trained convolutional neural network model and outputting the corresponding two-dimensional aesthetic-level confidence vector;
Step A4, calculating the mean of the preset number of two-dimensional aesthetic-level confidence vectors;
Step A5, performing aesthetic evaluation on the image to be evaluated according to the mean;
the convolutional neural network model comprises a backbone network, a fully connected layer, and a softmax module connected in sequence;
wherein,
the backbone network is used for receiving a square image block and outputting the corresponding representation vector of dimension H×1;
the fully connected layer has dimension (2+K)×H and computes from the representation vector an aesthetic semantic vector of dimension (2+K)×1;
the softmax module is used for computing from the aesthetic semantic vector an aesthetic discrimination confidence vector of dimension (2+K)×1; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic-level confidence vector;
K is the number of added random vector rows, a preset value; H is the number of rows of the representation vector.
2. The attention mechanism-based general image aesthetic evaluation method according to claim 1, wherein the training method of the convolutional neural network model comprises:
Step B1, randomly extracting a preset number of images from the training set, scaling each image so that the shortest side length equals the preset first length, and randomly cropping from each scaled image a square image block whose side length is the preset second length;
Step B2, inputting each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block;
Step B3, according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculating the training weight ω_p of each square image block according to the following formula:

$$\omega_p = \bigl(1 - P(\hat{y}_p = y_p \mid p, \theta)\bigr)^{\beta}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p; y_p represents the manually labeled aesthetic category corresponding to square image block p; P(ŷ_p = y_p | p, θ) represents the probability of ŷ_p = y_p given the model parameters θ and the input square image block p; and β represents a weight control factor;
Step B4, according to the training weight ω_p of each square image block, calculating the weighted cross entropy loss according to the following formula:

$$\ell_p = -\,\omega_p \log P(y_p \mid p, \theta)$$
Step B5, performing gradient back-propagation and model parameter updating according to the weighted cross entropy loss;
Step B6, repeating the iterative training steps B1 to B5 until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
3. The attention mechanism-based general image aesthetic evaluation method according to claim 2, wherein the step of "performing gradient back-propagation and model parameter updating according to the weighted cross entropy loss" in step B5 comprises:
calculating the model parameters to be updated according to the weighted cross entropy loss and the following formula:

$$\theta' = \theta - \lambda \cdot \frac{1}{|B|} \sum_{p \in B} \frac{\partial \ell_p}{\partial \theta}$$

wherein θ′ is the model parameter to be updated; λ represents the learning rate, which controls the step size of each parameter update; and B represents the set of the preset number of cropped square image blocks;
and performing gradient back-propagation and updating the parameters of the convolutional neural network model according to the model parameters to be updated.
4. The attention mechanism-based general image aesthetic evaluation method according to claim 2, wherein the manually labeled aesthetic category has two values: 0 indicates that the image has low aesthetic quality, and 1 indicates that the image has high aesthetic quality;
accordingly, the two-dimensional aesthetic-level confidence vector is represented as:

$$\hat{\mathbf{y}}_p = \bigl[\,P(\hat{y}_p = 0 \mid p, \theta),\; P(\hat{y}_p = 1 \mid p, \theta)\,\bigr]^{\top}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p, and θ is a parameter of the convolutional neural network model; the first-dimension element P(ŷ_p = 0 | p, θ) and the second-dimension element P(ŷ_p = 1 | p, θ) represent, given the model parameters θ and the input square image block p, the probability of ŷ_p = 0 and the probability of ŷ_p = 1, respectively.
5. The attention mechanism-based general image aesthetic evaluation method according to any one of claims 1-4, wherein the numerical matrix of each square image block is normalized and whitened, i.e., zero-centered and divided by the standard deviation.
6. A general image aesthetic evaluation system based on an attention mechanism, characterized in that the evaluation system comprises:
a scaling module configured to: scale the image to be evaluated so that the shortest side length of the scaled image equals a preset first length;
a cropping module configured to: randomly crop a preset number of square image blocks from the scaled image, the side length of each square image block being equal to a preset second length;
a confidence vector generation module configured to: input each square image block into a trained convolutional neural network model and output the corresponding two-dimensional aesthetic-level confidence vector;
a mean calculation module configured to: calculate the mean of the preset number of two-dimensional aesthetic-level confidence vectors;
an evaluation module configured to: perform aesthetic evaluation on the image to be evaluated according to the mean;
the convolutional neural network model comprises a backbone network, a fully connected layer, and a softmax module connected in sequence;
wherein,
the backbone network is used for receiving a square image block and outputting the corresponding representation vector of dimension H×1;
the fully connected layer has dimension (2+K)×H and computes from the representation vector an aesthetic semantic vector of dimension (2+K)×1;
the softmax module is used for computing from the aesthetic semantic vector an aesthetic discrimination confidence vector of dimension (2+K)×1; the values of the first and second dimensions of the aesthetic discrimination confidence vector form the two-dimensional aesthetic-level confidence vector;
K is the number of added random vector rows, a preset value; H is the number of rows of the representation vector.
7. The attention mechanism-based general image aesthetic evaluation system according to claim 6, further comprising:
a training module configured to train the convolutional neural network model;
the training module comprises:
a scaling and cropping unit configured to: randomly extract a preset number of images from the training set, scale each image so that the shortest side length equals the preset first length, and randomly crop from each scaled image a square image block whose side length is the preset second length;
a confidence vector generation unit configured to: input each cropped square image block into the convolutional neural network model to obtain the two-dimensional aesthetic-level confidence vector corresponding to that square image block;
a weight calculation unit configured to: according to the two-dimensional aesthetic-level confidence vector corresponding to each square image block, calculate the training weight ω_p of that square image block according to the following formula:

$$\omega_p = \bigl(1 - P(\hat{y}_p = y_p \mid p, \theta)\bigr)^{\beta}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p; y_p represents the manually labeled aesthetic category corresponding to square image block p; P(ŷ_p = y_p | p, θ) represents the probability of ŷ_p = y_p given the model parameters θ and the input square image block p; and β represents a weight control factor;
a cross entropy loss calculation unit configured to: according to the training weight ω_p of each square image block, calculate the weighted cross entropy loss according to the following formula:

$$\ell_p = -\,\omega_p \log P(y_p \mid p, \theta)$$
a parameter updating unit configured to: perform gradient back-propagation and model parameter updating according to the weighted cross entropy loss;
a control unit configured to: repeatedly invoke the scaling and cropping unit, the confidence vector generation unit, the weight calculation unit, the cross entropy loss calculation unit and the parameter updating unit for iterative training, until a preset number of optimization iteration rounds is completed or the optimization process reaches convergence.
8. The attention mechanism-based general image aesthetic evaluation system according to claim 7, wherein the parameter updating unit comprises:
a parameter calculation subunit configured to calculate, according to the weighted cross entropy loss, the model parameters to be updated according to the following formula:

$$\theta' = \theta - \lambda \cdot \frac{1}{|B|} \sum_{p \in B} \frac{\partial \ell_p}{\partial \theta}$$

wherein θ′ is the model parameter to be updated; λ represents the learning rate, which controls the step size of each parameter update; and B represents the set of the preset number of cropped square image blocks;
and a parameter updating subunit configured to perform gradient back-propagation and update the parameters of the convolutional neural network model according to the model parameters to be updated.
9. The attention mechanism-based general image aesthetic evaluation system according to claim 7, wherein the manually labeled aesthetic category has two values: 0 indicates that the image has low aesthetic quality, and 1 indicates that the image has high aesthetic quality;
accordingly, the two-dimensional aesthetic-level confidence vector is represented as:

$$\hat{\mathbf{y}}_p = \bigl[\,P(\hat{y}_p = 0 \mid p, \theta),\; P(\hat{y}_p = 1 \mid p, \theta)\,\bigr]^{\top}$$

wherein ŷ_p represents the aesthetic class prediction made by the convolutional neural network model on square image block p, and θ is a parameter of the convolutional neural network model; the first-dimension element P(ŷ_p = 0 | p, θ) and the second-dimension element P(ŷ_p = 1 | p, θ) represent, given the model parameters θ and the input square image block p, the probability of ŷ_p = 0 and the probability of ŷ_p = 1, respectively.
10. The attention mechanism-based general image aesthetic evaluation system according to any one of claims 6-9, wherein the numerical matrix of each square image block is normalized and whitened, i.e., zero-centered and divided by the standard deviation.
11. A storage device having stored therein a plurality of programs, characterized in that said programs are adapted to be loaded and executed by a processor to implement the attention mechanism-based general image aesthetic evaluation method according to any one of claims 1-5.
12. A control device, comprising:
a processor adapted to load a program;
a memory adapted to store the program;
characterized in that said program is adapted to be loaded and executed by said processor to implement the attention mechanism based general image aesthetics evaluation method of any one of claims 1-5.
CN201910086789.XA 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism Active CN109886317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910086789.XA CN109886317B (en) 2019-01-29 2019-01-29 General image aesthetic evaluation method, system and equipment based on attention mechanism


Publications (2)

Publication Number Publication Date
CN109886317A CN109886317A (en) 2019-06-14
CN109886317B true CN109886317B (en) 2021-04-27

Family

ID=66927190


Country Status (1)

Country Link
CN (1) CN109886317B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242176B (en) * 2019-12-31 2023-10-13 北京迈格威科技有限公司 Method and device for processing computer vision task and electronic system
CN112287965A (en) * 2020-09-21 2021-01-29 卓尔智联(武汉)研究院有限公司 Image quality detection model training method and device and computer equipment
CN116681583A (en) * 2023-06-13 2023-09-01 上海数莅科技有限公司 Automatic picture composition method and system based on depth aesthetic network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN106803067A (en) * 2016-12-28 2017-06-06 浙江大华技术股份有限公司 A kind of quality of human face image appraisal procedure and device
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
CN107330455A (en) * 2017-06-23 2017-11-07 云南大学 Image evaluation method
CN107392244A (en) * 2017-07-18 2017-11-24 厦门大学 The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 A kind of image cropping method and device based on aesthetics

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894025A (en) * 2016-03-30 2016-08-24 中国科学院自动化研究所 Natural image aesthetic feeling quality assessment method based on multitask deep learning
CN106651830A (en) * 2016-09-28 2017-05-10 华南理工大学 Image quality test method based on parallel convolutional neural network
CN106920229B (en) * 2017-01-22 2021-01-05 北京奇艺世纪科技有限公司 Automatic detection method and system for image fuzzy area
KR101880901B1 (en) * 2017-08-09 2018-07-23 펜타시큐리티시스템 주식회사 Method and apparatus for machine learning
CN107610123A (en) * 2017-10-11 2018-01-19 中共中央办公厅电子科技学院 A kind of image aesthetic quality evaluation method based on depth convolutional neural networks
CN108417201B (en) * 2018-01-19 2020-11-06 苏州思必驰信息科技有限公司 Single-channel multi-speaker identity recognition method and system
CN108388925A (en) * 2018-03-06 2018-08-10 天津工业大学 The anti-pattern collapse robust image generation method for generating network is fought based on New Conditions
CN108492294B (en) * 2018-03-23 2022-04-12 北京邮电大学 Method and device for evaluating harmony degree of image colors


Also Published As

Publication number Publication date
CN109886317A (en) 2019-06-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant