CN115880516A - Image classification method, image classification model training method and related equipment - Google Patents

Image classification method, image classification model training method and related equipment

Info

Publication number
CN115880516A
CN115880516A
Authority
CN
China
Prior art keywords
image, network, peak signal-to-noise ratio, feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111135062.XA
Other languages
Chinese (zh)
Inventor
陈圣
王洪斌
蒋宁
吴海英
周迅溢
曾定衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202111135062.XA priority Critical patent/CN115880516A/en
Publication of CN115880516A publication Critical patent/CN115880516A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image classification method, an image classification model training method, and related devices. The method comprises the following steps: inputting a first image into a reconstruction network to obtain a first feature map; inputting the first feature map into a peak signal-to-noise ratio (PSNR) calculation network to obtain a PSNR value; when the PSNR value is greater than or equal to a first threshold, inputting the first feature map into an image quality evaluation network to obtain a mean opinion score (MOS) of the first image; determining the blur degree of the first image according to the PSNR value and the MOS; determining the first image to be a first-class image when the PSNR value is smaller than the first threshold or the MOS is greater than or equal to a second threshold; and determining the first image to be a second-class image when the MOS is smaller than the second threshold, wherein the blur degree of the first-class image is smaller than that of the second-class image.

Description

Image classification method, image classification model training method and related equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an image classification method, an image classification model training method and related equipment.
Background
As is well known, to facilitate subsequent processing, images are generally classified according to their degree of blur. In the prior art, the blur degree of an image is typically evaluated by a network model that analyzes only the global features of the image, so the evaluation of the blur degree is inaccurate. The prior art therefore suffers from poor accuracy when classifying images by blur degree.
Disclosure of Invention
The embodiments of the application aim to provide an image classification method, an image classification model training method, and related devices that can solve the problem of poor accuracy in classifying images by blur degree.
In a first aspect, an embodiment of the present application provides an image classification method, including:
inputting a first image into a reconstruction network to obtain a first feature map;
inputting the first feature map into a peak signal-to-noise ratio calculation network to obtain a peak signal-to-noise ratio value;
inputting the first feature map into an image quality evaluation network to obtain a mean opinion score of the first image when the peak signal-to-noise ratio value is greater than or equal to a first threshold;
determining the blur degree of the first image according to the peak signal-to-noise ratio value and the mean opinion score;
determining the first image to be a first-class image when the peak signal-to-noise ratio value is smaller than the first threshold or the mean opinion score is greater than or equal to a second threshold; and determining the first image to be a second-class image when the mean opinion score is smaller than the second threshold, wherein the blur degree of the first-class image is smaller than that of the second-class image.
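The two-threshold decision of the first aspect can be sketched as a small function. This is an illustrative sketch, not the patented implementation; the default threshold values (37.7 for the PSNR value, 3.5 for the MOS) are taken from the example given later in the description.

```python
def classify_image(psnr_value, mos=None, psnr_threshold=37.7, mos_threshold=3.5):
    """Two-stage blur classification: PSNR first, MOS only if needed.

    Returns "first-class" (less blurred) or "second-class" (more blurred).
    """
    # Stage 1: a PSNR value below the first threshold makes the image
    # first-class, and the image quality evaluation network is never consulted.
    if psnr_value < psnr_threshold:
        return "first-class"
    # Stage 2: the mean opinion score from the quality evaluation network decides.
    if mos is None:
        raise ValueError("MOS required when PSNR >= first threshold")
    return "first-class" if mos >= mos_threshold else "second-class"
```

Note that, per the description, a PSNR value below the first threshold maps to the less blurred class; the PSNR calculation network here scores the reconstructed feature map rather than comparing against a reference image.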
In a second aspect, an embodiment of the present application provides an image classification model training method, including:
pre-training a peak signal-to-noise ratio calculation network to be trained by using a first sample image, to obtain the peak signal-to-noise ratio calculation network;
inputting a second sample image into a pre-trained reconstruction network, and respectively inputting the output of the reconstruction network into the peak signal-to-noise ratio calculation network and the image quality evaluation network to be trained;
calculating a first loss value based on the output of the peak signal-to-noise ratio calculation network, and calculating a second loss value based on the output of the image quality evaluation network to be trained;
under the condition that the weighted sum value of the first loss value and the second loss value meets the loss convergence condition, determining an image classification model based on the peak signal-to-noise ratio calculation network, the reconstruction network and the currently trained image quality evaluation network to be trained;
in the image classification model, when the peak signal-to-noise ratio output by the peak signal-to-noise ratio calculation network is greater than or equal to a first threshold, the output of the reconstruction network is input into the image quality evaluation network for image blur degree classification.
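The stopping rule of the second aspect — training until the weighted sum of the two branch losses settles — can be sketched as follows. The equal weights and the particular convergence test are assumptions for illustration; the description does not fix either.

```python
def combined_loss(psnr_loss, iqa_loss, w_psnr=0.5, w_iqa=0.5):
    # Weighted sum of the PSNR-branch loss (first loss value) and the
    # quality-evaluation-branch loss (second loss value); the weights
    # here are illustrative, not specified by the description.
    return w_psnr * psnr_loss + w_iqa * iqa_loss

def has_converged(loss_history, eps=1e-4):
    # One possible loss convergence condition: the change between the
    # last two recorded weighted losses falls below a small tolerance.
    return len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < eps
```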
In a third aspect, an embodiment of the present application provides an image classification apparatus, including:
a first input module, configured to input a first image into a reconstruction network to obtain a first feature map;
a second input module, configured to input the first feature map into a peak signal-to-noise ratio calculation network to obtain a peak signal-to-noise ratio value;
a third input module, configured to input the first feature map into an image quality evaluation network to obtain a mean opinion score of the first image when the peak signal-to-noise ratio value is greater than or equal to a first threshold;
a first determining module, configured to determine the blur degree of the first image according to the peak signal-to-noise ratio value and the mean opinion score, determine the first image to be a first-class image when the peak signal-to-noise ratio value is smaller than the first threshold or the mean opinion score is greater than or equal to a second threshold, and determine the first image to be a second-class image when the mean opinion score is smaller than the second threshold, wherein the blur degree of the first-class image is smaller than that of the second-class image.
In a fourth aspect, an embodiment of the present application provides an image classification model training apparatus, including:
the first training module is used for pre-training the peak signal-to-noise ratio calculation network to be trained by utilizing the first sample image to obtain the peak signal-to-noise ratio calculation network;
the second training module is used for inputting a second sample image into a pre-trained reconstruction network and respectively inputting the output of the reconstruction network into the peak signal-to-noise ratio calculation network and the image quality evaluation network to be trained;
the calculation module is used for calculating a first loss value based on the output of the peak signal-to-noise ratio calculation network and calculating a second loss value based on the output of the image quality evaluation network to be trained;
the second determining module is used for determining an image classification model based on the peak signal-to-noise ratio computing network, the reconstruction network and the currently trained image quality evaluation network to be trained under the condition that the weighted sum of the first loss value and the second loss value meets a loss convergence condition;
in the image classification model, when the peak signal-to-noise ratio output by the peak signal-to-noise ratio calculation network is greater than or equal to a first threshold, the output of the reconstruction network is input into the image quality evaluation network for image blur degree classification.
In a fifth aspect, the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect, or implement the steps of the method according to the second aspect.
In a sixth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect, or implement the steps of the method according to the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect, or implement the steps of the method according to the second aspect.
In the embodiments of the application, the reconstruction network enlarges the first image to obtain the first feature map, the peak signal-to-noise ratio calculation network judges the local blur degree of the first image, and the image quality evaluation network judges the overall blur degree of the first image. Since both the local details and the whole of the first image are taken into account, the accuracy of judging the blur degree of an image can be improved, and thereby the accuracy of classifying images by blur degree is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a flowchart of an image classification method provided in an embodiment of the present application;
FIG. 2 is a block diagram of a flow chart of an image classification method provided in an embodiment of the present application;
fig. 3 is a structural diagram of a reconstruction network in the image classification method provided in the embodiment of the present application;
fig. 4 is a structural diagram of a down-sampling network layer in the image classification method provided in the embodiment of the present application;
fig. 5 is a structural diagram of an image quality evaluation network in the image classification method provided in the embodiment of the present application;
fig. 6 is a structural diagram of a peak signal-to-noise ratio calculation network in the image classification method provided in the embodiment of the present application;
FIG. 7 is a flowchart of an image classification model training method provided in an embodiment of the present application;
fig. 8 is a structural diagram of an image classification apparatus provided in an embodiment of the present application;
FIG. 9 is a block diagram of an image classification model training apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of an electronic device provided in an embodiment of the present application;
fig. 11 is a block diagram of another electronic device provided in the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the former and latter related objects are in an "or" relationship.
With the popularization of digital instruments and digital products, images and videos have become the most common information carriers in human activities; they contain a large amount of information about objects and are a main way for people to obtain external original information. However, during acquisition, transmission, and storage, an image is often degraded by various kinds of noise, and the quality of the image preprocessing stage directly affects subsequent image processing such as image segmentation, object recognition, and edge extraction. The clarity (that is, the quality) of the received image should therefore be guaranteed as much as possible before subsequent processing. To obtain a high-quality digital image, it is necessary to judge the blur degree of the image so as to maintain the integrity of the received information. For this reason, image blur judgment has long been a hot spot of image processing and computer vision research.
With the development of deep learning, image quality evaluation algorithms have seen deep research and innovation. DeepBIQ, proposed in 2017, addresses the blind image quality assessment (BIQA) task by transfer learning from a convolutional neural network (CNN) pre-trained on a classification task. The overall image quality is estimated by averaging the predicted scores of sub-regions of the image; for fine-tuning, the last fully connected layer of the pre-trained CNN is replaced with a randomly initialized one to form the new CNN. Transfer learning allows the depth of the network to increase, but its performance is affected by the original task. In 2018, DIQaM-NR (Deep Image QuAlity Measure for no-reference IQA) was proposed; the method is named deepIQA in the reference provided by its authors, and some references also adopt that name. It is trained end to end and comprises 10 convolutional layers, 5 pooling layers, and 2 fully connected layers. Possibly because the amount of data cannot support such a deep network, its experimental results do not exceed shallower networks such as IQA-CNN.
At present, image blur judgment algorithms in the industry are basically implemented as classification, but there is no clear dividing line between degrees of image blur. Moreover, current quality evaluation algorithms evaluate an image perceptually and as a whole, so local blur noise often cannot be distinguished from real noise. This lowers the accuracy of blur judgment and hence the accuracy of image blur degree classification. The image classification method of the present application is therefore provided.
The image classification method, the image classification model training method and the related devices provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
Referring to fig. 1, fig. 1 is a flowchart of an image classification method provided in an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
Step 101, inputting a first image into a reconstruction network to obtain a first feature map;
In the embodiment of the application, every image that needs classification can first be adjusted to a preset size, which makes recognition and classification by the image classification model easier. For example, in some embodiments, before inputting the first image into the reconstruction network to obtain the first feature map, the method further includes: adjusting the size of the image to be processed to a preset size, which can be set according to actual needs (for example, 512 × 512), as the input of the reconstruction network.
It should be understood that no adjustment is necessary if the size of the image to be processed is already the preset size. The resolution of the first feature map is increased relative to the first image.
Optionally, the image classification model includes a reconstruction network, a peak signal-to-noise ratio calculation network, and an image quality evaluation network. The reconstruction network, which may be called an image super-resolution reconstruction network, is used to enlarge an image. Inputting the first image into the reconstruction network to obtain the first feature map means that the first image of the preset size is input into the reconstruction network and subjected to image enlargement processing to obtain the first feature map.
It should be noted that enlarging the image through the reconstruction network also magnifies its blur, which makes it easier for the peak signal-to-noise ratio calculation network and the image quality evaluation network to judge the blur degree of the image.
Step 102, inputting the first feature map into a peak signal-to-noise ratio calculation network to obtain a peak signal-to-noise ratio value;
In the embodiment of the application, the peak signal-to-noise ratio calculation network simulates the PSNR calculation formula and processes the first feature map to obtain a PSNR value. A primary classification of the blur degree of the target object can be performed based on the PSNR value: for example, when the PSNR value is smaller than a first threshold, the first image is determined to be a first-class image; when the PSNR value is greater than or equal to the first threshold, the blur degree of the target object is further evaluated by the image quality evaluation network.
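For reference, the formula that the peak signal-to-noise ratio calculation network is trained to simulate is the standard PSNR in decibels. A minimal pure-Python sketch over flattened pixel lists, assuming an 8-bit peak value:

```python
import math

def psnr(reference, distorted, max_val=255.0):
    # PSNR = 10 * log10(MAX^2 / MSE), in decibels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: infinite PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Classical PSNR requires a reference signal; the network in this application instead learns to predict a PSNR-like score from the reconstructed feature map alone.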
Step 103, inputting the first feature map into an image quality evaluation network to obtain a Mean Opinion Score (MOS) value of the first image when the peak signal-to-noise ratio is greater than or equal to a first threshold;
in the embodiment of the present application, the image quality evaluation network output MOS value may include 1 to 5, where the larger the MOS value is, the higher the definition of the image is. For example, a MOS value of 5 indicates the sharpest image, and a MOS value of 1 indicates the blurriest image, or the image of the worst quality.
Step 104, determining the blur degree of the first image according to the peak signal-to-noise ratio value and the mean opinion score;
determining the first image to be a first-class image when the peak signal-to-noise ratio value is smaller than the first threshold or the mean opinion score is greater than or equal to a second threshold; and determining the first image to be a second-class image when the mean opinion score is smaller than the second threshold, wherein the blur degree of the first-class image is smaller than that of the second-class image.
Optionally, the definitions of the first-class image and the second-class image may be set according to actual needs, and specifically depend on the sizes of the first threshold and the second threshold. For example, in some implementations, assuming the first threshold is set to 37.7 and the second threshold to 3.5, the first-class image may be interpreted as a sharp image and the second-class image as a blurred image. As shown in fig. 2, an input image is first preprocessed to obtain the first image. The first image is input into the reconstruction network for image enlargement processing to obtain the first feature map, and a peak signal-to-noise ratio value is calculated by the peak signal-to-noise ratio calculation network; when the peak signal-to-noise ratio value is less than 37.7, the first image is determined to be a sharp image. When the peak signal-to-noise ratio value is greater than or equal to 37.7, the first feature map is input into the image quality evaluation network to obtain a mean opinion score between 1 and 5; when the mean opinion score is greater than or equal to 3.5, the first image is determined to be a sharp image, and when it is less than 3.5, the first image is determined to be a blurred image.
Of course, in other embodiments, the first type of image may also be defined as a high definition image, and the second type of image may be defined as a normal image, and the specific classification is not further limited herein.
By setting the reconstruction network to enlarge the first image into the first feature map, the peak signal-to-noise ratio calculation network judges the local blur degree of the first image and the image quality evaluation network judges its overall blur degree. Judging the first image both locally and as a whole improves the accuracy of the blur degree judgment, and thereby the accuracy of classifying images by blur degree.
Optionally, as shown in fig. 3, in some embodiments, the reconstruction network includes a first shallow feature extraction network layer, N down-sampling network layers, N first up-sampling network layers, a first convolution layer, and a second up-sampling network layer, which are connected in sequence, where N is an integer greater than 1, the N down-sampling network layers are connected in series in sequence, the N first up-sampling network layers are connected in series in sequence, and the input of the n-th first up-sampling network layer is a feature obtained by fusing the output of the previous network layer and the output of the (N-n+1)-th down-sampling network layer. Inputting the first image into the reconstruction network to obtain the first feature map comprises:
inputting the first image into the first shallow feature extraction network layer to perform shallow feature extraction processing to obtain a first sub-feature map;
performing sampling processing on the first sub-feature map by using the N down-sampling network layers and the N first up-sampling network layers to obtain a second sub-feature map;
smoothing the second sub-feature map by using the first convolution layer to obtain a third sub-feature map;
and performing upsampling processing on the third sub-feature map by using the second upsampling network layer to obtain the first feature map.
It should be understood that the value of N may be set according to actual needs, for example, in this embodiment, the value of N may be 5.
Optionally, the N-th first up-sampling network layer and the second up-sampling network layer may adopt a deconvolution (also called transposed convolution) structure, while the down-sampling network layers and the first N-1 first up-sampling network layers may adopt a convolution structure. The first shallow feature extraction network layer may consist of two 3*3 convolutional layers, and the first convolution layer may likewise consist of two 3*3 convolutional layers.
In the embodiment of the present application, each convolution in the reconstruction network may have 64 convolution kernels. As shown in fig. 3, the processing flow of the reconstruction network is as follows:
First, shallow feature extraction is performed by the two 3*3 convolutional layers to obtain the first sub-feature map; these two 3*3 convolutions extract texture information of the image, which includes medium- and high-frequency information.
Then, the extracted feature map is sent through the down-sampling network layers for 5 rounds of down-sampling, and the final sampling result is output to the first up-sampling network layers for up-sampling, so that the feature map sizes at the input and output of this part of the reconstruction network stay consistent. Residual learning between each pair of down-sampling and up-sampling layers at symmetric positions supplements the high-frequency information lost while the network extracts features. Down-sampling increases robustness to small disturbances of the input image, such as translation and rotation, reduces the risk of overfitting, reduces the amount of computation, and enlarges the receptive field; up-sampling restores and decodes the abstract features to the original image size. Through this encoding and decoding, abstract features that represent the image, such as blur texture features, can be extracted.
Next, the second sub-feature map from the last up-sampling is smoothed by two 3*3 convolutional layers to obtain the third sub-feature map.
Finally, the third sub-feature map is sent to a transposed convolution (Deconv) layer for the final up-sampling, yielding the first feature map.
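The spatial-size bookkeeping implied by this flow can be sketched as below. Two assumptions are flagged: the decoder is taken to restore the encoder input size (per the symmetric residual pairing), and the final transposed convolution is taken to enlarge the map by a factor of 2, since the description says the reconstruction network enlarges the image but does not state the factor.

```python
def reconstruction_output_size(input_size, n_down=5, final_scale=2):
    # Encoder: each down-sampling layer halves the spatial size.
    size = input_size
    for _ in range(n_down):
        size //= 2
    # Decoder: up-sampling restores the encoder input size (assumed
    # symmetric with the encoder, per the residual-learning pairing).
    for _ in range(n_down):
        size *= 2
    # Final transposed convolution enlarges the map; factor assumed 2x.
    return size * final_scale
```

With the 512 × 512 preset input mentioned earlier, this sketch would give a 1024 × 1024 first feature map under the assumed 2x factor.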
It should be noted that the structure of the down-sampling network layer may be set according to actual needs. As shown in fig. 4, in some embodiments the down-sampling network layer may include a 2-layer residual structure, a convolutional layer, and a max-pooling layer. In one round of down-sampling, residual learning is first performed by the 2-layer residual structure, which supplements information lost during down-sampling; a 3*3 convolutional layer then performs further feature extraction; and finally a max-pooling layer (MaxPool) performs the actual down-sampling.
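As a concrete illustration of the last step of such a layer, 2x2 max pooling with stride 2 halves each spatial dimension of a feature map. A minimal pure-Python sketch over one channel (nested lists, even height and width assumed):

```python
def max_pool_2x2(fmap):
    # 2x2 max pooling, stride 2: each output value is the maximum of a
    # non-overlapping 2x2 window, so height and width are both halved.
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]
```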
Optionally, in some embodiments, the image quality evaluation network comprises a plurality of convolutional network layers and a first fully-connected network layer connected in sequence; wherein the step of inputting the first feature map into an image quality evaluation network to obtain the mean subjective opinion score of the first image comprises:
inputting the first feature map into the multilayer convolution network layer for feature extraction and fusion processing to obtain a second feature map with high-frequency abstract features;
and performing dimensionality reduction processing on the second feature map by using the first fully-connected network layer to obtain the mean subjective opinion score.
In this embodiment, the number of first fully-connected network layers may be 2, and the last of them is configured to output a MOS value between 1 and 5.
As shown in fig. 5, the multilayer convolutional network layer is formed by sequentially connecting the following network layers in series: two 3*3 third convolutional layers, a first mixed depth convolution kernel layer, a first connection layer, a 1*1 fourth convolutional layer, a 2*2 first max-pooling layer, a second mixed depth convolution kernel layer, a second connection layer, a 1*1 fifth convolutional layer, a 2*2 second max-pooling layer, a third mixed depth convolution kernel layer, a third connection layer, a 1*1 sixth convolutional layer, a 2*2 third max-pooling layer, two 3*3 seventh convolutional layers, a 1*1 eighth convolutional layer, and a second global pooling layer. The first mixed depth convolution kernel layer is a mixed depth convolution layer with 3*3, 5*5, 7*7, and 9*9 kernels, and the second and third mixed depth convolution kernel layers are mixed depth convolution layers with 3*3, 5*5, and 7*7 kernels; each connection layer fuses the features extracted by the different kernel sizes of the preceding mixed depth convolution kernel layer.
In the embodiment of the application, the process of performing feature extraction and fusion processing on the first feature map by the multilayer convolutional network layer to obtain the second feature map with high-frequency abstract features is as follows:
the first stage is as follows: firstly, performing shallow feature extraction by using a third convolutional layer of two 3*3 layers, performing feature extraction by using a first mixed depth convolutional kernel layer of 3*3, 5*5, 7*7 and 9*9, fusing the extracted features through a first connecting layer, performing dimension reduction and feature fusion by using a fourth convolutional layer of 1*1 as a bottleneck layer, and performing pooling by using a first maximum pooling layer of 2*2 to halve a feature map output by a previous layer;
and a second stage: firstly, feature extraction is carried out on a feature map output by a first maximum pooling layer through a second mixed depth convolution kernel layer of 3*3, 5*5 and 7*7, the extracted features are fused through a second connecting layer, then dimension reduction and feature fusion are carried out by using a fifth convolution layer of 1*1 as a bottleneck layer, and then pooling is carried out by using a second maximum pooling layer of 2*2 so as to reduce the feature map output by the previous layer of network by half;
and a third stage: firstly, feature extraction is carried out on the feature graph output by the second maximum pooling layer through a third mixed depth convolution kernel layer of 3*3, 5*5 and 7*7, the extracted features are fused through a second connecting layer, then dimension reduction and feature fusion are carried out by using a sixth convolution layer of 1*1 as a bottleneck layer, and then pooling is carried out by using a third maximum pooling layer of 2*2 so as to reduce the feature graph output by the previous network by half.
Abstract features can be well extracted through the descending cascade network formed by the three stages. The extracted features are then smoothed through the two seventh convolutional layers of 3*3, dimension reduction is performed through the eighth convolutional layer of 1*1, and the feature map of each channel is pooled into a single value by the second global pooling layer. Finally, dimension reduction is performed through the two first fully-connected network layers to obtain a MOS value in the range of 1-5.
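The mixed depth convolution kernel layers described above split the input channels into groups, filter each group with a different kernel size, and let the connection layer concatenate the results. A minimal NumPy sketch of this grouping-and-concatenation (box filters stand in for learned weights; all function names are illustrative, not from the patent):

```python
import numpy as np

def mixed_depth_conv(x, kernel_sizes):
    """x: feature map of shape (C, H, W). Split channels into one group
    per kernel size, filter each group with its own kernel, then
    concatenate the group outputs (the role of the connection layer)."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    outs = []
    for g, k in zip(groups, kernel_sizes):
        pad = k // 2
        kernel = np.full((k, k), 1.0 / (k * k))  # box filter as a stand-in
        padded = np.pad(g, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
        out = np.zeros_like(g)
        c_n, h, w = g.shape
        for c in range(c_n):
            for i in range(h):
                for j in range(w):
                    out[c, i, j] = (padded[c, i:i + k, j:j + k] * kernel).sum()
        outs.append(out)
    return np.concatenate(outs, axis=0)  # concatenation = "connection layer"
```

With kernel sizes 3*3, 5*5, 7*7 and 9*9, the output has the same spatial size and channel count as the input, which is why the 1*1 bottleneck layer is needed afterwards only for channel fusion, not for shape repair.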
Optionally, as shown in fig. 6, in some embodiments, the peak signal-to-noise ratio calculation network includes a first residual network layer, a feature extraction network layer, a second residual network layer, a second convolution layer, a first global pooling layer, and a second fully-connected network layer, which are connected in sequence; the first residual network layer is configured to perform global residual processing on the first feature map to obtain a fourth sub-feature map, the feature extraction network layer is configured to perform high-frequency detail information feature extraction processing on the fourth sub-feature map to obtain a fifth sub-feature map, the second residual network layer is configured to perform global residual processing on the fifth sub-feature map to obtain a sixth sub-feature map, the second convolution layer is configured to perform smoothing processing on the feature map obtained by fusing the sixth sub-feature map and the fourth sub-feature map to obtain a seventh sub-feature map, the first global pooling layer is configured to perform compression processing on the seventh sub-feature map to obtain an eighth sub-feature map, and the second fully-connected network layer is configured to perform dimensionality reduction processing on the eighth sub-feature map to obtain the peak signal-to-noise ratio.
In the embodiment of the application, the purpose of using the first residual network layer and the second residual network layer for global residual processing is to supplement the middle-and-high-frequency detail information that is lost during convolution. Optionally, the first residual network layer, the feature extraction network layer, the second residual network layer, and the second convolution layer may all adopt a convolution structure of 3*3. The feature extraction network layer may adopt two convolutional layers of 3*3.
It should be noted that a PSNR value calculated directly by formula is only a pixel-level parameter and cannot by itself judge the quality of an image. In the embodiment of the present application, the reconstruction network can reconstruct a high-resolution image through a dictionary containing a plurality of mappings, which improves the reliability of the PSNR index within a certain range, thereby improving the accuracy of the image blur judgment.
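For reference, the pixel-level PSNR referred to here is conventionally computed as PSNR = 10·log10(MAX² / MSE). A small NumPy helper, assuming an 8-bit pixel range:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Two images that differ by a constant offset of 16 gray levels give a PSNR of roughly 24 dB, illustrating why such a pixel-level score says little about perceived sharpness.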
It should be noted that the image classification method provided in the embodiment of the present application may be applied to any scene that needs to be subjected to image blur degree determination, for example, in some embodiments, the image classification method may be applied to an image receiving scene to ensure the integrity of image reception.
For example, if a first electronic device receives a plurality of images transmitted by a second electronic device, each currently received image may be input into the image classification model for blur degree judgment after it is received, and the image information of images judged to be second-type images is stored.
After the reception is completed, a retransmission request carrying the image information may be sent to the second electronic device, and the retransmitted images continue to be input into the image classification model for blur recognition until all received images are first-type images. In the embodiment of the application, applying the image classification method in this way can ensure the image receiving quality.
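The receive-and-retransmit flow above can be sketched as follows; the function names, the string class labels, and the transport mechanism are all hypothetical stand-ins, not part of the patent:

```python
def receive_with_retransmission(images, classify, request_retransmit):
    """images: dict of image_id -> image data.
    classify(img) returns "first" (acceptably sharp) or "second" (blurry).
    request_retransmit(ids) asks the sender for those images again and
    returns the retransmitted copies. Loop until nothing is blurry."""
    blurry = [i for i, img in images.items() if classify(img) == "second"]
    while blurry:
        images.update(request_retransmit(blurry))          # replace blurry copies
        blurry = [i for i in blurry if classify(images[i]) == "second"]
    return images
```

In practice a retry limit would be added so a permanently degraded image cannot loop forever; the sketch omits that for brevity.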
Further, referring to fig. 7, an embodiment of the present application further provides an image classification model training method, and as shown in fig. 7, the image classification model training method includes:
step 701, pre-training a to-be-trained peak signal-to-noise ratio calculation network by using a first sample image to obtain a peak signal-to-noise ratio calculation network;
step 702, inputting a second sample image into a pre-trained reconstruction network, and respectively inputting the output of the reconstruction network into the peak signal-to-noise ratio calculation network and the image quality evaluation network to be trained;
Step 703, calculating a first loss value based on the output of the peak signal-to-noise ratio calculation network, and calculating a second loss value based on the output of the image quality evaluation network to be trained;
step 704, determining an image classification model based on the peak signal-to-noise ratio calculation network, the reconstruction network and the currently trained image quality evaluation network to be trained under the condition that the weighted sum of the first loss value and the second loss value meets a loss convergence condition;
in the image classification model, the peak signal-to-noise ratio calculation network is used for inputting the output of the reconstruction network into an image quality evaluation network for image blur degree classification under the condition that the output peak signal-to-noise ratio is greater than or equal to a first threshold value.
In the embodiment of the present application, the loss convergence condition may be set according to actual needs. For example, in some embodiments, in the case that the variation of the weighted sum value is smaller than a preset value, a network structure including the peak signal-to-noise ratio calculation network, the reconstruction network, and the currently trained image quality evaluation network to be trained may be used as the image classification model. Wherein, if the first loss value is loss1 and the second loss value is loss2, the weighted sum value loss may satisfy: loss = loss1/30 + loss2.
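The weighted sum and the convergence check described above can be sketched as follows. The weighting (loss1 scaled by 1/30, loss2 taken as-is, roughly bringing a PSNR-scale loss down to the MOS scale) is an assumed reading of the formula, and the epsilon is illustrative:

```python
def combined_loss(loss1, loss2, w1=1.0 / 30, w2=1.0):
    # Assumed weighting: loss1 (PSNR regression loss) is divided by 30
    # so both terms are of comparable magnitude.
    return w1 * loss1 + w2 * loss2

def has_converged(history, eps=1e-4):
    # Loss convergence condition: the change of the weighted sum value
    # between the last two iterations is smaller than a preset value.
    return len(history) >= 2 and abs(history[-1] - history[-2]) < eps
```

Training would append `combined_loss(...)` to `history` each iteration and stop (freezing the current image quality evaluation network into the classification model) once `has_converged(history)` holds.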
It should be understood that after the pre-training of the peak snr computing network to be trained is performed by using the first sample image, the network parameters of the trained peak snr computing network may be fixed, and at the same time, the network parameters of the pre-trained reconstructed network also remain fixed.
Optionally, when the first sample image is used for pre-training the to-be-trained peak signal-to-noise ratio calculation network, high-definition images may be degraded to different degrees: the degradation model first performs bicubic interpolation downsampling, then applies different super-resolution algorithms to the downsampled images, and then computes the PSNR. The PSNR value serves as the training label; 3000 pieces of data (namely 3000 first sample images) are produced from freely available data, with the labels being the calculated PSNR values.
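The label-making pipeline (downsample, super-resolve, compute PSNR as the label) can be sketched with stand-in resampling operations; a real implementation would replace the stubs with bicubic downsampling and actual super-resolution algorithms:

```python
import numpy as np

def down_stub(img, factor=4):
    # Stand-in for bicubic interpolation downsampling: simple striding.
    return img[::factor, ::factor]

def super_resolve_stub(img, factor=4):
    # Stand-in for a super-resolution algorithm: nearest-neighbour upsampling.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def make_psnr_label(hd_img):
    """Degrade a high-definition image, super-resolve it back, and use
    the PSNR against the original as the training label."""
    sr = super_resolve_stub(down_stub(hd_img))
    mse = np.mean((hd_img.astype(np.float64) - sr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

Running this over each high-definition sample with several different super-resolution stubs yields the (image, PSNR label) pairs used for pre-training.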
In the embodiment of the application, the peak signal-to-noise ratio calculation network is pre-trained with the first sample image, and then the pre-trained reconstruction network, the pre-trained peak signal-to-noise ratio calculation network, and the to-be-trained image quality evaluation network are jointly trained to obtain the image classification model. The resulting image classification model can therefore judge the first image both locally and as a whole, which improves the accuracy of judging the image blur degree and thus the accuracy of classifying the image blur degree.
It should be noted that, in the training process, the processing process of the reconstruction network, the peak signal-to-noise ratio calculation network, and the image quality evaluation network to be trained on the sample image to be trained is similar to the processing process of the reconstruction network, the peak signal-to-noise ratio calculation network, and the image quality evaluation network on the first image in the foregoing embodiment, and reference may be specifically made to the description of the foregoing embodiment. For example, the processing flow of each network structure on the second sample image based on the training process of the second sample image includes:
firstly, a second sample image can be input into a reconstruction network to obtain a third feature map;
then, inputting the third feature map into a peak signal-to-noise ratio calculation network to obtain the peak signal-to-noise ratio;
then, under the condition that the peak signal-to-noise ratio value is larger than or equal to a first threshold value, inputting the third feature map into an image quality evaluation network to obtain the average subjective opinion score of the second sample image;
finally, determining the blurring degree of the second sample image according to the peak signal-to-noise ratio value and the mean subjective opinion score;
determining the second sample image as a first type of image under the condition that the peak signal-to-noise ratio value is smaller than the first threshold value or the mean opinion score is larger than or equal to a second threshold value; and under the condition that the average subjective opinion score is smaller than the second threshold value, determining that the second sample image is a second type image, wherein the blurring degree of the first type image is smaller than that of the second type image.
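The two-threshold decision in the steps above — the PSNR gates first, and the MOS is only evaluated when the PSNR passes the gate — can be sketched as follows (threshold values and names are illustrative):

```python
def classify_blur(psnr_value, mos_fn, psnr_threshold, mos_threshold):
    """psnr_value: output of the PSNR calculation network.
    mos_fn: callable running the image quality evaluation network;
    it is only invoked when the PSNR is at or above the first threshold."""
    if psnr_value < psnr_threshold:
        return "first"                      # classified directly by PSNR
    mos = mos_fn()                          # quality network runs only here
    return "first" if mos >= mos_threshold else "second"
```

Note that, as in the source, a PSNR below the first threshold yields a first-type image without ever computing the MOS, while the MOS alone separates the two classes once the PSNR gate is passed.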
Optionally, the reconstruction network includes a first shallow feature extraction network layer, N down-sampling network layers, N first up-sampling network layers, a first convolution layer, and a second up-sampling network layer, which are connected in sequence, where N is an integer greater than 1, the N down-sampling network layers are connected in series in sequence, the N first up-sampling network layers are connected in series in sequence, and an input of an nth first up-sampling network layer is a feature obtained by fusing an output of a network layer above the nth first up-sampling network layer and an output of an N-N +1 th down-sampling network layer; inputting the second sample image into a reconstruction network, and obtaining a third feature map comprises:
inputting the second sample image into the first shallow feature extraction network layer for shallow feature extraction processing to obtain a ninth sub-feature map;
sampling the ninth sub-feature map by using the N downsampling network layers and the N first upsampling network layers to obtain a tenth sub-feature map;
performing smoothing processing on the tenth sub-feature map by using the first convolution layer to obtain an eleventh sub-feature map;
and performing upsampling processing on the eleventh sub-feature map by using the second upsampling network layer to obtain the third feature map.
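The skip-connection indexing described above — the input of the n-th first upsampling layer fuses the previous layer's output with the output of the (N-n+1)-th downsampling layer — pairs the layers symmetrically, as this small sketch shows:

```python
def skip_pairs(n_layers):
    """Return (upsampling index, downsampling index) pairs for the
    fusion rule: the n-th upsampling layer fuses with the
    (N - n + 1)-th downsampling layer."""
    return [(n, n_layers - n + 1) for n in range(1, n_layers + 1)]
```

The first upsampling layer thus consumes the deepest downsampled features and the last upsampling layer the shallowest, matching spatial resolutions as in a U-Net-style encoder-decoder.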
Optionally, the image quality evaluation network includes a multilayer convolutional network layer and a first fully-connected network layer which are connected in sequence; wherein the inputting the third feature map into an image quality evaluation network to obtain the mean subjective opinion score of the second sample image comprises:
inputting the third feature map into the multilayer convolution network layer for feature extraction and fusion processing to obtain a fourth feature map with high-frequency abstract features;
and performing dimensionality reduction processing on the fourth feature map by using the first fully-connected network layer to obtain the mean subjective opinion score.
Optionally, the peak signal-to-noise ratio calculation network includes a first residual network layer, a feature extraction network layer, a second residual network layer, a second convolution layer, a first global pooling layer, and a second fully-connected network layer, which are connected in sequence; the first residual network layer is configured to perform global residual processing on the third feature map to obtain a twelfth sub-feature map, the feature extraction network layer is configured to perform high-frequency detail information feature extraction processing on the twelfth sub-feature map to obtain a thirteenth sub-feature map, the second residual network layer is configured to perform global residual processing on the thirteenth sub-feature map to obtain a fourteenth sub-feature map, the second convolution layer is configured to perform smoothing processing on the feature map obtained by fusing the fourteenth sub-feature map and the twelfth sub-feature map to obtain a fifteenth sub-feature map, the first global pooling layer is configured to perform compression processing on the fifteenth sub-feature map to obtain a sixteenth sub-feature map, and the second fully-connected network layer is configured to perform dimensionality reduction processing on the sixteenth sub-feature map to obtain the peak signal-to-noise ratio.
It should be noted that, when a reconstruction network is obtained through pre-training, if the final amplification factor of the reconstruction network is 4 times, the reconstruction network can be directly trained in one pass to amplify by 4 times, or the reconstruction network to be trained can be trained step by step through three pre-training stages, each stage amplifying the network by a factor of 4^(1/3). This training mode not only reduces the burden of network training, but also enlarges the network step by step so that the network can more easily approach real results.
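The per-stage amplification implied above — three stages whose factors multiply to the final factor of 4 — can be checked numerically:

```python
import math

def stage_scale(total_factor=4.0, stages=3):
    """Each pre-training stage amplifies by the same per-stage factor,
    so the stage factors multiply to the final amplification factor."""
    return total_factor ** (1.0 / stages)
```

For a 4x network trained in three stages, each stage amplifies by 4^(1/3) ≈ 1.587, a much gentler target per stage than a direct 4x.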
It should be noted that, in the image classification method provided in the embodiment of the present application, the execution subject may be an image classification device, or a control module in the image classification device for executing the loaded image classification method. In the embodiment of the present application, an image classification device executes a loaded image classification method as an example, and the image classification method provided in the embodiment of the present application is described.
Referring to fig. 8, fig. 8 is a structural diagram of an image classification apparatus according to an embodiment of the present application, and as shown in fig. 8, an image classification apparatus 800 includes:
a first input module 801, configured to input the first image to a reconstruction network to obtain a first feature map;
a second input module 802, configured to input the first feature map into a peak signal-to-noise ratio calculation network, so as to obtain the peak signal-to-noise ratio;
a third input module 803, configured to input the first feature map into an image quality evaluation network to obtain a mean subjective opinion score of the first image when the peak signal-to-noise ratio is greater than or equal to a first threshold;
a first determining module 804, configured to determine a blur degree of the first image according to the peak signal-to-noise ratio value and the mean subjective opinion score;
determining the first image as a first type of image when the peak signal-to-noise ratio value is smaller than the first threshold value or the mean opinion score is larger than or equal to a second threshold value; and under the condition that the mean subjective opinion score is smaller than the second threshold value, determining that the first image is a second type of image, wherein the blurring degree of the first type of image is smaller than that of the second type of image.
Optionally, the reconstruction network includes a first shallow feature extraction network layer, N down-sampling network layers, N first up-sampling network layers, a first convolution layer, and a second up-sampling network layer, which are connected in sequence, where N is an integer greater than 1, the N down-sampling network layers are connected in series in sequence, the N first up-sampling network layers are connected in series in sequence, and an input of an nth first up-sampling network layer is a feature obtained by fusing an output of a network layer above the nth first up-sampling network layer and an output of an N-N +1 th down-sampling network layer; the first input module 801 is specifically configured to perform the following operations:
inputting the first image into the first shallow feature extraction network layer to perform shallow feature extraction processing to obtain a first sub-feature map;
sampling the first sub-feature graph by using the N down-sampling network layers and the N first up-sampling network layers to obtain a second sub-feature graph;
smoothing the second sub-feature map by using the first convolution layer to obtain a third sub-feature map;
and utilizing the second upsampling network layer to perform upsampling processing on the third sub-feature map to obtain the first feature map.
Optionally, the image quality evaluation network includes a multilayer convolutional network layer and a first fully-connected network layer which are connected in sequence; the third input module 803 is specifically configured to perform the following operations:
inputting the first feature map into the multilayer convolutional network layer for feature extraction and fusion processing to obtain a second feature map with high-frequency abstract features;
and performing dimensionality reduction processing on the second feature map by using the first fully-connected network layer to obtain the mean subjective opinion score.
Optionally, the peak signal-to-noise ratio calculation network includes a first residual network layer, a feature extraction network layer, a second residual network layer, a second convolution layer, a first global pooling layer, and a second fully-connected network layer, which are connected in sequence; the first residual network layer is used for carrying out global residual processing on the first feature map to obtain a fourth sub-feature map, the feature extraction network layer is used for carrying out high-frequency detail information feature extraction processing on the fourth sub-feature map to obtain a fifth sub-feature map, the second residual network layer is used for carrying out global residual processing on the fifth sub-feature map to obtain a sixth sub-feature map, the second convolution layer is used for carrying out smoothing processing on the feature map obtained by fusing the sixth sub-feature map and the fourth sub-feature map to obtain a seventh sub-feature map, the first global pooling layer is used for carrying out compression processing on the seventh sub-feature map to obtain an eighth sub-feature map, and the second fully-connected network layer is used for carrying out dimensionality reduction processing on the eighth sub-feature map to obtain the peak signal-to-noise ratio.
Optionally, the image classification apparatus 800 further includes:
and the adjusting module is used for adjusting the size of the first image to be processed to the preset size.
It should be noted that, in the image classification model training method provided in the embodiment of the present application, the execution subject may be an image classification model training apparatus, or a control module in the image classification model training apparatus for executing the loaded image classification model training method. In the embodiment of the present application, an example in which an image classification model training device executes a training method for loading an image classification model is taken as an example, and the training method for the image classification model provided in the embodiment of the present application is described.
Referring to fig. 9, fig. 9 is a block diagram of an image classification model training apparatus according to an embodiment of the present application, and as shown in fig. 9, the image classification model training apparatus 900 includes:
the first training module 901 is configured to pre-train a to-be-trained peak signal-to-noise ratio calculation network by using a first sample image to obtain a peak signal-to-noise ratio calculation network;
a second training module 902, configured to input a second sample image into a pre-trained reconstruction network, and to input the output of the reconstruction network into the peak signal-to-noise ratio calculation network and the to-be-trained image quality evaluation network, respectively;
a calculating module 903, configured to calculate a first loss value based on the output of the peak signal-to-noise ratio calculation network, and calculate a second loss value based on the output of the to-be-trained image quality evaluation network;
a second determining module 904, configured to determine an image classification model based on the peak signal-to-noise ratio computing network, the reconstruction network, and the currently trained image quality evaluation network to be trained, if the weighted sum of the first loss value and the second loss value satisfies a loss convergence condition;
in the image classification model, the peak signal-to-noise ratio calculation network is used for inputting the output of the reconstruction network into an image quality evaluation network for image blur degree classification under the condition that the output peak signal-to-noise ratio is greater than or equal to a first threshold value.
Optionally, the second training module 902 comprises:
the first input unit is used for inputting a second sample image with a preset size into a reconstruction network to obtain a third feature map with amplified resolution;
a second input unit, configured to input the third feature map into the peak signal-to-noise ratio calculation network to obtain the peak signal-to-noise ratio;
a third input unit, configured to input the third feature map into an image quality evaluation network when the peak signal-to-noise ratio is greater than or equal to a first threshold, to obtain a mean subjective opinion score of the second sample image;
a determining unit, configured to determine the blur degree classification of the second sample image according to the peak signal-to-noise ratio value and the mean subjective opinion score;
determining the second sample image as a first type of image under the condition that the peak signal-to-noise ratio value is smaller than the first threshold value or the mean opinion score is larger than or equal to a second threshold value; and under the condition that the average subjective opinion score is smaller than the second threshold value, determining that the second sample image is a second type image, wherein the blurring degree of the first type image is smaller than that of the second type image.
Optionally, the reconstruction network includes a first shallow feature extraction network layer, N down-sampling network layers, N first up-sampling network layers, a first convolution layer, and a second up-sampling network layer, which are connected in sequence, where N is an integer greater than 1, the N down-sampling network layers are connected in series in sequence, the N first up-sampling network layers are connected in series in sequence, and an input of an nth first up-sampling network layer is a feature obtained by fusing an output of a network layer above the nth first up-sampling network layer and an output of an N-N +1 th down-sampling network layer; the first input unit is specifically configured to perform the following operations:
inputting the second sample image with the preset size into the first shallow feature extraction network layer for shallow feature extraction processing to obtain a ninth sub-feature map;
sampling the ninth sub-feature map by using the N downsampling network layers and the N first upsampling network layers to obtain a tenth sub-feature map;
smoothing the tenth sub-feature map by using the first convolution layer to obtain an eleventh sub-feature map;
and performing upsampling processing on the eleventh sub-feature map by using the second upsampling network layer to obtain the third feature map.
Optionally, the image quality evaluation network includes a multilayer convolutional network layer and a first fully-connected network layer which are connected in sequence; the second input unit is specifically configured to perform the following operations:
inputting the third feature map into the multilayer convolution network layer for feature extraction and fusion processing to obtain a fourth feature map with high-frequency abstract features;
and performing dimension reduction processing on the fourth feature map by using the first fully-connected network layer to obtain the mean subjective opinion score.
Optionally, the peak signal-to-noise ratio calculation network includes a first residual network layer, a feature extraction network layer, a second residual network layer, a second convolution layer, a first global pooling layer, and a second fully-connected network layer, which are connected in sequence; the first residual network layer is configured to perform global residual processing on the third feature map to obtain a twelfth sub-feature map, the feature extraction network layer is configured to perform high-frequency detail information feature extraction processing on the twelfth sub-feature map to obtain a thirteenth sub-feature map, the second residual network layer is configured to perform global residual processing on the thirteenth sub-feature map to obtain a fourteenth sub-feature map, the second convolution layer is configured to perform smoothing processing on the feature map obtained by fusing the fourteenth sub-feature map and the twelfth sub-feature map to obtain a fifteenth sub-feature map, the first global pooling layer is configured to perform compression processing on the fifteenth sub-feature map to obtain a sixteenth sub-feature map, and the second fully-connected network layer is configured to perform dimensionality reduction processing on the sixteenth sub-feature map to obtain the peak signal-to-noise ratio.
The image classification device and the image classification model training device in the embodiment of the present application may be devices, or may be components, integrated circuits, or chips in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The image classification device and the image classification model training device in the embodiment of the present application may be devices having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which is not specifically limited in the embodiment of the present application.
The image classification device and the image classification model training device provided in the embodiment of the present application can implement each process in the method embodiments of fig. 1 to 7, and are not described here again to avoid repetition.
Optionally, an electronic device is further provided in this embodiment of the present application, and includes a processor 1010, a memory 1009, and a program or an instruction stored in the memory 1009 and capable of running on the processor 1010, where the program or the instruction is executed by the processor 1010 to implement each process of the above-described embodiment of the image classification method or the image classification model training method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 11 is a schematic hardware structure diagram of an electronic device implementing various embodiments of the present application.
The electronic device 1100 includes, but is not limited to: a radio frequency unit 1101, a network module 1102, an audio output unit 1103, an input unit 1104, a sensor 1105, a display unit 1106, a user input unit 1107, an interface unit 1108, a memory 1109, a processor 1110, and the like.
Those skilled in the art will appreciate that the electronic device 1100 may further include a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system. The electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, combine some components, or arrange different components, and thus the description is omitted here.
Wherein, the processor 1110 is configured to perform the following operations:
inputting the first image into a reconstruction network to obtain a first characteristic diagram;
inputting the first characteristic diagram into a peak signal-to-noise ratio calculation network to obtain the peak signal-to-noise ratio;
under the condition that the peak signal-to-noise ratio value is larger than or equal to a first threshold value, inputting the first feature map into an image quality evaluation network to obtain the mean subjective opinion score of the first image;
determining the blurring degree of the first image according to the peak signal-to-noise ratio value and the mean opinion score;
determining the first image as a first type of image when the peak signal-to-noise ratio value is smaller than the first threshold value or the mean opinion score is larger than or equal to a second threshold value; and under the condition that the mean subjective opinion score is smaller than the second threshold value, determining that the first image is a second type of image, wherein the blurring degree of the first type of image is smaller than that of the second type of image.
Or processor 1110 for performing the following operations:
pre-training a peak signal-to-noise ratio calculation network to be trained by using a first sample image, to obtain a peak signal-to-noise ratio calculation network;
inputting a second sample image into a pre-trained reconstruction network, and inputting the output of the reconstruction network into the peak signal-to-noise ratio calculation network and into an image quality evaluation network to be trained, respectively;
calculating a first loss value based on the output of the peak signal-to-noise ratio calculation network, and calculating a second loss value based on the output of the image quality evaluation network to be trained;
determining, when the weighted sum of the first loss value and the second loss value satisfies a loss convergence condition, an image classification model based on the peak signal-to-noise ratio calculation network, the reconstruction network, and the currently trained image quality evaluation network;
in the image classification model, the peak signal-to-noise ratio calculation network is configured to input the output of the reconstruction network into the image quality evaluation network for image blur degree classification when the output peak signal-to-noise ratio value is greater than or equal to a first threshold.
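A minimal sketch of the weighted loss and one possible loss convergence condition follows. The weights `w1`/`w2`, tolerance `eps`, and `patience` are illustrative assumptions, since the application does not fix them:

```python
def weighted_loss(first_loss, second_loss, w1=0.5, w2=0.5):
    """Weighted sum of the first and second loss values."""
    return w1 * first_loss + w2 * second_loss

def loss_converged(history, eps=1e-4, patience=3):
    """One possible loss convergence condition: the weighted sum changed
    by less than eps over the last `patience` consecutive steps."""
    if len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    return all(abs(recent[i + 1] - recent[i]) < eps for i in range(patience))
```

In practice any monotone stopping rule on the weighted sum (fixed epoch budget, plateau detection, validation-based early stopping) could play this role.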
The embodiments of the present application further provide a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement the processes of the image classification method embodiment or the image classification model training method embodiment and achieve the same technical effects; to avoid repetition, details are omitted here.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the image classification method embodiment or the image classification model training method embodiment and achieve the same technical effects; to avoid repetition, details are omitted here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. Further, the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; functions may instead be performed in a substantially simultaneous manner or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including instructions for causing a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
While the embodiments of the present application have been described with reference to the accompanying drawings, the invention is not limited to the precise embodiments described above, which are illustrative rather than restrictive. Various changes may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image classification method, comprising:
inputting the first image into a reconstruction network to obtain a first feature map;
inputting the first feature map into a peak signal-to-noise ratio calculation network to obtain a peak signal-to-noise ratio value;
inputting, when the peak signal-to-noise ratio value is greater than or equal to a first threshold, the first feature map into an image quality evaluation network to obtain a mean subjective opinion score of the first image;
determining the degree of blur of the first image according to the peak signal-to-noise ratio value and the mean subjective opinion score;
determining that the first image is a first-type image when the peak signal-to-noise ratio value is smaller than the first threshold or the mean subjective opinion score is greater than or equal to a second threshold; and determining that the first image is a second-type image when the mean subjective opinion score is smaller than the second threshold, wherein the degree of blur of the first-type image is smaller than that of the second-type image.
2. The method according to claim 1, wherein the reconstruction network comprises a first shallow feature extraction network layer, N downsampling network layers, N first upsampling network layers, a first convolution layer, and a second upsampling network layer, which are connected in sequence, where N is an integer greater than 1; the N downsampling network layers are serially connected in sequence, the N first upsampling network layers are serially connected in sequence, and the input of the n-th first upsampling network layer is a feature obtained by fusing the output of the previous network layer with the output of the (N-n+1)-th downsampling network layer; wherein the inputting the first image into the reconstruction network to obtain the first feature map comprises:
inputting the first image into the first shallow feature extraction network layer to perform shallow feature extraction processing to obtain a first sub-feature map;
sampling the first sub-feature map by using the N downsampling network layers and the N first upsampling network layers to obtain a second sub-feature map;
smoothing the second sub-feature map by using the first convolution layer to obtain a third sub-feature map;
and performing upsampling processing on the third sub-feature map by using the second upsampling network layer to obtain the first feature map.
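The layer wiring recited in claim 2 can be sketched with every layer as a plain callable. This reproduces only the data flow; the fusion operator `fuse` is an assumption, since the claim does not state whether fusion is addition or concatenation:

```python
def reconstruct(x, shallow, down_layers, up_layers, conv, final_up, fuse):
    """Wiring of the reconstruction network: N serial downsampling layers,
    N serial upsampling layers with skip fusion, a smoothing convolution,
    and a final upsampling layer. Layers are arbitrary callables here."""
    n = len(down_layers)                 # N; must equal len(up_layers)
    f = shallow(x)                       # first sub-feature map
    down_out = []
    for down in down_layers:             # serially connected downsampling
        f = down(f)
        down_out.append(f)
    for i, up in enumerate(up_layers, start=1):
        # i-th upsampling layer: fuse the previous output with the output
        # of the (N - i + 1)-th downsampling layer (1-indexed as claimed).
        f = up(fuse(f, down_out[n - i]))
    f = conv(f)                          # smoothing -> third sub-feature map
    return final_up(f)                   # first feature map
```

With toy callables (e.g., `lambda v: v * 2` for a downsampling layer) the function exercises the skip-connection indexing without any learned weights.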
3. The method of claim 1, wherein the image quality evaluation network comprises a plurality of convolutional network layers and a first fully-connected network layer connected in sequence; wherein, the inputting the first feature map into an image quality evaluation network to obtain the mean subjective opinion score of the first image comprises:
inputting the first feature map into the plurality of convolutional network layers for feature extraction and fusion processing to obtain a second feature map carrying high-frequency abstract features;
and performing dimension reduction processing on the second characteristic diagram by using the first fully-connected network layer to obtain the mean subjective opinion score.
4. The method of claim 1, wherein the peak signal-to-noise ratio calculation network comprises a first residual network layer, a feature extraction network layer, a second residual network layer, a second convolution layer, a first global pooling layer, and a second fully-connected network layer, which are connected in sequence; the first residual network layer is configured to perform global residual processing on the first feature map to obtain a fourth sub-feature map, the feature extraction network layer is configured to perform high-frequency detail information feature extraction on the fourth sub-feature map to obtain a fifth sub-feature map, the second residual network layer is configured to perform global residual processing on the fifth sub-feature map to obtain a sixth sub-feature map, the second convolution layer is configured to perform smoothing on the feature map obtained by fusing the sixth sub-feature map and the fourth sub-feature map to obtain a seventh sub-feature map, the first global pooling layer is configured to perform compression on the seventh sub-feature map to obtain an eighth sub-feature map, and the second fully-connected network layer is configured to perform dimensionality reduction on the eighth sub-feature map to obtain the peak signal-to-noise ratio value.
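Claim 4's layer sequence can be sketched the same way, structure only, with each layer passed in as a callable placeholder and `fuse` again an assumed fusion operator:

```python
def psnr_head(first_feature_map, res1, extract, res2, conv, pool, fc, fuse):
    """Flow of the peak signal-to-noise ratio calculation network."""
    f4 = res1(first_feature_map)  # global residual -> fourth sub-feature map
    f5 = extract(f4)              # high-frequency detail features -> fifth
    f6 = res2(f5)                 # global residual -> sixth
    f7 = conv(fuse(f6, f4))       # smooth the fused sixth + fourth -> seventh
    f8 = pool(f7)                 # global pooling compression -> eighth
    return fc(f8)                 # fully-connected reduction -> PSNR value
```

The notable structural detail is the long skip: the fourth sub-feature map re-enters before the smoothing convolution, mirroring the fusion step in the claim.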
5. The method of claim 1, wherein before inputting the first image into the reconstruction network to obtain the first feature map, the method further comprises:
adjusting the size of the first image to a preset size.
6. An image classification model training method is characterized by comprising the following steps:
pre-training a to-be-trained peak signal-to-noise ratio calculation network by utilizing a first sample image to obtain a peak signal-to-noise ratio calculation network;
inputting a second sample image into a pre-trained reconstruction network, and inputting the output of the reconstruction network into the peak signal-to-noise ratio calculation network and into an image quality evaluation network to be trained, respectively;
calculating a first loss value based on the output of the peak signal-to-noise ratio calculation network, and calculating a second loss value based on the output of the image quality evaluation network to be trained;
under the condition that the weighted sum of the first loss value and the second loss value meets a loss convergence condition, determining an image classification model based on the peak signal-to-noise ratio calculation network, the reconstruction network and the currently trained image quality evaluation network to be trained;
in the image classification model, the peak signal-to-noise ratio calculation network is used for inputting the output of the reconstruction network into an image quality evaluation network for image blur degree classification under the condition that the output peak signal-to-noise ratio value is greater than or equal to a first threshold value.
7. An image classification apparatus, comprising:
a first input module, configured to input a first image into a reconstruction network to obtain a first feature map;
a second input module, configured to input the first feature map into a peak signal-to-noise ratio calculation network to obtain a peak signal-to-noise ratio value;
a third input module, configured to input the first feature map into an image quality evaluation network to obtain a mean subjective opinion score of the first image when the peak signal-to-noise ratio is greater than or equal to a first threshold;
a first determining module, configured to determine the degree of blur of the first image according to the peak signal-to-noise ratio value and the mean subjective opinion score; wherein the first determining module determines that the first image is a first-type image when the peak signal-to-noise ratio value is smaller than the first threshold or the mean subjective opinion score is greater than or equal to a second threshold, and determines that the first image is a second-type image when the mean subjective opinion score is smaller than the second threshold, the degree of blur of the first-type image being smaller than that of the second-type image.
8. An image classification model training device, comprising:
a first training module, configured to pre-train a peak signal-to-noise ratio calculation network to be trained by using a first sample image, to obtain the peak signal-to-noise ratio calculation network;
a second training module, configured to input a second sample image into a pre-trained reconstruction network, and to input the output of the reconstruction network into the peak signal-to-noise ratio calculation network and into an image quality evaluation network to be trained, respectively;
a calculation module, configured to calculate a first loss value based on the output of the peak signal-to-noise ratio calculation network, and to calculate a second loss value based on the output of the image quality evaluation network to be trained;
a second determining module, configured to determine, when the weighted sum of the first loss value and the second loss value satisfies a loss convergence condition, an image classification model based on the peak signal-to-noise ratio calculation network, the reconstruction network, and the currently trained image quality evaluation network;
in the image classification model, the peak signal-to-noise ratio calculation network is used for inputting the output of the reconstruction network into an image quality evaluation network for image blur degree classification under the condition that the output peak signal-to-noise ratio is greater than or equal to a first threshold value.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the image classification method according to any one of claims 1 to 5 or implementing the steps of the image classification model training method according to claim 6.
10. A readable storage medium on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the image classification method of any one of claims 1 to 5 or carry out the steps of the image classification model training method of claim 6.
CN202111135062.XA 2021-09-27 2021-09-27 Image classification method, image classification model training method and related equipment Pending CN115880516A (en)

Publications (1)

Publication Number Publication Date
CN115880516A true CN115880516A (en) 2023-03-31

Family

ID=85762897



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342984A (en) * 2023-05-31 2023-06-27 之江实验室 Model training method, image processing method and image processing device
CN116342984B (en) * 2023-05-31 2023-08-08 之江实验室 Model training method, image processing method and image processing device
CN117911794A (en) * 2024-03-15 2024-04-19 广州中科智巡科技有限公司 Model obtaining method and device for image classification, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination