CN112132145B - Image classification method and system based on model extended convolutional neural network - Google Patents


Info

Publication number: CN112132145B
Application number: CN202010768636.6A
Authority: CN (China)
Prior art keywords: image, network, classification, model, convolution
Legal status: Active (granted)
Inventors: Li Yanshan (李岩山), Chen Jiahuan (陈嘉欢)
Current assignee: Shenzhen University
Original assignee: Shenzhen University
Other languages: Chinese (zh)
Other versions: CN112132145A
Application filed by Shenzhen University; published as CN112132145A, granted as CN112132145B.

Classifications

    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/254: Fusion techniques of classification results, e.g. of results related to same input data
    • G06N 3/045: Combinations of networks
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses an image classification method and system based on a model-extended convolutional neural network. Images are preprocessed with an image-segmentation and pixel-threshold judgment algorithm, which removes images with little information content and lays the groundwork for the later decision-fusion stage. A detail feature attention model improves the traditional network: through a model-extension and post-convolution feature-extraction mechanism, the network can adapt to and explore deeper features and assign them more weight at the coefficient level, biasing the attention mechanism toward local or detail features. A mobile convolution layer creates new image features and reduces dimensionality through linear combination, yielding an image feature-vector map with more information content; this protects the feature information of the image and reduces the model's risk of overfitting, so that images are classified more accurately. Finally, a decision-fusion stage at the back end of the network reduces the interference of locally confused image features on the classification result and further improves the accuracy of the classifier.

Description

Image classification method and system based on model extended convolutional neural network
Technical Field
The invention relates to the field of cognitive image processing, in particular to an image classification method and system based on a model extended convolutional neural network.
Background
Classification has long been an important research problem in deep learning; in supervised learning, essentially every problem can be categorized as classification or regression. Traditional deep-learning image classification is mainly based on convolutional neural networks: the number of channels of the input image matches the number of channels of the convolution kernel, the input is reduced in dimension by convolution operations that extract features and edge information, and pooling operations learn the features of the image at various levels. However, owing to limitations in computational resources and time, it is difficult to balance classification accuracy against time complexity.
Building on this, image classification algorithms based on deep learning have recently been developed: the dimensions of an image are decomposed to obtain a multi-channel two-dimensional representation, which is converted into several groups of arrays. The two-dimensional arrays of the different channels are then fed into a convolutional neural network (CNN), a residual network (ResNet), or a recurrent neural network (RNN) to learn the feature information and edge information of the different dimensions of the image.
The CNN-based approach searches for and learns image features through the cooperation of convolution layers, pooling layers and fully connected layers. It is fairly robust; however, once accuracy reaches a certain level, further improvement requires an explosive increase in network depth, an excellent standard becomes hard to reach, and the robustness of the results degrades. ResNet adds a residual mapping function on top of the CNN, alleviating vanishing and exploding gradients as network depth grows, but it does not solve the large resource consumption and high time complexity of network training, and model degradation appears once the depth exceeds the thousand-layer level. The RNN-based approach concatenates the pixel values of the different image channels into vectors that are fed into an RNN model; it struggles to learn the correspondence between features, to express the features of the image, and to describe spatial information, and it is hard to drive with a large data set. On the premise of small samples, these traditional methods cannot extract sufficiently deep detail features, making it hard to associate image edge information with feature information and causing confusion between different classes of features and vanishing gradients, so a good classification effect is difficult to obtain.
Disclosure of Invention
Therefore, the invention aims to overcome the defect that prior-art networks cannot fully express the features of an image and consequently classify poorly, and provides an image classification method and system based on a model-extended convolutional neural network.
In order to achieve the above purpose, the present invention provides the following technical solutions.
In a first aspect, an embodiment of the present invention provides an image classification method based on a model-extended convolutional neural network, including the steps of:
preprocessing the images to be classified, and taking the original image corresponding to each image meeting the condition as the input image of the network;
improving a convolution network through model extension and a post-convolution feature-extraction mechanism to form a detail feature attention model that extracts features from the input image to obtain image features;
filtering each input channel with a mobile convolution layer, then computing linear combinations of the input channels to create new image features, and reducing the dimension of the image features to obtain an image feature-vector map with more information content;
classifying the image based on the image feature-vector map to obtain classification results and, through back-end decision fusion based on a preset back-end decision-fusion algorithm, taking the classification result with the highest confidence as the final classification result.
In an embodiment, preprocessing the image to be classified and taking the original image corresponding to each image meeting the condition as the input image of the network includes:
dividing the image to be classified into a plurality of subgraphs, removing subgraphs with little information, and rotating, mirroring and contrast-adjusting the representative subgraphs to enhance the image characteristics and obtain the input image, wherein for a divided subgraph S_i the gray-pixel-value image is obtained by the following formula:

S_gray(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3

wherein R(x, y), G(x, y), B(x, y) respectively represent the pixel values of the three channels in the subgraph, from which the average pixel gray value of the image is calculated;
taking the original image corresponding to an image meeting preset conditions as the input image of the network, wherein the preset condition is:

num(S_gray(x_j, y_j)) > K

wherein the preset gray-value threshold is P, the threshold on the number of image pixels is K, and n is the number of pixels in the image.
In one embodiment, improving a convolution network through model extension and a post-convolution feature-extraction mechanism to form a detail feature attention model that extracts features from the input image includes:
uniformly scaling the model at a constant ratio to change the depth, width and resolution of a conventional convolution network, and maximizing the accuracy of the model under a preset resource constraint:

max_{d,w,r} Accuracy(N(d, w, r))
s.t. N(d, w, r) = ⊙_{i=1…s} F̂_i^{d·L̂_i}(X_{⟨r·Ĥ_i, r·Ŵ_i, w·Ĉ_i⟩})
     Memory(N) ≤ target memory,  FLOPS(N) ≤ target FLOPS

wherein w, d, r are respectively the width, depth and resolution of the extended network, and F̂_i, L̂_i, Ĥ_i, Ŵ_i, Ĉ_i are parameters of the predefined baseline network;
obtaining the weight factor of each channel by using the weighted average of the characteristic parameters of the baseline network, adding a pooling layer and a fully connected layer during feature extraction to finally obtain the features of the image, and concatenating the features of the different channels along the channel dimension to finally obtain the feature map of the image;
obtaining the information amount of the image, and partitioning, through a clustering algorithm, the feature-map information in the region into that with higher confidence and that with lower confidence, wherein the criterion for a high-confidence feature map is:

|H − H_0| ≤ K

wherein H_0 is the point on the two-dimensional coordinates corresponding to randomly selected image information and K is a preset range; a feature map whose point falls within range K of H_0 on the two-dimensional coordinates subsequently passes through a less-extended network, and otherwise through a more-extended network.
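As an illustration of this confidence split, the following sketch partitions feature-map information points around a randomly chosen centre H_0. The Euclidean distance metric and all names are assumptions, since the text only speaks of a preset range K:

```python
import numpy as np

def split_by_confidence(points, H0, K):
    """Partition feature-map information points on the 2-D plane into
    high confidence (within range K of the chosen centre H0) and low
    confidence (outside that range). Euclidean distance is an assumption."""
    points, H0 = np.asarray(points, float), np.asarray(H0, float)
    dist = np.linalg.norm(points - H0, axis=1)
    return points[dist <= K], points[dist > K]
```

The high-confidence group would then be routed through the less-extended network, the rest through the more-extended one.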
In one embodiment, the information amount H of the image is calculated by the following formula:

H = −Σ_i p_i · log2(p_i)

wherein the pixel value of channel i has size P_i and the probabilities of the pixel values of the channels are p_1, p_2, … respectively.
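The information-amount formula can be sketched as follows. Treating the channel as an 8-bit gray image and reading the probabilities p_i from its gray-level histogram is an assumption consistent with the later description of the feature-feedback structure:

```python
import numpy as np

def image_information(channel, levels=256):
    """Information amount H = -sum_i p_i * log2(p_i) over the gray-level
    histogram of one channel (Shannon entropy); zero-probability levels
    are skipped since they contribute nothing."""
    hist = np.bincount(channel.ravel().astype(np.int64), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

A constant channel carries zero information, while a channel split evenly between two gray levels carries exactly one bit.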
In one embodiment, the mobile convolution layer comprises a depth-separable convolution layer and a linear residual bottleneck layer connected to it, wherein:
the depth-separable convolution layer replaces a complete convolution operator with a convolution operator decomposed into two independent layers: the first layer is a depthwise convolution that performs lightweight filtering by applying one convolution filter to each input channel; the second layer is a pointwise convolution that creates new image features by computing linear combinations of the input channels;
the linear residual bottleneck layer performs dimension reduction on the image features and extracts the information content of the feature-vector map.
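A minimal NumPy sketch of the two steps of the depth-separable convolution layer: lightweight depthwise filtering with one filter per input channel, then a pointwise (1×1) linear combination of the channels. Padding, stride and the linear residual bottleneck are omitted, and all names are illustrative:

```python
import numpy as np

def depthwise_conv(x, dw_filters):
    """Depthwise step: one k x k filter per input channel
    (x: H x W x C, dw_filters: k x k x C), 'valid' padding, stride 1."""
    H, W, C = x.shape
    k = dw_filters.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, C))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_filters[:, :, c])
    return out

def pointwise_conv(x, pw_weights):
    """Pointwise step: 1x1 convolution, i.e. a linear combination of the
    input channels at every spatial position (pw_weights: C_in x C_out)."""
    return x @ pw_weights
```

Chaining the two replaces one full C_in×C_out spatial convolution at much lower cost, which is the trade the description names.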
In an embodiment, classifying the image based on the image feature-vector map to obtain classification results and, through back-end decision fusion based on a preset back-end decision-fusion algorithm, taking the highest-confidence result as the final classification result includes:
the posterior probability output by the Softmax layer of the channel network in the subgraph judgment and classification process is calculated as:

prob(p) = exp(V_p) / Σ_j exp(V_j)

wherein p denotes a classification category, and V_p and V_j respectively denote the confidence of that category and of category j; the classification result of the subgraph is inferred through the following formula:

prob_max(i) = max{prob(1), prob(2), …, prob(M)}

the classification result of the subgraph is added to the candidate region; with the feature coefficient obtained by the i-th subgraph denoted m_i, the score of a category for the original image over the classification results Z = {Z_1, Z_2, …, Z_M} in the candidate region is expressed as:

fra(Z_k) = (1/N) · Σ_i m_i · 1[Z_i = k]

wherein k denotes the result label corresponding to the classification result and N denotes the number of subgraphs retained by the pixel-threshold judgment method;
combining the judgment categories of all valid subgraphs gives the decision-fusion classification result, calculated by the following formula:

Z_last = max{fra(Z_1), fra(Z_2), …, fra(Z_M)}.
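The Softmax posterior and the weighted back-end fusion can be sketched as follows. The averaging of the coefficients m_i over the N retained subgraphs in `fuse` is an assumed reading of the score formula, not a quotation of the patent, and all names are illustrative:

```python
import numpy as np

def softmax_prob(V):
    """Posterior prob(p) = exp(V_p) / sum_j exp(V_j) over class scores V."""
    e = np.exp(V - np.max(V))  # subtract the max for numerical stability
    return e / e.sum()

def fuse(sub_results, coeffs, num_classes):
    """Weighted vote: each valid subgraph i votes for its class Z_i with
    its feature coefficient m_i; the class with the highest fused score
    wins. Dividing by N (the number of retained subgraphs) is an
    assumption about the fra(.) score."""
    N = len(sub_results)
    scores = np.zeros(num_classes)
    for Z_i, m_i in zip(sub_results, coeffs):
        scores[Z_i] += m_i / N
    return int(np.argmax(scores))
```

With four subgraph votes [0, 1, 1, 2] and coefficients [0.5, 0.4, 0.4, 0.1], class 1 accumulates the highest score and is returned as the fused decision.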
in an embodiment, the image to be classified is a fine-grained image.
In a second aspect, an embodiment of the present invention provides an image classification system based on a model-extended convolutional neural network, including:
the image preprocessing module is used for preprocessing the images to be classified, and taking the original image corresponding to the image meeting the condition as an input image of the network;
the image feature acquisition module is used for improving a convolution network through model expansion and a post-convolution feature extraction mechanism to form a detail feature attention model to extract features of an input image so as to obtain image features;
the image feature vector diagram acquisition module is used for utilizing the mobile convolution layer to filter each input channel, then calculating the linear combination of the input channels to create new image features, and reducing the dimension of the image features to obtain an image feature vector diagram with more information content;
the classification result acquisition module is used for carrying out classification judgment on the image based on the image feature vector diagram to obtain a classification result, and obtaining the classification result with the highest confidence coefficient through back-end decision fusion based on a preset back-end decision fusion algorithm to serve as a final classification result.
In a third aspect, embodiments of the present invention provide a computer readable storage medium storing computer instructions for causing a computer to perform the image classification method based on the model-extended convolutional neural network of the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer apparatus, including a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions so as to perform the image classification method based on the model-extended convolutional neural network of the first aspect of the embodiments of the present invention.
The technical scheme of the invention has the following advantages:
1. The image classification method and system based on the model-extended convolutional neural network provided by the invention preprocess the images to be classified and take the original image corresponding to each qualifying image as the input image of the network; improve a convolution network through model extension and a post-convolution feature-extraction mechanism to form a detail feature attention model that extracts image features; filter each input channel with a mobile convolution layer, then compute linear combinations of the input channels to create new image features and reduce their dimension, obtaining an image feature-vector map with more information content; and classify the image based on that map, taking the highest-confidence result produced by a preset back-end decision-fusion algorithm as the final classification. Based on the model-extended convolutional neural network, the embodiment strengthens the convolution network's ability to capture deep image features and express them fully; the mobile convolution layer reduces the computation time of the deep network, protects the image features, lowers the risk of model overfitting and moderately reduces the amount of network computation; and the decision-fusion algorithm, by simulating repeated experiments, reduces the interference of locally confused image features on the classification result and improves the classification accuracy of the classifier.
2. The image preprocessing method provided by the invention uses pixel-threshold detection: through image gray-level conversion and by setting the gray-level count and gray threshold of the pixels, it removes pure-background sub-images and sub-images with little information content.
3. The invention provides a detail feature attention model with model extension, which gives different model extensions and confidence weights to different features by considering the feature coefficients of their respective channels, and provides a feature-feedback method that combines Shannon's theorem with clustering-model feedback to eliminate features with little information.
4. The invention provides a mobile convolution layer model comprising a depth-separable convolution layer and a linear residual bottleneck layer. The depth-separable convolution layer reduces network complexity and time complexity by replacing the complete convolution operator with a decomposed one, using the factorized convolution in place of a standard convolution layer at lower time cost for nearly equivalent feature-extraction capability. The linear residual bottleneck layer uses width multiplication to balance the parameters of the traditional residual block effectively between combination and accuracy and reduces the dimension of the activation space.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a workflow diagram of one specific example of a model-extended convolutional neural network-based image classification method in an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a specific example of an image classification method based on a model-extended convolutional neural network in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a detailed feature attention model in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the operation of a mobile convolutional layer according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the operation of a separable convolution block in an embodiment of the present invention;
FIG. 6 is a block diagram of a specific example of an image classification system based on a model-extended convolutional neural network in an embodiment of the present invention;
fig. 7 is a composition diagram of a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings; obviously, the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1
The embodiment of the invention provides an image classification method based on a model-extended convolutional neural network, which can be used in scenarios such as automatic album classification, object detection in traffic scenes, and animal identification. Its workflow is shown in FIG. 1 and, as shown in FIG. 2, it specifically comprises the following steps:
step S10: preprocessing the images to be classified, and taking the original image corresponding to the image meeting the condition as an input image of the network.
The images to be classified in the embodiment of the invention can be fine-grained or ordinary images. The difference between them mainly shows as follows: first, fine-grained image features are not obvious, so the classifier is prone to overfitting or the gradient fails to descend; second, the inter-class gap of fine-grained images is smaller and the class features are similar, placing higher demands on classifier performance; finally, the data sets are small, and too little data easily overfits the classifier. The network model provided by the embodiment of the invention improves on the traditional convolution-network algorithm and achieves higher accuracy when classifying fine-grained images.
The image needs to be preprocessed before classification; if the data set is small, expanding the data set must be considered during preprocessing. The embodiment of the invention combines image segmentation with pixel-threshold judgment as follows: the picture is first segmented into a plurality of subgraphs; subgraphs with little information content are removed; the representative subgraphs are rotated, mirrored and contrast-adjusted to enhance the image characteristics and obtain the input image, which is then represented as a two-dimensional array for each of the three channels; with a given channel denoted i, S represents the subgraph to be classified:
because the image is an RGB data image with three channels, and weights of the three channels are equal and have no high-low score, if three channels are used for respectively judging pixel threshold values, the calculated amount and the network training amount are multiplied, and the segmented image without practical meaning is converted into the network training parameters to cause noise interference of a data set, the embodiment of the invention processes the sub-image by using a gray algorithm: assume that the segmented sub-image is S i The gray pixel value image is obtained by the following formula:
wherein R (x, y), G (x, y), B (x, y) represent the average of the pixel values of the three channels in the sub-graph, respectively. After that, the average pixel gray value of the image is calculated, the threshold value of the set gray value is assumed to be P, the threshold value of the set pixel number of the image is assumed to be K, the pure background image is removed through the threshold value of the gray value and the average pixel gray value, then the image with the smaller gray value number is removed through calculating the number of the image gray values, and the final classified image with a certain effective information amount and capable of being used for reference is obtained, and the specific calculation is as follows:
num(S gray (x j ,y j ))>K (3)
Where n is the size of the image pixel. And finally, taking the original image corresponding to the image meeting the condition as the input of the network, and making a bedding for the subsequent image expansion convolution and decision fusion.
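The segmentation, gray conversion and pixel-threshold judgment described above can be sketched as follows. The regular grid split, the equal-weight gray conversion of formula (2) and the thresholds P and K are illustrative assumptions:

```python
import numpy as np

def to_gray(sub):
    """Equal-weight gray conversion, as formula (2) describes:
    S_gray(x, y) = (R + G + B) / 3."""
    return sub.mean(axis=2)

def split_into_subgraphs(img, rows, cols):
    """Split an H x W x 3 image into rows*cols equally sized subgraphs."""
    h, w = img.shape[0] // rows, img.shape[1] // cols
    return [img[r*h:(r+1)*h, c*w:(c+1)*w]
            for r in range(rows) for c in range(cols)]

def keep_subgraph(sub, P, K):
    """Keep a subgraph only if more than K of its pixels exceed the gray
    threshold P, an assumed reading of condition (3)."""
    gray = to_gray(sub)
    return int((gray > P).sum()) > K
```

Subgraphs that fail the test (e.g. pure background) are dropped; the originals of the survivors become the network's input.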
Step S20: and improving a convolution network through model expansion and a post convolution feature extraction mechanism to form a detail feature attention model to extract features of the input image so as to obtain image features.
In image classification, features and edge information carry important discriminative cues for the classifier. For example, recognizing animal fur requires attention to features such as the shape of the spines at the fur edge and whether the middle of the fur is smooth or color-marked; birds are characterized by sparse-edged feathers and pointed beaks; sheep have distinctive horn structures and facial characteristics. Existing methods, however, emphasize the overall appearance of the image, such as the central object's edge information and overall outline extracted by a shallow network. The embodiment of the invention improves the original convolution network and proposes a Detail Feature Attention Model (DFAM): through model extension and a post-convolution feature-extraction mechanism it can adapt to and explore deeper features and give them more weight at the parameter level, biasing the model's attention mechanism toward local or detail features. A schematic diagram of the DFAM is shown in FIG. 3.
In the embodiment of the present invention, the conventional convolution network is first denoted N, and the i-th convolution layer can be regarded as the function mapping Y_i = F_i(X_i), wherein F_i is an operator, Y_i is the output vector, and X_i is the input vector with dimensions ⟨H_i, W_i, C_i⟩, H_i and W_i being the spatial dimensions and C_i the channel dimension. The network N is represented by a list of layers:

N = F_k ⊙ … ⊙ F_2 ⊙ F_1(X_1)

In practice, the network layers are often divided into a plurality of stages, all layers of a stage sharing the same architecture and, apart from the downsampling of the first layer, the same convolution type. The network can thus be further described as:

N = ⊙_{i=1…s} F_i^{L_i}(X_{⟨H_i, W_i, C_i⟩})

wherein F_i^{L_i} denotes that F_i is repeated L_i times in stage i. In a convolutional neural network the spatial dimensions become progressively smaller while the channel dimension expands layer by layer. Therefore, instead of searching for a better layer structure F_i, the embodiment of the invention keeps the baseline network's F̂_i predefined and unchanged and extends the network's depth and width (L_i and C_i) and its resolution. To further reduce the design space, all layers are restricted to scale uniformly at a constant ratio, and the goal is to maximize the accuracy of the model under any given resource constraint, which can be expressed as the optimization problem:

max_{d,w,r} Accuracy(N(d, w, r))
s.t. N(d, w, r) = ⊙_{i=1…s} F̂_i^{d·L̂_i}(X_{⟨r·Ĥ_i, r·Ŵ_i, w·Ĉ_i⟩})
     Memory(N) ≤ target memory,  FLOPS(N) ≤ target FLOPS

wherein w, d, r are respectively the width, depth and resolution of the extended network, and F̂_i, L̂_i, Ĥ_i, Ŵ_i, Ĉ_i are parameters of the predefined baseline network.
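A toy sketch of the uniform-scaling search implied by the optimization problem: scale depth, width and resolution by constant factors and keep the best candidate under a resource budget. The baseline stage parameters, the FLOPs estimate and the accuracy proxy below are stand-ins for the patent's trained network, chosen only to make the budgeted grid search concrete:

```python
import itertools

# Hypothetical per-stage (repeats L_i, channels C_i) of a predefined
# baseline network, plus its input resolution.
BASE_STAGES = [(2, 16), (3, 32), (4, 64)]
BASE_RES = 32

def flops_estimate(d, w, r):
    """Cost grows roughly with depth d, width w**2 and resolution r**2."""
    total = 0.0
    for L, C in BASE_STAGES:
        total += (d * L) * (w * C) ** 2 * (r * BASE_RES) ** 2
    return total

def pick_scaling(budget, candidates=(1.0, 1.1, 1.2, 1.3)):
    """Grid-search uniform factors (d, w, r) maximizing a crude accuracy
    proxy (bigger network -> higher proxy) under the FLOPs budget."""
    best, best_score = None, -1.0
    for d, w, r in itertools.product(candidates, repeat=3):
        if flops_estimate(d, w, r) <= budget:
            score = d * w * r  # stand-in for measured accuracy
            if score > best_score:
                best, best_score = (d, w, r), score
    return best
```

In practice the score would be validation accuracy of the scaled network, not an analytic proxy; the sketch only shows the shape of the constrained search.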
Under the guidance of the characteristic parameters of the base network, the image sequence characteristic diagram U= { U after the local characteristic transformation of the image 1 ,u 2 ,…,u t ,…,u T }, whereinFirstly, the weighted average value of the characteristic parameters of the base line network is utilized to obtain the weight factors of the channels, the weight factors show the channel characteristic ratio under the condition of different weights, the image with the channel characteristic ratio can be self-adaptively obtained with more weights in the network, the corresponding characteristic learning is more thorough, and the weight factors of the characteristics are not shared with each other. The weight factors can be obtained through different calculation modes, for example, the weight factors of all channels can be determined through the information quantity H of the feature map, the information entropy of the feature map or the average pixel value as the basis, and the characteristics of more information quantity are endowed with higher weight factors, and the characteristics of more information quantity are low in the contrary. In the process of extracting the features, a pooling layer and a full connection layer are added, the influence of the image feature position on the weight factors of the network features is reduced, and finally the features of the image are obtained:
These are concatenated along the dimensions of the different channels, finally giving the feature map U′ = [u′_1, u′_2, …, u′_t, …, u′_T]. The feed-forward structure of the model can thus obtain a deeper image feature map, at the cost of a corresponding increase in computation time. Some effective information may be lost between channels, but what is lost is mostly low-level feature information; high-level feature information is lost far less, so the influence on the result is negligible, and the influence of the position of image features on the network is correspondingly reduced.
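The per-channel weighting described above can be sketched as follows; the histogram entropy of each feature map is used here as the example criterion for information content (the text also allows the information amount H or the average pixel value), and the normalization step is an assumption:

```python
import numpy as np

def channel_weights(feature_maps, bins=16):
    """Weight factor per channel from the information content of its map.

    feature_maps: array of shape (C, H, W). Channels whose maps carry more
    information (higher histogram entropy) receive larger weight factors;
    less informative channels receive lower ones. Assumes at least one
    non-constant channel so the normalizer is non-zero.
    """
    entropies = []
    for fm in feature_maps:
        hist, _ = np.histogram(fm, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    w = np.asarray(entropies)
    return w / w.sum()  # normalize so the factors sum to 1
```

A constant (information-free) channel gets weight 0 here, while a richly textured channel gets proportionally more, mirroring the "more informative, higher weight factor" rule in the text.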
In order to make the features better overall features, the invention adds a feature feedback structure at the end of the network. Suppose the obtained feature map is U′ = [u′_1, u′_2, …, u′_t, …, u′_T]; its information and data are stored, fed back, and compared with the feature maps extracted from the other sub-images. Assuming that, in the feature maps of the different channels, pixel value i occurs with count p_i, the probability of each pixel value in each channel is P_i (the pixel value probabilities can be calculated from the gray-level histogram). The invention obtains the information amount H of the image through an improvement of Shannon's theorem:
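The improved form of Shannon's theorem is not reproduced in this text; as a baseline sketch, the plain gray-level histogram entropy that it builds on can be computed as:

```python
import numpy as np

def image_information(gray, levels=256):
    """Shannon entropy of a gray-level image's histogram, in bits.

    gray: integer array of gray values in [0, levels). This is the plain
    entropy that the patent's 'improved' information amount H refines.
    """
    hist = np.bincount(gray.ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]  # 0·log(0) terms contribute nothing
    return float(-(p * np.log2(p)).sum())
```

A constant image yields 0 bits, and an image split evenly between two gray levels yields exactly 1 bit, matching the intuition that more varied content carries more information.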
The information amounts of the obtained images form a clustered data set. When sample data are scarce, a KNN-based clustering algorithm (given only as an example and not as a limitation; other clustering algorithms may also be used) can effectively divide the feature-map information amounts in the region into those of higher confidence and those of lower confidence. According to the characteristics of the images to be classified (for example, according to the type of animal fur), a more reasonable value of K can be determined experimentally, and the judgment basis of the high-confidence feature map is obtained:
where H_0 is the information amount of a randomly selected image, corresponding to a point on the two-dimensional coordinates, and the value of K is a self-set range. The feature maps corresponding to points within the range K of H_0 subsequently pass through a less-expanded network, and conversely through a more-expanded network. The value of K can be selected according to the data set: if the information content of the data set is small, a larger K can be chosen to reduce subsequent redundant computation, and if the data set is large, a smaller K can be chosen to improve the classification accuracy of the network.
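One plausible reading of the range test above, splitting feature maps into high- and low-confidence groups by how far their information amount H lies from the reference H_0 (the exact criterion is given only graphically in the original, so the absolute-distance form is an assumption):

```python
def split_by_confidence(h_values, h0, k):
    """Partition feature-map indices by information amount.

    h_values: information amount H of each feature map; h0: the H of a
    randomly selected reference image; k: self-set half-width of the
    acceptance range. Maps within k of h0 are treated as high-confidence
    (routed to a less-expanded network); the rest as low-confidence
    (routed to a more-expanded network).
    """
    high = [i for i, h in enumerate(h_values) if abs(h - h0) <= k]
    low = [i for i, h in enumerate(h_values) if abs(h - h0) > k]
    return high, low
```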
In the feedback feature set, features with lower information content have smaller feature coefficients and carry lower confidence weights in the classification result. For images with lower information content, the network can be adaptively expanded in the subsequent feature-extraction process: the depth and width of the subsequent mobile convolution layers are slightly increased to obtain finer features. If the feature attributes of an image still cannot be extracted well, the corresponding network end is assigned a lower result weight to reduce the interference of noise information with decision fusion, guaranteeing the accuracy of the classification result to the greatest extent.
Step S30: use the mobile convolution layer to filter each input channel and then compute linear combinations of the input channels to create new image features, and reduce the dimension of the image features to obtain an image feature vector map with a higher information content.
The mobile convolution layer (MOVConv) of this embodiment comprises an initial full convolution layer with depthwise separable convolution (32 filters) and 19 residual bottleneck layers (the number of filters and of residual bottleneck layers are only examples and are not limiting). The embodiment of the invention uses ReLU6 as the non-linear layer because of its robustness in low-precision computation, uses a 3×3 kernel size as the network standard, and applies dropout and batch normalization throughout to prevent network overfitting. A schematic diagram of the operation of the mobile convolution layer is shown in fig. 4.
In the embodiment of the invention, depthwise separable convolution serves as the convolution block of an efficient neural network structure. Its basic idea is to replace a complete convolution operator with a factorized one that splits the convolution into two independent layers. The first layer, called the depthwise convolution, performs lightweight filtering by applying one convolution filter to each input channel. The second layer is a 1×1 convolution, called the pointwise convolution, responsible for creating new image features by computing linear combinations of the input channels. The working schematic of the separable convolution block is shown in fig. 5.
A standard convolution layer takes an input tensor L_i of size h_i × w_i × d_i and applies a k × k convolution kernel to produce an output of the same spatial scale with d_j channels. The computational cost of the standard convolution layer is therefore h_i · w_i · d_i · d_j · k · k. The depthwise separable convolution replaces the standard layer with a factorized convolution, which provides almost equivalent feature-extraction capability at a lower time cost, simplifying the complexity of the network; its cost is h_i · w_i · d_i · (k² + d_j). Compared with the conventional layer, the depthwise separable convolution effectively reduces the amount of computation by a factor of about k². MOVConv uses k = 3 (3×3 separable convolutions), so the computational cost is 8 to 9 times lower than that of standard convolution, with only a small reduction in accuracy.
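The cost comparison above can be checked numerically (the layer sizes below are arbitrary examples, not values from the text):

```python
def conv_costs(h, w, d_in, d_out, k=3):
    """Multiply-add counts of a standard vs. a depthwise separable layer."""
    standard = h * w * d_in * d_out * k * k      # h_i·w_i·d_i·d_j·k·k
    separable = h * w * d_in * (k * k + d_out)   # h_i·w_i·d_i·(k² + d_j)
    return standard, separable

std, sep = conv_costs(56, 56, 64, 128, k=3)
# the reduction factor is d_j·k² / (k² + d_j); for k = 3 and a reasonably
# wide layer it approaches k² = 9, i.e. the 8-9x saving stated above
```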
Consider a deep neural network with n layers L_i whose activation tensors all have size h_i × w_i × d_i. The feature manifolds extracted by a general neural network can be embedded in a low-dimensional subspace: observing all single-channel pixels of a deep convolution layer, the information encoded in these values actually lies on some manifold, which in turn can be embedded in a low-dimensional subspace. It is therefore possible to reduce the dimension of a layer and thus the dimension of the operating space. The embodiment of the invention effectively balances parameter count and precision through a width multiplier, which can reduce the dimension of the activation space until the feature vectors span the entire space.
The embodiment of the invention provides a linear residual bottleneck layer, whose expressive capacity is as follows: if the valid manifold keeps non-zero values after the transformation, the transformation acts as a linear one, and the ReLU function can preserve the complete information of the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space. Experimental results show that inserting the linear bottleneck layer into the structural block can optimize the structure of existing neural networks, prevent the non-linear layer from destroying more information, and increase the information content of the feature vector map extracted by the network. The invention therefore constructs the mobile convolution layer by connecting separable convolution blocks in series with a linear residual bottleneck layer.
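A shape-and-cost sketch of one such block, under the common inverted-bottleneck layout (1×1 expansion, k×k depthwise convolution, 1×1 linear projection); the expansion factor of 6 is an assumption for illustration, not a value taken from the text:

```python
def bottleneck_macs(h, w, d_in, d_out, expand=6, k=3, stride=1):
    """Multiply-accumulate count of a linear-bottleneck block (sketch).

    Layout assumed: 1x1 pointwise expansion -> k x k depthwise convolution
    -> 1x1 linear projection (no ReLU) back down to d_out channels.
    """
    d_mid = d_in * expand
    h_out, w_out = h // stride, w // stride
    expand_cost = h * w * d_in * d_mid              # 1x1 expansion
    depthwise_cost = h_out * w_out * d_mid * k * k  # depthwise k x k
    project_cost = h_out * w_out * d_mid * d_out    # linear 1x1 projection
    return expand_cost + depthwise_cost + project_cost
```

The final projection being linear (no ReLU) is exactly the point of the bottleneck layer above: the non-linearity is kept in the expanded, high-dimensional space where it destroys less information.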
DFAM and MOVConv improve the efficiency of the network and the extraction of deep features, and can be embedded into existing CNN structures to improve model performance. The embodiment of the invention therefore combines the detail feature attention model, the mobile convolution layer and traditional convolution; the resulting image classification network is called the model-extended convolutional neural network (ME-CNN).
The invention takes MOVConv and ordinary convolution layers as the backbone network and embeds the detail feature attention model into it to form the ME-CNN; the nested unit of a MOVConv layer and the DFAM is named MBDFAM-Block. Through these units, features of different data dimensions are acquired step by step, features with lower information content are given lower feature coefficients, more deep features are adaptively extracted through network width expansion, and the original features can still be passed on. In addition, DFAM and MOVConv can be embedded in most pre-trained networks; as a network optimization algorithm, new weights can be learned during transfer learning as long as the parameters of the DFAM are initialized to a certain constant. In one embodiment, the ME-CNN network structure is as follows:
[Table: ME-CNN network structure — not reproduced in this text.]
Step S40: perform classification judgment on the image based on the image feature vector map to obtain classification results, and, based on a preset back-end decision fusion algorithm, obtain the classification result with the highest confidence through back-end decision fusion as the final classification result.
The embodiment of the invention performs classification judgment on the image based on the image feature vector map to obtain classification results. In the sub-image judgment and classification process, the posterior probability output by the Softmax layer of the channel network is calculated as:
prob(p) = exp(V_p) / Σ_{j=1}^{M} exp(V_j)    (12)

where p denotes a classification category, and V_p and V_j respectively denote the confidence of that category and of the j-th category; the classification result of the sub-image is deduced through the formula:
prob_max(i) = max{prob(1), prob(2), …, prob(M)}    (13)
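The Softmax posterior and the per-sub-image decision above amount to the following minimal sketch (logits are shifted by their maximum for numerical stability, a standard implementation detail not spelled out in the text):

```python
import math

def softmax_posterior(logits):
    """prob(p) = exp(V_p) / sum_j exp(V_j) over the M category confidences."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_subimage(logits):
    """Return (category index, posterior) of the most probable category."""
    probs = softmax_posterior(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```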
In order to improve the final classification accuracy and expand the image data set, the invention adopts an algorithm of image segmentation with pixel-threshold judgment. On top of the data-set processing performed before the feature-extraction network, a back-end decision fusion algorithm is added: the classification results of the sub-images obtained by the deep network model are added to a candidate region. Let the feature coefficient obtained for the i-th sub-image be m_i, and let the classification results in the candidate region be Z = {Z_1, Z_2, …, Z_M}; the score of a category for the original image is then expressed as:
where k denotes the result label corresponding to the classification result, and N denotes the number of sub-images retained by the pixel-threshold judgment method. Finally, the judgment categories of all valid sub-images are combined to obtain the decision-fusion classification result:
Z_last = max{fra(Z_1), fra(Z_2), …, fra(Z_M)}    (16)
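The exact form of fra(·) is not reproduced in this text; one plausible sketch of the back-end fusion is a vote over the valid sub-images weighted by their feature coefficients m_i:

```python
from collections import defaultdict

def decision_fusion(sub_labels, coefficients):
    """Fuse per-sub-image decisions into one label (weighted-vote sketch).

    sub_labels: predicted category of each valid sub-image.
    coefficients: feature coefficient m_i of each sub-image; low-information
    sub-images carry smaller m_i and thus less influence on the result.
    """
    scores = defaultdict(float)
    for label, m in zip(sub_labels, coefficients):
        scores[label] += m
    # Z_last: the category with the highest fused score
    return max(scores, key=scores.get)
```

Because each sub-image acts as an independent observation, this weighted vote is what gives the fusion its "repeated experiment" character described below.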
The classification result obtained through back-end decision fusion has better robustness: an overall measure of sub-image information content is taken into account, the segmentation simulates the effect of repeated experiments, and the interference of locally confusing image features with the classification result is reduced, so the resulting classifier outperforms traditional classifier algorithms.
The embodiment of the invention provides an image classification method based on a model-extended convolutional neural network, which enhances the ability of the convolutional network to capture deep image features while appropriately reducing the amount of network computation. First, the image is preprocessed with an image-segmentation and pixel-threshold judgment algorithm, laying the groundwork for the subsequent decision-fusion step and removing images with low information content. Second, a detail feature attention model (DFAM) is proposed that can optimize a traditional network: through model expansion and a post-convolution feature-extraction mechanism it adapts to and explores deeper features and assigns them more weight at the coefficient level, biasing the model's attention mechanism toward local or detail features. A mobile convolution layer (MOVConv) is then proposed that is robust under low-precision computation, protects image feature information, and reduces the risk of model overfitting. Both are generic, extensible models that can be combined with existing CNN structures to form the ME-CNN, enabling more accurate image classification. In addition, a decision-fusion step is added at the back end of the network; it simulates the scenario of repeated experiments, minimizes the interference of locally confusing image features with the classification result, and further improves the accuracy of the classifier.
Example 2
An embodiment of the present invention provides an image classification system based on a model extended convolutional neural network, as shown in fig. 6, including:
the image preprocessing module 10 is configured to preprocess an image to be classified, and take an original image corresponding to an image satisfying the condition as an input image of the network. This module performs the method described in step S10 in embodiment 1, and will not be described here.
The image feature acquisition module 20 is configured to improve the convolutional network through model expansion and a post-convolutional feature extraction mechanism to form a detail feature attention model to extract features from the input image, so as to obtain image features. This module performs the method described in step S20 in embodiment 1, and will not be described here.
The image feature vector diagram obtaining module 30 is configured to create new image features by using the moving convolution layer to filter each input channel and then calculate a linear combination of the input channels, and perform dimension reduction on the image features to obtain an image feature vector diagram with more information. This module performs the method described in step S30 in embodiment 1, and will not be described here.
The classification result obtaining module 40 is configured to perform classification judgment on the image based on the image feature vector diagram to obtain a classification result, and obtain a classification result with the highest confidence coefficient through back-end decision fusion based on a preset back-end decision fusion algorithm as a final classification result. This module performs the method described in step S40 in embodiment 1, and will not be described here.
The image classification system based on the model-extended convolutional neural network provided by the embodiment of the invention preprocesses the images with an image-segmentation and pixel-threshold judgment algorithm, laying the groundwork for the subsequent decision-fusion step and removing images with low information content. Second, a detail feature attention model (DFAM) is provided that can optimize a traditional network: through model expansion and a post-convolution feature-extraction mechanism it adapts to and explores deeper features and assigns them more weight at the coefficient level, biasing the model's attention mechanism toward local or detail features. A mobile convolution layer (MOVConv) is then provided that is robust under low-precision computation, protects image feature information, and reduces the risk of model overfitting. Both are generic, extensible models that can be combined with existing CNN structures to form the ME-CNN, enabling more accurate image classification. In addition, a decision-fusion step is added at the back end of the network; it simulates the scenario of repeated experiments, minimizes the interference of locally confusing image features with the classification result, and further improves the accuracy of the classifier.
Example 3
Embodiments of the present invention provide a computer device, as shown in fig. 7, which may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or otherwise, fig. 7 being an example of a connection via a bus.
The processor 51 may be a central processing unit (Central Processing Unit, CPU). The processor 51 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52 serves as a non-transitory computer readable storage medium that may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as corresponding program instructions/modules in embodiments of the present invention. The processor 51 executes various functional applications of the processor and data processing by executing non-transitory software programs, instructions, and modules stored in the memory 52, that is, implements the model-extended convolutional neural network-based image classification method in the above-described method embodiment 1.
Memory 52 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 51, etc. In addition, memory 52 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 52 may optionally include memory located remotely from processor 51, which may be connected to processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 52 that, when executed by the processor 51, perform the model-extended convolutional neural network-based image classification method of embodiment 1.
The details of the above computer device may be correspondingly understood by referring to the corresponding related descriptions and effects in embodiment 1, and will not be repeated here.
Those skilled in the art will appreciate that all or part of the flows of the above embodiment methods may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access memory (Random Access Memory, RAM), a Flash Memory, a Hard Disk (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (8)

1. The image classification method based on the model extended convolutional neural network is characterized by comprising the following steps of:
preprocessing the images to be classified, and taking the original image corresponding to the image meeting the condition as an input image of the network;
improving a convolution network through model expansion and a post-convolution feature extraction mechanism to form a detail feature attention model to extract features of an input image to obtain image features, wherein the method comprises the following steps:
the length, width and resolution of a traditional convolution network are changed by uniformly scaling the model in a constant proportion, and the precision of the model is maximized under the preset resource constraint, wherein the constraint condition is as follows:
wherein w, d, r are respectively the width, depth and resolution of the extended network, and L̂_i, Ĥ_i, Ŵ_i, Ĉ_i are parameters of a predefined baseline network;
obtaining the weight factor of each channel by using the weighted average of the characteristic parameters of the baseline network, adding a pooling layer and a fully connected layer in the feature-extraction process to finally obtain the features of the image, and concatenating the features along the dimensions of the different channels to finally obtain the feature map of the image;
obtaining the information amount of the image, and dividing, through a clustering algorithm, the feature-map information amounts in the region into those of higher confidence and those of lower confidence, wherein the judgment basis of a high-confidence feature map is as follows:
wherein H_0 is the point on the two-dimensional coordinates corresponding to the information amount of a randomly selected image, and the value of K is a preset range; the feature maps corresponding to points within the range K of H_0 on the two-dimensional coordinates subsequently pass through a less-expanded network, and otherwise through a more-expanded network;
the method comprises the steps of utilizing a mobile convolution layer to filter each input channel, calculating the linear combination of the input channels to create new image features, and reducing the dimension of the image features to obtain an image feature vector diagram with more information content;
classifying and judging the image based on the image feature vector diagram to obtain a classification result, and obtaining the classification result with the highest confidence coefficient as a final classification result through back-end decision fusion based on a preset back-end decision fusion algorithm, wherein the method comprises the following steps:
the calculation formula of the posterior probability output by the Softmax layer of the channel network in the sub-image judgment and classification process is as follows:
prob(p) = exp(V_p) / Σ_{j=1}^{M} exp(V_j)

wherein p represents a classification category, and V_p and V_j respectively represent the confidence of that category and of the j-th category; the classification result of the sub-image is deduced through the following formula:
prob max (i)=max{prob(1),prob(2)...prob(M)}
the classification results of the sub-images are added into the candidate region, the feature coefficient obtained by the i-th sub-image being m_i; then, in the candidate region, with the classification results Z = {Z_1, Z_2, …, Z_M}, the score of the category for the original image is expressed as:
wherein k represents the result label corresponding to the classification result, and N represents the number of sub-images retained by the pixel-threshold judgment method;
combining the judgment categories of all the effective subgraphs to obtain a decision fusion classification result, and calculating by the following formula:
Z last =max{fra(Z 1 ),fra(Z 2 )...fra(Z M )}。
2. the image classification method based on the model expansion convolutional neural network according to claim 1, wherein the preprocessing of the image to be classified, taking the original image corresponding to the image satisfying the condition as the input image of the network, comprises:
dividing the image to be classified into a plurality of sub-images, removing sub-images with little information, and rotating, mirroring and contrast-adjusting the representative sub-images to enhance the image features and obtain the input image, wherein a divided sub-image S_i is converted into a gray pixel value image by the following formula:
S_gray(x, y) = [R(x, y) + G(x, y) + B(x, y)] / 3

wherein R(x, y), G(x, y), B(x, y) respectively represent the average values of the pixel values of the three channels in the sub-image, calculated according to the average pixel gray value of the image;
taking an original image corresponding to an image meeting preset conditions as an input image of a network, wherein the preset conditions are as follows:
num(S gray (x j ,y j ))>K
wherein the preset gray-value threshold is P, the threshold on the number of image pixels is K, and n is the image pixel size.
3. The image classification method based on the model extended convolutional neural network according to claim 1, wherein the information amount H of the image is calculated by the following formula:
wherein the magnitude of a certain pixel value of each channel is P_i, and the probability of each pixel value of each channel is obtained accordingly from the gray-level histogram.
4. The model extended convolutional neural network-based image classification method of claim 1, wherein the moving convolutional layer comprises: a depth separable convolutional layer and a linear residual bottleneck layer connected thereto, wherein:
the depth separable convolution layer is to replace a complete convolution operator with a convolution operator decomposed into two independent layers, wherein the first layer is a depth convolution for performing lightweight filtering, and a convolution filter is applied to each input channel; the second layer is a point convolution for creating new image features by computing a linear combination of input channels;
The linear residual bottleneck layer is used for performing dimension reduction processing on the image features and extracting the information quantity of the feature vector diagram.
5. The image classification method based on a model-extended convolutional neural network according to any one of claims 1-4, wherein the image to be classified is a fine-grained image.
6. An image classification system based on a model-extended convolutional neural network, comprising:
the image preprocessing module is used for preprocessing the images to be classified, and taking the original image corresponding to the image meeting the condition as an input image of the network;
the image feature acquisition module is used for improving a convolution network through model expansion and a post-convolution feature extraction mechanism to form a detail feature attention model to extract features of an input image to obtain image features, and comprises the following steps:
the length, width and resolution of a traditional convolution network are changed by uniformly scaling the model in a constant proportion, and the precision of the model is maximized under the preset resource constraint, wherein the constraint condition is as follows:
wherein w, d, r are respectively the width, depth and resolution of the extended network, and L̂_i, Ĥ_i, Ŵ_i, Ĉ_i are parameters of a predefined baseline network;
obtaining the weight factor of each channel by using the weighted average of the characteristic parameters of the baseline network, adding a pooling layer and a fully connected layer in the feature-extraction process to finally obtain the features of the image, and concatenating the features along the dimensions of the different channels to finally obtain the feature map of the image;
obtaining the information amount of the image, and dividing, through a clustering algorithm, the feature-map information amounts in the region into those of higher confidence and those of lower confidence, wherein the judgment basis of a high-confidence feature map is as follows:
wherein H_0 is the point on the two-dimensional coordinates corresponding to the information amount of a randomly selected image, and the value of K is a preset range; the feature maps corresponding to points within the range K of H_0 on the two-dimensional coordinates subsequently pass through a less-expanded network, and otherwise through a more-expanded network;
the image feature vector diagram acquisition module is used for utilizing the mobile convolution layer to filter each input channel, then calculating the linear combination of the input channels to create new image features, and reducing the dimension of the image features to obtain an image feature vector diagram with more information content;
the classification result obtaining module is used for carrying out classification judgment on the image based on the image feature vector diagram to obtain a classification result, and obtaining the classification result with the highest confidence coefficient as a final classification result through back-end decision fusion based on a preset back-end decision fusion algorithm, and comprises the following steps:
the calculation formula of the posterior probability output by the Softmax layer of the channel network in the sub-image judgment and classification process is as follows:
prob(p) = exp(V_p) / Σ_{j=1}^{M} exp(V_j)

wherein p represents a classification category, and V_p and V_j respectively represent the confidence of that category and of the j-th category; the classification result of the sub-image is deduced through the following formula:
prob max (i)=max{prob(1),prob(2)...prob(M)}
the classification results of the sub-images are added into the candidate region, the feature coefficient obtained by the i-th sub-image being m_i; then, in the candidate region, with the classification results Z = {Z_1, Z_2, …, Z_M}, the score of the category for the original image is expressed as:
wherein k represents the result label corresponding to the classification result, and N represents the number of sub-images retained by the pixel-threshold judgment method;
combining the judgment categories of all the effective subgraphs to obtain a decision fusion classification result, and calculating by the following formula:
Z last =max{fra(Z 1 ),fra(Z 2 )...fra(Z M )}。
7. a computer-readable storage medium storing computer instructions for causing the computer to perform the model-extended convolutional neural network-based image classification method of any one of claims 1-5.
8. A computer device, comprising: a memory and a processor in communication with each other, the memory storing computer instructions, the processor executing the computer instructions to perform the model-extended convolutional neural network-based image classification method of any one of claims 1-5.
CN202010768636.6A 2020-08-03 2020-08-03 Image classification method and system based on model extended convolutional neural network Active CN112132145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010768636.6A CN112132145B (en) 2020-08-03 2020-08-03 Image classification method and system based on model extended convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010768636.6A CN112132145B (en) 2020-08-03 2020-08-03 Image classification method and system based on model extended convolutional neural network

Publications (2)

Publication Number Publication Date
CN112132145A CN112132145A (en) 2020-12-25
CN112132145B true CN112132145B (en) 2023-08-01

Family

ID=73851289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010768636.6A Active CN112132145B (en) 2020-08-03 2020-08-03 Image classification method and system based on model extended convolutional neural network

Country Status (1)

Country Link
CN (1) CN112132145B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177627B (en) * 2021-01-11 2024-05-10 联合微电子中心有限责任公司 Optimization system, retraining system, method thereof, processor and readable medium
CN113435301B (en) * 2021-06-23 2023-08-29 深圳大学 Animal fur microscopic image classification method, device, electronic equipment and storage medium
CN113469111A (en) * 2021-07-16 2021-10-01 中国银行股份有限公司 Image key point detection method and system, electronic device and storage medium
CN117576076A (en) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 Bare soil detection method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110713A (en) * 2007-09-05 2008-01-23 中国科学院上海微系统与信息技术研究所 Information anastomosing system performance test bed based on wireless sensor network system
CN106202518A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 Based on CHI and the short text classification method of sub-category association rule algorithm
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN110135486A (en) * 2019-05-08 2019-08-16 西安电子科技大学 Chopsticks image classification method based on adaptive convolutional neural networks
CN110321967A (en) * 2019-07-11 2019-10-11 南京邮电大学 Image classification innovatory algorithm based on convolutional neural networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110713A (en) * 2007-09-05 2008-01-23 中国科学院上海微系统与信息技术研究所 Information anastomosing system performance test bed based on wireless sensor network system
CN106202518A (en) * 2016-07-22 2016-12-07 桂林电子科技大学 Based on CHI and the short text classification method of sub-category association rule algorithm
CN106408562A (en) * 2016-09-22 2017-02-15 华南理工大学 Fundus image retinal vessel segmentation method and system based on deep learning
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN110135486A (en) * 2019-05-08 2019-08-16 西安电子科技大学 Chopsticks image classification method based on adaptive convolutional neural networks
CN110321967A (en) * 2019-07-11 2019-10-11 南京邮电大学 Image classification innovatory algorithm based on convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research progress of convolutional neural networks in visual image detection; Lan Jinhui; Wang Di; Shen Xiaopan; Chinese Journal of Scientific Instrument (Issue 04); pp. 1-4 *
Pavement crack identification method based on a deep convolutional neural network fusion model; Sun Zhaoyun; Ma Zhidan; Li Wei; Hao Xueli; Shen Hao; Journal of Chang'an University (Natural Science Edition) (Issue 04); pp. 1-5 *
3D object detection combining mixed-domain attention and dilated convolution; Yan Juan; Fang Zhijun; Gao Yongbin; Journal of Image and Graphics (Issue 06); pp. 1-3 *

Also Published As

Publication number Publication date
CN112132145A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
CN111768432B (en) Moving target segmentation method and system based on twin deep neural network
CN109685115B (en) Fine-grained conceptual model with bilinear feature fusion and learning method
Ding et al. Context contrasted feature and gated multi-scale aggregation for scene segmentation
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
CN109754015B (en) Neural networks for drawing multi-label recognition and related methods, media and devices
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111311538B (en) Multi-scale lightweight road pavement detection method based on convolutional neural network
CN110826596A (en) Semantic segmentation method based on multi-scale deformable convolution
CN113780132B (en) Lane line detection method based on convolutional neural network
US11695898B2 (en) Video processing using a spectral decomposition layer
JP6107531B2 (en) Feature extraction program and information processing apparatus
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN115410059B (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN112036454A (en) Image classification method based on multi-core dense connection network
CN113841162A (en) Depth-first convolution in deep neural networks
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN111027472A (en) Video identification method based on fusion of video optical flow and image space feature weight
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
CN114444565A (en) Image tampering detection method, terminal device and storage medium
CN113361589A (en) Rare or endangered plant leaf identification method based on transfer learning and knowledge distillation
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant