CN114445664A - Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment - Google Patents

Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment

Info

Publication number
CN114445664A
Authority
CN
China
Prior art keywords
image
detected
convolution
network
convolution kernel
Prior art date
Legal status
Pending
Application number
CN202210088188.4A
Other languages
Chinese (zh)
Inventor
钟福金
黄健
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210088188.4A
Publication of CN114445664A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/24 Pattern recognition; Analysing; Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044 Neural networks; Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Neural networks; Combinations of networks
    • G06N 3/08 Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of pattern classification and identification, and in particular to an image classification and identification method, apparatus and computer equipment based on an adaptive dynamic convolution network. The method comprises: acquiring an image to be detected and inputting it into a preprocessing block to obtain a shallow feature map and graphic parameter information of the image; combining the preprocessed graphic parameter information with the image to be detected and inputting the result into the adaptive dynamic convolution network of a backbone network to obtain the global features of the image, where adaptive dynamic convolution means selecting a convolution kernel of the corresponding shape according to the parameter information; inputting the preprocessed shallow feature map into a branch network to extract local features of the image to be detected; and fusing the local features with the global features, inputting the fused features into a classification network, and outputting classification identification information of the image to be detected. The method has low computational cost, high accuracy and strong applicability, and is used for classifying and identifying images.

Description

Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment
Technical Field
The invention belongs to the field of pattern classification and identification, and in particular relates to an image classification and identification method, apparatus and computer equipment based on an adaptive dynamic convolution network.
Background
Image classification and recognition technology builds a deep learning network model: an image is input, passes through the deep network, and various kinds of information about the image are output. Such models are widely applied in image-classification-based recognition tasks, such as face recognition, clothing image recommendation, accurate advertisement push, clothing matching recommendation, games and movies, and image classification is an active research topic in computer vision.
The classical pattern classification and recognition algorithm consists of two successive but relatively independent stages: image feature extraction and image classification. According to how features are extracted, current image classification and identification methods fall into two categories: the first is based on traditional machine learning, the second on deep learning. Traditional machine learning methods mainly extract image features manually and then classify the images with a traditional classifier. In recent years, with the development of deep learning, deep neural networks have reached state-of-the-art performance in image recognition: they extract image features automatically, are widely applied to image classification and recognition, and achieve results superior to traditional machine learning methods.
In the prior art, deep convolutional neural networks are mainly built from uniform rectangular convolutions, chiefly because rectangular convolution is more convenient for computing and storing images. However, different images have different characteristics, and a model that uses only uniform rectangular convolution is not expressive enough; moreover, the resulting receptive field does not match the human visual range and often accumulates much redundant and invalid data, so the network structure becomes bloated and demands a large amount of memory and computing power.
Disclosure of Invention
Based on the problems in the prior art, the method and apparatus provided by the invention can quickly and accurately acquire image features if a convolution kernel matched to each input image can be selected. For example, circular convolution is used for pictures dominated by circular factors, rectangular convolution for pictures dominated by rectangular factors, and elliptical convolution for pictures dominated by arc or elliptical factors. An adaptive dynamic convolution kernel can therefore reduce the actual parameter count of the model and improve the feature expression capability of the deep neural network, thereby improving the accuracy of image identification and classification; this is a relatively new research direction at present.
In view of this, the present invention provides an image classification and identification method, apparatus and computer device based on an adaptive dynamic convolution network. Global features of an image are acquired through the adaptive dynamic convolution network; features of local positions are acquired through a network branch built on self-attention Transformer blocks added to the network; and these local features are fused with the global features of the dynamic convolution backbone network, further enhancing the model's ability to extract image features and effectively improving the accuracy of image classification and feature identification.
In a first aspect of the present invention, the present invention provides an image classification and identification method based on an adaptive dynamic convolution network, including the following steps:
acquiring an image to be detected, inputting the image to be detected into a preprocessing block for preprocessing operation, and obtaining parameter information and a shallow characteristic diagram of the image to be detected;
combining the parameter information of the image to be detected with the original image to be detected to obtain image data with characteristic marks;
inputting image data into a self-adaptive dynamic convolution network of a backbone network, selecting a convolution kernel with a corresponding shape according to corresponding parameter information, and obtaining global features with rich semantics after multilayer convolution operation;
inputting the shallow feature map into a branch network, and extracting local features of the image to be detected;
and performing feature fusion on the local features and the global features, inputting the fusion features into a classification network, and outputting classification identification information of the image to be detected.
In a second aspect of the present invention, the present invention provides an image classification and identification apparatus based on an adaptive dynamic convolution network, specifically including:
the image acquisition unit is used for acquiring an image to be detected;
the image processing unit is used for inputting the image to be detected into a preprocessing block for preprocessing operation to obtain parameter information and a shallow characteristic diagram of the image to be detected;
the convolution matching unit is used for combining the parameter information of the image to be detected with the original image to be detected to obtain image data with characteristic labels; inputting image data into a self-adaptive dynamic convolution network of a backbone network, and selecting a convolution kernel with a corresponding shape according to corresponding parameter information;
the global feature extraction unit is used for carrying out multilayer convolution operation on the image to be detected according to the selected convolution kernel with the corresponding shape to obtain global features with rich semantics;
the local feature extraction unit is used for inputting the shallow feature map into a branch network and extracting local features of the image to be detected;
a global local feature fusion unit, configured to perform feature fusion on the local feature and the global feature;
and the image classification and identification unit is used for inputting the fusion characteristics into a classification network and outputting the classification and identification information of the image to be detected.
In a third aspect of the invention, the invention also provides a computer apparatus comprising at least one processor and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor, when executing those instructions, can perform the method according to the first aspect of the invention.
The invention has the beneficial technical effects that:
(1) The invention is fast and accurate, and can classify and identify any input image and produce a result.
(2) The invention provides an adaptive dynamic convolution network that can select convolutions of different shapes according to the type of image, so it can extract image feature information more dynamically and improve classification precision.
(3) The invention provides a novel feature extraction framework that considers global and local information; through this combined overall and local feature extraction, the network can extract different types of graphic features, its feature characterization capability is enhanced, and the classification and identification precision for graphics is improved.
Drawings
FIG. 1 is a schematic diagram of an overall adaptive dynamic convolution network model of the present invention;
FIG. 2 is a flowchart of an image classification and identification method based on an adaptive dynamic convolution network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the structure of MBconv convolution according to the present invention;
FIG. 4 is a schematic diagram of a dynamic convolution block of the present invention;
FIG. 5 is a schematic diagram of a circular convolution kernel in a dynamic convolution block according to the present invention;
FIG. 6 is a diagram of an elliptical convolution kernel in a dynamic convolution block according to the present invention;
FIG. 7 is a schematic diagram of a branch network of the adaptive dynamic convolution network model of the present invention;
FIG. 8 is a diagram of the Transformer attention module in the branch network according to the present invention;
FIG. 9 is a flowchart illustrating the overall operation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of the overall adaptive dynamic convolution network model of the present invention. As shown in FIG. 1, the model mainly comprises a trunk dynamic convolution network and a branch network. The original image to be detected is input into the shallow feature extraction block to extract shallow features, and these shallow features are input into the branch network to extract local features; the shallow features are also processed by a classifier, whose result is input together with the image into the trunk dynamic convolution network model to extract global features. After the local and global features are fused, the classification recognition result of the image to be detected is output through two fully connected layers.
The invention relates to an image classification and identification method based on an adaptive dynamic convolution network, which comprises the following steps of:
101. acquiring an image to be detected, inputting the image to be detected into a preprocessing block for preprocessing operation, and obtaining parameter information and a shallow characteristic diagram of the image to be detected;
In the embodiment of the invention, the image classification and identification method can be realized with artificial intelligence technology and applied to scenarios in which images are classified, for example classifying building images, animal and plant images, human bodies, cells, clothing images, face images, and the like. The image to be detected is input into the adaptive dynamic convolution network, and after a series of processing steps the classification identification information of the image, including image type, image characteristics and so on, is output.
In the embodiment of the invention, the preprocessing block is divided into a shallow feature extraction block and a multi-task classifier, the shallow feature extraction block is composed of 3 MBConv convolution blocks, each convolution block is provided with three layers of convolution kernels and a SEnet channel, the first layer of convolution kernels is 1 multiplied by 1, and the number of the channels is set according to requirements; the convolution kernel of the middle layer is 3 multiplied by 3, the step length is 2, the filling is 1, and the down sampling is carried out; the last convolution kernel is 1 × 1, and a SEnet block structure is inserted in the middle as shown in FIG. 3. And obtaining a shallow feature map of the image after passing through the shallow feature extraction block, and preparing for extracting local information of the image in a subsequent branch network.
The multi-task classifier consists of two fully connected layers. Its main function is to preliminarily classify the shallow features (texture, shape, edge information and the like) extracted by the residual network and, according to the shape with the largest proportion in the image textures, to output a shape probability label l_i (i = 1, 2, 3, 4), representing in turn circle, ellipse, rectangle and square, in preparation for the subsequent selection of convolution kernels of different shapes.
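A small sketch of the label-selection step just described (the function and dictionary names are assumptions; only the mapping l_1..l_4 to circle/ellipse/rectangle/square comes from the text):

```python
import numpy as np

# Illustrative sketch: the multi-task classifier emits a probability over
# the four shape classes; the label l_i with the largest probability then
# picks the kernel shape used by the backbone's dynamic convolution.

SHAPE_LABELS = {1: "circle", 2: "ellipse", 3: "rectangle", 4: "square"}

def select_kernel_shape(shape_probs):
    """shape_probs: 4 probabilities for (circle, ellipse, rectangle,
    square). Returns (l_i, shape name), with labels 1-indexed."""
    probs = np.asarray(shape_probs, dtype=float)
    assert probs.shape == (4,)
    i = int(np.argmax(probs)) + 1  # l_1 .. l_4
    return i, SHAPE_LABELS[i]
```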
102. Combining the parameter information of the image to be detected with the original image to be detected to obtain image data with characteristic marks;
In the embodiment of the invention, the parameter information of the image to be detected must be combined with the image itself, giving the image a characteristic label that reflects its shape information; that is, the label probability information is added to the text label of the original image to be detected, so that the image carries shape information. The shape information is expressed in the texture and edges of the image, and the dominant shape among its factors is a circle, an ellipse, a rectangle or a square.
103. Inputting image data into a self-adaptive dynamic convolution network of a backbone network, selecting a convolution kernel with a corresponding shape according to corresponding parameter information, and obtaining global features with rich semantics after multilayer convolution operation;
In the embodiment of the invention, the preprocessed label information and the original image to be detected are combined and input into the adaptive dynamic convolution network of the backbone network; a convolution kernel of the corresponding shape is selected according to the parameter information, and semantically rich global features are obtained in preparation for subsequent classification and identification.
In the embodiment of the present invention, the preprocessed image to be detected is accompanied by its parameter information; after it is input into the dynamic convolution block, a convolution kernel of the corresponding shape can be determined and selected according to that information. The structure of the dynamic convolution block is shown in FIG. 4.
Compared with traditional image classification and identification algorithms, the method adopts a neural network built from dynamic convolution blocks to reduce the model size and improve accuracy. The following explains how the receptive fields of the circular and elliptical convolution kernels in the dynamic convolution block are computed. Given an input feature map x and a square convolution kernel w (a special case of the rectangular kernel) with sampling grid R, this embodiment obtains the shallow feature map output at a single position j as

y(j) = Σ_{r∈R} w(r) · x(j + r)

Correspondingly, the shallow feature map obtained by the circular or elliptical convolution kernel in this embodiment is

y(j) = Σ_{r∈R} w(r) · Σ_t B(t, j + r) · x(t)

Since the receptive field of a circular or elliptical kernel contains fractional positions, bilinear interpolation is used to obtain the sample values from the surrounding square grid: the fractional sampling position j + r is interpolated from the positions t of the square grid, and B(t, j + r) is the bilinear-interpolation weight matrix. This structure realizes the conversion from the square convolution kernel to the elliptical convolution kernel, and the convolution operation can capture more of the graphic's characteristics.
The formula of the bilinear difference is as follows:
linear interpolation in the x direction yields:
Figure BDA0003488571610000066
Figure BDA0003488571610000067
linear interpolation in the y direction yields:
Figure BDA0003488571610000068
the result f (x, y) is finally obtained:
Figure BDA0003488571610000071
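The two-step interpolation above can be written directly as a function (a standard bilinear-interpolation sketch; the function name and default unit cell are assumptions):

```python
import numpy as np

def bilinear(x, y, q11, q21, q12, q22, x1=0.0, x2=1.0, y1=0.0, y2=1.0):
    """Bilinear interpolation of f at fractional (x, y) from the four
    surrounding grid values q11 = f(x1, y1), q21 = f(x2, y1),
    q12 = f(x1, y2), q22 = f(x2, y2)."""
    # linear interpolation in the x direction
    f_xy1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21
    f_xy2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22
    # linear interpolation in the y direction
    return (y2 - y) / (y2 - y1) * f_xy1 + (y - y1) / (y2 - y1) * f_xy2
```

At a grid corner the function returns that corner's value exactly, and at the cell centre it returns the average of the four values, which is how fractional positions on the circular or elliptical kernel are sampled.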
Therefore, the circular convolution kernel in the dynamic convolution block is constructed as follows, obtained from the square convolution by bilinear interpolation; the specific operation takes a 3 × 3 square convolution as an example. The 3 × 3 grid is abstracted into 9 points; a coordinate system is established with the centre point of the square convolution as the centre of the circular convolution kernel, and the distance from the centre point to a side-midpoint sample is taken as the radius. The circle is thus given by x^2 + y^2 = r^2, where (x, y) is a position in this coordinate system and r is the radius; the four corner points of the square convolution are projected onto the circle, landing at (±(√2/2)r, ±(√2/2)r) relative to the centre point, while the positions of the remaining 5 points are the same as in the original square convolution kernel. The size of the circular convolution kernel can be changed to 5 × 5, 7 × 7 and so on as required. In the embodiment of the present invention, a 3 × 3 square convolution kernel as shown in FIG. 5 is adopted, and the corresponding circular convolution kernel is obtained after bilinear interpolation.
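A hedged sketch of the corner projection just described (the function name and the unit-radius default are assumptions; the 45° diagonal placement is a geometric reconstruction of the garbled original):

```python
import math

# For a 3x3 kernel with unit grid spacing, the radius r is the distance
# from the centre to an edge-midpoint sample; the four corner samples are
# moved onto the circle x^2 + y^2 = r^2 along the diagonals, giving
# offsets (±r/√2, ±r/√2) relative to the centre point.

def circular_corner_offsets(r=1.0):
    d = r / math.sqrt(2.0)
    return [(d, d), (d, -d), (-d, d), (-d, -d)]
```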
The elliptical convolution kernel in the dynamic convolution block is constructed as follows, obtained from the rectangular convolution by bilinear interpolation, taking a 3 × 4 rectangular convolution kernel as an example. The 3 × 4 grid is abstracted into 12 points; a coordinate system is established at the centre point of the rectangular convolution, with half the length of the rectangular kernel as the semi-major axis a and half the width as the semi-minor axis b. The ellipse is thus given by x^2/a^2 + y^2/b^2 = 1, and the four corner points of the rectangular convolution are moved onto the ellipse at (±(√2/2)a, ±(√2/2)b) relative to the centre point. The two points in the middle of the long sides contribute feature information, such as the graphic's boundary texture, that is difficult to obtain in the convolution process, so they are cancelled, reducing the calculation parameters by two. The remaining 6 point positions are the same as in the original rectangular convolution kernel. In the embodiment of the present invention, a 3 × 4 rectangular convolution kernel as shown in FIG. 6 is adopted, and the corresponding elliptical convolution kernel is obtained after bilinear interpolation.
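The elliptical analogue of the circular corner projection can be sketched the same way (names and the diagonal placement are assumptions mirroring the circular case):

```python
import math

# For a rectangular kernel with semi-major axis a and semi-minor axis b,
# the four corner samples are moved onto the ellipse
# x^2/a^2 + y^2/b^2 = 1 along the diagonals, giving offsets
# (±a/√2, ±b/√2) relative to the centre point.

def elliptical_corner_offsets(a, b):
    dx, dy = a / math.sqrt(2.0), b / math.sqrt(2.0)
    return [(dx, dy), (dx, -dy), (-dx, dy), (-dx, -dy)]
```

Each returned offset satisfies the ellipse equation exactly, since (a/√2)^2/a^2 + (b/√2)^2/b^2 = 1/2 + 1/2 = 1.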
For the square convolution in the dynamic convolution kernel, tuning experiments on the VGG network structure showed that a 2 × 2 square convolution kernel performs well, so that square kernel is selected. Likewise, for the rectangular convolution, the same experiments showed that a 1 × 3 rectangular convolution kernel performs better, so that rectangular kernel is selected.
104. Inputting the shallow feature map into a branch network, and extracting local features of the image to be detected;
In the embodiment of the present invention, the branch network is composed of 2 self-attention Transformer blocks and 2 MBConv structure blocks in series. A self-attention Transformer block is divided into an encoder and a decoder; the structure is shown in FIG. 8. The left part is the encoder, divided into three parts: the input part, the attention mechanism and the feedforward neural network. The input part consists of embedding and position embedding: the embedding splits the information into word vectors of a certain dimension, and the position embedding is obtained with the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position and i indexes the dimension of the character encoding; the input part is obtained by fusing the tokens produced by segmentation with the position codes. The formula of the attention mechanism is:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V
wherein Q represents the query vector, K the key vector and V the value vector. Multi-head attention connects several such attention modules in parallel to form a multi-head attention mechanism. The feedforward neural network consists of the input passing through a fully connected layer and a Norm layer, with a residual connection adding the input before the Norm layer.
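The two encoder pieces just described can be sketched in a few lines of NumPy (a minimal illustration, not the patent's implementation; single-head, unbatched, even d_model assumed):

```python
import numpy as np

def position_embedding(seq_len, d_model):
    """Sinusoidal position embedding:
    PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    pe = np.zeros((seq_len, d_model))
    pos = np.arange(seq_len)[:, None]
    even = np.arange(0, d_model, 2)[None, :]        # the 2i indices
    angle = pos / np.power(10000.0, even / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V
```

Because the attention weights are a softmax, each output row is a convex combination of the rows of V.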
The right part is the decoder, whose structure consists of two multi-head attention mechanisms and a feedforward neural network. The MBConv convolution consists of a bottleneck structure built from 1 × 1, 3 × 3 and 1 × 1 convolution blocks, with an SENet structure added in the middle; the structure is shown in FIG. 3. In the embodiment of the present invention, MBConv is chosen because the FFN modules of MBConv and the Transformer both use an "inverted bottleneck" design: the input channel size is first expanded by a factor of 4 and then mapped back to the original channel size, so residual connections can be used. Besides the inverted bottleneck, the two modules share a common form: both the depthwise convolution kernel and self-attention can be expressed as a weighted sum over a predefined receptive field, but convolution uses a fixed kernel that gathers information from a local receptive field:
y_i = Σ_{j∈L(i)} w_{i-j} · x_j

where x_i and y_i are the input and output at position i, respectively, and L(i) denotes the local neighborhood of i.
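The local weighted sum above can be illustrated in one dimension (an assumed toy version with zero padding at the edges; the 2-D depthwise case is analogous):

```python
import numpy as np

# Each output y_i sums the inputs x_j over the k-wide neighborhood L(i)
# with static weights indexed by the relative offset j - i; the weights
# do not depend on the input.

def local_conv_1d(x, w):
    """x: (n,), w: (k,) with k odd.
    y_i = sum over j in L(i) of w[(j - i) + k//2] * x[j]."""
    n, k = len(x), len(w)
    half = k // 2
    y = np.zeros(n)
    for i in range(n):
        for off in range(-half, half + 1):  # off = j - i
            j = i + off
            if 0 <= j < n:
                y[i] += w[off + half] * x[j]
    return y
```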
For comparison, the receptive field of self-attention covers all spatial positions, and the weights are computed from the normalized similarity of each pair of points (x_i, x_j):

y_i = Σ_{j∈G} ( exp(x_i^T x_j) / Σ_{k∈G} exp(x_i^T x_k) ) · x_j

where G denotes the global spatial position space. The advantages and disadvantages of MBConv and self-attention are as follows:
First, the depthwise convolution kernel w_{i-j} is a static parameter independent of the input, while the attention weights depend dynamically on the input representation; self-attention can therefore easily capture the correlations between different spatial positions, but this flexibility also carries the risk of overfitting, especially when data are limited.
Second, given a pair of spatial points (i, j), the corresponding convolution weight w_{i-j} depends only on their relative offset, not on the specific values of i or j. This property is usually called translation equivariance and can improve generalization on limited data sets; it is lacking in the standard ViT because of its use of absolute position embedding.
Finally, the receptive field size is also the most fundamental difference between self-attention and convolution, and in general, a larger receptive field provides more semantic information and the model capability is stronger. Therefore, the key reason why people apply their own attention to the visual field is that it can provide a global receptive field. However, a large field requires a very large amount of computation. Taking the global receptive field as an example, the complexity is exponential with respect to the size of the space, which limits its application scope. Therefore, the advantages of the two modules are combined, the local characteristic information of the graph can be better extracted, and the local characteristic information is obtained after Softmax planning:
y_i = Σ_{j∈G} ( exp(x_i^T x_j + w_{i-j}) / Σ_{k∈G} exp(x_i^T x_k + w_{i-k}) ) · x_j
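A hedged NumPy sketch of this combined weighting, assuming a 1-D token sequence and a static relative bias w_{i-j} added to the input-dependent similarity before the softmax (names and the 1-D setting are assumptions):

```python
import numpy as np

# The static convolutional bias w_{i-j} and the dynamic similarity
# x_i^T x_j are summed inside the softmax, merging the two modules.

def relative_attention(x, w_rel):
    """x: (n, d) tokens; w_rel: (2n-1,) static bias indexed by i - j."""
    n = x.shape[0]
    sim = x @ x.T                                    # x_i^T x_j
    bias = np.array([[w_rel[i - j + n - 1] for j in range(n)]
                     for i in range(n)])             # w_{i-j}
    logits = sim + bias
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                # softmax normalization
    return a @ x
```

With a zero bias and identical tokens, every output equals the mean token, as the softmax weights are uniform.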
Finally, a C-C-T-T arrangement is selected: two MBConv modules followed by a Rel-Attention module and an FFN module; the specific structure is shown in FIG. 7.
105. And performing feature fusion on the local features and the global features, inputting the fusion features into a classification network, and outputting classification identification information of the image to be detected.
After the features of the main network and the branch network are extracted, a global pooling layer and a full connection layer are connected to form a classification network, and the classification identification information of the image to be detected can be output by inputting the fusion features of the image to be detected into the classification network.
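The fusion-and-classification step can be sketched as follows (channel concatenation, pooling strategy and all names are assumptions for illustration; the patent only states that the fused features pass through a global pooling layer and a fully connected layer):

```python
import numpy as np

# Fuse the branch network's local features with the backbone's global
# features, apply global average pooling and a fully connected layer,
# and emit class probabilities.

def classify(local_feat, global_feat, W, b):
    """local_feat, global_feat: (c, h, w) feature maps;
    W: (num_classes, 2c) fully connected weights; b: (num_classes,)."""
    fused = np.concatenate([local_feat, global_feat], axis=0)  # channel concat
    pooled = fused.mean(axis=(1, 2))                           # global pooling
    logits = W @ pooled + b
    e = np.exp(logits - logits.max())
    return e / e.sum()                                         # probabilities
```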
Therefore, the core of the invention lies in providing a non-traditional static convolution kernel, namely, an adaptive dynamic convolution kernel, for the graphs, most of the graphs are not in the shape of a square and have graphs with various radians, so that the graph is subjected to convolution scanning calculation by using convolution kernels with different shapes aiming at different images, the characteristics of the original graph can be better fitted, obvious and effective image global characteristic elements can be better extracted, and the characteristic elements are utilized and combined with the characteristics of the local key points of the image detected by a branch network for fusion, and finally input into a classifier, and the related results of image classification and attribute characteristics can be better obtained. In the invention, if not specifically emphasized, the neural network model of the invention may adopt a traditional VGG model structure, or may be constructed by selecting a lightweight model MobileNet structure, and those skilled in the art can adaptively understand according to the overall embodiment of the invention and the accompanying drawings.
Considering that the convolutional neural network model finally obtained by the present invention needs to be trained before being used for image classification and recognition, fig. 9 shows a flowchart of an image classification and recognition method based on an adaptive dynamic convolutional network according to a preferred embodiment of the present invention. As shown in fig. 9, the method includes: constructing an adaptive dynamic convolution network, and inputting a data set into the preprocessing block to obtain graphic parameter information and a shallow feature map; combining the graphic parameter information with the original graph to be tested, inputting the result into the backbone network for further processing, and extracting global features; further processing the preprocessed shallow feature map through the branch network, and extracting local features; and, after fusing the global features and the local features, completing image classification and identification of the data set with a classification network formed by a global pooling layer and a full connection layer. In this process, the cross-entropy loss functions of the two branches are solved by joint optimization, the convolutional neural network formed by the global and local branches is trained iteratively until convergence, and the trained convolutional neural network model is stored.
The data set adopted by the invention is the ImageNet data set, obtained via Kaggle (a platform founded by Anthony Goldbloom in Melbourne in 2010, which mainly provides developers and data scientists with a place to hold machine learning competitions, host data sets, and write and share code), where it is continuously maintained.
ImageNet is an ongoing research effort aimed at providing an easily accessible image database for researchers around the world. Currently ImageNet contains a total of 14,197,122 images divided into 21,841 categories (synsets); the major categories include: amphibian, animal, appliance, bird, covering, device, fabric, fish, flower, food, fruit, fungus, furniture, geological formation, invertebrate, mammal, musical instrument, plant, reptile, sport, structure, tool, tree, utensil, vegetable, vehicle, and person.
For the attributes of the targets, approximately 400 synsets have been labeled so far, and each labeled synset carries 25 attributes: A. color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow; B. pattern: spotted, striped; C. shape: long, round, rectangular, square; D. texture: furry, smooth, rough, shiny, metallic, etc.
In some embodiments of the invention, the classification network can be trained and tuned using an Adam optimizer; after multiple rounds of training the neural network becomes stable, the iterative process ends, and a trained convolutional neural network model is obtained.
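The jointly optimized loss of the two branches mentioned in the training flow can be sketched as a weighted sum of per-branch softmax cross-entropies. The weighting factor `alpha` and the function names are assumptions; the patent only states that the cross-entropy losses of the global and local branches are solved by joint optimization.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Softmax cross-entropy for a single example (numerically stable)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target])

def joint_loss(global_logits: np.ndarray,
               local_logits: np.ndarray,
               target: int,
               alpha: float = 0.5) -> float:
    """Combined loss of the global branch and the local branch.

    alpha balances the two branches; 0.5 is an assumed default.
    """
    return (alpha * cross_entropy(global_logits, target)
            + (1.0 - alpha) * cross_entropy(local_logits, target))
```

This scalar would be minimized with an optimizer such as Adam until the network converges.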
The embodiment of the invention also provides an image classification and identification device based on the self-adaptive dynamic convolution network, which comprises:
the image acquisition unit is used for acquiring an image to be detected;
the image processing unit is used for inputting the image to be detected into a preprocessing block for preprocessing operation to obtain parameter information and a shallow characteristic diagram of the image to be detected;
the convolution matching unit is used for combining the parameter information of the image to be detected with the original image to be detected to obtain image data with characteristic labels; inputting image data into a self-adaptive dynamic convolution network of a backbone network, and selecting a convolution kernel with a corresponding shape according to corresponding parameter information;
the global feature extraction unit is used for carrying out multilayer convolution operation on the image to be detected according to the selected convolution kernel with the corresponding shape to obtain global features with rich semantics;
the local feature extraction unit is used for inputting the shallow feature map into a branch network and extracting local features of the image to be detected;
a global local feature fusion unit, configured to perform feature fusion on the local feature and the global feature;
and the image classification and identification unit is used for inputting the fusion characteristics into a classification network and outputting the classification and identification information of the image to be detected.
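The cooperation of the units above can be sketched as a small pipeline class. The callables passed in stand for the preprocessing block, backbone, branch network, and classification network; they, the class name, and the additive feature fusion are placeholders for illustration, not the patent's actual implementations.

```python
class AdaptiveDynamicConvClassifier:
    """Minimal sketch wiring the apparatus units together."""

    def __init__(self, preprocess, backbone, branch, classifier):
        self.preprocess = preprocess    # image processing unit
        self.backbone = backbone        # convolution matching + global feature extraction
        self.branch = branch            # local feature extraction unit
        self.classifier = classifier    # image classification and identification unit

    def classify(self, image):
        params, shallow = self.preprocess(image)   # parameter info + shallow feature map
        global_feat = self.backbone(image, params) # kernel selected per shape label
        local_feat = self.branch(shallow)          # local features from branch network
        fused = global_feat + local_feat           # global-local fusion (additive, assumed)
        return self.classifier(fused)
```

Each unit can then be replaced by a real network while keeping the same data flow.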
The embodiment of the invention also provides computer equipment, which comprises at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform an adaptive dynamic convolution network based image classification recognition method.
It can be understood that some features of the image classification and identification method, apparatus, and computer device based on the adaptive dynamic convolutional network of the present invention may reference one another; for example, the global branch in the method corresponds to the global feature extraction unit in the apparatus. Those skilled in the art can understand and implement accordingly based on the embodiments of the present invention, and details are not repeated here.
In the description of the present invention, it is to be understood that the terms "coaxial", "bottom", "one end", "top", "middle", "other end", "upper", "one side", "top", "inner", "outer", "front", "center", "both ends", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "disposed," "connected," "fixed," "rotated," and the like are to be construed broadly, e.g., as meaning fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; the terms may be directly connected or indirectly connected through an intermediate, and may be communication between two elements or interaction relationship between two elements, unless otherwise specifically limited, and the specific meaning of the terms in the present invention will be understood by those skilled in the art according to specific situations.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. An image classification and identification method based on an adaptive dynamic convolution network, the method comprising:
acquiring an image to be detected, inputting the image to be detected into a preprocessing block for a preprocessing operation, and obtaining parameter information and a shallow feature map of the image to be detected;
combining the parameter information of the image to be detected with the original image to be detected to obtain image data with feature labels;
inputting image data into a self-adaptive dynamic convolution network of a backbone network, selecting a convolution kernel with a corresponding shape according to corresponding parameter information, and obtaining global features with rich semantics after multilayer convolution operation;
inputting the shallow feature map into a branch network, and extracting local features of the image to be detected;
and performing feature fusion on the local features and the global features, inputting the fused features into a classification network, and outputting classification identification information of the image to be detected.
2. The image classification and identification method based on the adaptive dynamic convolution network as claimed in claim 1, wherein the preprocessing block comprises a shallow feature extraction block and a multi-task classifier: the shallow feature extraction block is used for extracting shallow features of the image to be detected, the shallow features comprising the parameter information of the image to be detected, namely texture, shape and edge information; the multi-task classifier preliminarily classifies the image to be detected into images of different shapes according to the parameter information, obtaining shape label information, namely label probability information, which is marked as li (i = 1, 2, 3, 4) and sequentially represents a circle, an ellipse, a square and a rectangle.
3. The image classification and identification method based on the adaptive dynamic convolution network as claimed in claim 2, wherein the step of combining the parameter information of the image to be detected with the original image to be detected to obtain image data with feature labels comprises adding the label probability information to a text label of the original image to be detected, so that the original image to be detected carries shape information; the shape information represents the texture and edges of the image, and the predominant shape is a circle, an ellipse, a square or a rectangle.
4. The image classification and identification method based on the adaptive dynamic convolution network is characterized in that the adaptive dynamic convolution network comprises a plurality of groups of dynamic convolution blocks, each group of dynamic convolution blocks being composed of a group of parallel convolution kernels of different shapes, comprising a circular convolution kernel, an elliptical convolution kernel, a square convolution kernel and a rectangular convolution kernel; the convolution kernel of each shape is labeled with a parameter mu (u = 1, 2, 3, 4), and the convolution kernel whose labeled parameter matches the parameter information of the image to be detected is selected, namely when li = mu, the convolution kernel of the corresponding shape is selected to perform the convolution operation, where li (i = 1, 2, 3, 4) sequentially represents a circle, an ellipse, a square and a rectangle.
5. The image classification and identification method based on the adaptive dynamic convolution network as claimed in claim 3, wherein the circular convolution kernel is obtained from a square convolution kernel through bilinear interpolation: the square convolution kernel of size N x N is abstracted into N x N grid points, the center point of the square convolution kernel is taken as the center of the circular convolution kernel, a coordinate system is established, and the length from the center point to a point on one side is taken as the radius; the positions of the points of the circular convolution kernel are determined according to the circle formula, namely the grid points at the positions corresponding to the 45-degree angles in the circular convolution kernel are determined, and the positions of the remaining grid points are the same as those of the original square convolution kernel.
6. The image classification and identification method based on the adaptive dynamic convolution network according to claim 3, wherein the elliptical convolution kernel is obtained from a rectangular convolution kernel through bilinear interpolation: the rectangular convolution kernel of size N x M is abstracted into N x M grid points, the center point of the rectangular convolution kernel is taken as the center of the elliptical convolution kernel, and a coordinate system is established; with 1/2 of the length of the rectangular convolution kernel as the semi-major axis of the elliptical convolution kernel and 1/2 of the width as its semi-minor axis, the position of each point of the elliptical convolution kernel is determined according to the ellipse formula; the positions of the remaining grid points are the same as those of the original rectangular convolution kernel.
7. The image classification and identification method based on the adaptive dynamic convolution network is characterized in that the branch network comprises 2 MBConv convolution modules and 2 Transformer modules connected in series.
8. An image classification and identification device based on an adaptive dynamic convolution network, which is characterized by comprising:
the image acquisition unit is used for acquiring an image to be detected;
the image processing unit is used for inputting the image to be detected into a preprocessing block for preprocessing operation to obtain parameter information and a shallow characteristic diagram of the image to be detected;
the convolution matching unit is used for combining the parameter information of the image to be detected with the original image to be detected to obtain image data with characteristic labels; inputting image data into a self-adaptive dynamic convolution network of a backbone network, and selecting a convolution kernel with a corresponding shape according to corresponding parameter information;
the global feature extraction unit is used for carrying out multilayer convolution operation on the image to be detected according to the selected convolution kernel with the corresponding shape to obtain global features with rich semantics;
the local feature extraction unit is used for inputting the shallow feature map into a branch network and extracting the local features of the image to be detected;
a global local feature fusion unit, configured to perform feature fusion on the local feature and the global feature;
and the image classification and identification unit is used for inputting the fusion characteristics into a classification network and outputting the classification and identification information of the image to be detected.
9. A computer device comprising at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
CN202210088188.4A 2022-01-25 2022-01-25 Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment Pending CN114445664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210088188.4A CN114445664A (en) 2022-01-25 2022-01-25 Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210088188.4A CN114445664A (en) 2022-01-25 2022-01-25 Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment

Publications (1)

Publication Number Publication Date
CN114445664A true CN114445664A (en) 2022-05-06

Family

ID=81370048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210088188.4A Pending CN114445664A (en) 2022-01-25 2022-01-25 Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment

Country Status (1)

Country Link
CN (1) CN114445664A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641584A (en) * 2022-12-26 2023-01-24 武汉深图智航科技有限公司 Foggy day image identification method and device
CN116342582A (en) * 2023-05-11 2023-06-27 湖南工商大学 Medical image classification method and medical equipment based on deformable attention mechanism
CN116342582B (en) * 2023-05-11 2023-08-04 湖南工商大学 Medical image classification method and medical equipment based on deformable attention mechanism
CN116665176A (en) * 2023-07-21 2023-08-29 石家庄铁道大学 Multi-task network road target detection method for vehicle automatic driving
CN116665176B (en) * 2023-07-21 2023-09-26 石家庄铁道大学 Multi-task network road target detection method for vehicle automatic driving
CN117152646A (en) * 2023-10-27 2023-12-01 武汉大学 Unmanned electric power inspection AI light-weight large model method and system
CN117152646B (en) * 2023-10-27 2024-02-06 武汉大学 Unmanned electric power inspection AI light-weight large model method and system
CN117670854A (en) * 2023-12-14 2024-03-08 四川新视创伟超高清科技有限公司 Dynamic convolution filtering detection method and device for ultra-high definition texture image
CN118015006A (en) * 2024-04-10 2024-05-10 武汉互创联合科技有限公司 Embryo cell vacuole detection method based on dynamic circular convolution and electronic equipment
CN118015006B (en) * 2024-04-10 2024-07-09 武汉互创联合科技有限公司 Embryo cell vacuole detection method based on dynamic circular convolution and electronic equipment
CN118279679B (en) * 2024-06-04 2024-08-02 深圳大学 Image classification method, image classification device and medium based on deep learning model

Similar Documents

Publication Publication Date Title
CN114445664A (en) Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment
CN110929610B (en) Plant disease identification method and system based on CNN model and transfer learning
CN109544681B (en) Fruit three-dimensional digitization method based on point cloud
CN109858569A (en) Multi-tag object detecting method, system, device based on target detection network
CN108304826A (en) Facial expression recognizing method based on convolutional neural networks
CN106909924A (en) A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN110263768A (en) A kind of face identification method based on depth residual error network
CN109886153A (en) A kind of real-time face detection method based on depth convolutional neural networks
CN110309835A (en) A kind of image local feature extracting method and device
Lacewell et al. Optimization of image fusion using genetic algorithms and discrete wavelet transform
CN112950780A (en) Intelligent network map generation method and system based on remote sensing image
Liu et al. Deep learning based research on quality classification of shiitake mushrooms
CN114648535A (en) Food image segmentation method and system based on dynamic transform
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN116385902A (en) Remote sensing big data processing method, system and cloud platform
CN117635418B (en) Training method for generating countermeasure network, bidirectional image style conversion method and device
Bajpai et al. Deep learning model for plant-leaf disease detection in precision agriculture
CN117349784A (en) Remote sensing data processing method, device and equipment
CN116071653A (en) Automatic extraction method for multi-stage branch structure of tree based on natural image
CN116758415A (en) Lightweight pest identification method based on two-dimensional discrete wavelet transformation
CN110378880A (en) The Cremation Machine burning time calculation method of view-based access control model
CN116977668A (en) Image recognition method, device, computer equipment and computer storage medium
CN115908697A (en) Generation model based on point cloud probability distribution learning and method thereof
CN113568983B (en) Scene graph generation method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination