CN112784909B - Image classification and identification method based on self-attention mechanism and self-adaptive sub-network


Info

Publication number
CN112784909B
CN112784909B (application CN202110119391.9A)
Authority
CN
China
Prior art keywords
network
layer
neuron
attention
image classification
Prior art date
Legal status
Active
Application number
CN202110119391.9A
Other languages
Chinese (zh)
Other versions
CN112784909A (en)
Inventor
Li Hui (李惠)
Xu Yang (徐阳)
Hu Fangqiao (胡芳侨)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202110119391.9A
Publication of CN112784909A
Application granted
Publication of CN112784909B

Classifications

    • G06F18/24 Classification techniques (G Physics; G06 Computing; G06F Electric digital data processing; G06F18/00 Pattern recognition; G06F18/20 Analysing)
    • G06N3/04 Architecture, e.g. interconnection topology (G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks)
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification and identification method based on a self-attention mechanism and an adaptive sub-network, which comprises the following steps: constructing a neuron calculation model for initial image classification and identification; using a self-attention mechanism to make each layer of neurons of the model focus attention on a region of interest of a preset image, and extracting an attention proportion coefficient of the region of interest; using an adaptive sub-network to make single neurons of the model learn a nonlinear expression capability so as to extract high-level features of the preset image; and controlling the amount of computation in the image classification and identification process by setting the attention proportion coefficient, the number of sub-network layers and the number of sub-network nodes, so that the computational cost is kept in check while a high-accuracy image classification and identification result is obtained. The method solves the difficult problem that a single neuron lacks complex nonlinear expression capability, and can concentrate attention and locally deepen and widen the network in complex application scenarios, thereby improving the expression capability and identification accuracy of the network.

Description

Image classification and identification method based on self-attention mechanism and self-adaptive sub-network
Technical Field
The invention relates to the technical fields of artificial intelligence, neural networks, computer vision and deep learning, in particular to an image classification and identification method based on a self-attention mechanism and an adaptive sub-network.
Background
At present, a neuron calculation model commonly used in the field of artificial intelligence comprises two parts, a linear operation (weight multiplication and bias addition) and a nonlinear activation function. Its mathematical expression is:

x_j^{l+1} = σ( Σ_{i=1}^{n_l} w_{ij}^l · x_i^l + b_j^{l+1} )    (1)

where x_i^l represents the ith neuron of the lth layer, x_j^{l+1} represents the jth neuron of the (l+1)th layer, w_{ij}^l represents the weight coefficient connecting the ith neuron of the lth layer with the jth neuron of the (l+1)th layer, b_j^{l+1} represents the bias coefficient corresponding to the jth neuron of the (l+1)th layer, n_l represents the number of neurons in the lth layer, and σ represents a nonlinear activation function, common forms of which include sigmoid, tanh and ReLU.
Formula (1) is the neuron calculation model in common use today: it takes the node values of the previous layer's neurons as input, multiplies each by the weight coefficient connecting it to the current-layer neuron, accumulates the products, and adds the bias coefficient of the current-layer neuron, which yields the result of the first step, a linear transformation; a nonlinear activation function then transforms this linear result so that the neuron acquires nonlinear expression capability.
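As a concrete illustration, the following is a minimal NumPy sketch of the conventional neuron model of formula (1); the array shapes, the ReLU choice of σ, and the random values are illustrative assumptions, not part of the patent.

```python
import numpy as np

def relu(z):
    # A common choice for the nonlinear activation sigma in formula (1)
    return np.maximum(0.0, z)

def dense_layer(x_prev, W, b, activation=relu):
    """x_prev: (n_l,) previous-layer neuron values; W: (n_l, n_next) weight
    coefficients; b: (n_next,) bias coefficients. Implements formula (1)."""
    z = x_prev @ W + b           # linear step: weighted sum plus bias
    return activation(z)         # nonlinear step: activation function

x_l = np.random.rand(8)                  # 8 neurons in layer l
W = np.random.randn(8, 4) * 0.1          # weights to 4 neurons in layer l+1
b = np.zeros(4)
x_next = dense_layer(x_l, W, b)          # each x_next[j] realizes formula (1)
```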
As shown in fig. 1, on the basis of the single-neuron operation, a plurality of neuron nodes are arranged in each layer of the neural network, and the transmission process from front-layer input to rear-layer output is repeated several times, thus forming a multi-layer neural network and realizing an approximate fit of more complex nonlinear mapping relations.
As shown in fig. 2, deep learning builds on this multi-layer operation mode with far deeper network architectures. Moreover, the convolutional neural network, highly popular in the deep learning field, operates on the same principle as the ordinary neural network and merely uses convolution kernels for its implementation: the weight coefficients in a kernel are first multiplied with the pixels in the receptive field and the bias coefficient is added to complete the linear operation, and a nonlinear activation operation follows, exactly as in formula (1).
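To make the analogy explicit, here is a minimal sketch of a single convolutional "neuron" under the same illustrative assumptions as above (a 3×3 kernel and a ReLU activation; sizes and values are ours, not the patent's).

```python
import numpy as np

def conv_neuron(patch, kernel, bias):
    """Convolutional analogue of formula (1): the kernel's weight coefficients
    multiply the pixels in the receptive field, the bias coefficient is added,
    and a nonlinear activation (ReLU here) is applied."""
    return max(0.0, float(np.sum(patch * kernel) + bias))

patch = np.random.rand(3, 3)             # 3x3 receptive field of pixels
kernel = np.random.randn(3, 3) * 0.1     # 3x3 convolution kernel weights
out = conv_neuron(patch, kernel, bias=0.0)
```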
In summary, the network architectures currently common in the field of artificial intelligence rest on the neuron calculation model of formula (1): inside a single neuron, the inputs are multiplied by weight coefficients and summed, a bias coefficient is added, and nonlinearity is finally introduced through a nonlinear activation function.
However, when facing a practical complex application scenario, the conventional neuron computational model has the following disadvantages:
(1) owing to the limited design of formula (1), no high-order operation can be realized inside a single neuron; only simple linear (multiply-add) and nonlinear (sigmoid, tanh, ReLU, etc.) operations are possible, so the expression capability of a single neuron is limited;
(2) human brain nerve cells release very complex chemical transmitters and electrical signals while working; a single pass of simple linear and nonlinear operations as in formula (1) cannot effectively express this process, i.e., the traditional neuron calculation model cannot truly reflect how human brain nerve cells process data;
(3) at present, deep learning obtains sufficiently strong nonlinear expression capability only through very deep network architectures; one important reason is that each layer of neurons can represent only one nonlinear activation operation, so a large number of layers is needed to construct a complex nonlinear approximation function. However, the nerve cells of the human cerebral cortex are not layered without limit; there are only a few distinct functional areas. The real human brain therefore does not obtain its nonlinear expression capability through the connection of many layers of neurons, which means the single-neuron calculation model itself must possess sufficiently strong nonlinear expression capability, something formula (1) lacks;
(4) for some special problems, such as image classification and identification in blurred, occluded and similar scenes, recognition performance can be improved if neurons adaptively focus attention on particular regions and extract local high-level features of the image for classification and identification; however, the existing neuron model has no mechanism for distinguishing the importance of nodes.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, an object of the present invention is to provide an image classification and identification method based on a self-attention mechanism and an adaptive sub-network.
In order to achieve the above object, an embodiment of the present invention provides an image classification and identification method based on a self-attention mechanism and an adaptive sub-network, including the following steps: step S1, constructing a neuron calculation model for initial image classification and identification; step S2, using a self-attention mechanism to make each layer of neurons of the initial image-classification-identification neuron calculation model focus attention on a region of interest of a preset image, and extracting an attention proportion coefficient of the region of interest; step S3, using an adaptive sub-network to make single neurons of the initial image-classification-identification neuron calculation model learn a nonlinear expression capability so as to extract high-level features of the preset image, the nonlinear expression capability being controlled by the number of sub-network layers and the number of sub-network nodes; and step S4, controlling the amount of computation in the image classification and identification process by setting the attention proportion coefficient, the number of sub-network layers and the number of sub-network nodes, thereby controlling the computational cost while obtaining a high-accuracy image classification and identification result.
The image classification and identification method based on the self-attention mechanism and the adaptive sub-network adds a self-attention module and an adaptive sub-network module to the traditional neuron calculation process. Its key control parameters include the attention proportion coefficient, the number of sub-network layers and the sub-network node-number proportion coefficient. Only the kernel of the traditional neuron calculation model needs to be modified: the growth of the amount of computation in the image classification and identification process is controlled by setting the attention proportion coefficient, the number of sub-network layers and the sub-network node-number proportion coefficient, the accuracy of the image classification and identification result is guaranteed, and the method can be applied to any neural network architecture and to tasks such as data mining, image classification and structural damage identification.
In addition, the image classification and identification method based on the self-attention mechanism and the adaptive sub-network according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the step S2 specifically includes:
α^l = G[softmax(X^l), β]

softmax(X^l)_i = exp(x_i^l) / Σ_{m=1}^{n_l} exp(x_m^l)

wherein exp is the exponential operation and softmax performs exponential normalization over all input neurons of the lth layer, yielding a weight vector softmax(X^l) whose values lie in (0,1); G is a gate operation and β is the attention proportion coefficient with value range (0,1), meaning that the largest β·n_l elements of the weight vector are kept and all other elements are set to 0; α^l is the resulting attention coefficient vector of all neurons in the lth layer.
Further, in an embodiment of the present invention, the step S3 specifically includes:
F_j^{l+1}(·; k, h)

h = γ · n_{l+1}

wherein F_j^{l+1}(·; k, h) represents the adaptive sub-network, k is the number of hidden layers in the adaptive sub-network, at least 2, and h is the number of neuron nodes in each layer of the adaptive sub-network, determined by the sub-network node coefficient γ together with the number of nodes n_{l+1} of the (l+1)th layer; γ takes values between 0 and 1.
Further, in an embodiment of the present invention, the width of the adaptive sub-network is proportional to the sub-network node coefficient γ, and the depth of the adaptive sub-network is proportional to the number of hidden layers k in the adaptive sub-network.
Further, in an embodiment of the present invention, the operation structure in the step S4 is:
x_j^{l+1} = σ( F_j^{l+1}( Σ_{i=1}^{n_l} w_{ij}^l · α_i^l · x_i^l ; θ_j^{l+1} ) + b_j^{l+1} )

wherein F_j^{l+1} represents the sub-network implicit in the jth neuron of the (l+1)th layer, α^l is the attention coefficient vector of all neurons in the lth layer, X^l is the vector formed by all neuron values of the lth layer, and θ_j^{l+1} is the set of network parameters corresponding to the internal sub-network F_j^{l+1} of the jth neuron of the (l+1)th layer.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a data flow diagram of a conventional neuron computational model;
FIG. 2 is a schematic diagram of a conventional multi-layer neural network;
FIG. 3 is a flow chart of the image classification and identification method based on the self-attention mechanism and the adaptive sub-network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the operation of the self-attention module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the operation of an adaptive subnetwork in accordance with an embodiment of the present invention;
FIG. 6 shows the novel neuron calculation model constructed by the image classification and identification method based on the self-attention mechanism and the adaptive sub-network according to an embodiment of the present invention;
fig. 7 is a schematic diagram illustrating recognition on the handwritten digit classification task in a blurred scene according to the first embodiment of the present invention, in which (a) shows the digit 1 at different blur levels, (b) the digit 5 at different blur levels, and (c) the digit 6 at different blur levels;
FIG. 8 is a schematic diagram illustrating a fusion manner on a U-Net semantic segmentation architecture according to a second embodiment of the present invention;
fig. 9 is a schematic diagram of the identification effect on the steel box girder micro fatigue crack semantic segmentation task according to the second embodiment of the present invention, where (a) is the original image, (b) the ground-truth label, and (c) the prediction result of the novel neuron calculation model;
fig. 10 compares the traditional neuron and the novel neuron model on the XOR operation according to the third embodiment of the present invention, where (a) is the XOR classification problem, (b) the novel neuron model, and (c) the traditional multilayer neural network model.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
An image classification and identification method based on a self-attention mechanism and an adaptive sub-network according to an embodiment of the present invention is described below with reference to the accompanying drawings.
FIG. 3 is a flowchart of the image classification and identification method based on the self-attention mechanism and the adaptive sub-network according to an embodiment of the present invention.
As shown in fig. 3, the image classification and identification method based on the self-attention mechanism and the adaptive sub-network comprises the following steps:
in step S1, a neuron computational model for initial image classification recognition is constructed.
That is, an initial image-classification-identification neuron calculation model is established; it is based on the existing general neuron calculation model:

x_j^{l+1} = σ( Σ_{i=1}^{n_l} w_{ij}^l · x_i^l + b_j^{l+1} )    (1)

where x_i^l represents the ith neuron of the lth layer, x_j^{l+1} represents the jth neuron of the (l+1)th layer, w_{ij}^l represents the weight coefficient connecting the ith neuron of the lth layer with the jth neuron of the (l+1)th layer, b_j^{l+1} represents the bias coefficient corresponding to the jth neuron of the (l+1)th layer, n_l represents the number of neurons in the lth layer, and σ represents the nonlinear activation function.
In step S2, each layer of neurons of the neuron computational model identified by the initial image classification is made to concentrate attention on a region of interest of the preset image by using the self-attention mechanism, and an attention proportion coefficient of the region of interest is extracted.
That is, as shown in FIG. 4, a self-attention module is applied on top of the initial neuron calculation model:

α^l = G[softmax(X^l), β]    (2)

softmax(X^l)_i = exp(x_i^l) / Σ_{m=1}^{n_l} exp(x_m^l)    (3)

wherein exp is the exponential operation and softmax performs exponential normalization over all input neurons of the lth layer, yielding a weight vector softmax(X^l) whose values lie in (0,1), i.e., the attention over the region of interest of the preset image; G is a gate operation and β is the attention proportion coefficient with value range (0,1), meaning that the largest β·n_l elements of the weight vector are kept and all other elements are set to 0; α^l is the resulting attention coefficient vector of all neurons in the lth layer.
Further, according to formulas (2) and (3), the invention can truncate inputs in the neurons of the next layer according to the importance of all neurons of the previous layer, retaining the relatively important nodes and ignoring the less important ones; that is, the region of interest of the preset image is retained and regions of no interest are ignored, thereby saving part of the computation. Meanwhile, the degree to which the computation is reduced can be controlled by the attention proportion coefficient β.
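The following is a minimal sketch of the gate operation of formulas (2) and (3); reading β as the fraction of neurons kept is our interpretation of the patent's "first β elements", and the input values are illustrative.

```python
import numpy as np

def attention_gate(x_l, beta):
    """Formulas (2)-(3): softmax over the layer's neuron values, then the
    gate G keeps only the largest beta fraction of attention weights."""
    e = np.exp(x_l - x_l.max())                 # numerically stable exp
    alpha = e / e.sum()                         # softmax(X^l), values in (0,1)
    k = max(1, int(np.ceil(beta * x_l.size)))   # number of neurons retained
    cutoff = np.sort(alpha)[-k]                 # k-th largest attention weight
    return np.where(alpha >= cutoff, alpha, 0.0)

alpha_l = attention_gate(np.random.rand(8), beta=0.5)  # half the nodes kept
```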
In step S3, the adaptive sub-network is used to make single neurons of the initial image-classification-identification neuron calculation model learn a nonlinear expression capability, controlled by the number of sub-network layers and the number of sub-network nodes, so as to extract the high-level features of the preset image.
Specifically, as shown in fig. 5, an adaptive sub-network is employed on top of the self-attention module:

F_j^{l+1}(·; k, h)    (4)

h = γ · n_{l+1}    (5)

wherein F_j^{l+1}(·; k, h) represents the adaptive sub-network, k is the number of hidden layers in the adaptive sub-network, at least 2, and h is the number of neuron nodes in each layer of the adaptive sub-network, determined by the sub-network node coefficient γ together with the number of nodes n_{l+1} of the (l+1)th layer; γ takes values between 0 and 1.
Obviously, the larger the value of the sub-network node coefficient γ, the wider the adaptive sub-network; the larger the number of hidden layers k, the deeper the sub-network. The adaptive sub-network can therefore widen and deepen the original network within a local range, improving its feature-extraction capability, i.e., its ability to extract the high-level features of the preset image. Meanwhile, there is a trade-off between the network's expression capability and its computational consumption: the stronger the sub-network's expression, the larger the amount of computation the overall network consumes.
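Below is a minimal sketch of an adaptive sub-network per formulas (4) and (5); the tanh hidden activation and the random stand-in weights are illustrative assumptions (in practice these would be the trainable parameters θ).

```python
import numpy as np

def subnetwork(v, k, gamma, n_next, rng=np.random.default_rng(0)):
    """Formulas (4)-(5): k hidden layers (k >= 2), each of width
    h = gamma * n_{l+1}; returns a scalar, the sub-network's output."""
    h = max(1, int(gamma * n_next))              # h = gamma * n_{l+1}
    out = np.atleast_1d(np.asarray(v, dtype=float))
    for _ in range(k):                           # k hidden layers
        W = rng.standard_normal((out.size, h)) * 0.1
        out = np.tanh(out @ W)                   # each layer adds nonlinearity
    w_out = rng.standard_normal(h) * 0.1
    return float(out @ w_out)                    # scalar output of the neuron

y = subnetwork(v=[0.3], k=2, gamma=0.5, n_next=4)
```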
In step S4, the amount of computation in the image classification and identification process is controlled by setting the attention proportion coefficient, the number of sub-network layers and the number of sub-network nodes, thereby controlling the computational cost while obtaining a high-accuracy image classification and identification result.
Specifically, as shown in fig. 6, a new neuron calculation model is obtained after step S3:

x_j^{l+1} = σ( F_j^{l+1}( Σ_{i=1}^{n_l} w_{ij}^l · α_i^l · x_i^l ; θ_j^{l+1} ) + b_j^{l+1} )    (6)

wherein F_j^{l+1} represents the sub-network implicit in the jth neuron of the (l+1)th layer, α^l is the attention coefficient vector of all neurons in the lth layer, X^l is the vector formed by all neuron values of the lth layer, and θ_j^{l+1} is the set of network parameters corresponding to the internal sub-network F_j^{l+1} of the jth neuron of the (l+1)th layer.
The invention only needs to modify the kernel of the initial neuron calculation model (adding the self-attention module and the adaptive sub-network module); that is, by setting the attention proportion coefficient, the number of sub-network layers and the sub-network node-number proportion coefficient, it controls the growth of the amount of computation in the image classification and identification process, obtains image classification and identification results of higher accuracy, and can be applied to any neural network architecture.
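Putting the pieces together, here is a minimal sketch of one neuron of formula (6), reusing the attention_gate and subnetwork functions from the sketches above; the ReLU choice of σ and all shapes are illustrative assumptions.

```python
import numpy as np

def new_neuron(x_l, w_j, b_j, beta, k, gamma, n_next):
    """One neuron of formula (6): attention-gated inputs are linearly
    combined, passed through the internal sub-network, biased, activated."""
    alpha_l = attention_gate(x_l, beta)       # formulas (2)-(3)
    v = np.dot(w_j, alpha_l * x_l)            # attention-weighted linear step
    s = subnetwork([v], k, gamma, n_next)     # formulas (4)-(5)
    return max(0.0, s + b_j)                  # sigma(s + b), ReLU assumed

x_next_j = new_neuron(np.random.rand(8), np.random.randn(8) * 0.1,
                      b_j=0.0, beta=0.5, k=2, gamma=0.5, n_next=4)
```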
The image classification and identification method based on the self-attention mechanism and the adaptive sub-network is further described below with reference to three specific examples.
First embodiment, the method is applied to recognition of the handwritten digit classification task in the complex scene
Firstly, a neuron calculation model for initial image classification and identification is constructed. Then the self-attention module makes each layer of neurons of that model focus its analysis on the locally important region of the image, while the attention proportion coefficient of that region is extracted. Next, the adaptive sub-network lets single neurons of the model learn a nonlinear expression capability, so that the internal sub-network can extract high-level features of the image. Finally, the amount of computation is controlled via the attention proportion coefficient and the nonlinear expression capability, which improves the classification accuracy of the image classification result and gives good performance for image recognition in complex scenes.
Specifically, consider first the blurred scene. As shown in fig. 7, classifying the MNIST handwritten digit images 0-9 under blur improves the classification accuracy by 4.76% compared with the traditional model (from 82.02% to 86.78%).
Next, consider the occlusion scene. As shown in table 1 below, classifying the MNIST handwritten digit images 0-9 under occlusion, with 1, 3 and 5 random occlusion regions at a 50% occlusion rate, improves the classification accuracy over the traditional model by 0.63%, 1.58% and 1.45% respectively, an average improvement of 1.22% (from 68.41% to 69.63%).
TABLE 1 Recognition results of the novel neuron model on the handwritten digit classification task in an occluded scene
[Table 1 appears as an image in the original document; its values are summarized in the preceding paragraph.]
Second embodiment, the method is applied to semantic segmentation task of the micro fatigue cracks of the steel box girder
As shown in FIG. 8, the method can be fused into the U-Net semantic segmentation network, replacing the original convolution and nonlinear activation operations with the calculation process of formula (6), and the fused network is verified on a steel box girder micro fatigue crack data set.
As shown in FIG. 9, the invention can accurately identify the fine crack pixels in fatigue crack images of a steel box girder despite complex background interference. The quantitative evaluation index is the overlap ratio between the predicted fatigue-crack pixel region and the ground-truth region; compared with the original U-Net, fusing the novel neuron calculation model increases this ratio from 0.315 to 0.342, a gain of 0.027, i.e., a relative improvement of 8.5%.
Further, beyond image classification and recognition, the method can be applied to other problems, as follows:
third embodiment, the method is applied to the task of classifying the XOR problem
As shown in fig. 10, the complex XOR operation can be achieved with a single neuron of the present invention; for the traditional neuron (the initial neuron calculation model), at least a two-layer neural network is required to achieve this function.
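As an illustration of why one such neuron suffices, here is a minimal sketch in which a single neuron's internal sub-network (one hidden layer of two ReLU nodes with hand-set weights, our illustrative choice) realizes XOR, something no single conventional linear-plus-activation neuron can do, since XOR is not linearly separable.

```python
def xor_neuron(a, b):
    """A single neuron whose internal sub-network computes XOR:
    hidden h1 = ReLU(a+b), h2 = ReLU(a+b-1), output = h1 - 2*h2."""
    s = a + b
    h1 = max(0.0, s)          # ReLU(a + b)
    h2 = max(0.0, s - 1.0)    # ReLU(a + b - 1)
    return h1 - 2.0 * h2      # yields 0, 1, 1, 0 on the four Boolean inputs

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_neuron(a, b))   # prints 0, 1, 1, 0
```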
To sum up, the image classification and identification method based on the self-attention mechanism and the adaptive sub-network provided by the embodiments of the invention fundamentally solves the problem that the current single-neuron calculation model lacks complex nonlinear expression capability, and also has the following advantages:
(1) it breaks through the limitation of the traditional single-neuron calculation model, in which linear and nonlinear operations are combined simply and only once, and realizes high-order nonlinear operation inside the neuron through the sub-network module, improving the expression capability of the single neuron; the order can be controlled by the sub-network parameters;
(2) it better simulates the calculation process by which human brain nerve cells process data: high-level features of the input can be extracted inside a single neuron, node importance can be analyzed through the self-attention mechanism, computing power is concentrated on the nodes that need emphasis, and unimportant nodes are abandoned to reduce the amount of computation;
(3) the growth of the amount of computation is controlled by setting parameters such as the attention proportion coefficient, the number of sub-network layers and the sub-network node-number proportion coefficient, while recognition accuracy is maintained, thereby reducing the computational cost;
(4) it can be extended to any neural network architecture; only the kernel of the traditional neuron calculation model needs to be modified, which makes application very convenient, and fusion into any neural network, including but not limited to multilayer perceptrons and convolutional neural networks, is achieved by replacing the underlying code of the original neuron calculation model with the code of the proposed model;
(5) the complex XOR operation can be realized with only one novel neuron, whereas the traditional neuron requires at least a two-layer neural network to realize this function;
(6) for the image classification problem of blurred scenes, a 4.76% improvement in classification accuracy is obtained on the MNIST data set compared with the traditional neuron calculation model;
(7) for the image classification problem of occluded scenes, an average improvement of 1.22% in classification accuracy is obtained on the MNIST data set, over three working conditions with the same occlusion rate but different numbers of occlusion regions, compared with the traditional neuron calculation model;
(8) the method can also be applied to the field of structural damage recognition: adding the novel neuron calculation model to a U-Net semantic segmentation framework yields an overlap-ratio gain of 0.027 (2.7 percentage points) on the steel box girder micro fatigue crack data set.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (3)

1. An image classification and identification method based on a self-attention mechanism and an adaptive sub-network, characterized by comprising the following steps:
step S1, constructing a neuron calculation model for initial image classification and identification;
step S2, using a self-attention mechanism to make each layer of neurons of the initial image-classification-identification neuron calculation model focus attention on a region of interest of a preset image, and extracting an attention proportion coefficient of the region of interest, wherein the step S2 is specifically:
α^l = G[softmax(X^l), β]

softmax(X^l)_i = exp(x_i^l) / Σ_{m=1}^{n_l} exp(x_m^l)

wherein exp is the exponential operation and softmax performs exponential normalization over all input neurons of the lth layer, yielding a weight vector softmax(X^l) whose values lie in (0,1); G is a gate operation and β is the attention proportion coefficient with value range (0,1), meaning that the largest β·n_l elements of the weight vector are selected and all other elements are set to 0; α^l is the resulting attention coefficient vector of all neurons in the lth layer; X^l is the vector formed by all neuron values of the lth layer; n_l is the number of all neurons in the lth layer;
step S3, using an adaptive sub-network to make a single neuron of the initial image-classification-identification neuron calculation model learn a nonlinear expression capability so as to extract high-level features of the preset image, wherein the nonlinear expression capability is controlled by the number of sub-network layers and the number of sub-network nodes, and the step S3 is specifically:
F_j^{l+1}(·; k, h)

h = γ · n_{l+1}

wherein F_j^{l+1} represents the adaptive sub-network corresponding to the jth neuron of the (l+1)th layer, k is the number of hidden layers in the adaptive sub-network, at least 2; h is the number of neuron nodes in each layer of the adaptive sub-network, determined by the sub-network node coefficient γ together with the number of nodes n_{l+1} of the (l+1)th layer; γ takes values between 0 and 1; θ_j^{l+1} is the set of network parameters corresponding to the internal sub-network F_j^{l+1} of the jth neuron of the (l+1)th layer;
and step S4, controlling the amount of computation in the image classification and identification process by setting the attention proportion coefficient, the number of sub-network layers and the number of sub-network nodes, thereby controlling the computational cost while obtaining a high-accuracy image classification and identification result.
2. The method according to claim 1, wherein the width of the adaptive sub-network is proportional to the node coefficient γ of the adaptive sub-network, and the depth of the adaptive sub-network is proportional to the number of hidden layers k in the adaptive sub-network.
3. The image classification and identification method based on the self-attention mechanism and the adaptive sub-network according to claim 1, wherein the operation architecture in the step S4 is as follows:
x_j^{l+1} = σ( F_j^{l+1}( Σ_{i=1}^{n_l} w_{ij}^l · α_i^l · x_i^l ; θ_j^{l+1} ) + b_j^{l+1} )

wherein x_j^{l+1} is the jth neuron of the (l+1)th layer, σ is the nonlinear activation function, n_l is the number of all neurons in the lth layer, w_{ij}^l is the weight coefficient connecting the ith neuron of the lth layer with the jth neuron of the (l+1)th layer, x_i^l is the ith neuron of the lth layer, b_j^{l+1} is the bias coefficient corresponding to the jth neuron of the (l+1)th layer, F_j^{l+1} represents the adaptive sub-network corresponding to the jth neuron of the (l+1)th layer, α^l is the attention coefficient vector of all neurons in the lth layer, X^l is the vector formed by all neuron values of the lth layer, and θ_j^{l+1} is the set of network parameters corresponding to the internal sub-network F_j^{l+1} of the jth neuron of the (l+1)th layer.
CN202110119391.9A, filed 2021-01-28 (priority date 2021-01-28): Image classification and identification method based on self-attention mechanism and self-adaptive sub-network. Granted as CN112784909B (Active).

Priority Applications (1)

Application Number: CN202110119391.9A; Priority/Filing Date: 2021-01-28; Title: Image classification and identification method based on self-attention mechanism and self-adaptive sub-network
Publications (2)

CN112784909A, published 2021-05-11
CN112784909B, published 2021-09-28

Family ID: 75759480 (one family application: CN202110119391.9A, Active, granted as CN112784909B)
Country: CN (China)



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant