CN116012687A - Image interaction fusion method for identifying tread defects of wheel set - Google Patents

Image interaction fusion method for identifying tread defects of wheel set

Info

Publication number
CN116012687A
Authority
CN
China
Prior art keywords: image, interaction, decoding, module, scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310101570.9A
Other languages
Chinese (zh)
Inventor
***
杨皓楠
何静
张昌凡
李哲姝
王忠美
贾林
黄刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202310101570.9A
Publication of CN116012687A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 — Computing systems specially adapted for manufacturing

Abstract

The invention discloses an image interaction fusion method for recognizing wheel set tread defects. The method mainly comprises five stages: a data acquisition and processing stage, a multi-scale interaction attention feature extraction stage, a constraint coupled decoding stage, a fusion inference stage and a result display stage. The adaptive hybrid interaction attention module applied in the multi-scale interaction attention feature extraction stage helps the model distinguish target shapes, and the multi-scale sparse feature extraction module applied in the same stage helps distinguish scales during feature extraction. The designed constraint coupled decoding stage introduces a consistency loss and a reconstruction loss into the modality decoders, improving the recognition accuracy of the model. In conclusion, the method addresses the difficulty of distinguishing target shapes and scales during feature extraction and achieves high recognition accuracy.

Description

Image interaction fusion method for identifying tread defects of wheel set
Technical Field
The invention relates to the technical field of wheel set tread recognition, in particular to an image interaction fusion method for wheel set tread defect recognition.
Background
As trains enter the era of high speed and heavy load, intelligent operation and maintenance of trains is attracting increasing attention. Wheel sets are important running and load-bearing components of a train, and factors such as high-frequency vibration caused by high-speed wheel set rotation and wheel-rail temperature rise caused by sliding/rolling during wheel-rail contact can cause wheel-rail contact fatigue, producing multiple types of defects on the wheel set tread, such as abrasions, scratches and block loss; accurate identification of tread defects can therefore provide key support for train operation and maintenance. Constrained by factors such as complex and changeable wheel-rail contact conditions, tread defect samples mostly show small shape differences between different defect types and large scale differences within the same defect type, so existing deep learning networks face problems such as difficulty in distinguishing target shapes and scales during feature extraction and low model recognition accuracy, and can hardly meet the real-time tread defect diagnosis requirements of intelligent train operation and maintenance.
The invention patent with publication No. CN114663344A discloses a train wheel set tread defect identification method based on image fusion. When identifying tread defects of a train wheel set, that method collects images with a visible light camera and an infrared camera to obtain images of the train wheel set tread region; a fusion model of the visible light image and the infrared image is constructed based on a neural network and trained until convergence, and the corresponding visible light image and infrared image are input into the trained fusion model to obtain a fused image; a region growing method is then used to aggregate pixel points in the fused image according to the similarity of their gray values, so as to obtain an image of the train wheel set tread defect region. That invention constructs the fusion model of the visible light image and the infrared image with a conventional neural network, and when identifying wheel set tread defects it still suffers from problems such as difficulty in distinguishing target shapes and scales and low model recognition accuracy.
Disclosure of Invention
Aiming at the technical problems, the invention provides an image interaction fusion method for identifying wheel set tread defects.
The invention adopts the following specific technical scheme:
The main body of the method comprises five stages, namely a data acquisition and processing stage, a multi-scale interaction attention feature extraction stage, a constraint coupled decoding stage, a fusion inference stage and a result display stage, and the method comprises the following steps:
s1, data acquisition and processing: and acquiring RGB image samples of the tread defect of the wheel set on site, and encoding the original RGB image samples by using a Poisson encoder to obtain Poisson mode POS images for image fusion.
S2, a multi-scale interaction attention feature extraction stage:
s2.1, taking a pre-training lightweight network Mobilenetv2 fused with RGB and Poisson images as a model backbone network, extracting bottom features of modes of respectively acquiring the RGB images and the Poisson mode images in the step S1, and setting an input feature map x m ∈R H×W×C For an m-modality input image, its final encoding characteristics are as follows:
h m =Mobile m (x m ),m∈{r,p}
its characteristic map x m ∈R H×W×C H, W, C in the formula is expressed as the length, width and channel number of RGB image characteristics of the input image, in the formula
Figure BDA0004085416830000021
Wherein H is 1 、W 1 、C 1 But also respectively represents the length, width and channel number of the characteristic diagram,
Figure BDA0004085416830000022
Mobile m coding features respectively representing m modes, mobilenetv2 network, r, p represent the features of RGB image modality and poisson modality POS image modality, respectively;
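For illustration, a minimal PyTorch sketch of the two-branch backbone in step S2.1 is given below. It assumes torchvision's pre-trained MobileNetV2 as Mobile_m and adds a 1×1 projection to the 128-channel encoding used in Embodiment 1; the projection layer and the weight choice are assumptions of this sketch, not taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class ModalityBackbone(nn.Module):
    """One MobileNetV2 branch per modality m in {r, p}, as in step S2.1."""
    def __init__(self, out_channels=128):
        super().__init__()
        # Pre-trained MobileNetV2 feature extractor; a 224x224 input gives a 7x7x1280 map.
        self.features = mobilenet_v2(weights="DEFAULT").features
        # Illustrative 1x1 projection down to the 128-channel encoding h_m.
        self.proj = nn.Conv2d(1280, out_channels, kernel_size=1)

    def forward(self, x):            # x: (B, 3, 224, 224)
        h = self.features(x)         # (B, 1280, 7, 7)
        return self.proj(h)          # (B, 128, 7, 7)  -> h_m

backbone_r, backbone_p = ModalityBackbone(), ModalityBackbone()
x_r = torch.randn(2, 3, 224, 224)    # RGB modality input
x_p = torch.randn(2, 3, 224, 224)    # Poisson (POS) modality input
h_r, h_p = backbone_r(x_r), backbone_p(x_p)
```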
s2.2, adopting a multi-scale sparse feature extraction module DGASPP to extract multi-scale features of the bottom layer features so as to solve the problem of large difference of defect scales in the image sample, wherein the multi-scale sparse feature extraction module DGASPP is used for encoding features h of two modes m Extracting multi-scale features, wherein the extraction formula is as follows:
s m =DGASPP m (h m ),m∈{r,p}
in the middle of
Figure BDA0004085416830000023
DGASPP m Respectively representing the multi-scale coding characteristics of m modes and a DGASPP module;
s2.3, extracting a spatial channel attention feature with interaction information in the bottom layer feature by using an adaptive hybrid interaction attention module AHMA module, so as to solve the problem of small defect shape difference in an image sample, and extracting a formula of the spatial channel attention feature with interaction information by using the AHMA module:
m r ,m p =AHMA(s r ,s p )
in the method, in the process of the invention,
Figure BDA0004085416830000024
attention weighting features respectively representing two modalities; AHMA is the adaptive hybrid interactive attention module.
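The patent does not give the internal equations of the AHMA module (it is detailed only as a hybrid non-local module, attention matrix fusion and adaptive reliability weights), so the following is merely a generic cross-modal spatial-channel attention sketch with learnable reliability weights; it illustrates the interface m_r, m_p = AHMA(s_r, s_p), not the module itself.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic stand-in for the AHMA interface m_r, m_p = AHMA(s_r, s_p)."""
    def __init__(self, channels=128, reduction=4):
        super().__init__()
        # Channel attention computed from pooled statistics of both modalities.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, 2 * channels), nn.Sigmoid())
        # Shared spatial attention over mean/max channel descriptors.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # Adaptive reliability weights for the two modalities.
        self.w = nn.Parameter(torch.zeros(2))

    def forward(self, s_r, s_p):
        b, c, _, _ = s_r.shape
        g = torch.cat([s_r, s_p], dim=1).mean(dim=(2, 3))            # (B, 2C) interaction statistics
        a_r, a_p = self.mlp(g).view(b, 2 * c, 1, 1).chunk(2, dim=1)  # channel attention per modality
        joint = s_r + s_p
        sp = torch.sigmoid(self.spatial(torch.cat(
            [joint.mean(1, keepdim=True), joint.amax(1, keepdim=True)], dim=1)))
        w_r, w_p = torch.softmax(self.w, dim=0)                      # reliability weights
        m_r = s_r * (1 + w_r * a_r * sp)
        m_p = s_p * (1 + w_p * a_p * sp)
        return m_r, m_p
```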
S3, constraint coupled decoding stage: the constraint coupled decoding module is enabled, and the features extracted by the multi-scale interaction attention feature extraction module in step S2.2 are decoded with modality decoders. The modality decoder adopts the DeepLabv3 network with an improved encoding-decoding structure, and the decoding part is improved as follows:
s3.1, adding additional characteristic splicing convolution CConv and shortcut convolution SConv for obtaining decoding characteristics on more scales;
s3.2, adding consistency constraint loss among decoding features of different modes, and capturing interaction features in the decoding features;
s3.3, increasing reconstruction loss between the input image and the reconstructed image so as to enhance shape feature extraction;
s3.4, constructing a total target loss function composed of task loss, consistency loss and reconstruction loss, wherein the total target loss function is used for guiding the network to learn related characteristics, and the formula of the total target loss function is as follows:
L total =μL task +(1-μ)(L consis +L recon )
wherein: mu is a loss function adjustment factor, L task L is a task loss function consis As a consistency loss function, L recon Reconstructing a loss function, wherein the total target loss function is used for adjusting the contribution of task loss and decoding part loss to network learning, and taking the minimized loss function as a target training network;
s4, fusion inference phase: and adopting a global average pooling and a multi-layer perceptron to perform fusion inference on the interaction attention characteristics, wherein the fusion inference formula is as follows:
Figure BDA0004085416830000031
wherein P represents fusion inference output; MLP represents a multi-layer perceptron; avg represents an average pooling operation;
Figure BDA0004085416830000032
representing a channel splice operation.
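A minimal sketch of the fusion inference of step S4 (channel splice, global average pooling, multi-layer perceptron); the hidden width and the 10 output classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fusion inference: P = MLP(Avg(m_r ⊕ m_p))."""
    def __init__(self, channels=128, num_classes=10, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))

    def forward(self, m_r, m_p):
        f = torch.cat([m_r, m_p], dim=1)   # channel splicing, (B, 2C, H, W)
        f = f.mean(dim=(2, 3))             # global average pooling, (B, 2C)
        return self.mlp(f)                 # fusion inference output P (class logits)
```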
S5, result display stage: part of the test set image samples are displayed together with their predicted categories.
Further, the constraint coupled decoding module is used to assist the multi-scale interaction attention feature extraction stage in training the network and does not participate in testing.
Further, the multi-scale sparse feature extraction module DGASPP is formed by introducing the ghost module and the dense connection idea into the atrous spatial pyramid pooling module ASPP.
Further, in step S2.3, the adaptive hybrid interaction attention module AHMA is composed of a hybrid non-local module HNL, attention matrix fusion and adaptive attention reliability weights.
Further, the formula of the task loss function L_task in step S3.4 is:

L_task = −α·(1 − P)^γ·log(P)

where α = 0.25 is a weight factor and γ = 2 is an adjustment factor.
Further, the formula of the consistency loss function L_consis in step S3.4 is:

L_consis = MSE(d_r^16, d_p^16) + MSE(d_r^8, d_p^8)

where the 16× downsampled and 8× downsampled decoding features of modality m are denoted d_m^16 and d_m^8, with m ∈ {r, p}; d_r^16 and d_p^16 denote the decoding features of the 16× downsampled RGB image modality and Poisson-modality POS image modality, respectively, and d_r^8 and d_p^8 denote the decoding features of the 8× downsampled RGB image modality and Poisson-modality POS image modality, respectively. The consistency loss function applies the mean square error (MSE) loss to the 16× and 8× downsampled decoding features between the modalities.
Further, the 16× downsampled and 8× downsampled decoding features d_m^16 and d_m^8 of modality m are constructed from the corresponding backbone features through the feature splicing convolutions CConv_16 and CConv_8, the shortcut convolution SConv, the bilinear-interpolation 2× upsampling operation up_2 and the channel splicing operation ⊕.
Further, the convolution CConv_8 used to obtain the decoding feature d_m^8 and the convolution CConv_16 used to obtain the decoding feature d_m^16 have the same topology.
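Because the formulas for d_m^16 and d_m^8 are given in the original only as images, the sketch below shows one plausible reading of the added CConv/SConv step: the deeper decoding feature is upsampled 2× (up_2), channel-spliced with the encoder feature at the target scale, fused by CConv, and summed with a 1×1 shortcut convolution SConv of that encoder feature. The structure and channel counts are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodeStage(nn.Module):
    """Hypothetical CConv/SConv decoding step; CConv_16 and CConv_8 share this topology."""
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.cconv = nn.Sequential(                       # feature splicing convolution CConv
            nn.Conv2d(deep_ch + skip_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.sconv = nn.Conv2d(skip_ch, out_ch, 1)        # shortcut convolution SConv

    def forward(self, deep, skip):
        up = F.interpolate(deep, scale_factor=2, mode="bilinear", align_corners=False)  # up_2
        return self.cconv(torch.cat([up, skip], dim=1)) + self.sconv(skip)

# Illustrative channel counts for the 16x and 8x stages:
stage16 = DecodeStage(deep_ch=128, skip_ch=96, out_ch=128)   # 32x feature -> d_m^16
stage8 = DecodeStage(deep_ch=128, skip_ch=32, out_ch=128)    # d_m^16 -> d_m^8
```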
Further, the reconstruction loss function L_recon in step S3.4 measures the error between the input images and their decoded reconstructions, where x_r and x_p denote the RGB image and Poisson image inputs and r_r and r_p denote the RGB image and Poisson image decoding reconstruction outputs, respectively.
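Pulling steps S3.2 to S3.4 together, a minimal sketch of the loss terms might look as follows. The focal-style task loss uses α = 0.25 and γ = 2 as stated, the consistency term is the MSE between the 16× and 8× decoding features of the two modalities, the reconstruction term is assumed here to be MSE (the patent does not reproduce its formula), and μ = 0.8 follows Embodiment 1.

```python
import torch
import torch.nn.functional as F

def task_loss(p, alpha=0.25, gamma=2.0, eps=1e-7):
    """L_task = -alpha * (1 - P)^gamma * log(P); p is taken as the predicted
    probability of the true class (an interpretation of P)."""
    p = p.clamp(eps, 1.0)
    return (-alpha * (1 - p) ** gamma * torch.log(p)).mean()

def consistency_loss(d16_r, d16_p, d8_r, d8_p):
    """MSE between the 16x and 8x downsampled decoding features of the two modalities."""
    return F.mse_loss(d16_r, d16_p) + F.mse_loss(d8_r, d8_p)

def reconstruction_loss(x_r, r_r, x_p, r_p):
    """Reconstruction error between inputs and decoded reconstructions (MSE is an assumption)."""
    return F.mse_loss(r_r, x_r) + F.mse_loss(r_p, x_p)

def total_loss(l_task, l_consis, l_recon, mu=0.8):
    """L_total = mu * L_task + (1 - mu) * (L_consis + L_recon)."""
    return mu * l_task + (1 - mu) * (l_consis + l_recon)
```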
Further, the image interaction fusion method is verified and analyzed through experiments.
The beneficial effects of the image interaction fusion method for identifying wheel set tread defects of the invention are as follows:
(1) In the multi-scale interaction attention feature extraction stage, an adaptive hybrid interaction attention module is proposed; by introducing a channel branch and an adaptive interaction mechanism into the non-local attention module, it can use inter-modality interaction information to increase the attention information gain and optimize the capture of target shape features, which helps distinguish target shapes during feature extraction and alleviates the problem of small shape differences between defects.
(2) In the multi-scale interaction attention feature extraction stage, a multi-scale sparse feature extraction module is proposed; by introducing the ghost module and the dense connection idea into the atrous spatial pyramid pooling module, it can optimize the multi-scale sparse representation by associating the branch information of each scale, which helps distinguish scales during feature extraction and alleviates the problem of large scale differences between defects.
(3) A constraint coupled decoding stage is designed, and a consistency loss and a reconstruction loss are introduced into the modality decoders, so that the auxiliary network helps the model better capture defect shape features and inter-modality interaction feature information during training, giving high model recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a simplified flow chart of an image interaction fusion method for identifying wheel set tread defects;
FIG. 2 is a schematic diagram of a data acquisition and processing stage;
FIG. 3 is a schematic illustration of a portion of a sample;
FIG. 4 is a schematic diagram of a multi-scale interaction attention feature extraction stage;
FIG. 5 is a schematic diagram of a fusion inference phase;
FIG. 6 is a schematic diagram of a constraint coupled decoding stage;
FIG. 7 is a schematic diagram of a result display stage;
fig. 8 is a schematic diagram of a DGASPP module structure.
Detailed Description
The present invention is further illustrated and described below with reference to examples, which are not intended to be limiting in any way.
Example 1
As shown in FIG. 1, the main body of the method comprises five stages, namely a data acquisition and processing stage, a multi-scale interaction attention feature extraction stage, a constraint coupled decoding stage, a fusion inference stage and a result display stage, and the method comprises the following steps:
s1, data acquisition and processing: as shown in fig. 2, RGB image samples of tread defects of the wheel set are collected on site, and a poisson encoder is adopted to encode the original RGB image samples to obtain poisson mode POS images for image fusion;
the data set used in the experiment in the embodiment 1 is acquired by a CCD camera in a unified standard from a vehicle section wheel axle workshop, and the database established by the invention comprises 9 state images and 1 interference image derived from the wheel set tread under the action of wheel rail contact force in the actual running process of the train.
Specifically, as shown in Table 1, a total of 343 images covering 10 wheel set states were acquired for the tread defect data set: 57 tread normal, 52 punch damage, 47 crack, 45 tread scratch, 42 stripping, 29 stain (interference images), 24 scratch on tread, 21 edge abrasion, 20 circumferential abrasion and 6 tread block loss.
Tread defect class         Training samples (images)   Test samples (images)
Tread normal               34                          23
Punch damage on tread      31                          21
Tread crack                28                          19
Tread scratch              27                          18
Tread stripping            25                          17
Tread stain                17                          12
Scratch on tread           14                          10
Edge abrasion              12                          9
Circumferential abrasion   12                          8
Tread block loss           3                           3

TABLE 1
To facilitate network training and testing, the defects in the original RGB images are reconstructed into images of 224×224 pixels, and the RGB images are Poisson-encoded to obtain the POS images; some of the samples are shown in FIG. 3. Finally, the data of each state are divided at an approximate ratio of 3:2, giving 203 training images and 140 test images.
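A minimal sketch of the approximate per-state 3:2 split described above is given below; the 224×224 resizing and the Poisson encoding are omitted, since the patent does not specify the Poisson encoder's implementation, and the function name and data layout are assumptions.

```python
import random
from collections import defaultdict

def split_by_state(samples, train_ratio=3 / 5, seed=0):
    """Split (image_path, label) pairs per class at an approximate 3:2 ratio,
    as in the 203 training / 140 test division of the tread defect data set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    train, test = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)
        k = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:k]]
        test += [(p, label) for p in paths[k:]]
    return train, test
```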
S2, as shown in FIG. 4, a multi-scale interaction attention feature extraction stage:
s2.1, taking a pre-training lightweight network Mobilenetv2 fused with RGB and Poisson images as a model backbone network, extracting bottom features of modes of respectively acquiring the RGB images and the Poisson mode images in the step S1, and setting an input feature map x m ∈R 224×224×3 For an m-modality input image, its final encoding characteristics are as follows:
h m =Mobile m (x m ),m∈{r,p} (1)
h m ∈R 7×7×128 、Mobile m respectively representing coding characteristics of m modes and a Mobilenetv2 network, and r and p respectively represent characteristics of RGB image modes and Poisson mode POS image modes;
s2.2, adopting a multi-scale sparse feature extraction module DGASPP to extract multi-scale features of the bottom layer features so as to solve the problem of large difference of defect scales in the image sample, wherein the multi-scale sparse feature extraction module DGASPP is used for encoding features h of two modes m Extracting multi-scale features, wherein the extraction formula is as follows:
s m =DGASPP m (h m ),m∈{r,p} (2)
s in the formula (2) m ∈R 7×7×128 、DGASPP m Respectively representing the multi-scale coding characteristics of m modes and a DGASPP module;
s2.3, extracting a spatial channel attention feature with interaction information in the bottom layer feature by using an adaptive hybrid interaction attention module AHMA module, so as to solve the problem of small defect shape difference in an image sample, and extracting a formula of the spatial channel attention feature with interaction information by using the AHMA module:
m r ,m p =AHMA(s r ,s p ) (3)
in the formula (3), m r ,m p ∈R 7×7×128 Attention weighting features respectively representing two modalities; AHMA is the self-adaptive mixed interaction attention module;
s3, constraint coupling decoding stage: as shown in fig. 5, the constraint coupled decoding module is enabled, the features extracted by the multi-scale interaction attention feature extraction module in step S2.2 are decoded by using a modal decoder, and the process of improving the decoding part by adopting a network deep 3 with an improved encoding and decoding structure in the modal decoder is as follows:
s3.1, adding additional characteristic splicing convolution CConv and shortcut convolution SConv for decoding characteristics acquired on more scales, wherein the decoding characteristics of 16 times downsampling and 8 times downsampling of m modes in a Mobilenetv2 module are respectively expressed as
Figure BDA0004085416830000071
Wherein m is { r, p }, 16 times downsampling and 8 times downsampling of m modes decode the characteristic ∈ ->
Figure BDA0004085416830000072
And->
Figure BDA0004085416830000073
The formulas of (a) are respectively as follows:
Figure BDA0004085416830000074
Figure BDA0004085416830000075
/>
in the formula (4)
Figure BDA0004085416830000076
In formula (5)>
Figure BDA0004085416830000077
up 2 Representing bilinear interpolation 2 x upsampling operation,/->
Figure BDA0004085416830000078
Representing a channel splicing operation;
s3.2, adding consistency constraint loss among decoding features of different modes, and capturing interaction features in the decoding features, wherein a task loss function L task The formula of (2) is:
L task =-α(1-P) γ log(P) (6)
wherein α=0.25 represents a weight factor, and γ=2 represents an adjustment factor;
The formula of the consistency loss function L_consis is:

L_consis = MSE(d_r^16, d_p^16) + MSE(d_r^8, d_p^8)   (7)

where d_r^16 and d_p^16 in formula (7) denote the decoding features of the 16× downsampled RGB image modality and Poisson-modality POS image modality, respectively, and d_r^8 and d_p^8 denote the decoding features of the 8× downsampled RGB image modality and Poisson-modality POS image modality, respectively; the consistency loss function applies the mean square error (MSE) loss to the 16× and 8× downsampled decoding features between the modalities;
s3.3, adding reconstruction loss between the input image and the reconstructed image to enhance shape feature extraction, wherein the reconstruction loss function L recon The method comprises the following steps:
Figure BDA0004085416830000084
wherein x is r 、x p Representing RGB image and poisson image inputs, r, respectively r 、r p Respectively representing RGB image and Poisson image decoding reconstruction output;
s3.4, constructing a total target loss function composed of task loss, consistency loss and reconstruction loss, wherein the total target loss function is used for guiding the network to learn related characteristics, and the formula of the total target loss function is as follows:
L total =μL task +(1-μ)(L consis +L recon ) (9)
in formula (9): mu is a loss function adjustment factor, mu is set to 0.8, L task L is a task loss function consis As a consistency loss function, L recon Reconstructing a loss function, wherein the total target loss function is used for adjusting the contribution of task loss and decoding part loss to network learning, and taking the minimized loss function as a target training network;
s4, fusion inference phase: as shown in fig. 6, the global average pooling and multi-layer perceptron is used to perform fusion inference on the interaction attention features, and the fusion inference formula is as follows:
Figure BDA0004085416830000085
in the formula (10), P represents a fusion inference output; MLP represents a multi-layer perceptron; avg represents an average pooling operation;
Figure BDA0004085416830000086
representing a channel splicing operation;
s5, a result display stage: as shown in fig. 7, the respective categories of the partial test set image samples are shown as well as the belonging categories.
Further, the constraint coupled decoding stage is used to assist the multi-scale interaction attention feature extraction stage in training the network and does not participate in testing.
Further, the multi-scale sparse feature extraction module DGASPP is formed by introducing the ghost module and the dense connection idea into the atrous spatial pyramid pooling module ASPP. The structure of the DGASPP module is shown in FIG. 8: three GASPP Block branches with different dilation coefficients are designed in the DGASPP module and densely connected to obtain global features; another pooling branch performs average pooling on the input feature map; finally, the input feature map and the features of the four branches are spliced, and a 1×1 convolution compresses the result back to 128 channels to restore the input channel dimension. The multi-scale sparse feature extraction module DGASPP proposed by the invention can optimize the multi-scale sparse representation by associating the branch information of each scale, which helps distinguish scales during feature extraction and alleviates the problem of large scale differences between defects.
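A minimal sketch of a DGASPP-style module following the above description is given below; the dilation rates (1, 2, 4) and the ghost-convolution layout (a primary convolution plus a cheap depthwise convolution, GhostNet-style) are assumptions, since the patent does not list them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostDilatedConv(nn.Module):
    """Ghost-style dilated branch: a primary conv makes half the channels,
    a cheap depthwise conv generates the rest."""
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        half = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, half, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))

    def forward(self, x):
        p = self.primary(x)
        return torch.cat([p, self.cheap(p)], dim=1)

class DGASPP(nn.Module):
    """Three densely connected ghost dilated branches, an average-pooling branch,
    concatenation with the input and a 1x1 conv back to 128 channels."""
    def __init__(self, channels=128, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList()
        in_ch = channels
        for d in dilations:
            self.branches.append(GhostDilatedConv(in_ch, channels, d))
            in_ch += channels                      # dense connection: next branch sees all previous outputs
        self.pool_proj = nn.Conv2d(channels, channels, 1)
        self.fuse = nn.Conv2d(channels * (len(dilations) + 2), channels, 1)

    def forward(self, h):
        feats, x = [], h
        for branch in self.branches:
            y = branch(x)
            feats.append(y)
            x = torch.cat([x, y], dim=1)           # densely connect branch outputs
        pooled = F.adaptive_avg_pool2d(h, 1)
        pooled = F.interpolate(self.pool_proj(pooled), size=h.shape[2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([h] + feats + [pooled], dim=1))
```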
Further, in step S2.3, the adaptive hybrid interaction attention module AHMA is composed of a hybrid non-local module HNL, attention matrix fusion and adaptive attention reliability weights.
Further, the convolution CConv_8 used to obtain the decoding feature d_m^8 and the convolution CConv_16 used to obtain the decoding feature d_m^16 have the same topology, and are used to integrate the corresponding downsampled feature with the decoding feature of the previous layer.
Further, the image interaction fusion method is verified and analyzed through experiments.
Example 2
Embodiment 2 differs from embodiment 1 in that embodiment 2 uses only the HNL module mentioned in embodiment 1 for attention feature extraction in the multi-scale interactive attention feature extraction stage.
With consistent experimental parameter settings, the results of extracting attention features with the HNL module and the AHMA module, respectively, are shown in Table 2 below.

Module   Acc     P       R       F1
HNL      0.85    0.859   0.834   0.836
AHMA     0.871   0.903   0.877   0.883

TABLE 2
The experiments mainly adopt the following indexes: accuracy (Acc), recall (R), precision (P), F1 value (F1), Pa and T. The accuracy Acc is the proportion of correctly predicted samples (positive and negative classes) among all samples; the recall R is the proportion of samples correctly predicted as positive among all actually positive samples; the precision P is the proportion of samples correctly predicted as positive among all samples predicted as positive; the F1 value is the weighted harmonic mean of recall and precision; Pa is the number of model parameters (unit: M); and T is the time (unit: s) taken by the model to test a single sample on the CPU.
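A minimal sketch of how these classification indexes could be computed with scikit-learn; macro averaging over the 10 tread states is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def classification_indexes(y_true, y_pred):
    """Acc, precision P, recall R and F1 as reported in Tables 2 and 3."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "P": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "R": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "F1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```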
As can be seen from Table 2, all classification indexes of the AHMA module used in Embodiment 1 are better than those of the HNL module used in Embodiment 2, which verifies the effectiveness and superiority of the AHMA module. The indexes of the HNL module are 0.021, 0.044, 0.043 and 0.047 lower, respectively, than those of the AHMA module, indicating that the AHMA module, formed by the modality interaction strategy designed on the basis of the HNL module in Embodiment 1, can effectively capture the interaction information between modalities and improves the feature extraction capability of the HNL module.
Example 3
Embodiment 3 differs from embodiment 1 in that embodiment 3 uses only the ASPP scale feature extraction module mentioned in embodiment 1 for feature extraction in the multi-scale interactive attention feature extraction stage.
With consistent experimental parameter settings, the results of extracting features with the ASPP module and the DGASPP module are shown in Table 3 below.

Module   Acc     P       R       F1
ASPP     0.821   0.863   0.81    0.819
DGASPP   0.871   0.903   0.877   0.883

TABLE 3
As can be seen from Table 3, the DGASPP module of Embodiment 1 maintains a relatively small parameter count while performing significantly better than the comparison ASPP module, indicating that the DGASPP module of Embodiment 1 extracts multi-scale features more effectively and thus better alleviates the problem of large scale differences between defects.
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the embodiments of the present invention. Other variations or modifications will be apparent to those of ordinary skill in the art on the basis of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.

Claims (10)

1. The image interaction fusion method for recognizing wheel set tread defects is characterized in that the main body of the method comprises five stages, namely a data acquisition and processing stage, a multi-scale interaction attention feature extraction stage, a constraint coupled decoding stage, a fusion inference stage and a result display stage;
the method comprises the following steps:
s1, data acquisition and processing: on-site collecting RGB image samples of tread defects of the wheel set, and encoding the original RGB image samples by using a Poisson encoder to obtain Poisson mode POS images for image fusion;
s2, a multi-scale interaction attention feature extraction stage:
s2.1, taking a pre-training lightweight network Mobilenetv2 fused with RGB and Poisson images as a model backbone network, extracting bottom features of modes of respectively acquiring the RGB images and the Poisson mode images in the step S1, and setting an input feature map x m ∈R H ×W×C For an m-modality input image, its final encoding characteristics are as follows:
h m =Mobile m (x m ),m∈{r,p}
its characteristic map x m ∈R H×W×C H, W, C in the formula is expressed as the length, width and channel number of the input image, in the formula
Figure FDA0004085416820000011
Wherein H is 1 、W 1 、C 1 And respectively represent the length, width and channel number of the feature map, ">
Figure FDA0004085416820000012
Mobile m Respectively representing coding characteristics of m modes and a mobiletv 2 network;
s2.2, adopting a multi-scale sparse feature extraction module DGASPP for multi-scale feature extraction of the bottom layer features, wherein the multi-scale sparse feature extraction module DGASPP is used for encoding features h of two modes m Extracting multi-scale features, wherein the extraction formula is as follows:
Figure FDA0004085416820000013
in the middle of
Figure FDA0004085416820000014
DGASPP m Respectively representing the multi-scale coding characteristics of m modes and a DGASPP module;
s2.3, extracting a spatial channel attention feature with interaction information in the bottom layer feature by using an adaptive hybrid interaction attention module AHMA module, so as to solve the problem of small defect shape difference in an image sample, and extracting a formula of the spatial channel attention feature with interaction information by using the AHMA module:
m r ,m p =AHMA(s r ,s p )
in the method, in the process of the invention,
Figure FDA0004085416820000015
attention weighting features respectively representing two modalities; AHMA is the self-adaptive mixed interaction attention module;
s3, constraint coupling decoding stage: enabling constraint coupled decoding module, decoding step using modal decoder
S2.2, the multi-scale interaction attention feature extraction module extracts features, the mode decoder adopts a network deep 3 with an improved coding and decoding structure, and the decoding part is improved as follows:
s3.1, adding additional characteristic splicing convolution CConv and shortcut convolution SConv for obtaining decoding characteristics on more scales;
s3.2, adding consistency constraint loss among decoding features of different modes, and capturing interaction features in the decoding features;
s3.3, increasing reconstruction loss between the input image and the reconstructed image, and extracting shape features;
s3.4, constructing a total target loss function composed of task loss, consistency loss and reconstruction loss, and guiding the network to learn relevant characteristics, wherein the formula of the total target loss function is as follows:
L total =μL task +(1-μ)(L consis +L recon )
wherein: mu is a loss function adjustment factor, L task L is a task loss function consis As a consistency loss function, L recon Reconstructing a loss function;
s4, fusion inference phase: and adopting a global average pooling and a multi-layer perceptron to perform fusion inference on the interaction attention characteristics, wherein the fusion inference formula is as follows:
Figure FDA0004085416820000021
wherein P represents fusion inference output, MLP represents a multi-layer perceptron, avg represents average pooling operation, and avg represents channel splicing operation;
s5, a result display stage: the respective categories of the partial test set image samples are shown as belonging to the categories.
2. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the constraint coupled decoding module is used to assist the multi-scale interaction attention feature extraction stage in training the network and does not participate in testing.
3. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the multi-scale sparse feature extraction module DGASPP is formed by introducing the ghost module and the dense connection idea into the atrous spatial pyramid pooling module ASPP.
4. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the adaptive hybrid interaction attention module AHMA in step S2.3 is composed of a hybrid non-local module HNL, attention matrix fusion and adaptive attention reliability weights.
5. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the formula of the task loss function L_task in step S3.4 is:

L_task = −α·(1 − P)^γ·log(P)

where α = 0.25 is a weight factor and γ = 2 is an adjustment factor.
6. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the formula of the consistency loss function L_consis in step S3.4 is:

L_consis = MSE(d_r^16, d_p^16) + MSE(d_r^8, d_p^8)

where the 16× downsampled and 8× downsampled decoding features of modality m are denoted d_m^16 and d_m^8, with m ∈ {r, p}; d_r^16 and d_p^16 denote the decoding features of the 16× downsampled RGB image modality and Poisson-modality POS image modality, respectively, and d_r^8 and d_p^8 denote the decoding features of the 8× downsampled RGB image modality and Poisson-modality POS image modality, respectively; the consistency loss function applies the mean square error (MSE) loss to the 16× and 8× downsampled decoding features between the modalities.
7. The image interaction fusion method for identifying wheel set tread defects according to claim 6, wherein the 16× downsampled and 8× downsampled decoding features d_m^16 and d_m^8 of modality m are constructed from the corresponding backbone features through the feature splicing convolutions CConv_16 and CConv_8, the shortcut convolution SConv, the bilinear-interpolation 2× upsampling operation up_2 and the channel splicing operation ⊕.
8. The image interaction fusion method for identifying wheel set tread defects according to claim 7, wherein the convolution CConv_8 used to obtain the decoding feature d_m^8 and the convolution CConv_16 used to obtain the decoding feature d_m^16 have the same topology.
9. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the reconstruction loss function L_recon in step S3.4 measures the error between the input images and their decoded reconstructions, where x_r and x_p denote the RGB image and Poisson image inputs and r_r and r_p denote the RGB image and Poisson image decoding reconstruction outputs, respectively.
10. The image interaction fusion method for identifying wheel set tread defects according to claim 1, wherein the image interaction fusion method is verified and analyzed through experiments.
CN202310101570.9A 2023-02-10 2023-02-10 Image interaction fusion method for identifying tread defects of wheel set Pending CN116012687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310101570.9A CN116012687A (en) 2023-02-10 2023-02-10 Image interaction fusion method for identifying tread defects of wheel set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310101570.9A CN116012687A (en) 2023-02-10 2023-02-10 Image interaction fusion method for identifying tread defects of wheel set

Publications (1)

Publication Number Publication Date
CN116012687A true CN116012687A (en) 2023-04-25

Family

ID=86024962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310101570.9A Pending CN116012687A (en) 2023-02-10 2023-02-10 Image interaction fusion method for identifying tread defects of wheel set

Country Status (1)

Country Link
CN (1) CN116012687A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843685A (en) * 2023-08-31 2023-10-03 山东大学 3D printing workpiece defect identification method and system based on image detection
CN116843685B (en) * 2023-08-31 2023-12-12 山东大学 3D printing workpiece defect identification method and system based on image detection


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination