CN110599452B - Rust detection system, method, computer device and readable storage medium - Google Patents



Publication number
CN110599452B
CN110599452B (application number CN201910726368.9A)
Authority
CN
China
Prior art keywords
layer
feature map
output
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910726368.9A
Other languages
Chinese (zh)
Other versions
CN110599452A (en)
Inventor
甘津瑞
吴鹏
赵婷
董世文
王岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Shandong Electric Power Co Ltd
Global Energy Interconnection Research Institute
Original Assignee
State Grid Corp of China SGCC
State Grid Shandong Electric Power Co Ltd
Global Energy Interconnection Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Shandong Electric Power Co Ltd, Global Energy Interconnection Research Institute
Priority to CN201910726368.9A
Publication of CN110599452A
Application granted
Publication of CN110599452B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a rust detection system, a rust detection method, and a readable storage medium, belonging to the technical field of computer vision. The rust detection system provided by the embodiment of the invention is a U-shaped segmentation network: during up-sampling, the decoding stage concatenates the corresponding feature maps from the encoding stage, so that important feature information obtained during down-sampling in the encoding stage is preserved to the greatest extent, and the system can process images of any size and locate rust accurately. An enhanced receptive field module is introduced in place of the conventional convolution operation; it effectively enlarges the receptive field, resolves misclassified rust regions, and improves the accuracy of rust detection. The rust detection method provided by the application processes the image to be detected with this rust detection system and achieves accurate localization of rust.

Description

Rust detection system, method, computer device and readable storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a rust detection system, a rust detection method, computer equipment and a readable storage medium.
Background
With rapid economic development, more and more high-voltage transmission lines are being built; they cover long distances and wide spans, and their reliability requirements are ever higher. A transmission line comprises parts such as overhead ground wires, insulators, hardware fittings, towers, foundations, and grounding devices. Because this equipment is exposed outdoors for long periods and subject to all kinds of weather, its metal parts corrode easily, damaging the equipment and causing safety accidents. Corrosion detection for high-voltage transmission lines is therefore of great importance to their safe operation.
At present, the field of transmission-line safety detection mainly uses two approaches: manual inspection, or photographing the relevant parts of the line with unmanned aerial vehicles and robots to obtain inspection images, which are then processed with computer vision techniques to diagnose the line automatically.
However, research on corrosion detection for transmission lines remains scarce, and existing approaches suffer from low accuracy and excessive computation during model training.
Disclosure of Invention
The embodiments of the present application provide a rust detection system, a rust detection method, computer equipment, and a readable storage medium, which address the low accuracy of rust identification in the related art. The technical scheme is as follows:
in one aspect, a rust detection system is provided, comprising:
an encoding network and a decoding network, the output of the encoding network being connected to the input of the decoding network; wherein:
the encoding network has 5 layers: the first to fifth layers of the encoding network are connected in sequence;
the first layer of the encoding network comprises an enhanced receptive field module;
the second to fifth layers of the encoding network each comprise a maximum pooling layer and an enhanced receptive field module;
the decoding network has 5 layers: the fifth to first layers of the decoding network are connected in sequence;
the fifth to second layers of the decoding network each comprise a deconvolution layer, a splicing (concatenation) layer, and an enhanced receptive field module;
the first layer of the decoding network comprises a splicing layer, an enhanced receptive field module, and a convolution layer;
the fifth layer of the encoding network is connected to the fifth layer of the decoding network.
Optionally, the enhanced receptive field module includes:
an acquisition module, configured to acquire the feature map produced by the previous layer and take it as the original feature map of the enhanced receptive field module;
a first convolution layer, comprising 2 different 1 × 1 convolution kernels, configured to process the original feature map of the enhanced receptive field module and output a feature map A1 and a feature map A2;
a second convolution layer, comprising 3 different convolution kernels, configured to process the feature maps output by the first convolution layer: the original feature map is processed by a 1 × 1 convolution kernel to output a feature map B0; the feature map A1 is convolved by a 3 × 3 kernel to output a feature map B1; the feature map A2 is convolved by a 5 × 5 kernel to output a feature map B2;
a third convolution layer, comprising 3 different convolution modules, configured to process the feature maps output by the second convolution layer: the feature map B0 is processed by a 3 × 3 kernel as a dilated (hole) convolution with dilation rate 1 to output a feature map C0; by a 3 × 3 kernel with dilation rate 3 to output a feature map C1; and by a 3 × 3 kernel with dilation rate 5 to output a feature map C2;
a first fusion module, configured to process the feature maps output by the third convolution layer: the feature maps C0, C1, and C2 are concatenated, filtered by a 1 × 1 convolution kernel, and output as a feature map D1;
a second fusion module, configured to add the original feature map of the enhanced receptive field module to the feature map output by the first fusion module; and
an output module, configured to activate the feature map output by the second fusion module and output the activated feature map;
a padding strategy is adopted in all convolution operations so that the output resolution matches the resolution before convolution.
In one aspect, a rust detection method is provided, including:
acquiring an image to be detected;
analyzing the image to be detected with a rust detection system to obtain a segmentation map of the image to be detected, the rust detection system being the one provided by the embodiments of the present application; and
calculating and marking the rust positions in the image to be detected.
Optionally, analyzing the image to be detected with the rust detection system to obtain a segmentation map of the image to be detected includes:
filtering the image to be detected through an enhanced receptive field module to obtain the first-layer output feature map of the encoding network;
max-pooling the first-layer output feature map of the encoding network and then filtering it through an enhanced receptive field module to obtain the second-layer output feature map of the encoding network;
max-pooling the second-layer output feature map and then filtering it through an enhanced receptive field module to obtain the third-layer output feature map of the encoding network;
max-pooling the third-layer output feature map and then filtering it through an enhanced receptive field module to obtain the fourth-layer output feature map of the encoding network; and
max-pooling the fourth-layer output feature map and then filtering it through an enhanced receptive field module to obtain the fifth-layer output feature map of the encoding network.
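The five encoding steps above can be sketched in PyTorch. This is a minimal illustration: a plain padded 3 × 3 convolution stands in for the enhanced receptive field module, and the channel widths and input size are illustrative assumptions, not values given in the text.

```python
import torch
import torch.nn as nn

# Stand-in for the resolution-preserving enhanced receptive field module.
def rf_filter(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

# Layer 1 filters directly; layers 2-5 each max-pool (2x2, stride 2) then filter.
channels = [3, 32, 64, 128, 256, 512]          # illustrative widths
layers = [rf_filter(channels[0], channels[1])]
for i in range(1, 5):
    layers.append(nn.Sequential(nn.MaxPool2d(2), rf_filter(channels[i], channels[i + 1])))

x = torch.randn(1, 3, 64, 64)
feats = []                     # keep every layer's output for the decoder skips
for layer in layers:
    x = layer(x)
    feats.append(x)
# Spatial resolution halves at each pooled layer: 64, 32, 16, 8, 4.
```

The stored `feats` list is what the decoding stage later splices with, layer by layer.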
Optionally, analyzing the image to be detected with the rust detection system to obtain a segmentation map further includes:
deconvolving the fifth-layer output feature map of the encoding network to obtain an image the same size as the fourth-layer output feature map of the encoding network, splicing the two along the channel dimension, and filtering the result through an enhanced receptive field module to obtain the fourth-layer output feature map of the decoding network;
deconvolving the fourth-layer output feature map of the decoding network to obtain an image the same size as the third-layer output feature map of the encoding network, splicing the two along the channel dimension, and filtering the result through an enhanced receptive field module to obtain the third-layer output feature map of the decoding network;
deconvolving the third-layer output feature map of the decoding network to obtain an image the same size as the second-layer output feature map of the encoding network, splicing the two along the channel dimension, and filtering the result through an enhanced receptive field module to obtain the second-layer output feature map of the decoding network; and
deconvolving the second-layer output feature map of the decoding network to obtain an image the same size as the first-layer output feature map of the encoding network, splicing the two along the channel dimension, and filtering the result through an enhanced receptive field module to obtain the first-layer output feature map of the decoding network.
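One such decoding step (producing the fourth-layer decoder output from the fifth-layer encoder output) can be sketched as follows. The channel counts are illustrative assumptions, and a padded 3 × 3 convolution again stands in for the enhanced receptive field module.

```python
import torch
import torch.nn as nn

# Deconvolution doubles the spatial resolution (2x2 kernel, stride 2).
up = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
# Stand-in for the enhanced receptive field module; it consumes the
# concatenated channels (256 upsampled + 256 skip = 512).
filt = nn.Sequential(nn.Conv2d(512, 256, 3, padding=1), nn.ReLU())

enc5 = torch.randn(1, 512, 4, 4)   # encoder layer-5 output (illustrative)
enc4 = torch.randn(1, 256, 8, 8)   # encoder layer-4 output (illustrative)

upsampled = up(enc5)                              # -> (1, 256, 8, 8)
dec4 = filt(torch.cat([upsampled, enc4], dim=1))  # splice on channel dim, filter
```

Because the padded convolutions keep resolutions aligned, the concatenation needs no cropping, unlike vanilla U-Net.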
Optionally, the enhanced receptive field module includes:
an acquisition module, configured to acquire the feature map produced by the previous layer and take it as the original feature map of the enhanced receptive field module;
a first convolution layer, configured to process the original feature map of the enhanced receptive field module;
a second convolution layer, configured to process the feature maps output by the first convolution layer;
a third convolution layer, configured to process the feature maps output by the second convolution layer;
a first fusion module, configured to process the feature maps output by the third convolution layer;
a second fusion module, configured to add the original feature map of the enhanced receptive field module to the feature map output by the first fusion module; and
an output module, configured to activate the feature map output by the second fusion module and output the activated feature map.
Optionally, the first convolution layer processing the original feature map of the enhanced receptive field module includes:
convolving the original feature map of the enhanced receptive field module with two different 1 × 1 convolution kernels to output a feature map A1 and a feature map A2.
Optionally, the second convolution layer processing the feature maps output by the first convolution layer includes:
convolving the original feature map of the enhanced receptive field module with a 1 × 1 convolution kernel to output a feature map B0;
convolving the feature map A1 with a 3 × 3 convolution kernel to output a feature map B1; and
convolving the feature map A2 with a 5 × 5 convolution kernel to output a feature map B2.
Optionally, the third convolution layer processing the feature maps output by the second convolution layer includes:
processing the feature map B0 with a 3 × 3 kernel as a dilated convolution with dilation rate 1 to output a feature map C0;
processing the feature map B0 with a 3 × 3 kernel as a dilated convolution with dilation rate 3 to output a feature map C1; and
processing the feature map B0 with a 3 × 3 kernel as a dilated convolution with dilation rate 5 to output a feature map C2.
Optionally, the first fusion module processing the feature maps output by the third convolution layer includes:
concatenating the feature maps C0, C1, and C2, filtering the concatenated feature maps with a 1 × 1 convolution kernel, and outputting a feature map D1.
Optionally, all convolution operations use a padding strategy so that the output resolution matches the resolution before convolution.
Optionally, obtaining the segmentation map of the image to be detected includes:
convolving the first-layer output feature map of the decoding network to obtain the segmentation map of the image to be detected, the segmentation map having 1 channel with values in the range 0-1;
measuring the loss of the detection result by fusing an image-level binary classification loss with a pixel-level binary classification loss;
the image-level binary classification loss uses an ordinary cross-entropy loss; and
the pixel-level binary classification loss uses a focal loss function.
Optionally, calculating and marking the rust positions in the image to be detected includes:
marking a pixel as a rust pixel when its segmentation-map value is greater than 0.5.
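The thresholding rule is a one-liner in NumPy; the probability values below are illustrative, not from the text.

```python
import numpy as np

# The 1-channel segmentation map holds per-pixel probabilities in [0, 1];
# pixels above 0.5 are marked as rust.
seg = np.array([[0.1, 0.7],
                [0.55, 0.3]])
rust_mask = seg > 0.5      # boolean rust mask
```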
In one aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction or program is stored in the memory, and the at least one instruction or program is loaded and executed by the processor to implement the rust detection method provided herein.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction or program is stored, and the instruction or program is loaded and executed by a processor to implement the rust detection method provided in the present application.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the rust detection system is of a U-shaped segmentation network structure, a decoding stage can be spliced with a feature map corresponding to an encoding stage in an up-sampling process, so that some important feature information obtained in a down-sampling process in the encoding stage is reserved to the greatest extent, and images of any size and rust can be accurately positioned; the traditional convolution operation is replaced by introducing the receptive field module, so that the receptive field can be effectively improved, the misclassification part of the rust can be well solved, and the accuracy of rust detection is improved.
The rust detection method provided by the application processes the image to be detected with the above rust detection system and locates rust accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic block diagram of a rust detection system in an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a receptive field module according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for rust detection in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a U-Net network structure;
FIG. 5 is a schematic block diagram of a receptive field module according to an embodiment of the present invention;
FIG. 6 is a block diagram of a computer device in an exemplary embodiment of the present application.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present application, it is noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the technical features mentioned in the different embodiments of the present application described below may be combined with each other as long as they do not conflict.
First, terms related to the embodiments are explained; these concepts are not repeated in the embodiments:
receptive Field (Receptive Field): in the convolutional neural network, the definition of the receptive field is the area size of the mapping of the pixel points on the feature map (feature map) output by each layer of the convolutional neural network on the input picture. The explanation of the colloquial point is that one point on the feature map corresponds to an area on the input map.
Inception network: the Inception network is an important milestone in the development of convolutional neural networks. To let the network learn both global and local features, Inception makes the following improvements: it uses 3 different convolution kernel sizes (1 × 1, 3 × 3, 5 × 5); it adds a parallel 3 × 3 max pooling branch to enhance robustness to noise; and the results of these 4 branches are concatenated along the channel axis.
Dilated convolution (also called hole or atrous convolution): holes are injected into the standard convolution operation to enlarge the receptive field. Compared with ordinary convolution, dilated convolution has one extra hyper-parameter, the dilation rate, which is the spacing between kernel taps (an ordinary convolution has a dilation rate of 1).
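The effect of the dilation rate can be seen directly in PyTorch: a dilated 3 × 3 kernel covers a wider window while keeping exactly the same number of parameters. The channel counts and input size below are arbitrary illustrations.

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation rate d spans an effective (2d+1) x (2d+1) window,
# so dilation enlarges the receptive field without adding parameters.
conv_plain = nn.Conv2d(1, 1, kernel_size=3, padding=1, dilation=1)
conv_dilated = nn.Conv2d(1, 1, kernel_size=3, padding=5, dilation=5)

x = torch.randn(1, 1, 64, 64)
# With padding matched to the dilation rate, both preserve the 64x64 resolution.
y_plain, y_dilated = conv_plain(x), conv_dilated(x)

# Parameter counts are identical: dilation changes the sampling grid,
# not the kernel size (9 weights + 1 bias in both cases).
n_plain = sum(p.numel() for p in conv_plain.parameters())
n_dilated = sum(p.numel() for p in conv_dilated.parameters())
```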
Focal loss function: the focal loss is mainly used to address the severe imbalance between positive and negative samples in object detection. It down-weights the many easy negative samples during training, which can also be understood as hard-example mining. The focal loss builds on the cross-entropy loss by adding a factor with exponent γ, where γ > 0 reduces the loss of easily classified samples, focusing training on hard, misclassified samples.
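A minimal NumPy sketch of the binary focal loss. The values of γ and the class-weighting factor α are the common defaults from the focal-loss literature, not taken from the patent.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss; p = predicted probability, y = 0/1 label.
    The (1 - p_t)**gamma factor down-weights easy examples."""
    p_t = np.where(y == 1, p, 1.0 - p)              # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class weighting
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-8, 1.0))

# An easy positive (p = 0.9) contributes far less loss than a hard one (p = 0.1).
easy = focal_loss(np.array([0.9]), np.array([1]))[0]
hard = focal_loss(np.array([0.1]), np.array([1]))[0]
```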
ReLU activation function: the rectified linear unit activation function.
FIG. 1 is a schematic block diagram of a rust detection system according to an embodiment of the present invention. As shown in FIG. 1, the system includes an encoding network and a decoding network; the output of the encoding network is connected to the input of the decoding network; wherein:
the encoding network has 5 layers: the first to fifth layers of the encoding network are connected in sequence;
the first layer of the encoding network comprises an enhanced receptive field module;
the second to fifth layers of the encoding network each comprise a maximum pooling layer and an enhanced receptive field module;
the decoding network has 5 layers: the fifth to first layers of the decoding network are connected in sequence;
the fifth to second layers of the decoding network each comprise a deconvolution layer, a splicing layer, and an enhanced receptive field module;
the first layer of the decoding network comprises a splicing layer, an enhanced receptive field module, and a convolution layer;
the fifth layer of the encoding network is connected to the fifth layer of the decoding network.
Optionally, the enhanced receptive field module includes:
an acquisition module, configured to acquire the feature map produced by the previous layer and take it as the original feature map of the enhanced receptive field module;
a first convolution layer, comprising 2 different 1 × 1 convolution kernels, configured to process the original feature map of the enhanced receptive field module and output a feature map A1 and a feature map A2;
a second convolution layer, comprising 3 different convolution kernels, configured to process the feature maps output by the first convolution layer: the original feature map is processed by a 1 × 1 convolution kernel to output a feature map B0; the feature map A1 is convolved by a 3 × 3 kernel to output a feature map B1; the feature map A2 is convolved by a 5 × 5 kernel to output a feature map B2;
a third convolution layer, comprising 3 different convolution modules, configured to process the feature maps output by the second convolution layer:
the feature map B0 is processed by a 3 × 3 kernel as a dilated convolution with dilation rate 1 to output a feature map C0; by a 3 × 3 kernel with dilation rate 3 to output a feature map C1; and by a 3 × 3 kernel with dilation rate 5 to output a feature map C2;
a first fusion module, configured to process the feature maps output by the third convolution layer: the feature maps C0, C1, and C2 are concatenated, filtered by a 1 × 1 convolution kernel, and output as a feature map D1;
a second fusion module, configured to add the original feature map of the enhanced receptive field module to the feature map output by the first fusion module; and
an output module, configured to activate the feature map output by the second fusion module and output the activated feature map;
a padding strategy is adopted in all convolution operations so that the output resolution matches the resolution before convolution.
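A minimal PyTorch sketch of the module described above. One hedge: the translated text routes the feature map B0 into all three dilated convolutions, which would leave B1 and B2 unused; the sketch below assumes the usual receptive-field-block wiring in which C1 and C2 consume B1 and B2. Channel counts are illustrative, and ReLU is taken as the activation of the output module (the text names ReLU elsewhere).

```python
import torch
import torch.nn as nn

class EnhancedRFB(nn.Module):
    """Sketch of the enhanced receptive field module (illustrative widths)."""
    def __init__(self, ch):
        super().__init__()
        self.a1 = nn.Conv2d(ch, ch, 1)             # first layer: two 1x1 kernels
        self.a2 = nn.Conv2d(ch, ch, 1)
        self.b0 = nn.Conv2d(ch, ch, 1)             # second layer: 1x1 / 3x3 / 5x5
        self.b1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b2 = nn.Conv2d(ch, ch, 5, padding=2)
        # third layer: 3x3 dilated convolutions, rates 1 / 3 / 5;
        # padding = dilation keeps the resolution unchanged.
        self.c0 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=5, dilation=5)
        self.fuse = nn.Conv2d(3 * ch, ch, 1)       # first fusion: concat + 1x1
        self.act = nn.ReLU()                       # output module activation

    def forward(self, x):
        b0 = self.b0(x)
        b1 = self.b1(self.a1(x))
        b2 = self.b2(self.a2(x))
        d1 = self.fuse(torch.cat([self.c0(b0), self.c1(b1), self.c2(b2)], dim=1))
        return self.act(x + d1)                    # second fusion: residual add
```

Because every convolution is padded, the module preserves the input resolution, as the text requires.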
The rust detection system provided by the application is based on the classical U-Net network structure, shown in FIG. 4. U-Net comprises two parts, encoding and decoding, corresponding to the left and right halves of FIG. 4. Each layer in the encoding stage contains two unpadded 3 × 3 convolution layers followed by a 2 × 2 max pooling layer with stride 2. The ReLU activation function is used after each convolution, and after each pooling the number of channels is doubled compared to the previous layer.
In the decoding stage, the image resolution is restored with a 2 × 2 deconvolution operation (the activation function is again ReLU) followed by two unpadded 3 × 3 convolution layers. Each up-sampled feature map is spliced with the feature map of the same layer from the encoding stage (cropped to keep the shapes consistent). A final 1 × 1 convolution layer maps the features to the number of output classes. Because it uses only convolution and pooling operations, the U-Net network can process images of any shape.
Generally, a shallower convolution layer (closer to the input) has a smaller receptive field and is better at learning fine details, while a deeper layer (closer to the output) has a relatively larger receptive field and is suited to learning more holistic, macroscopic features. Deconvolving from deeper convolution layers therefore inevitably loses many detail features. Although U-Net reuses the encoding-stage feature maps in the decoding stage, it cannot fully avoid this insensitivity to details.
On top of the U-Net structure, the rust detection system provided by the application strengthens the network's feature extraction by simulating the receptive field of human vision in the convolution operation: it introduces an enhanced receptive field module that borrows the idea of Inception and, on that basis, adds dilated convolution layers, effectively enlarging the receptive field. The system is based on the U-Net model with its conventional convolution operation replaced by the enhanced receptive field module, whose structure is shown in FIG. 2.
The enhanced receptive field module is a multi-branch convolution block whose internal structure can be divided into two parts, multi-branch convolution layers with different kernels and dilated convolution layers, as shown in FIG. 3: the former simulates receptive fields of various sizes with multi-branch convolution layers of different kernel sizes, while the latter's dilated convolution layers reproduce the relationship between receptive field size and eccentricity in the human visual system. Finally, the convolution layer outputs of different sizes and dilation rates are concatenated to fuse the different features.
Finally, the entire network structure is shown in fig. 1.
This network differs from the U-Net network in that:
(1) replacing the conventional convolution operation with an enhanced receptive field module;
(2) the convolution operations in the enhanced receptive field module adopt a padding strategy so that the resolution is the same as before convolution; the splicing operation in the decoding stage therefore needs no cropping to keep the shapes consistent. The final loss function contains two parts, the image-level binary classification loss and the pixel-level binary classification loss, as follows:
L = L_pixel + λ·L_img
where L_pixel is the pixel-level binary classification loss, using a conventional focal loss function; L_img is the image-level binary classification loss, using a conventional cross-entropy loss function; and λ is a smoothing parameter, set to 0.01.
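As a minimal sketch (not the patent's implementation), the combined loss can be written in plain Python. The focal-loss parameters alpha and gamma are assumed defaults (0.25 and 2), since the text only calls it a "conventional focal loss", and the pixel term is assumed to be averaged over pixels:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one pixel: prediction p in (0, 1), label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    pt = p if y == 1 else 1.0 - p          # probability assigned to the true class
    at = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -at * (1.0 - pt) ** gamma * math.log(pt)

def cross_entropy(p, y, eps=1e-7):
    """Conventional binary cross-entropy for the image-level prediction."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def total_loss(pixel_preds, pixel_labels, img_pred, img_label, lam=0.01):
    """L = L_pixel + lambda * L_img, with L_pixel averaged over all pixels."""
    l_pixel = sum(focal_loss(p, y)
                  for p, y in zip(pixel_preds, pixel_labels)) / len(pixel_preds)
    l_img = cross_entropy(img_pred, img_label)
    return l_pixel + lam * l_img
```

Down-weighting easy examples is the point of the focal term: a confidently correct pixel (p = 0.9, y = 1) contributes far less to the loss than it would under plain cross-entropy.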
In the training phase of the network, the weights are initialized with Xavier initialization, the optimizer is stochastic gradient descent (SGD) with a momentum coefficient of 0.99, the batch size is set to 16, and the other parameters and the data enhancement strategy are similar to those of U-Net.
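The two training choices named above, Xavier initialization and SGD with a 0.99 momentum coefficient, can be sketched in plain Python; the learning rate shown is an assumed placeholder, since the text does not state one:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=0):
    """Xavier (Glorot) uniform initialisation: weights drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.99):
    """One SGD update with momentum: v <- mu*v - lr*g, then w <- w + v."""
    v = momentum * velocity - lr * grad
    return w + v, v
```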
In the loss function design, the image-level and pixel-level binary classification losses are fused. The image-level classification loss uses the common cross-entropy loss and penalizes misclassification of whether the image contains rust; the pixel-level binary classification loss uses a focal loss function, which penalizes pixel-level rust misclassification and balances the imbalance between positive and negative samples.
To sum up, the rust detection system provided in the embodiment of this application is a U-shaped segmentation network. In the decoding stage, the feature maps from the corresponding encoding stage are spliced in during up-sampling, which maximally retains the important feature information obtained during down-sampling in the encoding stage; the network can process images of any size and accurately localize rust. Replacing the traditional convolution operation with the enhanced receptive field module effectively enlarges the receptive field, mitigates rust misclassification, and improves the accuracy of rust detection.
Optionally, a focal loss function is used for the pixel-level binary classification loss: on one hand it penalizes pixel-level rust misclassification, and on the other hand it balances positive and negative samples, coping well with the sample imbalance encountered in rust detection in real scenes.
As shown in fig. 5, an embodiment of the present application provides a rust detection method, including:
step 501, acquiring an image to be detected;
step 502, analyzing an image to be detected by a rust detection system to obtain a segmentation map of the image to be detected; wherein, the rust detection system is the rust detection system provided by the embodiment of the application;
step 503, calculating and labeling the rust position in the image to be detected.
Optionally, analyzing the image to be detected by the rust detection system to obtain a segmentation map of the image to be detected, including:
filtering an image to be detected through a receptive field module to obtain a first-layer output characteristic diagram of the coding network;
performing maximum pooling on the first-layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a second-layer output characteristic diagram of the coding network;
performing maximum pooling on the output characteristic diagram of the second layer of the coding network, and then filtering through a receptive field module to obtain an output characteristic diagram of the third layer of the coding network;
performing maximum pooling on the output characteristic diagram of the third layer of the coding network, and then filtering through a receptive field module to obtain an output characteristic diagram of the fourth layer of the coding network;
and performing maximum pooling on the fourth-layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a fifth-layer output characteristic diagram of the coding network.
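The five filtering-and-pooling steps above imply a fixed shape arithmetic: same-padded convolutions keep the resolution, each 2 × 2 max-pooling halves it, and (per the worked example later in this description) the channel count doubles per layer from 64 to 1024. A minimal sketch of that arithmetic:

```python
def encoder_shapes(h, w, base_channels=64, depth=5):
    """(H, W, C) after each encoder layer: same-padded convolutions keep the
    resolution, a 2x2 max-pool before layers 2..5 halves it, and the channel
    count doubles per layer."""
    shapes = []
    c = base_channels
    for layer in range(depth):
        if layer > 0:              # layers 2..5 pool before filtering
            h, w = h // 2, w // 2
        shapes.append((h, w, c))
        c *= 2
    return shapes
```

For a 256 × 256 input this yields the five encoder feature maps from (256, 256, 64) down to (16, 16, 1024).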
Optionally, analyzing the image to be detected by the rust detection system to obtain a segmentation map of the image to be detected, further comprising:
deconvoluting the fifth-layer output characteristic diagram of the coding network to obtain an image with the same size as the fourth-layer output characteristic diagram of the coding network, splicing the image with the fourth-layer output characteristic diagram of the coding network in channel dimension, and filtering the image through an enhanced receptive field module to obtain the fourth-layer output characteristic diagram of the decoding network;
deconvoluting the output characteristic diagram of the fourth layer of the decoding network to obtain an image with the same size as the output characteristic diagram of the third layer of the coding network, splicing the image with the output characteristic diagram of the third layer of the coding network in the channel dimension, and filtering the image through an enhanced receptive field module to obtain the output characteristic diagram of the third layer of the decoding network;
deconvoluting the output characteristic diagram of the third layer of the decoding network to obtain an image with the same size as the output characteristic diagram of the second layer of the coding network, splicing the image with the output characteristic diagram of the second layer of the coding network in channel dimension, and filtering the image through an enhanced receptive field module to obtain the output characteristic diagram of the second layer of the decoding network;
and performing deconvolution on the second-layer output characteristic diagram of the decoding network to obtain an image with the same size as the first-layer output characteristic diagram of the coding network, splicing the image with the first-layer output characteristic diagram of the coding network in the channel dimension, and filtering through an enhanced receptive field module to obtain the first-layer output characteristic diagram of the decoding network.
For example, the detection network provided by the present application can perform the following processing on an input image:
filtering an input image by an enhanced receptive field module to obtain a first-layer output characteristic diagram in a coding stage, wherein the number of channels is 64;
performing a 2 × 2 maximum value pooling operation on the first-layer output feature map in the encoding stage to reduce the size of the feature map by half; then filtering through an enhanced receptive field module to obtain a second-layer output characteristic diagram in the encoding stage, wherein the number of channels is 128;
2 x 2 maximum value pooling operation is carried out on the second layer output characteristic diagram in the encoding stage, so that the size of the characteristic diagram is reduced by half again; then filtering the data by an enhanced receptive field module to obtain a third-layer output characteristic diagram in a coding stage, wherein the number of channels is 256;
performing a 2 × 2 maximum value pooling operation on the third-layer output characteristic diagram in the encoding stage, so that the size of the characteristic diagram is reduced by half again; then filtering through an enhanced receptive field module to obtain a fourth-layer output characteristic diagram in the encoding stage, wherein the number of channels is 512;
performing a 2 × 2 maximum value pooling operation on the fourth-layer output characteristic diagram in the encoding stage to reduce the size of the characteristic diagram by half again; then filtering through an enhanced receptive field module to obtain a fifth-layer output characteristic diagram in the encoding stage, wherein the number of channels is 1024;
performing a 2 × 2 deconvolution operation on the fifth-layer output characteristic diagram in the encoding stage to double the size of the characteristic diagram, ensuring it is the same size as the fourth-layer output characteristic diagram in the encoding stage; then simply splicing with the fourth-layer output characteristic diagram in the encoding stage on the channel dimension; and then filtering through an enhanced receptive field module to obtain a fourth-layer output characteristic diagram in the decoding stage, wherein the number of channels is 512;
2 x 2 deconvolution operation is carried out on the fourth layer output characteristic diagram in the decoding stage, so that the size of the characteristic diagram is doubled, and the same size as that of the third layer output characteristic diagram in the encoding stage is ensured; then simply splicing with a third layer output characteristic diagram in the encoding stage on the channel dimension; filtering the data by an enhanced receptive field module to obtain a third-layer output characteristic diagram in a decoding stage, wherein the number of channels is 256;
2 x 2 deconvolution operation is carried out on the third layer output characteristic diagram in the decoding stage, so that the size of the characteristic diagram is doubled, and the characteristic diagram is ensured to keep the same size with the second layer output characteristic diagram in the encoding stage; then simply splicing with a second-layer output characteristic diagram in the encoding stage on the channel dimension; then filtering the data by an enhanced receptive field module to obtain a second-layer output characteristic diagram in a decoding stage, wherein the number of channels is 128;
2 x 2 deconvolution operation is carried out on the second layer output characteristic diagram in the decoding stage, so that the size of the characteristic diagram is doubled, and the characteristic diagram is ensured to keep the same size as the first layer output characteristic diagram in the encoding stage; then simply splicing the first layer of output characteristic diagram in the encoding stage on the channel dimension; then filtering the data by an enhanced receptive field module to obtain a first-layer output characteristic diagram in a decoding stage, wherein the number of channels is 64;
and finally, performing conventional 1 × 1 convolution operation on the first layer output characteristic graph in the decoding stage to obtain an output segmentation graph. Because rust detection is a binary problem, the number of output segmentation graph channels is 1, and the value range is 0-1.
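The decoder walkthrough above follows the mirror-image shape arithmetic: each 2 × 2 deconvolution doubles the resolution and halves the channels so the result matches the encoder skip feature map it is spliced with, and a final 1 × 1 convolution produces the 1-channel segmentation map. A sketch of that check:

```python
def decoder_shapes(bottom, encoder_skips):
    """(H, W, C) after each decoder layer. Each 2x2 deconvolution doubles the
    resolution and halves the channels so the result matches the encoder skip
    feature map it is concatenated with; a final 1x1 convolution yields the
    1-channel segmentation map."""
    shapes = []
    h, w, c = bottom
    for sh, sw, _ in reversed(encoder_skips):  # encoder layers 4 -> 1
        h, w, c = h * 2, w * 2, c // 2
        assert (h, w) == (sh, sw), "deconv output must match the skip map"
        shapes.append((h, w, c))
    shapes.append((h, w, 1))                   # final 1x1 convolution
    return shapes
```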
According to the output segmentation graph, when the output value of a pixel is greater than 0.5, the pixel is marked as a rust pixel.
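The thresholding rule stated here (output value greater than 0.5 means rust) is a one-liner; a sketch over a nested-list segmentation map:

```python
def label_rust_pixels(seg_map, threshold=0.5):
    """Binarise a single-channel segmentation map: values strictly above the
    threshold are marked 1 (rust), everything else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in seg_map]
```

Note that a value of exactly 0.5 is not marked as rust, matching the strict "greater than 0.5" wording.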
Optionally, the enhanced receptive field module comprises:
the acquisition module is used for acquiring the characteristic diagram obtained by the previous layer and taking the acquired characteristic diagram as the original characteristic diagram of the enhanced receptive field module;
the first convolution layer is used for processing the original characteristic diagram of the enhanced receptive field module;
the second convolution layer is used for processing the characteristic diagram output by the first convolution layer;
the third convolution layer is used for processing the characteristic diagram output by the second convolution layer;
the first fusion module is used for processing the feature map output by the third convolutional layer;
the second fusion module is used for adding the original characteristic diagram of the enhanced receptive field module and the characteristic diagram output by the first fusion module;
and the output module is used for activating the characteristic diagram output by the second fusion module and outputting the activated characteristic diagram.
Optionally, the first convolution layer is configured to process the original feature map of the enhanced receptive field module, and includes:
performing convolution operations on the original feature map of the enhanced receptive field module through two different 1 × 1 convolution kernels respectively, and outputting a feature map A1 and a feature map A2.
Optionally, the second convolution layer is configured to process a feature map output by the first convolution layer, and includes:
performing a convolution operation on the original characteristic diagram of the enhanced receptive field module through a 1 × 1 convolution kernel, and outputting a characteristic diagram B0 (the 1 × 1 convolution kernel in the second layer filters the input original feature map again in order to add detail information); performing a convolution operation on the feature map A1 through a 3 × 3 convolution kernel to output a feature map B1;
performing convolution operation on the feature map A2 through a 5 × 5 convolution kernel to output a feature map B2;
This use of multi-branch convolution layers with different kernels simulates receptive fields of multiple sizes.
Optionally, the third convolutional layer is configured to process the feature map output by the second convolutional layer, and includes:
processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 1, and outputting a feature map C0;
processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 3, and outputting a feature map C1;
processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 5, and outputting a feature map C2;
This use of hole (dilated) convolution layers reproduces the relationship between receptive field size and eccentricity in the human visual system.
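The three dilation rates give the branches progressively larger effective receptive fields. For a k × k kernel with dilation d, the effective spatial extent is k + (k - 1)(d - 1); a one-line check:

```python
def effective_kernel(k, dilation):
    """Effective spatial extent of a k x k convolution with the given dilation."""
    return k + (k - 1) * (dilation - 1)
```

So the three 3 × 3 branches with dilation rates 1, 3 and 5 cover 3 × 3, 7 × 7 and 11 × 11 regions respectively.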
Optionally, the first fusion module is configured to process the feature map output by the third convolutional layer, and includes:
splicing the feature map C0, the feature map C1 and the feature map C2, filtering the spliced feature maps through a1 × 1 convolution kernel, and outputting a feature map D1; the splicing can achieve the purpose of fusing different characteristics.
Optionally, the second fusion module is configured to add the original feature map of the enhanced receptive field module to the feature map output by the first fusion module through a shortcut connection; this shortcut addition helps avoid vanishing gradients in the deep network, making network training easier.
Optionally, the output module activates the feature map output by the second fusion module, and outputs the activated feature map, where the output and the input of the module keep the same size and channel number.
Optionally, the convolution operation all uses a filling strategy to ensure that the resolution is consistent with that before convolution.
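The "same" padding mentioned here follows directly from the convolution output-size formula; a sketch (assuming stride 1 and odd kernel sizes, as used throughout the module):

```python
def same_padding(k, dilation=1):
    """Padding that keeps the output resolution equal to the input
    (stride 1, odd kernel size k)."""
    return dilation * (k - 1) // 2

def conv_out_size(n, k, pad, stride=1, dilation=1):
    """Standard convolution output-size formula."""
    eff = k + (k - 1) * (dilation - 1)   # dilated kernel extent
    return (n + 2 * pad - eff) // stride + 1
```

With this padding, every branch of the module (1 × 1, 3 × 3, 5 × 5, and the dilated 3 × 3 convolutions) preserves the input resolution, so the branch outputs can be concatenated and the residual addition is shape-compatible.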
Optionally, obtaining a segmentation map of the image to be detected includes:
performing convolution operation on the output characteristic diagram of the first layer in the decoding stage to obtain a segmentation diagram of the image to be detected, wherein the number of channels of the segmentation diagram is 1, and the value range is 0-1;
loss measurement is carried out on the detection result in a mode of fusing image-level two-class loss and pixel-level two-class loss;
the image-level classification loss adopts common cross entropy loss;
the pixel-level binary penalty uses a focus penalty function.
Optionally, calculating and labeling the rust position in the image to be detected includes:
and when the segmentation map value of a certain pixel is more than 0.5, marking the pixel as a rust pixel.
Fig. 6 shows a block diagram of a computer device provided in an exemplary embodiment of the present application. The computer device includes: a processor 610 and a memory 620. The computer device may be the processing terminal 140 in the embodiment of fig. 1.
The processor 610 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The memory 620 is connected to the processor 610 through a bus or other means, and at least one instruction, at least one program, a code set, or an instruction set is stored in the memory 620, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the rust detection method executed by the processing terminal in the above embodiments.
The memory 620 may be a volatile memory, a non-volatile memory, or a combination of the two. The volatile memory may be a random-access memory (RAM), such as a static random-access memory (SRAM) or a dynamic random-access memory (DRAM). The non-volatile memory may be a read-only memory (ROM), such as a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM). The non-volatile memory may also be a flash memory, or a magnetic memory such as a magnetic tape, a floppy disk, or a hard disk. The non-volatile memory may also be an optical disc.
The present invention also provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the rust detection method according to any one of the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims (14)

1. A rust detection system, comprising: an encoding network and a decoding network; the output of the encoding network is connected with the input of the decoding network; wherein,
the coding network has 5 layers: the first layer to the fifth layer of the coding network are connected in sequence;
the first layer of the coding network comprises an enhanced receptive field module;
the second to fifth layers of the coding network comprise a maximum pooling layer and an enhanced receptive field module;
the decoding network has 5 layers: the fifth layer to the first layer of the decoding network are connected in sequence;
the fifth layer to the second layer of the decoding network comprise a deconvolution layer, a splicing layer and an enhanced receptive field module;
the first layer of the decoding network comprises a splicing layer, an enhanced receptive field module and a convolution layer;
the fifth layer of the encoding network is connected with the fifth layer of the decoding network;
wherein the enhanced receptive field module comprises:
the acquisition module is used for acquiring the characteristic diagram obtained by the previous layer, and taking the acquired characteristic diagram as the original characteristic diagram of the enhanced receptive field module;
the first convolution layer comprises 2 different 1 × 1 convolution kernels and is used for processing the original characteristic map of the enhanced receptive field module and outputting a characteristic map A1 and a characteristic map A2;
a second convolutional layer, comprising 3 different convolution kernels, for processing the feature map output by the first convolutional layer: processing the original characteristic diagram through a 1 × 1 convolution kernel, and outputting a characteristic diagram B0; performing a convolution operation on the feature map A1 through a 3 × 3 convolution kernel to output a feature map B1; and performing a convolution operation on the feature map A2 through a 5 × 5 convolution kernel to output a feature map B2;
a third convolutional layer comprising 3 different convolutional modules for processing the feature map output by the second convolutional layer: processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 1, and outputting a feature map C0; processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 3, and outputting a feature map C1; processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 5, and outputting a feature map C2;
a first fusion module, configured to process a feature map output by the third convolutional layer: splicing the feature map C0, the feature map C1 and the feature map C2, filtering the spliced feature maps through a1 × 1 convolution kernel, and outputting a feature map D1;
the second fusion module is used for adding the original characteristic diagram of the enhanced receptive field module and the characteristic diagram output by the first fusion module;
an output module for activating the feature map output by the second fusion module and outputting the activated feature map,
and filling strategies are adopted in the convolution operation so as to ensure that the resolution is consistent with that before convolution.
2. A method of rust detection, the method comprising:
acquiring an image to be detected;
analyzing an image to be detected by a rust detection system to obtain a segmentation map of the image to be detected; wherein the rust detection system is obtained according to claim 1;
and calculating and marking the rust positions in the image to be detected.
3. The method of claim 2, wherein: analyzing an image to be detected by a rust detection system to obtain a segmentation map of the image to be detected, wherein the segmentation map comprises the following steps:
filtering an image to be detected through a receptive field module to obtain a first-layer output characteristic diagram of the coding network;
performing maximum pooling on the first layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a second layer output characteristic diagram of the coding network;
performing maximum pooling on the second-layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a third-layer output characteristic diagram of the coding network;
performing maximum pooling on the third-layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a fourth-layer output characteristic diagram of the coding network;
and performing maximum pooling on the fourth-layer output characteristic diagram of the coding network, and then filtering through a receptive field module to obtain a fifth-layer output characteristic diagram of the coding network.
4. The method of claim 3, wherein: the analyzing the image to be detected by the rust detection system to obtain the segmentation map of the image to be detected, further comprising:
deconvoluting the fifth-layer output characteristic diagram of the coding network to obtain an image with the same size as the fourth-layer output characteristic diagram of the coding network, splicing the image with the fourth-layer output characteristic diagram of the coding network in channel dimension, and filtering the image through an enhanced receptive field module to obtain a fourth-layer output characteristic diagram of a decoding network;
deconvoluting the output characteristic diagram of the fourth layer of the decoding network to obtain an image with the same size as the output characteristic diagram of the third layer of the coding network, splicing the image with the output characteristic diagram of the third layer of the coding network in the channel dimension, and filtering the image through an enhanced receptive field module to obtain the output characteristic diagram of the third layer of the decoding network;
deconvoluting the output characteristic diagram of the third layer of the decoding network to obtain an image with the same size as the output characteristic diagram of the second layer of the coding network, splicing the image with the output characteristic diagram of the second layer of the coding network in channel dimension, and filtering the image through an enhanced receptive field module to obtain the output characteristic diagram of the second layer of the decoding network;
and performing deconvolution on the second-layer output characteristic graph of the decoding network to obtain an image with the same size as the first-layer output characteristic graph of the coding network, splicing the image with the first-layer output characteristic graph of the coding network in channel dimension, and filtering the image through an enhanced receptive field module to obtain the first-layer output characteristic graph of the decoding network, namely the output characteristic graph of the first layer of the decoding network.
5. The method of claim 2, wherein the enhanced receptive field module comprises:
the acquisition module is used for acquiring the characteristic diagram obtained by the previous layer, and taking the acquired characteristic diagram as the original characteristic diagram of the enhanced receptive field module;
the first convolution layer is used for processing the original characteristic diagram of the enhanced receptive field module;
the second convolution layer is used for processing the characteristic diagram output by the first convolution layer;
a third convolutional layer for processing the characteristic diagram output by the second convolutional layer;
the first fusion module is used for processing the feature map output by the third convolutional layer;
the second fusion module is used for adding the original characteristic diagram of the enhanced receptive field module and the characteristic diagram output by the first fusion module;
and the output module is used for activating the characteristic diagram output by the second fusion module and outputting the activated characteristic diagram.
6. The method of claim 5, wherein the first convolution layer for processing the original feature map of the enhanced receptive field module comprises:
performing convolution operations on the original feature map of the enhanced receptive field module through two different 1 × 1 convolution kernels respectively, to output a feature map A1 and a feature map A2.
7. The method of claim 5, wherein the second convolutional layer for processing the feature map output by the first convolutional layer comprises:
performing a convolution operation on the original characteristic diagram of the enhanced receptive field module through a 1 × 1 convolution kernel, and outputting a characteristic diagram B0;
performing a convolution operation on the feature map A1 through a 3 × 3 convolution kernel to output a feature map B1;
and performing convolution operation on the feature map A2 through a 5 × 5 convolution kernel to output a feature map B2.
8. The method of claim 5, wherein the third convolutional layer for processing the feature map output by the second convolutional layer comprises:
processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 1, and outputting a feature map C0;
processing the feature map B0 through a 3 × 3 convolution kernel and a hole convolution filter with an expansion rate of 3, and outputting a feature map C1;
the feature map B0 was processed by a 3 × 3 convolution kernel and a hole convolution filter with a dilation rate of 5, and a feature map C2 was output.
9. The method of claim 5, wherein the first fusing module is configured to process the feature map output by the third convolutional layer, and comprises:
and splicing the characteristic diagram C0, the characteristic diagram C1 and the characteristic diagram C2, filtering the spliced characteristic diagrams through a1 × 1 convolution kernel, and outputting a characteristic diagram D1.
10. The method according to any one of claims 5 to 9, wherein the convolution operations each employ a padding strategy to ensure that the resolution is consistent with that before convolution.
11. The method of claim 2, wherein the obtaining of the segmentation map of the image to be detected comprises:
performing convolution operation on the output characteristic diagram of the first layer of the decoding network to obtain a segmentation diagram of the image to be detected, wherein the number of channels of the segmentation diagram is 1, and the value range is 0-1;
loss measurement is carried out on the detection result in a mode of fusing image-level two-class loss and pixel-level two-class loss;
the image-level classification loss adopts common cross entropy loss;
the pixel-level binary loss employs a focus loss function.
12. The method of claim 2, wherein the calculating and labeling of rust locations in the image to be detected comprises:
and when the segmentation map value of a certain pixel is more than 0.5, marking the pixel as a rust pixel.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction or program that is loaded and executed by the processor to implement the rust detection method of any one of claims 2 to 10.
14. A computer-readable storage medium having stored therein at least one instruction or program that is loaded and executed by a processor to implement the rust detection method of any one of claims 2 to 10.
CN201910726368.9A 2019-08-07 2019-08-07 Rust detection system, method, computer device and readable storage medium Active CN110599452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910726368.9A CN110599452B (en) 2019-08-07 2019-08-07 Rust detection system, method, computer device and readable storage medium


Publications (2)

Publication Number Publication Date
CN110599452A CN110599452A (en) 2019-12-20
CN110599452B true CN110599452B (en) 2022-02-22

Family

ID=68853780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910726368.9A Active CN110599452B (en) 2019-08-07 2019-08-07 Rust detection system, method, computer device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110599452B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275034B (en) * 2020-01-19 2023-09-12 天翼数字生活科技有限公司 Method, device, equipment and storage medium for extracting text region from image
CN113139617B (en) * 2021-05-10 2023-04-07 郑州大学 Power transmission line autonomous positioning method and device and terminal equipment
CN113255634A (en) * 2021-07-18 2021-08-13 杭州电子科技大学 Vehicle-mounted mobile terminal target detection method based on improved Yolov5
CN114049354B (en) * 2022-01-12 2022-04-29 山东仲良格环保技术有限公司 Rust remover optimized proportioning method and system based on metal corrosion degree
CN115908999B (en) * 2022-11-25 2023-07-28 合肥中科类脑智能技术有限公司 Method for detecting rust of top hardware fitting of distribution pole tower, medium and edge terminal equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108734211A (en) * 2018-05-17 2018-11-02 腾讯科技(深圳)有限公司 The method and apparatus of image procossing
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN109409504A (en) * 2018-10-10 2019-03-01 深圳乐信软件技术有限公司 A kind of data processing method, device, computer and storage medium
CN109949276A (en) * 2019-02-28 2019-06-28 华中科技大学 A kind of lymph node detection method in improvement SegNet segmentation network
CN110059758A (en) * 2019-04-24 2019-07-26 海南长光卫星信息技术有限公司 A kind of remote sensing image culture pond detection method based on semantic segmentation

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10803378B2 (en) * 2017-03-15 2020-10-13 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
US10891778B2 (en) * 2018-01-10 2021-01-12 The Board Of Trustees Of The University Of Illinois Apparatus and method for producing three-dimensional models from magnetic resonance imaging


Non-Patent Citations (3)

Title
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning; Christian Szegedy et al.; arXiv; 2016-08-23; pp. 1-12 *
An improved I-Unet network algorithm for skin disease image segmentation; Jiang Hongda et al.; Modern Electronics Technique; 2019-06-15; pp. 52-56 *

Also Published As

Publication number Publication date
CN110599452A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110599452B (en) Rust detection system, method, computer device and readable storage medium
CN112017189B (en) Image segmentation method and device, computer equipment and storage medium
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN111461319B (en) CNN-based object detection method and device capable of adapting to user requirements
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN113888557A (en) Scene semantic segmentation method and system based on RGB-D feature fusion
CN113052856A (en) Hippocampus three-dimensional semantic network segmentation method based on multi-scale feature multi-path attention fusion mechanism
CN112861619A (en) Model training method, lane line detection method, equipment and device
CN110414421A (en) A kind of Activity recognition method based on sequential frame image
CN111898539A (en) Multi-target detection method, device, system, equipment and readable storage medium
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112699889A (en) Unmanned real-time road scene semantic segmentation method based on multi-task supervision
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN116229452A (en) Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN112001453B (en) Method and device for calculating accuracy of video event detection algorithm
CN112633123A (en) Heterogeneous remote sensing image change detection method and device based on deep learning
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN117746359A (en) Target detection method, target detection device, electronic equipment and readable storage medium
CN113111740A (en) Characteristic weaving method for remote sensing image target detection
CN115908999B (en) Method for detecting rust of top hardware fitting of distribution pole tower, medium and edge terminal equipment
CN114463772B (en) Deep learning-based traffic sign detection and identification method and system
CN116167927A (en) Image defogging method and system based on mixed double-channel attention mechanism
CN112131925B (en) Construction method of multichannel feature space pyramid
CN117392137B (en) Intracranial aneurysm image detection method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant