CN117314986A - Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation - Google Patents

Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation

Info

Publication number
CN117314986A
CN117314986A
Authority
CN
China
Prior art keywords
image
unmanned aerial
aerial vehicle
semantic segmentation
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311501960.1A
Other languages
Chinese (zh)
Inventor
张明达
胡洁
王涨
娄一艇
卲淦
赵天昊
申兴发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Hengchen Electric Power Construction Co ltd
Hangzhou Dianzi University
Original Assignee
Ningbo Hengchen Electric Power Construction Co ltd
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Hengchen Electric Power Construction Co ltd, Hangzhou Dianzi University filed Critical Ningbo Hengchen Electric Power Construction Co ltd
Priority to CN202311501960.1A
Publication of CN117314986A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/752Contour matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones

Abstract

The invention discloses an unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation. A general-purpose semantic segmentation model with zero-shot transfer capability segments the cross-modal unmanned aerial vehicle distribution network inspection images and generates mask images, from which edge information is extracted with a filtering operator and non-maximum suppression. Feature points are extracted from the resulting edge images and matched, and a perspective transformation matrix is calculated from the nearest-neighbor matching result, so that equipment coordinates in the visible light image are mapped into the infrared thermal imaging image, where the temperature is read with the SDK provided by the lens manufacturer. By introducing the semantic information of the image, the invention mitigates the susceptibility of traditional edge extraction algorithms to interference from complex backgrounds and provides a new method for cross-modal image registration.

Description

Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation
Technical Field
The invention belongs to the technical field of power equipment detection, and particularly relates to a method for registering infrared thermal imaging and visible light image data of power distribution equipment acquired by an unmanned aerial vehicle.
Background
Statistics show that about one quarter of line failures are caused by loosening, oxidation, wear, corrosion, and the like. Before such faults occur, equipment in the distribution line shows symptoms such as heating, overheating, or burning, and heating is especially likely on wire clamps, insulators, lightning arresters, and similar devices. Therefore, if the temperature of equipment such as wire clamps can be effectively and automatically monitored online, an alarm can be raised immediately whenever an abnormal temperature occurs, so that maintenance personnel are notified to eliminate the hidden danger in time; eliminating the fault in its incipient state achieves the purpose of accident prevention.
In recent years, unmanned aerial vehicles and thermal imaging have been widely used as a novel means of power equipment detection in industrial equipment monitoring, substation inspection, and other applications. The technology can remotely capture fault information of power equipment without contact, helping the power system accomplish early fault prevention and diagnosis. However, in actual distribution network inspection, the infrared image is often contaminated with a large number of interference items; especially at a top-down viewing angle, the ground background greatly reduces the identifiability of the thermal imaging result. The high-resolution visible light image has a great advantage in locating equipment, so the strengths of the two modalities can be combined once the coordinate transformation relation between the two modal images is established. Because different lenses are required to capture the images of the different modalities, parameters such as viewing angle and focal length differ between the images, and a registration algorithm unaffected by modality is needed to establish the coordinate mapping relation.
The focus of image registration is extracting feature points of the object to be identified, but such extraction cannot be applied directly across visible light and infrared thermal imaging images, because the modal difference leaves the image features with little similarity. Extracting edge images from the two modalities is a common way to unify the features; however, existing algorithms can accurately register the target subject only on clean images with few interference items, insignificant noise, and the sky as background. For inspection images taken at a top-down angle, the edge information of complicated background objects interferes, and accurate extraction of the feature points cannot be achieved.
Disclosure of Invention
Aiming at the defects of existing cross-modal image registration algorithms, and in order to fully utilize the flexibility and accuracy of unmanned aerial vehicle acquisition and monitoring technology, the invention provides an unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation.
The process of the invention comprises the following steps:
Step one: unmanned aerial vehicle data acquisition. The unmanned aerial vehicle carries infrared and visible light cameras and acquires images of power distribution network equipment, including visible light images and infrared thermal imaging images.
Step two: image semantic segmentation. Whole-image semantic segmentation is performed separately on the visible light image and the infrared thermal imaging image obtained in step one.
Step three: mask image generation. According to the semantic segmentation result of step two, mask images are generated in which pixels with different semantics receive different colors.
Step four: edge extraction. An edge image is extracted from the generated mask image.
Step five: feature point extraction. Feature points are extracted from the edge image and descriptors are generated.
Step six: feature point matching. Nearest-neighbor matching is performed directly on the feature points extracted from the visible light image and the infrared thermal imaging image to obtain the best matching point pairs.
Step seven: transformation matrix calculation. Mismatched point pairs are eliminated and the transformation matrix is calculated.
Step eight: according to the equipment coordinates in the visible light image and the transformation matrix obtained in step seven, the coordinates of the same equipment in the infrared thermal imaging image are calculated and the equipment temperature information is read.
Further, the images in the two modalities captured by the unmanned aerial vehicle in step one are a visible light image and an infrared thermal imaging image of different resolutions captured by the same aircraft at the same time and viewing angle.
Further, the specific method of step two is as follows:
The visible light image is scaled to a size similar to that of the infrared thermal imaging image by linear interpolation, and semantic segmentation is performed on the scaled visible light image using a pre-trained Segment Anything model.
The Segment Anything model takes two inputs, a prompt and a source image; the inputs are processed by a prompt encoder and an image encoder, and the encoded result is then processed by a mask decoder to obtain the final mask image.
Further, the specific method of step three is as follows:
Without any prompt input, the Segment Anything model of step two segments the input scaled visible light image and the input infrared thermal imaging image respectively, yielding two segmentation results. Based on the two results, transparent images the same size as the source visible light image and the infrared thermal imaging image are generated respectively; each segmented block in the results is filled with a random color to form an initial mask image, and the initial mask image is drawn onto the corresponding transparent image to obtain the whole mask image.
Further, the specific method of step four is as follows:
Edge information of the whole mask image obtained in step three is extracted with the Sobel operator, and non-maximum suppression is applied to the obtained edge information to obtain the edge image. Specifically, for each pixel in the edge information, the gradients along the x and y directions are first calculated to obtain the pixel's gradient direction; taking that pixel as the center pixel, the maximum along the gradient direction must lie within its 8-neighborhood. The gradient values of the sub-pixel points at the two possible maximum positions along the gradient direction are calculated by linear interpolation and compared with the center pixel; if the center pixel is the maximum, it is retained as an edge point, otherwise it is discarded.
Further, the specific method of step five is as follows:
Gaussian blur is applied to the edge image extracted in step four: for each scale σ, a different Gaussian kernel G(σ) is convolved with the edge image to obtain convolution images of different scales. For each scale in the same modality, two Gaussian difference images are generated for detecting extreme points. The Gaussian difference images are Gaussian-smoothed and a difference-of-Gaussians pyramid is constructed, giving images of different scales in the same modality.
For each layer of the difference-of-Gaussians pyramid, extreme points are detected using the DoG pyramid to obtain feature points.
For each feature point, a Gaussian fit is used to determine its position and scale.
For each feature point, the gradient magnitude and direction of the surrounding image area are computed, an orientation histogram is accumulated, and the direction of its maximum is taken as the main direction of the feature point.
For each feature point, a descriptor describes its features: the image area near the feature point is divided into several sub-areas, the gradient magnitude and direction of each sub-area are accumulated and weighted with a Gaussian weighting function to obtain a feature vector, finally yielding a high-dimensional vector of 128 elements, each element representing the gradient value in one direction.
Further, the specific method of step six is as follows:
Using the FLANN algorithm, the high-dimensional vectors obtained in step five are mapped by random projection and locality-sensitive hashing into a K-Means tree built in a low-dimensional space; the search proceeds in the low-dimensional space in combination with a clustering method, and the search results are then mapped back to the original high-dimensional space to find the nearest-neighbor matching points, forming matching point pairs.
Further, the specific method of step seven is as follows:
Outlier elimination and transformation matrix calculation are performed with OpenCV on the matching point pairs obtained in step six.
Further, the specific method of step eight is as follows:
The coordinate information of the equipment whose temperature is to be measured is obtained from the visible light image, the coordinates are mapped into the infrared thermal imaging image coordinate system through the transformation matrix, and the temperature value is read according to the transformed coordinate information using the SDK (software development kit) provided by the lens manufacturer for reading image information in its specific format.
The invention has the following beneficial effects:
The method fully utilizes the flexibility of unmanned aerial vehicles for sampling equipment information along inspection lines, and innovatively introduces an image semantic segmentation model to solve the problem that traditional image registration algorithms, when processing complex backgrounds, are easily interfered with and struggle to extract effective feature points. It improves the registration algorithm flow, raises the registration success rate and accuracy, and provides a new method for automatic cross-modal image information processing.
Drawings
FIG. 1 is a schematic flow chart of an algorithm of the invention;
FIG. 2 is an example of a data source image for use with the present invention;
FIG. 3 is a diagram of the structure of the Segment Anything model employed in the present invention;
FIG. 4 is a mask image generated from the semantic segmentation results of images in two modalities according to the present invention;
FIG. 5 is an image of the invention after edge extraction of mask images for two modalities;
FIG. 6 is a graph showing the result of registering edge images of the two modalities in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the accompanying drawings for better explanation of the invention.
The invention fully exploits the semantic information contained in the pixels of the image, combining a neural network with zero-shot transfer capability and a traditional feature point extraction algorithm to separate the identification subject from the background, extract edges by means of mask images, and achieve accurate registration.
The algorithm provided by the invention comprises the following steps:
Step one: the unmanned aerial vehicle carries infrared and visible light cameras and acquires images of power distribution network equipment, including visible light images and infrared thermal imaging images, as shown in FIG. 2.
Step two: whole-image semantic segmentation is performed separately on the images of the two modalities obtained in step one.
The visible light image is scaled to a size similar to that of the infrared thermal imaging image by linear interpolation, with the scaling factor Scale calculated as:

Scale = len_T / len_Z

where len_T is the number of horizontal pixels of the infrared thermal imaging image and len_Z is the number of horizontal pixels of the visible light image.
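As an illustration only, a minimal OpenCV sketch of this scaling step might look as follows (the function and variable names are assumptions for illustration; the disclosure does not prescribe an implementation):

```python
import cv2

def scale_visible_to_thermal(visible_img, thermal_img):
    # Scale = horizontal pixel count of the thermal image divided by that
    # of the visible image, matching the formula above.
    scale = thermal_img.shape[1] / visible_img.shape[1]
    # Bilinear (linear) interpolation resizes the visible light image to a
    # size comparable to the infrared thermal imaging image.
    scaled = cv2.resize(visible_img, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_LINEAR)
    return scaled, scale
```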
Semantic segmentation is performed on the scaled visible light image using a pre-trained Segment Anything model.
The Segment Anything model receives two inputs, a prompt and a source image; the inputs are processed by a prompt encoder and an image encoder, and the encoded result is then processed by a mask decoder to obtain the final mask image. The model structure is shown in FIG. 3. The model is trained with a multi-task loss function comprising a mask prediction loss L_mask, an edge prediction loss L_edge, and a mask diversity loss L_div; the total loss function is:

L_total = L_mask + λ_1·L_edge + λ_2·L_div

where λ_1 and λ_2 are hyper-parameters used to balance the different loss terms.
Step three: according to the semantic segmentation result of step two, mask images are generated in which pixels with different semantics receive different colors.
Without any prompt input, the Segment Anything model of step two segments the input scaled visible light image and the input infrared thermal imaging image respectively, yielding two segmentation results. Based on the two results, transparent images the same size as the source visible light image and the infrared thermal imaging image are generated respectively; each segmented block in the results is filled with a random color to form an initial mask image, and the initial mask image is drawn onto the corresponding transparent image to obtain the whole mask image, as shown in FIG. 4.
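Assuming the model referred to here is the publicly released Segment Anything model, a hedged sketch of the prompt-free segmentation and random-color mask drawing could read as follows (the package name, model variant, and checkpoint path are assumptions, not part of the disclosure):

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def generate_whole_mask_image(image_rgb, checkpoint="sam_vit_h_4b8939.pth"):
    # Prompt-free operation: the automatic mask generator internally samples
    # a grid of point prompts, so no prompt words need to be supplied.
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    results = SamAutomaticMaskGenerator(sam).generate(image_rgb)

    # Blank canvas the same size as the source image; every segmented block
    # is filled with a random color to form the whole mask image.
    canvas = np.zeros_like(image_rgb)
    rng = np.random.default_rng()
    for r in results:
        canvas[r["segmentation"]] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas
```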
Step four: an edge image is extracted from the generated mask image.
Edge information of the whole mask image obtained in step three is extracted with the Sobel operator, and non-maximum suppression is applied to the obtained edge information to obtain the edge image. Specifically, for each pixel in the edge information, the gradients along the x and y directions are first calculated to obtain the pixel's gradient direction; taking that pixel as the center pixel, the maximum along the gradient direction must lie within its 8-neighborhood. The gradient values of the sub-pixel points at the two possible maximum positions along the gradient direction are calculated by linear interpolation and compared with the center pixel; if the center pixel is the maximum, it is retained as an edge point, otherwise it is discarded.
Specifically, let C be the center pixel with gradient magnitude grad(i,j), horizontal gradient grad_x(i,j), and vertical gradient grad_y(i,j); let the two sub-pixel gradient values be grad_t1(i,j) and grad_t2(i,j), let weight(i,j) be the linear interpolation weight, and let the four neighborhood gradient values used in the interpolation be grad_1(i,j), grad_2(i,j), grad_3(i,j), and grad_4(i,j).

If |grad_y| > |grad_x|: weight(i,j) = |grad_x| / |grad_y|, grad_2(i,j) = grad(i-1,j), grad_4(i,j) = grad(i+1,j); if grad_x and grad_y have the same sign, grad_1(i,j) = grad(i-1,j-1) and grad_3(i,j) = grad(i+1,j+1); otherwise grad_1(i,j) = grad(i-1,j+1) and grad_3(i,j) = grad(i+1,j-1).

If |grad_y| ≤ |grad_x|: weight(i,j) = |grad_y| / |grad_x|, grad_2(i,j) = grad(i,j-1), grad_4(i,j) = grad(i,j+1); if grad_x and grad_y have the same sign, grad_1(i,j) = grad(i+1,j-1) and grad_3(i,j) = grad(i-1,j+1); otherwise grad_1(i,j) = grad(i-1,j-1) and grad_3(i,j) = grad(i+1,j+1).

The gradient values of the two sub-pixel points are then calculated as:

grad_t1(i,j) = weight(i,j)·grad_1(i,j) + (1 − weight(i,j))·grad_2(i,j)
grad_t2(i,j) = weight(i,j)·grad_3(i,j) + (1 − weight(i,j))·grad_4(i,j)

The two sub-pixel gradient values are compared with the gradient value of the center point; if the center-point gradient value is the maximum of the three, it is retained, otherwise it is set to 0. The edge extraction result is shown in FIG. 5.
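A hedged NumPy/OpenCV sketch of this Sobel-plus-interpolated-non-maximum-suppression procedure follows; the magnitude threshold and the plain per-pixel loop are illustrative assumptions:

```python
import cv2
import numpy as np

def edge_image_from_mask(mask_bgr, threshold=30.0):
    gray = cv2.cvtColor(mask_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    # Sobel gradients along x and y, then the gradient magnitude.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)

    out = np.zeros_like(mag)
    h, w = mag.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            g, ax, ay = mag[i, j], abs(gx[i, j]), abs(gy[i, j])
            if g < threshold:
                continue
            same_sign = (gx[i, j] * gy[i, j]) > 0
            if ay > ax:  # gradient closer to vertical
                weight = ax / ay
                g2, g4 = mag[i - 1, j], mag[i + 1, j]
                if same_sign:
                    g1, g3 = mag[i - 1, j - 1], mag[i + 1, j + 1]
                else:
                    g1, g3 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            else:        # gradient closer to horizontal
                weight = ay / ax if ax > 0 else 0.0
                g2, g4 = mag[i, j - 1], mag[i, j + 1]
                if same_sign:
                    g1, g3 = mag[i + 1, j - 1], mag[i - 1, j + 1]
                else:
                    g1, g3 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            # Linear interpolation of the two sub-pixel gradient values along
            # the gradient direction, as in the formulas above.
            t1 = weight * g1 + (1 - weight) * g2
            t2 = weight * g3 + (1 - weight) * g4
            # Keep the center pixel only if it is the maximum of the three.
            if g >= t1 and g >= t2:
                out[i, j] = 255
    return out.astype(np.uint8)
```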
Step five: feature points are extracted from the edge image and descriptors are generated.
Gaussian blur is applied to the edge image extracted in step four: for each scale σ, a different Gaussian kernel G(σ) is convolved with the edge image to obtain convolution images of different scales. For each scale in the same modality, two Gaussian difference images are generated for detecting extreme points. The Gaussian difference images are Gaussian-smoothed and a difference-of-Gaussians pyramid is constructed, giving images of different scales in the same modality.
For each layer of the difference-of-Gaussians (DoG) pyramid, extreme points are detected to obtain feature points. Candidate extrema are refined with a second-order Taylor expansion of the DoG function:

D(x) ≈ D(x_0) + (∂D/∂x)^T Δx + (1/2) Δx^T (∂²D/∂x²) Δx

where x represents the coordinates of a position in the image, x_0 the coordinates of a neighboring sample point, Δx the offset between them, and D(x) the value of the difference-of-Gaussians space at that position. By comparing the difference between a pixel and its surrounding pixels, it can be determined whether the current pixel is an extreme point.
For each feature point, a Gaussian fit is used to determine its position and scale.
For each feature point, the gradient magnitude and direction of the surrounding image area are computed, an orientation histogram is accumulated, and the direction of its maximum is taken as the main direction of the feature point:

m(x,y) = √[ (L(x+1,y) − L(x−1,y))² + (L(x,y+1) − L(x,y−1))² ]
θ(x,y) = arctan[ (L(x,y+1) − L(x,y−1)) / (L(x+1,y) − L(x−1,y)) ]

where m(x,y) represents the gradient magnitude of the image area around the center pixel, θ(x,y) the gradient direction, and L the Gaussian-smoothed image.
For each feature point, a descriptor describes its features. The image area near the feature point is divided into several sub-areas; the gradient magnitude and direction of each sub-area are accumulated and weighted with a Gaussian weighting function to obtain a feature vector, finally yielding a high-dimensional vector of 128 elements, each element representing the gradient value in one direction:

f(x,y,σ) = (d_1, d_2, …, d_128)

where f(x,y,σ) represents the feature vector at coordinates (x,y) and scale σ, and d_i represents its feature value in the i-th dimension.
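The construction described above (difference-of-Gaussians pyramid, extremum detection and fitting, orientation histogram, 128-element descriptor) is the classical SIFT pipeline, so an illustrative sketch can simply delegate to OpenCV's implementation:

```python
import cv2

def extract_edge_features(edge_img):
    # OpenCV's SIFT builds the Gaussian and DoG pyramids, localizes extrema
    # by fitting, assigns a dominant orientation from the gradient histogram,
    # and emits 128-dimensional descriptors, as described in step five.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(edge_img, None)
    return keypoints, descriptors
```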
Step six: nearest-neighbor matching is performed on the feature points extracted from the images of the two modalities using the FLANN algorithm. The high-dimensional feature point data obtained in step five are mapped by random projection and locality-sensitive hashing into a K-Means tree built in a low-dimensional space; the search proceeds in the low-dimensional space in combination with a clustering method, and the search results are then mapped back to the original high-dimensional space to find the nearest-neighbor data points as the best matches.
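A hedged sketch of this FLANN matching step with a k-means tree index (the index parameters and the ratio-test threshold are assumptions added for illustration; the disclosure only calls for nearest-neighbor matching):

```python
import cv2

FLANN_INDEX_KMEANS = 2  # FLANN's hierarchical k-means tree index

def match_features(des_visible, des_thermal, ratio=0.7):
    index_params = dict(algorithm=FLANN_INDEX_KMEANS, branching=32, iterations=11)
    matcher = cv2.FlannBasedMatcher(index_params, dict(checks=50))
    # Two nearest neighbors per visible-light descriptor among the thermal ones.
    knn = matcher.knnMatch(des_visible, des_thermal, k=2)
    # Lowe's ratio test filters ambiguous matches before homography estimation.
    return [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```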
Step seven: outlier elimination and homography matrix calculation are performed with OpenCV on the matched point pairs obtained in step six, with the RANSAC algorithm selected for outlier elimination. The registration results are shown in FIG. 6.
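An illustrative sketch of the OpenCV outlier rejection and homography calculation (the reprojection threshold of 5.0 pixels is an assumption):

```python
import cv2
import numpy as np

def estimate_homography(kp_visible, kp_thermal, matches):
    src = np.float32([kp_visible[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_thermal[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects mismatched pairs while fitting the 3x3 perspective
    # transformation (homography) between the two edge images.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, inlier_mask
```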
Step eight: the coordinate information of the equipment whose temperature is to be measured is obtained from the visible light image, the coordinates are mapped into the infrared thermal imaging image coordinate system through the transformation matrix, and the temperature value is read according to the transformed coordinate information using the SDK (software development kit) provided by the lens manufacturer for reading image information in its specific format.
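A sketch of the final coordinate mapping follows; the temperature read-out itself depends on the lens manufacturer's SDK, whose API is vendor-specific and deliberately not sketched here:

```python
import cv2
import numpy as np

def map_to_thermal(points_visible, H):
    # Map equipment coordinates from the visible light image into the
    # infrared thermal imaging coordinate system via the homography H.
    pts = np.float32(points_visible).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```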
Examples:
In this embodiment, 100 groups of image data collected on actual distribution lines of a power supply company in Ningbo, Zhejiang Province are used as verification data; each group comprises one visible light image and one infrared thermal imaging image. For each group of images, at least four point pairs scattered over different parts of the target equipment are manually labeled as ground truth. The algorithm pipeline provided by the invention is then applied: for each labeled point coordinate in the visible light image, the mapped coordinate in the infrared image space is calculated with the transformation matrix obtained in step seven, and the transformation accuracy of the point pairs is tallied. The experimental results are evaluated with the indices Precision and RMSE (root mean square error), defined as follows. RMSE is the root mean square error between the coordinates obtained by transforming the visible-light matching points with the algorithm and the reference matching points obtained by manual calibration in the thermal imaging image:

RMSE = √[ (1/n) Σ_{i=1..n} ( (x_i′ − x_i)² + (y_i′ − y_i)² ) ]

where (x_i′, y_i′) are the new coordinates obtained by perspective transformation of the i-th matching point in the original visible light image, (x_i, y_i) are the manually labeled coordinates of the i-th matching point in the corresponding thermal imaging image, and n is the number of matching point pairs. The smaller the RMSE, the better the matching method.
Precision is the ratio of the number of correct matches to the total number of matches:

Precision = Correct Matches / (Correct Matches + False Matches)

Correct Matches is the number of correct matches. Because only rough marking is performed when the matching points are manually labeled, the labeled points carry errors of about ±5 pixels in the thermal imaging image and about ±50 pixels in the visible light image; the error tolerance for matching point pairs is therefore set to ±25 pixels. That is, a matching point pair is considered correct when the Euclidean distance between its transformed coordinates and the reference coordinates is at most 25:

√[ (x_i′ − x_i)² + (y_i′ − y_i)² ] ≤ 25

All remaining matching point pairs are classified as mismatches (False Matches).
Table 1 Test results

Method            Failed   Precision   RMSE
Canny+SIFT           8       0.314     14.03
Canny+ORB           19       0.142     16.81
Canny+BRISK          4       0.220     15.03
This embodiment      0       0.859      5.14
As the test results in Table 1 show, the algorithm pipeline provided by the invention reaches a precision of 85.9% on the cross-modal image registration task, with a root mean square error of 5.14. Compared with the other methods, which do not introduce image semantic information, the precision is markedly higher and the root mean square error markedly lower, demonstrating the practical value of the method.
Parts of the invention not described in detail are within the common knowledge of those skilled in the art.

Claims (8)

1. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation is characterized by comprising the following steps:
step one: unmanned aerial vehicle data acquisition, wherein the unmanned aerial vehicle carries infrared and visible light cameras and acquires images of power distribution network equipment, including visible light images and infrared thermal imaging images;
step two: image semantic segmentation, wherein whole-image semantic segmentation is performed separately on the visible light image and the infrared thermal imaging image obtained in step one;
step three: mask image generation, wherein, according to the semantic segmentation result of step two, mask images are generated in which pixels with different semantics receive different colors;
step four: edge extraction, wherein an edge image is extracted from the generated mask image;
step five: feature point extraction, wherein feature points are extracted from the edge image and descriptors are generated;
step six: feature point matching, wherein nearest-neighbor matching is performed directly on the feature points extracted from the visible light image and the infrared thermal imaging image to obtain the best matching point pairs;
step seven: transformation matrix calculation, wherein mismatched point pairs are eliminated and the transformation matrix is calculated;
step eight: according to the equipment coordinates in the visible light image and the transformation matrix obtained in step seven, calculating the coordinates of the same equipment in the infrared thermal imaging image and reading the equipment temperature information;
further, the images in the two modalities captured by the unmanned aerial vehicle in step one are a visible light image and an infrared thermal imaging image of different resolutions captured by the same aircraft at the same time and viewing angle.
2. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 1, wherein the specific method of step two is as follows:
the visible light image is scaled to a size similar to that of the infrared thermal imaging image by linear interpolation, with the scaling factor Scale calculated as:

Scale = len_T / len_Z

where len_T is the number of horizontal pixels of the infrared thermal imaging image and len_Z is the number of horizontal pixels of the visible light image;
semantic segmentation is performed on the scaled visible light image using a pre-trained Segment Anything model;
the Segment Anything model takes two inputs, a prompt and a source image; the inputs are processed by a prompt encoder and an image encoder, and the encoded result is then processed by a mask decoder to obtain the final mask image.
3. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 2, wherein the specific method of step three is as follows:
without any prompt input, the Segment Anything model of step two segments the input scaled visible light image and the input infrared thermal imaging image respectively, yielding two segmentation results; based on the two results, transparent images the same size as the source visible light image and the infrared thermal imaging image are generated respectively, each segmented block in the results is filled with a random color to form an initial mask image, and the initial mask image is drawn onto the corresponding transparent image to obtain the whole mask image.
4. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 3, wherein the specific method of step four is as follows:
edge information of the whole mask image obtained in step three is extracted with the Sobel operator, and non-maximum suppression is applied to the obtained edge information to obtain the edge image; specifically, for each pixel in the edge information, the gradients along the x and y directions are first calculated to obtain the pixel's gradient direction; taking that pixel as the center pixel, the maximum along the gradient direction must lie within its 8-neighborhood; the gradient values of the sub-pixel points at the two possible maximum positions along the gradient direction are calculated by linear interpolation and compared with the center pixel; if the center pixel is the maximum, it is retained as an edge point, otherwise it is discarded.
5. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 4, wherein the specific method of step five is as follows:
Gaussian blur is applied to the edge image extracted in step four: for each scale σ, a different Gaussian kernel G(σ) is convolved with the edge image to obtain convolution images of different scales; for each scale in the same modality, two Gaussian difference images are generated for detecting extreme points; the Gaussian difference images are Gaussian-smoothed and a difference-of-Gaussians pyramid is constructed, giving images of different scales in the same modality;
for each layer of the difference-of-Gaussians pyramid, extreme points are detected using the DoG pyramid to obtain feature points;
for each feature point, a Gaussian fit is used to determine its position and scale;
for each feature point, the gradient magnitude and direction of the surrounding image area are computed, an orientation histogram is accumulated, and the direction of its maximum is taken as the main direction of the feature point;
for each feature point, a descriptor describes its features: the image area near the feature point is divided into several sub-areas, the gradient magnitude and direction of each sub-area are accumulated and weighted with a Gaussian weighting function to obtain a feature vector, finally yielding a high-dimensional vector of 128 elements, each element representing the gradient value in one direction.
6. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 5, wherein the specific method of step six is as follows:
using the FLANN algorithm, the high-dimensional vectors obtained in step five are mapped by random projection and locality-sensitive hashing into a K-Means tree built in a low-dimensional space; the search proceeds in the low-dimensional space in combination with a clustering method, and the search results are then mapped back to the original high-dimensional space to find the nearest-neighbor matching points, forming matching point pairs.
7. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 1 or 6, wherein the specific method of step seven is as follows:
outlier elimination and transformation matrix calculation are performed on the obtained matching point pairs using OpenCV.
8. The unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation according to claim 7, wherein the specific method of step eight is as follows:
the coordinate information of the equipment whose temperature is to be measured is obtained from the visible light image, the coordinates are mapped into the infrared thermal imaging image coordinate system through the transformation matrix, and the temperature value is read according to the transformed coordinate information using the SDK (software development kit) provided by the lens manufacturer for reading image information in its specific format.
CN202311501960.1A 2023-11-13 2023-11-13 Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation Pending CN117314986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311501960.1A CN117314986A (en) 2023-11-13 2023-11-13 Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311501960.1A CN117314986A (en) 2023-11-13 2023-11-13 Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation

Publications (1)

Publication Number Publication Date
CN117314986A true CN117314986A (en) 2023-12-29

Family

ID=89250012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311501960.1A Pending CN117314986A (en) 2023-11-13 2023-11-13 Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN117314986A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117470859A (en) * 2023-12-25 2024-01-30 广州中科智巡科技有限公司 Insulator internal defect detection method and device
CN117470859B (en) * 2023-12-25 2024-03-22 广州中科智巡科技有限公司 Insulator internal defect detection method and device

Similar Documents

Publication Publication Date Title
CN108230237B (en) Multispectral image reconstruction method for electrical equipment online detection
CN108564065B (en) Cable tunnel open fire identification method based on SSD
Treible et al. Cats: A color and thermal stereo benchmark
CN111223133B (en) Registration method of heterogeneous images
Bouchiha et al. Automatic remote-sensing image registration using SURF
CN111339951A (en) Body temperature measuring method, device and system
CN111507976B (en) Defect detection method and system based on multi-angle imaging
TWI521448B (en) Vehicle identification system and method
CN105335973A (en) Visual processing method for strip steel processing production line
CN117314986A (en) Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation
CN115359021A (en) Target positioning detection method based on laser radar and camera information fusion
CN110634137A (en) Bridge deformation monitoring method, device and equipment based on visual perception
CN111563896B (en) Image processing method for detecting abnormality of overhead line system
CN116844147A (en) Pointer instrument identification and abnormal alarm method based on deep learning
CN113963067B (en) Calibration method for calibrating large-view-field visual sensor by using small target
CN113705564B (en) Pointer type instrument identification reading method
CN104966283A (en) Imaging layered registering method
CN113221805B (en) Method and device for acquiring image position of power equipment
Li et al. Research on the infrared and visible power-equipment image fusion for inspection robots
Knyaz et al. Joint geometric calibration of color and thermal cameras for synchronized multimodal dataset creating
CN112508947A (en) Cable tunnel abnormity detection method
CN116977316A (en) Full-field detection and quantitative evaluation method for damage defects of complex-shape component
CN104484647B (en) A kind of high-resolution remote sensing image cloud height detection method
CN109544608B (en) Unmanned aerial vehicle image acquisition characteristic registration method
CN111798476B (en) Extraction method for conductive arm axis of high-voltage isolating switch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination