CN113569727A

CN113569727A - Method, system, terminal and medium for identifying construction site in remote sensing image

Info

Publication number: CN113569727A
Application number: CN202110851731.7A
Authority: CN
Inventors: 王彤; 魏瑞增; 王磊; 饶章权; 黄勇; 周恩泽; 刘淑琴; 田翔; 罗颖婷; 鄂盛龙; 许海林; 江俊飞; 石墨; 李晖; 成国雄; 范亚洲
Original assignee: Guangdong Power Grid Co Ltd; Electric Power Research Institute of Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd; Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-10-29
Anticipated expiration: 2041-07-27
Also published as: CN113569727B

Abstract

The application discloses a method, a system, a terminal and a medium for identifying a construction site in a remote sensing image. The method comprises: performing data preprocessing on an acquired remote sensing image; inputting the preprocessing result into a CenterNet model for feature extraction; performing prediction on the extracted feature map; and decoding the prediction result to generate a recognition result. Because the method is based on the CenterNet framework, it improves recognition speed, recognition precision and flexibility in detecting objects of complex shape. The Hourglass-104 network is improved with depthwise separable convolution, which greatly reduces the number of parameters and the amount of computation while keeping precision essentially the same as with standard convolution, thereby improving network detection efficiency. An ECA module replaces the cross-layer connections of the original Hourglass-104 network, which suppresses background information in the image and extracts more target feature information, ultimately greatly improving the detection precision of the model.

Description

Method, system, terminal and medium for identifying construction site in remote sensing image

Technical Field

The application relates to the technical field of artificial intelligence and image recognition, in particular to a method, a system, a terminal and a medium for recognizing a construction site in a remote sensing image.

Background

With the growing use of satellite remote sensing images in the power industry, automatic inspection of power transmission lines based on satellite remote sensing images and image processing technology has become possible. Construction sites are a common target in satellite remote sensing images and have an important influence on the safety of power transmission lines. During construction, improper operation by workers can easily cause towers to collapse or conductors to be touched, leading to transmission line safety accidents. Automatic, intelligent identification of unauthorized construction sites in power transmission line corridors is therefore of great significance.

At present, there are two main approaches to identifying construction sites in power transmission line corridors. The first extracts vehicle regions according to the color and shape features of vehicles, and then identifies the vehicles using a histogram of oriented gradients (HOG) and a support vector machine (SVM). However, because the background of satellite remote sensing images is complex and the shapes of construction site targets are complex, extracting and identifying target regions solely on the basis of manually specified target colors and shapes leads to a high false detection rate and poor robustness. The second is a construction vehicle detection algorithm for unmanned aerial vehicle (UAV) aerial images based on the MobileNet-SSD model. To improve computational efficiency, it replaces VGG-16 with the lightweight network MobileNet, and it improves detection precision by using dilated convolution to enlarge the receptive field of the shallow feature maps. However, the SSD network requires the initial values of the prior (anchor) boxes to be set manually, which makes tuning heavily dependent on experience. The method also introduces too many parameters, resulting in heavy noise, a large amount of computation and long training time.

Disclosure of Invention

The application aims to provide a method, a system, a terminal and a medium for identifying a construction site in a remote sensing image, so as to solve the problems of heavy noise, a large amount of computation and long training time in remote sensing image identification in the prior art.

In order to overcome the defects in the prior art, the application provides a method for identifying a construction site in a remote sensing image, which comprises the following steps:

carrying out data preprocessing on the obtained remote sensing image;

inputting the processing result into an improved CenterNet model for feature extraction;

predicting the extracted feature map;

and decoding the prediction result to generate a recognition result.

Further, the method for identifying a construction site in a remote sensing image further comprises: replacing some of the convolutions in the residual modules of the Hourglass-104 network in the CenterNet model with depthwise separable convolutions.

Further, the method for identifying a construction site in a remote sensing image further comprises: replacing the cross-layer connections of the Hourglass-104 network with the channel attention module ECA.

Further, the predicting of the extracted feature map includes: performing target center point prediction, center point offset prediction and target scale prediction on the feature map.

The application also provides a system for identifying a construction site in a remote sensing image, comprising:

a preprocessing unit for performing data preprocessing on the acquired remote sensing image;

a feature extraction unit for inputting the processing result into the improved CenterNet model for feature extraction;

a prediction unit for predicting the extracted feature map;

and an identification unit for decoding the prediction result to generate an identification result.

Further, the feature extraction unit is further configured to replace some of the convolutions in the residual modules of the Hourglass-104 network in the CenterNet model with depthwise separable convolutions.

Further, the feature extraction unit is further configured to replace a cross-layer connection of the Hourglass-104 network with a channel attention module ECA.

Further, the prediction unit is further configured to perform target center point prediction, center point offset prediction, and target scale prediction on the feature map.

The present application further provides a terminal device, including: a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for identifying a construction site in a remote sensing image described above.

The application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for identifying a construction site in a remote sensing image described above.

Compared with the prior art, the beneficial effects of this application lie in:

1) The application adopts CenterNet, a recent anchor-free detection framework. As a single-stage object detection model, it offers a better balance between speed and precision than typical models such as SSD and YOLOv3; in addition, it uses a Gaussian kernel to represent the center of an object, which improves flexibility in detecting objects of complex shape;

2) The Hourglass-104 network is improved with depthwise separable convolution, which greatly reduces the number of parameters and the amount of computation while keeping precision approximately the same as with standard convolution, thereby improving network detection efficiency;

3) The improved feature extraction network, ECA-Hourglass-104, suppresses background information in the image and extracts more target feature information, greatly improving the detection precision of the model.
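As an illustration of item 1), the following minimal Python sketch renders one object center as a Gaussian peak on a heatmap, in the manner commonly used with CenterNet. The function name `gaussian_center_heatmap`, the grid size and the value of `sigma` are assumptions made for illustration; the application does not specify them.

```python
import math


def gaussian_center_heatmap(height, width, center, sigma):
    """Render one object center as a 2-D Gaussian peak on a height x width grid.

    center is (cx, cy) in grid coordinates; sigma controls the spread
    (hypothetical value here, normally derived from the object size).
    """
    cx, cy = center
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            squared_distance = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


# Toy example: a 5 x 5 heatmap with the object center at (2, 2).
hm = gaussian_center_heatmap(5, 5, center=(2, 2), sigma=1.0)
```

The value is 1.0 at the center cell and decays smoothly with distance, so the target is described by a point and its neighbourhood rather than by a preset box shape.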

Drawings

In order to illustrate the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for identifying a construction site in a remote sensing image according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a CenterNet model according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a depthwise separable convolution residual module according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an ECA channel attention module according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an Hourglass network architecture improved with ECA according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a system for identifying a construction site in a remote sensing image according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be understood that the step numbers used herein are for convenience of description only and are not used as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

In a first aspect:

Referring to FIG. 1, an embodiment of the present application provides a method for identifying a construction site in a remote sensing image, including:

S10, performing data preprocessing on the acquired remote sensing image.

In this step, since the acquired remote sensing image data set is large and contains a lot of noise irrelevant to the recognition target, the acquired remote sensing image is first subjected to data preprocessing. As shown in FIG. 2, in the improved CenterNet model structure provided in this embodiment, the remote sensing image is first input into the preprocessing module, wherein the parameters "constraint" and "residual constraint" of the preprocessing module are both set to 2.

S20, inputting the processing result into the improved CenterNet model for feature extraction.

In this step, feature extraction is performed mainly by the feature extraction network ECA-Hourglass-104 in the improved CenterNet model structure.

In one embodiment, the acquired satellite remote sensing image data set is large, the image resolution is high and the background is complex, so processing involves a large amount of computation and training takes a long time. This embodiment therefore improves Hourglass-104 using depthwise separable convolution.

It should be noted that the Hourglass-104 network consists of two hourglass-shaped Hourglass networks connected in series, and each Hourglass network is built from residual modules. The convolution operations in the residual modules generate a large amount of computation during network training, so this embodiment replaces some of the convolutions in the residual modules with depthwise separable convolutions, which require relatively little computation. The improved residual module is shown in FIG. 3.

Further, the basic idea of depthwise separable convolution is to split a standard convolution into two layers. The first layer is a depthwise (channel-by-channel) convolution, which performs lightweight filtering by applying a single convolution filter to each input channel; the second layer, called pointwise convolution, builds new features by computing linear combinations of the input channels. When the network is highly complex, depthwise separable convolution saves a large amount of computation with almost no effect on detection precision, thereby improving network detection efficiency.
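As a non-limiting illustration of the two-layer split described above, the following Python sketch implements a depthwise convolution followed by a pointwise convolution on nested lists. The function names, the toy input and the kernel values are assumptions made for illustration and are not taken from the application; padding and stride handling are omitted for brevity.

```python
def depthwise_conv(x, dw_kernels):
    """First layer: one spatial filter per input channel (no padding, stride 1).

    x is [C][H][W]; dw_kernels is [C][kh][kw].
    """
    out = []
    for channel, kernel in zip(x, dw_kernels):
        kh, kw = len(kernel), len(kernel[0])
        h_out = len(channel) - kh + 1
        w_out = len(channel[0]) - kw + 1
        plane = [[sum(channel[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
                  for j in range(w_out)]
                 for i in range(h_out)]
        out.append(plane)
    return out


def pointwise_conv(x, pw_weights):
    """Second layer: 1 x 1 convolution, a linear combination of input channels.

    x is [C][H][W]; pw_weights is [K][C], one weight row per output channel.
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wgt * x[c][i][j] for c, wgt in enumerate(weights))
              for j in range(w)]
             for i in range(h)]
            for weights in pw_weights]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise filtering followed by pointwise channel mixing."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Toy input: 2 channels of size 3 x 3 (illustrative values only).
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
dw = [
    [[1, 0], [0, 1]],  # channel 0: sum of the window's main diagonal
    [[1, 1], [1, 1]],  # channel 1: sum of the whole 2 x 2 window
]
pw = [[1, 1], [1, -1], [0, 2]]  # 3 output channels mixing the 2 inputs
y = depthwise_separable_conv(x, dw, pw)
```

Here `y` has three output channels of size 2 x 2. The spatial filtering touches each channel independently, and only the 1 x 1 step mixes information across channels.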

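The amount of computation of the depthwise separable convolution compares with that of the standard convolution as follows (for a 3 × 3 convolution kernel):

$$\frac{h \cdot w \cdot c \cdot 3 \cdot 3 + h \cdot w \cdot c \cdot k}{h \cdot w \cdot c \cdot k \cdot 3 \cdot 3} = \frac{9 + k}{9k}$$

In the formula, h × w × c is the dimension of the input feature map, and k is the number of convolution kernels of the pointwise convolution. By comparison, the amount of computation of the depthwise separable convolution is only (9 + k)/(9k) of that of the standard convolution.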
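The ratio stated above can be checked numerically. The following sketch counts multiplications for both variants, assuming a 3 × 3 kernel (implied by the factor 9); the function names and the example dimensions are assumptions chosen for illustration, not values from the application.

```python
from fractions import Fraction


def standard_conv_mults(h, w, c, k, kernel=3):
    """Multiplications for a standard convolution with k kernels of size kernel x kernel x c."""
    return h * w * c * k * kernel * kernel


def depthwise_separable_mults(h, w, c, k, kernel=3):
    """Multiplications for a depthwise convolution plus a pointwise convolution."""
    depthwise = h * w * c * kernel * kernel  # one spatial filter per input channel
    pointwise = h * w * c * k                # k linear combinations of c channels
    return depthwise + pointwise


def cost_ratio(h, w, c, k, kernel=3):
    """Exact ratio of depthwise separable cost to standard convolution cost."""
    return Fraction(depthwise_separable_mults(h, w, c, k, kernel),
                    standard_conv_mults(h, w, c, k, kernel))


# Illustrative dimensions (assumed): a 128 x 128 feature map, 256 channels in and out.
ratio = cost_ratio(h=128, w=128, c=256, k=256)
print(ratio, float(ratio))
```

For k = 256 the ratio is 265/2304, so the separable form needs roughly 11.5% of the multiplications of the standard convolution; the ratio does not depend on h, w or c.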

In one embodiment, the acquired satellite remote sensing image covers a large observation area and has a complex background, so during training the feature extraction network extracts a large amount of feature information irrelevant to the target and the network converges slowly. This embodiment therefore introduces the lightweight and efficient channel attention module ECA to improve the ability of the feature extraction network to extract target information.

Specifically, the ECA module is shown in FIG. 4. First, after channel-wise global average pooling without dimensionality reduction, the ECA module captures local cross-channel interaction by considering each channel together with its k neighbors, which avoids the information loss that dimensionality reduction would cause. Then, the resulting weights are multiplied with the feature map so that the channel feature weights are redistributed: ineffective features are suppressed and the weights of effective features are enhanced.
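The ECA computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `eca`, the fixed kernel values and the toy feature map are hypothetical, and the sigmoid gating follows the commonly published form of ECA rather than anything specified in the application. In a trained network the 1-D kernel weights are learned.

```python
import math


def eca(feature_map, conv_weights):
    """Apply ECA-style channel attention to a [C][H][W] feature map.

    conv_weights is an odd-length 1-D kernel shared by all channels
    (fixed here for illustration; learned in practice).
    Returns the rescaled feature map and the per-channel weights.
    """
    channels = len(feature_map)
    # 1) Channel-wise global average pooling; all C channels are kept.
    pooled = [sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
              for plane in feature_map]
    # 2) 1-D convolution over each channel and its neighbours (zero padding).
    half = len(conv_weights) // 2
    mixed = []
    for c in range(channels):
        acc = 0.0
        for position, wgt in enumerate(conv_weights):
            idx = c + position - half
            if 0 <= idx < channels:
                acc += wgt * pooled[idx]
        mixed.append(acc)
    # 3) Sigmoid turns the mixed responses into weights in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Multiply every channel of the feature map by its weight.
    scaled = [[[value * wt for value in row] for row in plane]
              for plane, wt in zip(feature_map, weights)]
    return scaled, weights


# Toy feature map: 3 channels of size 2 x 2 with constant values 0, 2 and 4.
fmap = [[[float(v)] * 2 for _ in range(2)] for v in (0, 2, 4)]
scaled, weights = eca(fmap, conv_weights=[0.5, 1.0, 0.5])
```

Channels with stronger pooled responses (and stronger neighbours) receive weights closer to 1, while weak channels are scaled down, which is the redistribution of channel weights described in the text.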

Analysis of the network structure shows that the topology of each Hourglass network is symmetrical, and that mutually symmetrical modules have the same resolution and are joined by cross-layer connections. If ECA is used to replace the simple cross-layer connections of the original network, the weights of effective features are enhanced and ineffective features are suppressed during feature extraction, so that a higher-quality feature map is obtained. The improved Hourglass network is shown in FIG. 5.

S30, predicting the extracted feature map.

In this step, the feature map output by the backbone enters a target center point prediction branch, a center point offset prediction branch and a target scale prediction branch respectively, so that target center point prediction, center point offset prediction and target scale prediction are performed and the corresponding prediction results are generated.

S40, decoding the prediction result to generate a recognition result.

In this step, the prediction results obtained from the three branches are decoded, which converts each predicted point into a bounding box and finally generates the recognition result.
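The point-to-box conversion can be sketched as follows. This is a minimal illustration of the decoding commonly used with CenterNet, not the application's exact procedure: the function name `decode_centers`, the score threshold, the output stride of 4, the 3 × 3 peak test and the convention that offsets and sizes are expressed in feature-map cells are all assumptions.

```python
def decode_centers(heatmap, offsets, sizes, threshold=0.3, stride=4):
    """Convert center-point predictions into bounding boxes.

    heatmap: [H][W] center scores; offsets: [H][W] of (dx, dy);
    sizes: [H][W] of (w, h). Returns (x1, y1, x2, y2, score) tuples
    in input-image pixels (assumed convention: feature cells * stride).
    """
    height, width = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep only local maxima of the 3 x 3 neighbourhood (peak test).
            neighbours = [heatmap[j][i]
                          for j in range(max(0, y - 1), min(height, y + 2))
                          for i in range(max(0, x - 1), min(width, x + 2))]
            if score < max(neighbours):
                continue
            dx, dy = offsets[y][x]
            w, h = sizes[y][x]
            # Refine the center with the offset, then scale to image pixels.
            cx, cy = (x + dx) * stride, (y + dy) * stride
            bw, bh = w * stride, h * stride
            boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score))
    return boxes


# Toy predictions on a 4 x 4 feature map (illustrative values only).
heatmap = [[0.1] * 4 for _ in range(4)]
heatmap[2][1] = 0.9   # true peak
heatmap[2][2] = 0.5   # above threshold but not a local maximum
offsets = [[(0.0, 0.0)] * 4 for _ in range(4)]
offsets[2][1] = (0.5, 0.25)
sizes = [[(1.0, 1.0)] * 4 for _ in range(4)]
sizes[2][1] = (2.0, 1.0)
boxes = decode_centers(heatmap, offsets, sizes)
```

Because only local peaks are kept, this style of decoding needs no preset anchor boxes; each surviving center yields exactly one box built from its offset and size predictions.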

The method for identifying a construction site in a remote sensing image provided by the embodiment of the application is based on the CenterNet framework and improves recognition speed, recognition precision and flexibility in detecting objects of complex shape. The Hourglass-104 network is improved with depthwise separable convolution, which greatly reduces the number of parameters and the amount of computation while keeping precision essentially the same as with standard convolution, thereby improving network detection efficiency. The ECA module replaces the cross-layer connections of the original Hourglass-104 network, which suppresses background information in the image and extracts more target feature information, ultimately greatly improving the detection precision of the model.

In a second aspect:

Referring to FIG. 6, an embodiment of the present application further provides a system for identifying a construction site in a remote sensing image, including:

the preprocessing unit 01 is used for preprocessing data of the acquired remote sensing image;

the feature extraction unit 02 is used for inputting the processing result into the CenterNet model to perform feature extraction;

a prediction unit 03 configured to predict the extracted feature map;

and the identification unit 04 is used for decoding the prediction result and generating an identification result.

In one embodiment, the feature extraction unit 02 is further configured to replace some of the convolutions in the residual modules of the Hourglass-104 network in the CenterNet model with depthwise separable convolutions.

In one embodiment, the feature extraction unit 02 is further configured to replace the cross-layer connections of the Hourglass-104 network with the channel attention module ECA.

In one embodiment, the prediction unit 03 is further configured to perform target center point prediction, center point offset prediction and target scale prediction on the feature map.

The system for identifying a construction site in a remote sensing image is used to execute the method for identifying a construction site in a remote sensing image described above. The method is based on the CenterNet framework and improves recognition speed, recognition precision and flexibility in detecting objects of complex shape. The Hourglass-104 network is improved with depthwise separable convolution, which greatly reduces the number of parameters and the amount of computation while keeping precision essentially the same as with standard convolution, thereby improving network detection efficiency. The ECA module replaces the cross-layer connections of the original Hourglass-104 network, which suppresses background information in the image and extracts more target feature information, ultimately greatly improving the detection precision of the model.

In a third aspect:

Referring to FIG. 7, an embodiment of the present application further provides a terminal device, where the terminal device includes:

a processor, a memory, and a bus;

the bus is used for connecting the processor and the memory;

the memory is used for storing operation instructions;

the processor is configured to call the operation instructions, and the operation instructions enable the processor to perform operations corresponding to the method for identifying a construction site in a remote sensing image according to the first aspect of the application.

In an alternative embodiment, a terminal device is provided. As shown in FIG. 7, the terminal device includes a processor 001 and a memory 003, where the processor 001 is coupled to the memory 003, for example via a bus 002. Optionally, the terminal device may also include a transceiver 004. It should be noted that, in practical applications, the number of transceivers 004 is not limited to one, and the structure of the terminal device does not constitute a limitation on the embodiments of the present application.

The processor 001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules and circuits described in connection with this disclosure. The processor 001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 002 may include a path to transfer information between the aforementioned components. The bus 002 may be a PCI bus or an EISA bus, etc. The bus 002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.

The memory 003 can be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 003 is used for storing the application program code for executing the solution of the present application, and execution is controlled by the processor 001. The processor 001 is configured to execute the application program code stored in the memory 003 to implement any of the method embodiments described above.

Wherein, the terminal device includes but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like.

Yet another embodiment of the present application provides a computer-readable storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform the corresponding steps of the foregoing method embodiments.

The foregoing is a preferred embodiment of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and these modifications and refinements also fall within the protection scope of the present application.

Claims

1. A method for identifying a construction site in a remote sensing image, characterized by comprising the following steps:

carrying out data preprocessing on the obtained remote sensing image;

inputting the processing result into a CenterNet model for feature extraction;

predicting the extracted feature map;

and decoding the prediction result to generate a recognition result.

2. The method for identifying a construction site in a remote sensing image according to claim 1, further comprising: replacing some of the convolutions in the residual modules of the Hourglass-104 network in the CenterNet model with depthwise separable convolutions.

3. The method for identifying a construction site in a remote sensing image according to claim 2, further comprising: replacing the cross-layer connections of the Hourglass-104 network with the channel attention module ECA.

4. The method for identifying a construction site in a remote sensing image according to any one of claims 1 to 3, wherein the predicting of the extracted feature map comprises: performing target center point prediction, center point offset prediction and target scale prediction on the feature map.

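5. A system for identifying a construction site in a remote sensing image, characterized by comprising:

a preprocessing unit for performing data preprocessing on the acquired remote sensing image;

a feature extraction unit for inputting the processing result into a CenterNet model for feature extraction;

a prediction unit for predicting the extracted feature map;

and an identification unit for decoding the prediction result to generate an identification result.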

6. The system for identifying a construction site in a remote sensing image according to claim 5, wherein the feature extraction unit is further configured to replace some of the convolutions in the residual modules of the Hourglass-104 network in the CenterNet model with depthwise separable convolutions.

7. The system for identifying a construction site in a remote sensing image according to claim 6, wherein the feature extraction unit is further configured to replace a cross-layer connection of the Hourglass-104 network with a channel attention module ECA.

8. The system for identifying a construction site in a remote sensing image according to any one of claims 5 to 7, wherein the prediction unit is further configured to perform target center point prediction, center point offset prediction and target scale prediction on the feature map.

9. A terminal device, comprising: a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of identifying a construction site in a remote sensing image according to any one of claims 1 to 4 when executing the computer program.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the method for identifying a construction site in a remote sensing image according to any one of claims 1 to 4.