CN116740547A - Digital twinning-based substation target detection method, system, equipment and medium - Google Patents

Digital twinning-based substation target detection method, system, equipment and medium

Info

Publication number
CN116740547A
Authority
CN
China
Prior art keywords
target detection
substation
model
module
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310787855.2A
Other languages
Chinese (zh)
Inventor
李文琢
于若颜
常乃超
张炜
张金虎
张海燕
赵娜
刘筱萍
王化鹏
李亚蕾
李昂
纪欣
崔旭
姜佳宁
李劲松
沈艳
赵铭洋
南祎
刘洋
彭聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
Nanjing University of Aeronautics and Astronautics
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics, China Electric Power Research Institute Co Ltd CEPRI filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310787855.2A priority Critical patent/CN116740547A/en
Publication of CN116740547A publication Critical patent/CN116740547A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A digital twin-based method, system, device and medium for substation target detection comprise: collecting substation image data and constructing a training data set, a validation data set and a test data set; training a pre-established target detection network model with the training data set, wherein the target detection network model is obtained by integrating a MobileNetV2 model, a YOLOv5 model and an OpenPose model; validating the trained target detection network model with the validation data set, and saving the target detection network model with the best performance index; and inputting the test data set into the target detection network model with the best performance index and outputting the substation target recognition result. The method can accurately identify the real-time state of the basic environmental parameters of the substation digital twin system (namely environment, equipment and operators), offers good accuracy, robustness and real-time performance, and realizes dynamic synchronization between physical equipment in the actual substation scene and its virtual representation.

Description

Digital twinning-based substation target detection method, system, equipment and medium
Technical Field
The application belongs to the technical field of intelligent substations, and particularly relates to a digital twin-based substation target detection method, system, equipment and medium.
Background
Against the background of the digital information age, advancing the construction of substation digital twin systems brings substation operation, management and services into the virtual domain: through modeling, simulation, deduction and control in virtual space, virtual control is realized, the substation's capabilities for self-perception, self-decision and self-evolution are strengthened, and the digital and intelligent transformation of substations is promoted. This is an inevitable stage and a necessary path in building an energy Internet enterprise.
At present, daily substation operation and maintenance involve a large volume of field work and frequent personnel scheduling. To ensure production safety, the actual environment, the equipment and the operators must all be treated as essential elements when constructing the substation digital twin model. A substation is characterized by a wide field, a complex environment and numerous devices, while operator behavior is highly autonomous and uncertain. Under the influence of illumination conditions, shooting distance, camera angle and similar factors, and with target pixels that are very small and feature information that is sparse, existing algorithms struggle to identify targets accurately and quickly. The construction of an intelligent substation digital twin system therefore requires a high-precision, high-speed and highly robust detection algorithm for multiple types of small targets in large-scale complex scenes.
Disclosure of Invention
The application aims to solve the above problems in the prior art and provides a digital twin-based substation target detection method, system, equipment and medium that can detect static objects and recognize the body posture of operators in actual substation scenes with wide fields and numerous devices, with good accuracy, robustness and real-time performance.
In order to achieve the above purpose, the present application has the following technical scheme:
in a first aspect, a method for detecting a target of a substation based on digital twinning is provided, including:
substation image data are collected and input into a target detection network model, wherein the target detection network model is obtained by integrating a MobileNetV2 model, a YOLOv5 model and an OpenPose model;
and outputting a substation target identification result by the target detection network model.
Preferably, a training data set, a verification data set and a test data set are constructed from the collected substation image data;
training a pre-established target detection network model by utilizing the training data set;
verifying the trained target detection network model by using the verification data set, and storing the target detection network model with the optimal performance index;
inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result;
the substation image data is used for shooting the substation environment in multiple directions, multiple angles and multiple scenes through the camera equipment, adjusting the size of the image to 320 multiplied by 320, and finishing data annotation.
Preferably, the target detection network model comprises a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module;
the feature extraction module uses a depth separable convolutional network of the mobilenet v2 model to perform feature extraction by replacing CSPDarknet53 in the YOLOv5 model; the feature fusion module integrates and fuses the feature images extracted by the shallow convolution of the MobileNet v2 network and the feature images extracted by the deep convolution; the target detection module is a YOLOv5 model, inputs a feature map extracted by a MobileNet v2 network, and outputs a substation static object identification result; the gesture recognition module adopts an Openphase model, replaces a VGG-19 network by the feature fusion module to perform feature extraction, and outputs a substation operator gesture recognition result.
Preferably, the feature extraction module has three basic convolution modules: the first is an expansion convolution module, which uses a 1×1 convolution to expand the number of channels of the input data; the second is a depthwise convolution module, which filters the input from the previous module using a 3×3 convolution without a pooling layer; the third is a projection convolution module, which projects high-dimensional data into low-dimensional data using a 1×1 convolution;
the first module and the second module use a linear activation function instead of a ReLU activation function.
Preferably, the feature fusion module uses dilated convolution to reduce the size of the shallow feature map, with the output size given by:

S_out = (S_in + 2α − r(k − 1) − 1) / l + 1

where α is the number of padding pixels, r is the dilation rate, l is the stride, k is the size of the convolution kernel, S_out is the size of the output feature map, and S_in is the size of the input feature map;

and employs the standard deconvolution of the YOLOv5 model to increase the size of the deep feature map while compressing the number of channels of the feature map according to:

C_d = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} E_dhw

where C_d is the output of the d-th channel of the feature map, E_dhw is the pixel in row h and column w of the d-th channel, and H × W is the size of the picture;

the deep feature map and the shallow feature map are fused by re-distributing weights and spliced into a new feature map f_new.
Preferably, in the target detection module, the final output of the MobileNetV2 network is a 10×10 feature map, from which the YOLOv5 model performs detection and outputs Pre_i1; the 10×10 feature map is then upsampled and fused with the 20×20 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 20×20 feature map and outputs Pre_i2; similarly, the 20×20 feature map is upsampled and fused with the 30×30 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 30×30 feature map and outputs Pre_i3; finally, the YOLOv5 model combines Pre_i1, Pre_i2 and Pre_i3 and outputs the recognition result for static objects in the substation digital twin scene.
Preferably, the OpenPose model of the gesture recognition module comprises two parallel convolution networks: one network, Branch1, outputs Part Confidence Maps and is used to locate human body key points; the other, Branch2, outputs Part Affinity Fields and is used to connect body key points into limbs; the whole OpenPose network comprises multiple stages, the output of each stage is compared with the label to compute an L2 loss function, and the loss function of the whole network is the sum of the loss functions computed at each stage.
In a second aspect, a digital twinning-based substation target detection system is provided, including:
the substation image data acquisition module is used for acquiring substation image data and inputting them into the target detection network model, wherein the target detection network model is obtained by integrating the MobileNetV2, YOLOv5 and OpenPose models;
and the identification result output module is used for outputting a substation target identification result by the target detection network model.
Preferably, the substation image data acquisition module acquires substation image data and constructs a training data set, a verification data set and a test data set;
training a pre-established target detection network model by utilizing the training data set;
verifying the trained target detection network model by using the verification data set, and storing the target detection network model with the optimal performance index;
inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result;
the substation image data is used for shooting the substation environment in multiple directions, multiple angles and multiple scenes through the camera equipment, adjusting the size of the image to 320 multiplied by 320, and finishing data annotation.
Preferably, the target detection network model comprises a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module;
the feature extraction module uses a depth separable convolutional network of the mobilenet v2 model to perform feature extraction by replacing CSPDarknet53 in the YOLOv5 model; the feature fusion module integrates and fuses the feature images extracted by the shallow convolution of the MobileNet v2 network and the feature images extracted by the deep convolution; the target detection module is a YOLOv5 model, inputs a feature map extracted by a MobileNet v2 network, and outputs a substation static object identification result; the gesture recognition module adopts an Openphase model, replaces a VGG-19 network by the feature fusion module to perform feature extraction, and outputs a substation operator gesture recognition result.
In a third aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the digital twinning-based substation target detection method when executing the computer program.
In a fourth aspect, a computer readable storage medium is provided, where a computer program is stored, where the computer program, when executed by a processor, implements the digital twinning-based substation target detection method.
Compared with the prior art, the application has at least the following beneficial effects:
the digital twinning-based substation target detection method can finish detection of static objects and human body gesture recognition of operators in a substation actual scene with wide field and numerous devices, and has good accuracy, robustness and real-time performance. In order to improve the accuracy of simulation and prediction results of a digital twin system of an intelligent substation, the target detection network model is obtained based on integration of a MobileNet v2 model, a YOLOv5 model and an Openpost model, and mainly solves the problem of two real-time target recognition in practical application; secondly, remote human body gesture recognition, which is an important dynamic environment parameter in digital twinning, has high autonomy and uncertainty of the operator's behavior, and the whole body characteristics of the operator are not generally available due to the different angles and distances of the cameras. Compared with the traditional algorithm based on the key points of the human bones, the target detection network model can objectively depict the gesture and the behavior characteristics of an operator.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a substation target detection method based on digital twinning in an embodiment of the application;
FIG. 2 is a schematic diagram of a target detection network model according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Referring to fig. 1, the method for detecting a target of a transformer substation based on digital twin provided by the embodiment of the application is used for accurately identifying real-time states of basic environmental parameters (i.e. environment, equipment and operators) of a digital twin system of the transformer substation, and aims to realize dynamic synchronization between physical equipment and virtual representation thereof, and specifically comprises the following steps:
s1, collecting substation image data and constructing a training data set, a verification data set and a test data set;
s2, training a pre-established target detection network model by using a training data set, wherein the target detection network model is obtained by integrating a MobileNet v2 model, a YOLOv5 model and an Openpost model;
s3, verifying the trained target detection network model by using a verification data set, and storing the target detection network model with the optimal performance index;
and S4, inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result.
In one possible implementation, step S1 photographs the substation environment with camera equipment, and the photographs should cover multiple directions, multiple angles and multiple scenes. Considering the influence of factors such as camera resolution and illumination in actual detection, blurred images may be introduced artificially to strengthen the robustness of the algorithm. The captured images are resized to 320×320 and divided in a 7:2:1 ratio into three data sets for training, validation and testing, as sketched below.
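The following Python sketch illustrates this data preparation step only under stated assumptions: the directory layout, the JPEG extension and the helper name are illustrative, not part of the application.

```python
# Illustrative sketch: resize substation images to 320x320 and split them
# 7:2:1 into training, validation and test sets, as described above.
import random
from pathlib import Path

from PIL import Image

def prepare_dataset(src_dir: str, dst_dir: str, seed: int = 0) -> dict:
    paths = sorted(Path(src_dir).glob("*.jpg"))  # assumed JPEG inputs
    random.Random(seed).shuffle(paths)
    n = len(paths)
    splits = {
        "train": paths[: int(0.7 * n)],            # 70% training
        "val": paths[int(0.7 * n): int(0.9 * n)],  # 20% validation
        "test": paths[int(0.9 * n):],              # 10% testing
    }
    for name, split in splits.items():
        out = Path(dst_dir) / name
        out.mkdir(parents=True, exist_ok=True)
        for p in split:
            Image.open(p).resize((320, 320)).save(out / p.name)
    return {k: len(v) for k, v in splits.items()}
```

Annotation files would be copied alongside the images of each split; that bookkeeping is omitted here.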
Referring to fig. 2, in one possible implementation, the target detection network model is integrated from the MobileNetV2, YOLOv5 and OpenPose models and is used to identify the real-time state of the intelligent substation environment, equipment and operators. The established target detection network model comprises a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module.
The feature extraction module performs feature extraction by using the depthwise separable convolution network of the MobileNetV2 model in place of CSPDarknet53 in the YOLOv5 model; the feature fusion module integrates and fuses the feature maps extracted by the shallow convolutions of the MobileNetV2 network with those extracted by the deep convolutions; the target detection module is the YOLOv5 model, which takes as input the feature maps extracted by the MobileNetV2 network and outputs the recognition result for static substation objects; the gesture recognition module adopts an OpenPose model in which the feature fusion module replaces the VGG-19 network for feature extraction, and outputs the posture recognition result for substation operators.
Furthermore, because the feature extraction module uses the depthwise separable convolution network of the MobileNetV2 model in place of CSPDarknet53 in YOLOv5 for feature extraction, it can provide rich semantic information and effectively improve the real-time performance of small target detection. The input of this module is the annotated data from the data sets obtained in step S1, and its output is a feature map containing rich semantic information. Specifically, the feature extraction module includes three convolution modules. The first is an expansion convolution module, which uses a 1×1 convolution to expand the number of channels of the input data. The second is a depthwise convolution module, which filters the input from the previous module using a 3×3 convolution without a pooling layer. The third is a projection convolution module, which projects high-dimensional data into low-dimensional data using a 1×1 convolution. In addition, a linear activation function is used in the first and second modules in place of the original ReLU activation function to mitigate information loss and corruption. A sketch of such a block appears below.
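The following PyTorch sketch shows one way to assemble the three convolution modules just described; the channel sizes, the expansion factor and the use of batch normalization are illustrative assumptions, and the linear (identity) activations on the first two stages follow the text above rather than the stock MobileNetV2 block.

```python
# Illustrative sketch of the three-part feature extraction block:
# 1x1 expansion -> 3x3 depthwise filtering (no pooling) -> 1x1 projection.
import torch
import torch.nn as nn

class ExpandDepthwiseProject(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, expand: int = 6, stride: int = 1):
        super().__init__()
        mid = in_ch * expand
        # Expansion convolution: 1x1 conv widens the channel dimension;
        # linear activation (no ReLU) per the description above.
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
        )
        # Depthwise convolution: 3x3 conv filters each channel independently,
        # again with a linear activation and no pooling layer.
        self.depthwise = nn.Sequential(
            nn.Conv2d(mid, mid, kernel_size=3, stride=stride, padding=1,
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid),
        )
        # Projection convolution: 1x1 conv maps back to a low-dimensional space.
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(self.depthwise(self.expand(x)))
```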
The feature fusion module integrates and fuses the feature maps extracted by the shallow convolutions of the MobileNetV2 network with those extracted by the deep convolutions, fusing detail features from the shallow layers with semantic features from the deep layers; this saves computing resources and alleviates the gradient vanishing and performance degradation that occur in overly deep convolution. Specifically, the shallow feature map must be reduced in size while its receptive field is enlarged; this step is done with dilated convolution to avoid information loss, as shown in computational expression (1). Further, the standard deconvolution of the YOLOv5 model is employed to increase the size of the deep feature map while compressing the number of channels of the feature map in accordance with computational expression (2). Finally, the deep feature map and the shallow feature map are fused by re-distributing weights and spliced into a new feature map f_new. A sketch of the fusion appears after the expressions.

S_out = (S_in + 2α − r(k − 1) − 1) / l + 1 (1)

where α is the number of padding pixels, r is the dilation rate, l is the stride, k is the size of the convolution kernel, S_out is the size of the output feature map, and S_in is the size of the input feature map.

C_d = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} E_dhw (2)

where C_d is the output of the d-th channel of the feature map, E_dhw is the pixel in row h and column w of the d-th channel, and H × W is the size of the picture.
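The following PyTorch sketch, under assumed channel counts and layer parameters, shows such a fusion: a stride-2 dilated convolution shrinks the shallow map while enlarging its receptive field, a transposed convolution enlarges the deep map while compressing its channels, and learnable weights re-distribute the two branches before splicing f_new.

```python
# Illustrative sketch of the feature fusion module described above.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, shallow_ch: int = 32, deep_ch: int = 96, out_ch: int = 64):
        super().__init__()
        # Dilated conv; by expression (1) with S_in=40, alpha=2, r=2, k=3, l=2:
        # S_out = (40 + 4 - 2*(3-1) - 1)/2 + 1 = 20, so 40x40 -> 20x20.
        self.shrink = nn.Conv2d(shallow_ch, out_ch, kernel_size=3,
                                stride=2, padding=2, dilation=2)
        # Transposed conv enlarges the deep map (10x10 -> 20x20) while
        # compressing its channel count.
        self.grow = nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2)
        # Learnable weights for re-distributing the two branches.
        self.weights = nn.Parameter(torch.ones(2))

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        a, b = torch.softmax(self.weights, dim=0)
        # Splice the re-weighted maps into the new feature map f_new.
        return torch.cat([a * self.shrink(shallow), b * self.grow(deep)], dim=1)

f_new = FeatureFusion()(torch.randn(1, 32, 40, 40), torch.randn(1, 96, 10, 10))
assert f_new.shape == (1, 128, 20, 20)
```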
The target detection module is YOLOv5; it takes as input the feature maps extracted by the MobileNetV2 network and outputs the target detection result. Specifically, the final output of the MobileNetV2 network is a 10×10 feature map, from which YOLOv5 performs detection and outputs Pre_i1. The 10×10 feature map is upsampled and fused with the 20×20 feature map output by the preceding convolution layer, and YOLOv5 performs detection on the fused 20×20 feature map and outputs Pre_i2. Similarly, the 20×20 feature map is upsampled and fused with the 30×30 feature map output by the preceding convolution layer, and YOLOv5 performs detection on the fused 30×30 feature map and outputs Pre_i3. Finally, YOLOv5 combines Pre_i1, Pre_i2 and Pre_i3 and outputs the detection result for static small objects in the complex substation digital twin scene. In summary, the network module obtained by integrating MobileNetV2 and YOLOv5, denoted YOLOv5-Mv2, is used to detect and recognize static small targets in the substation digital twin system. The shape of this multi-scale path is sketched below.
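In the following sketch the 1×1 detection heads stand in for the actual YOLOv5 heads, and all channel counts are illustrative assumptions; only the upsample-fuse-detect structure follows the description above.

```python
# Illustrative sketch of the three-scale detection path: detect on the 10x10
# map, upsample and fuse with the 20x20 map, then with the 30x30 map, and
# run a detection head on each fused map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeScaleDetection(nn.Module):
    def __init__(self, ch: int = 64, out_ch: int = 255):
        super().__init__()
        self.head1 = nn.Conv2d(ch, out_ch, 1)      # Pre_i1 from the 10x10 map
        self.head2 = nn.Conv2d(2 * ch, out_ch, 1)  # Pre_i2 from the fused 20x20 map
        self.head3 = nn.Conv2d(3 * ch, out_ch, 1)  # Pre_i3 from the fused 30x30 map

    def forward(self, f10, f20, f30):
        pre1 = self.head1(f10)
        fused20 = torch.cat([F.interpolate(f10, size=f20.shape[-2:]), f20], dim=1)
        pre2 = self.head2(fused20)
        fused30 = torch.cat([F.interpolate(fused20, size=f30.shape[-2:]), f30], dim=1)
        pre3 = self.head3(fused30)
        return pre1, pre2, pre3  # combined downstream into the final result

m = ThreeScaleDetection()
outs = m(torch.randn(1, 64, 10, 10), torch.randn(1, 64, 20, 20),
         torch.randn(1, 64, 30, 30))
assert [o.shape[-1] for o in outs] == [10, 20, 30]
```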
The gesture recognition module is an OpenPose network. To strengthen the nonlinear fitting capability of the network and further improve the accuracy of remote human posture recognition, the feature fusion module is adopted in place of VGG-19 for feature extraction. The OpenPose network can be regarded as a parallel convolution network model: one convolution network, Branch1, outputs Part Confidence Maps and is used to locate human body key points; the other, Branch2, outputs Part Affinity Fields and is used to connect body key points into limbs. The whole network comprises multiple stages; the output of each stage is combined with the label to compute an L2 loss function, so that the network converges toward the label at every stage, which accelerates training and improves recognition precision, and the loss function of the whole network is the sum of the loss functions computed at each stage. This staged loss is sketched below.
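A minimal sketch of that staged L2 loss, assuming per-stage predictions and their targets are given as tensors of matching shape (names and shapes are illustrative):

```python
# Illustrative sketch: sum an L2 loss over all stages and both branches.
import torch
import torch.nn.functional as F

def staged_l2_loss(stage_outputs, pcm_target, paf_target):
    """stage_outputs: list of (pcm_pred, paf_pred) pairs, one per stage.
    pcm_target: label Part Confidence Maps; paf_target: label Part Affinity Fields.
    """
    total = torch.zeros(())
    for pcm_pred, paf_pred in stage_outputs:
        total = total + F.mse_loss(pcm_pred, pcm_target, reduction="sum")  # Branch1
        total = total + F.mse_loss(paf_pred, paf_target, reduction="sum")  # Branch2
    return total  # whole-network loss: sum over stages
```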
To improve the accuracy of the simulation and prediction results of the intelligent substation digital twin system, the application integrates and improves MobileNetV2, YOLOv5 and OpenPose to provide a small target detection model for the intelligent substation digital twin. First, the depthwise separable convolution network of MobileNetV2 is integrated into YOLOv5 to replace the original CSPDarknet53 for feature extraction, providing YOLOv5 with rich semantic information, effectively improving computational efficiency and meeting the real-time requirement of the digital twin. The resulting YOLOv5-Mv2 module is used for static small object detection in the intelligent substation digital twin. Meanwhile, the feature fusion module is designed to further fuse shallow detail information with deep semantic information; using the fused feature map as the input to the OpenPose network saves computing resources and alleviates the gradient vanishing and performance degradation of overly deep convolution. The improved OpenPose network can suppress unnecessary background noise and focus on learning accurate human skeleton features, improving the detection accuracy of remote human posture recognition in the digital twin.
The method can accurately identify the real-time state of basic environmental parameters (namely environment, equipment and operators) of the digital twin system of the transformer substation, has good accuracy, robustness and real-time performance, realizes dynamic synchronization between physical equipment and virtual representation thereof in the actual scene of the transformer substation, and is beneficial to modeling, monitoring and optimizing the operation process of the intelligent transformer substation.
Another embodiment of the present application further provides a digital twinning-based substation target detection system, including:
the substation image data acquisition module is used for acquiring substation image data and inputting them into the target detection network model, wherein the target detection network model is obtained by integrating the MobileNetV2, YOLOv5 and OpenPose models;
and the identification result output module is used for outputting a substation target identification result by the target detection network model.
In one possible implementation, the substation image data acquisition module acquires substation image data and constructs a training data set, a validation data set, and a test data set; training a pre-established target detection network model by using a training data set; verifying the trained target detection network model by using the verification data set, and storing the target detection network model with the optimal performance index; inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result;
the substation image data is used for shooting the substation environment in multiple directions, multiple angles and multiple scenes through the camera equipment, the size of the image is adjusted to 320 multiplied by 320, and the data annotation is completed.
In one possible implementation, the target detection network model includes a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module, wherein the feature extraction module uses the depthwise separable convolution network of the MobileNetV2 model in place of CSPDarknet53 in the YOLOv5 model for feature extraction; the feature fusion module integrates and fuses the feature maps extracted by the shallow convolutions of the MobileNetV2 network with those extracted by the deep convolutions; the target detection module is the YOLOv5 model, which takes as input the feature maps extracted by the MobileNetV2 network and outputs the recognition result for static substation objects; the gesture recognition module adopts an OpenPose model in which the feature fusion module replaces the VGG-19 network for feature extraction, and outputs the posture recognition result for substation operators.
Further, the feature extraction module has three basic convolution modules: the first is an expansion convolution module, which uses a 1×1 convolution to expand the number of channels of the input data; the second is a depthwise convolution module, which filters the input from the previous module using a 3×3 convolution without a pooling layer; the third is a projection convolution module, which projects high-dimensional data into low-dimensional data using a 1×1 convolution; the first and second modules use a linear activation function instead of a ReLU activation function.
Further, the feature fusion module uses dilated convolution to reduce the size of the shallow feature map, with the output size given by:

S_out = (S_in + 2α − r(k − 1) − 1) / l + 1

where α is the number of padding pixels, r is the dilation rate, l is the stride, k is the size of the convolution kernel, S_out is the size of the output feature map, and S_in is the size of the input feature map;

and employs the standard deconvolution of the YOLOv5 model to increase the size of the deep feature map while compressing the number of channels of the feature map according to:

C_d = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} E_dhw

where C_d is the output of the d-th channel of the feature map, E_dhw is the pixel in row h and column w of the d-th channel, and H × W is the size of the picture;

the deep feature map and the shallow feature map are fused by re-distributing weights and spliced into a new feature map f_new.
Furthermore, in the target detection module, the final output of the MobileNetV2 network is a 10×10 feature map, from which the YOLOv5 model performs detection and outputs Pre_i1; the 10×10 feature map is then upsampled and fused with the 20×20 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 20×20 feature map and outputs Pre_i2; similarly, the 20×20 feature map is upsampled and fused with the 30×30 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 30×30 feature map and outputs Pre_i3; finally, the YOLOv5 model combines Pre_i1, Pre_i2 and Pre_i3 and outputs the recognition result for static objects in the substation digital twin scene.
Further, the OpenPose model of the gesture recognition module comprises two parallel convolution networks: one network, Branch1, outputs Part Confidence Maps and is used to locate human body key points; the other, Branch2, outputs Part Affinity Fields and is used to connect body key points into limbs. The whole OpenPose network comprises multiple stages, the output of each stage is compared with the label to compute an L2 loss function, and the loss function of the whole network is the sum of the loss functions computed at each stage.
Another embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the substation target detection method based on digital twinning when executing the computer program.
Another embodiment of the present application also proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements the digital twinning-based substation target detection method.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained on the computer readable medium may be added to or removed from as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electrical carrier signals and telecommunications signals. For convenience of description, only the parts relevant to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method parts of the embodiments of the present application. The computer readable storage medium is non-transitory, can reside in storage devices formed by various electronic equipment, and can implement the execution procedures described in the methods of the embodiments of the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the application without departing from the spirit and scope of the application, which is intended to be covered by the claims.

Claims (12)

1. A digital twinning-based substation target detection method, characterized by comprising:
substation image data are collected and input into a target detection network model, wherein the target detection network model is obtained by integrating a MobileNetV2 model, a YOLOv5 model and an OpenPose model;
and outputting a substation target identification result by the target detection network model.
2. The digital twinning-based substation target detection method according to claim 1, wherein a training dataset, a verification dataset and a test dataset are constructed from the acquired substation image data;
training a pre-established target detection network model by utilizing the training data set;
verifying the trained target detection network model by using the verification data set, and storing the target detection network model with the optimal performance index;
inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result;
the substation image data is used for shooting the substation environment in multiple directions, multiple angles and multiple scenes through the camera equipment, adjusting the size of the image to 320 multiplied by 320, and finishing data annotation.
3. The digital twinning-based substation target detection method according to claim 1, wherein the target detection network model comprises a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module;
the feature extraction module uses a depth separable convolutional network of the mobilenet v2 model to perform feature extraction by replacing CSPDarknet53 in the YOLOv5 model; the feature fusion module integrates and fuses the feature images extracted by the shallow convolution of the MobileNet v2 network and the feature images extracted by the deep convolution; the target detection module is a YOLOv5 model, inputs a feature map extracted by a MobileNet v2 network, and outputs a substation static object identification result; the gesture recognition module adopts an Openphase model, replaces a VGG-19 network by the feature fusion module to perform feature extraction, and outputs a substation operator gesture recognition result.
4. The digital twinning-based substation target detection method according to claim 3, wherein the feature extraction module has three basic convolution modules: the first is an expansion convolution module, which uses a 1×1 convolution to expand the number of channels of the input data; the second is a depthwise convolution module, which filters the input from the previous module using a 3×3 convolution without a pooling layer; the third is a projection convolution module, which projects high-dimensional data into low-dimensional data using a 1×1 convolution;
the first module and the second module use a linear activation function instead of a ReLU activation function.
5. The digital twinning-based substation target detection method according to claim 3, wherein the feature fusion module uses dilated convolution to reduce the size of the shallow feature map, with the output size given by:

S_out = (S_in + 2α − r(k − 1) − 1) / l + 1

where α is the number of padding pixels, r is the dilation rate, l is the stride, k is the size of the convolution kernel, S_out is the size of the output feature map, and S_in is the size of the input feature map;

and employs the standard deconvolution of the YOLOv5 model to increase the size of the deep feature map while compressing the number of channels of the feature map according to:

C_d = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} E_dhw

where C_d is the output of the d-th channel of the feature map, E_dhw is the pixel in row h and column w of the d-th channel, and H × W is the size of the picture;

the deep feature map and the shallow feature map are fused by re-distributing weights and spliced into a new feature map f_new.
6. The digital twinning-based substation target detection method according to claim 3, wherein in the target detection module, the final output of the MobileNetV2 network is a 10×10 feature map, from which the YOLOv5 model performs detection and outputs Pre_i1; the 10×10 feature map is then upsampled and fused with the 20×20 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 20×20 feature map and outputs Pre_i2; similarly, the 20×20 feature map is upsampled and fused with the 30×30 feature map output by the preceding convolution layer, and the YOLOv5 model performs detection on the fused 30×30 feature map and outputs Pre_i3; finally, the YOLOv5 model combines Pre_i1, Pre_i2 and Pre_i3 and outputs the recognition result for static objects in the substation digital twin scene.
7. The digital twinning-based substation target detection method according to claim 3, wherein the OpenPose model of the gesture recognition module comprises two parallel convolution networks: one network, Branch1, outputs Part Confidence Maps and is used to locate human body key points; the other, Branch2, outputs Part Affinity Fields and is used to connect body key points into limbs; the whole OpenPose network comprises multiple stages, the output of each stage is compared with the label to compute an L2 loss function, and the loss function of the whole network is the sum of the loss functions computed at each stage.
8. A digital twinning-based substation target detection system, comprising:
the substation image data acquisition module is used for acquiring substation image data and inputting them into the target detection network model, wherein the target detection network model is obtained by integrating the MobileNetV2, YOLOv5 and OpenPose models;
and the identification result output module is used for outputting a substation target identification result by the target detection network model.
9. The digital twinning-based substation target detection system of claim 8, wherein: the substation image data acquisition module acquires substation image data and constructs a training data set, a verification data set and a test data set;
training a pre-established target detection network model by utilizing the training data set;
verifying the trained target detection network model by using the verification data set, and storing the target detection network model with the optimal performance index;
inputting the test data set into a target detection network model with optimal performance indexes, and outputting a substation target identification result;
the substation image data is used for shooting the substation environment in multiple directions, multiple angles and multiple scenes through the camera equipment, adjusting the size of the image to 320 multiplied by 320, and finishing data annotation.
10. The digital twinning-based substation target detection system of claim 8, wherein: the target detection network model comprises a feature extraction module, a feature fusion module, a target detection module and a gesture recognition module;
the feature extraction module uses a depth separable convolutional network of the mobilenet v2 model to perform feature extraction by replacing CSPDarknet53 in the YOLOv5 model; the feature fusion module integrates and fuses the feature images extracted by the shallow convolution of the MobileNet v2 network and the feature images extracted by the deep convolution; the target detection module is a YOLOv5 model, inputs a feature map extracted by a MobileNet v2 network, and outputs a substation static object identification result; the gesture recognition module adopts an Openphase model, replaces a VGG-19 network by the feature fusion module to perform feature extraction, and outputs a substation operator gesture recognition result.
11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized by: the processor, when executing the computer program, implements the digital twinning-based substation target detection method according to any one of claims 1 to 7.
12. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements a digital twinning-based substation target detection method according to any one of claims 1 to 7.
CN202310787855.2A 2023-06-29 2023-06-29 Digital twinning-based substation target detection method, system, equipment and medium Pending CN116740547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310787855.2A CN116740547A (en) 2023-06-29 2023-06-29 Digital twinning-based substation target detection method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310787855.2A CN116740547A (en) 2023-06-29 2023-06-29 Digital twinning-based substation target detection method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN116740547A true CN116740547A (en) 2023-09-12

Family

ID=87909569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310787855.2A Pending CN116740547A (en) 2023-06-29 2023-06-29 Digital twinning-based substation target detection method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116740547A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853491A (en) * 2024-03-08 2024-04-09 山东省计算中心(国家超级计算济南中心) Few-sample industrial product abnormality detection method and system based on multi-scene task
CN117853491B (en) * 2024-03-08 2024-05-24 山东省计算中心(国家超级计算济南中心) Few-sample industrial product abnormality detection method and system based on multi-scene task


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination