CN117058385A - Image segmentation method and device based on intelligent fusion of multi-system data

Info

Publication number: CN117058385A
Application number: CN202311066461.4A
Authority: CN
Inventors: ***; 赵峰; 赵林林; 刘茂凯; 王誉博; 许中平; 谢可; 安丽利; 吴晓峰; 张维; 张朔
Original assignee: Beijing Sgitg Accenture Information Technology Co ltd; State Grid Information and Telecommunication Co Ltd
Current assignee: Beijing Sgitg Accenture Information Technology Co ltd; State Grid Information and Telecommunication Co Ltd
Priority date: 2023-08-23
Filing date: 2023-08-23
Publication date: 2023-11-14

Abstract

The embodiments of the present disclosure disclose an image segmentation method and device based on intelligent fusion of multi-system data. One embodiment of the method comprises the following steps: inputting target device image feature extraction information and device text feature extraction information into a multi-modal feature fusion network to obtain multi-modal feature fusion information corresponding to a target power device; inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in a power device image segmentation model to obtain filtered multi-modal feature fusion information; and, in response to determining that target multi-modal fusion feature information meets a first preset feature condition, generating device image segmentation information according to the target multi-modal fusion feature information, wherein the target multi-modal fusion feature information is the fusion of the target device image feature extraction information and the filtered multi-modal feature fusion information. This embodiment improves the accuracy of image segmentation.

Description

Image segmentation method and device based on intelligent fusion of multi-system data

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to an image segmentation method and device based on intelligent fusion of multi-system data.

Background

Segmenting an image of power equipment extracts the foreground region of the power equipment in the image. Currently, image segmentation generally adopts the following approach: the monochromatic gradient background in the image is removed based on the visual modality alone, so as to obtain the segmentation result of the image.

However, the following technical problems generally exist in the above manner:

First, when a large amount of redundant substances exist around the power equipment in an image, there are many interference factors for segmenting the image. Segmenting the image of the power equipment through the visual modality alone lowers the accuracy of recognizing the power equipment in the image, and hence the accuracy of the segmentation, so that images of redundant substances remain in the segmented power equipment image;

Second, traditional neural network models do not segment images finely enough, so that redundant substances exist in the segmented image;

Third, when the feature information is decoded, the decoded information is not checked, so that when the decoded image information is inaccurate, the image segmentation result is degraded.

The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.

Disclosure of Invention

This section of the disclosure is intended to introduce concepts in a simplified form that are further described below in the detailed description. This section is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose an image segmentation method, an electronic device, and a computer-readable medium based on intelligent fusion of multi-system data to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide an image segmentation method based on intelligent fusion of multi-system data, the method comprising: a power equipment information acquisition system acquires device image information and device text information of a target power device, and sends the device image information and the device text information to an associated image segmentation system; the image segmentation system, in response to receiving the device image information and the device text information, inputs the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information; the image segmentation system uses the device image feature extraction information as target device image feature extraction information and, based on the target device image feature extraction information and the device text feature extraction information, executes the following processing steps: inputting the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device; inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information; and, in response to determining that target multi-modal fusion feature information meets a first preset feature condition, generating device image segmentation information corresponding to the device image information according to the target multi-modal fusion feature information, wherein the target multi-modal fusion feature information is the fusion of the target device image feature extraction information and the filtered multi-modal feature fusion information.

In a second aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors causes the one or more processors to implement the method described in any of the implementations of the first aspect above.

In a third aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.

The above embodiments of the present disclosure have the following advantageous effects: the image segmentation method based on intelligent fusion of multi-system data of some embodiments of the present disclosure improves the accuracy of image segmentation. Specifically, images of redundant substances remain in the segmented power equipment image for the following reason: when a large amount of redundant substances exist around the power equipment in the image, there are many interference factors for segmenting the image, and segmenting the image of the power equipment through the visual modality alone lowers the accuracy of recognizing the power equipment in the image and hence the accuracy of the segmentation. Based on this, in the image segmentation method based on intelligent fusion of multi-system data of some embodiments of the present disclosure, first, the power equipment information acquisition system acquires device image information and device text information of a target power device, and sends the device image information and the device text information to an associated image segmentation system. Second, the image segmentation system, in response to receiving the device image information and the device text information, inputs the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information. Thus, the image features corresponding to the image information and the text features corresponding to the text information can be obtained, and both can be used for segmenting the image of the target power device.
Then, the image segmentation system uses the device image feature extraction information as target device image feature extraction information and, based on the target device image feature extraction information and the device text feature extraction information, performs the following processing steps. The target device image feature extraction information and the device text feature extraction information are input into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device. Thus, the multi-modal feature information resulting from fusing the target device image feature extraction information with the device text feature extraction information can be obtained. Next, the target device image feature extraction information and the multi-modal feature fusion information are input into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information. Thus, the multi-modal feature information can be filtered by means of the target device image feature extraction information, so that redundant features irrelevant to the target power device are removed from the multi-modal feature information, and the resulting filtered multi-modal feature information can be used for generating image segmentation information of the image information. The redundant features may be the regions corresponding to redundant substances in the image. Finally, in response to determining that target multi-modal fusion feature information meets a first preset feature condition, device image segmentation information corresponding to the device image information is generated according to the target multi-modal fusion feature information.
The target multi-modal fusion feature information is the fusion of the target device image feature extraction information and the filtered multi-modal feature fusion information. Thus, multi-modal image segmentation information corresponding to the target power device can be obtained, that is, image segmentation information generated from the image feature extraction information and the filtered text information. A feature map characterizing the image segmentation result corresponding to the target power device can likewise be obtained. Moreover, because the image segmentation information of the target power device is generated based on both the target device image feature extraction information and the device text feature extraction information, the image is not segmented on the basis of the image feature extraction information alone. In addition, the text information may include descriptive information about the outline of the device, which may be used to improve the accuracy of determining the region of the target power device in the image.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flow chart of some embodiments of an image segmentation method based on intelligent fusion of multi-system data according to the present disclosure;

FIG. 2 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "a" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 shows a flow 100 of some embodiments of an image segmentation method based on intelligent fusion of multi-system data according to the present disclosure. The image segmentation method based on intelligent fusion of multi-system data comprises the following steps:

Step 101, a power equipment information acquisition system acquires device image information and device text information of a target power device, and sends the device image information and the device text information to an associated image segmentation system.

In some embodiments, the power equipment information acquisition system may acquire device image information and device text information of the target power device and send the device image information and the device text information to an associated image segmentation system. The power equipment information acquisition system may be a computing device for collecting device image information and device text information of the target power device. The target power device may be any one of a plurality of power devices. For example, the target power device may be a transformer or a circuit breaker. The device image information may be an image that includes the target power device. The device text information may be text describing the target power device. The associated image segmentation system may refer to a computing device that has an image segmentation function and is communicatively connected to the power equipment information acquisition system.
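
The hand-off in step 101 can be illustrated with a minimal sketch. The class and field names below (`DeviceInfoMessage`, `AcquisitionSystem`, `ImageSegmentationSystem`, `device_id`) are hypothetical and do not come from the disclosure, and the in-memory `receive` call merely stands in for whatever communication link connects the two systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfoMessage:
    """One image/text pair for a target power device (hypothetical container)."""
    device_id: str
    image_bytes: bytes
    text: str


class ImageSegmentationSystem:
    """Stand-in for the associated image segmentation system."""

    def __init__(self):
        self.inbox = []

    def receive(self, message):
        # A real system would trigger feature extraction here (step 102).
        self.inbox.append(message)


class AcquisitionSystem:
    """Stand-in for the power equipment information acquisition system."""

    def __init__(self, segmentation_system):
        self.segmentation_system = segmentation_system

    def acquire_and_send(self, device_id, image_bytes, text):
        # Both modalities are required, since later steps fuse them.
        if not image_bytes or not text.strip():
            raise ValueError("both device image and device text are required")
        message = DeviceInfoMessage(device_id, image_bytes, text)
        self.segmentation_system.receive(message)
        return message
```

Sending the image and text together in one message keeps the two modalities paired, which the later fusion steps rely on.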

Optionally, the image segmentation system acquires a device information training sample set.

In some embodiments, the image segmentation system described above may acquire a device information training sample set. Each device information training sample in the device information training sample set comprises a device image information sample and a device text information sample.

Optionally, the image segmentation system selects a device information training sample from the device information training sample set, and performs the following training steps:

First step, inputting the device image information sample and the device text information sample included in the device information training sample into an initial image feature extraction network included in an initial power device image segmentation model to obtain initial device image feature extraction information and initial device text feature extraction information. The initial power device image segmentation model may refer to an untrained neural network model that is used to generate segmentation information for an image.

Second step, inputting the initial device image feature extraction information into a network layer for image cue mapping included in an initial multi-modal feature fusion network included in the initial power device image segmentation model to obtain initial device image cue feature information corresponding to the initial device image feature extraction information. The initial multi-modal feature fusion network may refer to an untrained multi-modal feature fusion network.

Third step, inputting the initial device text feature extraction information into a network layer for text query mapping included in the initial multi-modal feature fusion network to obtain initial device text query feature information corresponding to the initial device text feature extraction information.

Fourth step, performing fusion processing on the initial device image cue feature information and the initial device text query feature information to obtain initial multi-modal feature fusion information. For example, a dot-multiplication operation may be performed on the initial device image cue feature information and the initial device text query feature information to obtain the initial multi-modal feature fusion information.

Fifth step, inputting the initial device image feature extraction information into a network layer for image query mapping included in an initial target modal feature filtering network included in the initial power device image segmentation model to obtain initial device image query feature information. The initial target modal feature filtering network may refer to an untrained target modal feature filtering network.

Sixth step, performing conversion processing on the initial multi-modal feature fusion information to obtain initial multi-modal cue feature fusion information. The conversion processing here may be the same as the conversion processing of the multi-modal feature fusion information.

Seventh step, filtering the initial multi-modal cue feature fusion information according to the initial device image query feature information to obtain filtered initial multi-modal feature fusion information.

Eighth step, fusing the initial device image feature extraction information and the filtered initial multi-modal feature fusion information into initial target multi-modal fusion feature information.

Ninth step, inputting the initial target multi-modal fusion feature information into an initial image decoding network included in the initial power device image segmentation model to obtain initial decoding feature information. The initial image decoding network may be an untrained image decoding network.

Tenth step, determining a loss value between the initial decoding feature information and a sample label corresponding to the device information training sample. That is, the loss value between the initial decoding feature information and the sample label corresponding to the device information training sample may be determined by a loss function. For example, the loss function may be a hinge loss function.

Eleventh step, in response to determining that the loss value is less than or equal to a preset threshold, determining the initial power device image segmentation model as the trained power device image segmentation model.
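
The loss computation and the stopping condition of the training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: labels are assumed to be +1 (device) or -1 (background) per position, `forward` is a hypothetical callable that stands in for the whole model up to the decoding network, and `update` is a hypothetical parameter-adjustment callback, since the text above does not say what happens when the loss exceeds the threshold.

```python
def hinge_loss(scores, labels):
    """Mean hinge loss; labels are assumed to be +1 (device) or -1 (background)."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    return sum(max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)) / len(scores)


def train_until_threshold(forward, update, samples, threshold, max_rounds=1000):
    """Repeat the training steps until the loss is at or below the threshold.

    forward(image_sample, text_sample) -> list of decoded scores (hypothetical).
    update(image_sample, text_sample, label) adjusts the model (hypothetical).
    Returns (round_index, loss) for the first sample that satisfies the threshold.
    """
    for round_index in range(max_rounds):
        for image_sample, text_sample, label in samples:
            scores = forward(image_sample, text_sample)
            loss = hinge_loss(scores, label)
            if loss <= threshold:
                # Eleventh step: the model is treated as trained.
                return round_index, loss
            update(image_sample, text_sample, label)
    raise RuntimeError("loss did not reach the preset threshold")
```

The `max_rounds` cap is an added safeguard so that the sketch terminates even if the threshold is never reached.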

The above content serves as an inventive point of the embodiments of the present disclosure and solves the second technical problem mentioned in the background, namely that "traditional neural network models do not segment images finely enough, so that redundant substances exist in the segmented image". The factor that leads to redundant substances in the segmented image is often as follows: traditional neural network models do not segment images finely enough. If this factor is addressed, redundant substances in the segmented image can be avoided. To achieve this effect, first, the device image information sample and the device text information sample included in the device information training sample are input into an initial image feature extraction network included in an initial power device image segmentation model to obtain initial device image feature extraction information and initial device text feature extraction information. Second, the initial device image feature extraction information is input into a network layer for image cue mapping included in an initial multi-modal feature fusion network included in the initial power device image segmentation model to obtain initial device image cue feature information corresponding to the initial device image feature extraction information. Then, the initial device text feature extraction information is input into a network layer for text query mapping included in the initial multi-modal feature fusion network to obtain initial device text query feature information corresponding to the initial device text feature extraction information.
And then, carrying out fusion processing on the initial equipment image clue characteristic information and the initial equipment text query characteristic information to obtain initial multi-mode characteristic fusion information. And then, inputting the initial equipment image feature extraction information into a network layer for image query mapping, which is included in an initial target modal feature filtering network included in the initial power equipment image segmentation model, so as to obtain initial equipment image query feature information. Then, converting the initial multi-mode characteristic fusion information to obtain initial multi-mode clue characteristic fusion information; and filtering the initial multi-modal clue feature fusion information according to the initial equipment image query feature information to obtain filtered initial multi-modal feature fusion information. And then fusing the initial equipment image feature extraction information and the filtered initial multi-mode feature fusion information into initial target multi-mode fusion feature information. Then, inputting the initial target multi-mode fusion characteristic information into an initial image decoding network included in the initial power equipment image segmentation model to obtain initial decoding characteristic information; and determining a loss value between the initial decoding characteristic information and a sample label corresponding to the equipment information training sample. And finally, determining the initial power equipment image segmentation model as a trained power equipment image segmentation model in response to determining that the loss value is smaller than or equal to a preset threshold value. 
Thus, the image can be subjected to fusion segmentation processing by using the initial image feature extraction network, the initial multi-modal feature fusion network, the initial target modal feature filtering network, and the initial image decoding network. This makes the image segmentation finer-grained and avoids redundant substances in the segmented image.

Step 102, the image segmentation system responds to the received device image information and the device text information, and inputs the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information.

In some embodiments, the image segmentation system may, in response to receiving the device image information and the device text information, input the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information. The image feature extraction network includes a visual feature extraction network layer and a text feature extraction network layer. The power device image segmentation model may be a neural network model for generating segmentation information of an image corresponding to the above-described target power device. For example, the neural network model may be a convolutional neural network (CNN) model. The image feature extraction network may be a neural network that performs downsampling and extracts features from image information and text information. For example, the image feature extraction network may be a fusion model of a BERT pre-trained model and a multi-layer convolutional neural network. In an application scenario, the device text information may be a title of the target power device on a presentation page, and the device image information may be an image of the target power device on the presentation page.

In practice, the image segmentation system may input the device image information and the device text information into the image feature extraction network included in the pre-trained power device image segmentation model to obtain the device image feature extraction information and the device text feature extraction information through the following steps:

First step, inputting the device image information into the visual feature extraction network layer to obtain the device image feature extraction information corresponding to the target power device. The visual feature extraction network layer may be a neural network layer for extracting visual feature information. The visual feature information may be feature information in the form of at least one modality. The at least one modality may be, but is not limited to, one of the following: text modality form, image modality form. The device image feature extraction information may be a feature map representing the semantic information of the image after the feature extraction processing. For example, the visual feature extraction network layer may be a visual encoder layer. The visual feature extraction network layer may be trained on a computer vision dataset (ImageNet).

Second step, inputting the device text information into the text feature extraction network layer to obtain the device text feature extraction information corresponding to the target power device. The device text feature extraction information may be a vector representing the semantic information of the text after the feature extraction processing. The text feature extraction network layer may be a neural network layer for extracting text feature information. The text feature information may be feature information in the form of a text modality. For example, the text feature extraction network layer may be a language encoder layer.

Step 103, the image segmentation system uses the device image feature extraction information as target device image feature extraction information, and based on the target device image feature extraction information and the device text feature extraction information, performs the following processing steps:

Step 1031, inputting the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device.

In some embodiments, the image segmentation system may input the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model, so as to obtain multi-modal feature fusion information corresponding to the target power device. The target device image feature extraction information may be feature information corresponding to at least one mode form. The at least one modality may be, but is not limited to, one of the following: text modality form, image modality form. The multi-modal feature information may characterize a fusion feature of the target device image feature extraction information and the device text feature extraction information. The multi-modal feature information may be a feature vector corresponding to the target power device. The multi-modal feature information may characterize feature information in an image mode form and a text mode form corresponding to the target device image feature extraction information and the device text feature extraction information, respectively. The multi-modal feature fusion network may be a network that performs fusion processing on the target device image feature extraction information and the device text feature extraction information. The multi-modal feature fusion network may be a network that maps the image feature extraction information and the text feature extraction information of the target device to the same feature dimension, and obtains multi-modal feature information corresponding to the target power device.

In practice, the image segmentation system may input the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device, through the following steps:

the first step, inputting the target device image feature extraction information into a network layer for image cue mapping included in the multi-mode feature fusion network, and obtaining device image cue feature information corresponding to the target device image feature extraction information. The device image clue feature information may be feature information obtained by mapping the target device image feature extraction information. The image cue mapping may map the target device image feature extraction information and the device text feature extraction information to the same dimension, and perform fusion processing on the target device image feature extraction information and the device text feature extraction information. In practice, the execution subject may perform mapping processing on the image feature extraction information of the target device through visual cue projection, so as to obtain image cue feature information. For example, the network layer for image cue mapping may be the first fully-connected layer in the multimodal feature fusion network.

Second step: input the device text feature extraction information into a network layer for text query mapping included in the multi-modal feature fusion network to obtain device text query feature information corresponding to the device text feature extraction information. The device text query feature information may be feature information obtained by mapping the device text feature extraction information. The text query mapping may map the device text feature extraction information and the target device image feature extraction information to the same dimension so that fusion processing can be performed on them. In practice, the execution subject may map the device text feature extraction information through a language query projection to obtain the device text query feature information. For example, the network layer for text query mapping may be the second fully-connected layer in the multi-modal feature fusion network.

Third step: perform fusion processing on the device image cue feature information and the device text query feature information to obtain the multi-modal feature fusion information corresponding to the target power device. The multi-modal feature fusion information may be a feature vector obtained by fusing the device image cue feature information and the device text query feature information. In practice, the execution subject may perform a dot-multiplication operation on the device image cue feature information and the device text query feature information to obtain the multi-modal feature fusion information corresponding to the target power device.
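The three fusion steps above can be illustrated with a minimal sketch. It rests on assumptions that the description does not state: the image features are a list of per-position vectors, the text feature is a single vector, and the "dot-multiplication operation" is read as an element-wise product. The names `linear` and `fuse` and the identity weights in the usage lines are hypothetical, chosen only for illustration.

```python
def linear(vector, weight, bias):
    """Fully-connected layer applied to one feature vector: y = W x + b."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def fuse(image_feats, text_feat, image_proj, text_proj):
    """Project both modalities to the same dimension, then multiply element-wise.

    image_feats: list of per-position image feature vectors (assumed layout).
    text_feat:   one text feature vector (assumed layout).
    image_proj, text_proj: (weight, bias) pairs for the two projection layers.
    """
    # Second step: text query mapping (second fully-connected layer).
    text_query = linear(text_feat, *text_proj)
    fused = []
    for position in image_feats:
        # First step: image cue mapping (first fully-connected layer).
        image_cue = linear(position, *image_proj)
        # Third step: element-wise product (assumed reading of "dot-multiplication").
        fused.append([c * q for c, q in zip(image_cue, text_query)])
    return fused


# Usage with untrained identity weights, for illustration only.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
example = fuse([[1.0, 2.0], [0.5, 0.0]], [3.0, 4.0], identity, identity)
```

In a trained model the two projection layers would have learned weights and would generally map inputs of different sizes to a shared output dimension; the identity weights here only make the arithmetic easy to follow.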

Step 1032, inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information.

In some embodiments, the image segmentation system may input the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information. The target modal feature filtering network may be a network model that performs feature filtering on target modal features. A target modal feature may be a fusion feature of at least one modality. The at least one modality may be, but is not limited to, at least one of: an image modality, a text modality. The target modal feature filtering network may be a network that converts the target device image feature extraction information and the multi-modal feature fusion information and generates the filtered multi-modal feature fusion information. For example, the target modal feature filtering network may be a convolutional neural network model that performs feature filtering.

In practice, the image segmentation system may input the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information through the following steps:

First step: input the target device image feature extraction information into a network layer for image query mapping included in the target modal feature filtering network to obtain device image query feature information corresponding to the target device image feature extraction information. The image query mapping may be used to convert feature maps corresponding to the multi-modal feature fusion information into vectors. In practice, the multi-modal feature fusion information may be converted through an image query projection to obtain the device image query feature information.

Second step: convert the multi-modal feature fusion information to obtain multi-modal cue feature fusion information. The multi-modal features in the multi-modal feature fusion information may be normalized to obtain a normalization result, and the normalization result may then be converted into the multi-modal cue feature fusion information through a linear layer. For example, the normalization may be performed by applying a normalized exponential function (Softmax function) to the multi-modal features.

Third step: filter the multi-modal cue feature fusion information according to the device image query feature information to obtain the filtered multi-modal feature fusion information. For example, matrix multiplication may be performed on the device image query feature information and the multi-modal cue feature fusion information to obtain the filtered multi-modal feature fusion information.
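As a rough illustration of the three filtering steps, the sketch below uses plain Python lists. The description does not give tensor shapes or the Softmax axis, so the following are assumptions: both inputs are lists of per-position vectors, Softmax is applied over the channels of each position, and the matrix multiplication is taken between the query matrix and the transpose of the cue matrix. The function names and weights are hypothetical.

```python
import math


def softmax(values):
    """Normalized exponential function over one vector."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vector, weight, bias):
    """Linear layer applied to one feature vector: y = W x + b."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def matmul(a, b):
    """Matrix product of two row-major list-of-lists matrices."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


def filter_features(image_feats, fused_feats, query_proj, cue_proj):
    # First step: image query mapping of the image features.
    queries = [linear(f, *query_proj) for f in image_feats]
    # Second step: Softmax normalization, then a linear layer.
    cues = [linear(softmax(f), *cue_proj) for f in fused_feats]
    # Third step: matrix multiplication (query matrix times transposed cues,
    # which is an assumed shape convention).
    cues_t = [list(col) for col in zip(*cues)]
    return matmul(queries, cues_t)
```

Under these assumptions the result has one row per image position and one column per fused position; a different shape convention would change only the final `matmul` call.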

Step 1033, in response to determining that the target multi-modal fusion feature information satisfies a first preset feature condition, generating device image segmentation information corresponding to the device image information according to the target multi-modal fusion feature information.

In some embodiments, the image segmentation system may, in response to determining that the target multi-modal fusion feature information satisfies a first preset feature condition, generate device image segmentation information corresponding to the device image information according to the target multi-modal fusion feature information. The target multi-modal fusion feature information is fusion feature information of the target device image feature extraction information and the filtered multi-modal feature fusion information. The power device image segmentation model includes an image decoding network. The target multi-modal fusion feature information may be fusion feature information in at least one modality. The at least one modality may be, but is not limited to, one of the following: an image modality, a text modality. The target multi-modal fusion feature information may be a feature map obtained by fusing the target device image feature extraction information with the filtered multi-modal feature fusion information. The first preset feature condition may be that the feature dimension corresponding to the target multi-modal fusion feature information is smaller than a preset feature dimension. For example, the preset feature dimension may be 60×60×10. The device image segmentation information may be a segmentation result corresponding to the target power device. For example, the device image segmentation information may be bounding box information of the target power device in the image.

In practice, the above-described image segmentation system may generate device image segmentation information corresponding to the above-described device image information by:

First step: according to the target multi-modal fusion feature information and the filtered multi-modal feature fusion information, execute the following decoding steps:

First sub-step: input the target multi-modal fusion feature information into the image decoding network to obtain decoding feature information corresponding to the target multi-modal fusion feature information. The decoding feature information may represent the feature maps of a plurality of channels corresponding to the target multi-modal fusion feature information. The decoding may up-sample the target multi-modal fusion feature information and then perform convolution on the up-sampled result. The image decoding network may be a convolutional neural network model that decodes the target multi-modal fusion feature information.

Second sub-step: in response to determining that the decoding feature information satisfies a second preset feature condition, determine the decoding feature information as the device image segmentation information corresponding to the device image information. The second preset feature condition may be that the feature dimension corresponding to the decoding feature information is greater than a target preset dimension value. For example, the target preset dimension value may be 420×420×10. It should be noted that, when the decoding feature information satisfies the second preset feature condition, the decoding feature information contains feature maps for a preset number of channels. For example, the preset number may be two. When the decoding feature information does not satisfy the second preset feature condition, the decoding feature information may contain feature maps for a target preset number of channels. For example, the target preset number may be at least 3.

Second step: in response to determining that the decoding feature information does not satisfy the second preset feature condition, perform feature fusion processing on the decoding feature information and the filtered multi-modal feature fusion information to obtain target fusion feature information. The target fusion feature information may represent a feature map obtained by fusing the decoding feature information with the corresponding filtered multi-modal feature fusion information. For example, the decoding feature information and the filtered multi-modal feature fusion information may be concatenated to obtain the feature map after the feature fusion processing.

Third step: take the target fusion feature information as new target multi-modal fusion feature information, and execute the decoding steps again according to the target multi-modal fusion feature information and the filtered multi-modal feature fusion information. That is, the target fusion feature information takes the place of the target multi-modal fusion feature information in the next round of decoding. The filtered multi-modal feature fusion information corresponding to the target fusion feature information may be the filtered multi-modal feature fusion information corresponding to the decoding feature information that did not satisfy the second preset feature condition.
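The decode, check, and re-fuse loop described in the three steps above can be sketched as follows. This is only an illustration under stated assumptions: a feature map is a list of channels, each an H×W list of lists; the image decoding network is replaced by nearest-neighbour up-sampling followed by an untrained 1×1 convolution (a channel mean); the second preset feature condition is read as "height and width both exceed a threshold"; and the filtered features are resized by an integer factor before concatenation. All function names, and the `max_rounds` safeguard, are hypothetical.

```python
def nearest_upsample(channel, factor=2):
    """Repeat every value `factor` times along both axes of one H x W channel."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def decode(feature_map, out_channels=2):
    """Stand-in decoder: up-sample, then a uniform 1x1 convolution (channel mean)."""
    upsampled = [nearest_upsample(channel) for channel in feature_map]
    count = len(upsampled)
    height, width = len(upsampled[0]), len(upsampled[0][0])
    mixed = [[sum(ch[i][j] for ch in upsampled) / count for j in range(width)]
             for i in range(height)]
    return [[row[:] for row in mixed] for _ in range(out_channels)]


def concat_with_filtered(decoded, filtered):
    """Concatenate along channels after resizing `filtered` to the decoded size."""
    factor = len(decoded[0]) // len(filtered[0])  # assumes an integer ratio
    resized = [nearest_upsample(channel, factor) for channel in filtered]
    return decoded + resized


def decode_until_satisfied(target_feats, filtered_feats, min_side, max_rounds=8):
    current = target_feats
    for _ in range(max_rounds):
        decoded = decode(current)                       # first sub-step
        height, width = len(decoded[0]), len(decoded[0][0])
        if height > min_side and width > min_side:      # second sub-step
            return decoded
        # Second and third steps: fuse with the filtered features and decode again.
        current = concat_with_filtered(decoded, filtered_feats)
    raise RuntimeError("second preset feature condition not reached")
```

With the example value from the text, `min_side` would correspond to 420; a much smaller threshold is enough to see the loop run more than once.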

The above content is an inventive point of the present disclosure and addresses the technical problem mentioned in the background art, namely that the image segmentation effect is affected. A factor that tends to affect the image segmentation effect is as follows: when the feature information is decoded, the decoded information is not checked, so inaccurate decoded image information degrades the image segmentation effect. Addressing this factor can improve the image segmentation effect. To this end, first, according to the target multi-modal fusion feature information and the filtered multi-modal feature fusion information, the following decoding steps are executed: inputting the target multi-modal fusion feature information into the image decoding network to obtain decoding feature information corresponding to the target multi-modal fusion feature information; and, in response to determining that the decoding feature information satisfies a second preset feature condition, determining the decoding feature information as the device image segmentation information corresponding to the device image information. Then, in response to determining that the decoding feature information does not satisfy the second preset feature condition, feature fusion processing is performed on the decoding feature information and the filtered multi-modal feature fusion information to obtain target fusion feature information. Next, the target fusion feature information is taken as new target multi-modal fusion feature information, and the decoding steps are executed again according to the target multi-modal fusion feature information and the filtered multi-modal feature fusion information. In this way, the decoding feature information is checked against the feature condition. Therefore, when the decoded image information is inaccurate, decoding can be performed again, which improves the image segmentation effect.

Optionally, in response to determining that the target multi-modal fusion feature information does not satisfy the first preset feature condition, the image segmentation system inputs the target multi-modal fusion feature information into the image feature extraction network to obtain multi-modal feature extraction information.

In some embodiments, the image segmentation system may input the target multi-modal fusion feature information into the image feature extraction network to obtain multi-modal feature extraction information in response to determining that the target multi-modal fusion feature information does not satisfy the first preset feature condition.

Optionally, the image segmentation system uses the multi-modal feature extraction information as target multi-modal fusion feature information, and executes the processing step again.

In some embodiments, the image segmentation system may use the multi-modal feature extraction information as the target multi-modal fusion feature information, and execute the processing step again.
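The optional re-extraction loop can be sketched as control flow only. The networks are passed in as callables, and several points are assumptions rather than statements of the disclosure: the first preset feature condition is read as "height and width are both smaller than the preset", the re-extracted features take the place of the target device image features in the next round, and `max_rounds` is a safeguard added for the sketch. All names are hypothetical.

```python
def meets_first_condition(shape, preset=(60, 60, 10)):
    """Assumed reading: spatial height and width are both below the preset."""
    height, width, _channels = shape
    return height < preset[0] and width < preset[1]


def run_processing(image_feats, text_feats, extract, fuse, filter_features,
                   combine, shape_of, max_rounds=10):
    """Repeat fusion and filtering until the first preset condition holds."""
    target = image_feats
    for _ in range(max_rounds):
        fused = fuse(target, text_feats)               # multi-modal feature fusion network
        filtered = filter_features(target, fused)      # target modal feature filtering network
        target_fusion = combine(target, filtered)      # target multi-modal fusion features
        if meets_first_condition(shape_of(target_fusion)):
            return target_fusion, filtered
        # Condition not met: extract features again and repeat the processing step.
        target = extract(target_fusion)
    raise RuntimeError("first preset feature condition not reached")


# Toy usage: features are represented only by their (H, W, C) shape, and the
# stand-in extraction network halves the spatial size on every pass.
toy_result = run_processing(
    (240, 240, 10), None,
    extract=lambda s: (s[0] // 2, s[1] // 2, s[2]),
    fuse=lambda target, text: target,
    filter_features=lambda target, fused: fused,
    combine=lambda target, filtered: filtered,
    shape_of=lambda s: s,
)
```

Passing the networks as callables keeps the sketch independent of any particular model framework; real implementations would substitute trained modules for the lambdas.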

Referring now to fig. 2, a schematic diagram of a configuration of an electronic device (e.g., a power device information acquisition system and/or an image segmentation system) 200 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 2 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 2, the electronic device 200 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 201, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage device 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data necessary for the operation of the electronic device 200. The processing device 201, the ROM 202, and the RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.

In general, the following devices may be connected to the I/O interface 205: input devices 206 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 207 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 208 including, for example, magnetic tape, hard disk, etc.; and a communication device 209. The communication device 209 may allow the electronic device 200 to communicate with other devices wirelessly or by wire to exchange data. While fig. 2 shows an electronic device 200 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 2 may represent one device or a plurality of devices as needed.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 209, or from the storage device 208, or from the ROM 202. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 201.

It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the following: the power device information acquisition system acquires device image information and device text information of a target power device, and sends the device image information and the device text information to an associated image segmentation system; the image segmentation system, in response to receiving the device image information and the device text information, inputs the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information; the image segmentation system takes the device image feature extraction information as target device image feature extraction information and, based on the target device image feature extraction information and the device text feature extraction information, executes the following processing steps: inputting the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device; inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information; and, in response to determining that the target multi-modal fusion feature information satisfies a first preset feature condition, generating device image segmentation information corresponding to the device image information according to the target multi-modal fusion feature information, wherein the target multi-modal fusion feature information is fusion feature information of the target device image feature extraction information and the filtered multi-modal feature fusion information.

Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually substituting the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. An image segmentation method based on intelligent fusion of multi-system data, comprising:

a power device information acquisition system acquires device image information and device text information of a target power device, and sends the device image information and the device text information to an associated image segmentation system;

the image segmentation system, in response to receiving the device image information and the device text information, inputs the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information;

the image segmentation system takes the device image feature extraction information as target device image feature extraction information, and based on the target device image feature extraction information and the device text feature extraction information, the image segmentation system executes the following processing steps:

inputting the target device image feature extraction information and the device text feature extraction information into a multi-modal feature fusion network included in the power device image segmentation model to obtain multi-modal feature fusion information corresponding to the target power device;

inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information;

and, in response to determining that target multi-modal fusion feature information satisfies a first preset feature condition, generating device image segmentation information corresponding to the device image information according to the target multi-modal fusion feature information, wherein the target multi-modal fusion feature information is fusion feature information of the target device image feature extraction information and the filtered multi-modal feature fusion information.

2. The method of claim 1, wherein the method further comprises:

the image segmentation system, in response to determining that the target multi-modal fusion feature information does not satisfy the first preset feature condition, inputs the target multi-modal fusion feature information into the image feature extraction network to obtain multi-modal feature extraction information;

the image segmentation system takes the multi-modal feature extraction information as the target multi-modal fusion feature information and executes the processing steps again.

3. The method of claim 1, wherein the inputting the target device image feature extraction information and the multi-modal feature fusion information into a target modal feature filtering network included in the power device image segmentation model to obtain filtered multi-modal feature fusion information includes:

inputting the target device image feature extraction information into a network layer for image query mapping included in the target modal feature filtering network to obtain device image query feature information corresponding to the target device image feature extraction information;

converting the multi-modal feature fusion information to obtain multi-modal cue feature fusion information;

and filtering the multi-modal cue feature fusion information according to the device image query feature information to obtain the filtered multi-modal feature fusion information.

4. The method of claim 1, wherein the image feature extraction network comprises: a visual feature extraction network layer and a text feature extraction network layer; and

the inputting the device image information and the device text information into an image feature extraction network included in a pre-trained power device image segmentation model to obtain device image feature extraction information and device text feature extraction information comprises:

inputting the device image information into the visual feature extraction network layer to obtain the device image feature extraction information corresponding to the target power device;

and inputting the device text information into the text feature extraction network layer to obtain the device text feature extraction information corresponding to the target power device.

5. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.

6. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-4.