CN115496777A - Training method of image segmentation model and related equipment - Google Patents

Training method of image segmentation model and related equipment

Info

Publication number
CN115496777A
Authority
CN
China
Prior art keywords
image
segmentation
training
mode
segmentation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211244124.5A
Other languages
Chinese (zh)
Inventor
王伟农
戴宇荣
陶鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211244124.5A priority Critical patent/CN115496777A/en
Publication of CN115496777A publication Critical patent/CN115496777A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20021 - Dividing image into blocks, subimages or windows
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure provides a training method of an image segmentation model and related equipment. The method comprises the following steps: acquiring an image to be segmented and an interactive representation image; determining a current segmentation mode from a foreground segmentation mode and an interactive segmentation mode based on a random selection manner, wherein the foreground segmentation mode is used for performing stage training on an image segmentation model by using the image to be segmented, and the interactive segmentation mode is used for performing stage training on the image segmentation model by using the image to be segmented and the interactive representation image; performing stage training on the image segmentation model by using the current segmentation mode; and when the stage training is finished, updating the current segmentation mode based on the random selection manner, and training the image segmentation model by using the updated current segmentation mode until a training end condition is met. The method can save memory space and avoid the lag (stuttering) that the trained image segmentation model is prone to cause when applied; meanwhile, the generalization and robustness of the two tasks can be improved.

Description

Training method of image segmentation model and related equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method and a training apparatus for an image segmentation model, an image segmentation apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of computer vision technology, image target segmentation, as an important computer vision task, is widely applied in image retrieval, picture editing, and film and television production.
In the related art, image target segmentation is performed using either foreground segmentation alone or interactive segmentation alone, and the two are trained, tested, and deployed as independent tasks. This approach ignores the commonality between the two tasks: in service deployment, a separate model file must be deployed for each task, the two tasks occupy a large amount of memory space, and problems such as lag and out-of-memory errors easily arise, degrading the actual product experience.
It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides a training method and a training device for an image segmentation model, an image segmentation method, an electronic device, and a computer-readable storage medium, which can save memory space and avoid the lag that the trained image segmentation model is prone to cause in application; meanwhile, the generalization and robustness of the two tasks can be improved.
The embodiment of the disclosure provides a training method of an image segmentation model, which comprises the following steps: acquiring an image to be segmented and an interactive representation image; determining a current segmentation mode in a foreground segmentation mode and an interactive segmentation mode based on a random selection mode, wherein the foreground segmentation mode is used for performing stage training on an image segmentation model by using the image to be segmented, and the interactive segmentation mode is used for performing stage training on the image segmentation model by using the image to be segmented and the interactive representation image; performing stage training on the image segmentation model by using the current segmentation mode; and when the stage training is finished, updating the current segmentation mode based on the random selection mode, and training the image segmentation model by using the updated current segmentation mode until a training end condition is met.
In some exemplary embodiments of the present disclosure, when the current segmentation mode is the foreground segmentation mode, the performing stage training on the image segmentation model using the current segmentation mode includes: acquiring a preset mask image, a preset positive interaction characterization image and a preset negative interaction characterization image, wherein the pixel values of the preset mask image, the preset positive interaction characterization image and the preset negative interaction characterization image are all designated pixel values; combining the image to be segmented, the preset mask image, the preset positive interaction characterization image and the preset negative interaction characterization image to obtain a first input image; inputting the first input image into the image segmentation model, and performing stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the image segmentation model includes an encoding layer and a decoding layer; the inputting the first input image into the image segmentation model, and performing stage training on the image segmentation model includes: inputting the first input image into the coding layer for coding processing to obtain a first feature vector; inputting the first feature vector into the decoding layer for decoding processing to obtain a first prediction mask image; and performing stage training on the image segmentation model according to the first prediction mask image.
In some exemplary embodiments of the present disclosure, when the current segmentation mode is the interactive segmentation mode, the interactive characterization image includes a first mask image, a first positive interaction characterization image, and a first negative interaction characterization image; the performing stage training on the image segmentation model by using the current segmentation mode comprises: combining the image to be segmented, the first mask image, the first positive interaction characterization image, and the first negative interaction characterization image to obtain a second input image; and inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the interactive characterization images further include a second positive interaction characterization image and a second negative interaction characterization image; inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model, comprises: inputting the second input image into the image segmentation model to obtain a second mask image; combining the image to be segmented, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image to obtain a third input image; and inputting the third input image into the image segmentation model, and performing stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the image segmentation model includes an encoding layer and a decoding layer; the inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model includes: inputting the second input image into the coding layer for coding processing to obtain a second feature vector; inputting the second feature vector into the decoding layer for decoding processing to obtain a second prediction mask image; and performing stage training on the image segmentation model according to the second prediction mask image.
In some exemplary embodiments of the present disclosure, the random selection manner is to randomly select between the foreground segmentation mode and the interactive segmentation mode with a preset probability.
The embodiment of the disclosure provides an image segmentation method, which includes: acquiring an image to be segmented and an interactive representation image; when the segmentation mode is a foreground segmentation mode, inputting the image to be segmented into an image segmentation model obtained by training according to any one of the methods to obtain a first target mask image; and when the segmentation mode is an interactive segmentation mode, inputting the image to be segmented and the interactive representation image into an image segmentation model obtained by training according to any one of the methods to obtain a second target mask image.
The embodiment of the present disclosure provides a training device for an image segmentation model, including: an acquisition module configured to perform acquisition of an image to be segmented and an interactive representation image; a selection module configured to perform determining a current segmentation mode among a foreground segmentation mode and an interactive segmentation mode based on a random selection manner, the foreground segmentation mode being used for performing stage training on an image segmentation model by using the image to be segmented, the interactive segmentation mode being used for performing stage training on the image segmentation model by using the image to be segmented and the interactive representation image; a training module configured to perform phase training of the image segmentation model using the current segmentation mode; and the updating module is configured to update the current segmentation mode based on the random selection mode when the stage training is completed, so as to train the image segmentation model by using the updated current segmentation mode until a training end condition is met.
An embodiment of the present disclosure provides an image segmentation apparatus, including: the acquisition module is configured to acquire an image to be segmented and an interactive representation image; the obtaining module is configured to input the image to be segmented into an image segmentation model obtained by training according to any one of the methods when the segmentation mode is a foreground segmentation mode, so as to obtain a first target mask image; the obtaining module is further configured to input the image to be segmented and the interactive representation image into an image segmentation model obtained by training according to any one of the above methods to obtain a second target mask image when the segmentation mode is an interactive segmentation mode.
An embodiment of the present disclosure provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute executable instructions to implement a training method of an image segmentation model as described in any one of the above or an image segmentation method as described above.
Embodiments of the present disclosure provide a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a training method of an image segmentation model as in any one of the above or an image segmentation method as in the above.
The disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method for training an image segmentation model of any of the above or the method for image segmentation as described above.
In the training method of the image segmentation model provided by the embodiment of the disclosure, during model training, a current segmentation mode is determined from the foreground segmentation mode and the interactive segmentation mode based on a random selection manner, and the current segmentation mode is used to perform stage training on the image segmentation model. After the stage training with the current segmentation mode is completed, a segmentation mode is again randomly selected from the foreground segmentation mode and the interactive segmentation mode to update the current segmentation mode, and the updated current segmentation mode is used to perform the next stage of training, so that the two tasks of foreground segmentation and interactive segmentation share one image segmentation model. On one hand, this saves memory space and computer resources and avoids the lag that the trained image segmentation model is prone to cause when applied; on the other hand, training the network in a multi-task manner improves the generalization and robustness of both tasks.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which a training method of an image segmentation model or an image segmentation method of an embodiment of the present disclosure may be applied.
FIG. 2 is a flow diagram illustrating a method of training an image segmentation model according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating training of an image segmentation model according to an exemplary embodiment.
FIG. 4 is a flow chart illustrating another method of training an image segmentation model in accordance with an exemplary embodiment.
FIG. 5 is a flowchart illustrating another method of training an image segmentation model, according to an exemplary embodiment.
FIG. 6 is a flow chart illustrating another method of training an image segmentation model in accordance with an exemplary embodiment.
FIG. 7 is a flowchart illustrating an image segmentation method according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an image segmentation apparatus according to an exemplary embodiment.
FIG. 10 is a block diagram illustrating an electronic device suitable for use in implementing exemplary embodiments of the present disclosure, according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted.
The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in at least one hardware module or integrated circuit, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In this specification, the terms "a", "an", "the", "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and are not limiting as to the number of their objects.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which a training method of an image segmentation model or an image segmentation method of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture may include a server 101, a network 102, a terminal device 103, a terminal device 104, and a terminal device 105. Network 102 is the medium used to provide communication links between terminal device 103, terminal device 104, or terminal device 105, and server 101. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The server 101 may be a server that provides various services, such as a background management server that provides support for devices operated by the user using the terminal apparatus 103, the terminal apparatus 104, or the terminal apparatus 105. The background management server may analyze and otherwise process the received data such as the request, and feed back the processing result to the terminal device 103, the terminal device 104, or the terminal device 105.
Terminal device 103, terminal device 104, and terminal device 105 may be, but are not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a wearable smart device, a virtual reality device, an augmented reality device, and the like.
In the embodiment of the present disclosure, the server 101 may: acquiring an image to be segmented and an interactive representation image; determining a current segmentation mode in a foreground segmentation mode and an interactive segmentation mode based on a random selection mode, wherein the foreground segmentation mode is used for performing stage training on an image segmentation model by using an image to be segmented, and the interactive segmentation mode is used for performing stage training on the image segmentation model by using the image to be segmented and an interactive representation image; performing stage training on the image segmentation model by using a current segmentation mode; and when the stage training is finished, updating the current segmentation mode based on a random selection mode, and training the image segmentation model by using the updated current segmentation mode until a training end condition is met.
In the embodiment of the present disclosure, the server 101 may obtain an image to be segmented and an interactive representation image from the terminal device; when the segmentation mode is a foreground segmentation mode, inputting an image to be segmented into an image segmentation model obtained according to the training of the method to obtain a first target mask image; when the segmentation mode is an interactive segmentation mode, inputting an image to be segmented and an interactive representation image into an image segmentation model obtained according to the method training to obtain a second target mask image; the server 101 may return the obtained first target mask image or second target mask image to the terminal device.
It should be understood that the numbers of the terminal device 103, the terminal device 104, the terminal device 105, the network 102, and the server 101 in fig. 1 are merely illustrative. The server 101 may be a physical server, a server cluster composed of a plurality of servers, or a cloud server, and there may be any number of terminal devices, networks, and servers according to actual needs.
Hereinafter, the steps of the training method of the image segmentation model in the exemplary embodiment of the present disclosure will be described in more detail with reference to the drawings and the embodiments. The method provided by the embodiment of the present disclosure may be executed by any electronic device, such as the server and/or the terminal device in fig. 1, but the present disclosure is not limited thereto.
FIG. 2 is a flow diagram illustrating a method of training an image segmentation model according to an exemplary embodiment.
As shown in fig. 2, the method provided by the embodiment of the present disclosure may include the following steps.
In step S210, an image to be segmented and an interactive representation image are acquired.
In the embodiment of the present disclosure, the image to be segmented refers to an image that needs to undergo image instance segmentation to obtain a target mask; the image to be segmented may contain one instance or at least two instances. Specifically, the server may obtain the image to be segmented from the terminal device; the server may also obtain the image to be segmented from the service server, from the Internet, or directly from a database, which is not limited in this disclosure.
In the embodiment of the disclosure, the interactive representation image is an image generated based on the interactive behavior of the user and is used for characterizing that behavior. The interaction behavior of the user may include, but is not limited to, two forms, namely click and smear (scribble). Each interaction form can be divided into positive interaction and negative interaction (the corresponding interactive characterization images may be referred to as positive interaction characterization images and negative interaction characterization images): positive interaction refers to smearing (or clicking) in the user's region of interest, and negative interaction refers to smearing (or clicking) in a mis-segmented background region.
In the embodiment of the disclosure, the interactive representation image may be generated directly according to the interactive behavior of the user, or may be automatically generated by a machine. Specifically, the server may obtain an interactive representation image from the terminal device; the server may also obtain the interactive representation image from the service server, the server may also acquire the interactive representation image from the internet, and the server may also directly obtain the interactive representation image from the database, which is not limited in this disclosure.
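For illustration, the following minimal sketch (an assumed PyTorch implementation; the patent does not specify how clicks or smears are rasterized, and the function name and stroke radius are invented for this example) turns user click positions into such a binary interaction characterization image:

    import torch

    def clicks_to_map(clicks, height, width, radius=5):
        # Rasterize clicks (or scribble points) into a binary interaction map.
        # clicks: iterable of (row, col) pixel positions; radius is an assumed
        # stroke size, since the patent does not specify an encoding.
        ys = torch.arange(height).view(-1, 1)
        xs = torch.arange(width).view(1, -1)
        m = torch.zeros(height, width)
        for r, c in clicks:
            m[(ys - r) ** 2 + (xs - c) ** 2 <= radius ** 2] = 1.0
        return m

    # Example: two positive clicks inside the region of interest.
    positive_map = clicks_to_map([(40, 60), (45, 70)], height=128, width=128)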
In the embodiment of the disclosure, the image segmentation model may be trained through a foreground segmentation mode and an interactive segmentation mode, and in the foreground segmentation mode, the training set may include images to be segmented (no interaction may be considered in the foreground segmentation mode, and a corresponding interactive representation image may be a preset image); in the interactive segmentation mode, the training set may include images to be segmented and interactive characterization images.
In step S220, a current segmentation mode is determined among the foreground segmentation mode and the interactive segmentation mode based on a random selection manner.
The foreground segmentation mode is used for performing stage training on the image segmentation model by using the image to be segmented, and the interactive segmentation mode is used for performing stage training on the image segmentation model by using the image to be segmented and the interactive representation image.
Foreground segmentation (also called foreground object segmentation) refers to simulating human visual characteristics through an intelligent algorithm and extracting a salient region (namely an object mask) from an image pixel by pixel; interactive segmentation (also called interactive object segmentation) refers to distinguishing an object of interest from the background based on the interactive behaviors of a user, so that the current segmentation result can be effectively adjusted according to the user's guidance. Image foreground segmentation and image interactive segmentation both belong to computer vision tasks and have various applications in image retrieval, visual tracking, picture editing, and film and television production.
In the related art, foreground segmentation and interactive segmentation are trained, tested, and deployed separately as two independent tasks. In this method, the current segmentation mode is randomly selected from the foreground segmentation mode and the interactive segmentation mode, and the two modes can be switched randomly, so that the foreground segmentation and interactive segmentation tasks share one image segmentation model, saving computer resources.
In the embodiment of the disclosure, during the training of the image segmentation model, a current segmentation mode may be randomly selected from the foreground segmentation mode and the interactive segmentation mode based on a random selection manner, and stage training is performed using the current segmentation mode (one of the two modes); then one of the foreground segmentation mode and the interactive segmentation mode is again randomly selected to update the current segmentation mode, and stage training continues with the updated current segmentation mode, and so on until model training is completed. The training end condition of the image segmentation model may be set according to the actual situation: for example, a number of training iterations may be set directly (for example, N, where N is an integer greater than or equal to 2), or another completion condition may be used (for example, a number of training iterations in a certain mode, convergence of the loss function of the image segmentation model, or a certain parameter of the image segmentation model reaching a preset condition), which is not limited in this disclosure.
For example, during training of the image segmentation model, the foreground segmentation mode is selected as the current segmentation mode, and the foreground segmentation mode is used to perform stage training on the image segmentation model; after this stage of training is finished, a segmentation mode continues to be randomly selected to update the current segmentation mode, for example, the interactive segmentation mode is selected, and the interactive segmentation mode is used to perform stage training on the same image segmentation model; after that stage of training is finished, a segmentation mode again continues to be randomly selected to update the current segmentation mode. In this way, in each stage of training, a segmentation mode is randomly selected from the foreground segmentation mode and the interactive segmentation mode, and the two segmentation modes train the same image segmentation model until model training is finished.
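As a rough illustration (the function names and the fixed stage count below are assumptions for this sketch, not details from the patent), the alternating stage training might look like this:

    import random

    def train_model(model, loader, run_foreground_stage, run_interactive_stage,
                    num_stages=100):
        # num_stages stands in for the training end condition; the two stage
        # functions, supplied by the caller, each train the SAME shared model.
        for _ in range(num_stages):
            # Randomly determine the current segmentation mode for this stage.
            mode = random.choice(["foreground", "interactive"])
            if mode == "foreground":
                run_foreground_stage(model, loader)   # image to be segmented only
            else:
                run_interactive_stage(model, loader)  # image + interaction images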
In the embodiment of the present disclosure, the image segmentation model is a model used for image segmentation to obtain the target mask, and may be a deep neural network model or other models, which is not limited in the present disclosure. The deep neural network can be applied to image segmentation, and high-level semantic features extracted from the deep neural network can accurately distinguish a target object and a background from a complex scene, so that the image segmentation effect is improved.
In an exemplary embodiment, the random selection manner is to randomly select between the foreground segmentation mode and the interactive segmentation mode with a preset probability.
The preset probability may be a probability of randomly selecting one of the foreground segmentation mode and the interactive segmentation mode, for example, the preset probability may be a probability of selecting the foreground segmentation mode, or the preset probability may be a probability of selecting the interactive segmentation mode, or the probability of selecting the foreground segmentation mode may be set as a first preset probability and the probability of selecting the interactive segmentation mode may be set as a second preset probability.
The preset probability may be set according to actual situations, for example, may be set to 0.3, 0.5, 0.8, and the like, which is not limited in this disclosure.
In the embodiment of the present disclosure, the preset probability of selecting the foreground segmentation mode (or the interactive segmentation mode) may be determined according to the use frequency of the foreground segmentation mode (or the interactive segmentation mode) when the image segmentation model is applied.
For example, if the foreground segmentation mode is used more often when the image segmentation model is applied, then during training the preset probability of selecting the foreground segmentation mode may be set larger (for example, greater than 0.5), so that more training stages use the foreground segmentation mode; the segmentation accuracy of the model in the foreground segmentation mode is then higher in application, and the overall accuracy of the model in application is higher.
For example, if the interactive segmentation mode is used more often when the image segmentation model is applied, then during training the preset probability of selecting the interactive segmentation mode may be set larger (for example, greater than 0.5), so that more training stages use the interactive segmentation mode; the segmentation accuracy of the model in the interactive segmentation mode is then higher in application, and the overall accuracy of the model in application is higher.
For example, if the foreground segmentation mode and the interactive segmentation mode are used with substantially the same frequency when the image segmentation model is applied, the preset probability of selecting the foreground segmentation mode (or the interactive segmentation mode) during training may be set to 0.5, so that the numbers of training stages in the two modes are substantially the same and the overall accuracy of the model in application is higher.
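A minimal sketch of such a preset-probability selection (the probability value 0.7 is illustrative, chosen for the case where the foreground mode dominates in deployment):

    import random

    P_FOREGROUND = 0.7  # assumed value; set per the expected usage frequency

    def select_mode():
        # Foreground mode with probability P_FOREGROUND, interactive otherwise.
        return "foreground" if random.random() < P_FOREGROUND else "interactive"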
In step S230, the image segmentation model is subjected to stage training using the current segmentation mode.
In the embodiment of the present disclosure, if the foreground segmentation mode is selected as the current segmentation mode in step S220, the image segmentation model is subjected to stage training in the foreground segmentation mode; if the interactive segmentation mode is selected as the current segmentation mode in step S220, the image segmentation model is subjected to stage training in the interactive segmentation mode.
The following is a description of the phase training of the image segmentation model in the foreground segmentation mode and the phase training of the image segmentation model in the interactive segmentation mode.
In the embodiment of the disclosure, when the current segmentation mode is the foreground segmentation mode, the image to be segmented is used for performing stage training on the image segmentation model.
When the current segmentation mode is a foreground segmentation mode, the image to be segmented and a target mask image of the image to be segmented can be used as a training set, and the image to be segmented is input into an image segmentation model to be trained to obtain a prediction mask image; and determining a loss function according to the target mask image and the prediction mask image, and training an image segmentation model according to the loss function so as to complete the training at the current stage.
In the embodiment of the disclosure, when the current segmentation mode is the interactive segmentation mode, the image segmentation model is subjected to stage training by using the image to be segmented and the interactive representation image.
When the current segmentation mode is the interactive segmentation mode, the image to be segmented, the interactive representation image, and the target mask image of the image to be segmented can be used as a training set, and the image to be segmented and the interactive representation image are input into the image segmentation model to be trained to obtain a prediction mask image; a loss function is determined according to the target mask image and the prediction mask image, and the image segmentation model is trained according to the loss function to finish the training at the current stage.
Specifically, the image segmentation model may include an encoding layer (encoder) and a decoding layer (decoder).
The training process of the image segmentation model is described below with reference to fig. 3.
Referring to fig. 3, when the current segmentation mode is the foreground segmentation mode, input data (shown by a dashed box) corresponding to the foreground segmentation mode may be input to the encoding layer 309 for encoding processing, so as to obtain a first feature vector; inputting the first feature vector to the decoding layer 310 for decoding processing to obtain a first prediction mask image 311; the image segmentation model is trained on the first predicted mask image 311 and the target mask image.
When the current segmentation mode is the interactive segmentation mode, input data (shown by a dashed box) corresponding to the interactive segmentation mode may be input to the encoding layer 309 for encoding processing, so as to obtain a second feature vector; inputting the second feature vector to the decoding layer 310 for decoding processing to obtain a second predicted mask image 311; the image segmentation model is trained on the second predicted mask image 311 and the target mask image.
Specifically, when the current segmentation mode is the foreground segmentation mode, the input data corresponding to the foreground segmentation mode may include the image to be segmented 301, a preset mask image 302, a preset positive interaction characterization image 303, and a preset negative interaction characterization image 304, where the pixel values of the preset mask image 302, the preset positive interaction characterization image 303, and the preset negative interaction characterization image 304 are all a specified pixel value. That is, when the current segmentation mode is the foreground segmentation mode, the interactive characterization images are not needed as input images; the three images can be set to an inactive state (and to an active state when the current segmentation mode is the interactive segmentation mode), i.e., they can be set to preset images. A preset image may be a binary image, and the specified pixel value may be 0 (i.e., the pixels are displayed as black); that is, when the current segmentation mode is the foreground segmentation mode, all-black images may be used as the preset mask image, the preset positive interaction characterization image, and the preset negative interaction characterization image.
Specifically, when the current segmentation mode is the interactive segmentation mode, the input data corresponding to the interactive segmentation mode may include the image to be segmented 305, a first mask image 306, a first positive interaction characterization image 307, and a first negative interaction characterization image 308. If the training is the first training of the interactive segmentation mode, the first mask image 306 is an initial mask image; if it is the second or Nth training of interactive segmentation, the first mask image 306 is the prediction mask image M_prev obtained by segmentation after the previous interaction. The first mask image 306 may be a binary image whose pixel values range over [0, 1]; the larger the value, the higher the probability of foreground, and the smaller the value, the higher the probability of background. The first positive interaction characterization image 307 is a positive characterization image generated, during the interaction, by the user smearing (or clicking) on the image to be segmented 305 based on the first mask image 306 (the user smears or clicks a point or area that should belong to the mask but does not belong to the first mask image 306); it may be a binary image with pixel values in [0, 1], where the smeared part takes the value 1 and the remaining parts take the value 0. The first negative interaction characterization image 308 is a negative characterization image generated, during the interaction, by the user smearing (or clicking) based on the first mask image 306 (the user smears or clicks a point or area on the image to be segmented 305 that should not belong to the mask but belongs to the first mask image 306); it may likewise be a binary image with pixel values in [0, 1], where the smeared part takes the value 1 and the remaining parts take the value 0.
The image to be segmented 305, the first mask image 306, the first positive interaction characterization image 307, and the first negative interaction characterization image 308 are input into the coding layer 309 and the decoding layer 310 of the image segmentation model for processing to obtain a second mask image; the user then smears or clicks according to the image to be segmented 305 and the second mask image to generate a second positive interaction characterization image and a second negative interaction characterization image; the image to be segmented 305, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image are input into the coding layer 309 and the decoding layer 310 of the image segmentation model for processing, and so on, to obtain a prediction mask image (i.e., the second prediction mask image).
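When the interaction images are automatically generated by a machine, as noted earlier, one common approach is to derive them from the disagreement between the current mask and the ground-truth mask. The sketch below assumes PyTorch tensors and an invented function name; a real system would typically sample clicks or strokes inside these regions rather than use them wholesale:

    import torch

    def simulate_interaction_maps(pred_mask, target_mask):
        # pred_mask, target_mask: float tensors of shape (..., H, W) in [0, 1].
        # Positive map: foreground missed by the prediction (should belong to
        # the mask but does not belong to the current mask image).
        positive = ((target_mask > 0.5) & (pred_mask <= 0.5)).float()
        # Negative map: background wrongly predicted as foreground.
        negative = ((pred_mask > 0.5) & (target_mask <= 0.5)).float()
        return positive, negative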
In step S240, when the phase training is completed, the current segmentation mode is updated based on the random selection manner, so as to train the image segmentation model using the updated current segmentation mode until the training end condition is satisfied.
After the foreground segmentation mode or the interactive segmentation mode is used for stage training, one mode is randomly selected again from the foreground segmentation mode and the interactive segmentation mode to update the current segmentation mode, and the updated current segmentation mode (one of the foreground segmentation mode and the interactive segmentation mode) is used for continuing stage training, and so on until the training of the model is completed.
In an exemplary embodiment, the foreground segmentation mode and the interactive segmentation mode may be randomly selected with a preset probability to update the current segmentation mode.
The step of again randomly selecting between the foreground segmentation mode and the interactive segmentation mode with the preset probability is similar to the initial random selection with the preset probability described above; reference may be made to the above description, and details are not repeated here.
The process of randomly selecting the segmentation mode twice is described below with reference to fig. 4. Referring to fig. 4, in step S410, an image to be segmented and an interactive representation image are acquired. In step S420, a current segmentation mode is randomly selected from the foreground segmentation mode and the interactive segmentation mode; for example, the foreground segmentation mode is selected as the current segmentation mode, and then step S430 is performed, i.e., the image segmentation model is subjected to stage training in the foreground segmentation mode. After this stage of training is completed, in step S450 the segmentation mode is again randomly selected from the foreground segmentation mode and the interactive segmentation mode to update the current segmentation mode; for example, the interactive segmentation mode is selected as the updated current segmentation mode, and then step S470 is performed (when the updated current segmentation mode is the interactive segmentation mode, the image segmentation model is subjected to stage training using the image to be segmented and the interactive representation image), i.e., the image segmentation model is subjected to stage training in the interactive segmentation mode. After this stage of training is completed, in step S480 the segmentation mode continues to be randomly selected from the foreground segmentation mode and the interactive segmentation mode to update the current segmentation mode again, so as to train the image segmentation model with the newly selected segmentation mode until model training is finished. In this case, the method proceeds through the steps: S410 → S420 → S430 → S450 → S470 → S480.
Alternatively, referring to fig. 4, in step S420 a segmentation mode is randomly selected as the current segmentation mode; for example, the interactive segmentation mode is selected, and then step S440 is performed, i.e., the image segmentation model is subjected to stage training in the interactive segmentation mode. After this stage of training is completed, the segmentation mode is again randomly selected in step S450 to update the current segmentation mode; for example, the foreground segmentation mode is selected as the updated current segmentation mode, and then step S460 is performed (when the updated current segmentation mode is the foreground segmentation mode, the image segmentation model is subjected to stage training using the image to be segmented), i.e., the image segmentation model is subjected to stage training in the foreground segmentation mode. After this stage of training is completed, in step S480 the segmentation mode continues to be randomly selected from the foreground segmentation mode and the interactive segmentation mode, so as to train the image segmentation model with the newly selected segmentation mode until model training is finished. In this case, the method proceeds through the steps: S410 → S420 → S440 → S450 → S460 → S480.
Similarly, the method may also be performed by: s410 → S420 → S430 → S450 → S460 → S480, or S410 → S420 → S440 → S450 → S470 → S480.
Similarly, the process of randomly selecting the segmentation mode multiple times is similar to the process of randomly selecting the segmentation mode twice, and details are not repeated in the present disclosure.
In the training method of the image segmentation model provided by the embodiment of the disclosure, during model training, a current segmentation mode is determined from the foreground segmentation mode and the interactive segmentation mode based on a random selection manner, and the current segmentation mode is used to perform stage training on the image segmentation model. After the stage training with the current segmentation mode is completed, a segmentation mode is again randomly selected from the foreground segmentation mode and the interactive segmentation mode to update the current segmentation mode, and the updated current segmentation mode is used to perform the next stage of training, so that the two tasks of foreground segmentation and interactive segmentation share one image segmentation model. On one hand, this saves memory space and computer resources and avoids the lag that the trained image segmentation model is prone to cause when applied; on the other hand, training the network in a multi-task manner improves the generalization and robustness of both tasks.
In addition, compared with the related art in which foreground segmentation and interactive segmentation are trained, tested, and deployed separately as two independent tasks, the image segmentation model obtained by training with this method can save about half of the memory space during training, testing, and deployment.
FIG. 5 is a flowchart illustrating another method of training an image segmentation model, according to an exemplary embodiment. Fig. 5 shows specific steps of performing stage training on the image segmentation model using the current segmentation mode when the current segmentation mode is the foreground segmentation mode.
In the fig. 5 embodiment, the step S230 in the fig. 2 embodiment described above may further include the following steps.
In step S231, a preset mask image, a preset positive interaction characterization image, and a preset negative interaction characterization image are obtained.
The pixel values of the preset mask image, the preset positive interaction characterization image, and the preset negative interaction characterization image are all a specified pixel value. That is, when the current segmentation mode is the foreground segmentation mode, the interactive characterization images are not needed as input images; the three images can be set to an inactive state (and to an active state when the current segmentation mode is the interactive segmentation mode), i.e., they can be set to preset images. A preset image may be a binary image, and the specified pixel value may be 0 (i.e., the pixels are displayed as black); that is, when the current segmentation mode is the foreground segmentation mode, all-black images may be used as the preset mask image, the preset positive interaction characterization image, and the preset negative interaction characterization image.
In step S232, the image to be segmented, the preset mask image, the preset positive interaction characterization image, and the preset negative interaction characterization image are merged to obtain a first input image.
In the embodiment of the disclosure, the image to be segmented I_RGB ∈ R^(H×W×3), the preset mask image M_prev ∈ R^(H×W×1), the preset positive interaction characterization image, and the preset negative interaction characterization image may be merged (concat) in the channel dimension to obtain a first input image I_input ∈ R^(H×W×5), where H in the superscript of R represents the height of the image, W represents the width of the image, and the number represents the number of channels.
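A minimal PyTorch sketch of this channel-wise merge with assumed tensor shapes (note that the four listed inputs give 3+1+1+1 channels in the sketch, while the text writes the merged result as I_input ∈ R^(H×W×5)):

    import torch

    H, W = 256, 256                             # assumed spatial size
    image_rgb = torch.rand(1, 3, H, W)          # I_RGB, the image to be segmented
    preset_mask = torch.zeros(1, 1, H, W)       # M_prev, preset all-black mask
    preset_positive = torch.zeros(1, 1, H, W)   # preset positive interaction image
    preset_negative = torch.zeros(1, 1, H, W)   # preset negative interaction image

    # Merge (concat) in the channel dimension to obtain the first input image.
    first_input = torch.cat(
        [image_rgb, preset_mask, preset_positive, preset_negative], dim=1)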
In step S233, the first input image is input to the image segmentation model, and the image segmentation model is subjected to stage training.
In the embodiment of the disclosure, the first input image may be input into the image segmentation model to obtain a first prediction mask image; a loss function is determined according to the first prediction mask image and the target mask image (label) of the image to be segmented, and the model parameters of the image segmentation model are adjusted so that the error between the first prediction mask image and the target mask image of the image to be segmented meets a preset condition, completing the stage training of the image segmentation model.
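One such training step might be sketched as follows, assuming PyTorch and a binary cross-entropy loss (the patent does not name a specific loss function):

    import torch.nn.functional as F

    def foreground_training_step(model, optimizer, first_input, target_mask):
        # first_input: (N, C, H, W); target_mask: (N, 1, H, W), values in [0, 1].
        pred_logits = model(first_input)          # first prediction mask (logits)
        loss = F.binary_cross_entropy_with_logits(pred_logits, target_mask)
        optimizer.zero_grad()
        loss.backward()                           # adjust model parameters
        optimizer.step()
        return loss.item()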
In an exemplary embodiment, the image segmentation model includes an encoding layer and a decoding layer; inputting a first input image into an image segmentation model, and performing stage training on the image segmentation model, wherein the stage training comprises the following steps: inputting a first input image into a coding layer for coding processing to obtain a first feature vector; inputting the first feature vector into a decoding layer for decoding to obtain a first prediction mask image; and performing stage training on the image segmentation model according to the first prediction mask image.
With reference to fig. 3, when the current segmentation mode is the foreground segmentation mode, the image to be segmented 301, the preset mask image 302, the preset positive interaction characterization image 303, and the preset negative interaction characterization image 304 may be merged to obtain a first input image; the first input image is input into the encoding layer 309 for encoding processing to obtain a first feature vector; the first feature vector is input into the decoding layer 310 for decoding processing to obtain a first prediction mask image 311; and the image segmentation model is subjected to stage training according to the first prediction mask image 311 and the target mask image: a loss function is determined according to the first prediction mask image 311 and the target mask image of the image to be segmented, and the model parameters of the image segmentation model are adjusted so that the error between the two meets a preset condition, completing the stage training of the image segmentation model.
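For orientation, a toy encoder-decoder of this shape is sketched below; the patent does not disclose the actual architecture of the coding and decoding layers, so every layer choice here is an assumption:

    import torch.nn as nn

    class SegModel(nn.Module):
        # Toy stand-in for the image segmentation model: an encoding layer that
        # produces a feature map and a decoding layer that produces mask logits.
        def __init__(self, in_channels=6):  # RGB + mask + positive + negative maps
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            )

        def forward(self, x):
            feat = self.encoder(x)     # the "first feature vector"
            return self.decoder(feat)  # the prediction mask image (logits)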
FIG. 6 is a flow chart illustrating another method of training an image segmentation model in accordance with an exemplary embodiment. Fig. 6 shows specific steps of performing stage training on the image segmentation model using the current segmentation mode when the current segmentation mode is the interactive segmentation mode.
In the fig. 6 embodiment, step S230 in the above-mentioned fig. 2 embodiment may further include the following steps.
In an exemplary embodiment, the interactive characterization image includes a first mask image, a first positive interaction characterization image, and a first negative interaction characterization image. If the training is the first training of the interactive segmentation mode, the first mask image is an initial mask image; if the training is the second or Nth training of interactive segmentation, the first mask image is the prediction mask image obtained by segmentation after the previous interaction. The first positive interaction characterization image is a positive characterization image generated, during the interaction, by the user smearing (or clicking) on the image to be segmented based on the first mask image (the user smears or clicks a point or an area that should belong to the mask but does not belong to the first mask image); the first negative interaction characterization image is a negative characterization image generated, during the interaction, by the user smearing (or clicking) based on the first mask image (the user smears or clicks a point or a region on the image to be segmented that should not belong to the mask but belongs to the first mask image).
In step S234, the image to be segmented, the first mask image, the first positive interaction characterization image, and the first negative interaction characterization image are merged to obtain a second input image.
In this embodiment of the disclosure, the image to be segmented I_RGB ∈ R^(H×W×3), the first mask image M_prev ∈ R^(H×W×1), the first positive interaction characterization image, and the first negative interaction characterization image may be merged (concat) in the channel dimension to obtain a second input image I_input ∈ R^(H×W×5); in the superscripts of R, H denotes the height of the image, W denotes the width of the image, and the number denotes the number of channels.
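In PyTorch-like terms, this channel-wise merge may be sketched as follows. The tensor names and sizes are illustrative, each interaction map is assumed to be stored as a single-channel tensor, and a leading batch dimension is added to the notation above.

import torch

# Illustrative tensors, batch size B = 1, image size H = 480, W = 640
i_rgb  = torch.rand(1, 3, 480, 640)    # image to be segmented, I_RGB
m_prev = torch.zeros(1, 1, 480, 640)   # first mask image, M_prev
s_pos  = torch.zeros(1, 1, 480, 640)   # first positive interaction characterization image
s_neg  = torch.zeros(1, 1, 480, 640)   # first negative interaction characterization image

# Merge (concat) in the channel dimension to obtain the second input image I_input
i_input = torch.cat([i_rgb, m_prev, s_pos, s_neg], dim=1)
# i_input now stacks all maps along dim 1 (the channel dimension)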
In step S235, a second input image is input to the image segmentation model, and the image segmentation model is subjected to stage training.
In this embodiment of the disclosure, the second input image may be input into the image segmentation model to obtain a second prediction mask image; a loss function is determined according to the second prediction mask image and the target mask image (label) of the image to be segmented, and the model parameters of the image segmentation model are adjusted so that the error between the second prediction mask image and the target mask image of the image to be segmented satisfies a preset condition, thereby completing this stage of training of the image segmentation model.
In an exemplary embodiment, the interactive characterization image further includes a second positive interaction characterization image and a second negative interaction characterization image.
In an exemplary embodiment, inputting the second input image into the image segmentation model and training the image segmentation model may include: inputting the second input image into the image segmentation model to obtain a second mask image; merging the image to be segmented, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image to obtain a third input image; and inputting the third input image into the image segmentation model to perform stage training on the image segmentation model.
Specifically, the image to be segmented, the first mask image, the first positive interaction characterization image, and the first negative interaction characterization image are input into the image segmentation model for processing to obtain a second mask image; the user smears or clicks according to the image to be segmented and the second mask image to generate a second positive interaction characterization image and a second negative interaction characterization image; the image to be segmented, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image are merged to obtain a third input image; the third input image is input into the image segmentation model for processing, and so on, until the final prediction mask image (namely, the second prediction mask image) is obtained.
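A hedged sketch of this round-by-round refinement follows. The number of rounds is an assumption, simulate_interaction is the hypothetical helper sketched earlier, and the per-round loss computation and parameter update are omitted for brevity.

import torch

def interactive_stage_rounds(model, i_rgb, target_mask, num_rounds=3):
    """Refine the mask over several simulated interaction rounds.

    i_rgb: (1, 3, H, W); target_mask: (1, 1, H, W), binary.
    """
    h, w = i_rgb.shape[-2:]
    mask = torch.zeros(1, 1, h, w)        # initial (first) mask image
    for _ in range(num_rounds):
        s_pos, s_neg = simulate_interaction((mask[0, 0] > 0.5).float(),
                                            target_mask[0, 0])
        x = torch.cat([i_rgb, mask, s_pos[None, None], s_neg[None, None]], dim=1)
        mask = torch.sigmoid(model(x))    # next round's mask image
    return mask                           # the final (second) prediction mask image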
In an exemplary embodiment, the image segmentation model includes an encoding layer and a decoding layer. In this case, inputting the second input image into the image segmentation model and performing stage training on the image segmentation model includes the following steps: inputting the second input image into the encoding layer for encoding processing to obtain a second feature vector; inputting the second feature vector into the decoding layer for decoding processing to obtain a second prediction mask image; and performing stage training on the image segmentation model according to the second prediction mask image.
With reference to fig. 3, when the current segmentation mode is the interactive segmentation mode, the image to be segmented 305, the first mask image 306, the first positive interaction characterization image 307, and the first negative interaction characterization image 308 are merged to obtain the second input image; the second input image is input into the encoding layer 309 of the image segmentation model for processing to obtain a second feature vector; the second feature vector is input into the decoding layer 310 for processing to obtain a second prediction mask image. A loss function is then determined according to the second prediction mask image and the target mask image (label) of the image to be segmented, and the model parameters of the image segmentation model are adjusted so that the error between the second prediction mask image and the target mask image satisfies a preset condition, thereby completing this stage of training of the image segmentation model.
FIG. 7 is a flow chart illustrating a method of image segmentation according to an exemplary embodiment. Fig. 7 shows an application process of the image segmentation model after the image segmentation model is obtained by training using the method provided by the above embodiment.
In step S710, an image to be segmented and an interactive characterization image are acquired.
In step S720, when the segmentation mode is the foreground segmentation mode, the image to be segmented is input into the image segmentation model obtained by training according to the method provided in any of the embodiments, so as to obtain a first target mask image.
In step S730, when the segmentation mode is the interactive segmentation mode, the image to be segmented and the interactive characterization image are input into the image segmentation model obtained by training according to the method provided in any of the above embodiments, so as to obtain a second target mask image.
In practical application, a segmentation mode can be selected according to actual needs. When the selected segmentation mode is the foreground segmentation mode, the image to be segmented is input into the image segmentation model (at this time, the interactive characterization image may be set to a deactivated state), and the image segmentation model automatically processes the image to be segmented to obtain the first target mask image. When the selected segmentation mode is the interactive segmentation mode, the image to be segmented and the interactive characterization image are input into the image segmentation model (at this time, the interactive characterization image may be set to an activated state), and the image segmentation model automatically processes the image to be segmented and the interactive characterization image to obtain the second target mask image.
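At inference time, the dispatch between the two modes may be sketched as follows. The mode names and the zero-filled placeholder maps are assumptions consistent with the preset (designated pixel value) images described for the foreground segmentation mode.

import torch

def segment(model, i_rgb, mode, m_prev=None, s_pos=None, s_neg=None):
    """Run one trained image segmentation model in either segmentation mode.

    In foreground mode the interaction channels are deactivated by filling
    them with the designated (here: zero) pixel values.
    """
    b, _, h, w = i_rgb.shape
    if mode == "foreground":
        m_prev = torch.zeros(b, 1, h, w)   # preset mask image
        s_pos  = torch.zeros(b, 1, h, w)   # preset positive interaction characterization image
        s_neg  = torch.zeros(b, 1, h, w)   # preset negative interaction characterization image
    x = torch.cat([i_rgb, m_prev, s_pos, s_neg], dim=1)
    return torch.sigmoid(model(x))         # first or second target mask image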
Therefore, in the image segmentation method provided by the embodiments of the disclosure, a single image segmentation model can handle both the foreground segmentation task and the interactive segmentation task during application. On the one hand, this saves memory and computing resources, avoiding the sluggishness that can arise in application when separate segmentation models must be deployed; on the other hand, it improves the generalization and robustness of both tasks.
In addition, compared with the prior art, in which foreground segmentation and interactive segmentation are tested and deployed as two independent tasks, the image segmentation model obtained by training with the above method saves roughly half of the memory footprint during testing and deployment.
It should also be understood that the above description is intended only to help those skilled in the art better understand the embodiments of the present disclosure, and is not intended to limit the scope of the embodiments of the present disclosure. In light of the above examples, various equivalent modifications or changes will be apparent to those skilled in the art; for example, some steps in the above methods may be unnecessary, new steps may be added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations also fall within the scope of the embodiments of the present disclosure.
It should also be understood that the foregoing description of the disclosed embodiments focuses on emphasizing differences between the various embodiments, and that the same or similar elements that are not mentioned may be referred to one another and, for brevity, are not described again.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiment of the present disclosure.
It is also to be understood that, in various embodiments of the present disclosure, unless otherwise specified or conflicting, terms and/or descriptions between different embodiments may have consistency and may be mutually referenced, and technical features in different embodiments may be combined to form new embodiments according to their inherent logical relationships.
Examples of the training method of the image segmentation model provided by the present disclosure are described above in detail. It is understood that, in order to realize the above functions, the computer device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware, or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as a departure from the scope of the present disclosure.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
FIG. 8 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment. Referring to fig. 8, the apparatus 800 may include an acquisition module 810, a selection module 820, a training module 830, and an update module 840.
The acquisition module 810 is configured to acquire the image to be segmented and the interactive characterization image. The selection module 820 is configured to determine a current segmentation mode between a foreground segmentation mode and an interactive segmentation mode based on a random selection manner, the foreground segmentation mode being used for performing stage training on the image segmentation model using the image to be segmented, and the interactive segmentation mode being used for performing stage training on the image segmentation model using the image to be segmented and the interactive characterization image. The training module 830 is configured to perform stage training on the image segmentation model using the current segmentation mode. The updating module 840 is configured to, when the stage training is completed, update the current segmentation mode based on the random selection manner, so as to train the image segmentation model using the updated current segmentation mode until a training end condition is satisfied.
In some exemplary embodiments of the present disclosure, when the current segmentation mode is the foreground segmentation mode, the training module 830 is further configured to perform acquiring a preset mask image, a preset positive interaction characterization image, and a preset negative interaction characterization image, where pixel values of the preset mask image, the preset positive interaction characterization image, and the preset negative interaction characterization image are all designated pixel values; combining the image to be segmented, the preset mask image, the preset positive interaction characterization image and the preset negative interaction characterization image to obtain a first input image; and inputting the first input image into the image segmentation model, and performing stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the image segmentation model includes an encoding layer and a decoding layer; the training module 830 is further configured to perform encoding processing on the first input image input to the encoding layer, to obtain a first feature vector; inputting the first feature vector into the decoding layer for decoding processing to obtain a first prediction mask image; and performing stage training on the image segmentation model according to the first prediction mask image.
In some exemplary embodiments of the present disclosure, when the current segmentation mode is the interactive segmentation mode, the interactive characterization image includes a first mask image, a first positive interaction characterization image, and a first negative interaction characterization image. The training module 830 is further configured to merge the image to be segmented, the first mask image, the first positive interaction characterization image, and the first negative interaction characterization image to obtain a second input image; and to input the second input image into the image segmentation model and perform stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the interactive characterization image further includes a second positive interaction characterization image and a second negative interaction characterization image. The training module 830 is further configured to input the second input image into the image segmentation model to obtain a second mask image; to merge the image to be segmented, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image to obtain a third input image; and to input the third input image into the image segmentation model and perform stage training on the image segmentation model.
In some exemplary embodiments of the present disclosure, the image segmentation model includes an encoding layer and a decoding layer; wherein the training module 830 is further configured to perform encoding processing on the second input image input to the encoding layer, to obtain a second feature vector; inputting the second feature vector into the decoding layer for decoding processing to obtain a second prediction mask image; and performing stage training on the image segmentation model according to the second prediction mask image.
In some exemplary embodiments of the present disclosure, the random selection manner is to randomly select between the foreground segmentation mode and the interactive segmentation mode with a preset probability.
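A minimal sketch of such mode selection follows; the probability value 0.5 is an illustrative assumption, as the disclosure only requires some preset probability.

import random

FOREGROUND_PROB = 0.5   # preset probability (illustrative value)

def pick_segmentation_mode():
    """Randomly choose the current segmentation mode for the next training stage."""
    if random.random() < FOREGROUND_PROB:
        return "foreground"
    return "interactive"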
Fig. 9 is a block diagram illustrating an image segmentation apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus 900 may include an acquisition module 910 and an obtaining module 920.
The acquisition module 910 is configured to acquire an image to be segmented and an interactive characterization image. The obtaining module 920 is configured to, when the segmentation mode is the foreground segmentation mode, input the image to be segmented into an image segmentation model obtained by training according to the method of any of the above embodiments, so as to obtain a first target mask image. The obtaining module 920 is further configured to, when the segmentation mode is the interactive segmentation mode, input the image to be segmented and the interactive characterization image into an image segmentation model obtained by training according to the method of any of the above embodiments, so as to obtain a second target mask image.
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor terminal devices and/or microcontroller terminal devices.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
An electronic device 1000 according to an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting different system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit stores program code that can be executed by the processing unit 1010, causing the processing unit 1010 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may perform the steps shown in fig. 2.
The memory unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 1021 and/or a cache memory unit 1022, and may further include a read only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus to perform the above-described method is also provided. Alternatively, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the training method of the image segmentation model in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A training method of an image segmentation model is characterized by comprising the following steps:
acquiring an image to be segmented and an interactive characterization image;
determining a current segmentation mode between a foreground segmentation mode and an interactive segmentation mode based on a random selection manner, wherein the foreground segmentation mode is used for performing stage training on an image segmentation model using the image to be segmented, and the interactive segmentation mode is used for performing stage training on the image segmentation model using the image to be segmented and the interactive characterization image;
performing stage training on the image segmentation model by using the current segmentation mode;
and when the stage training is completed, updating the current segmentation mode based on the random selection manner, and training the image segmentation model using the updated current segmentation mode until a training end condition is satisfied.
2. The method for training the image segmentation model according to claim 1, wherein when the current segmentation mode is the foreground segmentation mode, the performing stage training on the image segmentation model using the current segmentation mode includes:
acquiring a preset mask image, a preset positive interaction characterization image and a preset negative interaction characterization image, wherein the pixel values of the preset mask image, the preset positive interaction characterization image and the preset negative interaction characterization image are all designated pixel values;
combining the image to be segmented, the preset mask image, the preset positive interaction characterization image and the preset negative interaction characterization image to obtain a first input image;
and inputting the first input image into the image segmentation model, and performing stage training on the image segmentation model.
3. The method for training an image segmentation model according to claim 2, wherein the image segmentation model comprises an encoding layer and a decoding layer;
the inputting the first input image into the image segmentation model, and performing stage training on the image segmentation model includes:
inputting the first input image into the coding layer for coding processing to obtain a first feature vector;
inputting the first feature vector into the decoding layer for decoding processing to obtain a first prediction mask image;
and performing stage training on the image segmentation model according to the first prediction mask image.
4. The training method of the image segmentation model according to claim 1 or 2, wherein when the current segmentation mode is the interactive segmentation mode, the interactive characterization image comprises a first mask image, a first positive interaction characterization image, and a first negative interaction characterization image; and the performing stage training on the image segmentation model using the current segmentation mode comprises:
merging the image to be segmented, the first mask image, the first positive interaction characterization image, and the first negative interaction characterization image to obtain a second input image;
and inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model.
5. The training method of the image segmentation model according to claim 4, wherein the interactive characterization image further comprises a second positive interaction characterization image and a second negative interaction characterization image;
the inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model includes:
inputting the second input image into the image segmentation model to obtain a second mask image;
merging the image to be segmented, the second mask image, the second positive interaction characterization image, and the second negative interaction characterization image to obtain a third input image;
and inputting the third input image into the image segmentation model, and performing stage training on the image segmentation model.
6. The method for training an image segmentation model according to claim 4, wherein the image segmentation model comprises an encoding layer and a decoding layer;
inputting the second input image into the image segmentation model, and performing stage training on the image segmentation model, wherein the stage training comprises:
inputting the second input image into the coding layer for coding processing to obtain a second feature vector;
inputting the second feature vector into the decoding layer for decoding processing to obtain a second prediction mask image;
and performing stage training on the image segmentation model according to the second prediction mask image.
7. The training method of the image segmentation model according to claim 1, wherein the random selection manner is to randomly select between the foreground segmentation mode and the interactive segmentation mode with a preset probability.
8. An image segmentation method, comprising:
acquiring an image to be segmented and an interactive characterization image;
when the segmentation mode is a foreground segmentation mode, inputting the image to be segmented into an image segmentation model obtained by training according to the method of any one of claims 1 to 7, and obtaining a first target mask image;
when the segmentation mode is an interactive segmentation mode, inputting the image to be segmented and the interactive characterization image into an image segmentation model obtained by training according to the method of any one of claims 1 to 7, and obtaining a second target mask image.
9. An apparatus for training an image segmentation model, comprising:
an acquisition module configured to acquire an image to be segmented and an interactive characterization image;
a selection module configured to determine a current segmentation mode between a foreground segmentation mode and an interactive segmentation mode based on a random selection manner, the foreground segmentation mode being used for performing stage training on an image segmentation model using the image to be segmented, and the interactive segmentation mode being used for performing stage training on the image segmentation model using the image to be segmented and the interactive characterization image;
a training module configured to perform stage training on the image segmentation model using the current segmentation mode;
and an updating module configured to, when the stage training is completed, update the current segmentation mode based on the random selection manner, so as to train the image segmentation model using the updated current segmentation mode until a training end condition is satisfied.
10. An image segmentation apparatus, comprising:
an acquisition module configured to acquire an image to be segmented and an interactive characterization image;
an obtaining module configured to, when a segmentation mode is a foreground segmentation mode, input the image to be segmented into an image segmentation model obtained by training according to the method of any one of claims 1 to 7, to obtain a first target mask image;
wherein the obtaining module is further configured to, when the segmentation mode is an interactive segmentation mode, input the image to be segmented and the interactive characterization image into an image segmentation model obtained by training according to the method of any one of claims 1 to 7, to obtain a second target mask image.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the executable instructions to implement the method of training an image segmentation model according to any one of claims 1 to 7 or the method of image segmentation according to claim 8.
12. A computer readable storage medium, instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the method of training an image segmentation model according to any one of claims 1 to 7 or the method of image segmentation according to claim 8.
CN202211244124.5A 2022-10-11 2022-10-11 Training method of image segmentation model and related equipment Pending CN115496777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211244124.5A CN115496777A (en) 2022-10-11 2022-10-11 Training method of image segmentation model and related equipment

Publications (1)

Publication Number Publication Date
CN115496777A true CN115496777A (en) 2022-12-20

Family

ID=84473540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211244124.5A Pending CN115496777A (en) 2022-10-11 2022-10-11 Training method of image segmentation model and related equipment

Country Status (1)

Country Link
CN (1) CN115496777A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination