CN116934769A

CN116934769A - Interactive segmentation model training method, annotation data generation method and equipment

Info

Publication number: CN116934769A
Application number: CN202210319010.6A
Authority: CN
Inventors: 吴俊塔; 傅依
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2023-10-24
Also published as: WO2023185391A1

Abstract

Embodiments of the present disclosure provide an interactive segmentation model training method, an annotation data generation method, and equipment. An annotation image is obtained, the annotation image comprising a target image and corresponding annotation information. Simulated interaction information is generated based on the annotation image, the simulated interaction information simulating a user's selection operation on a target object in the target image. The simulated interaction information and the target image are input into an interactive segmentation model to obtain mask information output by the interactive segmentation model, the mask information representing a predicted image area of the target object indicated by the simulated interaction information. Model parameters of the interactive segmentation model are adjusted based on the mask information and the annotation information. Given user interaction information, the trained interactive segmentation model can output mask information representing the image area of the target object that the user interaction information indicates, so that a user can quickly segment the target object from an image, which improves the efficiency of generating annotation data.

Description

Interactive segmentation model training method, annotation data generation method and equipment

Technical Field

Embodiments of the present disclosure relate to the technical fields of computer vision and artificial intelligence, and in particular to an interactive segmentation model training method, an annotation data generation method, and equipment.

Background

With the rapid development of artificial intelligence technology, image segmentation based on deep learning is now widely applied across industries. Acquired images are processed by a pre-trained neural network model to support applications such as recognizing and segmenting target objects in the images.

In the prior art, to obtain good segmentation results, a neural network model for image segmentation must be trained on a large amount of annotation data before use. However, this annotation data is usually obtained by manually tracing object outlines and matting the image, which makes obtaining it inefficient and costly.

Disclosure of Invention

Embodiments of the present disclosure provide an interactive segmentation model training method, an annotation data generation method, and equipment, so as to solve the problems of low efficiency and high cost in generating annotation data.

In a first aspect, an embodiment of the present disclosure provides an interactive segmentation model training method, including:

obtaining an annotation image, wherein the annotation image comprises a target image and corresponding annotation information, and the annotation information indicates an actual image area of a target object in the target image; and cyclically executing the following steps based on the annotation image until a preset condition is reached: generating simulated interaction information based on the annotation image, wherein the simulated interaction information simulates a user's selection operation on the target object in the target image; inputting the simulated interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of the target object indicated by the simulated interaction information; and adjusting model parameters of the interactive segmentation model based on the mask information and the annotation information.

In a second aspect, an embodiment of the present disclosure provides a method for generating annotation data, including:

acquiring an image to be segmented and user interaction information, wherein the user interaction information characterizes a user's selection operation on a target object in the image to be segmented; inputting the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information characterizes a predicted image area of the target object indicated by the user interaction information, the interactive segmentation model is trained based on an annotation image and automatically generated simulated interaction information, the annotation image comprises a target image and corresponding annotation information, the annotation information indicates an actual image area of a target object in the target image, and the simulated interaction information simulates a user's selection operation on the target object in the target image; and obtaining annotation data based on the mask information and the image to be segmented.

In a third aspect, an embodiment of the present disclosure provides an interactive segmentation model training apparatus, including:

an acquisition module, configured to acquire an annotation image, wherein the annotation image comprises a target image and corresponding annotation information, and the annotation information indicates an actual image area of a target object in the target image;

a training module, configured to cyclically execute the following steps based on the annotation image until a preset condition is reached: generating simulated interaction information based on the annotation image, wherein the simulated interaction information simulates a user's selection operation on the target object in the target image; inputting the simulated interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of the target object indicated by the simulated interaction information; and adjusting model parameters of the interactive segmentation model based on the mask information and the annotation information.

In a fourth aspect, an embodiment of the present disclosure provides an annotation data generating apparatus, including:

an acquisition module, configured to acquire an image to be segmented and user interaction information, wherein the user interaction information characterizes a user's selection operation on a target object in the image to be segmented;

a processing module, configured to input the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information characterizes a predicted image area of the target object indicated by the user interaction information, the interactive segmentation model is trained based on an annotation image and automatically generated simulated interaction information, the annotation image comprises a target image and corresponding annotation information, the annotation information indicates an actual image area of a target object in the target image, and the simulated interaction information simulates a user's selection operation on the target object in the target image;

and an annotation module, configured to obtain annotation data based on the mask information and the image to be segmented.

In a fifth aspect, embodiments of the present disclosure provide an electronic device, including:

a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes the computer-executable instructions stored by the memory to implement the interactive segmentation model training method as described above in the first aspect and the various possible designs of the first aspect; or to implement the annotation data generation method as described above in the second aspect and the various possible designs of the second aspect.

In a sixth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the interactive segmentation model training method according to the first aspect and the various possible designs of the first aspect, or implement the annotation data generation method according to the second aspect and the various possible designs of the second aspect.

In a seventh aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the interactive segmentation model training method according to the first aspect and the various possible designs of the first aspect, or implements the annotation data generation method according to the second aspect and the various possible designs of the second aspect.

According to the interactive segmentation model training method, the annotation data generation method, and the equipment provided by the embodiments, an annotation image is obtained, wherein the annotation image comprises a target image and corresponding annotation information, and the annotation information indicates an actual image area of a target object in the target image; and the following steps are cyclically executed based on the annotation image until a preset condition is reached: generating simulated interaction information based on the annotation image, wherein the simulated interaction information simulates a user's selection operation on the target object in the target image; inputting the simulated interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of the target object indicated by the simulated interaction information; and adjusting model parameters of the interactive segmentation model based on the mask information and the annotation information. Because the trained interactive segmentation model can, given user interaction information, output mask information representing the image area of the target object that the user interaction information indicates, the user does not need to segment the target object manually. The user can therefore quickly segment the target object from the image, which improves the efficiency of generating annotation data.

Drawings

To illustrate the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure; a person of ordinary skill in the art may derive other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of a training process of an image segmentation model according to an embodiment of the disclosure;

Fig. 2 is a schematic diagram of a process of manual labeling in the prior art;

Fig. 3 is a flowchart of a method for generating annotation data according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of user interaction information according to an embodiment of the present disclosure;

Fig. 5 is a schematic diagram of a process for obtaining mask information based on user interaction information according to an embodiment of the present disclosure;

Fig. 6 is a schematic diagram of an image segmentation process with multiple interaction steps according to an embodiment of the present disclosure;

Fig. 7 is a flowchart illustrating a method for training an interactive segmentation model according to an embodiment of the present disclosure;

Fig. 8 is a flowchart showing steps for implementing step S202 in the embodiment shown in Fig. 7;

Fig. 9 is a schematic diagram of a process for generating line segment identifiers according to an embodiment of the present disclosure;

Fig. 10 is a schematic diagram of a process for training an interactive segmentation model according to an embodiment of the present disclosure;

Fig. 11 is a second flowchart of an interactive segmentation model training method according to an embodiment of the present disclosure;

Fig. 12 is a schematic illustration of a region of discrepancy provided by an embodiment of the present disclosure;

Fig. 13 is a schematic view of a first marking area and a second marking area provided by an embodiment of the present disclosure;

Fig. 14 is a schematic diagram of another process for training an interactive segmentation model provided by embodiments of the present disclosure;

Fig. 15 is a block diagram of a labeling data generating device according to an embodiment of the present disclosure;

Fig. 16 is a block diagram of an interactive segmentation model training apparatus provided in an embodiment of the disclosure;

Fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure;

Fig. 18 is a schematic hardware structure of an electronic device according to an embodiment of the disclosure.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this disclosure without inventive effort fall within the scope of this disclosure.

The application scenario of the embodiments of the present disclosure is explained below:

Fig. 1 is a schematic diagram of a training process of an image segmentation model according to an embodiment of the present disclosure. A neural network model for automatic image segmentation (shown in the figure as the image segmentation model) must be trained, before use, on a large amount of annotation data carrying annotation information. Referring to Fig. 1, service-related image data (unannotated data) is first acquired. Annotators then annotate the image data manually to generate annotation data with annotation information. The image segmentation model is then trained on the annotation data until it converges, yielding a model with a certain generalization capability that can automatically segment and extract specified objects in an image (e.g., faces and human bodies) to support the corresponding service application.

The interactive segmentation model training method and the annotation data generation method provided by the embodiments of the present disclosure can be applied to the manual annotation stage of the above scenario, and in particular to the scenario of generating annotation data interactively. In the prior art, an image is usually annotated manually as follows: using image processing software, an annotator traces the outlines of the different objects in the image by hand (i.e., image matting), generates a mask for each object, and assigns the corresponding class label, which completes the manual annotation process. Fig. 2 is a schematic diagram of a manual annotation process in the prior art. As shown in Fig. 2, the image contains a "tree" and a "bicycle". To annotate the image with the "tree" as the target object, the annotator must trace the outline of the "tree" by hand in image processing software to generate the corresponding mask. This process is very time-consuming, which makes manual annotation inefficient and costly.

Therefore, a method is currently needed to solve the problem of low image annotation efficiency, which arises because an operation instruction (interaction information) input by the user during manual annotation cannot be quickly mapped to the target object to be annotated. The embodiments of the present disclosure provide a method for generating annotation data based on an interactive segmentation model, and a training method for the interactive segmentation model, to solve this problem.

Fig. 3 is a flowchart of a method for generating annotation data according to an embodiment of the disclosure. The method of the embodiment can be applied to electronic equipment such as terminal equipment, a server and the like, and the method for generating the annotation data comprises the following steps:

Step S101: Acquire an image to be segmented and user interaction information, wherein the user interaction information characterizes a user's selection operation on a target object in the image to be segmented.

In this embodiment, the process of generating annotation data is described using a terminal device, such as a computer, as the execution subject. Specifically, the terminal device provides a human-machine interaction interface, through which the user provides input to the terminal device and obtains the information it outputs. The user interaction information is the operation information that the user inputs to the terminal device through this interface. More specifically, the user interaction information is, for example, a point or a line segment that characterizes the user's selection of a target object within the image to be segmented. Fig. 4 is a schematic diagram of user interaction information provided by an embodiment of the present disclosure. As shown in Fig. 4, the terminal device displays the image to be segmented, which includes the target object "tree", on an output device such as a display screen. The user operates an input device of the terminal device, such as a mouse, to perform an interactive operation, moving the mouse pointer a certain distance inside the target object. The terminal device obtains the user interaction information corresponding to this operation, namely the line segment traced by the mouse pointer, which lies inside the target object as shown in Fig. 4. The user interaction information characterizes the user's selection of the target object.

It will be appreciated that moving the mouse pointer a certain distance inside the target object to form a line segment is only one exemplary form of user interaction information; it may take other forms, such as a click inside the target object. Whatever its form, the user interaction information is merely indicative. Compared with the "tracing" interaction that the user must input in the prior art, it is greatly simplified, and the time the user spends inputting it is greatly reduced.

Step S102: Input the image to be segmented and the user interaction information into the interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of the target object indicated by the user interaction information.

The interactive segmentation model is trained based on an annotation image and automatically generated simulated interaction information. The annotation image comprises a target image and corresponding annotation information, the annotation information indicates an actual image area of a target object in the target image, and the simulated interaction information simulates a user's selection operation on the target object in the target image.

Further, after obtaining the image to be segmented and the user interaction information, the terminal device runs the pre-trained interactive segmentation model with them as input and obtains the mask information output by the model. The mask information is the model's prediction for the target object indicated by the user interaction information, that is, the predicted image area of the target object in the image to be segmented.

The interactive segmentation model is trained to convergence using annotation images and corresponding interaction information as training samples, and it can map user interaction information to the target object that the information indicates. Therefore, after the image to be segmented and the user interaction information are input into the interactive segmentation model, the image area of the target object indicated by the user interaction information, namely the mask information, can be obtained. Fig. 5 is a schematic diagram of a process of obtaining mask information based on user interaction information according to an embodiment of the present disclosure. As shown in Fig. 5, after the user interaction information is obtained, the image to be segmented and the user interaction information are input into the interactive segmentation model to obtain mask information representing the image area of the target object. More specifically, the mask information describes the contour of the target object in the image to be segmented, and the target object can be segmented based on it. The mask information corresponds to the output that the prior-art "tracing" approach produces by hand. Because obtaining mask information from the interactive segmentation model and the user interaction information is more efficient than obtaining it by tracing, the efficiency of generating annotation data is improved.

In the manual annotation scenario, the user's choice of target object varies with the segmentation requirements and is therefore somewhat flexible. Fully automatic segmentation cannot be achieved with the interactive segmentation model alone, and accurate segmentation is instead achieved through several interaction steps. For example, suppose the image to be segmented contains a person wearing a hat. Because the user interaction information is a simplified identifier (such as a line segment or a point), when it falls within the image area of the person, the terminal device has no reference information for deciding whether the hat should be segmented as part of the target object. Further user interaction information is then needed for confirmation: for example, the user additionally clicks the hat in the image to be segmented, so that the hat, which was not segmented in the previous round, is segmented as part of the target object.

For these reasons, in one possible implementation, the interactive segmentation model takes a further input: prior mask information, which is the mask information output by the interactive segmentation model in the previous round. Fig. 6 is a schematic diagram of an image segmentation process with multiple interaction steps according to an embodiment of the present disclosure. As shown in Fig. 6, in an exemplary segmentation process requiring multiple interactions, in the first interaction (Step 1) the image to be segmented and the first user interaction information are used as input to run the interactive segmentation model, yielding first mask information output by the model. In the second interaction (Step 2), the first mask information, the image to be segmented, and the second user interaction information are used as input to run the model, yielding second mask information. The process repeats until satisfactory mask information is obtained.

In this step, the prior mask information output by the interactive segmentation model in the previous round is obtained and used as input to the model. This enables image segmentation across multiple interaction steps, improves the accuracy and flexibility of segmentation, and improves the efficiency of generating annotation data.
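
The multi-round flow of Fig. 6 can be sketched roughly as follows. This is a minimal illustration only: `toy_model` is a hypothetical stand-in for the trained interactive segmentation model (it simply selects pixels whose value matches the pixel under the interaction), and the function names and the list-of-lists image representation are assumptions, not part of the disclosure. The point it shows is that each round feeds the previous round's mask back in as the prior mask.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]              # toy image: one integer value per pixel
Mask = List[List[int]]               # binary mask: 1 marks the target object
Interaction = List[Tuple[int, int]]  # (row, col) pixels touched by the user


def toy_model(image: Image, interaction: Interaction,
              prior_mask: Optional[Mask]) -> Mask:
    """Hypothetical stand-in for the interactive segmentation model.

    Selects every pixel whose value equals the value under an interaction
    pixel, and keeps whatever the prior mask had already selected.
    """
    selected = {image[r][c] for r, c in interaction}
    mask = [[1 if value in selected else 0 for value in row] for row in image]
    if prior_mask is not None:
        mask = [[m | p for m, p in zip(mrow, prow)]
                for mrow, prow in zip(mask, prior_mask)]
    return mask


def segment_interactively(
    image: Image,
    interactions: List[Interaction],
    model: Callable[[Image, Interaction, Optional[Mask]], Mask] = toy_model,
) -> Optional[Mask]:
    """Run one model call per interaction, feeding back the previous mask."""
    prior_mask: Optional[Mask] = None  # no prior mask exists in the first round
    for interaction in interactions:
        prior_mask = model(image, interaction, prior_mask)
    return prior_mask


# Toy scene: 0 = background, 1 = person, 2 = hat.
scene = [[0, 2, 0],
         [0, 1, 0],
         [0, 1, 0]]
final_mask = segment_interactively(scene, [[(1, 1)], [(0, 1)]])
```

In the toy scene, the first interaction selects only the person, and the second interaction on the hat extends the mask, mirroring the hat example above.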

Step S103: Obtain annotation data based on the mask information and the image to be segmented.

For example, the mask information obtained represents the predicted image area of the target object. Because the trained interactive segmentation model can accurately describe the image area corresponding to the target object, the mask information can be used as the annotation information of the target object in the image to be segmented, thereby generating annotation data. The annotation data can then be used to train an automatic image segmentation model.
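
As a rough illustration of step S103, the sketch below pairs a mask with an image identifier to form one annotation record. The record layout (the keys `image_id`, `label`, `mask`, `area`, `bbox`) and the function name are assumptions made for this example; the disclosure does not specify a storage format.

```python
from typing import Dict, List, Optional


def build_annotation_record(image_id: str, mask: List[List[int]],
                            label: Optional[str] = None) -> Dict:
    """Bundle a binary mask with its image into one annotation record.

    The field names are hypothetical; they only illustrate that the mask
    output by the model is reused as the annotation information.
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask is empty; nothing to annotate")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "image_id": image_id,
        "label": label,
        "mask": [row[:] for row in mask],  # copy so later edits do not leak
        "area": len(pixels),               # number of target-object pixels
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
    }


record = build_annotation_record(
    "example-image", [[0, 1, 1], [0, 1, 0], [0, 0, 0]], label="tree")
```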

In this embodiment, because the trained interactive segmentation model can, given user interaction information, output mask information representing the image area of the target object that the user interaction information indicates, the user does not need to segment the target object manually. The user can therefore quickly segment the target object from the image, which improves the efficiency of generating annotation data.

Corresponding to the method for generating annotation data based on the interactive segmentation model provided in the above embodiment, the embodiments of the present disclosure further provide an interactive segmentation model training method. Fig. 7 is a flowchart of an interactive segmentation model training method according to an embodiment of the present disclosure. Referring to Fig. 7, the interactive segmentation model training method includes:

Step S201: Acquire an annotation image, wherein the annotation image comprises a target image and corresponding annotation information, and the annotation information indicates an actual image area of a target object in the target image; and cyclically execute the following steps based on the annotation image until a preset condition is reached.

Illustratively, the annotation image is image data comprising a target image and corresponding annotation information. The target image is an original image used for model training, e.g., an RGB image. The annotation information indicates the actual image area of the target object in the target image, i.e., the segmentation result for the target object. More specifically, the annotation image is image data for which image segmentation has already been completed; it may contain one or more segmented target objects, and it can be obtained from an open-source segmentation dataset, which is not described further here.

Further, training the interactive segmentation model requires training on multiple annotation images repeatedly until a preset condition is reached, for example, the number of training iterations reaches a preset count, the training time reaches a preset duration, or the loss function value falls below a preset value; this is not specifically limited here. The following steps of this embodiment are described using one training cycle as an example.
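
The three example preset conditions named above can be combined in a small check such as the following sketch. The function name, parameter names, and the choice to stop as soon as any configured condition holds are assumptions for illustration; the disclosure leaves the preset condition open.

```python
from typing import Optional


def preset_condition_reached(
    iteration: int,
    elapsed_seconds: float,
    loss: float,
    max_iterations: Optional[int] = None,
    max_seconds: Optional[float] = None,
    loss_threshold: Optional[float] = None,
) -> bool:
    """Return True once any configured stopping condition is met.

    A limit left as None is ignored, so callers may use one, two,
    or all three of the example conditions.
    """
    if max_iterations is not None and iteration >= max_iterations:
        return True   # training count reached the preset number of times
    if max_seconds is not None and elapsed_seconds >= max_seconds:
        return True   # training time reached the preset duration
    if loss_threshold is not None and loss < loss_threshold:
        return True   # loss function value fell below the preset value
    return False


# Example: stop after 1000 iterations or once the loss drops below 0.05.
should_stop = preset_condition_reached(
    iteration=250, elapsed_seconds=42.0, loss=0.03,
    max_iterations=1000, loss_threshold=0.05)
```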

Step S202: Generate simulated interaction information based on the annotation image, wherein the simulated interaction information simulates a user's selection operation on the target object in the target image.

For example, unlike an automatic image segmentation model, the interactive segmentation model takes user interaction information as input. Training it therefore requires not only annotated image data but also corresponding user interaction information as training samples. No open-source training samples of this type exist, and collecting training samples with user interaction information manually is extremely costly. To solve this problem, this embodiment uses the annotation information in the annotation image, which describes the target object in the target image, to simulate the user's selection operation on the target object (a real user's selection is likewise based on the actual image area of the target object), and thereby generates simulated interaction information. The simulated interaction information is then used as input in the subsequent training process, which avoids the high cost and low efficiency of manually collecting user interaction information.

Illustratively, as shown in Fig. 8, step S202 includes three specific implementation steps, S2021, S2022, and S2023:

Step S2021: Determine a marker region based on the annotation information, wherein the marker region is the trigger region of the selection operation corresponding to the simulated interaction information.

Step S2022: Randomly generate an operation identifier in the marker region.

Step S2023: Generate the simulated interaction information according to the pixel coordinates of the operation identifier.

Specifically, the marker region is the trigger region of the selection operation corresponding to the simulated interaction information, i.e., the region in which the user's selection operation on the target object is simulated, for example, the region where a click is performed on the target image, or the region where the mouse pointer slides on the target image to produce a track. In one possible implementation, when the simulated interaction information simulates the user's first selection operation on the target object, the marker region is the actual image area of the target object indicated by the annotation information. A user operation is simulated within this region to generate an operation identifier, which is, for example, a line segment, a point, or another graphic or shape. Corresponding simulated interaction information is then generated from the pixel coordinates of the operation identifier to simulate the user's operation on the region.
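
For the simplest operation identifier, a single point, steps S2021 to S2023 might look like the sketch below. It assumes the marker region is given as a binary mask and that the simulated interaction information is encoded as a binary map of the same size as the image; both the encoding and the function names are illustrative assumptions, since the disclosure only states that the information is generated from the identifier's pixel coordinates.

```python
import random
from typing import Iterable, List, Tuple

Mask = List[List[int]]
Pixel = Tuple[int, int]


def random_point_identifier(marker_region: Mask, rng: random.Random) -> Pixel:
    """Pick one pixel uniformly at random inside the marker region (S2022)."""
    candidates = [(r, c) for r, row in enumerate(marker_region)
                  for c, value in enumerate(row) if value]
    if not candidates:
        raise ValueError("marker region is empty")
    return rng.choice(candidates)


def encode_interaction(height: int, width: int,
                       pixels: Iterable[Pixel]) -> Mask:
    """Rasterize identifier pixels into a binary interaction map (S2023).

    Assumed encoding: 1 where the simulated user touched the image, else 0.
    """
    interaction_map = [[0] * width for _ in range(height)]
    for r, c in pixels:
        interaction_map[r][c] = 1
    return interaction_map


# S2021: for a first interaction, the marker region is the annotated area.
annotation_mask = [[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]]
click = random_point_identifier(annotation_mask, random.Random(0))
simulated_interaction = encode_interaction(4, 4, [click])
```

Passing an explicit `random.Random` instance keeps the simulated clicks reproducible across training runs, which is a design choice of this sketch rather than a requirement of the method.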

In one possible implementation, the operation identifier includes a line segment identifier, and randomly generating an operation identifier in the marker region includes: obtaining, based on a skeleton extraction algorithm, a region skeleton corresponding to the marker region, where the region skeleton includes continuous line segments with a preset pixel width; and randomly intercepting at least one line segment of a preset length on the region skeleton to generate the line segment identifier.

Fig. 9 is a schematic diagram of a process of generating a line segment identifier according to an embodiment of the present disclosure. As shown in fig. 9, the actual image area of the target object is determined to be region A based on the labeling information, and region A is taken as the marker region. A corresponding region skeleton a is generated within region A based on a skeleton extraction algorithm; the region skeleton a includes continuous line segments a1, a2 and a3 with a preset pixel width. At least one line segment of a preset length is then randomly intercepted on a1, a2 and a3 to obtain a line segment identifier b1 and a line segment identifier b2. Further, simulated interaction information describing the line segment identifiers is generated based on their pixel coordinates.
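The random interception step can be sketched as follows. This sketch assumes that the skeleton extraction has already been done and that one skeleton branch is available as an ordered list of (row, column) pixel coordinates; the function name `random_segment` and the sample branch are hypothetical.

```python
import numpy as np


def random_segment(skeleton_path, length, rng):
    """Cut one run of `length` consecutive pixels out of an ordered skeleton branch."""
    path = np.asarray(skeleton_path)
    if length > len(path):
        raise ValueError("segment is longer than the skeleton branch")
    # Random start index chosen so that the whole segment stays on the branch.
    start = rng.integers(0, len(path) - length + 1)
    return path[start:start + length]


# Hypothetical skeleton branch a1: a horizontal run of (row, col) pixels.
a1 = [(5, c) for c in range(2, 12)]
b1 = random_segment(a1, length=4, rng=np.random.default_rng(1))
```

The returned pixel coordinates play the role of the line segment identifier, from which the simulated interaction information would then be built.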

Step S203: inputting the simulated interaction information and the target image into the interactive segmentation model to obtain mask information output by the interactive segmentation model, where the mask information represents a predicted image area of the target object indicated by the simulated interaction information.

Step S204: based on the mask information and the labeling information, model parameters of the interactive segmentation model are adjusted.

Illustratively, the interactive segmentation model is a neural network model based on an encoder-decoder structure. Fig. 10 is a schematic diagram of a process of training an interactive segmentation model according to an embodiment of the present disclosure. As shown in fig. 10, the simulated interaction information and the target image are input into the encoder of the interactive segmentation model, and the corresponding segmentation result, that is, the mask information, is output via the decoder of the model. The mask information is the model's prediction (the predicted image area) of the image area of the target object indicated by the simulated interaction information. When the interactive segmentation model has not converged, a difference region exists between the predicted image area and the actual image area of the target object represented by the labeling information. A loss function value corresponding to the difference region is calculated based on a preset loss function and back-propagated through the interactive segmentation model to adjust its model parameters.

The above steps are executed in a loop until the preset condition is reached, which completes the training of the interactive segmentation model. Given an image to be segmented and user interaction information as input, the trained interactive segmentation model outputs the mask information of the target object indicated by the user interaction information, thereby achieving rapid segmentation of the target object. The specific process of calculating the loss function and adjusting the model parameters by back propagation is prior art known to those skilled in the art and is not described here.
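For orientation only, the loop of "predict mask, compute loss against the labeling information, adjust parameters" can be sketched with a toy model. The per-pixel logistic model below is a deliberately simple stand-in for the encoder-decoder network, and the binary cross-entropy loss, the learning rate and the function name `train_toy_model` are all assumptions rather than details of the disclosure.

```python
import numpy as np


def train_toy_model(image, interaction, label, steps=200, lr=0.5):
    """Fit a per-pixel logistic model; a stand-in for the encoder-decoder network."""
    # Stack the target image, the simulated interaction map and a bias channel.
    x = np.stack([image, interaction, np.ones_like(image)], axis=-1)  # (H, W, 3)
    y = label.astype(np.float64)
    w = np.zeros(3)
    eps = 1e-7
    losses = []
    for _ in range(steps):
        mask = 1.0 / (1.0 + np.exp(-(x @ w)))  # predicted mask information
        # Binary cross-entropy between the mask and the labeling information.
        loss = -np.mean(y * np.log(mask + eps) + (1 - y) * np.log(1 - mask + eps))
        losses.append(loss)
        # Gradient of the loss with respect to w, then a gradient-descent update.
        grad = np.einsum("hwc,hw->c", x, mask - y) / y.size
        w -= lr * grad
    return w, losses


# Hypothetical 8x8 sample: a bright square object and one simulated click on it.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
label = image > 0.5
interaction = np.zeros((8, 8))
interaction[3, 3] = 1.0
w, losses = train_toy_model(image, interaction, label)
```

A real implementation would replace the logistic model with the neural network and use a framework's automatic differentiation, but the structure of the loop is the same.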

In this embodiment, a labeling image is acquired, where the labeling image includes a target image and corresponding labeling information, and the labeling information is used to indicate an actual image area of a target object in the target image. The following steps are performed in a loop based on the labeling image until a preset condition is reached: generating simulated interaction information based on the labeling image, where the simulated interaction information is used to simulate a user's selection operation on the target object in the target image; inputting the simulated interaction information and the target image into the interactive segmentation model to obtain mask information output by the interactive segmentation model, where the mask information represents a predicted image area of the target object indicated by the simulated interaction information; and adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information. Because the trained interactive segmentation model can output, based on user interaction information, mask information representing the image area of the target object indicated by that information, the user does not need to segment the target object manually. The user can thus quickly segment the target object from the image, and the generation efficiency of the annotation data is improved.

Fig. 11 is a second flowchart of an interactive segmentation model training method according to an embodiment of the disclosure. In this embodiment, the process of generating the simulated interactive information is further refined, and a training process based on prior information is added, and the interactive segmentation model training method includes:

Step S301: acquiring a labeling image, where the labeling image includes a target image and corresponding labeling information, and the labeling information is used to indicate an actual image area of a target object in the target image; the following steps are executed in a loop based on the labeling image until a preset condition is reached.

When training the interactive segmentation model for the multi-step interactive image segmentation scenario shown in fig. 6, training is performed over multiple interaction steps for one labeling image until a preset condition is reached. In this scenario the preset condition includes, for example, the number of training iterations reaching a preset number, the training time reaching a preset duration, or the loss function value between the mask information and the labeling information being smaller than a preset value; no specific limitation is made here. Training is then repeated over a plurality of labeling images as needed until the model converges. The following steps of this embodiment are described taking a training cycle in a non-first interaction process as an example.
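The three example preset conditions can be combined into a single check, sketched below. The function name and the default thresholds are hypothetical; the disclosure does not fix any particular values.

```python
import time


def reached_preset_condition(step, started_at, loss, max_steps=1000,
                             max_seconds=3600.0, loss_threshold=0.01):
    """Return True once any one of the example stop conditions holds."""
    return (
        step >= max_steps                                  # preset number of iterations
        or time.monotonic() - started_at >= max_seconds    # preset training duration
        or loss < loss_threshold                           # loss below the preset value
    )


started = time.monotonic()
stop_on_steps = reached_preset_condition(1000, started, loss=1.0)
stop_on_loss = reached_preset_condition(1, started, loss=0.001)
keep_going = reached_preset_condition(1, started, loss=1.0)
```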

Step S302: acquiring prior information, where the prior information is mask information obtained in the previous iteration.

The prior information is the mask information obtained in the previous iteration. More specifically, for example, if the present iteration simulates the interaction process corresponding to the second selection operation on the target object, then the mask information output by the interactive segmentation model in the previous iteration (that is, the interaction process corresponding to the first selection operation on the target object) is the prior information in this step.

Step S303: determining difference information according to the labeling information and the prior information, where the difference information characterizes a difference region between the predicted image area corresponding to the prior information and the actual image area corresponding to the labeling information.

When the interactive segmentation model has not converged, the output mask information (the prior information) differs from the labeling information, that is, a difference region exists between the predicted image area and the actual image area. In the multi-step interaction scenario to which this embodiment applies, the difference region corresponds to the region in which the interactive segmentation model has misunderstood the user's instructions. Fig. 12 is a schematic diagram of a difference region provided in an embodiment of the present disclosure. As shown in fig. 12, based on the original target image, the actual image area corresponding to the labeling information is region A, that is, the head outline region of the character image, and the predicted image area corresponding to the prior information is region B. The non-overlapping region between the two is the difference region, and the difference information is information describing the difference region, implemented for example as pixel coordinates.

Illustratively, the annotation information comprises a first binary map corresponding to an actual image region of the target object and the prior information comprises a second binary map corresponding to a predicted image region. In one possible implementation, determining the difference information according to the labeling information and the prior information includes: and comparing the difference points of the first binary image and the second binary image to determine difference information.

The binary image is a data structure represented by only two numerical values, for example a matrix of 0s and 1s. Its size is the same as that of the target image, and it can describe the outline of the target object. In this embodiment, by comparing the first binary image corresponding to the actual image area with the second binary image corresponding to the predicted image area, the points at which the two binary images differ can be determined, and difference information describing the set of difference points is generated.
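The comparison of the two binary images can be sketched as an element-wise exclusive OR. The function name `difference_points` and the sample arrays are illustrative assumptions; the sketch returns the difference information as pixel coordinates.

```python
import numpy as np


def difference_points(first_binary, second_binary):
    """Return the (row, col) coordinates where the two binary maps disagree."""
    first = np.asarray(first_binary, dtype=bool)
    second = np.asarray(second_binary, dtype=bool)
    if first.shape != second.shape:
        raise ValueError("binary maps must have the same size")
    # XOR is True exactly where one map is 1 and the other is 0.
    return np.argwhere(first ^ second)


# First binary image (actual image area) and second binary image (predicted image area).
actual = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0]])
predicted = np.array([[0, 0, 1, 1],
                      [0, 1, 1, 0]])
diff = difference_points(actual, predicted)
```

In this sample the two maps disagree at pixels (0, 1) and (0, 3), so `diff` holds those two coordinates.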

Step S304: determining a first marker region and/or a second marker region according to the difference information, where the first marker region is a missed-cut image region of the predicted image area relative to the actual image area, and the second marker region is a miscut image region of the predicted image area relative to the actual image area.

Further, in order to simulate the interactive operation of a user after finding the difference region, difference regions of different types are set as different marker regions according to the difference information describing the difference points. That is, the type of a difference region is set according to whether it lies inside or outside the actual image area: a difference region lying inside the actual image area, where part of the target object was left out of the prediction, is a missed-cut image region, and a difference region lying outside the actual image area, where the prediction wrongly includes pixels that do not belong to the target object, is a miscut image region. The corresponding first marker region and/or second marker region are determined according to the type of the difference region. Fig. 13 is a schematic diagram of the first marker region and the second marker region provided in an embodiment of the present disclosure. As shown in fig. 13, in the target image, the first marker region is the missed-cut image region (indicated by "+" in the figure) of the predicted image area relative to the actual image area, and the second marker region is the miscut image region (indicated by "-" in the figure) of the predicted image area relative to the actual image area.
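A minimal sketch of step S304 follows. It reads the missed-cut region as target pixels that the prediction left out (to be filled, "+") and the miscut region as pixels that the prediction wrongly included (to be clipped, "-"); this reading, the function name `split_difference` and the sample arrays are assumptions made for illustration.

```python
import numpy as np


def split_difference(actual, predicted):
    """Split the difference region into missed-cut and miscut marker regions."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    missed = actual & ~predicted   # first marker region: object pixels not predicted
    miscut = predicted & ~actual   # second marker region: predicted pixels off the object
    return missed, miscut


actual = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0]])
predicted = np.array([[0, 0, 1, 1],
                      [0, 1, 1, 0]])
missed, miscut = split_difference(actual, predicted)
```

Either region may be empty in a given iteration, which corresponds to the "and/or" wording of step S304.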

Step S305: generating, in the first marker region, a first operation identifier for indicating the missed-cut image region, and generating, in the second marker region, a second operation identifier for indicating the miscut image region.

Step S306: generating simulated interaction information according to the first operation identifier and the second operation identifier.

Further, after the first marker region and/or the second marker region are determined, an operation identifier is set in the corresponding marker region to fill or clip the difference region of the corresponding type. Specifically, a first operation identifier is set in the missed-cut image region and a second operation identifier is set in the miscut image region; the first operation identifier represents filling of the corresponding region, and the second operation identifier represents clipping of the corresponding region. Simulated interaction information is then generated according to the first operation identifier and the second operation identifier, and it can represent the different operations that a user inputs for the different difference regions (the missed-cut image region and the miscut image region).
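Steps S305 and S306 can be sketched as below, assuming point-type operation identifiers and a two-channel encoding in which channel 0 carries the "fill" identifier and channel 1 the "clip" identifier. The encoding and the function name `simulate_correction` are assumptions, not part of the disclosure.

```python
import numpy as np


def simulate_correction(missed, miscut, rng):
    """Place one random point in each non-empty marker region."""
    interaction = np.zeros((2,) + missed.shape, dtype=np.float32)
    for channel, region in enumerate((missed, miscut)):
        ys, xs = np.nonzero(region)
        if ys.size == 0:  # this marker region is absent in the current iteration
            continue
        i = rng.integers(ys.size)
        interaction[channel, ys[i], xs[i]] = 1.0
    return interaction


# Hypothetical marker regions: two missed-cut pixels and no miscut pixels.
missed = np.zeros((4, 4), dtype=bool)
missed[1, 1:3] = True
miscut = np.zeros((4, 4), dtype=bool)
interaction = simulate_correction(missed, miscut, np.random.default_rng(0))
```

Keeping the two identifier types in separate channels lets the model distinguish a request to add a region from a request to remove one.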

Step S307: inputting the simulated interaction information, the target image and the prior information into the interactive segmentation model to obtain mask information output by the interactive segmentation model.

Further, fig. 14 is a schematic diagram of another process of training an interactive segmentation model according to an embodiment of the present disclosure. As shown in fig. 14, the simulated interaction information, the target image and the prior information are input into the interactive segmentation model to obtain the mask information output by the interactive segmentation model. Because the model makes use of the prior information output in the previous iteration, its ability to understand the user interaction information input by a user in a multi-step interaction scenario is improved, and the accuracy of image segmentation in such a scenario is improved accordingly.
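One common way to feed several inputs to a segmentation network is to concatenate them along the channel axis. The sketch below assumes this approach, which the disclosure does not specify; the shapes, the channel order and the function name `build_model_input` are hypothetical.

```python
import numpy as np


def build_model_input(image, interaction, prior_mask):
    """Concatenate image (H, W, 3), interaction (2, H, W) and prior mask (H, W)."""
    image_chw = np.transpose(image.astype(np.float32), (2, 0, 1))  # to (3, H, W)
    prior = prior_mask.astype(np.float32)[None]                    # to (1, H, W)
    return np.concatenate([image_chw, interaction.astype(np.float32), prior], axis=0)


image = np.zeros((4, 5, 3))
interaction = np.zeros((2, 4, 5))
prior_mask = np.zeros((4, 5))  # all zeros could stand for "no prior yet"
model_input = build_model_input(image, interaction, prior_mask)
```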

Step S308: based on the mask information and the labeling information, model parameters of the interactive segmentation model are adjusted.

In this embodiment, step S301 and step S308 are implemented in the same manner as step S201 and step S204, respectively, in the embodiment shown in fig. 7 of the present disclosure, and are not described in detail here.

Corresponding to the method for generating annotation data of the above embodiment, fig. 15 is a block diagram of the structure of the device for generating annotation data provided in the embodiment of the present disclosure. For ease of illustration, only portions relevant to embodiments of the present disclosure are shown. Referring to fig. 15, the annotation data generating apparatus 4 includes:

The obtaining module 41 is configured to obtain an image to be segmented and user interaction information, where the user interaction information characterizes a selection operation of a user on a target object in the image to be segmented;

the processing module 42 is configured to input the image to be segmented and the user interaction information into an interactive segmentation model, so as to obtain mask information output by the interactive segmentation model, where the mask information characterizes a predicted image area of the target object indicated by the user interaction information, and the interactive segmentation model is obtained by training according to a method as shown in any one of fig. 7-14;

the labeling module 43 is configured to obtain labeling data based on the mask information and the image to be segmented.

In one embodiment of the present disclosure, the obtaining module 41 is further configured to: acquire prior mask information, where the prior mask information is the mask information output by the interactive segmentation model the previous time;

the processing module 42 is specifically configured to: and inputting the image to be segmented, the user interaction information and the priori mask information into the interactive segmentation model to obtain the mask information output by the interactive segmentation model.

The obtaining module 41, the processing module 42 and the labeling module 43 are sequentially connected. The annotation data generating device 4 provided in this embodiment may execute the technical solution of any one of the method embodiments shown in fig. 3 to 6; its implementation principle and technical effect are similar and are not described again here.
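The sequential connection of the three modules can be sketched as a single function. Everything concrete here is an assumption made for illustration: the model is passed in as a callable, `dummy_model` is a hypothetical stand-in that ignores the interaction, and the annotation data is represented as a binary mask plus a bounding box.

```python
import numpy as np


def generate_annotation(image, user_interaction, model, prior_mask=None):
    """Obtain inputs, run the model, then derive annotation data from the mask."""
    # Obtaining module 41: use an empty prior mask when no earlier output exists.
    if prior_mask is None:
        prior_mask = np.zeros(image.shape[:2], dtype=np.float32)
    # Processing module 42: the model returns per-pixel mask information.
    mask = model(image, user_interaction, prior_mask)
    # Labeling module 43: binarize the mask and compute a bounding box.
    binary = mask > 0.5
    ys, xs = np.nonzero(binary)
    bbox = None
    if ys.size:
        bbox = (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))
    return {"mask": binary, "bbox": bbox}


def dummy_model(image, interaction, prior):
    """Hypothetical stand-in: marks pixels brighter than 0.5 as the target."""
    return (image.mean(axis=-1) > 0.5).astype(np.float32)


image = np.zeros((6, 6, 3))
image[1:4, 2:5] = 1.0
click = np.zeros((6, 6), dtype=np.float32)
click[2, 3] = 1.0
result = generate_annotation(image, click, dummy_model)
```

In a multi-step session, the mask returned by one call would be passed back as `prior_mask` in the next call.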

Corresponding to the interactive segmentation model training method of the above embodiment, fig. 16 is a structural block diagram of an interactive segmentation model training apparatus provided in an embodiment of the present disclosure. For ease of illustration, only portions relevant to embodiments of the present disclosure are shown. Referring to fig. 16, the interactive segmentation model training device 5 includes:

the obtaining module 51 is configured to obtain a labeling image, where the labeling image includes a target image and corresponding labeling information, and the labeling information is used to indicate an actual image area of a target object in the target image;

the training module 52 is configured to perform the following steps in a loop based on the labeling image until a preset condition is reached: based on the labeling image, generating simulation interaction information, wherein the simulation interaction information is used for simulating the selection operation of a user on a target object in the target image; inputting the simulation interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of the target object indicated by the simulation interaction information; based on the mask information and the labeling information, model parameters of the interactive segmentation model are adjusted.

In one embodiment of the present disclosure, the training module 52 is specifically configured to, when generating simulated interaction information based on the labeling image: determine a marker region based on the labeling information, where the marker region is a trigger region of the selection operation corresponding to the simulated interaction information; randomly generate an operation identifier in the marker region; and generate the simulated interaction information according to the pixel coordinates of the operation identifier.

In one embodiment of the present disclosure, the training module 52 is specifically configured to, when determining the marker region based on the labeling information: acquire prior information, where the prior information is mask information obtained in the previous iteration; determine difference information according to the labeling information and the prior information, where the difference information characterizes a difference region between the predicted image area corresponding to the prior information and the actual image area corresponding to the labeling information; and determine the marker region according to the difference information.

In one embodiment of the disclosure, the labeling information includes a first binary image corresponding to an actual image region of the target object, and the prior information includes a second binary image corresponding to a predicted image region; the training module 52 is specifically configured to, when determining the difference information according to the labeling information and the prior information: and comparing the difference points of the first binary image and the second binary image to determine difference information.

In one embodiment of the present disclosure, the training module 52 is specifically configured to, when determining the marker region based on the difference information: determine a first marker region and/or a second marker region according to the difference information, where the first marker region is a missed-cut image region of the predicted image area relative to the actual image area, and the second marker region is a miscut image region of the predicted image area relative to the actual image area; the training module 52 is specifically configured to, when randomly generating the operation identifier in the marker region: generate, in the first marker region, a first operation identifier for indicating the missed-cut image region; and generate, in the second marker region, a second operation identifier for indicating the miscut image region.

In one embodiment of the present disclosure, the operation identifier comprises a line segment identifier; the training module 52 is specifically configured to, when randomly generating the operation identifier in the marker region: obtain, based on a skeleton extraction algorithm, a region skeleton corresponding to the marker region, where the region skeleton includes continuous line segments with a preset pixel width; and randomly intercept at least one line segment of a preset length on the region skeleton to generate the line segment identifier.

In one embodiment of the present disclosure, the obtaining module 51 is further configured to: acquire prior information, where the prior information is mask information obtained in the previous iteration; the training module 52 is specifically configured to, when inputting the simulated interaction information and the target image into the interactive segmentation model to obtain the mask information output by the interactive segmentation model: input the simulated interaction information, the target image and the prior information into the interactive segmentation model to obtain the mask information output by the interactive segmentation model.

In one embodiment of the present disclosure, the training module 52 is specifically configured to, when adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information: obtain a loss function value between the mask information and the labeling information; and adjust the model parameters of the interactive segmentation model based on the loss function value.

In one embodiment of the present disclosure, the interactive segmentation model is a neural network model based on an encoder-decoder structure.

The obtaining module 51 and the training module 52 are sequentially connected. The interactive segmentation model training device 5 provided in this embodiment may execute the technical solution of any one of the method embodiments shown in fig. 7 to 14; its implementation principle and technical effect are similar and are not described again here.

Fig. 17 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 17, the electronic device 6 includes:

A processor 61, and a memory 62 communicatively connected to the processor 61;

memory 62 stores computer-executable instructions;

processor 61 executes computer-executable instructions stored in memory 62 to implement the method in the embodiments shown in fig. 3-14.

Wherein optionally processor 61 and memory 62 are connected by bus 63.

The relevant descriptions and effects corresponding to the steps in the embodiments corresponding to fig. 3 to 14 may be understood correspondingly, and are not repeated here.

Referring to fig. 18, there is shown a schematic structural diagram of an electronic device 900 suitable for use in implementing embodiments of the present disclosure, where the electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (Personal Digital Assistant, PDA for short), a tablet (Portable Android Device, PAD for short), a portable multimedia player (Portable Media Player, PMP for short), an in-vehicle terminal (e.g., an in-vehicle navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 18 is merely an example, and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.

As shown in fig. 18, the electronic apparatus 900 may include a processing device (e.g., a central processor, a graphics processor, or the like) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage device 908 into a random access Memory (Random Access Memory, RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

In general, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 907 including, for example, a liquid crystal display (Liquid Crystal Display, LCD for short), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While fig. 18 shows an electronic device 900 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN for short) or a wide area network (Wide Area Network, WAN for short), or it may be connected to an external computer (e.g., connected via the internet using an internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a first aspect, according to one or more embodiments of the present disclosure, there is provided an interactive segmentation model training method, comprising:

According to one or more embodiments of the present disclosure, generating simulation interaction information based on the annotation image includes: determining a marked area based on the labeling information, wherein the marked area is the area in which the selection operation corresponding to the simulation interaction information is triggered; randomly generating an operation identifier in the marked area; and generating the simulation interaction information according to the pixel coordinates of the operation identifier.
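The three steps above can be illustrated with a minimal sketch. It assumes that the marked area is given as a binary grid, that the operation identifier is a single click point, and that the interaction information is encoded as a small disk around that point; none of these choices is specified by the disclosure, and the names `sample_click` and `click_to_interaction_map` are hypothetical.

```python
import random


def sample_click(marked_area, rng):
    """Pick one random pixel (row, col) inside the marked area (assumed binary grid)."""
    candidates = [(r, c)
                  for r, row in enumerate(marked_area)
                  for c, value in enumerate(row) if value]
    if not candidates:
        raise ValueError("marked area is empty")
    return rng.choice(candidates)


def click_to_interaction_map(click, height, width, radius=1):
    """Encode the click's pixel coordinates as a binary map (disk around the click).

    The disk encoding is an assumption made for illustration only.
    """
    cr, cc = click
    return [[1 if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2 else 0
             for c in range(width)]
            for r in range(height)]


# Toy 4x5 labeling information (1 = target object); here the whole labeled
# area serves as the marked area.
label = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
rng = random.Random(0)
click = sample_click(label, rng)
interaction = click_to_interaction_map(click, 4, 5)
```

Passing a seeded `random.Random` instance keeps the simulated clicks reproducible across training runs.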

According to one or more embodiments of the present disclosure, determining the marked area based on the labeling information includes: acquiring prior information, wherein the prior information is mask information obtained in the previous iteration process; determining difference information according to the labeling information and the prior information, wherein the difference information characterizes a difference region between a predicted image region corresponding to the prior information and an actual image region corresponding to the labeling information; and determining the marked area according to the difference information.

According to one or more embodiments of the present disclosure, the labeling information includes a first binary image corresponding to an actual image region of the target object, and the prior information includes a second binary image corresponding to the predicted image region; determining difference information according to the labeling information and the prior information includes: comparing the first binary image with the second binary image point by point to determine the difference information.
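A minimal sketch of the point-by-point comparison follows. It assumes both binary images are nested lists of 0/1 values with the same shape; the function name `difference_map` is hypothetical.

```python
def difference_map(label, prior):
    """Return a binary map that is 1 wherever the two binary images disagree."""
    if len(label) != len(prior) or any(len(a) != len(b) for a, b in zip(label, prior)):
        raise ValueError("binary images must have the same shape")
    return [[int(a != b) for a, b in zip(label_row, prior_row)]
            for label_row, prior_row in zip(label, prior)]


# First binary image: actual image region from the labeling information.
label = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
# Second binary image: predicted image region from the previous iteration.
prior = [[0, 0, 1, 1],
         [0, 1, 1, 1]]

diff = difference_map(label, prior)
# diff == [[0, 1, 0, 1],
#          [0, 0, 0, 1]]
```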

According to one or more embodiments of the present disclosure, determining the marked area according to the difference information includes: determining a first marked area and/or a second marked area according to the difference information, wherein the first marked area is an image area that the predicted image area misses relative to the actual image area, and the second marked area is an image area that the predicted image area wrongly includes relative to the actual image area; randomly generating an operation identifier in the marked area includes: generating, in the first marked area, a first operation identifier indicating the missed image area; and generating, in the second marked area, a second operation identifier indicating the wrongly segmented image area.
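As an illustration, the sketch below splits the disagreement between the two binary images into a missed area and a wrongly included area, then draws one random click in each. The "add"/"remove" tags and the function names are assumptions introduced here; the disclosure only requires a first and a second operation identifier.

```python
import random


def split_difference(label, prior):
    """Return (missed, extra): pixels the prediction missed / wrongly included."""
    missed = [[int(a == 1 and b == 0) for a, b in zip(lr, pr)]
              for lr, pr in zip(label, prior)]
    extra = [[int(a == 0 and b == 1) for a, b in zip(lr, pr)]
             for lr, pr in zip(label, prior)]
    return missed, extra


def random_pixel(region, rng):
    """Pick a random foreground pixel of a binary grid, or None if it is empty."""
    pixels = [(r, c) for r, row in enumerate(region) for c, v in enumerate(row) if v]
    return rng.choice(pixels) if pixels else None


def simulate_corrections(label, prior, rng):
    """Generate a first ("add") and/or second ("remove") operation identifier."""
    missed, extra = split_difference(label, prior)
    corrections = []
    add_click = random_pixel(missed, rng)
    if add_click is not None:
        corrections.append(("add", add_click))        # first marked area
    remove_click = random_pixel(extra, rng)
    if remove_click is not None:
        corrections.append(("remove", remove_click))  # second marked area
    return corrections


label = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
prior = [[0, 0, 1, 1],
         [0, 1, 1, 1]]
missed, extra = split_difference(label, prior)
corrections = simulate_corrections(label, prior, random.Random(0))
```

Either list may be empty, which corresponds to the "and/or" in the text: an iteration can yield only an "add" identifier, only a "remove" identifier, or both.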

According to one or more embodiments of the present disclosure, the operation identifier includes a line segment identifier; randomly generating an operation identifier in the marked area includes: obtaining, based on a skeleton extraction algorithm, a region skeleton corresponding to the marked area, wherein the region skeleton comprises continuous line segments of a preset pixel width; and randomly intercepting at least one line segment of a preset length on the region skeleton to generate the line segment identifier.
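The idea can be sketched as follows. The disclosure does not name a particular skeleton extraction algorithm, so this example substitutes a deliberately crude stand-in: the middle foreground row of each column, which yields a one-pixel-wide centreline for simple blob-like areas. A real implementation would more likely use a morphological thinning routine. The names `column_centreline` and `random_segment` are hypothetical.

```python
import random


def column_centreline(region):
    """Crude stand-in for skeleton extraction (assumption, not the patented method).

    For every column that contains foreground pixels, keep the middle one.
    The result is a one-pixel-wide polyline ordered by column.
    """
    height, width = len(region), len(region[0])
    line = []
    for c in range(width):
        rows = [r for r in range(height) if region[r][c]]
        if rows:
            line.append((rows[len(rows) // 2], c))
    return line


def random_segment(skeleton, length, rng):
    """Randomly intercept a run of `length` consecutive skeleton pixels."""
    if not skeleton:
        raise ValueError("skeleton is empty")
    length = min(length, len(skeleton))
    start = rng.randrange(len(skeleton) - length + 1)
    return skeleton[start:start + length]


region = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
skeleton = column_centreline(region)
segment = random_segment(skeleton, 3, random.Random(0))
```

For the 3x5 bar above, the centreline is the middle row of the bar, and the segment is a random three-pixel stretch of it that imitates a short stroke drawn by a user.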

According to one or more embodiments of the present disclosure, the method further comprises: acquiring prior information, wherein the prior information is mask information obtained in the previous iteration process; inputting the simulation interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model includes: inputting the simulation interaction information, the target image and the prior information into the interactive segmentation model to obtain mask information output by the interactive segmentation model; adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information includes: obtaining a loss function value from the mask information and the labeling information; and adjusting model parameters of the interactive segmentation model based on the loss function value.
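The iteration described above can be sketched with a toy model. Everything concrete here is an assumption made for illustration: the model is a per-pixel logistic function over three inputs (image intensity, click map, prior mask) rather than a neural network, the loss is binary cross-entropy, the update is plain gradient descent, and the click map is held fixed instead of being regenerated each iteration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(probs, labels):
    """Binary cross-entropy between predicted mask values and labels."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def train(image, clicks, labels, steps=5, lr=1.0):
    # Toy per-pixel logistic model (assumption; stands in for the real network).
    w_img = w_click = w_prior = bias = 0.0
    prior = [0.0] * len(image)   # no previous mask before the first iteration
    mask, losses = prior, []
    for _ in range(steps):
        # Forward pass: interaction information, target image and prior mask are inputs.
        mask = [sigmoid(w_img * x + w_click * c + w_prior * p + bias)
                for x, c, p in zip(image, clicks, prior)]
        losses.append(bce_loss(mask, labels))
        # Gradient of the loss with respect to each parameter, then the update.
        errors = [m - y for m, y in zip(mask, labels)]
        n = len(errors)
        w_img -= lr * sum(e * x for e, x in zip(errors, image)) / n
        w_click -= lr * sum(e * c for e, c in zip(errors, clicks)) / n
        w_prior -= lr * sum(e * p for e, p in zip(errors, prior)) / n
        bias -= lr * sum(errors) / n
        # The mask just produced becomes the prior information of the next iteration.
        prior = mask
    return mask, losses


image = [0.9, 0.9, 0.1, 0.1]    # flattened toy image (object pixels are bright)
clicks = [1.0, 0.0, 0.0, 0.0]   # flattened simulated click map
labels = [1, 1, 0, 0]           # flattened labeling information
mask, losses = train(image, clicks, labels)
```

The point of the sketch is the data flow: each iteration consumes the mask from the previous one, and the loss between the new mask and the labeling information drives the parameter update.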

According to one or more embodiments of the present disclosure, the interactive segmentation model is a neural network model based on an encoder-decoder structure.

In a second aspect, according to one or more embodiments of the present disclosure, there is provided a method for generating annotation data, including:

According to one or more embodiments of the present disclosure, the interactive segmentation model is trained using the method described above in the first aspect and the various possible designs of the first aspect.

According to one or more embodiments of the present disclosure, the method further comprises: acquiring prior mask information, wherein the prior mask information is the mask information most recently output by the interactive segmentation model; inputting the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model includes: inputting the image to be segmented, the user interaction information and the prior mask information into the interactive segmentation model to obtain mask information output by the interactive segmentation model.
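A minimal sketch of this feedback loop at annotation time is given below. The trained model is replaced by a hypothetical stand-in, `toy_model`, which simply sets or clears the clicked pixel on top of the prior mask; the `("add" | "remove", (row, col))` interaction format is likewise an assumption.

```python
def refine(model, image, interactions):
    """Feed each user interaction to the model together with the previous mask."""
    prior_mask = [[0] * len(image[0]) for _ in image]   # empty mask before any interaction
    history = []
    for interaction in interactions:
        prior_mask = model(image, interaction, prior_mask)
        history.append(prior_mask)
    return history


def toy_model(image, interaction, prior_mask):
    """Hypothetical stand-in for a trained interactive segmentation model."""
    kind, (row, col) = interaction
    mask = [list(r) for r in prior_mask]     # start from the prior mask
    mask[row][col] = 1 if kind == "add" else 0
    return mask


image = [[0] * 3 for _ in range(2)]
interactions = [("add", (0, 1)), ("add", (1, 1)), ("remove", (0, 1))]
history = refine(toy_model, image, interactions)
# history[-1] == [[0, 0, 0],
#                 [0, 1, 0]]
```

Because every call receives the previous output, each interaction only has to correct what is still wrong, which is what allows a user to converge on a mask in a few operations.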

In a third aspect, according to one or more embodiments of the present disclosure, there is provided an interactive segmentation model training apparatus, comprising:

According to one or more embodiments of the present disclosure, the training module is specifically configured to, when generating simulation interaction information based on the annotation image: determine a marked area based on the labeling information, wherein the marked area is the area in which the selection operation corresponding to the simulation interaction information is triggered; randomly generate an operation identifier in the marked area; and generate the simulation interaction information according to the pixel coordinates of the operation identifier.

According to one or more embodiments of the present disclosure, the training module is specifically configured to, when determining the marked area based on the labeling information: acquire prior information, wherein the prior information is mask information obtained in the previous iteration process; determine difference information according to the labeling information and the prior information, wherein the difference information characterizes a difference region between a predicted image region corresponding to the prior information and an actual image region corresponding to the labeling information; and determine the marked area according to the difference information.

According to one or more embodiments of the present disclosure, the labeling information includes a first binary image corresponding to an actual image region of the target object, and the prior information includes a second binary image corresponding to the predicted image region; the training module is specifically configured to, when determining the difference information according to the labeling information and the prior information: compare the first binary image with the second binary image point by point to determine the difference information.

According to one or more embodiments of the present disclosure, the training module is specifically configured to, when determining the marked area according to the difference information: determine a first marked area and/or a second marked area according to the difference information, wherein the first marked area is an image area that the predicted image area misses relative to the actual image area, and the second marked area is an image area that the predicted image area wrongly includes relative to the actual image area; the training module is specifically configured to, when randomly generating the operation identifier in the marked area: generate, in the first marked area, a first operation identifier indicating the missed image area; and generate, in the second marked area, a second operation identifier indicating the wrongly segmented image area.

According to one or more embodiments of the present disclosure, the operation identifier includes a line segment identifier; the training module is specifically configured to, when randomly generating the operation identifier in the marked area: obtain, based on a skeleton extraction algorithm, a region skeleton corresponding to the marked area, wherein the region skeleton comprises continuous line segments of a preset pixel width; and randomly intercept at least one line segment of a preset length on the region skeleton to generate the line segment identifier.

According to one or more embodiments of the present disclosure, the acquiring module is further configured to: acquire prior information, wherein the prior information is mask information obtained in the previous iteration process; the training module is specifically configured to, when inputting the simulation interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model: input the simulation interaction information, the target image and the prior information into the interactive segmentation model to obtain mask information output by the interactive segmentation model.

According to one or more embodiments of the present disclosure, the training module is specifically configured to, when adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information: obtain a loss function value from the mask information and the labeling information; and adjust model parameters of the interactive segmentation model based on the loss function value.

In a fourth aspect, according to one or more embodiments of the present disclosure, there is provided an annotation data generating apparatus, including:

According to one or more embodiments of the present disclosure, the acquiring module is further configured to: acquire prior mask information, wherein the prior mask information is the mask information most recently output by the interactive segmentation model;

the processing module is specifically configured to: input the image to be segmented, the user interaction information and the prior mask information into the interactive segmentation model to obtain mask information output by the interactive segmentation model.

a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

The foregoing description is only of the preferred embodiments of the present disclosure and of the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of features described above, but also covers other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims

1. An interactive segmentation model training method, comprising:

obtaining a labeling image, wherein the labeling image comprises a target image and corresponding labeling information, and the labeling information is used for indicating an actual image area of a target object in the target image; and cyclically executing the following steps based on the labeling image until a preset condition is reached:

generating simulation interaction information based on the annotation image, wherein the simulation interaction information is used for simulating the selection operation of a user on a target object in the target image;

inputting the simulation interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information represents a predicted image area of a target object indicated by the simulation interaction information;

and adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information.

2. The method of claim 1, wherein generating simulated interaction information based on the annotation image comprises:

determining a marked area based on the labeling information, wherein the marked area is the area in which the selection operation corresponding to the simulation interaction information is triggered;

randomly generating an operation identifier in the marked area;

and generating the simulation interaction information according to the pixel coordinates of the operation identifier.

3. The method of claim 2, wherein determining a marked area based on the labeling information comprises:

acquiring prior information, wherein the prior information is mask information obtained in the previous iteration process;

determining difference information according to the labeling information and the prior information, wherein the difference information characterizes a difference region between a predicted image region corresponding to the prior information and an actual image region corresponding to the labeling information;

and determining the marked area according to the difference information.

4. The method of claim 3, wherein the labeling information comprises a first binary image corresponding to an actual image region of the target object, and the prior information comprises a second binary image corresponding to the predicted image region;

determining difference information according to the labeling information and the prior information comprises:

comparing the first binary image with the second binary image point by point to determine the difference information.

5. The method of claim 3, wherein determining the marked area according to the difference information comprises:

determining a first marked area and/or a second marked area according to the difference information, wherein the first marked area is an image area that the predicted image area misses relative to the actual image area, and the second marked area is an image area that the predicted image area wrongly includes relative to the actual image area;

randomly generating an operation identifier in the marked area comprises:

generating, in the first marked area, a first operation identifier indicating the missed image area; and generating, in the second marked area, a second operation identifier indicating the wrongly segmented image area.

6. The method of claim 2, wherein the operation identifier comprises a line segment identifier; and randomly generating an operation identifier in the marked area comprises:

obtaining, based on a skeleton extraction algorithm, a region skeleton corresponding to the marked area, wherein the region skeleton comprises continuous line segments of a preset pixel width;

and randomly intercepting at least one line segment of a preset length on the region skeleton to generate the line segment identifier.

7. The method according to any one of claims 1-6, further comprising:

acquiring prior information, wherein the prior information is mask information obtained in the previous iteration process;

wherein inputting the simulation interaction information and the target image into an interactive segmentation model to obtain mask information output by the interactive segmentation model comprises:

inputting the simulation interaction information, the target image and the prior information into the interactive segmentation model to obtain mask information output by the interactive segmentation model;

and adjusting model parameters of the interactive segmentation model based on the mask information and the labeling information comprises:

obtaining a loss function value from the mask information and the labeling information;

and adjusting model parameters of the interactive segmentation model based on the loss function value.

8. The method of any of claims 1-6, wherein the interactive segmentation model is a neural network model based on an encoder-decoder architecture.

9. A method of generating annotation data, the method comprising:

acquiring an image to be segmented and user interaction information, wherein the user interaction information characterizes the selection operation of a user on a target object in the image to be segmented;

inputting the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information characterizes a predicted image area of a target object indicated by the user interaction information, the interactive segmentation model is obtained by training based on a labeling image and automatically generated simulation interaction information, the labeling image comprises a target image and corresponding labeling information, the labeling information is used for indicating an actual image area of the target object in the target image, and the simulation interaction information is used for simulating selection operation of a user on the target object in the target image;

and obtaining labeling data based on the mask information and the image to be segmented.
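The final step can be pictured as packaging the mask into an annotation record. The claim does not define the format of the labeling data, so the record layout below (binary mask, `(x, y, width, height)` bounding box, pixel area) and the 0.5 threshold are assumptions chosen for illustration; `mask_to_annotation` is a hypothetical name.

```python
def mask_to_annotation(mask, image_id, threshold=0.5):
    """Turn model mask values into a simple annotation record (assumed format)."""
    binary = [[1 if v >= threshold else 0 for v in row] for row in mask]
    points = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    if not points:
        return {"image_id": image_id, "mask": binary, "bbox": None, "area": 0}
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    # Bounding box as (x, y, width, height) in pixel coordinates.
    bbox = (min(cols), min(rows),
            max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)
    return {"image_id": image_id, "mask": binary, "bbox": bbox, "area": len(points)}


mask = [
    [0.1, 0.8, 0.9, 0.2],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.2, 0.1, 0.0],
]
annotation = mask_to_annotation(mask, "img_001")
# annotation["bbox"] == (1, 0, 2, 2) and annotation["area"] == 4
```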

10. The method according to claim 9, wherein the method further comprises:

acquiring prior mask information, wherein the prior mask information is the mask information most recently output by the interactive segmentation model;

inputting the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model comprises:

inputting the image to be segmented, the user interaction information and the prior mask information into the interactive segmentation model to obtain mask information output by the interactive segmentation model.

11. An interactive segmentation model training device, comprising:

12. An annotation data generation apparatus, comprising:

the processing module is used for inputting the image to be segmented and the user interaction information into an interactive segmentation model to obtain mask information output by the interactive segmentation model, wherein the mask information characterizes a predicted image area of a target object indicated by the user interaction information, the interactive segmentation model is obtained by training based on a labeling image and automatically generated simulation interaction information, the labeling image comprises a target image and corresponding labeling information, the labeling information is used for indicating an actual image area of the target object in the target image, and the simulation interaction information is used for simulating selection operation of a user on the target object in the target image;

13. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 10.

14. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the method of any one of claims 1 to 10.

15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.