CN110782466A - Picture segmentation method, device and system - Google Patents

Picture segmentation method, device and system

Info

Publication number
CN110782466A
Authority
CN
China
Prior art keywords
segmentation
picture
network
algorithm
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810858221.0A
Other languages
Chinese (zh)
Other versions
CN110782466B (en)
Inventor
李为
刘奎龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810858221.0A priority Critical patent/CN110782466B/en
Publication of CN110782466A publication Critical patent/CN110782466A/en
Application granted granted Critical
Publication of CN110782466B publication Critical patent/CN110782466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/12: Edge-based segmentation
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a picture segmentation method, device, and system. The method comprises: selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and finely segmenting the first segmentation result to obtain the target object in the picture. The invention solves the technical problem of poor image segmentation in the prior art.

Description

Picture segmentation method, device and system
Technical Field
The invention relates to the field of image processing, in particular to a method, a device and a system for segmenting a picture.
Background
E-commerce platforms have a huge demand for pictures such as advertisements and shop decoration, and extracting the target commodity from an image along its edges is the prerequisite for other related operations and further design. The demand for design services keeps growing and the workload of the staff involved is enormous, so making image design intelligent is an effective way to relieve this problem.
Currently, in picture-processing business, images are generally segmented in one of two ways. (1) Semantic segmentation, used to segment the thousands of commodity types appearing in image carriers such as display advertisements on e-commerce platforms. Semantic segmentation requires a large number of samples to train the segmentation network, but commodities are highly varied, the number of samples that can be collected differs greatly between categories, and image features can also differ greatly within a category; since current semantic segmentation schemes need abundant sample data for every category to train the network, a large share of pictures cannot be semantically segmented. (2) Saliency segmentation. Because saliency segmentation lacks semantic information, its results are unfavorable for post-processing, and when the target in the image is not strongly salient, the segmentation effect is poor.
No effective solution has yet been proposed for the problem of poor image segmentation in the prior art.
Disclosure of Invention
Embodiments of the invention provide a picture segmentation method, device, and system to at least solve the technical problem of poor image segmentation in the prior art.
According to one aspect of the embodiments of the present invention, there is provided a picture segmentation method, comprising: selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and finely segmenting the first segmentation result to obtain the target object in the picture.
According to another aspect of the embodiments of the present invention, there is also provided a picture segmentation method, comprising: extracting features of a picture through the backbone network of a segmentation model to select a target segmentation algorithm for the picture from candidate segmentation algorithms, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; and performing a first segmentation of the picture, based on the feature layers obtained by feature extraction, through the segmentation network in the segmentation model corresponding to the target segmentation algorithm, wherein the segmentation networks comprise a semantic segmentation network for semantic segmentation of the picture and a saliency segmentation network for saliency segmentation of the picture.
According to another aspect of the embodiments of the present invention, there is also provided a picture segmentation apparatus, comprising: a selection module for selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; a first segmentation module for coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and a second segmentation module for finely segmenting the first segmentation result to obtain the target object in the picture.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the following steps: selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and finely segmenting the first segmentation result to obtain the target object in the picture.
According to another aspect of the embodiments of the present invention, there is further provided a processor configured to run a program that performs the following steps: selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and finely segmenting the first segmentation result to obtain the target object in the picture.
According to another aspect of the embodiments of the present invention, there is also provided a picture segmentation system, comprising: a processor; and a memory coupled to the processor for providing the processor with instructions for the following processing steps: selecting a target segmentation algorithm for a picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; coarsely segmenting the picture with the target segmentation algorithm to obtain a first segmentation result, wherein the coarse segmentation preliminarily determines the target object in the picture; and finely segmenting the first segmentation result to obtain the target object in the picture.
In the embodiments of the invention, a target segmentation algorithm for the picture is selected from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; the picture is coarsely segmented with the target segmentation algorithm to obtain a first segmentation result, the coarse segmentation preliminarily determining the target object in the picture; and the first segmentation result is finely segmented to obtain the target object in the picture. By determining the target segmentation algorithm for a picture through feature extraction, the scheme can cope with the openness and complexity of commodity categories in the e-commerce field, solving the technical problem of poor image segmentation in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a picture segmentation method;
fig. 2 is a flowchart of a picture segmentation method according to embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a backbone network parameter according to embodiment 1 of the present application;
FIG. 4 is a schematic diagram of a segmentation model according to embodiment 1 of the present application;
fig. 5 is a flowchart of a picture segmentation method according to embodiment 2 of the present application;
fig. 6 is a schematic diagram of a picture segmentation apparatus according to embodiment 3;
fig. 7 is a schematic diagram of a picture segmentation apparatus according to embodiment 4; and
fig. 8 is a block diagram of a computer terminal according to embodiment 6 of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms and expressions appearing in the description of the embodiments of the present application are explained as follows:
Semantic segmentation: at the pixel level, objects within a specified category are segmented from the image.
Saliency segmentation: at the pixel level, the most salient subject object is segmented from the image, regardless of the category to which it belongs.
Deep convolutional neural network: a kind of neural network, generally with more hidden layers and a large number of parameters, whose hidden layers are usually built from structures such as convolution, pooling, and full connection; it is the mainstream method in the current image processing field.
Example 1
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a picture segmentation method. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that here.
The method provided by the first embodiment of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the picture segmentation method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the present application, the data processing circuitry acts as a kind of processor control (for example, selection of a variable-resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the picture segmentation method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the picture segmentation method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both. It should be noted that fig. 1 is only one example, intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the above operating environment, the present application provides the picture segmentation method shown in fig. 2. Fig. 2 is a flowchart of a picture segmentation method according to embodiment 1 of the present invention.
Step S21: select a target segmentation algorithm for the picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm. Specifically, the saliency segmentation algorithm is generally not limited by category: it segments the relatively salient foreground targets in an image without caring which categories they belong to, but because it lacks semantic information it has no grasp of a category's common features and struggles to achieve accurate segmentation. The semantic segmentation algorithm segments targets within a specified range of categories in a picture; the common PASCAL VOC data set, for example, contains 20 common target categories including person, bird, airplane, and bottle, and treats all other objects as background that need not be distinguished. However, semantic segmentation requires a large amount of sample data for training the model, without which an accurate segmentation result is difficult to obtain.
In the above scheme, the features of the picture may be extracted by convolving and max-pooling the picture with the convolutional layers of a neural network model, yielding feature layers of the picture at various scales; the feature layers record the picture's features. After the feature extraction result is obtained, the target segmentation algorithm can be determined in various ways.
In an alternative embodiment, the scheme may be implemented by a segmentation model, a convolutional neural network model comprising a backbone network for determining the algorithm, a saliency segmentation network for saliency segmentation, and a semantic segmentation network for semantic segmentation. The picture enters at the backbone network, which extracts its features. If the backbone network can determine the picture's category from its feature layers, the segmentation model knows specifically what the salient region in the picture is, so the picture can be coarsely segmented by the semantic segmentation algorithm; if the backbone network cannot determine the category, the model cannot tell what the salient region belongs to, so the picture can be coarsely segmented by the saliency segmentation algorithm.
In another optional embodiment, still taking the segmentation model as an example, the categories to undergo semantic segmentation and those to undergo saliency segmentation may be preset before the picture is segmented; after the backbone network determines the picture's category through feature extraction, the target segmentation algorithm is determined from the correspondence between categories and segmentation algorithms. More specifically, the categories needing semantic segmentation may be those whose number of sample pictures exceeds a preset value, and the categories needing saliency segmentation may be those whose number of sample pictures is less than or equal to the preset value.
From the segmentation model's perspective, the scheme thus receives the picture at the backbone network, which extracts its features; according to the feature extraction result, the picture is then routed into the segmentation network corresponding to the target algorithm, i.e., the saliency segmentation network or the semantic segmentation network.
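To make this routing concrete, the following is a minimal Python sketch of the selection step; the backbone interface and the set of well-sampled categories are assumptions made for illustration, not the patent's API.

    # Hypothetical sketch: route a picture to a coarse-segmentation branch.
    def select_target_algorithm(picture, backbone, semantic_classes):
        """Return the target algorithm and the extracted feature layers."""
        features, category = backbone.classify(picture)  # assumed backbone API
        if category in semantic_classes:
            # The backbone recognised the category, so what the salient
            # region is can be determined: use semantic segmentation.
            return "semantic", features
        # Unknown or under-sampled category: fall back to saliency segmentation.
        return "saliency", features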
Step S25: coarsely segment the picture using the target segmentation algorithm to obtain a first segmentation result, where the coarse segmentation preliminarily determines the target object in the picture.
Specifically, coarse segmentation performs pixel-level segmentation of the image to determine the target object (i.e., the salient region) in the picture; its result is coarser than that of fine segmentation. The first segmentation result obtained by coarse segmentation is in fact a mask for the picture that distinguishes foreground from background, the foreground indicating the region where the target object is located. In the first segmentation result, the foreground pixels may be set to all 0s and the background pixels to all 1s so that the two can be told apart.
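As a tiny illustration (following the foreground-0 / background-1 convention stated above, with illustrative sizes), such a mask could be built as:

    import numpy as np

    mask = np.ones((4, 4), dtype=np.uint8)  # background pixels set to 1
    mask[1:3, 1:3] = 0                      # foreground (target object) set to 0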
In an alternative embodiment, a classifier may be used to sort pictures by the category they belong to. Specifically, after the picture's category is determined, the classifier assigns the picture to the algorithm module for that category; the algorithm module may be the segmentation network that coarsely segments the picture, thereby performing the first segmentation step, i.e., the coarse segmentation.
Step S27: finely segment the first segmentation result to obtain the target object in the picture.
Specifically, the target object is the salient region in the picture, i.e., the part that needs to be cut out of the picture. Fine segmentation refines the edge of the target object on the basis of the first segmentation result, so that the target object is truly separated from the picture.
After the target object has been preliminarily determined by coarse segmentation, fine segmentation determines its boundary in more detail. More specifically, fine segmentation can be realised by statistically modelling the colors of the foreground and background pixels in the result to obtain the prior probability that each pixel belongs to the foreground or background, and then re-segmenting the commodity subject of the image with a max-flow/min-cut algorithm based on color-distance and position-distance information; this effectively improves the accuracy of the first segmentation.
In an alternative embodiment, the first segmentation result may be finely segmented using the GrabCut algorithm, whose main steps are: (1) at initialization, fit a Gaussian mixture model each to the background and foreground colors of the whole picture's first segmentation result via a clustering algorithm; (2) correct the Gaussian mixture models from step (1) with an expectation-maximization algorithm over the global pixels; (3) further segment the picture with a max-flow/min-cut algorithm. Steps (2) and (3) are iterated until a specified number of cycles is reached.
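For reference, the following is a minimal sketch of mask-initialised GrabCut refinement using OpenCV's stock implementation; it follows the steps above but is not the improved multi-scale variant described later in this document, and the iteration count is an illustrative choice.

    import cv2
    import numpy as np

    def refine_with_grabcut(image_bgr, coarse_mask, iterations=5):
        # Map the coarse mask (foreground 0, background 1, as above) onto
        # GrabCut's labels: confident background vs. probable foreground,
        # so GrabCut is free to reassign uncertain foreground pixels.
        gc_mask = np.where(coarse_mask == 0,
                           cv2.GC_PR_FGD, cv2.GC_BGD).astype(np.uint8)
        bgd_model = np.zeros((1, 65), np.float64)  # background GMM parameters
        fgd_model = np.zeros((1, 65), np.float64)  # foreground GMM parameters
        cv2.grabCut(image_bgr, gc_mask, None, bgd_model, fgd_model,
                    iterations, cv2.GC_INIT_WITH_MASK)
        # Pixels labelled (probably) foreground form the refined object mask.
        return np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)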
In the above embodiments of the present application, features are extracted from the picture and a target segmentation algorithm for the picture is selected from candidate segmentation algorithms, the candidate segmentation algorithms comprising a saliency segmentation algorithm and a semantic segmentation algorithm; the picture is coarsely segmented with the target segmentation algorithm to obtain a first segmentation result, the coarse segmentation preliminarily determining the target object in the picture; and the first segmentation result is finely segmented to obtain the target object in the picture. By determining the target segmentation algorithm through feature extraction, the scheme can cope with the openness and complexity of commodity categories in the e-commerce field, solving the technical problem of poor image segmentation in the prior art.
Further, in the above scheme, both the semantic segmentation algorithm and the saliency segmentation algorithm can directly use the picture features already extracted from the picture for their segmentation processing, which saves GPU video memory and speeds up computation.
As an alternative embodiment, selecting the target segmentation algorithm for the picture from the candidate segmentation algorithms by extracting features of the picture comprises: inputting the picture into the backbone network of the segmentation model, which extracts the picture's features and assigns the picture to the corresponding node, the nodes comprising: one first node for each category whose number of sample pictures exceeds a preset value, and a second node for all categories whose number of sample pictures is less than or equal to the preset value; if the picture is assigned to a first node, the target segmentation algorithm is determined to be the semantic segmentation algorithm; if the picture is assigned to the second node, the target segmentation algorithm is determined to be the saliency segmentation algorithm.
Specifically, the category to which a picture belongs indicates the category of the target object in the picture. Taking pictures in the e-commerce field as an example, picture categories may be clothing, cosmetics, food, and so on. This is a relatively coarse division, and categories can be subdivided further; for example, clothing may include dresses, skirts, sweaters, and so on.
Still in the e-commerce field, if the pictures to be processed span many types (pictures of clothing as well as of items such as cosmetics and food), they can be classified into the coarser categories; if the pictures to be processed are known to all belong to clothing, they can be divided into the finer clothing categories.
The segmentation model may be a convolutional neural network model. In an optional embodiment, the backbone network may be a VGG network comprising a plurality of convolutional layers, each of which convolves and max-pools the picture (conv + maxpool) according to its parameters. Fig. 3 is a schematic diagram of the backbone network parameters according to embodiment 1 of the present application, and fig. 4 is a schematic diagram of the segmentation model according to embodiment 1 of the present application. As shown in figs. 3 and 4, the backbone network has five convolutional layers; each feature-extraction stage has corresponding parameters, including layer type, kernel size, and number of channels, and each convolutional layer outputs a corresponding feature layer.
Natural images have an intrinsic property: the statistical characteristics of one part of an image are the same as those of the other parts. This means that features learned on one part can be used on the others, i.e., the same learned features can be applied at every position of the image. More specifically, if a small block, say 3 × 3, is sampled at random from a large image and features are learned from it, the learned features can serve as a detector applied at any position of the image; in the backbone network's parameters, the convolutional layer's kernel size is the size of this small detector sample. Convolving the features learned from the 3 × 3 sample with the original large image yields an activation value for a different feature at every position of the large image. Max pooling then takes the maximum over the feature points in each neighbourhood; maxpool reduces the bias in the estimated mean caused by convolutional-layer parameter errors, so more texture information is preserved.
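A hedged PyTorch sketch of such a VGG-style backbone follows; it assumes five conv + maxpool stages as suggested by fig. 3, but the kernel sizes and channel counts are illustrative rather than the patent's exact parameters.

    import torch.nn as nn

    def stage(in_ch, out_ch):
        # conv + maxpool: the convolution applies learned small-sample detectors
        # at every position; max pooling keeps the strongest response in each
        # neighbourhood and halves the resolution, preserving texture.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    class Backbone(nn.Module):
        def __init__(self):
            super().__init__()
            self.stages = nn.ModuleList(
                stage(i, o) for i, o in
                [(3, 64), (64, 128), (128, 256), (256, 512), (512, 512)])

        def forward(self, x):
            feature_layers = []
            for s in self.stages:
                x = s(x)
                feature_layers.append(x)  # one first feature layer per scale
            return feature_layers         # shared by both segmentation branches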
The nodes include first nodes and a second node. There may be multiple first nodes, one for each category whose number of sample pictures exceeds the preset value, while there is a single second node covering all categories whose number of sample pictures is less than or equal to the preset value. In an alternative embodiment, in the e-commerce image-processing scenario, the categories whose sample count exceeds the preset value correspond to the first category, and all categories whose sample count is less than or equal to the preset value correspond to the second category.
In an optional embodiment, a number of nodes are arranged in the last layer of the segmentation model's backbone network: each category whose sample-picture count exceeds the preset value corresponds to one first node, which belongs to the first category, and all categories whose sample count is less than or equal to the preset value correspond to one second node, which belongs to the second category. After the backbone network performs the last layer of feature extraction on the picture, it assigns the picture to the corresponding node so that the appropriate operation is carried out on the picture. Still referring to fig. 4, a fully connected layer (Full Connection) sits below the backbone network; below it are four black dots and two white dots, each dot representing one picture's classification result. The pictures corresponding to the four black dots belong to categories that undergo semantic segmentation, and the pictures corresponding to the two white dots belong to categories that undergo saliency segmentation.
It should be noted that semantic segmentation can segment specified content in a picture, so it readily achieves more accurate segmentation results; but it needs a large number of samples to train its model, so it cannot be applied to categories with few samples. In the above scheme, the semantic segmentation algorithm is therefore used for categories with enough samples to train a semantic segmentation model, while the saliency segmentation algorithm completes the segmentation for categories whose samples are too few for semantic segmentation. This preserves segmentation accuracy for well-sampled categories while still allowing coarse segmentation of under-sampled ones.
It should be noted that, as the total number of samples grows, the sample count of a category that was at or below the preset value also grows, so that a category's sample count may come to exceed the preset value.
In this scheme, the two different segmentation networks can directly use the feature parameters output by the backbone network, which saves GPU video memory and accelerates computation.
The following explains how the backbone network classifies pictures based on feature extraction: in an optional embodiment, the probability that each pixel in the picture belongs to each node is determined from the feature extraction result, and the node with the maximum activation value at each pixel position is taken as the node to which that pixel belongs.
Specifically, the probability that each pixel belongs to each category may be determined by a softmax mapping: softmax maps the neurons' output values into (0, 1), giving the probability that each pixel belongs to each node. The activation value represents the computed output of each neuron in the neural network and can be computed directly from the existing parameters and input values.
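A minimal sketch of this rule (the tensor shapes are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def classify_pixels(logits):           # logits: (N, num_nodes, H, W)
        probs = F.softmax(logits, dim=1)   # maps activations into (0, 1)
        nodes = probs.argmax(dim=1)        # node with the maximum activation
        return nodes, probs                # per-pixel node assignment

    nodes, probs = classify_pixels(torch.randn(1, 5, 4, 4))  # toy example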
As an alternative embodiment, the segmentation model further comprises segmentation networks, namely a semantic segmentation network and a saliency segmentation network, and coarsely segmenting the picture with the target segmentation algorithm to obtain the first segmentation result comprises the following steps:
Step S231: input the first feature layer output by the backbone network into the segmentation network corresponding to the target segmentation algorithm.
Step S233: coarsely segment the picture through that segmentation network.
Still referring to fig. 4, the semantic segmentation network and the saliency segmentation network are arranged on either side of the backbone network. For a picture whose target algorithm is the semantic segmentation algorithm (represented by a black dot), the backbone network sends the picture's feature layers to the semantic segmentation network for coarse segmentation; for a picture whose target algorithm is the saliency segmentation algorithm (represented by a white dot), the backbone network sends the picture's feature layers to the saliency segmentation network for coarse segmentation.
Thus, the backbone network is used not only to classify pictures and determine each picture's target segmentation algorithm, but also to output the feature layers that serve as the image features the segmentation networks require.
As an alternative embodiment, when the target segmentation algorithm is the saliency segmentation algorithm, coarsely segmenting the picture through the segmentation network comprises:
and step S251, stacking a plurality of second feature layers according to a preset layer jump rule to obtain a plurality of single-channel prediction masks, where the second feature layers are obtained by performing convolution and deconvolution on a first feature layer output by the backbone network.
In step S251, the layer in the layer hopping rule is used to refer to a second feature layer obtained by performing convolution and deconvolution processing (conv + deconv) on first feature layers output by different convolution layers of the backbone network. The preset layer jump rule refers to the second characteristic layers of which layers are overlapped. The purpose of superimposing the feature layers is to fuse image features obtained under different picture scales. In an alternative embodiment, the stacking of feature layers may be implemented by a concat function that is used to join two or more arrays, thereby returning a new number.
The single-channel prediction mask may be obtained by performing 1 × 1 convolution on the superimposed feature layers in the feature layer stacking result or, in an alternative embodiment.
The feature-layer stacking is illustrated with fig. 3 as an example. For convenience, the second feature layers output by the six conv + deconv branches are labelled l0, l1, l2, l3, l4, l5 from top to bottom, and the dotted arrows indicate the skip-layer rule: in fig. 3, l5 and l4 feed into l3, l2, l1, and l0, while l3 and l2 feed into l1 and l0.
Step S252: obtain the first linear mean of the plurality of single-channel prediction masks.
Step S253: take the first linear mean as the first segmentation result for the picture.
Specifically, the first segmentation result is used to distinguish the foreground from the background in the picture, facilitating fine segmentation and the determination of the target object's boundary.
In an alternative embodiment, as shown in fig. 4, the part of fig. 4 to the right of the backbone network represents the saliency segmentation network used to perform the saliency segmentation algorithm (salient object detection), where conv denotes convolution and deconv denotes deconvolution. Among the six squares labelled conv + deconv on the right, the dotted lines indicate skip-layer connections between layers; before connection, a transposed convolution of the appropriate multiple is applied so the layers share the same dimensions, the feature layers are then stacked with a concat function, and a 1 × 1 convolution produces the corresponding single-channel prediction mask (mask).
The above operations yield 6 prediction masks; taking the linear mean of the 6 prediction masks gives a fusion mask, and this linear mean, i.e., the first linear mean, is used as the first segmentation result.
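A hedged sketch of this fusion step; it assumes the six single-channel prediction masks have already been brought to a common size by the transposed convolutions mentioned above, and everything else is illustrative.

    import torch

    def fuse_masks(side_masks):
        """side_masks: list of 6 tensors, each of shape (N, 1, H, W)."""
        stacked = torch.cat(side_masks, dim=1)     # (N, 6, H, W)
        return stacked.mean(dim=1, keepdim=True)   # linear mean = fusion mask

Under the variant described next, the same helper could be applied to a subset of the masks instead of all six.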
As another alternative embodiment, when the target segmentation algorithm is the saliency segmentation algorithm, coarsely segmenting the picture through the segmentation network may instead comprise:
Step S254: stack a plurality of second feature layers according to the preset skip-layer rule to obtain a plurality of single-channel prediction masks, where the second feature layers are obtained by convolving and deconvolving the first feature layers output by the backbone network.
Specifically, step S254 is the same as step S251, and is not described herein again.
Step S255, extracting n single-channel prediction masks from the plurality of single-channel prediction masks, where n is a positive integer smaller than the number of single-channel prediction masks.
In the example of fig. 4, stacking the second feature layers yields 6 single-channel prediction masks, so n may be any positive integer smaller than 6.
Step S256, obtain second linear mean values of the n single-channel prediction masks.
In step S257, the second linear average is determined as the first segmentation result corresponding to the picture.
Specifically, the first segmentation result is used to distinguish the foreground from the background in the picture, facilitating fine segmentation and the determination of the target object's boundary.
In an alternative embodiment, as shown in fig. 4, the saliency segmentation network to the right of the backbone network works as described above: transposed convolutions of the appropriate multiples bring the layers to the same dimensions, a concat function stacks the feature layers, and a 1 × 1 convolution produces each single-channel prediction mask (mask).
The above computation yields 6 prediction masks, and a fused mask can be obtained as their linear mean. However, using the linear mean of all 6 prediction masks as the first segmentation result is not optimal, so under the above scheme the linear mean of a subset of the prediction masks may be chosen instead; for example, the linear mean of the last three layers' prediction masks may serve as the first segmentation result.
As an alternative embodiment, when the target algorithm is the semantic segmentation algorithm, coarsely segmenting the picture through the segmentation network comprises:
Step S258: select a plurality of convolutional layers in the backbone network.
In the above step, selecting many convolutional layers makes the operation time-consuming, while selecting too few lowers segmentation accuracy, so an appropriate set of convolutional layers should be chosen according to experimental results.
Step S259: upsample the first feature layers output by the selected convolutional layers in bottom-to-top order, stacking each upsampled result with the first feature layer output by the layer above, to obtain the target feature layer corresponding to the last convolutional layer.
In an alternative embodiment, shown in fig. 4, the part of fig. 4 to the left of the backbone network represents the semantic segmentation network used to perform the semantic segmentation algorithm (Semantic Segmentation); the last three convolutional layers are denoted s32, s16, and s8 from bottom to top. First the s32 feature layer is upsampled 2x with transposed convolution and the result is stacked with the s16 feature layer by concat; the result is upsampled 2x again and stacked with the s8 feature layer by concat. This yields a feature map whose side length is 1/8 of the input image's side length, and this feature map is the target feature layer.
Step S261: convolve the target feature layer so that its number of channels equals the number of first nodes.
Step S263: upsample the convolved target feature layer so that its size matches the size of the picture.
In an alternative embodiment, as shown in fig. 4, the semantic segmentation network in this example processes only the last three convolutional layers of the backbone network: it stacks the feature layers they output to obtain the target feature layer, applies a 1 × 1 convolution so that the number of channels equals the number of categories to classify, and finally upsamples n-fold so that the feature map's size matches the input image. Continuing the embodiment above, since the target feature layer's side length is 1/8 of the input image's, the target feature layer may be upsampled 8-fold to match the input picture's size.
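The following is an illustrative FCN-style decoder matching this description: 2x upsampling of s32, stacking with s16, another 2x upsampling, stacking with s8, a 1 × 1 convolution down to the class count, and a final 8x upsampling. The channel arguments are assumptions, not the patent's exact parameters.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticHead(nn.Module):
        def __init__(self, c8, c16, c32, num_classes):
            super().__init__()
            # Transposed convolutions perform the 2x upsampling steps.
            self.up32 = nn.ConvTranspose2d(c32, c32, kernel_size=2, stride=2)
            self.up16 = nn.ConvTranspose2d(c32 + c16, c32 + c16,
                                           kernel_size=2, stride=2)
            # 1x1 conv sets the channel count to the number of first nodes.
            self.score = nn.Conv2d(c32 + c16 + c8, num_classes, kernel_size=1)

        def forward(self, s8, s16, s32):
            x = torch.cat([self.up32(s32), s16], dim=1)  # 2x up, stack with s16
            x = torch.cat([self.up16(x), s8], dim=1)     # 2x up, stack with s8
            x = self.score(x)                  # side length = input / 8
            return F.interpolate(x, scale_factor=8, mode="bilinear",
                                 align_corners=False)    # back to input size

Under this sketch, calling SemanticHead on the backbone's last three feature layers yields per-class score maps at the input resolution.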
As an alternative embodiment, finely segmenting the first segmentation result to obtain the target object in the picture comprises:
Step S271: shrink the picture to a first preset size.
This step shrinks the picture to reduce the time consumed by subsequent operations, at the cost of some segmentation accuracy; the first preset size should therefore be chosen to save substantial computation time while losing as little segmentation accuracy as possible.
Step S273: fit Gaussians to the foreground and background colors in the first segmentation result via a clustering algorithm to obtain a Gaussian mixture model, where the Gaussian mixture model represents the probability that each pixel in the picture is foreground or background.
In this step, Gaussian modelling of the foreground and background colors yields the probability that each pixel in the picture belongs to the foreground or the background. During segmentation, pixels belonging to the foreground are determined from these probabilities according to a preset criterion, and thereby the target object in the picture can be obtained.
Step S275: correct the Gaussian mixture model with an expectation-maximization algorithm over the global pixels.
Specifically, the corrected Gaussian mixture model revises the probability that each pixel belongs to the foreground or background, yielding more accurate probabilities.
Step S277: re-segment on the basis of the first segmentation result using the max-flow/min-cut algorithm to obtain a second segmentation result.
Specifically, this step is performed according to a predetermined segmentation criterion and the corrected Gaussian mixture model. In an alternative embodiment, the Gaussian mixture model gives the probability that each pixel in the picture belongs to the foreground, and the predetermined criterion may be that a pixel is judged foreground when this probability exceeds a predetermined value. Under this criterion the foreground image, i.e., the target object, can be segmented out of the picture.
Step S279: enlarge the second segmentation result by a second preset size and return to the step of correcting the Gaussian mixture model with the expectation-maximization algorithm over the global pixels, until the second segmentation result has been enlarged to the picture's size before shrinking.
Since the picture was shrunk in step S271, after each iteration the picture must be enlarged before the Gaussian-model correction step is repeated, so that after multiple iterations the final result has the same size as the input picture.
Still referring to fig. 4, the semantic segmentation network and the saliency segmentation network feed their outputs to the GrabCut algorithm module, which runs the improved GrabCut algorithm to obtain a finely segmented mask (mask), from which the target object in the picture is determined.
This scheme rests on the assumption that moderately shrinking an image does not change the probability distributions of its colors. The image is shrunk to a suitable size for initialization and the first iterations, reducing the algorithm's running time, and is then enlarged in equal steps during subsequent iterations until it reaches its normal size at the last iteration. This greatly reduces the algorithm's time consumption with only a small loss of accuracy, which remains within an acceptable range.
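A hedged sketch of this coarse-to-fine schedule on top of OpenCV's GrabCut; the scale steps are illustrative assumptions, and gc_mask is assumed to already use GrabCut's four-state labels and to contain both foreground and background pixels at every scale.

    import cv2
    import numpy as np

    def multiscale_grabcut(image, gc_mask, scales=(0.25, 0.5, 0.75, 1.0)):
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        h, w = image.shape[:2]
        for s in scales:                   # equal steps up to the normal size
            size = (max(1, int(w * s)), max(1, int(h * s)))
            img_s = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
            mask_s = cv2.resize(gc_mask, size, interpolation=cv2.INTER_NEAREST)
            # One EM correction plus max-flow/min-cut pass at this scale.
            cv2.grabCut(img_s, mask_s, None, bgd, fgd, 1, cv2.GC_INIT_WITH_MASK)
            gc_mask = cv2.resize(mask_s, (w, h), interpolation=cv2.INTER_NEAREST)
        return gc_mask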
As an optional embodiment, before determining the category to which the picture belongs by performing feature extraction on the picture, the method further includes: obtaining a segmentation model, wherein the step of obtaining the segmentation model comprises:
in step S30, a sample picture of a known target object is obtained.
Specifically, a sample picture with a known target object is one used to train the segmentation model. Taking a clothing picture as an example: the category to which the picture belongs is clothing and the region the clothing occupies in the picture is known, so the picture can serve as a sample picture for training the segmentation model.
Step S33, obtaining an initial network, wherein the initial network comprises a backbone network and a segmentation network, the segmentation network comprises a semantic segmentation network and a saliency segmentation network, the backbone network is used for extracting the characteristics of the sample picture, the saliency segmentation network is used for performing saliency segmentation on the sample picture, and the semantic segmentation network is used for performing semantic segmentation on the sample picture.
Specifically, the initial network is an initial VGG network. The backbone network extracts features from the picture: on one hand it classifies the image subject to determine which specific segmentation network should perform the coarse segmentation, and on the other hand it provides the image features that the chosen segmentation network needs for its further processing.
Step S35: classify the sample pictures through the backbone network and input them into the corresponding segmentation networks to obtain prediction results.
In this step, the target segmentation algorithm for a sample picture is determined from the correspondence between categories and segmentation algorithms, and the sample picture is then input to the corresponding segmentation network.
Step S37, determining a loss value of the prediction result according to the target object of the sample picture.
Specifically, the above-mentioned loss value is used to represent the gap between the target object known in the sample image and the prediction result, and is used to adjust the network parameters of the segmentation network.
Step S39: adjust the network parameters of the segmentation network according to the loss value.
In an alternative embodiment, referring to fig. 4, a sample picture is first input into the backbone network of the initial network, which classifies it and determines the segmentation network that will coarsely segment it. When the backbone network outputs the sample picture's feature map to the saliency segmentation network, taking the saliency segmentation network on the right as an example: GT denotes the ground-truth value of the known target object in the sample picture, and cross-entropy denotes the cross-entropy loss value, i.e., the loss value above. In the training phase, the prediction mask obtained at each layer and the final fusion mask (the linear mean of the per-layer prediction masks) are each compared pixel by pixel with the ground-truth annotation, yielding 7 average cross-entropy loss values; added together, they give the picture's loss value for the saliency segmentation part. The network parameters of the saliency segmentation network are adjusted based on this loss value until the loss value begins to converge.
When the backbone network outputs the sample picture's feature map to the semantic segmentation network, taking the semantic segmentation network on the left as an example: GT denotes the ground-truth value of the known target object in the sample picture, and cross-entropy denotes the cross-entropy loss value, i.e., the loss value above. In the training phase, for each pixel position of the final feature map, a softmax over the categories gives the probability that the pixel belongs to each category; the cross-entropy loss is computed from the annotated category at that pixel and the feature map's predicted probability there, and the average over all pixels of the whole picture is the picture's loss value. The network parameters of the semantic segmentation network are adjusted based on this loss value until the loss value begins to converge.
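For concreteness, a hedged sketch of both loss computations described above; treating the masks as raw scores (logits) and using binary cross-entropy for the single-channel saliency masks are assumptions of this sketch.

    import torch.nn.functional as F

    def saliency_loss(side_masks, fused_mask, gt):
        # gt: (N, 1, H, W) float ground-truth mask; 6 side masks + 1 fusion
        # mask give 7 averaged cross-entropy terms, added together.
        losses = [F.binary_cross_entropy_with_logits(m, gt) for m in side_masks]
        losses.append(F.binary_cross_entropy_with_logits(fused_mask, gt))
        return sum(losses)

    def semantic_loss(logits, labels):
        # logits: (N, C, H, W); labels: (N, H, W) with a class index per pixel.
        # Per-pixel softmax cross-entropy, averaged over the whole picture.
        return F.cross_entropy(logits, labels)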
In this embodiment, the saliency segmentation model and the semantic segmentation model are combined into one model that shares the backbone network with the classification network and trains its parameters jointly, saving GPU video memory and accelerating computation.
As an alternative embodiment, the obtaining of the initial network includes: acquiring a convolutional neural network as a backbone network of an initial network, wherein the last layer of the convolutional neural network comprises: a first node corresponding to each category for which the number of sample pictures exceeds a preset value, and one second node corresponding to all categories for which the number of sample pictures is less than or equal to the preset value.
In an alternative embodiment, still referring to fig. 3, among the fully connected layers below the backbone network, the first two layers each contain 4096 nodes, while the number of nodes in the last layer depends on the actual situation: (1) one node is designed for each category on which semantic segmentation can be performed; and (2) one "other" node covers all categories that cannot be semantically segmented because their samples are too few.
As an optional embodiment, for a category whose sample count is less than or equal to the preset value, if sample pictures are added so that the count comes to exceed the preset value, a corresponding first node is created for that category in the last layer of the backbone network.
In the above scheme, as samples are collected continuously, the number of categories with many samples gradually grows, and some categories become trainable and segmentable by the semantic segmentation model; each such category therefore introduces a new first node in the vgg classifier, i.e., a new node is added to the last fully connected layer. The other categories (handled by saliency segmentation), whose sample counts remain small, stay grouped under the second node.
As an optional embodiment, after determining a target segmentation algorithm of a sample picture through the backbone network and before inputting the feature layer of the sample picture obtained by feature extraction into the corresponding segmentation network to obtain a prediction result, the method further includes: training the backbone network, wherein training the backbone network comprises: obtaining the prediction probability that each pixel in the sample picture belongs to each category; obtaining the real category of the sample picture at each pixel; determining a loss value of each pixel according to the prediction probability that the pixel belongs to each category and the real category at that pixel; determining the mean value of the per-pixel loss values as the loss value of the sample picture; and adjusting the network parameters of the backbone network according to the loss value of the sample picture.
In this scheme, during training of the segmentation networks, the backbone network must be trained in addition to the two segmentation networks. The backbone network parameters of the initial model are initialized from network model parameters pre-trained on the ImageNet database, and during actual training the predicted results are compared with the ground-truth values so as to adjust the network parameters of the backbone network.
In an alternative embodiment, the classifier is fine-tuned (refined) on a new set of commodity samples, and the semantic segmentation network and the saliency segmentation network are trained alternately until the network loss values converge. The parameter optimization algorithm may be stochastic gradient descent.
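One way such alternating training could look in PyTorch is sketched below; the loader and module names are hypothetical, each head is assumed to return the logits its loss expects, and the loss helpers are the sketches given earlier.

```python
import torch

def train_alternately(backbone, sem_head, sal_head, sem_loader, sal_loader,
                      lr=1e-3, momentum=0.9, epochs=10):
    """Alternate one semantic batch and one saliency batch per step,
    updating the shared backbone by stochastic gradient descent."""
    params = (list(backbone.parameters()) + list(sem_head.parameters())
              + list(sal_head.parameters()))
    opt = torch.optim.SGD(params, lr=lr, momentum=momentum)
    for _ in range(epochs):
        for (sem_x, sem_y), (sal_x, sal_y) in zip(sem_loader, sal_loader):
            steps = ((sem_x, sem_y, sem_head, semantic_loss),
                     (sal_x, sal_y, sal_head, saliency_loss))
            for x, y, head, loss_fn in steps:
                opt.zero_grad()
                loss = loss_fn(head(backbone(x)), y)
                loss.backward()
                opt.step()
```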
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided a picture segmentation method, and fig. 5 is a flowchart of a picture segmentation method according to embodiment 2 of the present application, and as shown in fig. 5, the method includes:
step S51, extracting the features of the picture through the backbone network of the segmentation model to select a target segmentation algorithm corresponding to the picture from the candidate segmentation algorithms, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm.
Specifically, the backbone network may include a plurality of convolutional layers, each of which performs convolution and maximum pooling (conv + maxpool) on the picture according to its corresponding parameters. Fig. 3 is a schematic diagram of the backbone network parameters according to embodiment 1 of the present application, and fig. 4 is a schematic diagram of a segmentation model according to embodiment 1 of the present application. As shown in figs. 3 and 4, the backbone network has five convolutional layers, and each convolutional layer has corresponding parameters including: layer type, kernel size, and number of channels; each convolutional layer can output a corresponding feature map.
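A five-stage backbone of this shape can be sketched as follows; the per-stage convolution counts and channel numbers mirror a VGG-like layout and are assumptions, since fig. 3 fixes the exact parameters.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """One backbone stage: n_convs 3x3 convolutions followed by 2x2 max
    pooling, i.e. the 'conv + maxpool' pattern described above."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Five convolutional stages, each able to output a corresponding feature map.
backbone = nn.Sequential(
    conv_block(3, 64, 2), conv_block(64, 128, 2), conv_block(128, 256, 3),
    conv_block(256, 512, 3), conv_block(512, 512, 3),
)
```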
In an optional embodiment, a plurality of nodes are arranged at the bottom of the backbone network of the segmentation model: each category whose number of sample pictures exceeds a preset value corresponds to its own node, and all categories whose sample count is less than or equal to the preset value share one node. After the backbone network performs the last layer of feature extraction on the picture, the picture is assigned to the corresponding node so that the corresponding operation is performed on it. Still referring to fig. 4, a fully connected layer (Full Connection) is provided below the backbone network, with four black dots and two white dots below it; each dot represents one classification result, the pictures corresponding to the four black dots belong to categories that undergo semantic segmentation, and the pictures corresponding to the two white dots belong to categories that undergo saliency segmentation.
In this scheme, the backbone network receives an input picture, performs feature extraction on it, and, according to the feature extraction result, inputs the picture to the segmentation network corresponding to the target algorithm, i.e., the saliency segmentation network or the semantic segmentation network.
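The routing itself amounts to checking which output node the classifier activates most strongly, as in this minimal sketch (node layout as described above; names hypothetical):

```python
import torch

def route(class_logits, num_semantic_classes):
    """Pick the segmentation branch for one picture from the classifier
    output: indices below num_semantic_classes are the per-category first
    nodes, the last index is the shared 'other' second node."""
    predicted = torch.argmax(class_logits, dim=-1).item()
    return 'semantic' if predicted < num_semantic_classes else 'saliency'
```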
Step S53, performing first segmentation processing on the picture according to the feature layer obtained by feature extraction, through the segmentation network corresponding to the target segmentation algorithm in the segmentation model, wherein the segmentation network comprises: a semantic segmentation network and a saliency segmentation network, the saliency segmentation network being used for performing saliency segmentation on the picture, and the semantic segmentation network for performing semantic segmentation on the picture.
In the above scheme, the backbone network is not only used for classifying the pictures and determining the target segmentation algorithm corresponding to the pictures, but also used for outputting the feature layer as the image feature required by the segmentation network. After determining a target segmentation algorithm corresponding to the picture, the backbone network outputs the image characteristics of the picture to the corresponding segmentation network, so that the corresponding segmentation network performs coarse segmentation on the picture.
It should be noted that semantic segmentation can segment specified contents in a picture, so it tends to yield more accurate segmentation results; however, semantic segmentation requires a large number of samples to train the model, so it cannot be performed for categories with few samples. In this scheme, the semantic segmentation algorithm is used for the categories that have enough samples to train the semantic segmentation model, while for the categories with too few samples to be semantically segmented, the picture is segmented with the saliency segmentation algorithm. This ensures accurate segmentation for the sample-rich categories while still allowing the sample-poor categories to be roughly segmented, and thus copes with the openness and complexity of commodity categories in the e-commerce field, solving the technical problem of poor image segmentation effect in the prior art.
As an alternative embodiment, after the picture is segmented according to the feature layer obtained by feature extraction through the segmentation network corresponding to the target segmentation algorithm in the segmentation model, the method further includes: performing second segmentation processing on the result obtained by the first segmentation processing through a fine segmentation module in the segmentation model. The fine segmentation module is used for reducing the picture to a first preset size; performing Gaussian modeling on the foreground color and the background color in the result of the first segmentation processing through a clustering algorithm to obtain a Gaussian mixture distribution model, which represents the probability that each pixel point in the picture is the foreground color or the background color; correcting the Gaussian mixture distribution model through an expectation maximization algorithm for the global pixels; re-segmenting according to the first segmentation result through a max-flow min-cut algorithm to obtain a second segmentation result; and enlarging the second segmentation result according to a second preset size and re-entering the step of correcting the Gaussian mixture distribution model through the expectation maximization algorithm for the global pixels, until the second segmentation result is enlarged to the size of the picture before reduction.
The fine segmentation module reduces the picture in order to cut the time taken by subsequent operations, but shrinking the picture comes at the cost of segmentation accuracy, so the first preset size must be chosen to save a large amount of computation time while keeping the loss of segmentation accuracy small.
Specifically, the first segmentation result provides a prediction mask indicating the foreground color and the background color; in this step, Gaussian modeling of the foreground and background colors yields the probability that each pixel point in the picture belongs to the foreground or the background. During segmentation, pixels belonging to the foreground are determined from these probabilities according to a preset accuracy, so that the target object in the picture can be obtained. Specifically, the Gaussian mixture distribution model is corrected to refine the probability of each pixel being foreground or background, producing more accurate probabilities.
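For illustration, the Gaussian modeling step could be realized with scikit-learn mixtures roughly as follows; the component count k and the logistic combination of the two likelihoods are assumptions of this sketch, not prescriptions of the disclosure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pixel_foreground_probability(img, mask, k=5):
    """Fit one colour mixture to the pixels the prediction mask marks as
    foreground and one to the background, then score every pixel.

    img:  H x W x 3 colour picture; mask: H x W array with 1 = foreground.
    """
    pixels = img.reshape(-1, 3).astype(np.float64)
    labels = mask.reshape(-1)
    fg = GaussianMixture(k).fit(pixels[labels == 1])
    bg = GaussianMixture(k).fit(pixels[labels == 0])
    log_fg, log_bg = fg.score_samples(pixels), bg.score_samples(pixels)
    # Relative likelihood that each pixel belongs to the foreground model.
    prob_fg = 1.0 / (1.0 + np.exp(log_bg - log_fg))
    return prob_fg.reshape(img.shape[:2])
```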
Although the image is small in the initial stage, after each iteration the image is enlarged before the step of correcting the Gaussian mixture distribution model is executed again, so that after multiple iterations the final result has the same size as the input image.
This scheme relies on the assumption that moderately shrinking a picture does not change the probability distribution of its colors. The picture is reduced to a suitable size during initialization and the first iterations so as to shorten the algorithm's running time, and in subsequent iterations the picture size is increased step by step at equal intervals until it reaches the normal size in the last iteration. The scheme greatly reduces the time consumed by the algorithm at a small loss of precision, and the precision loss is within an acceptable range.
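The whole coarse-to-fine loop resembles a multi-scale GrabCut. The sketch below leans on OpenCV's grabCut as the GMM + EM + max-flow min-cut engine; the scale schedule and iteration count are assumptions, and the fine segmentation module of this disclosure is not claimed to be implemented exactly this way.

```python
import cv2
import numpy as np

def coarse_to_fine(img, coarse_mask, scales=(0.25, 0.5, 1.0), iters=2):
    """Refine a coarse mask at increasing sizes up to the full resolution.

    img:         8-bit H x W x 3 picture.
    coarse_mask: H x W array, nonzero where the first segmentation result
                 predicts foreground.
    """
    h, w = img.shape[:2]
    # Seed labels: probable foreground where the coarse mask fires.
    mask = np.where(coarse_mask > 0, cv2.GC_PR_FGD,
                    cv2.GC_PR_BGD).astype(np.uint8)
    bgd = np.zeros((1, 65), np.float64)  # GMM parameter buffers
    fgd = np.zeros((1, 65), np.float64)
    for s in scales:
        size = (int(w * s), int(h * s))  # cv2 size is (width, height)
        small = cv2.resize(img, size)
        mask = cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)
        # Each call re-estimates the colour GMMs (EM) and re-cuts the graph.
        cv2.grabCut(small, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```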
Example 3
According to an embodiment of the present invention, there is further provided a picture dividing apparatus for implementing the picture dividing method in embodiment 1, and fig. 6 is a schematic diagram of the picture dividing apparatus according to embodiment 3, as shown in fig. 6, the apparatus 600 includes:
a selecting module 602, configured to select a target segmentation algorithm corresponding to a picture from candidate segmentation algorithms by performing feature extraction on the picture, where the candidate segmentation algorithms include: a saliency segmentation algorithm and a semantic segmentation algorithm.
The first segmentation module 604 is configured to perform coarse segmentation on the picture by using a target segmentation algorithm to obtain a first segmentation result, where the coarse segmentation is used to preliminarily determine a target object in the picture.
And a second segmentation module 606, configured to perform fine segmentation on the first segmentation result to obtain a target object in the picture.
It should be noted here that the selection module 602, the first segmentation module 604 and the second segmentation module 606 correspond to steps S21 to S23 in embodiment 1, and the three modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1 above. It should be noted that these modules, as part of the apparatus, may run in the computer terminal 10 provided in the first embodiment.
As an alternative embodiment, the selection module comprises: the first determining submodule is used for inputting the picture into a backbone network in the segmentation model, and the backbone network distributes the picture to corresponding nodes by extracting features of the picture, wherein the nodes comprise: a first node corresponding to each category of which the number of the sample pictures exceeds a preset value, and a second node corresponding to all categories of which the number of the sample pictures is less than or equal to the preset value; the second determining submodule is used for determining that the target segmentation algorithm is a semantic segmentation algorithm if the picture is distributed to the first node; and the third determining sub-module is used for determining that the target segmentation algorithm is the significance segmentation algorithm if the picture is distributed to the second node.
As an alternative embodiment, the segmentation model further comprises a segmentation network, the segmentation network comprising: the semantic segmentation network and the significance segmentation network, the first segmentation module comprises: the output submodule is used for inputting the first characteristic layer output by the backbone network into a segmentation network corresponding to the target segmentation algorithm; and the first segmentation submodule is used for roughly segmenting the picture through a segmentation network.
As an alternative embodiment, in the case that the target segmentation algorithm is a saliency segmentation algorithm, the first segmentation submodule includes: the first stacking unit is used for stacking a plurality of second feature layers according to a preset layer-skipping rule to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on first feature layers output by the backbone network; the first acquisition unit is used for acquiring a first linear mean value of the plurality of single-channel prediction masks; and the first determining unit is used for determining the first linear mean value as the first segmentation result corresponding to the picture.
As an alternative embodiment, in the case that the target segmentation algorithm is a saliency segmentation algorithm, the first segmentation submodule includes: the second stacking unit is used for stacking a plurality of second feature layers according to a preset layer jump rule to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on first feature layers output by the backbone network; the device comprises an extraction unit, a prediction unit and a prediction unit, wherein the extraction unit is used for extracting n single-channel prediction masks from a plurality of single-channel prediction masks, and n is a positive integer smaller than the number of the single-channel prediction masks; the second acquisition unit is used for acquiring second linear mean values of the n single-channel prediction masks; and the second determining unit is used for determining the second linear mean value as the first segmentation result corresponding to the picture.
As an alternative embodiment, in the case that the segmentation algorithm is a semantic segmentation algorithm, the first segmentation submodule includes: a selection unit for selecting a plurality of convolutional layers in a backbone network; the sampling unit is used for performing up-sampling on the first characteristic layer output by the convolutional layer according to the sequence from bottom to top, and then performing characteristic superposition on the first characteristic layer output by the last convolutional layer to obtain a target characteristic layer corresponding to the last convolutional layer; the convolution unit is used for performing convolution on the target characteristic layer to enable the number of channels of the target characteristic layer to be the number of the first nodes; and the first adjusting unit is used for adjusting the size of the convolved target feature layer to be consistent with the size of the picture by up-sampling the convolved target feature layer.
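A hedged sketch of that bottom-up fusion follows, under the assumption that the selected feature layers share a channel count (in practice 1x1 convolutions would align them) and with hypothetical names:

```python
import torch.nn.functional as F

def semantic_decode(feature_layers, to_classes, image_size):
    """Upsample deeper feature layers onto shallower ones, superpose them,
    project to one channel per first node, then resize to the picture.

    feature_layers: list ordered shallow-to-deep, same channel count.
    to_classes:     1x1 convolution module producing the class channels.
    image_size:     (H, W) of the input picture.
    """
    fused = feature_layers[-1]
    for feat in reversed(feature_layers[:-1]):
        fused = F.interpolate(fused, size=feat.shape[2:], mode='bilinear',
                              align_corners=False) + feat
    logits = to_classes(fused)  # channels == number of first nodes
    return F.interpolate(logits, size=image_size, mode='bilinear',
                         align_corners=False)
```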
As an alternative embodiment, the second segmentation module comprises: the reduction submodule is used for reducing the picture to a first preset size; the clustering submodule is used for performing Gaussian modeling on the foreground color and the background color in the first segmentation result through a clustering algorithm to obtain a Gaussian mixture distribution model, wherein the Gaussian mixture distribution model is used for representing the probability that each pixel point in the picture is the foreground color or the background color; the correction submodule is used for correcting the Gaussian mixture distribution model through an expectation maximization algorithm for the global pixels; the second segmentation submodule is used for re-segmenting according to the first segmentation result through a max-flow min-cut algorithm to obtain a second segmentation result; and the amplification submodule is used for enlarging the second segmentation result according to a second preset size and re-entering the step of correcting the Gaussian mixture distribution model through the expectation maximization algorithm for the global pixels, until the second segmentation result is enlarged to the size of the picture before reduction.
As an alternative embodiment, the apparatus further comprises: the obtaining module is used for obtaining the segmentation model before determining the category to which the picture belongs by performing feature extraction on the picture, wherein the obtaining module comprises: the first obtaining sub-module is used for obtaining a sample picture of a known target object; the second acquisition submodule is used for acquiring an initial network, the initial network comprises a backbone network and a segmentation network, the segmentation network comprises a semantic segmentation network and a saliency segmentation network, the backbone network is used for extracting the characteristics of the sample picture, the saliency segmentation network is used for performing saliency segmentation on the sample picture, and the semantic segmentation network is used for performing semantic segmentation on the sample picture; the input submodule is used for classifying the sample pictures through a backbone network and inputting the classified sample pictures into a corresponding segmentation network to obtain a prediction result; the fourth determining submodule is used for determining a loss value of a prediction result according to the target object of the sample picture; and the adjusting submodule is used for adjusting the network parameters of the segmented network according to the loss value.
As an alternative embodiment, the first obtaining sub-module includes: a third obtaining submodule, configured to obtain a convolutional neural network as a backbone network of the initial network, where a last layer of the convolutional neural network includes: a first node corresponding to each category for which the number of sample pictures exceeds a preset value, and one second node corresponding to all categories for which the number of sample pictures is less than or equal to the preset value.
As an optional embodiment, for a category whose number of samples is less than or equal to a preset value, if the number of sample pictures is increased to make the number of sample pictures exceed the preset value, a corresponding first node is established for the category at the last layer of the backbone network.
As an optional embodiment, the obtaining module further includes: the training submodule is used for training the backbone network after a target segmentation algorithm of the sample picture is determined through the backbone network and before the feature layer of the sample picture obtained by feature extraction is input into the corresponding segmentation network to obtain a prediction result, wherein the training submodule includes: the first obtaining unit is used for obtaining the prediction probability that each pixel in the sample picture belongs to each category; the second obtaining unit is used for obtaining the real category of the sample picture at each pixel; the third determining unit is used for determining a loss value of each pixel according to the prediction probability that the pixel belongs to each category and the real category at that pixel; the fourth determining unit is used for determining the mean value of the per-pixel loss values as the loss value of the sample picture; and the second adjusting unit is used for adjusting the network parameters of the backbone network according to the loss value of the sample picture.
Example 4
According to an embodiment of the present invention, there is further provided a picture dividing apparatus for implementing the picture dividing method in embodiment 2, and fig. 7 is a schematic diagram of the picture dividing apparatus according to embodiment 4, as shown in fig. 7, the apparatus 700 includes:
an extracting module 700, configured to perform feature extraction on the picture through a backbone network of the segmentation model to select a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms, where the candidate segmentation algorithms include: a saliency segmentation algorithm and a semantic segmentation algorithm.
A rough segmentation processing module 702, configured to perform first segmentation processing on the picture according to a feature layer obtained by feature extraction, through a segmentation network corresponding to the target segmentation algorithm in the segmentation model, where the segmentation network includes: a semantic segmentation network and a saliency segmentation network, the saliency segmentation network being used for performing saliency segmentation on the picture, and the semantic segmentation network for performing semantic segmentation on the picture.
It should be noted here that the above-mentioned extraction module 700 and rough segmentation processing module 702 correspond to steps S51 to S53 in embodiment 2, and the two modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure in embodiment 2 above. It should be noted that these modules, as part of the apparatus, may run in the computer terminal 10 provided in the first embodiment.
As an alternative embodiment, the apparatus further comprises: a fine segmentation processing module, configured to perform second segmentation processing, through a fine segmentation module in the segmentation model, on the result obtained by the first segmentation processing after the picture has been segmented according to the feature layer obtained by feature extraction through the segmentation network corresponding to the target segmentation algorithm. The fine segmentation module is used for reducing the picture to a first preset size; performing Gaussian modeling on the foreground color and the background color in the result of the first segmentation processing through a clustering algorithm to obtain a Gaussian mixture distribution model, which represents the probability that each pixel point in the picture is the foreground color or the background color; correcting the Gaussian mixture distribution model through an expectation maximization algorithm for the global pixels; re-segmenting according to the first segmentation result through a max-flow min-cut algorithm to obtain a second segmentation result; and enlarging the second segmentation result according to a second preset size and re-entering the step of correcting the Gaussian mixture distribution model through the expectation maximization algorithm for the global pixels, until the second segmentation result is enlarged to the size of the picture before reduction.
Example 5
An embodiment of the present invention may provide a picture segmentation system, including:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
selecting a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm;
roughly dividing the picture by using a target division algorithm to obtain a first division result, wherein the rough division is used for preliminarily determining a target object in the picture;
and performing fine segmentation on the first segmentation result to obtain a target object in the picture.
Further, the memory provides instructions for the processor to process other steps in embodiment 1, which is not described herein again.
Example 6
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the picture segmentation method: selecting a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm; roughly dividing the picture by using a target division algorithm to obtain a first division result, wherein the rough division is used for preliminarily determining a target object in the picture; and performing fine segmentation on the first segmentation result to obtain a target object in the picture.
Alternatively, fig. 8 is a block diagram of a computer terminal according to embodiment 6 of the present invention. As shown in fig. 8, the computer terminal A may include: one or more processors 802 (only one of which is shown), a memory 804, and a transmission device 806.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the picture segmentation method and apparatus in the embodiments of the present invention; the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the picture segmentation method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: selecting a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm; roughly dividing the picture by using a target division algorithm to obtain a first division result, wherein the rough division is used for preliminarily determining a target object in the picture; and performing fine segmentation on the first segmentation result to obtain a target object in the picture.
Optionally, the processor may further execute the program code of the following steps: inputting the picture into a backbone network in the segmentation model, and distributing the picture to corresponding nodes by performing feature extraction on the picture through the backbone network, wherein the nodes comprise: a first node corresponding to each category of which the number of the sample pictures exceeds a preset value, and a second node corresponding to all categories of which the number of the sample pictures is less than or equal to the preset value; if the picture is distributed to the first node, determining that the target segmentation algorithm is a semantic segmentation algorithm; and if the picture is distributed to the second node, determining the target segmentation algorithm as a significance segmentation algorithm.
Optionally, the segmentation model further includes a segmentation network, and the segmentation network includes: a semantic segmentation network and a saliency segmentation network, said processor further executable program code for: inputting a first characteristic layer output by the backbone network into a segmentation network corresponding to a target segmentation algorithm; and roughly dividing the picture through a division network.
Optionally, the processor may further execute the program code of the following steps: stacking a plurality of second feature layers according to a preset layer-skipping rule under the condition that the target segmentation algorithm is a saliency segmentation algorithm to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on a first feature layer output by the backbone network; acquiring a linear mean value of the plurality of single-channel prediction masks; and determining the linear mean value as a first segmentation result corresponding to the picture.
Optionally, the processor may further execute the program code of the following steps: stacking a plurality of second feature layers according to a preset layer jump rule to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on a first feature layer output by a backbone network; extracting n single-channel prediction masks from the plurality of single-channel prediction masks, wherein n is a positive integer smaller than the number of the single-channel prediction masks; acquiring linear mean values of n single-channel prediction masks; and determining the linear mean value as a first segmentation result corresponding to the picture.
Optionally, the processor may further execute the program code of the following steps: selecting a plurality of convolutional layers in a backbone network under the condition that the segmentation algorithm is a semantic segmentation algorithm; according to the sequence from bottom to top, after the first characteristic layer output by the convolutional layer is up-sampled, the first characteristic layer output by the last convolutional layer is subjected to characteristic superposition to obtain a target characteristic layer corresponding to the last convolutional layer; performing convolution on the target characteristic layer to enable the number of channels of the target characteristic layer to be the number of the first nodes; and adjusting the size of the convolved target feature layer to be consistent with the size of the picture by up-sampling the convolved target feature layer.
Optionally, the processor may further execute the program code of the following steps: reducing the picture to a first preset size; performing Gaussian modeling on the foreground color and the background color in the first segmentation result through a clustering algorithm to obtain a Gaussian mixture distribution model, wherein the Gaussian mixture distribution model is used for representing the probability that each pixel point in the picture is the foreground color or the background color; correcting the Gaussian mixture distribution model through an expectation maximization algorithm for the global pixels; re-segmenting according to the first segmentation result through a max-flow min-cut algorithm to obtain a second segmentation result; and enlarging the second segmentation result according to a second preset size, and re-entering the step of correcting the Gaussian mixture distribution model through the expectation maximization algorithm for the global pixels until the second segmentation result is enlarged to the size of the picture before reduction.
Optionally, the processor may further execute the program code of the following steps: obtaining a sample picture of a known target object before determining the category to which the picture belongs by extracting the features of the picture; the method comprises the steps that an initial network is obtained, the initial network comprises a backbone network and a segmentation network, the segmentation network comprises a semantic segmentation network and a saliency segmentation network, the backbone network is used for carrying out feature extraction on a sample picture, the saliency segmentation network is used for carrying out saliency segmentation on the sample picture, and the semantic segmentation network is used for carrying out semantic segmentation on the sample picture; after a target segmentation algorithm of a sample picture is determined through a backbone network, inputting a feature layer of the sample picture obtained by feature extraction into a corresponding segmentation network to obtain a prediction result; determining a loss value of a prediction result according to a target object of the sample picture; and adjusting network parameters of the segmented network according to the loss value.
Optionally, the processor may further execute the program code of the following steps: acquiring a convolutional neural network as a backbone network of an initial network, wherein the last layer of the convolutional neural network comprises: a first node corresponding to each category for which the number of sample pictures exceeds a preset value, and one second node corresponding to all categories for which the number of sample pictures is less than or equal to the preset value.
Optionally, the processor may further execute the program code of the following steps: and for the category of which the number of samples is less than or equal to the preset value, if the number of the sample pictures is increased and the number of the sample pictures exceeds the preset value, establishing a corresponding first node for the category at the last layer of the backbone network.
Optionally, the processor may further execute the program code of the following steps: after a target segmentation algorithm of a sample picture is determined through a backbone network, inputting a feature layer of the sample picture obtained by feature extraction into a corresponding segmentation network, and obtaining the prediction probability of each pixel in the sample picture belonging to each category before obtaining a prediction result; acquiring the real category of the sample picture on each pixel; determining a loss value of each pixel according to the prediction probability of each pixel belonging to each category and the real category on each pixel; determining the average value of the loss values of each pixel as the loss value of the sample picture; and adjusting network parameters of the backbone network according to the loss value of the sample picture.
The embodiment of the invention provides a scheme for segmenting pictures: a target segmentation algorithm corresponding to the picture is selected from candidate segmentation algorithms by performing feature extraction on the picture, wherein the candidate segmentation algorithms comprise a saliency segmentation algorithm and a semantic segmentation algorithm; the picture is roughly segmented by using the target segmentation algorithm to obtain a first segmentation result, the rough segmentation preliminarily determining a target object in the picture; and the first segmentation result is finely segmented to obtain the target object in the picture. Because the target segmentation algorithm corresponding to the picture is determined by extracting the features of the picture, the scheme copes with the openness and complexity of commodity categories in the e-commerce field and solves the technical problem of poor image segmentation effect in the prior art.
It can be understood by those skilled in the art that the structure shown in fig. 8 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 8 does not limit the structure of the electronic device; for example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 8, or have a different configuration from that shown in fig. 8.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 7
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the picture segmentation method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: selecting a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms by extracting features of the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm; roughly dividing the picture by using a target division algorithm to obtain a first division result, wherein the rough division is used for preliminarily determining a target object in the picture; and performing fine segmentation on the first segmentation result to obtain a target object in the picture.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A picture segmentation method is characterized by comprising the following steps:
selecting a target segmentation algorithm corresponding to a picture from candidate segmentation algorithms by performing feature extraction on the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm;
roughly dividing the picture by using the target division algorithm to obtain a first division result, wherein the rough division is used for preliminarily determining a target object in the picture;
and performing fine segmentation on the first segmentation result to obtain a target object in the picture.
2. The method of claim 1, wherein selecting a target segmentation algorithm corresponding to a picture from candidate segmentation algorithms by performing feature extraction on the picture comprises:
inputting the picture into a backbone network in a segmentation model, and distributing the picture to corresponding nodes by the backbone network through feature extraction of the picture, wherein the nodes comprise: a first node corresponding to each category of which the number of the sample pictures exceeds a preset value, and a second node corresponding to all categories of which the number of the sample pictures is less than or equal to the preset value;
determining the target segmentation algorithm as the semantic segmentation algorithm if the picture is assigned to the first node;
determining the target segmentation algorithm as the saliency segmentation algorithm if the picture is assigned to the second node.
3. The method of claim 2, wherein the segmentation model further comprises a segmentation network, the segmentation network comprising: the semantic segmentation network and the saliency segmentation network, and wherein roughly segmenting the picture by using the target segmentation algorithm to obtain a first segmentation result comprises:
inputting a first characteristic layer output by the backbone network into a segmentation network corresponding to a target segmentation algorithm;
and roughly dividing the picture through the dividing network.
4. The method according to claim 3, wherein in the case that the target segmentation algorithm is a saliency segmentation algorithm, coarsely segmenting the picture through the segmentation network comprises:
stacking a plurality of second feature layers according to a preset layer-skipping rule to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on first feature layers output by the backbone network;
obtaining a first linear mean of the plurality of single-channel prediction masks;
and determining the first linear mean value as a first segmentation result corresponding to the picture.
5. The method according to claim 3, wherein in the case that the target segmentation algorithm is a saliency segmentation algorithm, coarsely segmenting the picture through the segmentation network comprises:
stacking a plurality of second feature layers according to a preset layer-skipping rule to obtain a plurality of single-channel prediction masks, wherein the second feature layers are obtained by performing convolution and deconvolution on first feature layers output by the backbone network;
extracting n single-channel prediction masks from the plurality of single-channel prediction masks, wherein n is a positive integer smaller than the number of the single-channel prediction masks;
acquiring second linear mean values of the n single-channel prediction masks;
and determining the second linear mean value as a first segmentation result corresponding to the picture.
6. The method according to claim 3, wherein in the case that the segmentation algorithm is a semantic segmentation algorithm, coarsely segmenting the picture through the segmentation network comprises:
selecting a plurality of convolutional layers in the backbone network;
according to the sequence from bottom to top, after the first characteristic layer output by the convolutional layer is subjected to up-sampling, the first characteristic layer output by the last convolutional layer is subjected to characteristic superposition to obtain a target characteristic layer corresponding to the last convolutional layer;
performing convolution on the target feature layer to enable the number of channels of the target feature layer to be the number of the first nodes;
and adjusting the size of the convolved target feature layer to be consistent with the size of the picture by up-sampling the convolved target feature layer.
7. The method according to any one of claims 4 to 6, wherein performing a fine segmentation on the first segmentation result to obtain a target object in the picture comprises:
reducing the picture to a first preset size;
performing Gaussian modeling on the foreground color and the background color in the first segmentation result through a clustering algorithm to obtain a Gaussian mixture distribution model, wherein the Gaussian mixture distribution model is used for representing the probability that each pixel point in the picture is the foreground color or the background color;
modifying the Gaussian mixture distribution model by an expectation-maximization algorithm for global pixels;
re-segmenting according to the first segmentation result by a max-flow min-cut algorithm to obtain a second segmentation result;
and amplifying the second segmentation result according to a second preset size, and re-entering the step of correcting the Gaussian mixture distribution model through an expectation maximization algorithm aiming at the global pixel until the second segmentation result is amplified to the size of the picture before reduction.
8. The method of claim 1, wherein prior to determining the category to which the picture belongs by feature extraction of the picture, the method further comprises: obtaining a segmentation model, wherein the step of obtaining the segmentation model comprises:
acquiring a sample picture of a known target object;
acquiring an initial network, wherein the initial network comprises a backbone network and a segmentation network, the segmentation network comprises a semantic segmentation network and a saliency segmentation network, the backbone network is used for extracting features of the sample picture, the saliency segmentation network is used for performing saliency segmentation on the sample picture, and the semantic segmentation network is used for performing semantic segmentation on the sample picture;
after a target segmentation algorithm of the sample picture is determined through the backbone network, inputting a feature layer of the sample picture obtained by feature extraction into the corresponding segmentation network to obtain a prediction result;
determining a loss value of the prediction result according to a target object of the sample picture;
and adjusting the network parameters of the segmented network according to the loss value.
9. The method of claim 8, wherein obtaining an initial network comprises:
acquiring a convolutional neural network as a backbone network of the initial network, wherein the last layer of the convolutional neural network comprises: the first node corresponding to each category for which the number of sample pictures exceeds a preset value, and one second node corresponding to all categories for which the number of sample pictures is less than or equal to the preset value.
10. The method of claim 9, wherein for a category whose number of samples is less than or equal to the predetermined value, if the number of sample pictures is increased to exceed the predetermined value, a corresponding first node is established for the category at a last layer of the backbone network.
11. The method according to claim 8, wherein after determining a target segmentation algorithm of the sample picture through the backbone network, before inputting a feature layer of the sample picture obtained by feature extraction into the corresponding segmentation network to obtain a prediction result, the method further comprises: training the backbone network, wherein training the backbone network comprises:
obtaining the prediction probability of each pixel in the sample picture belonging to each category;
acquiring the real category of the sample picture on each pixel;
determining a loss value of each pixel according to the prediction probability of each pixel belonging to each category and the real category on each pixel;
determining the average value of the loss values of each pixel as the loss value of the sample picture;
and adjusting the network parameters of the backbone network according to the loss value of the sample picture.
12. A picture segmentation method is characterized by comprising the following steps:
extracting features of a picture through a backbone network of a segmentation model to select a target segmentation algorithm corresponding to the picture from candidate segmentation algorithms, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm;
performing first segmentation processing on the picture according to a feature layer obtained by feature extraction through a segmentation network corresponding to the target segmentation algorithm in the segmentation model, wherein the segmentation network comprises: a semantic segmentation network and a saliency segmentation network, the saliency segmentation network being used for performing saliency segmentation on the picture, and the semantic segmentation network being used for performing semantic segmentation on the picture.
13. The method according to claim 12, wherein after the picture is segmented according to a feature layer obtained by feature extraction through a segmentation network corresponding to the target segmentation algorithm in the segmentation model, the method further comprises:
performing second segmentation processing on the result obtained by the first segmentation processing through a fine segmentation module in the segmentation model, wherein the fine segmentation module is used for reducing the picture to a first preset size, performing Gaussian modeling on foreground colors and background colors in the result of the first segmentation processing through a clustering algorithm to obtain a Gaussian mixture distribution model, the Gaussian mixture distribution model being used for representing the probability that each pixel point in the picture is a foreground color or a background color, correcting the Gaussian mixture distribution model through an expectation maximization algorithm for the global pixels, performing re-segmentation according to the first segmentation result through a max-flow min-cut algorithm to obtain a second segmentation result, enlarging the second segmentation result according to a second preset size, and re-entering the step of correcting the Gaussian mixture distribution model through the expectation maximization algorithm for the global pixels until the second segmentation result is enlarged to the size of the picture before reduction.
14. A picture segmentation apparatus, comprising:
the image segmentation method comprises a selection module and a segmentation algorithm selection module, wherein the selection module is used for selecting a target segmentation algorithm corresponding to an image from candidate segmentation algorithms by extracting features of the image, and the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm;
the first segmentation module is used for carrying out rough segmentation on the picture by using the target segmentation algorithm to obtain a first segmentation result, wherein the rough segmentation is used for preliminarily determining a target object in the picture;
and the second segmentation module is used for performing fine segmentation on the first segmentation result to obtain the target object in the picture.
15. A picture segmentation system, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
selecting a target segmentation algorithm corresponding to a picture from candidate segmentation algorithms by performing feature extraction on the picture, wherein the candidate segmentation algorithms comprise: a saliency segmentation algorithm and a semantic segmentation algorithm;
roughly segmenting the picture by using the target segmentation algorithm to obtain a first segmentation result, wherein the rough segmentation is used for preliminarily determining a target object in the picture;
and performing fine segmentation on the first segmentation result to obtain the target object in the picture.
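For illustration only (not part of the claims): read together, the apparatus of claim 14 and the system of claim 15 amount to the three-step pipeline sketched below, where extract_features, classifier and the two networks are hypothetical placeholders rather than interfaces defined by the patent (refine_mask is the sketch given after claim 13).

    def segment_picture(picture, extract_features, classifier, networks, refine_mask):
        # Hypothetical end-to-end pipeline for the three claimed steps.
        # Step 1: select the target segmentation algorithm from the picture's features.
        algorithm = classifier(extract_features(picture))  # 'semantic' or 'saliency'
        # Step 2: rough segmentation - preliminary determination of the target object.
        coarse = networks[algorithm](picture)
        # Step 3: fine segmentation of the first (rough) segmentation result.
        return refine_mask(picture, coarse)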
CN201810858221.0A 2018-07-31 2018-07-31 Picture segmentation method, device and system Active CN110782466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810858221.0A CN110782466B (en) 2018-07-31 2018-07-31 Picture segmentation method, device and system

Publications (2)

Publication Number Publication Date
CN110782466A (en) 2020-02-11
CN110782466B (en) 2023-05-02

Family

ID=69383093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810858221.0A Active CN110782466B (en) 2018-07-31 2018-07-31 Picture segmentation method, device and system

Country Status (1)

Country Link
CN (1) CN110782466B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010034A * 2016-11-02 2018-05-08 Guangzhou Tupu Network Technology Co., Ltd. Commodity image segmentation method and device
US20180211393A1 * 2017-01-24 2018-07-26 Beihang University Image guided video semantic object segmentation method and apparatus
CN107480726A * 2017-08-25 2017-12-15 University of Electronic Science and Technology of China Scene semantic segmentation method based on full convolution and long short-term memory units

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dejan Depalov et al.: "Perceptual Feature Selection for Semantic Image Classification" *
Huang Jinchao: "Image multi-target segmentation algorithm based on fast region proposal network" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164534A1 * 2020-02-18 2021-08-26 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, device, and storage medium
CN111489330A * 2020-03-24 2020-08-04 University of Chinese Academy of Sciences Weak and small target detection method based on multi-source information fusion
CN111489330B * 2020-03-24 2021-06-22 University of Chinese Academy of Sciences Weak and small target detection method based on multi-source information fusion
CN112085840A * 2020-09-17 2020-12-15 Tencent Technology (Shenzhen) Co., Ltd. Semantic segmentation method, device, equipment and computer readable storage medium
CN112085840B * 2020-09-17 2024-03-29 Tencent Technology (Shenzhen) Co., Ltd. Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium
CN112637593A * 2020-12-18 2021-04-09 Zhengzhou Normal University Video coding optimization method based on artificial intelligence and video analysis
CN114092489A * 2021-11-02 2022-02-25 Tsinghua University Porous medium seepage channel extraction and model training method, device and equipment
CN114092489B * 2021-11-02 2023-08-29 Tsinghua University Porous medium seepage channel extraction and model training method, device and equipment
CN114332802A * 2022-03-15 2022-04-12 Beijing Zhongke Huiyan Technology Co., Ltd. Road surface flatness semantic segmentation method and system based on binocular camera

Also Published As

Publication number Publication date
CN110782466B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110782466B (en) Picture segmentation method, device and system
US20210256258A1 (en) Method, apparatus, and computer program for extracting representative characteristics of object in image
KR101640998B1 (en) Image processing apparatus and image processing method
Grady et al. Fast approximate random walker segmentation using eigenvector precomputation
CN111178355B (en) Seal identification method, device and storage medium
CN107452010A Automatic image matting algorithm and device
CN107204034B Image processing method and terminal
CN109409377B (en) Method and device for detecting characters in image
Ge et al. Co-saliency detection via inter and intra saliency propagation
CN110443140B (en) Text positioning method, device, computer equipment and storage medium
CN110807110B (en) Image searching method and device combining local and global features and electronic equipment
CN111127307A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110599554A (en) Method and device for identifying face skin color, storage medium and electronic device
CN109920018A Neural-network-based black-and-white photograph color recovery method, device and storage medium
CN111652181A (en) Target tracking method and device and electronic equipment
CN111462164A (en) Foreground segmentation method and data enhancement method based on image synthesis
CN112164089A (en) Satellite image-based farmland boundary extraction method and device, electronic equipment and storage medium
CN113409461A (en) Method and device for constructing landform map, electronic equipment and readable storage medium
CN109871767A (en) Face identification method, device, electronic equipment and computer readable storage medium
CN115760870A (en) Image key area segmentation method, device, equipment and storage medium
Giraud et al. Superpixel-based color transfer
CN111145196A (en) Image segmentation method and device and server
CN104504692A Method for extracting salient object in image based on region contrast
CN111738310A (en) Material classification method and device, electronic equipment and storage medium
Ding et al. Real-time interactive image segmentation using improved superpixels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant