CN111369582B - Image segmentation method, background replacement method, device, equipment and storage medium


Info

Publication number
CN111369582B
CN111369582B (application CN202010150572.3A)
Authority
CN
China
Prior art keywords
feature map
pixel
image
background
sampling
Prior art date
Legal status
Active
Application number
CN202010150572.3A
Other languages
Chinese (zh)
Other versions
CN111369582A (en)
Inventor
涂娟辉
易阳
李峰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010150572.3A
Publication of CN111369582A
Application granted
Publication of CN111369582B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/20 Contour coding, e.g. using detection of edges

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image segmentation method and apparatus, an electronic device, and a storage medium. The method comprises: performing down-sampling encoding on an image to be segmented that includes a foreground and a background, and performing up-sampling decoding on the resulting down-sampled feature map to obtain an up-sampled feature map; classifying the up-sampled feature map to obtain the classification probability of the foreground and the background corresponding to each pixel in the up-sampled feature map; compensating the error of the classification probability of each pixel to obtain a compensation feature map of the up-sampled feature map; fusing the compensation feature map, the up-sampled feature map, and the down-sampled feature map to obtain a fused feature map; and identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map. The invention enables accurate image segmentation.

Description

Image segmentation method, background replacement method, device, equipment and storage medium
Technical Field
The present invention relates to image processing technologies based on artificial intelligence, and in particular, to an image segmentation method, an image background replacement method, an image segmentation apparatus, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) is a comprehensive discipline of computer science that studies the design principles and implementation methods of intelligent machines, so that machines can perceive, reason, and make decisions. AI technology spans a wide range of fields, such as natural language processing and machine learning/deep learning; as the technology develops, it will be applied in ever more fields and deliver increasingly important value.
In the image processing technology based on artificial intelligence, image segmentation is an important research direction, and foreground and background can be identified from an image, so that corresponding processing, such as replacement processing, can be performed on the identified foreground or background subsequently.
However, the foreground and background identified by the related art are prone to errors, and the accuracy is low.
Disclosure of Invention
The embodiments of the invention provide an image segmentation method, an image background replacement method, an apparatus, an electronic device, and a storage medium, which can reduce recognition errors and improve the accuracy of image segmentation.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image segmentation method, which comprises the following steps:
down-sampling coding is carried out on an image to be segmented comprising a foreground and a background, and up-sampling decoding is carried out on the obtained down-sampling feature map to obtain an up-sampling feature map;
classifying the up-sampling feature map to obtain classification probabilities of foreground and background corresponding to each pixel in the up-sampling feature map;
compensating the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
and identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map.
The embodiment of the invention provides an image background replacing method, which comprises the following steps:
presenting an image to be segmented;
in response to the replacement operation for the image to be segmented, identifying a foreground and a background from the image to be segmented through an image segmentation model;
replacing the identified background with a target background to obtain a new image after background replacement, and
presenting the new image in the client;
the image segmentation model is used for compensating errors of the classification probabilities of the foreground and the background to obtain the foreground and the background.
An embodiment of the present invention provides an image segmentation apparatus, including:
the encoding module is used for carrying out downsampling encoding on the image to be segmented comprising the foreground and the background;
the decoding module is used for carrying out up-sampling decoding on the obtained down-sampling feature map to obtain an up-sampling feature map;
the compensation module is used for classifying the up-sampling feature map to obtain the classification probability of each pixel in the up-sampling feature map corresponding to the foreground and the background;
compensating the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
and the segmentation module is used for identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map.
In the above technical solution, the neural network model for image segmentation includes an encoding network and a decoding network;
the encoding module is also used for carrying out down-sampling encoding on the image to be segmented comprising the foreground and the background through the encoding network to obtain the down-sampling feature map;
the decoding module is further configured to perform upsampling decoding on the downsampled feature map output by the coding network through the decoding network to obtain the upsampled feature map.
In the above technical solution, the encoding network of the downsampling encoding includes a plurality of cascaded encoding layers;
the encoding module is further configured to perform downsampling encoding on the image to be segmented including the foreground and the background through a first encoding layer of the plurality of cascaded encoding layers;
outputting the coding result of the first coding layer to a subsequent cascaded coding layer, and continuing to perform downsampling coding and coding result output in the subsequent cascaded coding layer until the last coding layer;
and taking the encoding result output by the last encoding layer as a downsampling characteristic diagram output by the encoding network.
In the above technical solution, when the encoding network of the downsampling encoding includes a plurality of concatenated encoding layers, the decoding network of the upsampling decoding includes a plurality of concatenated decoding layers, and a cross-layer connection exists between the decoding layer and the encoding layer of the same layer;
the decoding module is further configured to perform upsampling decoding on the downsampled feature map through a first decoding layer of the plurality of concatenated decoding layers;
fusing the decoding result of the up-sampling decoding with the encoding result output by the encoding layer cross-layer connected to the first decoding layer, and outputting the fused result, as the final decoding result of the first decoding layer, to the subsequent cascaded decoding layer, so as to
continue the up-sampling decoding, decoding-result fusion, and final-decoding-result output in the subsequent cascaded decoding layers;
and taking the final decoding result output by the last decoding layer as an up-sampling feature map which is output by the decoding network and comprises the features of the foreground and the background.
In the above technical solution, the neural network model for image segmentation further includes a compensation network;
the compensation module is further configured to perform the following processing on the upsampled feature map included in the intermediate decoding result and/or the final decoding result of the upsampling decoding through the compensation network:
performing convolution processing on the up-sampling feature map, and performing normalization processing on the up-sampling feature map after the convolution processing to obtain the probability that each pixel in the up-sampling feature map belongs to the foreground and the probability that each pixel belongs to the background;
and determining the maximum value of the probability of the pixel belonging to the foreground and the probability of the pixel belonging to the background as the classification probability corresponding to the pixel.
In the above technical solution, when the decoding network of the upsampling decoding includes a plurality of cascaded decoding layers, the compensation module is further configured to perform the following processing on an upsampling feature map included in an upsampling decoding output of the at least one decoding layer through the compensation network:
performing the following for each pixel in the up-sampled feature map: determining a difference value between a classification error threshold value and classification probabilities of the foreground and the background corresponding to the pixel as a compensation value of the pixel;
and combining the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
In the above technical solution, when a decoding network of the upsampling decoding includes a plurality of cascaded decoding layers, the compensation module is further configured to perform the following processing on an upsampling feature map included in an upsampling decoding output of the at least one decoding layer through the compensation network:
when the classification probability of any pixel in the up-sampling feature map is smaller than a classification threshold, determining the difference value between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel;
determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold;
and combining the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
In the above technical solution, the compensation module is further configured to perform the following processing on the feature of each pixel in the downsampled feature map:
multiplying the characteristic of the pixel with a compensation value corresponding to the pixel in the compensation characteristic diagram to obtain the compensation characteristic of the pixel;
adding the compensation feature of the pixel and the feature of the pixel at the same position in the up-sampling feature map to obtain a fusion feature of the pixel;
and combining the fusion characteristics of the pixels to obtain a fusion characteristic image of the image to be segmented.
In the above technical solution, the segmentation module is further configured to perform the following processing for each pixel in the fused feature map:
when the classification probability of the corresponding foreground of the pixel is larger than a foreground probability threshold value, determining the pixel as a foreground pixel;
when the classification probability of the background corresponding to the pixel is larger than a background probability threshold value, determining the pixel as a background pixel;
respectively performing connectivity processing on a foreground pixel set formed by the foreground pixels and a background pixel set formed by the background pixels to obtain a plurality of connected domains in the image to be segmented;
determining the largest connected domain in the plurality of connected domains as the foreground of the image to be segmented, and
and determining the region out of the foreground in the image to be segmented as the background of the image to be segmented.
In the above technical solution, the apparatus further includes:
the training module is used for carrying out down-sampling coding on an image sample to be segmented through an image segmentation model and carrying out up-sampling decoding on the obtained down-sampling feature map to obtain an up-sampling feature map;
classifying the up-sampling feature map to obtain classification probabilities of foreground and background corresponding to each pixel in the up-sampling feature map;
compensating the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
classifying each pixel in the fused feature map to obtain the pixels belonging to the foreground and the pixels belonging to the background in the image sample to be segmented;
performing edge identification processing on each pixel in the fusion characteristic graph to obtain pixels belonging to edges in the image sample to be segmented;
constructing a loss function of the image segmentation model based on a foreground pixel set formed by the pixels belonging to the foreground, a background pixel set formed by the pixels belonging to the background, an edge pixel set formed by the pixels belonging to the edge, a segmentation label of the image sample to be segmented and an edge label of the image sample to be segmented;
and updating the parameters of the image segmentation model until the loss function is converged, and taking the parameters obtained by updating when the loss function is converged as the parameters of the trained image segmentation model.
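The training module above combines a segmentation objective with an edge objective. The following PyTorch sketch shows one way such a loss could be assembled; the choice of cross-entropy for the segmentation term and binary cross-entropy for the edge term, and the equal weighting of the two, are illustrative assumptions rather than details fixed by the embodiment.

```python
# Hedged sketch of the training loss: segmentation term over foreground/
# background predictions plus an edge term over edge predictions, each
# compared against its label. Loss functions and weighting are assumptions.
import torch
import torch.nn.functional as F

def segmentation_loss(seg_logits, edge_logits, seg_label, edge_label):
    # seg_logits:  (B, 2, H, W) foreground/background scores
    # seg_label:   (B, H, W) long tensor, 0 = background, 1 = foreground
    # edge_logits: (B, 1, H, W) edge scores
    # edge_label:  (B, 1, H, W) float tensor, 1 where the pixel is an edge
    seg_loss = F.cross_entropy(seg_logits, seg_label)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_label)
    return seg_loss + edge_loss  # parameters are updated until this converges
```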
The embodiment of the invention provides an image background replacing device, which comprises:
the display module is used for displaying an image to be segmented;
the identification module is used for responding to the replacement operation aiming at the image to be segmented, and identifying a foreground and a background from the image to be segmented through an image segmentation model;
the replacing module is used for replacing the identified background with a target background to obtain a new image after the background is replaced and presenting the new image in the client;
the image segmentation model is used for compensating errors of the classification probabilities of the foreground and the background to obtain the foreground and the background.
An embodiment of the present invention provides an electronic device for image segmentation, where the electronic device includes:
a memory for storing executable instructions;
and the processor is used for realizing the image segmentation method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
An embodiment of the present invention provides an electronic device for image background replacement, where the electronic device includes:
a memory for storing executable instructions;
and the processor is used for realizing the image background replacing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
An embodiment of the present invention provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for segmenting an image provided by the embodiment of the present invention.
The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for replacing the image background provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
compensating each pixel in the up-sampled feature map reduces the identification error of each pixel; fusing the compensation feature map, the up-sampled feature map, and the down-sampled feature map effectively retains the various local details of the image to be segmented, so that the foreground and the background can subsequently be identified from the image accurately, improving the accuracy of image segmentation.
Drawings
Fig. 1 is a schematic view of an application scenario of an image segmentation system 10 according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device 500 for image segmentation according to an embodiment of the present invention;
Figs. 3A-3B are schematic flowcharts of image segmentation methods provided by embodiments of the present invention;
FIG. 4 is a schematic structural diagram of an image segmentation model provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of another structure of an image segmentation model provided in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device 600 for image background replacement according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of an image background replacing method according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of an alternative image segmentation method provided by the present invention;
FIG. 9A is a diagram of any frame of image in a video provided by an embodiment of the invention;
FIG. 9B is a diagram of a foreground portrait area provided by an embodiment of the present invention;
FIG. 9C is a schematic view of an edge region provided by an embodiment of the present invention;
FIG. 9D is a schematic diagram of background replacement provided by an embodiment of the present invention;
FIG. 10 is a block diagram of an image segmentation model provided by an embodiment of the present invention;
FIG. 11 is a block diagram of a depth separable convolution residual module according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an edge optimization module according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a convolution module according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an application of portrait background replacement provided by an embodiment of the present invention;
fig. 15A is an input image without background replacement provided by an embodiment of the present invention;
fig. 15B is a background replacement image adopting other background replacement methods;
fig. 15C is a background replacement image provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, the terms "first", "second", and the like are used only to distinguish similar objects and do not denote a particular order. It should be understood that "first", "second", and the like may be interchanged in a specific order or sequence where permitted, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Foreground: the person or object near the front of the image, i.e., the object of interest. For example, in a portrait image that includes mountains and water, the portrait near the front is the foreground.
2) Background: the person or object near the back of the image, i.e., what is not of interest (the part of the image other than the foreground). For example, in a portrait image that includes mountains and water, the mountains and water near the back are the background.
3) Down-sampling: reducing an image so that it fits a fixed size, lowering its resolution. For an image I of size M × N, down-sampling by a factor of s yields an image of resolution (M/s) × (N/s), where s is a common divisor of M and N. For an image in matrix form, each s × s window of the original image becomes one pixel whose value is the average of all pixels in that window.
4) Up-sampling: enlarging an image so that it fits a fixed size, raising its resolution. The enlargement can use interpolation, i.e., inserting new pixels between the original pixel points with a suitable interpolation algorithm.
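The two sampling operations can be made concrete. The following NumPy sketch implements s-fold average-pooling down-sampling and a simple nearest-neighbour up-sampling; the array sizes and the choice of nearest-neighbour (rather than bilinear or bicubic) interpolation are assumptions for illustration.

```python
# Minimal NumPy sketch of s-fold down-sampling and up-sampling as defined above.
import numpy as np

def downsample(image: np.ndarray, s: int) -> np.ndarray:
    """Reduce an M x N image to (M/s) x (N/s); each output pixel is
    the mean of the corresponding s x s window (s divides M and N)."""
    m, n = image.shape
    return image.reshape(m // s, s, n // s, s).mean(axis=(1, 3))

def upsample(image: np.ndarray, s: int) -> np.ndarray:
    """Enlarge an image s-fold by repeating pixels (nearest-neighbour;
    a real interpolation algorithm would compute new in-between values)."""
    return image.repeat(s, axis=0).repeat(s, axis=1)

img = np.arange(16, dtype=float).reshape(4, 4)
small = downsample(img, 2)   # 2 x 2, each value the mean of a 2 x 2 window
big = upsample(small, 2)     # back to 4 x 4, at the lower effective resolution
```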
Embodiments of the present invention provide an image segmentation method and apparatus, an electronic device, and a storage medium, which can compensate the error of each pixel in an image, reduce recognition errors, and improve the accuracy of image segmentation. An exemplary application of the electronic device for image segmentation provided by the embodiments of the present invention is described below. The electronic device may be a server, for example a server deployed in the cloud, which performs a series of processing on an image to be segmented provided by another device or a user so as to identify the foreground and the background from it. It may also be one of various types of user terminals, such as a notebook computer, a tablet computer, a desktop computer, or a mobile device (e.g., a mobile phone or a personal digital assistant); for example, a handheld terminal identifies the foreground and the background from an image to be segmented input by the user and displays them on its display interface.
For example, referring to fig. 1, fig. 1 is a schematic view of an application scenario of an image segmentation system 10 provided by an embodiment of the present invention, a terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 may be used to acquire the image to be segmented, for example, when the user inputs the image to be segmented through the input interface, the terminal automatically acquires the image to be segmented after the input is completed.
In some embodiments, the terminal 200 locally performs the image segmentation method provided by the embodiments of the present invention to obtain the background and the foreground of the image to be segmented input by the user. For example, an image segmentation assistant is installed on the terminal 200; the user inputs the image to be segmented in the assistant, and the terminal 200 performs down-sampling encoding, up-sampling decoding, classification, compensation, fusion, identification, and other processing on the input image to obtain its background and foreground, which are displayed on the display interface 210 of the terminal 200.
In some embodiments, the terminal 200 may also send, through the network 300, the image to be segmented input by the user to the server 100 and invoke the image segmentation function provided by the server 100, and the server 100 obtains the background and the foreground of the image by means of the image segmentation method provided by the embodiments of the present invention. For example, an image segmentation assistant is installed on the terminal 200; the user inputs the image to be segmented in the assistant, and the terminal 200 sends it to the server 100 through the network 300; the server 100 receives the image, performs down-sampling encoding, up-sampling decoding, classification, compensation, fusion, identification, and other processing on it to obtain its background and foreground, and returns them to the image segmentation assistant, which displays them on the display interface 210 of the terminal 200; alternatively, the server 100 directly provides the background and the foreground of the image to be segmented.
As an example, in a monitoring application scenario, when the electronic device for image segmentation (the server 100 or the terminal 200) records accesses to a key area, it identifies the foreground, i.e., the human face, in each frame of the recorded access video by the image segmentation method, matches the identified face against the faces in a database to determine the identity of the person in the video, and records that person's access to the key area, thereby implementing the monitoring function. In a medical application scenario, for an input medical image, the electronic device for image segmentation (the server 100 or the terminal 200) uses the image segmentation method provided by the embodiments of the present invention to identify the foreground, i.e., a diseased region such as enteritis, and enlarges the identified region so that a doctor can make an accurate diagnosis.
The following describes a structure of an electronic device for image segmentation according to an embodiment of the present invention, where the electronic device for image segmentation may be various terminals, such as a mobile phone, a computer, and the like, and may also be a server 100 as shown in fig. 1.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 for image segmentation according to an embodiment of the present invention, and the electronic device 500 for image segmentation shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 for image segmentation are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components of the connection. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 530 includes one or more output devices 531 that enable presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532 including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display screen, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or non-volatile memory, or both. The non-volatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 550 described in the embodiments of the invention is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 may be capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the image segmentation apparatus provided by the embodiments of the present invention may be implemented by combining hardware and software. By way of example, it may be a processor in the form of a hardware decoding processor programmed to perform the image segmentation method provided by the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may employ one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
In other embodiments, the image segmentation apparatus provided by the embodiment of the present invention may be implemented in software, and fig. 2 shows an image segmentation apparatus 555 stored in a memory 550, which may be software in the form of programs and plug-ins, and includes a series of modules including an encoding module 5551, a decoding module 5552, a compensation module 5553, a segmentation module 5554, and a training module 5555; the encoding module 5551, the decoding module 5552, the compensation module 5553, and the segmentation module 5554 are configured to implement the image segmentation function provided in the embodiment of the present invention, and the training module 5555 is configured to train an image segmentation model, so that the trained image segmentation model implements the image segmentation function.
As can be understood from the foregoing, the image segmentation method provided by the embodiment of the present invention may be implemented by various types of electronic devices for image segmentation, such as an intelligent terminal, a server, and the like.
The following describes an image segmentation method provided by the embodiment of the present invention with reference to an exemplary application and implementation of the server provided by the embodiment of the present invention. Referring to fig. 3A, fig. 3A is a schematic flowchart of an image segmentation method according to an embodiment of the present invention, and the steps shown in fig. 3A are described in detail.
In step 101, down-sampling coding is performed on an image to be segmented including a foreground and a background, and up-sampling decoding is performed on the obtained down-sampling feature map to obtain an up-sampling feature map.
For example, a user may input an image to be segmented on an input interface of a terminal, after the input is completed, the terminal may forward the image to be segmented to a server, and after receiving the image to be segmented, the server may perform downsampling encoding on the image to be segmented, and perform upsampling decoding on an obtained downsampling feature map to obtain an upsampling feature map, so that image segmentation may be performed according to the upsampling feature map and the downsampling feature map in the subsequent process.
In some embodiments, the neural network model for image segmentation includes an encoding network and a decoding network. Performing down-sampling encoding on the image to be segmented including the foreground and the background comprises: performing down-sampling encoding on the image, through the encoding network, to obtain the down-sampled feature map. Performing up-sampling decoding on the obtained down-sampled feature map to obtain the up-sampled feature map comprises: performing up-sampling decoding, through the decoding network, on the down-sampled feature map output by the encoding network to obtain the up-sampled feature map.
As an example, referring to fig. 4, the neural network model for image segmentation is an image segmentation model, an image to be segmented including a foreground and a background is downsampled and encoded by an encoding network (encoding module 5551) in the image segmentation model to obtain a downsampled feature map, the downsampled feature map is input to a decoding network, and the downsampled feature map output by the encoding network is upsampled and decoded by the decoding network to obtain an upsampled feature map.
In some embodiments, the downsampled encoded coding network includes a plurality of concatenated coding layers; the method comprises the following steps of carrying out down-sampling coding on an image to be segmented comprising a foreground and a background through a coding network to obtain a down-sampling feature map, wherein the down-sampling feature map comprises the following steps: down-sampling and coding an image to be segmented comprising a foreground and a background through a first coding layer of a plurality of cascaded coding layers; outputting the coding result of the first coding layer to a subsequent cascaded coding layer so as to continue downsampling coding and outputting the coding result in the subsequent cascaded coding layer until the last coding layer; and taking the coding result output by the last coding layer as a downsampling feature map output by the coding network.
As an example, referring to fig. 5, fig. 5 shows the encoding network in fig. 4 comprising a plurality of cascaded encoding layers. The first encoding layer receives the image to be segmented, which includes a foreground and a background, and down-samples and encodes it to obtain the encoding result of the first encoding layer, i.e., a down-sampled feature map; this encoding result is output to the subsequent cascaded encoding layer, i.e., the second encoding layer, which continues the down-sampling encoding and encoding-result output, and so on until the last encoding layer; after the last encoding layer performs its down-sampling encoding, the encoding result it outputs, i.e., its down-sampled feature map, is used as the down-sampled feature map output by the encoding network.
For example, the 1st encoding layer of the plurality of cascaded encoding layers down-samples and encodes the image to be segmented to obtain and output the 1st down-sampled feature map as its encoding result, and the i-th encoding layer down-samples and encodes the (i-1)-th down-sampled feature map output by the (i-1)-th encoding layer to obtain and output the i-th down-sampled feature map as its encoding result, where 2 ≤ i ≤ I and I is the number of encoding layers in the encoding network. The I-th down-sampled feature map output by the I-th encoding layer is used as the down-sampled feature map, including the features of the foreground and the background, output by the encoding network.
The image to be segmented is thus encoded by multi-level down-sampling, yielding a corresponding series of down-sampled feature maps. For the down-sampling encoding of any one of the plurality of cascaded encoding layers, the following is performed: the j-th encoding layer performs convolution processing on the down-sampled feature map output by the (j-1)-th encoding layer to obtain a first feature map; multi-channel encoding is applied to the first feature map, and the resulting encoded feature maps are concatenated to obtain a second feature map; and the j-th down-sampled feature map, obtained by adding the first feature map and the second feature map, is output to the (j+1)-th encoding layer, where j is a natural number greater than 1 and smaller than I, and I is the number of encoding layers in the encoding network.
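As an illustration of the layer structure just described, the following PyTorch sketch models one cascaded encoding layer: a stride-2 convolution yields the first feature map, a depthwise (per-channel) convolution stands in for the multi-channel encoding whose channel outputs are concatenated, and the two are added. All channel counts, kernel sizes, and the stride-2 down-sampling factor are illustrative assumptions, not values fixed by the embodiment.

```python
# Hedged sketch of one cascaded encoding layer of the down-sampling encoder.
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # first feature map: the stride-2 convolution performs the down-sampling
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        # multi-channel encoding: a depthwise convolution encodes each channel
        # separately; stacking its outputs along the channel axis plays the
        # role of the concatenation described in the text
        self.multi = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, groups=out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first = self.conv(x)        # first feature map
        second = self.multi(first)  # second feature map (concatenated encodings)
        return first + second       # residual-style addition, output to next layer

encoder = nn.ModuleList([EncodingLayer(3, 32), EncodingLayer(32, 64)])
x = torch.randn(1, 3, 128, 128)
skips = []                          # encoding results kept for cross-layer links
for layer in encoder:
    x = layer(x)
    skips.append(x)
```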
In some embodiments, when the encoding network of the downsampling encoding comprises a plurality of concatenated encoding layers, the decoding network of the upsampling decoding comprises a plurality of concatenated decoding layers, and a cross-layer connection exists between the decoding layer and the encoding layer of the same layer; the method for up-sampling and decoding the down-sampling feature map output by the coding network through the decoding network to obtain the up-sampling feature map comprises the following steps: performing up-sampling decoding on the down-sampling feature map through a first decoding layer of a plurality of cascaded decoding layers; fusing the decoding result of the up-sampling decoding with the encoding result output by the encoding layer connected with the first decoding layer in a cross-layer manner, outputting the fused result to the decoding layer in subsequent cascade connection as the final decoding result of the first decoding layer, and continuing to perform the up-sampling decoding, the decoding result fusion and the final decoding result output in the decoding layer in subsequent cascade connection; and taking the final decoding result output by the last decoding layer as an up-sampling feature map comprising the features of the foreground and the background output by the decoding network.
As an example, referring to fig. 5, there is a cross-layer connection between the decoding layer and the encoding layer at the same depth, i.e., each encoding layer inputs its encoding result (a down-sampled feature map) to the decoding layer at the same depth. The first decoding layer in the decoding network performs up-sampling decoding on the down-sampled feature map output by the last encoding layer, fuses the decoding result with the encoding result (down-sampled feature map) output by the encoding layer cross-layer connected to it (the last encoding layer), and outputs the fused result, as the final decoding result of the first decoding layer, to the subsequent cascaded decoding layer. The subsequent cascaded decoding layers continue the up-sampling decoding, decoding-result fusion (each decoding layer fuses the decoding result of its up-sampling decoding with the encoding result output by the encoding layer cross-layer connected to it), and final-decoding-result output; the final decoding result output by the last decoding layer (the up-sampled feature map it outputs) is used as the up-sampled feature map, including the features of the foreground and the background, output by the decoding network.
The shallow layers of the encoding and decoding networks (the upper part of the U-shaped structure) carry the detail information of the image to be segmented, because their feature maps are relatively large. The deep layers (the lower part of the U) carry the low-frequency information of the image; their receptive fields are very large, which makes it easy to capture large contour information. Through the cross-layer connections, the information of each layer is retained, so the encoding and decoding networks can learn all the information of the image to be segmented well.
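The cross-layer fusion can be sketched in PyTorch as follows: each decoding layer up-samples its input and fuses it with the encoding result from the same depth. Element-wise addition as the fusion operation and a fixed 2x bilinear up-sampling are assumptions for illustration.

```python
# Hedged sketch of one decoding layer with a cross-layer (skip) connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodingLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        decoded = self.conv(up)   # up-sampling decoding result
        return decoded + skip     # fuse with the cross-layer encoding result

dec = DecodingLayer(64, 32)
x = torch.randn(1, 64, 32, 32)      # output of the deeper (previous) layer
skip = torch.randn(1, 32, 64, 64)   # encoding result at the same depth
y = dec(x, skip)                     # final decoding result, (1, 32, 64, 64)
```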
In step 102, the up-sampling feature map is classified to obtain the classification probability of each pixel in the up-sampling feature map corresponding to the foreground and the background.
After the server obtains the up-sampling feature map of the image to be segmented, the up-sampling feature map can be classified through a compensation network, so that the classification probability of the foreground and the background corresponding to each pixel in the up-sampling feature map is obtained, and the subsequent compensation processing is facilitated.
In some embodiments, a compensation network is also included in the neural network model for image segmentation; classifying the up-sampling feature map to obtain the classification probability of each pixel corresponding to the foreground and the background in the up-sampling feature map, including: the following processing is executed on the intermediate decoding result of the up-sampling decoding and/or the up-sampling feature map included in the final decoding result through the compensation network: performing convolution processing on the up-sampling feature map, and performing normalization processing on the up-sampling feature map after the convolution processing to obtain the probability that each pixel in the up-sampling feature map belongs to the foreground and the background; and determining the maximum value of the probability of each pixel belonging to the foreground and the probability of each pixel belonging to the background as the classification probability corresponding to each pixel.
As an example, referring to fig. 5, the image segmentation model further includes a compensation network, which may be embedded in the decoding network. For the up-sampled feature map included in the intermediate decoding results (decoding results output by decoding layers other than the last) and/or the final decoding result (output by the last decoding layer), the compensation network performs convolution processing on the up-sampled feature map and normalizes the result to obtain, for each pixel, the probability that it belongs to the foreground and the probability that it belongs to the background, and determines the maximum of the two as that pixel's classification probability. That is, the classification probability of each pixel is obtained through a logistic regression (softmax) function; for example, if a pixel's probability of belonging to the foreground is 0.7 and its probability of belonging to the background is 0.3, the maximum (0.7) is the pixel's classification probability.
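A minimal sketch of this classification step is given below, assuming a 1x1 convolution as the classifier (the text only specifies "convolution processing", so the kernel size is an assumption).

```python
# Hedged sketch: convolution -> softmax -> per-pixel maximum probability.
import torch
import torch.nn as nn
import torch.nn.functional as F

def classification_probability(feat: torch.Tensor, classifier: nn.Conv2d) -> torch.Tensor:
    logits = classifier(feat)         # (B, 2, H, W): foreground/background scores
    probs = F.softmax(logits, dim=1)  # normalized per pixel across the two classes
    cls_prob, _ = probs.max(dim=1)    # (B, H, W): maximum of the two probabilities
    return cls_prob

classifier = nn.Conv2d(64, 2, kernel_size=1)   # channel count is an assumption
feat = torch.randn(1, 64, 128, 128)            # an up-sampled feature map
p = classification_probability(feat, classifier)
```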
In step 103, the error of the classification probability of the foreground and the background corresponding to each pixel is compensated to obtain a compensation feature map of the up-sampling feature map.
In the process of identifying the background and the foreground in the image, identification errors are likely to occur, for example, pixel points belonging to the foreground are erroneously identified as pixel points belonging to the background, or pixel points belonging to the background are erroneously identified as pixel points belonging to the foreground. In order to avoid the problem of false identification, the image segmentation model can compensate the error of the classification probability of the foreground and the background corresponding to each pixel, so that the image segmentation model can effectively compensate the error and pay attention to the classification information of all the pixels.
In some embodiments, when the decoding network of the upsampling decoding includes a plurality of cascaded decoding layers, performing compensation processing on an error of a classification probability of a foreground and a background corresponding to each pixel to obtain a compensation feature map of the upsampling feature map, including: performing the following processing on an upsampled feature map included in an upsampled decoded output of at least one decoding layer through a compensation network: the following processing is performed for each pixel in the up-sampled feature map: determining a difference value between the classification error threshold value and the classification probability of the foreground and the background corresponding to the pixel as a compensation value of the pixel; and combining the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
As an example, referring to fig. 5, the image segmentation model further includes a compensation network, which may be embedded in the decoding layers of the decoding network, either in all decoding layers or only in non-adjacent decoding layers, for example in alternate decoding layers. Through the compensation network, the following processing is performed on each pixel in the up-sampled feature map included in the up-sampling decoding output of at least one decoding layer: the difference between a classification error threshold (for example, 1) and the classification probability of the foreground and the background corresponding to the pixel is determined as the pixel's compensation value, and the compensation values of all pixels in the up-sampled feature map are combined to obtain the compensation feature map of the up-sampled feature map.
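In this simplest form, with a classification error threshold of 1 as in the example above, the compensation map is just the threshold minus the per-pixel classification probability, so uncertain pixels receive larger compensation. A one-function NumPy sketch:

```python
# Hedged sketch of the basic compensation map described above.
import numpy as np

def compensation_map(cls_prob: np.ndarray, error_threshold: float = 1.0) -> np.ndarray:
    # per-pixel compensation values, combined into the compensation feature map
    return error_threshold - cls_prob
```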
When the decoding network comprises only one decoding layer, only the up-sampled feature map included in the decoding result output by that layer (i.e., the final decoding result) is compensated. When the decoding network comprises a plurality of cascaded decoding layers, the decoding layers are of two types: base decoding layers and compensation decoding layers. A base decoding layer performs up-sampling decoding; a compensation decoding layer is a base decoding layer with an embedded compensation network. There may be one or more compensation decoding layers; in particular, compensation decoding layers and base decoding layers may be alternately cascaded, and in some embodiments all decoding layers in the decoding network may be compensation decoding layers. Each compensation decoding layer performs the following processing to complete the up-sampling decoding, compensation, and fusion operations: the base decoding layer within the compensation decoding layer performs up-sampling decoding on the decoding result (up-sampled feature map) output by the previous decoding layer in the decoding path of the decoding network, obtaining an up-sampled feature map that includes the features of the foreground and the background; the compensation network within the compensation decoding layer compensates the error of the classification probability of the foreground and the background corresponding to each pixel in the up-sampled feature map, obtaining the compensation feature map of the up-sampled feature map; and the compensation network within the compensation decoding layer fuses the compensation feature map, the up-sampled feature map, and the down-sampled feature map output by the down-sampling encoding of the encoding layer cross-layer connected to the compensation decoding layer, and outputs the resulting fused feature map to the next decoding layer as the up-sampling decoding result of the compensation decoding layer.
As an example, the base decoding layer (both an independent base decoding layer and the base decoding layer within a compensation decoding layer) performs the following processing to complete the up-sampling decoding operation: it performs up-sampling decoding on the up-sampled feature map output by the previous decoding layer in the up-sampling decoding path, obtaining an up-sampled feature map that includes the features of the foreground and the background; performs convolution processing on the down-sampled feature map output by the encoding layer cross-layer connected to it, obtaining a transition feature map; and adds the up-sampled feature map and the transition feature map, outputting the resulting feature map to the next cascaded decoding layer as its final decoding result.
In some embodiments, when the decoding network for the upsampling decoding includes a plurality of cascaded decoding layers, performing compensation processing on an error of a classification probability of a foreground and a background corresponding to each pixel to obtain a compensation feature map of the upsampling feature map, including: performing the following processing on an upsampled feature map included in an upsampled decoded output of at least one decoding layer by a compensation network: when the classification probability of any pixel in the up-sampling feature map is smaller than a classification threshold, determining the difference value between the classification error threshold and the classification probability of the pixel as a compensation value of the pixel; determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold; and carrying out combined processing on the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
As an example, the following processing is performed by the compensation network on the up-sampled feature map included in the intermediate decoding results (decoding results output by decoding layers other than the last) and/or the final decoding result (output by the last decoding layer): when the classification probability of a pixel in the up-sampled feature map is smaller than the classification threshold, the pixel's classification probability is low, i.e., the pixel is easily misidentified and its class cannot be determined (it may be an edge pixel), so the difference between the classification error threshold and the pixel's classification probability is determined as the pixel's compensation value in order to compensate it; otherwise, the pixel's classification probability is relatively high, i.e., the pixel is not easily misidentified and its class is clear, so its classification probability is directly determined as its compensation value. The compensation values of the pixels in the up-sampled feature map are then combined to obtain the compensation feature map. For example, if the classification threshold is 0.8 and the classification error threshold is 1, a pixel with classification probability 0.7 receives the compensation value 1 - 0.7 = 0.3, while a pixel with classification probability 0.9 keeps 0.9 as its compensation value.
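This thresholded variant can be written as a single vectorized selection; the threshold values below are the illustrative ones from the example above.

```python
# Hedged sketch of the thresholded compensation rule.
import numpy as np

def thresholded_compensation(cls_prob, cls_threshold=0.8, error_threshold=1.0):
    cls_prob = np.asarray(cls_prob)
    return np.where(cls_prob < cls_threshold,
                    error_threshold - cls_prob,  # uncertain pixel: compensate the error
                    cls_prob)                    # confident pixel: keep its probability

print(thresholded_compensation([0.7, 0.9]))  # [0.3 0.9]
```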
In step 104, the compensation feature map, the up-sampling feature map, and the down-sampling feature map are fused to obtain a fused feature map.
To retain more of the information in the image to be segmented, the compensation feature map, the up-sampling feature map, and the down-sampling feature map may be fused to obtain a fusion feature map in which that information is preserved.
In some embodiments, fusing the compensation feature map, the up-sampling feature map, and the down-sampling feature map to obtain a fusion feature map includes: performing the following processing on the feature of each pixel in the down-sampling feature map: multiplying the feature of the pixel by the compensation value of the corresponding pixel in the compensation feature map to obtain the compensation feature of the pixel; adding the compensation feature of the pixel to the feature of the pixel at the same position in the up-sampling feature map to obtain the fusion feature of the pixel; and combining the fusion features of the pixels to obtain the fusion feature map of the image to be segmented.
As an example, referring to fig. 5, the image segmentation model further includes a compensation network. The compensation network multiplies the feature of each pixel in the down-sampling feature map by the compensation value of the corresponding pixel in the compensation feature map to obtain the compensation feature of the pixel, adds the compensation feature of the pixel to the feature of the pixel at the same position in the up-sampling feature map to obtain the fusion feature of the pixel, and finally combines the fusion features of all pixels to obtain the fusion feature map of the image to be segmented. For example, if the feature of a pixel in the down-sampling feature map is A, the compensation value of the corresponding pixel in the compensation feature map is 0.2, and the feature of the pixel at the same position in the up-sampling feature map is B, then the compensation feature of the pixel is 0.2A and its fusion feature is 0.2A + B.
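A sketch of this fusion step, assuming the feature maps are arrays of shape (C, H, W) and a compensation map of shape (H, W) broadcast across channels:

    import numpy as np

    def fuse(down_feat, comp_map, up_feat):
        comp_feat = down_feat * comp_map[None, :, :]  # compensation feature per pixel
        return comp_feat + up_feat                    # fusion feature map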
In step 105, the foreground and the background are identified from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map.
After the server obtains the fusion feature map containing the detail information of the image to be segmented, the foreground and the background can be identified from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map, so that the background in the image to be segmented can be replaced in the subsequent process.
In some embodiments, identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map includes: performing the following processing for each pixel in the fused feature map: when the classification probability of the foreground corresponding to the pixel is greater than a foreground probability threshold, determining the pixel as a foreground pixel; when the classification probability of the background corresponding to the pixel is greater than a background probability threshold, determining the pixel as a background pixel; respectively performing connected-component processing on the foreground pixel set formed by the foreground pixels and the background pixel set formed by the background pixels to obtain a plurality of connected domains in the image to be segmented; and determining the largest connected domain among the plurality of connected domains as the foreground of the image to be segmented, and determining the region outside the foreground as the background of the image to be segmented.
As an example, after the server obtains the fused feature map, the fused feature map may be classified through the decoding network to determine the classification probabilities of the foreground and the background corresponding to each pixel; when the classification probability of the foreground corresponding to a pixel is greater than the foreground probability threshold, the pixel is determined as a foreground pixel, and when the classification probability of the background corresponding to the pixel is greater than the background probability threshold, the pixel is determined as a background pixel. The foreground pixels are combined into a foreground pixel set and the background pixels into a background pixel set; connected-component processing is performed on each set to obtain a plurality of connected domains in the image to be segmented, the largest connected domain is determined as the foreground of the image to be segmented, and the region outside the foreground is determined as its background, thereby identifying the foreground and the background in the image to be segmented.
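The foreground-extraction step could be sketched as follows, using scipy.ndimage for the connected-domain processing (the probability threshold is an illustrative assumption):

    import numpy as np
    from scipy import ndimage

    def largest_foreground(fg_prob, fg_threshold=0.5):
        mask = fg_prob > fg_threshold                 # per-pixel foreground decision
        labels, n = ndimage.label(mask)               # connected-domain labelling
        if n == 0:
            return np.zeros_like(mask, dtype=bool)
        sizes = ndimage.sum(mask, labels, index=range(1, n + 1))  # area per domain
        return labels == (1 + int(np.argmax(sizes)))  # largest domain = foreground

Everything outside the returned mask is then treated as the background of the image to be segmented.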
In some embodiments, referring to fig. 3B, which is based on fig. 3A and is a schematic flowchart of an image segmentation method provided in an embodiment of the present invention: in step 106, an image sample to be segmented is down-sampling encoded by the image segmentation model, and the obtained down-sampling feature map is up-sampling decoded to obtain an up-sampling feature map; in step 107, the up-sampling feature map is classified to obtain the classification probabilities of the foreground and the background corresponding to each pixel in the up-sampling feature map, the errors of these classification probabilities are compensated to obtain a compensation feature map of the up-sampling feature map, and the compensation feature map, the up-sampling feature map, and the down-sampling feature map are fused to obtain a fusion feature map; in step 108, each pixel in the fusion feature map is classified to obtain the pixels belonging to the foreground and the pixels belonging to the background in the image sample to be segmented, and edge identification is performed on each pixel in the fusion feature map to obtain the pixels belonging to edges in the image sample to be segmented; in step 109, a loss function of the image segmentation model is constructed based on the foreground pixel set formed by the pixels belonging to the foreground, the background pixel set formed by the pixels belonging to the background, the edge pixel set formed by the pixels belonging to edges, the segmentation label of the image sample to be segmented, and the edge label of the image sample to be segmented; in step 110, the parameters of the image segmentation model are updated until the loss function converges, and the parameters obtained when the loss function converges are used as the parameters of the trained image segmentation model.
Steps 106-110 (training) have no fixed execution order relative to steps 101-105 (inference). When the server determines the value of the loss function of the image segmentation model based on the foreground pixel set, the background pixel set, the edge pixel set, the segmentation label of the image sample to be segmented, and the edge label of the image sample to be segmented, it can judge whether the value of the loss function exceeds a preset threshold; when it does, an error signal of the image segmentation model is determined based on the loss function, the error information is back-propagated through the image segmentation model, and the model parameters of each layer are updated during the propagation.
Back propagation works as follows: training sample data is input to the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs a result; this is the forward propagation process of the neural network model. Because the output of the model differs from the actual result, the error between the output and the actual value is computed and propagated backwards from the output layer through the hidden layers until it reaches the input layer, and the values of the model parameters are adjusted according to the error during this back propagation. The process is iterated until convergence; the image segmentation model is such a neural network model.
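A generic sketch of this forward/backward cycle in PyTorch; the model, optimizer, data loader, and label names here are hypothetical placeholders, not the patent's training code:

    import torch.nn as nn

    def train_epoch(model, loader, optimizer):
        criterion = nn.CrossEntropyLoss()
        for images, seg_labels, edge_labels in loader:
            optimizer.zero_grad()
            seg_logits, edge_logits = model(images)         # forward propagation
            loss = (criterion(seg_logits, seg_labels)
                    + criterion(edge_logits, edge_labels))  # error vs. actual labels
            loss.backward()      # propagate the error back toward the input layer
            optimizer.step()     # adjust parameter values according to the error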
The image segmentation method provided by the embodiment of the present invention has been described with reference to the exemplary application and implementation of the server; the following continues with the scheme in which the modules of the image segmentation apparatus 555 provided by the embodiment of the present invention cooperate to implement image segmentation.
An encoding module 5551, configured to perform downsampling encoding on an image to be segmented, which includes a foreground and a background; a decoding module 5552, configured to perform upsampling decoding on the obtained downsampled feature map to obtain an upsampled feature map; the compensation module 5553 is configured to perform classification processing on the up-sampling feature map to obtain classification probabilities of foreground and background corresponding to each pixel in the up-sampling feature map; compensating the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map; performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map; a segmentation module 5554, configured to identify a foreground and a background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map.
In some embodiments, a neural network model for image segmentation includes an encoding network and a decoding network; the encoding module 5551 is further configured to perform downsampling encoding on an image to be segmented including a foreground and a background through the encoding network, so as to obtain the downsampling feature map; the decoding module 5552 is further configured to perform upsampling decoding on the downsampled feature map output by the coding network through the decoding network to obtain the upsampled feature map.
In some embodiments, the downsampled encoded coding network includes a plurality of concatenated coding layers; the encoding module 5551 is further configured to perform downsampling encoding on the image to be segmented including the foreground and the background through a first encoding layer of the plurality of concatenated encoding layers; outputting the coding result of the first coding layer to a subsequent cascaded coding layer, and continuing to perform downsampling coding and coding result output in the subsequent cascaded coding layer until the last coding layer; and taking the encoding result output by the last encoding layer as a downsampling characteristic diagram output by the encoding network.
In some embodiments, when a plurality of concatenated coding layers are included in the downsampled coded coding network, a plurality of concatenated decoding layers are included in the upsampled decoded decoding network, and a cross-layer connection exists between the decoding layers and the coding layers of the same layer; the decoding module 5552 is further configured to up-sample decode the down-sampled feature map by a first decoding layer of the plurality of concatenated decoding layers; merging the decoding result of the up-sampling decoding with the encoding result output by the encoding layer connected with the first decoding layer in a cross-layer manner, outputting the merged result to the decoding layer of the subsequent cascade connection as the final decoding result of the first decoding layer, and continuing to perform the up-sampling decoding, the decoding result merging and the final decoding result output in the decoding layer of the subsequent cascade connection; and taking the final decoding result output by the last decoding layer as an up-sampling feature map which is output by the decoding network and comprises the features of the foreground and the background.
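Combined, the two cascades could be sketched as below; the layer objects are assumed callables, and the exact topology (number of layers, channel sizes, where fusion happens) is an assumption rather than the patent's fixed design:

    def encode_decode(image, enc_layers, dec_layers):
        skips, x = [], image
        for enc in enc_layers:       # each coding layer feeds the next cascaded layer
            x = enc(x)
            skips.append(x)
        y = dec_layers[0](x)         # first decoding layer up-samples the final map
        for dec, skip in zip(dec_layers[1:], reversed(skips[:-1])):
            y = dec(y, skip)         # fuse with the cross-layer coding result
        return y                     # up-sampling feature map with foreground/background features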
In some embodiments, a compensation network is also included in the neural network model for image segmentation; the compensation module 5553 is further configured to perform the following processing on the upsampled feature map included in the intermediate decoding result and/or the final decoding result of the upsampling decoding through the compensation network: performing convolution processing on the up-sampling feature map, and performing normalization processing on the up-sampling feature map after the convolution processing to obtain the probability that each pixel in the up-sampling feature map belongs to the foreground and the probability that each pixel belongs to the background; and determining the maximum value of the probability of the pixel belonging to the foreground and the probability of the pixel belonging to the background as the classification probability corresponding to the pixel.
In some embodiments, when a plurality of cascaded decoding layers are included in the decoding network for upsampling decoding, the compensation module 5553 is further configured to perform the following processing on an upsampled feature map included in an upsampled decoded output of the at least one decoding layer through the compensation network: performing the following for each pixel in the up-sampled feature map: determining a difference value between a classification error threshold value and classification probabilities of the foreground and the background corresponding to the pixel as a compensation value of the pixel; and combining the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
In some embodiments, when the decoding network of the upsampling decoding comprises a plurality of cascaded decoding layers, the compensation module 5553 is further configured to perform the following processing on the upsampled feature map comprised by the upsampling decoding output of the at least one decoding layer through the compensation network: when the classification probability of any pixel in the up-sampling feature map is smaller than a classification threshold, determining the difference value between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel; determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold; and combining the compensation values of all pixels in the up-sampling characteristic diagram to obtain the compensation characteristic diagram of the up-sampling characteristic diagram.
In some embodiments, the compensation module 5553 is further configured to perform the following for the feature of each pixel in the downsampled feature map: multiplying the characteristic of the pixel with a compensation value corresponding to the pixel in the compensation characteristic diagram to obtain the compensation characteristic of the pixel; adding the compensation feature of the pixel and the feature of the pixel at the same position in the up-sampling feature map to obtain a fusion feature of the pixel; and combining the fusion characteristics of the pixels to obtain a fusion characteristic diagram of the image to be segmented.
In some embodiments, the segmentation module 5554 is further configured to perform the following for each pixel in the fused feature map: when the classification probability of the foreground corresponding to the pixel is greater than a foreground probability threshold, determining the pixel as a foreground pixel; when the classification probability of the background corresponding to the pixel is greater than a background probability threshold, determining the pixel as a background pixel; respectively performing connected-component processing on a foreground pixel set formed by the foreground pixels and a background pixel set formed by the background pixels to obtain a plurality of connected domains in the image to be segmented; and determining the largest connected domain among the plurality of connected domains as the foreground of the image to be segmented, and determining the region outside the foreground as the background of the image to be segmented.
In some embodiments, the image segmentation apparatus 555 further comprises: a training module 5555, configured to perform down-sampling encoding on an image sample to be segmented through an image segmentation model and up-sampling decoding on the obtained down-sampling feature map to obtain an up-sampling feature map; classify the up-sampling feature map to obtain the classification probabilities of the foreground and the background corresponding to each pixel in the up-sampling feature map; compensate the errors of these classification probabilities to obtain a compensation feature map of the up-sampling feature map; fuse the compensation feature map, the up-sampling feature map, and the down-sampling feature map to obtain a fusion feature map; classify each pixel in the fusion feature map to obtain the pixels belonging to the foreground and the pixels belonging to the background in the image sample to be segmented; perform edge identification on each pixel in the fusion feature map to obtain the pixels belonging to edges in the image sample to be segmented; construct a loss function of the image segmentation model based on the foreground pixel set formed by the pixels belonging to the foreground, the background pixel set formed by the pixels belonging to the background, the edge pixel set formed by the pixels belonging to edges, the segmentation label of the image sample to be segmented, and the edge label of the image sample to be segmented; and update the parameters of the image segmentation model until the loss function converges, taking the parameters obtained when the loss function converges as the parameters of the trained image segmentation model.
The following describes an image background replacement method provided by the embodiment of the present invention in conjunction with an exemplary application and implementation of the terminal provided by the embodiment of the present invention. Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device 600 for image background replacement according to an embodiment of the present invention, and the electronic device 600 for image background replacement shown in fig. 6 includes: at least one processor 610, memory 650, at least one network interface 620, and a user interface 630. The functions of the processor 610, the memory 650, the at least one network interface 620, and the user interface 630 are similar to the functions of the processor 510, the memory 550, the at least one network interface 520, and the user interface 530, respectively, that is, the functions of the output device 631 and the input device 632 are similar to the functions of the output device 531 and the input device 532, and the functions of the operating system 651, the network communication module 652, the display module 653, and the input processing module 654 are similar to the functions of the operating system 551, the network communication module 552, the display module 553, and the input processing module 554, respectively, which are not described in detail.
In other embodiments, the image background replacing device provided by the embodiment of the present invention may be implemented in a software manner, and fig. 6 shows the image background replacing device 655 stored in the memory 650, which may be software in the form of programs and plug-ins, etc., and includes a series of modules including a presenting module 6551, a recognition module 6552, and a replacing module 6553; the presenting module 6551, the identifying module 6552 and the replacing module 6553 are used for implementing the image background replacing method provided by the embodiment of the invention.
Having described the device structure, the following describes the image background replacement method provided by the embodiment of the present invention. Referring to fig. 7, fig. 7 is a flowchart of an image background replacement method according to an embodiment of the present invention, described with reference to the steps shown in fig. 7.
In step 201, an image to be segmented is presented.
For example, a user may input an image to be segmented on an input interface of a client, and when the input is completed, the client may present the image to be segmented and send the image to be segmented to a server, so that the server identifies a foreground and a background from the image to be segmented through an image segmentation model.
In step 202, in response to the replacement operation for the image to be segmented, a foreground and a background are identified from the image to be segmented by the image segmentation model.
The image segmentation model is used for compensating errors of classification probabilities of the foreground and the background so as to obtain the foreground and the background. For example, when the user clicks a "replace" button displayed by the client, the client acquires the foreground and the background of the image to be segmented from the server in response to a replace operation for the image to be segmented.
In step 203, the identified background is replaced by the target background, a new image after the background replacement is obtained, and the new image is presented in the client.
The target background may be a background specified by the user's replacement operation, or may be a background randomly selected by the client. After the client obtains the foreground and the background of the image to be segmented, the background of the image to be segmented can be replaced with the target background to obtain a new image, which is then presented in the client.
Having described the image background replacement method provided by the embodiment of the present invention, the following continues with the scheme in which the modules of the image background replacement apparatus 655 provided by the embodiment of the present invention cooperate to implement image background replacement.
A presenting module 6551, configured to present an image to be segmented; an identifying module 6552, configured to identify a foreground and a background from the image to be segmented through an image segmentation model in response to the replacement operation for the image to be segmented; and a replacing module 6553, configured to replace the identified background with a target background to obtain a new image after background replacement, and to present the new image in the client; wherein the image segmentation model is used for compensating the errors of the classification probabilities of the foreground and the background to obtain the foreground and the background.
Next, the image segmentation method provided by the embodiment of the present invention is further described with reference to a terminal (including a client) and a server. Fig. 8 is an optional flowchart of the image segmentation method provided by the embodiment of the present invention; referring to fig. 8, the method includes:
in step 301, the client sends an image to be segmented.
The user can input the image to be segmented on the input interface of the client, and after the input is finished, the client can present the image to be segmented and send the image to be segmented to the server.
In step 302, the server performs downsampling encoding on an image to be segmented including a foreground and a background, and performs upsampling decoding on the obtained downsampling feature map to obtain an upsampling feature map.
In step 303, the server performs classification processing on the up-sampled feature map to obtain classification probabilities of the foreground and the background corresponding to each pixel in the up-sampled feature map.
In step 304, the server performs compensation processing on the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map.
In step 305, the server performs a fusion process on the compensation feature map, the up-sampling feature map, and the down-sampling feature map to obtain a fused feature map.
In step 306, the server identifies the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map.
In step 307, the client obtains the foreground and the background of the image to be segmented from the server in response to the replacement operation for the image to be segmented.
For example, when a user clicks a "replace" button displayed by the client, the client acquires the foreground and the background of the image to be segmented from the server in response to a replace operation for the image to be segmented.
In step 308, the client replaces the identified background with the target background, and obtains a new image after replacing the background.
In step 309, the client renders the new image.
Embodiments of the present invention also provide a computer-readable storage medium storing executable instructions, which, when executed by a processor, cause the processor to execute an image segmentation method provided by an embodiment of the present invention, for example, the image segmentation method as shown in fig. 3A-3B, or an image background replacement method provided by an embodiment of the present invention, for example, the image background replacement method as shown in fig. 7.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device (including a smart terminal or a server), on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
The embodiment of the present invention may be applied to various image segmentation scenes, for example, video background replacement, that is, replacing the background other than the portrait in each frame of a video. As shown in fig. 1, a terminal 200 is connected to a server 100 deployed in the cloud through a network 300, and a video application installed on the terminal automatically synchronizes a real-time video to the server 100 through the network 300. The server 100 executes the image segmentation method provided in the embodiment of the present invention, processes each frame of the video (up-sampling, down-sampling, compensation, binary classification, and other processing), identifies the background and the portrait (foreground) of each frame, and feeds them back to the video application of the terminal 200. For example, the video application replaces the background of each image containing the portrait with a black background to form a new image in which only the portrait remains and all other parts are black. Therefore, during a video call, the user or the other party sees only the portrait while the other parts are covered, protecting the privacy of the user's surroundings.
In the related art, portrait background replacement may employ image segmentation to divide an image into a foreground (portrait) and a background, and on that basis replace the background with any background image the user desires. Depending on the portrait segmentation mode, portrait background replacement technologies fall into two categories: 1) classification-based segmentation, in which a classifier performs binary classification on each pixel of the image so that each pixel is definitely assigned to either the foreground class or the background class, and the region belonging to the background class is finally replaced with an arbitrary picture to achieve background replacement; 2) matting-based segmentation, in which each pixel of the image is considered a mixture of foreground and background in a certain proportion (the probability that the pixel belongs to the foreground or the background); with the aid of simple foreground/background annotation information (for example, a manually annotated foreground region), the foreground and background components of each pixel are estimated, each pixel is classified (foreground or background) according to those components, and image segmentation is thereby realized.
Although the related art can implement image segmentation, separating the foreground (portrait) and the background so that the background can be replaced, it obtains a feature map of the input picture through feature encoding and then labels each pixel through feature decoding. During feature encoding, the resolution of the feature map is gradually reduced, which easily causes the loss of detail information in the image, so mis-segmentation occurs easily; for example, a real portrait part may be segmented as background, or a real background part segmented as portrait.
To solve the above problem, an embodiment of the present invention provides an image segmentation method that uses a high-resolution feature map to effectively compensate the information in low-confidence regions, so that foreground persons in an image are screened out more accurately, more accurate foreground person identification is achieved, and background replacement can then be performed. For a video background replacement scene, frames are extracted from the video, image segmentation is performed on each frame, the foreground person and the background are identified in each frame, and the background in each frame is replaced with the preselected background, thereby realizing video background replacement. For example, fig. 9A is a schematic diagram of an arbitrary frame of a video provided by an embodiment of the present invention, in which the background other than the portrait needs to be replaced. The image segmentation method of the embodiment of the present invention segments fig. 9A and analyzes the foreground portrait region, shown as 901 in fig. 9B, and the edge region, shown as 902 in fig. 9C. Background replacement is then performed on the background identified by the segmentation processing, that is, a previously stored background image replaces the identified background, yielding the final background replacement result shown in fig. 9D, so that a background the user does not want shown during video is replaced. The specific image segmentation method is as follows:
as shown in fig. 10, fig. 10 is a schematic frame diagram of an image segmentation model according to an embodiment of the present invention; the embodiment realizes efficient portrait detection and segmentation through deep-learning-based portrait segmentation. In the training phase, the image segmentation model includes a feature coding module (Enc, i.e., the coding network) and a portrait segmentation module (Seg, i.e., the decoding network), configured as shown in fig. 10, where Conv k × k denotes a convolution with kernel size k × k. The feature coding module consists of densely connected depth-separable convolutional residual modules (DenseSeparableBlock, i.e., coding layers) that encode the image efficiently; the feature map output by each DenseSeparableBlock layer serves as the input of the next DenseSeparableBlock layer, and the feature maps output by different layers have different dimensions, so together they carry rich multi-scale information. The specific structure of the DenseSeparableBlock is shown in fig. 11. The portrait segmentation module gradually restores the portrait segmentation result from low scale to high scale using the feature maps output by the feature coding module; it comprises convolution modules (ConvBlock, i.e., basic decoding layers) and Edge Optimization modules (i.e., compensation decoding layers), which compensate the confidence of each pixel so that edge information can be acquired accurately in subsequent identification, realizing edge optimization. Fig. 12 shows the specific configuration of the edge optimization module: the input of Upsample is the output of the portrait segmentation module, and the input of Conv 1 × 1 is the output of the feature coding module. The edge optimization module first up-samples the low-resolution feature map output by the previous ConvBlock layer and obtains the confidence c of each pixel (i.e., the probability that the pixel belongs to the foreground/background) via a softmax function (the Conv 1 × 1 and normalization processing in fig. 12). To effectively compensate the low-confidence regions learned by the network with high-resolution features, an element-wise multiplication (Mul) is performed between the high-resolution feature (the feature map output by Conv 1 × 1 in fig. 12) and 1-c, and the product is added (Add) to the output of the up-sampling (Upsample), so that the features of high-confidence regions are retained. More accurate edge information between the foreground and the background can thus be obtained subsequently, and the foreground and the background are divided along this edge. For example, a chair in the input image belongs to the background, but the output probability c that the chair belongs to the foreground may be relatively high; to compensate the probability that the chair belongs to the background, the chair feature is multiplied (Mul) by 1-c, and the features of high-confidence regions (the original features where the probability c is reliable) are retained through the Add operation. The specific configuration of ConvBlock is shown in fig. 13, where the input of Upsample is the output of the portrait segmentation module and the input of Conv 3 × 3 is the output of the feature coding module. After the input image I passes through the framework shown in fig. 10, the pixel set of the portrait segmentation and the pixel set of the edge boundary are obtained; the calculation formula is shown as formula (1):
S,E=Seg(Enc(I)) (1)
Enc(I) denotes the output of the feature coding module, S denotes the pixel sets of the portrait segmentation (the background and foreground sets), and E denotes the pixel set of the boundary between foreground and background.
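Based on the description of fig. 12, the edge optimization module could be sketched in PyTorch as follows; channel counts, interpolation mode, and the two-channel foreground/background head are assumptions for illustration, not the patent's exact configuration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EdgeOptimization(nn.Module):
        def __init__(self, low_ch, high_ch):
            super().__init__()
            self.score = nn.Conv2d(low_ch, 2, kernel_size=1)       # Conv 1x1 -> fg/bg scores
            self.proj = nn.Conv2d(high_ch, low_ch, kernel_size=1)  # Conv 1x1 on encoder feature

        def forward(self, low, high):
            up = F.interpolate(low, size=high.shape[2:],
                               mode="bilinear", align_corners=False)  # Upsample
            prob = torch.softmax(self.score(up), dim=1)  # normalization processing
            c, _ = prob.max(dim=1, keepdim=True)         # confidence c of each pixel
            comp = self.proj(high) * (1.0 - c)           # Mul: compensate low-confidence regions
            return up + comp                             # Add: keep high-confidence features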
During training of the image segmentation model, the embodiment of the present invention combines the two tasks of portrait segmentation (the Conv 3 × 3 binary classification in fig. 10) and edge prediction (the Conv 3 × 3 edge in fig. 10), so that while the data scale is enlarged, the image segmentation model can simultaneously acquire the whole-body perception excitation provided by the portrait segmentation task and the local-detail perception excitation compensated by the edge optimization module, improving the performance of the image segmentation model. The training loss (Loss) of the image segmentation model is:
Loss = (1/N) Σ_{I∈HS} [CrossEntropy(S, S_gt) + CrossEntropy(E, E_gt)]    (2)
where CrossEntropy(·) denotes the cross-entropy loss function, HS denotes the portrait segmentation data set containing N training examples, S_gt denotes the ground-truth portrait segmentation label corresponding to image I, and E_gt denotes the ground-truth edge label corresponding to image I.
After the image segmentation model is trained, the embodiment of the present invention retains the feature coding module and the portrait segmentation module but removes the convolution operation used for the edge loss (the Conv 3 × 3 edge module) from the portrait segmentation module, to speed up the operation of the image segmentation model. As shown in fig. 14, which is an application schematic diagram of portrait background replacement provided by an embodiment of the present invention, the final background replacement specifically includes the following steps:
1) Inputting: as shown in fig. 14, an image I to be processed is input;
2) Portrait segmentation: the image I to be processed is input into the image segmentation model to obtain the segmentation set and the edge set, S, E = Seg(Enc(I));
3) Foreground extraction: connected-domain detection is performed on the segmentation result map to obtain a connected-domain set D = {D_1, ..., D_i, ..., D_n}, where D_i denotes the pixel set of the i-th connected region; the connected region with the largest area is selected as the foreground region F, as shown in formula (3);
F = max({D_i | sum(D_i), D_i ∈ D})    (3)
Here, sum(·) computes the area of a region, and max(·) selects the region with the maximum area. The background region set is B = U - F, where U is the set of all pixels in image I.
4) Background replacement: the foreground region is kept unchanged and the background region is replaced with a selected target background picture (for example, a black background image) to obtain the final background replacement result I′ (for example, the background in image I′ is black); the calculation formula is shown in formula (4):
I′=Replace(I)*B+I*F (4)
Here, Replace(·) denotes the operation of replacing the original picture I with any other picture, and the algebraic operation I*F denotes indexing (fetching) the elements of image I at the positions indicated by F; a code sketch of steps 3) and 4) follows the step list below.
5) Outputting: the background replacement image I′ is output.
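Formulas (3) and (4) together could be sketched as below, with NumPy arrays and a boolean foreground mask F obtained from the connected-domain step (a sketch under these assumptions, not the patent's implementation):

    import numpy as np

    def replace_background(image, fg_mask, target_bg):
        F_mask = fg_mask.astype(bool)        # foreground set F
        B_mask = ~F_mask                     # background set B = U - F
        # I' = Replace(I) * B + I * F, applied channel-wise
        return target_bg * B_mask[..., None] + image * F_mask[..., None]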
After the terminal obtains the background replacement image of each frame of the video, only the portrait remains in each image and all other parts are black. Therefore, during a video call, the user or the other party sees only the portrait part while the other parts are covered, protecting the privacy of the user's surroundings.
In conclusion, the embodiment of the present invention can effectively reduce mis-segmentation of non-human parts in image segmentation scenes, especially when the color of a human body part is close to that of surrounding objects. As shown in fig. 15A to 15C, fig. 15A is an input image without background replacement, containing a chair 1501 and a portrait; fig. 15B is a background replacement image produced by another background replacement method, in which the chair 1501 is erroneously recognized as part of the portrait; fig. 15C is the background replacement image obtained by the background replacement method of the embodiment of the present invention, in which the chair 1501 is correctly recognized as background, so the portrait is accurately identified. The embodiment of the present invention effectively retains local detail information, so the edges between human body regions and non-human-body regions are segmented accurately and precise background replacement is realized.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (13)

1. A method of image segmentation, the method comprising:
down-sampling coding is carried out on an image to be segmented comprising a foreground and a background, and up-sampling decoding is carried out on the obtained down-sampling feature map to obtain an up-sampling feature map;
classifying the up-sampling feature map to obtain classification probabilities of foreground and background corresponding to each pixel in the up-sampling feature map;
performing the following processing for each pixel in the up-sampled feature map: when the classification probability of the pixel is smaller than a classification threshold, determining a difference value between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel;
determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold;
combining the compensation values of all pixels in the up-sampling feature map to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
and identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map.
2. The method of claim 1,
the neural network model for image segmentation comprises an encoding network and a decoding network;
the down-sampling coding of the image to be segmented including the foreground and the background comprises the following steps:
carrying out down-sampling coding on the image to be segmented comprising the foreground and the background through the coding network to obtain a down-sampling feature map;
the up-sampling decoding is performed on the obtained down-sampling feature map to obtain an up-sampling feature map, and the method comprises the following steps:
and performing up-sampling decoding on the down-sampling feature map output by the coding network through the decoding network to obtain the up-sampling feature map.
3. The method of claim 2,
the coding network of the downsampling coding comprises a plurality of cascaded coding layers;
the down-sampling coding is performed on the image to be segmented including the foreground and the background through the coding network to obtain the down-sampling feature map, and the down-sampling feature map comprises:
performing down-sampling coding on the image to be segmented comprising the foreground and the background through a first coding layer of the plurality of cascaded coding layers;
outputting the coding result of the first coding layer to a subsequent cascaded coding layer, and continuing to perform downsampling coding and outputting the coding result in the subsequent cascaded coding layer until the last coding layer;
and taking the encoding result output by the last encoding layer as a downsampling characteristic diagram output by the encoding network.
4. The method of claim 2,
when the coding network of the downsampling coding comprises a plurality of cascaded coding layers, the decoding network of the upsampling decoding comprises a plurality of cascaded decoding layers, and cross-layer connection exists between the decoding layers and the coding layers of the same layer;
the up-sampling decoding is performed on the down-sampling feature map output by the coding network through the decoding network to obtain the up-sampling feature map, and the method comprises the following steps:
upsampling the downsampled feature map by a first decoding layer of the plurality of cascaded decoding layers;
fusing the decoding result of the up-sampling decoding with the encoding result output by the encoding layer connected with the first decoding layer in a cross-layer manner, and outputting the fused result to a subsequent cascaded decoding layer as the final decoding result of the first decoding layer, so as to continue performing up-sampling decoding, decoding result fusion, and final decoding result output in the subsequent cascaded decoding layers;
and taking the final decoding result output by the last decoding layer as an up-sampling feature map which is output by the decoding network and comprises the features of the foreground and the background.
5. The method of claim 1,
the neural network model for image segmentation also comprises a compensation network;
the classifying the up-sampling feature map to obtain the classification probability of each pixel in the up-sampling feature map corresponding to the foreground and the background comprises:
performing the following processing on the up-sampling feature map included in the intermediate decoding result and/or the final decoding result of the up-sampling decoding through the compensation network:
performing convolution processing on the up-sampling feature map, and performing normalization processing on the up-sampling feature map after the convolution processing to obtain the probability that each pixel in the up-sampling feature map belongs to the foreground and the probability that each pixel belongs to the background;
and determining the maximum value of the probability of the pixel belonging to the foreground and the probability of the pixel belonging to the background as the classification probability corresponding to the pixel.
6. The method of claim 1,
the fusing the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fused feature map includes:
performing the following processing for the features of each pixel in the downsampled feature map:
multiplying the characteristic of the pixel with a compensation value corresponding to the pixel in the compensation characteristic diagram to obtain the compensation characteristic of the pixel;
adding the compensation feature of the pixel and the feature of the pixel at the same position in the up-sampling feature map to obtain a fusion feature of the pixel;
and combining the fusion characteristics of the pixels to obtain a fusion characteristic image of the image to be segmented.
7. The method according to claim 1, wherein the identifying of the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fused feature map comprises:
performing the following processing for each pixel in the fused feature map:
when the classification probability of the corresponding foreground of the pixel is larger than a foreground probability threshold value, determining the pixel as a foreground pixel;
when the classification probability of the background corresponding to the pixel is larger than a background probability threshold value, determining the pixel as a background pixel;
respectively performing connected-component processing on a foreground pixel set formed by the foreground pixels and a background pixel set formed by the background pixels to obtain a plurality of connected domains in the image to be segmented;
determining the largest connected domain among the plurality of connected domains as the foreground of the image to be segmented, and determining the region outside the foreground in the image to be segmented as the background of the image to be segmented.
8. The method according to any one of claims 1-7, further comprising:
carrying out down-sampling coding on an image sample to be segmented through an image segmentation model, and carrying out up-sampling decoding on the obtained down-sampling feature map to obtain an up-sampling feature map;
classifying the up-sampling feature map to obtain classification probability of each pixel in the up-sampling feature map corresponding to a foreground and a background;
compensating the error of the classification probability of the foreground and the background corresponding to each pixel to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
classifying each pixel in the fusion feature map to obtain pixels belonging to the foreground and pixels belonging to the background in the image sample to be segmented;
performing edge identification processing on each pixel in the fusion characteristic graph to obtain pixels belonging to edges in the image sample to be segmented;
constructing a loss function of the image segmentation model based on a foreground pixel set formed by the pixels belonging to the foreground, a background pixel set formed by the pixels belonging to the background, an edge pixel set formed by the pixels belonging to the edge, a segmentation label of the image sample to be segmented and an edge label of the image sample to be segmented;
and updating the parameters of the image segmentation model until the loss function is converged, and taking the parameters obtained by updating when the loss function is converged as the parameters of the trained image segmentation model.
9. An image background replacement method, characterized in that the method comprises:
presenting an image to be segmented;
in response to the replacement operation for the image to be segmented, identifying a foreground and a background from the image to be segmented through an image segmentation model;
replacing the identified background with a target background to obtain a new image after background replacement, and
presenting the new image in the client;
wherein the image segmentation model is used for executing the following processing:
down-sampling coding is carried out on an image to be segmented comprising a foreground and a background, and up-sampling decoding is carried out on the obtained down-sampling feature map to obtain an up-sampling feature map;
classifying the up-sampling feature map to obtain classification probabilities of foreground and background corresponding to each pixel in the up-sampling feature map;
performing the following processing for each pixel in the up-sampled feature map: when the classification probability of the pixel is smaller than a classification threshold, determining the difference value between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel; determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold;
combining the compensation values of all pixels in the up-sampling feature map to obtain a compensation feature map of the up-sampling feature map;
performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
and identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map.
10. An image segmentation apparatus, characterized in that the apparatus comprises:
the encoding module is used for carrying out downsampling encoding on the image to be segmented comprising the foreground and the background;
the decoding module is used for carrying out up-sampling decoding on the obtained down-sampling feature map to obtain an up-sampling feature map;
the compensation module is used for classifying the up-sampling feature map to obtain the classification probability of each pixel in the up-sampling feature map corresponding to the foreground and the background; performing the following processing for each pixel in the up-sampled feature map: when the classification probability of the pixel is smaller than a classification threshold, determining the difference value between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel; determining the classification probability of the pixel as a compensation value of the pixel when the classification probability of the pixel is greater than or equal to a classification threshold;
combining the compensation values of all pixels in the up-sampling feature map to obtain a compensation feature map of the up-sampling feature map; performing fusion processing on the compensation feature map, the up-sampling feature map and the down-sampling feature map to obtain a fusion feature map;
and the segmentation module is used for identifying the foreground and the background from the image to be segmented based on the classification probability of the foreground and the background corresponding to each pixel in the fusion feature map.
11. An image background replacement apparatus, characterized in that the apparatus comprises:
a presentation module, configured to present an image to be segmented;
an identification module, configured to identify, in response to a replacement operation on the image to be segmented, a foreground and a background from the image to be segmented through an image segmentation model;
a replacement module, configured to replace the identified background with a target background to obtain a new background-replaced image, and to display the new image in a client;
wherein the image segmentation model is configured to perform the following processing:
performing down-sampling encoding on the image to be segmented comprising the foreground and the background, and performing up-sampling decoding on the obtained down-sampled feature map to obtain an up-sampled feature map;
classifying the up-sampled feature map to obtain classification probabilities of the foreground and the background corresponding to each pixel in the up-sampled feature map;
performing the following processing for each pixel in the up-sampled feature map: when the classification probability of the pixel is less than a classification threshold, determining the difference between a classification error threshold and the classification probability of the pixel as a compensation value of the pixel; and when the classification probability of the pixel is greater than or equal to the classification threshold, determining the classification probability of the pixel as the compensation value of the pixel;
combining the compensation values of all pixels in the up-sampled feature map to obtain a compensation feature map of the up-sampled feature map;
performing fusion processing on the compensation feature map, the up-sampled feature map, and the down-sampled feature map to obtain a fused feature map;
and identifying the foreground and the background from the image to be segmented based on the classification probabilities of the foreground and the background corresponding to each pixel in the fused feature map.
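The replacement step in claim 11 amounts to compositing the source image over the target background under the predicted foreground mask. A minimal sketch under assumed shapes follows; the 0.5 binarisation threshold is an assumption, as the claim only requires that the identified background be replaced:

```python
import numpy as np

def replace_background(image: np.ndarray,      # (H, W, 3) source image
                       fg_prob: np.ndarray,    # (H, W) foreground probability
                       target_bg: np.ndarray,  # (H, W, 3) target background
                       threshold: float = 0.5) -> np.ndarray:
    """Keep pixels classified as foreground, swap the rest for the
    target background."""
    mask = (fg_prob >= threshold)[..., None]   # (H, W, 1) boolean mask
    return np.where(mask, image, target_bg)
```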
12. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the image segmentation method of any one of claims 1 to 8 or the image background replacement method of claim 9 when executing the executable instructions stored in the memory.
13. A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the image segmentation method of any one of claims 1 to 8 or the image background replacement method of claim 9.

Priority Applications (1)

Application Number: CN202010150572.3A; Priority Date: 2020-03-06; Filing Date: 2020-03-06; Title: Image segmentation method, background replacement method, device, equipment and storage medium; Granted Publication: CN111369582B (en)

Publications (2)

Publication Number Publication Date
CN111369582A CN111369582A (en) 2020-07-03
CN111369582B (en) 2023-04-07

Family

ID=71210372

Family Applications (1)

Application Number: CN202010150572.3A (Active); Publication: CN111369582B (en)

Country Status (1)

Country: CN; Link: CN111369582B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768425B (en) * 2020-07-23 2021-08-10 腾讯科技(深圳)有限公司 Image processing method, device and equipment
CN112215797A (en) * 2020-09-11 2021-01-12 嗅元(北京)科技有限公司 MRI olfactory bulb volume detection method, computer device and computer readable storage medium
CN112598676B (en) * 2020-12-29 2022-11-22 北京市商汤科技开发有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112819848B (en) * 2021-02-04 2024-01-05 Oppo广东移动通信有限公司 Matting method, matting device and electronic equipment
CN112784832B (en) * 2021-02-09 2022-09-09 西南科技大学 Object mark point identification method and device
CN113205451B (en) * 2021-03-30 2024-05-24 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN113112508A (en) * 2021-03-30 2021-07-13 北京大米科技有限公司 Video processing method and device
CN113379691B (en) * 2021-05-31 2022-06-24 南方医科大学 Breast lesion deep learning segmentation method based on prior guidance
CN113518256B (en) * 2021-07-23 2023-08-08 腾讯科技(深圳)有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium
CN113628221B (en) * 2021-08-03 2024-06-21 Oppo广东移动通信有限公司 Image processing method, image segmentation model training method and related device
CN113965665A (en) * 2021-11-22 2022-01-21 上海掌门科技有限公司 Method and equipment for determining virtual live broadcast image
CN114416260B (en) * 2022-01-20 2024-06-04 北京字跳网络技术有限公司 Image processing method, device, electronic equipment and storage medium
TWI832340B (en) * 2022-07-19 2024-02-11 和碩聯合科技股份有限公司 Model training method and model training system
CN115223173A (en) * 2022-09-20 2022-10-21 深圳市志奋领科技有限公司 Object identification method and device, electronic equipment and storage medium
CN115829980B (en) * 2022-12-13 2023-07-25 深圳核韬科技有限公司 Image recognition method, device and equipment for fundus photo and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10043113B1 (en) * 2017-10-04 2018-08-07 StradVision, Inc. Method and device for generating feature maps by using feature upsampling networks
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN110263833A (en) * 2019-06-03 2019-09-20 韩慧慧 Based on coding-decoding structure image, semantic dividing method
CN110309849A (en) * 2019-05-10 2019-10-08 腾讯医疗健康(深圳)有限公司 Blood-vessel image processing method, device, equipment and storage medium
CN110633633A (en) * 2019-08-08 2019-12-31 北京工业大学 Remote sensing image road extraction method based on self-adaptive threshold
WO2020019612A1 (en) * 2018-07-24 2020-01-30 北京市商汤科技开发有限公司 Medical image processing method and device, electronic apparatus, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10043113B1 (en) * 2017-10-04 2018-08-07 StradVision, Inc. Method and device for generating feature maps by using feature upsampling networks
WO2020019612A1 (en) * 2018-07-24 2020-01-30 北京市商汤科技开发有限公司 Medical image processing method and device, electronic apparatus, and storage medium
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN110309849A (en) * 2019-05-10 2019-10-08 腾讯医疗健康(深圳)有限公司 Blood-vessel image processing method, device, equipment and storage medium
CN110263833A (en) * 2019-06-03 2019-09-20 韩慧慧 Based on coding-decoding structure image, semantic dividing method
CN110633633A (en) * 2019-08-08 2019-12-31 北京工业大学 Remote sensing image road extraction method based on self-adaptive threshold

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Kejuan Yue et al. Retinal vessel segmentation using dense U-net with multiscale inputs. Journal of Medical Imaging. 2019, abstract. *
Qian Zhang et al. 3D vision measurement for small devices based on consumer sensors. The Journal of Engineering. 2018, abstract. *
Ziping Gao et al. Generative Adversarial Networks for Road Crack Image Segmentation. 2019 International Joint Conference on Neural Networks. 2019, abstract. *
Hua Minjie. An overview of deep-learning-based image semantic segmentation algorithms. China Strategic Emerging Industries. 2018, 120. *
Song Tianlong. Research on semantic segmentation algorithms for satellite images. China Master's Theses Full-text Database, Information Science and Technology. 2019, I140-351. *

Similar Documents

Publication Title
CN111369582B (en) Image segmentation method, background replacement method, device, equipment and storage medium
US11127139B2 (en) Enhanced semantic segmentation of images
CN111080628B (en) Image tampering detection method, apparatus, computer device and storage medium
CN112396613B (en) Image segmentation method, device, computer equipment and storage medium
CN110837811A (en) Method, device and equipment for generating semantic segmentation network structure and storage medium
CN112330574A (en) Portrait restoration method and device, electronic equipment and computer storage medium
CN112767329A (en) Image processing method and device and electronic equipment
CN110827236B (en) Brain tissue layering method, device and computer equipment based on neural network
CN111524207A (en) Image generation method and device based on artificial intelligence and electronic equipment
CN112132106A (en) Image augmentation processing method, device and equipment based on artificial intelligence and storage medium
CN114529574A (en) Image matting method and device based on image segmentation, computer equipment and medium
CN116681630B (en) Image processing method, device, electronic equipment and storage medium
CN115239999B (en) Protein electron density map processing method, device, electronic equipment and storage medium
CN117576264A (en) Image generation method, device, equipment and medium
CN116452706A (en) Image generation method and device for presentation file
CN111507950B (en) Image segmentation method and device, electronic equipment and computer-readable storage medium
CN116980541A (en) Video editing method, device, electronic equipment and storage medium
CN111914850B (en) Picture feature extraction method, device, server and medium
WO2022226744A1 (en) Texture completion
CN114663937A (en) Model training and image processing method, medium, device and computing equipment
Lapuyade-Lahorgue et al. Unsupervised segmentation of new semi-Markov chains hidden with long dependence noise
CN112052863A (en) Image detection method and device, computer storage medium and electronic equipment
CN110688511A (en) Fine-grained image retrieval method and device, computer equipment and storage medium
CN111915701A (en) Button image generation method and device based on artificial intelligence
CN111524090A (en) Depth prediction image-based RGB-D significance detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40025809; Country of ref document: HK)

GR01 Patent grant