CN113139983A - Human image segmentation method and device based on RGBD - Google Patents

Info

Publication number
CN113139983A
Authority
CN
China
Prior art keywords
human body
region
segmentation
actual distance
portrait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110534736.7A
Other languages
Chinese (zh)
Inventor
李亚林
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN202110534736.7A
Publication of CN113139983A
Legal status: Pending

Classifications

    • G06T 7/194: Image analysis; Segmentation, edge detection involving foreground-background segmentation
    • G06T 7/11: Image analysis; Region-based segmentation
    • G06N 3/045: Neural networks; Architecture; Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 2207/20221: Image combination; Image fusion, image merging
    • G06T 2207/30196: Subject of image; Human being, person


Abstract

The invention discloses an RGBD-based human image segmentation method and device. The method comprises: acquiring a color image of a current portrait; feeding the color image into a deep learning segmentation model, trained on a deep neural network, to obtain a first human body foreground region; acquiring a second human body foreground region, which is derived from the depth map of the current portrait; and fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region. Because the first foreground region comes from the color image and the second from the depth map, fusing them is equivalent to combining a depth-map segmentation algorithm with a color-image segmentation algorithm, which improves the segmentation precision of the target human body segmentation region and avoids false detection.

Description

Human image segmentation method and device based on RGBD
Technical Field
The invention relates to the technical field of image processing, in particular to a portrait segmentation method and device based on RGBD.
Background
With the further intellectualization of devices such as mobile phones and televisions, the applications of such devices keep growing and real-time portrait segmentation technology is becoming more and more widespread; especially with the rise of deep learning (DL) in recent years, this line of research has attracted increasing attention.
At present, portrait segmentation technology can be divided into two categories by scene requirement: segmentation of selfie-style portraits captured with a mobile phone, and segmentation of the complete human body as a basis for human-computer interaction. Traditional approaches such as machine learning algorithms and logic modeling segment poorly and unstably, and false detection occurs easily.
Disclosure of Invention
In view of this, the invention provides an RGBD-based human image segmentation method and device to address the following problems: the current mainstream portrait segmentation algorithms are deep learning methods based on an RGB color map, whose results usually depend on the scale of the data and the complexity (floating-point computation) of the model, so the models that can actually be deployed on a mobile terminal tend to perform only moderately; and because the information carried by an RGB color map alone is limited, portrait segmentation based on it is prone to false detection. The specific scheme is as follows:
an RGBD-based portrait segmentation method comprises the following steps:
acquiring a color image of a current portrait;
transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training;
acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait;
and fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
Optionally, the method for obtaining the second human body foreground region includes:
acquiring a depth map of the current portrait;
and extracting the second human body foreground region based on the depth map.
Optionally, the method for obtaining a target human body segmentation region by fusing the first human body foreground region and the second human body foreground region includes:
detecting the actual distance between the current portrait and a lens;
determining a weight factor according to the actual distance;
and fusing the first human body segmentation region and the second human body segmentation region based on a preset formula C = α·A + (1 − α)·B to determine a target human body segmentation region, wherein C is the target human body segmentation region, B is the first human body segmentation region, A is the second human body segmentation region, and α is a weight factor.
The above method, optionally, determining a weighting factor according to the actual distance, includes:
judging whether the actual distance belongs to a preset extreme value interval, wherein the extreme value interval comprises: a first extremum interval and a second extremum interval;
if not, comparing the actual distance with a reference distance, and determining the weight factor according to a comparison result;
if so, the weighting factor is 0 under the condition that the actual distance is in the first extreme value interval, and the weighting factor is 1 under the condition that the actual distance belongs to the second extreme value interval.
In the method, optionally, the reference distance is 1.2 m.
The above method, optionally, further includes:
and carrying out filtering processing on the target human body segmentation region.
An RGBD-based portrait segmentation apparatus, comprising:
the color image acquisition module is used for acquiring a color image of the current human image;
the determining module is used for transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training;
the region acquisition module is used for acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait;
and the fusion module is used for fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
The above apparatus, optionally, the region acquiring module includes:
the acquisition unit is used for acquiring a depth map of the current portrait;
and the extracting unit is used for extracting the second human body foreground area based on the depth map.
The above apparatus, optionally, the fusion module includes:
the detection unit is used for detecting the actual distance between the current portrait and the lens;
the determining unit is used for determining a weight factor according to the actual distance;
and the fusion unit is used for fusing the first human body segmentation region and the second human body segmentation region based on a preset formula C = α·A + (1 − α)·B to determine a target human body segmentation region, wherein C is the target human body segmentation region, B is the first human body segmentation region, A is the second human body segmentation region, and α is a weight factor.
The above apparatus, optionally, the determining unit includes:
a judging subunit, configured to judge whether the actual distance belongs to a preset extremum interval, where the extremum interval includes: a first extremum interval and a second extremum interval;
the first determining subunit is used for comparing the actual distance with a reference distance if the actual distance does not belong to the extremum interval, and determining the weight factor according to the comparison result;
and the second determining subunit is configured to, if yes, set the weighting factor to 0 when the actual distance is in the first extremum interval, and set the weighting factor to 1 when the actual distance belongs to the second extremum interval.
Compared with the prior art, the invention has the following advantages:
the invention discloses a human image segmentation method and a human image segmentation device based on RGBD, wherein the method comprises the following steps: acquiring a color image of a current portrait; transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training; acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait; and fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region. In the process, the first human body foreground area is associated with the color image, the second human body foreground area is associated with the depth image, and the first human body foreground area and the depth image are fused, which is equivalent to combining the depth image segmentation algorithm and the color image segmentation algorithm, so that the segmentation precision of the target human body segmentation area is improved, and the occurrence of false detection is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a portrait segmentation method based on RGBD disclosed in an embodiment of the present application;
fig. 2 is another flowchart of a portrait segmentation method based on RGBD disclosed in the embodiment of the present application;
fig. 3 is a block diagram of a portrait segmentation apparatus based on RGBD according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses an RGBD-based portrait segmentation method and device for use in the portrait segmentation process. In the prior art, portrait segmentation can be performed with a deep learning method whose result usually depends on the scale of the data and the complexity of the model (floating-point operations, FLOPs). Although the effect is good, such models rely on GPU acceleration on a PC; if the model parameters and FLOPs are too large, the computing power of a mobile device cannot keep up and real-time operation cannot be achieved, while a model light enough to be deployed on a mobile terminal and run in real time usually performs poorly. Traditional approaches such as machine learning algorithms and logic modeling also segment poorly and unstably, and false detection occurs easily. In view of these problems, the present invention provides an RGBD-based portrait segmentation method, where RGBD means RGB + Depth Map. RGB refers to the three color channels red (R), green (G) and blue (B), which are varied and superimposed on one another to produce colors; RGB is one of the most widely used color systems at present. In 3D computer graphics, a depth map is an image or image channel containing information about the distance between the surfaces of scene objects and a viewpoint; it resembles a grayscale image except that each pixel value is the actual distance from the sensor to the object. Usually the RGB image and the depth image are registered, so their pixel points correspond one to one. The method combines deep learning with traditional algorithms. Its execution flow is shown in fig. 1 and comprises the following steps:
s101, acquiring a color image of a current portrait;
in the embodiment of the invention, the current portrait can be a face image of a close view or a human body image of a distant view, and a color map of the current portrait can be obtained through IMI or Kinect, preferably, the color map is an RGB color map.
S102, transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training;
in the embodiment of the invention, the deep learning segmentation model is obtained in advance based on deep neural network training, wherein the input of the deep learning segmentation model is a color map, the color map can be a public data set or a data set which is acquired and labeled by the color map, the color map comprises a label, the label is a Mask containing a human body area, the label is used as a standard when the loss of the model is calculated, the loss is minimized when the deep learning segmentation model is optimized, the label is equivalent to the standard, the optimized target is that the difference value between the output and the label is minimum, and the Mask is a Mask, namely a binary image, and is used for distinguishing a portrait area from a background.
The output of the deep learning segmentation model is a binary image of the same size as the input, representing a portrait mask; for example, the value 0 represents the background and the value 255 represents the portrait region. During training, the input data can include color images containing a human body plus a small number of pictures without a human body, used specifically to teach the background; whether such pictures are included depends on the situation. When the portrait detected by the model contains redundant background or false detections, a small number of background pictures can be used to correct it. After the model is trained, the color image is fed into it for segmentation to obtain the first human body foreground region, which is a binary image.
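As an illustration of the mask convention just described (0 for background, 255 for the portrait region), the following minimal sketch converts a per-pixel foreground probability map, standing in for the output of the trained segmentation network, into such a binary mask. The function name and the 0.5 threshold are assumptions for illustration, not values from the patent.

```python
import numpy as np

def probs_to_mask(prob_map, threshold=0.5):
    """Turn a per-pixel foreground probability map (H x W, values in [0, 1])
    into the binary portrait mask described in the text:
    0 = background, 255 = portrait region."""
    return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)

# Stand-in for a network's output on a tiny 4x4 image.
probs = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.95, 0.85, 0.2],
    [0.0, 0.1, 0.2, 0.1],
])
mask = probs_to_mask(probs)
print(mask[1, 1], mask[0, 0])  # 255 0
```

A real deployment would obtain `prob_map` from the lightweight network's forward pass; the thresholding step is the same.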
Further, a PortraitNet lightweight neural network can be adopted when training the deep learning segmentation model; this choice is not limited in the embodiment of the present invention, and other neural networks are also within the protection scope of the invention.
S103, acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait;
in the embodiment of the invention, a second human body foreground region is obtained, the second human body foreground region is also a binary image, the second human body foreground region can be directly obtained and can be directly output by utilizing Kinect or IMI and the like, and the human body foreground region can be obtained from an SDK (software development kit) carried in Kinect or IMI equipment. The other method is to use a preset algorithm to calculate the depth map, such as a method of blob segmentation and some logic modeling (mathematical modeling or logic screening), and some machine learning methods like random forest to extract the second human foreground region. For example, the blob segmentation adopted means a connected region with characteristics of similar color texture and the like, for a depth map containing a portrait, a segmentation method can be designed according to the shape texture of the portrait in the depth map and the relationship between pixel neighborhoods, and the depth map is binarized to obtain a portrait foreground region and a background region.
And S104, fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
In the embodiment of the invention, the first human body foreground region, segmented by the deep learning segmentation model trained on a lightweight neural network, has slightly poor completeness at the human body edges and torso, while the second human body foreground region, segmented from the depth map, has better edge and torso completeness but is prone to false and excess detections. The two are therefore fused so that the advantages of each make the segmentation result more accurate. This can be expressed by formula (1), where C is the target human body segmentation region, B the first human body segmentation region, A the second human body segmentation region, and α the weight factor.
C = α·A + (1 − α)·B (1)
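Applied to two 0/255 masks, formula (1) can be sketched as below. Re-binarising the weighted sum, with ties counted as foreground, is an assumption about how the fused result is turned back into a mask; the patent does not specify this step.

```python
import numpy as np

def fuse_masks(a, b, alpha):
    """Fuse depth-based mask A and colour-based mask B per formula (1):
    C = alpha * A + (1 - alpha) * B, then re-binarise so C is again a
    0/255 mask. Inputs are uint8 arrays with values 0 or 255."""
    c = alpha * a.astype(np.float32) + (1.0 - alpha) * b.astype(np.float32)
    return np.where(c >= 127.5, 255, 0).astype(np.uint8)
```

With α = 0 the result reduces to the colour-map mask B, and with α = 1 to the depth-map mask A, matching the extremum-interval behaviour described below; at α = 0.5 this particular tie rule yields the union of the two masks.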
Preferably, the depth map and the color map are first registered, aligned and adjusted to a common resolution. The actual distance between the current portrait and the lens is then detected, and it is judged whether that distance belongs to a preset extremum interval, where the extremum intervals comprise a first extremum interval and a second extremum interval. The first extremum interval indicates that the human body is very close to the lens, and the second that it is very far from the lens; the bounds of both intervals can be set from experience or the specific situation and are not specifically limited in the embodiment of the invention. The weight factor is 0 when the actual distance falls in the first extremum interval, and 1 when it falls in the second extremum interval.
When the actual distance lies in neither extremum interval, it is compared with a reference distance, which can be set from experience or the specific situation; preferably, the reference distance is 1.2 m. When the portrait is closer to the lens (e.g. < 1.2 m), α is smaller (its exact value depends on the situation), that is, the color-map output is trusted more: the diversity of the depth map is slightly poor at close range, so constructing an accurate target human body segmentation region from it when the human body is close to the camera is very difficult, almost impossible, and the color-map result is more reliable. When the portrait is farther from the lens (e.g. >= 1.2 m), α is larger (again depending on the situation), that is, the depth-map output is trusted more. Since depth-map segmentation can still produce false detections, the color-map segmentation result can be used to correct it; concretely, the area of the connected human body region in the color-map segmentation result can be counted to decide whether to keep the depth-map segmentation result.
During fusion, α is roughly linear in the distance: as the distance grows from zero, α gradually increases, and once the distance passes a certain value it gradually decreases again. The exact profile is chosen according to the relative quality of the first human body foreground region and the second human body foreground region.
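A distance-to-α mapping following the claimed piecewise rule (α = 0 in the near extremum interval, α = 1 in the far one, roughly linear in between and passing through the 1.2 m reference) could be sketched as follows. The interval bounds `near_cut` and `far_cut` and the 0.5 value at the reference distance are illustrative assumptions, since the patent leaves the intervals open; the fall-off of α at very long range described in the text is also omitted here for simplicity.

```python
def weight_factor(distance_m, near_cut=0.6, far_cut=2.5, ref=1.2):
    """Map the person-to-lens distance (metres) to the fusion weight alpha
    of formula (1). near_cut / far_cut stand in for the patent's
    unspecified first and second extremum intervals; ref is the 1.2 m
    reference distance from the text."""
    if distance_m <= near_cut:
        # first extremum interval: very close, trust the colour map entirely
        return 0.0
    if distance_m >= far_cut:
        # second extremum interval: very far, trust the depth map entirely
        return 1.0
    if distance_m < ref:
        # below the reference: interpolate from 0 at near_cut to 0.5 at ref
        return 0.5 * (distance_m - near_cut) / (ref - near_cut)
    # at or above the reference: interpolate from 0.5 at ref to 1 at far_cut
    return 0.5 + 0.5 * (distance_m - ref) / (far_cut - ref)
```

The returned α would feed directly into the fusion of formula (1).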
The invention discloses an RGBD-based portrait segmentation method comprising: acquiring a color image of a current portrait; feeding the color image into a deep learning segmentation model, trained on a deep neural network, to obtain a first human body foreground region; acquiring a second human body foreground region derived from the depth map of the current portrait; and fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region. Because the first foreground region comes from the color image and the second from the depth map, fusing them is equivalent to combining a depth-map segmentation algorithm with a color-image segmentation algorithm, which improves the segmentation precision of the target human body segmentation region and avoids false detection.
In the embodiment of the invention, the edges of the segmented target human body region carry many burrs, so the human body edges are refined with morphology and filtering algorithms. The specific post-processing algorithm, and the optimal fusion scheme, can be selected according to the accuracy of the color-image deep learning model and of the depth-map-based model; the models can be designed in different ways for different accuracy targets to achieve the best effect. The color-image deep learning model takes an RGB color image as input together with its weight and bias parameters; the input data undergo a series of linear and nonlinear operations with these parameters to yield a binary output that marks the foreground and background regions.
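One possible form of the morphology/filtering post-processing is a 3x3 binary majority (median) filter, which removes isolated burr pixels and fills pinholes in the fused mask. The kernel size and iteration count are illustrative assumptions; the patent only says that morphology and filtering are used.

```python
import numpy as np

def smooth_mask(mask, iterations=1):
    """De-burr a fused 0/255 mask with a 3x3 binary majority (median)
    filter: each pixel becomes foreground iff at least 5 of the 9 pixels
    in its neighbourhood are foreground (edges are padded by replication)."""
    m = (mask > 0).astype(np.uint8)
    h, w = m.shape
    for _ in range(iterations):
        p = np.pad(m, 1, mode="edge")
        # stack the 3x3 neighbourhood of every pixel and take a majority vote
        neigh = np.stack([p[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
        m = (neigh.sum(axis=0) >= 5).astype(np.uint8)
    return (m * 255).astype(np.uint8)
```

An equivalent effect could be obtained with a morphological opening followed by closing; the majority filter simply does both in one pass.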
In the embodiment of the invention, the segmentation result of the depth map and that of the color map are combined, so that their respective advantages are exploited and their defects avoided, and the algorithm reaches a more accurate result while still running in real time on a mobile terminal. The specific algorithm flow is shown in fig. 2.
At the start, the device is initialized, mainly the software part: it is checked whether the device is open and whether data can be acquired successfully, and if not, initialization is repeated. If so, first, a depth image and a color image are acquired with a device capable of capturing depth (such as Kinect or IMI). Second, a color-image-based neural network model (the deep learning segmentation model) is trained with a deep network on collected color data containing portraits (a public data set or a labeled data set acquired with the device). Third, the captured RGB color image is preprocessed and fed into the model for inference to obtain human body segmentation region B (the first human body foreground region), and the inferred result is stored. Fourth, human body segmentation region A (the second human body foreground region) is extracted from the depth map (the human foreground output directly by Kinect, IMI and the like can be used as-is, or a custom algorithm such as blob segmentation with logical judgment can extract the foreground region). Finally, the results of the third and fourth steps are fused to obtain the final human body segmentation region C (the target human body segmentation region), and the processing ends.
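The flow above can be sketched end to end as follows. Every stage here is a deliberately minimal stand-in (a thresholded probability map for the colour branch, an in-range depth threshold for the depth branch, a three-level α), so all names and constants are assumptions for illustration rather than the patent's implementation.

```python
import numpy as np

def segment_portrait(color_probs, depth, distance_m):
    """End-to-end sketch of the flow of fig. 2. color_probs plays the role
    of the colour model's per-pixel foreground probabilities; depth is the
    registered depth map in millimetres."""
    # colour branch -> mask B (first human body foreground region)
    b = np.where(color_probs >= 0.5, 255, 0).astype(np.uint8)
    # depth branch -> mask A (second human body foreground region)
    a = np.where((depth >= 500) & (depth <= 3000), 255, 0).astype(np.uint8)
    # distance-dependent weight, then fusion per formula (1)
    alpha = 0.0 if distance_m < 0.6 else (1.0 if distance_m > 2.5 else 0.5)
    c = alpha * a.astype(np.float32) + (1.0 - alpha) * b.astype(np.float32)
    return np.where(c >= 127.5, 255, 0).astype(np.uint8)
```

At close range the output reduces to the colour-branch mask and at long range to the depth-branch mask, mirroring the extremum-interval rule; a post-processing filter would then run on the returned mask.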
In the embodiment of the invention, the portrait segmentation algorithms currently deployed on mobile terminals are mainly deep learning algorithms: simple to deploy, but sacrificing segmentation precision to stay real-time given the computing power of mobile devices. The other family, conventional machine learning or mathematical-logic modeling, suffers many false detections, and when the human body performs complicated special actions it often misses detections. Fusing the two algorithms therefore draws on the strengths of each while avoiding their weaknesses, improving the segmentation effect with the amount of computation unchanged. Furthermore, both selfie-style portrait segmentation and complete-body segmentation at long range can be realized, raising the precision of the whole segmentation algorithm.
Based on the above RGBD-based human image segmentation method, an embodiment of the present invention further provides an RGBD-based human image segmentation apparatus, whose structural block diagram is shown in fig. 3. The apparatus includes:
the system comprises a color map acquisition module 201, a determination module 202, an area acquisition module 203 and a fusion module 204.
Wherein,
the color image obtaining module 201 is configured to obtain a color image of a current person image;
the determining module 202 is configured to transmit the color image to a deep learning segmentation model to obtain a first human foreground region, where the deep learning segmentation model is obtained based on deep neural network training;
the region obtaining module 203 is configured to obtain a second human body foreground region, where the second human body foreground region is associated with the depth map of the current portrait;
the fusion module 204 is configured to fuse the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
The invention discloses an RGBD-based portrait segmentation device that: acquires a color image of a current portrait; feeds the color image into a deep learning segmentation model, trained on a deep neural network, to obtain a first human body foreground region; acquires a second human body foreground region derived from the depth map of the current portrait; and fuses the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region. Because the first foreground region comes from the color image and the second from the depth map, fusing them is equivalent to combining a depth-map segmentation algorithm with a color-image segmentation algorithm, which improves the segmentation precision of the target human body segmentation region and avoids false detection.
In this embodiment of the present invention, the region obtaining module 203 includes:
an acquisition unit 205 and an extraction unit 206.
Wherein,
the acquiring unit 205 is configured to acquire a depth map of the current portrait;
the extracting unit 206 is configured to extract the second human body foreground region based on the depth map.
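The extraction step can be illustrated by a simple depth threshold. The patent does not specify how the second foreground region is computed from the depth map, so the millimeter range used here is purely an assumed example:

```python
import numpy as np

def extract_depth_foreground(depth_map, min_depth=500, max_depth=2500):
    """Extract a binary human-foreground mask from a depth map by
    keeping pixels whose depth (in millimeters) falls inside a
    plausible person range; zero depth (invalid pixels) is excluded."""
    valid = depth_map > 0
    mask = valid & (depth_map >= min_depth) & (depth_map <= max_depth)
    return mask.astype(np.uint8)

# Tiny example: a 2x3 depth map in millimeters.
depth = np.array([[0, 800, 1200],
                  [3000, 1500, 400]], dtype=np.uint16)
fg = extract_depth_foreground(depth)
# fg == [[0, 1, 1], [0, 1, 0]]
```

A real extractor would typically add connected-component filtering around the detected person, but the thresholding above captures the idea of deriving the second region from depth alone.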
In this embodiment of the present invention, the fusion module 204 includes:
a detection unit 207, a determination unit 208, and a fusion unit 209.
Wherein,
the detection unit 207 is configured to detect an actual distance between the current portrait and a lens;
the determining unit 208 is configured to determine a weighting factor according to the actual distance;
the fusion unit 209 is configured to fuse the first human body segmentation region and the second human body segmentation region based on a preset formula C = αA + (1-α)B, and determine a target human body segmentation region, where C is the target human body segmentation region, B is the first human body segmentation region, A is the second human body segmentation region, and α is the weighting factor.
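The preset formula C = αA + (1-α)B is a straightforward weighted blend of the two masks. A minimal sketch follows; re-binarizing the blended result at 0.5 so that C is again a segmentation mask is an assumption on my part, not stated in the disclosure:

```python
import numpy as np

def fuse_regions(A, B, alpha):
    """Fuse the depth-based mask A and the model-based mask B with
    C = alpha * A + (1 - alpha) * B, then threshold at 0.5 so the
    result is again a binary segmentation mask."""
    C = alpha * A.astype(np.float32) + (1.0 - alpha) * B.astype(np.float32)
    return (C >= 0.5).astype(np.uint8)

A = np.array([[1, 1, 0]], dtype=np.uint8)  # second (depth-based) region
B = np.array([[1, 0, 0]], dtype=np.uint8)  # first (model-based) region
fuse_regions(A, B, 0.0)  # alpha = 0 -> exactly B
fuse_regions(A, B, 1.0)  # alpha = 1 -> exactly A
```

The two endpoint calls show why the extremum intervals below set α to exactly 0 or 1: at those values the fusion degenerates to trusting one source alone.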
In this embodiment of the present invention, the determining unit 208 includes:
a judgment subunit 210, a first determination subunit 211, and a second determination subunit 212.
Wherein,
the determining subunit 210 is configured to determine whether the actual distance belongs to a preset extremum interval, where the extremum interval includes: a first extremum interval and a second extremum interval;
the first determining subunit 211 is configured to, if not, compare the actual distance with a reference distance, and determine the weighting factor according to the comparison result;
the second determining subunit 212 is configured to, if yes, set the weighting factor to 0 when the actual distance is in the first extremum interval, and set the weighting factor to 1 when the actual distance belongs to the second extremum interval.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An RGBD-based portrait segmentation method is characterized by comprising the following steps:
acquiring a color image of a current portrait;
transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training;
acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait;
and fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
2. The method of claim 1, wherein obtaining a second human foreground region comprises:
acquiring a depth map of the current portrait;
and extracting the second human body foreground region based on the depth map.
3. The method according to claim 1, wherein fusing the first human foreground region and the second human foreground region to obtain a target human segmented region comprises:
detecting the actual distance between the current portrait and a lens;
determining a weighting factor according to the actual distance;
and fusing the first human body segmentation region and the second human body segmentation region based on a preset formula C = αA + (1-α)B to determine a target human body segmentation region, wherein C is the target human body segmentation region, B is the first human body segmentation region, A is the second human body segmentation region, and α is a weighting factor.
4. The method of claim 3, wherein determining a weighting factor as a function of the actual distance comprises:
judging whether the actual distance belongs to a preset extreme value interval, wherein the extreme value interval comprises: a first extremum interval and a second extremum interval;
if not, comparing the actual distance with a reference distance, and determining the weight factor according to a comparison result;
if so, the weighting factor is 0 under the condition that the actual distance is in the first extreme value interval, and the weighting factor is 1 under the condition that the actual distance belongs to the second extreme value interval.
5. The method of claim 4, wherein the reference distance is 1.2 m.
6. The method of claim 1, further comprising:
and carrying out filtering processing on the target human body segmentation region.
7. An RGBD-based portrait segmentation apparatus, comprising:
the color image acquisition module is used for acquiring a color image of the current human image;
the determining module is used for transmitting the color image to a deep learning segmentation model to obtain a first human body foreground area, wherein the deep learning segmentation model is obtained based on deep neural network training;
the region acquisition module is used for acquiring a second human body foreground region, wherein the second human body foreground region is associated with the depth map of the current portrait;
and the fusion module is used for fusing the first human body foreground region and the second human body foreground region to obtain a target human body segmentation region.
8. The apparatus of claim 7, wherein the region acquisition module comprises:
the acquisition unit is used for acquiring a depth map of the current portrait;
and the extracting unit is used for extracting the second human body foreground area based on the depth map.
9. The apparatus of claim 7, wherein the fusion module comprises:
the detection unit is used for detecting the actual distance between the current portrait and the lens;
the determining unit is used for determining a weight factor according to the actual distance;
and the fusion unit is used for fusing the first human body segmentation region and the second human body segmentation region based on a preset formula C = αA + (1-α)B to determine a target human body segmentation region, wherein C is the target human body segmentation region, B is the first human body segmentation region, A is the second human body segmentation region, and α is a weighting factor.
10. The apparatus of claim 9, wherein the determining unit comprises:
a judging subunit, configured to judge whether the actual distance belongs to a preset extremum interval, where the extremum interval includes: a first extremum interval and a second extremum interval;
the first determining subunit is used for comparing the actual distance with a reference distance if the actual distance does not belong to the extremum interval, and determining the weighting factor according to the comparison result;
and the second determining subunit is configured to, if yes, set the weighting factor to 0 when the actual distance is in the first extremum interval, and set the weighting factor to 1 when the actual distance belongs to the second extremum interval.
CN202110534736.7A 2021-05-17 2021-05-17 Human image segmentation method and device based on RGBD Pending CN113139983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110534736.7A CN113139983A (en) 2021-05-17 2021-05-17 Human image segmentation method and device based on RGBD


Publications (1)

Publication Number Publication Date
CN113139983A true CN113139983A (en) 2021-07-20

Family

ID=76817151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110534736.7A Pending CN113139983A (en) 2021-05-17 2021-05-17 Human image segmentation method and device based on RGBD

Country Status (1)

Country Link
CN (1) CN113139983A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
US9965865B1 (en) * 2017-03-29 2018-05-08 Amazon Technologies, Inc. Image data segmentation using depth data
CN109377499A (en) * 2018-09-12 2019-02-22 中山大学 A kind of Pixel-level method for segmenting objects and device
CN110136144A (en) * 2019-05-15 2019-08-16 北京华捷艾米科技有限公司 A kind of image partition method, device and terminal device
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device
US10679046B1 (en) * 2016-11-29 2020-06-09 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Machine learning systems and methods of estimating body shape from images
WO2020139105A1 (en) * 2018-12-26 2020-07-02 Публичное Акционерное Общество "Сбербанк России" Method and system for predictively avoiding a collision between a manipulator and a person
CN111652884A (en) * 2020-04-24 2020-09-11 深圳奥比中光科技有限公司 Human image segmentation method and system based on 3D camera
CN111798456A (en) * 2020-05-26 2020-10-20 苏宁云计算有限公司 Instance segmentation model training method and device and instance segmentation method
CN111968165A (en) * 2020-08-19 2020-11-20 北京拙河科技有限公司 Dynamic human body three-dimensional model completion method, device, equipment and medium
CN112330709A (en) * 2020-10-29 2021-02-05 奥比中光科技集团股份有限公司 Foreground image extraction method and device, readable storage medium and terminal equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAN KRISTANTO WIBISONO et al.: "RGBD image segmentation using deep edge", 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) *
WANG Haoyu: "Research on 3D face recognition fusing surface shape and texture features", China Masters' Theses Full-text Database, Information Science and Technology *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination