CN112802081B - Depth detection method and device, electronic equipment and storage medium - Google Patents

Depth detection method and device, electronic equipment and storage medium

Info

Publication number
CN112802081B
Authority
CN
China
Prior art keywords
depth
target object
target
image
pixel point
Prior art date
Legal status
Active
Application number
CN202110105980.1A
Other languages
Chinese (zh)
Other versions
CN112802081A (en)
Inventor
李雷
李健华
张家旺
史璇珂
王权
钱晨
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202110105980.1A
Publication of CN112802081A
Priority to PCT/CN2021/103067 (WO2022160587A1)
Application granted
Publication of CN112802081B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/006 - Mixed reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a depth detection method, a depth detection device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a binocular image of a target scene acquired by a binocular camera; generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image; performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image; and generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.

Description

Depth detection method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of computer vision, in particular to a depth detection method, a depth detection device, electronic equipment and a storage medium.
Background
Computer vision applications often involve depth information identification. For example, in application scenes such as augmented reality (AR) interaction, virtual photographing and short video special effects, depth information identification needs to be performed on a target object to achieve a more realistic picture effect.
In general, depth information identification for a target object may be completed by a monocular camera or a Time of Flight (TOF) camera. For example, depth information between the target object and the monocular camera may be estimated from an image collected by the monocular camera using a pre-trained depth detection network; alternatively, the depth image of the target object may be obtained by a TOF camera from the time interval between emitting and receiving a light pulse and the propagation speed of the light pulse.
When depth information is identified from images acquired by a monocular camera, reliance on the pre-trained depth detection network leads to large errors; when depth information is determined by a TOF camera, drawbacks such as low resolution, poor performance in outdoor scenes and high hardware cost easily arise.
Disclosure of Invention
Embodiments of the present disclosure provide at least one depth detection scheme.
In a first aspect, an embodiment of the present disclosure provides a depth detection method, including:
Acquiring a binocular image of a target scene acquired by a binocular camera;
Generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
Performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image;
And generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
In the embodiment of the disclosure, compared with a monocular camera or a TOF camera, a depth image with higher accuracy and a color image aligned with the depth image can be obtained from the binocular image of the target scene acquired by the binocular camera; the semantic segmentation of the color image indicates the pixel points matched with the target object in the color image, so that the depth values of the pixel points of the target object can be accurately obtained from the depth image aligned with the color image.
In a possible implementation manner, the generating the target object depth map of the target object based on the semantic segmentation result of the color image and the depth image includes:
Adjusting the depth value of the pixel point, the depth value of which does not accord with the preset depth value range, in the depth image to be a target depth value;
Determining the position information of the pixel point of the target object in the depth image aligned with the color image based on the position information of the pixel point of the target object in the color image indicated by the semantic segmentation result of the color image;
And adjusting the depth values of the pixel points in the depth image except for the image area of the target object to be the target depth value based on the determined position information of the pixel points, so as to obtain a target object depth map of the target object.
In the embodiment of the disclosure, the depth values of the pixel points in the depth image that do not conform to the preset depth value range, and the depth values of the pixel points outside the image area where the target object is located, are adjusted to the target depth value, for example 0, so that the depth information of the pixel points other than the target object can be filtered out, and a target object depth map that intuitively represents the depth information of the target object is obtained.
In one possible implementation manner, after generating the target object depth map of the target object, the method further includes:
And carrying out depth value correction on the pixel points corresponding to the depth value equal to the target depth value in the target object depth map, and carrying out smoothing treatment on the corrected target object depth map to obtain the optimized target object depth map.
In the embodiment of the disclosure, it is proposed to correct the depth values of the pixel points whose depth values equal the target depth value in the image area where the target object is located in the target object depth map, complementing those depth values, and then to smooth the corrected target object depth map, so that the accuracy and integrity of the obtained target object depth map can be improved.
In a possible implementation manner, the performing depth value correction on the pixel point with the corresponding depth value equal to the target depth value in the target object depth map includes:
Traversing pixel points of an image area where the target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value;
Detecting whether the depth value of the first associated pixel point is equal to the target depth value or not under the condition that the depth value of the current pixel point is not equal to the target depth value; the first associated pixel point is a pixel point which is adjacent to the current pixel point and is in an image area where the target object is located;
and under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point into the depth value of the current pixel point.
In the embodiment of the disclosure, when the depth value of the current pixel point is not equal to the target depth value, the depth values of the adjacent pixel points in the image area where the target object is located can be complemented with the depth value of the current pixel point. In this way, the true depth values of the pixel points in the image area where the target object is located whose depth values equal the target depth value can be rapidly determined, so that a complete target object depth map is obtained.
In one possible implementation manner, the smoothing the modified target object depth map includes:
Determining the average depth value of any pixel point and a second associated pixel point of the image area where the target object is located in the corrected target object depth map; the second associated pixel points are pixel points which are spaced from any pixel point by a preset pixel point number range and are in an image area where the target object is located;
And adjusting the depth value of any pixel point to be the average depth value.
In the embodiment of the disclosure, the depth value of any pixel point is smoothed using the depth values of that pixel point and of the pixel points of the target object within a preset pixel point number range of it, so that a smoothed target object depth map can be obtained.
In one possible implementation manner, after generating the target object depth map of the target object, the depth detection method further includes:
Determining first position information of the target object in a real scene based on the position information of the target object in the color image, the target object depth map and pose data of the binocular camera;
determining second position information of a target virtual object in the real scene based on the position information of the target virtual object in a three-dimensional scene model matched with the real scene;
and determining an AR special effect when the AR equipment presents the target virtual object according to the pose data of the AR equipment in the real scene, the first position information and the second position information.
In the embodiment of the disclosure, after determining the depth map of the target object, the first position information of the target object in the world coordinate system may be further determined, so that in combination with the second position information of the target virtual object in the world coordinate system, the AR special effect when the target virtual object is presented in the AR device may be determined, and the sense of realism of the AR special effect is improved.
In a possible implementation manner, the performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image includes:
Classifying and identifying the pixel points in the color image according to a pre-trained target neural network to obtain the probability that the pixel points in the color image belong to a target object;
And obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
In the embodiment of the disclosure, each pixel point in the color image can be classified through a pre-trained target neural network, so that the pixel point belonging to the target object in the color image can be rapidly determined, and the depth value of each pixel point contained in the target object can be conveniently extracted from the depth image based on the pixel position information of the pixel point belonging to the target object in the color image.
In a second aspect, an embodiment of the present disclosure provides a depth detection apparatus, including:
The acquisition module is used for acquiring binocular images of the target scene acquired by the binocular camera;
a first generation module for generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
the segmentation module is used for carrying out semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with the target object;
and the second generation module is used for generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the depth detection method according to the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the depth detection method according to the first aspect.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below; these drawings are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure, and together with the description serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; other related drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
FIG. 1 illustrates a flow chart of a depth detection method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a method of determining a target object depth map provided by an embodiment of the present disclosure;
FIG. 3a illustrates a target object depth map provided by an embodiment of the present disclosure;
FIG. 3b illustrates a modified target object depth map provided by an embodiment of the present disclosure;
FIG. 4 illustrates a flow chart of a method for modifying a target object depth map provided by an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a method for smoothing a target object depth map according to an embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of a method for determining AR special effects provided by embodiments of the present disclosure;
FIG. 7 is a schematic diagram of a depth detection device according to an embodiment of the disclosure;
fig. 8 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" is used herein to describe only one relationship, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Depth detection of a target object, such as human body depth detection, has important applications in various scenes, such as AR interaction, virtual photographing and short video special effects. Existing human body depth detection technologies are generally based on a monocular camera or a Time of Flight (TOF) camera, and the errors of the human body depth information determined in these two ways are large.
Based on the above study, the disclosure provides a depth detection method: compared with a monocular camera or a TOF camera, a depth image with higher accuracy and a color image aligned with the depth image can be obtained from the binocular image of the target scene acquired by a binocular camera; semantic segmentation can then be performed on the color image to indicate the pixel points matched with the target object, so that the depth value of each pixel point of the target object can be accurately obtained from the depth image aligned with the color image.
For the sake of understanding the present embodiment, a depth detection method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the depth detection method provided in the embodiments of the present disclosure is generally a computer device having a certain computing capability, for example a terminal device, a server or another processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the depth detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a depth detection method according to an embodiment of the disclosure is shown, where the depth detection method includes the following steps S101 to S104:
S101, acquiring binocular images of a target scene acquired by a binocular camera.
For example, the target scene contains a target object to be subjected to depth extraction, and two images of the same target scene, such as a first image and a second image, can be acquired by a binocular camera, wherein the first image and the second image can be red, green and blue (RGB) images or gray scale images.
S102, generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image.
For example, after the binocular image is obtained, a depth image of the target scene may be determined from the parallax information between the two images; for instance, the depth image of the target scene and a color image aligned with the depth image may be obtained from the binocular image by a binocular stereo matching algorithm.
Illustratively, the depth image of the target scene includes a depth value corresponding to each pixel, where the depth value corresponding to each pixel can represent a distance between a location indicated by the pixel in the target scene and the binocular camera.
Illustratively, in the case where the binocular image is an RGB image, the color image is also an RGB image, and in the case where the binocular image is a gray image, the color image is also a gray image.
Illustratively, the depth image of the target scene and the color image aligned with the depth image have the same size, and pixels in the depth image and the color image having the same pixel coordinates can represent the same location point in the target scene, for example, a pixel having a pixel coordinate of (i, j) in an image coordinate system corresponding to the color image can represent the color feature of the location point a in the target scene, and a pixel having a pixel coordinate of (i, j) in an image coordinate system corresponding to the depth image aligned with the color image can represent the depth value of the location point a in the target scene, so that the target object depth image of the target object can be determined together based on the color image and the depth image.
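The disclosure does not name a specific binocular stereo matching algorithm. Purely as an illustrative sketch, assuming a rectified binocular pair and OpenCV's semi-global block matching, with hypothetical focal length and baseline values standing in for the real calibration of the binocular camera, a depth image aligned with the left view could be obtained as follows:

    import cv2
    import numpy as np

    FX = 700.0        # focal length in pixels (hypothetical calibration value)
    BASELINE = 0.06   # distance between the two cameras in meters (hypothetical)

    def binocular_depth(left_gray, right_gray):
        """Compute a depth image aligned with the left view from a rectified binocular pair."""
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)
        disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        depth = np.zeros_like(disparity)
        valid = disparity > 0
        depth[valid] = FX * BASELINE / disparity[valid]   # depth = f * B / disparity
        return depth   # same resolution as the left image, hence aligned with it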
S103, carrying out semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with the target object in the color image.
For example, the semantic segmentation may be performed on the color image according to a pre-trained semantic segmentation model, so as to obtain a semantic segmentation result of the color image, where the semantic segmentation result may be represented by a binary image aligned with the color image, and pixels in the color image that match the target object may be represented by a first label value in the binary image, such as by 1, and pixels in the color image that match an area outside the target object may be represented by a second label value in the binary image, such as by 0.
S104, generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
Illustratively, considering that the depth image and the color image are aligned, the depth values of the pixel points indicated by the pixel position information of the pixel points matched with the target object in the color image may be extracted from the depth image corresponding to the color image. In this way, after the depth value corresponding to each pixel point matched with the target object is obtained, a target object depth map of the target object may be generated, which can characterize the distance between the target object and the binocular camera in the target scene.
In the embodiment of the disclosure, compared with a monocular camera or a TOF camera, a depth image with higher accuracy and a color image aligned with the depth image can be obtained from the binocular image of the target scene acquired by the binocular camera; the semantic segmentation of the color image indicates the pixel points matched with the target object in the color image, so that the depth value of each pixel point of the target object can be accurately obtained from the depth image aligned with the color image.
In one embodiment, in S103, when the semantic segmentation is performed on the color image to obtain the semantic segmentation result of the color image, the method includes S1031 to S1032:
s1031, classifying and identifying pixel points in the color image to obtain the probability that the pixel points in the color image belong to the target object;
s1032, based on the probability that each pixel point in the color image belongs to the target object, the semantic segmentation result of the color image is obtained.
For example, the pixel points of the color image may be classified and identified by a pre-trained target neural network, which may be a semantic segmentation model used for semantic segmentation. Taking binary prediction on the color image as an example, i.e., distinguishing the pixel points belonging to the target object from the pixel points not belonging to the target object, the color image is input into the target neural network for class prediction, so as to obtain, for each pixel point in the color image, the probability of belonging to the target object and the probability of not belonging to the target object; the sum of these two probabilities is 1, so that the semantic segmentation result of the color image can be further determined.
For example, the semantic segmentation result of the color image can indicate the pixel points matched with the target object in the color image, for example, the label value of the pixel points belonging to the target object can be set as a first label value, the label value of the pixel points not belonging to the target object can be recorded as a second label value, and thus a binary image corresponding to the color image can be obtained, and the binary image can intuitively represent the pixel points belonging to the target object and the pixel points not belonging to the target object in the color image.
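As a small illustration of the description above (not the disclosure's own code), the per-pixel probabilities output by a segmentation network can be turned into the binary image described here; the 0.5 threshold and the label values 1 and 0 are assumptions for the sketch:

    import numpy as np

    def segmentation_mask(prob_target, threshold=0.5):
        """Convert per-pixel probabilities of belonging to the target object into a binary mask.

        prob_target: HxW array of probabilities from the target neural network.
        Returns an HxW uint8 mask: 1 = first label value (target object), 0 = second label value.
        """
        return (prob_target >= threshold).astype(np.uint8)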
In the embodiment of the disclosure, each pixel point in the color image can be classified through a pre-trained target neural network, so that the pixel point belonging to the target object in the color image can be rapidly determined, and the depth value of each pixel point contained in the target object can be conveniently extracted from the depth image based on the pixel position information of the pixel point belonging to the target object in the color image.
Specifically, when generating the target object depth map of the target object based on the semantic segmentation result of the color image and the depth image, including as shown in fig. 2, the following S201 to S203 may be included:
S201, adjusting the depth value of the pixel point, the depth value of which does not accord with the preset depth value range, in the depth image to be a target depth value.
For example, in the process of determining the depth image from the binocular image acquired by the binocular camera, the depth values of some pixel points may not be obtained due to distortion of the binocular camera or errors in the depth estimation, or the obtained depth values of some pixel points may exceed a reasonable range, for example exceed the maximum depth value that the binocular camera can acquire, or be negative. Such cases would affect the target object depth map to be determined later, so after the depth image of the target scene is obtained, the depth image may first be corrected.
For example, the maximum depth value and the minimum depth value of the preset depth value range may be determined according to the parameters of the binocular camera. After the preset depth value range is obtained, the depth values of the pixel points in the depth image whose depth values do not conform to the preset depth value range may be reset, for example set to a fixed target depth value such as 0; the depth image obtained in this manner contains some hole areas, that is, areas where the depth values of the pixel points are 0.
S202, determining the position information of the pixel point of the target object in the depth image aligned with the color image based on the position information of the pixel point of the target object in the color image indicated by the semantic segmentation result of the color image;
And S203, adjusting the depth values of the pixel points except the image area of the target object in the depth image to be target depth values based on the determined position information of the pixel points, so as to obtain a target object depth map of the target object.
Considering that the color image and the depth image are aligned, the pixel position information of the target object in the depth image can be determined from the pixel position information of the target object in the color image. The pixel points in the depth image can then be traversed, and whether the current pixel point belongs to the target object is determined according to the determined pixel position information: the depth value of the current pixel point is retained when it belongs to the target object, and is set to the target depth value, such as 0, when it does not.
In the embodiment of the disclosure, the depth values of the pixel points in the depth image that do not conform to the preset depth value range, and the depth values of the pixel points outside the image area where the target object is located, are adjusted to the target depth value, for example 0, so that the depth information of the pixel points other than the target object can be filtered out, and a target object depth map that intuitively represents the depth information of the target object is obtained.
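A minimal sketch of S201 to S203 in NumPy, assuming the aligned depth image and the binary segmentation mask are already available; the depth range limits and the target depth value of 0 are illustrative assumptions, not values fixed by the disclosure:

    import numpy as np

    def target_object_depth_map(depth, mask, d_min=0.1, d_max=10.0, target_value=0.0):
        """Keep depth values only where they are in range and belong to the target object.

        depth: HxW depth image aligned with the color image.
        mask:  HxW binary mask from semantic segmentation (1 = target object).
        """
        out = depth.copy()
        # S201: reset depth values outside the preset depth value range to the target depth value
        out[(out < d_min) | (out > d_max)] = target_value
        # S202/S203: reset depth values of pixel points outside the target object's image area
        out[mask == 0] = target_value
        return out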
For example, in the target object depth map obtained in the above manner, there are some pixels with depth values equal to the target depth value in the image area where the target object is located, and the area formed by these pixels may be referred to as a hole area, as shown in fig. 3a, so after the target object depth map of the target object is generated, the depth detection method provided in the embodiment of the present disclosure further includes:
and carrying out depth value correction on the pixel points corresponding to the depth value equal to the target depth value in the target object depth map, and carrying out smoothing treatment on the corrected target object depth map to obtain the optimized target object depth map.
For example, after the depth value correction is performed on the target object depth map having the hole area, a target object depth map as shown in fig. 3b may be obtained. In addition, some noise may exist in the target object depth map obtained after the depth value correction, the depth along the contour of the target object may not be smooth, or regions with sharply varying depth values may appear inside the target object, which affects the accuracy of the target object depth map; therefore, the corrected target object depth map may be smoothed to improve its accuracy.
In the embodiment of the disclosure, it is proposed to correct the depth values of the pixel points whose depth values equal the target depth value in the image area where the target object is located in the target object depth map, complementing those depth values, and then to smooth the corrected target object depth map, so that the accuracy and integrity of the obtained target object depth map can be improved.
Specifically, in the depth value correction of the pixel point having the corresponding depth value equal to the target depth value in the target object depth map, as shown in fig. 4, the following steps S301 to S303 are included:
S301, traversing pixel points of an image area where a target object is located in the target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to the target depth value.
Taking the target object depth map shown in fig. 3a as an example, the region formed by the pixel points whose depth values equal the target depth value within the image area where the target object is located is treated as a hole area. The depth values of the pixel points in the hole area and of the pixel points outside the area where the target object is located are both the target depth value, so the target object depth map is incomplete when representing the depth information of the target object, and the depth values of the pixel points in the hole area need to be corrected. Specifically, after the pixel points belonging to the target object in the target object depth map are determined, the pixel points in the image area where the target object is located are traversed, and it is determined whether the depth value of the traversed current pixel point equals the target depth value.
S302, detecting whether the depth value of the first associated pixel point is equal to the target depth value or not under the condition that the depth value of the current pixel point is not equal to the target depth value; the first associated pixel point is a pixel point adjacent to the current pixel point and in an image area where the target object is located.
For example, in the case that it is detected that the depth value of the current pixel point is not equal to the target depth value, it may be continuously detected whether the depth value of the pixel point adjacent to the current pixel point and in the image area where the target object is located is equal to the target depth value.
And S303, updating the depth value of the first associated pixel point to the depth value of the current pixel point under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value.
Illustratively, the depth values of the adjacent pixel points in the image area where the target object is located are complemented with the depth value of the current pixel point, so that the depth value of each pixel point in the hole area of the target object can be obtained, yielding the target object depth map shown in fig. 3b.
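The propagation described in S301 to S303 could be sketched as below, assuming NumPy arrays for the target object depth map and the segmentation mask; the traversal order and the repeat-until-filled loop are implementation choices, not requirements of the disclosure:

    import numpy as np

    def fill_holes(obj_depth, mask, target_value=0.0):
        """Propagate valid depth values into hole pixels inside the target object's image area."""
        depth = obj_depth.copy()
        h, w = depth.shape
        changed = True
        while changed:                       # repeat until no reachable hole pixel remains
            changed = False
            ys, xs = np.nonzero((mask == 1) & (depth != target_value))
            for y, x in zip(ys, xs):         # current pixel point with a valid depth value
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and mask[ny, nx] == 1
                            and depth[ny, nx] == target_value):
                        depth[ny, nx] = depth[y, x]   # first associated pixel takes the current value
                        changed = True
        return depth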
In the embodiment of the disclosure, when the depth value of the current pixel point is not equal to the target depth value, the depth values of the adjacent pixel points in the image area where the target object is located can be complemented with the depth value of the current pixel point. In this way, the true depth values of the pixel points in the image area where the target object is located whose depth values equal the target depth value can be rapidly determined, so that a complete target object depth map is obtained.
Specifically, when smoothing the corrected target object depth map, as shown in fig. 5, the following S401 to S402 may be included:
S401, determining the average depth value of any pixel point and a second associated pixel point of an image area where a target object is located in the corrected target object depth map; the second associated pixel points are pixel points which are spaced from any pixel point by a preset pixel point number range and are in an image area where the target object is located;
S402, adjusting the depth value of any pixel point to be the average depth value.
For example, for a pixel point with pixel coordinate (i, j) in the target object depth map, the average depth value of that pixel point and the other pixel points of the image area where the target object is located within a preset pixel point number range of it can be determined, and the determined average depth value is used as the updated depth value of the pixel point.
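A sketch of S401 and S402, under the assumption that the "preset pixel point number range" is a square window of radius r around each pixel point; only pixel points inside the target object's image area contribute to the average, and the radius is illustrative:

    import numpy as np

    def smooth_target_depth(obj_depth, mask, r=2):
        """Replace each target-object pixel's depth with the mean depth of nearby target-object pixels."""
        h, w = obj_depth.shape
        out = obj_depth.copy()
        ys, xs = np.nonzero(mask == 1)
        for y, x in zip(ys, xs):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            in_target = mask[y0:y1, x0:x1] == 1          # second associated pixel points
            out[y, x] = obj_depth[y0:y1, x0:x1][in_target].mean()
        return out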
In the embodiment of the disclosure, the depth value of any pixel point is smoothed using the depth values of that pixel point and of the pixel points of the target object within a preset pixel point number range of it, so that a smoothed target object depth map can be obtained.
In one implementation, after generating the target object depth map of the target object, as shown in fig. 6, the depth detection method provided by the embodiment of the disclosure further includes:
s501, determining first position information of a target object in a real scene based on position information of the target object in a color image, a target object depth map, and pose data of a binocular camera.
The pose data of the binocular camera specifically includes a position and a pose of the binocular camera in a real scene, for example, a world coordinate system may be constructed in advance based on the real scene, and the position and the pose of the binocular camera in the world coordinate system corresponding to the real scene may be determined by calibrating the binocular camera in advance.
A camera coordinate system is constructed based on the binocular camera. From the position information of the target object in the color image, the target object depth map and the pose data of the binocular camera in the world coordinate system, the first position information in the world coordinate system of each pixel point of the target object in the color image can be determined, thereby obtaining the first position information of the target object in the real scene.
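For illustration only, assuming a pinhole model with intrinsic matrix K and a camera pose (rotation R, translation t) mapping camera coordinates into the world coordinate system, the first position information of a single target-object pixel could be computed as follows (K, R and t are hypothetical calibration and pose inputs):

    import numpy as np

    def pixel_to_world(u, v, depth, K, R, t):
        """Back-project pixel (u, v) with the given depth value into world coordinates.

        K: 3x3 intrinsic matrix of the binocular camera (assumed pinhole model).
        R: 3x3 rotation and t: length-3 translation of the camera in the world coordinate system.
        """
        pt_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))   # 3D point in camera coordinates
        return R @ pt_cam + t                                         # 3D point in world coordinates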
S502, determining second position information of the target virtual object in the real scene based on the position information of the target virtual object in the three-dimensional scene model matched with the real scene.
For example, a three-dimensional scene model matched with the real scene may be constructed in advance; the three-dimensional scene model and the real scene are presented in the same coordinate system at a 1:1 scale. The display position of the target virtual object in the world coordinate system may be generated based on the three-dimensional scene model; for example, if the target virtual object is a virtual cartoon character, the second position information of the target virtual object in the world coordinate system corresponding to the real scene may be determined based on the position information of the target virtual object in the three-dimensional scene model.
S503, determining the AR special effect when the AR device presents the target virtual object according to the pose data, the first position information and the second position information of the AR device in the real scene.
For example, the AR device may or may not include the above-mentioned binocular camera, and in particular, the AR device may include, but is not limited to, devices with display functions and data processing capabilities such as AR glasses, tablet computers, smartphones, smart wearable devices, etc., in which applications for presenting AR scene content may be installed, and in which a user may experience AR special effects.
For example, the pose data of the AR device may include a position and a pose of the AR device in a world coordinate system corresponding to a real scene, may be determined based on a pose sensor mounted on the AR device, may be determined based on a real scene image captured by the AR device, and is not specifically limited herein.
For example, the AR special effect between the target virtual object and the target object when presented on the AR device, such as whether the target virtual object is blocked by the target object or whether the target virtual object blocks the target object, may be determined from the pose data of the AR device in the world coordinate system, the first position information of the target object in the world coordinate system and the second position information of the target virtual object in the world coordinate system.
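One possible way (not prescribed by the disclosure) to realize such an occlusion decision is a per-pixel depth comparison in the AR device's view, assuming both the target object and the target virtual object have already been projected into that view as depth maps:

    import numpy as np

    def virtual_object_visible(object_depth_ar, virtual_depth_ar, target_value=0.0):
        """Per-pixel visibility of the target virtual object in the AR view.

        Both inputs are depth maps as seen from the AR device; the virtual object is hidden
        wherever the real target object has a valid depth and is closer to the AR device.
        """
        object_valid = object_depth_ar != target_value
        occluded = object_valid & (object_depth_ar < virtual_depth_ar)
        return ~occluded   # True where the virtual object should be rendered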
In the embodiment of the disclosure, after determining the depth map of the target object, the first position information of the target object in the world coordinate system may be further determined, so that in combination with the second position information of the target virtual object in the world coordinate system, the AR special effect when the target virtual object is presented in the AR device may be determined, and the sense of realism of the AR special effect is improved.
Taking a target object as a human body as an example, the depth detection method provided by the disclosure is introduced, and specifically comprises the following steps:
(1) Obtaining a depth image of a target scene and an RGB image aligned with the depth image from an input binocular image through a binocular stereo matching algorithm;
(2) Performing portrait segmentation on the RGB image to obtain a portrait mask capable of representing a semantic segmentation result of the RGB image;
(3) Matching the depth image with the portrait mask to obtain a human body depth map;
(4) And carrying out cavity complement and smoothing treatment on the human body depth map to obtain a target human body depth map.
Wherein step (4) comprises: (4-1) traversing each pixel point in the human body depth map and, if its depth value is not 0, propagating that depth value to the adjacent pixel points whose depth value is 0, repeating this process until all hole areas with a depth value of 0 in the human body depth map are filled; and (4-2) smoothing the filled human body depth map.
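Putting the earlier sketches together, a hypothetical end-to-end flow for this human body example might read as follows; run_portrait_segmentation is a placeholder for the pre-trained segmentation network, and left_gray, right_gray and left_color stand for the input binocular images:

    # Hypothetical flow combining the sketches above (placeholder inputs and network call).
    depth = binocular_depth(left_gray, right_gray)        # (1) depth image from the binocular pair
    prob = run_portrait_segmentation(left_color)          # (2) per-pixel person probability
    mask = segmentation_mask(prob)                        #     portrait mask
    body_depth = target_object_depth_map(depth, mask)     # (3) human body depth map
    body_depth = fill_holes(body_depth, mask)             # (4-1) hole filling
    body_depth = smooth_target_depth(body_depth, mask)    # (4-2) smoothing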
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same technical concept, the embodiment of the disclosure further provides a depth detection device corresponding to the depth detection method, and since the principle of solving the problem by the device in the embodiment of the disclosure is similar to that of the depth detection method in the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 7, a schematic diagram of a depth detection apparatus 600 according to an embodiment of the disclosure is provided, where the depth detection apparatus 600 includes:
an acquisition module 601, configured to acquire a binocular image of a target scene acquired by a binocular camera;
A first generation module 602, configured to generate a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
the segmentation module 603 is configured to perform semantic segmentation on the color image to obtain a semantic segmentation result of the color image, where the semantic segmentation result is used to indicate a pixel point in the color image that matches with the target object;
the second generation module 604 is configured to generate a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
In one possible implementation, the second generating module 604, when configured to generate the target object depth map of the target object based on the semantic segmentation result of the color image and the depth image, includes:
Adjusting the depth value of the pixel point, the depth value of which does not accord with the preset depth value range, in the depth image to be a target depth value;
Determining the position information of the pixel point of the target object in the depth image aligned with the color image based on the position information of the pixel point of the target object in the color image indicated by the semantic segmentation result of the color image;
and adjusting the depth values of the pixels except the image area of the target object in the depth image to be target depth values based on the determined position information of the pixels, so as to obtain a target object depth map of the target object.
In one possible implementation, the second generating module 604 is further configured to, after generating the target object depth map of the target object:
and carrying out depth value correction on the pixel points corresponding to the depth value equal to the target depth value in the target object depth map, and carrying out smoothing treatment on the corrected target object depth map to obtain the optimized target object depth map.
In one possible implementation manner, the second generating module 604 when configured to perform depth value correction on a pixel point in the target object depth map, where the corresponding depth value is equal to the target depth value, includes:
traversing pixel points of an image area where a target object is located in a target object depth map, and detecting whether the depth value of the traversed current pixel point is equal to a target depth value;
Detecting whether the depth value of the first associated pixel point is equal to the target depth value or not under the condition that the depth value of the current pixel point is not equal to the target depth value; the first associated pixel point is a pixel point which is adjacent to the current pixel point and is in an image area where the target object is located;
And under the condition that the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point into the depth value of the current pixel point.
In one possible implementation manner, the second generating module 604, when configured to perform smoothing on the modified target object depth map, includes:
Determining the average depth value of any pixel point and a second associated pixel point of an image area where a target object is positioned in the corrected target object depth map; the second associated pixel points are pixel points which are spaced from any pixel point by a preset pixel point number range and are in an image area where the target object is located;
and adjusting the depth value of any pixel point to be an average depth value.
In one possible implementation, after generating the target object depth map of the target object, the second generating module 604 is further configured to:
Determining first position information of a target object in a real scene based on position information of the target object in a color image, a target object depth map and pose data of a binocular camera;
Determining second position information of the target virtual object in the real scene based on the position information of the target virtual object in the three-dimensional scene model matched with the real scene;
According to pose data, first position information and second position information of the AR equipment in the real scene, determining an AR special effect when the AR equipment presents the target virtual object.
In one possible implementation, the segmentation module 603, when configured to perform semantic segmentation on a color image to obtain a semantic segmentation result of the color image, includes:
Classifying and identifying the pixel points in the color image to obtain the probability that the pixel points in the color image belong to the target object;
and obtaining a semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Corresponding to the depth detection method in fig. 1, the embodiment of the present disclosure further provides an electronic device 700, as shown in fig. 8, which is a schematic structural diagram of the electronic device 700 provided in the embodiment of the present disclosure, including:
A processor 71, a memory 72 and a bus 73; the memory 72 is used to store execution instructions and includes an internal memory 721 and an external memory 722. The internal memory 721 temporarily stores operation data of the processor 71 and data exchanged with the external memory 722 such as a hard disk, and the processor 71 exchanges data with the external memory 722 through the internal memory 721. When the electronic device 700 runs, the processor 71 and the memory 72 communicate via the bus 73, so that the processor 71 executes the following instructions: acquiring a binocular image of a target scene acquired by a binocular camera; generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image; performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points matched with a target object in the color image; and generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the depth detection method in the method embodiments described above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform the steps of the depth detection method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present disclosure, and are not intended to limit the scope of the disclosure, but the present disclosure is not limited thereto, and those skilled in the art will appreciate that while the foregoing examples are described in detail, it is not limited to the disclosure: any person skilled in the art, within the technical scope of the disclosure of the present disclosure, may modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (7)

1. A depth detection method, comprising:
acquiring a binocular image of a target scene captured by a binocular camera;
generating a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
performing semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points in the color image that match a target object;
generating a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image;
performing depth value correction on pixel points in the target object depth map whose depth values are equal to a target depth value, and smoothing the corrected target object depth map to obtain an optimized target object depth map;
determining first position information of the target object in a real scene based on the position information of the target object in the color image, the target object depth map, and pose data of the binocular camera; determining second position information of a target virtual object in the real scene based on the position information of the target virtual object in a three-dimensional scene model matched with the real scene; and determining, according to pose data of an AR device in the real scene, the first position information, and the second position information, an AR special effect between the target virtual object and the target object when the AR device presents the target virtual object;
wherein the smoothing of the corrected target object depth map comprises:
determining, for any pixel point in the image area where the target object is located in the corrected target object depth map, the average depth value of that pixel point and its second associated pixel points, wherein the second associated pixel points are pixel points within a preset number of pixels of that pixel point and located in the image area where the target object is located;
and adjusting the depth value of that pixel point to the average depth value.
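The smoothing recited at the end of claim 1 is, in effect, a mean filter restricted to the segmented object region. Below is a minimal sketch of that step, assuming the target object depth map and the object mask are NumPy arrays of the same shape; the function name smooth_object_depth and the window radius are illustrative choices, not terms from the patent.

import numpy as np

def smooth_object_depth(depth, object_mask, radius=2):
    # For each pixel in the object region, average its depth with the depths of
    # neighbouring pixels (within `radius` pixels) that also lie in the object region.
    h, w = depth.shape
    out = depth.copy()
    ys, xs = np.nonzero(object_mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        window = depth[y0:y1, x0:x1]
        in_object = object_mask[y0:y1, x0:x1]
        out[y, x] = window[in_object].mean()
    return out

Reading from the original depth map and writing into a copy keeps each averaged value independent of the traversal order.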
2. The depth detection method according to claim 1, wherein generating the target object depth map of the target object based on the semantic segmentation result of the color image and the depth image comprises:
adjusting, to the target depth value, the depth values of pixel points in the depth image whose depth values do not fall within a preset depth value range;
determining position information of the pixel points of the target object in the depth image aligned with the color image, based on the position information of the pixel points of the target object in the color image indicated by the semantic segmentation result of the color image;
and adjusting, based on the determined position information, the depth values of pixel points in the depth image outside the image area of the target object to the target depth value, to obtain the target object depth map of the target object.
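As a rough illustration of claim 2, the sketch below clamps out-of-range depths to the target (invalid) value and then blanks out everything outside the segmented object region. The depth range, the value 0.0 used as the target depth value, and the function name are assumptions made for the example only.

import numpy as np

def build_object_depth_map(depth, object_mask, valid_range=(0.3, 5.0), target_value=0.0):
    # `depth` is the depth image aligned with the color image; `object_mask` marks the
    # pixels that the semantic segmentation associates with the target object.
    lo, hi = valid_range
    object_depth = depth.copy()
    # Step 1: depths outside the preset range are replaced by the target depth value.
    object_depth[(object_depth < lo) | (object_depth > hi)] = target_value
    # Step 2: depths outside the target object's image area are replaced as well,
    # leaving a depth map that only carries depth for the target object.
    object_depth[~object_mask] = target_value
    return object_depth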
3. The depth detection method according to claim 1, wherein performing depth value correction on pixel points in the target object depth map whose depth values are equal to the target depth value comprises:
traversing the pixel points of the image area where the target object is located in the target object depth map, and detecting whether the depth value of the currently traversed pixel point is equal to the target depth value;
if the depth value of the current pixel point is not equal to the target depth value, detecting whether the depth value of a first associated pixel point is equal to the target depth value, wherein the first associated pixel point is a pixel point adjacent to the current pixel point and located in the image area where the target object is located;
and if the depth value of the first associated pixel point is detected to be equal to the target depth value, updating the depth value of the first associated pixel point to the depth value of the current pixel point.
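Claim 3 describes a hole-filling pass that copies valid depths to adjacent invalid pixels inside the object region. One possible reading, sketched below with NumPy and a 4-neighbourhood standing in for the "first associated pixel points" (the claim does not fix the neighbourhood), is:

import numpy as np

def correct_invalid_depths(object_depth, object_mask, target_value=0.0):
    # Traverse the object's pixels; whenever the current pixel carries a valid depth,
    # fill any adjacent object pixel whose depth still equals the target (invalid)
    # value with the current pixel's depth.
    h, w = object_depth.shape
    out = object_depth.copy()
    ys, xs = np.nonzero(object_mask)
    for y, x in zip(ys, xs):
        if out[y, x] == target_value:  # current pixel itself has no valid depth
            continue
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and object_mask[ny, nx] and out[ny, nx] == target_value:
                out[ny, nx] = out[y, x]
    return out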
4. The depth detection method according to any one of claims 1 to 3, wherein performing semantic segmentation on the color image to obtain the semantic segmentation result of the color image comprises:
classifying and identifying the pixel points in the color image to obtain, for each pixel point in the color image, the probability that the pixel point belongs to the target object;
and obtaining the semantic segmentation result of the color image based on the probability that each pixel point in the color image belongs to the target object.
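Claim 4 amounts to per-pixel classification followed by a decision rule on the resulting probabilities. A minimal sketch, assuming the classifier has already produced a per-pixel probability map and using a simple 0.5 threshold (the claim does not specify how the probabilities are turned into the segmentation result):

import numpy as np

def segmentation_from_probabilities(prob_map, threshold=0.5):
    # `prob_map[y, x]` is the probability that pixel (y, x) belongs to the target object;
    # pixels at or above the threshold form the object mask used as the segmentation result.
    return prob_map >= threshold

The resulting boolean mask is what the later steps (claims 2 and 3) consume as the pixel points matched with the target object.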
5. A depth detection device, comprising:
an acquisition module configured to acquire a binocular image of a target scene captured by a binocular camera;
a first generation module configured to generate a depth image of the target scene and a color image aligned with the depth image based on the binocular image;
a segmentation module configured to perform semantic segmentation on the color image to obtain a semantic segmentation result of the color image, wherein the semantic segmentation result is used for indicating pixel points that match a target object;
a second generation module configured to generate a target object depth map of the target object based on the semantic segmentation result of the color image and the depth image, perform depth value correction on pixel points in the target object depth map whose depth values are equal to a target depth value, and smooth the corrected target object depth map to obtain an optimized target object depth map;
wherein the second generation module is further configured to: determine first position information of the target object in a real scene based on the position information of the target object in the color image, the target object depth map, and pose data of the binocular camera; determine second position information of a target virtual object in the real scene based on the position information of the target virtual object in a three-dimensional scene model matched with the real scene; and determine, according to pose data of an AR device in the real scene, the first position information, and the second position information, an AR special effect between the target virtual object and the target object when the AR device presents the target virtual object;
and wherein the second generation module, when smoothing the corrected target object depth map, is configured to:
determine, for any pixel point in the image area where the target object is located in the corrected target object depth map, the average depth value of that pixel point and its second associated pixel points, wherein the second associated pixel points are pixel points within a preset number of pixels of that pixel point and located in the image area where the target object is located;
and adjust the depth value of that pixel point to the average depth value.
6. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate with each other over the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the depth detection method according to any one of claims 1 to 4.
7. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the depth detection method according to any one of claims 1 to 4.
CN202110105980.1A 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium Active CN112802081B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110105980.1A CN112802081B (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium
PCT/CN2021/103067 WO2022160587A1 (en) 2021-01-26 2021-06-29 Depth detection method and apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110105980.1A CN112802081B (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112802081A CN112802081A (en) 2021-05-14
CN112802081B true CN112802081B (en) 2024-05-24

Family

ID=75811939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110105980.1A Active CN112802081B (en) 2021-01-26 2021-01-26 Depth detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112802081B (en)
WO (1) WO2022160587A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802081B (en) * 2021-01-26 2024-05-24 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and storage medium
CN113592935A (en) * 2021-07-27 2021-11-02 贝壳技术有限公司 Depth image generation method and device
CN115393224A (en) * 2022-09-02 2022-11-25 点昀技术(南通)有限公司 Depth image filtering method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680496A (en) * 2015-03-17 2015-06-03 山东大学 Kinect deep image remediation method based on colorful image segmentation
CN109683699A (en) * 2019-01-07 2019-04-26 深圳增强现实技术有限公司 The method, device and mobile terminal of augmented reality are realized based on deep learning
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device
CN111815696A (en) * 2019-04-11 2020-10-23 曜科智能科技(上海)有限公司 Depth map optimization method, device, equipment and medium based on semantic instance segmentation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105517677B (en) * 2015-05-06 2018-10-12 北京大学深圳研究生院 The post-processing approach and device of depth map/disparity map
CN112802081B (en) * 2021-01-26 2024-05-24 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680496A (en) * 2015-03-17 2015-06-03 山东大学 Kinect deep image remediation method based on colorful image segmentation
CN109683699A (en) * 2019-01-07 2019-04-26 深圳增强现实技术有限公司 The method, device and mobile terminal of augmented reality are realized based on deep learning
CN111815696A (en) * 2019-04-11 2020-10-23 曜科智能科技(上海)有限公司 Depth map optimization method, device, equipment and medium based on semantic instance segmentation
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device

Also Published As

Publication number Publication date
CN112802081A (en) 2021-05-14
WO2022160587A1 (en) 2022-08-04

Similar Documents

Publication Publication Date Title
CN109961406B (en) Image processing method and device and terminal equipment
CN112802081B (en) Depth detection method and device, electronic equipment and storage medium
CN110232369B (en) Face recognition method and electronic equipment
US9412034B1 (en) Occlusion handling for computer vision
US11321575B2 (en) Method, apparatus and system for liveness detection, electronic device, and storage medium
CN110648397B (en) Scene map generation method and device, storage medium and electronic equipment
CN107392958B (en) Method and device for determining object volume based on binocular stereo camera
CN108230384B (en) Image depth calculation method and device, storage medium and electronic equipment
CN108830892B (en) Face image processing method and device, electronic equipment and computer readable storage medium
CN110956114A (en) Face living body detection method, device, detection system and storage medium
CN109640066B (en) Method and device for generating high-precision dense depth image
CN105229697A (en) Multi-modal prospect background segmentation
CN111444744A (en) Living body detection method, living body detection device, and storage medium
CN113205057B (en) Face living body detection method, device, equipment and storage medium
CN107980138A (en) A kind of false-alarm obstacle detection method and device
CN113221767B (en) Method for training living body face recognition model and recognizing living body face and related device
CN113301320B (en) Image information processing method and device and electronic equipment
CN111626086A (en) Living body detection method, living body detection device, living body detection system, electronic device, and storage medium
CN115035235A (en) Three-dimensional reconstruction method and device
JP7156624B2 (en) Depth map filtering device, depth map filtering method and program
CN111353325A (en) Key point detection model training method and device
CN111383254A (en) Depth information acquisition method and system and terminal equipment
CN104243970A (en) 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity
JP7264308B2 (en) Systems and methods for adaptively constructing a three-dimensional face model based on two or more inputs of two-dimensional face images
CN111160233B (en) Human face in-vivo detection method, medium and system based on three-dimensional imaging assistance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40045370

Country of ref document: HK

GR01 Patent grant