WO2022110877A1 - Depth detection method and apparatus, electronic device, storage medium and program - Google Patents


Info

Publication number
WO2022110877A1
WO2022110877A1 PCT/CN2021/109803 CN2021109803W WO2022110877A1 WO 2022110877 A1 WO2022110877 A1 WO 2022110877A1 CN 2021109803 W CN2021109803 W CN 2021109803W WO 2022110877 A1 WO2022110877 A1 WO 2022110877A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
dimensional key
key point
image
frame image
Prior art date
Application number
PCT/CN2021/109803
Other languages
English (en)
French (fr)
Inventor
李雷
李健华
王权
钱晨
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011344694.2A external-priority patent/CN112419388A/zh
Priority claimed from CN202011335257.4A external-priority patent/CN112465890A/zh
Application filed by 深圳市商汤科技有限公司
Publication of WO2022110877A1 publication Critical patent/WO2022110877A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images

Definitions

  • the present disclosure is based on, and claims priority to, Chinese patent application No. 202011344694.2 filed on November 24, 2020 and Chinese patent application No. 202011335257.4 filed on November 24, 2020, both entitled "Depth Detection Method, Apparatus, Electronic Device and Computer-Readable Storage Medium"; the entire contents of the above-mentioned Chinese patent applications are incorporated herein by reference.
  • the present disclosure relates to the field of computer vision technology, in particular, but not limited to, a depth detection method, apparatus, electronic device, storage medium and computer program.
  • image depth detection technology has important applications in Augmented Reality (AR) interaction, virtual photography and other fields; in the absence of special hardware devices such as 3D depth cameras, how to realize depth detection of the human body in an image is an urgent technical problem to be solved.
  • Embodiments of the present disclosure provide a depth detection method, apparatus, electronic device, storage medium, and computer program.
  • An embodiment of the present disclosure provides a depth detection method, the method is applied to an electronic device, and the method includes:
  • acquiring at least one frame of image collected by an image acquisition device, where the at least one frame of image includes a current frame image; performing human body image segmentation on the current frame image to obtain a mask image of the human body; performing human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image; and determining the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies.
  • Embodiments of the present disclosure also provide a depth detection device, the device comprising:
  • the acquisition module is configured to: acquire at least one frame of image collected by the image acquisition device, where the at least one frame of image includes the current frame image;
  • the processing module is configured to: perform human body image segmentation on the current frame image to obtain a mask image of the human body; and perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
  • the detection module is configured to: determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies.
  • Embodiments of the present disclosure also provide an electronic device, the electronic device comprising:
  • a memory storing executable instructions; and a processor configured to implement any one of the above depth detection methods when executing the executable instructions stored in the memory.
  • Embodiments of the present disclosure further provide a computer-readable storage medium storing executable instructions for implementing any one of the above-mentioned depth detection methods when executed by a processor.
  • Embodiments of the present disclosure further provide a computer program, the computer program includes computer-readable codes, and when the computer-readable codes are executed in an electronic device, a processor of the electronic device executes the codes to implement the depth detection method described in any one of the preceding items.
  • the embodiments of the present disclosure can combine the human body mask image with the two-dimensional key point and three-dimensional key point information of the human body to determine the depth detection result of the human body, and there is no need to obtain the depth information of the human body in the image through a special hardware device such as a three-dimensional depth camera; therefore, the embodiments of the present disclosure can realize depth detection of a human body in an image without relying on special hardware devices such as a three-dimensional depth camera, and can be applied to scenarios such as AR interaction and virtual photography.
  • FIG. 1 is a schematic diagram of connection between a terminal and a server according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a depth detection method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a two-dimensional key point of a human skeleton provided by an embodiment of the present disclosure
  • FIG. 4A is a schematic diagram of a two-dimensional key point of a target human body provided by an embodiment of the present disclosure
  • 4B is a schematic diagram of a three-dimensional key point and a human body mask image of a target human body according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of implementing a depth detection method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a point cloud provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a depth detection apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the terms "comprising", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly stated elements but also other elements not expressly listed or inherent to the implementation of the method or apparatus.
  • an element defined by the phrase "comprises a ..." does not preclude the presence of additional related elements (e.g., steps in a method or units in a device) in the method or device that includes the element.
  • a unit in an apparatus may be, for example, part of a circuit, part of a processor, part of a program or software, etc.
  • the depth detection method provided by the embodiment of the present disclosure includes a series of steps, but the depth detection method provided by the embodiment of the present disclosure is not limited to the described steps.
  • the depth detection device provided by the embodiment of the present disclosure includes a series of modules, but the apparatus provided by the embodiments of the present disclosure is not limited to the explicitly described modules, and may also include modules that need to be set for acquiring relevant information or performing processing based on the information.
  • a 3D depth camera can be used to realize depth detection of the human body in an image.
  • the 3D depth camera here may be a camera equipped with a binocular camera that uses binocular vision technology to obtain depth information; however, using such special hardware will increase the application cost and limit the application scenarios to a certain extent.
  • in some cases, the requirements for the accuracy of depth estimation and the amount of information provided are relatively low; in the case of human depth estimation based on images captured by a monocular camera, only the relative depth between the pixels of the human body can be estimated, but the depth between a pixel of the human body and the camera cannot be estimated, which limits the scope of application to a certain extent; in some cases, only a single depth value can be estimated for each pixel of the human body, so the estimated depth information is limited; in some cases, depth information estimation can be achieved based on an image matching algorithm over consecutive frames, but this scheme increases the consumption of time resources and computing resources, and is not suitable for low-power-consumption real-time application scenarios.
  • the embodiments of the present disclosure provide a depth detection method, apparatus, electronic device, storage medium, and computer program.
  • the depth detection method provided by the embodiments of the present disclosure can realize depth detection of the human body in an image without relying on high-cost and complex hardware such as a three-dimensional depth camera; the depth detection method provided by the embodiment of the present disclosure can be applied to an electronic device, and an exemplary application of the electronic device provided by the embodiment of the present disclosure is described below.
  • the electronic devices provided by the embodiments of the present disclosure may be AR glasses, laptop computers, tablet computers, desktop computers, mobile devices (e.g., mobile phones, portable music players, personal digital assistants, dedicated messaging devices, portable game equipment) and various other terminals with image acquisition devices.
  • the image acquisition device may be a device such as a monocular camera, for example, the terminal may be a mobile phone with a camera.
  • the terminal may perform depth detection on the image collected by the image collection device according to the depth detection method of the embodiment of the present disclosure, and obtain a depth detection result of the human body in the image.
  • the electronic device provided by the embodiments of the present disclosure may also be a server that forms a communication connection with the above-mentioned terminal.
  • FIG. 1 is a schematic diagram of connection between a terminal and a server according to an embodiment of the present disclosure. As shown in FIG. 1 , a terminal 100 is connected to a server 102 through a network 101, and the network 101 may be a wide area network or a local area network, or a combination of the two.
  • the server 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present disclosure.
  • the terminal 100 is used for collecting images at its current position through the image acquisition device; the collected images can be sent to the server 102; after receiving an image, the server 102 can use the depth detection method of the embodiment of the present disclosure to perform depth detection on the received image to obtain the depth detection result of the human body in the image.
  • FIG. 2 is a schematic flowchart of a depth detection method provided by an embodiment of the present disclosure, and the method is applied to an electronic device. As shown in FIG. 2, the process may include steps 201 to 203:
  • Step 201: Acquire at least one frame of image collected by the image acquisition device, where the at least one frame of image includes the current frame image.
  • the image capturing device may capture images, and may also send at least one frame of image including the current frame image to the processor of the electronic device.
  • the at least one frame of image includes the current frame image (a frame of image collected at the current moment); in some embodiments, the at least one frame of image includes not only the current frame image but also historical frame images; here, a historical frame image represents one or more frames of historical images captured by the image acquisition device.
  • in the case where the at least one frame of image is a multi-frame image, the at least one frame of image may be continuous frame images continuously collected by the image acquisition device, or may be discontinuous multiple frame images; this is not limited in the embodiments of the present disclosure.
  • Step 202: Perform human body image segmentation on the current frame image to obtain a mask image of the human body; perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image.
  • the above-mentioned human body includes at least two human bodies; correspondingly, performing the segmentation of the human body image on the current frame image to obtain the mask image of the human body may be achieved by: segmenting the human body image on the current frame image to obtain mask images of at least two human bodies; and detecting human body key points on at least one frame of images to obtain two-dimensional key point information and three-dimensional key point information of at least two human bodies in the current frame image.
  • the above-mentioned human body includes a single target human body; correspondingly, the segmentation of the human body image on the current frame image to obtain the mask image of the human body may be achieved by: segmenting the human body image on the current frame image to obtain The mask image of the target human body; and the human body key point detection is performed on at least one frame of image, and the two-dimensional key point information and the three-dimensional key point information of at least one human body in the current frame image are obtained.
  • a human body image can be segmented from the current frame image according to a pre-trained image segmentation model to obtain the mask image of the human body.
  • the image segmentation model may be a model related to the attributes of the human image.
  • the attributes of the human image may include area, gray value of pixels, or other attributes; in some embodiments, in the case where the attribute of the human body is area, the human body image is segmented from the current frame image according to the pre-trained image segmentation model, and the mask image of a human body with an area larger than a set area can be obtained.
  • the image segmentation model may be implemented by a neural network, for example, the image segmentation model may be implemented by a fully convolutional neural network or other neural networks.
  • the image segmentation model can be predetermined according to actual requirements, and the actual requirements include but are not limited to time-consuming requirements, precision requirements, etc.; that is, different image segmentation models can be set according to different actual requirements.
  • the image segmentation model is an image segmentation model of at least two human bodies.
  • the image segmentation model is an image segmentation model of a single human body.
  • in this way, the human body mask image of the target human body can be obtained.
  • in some embodiments, the human body mask image of the target human body can be directly obtained, which has the characteristic of easy implementation.
  • in some embodiments, the human body mask image of the target human body can be segmented from the current frame image by using an image segmentation model of a single human body.
  • when the attribute of the human body is area, according to a pre-trained image segmentation model of a single human body, the current frame image is segmented for a single human body image, and the human body mask image of the target human body, representing the human body with the largest area, can be obtained.
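Where the attribute used for selection is area, choosing the target human body as the one with the largest mask area can be sketched as follows (a minimal pure-Python sketch with illustrative toy masks; the function names are not from the disclosure, and a real system would obtain the masks from a pre-trained image segmentation model):

```python
# Minimal sketch: given candidate human body masks (binary 0/1 grids),
# select the mask of the target human body as the one with the largest area.

def mask_area(mask):
    """Count foreground pixels (value 1) in a binary mask grid."""
    return sum(sum(row) for row in mask)

def select_target_mask(masks):
    """Return the candidate mask with the largest area."""
    return max(masks, key=mask_area)

mask_a = [[0, 1, 0],
          [0, 1, 0],
          [0, 0, 0]]   # area 2
mask_b = [[1, 1, 0],
          [1, 1, 0],
          [0, 1, 0]]   # area 5

target = select_target_mask([mask_a, mask_b])
print(mask_area(target))  # 5
```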
  • the two-dimensional key points are used to represent the key position points of the human body in the image plane;
  • the two-dimensional key point information may include coordinate information of the two-dimensional key points, and the coordinate information of a two-dimensional key point includes an abscissa and an ordinate.
  • the 3D key point information may include the coordinate information of the 3D key points.
  • the coordinate information of the 3D key points represents the coordinates of the 3D key points in the camera coordinate system.
  • the camera coordinate system is a three-dimensional rectangular coordinate system whose Z axis is the optical axis of the image acquisition device, and whose X axis and Y axis are two mutually perpendicular coordinate axes of the image plane.
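As an illustration of this camera coordinate system, the sketch below projects a three-dimensional key point (X, Y, Z) in camera coordinates onto the image plane using a simple pinhole model; the intrinsic parameters fx, fy, cx, cy are assumed example values, not taken from the disclosure:

```python
# Sketch: relation between a 3D key point in the camera coordinate system
# (Z axis = optical axis) and its 2D image-plane projection, assuming a
# pinhole camera model with hypothetical intrinsics.

def project_to_image_plane(point_3d, fx, fy, cx, cy):
    """Project a 3D key point (X, Y, Z) in camera coordinates to pixel (u, v)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v

# Example with assumed intrinsics for a 640x480 image.
u, v = project_to_image_plane((0.2, -0.1, 2.0),
                              fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(round(u, 1), round(v, 1))  # 370.0 215.0
```

Note that the Z coordinate of a three-dimensional key point in this coordinate system is exactly its depth with respect to the camera, which is why the three-dimensional key point information can later serve as depth information.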
  • in some embodiments, the three-dimensional key point corresponding to a two-dimensional key point and the coordinate information of the three-dimensional key point may be determined according to the two-dimensional key point information; for example, a key point conversion model may be pre-trained, which is used to convert two-dimensional key points into three-dimensional key points; in this way, after the trained key point conversion model is obtained, the coordinate information of the two-dimensional key points can be input into the trained key point conversion model to obtain the three-dimensional key points corresponding to the two-dimensional key points and the coordinate information of the three-dimensional key points.
  • the network structure of the key point conversion model is not limited.
  • the key point conversion model may be a sequential convolutional network or a non-sequential fully connected network; the network structure of the key point conversion model may be predetermined based on practical application requirements.
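The interface of such a key point conversion model can be sketched as a mapping from K two-dimensional key points to K three-dimensional key points; the single fully connected layer with fixed toy weights below only illustrates the input and output shapes and is not a trained model:

```python
# Interface sketch of a key point conversion model: K 2D key points in,
# K 3D key points out. The toy dense layer below uses fixed illustrative
# weights purely to show the data flow; a real model would be trained on
# pairs of 2D and 3D key points.

def fully_connected(inputs, weights, bias):
    """One dense layer: out[j] = sum_i inputs[i] * weights[i][j] + bias[j]."""
    n_out = len(bias)
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + bias[j]
            for j in range(n_out)]

def convert_2d_to_3d(keypoints_2d, weights, bias):
    """Flatten K (u, v) key points, apply the layer, reshape to K (x, y, z)."""
    flat = [c for kp in keypoints_2d for c in kp]  # length 2K
    out = fully_connected(flat, weights, bias)     # length 3K
    return [tuple(out[i:i + 3]) for i in range(0, len(out), 3)]

K = 2
weights = [[0.1] * (3 * K) for _ in range(2 * K)]  # fixed toy weights
bias = [0.0] * (3 * K)
points_3d = convert_2d_to_3d([(10.0, 20.0), (30.0, 40.0)], weights, bias)
print(len(points_3d))  # 2 key points, each an (x, y, z) tuple
```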
  • in some embodiments, detection and tracking of human body key points may be performed on the at least one frame of image to obtain the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image; understandably, tracking human body key points based on multiple frames of images is conducive to accurately obtaining the two-dimensional key point information of at least one human body in the current frame image, which in turn is conducive to obtaining accurate three-dimensional key point information.
  • in some embodiments, detection and tracking of human body key points may be performed on the continuous frame images to obtain the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image; understandably, tracking human body key points based on consecutive frame images is further conducive to accurately obtaining the two-dimensional key point information of at least one human body in the current frame image, which is further conducive to obtaining accurate three-dimensional key point information.
  • Step 203: Determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and the three-dimensional key point information of the human body in the current frame image and the mask image of the human body.
  • in some embodiments, the above-mentioned human body includes at least two human bodies; correspondingly, the implementation of step 203 may be: determining the depth detection results of the at least two human bodies in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the at least two human bodies in the current frame image and the mask images of the at least two human bodies.
  • in some embodiments, the above-mentioned human body includes a single target human body; correspondingly, the implementation of step 203 may be: determining the depth detection result of the target human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image and the human body mask image of the target human body.
  • the above steps 201 to 203 may be implemented based on a processor of an electronic device, and the above processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
  • the electronic device that implements the function of the above processor may also be other, which is not limited by the embodiment of the present disclosure.
  • the embodiments of the present disclosure can combine the human body mask image with the two-dimensional key point and three-dimensional key point information of the human body to determine the depth detection results of multiple human bodies, and there is no need to obtain the depth information of the human body in the image through special hardware devices such as a three-dimensional depth camera; therefore, the embodiments of the present disclosure can realize depth detection of the human body in an image without relying on special hardware devices such as a 3D depth camera, and can be applied to scenarios such as AR interaction and virtual photography.
  • the embodiments of the present disclosure can obtain the depth information between each pixel point of the human body and the camera, instead of estimating a single depth for each pixel point of the human body, so the obtained depth information is relatively rich and can be applied to multiple scenes; for example, the application scope of the embodiments of the present disclosure includes but is not limited to: reconstruction and presentation of dynamic human bodies in 3D human body reconstruction; occlusion display of human bodies and virtual scenes in augmented reality applications; interaction; etc. Further, the embodiments of the present disclosure do not directly estimate the depth information of the human body pixels based on an image matching algorithm over continuous frames, but use the two-dimensional key point information and three-dimensional key point information of the human body to determine the depth information of the human body pixels; compared with the scheme of depth information estimation based on an image matching algorithm over continuous frames, the consumption of time resources and computing resources is reduced, and a balance is achieved between the estimation accuracy of the depth information and the time consumed in determining the depth information.
  • At least one frame of image collected by the above image collection device is an RGB image; it can be seen that the embodiments of the present disclosure can implement depth detection of multiple human bodies based on easily obtained RGB images, which is easy to implement.
  • in some embodiments, the two-dimensional key point information of each of the at least two human bodies can be matched with the mask image of each of the at least two human bodies to obtain the two-dimensional key point information belonging to each human body; then, according to the three-dimensional key point information corresponding to the two-dimensional key point information of each human body, the depth detection result of each human body is determined.
  • by matching the two-dimensional key point information of the at least two human bodies in the current frame image with the mask image of each human body, the two-dimensional key point information of each human body can be directly obtained, and then the depth detection result of each human body can be determined.
  • in some embodiments, the above-mentioned two-dimensional key point information is information of two-dimensional key points representing a human skeleton, and the three-dimensional key point information is information of three-dimensional key points representing a human skeleton.
  • the two-dimensional key points of the human skeleton are used to represent the key positions of the human body in the image plane.
  • the key positions of the human body include but are not limited to facial features, neck, shoulders, elbows, hands, hips, knees, feet, etc.; the key positions of the human body can be preset according to the actual situation; exemplarily, referring to Fig. 3, the two-dimensional key points of the human skeleton can represent 14 key positions of the human body or 17 key positions of the human body.
  • the solid dots collectively represent the 17 key positions of the human body.
  • the embodiment of the present disclosure can obtain the two-dimensional key points of each human skeleton, and determine the depth detection result of each human body based on the two-dimensional key points of each human skeleton.
  • the correlation between the two-dimensional key points of the skeletons of different human bodies is small. Therefore, the embodiments of the present disclosure can realize depth detection of at least two human bodies in an image.
  • in some embodiments, among the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information whose positional overlap with the mask image of each human body reaches a set value may be used as the two-dimensional key point information of that human body.
  • the set value may be a value preset according to an actual application scenario, for example, the set value may be any value between 80% and 90%;
  • in some embodiments, the degree of overlap between the two-dimensional key point information of each human body and the mask image of each human body can be determined according to the coordinate information of the two-dimensional key points of the human body and the position information of the mask image of the human body.
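A minimal sketch of this matching step, assuming the overlap is measured as the fraction of a body's two-dimensional key points that land on foreground pixels of a mask (toy data; an 80% set value is used, as in the example range above):

```python
# Sketch: match 2D key points to a human body mask by positional overlap.
# A body's key points are assigned to a mask when the fraction of them
# that fall on the mask's foreground pixels reaches a set value.

def overlap_ratio(keypoints, mask):
    """Fraction of (row, col) key points landing on mask foreground (1)."""
    hits = sum(1 for r, c in keypoints
               if 0 <= r < len(mask) and 0 <= c < len(mask[0])
               and mask[r][c] == 1)
    return hits / len(keypoints)

def match_keypoints_to_mask(keypoints, mask, set_value=0.8):
    """True if these key points should be assigned to this mask."""
    return overlap_ratio(keypoints, mask) >= set_value

mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
body_keypoints = [(0, 1), (0, 2), (1, 1), (1, 2)]   # all on foreground
other_keypoints = [(2, 0), (2, 1), (0, 1), (2, 2)]  # mostly background

print(match_keypoints_to_mask(body_keypoints, mask))   # True
print(match_keypoints_to_mask(other_keypoints, mask))  # False
```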
  • in other embodiments, among the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information with the highest positional overlap with the mask image of each human body may be selected as the two-dimensional key point information of that human body.
  • in this way, the two-dimensional key point information of each human body can be directly determined according to the positional overlap between the two-dimensional key point information and the mask image of each human body, which is beneficial to accurately obtaining the two-dimensional key point information of each human body.
  • optimization processing may be performed on the two-dimensional key point information of at least two human bodies in the above-mentioned current frame image, so as to obtain the two-dimensional key point information of the at least two human bodies after the optimization processing;
  • then, among the optimized two-dimensional key point information of the at least two human bodies, the two-dimensional key point information whose positional overlap with the mask image of each human body reaches a set value is used as the two-dimensional key point information of that human body.
  • in the case where the at least one frame of image further includes historical frame images, the two-dimensional key point information of the at least two human bodies in the current frame image and the two-dimensional key point information of the at least two human bodies in the historical frame images can be processed to obtain the optimized two-dimensional key point information of the at least two human bodies.
  • in some embodiments, time series filtering processing may be performed on the two-dimensional key point information of the at least two human bodies in the current frame image and the two-dimensional key point information of the at least two human bodies in the historical frame images, to obtain the filtered two-dimensional key point information of the at least two human bodies; methods of time series filtering processing include but are not limited to time series low-pass filtering and time series extended Kalman filtering; in other embodiments, skeleton limb length optimization processing can be performed on the two-dimensional key point information of the at least two human bodies in the current frame image and the two-dimensional key point information of the at least two human bodies in the historical frame images, to obtain the optimized two-dimensional key point information of the at least two human bodies.
  • optimizing the two-dimensional key point information of the at least two human bodies in the current frame image in combination with the two-dimensional key point information of the at least two human bodies in the historical frame images is beneficial to improving the temporal stability of the two-dimensional key point information, which in turn is beneficial to improving the temporal stability of human depth detection.
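As one hypothetical instance of the time series low-pass filtering mentioned above, an exponential moving average over the key point coordinates of consecutive frames can be sketched as follows (the smoothing factor alpha and the coordinates are illustrative; the disclosure does not fix a particular filter):

```python
# Sketch: time series low-pass filtering of 2D key point coordinates
# across frames, implemented as a simple exponential moving average.

def lowpass_filter_keypoints(frames, alpha=0.5):
    """Smooth per-key-point (u, v) coordinates over a sequence of frames.

    frames: list of frames, each a list of (u, v) key points (same order).
    Returns the filtered key points of the current (last) frame.
    """
    state = list(frames[0])  # initialise with the earliest frame
    for frame in frames[1:]:
        state = [(alpha * u + (1 - alpha) * su,
                  alpha * v + (1 - alpha) * sv)
                 for (u, v), (su, sv) in zip(frame, state)]
    return state

# One key point observed over three frames, with a jittery middle sample.
frames = [[(10.0, 10.0)], [(14.0, 10.0)], [(10.0, 10.0)]]
current = lowpass_filter_keypoints(frames)
print(current)  # [(11.0, 10.0)] — the jitter is damped
```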
  • in some embodiments, the depth information of the two-dimensional key points of each human body can be determined; for example, the coordinate information of the three-dimensional key points of each human body can be used as the depth information of the two-dimensional key points of that human body; here, the depth information represents the depth information of the pixels that overlap with the two-dimensional key point positions.
  • in the case where a pixel in the mask image of each human body is not a pixel that overlaps with the position of a two-dimensional key point, this pixel can be regarded as a first pixel; the depth information of the two-dimensional key points is interpolated to obtain the depth information of the first pixels in the mask image of each human body.
  • Interpolation is an important method for discrete function approximation. Using interpolation, the approximate value of the function at other points can be estimated by the value of the function at a limited number of points.
  • in some embodiments, the depth information of all pixels in the mask image of each human body may be obtained based on an interpolation processing method under a preset spatial continuity constraint.
  • a smoothing filtering process may also be performed on the depth information of each pixel in the mask image of each human body.
  • in some embodiments, a depth map of each human body can also be generated based on the depth information of each pixel, and the depth map can be displayed in the display interface of the electronic device.
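The smoothing filtering of the per-pixel depth information can be sketched, for example, with a simple 3x3 mean filter (an assumed choice of smoothing filter, not specified in the disclosure; the depth values are illustrative):

```python
# Sketch: smooth per-pixel depth information with a 3x3 mean filter.
# Out-of-bounds neighbours are simply skipped at the borders.

def smooth_depth(depth):
    """Apply a 3x3 mean filter to a 2D grid of depth values."""
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [depth[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

depth = [[2.0, 2.0, 2.0],
         [2.0, 5.0, 2.0],   # one noisy spike in the middle
         [2.0, 2.0, 2.0]]
smoothed = smooth_depth(depth)
print(round(smoothed[1][1], 2))  # 2.33 — the spike is pulled toward its neighbours
```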
  • the embodiments of the present disclosure can determine the depth information for any pixel point of the mask image of each human body, and can comprehensively realize the depth detection of each human body in the image.
• According to the depth information of the two-dimensional key points of each human body, a discrete function characterizing the relationship between pixel position and pixel depth information is determined; the discrete function is then evaluated at the position of the first pixel, and the value of the discrete function at that position is determined as the depth information of the first pixel.
  • the above-mentioned contents merely describe the principle of the interpolation processing, and do not limit the specific implementation of the interpolation processing.
• The specific implementation of the interpolation processing includes, but is not limited to, nearest-neighbor interpolation completion, interpolation completion based on breadth-first search, etc.
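The nearest-neighbor interpolation completion mentioned above can be sketched roughly as follows: each first pixel takes the depth of the closest key point pixel. This is a simplified illustration under assumed data structures (pixel tuples and a depth dictionary), not the exact routine of the disclosure:

```python
def interpolate_depth(mask_pixels, keypoint_depths):
    """Assign each mask pixel the depth of its nearest key point pixel.
    `mask_pixels`: iterable of (x, y) pixels in the human body mask;
    `keypoint_depths`: dict mapping key-point pixels (x, y) to the
    Z-axis depth of their corresponding 3D key point."""
    result = {}
    for px, py in mask_pixels:
        if (px, py) in keypoint_depths:          # overlaps a key point
            result[(px, py)] = keypoint_depths[(px, py)]
            continue
        # First pixel: take the closest key point by squared distance.
        nearest = min(
            keypoint_depths,
            key=lambda kp: (kp[0] - px) ** 2 + (kp[1] - py) ** 2,
        )
        result[(px, py)] = keypoint_depths[nearest]
    return result

depths = {(0, 0): 2.0, (10, 0): 3.0}
print(interpolate_depth([(0, 0), (2, 0), (9, 1)], depths))
```

A breadth-first-search variant would instead propagate depth outward from the key point pixels layer by layer, which naturally respects the spatial continuity constraint mentioned above.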
• The two-dimensional key point information of at least one human body in the current frame image can be matched with the human body mask image of the target human body to obtain the two-dimensional key point information of the target human body in the current frame image; then, according to the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image, the depth detection result of the target human body in the current frame image is determined.
• By matching the two-dimensional key point information of at least one human body in the current frame image with the human body mask image of the target human body, the two-dimensional key point information of the target human body can be obtained directly, and the depth detection result of the target human body can then be determined; that is, depth detection of the target human body in the image can be realized without relying on special hardware devices such as a 3D depth camera.
• The two-dimensional key point information of the target human body may be determined from the two-dimensional key point information of the at least one human body; the two-dimensional key point information of the target human body is the two-dimensional key point information of a human body whose degree of positional overlap with the human body mask image of the target human body reaches a set value.
• The degree of overlap between the two-dimensional key points of each human body and the human body mask image of the target human body can be determined according to the coordinate information of the two-dimensional key points of each human body in the at least one human body and the position information of the human body mask image of the target human body.
• From the two-dimensional key point information of the multiple human bodies, the two-dimensional key point information of the human body with the highest degree of positional overlap with the human body mask image of the target human body may be selected as the two-dimensional key point information of the target human body.
• In this way, the two-dimensional key point information of the target human body can be determined directly according to the positional overlap between two-dimensional key point information and the human body mask image of the target human body, which is beneficial to accurately obtaining the two-dimensional key point information of the target human body.
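The overlap-based matching described above can be sketched by counting how many of each body's 2D key points fall inside the target mask and keeping the body with the highest overlap ratio. The data shapes (a pixel set for the mask, a dict of key point lists) are illustrative assumptions:

```python
def match_keypoints_to_mask(bodies, mask):
    """Select the detected body whose 2D key points overlap the target
    human body's mask most. `bodies`: dict body_id -> list of (x, y)
    key points; `mask`: set of (x, y) pixels of the target body mask.
    Returns (best_body_id, overlap_ratio)."""
    def overlap(points):
        return sum(1 for p in points if p in mask) / len(points)
    best = max(bodies, key=lambda body_id: overlap(bodies[body_id]))
    return best, overlap(bodies[best])

mask = {(x, y) for x in range(5) for y in range(5)}   # a 5x5 target mask
bodies = {"A": [(1, 1), (2, 2), (9, 9)], "B": [(8, 8), (9, 9), (7, 7)]}
print(match_keypoints_to_mask(bodies, mask))
```

In practice the returned ratio would be compared against the set value mentioned above before accepting the match.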
• In this way, the two-dimensional key point information of the target human body in the current frame image can be determined.
• The Z-axis coordinate value in the coordinate information of the three-dimensional key points of the target human body may be used as the depth information of the two-dimensional key points of the target human body.
• If any pixel in the human body mask image of the target human body, or in the pixel point set, is not a pixel that overlaps with the position of a two-dimensional key point, that pixel can be regarded as a first pixel.
• The Z-axis coordinate value of the three-dimensional key point corresponding to a pixel adjacent to the first pixel is used as the depth information of the first pixel; that is, for the first pixel, a pixel adjacent to it can be selected from the pixels overlapping with the positions of the two-dimensional key points, and the depth information of the first pixel is determined based on the Z-axis coordinate value of the three-dimensional key point corresponding to the selected pixel.
  • the embodiments of the present disclosure can determine the depth information for the human body mask image of the target human body or any pixel point in the pixel point set, and can comprehensively realize the depth detection of the target human body in the image.
• The connected region of the two-dimensional key points can be searched based on the two-dimensional key points of the target human body in the current frame image, and the pixels not included in the connected region in the human body mask image of the target human body can be deleted to obtain a set of pixels.
• The two-dimensional key points of the target human body in the current frame image are used as seed points, and a breadth-first search is performed to determine the connected region of the two-dimensional key points of the target human body in the current frame image.
• The pixels not included in the connected region in the human body mask image of the target human body are pixels that cannot be reached starting from the two-dimensional key points; since the two-dimensional key points represent key positions of the human body, such pixels can be considered erroneous. Deleting the pixels not included in the connected region from the human body mask image of the target human body is therefore beneficial to improving the accuracy of the depth detection of the target human body.
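The breadth-first search from seed points described above can be sketched as a flood fill over the mask pixels; pixels the search never reaches are the ones that would be deleted. Pixel-set representation and 4-connectivity are illustrative assumptions:

```python
from collections import deque

def connected_mask(mask, seeds):
    """Breadth-first search from 2D key point seed pixels; returns the
    subset of `mask` pixels reachable from the seeds via 4-connected
    neighbours. Mask pixels outside this region would be deleted."""
    region = set()
    queue = deque(p for p in seeds if p in mask)
    region.update(queue)
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in mask and nxt not in region:
                region.add(nxt)
                queue.append(nxt)
    return region

# A mask with a stray blob at (10, 10) not connected to the body:
mask = {(0, 0), (0, 1), (1, 1), (10, 10)}
print(sorted(connected_mask(mask, [(0, 0)])))
```

The stray pixel at (10, 10) is excluded, matching the described deletion of pixels outside the connected region.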
  • an implementation manner of determining the depth detection result of the human body in the current frame image can be:
• The optimized two-dimensional key point information of at least one human body in the current frame image can be obtained first, and then, according to the optimized two-dimensional key point information, the corresponding three-dimensional key point information can be further determined.
• In response to a two-dimensional key point of the target human body being present in any one frame of the at least one frame of images, and the three-dimensional key point corresponding to the two-dimensional key point of the target human body in that frame being in a preset area, that frame image is determined to be a valid image.
• Matching the two-dimensional key point information of at least one human body in the current frame image with the human body mask image of the target human body may fail to obtain the two-dimensional key point information of the target human body in the current frame image; that is, there may be a situation in which some frame of the at least one frame of images contains no two-dimensional key point of the target human body.
• Since the three-dimensional key points are obtained from the two-dimensional key points, when a frame of images contains no two-dimensional key point of the target human body, it can be determined that the frame contains no three-dimensional key point of the target human body either.
• Whether the three-dimensional key point corresponding to the two-dimensional key points of the target human body in any frame of images is in the preset area may be determined according to the coordinate information in the three-dimensional key point information.
  • images other than valid images may be regarded as invalid images, and the processing of invalid images may also be omitted, so that the accuracy of human depth detection can be improved.
• The preset area may be set in advance according to the actual application scenario. In some embodiments, the distance between the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image and the image acquisition device may be determined according to the coordinate information of that three-dimensional key point. If the distance between the three-dimensional key point and the image acquisition device is greater than a set distance, it is determined that the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is not in the preset area; if the distance is less than or equal to the set distance, it is determined that the three-dimensional key point is in the preset area.
• The Z-axis coordinate value in the coordinate information of a three-dimensional key point represents the distance between the three-dimensional key point and the image acquisition device; therefore, whether this distance is greater than the set distance can be determined according to the coordinate information of the three-dimensional key point.
  • the set distance may be data preset according to actual application requirements.
• In this way, it can be ensured that the three-dimensional key point is a key point that meets the requirements, which is beneficial to subsequently obtaining an accurate depth detection result of the target human body.
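The validity check described above can be sketched as follows: a frame is valid only if 3D key points of the target human body exist and their distance to the camera does not exceed the set distance. Using the Z coordinate as the camera distance follows the preceding paragraph; the function name and default distance are illustrative:

```python
def is_valid_frame(keypoints_3d, set_distance=5.0):
    """Judge whether a frame is a valid image: 3D key points of the
    target human body must exist, and their distance to the image
    acquisition device (taken here as the Z-axis coordinate, per the
    text above) must not exceed `set_distance`."""
    if not keypoints_3d:
        return False     # no 2D (hence no 3D) key points of the target
    return all(z <= set_distance for (_, _, z) in keypoints_3d)

print(is_valid_frame([(0.1, 0.2, 2.5), (0.3, 0.1, 3.0)]))  # in area
print(is_valid_frame([(0.1, 0.2, 7.5)]))                   # too far
print(is_valid_frame([]))                                  # undetected
```

Frames failing this check would be treated as invalid images and skipped, as described above.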
  • the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing can be obtained according to the coordinate information of the two-dimensional key points of the target human body in the valid historical frame images of at least one frame of images.
• In response to the two-dimensional key point of the target human body not being detected in the current frame image, or the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image not being in the preset area, one frame may be selected from the valid historical frame images of the at least one frame of images, and the coordinate information of the two-dimensional key points of the target human body in the selected frame is used as the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image.
• The two-dimensional key points of the target human body in the optimized current frame image can thus be obtained according to the two-dimensional key points of the target human body in the valid historical frame images, which is beneficial to improving the stability of subsequent human body depth detection results.
• An implementation of selecting a frame from the valid historical frame images of the at least one frame of images may be to select, among the valid historical frame images, the frame with the smallest time interval from the current frame image. For example, the at least one frame of images is recorded in chronological order as the 1st to 5th frame images, where the 5th frame image is the current frame image, the 1st to 3rd frame images are valid historical frame images, and the 4th frame image is an invalid historical frame image. In this case, when the 5th frame image contains no two-dimensional key point of the target human body, the 3rd frame image, which has the smallest time interval from the current frame image, can be selected from the 1st to 3rd frame images.
• In this way, the optimized two-dimensional key point information of at least one human body in the current frame image is obtained, which is beneficial to accurately obtaining the two-dimensional key point information of the target human body in the current frame image.
• In response to the two-dimensional key point of the target human body being detected in the current frame image and its corresponding three-dimensional key point being in the preset area, the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image can be obtained according to the current frame image and the valid historical frame images of the at least one frame of images.
• The coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of images may be averaged to obtain the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image.
  • At least one frame of images is recorded as the 6th frame to the 8th frame in chronological order, wherein the 8th frame is the current frame, and the 6th to 8th frames are all valid historical frames.
  • the average calculation can be performed on the coordinate information of the two-dimensional key points of the target human body in the sixth frame image to the eighth frame image, and the result of the average calculation can be used as the updated two-dimensional key point of the target human body in the eighth frame image. Coordinate information.
  • updating the coordinate information of the two-dimensional key points of the target human body in the current frame image according to the coordinate information of the two-dimensional key points of the target human body in the current frame image and the valid historical frame images of at least one frame image is beneficial to The coordinate information of the two-dimensional key points of the current frame image is smoothed.
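The averaging over the 6th to 8th frame images described above can be sketched as follows. Frames are assumed to be equal-length lists of (x, y) key point tuples with corresponding key points at the same index; coordinates are illustrative:

```python
def average_keypoints(valid_frames):
    """Average the coordinates of corresponding 2D key points over the
    current frame and its valid historical frames. `valid_frames` is a
    list of frames, each a list of (x, y) tuples of equal length."""
    n = len(valid_frames)
    return [
        (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        for pts in zip(*valid_frames)   # group the k-th point of each frame
    ]

# One key point over three valid frames (6th-8th in the example above):
frames = [[(10.0, 20.0)], [(12.0, 22.0)], [(14.0, 24.0)]]
print(average_keypoints(frames))
```

The averaged result would replace the current frame's coordinates, smoothing frame-to-frame jitter as described.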
  • FIG. 4A is a schematic diagram of two-dimensional key points of a target human body provided by an embodiment of the present disclosure. As shown in FIG. 4A , circles in the human body represent two-dimensional key points of the target human body in the current frame image.
• The three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image may be determined; in some embodiments, the three-dimensional key points and the human body mask image of the target human body in the current frame image may be displayed at the same time.
• FIG. 4B is a schematic diagram of the three-dimensional key points and the human body mask image of the target human body provided by an embodiment of the present disclosure. As shown in FIG. 4B, the location of point O represents the location of the image acquisition device, and the three coordinate axes of the camera coordinate system are displayed at point O; the human body mask image of the target human body is the outline of the human body shown in FIG. 4B; and the three-dimensional key points corresponding to the two-dimensional key points of the target human body are the filled dots behind the human body mask image of the target human body.
• In this way, the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image can be determined, and the depth detection result of the target human body in the current frame image can then be determined.
  • FIG. 5 is a schematic structural diagram of the implementation of a depth detection method according to an embodiment of the present disclosure.
  • the image acquisition device 501 can send the acquired multi-frame images to the processor 5021 of the electronic device 502.
• The multi-frame images include the current frame image and historical frame images, and are all RGB images. The processor 5021 can perform human body image segmentation on the current frame image of the multi-frame images to obtain at least two human body mask images; it can also perform detection and tracking of human body key points based on the multi-frame images to obtain two-dimensional key point information and three-dimensional key point information of at least two human bodies in the current frame image.
• After the two-dimensional key point information and three-dimensional key point information are obtained, post-processing optimization can also be performed, including the above-mentioned optimization of the key point information and the process of performing interpolation processing on the depth information of the two-dimensional key points.
• Another depth detection method provided by an embodiment of the present disclosure can also be implemented by the structure shown in FIG. 5. The processor 5021 can perform image segmentation of a single human body on the current frame image of the multi-frame images to obtain a human body mask image of the target human body; it can also perform detection and tracking of human body key points based on the multi-frame images to obtain two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image. After the two-dimensional key point information and three-dimensional key point information of the at least one human body in the current frame image are obtained, post-processing optimization can also be performed, which includes the above-mentioned processes of optimizing the two-dimensional key point information and the three-dimensional key point information.
• After the depth detection result of the target human body in the current frame image is determined according to the two-dimensional key point information and three-dimensional key point information of the at least one human body in the current frame image and the mask image of the target human body, a depth map of the target human body is generated based on the depth detection result of the target human body in the current frame image, and the depth map can be displayed on the display interface 5022 of the electronic device 502 to realize human-computer interaction.
  • the display interface 5022 can also display a point cloud corresponding to each pixel in the depth map;
• FIG. 6 is a schematic diagram of a point cloud provided by an embodiment of the present disclosure. In FIG. 6, the dots represent the point cloud composed of pixel points, the bold solid dots represent the key points of the skeleton, and the lines between the bold solid dots represent the skeleton of the human body.
  • the AR effect display may also be performed based on the depth detection result of the human body.
• The positional relationship between the human body and at least one target object in the AR scene may be determined according to the depth detection result of the human body in the current frame image; based on the positional relationship, a combined presentation mode of the human body and the at least one target object is determined; and based on the combined presentation mode, the AR effect of the human body and the at least one target object superimposed is displayed.
• The target object may be an object that actually exists in a real scene, whose depth information may be known or determined from shooting data of the target object; the target object may also be a preset virtual object, whose depth information is predetermined.
• The positional relationship between the at least two human bodies and at least one target object in the AR scene, and the positional relationship between the at least two human bodies, may be determined according to the depth detection results of the at least two human bodies and the depth information of the target object.
• The positional relationship between each human body and the target object in the AR scene may include the following situations: 1) the human body is closer to the image acquisition device than the target object; 2) the target object is closer to the image acquisition device than the human body; 3) the human body is located on the right, left, upper, or lower side of the target object; 4) a part of the human body is closer to the image acquisition device than the target object, and the other part is farther away from the image acquisition device than the target object.
• The positional relationship between at least two human bodies may include the following situations: 1) one human body is closer to the image acquisition device than another human body; 2) one human body is located on the right, left, upper, or lower side of another human body; 3) a part of one human body is closer to the image acquisition device than another human body, and the other part is farther away from the image acquisition device than that human body.
• A combined presentation mode of the at least two human bodies and the at least one target object may be determined so that the combined presentation mode reflects the above positional relationships. In this way, the AR effect of multiple human bodies and at least one target object superimposed can be displayed based on the combined presentation mode, which is beneficial to improving the AR display effect.
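The classification of positional relationships above can be sketched by comparing the depth detection result of a human body with the depth of the target object pixel by pixel: all-in-front, all-behind, or mixed (part in front, part behind). The label strings and list-of-depths representation are illustrative assumptions:

```python
def positional_relation(human_depth, object_depth):
    """Classify the relation between a human body and a target object
    from their per-pixel depths (smaller depth = closer to the image
    acquisition device). Returns 'human_in_front', 'object_in_front',
    or 'mixed' when part of the body is in front and part behind."""
    front = sum(1 for h, o in zip(human_depth, object_depth) if h < o)
    if front == len(human_depth):
        return "human_in_front"
    if front == 0:
        return "object_in_front"
    return "mixed"

print(positional_relation([1.0, 1.2], [2.0, 2.0]))   # body occludes object
print(positional_relation([3.0, 3.5], [2.0, 2.0]))   # object occludes body
print(positional_relation([1.0, 3.0], [2.0, 2.0]))   # partial occlusion
```

The resulting label would drive the combined presentation mode, e.g. which layer is rendered on top in the superimposed AR effect.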
  • the positional relationship between the target human body and at least one target object in the AR scene can be determined according to the depth detection result of the target human body and the depth information of the target object;
• The positional relationship can include the following situations: 1) the target human body is closer to the image acquisition device than the target object; 2) the target object is closer to the image acquisition device than the target human body; 3) the target human body is located on the right, left, upper, or lower side of the target object; 4) a part of the target human body is closer to the image acquisition device than the target object, and the other part is farther away from the image acquisition device than the target object. It should be noted that the above merely exemplifies the positional relationship between the human body and the target object in the AR scene, and the embodiments of the present disclosure are not limited thereto.
  • a combined presentation mode of the target human body and the at least one target object can be determined, so that the combined presentation mode reflects the above positional relationship.
• In this way, displaying the AR effect of the target human body and the at least one target object superimposed is beneficial to enhancing the AR display effect.
  • an embodiment of the present disclosure further provides a depth detection apparatus 7, and the depth detection apparatus 7 may be located in the electronic device 502 described above.
  • FIG. 7 is a schematic structural diagram of a depth detection apparatus 7 according to an embodiment of the present disclosure. As shown in FIG. 7 , the depth detection apparatus 7 may include:
  • the acquisition module 701 is configured to: acquire at least one frame of image collected by the image acquisition device, where the at least one frame of image includes the current frame image;
• The processing module 702 is configured to: perform human body image segmentation on the current frame image to obtain a mask image of the human body; and perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
  • the detection module 703 is configured to: determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image, and the mask image of the human body, wherein,
  • the human body includes a single human body or at least two human bodies.
  • the human body includes at least two human bodies; the detection module 703 is specifically configured as:
  • the depth detection result of each human body in the current frame image is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information belonging to each human body respectively.
• The detection module 703 is specifically configured to use, among the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of one human body whose positional overlap with the mask image of each human body reaches a set value as the two-dimensional key point information of each human body.
  • the detection module 703 is specifically configured as:
• The first pixel represents any pixel in the mask image of each human body except the pixels that overlap with the positions of the two-dimensional key points.
  • the detection module 703 is specifically configured as:
• The two-dimensional key point information of one human body whose positional overlap with the mask image of each human body reaches a set value is used as the two-dimensional key point information of each human body.
  • the detection module 703 is specifically configured as:
• Optimization processing is performed on the two-dimensional key point information of the at least two human bodies in the current frame image and the two-dimensional key point information of the at least two human bodies in the historical frame images, to obtain optimized two-dimensional key point information of the at least two human bodies.
  • the human body includes a single target human body
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information includes coordinate information of two-dimensional key points
  • the detection module 703 is specifically configured as:
• In response to the two-dimensional key point of the target human body being detected in the current frame image, and the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image being in the preset area, the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image is obtained according to the current frame image and the valid historical frame images of the at least one frame of images.
  • the detection module 703 is specifically configured to average the coordinate information of the two-dimensional key points of the target human body in the current frame image and the valid historical frame images in the at least one frame image. The calculation is performed to obtain the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing.
  • the detection module 703 is further configured to: in response to detecting a two-dimensional key point of the target human body from any frame of the at least one frame of images, and to detect the any frame In the case where the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the image is in the preset area, it is determined that any one frame of image is a valid image.
  • the detection module 703 is specifically configured as:
  • the distance between the three-dimensional key points corresponding to the two-dimensional key points of the target human body in the current frame image and the image acquisition device is determined ;
• If the distance is less than or equal to the set distance, it is determined that the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is in the preset area.
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information of the target human body in the current frame image is obtained;
  • the depth detection result of the target human body in the current frame image is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image.
  • the detection module 703 is specifically configured as:
• From the two-dimensional key point information of the at least one human body, the two-dimensional key point information of the target human body is determined; the two-dimensional key point information of the target human body is the two-dimensional key point information of a human body whose degree of positional overlap with the human body mask image of the target human body reaches a set value.
  • the detection module 703 is specifically configured as:
  • the pixel point set includes: the pixel points of the human body mask image of the target human body after filtering processing according to a preset filtering method.
  • the detection module 703 is further configured to:
  • the connected area of the two-dimensional key points is searched based on the two-dimensional key points of the target human body in the current frame image, and the two-dimensional key points in the human body mask image of the target human body are searched. Pixels not included in the connected region are deleted to obtain the set of pixels.
  • processing module 702 is further configured to:
  • an AR effect superimposed on the human body and the at least one target object is displayed.
  • the two-dimensional key point information is a two-dimensional key point representing a human skeleton
  • the three-dimensional key point information is a three-dimensional key point representing a human skeleton
  • At least one frame of image collected by the image collection device is an RGB image.
  • the acquisition module 701, the processing module 702 and the detection module 703 can all be implemented by a processor in an electronic device, and the above-mentioned processor can be an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, at least one of the microprocessors.
• If the above-mentioned method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure essentially or the parts that make contributions to the prior art can be embodied in the form of a software product, and the computer software product is stored in a storage medium and includes several instructions for A computer device (which may be a terminal, a server, etc.) is caused to execute all or part of the methods of various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a U disk, a mobile hard disk, a read only memory (Read Only Memory, ROM), a magnetic disk or an optical disk and other media that can store program codes.
  • an embodiment of the present disclosure further provides a computer program product, where the computer program product includes computer-executable instructions, and the computer-executable instructions are used to implement the depth detection method provided by the embodiment of the present disclosure.
  • an embodiment of the present disclosure further provides a computer storage medium, where computer-executable instructions are stored on the computer storage medium, and the computer-executable instructions are used to implement the depth detection method provided by the foregoing embodiments.
  • FIG. 8 is a schematic structural diagram of the electronic device 502 provided by an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 502 includes:
  • the processor 5021 is configured to execute the executable instructions stored in the memory 801, so as to implement any one of the above depth detection methods.
  • the memory 801 is configured to store computer programs and applications to be executed by the processor 5021, and can also cache data to be processed or already processed by the processor 5021 and the various modules in the electronic device (for example, image data, audio data, voice communication data and video communication data); it can be implemented by flash memory (FLASH) or random access memory (Random Access Memory, RAM).
  • the above-mentioned processor 5021 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor. It can be understood that the electronic device implementing the function of the above processor may also be another device, which is not limited by the embodiments of the present disclosure.
  • the above-mentioned computer-readable storage medium/memory can be a ROM, a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM); it can also be any terminal including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, a personal digital assistant, etc.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces; the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit; it may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present disclosure.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may be used separately as one unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.
  • if the above-mentioned integrated units of the present disclosure are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure, in essence or the parts contributing to the related technologies, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a device to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
  • Embodiments of the present disclosure disclose a depth detection method, device, electronic device, storage medium, and program.
  • the method includes: acquiring at least one frame of image collected by an image acquisition device, where the at least one frame of image includes a current frame image;
  • human body image segmentation is performed on the current frame image to obtain a mask image of the human body;
  • human body key point detection is performed on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
  • according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, the depth detection result of the human body in the current frame image is determined, wherein the human body includes a single human body or at least two human bodies.
  • the depth detection method provided by the embodiments of the present disclosure can realize depth detection of a human body in an image without relying on special hardware devices such as a 3D depth camera, and can be applied to scenarios such as AR interaction and virtual photographing.


Abstract

The present disclosure provides a depth detection method, apparatus, electronic device, computer-readable storage medium and program. The method includes: acquiring at least one frame of image collected by an image acquisition device, the at least one frame of image including a current frame image; performing human body image segmentation on the current frame image to obtain a mask image of the human body; performing human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image; and determining a depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies.

Description

深度检测方法、装置、电子设备、存储介质及程序
相关申请的交叉引用
本公开基于申请号为202011344694.2、申请日为2020年11月24日、申请名称为“深度检测方法、装置、电子设备和计算机可读存储介质”的中国专利申请,以及申请号为202011335257.4、申请日为2020年11月24日、申请名称为“深度检测方法、装置、电子设备和计算机可读存储介质”的中国专利申请提出,并要求上述中国专利申请的优先权,上述中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及计算机视觉技术领域,具体而言,涉及但不限于一种深度检测方法、装置、电子设备、存储介质及计算机程序。
背景技术
在相关技术中,图像的深度检测技术在增强现实(Augmented Reality,AR)交互、虚拟拍照等应用中有着重要应用;在缺少三维深度相机等特殊硬件设备的场景中,如何实现图像的人体深度检测,是亟待解决的技术问题。
发明内容
本公开实施例提供了一种深度检测方法、装置、电子设备、存储介质及计算机程序。
本公开实施例的技术方案是这样实现的:
本公开实施例提供了一种深度检测方法,所述方法应用于电子设备,所述方法包括:
获取图像采集设备采集的至少一帧图像,所述至少一帧图像包括当前帧图像;
对所述当前帧图像进行人体图像的分割,得到人体的掩膜图像;对所述至少一帧图像进行人体关键点的检测,得出所述当前帧图像中人体的二维关键点信息和三维关键点信息;
根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,其中,所述人体包括单个人体或者至少两个人体。
本公开实施例还提供了一种深度检测装置,所述装置包括:
获取模块配置为:获取图像采集设备采集的至少一帧图像,所述至少一帧图像包括当前帧图像;
处理模块配置为:对所述当前帧图像进行人体图像的分割,得到人体的掩膜图像;对所述至少一帧图像进行人体关键点的检测,得出所述当前帧图像中人体的二维关键点信息和三维关键点信息;
检测模块配置为:根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,其中,所述人体包括单个人体或者至少两个人体。
本公开实施例还提供了一种电子设备,所述电子设备包括:
存储器,用于存储可执行指令;
处理器,用于执行所述存储器中存储的可执行指令时,以实现上述任意一种深度检测方法。
本公开实施例还提供了一种计算机可读存储介质,存储有可执行指令,用于被处理器执行时,实现上述任意一种深度检测方法。
本公开实施例还提供了一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行用于实现如前任一项所述的深度检测 方法。
本公开实施例可以结合人体的人体掩膜图像以及人体的二维关键点和三维关键信息来确定人体的深度检测结果,无需通过三维深度相机等特殊硬件设备获取图像中人体的深度信息,因而,本公开实施例可以在不依赖于三维深度相机等特殊硬件设备的情况下,实现图像中的人体的深度检测,可以应用于AR交互、虚拟拍照等场景。
关于上述深度检测方法、装置、电子设备、存储介质及计算机程序的效果描述参见上述深度检测方法的说明,这里不再赘述。
为使本公开的上述目的、特征和优点能更明显易懂,下文特结合实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1为本公开实施例提供的终端与服务器连接示意图;
图2为本公开实施例提供的一种深度检测方法的流程示意图;
图3是本公开实施例提供的人体骨架二维关键点的示意图;
图4A为本公开实施例提供的目标人体的二维关键点的示意图;
图4B为本公开实施例提供的三维关键点和目标人体的人体掩膜图像的示意图;
图5为本公开实施例提供的一种深度检测方法实现的结构示意图;
图6为本公开实施例提供的点云的示意图;
图7为本公开实施例提供的一种深度检测装置的结构示意图;
图8为本公开实施例提供的电子设备的结构示意图。
具体实施方式
以下结合附图及实施例,对本公开进行进一步详细说明。应当理解,此处所提供的实施例仅仅用以解释本公开,并不用于限定本公开。另外,以下所提供的实施例是用于实施本公开的部分实施例,而非提供实施本公开的全部实施例,在不冲突的情况下,本公开实施例记载的技术方案可以任意组合的方式实施。
需要说明的是,在本公开实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的方法或者装置不仅包括所明确记载的要素,而且还包括没有明确列出的其他要素,或者是还包括为实施方法或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括该要素的方法或者装置中还存在另外的相关要素(例如方法中的步骤或者装置中的单元,例如的单元可以是部分电路、部分处理器、部分程序或软件等等)。
例如,本公开实施例提供的深度检测方法包含了一系列的步骤,但是本公开实施例提供的深度检测方法不限于所记载的步骤,同样地,本公开实施例提供的深度检测装置包括了一系列模块,但是本公开实施例提供的装置不限于包括所明确记载的模块,还可以包括为获取相关信息、或基于信息进行处理时所需要设置的模块。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或C,可以表示:单独存在A,同时存在A和C,单独存在C这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、C、D中的至少一种,可以表示包括从A、C和D构成的集合中选择的任意一个或多个元素。
在相关技术中,可以利用三维深度相机等特殊硬件,实现图像中人体的深度检测,这里的三维深度相机可以是具有双目摄像头并采用双目视觉技术获取深度信息的相机;但是,使用这些特殊硬件, 会提高应用成本,在一定程度上限制了应用场景。
在基于单目相机拍摄的图像进行人体深度估计的情况下,对深度估计的精度、所提供的信息量的要求是比较低的;在基于单目相机拍摄的图像进行人体深度估计的情况下,只能估计人体各个像素点之间的相对深度,不能估计人体像素点与相机之间的深度,在一定程度限制了应用范围;在有些情况下,只能针对人体的各个像素点估计出单一的深度,因此,估计出的深度信息较少;在有些情况下,可以基于连续帧的图像匹配算法实现深度信息估计,但是,这种方案增加了时间资源和计算资源的消耗,不能适用于低功耗的实时应用场景。
针对上述技术问题,本公开实施例提供一种深度检测方法、装置、电子设备、存储介质及计算机程序,本公开实施例提供的深度检测方法,能够在不依赖于三维深度相机等高成本复杂硬件设备的情况下,实现图像中的人体深度检测;本公开实施例提供的深度检测方法可以应用电子设备中,下面说明本公开实施例提供的电子设备的示例性应用。
在一些实施例中,本公开实施例提供的电子设备可以为AR眼镜、笔记本电脑、平板电脑、台式计算机、移动设备(例如,移动电话,便携式音乐播放器,个人数字助理,专用消息设备,便携式游戏设备)等各种具有图像采集设备的终端,图像采集设备可以是单目相机等设备,示例性地,终端可以是带有摄像头的手机。
示例性的,终端在接收到图像采集设备采集的图像后,可以按照本公开实施例的深度检测方法对图像采集设备采集的图像进行深度检测,得到图像中人体的深度检测结果。
在一些实施例中,本公开实施例提供的电子设备也可以为与上述终端形成通信连接的服务器。图1为本公开实施例提供的终端与服务器连接示意图,如图1所示,终端100通过网络101连接服务器102,网络101可以是广域网或者局域网,又或者是二者的组合。
在一些实施例中,服务器102可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式***,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接,本公开实施例中不做限制。
终端100用于通过图像采集设备采集当前移动位置处的图像;可以将采集到的图像发送至服务器102;服务器102在收到图像后,可以按照本公开实施例的深度检测方法对接收到的图像进行深度检测,得到图像中人体的深度检测结果。
下面结合上述记载的内容,说明本公开实施例的深度检测方法。
图2为本公开实施例提供的一种深度检测方法的流程示意图,该方法应用于电子设备中。如图2所示,该流程可以包括步骤201至步骤203:
步骤201:获取图像采集设备采集的至少一帧图像,至少一帧图像包括当前帧图像。
本公开实施例中,图像采集设备可以采集图像,还可以将包括当前帧图像的至少一帧图像发送至电子设备的处理器。
在一些实施例中,至少一帧图像包括当前帧图像(当前时刻采集的一帧图像);在一些实施例中,至少一帧图像不仅包括当前帧图像,还包括历史帧图像,这里,历史帧图像表示图像采集设备采集的一帧或多帧历史图像。
在一些实施例中,在至少一帧图像为多帧图像的情况下,至少一帧图像可以是图像采集设备连续采集的连续帧图像,也可以是不连续的多帧图像,本公开实施例对此并不进行限定。
步骤202:对当前帧图像进行人体图像的分割,得到人体的掩膜图像;对至少一帧图像进行人体关键点的检测,得出当前帧图像中人体的二维关键点信息和三维关键点信息。
在一些实施例中,上述人体包括至少两个人体;相应地,对当前帧图像进行人体图像的分割,得到人体的掩膜图像的实现方式可以是:对当前帧图像进行人体图像的分割,得到至少两个人体的掩膜图像;并对至少一帧图像进行人体关键点的检测,得出当前帧图像中至少两个人体的二维关键点信息和三维关键点信息。
在一些实施例中,上述人体包括单个的目标人体;相应地,对当前帧图像进行人体图像的分割,得到人体的掩膜图像的实现方式可以是:对当前帧图像进行人体图像的分割,得到目标人体的掩膜图像;并对至少一帧图像进行人体关键点的检测,得出当前帧图像中至少一个人体的二维关键点信息和三维关键点信息。
本公开实施例中,可以按照预先训练的图像分割模型,对当前帧图像进行人体图像的分割,得到人体的人体掩膜图像。
本公开实施例中,图像分割模型可以是与人体图像的属性相关的模型,在一些实施例中,人体图像的属性可以包括面积、像素点的灰度值或其它属性;在一些实施例中,在人体的属性为面积的情况下,按照预先训练的图像分割模型,对当前帧图像进行人体图像的分割,可以得到面积大于设定面积的人体的掩膜图像。
本公开实施例,图像分割模型可以通过神经网络实现,例如,图像分割模型可以通过全卷积神经网络或其它神经网络实现。
本公开实施例,图像分割模型可以根据实际需求预先确定,实际需求包括但不限于耗时需求、精度需求等;即可以根据不同的实际需求,设置不同的图像分割模型。
示例性地,在上述人体包括至少两个人体的情况下,图像分割模型为至少两个人体的图像分割模型。
在上述人体包括单个的目标人体的情况下,图像分割模型为单个人体的图像分割模型,如此,利用单个人体的图像分割模型对当前帧图像进行单个人体图像的分割,可以得到目标人体的人体掩膜图像;这样,基于预先训练的单个人体的图像分割模型,可以直接得到目标人体的人体掩膜图像,具有便于实现的特点。
在一些实施例中,在当前帧图像中包括多个人体图像的情况下,利用单个人体的图像分割模型,可以从当前帧图像分割出目标人体的人体掩膜图像。
在一些实施例中,在人体的属性为面积的情况下,按照预先训练的单个人体的图像分割模型,对当前帧图像进行单个人体图像的分割,可以得到表征面积最大的一个人体的目标人体的人体掩膜图像。
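The area-based selection described above can be sketched as follows. `largest_body_mask` is a hypothetical helper, a minimal illustration assuming the segmentation model has already produced one binary mask per candidate body:

```python
import numpy as np

def largest_body_mask(masks):
    """Of the candidate body masks, keep the one with the largest area
    (pixel count), matching the single-body segmentation described above."""
    areas = [int(m.sum()) for m in masks]
    return masks[int(np.argmax(areas))]

small = np.zeros((5, 5), dtype=int); small[0:2, 0:2] = 1   # area 4
large = np.zeros((5, 5), dtype=int); large[1:5, 1:5] = 1   # area 16
print(int(largest_body_mask([small, large]).sum()))  # 16
```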
需要说明的是,上述记载的内容仅仅是对图像分割模型的示例性说明,本公开实施例并不局限于此。
本公开实施例中,二维关键点用于表征在图像平面内的人体关键位置点;二维关键点信息可以包括二维关键点的坐标信息,二维关键点的坐标信息包括横坐标和纵坐标。
三维关键点信息可以包括三维关键点的坐标信息,这里,三维关键点的坐标信息表示三维关键点在相机坐标系的坐标,其中,相机坐标系表示以图像采集设备的聚焦中心为原点,以图像采集设备的光轴为Z轴建立的三维直角坐标系,相机坐标系的X轴和Y轴为图像平面的两个互相垂直的坐标轴。
在一些实施例中，在确定二维关键点信息之后，可以根据二维关键点信息，确定出二维关键点对应的三维关键点，并确定三维关键点的坐标信息；示例性的，可以预先训练关键点转换模型，该关键点转换模型用于实现二维关键点至三维关键点的转换；这样，在得到训练完成的关键点转换模型后，可以将二维关键点的坐标信息输入至训练完成的关键点转换模型，得到二维关键点对应的三维关键点以及三维关键点的坐标信息。需要说明的是，上述记载的内容仅仅是对得出三维关键点的坐标信息的示例性说明，本公开实施例并不局限于此。
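The keypoint conversion model above can be sketched as a tiny fully-connected lifting network. This is only an interface illustration with random placeholder weights, since the disclosure does not fix a network structure (it may equally be a temporal convolutional network):

```python
import numpy as np

def lift_keypoints_2d_to_3d(kp2d, w1, b1, w2, b2):
    """Toy fully-connected lifting network (hypothetical weights):
    flattens J x 2 pixel keypoints, applies one hidden ReLU layer, and
    outputs J x 3 coordinates in the camera coordinate system."""
    x = kp2d.reshape(-1)              # (J*2,)
    h = np.maximum(0.0, w1 @ x + b1)  # hidden layer with ReLU
    out = w2 @ h + b2                 # (J*3,)
    return out.reshape(-1, 3)

rng = np.random.default_rng(0)
J = 17                                       # 17 skeleton keypoints, as in FIG. 3
w1 = rng.standard_normal((64, J * 2)) * 0.01
b1 = np.zeros(64)
w2 = rng.standard_normal((J * 3, 64)) * 0.01
b2 = np.zeros(J * 3)
kp2d = rng.uniform(0, 640, size=(J, 2))      # 2D keypoints in pixel coordinates
kp3d = lift_keypoints_2d_to_3d(kp2d, w1, b1, w2, b2)
print(kp3d.shape)  # (17, 3): X, Y, Z per keypoint
```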
本公开实施例中,并不对关键点转换模型的网络结构进行限定,例如,关键点转换模型可以是时序的卷积网络或者非时序的全连接网络;关键点转换模型的网络结构可以根据实际应用需求预先确定。
在一些实施例中,在上述至少一帧图像为多帧图像的情况下,可以对至少一帧图像进行人体关键点的检测和跟踪,得到当前帧图像中至少一个人体的二维关键点信息和三维关键点信息;可以理解地,基于多帧图像进行人体关键点的跟踪,有利于准确地得出当前帧图像中至少一个人体的二维关键点信息,进而有利于得到准确的三维关键点信息。
在一些实施例中,在上述至少一帧图像为连续帧图像的情况下,可以对连续帧图像进行人体关键点的检测和跟踪,得到当前帧图像中至少一个人体的二维关键点信息和三维关键点信息;可以理解地,基于连续帧图像进行人体关键点的跟踪,有利于进一步准确地得出当前帧图像中至少一个人体的二维关键点信息,进而有利于得到准确的三维关键点信息。
步骤203:根据当前帧图像中人体的二维关键点信息和三维关键点信息、以及人体的掩膜图像,确定当前帧图像中人体的深度检测结果。
在一些实施例中,上述人体包括至少两个人体;相应地,步骤203的实现方式可以是:根据当前帧图像中至少两个人体的二维关键点信息和三维关键点信息、以及至少两个人体的掩膜图像,确定当前帧图像中所述至少两个人体的深度检测结果。
在一些实施例中,上述人体包括单个的目标人体;相应地,步骤203的实现方式可以是:根据当前帧图像中至少一个人体的二维关键点信息和三维关键点信息、以及目标人体的人体掩膜图像,确定当前帧图像中目标人体的深度检测结果。
在实际应用中,上述步骤201至步骤203可以基于电子设备的处理器实现,上述处理器可以是特定用途集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理装置(Digital Signal Processing Device,DSPD)、可编程逻辑装置(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器中的至少一种。可以理解地,实现上述处理器功能的电子器件还可以为其它,本公开实施例不作限制。
可以看出,本公开实施例可以结合人体的人体掩膜图像以及人体的二维关键点和三维关键信息来确定多个人体的深度检测结果,无需通过三维深度相机等特殊硬件设备获取图像中人体的深度信息,因而,本公开实施例可以在不依赖于三维深度相机等特殊硬件设备的情况下,实现图像中的人体的深度检测,可以应用于AR交互、虚拟拍照等场景。
进一步地,本公开实施例可以得出人体各个像素点与相机之间的深度信息,并不是针对人体的各个像素点估计出一个单一的深度,得出的深度信息较为丰富,可以应用于多个场景中,例如,本公开实施例的应用范围包括但不限于:三维人体重建中动态人体的三维重建和呈现;增强现实应用中人体和虚拟场景的遮挡显示;增强现实应用中人体和重建场景的交互等;进一步地,本公开实施例并不是基于连续帧的图像匹配算法直接估计人体像素点的深度信息,而是利用人体的二维关键点信息和三维关键点信息确定人体像素点的深度信息,与基于连续帧的图像匹配算法实现深度信息估计的方案相比,降低了时间资源和计算资源的消耗,在深度信息的估计精度和确定深度信息的耗时之间取得了平衡。
在一些实施例中,上述图像采集设备采集的至少一帧图像为RGB图像;可以看出,本公开实施例可以基于容易获取的RGB图像实现多个人体的深度检测,具有容易实现的特点。
在一些实施例中,在上述人体包括至少两个人体的情况下,可以通过将所述至少两个人体中每个人体的二维关键点信息与所述至少两个人体的掩膜图像中每个人体的掩膜图像进行匹配,得到分别属于每个人体的二维关键点信息;然后,根据分别属于每个人体的二维关键点信息所对应的三维关键点信息,确定当前帧图像中每个人体的深度检测结果。
可以看出,本公开实施例通过将当前帧图像中至少两个人体的二维关键点信息与每个人体的掩膜图像进行匹配,可以直接得出每个人体的二维关键点信息,进而确定每个人体的深度检测结果。
本公开实施例中,上述二维关键点信息为表示人体骨架的二维关键点,三维关键点信息为表示人体骨架的三维关键点。
人体骨架的二维关键点用于表征在图像平面内的人体关键位置点,人体关键位置点包括但不限于五官、颈、肩、肘、手、臀、膝、脚等;人体关键位置可以根据实际情况预先设置;示例性地,参照图3,人体骨架的二维关键点可以表示14个人体关键位置或17个人体关键位置,图3中,空心圆圈表示14个人体关键位置,空心圆圈和实心圆点共同表示17个人体关键位置。
可以看出，本公开实施例可以得出每个人体骨架的二维关键点，并基于每个人体骨架的二维关键点确定每个人体的深度检测结果，由于图像中不同人体的深度检测依赖于不同人体的骨架的二维关键点，不同人体的骨架的二维关键点的相关性较小，因而，本公开实施例可以实现图像中至少两个人体的深度检测。
在一些实施例中,可以在上述至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
这里,设定值可以是根据实际应用场景预先设置的数值,例如,设定值可以为在80%至90%之间的任一数值;本公开实施例中,可以根据至少两个人体中每个人体的二维关键点的坐标信息、以及人体的掩膜图像的位置信息,确定每个人体的二维关键点信息与每个人体的掩膜图像的重叠度。
在一些实施例中,对于任意一个人体的掩膜图像,如果至少两个人体的二维关键点信息与该掩膜图像的位置重叠度达到设定值,则可以在上述至少两个人体的二维关键点信息中,选取与该掩膜图像的位置重叠度最高的一个人体的二维关键点信息。
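The overlap-based matching above can be sketched as follows. Overlap is measured here as the fraction of a candidate body's 2D keypoints that land on the mask, and the threshold `0.8` stands in for the "set value" (the text gives 80%-90% as examples); both the helper name and the metric are illustrative assumptions:

```python
import numpy as np

def match_keypoints_to_mask(bodies_kp2d, mask, threshold=0.8):
    """Return the index of the body whose 2D keypoints overlap the mask
    most, provided the overlap reaches the threshold; otherwise None."""
    best_idx, best_overlap = None, 0.0
    for i, kp in enumerate(bodies_kp2d):
        xs = np.clip(kp[:, 0].astype(int), 0, mask.shape[1] - 1)
        ys = np.clip(kp[:, 1].astype(int), 0, mask.shape[0] - 1)
        overlap = mask[ys, xs].mean()   # fraction of keypoints on the mask
        if overlap >= threshold and overlap > best_overlap:
            best_idx, best_overlap = i, overlap
    return best_idx

mask = np.zeros((10, 10), dtype=float)
mask[2:8, 2:8] = 1.0                                 # one body's mask region
body_a = np.array([[3, 3], [4, 5], [6, 6], [7, 7]])  # keypoints inside the mask
body_b = np.array([[0, 0], [1, 9], [9, 1], [9, 9]])  # keypoints outside the mask
print(match_keypoints_to_mask([body_a, body_b], mask))  # 0
```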
可以看出,本公开实施例中,可以根据二维关键点信息与每个人体的掩膜图像的位置重叠度,直接确定出每个人体的二维关键点信息,有利于准确地得到每个人体的二维关键点信息。
在一些实施例中,可以对上述当前帧图像中至少两个人体的二维关键点信息进行优化处理,得到优化处理后的至少两个人体的二维关键点信息;然后,在优化处理后的至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
对于对当前帧图像中至少两个人体的二维关键点信息进行优化处理,得到优化处理后的至少两个人体的二维关键点信息的实现方式,示例性地,可以在至少一帧图像还包括历史帧图像的情况下,对当前帧图像中至少两个人体的二维关键点信息和历史帧图像中至少两个人体的二维关键点信息进行处理,得到优化处理后的至少两个人体的二维关键点信息。
在一些实施例中,可以对当前帧图像中至少两个人体的二维关键点信息和历史帧图像中至少两个人体的二维关键点信息进行时序滤波处理,得到滤波处理后的至少两个人体的二维关键点;时序滤波处理的方法包括但不限于时序低通滤波、时序扩展卡尔曼滤波;在另一些实施例中,可以对当前帧图像中至少两个人体的二维关键点信息和历史帧图像中至少两个人体的二维关键点信息进行骨架肢体长度最优化处理,得到滤波处理后的至少两个人体的二维关键点信息。
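As one concrete instance of the temporal low-pass filtering mentioned above, an exponential moving average over per-frame keypoint coordinates can be sketched as follows (the smoothing factor `alpha` is an assumed value; an extended Kalman filter could be substituted):

```python
def smooth_keypoints(history, alpha=0.5):
    """Exponential moving average over the 2D keypoint coordinates of
    consecutive frames; returns the smoothed current-frame keypoints."""
    smoothed = history[0]
    for frame in history[1:]:
        smoothed = [(alpha * x + (1 - alpha) * sx,
                     alpha * y + (1 - alpha) * sy)
                    for (x, y), (sx, sy) in zip(frame, smoothed)]
    return smoothed

# one keypoint tracked over three frames (oldest first)
history = [[(100.0, 200.0)], [(104.0, 196.0)], [(108.0, 204.0)]]
print(smooth_keypoints(history))  # [(105.0, 201.0)]
```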
可以理解地,结合历史帧图像中至少两个人体的二维关键点信息对当前帧图像中至少两个人体的二维关键点信息进行优化处理,有利于提升二维关键点信息的时序稳定性,进而有利于提升人体深度检测时序稳定性。
对于根据分别属于每个人体的二维关键点信息所对应的三维关键点信息,确定当前帧图像中每个人体的深度检测结果的实现方式,示例性地,可以确定每个人体的二维关键点信息所对应的三维关键点信息的坐标信息;根据三维关键点的坐标信息,确定每个人体的二维关键点的深度信息;对每个人体的二维关键点的深度信息进行插值处理,得到每个人体的掩膜图像中第一像素点的深度信息;其中,第一像素点,表示所述每个人体的掩膜图像中除与二维关键点位置重叠的像素点之外的任一像素点。
示例性的,由于二维关键点与三维关键点对应,因而,可以将每个人体的三维关键点信息的坐标信息作为每个人体的二维关键点的深度信息,这里,二维关键点的深度信息表示与二维关键点位置重叠的像素点的深度信息。
如果每个人体的掩膜图像中任意一个像素点不是与二维关键点位置重叠的像素点,则可以认为上述任意一个像素点为第一像素点,此时,可以通过对每个人体的二维关键点的深度信息进行插值处理,得到每个人体的掩膜图像中第一像素点的深度信息。
插值是离散函数逼近的重要方法,利用插值可通过函数在有限个点处的取值状况,估算出函数在其它点处的近似值。在一些实施例中,可以在预设的空间连续性约束条件下,基于插值处理方式,获得每个人体的掩膜图像中完整的像素点的深度信息。
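One of the interpolation schemes mentioned later in the text, nearest-neighbour completion, can be sketched as follows: every mask pixel takes the depth of its closest 2D keypoint. The helper name and shapes are illustrative assumptions:

```python
import numpy as np

def fill_mask_depth(mask, kp_xy, kp_depth):
    """Nearest-neighbour completion: each pixel of the body mask is
    assigned the depth of the closest 2D keypoint."""
    ys, xs = np.nonzero(mask)
    pix = np.stack([xs, ys], axis=1).astype(float)
    # distance from every mask pixel to every keypoint
    d = np.linalg.norm(pix[:, None, :] - kp_xy[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    depth = np.zeros_like(mask, dtype=float)
    depth[ys, xs] = kp_depth[nearest]
    return depth

mask = np.ones((4, 4), dtype=int)            # toy 4x4 body mask
kp_xy = np.array([[0.0, 0.0], [3.0, 3.0]])   # two keypoints (x, y)
kp_depth = np.array([2.0, 4.0])              # their Z values from the 3D keypoints
depth = fill_mask_depth(mask, kp_xy, kp_depth)
print(depth[0, 0], depth[3, 3])  # 2.0 4.0
```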
在一些实施例中,在得到每个人体的掩膜图像中各个像素点的深度信息后,还可以对每个人体的掩膜图像中各个像素点的深度信息进行平滑滤波处理。
在一些实施例中,在得到每个人体的掩膜图像中各个像素点的深度信息后,还可以基于各个像素点的深度信息生成每个人体的深度图,并可以将深度图展示在电子设备的显示界面中。
可以看出,本公开实施例对于每个人体的掩膜图像的任意像素点,均可以确定深度信息,可以全面地实现图像中每个人体的深度检测。
对于对每个人体的二维关键点的深度信息进行插值处理,得到每个人体的掩膜图像中第一像素点的深度信息的实现方式,示例性地,可以根据每个人体的二维关键点的深度信息,确定出用于表征像素点位置和像素点深度信息的关系的离散函数;根据每个人体的二维关键点的深度信息,补充离散函数在所述第一像素点的位置的取值,将离散函数在所述第一像素点的位置的取值确定为所述第一像素点的深度信息。
本公开实施例中,上述记载的内容仅仅是对插值处理的原理进行了说明,并未对插值处理的具体实现方式进行限定,示例性地,插值处理的具体实现方式包括但不限于最近邻插值补全、基于广度优先搜索的插值补全等。
可以理解地,通过对每个人体的二维关键点的深度信息进行插值处理,可以满足每个人体的像素点深度信息的空间连续性要求,进而有利于提升人体深度检测结果的空间连续性。
在一些实施例中,在上述人体包括单个的目标人体的情况下,可以通过将当前帧图像中至少一个人体的二维关键点信息与目标人体的人体掩膜图像进行匹配,得到当前帧图像中目标人体的二维关键点信息;然后,根据当前帧图像中目标人体的二维关键点信息对应的三维关键点信息,确定当前帧图像中所述目标人体的深度检测结果。
可以看出,本公开实施例通过将当前帧图像中至少一个人体的二维关键点信息与目标人体的人体掩膜图像进行匹配,可以直接得出目标人体的二维关键点信息,进而确定目标人体的深度检测结果,即,可以在不依赖于三维深度相机等特殊硬件设备的情况下,实现图像中的目标人体的深度检测。
在一些实施例中,可以在上述至少一个人体的二维关键点信息中,确定目标人体的二维关键点信息;目标人体的二维关键点信息为:与目标人体的人体掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息。
本公开实施例中,可以根据至少一个人体中每个人体的二维关键点的坐标信息、以及目标人体的人体掩膜图像的位置信息,确定每个人体的二维关键点信息与目标人体的人体掩膜图像的重叠度。
在一些实施例中,如果多个人体的二维关键点信息与目标人体的人体掩膜图像的位置重叠度达到设定值,则可以在上述多个人体的二维关键点信息中,选取与目标人体的人体掩膜图像的位置重叠度最高的一个人体的二维关键点信息,将选取的一个人体的二维关键点信息作为目标人体的二维关键点信息。
可以看出,本公开实施例中,可以根据二维关键点信息与目标人体的人体掩膜图像的位置重叠度,直接确定出目标人体的二维关键点信息,有利于准确地得到目标人体的二维关键点信息。
对于根据当前帧图像中目标人体的二维关键点信息对应的三维关键点信息,确定当前帧图像中所述目标人体的深度检测结果的实现方式,示例性地,可以确定当前帧图像中目标人体的二维关键点信息对应的三维关键点信息的坐标信息;根据三维关键点的坐标信息,确定目标人体的二维关键点的深度信息;在与目标人体的二维关键点位置重叠的像素点中,确定与第一像素点相邻的像素点,将与第一像素点相邻的像素点的深度信息作为第一像素点的深度信息;其中,第一像素点表示目标人体的人体掩膜图像或像素点集合中除与目标人体的二维关键点位置重叠的像素点之外的任一像素点,像素点集合包括:对目标人体的人体掩膜图像的像素点按照预设过滤方式进行过滤处理后的像素点。
示例性的,由于二维关键点与三维关键点对应,因而,可以将目标人体的三维关键点的坐标信息作为目标人体的二维关键点的深度信息。
如果目标人体的人体掩膜图像或像素点集合中任意一个像素点不是与二维关键点位置重叠的像素点，则可以认为上述任意一个像素点为第一像素点，此时，可以直接将与第一像素点相邻的像素点的深度信息作为第一像素点的深度信息；也就是说，对于第一像素点，可以在与二维关键点位置重叠的像素点中选取与第一像素点邻近的像素点，基于选取的像素点对应的三维关键点的Z轴坐标值，确定第一像素点的深度信息。
可以看出,本公开实施例对于目标人体的人体掩膜图像或像素点集合的任意像素点,均可以确定深度信息,可以全面地实现图像中目标人体的深度检测。
对于像素点集合的确定方式,示例性地,可以在目标人体的人体掩膜图像内,基于当前帧图像中目标人体的二维关键点搜索二维关键点的连通区域,将目标人体的人体掩膜图像中连通区域不包含的像素点删除,得到像素点集合。
在一些实施例中,在目标人体的人体掩膜图像内,以当前帧图像中目标人体的二维关键点为种子点,进行广度优先搜索,从而确定当前帧图像中目标人体的二维关键点搜索所述二维关键点的连通区域。
这里,目标人体的人体掩膜图像中连通区域不包含的像素点是在二维关键点基础上搜索不到的像素点,而二维关键点表示人体中的关键位置,因而,目标人体的人体掩膜图像中连通区域不包含的像素点可以认为是错误的像素点;通过将目标人体的人体掩膜图像中连通区域不包含的像素点删除,有利于提升目标人体的深度检测的准确度。
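The breadth-first search from the keypoint seed points described above can be sketched as follows; mask pixels not reachable from any seed are treated as erroneous and dropped (4-connectivity is an assumption, as is the helper name):

```python
from collections import deque

def keep_connected_to_keypoints(mask, seeds):
    """BFS flood fill from the 2D keypoints (seed points); returns a mask
    keeping only pixels connected to at least one seed."""
    h, w = len(mask), len(mask[0])
    kept = [[0] * w for _ in range(h)]
    q = deque((y, x) for x, y in seeds if mask[y][x])
    for y, x in q:
        kept[y][x] = 1
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not kept[ny][nx]:
                kept[ny][nx] = 1
                q.append((ny, nx))
    return kept

mask = [
    [1, 1, 0, 1],   # the isolated pixel at (x=3, y=0) is not connected
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
kept = keep_connected_to_keypoints(mask, seeds=[(0, 0)])  # seed = keypoint (x, y)
print(kept[0][3], kept[1][1])  # 0 1: stray pixel removed, connected pixel kept
```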
在一些实施例中,根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果的实现方式可以是:
对当前帧图像中至少一个人体的二维关键点信息和三维关键点信息进行优化,得到优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息;基于优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息、以及所述目标人体的人体掩膜图像,确定所述当前帧图像中所述目标人体的深度检测结果。
这里，可以首先得出优化处理后的所述当前帧图像中至少一个人体的二维关键点信息，然后，可以根据优化处理后的二维关键点信息，进一步确定出二维关键点对应的三维关键点，得到优化处理后的三维关键点的坐标信息。
下面对二维关键点信息的优化处理的过程进行示例性说明。
在一些实施例中,响应于至少一帧图像中任意一帧图像存在目标人体的二维关键点,且上述任意一帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,确定所述任意一帧图像为有效的图像。
本公开实施例中,在至少一帧图像中不包含人体图像时,或者,对至少一帧图像进行人体关键点的检测出现错误时,通过将当前帧图像中至少一个人体的二维关键点信息与目标人体的人体掩膜图像进行匹配,可能无法得到当前帧图像中目标人体的二维关键点信息,即出现至少一帧图像中任意一帧图像不存在目标人体的二维关键点的情况。可以理解地,由于三维关键点是根据二维关键点得出的,因而,在任意一帧图像不存在目标人体的二维关键点的情况下,可以确定上述任意一帧图像中不存在目标人体的三维关键点。
本公开实施例中,可以在确定目标人体的二维关键点对应的三维关键点的坐标信息后,根据三维关键点信息中的坐标信息,判断上述任意一帧图像中目标人体的二维关键点对应的三维关键点处于预设区域。
可以理解地,在确定出有效的图像后,后续可以针对有效的图像进行处理,有利于提升人体深度检测的准确性。在一些实施例中,对于有效的图像以外的图像,可以视为是无效的图像,还可以省略对无效的图像的处理,如此,可以提升人体深度检测的准确性。
本公开实施例中,预设区域可以根据实际应用场景预先设置;在一些实施例中,可以根据当前帧图像中目标人体的二维关键点对应的三维关键点的坐标信息,确定当前帧图像中目标人体的二维关键点对应的三维关键点与图像采集设备的距离,在三维关键点与图像采集设备的距离大于设定距离的情况下,确定当前帧图像中目标人体的二维关键点对应的三维关键点未处于预设区域;在三维关键点与图像采集设备的距离小于或等于设定距离的情况下,可以确定当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域。
示例性的,三维关键点的坐标信息中Z轴的坐标值表示三维关键点与图像采集设备之间的距离, 因而,可以根据三维关键点的坐标信息,判断三维关键点与图像采集设备的距离是否大于设定距离。
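The preset-region check above can be sketched as follows, using the Z coordinate as the keypoint-to-camera distance as the text describes; the limit `max_distance=5.0` is an assumed value for the "set distance":

```python
def keypoints_in_preset_region(kp3d, max_distance=5.0):
    """A detection is treated as valid only if every 3D keypoint's Z
    coordinate (distance from the camera along the optical axis) is
    positive and no greater than the set distance."""
    return all(0.0 < z <= max_distance for _, _, z in kp3d)

print(keypoints_in_preset_region([(0.1, 0.2, 1.5), (0.0, 0.3, 2.0)]))  # True
print(keypoints_in_preset_region([(0.1, 0.2, 7.5)]))                   # False
```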
本公开实施例中,设定距离可以是根据实际应用需求预先设置的数据。
可以理解地,在三维关键点与图像采集设备的距离小于或等于设定距离的情况下,可以认为三维关键点为符合要求的关键点,此时,将对应的一帧图像作为有效的图像,有利于后续准确地得出目标人体的深度检测结果。
需要说明的是,上述记载的内容仅仅是对预设区域进行了示例性说明,本公开实施例对此并不进行限定。
在一些实施例中,响应于从当前帧图像未检测到目标人体的二维关键点,或者,从当前帧图像中目标人体的二维关键点对应的三维关键点未处于预设区域的情况,可以根据至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得出优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
在一些实施例中,响应于从当前帧图像未检测到目标人体的二维关键点,或者,从当前帧图像中目标人体的二维关键点对应的三维关键点未处于预设区域的情况,可以在至少一帧图像的有效的历史帧图像中,选取一帧图像,将选取的一帧图像中的目标人体的二维关键点的坐标信息作为优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
可以看出,本公开实施例中可以根据有效的历史帧图像中目标人体的二维关键点,得出优化处理后的当前帧图像的目标人体的二维关键点,有利于提升后续的人体深度检测结果的稳定性。
在一些实施例中,在至少一帧图像的有效的历史帧图像中选取一帧图像的实现方式可以是,在至少一帧图像的有效的历史帧图像中,选取与当前帧图像的时间间隔最小的一帧图像,例如,至少一帧图像按照时间先后顺序分别记为第1帧图像至第5帧图像,其中,第5帧图像为当前帧图像,第1帧图像至第3帧图像为有效的历史帧图像,第4帧图像为无效的历史帧图像,这样,在第5帧图像不存在目标人体的二维关键点的情况下,可以在第1帧图像至第3帧图像中,选取与当前帧图像的时间间隔最小的第3帧图像。
可以看出,根据与当前帧图像的时间间隔最小的历史帧图像,得到优化处理后的所述当前帧图像中至少一个人体的二维关键点信息,有利于准确地得出当前帧图像的目标人体的二维关键点信息。
在一些实施例中,响应于当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,可以根据当前帧图像和至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得到优化处理后的当前帧图像中目标人体的二维关键点的坐标信息。
在一些实施例中,可以对当前帧图像和至少一帧图像中有效的历史帧图像中目标人体的二维关键点的坐标信息进行平均计算,得到优化处理后的当前帧图像中目标人体的二维关键点的坐标信息。
例如,至少一帧图像按照时间先后顺序分别记为第6帧图像至第8帧图像,其中,第8帧图像为当前帧图像,第6帧图像至第8帧图像均为有效的历史帧图像,这样,可以对第6帧图像至第8帧图像的目标人体的二维关键点的坐标信息进行平均计算,将平均计算的结果作为第8帧图像中目标人体的二维关键点的更新后坐标信息。
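The frame-6-to-8 example above can be sketched directly; the coordinate values are made up for illustration:

```python
def average_keypoints(frames):
    """Average the 2D keypoint coordinates of the current frame with
    those of the valid history frames, keypoint by keypoint."""
    n = len(frames)
    return [(sum(f[i][0] for f in frames) / n,
             sum(f[i][1] for f in frames) / n)
            for i in range(len(frames[0]))]

frame6 = [(100.0, 200.0)]   # valid history frame
frame7 = [(106.0, 203.0)]   # valid history frame
frame8 = [(103.0, 197.0)]   # current frame
print(average_keypoints([frame6, frame7, frame8]))  # [(103.0, 200.0)]
```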
可以理解地,根据当前帧图像和至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,更新当前帧图像中目标人体的二维关键点的坐标信息,有利于对当前帧图像的二维关键点的坐标信息进行平滑处理。
下面结合附图对本公开实施例的深度检测方法进行进一步示例性说明。
图4A为本公开实施例提供的目标人体的二维关键点的示意图,如图4A所示,人体中圆圈表示当前帧图像中目标人体的二维关键点。
在确定当前帧图像中目标人体的二维关键点后，可以确定当前帧图像中目标人体的二维关键点信息对应的三维关键点信息；在一些实施例中，可以同时展示当前帧图像中目标人体的二维关键点对应的三维关键点和目标人体的人体掩膜图像；图4B为本公开实施例提供的三维关键点和目标人体的人体掩膜图像的示意图，如图4B所示，O点所在位置表示图像采集设备所在位置，O点所在位置显示有相机坐标系的三个坐标轴，目标人体的人体掩膜图像为图4B中所示的人体轮廓，目标人体的二维关键点对应的三维关键点为目标人体的人体掩膜图像后方的填充有点的图案。
基于前述实施例记载的内容,在确定当前帧图像中目标人体的二维关键点信息对应的三维关键点信息后,可以根据当前帧图像中目标人体的二维关键点信息对应的三维关键点信息,确定当前帧图像中所述目标人体的深度检测结果。
在上述人体包括至少两个人体的情况下,下面结合附图对本公开实施例的深度检测方法进行进一步说明。
图5为本公开实施例提供的一种深度检测方法实现的结构示意图,如图5所示,图像采集设备501可以将采集的多帧图像发送至电子设备502的处理器5021,这里,多帧图像包括当前帧图像和历史帧图像,多帧图像均为RGB图像;处理器5021可以对多帧图像的当前帧图像进行人体图像分割,得到至少两个人体的掩膜图像;还可以基于多帧图像进行人体关键点的检测和跟踪,得到当前帧图像中至少两个人体的二维关键点信息和三维关键点信息。在得到当前帧图像中至少两个人体的二维关键点信息和三维关键点信息后,还可以执行后处理优化,后处理优化包括上述记载的对二维关键点信息进行时序滤波处理的过程、以及上述记载的对二维关键点的深度信息进行插值处理的过程。
在执行后处理优化后,根据当前帧图像中至少两个人体的二维关键点信息和三维关键点信息、以及所述至少两个人体的掩膜图像,确定当前帧图像中所述至少两个人体的深度检测结果,基于当前帧图像中所述至少两个人体的深度检测结果生成每个人体的深度图,并可以将深度图展示在电子设备502的显示界面5022中,实现人机交互。
在上述人体包括单个的目标人体的情况下,下面结合附图对本公开实施例的深度检测方法进行进一步说明。
本公开实施例提供的另一种深度检测方法,也可以通过图5所示的结构示意图实现,如图5所示,图像采集设备501可以将采集的多帧图像发送至电子设备502的处理器5021;处理器5021可以对多帧图像的当前帧图像进行单个人体的图像分割,得到目标人体的人体掩膜图像;还可以基于多帧图像进行人体关键点的检测和跟踪,得到当前帧图像中至少一个人体的二维关键点信息和三维关键点信息。在得到当前帧图像中至少一个人体的二维关键点信息和三维关键点信息后,还可以执行后处理优化,后处理优化包括上述记载的对二维关键点信息和三维关键点信息进行优化的过程。
在执行后处理优化后,根据当前帧图像中至少一个人体的二维关键点信息和三维关键点信息、以及目标人体的掩膜图像,确定当前帧图像中目标人体的深度检测结果,基于当前帧图像中目标人体的深度检测结果生成目标人体的深度图,并可以将深度图展示在电子设备502的显示界面5022中,实现人机交互。
在上述人体包括至少一个人体的情况下,显示界面5022还可以展示深度图中每个像素点对应的点云;图6为本公开实施例提供的点云的示意图,图6中,人体轮廓内的点表示像素点构成的点云,加粗的实心圆点表示骨架关键点,加粗的实心圆点之间的连线表示人体的骨架。
可以理解地,通过展示深度图中每个像素点对应的点云,便于直观地获知像素点的位置,进一步地,通过展示骨架关键点,有利于直观地了解像素点与骨架关键点之间的关系。
在一些实施例中,在得到当前帧图像中人体的深度检测结果,还可以基于人体的深度检测结果进行AR效果的展示。
在一些实施例中,可以根据当前帧图像中人体的深度检测结果,确定人体与AR场景中至少一个目标对象的位置关系;基于位置关系,确定人体和至少一个目标对象的组合呈现方式;基于组合呈现方式,展示人体和至少一个目标对象相叠加的AR效果。
这里,目标对象可以是现实场景中实际存在的对象,目标对象的深度信息可以已知的,或者,可以是根据目标对象的拍摄数据确定的信息;目标对象还可以是预先设置的虚拟对象,虚拟对象的深度信息是预先确定的。
在一种实施方式中，可以根据至少两个人体的深度检测结果和目标对象的深度信息，确定至少两个人体与AR场景中至少一个目标对象的位置关系、以及至少两个人体之间的位置关系；示例性地，每个人体与AR场景中目标对象的位置关系可以包括以下几种情况：1）人体相较于目标对象更靠近图像采集设备，2）目标对象相较于人体更靠近图像采集设备，3）人***于目标对象的右侧、左侧、上侧或下侧，4）人体的一部分相较于目标对象更靠近图像采集设备，另一部分相较于目标对象远离图像采集设备；至少两个人体之间的位置关系可以包括以下几种情况：1）一个人体相对于另一个人体更靠近图像采集设备，2）一个人***于另一个人体的右侧、左侧、上侧或下侧，3）一个人体的部分位置相较于另一人体更靠近图像采集设备，其它部分位置相较于另一个人体远离图像采集设备；需要说明的是，上述仅仅是对多个人体与AR场景中至少一个目标对象的位置关系进行了示例性说明，本公开实施例并不局限于此。
在确定至少两个人体与AR场景中至少一个目标对象的位置关系后,可以确定至少两个人体和至少一个目标对象的组合呈现方式,使组合呈现方式反映上述位置关系,这样,基于组合呈现方式,展示多个人体和至少一个目标对象相叠加的AR效果,有利于提升AR展示效果。
在一种实施方式中,可以根据目标人体的深度检测结果和目标对象的深度信息,确定目标人体与AR场景中至少一个目标对象的位置关系;示例性地,目标人体与AR场景中目标对象的位置关系可以包括以下几种情况:1)目标人体相较于目标对象更靠近图像采集设备,2)目标对象相较于目标人体更靠近图像采集设备,3)单人人***于目标对象的右侧、左侧、上侧或下侧,4)单人人体的一部分相较于目标对象更靠近图像采集设备,另一部分相较于目标对象远离图像采集设备;需要说明的是,上述仅仅是对目标人体与AR场景中目标对象的位置关系进行了示例性说明,本公开实施例并不局限于此。
在确定目标人体和至少一个目标对象的位置关系后,可以确定目标人体和至少一个目标对象的组合呈现方式,使组合呈现方式反映上述位置关系,这样,基于组合呈现方式,展示目标人体和至少一个目标对象相叠加的AR效果,有利于提升AR展示效果。
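The combined presentation described above can be sketched as per-pixel occlusion: wherever the human body is closer to the camera than the AR target object, the body's pixel is shown, otherwise the object's. This is a minimal illustration assuming per-pixel depth maps for both; the helper name is hypothetical:

```python
import numpy as np

def composite_by_depth(body_rgb, body_depth, obj_rgb, obj_depth):
    """Depth-ordered compositing of a human body over an AR object:
    the pixel closer to the camera (smaller depth) wins."""
    body_in_front = (body_depth < obj_depth)[..., None]
    return np.where(body_in_front, body_rgb, obj_rgb)

body_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)   # white body pixels
obj_rgb = np.zeros((2, 2, 3), dtype=np.uint8)        # black object pixels
body_depth = np.array([[1.0, 3.0], [1.0, 3.0]])      # body depth (metres)
obj_depth = np.full((2, 2), 2.0)                     # object depth (metres)
out = composite_by_depth(body_rgb, body_depth, obj_rgb, obj_depth)
print(out[0, 0, 0], out[0, 1, 0])  # 255 0
```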
基于前述实施例记载的一种深度检测方法,本公开实施例还提供了一种深度检测装置7,该深度检测装置7可以位于上述记载的电子设备502中。
图7为本公开实施例提供的一种深度检测装置7的结构示意图,如图7所示,该深度检测装置7可以包括:
获取模块701配置为:获取图像采集设备采集的至少一帧图像,所述至少一帧图像包括当前帧图像;
处理模块702配置为:对所述当前帧图像进行人体图像的分割,得到人体的掩膜图像;对所述至少一帧图像进行人体关键点的检测,得出所述当前帧图像中人体的二维关键点信息和三维关键点信息;
检测模块703配置为:根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,其中,所述人体包括单个人体或者至少两个人体。
本公开的一些实施例中,所述人体包括至少两个人体;所述检测模块703,具体配置为:
通过将所述至少两个人体中每个人体的二维关键点信息与所述至少两个人体的掩膜图像中每个人体的掩膜图像进行匹配,得到分别属于每个人体的二维关键点信息;
根据分别属于所述每个人体的二维关键点信息所对应的三维关键点信息,确定所述当前帧图像中每个人体的深度检测结果。
本公开的一些实施例中,所述检测模块703,具体配置为在所述至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
本公开的一些实施例中,所述检测模块703,具体配置为:
确定所述每个人体的二维关键点信息所对应的三维关键点信息的坐标信息;
根据所述三维关键点的坐标信息,确定所述每个人体的二维关键点的深度信息;
对所述每个人体的二维关键点的深度信息进行插值处理,得到所述每个人体的掩膜图像中第一像素点的深度信息;其中,所述第一像素点表示所述每个人体的掩膜图像中除与所述二维关键点位置重 叠的像素点之外的任一像素点。
本公开的一些实施例中,所述检测模块703,具体配置为:
对所述当前帧图像中所述至少两个人体的二维关键点信息进行优化处理，得到优化处理后的至少两个人体的二维关键点信息；
在所述优化处理后的至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
本公开的一些实施例中,所述检测模块703,具体配置为:
在所述至少一帧图像还包括历史帧图像的情况下,对所述当前帧图像中至少两个人体的二维关键点信息和所述历史帧图像中至少两个人体的二维关键点信息进行处理,得到优化处理后的至少两个人体的二维关键点信息。
本公开的一些实施例中,所述人体包括单个的目标人体,所述检测模块703,具体配置为:
对所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息进行优化,得到优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息;
基于所述优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息、以及所述目标人体的人体掩膜图像,确定所述当前帧图像中所述目标人体的深度检测结果。
本公开的一些实施例中,所述二维关键点信息包括二维关键点的坐标信息,所述检测模块703,具体配置为:
响应于从所述当前帧图像未检测到目标人体的二维关键点,或者从所述当前帧图像中目标人体的二维关键点对应的三维关键点未处于预设区域的情况,根据所述至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得出优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息;或者,
响应于所述当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,根据所述当前帧图像和所述至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得到优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
本公开的一些实施例中,所述检测模块703,具体配置为通过对所述当前帧图像和所述至少一帧图像中有效的历史帧图像的目标人体的二维关键点的坐标信息进行平均计算,得到优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
本公开的一些实施例中,所述检测模块703,还配置为:响应于从所述至少一帧图像中任意一帧图像检测到目标人体的二维关键点,且检测到所述任意一帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,确定所述任意一帧图像为有效的图像。
本公开的一些实施例中,所述检测模块703,具体配置为:
根据所述当前帧图像中目标人体的二维关键点对应的三维关键点的坐标信息,确定所述当前帧图像中目标人体的二维关键点对应的三维关键点与所述图像采集设备的距离;
在所述距离小于或等于设定距离的情况下,确定所述当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域。
本公开的一些实施例中,所述检测模块703,具体配置为:
通过将所述当前帧图像中至少一个人体的二维关键点信息与所述目标人体的人体掩膜图像进行匹配,得到所述当前帧图像中目标人体的二维关键点信息;
根据所述当前帧图像中目标人体的二维关键点信息对应的三维关键点信息,确定所述当前帧图像中所述目标人体的深度检测结果。
本公开的一些实施例中,所述检测模块703,具体配置为:
在所述至少一个人体的二维关键点信息中,确定所述目标人体的二维关键点信息;所述目标人体的二维关键点信息为:与所述目标人体的人体掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息。
本公开的一些实施例中,所述检测模块703,具体配置为:
确定所述当前帧图像中目标人体的二维关键点对应的三维关键点的坐标信息;根据所述三维关键点的坐标信息,确定所述目标人体的二维关键点的深度信息;在与所述目标人体的二维关键点位置重叠的像素点中,确定与第一像素点相邻的像素点,将所述与第一像素点相邻的像素点的深度信息作为:所述第一像素点的深度信息;其中,所述第一像素点表示所述目标人体的人体掩膜图像或像素点集合中除与所述目标人体的二维关键点位置重叠的像素点之外的任一像素点,所述像素点集合包括:对所述目标人体的人体掩膜图像的像素点按照预设过滤方式进行过滤处理后的像素点。
本公开的一些实施例中,所述检测模块703,还配置为:
在所述目标人体的人体掩膜图像内,基于所述当前帧图像中目标人体的二维关键点搜索所述二维关键点的连通区域,将所述目标人体的人体掩膜图像中所述连通区域不包含的像素点删除,得到所述像素点集合。
本公开的一些实施例中,所述处理模块702,还配置为:
根据所述当前帧图像中人体的深度检测结果,确定所述人体与增强现实AR场景中至少一个目标对象的位置关系;
基于所述位置关系,确定所述人体和所述至少一个目标对象的组合呈现方式;
基于所述组合呈现方式,展示所述人体和所述至少一个目标对象相叠加的AR效果。
本公开的一些实施例中,所述二维关键点信息为表示人体骨架的二维关键点,所述三维关键点信息为表示人体骨架的三维关键点。
本公开的一些实施例中,所述图像采集设备采集的至少一帧图像为RGB图像。
实际应用中,获取模块701、处理模块702和检测模块703均可以利用电子设备中的处理器实现,上述处理器可以为ASIC、DSP、DSPD、PLD、FPGA、CPU、控制器、微控制器、微处理器中的至少一种。
需要说明的是,以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本公开装置实施例中未披露的技术细节,请参照本公开方法实施例的描述而理解。
需要说明的是,本公开实施例中,如果以软件功能模块的形式实现上述的展示方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是终端、服务器等)执行本公开各个实施例方法的全部或部分。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本公开实施例不限制于任何特定的硬件和软件结合。
对应地,本公开实施例再提供一种计算机程序产品,计算机程序产品包括计算机可执行指令,该计算机可执行指令用于实现本公开实施例提供的深度检测方法中。
相应的,本公开实施例再提供一种计算机存储介质,计算机存储介质上存储有计算机可执行指令,该计算机可执行指令用于实现上述实施例提供的深度检测方法。
本公开实施例还提供一种电子设备，图8为本公开实施例提供的电子设备502的结构示意图，如图8所示，所述电子设备502包括：
存储器801,用于存储可执行指令;
处理器5021,用于执行所述存储器中存储的可执行指令时,以实现上述任意一种深度检测方法。
存储器801配置为存储处理器5021可执行的计算机程序和应用，还可以缓存处理器5021以及电子设备中各模块待处理或已经处理的数据（例如，图像数据、音频数据、语音通信数据和视频通信数据），可以通过闪存（FLASH）或随机访问存储器（Random Access Memory，RAM）实现。
处理器5021执行程序时实现上述任一项深度检测方法。
上述处理器5021可以为ASIC、DSP、DSPD、PLD、FPGA、CPU、控制器、微控制器、微处理器中的至少一种。可以理解地，实现上述处理器功能的电子器件还可以为其它，本公开实施例不作限制。
上述计算机可读存储介质/存储器可以是ROM、可编程只读存储器(Programmable Read-Only Memory,PROM)、可擦除可编程只读存储器(Erasable Programmable Read-Only Memory,EPROM)、电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁性随机存取存储器(Ferromagnetic Random Access Memory,FRAM)、快闪存储器(Flash Memory)、磁表面存储器、光盘、或只读光盘(Compact Disc Read-Only Memory,CD-ROM)等存储器;也可以是包括上述存储器之一或任意组合的各种终端,如移动电话、计算机、平板设备、个人数字助理等。
这里需要指出的是:以上存储介质和设备实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本公开存储介质和设备实施例中未披露的技术细节,请参照本公开方法实施例的描述而理解。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本公开的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本公开的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本公开实施例的实施过程构成任何限定。上述本公开实施例序号仅仅为了描述,不代表实施例的优劣。
在本公开所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个***,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元;既可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本公开实施例方案的目的。
另外,在本公开各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
或者,本公开上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得设备自动测试线执行本公开各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
本公开所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本公开所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本公开的实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。
Industrial Applicability
Embodiments of the present disclosure disclose a depth detection method, apparatus, electronic device, storage medium and program. The method includes: acquiring at least one frame of image collected by an image acquisition device, the at least one frame of image including a current frame image; performing human body image segmentation on the current frame image to obtain a mask image of the human body; performing human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image; and determining a depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies. The depth detection method provided by the embodiments of the present disclosure can realize depth detection of a human body in an image without relying on special hardware devices such as a 3D depth camera, and can be applied to scenarios such as AR interaction and virtual photographing.

Claims (39)

  1. 一种深度检测方法,所述方法应用于电子设备,所述方法包括:
    获取图像采集设备采集的至少一帧图像,所述至少一帧图像包括当前帧图像;
    对所述当前帧图像进行人体图像的分割,得到人体的掩膜图像;
    对所述至少一帧图像进行人体关键点的检测,得出所述当前帧图像中人体的二维关键点信息和三维关键点信息;
    根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,其中,所述人体包括单个人体或者至少两个人体。
  2. 根据权利要求1所述的方法,其中,所述人体包括至少两个人体,所述根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,包括:
    通过将所述至少两个人体中每个人体的二维关键点信息与所述至少两个人体的掩膜图像中每个人体的掩膜图像进行匹配,得到分别属于每个人体的二维关键点信息;
    根据分别属于所述每个人体的二维关键点信息所对应的三维关键点信息,确定所述当前帧图像中每个人体的深度检测结果。
  3. 根据权利要求2所述的方法,其中,所述通过将所述至少两个人体中每个人体的二维关键点信息与所述至少两个人体的掩膜图像中每个人体的掩膜图像进行匹配,得到分别属于每个人体的二维关键点信息,包括:
    在所述至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
  4. 根据权利要求2所述的方法,其中,所述根据分别属于所述每个人体的二维关键点信息所对应的三维关键点信息,确定所述当前帧图像中每个人体的深度检测结果,包括:
    确定所述每个人体的二维关键点信息所对应的三维关键点信息的坐标信息;
    根据所述三维关键点的坐标信息,确定所述每个人体的二维关键点的深度信息;
    对所述每个人体的二维关键点的深度信息进行插值处理,得到所述每个人体的掩膜图像中第一像素点的深度信息;其中,所述第一像素点表示所述每个人体的掩膜图像中除与所述二维关键点位置重叠的像素点之外的任一像素点。
  5. 根据权利要求2至4任一项所述的方法,其中,所述通过将所述至少两个人体中每个人体的二维关键点信息与所述至少两个人体的掩膜图像中每个人体的掩膜图像进行匹配,得到分别属于每个人体的二维关键点信息,包括:
    对所述当前帧图像中所述至少两个人体的二维关键点信息进行优化处理，得到优化处理后的至少两个人体的二维关键点信息；
    在所述优化处理后的至少两个人体的二维关键点信息中,将与每个人体的掩膜图像的位置重叠度达到设定值的一个人体的二维关键点信息作为每个人体的二维关键点信息。
  6. 根据权利要求5所述的方法，其中，所述对所述当前帧图像中至少两个人体的二维关键点信息进行优化处理，得到优化处理后的至少两个人体的二维关键点信息，包括：
    在所述至少一帧图像还包括历史帧图像的情况下,对所述当前帧图像中至少两个人体的二维关键点信息和所述历史帧图像中至少两个人体的二维关键点信息进行处理,得到优化处理后的至少两个人体的二维关键点信息。
  7. 根据权利要求1至6任一所述的方法,其中,所述人体包括单个的目标人体,所述根据所述当前帧图像中人体的二维关键点信息和三维关键点信息、以及所述人体的掩膜图像,确定所述当前帧图像中人体的深度检测结果,包括:
    对所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息进行优化,得到优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息;
    基于所述优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息、以及所述目标人体的人体掩膜图像,确定所述当前帧图像中所述目标人体的深度检测结果。
  8. 根据权利要求7所述的方法,其中,所述二维关键点信息包括二维关键点的坐标信息,所述对所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息进行优化,得到优化处理后的所述当前帧图像中至少一个人体的二维关键点信息和三维关键点信息,包括:
    响应于从所述当前帧图像未检测到目标人体的二维关键点,或者从所述当前帧图像中目标人体的二维关键点对应的三维关键点未处于预设区域的情况,根据所述至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得出优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息;或者,
    响应于所述当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,根据所述当前帧图像和所述至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得到优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
  9. 根据权利要求8所述的方法,其中,所述根据所述当前帧图像和所述至少一帧图像的有效的历史帧图像中目标人体的二维关键点的坐标信息,得到优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息,包括:
    通过对所述当前帧图像和所述至少一帧图像中有效的历史帧图像的目标人体的二维关键点的坐标信息进行平均计算,得到优化处理后的所述当前帧图像中目标人体的二维关键点的坐标信息。
  10. 根据权利要求8或9所述的方法,其中,所述方法还包括:
    响应于从所述至少一帧图像中任意一帧图像检测到目标人体的二维关键点,且检测到所述任意一帧图像中目标人体的二维关键点对应的三维关键点处于预设区域的情况,确定所述任意一帧图像为有效的图像。
  11. 根据权利要求10所述的方法,其中,所述检测到所述任意一帧图像中目标人体的二维关键点对应的三维关键点处于预设区域,包括:
    根据所述当前帧图像中目标人体的二维关键点对应的三维关键点的坐标信息,确定所述当前帧图像中目标人体的二维关键点对应的三维关键点与所述图像采集设备的距离;
    在所述距离小于或等于设定距离的情况下,确定所述当前帧图像中目标人体的二维关键点对应的三维关键点处于预设区域。
  12. The method according to any one of claims 7 to 11, wherein the determining the depth detection result of the target human body in the current frame image according to the two-dimensional key point information and the three-dimensional key point information of the at least one human body in the current frame image and the human body mask image of the target human body comprises:
    obtaining the two-dimensional key point information of the target human body in the current frame image by matching the two-dimensional key point information of the at least one human body in the current frame image with the human body mask image of the target human body;
    determining the depth detection result of the target human body in the current frame image according to three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image.
  13. The method according to claim 12, wherein the obtaining the two-dimensional key point information of the target human body in the current frame image by matching the two-dimensional key point information of the at least one human body in the current frame image with the human body mask image of the target human body comprises:
    determining, among the two-dimensional key point information of the at least one human body, the two-dimensional key point information of the target human body, the two-dimensional key point information of the target human body being the two-dimensional key point information of the one human body whose position overlap degree with the human body mask image of the target human body reaches a set value.
  14. The method according to claim 12, wherein the determining the depth detection result of the target human body in the current frame image according to the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image comprises:
    determining coordinate information of three-dimensional key points corresponding to two-dimensional key points of the target human body in the current frame image; determining depth information of the two-dimensional key points of the target human body according to the coordinate information of the three-dimensional key points; determining, among the pixels overlapping the positions of the two-dimensional key points of the target human body, a pixel adjacent to a first pixel, and taking the depth information of the pixel adjacent to the first pixel as the depth information of the first pixel; wherein the first pixel represents any pixel, in the human body mask image of the target human body or in a pixel set, other than the pixels overlapping the positions of the two-dimensional key points of the target human body, and the pixel set comprises pixels obtained by filtering the pixels of the human body mask image of the target human body in a preset filtering manner.
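Claim 14 spreads the sparse key point depths over the rest of the mask. As a simplified stand-in for the neighbouring-pixel propagation it recites, the sketch below assigns every mask pixel the depth of its nearest key point; the function name and the brute-force nearest-neighbour search are assumptions for illustration only:

```python
import numpy as np

def propagate_depth(mask, keypoints, keypoint_depths):
    """Fill each mask pixel with the depth of its nearest 2D key point
    (simplified stand-in for the propagation described in claim 14)."""
    # mask: HxW boolean array; keypoints: list of (x, y); keypoint_depths: list of floats
    depth = np.full(mask.shape, np.nan)        # non-mask pixels stay NaN
    kp = np.asarray(keypoints, dtype=float)    # (K, 2)
    for y, x in zip(*np.nonzero(mask)):
        d2 = (kp[:, 0] - x) ** 2 + (kp[:, 1] - y) ** 2
        depth[y, x] = keypoint_depths[int(np.argmin(d2))]
    return depth
```

The result is a dense per-pixel depth map limited to the body region, which is what the AR compositing in claim 16 consumes.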
  15. The method according to claim 14, wherein the method further comprises:
    within the human body mask image of the target human body, searching for a connected region of the two-dimensional key points based on the two-dimensional key points of the target human body in the current frame image, and deleting, from the human body mask image of the target human body, the pixels not contained in the connected region, to obtain the pixel set.
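The connected-region search of claim 15 is essentially a flood fill seeded at the key points. A minimal sketch; the 4-connectivity and the breadth-first traversal are implementation choices assumed here, not specified by the claim:

```python
from collections import deque
import numpy as np

def connected_pixels(mask, seeds):
    """Keep only mask pixels connected to the 2D key points, dropping stray
    regions of the mask (sketch of the filtering in claim 15)."""
    h, w = mask.shape
    keep = np.zeros_like(mask)
    queue = deque((y, x) for x, y in seeds if mask[y, x])  # seeds are (x, y)
    for y, x in queue:
        keep[y, x] = True
    while queue:                                   # breadth-first flood fill
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not keep[ny, nx]:
                keep[ny, nx] = True
                queue.append((ny, nx))
    return keep
```

Disconnected blobs of the segmentation mask that contain no key point (typically segmentation noise) are thereby excluded from the pixel set used for depth filling.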
  16. The method according to any one of claims 1 to 15, wherein the method further comprises:
    determining, according to the depth detection result of the human body in the current frame image, a positional relationship between the human body and at least one target object in an augmented reality (AR) scene;
    determining a combined presentation mode of the human body and the at least one target object based on the positional relationship;
    displaying, based on the combined presentation mode, an AR effect in which the human body and the at least one target object are superimposed.
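The combined presentation of claim 16 comes down to a per-pixel depth test: whichever of the body and the virtual object is closer occludes the other. A minimal compositing sketch (array shapes and the smaller-depth-is-closer convention are assumptions):

```python
import numpy as np

def compose_ar(person_rgb, person_depth, object_rgb, object_depth):
    """Per-pixel occlusion test between the human body and an AR object
    (sketch of the superimposed AR effect in claim 16)."""
    # *_rgb: HxWx3 arrays; *_depth: HxW arrays, smaller depth = closer to camera
    person_in_front = person_depth < object_depth
    return np.where(person_in_front[..., None], person_rgb, object_rgb)
```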
  17. The method according to any one of claims 1 to 16, wherein the two-dimensional key point information is two-dimensional key points representing a human skeleton, and the three-dimensional key point information is three-dimensional key points representing the human skeleton.
  18. The method according to any one of claims 1 to 17, wherein the at least one frame image captured by the image acquisition device is a red-green-blue (RGB) image.
  19. A depth detection apparatus, the apparatus comprising:
    an acquisition module configured to acquire at least one frame image captured by an image acquisition device, the at least one frame image comprising a current frame image;
    a processing module configured to perform human body image segmentation on the current frame image to obtain a mask image of a human body, and to perform human body key point detection on the at least one frame image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
    a detection module configured to determine a depth detection result of the human body in the current frame image according to the two-dimensional key point information and the three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body comprises a single human body or at least two human bodies.
  20. The apparatus according to claim 19, wherein the human body comprises at least two human bodies, and the detection module is configured to:
    obtain two-dimensional key point information belonging to each human body respectively by matching the two-dimensional key point information of each of the at least two human bodies with the mask image of each human body among the mask images of the at least two human bodies;
    determine a depth detection result of each human body in the current frame image according to three-dimensional key point information corresponding to the two-dimensional key point information belonging to each human body respectively.
  21. The apparatus according to claim 20, wherein the detection module is configured to take, among the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of the one human body whose position overlap degree with the mask image of each human body reaches a set value as the two-dimensional key point information of that human body.
  22. The apparatus according to claim 20, wherein the detection module is configured to:
    determine coordinate information of the three-dimensional key points corresponding to the two-dimensional key point information of each human body;
    determine depth information of the two-dimensional key points of each human body according to the coordinate information of the three-dimensional key points;
    perform interpolation processing on the depth information of the two-dimensional key points of each human body to obtain depth information of a first pixel in the mask image of each human body, wherein the first pixel represents any pixel in the mask image of each human body other than the pixels overlapping the positions of the two-dimensional key points.
  23. The apparatus according to any one of claims 20 to 22, wherein the detection module is configured to:
    perform optimization processing on the two-dimensional key point information of the at least two human bodies in the current frame image to obtain optimized two-dimensional key point information of the at least two human bodies;
    among the optimized two-dimensional key point information of the at least two human bodies, take the two-dimensional key point information of the one human body whose position overlap degree with the mask image of each human body reaches a set value as the two-dimensional key point information of that human body.
  24. The apparatus according to claim 23, wherein the detection module is configured to:
    in a case where the at least one frame image further comprises a historical frame image, process the two-dimensional key point information of the at least two human bodies in the current frame image and the two-dimensional key point information of the at least two human bodies in the historical frame image to obtain the optimized two-dimensional key point information of the at least two human bodies.
  25. The apparatus according to any one of claims 19 to 24, wherein the human body comprises a single target human body, and the detection module is configured to:
    optimize the two-dimensional key point information and the three-dimensional key point information of at least one human body in the current frame image to obtain optimized two-dimensional key point information and three-dimensional key point information of the at least one human body in the current frame image;
    determine the depth detection result of the target human body in the current frame image based on the optimized two-dimensional key point information and three-dimensional key point information of the at least one human body in the current frame image and the human body mask image of the target human body.
  26. The apparatus according to claim 25, wherein the two-dimensional key point information comprises coordinate information of two-dimensional key points, and the detection module is configured to:
    in response to a case where no two-dimensional key point of the target human body is detected from the current frame image, or a three-dimensional key point corresponding to a two-dimensional key point of the target human body in the current frame image is not located in a preset region, obtain optimized coordinate information of the two-dimensional key points of the target human body in the current frame image according to coordinate information of the two-dimensional key points of the target human body in a valid historical frame image of the at least one frame image; or,
    in response to a case where the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is located in the preset region, obtain the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image according to the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame image of the at least one frame image.
  27. The apparatus according to claim 26, wherein the detection module is configured to obtain the optimized coordinate information of the two-dimensional key points of the target human body in the current frame image by averaging the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame image of the at least one frame image.
  28. The apparatus according to claim 26 or 27, wherein the detection module is further configured to: in response to a case where a two-dimensional key point of the target human body is detected from any frame image of the at least one frame image, and a three-dimensional key point corresponding to the two-dimensional key point of the target human body in said any frame image is detected to be located in the preset region, determine that said any frame image is a valid image.
  29. The apparatus according to claim 28, wherein the detection module is configured to:
    determine, according to coordinate information of the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image, a distance between said three-dimensional key point and the image acquisition device;
    in a case where the distance is less than or equal to a set distance, determine that the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is located in the preset region.
  30. The apparatus according to any one of claims 25 to 29, wherein the detection module is configured to:
    obtain the two-dimensional key point information of the target human body in the current frame image by matching the two-dimensional key point information of the at least one human body in the current frame image with the human body mask image of the target human body;
    determine the depth detection result of the target human body in the current frame image according to three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image.
  31. The apparatus according to claim 30, wherein the detection module is configured to:
    determine, among the two-dimensional key point information of the at least one human body, the two-dimensional key point information of the target human body, the two-dimensional key point information of the target human body being the two-dimensional key point information of the one human body whose position overlap degree with the human body mask image of the target human body reaches a set value.
  32. The apparatus according to claim 30, wherein the detection module is configured to:
    determine coordinate information of three-dimensional key points corresponding to two-dimensional key points of the target human body in the current frame image; determine depth information of the two-dimensional key points of the target human body according to the coordinate information of the three-dimensional key points; determine, among the pixels overlapping the positions of the two-dimensional key points of the target human body, a pixel adjacent to a first pixel, and take the depth information of the pixel adjacent to the first pixel as the depth information of the first pixel; wherein the first pixel represents any pixel, in the human body mask image of the target human body or in a pixel set, other than the pixels overlapping the positions of the two-dimensional key points of the target human body, and the pixel set comprises pixels obtained by filtering the pixels of the human body mask image of the target human body in a preset filtering manner.
  33. The apparatus according to claim 32, wherein the detection module is further configured to:
    within the human body mask image of the target human body, search for a connected region of the two-dimensional key points based on the two-dimensional key points of the target human body in the current frame image, and delete, from the human body mask image of the target human body, the pixels not contained in the connected region, to obtain the pixel set.
  34. The apparatus according to any one of claims 19 to 33, wherein the processing module is further configured to:
    determine, according to the depth detection result of the human body in the current frame image, a positional relationship between the human body and at least one target object in an augmented reality (AR) scene;
    determine a combined presentation mode of the human body and the at least one target object based on the positional relationship;
    display, based on the combined presentation mode, an AR effect in which the human body and the at least one target object are superimposed.
  35. The apparatus according to any one of claims 19 to 34, wherein the two-dimensional key point information is two-dimensional key points representing a human skeleton, and the three-dimensional key point information is three-dimensional key points representing the human skeleton.
  36. The apparatus according to any one of claims 19 to 35, wherein the at least one frame image captured by the image acquisition device is a red-green-blue (RGB) image.
  37. An electronic device, comprising:
    a memory for storing executable instructions;
    a processor for implementing the depth detection method according to any one of claims 1 to 18 when executing the executable instructions stored in the memory.
  38. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the depth detection method according to any one of claims 1 to 18.
  39. A computer program comprising computer-readable code which, when run in an electronic device, causes a processor of the electronic device to implement the depth detection method according to any one of claims 1 to 18.
PCT/CN2021/109803 2020-11-24 2021-07-30 Depth detection method and apparatus, electronic device, storage medium, and program WO2022110877A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011344694.2A CN112419388A (zh) 2020-11-24 2020-11-24 Depth detection method and apparatus, electronic device and computer-readable storage medium
CN202011335257.4 2020-11-24
CN202011335257.4A CN112465890A (zh) 2020-11-24 2020-11-24 Depth detection method and apparatus, electronic device and computer-readable storage medium
CN202011344694.2 2020-11-24

Publications (1)

Publication Number Publication Date
WO2022110877A1 true WO2022110877A1 (zh) 2022-06-02

Family

ID=81755266

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109803 WO2022110877A1 (zh) 2020-11-24 2021-07-30 Depth detection method and apparatus, electronic device, storage medium, and program

Country Status (2)

Country Link
TW (1) TW202221646A (zh)
WO (1) WO2022110877A1 (zh)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460338A (zh) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Human body pose estimation method and apparatus, electronic device, storage medium, and program
CN108876835A (zh) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, apparatus and system, and storage medium
US20190012807A1 (en) * 2017-07-04 2019-01-10 Baidu Online Network Technology (Beijing) Co., Ltd. Three-dimensional posture estimating method and apparatus, device and computer storage medium
CN110047100A (zh) * 2019-04-01 2019-07-23 四川深瑞视科技有限公司 Depth information detection method, apparatus and system
CN110458177A (zh) * 2019-07-12 2019-11-15 中国科学院深圳先进技术研究院 Image depth information acquisition method, image processing apparatus, and storage medium
CN110826357A (zh) * 2018-08-07 2020-02-21 北京市商汤科技开发有限公司 Method, apparatus, medium and device for three-dimensional object detection and intelligent driving control
CN111210468A (zh) * 2018-11-22 2020-05-29 中移(杭州)信息技术有限公司 Image depth information acquisition method and apparatus
CN112419388A (zh) * 2020-11-24 2021-02-26 深圳市商汤科技有限公司 Depth detection method and apparatus, electronic device and computer-readable storage medium
CN112465890A (zh) * 2020-11-24 2021-03-09 深圳市商汤科技有限公司 Depth detection method and apparatus, electronic device and computer-readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375856A (zh) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Three-dimensional reconstruction method, device, and storage medium
CN115375856B (zh) * 2022-10-25 2023-02-07 杭州华橙软件技术有限公司 Three-dimensional reconstruction method, device, and storage medium
CN117237397A (zh) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, device and storage medium based on feature fusion
CN117237397B (zh) * 2023-07-13 2024-05-28 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, device and storage medium based on feature fusion

Also Published As

Publication number Publication date
TW202221646A (zh) 2022-06-01

Similar Documents

Publication Publication Date Title
WO2021083242A1 Map construction method, positioning method and system, wireless communication terminal, and computer-readable medium
CN108895981B Three-dimensional measurement method and apparatus, server, and storage medium
CN108062536B Detection method and apparatus, and computer storage medium
US9576183B2 Fast initialization for monocular visual SLAM
CN110276317B Object size detection method, object size detection apparatus, and mobile terminal
CN112419388A Depth detection method and apparatus, electronic device and computer-readable storage medium
WO2022110877A1 Depth detection method and apparatus, electronic device, storage medium, and program
WO2023071964A1 Data processing method and apparatus, electronic device, and computer-readable storage medium
CN110866977B Augmented reality processing method and apparatus, system, storage medium, and electronic device
WO2020134818A1 Image processing method and related product
US20240153213A1 Data acquisition and reconstruction method and system for human body three-dimensional modeling based on single mobile phone
WO2023024441A1 Model reconstruction method and related apparatus, electronic device, and storage medium
WO2023169281A1 Image registration method and apparatus, storage medium, and electronic device
WO2022088819A1 Video processing method, video processing apparatus, and storage medium
JP2013164697A Image processing device, image processing method, program, and image processing system
WO2021098554A1 Feature extraction method and apparatus, device, and storage medium
WO2023168957A1 Pose determination method and apparatus, electronic device, storage medium, and program
WO2024060978A1 Method and apparatus for key point detection model training and virtual character driving
WO2023015938A1 Method and apparatus for three-dimensional point detection, electronic device, and storage medium
CN112270709A Map construction method and apparatus, computer-readable storage medium, and electronic device
CN113362467B Mobile-terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet
CN117711066A Three-dimensional human body pose estimation method, apparatus, device, and medium
CN112465890A Depth detection method and apparatus, electronic device and computer-readable storage medium
CN112288817B Image-based three-dimensional reconstruction processing method and apparatus
CN112884817B Dense optical flow computation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21896384

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21896384

Country of ref document: EP

Kind code of ref document: A1