CN113450391A - Method and equipment for generating depth map - Google Patents

Method and equipment for generating depth map Download PDF

Info

Publication number
CN113450391A
CN113450391A (application CN202010225480.7A)
Authority
CN
China
Prior art keywords
image
nir
rgb
feature
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010225480.7A
Other languages
Chinese (zh)
Inventor
陈亚楠
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010225480.7A priority Critical patent/CN113450391A/en
Publication of CN113450391A publication Critical patent/CN113450391A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The present application provides a method and a device for generating a depth map. The method includes: acquiring an RGB image through an RGB sensor; acquiring an NIR image through a near-infrared (NIR) sensor; registering a second image based on a first image, where the first image is one of the RGB image and the NIR image and the second image is the other one of the RGB image and the NIR image except the first image; and generating a depth map based on the first image and the registered second image. Because the NIR sensor covers a wide wavelength range, it can capture a high-quality image at night or in a poor light environment as well as a high-contrast image under normal light; therefore, the depth map generated from the RGB image and the NIR image preserves depth-map quality in bright scenes while improving it in dark scenes.

Description

Method and equipment for generating depth map
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for generating a depth map.
Background
In machine vision, acquiring the distance of each point in the field of view relative to the camera is one of the important tasks. The distance of points in the scene from the camera can be represented by a depth map, which is similar to a grayscale image except that each pixel value is the actual distance from the sensor to the object.
At present, depth maps are usually generated by a binocular system, and mainstream binocular systems are basically ordinary RGB + RGB combinations. Under normal daytime illumination (that is, a bright scene), good depth estimation can be obtained. In a night scene (that is, a dark scene), however, an RGB image captured with a short exposure under low illumination contains heavy noise, has low brightness, and loses picture detail; obtaining a brighter picture requires high gain or long exposure, which easily produces motion blur; and using a flash produces shadows and specular reflections.
Therefore, how to improve the quality of the depth map in dark scenes while preserving its quality in bright scenes has become an urgent problem to be solved.
Disclosure of Invention
The present application provides a method and a device for generating a depth map, so that the quality of the depth map in dark scenes is improved while the quality of the depth map in bright scenes is preserved.
In a first aspect, the present application provides a method for generating a depth map. The method may be applied to a terminal device having an image acquisition module, such as a smartphone, a tablet computer, or a notebook computer, where the image acquisition module may include an RGB sensor and an NIR sensor. In the method, the RGB sensor acquires an RGB image and the NIR sensor acquires an NIR image; the RGB sensor and the NIR sensor output the RGB image and the NIR image, respectively, to an image processor in the terminal device; the image processor registers a second image based on a first image, where the first image is one of the RGB image and the NIR image and the second image is the other one of the RGB image and the NIR image except the first image; and the image processor generates a depth map based on the first image and the registered second image.
In the present application, the frame output timestamp of the RGB image and the frame output timestamp of the NIR image are identical.
In the present application, the NIR sensor covers a wide wavelength range, so it can capture a high-quality image at night or in a poor light environment as well as a high-contrast image under normal light; therefore, the depth map generated from the RGB image and the NIR image preserves depth-map quality in bright scenes while improving it in dark scenes.
Based on the first aspect, in some possible embodiments, the method may further include: preprocessing the RGB image and/or the NIR image, wherein the preprocessing comprises one or more of the following processing modes: denoising, downsampling and edge enhancement.
In the present application, the RGB image and/or the NIR image are preprocessed to improve their quality, so that feature points in the RGB image and/or the NIR image are easier to detect.
Based on the first aspect, in some possible embodiments, the preprocessing of the RGB image and/or the NIR image includes: when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold, increasing the brightness of the denoised RGB image and/or the denoised NIR image.
Based on the first aspect, in some possible embodiments, registering the second image based on the first image includes: performing feature point matching on the first image and the second image to obtain a feature matching pair; inputting the feature matching pair into an epipolar constraint equation, and outputting a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transforming the second image by using the fundamental matrix to obtain the registered second image.
Based on the first aspect, in some possible implementations, performing feature point matching on the first image and the second image to obtain a feature matching pair includes: extracting a first feature point in the first image and a second feature point in the second image; matching the first characteristic point with the second characteristic point; and determining the feature points which are successfully matched in the first feature points and the second feature points as feature matching pairs.
In the present application, the matched feature points are screened and only the successfully matched feature points are retained, which guarantees the accuracy of the depth-map calculation.
Based on the first aspect, in some possible embodiments, the first image is row-aligned with the registered second image, so that the registered RGB and NIR images have parallax in only one direction, which allows the depth map to be calculated accurately.
In a second aspect, the present application provides a terminal device, which may implement the method of the first aspect and its possible embodiments. The terminal device may include: an RGB sensor, configured to acquire an RGB image; a near-infrared (NIR) sensor, configured to acquire an NIR image; and a processor, configured to: acquire the RGB image and the NIR image; register a second image based on a first image, where the first image is one of the RGB image and the NIR image and the second image is the other one of the RGB image and the NIR image except the first image; and generate a depth map based on the first image and the registered second image.
Based on the second aspect, in some possible embodiments, the processor is further configured to pre-process the RGB image and/or the NIR image, the pre-processing including one or more of the following processing modes: denoising, downsampling and edge enhancement.
Based on the second aspect, in some possible embodiments, the processor is specifically configured to increase the brightness of the denoised RGB image and/or the denoised NIR image when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold.
Based on the second aspect, in some possible embodiments, the processor is specifically configured to perform feature point matching on the first image and the second image to obtain a feature matching pair; input the feature matching pair into an epipolar constraint equation and output a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transform the second image by using the fundamental matrix to obtain the registered second image.
Based on the second aspect, in some possible embodiments, the processor is specifically configured to extract a first feature point in the first image and a second feature point in the second image; matching the first characteristic point with the second characteristic point; and determining the feature points which are successfully matched in the first feature points and the second feature points as feature matching pairs.
Based on the second aspect, in some possible embodiments, the first image is aligned with the registered second image row.
In a third aspect, the present application provides an image processing apparatus, which may be a chip or a system on a chip in a terminal device, or may be a functional module in the terminal device for implementing the method of any one of the first aspect and its possible embodiments. The apparatus may include: an acquisition module, configured to acquire an RGB image and an NIR image; a registration module, configured to register a second image based on a first image, where the first image is one of the RGB image and the NIR image and the second image is the other one of the RGB image and the NIR image except the first image; and a generating module, configured to generate a depth map based on the first image and the registered second image.
Based on the third aspect, in some possible embodiments, the apparatus further includes a preprocessing module configured to perform preprocessing on the RGB image and/or the NIR image, where the preprocessing includes one or more of the following processing modes: denoising, downsampling and edge enhancement.
Based on the third aspect, in some possible embodiments, the preprocessing module is further configured to increase the brightness of the denoised RGB image and/or the denoised NIR image when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold.
Based on the third aspect, in some possible embodiments, the registration module is specifically configured to perform feature point matching on the first image and the second image to obtain a feature matching pair; input the feature matching pair into an epipolar constraint equation and output a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transform the second image by using the fundamental matrix to obtain the registered second image.
Based on the third aspect, in some possible embodiments, the registration module is specifically configured to extract a first feature point in the first image and a second feature point in the second image; matching the first characteristic point with the second characteristic point; and determining the feature points which are successfully matched in the first feature points and the second feature points as feature matching pairs.
Based on the third aspect, in some possible embodiments, the first image is aligned with the registered second image row.
In a fourth aspect, the present application provides a chip comprising: a processor and a memory, the memory being configured to store a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform the method according to the first aspect and any of its possible embodiments.
In a fifth aspect, the present application provides a computer-readable storage medium having stored thereon instructions for performing the method according to the first aspect and any of its possible implementations when the instructions are run on a computer.
In a sixth aspect, the present application provides a computer program or a computer program product which, when executed on a computer, causes the computer to carry out the method according to the first aspect and any of its possible embodiments.
It should be understood that the second to sixth aspects of the present application are consistent with the technical solution of the first aspect of the present application, and the beneficial effects obtained by the aspects and the corresponding possible implementation are similar, and are not described again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic structural diagram of a terminal device in an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a method for generating a depth map according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of image registration in the embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating background blurring of an image according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device in the embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the present application or in which specific aspects of embodiments of the present application may be employed. It should be understood that embodiments of the present application may be used in other ways and may include structural or logical changes not depicted in the drawings. For example, it should be understood that the disclosure in connection with the described methods may equally apply to the corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., a unit performs one or more steps, or multiple units, each of which performs one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
In machine vision, a depth map is an image or image channel that contains information about the distance from the camera to each point in the field of view. A depth map is similar to a grayscale image except that each pixel value is the actual distance from the sensor to the object.
At present, depth maps are usually generated by a binocular system, and mainstream binocular systems are basically combinations of an ordinary RGB sensor and another RGB sensor. Under normal daytime illumination (that is, a bright scene), good depth estimation can be obtained. In a night scene (that is, a dark scene), however, an RGB image captured with a short exposure under low illumination contains heavy noise, has low brightness, and loses picture detail; obtaining a brighter picture requires high gain or long exposure, which easily produces motion blur; and using a flash produces shadows and specular reflections. Therefore, how to improve the quality of the depth map in dark scenes while preserving its quality in bright scenes has become an urgent problem to be solved.
In order to solve the above problem, an embodiment of the present application provides a method for generating a depth map. The method may be applied to a terminal device provided with an image acquisition module, such as a smartphone, a notebook computer, a tablet computer, or a digital camera. Optionally, the image acquisition module may be a front image acquisition module on the terminal device, mainly used for acquiring a portrait so as to perform background blurring, auxiliary focusing, and the like.
In this embodiment of the present application, fig. 1 is a schematic structural diagram of a terminal device. Referring to the solid lines shown in fig. 1, the terminal device 100 may include an image acquisition module 101 and a processor 102, where the image acquisition module 101 may include an RGB sensor 1011 and a near-infrared (NIR) sensor 1012. The RGB sensor 1011 may acquire an RGB image, and the NIR sensor 1012 may acquire an NIR image.
Further, the image acquisition module may also include an infrared (IR) emitter, which may be used to emit IR light in a dark scene so that the NIR sensor can acquire an NIR image.
NIR light is electromagnetic radiation with a wavelength between visible light and mid-infrared, and is invisible to the naked eye. The NIR sensor covers a wide wavelength range, so it can obtain a high-quality image not only at night or in a poor light environment but also under normal light. Further, because of its longer wavelength, NIR is less susceptible to Rayleigh scattering and penetrates haze more easily than visible light. The NIR sensor also has the advantages of low cost, low power consumption, fast response, and invisibility to the naked eye.
Fig. 2 is a flowchart illustrating a method for generating a depth map in an embodiment of the present application, and referring to fig. 2, the method may include:
S201: an RGB sensor collects RGB images in a field of view;
S202: the NIR sensor collects NIR images in a field of view;
Here, assume that the terminal device is a smartphone. The user opens the "camera" application, focuses on a target object such as a self-portrait, and presses the "photograph" button. At this time, the processor obtains a photographing instruction, executes it, triggers the RGB sensor and the NIR sensor to acquire images, and obtains the RGB image and the NIR image output by the two sensors.
In the embodiment of the present application, in order to generate the depth map, the RGB sensor and the NIR sensor must output synchronized frames; that is, each frame of RGB image acquired by the RGB sensor and the corresponding frame of NIR image acquired by the NIR sensor must have consistent frame output timestamps.
In practical applications, S201 and S202 may be executed in parallel, that is, the RGB sensor and the NIR sensor may capture the same field of view at the same time, and obtain a synchronized RGB image and a NIR image.
In practical applications, in order to ensure synchronized frames from the RGB sensor and the NIR sensor, and referring to fig. 1, the terminal device 100 may further include a synchronization module 103 coupled to the RGB sensor and the NIR sensor respectively; when the processor receives a photographing instruction, it triggers the synchronization module to control the RGB sensor and the NIR sensor to acquire images simultaneously.
S203: the RGB sensor and the NIR sensor transmit the RGB image and the NIR image to the processor through a data interface;
the processor acquires the RGB image and the NIR image through S201 and S202, and selects a first image and a second image having the same time stamp therefrom. In the embodiment of the present application, the first image may be an RGB image or an NIR image. When the first image is an RGB image, the second image may be an NIR image; when the first image is an NIR image, the second image may be an RGB image.
S204: the processor registers the second image based on the first image;
In an embodiment of the present application, fig. 3 is a schematic flowchart of image registration. As shown in fig. 3, the image registration may include:
S301: the processor performs feature point matching on the first image and the second image to obtain feature matching pairs;
S302: the processor inputs the feature matching pairs into an epipolar constraint equation and outputs a fundamental matrix;
wherein the fundamental matrix is a mapping matrix between the first image and the second image;
S303: the processor transforms the second image using the fundamental matrix to obtain the registered second image.
First, the processor extracts first feature points from the first image and second feature points from the second image; then the processor matches the first feature points with the second feature points and determines the successfully matched feature points as feature matching pairs. In other words, the processor can filter out feature points that are obviously mismatched during matching and keep the successfully matched ones, thereby obtaining the feature matching pairs. In the embodiment of the application, screening the matched feature points and keeping only the successful matches guarantees the accuracy of the subsequent depth-map calculation.
Here, for accurate registration of the second image, the feature points extracted from the first and second images by the processor may be easily identifiable points in space, such as local extreme points, gradient discontinuities, and the like.
Next, the processor inputs the obtained feature matching pairs into the epipolar constraint equation and solves it to obtain the fundamental matrix, that is, the mapping matrix between the first image and the second image. Finally, the processor transforms the second image using the fundamental matrix, thereby obtaining the registered second image. In the embodiment of the present application, the first image is row-aligned with the registered second image, so that there is parallax in only one direction between the registered RGB and NIR images, which allows the depth map to be calculated accurately.
In the embodiment of the application, because the RGB sensor and the NIR sensor differ in rotation, focal length, and the like, the processor needs to select feature points that are invariant to rotation and scaling. Meanwhile, because the RGB sensor and the NIR sensor are of different types and may differ in noise and brightness, the extracted feature points should be robust to noise and brightness and distributed across the whole image.
For example, the processor extracts first feature points {A, B, C, D} from the first image and second feature points {A', B', C', E'} from the second image, where A corresponds to A', B corresponds to B', and C corresponds to C'. The processor can filter out the unmatched feature points D and E' by regression analysis to obtain the feature matching pairs {A, A'}, {B, B'}, and {C, C'}. The processor inputs these feature matching pairs into the epipolar constraint equation shown in formula (1) below and solves for the fundamental matrix F. Finally, the processor transforms the second image using the fundamental matrix F to obtain the registered second image, whose rows are aligned with those of the first image.
p'^T F p = 0    (1)
where, for a point P in space, p is its imaging point in the first image, p' is the corresponding imaging point in the second image, and F is the fundamental matrix.
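For illustration only, the registration flow of S301 to S303 can be prototyped as follows. This is a minimal sketch and not part of the original disclosure: the choice of ORB features, the RANSAC-based estimation in cv2.findFundamentalMat, and the use of cv2.stereoRectifyUncalibrated to turn the fundamental matrix F into a warping homography are assumptions made for the example.

```python
import cv2
import numpy as np

def register_second_image(first_gray, second_gray):
    """Match feature points, estimate the fundamental matrix F from the
    epipolar constraint p'^T F p = 0, and warp the second image so its
    rows are aligned with the first image (S301-S303)."""
    # S301: extract and match feature points (ORB is tolerant to rotation and scale).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(first_gray, None)
    kp2, des2 = orb.detectAndCompute(second_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # S302: solve the epipolar constraint for F; RANSAC discards obviously
    # wrong matches, keeping only the successful feature matching pairs.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    pts1 = pts1[inlier_mask.ravel() == 1]
    pts2 = pts2[inlier_mask.ravel() == 1]

    # S303: derive rectifying homographies from F and warp the second image
    # so the two views have parallax in one direction only (row alignment).
    h, w = first_gray.shape[:2]
    _, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))
    registered_second = cv2.warpPerspective(second_gray, H2, (w, h))
    return registered_second, F
```

Note that stereoRectifyUncalibrated returns homographies for both views; warping only the second image, as in the flow above, is an approximation that holds when the first view is already close to rectified.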
S205: the processor generates a depth map based on the first image and the registered second image.
The processor performs block matching on the first image and the registered second image to obtain a sparse disparity map, optimizes the sparse disparity map, obtains a dense disparity map through optical flow, optimizes the dense disparity map, and finally converts the optimized disparity map into a depth map.
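As a concrete illustration of S205, the sketch below computes a dense disparity map with OpenCV's semi-global block matcher and converts it to depth. It is a simplified stand-in for the block-matching-plus-optical-flow pipeline described above, and the focal length and baseline values are placeholder assumptions.

```python
import cv2
import numpy as np

def disparity_to_depth(first_gray, registered_second_gray,
                       focal_px=1000.0, baseline_m=0.025):
    """Block-match the row-aligned pair into a disparity map and convert
    it to metric depth via depth = focal * baseline / disparity."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0, numDisparities=128, blockSize=7,
        P1=8 * 7 * 7, P2=32 * 7 * 7, uniquenessRatio=10)
    # StereoSGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(first_gray,
                                registered_second_gray).astype(np.float32) / 16.0

    depth = np.zeros_like(disparity)
    valid = disparity > 0            # invalid or occluded pixels stay at 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return disparity, depth
```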
In practical applications, due to repeated texture, weak texture, texture parallel to the disparity direction, and the like, there are regions in which binocular disparity cannot be calculated accurately and the computed disparity is wrong. Therefore, after obtaining the sparse disparity map, the processor needs to use some auxiliary methods, such as removing isolated error blocks through morphological dilation and erosion, removing obvious error noise through histogram statistics, and correcting the disparity of the human-body region through portrait segmentation, in order to repair holes, correct wrong disparities, and obtain a better disparity map.
Further, in order to remove edge burrs and make the edges of the subject in the disparity map fit the RGB image, the processor may further optimize the dense disparity map, for example by performing weighted median filtering on it. Of course, the processor may perform other optimizations on the dense disparity map according to different scene requirements, which is not specifically limited in the embodiment of the present application.
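The repair steps mentioned above can be sketched as follows. The kernel size and the plain median filter (used here in place of a weighted median guided by the RGB image, such as the one offered by opencv-contrib's ximgproc module) are assumptions for illustration.

```python
import cv2
import numpy as np

def refine_disparity(disparity, kernel_size=5):
    """Remove isolated wrong-disparity blobs with morphology, then smooth
    the map (a simple stand-in for weighted median filtering)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    # Opening removes small isolated error blocks; closing fills small holes.
    opened = cv2.morphologyEx(disparity, cv2.MORPH_OPEN, kernel)
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
    # Median filtering suppresses remaining salt-and-pepper disparity errors.
    return cv2.medianBlur(closed.astype(np.float32), 5)
```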
In the embodiment of the present application, to facilitate feature detection in both bright scenes (e.g., daytime) and dark scenes (e.g., nighttime), the processor may preprocess one or both of the RGB image and the NIR image before performing S204. The preprocessing may include denoising, downsampling, edge enhancement, and the like. Of course, the processor may also apply other preprocessing to one or both of the images according to different application requirements to obtain better image quality, which is not specifically limited in the embodiment of the present application. In practice, the closer the quality of the RGB image and the NIR image, the better the feature matching in S301.
In practical application, the denoising process may be bilateral filtering, guided filtering, or the like.
Further, because RGB images in dark scenes such as night scenes are noisy and lack texture detail, the processor may also increase the brightness of the RGB and/or NIR images for dark scenes.
In practical applications, the processor may apply one of the above preprocessing operations, such as denoising, downsampling, or edge enhancement, to one or both of the RGB image and the NIR image, or may apply several of them in sequence; for example, the processor may first denoise the RGB image, then increase the brightness of the denoised RGB image, and then perform edge enhancement, so as to improve the quality of the RGB image and make feature points easier to detect. Optionally, because increasing the brightness and enhancing the edges amplify noise that was previously inconspicuous, the RGB image may be denoised once more after edge enhancement.
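A minimal sketch of such a preprocessing chain is shown below. The bilateral-filter parameters, the gain-based brightness boost, and the unsharp-mask edge enhancement are illustrative choices and not necessarily the specific operations used in this embodiment.

```python
import cv2

def preprocess(gray, dark_scene=False, gain=1.8):
    """Denoise -> (optionally) brighten -> edge-enhance -> light re-denoise,
    so that feature points are easier to detect in both images."""
    # Edge-preserving denoising (bilateral filtering; guided filtering also works).
    denoised = cv2.bilateralFilter(gray, d=7, sigmaColor=30, sigmaSpace=7)

    if dark_scene:
        # Brightness boost when the ambient light is below the preset threshold.
        denoised = cv2.convertScaleAbs(denoised, alpha=gain, beta=10)

    # Unsharp masking as a simple edge enhancement.
    blurred = cv2.GaussianBlur(denoised, (0, 0), sigmaX=2.0)
    enhanced = cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)

    # Brightness and edge boosts amplify noise, so denoise once more, lightly.
    # (Downsampling, e.g. cv2.pyrDown, could also be applied here if desired.)
    return cv2.bilateralFilter(enhanced, d=5, sigmaColor=20, sigmaSpace=5)
```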
It should be noted that, in order to distinguish between a bright scene and a dark scene, the terminal device may further include an illumination sensor configured to detect the ambient light illuminance. When the ambient light illuminance is greater than a preset threshold, the terminal device is currently in a bright scene; when the ambient light illuminance is less than or equal to the preset threshold, the terminal device is currently in a dark scene. Here, the preset threshold is an empirical value and is not specifically limited in the embodiment of the present application.
Further, the processor may control the IR transmitter to operate when the illumination sensor detects that the terminal device is in a dark scene.
At this point, the process of generating the depth map is completed.
In the embodiment of the application, because the NIR sensor covers a wide wavelength range, it can obtain a higher-quality image at night or in a poor light environment as well as a high-contrast image under normal light; therefore, the depth map generated from the RGB image and the NIR image preserves depth-map quality in bright scenes and improves it in dark scenes.
In another embodiment of the present application, after the processor obtains the depth map through S201 to S205 above, the processor may perform applications such as auxiliary focusing and background blurring based on the depth map.
For example, fig. 4 is a schematic flowchart of image background blurring in the embodiment of the present application. Referring to fig. 4, for the application scenario of image background blurring, the method may include:
S401: in a dark scene, the processor controls the IR emitter to work according to the detection result of the illumination sensor;
S402: the processor performs hardware frame-output synchronization on the RGB images and the NIR images;
S403: the processor acquires multiple frames of RGB images and the synchronized NIR images;
S404: the processor performs fusion based on the multiple RGB frames to obtain a fused RGB image;
here, multi-frame fusion of the RGB frames can improve image quality; further, in a high-dynamic scene, the processor may perform multi-exposure fusion on RGB frames with different exposures to obtain a high-dynamic-range (HDR) image (a sketch of such exposure fusion is given after this list of steps);
S405: the processor preprocesses a reference frame of the multiple RGB frames and the NIR image, for example by denoising the reference frame and the NIR image, increasing the brightness of the denoised reference frame, downsampling the brightened reference frame and the denoised NIR image, and performing edge enhancement on the downsampled reference frame;
S406: the processor performs feature matching on the preprocessed reference frame and the preprocessed NIR image to obtain feature matching pairs;
S407: the processor inputs the feature matching pairs into the epipolar constraint equation and outputs a fundamental matrix;
S408: the processor transforms the preprocessed NIR image by using the fundamental matrix to obtain a registered NIR image;
S409: the processor calculates a depth map from the registered NIR image and the RGB image according to a binocular stereo matching algorithm;
S410: the processor uses the depth map to perform background blurring on the fused RGB image.
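As referenced in S404 above, the multi-exposure fusion can be prototyped with OpenCV's exposure-fusion implementation. Treating the differently exposed RGB frames as inputs to Mertens fusion is an illustrative assumption, not necessarily the fusion actually used in this embodiment.

```python
import cv2
import numpy as np

def fuse_exposures(rgb_frames):
    """Fuse several differently exposed 8-bit BGR frames of the same scene
    into one higher-quality frame for subsequent registration and blurring (S404)."""
    merge = cv2.createMergeMertens()
    fused = merge.process(rgb_frames)          # float32 result, roughly in [0, 1]
    return np.clip(fused * 255.0, 0, 255).astype(np.uint8)
```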
This background blurring method can be applied to the front image acquisition module of a mobile terminal, whose typical application scenario is portrait self-shooting: after background blurring, the portrait subject in the image remains sharp while its background is blurred. Because the blurring is performed according to the calculated depth map, depths at portrait edges and similar regions can be estimated more accurately, yielding a better depth-of-field effect.
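A simple way to realize S410 is to build a foreground mask from the depth map and blend a blurred copy of the fused RGB image with the original. The fixed subject depth, tolerance, and Gaussian blur strength below are assumptions for illustration; a real implementation would typically vary the blur strength continuously with depth.

```python
import cv2
import numpy as np

def blur_background(rgb, depth, subject_depth_m=1.0, tol_m=0.4):
    """Keep pixels near the subject depth sharp and blur the rest (S410)."""
    # Foreground mask: pixels whose depth is close to the focused subject.
    fg = (np.abs(depth - subject_depth_m) < tol_m).astype(np.float32)
    fg = cv2.GaussianBlur(fg, (0, 0), sigmaX=5)      # soften the mask edge
    fg = fg[..., None]                                # broadcast over channels

    blurred = cv2.GaussianBlur(rgb, (0, 0), sigmaX=15)
    out = fg * rgb.astype(np.float32) + (1.0 - fg) * blurred.astype(np.float32)
    return out.astype(np.uint8)
```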
Alternatively, for the auxiliary-focusing application scenario, after performing S401 to S409 the processor obtains a depth map; the processor may then convert the depth value in the depth map into a corresponding motor position according to a pre-calibrated correspondence between depth values and motor positions, and drive the motor to achieve focusing.
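The depth-to-motor-position conversion is essentially a lookup into the pre-calibrated table mentioned above; the sketch below uses linear interpolation over a hypothetical calibration table (the depth and motor-position values are invented for illustration).

```python
import numpy as np

# Hypothetical calibration table: subject depth (m) -> focus motor position (steps).
CALIB_DEPTH_M = np.array([0.1, 0.3, 0.5, 1.0, 2.0, 5.0])
CALIB_MOTOR_POS = np.array([900, 700, 550, 400, 300, 250])

def depth_to_motor_position(depth_map, roi):
    """Take the median depth inside the focus ROI and map it to a motor position."""
    x0, y0, x1, y1 = roi
    subject_depth = float(np.median(depth_map[y0:y1, x0:x1]))
    return float(np.interp(subject_depth, CALIB_DEPTH_M, CALIB_MOTOR_POS))
```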
Of course, the depth map generated in the embodiment of the present application may also be applied to other scenes, such as short-distance human head modeling, human body modeling, scene reconstruction, and the like, and the embodiment of the present application is not particularly limited.
Based on the same inventive concept as the method described above, an embodiment of the present application provides an image processing apparatus, which may be a chip or a system on a chip in the terminal device described in the foregoing embodiments, or may be a functional module in the terminal device for implementing the foregoing embodiments. Fig. 5 is a schematic structural diagram of an image processing apparatus in an embodiment of the present application, and referring to fig. 5, the image processing apparatus 500 may include: an acquisition module 501 for acquiring an RGB image and an NIR image; a registration module 502, configured to register a second image based on a first image, where the first image is one of an RGB image and an NIR image, and the second image is another one of the RGB image and the NIR image except the first image; a generating module 503, configured to generate a depth map based on the first image and the registered second image.
In some possible embodiments, the apparatus further includes a preprocessing module for preprocessing the RGB image and/or the NIR image, wherein the preprocessing includes one or more of the following processing modes: denoising, downsampling and edge enhancement.
In some possible embodiments, the preprocessing module is further configured to increase the brightness of the denoised RGB image and/or the denoised NIR image when the ambient light illuminance detected by the illumination sensor is less than or equal to a preset threshold.
In some possible embodiments, the registration module is specifically configured to perform feature point matching on the first image and the second image to obtain a feature matching pair; input the feature matching pair into an epipolar constraint equation and output a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transform the second image by using the fundamental matrix to obtain the registered second image.
In some possible embodiments, the registration module is specifically configured to extract a first feature point in the first image and a second feature point in the second image; matching the first characteristic point with the second characteristic point; and determining the feature points which are successfully matched in the first feature points and the second feature points as feature matching pairs.
In some possible embodiments, the first image is aligned with the registered second image row.
Based on the same inventive concept as the method described above, embodiments of the present application provide a terminal device, which is consistent with the terminal device described in the embodiments described above, and can be used to execute the method described in each embodiment described above.
Fig. 6 is a schematic structural diagram of a terminal device in an embodiment of the present application. Referring to fig. 6, the terminal device 600 may include: an RGB sensor 601, configured to acquire an RGB image; an NIR sensor 602, configured to acquire an NIR image; and a processor 603, configured to: acquire the RGB image and the NIR image; register a second image based on a first image, where the first image is one of the RGB image and the NIR image and the second image is the other one of the RGB image and the NIR image except the first image; and generate a depth map based on the first image and the registered second image.
In some possible embodiments, the processor is further configured to pre-process the RGB image and/or the NIR image, the pre-processing including one or more of the following processing modes: denoising, downsampling and edge enhancement.
In some possible embodiments, the processor is specifically configured to increase the brightness of the denoised RGB image and/or the denoised NIR image when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold.
In some possible embodiments, the processor is specifically configured to perform feature point matching on the first image and the second image to obtain a feature matching pair; input the feature matching pair into an epipolar constraint equation and output a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transform the second image by using the fundamental matrix to obtain the registered second image.
In some possible embodiments, the processor is specifically configured to extract a first feature point in the first image and a second feature point in the second image; matching the first characteristic point with the second characteristic point; and determining the feature points which are successfully matched in the first feature points and the second feature points as feature matching pairs.
In some possible embodiments, the first image is aligned with the registered second image row.
Based on the same inventive concept as the method, an embodiment of the present application provides a chip, including: a processor and a memory, the memory being configured to store a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform the method according to the first aspect and any of its possible embodiments.
Based on the same inventive concept as the method described above, embodiments of the present application provide a computer-readable storage medium storing instructions for performing the method according to the first aspect and any of its possible implementation manners when the instructions are executed on a computer.
Based on the same inventive concept as the above method, embodiments of the present application provide a computer program or a computer program product, which, when executed on a computer, causes the computer to implement the method as described in the first aspect and any of its various possible embodiments.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by an interoperating hardware unit (including one or more processors as described above).
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of generating a depth map, comprising:
acquiring an RGB image through an RGB sensor;
obtaining a NIR image by a near infrared NIR sensor;
registering a second image based on a first image, wherein the first image is one frame of the RGB image and the NIR image, and the second image is the other frame of the RGB image and the NIR image except the first image;
generating a depth map based on the first image and the registered second image.
2. The method of claim 1, further comprising:
pre-processing the RGB image and/or the NIR image, the pre-processing comprising one or more of: denoising, downsampling and edge enhancement.
3. The method of claim 2, wherein the pre-processing the RGB image and/or the NIR image comprises:
when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold, increasing the brightness of the denoised RGB image and/or the denoised NIR image.
4. The method of any of claims 1 to 3, wherein registering the second image based on the first image comprises:
performing feature point matching on the first image and the second image to obtain a feature matching pair;
inputting the feature matching pair into an epipolar constraint equation, and outputting a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image;
and transforming the second image by using the fundamental matrix to obtain the registered second image.
5. The method of claim 4, wherein the performing feature point matching on the first image and the second image to obtain a feature matching pair comprises:
extracting a first feature point in the first image and a second feature point in the second image;
matching the first feature point and the second feature point;
and determining the feature points which are successfully matched in the first feature points and the second feature points as the feature matching pairs.
6. The method of any of claims 1 to 5, wherein the first image is aligned with the registered second image row.
7. A terminal device, comprising:
an RGB sensor configured to acquire an RGB image;
a near-infrared (NIR) sensor, configured to acquire an NIR image;
an image processor, configured to: acquire the RGB image and the NIR image; register a second image based on a first image, wherein the first image is one frame of the RGB image and the NIR image, and the second image is the other frame of the RGB image and the NIR image except the first image; and generate a depth map based on the first image and the registered second image.
8. The terminal device of claim 7, wherein the image processor is further configured to pre-process the RGB image and/or the NIR image, the pre-processing comprising one or more of: denoising, downsampling and edge enhancement.
9. The terminal device of claim 8, wherein the image processor is specifically configured to increase the brightness of the denoised RGB image and/or the denoised NIR image when the ambient light illuminance detected by an illumination sensor is less than or equal to a preset threshold.
10. The terminal device according to any one of claims 7 to 9, wherein the image processor is specifically configured to perform feature point matching on the first image and the second image to obtain a feature matching pair; input the feature matching pair into an epipolar constraint equation and output a fundamental matrix, wherein the fundamental matrix is a mapping matrix between the first image and the second image; and transform the second image by using the fundamental matrix to obtain the registered second image.
11. The terminal device of claim 10, wherein the image processor is specifically configured to extract a first feature point in the first image and a second feature point in the second image; matching the first feature point and the second feature point; and determining the feature points which are successfully matched in the first feature points and the second feature points as the feature matching pairs.
12. The terminal device according to any of claims 7 to 11, characterized in that the first image is aligned with the registered second image row.
13. A chip, comprising: a processor and a memory, the memory being configured to store a computer program, and the processor being configured to invoke and execute the computer program stored in the memory to perform the method of any one of claims 1 to 6.
CN202010225480.7A 2020-03-26 2020-03-26 Method and equipment for generating depth map Pending CN113450391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010225480.7A CN113450391A (en) 2020-03-26 2020-03-26 Method and equipment for generating depth map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010225480.7A CN113450391A (en) 2020-03-26 2020-03-26 Method and equipment for generating depth map

Publications (1)

Publication Number Publication Date
CN113450391A true CN113450391A (en) 2021-09-28

Family

ID=77807302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010225480.7A Pending CN113450391A (en) 2020-03-26 2020-03-26 Method and equipment for generating depth map

Country Status (1)

Country Link
CN (1) CN113450391A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095517A (en) * 2022-08-31 2023-05-09 荣耀终端有限公司 Blurring method and blurring device
CN116095517B (en) * 2022-08-31 2024-04-09 荣耀终端有限公司 Blurring method, terminal device and readable storage medium

Similar Documents

Publication Publication Date Title
JP7003238B2 (en) Image processing methods, devices, and devices
CN110428366B (en) Image processing method and device, electronic equipment and computer readable storage medium
US11457138B2 (en) Method and device for image processing, method for training object detection model
KR102565513B1 (en) Method and apparatus for multiple technology depth map acquisition and fusion
CN108702437B (en) Method, system, device and storage medium for calculating depth map
US10805508B2 (en) Image processing method, and device
WO2021022983A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
US20170337693A1 (en) Method and system of real-time image segmentation for image processing
EP2881915B1 (en) Techniques for disparity estimation using camera arrays for high dynamic range imaging
KR101699919B1 (en) High dynamic range image creation apparatus of removaling ghost blur by using multi exposure fusion and method of the same
WO2019148978A1 (en) Image processing method and apparatus, storage medium and electronic device
CN106899781B (en) Image processing method and electronic equipment
WO2019042216A1 (en) Image blurring processing method and device, and photographing terminal
CN110349163B (en) Image processing method and device, electronic equipment and computer readable storage medium
WO2019105298A1 (en) Image blurring processing method, device, mobile device and storage medium
WO2020057248A1 (en) Image denoising method and apparatus, and device and storage medium
CN113450391A (en) Method and equipment for generating depth map
CN113949802A (en) Image processing method and camera
CN108629329B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113240602A (en) Image defogging method and device, computer readable medium and electronic equipment
Wu et al. Dual-camera HDR synthesis guided by long-exposure image
CN105301863A (en) Liquid crystal lens imaging device and liquid crystal lens imaging method
JP2013197892A (en) Object recognition apparatus, object recognition method, and computer program for object recognition
CN117710273A (en) Image enhancement model construction method, image enhancement method, device and medium
CN117714664A (en) Focusing stability testing method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination