CN117560480B - Image depth estimation method and electronic equipment - Google Patents

Image depth estimation method and electronic equipment

Info

Publication number
CN117560480B
Authority
CN
China
Prior art keywords
image
depth
binocular
depth map
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410026355.1A
Other languages
Chinese (zh)
Other versions
CN117560480A (en)
Inventor
卢溜
朱志聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202410026355.1A
Publication of CN117560480A
Application granted
Publication of CN117560480B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N13/246 Calibration of cameras
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The embodiment of the application discloses an image depth estimation method and electronic equipment, relates to the technical field of image processing, and aims to obtain more accurate depth information under different shooting scenes and improve the robustness of the image depth estimation method. The specific scheme comprises the following steps: acquiring a binocular image in response to a first operation for starting an image blurring function; the binocular image includes two frames of images; acquiring scene information of a shooting scene according to the binocular image; under the condition that scene information accords with binocular depth estimation conditions, performing depth estimation on a binocular image by adopting a binocular depth estimation algorithm to obtain a first depth map; and under the condition that the scene information does not accord with the binocular depth estimation condition, obtaining a second depth map based on a monocular depth estimation algorithm and monocular images in the binocular images. Wherein the binocular depth estimation condition is used to indicate at least one of: the subject in the binocular image is clear, and the subject in the binocular image is the same.

Description

Image depth estimation method and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image depth estimation method and an electronic device.
Background
Currently, cameras of electronic devices (e.g., cell phones) are provided with image blurring functionality. The image blurring is to perform blurring processing on the image according to the depth information of the image so as to realize the visual effect that one part of the image is clear and the other part of the image is blurred. At present, a binocular depth estimation algorithm is mainly adopted to acquire depth information of an image.
However, the depth information acquired with the binocular depth estimation algorithm may be inaccurate. Blurring an image based on erroneous depth information may cause problems such as a region that should be blurred remaining clear (e.g., the background stays sharp) and a subject that should be displayed clearly being blurred (e.g., the photographed subject is blurred), so that the visual effect of the image after blurring (or the blurring effect of the image) is poor.
Disclosure of Invention
The embodiment of the application provides an image depth estimation method and electronic equipment, which can acquire more accurate depth information under different shooting scenes, improve the robustness of the image depth estimation method and further realize better image blurring effect.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical scheme:
in a first aspect, an image depth estimation method is provided, the method being applied to an electronic device comprising an image sensor, the method may comprise: acquiring a binocular image in response to a first operation for starting an image blurring function; the binocular image includes two frames of images; acquiring scene information of a shooting scene according to the binocular image; under the condition that scene information accords with binocular depth estimation conditions, performing depth estimation on a binocular image by adopting a binocular depth estimation algorithm to obtain a first depth map; under the condition that the scene information does not accord with the binocular depth estimation condition, obtaining a second depth map based on a monocular depth estimation algorithm and monocular images in the binocular images; the monocular image is any one of two frames included in the binocular image.
Wherein the binocular depth estimation condition is used to indicate at least one of: the subject in the binocular image is clear, and the subject in the binocular image is the same.
It can be appreciated that the electronic device, in response to a first operation for starting the image blurring function, may acquire a binocular image of the photographed scene, and then acquire scene information of the shooting scene according to the binocular image. Because an unclear photographed object in the binocular image, a change in the position of the photographed object between the two frames, and the like can cause errors in the depth information acquired by the binocular depth estimation algorithm, the electronic device may determine whether the scene information meets the binocular depth estimation condition. Further, when the scene information satisfies the binocular depth estimation condition, the photographed objects in the binocular image are clear and/or identical (including their positions in the two frames being consistent), so the depth information determined by the binocular depth estimation algorithm has high accuracy. Conversely, when the scene information does not meet the binocular depth estimation condition, the error rate of determining depth information with the binocular depth estimation algorithm is higher, so the depth information is determined based on the monocular depth estimation algorithm instead; in that case, the accuracy of the depth information determined based on the monocular depth estimation algorithm is higher than it would be if the binocular depth estimation algorithm were used.
In summary, by adopting the scheme, more accurate depth information can be obtained under different shooting scenes, and the robustness of the image depth estimation method is improved.
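For illustration only, the following is a minimal sketch (in Python) of the decision flow described above. The function signature and the two estimator callables are hypothetical placeholders, not part of any actual device interface; the default thresholds are the example values given later in the embodiments (350 for the brightness index, 15 pixels for the displacement).

```python
def estimate_depth(binocular_image, scene_info,
                   binocular_estimator, monocular_estimator,
                   brightness_threshold=350, displacement_threshold=15):
    """Select a depth estimation algorithm according to the scene information.

    binocular_image: (frame_a, frame_b), the two frames of the binocular image.
    scene_info: dict with 'brightness_index' (inversely proportional to scene
                brightness) and 'motion' (displacement of the photographed
                object between the two frames, in pixels).
    binocular_estimator / monocular_estimator: callables supplied by the caller.
    """
    meets_condition = (scene_info["brightness_index"] < brightness_threshold
                       and scene_info["motion"] < displacement_threshold)
    if meets_condition:
        # Bright scene, roughly static subject: binocular (stereo) depth
        # estimation yields accurate absolute depth (the first depth map).
        return binocular_estimator(*binocular_image)
    # Otherwise obtain the second depth map from a single frame with the
    # monocular depth estimation algorithm (later converted using the TOF map).
    monocular_image = binocular_image[0]
    return monocular_estimator(monocular_image)
```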
With reference to the first aspect, in one possible implementation manner, the first depth information included in the first depth map and the second depth information included in the second depth map are both absolute depth information. The method further comprises the steps of: in response to a first operation, a time-of-flight TOF depth map is acquired. The obtaining a second depth map based on the monocular depth estimation algorithm and the monocular image in the binocular image includes: performing depth estimation on the monocular image by adopting a monocular depth estimation algorithm to obtain a third depth map; the third depth information included in the third depth map is relative depth information; and converting the third depth map according to the TOF depth map to obtain a second depth map.
It will be appreciated that, in response to the first operation, the electronic device acquires the TOF depth map at the same time as the binocular image. If the light is bright and the photographed object is approximately stationary, the electronic device can acquire more accurate depth information with the binocular depth estimation algorithm. If the shooting scene is dark or the photographed object is in motion, the electronic device does not use the binocular depth estimation algorithm but acquires depth information based on the monocular depth estimation algorithm. That is, the electronic device employs different depth estimation algorithms in different shooting scenes and can switch the algorithm it employs when the shooting scene changes, for example, from the binocular depth estimation algorithm to the monocular depth estimation algorithm, or from the monocular depth estimation algorithm to the binocular depth estimation algorithm. The binocular depth estimation algorithm obtains absolute depth information, while the monocular depth estimation algorithm obtains relative depth information; these are depth information of different dimensions. When the electronic device performs blurring processing on the image based on depth information of different dimensions, photographed objects at the same distance from the shooting subject are blurred to different degrees, which may cause the user to perceive that the blurring degree of a photographed object at a fixed distance from the camera has changed.
For this, the electronic device may use the TOF depth map to convert the relative depth information acquired by the monocular depth estimation algorithm into absolute depth information. Both the converted absolute depth information and the absolute depth information obtained by the binocular depth estimation algorithm represent the real distance between the photographed object and the camera, i.e., they are depth information in the same dimension. Then, when the electronic device switches the depth estimation method it employs, photographed objects at the same distance from the shooting subject are not blurred to different degrees. The user does not perceive any change in the blurring degree of photographed objects at the same distance from the camera, so imperceptible switching between the binocular depth estimation algorithm and the monocular depth estimation algorithm can be achieved.
With reference to the first aspect, in another possible implementation manner, the converting the third depth map according to the TOF depth map to obtain a second depth map includes: determining target conversion parameters according to the TOF depth map and the third depth map; the target conversion parameter is used for converting the relative depth information into absolute depth information; and converting the third depth information in the third depth map by using the target conversion parameter to obtain a second depth map.
This embodiment describes one implementation of converting the third depth map from the TOF depth map.
With reference to the first aspect, in another possible implementation manner, determining the target conversion parameter according to the TOF depth map and the third depth map includes: performing pixel point matching on the TOF depth map and the third depth map to obtain a plurality of pixel point pairs; each of the plurality of pixel pairs includes: one pixel in the TOF depth map and one pixel in the third depth map; and determining target conversion parameters for minimizing a preset target function according to the TOF depth information and the third depth information of each pixel point pair in the plurality of pixel point pairs and the preset target function.
The preset objective function is used for representing the sum of squares of differences between the TOF depth information and the converted depth information, and the converted depth information is obtained by converting the third depth information by utilizing the objective conversion parameter.
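Written out as a formula (the symbols below are chosen here for illustration; in particular, modelling the target conversion parameter as a scale a and an offset b is an assumption), the preset objective function over the N matched pixel point pairs is:

$$ J(a,b)=\sum_{i=1}^{N}\Bigl(d_i^{\mathrm{TOF}}-\bigl(a\,d_i^{\mathrm{rel}}+b\bigr)\Bigr)^{2} $$

where d_i^TOF is the TOF depth information of the i-th pixel point pair, d_i^rel is the corresponding third (relative) depth information, and the target conversion parameters are the values of a and b that minimize J.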
It can be understood that the electronic device in the embodiment of the present application determines a plurality of matched pixel point pairs from the TOF depth map and the relative depth map, and uses those pixel point pairs to obtain the target conversion parameter for converting the relative depth information into absolute depth information. Compared with inputting the whole TOF depth map and the whole relative depth map into a network model, so that the network model converts the relative depth information into absolute depth information based on both maps, this approach only uses a plurality of pixel point pairs for the conversion. It therefore has lower requirements on the resolution and accuracy of the TOF depth map, and lower performance requirements on the TOF sensor used to acquire the TOF depth map. A TOF sensor with lower performance costs less.
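As a concrete illustration of the above (a sketch only: the scale-and-offset form of the conversion, and pairing pixels by resizing the relative depth map down to the TOF resolution, are assumptions made here rather than details prescribed by the embodiments):

```python
import numpy as np
import cv2

def fit_conversion(tof_depth, relative_depth):
    """Estimate target conversion parameters (scale a, offset b) that map
    relative depth to absolute depth by minimizing the sum of squared
    differences against the TOF depth map (ordinary least squares)."""
    # Match pixel points by resizing the relative depth map down to the
    # (much smaller) TOF resolution, so each TOF pixel pairs with one value.
    h, w = tof_depth.shape
    rel_small = cv2.resize(relative_depth, (w, h), interpolation=cv2.INTER_AREA)

    valid = tof_depth > 0                       # ignore pixels without a TOF return
    d_rel = rel_small[valid].astype(np.float64)
    d_tof = tof_depth[valid].astype(np.float64)

    # Solve min_{a,b} sum (d_tof - (a * d_rel + b))^2
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, d_tof, rcond=None)
    return a, b

def convert_relative_depth(relative_depth, a, b):
    """Apply the target conversion parameters to obtain the second depth map
    (absolute depth) from the third depth map (relative depth)."""
    return a * relative_depth + b
```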
With reference to the first aspect, in another possible implementation manner, the scene information includes: brightness index and motion information of shooting objects in a shooting scene; the magnitude of the brightness index is inversely proportional to the brightness of the shooting scene; the movement information of the photographic subject is used to represent the displacement amount of the photographic subject in the binocular image. The binocular depth estimation conditions include: the brightness index is smaller than a preset brightness threshold, and the motion information of the shooting object is smaller than a preset displacement threshold.
It will be appreciated that low brightness of the photographed scene results in unclear photographed objects in the captured binocular image, and a moving photographed object causes the photographed objects in the two frames of the binocular image to differ, for example, the position of the photographed object shifts between the frames. Both unclear and inconsistent photographed objects in the binocular image can cause errors in the depth information acquired by the binocular depth estimation algorithm, so the electronic device may acquire scene information such as the brightness index and the motion information of the photographed object, and determine whether this scene information meets the binocular depth estimation condition. The binocular depth estimation condition includes: the brightness index is smaller than a preset brightness threshold, and the motion information of the photographed object is smaller than a preset displacement threshold. A brightness index smaller than the preset brightness threshold indicates that the brightness of the shooting scene is high, and motion information smaller than the preset displacement threshold indicates that the photographed object in the shooting scene is approximately stationary. Thus, when the scene information meets the binocular depth estimation condition, the brightness of the photographed scene is high and the photographed object is approximately stationary, so the accuracy of the depth information determined by the binocular depth estimation algorithm is not affected, i.e., the accuracy of the depth information determined by the binocular depth estimation algorithm is high.
With reference to the first aspect, in another possible implementation manner, the method further includes: carrying out Gaussian smoothing and difference operation on the first depth map or the second depth map to obtain an image after Gaussian difference; and blurring the monocular image based on the Gaussian difference image to obtain a blurred image.
It can be appreciated that by adopting the scheme, more accurate depth information can be obtained under different shooting scenes. The image is subjected to blurring processing by using accurate depth information, so that a shooting subject close to the shooting subject and a foreground and a background far from the shooting subject can be accurately determined. And further, the visual effects of clear shooting subjects and blurred foreground and background in the image can be realized, namely, a better image blurring effect is realized.
With reference to the first aspect, in another possible implementation manner, the blurring processing is performed on the monocular image based on the image after gaussian difference to obtain a blurred image, where the blurring processing includes: determining a shooting subject in a monocular image; the shooting subject is determined by the electronic device in response to the second operation, or is determined based on a shooting mode started by the electronic device, or is defaulted by the electronic device; and blurring processing is carried out on other areas except the area where the shooting subject is located in the monocular image based on pixel values in the image after Gaussian difference, so that a blurred image is obtained.
This embodiment describes one implementation of blurring a monocular image.
With reference to the first aspect, in another possible implementation manner, the blurring processing is performed on other areas of the monocular image except for the area where the subject is located based on pixel values in the image after gaussian difference, so as to obtain a blurred image, where the blurring processing includes: and according to pixel values of pixel points corresponding to other areas in the image after Gaussian difference, carrying out blurring processing of different degrees on other areas in the monocular image to obtain a blurred image, so that the blurring degree of the area, which is farther from a shooting subject, in the blurred image is higher.
This embodiment describes one implementation of blurring a monocular image.
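Purely as an illustrative sketch of this blurring step (the kernel sizes, the number of blur levels, and the way the Gaussian-difference values are referenced to the subject region are all assumptions; the subject mask is supplied by the caller, for example from the determination of the shooting subject described above):

```python
import numpy as np
import cv2

def blur_with_depth(monocular_image, depth_map, subject_mask,
                    sigma_small=1.0, sigma_large=5.0):
    """Blur regions outside the shooting subject; pixels whose
    Gaussian-difference value lies farther from that of the subject
    region are blurred more strongly."""
    depth = depth_map.astype(np.float32)

    # Gaussian smoothing and difference operation on the depth map
    # (difference of two Gaussian-smoothed versions of the depth map).
    dog = cv2.GaussianBlur(depth, (0, 0), sigma_small) \
        - cv2.GaussianBlur(depth, (0, 0), sigma_large)

    # Reference the Gaussian-difference values to the subject region and
    # normalize to [0, 1] as a per-pixel blur weight (assumption).
    subject_level = float(np.median(dog[subject_mask > 0]))
    weight = np.abs(dog - subject_level)
    weight = weight / (weight.max() + 1e-6)

    # Apply progressively stronger Gaussian blur to pixels with larger weight,
    # leaving the subject region untouched.
    result = monocular_image.copy()
    for level, ksize in enumerate((5, 11, 17, 23), start=1):
        layer = cv2.GaussianBlur(monocular_image, (ksize, ksize), 0)
        mask = (weight >= level / 5.0) & (subject_mask == 0)
        result[mask] = layer[mask]
    return result
```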
In a second aspect, there is provided an electronic device comprising: a processor, a memory, and a communication interface. The memory and the communication interface are coupled to the processor, the memory for storing computer program code, the computer program code comprising computer instructions. Wherein the processor, when executing the computer instructions, causes the electronic device to perform the image depth estimation method according to any one of the first aspects.
In a third aspect, a computer-readable storage medium having computer instructions stored therein is provided. When executed on an electronic device, the computer instructions cause the electronic device to perform the image depth estimation method according to any one of the first aspects above.
In a fourth aspect, there is provided a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the image depth estimation method according to any one of the first aspects above.
In a fifth aspect, there is provided an apparatus (e.g. the apparatus may be a chip system) comprising a processor for supporting an electronic device to implement the image depth estimation method of the first aspect described above. In one possible design, the apparatus further includes a memory for storing program instructions and data necessary for the electronic device. When the device is a chip system, the device can be formed by a chip, and can also comprise the chip and other discrete devices.
The technical effects of any one of the design manners of the second aspect to the fifth aspect may be referred to the technical effects of the different design manners of the first aspect, and will not be repeated here.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of a mobile phone according to an embodiment of the present application;
FIG. 2 is a flowchart of an image depth estimation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an operation of activating an image blurring function according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating another operation of activating an image blurring function according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of depth information conversion according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a plurality of pixel pairs obtained by performing pixel matching on a relative depth map and a TOF depth map according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an image depth estimation process according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an image depth estimation device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In order to facilitate the description of the technical solutions of the application, some concepts related to the present application will be described below first.
(1) Depth information: refers to the extent to which a photographic subject is spaced from the camera in three-dimensional space. Wherein the depth information may be a numerical value and the depth information may be referred to as a depth value.
In the embodiment of the present application, the depth map may refer to an image containing depth information of pixel points.
(2) Image blurring (or blurring process): is an image processing technology, which is used for realizing the visual effect of partial picture blurring in an image by applying blurring effect to pixels with depth information within a specific range in the image.
In an embodiment of the present application, the image may include a subject, a foreground, and a background. The subject may refer to a subject to which a focus of the camera is directed, and for example, the subject may be a person, an animal, a landscape, or the like. When the focal point of the camera changes, the subject changes. A picture in front of the subject in the image may be referred to as a foreground. The picture located behind the photographing subject may be referred to as a background.
Alternatively, the blurring process may include: applying a blurring effect to the pixels in the foreground and the background, with a greater degree of blurring for pixels in the foreground and background that are farther from the shooting subject. This achieves the visual effect of a clear shooting subject with a blurred foreground and background, which is similar to a depth-of-field effect. The depth-of-field effect refers to clear imaging of the part of the scene that lies within a certain depth range in the image, where the depth information of the shooting subject may define that depth range.
Further, the blurring process may also include: using the same degree of blurring for pixels in the foreground and background whose distance to the shooting subject exceeds a preset distance threshold. The distance between a pixel and the shooting subject may be represented by the difference between the depth information of the pixel and the depth information of the shooting subject.
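A minimal sketch of this mapping (the linear ramp and the example threshold value are assumptions made here for illustration only):

```python
def blur_strength(pixel_depth, subject_depth,
                  distance_threshold=2.0, max_strength=1.0):
    """Blur strength grows with the depth difference from the shooting subject
    and saturates at the same (maximum) degree once that difference exceeds
    the preset distance threshold."""
    distance = abs(pixel_depth - subject_depth)
    if distance >= distance_threshold:
        return max_strength
    return max_strength * distance / distance_threshold
```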
(3) Binocular depth estimation algorithm: is a method of calculating depth information of a photographing object in a photographing scene using a binocular camera system. By placing two cameras (typically, left and right) at a distance, depth information of a photographic subject is calculated by comparing images at two angles of view to obtain a depth map containing the depth information of the photographic subject.
The binocular depth estimation algorithm is mainly based on the principle of triangulation, and the specific process can include: shooting a shooting object through two cameras simultaneously to obtain two frames of images; then, according to the two frames of images, measuring the pixel offset or parallax of the same shooting object shot by the two cameras; and then the distance, namely depth information, of the shooting object relative to the camera can be calculated through a plurality of geometric relations and calibration parameters. This process requires steps of stereo matching, parallax computation, depth restoration, and the like.
The parallax refers to the difference in the position of the same photographed object between the two images captured by the two cameras. The binocular depth estimation algorithm relies on natural light, so the light level of the photographed scene has a large impact on this method.
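For reference, the triangulation relation for a rectified two-camera setup is commonly written as:

$$ Z = \frac{f \cdot B}{d} $$

where Z is the depth of the photographed object, f is the focal length (in pixels), B is the baseline, i.e. the distance between the two cameras, and d is the parallax (disparity) in pixels. This is the standard stereo relation and is given here only as background.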
Alternatively, two frames of images acquired by two cameras used by the binocular depth estimation algorithm may be referred to as binocular images.
(4) Monocular depth estimation algorithm: is a method of calculating depth information of a photographing object in a photographing scene using a single camera. Compared with a binocular depth estimation algorithm, the monocular depth estimation algorithm does not need two cameras, and can realize depth estimation only by one camera.
The monocular depth estimation algorithm is based primarily on visual cues in the image and machine learning algorithms. By carrying out processing such as semantic segmentation, motion analysis, texture analysis and the like on the image, the depth information of the shooting object can be deduced by combining an existing depth information database or a trained depth prediction model.
However, the monocular depth estimation algorithm does not have real distance information, which means that it cannot acquire depth information representing real distances, but can only acquire depth information representing how near or far the photographed objects are from the camera relative to one another. Depth information indicating the relative nearness or farness of the photographed objects from the camera may be referred to as relative depth information. Further, a depth map containing the relative depth information of the photographed object is obtained. A depth map containing relative depth information of a photographed object may be referred to as a relative depth map.
Alternatively, the depth information acquired by the binocular depth estimation algorithm may represent a real distance between the photographed object and the camera, which may be referred to as absolute depth information. The depth map containing absolute depth information of the photographic subject may be referred to as an absolute depth map.
Alternatively, a frame of image acquired by one camera used by the monocular depth estimation algorithm may be referred to as a monocular image.
(5) Time of flight (TOF) technique: is a method for measuring a distance between a photographic subject and a camera by measuring a time required for an optical signal from transmission to reception to infer distance information of the photographic subject.
TOF technology generally uses an infrared light source. The light emitted by a transmitter is received by a receiver after being reflected by the photographed object. By recording the time from emission to reception of the light and combining it with the speed of light, the distance traveled by the light in space can be calculated. By measuring the entire shooting scene multiple times, depth information of the photographed object can be obtained.
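The underlying distance relation is the standard round-trip time-of-flight formula, shown here for reference:

$$ D = \frac{c \cdot \Delta t}{2} $$

where D is the distance between the photographed object and the camera, c is the speed of light, and Δt is the time from emission to reception of the optical signal; the factor of 2 accounts for the round trip.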
In the prior art, a binocular depth estimation algorithm is generally adopted to acquire absolute depth information, and blurring processing is carried out on an image based on the absolute depth information. The binocular depth estimation algorithm needs to perform stereo matching on two frames of images captured by two cameras to determine a position difference (i.e., parallax) of a captured object in the two frames of images. However, both a photographing scene with darker light (e.g., a night photographing scene) and a photographing scene with a photographing object in a moving state (e.g., a moving photographing scene) may cause errors in stereo matching, and the determined parallax may be wrong, which in turn may cause errors in absolute depth information determined according to the parallax.
In summary, in the prior art, absolute depth information obtained by using a binocular depth estimation algorithm is affected by a shooting scene, and there is a problem that the absolute depth information is inaccurate. Further, when the electronic device performs blurring processing on the image based on the wrong absolute depth information, there are problems that blurring processing is performed on a shooting subject to be clearly displayed in the image, blurring processing is not performed on a foreground and a background to be blurred in the image, blurring degree of a foreground and a background far away from the shooting subject is insufficient, and the like, so that the shooting subject in the image is clear, and visual effects of blurring the foreground and the background in the image, that is, visual effects of the image after blurring processing are poor, cannot be achieved.
In view of the above problems, an embodiment of the present application provides an image depth estimation method, in which an electronic device may collect binocular images of a shooting scene in response to a user operation for starting an image blurring function. Because low brightness of the shooting scene or a shooting object in a motion state can cause errors in the depth information acquired by the binocular depth estimation algorithm, the electronic device may acquire scene information such as ambient brightness information and motion information of the shooting object from the binocular image, and determine whether this scene information meets the binocular depth estimation condition. The binocular depth estimation condition may be used to indicate that the brightness of the photographed scene is high and that the photographed object is approximately stationary. Further, if the scene information meets the binocular depth estimation condition, the brightness of the photographed scene is high and the photographed object is approximately stationary, so the accuracy of the depth information determined by the binocular depth estimation algorithm is high. Conversely, if the scene information does not meet the binocular depth estimation condition, the error rate of determining depth information with the binocular depth estimation algorithm is higher, and the depth information is therefore determined based on the monocular depth estimation algorithm. In that case, the accuracy of the depth information determined based on the monocular depth estimation algorithm is higher than it would be if the binocular depth estimation algorithm were used.
In summary, by adopting the scheme, more accurate depth information can be obtained under different shooting scenes, and the robustness of the image depth estimation method is improved. Further, the image is subjected to blurring processing by using accurate depth information, so that a shooting subject close to the shooting subject and a foreground and a background far from the shooting subject can be accurately determined, further, the shooting subject in the image is clear, and the visual effect of blurring of the foreground and the background is achieved, namely, a better image blurring effect is achieved.
The image depth estimation method provided by the embodiment of the application can be applied to electronic equipment. The electronic device may be an electronic device that does not include a display, e.g., a system-on-chip. The electronic device may also be an electronic device including a display, such as a cell phone, tablet, notebook, device composed of a chip system and a display, and the like. The execution subject of the image depth estimation method provided by the embodiment of the application can also be an image depth estimation device. The image depth estimation means may be an electronic device or the image depth estimation means may be a control unit in the electronic device for performing the image depth estimation method. The embodiment of the present application is not limited in this regard.
The hardware structure of the electronic device will be described below by taking the example that the electronic device is a mobile phone as an example. As shown in fig. 1, the handset 100 may include a processor 110, an external memory interface 120, an internal memory 121, a sensor module 130, a display screen 140, a camera module 150, and the like. Among other things, the sensor module 130 may include a touch sensor 131, an image sensor 132, and a TOF sensor 133, among others. The camera module 150 may include a main camera (or referred to as a main camera) 151, a wide-angle camera 152, and a tele camera 153.
The processor 110 may be configured to perform the various functions or steps performed by the handset in the method embodiments described above.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), and the like. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
Wherein the memory is used to store instructions and data, for example, to store TOF depth maps. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone. The external memory card communicates with the processor 110 through the external memory interface 120 to implement data storage functions. The internal memory 121 may be used to store computer executable program code, which includes instructions. The processor 110 performs the various functions or steps performed by the mobile phone in the method embodiments described above by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an APP required for at least one function, and the like. The data storage area may store data created during use of the mobile phone, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The display screen 140 is used for displaying an interface and the like. The display screen 140 includes a display panel. The display panel may employ a liquid crystal display (LCD), a light-emitting diode (LED), an organic light-emitting diode (OLED), or the like.
If the touch sensor 131 is integrated with the display screen 140 in the embodiment of the present application, the display screen 140 may be referred to as a touch screen. The touch sensor 131 may also be referred to as a "touch panel". That is, the display screen 140 may include a display panel and a touch panel. The touch sensor 131 is used to detect a touch operation acting thereon or thereabout. After the touch sensor 131 detects a touch operation (for example, the image preview operation and the photographing operation), the driving of the kernel layer of the mobile phone may be triggered to periodically scan the touch parameters generated by the touch operation. Then, the driver of the kernel layer transmits the touch parameters to the related module of the upper layer, so that the related module can determine the touch event corresponding to the touch parameters. In the embodiment of the present application, the specific process of the image depth estimation method is described by taking the display screen 140 as an example of the display screen (i.e., touch screen) integrated with the touch sensor 131.
The mobile phone 100 implements display functions through a GPU, a display screen 140, an AP, and the like. The mobile phone 100 may implement photographing functions through an ISP, a camera module 150, an image sensor 132, a video codec, a GPU, a display screen 140, an AP, and the like.
In an embodiment of the present application, the camera module (or camera) of the mobile phone 100 may include ISP, camera module 150, image sensor 132, and other components. The cell phone camera captures light through any one of the cameras (e.g., the main camera 151, the wide angle camera 152, the tele camera 153, etc.) in the camera module 150 and transmits the light signal to the image sensor 132. The optical signal is converted into an original image (or referred to as a raw image) by the image sensor 132. Then, the mobile phone camera processes the raw image into a YUV image through ISP, and finally converts the YUV image into an RGB image for display and subsequent processing. This process may be referred to as a camera capturing an image.
The raw image refers to an image obtained by converting a captured optical signal into an electrical signal by a photosensitive element (e.g., the image sensor 132) in the mobile phone camera.
YUV is a color-coded format, where Y represents luminance (luminance) and U and V represent chrominance (chrominance). In a camera module, the raw image is typically processed by ISP and converted into an image in YUV format (i.e., YUV image). The process involves white balance correction, color correction, noise reduction, etc. algorithms to convert the raw map into an image suitable for viewing by the human eye and subsequent processing.
The RGB image is an image composed of three color channels of red, green and blue, and the color of each pixel is composed of the numerical values of the three channels. In a cell phone camera, the YUV image is usually subjected to further image processing, such as sharpening, contrast adjustment, etc., and finally converted into an RGB image. The RGB image may be displayed directly on the display 140 of the mobile phone 100 or may be edited and processed later, such as blurring, etc.
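For reference, one common full-range (BT.601/JPEG) conversion from YUV to RGB is shown below; a particular ISP may use different coefficients or a different YUV layout, so this is only an illustrative example:

$$ \begin{aligned} R &= Y + 1.402\,(V-128) \\ G &= Y - 0.344\,(U-128) - 0.714\,(V-128) \\ B &= Y + 1.772\,(U-128) \end{aligned} $$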
Among them, the main camera 151 is a main camera in a camera module, and generally has a high pixel number and a large photosensitive element size for taking high-quality photographs and videos. The main camera 151 generally has a medium viewing angle, and can take a picture that is more natural and close to the viewing angle of human eyes.
The wide angle camera 152 has a shorter focal length so that it can capture a wider field of view, capturing more scenes and environments. The wide-angle camera 152 is often used to capture scenes such as landscapes, indoor, large groups of people, etc., with a wide field of view and a more enhanced perspective effect.
The tele camera 153 has a longer focal length, and photographs a distant object by optical magnification. The device can capture the details of a long distance, and is suitable for shooting scenes requiring higher magnification, such as objects, wild animals and the like at a far distance.
In some embodiments, the two cameras used by the handset 100 in performing the binocular depth estimation algorithm may be a main camera 151 and a wide angle camera 152.
In some embodiments, the single camera used by the handset 100 in performing the monocular depth estimation algorithm may be the main camera 151.
In an embodiment of the present application, TOF sensor 133 may include a transmitter and a receiver. To obtain a TOF depth map, the TOF sensor 133 typically emits and receives pulses of light in different directions and calculates depth information for each pixel point by triangulation principles or other algorithms. Thus, a TOF depth map containing depth information for each pixel can be obtained.
It should be understood that the structure illustrated in this embodiment is not limited to a specific configuration of the mobile phone. In other embodiments, the handset may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
It should be noted that, the methods in the following embodiments may be implemented in the mobile phone 100 having the above-described hardware structure.
In an embodiment of the application, the electronic device may include a camera module including a plurality of cameras and a TOF sensor. In response to a user operation for starting the image blurring function, the electronic device can acquire two frames of images through two cameras in the camera module and simultaneously acquire a TOF depth map through the TOF sensor. The depth information acquired by the binocular depth estimation algorithm is inaccurate in a photographing scene with dark light (e.g., a night photographing scene) and in a photographing scene in which the photographed object is in a moving state (e.g., a motion photographing scene). Thus, the electronic device can perform scene recognition based on the two frames of images. If the light is bright and the photographed object is approximately stationary, the electronic device can acquire more accurate depth information with the binocular depth estimation algorithm. If the shooting scene is dark or the photographed object is in motion, the electronic device does not use the binocular depth estimation algorithm but acquires depth information based on the monocular depth estimation algorithm.
Further, the electronic device adopts different depth estimation methods in different shooting scenes; that is, the electronic device can switch the depth estimation method it employs when the shooting scene changes, for example, from the binocular depth estimation algorithm to the monocular depth estimation algorithm, or from the monocular depth estimation algorithm to the binocular depth estimation algorithm. The binocular depth estimation algorithm obtains absolute depth information, while the monocular depth estimation algorithm obtains relative depth information; these are depth information of different dimensions. When the electronic device performs blurring processing on the image based on depth information of different dimensions, photographed objects at the same distance from the shooting subject are blurred to different degrees, which causes the user to perceive that the blurring degree of a photographed object at a fixed distance from the camera has changed.
For this, the electronic device may use the TOF depth map to convert the relative depth information acquired by the monocular depth estimation algorithm into absolute depth information. Both the converted absolute depth information and the absolute depth information obtained by the binocular depth estimation algorithm represent the real distance between the photographed object and the camera, i.e., they are depth information in the same dimension. Then, when the electronic device switches the depth estimation method it employs, photographed objects at the same distance from the shooting subject are not blurred to different degrees. The user does not perceive any change in the blurring degree of photographed objects at the same distance from the camera, so imperceptible switching between the binocular depth estimation algorithm and the monocular depth estimation algorithm can be achieved.
Referring to fig. 2, a flowchart of an image depth estimation method according to an embodiment of the present application is shown. As shown in fig. 2, the method may include S201-S210.
S201, receiving a first operation.
The electronic device may receive a first operation of a user input. This first operation may be used to activate the image blurring function of the camera.
In some embodiments, the electronic device may provide multiple capture modes, e.g., a portrait capture mode, a large aperture capture mode, a night scene capture mode, and so forth. Some of the plurality of photographing modes have an image blurring function. The photographing mode having the image blurring function may be referred to as a target photographing mode. For example, the target shooting mode provided by the system camera APP in the electronic device may include: a portrait shooting mode and a large aperture shooting mode.
Alternatively, the electronic device may receive a first operation for starting the target photographing mode in a case where the camera is operated. For example, the electronic device may receive a first operation in the case of an image preview, the first operation to initiate a target shooting mode.
Second, the electronic device may also receive a first operation to activate the camera with the camera off, and to activate a target shooting mode of the camera. For example, the electronic device receives a first operation for starting an image preview function in a portrait shooting mode of the system camera APP in a case where the system camera APP is turned off.
Illustratively, taking a mobile phone as the electronic device and a first operation for starting the target shooting mode as an example, the process in which the electronic device receives the first operation is described as follows. As shown in (a) of fig. 3, the mobile phone displays a first image preview interface 310 of the system camera APP while the camera is working. The image preview interface 310 may include an icon 311 corresponding to the portrait photographing mode and an icon 312 corresponding to the large aperture photographing mode. The mobile phone may receive a click operation on the icon 311 input by the user on the image preview interface 310. The click operation on the icon 311 is used to start the portrait shooting mode. In response to the click operation on the icon 311, the mobile phone 100 starts the image preview function in the portrait photographing mode and displays a second image preview interface 320 in the portrait photographing mode, as shown in (b) of fig. 3. The click operation on the icon 311 belongs to the first operation.
Illustratively, taking a mobile phone as the electronic device and a first operation that starts the camera and the target shooting mode of the camera as an example, the process in which the electronic device receives the first operation is described as follows. As shown in (a) of fig. 4, the mobile phone displays a desktop interface 410, where the desktop interface 410 includes an icon 411 of the system camera APP. The mobile phone may receive a click operation on the icon 411 input by the user on the desktop interface 410. The click operation on the icon 411 may be used to activate the image preview function in the portrait shooting mode of the system camera APP. In response to the click operation on the icon 411, the mobile phone 100 starts the image preview function of the camera and the portrait photographing mode of the camera, and displays the second image preview interface 320 in the portrait photographing mode, as shown in (b) of fig. 4. The click operation on the icon 411 belongs to the first operation.
S202, acquiring a binocular image and a TOF depth map of a shooting scene in response to a first operation.
The electronic device controls a camera module in the electronic device to acquire two frames of images (or referred to as binocular images) and a TOF depth map of a photographed scene simultaneously in response to a first operation for starting an image blurring function. Specifically, the electronic device may control two cameras in the camera module to acquire binocular images in response to the first operation, while controlling a TOF sensor (e.g., TOF sensor 133) in the camera module to acquire a TOF depth map. The depth information comprised by the TOF depth map may be referred to as TOF depth information.
In some embodiments, the binocular image may comprise a two frame RGB image.
Alternatively, the electronic apparatus may control the main camera (e.g., the main camera 151) and the wide-angle camera (e.g., the wide-angle camera 152) to acquire one frame of image of the photographed scene, respectively, in response to the first operation, thereby obtaining the binocular image.
It should be noted that, the details of the electronic device controlling the main camera to collect one frame of image and the details of the wide-angle camera to collect one frame of image may refer to the above description about "camera to collect image".
In some embodiments, the resolution of the TOF depth map may be less than the resolution of the image acquired by the electronic device. For example, the resolution of the TOF depth map is 30×40.
For example, please refer to fig. 5, which is a schematic flow chart of depth information conversion according to an embodiment of the present application. As shown in fig. 5, the binocular image acquired by the electronic device includes a frame of image 510. The electronic device also acquires a TOF depth map 520.
It should be noted that, all the images shown in fig. 5 (including the image 510, the TOF depth map 520, the relative depth map 530, the second depth map 540, and the blurred image 550) may be color maps in practical applications. The color of each pixel in the TOF depth map 520 may be used to represent depth information for each pixel in the TOF depth map 520, e.g., red for larger depth information and blue for smaller depth information, not shown in fig. 5. That is, the color of the pixel in the TOF depth map 520 shown in fig. 5 cannot represent the TOF depth information of the pixel in practical use, but the TOF depth information of the pixel with the same color in the TOF depth map 520 shown in fig. 5 is the same.
It will be appreciated that if the electronic device acquires a TOF depth map with a smaller resolution, the electronic device may use a lower performing TOF sensor that is less costly.
S203, acquiring scene information of a shooting scene according to the binocular image.
The electronic device can extract information from the binocular image to obtain scene information of the shooting scene. Therefore, whether the binocular depth estimation algorithm can be adopted for the shooting scene can be judged according to the scene information of the shooting scene.
In some embodiments, the scene information may include: ambient brightness information and/or motion information of a photographic subject. The ambient brightness information may include a brightness index (lux index), among others. The unit of the brightness index is lux (lux). The magnitude of the brightness index is inversely proportional to the brightness of the photographed scene, that is, the greater the brightness index, the lower the brightness of the photographed scene. The ambient brightness information may further include: the illumination intensity of the photographed scene. The magnitude of the illumination intensity is proportional to the brightness of the photographed scene, that is, the greater the illumination intensity, the higher the brightness of the photographed scene.
Second, the motion information of the photographing object may represent the displacement amount of the photographing object in the binocular image, the unit of the motion information may be the number of pixels, and the larger the motion information is, the faster the motion speed of the photographing object is.
In some embodiments, the electronic device obtaining ambient brightness information (e.g., brightness index) may include: the ambient brightness information is obtained by performing brightness information analysis on one frame of image (for example, an image acquired by a main camera) in the binocular image.
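In practice the brightness index is usually reported by the camera's auto-exposure statistics; purely as an illustrative stand-in (an assumption, not the method of the embodiments), a crude proxy that, like the brightness index, grows as the scene gets darker could be computed as follows:

```python
import cv2

def brightness_proxy(frame):
    """Crude brightness indicator from one frame of the binocular image:
    larger value = darker scene (loosely analogous to a lux index)."""
    luma = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return 255.0 - float(luma.mean())
```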
In some embodiments, the electronic device acquiring the motion information of the photographic subject may include: performing key point detection and matching on the binocular image to obtain a key point corresponding relation; the key point correspondence is used for representing a one-to-one correspondence between a plurality of key points (or called first key points) in one frame of image and a plurality of key points (or called second key points) in another frame of image; then, the displacement of each first key point and the corresponding second key point in the binocular image can be determined; and obtaining the motion information of the shooting object according to the determined displacement amounts. Wherein one frame of image and the other frame of image belong to binocular images.
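An illustrative sketch of the key point detection and matching described above (the choice of ORB features, brute-force matching, and using the mean displacement as the motion information are assumptions; the embodiments do not prescribe a particular detector or statistic):

```python
import numpy as np
import cv2

def motion_information(frame_a, frame_b, max_matches=100):
    """Estimate the displacement (in pixels) of the photographed object
    between the two frames of the binocular image via key point matching."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(gray_a, None)
    kp_b, des_b = orb.detectAndCompute(gray_b, None)
    if des_a is None or des_b is None:
        return 0.0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b),
                     key=lambda m: m.distance)[:max_matches]
    if not matches:
        return 0.0

    # Displacement of each first key point relative to its matched second key point.
    displacements = [
        np.hypot(kp_a[m.queryIdx].pt[0] - kp_b[m.trainIdx].pt[0],
                 kp_a[m.queryIdx].pt[1] - kp_b[m.trainIdx].pt[1])
        for m in matches
    ]
    return float(np.mean(displacements))
```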
S204, judging whether the scene information meets the binocular depth estimation condition.
After obtaining scene information of a shooting scene, the electronic equipment judges whether the scene information meets binocular depth estimation conditions. If the scene information satisfies the binocular depth estimation condition, the electronic apparatus may acquire depth information using a binocular depth estimation algorithm, i.e., perform S205-S206. If the scene information does not satisfy the binocular depth estimation condition, the electronic device may acquire depth information using a monocular depth estimation algorithm, i.e., perform S207-S208.
In some embodiments, the ambient brightness information may include a brightness index, and accordingly, the binocular depth estimation condition may include: the brightness index is smaller than a preset brightness threshold (e.g., 350 lux), and/or the motion information of the subject is smaller than a preset displacement threshold (e.g., 15 pixels).
Optionally, in the case where the scene information of the shooting scene obtained by the electronic device includes a brightness index and motion information of the shooting object, the electronic device may determine whether the brightness index is smaller than a preset brightness threshold, and determine whether the motion information is smaller than a preset displacement threshold. If the brightness index is smaller than the preset brightness threshold value and the motion information is smaller than the preset displacement threshold value, the scene information meets the binocular depth estimation condition. If the brightness index is larger than a preset brightness threshold value or the motion information is larger than a preset displacement threshold value, the scene information does not meet the binocular depth estimation condition.
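As a minimal sketch of this decision logic (using the example thresholds from the text, 350 for the brightness index and 15 pixels for the displacement; the function name and default values are assumptions):

```python
def meets_binocular_condition(lux_index, motion_pixels,
                              lux_threshold=350, disp_threshold=15):
    """True when the scene is bright enough and the shooting object is nearly static."""
    return lux_index < lux_threshold and motion_pixels < disp_threshold
```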
It is understood that a brightness index smaller than the preset brightness threshold indicates that the brightness of the photographed scene is high, and motion information of the shooting object smaller than the preset displacement threshold indicates that the shooting object is approximately in a static state. When the scene information satisfies the binocular depth estimation condition (including the brightness index being smaller than the preset brightness threshold and the motion information of the shooting object being smaller than the preset displacement threshold), neither low scene brightness nor motion of the shooting object will introduce errors into the depth information acquired by the binocular depth estimation algorithm. Therefore, the electronic equipment can acquire accurate depth information by adopting the binocular depth estimation algorithm when the scene information satisfies the binocular depth estimation condition. Secondly, when the scene information does not satisfy the binocular depth estimation condition, at least one of low scene brightness and motion of the shooting object may cause errors in the depth information acquired by the binocular depth estimation algorithm. Therefore, when the scene information does not satisfy the binocular depth estimation condition, the accuracy of the depth information acquired by the electronic device based on the monocular depth estimation algorithm is higher than that obtained by adopting the binocular depth estimation algorithm.
In other embodiments, the ambient brightness information may include an illumination intensity of the photographed scene, and the magnitude of the illumination intensity is proportional to the brightness of the photographed scene. Accordingly, the binocular depth estimation condition may include: the illumination intensity is greater than a preset brightness threshold (e.g., 350 lux), and/or the motion information of the subject is less than a preset displacement threshold (e.g., 15 pixels).
Alternatively, after obtaining the scene information of the shooting scene, the electronic device may not execute S204, but determine whether the scene information satisfies the monocular depth estimation condition. If the scene information satisfies the monocular depth estimation condition, the electronic device may acquire depth information using the monocular depth estimation algorithm, i.e., perform S207-S208. If the scene information does not satisfy the monocular depth estimation condition, the electronic device may acquire depth information using a binocular depth estimation algorithm, i.e., perform S205-S206.
Wherein the monocular depth estimation condition is opposite to the binocular depth estimation condition. For example, the monocular depth estimation condition may include: the brightness index is greater than a preset brightness threshold (e.g., 350 lux), and/or the motion information of the subject is greater than a preset displacement threshold (e.g., 15 pixels).
S205, correcting the binocular image to obtain two frames of corrected images.
Under the condition that scene information meets binocular depth estimation conditions, the electronic equipment can firstly correct binocular images to obtain two frames of corrected images so as to obtain depth information.
In some embodiments, the electronic device may perform epipolar correction on the binocular image first to obtain two frames of initially corrected images; and then perform self-learning correction on the two frames of initially corrected images to obtain the two frames of corrected images.
It can be appreciated that, because there is a certain difference in the viewing angles of the images acquired by the two cameras, it is necessary to perform epipolar correction on the binocular images acquired by the two cameras in order to simplify the depth estimation problem. The epipolar correction is a process of aligning scanning lines of an image with epipolar lines: the correspondence between the binocular images is calculated by using the fundamental matrix, so that corresponding pixel points in the binocular image are aligned on the same horizontal line. Secondly, the self-learning correction is intended to address errors or inaccuracies that may remain after the epipolar correction, and the accuracy of depth estimation can be improved through the self-learning correction.
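A minimal epipolar-correction sketch with OpenCV is given below; it assumes the intrinsic matrices (K1, K2), distortion vectors (d1, d2) and the rotation R and translation T between the two cameras are already known from calibration. All variable names are assumptions, and this sketch does not include the self-learning correction described above.

```python
import cv2

def rectify_pair(img1, img2, K1, d1, K2, d2, R, T):
    """Rectify a binocular pair so that corresponding points lie on the same row."""
    h, w = img1.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    rect1 = cv2.remap(img1, map1x, map1y, cv2.INTER_LINEAR)
    rect2 = cv2.remap(img2, map2x, map2y, cv2.INTER_LINEAR)
    return rect1, rect2
```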
S206, binocular depth estimation is carried out on the two frames of corrected images, and a first depth map is obtained and comprises first absolute depth information.
The electronic device may first process the two frames of corrected images to obtain a disparity map. The disparity map may refer to an image containing a disparity value for each pixel. Each disparity value represents the horizontal interval between two corresponding pixel points in the two frames of corrected images. Then, the electronic device may convert the disparity value of each pixel in the disparity map to obtain a first depth map. The first depth map includes a depth value (or referred to as first depth information) of each pixel. The first depth information in the first depth map may be referred to as first absolute depth information, and the first depth map may be referred to as a first absolute depth map.
Alternatively, the resolution of the disparity map and the resolution of the first depth map may both be equal to the resolution of the image acquired by the camera.
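As a hedged, non-limiting sketch of S206: a disparity map is computed from the two corrected frames and converted to absolute depth via depth = focal_length × baseline / disparity. The semi-global matching parameters, focal length, and baseline below are assumed values, and grayscale 8-bit inputs are assumed.

```python
import cv2
import numpy as np

def disparity_to_depth(rect_left, rect_right, focal_px=1500.0, baseline_m=0.02):
    """Compute a first absolute depth map (in metres) from two rectified frames."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(rect_left, rect_right).astype(np.float32) / 16.0
    disparity[disparity <= 0] = 0.1            # avoid division by zero
    depth = focal_px * baseline_m / disparity  # larger disparity -> smaller depth
    return depth
```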
S207, monocular depth estimation is carried out on monocular images in the binocular images, and a relative depth map is obtained.
Under the condition that scene information does not meet the binocular depth estimation condition, the electronic equipment can perform monocular depth estimation on one frame of image (or referred to as monocular image) in the binocular image to obtain a depth map. The depth map includes depth information representing the relative distances between all of the objects in the image, not the true distances of all of the objects from the camera. The depth information in the depth map may be referred to as relative depth information and the depth map may be referred to as a relative depth map.
Alternatively, the monocular image may be a main shot image in the electronic device.
Illustratively, the image 510 shown in fig. 5 is an image acquired by the main camera (main shot). The electronic device may perform monocular depth estimation on the image 510 to obtain a relative depth map 530. The pixel value of each pixel in the relative depth map 530 may be used to represent the relative depth information of each pixel, and the colors of pixels having the same relative depth information may be the same.
It should be noted that the color of each pixel in the relative depth map 530 may be used to represent the depth information of each pixel in the relative depth map 530, for example, red represents larger depth information, and blue represents smaller depth information, which is not shown in fig. 5. That is, the color of the pixel in the relative depth map 530 shown in fig. 5 cannot represent the relative depth information of the pixel in practical use, but the relative depth information of the pixel having the same color in the relative depth map 530 shown in fig. 5 is the same.
S208, converting the relative depth map according to the TOF depth map to obtain a second depth map, wherein the second depth map comprises second absolute depth information.
The electronic equipment can determine target conversion parameters according to the acquired TOF depth map and the relative depth map; the target conversion parameter may convert the relative depth information into absolute depth information. And then, converting the relative depth information of each pixel in the relative depth map by using the target conversion parameter to obtain a second depth map comprising second depth information. Wherein the second depth information may be referred to as second absolute depth information and the second depth map may be referred to as second absolute depth map.
In some embodiments, the electronic device may process the TOF depth map and the relative depth map using a least squares method to obtain the target conversion parameters. The target conversion parameters may include a target scale factor (scale) and a target offset (shift). The electronic device processing the TOF depth map and the relative depth map by using the least squares method to obtain the target conversion parameters may include: first, performing pixel point matching on the TOF depth map and the relative depth map to obtain a plurality of pixel point pairs; and then calculating the target conversion parameters according to the relative depth information and the TOF depth information of each of the plurality of pixel point pairs. Each pixel point pair may include: one pixel point in the TOF depth map and one pixel point in the relative depth map.
Illustratively, in conjunction with fig. 5, as shown in fig. 6, the electronic device may perform pixel point matching on the TOF depth map 520 and the relative depth map 530 to obtain a plurality of pixel point pairs. The plurality of pixel point pairs may include a plurality of pixel points in the TOF depth map 520, shown by a plurality of circles 531. The plurality of pixel point pairs may further include a plurality of pixel points in the relative depth map 530, which may be the pixel points in the relative depth map 530 corresponding to the positions of the plurality of circles 531.
It should be noted that the color of each circle 531 may be used to represent depth information of each pixel in the TOF depth map 520, for example, red represents larger depth information, and blue represents smaller depth information, which is not shown in fig. 6.
Second, the size of each circle 531 in fig. 6 cannot represent the actual size of the pixel point.
Further, the electronic device may perform a calculation on the relative depth information and the TOF depth information of each of the plurality of pixel point pairs by using a preset objective function, to obtain the target conversion parameter. The preset objective function is used for representing the sum of squares of differences between the plurality of TOF depth information of the plurality of pixel point pairs and a plurality of converted depth information, where the plurality of converted depth information is obtained by converting the relative depth information of the plurality of pixel point pairs by using the target conversion parameter.
Alternatively, the preset objective function may be a first objective function as shown in the following formula (1). The electronic device may substitute the relative depth information and the TOF depth information of each of the plurality of pixel point pairs into the first objective function, and calculate the target scale factor and the target offset. The first objective function is used to represent a least squares criterion. Using the first objective function, the scale factor s and the offset t that minimize the first objective function can be calculated; this scale factor s is the target scale factor, and this offset t is the target offset.
$$ f_1(s,t)=\sum_{i=1}^{M}\left(D_i^{\mathrm{TOF}}-\left(s\,d_i+t\right)\right)^{2} \qquad (1) $$
Wherein M is the total number of pixel point pairs included in the plurality of pixel point pairs, $d_i$ is the relative depth information of the i-th pixel point in the plurality of pixel point pairs, and $D_i^{\mathrm{TOF}}$ is the TOF depth information of the i-th pixel point in the plurality of pixel point pairs. The i-th pixel point in the relative depth map and the i-th pixel point in the TOF depth map form one pixel point pair.
It will be appreciated that the first objective function is used to represent the sum of squares of the differences between the TOF depth information and the converted absolute depth information, and that the converted absolute depth information is obtained by converting the relative depth information using the scale factor s and the offset t. The scale factor s and offset t (i.e., the target scale factor and target offset) that minimize the first objective function can minimize the sum of squares of the differences between the TOF depth information and the converted absolute depth information.
Alternatively, the preset objective function may be a second objective function as shown in the following formula (2). The electronic device may generate a relative depth vector $\mathbf{d}_i=[d_i,\ 1]^{T}$ according to the relative depth information $d_i$ of the i-th pixel point pair, where T denotes a transpose operation. Then, the electronic device may substitute the relative depth vector and the TOF depth information of each pixel point pair into the second objective function, and calculate the target conversion parameter. The target conversion parameter is the conversion parameter h that minimizes the second objective function. The conversion parameter h may be a vector consisting of a scale factor s and an offset t, that is, $h=[s,\ t]^{T}$. The scale factor s included in the target conversion parameter h is the target scale factor, and the offset t included in the target conversion parameter h is the target offset.
$$ h_{\mathrm{opt}}=\operatorname{opt}_{h}\sum_{i=1}^{M}\left(D_i^{\mathrm{TOF}}-\mathbf{d}_i^{T}h\right)^{2} \qquad (2) $$
Where opt denotes solving for the conversion parameter h that minimizes the second objective function.
It will be appreciated that the second objective function may also be used to represent the sum of squares of the difference between the TOF depth information and the converted absolute depth information, and that the converted absolute depth information is converted from the relative depth information using the conversion parameter h. The conversion parameter h (i.e., the target conversion parameter) that minimizes the second objective function may minimize the sum of squares of the difference between the TOF depth information and the converted absolute depth information.
Alternatively, the second objective function may be replaced with an analytical function as shown in the following formula (3). The analytical function may also be used to calculate a conversion parameter h (i.e. a target conversion parameter) that minimizes the sum of squares of the difference between the TOF depth information and the converted absolute depth information.
$$ h=\left(\sum_{i=1}^{M}\mathbf{d}_i\mathbf{d}_i^{T}\right)^{-1}\sum_{i=1}^{M}\mathbf{d}_i\,D_i^{\mathrm{TOF}} \qquad (3) $$
Optionally, after the electronic device obtains the target scale factor and the target offset, the following formula (4) may be used to convert the relative depth information d of each pixel in the relative depth map to obtain the second absolute depth information of each pixel. A second depth map comprising the second absolute depth information of all the pixel points is thereby obtained.
$$ D = s\,d + t \qquad (4) $$
Wherein D is the second absolute depth information of a pixel point, and d is the relative depth information of that pixel point.
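As a minimal, non-limiting sketch of this least-squares conversion (corresponding to the roles of formulas (1)-(4) as described above), the following Python code fits the target scale factor and target offset from the matched pixel point pairs and then converts the whole relative depth map; the variable names are assumptions for illustration.

```python
import numpy as np

def fit_scale_shift(relative_samples, tof_samples):
    """relative_samples, tof_samples: 1-D arrays over the M matched pixel point pairs."""
    # Design matrix with rows [d_i, 1]; solve min_h ||A h - tof||^2 for h = [s, t].
    A = np.stack([relative_samples, np.ones_like(relative_samples)], axis=1)
    h, *_ = np.linalg.lstsq(A, tof_samples, rcond=None)
    return h[0], h[1]   # target scale factor s, target offset t

def convert_relative_depth(relative_depth_map, s, t):
    # Formula (4): second absolute depth = s * relative depth + t.
    return s * relative_depth_map + t
```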
Illustratively, as shown in fig. 5, the electronic device may convert the relative depth map 530 according to the TOF depth map 520, resulting in the second depth map 540. The pixel value of each pixel in the second depth map 540 may be used to represent second absolute depth information of each pixel, where the pixel values of a plurality of pixels having the same second absolute depth information are the same.
It should be noted that, the color of each pixel in the second depth map 540 may be used to represent the depth information of each pixel in the second depth map 540, for example, red represents larger depth information, and blue represents smaller depth information, which is not shown in fig. 5. That is, the color of the pixel in the second depth map 540 shown in fig. 5 cannot represent the second absolute depth information of the pixel in practical use, but the second absolute depth information of the pixel in the second depth map 540 shown in fig. 5, which is the same in color, is the same.
Wherein the person in the image 510 captured by the electronic device through the main shot may be the photographic subject. The background behind the person in the image 510 includes an object 511, an object 512, and an object 513. In the relative depth map 530 obtained by performing monocular depth estimation on the image 510, the color of the object 511, the color of the object 512, and the color of the object 513 are substantially the same, indicating that the relative depth information of the three objects is the same. However, in the actual shooting scene, the three objects (including the object 511, the object 512, and the object 513) are increasingly distant from the person. It can thus be seen that the relative depth information of the three objects does not coincide with their true distances to the person. In this regard, the electronic device may convert the relative depth map 530 according to the TOF depth map 520 to obtain the second depth map 540. In the second depth map 540, the color of the object 511 is different from the colors of the other two objects (including the object 512 and the object 513), indicating that the second absolute depth information of the object 511 is different from that of the other two objects, which is consistent with the fact that the actual distance between the object 511 and the person is different from the actual distances between the other two objects and the person.
In some embodiments, the electronic device may apply the same degree of blurring to pixel points in the foreground and the background whose distance from the photographic subject exceeds a preset distance threshold. Therefore, after the electronic device converts the relative depth map according to the TOF depth map to obtain the second depth map, the electronic device may set the second depth information in the second depth map that is greater than the preset distance threshold to the same value (for example, the preset distance threshold), to obtain an updated second depth map.
For example, as shown in fig. 5, in an actual photographed scene, the distance of the object 512 from the person and the distance of the object 513 from the person are different, and the distances of the object 512 and the object 513 from the person are both far, possibly exceeding a preset distance threshold. The electronic device may set both the second depth information of object 512 and the second depth information of object 513 in second depth map 540 to a preset distance threshold, so that the color of object 512 and the color of object 513 in second depth map 540 are the same.
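As a minimal, non-limiting sketch of this updating step, the following Python one-step clamp sets all second depth values above the preset distance threshold to that threshold; the threshold value of 5.0 metres is an assumed example.

```python
import numpy as np

def clamp_far_depth(second_depth_map, distance_threshold_m=5.0):
    """Clamp distant depth values so far background objects share one blurring degree."""
    return np.minimum(second_depth_map, distance_threshold_m)
```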
It should be noted that, compared with a method in which the entire TOF depth map and the relative depth map are input into one network model so that the network model converts the relative depth information into absolute depth information based on the entire TOF depth map and the relative depth map, the technical scheme of the embodiment of the application, in which the electronic device determines a plurality of matched pixel point pairs from the TOF depth map and the relative depth map and uses the plurality of pixel point pairs to convert the relative depth information into the second absolute depth information, has lower requirements on the resolution and accuracy of the TOF depth map, and therefore also has lower performance requirements on the TOF sensor used to acquire the TOF depth map. A lower-performance TOF sensor costs less.
S209, carrying out Gaussian smoothing and difference operation on the absolute depth map to obtain an image after Gaussian difference.
After the electronic device obtains the absolute depth map (i.e., the first depth map or the second depth map), Gaussian smoothing and difference operation can be performed on the depth map to obtain an image after Gaussian difference. Alternatively, the image after Gaussian difference may be referred to as a sigma graph. By performing the Gaussian smoothing and difference operation on the depth map, the electronic device extracts regions with large depth change, namely object edges. A sigma graph is an image representing regions where the depth variation is large.
In some embodiments, performing, by the electronic device, the Gaussian smoothing and difference operation on the depth map to obtain the sigma map may include: first, performing Gaussian blur on the depth map to obtain a Gaussian-blurred depth map; and then performing a difference operation on the Gaussian-blurred depth map to obtain the sigma map.
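A minimal sketch of this step is given below, assuming the difference is taken between the depth map and its Gaussian-blurred version so that large values appear at depth edges; the kernel size is an assumed parameter and the input is assumed to be a single-channel float depth map.

```python
import cv2

def gaussian_difference(depth_map, ksize=9):
    """Gaussian smoothing followed by a difference operation (sigma map)."""
    blurred = cv2.GaussianBlur(depth_map, (ksize, ksize), 0)
    sigma_map = cv2.absdiff(depth_map, blurred)   # large values at depth edges
    return sigma_map
```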
In some embodiments, the resolution of the absolute depth map (i.e., the first depth map or the second depth map) and the resolution of the binocular image may be the same. The resolution of the sigma map and the resolution of the binocular image may also be the same.
S210, blurring the monocular image based on the Gaussian difference image to obtain a blurred image.
The electronic device may perform blurring processing on the monocular image based on pixel values of pixel points in the sigma graph, to obtain a blurred image. For example, as shown in fig. 5, the electronic device performs blurring processing on the monocular image 510 to obtain a blurred image 550.
Further, the electronic device may display the blurred image on the photographing interface. For example, as shown in fig. 3 (b), the mobile phone 100 displays the blurred image on the second image preview interface 320.
In some embodiments, the electronic device may first determine the subject to be photographed; then, keeping the area where the shooting subject is located in the monocular image unchanged; and blurring processing is carried out on other areas except the area where the photographing main body is located in the monocular image according to the pixel values of the pixel points in the sigma graph, so that a blurred image is obtained. The other areas comprise an area where the background is located and an area where the foreground is located.
Alternatively, the electronic device may automatically determine a default photographing subject, or determine a photographing subject based on an activated photographing mode, or determine a photographing subject selected by a second operation in response to the second operation entered by the user. For example, as shown in (b) of fig. 3, the electronic apparatus determines that the subject is a person in the case where the started photographing mode is the portrait photographing mode. For another example, the electronic apparatus determines the area selected by the second operation as the subject of photographing.
Optionally, the electronic device may perform blurring processing of different degrees on the other areas in the monocular image according to the pixel values of the pixel points corresponding to the other areas in the sigma graph, so as to obtain the blurred image. The blurring degrees of the other areas in the blurred image are different. For example, a region of the blurred image that is farther from the shooting subject has a higher blurring degree.
Optionally, the electronic device performing blurring processing on the other areas in the monocular image according to the pixel values of the pixel points in the sigma graph to obtain the blurred image may include: adopting a target filtering method, and performing filtering processing of different degrees on the other areas in the monocular image according to the pixel values of the pixel points in the sigma graph, so as to obtain the blurred image. The target filtering method may include at least one of: circular filtering, infinite impulse response (Infinite Impulse Response, IIR) filtering, and the like.
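As a simplified, non-limiting sketch of S210: regions outside the shooting subject are blurred with a strength driven by the sigma-map values, while the subject region stays sharp. The discrete blur levels, the subject mask, and the use of a Gaussian filter (rather than the circular or IIR filtering mentioned above) are all assumptions for illustration.

```python
import cv2
import numpy as np

def blur_by_depth(image, sigma_map, subject_mask,
                  levels=((0.33, 5), (0.66, 11), (1.0, 21))):
    """Blur non-subject regions with increasing kernel size as sigma values grow.

    image: H x W x 3 uint8; sigma_map: H x W float; subject_mask: H x W bool.
    """
    norm = sigma_map / (sigma_map.max() + 1e-6)
    result = image.copy()
    low = 0.0
    for high, ksize in levels:
        band = (norm > low) & (norm <= high) & (~subject_mask)
        blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
        result[band] = blurred[band]   # stronger blur for larger sigma values
        low = high
    return result
```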
The process of S209-S210 performed by the electronic device may be referred to as a blurring process.
Referring to fig. 7, a flowchart of another image depth estimation method according to an embodiment of the application is shown. As shown in fig. 7, after the electronic device acquires the binocular image and the TOF depth map, S203-S204 may not be executed; instead, the binocular image is input into a classification model for scene recognition, and the classification model outputs a result indicating that the scene information meets or does not meet the binocular depth estimation condition.
Then, in the case where the scene information satisfies the binocular depth estimation condition, the electronic device may not execute S205-S206, but may input the binocular image into the binocular depth estimation network for parallax estimation to obtain a parallax map; and convert the parallax value of each pixel in the parallax map to obtain a first absolute depth map (namely, the first depth map).
Or in the case that the scene information does not satisfy the binocular depth estimation condition, the electronic device may not execute S207, but input a frame of image (i.e., a monocular image) of the binocular images to the monocular depth estimation network, to obtain a relative depth map; and converting the relative depth map by utilizing the TOF depth map to obtain a second absolute depth map (namely a second depth map).
Further, the electronic device may perform gaussian smoothing and difference operations on the first absolute depth map or the second absolute depth map, to obtain a gaussian-difference image. And finally, blurring the monocular image based on the Gaussian difference image to obtain a blurred image.
Alternatively, both the binocular depth estimation network and the monocular depth estimation network may be convolutional neural networks.
It will be appreciated that the electronic device (e.g., a mobile phone) may include hardware structures and/or software modules that perform the functions described above. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the application can divide the functional modules of the electronic device according to the method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In the case of dividing each functional module by corresponding each function, referring to fig. 8, an embodiment of the present application provides an image depth estimation apparatus applied to an electronic device, which can implement the image depth estimation method provided in the foregoing embodiment. In practice the image depth estimation means may be a processor in the electronic device. The image depth estimation apparatus may include: an acquisition module 710, an image processing module 720 and a depth information acquisition module 730.
Wherein the acquisition module 710 is configured to acquire a binocular image in response to a first operation for starting an image blurring function; the binocular image includes two frames of images;
The image processing module 720 is configured to obtain scene information of a shooting scene according to the binocular image;
The depth information obtaining module 730 is configured to: under the condition that scene information accords with binocular depth estimation conditions, performing depth estimation on a binocular image by adopting a binocular depth estimation algorithm to obtain a first depth map; under the condition that the scene information does not accord with the binocular depth estimation condition, obtaining a second depth map based on a monocular depth estimation algorithm and monocular images in the binocular images; the monocular image is any one of two frames included in the binocular image. Wherein the binocular depth estimation condition is used to indicate at least one of: the subject in the binocular image is clear, and the subject in the binocular image is the same.
Optionally, the first depth information included in the first depth map and the second depth information included in the second depth map are absolute depth information. The acquisition module 710 is further configured to acquire a time of flight TOF depth map in response to the first operation. The depth information obtaining module 730 is specifically configured to: performing depth estimation on the monocular image by adopting a monocular depth estimation algorithm to obtain a third depth map; the third depth information included in the third depth map is relative depth information; and converting the third depth map according to the TOF depth map to obtain a second depth map.
Optionally, the depth information obtaining module 730 is specifically configured to: determining target conversion parameters according to the TOF depth map and the third depth map; the target conversion parameter is used for converting the relative depth information into absolute depth information; and converting the third depth information in the third depth map by using the target conversion parameter to obtain a second depth map.
Optionally, the depth information obtaining module 730 is specifically configured to: performing pixel point matching on the TOF depth map and the third depth map to obtain a plurality of first pixel point pairs; each of the first pixel point pairs includes: one pixel point in the TOF depth map and one pixel point in the third depth map; and determining, according to the TOF depth information and the third depth information of each of the first pixel point pairs and a preset objective function, the target conversion parameter that minimizes the preset objective function.
The preset objective function is used for representing the sum of squares of differences between the TOF depth information and the converted depth information, and the converted depth information is obtained by converting the third depth information by using the target conversion parameter.
Optionally, the scene information includes: brightness index and motion information of shooting objects in a shooting scene; the numerical value of the brightness index is inversely proportional to the brightness of the shooting scene; the movement information of the photographic subject is used to represent the displacement amount of the photographic subject in the binocular image. The binocular depth estimation conditions include: the brightness index is smaller than a preset brightness threshold, and the motion information of the shooting object is smaller than a preset displacement threshold.
Optionally, the image depth estimation apparatus may further include: the image blurring module 740. An image blurring module 740 for: carrying out Gaussian smoothing and difference operation on the first depth map or the second depth map to obtain an image after Gaussian difference; and blurring the monocular image based on the Gaussian difference image to obtain a blurred image.
Optionally, the image blurring module 740 is specifically configured to: determining a shooting subject in a monocular image; the shooting subject is determined by the electronic device in response to the second operation, or is determined based on a shooting mode started by the electronic device, or is defaulted by the electronic device; and blurring processing is carried out on other areas except the area where the shooting subject is located in the monocular image based on pixel values in the image after Gaussian difference, so that a blurred image is obtained.
Optionally, the image blurring module 740 is specifically configured to: carrying out blurring processing of different degrees on the other areas in the monocular image according to pixel values of pixel points corresponding to the other areas in the image after Gaussian difference, to obtain the blurred image. The farther a region in the blurred image is from the shooting subject, the higher the blurring degree of the region.
With respect to the image depth estimation apparatus in the above-described embodiments, the specific manner in which the respective modules perform operations has been described in detail in the foregoing embodiments of the image depth estimation method, and will not be repeated here. For the relevant beneficial effects, reference may also be made to the beneficial effects of the image depth estimation method described above, which are likewise not repeated here.
The embodiment of the application also provides electronic equipment, which comprises: a memory and one or more processors; the memory is coupled with the processor; wherein the memory has stored therein computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the image depth estimation method as provided by the foregoing embodiments.
Embodiments of the present application also provide a computer-readable storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform an image depth estimation method as provided by the foregoing embodiments.
Embodiments of the present application also provide a computer program product containing executable instructions that, when run on an electronic device, cause the electronic device to perform an image depth estimation method as provided by the previous embodiments.
The present application also provides a chip system, as shown in fig. 9, the chip system 800 includes at least one processor 801 and at least one interface circuit 802. The processor 801 and the interface circuit 802 may be interconnected by wires. For example, interface circuit 802 may be used to receive signals from other devices (e.g., a memory of an electronic apparatus). For another example, interface circuit 802 may be used to send signals to other devices (e.g., processor 801).
The interface circuit 802 may, for example, read instructions stored in a memory and send the instructions to the processor 801. The instructions, when executed by the processor 801, may cause the electronic device to perform the various steps of the embodiments described above. Of course, the system-on-chip may also include other discrete devices, which are not particularly limited in accordance with embodiments of the present application.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image depth estimation method, applied to an electronic device, comprising:
Acquiring a binocular image in response to a first operation for starting an image blurring function; the binocular image comprises two frames of images;
Acquiring scene information of a shooting scene according to the binocular image;
Under the condition that the scene information accords with binocular depth estimation conditions, performing depth estimation on the binocular image by adopting a binocular depth estimation algorithm to obtain a first depth map; the binocular depth estimation condition is used for indicating that a shooting object in the binocular image is clear and the shooting objects in the binocular image are the same;
Under the condition that the scene information does not accord with the binocular depth estimation condition, obtaining a second depth map based on a monocular depth estimation algorithm and monocular images in the binocular images; the monocular image is any one of the two frames of images included in the binocular image;
The first depth information included in the first depth map and the second depth information included in the second depth map are absolute depth information;
The method further comprises the steps of: in response to the first operation, acquiring a time-of-flight TOF depth map;
the obtaining a second depth map based on the monocular depth estimation algorithm and the monocular image in the binocular image includes:
performing depth estimation on the monocular image by adopting the monocular depth estimation algorithm to obtain a third depth map; the third depth information included in the third depth map is relative depth information;
and converting the third depth map according to the TOF depth map to obtain the second depth map.
2. The method of claim 1, wherein converting the third depth map from the TOF depth map to obtain the second depth map comprises:
Determining target conversion parameters according to the TOF depth map and the third depth map; the target conversion parameter is used for converting the relative depth information into absolute depth information;
And converting the third depth information in the third depth map by using the target conversion parameter to obtain the second depth map.
3. The method of claim 2, wherein the determining target conversion parameters from the TOF depth map and the third depth map comprises:
Performing pixel point matching on the TOF depth map and the third depth map to obtain a plurality of pixel point pairs; each of the plurality of pixel pairs includes: one pixel in the TOF depth map and one pixel in the third depth map;
Determining the target conversion parameter for minimizing a preset target function according to TOF depth information and third depth information of each pixel point pair of the plurality of pixel point pairs and the preset target function; the preset objective function is used for representing the sum of squares of differences between the TOF depth information and the converted depth information, and the converted depth information is obtained by converting the third depth information by using the target conversion parameter.
4. A method according to any of claims 1-3, wherein the scene information comprises: brightness index and motion information of shooting objects in the shooting scene; the magnitude of the brightness index is inversely proportional to the brightness of the shooting scene; the motion information of the shooting object is used for representing the displacement of the shooting object in the binocular image;
The binocular depth estimation condition includes: the brightness index is smaller than a preset brightness threshold value, and the motion information of the shooting object is smaller than a preset displacement threshold value; the brightness index is smaller than the preset brightness threshold value and used for indicating that the shooting object in the binocular image is clear; and the movement information of the shooting objects is smaller than the preset displacement threshold value and is used for indicating that the shooting objects in the binocular image are identical.
5. A method according to any one of claims 1-3, characterized in that the method further comprises:
Performing Gaussian smoothing and difference operation on the first depth map or the second depth map to obtain an image after Gaussian difference;
And blurring the monocular image based on the Gaussian difference image to obtain a blurred image.
6. The method of claim 5, wherein blurring the monocular image based on the gaussian-differentiated image to obtain a blurred image, comprising:
determining a shooting subject in the monocular image; the shooting subject is determined by the electronic device in response to a second operation, or is determined based on a shooting mode started by the electronic device, or is defaulted by the electronic device;
And carrying out blurring processing on other areas except the area where the shooting subject is located in the monocular image based on pixel values in the Gaussian difference image to obtain the blurring image.
7. The method according to claim 6, wherein the blurring processing is performed on the monocular image in the area other than the area where the subject is located based on the pixel values in the image after gaussian difference, to obtain the image after blurring, including:
And according to the pixel values of the pixel points corresponding to the other areas in the image after Gaussian difference, carrying out blurring processing of different degrees on the other areas in the monocular image to obtain the image after blurring, wherein the blurring degree of the area, which is farther from the shooting subject, in the image after blurring is higher.
8. An electronic device, the electronic device comprising: a processor, a memory, and a communication interface; the memory and the communication interface are coupled with the processor, the memory is used for storing computer program codes, and the computer program codes comprise computer instructions; wherein the computer instructions, when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions; the computer instructions, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-7.
10. A chip system, wherein the chip system comprises a processor and an interface circuit, wherein the processor and the interface circuit are interconnected through a line; wherein the interface circuit is configured to receive a signal from an electronic device and send the signal to the processor, the signal comprising computer instructions; the computer instructions, when executed by the processor, cause the chip system to perform the method of any of claims 1-7.
CN202410026355.1A 2024-01-09 2024-01-09 Image depth estimation method and electronic equipment Active CN117560480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410026355.1A CN117560480B (en) 2024-01-09 2024-01-09 Image depth estimation method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410026355.1A CN117560480B (en) 2024-01-09 2024-01-09 Image depth estimation method and electronic equipment

Publications (2)

Publication Number Publication Date
CN117560480A CN117560480A (en) 2024-02-13
CN117560480B true CN117560480B (en) 2024-05-31

Family

ID=89823406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410026355.1A Active CN117560480B (en) 2024-01-09 2024-01-09 Image depth estimation method and electronic equipment

Country Status (1)

Country Link
CN (1) CN117560480B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020192209A1 (en) * 2019-03-25 2020-10-01 华为技术有限公司 Large aperture blurring method based on dual camera + tof
CN111866490A (en) * 2020-07-27 2020-10-30 支付宝(杭州)信息技术有限公司 Depth image imaging system and method
KR20210058683A (en) * 2019-11-14 2021-05-24 삼성전자주식회사 Depth image generation method and device
CN112927279A (en) * 2021-02-24 2021-06-08 中国科学院微电子研究所 Image depth information generation method, device and storage medium
WO2021238212A1 (en) * 2020-05-24 2021-12-02 奥比中光科技集团股份有限公司 Depth measurement apparatus and method, and electronic device
CN113763449A (en) * 2021-08-25 2021-12-07 北京的卢深视科技有限公司 Depth recovery method and device, electronic equipment and storage medium
CN114677422A (en) * 2022-02-14 2022-06-28 北京极感科技有限公司 Depth information generation method, image blurring method and video blurring method
CN115861145A (en) * 2023-02-06 2023-03-28 北京机械工业自动化研究所有限公司 Image processing method based on machine vision
CN117058183A (en) * 2023-08-25 2023-11-14 闻泰通讯股份有限公司 Image processing method and device based on double cameras, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3301913A4 (en) * 2015-06-23 2018-05-23 Huawei Technologies Co., Ltd. Photographing device and method for acquiring depth information
CN105931240B (en) * 2016-04-21 2018-10-19 西安交通大学 Three dimensional depth sensing device and method
US11763433B2 (en) * 2019-11-14 2023-09-19 Samsung Electronics Co., Ltd. Depth image generation method and device


Also Published As

Publication number Publication date
CN117560480A (en) 2024-02-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant