WO2023103792A1 - Image processing method, apparatus and device - Google Patents

Image processing method, apparatus and device

Info

Publication number
WO2023103792A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
depth
scale factor
image
scale
Prior art date
Application number
PCT/CN2022/133950
Other languages
French (fr)
Chinese (zh)
Inventor
向显嵩
柳跃天
李良骥
鲍文
刘养东
曾柏伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023103792A1 publication Critical patent/WO2023103792A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Definitions

  • the present application relates to the technical field of terminals, and in particular to an image processing method, device and equipment.
  • Obtaining the depth map of the scene is a key technology for mobile phones and other devices to provide users with augmented reality (augmented reality, AR) experience.
  • the device can perceive and understand the objective environment it is in through the depth map of the scene, and then realize scene reconstruction, virtual and real occlusion and other functions. Therefore, the completeness, accuracy, and fineness of the depth map of the scene affect the experience of subsequent AR special effects.
  • the present application provides an image processing method, device, equipment, computer storage medium, and computer program product, which can use the relative depth map of the target scene to screen the valid points in the original sparse depth map and eliminate the outliers in the original sparse depth map, reducing the deformation and distortion of the depth map and improving the quality of the depth map obtained by subsequent processing.
  • In a first aspect, the present application provides an image processing method, including: acquiring a first image of a target scene; inputting the first image into a first neural network to obtain a first depth map corresponding to the first image; acquiring a second depth map corresponding to the target scene, where the second depth map is at least used to represent the depth information of some scenes contained in the target scene, and the depth information contained in the second depth map is less than or equal to the depth information contained in the first depth map; removing, based on the first depth map, outliers in the second depth map to obtain a third depth map; inputting the first image and the third depth map into a second neural network to obtain a fourth depth map; and outputting the fourth depth map.
  • In this way, the relative depth map of the target scene (namely the first depth map) is used to filter the effective points in the original sparse depth map (namely the second depth map), and the outliers in the original sparse depth map are eliminated, which reduces the deformation and distortion of the depth map and improves the quality of the depth map obtained by subsequent processing.
  • the first neural network and the second neural network may be the same neural network.
  • In a possible implementation, before inputting the first depth map and the third depth map into the second neural network, the method further includes: performing scale transformation on the third depth map based on a scale factor, where the scale factor is determined based on the first depth map and the second depth map and is used to characterize the ratio between the first depth map and the second depth map; and before outputting the fourth depth map, the method further includes: performing scale inverse transformation on the fourth depth map based on the scale factor.
  • In this way, through scale transformation, the image conditions corresponding to the depth map input into the neural network can match the training data set of the neural network, which reduces the error between the scale corresponding to the depth map output by the neural network and the real value, ensures that the depth estimation results conform to the actual scale, improves the quality of the depth map output by the neural network, and also improves the versatility of the depth estimation algorithm.
  • When the scale factor is used to characterize the ratio of the second depth map relative to the first depth map, performing scale transformation on the third depth map based on the scale factor specifically includes: dividing the depth value corresponding to each point in the third depth map by the scale factor; and performing scale inverse transformation on the fourth depth map based on the scale factor specifically includes: multiplying the depth value corresponding to each point in the fourth depth map by the scale factor.
  • When the scale factor is used to characterize the ratio of the first depth map relative to the second depth map, performing scale transformation on the third depth map based on the scale factor specifically includes: multiplying the depth value corresponding to each point in the third depth map by the scale factor; and performing scale inverse transformation on the fourth depth map based on the scale factor specifically includes: dividing the depth value corresponding to each point in the fourth depth map by the scale factor.
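  • As a rough illustration only (not part of the patent text), the two symmetric conventions above amount to a pair of element-wise operations. The sketch below assumes the scale factor characterizes the second depth map relative to the first depth map; under the opposite convention the division and multiplication are simply swapped. The function names are hypothetical.

      import numpy as np

      def scale_transform(depth_map: np.ndarray, scale_factor: float) -> np.ndarray:
          # Bring a metric (sparse) depth map into the scale expected by the network.
          return depth_map / scale_factor

      def inverse_scale_transform(depth_map: np.ndarray, scale_factor: float) -> np.ndarray:
          # Restore the network output to the original metric scale.
          return depth_map * scale_factor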
  • In a possible implementation, before performing scale transformation on the third depth map based on the scale factor, the method further includes: determining that the image condition of the first image does not match the training data set corresponding to the second neural network, where the image condition includes one or more of field of view and aspect ratio. In this way, the scale transformation is performed only when the image condition of the first image does not match the training set corresponding to the second neural network, which reduces the calculation amount of the system, improves the stability of the system, and saves power consumption.
  • the training data set can be understood as a collection of data used in training.
  • In a possible implementation, removing outliers in the second depth map based on the first depth map to obtain the third depth map specifically includes: separately determining the ratio between the depth value of each first target point in the second depth map and the depth value of the corresponding second target point in the first depth map to obtain N ratios, where the second depth map includes N first target points, N is a positive integer greater than or equal to 1, and a second target point is the point at the same position as a first target point; determining, according to the N ratios, the scale factor used to characterize the ratio between the first depth map and the second depth map; separately determining the deviation value between the target ratio corresponding to each first target point and the scale factor to obtain N deviation values, where a target ratio is the ratio between a first target point and the corresponding second target point; and removing, from the second depth map, the first target points corresponding to the deviation values that are not within a preset deviation range among the N deviation values, to obtain the third depth map.
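  • A minimal sketch of this screening, assuming the depth maps are stored as aligned arrays in which a value of 0 marks an invalid point; the threshold values and names are illustrative assumptions, not taken from the patent:

      import numpy as np

      def screen_outliers(second_depth, first_depth, lo=0.7, hi=1.3):
          # First target points: positions with a valid depth value in the second depth map.
          valid = second_depth > 0
          ratios = np.zeros_like(second_depth, dtype=np.float64)
          ratios[valid] = second_depth[valid] / first_depth[valid]
          # Scale factor characterizing the ratio between the two depth maps.
          scale_factor = ratios[valid].mean()
          # Deviation of each target ratio from the scale factor (expressed here as a ratio).
          deviation = np.zeros_like(ratios)
          deviation[valid] = ratios[valid] / scale_factor
          keep = valid & (deviation >= lo) & (deviation <= hi)
          third_depth = np.where(keep, second_depth, 0.0)
          return third_depth, scale_factor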
  • the first image is obtained by an image acquisition device;
  • the second depth map is obtained by a depth sensor and/or an inertial sensor combined with a multi-view geometric algorithm.
  • In a second aspect, the present application provides an image processing device, including at least one processor and an interface; the at least one processor obtains program instructions or data through the interface, and the at least one processor is used to execute the program instructions, so as to implement the method provided in the first aspect.
  • In a third aspect, the present application provides a device, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is used to execute the method provided in the first aspect.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on an electronic device, the electronic device executes the method provided in the first aspect.
  • the present application provides a computer program product, which is characterized in that, when the computer program product is run on an electronic device, the electronic device is made to execute the method provided in the first aspect.
  • FIG. 1 is a schematic diagram of a process for obtaining a depth map provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of another process for obtaining a depth map provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a hardware structure of a device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a system framework of an image processing method provided in an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a system framework of another image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a process of obtaining a sparse depth map and a scale factor from an original sparse depth map provided by an embodiment of the present application;
  • Fig. 8 is a schematic diagram of the steps of another image processing method provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of a hardware structure of a positioning device provided by an embodiment of the present application.
  • first and second and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects.
  • first response message and the second response message are used to distinguish different response messages, rather than describing a specific order of the response messages.
  • words such as “exemplary” or “for example” are used as examples, illustrations or illustrations. Any embodiment or design scheme described as “exemplary” or “for example” in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner.
  • multiple means two or more; for example, multiple processing units refer to two or more processing units, and multiple elements refer to two or more elements, or the like.
  • Fig. 1 shows a process of obtaining a depth map.
  • the process of obtaining the depth map is: directly input the red, green, blue (RGB) image of the scene captured by the camera into a pre-trained monocular depth estimation neural network.
  • the RGB image is processed through the monocular depth estimation neural network to obtain a dense depth map, that is, the required depth map is obtained.
  • However, there is often a scale difference between the depth map obtained in this way and the actual depth map, so this depth map is generally regarded as a relative depth map.
  • Fig. 2 shows another process of obtaining a depth map.
  • the process of obtaining the depth map is: using an active ranging device (such as lidar) or a multi-view geometric method (such as binocular matching or visual odometry) to obtain partial depth of the scene, that is, the original sparse depth map shown in Figure 2.
  • the obtained original sparse depth map and the RGB image of the scene captured by the camera are fed into a pre-trained joint depth estimation neural network.
  • the original sparse depth map and RGB image are processed by the joint depth estimation neural network to obtain a dense, absolute scale depth map, that is, the required depth map is obtained.
  • On the one hand, the joint depth estimation neural network has the dense depth map output capability of the monocular depth estimation neural network; on the other hand, because it fuses partial depth information of the scene, its predicted values are more accurate than those of a monocular depth estimation neural network that uses only the RGB image as input.
  • However, since the prediction scale of the neural network is determined by the training data, this method still has the problem of poor versatility.
  • In addition, the depth output by the neural network often depends on the input depth at the corresponding position of the image, and abnormal depths in the input information (such as depth folding values of the ranging device and mismatched points in multi-view geometry) will seriously affect the output depth map and distort the output depth structure.
  • the present application provides an image processing method.
  • In the method, the relative depth map corresponding to the image of the target scene can be compared with a first depth map corresponding to the target scene that contains at least part of the depth information of the target scene, so as to eliminate abnormal values in the first depth map; then the first depth map from which the outliers have been removed and the image of the target scene are processed by a neural network to obtain the required depth map. In this way, the influence of outliers in the first depth map on the subsequently output depth map is avoided, and the quality of the depth map output by the neural network is improved.
  • In addition, a scale factor can also be obtained from the relative depth map and the first depth map corresponding to the image of the target scene, where the scale factor represents the proportional relationship between the first depth map and the relative depth map; before the first depth map is input into the neural network, the scale factor can be used to change the scale of the first depth map so that the image condition of the first depth map matches the training set of the neural network, which allows the neural network to make its prediction at an ideal scale and improves the quality of the depth map output by the neural network.
  • the scale factor is used to change the scale of the depth map output by the neural network again, so that the scale of the acquired depth map is consistent with the scale of the first depth map (that is, the actual scale).
  • the depth information may refer to, but is not limited to, three-dimensional coordinate information of each point of the detected object.
  • FIG. 3 shows a hardware structure of a device.
  • the device 300 may include a processor 311 , a memory 312 , an image acquisition device 313 and a depth map acquisition device 314 .
  • the processor 311 is a calculation core and a control core of the device 300 .
  • Processor 311 may include one or more processing units.
  • the processor 311 may include an application processor (application processor, AP), a modem (modem), a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video encoder One or more of a decoder, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the memory 312 may store a program, and the program may be executed by the processor 311, so that the processor 311 executes the method provided in this application.
  • the memory 312 may also store data.
  • the processor 311 can read data stored in the memory 312 .
  • the memory 312 and the processor 311 can be set independently.
  • the memory 312 may also be integrated in the processor 311 .
  • the image acquisition device 313 is used to acquire images in the scene.
  • the image acquisition device 313 may be, but is not limited to, a camera, a video camera, or the like.
  • the image acquisition device 313 may be integrated on the device 300 or arranged separately from the device 300. When the two are arranged separately, the connection between them may be established through, but is not limited to, a wired network or a wireless network.
  • the depth map acquisition device 314 is used to obtain a depth map of a scene, where the depth map may be, but is not limited to, a depth map that is partial, limited in scope, low-resolution, and contains individual large error values.
  • the depth map acquisition device 314 may be a depth sensor, such as a lidar, and the depth sensor may be integrated on the device 300 or arranged separately from the device 300. When the depth sensor and the device 300 are arranged separately, the two may be connected through, but not limited to, a wired network or a wireless network.
  • the depth map acquiring device 314 may also be an inertial measurement unit (IMU).
  • the inertial sensor can be used to support the device 300 to perform visual simultaneous localization and mapping (SLAM) operations, and then provide position information of feature points in the scene.
  • the information may be, but is not limited to, local, limited in scope, low-resolution, and may contain individual error values.
  • When the depth map acquisition device 314 is an inertial sensor, the inertial sensor may be used in combination with a multi-view geometric algorithm to obtain the depth map.
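  • One common multi-view geometric approach is linear (DLT) triangulation of feature points matched across two camera views, which yields a sparse set of scene depths; the sketch below is a generic illustration under that assumption and is not taken from the patent:

      import numpy as np

      def triangulate_point(P1, P2, uv1, uv2):
          # P1, P2: 3x4 projection matrices (K @ [R | t]) of the two views.
          # uv1, uv2: matched pixel coordinates (u, v) in each view.
          A = np.stack([
              uv1[0] * P1[2] - P1[0],
              uv1[1] * P1[2] - P1[1],
              uv2[0] * P2[2] - P2[0],
              uv2[1] * P2[2] - P2[1],
          ])
          _, _, vt = np.linalg.svd(A)
          X = vt[-1]
          return X[:3] / X[3]   # 3D point after homogeneous normalization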
  • the structure shown in FIG. 3 of this solution does not constitute a specific limitation on the device 300 .
  • the device 300 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the device 300 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device; the embodiment of the present application does not specifically limit the specific type of the electronic device.
  • Fig. 4 shows a system framework of an image processing method.
  • the system framework may include a joint depth estimation neural network module, a scale calculation & consistency screening module, and a scale transformation/inverse transformation module.
  • the names of each module are only schematic illustrations, and the names of these modules can also be changed according to actual conditions, and the scheme after changing the names is still within the scope of protection of the present application.
  • the RGB image corresponding to the scene can be processed by using the joint depth estimation neural network module, for example, using a monocular depth estimation algorithm to obtain the relative depth map of the scene corresponding to the RGB image;
  • That is, the RGB image of the scene is used as the input of the joint depth estimation neural network module.
  • Although the scale of the relative depth map is not accurate, the front-back relationship of the scene in the RGB image that it expresses is reasonable.
  • the RGB image of the scene can also be replaced with other forms of images, such as black and white images, etc., and the replaced solution is still within the protection scope of the present application.
  • the scale calculation & consistency screening module performs scale calculation and consistency screening on the original sparse depth map to obtain the sparse depth map and scale factor. Therefore, by removing the outliers in the original sparse depth map, the impact of the outliers in the original sparse depth map on the subsequently output depth map is avoided, and the quality of the subsequently output depth map is improved.
  • the original sparse depth map is a depth map that contains at least part of the depth information in the scene corresponding to the RGB image;
  • the sparse depth map is a depth map obtained by removing abnormal information in the original sparse depth map;
  • the scale factor is the scale relationship between the original sparse depth map and the relative depth map.
  • the scale factor can be used to scale the obtained sparse depth map through the scale transformation/inverse transformation module, so that the conditions of the sparse depth map (such as field of view (FOV), aspect ratio, etc.) match the training data set of the joint depth estimation neural network module; this reduces the error between the scale corresponding to the depth map output by the joint depth estimation neural network module and the real value, and improves both the quality of the subsequently output depth map and the generality of the neural network in the joint depth estimation neural network module.
  • the input of the scale transformation/inverse transformation module at this time is a scale factor and a sparse depth map.
  • the RGB image corresponding to the scene and the scale-transformed sparse depth map are simultaneously input to the joint depth estimation neural network module.
  • the RGB image corresponding to the scene and the scale-transformed sparse depth map are processed through the joint depth estimation neural network module.
  • the scale factor can be used to inverse scale the output of the joint depth estimation neural network module through the scale transformation/inverse transformation module to obtain a dense, absolute scale depth map, that is, to obtain the desired depth map.
  • Since the sparse depth map has been scaled in the previous step, it is necessary to inverse scale the output of the joint depth estimation neural network module in the subsequent step so that the information in the two depth maps matches.
  • scale transformation may not be performed on it, but the RGB image corresponding to the scene is directly input to the joint depth estimation neural network module.
  • the output result of the joint depth estimation neural network module is the required depth map, and there is no need to perform scale inverse transformation on the output result of the joint depth estimation neural network module.
  • the scale calculation & consistency screening module may not output the scale factor.
  • Fig. 5 shows the system framework of another image processing method.
  • The main difference between Fig. 5 and Fig. 4 is that in Fig. 5, the device for obtaining the RGB image of the scene and the device for obtaining the original sparse depth map corresponding to the scene are both integrated on the device 300, while in Fig. 4, the device for obtaining the RGB image of the scene and the device for obtaining the original sparse depth map may be integrated on the device 300, or part or all of them may be arranged separately from the device 300.
  • the image processing method provided by the present application will be described in detail below.
  • The following description takes the case where the image captured by the image capture device 313 of the device 300 is an RGB image, that is, the image capture device 313 is a color camera, as an example. When the image captured by the image capture device 313 is not an RGB image, the solution is still within the protection scope of the present application.
  • FIG. 6 shows a flow of an image processing method.
  • the image processing method may include the following steps:
  • the RGB image can be input separately into the joint depth estimation network to perform monocular depth estimation to obtain a relative depth map.
  • the joint depth estimation network may be a pre-trained RGB-D (depth) four-channel input neural network, and when the network performs monocular depth estimation, only the depth channel in the input data is filled with zeros.
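  • A minimal sketch of that zero-filled input, assuming an H x W x 3 RGB array and a hypothetical network handle joint_depth_net; only the channel stacking is shown:

      import numpy as np

      def monocular_input(rgb: np.ndarray) -> np.ndarray:
          # Build the RGB-D four-channel input with the depth channel filled with zeros,
          # so the joint network can also be used for pure monocular depth estimation.
          h, w, _ = rgb.shape
          zero_depth = np.zeros((h, w, 1), dtype=rgb.dtype)
          return np.concatenate([rgb, zero_depth], axis=-1)   # H x W x 4

      # relative_depth = joint_depth_net(monocular_input(rgb))  # hypothetical network call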
  • S602. Perform scale calculation and consistency screening on the original sparse depth map and the relative depth map to obtain a scale factor of the joint depth estimation network and a sparse depth map after removing outliers.
  • the original sparse depth map is a depth map that contains part of the depth information in the target scene.
  • the depth information contained in the depth map may be local, limited in scope, low-resolution, and contain individual large error values.
  • this step can also be understood as follows: using the RGB image of the scene collected by the color camera and the scene depth map obtained by the corresponding depth sensor or machine vision algorithm (the depth information contained in this depth map may be local, limited in scope, low-resolution, and contain individual large error values), the depth map corresponding to the scene (that is, the sparse depth map) is obtained through fusion calculation.
  • the process of obtaining the scale factor and the sparse depth map may include the following steps:
  • When the original sparse depth information corresponding to the target scene has been obtained in the previous stage, the original sparse depth information can be projected to the RGB camera view angle, up/down sampled, combined, etc., so as to obtain the original sparse depth map.
  • this step can be omitted when the original sparse depth map is obtained in the previous stage.
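  • As a rough sketch of that projection step, assuming 3D points already expressed in the RGB camera frame and a pinhole intrinsic matrix K; the variable names are illustrative:

      import numpy as np

      def project_to_sparse_depth(points_cam, K, height, width):
          # Project 3D points (X, Y, Z) in the RGB camera frame into a sparse depth image.
          depth = np.zeros((height, width), dtype=np.float32)
          for X, Y, Z in points_cam:
              if Z <= 0:
                  continue                                   # point behind the camera
              u = int(round(K[0, 0] * X / Z + K[0, 2]))
              v = int(round(K[1, 1] * Y / Z + K[1, 2]))
              if 0 <= u < width and 0 <= v < height:
                  # Keep the nearest value if several points fall on the same pixel.
                  if depth[v, u] == 0 or Z < depth[v, u]:
                      depth[v, u] = Z
          return depth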
  • the ratio r of each effective depth value in the original sparse depth map to the depth value at the corresponding position in the relative depth map can be calculated, and the average R of all r can be calculated.
  • r/R corresponding to each effective point in the original sparse depth map can be calculated respectively.
  • a valid point can be understood as a point corresponding to a valid depth value in the original sparse depth map.
  • By comparing the r/R corresponding to the i-th effective point among the N effective points with the preset range, it can be determined whether the i-th effective point is within the preset range; if so, S705 is executed, otherwise S706 is executed.
  • the initial value of i is 1, and N is the total number of valid points in the original sparse depth map.
  • the i-th effective point is not within the preset range, it indicates that the i-th effective point is an outlier point, that is, the point is an abnormal value, so the effective point can be eliminated, that is, S706 is executed.
  • the i-th valid point is within the preset range, it indicates that the i-th valid point is a normal value, so the valid point can be reserved, that is, S705 is executed.
  • Whether the traversal is complete can be determined by comparing i with N. When i is less than or equal to N, it indicates that not every one of the N effective points has been traversed yet, so the process returns to S704; when i is greater than N, it indicates that every one of the N effective points has been traversed and the sparse depth map with abnormal points removed has been obtained, so the traversal process can end, that is, S708 is executed.
  • After traversing each effective point among the N effective points, a sparse depth map can be obtained.
  • R obtained above may be used as a scaling factor.
  • S702 to S708 may be repeatedly executed until M iterations, where M is a positive integer greater than or equal to 1, or until there is no valid point to be eliminated.
  • the original sparse depth map required in this execution of S702 may be the sparse depth map obtained from the previous execution of S702 to S708.
  • the R obtained in S709 is the R obtained in the last execution of S702
  • the finally obtained sparse depth map is the sparse depth map obtained in the last execution of S702 to S708.
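  • The repetition described above can be summarized by the loop below, reusing the screen_outliers sketch given earlier; M and the stopping condition follow the text, everything else is an illustrative assumption:

      def iterative_screening(original_sparse, relative_depth, M=3):
          # Repeat scale calculation and consistency screening up to M times,
          # or stop early once no further valid point is eliminated.
          current = original_sparse
          scale_factor = 1.0
          for _ in range(M):
              filtered, scale_factor = screen_outliers(current, relative_depth)
              if (filtered > 0).sum() == (current > 0).sum():
                  break                                      # nothing removed this round
              current = filtered
          return current, scale_factor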
  • the scale factor can be used to perform scale transformation on the obtained sparse depth map.
  • the scale transformation may be scaling by dividing the depth value of each effective point in the sparse depth map obtained after filtering by the scale factor R.
  • In this case, R can be understood as the scale factor of the original sparse depth map relative to the relative depth map.
  • the scale transformation can be scaled by multiplying the depth value of each valid point in the filtered sparse depth map by the scale factor R.
  • In this case, R is the scale factor of the relative depth map relative to the original sparse depth map.
  • the sparse depth map and the RGB image of the target scene can be spliced into RGB-D (depth) four-channel input data and input to the joint depth estimation network at the same time, and the joint depth estimation A depth map is obtained after network processing.
  • the depth map can be inversely transformed to obtain the required depth map (ie, dense, absolute scale depth map).
  • When the scale transformation in the previous step divides by R, the scale inverse transformation refers to multiplying the depth value of each effective point in the depth map output by the neural network by the scale factor R; when the scale transformation multiplies by R, the scale inverse transformation refers to dividing the depth value of each effective point in the depth map output by the neural network by the scale factor R.
  • In other words, the scale inverse transformation is the reverse process of the above scale transformation.
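  • Putting S603 to S605 together, a hedged end-to-end sketch (joint_depth_net is a hypothetical network handle, and the divide-then-multiply scale-factor convention from above is assumed):

      import numpy as np

      def densify(rgb, sparse_depth, scale_factor, joint_depth_net):
          # S603: scale transformation of the filtered sparse depth map.
          scaled_sparse = sparse_depth / scale_factor
          # S604: splice into RGB-D four-channel input and run the joint network.
          rgbd = np.concatenate([rgb, scaled_sparse[..., None]], axis=-1)
          dense = joint_depth_net(rgbd)
          # S605: scale inverse transformation back to the absolute scale.
          return dense * scale_factor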
  • When scale conversion is not required, the scale factor may not be obtained in S602; in this case, S604 may be performed directly after S602, and the result output by S604 may be used as the final required result.
  • For example, scale conversion is required when the image conditions (such as field of view, aspect ratio, etc.) of the RGB image of the target scene captured by the color camera in S601 do not match the training set corresponding to the joint depth estimation network; scale conversion is not required when the image condition of the RGB image of the target scene captured by the color camera matches the training set corresponding to the joint depth estimation network.
  • In this way, the image conditions corresponding to the depth map input into the neural network can match the training data set of the neural network, which reduces the error between the scale corresponding to the depth map output by the neural network and the real value, ensures that the depth estimation result conforms to the actual scale, improves the quality of the depth map output by the neural network, and also improves the versatility of the neural network and/or the depth estimation algorithm.
  • FIG. 8 is a schematic flowchart of an image processing method provided in an embodiment of the present application. It can be understood that the method can be executed by any device, device, platform, or device cluster that has computing and processing capabilities. As shown in Figure 8, the image processing method includes:
  • the first image of the target scene may be acquired by an image acquisition device (such as a camera, a camera, etc.), but is not limited to.
  • an image acquisition device such as a camera, a camera, etc.
  • the first image may be, but not limited to, an RGB image.
  • the first image can be input into the first neural network so that the first image is processed by the first neural network, for example, through a monocular depth estimation algorithm, to obtain the first depth map corresponding to the first image.
  • the first depth map may be the relative depth map described above in FIG. 6 .
  • the second depth map corresponding to the target scene may be obtained, but not limited to, by a depth map obtaining device (such as a depth sensor and/or an inertial sensor, etc.).
  • the second depth map is at least used to represent the depth information of some scenes included in the target scene, wherein the depth information included in the second depth map is less than or equal to the depth information included in the first depth map.
  • the second depth map may be the original sparse depth map described above in FIG. 6 .
  • abnormal values in the second depth map may be eliminated based on the first depth map to obtain a third depth map.
  • the third depth map may be the sparse depth map described above in FIG. 6 .
  • Culling can be understood as removing or eliminating, which means keeping the normal values in the second depth map and removing the outliers.
  • First, the ratio between the depth value of each first target point in the second depth map and the depth value of the second target point in the first depth map can be determined to obtain N ratios, where the second depth map includes N first target points, N is a positive integer greater than or equal to 1, and the second target point is the point at the same position as the first target point. Then, according to the N ratios, a scale factor used to characterize the ratio between the first depth map and the second depth map is determined. Then, the deviation value between the target ratio corresponding to each first target point and the scale factor is respectively determined to obtain N deviation values, where the target ratio is the ratio between the first target point and the second target point.
  • the first target points corresponding to the deviation values that are not within the preset deviation range among the N deviation values are eliminated to obtain the third depth map.
  • this process can refer to the relevant description in FIG. 7 above, and details will not be repeated here.
  • the deviation value may be a ratio or a difference
  • the preset deviation range may be, but not limited to, the preset range described in FIG. 7 above.
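  • A small sketch of the two deviation modes, with purely illustrative threshold values:

      def within_deviation(r, scale_factor, mode="ratio"):
          # r: ratio between a first target point and its second target point.
          if mode == "ratio":
              return 0.7 <= r / scale_factor <= 1.3            # preset ratio range (illustrative)
          return abs(r - scale_factor) <= 0.3 * scale_factor   # preset difference range (illustrative)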
  • the first image and the third depth map may be input into the second neural network simultaneously or in time division to obtain the fourth depth map.
  • the second neural network and the first neural network may be the same neural network.
  • the fourth depth map may be output.
  • In this way, by using the relative depth map of the target scene to filter the effective points in the original sparse depth map, the outliers in the original sparse depth map are eliminated, the deformation and distortion of the depth map are reduced, and the quality of the depth map obtained by subsequent processing is improved.
  • In some embodiments, before the first image and the third depth map are input into the second neural network, scale transformation may be performed on the third depth map based on a scale factor, where the scale factor is determined based on the first depth map and the second depth map and is used to characterize the ratio between the first depth map and the second depth map; and before the fourth depth map is output, scale inverse transformation is performed on the fourth depth map based on the scale factor.
  • In this way, the image conditions corresponding to the depth map input into the neural network can match the training data set of the neural network, which reduces the error between the scale corresponding to the depth map output by the neural network and the real value, ensures that the depth estimation result conforms to the actual scale, improves the quality of the depth map output by the neural network, and also improves the versatility of the neural network and/or the depth estimation algorithm.
  • scale transformation and scale inverse transformation refer to the description in FIG. 6 above, and details will not be repeated here.
  • The sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the steps in the foregoing embodiments may be selectively executed according to actual conditions, may be partially executed, or may be completely executed, which is not limited here.
  • All or part of the features of any embodiment of the present application can be freely combined in any manner, provided there is no contradiction.
  • the combined technical solutions are also within the scope of the present application.
  • FIG. 9 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • an image processing device 900 includes one or more processors 901 and an interface circuit 902 .
  • the image processing apparatus 900 may further include a bus 903.
  • the processor 901 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 901 or instructions in the form of software.
  • the above-mentioned processor 901 may be a general-purpose processor, a neural network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the interface circuit 902 can be used for sending or receiving data, instructions or information.
  • the processor 901 can process the data, instructions or other information received by the interface circuit 902 , and can send the processing completion information through the interface circuit 902 .
  • the image processing apparatus 900 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory (NVRAM).
  • the memory may be coupled with the processor 901 .
  • the memory stores executable software modules or data structures
  • the processor 901 can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the interface circuit 902 may be used to output an execution result of the processor 901 .
  • processor 901 and the interface circuit 902 can be realized by hardware design, software design, or a combination of software and hardware, which is not limited here.
  • processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and may also be other general processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions may be composed of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method, relating to the technical field of terminals. In the method, effective points in an original sparse depth map corresponding to a target scene can be screened by using a relative depth map corresponding to an image of the target scene, such that outliers in the original sparse depth map are eliminated; and the original sparse depth map from which the outliers are eliminated and the image of the target scene are input into a neural network so as to obtain a required depth map. Therefore, by eliminating outliers in an original sparse depth map, deformation distortion of the depth map during subsequent processing is reduced, and the quality of the depth map, which is obtained by means of subsequent processing, is improved.

Description

An image processing method, device and equipment
This application claims priority to the Chinese patent application with application number 202111499904.X, entitled "An image processing method, device and equipment", submitted to the State Intellectual Property Office of China on December 9, 2021, the entire content of which is incorporated by reference in this application.
Technical field
The present application relates to the technical field of terminals, and in particular to an image processing method, device and equipment.
Background
Obtaining the depth map of a scene is a key technology for mobile phones and other devices to provide users with an augmented reality (AR) experience. Through the depth map of the scene, the device can perceive and understand the objective environment it is in, and then realize scene reconstruction, virtual-real occlusion, and other functions. Therefore, the completeness, accuracy, and fineness of the depth map of the scene affect the experience of subsequent AR special effects.
Summary of the invention
The present application provides an image processing method, device, equipment, computer storage medium, and computer program product, which can use the relative depth map of the target scene to screen the valid points in the original sparse depth map and eliminate the outliers in the original sparse depth map, reducing the deformation and distortion of the depth map and improving the quality of the depth map obtained by subsequent processing.
In a first aspect, the present application provides an image processing method, including: acquiring a first image of a target scene; inputting the first image into a first neural network to obtain a first depth map corresponding to the first image; acquiring a second depth map corresponding to the target scene, where the second depth map is at least used to represent the depth information of some scenes contained in the target scene, and the depth information contained in the second depth map is less than or equal to the depth information contained in the first depth map; removing, based on the first depth map, outliers in the second depth map to obtain a third depth map; inputting the first image and the third depth map into a second neural network to obtain a fourth depth map; and outputting the fourth depth map. In this way, the relative depth map of the target scene (namely the first depth map) is used to filter the effective points in the original sparse depth map (namely the second depth map), and the outliers in the original sparse depth map are eliminated, which reduces the deformation and distortion of the depth map and improves the quality of the depth map obtained by subsequent processing. Exemplarily, the first neural network and the second neural network may be the same neural network.
In a possible implementation, before inputting the first depth map and the third depth map into the second neural network, the method further includes: performing scale transformation on the third depth map based on a scale factor, where the scale factor is determined based on the first depth map and the second depth map and is used to characterize the ratio between the first depth map and the second depth map; and before outputting the fourth depth map, the method further includes: performing scale inverse transformation on the fourth depth map based on the scale factor. In this way, through scale transformation, the image conditions corresponding to the depth map input into the neural network can match the training data set of the neural network, which reduces the error between the scale corresponding to the depth map output by the neural network and the real value, ensures that the depth estimation results conform to the actual scale, improves the quality of the depth map output by the neural network, and also improves the versatility of the depth estimation algorithm.
In a possible implementation, when the scale factor is used to characterize the ratio of the second depth map relative to the first depth map, performing scale transformation on the third depth map based on the scale factor specifically includes: dividing the depth value corresponding to each point in the third depth map by the scale factor; and performing scale inverse transformation on the fourth depth map based on the scale factor specifically includes: multiplying the depth value corresponding to each point in the fourth depth map by the scale factor.
In a possible implementation, when the scale factor is used to characterize the ratio of the first depth map relative to the second depth map, performing scale transformation on the third depth map based on the scale factor specifically includes: multiplying the depth value corresponding to each point in the third depth map by the scale factor; and performing scale inverse transformation on the fourth depth map based on the scale factor specifically includes: dividing the depth value corresponding to each point in the fourth depth map by the scale factor.
In a possible implementation, before performing scale transformation on the third depth map based on the scale factor, the method further includes: determining that the image condition of the first image does not match the training data set corresponding to the second neural network, where the image condition includes one or more of field of view and aspect ratio. In this way, the scale transformation is performed only when the image condition of the first image does not match the training set corresponding to the second neural network, which reduces the calculation amount of the system, improves the stability of the system, and saves power consumption. Exemplarily, the training data set can be understood as a collection of data used in training.
In a possible implementation, removing outliers in the second depth map based on the first depth map to obtain the third depth map specifically includes: separately determining the ratio between the depth value of each first target point in the second depth map and the depth value of the corresponding second target point in the first depth map to obtain N ratios, where the second depth map includes N first target points, N is a positive integer greater than or equal to 1, and a second target point is the point at the same position as a first target point; determining, according to the N ratios, the scale factor used to characterize the ratio between the first depth map and the second depth map; separately determining the deviation value between the target ratio corresponding to each first target point and the scale factor to obtain N deviation values, where a target ratio is the ratio between a first target point and the corresponding second target point; and removing, from the second depth map, the first target points corresponding to the deviation values that are not within a preset deviation range among the N deviation values, to obtain the third depth map.
In a possible implementation, the first image is obtained by an image acquisition device, and the second depth map is obtained by a depth sensor and/or by an inertial sensor combined with a multi-view geometric algorithm.
In a second aspect, the present application provides an image processing device, including at least one processor and an interface; the at least one processor obtains program instructions or data through the interface, and the at least one processor is used to execute the program instructions, so as to implement the method provided in the first aspect.
In a third aspect, the present application provides a device, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is used to execute the method provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on an electronic device, the electronic device is made to execute the method provided in the first aspect.
In a fifth aspect, the present application provides a computer program product, where when the computer program product runs on an electronic device, the electronic device is made to execute the method provided in the first aspect.
It can be understood that, for the beneficial effects of the above-mentioned second aspect to the sixth aspect, reference can be made to the related description in the above-mentioned first aspect, which will not be repeated here.
Description of drawings
FIG. 1 is a schematic diagram of a process of obtaining a depth map provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of another process of obtaining a depth map provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a hardware structure of a device provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a system framework of an image processing method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a system framework of another image processing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a process of obtaining a sparse depth map and a scale factor from an original sparse depth map provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the steps of another image processing method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a hardware structure of a positioning device provided by an embodiment of the present application.
具体实施方式Detailed ways
In this document, the term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate the following three cases: only A exists, both A and B exist, or only B exists. The symbol "/" in this document indicates an "or" relationship between the associated objects; for example, A/B indicates A or B.
The terms "first" and "second" in the specification and claims of this document are used to distinguish between different objects, rather than to describe a specific order of the objects. For example, a first response message and a second response message are used to distinguish between different response messages, rather than to describe a specific order of the response messages.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to represent an example, an illustration, or a description. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application should not be interpreted as being preferred over, or more advantageous than, other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present a related concept in a concrete manner.
In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more. For example, multiple processing units means two or more processing units, and multiple elements means two or more elements.
Exemplarily, FIG. 1 shows a process of obtaining a depth map. As shown in FIG. 1, the process is as follows: a red-green-blue (RGB) image of the scene captured by a camera is directly input into a pre-trained monocular depth estimation neural network, and the network processes the RGB image to obtain a dense depth map, that is, the required depth map. However, there is often a scale difference between a depth map obtained in this way and the actual depth map, so such a depth map is generally treated only as a relative depth map. In addition, when the conditions of the actually captured image (for example, the field of view (FOV) or the aspect ratio) do not match the training set of the neural network, the error between the scale of the depth map output by the network and the true value is often large. Therefore, this approach has poor generality.
Exemplarily, FIG. 2 shows another process of obtaining a depth map. As shown in FIG. 2, the process is as follows: an active ranging device (for example, a lidar) or a multi-view geometry method (for example, binocular matching or visual odometry) is used to obtain partial depth of the scene, that is, the original sparse depth map shown in FIG. 2. The obtained original sparse depth map and the RGB image of the scene captured by the camera are then input together into a pre-trained joint depth estimation neural network, which processes the original sparse depth map and the RGB image to obtain a dense, absolute-scale depth map, that is, the required depth map. On the one hand, the joint depth estimation neural network has the dense depth map output capability of a monocular depth estimation neural network; on the other hand, because it fuses partial depth information of the scene, its predictions are more accurate than those of a monocular depth estimation network that uses only the RGB image as input. However, because the prediction scale of the neural network is determined by the training data, this approach still has the problem of poor generality. In addition, the depth output by the network often depends on the input depth at the corresponding image position, and abnormal depths in the input information (for example, depth folding values of the ranging device or mismatched points in multi-view geometry) can severely affect the output depth map and distort the output depth structure.
To improve the quality of the depth map and thereby improve the experience of subsequent AR special effects, the present application provides an image processing method. In the method, a relative depth map corresponding to an image of a target scene may be compared with a first depth map that corresponds to the target scene and contains at least part of the depth information of the target scene, so as to remove outliers from the first depth map; the first depth map with outliers removed and the image of the target scene are then processed by a neural network to obtain the required depth map. This avoids the influence of outliers in the first depth map on the subsequently output depth map and improves the quality of the depth map output by the neural network. In addition, in the method, a scale factor may be obtained from the relative depth map corresponding to the image of the target scene and the first depth map, where the scale factor represents the proportional relationship between the first depth map and the relative depth map. Before the first depth map is input into the neural network, the scale factor may be used to scale the first depth map, so that the image conditions of the first depth map match the training set of the neural network, allowing the neural network to predict at an ideal scale and improving the quality of the depth map it outputs. Furthermore, after the neural network outputs a depth map, the scale factor is used to scale the output depth map again, so that the scale of the obtained depth map is consistent with the scale of the first depth map (that is, the actual scale), further improving the quality of the depth map output by the neural network. In one example, the depth information may be, but is not limited to, three-dimensional coordinate information of each point of a detected object.
Exemplarily, FIG. 3 shows a hardware structure of a device. As shown in FIG. 3, the device 300 may include a processor 311, a memory 312, an image acquisition apparatus 313, and a depth map acquisition apparatus 314.
The processor 311 is the computing core and control core of the device 300. The processor 311 may include one or more processing units. For example, the processor 311 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
The memory 312 may store a program that can be run by the processor 311, so that the processor 311 performs the method provided in the present application. The memory 312 may also store data, and the processor 311 may read the data stored in the memory 312. The memory 312 and the processor 311 may be arranged separately; optionally, the memory 312 may also be integrated in the processor 311.
The image acquisition apparatus 313 is configured to capture images of a scene. The image acquisition apparatus 313 may be, but is not limited to, a camera. The image acquisition apparatus 313 may be integrated on the device 300 or arranged separately from the device 300; when the two are arranged separately, a connection between them may be established through, but not limited to, a wired network or a wireless network.
The depth map acquisition apparatus 314 is configured to obtain a depth map of the scene; the depth map may be, but is not limited to, a partial, limited-range, low-resolution depth map of the scene containing individual large error values. The depth map acquisition apparatus 314 may be a depth sensor, such as a lidar, which may be integrated on the device 300 or arranged separately from the device 300; when the depth sensor and the device 300 are arranged separately, a connection between them may be established through, but not limited to, a wired network or a wireless network. In addition, the depth map acquisition apparatus 314 may also be an inertial measurement unit (IMU). The IMU may be used to support the device 300 in performing visual simultaneous localization and mapping (SLAM) computation, thereby providing position information of feature points in the scene; this information may be, but is not limited to, partial, limited in range, low-resolution, and containing individual error values. Exemplarily, when the depth map acquisition apparatus 314 is an IMU, the depth map may be obtained by using the IMU in combination with a multi-view geometry algorithm.
It can be understood that the structure shown in FIG. 3 does not constitute a specific limitation on the device 300. In other embodiments of this solution, the device 300 may include more or fewer components than shown, or combine some components, or split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, in software, or in a combination of software and hardware.
It can be understood that, in the embodiments of the present application, the device 300 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device. The specific type of the electronic device is not specially limited in the embodiments of the present application.
Exemplarily, FIG. 4 shows a system framework of an image processing method. As shown in FIG. 4, the system framework may include a joint depth estimation neural network module, a scale calculation & consistency screening module, and a scale transformation/inverse transformation module. The names of the modules are only illustrative; they may be changed according to the actual situation, and a solution with changed names still falls within the protection scope of the present application.
Under this system framework, first, the joint depth estimation neural network module may be used to process the RGB image corresponding to the scene, for example using a monocular depth estimation algorithm, to obtain a relative depth map of the scene corresponding to the RGB image; that is, the RGB image corresponding to the scene is used as the input of the joint depth estimation neural network module. Although the scale of the relative depth map is not accurate, the near-far ordering of the objects in the RGB image that it expresses is reasonable. It can be understood that the RGB image of the scene may also be replaced with an image in another form, such as a black-and-white image, and the replaced solution still falls within the protection scope of the present application.
Then, the relative depth map and the original sparse depth map obtained by the depth map acquisition apparatus (for example, the depth map acquisition apparatus 314 described above) are input into the scale calculation & consistency screening module, which performs scale calculation and consistency screening on the original sparse depth map to obtain a sparse depth map and a scale factor. In this way, outliers in the original sparse depth map are removed, their influence on the subsequently output depth map is avoided, and the quality of the subsequently output depth map is improved. Here, the original sparse depth map is a depth map containing at least part of the depth information of the scene corresponding to the RGB image; the sparse depth map is the depth map obtained after abnormal information in the original sparse depth map is removed; and the scale factor is the proportional relationship between the original sparse depth map and the relative depth map.
Next, the scale transformation/inverse transformation module may use the scale factor to perform scale transformation on the obtained sparse depth map, so that the conditions of the sparse depth map (for example, the field of view (FOV) or the aspect ratio) match the training data set of the joint depth estimation neural network module, thereby reducing the error between the scale of the depth map output by the joint depth estimation neural network module and the true value, and improving both the quality of the subsequently output depth map and the generality of the neural network in the joint depth estimation neural network module. At this point, the inputs of the scale transformation/inverse transformation module are the scale factor and the sparse depth map.
Next, the RGB image corresponding to the scene and the scale-transformed sparse depth map are input together into the joint depth estimation neural network module, which processes the RGB image corresponding to the scene and the scale-transformed sparse depth map.
Finally, the scale transformation/inverse transformation module may use the scale factor to perform inverse scale transformation on the result output by the joint depth estimation neural network module, to obtain a dense, absolute-scale depth map, that is, the required depth map. Because the sparse depth map was scale-transformed in the preceding steps, the result output by the joint depth estimation neural network module needs to be inversely scale-transformed afterwards, so that the information in the two depth maps matches.
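For illustration only, the following is a minimal Python/NumPy sketch of the data flow between the modules of FIG. 4. The helper names `estimate_relative_depth`, `screen_and_scale`, and `joint_depth_estimation` are hypothetical stand-ins (they are not defined in the original disclosure) for the monocular pass of the joint network, the scale calculation & consistency screening module, and the RGB-D pass of the joint network, respectively, and the convention that the scale factor R equals original sparse depth divided by relative depth is an assumption.

```python
import numpy as np

def depth_pipeline(rgb, raw_sparse_depth,
                   estimate_relative_depth,   # hypothetical: monocular pass of the joint network
                   screen_and_scale,          # hypothetical: scale calculation & consistency screening
                   joint_depth_estimation):   # hypothetical: RGB-D pass of the joint network
    """Sketch of the FIG. 4 pipeline: relative depth -> screening -> scale
    transform -> joint estimation -> inverse scale transform."""
    # 1. Relative depth map from the RGB image alone (scale unreliable,
    #    but the near/far ordering of the scene is reasonable).
    relative_depth = estimate_relative_depth(rgb)

    # 2. Consistency screening removes outliers from the original sparse
    #    depth map and yields a scale factor R between the two maps.
    sparse_depth, R = screen_and_scale(raw_sparse_depth, relative_depth)

    # 3. Scale transform so the sparse depth matches the network's training
    #    scale (assuming R = original sparse depth / relative depth).
    scaled_sparse = np.where(sparse_depth > 0, sparse_depth / R, 0.0)

    # 4. Joint depth estimation on the RGB image plus the scaled sparse depth.
    pred = joint_depth_estimation(rgb, scaled_sparse)

    # 5. Inverse scale transform restores the absolute (actual) scale.
    return pred * R
```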
In some embodiments, during implementation, after the sparse depth map is obtained, it may also be input directly, together with the RGB image corresponding to the scene, into the joint depth estimation neural network module without scale transformation. In this case, the result output by the joint depth estimation neural network module is the required depth map, and no inverse scale transformation needs to be applied to it. When no scale transformation of the sparse depth map is needed, the scale calculation & consistency screening module may also omit outputting the scale factor.
FIG. 5 shows a system framework of another image processing method. The main difference between FIG. 5 and FIG. 4 is as follows: in FIG. 5, the apparatus for acquiring the RGB image of the scene and the apparatus for acquiring the corresponding original sparse depth map are both integrated on the device 300, whereas in FIG. 4 these apparatuses may both be integrated on the device 300, or some or all of them may be arranged separately from the device 300.
Based on the content described above, the image processing method provided by the present application is described in detail below. For ease of description, the image captured by the image acquisition apparatus 313 associated with the device 300 is taken to be an RGB image, and the image acquisition apparatus 313 is taken to be a color camera; of course, solutions in which the image captured by the image acquisition apparatus 313 is not an RGB image still fall within the protection scope of the present application.
Exemplarily, FIG. 6 shows a flow of an image processing method. As shown in FIG. 6, the image processing method may include the following steps:
S601. Input the RGB image of the target scene captured by the color camera alone into the joint depth estimation network, and perform monocular depth estimation to obtain a relative depth map.
Specifically, after the color camera captures the RGB image of the target scene, the RGB image may be input alone into the joint depth estimation network, and monocular depth estimation is performed to obtain a relative depth map. Although the scale of the relative depth map is not accurate, the near-far ordering of the objects in the RGB image that it expresses is reasonable. Exemplarily, the joint depth estimation network may be a pre-trained neural network with an RGB-D (depth) four-channel input; when the network performs monocular depth estimation, the depth channel of the input data is simply filled with zeros.
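As an illustration of how S601 could feed an RGB-D four-channel network in monocular mode, the depth channel can be zero-filled before inference. This is a sketch under assumptions: the H x W x 3 tensor layout and the `network` callable are not specified in the original text.

```python
import numpy as np

def monocular_pass(rgb, network):
    """S601 sketch: run the joint (RGB-D) network in monocular mode by
    zero-filling the depth channel of the four-channel input."""
    h, w, _ = rgb.shape                                  # rgb assumed H x W x 3, float
    zero_depth = np.zeros((h, w, 1), dtype=rgb.dtype)    # empty depth channel
    rgbd = np.concatenate([rgb, zero_depth], axis=-1)    # H x W x 4 input
    return network(rgbd)                                 # relative depth map, H x W
```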
S602. Perform scale calculation and consistency screening on the original sparse depth map and the relative depth map, to obtain the scale factor of the joint depth estimation network and a sparse depth map with outliers removed.
Specifically, after the relative depth map corresponding to the target scene and the original sparse depth map have been obtained, scale calculation and consistency screening may be performed on them, to obtain the scale factor of the joint depth estimation network and a sparse depth map with outliers removed. The original sparse depth map is a depth map containing part of the depth information of the target scene; the depth information it contains may be partial, limited in range, low-resolution, and may contain individual large error values.
It can be understood that, because the near-far ordering of the objects reflected in the relative depth map is reasonable, the relative depth map can be used to remove unreasonable points from the original sparse depth map, so that the resulting sparse depth map is more accurate, which improves the quality of the subsequently obtained depth map. This step can also be understood as follows: using the RGB image of the scene captured by the color camera and a scene depth map obtained by a corresponding depth sensor or a machine vision algorithm (the depth information contained in this depth map may be partial, limited in range, low-resolution, and may contain individual large error values), the depth map corresponding to the scene (that is, the sparse depth map) is obtained through fusion calculation.
As a possible implementation, as shown in FIG. 7, the process of obtaining the scale factor and the sparse depth map may include the following steps:
S701. Obtain the original sparse depth map from the original sparse depth information by methods such as projecting it to the RGB camera view, up/down-sampling, and merging.
Specifically, when the original sparse depth information corresponding to the target scene has been obtained in a previous stage, the original sparse depth information may be projected to the RGB camera view, up/down-sampled, and merged to obtain the original sparse depth map.
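S701 can be illustrated with a standard pinhole projection of sparse 3D points (for example, lidar returns or SLAM feature points) into the RGB camera frame. This is only a sketch under assumptions: the intrinsic matrix K, the extrinsics (R_ext, t), and the nearest-depth merge rule below are not specified in the original text, and up/down-sampling is omitted for brevity.

```python
import numpy as np

def project_to_rgb_view(points_xyz, K, R_ext, t, h, w):
    """S701 sketch: project sparse 3D points into the RGB camera view to form
    an original sparse depth map (0 marks pixels with no measurement)."""
    depth_map = np.zeros((h, w), dtype=np.float32)
    cam = points_xyz @ R_ext.T + t                 # world -> RGB camera coordinates
    cam = cam[cam[:, 2] > 0]                       # keep points in front of the camera
    uv = cam @ K.T                                 # pinhole projection (homogeneous)
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = cam[:, 2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[ok], v[ok], z[ok]):
        # keep the nearest depth when several points fall on one pixel
        if depth_map[vi, ui] == 0 or zi < depth_map[vi, ui]:
            depth_map[vi, ui] = zi
    return depth_map
```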
In one example, when the original sparse depth map itself has been obtained in the previous stage, this step may be omitted.
S702. Calculate the ratio r between each valid depth value in the original sparse depth map and the depth value at the corresponding position in the relative depth map, and calculate the average value R of all the ratios r.
Specifically, after the original sparse depth map is obtained, the ratio r between each valid depth value in the original sparse depth map and the depth value at the corresponding position in the relative depth map may be calculated, and the average value R of all the ratios r may be calculated.
S703. For each valid point in the original sparse depth map, calculate the corresponding r/R.
Specifically, after R is obtained, r/R corresponding to each valid point in the original sparse depth map may be calculated. Exemplarily, a valid point can be understood as a point corresponding to a valid depth value in the original sparse depth map.
S704. Determine whether r/R corresponding to the i-th valid point among the N valid points is within a preset range, where the initial value of i is 1 and N is the total number of valid points in the original sparse depth map.
Specifically, by comparing r/R corresponding to the i-th valid point among the N valid points with the preset range, it can be determined whether the i-th valid point falls within the preset range; if it does, S705 is performed; otherwise, S706 is performed. The initial value of i is 1, and N is the total number of valid points in the original sparse depth map. When the i-th valid point is not within the preset range, this indicates that the i-th valid point is an outlier, that is, an abnormal value, so the valid point can be removed, that is, S706 is performed. When the i-th valid point is within the preset range, this indicates that the i-th valid point is a normal value, so the valid point can be retained, that is, S705 is performed.
S705. Retain the i-th valid point, and set i = i + 1.
Specifically, when r/R corresponding to the i-th valid point is within the preset range, the point is retained, i is set to i + 1, and S707 is performed, so that each of the N valid points is traversed.
S706. Remove the i-th valid point, and set i = i + 1.
Specifically, when r/R corresponding to the i-th valid point is not within the preset range, the point is removed, i is set to i + 1, and S707 is performed, so that each of the N valid points is traversed.
S707. Determine whether i is less than or equal to N.
Specifically, by comparing i with N, their relative size can be determined. When i is less than or equal to N, not all of the N valid points have been traversed yet, so the process returns to S704; when i is greater than N, each of the N valid points has been traversed, and the sparse depth map with abnormal points removed has already been obtained, so the traversal process can end, that is, S708 is performed.
S708. Obtain the sparse depth map.
Specifically, after each of the N valid points has been traversed, the sparse depth map is obtained.
S709. Use R as the scale factor.
Specifically, the value R obtained above may be used as the scale factor.
In some embodiments, still referring to FIG. 7, S702 to S708 may be repeated until M rounds of iteration have been performed, where M is a positive integer greater than or equal to 1, or until there are no more valid points to remove. When S702 to S708 are repeated, the original sparse depth map required by the current execution of S702 may be the sparse depth map obtained by the previous execution of S702 to S708. In addition, the R obtained in S709 is the R obtained by the last execution of S702, and the finally obtained sparse depth map is the sparse depth map obtained by the last execution of S702 to S708.
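A consolidated sketch of S702 to S709, including the optional iteration, is given below. It assumes both depth maps are NumPy arrays of the same resolution in which 0 marks an invalid pixel; the multiplicative preset range [lo, hi] on r/R and the default number of rounds are assumptions chosen for illustration, not values from the original text.

```python
import numpy as np

def consistency_screening(orig_sparse, relative, lo=0.5, hi=2.0, max_rounds=3):
    """S702-S709 sketch: remove points whose ratio to the relative depth map
    deviates from the mean ratio R; return the screened sparse map and R."""
    sparse = orig_sparse.astype(np.float64).copy()
    R = 1.0
    for _ in range(max_rounds):                        # iterate up to M rounds
        valid = (sparse > 0) & (relative > 0)
        if not np.any(valid):
            break
        r = np.zeros_like(sparse)
        r[valid] = sparse[valid] / relative[valid]     # S702: per-point ratio r
        R = r[valid].mean()                            # S702: mean ratio R
        keep = valid & (r / R >= lo) & (r / R <= hi)   # S703/S704: r/R within preset range
        outliers = valid & ~keep
        if not np.any(outliers):                       # no valid point left to remove
            break
        sparse[outliers] = 0                           # S706: remove outliers
    return sparse, R                                   # S708/S709
```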
In this way, the sparse depth map and the scale factor are obtained, after which S603 may be performed.
S603. Perform scale transformation on the obtained sparse depth map by using the scale factor.
Specifically, after the sparse depth map and the scale factor are obtained, the scale factor may be used to perform scale transformation on the obtained sparse depth map.
Exemplarily, when R is the scale factor of the original sparse depth map relative to the relative depth map, the scale transformation may be to divide the depth value of each valid point in the screened sparse depth map by the scale factor R. In one example, when the above r is obtained by dividing each valid depth value in the original sparse depth map by the depth value at the corresponding position in the relative depth map, R can be understood as the scale factor of the original sparse depth map relative to the relative depth map.
When R is the scale factor of the relative depth map relative to the original sparse depth map, the scale transformation may be to multiply the depth value of each valid point in the screened sparse depth map by the scale factor R. In one example, when the above r is obtained by dividing the depth value at the position in the relative depth map corresponding to each valid depth value in the original sparse depth map by the depth value at the corresponding position in the original sparse depth map, R can be understood as the scale factor of the relative depth map relative to the original sparse depth map.
S604. Concatenate the scale-transformed sparse depth map and the RGB image of the target scene into RGB-D (depth) four-channel input data, input the data into the joint depth estimation network, and obtain a depth map after processing by the joint depth estimation network.
Specifically, after the scale-transformed sparse depth map is obtained, the sparse depth map and the RGB image of the target scene may be concatenated into RGB-D (depth) four-channel input data and input together into the joint depth estimation network, and a depth map is obtained after processing by the joint depth estimation network.
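The concatenation in S604 can be sketched as stacking the scaled sparse depth map onto the RGB channels. The channel ordering and the `network` callable below are assumptions made for illustration.

```python
import numpy as np

def joint_pass(rgb, scaled_sparse_depth, network):
    """S604 sketch: build the RGB-D four-channel input and run the joint network."""
    rgbd = np.concatenate([rgb, scaled_sparse_depth[..., None]], axis=-1)  # H x W x 4
    return network(rgbd)   # dense depth map at the network's working scale
```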
S605. Perform inverse scale transformation on the depth map output by the joint depth estimation network, to obtain the required depth map.
Specifically, after the joint depth estimation network outputs a depth map, inverse scale transformation may be performed on the depth map, to obtain the required depth map (that is, a dense, absolute-scale depth map).
Exemplarily, when the scale factor is the scale factor of the original sparse depth map relative to the relative depth map, the inverse scale transformation means multiplying the depth value of each valid point in the depth map output by the neural network by the scale factor R; when the scale factor is the scale factor of the relative depth map relative to the original sparse depth map, the inverse scale transformation means dividing the depth value of each valid point in the depth map output by the neural network by the scale factor R. The inverse scale transformation is the reverse of the above scale transformation.
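The two conventions for the scale transformation in S603 and the inverse transformation in S605 can be written out as follows. This is an illustrative sketch, not the only possible implementation; only valid (non-zero) points of the sparse map are scaled, and which branch applies depends on how the ratio r was defined above.

```python
import numpy as np

def scale_transform(sparse_depth, R, sparse_over_relative=True):
    """S603 sketch: bring the screened sparse depth map to the network's scale."""
    scaled = sparse_depth.copy()
    valid = scaled > 0
    if sparse_over_relative:        # R = original sparse depth / relative depth
        scaled[valid] = scaled[valid] / R
    else:                           # R = relative depth / original sparse depth
        scaled[valid] = scaled[valid] * R
    return scaled

def inverse_scale_transform(pred_depth, R, sparse_over_relative=True):
    """S605 sketch: restore the network output to the actual (absolute) scale."""
    return pred_depth * R if sparse_over_relative else pred_depth / R
```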
In some embodiments, when scale transformation is not needed, the scale factor need not be obtained in S602; in this case, after S602 is performed, S604 may be performed directly, and the result output in S604 is used as the final required result.
In some embodiments, when scale transformation is needed, the image conditions (for example, the field of view or the aspect ratio) of the RGB image of the target scene captured by the color camera in S601 do not match the training set corresponding to the joint depth estimation network. When scale transformation is not needed, the image conditions of the RGB image of the target scene captured by the color camera in S601 match the training set corresponding to the joint depth estimation network.
In this way, by using the relative depth map of the target scene to screen the valid points in the original sparse depth map, the outliers in the original sparse depth map are removed, the deformation and distortion of the depth map are reduced, and the quality of the depth map obtained by subsequent processing is improved. In addition, through scale transformation, the image conditions corresponding to the depth map input into the neural network can match the data set of the neural network, which reduces the error between the scale of the depth map output by the neural network and the true value, ensures that the depth estimation result conforms to the actual scale, improves the quality of the depth map output by the neural network, and also improves the generality of the neural network and/or the depth estimation algorithm.
Next, based on the image processing method described above, another image processing method provided by an embodiment of the present application is introduced. It can be understood that this method is another way of expressing the image processing method described above, and the two are to be read together. The method is proposed based on the image processing method described above, and for some or all of its content, reference may be made to the description of the image processing method above.
Refer to FIG. 8, which is a schematic flowchart of an image processing method provided by an embodiment of the present application. It can be understood that the method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in FIG. 8, the image processing method includes:
S801. Acquire a first image of a target scene.
Specifically, the first image of the target scene may be acquired by, but not limited to, an image acquisition apparatus (for example, a camera). Exemplarily, the first image may be, but is not limited to, an RGB image.
S802. Input the first image into a first neural network, to obtain a first depth map corresponding to the first image.
Specifically, after the first image is acquired, the first image may be input into the first neural network, so that the first neural network processes the first image, for example using a monocular depth estimation algorithm, to obtain the first depth map corresponding to the first image. Exemplarily, the first depth map may be the relative depth map described above with reference to FIG. 6.
S803. Acquire a second depth map corresponding to the target scene, where the second depth map is at least used to represent depth information of part of the objects contained in the target scene, and the depth information contained in the second depth map is less than or equal to the depth information contained in the first depth map.
Specifically, the second depth map corresponding to the target scene may be acquired by, but not limited to, a depth map acquisition apparatus (for example, a depth sensor and/or an inertial sensor). The second depth map is at least used to represent depth information of part of the objects contained in the target scene, where the depth information contained in the second depth map is less than or equal to the depth information contained in the first depth map. Exemplarily, the second depth map may be the original sparse depth map described above with reference to FIG. 6.
S804. Based on the first depth map, remove outliers from the second depth map to obtain a third depth map.
Specifically, after the first depth map and the second depth map are determined, outliers in the second depth map may be removed based on the first depth map, to obtain a third depth map. Exemplarily, the third depth map may be the sparse depth map described above with reference to FIG. 6. In one example, "removing" can be understood as eliminating or discarding; what it expresses is that normal values are retained in the second depth map while abnormal values are discarded.
As a possible implementation, the ratio between the depth value of each first target point in the second depth map and the depth value of the corresponding second target point in the first depth map may first be determined, to obtain N ratios, where the second depth map includes N first target points, N is a positive integer greater than or equal to 1, and the second target point is the point at the same position as the first target point. Then, based on the N ratios, a scale factor used to represent the proportion between the first depth map and the second depth map is determined. Next, the deviation value between the target ratio corresponding to each first target point and the scale factor is determined, to obtain N deviation values, where the target ratio is the ratio between the first target point and the second target point. Finally, in the second depth map, the first target points corresponding to the deviation values among the N deviation values that are not within a preset deviation range are removed, to obtain the third depth map. For this process, reference may be made to the related description of FIG. 7 above, and details are not repeated here. Exemplarily, the deviation value may be a ratio or a difference, and the preset deviation range may be, but is not limited to, the preset range described above with reference to FIG. 7.
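A brief sketch of this implementation of S804 is given below; it parallels the FIG. 7 sketch above, but is phrased in terms of deviation values. The choice of a ratio-type deviation and the preset deviation range are assumptions made for illustration.

```python
import numpy as np

def remove_outliers(second_depth, first_depth, dev_lo=0.5, dev_hi=2.0):
    """S804 sketch: keep only the first target points whose deviation from the
    scale factor lies within a preset deviation range."""
    third = second_depth.copy()
    valid = (third > 0) & (first_depth > 0)           # the N first target points
    ratios = third[valid] / first_depth[valid]        # the N ratios
    scale_factor = ratios.mean()                      # proportion between the two maps
    deviation = ratios / scale_factor                 # deviation value (ratio form)
    keep = (deviation >= dev_lo) & (deviation <= dev_hi)
    rows, cols = np.where(valid)
    third[rows[~keep], cols[~keep]] = 0               # remove out-of-range points
    return third
```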
S805. Input the first image and the third depth map into a second neural network, to obtain a fourth depth map.
Specifically, after the third depth map is obtained, the first image and the third depth map may be input into the second neural network, simultaneously or in a time-division manner, to obtain the fourth depth map. Exemplarily, the second neural network and the first neural network may be the same neural network.
S806. Output the fourth depth map.
Specifically, after the fourth depth map is obtained, the fourth depth map may be output.
In this way, by using the relative depth map of the target scene to screen the valid points in the original sparse depth map, the outliers in the original sparse depth map are removed, the deformation and distortion of the depth map are reduced, and the quality of the depth map obtained by subsequent processing is improved.
In some embodiments, before S805, the third depth map may be scale-transformed based on a scale factor, where the scale factor is determined based on the first depth map and the second depth map and is used to represent the proportion between the first depth map and the second depth map; and before S806, inverse scale transformation is performed on the fourth depth map based on the scale factor. In this way, through scale transformation, the image conditions corresponding to the depth map input into the neural network can match the training data set of the neural network, which reduces the error between the scale of the depth map output by the neural network and the true value, ensures that the depth estimation result conforms to the actual scale, improves the quality of the depth map output by the neural network, and also improves the generality of the neural network and/or the depth estimation algorithm. For the processes of scale transformation and inverse scale transformation, reference may be made to the description of FIG. 6 above, and details are not repeated here.
It can be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in some possible implementations, the steps in the above embodiments may be performed selectively according to the actual situation, either partially or completely, which is not limited here. Furthermore, all or part of any features of any embodiment of the present application may be freely combined in any manner, provided there is no contradiction; the combined technical solutions also fall within the scope of the present application.
Based on the methods described in the above embodiments, an embodiment of the present application further provides an image processing apparatus. Refer to FIG. 9, which is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application. As shown in FIG. 9, the image processing apparatus 900 includes one or more processors 901 and an interface circuit 902. Optionally, the image processing apparatus 900 may further include a bus 903. Details are as follows:
The processor 901 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general-purpose processor, a neural-network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods and steps disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The interface circuit 902 may be used to send or receive data, instructions, or information. The processor 901 may process the data, instructions, or other information received by the interface circuit 902, and may send the processed information out through the interface circuit 902.
Optionally, the image processing apparatus 900 further includes a memory, which may include a read-only memory and a random access memory and provide operation instructions and data to the processor. Part of the memory may also include a non-volatile random access memory (NVRAM). The memory may be coupled to the processor 901.
Optionally, the memory stores executable software modules or data structures, and the processor 901 may perform corresponding operations by invoking operation instructions stored in the memory (the operation instructions may be stored in an operating system).
Optionally, the interface circuit 902 may be used to output an execution result of the processor 901.
It should be noted that the functions corresponding to the processor 901 and the interface circuit 902 may be implemented through hardware design, through software design, or through a combination of software and hardware, which is not limited here.
It should be understood that the steps of the above method embodiments may be completed by logic circuits in the form of hardware in the processor or by instructions in the form of software.
It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
In the above embodiments, the implementation may be realized entirely or partially by software, hardware, firmware, or any combination thereof. When software is used for implementation, the implementation may be realized entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated entirely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
It can be understood that the various numerals involved in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims (11)

  1. An image processing method, wherein the method comprises:
    acquiring a first image of a target scene;
    inputting the first image into a first neural network, to obtain a first depth map corresponding to the first image;
    acquiring a second depth map corresponding to the target scene, wherein the second depth map is at least used to represent depth information of part of the objects contained in the target scene, and the depth information contained in the second depth map is less than or equal to the depth information contained in the first depth map;
    based on the first depth map, removing outliers from the second depth map to obtain a third depth map;
    inputting the first image and the third depth map into a second neural network, to obtain a fourth depth map; and
    outputting the fourth depth map.
  2. The method according to claim 1, wherein before the inputting the first image and the third depth map into the second neural network, the method further comprises:
    performing scale transformation on the third depth map based on a scale factor, wherein the scale factor is determined based on the first depth map and the second depth map, and the scale factor is used to represent a proportion between the first depth map and the second depth map; and
    before the outputting the fourth depth map, the method further comprises:
    performing inverse scale transformation on the fourth depth map based on the scale factor.
  3. The method according to claim 2, wherein the scale factor is used to represent a proportion of the second depth map relative to the first depth map;
    the performing scale transformation on the third depth map based on the scale factor specifically comprises:
    dividing the depth value corresponding to each point in the third depth map by the scale factor; and
    the performing inverse scale transformation on the fourth depth map based on the scale factor specifically comprises:
    multiplying the depth value corresponding to each point in the fourth depth map by the scale factor.
  4. The method according to claim 2, wherein the scale factor is used to represent a proportion of the first depth map relative to the second depth map;
    the performing scale transformation on the third depth map based on the scale factor specifically comprises:
    multiplying the depth value corresponding to each point in the third depth map by the scale factor; and
    the performing inverse scale transformation on the fourth depth map based on the scale factor specifically comprises:
    dividing the depth value corresponding to each point in the fourth depth map by the scale factor.
  5. 根据权利要求2-4任一所述的方法,其特征在于,所述基于尺度因子,对所述第三深度图进行尺度变换之前,所述方法还包括:The method according to any one of claims 2-4, wherein, before performing scale transformation on the third depth map based on the scale factor, the method further includes:
    确定所述第一图像的图像条件与所述第二神经网络对应的训练数据集不匹配,其中,所述图像条件包括视场角和横纵比中的一项或多项。It is determined that the image condition of the first image does not match the training data set corresponding to the second neural network, wherein the image condition includes one or more items of an angle of view and an aspect ratio.
6. The method according to any one of claims 1-5, wherein the removing, based on the first depth map, outliers in the second depth map to obtain a third depth map specifically comprises:
    determining, respectively, a ratio between the depth value of each first target point in the second depth map and the depth value of a second target point in the first depth map, to obtain N ratios, wherein the second depth map comprises N first target points, N is a positive integer greater than or equal to 1, and the second target point is a point at the same position as the first target point;
    determining a scale factor according to the N ratios, wherein the scale factor is used to characterize a ratio between the first depth map and the second depth map;
    determining, respectively, a deviation value between the target ratio corresponding to each first target point and the scale factor, to obtain N deviation values, wherein the target ratio is the ratio between the first target point and the second target point; and
    removing, from the second depth map, the first target points corresponding to deviation values, among the N deviation values, that are not within a preset deviation range, to obtain the third depth map.
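An illustrative sketch (not part of the claims) of the outlier rejection of claim 6, assuming both depth maps are aligned NumPy arrays of the same shape, that invalid points of the sparse second depth map are marked with zero, that the first (relative) depth map is strictly positive, and that the median of the N ratios is used as the scale factor with a relative deviation threshold of 0.2; these choices are assumptions, since the claim does not fix how the scale factor or the preset deviation range is chosen:

    import numpy as np

    def remove_outliers(first_depth_map: np.ndarray,
                        second_depth_map: np.ndarray,
                        deviation_range: float = 0.2):
        # Valid (non-zero) pixels of the sparse second depth map are the
        # "first target points"; the pixels at the same positions in the dense
        # first depth map are the "second target points".
        valid = second_depth_map > 0
        ratios = second_depth_map[valid] / first_depth_map[valid]   # N ratios
        scale_factor = np.median(ratios)            # one possible scale factor
        deviations = np.abs(ratios - scale_factor) / scale_factor   # N deviation values

        third_depth_map = second_depth_map.copy()
        flat_indices = np.flatnonzero(valid)
        # Remove (invalidate) the first target points whose deviation value is
        # not within the preset deviation range.
        third_depth_map.flat[flat_indices[deviations > deviation_range]] = 0
        return third_depth_map, scale_factor

The scale factor returned here is the same quantity that claim 2 reuses for the scale transformation of the third depth map and the inverse transformation of the fourth depth map.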
7. The method according to any one of claims 1-6, wherein the first image is obtained by an image acquisition apparatus; and
    the second depth map is obtained by a depth sensor and/or by an inertial sensor in combination with a multi-view geometry algorithm.
8. An image processing apparatus, comprising at least one processor and an interface;
    wherein the at least one processor obtains program instructions or data through the interface; and
    the at least one processor is configured to execute the program instructions to implement the method according to any one of claims 1-7.
9. A device, comprising:
    at least one memory, configured to store a program; and
    at least one processor, configured to execute the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, storing a computer program, wherein when the computer program is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1-7.
11. A computer program product, wherein when the computer program product is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1-7.
PCT/CN2022/133950 2021-12-09 2022-11-24 Image processing method, apparatus and device WO2023103792A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111499904.X 2021-12-09
CN202111499904.XA CN116258754A (en) 2021-12-09 2021-12-09 Image processing method, device and equipment

Publications (1)

Publication Number Publication Date
WO2023103792A1 true WO2023103792A1 (en) 2023-06-15

Family

ID=86686670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/133950 WO2023103792A1 (en) 2021-12-09 2022-11-24 Image processing method, apparatus and device

Country Status (2)

Country Link
CN (1) CN116258754A (en)
WO (1) WO2023103792A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200410699A1 (en) * 2018-03-13 2020-12-31 Magic Leap, Inc. Image-enhanced depth sensing using machine learning
CN111340864A (en) * 2020-02-26 2020-06-26 浙江大华技术股份有限公司 Monocular estimation-based three-dimensional scene fusion method and device
CN111985535A (en) * 2020-07-17 2020-11-24 南京大学 Method and device for optimizing human body depth map through neural network
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance

Also Published As

Publication number Publication date
CN116258754A (en) 2023-06-13

Similar Documents

Publication Publication Date Title
WO2020207191A1 (en) Method and apparatus for determining occluded area of virtual object, and terminal device
WO2021088473A1 (en) Image super-resolution reconstruction method, image super-resolution reconstruction apparatus, and computer-readable storage medium
WO2018119889A1 (en) Three-dimensional scene positioning method and device
WO2018214505A1 (en) Method and system for stereo matching
CN110070598B (en) Mobile terminal for 3D scanning reconstruction and 3D scanning reconstruction method thereof
EP3135033B1 (en) Structured stereo
WO2022206020A1 (en) Method and apparatus for estimating depth of field of image, and terminal device and storage medium
US20220012457A1 (en) Image processing method, microscope, image processing system, and medium based on artificial intelligence
CN112784874B (en) Binocular vision stereo matching method and device, electronic equipment and storage medium
WO2021179745A1 (en) Environment reconstruction method and device
US20230237683A1 (en) Model generation method and apparatus based on multi-view panoramic image
WO2023160426A1 (en) Video frame interpolation method and apparatus, training method and apparatus, and electronic device
CN111105452A (en) High-low resolution fusion stereo matching method based on binocular vision
TWI528783B (en) Methods and systems for generating depth images and related computer products
CN111739071A (en) Rapid iterative registration method, medium, terminal and device based on initial value
CN114881901A (en) Video synthesis method, device, equipment, medium and product
CN112270748B (en) Three-dimensional reconstruction method and device based on image
WO2023103792A1 (en) Image processing method, apparatus and device
CN117745845A (en) Method, device, equipment and storage medium for determining external parameter information
US7751613B2 (en) Method for rapidly establishing image space relation using plane filtering constraint
CN112508996A (en) Target tracking method and device for anchor-free twin network corner generation
WO2023142732A1 (en) Image processing method and apparatus, and electronic device
WO2023082685A1 (en) Video enhancement method and apparatus, and computer device and storage medium
CN116012418A (en) Multi-target tracking method and device
CN117252912A (en) Depth image acquisition method, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22903215

Country of ref document: EP

Kind code of ref document: A1