CN115272709A - Training method, apparatus, device, medium and product of a depth completion model - Google Patents

Training method, apparatus, device, medium and product of a depth completion model

Info

Publication number
CN115272709A
CN115272709A (application CN202210908195.4A)
Authority
CN
China
Prior art keywords: depth, image, training, module, completion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210908195.4A
Other languages
Chinese (zh)
Other versions
CN115272709B (en)
Inventor
崔致豪
丁有爽
邵天兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mech Mind Robotics Technologies Co Ltd
Original Assignee
Mech Mind Robotics Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mech Mind Robotics Technologies Co Ltd filed Critical Mech Mind Robotics Technologies Co Ltd
Priority to CN202210908195.4A priority Critical patent/CN115272709B/en
Publication of CN115272709A publication Critical patent/CN115272709A/en
Application granted granted Critical
Publication of CN115272709B publication Critical patent/CN115272709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 10/56: Extraction of image or video features relating to colour
    • G06N 3/08: Learning methods (neural networks; computing arrangements based on biological models)
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks


Abstract

The disclosure provides a training method, apparatus, device, medium and product for a depth completion model. The method includes: acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image; adding a depth defect to the first depth image to generate a second depth image corresponding to the training image; and performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed image based on an input depth image with missing depth. By building the depth completion model and training it with deep learning, the trained model is used to complete depth maps that contain missing depth, providing a depth completion solution for depth maps with missing depth.

Description

Training method, apparatus, device, medium and product of a depth completion model
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a training method, apparatus, device, medium and product for a depth completion model.
Background
With the development of artificial intelligence, its range of applications keeps expanding. For example, as deep learning models have become widespread, image completion is increasingly performed with deep learning methods.
In practical applications such as intelligent sorting, the depth map of an object must be analyzed in order to grasp the object accurately. However, occlusion and similar factors cause depth values of the object to be missing from the depth map, which hinders the intelligent sorting model during grasping. Performing depth completion on depth maps with missing depth is therefore an urgent problem.
Disclosure of Invention
The disclosure provides a training method, apparatus, device, medium and product for a depth completion model, which are used to complete depth maps with missing depth.
In one aspect, the present disclosure provides a training method for a depth completion model, including:
acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
adding a depth defect into the first depth image to generate a second depth image corresponding to the training image;
and performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth image subjected to depth completion based on the input depth missing image.
In an embodiment, the adding a depth defect to the first depth image to generate a second depth image corresponding to the training image includes:
randomly removing the local area depth information of the first depth image, and generating a second depth image corresponding to the training image; or,
and shielding a local area of the first depth image by using the object depth image with depth missing, and generating a second depth image corresponding to the training image.
In one embodiment, the depth completion training of the depth completion model according to the training image, the first depth image, and the second depth image includes:
inputting the training image and the second depth image into a current depth completion model to obtain a third depth image corresponding to the training image output by the depth completion model;
determining a loss value of a current depth completion model based on a loss function according to the third depth image and the first depth image;
and performing parameter adjustment on the current depth completion model according to the loss value until the current depth completion model meets a preset convergence condition, so as to obtain the trained depth completion model.
In one embodiment, the loss function is a mean square loss function.
In one embodiment, the depth completion model comprises: the first branch, the second branch and the fusion layer;
the first branch is used for inputting the training image, the first branch comprises a plurality of first sub-modules which are connected in sequence, and the first sub-modules are used for performing feature extraction on color information, texture information, edge information and spatial information;
the second branch is used for inputting the second depth image, the second branch comprises a down-sampling module and an up-sampling module, the down-sampling module is used for executing down-sampling, and feature extraction of depth information and spatial information is carried out on a result obtained by the down-sampling; the up-sampling module is used for extracting the features of the depth information and the spatial information and performing up-sampling on the result of the feature extraction; wherein the number of the down-sampling modules is the same as the number of the up-sampling modules;
and the fusion layer is used for performing feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
In one embodiment, the down-sampling modules are in one-to-one correspondence with the up-sampling modules, wherein the output size of a down-sampling module is the same as the input size of a corresponding up-sampling module, at least one of the down-sampling modules providing a cross-layer connection to a corresponding up-sampling module;
and the down-sampling module is provided with a cross-layer connection and is used for transmitting the shallow feature output by the down-sampling module to the corresponding up-sampling module through the cross-layer connection.
In an embodiment, an up-sampling module provided with a cross-layer connection is specifically configured to perform feature extraction of depth information and spatial information on the superposition of the shallow features transmitted over the cross-layer connection and the deep features output by the module immediately preceding it, and to up-sample the result of the feature extraction.
In one embodiment, each of the upsampling modules and each of the downsampling modules includes a second sub-module, and the second sub-module is configured to perform feature extraction of depth information and spatial information.
In one embodiment, the first sub-module and the second sub-module are both residual channel attention block models.
In another aspect, the present disclosure provides a depth completion image generation method, including:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing area;
inputting the depth image to be processed into a depth completion model to obtain a depth completion image subjected to depth completion, wherein the depth completion model is generated by training with the training method of the depth completion model described above.
In another aspect, the present disclosure provides a training apparatus for a depth completion model, including:
an acquisition module, which is used for acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
the processing module is used for adding depth defects into the first depth image and generating a second depth image corresponding to the training image;
and the training module is used for carrying out depth completion training on the depth completion model according to the training image, the first depth image and the second depth image so that the depth completion model can output a depth image subjected to depth completion based on an input depth missing image.
In yet another aspect, the present disclosure provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the training method of the depth completion model described in any one of the preceding items or the depth completion image generation method described above.
In yet another aspect, the present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the training method of the depth completion model or the depth completion image generation method described in any one of the preceding items.
In a further aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the training method of the depth completion model or the depth completion image generation method described in any one of the preceding items.
In the training method, apparatus, device, medium and product of the depth completion model provided by the disclosure, a training image and a first depth image corresponding to the training image are obtained, a depth defect is added to the first depth image to generate a second depth image, and depth completion training is performed on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed image based on an input depth image with missing depth. By building the depth completion model and training it with deep learning, the trained model is used to complete depth maps that contain missing depth, which provides a depth completion solution for such depth maps and yields depth maps with complete depth.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic view of an application scenario of an example of the present disclosure;
Fig. 2 is a schematic flowchart of a training method for a depth completion model according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another training method for a depth completion model according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a training method for a depth completion model according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a depth completion model according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of another depth completion model according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of another depth completion model according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of another depth completion model according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of another depth completion model provided in an embodiment of the present disclosure;
Fig. 10 is a schematic diagram of a depth completion model training apparatus provided in the third embodiment of the present disclosure;
Fig. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Description of the reference numerals:
51: a first sub-module;
52: a down-sampling module;
53: an upsampling module;
61: a second sub-module.
Specific embodiments of the present disclosure have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
At present, artificial intelligence is usually realized through machine learning: a model is trained on learning samples so that a well-performing intelligent model is established and intelligent operation of a machine is achieved. Taking an intelligent sorting scenario as an example, the depth map of an object needs to be analyzed to grasp the object accurately. However, occlusion and similar factors cause depth values to be missing from the object's depth map, and an untrained machine cannot determine the accurate position and shape of the object from such a depth map, so an obstacle to grasping exists.
In order to complete depth maps with missing depth, a depth completion model needs to be established for the sorting machine and trained on learning samples to obtain an intelligent depth completion model. Fig. 1 is a schematic view of an application scenario of an example of the present disclosure. As shown in the figure, the depth completion model performs depth completion on a depth map with missing depth: the defective depth map is input into the depth completion model to obtain a depth-completed depth map. Optionally, the completed depth map can be used by the intelligent sorting model to accurately locate the position and shape of the object and to grasp the object accurately.
It should be noted that the brief descriptions of terms in the present disclosure are only for convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present disclosure. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The technical solutions of the present disclosure will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. In the description of the present disclosure, unless otherwise explicitly specified and defined, each term should be understood in its ordinary meaning in the art. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Example one
Fig. 2 is a schematic flowchart of a training method for a depth completion model according to an embodiment of the present disclosure. As shown in Fig. 2, the method includes:
Step 101, acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
Step 102, adding a depth defect into the first depth image to generate a second depth image corresponding to the training image;
Step 103, performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth image subjected to depth completion based on an input depth-missing image.
The execution subject of this embodiment is a training apparatus for the depth completion model, which may be implemented by a computer program, for example application software; alternatively, it may be implemented as a medium storing the relevant computer program, for example a USB drive or a cloud disk; or it may be implemented by a physical device into which the relevant computer program is integrated or installed, for example a chip.
Training an intelligent model generally requires using training samples as the input of the model and then determining the parameters of the model with an optimization algorithm, so that the model achieves the intended effect. For example, the training samples of the depth completion model mainly include a training image, a first depth image and a second depth image. The training image is a two-dimensional image, and the way it is acquired is not limited; for example, it can be obtained by capturing an image of a real object with an image acquisition device such as a camera or a video camera. The first depth image is a depth image corresponding to the training image, with relatively complete depth information; it may likewise be obtained with a depth camera, and it corresponds to the training image in the sense that, for example, it depicts the same scene as the training image. The second depth image is obtained by adding depth defects to the first depth image. In practical applications, there are various ways to introduce depth defects.
Optionally, Fig. 3 is a schematic flowchart of another training method for a depth completion model according to an embodiment of the present disclosure. As shown in Fig. 3, step 102 includes:
Step 201, randomly removing the depth information of a local area of the first depth image, and generating a second depth image corresponding to the training image; or,
Step 202, occluding a local area of the first depth image with an object depth image that has missing depth, and generating a second depth image corresponding to the training image.
Specifically, methods for adding the depth defect may include, but are not limited to, removing depth information and occluding depth information. Taking removal as an example, the depth information of a local area of the first depth image may be randomly removed, so that the removed part contains no depth information, which yields a second depth image with added depth missing. Taking occlusion as an example, a local area of the first depth image may be occluded by an object depth image that contains little depth information, which likewise yields a second depth image with added depth missing. Optionally, the object image used to occlude the depth information may be the depth image of a transparent object.
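For illustration only (not part of the disclosed embodiments), the two defect-addition strategies above can be prototyped roughly as follows; the function names, the rectangular mask shape and the NumPy usage are assumptions made for this sketch.

```python
import numpy as np

def remove_random_region(depth: np.ndarray, max_frac: float = 0.3) -> np.ndarray:
    """Randomly zero out the depth values of a rectangular local area (assumed defect shape)."""
    h, w = depth.shape
    dh = np.random.randint(1, max(2, int(h * max_frac)))
    dw = np.random.randint(1, max(2, int(w * max_frac)))
    y, x = np.random.randint(0, h - dh), np.random.randint(0, w - dw)
    defective = depth.copy()
    defective[y:y + dh, x:x + dw] = 0.0          # 0 marks missing depth
    return defective

def occlude_with_object(depth: np.ndarray, occluder: np.ndarray, y: int, x: int) -> np.ndarray:
    """Overlay a depth-deficient object depth patch (e.g. a transparent object) onto a local area."""
    defective = depth.copy()
    oh, ow = occluder.shape
    defective[y:y + oh, x:x + ow] = occluder     # occluder may itself contain zeros (missing depth)
    return defective

# second_depth = remove_random_region(first_depth)                 # strategy of step 201
# second_depth = occlude_with_object(first_depth, patch, 40, 60)   # strategy of step 202
```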
In practical application, the intelligent model needs to be trained based on a training sample to achieve a desired effect, so as to obtain a trained model. For example, after the second depth image is obtained, the training image, the first depth image, and the second depth image may be used as training samples of the depth completion model for training.
In an example, Fig. 4 is a schematic flowchart of a training method for a depth completion model according to an embodiment of the present disclosure. As shown in Fig. 4, performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image in step 103 includes:
Step 301, inputting the training image and the second depth image into the current depth completion model to obtain a third depth image, corresponding to the training image, output by the depth completion model;
Step 302, determining a loss value of the current depth completion model based on a loss function, according to the third depth image and the first depth image;
Step 303, performing parameter adjustment on the current depth completion model according to the loss value until the current depth completion model meets a preset convergence condition, so as to obtain the trained depth completion model.
In combination with the scene example, before the depth completion model is trained its parameters are default values, and suitable parameter values need to be determined by training on the training samples. The training image and the defective second depth image are used as the input to obtain the third depth image output by the current depth completion model. Whether the model meets the accuracy requirement is judged from the difference between the first depth image and the third depth image; the smaller this difference, the better the accuracy of the model. Based on the current difference between the first and third depth images, the parameter values of the depth completion model are adjusted using the loss function, the training image and the second depth image are fed into the adjusted model again, and this is repeated until the difference between the third depth image output by the current model and the first depth image is small enough, indicating that the depth completion model is sufficiently accurate, at which point it can be used as the trained depth completion model. Optionally, the loss function is a mean square loss function. In principle any loss function could be used, but a mean square loss computed from the mean squared error is the most common choice; its purpose is to drive the difference between the first depth image and the third depth image as small as possible, providing a reference index for the optimal parameter solution.
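As a minimal sketch of steps 301 to 303, assuming a PyTorch-style model and an iterable of (training image, first depth image, second depth image) tensors; the optimizer, learning rate and convergence threshold are assumptions for illustration, not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs: int = 50, lr: float = 1e-4, tol: float = 1e-3):
    """Forward pass, mean square loss against the first depth image, parameter update."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for rgb, first_depth, second_depth in loader:       # training image, ground truth, defective depth
            third_depth = model(rgb, second_depth)           # step 301: model output
            loss = F.mse_loss(third_depth, first_depth)      # step 302: mean square loss
            optimizer.zero_grad()
            loss.backward()                                  # step 303: adjust parameters
            optimizer.step()
            running += loss.item()
        if running / len(loader) < tol:                      # assumed convergence condition
            break
    return model
```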
In an example, fig. 5 is a schematic structural diagram of a depth completion model provided in an embodiment of the present disclosure, where the depth completion model includes: the first branch, the second branch and the fusion layer;
the first branch is used for inputting the training image, the first branch includes a plurality of first sub-modules 51 connected in sequence, and the first sub-modules 51 are used for performing feature extraction on color information, texture information, edge information and spatial information;
the second branch is used for inputting the second depth image, the second branch comprises a down-sampling module 52 and an up-sampling module 53, the down-sampling module 52 is used for executing down-sampling, and feature extraction of depth information and spatial information is performed on a result obtained by the down-sampling; the up-sampling module 53 is configured to perform feature extraction of depth information and spatial information, and perform up-sampling on a result of the feature extraction; the number of the down-sampling modules 52 is the same as that of the up-sampling modules 53;
and the fusion layer is used for performing feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the depth completion model.
In combination with the scene example, the training image and the second depth image are input into the depth completion model, which processes them to obtain a depth-completed depth map. During this processing, features of the input training image and of the second depth image are extracted, and the extracted image features are fused.
Because the feature information of the training image and the feature information of the second depth image need to be extracted separately, two branches, namely a first branch and a second branch, are provided in the depth completion model: the training image undergoes feature extraction in the first branch, and the second depth image undergoes feature extraction in the second branch.
The first branch is provided with a plurality of first sub-modules 51 connected in sequence. The first sub-modules 51 are the feature extraction modules of the first branch and extract features from the training image; the feature information of the image mainly includes color information, texture information, edge information and spatial information.
The second branch is provided with down-sampling modules 52 and up-sampling modules 53, and the number of down-sampling modules 52 is the same as the number of up-sampling modules 53. Because the second depth image is a depth map, its features include depth information and spatial information.
Optionally, Fig. 6 is a schematic structural diagram of another depth completion model provided in the first embodiment of the present disclosure. Each up-sampling module 53 and each down-sampling module 52 include a second sub-module 61, and the second sub-module 61 is configured to perform feature extraction of depth information and spatial information.
The down-sampling module 52 is composed of two parts, namely a down-sampling operation module and a second sub-module 61, the down-sampling operation module is responsible for performing down-sampling processing on the second depth image, and the second sub-module 61 is responsible for performing feature extraction on the second depth image. Similarly, the upsampling module 53 is also composed of two parts, namely an upsampling operation module and a second sub-module 61, where the upsampling operation module is responsible for upsampling the second depth image, and the second sub-module 61 is responsible for extracting features of the second depth image.
Optionally, Fig. 7 is a schematic structural diagram of another depth completion model provided in the first embodiment of the present disclosure. As shown in Fig. 7, the first sub-module 51 and the second sub-module 61 are both residual channel attention blocks (RCABs). Since the first sub-module 51 and the second sub-module 61 both perform feature extraction on the image, both may be chosen as the residual channel attention block model. Each RCAB contains a plurality of channels, and each channel can extract different feature information of the image; the attention mechanism adaptively recalibrates the feature responses of the information channels, which improves the representational capability of the network, and the image features are fully extracted through multi-level feature fusion.
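For reference, a common formulation of the residual channel attention block is sketched below; the channel count and reduction ratio are assumptions, and the disclosure does not specify the exact internals of its RCAB.

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual channel attention block: conv features rescaled per channel, plus a skip connection."""
    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attention = nn.Sequential(               # channel attention: adaptively recalibrates channels
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.body(x)
        return x + features * self.attention(features)   # residual connection
```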
When extracting the depth information and spatial information of the second depth image, the second branch passes through each down-sampling module 52 in turn. Within each down-sampling module 52, down-sampling is first performed by the down-sampling operation module, and the depth information and spatial information are then extracted by the RCAB in that module. After the down-sampling modules 52 have completed feature extraction of the second depth image, the data passes through each up-sampling module 53 in turn. Within each up-sampling module 53, the depth information and spatial information are first extracted by the RCAB, and the result is then up-sampled by the up-sampling operation module.
After the first branch and the second branch have respectively extracted the features of the training image and of the second depth image, the extracted feature information is input into the fusion layer, which fuses the feature information from the two branches and outputs the depth-completed depth image.
In an example, fig. 8 is a schematic structural diagram of another depth completion model provided in an embodiment of the present disclosure, where a down-sampling module 52 is in one-to-one correspondence with an up-sampling module 53, where an output size of the down-sampling module 52 is the same as an input size of the corresponding up-sampling module 53, and at least one of the down-sampling modules 52 is provided to be connected to the corresponding up-sampling module 53 across layers;
a down-sampling module 52 is provided with a cross-layer connection for transmitting the shallow features output by the down-sampling module 52 to the corresponding up-sampling module via the cross-layer connection.
In combination with the scene example, the feature extraction modules in both the first branch and the second branch are RCABs; an RCAB contains multiple channels, and different features of the image are extracted in different channels. As mentioned above, the numbers of down-sampling modules 52 and up-sampling modules 53 are the same. Before feature extraction is performed on the second depth image, the down-sampling operation module of the down-sampling module 52 first reduces the image size of the second depth image, which suits the multi-channel feature extraction of the RCAB and increases the number of channels available to the RCAB for feature extraction. After the size reduction, the reduced second depth image is copied into multiple copies, which are fed into different channels of the RCAB for feature extraction.
As shown in fig. 8, two down-sampling modules 52 are illustrated in the second branch, and as can be seen from the left to right, after feature extraction is performed by the first down-sampling module 52, a second depth image with a reduced size is output and input to the second down-sampling module 52. At this time, the down-sampling processing module in the second down-sampling module 52 performs the size reduction operation on the reduced second depth image again based on the output of the first down-sampling module 52, and then performs the feature extraction by the RCAB in the second down-sampling module 52, so as to pass through all the down-sampling modules 52.
After passing through all of the down-sampling modules 52 in turn, it passes through each of the up-sampling modules 53 in turn. In passing through each upsampling module 53, the features for the second depth image are first extracted through RCAB in the upsampling module 53, and then passed through an upsampling processing module in the upsampling module 53, which can stretch the size of the image. The degree of stretching of the image by the up-sampling processing module is the same as the degree of compressing the image by the down-sampling operation module, for example, the compression of the image by the down-sampling operation module may make the image half the original size, and the stretching of the image by the up-sampling processing module may make the image twice the original size. Since the number of the up-sampling modules 53 and the down-sampling modules 52 in the second branch is the same, the size of the second depth image finally output by the second branch is the same as the size of the original input image.
Since the number of the up-sampling modules 53 and the down-sampling modules 52 in the second branch is the same, and the up-sampling modules 53 and the down-sampling modules 52 are connected in sequence, the up-sampling modules 53 and the down-sampling modules 52 are in one-to-one correspondence with the central axis of the second branch as a symmetry line. For example, in the example of fig. 8, the second downsampling module 52 corresponds to the first upsampling module 53 and the first downsampling module 52 corresponds to the second upsampling module 53 in order from left to right, and thus it can be seen that the size of the image subjected to feature extraction by the corresponding upsampling module 53 and the RCAB in the downsampling module 52 is the same.
The cross-layer connection operation is performed between a corresponding pair of up-sampling module 53 and down-sampling module 52, so as to transmit the shallow image features extracted by the down-sampling module 52 to the corresponding up-sampling module 53 and superimpose them on the deep image features extracted by the up-sampling module 53, thereby obtaining more comprehensive image feature information. It should be noted that shallow and deep image features are defined by the relative order of extraction: features extracted earlier are shallow features, and features extracted later are deep features. For example, in the second branch, the down-sampling modules 52 extract features of the second depth image first, so the features they extract are called shallow features; likewise, the up-sampling modules 53 extract features later, so the features they extract are called deep features.
In one example, an up-sampling module 53 provided with a cross-layer connection is specifically configured to superimpose the shallow features transmitted over the cross-layer connection with the deep features output by the module immediately preceding the up-sampling module 53, perform feature extraction of depth information and spatial information on the superimposed result, and up-sample the result of the feature extraction.
As can be seen from Fig. 8, the input of each up-sampling module 53 provided with a cross-layer connection consists of two parts: the shallow features output by the down-sampling module 52 at the other end of the cross-layer connection, and the deep features output by the previous up-sampling module 53. Each such up-sampling module 53 therefore superimposes the received deep features and shallow features, performs feature extraction of depth information and spatial information on the second depth image again according to the superimposed result, up-samples the extracted features, and passes the result to the next up-sampling module 53, until all up-sampling modules 53 in the second branch have been traversed. Extracting features of the second depth image through the down-sampling modules 52 and the up-sampling modules 53 and fusing the different extracted features makes the extracted image features richer and more comprehensive, providing sufficient image features for the depth completion of the second depth image.
Optionally, Fig. 9 is a schematic structural diagram of another depth completion model provided in an embodiment of the present disclosure. As shown in Fig. 9, reading from left to right, one or more multi-layer perceptrons (MLPs), also referred to as fully-connected layers or fully-connected neural networks, may be placed between the second down-sampling module 52 and the first up-sampling module 53. The MLP is mainly used to adjust the dimensions of the extracted image features, which facilitates re-establishing the depth information of the image. When an MLP is present between the second down-sampling module 52 and the first up-sampling module 53, the image feature information passed directly from the second down-sampling module 52 to the first up-sampling module 53 is processed by the MLP, so it differs from the image feature information that the same down-sampling module 52 passes to the first up-sampling module 53 over the cross-layer connection. The first up-sampling module 53 thus receives two different pieces of image feature information and fuses them, so that the image feature information it obtains and extracts is more complete and abundant.
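Putting the pieces together, a rough sketch of the two-branch architecture (first branch of stacked RCABs, second branch as an encoder-decoder with cross-layer connections and a bottleneck MLP, followed by a fusion layer) might look as follows. The layer widths, the number of modules, the use of 1x1 convolutions as a per-position MLP and the use of concatenation for fusion are all assumptions made for illustration; the RCAB class is the one sketched earlier.

```python
import torch
import torch.nn as nn

class SecondBranch(nn.Module):
    """Encoder-decoder over the defective depth map: down-sampling modules (downsample + RCAB),
    an assumed 1x1-conv MLP bottleneck, and up-sampling modules (RCAB + upsample) with
    cross-layer (skip) connections. Input height/width are assumed divisible by 2**levels."""
    def __init__(self, ch: int = 64, levels: int = 2):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.down = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), RCAB(ch))   # down-sampling module 52
            for _ in range(levels)])
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 1))
        self.up = nn.ModuleList([
            nn.Sequential(RCAB(ch), nn.ConvTranspose2d(ch, ch, 2, stride=2))     # up-sampling module 53
            for _ in range(levels)])

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        x = self.stem(depth)
        shallow = []
        for down in self.down:
            x = down(x)
            shallow.append(x)                    # shallow features kept for cross-layer connections
        x = self.mlp(x)                          # direct path processed by the assumed bottleneck MLP
        for up, skip in zip(self.up, reversed(shallow)):
            x = up(x + skip)                     # superimpose shallow and deep features, extract, upsample
        return x

class DepthCompletionModel(nn.Module):
    """Two-branch model: RGB branch of stacked RCABs, depth branch above, then a fusion layer."""
    def __init__(self, ch: int = 64, n_rcab: int = 4):
        super().__init__()
        self.first_branch = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1),
            *[RCAB(ch) for _ in range(n_rcab)])                 # colour/texture/edge/spatial features
        self.second_branch = SecondBranch(ch)
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))                     # fusion layer -> completed depth map

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.first_branch(rgb), self.second_branch(depth)], dim=1)
        return self.fusion(fused)
```

In this sketch the last down-sampling module feeds the first up-sampling module twice, once through the MLP and once over the skip connection, which mirrors the arrangement described for Fig. 9.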
According to the depth completion method and apparatus of the present disclosure, a depth completion model is established, a training image and a first depth image corresponding to the training image are obtained, a depth defect is added to the first depth image to generate a second depth image, and depth completion training is performed on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed image based on an input depth image with missing depth. By building the depth completion model and training it with deep learning, the trained model is used to complete depth maps with missing depth, which provides a depth completion solution for such depth maps and yields depth maps with complete depth.
Example two
The present disclosure further provides a depth completion image generation method, including:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing area;
inputting the depth image to be processed into a depth completion model to obtain a depth completion image subjected to depth completion, wherein the depth completion model is generated by training with the training method of the depth completion model described above.
In practical applications, with the development of artificial intelligence, more and more scenarios rely on it to meet practical needs. Taking an intelligent sorting model as an example, the sorting model needs to analyze the depth map of an object in order to grasp it accurately. However, because objects are placed randomly, they occlude one another, so part of the object depth is missing when the intelligent sorting model observes the objects, which hinders the model in the process of grasping them.
In combination with the scene example, this embodiment provides a depth completion method for images by establishing and training a depth completion model: a depth map with depth defects is input into the depth completion model, and a depth-completed depth map is obtained through the depth completion operation of the model.
Specifically, in the process of training the depth completion model, training samples are first obtained; they are used as the input, a depth-completed depth image is output, and the difference between the depth image output by the depth completion model and the standard depth image in the training samples is reduced by optimizing the parameters of the model. Two branches, namely a first branch and a second branch, exist in the depth completion model. First, a training image is obtained, namely an image captured from a real object; it is input into the first branch to extract image features including color information, texture information, edge information and spatial information. A first depth image corresponding to the training image, with no missing depth, is also obtained; a defective second depth image is produced by randomly removing depth values from the first depth image or by occluding the first depth image, and is input into the second branch to extract depth information and spatial information. Finally, the feature information extracted by the first branch and the second branch is fused to obtain a completed depth image, and the parameters of the depth completion model are determined from the completed depth image and the defect-free first depth image according to the mean square loss function, completing the training of the depth completion model.
In this embodiment, the depth image with depth missing is input into the depth completion model, and the depth completion model completes the depth map with the depth missing, so as to obtain a complete depth map.
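A usage sketch for inference under the assumptions of the earlier sketches (the checkpoint file name, normalisation and tensor layout are illustrative only, and DepthCompletionModel is the hypothetical class defined above):

```python
import numpy as np
import torch

# Assumed: a trained DepthCompletionModel saved to "depth_completion.pt",
# rgb as an HxWx3 float array in [0, 1], depth as an HxW float array with 0 = missing.
model = DepthCompletionModel()
model.load_state_dict(torch.load("depth_completion.pt", map_location="cpu"))
model.eval()

def complete_depth(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    rgb_t = torch.from_numpy(rgb).float().permute(2, 0, 1).unsqueeze(0)    # 1x3xHxW
    depth_t = torch.from_numpy(depth).float().unsqueeze(0).unsqueeze(0)    # 1x1xHxW
    with torch.no_grad():
        completed = model(rgb_t, depth_t)
    return completed.squeeze().numpy()          # depth-completed map, e.g. for grasp planning
```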
EXAMPLE III
In an example, Fig. 10 shows a depth completion model training apparatus provided in the third embodiment of the present disclosure, including:
an obtaining module 71, configured to obtain a training image and a first depth image corresponding to the training image, where the training image is a two-dimensional image;
a processing module 72, configured to add a depth defect to the first depth image, and generate a second depth image corresponding to the training image;
a training module 73, configured to perform depth completion training on the depth completion model according to the training image, the first depth image, and the second depth image, so that the depth completion model can output a depth image subjected to depth completion based on an input depth-missing image.
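As a sketch only, the three modules can be composed into a small apparatus class that reuses the earlier hypothetical helpers (remove_random_region and train); everything here is an illustrative assumption rather than the disclosed apparatus.

```python
import torch

class DepthCompletionTrainer:
    """Acquisition (71), processing (72) and training (73) modules composed into one apparatus."""
    def __init__(self, model, sample_source):
        self.model = model
        self.sample_source = sample_source        # yields (training_image, first_depth) NumPy pairs

    def acquire(self):                            # obtaining module 71
        return list(self.sample_source())

    def process(self, samples):                   # processing module 72: add depth defects
        batch = []
        for rgb, first in samples:
            second = remove_random_region(first)  # hypothetical helper from the earlier sketch
            batch.append((torch.from_numpy(rgb).float().permute(2, 0, 1).unsqueeze(0),
                          torch.from_numpy(first).float()[None, None],
                          torch.from_numpy(second).float()[None, None]))
        return batch

    def fit(self):                                # training module 73
        return train(self.model, self.process(self.acquire()))   # reuses the earlier training-loop sketch
```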
In combination with the scene example, training the intelligent model generally requires the obtaining module 71 to obtain training samples as the input of the model; the parameters of the model are then determined with an optimization algorithm, so that the model achieves the intended effect. For example, the training samples of the depth completion model mainly include a training image, a first depth image and a second depth image. The training image is a two-dimensional image, and the way it is acquired is not limited; for example, it can be obtained by capturing an image of a real object with an image acquisition device such as a camera or a video camera. The first depth image is a depth image corresponding to the training image, with relatively complete depth information; it may likewise be obtained with a depth camera, and it corresponds to the training image in the sense that, for example, it depicts the same scene. The second depth image is obtained by the processing module 72 adding depth defects to the first depth image. In practice, there are various ways to introduce depth defects.
Specifically, the method for adding the depth defect by the processing module 72 may include, but is not limited to, removing the depth information and blocking the depth information, thereby achieving the effect of adding the depth defect. Taking the manner of removing the depth information as an example, the depth information corresponding to a certain local area in the first depth image may be randomly removed, so that the removed part does not have the depth information, thereby obtaining the second depth image added with the depth missing. Taking the way of blocking depth information as an example, a local area in the first depth image may be blocked by an object depth image with less depth information, so as to obtain a second depth image with the depth missing added. Alternatively, the object image used to block the depth information may be a depth image of a transparent object.
In practical applications, the intelligent model needs to train the model based on the training samples by the training module 73 to achieve the expected effect, so as to obtain the trained model. For example, after the second depth image is obtained, the training image, the first depth image, and the second depth image may be used as training samples of the depth completion model for training.
In combination with the scene example, before the depth completion model is trained its parameters are default values, and suitable parameter values need to be determined by training on the training samples. The training image and the defective second depth image are used as the input to obtain the third depth image output by the current depth completion model. Whether the model meets the accuracy requirement is judged from the difference between the first depth image and the third depth image; the smaller this difference, the better the accuracy of the model. Based on the current difference between the first and third depth images, the parameter values of the depth completion model are adjusted, the training image and the second depth image are fed into the adjusted model again, and this is repeated until the difference between the third depth image output by the current model and the first depth image is small enough, indicating that the depth completion model is sufficiently accurate, at which point it can be used as the trained depth completion model. The loss function is a mean square loss function; in principle any loss function could be used, but a mean square loss computed from the mean squared error is the most common choice, and its purpose is to drive the difference between the first depth image and the third depth image as small as possible, providing a reference index for the optimal parameter solution.
In combination with the scene example, the training image and the second depth image are input into the depth completion model, which processes them to obtain a depth-completed depth map. During this processing, features of the input training image and of the second depth image are extracted, and the extracted image features are fused.
Because the feature information of the training image and the feature information of the second depth image need to be extracted separately, two branches, namely a first branch and a second branch, are provided in the depth completion model: the training image undergoes feature extraction in the first branch, and the second depth image undergoes feature extraction in the second branch.
The first branch is provided with a plurality of first sub-modules 51 connected in sequence. The first sub-modules 51 are the feature extraction modules of the first branch and extract features from the training image; the feature information of the image mainly includes color information, texture information, edge information and spatial information.
The second branch is provided with a down-sampling module 52 and an up-sampling module 53, and the number of the down-sampling module 52 and the up-sampling module 53 is the same. Because the second depth image is a depth map, the features of the second depth image include depth information and spatial information.
Optionally, the downsampling module 52 includes two parts, which are a downsampling operation module and a second sub-module 61, where the downsampling operation module is responsible for downsampling the second depth image, and the second sub-module 61 is responsible for feature extraction of the second depth image. Similarly, the upsampling module 53 is also composed of two parts, namely an upsampling operation module and a second sub-module 61, where the upsampling operation module is responsible for upsampling the second depth image, and the second sub-module 61 is responsible for extracting features of the second depth image.
Optionally, the first sub-module 51 and the second sub-module 61 are both residual channel attention blocks (RCABs). Since both perform feature extraction on the image, both may be chosen as the residual channel attention block model. Each RCAB contains a plurality of channels, and each channel can extract different feature information of the image; the attention mechanism adaptively recalibrates the feature responses of the information channels, which improves the representational capability of the network, and the image features are fully extracted through multi-level feature fusion.
When extracting the depth information and spatial information of the second depth image, the second branch passes through each down-sampling module 52 in turn. Within each down-sampling module 52, down-sampling is first performed by the down-sampling operation module, and the depth information and spatial information are then extracted by the RCAB in that module. After the down-sampling modules 52 have completed feature extraction of the second depth image, the data passes through each up-sampling module 53 in turn. Within each up-sampling module 53, the depth information and spatial information are first extracted by the RCAB, and the result is then up-sampled by the up-sampling operation module.
After the first branch and the second branch have respectively extracted the features of the training image and of the second depth image, the extracted feature information is input into the fusion layer, which fuses the feature information from the two branches and outputs the depth-completed depth image.
Optionally, in combination with the scene example, the feature extraction modules in both the first branch and the second branch are RCABs; an RCAB contains multiple channels, and different features of the image are extracted in different channels. The numbers of down-sampling modules 52 and up-sampling modules 53 are the same, and before feature extraction is performed on the second depth image, the down-sampling operation module of the down-sampling module 52 first reduces the image size of the second depth image, which suits the multi-channel feature extraction of the RCAB and increases the number of channels available to the RCAB for feature extraction. After the size reduction, the reduced second depth image is copied into multiple copies, which are fed into different channels of the RCAB for feature extraction.
Fig. 8 illustrates two down-sampling modules 52 in the second branch. Viewed from left to right, after feature extraction by the first down-sampling module 52, a size-reduced second depth image is output and fed into the second down-sampling module 52. The down-sampling operation module in the second down-sampling module 52 then reduces the size of this already-reduced image again, based on the output of the first down-sampling module 52, after which the RCAB in the second down-sampling module 52 performs feature extraction. In this way the image passes through all down-sampling modules 52.
After passing through all down-sampling modules 52 in turn, the image passes through each up-sampling module 53 in turn. Within each up-sampling module 53, features of the second depth image are first extracted by the RCAB, and the image then passes through the up-sampling operation module, which enlarges the image. The enlargement factor of the up-sampling operation module matches the reduction factor of the down-sampling operation module; for example, if the down-sampling operation module halves the image size, the up-sampling operation module doubles it. Since the numbers of up-sampling modules 53 and down-sampling modules 52 in the second branch are the same, the second depth image finally output by the second branch has the same size as the original input image.
Since the numbers of up-sampling modules 53 and down-sampling modules 52 in the second branch are the same and the modules are connected in sequence, the up-sampling modules 53 and the down-sampling modules 52 correspond one-to-one, mirrored about the central axis of the second branch. For example, in Fig. 8, counting from left to right, the second down-sampling module 52 corresponds to the first up-sampling module 53, and the first down-sampling module 52 corresponds to the second up-sampling module 53; it follows that a down-sampling module 52 and its corresponding up-sampling module 53 perform RCAB feature extraction on images of the same size.
A cross-layer connection operation is performed between corresponding up-sampling modules 53 and down-sampling modules 52, so as to pass the shallow image features extracted by the down-sampling module 52 to the corresponding up-sampling module 53, where they are superimposed on the deep image features extracted by the up-sampling module 53 to obtain more comprehensive image feature information.
The input of each up-sampling module 53 provided with a cross-layer connection consists of two parts: the shallow features output by the down-sampling module 52 at the other end of the cross-layer connection, and the deep features output by the previous module. Each such up-sampling module 53 therefore superimposes the received deep and shallow features, extracts depth information and spatial information from the superimposed result, up-samples the extracted features, and passes the result to the next up-sampling module 53, until all up-sampling modules 53 in the second branch have been traversed. By extracting features of the second depth image through the down-sampling modules 52 and up-sampling modules 53 and fusing the different extracted features, the extracted image features become richer and more comprehensive, providing sufficient image features for depth completion of the second depth image.
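As a non-limiting sketch matching the two-down/two-up configuration of Fig. 8, the second branch below wires the DownsampleModule, UpsampleModule and RCAB sketches from the preceding paragraphs together, forwards shallow features over cross-layer connections, and superimposes them on the deep path by channel concatenation. The single-channel depth input, the channel widths, and concatenation as the superposition operator are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SecondBranch(nn.Module):
    """Symmetric depth branch: two down-sampling and two up-sampling modules,
    with cross-layer connections between corresponding modules (cf. Fig. 8)."""
    def __init__(self, base: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(1, base, kernel_size=3, padding=1)   # single-channel depth in
        self.down1 = DownsampleModule(base, base * 2)              # -> 1/2 resolution
        self.down2 = DownsampleModule(base * 2, base * 4)          # -> 1/4 resolution
        self.bottleneck = nn.Identity()                            # an MLP may sit here (see below)
        # Each up-sampling module receives deep features concatenated with the
        # shallow features forwarded over the cross-layer connection.
        self.up1 = UpsampleModule(base * 4 + base * 4, base * 2)   # paired with down2
        self.up2 = UpsampleModule(base * 2 + base * 2, base)       # paired with down1

    def forward(self, depth):
        x0 = self.stem(depth)
        s1 = self.down1(x0)                                        # shallow features, 1/2 size
        s2 = self.down2(s1)                                        # shallow features, 1/4 size
        d1 = self.up1(torch.cat([self.bottleneck(s2), s2], dim=1)) # 1/4 -> 1/2
        d2 = self.up2(torch.cat([d1, s1], dim=1))                  # 1/2 -> full size
        return d2                                                  # same size as the input
```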
Optionally, one or more MLPs may be disposed between the second down-sampling module 52 and the first up-sampling module 53 (counting from left to right). The MLP, also called a fully connected neural network or fully connected layer, is mainly used to adjust the dimensions of the extracted image features so as to re-establish the depth information of the image. When an MLP is present between the second down-sampling module 52 and the first up-sampling module 53, the image feature information passed directly from the second down-sampling module 52 to the first up-sampling module 53 is processed by the MLP, and therefore differs from the feature information that reaches the first up-sampling module 53 over the cross-layer connection. The first up-sampling module 53 thus receives two different pieces of image feature information and fuses them, so that the image feature information it extracts is fuller and richer.
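The embodiment only states that one or more MLPs (fully connected layers) adjust the dimensions of the innermost features. One possible sketch, assuming the MLP is applied to the channel vector at every spatial position and keeps the channel count unchanged, is given below; it could replace the nn.Identity() placeholder in the SecondBranch sketch above.

```python
import torch.nn as nn

class BottleneckMLP(nn.Module):
    """Per-position MLP applied to the innermost feature map, re-projecting the
    channel dimension before the first up-sampling module."""
    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),     # back to the original channel count
        )

    def forward(self, x):
        # x: (N, C, H, W) -> treat every spatial position as one MLP input vector.
        n, c, h, w = x.shape
        y = x.permute(0, 2, 3, 1).reshape(n * h * w, c)
        y = self.mlp(y)
        return y.reshape(n, h, w, c).permute(0, 3, 1, 2)
```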
In this embodiment, a depth completion model is first established; the obtaining module 71 acquires a training image and a first depth image corresponding to the training image; the processing module 72 adds a depth defect to the first depth image to generate a second depth image; and the training module 73 performs depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth-completed depth image based on an input depth image with missing depth. By building a depth completion model and training it through deep learning, the trained depth completion model completes depth maps that have missing depth, providing a depth completion solution that turns a depth map with missing depth into a depth map with complete depth.
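For illustration, a single training step consistent with this embodiment (and with the mean-square loss of claim 4) might look as follows; the optimizer, learning rate, and the DepthCompletionModel name and call signature are assumptions, not part of this disclosure.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, train_image, first_depth, second_depth):
    """One depth-completion training step: predict a completed depth map (third depth
    image) from the training image and the defect-injected second depth image, then
    regress it toward the complete first depth image with a mean-square loss."""
    model.train()
    optimizer.zero_grad()
    third_depth = model(train_image, second_depth)        # depth image output by the model
    loss = nn.functional.mse_loss(third_depth, first_depth)
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage (names, shapes and hyperparameters are assumptions):
# model = DepthCompletionModel()                          # hypothetical two-branch model
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss = train_step(model, optimizer, rgb_batch, gt_depth_batch, corrupted_depth_batch)
```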
Example four
Fig. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. As shown in Fig. 11, the electronic device includes:
a processor 291 and a memory 292; a communication interface 293 and a bus 294 may also be included. The processor 291, the memory 292, and the communication interface 293 may communicate with each other via the bus 294. The communication interface 293 may be used for information transmission. The processor 291 may invoke logic instructions in the memory 292 to perform the methods of the embodiments described above.
Further, the logic instructions in the memory 292 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium.
The memory 292, as a computer-readable storage medium, stores software programs and computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. By executing the software programs, instructions and modules stored in the memory 292, the processor 291 performs functional applications and data processing, thereby implementing the methods in the above method embodiments.
The memory 292 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 292 may include high-speed random access memory and may also include non-volatile memory.
The disclosed embodiments provide a non-transitory computer-readable storage medium having stored therein computer-executable instructions for implementing the method of the foregoing embodiments when executed by a processor.
Example five
The embodiments of the present disclosure provide a computer program product, which includes a computer program; when the computer program is executed by a processor, the training method of the depth completion model or the method of generating a depth completion image provided in any embodiment of the present disclosure is implemented.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A training method of a depth completion model is characterized by comprising the following steps:
acquiring a training image and a first depth image corresponding to the training image, wherein the training image is a two-dimensional image;
adding a depth defect into the first depth image to generate a second depth image corresponding to the training image;
and performing depth completion training on the depth completion model according to the training image, the first depth image and the second depth image, so that the depth completion model can output a depth image subjected to depth completion based on the input depth missing image.
2. The method of claim 1, wherein the adding a depth defect to the first depth image to generate a second depth image corresponding to the training image comprises:
randomly removing the depth information of the local area of the first depth image, and generating a second depth image corresponding to the training image; or,
and shielding a local area of the first depth image by using the object depth image with the depth missing to generate a second depth image corresponding to the training image.
3. The method of claim 1, wherein the depth completion training of the depth completion model according to the training image, the first depth image, and the second depth image comprises:
inputting the training image and the second depth image into a current depth completion model to obtain a third depth image corresponding to the training image output by the depth completion model;
determining a loss value of a current depth completion model based on a loss function according to the third depth image and the first depth image;
and performing parameter adjustment on the current depth completion model according to the loss value until the current depth completion model meets a preset convergence condition, thereby obtaining the trained depth completion model.
4. The method of claim 3, wherein the loss function is a mean square loss function.
5. The method according to any one of claims 1-4, wherein the depth completion model comprises: the device comprises a first branch, a second branch and a fusion layer;
the first branch is used for inputting the training image, the first branch comprises a plurality of first sub-modules which are connected in sequence, and the first sub-modules are used for performing feature extraction on color information, texture information, edge information and spatial information;
the second branch is used for inputting the second depth image, the second branch comprises a down-sampling module and an up-sampling module, the down-sampling module is used for executing down-sampling, and feature extraction of depth information and spatial information is carried out on a result obtained by the down-sampling; the up-sampling module is used for extracting the features of the depth information and the spatial information and performing up-sampling on the result of the feature extraction; wherein the number of the down-sampling modules is the same as the number of the up-sampling modules;
and the fusion layer is used for carrying out feature fusion on the features output by the first branch and the features output by the second branch to obtain the output of the deep completion model.
6. The method of claim 5, wherein the downsampling modules correspond one-to-one to the upsampling modules, wherein an output size of a downsampling module is the same as an input size of a corresponding upsampling module, at least one of the downsampling modules providing a cross-layer connection to a corresponding upsampling module;
and the down-sampling module is provided with a cross-layer connection and is used for transmitting the shallow feature output by the down-sampling module to the corresponding up-sampling module through the cross-layer connection.
7. The method of claim 6,
the up-sampling module with the cross-layer connection is specifically configured to superimpose the shallow features transmitted over the cross-layer connection with the deep features output by its previous module, perform feature extraction of depth information and spatial information on the superimposed result, and up-sample the feature extraction result.
8. The method of claim 5,
each up-sampling module and each down-sampling module comprise a second sub-module, and the second sub-module is used for extracting the characteristics of depth information and spatial information.
9. The method of claim 8, wherein the first sub-module and the second sub-module are both residual channel attention block models.
10. A method of generating a depth completion image, comprising:
acquiring a depth image to be processed, wherein the depth image to be processed comprises a depth missing area;
inputting the depth image to be processed into a depth completion model to obtain a depth completion image subjected to depth completion; wherein the depth completion model is generated by training with the training method of the depth completion model according to any one of claims 1 to 9.
11. A training device for a depth completion model, comprising:
the device comprises an acquisition module, a processing module and a training module, wherein the acquisition module is used for acquiring a training image and a first depth image corresponding to the training image, and the training image is a two-dimensional image;
the processing module is used for adding depth defects into the first depth image to generate a second depth image corresponding to the training image;
and the training module is used for carrying out depth completion training on the depth completion model according to the training image, the first depth image and the second depth image so as to enable the depth completion model to output a depth image subjected to depth completion based on an input depth missing image.
12. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored by the memory to implement the training method of the depth completion model according to any one of claims 1 to 9 or the method of generating a depth completion image according to claim 10.
13. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the training method of the depth completion model according to any one of claims 1 to 9 or the method of generating a depth completion image according to claim 10.
14. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the training method of the depth completion model according to any one of claims 1 to 9 or the method of generating a depth completion image according to claim 10.
CN202210908195.4A 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model Active CN115272709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210908195.4A CN115272709B (en) 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model


Publications (2)

Publication Number Publication Date
CN115272709A true CN115272709A (en) 2022-11-01
CN115272709B CN115272709B (en) 2023-08-15

Family

ID=83772496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210908195.4A Active CN115272709B (en) 2022-07-29 2022-07-29 Training method, device, equipment and medium of depth completion model

Country Status (1)

Country Link
CN (1) CN115272709B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220148203A1 (en) * 2020-11-11 2022-05-12 Toyota Research Institute, Inc. Training of joint depth prediction and completion
CN112541482A (en) * 2020-12-25 2021-03-23 北京百度网讯科技有限公司 Deep information completion model training method, device, equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN113408662A (en) * 2021-07-19 2021-09-17 北京百度网讯科技有限公司 Image recognition method and device, and training method and device of image recognition model
CN113763447A (en) * 2021-08-24 2021-12-07 北京的卢深视科技有限公司 Method for completing depth map, electronic device and storage medium
CN113538249A (en) * 2021-09-03 2021-10-22 中国矿业大学 Image super-resolution reconstruction method and device for video monitoring high-definition presentation
CN114004754A (en) * 2021-09-13 2022-02-01 北京航空航天大学 Scene depth completion system and method based on deep learning
CN114445475A (en) * 2022-01-21 2022-05-06 中山大学·深圳 Depth completion method for sparse depth map, computer device, and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Fangzhou Luo et al.: "Functional Neural Networks for Parametric Image Restoration Problems", pages 130-131 *
Joonhyung Lee et al.: "Deep Learning Fast MRI Using Channel Attention in Magnitude Domain", page 2 *
Wu Chunmei; Hu Junhao; Yin Jianghua: "Human Pose Recognition Using an Improved Generative Adversarial Network", Computer Engineering and Applications, no. 08 *

Also Published As

Publication number Publication date
CN115272709B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN108495110B (en) Virtual viewpoint image generation method based on generation type countermeasure network
CN110197229B (en) Training method and device of image processing model and storage medium
CN109344840B (en) Image processing method and apparatus, electronic device, storage medium, and program product
CN109584170B (en) Underwater image restoration method based on convolutional neural network
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN110473185A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111507909A (en) Method and device for clearing fog image and storage medium
CN108604369A (en) A kind of method, apparatus, equipment and the convolutional neural networks of removal picture noise
CN111275638B (en) Face repairing method for generating confrontation network based on multichannel attention selection
KR20200140713A (en) Method and apparatus for training neural network model for enhancing image detail
CN111951195A (en) Image enhancement method and device
CN113239875B (en) Method, system and device for acquiring face characteristics and computer readable storage medium
CN111179196B (en) Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN110276831A (en) Constructing method and device, equipment, the computer readable storage medium of threedimensional model
CN113139902A (en) Hyperspectral image super-resolution reconstruction method and device and electronic equipment
CN112507920A (en) Examination abnormal behavior identification method based on time displacement and attention mechanism
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN109978928B (en) Binocular vision stereo matching method and system based on weighted voting
CN113658091A (en) Image evaluation method, storage medium and terminal equipment
CN113781326A (en) Demosaicing method and device, electronic equipment and storage medium
CN117670687A (en) Underwater image enhancement method based on CNN and transducer mixed structure
WO2023246392A1 (en) Image acquisition method, apparatus and device, and non-transient computer storage medium
CN115272709A (en) Training method, device, equipment, medium and product of deep completion model
CN116664407A (en) Face fusion super-resolution method and system based on triplet unpaired learning
CN114973424A (en) Feature extraction model training method, hand action recognition method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant