CN110334769A - Target identification method and device - Google Patents

Target identification method and device

Info

Publication number
CN110334769A
CN110334769A
Authority
CN
China
Prior art keywords
pixel
image
layer
depth image
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910614107.8A
Other languages
Chinese (zh)
Inventor
郭建亚
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201910614107.8A priority Critical patent/CN110334769A/en
Publication of CN110334769A publication Critical patent/CN110334769A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a target identification method and device. An RGB image and a depth image of a target area are acquired; hole filling is performed on the depth image to obtain a repaired depth image; the repaired depth image is encoded to obtain a three-channel depth image; and the RGB image and the three-channel depth image are input into a pre-trained identification model to obtain the target identification result for the RGB image. By performing target identification with a pre-trained identification model that combines the RGB image and the depth image, the present application improves the accuracy of target identification.

Description

Target identification method and device
Technical field
This application relates to the technical field of image processing, and more specifically to a target identification method and device.
Background technique
Current target identification is based on RGB images alone: targets are identified by extracting color, texture, and contour features from the RGB image. However, because imaging is affected by environmental factors such as illumination, the features extracted in existing RGB-based target identification cannot fully capture the useful feature information of the target, so identification accuracy is low.
Summary of the invention
The purpose of the present application is to provide a target identification method and device to improve the accuracy of target identification, including the following technical solutions:
A target identification method, comprising:
acquiring an RGB image and a depth image of a target area;
performing hole filling on the depth image to obtain a repaired depth image;
encoding the repaired depth image to obtain a three-channel depth image; and
inputting the RGB image and the three-channel depth image into a pre-trained identification model to obtain a target identification result for the RGB image, where the identification model is trained in advance using several annotated RGB images and the depth image corresponding to each annotated RGB image as samples.
In the above method, preferably, performing hole filling on the depth image to obtain a repaired depth image includes:
performing binarization on the depth image to obtain a mask;
determining the hole points in the depth image according to the mask;
clustering the pixel values of the grayscale RGB image to obtain a cluster image, where the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar;
determining, in the grayscale RGB image, a first pixel corresponding to a hole point and all second pixels similar to the first pixel, where each second pixel corresponds to a non-hole point in the depth image;
calculating the distance between the first pixel and each second pixel; and
using the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In the above method, preferably, performing hole filling on the depth image to obtain a repaired depth image includes:
performing binarization on the depth image to obtain a mask;
determining the hole points in the depth image according to the mask;
determining, in the RGB image, a first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, where each second pixel corresponds to a non-hole point within the preset neighborhood;
calculating the distance between the first pixel and each second pixel; and
using the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In the above method, preferably, the identification model includes:
a depth network unit and a convolutional neural network unit; where,
the depth network unit is configured to process the three-channel depth image to extract features of the three-channel depth image; and
the convolutional neural network unit is configured to process the RGB image, extract features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result for the RGB image.
In the above method, preferably, the depth network unit includes three multilayer perceptron convolutional layers;
the convolutional neural network unit includes: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
A target identification device, comprising:
an acquisition module, configured to acquire an RGB image and a depth image of a target area;
a filling module, configured to perform hole filling on the depth image to obtain a repaired depth image;
an encoding module, configured to encode the repaired depth image to obtain a three-channel depth image; and
an identification module, configured to input the RGB image and the three-channel depth image into a pre-trained identification model to obtain a target identification result for the RGB image, where the identification model is trained in advance using several annotated RGB images and the depth image corresponding to each annotated RGB image as samples.
In the above device, preferably, the filling module includes:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a clustering unit, configured to cluster the pixel values of the grayscale RGB image to obtain a cluster image, where the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar;
a second determination unit, configured to determine, in the grayscale RGB image, a first pixel corresponding to a hole point and all second pixels similar to the first pixel, where each second pixel corresponds to a non-hole point in the depth image;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In the above device, preferably, the filling module includes:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a third determination unit, configured to determine, in the RGB image, a first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, where each second pixel corresponds to a non-hole point within the preset neighborhood;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In the above device, preferably, the identification model includes a depth network unit and a convolutional neural network unit; where,
the depth network unit is configured to process the three-channel depth image to extract features of the three-channel depth image; and
the convolutional neural network unit is configured to process the RGB image, extract features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result for the RGB image.
In the above device, preferably, the depth network unit includes three multilayer perceptron convolutional layers;
the convolutional neural network unit includes: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
It can be seen from the above solutions that the target identification method and device provided by the present application acquire an RGB image and a depth image of a target area; perform hole filling on the depth image to obtain a repaired depth image; encode the repaired depth image to obtain a three-channel depth image; and input the RGB image and the three-channel depth image into a pre-trained identification model to obtain the target identification result for the RGB image. By performing target identification with a pre-trained identification model that combines the RGB image and the depth image, the present application improves the accuracy of target identification and solves the problem that existing target identification methods are affected by environmental factors such as illumination and therefore have low recognition accuracy.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is an implementation flowchart of a target identification method provided by an embodiment of the present application;
Fig. 2 is an implementation flowchart, provided by an embodiment of the present application, of performing hole filling on a depth image to obtain a repaired depth image;
Fig. 3 is a structural schematic diagram of an identification model provided by an embodiment of the present application;
Fig. 4 is an example diagram of an Inception module provided by an embodiment of the present application;
Fig. 5 is a structural schematic diagram of a target identification device provided by an embodiment of the present application;
Fig. 6 is a frame image awaiting target identification, provided by an embodiment of the present application;
Fig. 7 is the target identification result obtained by processing the image shown in Fig. 6 and the corresponding depth image with the target identification method provided by the embodiments of the present application.
The terms "first", "second", "third", "fourth", etc. (if present) in the specification, the claims, and the above drawings are used to distinguish similar parts and are not intended to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can also be implemented in orders other than those illustrated herein.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is an implementation flowchart of a target identification method provided by an embodiment of the present application, which may include:
Step S101: acquire an RGB image and a depth image of a target area.
The RGB image and depth image of the target area can be acquired with an RGB-D depth camera. When images are acquired with an RGB-D depth camera, a depth frame is captured simultaneously with each RGB frame. During display, only the RGB image is shown.
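For illustration, a minimal sketch of acquiring one pair of aligned RGB and depth frames is given below. The patent does not name a specific camera, so the Intel RealSense device and the pyrealsense2 API used here are assumptions, not part of the original disclosure.

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)  # align the depth frame to the color frame
try:
    frames = align.process(pipeline.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16, raw depth units
    color = np.asanyarray(frames.get_color_frame().get_data())  # uint8, BGR
finally:
    pipeline.stop()
```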
Step S102: perform hole filling on the depth image to obtain a repaired depth image.
Depth images acquired with a depth camera usually contain holes, which need to be repaired. In an optional embodiment, the depth values of the pixels around a hole can be used to fill the hole in the depth map.
Step S103: encode the repaired depth image to obtain a three-channel depth image.
Optionally, the repaired depth image can be encoded with the HHA encoding method; the three channels of the resulting three-channel depth image can be horizontal disparity, height above ground, and the angle of the surface normal. The HHA encoding method emphasizes the complementary information between the channels.
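A minimal sketch of such an encoding follows, assuming depth values in meters, a known vertical focal length fy and principal point cy, and gravity aligned with the camera's y-axis. The reference HHA encoding (Gupta et al.) additionally estimates the gravity direction and true surface normals, so this simplified version is for illustration only.

```python
import numpy as np

def hha_encode_simplified(depth_m, fy=570.0, cy=240.0):
    """Simplified HHA-style encoding of a repaired depth image (H x W, meters)."""
    h, w = depth_m.shape
    eps = 1e-6

    # Channel 1: horizontal disparity (inverse depth).
    disparity = 1.0 / np.maximum(depth_m, eps)

    # Channel 2: approximate height above the lowest observed point,
    # from the back-projected camera y-coordinate (down is positive).
    rows = np.arange(h, dtype=np.float32)[:, None]
    y_cam = (rows - cy) / fy * depth_m
    height = y_cam.max() - y_cam

    # Channel 3: angle between an approximate surface normal (from depth
    # gradients) and the assumed gravity direction.
    dzdy, dzdx = np.gradient(depth_m)
    normal = np.dstack([-dzdx, -dzdy, np.ones_like(depth_m)])
    normal /= np.linalg.norm(normal, axis=2, keepdims=True) + eps
    gravity = np.array([0.0, -1.0, 0.0], dtype=np.float32)
    angle = np.degrees(np.arccos(np.clip(normal @ gravity, -1.0, 1.0)))

    def to_u8(c):
        # Scale each channel to [0, 255] so the result behaves like an image.
        c = (c - c.min()) / (c.max() - c.min() + eps)
        return (c * 255).astype(np.uint8)

    return np.dstack([to_u8(disparity), to_u8(height), to_u8(angle)])
```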
Step S104: input the RGB image and the three-channel depth image into a pre-trained identification model to obtain the target identification result for the RGB image; the identification model is trained in advance using several RGB images annotated with targets and the depth image corresponding to each annotated RGB image as samples.
In the embodiments of the present application, several pairs of RGB and depth images acquired with an RGB-D depth camera are used in advance as training samples, and the annotation information corresponding to each RGB image is used as the label for training the identification model. The annotation information corresponding to an RGB image may include a text label corresponding to a specified region in the RGB image. The text label need not be drawn into the RGB image itself; instead, it can be stored in association with the RGB image and the specified-region information of that RGB image (the specified-region information identifies a region of the RGB image and may, for example, be a bounding rectangle), where the specified-region information indicates the position of the target in the RGB image.
The target identification method provided by the present application performs target identification with a pre-trained identification model that combines the depth image and the RGB image, which improves target identification accuracy and solves the problem that existing target identification methods are affected by environmental factors such as illumination and therefore have low recognition accuracy.
In an optional embodiment, an implementation flowchart of performing hole filling on the depth image to obtain a repaired depth image is shown in Fig. 2 and may include:
Step S201: perform binarization on the depth image to obtain a mask.
Optionally, points of the depth image whose depth value is zero can be binarized to 0, and points whose depth value is non-zero can be binarized to 255, which can be expressed by the formula:
Mask(i, j) = 0 if A(i, j) = 0; Mask(i, j) = 255 if A(i, j) ≠ 0,
where Mask denotes the mask and A(i, j) denotes the depth value at (i, j).
Step S202: determine the hole points in the depth image according to the mask.
A hole point is a point whose value in the mask is 0.
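Steps S201 and S202 reduce to a few lines of NumPy; the function and variable names below are illustrative:

```python
import numpy as np

def depth_mask_and_holes(depth):
    """Binarize a depth image into a mask and list its hole points.

    Pixels with depth 0 become 0 in the mask (hole points); all other
    pixels become 255 (non-hole points).
    """
    mask = np.where(depth == 0, 0, 255).astype(np.uint8)
    hole_rows, hole_cols = np.where(mask == 0)  # coordinates of hole points
    return mask, list(zip(hole_rows, hole_cols))
```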
Step S203: cluster the pixel values of the grayscale RGB image to obtain a cluster image; the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar.
The grayscale RGB image refers to the grayscale image converted from the RGB image. Optionally, the K-means algorithm can be used to cluster the pixel values of the grayscale RGB image; alternatively, another clustering algorithm, such as hierarchical clustering, can be used. The cluster image characterizes which pixels of the grayscale RGB image have similar pixel values.
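As an illustration, the sketch below clusters the gray values with scikit-learn's K-means; the number of clusters and the choice of library are assumptions, not values given in the patent:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_gray_values(gray, n_clusters=8, seed=0):
    """Cluster the pixel values of a grayscale image (step S203).

    Returns an H x W label map (the "cluster image"): pixels with the same
    label have similar gray values.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(gray.reshape(-1, 1).astype(np.float32))
    return labels.reshape(gray.shape)
```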
Step S204: determine, in the RGB image, the first pixel corresponding to a hole point and all second pixels similar to the first pixel, where each second pixel corresponds to a non-hole point in the depth image.
The pixels in the RGB image correspond one-to-one to the pixels in the depth map. The pixels belonging to the same cluster include both first pixels, which correspond to hole points, and second pixels, which correspond to non-hole points.
Step S205: calculate the distance between the first pixel and each second pixel.
In the embodiments of the present application, for the first pixel corresponding to each hole point, the distance between the first pixel and each second pixel in the same cluster is calculated from the pixel values (i.e., gray values). This distance can be the Euclidean distance, or another distance such as the cosine similarity distance.
In an optional embodiment, the distance between the first pixel and a second pixel can be a combined distance computed from the Euclidean distance between the pixel values of the first and second pixels and the image-pixel distance between them. The image-pixel distance refers to the distance between two pixels measured in pixels. For example, suppose pixel a is at row 10, column 30 of the image and pixel b is at row 13, column 34; then the distance between the two pixels along the row direction is 3 and along the column direction is 4, so the image-pixel distance between pixel a and pixel b is 5. After obtaining the Euclidean distance and the image-pixel distance of pixel a and pixel b, the sum of the two (i.e., the Euclidean distance and the image-pixel distance) can be taken as the combined distance of pixel a and pixel b; alternatively, a weighted sum of the two can be taken as the combined distance, or the square roots of the two can be taken and then summed to obtain the combined distance of pixel a and pixel b.
Step S206: use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point. That is, the hole point is filled with the depth value of the second pixel whose distance to the first pixel is shortest.
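Putting steps S203 through S206 together, a sketch of the cluster-based filling might look as follows; the weighted-sum form of the combined distance and the weights w_val and w_pos are illustrative assumptions, and the labels argument is the cluster image from the sketch after step S203:

```python
import numpy as np

def fill_holes_by_cluster(depth, gray, labels, w_val=1.0, w_pos=1.0):
    """Cluster-based hole filling (steps S203-S206).

    For each hole point, candidates are the non-hole pixels in the same
    cluster; the hole is filled with the depth of the candidate that
    minimizes a combined gray-value and image-pixel distance.
    """
    filled = depth.copy()
    gray = gray.astype(np.float32)

    for r, c in np.argwhere(depth == 0):
        same_cluster = (labels == labels[r, c]) & (depth != 0)
        cand = np.argwhere(same_cluster)
        if cand.size == 0:
            continue  # no candidate in this cluster; leave the hole as-is
        # Gray-value distance plus image-pixel (coordinate) distance.
        d_val = np.abs(gray[cand[:, 0], cand[:, 1]] - gray[r, c])
        d_pos = np.hypot(cand[:, 0] - r, cand[:, 1] - c)
        best = cand[np.argmin(w_val * d_val + w_pos * d_pos)]
        filled[r, c] = depth[best[0], best[1]]
    return filled
```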
In an optional embodiment, another implementation of performing hole filling on the depth image to obtain a repaired depth image can be as follows:
Perform binarization on the depth image to obtain a mask.
Determine the hole points in the depth image according to the mask.
The implementation of the above two steps can be found in the previous embodiment and is not repeated here.
Determine, in the RGB image, the first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, where each second pixel corresponds to a non-hole point of the depth image within the preset neighborhood.
In this embodiment, the second pixels are pixels in the neighborhood of the first pixel.
Calculate the distance between the first pixel and each second pixel. The calculation process can be found in the previous embodiment and is not described in detail here.
Use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
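A sketch of this neighborhood-based variant is given below; the window radius and the use of the plain gray-value distance are illustrative assumptions:

```python
import numpy as np

def fill_holes_by_neighborhood(depth, gray, radius=5):
    """Neighborhood-based hole filling (second embodiment).

    Candidates are the non-hole pixels in a (2*radius+1)^2 window around
    each hole; the hole is filled with the depth of the candidate whose
    gray value is closest to the hole pixel's.
    """
    h, w = depth.shape
    filled = depth.copy()
    gray = gray.astype(np.float32)

    for r, c in np.argwhere(depth == 0):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        win_depth = depth[r0:r1, c0:c1]
        win_gray = gray[r0:r1, c0:c1]
        valid = win_depth != 0
        if not valid.any():
            continue  # the whole neighborhood is holes; leave as-is
        d = np.abs(win_gray - gray[r, c])
        d[~valid] = np.inf  # exclude other hole points
        rr, cc = np.unravel_index(np.argmin(d), d.shape)
        filled[r, c] = win_depth[rr, cc]
    return filled
```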
In an optional embodiment, a structural schematic diagram of the identification model is shown in Fig. 3 and may include a depth network unit and a convolutional neural network unit; where,
the depth network unit (referred to as the NIN network unit) is configured to process the three-channel depth image to extract the features of the three-channel depth image. In the example shown in Fig. 3, the input three-channel depth image is HHA_Img, with size 300*300.
The convolutional neural network unit (referred to as the CNN network unit) is configured to process the RGB image, extract the features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result in the RGB image. In the example shown in Fig. 3, the input RGB image is RGB_Img, with size 300*300.
Optionally, the NIN network unit includes three multilayer perceptron convolutional layers (i.e., three mlpconv network layers, identified as NIN1, NIN2, and NIN3 in Fig. 3). An mlpconv layer first performs an ordinary convolution and then applies a traditional MLP (multilayer perceptron). The multilayer perceptron here is a two-layer perceptron (an input layer plus one hidden layer); in effect, it applies a weighted linear recombination to the elements at the same position across the feature maps output by the ordinary convolutional layer. For a single local patch this is equivalent to a 1x1 convolution, and performing the operation on every element of the feature maps is exactly a 1x1 convolution. Because convolution is linear while an MLP is nonlinear, the latter achieves a higher level of abstraction and therefore stronger generalization. In the cross-channel case, mlpconv is equivalent to a convolutional layer followed by 1*1 convolutional layers.
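A sketch of one mlpconv layer, and of a three-layer NIN unit built from it, is shown below in PyTorch; the channel widths and kernel sizes follow the usual NIN design rather than figures from the patent:

```python
import torch.nn as nn

def mlpconv(in_ch, out_ch, kernel_size, stride=1, padding=0):
    """One mlpconv layer: an ordinary convolution followed by two 1x1
    convolutions acting as a per-position MLP across channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 1),  # 1x1 conv = cross-channel MLP
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 1),
        nn.ReLU(inplace=True),
    )

# A depth network unit with three mlpconv layers (NIN1-NIN3), for example:
nin_unit = nn.Sequential(
    mlpconv(3, 96, kernel_size=11, stride=4, padding=5),
    mlpconv(96, 256, kernel_size=5, padding=2),
    mlpconv(256, 384, kernel_size=3, padding=1),
)
```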
The CNN network unit includes: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
Taking Fig. 3 as an example: the sequentially connected 7*7 convolutional layer Conv_7*7, max pooling layer maxpool, 3*3 convolutional layer Conv_3*3, and max pooling layer maxpool constitute the two convolution-pooling layers; the sequentially connected Inception (3a) and Inception (3b) constitute the two first Inception modules; the max pooling layer maxpool connected to Inception (3b) constitutes the first pooling layer; the sequentially connected Inception (4a) through Inception (4e) constitute the five second Inception modules; the max pooling layer maxpool connected to Inception (4e) constitutes the second pooling layer; the sequentially connected Inception (5a) and Inception (5b) constitute the two third Inception modules; the average pooling layer avgpool connected to Inception (5b) constitutes the third pooling layer; Dropout is the dropout layer; Linear is the linear layer; Softmax is the classification layer; Detections is the decision layer; and Non-Maximum Suppression is the output layer.
The Inception module convolves the features output by the previous layer at multiple scales in parallel and then aggregates the results. Specifically, an example diagram of an Inception module is shown in Fig. 4. In this example, the Inception module uses convolution kernels of sizes 1*1, 3*3, and 5*5. Different kernel sizes mean different receptive fields, and the final concatenation means an aggregation of features at different scales. These kernel sizes are used mainly for ease of alignment: with the convolution stride set to stride=1, setting the padding to pad=0, 1, and 2 respectively yields features of identical spatial dimensions after convolution, so these features can be concatenated directly; a 3*3 pooling layer is also included in the module. The deeper the network, the more abstract the features and the larger the receptive field each feature involves, so as the number of layers increases the proportion of 3x3 and 5x5 convolutions should also increase. However, 5x5 convolution kernels still incur a huge amount of computation, so 1x1 convolution kernels are used for dimensionality reduction.
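A sketch of such an Inception module in PyTorch is shown below; the branch widths are illustrative rather than taken from Fig. 4:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """GoogLeNet-style Inception module with 1x1, 3x3, and 5x5 branches plus
    a 3x3 pooling branch; 1x1 convolutions reduce channels before the
    expensive 3x3 and 5x5 convolutions."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                       # 1x1 branch
        self.b3 = nn.Sequential(                                # 1x1 -> 3x3
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1))
        self.b5 = nn.Sequential(                                # 1x1 -> 5x5
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2))
        self.bp = nn.Sequential(                                # pool -> 1x1
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1))

    def forward(self, x):
        # Equal spatial sizes (stride 1, pad 0/1/2) allow direct concatenation.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)
```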
Corresponding to the method embodiments, the present application also provides a target identification device. A structural schematic diagram of the target identification device provided by the present application is shown in Fig. 5 and may include:
an acquisition module 51, a filling module 52, an encoding module 53, and an identification module 54; where,
the acquisition module 51 is configured to acquire an RGB image and a depth image of a target area;
the filling module 52 is configured to perform hole filling on the depth image to obtain a repaired depth image;
the encoding module 53 is configured to encode the repaired depth image to obtain a three-channel depth image; and
the identification module 54 is configured to input the RGB image and the three-channel depth image into a pre-trained identification model to obtain a target identification result for the RGB image; the identification model is trained in advance using several annotated RGB images and the depth image corresponding to each annotated RGB image as samples.
The target identification device provided by the present application acquires an RGB image and a depth image of a target area; performs hole filling on the depth image to obtain a repaired depth image; encodes the repaired depth image to obtain a three-channel depth image; and inputs the RGB image and the three-channel depth image into a pre-trained identification model to obtain the target identification result for the RGB image. By performing target identification with a pre-trained identification model that combines the RGB image and the depth image, the present application improves the accuracy of target identification.
In an optional embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a clustering unit, configured to cluster the pixel values of the grayscale RGB image to obtain a cluster image, where the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar;
a second determination unit, configured to determine, in the grayscale RGB image, a first pixel corresponding to a hole point and all second pixels similar to the first pixel, where each second pixel corresponds to a non-hole point in the depth image;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In an optional embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a third determination unit, configured to determine, in the RGB image, a first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, where each second pixel corresponds to a non-hole point within the preset neighborhood;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
In an optional embodiment, the encoding module 53 can specifically be configured to perform HHA encoding on the repaired depth image to obtain the three-channel depth image.
In an optional embodiment, the identification model may include a depth network unit and a convolutional neural network unit; where,
the depth network unit is configured to process the three-channel depth image to extract the features of the three-channel depth image; and
the convolutional neural network unit is configured to process the RGB image, extract the features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result for the RGB image.
In an optional embodiment, the depth network unit includes three multilayer perceptron convolutional layers;
the convolutional neural network unit includes: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
As shown in Figs. 6 and 7, Fig. 6 is a frame image awaiting target identification. This frame image and its corresponding depth image are processed with the target identification method provided by the present application, and the resulting target identification result is shown in Fig. 7. In this example the target is a chair: when training the identification model, training samples containing chairs are used to obtain a chair identification model.
Those of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented with electronic hardware, or with a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. Additionally, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit.
It should be understood that the dependent claims, the embodiments, and the features in the embodiments of the present application can be combined with one another to solve the aforementioned technical problems.
If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A target identification method, characterized by comprising:
acquiring an RGB image and a depth image of a target area;
performing hole filling on the depth image to obtain a repaired depth image;
encoding the repaired depth image to obtain a three-channel depth image; and
inputting the RGB image and the three-channel depth image into a pre-trained identification model to obtain a target identification result for the RGB image, wherein the identification model is trained in advance using several annotated RGB images and the depth image corresponding to each annotated RGB image as samples.
2. The method according to claim 1, characterized in that performing hole filling on the depth image to obtain a repaired depth image comprises:
performing binarization on the depth image to obtain a mask;
determining the hole points in the depth image according to the mask;
clustering the pixel values of the grayscale RGB image to obtain a cluster image, wherein the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar;
determining, in the grayscale RGB image, a first pixel corresponding to a hole point and all second pixels similar to the first pixel, wherein each second pixel corresponds to a non-hole point in the depth image;
calculating the distance between the first pixel and each second pixel; and
using the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
3. The method according to claim 1, characterized in that performing hole filling on the depth image to obtain a repaired depth image comprises:
performing binarization on the depth image to obtain a mask;
determining the hole points in the depth image according to the mask;
determining, in the RGB image, a first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, wherein each second pixel corresponds to a non-hole point within the preset neighborhood;
calculating the distance between the first pixel and each second pixel; and
using the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
4. The method according to claim 1, characterized in that the identification model comprises:
a depth network unit and a convolutional neural network unit; wherein
the depth network unit is configured to process the three-channel depth image to extract features of the three-channel depth image; and
the convolutional neural network unit is configured to process the RGB image, extract features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result for the RGB image.
5. The method according to claim 4, characterized in that the depth network unit comprises three multilayer perceptron convolutional layers; and
the convolutional neural network unit comprises: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
6. A target identification device, characterized by comprising:
an acquisition module, configured to acquire an RGB image and a depth image of a target area;
a filling module, configured to perform hole filling on the depth image to obtain a repaired depth image;
an encoding module, configured to encode the repaired depth image to obtain a three-channel depth image; and
an identification module, configured to input the RGB image and the three-channel depth image into a pre-trained identification model to obtain a target identification result for the RGB image, wherein the identification model is trained in advance using several annotated RGB images and the depth image corresponding to each annotated RGB image as samples.
7. The device according to claim 6, characterized in that the filling module comprises:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a clustering unit, configured to cluster the pixel values of the grayscale RGB image to obtain a cluster image, wherein the cluster image identifies the pixels of the grayscale RGB image whose pixel values are similar;
a second determination unit, configured to determine, in the grayscale RGB image, a first pixel corresponding to a hole point and all second pixels similar to the first pixel, wherein each second pixel corresponds to a non-hole point in the depth image;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
8. The device according to claim 6, characterized in that the filling module comprises:
a binarization unit, configured to perform binarization on the depth image to obtain a mask;
a first determination unit, configured to determine the hole points in the depth image according to the mask;
a third determination unit, configured to determine, in the RGB image, a first pixel corresponding to a hole point and the second pixels within a preset neighborhood of the first pixel, wherein each second pixel corresponds to a non-hole point within the preset neighborhood;
a calculation unit, configured to calculate the distance between the first pixel and each second pixel; and
a filling unit, configured to use the depth value of the second pixel with the shortest distance to the first pixel as the filling value for the hole point.
9. The device according to claim 6, characterized in that the identification model comprises: a depth network unit and a convolutional neural network unit; wherein
the depth network unit is configured to process the three-channel depth image to extract features of the three-channel depth image; and
the convolutional neural network unit is configured to process the RGB image, extract features of the RGB image, and process the features of the three-channel depth image together with the features of the RGB image to obtain the target identification result for the RGB image.
10. The device according to claim 9, characterized in that the depth network unit comprises three multilayer perceptron convolutional layers; and
the convolutional neural network unit comprises: two convolution-pooling layers; two first Inception modules connected to the two convolution-pooling layers; a first pooling layer connected to the two first Inception modules; five second Inception modules connected to the first pooling layer; a second pooling layer connected to the five second Inception modules; two third Inception modules connected to the second pooling layer; a third pooling layer connected to the two third Inception modules; a dropout layer connected to the third pooling layer; a linear layer connected to the dropout layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
CN201910614107.8A 2019-07-09 2019-07-09 Target identification method and device Pending CN110334769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910614107.8A CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910614107.8A CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Publications (1)

Publication Number Publication Date
CN110334769A true CN110334769A (en) 2019-10-15

Family

ID=68143410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910614107.8A Pending CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Country Status (1)

Country Link
CN (1) CN110334769A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102447925A (en) * 2011-09-09 2012-05-09 青岛海信数字多媒体技术国家重点实验室有限公司 Method and device for synthesizing virtual viewpoint image
CN102625127A (en) * 2012-03-24 2012-08-01 山东大学 Optimization method suitable for virtual viewpoint generation of 3D television
CN103236082A (en) * 2013-04-27 2013-08-07 南京邮电大学 Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
CN103248909A (en) * 2013-05-21 2013-08-14 清华大学 Method and system of converting monocular video into stereoscopic video
US10062004B2 (en) * 2015-08-20 2018-08-28 Kabushiki Kaisha Toshiba Arrangement detection apparatus and pickup apparatus
US20170069071A1 (en) * 2015-09-04 2017-03-09 Electronics And Telecommunications Research Institute Apparatus and method for extracting person region based on red/green/blue-depth image
CN106651871A (en) * 2016-11-18 2017-05-10 华东师范大学 Automatic filling method for cavities in depth image
CN108230380A (en) * 2016-12-09 2018-06-29 广东技术师范学院 Indoor entrance detection method based on the three-dimensional depth of field
CN107977650A (en) * 2017-12-21 2018-05-01 北京华捷艾米科技有限公司 Face detection method and device
CN108734210A (en) * 2018-05-17 2018-11-02 浙江工业大学 Object detection method based on cross-modal multi-scale feature fusion
CN109636732A (en) * 2018-10-24 2019-04-16 深圳先进技术研究院 Hole repair method for depth image and image processing apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUKAS SCHNEIDER et al.: "Multimodal Neural Networks: RGB-D for Semantic Segmentation and Object Detection", Scandinavian Conference on Image Analysis *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102199A (en) * 2020-09-18 2020-12-18 贝壳技术有限公司 Method, device and system for filling hole area of depth image
CN113393421A (en) * 2021-05-08 2021-09-14 深圳市识农智能科技有限公司 Fruit evaluation method and device and inspection equipment
CN113902786A (en) * 2021-09-23 2022-01-07 珠海视熙科技有限公司 Depth image preprocessing method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015