CN109543601A - A kind of unmanned vehicle object detection method based on multi-modal deep learning - Google Patents
- Publication number
- CN109543601A (application CN201811388553.3A)
- Authority
- CN
- China
- Prior art keywords
- laser radar
- image
- unmanned vehicle
- deep learning
- object detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses an unmanned vehicle object detection method based on multi-modal deep learning, comprising the following steps: (1) data acquisition; (2) data synchronization; (3) laser radar point-cloud processing; (4) feature extraction; (5) feature classification and regression; (6) projection; (7) removal of redundant candidate boxes; (8) correction; (9) output of the three-dimensional detection box. By combining the strengths of multiple sensors, the invention overcomes the limited ability of the image modality to express the road scene, and copes with scene perception in the complex scenes an unmanned vehicle may encounter while driving.
Description
Technical field
The present invention relates to the technical field of driverless object detection, and more particularly to an unmanned vehicle object detection method based on multi-modal deep learning.
Background technique
Since the automobile was born, its convenience and speed have been obvious to all, but automotive safety has always been a key concern of both car manufacturers and consumers. Avoiding road obstacles in time is an important prerequisite for road safety, so enabling a vehicle to "recognize" the current road situation, whether dynamic or static objects, is a key link in intelligent-vehicle scene understanding.
The prior art usually uses the image modality alone, but with images only it is difficult to accurately reproduce and analyze the road scene of an intelligent vehicle: camera parameters vary greatly under strong illumination changes, which degrades performance, and the expressive ability of images is poor in rain, snow and similar weather. Therefore, modal information from additional sensors such as millimeter-wave radar and laser radar can compensate for the insufficiency of camera information, so that the intelligent vehicle can anticipate the position of obstacles under various conditions.
The patent with publication number CN107169421A describes a driving-scene object detection method based on deep convolutional neural networks, but it does not address the camera's sensitivity to light or its poor performance in bad weather, and therefore cannot provide reliable scene-perception information while an unmanned vehicle is driving; it only changes the architecture of the CNN and improves detection with a single image modality as input.
The patent with publication number CN102253391A describes a pedestrian tracking method based on multiple laser radars, detecting pedestrians with a sensor network composed of several line laser radars; but it only addresses pedestrian detection and cannot satisfy the scene-perception demands of a driving unmanned vehicle, and since a laser radar point cloud is very sparse, far sparser than image data, searching for targets in space with it alone is extremely difficult.
Summary of the invention
To solve the above technical problems, the present invention provides an unmanned vehicle object detection method based on multi-modal deep learning that combines the strengths of multiple sensors, overcomes the limited ability of the image modality to express the road scene, and copes with scene perception in the complex scenes an unmanned vehicle may encounter while driving.
An unmanned vehicle object detection method based on multi-modal deep learning, comprising the following steps:
(1) Data acquisition: acquire the image data of the monocular camera and the point cloud data of the laser radar using a vehicle-mounted camera and a vehicle-mounted laser radar;
(2) Laser radar point-cloud processing: obtain the bird's-eye view of the laser radar through a bird's-eye-view projection algorithm;
(3) Feature extraction: extract the deep features of the laser radar top view and of the image data, respectively, using convolutional neural networks;
(4) Feature classification and regression: obtain the candidate boxes of multiple targets from the laser radar data and the image data, respectively;
(5) Projection: project the candidate boxes of the laser radar bird's-eye view and of the image data onto feature maps of the same specification;
(6) Redundant-candidate removal: taking the laser radar candidate boxes as the standard, discard the redundant anchors in the image data according to a threshold range around the candidate-box centers; if an anchor of a candidate box appears in the image data but not in the laser radar data, it is deleted;
(7) Proposal correction from the image perspective, avoiding laser radar target proposals that differ greatly from reality;
(8) Three-dimensional detection-box output: from the laser radar point-cloud array fed into the convolutional network according to the rules and from the preserved depth information, regress through the loss function the coordinates of the detected vehicle and save them.
Wherein the loss function is the polynomial sum of a cross-entropy term and the loss function of the image part:

L(W) = -(1/N_l) Σ_{n=1}^{N_l} [ y_n log ŷ_n + (1 - y_n) log(1 - ŷ_n) ] + (1/N_c) ( L_conf(x, c) + α L_loc(x, l, g) )

where N_l and N_c respectively denote the number of training samples of the laser radar and of the image; y_n is the label of the n-th sample, 1 representing a vehicle and 0 other objects; ŷ_n is the prediction of the convolutional neural network for the sample; and W denotes the network parameters. The first half of the polynomial is the loss function for the laser radar, and the latter part is the loss for the image: the image loss is composed of a confidence loss L_conf and a localization loss L_loc, where x indicates whether a predicted box matches a ground-truth box, c represents the matching degree between the predicted and ground-truth boxes, α is a balance factor whose value is 1 during cross-validation, l denotes a predicted box and g a ground-truth box. Together these terms represent the confidence of the finally regressed detection class and the localization accuracy of each coordinate point of the detection; the multi-term loss constrains the network to fit the labeled true values closely and thereby improves its overall performance.
The invention also includes a data synchronization step, which synchronizes the image data of the monocular camera and the point cloud data of the laser radar acquired by the vehicle-mounted camera and the vehicle-mounted laser radar.
The number of coordinates of the detected vehicle is 8.
The feature classification and regression includes the following sub-steps, executed in no fixed order:
A. Feed the features extracted from the laser radar point cloud into an RPN network, classify and regress them, obtain the box coordinate values of multiple targets in 3D coordinates, and obtain by conversion the 2D candidate boxes in the bird's-eye view;
B. Feed the features extracted from the image data into an RPN network, classify and regress them, and obtain the candidate boxes of multiple targets on the image plane.
The bird's-eye-view projection algorithm includes the following sub-steps:
(1) delimit the rectangular region to be projected, taking the ego vehicle as the origin;
(2) reduce the dimensionality of the multidimensional array;
(3) map the laser radar points to pixel positions;
(4) translate the coordinate origin;
(5) normalize the coordinate values to 0-255;
(6) apply a spectrum mapping to the grayscale image based on the preserved depth information.
Each laser radar point carries a reflectivity characterizing the strength of the reflection received from the object under natural conditions; the point cloud is an array of N rows and 4 columns, the first three columns being the three-dimensional coordinates of the laser radar point and the fourth column the reflection intensity of the corresponding point.
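As an illustration of this data layout, the N×4 array and the 0-255 normalization of step (5) can be sketched as follows; the toy values and the normalize_to_uint8 helper are hypothetical, not taken from the patent:

```python
import numpy as np

def normalize_to_uint8(values):
    """Linearly rescale an array of reflectance values to the 0-255 range."""
    lo, hi = values.min(), values.max()
    if hi == lo:  # avoid division by zero on constant input
        return np.zeros_like(values, dtype=np.uint8)
    return ((values - lo) / (hi - lo) * 255).astype(np.uint8)

# A toy point cloud: N rows, 4 columns (x, y, z, reflectance).
cloud = np.array([
    [1.0,  0.5, 0.2, 0.10],
    [2.0, -0.5, 0.3, 0.55],
    [3.0,  1.5, 0.1, 1.00],
])
xyz = cloud[:, :3]        # first three columns: 3D coordinates
intensity = cloud[:, 3]   # fourth column: reflection intensity
pixels = normalize_to_uint8(intensity)
print(pixels)             # [  0 127 255]
```

The same rescaling can then feed the spectrum (color) mapping of step (6).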
The present invention is a multi-modal road-scene detection method that combines images with a laser radar and its depth information: by translating and rotating between the image pixel coordinate system and the radar coordinate system, the laser radar point cloud is projected directly onto the image. This overcomes the insufficient expressive ability of the traditional image-only modality for the road scene, and has the advantages of good real-time performance and accurate obstacle-target detection.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Specific embodiment
The embodiments of the present invention are described in further detail below with reference to the examples. The following embodiments serve to illustrate the present invention but are not intended to limit its scope.
An unmanned vehicle object detection method based on multi-modal deep learning, comprising the following steps:
S1. Acquire the image data of the monocular camera and the point cloud data of the laser radar using the vehicle-mounted camera and the vehicle-mounted laser radar, and synchronize them;
S2. Generate the radar bird's-eye-view data according to the bird's-eye-view projection algorithm: first delimit the rectangular region to be projected with the ego vehicle as the origin, then map the laser radar points to pixel positions. Each point of the cloud carries a reflectivity characterizing the strength of the reflection received from the object under natural conditions; the cloud is an array of N rows and 4 columns, the first three columns storing the three-dimensional coordinates of the laser radar point and the fourth column the reflection intensity of the corresponding point. Normalize the reflection intensity to 0-255, then apply a spectrum mapping to the grayscale image;
S3. Extract the deep features of the laser radar top view using a convolutional neural network, performing feature extraction on the input top view through convolution and pooling operations;
S4. Feed the extracted laser radar top-view deep features into an RPN network, classify and regress them, and obtain the candidate boxes of the 2D laser radar bird's-eye-view data;
S5. Extract the deep features of the image using a convolutional neural network, performing feature extraction on the input image through convolution and pooling operations;
S6. Feed the extracted image deep features into an RPN network, classify and regress them, and obtain the image-data candidate boxes, as shown in the right part of Fig. 1;
S7. Project the candidate boxes of the laser radar and of the image onto feature maps of the same specification: first divide each coordinate by the ratio of the input picture's size to the feature map's size to obtain the box coordinates on the feature map, then obtain the output by average-pooling down-sampling; average pooling is used because the scales of the input image and of the radar bird's-eye view are unified;
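Step S7 can be sketched as follows; the feature-map sizes, box values, and helper names are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def project_box_to_feature_map(box, input_size, feat_size):
    """Scale an [x1, y1, x2, y2] box from input-image coordinates to
    feature-map coordinates by dividing by the downsampling ratio."""
    ratio = input_size / feat_size  # e.g. 256 / 8 = 32
    return [int(c / ratio) for c in box]

def average_pool_roi(feature_map, box, out_size=2):
    """Average-pool the region of the feature map covered by the box
    into a fixed out_size x out_size grid."""
    x1, y1, x2, y2 = box
    roi = feature_map[y1:y2, x1:x2]
    h, w = roi.shape
    pooled = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            ys = slice(i * h // out_size, (i + 1) * h // out_size)
            xs = slice(j * w // out_size, (j + 1) * w // out_size)
            pooled[i, j] = roi[ys, xs].mean()
    return pooled

feat = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 feature map
box = project_box_to_feature_map([64, 64, 192, 192], input_size=256, feat_size=8)
pooled = average_pool_roi(feat, box)
print(box)           # [2, 2, 6, 6]
print(pooled.shape)  # (2, 2)
```

Because both modalities are resampled to the same feature-map specification, the same pooling routine serves the image branch and the radar bird's-eye-view branch.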
S8. Remove redundant candidate boxes: taking the laser radar candidate boxes as the standard, discard the redundant anchors in the image according to a threshold range around the candidate-box centers.
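A minimal sketch of this filtering step, assuming the "central threshold range" means a Euclidean-distance test between box centers (the patent does not specify the exact rule); the box values are toy examples:

```python
import numpy as np

def box_center(box):
    """Center (cx, cy) of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def filter_image_boxes(lidar_boxes, image_boxes, center_threshold):
    """Keep only image candidate boxes whose center lies within
    center_threshold of the center of some laser radar candidate box;
    the radar boxes are the standard, so unmatched image anchors are
    treated as redundant and dropped."""
    lidar_centers = [box_center(b) for b in lidar_boxes]
    kept = []
    for ib in image_boxes:
        cx, cy = box_center(ib)
        if any(np.hypot(cx - lx, cy - ly) <= center_threshold
               for lx, ly in lidar_centers):
            kept.append(ib)
    return kept

lidar = [[10, 10, 30, 30]]                         # radar candidate, center (20, 20)
image = [[12, 12, 28, 28], [80, 80, 100, 100]]     # second box has no radar support
kept = filter_image_boxes(lidar, image, center_threshold=5.0)
print(kept)  # [[12, 12, 28, 28]]
```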
S9. Obtain the three-dimensional detection box: classify with the softmax function, and regress each coordinate point of the three-dimensional detection box with the loss function provided by TensorFlow, the mean squared error (MSE).
Here the softmax function maps an N-dimensional real vector to another N-dimensional real vector, normalizing all elements to between 0 and 1, which gives the probability of each detection box belonging to each class. TensorFlow is an open-source deep learning framework, similar to Caffe, PyTorch and others.
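The softmax mapping described above can be sketched in a few lines (a standard numerically-stable formulation, not code from the patent):

```python
import numpy as np

def softmax(scores):
    """Map an N-dimensional real vector to an N-dimensional vector of
    class probabilities in (0, 1) that sums to 1.  Subtracting the max
    first keeps the exponentials numerically stable."""
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.sum())     # sums to 1 (up to floating-point rounding)
print(probs.argmax())  # 0: the largest score keeps the largest probability
```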
The bird's-eye-view projection algorithm proceeds as follows:
(1) Delimit the rectangular region to be projected, taking the ego vehicle as the origin:
ff = np.logical_and(x_lidar > 0, x_lidar < q)
ss = np.logical_and(y_lidar > -t, y_lidar < t)
(2) Reduce the dimensionality of the multidimensional array:
indices = np.argwhere(np.logical_and(ff, ss)).flatten()
(3) Map the laser radar points to pixel positions, with res the actual distance between two pixels:
x_img = (-y_lidar[indices] / res).astype(np.int32)
y_img = (-x_lidar[indices] / res).astype(np.int32)
(4) Translate the coordinate origin:
x_img -= int(-t / res)
y_img -= 0
(5) Normalize the values to 0-255;
(6) Apply a spectrum mapping to the grayscale image based on the preserved depth information.
In (1), np.logical_and() is the logical AND function and x_lidar is the x coordinate in the radar point cloud; wherever 0 < x_lidar < q holds, the corresponding position of ff is set to True, otherwise to False, and y_lidar is handled likewise. In (2), np.argwhere() is an index function that finds the indices where ff and ss are both True, i.e. the indices of the points in the radar point cloud satisfying 0 < x_lidar < q and -t < y_lidar < t; the result is a multidimensional array, which the flatten() function converts into a one-dimensional array. In (3) the point cloud is converted into two-dimensional image data: x_img and y_img are the coordinates in the top view, res is the actual distance between two pixels, and the astype() function converts the values into integers. In (4) the radar top-view coordinate system and the image coordinate system are unified.
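Assembled into one self-contained function, the projection steps above might look like the following sketch; the window sizes q and t, the resolution res, the row-origin shift, and the clipping are illustrative assumptions rather than the patent's exact implementation:

```python
import numpy as np

def lidar_to_bev(cloud, q=40.0, t=20.0, res=0.1):
    """Project an N x 4 laser radar cloud (x, y, z, reflectance) to a
    top-view grayscale image: keep points with 0 < x < q and -t < y < t,
    map them to pixel positions at res metres per pixel, translate the
    origin, and write the reflectance scaled to 0-255 into the grid."""
    x, y, refl = cloud[:, 0], cloud[:, 1], cloud[:, 3]
    ff = np.logical_and(x > 0, x < q)      # longitudinal window
    ss = np.logical_and(y > -t, y < t)     # lateral window
    idx = np.argwhere(np.logical_and(ff, ss)).flatten()
    x_img = (-y[idx] / res).astype(np.int32)  # image columns from -y
    y_img = (-x[idx] / res).astype(np.int32)  # image rows from -x
    x_img -= int(-t / res)                 # translate the lateral origin
    y_img -= int(-q / res)                 # assumed shift to make rows non-negative
    h, w = int(q / res), int(2 * t / res)
    bev = np.zeros((h, w), dtype=np.uint8)
    bev[np.clip(y_img, 0, h - 1), np.clip(x_img, 0, w - 1)] = \
        np.clip(refl[idx] * 255, 0, 255).astype(np.uint8)
    return bev

cloud = np.array([[10.0, 5.0, 0.0, 0.5],   # inside the window
                  [50.0, 0.0, 0.0, 0.9]])  # x >= q, filtered out
bev = lidar_to_bev(cloud)
print(bev.shape)  # (400, 400)
```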
The RPN in the present invention predicts the probability that an anchor is background or foreground, i.e. the classification process, and extracts the anchors, i.e. regresses their positions on the feature map. Whatever aspect ratios the anchors take, the accuracy of the final detection result is determined to a large extent by the candidate-box proposals given by the RPN network: the more wrong proposals it gives, the greater the time complexity of the algorithm, and accuracy also suffers significantly. The anchors described here are in fact regional boxes on the feature map.
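As an illustration of what "anchors as regional boxes on the feature map" means, a minimal anchor-grid generator might look like this; the stride, scales and ratios are arbitrary example values, not taken from the patent:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Produce one [x1, y1, x2, y2] anchor per (location, scale, ratio)
    on a feat_h x feat_w feature map; the RPN then classifies each anchor
    as foreground/background and regresses its position offsets."""
    anchors = []
    for iy in range(feat_h):
        for ix in range(feat_w):
            # anchor center in input-image coordinates
            cx, cy = ix * stride + stride / 2, iy * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

a = generate_anchors(2, 2)
print(a.shape)  # (24, 4): 2*2 locations x 2 scales x 3 ratios
```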
The softmax function used in the present invention maps an N-dimensional real vector to another N-dimensional real vector, normalizing all elements to between 0 and 1; the multi-class decision is finally made according to these values.
By combining the strengths of multiple sensors, the present invention provides scene perception that copes with the complex scenes an unmanned vehicle may encounter while driving, avoiding the false and missed detections of road-scene objects caused by detecting with the image modality alone, so that the object detection results are more reliable.
Furthermore, the present invention overcomes the insufficient ability of the image modality to express the road scene by proposing a multi-modal vehicle object detection method that combines images with laser radar using deep learning; through the fusion of multi-modal information, it achieves accurate perception of the surrounding scene while the vehicle is driving.
The loss function used in the present invention quantifies the gap between the predicted values of the artificial neural network and the true values, and the purpose of training the neural network is precisely to reduce this gap; therefore, minimizing the loss function is the goal of training the neural network.
The loss function of the present invention is the polynomial sum of a cross-entropy term and the loss function of the image part:

L(W) = -(1/N_l) Σ_{n=1}^{N_l} [ y_n log ŷ_n + (1 - y_n) log(1 - ŷ_n) ] + (1/N_c) ( L_conf(x, c) + α L_loc(x, l, g) )

where N_l and N_c respectively denote the number of training samples of the laser radar and of the image; y_n is the label of the n-th sample, 1 representing a vehicle and 0 other objects; ŷ_n is the prediction of the convolutional neural network for the sample; and W denotes the network parameters. The first half of the polynomial is the loss function for the laser radar, and the latter part is the loss for the image: the image loss is composed of a confidence loss L_conf and a localization loss L_loc, where x indicates whether a predicted box matches a ground-truth box, c represents the matching degree between the predicted and ground-truth boxes, α is a balance factor whose value is 1 during cross-validation, l denotes a predicted box and g a ground-truth box. Together these terms represent the confidence of the finally regressed detection class and the localization accuracy of each coordinate point of the detection; the multi-term loss constrains the network to fit the labeled true values closely and thereby improves its overall performance.
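Under the assumption that L_conf and L_loc are supplied as precomputed scalars (the patent does not spell out their internal SSD-style definitions), the combined loss can be sketched as:

```python
import numpy as np

def lidar_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over the N_l laser radar samples (1 = vehicle)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def image_loss(l_conf, l_loc, n_c, alpha=1.0):
    """Image branch: confidence loss plus alpha-weighted localization
    loss over the N_c image samples (alpha = 1 per the text)."""
    return (l_conf + alpha * l_loc) / n_c

def total_loss(y_true, y_pred, l_conf, l_loc, n_c):
    """Polynomial sum of the radar cross-entropy term and the image term."""
    return lidar_cross_entropy(y_true, y_pred) + image_loss(l_conf, l_loc, n_c)

y = np.array([1.0, 0.0])      # labels: vehicle, other
p = np.array([0.9, 0.2])      # network predictions
val = total_loss(y, p, l_conf=0.4, l_loc=0.6, n_c=2)
print(round(val, 4))          # 0.6643
```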
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the technical principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (7)
1. An unmanned vehicle object detection method based on multi-modal deep learning, characterized by comprising the following steps:
(1) data acquisition: acquiring the image data of a monocular camera and the point cloud data of a laser radar using a vehicle-mounted camera and a vehicle-mounted laser radar;
(2) laser radar point-cloud processing: obtaining the bird's-eye view of the laser radar through a bird's-eye-view projection algorithm;
(3) feature extraction: extracting the deep features of the laser radar top view and of the image data, respectively, using convolutional neural networks;
(4) feature classification and regression: obtaining the candidate boxes of multiple targets from the laser radar data and the image data, respectively;
(5) projection: projecting the candidate boxes of the laser radar bird's-eye view and of the image data onto feature maps of the same specification;
(6) redundant-candidate removal: taking the laser radar candidate boxes as the standard, discarding the redundant anchors in the image data according to a threshold range around the candidate-box centers; if an anchor of a candidate box appears in the image data but not in the laser radar data, it is deleted;
(7) proposal correction from the image perspective, avoiding laser radar target proposals that differ greatly from reality;
(8) three-dimensional detection-box output: from the laser radar point-cloud array fed into the convolutional network according to the rules and from the preserved depth information, regressing through the loss function the coordinates of the detected vehicle and saving them.
2. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized in that the loss function is the polynomial sum of a cross-entropy term and the loss function of the image part:

L(W) = -(1/N_l) Σ_{n=1}^{N_l} [ y_n log ŷ_n + (1 - y_n) log(1 - ŷ_n) ] + (1/N_c) ( L_conf(x, c) + α L_loc(x, l, g) )

where N_l and N_c respectively denote the number of training samples of the laser radar and of the image; y_n is the label of the n-th sample and ŷ_n the prediction of the convolutional neural network for the sample; W denotes the network parameters; the first half of the polynomial is the loss function for the laser radar and the latter part the loss for the image, the image loss being composed of a confidence loss L_conf and a localization loss L_loc; x indicates whether a predicted box matches a ground-truth box, c represents the matching degree between the predicted and ground-truth boxes, α is a balance factor, l denotes a predicted box and g a ground-truth box.
3. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized by further comprising a data synchronization step that synchronizes the image data of the monocular camera and the point cloud data of the laser radar acquired by the vehicle-mounted camera and the vehicle-mounted laser radar.
4. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized in that the number of coordinates of the detected vehicle is 8.
5. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized in that the feature classification and regression includes the following sub-steps, executed in no fixed order:
A. feeding the features extracted from the laser radar point cloud into an RPN network, classifying and regressing them, obtaining the box coordinate values of multiple targets in 3D coordinates, and obtaining by conversion the 2D candidate boxes in the bird's-eye view;
B. feeding the features extracted from the image data into an RPN network, classifying and regressing them, and obtaining the candidate boxes of multiple targets on the image plane.
6. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized in that the bird's-eye-view projection algorithm includes the following sub-steps:
(1) delimiting the rectangular region to be projected, taking the ego vehicle as the origin;
(2) reducing the dimensionality of the multidimensional array;
(3) mapping the laser radar points to pixel positions;
(4) translating the coordinate origin;
(5) normalizing the coordinate values to 0-255;
(6) applying a spectrum mapping to the grayscale image based on the preserved depth information.
7. The unmanned vehicle object detection method based on multi-modal deep learning according to claim 1, characterized in that each laser radar point carries a reflectivity characterizing the strength of the reflection received from the object under natural conditions; the point cloud is an array of N rows and 4 columns, the first three columns being the three-dimensional coordinates of the laser radar point and the fourth column the reflection intensity of the corresponding point.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811388553.3A CN109543601A (en) | 2018-11-21 | 2018-11-21 | A kind of unmanned vehicle object detection method based on multi-modal deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109543601A true CN109543601A (en) | 2019-03-29 |
Family
ID=65848703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811388553.3A Pending CN109543601A (en) | 2018-11-21 | 2018-11-21 | A kind of unmanned vehicle object detection method based on multi-modal deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543601A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222626A (en) * | 2019-06-03 | 2019-09-10 | 宁波智能装备研究院有限公司 | A kind of unmanned scene point cloud target mask method based on deep learning algorithm |
CN110232315A (en) * | 2019-04-29 | 2019-09-13 | 华为技术有限公司 | Object detection method and device |
CN110441789A (en) * | 2019-06-27 | 2019-11-12 | 南京莱斯信息技术股份有限公司 | A kind of non-homogeneous thread laser radar Target Recognition Algorithms |
CN110531759A (en) * | 2019-08-02 | 2019-12-03 | 深圳大学 | Path generating method, device, computer equipment and storage medium are explored by robot |
CN110706288A (en) * | 2019-10-10 | 2020-01-17 | 上海眼控科技股份有限公司 | Target detection method, device, equipment and readable storage medium |
CN110743818A (en) * | 2019-11-29 | 2020-02-04 | 苏州嘉诺环境工程有限公司 | Garbage sorting system and garbage sorting method based on vision and deep learning |
CN110781927A (en) * | 2019-10-11 | 2020-02-11 | 苏州大学 | Target detection and classification method based on deep learning under cooperation of vehicle and road |
CN110992731A (en) * | 2019-12-12 | 2020-04-10 | 苏州智加科技有限公司 | Laser radar-based 3D vehicle detection method and device and storage medium |
CN111209825A (en) * | 2019-12-31 | 2020-05-29 | 武汉中海庭数据技术有限公司 | Method and device for dynamic target 3D detection |
CN111242015A (en) * | 2020-01-10 | 2020-06-05 | 同济大学 | Method for predicting driving danger scene based on motion contour semantic graph |
CN111274988A (en) * | 2020-02-10 | 2020-06-12 | 安徽大学 | Multispectral-based vehicle weight identification method and device |
CN111693050A (en) * | 2020-05-25 | 2020-09-22 | 电子科技大学 | Indoor medium and large robot navigation method based on building information model |
CN111695403A (en) * | 2020-04-19 | 2020-09-22 | 东风汽车股份有限公司 | 2D and 3D image synchronous detection method based on depth perception convolutional neural network |
WO2020199834A1 (en) * | 2019-04-03 | 2020-10-08 | 腾讯科技(深圳)有限公司 | Object detection method and apparatus, and network device and storage medium |
CN111860493A (en) * | 2020-06-12 | 2020-10-30 | 北京图森智途科技有限公司 | Target detection method and device based on point cloud data |
CN112560717A (en) * | 2020-12-21 | 2021-03-26 | 青岛科技大学 | Deep learning-based lane line detection method |
WO2021081808A1 (en) * | 2019-10-30 | 2021-05-06 | 深圳市大疆创新科技有限公司 | Artificial neural network-based object detection system and method |
CN113239749A (en) * | 2021-04-27 | 2021-08-10 | 四川大学 | Cross-domain point cloud semantic segmentation method based on multi-modal joint learning |
CN113516685A (en) * | 2021-07-09 | 2021-10-19 | 东软睿驰汽车技术(沈阳)有限公司 | Target tracking method, device, equipment and storage medium |
CN113780257A (en) * | 2021-11-12 | 2021-12-10 | 紫东信息科技(苏州)有限公司 | Multi-mode fusion weak supervision vehicle target detection method and system |
CN114648579A (en) * | 2022-02-15 | 2022-06-21 | 浙江零跑科技股份有限公司 | Multi-branch input laser radar target detection method |
CN115019270A (en) * | 2022-05-31 | 2022-09-06 | 电子科技大学 | Automatic driving night target detection method based on sparse point cloud prior information |
CN115457496A (en) * | 2022-09-09 | 2022-12-09 | 北京百度网讯科技有限公司 | Automatic driving retaining wall detection method and device and vehicle |
CN115527074A (en) * | 2022-11-29 | 2022-12-27 | 深圳依时货拉拉科技有限公司 | Vehicle detection frame generation method and device and computer equipment |
CN117475397A (en) * | 2023-12-26 | 2024-01-30 | 安徽蔚来智驾科技有限公司 | Target annotation data acquisition method, medium and device based on multi-mode sensor |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2637126A2 (en) * | 2012-03-07 | 2013-09-11 | Ricoh Company, Ltd. | Method and apparatus for detecting vehicle |
CN103793722A (en) * | 2014-03-07 | 2014-05-14 | 电子科技大学 | Rapid vehicle detection method and device under low resolution |
CN105701479A (en) * | 2016-02-26 | 2016-06-22 | 重庆邮电大学 | Intelligent vehicle multi-laser radar fusion recognition method based on target features |
WO2016113904A1 (en) * | 2015-01-16 | 2016-07-21 | 株式会社日立製作所 | Three-dimensional-information-calculating device, method for calculating three-dimensional information, and autonomous mobile device |
CN107451602A (en) * | 2017-07-06 | 2017-12-08 | 浙江工业大学 | A kind of fruits and vegetables detection method based on deep learning |
WO2018056212A1 (en) * | 2016-09-22 | 2018-03-29 | 株式会社デンソー | Object detection device and object detection method |
WO2018170472A1 (en) * | 2017-03-17 | 2018-09-20 | Honda Motor Co., Ltd. | Joint 3d object detection and orientation estimation via multimodal fusion |
WO2018204128A1 (en) * | 2017-05-01 | 2018-11-08 | Mentor Graphics Development (Deutschland) Gmbh | Embedded automotive perception with machine learning classification of sensor data |
Non-Patent Citations (1)
Title |
---|
Lin Ziyu, "Multi-modal deep learning object detection and its applications", China Master's Theses Full-text Database (Information Science and Technology) * |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020199834A1 (en) * | 2019-04-03 | 2020-10-08 | Tencent Technology (Shenzhen) Co., Ltd. | Object detection method and apparatus, and network device and storage medium
CN110232315A (en) * | 2019-04-29 | 2019-09-13 | Huawei Technologies Co., Ltd. | Object detection method and device
CN110222626B (en) * | 2019-06-03 | 2021-05-28 | Ningbo Intelligent Equipment Research Institute Co., Ltd. | Unmanned scene point cloud target labeling method based on deep learning algorithm
CN110222626A (en) * | 2019-06-03 | 2019-09-10 | Ningbo Intelligent Equipment Research Institute Co., Ltd. | Unmanned scene point cloud target labeling method based on deep learning algorithm
CN110441789A (en) * | 2019-06-27 | 2019-11-12 | Nanjing LES Information Technology Co., Ltd. | Non-uniform-line lidar target recognition algorithm
CN110531759A (en) * | 2019-08-02 | 2019-12-03 | Shenzhen University | Robot exploration path generation method, device, computer equipment and storage medium
CN110706288A (en) * | 2019-10-10 | 2020-01-17 | Shanghai Eye Control Technology Co., Ltd. | Target detection method, device, equipment and readable storage medium
CN110781927A (en) * | 2019-10-11 | 2020-02-11 | Soochow University | Target detection and classification method based on deep learning under vehicle-road cooperation
CN110781927B (en) * | 2019-10-11 | 2023-05-23 | Soochow University | Target detection and classification method based on deep learning under vehicle-road cooperation
WO2021081808A1 (en) * | 2019-10-30 | 2021-05-06 | SZ DJI Technology Co., Ltd. | Artificial neural network-based object detection system and method
CN110743818A (en) * | 2019-11-29 | 2020-02-04 | Suzhou Jianuo Environmental Engineering Co., Ltd. | Garbage sorting system and garbage sorting method based on vision and deep learning
CN110992731A (en) * | 2019-12-12 | 2020-04-10 | Suzhou Zhijia Technology Co., Ltd. | Lidar-based 3D vehicle detection method and device and storage medium
CN111209825A (en) * | 2019-12-31 | 2020-05-29 | Wuhan Zhonghaiting Data Technology Co., Ltd. | Method and device for dynamic target 3D detection
CN111209825B (en) * | 2019-12-31 | 2022-07-01 | Wuhan Zhonghaiting Data Technology Co., Ltd. | Method and device for dynamic target 3D detection
CN111242015B (en) * | 2020-01-10 | 2023-05-02 | Tongji University | Method for predicting driving danger scenes based on motion contour semantic graph
CN111242015A (en) * | 2020-01-10 | 2020-06-05 | Tongji University | Method for predicting driving danger scenes based on motion contour semantic graph
CN111274988A (en) * | 2020-02-10 | 2020-06-12 | Anhui University | Multispectral-based vehicle re-identification method and device
CN111274988B (en) * | 2020-02-10 | 2023-03-24 | Anhui University | Multispectral-based vehicle re-identification method and device
CN111695403B (en) * | 2020-04-19 | 2024-03-22 | Dongfeng Automobile Co., Ltd. | Synchronous 2D and 3D image detection method based on depth-aware convolutional neural network
CN111695403A (en) * | 2020-04-19 | 2020-09-22 | Dongfeng Automobile Co., Ltd. | Synchronous 2D and 3D image detection method based on depth-aware convolutional neural network
CN111693050A (en) * | 2020-05-25 | 2020-09-22 | University of Electronic Science and Technology of China | Indoor medium and large robot navigation method based on building information model
CN111860493B (en) * | 2020-06-12 | 2024-02-09 | Beijing Tusen Zhitu Technology Co., Ltd. | Target detection method and device based on point cloud data
CN111860493A (en) * | 2020-06-12 | 2020-10-30 | Beijing Tusen Zhitu Technology Co., Ltd. | Target detection method and device based on point cloud data
CN112560717B (en) * | 2020-12-21 | 2023-04-21 | Qingdao University of Science and Technology | Lane line detection method based on deep learning
CN112560717A (en) * | 2020-12-21 | 2021-03-26 | Qingdao University of Science and Technology | Lane line detection method based on deep learning
CN113239749A (en) * | 2021-04-27 | 2021-08-10 | Sichuan University | Cross-domain point cloud semantic segmentation method based on multi-modal joint learning
CN113516685A (en) * | 2021-07-09 | 2021-10-19 | Neusoft Reach Automotive Technology (Shenyang) Co., Ltd. | Target tracking method, device, equipment and storage medium
CN113780257A (en) * | 2021-11-12 | 2021-12-10 | Zidong Information Technology (Suzhou) Co., Ltd. | Multi-modal fusion weakly supervised vehicle target detection method and system
CN114648579A (en) * | 2022-02-15 | 2022-06-21 | Zhejiang Leapmotor Technology Co., Ltd. | Multi-branch-input lidar target detection method
CN115019270A (en) * | 2022-05-31 | 2022-09-06 | University of Electronic Science and Technology of China | Night target detection method for automatic driving based on sparse point cloud prior information
CN115019270B (en) * | 2022-05-31 | 2024-04-19 | University of Electronic Science and Technology of China | Night target detection method for automatic driving based on sparse point cloud prior information
CN115457496A (en) * | 2022-09-09 | 2022-12-09 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Retaining wall detection method and device for automatic driving, and vehicle
CN115457496B (en) * | 2022-09-09 | 2023-12-08 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Retaining wall detection method and device for automatic driving, and vehicle
CN115527074 (en) * | 2022-11-29 | 2022-12-27 | Shenzhen Yishi Huolala Technology Co., Ltd. | Vehicle detection frame generation method and device and computer equipment
CN117475397A (en) * | 2023-12-26 | 2024-01-30 | Anhui NIO Autonomous Driving Technology Co., Ltd. | Target annotation data acquisition method, medium and device based on multi-modal sensors
CN117475397B (en) * | 2023-12-26 | 2024-03-22 | Anhui NIO Autonomous Driving Technology Co., Ltd. | Target annotation data acquisition method, medium and device based on multi-modal sensors
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543601A (en) | A kind of unmanned vehicle object detection method based on multi-modal deep learning | |
Xia et al. | Geometric primitives in LiDAR point clouds: A review | |
CN110415342B (en) | Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor | |
Maddern et al. | 1 year, 1000 km: The Oxford RobotCar dataset
Geiger et al. | Vision meets robotics: The KITTI dataset
CN113111887B (en) | Semantic segmentation method and system based on information fusion of camera and laser radar | |
Zhou et al. | Self-supervised learning to visually detect terrain surfaces for autonomous robots operating in forested terrain
CN110009765A (en) | Autonomous vehicle scene data system and scene format conversion method | |
CN114254696A (en) | Visible light, infrared and radar fusion target detection method based on deep learning | |
Zhao et al. | Road network extraction from airborne LiDAR data using scene context | |
CN113989797A (en) | Three-dimensional dynamic target detection method and device based on voxel point cloud fusion | |
EP4377913A1 (en) | Training method for training a change detection system, training set generating method therefor, and change detection system | |
US20220410381A1 (en) | Systems and methods for picking objects using 3-d geometry and segmentation | |
CN117058646B (en) | Complex road target detection method based on multi-mode fusion aerial view | |
CN114639115A (en) | 3D pedestrian detection method based on fusion of human body key points and laser radar | |
CN114298151A (en) | 3D target detection method based on point cloud data and image data fusion | |
CN116612468A (en) | Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism | |
CN116664856A (en) | Three-dimensional target detection method, system and storage medium based on point cloud-image multi-cross mixing | |
Jindal et al. | Bollard segmentation and position estimation from lidar point cloud for autonomous mooring | |
CN114494248A (en) | Three-dimensional target detection system and method based on point cloud and images under different visual angles | |
Zhang et al. | Infrastructure 3D Target detection based on multi-mode fusion for intelligent and connected vehicles | |
CN117372991A (en) | Automatic driving method and system based on multi-view multi-mode fusion | |
CN116817891A (en) | Real-time multi-mode sensing high-precision map construction method | |
Lertniphonphan et al. | 2D to 3D label propagation for object detection in point cloud
CN116740514A (en) | Space-time error tolerant multi-agent cooperative sensing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190329 |