CN108765444A

CN108765444A - Monocular-vision-based detection and localization method for a moving T-shaped ground target

Info

Publication number: CN108765444A
Application number: CN201810520560.8A
Authority: CN
Inventors: 侯谊; 贺风华; 姚郁; 马杰; 郝宁
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2018-05-25
Filing date: 2018-05-25
Publication date: 2018-11-06

Abstract

A monocular-vision-based method for detecting and localizing a moving T-shaped ground target, relating to the technical field of target detection. The invention enables a rotor-wing unmanned aerial vehicle (UAV) to accurately obtain the position and direction of a T-shaped plate on the ground. First, a number of positive and negative sample images of the target are collected, and a classifier is trained on them. Then, each image captured by the camera is scanned with detection windows of a given size to obtain different regions of the image; HOG features are extracted from each region and fed into the trained classifier to decide whether the region is the target. Each region classified as the target undergoes a color-space conversion and threshold-based image segmentation to produce a binary map, on which shape analysis is performed to obtain the direction of the T-shaped plate. Finally, a coordinate transformation based on the camera's attitude and height yields the position and direction of the target relative to the aircraft. The invention is suitable for a UAV that needs the relative position and direction of motion of a moving ground target.

Description

Monocular-vision-based detection and localization method for a moving T-shaped ground target

Technical field

The present invention relates to a method for detecting and localizing a ground target from a rotor-wing UAV, and belongs to the technical field of target detection.

Background art

The rotor-wing UAV is a vertical take-off and landing aircraft with a novel design and excellent performance. It is simple in structure, flexible to operate, and has a strong payload capacity, and it is widely used in military and civil fields such as reconnaissance and patrol, natural-disaster monitoring, and support missions.

For a UAV, detecting and localizing ground targets is the basis for interacting with the environment, and it raises the level of automation and intelligence of flight control. During mission execution, target detection and localization improve the UAV's ability to carry out tasks autonomously.

Several techniques already exist for target detection, and they fall into two main classes. The first class uses motion information, for example background subtraction, but such methods assume a static background. The camera on an aircraft moves constantly, so the background changes in real time while the target is also moving, which makes detection with these methods difficult. The second class uses the target's intrinsic features, such as color and edge features, to separate it from the background. Color, edges, and texture are intrinsic properties of real objects and describe the difference between target and background well. Color images carry rich color information, are easy to use, and allow good real-time performance, but many factors, such as illumination, shadows, and the characteristics of the camera itself, affect color-based segmentation. Edges are higher-level features obtained by jointly processing many pixels; they describe the object's contour well, and edge-based detection algorithms are accurate and robust, but computationally expensive.

Summary of the invention

The present invention proposes a monocular-vision-based method for detecting and localizing a moving T-shaped ground target, so that a rotor-wing UAV can accurately obtain the position and direction of a T-shaped plate on the ground, improving the accuracy of T-shaped target detection and localization. The T-shaped plate used in the invention is shown in Fig. 2.

To solve the above technical problem, the present invention adopts the following technical solution:

A monocular-vision-based method for detecting and localizing a moving T-shaped ground target comprises the following steps:

Step 1: collect a number of positive and negative samples, extract HOG features from them, and assign labels; feed the features and labels into an SVM (support vector machine) for training, and save the trained classifier;

Step 2: input the image captured by the camera on the UAV. The camera is rigidly attached to the UAV and looks straight down, so its height above the ground is available; given the known height of the target above the ground, the distance from the camera to the target is obtained;

Given the known real-world size of the target, $l_x \times l_y$, the camera height $h$, and the focal lengths $f_x, f_y$ of the fixed-focus camera, the size of the target in the image is computed. The number of pixels the target spans along the image x and y axes is:

$$n_x = \frac{f_x\, l_x}{h}, \qquad n_y = \frac{f_y\, l_y}{h}$$

Windows of predefined size are then used to crop regions from the image;

where $f_x$ is the focal length of the fixed-focus camera along the x axis and $f_y$ its focal length along the y axis, both in pixels;
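The pinhole relation $n_x = f_x l_x / h$, $n_y = f_y l_y / h$ can be sketched as a small helper that also returns the nominal window side $w = \min(n_x, n_y)$ used later in Step 2. This is an illustration only: the function name `expected_target_size` and the example numbers (a 0.5 m by 0.3 m plate seen from 2 m with 600-pixel focal lengths) are hypothetical.

```python
def expected_target_size(l_x, l_y, h, f_x, f_y):
    """Project a target of real size l_x by l_y (metres), seen straight down
    from distance h (metres), through a camera with focal lengths f_x, f_y
    (pixels). Returns (n_x, n_y, w) in pixels (hypothetical helper)."""
    if h <= 0:
        raise ValueError("height h must be positive")
    n_x = f_x * l_x / h   # pixels spanned along the image x axis
    n_y = f_y * l_y / h   # pixels spanned along the image y axis
    w = min(n_x, n_y)     # nominal side length used to size detection windows
    return n_x, n_y, w


n_x, n_y, w = expected_target_size(0.5, 0.3, 2.0, 600.0, 600.0)
print(n_x, n_y, w)
```

Because the camera is treated as looking straight down, the same h applies to both axes; a large attitude change would break this approximation.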

Step 3: extract HOG features from each region cropped in Step 2 and feed them into the classifier produced in Step 1; the classifier outputs whether the region is the target;

Step 4: for each region classified as the target, perform a color-space conversion and threshold-based image segmentation on the color information to generate a binary map that separates the T-shaped plate from the background;

Step 5: fit a rectangle to the segmentation result in the binary map so that the T-shaped plate is represented by a rectangle. The long-side and short-side information of the fitted rectangle gives the accurate position of the target and its candidate directions, and the color information at the four corners of the rectangular region determines the exact direction of the T-shaped plate;

Step 6: from the target's position in the image determined in Step 3, solve for the target's position relative to the aircraft. Because the camera is fixed to the aircraft, the camera's attitude and height are obtained from the aircraft's attitude and height; the camera imaging model and the rotation transformations then give the relative position of the target and the aircraft.

Further, Step 1 comprises the following steps:

Step 1A: collect a number of positive and negative samples, extract HOG features from them, and assign labels;

Images of the target are collected from different angles and under different lighting conditions. A screenshot tool is used to crop square regions containing the complete target, with the target occupying more than 60% of the crop; these images are assigned label 1 and serve as positive samples. Images that do not contain the target are assigned label -1 and serve as negative samples. The ratio of positive to negative samples is 1:3. All images are resized to the same size;

Step 1B: extract HOG features from each image as follows. Gamma correction is applied to normalize the color space of the input image; then the horizontal gradient $D_x$ and vertical gradient $D_y$ of each pixel are computed, together with the gradient direction θ:

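$$D_x(x, y) = I(x+1, y) - I(x-1, y)$$

$$D_y(x, y) = I(x, y+1) - I(x, y-1)$$

$$G(x, y) = \sqrt{D_x(x, y)^2 + D_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{D_y(x, y)}{D_x(x, y)}$$

where $I(x, y)$ is the gray value of the pixel at coordinates (x, y) and $G(x, y)$ is the gradient magnitude;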
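The gamma normalization and central-difference gradients above can be sketched with NumPy as follows. Assumptions, not taken from the patent: the image is a grayscale array indexed `I[y, x]`, gamma = 0.5 (square-root compression, a common HOG choice), border pixels get zero gradient, and θ is folded into the unsigned range [0°, 180°) using `arctan2`. The function names are hypothetical.

```python
import numpy as np


def gamma_normalize(gray, gamma=0.5):
    """Gamma-compress a grayscale image with values in [0, 255].
    gamma=0.5 is an assumed value, not one stated in the patent."""
    return 255.0 * (np.asarray(gray, dtype=np.float64) / 255.0) ** gamma


def gradients(I):
    """Central differences Dx(x, y) = I(x+1, y) - I(x-1, y) and
    Dy(x, y) = I(x, y+1) - I(x, y-1); the array is indexed I[y, x].
    Border pixels keep a zero gradient."""
    I = np.asarray(I, dtype=np.float64)
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[:, 1:-1] = I[:, 2:] - I[:, :-2]   # neighbours left/right along x
    dy[1:-1, :] = I[2:, :] - I[:-2, :]   # neighbours above/below along y
    magnitude = np.hypot(dx, dy)          # G(x, y)
    # unsigned orientation in degrees, folded into [0, 180)
    theta = np.degrees(np.arctan2(dy, dx)) % 180.0
    return dx, dy, magnitude, theta


ramp = np.tile(np.arange(5, dtype=float), (4, 1))   # brightness increases along x
dx, dy, mag, theta = gradients(gamma_normalize(ramp * 50))
```

Using `arctan2` instead of a plain arctangent avoids division by zero when $D_x = 0$.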

The image is divided into n×n cells, each containing m×m pixels. A gradient histogram is built for each cell: the range of gradient directions is divided into k intervals, which form the horizontal axis of the histogram, and the value of each interval is the sum of the gradient magnitudes of the pixels whose direction falls into it. These per-interval sums, counted within each cell, form the vertical axis of the histogram, so each cell contributes k feature values. The features of j×j adjacent cells (j < n) are concatenated to form a block B, and the feature vector of the block is normalized, giving j×j×k values. Blocks are scanned across the image with a preset stride of one or more cells, and the features of all blocks together form the final HOG feature;
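The cell-histogram and block-normalization procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes a grayscale NumPy array, unsigned orientations in [0°, 180°), hard (non-interpolated) binning, L2 normalization per block, and defaults m = 8, j = 2, k = 9 and a one-cell block stride, which are common HOG settings rather than values stated in the patent. The name `hog_descriptor` is hypothetical.

```python
import numpy as np


def hog_descriptor(I, m=8, j=2, k=9, block_step=1, eps=1e-6):
    """Minimal HOG: m*m-pixel cells, k orientation bins over [0, 180),
    j*j-cell blocks moved by block_step cells, L2-normalized per block."""
    I = np.asarray(I, dtype=np.float64)
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[:, 1:-1] = I[:, 2:] - I[:, :-2]
    dy[1:-1, :] = I[2:, :] - I[:-2, :]
    mag = np.hypot(dx, dy)
    theta = np.degrees(np.arctan2(dy, dx)) % 180.0
    # bin index of each pixel; the clamp guards against theta rounding to 180
    bins = np.minimum((theta / (180.0 / k)).astype(int), k - 1)

    n_rows, n_cols = I.shape[0] // m, I.shape[1] // m
    cells = np.zeros((n_rows, n_cols, k))
    for r in range(n_rows):
        for c in range(n_cols):
            b = bins[r * m:(r + 1) * m, c * m:(c + 1) * m].ravel()
            g = mag[r * m:(r + 1) * m, c * m:(c + 1) * m].ravel()
            # each bin holds the summed gradient magnitude of its pixels
            cells[r, c] = np.bincount(b, weights=g, minlength=k)

    features = []
    for r in range(0, n_rows - j + 1, block_step):
        for c in range(0, n_cols - j + 1, block_step):
            v = cells[r:r + j, c:c + j].ravel()   # j*j*k values for block B
            features.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(features)


feat = hog_descriptor(np.random.default_rng(0).random((64, 64)) * 255)
print(feat.shape)   # 8x8 cells -> 7x7 blocks of 36 values each
```

For a 64×64 image with these defaults there are 7×7 blocks of 36 values each, so the descriptor length is 1764.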

Step 1C: train the classifier. The image labels produced in Step 1A and the image feature vectors produced in Step 1B are input into an SVM classifier for training, producing the classifier used for classification.
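The patent does not state which SVM variant or library is used. As an assumption, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method in plain NumPy, so that it needs no SVM library. The bias is handled by appending a constant 1 to every feature vector. `train_linear_svm` and `svm_predict` are hypothetical names, and λ = 0.01 with 50 epochs are illustrative settings.

```python
import numpy as np


def _augment(X):
    """Append a constant 1 to each feature vector so the bias is part of w."""
    X = np.asarray(X, dtype=np.float64)
    return np.hstack([X, np.ones((len(X), 1))])


def train_linear_svm(X, y, lam=1e-2, epochs=50, seed=0):
    """Pegasos training of a linear SVM on features X (N, D) with
    labels y in {+1, -1} (1 = target, -1 = background)."""
    Xa = _augment(X)
    y = np.asarray(y, dtype=np.float64)
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            violated = y[i] * (Xa[i] @ w) < 1.0      # hinge loss active?
            w *= (1.0 - eta * lam)                   # regularization shrink
            if violated:
                w += eta * y[i] * Xa[i]              # sub-gradient step
    return w


def svm_predict(w, X):
    """Return 1 (target) or -1 (background) for each row of X."""
    return np.where(_augment(X) @ w >= 0.0, 1, -1)
```

A usage example that mirrors the 1:3 positive-to-negative ratio of Step 1A, with toy 2-D features standing in for HOG vectors:

```python
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.3, (20, 2)), rng.normal(-2.0, 0.3, (60, 2))])
y = np.array([1] * 20 + [-1] * 60)
w = train_linear_svm(X, y)
```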

Further, in Step 2:

The nominal side length of the target in the image is set to $w = \min(n_x, n_y)$, and the side length of the target in the image is estimated to lie between $w/2$ and $2w$;

Image regions are cropped as follows. First, the side of the detection window is set to w/2, half the nominal target side, and the window is moved over the input image with a fixed stride, typically 1/2 or 1/4 of the window side; each cropped region is saved, until the window has covered the whole input image. Then the window is enlarged by a fixed factor, for example 1.1, and again moved across the entire image with a fixed stride. The window keeps being enlarged and regions cropped until its side exceeds the estimated maximum target side, 2w; cropping then stops, and all cropped regions are saved.
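The multi-scale window scan above can be sketched as a generator of square windows `(x, y, side)`. Assumptions: the scale factor is 1.1 and the stride is 1/4 of the current side (both options the text mentions), window sides are rounded to whole pixels, and windows larger than the image are skipped. The name `sliding_windows` is hypothetical.

```python
def sliding_windows(img_w, img_h, w, scale=1.1, step_ratio=0.25):
    """Yield square windows (x, y, side) in pixels. The side starts at w/2
    and grows by `scale` per pass until it exceeds 2*w; within a pass the
    window moves by step_ratio * side in both directions."""
    side = w / 2.0
    while side <= 2.0 * w:
        s = int(round(side))
        step = max(1, int(round(step_ratio * s)))
        if s <= min(img_w, img_h):          # skip windows larger than the image
            for y in range(0, img_h - s + 1, step):
                for x in range(0, img_w - s + 1, step):
                    yield x, y, s
        side *= scale


wins = list(sliding_windows(100, 100, 40))
print(len(wins), min(s for _, _, s in wins), max(s for _, _, s in wins))
```

Each yielded window would be cropped, resized to the training-sample size, and passed to the HOG extractor and classifier of Step 3. Bounding the search to [w/2, 2w] is what lets the height information reduce the number of windows tested.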

Further, Step 4 comprises the following steps:

For each target area obtained in Step 3, the corresponding region of the original image is cropped and converted from the RGB color space to the HSV color space according to:

$$V = \max(R, G, B)$$

$$S = \begin{cases} \dfrac{V - \min(R, G, B)}{V}, & V \neq 0 \\ 0, & V = 0 \end{cases}$$

$$H = \begin{cases} \dfrac{60\,(G - B)}{V - \min(R, G, B)}, & V = R \\ 120 + \dfrac{60\,(B - R)}{V - \min(R, G, B)}, & V = G \\ 240 + \dfrac{60\,(R - G)}{V - \min(R, G, B)}, & V = B \end{cases}$$

$$\text{if } H < 0:\ H = H + 360$$

where R, G, B are the pixel's values on the r, g, b channels, and H, S, V are its values on the h, s, v channels.
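A per-pixel version of the RGB-to-HSV conversion above can be sketched as follows. Assumptions: R, G, B are floats in [0, 1], H is returned in degrees in [0, 360), S and V in [0, 1], and the hue of a gray pixel (max = min, where H is undefined) is set to 0. `rgb_to_hsv` is a hypothetical name.

```python
def rgb_to_hsv(r, g, b):
    """Convert one pixel with r, g, b in [0, 1] to (H in degrees, S, V)."""
    v = max(r, g, b)
    delta = v - min(r, g, b)
    s = delta / v if v > 0 else 0.0
    if delta == 0:
        h = 0.0                                  # gray: hue undefined, use 0
    elif v == r:
        h = 60.0 * (g - b) / delta
    elif v == g:
        h = 120.0 + 60.0 * (b - r) / delta
    else:
        h = 240.0 + 60.0 * (r - g) / delta
    if h < 0:
        h += 360.0                               # fold negative hues into [0, 360)
    return h, s, v


print(rgb_to_hsv(1.0, 0.0, 0.5))   # a magenta-red pixel
```

The final `h < 0` correction is the `if (H < 0) H = H + 360` rule from the text; it only triggers in the V = R branch when B > G.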

Different color channels are used for red and blue targets:

A red T-shaped plate is segmented on the H channel, and a blue target on the V channel. A pixel of the cropped image whose value lies in the set interval $(I_l, I_h)$ is a target point, and all other pixels are background points. Setting target pixels to 255 and background pixels to 0 gives the segmented binary map:

$$B(x, y) = \begin{cases} 255, & I_l < I(x, y) < I_h \\ 0, & \text{otherwise} \end{cases}$$
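The interval threshold above can be sketched in NumPy as follows, assuming the chosen channel (H for a red plate, V for a blue one) is already available as an array. The open interval follows the text. `binarize` is a hypothetical name, and the thresholds in the example are illustrative only.

```python
import numpy as np


def binarize(channel, i_low, i_high):
    """Pixels strictly inside (i_low, i_high) become 255 (target);
    all other pixels become 0 (background)."""
    channel = np.asarray(channel)
    mask = (channel > i_low) & (channel < i_high)
    return np.where(mask, 255, 0).astype(np.uint8)


print(binarize([[10, 50, 100]], 20, 80))
```

A caveat on this sketch: red hues cluster near both 0° and 360°, so in practice a single interval on H may need to wrap around 360°. The function does not handle that case.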

Further, in Step 5:

From the generated binary map, which contains only the T-shaped plate, the minimum bounding rectangle of the plate is obtained by rectangle fitting. The T-shaped plate has a length-to-width ratio of 5:3, so the long-side and short-side information of the bounding rectangle establishes that the robot's direction of motion is parallel to the line through short side AC. The two upper circles a and b do not intersect the T-shaped plate, while the two lower circles c and d lie entirely on the plate. The direction along that line is resolved by counting the black and white pixels of the binary map inside the four circles:

$$\mathrm{Number}(a) + \mathrm{Number}(b) > \mathrm{Number}(c) + \mathrm{Number}(d)$$

where Number(x) is the number of white pixels of the binary map inside circle x. When this inequality holds, the direction of motion is from C to A.
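The corner-circle test above can be sketched as follows. The circle centres and radius are hypothetical inputs that would in practice come from the corners of the fitted rectangle (Fig. 3); the sketch only implements the counting and the inequality as stated. `white_in_circle` and `heading_is_c_to_a` are hypothetical names.

```python
import numpy as np


def white_in_circle(binary, cx, cy, radius):
    """Number(x): count of white (255) pixels of `binary` inside the
    circle centred at column cx, row cy."""
    h, w = binary.shape
    yy, xx = np.mgrid[0:h, 0:w]
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    return int(np.count_nonzero(binary[inside] == 255))


def heading_is_c_to_a(binary, circles, radius):
    """circles maps 'a', 'b', 'c', 'd' to (cx, cy) centres. Returns True when
    Number(a) + Number(b) > Number(c) + Number(d), i.e. motion from C to A."""
    n = {key: white_in_circle(binary, *circles[key], radius) for key in "abcd"}
    return n["a"] + n["b"] > n["c"] + n["d"]


binary = np.zeros((20, 20), dtype=np.uint8)
binary[0:6, :] = 255                                    # toy "bar" of the T
circles = {"a": (3, 3), "b": (16, 3), "c": (3, 16), "d": (16, 16)}
print(heading_is_c_to_a(binary, circles, 2))
```

Comparing summed counts rather than individual circles makes the decision tolerant of a few misclassified pixels near one corner.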

Further, the target position information is obtained as follows:

The coordinates of the target point in the pixel coordinate system are known to be (u, v). The height difference h (h > 0) between the target point and the camera can be measured by ultrasonic ranging. The transformations between the camera, body, and world coordinate systems can be expressed by rotation matrices;

According to the camera imaging model, the coordinates of the target point in the camera coordinate system satisfy:

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_{ca} \\ Y_{ca} \\ Z_{ca} \end{bmatrix}$$

where s is a scale factor, M is the intrinsic matrix, and $[X_{ca}\ Y_{ca}\ Z_{ca}]^T$ are the coordinates of the target point in the camera coordinate system;

The transformations between the image and body coordinate systems, and between the body and competition-field coordinate systems, are each given by a rotation matrix: one rotation matrix relates the image coordinate system to the body coordinate system, and R relates the body coordinate system to the competition-field coordinate system. Composing these transformations expresses the target's competition-field coordinates in terms of its camera coordinates.

The Z axis of the moving competition-field coordinate system points straight down, so Z = h. Substituting this into the composed transformation, and since M and R are known, the value of s can be solved, from which X and Y follow; these give the target position.
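The solve-for-s step above amounts to intersecting the pixel's viewing ray with the plane Z = h. A NumPy sketch under stated assumptions: `R` here stands for the full camera-to-field rotation (the product of the body-to-field rotation R and the image-to-body rotation), the field-frame origin is at the camera so X and Y are offsets relative to the aircraft, and the field Z axis points down. `ground_position` and the example intrinsics are hypothetical.

```python
import numpy as np


def ground_position(u, v, M, R, h):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = h.
    M: 3x3 intrinsic matrix; R: 3x3 camera-to-field rotation (field Z down);
    h: camera height above the target (> 0). Returns (X, Y, s), where s is
    the scale in s*[u, v, 1]^T = M*[X_ca, Y_ca, Z_ca]^T."""
    ray_cam = np.linalg.solve(M, np.array([u, v, 1.0]))   # M^-1 [u v 1]^T
    ray_field = R @ ray_cam                                # rotate into field frame
    if ray_field[2] <= 0:
        raise ValueError("pixel ray does not point toward the ground")
    s = h / ray_field[2]           # choose s so that the field Z equals h
    X, Y, _ = s * ray_field
    return X, Y, s


M = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
print(ground_position(470, 240, M, np.eye(3), 2.0))   # camera looking straight down
```

With R equal to the identity (camera looking straight down, level aircraft), a pixel 150 px right of the principal point at 2 m height maps to X = 2 × 150 / 600 = 0.5 m.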

The advantages of the invention are as follows:

The present invention is a method that uses a monocular camera to detect and localize a red or green T-shaped plate, so that a rotor-wing UAV can detect and localize a moving ground target. First, a number of positive and negative sample images of the target are collected, and a classifier is trained on them. Then, each image captured by the camera is scanned with detection windows of a given size to obtain different regions; HOG features are extracted from each region and fed into the trained classifier to decide whether the region is the target. Each region classified as the target undergoes a color-space conversion and threshold-based segmentation to produce a binary map, on which shape analysis is performed to obtain the direction of the T-shaped plate. Finally, a coordinate transformation based on the camera's attitude and height yields the position and direction of the target relative to the aircraft. The invention is suitable for a UAV that needs the relative position and direction of motion of a moving ground target. It achieves high accuracy, specifically:

1. Detection accuracy for the T-shaped plate on the ground reaches 90% or more.

2. The error in estimating the direction of the T-shaped plate is below 10°, and the position error is below 3 cm.

3. On a computer with an Intel Core i7-6500U processor at 2.6 GHz, the method processes about 5 frames per second, which meets the real-time requirement.

4. The aircraft's height information is used to optimize the detection algorithm, which improves detection efficiency.

Description of the drawings

Fig. 1 is the flowchart of the embodiment;

Fig. 2 shows the T-shaped plate to be detected;

Fig. 3 is a schematic diagram of extracting the direction of the T-shaped plate;

Fig. 4 shows collected green positive-sample images;

Fig. 5 shows collected red positive-sample images;

Fig. 6 shows the coordinate-transformation process used to solve for the position.

Detailed description of the embodiments

The steps of the monocular-vision-based method for detecting and localizing a moving T-shaped ground target described in this embodiment are as follows:

Step 2: input the image captured by the camera on the UAV. The camera looks straight down and is rigidly attached to the UAV, so the camera's height above the ground is available; assuming the target's height above the ground is known, the distance from the camera to the target can be obtained. Because the UAV's attitude varies only within a small range, the camera can be regarded approximately as looking straight down. The real-world size of the target, $l_x \times l_y$, and the camera height h are known, and since a fixed-focus camera is used, $f_x$ and $f_y$ are also known. From these, the size of the target in the image, i.e., the number of pixels it spans along the image x and y axes, is:

$$n_x = \frac{f_x\, l_x}{h}, \qquad n_y = \frac{f_y\, l_y}{h}$$

Windows of a specific size are then used to crop regions from the image;

Step 3: extract HOG features from the regions cropped in Step 2 and feed them into the classifier produced in Step 1; the classifier outputs whether the region is the target;

Step 6: from the target's coordinates in the image determined in Step 3, solve for the target's position relative to the aircraft. Because the camera is fixed to the aircraft, the camera's attitude and height are obtained from the aircraft's attitude and height; the camera imaging model and the rotation transformations then give the relative position of the target and the aircraft.

Each step is described in further detail below:

Step 1A: collect a number of positive and negative samples, extract HOG features from them, and assign labels. First, images of the target are collected from different angles and under different lighting conditions. A screenshot tool is used to crop square regions containing the complete target, with the target occupying more than 60% of the crop; these images, examples of which are shown in Fig. 4 and Fig. 5, are assigned label 1 and serve as positive samples. Images that do not contain the target are assigned label -1 and serve as negative samples. The ratio of positive to negative samples is about 1:3. All images are then resized to the same size and converted to grayscale.

Step 1B: extract HOG features from each image. Gamma correction can be applied to normalize the color space of the input image; then the horizontal gradient $D_x$ and vertical gradient $D_y$ of each pixel are computed, together with the gradient direction θ:

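$$D_x(x, y) = I(x+1, y) - I(x-1, y)$$

$$D_y(x, y) = I(x, y+1) - I(x, y-1)$$

$$G(x, y) = \sqrt{D_x(x, y)^2 + D_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{D_y(x, y)}{D_x(x, y)}$$

where $I(x, y)$ is the gray value of the pixel at coordinates (x, y) and $G(x, y)$ is the gradient magnitude.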

The image is divided into n×n cells, each containing m×m pixels. A gradient histogram is built for each cell: the range of gradient directions is divided into k intervals, which form the horizontal axis of the histogram, and the value of each interval is the sum of the gradient magnitudes of the pixels whose direction falls into it. These per-interval sums, counted within each cell, form the vertical axis of the histogram, so each cell contributes k feature values. The features of j×j adjacent cells (j < n) are concatenated to form a block B, and the feature vector of the block is normalized, giving j×j×k values. Blocks are scanned across the image with a stride of a few cells, and the features of all blocks together form the final HOG feature.

Step 1C: train the classifier. The image labels produced in Step 1A and the image feature vectors produced in Step 1B are input into an SVM classifier for training, producing the classifier used for classification.

The nominal side length of the target in the image is set to $w = \min(n_x, n_y)$. The target's side length in the image is therefore estimated to lie between $w/2$ and $2w$.

Image regions are cropped as follows. First, the side of the detection window is set to w/2, half the nominal target side, and the window is moved over the input image with a fixed stride, typically 1/2 or 1/4 of the window side; each cropped region is saved, until the window has covered the whole input image. Then the window is enlarged by a fixed factor, which can be set to 1.1, and again moved across the entire image with a fixed stride. The window keeps being enlarged and regions cropped until its side exceeds the estimated maximum target side, 2w; cropping then stops, and all cropped regions are saved.

Step 3: using the HOG feature extraction method of Step 1B, extract HOG features from each region generated in Step 2 to obtain the region's feature vector, and input the vector into the classifier produced in Step 1C. The classifier outputs the class label: if the label is 1, the region is a target region; if it is -1, the region is background.

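Step 4: for each target area obtained in Step 3, the corresponding region of the original image is cropped and converted from the RGB color space to the HSV color space according to:

$$V = \max(R, G, B)$$

$$S = \begin{cases} \dfrac{V - \min(R, G, B)}{V}, & V \neq 0 \\ 0, & V = 0 \end{cases}$$

$$H = \begin{cases} \dfrac{60\,(G - B)}{V - \min(R, G, B)}, & V = R \\ 120 + \dfrac{60\,(B - R)}{V - \min(R, G, B)}, & V = G \\ 240 + \dfrac{60\,(R - G)}{V - \min(R, G, B)}, & V = B \end{cases}$$

$$\text{if } H < 0:\ H = H + 360$$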

For red and blue target, using different Color Channels.Red T-type plate is split using the channels H, right In blue target, it is split using the channels V.For the pixel in interception picture, pixel value meets set interval (I_l, I_h) be target point, other point be background dot.The pixel value of target point is set as 255, the pixel value of background dot is set It is set to 0, you can the binary map after being divided.

Step 5: from the binary map generated in Step 4, which contains only the T-shaped plate, the minimum bounding rectangle of the plate is obtained by rectangle fitting. The T-shaped plate has a length-to-width ratio of 5:3, so the long-side and short-side information of the bounding rectangle establishes that the robot's direction of motion is parallel to the line through short side AC, as shown in Fig. 3. The two upper circles a and b do not intersect the T-shaped plate, while the two lower circles c and d lie entirely on the plate. Counting the black and white pixels of the binary map inside the four circles resolves the direction along that line:

$$\mathrm{Number}(a) + \mathrm{Number}(b) > \mathrm{Number}(c) + \mathrm{Number}(d)$$

where Number(x) is the number of white pixels of the binary map inside circle x. When this inequality holds, the direction of motion is from C to A.

Step 6: the coordinates of the target point in the pixel coordinate system are known to be (u, v). The height difference h (h > 0) between the target point and the camera can be measured by ultrasonic ranging. The transformations between the camera, body, and world coordinate systems can be expressed by rotation matrices; the transformation process is shown in Fig. 6.

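According to the camera imaging model, the coordinates of the target point in the camera coordinate system are obtained from:

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_{ca} \\ Y_{ca} \\ Z_{ca} \end{bmatrix}$$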

where s is a scale factor, M is the intrinsic matrix, and $[X_{ca}\ Y_{ca}\ Z_{ca}]^T$ are the coordinates of the target point in the camera coordinate system.

One rotation matrix relates the image coordinate system to the body coordinate system, and R is the rotation matrix between the body coordinate system and the competition-field coordinate system. Composing these transformations expresses the target's competition-field coordinates in terms of its camera coordinates.

The Z axis of the moving competition-field coordinate system points straight down, so Z = h. Substituting this into the composed transformation, and since M and R are known, the scale s can be solved, and from it the target position X, Y.

Claims

1. A monocular-vision-based method for detecting and localizing a moving T-shaped ground target, characterized by comprising the following steps:

Step 1: collect a number of positive and negative samples, extract HOG features from them, and assign labels; feed the features and labels into an SVM for training, and save the trained classifier;

Step 2: input the image captured by the camera on the UAV. The camera is rigidly attached to the UAV and looks straight down, so its height above the ground is available; given the known height of the target above the ground, the distance from the camera to the target is obtained;

Given the known real-world size of the target, $l_x \times l_y$, the camera height $h$, and the focal lengths $f_x, f_y$ of the fixed-focus camera, the size of the target in the image is computed. The number of pixels the target spans along the image x and y axes is:

$$n_x = \frac{f_x\, l_x}{h}, \qquad n_y = \frac{f_y\, l_y}{h}$$

where $f_x$ is the focal length of the fixed-focus camera along the x axis and $f_y$ its focal length along the y axis, both in pixels;

Step 3: extract HOG features from each region cropped in Step 2 and feed them into the classifier produced in Step 1; the classifier outputs whether the region is the target;

Step 4: for each region classified as the target, perform a color-space conversion and threshold-based image segmentation on the color information to generate a binary map that separates the T-shaped plate from the background;

Step 5: fit a rectangle to the segmentation result in the binary map so that the T-shaped plate is represented by a rectangle. The long-side and short-side information of the fitted rectangle gives the accurate position of the target and its candidate directions, and the color information at the four corners of the rectangular region determines the exact direction of the T-shaped plate;

Step 6: from the target's position in the image determined in Step 3, solve for the target's position relative to the aircraft. Because the camera is fixed to the aircraft, the camera's attitude and height are obtained from the aircraft's attitude and height; the camera imaging model and the rotation transformations then give the relative position of the target and the aircraft.

2. The monocular-vision-based method for detecting and localizing a moving T-shaped ground target according to claim 1, characterized in that Step 1 comprises the following steps:

Step 1A: collect a number of positive and negative samples, extract HOG features from them, and assign labels;

Images of the target are collected from different angles and under different lighting conditions. A screenshot tool is used to crop square regions containing the complete target, with the target occupying more than 60% of the crop; these images are assigned label 1 and serve as positive samples. Images that do not contain the target are assigned label -1 and serve as negative samples. The ratio of positive to negative samples is 1:3. All images are resized to the same size;

Step 1B: extract HOG features from each image as follows. Gamma correction is applied to normalize the color space of the input image; then the horizontal gradient $D_x$ and vertical gradient $D_y$ of each pixel are computed, together with the gradient direction θ:

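$$D_x(x, y) = I(x+1, y) - I(x-1, y)$$

$$D_y(x, y) = I(x, y+1) - I(x, y-1)$$

$$G(x, y) = \sqrt{D_x(x, y)^2 + D_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{D_y(x, y)}{D_x(x, y)}$$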

The image is divided into n×n cells, each containing m×m pixels. A gradient histogram is built for each cell: the range of gradient directions is divided into k intervals, which form the horizontal axis of the histogram, and the value of each interval is the sum of the gradient magnitudes of the pixels whose direction falls into it. These per-interval sums, counted within each cell, form the vertical axis of the histogram, so each cell contributes k feature values. The features of j×j adjacent cells (j < n) are concatenated to form a block B, and the feature vector of the block is normalized, giving j×j×k values. Blocks are scanned across the image with a preset stride of one or more cells, and the features of all blocks together form the final HOG feature;

Step 1C: train the classifier. The image labels produced in Step 1A and the image feature vectors produced in Step 1B are input into an SVM classifier for training, producing the classifier used for classification.

3. The monocular-vision-based method for detecting and localizing a moving T-shaped ground target according to claim 1, characterized in that, in Step 2:

The nominal side length of the target in the image is set to $w = \min(n_x, n_y)$, and the side length of the target in the image is estimated to lie between $w/2$ and $2w$;

Image regions are cropped as follows. First, the side of the detection window is set to w/2, half the nominal target side, and the window is moved over the input image with a fixed stride, typically 1/2 or 1/4 of the window side; each cropped region is saved, until the window has covered the whole input image. Then the window is enlarged by a fixed factor, for example 1.1, and again moved across the entire image with a fixed stride. The window keeps being enlarged and regions cropped until its side exceeds the estimated maximum target side, 2w; cropping then stops, and all cropped regions are saved.

4. The monocular-vision-based method for detecting and localizing a moving T-shaped ground target according to claim 1, characterized in that Step 4 comprises the following steps:

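For each target area obtained in Step 3, the corresponding region of the original image is cropped and converted from the RGB color space to the HSV color space according to:

$$V = \max(R, G, B)$$

$$S = \begin{cases} \dfrac{V - \min(R, G, B)}{V}, & V \neq 0 \\ 0, & V = 0 \end{cases}$$

$$H = \begin{cases} \dfrac{60\,(G - B)}{V - \min(R, G, B)}, & V = R \\ 120 + \dfrac{60\,(B - R)}{V - \min(R, G, B)}, & V = G \\ 240 + \dfrac{60\,(R - G)}{V - \min(R, G, B)}, & V = B \end{cases}$$

$$\text{if } H < 0:\ H = H + 360$$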

Different color channels are used for red and blue targets:

A red T-shaped plate is segmented on the H channel, and a blue target on the V channel. A pixel of the cropped image whose value lies in the set interval $(I_l, I_h)$ is a target point, and all other pixels are background points. Setting target pixels to 255 and background pixels to 0 gives the segmented binary map:

$$B(x, y) = \begin{cases} 255, & I_l < I(x, y) < I_h \\ 0, & \text{otherwise} \end{cases}$$

5. The monocular-vision-based method for detecting and localizing a moving T-shaped ground target according to claim 1, characterized in that, in Step 5:

From the generated binary map, which contains only the T-shaped plate, the minimum bounding rectangle of the plate is obtained by rectangle fitting. The T-shaped plate has a length-to-width ratio of 5:3, so the long-side and short-side information of the bounding rectangle establishes that the robot's direction of motion is parallel to the line through short side AC. The two upper circles a and b do not intersect the T-shaped plate, while the two lower circles c and d lie entirely on the plate. The direction along that line is resolved by counting the black and white pixels of the binary map inside the four circles:

$$\mathrm{Number}(a) + \mathrm{Number}(b) > \mathrm{Number}(c) + \mathrm{Number}(d)$$

6. The monocular-vision-based method for detecting and localizing a moving T-shaped ground target according to claim 1, characterized in that the target position information is obtained as follows:

The coordinates of the target point in the pixel coordinate system are known to be (u, v). The height difference h (h > 0) between the target point and the camera can be measured by ultrasonic ranging. The transformations between the camera, body, and world coordinate systems can be expressed by rotation matrices;

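According to the camera imaging model,

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_{ca} \\ Y_{ca} \\ Z_{ca} \end{bmatrix}$$

where s is a scale factor, M is the intrinsic matrix, and $[X_{ca}\ Y_{ca}\ Z_{ca}]^T$ are the coordinates of the target point in the camera coordinate system;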

The transformations between the image and body coordinate systems, and between the body and competition-field coordinate systems, are each given by a rotation matrix:

one rotation matrix relates the image coordinate system to the body coordinate system, and R is the rotation matrix between the body coordinate system and the competition-field coordinate system. Composing these transformations expresses the target's competition-field coordinates in terms of its camera coordinates.