CN115984766A - Rapid monocular vision three-dimensional target detection method for underground coal mine - Google Patents

Publication number: CN115984766A
Application number: CN202211571246.5A
Authority: CN (China)
Prior art keywords: target, coordinate system, dimensional, camera, calculating
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Original language: Chinese (zh)
Inventors: 赵佳琦, 王斌, 周勇, 芦志广, 阿卜杜穆塔利布·埃尔·萨迪克
Current assignee: China University of Mining and Technology (CUMT) (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: China University of Mining and Technology (CUMT)
Application filed by China University of Mining and Technology (CUMT)

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for rapid monocular-vision three-dimensional target detection in an underground coal mine, comprising the following steps: S1, acquire image data containing the specified targets with a camera installed underground, annotate two-dimensional boxes, and construct a target detection data set; S2, construct a two-dimensional target detection network model and train it on the constructed data set to obtain a trained two-dimensional target detection model; S3, define a world coordinate system, calibrate the camera's intrinsic parameters by the checkerboard calibration method, and complete the calibration of the extrinsic parameters between the camera and the world coordinate system with lidar assistance; S4, input an image into the two-dimensional target detection model to obtain a target's two-dimensional box, and calculate the target's three-dimensional coordinates by combining the camera's intrinsic and extrinsic parameters; S5, calculate the target's three-dimensional bounding box in the camera coordinate system from the target's three-dimensional coordinates, its heading angle along the roadway direction, and its length, width, and height.

Description

Rapid monocular vision three-dimensional target detection method for underground coal mine
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method for rapidly detecting a monocular vision three-dimensional target under a coal mine.
Background
In recent years, lidar-based algorithms have greatly improved the accuracy of three-dimensional target detection. However, lidar has drawbacks: it is costly and easily affected by the surrounding environment. When a lidar module fails, three-dimensional detection based on monocular camera data can improve the robustness of the detection system, so achieving reliable and accurate three-dimensional detection from camera data alone is important. Estimating a three-dimensional bounding box from an image alone is more challenging than lidar-based detection, since recovering three-dimensional information from two-dimensional input data is an ill-posed problem. Despite this inherent difficulty, image-based three-dimensional object detection methods have found widespread use in the computer vision community over the past few years.
At present, however, research on monocular three-dimensional target detection is concentrated in the field of autonomous driving, where the sensing camera is mounted on the vehicle; well-known data sets include KITTI, Waymo, and nuScenes, while very little research addresses fixed cameras or the underground coal mine environment. With advances in technology and rising safety requirements, automated devices such as inspection robots and autonomous transport vehicles have begun to appear in coal mines, and accurate positioning information is essential for them. Detecting the three-dimensional pose of autonomous vehicles in the roadway with a camera installed at the top of the roadway therefore aids vehicle positioning and helps managers monitor operations. For the complex underground environment of a coal mine, three-dimensional target detection must be performed with a monocular camera mounted at the top of the roadway, and existing methods have the following problems:
(1) A camera installed at the top of the roadway has a long, narrow sensing range from which little information can be extracted. Because the underground structure is long and narrow, a roadway-top camera observes much farther distances than a vehicle-mounted camera, and the proportion of a target in the image varies greatly. Moreover, because underground lighting is dim, the depth and visual features a neural network can extract are reduced, so existing methods struggle to accurately infer the three-dimensional information of a target.
(2) Existing methods rely on large amounts of labeled data for training, and the data acquisition and labeling process is time-consuming and labor-intensive.
(3) Although existing methods achieve high inference speed, they still involve a large amount of computation, depend on power-hungry high-performance graphics cards, and are difficult to port to underground equipment.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to overcome the defects of the prior art and provide a rapid three-dimensional target detection method using a monocular camera installed in an underground coal mine. Compared with existing monocular three-dimensional target detection methods, the disclosed method fully exploits the special environmental characteristics of the mine, places lower demands on device computing power, offers better real-time performance, and is better suited to the underground coal mine environment.
The technical scheme is as follows: to achieve the above aim, the invention provides a method for rapid monocular-vision three-dimensional target detection in an underground coal mine, comprising the following steps:
S1, construct a two-dimensional target detection data set: acquire a data set containing the specified targets with a monocular camera installed in the coal mine, determine the number of target classes in the data set, annotate the two-dimensional box and class of the targets in each image sample, randomly shuffle the images and their corresponding labels, and divide them into a training set and a test set;
S2, train a neural network model: train on the training set, test on the test set, and obtain an optimal neural network model through parameter tuning;
S3, calibrate the camera's intrinsic and extrinsic parameters: define a world coordinate system, calibrate the camera intrinsics by the checkerboard calibration method to obtain the camera projection matrix P, and complete the extrinsic calibration between the camera and the world coordinate system with lidar assistance to obtain the transformation matrix T_W2C from the world coordinate system to the camera coordinate system;
S4, calculate the target's three-dimensional coordinates: input the image into the two-dimensional target detection model obtained in step S2 to obtain the target's two-dimensional box, calculate the two-dimensional coordinate C_2d of the center of the box's bottom edge, and calculate the target's three-dimensional coordinate C by combining the camera's intrinsic and extrinsic parameters;
S5, calculate the target's three-dimensional bounding box: from the three-dimensional coordinate C obtained in step S4, the target's heading angle α along the roadway direction, and prior information on the target's length, width, and height [l, w, h], calculate the target's three-dimensional bounding box in the camera coordinate system.
Further, in step S1, the specific method for constructing the two-dimensional target detection data set is as follows:
S11, use a monocular camera installed in the underground coal mine to acquire image data of the different target classes in different environments, where "different environments" covers different illumination conditions and different target poses;
S12, determine the number of target classes to detect, annotate the acquired images with two-dimensional boxes using the labelImg tool, and export the labels; each picture corresponds to one label file containing the classes and two-dimensional bounding boxes of all targets in the picture. Divide the annotated data set into a training set and a test set, each comprising the pictures and their corresponding label files.
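The shuffle-and-split of S1/S12 can be sketched as follows. The 80/20 ratio, the fixed seed, and the paired image/label file names are illustrative assumptions, not values specified by the patent:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Randomly shuffle (image, label) pairs and split into train/test sets.

    `samples` is a list of (image_path, label_path) pairs; shuffling the
    pairs together keeps each image aligned with its label file.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Example with hypothetical file names:
pairs = [(f"img_{i:04d}.jpg", f"img_{i:04d}.txt") for i in range(10)]
train_set, test_set = split_dataset(pairs)
```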
Further, in step S2, the specific method for training the two-dimensional target detection model is as follows:
S21, train a neural network model that takes a target image as input and outputs the target classes and their two-dimensional bounding boxes in the image, using the training set; test and tune parameters on the test set to obtain the trained neural network model;
S22, accelerate the trained neural network model with TensorRT to further improve inference speed, obtaining the accelerated neural network model.
Further, in step S3, the calibration method for the camera's intrinsic and extrinsic parameters is as follows:
S31, calibrate the camera intrinsics by the checkerboard calibration method to obtain the camera projection matrix P;
S32, use a lidar to assist in calibrating the extrinsic matrix, i.e. the transformation matrix from the world coordinate system to the camera coordinate system; the lidar and the camera are installed at the same height. Calibrate the transformation between the camera and the lidar with a MATLAB tool to obtain the transformation matrix T_L2C from the lidar to the camera;
S33, define the world coordinate system: the origin is the projection of the lidar origin onto the ground, the xOy plane is the ground, the y-axis points forward along the roadway, the x-axis points right, and the z-axis points up, forming a right-handed coordinate system. Measure the lidar installation height h; from the definition of the world coordinate system, the translation vector between the lidar coordinate system and the world coordinate system is t_L2W = (0, 0, h);
S34, obtain the coordinates of the two farthest-apart points on the roadway edge in the point cloud, A_L = (x_1, y_1, z_1) and B_L = (x_2, y_2, z_2), where the subscript L denotes coordinates in the lidar coordinate system. The vector
    AB = B_L − A_L
is parallel to the y-axis of the world coordinate system. The cosine of the angle θ between AB and the direction vector of the lidar coordinate system's y-axis, y_L = (0, 1, 0), is:
    cos θ = normalize(AB) · normalize(y_L)
where normalize converts a vector into a unit vector and · denotes the dot product. The rotation axis from AB to y_L is obtained by the vector cross product:
    k = normalize(AB × y_L)
Let K denote the skew-symmetric (cross-product) matrix of k. By the Rodrigues rotation formula, the rotation matrix R_1 from AB to y_L is:
    R_1 = I + sin θ · K + (1 − cos θ) · K²
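The S34 construction can be sketched with NumPy as below; `rotation_between` implements the Rodrigues formula above and falls back to the identity for (near-)parallel vectors, and the two roadway-edge points are hypothetical values:

```python
import numpy as np

def rotation_between(a, b, eps=1e-9):
    """Rodrigues rotation matrix taking direction a onto direction b.

    R = I + sin(theta)*K + (1 - cos(theta))*K^2, where K is the
    skew-symmetric matrix of the unit rotation axis a x b.
    """
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_t = np.linalg.norm(axis)          # |a x b| = sin(theta) for unit vectors
    cos_t = float(np.dot(a, b))
    if sin_t < eps:                       # parallel (or anti-parallel) vectors:
        return np.eye(3)                  # identity; the 180-degree case would
                                          # need a perpendicular axis instead
    u = axis / sin_t
    K = np.array([[0, -u[2], u[1]],
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

# R_1 rotates the roadway direction AB onto the lidar y-axis:
A_L = np.array([1.0, 0.0, 0.0])           # hypothetical roadway-edge points
B_L = np.array([4.0, 4.0, 0.0])
R_1 = rotation_between(B_L - A_L, np.array([0.0, 1.0, 0.0]))
```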
S35, determine the ground normal vector to fix the rotation of the remaining two axes between the two coordinate systems. Fit the ground portion of the acquired point cloud with a plane-fitting method to obtain the ground normal vector n_L in the lidar coordinate system. Following the same rotation-matrix computation as in S34, calculate the rotation matrix R_2 from n_L to the normal of the lidar coordinate system's xOy plane, (0, 0, 1). The rotation matrix R_L2W from the lidar coordinate system to the world coordinate system is then:
    R_L2W = R_1 R_2
and the transformation matrix from the lidar coordinate system to the world coordinate system is:
    T_L2W = [ R_L2W  t_L2W ]
            [   0      1   ]
The transformation matrix from the world coordinate system to the lidar coordinate system is its inverse, T_W2L = T_L2W⁻¹.
S36, from T_L2C and T_W2L, calculate the transformation matrix T_W2C from the world coordinate system to the camera coordinate system:
    T_W2C = T_L2C · T_W2L
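The chaining in S35–S36 amounts to composing 4×4 homogeneous transforms (world → lidar → camera). A minimal sketch, where the rotations and the lidar height are placeholder values rather than calibration results:

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def inv_T(T):
    """Closed-form inverse of a rigid transform: [R^T, -R^T t; 0, 1]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)

def apply_T(T, p):
    """Apply a homogeneous transform to a 3D point."""
    return (T @ np.append(p, 1.0))[:3]

# T_L2W from the S34/S35 rotations and the measured lidar height h:
R_L2W = np.eye(3)                       # placeholder for R_1 @ R_2
h = 1.8                                 # placeholder installation height (m)
T_L2W = make_T(R_L2W, np.array([0.0, 0.0, h]))
T_W2L = inv_T(T_L2W)

# With a (placeholder) lidar-to-camera extrinsic, chain world -> lidar -> camera;
# for column vectors, p_C = T_L2C @ (T_W2L @ p_W), hence T_W2C = T_L2C @ T_W2L.
T_L2C = make_T(np.eye(3), np.array([0.1, 0.0, 0.0]))
T_W2C = T_L2C @ T_W2L
```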
Further, the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:
S41, use T_W2C to transform the world-frame origin O_W = (0, 0, 0) and the ground normal vector n_W = (0, 0, 1) into the camera coordinate system:
    O_C = T_W2C · O_W,    n = R_W2C · n_W
where O_W is taken in homogeneous coordinates and R_W2C is the rotation part of T_W2C. Write O_C = (x_o, y_o, z_o) and n = (n_1, n_2, n_3).
S42, from the point-normal equation of a plane, the ground equation G(x, y, z) in the camera coordinate system is:
    G(x, y, z): n_1(x − x_o) + n_2(y − y_o) + n_3(z − z_o) = 0
S43, input the target image into the neural network model obtained in step S2 to obtain the classes and two-dimensional bounding boxes of all targets, and calculate the bottom-edge center coordinate of the target's two-dimensional box, C_2d = (x_2d, y_2d);
S44, back-project with the pinhole camera model, expressing the three-dimensional point corresponding to C_2d as:
    C_3d = ( z_3d(x_2d − c_x)/f_x,  z_3d(y_2d − c_y)/f_y,  z_3d )
where f_x and f_y are the focal lengths along the camera's x- and y-axes and c_x, c_y are the principal point coordinates, all taken from the camera projection matrix P: f_x = P[0][0], f_y = P[1][1], c_x = P[0][2], c_y = P[1][2]; z_3d ∈ (0, ∞) is the depth;
S45, substitute C_3d into the ground equation and solve for the unknown z_3d:
    z_3d = (n_1 x_o + n_2 y_o + n_3 z_o) / ( n_1(x_2d − c_x)/f_x + n_2(y_2d − c_y)/f_y + n_3 )
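Steps S41–S45 reduce to intersecting the back-projected pixel ray with the ground plane. A sketch under illustrative values (the projection matrix, ground point, and normal below are placeholders, not a real calibration):

```python
import numpy as np

def backproject_to_ground(c2d, P, O_C, n):
    """Back-project the 2D bottom-edge center onto the ground plane.

    c2d = (x_2d, y_2d) pixel coordinates; P is the 3x4 projection matrix;
    O_C and n are a ground point (the world origin) and the ground normal
    in the camera frame. Solves n . C_3d = n . O_C for the depth z_3d.
    """
    x2d, y2d = c2d
    fx, fy = P[0][0], P[1][1]
    cx, cy = P[0][2], P[1][2]
    rx = (x2d - cx) / fx                 # ray direction (rx, ry, 1), scaled by depth
    ry = (y2d - cy) / fy
    n1, n2, n3 = n
    z3d = (n1 * O_C[0] + n2 * O_C[1] + n3 * O_C[2]) / (n1 * rx + n2 * ry + n3)
    return np.array([z3d * rx, z3d * ry, z3d])

# Illustrative setup: camera 2 m above the ground, optical axis parallel to it,
# camera y-axis pointing down, so the ground normal in the camera frame is (0,-1,0).
P = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
O_C = np.array([0.0, 2.0, 0.0])          # a ground point in camera coordinates
n = np.array([0.0, -1.0, 0.0])           # ground normal in camera coordinates
C_3d = backproject_to_ground((320.0, 740.0), P, O_C, n)   # -> (0, 2, 2)
```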
S46, use the obtained three-dimensional coordinate C_3d to calculate the target's bottom-face center coordinate C. First compute C_3d in the world coordinate system:
    C_W = T_C2W · C_3d = (x_W, y_W, z_W)
where T_C2W = T_W2C⁻¹. The computed C_W is offset from the true bottom-face center C′_W, and the offset varies with the target heading angle α. If the target heading is along the roadway direction, α equals the angle between the projection of the camera coordinate system's x-axis onto the ground and the world coordinate system's y-axis. From the geometric relationship:
    offset = ( |l·sin α| + |w·cos α| ) / 2
where l and w are the target's length and width. By the definition of the world coordinate system, the target's bottom-face center coordinate in the world frame is:
    C′_W = ( x_W − |cos α · offset|,  y_W + |sin α · offset|,  z_W )
and the bottom-face center in the camera coordinate system is:
    C = T_W2C · C′_W
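The S46 bottom-center correction is pure arithmetic. A sketch of the offset formula, with illustrative values:

```python
import math

def correct_bottom_center(C_W, alpha, l, w):
    """Shift the back-projected world point C_W to the target's bottom-face center.

    Implements offset = (|l*sin(a)| + |w*cos(a)|) / 2 and the world-frame
    correction C'_W = (x_W - |cos(a)*offset|, y_W + |sin(a)*offset|, z_W).
    """
    x_w, y_w, z_w = C_W
    offset = (abs(l * math.sin(alpha)) + abs(w * math.cos(alpha))) / 2.0
    return (x_w - abs(math.cos(alpha) * offset),
            y_w + abs(math.sin(alpha) * offset),
            z_w)

# Heading along the roadway (alpha = 0): the offset reduces to w/2 along world x.
C_prime = correct_bottom_center((1.0, 2.0, 0.0), 0.0, l=4.0, w=2.0)  # -> (0, 2, 0)
```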
further, in step S5, a specific method for calculating the target three-dimensional bounding box is as follows:
S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target's bottom face (the ground), the coordinate origin is at the center of the bottom face, the y-axis is perpendicular to the ground pointing down, the x-axis is along the target's heading direction, and the z-axis points to the left of the heading direction, forming a right-handed coordinate system. With the target-frame origin O_V = (0, 0, 0), the 8 vertices of the three-dimensional bounding box are:
    V = { (±l/2, 0, ±w/2), (±l/2, −h, ±w/2) }
arranged as the columns of a 3×8 matrix, where l, w, and h are the target's length, width, and height;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera frame. Using the ground normal vector n in the camera coordinate system obtained in step S4 and the target heading angle α, rotate the target twice and then translate it. Following the rotation-matrix computation of step S34 (Rodrigues formula), calculate the rotation matrix R_n from the camera's negative-y direction vector (0, −1, 0) to n. Then calculate the rotation matrix R_α about the ground normal by the heading angle α:
    R_α = I + sin α · N + (1 − cos α) · N²
where N is the skew-symmetric matrix of normalize(n). The eight vertices are rotated to:
    V′ = R_α R_n V
With the bottom-face center C = (x_c, y_c, z_c) in the camera coordinate system from step S4, translate every vertex:
    V″ = V′ + C
(adding C to each column). V″ gives the eight vertex coordinates in the camera coordinate system;
S53, project the 8 camera-frame vertex coordinates into the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P, taking V″ in homogeneous coordinates:
    V‴ = P V″
then divide the x and y value of each point by its depth to obtain its image coordinates:
    (u_n, v_n) = ( x_n / z_n,  y_n / z_n )
where (x_n, y_n, z_n) are the coordinate values of the nth vertex in V‴.
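Steps S51–S53 can be sketched end to end as below. The Rodrigues helper mirrors S34, and the projection matrix, pose, and box dimensions are illustrative placeholders rather than values from the patent:

```python
import numpy as np

def rotation_between(a, b, eps=1e-9):
    """Rodrigues rotation taking direction a onto direction b (identity if parallel)."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_t, cos_t = np.linalg.norm(axis), float(np.dot(a, b))
    if sin_t < eps:
        return np.eye(3)
    u = axis / sin_t
    K = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

def box_image_points(l, w, h, n, alpha, C, P):
    """S51-S53: 8 box vertices -> camera frame -> image plane (8x2 pixel array)."""
    # S51: vertices in the target frame (origin at bottom-face center, y down).
    V = np.array([[sx * l / 2, y, sz * w / 2]
                  for y in (0.0, -h) for sx in (1, -1) for sz in (1, -1)]).T
    # S52: align camera -y with the ground normal, spin by the heading angle,
    # then translate to the bottom-face center C in camera coordinates.
    R_n = rotation_between(np.array([0.0, -1.0, 0.0]), n)
    u = n / np.linalg.norm(n)
    N = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
    R_a = np.eye(3) + np.sin(alpha) * N + (1 - np.cos(alpha)) * (N @ N)
    V2 = R_a @ R_n @ V + np.reshape(C, (3, 1))
    # S53: project with P in homogeneous coordinates and divide by depth.
    V3 = P @ np.vstack([V2, np.ones((1, 8))])
    return (V3[:2] / V3[2]).T

# Illustrative pose: ground normal along camera -y, heading angle 0, box 10 m ahead.
P = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
uv = box_image_points(l=2.0, w=2.0, h=1.0,
                      n=np.array([0.0, -1.0, 0.0]), alpha=0.0,
                      C=np.array([0.0, 2.0, 10.0]), P=P)
```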
Advantageous effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
First, the invention designs an efficient real-time three-dimensional target detection framework for the unique underground coal mine environment and camera installation, greatly increasing the speed of three-dimensional target detection. The detection model is accelerated with TensorRT at deployment and achieves an inference speed of 50 frames per second on an NVIDIA 3060 graphics card.
Second, the use of a two-dimensional target detection model greatly improves the accuracy and efficiency of target recognition. With a well-constructed data set, the target detection model reaches high accuracy and false detections are reduced. In addition, the two-dimensional detection model can be flexibly tuned or replaced, and the coupling between modules is low.
Third, because only the targets' two-dimensional information needs to be annotated for model training, the cost of data acquisition and labeling is greatly reduced, saving manpower and material resources and easing deployment.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a schematic diagram of the calculation of three-dimensional coordinates of an object of the present invention.
FIG. 3 is a schematic diagram of the target bottom center offset calculation of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention discloses a method for rapidly detecting a monocular vision three-dimensional target under a coal mine, which has an overall flow chart shown as the attached figure 1 and comprises the following steps:
s1, constructing a two-dimensional target detection data set: acquiring a data set with a specified target by using a monocular camera installed in a coal mine, determining the number of classes of the target in the data set, marking a two-dimensional frame and a class of the target in each image sample, randomly disordering the obtained image and a corresponding label, and dividing the image into a training set and a testing set;
s2, training a neural network model: training on a training set, testing on a testing set, and obtaining an optimal neural network model through parameter adjustment;
s3, calibrating internal parameters and external parameters of the camera: defining a world coordinate system, calibrating camera internal parameters by a checkerboard calibration method to obtain a camera projection matrix P, and calibrating camera external parameters by laser radar assistance to obtain a transformation matrix T from the world coordinate system to the camera coordinate system W2C
S4, calculating a target three-dimensional coordinate: inputting the image into the two-dimensional target detection model obtained in the step S2 to obtain a target two-dimensional frame, and calculating a two-dimensional coordinate C of the center of the bottom edge of the frame 2d Calculating a target three-dimensional coordinate C by combining internal and external parameters of the camera;
s5, calculating a target three-dimensional bounding box: and calculating a three-dimensional surrounding frame of the target under a camera coordinate system by using the three-dimensional coordinate C of the target obtained in the step S4, the course angle alpha of the target along the roadway direction and the length, width and height [ l, w, h ] prior information of the target.
Further, in step S1, a specific method for constructing a two-dimensional target detection data set is as follows:
s11, acquiring target image data of different types of targets in different environments by using a monocular camera installed under a coal mine, wherein the different environments comprise different illumination conditions and different postures of the targets;
s12, determining the number of classes of the targets to be detected, using a labellmg tool to label the acquired images in a two-dimensional frame mode and derive labels, wherein each picture corresponds to one label and comprises the classes and two-dimensional boundary frames of all the targets in the picture, and dividing the labeled data set into a training set and a testing set which comprise the pictures and corresponding label files.
Further, in step S2, a specific method for training the two-dimensional target detection model is as follows:
s21, taking a target image in a training set as input, taking a target category and a two-dimensional boundary box as output training neural network models, testing and adjusting parameters on the testing set to obtain the trained neural network models, wherein the neural network models can take the target image as input and take the output as the target category in the target image and the two-dimensional boundary box in the image;
and S22, using the trained neural network model to further improve the reasoning speed by using TensorRT to obtain the accelerated neural network model.
Further, in step S3, the calibration method for the internal reference and the external reference of the camera is as follows:
s31, calibrating camera internal parameters by using a checkerboard calibration method to obtain a camera projection matrix P;
s32, using the laser radar to perform auxiliary calibration on an external reference matrix, wherein the external reference matrix refers to a transformation matrix from a world coordinate system to a camera coordinate system, the laser radar and the camera have a common installation height, and using an MATLAB tool to calibrate the transformation matrix between the camera and the radar to obtain a transformation matrix T from the laser radar to the camera L2C
S33, defining a world coordinate system: the origin is a projection point of the radar origin on the ground, the xOy plane is the ground, the y axis is forward along the roadway, the x axis is rightward, the z axis is upward, a right-hand coordinate system is met, the installation height h of the laser radar is measured, and a translation vector t between the radar coordinate system and the world coordinate system is obtained through the definition of the world coordinate system L2W =(0,0,h);
S34, obtaining coordinates A of two points with the farthest distance on the edge of the roadway in the point cloud L (x 1 ,y 1 ,z 1 ),B L (x 2 ,y 2 ,z 2 ) Then, then
Figure BDA0003987862100000071
Eye and/or liver device>
Figure BDA0003987862100000072
Parallel to the world coordinate system y-axis, the subscript L indicates that the coordinate is multiplied by a vector point, if, under the lidar coordinate system>
Figure BDA0003987862100000073
Direction vector to the y axis of the radar coordinate system &>
Figure BDA0003987862100000074
The cosine value of the included angle between the two is as follows:
Figure BDA0003987862100000075
wherein normaize indicates the conversion of a vector into a unit vector,
Figure BDA0003987862100000076
to/>
Figure BDA0003987862100000077
The rotation axis of (a) is obtained by vector cross multiplication:
Figure BDA0003987862100000081
note the book
Figure BDA0003987862100000082
Is obtained according to the Rodrigues rotation formula>
Figure BDA0003987862100000083
To>
Figure BDA0003987862100000084
Of (3) a rotation matrix R 1
Figure BDA0003987862100000085
S35, determining a ground normal vector so as to determine the rotation between the other two axes between the two coordinate systems, and fitting the ground part in the point cloud in the obtained point cloud by using a plane fitting method to obtain the radar seat of the groundNormal vector under the mark system
Figure BDA0003987862100000086
In the same manner as in S34
Figure BDA0003987862100000087
To>
Figure BDA0003987862100000088
A calculation step of the rotation matrix, calculating a vector ≥>
Figure BDA0003987862100000089
To the xOy plane normal vector @underthe radar coordinate system>
Figure BDA00039878621000000810
Is R 2 Then the rotation matrix R of the radar coordinate system to the world coordinate system L2W Comprises the following steps:
R L2W =R 1 R 2
calculating a transformation matrix from the radar coordinate system to the world coordinate system as follows:
Figure BDA00039878621000000811
the transformation matrix from the world coordinate system to the radar coordinate system is the inverse matrix T thereof W2L =T′ L2W
S36, by T L2C And T W2L Calculating to obtain a transformation matrix T from a world coordinate system to a camera coordinate system W2C
T W2C =T W2L ·T L2C
Further, the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:
s41, utilization of T W2C The origin O under the world coordinate system W = (0, 0) and ground normal vector
Figure BDA00039878621000000812
Transformation to camera coordinate system:
Figure BDA00039878621000000813
note O C =(x o ,y o ,z o ) And
Figure BDA00039878621000000814
s42, obtaining a ground equation G (x, y, z) under a camera coordinate system by using a point-method equation of a plane:
G(x,y,z):n 1 (x-x o )+n 2 (y-y o )+n 3 (z-z o )=0
s43, inputting the target image into the neural network model obtained in the step S2 to obtain the categories and the two-dimensional surrounding frame of all targets, and calculating the bottom edge center coordinate C of the two-dimensional surrounding frame of the targets 2d =(x 2d ,y 2d );
S44, carrying out back projection by using the pinhole camera projection model, and calculating a two-dimensional coordinate C 2d Represents:
C 3d =(z 3d (x 2d -c x )/f x ,z 3d (y 2d -c y )/f y ,z 3d )
wherein f is x And f y The focal lengths in the x-axis and y-axis directions of the camera are respectively expressed and obtained by a camera projection matrix P, namely: f. of x =P[0][0],f y =P[1][1],z 3d ∈(0,∞]Representing a depth;
s45, mixing C 3d Substituting into the ground equation to calculate C 3d Unknown number z in 3d
z 3d =(n 1 x o +n 2 y o +n 3 z o )/(n 1 (x 2d -c x )/f x +n 2 (y 2d -c y )/f y +n 3 )
S46, utilizing the obtained three-dimensional coordinates C 3d Calculating the center coordinates C of the bottom surface of the target, calculating C 3d The coordinates under the world coordinate system are:
C W =T C2W C 3d =(x W ,y W ,z W )
wherein, T C2W =T′ W2C ,T′ W2C Is T W2C Inverse matrix, calculated C W And target bottom center C' W An offset exists, the offset changes along with the change of a target course angle alpha, if the target course angle is along the roadway direction, alpha is equal to the included angle between the vector corresponding to the projection of the x axis of the projection camera coordinate system to the ground and the y axis of the world coordinate system, and according to the geometrical relationship, the offset is as follows:
offset=(|l·sinα|+|w·cosα|)/2
wherein l and w are respectively the length and width of the target, and the bottom center coordinate C 'of the target in the world coordinate system is calculated by the definition of the world coordinate system' W
C′ W =(x W -|cosα·offset|,y W +|sinα·offset|,z W )
The coordinates of the center of the bottom of the target under the camera coordinate system are calculated as:
C=T W2C C′ W
further, in step S5, a specific method for calculating the target three-dimensional bounding box is as follows:
S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target bottom surface, i.e. the ground; the coordinate origin lies at the center point of the target bottom surface; the y axis is perpendicular to the ground and points downward; the x axis follows the target advancing direction; and the z axis points to the left of the advancing direction, forming a right-handed coordinate system. With the target coordinate system origin O_V = (0, 0, 0), the 8 vertex coordinates of the three-dimensional bounding box are:

V = { (l/2, 0, w/2), (l/2, 0, −w/2), (−l/2, 0, −w/2), (−l/2, 0, w/2),
      (l/2, −h, w/2), (l/2, −h, −w/2), (−l/2, −h, −w/2), (−l/2, −h, w/2) }

where l, w and h denote the target length, width and height, respectively;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera coordinate system. Using the ground normal vector n = (n_1, n_2, n_3) in the camera coordinate system obtained in step 4 and the target heading angle α, rotate the target twice and then translate it to obtain the eight vertex coordinates in the camera coordinate system. Using the Rodrigues formula, following the rotation-matrix calculation step of step 3, calculate the rotation matrix R_n from the camera y-axis negative direction vector (0, −1, 0) to n; then calculate the rotation matrix about the target heading angle α:

R_α =
[ cosα   0   sinα ]
[  0     1    0   ]
[ −sinα  0   cosα ]

The eight vertices are rotated to:

V′ = R_α R_n V

With the target bottom-surface center in the camera coordinate system calculated in step 4 denoted C = (x_c, y_c, z_c), translate each vertex by C:

V″ = V′ + C

The obtained V″ gives the coordinates of the eight vertices in the camera coordinate system;
S53, project the 8 vertex coordinates in the camera coordinate system to the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P:

V″′ = P V″

then divide the x and y values of each point by its depth to obtain its coordinates in the image coordinate system:

v_n = (x_n / z_n, y_n / z_n)

where x_n, y_n and z_n are the coordinate values of the n-th vertex in V″′.
The rapid monocular-vision three-dimensional target detection method for underground coal mines provided by the embodiment of the present invention has been described in detail above; for a person skilled in the art, there may be changes in the specific implementation and the application scope according to the idea of the embodiment of the present invention.

Claims (6)

1. A rapid monocular-vision three-dimensional target detection method for underground coal mines, characterized by comprising the following steps:

S1, constructing a two-dimensional target detection data set: acquire a data set containing the specified targets with a monocular camera installed in the coal mine, determine the number of target classes in the data set, annotate the two-dimensional frame and class of the target in each image sample, randomly shuffle the obtained images and corresponding labels, and divide them into a training set and a testing set;

S2, training a neural network model: train on the training set, test on the testing set, and obtain the optimal neural network model through parameter tuning;

S3, calibrating the camera intrinsic and extrinsic parameters: define a world coordinate system, calibrate the camera intrinsic parameters with the checkerboard calibration method to obtain the camera projection matrix P, and complete the extrinsic calibration between the camera and the world coordinate system with lidar assistance to obtain the transformation matrix T_W2C from the world coordinate system to the camera coordinate system;

S4, calculating the target three-dimensional coordinate: input the image into the two-dimensional target detection model obtained in step S2 to obtain the target two-dimensional frame, calculate the two-dimensional coordinate C_2d of the center of the frame bottom edge, and calculate the target three-dimensional coordinate C by combining the camera intrinsic and extrinsic parameters;

S5, calculating the target three-dimensional bounding box: using the target three-dimensional coordinate C obtained in step S4, the target heading angle α along the roadway direction, and the prior length, width and height [l, w, h] of the target, calculate the three-dimensional bounding box of the target in the camera coordinate system.
2. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S1 the specific method for constructing the two-dimensional target detection data set is as follows:

S11, use a monocular camera installed in the coal mine to acquire target image data of different classes of targets in different environments, the different environments including different illumination conditions and different target postures;

S12, determine the number of target classes to be detected, use the labelImg tool to annotate two-dimensional frames on the acquired images and export labels, each picture corresponding to one label that contains the classes and two-dimensional bounding boxes of all targets in the picture, and divide the annotated data set into a training set and a testing set, each comprising pictures and the corresponding label files.
3. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S2 the specific method for training the two-dimensional target detection model is as follows:

S21, train the neural network model with target images in the training set as input and the target classes and two-dimensional bounding boxes as output, then test and tune parameters on the testing set to obtain the trained neural network model, which takes a target image as input and outputs the classes of the targets in the image together with their two-dimensional bounding boxes in the image;

S22, use TensorRT to further improve the inference speed of the trained neural network model, obtaining the accelerated neural network model.
4. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S3 the calibration method for the camera intrinsic and extrinsic parameters is as follows:

S31, calibrate the camera intrinsic parameters with the checkerboard calibration method to obtain the camera projection matrix P;

S32, use the lidar to assist in calibrating the extrinsic matrix, the extrinsic matrix being the transformation matrix from the world coordinate system to the camera coordinate system; the lidar and the camera are installed at the same height; calibrate the transformation between the camera and the radar with the MATLAB toolbox to obtain the transformation matrix T_L2C from the lidar to the camera;

S33, define the world coordinate system: the origin is the projection point of the radar origin on the ground, the xOy plane is the ground, the y axis points forward along the roadway, the x axis points right and the z axis points up, forming a right-handed coordinate system; measure the lidar installation height h, and from the definition of the world coordinate system obtain the translation vector between the radar coordinate system and the world coordinate system, t_L2W = (0, 0, h);
S34, obtain in the point cloud the coordinates of the two farthest-apart points on the roadway edge, A_L(x_1, y_1, z_1) and B_L(x_2, y_2, z_2); the vector

AB = B_L − A_L

is then parallel to the world coordinate system y axis, the subscript L indicating coordinates in the lidar coordinate system. Using the vector dot product, the cosine of the angle between AB and the radar coordinate system y-axis direction vector e_y = (0, 1, 0) is:

cosθ = normalize(AB) · normalize(e_y)

where normalize denotes converting a vector into a unit vector. The rotation axis from AB to e_y is obtained by the vector cross product:

k = normalize(AB × e_y)

Writing K for the skew-symmetric matrix of k, the rotation matrix R_1 from AB to e_y is obtained from the Rodrigues rotation formula:

R_1 = I + sinθ·K + (1 − cosθ)·K²
S35, determine the ground normal vector so as to determine the rotation between the remaining two axes of the two coordinate systems. Fit the ground portion of the acquired point cloud with a plane-fitting method to obtain the ground normal vector n_L in the radar coordinate system. In the same manner as the rotation-matrix calculation step of S34, calculate the rotation matrix R_2 from n_L to the xOy-plane normal vector (0, 0, 1) of the radar coordinate system. The rotation matrix R_L2W from the radar coordinate system to the world coordinate system is then:

R_L2W = R_1 R_2

and the transformation matrix from the radar coordinate system to the world coordinate system is:

T_L2W =
[ R_L2W  t_L2W ]
[   0      1   ]

The transformation matrix from the world coordinate system to the radar coordinate system is its inverse, T_W2L = T_L2W^(-1);

S36, from T_L2C and T_W2L, calculate the transformation matrix T_W2C from the world coordinate system to the camera coordinate system:

T_W2C = T_L2C · T_W2L
5. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:

S41, use T_W2C to transform the world coordinate system origin O_W = (0, 0, 0) and the ground normal vector n_W = (0, 0, 1) into the camera coordinate system:

O_C = T_W2C · O_W,  n = R_W2C · n_W

where R_W2C is the rotation part of T_W2C; denote O_C = (x_o, y_o, z_o) and n = (n_1, n_2, n_3);

S42, from the point-normal equation of the plane, obtain the ground equation G(x, y, z) in the camera coordinate system:

G(x, y, z): n_1(x − x_o) + n_2(y − y_o) + n_3(z − z_o) = 0
S43, input the target image into the neural network model obtained in step S2 to obtain the classes and two-dimensional bounding boxes of all targets, and calculate the bottom-edge center coordinate of the target two-dimensional bounding box, C_2d = (x_2d, y_2d);

S44, back-project with the pinhole camera projection model, expressing the three-dimensional point corresponding to C_2d as:

C_3d = (z_3d(x_2d − c_x)/f_x, z_3d(y_2d − c_y)/f_y, z_3d)

where f_x and f_y are the camera focal lengths along the x and y axes and c_x, c_y are the principal point coordinates, all obtained from the camera projection matrix P, namely f_x = P[0][0], f_y = P[1][1], c_x = P[0][2], c_y = P[1][2], and z_3d ∈ (0, ∞) represents the depth;

S45, substitute C_3d into the ground equation and solve for the unknown z_3d in C_3d:

z_3d = (n_1·x_o + n_2·y_o + n_3·z_o) / (n_1(x_2d − c_x)/f_x + n_2(y_2d − c_y)/f_y + n_3)
S46, using the obtained three-dimensional coordinate C_3d, calculate the target bottom-surface center coordinate C. First compute the coordinates of C_3d in the world coordinate system:

C_W = T_C2W · C_3d = (x_W, y_W, z_W)

where T_C2W = T_W2C^(-1) is the inverse of T_W2C. There is an offset between the computed C_W and the target bottom-surface center C'_W, and this offset varies with the target heading angle α. If the target heads along the roadway, α equals the angle between the projection of the camera coordinate system x axis onto the ground and the y axis of the world coordinate system; from the geometric relationship, the offset is:

offset = (|l·sinα| + |w·cosα|)/2

where l and w are the target length and width, respectively. From the definition of the world coordinate system, the target bottom-surface center coordinate C'_W in the world coordinate system is:

C'_W = (x_W − |cosα·offset|, y_W + |sinα·offset|, z_W)

The target bottom-surface center coordinate in the camera coordinate system is then:

C = T_W2C · C'_W
6. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein the specific method for calculating the target three-dimensional bounding box in step S5 is as follows:

S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target bottom surface, i.e. the ground; the coordinate origin lies at the center point of the target bottom surface; the y axis is perpendicular to the ground and points downward; the x axis follows the target advancing direction; and the z axis points to the left of the advancing direction, forming a right-handed coordinate system. With the target coordinate system origin O_V = (0, 0, 0), the 8 vertex coordinates of the three-dimensional bounding box are:

V = { (l/2, 0, w/2), (l/2, 0, −w/2), (−l/2, 0, −w/2), (−l/2, 0, w/2),
      (l/2, −h, w/2), (l/2, −h, −w/2), (−l/2, −h, −w/2), (−l/2, −h, w/2) }

where l, w and h denote the target length, width and height, respectively;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera coordinate system. Using the ground normal vector n = (n_1, n_2, n_3) in the camera coordinate system obtained in step 4 and the target heading angle α, rotate the target twice and then translate it to obtain the eight vertex coordinates in the camera coordinate system. Using the Rodrigues formula, following the rotation-matrix calculation step of step 3, calculate the rotation matrix R_n from the camera y-axis negative direction vector (0, −1, 0) to n; then calculate the rotation matrix about the target heading angle α:

R_α =
[ cosα   0   sinα ]
[  0     1    0   ]
[ −sinα  0   cosα ]

The eight vertices are rotated to:

V′ = R_α R_n V

With the target bottom-surface center in the camera coordinate system calculated in step 4 denoted C = (x_c, y_c, z_c), translate each vertex by C:

V″ = V′ + C

The obtained V″ gives the coordinates of the eight vertices in the camera coordinate system;
S53, project the 8 vertex coordinates in the camera coordinate system to the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P:

V″′ = P V″

then divide the x and y values of each point by its depth to obtain its coordinates in the image coordinate system:

v_n = (x_n / z_n, y_n / z_n)

where x_n, y_n and z_n are the coordinate values of the n-th vertex in V″′.
CN202211571246.5A 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine Pending CN115984766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211571246.5A CN115984766A (en) 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine


Publications (1)

Publication Number Publication Date
CN115984766A true CN115984766A (en) 2023-04-18

Family

ID=85973075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211571246.5A Pending CN115984766A (en) 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine

Country Status (1)

Country Link
CN (1) CN115984766A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681778A (en) * 2023-06-06 2023-09-01 固安信通信号技术股份有限公司 Distance measurement method based on monocular camera
CN116681778B (en) * 2023-06-06 2024-01-09 固安信通信号技术股份有限公司 Distance measurement method based on monocular camera
CN117197149A (en) * 2023-11-08 2023-12-08 太原理工大学 Cooperative control method of tunneling and anchoring machine and anchor rod trolley

Similar Documents

Publication Publication Date Title
CN109270534B (en) Intelligent vehicle laser sensor and camera online calibration method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN115984766A (en) Rapid monocular vision three-dimensional target detection method for underground coal mine
CN107677274B (en) Unmanned plane independent landing navigation information real-time resolving method based on binocular vision
CN108594245A (en) A kind of object movement monitoring system and method
CN109993793B (en) Visual positioning method and device
CN106645205A (en) Unmanned aerial vehicle bridge bottom surface crack detection method and system
CN110031829B (en) Target accurate distance measurement method based on monocular vision
CN103075998B (en) A kind of monocular extraterrestrial target range finding angle-measuring method
CN111046776A (en) Mobile robot traveling path obstacle detection method based on depth camera
CN112037159B (en) Cross-camera road space fusion and vehicle target detection tracking method and system
CN104268935A (en) Feature-based airborne laser point cloud and image data fusion system and method
US20200357141A1 (en) Systems and methods for calibrating an optical system of a movable object
CN101702233B (en) Three-dimension locating method based on three-point collineation marker in video frame
CN102589530B (en) Method for measuring position and gesture of non-cooperative target based on fusion of two dimension camera and three dimension camera
CN107784038B (en) Sensor data labeling method
CN101520892B (en) Detection method of small objects in visible light image
CN103886107A (en) Robot locating and map building system based on ceiling image information
CN113050074B (en) Camera and laser radar calibration system and calibration method in unmanned environment perception
Ma et al. Crlf: Automatic calibration and refinement based on line feature for lidar and camera in road scenes
CN114608554B (en) Handheld SLAM equipment and robot instant positioning and mapping method
CN113313116B (en) Underwater artificial target accurate detection and positioning method based on vision
CN114692720A (en) Image classification method, device, equipment and storage medium based on aerial view
CN114004977A (en) Aerial photography data target positioning method and system based on deep learning
CN111368797A (en) Target real-time ranging method based on road end monocular camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination