CN115984766A - Rapid monocular vision three-dimensional target detection method for underground coal mine - Google Patents

Publication number: CN115984766A
Application number: CN202211571246.5A
Authority: CN (China)
Prior art keywords: target, coordinate system, dimensional, camera, calculating
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Original language: Chinese (zh)
Inventors: 赵佳琦, 王斌, 周勇, 芦志广, 阿卜杜穆塔利布·埃尔·萨迪克
Current assignee: China University of Mining and Technology (CUMT) (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: China University of Mining and Technology (CUMT)
Application filed by China University of Mining and Technology (CUMT)

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for rapid monocular-vision three-dimensional target detection in an underground coal mine, comprising the following steps: S1, acquire image data containing the specified targets with a camera installed underground, annotate two-dimensional boxes, and construct a target detection data set; S2, construct a two-dimensional target detection network model and train it on the constructed data set to obtain a trained two-dimensional target detection model; S3, define a world coordinate system, calibrate the camera's intrinsic parameters by the checkerboard calibration method, and complete the calibration of the extrinsic parameters between the camera and the world coordinate system with lidar assistance; S4, input an image into the two-dimensional target detection model to obtain a target's two-dimensional box, and calculate the target's three-dimensional coordinates by combining the camera's intrinsic and extrinsic parameters; S5, calculate the target's three-dimensional bounding box in the camera coordinate system from the target's three-dimensional coordinates, its heading angle along the roadway direction, and its length, width, and height.

Description

Rapid monocular vision three-dimensional target detection method for underground coal mine
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method for rapidly detecting a monocular vision three-dimensional target under a coal mine.
Background
In recent years, lidar-based algorithms have greatly improved the accuracy of three-dimensional target detection. However, lidar has drawbacks: it is costly and easily affected by the surrounding environment. When a lidar module fails, three-dimensional detection based on monocular camera data can improve the robustness of the detection system, so achieving reliable and accurate three-dimensional detection from camera data alone is important. Estimating a three-dimensional bounding box from an image alone is more challenging than lidar-based detection, since recovering three-dimensional information from two-dimensional input data is an ill-posed problem. Despite this inherent difficulty, image-based three-dimensional object detection methods have found widespread use in the computer vision community over the past few years.
At present, however, research on monocular three-dimensional target detection is concentrated in the field of autonomous driving, where the sensing camera is mounted on the vehicle; well-known data sets include KITTI, Waymo, and nuScenes, while very little research addresses fixed cameras or the underground coal mine environment. With advances in technology and rising safety requirements, automated devices such as inspection robots and autonomous transport vehicles have begun to appear in coal mines, and accurate positioning information is essential for them. Detecting the three-dimensional pose of autonomous vehicles in the roadway with a camera installed at the top of the roadway therefore aids vehicle positioning and helps managers monitor operations. For the complex underground environment of a coal mine, three-dimensional target detection must be performed with a monocular camera mounted at the top of the roadway, and existing methods have the following problems:
(1) A camera installed at the top of the roadway has a long, narrow sensing range from which little information can be extracted. Because the underground structure is long and narrow, a roadway-top camera observes much farther distances than a vehicle-mounted camera, and the proportion of a target in the image varies greatly. Moreover, because underground lighting is dim, the depth and visual features a neural network can extract are reduced, so existing methods struggle to accurately infer the three-dimensional information of a target.
(2) Existing methods rely on large amounts of labeled data for training, and the data acquisition and labeling process is time-consuming and labor-intensive.
(3) Although existing methods achieve high inference speed, they still involve a large amount of computation, depend on power-hungry high-performance graphics cards, and are difficult to port to underground equipment.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to overcome the defects of the prior art and provide a rapid three-dimensional target detection method using a monocular camera installed in an underground coal mine. Compared with existing monocular three-dimensional target detection methods, the disclosed method fully exploits the special environmental characteristics of the mine, places lower demands on device computing power, offers better real-time performance, and is better suited to the underground coal mine environment.
The technical scheme is as follows: to achieve the above aim, the invention provides a method for rapid monocular-vision three-dimensional target detection in an underground coal mine, comprising the following steps:
S1, construct a two-dimensional target detection data set: acquire a data set containing the specified targets with a monocular camera installed in the coal mine, determine the number of target classes in the data set, annotate the two-dimensional box and class of the targets in each image sample, randomly shuffle the images and their corresponding labels, and divide them into a training set and a test set;
S2, train a neural network model: train on the training set, test on the test set, and obtain an optimal neural network model through parameter tuning;
S3, calibrate the camera's intrinsic and extrinsic parameters: define a world coordinate system, calibrate the camera intrinsics by the checkerboard calibration method to obtain the camera projection matrix P, and complete the extrinsic calibration between the camera and the world coordinate system with lidar assistance to obtain the transformation matrix T_W2C from the world coordinate system to the camera coordinate system;
S4, calculate the target's three-dimensional coordinates: input the image into the two-dimensional target detection model obtained in step S2 to obtain the target's two-dimensional box, calculate the two-dimensional coordinate C_2d of the center of the box's bottom edge, and calculate the target's three-dimensional coordinate C by combining the camera's intrinsic and extrinsic parameters;
S5, calculate the target's three-dimensional bounding box: from the three-dimensional coordinate C obtained in step S4, the target's heading angle α along the roadway direction, and prior information on the target's length, width, and height [l, w, h], calculate the target's three-dimensional bounding box in the camera coordinate system.
Further, in step S1, the specific method for constructing the two-dimensional target detection data set is as follows:
S11, use a monocular camera installed in the underground coal mine to acquire image data of the different target classes in different environments, where "different environments" covers different illumination conditions and different target poses;
S12, determine the number of target classes to detect, annotate the acquired images with two-dimensional boxes using the labelImg tool, and export the labels; each picture corresponds to one label file containing the classes and two-dimensional bounding boxes of all targets in the picture. Divide the annotated data set into a training set and a test set, each comprising the pictures and their corresponding label files.
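The shuffle-and-split of S1/S12 can be sketched as follows. The 80/20 ratio, the fixed seed, and the paired image/label file names are illustrative assumptions, not values specified by the patent:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Randomly shuffle (image, label) pairs and split into train/test sets.

    `samples` is a list of (image_path, label_path) pairs; shuffling the
    pairs together keeps each image aligned with its label file.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Example with hypothetical file names:
pairs = [(f"img_{i:04d}.jpg", f"img_{i:04d}.txt") for i in range(10)]
train_set, test_set = split_dataset(pairs)
```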
Further, in step S2, the specific method for training the two-dimensional target detection model is as follows:
S21, train a neural network model that takes a target image as input and outputs the target classes and their two-dimensional bounding boxes in the image, using the training set; test and tune parameters on the test set to obtain the trained neural network model;
S22, accelerate the trained neural network model with TensorRT to further improve inference speed, obtaining the accelerated neural network model.
Further, in step S3, the calibration method for the camera's intrinsic and extrinsic parameters is as follows:
S31, calibrate the camera intrinsics by the checkerboard calibration method to obtain the camera projection matrix P;
S32, use a lidar to assist in calibrating the extrinsic matrix, i.e. the transformation matrix from the world coordinate system to the camera coordinate system; the lidar and the camera are installed at the same height. Calibrate the transformation between the camera and the lidar with a MATLAB tool to obtain the transformation matrix T_L2C from the lidar to the camera;
S33, define the world coordinate system: the origin is the projection of the lidar origin onto the ground, the xOy plane is the ground, the y-axis points forward along the roadway, the x-axis points right, and the z-axis points up, forming a right-handed coordinate system. Measure the lidar installation height h; from the definition of the world coordinate system, the translation vector between the lidar coordinate system and the world coordinate system is t_L2W = (0, 0, h);
S34, obtain the coordinates of the two farthest-apart points on the roadway edge in the point cloud, A_L = (x_1, y_1, z_1) and B_L = (x_2, y_2, z_2), where the subscript L denotes coordinates in the lidar coordinate system. The vector
    AB = B_L − A_L
is parallel to the y-axis of the world coordinate system. The cosine of the angle θ between AB and the direction vector of the lidar coordinate system's y-axis, y_L = (0, 1, 0), is:
    cos θ = normalize(AB) · normalize(y_L)
where normalize converts a vector into a unit vector and · denotes the dot product. The rotation axis from AB to y_L is obtained by the vector cross product:
    k = normalize(AB × y_L)
Let K denote the skew-symmetric (cross-product) matrix of k. By the Rodrigues rotation formula, the rotation matrix R_1 from AB to y_L is:
    R_1 = I + sin θ · K + (1 − cos θ) · K²
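The S34 construction can be sketched with NumPy as below; `rotation_between` implements the Rodrigues formula above and falls back to the identity for (near-)parallel vectors, and the two roadway-edge points are hypothetical values:

```python
import numpy as np

def rotation_between(a, b, eps=1e-9):
    """Rodrigues rotation matrix taking direction a onto direction b.

    R = I + sin(theta)*K + (1 - cos(theta))*K^2, where K is the
    skew-symmetric matrix of the unit rotation axis a x b.
    """
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_t = np.linalg.norm(axis)          # |a x b| = sin(theta) for unit vectors
    cos_t = float(np.dot(a, b))
    if sin_t < eps:                       # parallel (or anti-parallel) vectors:
        return np.eye(3)                  # identity; the 180-degree case would
                                          # need a perpendicular axis instead
    u = axis / sin_t
    K = np.array([[0, -u[2], u[1]],
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

# R_1 rotates the roadway direction AB onto the lidar y-axis:
A_L = np.array([1.0, 0.0, 0.0])           # hypothetical roadway-edge points
B_L = np.array([4.0, 4.0, 0.0])
R_1 = rotation_between(B_L - A_L, np.array([0.0, 1.0, 0.0]))
```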
S35, determine the ground normal vector to fix the rotation of the remaining two axes between the two coordinate systems. Fit the ground portion of the acquired point cloud with a plane-fitting method to obtain the ground normal vector n_L in the lidar coordinate system. Following the same rotation-matrix computation as in S34, calculate the rotation matrix R_2 from n_L to the normal of the lidar coordinate system's xOy plane, (0, 0, 1). The rotation matrix R_L2W from the lidar coordinate system to the world coordinate system is then:
    R_L2W = R_1 R_2
and the transformation matrix from the lidar coordinate system to the world coordinate system is:
    T_L2W = [ R_L2W  t_L2W ]
            [   0      1   ]
The transformation matrix from the world coordinate system to the lidar coordinate system is its inverse, T_W2L = T_L2W⁻¹.
S36, from T_L2C and T_W2L, calculate the transformation matrix T_W2C from the world coordinate system to the camera coordinate system:
    T_W2C = T_L2C · T_W2L
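The chaining in S35–S36 amounts to composing 4×4 homogeneous transforms (world → lidar → camera). A minimal sketch, where the rotations and the lidar height are placeholder values rather than calibration results:

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def inv_T(T):
    """Closed-form inverse of a rigid transform: [R^T, -R^T t; 0, 1]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)

def apply_T(T, p):
    """Apply a homogeneous transform to a 3D point."""
    return (T @ np.append(p, 1.0))[:3]

# T_L2W from the S34/S35 rotations and the measured lidar height h:
R_L2W = np.eye(3)                       # placeholder for R_1 @ R_2
h = 1.8                                 # placeholder installation height (m)
T_L2W = make_T(R_L2W, np.array([0.0, 0.0, h]))
T_W2L = inv_T(T_L2W)

# With a (placeholder) lidar-to-camera extrinsic, chain world -> lidar -> camera;
# for column vectors, p_C = T_L2C @ (T_W2L @ p_W), hence T_W2C = T_L2C @ T_W2L.
T_L2C = make_T(np.eye(3), np.array([0.1, 0.0, 0.0]))
T_W2C = T_L2C @ T_W2L
```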
Further, the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:
S41, use T_W2C to transform the world-frame origin O_W = (0, 0, 0) and the ground normal vector n_W = (0, 0, 1) into the camera coordinate system:
    O_C = T_W2C · O_W,    n = R_W2C · n_W
where O_W is taken in homogeneous coordinates and R_W2C is the rotation part of T_W2C. Write O_C = (x_o, y_o, z_o) and n = (n_1, n_2, n_3).
S42, from the point-normal equation of a plane, the ground equation G(x, y, z) in the camera coordinate system is:
    G(x, y, z): n_1(x − x_o) + n_2(y − y_o) + n_3(z − z_o) = 0
S43, input the target image into the neural network model obtained in step S2 to obtain the classes and two-dimensional bounding boxes of all targets, and calculate the bottom-edge center coordinate of the target's two-dimensional box, C_2d = (x_2d, y_2d);
S44, back-project with the pinhole camera model, expressing the three-dimensional point corresponding to C_2d as:
    C_3d = ( z_3d(x_2d − c_x)/f_x,  z_3d(y_2d − c_y)/f_y,  z_3d )
where f_x and f_y are the focal lengths along the camera's x- and y-axes and c_x, c_y are the principal point coordinates, all taken from the camera projection matrix P: f_x = P[0][0], f_y = P[1][1], c_x = P[0][2], c_y = P[1][2]; z_3d ∈ (0, ∞) is the depth;
S45, substitute C_3d into the ground equation and solve for the unknown z_3d:
    z_3d = (n_1 x_o + n_2 y_o + n_3 z_o) / ( n_1(x_2d − c_x)/f_x + n_2(y_2d − c_y)/f_y + n_3 )
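Steps S41–S45 reduce to intersecting the back-projected pixel ray with the ground plane. A sketch under illustrative values (the projection matrix, ground point, and normal below are placeholders, not a real calibration):

```python
import numpy as np

def backproject_to_ground(c2d, P, O_C, n):
    """Back-project the 2D bottom-edge center onto the ground plane.

    c2d = (x_2d, y_2d) pixel coordinates; P is the 3x4 projection matrix;
    O_C and n are a ground point (the world origin) and the ground normal
    in the camera frame. Solves n . C_3d = n . O_C for the depth z_3d.
    """
    x2d, y2d = c2d
    fx, fy = P[0][0], P[1][1]
    cx, cy = P[0][2], P[1][2]
    rx = (x2d - cx) / fx                 # ray direction (rx, ry, 1), scaled by depth
    ry = (y2d - cy) / fy
    n1, n2, n3 = n
    z3d = (n1 * O_C[0] + n2 * O_C[1] + n3 * O_C[2]) / (n1 * rx + n2 * ry + n3)
    return np.array([z3d * rx, z3d * ry, z3d])

# Illustrative setup: camera 2 m above the ground, optical axis parallel to it,
# camera y-axis pointing down, so the ground normal in the camera frame is (0,-1,0).
P = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
O_C = np.array([0.0, 2.0, 0.0])          # a ground point in camera coordinates
n = np.array([0.0, -1.0, 0.0])           # ground normal in camera coordinates
C_3d = backproject_to_ground((320.0, 740.0), P, O_C, n)   # -> (0, 2, 2)
```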
S46, use the obtained three-dimensional coordinate C_3d to calculate the target's bottom-face center coordinate C. First compute C_3d in the world coordinate system:
    C_W = T_C2W · C_3d = (x_W, y_W, z_W)
where T_C2W = T_W2C⁻¹. The computed C_W is offset from the true bottom-face center C′_W, and the offset varies with the target heading angle α. If the target heading is along the roadway direction, α equals the angle between the projection of the camera coordinate system's x-axis onto the ground and the world coordinate system's y-axis. From the geometric relationship:
    offset = ( |l·sin α| + |w·cos α| ) / 2
where l and w are the target's length and width. By the definition of the world coordinate system, the target's bottom-face center coordinate in the world frame is:
    C′_W = ( x_W − |cos α · offset|,  y_W + |sin α · offset|,  z_W )
and the bottom-face center in the camera coordinate system is:
    C = T_W2C · C′_W
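The S46 bottom-center correction is pure arithmetic. A sketch of the offset formula, with illustrative values:

```python
import math

def correct_bottom_center(C_W, alpha, l, w):
    """Shift the back-projected world point C_W to the target's bottom-face center.

    Implements offset = (|l*sin(a)| + |w*cos(a)|) / 2 and the world-frame
    correction C'_W = (x_W - |cos(a)*offset|, y_W + |sin(a)*offset|, z_W).
    """
    x_w, y_w, z_w = C_W
    offset = (abs(l * math.sin(alpha)) + abs(w * math.cos(alpha))) / 2.0
    return (x_w - abs(math.cos(alpha) * offset),
            y_w + abs(math.sin(alpha) * offset),
            z_w)

# Heading along the roadway (alpha = 0): the offset reduces to w/2 along world x.
C_prime = correct_bottom_center((1.0, 2.0, 0.0), 0.0, l=4.0, w=2.0)  # -> (0, 2, 0)
```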
further, in step S5, a specific method for calculating the target three-dimensional bounding box is as follows:
S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target's bottom face (the ground), the coordinate origin is at the center of the bottom face, the y-axis is perpendicular to the ground pointing down, the x-axis is along the target's heading direction, and the z-axis points to the left of the heading direction, forming a right-handed coordinate system. With the target-frame origin O_V = (0, 0, 0), the 8 vertices of the three-dimensional bounding box are:
    V = { (±l/2, 0, ±w/2), (±l/2, −h, ±w/2) }
arranged as the columns of a 3×8 matrix, where l, w, and h are the target's length, width, and height;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera frame. Using the ground normal vector n in the camera coordinate system obtained in step S4 and the target heading angle α, rotate the target twice and then translate it. Following the rotation-matrix computation of step S34 (Rodrigues formula), calculate the rotation matrix R_n from the camera's negative-y direction vector (0, −1, 0) to n. Then calculate the rotation matrix R_α about the ground normal by the heading angle α:
    R_α = I + sin α · N + (1 − cos α) · N²
where N is the skew-symmetric matrix of normalize(n). The eight vertices are rotated to:
    V′ = R_α R_n V
With the bottom-face center C = (x_c, y_c, z_c) in the camera coordinate system from step S4, translate every vertex:
    V″ = V′ + C
(adding C to each column). V″ gives the eight vertex coordinates in the camera coordinate system;
S53, project the 8 camera-frame vertex coordinates into the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P, taking V″ in homogeneous coordinates:
    V‴ = P V″
then divide the x and y value of each point by its depth to obtain its image coordinates:
    (u_n, v_n) = ( x_n / z_n,  y_n / z_n )
where (x_n, y_n, z_n) are the coordinate values of the nth vertex in V‴.
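Steps S51–S53 can be sketched end to end as below. The Rodrigues helper mirrors S34, and the projection matrix, pose, and box dimensions are illustrative placeholders rather than values from the patent:

```python
import numpy as np

def rotation_between(a, b, eps=1e-9):
    """Rodrigues rotation taking direction a onto direction b (identity if parallel)."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_t, cos_t = np.linalg.norm(axis), float(np.dot(a, b))
    if sin_t < eps:
        return np.eye(3)
    u = axis / sin_t
    K = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

def box_image_points(l, w, h, n, alpha, C, P):
    """S51-S53: 8 box vertices -> camera frame -> image plane (8x2 pixel array)."""
    # S51: vertices in the target frame (origin at bottom-face center, y down).
    V = np.array([[sx * l / 2, y, sz * w / 2]
                  for y in (0.0, -h) for sx in (1, -1) for sz in (1, -1)]).T
    # S52: align camera -y with the ground normal, spin by the heading angle,
    # then translate to the bottom-face center C in camera coordinates.
    R_n = rotation_between(np.array([0.0, -1.0, 0.0]), n)
    u = n / np.linalg.norm(n)
    N = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
    R_a = np.eye(3) + np.sin(alpha) * N + (1 - np.cos(alpha)) * (N @ N)
    V2 = R_a @ R_n @ V + np.reshape(C, (3, 1))
    # S53: project with P in homogeneous coordinates and divide by depth.
    V3 = P @ np.vstack([V2, np.ones((1, 8))])
    return (V3[:2] / V3[2]).T

# Illustrative pose: ground normal along camera -y, heading angle 0, box 10 m ahead.
P = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
uv = box_image_points(l=2.0, w=2.0, h=1.0,
                      n=np.array([0.0, -1.0, 0.0]), alpha=0.0,
                      C=np.array([0.0, 2.0, 10.0]), P=P)
```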
Advantageous effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
First, the invention designs an efficient real-time three-dimensional target detection framework for the unique underground coal mine environment and camera installation, greatly increasing the speed of three-dimensional target detection. The detection model is accelerated with TensorRT at deployment and achieves an inference speed of 50 frames per second on an NVIDIA 3060 graphics card.
Second, the use of a two-dimensional target detection model greatly improves the accuracy and efficiency of target recognition. With a well-constructed data set, the target detection model reaches high accuracy and false detections are reduced. In addition, the two-dimensional detection model can be flexibly tuned or replaced, and the coupling between modules is low.
Third, because only the targets' two-dimensional information needs to be annotated for model training, the cost of data acquisition and labeling is greatly reduced, saving manpower and material resources and easing deployment.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a schematic diagram of the calculation of three-dimensional coordinates of an object of the present invention.
FIG. 3 is a schematic diagram of the target bottom center offset calculation of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention discloses a method for rapidly detecting a monocular vision three-dimensional target under a coal mine, which has an overall flow chart shown as the attached figure 1 and comprises the following steps:
s1, constructing a two-dimensional target detection data set: acquiring a data set with a specified target by using a monocular camera installed in a coal mine, determining the number of classes of the target in the data set, marking a two-dimensional frame and a class of the target in each image sample, randomly disordering the obtained image and a corresponding label, and dividing the image into a training set and a testing set;
s2, training a neural network model: training on a training set, testing on a testing set, and obtaining an optimal neural network model through parameter adjustment;
s3, calibrating internal parameters and external parameters of the camera: defining a world coordinate system, calibrating camera internal parameters by a checkerboard calibration method to obtain a camera projection matrix P, and calibrating camera external parameters by laser radar assistance to obtain a transformation matrix T from the world coordinate system to the camera coordinate system W2C
S4, calculating a target three-dimensional coordinate: inputting the image into the two-dimensional target detection model obtained in the step S2 to obtain a target two-dimensional frame, and calculating a two-dimensional coordinate C of the center of the bottom edge of the frame 2d Calculating a target three-dimensional coordinate C by combining internal and external parameters of the camera;
s5, calculating a target three-dimensional bounding box: and calculating a three-dimensional surrounding frame of the target under a camera coordinate system by using the three-dimensional coordinate C of the target obtained in the step S4, the course angle alpha of the target along the roadway direction and the length, width and height [ l, w, h ] prior information of the target.
Further, in step S1, a specific method for constructing a two-dimensional target detection data set is as follows:
s11, acquiring target image data of different types of targets in different environments by using a monocular camera installed under a coal mine, wherein the different environments comprise different illumination conditions and different postures of the targets;
s12, determining the number of classes of the targets to be detected, using a labellmg tool to label the acquired images in a two-dimensional frame mode and derive labels, wherein each picture corresponds to one label and comprises the classes and two-dimensional boundary frames of all the targets in the picture, and dividing the labeled data set into a training set and a testing set which comprise the pictures and corresponding label files.
Further, in step S2, a specific method for training the two-dimensional target detection model is as follows:
s21, taking a target image in a training set as input, taking a target category and a two-dimensional boundary box as output training neural network models, testing and adjusting parameters on the testing set to obtain the trained neural network models, wherein the neural network models can take the target image as input and take the output as the target category in the target image and the two-dimensional boundary box in the image;
and S22, using the trained neural network model to further improve the reasoning speed by using TensorRT to obtain the accelerated neural network model.
Further, in step S3, the calibration method for the internal reference and the external reference of the camera is as follows:
s31, calibrating camera internal parameters by using a checkerboard calibration method to obtain a camera projection matrix P;
s32, using the laser radar to perform auxiliary calibration on an external reference matrix, wherein the external reference matrix refers to a transformation matrix from a world coordinate system to a camera coordinate system, the laser radar and the camera have a common installation height, and using an MATLAB tool to calibrate the transformation matrix between the camera and the radar to obtain a transformation matrix T from the laser radar to the camera L2C
S33, defining a world coordinate system: the origin is a projection point of the radar origin on the ground, the xOy plane is the ground, the y axis is forward along the roadway, the x axis is rightward, the z axis is upward, a right-hand coordinate system is met, the installation height h of the laser radar is measured, and a translation vector t between the radar coordinate system and the world coordinate system is obtained through the definition of the world coordinate system L2W =(0,0,h);
S34, obtaining coordinates A of two points with the farthest distance on the edge of the roadway in the point cloud L (x 1 ,y 1 ,z 1 ),B L (x 2 ,y 2 ,z 2 ) Then, then
Figure BDA0003987862100000071
Eye and/or liver device>
Figure BDA0003987862100000072
Parallel to the world coordinate system y-axis, the subscript L indicates that the coordinate is multiplied by a vector point, if, under the lidar coordinate system>
Figure BDA0003987862100000073
Direction vector to the y axis of the radar coordinate system &>
Figure BDA0003987862100000074
The cosine value of the included angle between the two is as follows:
Figure BDA0003987862100000075
wherein normaize indicates the conversion of a vector into a unit vector,
Figure BDA0003987862100000076
to/>
Figure BDA0003987862100000077
The rotation axis of (a) is obtained by vector cross multiplication:
Figure BDA0003987862100000081
note the book
Figure BDA0003987862100000082
Is obtained according to the Rodrigues rotation formula>
Figure BDA0003987862100000083
To>
Figure BDA0003987862100000084
Of (3) a rotation matrix R 1
Figure BDA0003987862100000085
S35, determining a ground normal vector so as to determine the rotation between the other two axes between the two coordinate systems, and fitting the ground part in the point cloud in the obtained point cloud by using a plane fitting method to obtain the radar seat of the groundNormal vector under the mark system
Figure BDA0003987862100000086
In the same manner as in S34
Figure BDA0003987862100000087
To>
Figure BDA0003987862100000088
A calculation step of the rotation matrix, calculating a vector ≥>
Figure BDA0003987862100000089
To the xOy plane normal vector @underthe radar coordinate system>
Figure BDA00039878621000000810
Is R 2 Then the rotation matrix R of the radar coordinate system to the world coordinate system L2W Comprises the following steps:
R L2W =R 1 R 2
calculating a transformation matrix from the radar coordinate system to the world coordinate system as follows:
Figure BDA00039878621000000811
the transformation matrix from the world coordinate system to the radar coordinate system is the inverse matrix T thereof W2L =T′ L2W
S36, by T L2C And T W2L Calculating to obtain a transformation matrix T from a world coordinate system to a camera coordinate system W2C
T W2C =T W2L ·T L2C
Further, the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:
s41, utilization of T W2C The origin O under the world coordinate system W = (0, 0) and ground normal vector
Figure BDA00039878621000000812
Transformation to camera coordinate system:
Figure BDA00039878621000000813
note O C =(x o ,y o ,z o ) And
Figure BDA00039878621000000814
s42, obtaining a ground equation G (x, y, z) under a camera coordinate system by using a point-method equation of a plane:
G(x,y,z):n 1 (x-x o )+n 2 (y-y o )+n 3 (z-z o )=0
s43, inputting the target image into the neural network model obtained in the step S2 to obtain the categories and the two-dimensional surrounding frame of all targets, and calculating the bottom edge center coordinate C of the two-dimensional surrounding frame of the targets 2d =(x 2d ,y 2d );
S44, carrying out back projection by using the pinhole camera projection model, and calculating a two-dimensional coordinate C 2d Represents:
C 3d =(z 3d (x 2d -c x )/f x ,z 3d (y 2d -c y )/f y ,z 3d )
wherein f is x And f y The focal lengths in the x-axis and y-axis directions of the camera are respectively expressed and obtained by a camera projection matrix P, namely: f. of x =P[0][0],f y =P[1][1],z 3d ∈(0,∞]Representing a depth;
s45, mixing C 3d Substituting into the ground equation to calculate C 3d Unknown number z in 3d
z 3d =(n 1 x o +n 2 y o +n 3 z o )/(n 1 (x 2d -c x )/f x +n 2 (y 2d -c y )/f y +n 3 )
S46, utilizing the obtained three-dimensional coordinates C 3d Calculating the center coordinates C of the bottom surface of the target, calculating C 3d The coordinates under the world coordinate system are:
C W =T C2W C 3d =(x W ,y W ,z W )
wherein, T C2W =T′ W2C ,T′ W2C Is T W2C Inverse matrix, calculated C W And target bottom center C' W An offset exists, the offset changes along with the change of a target course angle alpha, if the target course angle is along the roadway direction, alpha is equal to the included angle between the vector corresponding to the projection of the x axis of the projection camera coordinate system to the ground and the y axis of the world coordinate system, and according to the geometrical relationship, the offset is as follows:
offset=(|l·sinα|+|w·cosα|)/2
wherein l and w are respectively the length and width of the target, and the bottom center coordinate C 'of the target in the world coordinate system is calculated by the definition of the world coordinate system' W
C′ W =(x W -|cosα·offset|,y W +|sinα·offset|,z W )
The coordinates of the center of the bottom of the target under the camera coordinate system are calculated as:
C=T W2C C′ W
further, in step S5, a specific method for calculating the target three-dimensional bounding box is as follows:
S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target bottom surface, i.e. the ground; the coordinate origin lies at the center point of the target bottom surface; the y axis is perpendicular to the ground and points downward; the x axis follows the target advancing direction; and the z axis points to the left of the advancing direction, forming a right-handed coordinate system. With the target coordinate system origin O_V = (0, 0, 0), the 8 vertex coordinates of the three-dimensional bounding box are:

V = { (l/2, 0, w/2), (l/2, 0, −w/2), (−l/2, 0, −w/2), (−l/2, 0, w/2),
      (l/2, −h, w/2), (l/2, −h, −w/2), (−l/2, −h, −w/2), (−l/2, −h, w/2) }

where l, w and h denote the target length, width and height, respectively;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera coordinate system. Using the ground normal vector n = (n_1, n_2, n_3) in the camera coordinate system obtained in step 4 and the target heading angle α, rotate the target twice and then translate it to obtain the eight vertex coordinates in the camera coordinate system. Using the Rodrigues formula, following the rotation-matrix calculation step of step 3, calculate the rotation matrix R_n from the camera y-axis negative direction vector (0, −1, 0) to n; then calculate the rotation matrix about the target heading angle α:

R_α =
[ cosα   0   sinα ]
[  0     1    0   ]
[ −sinα  0   cosα ]

The eight vertices are rotated to:

V′ = R_α R_n V

With the target bottom-surface center in the camera coordinate system calculated in step 4 denoted C = (x_c, y_c, z_c), translate each vertex by C:

V″ = V′ + C

The obtained V″ gives the coordinates of the eight vertices in the camera coordinate system;
S53, project the 8 vertex coordinates in the camera coordinate system to the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P:

V″′ = P V″

then divide the x and y values of each point by its depth to obtain its coordinates in the image coordinate system:

v_n = (x_n / z_n, y_n / z_n)

where x_n, y_n and z_n are the coordinate values of the n-th vertex in V″′.
The rapid monocular-vision three-dimensional target detection method for underground coal mines provided by the embodiment of the present invention has been described in detail above; for a person skilled in the art, there may be changes in the specific implementation and the application scope according to the idea of the embodiment of the present invention.

Claims (6)

1. A rapid monocular-vision three-dimensional target detection method for underground coal mines, characterized by comprising the following steps:

S1, constructing a two-dimensional target detection data set: acquire a data set containing the specified targets with a monocular camera installed in the coal mine, determine the number of target classes in the data set, annotate the two-dimensional frame and class of the target in each image sample, randomly shuffle the obtained images and corresponding labels, and divide them into a training set and a testing set;

S2, training a neural network model: train on the training set, test on the testing set, and obtain the optimal neural network model through parameter tuning;

S3, calibrating the camera intrinsic and extrinsic parameters: define a world coordinate system, calibrate the camera intrinsic parameters with the checkerboard calibration method to obtain the camera projection matrix P, and complete the extrinsic calibration between the camera and the world coordinate system with lidar assistance to obtain the transformation matrix T_W2C from the world coordinate system to the camera coordinate system;

S4, calculating the target three-dimensional coordinate: input the image into the two-dimensional target detection model obtained in step S2 to obtain the target two-dimensional frame, calculate the two-dimensional coordinate C_2d of the center of the frame bottom edge, and calculate the target three-dimensional coordinate C by combining the camera intrinsic and extrinsic parameters;

S5, calculating the target three-dimensional bounding box: using the target three-dimensional coordinate C obtained in step S4, the target heading angle α along the roadway direction, and the prior length, width and height [l, w, h] of the target, calculate the three-dimensional bounding box of the target in the camera coordinate system.
2. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S1 the specific method for constructing the two-dimensional target detection data set is as follows:

S11, use a monocular camera installed in the coal mine to acquire target image data of different classes of targets in different environments, the different environments including different illumination conditions and different target postures;

S12, determine the number of target classes to be detected, use the labelImg tool to annotate two-dimensional frames on the acquired images and export labels, each picture corresponding to one label that contains the classes and two-dimensional bounding boxes of all targets in the picture, and divide the annotated data set into a training set and a testing set, each comprising pictures and the corresponding label files.
3. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S2 the specific method for training the two-dimensional target detection model is as follows:

S21, train the neural network model with target images in the training set as input and the target classes and two-dimensional bounding boxes as output, then test and tune parameters on the testing set to obtain the trained neural network model, which takes a target image as input and outputs the classes of the targets in the image together with their two-dimensional bounding boxes in the image;

S22, use TensorRT to further improve the inference speed of the trained neural network model, obtaining the accelerated neural network model.
4. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein in step S3 the calibration method for the camera intrinsic and extrinsic parameters is as follows:

S31, calibrate the camera intrinsic parameters with the checkerboard calibration method to obtain the camera projection matrix P;

S32, use the lidar to assist in calibrating the extrinsic matrix, the extrinsic matrix being the transformation matrix from the world coordinate system to the camera coordinate system; the lidar and the camera are installed at the same height; calibrate the transformation between the camera and the radar with the MATLAB toolbox to obtain the transformation matrix T_L2C from the lidar to the camera;

S33, define the world coordinate system: the origin is the projection point of the radar origin on the ground, the xOy plane is the ground, the y axis points forward along the roadway, the x axis points right and the z axis points up, forming a right-handed coordinate system; measure the lidar installation height h, and from the definition of the world coordinate system obtain the translation vector between the radar coordinate system and the world coordinate system, t_L2W = (0, 0, h);
S34, obtain in the point cloud the coordinates of the two farthest-apart points on the roadway edge, A_L(x_1, y_1, z_1) and B_L(x_2, y_2, z_2); the vector

AB = B_L − A_L

is then parallel to the world coordinate system y axis, the subscript L indicating coordinates in the lidar coordinate system. Using the vector dot product, the cosine of the angle between AB and the radar coordinate system y-axis direction vector e_y = (0, 1, 0) is:

cosθ = normalize(AB) · normalize(e_y)

where normalize denotes converting a vector into a unit vector. The rotation axis from AB to e_y is obtained by the vector cross product:

k = normalize(AB × e_y)

Writing K for the skew-symmetric matrix of k, the rotation matrix R_1 from AB to e_y is obtained from the Rodrigues rotation formula:

R_1 = I + sinθ·K + (1 − cosθ)·K²
S35, determine the ground normal vector so as to determine the rotation between the remaining two axes of the two coordinate systems. Fit the ground portion of the acquired point cloud with a plane-fitting method to obtain the ground normal vector n_L in the radar coordinate system. In the same manner as the rotation-matrix calculation step of S34, calculate the rotation matrix R_2 from n_L to the xOy-plane normal vector (0, 0, 1) of the radar coordinate system. The rotation matrix R_L2W from the radar coordinate system to the world coordinate system is then:

R_L2W = R_1 R_2

and the transformation matrix from the radar coordinate system to the world coordinate system is:

T_L2W =
[ R_L2W  t_L2W ]
[   0      1   ]

The transformation matrix from the world coordinate system to the radar coordinate system is its inverse, T_W2L = T_L2W^(-1);

S36, from T_L2C and T_W2L, calculate the transformation matrix T_W2C from the world coordinate system to the camera coordinate system:

T_W2C = T_L2C · T_W2L
5. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein the specific method for calculating the target three-dimensional coordinate in step S4 is as follows:

S41, use T_W2C to transform the world coordinate system origin O_W = (0, 0, 0) and the ground normal vector n_W = (0, 0, 1) into the camera coordinate system:

O_C = T_W2C · O_W,  n = R_W2C · n_W

where R_W2C is the rotation part of T_W2C; denote O_C = (x_o, y_o, z_o) and n = (n_1, n_2, n_3);

S42, from the point-normal equation of the plane, obtain the ground equation G(x, y, z) in the camera coordinate system:

G(x, y, z): n_1(x − x_o) + n_2(y − y_o) + n_3(z − z_o) = 0
S43, input the target image into the neural network model obtained in step S2 to obtain the classes and two-dimensional bounding boxes of all targets, and calculate the bottom-edge center coordinate of the target two-dimensional bounding box, C_2d = (x_2d, y_2d);

S44, back-project with the pinhole camera projection model, expressing the three-dimensional point corresponding to C_2d as:

C_3d = (z_3d(x_2d − c_x)/f_x, z_3d(y_2d − c_y)/f_y, z_3d)

where f_x and f_y are the camera focal lengths along the x and y axes and c_x, c_y are the principal point coordinates, all obtained from the camera projection matrix P, namely f_x = P[0][0], f_y = P[1][1], c_x = P[0][2], c_y = P[1][2], and z_3d ∈ (0, ∞) represents the depth;

S45, substitute C_3d into the ground equation and solve for the unknown z_3d in C_3d:

z_3d = (n_1·x_o + n_2·y_o + n_3·z_o) / (n_1(x_2d − c_x)/f_x + n_2(y_2d − c_y)/f_y + n_3)
S46, using the obtained three-dimensional coordinate C_3d, calculate the target bottom-surface center coordinate C. First compute the coordinates of C_3d in the world coordinate system:

C_W = T_C2W · C_3d = (x_W, y_W, z_W)

where T_C2W = T_W2C^(-1) is the inverse of T_W2C. There is an offset between the computed C_W and the target bottom-surface center C'_W, and this offset varies with the target heading angle α. If the target heads along the roadway, α equals the angle between the projection of the camera coordinate system x axis onto the ground and the y axis of the world coordinate system; from the geometric relationship, the offset is:

offset = (|l·sinα| + |w·cosα|)/2

where l and w are the target length and width, respectively. From the definition of the world coordinate system, the target bottom-surface center coordinate C'_W in the world coordinate system is:

C'_W = (x_W − |cosα·offset|, y_W + |sinα·offset|, z_W)

The target bottom-surface center coordinate in the camera coordinate system is then:

C = T_W2C · C'_W
6. The rapid monocular-vision three-dimensional target detection method for underground coal mines according to claim 1, wherein the specific method for calculating the target three-dimensional bounding box in step S5 is as follows:

S51, calculate the 8 vertex coordinates of the three-dimensional bounding box in the target coordinate system. Define the target coordinate system: the xOz plane coincides with the target bottom surface, i.e. the ground; the coordinate origin lies at the center point of the target bottom surface; the y axis is perpendicular to the ground and points downward; the x axis follows the target advancing direction; and the z axis points to the left of the advancing direction, forming a right-handed coordinate system. With the target coordinate system origin O_V = (0, 0, 0), the 8 vertex coordinates of the three-dimensional bounding box are:

V = { (l/2, 0, w/2), (l/2, 0, −w/2), (−l/2, 0, −w/2), (−l/2, 0, w/2),
      (l/2, −h, w/2), (l/2, −h, −w/2), (−l/2, −h, −w/2), (−l/2, −h, w/2) }

where l, w and h denote the target length, width and height, respectively;
S52, rotate the eight vertices into the camera coordinate system to obtain the three-dimensional bounding box in the camera coordinate system. Using the ground normal vector n = (n_1, n_2, n_3) in the camera coordinate system obtained in step 4 and the target heading angle α, rotate the target twice and then translate it to obtain the eight vertex coordinates in the camera coordinate system. Using the Rodrigues formula, following the rotation-matrix calculation step of step 3, calculate the rotation matrix R_n from the camera y-axis negative direction vector (0, −1, 0) to n; then calculate the rotation matrix about the target heading angle α:

R_α =
[ cosα   0   sinα ]
[  0     1    0   ]
[ −sinα  0   cosα ]

The eight vertices are rotated to:

V′ = R_α R_n V

With the target bottom-surface center in the camera coordinate system calculated in step 4 denoted C = (x_c, y_c, z_c), translate each vertex by C:

V″ = V′ + C

The obtained V″ gives the coordinates of the eight vertices in the camera coordinate system;
S53, project the 8 vertex coordinates in the camera coordinate system to the image coordinate system to obtain the two-dimensional coordinates of the three-dimensional bounding box on the image. First project with the camera projection matrix P:

V″′ = P V″

then divide the x and y values of each point by its depth to obtain its coordinates in the image coordinate system:

v_n = (x_n / z_n, y_n / z_n)

where x_n, y_n and z_n are the coordinate values of the n-th vertex in V″′.
CN202211571246.5A 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine Pending CN115984766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211571246.5A CN115984766A (en) 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine


Publications (1)

Publication Number Publication Date
CN115984766A true CN115984766A (en) 2023-04-18

Family

ID=85973075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211571246.5A Pending CN115984766A (en) 2022-12-08 2022-12-08 Rapid monocular vision three-dimensional target detection method for underground coal mine

Country Status (1)

Country Link
CN (1) CN115984766A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681778A (en) * 2023-06-06 2023-09-01 固安信通信号技术股份有限公司 Distance measurement method based on monocular camera
CN116681778B (en) * 2023-06-06 2024-01-09 固安信通信号技术股份有限公司 Distance measurement method based on monocular camera
CN117197149A (en) * 2023-11-08 2023-12-08 太原理工大学 Cooperative control method of tunneling and anchoring machine and anchor rod trolley

Similar Documents

Publication Publication Date Title
CN109270534B (en) Intelligent vehicle laser sensor and camera online calibration method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN115984766A (en) Rapid monocular vision three-dimensional target detection method for underground coal mine
CN107677274B (en) Unmanned plane independent landing navigation information real-time resolving method based on binocular vision
CN108594245A (en) A kind of object movement monitoring system and method
CN109993793B (en) Visual positioning method and device
CN106645205A (en) Unmanned aerial vehicle bridge bottom surface crack detection method and system
CN110031829B (en) Target accurate distance measurement method based on monocular vision
CN103075998B (en) A kind of monocular extraterrestrial target range finding angle-measuring method
CN111046776A (en) Mobile robot traveling path obstacle detection method based on depth camera
CN112037159B (en) Cross-camera road space fusion and vehicle target detection tracking method and system
CN104268935A (en) Feature-based airborne laser point cloud and image data fusion system and method
US20200357141A1 (en) Systems and methods for calibrating an optical system of a movable object
CN101702233B (en) Three-dimension locating method based on three-point collineation marker in video frame
CN102589530B (en) Method for measuring position and gesture of non-cooperative target based on fusion of two dimension camera and three dimension camera
CN107784038B (en) Sensor data labeling method
CN101520892B (en) Detection method of small objects in visible light image
CN103886107A (en) Robot locating and map building system based on ceiling image information
CN113050074B (en) Camera and laser radar calibration system and calibration method in unmanned environment perception
Ma et al. Crlf: Automatic calibration and refinement based on line feature for lidar and camera in road scenes
CN114608554B (en) Handheld SLAM equipment and robot instant positioning and mapping method
CN113313116B (en) Underwater artificial target accurate detection and positioning method based on vision
CN114692720A (en) Image classification method, device, equipment and storage medium based on aerial view
CN114004977A (en) Aerial photography data target positioning method and system based on deep learning
CN111368797A (en) Target real-time ranging method based on road end monocular camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination