WO2024131200A1 - 基于单目视觉的车辆3d定位方法、装置及汽车 - Google Patents


Info

Publication number
WO2024131200A1
WO2024131200A1 (PCT/CN2023/122151)
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
target
detection
side line
frame
Prior art date
Application number
PCT/CN2023/122151
Other languages
English (en)
French (fr)
Inventor
郑敏鹏
熊光银
卢金波
Original Assignee
惠州市德赛西威智能交通技术研究院有限公司
Priority date
Filing date
Publication date
Application filed by 惠州市德赛西威智能交通技术研究院有限公司 filed Critical 惠州市德赛西威智能交通技术研究院有限公司
Publication of WO2024131200A1 publication Critical patent/WO2024131200A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present invention relates to the technical field of vehicle position detection and tracking, and in particular to a vehicle 3D positioning method, device and vehicle based on monocular vision.
  • In the prior art, three-dimensional vehicle information is used as training data, and the vehicle 3D result is obtained through a regression algorithm.
  • For 3D (three-dimensional) target detection from a single RGB image, "Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss" takes a single RGB image as input and, through a CNN (Convolutional Neural Networks), directly outputs the predicted target category, the 2D (two-dimensional) box position, the target distance, the target deflection angle, the length, width and height of the 3D box, and the coordinates of the eight vertices of the 3D box projected into 2D.
  • The best 2D box is then extracted through NMS (Non-Maximum Suppression).
  • The 3D box fitting is converted into three pieces of information, namely the target category, the 2D box and the 3D box, which correspond one-to-one to the annotated ground truth (the real, valid values), and the IoU loss is optimized for network regression training.
  • This method requires the annotated data to contain ground-truth information such as length, width, height and orientation.
  • In practice, however, deep-learning-based methods need a large amount of ground-truth data in order to obtain the target 3D detection box through training; if the data is scarce, the trained model tends to overfit and generalizes poorly. Under normal road conditions there are many vehicles ahead, and if LiDAR data is not fused during data collection it is very difficult to obtain the actual physical length, width, height and orientation of the vehicles ahead using a monocular camera alone. LiDAR, however, is expensive, must be jointly calibrated with the monocular camera, and still requires further data fusion afterwards to obtain the relevant physical dimensions and orientation of the vehicles ahead.
  • As a result, 3D data about the preceding vehicles is difficult to collect, and the training-data requirements are hard to meet: obtaining the target 3D box through deep learning requires a large amount of 3D data of the preceding vehicles that is hard to acquire.
  • Moreover, the calibrated 3D data generalizes poorly; when the camera is replaced (the intrinsic parameters change) or the mounting position deviates (the extrinsic parameters change), the performance of the model is seriously degraded.
  • In view of these problems, the present invention proposes a monocular-vision-based vehicle 3D positioning method, device and automobile. It adopts a multi-task, convolutional-neural-network-based monocular vehicle 3D target detection and tracking approach that integrates vehicle-body key-point regression, body side-line angle regression, body-posture classification, target detection and feature-vector generation. It can accurately obtain the 3D target box of the target vehicle even under a certain degree of occlusion, has strong anti-interference ability, and can effectively detect and track the target vehicle in 3D.
  • the present invention provides a vehicle 3D detection and tracking method based on monocular vision, comprising the steps of:
  • S1: Acquire a real-time captured image from a monocular camera, wherein the real-time captured image includes at least one target vehicle in front of the vehicle's monocular camera.
  • S2: Process the real-time captured image through a multi-task recognition model to obtain first target information.
  • S3: Obtain the parameter information corresponding to the monocular camera, and calculate second target information of the target vehicle in combination with the first target information.
  • S4: Track the target vehicle according to the second target information.
  • Because body side-line detection and vehicle detection share feature layers of the same scale, each feature point corresponds to one set of detection results, no subsequent matching is needed, and target tracking efficiency is improved.
  • the first target information includes the 2D full body frame, target vehicle feature vector, target vehicle body posture and target vehicle body side line information corresponding to the target vehicle;
  • the second target information includes the 3D full body frame of the target vehicle.
  • the combination of the first target information and the second target information can provide rich target vehicle information, which helps to improve the accuracy of subsequent target vehicle tracking.
  • the step S2 is to obtain the first target information by processing through the multi-task recognition model, which specifically includes:
  • S201: Preprocess the real-time captured image through a convolutional neural network model to obtain a heat map.
  • S202: Filter the feature points in the heat map that are above a preset threshold as target feature data, obtain the vehicle type parameters corresponding to the feature points, and calculate a 2D detection box classification list according to the CenterNet target detection algorithm.
  • S203: For each 2D detection box, output the body posture information of the corresponding position and obtain the posture of the target vehicle through the argmax operator; if the posture is Front_Side or Rear_Side, perform side-line detection to obtain the coordinates of the side-line endpoints A and B; otherwise, return to S202 until all target vehicle detection boxes have been processed.
  • the first target information can be effectively extracted through processing by a multi-task recognition model, and has the characteristics of high efficiency, accuracy and comprehensiveness.
  • step S201 specifically includes:
  • The image size at the input of the convolutional neural network model is set for the real-time captured image according to the camera resolution ratio and the network structure parameters.
  • The ghost module is used to perform a linear transformation of the image features to obtain ghost features; the ghost features of different scales are then fused with weighted values through the BiFPN feature fusion algorithm to output multi-layer feature nodes; and the multi-layer feature nodes are normalized according to the image size ratio at the input of the model to obtain a heat map.
  • Compared with MobileNet, the ghost module uses fewer operations and can reduce the amount of computation by appropriately reducing the convolutional feature layers while providing richer feature expression, which makes it suitable for autonomous-driving applications with high real-time requirements.
  • BiFPN can fuse features of different scales with weighted values, allowing the network to learn the weights of different input features by itself.
  • the BiFPN application method is simple and easy to use, and BiFPN can be used multiple times during the feature fusion process according to the complexity of the detection task and the computing power of the hardware.
  • the first solution for performing side line detection in step S203 specifically includes:
  • The coordinate deviation of the midpoint C of the segment containing side-line endpoints A and B relative to the central feature point F, the length of the segment, and the angle of the segment relative to the x-axis are obtained; the coordinates of the midpoint C are then calculated from the coordinates of the central feature point and the coordinate deviation.
  • The coordinates of the side-line endpoints A and B are calculated from the coordinates of the midpoint C, the segment length and the segment angle relative to the x-axis.
  • the method for calculating the coordinates of the side line endpoints A and B does not rely on the accuracy of the 2D detection frame.
  • the second solution for performing side line detection in step S203 specifically includes:
  • the method for calculating the coordinates of the side line endpoints A and B utilizes the information of the 2D detection box, and the amount of regression tasks is small.
  • the third solution for performing side line detection in step S203 specifically includes:
  • The fact that the two ends of the segment containing side-line endpoints A and B lie on the 2D detection box is used as prior information, and the relative position ratio of the bottom-edge endpoint along the bottom edge is output.
  • The coordinates of side-line endpoint A are calculated from the upper-left corner coordinates, box width and box height of the 2D detection box together with the bottom-edge relative position ratio; the angle of the segment containing endpoints A and B relative to the x-axis is then obtained, and the coordinates of endpoint B are calculated from the upper-left corner coordinates, box width, box height, the bottom-edge relative position ratio and that angle.
  • This method of calculating the coordinates of side-line endpoints A and B is applied when the posture of the preceding vehicle is visible front-and-side or visible rear-and-side, in which case the angle of the segment relative to the x-axis is pronounced. When the side line is visible from the front view, the angle value cannot fall within a small neighborhood centered on 0.5 (i.e., 90°), so no jump discontinuity occurs.
  • Endpoint A can be obtained as the intersection of the vertical line through the outer edge of the taillight or headlight with the box, so the feature is distinctive; and in one embodiment both the angle of the segment relative to the x-axis and the bottom-edge relative position ratio lie in the range [0, 1], which is convenient for network regression learning.
  • Before executing step S3, the method includes:
  • calibrating the internal and external parameters of the camera, and establishing a mapping from image coordinates to a ground-plane vehicle coordinate system whose origin is the camera's projection onto the ground.
  • the first case of calculating the second target information of the target vehicle in step S3 specifically includes:
  • S301 Select the side line endpoints of the 2D detection frame and the endpoints on the invisible vehicle side, and correspondingly obtain the coordinate mapping of the side line endpoints of the target vehicle detection frame and the endpoints on the invisible vehicle side in the ground plane vehicle coordinate system.
  • S302: Calculate the 3D coordinate system parameters of the target vehicle in the ground-plane vehicle coordinate system according to the coordinate mappings of the side-line endpoints of the target vehicle detection box and the endpoint on the invisible vehicle side.
  • the 3D coordinate system parameters include at least vehicle length, vehicle width, vehicle height and angle relative to the origin.
  • S311 Select the bottom edge endpoint and midpoint of the 2D detection frame, and correspondingly obtain the coordinate mapping of the target vehicle detection frame endpoint and midpoint in the ground plane vehicle coordinate system.
  • S312 Calculate the 3D coordinate system parameters of the target vehicle in the ground plane vehicle coordinate system according to the coordinate mapping of the endpoints and the midpoint of the target vehicle detection frame.
  • the key point coordinates of the target vehicle can be represented in the ground plane vehicle coordinate system, which can facilitate subsequent calculations and analysis; calculating the target vehicle 3D coordinate system parameters is helpful for subsequent tracking of the target vehicle.
  • the step S4 specifically includes: performing 3D target tracking by a deepsort target tracking method according to the second target information.
  • DeepSORT is a real-time target tracking method with high processing speed and real-time performance; and the DeepSORT method can track multiple targets at the same time, so it is suitable for situations where there are multiple target vehicles in complex scenes.
  • the present invention also provides a vehicle 3D detection and tracking device based on monocular vision, the device at least comprising:
  • The first acquisition module is used to acquire a real-time captured image from a monocular camera, wherein the real-time captured image includes at least one target vehicle in front of the vehicle's monocular camera.
  • The first computing unit is used to process the real-time captured image through a multi-task recognition model to obtain first target information, wherein the first target information includes the 2D full-body box corresponding to the target vehicle, the target vehicle feature vector, the target vehicle body posture and the target vehicle body side-line information.
  • The second acquisition module is used to obtain the parameter information corresponding to the monocular camera.
  • The second calculation unit is used to calculate the second target information of the target vehicle by combining the parameter information corresponding to the monocular camera with the first target information.
  • The tracking unit is used to perform 3D target tracking through the DeepSORT target tracking method according to the second target information.
  • the first computing unit further includes:
  • the preprocessing module is used to preprocess the real-time acquired images through a convolutional neural network model.
  • The data screening module is used to select the feature points above the preset threshold in the heat map obtained by the preprocessing module as target feature data.
  • the first detection module is used to obtain the vehicle type parameters corresponding to the feature points, and calculate a 2D detection box classification list according to the CenterNet target detection algorithm.
  • the second detection module is used to output the body posture information of the corresponding position for each 2D detection frame, and obtain the posture of the target vehicle through the argmax operator; if the posture is Front_Side or Rear_Side, side line detection is performed to obtain the coordinates of the side line endpoints A and B.
  • the second calculation unit further includes:
  • a coordinate mapping module for obtaining the coordinate mapping of the target vehicle detection frame endpoint and the midpoint in the ground plane vehicle coordinate system according to the selected 2D detection frame bottom edge endpoint and the midpoint, or for obtaining the coordinate mapping of the target vehicle detection frame side line endpoint and the endpoint on the invisible vehicle side in the ground plane vehicle coordinate system according to the selected 2D detection frame side line endpoint and the endpoint on the invisible vehicle side;
  • the parameter calculation module is used to calculate the 3D coordinate system parameters of the target vehicle in the ground plane vehicle coordinate system according to the results of the coordinate mapping module.
  • the 3D coordinate system parameters at least include the vehicle length, vehicle width, vehicle height and angle relative to the origin.
  • the present invention also provides a car, which is equipped with a monocular camera and a vehicle 3D detection and tracking device based on monocular vision.
  • the device uses the vehicle 3D detection and tracking method based on monocular vision as described above to process the real-time image of at least one target vehicle in front of the car captured in real time by the monocular camera.
  • the present invention provides a vehicle 3D positioning method, device and vehicle based on monocular vision, which obtains first target information by acquiring images from a monocular camera in real time and processing them through a multi-task recognition model; after obtaining parameter information corresponding to the monocular camera, the second target information of the target vehicle is calculated in combination with the first target information, and finally the target vehicle is tracked according to the second target information.
  • the present invention uses a monocular camera on a vehicle to collect images in real time when the vehicle is moving, and extracts feature points and feature lines in the image.
  • the 2D frame diagram of the vehicle is obtained using the feature points and feature lines, and then the coordinate mapping of the endpoints and midpoints of the target vehicle detection frame in the ground plane vehicle coordinate system is correspondingly obtained, thereby obtaining the 3D coordinate system parameters of the target vehicle.
  • During image tracking, an optimized camera pose of the monocular camera is obtained through fusion optimization of the feature points and feature lines. Even in environments with little texture, a camera pose of high accuracy can still be obtained, and in turn a vehicle pose of high accuracy, which reduces the influence of environmental texture changes on vehicle positioning and greatly improves the robustness of the vehicle positioning system.
  • the present invention uses the same-scale feature layer for body side line detection and vehicle detection. Each feature point corresponds to a set of detection results, and there is no need for subsequent matching.
  • the vehicle side line detection in the present invention has strong anti-interference ability and allows a certain degree of occlusion.
  • the vehicle body posture classification task enables various scenes to be processed in a targeted manner.
  • the present invention can output the cuboid target of the preceding vehicle in the current vehicle 3D coordinate system.
  • the present invention combines vehicle 3D detection with feature detection to improve target tracking efficiency.
  • FIG. 1 is a flow chart of the vehicle 3D detection and tracking method based on monocular vision according to the present invention.
  • FIG. 2 is a diagram of the convolutional neural network model according to the present invention.
  • FIG. 3 is a schematic diagram of a ghost module according to the present invention.
  • FIG. 4 is a schematic diagram of a method for calculating a vehicle body side line in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a method for calculating a vehicle body side line in another embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a method for calculating a vehicle body side line in another embodiment of the present invention.
  • FIG. 7 is a schematic diagram of target feature vector output in another embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a vehicle coordinate system in another embodiment of the present invention.
  • FIG. 9 is a schematic diagram of image-to-vehicle coordinate system mapping in another embodiment of the present invention.
  • FIG. 10 is a schematic diagram of projection analysis of another embodiment of the present invention in which the side line of the front vehicle is invisible.
  • FIG. 11 is a system framework diagram of the vehicle 3D detection and tracking method based on monocular vision described in FIG. 1 .
  • the present invention provides a vehicle 3D detection and tracking method based on monocular vision, comprising the steps of:
  • S1 Acquire a real-time captured image of a monocular camera, wherein the real-time captured image includes a real-time captured image of at least one target vehicle in front of the monocular camera of the vehicle.
  • S2 Processing the real-time collected image through a multi-task recognition model to obtain first target information.
  • the first target information includes the 2D full body frame corresponding to the target vehicle, the target vehicle feature vector, the target vehicle body posture and the target vehicle body side line information.
  • S3 Obtain parameter information corresponding to the monocular camera, and calculate second target information of the target vehicle in combination with the first target information.
  • the second target information includes the 3D full body frame of the target vehicle.
  • the step S2 is to obtain the first target information by processing through the multi-task recognition model, which specifically includes:
  • S201: Preprocess the real-time captured image through a convolutional neural network model to obtain a heat map.
  • The preprocessing specifically includes: setting the image size at the input of the convolutional neural network model according to the camera resolution ratio of the real-time captured image and the network structure parameters; using the ghost module to perform a linear transformation of the image features to obtain ghost features; fusing the ghost features of different scales with weighted values through the BiFPN feature fusion algorithm to output multi-layer feature nodes; and normalizing the multi-layer feature nodes according to the image size ratio at the input of the model to obtain a heat map.
  • the convolutional neural network model includes: input unit, backbone network, feature fusion and multi-task detection, specifically:
  • the input unit obtains the image collected by the monocular camera in real time.
  • the backbone network uses a ResNet network structure based on the ghost module (GhostNet).
  • the ghost module obtains ghost features through simple linear transformation of existing features.
  • the ghost module can use fewer operations and reduce the amount of calculation by appropriately reducing the convolutional feature layer, bringing richer feature expressions, which is suitable for applications in the field of autonomous driving with high real-time requirements.
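As a rough sketch of the ghost-module idea described above (an ordinary convolution producing a few primary features, followed by a cheap linear transform that generates additional "ghost" features), assuming a PyTorch-style implementation; the channel split ratio and layer choices are illustrative and not the patent's exact backbone:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Primary convolution plus cheap depthwise "ghost" features, concatenated."""
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2, dw_size: int = 3):
        super().__init__()
        primary_ch = out_ch // ratio              # features from the ordinary convolution
        cheap_ch = out_ch - primary_ch            # features from the cheap linear transform
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(               # depthwise conv = cheap per-channel transform
            nn.Conv2d(primary_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary(x)
        ghost = self.cheap(primary)               # "ghost" features derived from the primary ones
        return torch.cat([primary, ghost], dim=1)
```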
  • Feature fusion: the BiFPN feature fusion algorithm introduced by Google is used. Unlike the traditional approach of directly stacking features of different scales, BiFPN fuses features of different scales with weighted values, allowing the network to learn the weights of the different input features on its own.
  • BiFPN is simple to apply; during feature fusion it can be used multiple times according to the complexity of the detection task and the computing power of the hardware. A 3-layer BiFPN is taken as an example, as shown in Figure 4.
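A minimal sketch of the weighted fusion at the heart of BiFPN, assuming the inputs have already been resized to a common scale; the fast-normalized weighting shown here is one common variant, not necessarily the exact form used in the invention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse same-shape feature maps with learnable non-negative weights
    (the "fast normalized fusion" used by BiFPN)."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)                 # keep the learned weights non-negative
        w = w / (w.sum() + self.eps)       # normalize so the weights sum to roughly 1
        return sum(wi * fi for wi, fi in zip(w, feats))
```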
  • Multi-task detection: one of the BiFPN output feature nodes (such as P3_out), several of them, or all of them can be selected for detection. When only some output feature layers are selected, the other output feature nodes can be removed.
  • Input: the input size is designed with reference to the camera resolution ratio and the network structure. Assuming a camera resolution of 1920 × 1080, and given that the network structure above requires the input width and height to be multiples of 128, the input size can be set as (example values, including but not limited to): Input_w = 128 * 6 = 768; Input_h = 128 * 4 = 512.
  • With the P3_out output channel count set to P3_out_c = 192, the P3_out size follows from the input size as P3_out_chw = [P3_out_c, Input_w/8, Input_h/8] = [192, 96, 64].
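The size arithmetic above can be checked in a few lines; the factor of 8 is the assumed downsampling at the P3_out level:

```python
# Example sizes from the description: input sides are multiples of 128,
# and P3_out is taken to be the stride-8 feature level.
input_w, input_h = 128 * 6, 128 * 4                 # 768 x 512
p3_out_c = 192
p3_out_chw = (p3_out_c, input_w // 8, input_h // 8)
print(p3_out_chw)                                   # (192, 96, 64)
```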
  • S202: Filter the feature points in the heat map that are above a preset threshold as target feature data, obtain the vehicle type parameters corresponding to the feature points, calculate the feature-point distribution heat map, the regression values of the target-center deviation relative to the feature points, and the regression values of the target box width and height according to the CenterNet target detection algorithm, then calculate the 2D detection box classification list and record each 2D detection box at the corresponding coordinate position on the heat map.
  • 2D object detection uses the CenterNet algorithm.
  • Each vehicle type (10 categories are set, not limited to 10 categories) corresponds to an output sequence (heat map, center deviation regression, width and height regression).
  • the heat map uses the focal_loss loss function, and the center deviation regression and width and height regression use the L2 loss function.
  • the 2D detection box classification list uses a 6-category softmax structure and a cross entropy loss function.
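A simplified sketch of how such CenterNet-style outputs can be decoded into 2D boxes, assuming a per-class heat map of shape (C, H, W) and 2-channel offset and size maps with the size regressed in input-image pixels; the exact head layout of the invention may differ:

```python
import torch
import torch.nn.functional as F

def decode_2d_boxes(heatmap, offset, size, score_thresh=0.3, stride=8):
    """Keep local maxima of the per-class heat map above a threshold, then build a
    2D box from the center offset and the width/height regressed at that location.
    Assumed shapes: heatmap (C, H, W); offset (2, H, W); size (2, H, W) in input pixels."""
    pooled = F.max_pool2d(heatmap.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    peaks = (heatmap == pooled) & (heatmap > score_thresh)      # 3x3 local maxima only
    boxes = []
    for cls, y, x in peaks.nonzero(as_tuple=False).tolist():
        dx, dy = offset[0, y, x].item(), offset[1, y, x].item()
        w, h = size[0, y, x].item(), size[1, y, x].item()
        cx, cy = (x + dx) * stride, (y + dy) * stride           # back to input-image pixels
        boxes.append((cls, cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2,
                      heatmap[cls, y, x].item()))
    return boxes
```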
  • Body posture classification:
    Front: only the front of the car is visible (the line of sight faces the front of the car);
    Rear: only the rear of the car is visible (the line of sight faces the rear of the car);
    Left_Side: only the side of the body is visible (the target body direction is perpendicular to the line of sight), with the front of the car facing left;
    Right_Side: only the side of the body is visible (the target body direction is perpendicular to the line of sight), with the front of the car facing right;
    Front_Side: the front and the side of the car are visible;
    Rear_Side: the rear and the side of the car are visible.
  • In the side-line detection process, each feature position (96 × 64) corresponds to one detection result. The present invention uses the L2 loss function for the side line; specifically, three schemes for determining the vehicle body side line are provided:
  • Scheme 1: let (reg_x, reg_y) be the coordinate deviation of the midpoint C of the segment containing side-line endpoints A and B relative to the central feature point F, let length be the segment length, and let angle be the segment's angle relative to the x-axis, with value range [0, 1]. The side-line detection is calculated as:
  • C_x = F_x + reg_x;
  • C_y = F_y + reg_y;
  • A_x = C_x - length/2 * cos(angle*180°);
  • A_y = C_y + length/2 * sin(angle*180°);
  • B_x = C_x + length/2 * cos(angle*180°);
  • B_y = C_y - length/2 * sin(angle*180°).
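The scheme-1 formulas above translate directly into a small helper; angle is the normalized value in [0, 1], so the physical angle is angle * 180°:

```python
import math

def side_line_scheme1(F_x, F_y, reg_x, reg_y, length, angle):
    """Endpoints A, B from the midpoint offset (reg_x, reg_y) relative to feature point F,
    the segment length and the normalized angle (angle in [0, 1] maps to angle * 180 deg)."""
    C_x, C_y = F_x + reg_x, F_y + reg_y
    theta = math.radians(angle * 180.0)
    A = (C_x - length / 2 * math.cos(theta), C_y + length / 2 * math.sin(theta))
    B = (C_x + length / 2 * math.cos(theta), C_y - length / 2 * math.sin(theta))
    return A, B
```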
  • In the second scheme, the bottom-edge relative position ratio of the bottom-line endpoint is output; the coordinates of side-line endpoint A are then calculated from the upper-left corner coordinates, box width and box height of the 2D detection box together with that ratio. The side relative position of the other side-line endpoint is further obtained, and it is determined whether this side relative position is less than or equal to a preset threshold: if so, the coordinates of side-line endpoint B are calculated from the upper-left corner coordinates, box height and side relative position of the 2D detection box; otherwise, they are calculated from the upper-left corner coordinates, box width and side relative position.
  • Concretely, the fact that the two ends of the segment containing endpoints A and B lie on the 2D detection box is used as prior information. The bottom-edge relative position ratio reg_x of the bottom-edge endpoint is output, where 0 represents the left end of the bottom line and 1 represents the right end; the side relative position reg_y of the side-line endpoint is output, where [0, 0.5] represents the left side of the box (0 at the top, 0.5 at the bottom) and (0.5, 1] represents the right side of the box (0.5 at the top, 1 at the bottom).
  • The side-line detection is calculated as:
  • A_x = Box_x + Box_w * reg_x;
  • A_y = Box_y + Box_h;
  • when reg_y ≤ 0.5: B_x = Box_x; B_y = Box_y + Box_h * reg_y;
  • when reg_y > 0.5: B_x = Box_x + Box_w; B_y = Box_y + Box_h * (reg_y - 0.5).
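A direct transcription of the scheme-2 formulas as given, with the reg_y threshold deciding whether endpoint B lies on the left or right edge of the box:

```python
def side_line_scheme2(Box_x, Box_y, Box_w, Box_h, reg_x, reg_y):
    """Endpoints from the 2D box prior; reg_x locates A on the bottom edge,
    reg_y decides whether B sits on the left (<= 0.5) or right (> 0.5) edge."""
    A = (Box_x + Box_w * reg_x, Box_y + Box_h)
    if reg_y <= 0.5:                                   # B on the left edge of the box
        B = (Box_x, Box_y + Box_h * reg_y)
    else:                                              # B on the right edge of the box
        B = (Box_x + Box_w, Box_y + Box_h * (reg_y - 0.5))
    return A, B
```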
  • In the third scheme, the bottom-edge relative position ratio of the bottom-line endpoint is output; the coordinates of side-line endpoint A are calculated from the upper-left corner coordinates, box width and box height of the 2D detection box together with that ratio; the angle of the segment containing endpoints A and B relative to the x-axis is then obtained, and the coordinates of endpoint B are calculated from the upper-left corner coordinates, box width, box height, the bottom-edge relative position ratio and the angle.
  • Concretely, the side-line detection uses the fact that the two ends of the segment containing endpoints A and B lie on the 2D detection box as prior information, and outputs the bottom-edge relative position ratio reg_x of the bottom-line endpoint (0 means the left end of the bottom line, 1 means the right end) together with the segment angle relative to the x-axis, angle, with value range [0, 1]. The side-line detection is calculated as:
  • A_x = Box_x + Box_w * reg_x;
  • A_y = Box_y + Box_h;
  • when angle < 90°: B_x = Box_x + Box_w; B_y = Box_y + Box_h - Box_w * (1 - reg_x) * tan(angle*180°);
  • when angle > 90°: B_x = Box_x; B_y = Box_y + Box_h + Box_w * reg_x * tan(angle*180°).
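Likewise, the scheme-3 formulas as given; the text notes that angle does not fall in a small neighborhood of 0.5 (i.e., 90°), so the tangent stays well defined:

```python
import math

def side_line_scheme3(Box_x, Box_y, Box_w, Box_h, reg_x, angle):
    """Endpoint A from the box prior, endpoint B from the side-line angle
    (angle in [0, 1] maps to angle * 180 degrees)."""
    theta_deg = angle * 180.0
    A = (Box_x + Box_w * reg_x, Box_y + Box_h)
    if theta_deg < 90.0:                               # side line rises towards the right edge
        B = (Box_x + Box_w,
             Box_y + Box_h - Box_w * (1 - reg_x) * math.tan(math.radians(theta_deg)))
    else:                                              # side line rises towards the left edge
        B = (Box_x,
             Box_y + Box_h + Box_w * reg_x * math.tan(math.radians(theta_deg)))
    return A, B
```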
  • For the feature vector output, each feature position (96 × 64) corresponds to an n-dimensional feature vector (n = 64 in one example, though the application is not limited to this value). During training, following the ArcFace approach, the feature vector is followed by a tracking-ID classifier whose number of classes equals the number of IDs in the training sample set, ensuring that targets with the same ID in the training set are projected to the same ID class and targets with different IDs are projected to different ID classes. In this way, a target feature vector with individual distinctiveness is obtained.
  • When the model is saved, only the layers up to the target feature vector output are kept.
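A hypothetical sketch of such an appearance-embedding head, assuming a PyTorch-style model; embed_dim = 64 follows the example in the description, while num_ids and the plain linear classifier (rather than a full ArcFace margin head) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TargetFeatureHead(nn.Module):
    """Per-location appearance embedding; the tracking-ID classifier is a training-only
    branch and is dropped when the model is saved (num_ids is a placeholder value)."""
    def __init__(self, in_ch: int, embed_dim: int = 64, num_ids: int = 1000):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, embed_dim, kernel_size=1)   # n-dim vector per cell
        self.id_classifier = nn.Linear(embed_dim, num_ids)        # training-only head

    def forward(self, feat_map, target_cells=None):
        emb = self.embed(feat_map)                                # (B, embed_dim, H, W)
        if target_cells is None:
            return emb                                            # inference: embeddings only
        vecs = torch.stack([emb[b, :, y, x] for b, y, x in target_cells])
        return emb, self.id_classifier(vecs)                      # training: also ID logits
```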
  • Before executing step S3, the method includes: calibrating the internal and external parameters of the camera, and establishing a mapping from image coordinates to a ground-plane vehicle coordinate system whose origin is the camera's projection onto the ground, as shown in Figure 8.
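A minimal sketch of the image-to-ground-plane mapping for a calibrated pinhole camera, assuming intrinsics K and extrinsics (R, t) that take ground-plane vehicle coordinates (z = 0) to camera coordinates; the patent's own calibration procedure is not reproduced here:

```python
import numpy as np

def pixel_to_ground(u, v, K, R, t):
    """Map a pixel that images a ground-plane point (z = 0 in the vehicle frame) to
    vehicle-frame (X, Y). K: 3x3 intrinsics; R, t: vehicle-to-camera rotation (3x3)
    and translation (length-3), so x_cam = R @ x_vehicle + t."""
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))   # ground-plane-to-image homography
    x, y, w = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return x / w, y / w
```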
  • the first case of calculating the second target information of the target vehicle in step S3 of the present invention (the posture is that the front and side of the vehicle are visible or the rear and side of the vehicle are visible) specifically includes:
  • S301 Select the side line endpoints of the 2D detection frame and the endpoints on the invisible vehicle side, and correspondingly obtain the coordinate mapping of the side line endpoints of the target vehicle detection frame and the endpoints on the invisible vehicle side in the ground plane vehicle coordinate system.
  • S302 Calculate the 3D coordinate system parameters of the target vehicle in the ground plane vehicle coordinate system according to the coordinate mapping of the endpoints of the side line of the target vehicle detection frame and the endpoints on the invisible side of the vehicle, wherein the 3D coordinate system parameters include at least vehicle length, vehicle width, vehicle height and angle relative to the origin.
  • Preferably, in this embodiment the camera is assumed to be a pinhole-model camera; if it is not, distortion correction can be applied to convert the image into a pinhole-model image. As shown in Figure 9, the coordinate mapping of each ground point in the image into the vehicle coordinate system is calculated, where point E is the endpoint of the bottom line of the box on the invisible vehicle side.
  • In the vehicle coordinate system this yields A'(A'_x, A'_y), B'(B'_x, B'_y) and E'(E'_x, E'_y), from which the vehicle length is calculated.
  • The angle θ of the preceding vehicle relative to the x-axis of the vehicle coordinate system is also obtained (value range (-90°, 90°); a leftward angle is positive and a rightward angle is negative).
  • Because the rear face of the vehicle is projected onto the camera at essentially a single distance, the pinhole imaging principle implies that the width-to-height ratio in image coordinates equals the actual ratio, which yields the vehicle height; here l(A, E) denotes the Euclidean distance between points A and E in the image coordinate system.
  • When the preceding vehicle's heading θ > 0, the right endpoint E of the rear-box bottom line can be approximated as the projection of the real point E_r onto the line x = A_x, which yields the vehicle width. Combining the posture classification then gives the angle θ_final between the preceding vehicle's heading and the current vehicle's heading (the x-axis direction): θ_final = θ + 180° when car_pose = Front_Side, and θ_final = θ when car_pose = Rear_Side (the example shown in Figure 9).
  • Based on the preceding vehicle's length, width, height, heading and the A' coordinates, the cuboid target in the current vehicle's 3D coordinate system (the coordinate system of Figure 7 with a z-axis added pointing vertically upward from the ground) is obtained; this is repeated until all vehicle detection boxes have been processed.
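Illustration only: two of the quantities above can be read directly off the mapped endpoints; the patent's own formulas for vehicle height and width are incorporated by reference and are not reproduced, and the sign convention of the heading is an assumption:

```python
import math

def side_line_geometry(A_prime, B_prime):
    """Vehicle length and heading that follow directly from the mapped side-line
    endpoints A', B' on the ground plane (heading sign and axis conventions assumed)."""
    car_L = math.dist(A_prime, B_prime)                  # side line spans the vehicle length
    theta = math.degrees(math.atan2(B_prime[1] - A_prime[1],
                                    B_prime[0] - A_prime[0]))
    return car_L, theta
```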
  • the second situation of calculating the second target information of the target vehicle in step S3 of the present invention (the posture is that only the front of the vehicle is visible, only the rear of the vehicle is visible, only the side of the vehicle is visible and the front of the vehicle is facing left, or only the side of the vehicle is visible and the front of the vehicle is facing right) specifically includes:
  • S311 Select the bottom edge endpoint and midpoint of the 2D detection frame, and correspondingly obtain the coordinate mapping of the target vehicle detection frame endpoint and midpoint in the ground plane vehicle coordinate system.
  • The endpoints A and E of the bottom edge of the 2D box are taken together with its midpoint M(M_x, M_y), as shown in Figure 10, where M_x = Box_x + Box_w / 2 and M_y = Box_y + Box_h; their projections A', E' and M'(M'_x, M'_y) in the vehicle coordinate system, and the distance dis(A', E') between A' and E', are then obtained.
  • S312: Calculate the 3D coordinate system parameters of the target vehicle in the ground-plane vehicle coordinate system according to the coordinate mappings of the endpoints and midpoint of the target vehicle detection box.
  • Assuming a body length-to-width ratio of 3 : 1 for the preceding vehicle:
  • when car_pose = Front (only the front is visible): θ_final = θ + 180°, car_W = dis(A', E'), car_L = 3 * car_W;
  • when car_pose = Rear (only the rear is visible): θ_final = θ, car_W = dis(A', E'), car_L = 3 * car_W;
  • when car_pose = Left_Side (only the side is visible, front facing left): θ_final = θ + 90°, car_L = dis(A', E'), car_W = car_L / 3;
  • when car_pose = Right_Side (only the side is visible, front facing right): θ_final = θ + 270°, car_L = dis(A', E'), car_W = car_L / 3.
  • From the preceding vehicle's length, width, height, heading and the A' coordinates, the cuboid target in the current vehicle's 3D coordinate system is obtained.
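The pose-dependent sizing above, written out as a helper under the stated 3:1 length-to-width assumption; dis(A', E') is the ground-plane distance between the mapped bottom-edge endpoints:

```python
import math

def cuboid_from_bottom_edge(A_p, E_p, theta, car_pose):
    """Pose-dependent sizing for the case where the side line is not visible,
    under the 3:1 length-to-width prior stated in the text."""
    d = math.dist(A_p, E_p)                            # ground-plane distance dis(A', E')
    if car_pose == "Front":
        theta_final, car_W = theta + 180.0, d
        car_L = 3.0 * car_W
    elif car_pose == "Rear":
        theta_final, car_W = theta, d
        car_L = 3.0 * car_W
    elif car_pose == "Left_Side":
        theta_final, car_L = theta + 90.0, d
        car_W = car_L / 3.0
    elif car_pose == "Right_Side":
        theta_final, car_L = theta + 270.0, d
        car_W = car_L / 3.0
    else:
        raise ValueError("side line visible: handled by the first case instead")
    return car_L, car_W, theta_final
```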
  • the step S4 specifically includes: performing 3D target tracking by a deepsort target tracking method according to the second target information.
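DeepSORT itself combines Kalman-filter motion prediction, appearance embeddings and cascade matching; the following is a highly simplified sketch of just the appearance-association step, assuming per-target embedding vectors and using the Hungarian algorithm from SciPy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_by_appearance(track_feats, det_feats, max_cos_dist=0.4):
    """Cosine-distance association between existing track embeddings (N x d) and new
    detection embeddings (M x d); Kalman gating and cascade matching are omitted."""
    T = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    D = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - T @ D.T                               # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cos_dist]
```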
  • the present invention further provides a vehicle 3D detection and tracking device based on monocular vision, the device at least comprising:
  • the first acquisition module is used to acquire a real-time captured image of a monocular camera, wherein the real-time captured image includes a real-time captured image of at least one target vehicle in front of the monocular camera of the vehicle.
  • the first computing unit is used to process the real-time collected image through a multi-task recognition model to obtain first target information, wherein the first target information includes a 2D full body frame corresponding to the target vehicle, a target vehicle feature vector, a target vehicle body posture and a target vehicle body side line information.
  • the second acquisition module is used to obtain parameter information corresponding to the monocular camera.
  • the second calculation unit is used to calculate the second target information of the target vehicle by combining the parameter information corresponding to the monocular camera and the first target information.
  • a tracking unit is used to perform 3D target tracking through a deepsort target tracking method according to the second target information.
  • the first computing unit further includes:
  • the preprocessing module is used to preprocess the real-time acquired images through a convolutional neural network model.
  • The data screening module is used to select the feature points above the preset threshold in the heat map obtained by the preprocessing module as target feature data.
  • the first detection module is used to obtain the vehicle type parameters corresponding to the feature points, and calculate a 2D detection box classification list according to the CenterNet target detection algorithm.
  • the second detection module is used to output the body posture information of the corresponding position for each 2D detection frame, and obtain the posture of the target vehicle through the argmax operator; if the posture is Front_Side or Rear_Side, side line detection is performed to obtain the coordinates of the side line endpoints A and B.
  • the second calculation unit further includes:
  • the calibration module is used to calibrate the internal and external parameters of the camera and establish the image coordinates to the ground plane vehicle coordinate system with the camera projection as the origin.
  • a coordinate mapping module is used to obtain the coordinate mapping of the target vehicle detection frame endpoints and midpoints in the ground plane vehicle coordinate system according to the selected 2D detection frame bottom edge endpoints and midpoints, or to obtain the coordinate mapping of the target vehicle detection frame side line endpoints and the endpoints on the invisible vehicle side in the ground plane vehicle coordinate system according to the selected 2D detection frame side line endpoints and the endpoints on the invisible vehicle side.
  • the parameter calculation module is used to calculate the 3D coordinate system parameters of the target vehicle in the ground plane vehicle coordinate system according to the results of the coordinate mapping module.
  • the 3D coordinate system parameters at least include the vehicle length, vehicle width, vehicle height and angle relative to the origin.
  • the present invention also provides a car, which is equipped with a monocular camera and a vehicle 3D detection and tracking device based on monocular vision.
  • the device uses the vehicle 3D detection and tracking method based on monocular vision as described above to process the real-time image of at least one target vehicle in front of the car collected in real time by the monocular camera.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic, for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another device, or some features can be ignored or not executed.
  • The various component embodiments of the present invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) can be used in practice to implement some or all of the functions of some modules according to embodiments of the present invention.
  • the present invention can also be implemented as a device program (e.g., a computer program and a computer program product) for executing part or all of the methods described herein.
  • a program implementing the present invention can be stored on a computer-readable medium, or can have the form of one or more signals. Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a monocular-vision-based vehicle 3D positioning method, device and automobile. Images captured in real time by a monocular camera are processed through a multi-task recognition model to obtain first target information; parameter information corresponding to the monocular camera is obtained, second target information of the target vehicle is calculated in combination with the first target information, and finally the target vehicle is tracked according to the second target information. Even in environments with little texture, a camera pose of high accuracy can still be obtained, and in turn a vehicle pose of high accuracy, which reduces the influence of environmental texture changes on vehicle positioning and greatly improves the robustness of the vehicle positioning system. Because body side-line detection and vehicle detection share feature layers of the same scale, each feature point corresponds to one set of detection results, no subsequent matching is needed, and target tracking efficiency is improved.

Description

基于单目视觉的车辆3D定位方法、装置及汽车 技术领域
本发明涉及车辆位置检测和跟踪技术领域,尤其是涉及一种基于单目视觉的车辆3D定位方法、装置及汽车。
背景技术
在ADAS(Advanced DrivingAssistance System,高级驾驶辅助***)单目视觉中,准确的检测前方车辆是ADAS***必不可少的功能,是路径规划、运动预测、碰撞避免等功能的前提。而2D(二维)的前车检测已不能满足进一步提高ADAS***智能化的需求,因此需要研究单目视觉中前车3D(三维)框检测技术。
利用车辆三维信息作为训练数据,通过回归算法得到车辆3D。单张RGB图的3D(三维)目标检测《Monocular 3D Object Detection and BOX Fitting trained end‑to‑end using intersecton over union loss》输入单张RGB图,直接通过一个CNN(Convolutional Neural Networks,卷积神经网络)输出预测目标类别、2D(二维)框位置、目标距离、目标偏角、3D框的长宽高、3D框八个顶点投影到2D的坐标位置,通过NMS(Non‑Maximum Suppression,非极大值抑制)提取最佳2D框,3D BOXFitting转化成目标类别、2D框、3D框三个信息,和标注的groundtruth(真值,真实有效的值)一一对应,优化IOUloss进行网络回归训练。该方法需要标注数据具有长、宽、高、朝向等groundtruth信息。
然而,现实中,基于深度学习的方法通过训练得到目标3D检测框是需要大量的groundtruth数据作为前提的,若数据较少,训练得到的模型往往会过拟合,泛化能力弱。在一般道路情况下,前方车辆较多,若采集数据时,没有配合激光雷达数据融合,仅拼接单目摄像机很难获取前方车辆的长、宽、高实际物理尺寸与朝向信息。而激光雷达成本高,并且需要将激光雷达与单目摄像机进行联合标定后,后期还需进一步做数据融合,才能获取前方车辆的相关的长、宽、高实际物理尺寸与朝向信息。因此,前车的三维数据信息很难采集获取,难以满足训练数据需求。通过深度学习的方法训练得到目标3D框,则需要大量很难采集获取的前车三维数据信息。且经过标定的三维数据信息普适性较差,在更换摄像头(内参变化)或安装位置有偏差(外参变化)时,会严重影响模型的使用效果。
发明内容
针对上述技术问题,本发明提出一种基于单目视觉的车辆3D定位方法、装置及汽车,采用基于卷积神经网络多任务联合的单目车辆3D目标检测及跟踪的手段,将车身关键点回归、侧身边线角度回归、车身姿态分类、目标检测及特征向量生成融合在一起,允许一定程度的遮挡也可以精准获取目标车辆的3D目标框,抗干扰能力较强,可以有效对目标车辆3D进行检测和跟踪。
具体的,本发明提供的一种基于单目视觉的车辆3D检测及跟踪方法,包括步骤:
S1:获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像。
S2:根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息。
S3:获取所述单目摄像头对应的参数信息,结合所述第一目标信息计算所述目标车辆的第二目标信息。
S4:根据所述第二目标信息对所述目标车辆进行跟踪。
在上述技术方案中,在纹理较少的环境中,仍能够获得准确率较高的相机位姿,进而能够获得准确率较高的车辆位姿,能够减小由于环境的纹理变化给车辆定位带来的影响,极大地提高了车辆定位***的鲁棒性。通过车身侧边线检测与车辆检测使用同尺度特征层,每个特征点位对应一套检测结果,无需采用后续匹配,提升目标跟踪效率。
其中,所述第一目标信息包括所述目标车辆对应的2D全车身框、目标车辆特征向量、目标车辆车身姿态和目标车辆车身侧边线信息;所述第二目标信息包括目标车辆3D全车身框。
在上述技术方案中,所述第一目标信息和第二目标信息的组合能够提供丰富的目标车辆信息,有助于提供后续目标车辆跟踪的准确性。
所述步骤S2中通过多任务识别模型进行处理,得到第一目标信息,具体包括:
S201:通过卷积神经网络模型对所述实时采集图像进行预处理,获得热力图。
S202:筛选热力图中高于预设阈值的特征点作为目标特征数据,获取所述特征点对应的车辆类型参数,根据CenterNet目标检测算法,计算得2D检测框分类列表。
S203:对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,进行侧边线检测,获得侧边线端点A和B的坐标;否则,返回S202,直到所有目标车辆检测框处理完毕。
在上述技术方案中,通过多任务识别模型进行处理,能够有效地提取出第一目标信息,并具有高效、准确和全面的特点。
所述步骤S201中预处理,具体包括:
对所述实时采集图像根据摄像头分辨率比例及网络结构参数,设置卷积神经网络模型输入端的图像尺寸。
根据所述卷积神经网络模型输入端的图像尺寸比例,对所述多层特征节点归一化处理;采用幽灵模块进行图像特征线性变换获得幽灵特征,再通过BiFPN特征融合算法对不同尺度的幽灵特征加权值进行融合,输出多层特征节点;对所述多层特征节点归一化处理,获得热力图。
在上述技术方案中,相比mobilenet,幽灵模块可以用更少的运算,可以通过适当减少卷积特征层来降低计算量,带来更丰富的特征表达,适合在对实时性要求较高的自动驾驶领域应用。
此外,与直接堆叠不同尺度的特征的传统做法不同的是,BiFPN 可以给不同尺度特征加权值进行融合,让网络自行学习不同输入特征的权重。BiFPN应用方法简单易用,在特征融合过程中可以根据检测任务复杂度及硬件计算能力多次使用BiFPN。
所述步骤S203中进行侧边线检测的第一种方案,具体包括:
获取侧边线端点A和B所在线段中点C相对中心特征点F的坐标偏差、间距线段长度和线段相对x轴角度。
根据所述中心特征点的坐标及其坐标偏差计算中点C的坐标。
根据所述中点C的坐标、间距线段长度和线段相对x轴角度计算侧边线端点A和B的坐标。
在上述技术方案中,该计算侧边线端点A和B的坐标的方法不依赖2D检测框的准确性。
所述步骤S203中进行侧边线检测的第二种方案,具体包括:
以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例。
根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标。
获取侧边线端点的侧边相对位置,并判断该侧边相对位置是否小于等于预设阈值,若小于等于,则根据所述2D检测框的左上角坐标、框高和侧边相对位置计算侧边线端点B的坐标;否则,根据所述2D检测框的左上角坐标、框宽和侧边相对位置计算侧边线端点B的坐标。
在上述技术方案中,该计算侧边线端点A和B的坐标的方法利用了2D检测框的信息,回归任务量少。
所述步骤S203中进行侧边线检测的第三种方案,具体包括:
以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例。
根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标。
获取侧边线端点A和B所在线段相对x轴的角度,并根据2D检测框的左上角坐标、框宽、框高、底边相对位置比例和所述角度计算侧边线端点B的坐标。
在上述技术方案中,该计算侧边线端点A和B的坐标的方法应用于前车姿态为可见车头和侧身或可见车尾和侧身时,侧边线端点A和B所在线段相对x轴的角度明显;且前视可见侧边线时,侧边线端点A和B所在线段相对x轴的角度取值不可能为以0.5(即90°)为中心的小邻域内,因此不会存在跳变情况;端点A可通过尾灯或车头大灯外侧垂线与框相交得到,特征明显;并在在一种实施方式中侧边线端点A和B所在线段相对x轴的角度和底边相对位置比例取值范围均为[0, 1],便于网络回归学习。
在执行步骤S3之前,包括:
分别标定摄像机的内外参数,建立图像坐标到以相机投影为原点地平面车辆坐标系。
所述步骤S3计算所述目标车辆的第二目标信息的第一种情况,具体包括:
S301:选取2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射。
S303:根据所述目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度、车辆宽度、车辆高度以及相对原点的角度。
所述步骤S3计算所述目标车辆的第二目标信息的第二种情况,具体包括:
S311:选取2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射。
S312:根据所述目标车辆检测框端点和中点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数。
在上述技术方案中,通过建立图像坐标与地平面车辆坐标系之间的映射关系,可以将目标车辆的关键点坐标在地平面车辆坐标系中表示,这样可以方便地进行后续的计算和分析;计算出目标车辆3D坐标系参数有助于进行后续目标车辆的跟踪。
所述步骤S4具体包括:根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
在上述技术方案中,DeepSORT是一种实时目标跟踪方法,具有较高的处理速度和实时性;并且DeepSORT方法能够同时跟踪多个目标,从而适用于复杂场景中存在多个目标车辆的情况。
作为另一优选的,本发明还提供一种基于单目视觉的车辆3D检测及跟踪装置,所述装置至少包括:
第一获取模块,用于获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像。
第一计算单元,用于根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息,所述第一目标信息包括所述目标车辆对应的2D全车身框,目标车辆特征向量,目标车辆车身姿态和目标车辆车身侧边线信息。
第二获取模块,用于获取所述单目摄像头对应的参数信息。
第二计算单元,用于结合所述单目摄像头对应的参数信息和第一目标信息,计算目标车辆的第二目标信息。
跟踪单元,用于根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
其中,所述第一计算单元,还包括:
预处理模块,用于通过卷积神经网络模型对实时采集图像进行预处理。
数据筛选模块,用于对预处理模块获得的热力图中高于预设阈值的特征点进行选择作为目标特征数据。
第一检测模块,用于获取所述特征点对应的车辆类型参数,并根据CenterNet目标检测算法,计算得2D检测框分类列表。
第二检测模块,用于对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,则进行侧边线检测,获得侧边线端点A和B的坐标。
其中,所述第二计算单元,还包括:
标定模块,用于标定摄像机的内外参数,并建立图像坐标到以相机投影为原点地平面车辆坐标系;
坐标映射模块,用于根据所选取的2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射,或者是根据所选取的2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射;
参数计算模块,用于根据坐标映射模块的结果计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度,车辆宽度,车辆高度以及相对原点的角度。
作为另一优选的,本发明还提供一种汽车,安装有单目相机,还配置有基于单目视觉的车辆3D检测及跟踪装置,所述装置采用如上所述基于单目视觉的车辆3D检测及跟踪方法对所述单目相机实时采集的所述汽车前方至少一个目标车辆的实时图像进行处理。
综上所述,本发明提供基于单目视觉的车辆3D定位方法、装置及汽车,通过对所获取的单目摄像头实时采集图像,经多任务识别模型进行处理,得到第一目标信息;在获取所述单目摄像头对应的参数信息,结合所述第一目标信息计算所述目标车辆的第二目标信息,最后根据所述第二目标信息对所述目标车辆进行跟踪。
与现有技术相比,本发明实施例具有以下有益效果:
(1)本发明利用车辆上的单目相机在车辆移动时实时采集图像,并提取图像中的特征点以及特征线,利用该特征点以及特征线获取车辆的2D框图,进而相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射,从而获得目标车辆3D坐标系参数,在图像跟踪的过程中基于特征点和特征线的融合优化获得单目相机的优化后的相机位姿,在纹理较少的环境中,仍能够获得准确率较高的相机位姿,进而能够获得准确率较高的车辆位姿,能够减小由于环境的纹理变化给车辆定位带来的影响,极大地提高了车辆定位***的鲁棒性。
(2)本发明通过车身侧边线检测与车辆检测使用同尺度特征层,每个特征点位对应一套检测结果,无需采用后续匹配。
(3)本发明中的车侧边线检测,抗干扰能力较强,允许一定程度的遮挡。车身姿态分类任务,使得各种场景可以针对性地被处理。
(4)本发明中可输出前车目标在当前车辆3D坐标系中的长方体目标。本发明将车辆3D检测与特征检测合并完成,提升目标跟踪效率。
附图说明
图1为本发明所述基于单目视觉的车辆3D检测及跟踪方法流程图。
图2为本发明所述卷积神经网络模型图。
图3为本发明所述幽灵模块原理图。
图4为本发明一实施例中的车身侧边线计算方法示意图。
图5为本发明另一实施例中的车身侧边线计算方法示意图。
图6为本发明另一实施例中的车身侧边线计算方法示意图。
图7为本发明另一实施例中的目标特征向量输出示意图。
图8为本发明另一实施例中的车辆坐标系示意图。
图9为本发明另一实施例中的图像至车辆坐标系映射示意图。
图10为本发明另一实施例中的前车侧边线不可见情况,投影分析示意图。
图11为图1所述的基于单目视觉的车辆3D检测及跟踪方法的***框架图。
实施方式
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
实施例
如图1所示,本发明提供的一种基于单目视觉的车辆3D检测及跟踪方法,包括步骤:
S1:获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像。
S2:根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息。
其中,所述第一目标信息包括所述目标车辆对应的2D全车身框、目标车辆特征向量、目标车辆车身姿态和目标车辆车身侧边线信息
S3:获取所述单目摄像头对应的参数信息,结合所述第一目标信息计算所述目标车辆的第二目标信息。
其中,所述第二目标信息包括目标车辆3D全车身框。
S4:根据所述第二目标信息对所述目标车辆进行跟踪。
所述步骤S2中通过多任务识别模型进行处理,得到第一目标信息,具体包括:
S201:通过卷积神经网络模型对所述实时采集图像进行预处理,获得热力图。
其中预处理具体包括:对所述实时采集图像根据摄像头分辨率比例及网络结构参数,设置卷积神经网络模型输入端的图像尺寸;根据所述卷积神经网络模型输入端的图像尺寸比例,对所述多层特征节点归一化处理;采用幽灵模块进行图像特征线性变换获得幽灵特征,再通过BiFPN特征融合算法对不同尺度的幽灵特征加权值进行融合,输出多层特征节点;对所述多层特征节点归一化处理,获得热力图。
如图2-3所示,所述卷积神经网络模型包括:输入单元,主干网络,特征融合和多任务检测,具体的:
输入单元,获取单目摄像头实时采集的图像。
主干网络,使用基于幽灵模块(GhostNet)的ResNet网络结构。幽灵模块通过已有的特征通过简单的线性变换得到幽灵特征。图3中Output的下半部分。相比mobilenet,幽灵模块可以用更少的运算,可以通过适当减少卷积特征层来降低计算量,带来更丰富的特征表达,适合在对实时性要求较高的自动驾驶领域应用。
特征融合:使用Google推出的BiFPN特征融合算法。与直接堆叠不同尺度的特征的传统做法不同的是,BiFPN 可以给不同尺度特征加权值进行融合,让网络自行学习不同输入特征的权重。BiFPN应用方法简单易用,在特征融合过程中可以根据检测任务复杂度及硬件计算能力多次使用BiFPN,以3层BiFPN为例,如图4所示。 
多任务检测:可选择一个(如P3_out)、多个或全部BiFPNs的输出特征节点进行检测。选择部分输出特征层时可将其它输出特征节点删除。
下面以使用P3_out进行多任务检测为例介绍本发明中的多任务检测功能:
Input:参考应用摄像头分辨率比例及网络结构设计input尺寸。假设摄像头分辨率1920x1080,即,宽:长=3:2;如上网络结构要求输入长宽需为128的倍数。综上可设定input尺寸(示例尺寸,包括但不限于):
Input_w = 128*6 = 768;
Input_h = 128*4 = 512;
设定P3_out特征输出层数P3_out_c = 192,根据input尺寸可得P3_out尺寸为:
P3_out_chw = [P3_out_c, Input_w/8, Input_w/8] = [192, 96, 64]。
S202:筛选热力图中高于预设阈值的特征点作为目标特征数据,获取所述特征点对应的车辆类型参数,根据CenterNet目标检测算法计算得所述特征点分布热力图,目标中心相对特征点偏差回归值和目标框宽高回归值,进而计算得到2D检测框分类列表,并将2D检测框记录在所述热力图上对应坐标位置。
其中, 2D目标检测使用centerNet算法。每个车辆类型(设定10个分类,不限于10个分类)对应一个输出序列(热力图、中心偏差回归、宽高回归),热力图使用focal_loss损失函数,中心偏差回归和宽高回归使用L2损失函数。
2D检测框分类列表使用6分类softmax结构,使用交叉熵损失函数。
车身姿态分类 说明
Front 只见车头(视线方向正对车头)
Rear 只见车尾(视线方向正对车尾)
Left_Side 只见侧身(目标车身方向与视线方向垂直),车头向左
Right_Side 只见侧身(目标车身方向与视线方向垂直),车头向右
Front_Side 可见车头+侧身
Rear_Side 可见车尾+侧身
S203:对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,进行侧边线检测,获得侧边线端点A和B的坐标;否则,返回S202,直到所有目标车辆检测框处理完毕。
所述侧边线检测过程汇中,每个特征位置(96 x 64)对应一个检测结果。本发明使用L2损失函数确定侧边线,具体的,提供三种确定车身侧边线的方案:
方案一:
获取侧边线端点A和B所在线段中点C相对中心特征点F的坐标偏差、间距线段长度和线段相对x轴角度;然后根据所述中心特征点的坐标及其坐标偏差计算中点C的坐标;再进一步根据所述中点C的坐标、间距线段长度和线段相对x轴角度计算侧边线端点A和B的坐标。
如图4所示,设侧边线端点A和B所在线段中点C相对中心特征点F的坐标偏差(reg_x, reg_y),间距线段长度L,线段相对x轴角度R,取值范围[0, 1],计算侧边线检测公式为:
C_x = Fx + reg_x;
C_y = Fy + reg_y;
A_x = C_x - length/2 * cos(angle*180°);
A_y = C_y + length/2 * sin(angle*180°);
B_x = C_x + length/2 * cos(angle*180°);
B_y = C_y - length/2 * sin(angle*180°)。
此方案的优点:不依赖2D检测框的准确性。
方案二:
以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例;然后根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标;进一步的,获取侧边线端点的侧边相对位置,并判断该侧边相对位置是否小于等于预设阈值,若小于等于,则根据所述2D检测框的左上角坐标、框高和侧边相对位置计算侧边线端点B的坐标;否则,根据所述2D检测框的左上角坐标、框宽和侧边相对位置计算侧边线端点B的坐标。
如图5所示,以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例reg_x,其中,0表示在底线左端,1表示在底线右端,侧边线端点的侧边相对位置reg_y,其中,[0, 0.5]表示在框左侧,0在上,0.5在下;(0.5, 1]表示在框右侧,0.5在上,1在下;计算侧边线检测公式为:
A_x = Box_x + Box_w * reg_x;
A_y = Box_y + Box_h;
当reg_y ≤ 0.5时,
B_x = Box_x;
B_y = Box_y + Box_h * reg_y;
当reg_y>0.5时,
B_x = Box_x + Box_w;
B_y = Box_y + Box_h * (reg_y – 0.5)。
此方案的优点:利用了2D检测框的信息;回归任务量少。
方案三:
以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例;然后根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标;进一步获取侧边线端点A和B所在线段相对x轴的角度,并根据2D检测框的左上角坐标、框宽、框高、底边相对位置比例和所述角度计算侧边线端点B的坐标。
如图6所示,所述侧边线检测,包括:
以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例reg_x,其中,0表示在底线左端,1表示在底线右端;线段相对x轴r,取值范围[0, 1];计算侧边线检测公式为:
A_x = Box_x + Box_w * reg_x;
A_y = Box_y + Box_h;
当angle<90°时,
B_x = Box_x + Box_w;
B_y = Box_y + Box_h - Box_w * (1 - reg_x) * tan(angle*180°);
当angle>90°时,
B_x = Box_x;
B_y = Box_y + Box_h + Box_w * reg_x * tan(angle*180°)。
此方案的优点:如图6所示,前车姿态为Front_Side或Rear_Side时,侧边线方向(angle)明显。且前视可见侧边线时,angle取值不可能为以0.5(即90°)为中心的小邻域内,因此不会存在跳变情况;端点A可通过尾灯或车头大灯外侧垂线与框相交得到,特征明显;angle和reg_x取值范围均为[0, 1],便于网络回归学习。
特征输出如图7所示。每个特征位置(96 x 64)对应一个n维特征向量。以n=64为例(应用中不限于该数值)。训练时,参考arcface算法,特征向量后接跟踪ID分类器,类别数量即为训练样本集ID数量,保证训练集中同个ID的目标投影聚集到同个ID类别,不同ID目标投影到不同的ID类别。以此方式,可获得具有具有个体差异性的目标特征向量。保存模型时,仅保存至目标特征向量输出为止。
在执行步骤S3之前,包括:
分别标定摄像机的内外参数,建立图像坐标到以相机投影为原点地平面车辆坐标系,如图8所示。
本发明上述中步骤S3计算所述目标车辆的第二目标信息的第一种情况(姿态为可见车头和侧身或可见车尾和侧身),具体包括:
S301:选取2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射。
设E点为框底线在不可见车侧边一方的端点,得到车辆坐标系下:A’(A’_x,A’_y),B’(B’_x, B’_y),E’(E’_x, E’_y)。
S302:根据所述目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度、车辆宽度、车辆高度以及相对原点的角度。
优选的,在本实施例中,假设摄像头为针孔模型摄像头,若不是,可进行畸变矫正转换至针孔模型画面。即如图9所示,计算图像地面上点在车辆坐标系中相应坐标映射,E点为框底线在不可见车侧边一方的端点。得到车辆坐标系下:A’(A’_x,A’_y),B’(B’_x, B’_y),E’(E’_x, E’_y)
然后,计算得:

车辆长度:
[援引加入(细则20.6)08.10.2023]

相对于车辆坐标系x轴,前方车辆角度:(取值范围(-90°,90°),左偏角度为正,右偏角度为负):
[援引加入(细则20.6)08.10.2023]

由于车尾框是在同一距离在摄像头上投影,根据小孔成像原理知图像坐标中的宽高比与实际宽高比一致,得车辆高度:
[援引加入(细则20.6)08.10.2023]
[援引加入(细则20.6)08.10.2023]
l(A,E)表示图像坐标系中A,E两点的欧氏距离。

当前车朝向θ>0时,车尾框底线右侧端点E可近似认为是真实E_r在直线x=A_x上的投影。得,车辆宽度:
[援引加入(细则20.6)08.10.2023]
结合前方车辆姿态分类,继而可得前车朝向与当前车朝向(x轴方向)的角度θ_final :
当car_pose = Front_Side 时,θ_final =θ + 180°。      
当car_pose = Rear_Side 时(如图9中示例),θ_final =θ。
综上,依据前车长宽高、车辆方向及A’坐标,即得当前车辆3D坐标系(在图7坐标系中增加垂直地面向上的z轴)中的长方体目标,直到所有车辆检测框处理完毕。
本发明上述中步骤S3计算所述目标车辆的第二目标信息的第二种情况(姿态为只见车头、只见车尾、只见侧身和车头向左或只见侧身和车头向右),具体包括:
S311:选取2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射。
其中,取车2D框底边端点A,E及中点 M(M_x,M_y),如图10所示。
计算中点 M(M_x,M_y)公式为:
M_x = Box_x + Box_w / 2。
M_y = Box_y + Box_h。

M点在车辆坐标系中的投影M’(M’_x,M’_y),得:
[援引加入(细则20.6)08.10.2023]
[援引加入(细则20.6)08.10.2023]
A',E'间距离:
S312:根据所述目标车辆检测框端点和中点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数。
假设前车身长:宽 = 3:1,得:
当car_pose = Front(只见车头)时,θ_final =θ + 180°;car_W = dis(A’, E’);
[援引加入(细则20.6)08.10.2023]
car_L=3*car_W。
当car_pose = Rear(只见车尾)时,θ_final =θ ;car_W = dis(A’, E’);
[援引加入(细则20.6)08.10.2023]
car_L=3*car_W。
[援引加入(细则20.6)08.10.2023]
当car_pose=Left_Side(只见侧身和车头向左)时,θ_final=θ+90°;car_L=dis(A’,E’);
[援引加入(细则20.6)08.10.2023]
car_W=car_L/3。
当car_pose = Right_Side(只见侧身和车头向右)时,
θ_final =θ + 270°;car_L = dis(A’, E’);
[援引加入(细则20.6)08.10.2023]
car_W=car_L/3。
可见,依据前车长宽高、车辆方向及A’坐标,即得当前车辆3D坐标系中的长方体目标。
所述步骤S4具体包括:根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
实施例
如图11所示,本发明还提供一种基于单目视觉的车辆3D检测及跟踪装置,所述装置至少包括:
第一获取模块,用于获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像。
第一计算单元,用于根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息,所述第一目标信息包括所述目标车辆对应的2D全车身框,目标车辆特征向量,目标车辆车身姿态和目标车辆车身侧边线信息。
第二获取模块,用于获取所述单目摄像头对应的参数信息。
第二计算单元,用于结合所述单目摄像头对应的参数信息和第一目标信息,计算目标车辆的第二目标信息。
跟踪单元,用于根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
其中,所述第一计算单元,还包括:
预处理模块,用于通过卷积神经网络模型对实时采集图像进行预处理。
数据筛选模块,用于对预处理模块获得的热力图中高于预设阈值的特征点进行选择作为目标特征数据。
第一检测模块,用于获取所述特征点对应的车辆类型参数,并根据CenterNet目标检测算法,计算得2D检测框分类列表。
第二检测模块,用于对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,则进行侧边线检测,获得侧边线端点A和B的坐标。
其中,所述第二计算单元,还包括:
标定模块,用于标定摄像机的内外参数,并建立图像坐标到以相机投影为原点地平面车辆坐标系。
坐标映射模块,用于根据所选取的2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射,或者是根据所选取的2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射。
参数计算模块,用于根据坐标映射模块的结果计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度,车辆宽度,车辆高度以及相对原点的角度。
实施例
本发明还提供一种汽车,安装有单目相机,还配置有基于单目视觉的车辆3D检测及跟踪装置,所述装置采用如上所述基于单目视觉的车辆3D检测及跟踪方法对所述单目相机实时采集的所述汽车前方至少一个目标车辆的实时图像进行处理。
尽管这里已经参考附图描述了示例实施例,应理解上述示例实施例仅仅是示例性的,并且不意图将本发明的范围限制于此。本领域普通技术人员可以在其中进行各种改变和修改,而不偏离本发明的范围和精神。所有这些改变和修改意在被包括在所附权利要求所要求的本发明的范围之内。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个设备,或一些特征可以忽略,或不执行。
本发明的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本发明实施例的一些模块的一些或者全部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的装置程序(例如,计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
虽然对本发明的描述是结合以上具体实施例进行的,但是,熟悉本技术领域的人员能够根据上述的内容进行许多替换、修改和变化是显而易见的。因此,所有这样的替代、改进和变化都包括在附后的权利要求的精神和范围内。

Claims (15)

  1. 一种基于单目视觉的车辆3D检测及跟踪方法,其特征在于,包括步骤:
    S1:获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像;
    S2:根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息;
    S3:获取所述单目摄像头对应的参数信息,结合所述第一目标信息计算所述目标车辆的第二目标信息;
    S4:根据所述第二目标信息对所述目标车辆进行跟踪。
  2. 根据权利要求1所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述第一目标信息包括所述目标车辆对应的2D全车身框、目标车辆特征向量、目标车辆车身姿态和目标车辆车身侧边线信息;
    所述第二目标信息包括目标车辆3D全车身框。
  3. 根据权利要求2所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S2中通过多任务识别模型进行处理,得到第一目标信息,具体包括:
    S201:通过卷积神经网络模型对所述实时采集图像进行预处理,获得热力图;
    S202:筛选热力图中高于预设阈值的特征点作为目标特征数据,获取所述特征点对应的车辆类型参数,根据CenterNet目标检测算法,计算得2D检测框分类列表;
    S203:对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,进行侧边线检测,获得侧边线端点A和B的坐标;否则,返回S202,直到所有目标车辆检测框处理完毕。
  4. 根据权利要求3所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S201中预处理,具体包括:
    对所述实时采集图像根据摄像头分辨率比例及网络结构参数,设置卷积神经网络模型输入端的图像尺寸;
    根据所述卷积神经网络模型输入端的图像尺寸比例,对所述多层特征节点归一化处理;
    采用幽灵模块进行图像特征线性变换获得幽灵特征,再通过BiFPN特征融合算法对不同尺度的幽灵特征加权值进行融合,输出多层特征节点;
    对所述多层特征节点归一化处理,获得热力图。
  5. 根据权利要求4所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S203中进行侧边线检测,具体包括:
    获取侧边线端点A和B所在线段中点C相对中心特征点F的坐标偏差、间距线段长度和线段相对x轴角度;
    根据所述中心特征点的坐标及其坐标偏差计算中点C的坐标;
    根据所述中点C的坐标、间距线段长度和线段相对x轴角度计算侧边线端点A和B的坐标。
  6. 根据权利要求4所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S203中进行侧边线检测,具体包括:
    以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例;
    根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标;
    获取侧边线端点的侧边相对位置,并判断该侧边相对位置是否小于等于预设阈值,若小于等于,则根据所述2D检测框的左上角坐标、框高和侧边相对位置计算侧边线端点B的坐标;否则,根据所述2D检测框的左上角坐标、框宽和侧边相对位置计算侧边线端点B的坐标。
  7. 根据权利要求4所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S203中进行侧边线检测,具体包括:
    以侧边线端点A和B所在线段两端在2D检测框上为先验信息,输出底边线端点的底边相对位置比例;
    根据所述2D检测框的左上角坐标、框宽和框高以及底边相对位置比例计算侧边线端点A的坐标;
    获取侧边线端点A和B所在线段相对x轴的角度,并根据2D检测框的左上角坐标、框宽、框高、底边相对位置比例和所述角度计算侧边线端点B的坐标。
  8. 根据权利要求7所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,在执行步骤S3之前,包括:
    分别标定摄像机的内外参数,建立图像坐标到以相机投影为原点地平面车辆坐标系。
  9. 根据权利要求8所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S3计算所述目标车辆的第二目标信息,具体包括:
    S301:选取2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射;
    S302:根据所述目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度、车辆宽度、车辆高度以及相对原点的角度。
  10. 根据权利要求8所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S3计算所述目标车辆的第二目标信息,具体包括:
    S311:选取2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射;
    S313:根据所述目标车辆检测框端点和中点的坐标映射分别计算地平面车辆坐标系下的目标车辆3D坐标系参数。
  11. 根据权利要求10所述的基于单目视觉的车辆3D检测及跟踪方法,其特征在于,所述步骤S4具体包括:根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
  12. 一种基于单目视觉的车辆3D检测及跟踪装置,其特征在于,所述装置至少包括:
    第一获取模块,用于获取单目摄像头的实时采集图像,所述实时采集图像包括所述车辆的所述单目摄像头对前方至少一个目标车辆的实时采集图像;
    第一计算单元,用于根据所述实时采集图像,通过多任务识别模型进行处理,得到第一目标信息,所述第一目标信息包括所述目标车辆对应的2D全车身框,目标车辆特征向量,目标车辆车身姿态和目标车辆车身侧边线信息;
    第二获取模块,用于获取所述单目摄像头对应的参数信息;
    第二计算单元,用于结合所述单目摄像头对应的参数信息和第一目标信息,计算目标车辆的第二目标信息;
    跟踪单元,用于根据所述第二目标信息通过deepsort目标跟踪方法进行3D目标跟踪。
  13. 根据权利要求12所述的装置,其特征在于,所述第一计算单元,还包括:
    预处理模块,用于通过卷积神经网络模型对实时采集图像进行预处理;
    数据筛选模块,用于对预处理模块获得的热力图中高于预设阈值的特征点进行选择作为目标特征数据;
    第一检测模块,用于获取所述特征点对应的车辆类型参数,并根据CenterNet目标检测算法,计算得2D检测框分类列表;
    第二检测模块,用于对于每一2D检测框输出对应位置的车身姿态信息,经过argmax算子得到目标车辆的姿态;若姿态为Front_Side或Rear_Side,则进行侧边线检测,获得侧边线端点坐标。
  14. 根据权利要求13所述的装置,其特征在于,所述第二计算单元,还包括:
    标定模块,用于标定摄像机的内外参数,并建立图像坐标到以相机投影为原点地平面车辆坐标系;
    坐标映射模块,用于根据所选取的2D检测框底边端点及中点,相应获得地平面车辆坐标系下的目标车辆检测框端点和中点的坐标映射,或者是根据所选取的2D检测框的侧边线端点及在不可见车侧边一方的端点,相应获得地平面车辆坐标系下的目标车辆检测框侧边线端点及在不可见车侧边一方的端点的坐标映射;
    参数计算模块,用于根据坐标映射模块的结果计算地平面车辆坐标系下的目标车辆3D坐标系参数,所述3D坐标系参数至少包括车辆长度,车辆宽度,车辆高度以及相对原点的角度。
  15. 一种汽车,安装有单目相机,其特征在于,还配置有基于单目视觉的车辆3D检测及跟踪装置,所述装置采用如权利要求1-11中任一所述基于单目视觉的车辆3D检测及跟踪方法对所述单目相机实时采集的所述汽车前方至少一个目标车辆的实时图像进行处理。
PCT/CN2023/122151 2022-12-21 2023-09-27 基于单目视觉的车辆3d定位方法、装置及汽车 WO2024131200A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211646727.8 2022-12-21
CN202211646727.8A CN116128962A (zh) 2022-12-21 2022-12-21 基于单目视觉的车辆3d定位方法,装置,汽车及存储介质

Publications (1)

Publication Number Publication Date
WO2024131200A1 true WO2024131200A1 (zh) 2024-06-27

Family

ID=86310925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/122151 WO2024131200A1 (zh) 2022-12-21 2023-09-27 基于单目视觉的车辆3d定位方法、装置及汽车

Country Status (2)

Country Link
CN (1) CN116128962A (zh)
WO (1) WO2024131200A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128962A (zh) * 2022-12-21 2023-05-16 惠州市德赛西威智能交通技术研究院有限公司 基于单目视觉的车辆3d定位方法,装置,汽车及存储介质
CN117553695B (zh) * 2024-01-11 2024-05-03 摩斯智联科技有限公司 计算车辆高度的方法、装置及计算机存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517349A (zh) * 2019-07-26 2019-11-29 电子科技大学 一种基于单目视觉和几何约束的3d车辆目标检测方法
CN111862157A (zh) * 2020-07-20 2020-10-30 重庆大学 一种机器视觉与毫米波雷达融合的多车辆目标跟踪方法
US20200364472A1 (en) * 2019-05-14 2020-11-19 Neusoft Corporation Vehicle tracking method, computer readable storage medium, and electronic device
CN115205654A (zh) * 2022-07-06 2022-10-18 舵敏智能科技(苏州)有限公司 一种新型基于关键点约束的单目视觉3d目标检测方法
CN116128962A (zh) * 2022-12-21 2023-05-16 惠州市德赛西威智能交通技术研究院有限公司 基于单目视觉的车辆3d定位方法,装置,汽车及存储介质

Also Published As

Publication number Publication date
CN116128962A (zh) 2023-05-16
