CN116665189B - Multi-mode-based automatic driving task processing method and system - Google Patents

Multi-mode-based automatic driving task processing method and system

Info

Publication number
CN116665189B
CN116665189B (application CN202310945276.6A)
Authority
CN
China
Prior art keywords
perception
voxel
automatic driving
type
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310945276.6A
Other languages
Chinese (zh)
Other versions
CN116665189A (en)
Inventor
丁勇
刘瑞香
戴行
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Haipu Microelectronics Co ltd
Original Assignee
Hefei Haipu Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Haipu Microelectronics Co ltd filed Critical Hefei Haipu Microelectronics Co ltd
Priority to CN202310945276.6A
Publication of CN116665189A
Application granted
Publication of CN116665189B
Legal status: Active

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00: Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001: Planning or execution of driving tasks
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/0097: Predicting future conditions
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/0098: Details of control systems ensuring comfort, safety or stability not otherwise provided for
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00: Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001: Planning or execution of driving tasks
    • B60W60/0027: Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0475: Generative networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771: Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001: Details of the control system
    • B60W2050/0043: Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Mechanical Engineering (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a multi-mode-based automatic driving task processing method and system, wherein the method comprises the following steps: acquiring modal data collected by a plurality of perception sensors, and extracting voxel features from the modal data; after unifying the feature dimension and resolution of the extracted voxel features, performing feature fusion to obtain first-type voxel features that adaptively fuse the different modalities; and acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into a pre-established and trained Transformer model for automatic driving, and completing the tasks of automatic driving vehicle perception, surrounding object motion prediction and driving behavior planning. The method effectively reduces the training cost and deep learning model deployment difficulty caused by using a plurality of independent models, and fully exploits the correlation among the perception, prediction and planning tasks so that their performance improves mutually.

Description

Multi-mode-based automatic driving task processing method and system
Technical Field
The invention relates to the technical field of automatic driving, in particular to a multi-mode-based automatic driving task processing method and system.
Background
Autonomous Driving technology has triggered an industrial revolution in the automotive industry, and its development is inseparable from continual innovation and progress in automatic driving perception, prediction and planning technology. With the continuous improvement of perception sensor technology and related artificial intelligence algorithms, an automatic driving vehicle can obtain more accurate and comprehensive scene information and complete the automatic driving Perception, Prediction and Planning tasks, thereby realizing safer and more efficient driving. Perception is the "visual system" of the automatic driving vehicle, while prediction and planning are its "brain"; together they are key technologies for building intelligent transportation and smart cities, and they provide important technical support for future smart city construction in China.
Perception sensor technology mainly involves laser radar, millimeter wave radar and cameras. The current mainstream automatic driving technology uses a plurality of independent deep learning models and utilizes multi-modal data from these three mainstream perception sensors to complete the perception, prediction and planning tasks separately. This approach has the following disadvantages: 1) the deep learning network structure that extracts features from multi-modal data is common to every task and is one of the main components of each model, so using a plurality of independent models increases training cost; 2) the perception, prediction and planning tasks are correlated, and independent models cannot exploit this correlation to improve task accuracy; 3) a plurality of independent models increases the actual deployment cost of the deep learning models.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a multi-mode-based automatic driving task processing method and system.
The invention provides a multi-mode-based automatic driving task processing method, which comprises the following steps:
s1, acquiring modal data acquired by a plurality of perception sensors, and extracting voxel characteristics of the modal data;
s2, after unifying the feature dimension and the resolution of the extracted voxel features, carrying out feature fusion to obtain first type voxel features which are adaptively fused with different modes;
s3, acquiring an automatic driving perception task, inputting the first type of body characteristics and the perception task into a pre-established and trained automatic driving transducer model, and completing tasks of automatic driving vehicle perception, surrounding object action prediction and driving behavior planning; the transducer model specifically comprises: an autopilot-aware converter network, a surrounding object motion prediction converter network, and a driving behavior planning converter network;
"S3" specifically includes:
acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, and acquiring a perception output result;
constructing perception-related Key and Value from the perception output result;
inputting the perception output result into a voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features;
inputting the perception-related Key and Value and the first-type Key and Value simultaneously into the surrounding object motion prediction Transformer network to obtain second-type Key and Value related to motion prediction, and then completing the corresponding surrounding object motion prediction task of the automatic driving vehicle through a surrounding object motion prediction output head;
and after the second-type Key and Value are input into the driving behavior planning Transformer network, completing the corresponding driving behavior planning task of the automatic driving vehicle through a driving behavior planning output head.
Preferably, the modal data collected by the plurality of perception sensors specifically includes: an image $I_{cam}$ collected by a camera sensor, a point cloud $P_{LiDAR}$ collected by a laser radar sensor, and a point cloud $P_{Radar}$ collected by a millimeter wave radar sensor.
Preferably, the perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction, and online map generation.
Preferably, "inputting the sensing output result into a voxel feature filter to obtain a sparse interesting second type of voxel feature, and constructing a first type Key and Value related to voxel environment through the second type of voxel feature" specifically includes:
and inputting the perception output result into a voxel feature screening device, selecting interesting voxel features of the first type of voxel features through the perception output result by the voxel feature screening device, selecting sparse interesting second type of voxel features, and constructing first type Key and Value related to voxel environments through the second type of voxel features.
Preferably, the tasks of the driving behavior planning include, but are not limited to, keeping straight, turning left, turning right, accelerating, decelerating and stopping.
A multi-mode-based automatic driving task processing system, comprising:
the feature extraction module is used for acquiring the modal data acquired by the plurality of perception sensors and extracting voxel features of the modal data;
the feature fusion module is used for performing feature fusion after unifying the feature dimension and resolution of the extracted voxel features, to obtain first-type voxel features that adaptively fuse the different modalities;
the task processing module acquires an automatic driving perception task, inputs the first-type voxel features and the perception task into a pre-established and trained Transformer model for automatic driving, and completes the tasks of automatic driving vehicle perception, surrounding object motion prediction and driving behavior planning; the Transformer model specifically comprises: an automatic driving perception Transformer network, a surrounding object motion prediction Transformer network, and a driving behavior planning Transformer network; the task processing module comprises: an automatic driving perception processing module, a surrounding object motion prediction processing module, a driving behavior planning processing module and a voxel feature filtering module;
the automatic driving perception processing module is used for acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, acquiring a perception output result, and constructing perception-related Key and Value from the perception output result;
the voxel feature filtering module is used for inputting the perception output result into the voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features;
the surrounding object motion prediction processing module is used for inputting the perception-related Key and Value and the first-type Key and Value simultaneously into the surrounding object motion prediction Transformer network to obtain the second-type Key and Value related to motion prediction, and then completing the corresponding surrounding object motion prediction task of the automatic driving vehicle through a surrounding object motion prediction output head;
the driving behavior planning processing module is used for inputting the second-type Key and Value into the driving behavior planning Transformer network and then completing the corresponding driving behavior planning task of the automatic driving vehicle through a driving behavior planning output head.
Preferably, the modal data collected by the plurality of perception sensors specifically includes: an image collected by a camera sensor, a point cloud collected by a laser radar sensor and a point cloud collected by a millimeter wave radar sensor;
the perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction and online map generation.
Preferably, the tasks of the driving behavior planning include, but are not limited to, keeping straight, turning left, turning right, accelerating, decelerating and stopping.
According to the multi-mode-based automatic driving task processing method and system, the multi-modal voxel feature generation stage processes and fuses data from various sensors into a unified voxel space, so that the addition or removal of sensors can be flexibly supported and the feature requirements of the multiple subsequent tasks can be met. The multi-task output stage combines the multi-stage tasks of perception, prediction and planning, which effectively reduces the increased training cost and deep learning model deployment difficulty caused by using a plurality of independent models, and fully exploits the correlation among the perception, prediction and planning tasks so that their performance improves mutually.
Drawings
FIG. 1 is a schematic diagram of a workflow of a multi-mode-based automatic driving task processing method according to the present invention;
FIG. 2 is a schematic diagram of the operation flow of the multi-mode-based automatic driving task processing method according to the present invention;
FIG. 3 is a schematic structural diagram of a multi-mode autopilot algorithm system based on a unified large model according to the present invention;
fig. 4 is a schematic structural diagram of the task processing module of the multi-mode automatic driving algorithm system based on a unified large model according to the present invention.
Detailed Description
Variable subscripts "cam", "LiDAR" and "Radar" are used to distinguish the camera, laser radar (LiDAR) and millimeter wave radar (Radar) sensors, and variable subscripts "perc", "pred" and "plan" are used to distinguish the Perception, Prediction and Planning tasks.
Referring to fig. 1 and 2, the multi-mode-based automatic driving task processing method provided by the invention comprises the following steps:
s1, acquiring modal data acquired by a plurality of perception sensors, and extracting voxel characteristics of the modal data.
In this embodiment, the perception sensors (camera, laser radar, millimeter wave radar and the like) collect the modal data of the automatic driving application scene; correspondingly, the image $I_{cam}$ collected by the camera sensor, the point cloud $P_{LiDAR}$ collected by the laser radar sensor and the point cloud $P_{Radar}$ collected by the millimeter wave radar sensor are each input into the corresponding voxel feature generation network to obtain the corresponding voxel features.
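The voxel feature generation networks themselves are not detailed in the patent. As a rough, hypothetical illustration of the voxelization step they build on, the NumPy sketch below (function and parameter names are assumptions) scatters a point cloud into a dense grid by mean-pooling point features per voxel; in practice a learned backbone, such as a sparse 3D convolutional network, would produce the feature volumes $V_{LiDAR}$ and $V_{Radar}$.

```python
import numpy as np

def voxelize_point_cloud(points, grid_range, voxel_size, feat_dim=4):
    """Mean-pool an (N, feat_dim) point cloud into a dense voxel grid.
    points[:, :3] are x/y/z; grid_range = (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns an (X, Y, Z, feat_dim) feature volume."""
    mins = np.asarray(grid_range[:3], dtype=np.float32)
    maxs = np.asarray(grid_range[3:], dtype=np.float32)
    dims = np.floor((maxs - mins) / voxel_size).astype(int)   # (X, Y, Z)
    grid = np.zeros((*dims, feat_dim), dtype=np.float32)
    counts = np.zeros(tuple(dims), dtype=np.int32)
    idx = np.floor((points[:, :3] - mins) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < dims), axis=1)          # drop out-of-range points
    for (x, y, z), feat in zip(idx[keep], points[keep]):
        grid[x, y, z] += feat                                 # accumulate point features
        counts[x, y, z] += 1
    nonzero = counts > 0
    grid[nonzero] /= counts[nonzero][:, None]                 # mean per occupied voxel
    return grid
```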
And S2, after unifying the feature dimension and resolution of the extracted voxel features, performing feature fusion to obtain first-type voxel features that adaptively fuse the different modalities.
In this embodiment, the voxel features corresponding to the image $I_{cam}$ collected by the camera sensor, the point cloud $P_{LiDAR}$ collected by the laser radar sensor and the point cloud $P_{Radar}$ collected by the millimeter wave radar sensor are converted into a unified voxel feature space, forming voxel features $V_{cam}$, $V_{LiDAR}$ and $V_{Radar}$ that share the same feature dimension $C$ and spatial resolution $X \times Y \times Z$. Each is input into its corresponding adaptive voxel feature fusion network to obtain an adaptive fusion weight:
the voxel feature $V_{cam}$ of the image modality generates the corresponding image voxel feature adaptive fusion weight $W_{cam}$;
the voxel feature $V_{LiDAR}$ of the laser radar point cloud modality generates the corresponding laser radar point cloud voxel feature adaptive fusion weight $W_{LiDAR}$;
the voxel feature $V_{Radar}$ of the millimeter wave radar point cloud modality generates the corresponding millimeter wave radar point cloud voxel feature adaptive fusion weight $W_{Radar}$.
The generated fusion weights are numerically normalized, namely:
$[W_{cam}, W_{LiDAR}, W_{Radar}] \leftarrow \mathrm{Norm}([W_{cam}, W_{LiDAR}, W_{Radar}])$
where $\mathrm{Norm}(\cdot)$ is the normalization function, for which a Softmax function (applied across the three modalities at each voxel) can be adopted.
The voxel features of the three modalities are multiplied by their corresponding normalized adaptive fusion weights and summed, yielding the adaptively fused voxel feature:
$V_{fused} = W_{cam} \odot V_{cam} + W_{LiDAR} \odot V_{LiDAR} + W_{Radar} \odot V_{Radar}$
The fused voxel feature $V_{fused}$ has the same feature dimension $C$ and resolution $X \times Y \times Z$. The system can therefore flexibly adapt to an increase or decrease in the number of sensors: the input can be tri-modal (camera, laser radar and millimeter wave radar), bi-modal (camera and laser radar; laser radar and millimeter wave radar; camera and millimeter wave radar) or single-modal (camera only, laser radar only, or millimeter wave radar only), and a unified voxel feature is obtained in every case.
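The architecture of the adaptive fusion networks is not specified. The PyTorch sketch below is one minimal realization under stated assumptions (a 1x1x1 convolution per modality predicts the per-voxel weight; module and parameter names are hypothetical): the weights are Softmax-normalized across modalities and the features combined as a weighted sum, mirroring the $V_{fused}$ formula above.

```python
import torch
import torch.nn as nn

class AdaptiveVoxelFusion(nn.Module):
    """Per-voxel adaptive fusion of camera / laser radar / millimeter wave
    radar voxel features; every input has shape (B, C, X, Y, Z)."""
    def __init__(self, feat_dim: int, num_modalities: int = 3):
        super().__init__()
        # One lightweight weight-prediction head per modality (the patent only
        # says each modality has its own fusion network; 1x1x1 conv is assumed).
        self.weight_nets = nn.ModuleList(
            nn.Conv3d(feat_dim, 1, kernel_size=1) for _ in range(num_modalities)
        )

    def forward(self, voxel_feats):  # list of (B, C, X, Y, Z) tensors
        weights = torch.stack(       # unnormalized W_m: (M, B, 1, X, Y, Z)
            [net(v) for net, v in zip(self.weight_nets, voxel_feats)], dim=0
        )
        weights = torch.softmax(weights, dim=0)   # Softmax across modalities
        feats = torch.stack(voxel_feats, dim=0)   # (M, B, C, X, Y, Z)
        return (weights * feats).sum(dim=0)       # V_fused: (B, C, X, Y, Z)

# Usage: fuse = AdaptiveVoxelFusion(feat_dim=64)
#        v_fused = fuse([v_cam, v_lidar, v_radar])
```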
S3, acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the pre-established and trained Transformer model for automatic driving, and completing the tasks of automatic driving vehicle perception, surrounding object motion prediction and driving behavior planning.
The perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction, and online map generation.
In particular, the unified large model is specifically the Transformer model for automatic driving.
Specifically, as shown in fig. 1 and fig. 2, the Transformer model specifically comprises: an automatic driving perception Transformer network, a surrounding object motion prediction Transformer network, and a driving behavior planning Transformer network;
"S3" specifically includes:
acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, and acquiring a perception output result; and constructing perception-related Key and Value from the perception output result.
Specifically, the perception output head produces the perception output result $R_{perc}$, which is used to construct the multi-type perception-related Key and Value, denoted $K_{perc}$ and $V_{perc}$.
The perception output result is input into a voxel feature filter to obtain sparse second-type voxel features of interest, and the first-type Key and Value related to the voxel environment are constructed from the second-type voxel features; they are denoted $K_{vox}$ and $V_{vox}$, respectively.
"Inputting the perception output result into a voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features" specifically includes:
and inputting the perception output result into a voxel feature screening device, selecting interesting voxel features of the first type of voxel features through the perception output result by the voxel feature screening device, selecting sparse interesting second type of voxel features, and constructing first type Key and Value related to voxel environments through the second type of voxel features.
The perception-related Key and Value and the first-type Key and Value are simultaneously input into the surrounding object motion prediction Transformer network to obtain the second-type Key and Value related to motion prediction, and then the corresponding surrounding object motion prediction task of the automatic driving vehicle is completed through the surrounding object motion prediction output head.
In this embodiment, the surrounding object motion prediction Transformer neural network $T_{pred}$ uses the perception-related $K_{perc}$, $V_{perc}$ and the voxel-environment-related $K_{vox}$, $V_{vox}$ to learn and update the motion prediction Query (denoted $Q_{pred}$). The updated motion prediction Query is used to complete the surrounding object motion prediction task, and the motion-prediction-related Key and Value are denoted $K_{pred}$ and $V_{pred}$.
The method comprises the following steps:
step 1:is->Part of use->For->Learning and updating are performed, and the process is based on a calculation mode of a transducer structure, as follows:
wherein (1)>The method comprises the following main calculation steps:
calculating a correlation matrix of the two; />The function normalizes the correlation matrix and is realized by a Softmax function; FFN is a feedforward neural network and can be arranged into a two-layer structure;may be set to 128.
Step 2: $T_{pred}$ uses the voxel-environment-related $K_{vox}$ and $V_{vox}$ to further learn and update $Q_{pred}$; the process is still based on a computation similar to Step 1:
$Q_{pred} \leftarrow \mathrm{FFN}\big(\mathrm{Norm}(Q_{pred} K_{vox}^{\top}) \, V_{vox}\big)$
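As an illustration of Steps 1 and 2, the PyTorch sketch below applies the described Query update once per Key/Value source. It is a minimal single-head version: the patent fixes neither head count nor layer depth, and the $1/\sqrt{d}$ scaling is the standard Transformer convention rather than something the text states.

```python
import torch
import torch.nn as nn

class QueryUpdate(nn.Module):
    """One cross-attention Query update: Q <- FFN(Softmax(Q K^T / sqrt(d)) V).
    Single-head for clarity; d = 128 follows the embodiment, and the 1/sqrt(d)
    scaling is the usual Transformer convention (an assumption here)."""
    def __init__(self, d: int = 128):
        super().__init__()
        self.scale = d ** -0.5
        self.ffn = nn.Sequential(          # two-layer FFN, as in the description
            nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d)
        )

    def forward(self, q, k, v):            # q: (Nq, d); k, v: (Nk, d)
        attn = torch.softmax(q @ k.T * self.scale, dim=-1)  # normalized correlation matrix
        return self.ffn(attn @ v)                           # updated Query

# Steps 1 and 2 chained, mirroring the embodiment:
# q_pred = update1(q_pred, k_perc, v_perc)   # perception-related Key/Value
# q_pred = update2(q_pred, k_vox, v_vox)     # voxel-environment Key/Value
```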
step 3: the +.2 updated by the above step>To the motion prediction output head>In, for outputting the motion prediction result +.>
Step 4: the +.2 updated by the above step>Key and Value, which are also used as motion prediction-related, are denoted +.>
After the second-type Key and Value are input into the driving behavior planning Transformer network, the corresponding driving behavior planning task of the automatic driving vehicle is completed through the driving behavior planning output head.
In this embodiment, the driving behavior planning transducer neural networkAction prediction related +.>Planning driving behaviorQuery (denoted as->) And learning and updating, wherein the updated driving behavior planning Query is used for completing the corresponding task of driving behavior planning of the automatic driving vehicle.
The driving behavior planning tasks include, but are not limited to, keeping straight, turning left, turning right, accelerating, decelerating and stopping.
The specific implementation process is as follows:
step 5: the driving behavior planning transducer neural networkAction prediction related +.>Planning a Query for driving behavior (denoted +.>) Learning and updating is performed, the process still being based on +.>A similar calculation is as follows:
step 6: the +.2 updated by the above step>Send to driving behavior planning output head->In for outputting the driving behavior planning prediction result +.>
Wherein the result isOutput->Including but not limited to maintaining specific driving behaviors such as straight, left turn, right turn, acceleration, deceleration, and parking.
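The form of the planning output head is not described; a minimal sketch, assuming a linear classifier over the discrete behavior set listed above (all names hypothetical), is:

```python
import torch
import torch.nn as nn

BEHAVIORS = ["keep_straight", "turn_left", "turn_right",
             "accelerate", "decelerate", "stop"]   # behavior set listed above

class PlanningOutputHead(nn.Module):
    """Maps the updated planning Query to logits over discrete behaviors."""
    def __init__(self, d: int = 128, num_behaviors: int = len(BEHAVIORS)):
        super().__init__()
        self.classifier = nn.Linear(d, num_behaviors)

    def forward(self, q_plan):                 # q_plan: (Nq, d); Nq may be 1
        return self.classifier(q_plan)         # logits over BEHAVIORS

# Usage with a single planning query:
# logits = PlanningOutputHead()(q_plan)                 # shape (1, 6)
# behavior = BEHAVIORS[logits.argmax(dim=-1).item()]    # e.g. "keep_straight"
```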
Referring to fig. 3, the multi-mode-based automatic driving task processing system includes:
the feature extraction module is used for acquiring the modal data acquired by the plurality of perception sensors and extracting voxel features of the modal data;
the feature fusion module is used for performing feature fusion after unifying the feature dimension and resolution of the extracted voxel features, to obtain first-type voxel features that adaptively fuse the different modalities;
the task processing module acquires an automatic driving perception task, inputs the first-type voxel features and the perception task into a pre-established and trained Transformer model for automatic driving, and completes the tasks of automatic driving vehicle perception, surrounding object motion prediction and driving behavior planning. The Transformer model specifically comprises: an automatic driving perception Transformer network, a surrounding object motion prediction Transformer network, and a driving behavior planning Transformer network.
Specifically, as shown in fig. 4, the task processing module comprises: an automatic driving perception processing module, a surrounding object motion prediction processing module, a driving behavior planning processing module and a voxel feature filtering module;
the automatic driving perception processing module is used for acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, acquiring a perception output result, and constructing perception-related Key and Value from the perception output result;
the voxel feature filtering module is used for inputting the perception output result into the voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features;
the surrounding object motion prediction processing module is used for inputting the perception-related Key and Value and the first-type Key and Value simultaneously into the surrounding object motion prediction Transformer network to obtain the second-type Key and Value related to motion prediction, and then completing the corresponding surrounding object motion prediction task of the automatic driving vehicle through a surrounding object motion prediction output head;
the driving behavior planning processing module is used for inputting the second-type Key and Value into the driving behavior planning Transformer network and then completing the corresponding driving behavior planning task of the automatic driving vehicle through a driving behavior planning output head.
Specifically, as shown in fig. 2, the modal data collected by the plurality of perception sensors specifically includes: an image collected by a camera sensor, a point cloud collected by a laser radar sensor and a point cloud collected by a millimeter wave radar sensor;
the perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction, and online map generation.
Specifically, as shown in fig. 2, the driving behavior planning tasks include, but are not limited to, keeping straight, turning left, turning right, accelerating, decelerating and stopping.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification, according to the technical solution of the present invention and the inventive concept thereof, that a person skilled in the art could conceive within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention.

Claims (8)

1. The multi-mode-based automatic driving task processing method is characterized by comprising the following steps of:
s1, acquiring modal data acquired by a plurality of perception sensors, and extracting voxel characteristics of the modal data;
s2, after unifying the feature dimension and the resolution of the extracted voxel features, carrying out feature fusion to obtain first type voxel features which are adaptively fused with different modes;
s3, acquiring an automatic driving perception task, inputting the first type of body characteristics and the perception task into a pre-established and trained automatic driving transducer model, and completing tasks of automatic driving vehicle perception, surrounding object action prediction and driving behavior planning; the transducer model specifically comprises: an autopilot-aware converter network, a surrounding object motion prediction converter network, and a driving behavior planning converter network;
"S3" specifically includes:
acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, and acquiring a perception output result;
constructing perception-related Key and Value from the perception output result;
inputting the perception output result into a voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features;
inputting the perception-related Key and Value and the first-type Key and Value simultaneously into the surrounding object motion prediction Transformer network to obtain second-type Key and Value related to motion prediction, and then completing the corresponding surrounding object motion prediction task of the automatic driving vehicle through a surrounding object motion prediction output head;
and after the second-type Key and Value are input into the driving behavior planning Transformer network, completing the corresponding driving behavior planning task of the automatic driving vehicle through a driving behavior planning output head.
2. The multi-mode-based automatic driving task processing method according to claim 1, wherein the modal data collected by the plurality of perception sensors specifically includes: an image $I_{cam}$ collected by a camera sensor, a point cloud $P_{LiDAR}$ collected by a laser radar sensor, and a point cloud $P_{Radar}$ collected by a millimeter wave radar sensor.
3. The multi-mode-based automatic driving task processing method of claim 1, wherein the perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction, and online map generation.
4. The multi-mode-based automatic driving task processing method according to claim 1, wherein "inputting the perception output result into a voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features" specifically comprises:
and inputting the perception output result into a voxel feature screening device, selecting interesting voxel features of the first type of voxel features through the perception output result by the voxel feature screening device, selecting sparse interesting second type of voxel features, and constructing first type Key and Value related to voxel environments through the second type of voxel features.
5. The method of claim 1, wherein the driving behavior planning tasks include, but are not limited to, keep straight, turn left, turn right, accelerate, decelerate, and park.
6. A multi-mode-based automatic driving task processing system, comprising:
the feature extraction module is used for acquiring the modal data acquired by the plurality of perception sensors and extracting voxel features of the modal data;
the feature fusion module is used for performing feature fusion after unifying the feature dimension and resolution of the extracted voxel features, to obtain first-type voxel features that adaptively fuse the different modalities;
the task processing module acquires an automatic driving perception task, inputs the first-type voxel features and the perception task into a pre-established and trained Transformer model for automatic driving, and completes the tasks of automatic driving vehicle perception, surrounding object motion prediction and driving behavior planning; the Transformer model specifically comprises: an automatic driving perception Transformer network, a surrounding object motion prediction Transformer network, and a driving behavior planning Transformer network;
the task processing module comprises: an automatic driving perception processing module, a surrounding object motion prediction processing module, a driving behavior planning processing module and a voxel feature filtering module;
the automatic driving perception processing module is used for acquiring an automatic driving perception task, inputting the first-type voxel features and the perception task into the automatic driving perception Transformer network, completing the corresponding automatic driving vehicle perception task through a perception task output head, acquiring a perception output result, and constructing perception-related Key and Value from the perception output result;
the voxel feature filtering module is used for inputting the perception output result into the voxel feature filter to obtain sparse second-type voxel features of interest, and constructing first-type Key and Value related to the voxel environment from the second-type voxel features;
the surrounding object motion prediction processing module is used for inputting the perception-related Key and Value and the first-type Key and Value simultaneously into the surrounding object motion prediction Transformer network to obtain the second-type Key and Value related to motion prediction, and then completing the corresponding surrounding object motion prediction task of the automatic driving vehicle through a surrounding object motion prediction output head;
the driving behavior planning processing module is used for inputting the second-type Key and Value into the driving behavior planning Transformer network and then completing the corresponding driving behavior planning task of the automatic driving vehicle through a driving behavior planning output head.
7. The multi-mode-based automatic driving task processing system of claim 6, wherein the modal data collected by the plurality of perception sensors specifically includes: an image collected by a camera sensor, a point cloud collected by a laser radar sensor and a point cloud collected by a millimeter wave radar sensor;
the perception tasks of automatic driving include, but are not limited to, three-dimensional object detection, three-dimensional object tracking, three-dimensional semantic segmentation, three-dimensional occupancy prediction and online map generation.
8. The multi-mode-based automatic driving task processing system of claim 6, wherein the driving behavior planning tasks include, but are not limited to, keeping straight, turning left, turning right, accelerating, decelerating and stopping.
CN202310945276.6A 2023-07-31 2023-07-31 Multi-mode-based automatic driving task processing method and system Active CN116665189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310945276.6A CN116665189B (en) 2023-07-31 2023-07-31 Multi-mode-based automatic driving task processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310945276.6A CN116665189B (en) 2023-07-31 2023-07-31 Multi-mode-based automatic driving task processing method and system

Publications (2)

Publication Number Publication Date
CN116665189A CN116665189A (en) 2023-08-29
CN116665189B true CN116665189B (en) 2023-10-31

Family

ID=87710145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310945276.6A Active CN116665189B (en) 2023-07-31 2023-07-31 Multi-mode-based automatic driving task processing method and system

Country Status (1)

Country Link
CN (1) CN116665189B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063539A (en) * 2022-07-19 2022-09-16 上海人工智能创新中心 Image dimension increasing method and three-dimensional target detection method
CN115775378A (en) * 2022-11-30 2023-03-10 北京航空航天大学 Vehicle-road cooperative target detection method based on multi-sensor fusion
CN116229224A (en) * 2023-01-18 2023-06-06 重庆长安汽车股份有限公司 Fusion perception method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674421B (en) * 2021-08-25 2023-10-13 北京百度网讯科技有限公司 3D target detection method, model training method, related device and electronic equipment
JP2023073231A (en) * 2021-11-15 2023-05-25 三星電子株式会社 Method and device for image processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063539A (en) * 2022-07-19 2022-09-16 上海人工智能创新中心 Image dimension increasing method and three-dimensional target detection method
CN115775378A (en) * 2022-11-30 2023-03-10 北京航空航天大学 Vehicle-road cooperative target detection method based on multi-sensor fusion
CN116229224A (en) * 2023-01-18 2023-06-06 重庆长安汽车股份有限公司 Fusion perception method and device, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection"; Xin Li et al.; arXiv; entire document *
"ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning"; Shengchao Hu et al.; arXiv; Section 3, Fig. 2 *
"Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving"; Zhenxun Yuan et al.; arXiv; entire document *
"3D Dynamic Object Detection Algorithm Based on Voxel-Point Cloud Fusion" (体素点云融合的三维动态目标检测算法); Zhou Feng et al.; Journal of Computer-Aided Design & Computer Graphics, Vol. 34, No. 6; entire document *
"End-to-End Autonomous Driving Image Generation Method Based on Improved GAN" (基于改进GAN的端到端自动驾驶图像生成方法); Sun Xiongfeng et al.; Journal of Transport Information and Safety, Vol. 39, No. 5; entire document *

Also Published As

Publication number Publication date
CN116665189A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
Mittal A survey on optimized implementation of deep learning models on the nvidia jetson platform
JP7086911B2 (en) Real-time decision making for self-driving vehicles
US11480972B2 (en) Hybrid reinforcement learning for autonomous driving
EP3289529B1 (en) Reducing image resolution in deep convolutional networks
Ondruska et al. End-to-end tracking and semantic segmentation using recurrent neural networks
CN111368972B (en) Convolutional layer quantization method and device
CN111797983A (en) Neural network construction method and device
Galvao et al. Pedestrian and vehicle detection in autonomous vehicle perception systems—A review
CN112990211A (en) Neural network training method, image processing method and device
WO2022007867A1 (en) Method and device for constructing neural network
CN113191241A (en) Model training method and related equipment
CN114494158A (en) Image processing method, lane line detection method and related equipment
CN111340190A (en) Method and device for constructing network structure, and image generation method and device
CN117157678A (en) Method and system for graph-based panorama segmentation
CN116109678B (en) Method and system for tracking target based on context self-attention learning depth network
Zaghari et al. Improving the learning of self-driving vehicles based on real driving behavior using deep neural network techniques
CN115880560A (en) Image processing via an isotonic convolutional neural network
Kanchana et al. Computer vision for autonomous driving
US20230368513A1 (en) Method and system for training a neural network
CN116665189B (en) Multi-mode-based automatic driving task processing method and system
CN115146757A (en) Training method and device of neural network model
CN116680656B (en) Automatic driving movement planning method and system based on generating pre-training converter
CN116863430B (en) Point cloud fusion method for automatic driving
CN116902003B (en) Unmanned method based on laser radar and camera mixed mode
WO2023029704A1 (en) Data processing method, apparatus and system

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant