CN116859724B - Automatic driving model with time-series autoregression for simultaneous decision and prediction, and training method thereof

Info

Publication number: CN116859724B
Application number: CN202310745966.7A
Authority: CN (China)
Earlier publication: CN116859724A (Chinese)
Inventors: 黄际洲, 王凡, 叶晓青, 曾增烽, 吴泽武
Applicant and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Active (granted)

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 - Adaptive control systems ... electric
    • G05B 13/04 - Adaptive control systems ... electric, involving the use of models or simulators
    • G05B 13/042 - Adaptive control systems ... in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present disclosure provides an automatic driving model that makes decisions and predictions simultaneously through time-series autoregression, and a training method thereof, relating to the technical field of automatic driving. The model comprises a multi-modal encoding layer, a time-series layer, and a decision control layer. The input information of the multi-modal encoding layer includes perception information of the vehicle's surroundings obtained by sensors, and the multi-modal encoding layer obtains an initial implicit representation. The time-series layer obtains a time-series transfer implicit representation for a first future moment, from which the decision control layer obtains first target automatic driving strategy information. The time-series layer then obtains a time-series transfer implicit representation for a second future moment based on the representation for the first future moment, the navigation information of the target vehicle, and the first target automatic driving strategy information, and the decision control layer obtains second target automatic driving strategy information based on the representation for the second future moment. The prediction quality of the model is thereby improved through time-series autoregression.

Description

Automatic driving model with time-series autoregression for simultaneous decision and prediction, and training method thereof
Technical Field
The present disclosure relates to the field of computer technology, in particular to the fields of automatic driving and artificial intelligence, and specifically to an automatic driving model, an automatic driving method implemented using the automatic driving model, a method of training the automatic driving model, an automatic driving apparatus based on the automatic driving model, a training apparatus for the automatic driving model, an electronic device, a computer-readable storage medium, a computer program product, and an autonomous vehicle.
Background
Artificial intelligence is the discipline of making computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it spans both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Automatic driving technology integrates perception and identification, decision making, positioning, communication security, human-machine interaction, and other technologies. Automatic driving strategies can be learned with the aid of artificial intelligence.
A high-precision map, also called a high-definition (HD) map, is a map used by autonomous vehicles. A high-precision map carries accurate vehicle position information and rich road-element data, and can help a vehicle anticipate complex road information such as gradient, curvature, and heading, so as to better avoid potential risks. In other words, automatic driving technology depends strongly on high-precision maps.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides an automatic driving model, an automatic driving method implemented using the automatic driving model, a training method of the automatic driving model, an automatic driving apparatus based on the automatic driving model, a training apparatus of the automatic driving model, an electronic device, a computer-readable storage medium, a computer program product, and an automatic driving vehicle.
According to an aspect of the present disclosure, there is provided an automatic driving model including a multi-modal encoding layer, a time-series layer connected to the multi-modal encoding layer, and a decision control layer connected to the time-series layer. The first input information of the multi-modal encoding layer includes current perception information and historical perception information for the surroundings of a target vehicle obtained using sensors, and the multi-modal encoding layer is configured to obtain an initial implicit representation corresponding to the first input information. The second input information of the time-series layer is based on the initial implicit representation and navigation information of the target vehicle, and the time-series layer is configured to obtain, based on the second input information, a time-series transfer implicit representation of the target vehicle's surroundings for a first future moment. The decision control layer is configured to obtain first target automatic driving strategy information corresponding to the first future moment based on the time-series transfer implicit representation for the first future moment. The time-series layer is further configured to obtain a time-series transfer implicit representation for a second future moment, which follows the first future moment, based on the time-series transfer implicit representation for the first future moment, the navigation information of the target vehicle, and the first target automatic driving strategy information; and the decision control layer is further configured to obtain second target automatic driving strategy information corresponding to the second future moment based on the time-series transfer implicit representation for the second future moment.
According to another aspect of the present disclosure, there is provided an automatic driving method implemented using an automatic driving model including a multi-modal encoding layer, a time-series layer connected to the multi-modal encoding layer, and a decision control layer connected to the time-series layer. The method comprises: acquiring first input information of the multi-modal encoding layer, the first input information including current perception information and historical perception information for the surroundings of a target vehicle obtained using sensors; inputting the first input information into the multi-modal encoding layer to obtain the initial implicit representation, output by the multi-modal encoding layer, corresponding to the first input information; inputting second input information based on the initial implicit representation and navigation information of the target vehicle into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, of the target vehicle's surroundings for a first future moment; inputting the time-series transfer implicit representation for the first future moment into the decision control layer to obtain first target automatic driving strategy information output by the decision control layer; further inputting the time-series transfer implicit representation for the first future moment, the navigation information of the target vehicle, and the first target automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, for a second future moment, the second future moment being after the first future moment; and further inputting the time-series transfer implicit representation for the second future moment into the decision control layer to obtain second target automatic driving strategy information output by the decision control layer.
According to another aspect of the present disclosure, there is provided a training method for an automatic driving model including a multi-modal encoding layer, a time-series layer connected to the multi-modal encoding layer, and a decision control layer connected to the time-series layer. The method comprises: acquiring sample input information and real automatic driving strategy information corresponding to the sample input information, the sample input information including current sample perception information and historical sample perception information for the surroundings of a sample vehicle; inputting the sample perception information into the multi-modal encoding layer to obtain an initial implicit representation output by the multi-modal encoding layer; inputting intermediate sample input information based on the initial implicit representation and navigation information of the sample vehicle into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, of the sample vehicle's surroundings for a first future moment; inputting the time-series transfer implicit representation for the first future moment into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; further inputting the time-series transfer implicit representation for the first future moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, for a second future moment, the second future moment being after the first future moment; further inputting the time-series transfer implicit representation for the second future moment into the decision control layer to obtain second predicted automatic driving strategy information output by the decision control layer; and adjusting parameters of the multi-modal encoding layer, the time-series layer, and the decision control layer based at least on the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
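To make this training flow concrete, the following is a minimal runnable sketch in PyTorch. The stand-in modules (a linear layer for the multi-modal encoding layer, a GRU cell for the time-series layer, a linear head for the decision control layer), the feature widths, the two-step horizon, and the MSE loss are all illustrative assumptions; the patent does not prescribe them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, NAV, ACT, OBS = 64, 16, 4, 128            # assumed feature widths

multimodal_enc = nn.Linear(OBS, D)           # stands in for the multi-modal encoding layer
time_series    = nn.GRUCell(NAV + ACT, D)    # stands in for the time-series layer
decision_ctrl  = nn.Linear(D, ACT)           # stands in for the decision control layer
params = [*multimodal_enc.parameters(), *time_series.parameters(), *decision_ctrl.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

def train_step(sample_perception, nav, real_a1, real_a2):
    b = multimodal_enc(sample_perception)                 # initial implicit representation
    no_action = torch.zeros_like(real_a1)                 # no decision exists yet at the first step
    B1 = time_series(torch.cat([nav, no_action], -1), b)  # time-series transfer repr., first future moment
    a1 = decision_ctrl(B1)                                # first predicted driving strategy
    # Autoregressive step: the model's own first decision conditions the next prediction.
    B2 = time_series(torch.cat([nav, a1], -1), B1)        # time-series transfer repr., second future moment
    a2 = decision_ctrl(B2)                                # second predicted driving strategy
    loss = F.mse_loss(a1, real_a1) + F.mse_loss(a2, real_a2)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

loss = train_step(torch.randn(8, OBS), torch.randn(8, NAV),
                  torch.randn(8, ACT), torch.randn(8, ACT))
```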
According to another aspect of the present disclosure, there is provided an automatic driving apparatus based on an automatic driving model including a multi-modal encoding layer, a time-series layer connected to the multi-modal encoding layer, and a decision control layer connected to the time-series layer. The apparatus comprises: an input information acquisition unit configured to acquire first input information of the multi-modal encoding layer, the first input information including current perception information and historical perception information for the surroundings of a target vehicle obtained using sensors; a multi-modal encoding unit configured to input the first input information into the multi-modal encoding layer to obtain the initial implicit representation, output by the multi-modal encoding layer, corresponding to the first input information; a time-series unit configured to input second input information based on the initial implicit representation and navigation information of the target vehicle into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, of the target vehicle's surroundings for a first future moment; and a decision control unit configured to input the time-series transfer implicit representation for the first future moment into the decision control layer to obtain first target automatic driving strategy information output by the decision control layer. The time-series unit is further configured to input the time-series transfer implicit representation for the first future moment, the navigation information of the target vehicle, and the first target automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, for a second future moment, the second future moment being after the first future moment; and the decision control unit is further configured to further input the time-series transfer implicit representation for the second future moment into the decision control layer to obtain second target automatic driving strategy information output by the decision control layer.
According to another aspect of the present disclosure, there is provided a training apparatus for an automatic driving model including a multi-modal encoding layer, a time-series layer connected to the multi-modal encoding layer, and a decision control layer connected to the time-series layer. The apparatus comprises: a sample input information acquisition unit configured to acquire sample input information, including current sample perception information and historical sample perception information for the surroundings of a sample vehicle, and real automatic driving strategy information corresponding to the sample input information; a multi-modal encoding layer training unit configured to input the sample perception information into the multi-modal encoding layer to obtain an initial implicit representation output by the multi-modal encoding layer; a time-series layer training unit configured to input intermediate sample input information based on the initial implicit representation and navigation information of the sample vehicle into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, of the sample vehicle's surroundings for a first future moment; a decision control layer training unit configured to input the time-series transfer implicit representation for the first future moment into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer, the time-series layer training unit being further configured to input the time-series transfer implicit representation for the first future moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation, output by the time-series layer, for a second future moment, the second future moment being after the first future moment, and the decision control layer training unit being further configured to further input the time-series transfer implicit representation for the second future moment into the decision control layer to obtain second predicted automatic driving strategy information output by the decision control layer; and a parameter adjustment unit configured to adjust parameters of the multi-modal encoding layer, the time-series layer, and the decision control layer based at least on the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the above method.
According to another aspect of the present disclosure, there is provided an autonomous vehicle including one of: the automatic driving apparatus, the training apparatus for the automatic driving model, and the electronic device according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of an autopilot model according to another embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of an autopilot model in accordance with another embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of an autopilot method implemented using an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of a method of training an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of a method of training an autopilot model in accordance with another embodiment of the present disclosure;
FIG. 8 illustrates a flow chart of a method of training an autopilot model in accordance with another embodiment of the present disclosure;
FIG. 9 illustrates a schematic diagram of a training method employing an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 10 illustrates a block diagram of an automatic driving apparatus based on an automatic driving model in accordance with an embodiment of the present disclosure;
FIG. 11 shows a block diagram of a training device of an autopilot model in accordance with an embodiment of the present disclosure; and
FIG. 12 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the technical solution of the present disclosure, the acquisition, storage, and application of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.
In the related art, driverless technology mainly relies on cooperation between a perception module and a planning-and-control module. The automatic driving workflow comprises several stages. First, unstructured information obtained by sensors such as cameras or radars is converted into structured information (obstacle information, other-vehicle information, pedestrian and non-motor-vehicle information, lane line information, traffic light information, other static road surface information, and the like). This information can be combined and matched with a high-precision map to accurately obtain the vehicle's position on the map. Second, predictions and decisions are made based on the structured information and the related observation history, where prediction means forecasting changes in the surrounding structured environment over a future period, and decisions generate structured information (e.g., lane change, cut-in, waiting) usable for subsequent trajectory planning. Third, a trajectory of the target vehicle over a future period is planned based on the structured decision information and the predicted changes in the surrounding environment, for example a planned trajectory or control information (e.g., planned speed and position). To do this, the spatial information acquired from the sensors usually must be fused over time, and such full spatio-temporal fusion involves aggregating information of very high dimensionality.
In the related art, spatio-temporal information aggregation may be performed with encoder-decoder sequence coding. However, a single frame of data often contains tens of thousands of vector representations or more, and each vector representation may have hundreds or even thousands of dimensions. Temporal aggregation over such spatial representations may span hundreds of frames, so the aggregation can involve vectors totaling hundreds of billions of dimensions, which is computationally very demanding. On the other hand, when automatic driving decisions are made based only on past or current unstructured sensor information, the effect of the autonomous vehicle's own decisions on future predictions may not be well accounted for. For example, when the host vehicle decides to cut in, vehicles behind it in the target lane may have to slow down in advance; if the host vehicle does not cut in, those vehicles may keep driving normally. Such complex interactions between the host vehicle and surrounding obstacles affect the accuracy of decisions and future predictions.
Based on this, the present application provides an automatic driving model, an automatic driving method implemented using the model, a training method for the model, an automatic driving apparatus based on the model, a training apparatus for the model, an electronic device, a computer-readable storage medium, a computer program product, and an autonomous vehicle. The time-series layer continuously updates the time-series transfer implicit representation, realizing autoregressive temporal transfer. Because the time-series transfer implicit representation acts as a carrier of temporal signals, time-series autoregression greatly reduces the time cost of temporally modeling large numbers of vector representations, addresses the multi-modality problem of trajectory prediction, and improves the prediction quality of the automatic driving model. In addition, because the time-series layer predicts the time-series transfer implicit representation for the second future moment from input data that includes the first target automatic driving strategy information, and the decision control layer makes subsequent decisions based on that representation, the automatic driving model fully accounts for the influence of the autonomous vehicle's own decisions on future predictions, improving prediction accuracy.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes a motor vehicle 110, a server 120, and one or more communication networks 130 coupling the motor vehicle 110 to the server 120.
In an embodiment of the present disclosure, motor vehicle 110 may include a computing device in accordance with an embodiment of the present disclosure and/or be configured to perform a method in accordance with an embodiment of the present disclosure.
The server 120 may run one or more services or software applications that enable autopilot. In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user of motor vehicle 110 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems, including any commercially available server operating system. Server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from motor vehicle 110. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of motor vehicle 110.
Network 130 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the one or more networks 130 may be a satellite communications network, a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (including, for example, Bluetooth and WiFi), and/or any combination of these with other networks.
The system 100 may also include one or more databases 150. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 150 may be used to store information such as audio files and video files. The databases 150 may reside in various locations. For example, a data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The databases 150 may be of different types. In some embodiments, the database used by the server 120 may be a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of databases 150 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
Motor vehicle 110 may include sensors 111 for sensing the surrounding environment. The sensors 111 may include one or more of the following: visual cameras, infrared cameras, ultrasonic sensors, millimeter-wave radar, and laser radar (LiDAR). Different sensors may provide different detection accuracy and range. Cameras may be mounted in front of, behind, or at other locations on the vehicle. Visual cameras can capture conditions inside and outside the vehicle in real time and present them to the driver and/or passengers. In addition, by analyzing the images captured by a visual camera, information such as traffic light indications, intersection conditions, and the running states of other vehicles can be acquired. An infrared camera can capture objects in night-vision conditions. Ultrasonic sensors can be arranged around the vehicle to measure the distance of objects outside the vehicle, exploiting characteristics such as the strong directivity of ultrasound. Millimeter-wave radar may be installed in front of, behind, or at other locations of the vehicle to measure the distance of objects from the vehicle using electromagnetic waves. Lidar may be mounted in front of, behind, or at other locations on the vehicle to detect object edges and shape information for object identification and tracking. Radar apparatus can also measure changes in the speeds of the vehicle and moving objects using the Doppler effect.
Motor vehicle 110 may also include a communication device 112. The communication device 112 may include a satellite positioning module capable of receiving satellite positioning signals (e.g., BeiDou, GPS, GLONASS, and Galileo) from satellites 141 and generating coordinates based on these signals. The communication device 112 may also include a module for communicating with a mobile communication base station 142; the mobile communication network may implement any suitable communication technology, such as GSM/GPRS, CDMA, LTE, or other current or evolving wireless communication technologies (e.g., 5G technology). The communication device 112 may also have a Vehicle-to-Everything (V2X) module configured to enable, for example, Vehicle-to-Vehicle (V2V) communication with other vehicles 143 and Vehicle-to-Infrastructure (V2I) communication with infrastructure 144. In addition, the communication device 112 may also have a module configured to communicate with a user terminal 145 (including but not limited to a smartphone, tablet computer, or wearable device such as a watch), for example via a wireless local area network per the IEEE 802.11 standards or Bluetooth. Using the communication device 112, the motor vehicle 110 can also access the server 120 via the network 130.
Motor vehicle 110 may also include a control device 113. The control device 113 may include a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), or other special purpose processor, etc., in communication with various types of computer readable storage devices or mediums. The control device 113 may include an autopilot system for automatically controlling various actuators in the vehicle. The autopilot system is configured to control a powertrain, steering system, braking system, etc. of a motor vehicle 110 (not shown) via a plurality of actuators in response to inputs from a plurality of sensors 111 or other input devices to control acceleration, steering, and braking, respectively, without human intervention or limited human intervention. Part of the processing functions of the control device 113 may be implemented by cloud computing. For example, some of the processing may be performed using an onboard processor while other processing may be performed using cloud computing resources. The control device 113 may be configured to perform a method according to the present disclosure. Furthermore, the control means 113 may be implemented as one example of a computing device on the motor vehicle side (client) according to the present disclosure.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to an aspect of the present disclosure, an autopilot model is provided. Fig. 2 shows a schematic diagram of an autopilot model 200 in accordance with an embodiment of the present disclosure.
As shown in fig. 2, the automatic driving model 200 includes a multi-modal encoding layer 210, a time-series layer 220 connected to the multi-modal encoding layer 210, and a decision control layer 230 connected to the time-series layer 220.

The first input information of the multi-modal encoding layer 210 includes current and historical perception information for the surroundings of the target vehicle obtained with sensors, and the multi-modal encoding layer 210 is configured to obtain an initial implicit representation b corresponding to the first input information.
The second input information of the time-series layer 220 is based on the initial implicit representation b and the navigation information p of the target vehicle, and the time-series layer 220 is configured to obtain, based on the second input information, the time-series transfer implicit representation B'_{t+1} of the target vehicle's surroundings for a first future moment.
The decision control layer 230 is configured to obtain, based on the time-series transfer implicit representation B'_{t+1} for the first future moment, first target automatic driving strategy information a_{t+1} corresponding to the first future moment; and
The time-series layer 220 is further configured to obtain a time-series transfer implicit representation B'_{t+2} for a second future moment based on the time-series transfer implicit representation B'_{t+1} for the first future moment, the navigation information p of the target vehicle, and the first target automatic driving strategy information a_{t+1}. The second future moment is after the first future moment, and the decision control layer 230 is further configured to obtain second target automatic driving strategy information a_{t+2} corresponding to the second future moment based on the time-series transfer implicit representation B'_{t+2} for the second future moment.
According to embodiments of the present disclosure, the time-series layer 220 continuously updates the time-series transfer implicit representation, realizing autoregression through temporal transfer. The time-series transfer implicit representation acts as a carrier of temporal signals, so time-series autoregression greatly reduces the time cost of temporally modeling large numbers of vector representations, addresses the multi-modality problem of trajectory prediction, and improves the prediction quality of the automatic driving model 200. In addition, because the time-series layer 220 predicts the time-series transfer implicit representation for the second future moment from input data that includes the first target automatic driving strategy information, and the decision control layer 230 makes subsequent decisions based on that representation, the automatic driving model 200 can fully account for the influence of the autonomous vehicle's own decisions on future predictions, improving prediction accuracy.
In an example, the first input information may include perception information In1 from one or more cameras, perception information In2 from one or more lidars, and perception information In3 from one or more millimeter-wave radars. It will be appreciated that the perception information of the target vehicle's surroundings is not limited to the above; it may, for example, include only the perception information In1 of multiple cameras, without the perception information In2 of lidars or the perception information In3 of millimeter-wave radars. The perception information In1 acquired by a camera may take the form of pictures or video, and the perception information In2 acquired by a lidar may take the form of a radar point cloud (e.g., a three-dimensional point cloud). In an example, these different forms of information (pictures, video, point clouds) may be input to the multi-modal encoding layer 210 directly, without preprocessing. Furthermore, the perception information includes current perception information x_t for the target vehicle's surroundings during driving and historical perception information x_{t-Δt} corresponding to a plurality of historical moments, where Δt may be a preset time span.
In an example, the multimodal encoding layer 210 may perform encoding calculations on the first input information to generate a corresponding initial implicit representation b. The initial implicit representation b may be, for example, an implicit representation in a Bird's Eye View (BEV) space.
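By way of illustration only, the following sketch shows one way such a multi-modal encoding layer could fuse per-modality features into a single initial implicit representation b. The linear feature extractors and feature widths are assumptions made for brevity; a real system would use image and point-cloud backbones with a BEV projection.

```python
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    """Illustrative stand-in for the multi-modal encoding layer 210."""
    def __init__(self, cam_dim=256, lidar_dim=128, radar_dim=64, d=64):
        super().__init__()
        self.cam   = nn.Linear(cam_dim, d)    # camera picture/video features (In1)
        self.lidar = nn.Linear(lidar_dim, d)  # lidar point-cloud features (In2)
        self.radar = nn.Linear(radar_dim, d)  # millimeter-wave radar features (In3)
        self.fuse  = nn.Linear(3 * d, d)      # fused initial implicit representation b

    def forward(self, cam, lidar, radar):
        z = torch.cat([self.cam(cam), self.lidar(lidar), self.radar(radar)], dim=-1)
        return self.fuse(z)                   # e.g. an implicit representation in BEV space

enc = MultiModalEncoder()
b_t = enc(torch.randn(1, 256), torch.randn(1, 128), torch.randn(1, 64))  # shape (1, 64)
```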
In an example, the time series layer 220 of the autopilot model 200 may include a Transformer network structure, for example, may include a Vision Transformer structure.
The second input information of the time-series layer 220 is based on the initial implicit representation b and the navigation information p of the target vehicle. For example, the second input information may include the time-series transfer implicit representation B_t of the current moment, which is obtained, as described in equation (1) below, from the initial implicit representation b_t of the current moment and the time-series transfer implicit representation B_{t-1} of the previous moment:

B_t = TemporalEncoder(B_{t-1}, b_t)    (1)

Here, the time-series transfer implicit representation B_{t-k} of the earliest moment may, for example, be randomly generated.
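A minimal runnable sketch of the recurrence in equation (1) follows, with the earliest state B_{t-k} randomly generated as just described. Modeling TemporalEncoder with a GRU cell is an assumption for illustration; the patent leaves its internals open.

```python
import torch
import torch.nn as nn

D = 64
temporal_encoder = nn.GRUCell(D, D)        # illustrative stand-in for TemporalEncoder

def encode_history(initial_reprs):         # [b_{t-k+1}, ..., b_t], each of shape (1, D)
    B = torch.randn(1, D)                  # B_{t-k}: randomly generated first state
    for b in initial_reprs:
        B = temporal_encoder(b, B)         # B_t = TemporalEncoder(B_{t-1}, b_t), equation (1)
    return B                               # time-series transfer implicit representation B_t

B_t = encode_history([torch.randn(1, D) for _ in range(5)])
```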
The second input information may further include the navigation information p of the target vehicle. In an example, the navigation information p may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information.
The time-series layer 220 is configured to obtain, based on the second input information, the time-series transfer implicit representation B'_{t+1} of the target vehicle's surroundings for the first future moment, for example using equation (2):

B'_{t+1} = TemporalDecoder(B_t, p)    (2)
Accordingly, the decision control layer 230 is configured to obtain, based on the time-series transfer implicit representation B'_{t+1} for the first future moment, the first target automatic driving strategy information a_{t+1} corresponding to the first future moment. The first target automatic driving strategy information a_{t+1} may include, for example, a planned trajectory or control signals for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.). In an example, a control strategy module in the autonomous vehicle may interpret the planned trajectory to obtain control signals for the vehicle.
In an example, the time-series transfer implicit representation B'_{t+2} for the second future moment may be obtained from the time-series transfer implicit representation B'_{t+1} for the first future moment, the navigation information p of the target vehicle, and the first target automatic driving strategy information a_{t+1}, for example by equation (3):

B'_{t+2} = TemporalDecoder(B'_{t+1}, a_{t+1}, p)    (3)
Further, the decision control layer 230 may obtain, based on the time-series transfer implicit representation B'_{t+2} for the second future moment, the second target automatic driving strategy information a_{t+2} corresponding to the second future moment. Similarly, the second target automatic driving strategy information a_{t+2} may also include a planned trajectory or control signals for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.).
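The rollout of equations (2) and (3) can be sketched as follows: each decoded state is turned into a decision, and that decision is fed back into the next decoding step. The GRU-cell decoder and linear decision head are stand-ins chosen for brevity, not the patent's prescribed internals.

```python
import torch
import torch.nn as nn

D, NAV, ACT = 64, 16, 4
temporal_decoder = nn.GRUCell(NAV + ACT, D)    # illustrative stand-in for TemporalDecoder
decision_layer   = nn.Linear(D, ACT)           # illustrative stand-in for the decision control layer

def rollout(B_t, p, horizon=2):
    B = B_t
    a = torch.zeros(B_t.size(0), ACT)          # equation (2): no decision is fed in yet
    plans = []
    for _ in range(horizon):
        B = temporal_decoder(torch.cat([p, a], -1), B)  # B'_{t+k}, equations (2)/(3)
        a = decision_layer(B)                           # a_{t+k}: planned trajectory / control signals
        plans.append(a)
    return plans                                # [a_{t+1}, a_{t+2}, ...]

plans = rollout(torch.randn(1, D), torch.randn(1, NAV))
```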
According to some embodiments, the initial implicit representation may include a current initial implicit representation b_t and a historical initial implicit representation b_{t-1}; the multi-modal encoding layer 210 may be configured to obtain the current initial implicit representation b_t corresponding to the current perception information and the historical initial implicit representation b_{t-1} corresponding to the historical perception information.
Fig. 3 shows a schematic diagram of an automatic driving model 300 according to another embodiment of the present disclosure. As shown in fig. 3, according to some embodiments, the automatic driving model 300 includes a multi-modal encoding layer 310 and a decision control layer 330, and its time-series layer includes a time-series encoding layer 321 and a time-series decoding layer 322 connected to the time-series encoding layer 321. The time-series encoding layer 321 is connected to the multi-modal encoding layer 310 and is configured to obtain the time-series transfer implicit representation of the current moment based on the initial implicit representation. The time-series decoding layer 322 is configured to obtain the time-series transfer implicit representation for the first future moment based on the time-series transfer implicit representation of the current moment and the navigation information of the target vehicle, and is further configured to obtain the time-series transfer implicit representation for the second future moment based on the time-series transfer implicit representation for the first future moment, the navigation information of the target vehicle, and the first target automatic driving strategy information.
According to some embodiments, the initial implicit representation may include a current initial implicit representation b_t and a historical initial implicit representation b_{t-1} corresponding to the current perception information and the historical perception information, respectively, and the time-series encoding layer 321 is configured to:

obtain the time-series transfer implicit representation B_{t-1} of the historical moment based at least on the historical initial implicit representation b_{t-1}; and

obtain the time-series transfer implicit representation B_t of the current moment based on the time-series transfer implicit representation B_{t-1} of the historical moment and the current initial implicit representation b_t.
For example, as shown in equation (4), the time-series transfer implicit representation B_{t-1} of the previous moment may first be obtained from the initial implicit representation b_{t-1} of the previous moment and the time-series transfer implicit representation B_{t-2} of the moment before it:

B_{t-1} = TemporalEncoder(B_{t-2}, b_{t-1})    (4)

Then, according to equation (1) above, the time-series transfer implicit representation B_t of the current moment is obtained from the initial implicit representation b_t of the current moment and the time-series transfer implicit representation B_{t-1} of the previous moment.
The time-series decoding layer 322 is configured to obtain the time-series transfer implicit representation B'_{t+1} for the first future moment based on the time-series transfer implicit representation B_t of the current moment and the navigation information p of the target vehicle, and is further configured to obtain the time-series transfer implicit representation B'_{t+2} for the second future moment based on the time-series transfer implicit representation B'_{t+1} for the first future moment, the navigation information p of the target vehicle, and the first target automatic driving strategy information a_{t+1} obtained by the decision control layer 330.
It follows that the time-series transfer implicit representation B_t of the current moment can first be obtained from the initial implicit representation (e.g., comprising a current initial implicit representation and a historical initial implicit representation corresponding to the current and historical perception information, respectively); B_t then carries past and current temporal signals into the future, iteratively yielding the time-series transfer implicit representations B'_{t+1} and B'_{t+2} for the first and second future moments. During each iteration, the decision control layer 330 may produce a decision plan based on the time-series transfer implicit representation for the corresponding future moment, and the decision plan for the first future moment is used by the time-series decoding layer 322 in the further prediction of the time-series transfer implicit representation B'_{t+2} for the second future moment. The automatic driving model 300 can thus fully account for the influence of the autonomous vehicle's own decisions on future predictions, further improving prediction accuracy at reasonable computational cost.
According to some embodiments, the time-series decoding layer 322 is further configured to obtain the time-series transfer implicit representation B'_{t+3} for a third future moment, which follows the second future moment, based on the time-series transfer implicit representation B'_{t+2} for the second future moment, the navigation information p of the target vehicle, and the second target automatic driving strategy information a_{t+2}. The time-series decoding layer 322 can thus continue iterative inference.
According to some embodiments, the time-series encoding layer 321 is configured to obtain the time-series transfer implicit representation B_{t-1} of the historical moment based on the historical initial implicit representation b_{t-1} and an original random value of the time-series transfer implicit representation (e.g., B_{t-2} taken as a randomly generated value).
According to some embodiments, with continued reference to fig. 3, the automatic driving model 300 may further include a perception detection layer 340 configured to obtain target detection information s_t for the target vehicle's surroundings based on the input initial implicit representation. The target detection information includes the object types of a plurality of road surface elements and obstacles in the vehicle's surroundings and their current state information.
The road surface element may be a stationary object and the obstacle may be a moving object.
In an example, the target detection information may be a bounding box in three-dimensional space for an obstacle, and may indicate the classification, state, etc. of the corresponding obstacle in the bounding box. For example, it may indicate the size, position, and type of a vehicle in the bounding box, the vehicle's current state (e.g., whether rear-end signals such as turn signals or high beams are on), and the position and length of lane lines, etc. It will be appreciated that the classification of the obstacle in the bounding box may be one or more of a plurality of predefined categories. Furthermore, the target detection information may be structured information.
In an example, the perception detection layer 340 may include a decoder in a Transformer.
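For illustration, the structured target detection information s_t could be represented along the following lines; the field names and value types here are hypothetical, not the patent's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    box_center: tuple   # 3D bounding-box center (x, y, z)
    box_size: tuple     # length, width, height
    category: str       # one of the predefined classes, e.g. "vehicle"
    state: dict = field(default_factory=dict)  # e.g. {"turn_signal": "left", "high_beam": False}

det = Detection(box_center=(12.0, -1.5, 0.8), box_size=(4.6, 1.9, 1.5),
                category="vehicle", state={"turn_signal": "left"})
```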
According to some embodiments, with continued reference to fig. 3, the automatic driving model 300 may further include a future prediction layer 350 configured to predict future prediction information s'_t for the target vehicle's surroundings based on the input time-series transfer implicit representation B'_{t+1} for the first future moment or B'_{t+2} for the second future moment.
In an example, the future prediction layer 350 may be a decoder in a Transformer, and may output structured prediction information.
According to some embodiments, the future prediction information includes at least one of: future predicted perception information for the target vehicle's surroundings (e.g., sensor information at a future time, such as camera or radar input information), a future predicted implicit representation corresponding to the future predicted perception information (e.g., an implicit representation in BEV space corresponding to the sensor information at the future time), and future predicted detection information for the target vehicle's surroundings (e.g., obstacle positions at the future time). The future predicted detection information may include the types of a plurality of obstacles in the target vehicle's surroundings and their future predicted state information (including obstacle sizes and various long-tail information).
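One possible shape for such a future prediction layer is sketched below: three assumed linear heads mapping a time-series transfer implicit representation to the three kinds of future prediction information listed above. The patent only says a Transformer decoder may be used; the heads and widths here are placeholders.

```python
import torch
import torch.nn as nn

class FuturePredictionLayer(nn.Module):
    """Illustrative stand-in for the future prediction layer 350."""
    def __init__(self, d=64, sensor_dim=128, det_dim=32):
        super().__init__()
        self.sensor_head = nn.Linear(d, sensor_dim)  # future predicted perception information
        self.latent_head = nn.Linear(d, d)           # future predicted implicit representation
        self.detect_head = nn.Linear(d, det_dim)     # future predicted detection information

    def forward(self, B_future):                     # e.g. B'_{t+1} or B'_{t+2}
        return (self.sensor_head(B_future),
                self.latent_head(B_future),
                self.detect_head(B_future))

head = FuturePredictionLayer()
s_sensor, s_latent, s_detect = head(torch.randn(1, 64))
```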
Fig. 4 shows a schematic diagram of an autopilot model 400 according to another embodiment of the present disclosure.
As shown in fig. 4, according to some embodiments, the automatic driving model 400 includes a multi-modal encoding layer 410, a time-series decoding layer 420 (i.e., the time-series layer is the time-series decoding layer 420), and a decision control layer 430.
The initial implicit representation includes a current initial implicit representation b_t and a historical initial implicit representation b_{t-1} corresponding to the current perception information and the historical perception information, respectively. The time-series decoding layer 420 is further configured to obtain a predicted implicit representation b'_{t+1} for the first future moment and the time-series transfer implicit representation B_t of the current moment based on the current initial implicit representation b_t, the navigation information p of the target vehicle, the time-series transfer implicit representation B_{t-1} of the historical moment, and the target automatic driving strategy information a_{t-1} of the historical moment, where the time-series transfer implicit representation B_{t-1} of the historical moment is obtained based on the historical initial implicit representation b_{t-1}, and the target automatic driving strategy information a_{t-1} of the historical moment is obtained based on the time-series transfer implicit representation B_{t-1} of the historical moment.

The time-series decoding layer 420 is further configured to obtain a predicted implicit representation b'_{t+2} for the second future moment and the time-series transfer implicit representation B'_{t+1} for the first future moment based on the predicted implicit representation b'_{t+1} for the first future moment, the time-series transfer implicit representation B_t of the current moment, the navigation information p of the target vehicle, and the target automatic driving strategy information a_t corresponding to the current moment, where the target automatic driving strategy information a_t corresponding to the current moment is obtained based on the time-series transfer implicit representation B_t of the current moment.
For example, as shown in equation (5), the predicted implicit representation b'_{t+1} of the future first moment and the timing transfer implicit representation B_t of the current moment may be obtained based on the timing transfer implicit representation B_{t-1} of the historical moment, the current initial implicit representation b_t, the target automatic driving strategy information a_{t-1} of the historical moment, and the navigation information p of the target vehicle:
B_t, b'_{t+1} = TemporalDecoder(B_{t-1}, b_t, a_{t-1}, p)    equation (5)
wherein the timing transfer implicit representation B_{t-1} of the historical moment can be obtained according to the following equation (6):
B_{t-1}, b'_t = TemporalDecoder(B_{t-2}, b_{t-1}, a_{t-2}, p)    equation (6)
Further, the predicted implicit representation b'_{t+2} of the future second moment and the timing transfer implicit representation B'_{t+1} of the future first moment can be obtained according to equation (7):
B'_{t+1}, b'_{t+2} = TemporalDecoder(B_t, b_{t+1}, a_t, p)    equation (7)
The decision control layer 430 connected to the time series decoding layer 420 may, for example, obtain the target automatic driving strategy information a_t or a_{t+1} of the corresponding moment according to the timing transfer implicit representation B_t of the current moment or the timing transfer implicit representation B'_{t+1} of the future first moment output by the time series decoding layer 420. The target automatic driving strategy information a_t or a_{t+1} may include a planned trajectory or control signals for the vehicle (e.g., signals to control throttle, brake, steering amplitude, etc.).
It can be seen that not only does the timing transfer implicit representation B_t pass past and current time signals to the future, iteratively deriving the timing transfer implicit representation B'_{t+1} of the future first moment and the timing transfer implicit representation B'_{t+2} of the future second moment, but the model's predicted implicit representations (e.g., the predicted implicit representation b'_{t+1} of the future first moment and the predicted implicit representation b'_{t+2} of the future second moment in each iteration) are also iteratively updated in the same way, so that the decision plan at each moment can be used for further prediction of the predicted implicit representation and the timing transfer implicit representation at subsequent moments. Therefore, the automatic driving model 400 can further fully consider the influence of the autonomous vehicle's own decisions on future prediction, so that the accuracy of future prediction is further improved at reasonable computational cost.
According to some embodiments, the time series decoding layer 420 is further configured to obtain a future predicted implicit representation b'_{t+3} of a future third moment and a timing transfer implicit representation B'_{t+2} of the future second moment based on the future predicted implicit representation b'_{t+2} of the future second moment, the timing transfer implicit representation B'_{t+1} of the future first moment, the navigation information p of the target vehicle, and the first target automatic driving strategy information a_{t+1} corresponding to the future first moment. Thus, continuous iteration can be performed.
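For illustration, the following Python sketch shows the autoregressive rollout implied by equations (5) to (7), assuming temporal_decoder and decision_control are callables with the interfaces described above; the names and signatures are placeholders, not the patent's implementation.

```python
def rollout(temporal_decoder, decision_control, B_hist, b_curr, a_hist, p, horizon=3):
    """Roll the timing transfer implicit representation forward `horizon` steps."""
    # eq. (5): B_t, b'_{t+1} = TemporalDecoder(B_{t-1}, b_t, a_{t-1}, p)
    B, b_pred = temporal_decoder(B_hist, b_curr, a_hist, p)
    strategies = []
    for _ in range(horizon):
        a = decision_control(B)              # strategy at the current step, from B
        strategies.append(a)
        # eq. (7): feed the decision back in to advance one step into the future
        B, b_pred = temporal_decoder(B, b_pred, a, p)
    return strategies
```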
According to some embodiments, with continued reference to fig. 4, the autopilot model 400 may further include a perception detection layer 440, the perception detection layer 440 being configured to obtain target detection information s_t for the surroundings of the target vehicle based on the input initial implicit representation, the target detection information including the types of a plurality of road surface elements and obstacles in the vehicle's surroundings and their current state information.
The road surface element may be a stationary object and the obstacle may be a moving object.
In an example, the target detection information may be a bounding box in three-dimensional space for an obstacle, and may indicate the classification, state, etc. of the corresponding obstacle in the bounding box. For example, it may indicate the size, position, and vehicle type of an obstacle in the bounding box, the current state of the vehicle (e.g., whether lights such as turn signals and high beams are on), the position and length of lane lines, etc. It will be appreciated that the classification of the corresponding obstacle in the bounding box may be one or more of a plurality of predefined categories. Further, the target detection information may be structured information.
In an example, the perception detection layer 440 may include a decoder in a Transformer.
According to some embodiments, with continued reference to fig. 4, the autopilot model 400 may further include a future prediction layer 450, the future prediction layer 450 being configured to predict future prediction information s'_t for the surroundings of the target vehicle based on the input timing transfer implicit representation B'_{t+1} of the future first moment or the timing transfer implicit representation B'_{t+2} of the future second moment.
In an example, the future prediction layer 450 may be a decoder in a Transformer. In an example, the future prediction layer may output structured prediction information.
It will be appreciated that the future prediction information includes at least one of: future predicted perception information for the target vehicle surroundings (e.g., sensor information at a future time, such as camera input information or radar input information at the future time), a future predicted implicit representation corresponding to the future predicted perception information (e.g., an implicit representation in BEV space corresponding to the sensor information at the future time), and future predicted detection information for the target vehicle surroundings (e.g., obstacle positions at the future time). The future predicted detection information may include the types of a plurality of obstacles in the surrounding environment of the target vehicle and their future predicted state information (including obstacle sizes and various long-tail information).
According to some embodiments, a decision control layer (e.g., decision control layer 230, 330, 430) is configured to:
acquiring a probability distribution of each first target automatic driving strategy among a plurality of first target automatic driving strategies based on the timing transfer implicit representation of the future first moment; and
sampling over each probability distribution, and determining the first target automatic driving strategy with the largest probability as the first target automatic driving strategy information.
In an example, the decision control layer may acquire, based on the timing transfer implicit representation B'_{t+1} of the future first moment, a probability distribution over a plurality of first target automatic driving strategies (e.g., left turn, right turn, brake), such as 55% left turn, 10% right turn, 35% brake;
and sampling is performed over the probability distribution, with the first target automatic driving strategy of highest probability (e.g., left turn at 55%) determined as the first target automatic driving strategy information a_{t+1}.
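A minimal sketch of this selection step follows, assuming the decision control layer ends in a head that produces logits over a small discrete strategy set; the strategy names and the decision_head callable are illustrative assumptions.

```python
import torch

STRATEGIES = ["left_turn", "right_turn", "brake"]


def pick_strategy(decision_head, B_future):
    logits = decision_head(B_future)        # one score per candidate strategy
    probs = torch.softmax(logits, dim=-1)   # e.g. tensor([0.55, 0.10, 0.35])
    idx = int(torch.argmax(probs))          # keep the most probable strategy
    return STRATEGIES[idx], probs
```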
According to some embodiments, the sensor may comprise a camera and the perception information may comprise a two-dimensional image acquired by the camera. The multi-modal encoding layer (e.g., multi-modal encoding layer 210, 310, 410) is further configured to acquire an implicit representation corresponding to the first input information based on the first input information including the two-dimensional image and the intrinsic and extrinsic parameters of the camera.
In an example, the camera's intrinsic parameters (i.e., parameters related to the camera's own characteristics, such as focal length and pixel size) and extrinsic parameters (i.e., parameters in the world coordinate system, such as the camera's position and rotation) may be input into the multi-modal encoding layer as hyperparameters of the autopilot model. The camera's intrinsic and extrinsic parameters may be used to convert the input two-dimensional image into, for example, BEV space.
Furthermore, the perception information may be a sequence of two-dimensional images acquired by a plurality of cameras, respectively.
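As a hedged illustration of how intrinsics and extrinsics enable the image-to-BEV conversion, the following sketch back-projects one pixel onto the ground plane (z = 0); it is a single-point geometric example under assumed parameter conventions, not the model's actual BEV encoder.

```python
import numpy as np


def pixel_to_bev(u, v, K, R, t):
    """K: 3x3 intrinsics; R, t: camera-to-world rotation and translation."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-project pixel to a camera ray
    ray_world = R @ ray_cam                             # rotate the ray into world coordinates
    s = -t[2] / ray_world[2]                            # intersect the ray with the ground plane z = 0
    ground_pt = t + s * ray_world
    return ground_pt[:2]                                # (x, y) position in BEV space
```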
According to some embodiments, the first input information may further comprise a lane-level map, and the navigation information may comprise road-level navigation information and/or lane-level navigation information. Unlike high-precision maps, lane-level maps have better availability and smaller space occupation. Thus, by using the lane-level map and the lane-level navigation information, the dependence on the high-precision map can be overcome.
Navigation maps may include road-level maps (SD Map), lane-level maps (LD Map), and high-definition maps (HD Map). A road-level map mainly contains coarse-grained road topology information and has low navigation positioning accuracy (e.g., about 15 meters); it is mainly used to help a human driver navigate and cannot meet the requirements of automatic driving. Lane-level maps and high-precision maps, by contrast, may be used for automatic driving. Compared with a road-level map, a lane-level map incorporates lane-level topology information with higher accuracy, typically at the sub-meter level, and may include road information (e.g., lane lines) and lane-related facility information (e.g., traffic lights, guideboards, parking spaces, etc.), which can be used to assist automatic driving. Compared with a lane-level map, a high-precision map has higher data accuracy (reaching the centimeter level), richer data types, and a higher update frequency, and can be used for automatic driving. Among the three navigation maps, the high-precision map has the richest information and the highest precision, but also the highest usage and update cost. The present solution can realize a perception-heavy, map-light automatic driving technique, thereby eliminating dependence on high-precision maps while ensuring efficient decision making.
According to some embodiments, the perceptual information may include at least one of: the method comprises the steps of acquiring images by a camera, acquiring information by a laser radar and acquiring information by a millimeter wave radar. It will be appreciated that the image acquired by the camera may be in the form of a picture or video and the information acquired by the lidar may be a radar point cloud (e.g. a three-dimensional point cloud).
According to another aspect of the present disclosure, an autopilot method implemented using an autopilot model is provided.
Fig. 5 illustrates a flow chart of an autopilot method 500 implemented using an autopilot model in accordance with an embodiment of the present disclosure. The automatic driving model comprises a multi-mode coding layer, a time sequence layer connected with the multi-mode coding layer and a decision control layer connected with the time sequence layer. As shown in fig. 5, the method 500 includes:
step S510, acquiring first input information of a multi-mode coding layer, wherein the first input information of the multi-mode coding layer comprises current perception information and historical perception information which are acquired by using a sensor and aimed at the surrounding environment of a target vehicle;
step S520, inputting the first input information into the multi-mode coding layer to obtain an initial implicit representation corresponding to the first input information output by the multi-mode coding layer;
step S530, inputting second input information based on the initial implicit representation and the navigation information of the target vehicle into the time sequence layer to acquire a timing transfer implicit representation of a future first moment of the surrounding environment of the target vehicle output by the time sequence layer;
step S540, inputting the timing transfer implicit representation of the future first moment into the decision control layer to obtain first target automatic driving strategy information output by the decision control layer;
step S550, further inputting the time sequence transfer implicit representation of the future first moment, the navigation information of the target vehicle and the first target automatic driving strategy information into the time sequence layer to obtain the time sequence transfer implicit representation of the future second moment output by the time sequence layer, wherein the future second moment is after the future first moment; and
step S560, the time sequence transmission implicit expression of the second moment in the future is further input into the decision control layer to obtain the second target automatic driving strategy information output by the decision control layer.
The time sequence layer can continuously update the timing transfer implicit representation, realizing autoregression over the timing transfer, so that the timing transfer implicit representation serves to propagate temporal signals. Temporal autoregression greatly reduces the time cost of temporal modeling over a large number of vector representations and addresses the multi-modal (Multi-Modal) problem of trajectory prediction, yielding better prediction and decision performance for automatic driving. In addition, the time sequence layer further predicts, based on input data that includes the first target automatic driving strategy information, the timing transfer implicit representation of the future second moment, and the decision control layer makes subsequent decisions based on that representation, so that the automatic driving process fully accounts for the influence of the vehicle's own decisions on future prediction, improving the accuracy of future prediction.
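Steps S510 to S560 can be summarized as one inference pass; the following Python sketch assumes the three layers are callables, and all names are placeholders for illustration rather than the disclosed implementation.

```python
def autopilot_step(encoder, temporal_layer, decision_layer, sensors, nav_info):
    x = sensors.read()                     # S510: current + historical perception information
    h0 = encoder(x)                        # S520: initial implicit representation
    B1 = temporal_layer(h0, nav_info)      # S530: timing transfer repr. of future first moment
    a1 = decision_layer(B1)                # S540: first target autopilot strategy
    B2 = temporal_layer(B1, nav_info, a1)  # S550: timing transfer repr. of future second moment
    a2 = decision_layer(B2)                # S560: second target autopilot strategy
    return a1, a2
```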
According to some embodiments, the time sequence layer may include a time sequence coding layer and a time sequence decoding layer connected to the time sequence coding layer, the time sequence coding layer being connected to the multi-modal encoding layer and the time sequence decoding layer being connected to the decision control layer. Step S530 may include: inputting the initial implicit representation into the time sequence coding layer to acquire the timing transfer implicit representation of the current moment output by the time sequence coding layer; and inputting the timing transfer implicit representation of the current moment and the navigation information of the target vehicle into the time sequence decoding layer to acquire the timing transfer implicit representation of the future first moment output by the time sequence decoding layer.
According to some embodiments, the time-series layer may be a time-series decoding layer, and the initial implicit representation includes a current initial implicit representation and a historical initial implicit representation corresponding to the current perceptual information and the historical perceptual information, respectively. The method 500 may further include:
inputting third input information including the current initial implicit representation, the navigation information of the target vehicle, the timing transfer implicit representation of the historical moment, and the target automatic driving strategy information of the historical moment into the time series decoding layer to acquire the predicted implicit representation of the future first moment and the timing transfer implicit representation of the current moment output by the time series decoding layer, wherein the timing transfer implicit representation of the historical moment is based on the historical initial implicit representation, and the target automatic driving strategy information of the historical moment is based on the timing transfer implicit representation of the historical moment, and
wherein step S550 may include: inputting the predicted implicit representation of the future first moment, the timing transfer implicit representation of the current moment, the navigation information of the target vehicle, and the target automatic driving strategy information corresponding to the current moment into the time sequence layer to obtain the predicted implicit representation of the future second moment and the timing transfer implicit representation of the future first moment, wherein the target automatic driving strategy information corresponding to the current moment is based on the timing transfer implicit representation of the current moment.
According to another aspect of the present disclosure, a method of training an autopilot model is provided.
Fig. 6 shows a flowchart of a training method of an autopilot model according to an embodiment of the present disclosure. The automatic driving model comprises a multi-mode coding layer, a time sequence layer connected with the multi-mode coding layer and a decision control layer connected with the time sequence layer. As shown in fig. 6, the method 600 includes:
step S610, acquiring sample input information and real automatic driving strategy information corresponding to the sample input information, wherein the sample input information comprises current sample perception information and historical sample perception information aiming at the surrounding environment of a sample vehicle;
step S620, inputting the sample perception information into the multi-mode coding layer to obtain an initial implicit representation output by the multi-mode coding layer;
step S630, inputting intermediate sample input information based on the initial implicit representation and the navigation information of the sample vehicle into a time sequence layer to acquire a time sequence transfer implicit representation of a future first moment of the surrounding environment of the sample vehicle output by the time sequence layer;
step S640, the time sequence transmission implicit expression of the first moment in the future is input into the decision control layer to obtain the first predicted automatic driving strategy information output by the decision control layer;
step S650, further inputting the timing transfer implicit representation of the future first moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time sequence layer to obtain the timing transfer implicit representation of the future second moment output by the time sequence layer, wherein the future second moment is after the future first moment;
step S660, further inputting the time sequence transmission implicit expression of the second moment in the future into a decision control layer to obtain second predicted automatic driving strategy information output by the decision control layer; and
step S670, adjusting parameters of the multi-mode encoding layer, the time sequence layer and the decision control layer based on at least the first predicted automatic driving strategy information, the second predicted automatic driving strategy information and the real automatic driving strategy information.
The time sequence layer continuously updates the timing transfer implicit representation, realizing autoregression over the timing transfer, so that the timing transfer implicit representation serves to propagate temporal signals. Temporal autoregression greatly reduces the time cost of temporal modeling over a large number of vector representations and addresses the multi-modal (Multi-Modal) problem of trajectory prediction, so that the trained automatic driving model predicts better. In addition, the time sequence layer predicts, based on input data that includes the first predicted automatic driving strategy information, the timing transfer implicit representation of the future second moment, and the decision control layer makes subsequent decisions based on that representation, so that the trained automatic driving model fully accounts for the influence of the vehicle's own decisions on future prediction, improving the accuracy of future prediction.
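A minimal sketch of one training step of method 600 follows, assuming a supervised loss against the recorded real strategy; the model attributes, batch keys, and loss function are illustrative assumptions.

```python
def train_step(model, optimizer, batch, loss_fn):
    h0 = model.encoder(batch["sample_perception"])   # S620: initial implicit representation
    B1 = model.temporal(h0, batch["nav_info"])       # S630: timing transfer repr., future first moment
    a1 = model.decision(B1)                          # S640: first predicted strategy
    B2 = model.temporal(B1, batch["nav_info"], a1)   # S650: timing transfer repr., future second moment
    a2 = model.decision(B2)                          # S660: second predicted strategy
    # S670: compare both predicted strategies with the real strategy labels
    loss = loss_fn(a1, batch["real_strategy"][0]) + loss_fn(a2, batch["real_strategy"][1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```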
In an example, the sample input information may include perception information from one or more cameras of the sample vehicle, perception information from one or more lidars, and perception information from one or more millimeter-wave radars. It will be appreciated that the perception information of the sample vehicle's surroundings is not limited to the above; for example, it may include only the perception information of a plurality of cameras, without the perception information of lidars or millimeter-wave radars. The perception information obtained by a camera may be in the form of pictures or video, and the perception information obtained by a lidar may be in the form of a radar point cloud (e.g., a three-dimensional point cloud). In an example, these different forms of information (pictures, videos, point clouds, etc.) may be directly input into the multi-modal encoding layer without preprocessing. Furthermore, the perception information includes current perception information x_t for the surroundings of the sample vehicle during driving and historical perception information x_{t-Δt} corresponding to a plurality of historical moments, where there may be a time span of a preset duration between t-Δt and t.
In an example, the real automatic driving strategy information may include, for example, a trajectory of the sample vehicle or control signals for the sample vehicle (e.g., signals to control throttle, brake, steering amplitude, etc.).
According to some embodiments, the time-series layers may include a time-series coding layer and a time-series decoding layer connected to the time-series coding layer, the time-series coding layer being connected to the multi-mode coding layer, the time-series decoding layer being connected to the decision control layer.
Step S630 may include: inputting the initial implicit representation into a time sequence coding layer to acquire a time sequence transmission implicit representation of the current moment output by the time sequence coding layer; and inputting the time sequence transfer implicit representation of the current moment and the navigation information of the sample vehicle into a time sequence decoding layer to acquire the time sequence transfer implicit representation of the future first moment output by the time sequence decoding layer.
And, step S650 may include: the time sequence transmission implicit representation of the future first moment, the navigation information of the sample vehicle and the first prediction automatic driving strategy information are further input into a time sequence decoding layer to obtain the time sequence transmission implicit representation of the future second moment output by the time sequence decoding layer.
According to some embodiments, the time-series layer may be a time-series decoding layer, and the initial implicit representation includes a current initial implicit representation and a historical initial implicit representation corresponding to the current perceptual information and the historical perceptual information, respectively.
Step S630 may include: inputting the current initial implicit representation, the navigation information of the sample vehicle, the timing transfer implicit representation of the historical moment, and the predicted automatic driving strategy information of the historical moment into the time series decoding layer to obtain the future predicted implicit representation of the future first moment and the timing transfer implicit representation of the current moment output by the time series decoding layer, wherein the timing transfer implicit representation of the historical moment is based on the historical initial implicit representation, and the predicted automatic driving strategy information of the historical moment is based on the timing transfer implicit representation of the historical moment; and
inputting the future predicted implicit representation of the future first moment, the timing transfer implicit representation of the current moment, the navigation information of the sample vehicle, and the current predicted automatic driving strategy information corresponding to the current moment into the time series decoding layer to obtain the future predicted implicit representation of the future second moment and the timing transfer implicit representation of the future first moment output by the time series decoding layer, wherein the current predicted automatic driving strategy information corresponding to the current moment is based on the timing transfer implicit representation of the current moment.
Fig. 7 illustrates a flow chart of a method 700 of training an autopilot model in accordance with another embodiment of the present disclosure. As shown in fig. 7, the method 700 includes steps S710 to S780, wherein step S710 is similar to step S610 described above, and steps S730 to S770 are similar to steps S620 to S660 described above, respectively, and are not repeated here.
According to some embodiments, the sample input information may include an intervention identification capable of characterizing whether the real automatic driving strategy information is automatic driving strategy information with human intervention. The method 700 further includes:
step S720, obtaining evaluation feedback information for the sample input information,
and step S780 may include: adjusting parameters of the multi-mode coding layer, the time sequence layer, and the decision control layer based on the intervention identification, the evaluation feedback information, the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
During real-vehicle training, a safety operator can intervene at any critical moment and take over control of the autonomous vehicle; after the crisis passes, control is returned to the autonomous vehicle. The intervention identification characterizes whether the real automatic driving strategy information is automatic driving strategy information with human intervention. In other words, by introducing the intervention identification, the unacceptable training cost of collisions that might occur during real-vehicle training can be avoided, and reinforcement learning can gradually learn to avoid the adverse events that triggered intervention. Through this mechanism, reinforcement learning efficiency can be improved and the influence of inferior experience on the learning process can be reduced, thereby further improving the robustness of the trained model.
For example, an objective function L_1 as given in equation (8) can be employed to adjust the parameters of the multi-mode coding layer, the time sequence layer, and the decision control layer:
where R is an intermediate value calculated by reinforcement learning from feedback. When the intervention identification i_t is true (e.g., 1), it indicates that the autonomous vehicle is under manual control and is no longer controlled by control signals from the autopilot model; when the intervention identification i_t is false (e.g., 0), it indicates that the autonomous vehicle is controlled by control signals from the autopilot model and is not manually controlled.
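The extracted text does not reproduce equation (8) itself, so the following Python sketch is only a hedged illustration of the mechanism described above: the reinforcement feedback R is gated by the intervention identification i_t so that steps where a human had to take over are treated as adverse events. The exact form and weighting are assumptions.

```python
def intervention_weighted_loss(R, policy_log_prob, i_t, penalty=1.0):
    # i_t == 1: a human took over, so penalize the policy for reaching this state;
    # i_t == 0: use the ordinary reinforcement feedback R.
    reward = (1 - i_t) * R - i_t * penalty
    return -reward * policy_log_prob     # policy-gradient style surrogate loss
```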
Fig. 8 illustrates a flow chart of a method 800 of training an autopilot model in accordance with another embodiment of the present disclosure. According to some embodiments, the autopilot model may further include a future prediction layer, and as shown in fig. 8, method 800 may further include:
step S810, acquiring future real information corresponding to the sample input information;
step S820, inputting the timing transfer implicit representation of the future first moment into the future prediction layer to acquire the future prediction information output by the future prediction layer; and
step S830, adjusting parameters of the multi-mode coding layer, the time sequence layer, and the future prediction layer based on the future prediction information and the future real information.
In an example, the future real information may be manually annotated information. For example, for data (x_1, x_2, ..., x_t, ...) collected by an autonomous vehicle (including a manually driven vehicle equipped with autonomous driving sensors), the road surface elements and obstacles therein may be manually annotated to obtain future real information, such as bounding boxes in three-dimensional space, with the real classification, real current state, etc. of the corresponding obstacles annotated in the bounding boxes. For example, the real size, position, and vehicle type of an obstacle in a bounding box, the current state of the vehicle (e.g., whether lights such as turn signals and high beams are on), the position and length of lane lines, etc. may be annotated.
For example, the objective function L_2 in the following equation (9) may be employed to adjust parameters of the future prediction layer:
L_2 = Σ_t Σ_k log p(s'_{t+k} = s_{t+k} | x_1, ..., x_t)    equation (9)
Also, parameters of the multi-mode coding layer, the time sequence layer, the decision control layer, and the future prediction layer can be adjusted based on the objective functions L_1 and L_2 in equations (8) and (9); here, parameter adjustment based on the objective function L_2 can serve as an auxiliary training task.
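As one possible realization of equation (9), the sketch below models p(s'_{t+k} = s_{t+k} | x_1, ..., x_t) with a categorical distribution over discretized predictions, so that L_2 is a summed log-likelihood; the tensor shapes and the categorical assumption are illustrative, not prescribed by the patent.

```python
import torch.nn.functional as F


def l2_objective(pred_logits, future_truth):
    """pred_logits: (T, K, C) scores for s'_{t+k}; future_truth: (T, K) class ids s_{t+k}."""
    # cross_entropy returns -log p per element; negating the summed value yields sum_t sum_k log p
    return -F.cross_entropy(
        pred_logits.reshape(-1, pred_logits.shape[-1]),
        future_truth.reshape(-1),
        reduction="sum",
    )
```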
In model training, the derivation may use the true initial implicit representation of each moment (e.g., b_{t+k}), or autoregressive derivation may be performed using the predicted implicit representation of each moment (e.g., b'_{t+k}).
Fig. 9 shows a schematic diagram applying a training method of an autopilot model according to an embodiment of the present disclosure. As shown in fig. 9, the model 900 includes a multi-mode encoding layer 910, a time series decoding layer 920 (i.e., the time sequence layer is the time series decoding layer 920), and a decision control layer 930. In a scenario where the time sequence layer is the time series decoding layer 920, the true BEV-space implicit representations of each moment (e.g., b_{t-1}, b_t, b_{t+1}, b_{t+2}) may be used for derivation to perform model training.
According to some embodiments, when autoregressive derivation is performed, i.e., using the predicted implicit representations of each moment (e.g., b'_{t+1}, b'_{t+2} in fig. 4), the objective function L_3 in the following equation (10) can further be employed to adjust the parameters of the multi-mode coding layer:
L_3 = Σ_t Σ_k log p(b'_{t+k} = b_{t+k} | x_1, ..., x_t)    equation (10)
Also, parameters of the multi-mode coding layer, the time series decoding layer, the decision control layer, and the future prediction layer can be adjusted based on the objective functions L_1, L_2, and L_3 in equations (8), (9), and (10).
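A minimal sketch of joint adjustment based on L_1, L_2, and L_3; the patent does not specify how the three objectives are weighted, so the coefficients below are assumptions, with L_2 and L_3 acting as auxiliary tasks.

```python
def total_objective(l1, l2, l3, w1=1.0, w2=0.5, w3=0.5):
    # The combined value can be optimized to adjust the multi-modal encoding,
    # time series, decision control, and future prediction layers together.
    return w1 * l1 + w2 * l2 + w3 * l3
```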
According to another aspect of the present disclosure, an autopilot apparatus based on an autopilot model is provided.
Fig. 10 shows a block diagram of an autopilot 1000 based on an autopilot model in accordance with an embodiment of the present disclosure.
The autopilot model includes a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, as shown in fig. 10, the apparatus 1000 includes:
an input information acquisition unit 1010 configured to acquire first input information of the multi-modal encoding layer, wherein the first input information of the multi-modal encoding layer includes current perception information and history perception information for a surrounding environment of a target vehicle acquired with a sensor;
the multi-mode encoding unit 1020 is configured to input the first input information into the multi-mode encoding layer to obtain an initial implicit representation corresponding to the first input information output by the multi-mode encoding layer;
a time-series unit 1030 configured to input second input information based on the initial implicit representation and navigation information of the target vehicle into the time-series layer to acquire a time-series transfer implicit representation for a future first time of the surrounding environment of the target vehicle output by the time-series layer; and
a decision control unit 1040 configured to input a timing transfer implicit representation of a first time in the future to the decision control layer, to obtain first target automatic driving strategy information output by the decision control layer,
wherein the time series unit 1030 is further configured to further input the timing transfer implicit representation of the future first moment, the navigation information of the target vehicle, and the first target autopilot strategy information into the time sequence layer to obtain a timing transfer implicit representation of a future second moment output by the time sequence layer, wherein the future second moment is after the future first moment; and
Wherein the decision control unit 1040 is further configured to input the timing transfer implicit representation of the second time in the future further into the decision control layer to obtain the second target autopilot strategy information output by the decision control layer.
According to another aspect of the present disclosure, a training apparatus for an autopilot model is provided.
Fig. 11 shows a block diagram of a training apparatus 1100 of an autopilot model in accordance with an embodiment of the present disclosure.
The autopilot model includes a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, as shown in fig. 11, the apparatus 1100 includes:
a sample input information acquisition unit 1110 configured to acquire sample input information including current sample perception information and historical sample perception information for a sample vehicle surroundings and real automatic driving strategy information corresponding to the sample input information;
The multi-modal coding layer training unit 1120 is configured to input the sample perception information into the multi-modal coding layer to obtain an initial implicit representation output by the multi-modal coding layer;
a time-series layer training unit 1130 configured to input intermediate sample input information based on the initial implicit representation and the navigation information of the sample vehicle into the time-series layer to obtain a time-series transfer implicit representation for a future first moment of the sample vehicle surroundings output by the time-series layer;
the decision control layer training unit 1140 is configured to input the timing transfer implicit representation of the first time in the future to the decision control layer, so as to obtain the first predicted automatic driving strategy information output by the decision control layer.
Wherein the time series layer training unit 1130 is further configured to further input the time series layer with the time series transfer implicit representation of a future first time instant, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information to obtain a time series transfer implicit representation of a future second time instant output by the time series layer, wherein the future second time instant is after the future first time instant.
And the decision control layer training unit 1140 is further configured to further input the timing transfer implicit representation of the second time in the future into the decision control layer to obtain second predicted autopilot strategy information output by the decision control layer; and
The parameter adjustment unit 1150 is configured to adjust parameters of the multi-modal encoding layer, the time-series layer, and the decision control layer based on at least the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
It should be appreciated that the various modules or units of the apparatus 1000 shown in fig. 10 may correspond to the various steps of the method 500 described with reference to fig. 5; thus, the operations, features, and advantages described above with respect to the method 500 apply equally to the apparatus 1000 and the modules and units included therein. Likewise, the various modules or units of the apparatus 1100 shown in fig. 11 may correspond to the various steps of the method 600 described with reference to fig. 6, so the operations, features, and advantages described above with respect to the method 600 apply equally to the apparatus 1100 and the modules and units included therein. For brevity, certain operations, features, and advantages are not described in detail here.
It should also be appreciated that various techniques may be described herein in the general context of software/hardware elements or program modules. The various units described above with respect to fig. 10 and fig. 11 may be implemented in hardware or in hardware combined with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, these units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the units 1010 to 1040 and units 1110 to 1150 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip including one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an autopilot method or a training method of an autopilot model in accordance with embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method of automated driving or a method of training an automated driving model according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method of automatic driving or a method of training an automatic driving model according to embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided an autonomous vehicle including at least one of the autopilot apparatus 1000 according to an embodiment of the present disclosure, the training apparatus 1100 of the autopilot model, and the electronic device described above.
With reference to fig. 12, a block diagram of an electronic device 1200, which may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 includes a computing unit 1201 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the electronic device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206, an output unit 1207, a storage unit 1208, and a communication unit 1209. The input unit 1206 may be any type of device capable of inputting information to the electronic device 1200, the input unit 1206 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 1207 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 1208 may include, but is not limited to, magnetic disks, optical disks. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices over computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth devices, 802.11 devices, wiFi devices, wiMax devices, cellular communication devices, and/or the like.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the various methods and processes described above, such as methods (or processes) 500 through 800. For example, in some embodiments, the methods (or processes) 500-800 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When a computer program is loaded into RAM 1203 and executed by computing unit 1201, one or more steps of methods (or processes) 500 to 800 described above may be performed. Alternatively, in other embodiments, computing unit 1201 may be configured to perform methods (or processes) 500-800 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalents thereof. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. As technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (29)

1. An automatic driving model comprises a multi-mode coding layer, a time sequence layer connected with the multi-mode coding layer and a decision control layer connected with the time sequence layer,
wherein first input information of the multi-mode coding layer comprises current perception information and historical perception information for a surrounding environment of a target vehicle acquired using a sensor, the multi-mode coding layer being configured to acquire an initial implicit representation corresponding to the first input information;
the second input information of the time sequence layer is based on the initial implicit representation and navigation information of the target vehicle, and the time sequence layer is configured to acquire a time sequence transfer implicit representation of a future first moment of the surrounding environment of the target vehicle based on the second input information;
the decision control layer is configured to acquire first target automatic driving strategy information corresponding to a first future time based on the time sequence transmission implicit representation of the first future time; and is also provided with
Wherein the time series layer is further configured to obtain a timing transfer implicit representation of a second future time instant based on the timing transfer implicit representation of the first future time instant, the navigation information of the target vehicle, and the first target automatic driving strategy information, wherein the second future time instant is subsequent to the first future time instant, and the decision control layer is further configured to obtain second target automatic driving strategy information corresponding to the second future time instant based on the timing transfer implicit representation of the second future time instant.
2. The automatic driving model of claim 1, wherein the initial implicit representation comprises a current initial implicit representation and a historical initial implicit representation, the multi-modal encoding layer configured to obtain the current initial implicit representation corresponding to the current perception information and the historical initial implicit representation corresponding to the historical perception information.
3. The autopilot model of claim 1 wherein the time-series layers include a time-series encoding layer and a time-series decoding layer coupled to the time-series encoding layer,
wherein the time-series coding layer is connected to the multi-modal coding layer and is configured to obtain a timing transfer implicit representation of the current time instant based on the initial implicit representation; and
Wherein the time series decoding layer is connected to the decision control layer and is configured to obtain a time series transfer implicit representation of the future first time instant based on the time series transfer implicit representation of the current time instant and the navigation information of the target vehicle, and the time series decoding layer is further configured to obtain a time series transfer implicit representation of the future second time instant based on the time series transfer implicit representation of the future first time instant, the navigation information of the target vehicle, and the first target automatic driving strategy information.
4. The automatic driving model of claim 3, wherein the initial implicit representation comprises a current initial implicit representation and a historical initial implicit representation corresponding to the current and historical perceptual information, respectively, and the time-series encoding layer is configured to:
acquiring a timing transfer implicit representation of a historical moment based at least on the historical initial implicit representation; and
and acquiring the timing sequence transfer implicit representation of the current moment based on the timing sequence transfer implicit representation of the historical moment and the current initial implicit representation.
5. The autopilot model of claim 3 wherein the time sequence decoding layer is further configured to obtain a time sequence delivery implicit representation of a third future time instant based on the time sequence delivery implicit representation of the second future time instant, navigation information of the target vehicle, and the second target autopilot strategy information, the third future time instant being subsequent to the second future time instant.
6. The automatic driving model of claim 4, wherein the time-series encoding layer is configured to obtain the timing transfer implicit representation of the historical moment based on the historical initial implicit representation and an original random value of the timing transfer implicit representation.
7. The automatic driving model of claim 1, wherein the time-series layer is a time-series decoding layer and the initial implicit representation includes a current initial implicit representation and a historical initial implicit representation corresponding to the current perception information and the historical perception information, respectively, the time-series decoding layer being further configured to obtain a future predictive implicit representation of a future first time instant and a timing transfer implicit representation of a current time instant based on the current initial implicit representation, navigation information of a target vehicle, a timing transfer implicit representation of a historical time instant, and target automatic driving strategy information of the historical time instant, wherein the timing transfer implicit representation of the historical time instant is based on the historical initial implicit representation, the target automatic driving strategy information of the historical time instant is based on the timing transfer implicit representation of the historical time instant, and
the time sequence decoding layer is further configured to obtain a future prediction implicit representation of a second time in the future and a timing transfer implicit representation of the first time in the future based on the future prediction implicit representation of the first time in the future, the timing transfer implicit representation of the current time, navigation information of the target vehicle, and the target automatic driving strategy information corresponding to the current time, wherein the target automatic driving strategy information corresponding to the current time is based on the timing transfer implicit representation of the current time.
8. The automatic driving model of claim 7, wherein the time-series decoding layer is further configured to obtain a future predictive implicit representation of a future third moment and the time-series transfer implicit representation of the future second moment based on the future predictive implicit representation of the future second moment, the time-series transfer implicit representation of the future first moment, the navigation information of the target vehicle, and the first target automatic driving strategy information corresponding to the future first moment.
9. The automatic driving model of any one of claims 1-8, further comprising a perception detection layer configured to obtain target detection information for the surroundings of the target vehicle based on the input initial implicit representation, the target detection information comprising a plurality of road surface elements in the surroundings of the target vehicle, as well as types of obstacles therein and their current state information.
10. The automatic driving model of any one of claims 1-8, further comprising a future prediction layer configured to predict future prediction information for the surroundings of the target vehicle based on the input time-series transfer implicit representation of the future first moment or of the future second moment.
11. The automatic driving model of claim 10, wherein the future prediction information comprises at least one of:
future predictive perception information for the surroundings of the target vehicle, a future predictive implicit representation corresponding to the future predictive perception information, and future predictive detection information for the surroundings of the target vehicle,
wherein the future predictive detection information comprises types of a plurality of obstacles in the surroundings of the target vehicle and their future predicted state information.
12. The automatic driving model of any one of claims 1-8, wherein the decision control layer is configured to:
obtain, based on the time-series transfer implicit representation of the future first moment, a probability distribution for each of a plurality of first target automatic driving strategies; and
sample within each probability distribution and determine the first target automatic driving strategy with the largest probability as the first target automatic driving strategy information.
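For illustration only (not part of the claims), a minimal sketch of the selection step of claim 12; the linear head, the number of strategies, and the sample-versus-argmax switch are assumptions:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DecisionControlLayer(nn.Module):
    """Hypothetical sketch of claim 12: map the transfer implicit
    representation of the future first moment to a probability distribution
    over candidate strategies, then select one. Whether selection is by
    sampling or by taking the largest probability is left configurable."""

    def __init__(self, hidden_dim: int, num_strategies: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_strategies)

    def forward(self, h_future: torch.Tensor, sample: bool = False):
        probs = self.head(h_future).softmax(dim=-1)
        if sample:
            return Categorical(probs=probs).sample()  # stochastic selection
        return probs.argmax(dim=-1)  # strategy with the largest probability
```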
13. The automatic driving model of any one of claims 1-8, wherein the sensor comprises a camera and the perception information comprises a two-dimensional image acquired by the camera, and the multi-modal coding layer is further configured to:
obtain, based on first input information comprising the two-dimensional image and the intrinsic and extrinsic parameters of the camera, an implicit representation corresponding to the first input information.
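For illustration only (not part of the claims), one way the camera branch of claim 13 could fuse image features with camera parameters; the tiny backbone and additive fusion are assumptions chosen only to keep the sketch short:

```python
import torch
import torch.nn as nn

class CameraEncoder(nn.Module):
    """Hypothetical sketch of the camera branch of the multi-modal coding
    layer (claim 13): fuse image features with the camera's intrinsic and
    extrinsic parameters."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # 3x3 intrinsic matrix (9 values) + 4x4 extrinsic matrix (16 values).
        self.calib_proj = nn.Linear(9 + 16, feat_dim)

    def forward(self, image, intrinsics, extrinsics):
        calib = torch.cat([intrinsics.flatten(1), extrinsics.flatten(1)], -1)
        return self.backbone(image) + self.calib_proj(calib)
```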
14. The automatic driving model of any one of claims 1-8, wherein the first input information further comprises a lane-level map, and the navigation information comprises road-level navigation information and/or lane-level navigation information.
15. The automatic driving model of any one of claims 1-8, wherein the perception information comprises at least one of:
an image acquired by a camera, information acquired by a lidar, and information acquired by a millimeter-wave radar.
16. An automatic driving method implemented using an automatic driving model, the automatic driving model comprising a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, the method comprising:
acquiring first input information of the multi-modal coding layer, wherein the first input information comprises current perception information and historical perception information for the surroundings of a target vehicle acquired by a sensor;
inputting the first input information into the multi-modal coding layer to obtain an initial implicit representation, output by the multi-modal coding layer, corresponding to the first input information;
inputting second input information based on the initial implicit representation and navigation information of the target vehicle into the time-series layer to obtain a time-series transfer implicit representation of a future first moment for the surroundings of the target vehicle output by the time-series layer;
inputting the time-series transfer implicit representation of the future first moment into the decision control layer to obtain first target automatic driving strategy information output by the decision control layer;
further inputting the time-series transfer implicit representation of the future first moment, the navigation information of the target vehicle, and the first target automatic driving strategy information into the time-series layer to obtain a time-series transfer implicit representation of a future second moment output by the time-series layer, wherein the future second moment is subsequent to the future first moment; and
further inputting the time-series transfer implicit representation of the future second moment into the decision control layer to obtain second target automatic driving strategy information output by the decision control layer.
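For illustration only (not part of the claims), the autoregressive loop of claim 16 might be wired up as below; all layer interfaces are assumptions, and the point is only that each decided strategy is fed back to produce the next transfer representation:

```python
import torch

@torch.no_grad()
def autopilot_rollout(coding_layer, time_series_layer, decision_layer,
                      perception, nav, num_future_moments: int = 2):
    """Hypothetical sketch of the method of claim 16: encode perception,
    roll the time-series layer forward, and decode one strategy per future
    moment, feeding each decision back into the next step."""
    h = coding_layer(perception)        # initial implicit representation
    action, strategies = None, []
    for _ in range(num_future_moments):
        h = time_series_layer(h, nav, action)  # action is None at step one
        action = decision_layer(h)             # strategy for this future moment
        strategies.append(action)
    return strategies                          # first, second, ... strategies
```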
17. The method of claim 16, wherein the time-series layer comprises a time-series encoding layer and a time-series decoding layer connected to the time-series encoding layer, the time-series encoding layer is connected to the multi-modal coding layer, and the time-series decoding layer is connected to the decision control layer,
wherein inputting the second input information based on the initial implicit representation and the navigation information of the target vehicle into the time-series layer to obtain the time-series transfer implicit representation of the future first moment for the surroundings of the target vehicle output by the time-series layer comprises:
inputting the initial implicit representation into the time-series encoding layer to obtain a time-series transfer implicit representation of the current moment output by the time-series encoding layer; and
inputting the time-series transfer implicit representation of the current moment and the navigation information of the target vehicle into the time-series decoding layer to obtain the time-series transfer implicit representation of the future first moment output by the time-series decoding layer.
18. The method of claim 16, wherein the time-series layer is a time-series decoding layer and the initial implicit representation comprises a current initial implicit representation and a historical initial implicit representation corresponding to the current perception information and the historical perception information, respectively, and the method further comprises:
inputting third input information comprising the current initial implicit representation, the navigation information of the target vehicle, a time-series transfer implicit representation of a historical moment, and target automatic driving strategy information of the historical moment into the time-series decoding layer to obtain a predictive implicit representation of the future first moment and a time-series transfer implicit representation of the current moment output by the time-series decoding layer, wherein the time-series transfer implicit representation of the historical moment is based on the historical initial implicit representation, and the target automatic driving strategy information of the historical moment is based on the time-series transfer implicit representation of the historical moment,
and wherein further inputting the time-series transfer implicit representation of the future first moment, the navigation information of the target vehicle, and the first target automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation of the future second moment output by the time-series layer comprises:
inputting the predictive implicit representation of the future first moment, the time-series transfer implicit representation of the current moment, the navigation information of the target vehicle, and target automatic driving strategy information corresponding to the current moment into the time-series layer to obtain a predictive implicit representation of the future second moment and the time-series transfer implicit representation of the future first moment, wherein the target automatic driving strategy information corresponding to the current moment is based on the time-series transfer implicit representation of the current moment.
19. A training method for an automatic driving model, the automatic driving model comprising a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, the method comprising:
acquiring sample input information and real automatic driving strategy information corresponding to the sample input information, wherein the sample input information comprises current sample perception information and historical sample perception information for the surroundings of a sample vehicle;
inputting the sample perception information into the multi-modal coding layer to obtain an initial implicit representation output by the multi-modal coding layer;
inputting intermediate sample input information based on the initial implicit representation and navigation information of the sample vehicle into the time-series layer to obtain a time-series transfer implicit representation of a future first moment for the surroundings of the sample vehicle output by the time-series layer;
inputting the time-series transfer implicit representation of the future first moment into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer;
further inputting the time-series transfer implicit representation of the future first moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series layer to obtain a time-series transfer implicit representation of a future second moment output by the time-series layer, wherein the future second moment is subsequent to the future first moment;
further inputting the time-series transfer implicit representation of the future second moment into the decision control layer to obtain second predicted automatic driving strategy information output by the decision control layer; and
adjusting parameters of the multi-modal coding layer, the time-series layer, and the decision control layer based at least on the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
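For illustration only (not part of the claims), one update of the training method of claim 19 might look like the sketch below; the layer interfaces, the cross-entropy objective, and the two-moment horizon are assumptions, since the claim only requires adjusting parameters from the predicted and real strategy information:

```python
import torch.nn.functional as F

def training_step(coding_layer, time_series_layer, decision_layer,
                  optimizer, sample_perception, nav, real_strategies):
    """Hypothetical sketch of one parameter update for claim 19.
    real_strategies[0] / [1] are the ground-truth strategy labels for the
    future first and second moments."""
    h = time_series_layer(coding_layer(sample_perception), nav, None)
    logits_1 = decision_layer(h)              # first predicted strategy
    h = time_series_layer(h, nav, logits_1.argmax(-1))
    logits_2 = decision_layer(h)              # second predicted strategy
    loss = (F.cross_entropy(logits_1, real_strategies[0]) +
            F.cross_entropy(logits_2, real_strategies[1]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```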
20. The method of claim 19, wherein the time-series layer comprises a time-series encoding layer and a time-series decoding layer connected to the time-series encoding layer, the time-series encoding layer is connected to the multi-modal coding layer, and the time-series decoding layer is connected to the decision control layer,
wherein inputting the intermediate sample input information based on the initial implicit representation and the navigation information of the sample vehicle into the time-series layer to obtain the time-series transfer implicit representation of the future first moment output by the time-series layer comprises:
inputting the initial implicit representation into the time-series encoding layer to obtain a time-series transfer implicit representation of the current moment output by the time-series encoding layer; and
inputting the time-series transfer implicit representation of the current moment and the navigation information of the sample vehicle into the time-series decoding layer to obtain the time-series transfer implicit representation of the future first moment output by the time-series decoding layer,
and wherein further inputting the time-series transfer implicit representation of the future first moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series layer to obtain the time-series transfer implicit representation of the future second moment output by the time-series layer comprises:
further inputting the time-series transfer implicit representation of the future first moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series decoding layer to obtain the time-series transfer implicit representation of the future second moment output by the time-series decoding layer.
21. The method of claim 19, wherein the time-series layer is a time-series decoding layer and the initial implicit representation comprises a current initial implicit representation and a historical initial implicit representation corresponding to the current sample perception information and the historical sample perception information, respectively,
wherein inputting the intermediate sample input information into the time-series layer to obtain the time-series transfer implicit representation of the future first moment output by the time-series layer comprises:
inputting the current initial implicit representation, the navigation information of the sample vehicle, a time-series transfer implicit representation of a historical moment, and predicted automatic driving strategy information of the historical moment into the time-series decoding layer to obtain a future predictive implicit representation of the future first moment and a time-series transfer implicit representation of the current moment output by the time-series decoding layer, wherein the time-series transfer implicit representation of the historical moment is based on the historical initial implicit representation, and the predicted automatic driving strategy information of the historical moment is based on the time-series transfer implicit representation of the historical moment; and
inputting the future predictive implicit representation of the future first moment, the time-series transfer implicit representation of the current moment, the navigation information of the sample vehicle, and current predicted automatic driving strategy information corresponding to the current moment into the time-series decoding layer to obtain a future predictive implicit representation of the future second moment and the time-series transfer implicit representation of the future first moment output by the time-series decoding layer, wherein the current predicted automatic driving strategy information corresponding to the current moment is based on the time-series transfer implicit representation of the current moment.
22. The method of any one of claims 19-21, wherein the sample input information comprises an intervention identifier characterizing whether the real automatic driving strategy information is automatic driving strategy information involving human intervention, the method further comprising:
acquiring evaluation feedback information for the sample input information,
and wherein adjusting the parameters of the multi-modal coding layer, the time-series layer, and the decision control layer based at least on the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information comprises:
adjusting the parameters of the multi-modal coding layer, the time-series layer, and the decision control layer based on the intervention identifier, the evaluation feedback information, the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
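For illustration only (not part of the claims), the intervention identifier and evaluation feedback of claim 22 could enter the objective as per-sample weights; this specific weighting scheme is an assumption, since the claim only requires that both signals influence the parameter adjustment:

```python
import torch

def weighted_strategy_loss(per_sample_loss: torch.Tensor,
                           intervened: torch.Tensor,
                           feedback: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch for claim 22: up-weight samples where a human
    intervened and scale each sample's strategy loss by its evaluation
    feedback score before averaging."""
    weights = (1.0 + intervened.float()) * feedback
    return (weights * per_sample_loss).mean()
```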
23. The method of any one of claims 19-21, wherein the automatic driving model further comprises a future prediction layer, the method further comprising:
acquiring future real information corresponding to the sample input information;
inputting the time-series transfer implicit representation of the future first moment into the future prediction layer to obtain future prediction information output by the future prediction layer; and
adjusting parameters of the multi-modal coding layer, the time-series layer, and the future prediction layer based on the future prediction information and the future real information.
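For illustration only (not part of the claims), the auxiliary objective of claim 23 might be sketched as follows; the MSE loss and the layer interface are assumptions:

```python
import torch.nn.functional as F

def future_prediction_loss(future_prediction_layer, h_future_first,
                           future_real):
    """Hypothetical sketch of claim 23: decode the transfer implicit
    representation of the future first moment into future prediction
    information and compare it against the recorded future real
    information."""
    predicted = future_prediction_layer(h_future_first)
    return F.mse_loss(predicted, future_real)
```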
24. An automatic driving device based on an automatic driving model, the automatic driving model comprising a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, the device comprising:
an input information acquisition unit configured to acquire first input information of the multi-modal coding layer, wherein the first input information comprises current perception information and historical perception information for the surroundings of a target vehicle acquired by a sensor;
a multi-modal coding unit configured to input the first input information into the multi-modal coding layer to obtain an initial implicit representation, output by the multi-modal coding layer, corresponding to the first input information;
a time-series unit configured to input second input information based on the initial implicit representation and navigation information of the target vehicle into the time-series layer to obtain a time-series transfer implicit representation of a future first moment for the surroundings of the target vehicle output by the time-series layer; and
a decision control unit configured to input the time-series transfer implicit representation of the future first moment into the decision control layer to obtain first target automatic driving strategy information output by the decision control layer,
wherein the time-series unit is further configured to further input the time-series transfer implicit representation of the future first moment, the navigation information of the target vehicle, and the first target automatic driving strategy information into the time-series layer to obtain a time-series transfer implicit representation of a future second moment output by the time-series layer, wherein the future second moment is subsequent to the future first moment,
and wherein the decision control unit is further configured to further input the time-series transfer implicit representation of the future second moment into the decision control layer to obtain second target automatic driving strategy information output by the decision control layer.
25. A training device for an automatic driving model, the automatic driving model comprising a multi-modal coding layer, a time-series layer connected to the multi-modal coding layer, and a decision control layer connected to the time-series layer, the device comprising:
a sample input information acquisition unit configured to acquire sample input information comprising current sample perception information and historical sample perception information for the surroundings of a sample vehicle, and real automatic driving strategy information corresponding to the sample input information;
a multi-modal coding layer training unit configured to input the sample perception information into the multi-modal coding layer to obtain an initial implicit representation output by the multi-modal coding layer;
a time-series layer training unit configured to input intermediate sample input information based on the initial implicit representation and navigation information of the sample vehicle into the time-series layer to obtain a time-series transfer implicit representation of a future first moment for the surroundings of the sample vehicle output by the time-series layer;
a decision control layer training unit configured to input the time-series transfer implicit representation of the future first moment into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer,
wherein the time-series layer training unit is further configured to further input the time-series transfer implicit representation of the future first moment, the navigation information of the sample vehicle, and the first predicted automatic driving strategy information into the time-series layer to obtain a time-series transfer implicit representation of a future second moment output by the time-series layer, wherein the future second moment is subsequent to the future first moment,
and wherein the decision control layer training unit is further configured to input the time-series transfer implicit representation of the future second moment into the decision control layer to obtain second predicted automatic driving strategy information output by the decision control layer; and
a parameter adjustment unit configured to adjust parameters of the multi-modal coding layer, the time-series layer, and the decision control layer based at least on the first predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the real automatic driving strategy information.
26. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 16-23.
27. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 16-23.
28. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 16-23.
29. An autonomous vehicle comprising one of:
the automatic driving device of claim 24, the training device for an automatic driving model of claim 25, and the electronic device of claim 26.
CN202310745966.7A 2023-06-21 2023-06-21 Automatic driving model for simultaneous decision and prediction of time sequence autoregressive and training method thereof Active CN116859724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310745966.7A CN116859724B (en) 2023-06-21 2023-06-21 Automatic driving model for simultaneous decision and prediction of time sequence autoregressive and training method thereof

Publications (2)

Publication Number Publication Date
CN116859724A CN116859724A (en) 2023-10-10
CN116859724B true CN116859724B (en) 2024-03-15

Family

ID=88224304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310745966.7A Active CN116859724B (en) 2023-06-21 2023-06-21 Automatic driving model for simultaneous decision and prediction of time sequence autoregressive and training method thereof

Country Status (1)

Country Link
CN (1) CN116859724B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190111631A (en) * 2018-03-23 2019-10-02 단국대학교 산학협력단 System and Method for Processing Multi type Sensor Signal Based on Multi modal Deep Learning
CN114194211A (en) * 2021-11-30 2022-03-18 浪潮(北京)电子信息产业有限公司 Automatic driving method and device, electronic equipment and storage medium
CN116150620A (en) * 2023-02-17 2023-05-23 平安科技(深圳)有限公司 Training method, device, computer equipment and medium for multi-modal training model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11527073B2 (en) * 2019-11-15 2022-12-13 Honda Motor Co., Ltd. System and method for providing an interpretable and unified representation for trajectory prediction

Similar Documents

Publication Publication Date Title
JP7222868B2 (en) Real-time prediction of object behavior
CN111252061A (en) Real-time decision making for autonomous vehicles
JP2021515178A (en) LIDAR positioning for time smoothing using RNN and LSTM in self-driving vehicles
CN114179832B (en) Lane changing method for automatic driving vehicle
CN114758502B (en) Dual-vehicle combined track prediction method and device, electronic equipment and automatic driving vehicle
CN113887400B (en) Obstacle detection method, model training method and device and automatic driving vehicle
CN115273002A (en) Image processing method, device, storage medium and computer program product
CN115019060A (en) Target recognition method, and training method and device of target recognition model
CN116678424A (en) High-precision vehicle positioning, vectorization map construction and positioning model training method
WO2024008086A1 (en) Trajectory prediction method as well as apparatus therefor, medium, program product, and electronic device
CN116776151A (en) Automatic driving model capable of performing autonomous interaction with outside personnel and training method
CN117035032A (en) Method for model training by fusing text data and automatic driving data and vehicle
CN116880462A (en) Automatic driving model, training method, automatic driving method and vehicle
CN115082690B (en) Target recognition method, target recognition model training method and device
CN116859724B (en) Automatic driving model for simultaneous decision and prediction of time sequence autoregressive and training method thereof
CN114394111B (en) Lane changing method for automatic driving vehicle
CN115675528A (en) Automatic driving method and vehicle based on similar scene mining
CN115556769A (en) Obstacle state quantity determination method and device, electronic device and medium
CN115019278B (en) Lane line fitting method and device, electronic equipment and medium
CN116991157A (en) Automatic driving model with human expert driving capability, training method and vehicle
CN116881707A (en) Automatic driving model, training method, training device and vehicle
CN116872962A (en) Automatic driving model containing manual intervention prediction, training method, training equipment and vehicle
CN117010265A (en) Automatic driving model capable of carrying out natural language interaction and training method thereof
CN115583243B (en) Method for determining lane line information, vehicle control method, device and equipment
CN116560377A (en) Automatic driving model for predicting position track and training method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant