CN118171105A

CN118171105A - Automatic driving model training method, track determining method and automatic driving vehicle

Info

Publication number: CN118171105A
Application number: CN202410362267.9A
Authority: CN
Inventors: 叶晓青; 黄际洲
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2024-03-27
Filing date: 2024-03-27
Publication date: 2024-06-11

Abstract

The disclosure provides an automatic driving model training method, a track determining method and an automatic driving vehicle, relates to the technical field of artificial intelligence, in particular to the field of deep learning and computer vision, and can be applied to scenes such as automatic driving, intelligent traffic and the like. The specific implementation scheme is as follows: inputting the first sensing data into an automatic driving model to obtain a plurality of first candidate tracks; the first sensing data represents environmental information of the environment where the vehicle is located; determining evaluation values for the plurality of first candidate tracks respectively according to the first sensing data and the plurality of first candidate tracks; determining, for each of the plurality of first candidate trajectories, a first sub-loss value corresponding to the first candidate trajectory based on a difference between the first candidate trajectory and the reference trajectory, and an evaluation value for the first candidate trajectory; determining a first loss value according to the first sub-loss value of each of the plurality of first candidate tracks; and training the automatic driving model according to the first loss value.

Description

Automatic driving model training method, track determining method and automatic driving vehicle

Technical Field

The disclosure relates to the field of artificial intelligence technology, in particular to the field of deep learning and computer vision, and can be applied to scenes such as automatic driving and intelligent traffic. More particularly, the present disclosure provides a training method of an automatic driving model, a method, an apparatus, an electronic device, a storage medium, and a computer program product for determining a vehicle driving trajectory.

Background

In the driving process of the automatic driving vehicle, a plurality of candidate tracks need to be planned based on an automatic driving model, and then a target driving track is determined based on the plurality of candidate tracks. However, in practical application, the faced practical situation is relatively complex, and sometimes the target running track is relatively low in rationality.

Disclosure of Invention

The present disclosure provides a training method of an automatic driving model, a method, an apparatus, an electronic device, a storage medium and a computer program product for determining a vehicle driving track.

According to an aspect of the present disclosure, there is provided a training method of an automatic driving model, including: inputting the first sensing data into an automatic driving model to obtain a plurality of first candidate tracks; the first sensing data represents environmental information of the environment where the vehicle is located; determining evaluation values for the plurality of first candidate tracks respectively according to the first sensing data and the plurality of first candidate tracks; determining, for each of the plurality of first candidate trajectories, a first sub-loss value corresponding to the first candidate trajectory based on a difference between the first candidate trajectory and the reference trajectory, and an evaluation value for the first candidate trajectory; determining a first loss value according to the first sub-loss value of each of the plurality of first candidate tracks; and training the automatic driving model according to the first loss value.

According to another aspect of the present disclosure, there is provided a method of determining a vehicle travel track, including: inputting third sensing data into an automatic driving model to obtain a plurality of third candidate tracks, wherein the third sensing data represents environmental information of the environment where the vehicle is located; determining evaluation values of each of the plurality of third candidate tracks according to the third sensing data and the plurality of third candidate tracks; and determining a target running track from the plurality of third candidate tracks according to the evaluation values of the plurality of third candidate tracks; wherein the automatic driving model is trained by the training method provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a training apparatus of an automatic driving model, including: a first input module, a first evaluation value determining module, a first sub-loss determining module, a first loss determining module and a training module. The first input module is used for inputting first sensing data into the automatic driving model to obtain a plurality of first candidate tracks; the first sensing data represents environmental information of the environment where the vehicle is located; the first evaluation value determining module is used for determining evaluation values respectively for the plurality of first candidate tracks according to the first sensing data and the plurality of first candidate tracks; the first sub-loss determining module is used for determining, for each first candidate track in the plurality of first candidate tracks, a first sub-loss value corresponding to the first candidate track according to the difference between the first candidate track and the reference track and the evaluation value for the first candidate track; the first loss determining module is used for determining a first loss value according to the first sub-loss values of each of the plurality of first candidate tracks; the training module is used for training the automatic driving model according to the first loss value.

According to another aspect of the present disclosure, there is provided an apparatus for determining a travel track of a vehicle, including: a second input module, a second evaluation value determining module and a target track determining module. The second input module is used for inputting third sensing data into an automatic driving model to obtain a plurality of third candidate tracks; the third sensing data represents environmental information of the environment where the vehicle is located; the second evaluation value determining module is used for determining evaluation values of each of the plurality of third candidate tracks according to the third sensing data and the plurality of third candidate tracks; the target track determining module is used for determining a target running track from the plurality of third candidate tracks according to the evaluation values of the plurality of third candidate tracks; wherein the automatic driving model is trained by the training apparatus provided by the present disclosure.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training the autopilot model provided by the present disclosure.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining a vehicle travel path provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the present disclosure.

According to another aspect of the present disclosure, there is provided an autonomous vehicle including the above-described electronic device.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic illustration of an application scenario of a training method of an autopilot model, a method and apparatus for determining a vehicle travel trajectory, in accordance with an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart of a training method of an autopilot model in accordance with an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of determining weights according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart of a method of training an autopilot model in accordance with another embodiment of the present disclosure;

FIG. 5 is a schematic diagram of sorting trajectories according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of determining track pairs according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a training method of an autopilot model in accordance with an embodiment of the present disclosure;

FIG. 8A is a schematic flow chart diagram of a method of determining a vehicle travel track according to an embodiment of the disclosure;

FIG. 8B is a schematic diagram of a method of determining a vehicle travel track according to an embodiment of the disclosure;

FIG. 9 is a schematic block diagram of a training apparatus of an autopilot model in accordance with an embodiment of the present disclosure;

FIG. 10 is a schematic block diagram of an apparatus for determining a vehicle travel track according to an embodiment of the present disclosure; and

FIG. 11 is a block diagram of an electronic device for implementing a training method of an autopilot model and/or a method of determining a vehicle travel trajectory in accordance with embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In some embodiments, multiple candidate trajectories may be planned first, and a rule-based optimization algorithm may then be used to solve for the target travel track.

However, rule-based optimization algorithms tend to fall into local optima, resulting in a relatively conservative determined driving scheme, which is not capable of handling emergency situations.

For example, consider the following special case: the vehicle is in a right-turn lane bounded by a solid line near an intersection, and the intersection is blocked by a traffic accident. Faced with this special case, a human driver would apply an experience-based strategy and choose to bypass the blockage temporarily, for example by making the right turn from the adjacent straight-through lane. However, because the optimization algorithm is computed from rules and does not adopt a lane change across a solid line, the result determined by the optimization algorithm is to stop and wait for a takeover.

An embodiment of the disclosure provides a training method of an automatic driving model. In the training process, if the evaluation value were not considered, an original loss value for a first candidate track could be determined and the automatic driving model could be trained based on that original loss value. In this embodiment, the first candidate track output by the automatic driving model is additionally evaluated, and the evaluation value may characterize how reasonable it is for the vehicle to travel along the candidate track in the currently sensed scene. The original loss value may then be modulated into a first sub-loss value based on the evaluation value, and the automatic driving model may be trained based on the first sub-loss values of the plurality of first candidate tracks.

The training method can be used for relieving the problem that the automatic driving model is too conservative, so that the automatic driving model can output multimode candidate tracks in a model reasoning stage, wherein the multimode candidate tracks refer to tracks with various driving schemes, for example, the automatic driving model can output detour tracks and following tracks. In the model reasoning stage, the target running track is determined based on a plurality of candidate tracks output by the automatic driving model, so that the diversity of the candidate tracks can improve the rationality of the target running track.

The disclosed embodiments also provide a method of determining a vehicle travel track that replaces a complex optimization algorithm by evaluating candidate tracks, which may determine an evaluation value of the candidate track, and select a target travel track based on the evaluation value. In some embodiments, the assessment model used to determine the assessment value may introduce human feedback during the training phase, further improving the rationality of the target travel trajectory.
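As an illustration of selecting a target travel track from evaluation values, the following is a minimal sketch. It assumes that the candidate with the highest evaluation value is chosen; the disclosure only states that the selection is based on the evaluation values, so the arg-max rule, the function name `select_target_trajectory`, and the representation of a track as a list of (x, y) points are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # hypothetical (x, y) position
Trajectory = Sequence[Point]         # hypothetical track representation


def select_target_trajectory(
    candidates: List[Trajectory], scores: List[float]
) -> Tuple[int, Trajectory]:
    """Return the index and track with the highest evaluation value.

    Assumption: a higher evaluation value means a more reasonable track,
    and the single best-scoring candidate is taken as the target track.
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one evaluation value per candidate track")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, candidates[best_index]


# Three hypothetical candidate tracks with hypothetical evaluation values.
candidates = [
    [(0.0, 0.0), (1.0, 0.0)],   # e.g. a following track
    [(0.0, 0.0), (1.0, 1.0)],   # e.g. a detour track
    [(0.0, 0.0), (0.0, 1.0)],
]
scores = [0.2, 0.9, 0.5]
best_index, target = select_target_trajectory(candidates, scores)
```

In this sketch the evaluation values are supplied by the caller; how they are produced (for example by an evaluation model) is outside its scope.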

The technical solutions provided by the present disclosure will be described in detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic view of an application scenario of a training method of an automatic driving model, a method of determining a driving trajectory of a vehicle, and an apparatus according to an embodiment of the present disclosure.

It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.

As shown in fig. 1, a system architecture 100 according to this embodiment may include sensors 101, 102, 103, a vehicle 110, a network 120, and a server 130.

The sensors 101, 102, 103 may interact with the server 130 over the network 120 to receive or send messages, etc. The sensors 101, 102, 103 may be functional elements integrated on the vehicle 110, such as infrared sensors, ultrasonic sensors, millimeter wave radars, image acquisition devices, lidars, inertial measurement units, etc. The sensors 101, 102, 103 may be used to collect data of the environment in which the vehicle 110 is located, such as the location and status of objects such as pedestrians, vehicles, obstacles, etc. surrounding the vehicle, and road data surrounding the vehicle.

Vehicle 110 may be an autonomous vehicle or a manually driven vehicle having an autonomous mode.

The network 120 is used as a medium to provide a communication link between the sensors 101, 102, 103 and the server 130, and may also be used as a medium to provide a communication link between the vehicle 110 and the server 130. Network 120 may include various connection types, such as wired and/or wireless communication links, and the like.

The server 130 may be disposed at a remote end capable of establishing communication with the vehicle-mounted terminal, and may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server.

The server 130 may be a server providing various services. On the server 130, for example, a map application, a data processing application, a model training application, or the like may be installed. Taking the server 130 running the model training application as an example: the server 130 may receive sensed data, such as point cloud data and image data, transmitted by the sensors 101, 102, 103 through the network 120. The sensed data may be stored in a database and used to train an autopilot model, which may be implemented based on a Transformer model or another deep learning model.

After the training task is completed, the server 130 may send the trained autopilot model to the vehicle 110, so that the vehicle 110 processes the data collected by the sensors 101, 102, 103 and obtains a target travel track based on the autopilot model, and the vehicle then travels along the target travel track. Alternatively, the server 130 may receive the sensed data of the sensors 101, 102, 103, process the sensed data using the trained autopilot model to obtain a target travel track, and then output the target travel track to the vehicle 110.

It should be noted that, the training method of the automatic driving model provided in the embodiments of the disclosure may be generally executed by the server 130. Accordingly, the training device for the autopilot model provided in the embodiments of the present disclosure may also be disposed in the server 130. The method of determining a vehicle travel track provided by embodiments of the present disclosure may be generally performed by the vehicle 110 or the server 130. Accordingly, the device for determining the driving track of the vehicle provided in the embodiment of the present disclosure may also be disposed in the vehicle 110 or the server 130.

It will be appreciated that the number of sensors, vehicles, networks, and servers in fig. 1 are merely illustrative. There may be any number of sensors, vehicles, networks, and servers, as desired for implementation.

Fig. 2 is a schematic flow chart of a training method of an autopilot model in accordance with an embodiment of the present disclosure.

As shown in fig. 2, the training method 200 of the automatic driving model may include operations S210 to S250.

In operation S210, first sensing data is input into an automatic driving model, resulting in a plurality of first candidate trajectories.

For example, the first sensing data may represent environmental information, the first sensing data may include at least one of point cloud data and image data, and the first sensing data may be obtained by data acquisition of an environment where the vehicle is located by using a sensor such as a laser radar, an image acquisition device, or the like. The first sensed data may also be representative of vehicle motion information, e.g., the first sensed data includes information related to the vehicle such as speed, acceleration, etc.

For example, the first sensed data is input into an autopilot model to be trained, which outputs a plurality of first candidate trajectories. Each first candidate track may include a plurality of location points, each point corresponding to movement information of the vehicle at the point, and the movement information may include at least one of speed, acceleration, direction, and the like.
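For illustration only, a candidate track of this kind could be represented with the data structure sketched below. The class names `TrajectoryPoint` and `CandidateTrajectory`, the field names, and the choice of 2D coordinates are assumptions for this example; the disclosure only states that each location point may carry movement information such as speed, acceleration, and direction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrajectoryPoint:
    """One location point of a candidate track (hypothetical layout)."""
    x: float
    y: float
    # Movement information is optional because the text says the point may
    # include "at least one of" speed, acceleration, direction, and the like.
    speed: Optional[float] = None
    acceleration: Optional[float] = None
    heading: Optional[float] = None  # direction, here assumed to be in radians


@dataclass
class CandidateTrajectory:
    """An ordered sequence of location points output by the model."""
    points: List[TrajectoryPoint] = field(default_factory=list)

    def positions(self) -> List[Tuple[float, float]]:
        """Return only the (x, y) positions, in order."""
        return [(p.x, p.y) for p in self.points]


traj = CandidateTrajectory(
    points=[
        TrajectoryPoint(0.0, 0.0, speed=5.0),
        TrajectoryPoint(1.0, 0.5, speed=5.5, heading=0.46),
    ]
)
```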

In operation S220, evaluation values for the plurality of first candidate tracks are determined according to the first sensing data and the plurality of first candidate tracks, respectively.

For example, the evaluation value may be determined by some predetermined evaluation algorithm, and the manner of determining the evaluation value is not limited in this embodiment.

For example, the evaluation value may characterize how reasonable it is for the vehicle to travel along the candidate trajectory in the case where the first sensed data is acquired (i.e., when the vehicle is in the currently sensed environment). For example, the higher the evaluation value, the more reasonable the candidate track; the lower the evaluation value, the less reasonable the candidate track.

In operation S230, for each of the plurality of first candidate trajectories, a first sub-loss value corresponding to the first candidate trajectory is determined according to a difference between the first candidate trajectory and the reference trajectory, and an evaluation value for the first candidate trajectory.

For example, the reference track serves as a label. The track actually driven by a driver may be used as the reference track, or the reference track may be preconfigured.

For example, the loss value between the first candidate track and the reference track may be calculated by a predetermined loss function, which may include cosine similarity, cross entropy loss function, and the like, which is not limited in this embodiment. And then adjusting the loss value by using the evaluation value, and taking the adjusted loss value as a first sub-loss value, wherein the mode of adjusting the loss value can comprise the following steps: if the evaluation value is greater than a certain threshold value, the loss value is reduced; if the evaluation value is smaller than a threshold value, the loss value is increased. It will be appreciated that there is one first sub-loss value for each first candidate track.
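The adjustment described above can be sketched as follows, under stated assumptions. The loss is taken here as one minus the cosine similarity of the flattened track coordinates, which is only one of the loss functions the text mentions. The threshold of 0.5 and the scaling factor of 0.5 are hypothetical values, as are the function names `cosine_loss` and `adjust_loss`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) position


def cosine_loss(candidate: Sequence[Point], reference: Sequence[Point]) -> float:
    """Loss = 1 - cosine similarity of the flattened coordinate vectors."""
    a = [c for p in candidate for c in p]
    b = [c for p in reference for c in p]
    if len(a) != len(b):
        raise ValueError("tracks must have the same number of points")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero track")
    return 1.0 - dot / norm


def adjust_loss(loss: float, score: float,
                threshold: float = 0.5, factor: float = 0.5) -> float:
    """Reduce the loss for a high evaluation value, increase it for a low one.

    `threshold` and `factor` are hypothetical; `factor` is assumed to lie
    strictly between 0 and 1.
    """
    if score > threshold:
        return loss * factor   # evaluation value above threshold: smaller loss
    if score < threshold:
        return loss / factor   # evaluation value below threshold: larger loss
    return loss


reference = [(1.0, 0.0), (2.0, 0.0)]
candidate = [(0.0, 1.0), (0.0, 2.0)]
first_sub_loss = adjust_loss(cosine_loss(candidate, reference), score=0.9)
```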

In operation S240, a first loss value is determined according to the first sub-loss value of each of the plurality of first candidate tracks.

Illustratively, the number of first candidate tracks is N, where N is an integer greater than or equal to 2, such that the N first candidate tracks correspond to N first sub-loss values. For example, the smallest first sub-loss value among the N first sub-loss values may be taken as the first loss value. For another example, a weighted average of the N first sub-loss values may be calculated, and the weighted average may be used as the first loss value. For another example, a weighted average of the M smallest first sub-loss values among the N first sub-loss values may be calculated and used as the first loss value, where 2 ≤ M ≤ N and M is an integer.
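The three aggregation options above can be sketched as follows. The function names are hypothetical, and uniform averaging weights are assumed when none are supplied, since the text does not specify how the averaging weights are chosen.

```python
from typing import Optional, Sequence


def weighted_average(values: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average; uniform weights are assumed if none are given."""
    if not values:
        raise ValueError("at least one value is required")
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("need one weight per value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def first_loss_min(sub_losses: Sequence[float]) -> float:
    """Option 1: the smallest first sub-loss value."""
    return min(sub_losses)


def first_loss_weighted(sub_losses: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Option 2: weighted average of all N first sub-loss values."""
    return weighted_average(sub_losses, weights)


def first_loss_smallest_m(sub_losses: Sequence[float], m: int,
                          weights: Optional[Sequence[float]] = None) -> float:
    """Option 3: weighted average of the M smallest values, 2 <= M <= N."""
    if not 2 <= m <= len(sub_losses):
        raise ValueError("M must satisfy 2 <= M <= N")
    return weighted_average(sorted(sub_losses)[:m], weights)


sub_losses = [0.9, 0.3, 0.6, 1.2]  # hypothetical first sub-loss values, N = 4
```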

In operation S250, an automatic driving model is trained according to the first loss value.

For example, training of the autopilot model is accomplished by adjusting network weights in the autopilot model by a back propagation algorithm or the like.

The present embodiment evaluates the first candidate trajectories output by the automatic driving model, modulates the loss value into a first sub-loss value based on the evaluation value, and then trains the automatic driving model based on the first sub-loss values. Because trajectory rationality influences the first sub-loss, the problem of the automatic driving model being too conservative can be alleviated, and the trained automatic driving model can output multimode candidate tracks in the model reasoning stage. This provides more choices, and more reasonable choices, for the target driving track, and thereby improves the rationality of the target driving track.

According to another embodiment of the present disclosure, the above method of determining the first sub-loss value corresponding to the first candidate track according to the difference between the first candidate track and the reference track and the evaluation value for the first candidate track may include the following operations: an original loss value is determined based on a difference between the first candidate trajectory and the reference trajectory. Then, a weight is determined based on the original loss value and the evaluation value for the first candidate track, and then a first sub-loss value corresponding to the first candidate track is determined based on the original loss value and the weight.

For example, the original loss value between the first candidate track and the reference track may be calculated by a cosine similarity, a cross entropy loss function, or other predetermined loss function. The weight may then be determined according to the size of the evaluation value, for example, the correspondence between the evaluation value range and the weight value may be preconfigured, for example, when the preconfigured evaluation value is in the range of 0.7 to 0.9, the corresponding weight is 0.2. Weights may then be determined based on the correspondence. The product of the original loss value and the weight may then be taken as the first sub-loss value.
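A minimal sketch of such a preconfigured correspondence is given below. Only the range 0.7 to 0.9 with weight 0.2 comes from the example above; treating both ends of the range as inclusive, falling back to a weight of 1 outside any configured range, and the names `WEIGHT_TABLE`, `lookup_weight`, and `first_sub_loss` are assumptions for this illustration.

```python
from typing import List, Tuple

# (low, high, weight) rows. The single row below is the example from the text;
# a real configuration would presumably contain more ranges.
WEIGHT_TABLE: List[Tuple[float, float, float]] = [
    (0.7, 0.9, 0.2),
]


def lookup_weight(score: float,
                  table: List[Tuple[float, float, float]] = WEIGHT_TABLE,
                  default: float = 1.0) -> float:
    """Return the weight configured for the range containing `score`.

    Assumption: ranges are inclusive at both ends, and a score outside every
    configured range leaves the loss unmodulated (weight 1).
    """
    for low, high, weight in table:
        if low <= score <= high:
            return weight
    return default


def first_sub_loss(original_loss: float, score: float) -> float:
    """First sub-loss value = original loss value multiplied by the weight."""
    return original_loss * lookup_weight(score)
```

For instance, an original loss value of 2.0 with an evaluation value of 0.8 falls in the configured range and is scaled by 0.2.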

According to the method, the original loss value is modulated into the first sub-loss through the evaluation value, so that the evaluation value can directly influence the first loss value, the training process of the automatic driving model is influenced, and the automatic driving model outputs multimode tracks.

Fig. 3 is a schematic diagram of determining weights according to an embodiment of the present disclosure.

Next, a process of determining a weight from the original loss value and the evaluation value for the first candidate trajectory will be described with reference to fig. 3, where the weight is denoted by w in fig. 3.

It is understood that a first threshold Th1 and a second threshold Th2 may be preconfigured. If the original loss value loss_0 is greater than or equal to the first threshold Th1, the original loss value loss_0 is regarded as larger; if the original loss value loss_0 is smaller than the first threshold Th1, it is regarded as smaller. If the evaluation value Score is greater than or equal to the second threshold Th2, the evaluation value Score is regarded as higher; if the evaluation value Score is smaller than the second threshold Th2, it is regarded as lower. Combining these conditions yields the following four cases.

First case: the original loss value loss_0 corresponding to a certain first candidate track Traj_1 determined by the autopilot model is smaller, and the evaluation value Score of the first candidate track Traj_1 is higher. This case indicates that, in both dimensions (the autopilot model and the evaluation), the first candidate track Traj_1 is relatively similar to the reference track, so the processing result of the autopilot model for the first candidate track Traj_1 is relatively accurate. The original loss value loss_0 need not be modulated: the weight may be 1, in which case the first sub-loss value loss_1 is equal to the original loss value loss_0.

Second case: the original loss value loss_0 corresponding to a certain first candidate track Traj_2 determined by the autopilot model is larger, and the evaluation value Score of the first candidate track Traj_2 is lower. This case indicates that, in both dimensions (the autopilot model and the evaluation), the first candidate track Traj_2 differs greatly from the reference track, so the processing result of the autopilot model for the first candidate track Traj_2 is relatively accurate. The original loss value loss_0 need not be modulated: the weight may be 1, in which case the first sub-loss value loss_1 is equal to the original loss value loss_0.

Third case: the original loss value loss_0 corresponding to a certain first candidate track Traj_3 determined by the autopilot model is larger, and the evaluation value Score of the first candidate track Traj_3 is higher. This case indicates that the first candidate track Traj_3 does not coincide with the reference track in the autopilot model dimension, but that the first candidate track Traj_3 is still a viable path in the rationality dimension characterized by the evaluation value Score.

For example, the actual driving trajectory of the driver is taken as the reference trajectory. While driving, a driver may choose either to follow the vehicle ahead or to bypass it, but can only make one specific choice at a time, so only one driving track is recorded once the data is logged. For example, if the driver chooses to follow, the collected reference track is the following track, and the corresponding bypass track becomes a negative sample. In practical applications, however, it is desirable to encourage the autopilot model to output multimode candidate trajectories, i.e., diverse candidate trajectories.

Thus, for this third case, the weight of the original loss value loss_0 can be modulated. For example, a first weight may be determined first, whose value may be greater than 0 and less than 1, and the first weight is then taken as the weight. The first sub-loss value loss_1 determined according to the first weight is therefore smaller than the original loss value loss_0, which encourages the autopilot model to output multimode candidate trajectories. The trained autopilot model can then output candidate trajectories with higher rationality, such as the encouraged detour trajectory, which avoids the autopilot model being too conservative and inflexible when coping with special situations.

In some embodiments, the first weight is inversely related to the evaluation value Score for the first candidate trajectory. Thus, the higher the evaluation value Score, the smaller the first weight and the smaller the first sub-loss value determined based on the first weight, so that the first sub-loss of the first candidate track is sufficiently reduced and the knowledge learned by the automatic driving model from the first candidate track is weakened.

In some embodiments, the first weight may be determined from an evaluation value Score for the first candidate trajectory. For example, the evaluation value Score is a numerical value between 0 and 1, and the difference between 1 and the evaluation value Score may be used as the first weight, or the first weight may be determined based on the reciprocal of the evaluation value Score. It can be seen that in this way, the first weight can be accurately and rapidly calculated based on the evaluation value Score on the basis of realizing that the first weight is inversely related to the evaluation value Score for the first candidate trajectory.

Fourth case: the original loss value loss_0 corresponding to a certain first candidate track Traj_4 determined by the autopilot model is smaller, and the evaluation value Score of the first candidate track Traj_4 is lower. This case indicates that the first candidate track Traj_4 is relatively close to the reference track in the autopilot model dimension, but that the first candidate track Traj_4 is less reasonable in the rationality dimension characterized by the evaluation value Score.

In one example, for this fourth case, the weight of the original loss value loss_0 may be modulated. For example, a second weight may be determined first; the value of the second weight may be greater than 1, and the second weight may be inversely related to the evaluation value Score. The second weight is then taken as the weight, so that the first sub-loss value loss_1 determined based on the second weight is larger than the original loss value loss_0. This weakens the knowledge learned by the automatic driving model from the first candidate track, avoids the automatic driving model being overly conservative, and achieves the effect of encouraging the automatic driving model to output multimode candidate tracks.

In another example, the weight of the original Loss value loss_0 may not be modulated for this fourth case. For example, the accuracy of the autopilot model is considered to be higher than the accuracy of the evaluation value Score, and the processing result of the autopilot model may be more trusted than the evaluation value Score, so the weight of the original Loss value loss_0 may not be modulated, and the weight may be 1, where the first sub-Loss value loss_1 is equal to the original Loss value loss_0.

The four cases are described in detail above.

In some embodiments, based on the four cases described above, the weights may be determined by the following formulas.

Here, L_traj represents the original loss value of the first candidate trajectory, L_T represents the first threshold, score_traj represents the evaluation value of the first candidate trajectory, and S_T represents the second threshold. W(L_traj, score_traj) represents the weight, that is, given a first candidate trajectory and its corresponding evaluation value score_traj, the modulation factor by which the original loss value of the first candidate trajectory is to be multiplied. The weight may be 1 (i.e., no modulation), while in the case of encouraging multimodality the weight may be reduced, so that the first sub-loss value of the first candidate trajectory is reduced by the weight.
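The weight selection described by the cases above can be sketched as a small function. This is only an illustrative sketch under stated assumptions, not the formula of the disclosure: the function name `modulation_weight`, the choice of 1 - score as the first weight (one of the options mentioned above), and the choice of 2 - score as the second weight (one way to obtain a value greater than 1 that decreases as the score grows) are all hypothetical. The evaluation value is assumed to lie between 0 and 1.

```python
def modulation_weight(loss_traj, score_traj, loss_threshold, score_threshold):
    """Return the factor that multiplies the original loss of one candidate.

    loss_threshold plays the role of L_T and score_threshold the role of S_T.
    """
    if loss_traj >= loss_threshold and score_traj >= score_threshold:
        # Large original loss but a reasonable trajectory: shrink the loss.
        # Assumption: first weight = 1 - score.
        return 1.0 - score_traj
    if loss_traj < loss_threshold and score_traj < score_threshold:
        # Small original loss but an unreasonable trajectory: enlarge the loss.
        # Assumption: second weight = 2 - score, which is greater than 1.
        return 2.0 - score_traj
    # All remaining cases: no modulation.
    return 1.0
```

If the fourth case is to be left unmodulated, as in the alternative example above, the second branch would simply return 1.0.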

Fig. 4 is a schematic flow chart of a training method of an automatic driving model according to another embodiment of the present disclosure.

As shown in fig. 4, the training method 400 of the automatic driving model may include operations S410, S421, and S430-S450, wherein operations S410, S430-S450 may refer to operations S210, S230-S250 described above, and the embodiment is not described herein.

In operation S410, first sensing data is input into an automatic driving model, resulting in a plurality of first candidate trajectories.

In operation S421, the first sensing data and the plurality of first candidate trajectories are input into an evaluation model, and evaluation values for the plurality of first candidate trajectories, respectively, are obtained.

For example, an evaluation model may be trained in advance. The input of the evaluation model may include sensed data collected by the sensor, may also include traveling information of the vehicle, and includes a plurality of candidate trajectories. The output of the evaluation model includes the evaluation values for the candidate trajectories. Determining the evaluation value with a pre-trained evaluation model helps ensure the accuracy of the evaluation value.

In operation S430, for each of the plurality of first candidate trajectories, a first sub-loss value corresponding to the first candidate trajectory is determined according to a difference between the first candidate trajectory and the reference trajectory, and an evaluation value for the first candidate trajectory.

In operation S440, a first loss value is determined according to the first sub-loss values of each of the plurality of first candidate tracks.

In operation S450, an automatic driving model is trained according to the first loss value.
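Operations S430 and S440 can be illustrated with a short sketch. It rests on assumptions that the text does not fix: the difference between a candidate trajectory and the reference trajectory is taken as a mean squared point distance, the first loss value is taken as the mean of the first sub-loss values, and the names `point_loss`, `first_loss_value`, and `weight_fn` are hypothetical.

```python
def point_loss(candidate, reference):
    """Mean squared point distance (an assumed stand-in for 'the difference')."""
    total = sum((cx - rx) ** 2 + (cy - ry) ** 2
                for (cx, cy), (rx, ry) in zip(candidate, reference))
    return total / len(reference)


def first_loss_value(candidates, reference, scores, weight_fn):
    """Return (first loss value, list of first sub-loss values)."""
    sub_losses = []
    for candidate, score in zip(candidates, scores):
        original = point_loss(candidate, reference)   # S430: original loss
        weight = weight_fn(original, score)           # S430: weight from score
        sub_losses.append(weight * original)          # S430: first sub-loss
    # S440: aggregate the sub-losses (mean is an assumption).
    return sum(sub_losses) / len(sub_losses), sub_losses
```

Here `weight_fn(original_loss, score)` stands for whichever weighting rule is chosen; passing a function that always returns 1 reproduces training without modulation.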

Next, a construction process of training samples used for training the evaluation model and a process of training the evaluation model will be described.

The training samples used to train the assessment model may include second sensed data, second candidate trajectories, and labels.

For example, the second sensing data may represent environmental information of an environment in which the vehicle is located, and the sensor may be used to collect data of the environment in which the vehicle is located, and the obtained data is used as the second sensing data. The second sensed data may also be representative of vehicle motion information, for example, the second sensed data includes information related to vehicle such as speed, acceleration, and the like.

For example, the second sensed data may be input into a historical version of the autopilot model, from which K second candidate trajectories are output, K being an integer greater than or equal to 1. Alternatively, the second sensing data may be processed in other ways to obtain K second candidate tracks. The present embodiment is not limited to the manner of determining the second candidate track based on the second sensing data.

After the K second candidate tracks are obtained, they may be ranked, manually or by a predetermined ranking algorithm, according to the observed conditions of the current scene to obtain ranking information. The ranking information characterizes the true rationality of the K second candidate tracks; for example, the more reasonable a second candidate track is, the higher it ranks.

As shown in fig. 5, the first row shows K second candidate tracks, the second row shows the ranking of the respective second candidate tracks, and ">" in the figure indicates that the former track is more reasonable than the latter track. In the example shown in fig. 5, the rank of track 3 is rank_1, the rank of track 2 is rank_2, the rank of track K-1 is rank_3, the rank of track 1 is rank_K, and the rationality of track 3, track 2, track K-1, and track 1 decreases in that order.

As shown in fig. 6, after the ranking information is obtained, the label may be determined based on the ranking information. For example, the K second candidate tracks are combined in pairs, yielding K(K-1)/2 track pairs. Each track pair may be represented as (Obs, T_i, T_j), where Obs represents the second sensed data, T_i represents one second candidate track, and T_j represents another second candidate track. The label of a track pair characterizes the relationship between the evaluation value of the second candidate track T_i and the evaluation value of the second candidate track T_j; for example, if the evaluation value of the second candidate track T_i is greater than that of the second candidate track T_j, the label may be 0, and otherwise the label may be 1. The K(K-1)/2 track pairs form a batch of data, which constitutes the space of track pairs. Because the labels are determined based on the ranking information, the label of each track pair does not need to be annotated individually; the labels can be generated automatically from the order of the ranking, which improves labeling efficiency.
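The automatic generation of labels from ranking information can be sketched as follows. The function name `build_track_pairs` and the encoding of the ranking as a list of integers (1 for the most reasonable track) are assumptions made for illustration; the label convention of 0 when T_i is the more reasonable track, and 1 otherwise, follows the example above.

```python
from itertools import combinations


def build_track_pairs(obs, tracks, ranks):
    """Build (Obs, T_i, T_j, label) tuples for every unordered pair of tracks.

    ranks[i] is the rank of tracks[i]; rank 1 is the most reasonable track.
    """
    pairs = []
    for i, j in combinations(range(len(tracks)), 2):
        # Label 0: T_i is ranked above T_j (higher evaluation value expected).
        # Label 1: otherwise.
        label = 0 if ranks[i] < ranks[j] else 1
        pairs.append((obs, tracks[i], tracks[j], label))
    return pairs
```

For K tracks the function returns K(K-1)/2 tuples, so a single manual ranking yields all pair labels without further annotation.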

After the training samples are constructed, the training samples may be utilized to train the assessment model.

For example, the second sensing data and the K second candidate trajectories may be input into an evaluation model to be trained, resulting in K evaluation values.

Next, for each of the at least one track pair, the track pair comprises two second candidate tracks, each with its own evaluation value, so the track pair corresponds to two evaluation values. A second sub-loss value may be determined from the two evaluation values of the track pair and the label associated with the track pair. For example, the second sub-loss value is calculated according to a predetermined loss function, which may calculate the difference between the evaluation value of the second candidate track T_i and the evaluation value of the second candidate track T_j, and then calculate the second sub-loss value based on whether that difference is consistent with the relationship between the two evaluation values indicated by the label.

Next, a second loss value may be determined based on the respective second sub-loss values of the at least one track pair. For example, a weighted average of the second sub-loss values of the plurality of track pairs may be calculated, and the result may be taken as the second loss value. The evaluation model may then be trained based on the second loss value. For example, the evaluation model is trained by adjusting the network weights in the evaluation model by a back propagation algorithm or the like.

For example, the following equation may be employed to calculate the second loss value.

Here, Obs represents the second sensed data, and T_i and T_j represent the two second candidate tracks of a track pair under the current observation (i.e., when the second sensed data is observed); it may be assumed that T_i is ranked higher than T_j in the training data. The space of track pairs represents all track pairs under this observation. R denotes the evaluation model to be trained, R(Obs, T_i) is the evaluation value of T_i under the current observation, and R(Obs, T_j) is the evaluation value of T_j under the current observation. The predetermined coefficient alleviates the overfitting that the number K of candidate tracks could otherwise cause during training; the predetermined coefficient may also take other values.

In this embodiment, the output of the evaluation model is a value between 0 and 1. It should be noted that the evaluation model is not supervised with absolute score values; a ranking approach is adopted instead, because absolute scores given by different people have low robustness while relative rankings are more robust. As a result, the output of the trained evaluation model is more robust and accurate.
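A pairwise ranking loss of the kind described here can be sketched as follows. The exact form is an assumption: the sketch uses the negative log-sigmoid of the difference R(Obs, T_i) - R(Obs, T_j), a common choice for ranking-based training, and divides by the number of track pairs as a stand-in for the predetermined coefficient. The function name `pairwise_ranking_loss` is hypothetical.

```python
import math


def pairwise_ranking_loss(pair_scores):
    """Average ranking loss over track pairs.

    pair_scores is a list of (r_i, r_j), where r_i = R(Obs, T_i) and
    r_j = R(Obs, T_j), and T_i is the track ranked higher in the training data.
    """
    total = 0.0
    for r_i, r_j in pair_scores:
        # -log(sigmoid(r_i - r_j)) written as log(1 + exp(-(r_i - r_j))).
        total += math.log1p(math.exp(-(r_i - r_j)))
    # Assumption: normalise by the number of pairs so that the loss scale
    # does not grow with K.
    return total / len(pair_scores)
```

The loss is small when the higher-ranked track receives the higher evaluation value and grows when the order is reversed, which matches the consistency check between the difference and the label described above.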

Fig. 7 is a schematic diagram of a training method of an autopilot model according to an embodiment of the present disclosure.

As shown in fig. 7, this embodiment requires training the following two models: an automatic driving model and an evaluation model. The training process of the present embodiment may include a first stage stage_1, a second stage stage_2, a third stage stage_3, and a fourth stage stage_4.

In the first stage stage_1, the automatic driving model M_1.0 to be trained may be trained into the first version of the automatic driving model M_1.1.

The input of the automatic driving model M_1.0 includes sensed data acquired by a sensor, and the output includes a plurality of candidate tracks. The supervisory signal is a ground-truth (GT) track; a real manual driving track or an automatic driving track may be used as the GT track. The evaluation value may be ignored when training the automatic driving model M_1.0.

The training process of the first stage stage_1 is similar to that of the fourth stage stage_4 below. The main difference is that the trained evaluation model M_2.1 has not yet been obtained in the first stage stage_1, so the first stage stage_1 ignores the evaluation value.

For example, the sensed data may be input into the automatic driving model M_1.0 to be trained, resulting in a plurality of candidate tracks. Then, for each candidate track of the plurality of candidate tracks, a sub-loss value corresponding to the candidate track is determined based on the difference between the candidate track and the GT track. A loss value is then determined according to the sub-loss values of the plurality of candidate tracks, and the automatic driving model M_1.0 is trained according to the loss value, so that the first version of the automatic driving model M_1.1 is obtained after training.

For example, the automatic driving model M_1.0 may include an image encoder, a point cloud encoder, and a processing sub-model. Features of image data and features of point cloud data may be extracted using the image encoder and the point cloud encoder, respectively; the two kinds of features may then be converted into BEV (Bird's Eye View) features, the BEV features may be input into the processing sub-model, and the processing sub-model may output trajectory information. When the parameters are adjusted, the parameters of the processing sub-model may be adjusted, and the parameters of the image encoder and the point cloud encoder may also be adjusted. The structure and specific training means of the automatic driving model are not limited in this embodiment.

In the second stage stage_2, an evaluation model M_2.0 may be determined based on a pre-trained model. At least part of the network layers in a historical version of the automatic driving model (e.g., the automatic driving model M_1.1 above) may be used as the pre-trained model, and a linear layer is then connected in series at the output end of the pre-trained model to obtain the evaluation model M_2.0 to be trained. For example, the last layer of the automatic driving model M_1.1 is an MLP layer; if the last layer of the automatic driving model M_1.1 outputs 5 tracks, each including 30 points, the last layer handles a 5×30 regression task. The last layer of the automatic driving model M_1.1 may be replaced with a linear layer, a linear layer may be added after the output layer of the automatic driving model M_1.1, or some layers of the automatic driving model M_1.1 may be deleted and a linear layer then added at the end. Determining the evaluation model M_2.0 from a pre-trained model can speed up model convergence and improve training efficiency. It will be appreciated that in other embodiments, the evaluation model M_2.0 may be determined without a pre-trained model.
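The idea of reusing layers of a historical automatic driving model and appending a linear layer can be sketched in plain Python. Everything here is a simplified assumption: the class name `EvaluationModel` is hypothetical, the reused layers are represented by an arbitrary callable `backbone` that returns a feature list, and a sigmoid is applied after the linear layer so that the output lies between 0 and 1.

```python
import math


class EvaluationModel:
    """Pre-trained backbone followed by one linear layer (illustrative only)."""

    def __init__(self, backbone, weights, bias=0.0):
        self.backbone = backbone  # stands in for reused layers of M_1.1
        self.weights = weights    # parameters of the appended linear layer
        self.bias = bias

    def __call__(self, obs, trajectory):
        features = self.backbone(obs, trajectory)
        # Linear layer: weighted sum of backbone features plus bias.
        logit = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        # Assumption: a sigmoid maps the logit to a value between 0 and 1.
        return 1.0 / (1.0 + math.exp(-logit))
```

In practice the backbone and the linear layer would be modules of a deep learning framework, and only the evaluation model's parameters would be updated in the third stage.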

In the third stage stage_3, the evaluation model M_2.0 to be trained may be trained into the evaluation model M_2.1 using the automatic driving model M_1.1.

For the process of training the evaluation model, reference may be made to the description above. For example, training samples may be constructed in advance, the training samples including the second sensing data, the second candidate tracks, and the labels. In the training process, the second sensing data and the K second candidate tracks may be input into the evaluation model M_2.0 to be trained to obtain K evaluation values. For each of the at least one track pair, a second sub-loss value is determined from the evaluation value of each second candidate track of the track pair and the label associated with the track pair. A second loss value is then determined according to the respective second sub-loss values of the at least one track pair, and the evaluation model M_2.0 is trained according to the second loss value, so that the evaluation model M_2.1 is obtained after training.

In the fourth stage stage_4, the evaluation model M_2.1 may be applied when training the first version of the automatic driving model M_1.1, thereby training the first version of the automatic driving model M_1.1 into the second version of the automatic driving model M_1.2.

In the training process, the first sensing data may be input into the automatic driving model M_1.1 to obtain first candidate tracks, and the original loss value of each first candidate track may be determined according to the difference between the first candidate track and the reference track. In addition, the first sensing data and the first candidate track are input into the trained evaluation model M_2.1 to obtain an evaluation value of the first candidate track. A weight is then determined according to the evaluation value, and the original loss value is modulated by the weight, turning the original loss value into a first sub-loss value. A first loss value may then be determined according to the first sub-loss values of the plurality of first candidate tracks, the automatic driving model M_1.1 is trained based on the first loss value, and the second version of the automatic driving model M_1.2 is obtained after training. It will be appreciated that during this process the parameters of the evaluation model M_2.1 may be kept unchanged, and the evaluation model M_2.1 acts as a modulator that weights the original loss values of the first version of the automatic driving model M_1.1, thereby encouraging multimodality in the automatic driving model. It can be seen that the evaluation model modulates the automatic driving model, so that the two models reinforce each other.

Furthermore, in some embodiments, a training data set D may first be collected. The training data set D may include data collected by the sensor and actual driving tracks, and the actual driving tracks may be used as labels to train the automatic driving model; after training, the automatic driving model M_1.0 may be obtained. Then, after the evaluation model has been trained, for example after the above evaluation model M_2.1 is obtained, the evaluation model M_2.1 may be used to evaluate the tracks in the training data set D and determine their evaluation values. Tracks whose evaluation value is lower than a threshold are deleted, which removes training data that is too aggressive or unreasonable and updates the data set D to the data set D'. Next, the data set D' may be used to further train the automatic driving model.

This embodiment can realize a self-evolving closed loop in which the evaluation model and the automatic driving model iterate on each other. By applying the evaluation model to judge the rationality of the collected training data, the training data is continuously purified and the automatic driving model is updated, so that the accuracy of the evaluation model is continuously improved. In other words, an evaluation model is trained on high-quality data, the evaluation model is used to evaluate and screen a large number of actual driving tracks and remove training data that is too aggressive or unreasonable, and the automatic driving model then continues to be trained. In this way, the evaluation model can be further optimized, closing the loop. Meanwhile, the method can continuously improve the quality of the training data and of the models without additional labor cost, forming a self-evolution of the models.
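The update from the data set D to the data set D' amounts to a threshold filter, which can be sketched as follows. The function name `purify_dataset`, the dictionary keys `sensing` and `trajectory`, and the callable `evaluate` standing in for the trained evaluation model are hypothetical names used only for illustration.

```python
def purify_dataset(dataset, evaluate, threshold):
    """Keep only samples whose actual driving track is rated reasonable enough.

    dataset:   list of dicts with keys "sensing" and "trajectory".
    evaluate:  callable (sensing, trajectory) -> evaluation value.
    threshold: samples with an evaluation value below this are removed.
    """
    return [sample for sample in dataset
            if evaluate(sample["sensing"], sample["trajectory"]) >= threshold]
```

The returned list corresponds to D', which can then be used for further training of the automatic driving model.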

Fig. 8A is a schematic flowchart of a method of determining a vehicle travel track according to an embodiment of the present disclosure, and fig. 8B is a schematic diagram of a method of determining a vehicle travel track according to an embodiment of the present disclosure.

As shown in fig. 8A and 8B, the method 800 of determining a vehicle travel track may include operations S810 to S830.

In operation S810, third sensing data is input into the automatic driving model, resulting in a plurality of third candidate trajectories.

For example, the third sensed data may characterize environmental information of an environment in which the vehicle is located, and the third sensed data may include at least one of point cloud data and image data. The third sensed data may also characterize vehicle motion information, e.g., the third sensed data may also be vehicle related information such as speed, acceleration, etc.

For example, the autopilot model may be trained using the training method above.

In operation S820, respective evaluation values of the plurality of third candidate trajectories are determined according to the third sensing data and the plurality of third candidate trajectories.

For example, the third sensed data and the plurality of third candidate trajectories may be input into a trained evaluation model, resulting in respective evaluation values for the plurality of third candidate trajectories. The evaluation model may be obtained by the operation of training the evaluation model above. Of course, the evaluation value of the third candidate trajectory may be determined based on other manners, which is not limited in this embodiment.

In operation S830, a target travel track is determined from the plurality of third candidate tracks according to the evaluation values of the respective plurality of third candidate tracks.

For example, the third candidate trajectory corresponding to the highest evaluation value may be determined as the target travel trajectory.

For another example, a track of a predetermined category may be selected from a plurality of third candidate tracks, the predetermined category may include following, detouring, etc., and then the target travel track may be selected from the tracks of the predetermined category.
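Operation S830 can be sketched as a selection over evaluation values. The function name `select_target_trajectory` and its parameters are hypothetical. Choosing the highest-scoring track within the predetermined categories, and falling back to all candidates when no track belongs to an allowed category, are assumptions that the text leaves open.

```python
def select_target_trajectory(candidates, scores, categories=None, allowed=None):
    """Pick the target travel track from the third candidate tracks.

    candidates: list of candidate tracks.
    scores:     evaluation value of each candidate, same order.
    categories: optional category name per candidate (e.g. "following").
    allowed:    optional set of predetermined categories to select from.
    """
    indices = list(range(len(candidates)))
    if categories is not None and allowed is not None:
        filtered = [i for i in indices if categories[i] in allowed]
        # Assumption: if no candidate matches, consider all candidates.
        indices = filtered or indices
    best = max(indices, key=lambda i: scores[i])
    return candidates[best]
```

With no category information, the function simply returns the candidate with the highest evaluation value.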

The present embodiment can replace a complex optimization algorithm by evaluating candidate trajectories, and the method can determine an evaluation value of the candidate trajectories and select a target travel trajectory based on the evaluation value. In some embodiments, the assessment model used to determine the assessment value may introduce human feedback during the training phase, further improving the rationality and intelligence of the target travel trajectory.

Fig. 9 is a schematic block diagram of a training apparatus of an automatic driving model according to an embodiment of the present disclosure. As shown in fig. 9, the training apparatus 900 of the autopilot model may include a first input module 910, a first evaluation value determination module 920, a first sub-loss determination module 930, a first loss determination module 940, and a training module 950.

The first input module 910 is configured to input first sensing data into an autopilot model to obtain a plurality of first candidate trajectories. The first sensed data is indicative of environmental information of an environment in which the vehicle is located.

The first evaluation value determination module 920 is configured to determine evaluation values for the plurality of first candidate tracks according to the first sensing data and the plurality of first candidate tracks.

The first sub-loss determination module 930 is configured to determine, for each of the plurality of first candidate trajectories, a first sub-loss value corresponding to the first candidate trajectory according to a difference between the first candidate trajectory and the reference trajectory and an evaluation value for the first candidate trajectory.

The first loss determination module 940 is configured to determine a first loss value according to a first sub-loss value of each of the plurality of first candidate tracks.

The training module 950 is configured to train the autopilot model based on the first loss value.

According to an embodiment of the present disclosure, the first sub-loss determination module includes: an original loss determination sub-module, a weight determination sub-module, and a first sub-loss determination sub-module. The original loss determination submodule is used for determining an original loss value according to the difference between the first candidate track and the reference track. The weight determination submodule is used for determining weights according to the original loss values and the evaluation values aiming at the first candidate tracks. The first sub-loss determination sub-module is used for determining a first sub-loss value corresponding to the first candidate track according to the original loss value and the weight.

According to an embodiment of the present disclosure, the weight determination submodule includes: a first determination unit and a second determination unit. The first determining unit is used for determining a first weight under the condition that the original loss value is larger than or equal to a first threshold value and the evaluation value of the first candidate track is larger than or equal to a second threshold value. The second determining unit is used for taking the first weight as the weight. The first sub-loss value determined from the first weight is less than the original loss value.

According to an embodiment of the present disclosure, the first determination unit includes a determining subunit configured to determine the first weight according to the evaluation value for the first candidate trajectory.

According to an embodiment of the present disclosure, the first weight is inversely related to the evaluation value for the first candidate trajectory.

According to an embodiment of the present disclosure, the weight determination submodule includes: a third determination unit and a fourth determination unit. The third determining unit is used for determining a second weight when the original loss value is determined to be smaller than a first threshold value and the evaluation value for the first candidate track is determined to be smaller than a second threshold value. The fourth determining unit is used for taking the second weight as the weight. The first sub-loss value determined from the second weight is greater than the original loss value.

According to an embodiment of the present disclosure, the first evaluation value determination module includes a first input sub-module configured to input the first sensing data and the plurality of first candidate tracks into the evaluation model to obtain evaluation values for the plurality of first candidate tracks, respectively.

According to an embodiment of the present disclosure, the apparatus further includes a training module for training the evaluation model. The training module includes: a second input sub-module, a second sub-loss determination sub-module, a second loss determination sub-module, and a training sub-module. The second input sub-module is used for inputting the second sensing data and the plurality of second candidate tracks into an evaluation model to be trained to obtain respective evaluation values of the plurality of second candidate tracks. The second sensed data is indicative of environmental information of an environment in which the vehicle is located, and the plurality of second candidate tracks includes at least one track pair. The second sub-loss determination sub-module is configured to determine, for each of the at least one track pair, a second sub-loss value based on the evaluation value of each second candidate track in the track pair and the label associated with the track pair. The label characterizes the relationship between two evaluation values, and the two evaluation values correspond to the two second candidate tracks in the track pair. The second loss determination sub-module is used for determining a second loss value according to the respective second sub-loss values of the at least one track pair. The training sub-module is used for training the evaluation model according to the second loss value.

According to an embodiment of the present disclosure, the training module further includes a pre-training model determination sub-module and an adding sub-module. The pre-training model determination sub-module is used for taking at least part of the network layers in a historical version of the automatic driving model as a pre-training model. The adding sub-module is used for connecting a linear layer in series at the output end of the pre-training model to obtain the evaluation model to be trained.

According to an embodiment of the present disclosure, at least one tag associated with at least one track pair is determined from ranking information of a plurality of second candidate tracks.

According to an embodiment of the present disclosure, the first sensing data includes at least one of image data and point cloud data.

Fig. 10 is a schematic block diagram of a device for determining a travel track of a vehicle according to an embodiment of the present disclosure.

As shown in fig. 10, the apparatus 1000 for determining a vehicle travel track may include a second input module 1010, a second evaluation value determination module 1020, and a target track determination module 1030.

The second input module 1010 is configured to input third sensing data into an autopilot model to obtain a plurality of third candidate trajectories, where the third sensing data characterizes environmental information of an environment where the vehicle is located, and the autopilot model is trained by using the training device.

The second evaluation value determination module 1020 is configured to determine evaluation values of each of the plurality of third candidate tracks according to the third sensing data and the plurality of third candidate tracks.

The target track determining module 1030 is configured to determine a target driving track from the plurality of third candidate tracks according to the evaluation values of the plurality of third candidate tracks.

According to an embodiment of the present disclosure, the second evaluation value determination module includes a third input sub-module configured to input the third sensing data and the plurality of third candidate tracks into the trained evaluation model to obtain respective evaluation values of the plurality of third candidate tracks.

According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training the autopilot model and/or the method of determining the trajectory of the vehicle.

According to an embodiment of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described method of training an automatic driving model and/or method of determining a vehicle running track.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described method of training an autopilot model and/or method of determining a vehicle driving trajectory.

According to an embodiment of the present disclosure, there is also provided an autonomous vehicle including an electronic device including a processor, and the processor is capable of executing the above-described method of determining a vehicle travel track.

Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.

Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 1101 performs the respective methods and processes described above, such as the training method of the automatic driving model described above and/or the method of determining the vehicle running track. For example, in some embodiments, the above-described method of training an autopilot model and/or method of determining a vehicle travel trajectory may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the above-described training method of the automatic driving model and/or the method of determining the vehicle running track may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g. by means of firmware) to perform the training method of the autonomous driving model described above and/or the method of determining the vehicle driving trajectory.

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.

In the technical solutions of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method of training an automatic driving model, comprising:

inputting first sensing data into the automatic driving model to obtain a plurality of first candidate tracks, wherein the first sensing data represents environmental information of an environment in which a vehicle is located;

determining evaluation values for the plurality of first candidate tracks, respectively, according to the first sensing data and the plurality of first candidate tracks;

determining, for each first candidate track of the plurality of first candidate tracks, a first sub-loss value corresponding to the first candidate track according to a difference between the first candidate track and a reference track and an evaluation value for the first candidate track;

determining a first loss value according to the first sub-loss value of each of the plurality of first candidate tracks; and

training the automatic driving model according to the first loss value.

2. The method of claim 1, wherein determining a first sub-loss value corresponding to the first candidate track based on a difference between the first candidate track and a reference track and an evaluation value for the first candidate track comprises:

determining an original loss value according to the difference between the first candidate track and the reference track;

determining a weight according to the original loss value and an evaluation value for the first candidate track; and

determining the first sub-loss value corresponding to the first candidate track according to the original loss value and the weight.

3. The method of claim 2, wherein determining the weight according to the original loss value and the evaluation value for the first candidate track comprises:

determining a first weight in a case where the original loss value is greater than or equal to a first threshold value and the evaluation value for the first candidate track is greater than or equal to a second threshold value; and

taking the first weight as the weight;

wherein the first sub-loss value determined according to the first weight is smaller than the original loss value.

4. The method of claim 3, wherein determining the first weight comprises:

determining the first weight according to the evaluation value for the first candidate track.

5. The method of claim 3, wherein the first weight is inversely related to the evaluation value for the first candidate track.

6. The method of claim 2, wherein determining the weight according to the original loss value and the evaluation value for the first candidate track comprises:

determining a second weight in a case where the original loss value is less than a first threshold value and the evaluation value for the first candidate track is less than a second threshold value; and

taking the second weight as the weight;

wherein the first sub-loss value determined according to the second weight is greater than the original loss value.
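For illustration only, the weighting recited in claims 2 to 6 can be sketched as follows. Several choices here are assumptions that the claims do not specify: the original loss is taken as the mean squared distance between matching waypoints, the first weight is `1 / (1 + score)` (one possible function that is inversely related to the evaluation value and below 1 for positive scores), the second weight is a hypothetical constant `boost` greater than 1, all remaining cases use a weight of 1, and the first loss is the mean of the sub-losses. The function and parameter names are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def original_loss(candidate: Sequence[Point], reference: Sequence[Point]) -> float:
    """Mean squared distance between matching waypoints (assumed metric)."""
    if len(candidate) != len(reference) or not candidate:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (cx, cy), (rx, ry) in zip(candidate, reference):
        total += (cx - rx) ** 2 + (cy - ry) ** 2
    return total / len(candidate)


def loss_weight(loss: float, score: float, loss_threshold: float,
                score_threshold: float, boost: float = 1.5) -> float:
    """Pick a weight from the original loss and the evaluation value.

    Assumes evaluation values are non-negative and score_threshold > 0,
    so that the first weight is strictly below 1.
    """
    if loss >= loss_threshold and score >= score_threshold:
        # Far from the reference but rated highly: shrink the penalty.
        # 1 / (1 + score) decreases as the score grows (assumed form).
        return 1.0 / (1.0 + score)
    if loss < loss_threshold and score < score_threshold:
        # Close to the reference but rated poorly: enlarge the penalty.
        return boost
    # Remaining cases are not covered by the claims; weight 1 is assumed.
    return 1.0


def first_loss(candidates: Sequence[Sequence[Point]], scores: Sequence[float],
               reference: Sequence[Point], loss_threshold: float,
               score_threshold: float) -> float:
    """Mean of the weighted sub-losses over all candidates (assumed reduction)."""
    if len(candidates) != len(scores) or not candidates:
        raise ValueError("need one score per candidate")
    sub_losses: List[float] = []
    for cand, score in zip(candidates, scores):
        raw = original_loss(cand, reference)
        weight = loss_weight(raw, score, loss_threshold, score_threshold)
        sub_losses.append(weight * raw)
    return sum(sub_losses) / len(sub_losses)
```

In a real training loop the same weighting would be applied to differentiable tensors so that the first loss can be back-propagated; the plain-Python version above only shows the case analysis.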

7. The method of any one of claims 1 to 6, wherein determining the evaluation values for the plurality of first candidate tracks, respectively, according to the first sensing data and the plurality of first candidate tracks comprises:

inputting the first sensing data and the plurality of first candidate tracks into an evaluation model to obtain the evaluation values for the plurality of first candidate tracks, respectively.

8. The method of claim 7, wherein the evaluation model is obtained by a training operation comprising:

inputting second sensing data and a plurality of second candidate tracks into an evaluation model to be trained to obtain respective evaluation values of the plurality of second candidate tracks, wherein the second sensing data represents environmental information of an environment in which the vehicle is located, and the plurality of second candidate tracks include at least one track pair;

determining, for each of the at least one track pair, a second sub-loss value according to the evaluation value of each second candidate track of the track pair and a label associated with the track pair, wherein the label characterizes a relationship between the two evaluation values corresponding to the two second candidate tracks in the track pair;

determining a second loss value according to the respective second sub-loss value of each of the at least one track pair; and

training the evaluation model according to the second loss value.

9. The method of claim 8, wherein the training operation for the evaluation model further comprises:

taking at least part of the network layers in a historical version of the automatic driving model as a pre-training model; and

adding a linear layer at the output end of the pre-training model to obtain the evaluation model to be trained.

10. The method of claim 8, wherein at least one label associated with the at least one track pair is determined from ranking information of the plurality of second candidate tracks.
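As a hedged illustration of claims 8 and 10, the sketch below derives pair labels from a ranking and computes a pairwise loss over the evaluation values. The claims do not name a specific loss; a logistic pairwise loss, `-log(sigmoid(label * (score_a - score_b)))`, is assumed here, and the second loss is assumed to be the mean over pairs. The helper names (`pairs_from_ranking`, `pairwise_sub_loss`, `second_loss`) and the label convention (+1 means the first track of the pair should score higher) are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# (index_a, index_b, label); label = +1 means track a should outscore track b.
Pair = Tuple[int, int, int]


def pairs_from_ranking(ranking: Sequence[int]) -> List[Pair]:
    """Build labelled pairs from candidate indices listed best first."""
    # combinations() keeps input order, so a is always ranked ahead of b.
    return [(a, b, 1) for a, b in combinations(ranking, 2)]


def pairwise_sub_loss(score_a: float, score_b: float, label: int) -> float:
    """Logistic pairwise loss (assumed form), computed in a stable way."""
    margin = label * (score_a - score_b)
    # softplus(-margin) == -log(sigmoid(margin))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def second_loss(scores: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Mean of the second sub-loss values over all pairs (assumed reduction)."""
    if not pairs:
        raise ValueError("at least one track pair is required")
    total = sum(pairwise_sub_loss(scores[a], scores[b], y) for a, b, y in pairs)
    return total / len(pairs)
```

With this convention, evaluation values that agree with the ranking give a loss below `log 2` per pair, and values that contradict it give a loss above `log 2`, which is what drives the evaluation model toward the ranked order.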

11. The method of claim 1, wherein the first sensed data comprises at least one of image data and point cloud data.

12. A method of determining a vehicle travel path, comprising:

inputting third sensing data into an automatic driving model to obtain a plurality of third candidate tracks, wherein the third sensing data represents environmental information of an environment in which a vehicle is located;

determining respective evaluation values of the plurality of third candidate tracks according to the third sensing data and the plurality of third candidate tracks; and

determining a target running track from the plurality of third candidate tracks according to the respective evaluation values of the plurality of third candidate tracks;

wherein the automatic driving model is trained using the method of any one of claims 1 to 11.

13. The method of claim 12, wherein the determining the evaluation value for each of the plurality of third candidate trajectories from the third sensed data and the plurality of third candidate trajectories comprises:

and inputting the third sensing data and the plurality of third candidate tracks into a trained evaluation model to obtain respective evaluation values of the plurality of third candidate tracks.
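The inference flow of claims 12 and 13 can be sketched as below, purely as an illustration. The claims say only that the target running track is determined according to the evaluation values; picking the candidate with the highest value is an assumption. `planner` and `evaluator` are hypothetical callables standing in for the trained automatic driving model and the trained evaluation model.

```python
from typing import Any, Callable, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]


def select_target_trajectory(
    sensor_data: Any,
    planner: Callable[[Any], Sequence[Trajectory]],
    evaluator: Callable[[Any, Sequence[Trajectory]], Sequence[float]],
) -> Tuple[Trajectory, float]:
    """Generate candidates, score them, and return the best one with its score."""
    candidates = planner(sensor_data)
    if not candidates:
        raise ValueError("the planner returned no candidate trajectories")
    scores = evaluator(sensor_data, candidates)
    if len(scores) != len(candidates):
        raise ValueError("need exactly one evaluation value per candidate")
    # Assumed selection rule: highest evaluation value wins.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]
```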

14. An apparatus for training an automatic driving model, comprising:

a first input module configured to input first sensing data into the automatic driving model to obtain a plurality of first candidate tracks, wherein the first sensing data represents environmental information of an environment in which a vehicle is located;

a first evaluation value determination module configured to determine evaluation values for the plurality of first candidate tracks, respectively, according to the first sensing data and the plurality of first candidate tracks;

a first sub-loss determination module configured to determine, for each first candidate track of the plurality of first candidate tracks, a first sub-loss value corresponding to the first candidate track according to a difference between the first candidate track and a reference track and an evaluation value for the first candidate track;

a first loss determination module configured to determine a first loss value according to the first sub-loss value of each of the plurality of first candidate tracks; and

a training module configured to train the automatic driving model according to the first loss value.

15. The apparatus of claim 14, wherein the first sub-loss determination module comprises:

an original loss determination submodule configured to determine an original loss value according to the difference between the first candidate track and the reference track;

a weight determination submodule configured to determine a weight according to the original loss value and the evaluation value for the first candidate track; and

a first sub-loss determination submodule configured to determine the first sub-loss value corresponding to the first candidate track according to the original loss value and the weight.

16. The apparatus of claim 15, wherein the weight determination submodule comprises:

a first determining unit, configured to determine a first weight when it is determined that the original loss value is equal to or greater than a first threshold value, and an evaluation value for the first candidate track is equal to or greater than a second threshold value; and

a second determining unit configured to take the first weight as the weight.

17. The apparatus of claim 16, wherein the first determining unit comprises:

a determining subunit configured to determine the first weight according to the evaluation value for the first candidate track.

18. The apparatus of claim 16, wherein the first weight is inversely related to an evaluation value for the first candidate trajectory.

19. The apparatus of claim 15, wherein the weight determination submodule comprises:

a third determining unit configured to determine a second weight in a case where the original loss value is less than a first threshold value and the evaluation value for the first candidate track is less than a second threshold value; and

a fourth determining unit configured to take the second weight as the weight.

20. The apparatus of any one of claims 14 to 19, wherein the first evaluation value determination module comprises:

a first input sub-module configured to input the first sensing data and the plurality of first candidate tracks into an evaluation model to obtain the evaluation values for the plurality of first candidate tracks, respectively.

21. The apparatus of claim 20, further comprising a training module for performing a training operation on the evaluation model, the training module comprising:

a second input sub-module configured to input second sensing data and a plurality of second candidate tracks into an evaluation model to be trained to obtain respective evaluation values of the plurality of second candidate tracks, wherein the second sensing data represents environmental information of an environment in which the vehicle is located, and the plurality of second candidate tracks include at least one track pair;

a second sub-loss determination sub-module configured to determine, for each of the at least one track pair, a second sub-loss value according to the evaluation value of each second candidate track of the track pair and a label associated with the track pair, wherein the label characterizes a relationship between the two evaluation values corresponding to the two second candidate tracks in the track pair;

a second loss determination sub-module configured to determine a second loss value according to the respective second sub-loss value of each of the at least one track pair; and

a training sub-module configured to train the evaluation model according to the second loss value.

22. The apparatus of claim 21, wherein the training module further comprises:

a pre-training model determination sub-module configured to take at least part of the network layers in a historical version of the automatic driving model as a pre-training model; and

an adding sub-module configured to add a linear layer at the output end of the pre-training model to obtain the evaluation model to be trained.

23. The apparatus of claim 21, wherein at least one label associated with the at least one track pair is determined from ranking information of the plurality of second candidate tracks.

24. The apparatus of claim 15, wherein the first sensing data comprises at least one of image data and point cloud data.

25. An apparatus for determining a vehicle travel path, comprising:

a second input module configured to input third sensing data into an automatic driving model to obtain a plurality of third candidate tracks, wherein the third sensing data represents environmental information of an environment in which a vehicle is located;

a second evaluation value determination module configured to determine respective evaluation values of the plurality of third candidate tracks according to the third sensing data and the plurality of third candidate tracks; and

a target track determination module configured to determine a target running track from the plurality of third candidate tracks according to the respective evaluation values of the plurality of third candidate tracks;

wherein the automatic driving model is trained using the apparatus of any one of claims 15 to 24.

26. The apparatus of claim 25, wherein the second evaluation value determination module comprises:

a third input sub-module configured to input the third sensing data and the plurality of third candidate tracks into a trained evaluation model to obtain the respective evaluation values of the plurality of third candidate tracks.

27. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1 to 11.

28. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 12 to 13.

29. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 13.

30. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 13.

31. An autonomous vehicle comprising the electronic device of claim 28.