CN112949756B

CN112949756B - Method and device for model training and trajectory planning

Info

Publication number: CN112949756B
Application number: CN202110338028.6A
Authority: CN
Inventors: 李潇; 丁曙光; 杜挺; 袁克彬; 任冬淳
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-07-15
Anticipated expiration: 2041-03-30
Also published as: CN112949756A

Abstract

The specification discloses a method and a device for model training and trajectory planning. State information of an unmanned device is acquired as a training sample. In each training iteration, the trajectory confidence obtained in the previous iteration is acquired, and the training sample is input into a position determination model to obtain a target position that the model outputs according to the training sample and the acquired trajectory confidence. A decision model then produces, according to the training sample and the target position, a target trajectory along which the unmanned device reaches the target position, together with a trajectory confidence of that target trajectory. In this way, the position determination model can redetermine a better target position in each iteration based on the trajectory confidence of the target trajectory, the decision model replans the target trajectory based on that target position, and both the position determination model and the decision model are trained.

Description

Method and device for model training and trajectory planning

Technical Field

The specification relates to the technical field of unmanned driving, in particular to a method and a device for model training and trajectory planning.

Background

Generally, when the unmanned device is running, control information can be determined through a machine learning model, and the unmanned device is controlled based on the control information.

For example, the unmanned device may input its current state, environmental information, and the like into a reinforcement learning model, obtain information such as the accelerator control amount and the steering wheel angle output by the reinforcement learning model, and operate according to that output.

In practice, with this approach the control information is obtained from the reinforcement learning model continuously, moment by moment, while the unmanned device operates. When the unmanned device operates according to such control information, its trajectory may not be smooth enough, for example in terms of curvature, so the comfort and reliability of operation cannot be well ensured.

Disclosure of Invention

The embodiments of the present disclosure provide a method and an apparatus for model training and trajectory planning, so as to partially solve the above problems in the prior art.

The embodiment of the specification adopts the following technical scheme:

The present specification provides a method of model training, the method comprising:

acquiring state information of the unmanned equipment as a training sample;

according to the training sample, performing iterative training on a position determination model and a decision model by adopting the following method:

for each iteration training, obtaining a track confidence coefficient obtained by the last iteration training, inputting the training sample into the position determination model, and obtaining a target position output by the position determination model according to the training sample and the obtained track confidence coefficient, wherein the position determination model is used for planning the target position of the unmanned equipment;

and obtaining a target track of the unmanned equipment reaching the target position and a track confidence coefficient of the target track through a decision model according to the training sample and the target position.

Optionally, obtaining, through a decision model, a target track along which the unmanned equipment reaches the target position and a track confidence coefficient of the target track according to the training sample and the target position specifically includes:

determining a plurality of designated positions in the neighborhood of the target position, and forming a position set by the target position and each designated position;

inputting the training sample and the position set into the decision model to obtain undetermined tracks of the unmanned equipment reaching each position in the position set and track confidence of each undetermined track, wherein the undetermined tracks are output by the decision model;

and selecting the target track from the undetermined tracks according to the track confidence degrees.

Optionally, the decision model comprises a first sub-model and a second sub-model;

obtaining a target track of the unmanned equipment reaching the target position and a track confidence coefficient of the target track through a decision model according to the training sample and the target position, and specifically comprising:

inputting the training sample and the target position into the first submodel to obtain the target track output by the first submodel;

and inputting the target track into the second submodel, and obtaining the track confidence coefficient of the target track by the second submodel according to a plurality of preset parameters.

Optionally, inputting the training sample and the target position into the first submodel to obtain the target trajectory output by the first submodel, specifically including:

acquiring information of each obstacle in the environment where the unmanned equipment is located;

inputting the training samples, the information of each obstacle and the target position into the first submodel to obtain a plurality of undetermined tracks output by the first submodel;

aiming at each obstacle, acquiring a historical track of the obstacle, and determining an estimated track of the obstacle according to the historical track and the information of the obstacle;

and selecting the target track from the undetermined tracks according to the undetermined tracks and the estimated track of each obstacle.

Optionally, the parameters of the second submodel include a velocity weight, an offset weight;

inputting the target track into the second submodel, and obtaining the track confidence coefficient of the target track by the second submodel according to a plurality of preset parameters, wherein the track confidence coefficient specifically comprises the following steps:

determining each track point of the target track, wherein the information of the track point comprises the position of the track point and the speed of the track point;

and determining the track confidence of the target track according to at least one of the position of each track point, the offset weight, the speed of each track point and the speed weight through the second submodel.

The present specification provides a method of trajectory planning, the method comprising:

acquiring current state information of the unmanned equipment;

inputting the state information into a position determination model to obtain a target position output by the position determination model;

inputting the state information and the target position into a decision model to obtain a plurality of tracks output by the decision model and a track confidence coefficient of each track;

selecting a designated track from the tracks according to the track confidence coefficient of each track, and controlling the unmanned equipment to run according to the designated track, wherein the position determination model and the decision model are trained in advance by the model training method described above.

The present specification provides an apparatus for model training, the apparatus comprising:

the first acquisition module is used for acquiring the state information of the unmanned equipment as a training sample;

the training module is used for carrying out iterative training on the position determination model and the decision model by adopting the following method according to the training sample:

the first input sub-module is used for acquiring the track confidence coefficient obtained by the last iterative training aiming at each iterative training, inputting the training sample into the position determination model to obtain a target position output by the position determination model according to the acquired track confidence coefficient, and the position determination model is used for planning the target position of the unmanned equipment;

and the second input submodule is used for obtaining a target track of the unmanned equipment reaching the target position and a track confidence coefficient of the target track through a decision model according to the training sample and the target position.

The present specification provides an apparatus for trajectory planning, the apparatus comprising:

the second acquisition module is used for acquiring the current state information of the unmanned equipment;

the position determining module is used for inputting the state information into a position determining model to obtain a target position output by the position determining model;

a track determining module, configured to input the state information and the target position into a decision model, so as to obtain a plurality of tracks output by the decision model and a track confidence of each track;

and the control module is used for selecting a designated track from the tracks according to the track confidence coefficient of each track and controlling the unmanned equipment to run according to the designated track, wherein the position determination model and the decision model are trained in advance by the model training method described above.

The present specification provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the above method for model training and trajectory planning.

The present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the method for model training and trajectory planning is implemented.

The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:

State information of the unmanned equipment can be obtained as a training sample, and the position determination model and the decision model are iteratively trained on the training sample as follows. In each training iteration, the track confidence coefficient obtained in the previous iteration is acquired, and the training sample is input into the position determination model to obtain a target position that the position determination model outputs according to the training sample and the acquired track confidence coefficient, where the position determination model is used for planning the target position of the unmanned equipment. A target track along which the unmanned equipment reaches the target position, and a track confidence coefficient of the target track, are then obtained through the decision model according to the training sample and the target position.

With this method, the position determination model can redetermine a better target position in each iteration based on the track confidence of the target track, and the decision model replans the target track based on that target position, so that both models are trained. After training, a better target position is determined by the trained position determination model, a better target track is determined by the trained decision model, and the unmanned equipment is controlled to operate according to that target track.

Drawings

The accompanying drawings are included to provide a further understanding of the specification and constitute a part of it. The illustrative embodiments of the specification and their description serve to explain the specification and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of a method for model training provided by an embodiment of the present specification;

FIG. 2 is a schematic diagram of a target position and a target track determined in a model training process according to an embodiment of the present specification;

FIG. 3 is a flowchart of a method for trajectory planning according to an embodiment of the present specification;

FIG. 4 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present specification;

FIG. 5 is a schematic structural diagram of a trajectory planning apparatus provided in an embodiment of the present specification;

FIG. 6 is a schematic diagram of an electronic device for implementing the method for model training and trajectory planning according to an embodiment of the present specification.

Detailed Description

To make the objects, technical solutions, and advantages of the present specification clearer, the technical solutions of the present specification are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The embodiments described are only some of the embodiments of the present specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification, without creative effort, fall within the protection scope of the present specification.

When the unmanned device is operated, control information such as an accelerator control amount can be obtained through a machine learning model such as a reinforcement learning model based on the state of the unmanned device, surrounding environment information and the like, and then the operation of the unmanned device can be controlled according to the control information. The unmanned equipment mainly comprises intelligent unmanned equipment such as unmanned vehicles and unmanned aerial vehicles, and is mainly used for replacing manual goods delivery, for example, goods after being sorted are transported in a large goods storage center, or the goods are transported to another place from a certain place.

In this process, the unmanned device obtains the control quantity for the current moment from the information at the current moment, operates according to it, and then obtains the control quantity for the next moment from the information at the next moment. The unmanned device therefore never plans a trajectory. It simply operates according to control information obtained at successive moments, which may result in poor comfort and similar problems.

Accordingly, the present specification provides a method for model training and trajectory planning to solve at least some of the above problems.

In this specification, a target position is determined by a position determination model, and a decision model determines a target trajectory along which the unmanned device travels from its current position to the target position, together with a trajectory confidence of that target trajectory. In the model training phase, the position determination model again obtains a better target position based on the trajectory confidence of the target trajectory obtained in the previous iteration, the decision model replans the target trajectory for that target position and determines its trajectory confidence, and the two models are trained iteratively in this manner. After the position determination model and the decision model are trained, the unmanned device can determine a target position through the position determination model according to its own state information, surrounding environment information, and the like, determine through the decision model a target trajectory along which it reaches the target position, and finally be controlled to run according to the target trajectory.

Therefore, the contents of training the position determination model and the decision model will be described first. The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method for model training according to an embodiment of the present specification. As shown in FIG. 1, state information of an unmanned device may first be obtained as a training sample, and the position determination model and the decision model may then be iteratively trained based on the training sample.

First, the determination of training samples.

In this specification, the position determination model and the decision model may be trained jointly or separately. When they are trained jointly, state information of the unmanned device and information about the environment in which it is located may be acquired as a training sample. The state information may include the position, speed, acceleration, and similar information of the unmanned device. The environment information may include road information, information about each obstacle, and the like. Obstacles may be classified into dynamic obstacles, such as vehicles and pedestrians, and static obstacles, such as road isolation devices. The joint training of the two models is mainly described here; for the process of training each model separately, see the description below.

An iterative process of training the position determination model and the decision model based on the training samples is then described.

For each iteration training, firstly, the track confidence coefficient obtained by the last iteration training can be obtained, then, the training sample can be input into the position determination model, and the target position output by the position determination model according to the training sample and the obtained track confidence coefficient is obtained, wherein the position determination model is used for planning the target position of the unmanned equipment.

Specifically, the position determination model and the decision model may be machine learning models, and the position determination model may be based on reinforcement learning or on imitation learning. Whichever model is used, the input of the position determination model may include the training sample and the trajectory confidence obtained in the previous training iteration, and its output is the target position. In other words, the position determination model determines the target position that the unmanned device needs to reach based on the information of the unmanned device and the information of the surrounding environment.

The target position is a position in the environment in which the unmanned device is located, for example, a position ahead of the unmanned device in the current lane, or a position diagonally ahead of the unmanned device in a lane adjacent to the current lane. The target position may be represented by two-dimensional coordinates. The coordinate system may be one that takes a lane line as a coordinate axis, or a coordinate system such as the Universal Transverse Mercator Grid System (UTM). Note that when the current iteration is the first iteration, no previous iteration exists, so the trajectory confidence may be initialized at this point.

After the target position is obtained according to the position determination model, a plurality of designated positions can be determined in the neighborhood of the target position, and a position set is formed by the target position and each designated position. Each position in the position set can be used as a position to which the unmanned equipment needs to arrive, that is, can be used as a basis for planning a track of the unmanned equipment.

The number of designated positions may be fixed or variable. At least one designated position may be chosen at random within the neighborhood of the target position, or the designated positions may be determined according to a rule. For example, the rule may take as designated positions those locations in the neighborhood whose distance from the target position is an integer multiple of a preset length: if the preset length is 0.5 m, the positions 0.5 m in front of, behind, to the left of, and to the right of the target position may be taken as designated positions, as may the position 1 m in front of the target position, and so on. Other rules are also possible and are not detailed here.
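The rule-based variant can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_position_set`, the axis-aligned offsets, and the choice to use all four directions at every multiple are hypothetical and are not specified in this description.

```python
from typing import List, Tuple

Position = Tuple[float, float]


def build_position_set(
    target: Position, step: float = 0.5, multiples: int = 2
) -> List[Position]:
    """Return the target position followed by designated positions around it.

    Assumption: designated positions lie at integer multiples of `step`
    in front of, behind, left of, and right of the target position.
    """
    x, y = target
    positions = [target]
    for k in range(1, multiples + 1):
        d = k * step
        positions.extend([(x + d, y), (x - d, y), (x, y + d), (x, y - d)])
    return positions


# One target plus 4 directions at 0.5 m and 1.0 m.
position_set = build_position_set((10.0, 2.0))
print(len(position_set))
```

Each element of `position_set` can then serve as a candidate end point when trajectories are planned.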

Here, determining the position set from the target position is an optional step. That is, in this specification, if a position set is obtained from the target position, the training sample and the position set may be input into the decision model; if no position set is obtained, the training sample and the target position may be input into the decision model.

And finally, according to the training sample and the target position, obtaining a target track of the unmanned equipment reaching the target position and a track confidence coefficient of the target track through a decision model.

Specifically, after the position set is obtained, the training sample and the position set may be input into the decision model to obtain, as output of the decision model, an undetermined trajectory along which the unmanned device reaches each position in the position set and the trajectory confidence of each undetermined trajectory. If no position set is obtained from the target position, the training sample and the target position may be input into the decision model to obtain a plurality of undetermined trajectories along which the unmanned device reaches the target position and the trajectory confidence of each undetermined trajectory. A target trajectory can then be selected from the undetermined trajectories according to their trajectory confidences.

Further, the decision model may include a first submodel, which plans a trajectory of the unmanned device from the current position to each position in the position set, and a second submodel, which determines the trajectory confidence of each trajectory. Therefore, the training sample, the target position, and similar information can be input into the first submodel to obtain each trajectory output by the first submodel, and the trajectories output by the first submodel can be input into the second submodel, which obtains the trajectory confidence of each trajectory according to a plurality of preset parameters.

With respect to the first submodel: information about each obstacle in the environment in which the unmanned device is located may be obtained, and the training sample, the information about each obstacle, the target position, and the like may be input into the first submodel to obtain a plurality of undetermined trajectories output by the first submodel. For each obstacle, the historical trajectory of the obstacle is acquired, and an estimated trajectory of the obstacle is determined according to the historical trajectory and the information about the obstacle. The target trajectory is then selected from the undetermined trajectories according to the undetermined trajectories and the estimated trajectory of each obstacle.

Obtaining a plurality of trajectories through the first submodel and determining the estimated trajectory of an obstacle from the obstacle's information can follow existing technical solutions. Many schemes for planning trajectories with machine learning models already exist, so they are not described again here.

When a target trajectory is selected, each undetermined trajectory may be compared one by one with the estimated trajectory of each obstacle. If the undetermined trajectory overlaps the estimated trajectory of any obstacle, the unmanned device may collide with that obstacle when running along the undetermined trajectory, so the undetermined trajectory can be removed. If the undetermined trajectory does not overlap the estimated trajectory of any obstacle, a collision is unlikely when the unmanned device runs along it, so the undetermined trajectory need not be removed. After all undetermined trajectories have been screened in this way, the second submodel may determine the trajectory confidence of each remaining undetermined trajectory and the target trajectory may be determined from those confidences, or the target trajectory may be selected from the remaining undetermined trajectories according to a preset rule. The preset rule may require, for example, that the curvature of the undetermined trajectory satisfy a preset curvature limit, or that the speed at each trajectory point of the undetermined trajectory lie within a preset speed interval. Of course, in this specification, the remaining undetermined trajectories may also be used directly as target trajectories.
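The screening step can be sketched as below. The description does not define how overlap is tested, so this sketch assumes, purely for illustration, that trajectories are sampled at the same time steps and that two trajectories overlap when they come within a hypothetical `radius` of each other at the same step. The names `overlaps` and `remove_colliding` are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def overlaps(candidate: Trajectory, predicted: Trajectory, radius: float = 1.0) -> bool:
    """True if the two trajectories come within `radius` at the same time step.

    Assumption: both trajectories are sampled at identical time steps.
    """
    return any(math.dist(p, q) < radius for p, q in zip(candidate, predicted))


def remove_colliding(
    candidates: List[Trajectory],
    obstacle_predictions: List[Trajectory],
    radius: float = 1.0,
) -> List[Trajectory]:
    """Keep only candidates that overlap no obstacle's estimated trajectory."""
    return [
        c
        for c in candidates
        if not any(overlaps(c, o, radius) for o in obstacle_predictions)
    ]


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
swerve = [(0.0, 0.0), (1.0, 1.5), (2.0, 3.0)]
obstacle = [(4.0, 0.0), (3.0, 0.0), (2.0, 0.5)]  # estimated obstacle trajectory

remaining = remove_colliding([straight, swerve], [obstacle])
print(remaining)
```

Here `straight` passes within 0.5 m of the obstacle at the last step and is removed, while `swerve` stays at least 2.5 m away and is kept.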

With respect to the second submodel: the parameters of the second submodel may include a speed weight and an offset weight, where the speed weight may in turn include a travel speed weight, an acceleration weight, and the like. In this specification, each trajectory point of the target trajectory may be determined, where the information of a trajectory point includes the position of the trajectory point, the speed (that is, the travel speed) at the trajectory point, the acceleration of the unmanned device at the trajectory point, and the like. The target trajectory may then be input into the second submodel, which determines the trajectory confidence of the target trajectory according to at least one of the position of each trajectory point, the offset weight, the speed of each trajectory point, and the speed weight.

Specifically, the information of the target trajectory (that is, the information of each trajectory point on the target trajectory) may be input into the second submodel, which determines the trajectory confidence of the target trajectory according to one or more of the following: the position of each trajectory point and the offset weight, the travel speed at each trajectory point and the travel speed weight, and the acceleration of the unmanned device at each trajectory point and the acceleration weight. In practice, the second submodel may be expressed as a formula: the product of the position of each trajectory point and the offset weight is taken as a first product, the product of the travel speed at each trajectory point and the travel speed weight as a second product, and the product of the acceleration of the unmanned device at each trajectory point and the acceleration weight as a third product. The trajectory confidence of the target trajectory may then be taken as the weighted value given by any one of the three products, or as the weighted sum of several of them.
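A minimal sketch of the weighted-sum form is given below. Several details are assumptions made only for illustration: the position of a trajectory point is reduced to a scalar `offset`, the three products are summed over all trajectory points, and the weight values (including their signs) are arbitrary. The names `TrackPoint` and `trajectory_confidence` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrackPoint:
    offset: float        # assumed scalar form of the point's position, in m
    speed: float         # travel speed at the point, in m/s
    acceleration: float  # acceleration of the device at the point, in m/s^2


def trajectory_confidence(
    points: Sequence[TrackPoint],
    offset_weight: float,
    speed_weight: float,
    acceleration_weight: float = 0.0,
) -> float:
    """Sum the three weighted products over all trajectory points."""
    total = 0.0
    for p in points:
        first = p.offset * offset_weight              # first product
        second = p.speed * speed_weight               # second product
        third = p.acceleration * acceleration_weight  # third product
        total += first + second + third
    return total


points = [TrackPoint(0.5, 4.0, 1.0), TrackPoint(0.25, 4.0, 0.0)]
# Illustrative weights: penalise offset and acceleration, reward speed.
confidence = trajectory_confidence(points, -1.0, 0.5, -0.25)
print(confidence)  # (-0.5 + 2.0 - 0.25) + (-0.25 + 2.0 + 0.0) = 3.0
```

Setting a weight to zero drops the corresponding product, which covers the case where only one or two of the products are used.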

In addition, when the second submodel determines the trajectory confidence of the target trajectory or of each undetermined trajectory, the information of each obstacle, in particular the estimated trajectory of each obstacle, may also be input into the second submodel for each trajectory whose confidence is to be determined. The second submodel then determines the trajectory confidence based on the estimated trajectory of each obstacle, the preset parameters (including the offset weight, the travel speed weight, and the like described above), and the information of the trajectory. When obstacle information is used in this way, the trajectory confidence is generally determined from two checks: whether the estimated trajectory of any obstacle overlaps the trajectory, which indicates whether the unmanned device would collide with that obstacle, and whether the distance between the estimated trajectory of each obstacle and the trajectory exceeds a preset safe distance, which indicates whether the unmanned device keeps a sufficient safe distance from each obstacle while traveling along the trajectory.

When the second submodel determines the trajectory confidence of each undetermined trajectory, a target trajectory must be selected from the undetermined trajectories according to those confidences. The undetermined trajectories may be sorted by trajectory confidence and the target trajectory selected from the sorted result. For example, the undetermined trajectory with the highest trajectory confidence may be selected as the target trajectory, or one may be selected at random from the several top-ranked undetermined trajectories. Alternatively, a trajectory confidence threshold may be preset, and the target trajectory selected at random from the undetermined trajectories whose trajectory confidence exceeds the threshold, and so on.
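The three selection strategies can be sketched as follows. The function names are hypothetical, and each function returns the index of the chosen undetermined trajectory rather than the trajectory itself.

```python
import random
from typing import Optional, Sequence


def select_best(confidences: Sequence[float]) -> int:
    """Index of the trajectory with the highest confidence."""
    return max(range(len(confidences)), key=lambda i: confidences[i])


def select_from_top_k(confidences: Sequence[float], k: int, rng: random.Random) -> int:
    """Random index among the k highest-confidence trajectories."""
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return rng.choice(ranked[:k])


def select_above_threshold(
    confidences: Sequence[float], threshold: float, rng: random.Random
) -> Optional[int]:
    """Random index among trajectories above the threshold, or None if there are none."""
    eligible = [i for i, c in enumerate(confidences) if c > threshold]
    return rng.choice(eligible) if eligible else None


confidences = [0.2, 0.9, 0.6]
rng = random.Random(0)
print(select_best(confidences))                       # 1
print(select_from_top_k(confidences, 2, rng) in (1, 2))
print(select_above_threshold(confidences, 0.95, rng))  # None
```

Returning `None` when no trajectory clears the threshold is a design choice of this sketch; the description does not say what should happen in that case.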

FIG. 2 is a schematic diagram of a target position and target trajectories determined during model training according to an embodiment of this specification. In FIG. 2, vehicle A is the unmanned device, vehicle B is an obstacle, the black dot is the target position, and each dotted line between vehicle A and the black dot represents an undetermined trajectory. The target position shown in panel (a) of FIG. 2 is determined by the position determination model according to information about vehicle A and vehicle B, road information, and the like. The three undetermined trajectories from vehicle A to the target position shown in panel (b) of FIG. 2 are planned by the decision model according to the target position and other information, and the decision model also determines the trajectory confidence of each of the three. All three undetermined trajectories may be used as target trajectories, in which case the trajectory confidences of all three are input into the position determination model in the next training iteration. Alternatively, one target trajectory may be selected from the three according to their trajectory confidences, in which case the trajectory confidence of that target trajectory is input into the position determination model in the next training iteration. Because the target position shown in panel (a) of FIG. 2 lies between vehicle A and vehicle B, the trajectory confidence of the target trajectory shown in panel (b) is low. After this trajectory confidence is input into the position determination model in the next iteration, the position determination model may determine the target position shown in panel (c) of FIG. 2, and the decision model then again determines a target trajectory and its trajectory confidence based on the target position shown in panel (c). The position determination model and the decision model are iteratively trained in this way until a training condition is met. The training condition may include the number of iterations exceeding a preset iteration threshold, the trajectory confidence of the target trajectory converging, and the like.

Based on the above, when the position determination model and the decision model are integrally trained, the whole process is to input the training sample and the track confidence obtained in the previous iteration process into the position determination model, determine the target position by the position determination model, input the information of the target position into the decision model, and determine the target track and the track confidence of the target track by the decision model. Therefore, the idea of reinforcement learning is adopted for the overall training of the position determination model and the decision model, that is, the track confidence obtained in the previous iteration process can be used as a reward, and based on the reward obtained in the previous iteration process and the training sample, the position determination model outputs a target position (i.e., the target position can be used as an action), and the decision model determines a target track (i.e., it can be regarded as performing an action) and a track confidence of the target track (i.e., it can be regarded as a reward given when performing an action) based on the target position. 
Then, with reward maximization as the training target, the position determination model and the decision model are trained as a whole. That is, the better the target track (for example, the closer the target track is to expert data), the higher its track confidence and thus the higher the reward. In the next iteration, based on the higher reward, the position determination model determines a better target position; based on the better target position, the decision model determines a better target track, whose track confidence is in turn higher. This cycle repeats, and the overall training of the position determination model and the decision model is completed when the number of iterations reaches a preset threshold, when the reward stabilizes, or when another preset condition is met.
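As a non-limiting illustration, the data flow of one training run and its stopping conditions can be sketched as follows. Only the feedback of the track confidence (the reward) into the position determination model is shown; the update of model parameters is deliberately omitted. The names `iterate_training`, `position_model`, `decision_model`, `max_iterations`, and `tolerance` are hypothetical, and the two models are assumed to be plain callables.

```python
from typing import Callable, List, Tuple

Position = Tuple[float, float]
Track = List[Position]


def iterate_training(
    sample: dict,
    position_model: Callable[[dict, float], Position],
    decision_model: Callable[[dict, Position], Tuple[Track, float]],
    max_iterations: int = 50,
    tolerance: float = 1e-3,
    initial_confidence: float = 0.0,
) -> Tuple[Position, Track, float, int]:
    """Feed the previous track confidence back until it stabilizes."""
    confidence = initial_confidence
    target: Position = (0.0, 0.0)
    track: Track = []
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Action: the position model proposes a target position from the
        # training sample and the reward of the previous iteration.
        target = position_model(sample, confidence)
        # The decision model plans a target track and scores it (the reward).
        track, new_confidence = decision_model(sample, target)
        converged = abs(new_confidence - confidence) < tolerance
        confidence = new_confidence
        if converged:
            # Stopping condition: the reward has stabilized.
            break
    return target, track, confidence, iteration
```

The loop ends either when the confidence changes by less than `tolerance` between two iterations or when `max_iterations` is reached, mirroring the two example training conditions given above.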

In addition, in this specification the position determination model and the decision model may also be trained separately.

When the position determination model is trained separately, training samples and the actual position corresponding to each training sample can be obtained, where the actual position may be expert data and is used as labeling information. The training samples and related information are then input into the position determination model to be trained to obtain the target position it outputs, and the position determination model is trained in a supervised manner according to the target position and the labeling information. That is, the difference between the target position and the labeling information is determined as the loss, and the position determination model is trained with loss minimization as the training target.
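As a non-limiting illustration, one possible form of the loss is the mean squared Euclidean distance between the target positions output by the model and the labeled actual positions. This description does not fix a particular distance measure, so the choice below and the name `position_loss` are assumptions.

```python
from typing import Sequence, Tuple

Position = Tuple[float, float]


def position_loss(predicted: Sequence[Position],
                  labeled: Sequence[Position]) -> float:
    """Mean squared distance between predicted and labeled positions."""
    if len(predicted) != len(labeled) or not labeled:
        raise ValueError("predicted and labeled must be non-empty, equal length")
    total = 0.0
    for (px, py), (lx, ly) in zip(predicted, labeled):
        # Squared Euclidean distance for one training sample.
        total += (px - lx) ** 2 + (py - ly) ** 2
    return total / len(labeled)
```

The loss is zero when every output target position coincides with its label and grows as the outputs move away from the expert data.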

When the decision model is trained separately, it can be trained in a supervised manner by analogy with the training process of the position determination model. That is, information of the unmanned equipment and information of the target position can be obtained as training samples, a plurality of tracks corresponding to the training samples are obtained as labeling tracks, and the track confidence of each labeling track is obtained, where both the labeling tracks and their track confidences are expert data. The training samples are input into the decision model, which outputs a plurality of undetermined tracks and the track confidence of each undetermined track. A first difference between each undetermined track and the labeling track, and a second difference between the track confidence of each undetermined track and the track confidence of the labeling track, can then be determined. The loss is determined according to the first difference and the second difference, where the loss is positively correlated with both, and the decision model is trained in a supervised manner with loss minimization as the training target.
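As a non-limiting illustration, a loss that is positively correlated with both differences can be formed as a weighted sum with positive weights. The point-wise distance used for the first difference, the index-wise pairing of undetermined tracks with labeling tracks, and the weights `alpha` and `beta` are assumptions made for this sketch.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def track_difference(pending: Sequence[Point],
                     labeled: Sequence[Point]) -> float:
    """First difference: mean distance between corresponding track points."""
    return sum(math.dist(p, q) for p, q in zip(pending, labeled)) / len(labeled)


def decision_loss(
    pending_tracks: Sequence[Sequence[Point]],
    pending_confidences: Sequence[float],
    labeled_tracks: Sequence[Sequence[Point]],
    labeled_confidences: Sequence[float],
    alpha: float = 1.0,  # hypothetical positive weight of the first difference
    beta: float = 1.0,   # hypothetical positive weight of the second difference
) -> float:
    """Weighted sum of the track difference and the confidence difference."""
    n = len(labeled_tracks)
    first = sum(track_difference(p, q)
                for p, q in zip(pending_tracks, labeled_tracks)) / n
    second = sum(abs(c - d)
                 for c, d in zip(pending_confidences, labeled_confidences)) / n
    # Positive weights keep the loss increasing in both differences.
    return alpha * first + beta * second
```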

Of course, as described above, the decision model may also be divided into a first submodel and a second submodel. The manner in which the first submodel determines the track of the unmanned device from its current position to the target position may follow existing technical schemes, and so may the training process of the first submodel, which is not described again here.

The training of the second submodel is described below. A plurality of tracks are obtained as training samples, and the actual track confidence of each track is obtained as labeling information, where the actual track confidence may be determined by other technical schemes or directly by other modules. The training samples are input into the second submodel to obtain the estimated track confidence it outputs, and the second submodel is trained in a supervised manner according to the estimated track confidence and the labeling information. For example, the difference between the estimated track confidence and the labeling information is determined as the loss, and each parameter of the second submodel is adjusted with loss minimization as the training target.
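As a non-limiting illustration, the parameter adjustment can be sketched with a linear scoring model fitted by gradient descent on the mean squared error. The linear form of the second submodel, the per-track feature vectors, and the names `train_confidence_model`, `learning_rate`, and `epochs` are assumptions made for this sketch; this description does not prescribe the structure of the second submodel.

```python
from typing import List, Sequence, Tuple


def train_confidence_model(
    features: Sequence[Sequence[float]],  # one hypothetical feature vector per track
    labels: Sequence[float],              # actual (labeled) track confidences
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit confidence = weights . features + bias by gradient descent."""
    n = len(features)
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        # Difference between estimated confidence and labeling information.
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - y
            for row, y in zip(features, labels)
        ]
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = [
            2.0 / n * sum(e * row[j] for e, row in zip(errors, features))
            for j in range(dim)
        ]
        grad_b = 2.0 / n * sum(errors)
        # Adjust each parameter in the direction that reduces the loss.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias
```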

Based on the above, after the position determination model and the decision model are trained, the position determination model and the decision model may be applied to trajectory planning of the unmanned device and a control scenario of the unmanned device, so this specification further provides a method for trajectory planning, and fig. 3 is a flowchart of a method for trajectory planning provided in an embodiment of the specification, and specifically includes the following steps:

S200: acquiring the current state information of the unmanned equipment.

S202: inputting the state information into a position determination model to obtain a target position output by the position determination model.

S204: inputting the state information and the target position into a decision model to obtain a plurality of tracks output by the decision model and the track confidence of each track.

S206: selecting a specified track from the tracks according to the track confidence of each track, and controlling the unmanned equipment to run according to the specified track.
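As a non-limiting illustration, steps S200 to S206 can be chained as in the following sketch. The two models are assumed to be already trained and exposed as plain callables; the name `plan_trajectory` and the choice of the highest-confidence track in S206 (one of several possible selection rules) are assumptions made for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Position = Tuple[float, float]
Track = List[Position]


def plan_trajectory(
    state: dict,  # S200: current state information, assumed already acquired
    position_model: Callable[[dict], Position],
    decision_model: Callable[[dict, Position],
                             Tuple[Sequence[Track], Sequence[float]]],
) -> Tuple[Position, Track, float]:
    """Return the target position, the specified track and its confidence."""
    # S202: the position determination model outputs the target position.
    target = position_model(state)
    # S204: the decision model outputs several tracks and their confidences.
    tracks, confidences = decision_model(state, target)
    # S206: here the track with the highest track confidence is selected.
    best = max(range(len(tracks)), key=lambda i: confidences[i])
    return target, tracks[best], confidences[best]
```

The returned track would then be handed to the control logic of the unmanned equipment, which is outside the scope of this sketch.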

After the training of the position determination model and the decision model is completed, the target position can be determined based on the position determination model and the trajectory of the unmanned device can be planned based on the decision model, so that the unmanned device runs according to the planned trajectory.

Specifically, the current state information of the unmanned device can be obtained, and the environment information of the unmanned device can be obtained as well. The state information, the environment information, and the like are input into the position determination model, which outputs a target position. The state information, the environment information, the target position, and the like are then input into the decision model, which outputs a plurality of tracks and the track confidence of each track. In this specification, the decision model may also comprise a first submodel and a second submodel: the state information, the environment information, the target position, and the like are input into the first submodel to obtain a plurality of tracks output by the first submodel, and each track is input into the second submodel to obtain the track confidence of each track output by the second submodel.

After the tracks and their track confidences are obtained, the tracks can be sorted by track confidence and the designated track selected according to the sorting result. For example, a track can be randomly selected as the designated track from the several tracks ranked highest, or the track with the highest track confidence can be selected directly as the designated track. Alternatively, a track confidence threshold can be set, and a track can be randomly selected as the designated track from the tracks whose track confidence is greater than the threshold, and so on.

The designated track is composed of track points, and the information of each track point may include the position of the track point and the speed and acceleration of the unmanned device at the track point. After the designated track is determined, the unmanned device can therefore be controlled to run according to the designated track (or according to the information of the track points on the designated track).

The method for controlling the unmanned device provided by this specification can be applied in particular to delivery by unmanned devices, for example, delivery scenarios such as express delivery and takeout. Specifically, in such scenarios, delivery may be performed by an unmanned vehicle fleet composed of a plurality of unmanned devices.

Based on the method for model training described above, the embodiment of the present specification further provides a schematic structural diagram of a device for model training, as shown in fig. 4.

Fig. 4 is a schematic structural diagram of an apparatus for model training provided in an embodiment of the present disclosure, where the apparatus includes:

a first obtaining module 400, configured to obtain state information of the unmanned device as a training sample;

a training module 402, configured to perform iterative training on the position determination model and the decision model according to the training samples by using the following method:

the first input submodule 4020 is configured to, for each iterative training, obtain a trajectory confidence degree obtained by the previous iterative training, input the training sample into the position determination model, and obtain a target position output by the position determination model according to the obtained trajectory confidence degree, where the position determination model is used to plan the target position of the unmanned device;

the second input sub-module 4022 is configured to obtain, according to the training sample and the target position, a target trajectory of the unmanned device reaching the target position and a trajectory confidence of the target trajectory through a decision model.

Optionally, the second input sub-module 4022 is specifically configured to determine a plurality of designated positions in a neighborhood of the target position, and form a position set by the target position and each designated position; inputting the training sample and the position set into the decision model to obtain undetermined tracks of the unmanned equipment reaching each position in the position set and track confidence of each undetermined track, wherein the undetermined tracks are output by the decision model; and selecting the target track from the undetermined tracks according to the confidence coefficient of each track.
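As a non-limiting illustration, a position set can be formed from the target position and several designated positions in its neighborhood. This description does not specify how the designated positions are chosen; placing them evenly on a circle around the target position, and the names `build_position_set`, `radius`, and `count`, are assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float]


def build_position_set(target: Position,
                       radius: float = 1.0,
                       count: int = 4) -> List[Position]:
    """Target position plus `count` designated positions around it."""
    cx, cy = target
    positions = [target]  # the target position itself is part of the set
    for k in range(count):
        # Evenly spaced angles on a circle of the given radius (assumption).
        angle = 2.0 * math.pi * k / count
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle)))
    return positions
```

Each position in the returned list would then be passed to the decision model, so that undetermined tracks to the target position and to its neighbors can be compared by track confidence.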

Optionally, the decision model comprises a first submodel and a second submodel;

the second input sub-module 4022 is specifically configured to input the training sample and the target position into the first sub-model to obtain the target trajectory output by the first sub-model; and inputting the target track into the second submodel, and obtaining the track confidence of the target track by the second submodel according to a plurality of preset parameters.

Optionally, the second input sub-module 4022 is specifically configured to acquire information of each obstacle in an environment where the unmanned device is located; inputting the training sample, the information of each obstacle and the target position into the first submodel to obtain a plurality of undetermined tracks output by the first submodel; aiming at each obstacle, acquiring a historical track of the obstacle, and determining an estimated track of the obstacle according to the historical track and the information of the obstacle; and selecting the target track from the undetermined tracks according to the undetermined tracks and the estimated track of each obstacle.
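As a non-limiting illustration, the estimation of an obstacle track from its historical track and the subsequent screening of undetermined tracks can be sketched as follows. The constant-velocity extrapolation, the comparison of points with the same time index, and the names `estimate_obstacle_track`, `screen_tracks`, and `safe_distance` are assumptions made for this sketch; other prediction and screening rules are equally possible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def estimate_obstacle_track(history: Sequence[Point], steps: int) -> List[Point]:
    """Extrapolate the last observed displacement for `steps` time steps."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per time step (assumed constant)
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, steps + 1)]


def screen_tracks(
    pending_tracks: Sequence[Sequence[Point]],
    obstacle_tracks: Sequence[Sequence[Point]],
    safe_distance: float = 2.0,  # hypothetical minimum clearance
) -> List[Sequence[Point]]:
    """Keep the undetermined tracks that stay clear of every obstacle."""
    kept = []
    for track in pending_tracks:
        # Points with the same index are assumed to share the same time.
        clear = all(
            math.dist(p, q) >= safe_distance
            for obstacle in obstacle_tracks
            for p, q in zip(track, obstacle)
        )
        if clear:
            kept.append(track)
    return kept
```

The target track would then be selected from the tracks that remain after screening.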

Optionally, the second input sub-module 4022 is specifically configured to determine each track point of the target track, where information of a track point includes the position of the track point and the speed at the track point; and to determine, through the second submodel, the track confidence of the target track according to at least one of the position of each track point, the offset weight, the speed of each track point, and the speed weight.
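As a non-limiting illustration, a track confidence that depends on the positions and speeds of the track points, an offset weight, and a speed weight can be sketched as follows. The scoring formula, the reference line `reference_y`, and the reference speed `reference_speed` are hypothetical and are not given in this description; only the inputs are taken from the text above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrackPoint:
    x: float      # position of the track point
    y: float
    speed: float  # speed at the track point


def track_confidence(
    points: Sequence[TrackPoint],
    offset_weight: float,
    speed_weight: float,
    reference_y: float = 0.0,       # hypothetical reference line, e.g. lane center
    reference_speed: float = 10.0,  # hypothetical desired speed
) -> float:
    """Confidence in (0, 1]; larger deviations give lower confidence."""
    n = len(points)
    # Mean lateral offset of the track points from the reference line.
    mean_offset = sum(abs(p.y - reference_y) for p in points) / n
    # Mean deviation of the speed at each track point from the reference.
    mean_speed_dev = sum(abs(p.speed - reference_speed) for p in points) / n
    penalty = offset_weight * mean_offset + speed_weight * mean_speed_dev
    return 1.0 / (1.0 + penalty)
```

With non-negative weights, a track that follows the reference line at the reference speed receives a confidence of 1, and the confidence decreases as either deviation grows.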

Based on the above method for model training, the embodiment of the present specification further provides a schematic structural diagram of a trajectory planning device, as shown in fig. 5.

Fig. 5 is a schematic structural diagram of an apparatus for trajectory planning provided in an embodiment of the present specification, where the apparatus includes:

a second obtaining module 500, configured to obtain current state information of the unmanned device;

a position determining module 502, configured to input the state information into a position determining model to obtain a target position output by the position determining model;

a track determining module 504, configured to input the state information and the target position into a decision model, so as to obtain a plurality of tracks output by the decision model and a track confidence of each track;

a control module 506, configured to select a specified trajectory from the trajectories according to the confidence of the trajectories, and control the unmanned device to operate according to the specified trajectory, where the position determination model and the decision model are trained in advance by a model training method provided in the foregoing.

Embodiments of the present specification further provide a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is operable to execute the method for model training and trajectory planning described above.

Based on the method for model training and trajectory planning described above, the embodiment of the present specification further provides a schematic structural diagram of the electronic device shown in fig. 6. As shown in fig. 6, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, but may also include hardware required for other services. The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize the model training and trajectory planning method described above.

Of course, besides the software implementation, this specification does not exclude other implementations, such as logic devices or combination of software and hardware, and so on, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly programming the method flow into an integrated circuit using one of the hardware description languages described above.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. Indeed, the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.

The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims

1. A method of model training, the method comprising:

acquiring state information of the unmanned equipment as a training sample;

according to the training sample, the position determination model and the decision model are subjected to iterative training by adopting the following method:

for each iteration training, obtaining a track confidence coefficient obtained by the last iteration training, inputting the training sample and the obtained track confidence coefficient of the last iteration training into the position determination model, and obtaining a target position output by the position determination model according to the training sample and the obtained track confidence coefficient, wherein the position determination model is used for planning the target position of the unmanned equipment;

the decision model comprises a first submodel and a second submodel;

inputting the training sample and the target position into the first submodel to obtain a plurality of undetermined tracks of the unmanned equipment output by the first submodel when reaching the target position;

inputting the track to be determined into the second submodel, and obtaining the track confidence coefficient of the track to be determined according to a plurality of preset parameters by the second submodel;

selecting a target track from all the to-be-determined tracks according to a preset rule; the preset rule comprises that the track curvature of the target track meets a preset curvature limit;

inputting the training sample and the target position into the first submodel to obtain a plurality of undetermined tracks, output by the first submodel, of the unmanned equipment reaching the target position, and specifically comprising:

inputting the training sample, the information of each obstacle and the target position into the first submodel to obtain a plurality of undetermined tracks output by the first submodel;

according to a preset rule, before selecting the target track from the pending tracks, the method further comprises:

acquiring a historical track of each obstacle, and determining an estimated track of each obstacle according to the historical track and information of the obstacle;

and screening each undetermined track according to each undetermined track and the estimated track of each obstacle.

2. The method of claim 1, wherein inputting the training samples and the target location into the first submodel to obtain a number of pending trajectories for the drone to reach the target location output by the first submodel, comprises:

inputting the training sample and the position set into the first submodel to obtain undetermined tracks of the unmanned equipment, which are output by the first submodel, reaching each position contained in the position set;

inputting the to-be-determined track into the second submodel, and obtaining the track confidence of the to-be-determined track by the second submodel according to a plurality of preset parameters, wherein the track confidence specifically comprises the following steps:

inputting the undetermined track of each position, which is included in the position set and is reached by the unmanned equipment and output by the first submodel, into the second submodel, and obtaining the track confidence coefficient of each undetermined track by the second submodel according to a plurality of preset parameters;

the method further comprises the following steps:

and selecting the target track from the undetermined tracks according to the confidence coefficient of each track.

3. The method of claim 1, wherein the parameters of the second submodel include a speed weight, an offset weight;

4. A method of trajectory planning, the method comprising:

acquiring current state information of the unmanned equipment;

selecting a designated track from the tracks according to the confidence of each track, and controlling the unmanned equipment to operate according to the designated track, wherein the position determination model and the decision model are trained in advance by the method of any one of claims 1 to 3.

5. An apparatus for model training, the apparatus comprising:

the first input submodule is used for acquiring a track confidence coefficient obtained by last iterative training aiming at each iterative training, inputting the training sample and the acquired track confidence coefficient of the last iterative training into the position determination model, and obtaining a target position output by the position determination model according to the acquired track confidence coefficient, wherein the position determination model is used for planning the target position of the unmanned equipment;

the second input submodule is used for enabling the decision model to comprise a first submodel and a second submodel; inputting the training sample and the target position into the first submodel to obtain a plurality of undetermined tracks, output by the first submodel, of the unmanned equipment reaching the target position; inputting the track to be determined into the second submodel, and obtaining the track confidence coefficient of the track to be determined according to a plurality of preset parameters by the second submodel; selecting a target track from all the to-be-determined tracks according to a preset rule; the preset rule comprises that the track curvature of the target track meets a preset curvature limit;

6. An apparatus for trajectory planning, the apparatus comprising:

a control module, configured to select a specified trajectory in each trajectory according to the confidence of each trajectory, and control the unmanned aerial vehicle to operate according to the specified trajectory, wherein the position determination model and the decision model are trained in advance by the method according to any one of claims 1 to 3.

7. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-3 or 4.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-3 or 4 when executing the program.