CN112306059B - Training method, control method and device for control model

Info

Publication number
CN112306059B
Authority
CN
China
Prior art keywords
control model
track
obstacle
state data
interaction level
Prior art date
Legal status
Active
Application number
CN202011104599.5A
Other languages
Chinese (zh)
Other versions
CN112306059A (en)
Inventor
金昕泽
白钰
贾庆山
任冬淳
李阔
刘思威
Current Assignee
Tsinghua University
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Tsinghua University
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Beijing Sankuai Online Technology Co Ltd
Priority to CN202011104599.5A
Publication of CN112306059A
Application granted
Publication of CN112306059B
Status: Active
Anticipated expiration

Classifications

    • G05D1/024 - Control of position or course in two dimensions specially adapted to land vehicles, using optical position detecting means with obstacle or wall sensors in combination with a laser
    • G05D1/0214 - Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0223 - Control of position or course in two dimensions specially adapted to land vehicles, with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0246 - Control of position or course in two dimensions specially adapted to land vehicles, using optical position detecting means with a video camera in combination with image processing means
    • G05D1/0257 - Control of position or course in two dimensions specially adapted to land vehicles, using a radar
    • G05D1/0263 - Control of position or course in two dimensions specially adapted to land vehicles, using magnetic or electromagnetic means with magnetic strips


Abstract

This specification discloses a training method, a control method, and a device for a control model. State data of a first acquisition device, and of a first obstacle around it, at a set historical moment are acquired as first state data. The first state data are input into a control model to determine a planned trajectory of the first acquisition device after the set historical moment. A reward value of the reward function corresponding to the control model is then determined according to the planned trajectory, and the control model is trained according to the reward value. Because the control model is trained on the state data of the first acquisition device and of the obstacles around it at the set historical moment, a trajectory planned by the trained control model can be used to control an unmanned device, reducing the probability of collision between the unmanned device and the surrounding obstacles.

Description

Training method, control method and device for control model
Technical Field
The present disclosure relates to the field of unmanned technologies, and in particular, to a training method, a control method, and a device for a control model.
Background
At present, intersection scenes in urban traffic involve complex traffic conditions with no uniform pattern, so unmanned devices often cannot plan a reasonable motion trajectory in such complex scenes.
An unmanned device may encounter many obstacles on roads with complex traffic conditions. Collisions are usually avoided based only on the unmanned device's own situation; in practical applications, however, avoiding obstacles based only on the device's own situation is not very accurate, the possibility of colliding with other surrounding obstacles remains, and safety is low.
Therefore, how an unmanned device can plan a reasonable motion trajectory according to the interaction of the surrounding traffic participants is a problem to be solved.
Disclosure of Invention
The present disclosure provides a training method and apparatus for a control model, a storage medium, and an electronic device, so as to partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a training method of a control model, comprising the following steps:
acquiring state data of a first acquisition device and a first obstacle around the first acquisition device at a set historical moment as first state data;
inputting the first state data into a control model, and determining a planned trajectory of the first acquisition device after the set historical moment, wherein the planned trajectory is determined by the control model on the assumption that the first obstacle travels according to a second trajectory after the set historical moment; the second trajectory is determined by the control model according to the first state data alone, or is determined by the control model according to the first state data on the assumption that the first acquisition device travels according to a first trajectory after the set historical moment, the first trajectory being a basic trajectory of the first acquisition device after the set historical moment determined by the control model according to the first state data;
and determining a reward value of the reward function corresponding to the control model according to the planned trajectory, and training the control model according to the reward value.
Optionally, before inputting the first state data into the control model to be trained, the method further comprises:
acquiring state data of the first acquisition device and the first obstacle before the set historical moment as second state data;
inputting the second state data into a pre-trained interaction state prediction model to predict an obstacle interaction level of the first obstacle, different obstacle interaction levels being used to characterize different interaction states of the first obstacle;
inputting the first state data into a control model and determining a planned trajectory of the first acquisition device after the set historical moment then specifically comprises:
inputting the first state data into the control model corresponding to the obstacle interaction level, and determining a planned trajectory of the first acquisition device after the set historical moment under that obstacle interaction level.
Optionally, pre-training the interaction state prediction model specifically comprises:
obtaining a training sample;
inputting state data of a second acquisition device, and of a second obstacle around the second acquisition device, contained in the training sample into the interaction state prediction model to predict an interaction level of the second obstacle;
and training the interaction state prediction model with minimizing the deviation between the predicted interaction level of the second obstacle and the label data contained in the training sample as the optimization target.
Optionally, if the obstacle interaction level of the first obstacle at the set historical moment is K, the device interaction level of the first acquisition device at the set historical moment is K+1, where K is an integer not less than 2.
Optionally, inputting the first state data into the control model to be trained to determine the planned trajectory of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be K, obtaining first model parameters related to device interaction level K-1, and using the first model parameters as the model parameters related to device interaction level K-1 in the control model corresponding to obstacle interaction level K, wherein the first model parameters are obtained from the trained control model K-2 corresponding to obstacle interaction level K-2;
inputting the first state data into the control model K corresponding to obstacle interaction level K, and determining the motion trajectory of the first acquisition device under obstacle interaction level K as the planned trajectory, wherein the planned trajectory is the motion trajectory of the first acquisition device under device interaction level K+1 and is determined by the control model K according to the second trajectory, taking the motion trajectory of the first obstacle under obstacle interaction level K as the second trajectory; the second trajectory is determined according to the first trajectory and the first state data, taking the motion trajectory of the first acquisition device under device interaction level K-1 as the first trajectory; and the first trajectory is determined by the control model K according to the model parameters related to device interaction level K-1 and the first state data;
training the control model according to the reward value then specifically comprises:
adjusting, according to the reward value, the model parameters of obstacle interaction level K and the model parameters of device interaction level K+1 contained in the control model K, so as to train the control model K.
Optionally, inputting the first state data into a control model and determining the planned trajectory of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be 0, inputting the first state data into the control model 0 corresponding to obstacle interaction level 0, and determining the motion trajectory of the first acquisition device under device interaction level 1 as the planned trajectory of the first acquisition device under obstacle interaction level 0, wherein the planned trajectory is determined by the control model 0 according to the second trajectory, taking the motion trajectory of the first obstacle under obstacle interaction level 0 as the second trajectory; and the second trajectory is determined by the control model 0 according to the first state data;
training the control model according to the reward value then specifically comprises:
adjusting, according to the reward value, the model parameters of obstacle interaction level 0 and the model parameters of device interaction level 1 contained in the control model 0, so as to train the control model 0.
Optionally, inputting the first state data into a control model and determining the planned trajectory of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be 1, inputting the first state data into the control model 1 corresponding to obstacle interaction level 1, and determining the motion trajectory of the first acquisition device under obstacle interaction level 1 as the planned trajectory, wherein the planned trajectory is the motion trajectory of the first acquisition device under device interaction level 2 and is determined by the control model 1 according to the second trajectory, taking the motion trajectory of the first obstacle under obstacle interaction level 1 as the second trajectory; the second trajectory is determined by the control model 1 according to the first trajectory and the first state data, taking the motion trajectory of the first acquisition device under device interaction level 0 as the first trajectory; and the first trajectory is determined by the control model 1 according to the first state data;
training the control model according to the reward value then specifically comprises:
adjusting, according to the reward value, the model parameters of device interaction level 0, the model parameters of obstacle interaction level 1, and the model parameters of device interaction level 2 contained in the control model 1, so as to train the control model 1.
Optionally, determining a reward value of the reward function corresponding to the control model according to the planned trajectory and training the control model according to the reward value specifically comprises:
determining a first influence factor in the reward function according to the planned trajectory and the historical motion trajectory of the first obstacle, and determining a second influence factor in the reward function according to the planned trajectory, wherein the first influence factor is used to characterize the collision probability of the first acquisition device and the first obstacle, and the second influence factor is used to characterize the driving efficiency of the first acquisition device;
and determining the reward value of the reward function according to the first influence factor and the second influence factor, and training the control model with maximizing the reward value as the training target.
This specification provides a control method for an unmanned device, comprising:
acquiring state data of the unmanned device and of the obstacles around it at the current moment as current state data;
inputting the current state data into a control model, and determining a planned trajectory of the unmanned device after the current moment, wherein the planned trajectory is determined by the control model on the assumption that the obstacle travels according to an obstacle trajectory after the current moment; the obstacle trajectory is determined by the control model according to the current state data, or is determined by the control model according to the current state data on the assumption that the unmanned device travels according to a basic trajectory after the current moment, the basic trajectory being determined by the control model according to the current state data; and the control model is obtained by the above training method for a control model;
and controlling the unmanned device according to the planned trajectory.
This specification provides a training apparatus for a control model, comprising:
an acquisition module, configured to acquire state data of a first acquisition device and of a first obstacle around the first acquisition device at a set historical moment as first state data;
a determination module, configured to input the first state data into a control model and determine a planned trajectory of the first acquisition device after the set historical moment, wherein the planned trajectory is determined by the control model on the assumption that the first obstacle travels according to a second trajectory after the set historical moment; the second trajectory is determined by the control model according to the first state data alone, or is determined by the control model according to the first state data on the assumption that the first acquisition device travels according to a first trajectory after the set historical moment, the first trajectory being a basic trajectory of the first acquisition device after the set historical moment determined by the control model according to the first state data;
and a training module, configured to determine a reward value of the reward function corresponding to the control model according to the planned trajectory, and to train the control model according to the reward value.
This specification provides a control apparatus for an unmanned device, comprising:
an acquisition module, configured to acquire state data of the unmanned device and of the obstacles around it at the current moment as current state data;
a determination module, configured to input the current state data into a control model and determine a planned trajectory of the unmanned device after the current moment, wherein the planned trajectory is determined by the control model on the assumption that the obstacle travels according to an obstacle trajectory after the current moment; the obstacle trajectory is determined by the control model according to the current state data, or is determined by the control model according to the current state data on the assumption that the unmanned device travels according to a basic trajectory, the basic trajectory being determined by the control model according to the current state data; and the control model is obtained by the above training method for a control model;
and a control module, configured to control the unmanned device according to the planned trajectory.
This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above training method for a control model or the above control method for an unmanned device.
This specification provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the above training method for a control model or the above control method for an unmanned device.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
in the training method of the control model provided by the specification, state data of the first collecting device and first obstacles around the first collecting device at the set historical moment are obtained to be used as the first state data, the first state data are input into the control model, a planned track of the first collecting device after the set historical moment is determined, wherein the planned track is determined by the control model according to the condition that the first obstacles travel according to the second track after the set historical moment, the second track is determined by the control model according to the first state data, or the control model is determined according to the first state data when the first collecting device travels according to the first track after the set historical moment, and the first track is a basic track of the first collecting device, which is determined by the control model according to the first state data, after the set historical moment. And then determining the rewarding value of the rewarding function corresponding to the control model according to the planning track, and training the control model according to the rewarding value.
The above model training process can show that the motion trail of the first acquisition device output by the control model after the setting of the history time is determined by taking the first obstacle into consideration that the first acquisition device runs according to a certain motion trail after the setting of the history time. By training the control model in the mode, the control model is applied to practical application, so that the unmanned equipment can determine the driving strategy under the condition that the next motion state of surrounding obstacles is considered, the safe driving of the unmanned equipment is effectively ensured, and the collision with the surrounding obstacles is avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of this specification, illustrate exemplary embodiments of this specification and, together with their description, serve to explain it; they are not intended to limit this specification unduly. In the drawings:
fig. 1 is a schematic flow chart of a training method for a control model according to an embodiment of this specification;
fig. 2 is a schematic diagram of motion trajectories of an acquisition device and an obstacle according to an embodiment of this specification;
fig. 3 is a schematic diagram of training a control model with an obstacle interaction level of 4 according to an embodiment of this specification;
fig. 4 is a schematic diagram of the training process of control model 0 according to an embodiment of this specification;
fig. 5 is a schematic flow chart of a control method for an unmanned device according to an embodiment of this specification;
fig. 6 is a schematic diagram of an interaction state prediction model and control models according to an embodiment of this specification;
fig. 7 is a schematic structural diagram of a training apparatus for a control model according to an embodiment of this specification;
fig. 8 is a schematic structural diagram of a control apparatus for an unmanned device according to an embodiment of this specification;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of this specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
In the embodiments of this specification, a trained control model is needed before a trajectory can be planned according to the first state data, so the process of training the control model is described first, as shown in fig. 1.
Fig. 1 is a flow chart of a training method of a control model according to an embodiment of the present disclosure, which specifically includes the following steps:
s100: and acquiring state data of the first acquisition equipment and first obstacles around the first acquisition equipment at the set historical moment as first state data.
In the embodiments of this specification, the first acquisition device may, during its movement, collect its own state data and the state data of the first obstacles around it at a set historical moment. The first acquisition device refers to the device that performs data collection for the training of the control model, such as a vehicle driven by a human or a robot controlled by a human. The first obstacle refers to an object that can move around the first acquisition device during its movement, such as a vehicle, a bicycle, or a pedestrian, i.e., a dynamic obstacle that may interfere with the movement of the first acquisition device. The set historical moment may be a manually selected historical moment.
The execution subject for training the control model may be a server, or an electronic device such as a desktop computer; for convenience of description, the training method for a control model provided in this specification is described below with the server as the execution subject.
The server may acquire the state data of the first acquisition device and of the first obstacles around it at the set historical moment. That is, the first acquisition device is mainly responsible for the data collection that precedes the training of the control model. The acquired state data may include: position data, speed data, and steering-angle data of the first acquisition device and of the obstacles around it, and so on. During the movement of the first acquisition device there may be multiple first obstacles around it, so the first acquisition device may collect the state data of each surrounding first obstacle.
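For illustration only, the first state data described above might be organized as in the following minimal Python sketch; the class names, fields, and units are assumptions introduced here, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AgentState:
    """State of one traffic participant at a single moment: the position,
    speed, and steering-angle data the text lists (units assumed)."""
    x: float               # position, m
    y: float               # position, m
    speed: float           # m/s
    steering_angle: float  # rad

@dataclass
class FirstStateData:
    """First state data at the set historical moment: the state of the
    first acquisition device plus that of every surrounding first obstacle."""
    ego: AgentState
    obstacles: List[AgentState]
```

A single training sample would then pair one such snapshot with the obstacles' actual trajectories after the set historical moment, which are known because the data are historical.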
S102: Inputting the first state data into a control model, and determining a planned trajectory of the first acquisition device after the set historical moment, wherein the planned trajectory is determined by the control model on the assumption that the first obstacle travels according to a second trajectory after the set historical moment; the second trajectory is determined by the control model according to the first state data alone, or according to the first state data on the assumption that the first acquisition device travels according to a first trajectory after the set historical moment, the first trajectory being a basic trajectory of the first acquisition device after the set historical moment determined by the control model according to the first state data.
The motion trajectory output by the control model is the trajectory that the control model determines the first acquisition device should travel after the set historical moment. This trajectory is determined by the control model on the assumption that the first obstacle takes into account that the first acquisition device travels according to a certain trajectory after the set historical moment.
The basic trajectory mentioned above can be understood as a motion trajectory of the first acquisition device after the set historical moment that is determined from the acquired first state data alone. When determining the first trajectory (i.e., the basic trajectory), the determination may be based on all of the data contained in the first state data, or on part of it. For example, the first trajectory may be determined only from the position data of the first acquisition device and of the surrounding first obstacles in the first state data. It should be noted that the first trajectory mentioned in this specification is to be understood as the trajectory to be traveled by the first acquisition device after the set historical moment as determined by the control model without fully taking into account the interaction between the first acquisition device and the first obstacle.
The second trajectory may be the trajectory along which the control model predicts, based on the first state data, that the first obstacle will travel after the set historical moment, or the trajectory along which the control model predicts, based on the first state data and the first trajectory, that the first obstacle will travel after the set historical moment. The former case can be understood as the trajectory predicted by the control model when the first obstacle takes only the first state data into account, while the latter can be understood as the trajectory predicted by the control model, in combination with the first state data, when the first obstacle takes into account that the first acquisition device travels according to the first trajectory described above.
Further, the planned trajectory of the first acquisition device output by the control model for the period after the set historical moment is in fact determined by the control model on the assumption that the first acquisition device considers that the first obstacle travels according to the second trajectory after the set historical moment. It can be seen from the above that the decision logic in the control model fully considers how the obstacles around the first acquisition device will travel after the set historical moment, so deploying the trained control model in an actual unmanned device can effectively ensure the safety of the unmanned device during actual driving, as shown in fig. 2.
Fig. 2 is a schematic diagram of motion trajectories of an acquisition device and an obstacle according to an embodiment of this specification.
In the intersection scenario shown in fig. 2, the first acquisition device predicts that the first obstacle expects the first acquisition device to travel straight ahead (i.e., dashed line 1 in fig. 2), so the first obstacle will change lanes to avoid colliding with a straight-traveling first acquisition device (i.e., dashed line 4 in fig. 2). Based on this predicted lane change of the first obstacle and the state data of the first obstacle, the first acquisition device then decides to change lanes itself (i.e., dashed line 2 in fig. 2) or to accelerate straight ahead (i.e., accelerate along dashed line 1 in fig. 2), so as to avoid colliding with the first obstacle and improve driving efficiency.
In the embodiments of this specification, before the first state data are input into the control model to be trained, the state data of the first acquisition device and the first obstacle before the set historical moment (i.e., over a past period of time), collected by the first acquisition device, are used as second state data, and the second state data are input into a pre-trained interaction state prediction model to predict the obstacle interaction level of the first obstacle. The obstacle interaction level mainly reflects the strategy by which the control model predicts the travel trajectory of the first obstacle after the set historical moment. Different obstacle interaction levels correspond to different strategies adopted by the control model to predict the second trajectory. For example, for the same first state data and first trajectory, the second trajectories finally determined by the control model under the strategies of different obstacle interaction levels will also differ.
The interaction state prediction model may be a recurrent neural network (RNN), or a model obtained from a variant of the recurrent neural network such as a long short-term memory (LSTM) network; this specification does not limit the specific form of the interaction state prediction model.
Further, in the embodiments of this specification, different obstacle interaction levels may correspond to different control models, so the second state data may be input into the pre-trained interaction state prediction model to determine which control model should be applied given the second state data of the first acquisition device and the surrounding first obstacles before the set historical moment.
The interaction state prediction model is trained by inputting the state data of a second acquisition device, and of a second obstacle around it, contained in a training sample into the interaction state prediction model to predict the interaction level of the second obstacle, with minimizing the deviation between the predicted interaction level of the second obstacle and the label data contained in the training sample as the optimization target.
The training sample refers to pre-collected state data of the second acquisition device and of the second obstacles around it, together with label data preset for those state data. The second acquisition device mentioned here and the first acquisition device mentioned above may be the same device or different devices with the same function; "first obstacle" and "second obstacle" are only used to distinguish the obstacles around the first acquisition device from those around the second acquisition device.
The first state data and the second state data also differ: the first state data are the state data of the first acquisition device and the first obstacle at a single moment, while the second state data are the state data of the second acquisition device and the second obstacle over a period of time before that moment. The interaction state prediction model and the control model can be trained independently of each other.
During the training of the interaction state prediction model, the interaction level of the second obstacle is predicted from the second state data of the second acquisition device and of the surrounding second obstacles contained in the training sample; the predicted level is compared with the label data, the interaction state prediction model is optimized according to the deviation between them, and the deviation is continuously reduced over multiple rounds of iterative training, thereby completing the training of the interaction state prediction model.
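As a concrete illustration of this training procedure, the following is a minimal PyTorch-style sketch of an LSTM classifier that maps a window of past state data to an obstacle interaction level; the feature layout, network sizes, and number of levels are assumptions introduced here, not specified in the patent.

```python
import torch
import torch.nn as nn

class InteractionStatePredictor(nn.Module):
    """LSTM that classifies a sequence of past state data (the second
    state data) into one of `num_levels` obstacle interaction levels."""
    def __init__(self, feat_dim: int = 8, hidden: int = 64, num_levels: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_levels)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, feat_dim), state data over a past window
        _, (h_n, _) = self.lstm(seq)
        return self.head(h_n[-1])  # logits over interaction levels

# One optimization step: minimize the deviation between the predicted
# interaction level and the label data contained in the training sample.
model = InteractionStatePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

states = torch.randn(32, 20, 8)      # 32 samples, 20 past moments (dummy data)
labels = torch.randint(0, 5, (32,))  # interaction-level label data (dummy)
loss = loss_fn(model(states), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```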
In the embodiments of this specification, the control models corresponding to different obstacle interaction levels can be divided into several cases: an obstacle interaction level of K (K not less than 2) is one case, an obstacle interaction level of 0 is another, and an obstacle interaction level of 1 is a third. The strategy configuration in the control model differs among these three cases, and so does the training process; the three cases are described separately below.
For the case where the obstacle interaction level is K, the control model contains three layers of network structure, and different network structures correspond to different strategies: one layer corresponds to device interaction level K-1, one layer corresponds to obstacle interaction level K, and one layer corresponds to device interaction level K+1.
The device interaction level mainly reflects the strategy by which the control model determines the travel trajectory of the first acquisition device after the set historical moment. Under different device interaction levels, the strategies adopted by the control model to predict the first trajectory or the planned trajectory differ.
In the embodiments of this specification, the control models corresponding to different obstacle interaction levels may be trained in order of increasing K. For example, control model 2 with obstacle interaction level 2 is trained first, and the model parameters of device interaction level 1 in control model 2 can be taken directly from the model parameters of device interaction level 1 in the trained control model 0. Likewise, the model parameters of device interaction level 2 in control model 3 can be taken directly from the trained control model 1, and the model parameters of device interaction level 3 in control model 4 can be taken directly from the trained control model 2.
Therefore, in general, the control model K corresponding to obstacle interaction level K contains the network structure of obstacle interaction level K, the network structure of device interaction level K+1, and the network structure of device interaction level K-1. The model parameters of the network structure of device interaction level K-1 in control model K are obtained from the trained control model K-2, that is, from the network structure of device interaction level K-1 in the trained control model K-2.
Further, according to the model parameters of device interaction level K-1 in control model K and the first state data, the motion trajectory corresponding to device interaction level K-1 in control model K can be determined; according to that motion trajectory and the first state data, the motion trajectory corresponding to obstacle interaction level K can be determined; and according to the motion trajectory corresponding to obstacle interaction level K in control model K and the first state data, the motion trajectory corresponding to device interaction level K+1 in control model K, i.e., the planned trajectory mentioned above, can be determined, as shown in fig. 3.
Fig. 3 is a schematic diagram of training a control model with an obstacle interaction level of 4 according to an embodiment of this specification.
For the control model 4 corresponding to obstacle interaction level 4, control model 4 contains the network structure of obstacle interaction level 4, the network structure of device interaction level 5, and the network structure of device interaction level 3. The model parameters of device interaction level 3 in control model 4 can be obtained directly from the trained control model 2. It can be understood that some model parameters in control model 4 are obtained directly from a previously trained control model, so training control model 4 actually means training the model parameters that cannot be obtained directly from a previously trained control model, i.e., with some model parameters known, determining the remaining parameters that need to be adjusted through training.
Therefore, after the model parameters of the network structure of device interaction level 3 are obtained from control model 2, the motion trajectory corresponding to device interaction level 3 in control model 4, i.e., the first trajectory mentioned above, can be determined according to those model parameters and the first state data. Then, according to the motion trajectory corresponding to device interaction level 3 in control model 4 and the first state data, the motion trajectory corresponding to obstacle interaction level 4, i.e., the second trajectory mentioned above, is determined through the model parameters of obstacle interaction level 4. Finally, according to the motion trajectory corresponding to obstacle interaction level 4 in control model 4 and the first state data, the motion trajectory corresponding to device interaction level 5 in control model 4, i.e., the planned trajectory, is determined through the model parameters of device interaction level 5. In other words, the motion trajectory corresponding to each device interaction level is determined from the motion trajectory of the preceding obstacle interaction level together with the input first state data, and the motion trajectory corresponding to each obstacle interaction level is determined from the motion trajectory of the preceding device interaction level together with the input first state data.
In the above example, the model parameters of obstacle interaction level 4 and the model parameters of device interaction level 5 in control model 4 are the other part of the model parameters, the part that needs to be adjusted; training control model 4 therefore in fact means adjusting the model parameters of obstacle interaction level 4 and the model parameters of device interaction level 5.
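For illustration, the parameter-reuse scheme just described might look like the following sketch, under assumed class names, dimensions, and architectures: each control model K holds three level policies; its device-level-(K-1) policy is copied from the trained control model K-2 and frozen, so that only the obstacle-level-K and device-level-(K+1) parameters are adjusted during training.

```python
import copy
import torch
import torch.nn as nn

STATE_DIM = 8   # assumed size of the flattened first state data
TRAJ_DIM = 20   # assumed size of an encoded trajectory

class LevelPolicy(nn.Module):
    """Policy for one interaction level: maps the first state data, plus the
    preceding level's trajectory (zeros at the lowest level), to a trajectory.
    The architecture is an illustrative assumption."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + TRAJ_DIM, 64), nn.ReLU(),
            nn.Linear(64, TRAJ_DIM))

    def forward(self, state, prev_traj=None):
        if prev_traj is None:  # lowest level: no preceding trajectory
            prev_traj = torch.zeros(state.shape[0], TRAJ_DIM)
        return self.net(torch.cat([state, prev_traj], dim=-1))

class ControlModelK(nn.Module):
    """Control model for obstacle interaction level K (K >= 2)."""
    def __init__(self, trained_model_k_minus_2):
        super().__init__()
        # Device-level K-1 parameters are copied from the trained control
        # model K-2 (where they form its device-level K-1 network) and are
        # not adjusted when control model K is trained.
        self.device_km1 = copy.deepcopy(trained_model_k_minus_2.device_kp1)
        for p in self.device_km1.parameters():
            p.requires_grad = False
        # The two parameter sets actually adjusted when training model K:
        self.obstacle_k = LevelPolicy()   # obstacle interaction level K
        self.device_kp1 = LevelPolicy()   # device interaction level K+1

    def forward(self, state):
        first_traj = self.device_km1(state)               # first trajectory, level K-1
        second_traj = self.obstacle_k(state, first_traj)  # second trajectory, level K
        return self.device_kp1(state, second_traj)        # planned trajectory, level K+1
```

Freezing the copied parameters reflects the statement above that training control model K adjusts only the model parameters of obstacle interaction level K and device interaction level K+1.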
The training manner of the control model corresponding to obstacle interaction level 0 is shown in fig. 4.
Fig. 4 is a schematic diagram of the training process of control model 0 according to an embodiment of this specification.
For the case where the obstacle interaction level is 0, the corresponding control model 0 contains two layers of network structure, and different network structures correspond to different strategies: one layer is the network structure corresponding to obstacle interaction level 0, and the other is the network structure corresponding to device interaction level 1.
Further, in the embodiments of this specification, when the obstacle interaction level is K, the model parameters of device interaction level K-1 in control model K can be obtained from control model K-2, but the model parameters of obstacle interaction level 0 in control model 0 cannot be obtained from any previous control model, so the model parameters of each network structure in control model 0 must be determined through training.
Thus, when training control model 0, the first state data can be input into control model 0 to determine the motion trajectory of obstacle interaction level 0, i.e., the second trajectory mentioned above. Then, according to the motion trajectory of obstacle interaction level 0 and the model parameters of device interaction level 1, the motion trajectory corresponding to device interaction level 1 in control model 0, i.e., the planned trajectory, is determined. That is, training control model 0 is actually a process of continuously adjusting the model parameters of obstacle interaction level 0 and the model parameters of device interaction level 1 in control model 0 until the training target is finally reached.
The strategy represented by control model 0 is to plan the travel trajectory of the acquisition device when the obstacle does not fully consider its interaction with the acquisition device. For example, in the intersection scenario of fig. 2, the first acquisition device predicts through control model 0 that the motion trajectory of the first obstacle is straight ahead (i.e., dashed line 3 in fig. 2), since the first obstacle, not fully considering its interaction with the first acquisition device, keeps going straight; the first acquisition device will therefore change lanes (i.e., dashed line 2 in fig. 2) to avoid colliding with the straight-traveling first obstacle.
For the case where the obstacle interaction level is 1, the corresponding control model 1 contains three layers of network structure, and different network structures correspond to different strategies: one layer corresponds to device interaction level 0, one layer corresponds to obstacle interaction level 1, and one layer corresponds to device interaction level 2.
Further, since control model 0 does not contain a network structure for device interaction level 0, the model parameters of device interaction level 0 in control model 1 cannot be obtained from the previous control model (i.e., control model 0); therefore, the model parameters of each network structure in control model 1 must also be obtained through training.
Specifically, during the training of control model 1, the first state data can be input into control model 1 to determine the motion trajectory of device interaction level 0, i.e., the first trajectory mentioned above. Then, according to the motion trajectory of device interaction level 0 and the first state data, the motion trajectory of obstacle interaction level 1 in control model 1, i.e., the second trajectory, is determined through the model parameters of obstacle interaction level 1. Finally, according to the motion trajectory of obstacle interaction level 1 and the first state data, the motion trajectory corresponding to device interaction level 2 in control model 1, i.e., the planned trajectory, can be determined through the model parameters of device interaction level 2.
In other words, training control model 1 is actually a process of continuously adjusting the model parameters of device interaction level 0, obstacle interaction level 1, and device interaction level 2 in control model 1 until the training target is finally reached.
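Under the same assumed classes as the previous sketch, control models 0 and 1 could then be written as follows; they differ from control model K in which level policies they contain and in that all of their parameters are trained.

```python
class ControlModel0(nn.Module):
    """Obstacle interaction level 0: two layers, both trained. Attribute
    names follow the generic sketch above, so that control model 2 can copy
    `device_kp1` (here, device interaction level 1) from this model."""
    def __init__(self):
        super().__init__()
        self.obstacle_k = LevelPolicy()   # obstacle interaction level 0
        self.device_kp1 = LevelPolicy()   # device interaction level 1

    def forward(self, state):
        second_traj = self.obstacle_k(state)        # second trajectory, from state alone
        return self.device_kp1(state, second_traj)  # planned trajectory, level 1

class ControlModel1(nn.Module):
    """Obstacle interaction level 1: three layers, all trained, since the
    device-level-0 parameters cannot be taken from any earlier model."""
    def __init__(self):
        super().__init__()
        self.device_km1 = LevelPolicy()   # device interaction level 0
        self.obstacle_k = LevelPolicy()   # obstacle interaction level 1
        self.device_kp1 = LevelPolicy()   # device interaction level 2

    def forward(self, state):
        first_traj = self.device_km1(state)               # first trajectory, level 0
        second_traj = self.obstacle_k(state, first_traj)  # second trajectory, level 1
        return self.device_kp1(state, second_traj)        # planned trajectory, level 2
```

With these shared attribute names, control model 2 could then be built as `ControlModelK(trained_control_model_0)`, matching the training order described above.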
It should be noted that an interaction level of 0 can refer to the strategy exhibited when the first acquisition device or the first obstacle does not sufficiently consider its interaction with other devices or obstacles, for example treating them as stationary objects. That is, when the first state data are input into control model 0, the motion trajectory of obstacle interaction level 0 determined from the first state data can be understood as being determined by control model 0 only from the position data of the first acquisition device and the surrounding first obstacles in the first state data, or as the trajectory that the control model, without sufficiently considering the interaction between the first acquisition device and the first obstacle, expects the first obstacle to travel after the set historical moment. The same applies to device interaction level 0, which is not described in detail again here.
S104: Determining a reward value of the reward function corresponding to the control model according to the planned trajectory, and training the control model according to the reward value.
In the embodiments of this specification, besides ensuring that the first acquisition device does not collide with the surrounding first obstacles while traveling along the planned trajectory output by the control model, the driving efficiency of the first acquisition device along the planned trajectory should also be ensured. Therefore, the server can determine the reward value of the reward function corresponding to the control model according to the determined planned trajectory, and train the control model according to the reward value.
The reward function adopted includes: a first influence factor characterizing the collision probability between the first acquisition device and the first obstacle, and a second influence factor characterizing the driving efficiency of the first acquisition device. Since the data used in model training are all previously collected historical data, the actual motion trajectory of the first obstacle after the set historical moment is known, so the collision probability between the first acquisition device and the first obstacle after the set historical moment can be determined from this known actual trajectory and the planned trajectory output by the control model. The driving efficiency mentioned here can be represented by data such as the average speed, the acceleration, and the final speed of the first acquisition device when traveling along the planned trajectory.
Further, the higher the collision probability between the first acquisition device and the first obstacle, the lower the above reward value (i.e., the collision probability and the reward value are negatively correlated); and the higher the driving efficiency of the first acquisition device (for example, the higher its average speed along the planned trajectory), the higher the reward value (i.e., the driving efficiency and the reward value are positively correlated).
In the embodiments of this specification, the collision probability can be determined in various ways; for example, the planned trajectory and the known actual motion trajectory of the first obstacle after the set historical moment can be input into a simulation system, which determines the collision probability of the first acquisition device and the first obstacle after the set historical moment. Other ways are not illustrated in detail here.
It should be noted that the reward function used to train the control model can take many specific forms, as long as it expresses that the reward value is negatively correlated with the collision probability and positively correlated with the driving efficiency; this specification does not limit the specific form of the reward function. One such form is sketched below.
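Purely as an illustration of such a form, with assumed weights and with average speed standing in for driving efficiency:

```python
def reward_value(collision_prob: float, avg_speed: float,
                 w_safety: float = 1.0, w_efficiency: float = 0.1) -> float:
    """Reward negatively correlated with the collision probability of the
    first acquisition device and the first obstacle (the first influence
    factor) and positively correlated with driving efficiency, here measured
    by the average speed along the planned trajectory (the second influence
    factor). The weights are illustrative assumptions."""
    return -w_safety * collision_prob + w_efficiency * avg_speed
```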
In the embodiments of this specification, the server can train the control model by adjusting and optimizing the model parameters contained in it, with maximizing the value of the reward function as the optimization target. That is, over multiple rounds of iterative training, the reward value of the reward function keeps increasing and converges within a numerical range, thereby completing the training of the control model.
Of course, besides training the control model with maximizing the reward value as the optimization target, the control model can also be trained by adjusting its model parameters with a preset reward value as the optimization target. That is, over multiple rounds of iterative training, the reward value needs to keep approaching the preset reward value; once, after multiple rounds, the reward value floats around the preset value, the training of the control model can be judged complete.
It can be seen from the above process that the model training takes into account both the collision probability and the driving efficiency of the first acquisition device, so the motion trajectory obtained by the trained control model from the first state data can improve both safety and driving efficiency.
After the training of the control model is completed, the trained control model can be deployed in an unmanned device to control the unmanned device, as shown in fig. 5.
Fig. 5 is a schematic flow chart of a control method for an unmanned device according to an embodiment of this specification, which specifically includes the following steps:
s500: and acquiring state data of the unmanned equipment and obstacles around the unmanned equipment at the current moment as current state data.
S502: Inputting the current state data into the control model, and determining a planned trajectory of the unmanned device after the current moment, wherein the planned trajectory is determined by the control model on the assumption that the obstacle travels according to an obstacle trajectory after the current moment; the obstacle trajectory is determined by the control model according to the current state data, or according to the current state data on the assumption that the unmanned device travels according to a basic trajectory after the current moment, the basic trajectory being determined by the control model according to the current state data.
S504: Controlling the unmanned device according to the planned trajectory.
In the embodiments of this specification, the unmanned device can acquire the state data of itself and of the obstacles around it at the current moment through its own sensors (such as cameras and lidar) as the current state data. The current state data are then input into the control model to determine the planned trajectory of the unmanned device after the current moment.
The planned trajectory is determined by the control model on the assumption that the obstacle travels according to the obstacle trajectory after the current moment. The obstacle trajectory is determined by the control model according to the current state data alone (this is the case if control model 0 of obstacle interaction level 0 is used), or according to the current state data on the assumption that the unmanned device travels according to the basic trajectory after the current moment, where the basic trajectory is the trajectory of the unmanned device after the current moment determined by the control model according to the current state data. The unmanned device can then control itself according to the planned trajectory. The contents of S500 to S502 are essentially the same as in the model training stage described above and are not repeated here.
When the control model is in use, the unmanned device can determine the obstacle interaction level through the interaction state prediction model, select the control model corresponding to that obstacle interaction level, input the current state data into it, and determine the planned trajectory of the unmanned device, as shown in fig. 6.
Fig. 6 is a schematic diagram of an interaction state prediction model and a control model according to an embodiment of the present disclosure.
In the embodiment of the present specification, the unmanned device may acquire, as historical state data, the state data of the unmanned device and the obstacles around it before the current moment, that is, over a period of time preceding the current moment. The unmanned device may then input the historical state data into the pre-trained interaction state prediction model to predict the obstacle interaction level of the surrounding obstacles. After the obstacle interaction level is determined, the acquired current state data can be input into the control model corresponding to that level, and the planned motion trajectory of the unmanned device over the following period of time under that obstacle interaction level can be determined.
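The selection step can be pictured with a small Python sketch. Here predict_level and the per-level control models are dummy stand-ins chosen only to show the dispatch, not the specification's actual models:

```python
def predict_level(historical_states: list) -> int:
    """Stand-in for the pre-trained interaction state prediction model (an assumption)."""
    return min(len(historical_states) // 10, 2)  # dummy rule, for the sketch only

# One control model per obstacle interaction level (stubs).
control_models = {
    0: lambda state: "planned trajectory from control model 0",
    1: lambda state: "planned trajectory from control model 1",
    2: lambda state: "planned trajectory from control model 2",
}

historical_state_data = [{"t": t} for t in range(25)]  # states before the current moment
current_state_data = {"t": 25}                         # state at the current moment

level = predict_level(historical_state_data)           # obstacle interaction level
planned = control_models[level](current_state_data)    # model matching that level
print(level, planned)
```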
According to the above method, the motion trajectory of the first acquisition device after the set historical moment, as output by the control model, is determined while taking into account that the first obstacle assumes the first acquisition device travels along a certain motion trajectory after the set historical moment. A control model trained in this way, when put to practical use, enables the unmanned device to determine its driving strategy while taking the next motion state of the surrounding obstacles into account, which effectively ensures safe driving of the unmanned device and avoids collisions with surrounding obstacles.
The control model referred to in this specification may be a model built on a Deep Q Network (DQN), an LSTM, or the like; the specification does not limit the specific form of the control model.
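Since a Deep Q Network is named as one admissible form, a minimal PyTorch sketch of a DQN-style value network over a discretized set of candidate trajectories might look as follows. The state dimension, the number of candidate trajectories, and the layer widths are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class TrajectoryDQN(nn.Module):
    """Q(s, a): one value per candidate trajectory (the 'action')."""

    def __init__(self, state_dim: int = 32, n_trajectories: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_trajectories),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

model = TrajectoryDQN()
state = torch.randn(1, 32)                     # encoded state data (assumed encoding)
best_trajectory = model(state).argmax(dim=1)   # greedy choice among candidates
print(best_trajectory.item())
```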
Based on the same concept as the training method for a control model provided above for one or more embodiments of the present specification, the present specification further provides a corresponding training apparatus for a control model, as shown in fig. 7.
Fig. 7 is a schematic structural diagram of a control model training device provided in the embodiment of the present disclosure, which specifically includes:
an acquiring module 700, configured to acquire, as first state data, state data of a first acquisition device and of a first obstacle around the first acquisition device at a set historical moment;
a determining module 702, configured to input the first state data into a control model and determine a planned trajectory of the first acquisition device after the set historical moment, wherein the planned trajectory is determined by the control model according to the condition that the first obstacle travels along a second trajectory after the set historical moment; the second trajectory is determined by the control model according to the first state data, or is determined by the control model according to the first state data under the condition that the first acquisition device travels along a first trajectory after the set historical moment, the first trajectory being the basic trajectory of the first acquisition device after the set historical moment determined by the control model according to the first state data;
and a training module 704, configured to determine a reward value of a reward function corresponding to the control model according to the planned trajectory, and train the control model according to the reward value.
Optionally, the acquiring module 700 is specifically configured to acquire, as second state data, state data of the first acquisition device and the first obstacle before the set historical moment, input the second state data into a pre-trained interaction state prediction model to predict an obstacle interaction level of the first obstacle, and input the first state data into a control model corresponding to the obstacle interaction level to determine a planned trajectory of the first acquisition device after the set historical moment under that obstacle interaction level.
Optionally, the acquiring module 700 is specifically configured to obtain a training sample, input state data of a second acquisition device and of a second obstacle around the second acquisition device contained in the training sample into the interaction state prediction model to predict the interaction level of the second obstacle, and train the interaction state prediction model with minimizing the deviation between the predicted interaction level of the second obstacle and the tag data contained in the training sample as an optimization target.
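As a sketch of that training procedure, the interaction state prediction model can be treated as an ordinary supervised classifier whose cross-entropy loss plays the role of the deviation to be minimized. The feature dimension, the number of levels, and the optimizer settings below are assumptions, not values from the specification:

```python
import torch
import torch.nn as nn

N_LEVELS = 3      # assumed number of obstacle interaction levels
STATE_DIM = 32    # assumed size of the encoded state data

predictor = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_LEVELS),
)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # deviation between prediction and tag data

# Dummy training sample: encoded states of the second acquisition device and
# the second obstacle, with the labeled interaction level as tag data.
states = torch.randn(8, STATE_DIM)
labels = torch.randint(0, N_LEVELS, (8,))

logits = predictor(states)
loss = loss_fn(logits, labels)   # optimization target: minimize the deviation
optimizer.zero_grad()
loss.backward()
optimizer.step()
```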
Optionally, if the obstacle interaction level of the first obstacle at the set historical moment is K, the device interaction level of the first acquisition device at the set historical moment is K+1, where K is an integer not less than 2.
Optionally, the acquiring module 700 is specifically configured to: if the obstacle interaction level is determined to be K, acquire a first model parameter related to device interaction level K-1 and use it as the model parameter related to device interaction level K-1 in the control model corresponding to obstacle interaction level K, the first model parameter being acquired from the trained control model K-2 corresponding to obstacle interaction level K-2; input the first state data into the control model K corresponding to obstacle interaction level K, and determine, as the planned trajectory, the motion trajectory of the first acquisition device under obstacle interaction level K, wherein the planned trajectory is the motion trajectory of the first acquisition device under device interaction level K+1 and is determined by the control model K by taking the motion trajectory of the first obstacle under obstacle interaction level K as the second trajectory and determining according to it; the second trajectory is determined by the control model K by taking the motion trajectory of the first acquisition device under device interaction level K-1 as the first trajectory and determining according to the first trajectory and the first state data; and the first trajectory is determined by the control model K according to the model parameters related to device interaction level K-1 and the first state data. The module then adjusts, according to the reward value, the model parameters of obstacle interaction level K and the model parameters of device interaction level K+1 contained in the control model K, so as to train the control model K.
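The parameter hand-off described above can be sketched with plain dictionaries of named parameter blocks. The key naming scheme and the placeholder values are assumptions made for the sketch:

```python
K = 3  # example obstacle interaction level (the method requires K >= 2)

# The already-trained control model K-2 holds the parameters tied to
# device interaction level K-1 (placeholder string instead of real weights).
control_model_k_minus_2 = {f"device_level_{K - 1}": "trained weights"}

# Freshly initialized control model K: three named parameter blocks.
control_model_k = {
    f"device_level_{K - 1}": "random init",   # overwritten below
    f"obstacle_level_{K}": "random init",
    f"device_level_{K + 1}": "random init",
}

# Reuse: copy the device-level-(K-1) block from model K-2 instead of
# learning it from scratch.
control_model_k[f"device_level_{K - 1}"] = control_model_k_minus_2[f"device_level_{K - 1}"]

# Only these blocks are adjusted by the reward when training model K;
# the copied block stays frozen.
trainable = [f"obstacle_level_{K}", f"device_level_{K + 1}"]
print(control_model_k, trainable)
```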
Optionally, the acquiring module 700 is specifically configured to: if the obstacle interaction level is determined to be 0, input the first state data into the control model 0 corresponding to obstacle interaction level 0 and determine the motion trajectory of the first acquisition device at device interaction level 1 as the planned trajectory of the first acquisition device under obstacle interaction level 0, wherein the planned trajectory is determined by the control model 0 by taking the motion trajectory of the first obstacle under obstacle interaction level 0 as a second trajectory and determining according to it, and the second trajectory is determined by the control model 0 according to the first state data. The module then adjusts, according to the reward value, the model parameters of obstacle interaction level 0 and the model parameters of device interaction level 1 contained in the control model 0, to train the control model 0.
Optionally, the acquiring module 700 is specifically configured to: if the obstacle interaction level is determined to be 1, input the first state data into the control model 1 corresponding to obstacle interaction level 1 and determine, as the planned trajectory, the motion trajectory of the first acquisition device under obstacle interaction level 1, wherein the planned trajectory is the motion trajectory of the first acquisition device under device interaction level 2 and is determined by the control model 1 by taking the motion trajectory of the first obstacle under obstacle interaction level 1 as the second trajectory and determining according to it; the second trajectory is determined by the control model 1 by taking the motion trajectory of the first acquisition device under device interaction level 0 as the first trajectory and determining according to the first trajectory and the first state data; and the first trajectory is determined by the control model 1 according to the first state data. The module then adjusts, according to the reward value, the model parameters of device interaction level 0, the model parameters of obstacle interaction level 1, and the model parameters of device interaction level 2 contained in the control model 1, to train the control model 1.
Optionally, the acquiring module 700 is specifically configured to determine a first influence factor in the reward function according to the planned trajectory and the historical motion trajectory of the first obstacle, and determine a second influence factor in the reward function according to the planned trajectory, the first influence factor characterizing the collision probability between the first acquisition device and the first obstacle and the second influence factor characterizing the driving efficiency of the first acquisition device; determine the reward value of the reward function according to the first influence factor and the second influence factor; and train the control model with the maximized reward value as the training target.
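A minimal sketch of such a reward function is given below. The distance-based collision proxy, the exponential shaping, and the weights are assumptions; the specification fixes only what each influence factor characterizes:

```python
import math

def reward(planned_traj, obstacle_traj, w_safety=1.0, w_efficiency=0.1):
    # First influence factor: collision probability, here approximated by
    # the closest approach between the two trajectories (an assumption).
    min_gap = min(math.dist(p, o) for p, o in zip(planned_traj, obstacle_traj))
    collision_factor = math.exp(-min_gap)  # high when the trajectories get close

    # Second influence factor: driving efficiency, here the straight-line
    # distance covered by the planned trajectory (an assumption).
    efficiency_factor = math.dist(planned_traj[0], planned_traj[-1])

    # Maximizing the reward trades safety off against efficiency.
    return -w_safety * collision_factor + w_efficiency * efficiency_factor

planned  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
obstacle = [(2.0, 1.0), (2.0, 0.5), (2.0, 0.2)]
print(reward(planned, obstacle))
```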
Fig. 8 is a schematic structural diagram of a control device of the unmanned device according to the embodiment of the present disclosure, which specifically includes:
an obtaining module 800, configured to obtain, as current state data, state data of the unmanned device and the obstacle around the unmanned device at the current time;
a determining module 802, configured to input the current state data into a control model and determine a planned trajectory of the unmanned device after the current moment, wherein the planned trajectory is determined by the control model according to the condition that the obstacle travels along an obstacle trajectory after the current moment; the obstacle trajectory is determined by the control model according to the current state data, or is determined by the control model according to the current state data under the condition that the unmanned device travels along a basic trajectory after the current moment, the basic trajectory being determined by the control model according to the current state data; and the control model is obtained by the above training method of the control model;
and a control module 804, configured to control the unmanned device according to the planned trajectory.
The present specification also provides a computer-readable storage medium storing a computer program operable to execute the training method of the control model provided in fig. 1 described above or the control method of the unmanned apparatus provided in fig. 5 described above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 9. As shown in fig. 9, at the hardware level, the electronic device implementing the training method of the control model or the control method of the unmanned device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile storage into the memory and runs it, so as to implement the training method of the control model described above with respect to fig. 1 or the control method of the unmanned device provided above with respect to fig. 5. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded by the present specification; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It should also be apparent to those skilled in the art that a hardware circuit implementing a given logic method flow can readily be obtained merely by lightly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, it is entirely possible, by logically programming the method steps, to cause the controller to achieve the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for performing various functions may also be regarded as structures within the hardware component. Indeed, means for performing various functions may even be regarded simultaneously as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (9)

1. A method of training a control model, comprising:
acquiring state data of a first acquisition device and a first obstacle around the first acquisition device at a set historical moment as first state data;
inputting the first state data into a control model, and determining a planned track of the first acquisition device after the set historical moment, wherein the planned track is determined by the control model according to the condition that the first obstacle runs according to a second track after the set historical moment; the second track is determined by the control model according to the first state data, or is determined by the control model according to the first state data under the condition that the first acquisition device runs according to a first track after the set historical moment, the first track being a basic track of the first acquisition device after the set historical moment determined by the control model according to the first state data;
Determining a first influence factor in a reward function corresponding to the control model according to the planned trajectory and the historical motion trajectory of the first obstacle, and determining a second influence factor in the reward function according to the planned trajectory, wherein the first influence factor is used for representing the collision probability of the first acquisition equipment and the first obstacle, and the second influence factor is used for representing the running efficiency of the first acquisition equipment;
determining the rewarding value of the rewarding function according to the first influencing factor and the second influencing factor, and training the control model by taking the maximized rewarding value as a training target.
2. The method of claim 1, wherein prior to inputting the first state data into a control model to be trained, the method further comprises:
acquiring state data of the first acquisition equipment and the first obstacle before the set historical moment as second state data;
inputting the second state data into a pre-trained interaction state prediction model to predict obstacle interaction levels of the first obstacles, different obstacle interaction levels being used to characterize different interaction states between the first obstacles;
Inputting the first state data into a control model, and determining a planned track of the first acquisition device after the set historical moment, wherein the method specifically comprises the following steps of:
and inputting the first state data into the control model corresponding to the obstacle interaction level, and determining a planned track of the first acquisition device after the set historical moment under the obstacle interaction level.
3. The method of claim 2, wherein pre-training the interaction state prediction model specifically comprises:
obtaining a training sample;
inputting state data of a second acquisition device and a second obstacle around the second acquisition device contained in the training sample to the interaction state prediction model to predict an interaction level of the second obstacle;
and training the interaction state prediction model by taking minimizing the deviation between the interaction level of the second obstacle and the tag data contained in the training sample as an optimization target.
4. The method of claim 2, wherein if the first obstacle has an obstacle interaction level K at the set historical moment, the first acquisition device has a device interaction level K+1 at the set historical moment, where K is an integer not less than 2.
5. The method of claim 4, wherein inputting the first state data into the control model to be trained and determining a planned track of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be K, acquiring a first model parameter related to the device interaction level K-1, and taking the first model parameter as the model parameter related to the device interaction level K-1 in a control model corresponding to the obstacle interaction level K, wherein the first model parameter is acquired from the trained control model K-2 corresponding to the obstacle interaction level K-2;
inputting the first state data into a control model K corresponding to the obstacle interaction level K, and determining a motion track corresponding to the first acquisition device under the obstacle interaction level K as the planned track, wherein the planned track is a motion track corresponding to the first acquisition device under the device interaction level K+1; the planned track is determined by the control model K by taking the motion track of the first obstacle under the obstacle interaction level K as the second track and according to the second track; the second track is determined by the control model K by taking the motion track of the first acquisition device under the device interaction level K-1 as the first track and according to the first track and the first state data; and the first track is determined by the control model K according to the model parameters related to the device interaction level K-1 and the first state data;
Training the control model according to the reward value, wherein the training comprises the following steps:
and according to the reward value, adjusting model parameters of the obstacle interaction level K and model parameters of the equipment interaction level K+1 contained in the control model K so as to train the control model K.
6. The method according to claim 2, wherein inputting the first state data into a control model and determining a planned track of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be 0, inputting the first state data into a control model 0 corresponding to the obstacle interaction level 0, and determining a motion track of the first acquisition device at the device interaction level 1 as the planned track of the first acquisition device under the obstacle interaction level 0, wherein the planned track is determined by the control model 0 by taking the motion track of the first obstacle under the obstacle interaction level 0 as a second track and determining according to the second track, and the second track is determined by the control model 0 according to the first state data;
Training the control model according to the reward value, wherein the training comprises the following steps:
and according to the reward value, adjusting model parameters of the obstacle interaction level 0 and model parameters of the equipment interaction level 1 contained in the control model 0 to train the control model 0.
7. The method according to claim 2, wherein inputting the first state data into a control model and determining a planned track of the first acquisition device after the set historical moment specifically comprises:
if the obstacle interaction level is determined to be 1, inputting the first state data into a control model 1 corresponding to the obstacle interaction level 1, and determining a motion track corresponding to the first acquisition device under the obstacle interaction level 1 as the planned track, wherein the planned track is a motion track corresponding to the first acquisition device under the device interaction level 2; the planned track is determined by the control model 1 by taking the motion track of the first obstacle under the obstacle interaction level 1 as the second track and according to the second track; the second track is determined by the control model 1 by taking the motion track of the first acquisition device under the device interaction level 0 as the first track and according to the first track and the first state data; and the first track is determined by the control model 1 according to the first state data;
Training the control model according to the reward value, wherein the training comprises the following steps:
and according to the reward value, adjusting the model parameters of the equipment interaction level 0, the model parameters of the obstacle interaction level 1 and the model parameters of the equipment interaction level 2 contained in the control model 1 to train the control model 1.
8. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.
CN202011104599.5A 2020-10-15 2020-10-15 Training method, control method and device for control model Active CN112306059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011104599.5A CN112306059B (en) 2020-10-15 2020-10-15 Training method, control method and device for control model

Publications (2)

Publication Number Publication Date
CN112306059A CN112306059A (en) 2021-02-02
CN112306059B true CN112306059B (en) 2024-02-27

Family

ID=74327621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011104599.5A Active CN112306059B (en) 2020-10-15 2020-10-15 Training method, control method and device for control model

Country Status (1)

Country Link
CN (1) CN112306059B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947495B (en) * 2021-04-25 2021-09-24 北京三快在线科技有限公司 Model training method, unmanned equipment control method and device
CN114047764B (en) * 2021-11-16 2023-11-07 北京百度网讯科技有限公司 Training method of path planning model, path planning method and device
CN114019981B (en) * 2021-11-16 2023-12-22 北京三快在线科技有限公司 Track planning method and device for unmanned equipment
CN117452955B (en) * 2023-12-22 2024-04-02 珠海格力电器股份有限公司 Control method, control device and cleaning system of cleaning equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109878513A (en) * 2019-03-13 2019-06-14 百度在线网络技术(北京)有限公司 Defensive driving strategy generation method, device, equipment and storage medium
CN109991987A (en) * 2019-04-29 2019-07-09 北京智行者科技有限公司 Automatic Pilot decision-making technique and device
CN111002980A (en) * 2019-12-10 2020-04-14 苏州智加科技有限公司 Road obstacle trajectory prediction method and system based on deep learning
CN111046981A (en) * 2020-03-17 2020-04-21 北京三快在线科技有限公司 Training method and device for unmanned vehicle control model
CN111076739A (en) * 2020-03-25 2020-04-28 北京三快在线科技有限公司 Path planning method and device
CN111079721A (en) * 2020-03-23 2020-04-28 北京三快在线科技有限公司 Method and device for predicting track of obstacle
CN111123933A (en) * 2019-12-24 2020-05-08 华为技术有限公司 Vehicle track planning method and device, intelligent driving area controller and intelligent vehicle
CN111325401A (en) * 2020-02-20 2020-06-23 江苏苏宁物流有限公司 Method and device for training path planning model and computer system
CN111338360A (en) * 2020-05-18 2020-06-26 北京三快在线科技有限公司 Method and device for planning vehicle driving state

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019006B2 (en) * 2015-04-08 2018-07-10 University Of Maryland, College Park Surface vehicle trajectory planning systems, devices, and methods

Also Published As

Publication number Publication date
CN112306059A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112306059B (en) Training method, control method and device for control model
CN111208838B (en) Control method and device of unmanned equipment
CN111190427B (en) Method and device for planning track
CN112364997B (en) Method and device for predicting track of obstacle
CN111114543A (en) Trajectory prediction method and device
CN112766468B (en) Trajectory prediction method and device, storage medium and electronic equipment
CN111076739B (en) Path planning method and device
CN111238523B (en) Method and device for predicting motion trail
CN113341941B (en) Control method and device of unmanned equipment
CN113296541B (en) Future collision risk based unmanned equipment control method and device
CN111062372B (en) Method and device for predicting obstacle track
CN113968243B (en) Obstacle track prediction method, device, equipment and storage medium
CN113110526B (en) Model training method, unmanned equipment control method and device
CN113419547B (en) Multi-vehicle cooperative control method and device
CN110942181A (en) Method and device for predicting obstacle track
CN112947495B (en) Model training method, unmanned equipment control method and device
CN111123957B (en) Method and device for planning track
CN114019971B (en) Unmanned equipment control method and device, storage medium and electronic equipment
CN113033527A (en) Scene recognition method and device, storage medium and unmanned equipment
CN114153207B (en) Control method and control device of unmanned equipment
CN114167857B (en) Control method and device of unmanned equipment
CN113848913A (en) Control method and control device of unmanned equipment
CN114019981B (en) Track planning method and device for unmanned equipment
CN113879337B (en) Track prediction method and device, storage medium and electronic equipment
CN113074734B (en) Track planning method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant