CN109606383B - Method and apparatus for generating a model

Info

Publication number
CN109606383B
CN109606383B
Authority
CN
China
Prior art keywords
acceleration
target
speed
target vehicle
condition
Prior art date
Legal status
Active
Application number
CN201811636798.3A
Other languages
Chinese (zh)
Other versions
CN109606383A (en)
Inventor
张连川
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811636798.3A
Publication of CN109606383A
Application granted
Publication of CN109606383B

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions
    • B60W2050/0028 - Mathematical models, e.g. for simulation
    • B60W2050/0031 - Mathematical model of the vehicle

Abstract

Embodiments of the present disclosure disclose methods and apparatus for generating models. One embodiment of the method comprises: using a reinforcement learning algorithm, performing the following training steps based on an initial model to learn the generation of acceleration: selecting a target speed from a target speed set; selecting an acceleration from an acceleration set; determining whether a target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running according to the selected acceleration; in response to determining that the running smoothness condition is satisfied, establishing a correspondence among the initial speed of the target vehicle, the selected target speed and the selected acceleration; determining whether a preset end training condition is satisfied; and, in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence. This embodiment controls the running of the vehicle with a model obtained by a reinforcement learning algorithm, thereby enriching the ways in which the vehicle can be controlled.

Description

Method and apparatus for generating a model
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for generating a model.
Background
Today's closed-loop automatic control techniques typically rely on feedback to reduce uncertainty. In engineering practice, a proportional-integral-derivative controller is generally used to realize such regulation; a feedforward control system employing a proportional-integral-derivative controller applies compensation so that the deviation of the system is reduced.
For example, when controlling a vehicle, the prior art generally adopts a proportional-integral-derivative controller to realize the control of the vehicle.
Disclosure of Invention
The present disclosure presents methods and apparatus for generating models, and methods and apparatus for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating a model, the method including: acquiring a target speed set and an acceleration set, wherein the acceleration in the acceleration set is used for indicating the acceleration to be possessed by the target vehicle when the target vehicle reaches the target speed; adopting a reinforcement learning algorithm, and executing the following training steps based on the initial model to learn the generation of the acceleration: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model characterizing the established at least one correspondence is generated.
In some embodiments, the method further comprises: in response to determining that the end training condition is not satisfied, adjusting the model parameters of the initial model, and continuing to perform the training step using the initial model with the adjusted model parameters.
In some embodiments, determining whether the target vehicle satisfies a predetermined running smoothness condition while the target vehicle is running at the selected acceleration includes: determining whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running in the simulated environment according to the selected acceleration.
In some embodiments, determining whether the target vehicle satisfies a predetermined running smoothness condition while the target vehicle is running at the selected acceleration includes: and determining whether the target vehicle meets a predetermined running smoothness condition in a state where the target vehicle runs in the actual running process according to the selected acceleration.
In some embodiments, the target velocities in the set of target velocities correspond one-to-one to the accelerations in the set of accelerations; and aiming at the target speed in the target speed set, the acceleration corresponding to the target speed is obtained by the following steps: determining a time for the target vehicle to reach the target speed from an initial speed; determining a difference between the target speed and the initial speed; and determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
In some embodiments, the driving smoothing condition comprises at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a model, the apparatus including: a first acquisition unit configured to acquire a target speed set and an acceleration set, wherein accelerations in the acceleration set are used to indicate an acceleration that the target vehicle is to have to reach the target speed; a training unit configured to perform the following training steps to learn generation of acceleration based on the initial model using a reinforcement learning algorithm: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model characterizing the established at least one correspondence is generated.
In some embodiments, the apparatus further comprises: and an adjusting unit configured to adjust the model parameters of the initial model in response to determining that the end training condition is not satisfied, and continue to perform the training step with the initial model after the model parameters are adjusted.
In some embodiments, the training unit comprises: the vehicle control device includes a first determination module configured to determine whether a target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running in the simulated environment at the selected acceleration.
In some embodiments, the training unit comprises: and a second determination module configured to determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running during actual running in accordance with the selected acceleration.
In some embodiments, the target velocities in the set of target velocities correspond one-to-one to the accelerations in the set of accelerations; and aiming at the target speed in the target speed set, the acceleration corresponding to the target speed is obtained by the following steps: determining a time for the target vehicle to reach the target speed from an initial speed; determining a difference between the target speed and the initial speed; and determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
In some embodiments, the driving smoothing condition comprises at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
In a third aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring an initial speed and a target speed of a target vehicle; inputting the initial speed and the target speed into a driving model trained in advance to obtain an acceleration, wherein the driving model is obtained by training according to the method of any one embodiment of the method for generating the model, and the acceleration is used for indicating the acceleration which is to be possessed by the target vehicle when the target vehicle reaches the target speed; an instruction for instructing the target vehicle to travel is generated based on the acceleration.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: a second acquisition unit configured to acquire an initial speed and a target speed of a target vehicle; an input unit configured to input an initial speed and a target speed to a driving model trained in advance, resulting in an acceleration, wherein the driving model is trained in a method according to any one of the embodiments of the method for generating a model as described above, and the acceleration is used for indicating an acceleration that the target vehicle is to have to reach the target speed; a generation unit configured to generate an instruction for instructing the target vehicle to travel, in accordance with the acceleration.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for generating a model as described above.
In a sixth aspect, embodiments of the present disclosure provide a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method of any of the embodiments of the method for generating a model as described above.
The method and apparatus for generating a model provided by the embodiments of the present disclosure acquire a target speed set and an acceleration set, wherein the accelerations in the acceleration set are used to indicate the acceleration that the target vehicle is to have in order to reach the target speed, and then, using a reinforcement learning algorithm, perform the following training steps based on an initial model to learn the generation of acceleration: selecting a target speed from the target speed set; selecting an acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to determining that the driving smooth condition is met; determining whether a preset training ending condition is met; and, in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence, so that the vehicle can be controlled without relying on a proportional-integral-derivative controller, thereby enriching the control modes of the vehicle. In addition, controlling the driving of the vehicle with the model obtained by training based on the reinforcement learning algorithm helps to improve the accuracy of vehicle control and the driving safety of the vehicle.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a model according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for generating a model according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating a model according to the present disclosure;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating models according to the present disclosure;
FIG. 6 is a flow diagram for one embodiment of a method for generating information, according to the present disclosure;
FIG. 7 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;
FIG. 8 is a schematic structural diagram of a computer system suitable for use with the electronic device used to implement embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for generating a model or an apparatus for generating a model, or a method for generating information or an apparatus for generating information, to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, a server 103, a network 104, and a vehicle 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, server 103 and vehicle 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, the server 103, the vehicle 105 may interact through the network 104 to receive or transmit data (e.g., signals indicating the movement of the vehicle), etc. The terminal devices 101 and 102 may have various communication client applications installed thereon, such as a device control application, an image processing application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal apparatuses 101 and 102 may be hardware or software. When the terminal devices 101, 102 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101 and 102 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 103 may be a server that provides various services, such as a background server that controls the movement of the vehicle 105. The backend server may perform calculations and other processing on the received data (e.g., the target speed set and the acceleration set) and feed back processing results (e.g., a driving model trained based on the target speed set and the acceleration set) to the vehicle 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
The vehicle 105 may be a variety of vehicles that can be controlled. For example, the vehicle 105 may be controlled by instructions sent by the terminal devices 101, 102, or, alternatively, the server 103; or may be controlled by a controller or software installed in the vehicle 105 itself. By way of example, the vehicle 105 may include, but is not limited to, any of the following: automobiles, cars, buses, autonomous vehicles, and the like. After the instruction is acquired, the vehicle 105 can travel in accordance with the instruction.
It should be noted that the method for generating the model provided by the embodiment of the present disclosure may be executed by the server 103, or may be executed by the terminal devices 101 and 102; accordingly, the means for generating the model may be provided in the server 103, or may be provided in the terminal devices 101, 102. Furthermore, the method for generating information provided by the embodiments of the present disclosure may be executed by the server 103, by the terminal devices 101 and 102, and by the vehicle 105; accordingly, the means for generating information may be provided in the server 103, in the terminal devices 101, 102, or in the vehicle 105.
It should be understood that the number of terminal devices, networks, servers, and vehicles in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, and vehicles, as desired for implementation. For example, the system architecture may only include the electronic device on which the method for generating the model is run, when the electronic device on which the method for generating the information is run does not require data transfer with other electronic devices.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating the model comprises the following steps:
step 201, a target speed set and an acceleration set are obtained.
In this embodiment, an execution subject (for example, a server, a terminal device or a vehicle shown in fig. 1) of the method for generating a model may obtain the target speed set and the acceleration set from other electronic devices or locally through a wired connection manner or a wireless connection manner.
In the present embodiment, the acceleration in the above-described set of accelerations is used to indicate the acceleration that the target vehicle is to have to reach the target speed. The target speed may be a desired speed, or a speed to be reached by the vehicle.
In practice, there is often a deviation between the actual speed of the vehicle and the target speed. For example, if the device for controlling the vehicle, or the driver, desires the vehicle to travel at a speed of 20 km/h, the actual speed of the vehicle often deviates from this desired speed (i.e., the target speed of 20 km/h) due to the influence of factors such as driving resistance. For example, the actual speed of the vehicle may be less than 20 kilometers per hour.
Here, the above target speed set may be determined by a skilled person. For example, a speed range (e.g., 0 km/h to 10 km/h) may first be determined, then the speed range may be equally divided into a preset number (e.g., 300) of speed intervals, and the set of endpoint values of each interval determined as the target speed set. Similarly, the acceleration set may also be determined by a skilled person. For example, an acceleration range (e.g., 0 km/h² to 1 km/h²) may first be determined, then the acceleration range may be equally divided into a preset number (e.g., 300) of acceleration intervals, and the set of endpoint values of each interval determined as the acceleration set.
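As a rough sketch of this discretization (the ranges, the interval count, and the units below are only the example values given above, not values fixed by the disclosure):

```python
import numpy as np

def build_value_set(low, high, num_intervals):
    """Split [low, high] into equal intervals and return the endpoint values of the intervals."""
    return np.linspace(low, high, num_intervals + 1)

# Example values from the text above; both ranges are assumptions for illustration.
target_speed_set = build_value_set(0.0, 10.0, 300)   # km/h
acceleration_set = build_value_set(0.0, 1.0, 300)    # km/h^2
```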
In some optional implementations of this embodiment, the executing main body may execute the step 201 by: and acquiring a target speed set and an acceleration set in the running process of the target vehicle.
It can be understood that, when the executing entity executes step 201 by using the optional implementation manner, since the target speed set and the acceleration set are actual data in the driving process of the target vehicle, there is usually a certain relation between the target speed in the target speed set and the acceleration in the acceleration set, so that the training time of the driving model can be reduced, and the accuracy of the trained driving model can be improved.
In some optional implementations of the present embodiment, the target velocities in the target velocity set correspond to the accelerations in the acceleration set in a one-to-one correspondence. For a target speed in the target speed set, the acceleration corresponding to the target speed may be obtained by the execution main body or an electronic device communicatively connected to the execution main body, through the following steps:
first, the time at which the target vehicle reaches the target speed from the initial speed is determined. The initial speed may be an actual speed of the target vehicle before the target vehicle receives the command to accelerate or decelerate. For each instruction, there may be an initial velocity.
Then, a difference between the target speed and the initial speed is determined.
And finally, determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
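The three steps above amount to a single ratio; a minimal sketch (the function name and example values are illustrative only):

```python
def acceleration_for_target(initial_speed, target_speed, time_to_reach):
    """Acceleration corresponding to a target speed: the speed difference divided by the
    time the target vehicle needs to go from the initial speed to the target speed."""
    return (target_speed - initial_speed) / time_to_reach

# Example (illustrative units): reaching a target speed of 6 from an initial speed of 2
# in 4 time units gives an acceleration of 1.0.
print(acceleration_for_target(2.0, 6.0, 4.0))
```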
In step 202, a reinforcement learning algorithm is used, and based on the initial model, the following training steps are performed to learn the generation of the acceleration.
In this embodiment, the executing agent may perform the following training steps (including steps 2021 to 2026) based on the initial model by using a reinforcement learning algorithm to learn the generation of the acceleration.
Here, the initial model may be a model that has not been trained or that has not reached a preset condition after training. As an example, the initial model may be a Q table. The Q table can be regarded as a two-dimensional table whose element values measure how good each action is to take in the current state. Alternatively, the initial model may be a model having a deep neural network structure.
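A minimal sketch of such a Q table, assuming (this encoding is an assumption, not something fixed by the disclosure) that a state is the pair of discretized initial-speed and target-speed indices and that the actions are the candidate accelerations:

```python
from collections import defaultdict
import numpy as np

num_accels = 301  # size of the discretized acceleration set; an assumed example value

# q_table[(v0_idx, vt_idx)][a_idx] measures how good it is to take the acceleration with
# index a_idx when the initial speed has index v0_idx and the target speed has index vt_idx.
q_table = defaultdict(lambda: np.zeros(num_accels))
```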
In this embodiment, the training step includes the following steps:
step 2021, select a target speed from the target speed set.
In this embodiment, the executing entity may select a target speed from the target speed set acquired in step 201.
Here, the target speed may be selected from the acquired target speed set in various ways. For example, randomly, or in a particular order.
Step 2022, select an acceleration from the set of accelerations.
In this embodiment, the executing entity may select an acceleration from the acceleration set acquired in step 201.
Step 2023, determines whether the target vehicle satisfies a predetermined running smoothness condition while the target vehicle is running at the selected acceleration.
In the present embodiment, the execution subject described above may determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running at the selected acceleration.
Here, the above-mentioned target vehicle may be various vehicles, for example, an automobile, a car, a bus, a subway, and the like. The running smoothness condition may be a condition determined in advance for determining whether the running of the vehicle is smooth or not. For example, the above-described running smooth condition may include that the actual speed of the target vehicle is less than a preset speed threshold.
In some optional implementations of the embodiment, the driving smoothing condition includes at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
Here, the average speed of the target vehicle may be an average value of actual speeds of the target vehicle for a preset time period (for example, 1 minute), or may be an average value of actual speeds of the target vehicle during a period from before the acceleration or deceleration command is received to after the target speed is reached by traveling at an acceleration indicated by the acceleration or deceleration command.
The acceleration change rate may indicate the change in the acceleration of the vehicle per unit time. It can be obtained by dividing the difference between the accelerations at the start and the end of a time period by the duration of that time period.
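A minimal sketch of checking this driving smoothness condition, assuming both sub-conditions are evaluated together and that the thresholds are supplied by the caller:

```python
def satisfies_smoothness(actual_speeds, accel_start, accel_end, duration,
                         speed_threshold, change_rate_threshold):
    """True if the average speed and the acceleration change rate are both below their thresholds."""
    average_speed = sum(actual_speeds) / len(actual_speeds)
    accel_change_rate = abs(accel_end - accel_start) / duration  # change in acceleration per unit time
    return average_speed < speed_threshold and accel_change_rate < change_rate_threshold
```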
Step 2024, in response to determining that the driving smoothness condition is satisfied, establishing a correspondence between the initial velocity of the target vehicle, the selected target velocity, and the selected acceleration.
In this embodiment, in the case where it is determined that the above-described running smoothing condition is satisfied, the execution body may establish a correspondence relationship among an initial speed of the target vehicle, the selected target speed, and the selected acceleration.
Here, in a case where the above-described driving smoothing condition is satisfied, a reward value (reward) may be determined for the established correspondence, and the driving model may be trained with the goal of maximizing the total reward obtained; during training, the correspondence between target speeds and accelerations may be established by determining the transition probability between each target speed in the above-described target speed set and each acceleration in the above-described acceleration set.
Step 2025, determine whether the predetermined training termination condition is satisfied.
In this embodiment, the execution subject may determine whether a preset end training condition is satisfied. The training end condition may be a condition predetermined by a technician for ending the training step. For example, the end training condition may include, but is not limited to, at least one of the following: the training times reach or exceed the preset times; the training time reaches or exceeds the preset time length.
Step 2026, in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence.
In this embodiment, in a case where it is determined that the end training condition is satisfied, the executing agent may generate a driving model characterizing the established at least one correspondence relationship.
In some optional implementation manners of this embodiment, in response to determining that the end training condition is not satisfied, the executing entity may further adjust a model parameter of the initial model, and continue to execute the training step using the initial model after the model parameter adjustment.
It is understood that the process of performing the training step is the process of adjusting the probabilities in the Q table. When the step 2022 is performed for the first time or the first few times, a greedy algorithm may be adopted to select the acceleration from the acceleration set; as the number of times step 2022 is performed increases, the acceleration with the maximum probability corresponding to the selected target speed may be selected from the acceleration set.
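One way such a training step could look, written as a standard tabular Q-learning update with an epsilon-greedy choice over the acceleration set; the reward design, learning rate, discount factor, and rollout interface are assumptions for illustration, not details given by the disclosure:

```python
import random
import numpy as np

def training_step(q_table, state, num_accels, epsilon, alpha, gamma,
                  run_vehicle, reward_for):
    """One sketch iteration over a Q table of the kind described above.

    run_vehicle(state, action) stands in for driving (in simulation or for real) at the
    selected acceleration and returns the next state plus whether the smoothness
    condition held; reward_for(is_smooth) turns that into a reward value.
    """
    if random.random() < epsilon:                       # explore in the early selections
        action = random.randrange(num_accels)
    else:                                               # later, exploit the best-known acceleration
        action = int(np.argmax(q_table[state]))
    next_state, is_smooth = run_vehicle(state, action)
    reward = reward_for(is_smooth)                      # e.g. positive only if driving stayed smooth
    q_table[state][action] += alpha * (
        reward + gamma * np.max(q_table[next_state]) - q_table[state][action]
    )
    return next_state
```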
In some optional implementations of the embodiment, for the step of determining whether the target vehicle satisfies a predetermined running smoothing condition in a state where the target vehicle is running at the selected acceleration, the executing body may execute: determining whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running in the simulated environment according to the selected acceleration.
Here, the simulation environment may be used to simulate a running environment of the vehicle. For example, the simulation environment may be a driving environment of the vehicle simulated by simulation software (e.g., CarMaker, CarSim, etc.).
It can be understood that, because the driving of the vehicle usually has a certain risk in the training process, whether the target vehicle meets the predetermined driving smoothness condition can be preliminarily determined based on the simulation environment, so that the trial and error cost in the training process is reduced, and the risk is reduced.
In some optional implementations of the embodiment, for the step of determining whether the target vehicle satisfies a predetermined running smoothing condition in a state where the target vehicle is running at the selected acceleration, the executing body may also execute: and determining whether the target vehicle meets a predetermined running smoothness condition in a state where the target vehicle runs in the actual running process according to the selected acceleration.
It can be understood that, in the actual running process, determining whether the target vehicle meets the predetermined running smoothness condition will result in more accurate results, thereby improving the control accuracy of the vehicle.
In addition, the driving model obtained by training in the method of the embodiment does not directly learn the control of the accelerator and the brake, but learns the control of the acceleration, so that the trained control strategy has good transfer characteristics and is not limited to a certain vehicle or a certain brand of vehicle.
In addition, the acceleration can be obtained by the embodiment without depending on traditional control algorithms such as proportional-integral-derivative and the like, so that the control modes of the vehicle are enriched. The method of the present embodiment can control the running of the vehicle more accurately than in the prior art when the vehicle is in a low-speed running state (for example, running at a speed of less than 10 km/h) or in a low-speed climbing state.
Optionally, after the driving model is generated, the executing body may further optimize the parameters of the proportional-integral controller used in cruise control online, that is, optimize those parameters online in a data-driven manner by using the driving model, so that the cruise control achieves the desired performance.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present embodiment. In the application scenario of fig. 3, the server 301 first obtains a target velocity set 3001 and an acceleration set 3002. Wherein the accelerations in the acceleration set 3002 are used to indicate the acceleration that the target vehicle is to have to reach the target speed. Then, the server 301 performs the following training steps based on the initial model by using a reinforcement learning algorithm to learn the generation of the acceleration: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model 3003 characterizing the established at least one correspondence is generated.
The method provided by the above embodiment of the present disclosure acquires a target speed set and an acceleration set, where the accelerations in the acceleration set are used to indicate the acceleration that the target vehicle is to have in order to reach the target speed, and then, using a reinforcement learning algorithm, performs the following training steps based on an initial model to learn the generation of acceleration: selecting a target speed from the target speed set; selecting an acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to determining that the driving smooth condition is met; determining whether a preset training ending condition is met; and, in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence, so that the vehicle can be controlled without relying on a proportional-integral-derivative controller, thereby enriching the control modes of the vehicle. In addition, controlling the driving of the vehicle with the model obtained by training based on the reinforcement learning algorithm helps to improve the accuracy of vehicle control and the driving safety of the vehicle.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a model is shown. The process 400 of the method for generating a model includes the steps of:
step 401, a reinforcement learning algorithm is adopted, and based on the initial model, the following training steps are executed to learn the generation of the acceleration.
In the present embodiment, an executing subject (for example, a server, a terminal device or a vehicle shown in fig. 1) of the method for generating a model may employ a reinforcement learning algorithm, and based on an initial model, perform the following training steps to learn generation of an acceleration. Wherein, the training step comprises the following steps of 4001 to 4013.
Here, the initial model may be a model that has not been trained or that has not reached a preset condition after training. As an example, the initial model may be a Q table. The Q table can be regarded as a two-dimensional table whose element values measure how good each action is to take in the current state. Alternatively, the initial model may be a model having a deep neural network structure.
Step 4001, obtain a target velocity set and an acceleration set. Thereafter, step 4002 is performed.
In this embodiment, the execution main body may obtain the target speed set and the acceleration set from other electronic devices or locally through a wired connection manner or a wireless connection manner.
In the present embodiment, the acceleration in the above-described set of accelerations is used to indicate the acceleration that the target vehicle is to have to reach the target speed. The target speed may be a desired speed, or a speed to be reached by the vehicle.
Here, the above target speed set may be determined by a skilled person. Similarly, the set of accelerations may also be determined by a skilled person.
Step 4002, selecting a target speed from the target speed set, and selecting an acceleration from the acceleration set. Thereafter, step 4003 is performed.
In this embodiment, the executing entity may select a target speed from the target speed set acquired in step 4001, and select an acceleration from the acceleration set acquired in step 4001.
Step 4003, determining whether the target vehicle satisfies a predetermined first driving smoothness condition while the target vehicle is traveling in the simulated environment according to the selected acceleration. Thereafter, step 4004 is performed.
In the present embodiment, the execution subject described above may determine whether the target vehicle satisfies a first travel smoothness condition determined in advance in a state where the target vehicle travels in the simulation environment at the selected acceleration.
Here, the simulation environment may be used to simulate a running environment of the vehicle. For example, the simulation environment may be a driving environment of the vehicle simulated by simulation software (e.g., CarMaker, CarSim, etc.).
It can be understood that whether the target vehicle meets the predetermined running smoothness condition or not can be preliminarily determined based on the simulation environment, so that the trial and error cost in the training process is reduced, and the risk is reduced.
Step 4004, in response to determining that the first driving smoothness condition is satisfied, establishing a correspondence between the initial velocity of the target vehicle, the selected target velocity, and the selected acceleration. Thereafter, step 4005 is performed.
In this embodiment, in the case where it is determined that the first running smoothness condition is satisfied, the execution main body may further establish a correspondence relationship among the initial speed of the target vehicle, the selected target speed, and the selected acceleration. The first running smoothing condition may be a condition determined in advance for determining whether the running of the vehicle in the simulation environment is smooth. For example, the first driving smoothness condition includes at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
Step 4005, determining whether a preset first end training condition is satisfied. Then, if yes, go to step 4006; if not, go to step 4002.
In this embodiment, the executing entity may further determine whether a preset first end training condition is satisfied. Wherein the first end training condition may be a condition predetermined by a technician for ending training in the simulation environment. For example, the first end training condition may include, but is not limited to, at least one of: the training times reach or exceed the preset times; the training time reaches or exceeds the preset time length.
Step 4006, acquiring a target speed set and an acceleration set in the running process of the target vehicle. Thereafter, step 4007 is performed.
In the present embodiment, the execution subject may further acquire a target speed set and an acceleration set during travel of the target vehicle.
It is understood that the target speed set and the acceleration set in step 4006 are actual data of the target vehicle during the driving process, and therefore, there is usually a certain relationship between the target speed in the target speed set and the acceleration in the acceleration set, so that the training time of the driving model can be reduced, and the accuracy of the trained driving model can be improved.
Here, the target speed set and the acceleration set acquired in step 4006 may also be collected while a driver is driving the target vehicle, so that the generation of acceleration can be learned during driving, which helps the finally trained model produce more accurate accelerations.
Step 4007, selecting a target speed and an acceleration from the target speed set and the acceleration set during the driving process of the target vehicle, and determining whether the target vehicle meets a predetermined second driving smooth condition in a state where the target vehicle is driving according to the selected acceleration. Thereafter, step 4008 is performed.
In this embodiment, the execution subject may further select a target speed and an acceleration from a target speed set and an acceleration set during the running of the target vehicle, and determine whether the target vehicle satisfies a predetermined second running smoothing condition in a state where the target vehicle runs according to the selected acceleration. The second running smoothing condition may be a condition determined in advance for determining whether or not the running of the vehicle is smooth in the generation process of learning the acceleration using the data acquired during the running. For example, the second running smoothing condition includes at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
Step 4008, in response to determining that the second driving smoothness condition is satisfied, establishing a correspondence between the initial speed of the target vehicle, the selected target speed, and the selected acceleration. Thereafter, step 4009 is performed.
In the present embodiment, the executing body described above may further establish a correspondence relationship among the initial speed of the target vehicle, the selected target speed, and the selected acceleration in response to a determination that the second running smoothness condition is satisfied.
Step 4009, determining whether a preset second training ending condition is met. If yes, go to step 4010; if not, go to step 4007.
In this embodiment, the execution subject may determine whether a preset second end training condition is satisfied. Wherein the second end training condition may be a condition predetermined by a technician for ending the training based on the second running smoothness condition. For example, the second end training condition may include, but is not limited to, at least one of: the training times reach or exceed the preset times; the training time reaches or exceeds the preset time length.
Step 4010, selecting a target speed and an acceleration from the target speed set and the acceleration set acquired during the running of the target vehicle, respectively, and determining whether the target vehicle satisfies a predetermined third running smoothness condition in a state where the target vehicle is running during actual travel according to the selected acceleration. Thereafter, step 4011 is performed.
In this embodiment, the executing entity may select a target speed and an acceleration from the target speed set and the acceleration set acquired during the running of the target vehicle, respectively, and determine whether the target vehicle satisfies a predetermined third running smoothness condition in a state where the target vehicle is running during actual travel according to the selected acceleration. The third running smoothing condition may be a condition determined in advance for determining whether or not the running of the vehicle is smooth in the process of learning the generation of acceleration using the data acquired during running. For example, the third travel smoothing condition includes at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
Step 4011, in response to determining that the third traveling smoothness condition is satisfied, establishing a correspondence between the initial velocity of the target vehicle, the selected target velocity, and the selected acceleration. Thereafter, step 4012 is performed.
In the present embodiment, the executing body may further establish a correspondence relationship among the initial speed of the target vehicle, the selected target speed, and the selected acceleration in response to determining that the third running smoothing condition is satisfied.
Step 4012, determining whether a preset third training ending condition is met. If yes, go to step 4013; if not, go to step 4010.
In this embodiment, the executing entity may further determine whether a preset third end training condition is satisfied. Wherein the third end training condition may be a condition predetermined by a technician for ending training based on the third driving smoothness condition. For example, the third end training condition may include, but is not limited to, at least one of: the training times reach or exceed the preset times; training time reaches or exceeds a preset duration, and the like.
The first, second, and third of the first, second, and third travel smoothness conditions are used only for distinguishing the travel smoothness conditions, and do not constitute a special limitation on the travel smoothness conditions, and the first, second, and third travel smoothness conditions may be the same or different travel smoothness conditions. The first, second, and third of the first, second, and third training end conditions are used only for distinguishing the training end conditions, and are not particularly limited to the training end conditions, and the first, second, and third training end conditions may be the same training end conditions or different training end conditions. And is not limited herein.
Step 4013, a driving model characterizing the established at least one correspondence is generated.
In this embodiment, the execution subject may generate a driving model characterizing the established at least one correspondence relationship.
It is understood that the process of performing the training step is the process of adjusting the probabilities in the Q table. When the target speed and the acceleration are selected from the target speed set and the acceleration set for the first time or the first few times, a greedy algorithm may be adopted to select the acceleration from the acceleration set; as the number of selections increases, the acceleration with the maximum probability corresponding to the selected target speed may be selected from the acceleration set. Further, the target speed may be selected from the acquired target speed set in various ways, for example, randomly or in a particular order.
In some usage cases, the target speed set and the acceleration set acquired during the running of the target vehicle in step 4007 and step 4010 may be the same or different. For example, the target speed set and the acceleration set used in step 4007 may be data generated while a driver drives the target vehicle, and the target speed set and the acceleration set used in step 4010 may be data automatically generated by the target vehicle (i.e., data generated during real-vehicle measurement training). In this way, by first training in simulation, then training with the data generated by the driver during driving, and finally training with measurements on the real vehicle, the trial-and-error requirement of reinforcement learning can be met while the time and cost of real-vehicle training are reduced.
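A rough sketch of chaining the three phases described above (simulation, driver-generated data, real-vehicle measurement), assuming each phase reuses the same Q table and differs only in the environment it runs against and in its own end-of-training condition; the environment interface and helper names are placeholders, not part of the disclosure:

```python
def train_driving_model(q_table, phases, run_training_step):
    """phases: list of (environment, end_condition) pairs in training order.

    environment stands in for the rollout source (simulation, driver-log replay, or the
    real vehicle); run_training_step stands in for one iteration of the training step
    sketched earlier.
    """
    for environment, end_condition in phases:
        state = environment.reset()          # assumed helper returning the starting state
        while not end_condition():           # e.g. iteration count or elapsed training time
            state = run_training_step(q_table, state, environment)
    return q_table
```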
Here, the specific implementation of the steps in this embodiment may refer to the embodiment corresponding to fig. 2 or an alternative implementation, and the above description on the embodiment corresponding to fig. 2 may also be applied to this embodiment, which is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the process 400 of the method for generating a model in this embodiment highlights the step of training to obtain a driving model in a simulation environment, so that it can be preliminarily determined whether the target vehicle meets the predetermined driving smoothness condition, thereby reducing the trial-and-error cost in the training process and reducing the risk.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides an embodiment of an apparatus for generating a model, which corresponds to the method embodiment illustrated in fig. 2, and which may include the same or corresponding features as the method embodiment illustrated in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for generating a model of the present embodiment includes: a first acquisition unit 501 and a training unit 502. Wherein the first obtaining unit 501 is configured to obtain a target speed set and an acceleration set, wherein an acceleration in the acceleration set is used to indicate an acceleration that the target vehicle is to have to reach the target speed; the training unit 502 is configured to employ a reinforcement learning algorithm to perform the following training steps based on the initial model to learn the generation of the acceleration: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model characterizing the established at least one correspondence is generated.
In this embodiment, the first obtaining unit 501 of the apparatus 500 for generating a model may obtain the target speed set and the acceleration set from other electronic devices through a wired connection manner or a wireless connection manner, or locally.
In the present embodiment, the acceleration in the above-described set of accelerations is used to indicate the acceleration that the target vehicle is to have to reach the target speed. The target speed may be a desired speed, or a speed to be reached by the vehicle.
In this embodiment, the training unit 502 may employ a reinforcement learning algorithm, and perform the following training steps based on the initial model to learn the generation of the acceleration: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model characterizing the established at least one correspondence is generated.
In some optional implementations of this embodiment, the apparatus 500 further includes: an adjusting unit (not shown in the figures) is configured to adjust model parameters of the initial model in response to determining that the end training condition is not satisfied, and to continue to perform the training step with the initial model after the model parameters are adjusted.
In some optional implementations of this embodiment, the training unit 502 includes: the first determination module (not shown in the figure) is configured to determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running in the simulated environment at the selected acceleration.
In some optional implementations of this embodiment, the training unit 502 includes: the second determination module (not shown in the figure) is configured to determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running during actual running at the selected acceleration.
In some optional implementations of this embodiment, the first obtaining unit 501 includes: the acquisition module (not shown in the figure) is configured to acquire a target speed set and an acceleration set during travel of a target vehicle.
In some optional implementations of this embodiment, the target velocities in the target velocity set correspond to the accelerations in the acceleration set one to one; and aiming at the target speed in the target speed set, the acceleration corresponding to the target speed is obtained by the following steps: determining a time for the target vehicle to reach the target speed from an initial speed; determining a difference between the target speed and the initial speed; and determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
In some optional implementations of the embodiment, the driving smoothing condition includes at least one of: the average speed of the target vehicle is less than a preset speed threshold; the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
The apparatus provided by the above embodiment of the present disclosure acquires, through the first acquiring unit 501, a target speed set and an acceleration set, where the accelerations in the acceleration set are used to indicate the acceleration that the target vehicle is to have in order to reach the target speed, and then the training unit 502, using a reinforcement learning algorithm, performs the following training steps based on an initial model to learn the generation of acceleration: selecting a target speed from the target speed set; selecting an acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to determining that the driving smooth condition is met; determining whether a preset training ending condition is met; and, in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence, so that the vehicle can be controlled without relying on a proportional-integral-derivative controller, thereby enriching the control modes of the vehicle. In addition, controlling the driving of the vehicle with the model obtained by training based on the reinforcement learning algorithm helps to improve the accuracy of vehicle control and the driving safety of the vehicle.
Turning next to fig. 6, a flow 600 of one embodiment of a method for generating information in accordance with the present disclosure is illustrated. The method for generating information comprises the following steps:
Step 601, acquiring an initial speed and a target speed of a target vehicle.
In this embodiment, an execution subject of the method for generating information (for example, the server, the terminal device, or the vehicle shown in fig. 1) may obtain the initial speed and the target speed of the target vehicle from other electronic devices or locally, through a wired or wireless connection.
Here, the target vehicle may be, but is not limited to, a car, a bus, and the like, and the target vehicle may be the same vehicle as the vehicle described in the embodiment of fig. 2 or a different vehicle.
The initial speed may be the actual speed of the target vehicle before the target vehicle receives an instruction to accelerate or decelerate. Each instruction may have its own initial speed.
The target speed may be the speed that the target vehicle is expected or required to reach.
Step 602, inputting the initial speed and the target speed into a pre-trained driving model to obtain an acceleration.
In this embodiment, the executing entity may input the initial speed and the target speed acquired in step 601 into a pre-trained driving model to obtain an acceleration, where the driving model is trained according to the method of any one of the embodiments of the method for generating a model described above, and the acceleration is used to indicate the acceleration that the target vehicle is to have in order to reach the target speed.
It is to be understood that, since the driving model in the embodiment of fig. 2 is a model obtained by using a reinforcement learning algorithm, it can be used to characterize the corresponding relationship between the initial speed, the target speed and the acceleration.
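As a toy illustration of how such a correspondence could be queried at inference time, the following Python sketch looks up the acceleration for a given initial speed and target speed; the nearest-neighbour fallback is an assumption added for the sketch and is not described in the disclosure.

def query_driving_model(correspondences, initial_speed, target_speed):
    # correspondences maps (initial speed, target speed) pairs to accelerations.
    key = (initial_speed, target_speed)
    if key in correspondences:
        return correspondences[key]
    # Assumed fallback: return the acceleration of the closest stored pair.
    nearest = min(correspondences,
                  key=lambda k: abs(k[0] - initial_speed) + abs(k[1] - target_speed))
    return correspondences[nearest]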
Step 603, generating an instruction for instructing the target vehicle to travel according to the acceleration.
In the present embodiment, the execution subject described above may generate an instruction for instructing the target vehicle to travel, based on the acceleration.
As an example, the execution subject described above may directly generate an instruction, including the acceleration, for instructing the target vehicle to travel; alternatively, it may determine a scale value of the accelerator or the brake of the target vehicle based on the obtained acceleration, and generate an instruction including that scale value to instruct the target vehicle to travel.
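The second alternative can be sketched in Python as follows; the maximum acceleration and deceleration values and the instruction format are assumptions made for the example, not parameters given in the disclosure.

MAX_ACCELERATION = 3.0   # assumed full-throttle acceleration in m/s^2
MAX_DECELERATION = 6.0   # assumed full-brake deceleration in m/s^2

def build_drive_instruction(acceleration):
    # Map the obtained acceleration to a scale value of the accelerator or the brake.
    if acceleration >= 0:
        throttle = min(acceleration / MAX_ACCELERATION, 1.0)
        return {"throttle": round(throttle, 2), "brake": 0.0}
    brake = min(-acceleration / MAX_DECELERATION, 1.0)
    return {"throttle": 0.0, "brake": round(brake, 2)}

# Example usage:
print(build_drive_instruction(1.5))   # {'throttle': 0.5, 'brake': 0.0}
print(build_drive_instruction(-3.0))  # {'throttle': 0.0, 'brake': 0.5}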
The method provided by the above embodiment of the present disclosure acquires an initial speed and a target speed of a target vehicle, and then inputs the initial speed and the target speed into a pre-trained driving model to obtain an acceleration, where the driving model is trained according to any one of the above methods for generating a model and the acceleration indicates the acceleration that the target vehicle is to have in order to reach the target speed. An instruction for instructing the target vehicle to travel is then generated according to the acceleration. In this way, the control of the vehicle is realized without a proportional-integral-derivative controller, which enriches the ways of controlling the vehicle. In addition, controlling the driving of the vehicle with a model trained on a reinforcement learning algorithm helps improve the accuracy of vehicle control and the safety of vehicle driving.
Referring next to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in fig. 6 and, in addition to the features described below, may further include features the same as or corresponding to those of the method embodiment shown in fig. 6. The apparatus can be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for generating information of the present embodiment includes: a second acquisition unit 701 configured to acquire an initial speed and a target speed of a target vehicle; an input unit 702 configured to input the initial speed and the target speed into a pre-trained driving model to obtain an acceleration, where the driving model is trained according to the method of any one of the embodiments of the method for generating a model described above, and the acceleration is used to indicate the acceleration that the target vehicle is to have in order to reach the target speed; and a generation unit 703 configured to generate an instruction for instructing the target vehicle to travel according to the acceleration.
In this embodiment, the second acquisition unit 701 of the apparatus 700 for generating information may obtain the initial speed and the target speed of the target vehicle from other electronic devices or locally, through a wired or wireless connection.
In this embodiment, the input unit 702 may input the initial speed and the target speed acquired by the second acquisition unit 701 into the pre-trained driving model to obtain an acceleration, where the driving model is trained according to the method of any one of the embodiments of the method for generating a model described above, and the acceleration is used to indicate the acceleration that the target vehicle is to have in order to reach the target speed.
In this embodiment, the generation unit 703 may generate a command instructing the target vehicle to travel, based on the acceleration obtained by the input unit 702.
The apparatus provided by the above embodiment of the present disclosure acquires an initial speed and a target speed of a target vehicle through the second acquisition unit 701. The input unit 702 then inputs the initial speed and the target speed into a pre-trained driving model to obtain an acceleration, where the driving model is trained according to the method of any one of the embodiments of the method for generating a model described above and the acceleration indicates the acceleration that the target vehicle is to have in order to reach the target speed. Finally, the generation unit 703 generates an instruction for instructing the target vehicle to travel according to the acceleration. In this way, the control of the vehicle is realized without a proportional-integral-derivative controller, which enriches the ways of controlling the vehicle. In addition, controlling the driving of the vehicle with a model trained on a reinforcement learning algorithm helps improve the accuracy of vehicle control and the safety of vehicle driving.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with the electronic device implementing embodiments of the present disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, and the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the Central Processing Unit (CPU)801, performs the above-described functions defined in the method of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Python, Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first acquisition unit and a training unit. As another example, it can be described as: a processor includes a second acquisition unit, an input unit, and a generation unit. Where the names of the units do not in some cases constitute a limitation on the unit itself, for example, the first acquisition unit may also be described as a "unit that acquires a set of target velocities and a set of accelerations".
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target speed set and an acceleration set, wherein the acceleration in the acceleration set is used for indicating the acceleration to be possessed by the target vehicle when the target vehicle reaches the target speed; adopting a reinforcement learning algorithm, and executing the following training steps based on the initial model to learn the generation of the acceleration: selecting a target speed from the target speed set; selecting acceleration from the acceleration set; determining whether the target vehicle meets a predetermined running smooth condition in a state where the target vehicle is running according to the selected acceleration; establishing a corresponding relation among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to the fact that the driving smooth condition is met; determining whether a preset training ending condition is met; in response to determining that the end training condition is satisfied, a driving model characterizing the established at least one correspondence is generated. Or, causing the electronic device to: acquiring an initial speed and a target speed of a target vehicle; inputting the initial speed and the target speed into a pre-trained driving model to obtain an acceleration; an instruction for instructing the target vehicle to travel is generated based on the acceleration.
The foregoing description is only a description of preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (16)

1. A method for generating a model, comprising:
acquiring a target speed set and an acceleration set, wherein the acceleration in the acceleration set is used for indicating the acceleration to be possessed by a target vehicle when the target vehicle reaches the target speed;
adopting a reinforcement learning algorithm, and executing the following training steps based on the initial model to learn the generation of acceleration: selecting a target speed from the target speed set; selecting an acceleration from the acceleration set; determining whether the target vehicle satisfies a predetermined driving smoothness condition in a state where the target vehicle travels according to the selected acceleration; establishing a correspondence among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to determining that the driving smoothness condition is satisfied; determining whether a preset end training condition is satisfied; and in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence.
2. The method of claim 1, wherein the method further comprises:
and responding to the condition that the training is not finished, adjusting the model parameters of the initial model, and continuing to execute the training step by adopting the initial model after the model parameters are adjusted.
3. The method of claim 1, wherein the determining whether the target vehicle satisfies a predetermined driving smoothness condition while the target vehicle is traveling at the selected acceleration comprises:
and determining whether the target vehicle meets a predetermined running smooth condition in a state that the target vehicle runs in the simulation environment according to the selected acceleration.
4. The method of claim 1, wherein the determining whether the target vehicle satisfies a predetermined driving smoothness condition while the target vehicle is traveling at the selected acceleration comprises:
and determining whether the target vehicle meets a predetermined running smooth condition or not in a state that the target vehicle runs in the actual running process according to the selected acceleration.
5. The method of claim 1, wherein a target velocity in the set of target velocities corresponds one-to-one to an acceleration in the set of accelerations; and
for a target speed in the target speed set, the acceleration corresponding to the target speed is obtained by the following steps:
determining a time for the target vehicle to reach the target speed from an initial speed;
determining a difference between the target speed and the initial speed;
and determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
6. The method according to one of claims 1-5, wherein the driving smoothness condition comprises at least one of:
the average speed of the target vehicle is less than a preset speed threshold;
the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
7. A method for generating information, comprising:
acquiring an initial speed and a target speed of a target vehicle;
inputting the initial speed and the target speed into a pre-trained driving model to obtain an acceleration, wherein the driving model is obtained by training according to the method of one of claims 1 to 6, and the acceleration is used for indicating the acceleration to be carried by the target vehicle to reach the target speed;
and generating an instruction for indicating the target vehicle to run according to the acceleration.
8. An apparatus for generating a model, comprising:
a first acquisition unit configured to acquire a target speed set and an acceleration set, wherein accelerations in the acceleration set are used to indicate an acceleration that a target vehicle is to have to reach a target speed;
a training unit configured to employ a reinforcement learning algorithm and perform the following training steps based on the initial model to learn the generation of acceleration: selecting a target speed from the target speed set; selecting an acceleration from the acceleration set; determining whether the target vehicle satisfies a predetermined driving smoothness condition in a state where the target vehicle travels according to the selected acceleration; establishing a correspondence among the initial speed of the target vehicle, the selected target speed and the selected acceleration in response to determining that the driving smoothness condition is satisfied; determining whether a preset end training condition is satisfied; and in response to determining that the end training condition is satisfied, generating a driving model characterizing the established at least one correspondence.
9. The apparatus of claim 8, wherein the apparatus further comprises:
an adjusting unit configured to adjust a model parameter of the initial model in response to determining that the end training condition is not satisfied, and continue to perform the training step with the initial model after the model parameter adjustment.
10. The apparatus of claim 8, wherein the training unit comprises:
a first determination module configured to determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle is running in the simulated environment at the selected acceleration.
11. The apparatus of claim 8, wherein the training unit comprises:
a second determination module configured to determine whether the target vehicle satisfies a predetermined running smoothness condition in a state where the target vehicle travels at the selected acceleration during actual driving.
12. The apparatus of claim 8, wherein a target velocity of the set of target velocities corresponds one-to-one to an acceleration of the set of accelerations; and
for a target speed in the target speed set, the acceleration corresponding to the target speed is obtained by the following steps:
determining a time for the target vehicle to reach the target speed from an initial speed;
determining a difference between the target speed and the initial speed;
and determining the ratio of the determined difference value to the determined time as the acceleration corresponding to the target speed.
13. The apparatus according to one of claims 8-12, wherein the driving smoothness condition comprises at least one of:
the average speed of the target vehicle is less than a preset speed threshold;
the acceleration rate of the target vehicle is less than a preset acceleration rate threshold.
14. An apparatus for generating information, comprising:
a second acquisition unit configured to acquire an initial speed and a target speed of a target vehicle;
an input unit configured to input the initial speed and the target speed to a pre-trained driving model, resulting in an acceleration, wherein the driving model is trained according to the method of one of claims 1 to 6, and the acceleration is used for indicating an acceleration to be taken by the target vehicle to reach the target speed;
a generation unit configured to generate an instruction for instructing the target vehicle to travel, in accordance with the acceleration.
15. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN201811636798.3A 2018-12-29 2018-12-29 Method and apparatus for generating a model Active CN109606383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811636798.3A CN109606383B (en) 2018-12-29 2018-12-29 Method and apparatus for generating a model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811636798.3A CN109606383B (en) 2018-12-29 2018-12-29 Method and apparatus for generating a model

Publications (2)

Publication Number Publication Date
CN109606383A CN109606383A (en) 2019-04-12
CN109606383B true CN109606383B (en) 2020-07-10

Family

ID=66015609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811636798.3A Active CN109606383B (en) 2018-12-29 2018-12-29 Method and apparatus for generating a model

Country Status (1)

Country Link
CN (1) CN109606383B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111308932B (en) * 2020-02-25 2021-08-31 北京百度网讯科技有限公司 Calibration method, device and equipment of brake system and storage medium
CN111402148B (en) * 2020-03-03 2023-05-23 北京百度网讯科技有限公司 Information processing method and apparatus for automatically driving vehicle
CN113799772B (en) * 2020-09-18 2024-03-01 北京京东乾石科技有限公司 Control method, device and control system of vehicle
CN113753063B (en) * 2020-11-23 2023-05-09 北京京东乾石科技有限公司 Method, device, equipment and storage medium for determining vehicle running instruction

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102010014565B4 (en) * 2010-04-10 2021-10-14 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method for detecting a driving resistance of a motor vehicle
CN101916311B (en) * 2010-08-05 2011-11-09 北京交大资产经营有限公司 Model development and simulation test system and method for rail transit automatic pilot system
US20170306874A1 (en) * 2016-04-21 2017-10-26 Ford Global Technologies, Llc Vehicle driver model
US10272778B2 (en) * 2017-01-18 2019-04-30 Baidu Usa Llc Method and system for determining unit gain of speed control for autonomous driving vehicles
CN107506830A (en) * 2017-06-20 2017-12-22 同济大学 Towards the artificial intelligence training platform of intelligent automobile programmed decision-making module
CN107808027B (en) * 2017-09-14 2020-11-24 上海理工大学 Self-adaptive car following method based on improved model predictive control

Also Published As

Publication number Publication date
CN109606383A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109606383B (en) Method and apparatus for generating a model
CN107491073B (en) Data training method and device for unmanned vehicle
CN106020203B (en) Method and apparatus for controlling unmanned vehicle
CN114291098B (en) Parking method and device for automatically driving vehicle
CN109515444B (en) Method and device for outputting driving performance index of unmanned automobile
CN111204348A (en) Method and device for adjusting vehicle running parameters, vehicle and storage medium
CN113183975B (en) Control method, device, equipment and storage medium for automatic driving vehicle
WO2019047643A1 (en) Control method and device for unmanned vehicle
EP4216098A1 (en) Methods and apparatuses for constructing vehicle dynamics model and for predicting vehicle state information
CN109606366B (en) Method and device for controlling a vehicle
CN115496201A (en) Train accurate parking control method based on deep reinforcement learning
JP7196189B2 (en) Method, device and control system for controlling a mobile robot
CN109581874B (en) Method and apparatus for generating information
CN110618692B (en) Method and device for controlling take-off of unmanned aerial vehicle
CN112835362B (en) Automatic lane change planning method and device, electronic equipment and storage medium
CN110103960A (en) Adaptive cruise control method, system and vehicle
CN114987511A (en) Method for simulating human driving behavior to train neural network-based motion controller
EP3758324B1 (en) Method, controller and computer-readable medium for pushing information
CN111399489B (en) Method and device for generating information
CN114419758B (en) Vehicle following distance calculation method and device, vehicle and storage medium
CN113799772B (en) Control method, device and control system of vehicle
CN115372020A (en) Automatic driving vehicle test method, device, electronic equipment and medium
CN112373471B (en) Method, device, electronic equipment and readable medium for controlling vehicle running
CN115320616A (en) Control method, device, equipment and medium for automatically driving vehicle speed
CN112461239B (en) Method, device, equipment and storage medium for planning mobile body path

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant