CN110843746A - Anti-lock brake control method and system based on reinforcement learning - Google Patents

Anti-lock brake control method and system based on reinforcement learning

Info

Publication number
CN110843746A
Authority
CN
China
Prior art keywords
reinforcement learning
reward function
range
parameter
wheel speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911194029.7A
Other languages
Chinese (zh)
Other versions
CN110843746B (en
Inventor
董舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dilu Technology Co Ltd
Original Assignee
Dilu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dilu Technology Co Ltd
Priority to CN201911194029.7A
Publication of CN110843746A
Application granted
Publication of CN110843746B
Legal status: Active
Anticipated expiration

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60T - VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T8/00 - Arrangements for adjusting wheel-braking force to meet varying vehicular or ground-surface conditions, e.g. limiting or varying distribution of braking force
    • B60T8/17 - Using electrical or electronic regulation means to control braking
    • B60T8/172 - Determining control parameters used in the regulation, e.g. by calculations involving measured or detected parameters
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60T - VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T8/00 - Arrangements for adjusting wheel-braking force to meet varying vehicular or ground-surface conditions, e.g. limiting or varying distribution of braking force
    • B60T8/17 - Using electrical or electronic regulation means to control braking
    • B60T8/176 - Brake regulation specially adapted to prevent excessive wheel slip during vehicle deceleration, e.g. ABS
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Regulating Braking Force (AREA)

Abstract

The invention discloses an anti-lock brake control method and system based on reinforcement learning. The method comprises the following steps: extracting key parameters; quantizing the extracted key parameters and limiting their ranges; constructing a reinforcement learning module comprising a reward function; defining, in the reward function, a wheel speed variation range reward function value and a brake time variation range reward function value, and multiplying the two values to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle with the trained reinforcement learning module. The invention overcomes the unreasonable definition of the reward function in existing reinforcement learning, and applying the reinforcement learning algorithm with the newly defined reward function to anti-lock brake control yields better performance than the traditional algorithm.

Description

Anti-lock brake control method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of automatic driving of automobiles, in particular to an anti-lock brake control method and system based on reinforcement learning.
Background
In modern automobiles, the anti-lock brake system (ABS) prevents the wheels from locking during emergency braking, which stabilizes the vehicle body and shortens the braking distance; it is standard equipment on automobiles. With the development of artificial intelligence, anti-lock braking can also be realized with artificial intelligence techniques, which in theory can outperform the conventional algorithm.
Reinforcement learning is an important branch of artificial intelligence and is well suited to sequential decision problems. The anti-lock braking operation of an automobile during emergency braking has this sequential character, so realizing anti-lock braking with reinforcement learning is feasible. The basic idea of reinforcement learning is as follows: an agent takes an action in a specific environment, the environment returns a reward according to the action, and the agent adjusts its actions according to the reward in the expectation of obtaining a higher reward.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention is proposed in view of the above problems of the existing ABS control algorithm and of the reward function definition in reinforcement learning.
Therefore, the technical problem solved by the invention is as follows: when a reinforcement learning algorithm is applied to anti-lock brake control, the defect that the reward function is unreasonably defined in conventional reinforcement learning is overcome.
In order to solve the above technical problems, the invention provides the following technical scheme: an anti-lock brake control method based on reinforcement learning, comprising the following steps: extracting key parameters; quantizing the extracted key parameters and limiting their ranges; constructing a reinforcement learning module comprising a reward function; defining, in the reward function, a wheel speed variation range reward function value and a brake time variation range reward function value, and multiplying the two values to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the key parameters comprise a difference ratio parameter between the wheel speed V1 and the vehicle body speed V2, and a single braking duration parameter T1.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the difference ratio parameter is calculated as (V2 - V1)/V2, and its calculated value is limited to the range 0 to 1.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the calculated value of the difference ratio parameter is limited as follows: when the calculated value is greater than 1, it is reset to 1; when the calculated value is less than 0, it is reset to 0; and when the calculated value lies between 0 and 1, it is not reset.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the wheel speed variation range reward function value includes the steps of: comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1; defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, finding that the braking distance is shortest when the wheel speed V1 is 70% to 90% of the vehicle body speed V2; and, according to the difference ratio parameter between the vehicle body speed V2 and the wheel speed V1 at the shortest braking distance L, defining the wheel speed V1 variation range reward function value (denoted here as R_v) as:
when $V_1 < 75\%\,V_2$: $R_v = 1 - \dfrac{(V_2 - V_1)/V_2 - 25\%}{75\%}$;
when $V_1 > 85\%\,V_2$: $R_v = 1 - \dfrac{15\% - (V_2 - V_1)/V_2}{15\%}$;
when the wheel speed V1 is 75% to 85% of the vehicle body speed V2: $R_v = 1$.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the brake time variation range reward function value includes the steps of: comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1; defining the value range of the single braking duration parameter T1 under the condition of the shortest braking distance L, finding that the braking distance is shortest when T1 is between 50 and 150 ms; and, according to this value range, defining the brake time variation range reward function value (denoted here as R_t) as:
when $t > 150\,\text{ms}$: $R_t = 1 - \dfrac{t - 150\,\text{ms}}{100\,\text{ms}}$;
when $t \le 150\,\text{ms}$: $R_t = 1$;
wherein t represents the braking time.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: when the wheel speed V1 is 80% of the vehicle body speed V2, the braking distance L is the shortest.
In order to solve the above technical problems, the invention also provides the following technical scheme: an anti-lock brake control system based on reinforcement learning, comprising: an extraction module for extracting two key parameters, namely the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking duration parameter T1, and quantizing both key parameters; a limiting module for limiting the range of the two quantized key parameters to obtain the value ranges of the difference ratio parameter and of the single braking duration parameter T1 at which the braking distance is shortest; a definition module for defining the wheel speed V1 variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L; an output module for inputting the product of the two function values defined in the definition module into the reinforcement learning module for training; and a control module for controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the extraction module specifically comprises: an analysis unit for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm; an extraction unit for extracting the two key parameters; and a quantization unit for quantizing the two key parameters.
As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the definition module specifically comprises: a function definition unit for defining the wheel speed V1 variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L; and a normalization unit for normalizing the two defined function values.
The invention has the following beneficial effects: by adding multiple types of rewards to the reward function definition of reinforcement learning, the agent can learn a comprehensively optimal reward under different evaluation indexes, which avoids the situation in which the agent performs well on only one index. Normalization is added to the definition so that the reward values are scaled to 0 to 1, all rewards are balanced, and no single reward takes an absolutely dominant position. The defect that the reward function is unreasonably defined in existing reinforcement learning is thereby overcome, and applying the reinforcement learning algorithm with the new reward function to anti-lock brake control yields better performance than the traditional algorithm.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a schematic flow chart of the operation of the present invention;
FIG. 2 is a schematic flow chart of system modules for implementing the present invention;
FIG. 3 is a schematic flow chart of a general process for carrying out the present invention;
FIG. 4 is a brake effect diagram embodying the present invention;
FIG. 5 is a braking effect diagram of the prior art.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to FIG. 1 to FIG. 3, this embodiment proposes applying a reinforcement learning algorithm to ABS control. Because reinforcement learning is suited to sequential decision problems, and the ABS operation of a car during emergency braking has this sequential character, the invention is highly practicable and, together with the reward function definition of the invention, can achieve excellent ABS control.
Specifically, an anti-lock brake control method based on reinforcement learning comprises the following steps:
S1: extracting key parameters. By analyzing and summarizing the operating principle of the traditional anti-lock brake algorithm, the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking duration parameter T1 are extracted.
First, because the difference between the wheel speed V1 and the vehicle body speed V2 during braking is important, the first extracted key parameter is the difference ratio parameter of the two speeds. It is calculated as (V2 - V1)/V2, and its calculated value is limited to the range 0 to 1 by the following rule:
when the calculated value of the difference ratio parameter is greater than 1, it is reset to 1;
when the calculated value of the difference ratio parameter is less than 0, it is reset to 0;
and when the calculated value of the difference ratio parameter lies between 0 and 1, it is not reset.
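The limiting rule above can be sketched in a few lines of Python. This is an illustration only: the function name `slip_ratio` and the handling of a zero vehicle body speed are assumptions that do not appear in the source.

```python
def slip_ratio(wheel_speed: float, body_speed: float) -> float:
    """Difference ratio (V2 - V1) / V2, limited to the range 0 to 1."""
    if body_speed <= 0.0:
        # Assumption: the text does not cover a stationary vehicle,
        # so this sketch treats it as "no difference".
        return 0.0
    ratio = (body_speed - wheel_speed) / body_speed
    # Reset to 1 when above 1, to 0 when below 0, otherwise leave unchanged.
    return min(1.0, max(0.0, ratio))
```

For example, a wheel speed of 80 against a body speed of 100 gives a ratio of 0.2, while a wheel spinning faster than the body speed is reset to 0.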
Second, following the manner of intermittent (pumping) braking, the parameter T1, which controls the duration of a single brake application in the reinforcement learning algorithm, is extracted.
S2: quantizing the extracted key parameters and limiting their ranges, specifically comprising the following steps:
first, in the simulation system, the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1 is compared; by quantizing the key parameters, it is found that the braking distance is shortest when the wheel speed V1 is 70% to 90% of the vehicle body speed V2, and in general the best overall braking distance is obtained when the wheel speed V1 is 80% of the vehicle body speed V2;
then, in the simulation system, the single braking duration parameter T1 is analyzed under the condition of the shortest braking distance, and it is found that the braking distance is shortest when T1 is between 50 and 150 ms.
S3: constructing the reinforcement learning module comprising the reward function. In the algorithm, rewards are realized through the defined reward function, and the quality of that definition directly determines the quality of the learning module, so the reinforcement learning module comprising the reward function needs to be reconstructed.
S4: defining a wheel speed variation range reward function value and a brake time variation range reward function value in the reward function, and multiplying the two values to output a result, specifically comprising the following steps:
first, the wheel speed variation range reward function value is defined through the following steps:
① comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
② defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, finding that the braking distance is shortest when the wheel speed V1 is 70% to 90% of the vehicle body speed V2;
③ according to the difference ratio parameter between the vehicle body speed V2 and the wheel speed V1 at the shortest braking distance L, defining the wheel speed V1 variation range reward function value (denoted here as R_v) as:
when $V_1 < 75\%\,V_2$: $R_v = 1 - \dfrac{(V_2 - V_1)/V_2 - 25\%}{75\%}$;
when $V_1 > 85\%\,V_2$: $R_v = 1 - \dfrac{15\% - (V_2 - V_1)/V_2}{15\%}$;
when the wheel speed V1 is 75% to 85% of the vehicle body speed V2: $R_v = 1$.
It should be noted that rewards are given both when the wheel speed is above 85% of the vehicle body speed and when it is below 75% of the vehicle body speed, so that the agent can learn a better difference value and does not drift in only one direction;
the wheel speed reward value is scaled to 0 to 1, and the reward weights above 85% and below 75% of the vehicle body speed are consistent, which prevents the agent from learning in a one-sided way;
when the wheel speed is 75% to 85% of the vehicle body speed, the reward value is a constant 1, which gives the agent a buffer zone and avoids abrupt jumps.
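The piecewise wheel speed reward described above can be sketched as follows. The function name `wheel_speed_reward`, the zero-speed guard, and the reading of the two sloped branches as linear ramps in the difference ratio (V2 - V1)/V2 are assumptions made for illustration.

```python
def wheel_speed_reward(wheel_speed: float, body_speed: float) -> float:
    """Wheel speed variation range reward, scaled to the range 0 to 1."""
    if body_speed <= 0.0:
        # Assumption: a stationary vehicle is not covered by the text.
        return 0.0
    # Difference ratio (V2 - V1) / V2, limited to 0..1 as in step S1.
    slip = min(1.0, max(0.0, (body_speed - wheel_speed) / body_speed))
    if slip > 0.25:
        # V1 below 75% of V2: reward falls linearly from 1 to 0.
        return 1.0 - (slip - 0.25) / 0.75
    if slip < 0.15:
        # V1 above 85% of V2: reward falls linearly from 1 to 0.
        return 1.0 - (0.15 - slip) / 0.15
    # V1 within 75% to 85% of V2: constant buffer zone.
    return 1.0
```

Under this reading, both a locked wheel (V1 = 0) and a freely rolling wheel (V1 = V2) receive a reward of 0, and any wheel speed in the 75% to 85% band receives 1.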
Then, the brake time variation range reward function value is defined through the following steps:
① comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
② defining the value range of the single braking duration parameter T1 under the condition of the shortest braking distance L, finding that the braking distance is shortest when T1 is between 50 and 150 ms;
③ according to this value range, defining the brake time variation range reward function value (denoted here as R_t) as:
when $t > 150\,\text{ms}$: $R_t = 1 - \dfrac{t - 150\,\text{ms}}{100\,\text{ms}}$;
when $t \le 150\,\text{ms}$: $R_t = 1$;
wherein t represents the braking time.
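A minimal sketch of the brake time reward follows, assuming that the constant branch applies to durations of at most 150 ms and the sloped branch to longer durations. The function name `brake_time_reward` is hypothetical, and the lower bound of 0 is an assumption added so that the value stays within the 0 to 1 range the text describes; the formula itself would go negative beyond 250 ms.

```python
def brake_time_reward(duration_ms: float) -> float:
    """Brake time variation range reward for a single brake application."""
    if duration_ms <= 150.0:
        # Durations up to 150 ms receive the full reward.
        return 1.0
    # Longer durations lose reward linearly over the next 100 ms.
    reward = 1.0 - (duration_ms - 150.0) / 100.0
    # Assumption: clip at 0 so the reward remains in the range 0 to 1.
    return max(0.0, reward)
```

With these assumptions, a 200 ms application scores 0.5 and anything at or beyond 250 ms scores 0.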
It should be further explained that the wheel speed V1 variation range reward function value and the brake time variation range reward function value are normalized when they are defined, which balances the algorithm and improves training efficiency. Constructing the reinforcement learning module from the product of the two reward function values keeps the agent from spending a large amount of time searching and learning in invalid, inefficient ranges, which would otherwise leave the learned model biased and difficult to adjust. Meanwhile, adding multiple types of rewards to the reward function definition lets the agent learn a comprehensively optimal reward under different evaluation indexes and avoids the situation in which the agent performs well on only one index. Normalization is also added to the reward function definition: the reward values are scaled to 0 to 1 so that all rewards are balanced and no single reward takes an absolutely dominant position.
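The multiplication of the two normalized rewards can be illustrated with the short sketch below. The function name `combined_reward` and the range check are assumptions; the source only states that the two values are multiplied.

```python
def combined_reward(wheel_reward: float, time_reward: float) -> float:
    """Product of the two normalized reward values, itself in 0..1."""
    for name, value in (("wheel_reward", wheel_reward),
                        ("time_reward", time_reward)):
        # Assumption: reject inputs that were not normalized to 0..1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return wheel_reward * time_reward
```

Because both factors lie between 0 and 1, the product is high only when both indexes are good: a perfect wheel speed reward combined with a brake time reward of 0.5 yields 0.5, and a zero on either index yields 0.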
S5: inputting the output result to the reinforcement learning module for training;
S6: controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
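The source does not name a specific reinforcement learning algorithm for steps S5 and S6. Purely as an illustration of how the output reward could drive training, the sketch below swaps in a tabular Q-learning update. The state encoding, the action set `ACTIONS`, the learning rate `alpha`, and the discount factor `gamma` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical action set: candidate single-brake durations in ms.
ACTIONS = (50, 100, 150, 200)


def q_update(q, state, action, reward, next_state,
             actions=ACTIONS, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step using the combined reward from S4."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the stored estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: states are hypothetical discretized difference-ratio bins.
q_table = defaultdict(float)
first = q_update(q_table, state=3, action=100, reward=1.0, next_state=2)
second = q_update(q_table, state=2, action=150, reward=0.5, next_state=3)
```

After training, control (S6) in this sketch would amount to choosing, in each state, the action with the highest stored value.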
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Example 2
Referring to Figs. 1 to 5, the present embodiment provides an anti-lock brake control system based on reinforcement learning, including:
an extraction module 100 for extracting two key parameters, namely the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking duration parameter T1, and for quantizing both key parameters,
wherein the extraction module 100 specifically comprises:
an analysis unit for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm;
an extraction unit for extracting the two key parameters; and
a quantization unit for quantizing the two key parameters;
a limiting module 200 for limiting the range of the two quantized key parameters to obtain the value ranges of the difference ratio parameter and of the single braking duration parameter T1 at which the braking distance is shortest;
a definition module 300 for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value,
wherein the definition module 300 comprises:
a function definition unit for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value; and
a normalization unit for normalizing the two defined function values;
an output module 400 for inputting the algorithm formed by multiplying the two function values defined in the definition module 300 into the reinforcement learning module for training; and
a control module 500 for controlling the anti-lock braking of the vehicle using the trained reinforcement learning module.
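The role of the output module 400 can be illustrated with a minimal Python sketch. It assumes that both reward function values have already been normalized to the range 0 to 1 by the normalization unit; the function name `combined_reward` and the range check are illustrative and do not come from the patent.

```python
def combined_reward(wheel_speed_reward: float, brake_time_reward: float) -> float:
    """Multiply the two reward function values into a single training reward.

    Assumption (not stated in the patent text): both inputs were normalized
    to the range [0, 1] before being passed in.
    """
    for name, value in (
        ("wheel_speed_reward", wheel_speed_reward),
        ("brake_time_reward", brake_time_reward),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return wheel_speed_reward * brake_time_reward


# The product is high only when both components are high.
print(combined_reward(1.0, 0.5))  # 0.5
print(combined_reward(0.0, 1.0))  # 0.0
```

Because the two values are multiplied rather than added, a poor score on either criterion pulls the overall reward toward zero.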
It should be noted that the algorithm model provided by the invention converges clearly during training and significantly reduces the braking distance, whereas the traditional algorithm model does not converge during training, and its braking distance oscillates between 44 m and more than 70 m throughout.
Figs. 4 and 5 of the accompanying drawings compare the training results of the present invention and of the conventional algorithm, wherein:
① the abscissa represents the number of training iterations, and the ordinate represents the braking distance, in meters (m), from the start of braking at 100 km/h until the vehicle speed reaches 0 km/h;
② the road friction coefficient is the same in both figures;
③ Fig. 5 shows the training result when the reward definition of this patent is not adopted: after 6000 training iterations, the braking distance still oscillates between 44 m and more than 70 m, with no sign of convergence;
④ Fig. 4 shows the training result when the reward definition of this patent is adopted: the total number of training iterations is 25000, but convergence already appears by 2000 iterations; after 6000 iterations the braking distance is generally kept below 50 m, and as the number of iterations continues to increase it fluctuates within 41 m to 44 m.
As the comparison shows, the reinforcement learning module containing the reward function constructed by the invention provides an effective algorithm for anti-lock brake control and is well suited to the field of automatic driving, where it achieves a good control effect.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. An anti-lock brake control method based on reinforcement learning, characterized by comprising the following steps:
extracting key parameters;
quantizing the extracted key parameters and limiting the range of the key parameters;
constructing a reinforcement learning module comprising a reward function;
defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the wheel speed change range reward function value and the brake time change range reward function value to output a result;
inputting the output result to the reinforcement learning module for training;
and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
2. The reinforcement learning-based antilock brake control method according to claim 1, wherein: the key parameters include:
a difference ratio parameter between the wheel speed V1 and the vehicle body speed V2; and
a single braking duration parameter T1.
3. The reinforcement learning-based antilock brake control method according to claim 2, wherein: the calculation function of the difference ratio parameter is,
(V2-V1)/V2, and the calculated value of the difference ratio parameter is limited to the range of 0 to 1.
4. The reinforcement learning-based antilock brake control method according to claim 3, wherein: the calculated value of the difference ratio parameter is limited as follows:
resetting the calculated value of the difference ratio parameter to 1 when the calculated value of the difference ratio parameter is greater than 1;
resetting the calculated value of the difference ratio parameter to 0 when the calculated value of the difference ratio parameter is less than 0;
and performing no reset when the calculated value of the difference ratio parameter is between 0 and 1.
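The calculation and limiting of the difference ratio parameter in claims 3 and 4 can be sketched in Python as follows. The function name `difference_ratio` and the rejection of a non-positive vehicle body speed are assumptions made for illustration; the claims do not state how a body speed of zero is handled.

```python
def difference_ratio(wheel_speed: float, body_speed: float) -> float:
    """Return (V2 - V1) / V2 limited to the range [0, 1].

    wheel_speed is V1 and body_speed is V2, in the same unit.
    Assumption: body_speed must be positive, since the ratio is
    undefined at V2 = 0 and the claims do not cover that case.
    """
    if body_speed <= 0:
        raise ValueError("body_speed must be positive")
    ratio = (body_speed - wheel_speed) / body_speed
    # Values above 1 are reset to 1, values below 0 are reset to 0.
    return min(1.0, max(0.0, ratio))


print(difference_ratio(120.0, 100.0))  # 0.0 (raw value -0.2 is reset to 0)
print(difference_ratio(-10.0, 100.0))  # 1.0 (raw value 1.1 is reset to 1)
```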
5. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein: defining the wheel speed variation range reward function value includes the steps of:
comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, it being found that the braking distance is shortest when the wheel speed V1 is 70% to 90% of the vehicle body speed V2;
according to the difference ratio parameter between the vehicle body speed V2 and the wheel speed V1 under the shortest braking distance L, defining the wheel speed V1 variation range reward function value as:
when V1 < 75% V2, reward function value = 1 - (((V2-V1)/V2 - 25%) ÷ 75%);
when V1 > 85% V2, reward function value = 1 - ((15% - (V2-V1)/V2) ÷ 15%);
when the wheel speed V1 is 75% to 85% of the vehicle body speed V2, the wheel speed V1 variation range reward function value is 1.
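The piecewise definition in claim 5 can be sketched in Python under one stated reading: each formula is taken to act on the difference ratio s = (V2-V1)/V2 of claim 3, so that V1 < 75% V2 corresponds to s > 25% and V1 > 85% V2 corresponds to s < 15%. The function name `wheel_speed_reward` and the input check are illustrative assumptions, not part of the claims.

```python
def wheel_speed_reward(difference_ratio: float) -> float:
    """Wheel speed variation range reward for a difference ratio s in [0, 1].

    s = (V2 - V1) / V2, already limited to [0, 1] as in claims 3 and 4.
    Assumed reading of claim 5: the reward is 1 for 15% <= s <= 25% and
    falls linearly to 0 at s = 0 and at s = 1.
    """
    s = difference_ratio
    if not 0.0 <= s <= 1.0:
        raise ValueError("difference_ratio must lie in [0, 1]")
    if s > 0.25:  # wheel speed below 75% of body speed
        return 1.0 - (s - 0.25) / 0.75
    if s < 0.15:  # wheel speed above 85% of body speed
        return 1.0 - (0.15 - s) / 0.15
    return 1.0  # wheel speed within 75% to 85% of body speed


print(wheel_speed_reward(0.2))  # 1.0
print(wheel_speed_reward(1.0))  # 0.0 (fully locked wheel)
print(wheel_speed_reward(0.0))  # 0.0 (no slip at all)
```

Under this reading the reward is continuous at both band edges and stays within 0 to 1, which matches the normalization described for the definition module.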
6. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein: defining the brake time variation range reward function value comprises the steps of:
comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
defining the value range of the single braking duration parameter T1 under the condition of the shortest braking distance L, it being found that the braking distance is shortest when the single braking duration parameter T1 is between 50 ms and 150 ms;
according to the value range, defining the brake time variation range reward function value as:
when t > 150 ms, the reward function value is 1 - (t - 150 ms)/100 ms;
when t ≤ 150 ms, the reward function value is 1;
wherein t represents the braking time.
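The brake time reward of claim 6 can be sketched in Python as follows, assuming that the constant branch covers durations up to and including 150 ms (the only reading that does not overlap the linear branch). The function name `brake_time_reward` and the rejection of negative durations are illustrative assumptions. The formula is implemented as written, so it becomes negative beyond 250 ms; whether such values are later limited during normalization is not stated in the claim.

```python
def brake_time_reward(t_ms: float) -> float:
    """Brake time variation range reward for a single braking duration in ms.

    Assumed reading of claim 6: reward is 1 up to 150 ms and then decreases
    linearly by 1 per additional 100 ms. No lower limit is applied here
    because the claim does not state one.
    """
    if t_ms < 0:
        raise ValueError("t_ms must be non-negative")
    if t_ms > 150.0:
        return 1.0 - (t_ms - 150.0) / 100.0
    return 1.0


print(brake_time_reward(100.0))  # 1.0
print(brake_time_reward(200.0))  # 0.5
```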
7. The reinforcement learning-based antilock brake control method according to claim 5, wherein: when the wheel speed V1 is 80% of the vehicle body speed V2, the shortest braking distance L is smallest.
8. An anti-lock brake control system based on reinforcement learning, characterized by comprising:
an extraction module (100) for extracting two key parameters, namely the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking duration parameter T1, and for quantizing both key parameters;
a limiting module (200) for limiting the range of the two quantized key parameters to obtain the value ranges of the difference ratio parameter and of the single braking duration parameter T1 at which the braking distance is shortest;
a definition module (300) for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value;
an output module (400) for inputting the algorithm formed by multiplying the two function values defined in the definition module (300) into a reinforcement learning module for training; and
a control module (500) for controlling the anti-lock braking of the vehicle using the trained reinforcement learning module.
9. The reinforcement learning-based antilock brake control system according to claim 8, wherein: the extraction module (100) specifically comprises:
an analysis unit for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm;
an extraction unit for extracting the two key parameters; and
a quantization unit for quantizing the two key parameters.
10. The reinforcement learning-based antilock brake control system according to claim 8, wherein: the definition module (300) specifically comprises:
a function definition unit for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value; and
a normalization unit for normalizing the two defined function values.
CN201911194029.7A 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning Active CN110843746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194029.7A CN110843746B (en) 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110843746A true CN110843746A (en) 2020-02-28
CN110843746B CN110843746B (en) 2022-06-14

Family

ID=69605967

Country Status (1)

Country Link
CN (1) CN110843746B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004249812A (en) * 2003-02-19 2004-09-09 Fuji Heavy Ind Ltd Device and method for generating vehicle movement model
US20070067085A1 (en) * 2005-09-19 2007-03-22 Ford Global Technologies Llc Integrated vehicle control system using dynamically determined vehicle conditions
CN101224740A (en) * 2008-01-31 2008-07-23 赵西安 Anti-lock method
CN101311047A (en) * 2008-05-04 2008-11-26 重庆邮电大学 Vehicle anti-lock brake control method based on least squares support vector machine
DE102009019960A1 (en) * 2008-06-09 2009-12-10 Ford Global Technologies, LLC, Dearborn Method for compensating for normal forces in antilock control
US20110175438A1 (en) * 2010-01-21 2011-07-21 Ford Global Technologies Llc Vehicle Line-Locking Braking System and Method
CN104015711A (en) * 2014-06-17 2014-09-03 广西大学 Dual fuzzy control method of automobile ABS
US20180009445A1 (en) * 2016-07-08 2018-01-11 Toyota Motor Engineering & Manufacturing North America, Inc. Online learning and vehicle control method based on reinforcement learning without active exploration
US20190113918A1 (en) * 2017-10-18 2019-04-18 Luminar Technologies, Inc. Controlling an autonomous vehicle based on independent driving decisions
CN109709956A (en) * 2018-12-26 2019-05-03 同济大学 A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
CN109808706A (en) * 2019-02-14 2019-05-28 上海思致汽车工程技术有限公司 Learning type assistant driving control method, device, system and vehicle
CN109858630A (en) * 2019-02-01 2019-06-07 清华大学 Method and apparatus for intensified learning
WO2019155061A1 (en) * 2018-02-09 2019-08-15 Deepmind Technologies Limited Distributional reinforcement learning using quantile function neural networks
CN110136481A (en) * 2018-09-20 2019-08-16 初速度(苏州)科技有限公司 A kind of parking strategy based on deeply study
CN110254408A (en) * 2019-05-21 2019-09-20 江苏大学 A kind of adaptive time-varying slip rate constraint control algolithm of intelligent automobile anti-lock braking system
CN110348278A (en) * 2018-04-02 2019-10-18 索尼公司 The efficient intensified learning frame of the sample of view-based access control model for autonomous driving
CN110450771A (en) * 2019-08-29 2019-11-15 合肥工业大学 A kind of intelligent automobile stability control method based on deeply study

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111605558A (en) * 2020-04-21 2020-09-01 宁波吉利汽车研究开发有限公司 Vehicle speed determination method and device, electronic equipment and vehicle
CN111605558B (en) * 2020-04-21 2022-07-19 浙江吉利控股集团有限公司 Vehicle speed determination method and device, electronic equipment and vehicle
CN112906304A (en) * 2021-03-10 2021-06-04 北京航空航天大学 Brake control method and device
CN112906304B (en) * 2021-03-10 2023-04-07 北京航空航天大学 Brake control method and device

Also Published As

Publication number Publication date
CN110843746B (en) 2022-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant