CN110843746A - Anti-lock brake control method and system based on reinforcement learning

Info

Publication number: CN110843746A
Application number: CN201911194029.7A
Authority: CN
Inventors: 董舒
Original assignee: Dilu Technology Co Ltd
Current assignee: Dilu Technology Co Ltd
Priority date: 2019-11-28
Filing date: 2019-11-28
Publication date: 2020-02-28
Anticipated expiration: 2039-11-28
Also published as: CN110843746B

Abstract

The invention discloses an anti-lock brake control method and system based on reinforcement learning. The method comprises the following steps: extracting key parameters; quantizing the extracted key parameters and limiting their range; constructing a reinforcement learning module comprising a reward function; defining a wheel speed variation range reward function value and a brake time variation range reward function value in the reward function, and multiplying the two values to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module. The invention overcomes the defect that the reward function is unreasonably defined in existing reinforcement learning, and applies a reinforcement learning algorithm with the newly defined reward function to anti-lock brake control, so that its performance is better than that of the traditional algorithm.

Description

Anti-lock brake control method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of automatic driving of automobiles, in particular to an anti-lock brake control method and system based on reinforcement learning.

Background

In modern automobiles, the anti-lock brake system (ABS) prevents the wheels from locking during emergency braking, which stabilizes the vehicle body and shortens the braking distance; it is standard equipment on automobiles. With the development of artificial intelligence, anti-lock braking can also be implemented with artificial intelligence techniques, which in theory can perform better than the conventional algorithm.

Reinforcement learning is an important branch of artificial intelligence and is well suited to sequential decision problems. The anti-lock brake operation of an automobile during emergency braking has this sequential character, so implementing anti-lock braking with reinforcement learning is feasible. The basic idea of reinforcement learning is as follows: the agent takes an action in a specific environment, the environment returns a reward according to that action, and the agent adjusts its actions according to the reward in order to obtain a higher reward.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention is proposed in view of the above problems of the existing ABS control algorithm and of the reward function definition in reinforcement learning.

Therefore, the technical problem solved by the invention is as follows: when a reinforcement learning algorithm is applied to anti-lock brake control, the defect that the reward function is unreasonably defined in conventional reinforcement learning is overcome.

In order to solve the technical problems, the invention provides the following technical scheme: an anti-lock brake control method based on reinforcement learning comprises the following steps: extracting key parameters; quantizing the extracted key parameters and limiting the range of the key parameters; constructing a reinforcement learning module comprising a reward function; defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the wheel speed change range reward function value and the brake time change range reward function value to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the key parameters comprise a difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂, and a single braking duration parameter T₁.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the difference ratio parameter is calculated as (V₂ − V₁)/V₂, and its calculated value is limited to the range 0-1.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the calculated value of the difference ratio parameter is limited as follows: when the calculated value is greater than 1, it is reset to 1; when the calculated value is less than 0, it is reset to 0; and when the calculated value lies within 0-1, it is not reset.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the wheel speed variation range reward function value comprises the steps of: comparing the shortest braking distance L under different combinations of vehicle body speed V₂ and wheel speed V₁; defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, the braking distance being shortest when the wheel speed V₁ is 70-90% of the vehicle body speed V₂; and, according to the difference ratio parameter between the vehicle body speed V₂ and the wheel speed V₁ at the shortest braking distance L, defining the wheel speed V₁ variation range reward function value as:

when $V_1 < 75\%\,V_2$, the reward function value is $1 - \left(\frac{V_2 - V_1}{V_2} - 25\%\right) \div 75\%$;

when $V_1 > 85\%\,V_2$, the reward function value is $1 - \left(15\% - \frac{V_2 - V_1}{V_2}\right) \div 15\%$;

when the wheel speed V₁ is 75%-85% of the vehicle body speed V₂, the wheel speed V₁ variation range reward function value is 1.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the brake time variation range reward function value comprises the steps of: comparing the shortest braking distance L under different combinations of vehicle body speed V₂ and wheel speed V₁; defining the value range of the single braking duration parameter T₁ under the condition of the shortest braking distance L, the braking distance being shortest when T₁ is between 50 and 150 ms; and, according to this value range, defining the brake time variation range reward function value as:

when the duration t is greater than 150 ms, the reward function value is $1 - (t - 150\,\mathrm{ms}) / 100\,\mathrm{ms}$;

when the duration t is less than or equal to 150 ms, the reward function value is 1;

wherein t represents the braking time.

As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the shortest braking distance L is smallest when the wheel speed V₁ is 80% of the vehicle body speed V₂.

In order to solve the above technical problems, the invention also provides the following technical scheme: an anti-lock brake control system based on reinforcement learning, comprising: an extraction module for extracting two key parameters, namely the difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂ and the single braking duration parameter T₁, and for quantizing both key parameters; a limiting module for limiting the range of the two quantized key parameters to obtain the value ranges of the difference ratio parameter and of the single braking duration parameter T₁ at which the braking distance is shortest; a definition module for defining, under the condition of the shortest braking distance L, the wheel speed V₁ variation range reward function value and the brake time variation range reward function value; an output module for inputting the algorithm formed by multiplying the two function values defined in the definition module into the reinforcement learning module for training; and a control module for controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.

As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the extraction module specifically comprises an analysis unit for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm; the extraction unit is used for extracting two key parameters; and the quantization unit is used for quantizing the two key parameters.

As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the definition module specifically comprises a function definition unit for defining, under the condition of the shortest braking distance L, the wheel speed V₁ variation range reward function value and the brake time variation range reward function value; and a normalizing unit for normalizing the two defined function values.

The invention has the following beneficial effects. Several types of reward are included in the reward function definition, so the agent can learn a comprehensively optimal reward across different evaluation indexes instead of performing well on only one of them. Normalization is included in the definition, so each reward value is scaled to 0-1, all rewards are balanced, and no single reward takes an absolutely dominant position. The defect that the reward function is unreasonably defined in existing reinforcement learning is thereby overcome, and applying the reinforcement learning algorithm with the newly defined reward function to anti-lock brake control yields better performance than the traditional algorithm.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:

FIG. 1 is a schematic flow chart of the operation of the present invention;

FIG. 2 is a schematic flow chart of system modules for implementing the present invention;

FIG. 3 is a schematic flow chart of a general process for carrying out the present invention;

FIG. 4 is a braking effect diagram of the present invention;

FIG. 5 is a braking effect diagram of the prior art.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1

Referring to FIGS. 1 to 3, this embodiment applies a reinforcement learning algorithm to ABS control. Reinforcement learning is suited to sequential decision problems, and the ABS operation of an automobile during emergency braking has this sequential character, so the approach is practical to implement. Combined with the reward function definition of the present invention, it can achieve excellent ABS control.

Specifically, the anti-lock brake control method based on reinforcement learning comprises the following steps:

S1: extracting key parameters. By analyzing and summarizing the working principle of the traditional anti-lock brake algorithm, the difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂ and the single braking duration parameter T₁ are extracted.

Because the difference between the wheel speed V₁ and the vehicle body speed V₂ during braking is important, the first extracted key parameter is the difference ratio parameter of the two speeds, calculated as (V₂ − V₁)/V₂. The calculated value of the difference ratio parameter is limited to the range 0-1 by the following rule:

resetting the calculated value of the difference ratio parameter to 1 when the calculated value of the difference ratio parameter is larger than 1;

resetting the calculated value of the difference ratio parameter to 0 when the calculated value of the difference ratio parameter is less than 0;

and when the calculated value of the difference value proportion parameter is 0-1, resetting is not carried out.
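The limiting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `difference_ratio` and the handling of a non-positive vehicle body speed are assumptions, since the source does not say what to do when V₂ is zero.

```python
def difference_ratio(wheel_speed: float, body_speed: float) -> float:
    """Return (V2 - V1) / V2, limited to the range 0-1.

    wheel_speed is V1 and body_speed is V2, in the same unit.
    """
    if body_speed <= 0.0:
        # Assumption: the source does not cover a stationary vehicle,
        # so this sketch simply reports a ratio of 0.
        return 0.0
    ratio = (body_speed - wheel_speed) / body_speed
    # Reset to 1 when above 1, to 0 when below 0, otherwise keep the value.
    return min(1.0, max(0.0, ratio))


if __name__ == "__main__":
    print(difference_ratio(80.0, 100.0))   # wheel at 80% of body speed
    print(difference_ratio(120.0, 100.0))  # wheel faster than body: reset to 0
    print(difference_ratio(-10.0, 100.0))  # raw ratio above 1: reset to 1
```

A wheel speed above the vehicle body speed gives a negative raw ratio, and a negative wheel speed reading gives a raw ratio above 1; both cases are pulled back into 0-1 by the final line.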

Secondly, following the intermittent (pulsed) braking approach, the parameter T₁ that controls the duration of a single braking action in the reinforcement learning algorithm is extracted.

S2: quantifying the extracted key parameters and limiting their range, which specifically comprises the following steps:

first, in a simulation system, the shortest braking distance L is compared across different combinations of vehicle body speed V₂ and wheel speed V₁. Quantifying the key parameter shows that the braking distance is shortest when the wheel speed V₁ is 70-90% of the vehicle body speed V₂; in general, the best overall braking distance is obtained when the wheel speed V₁ is 80% of the vehicle body speed V₂;

then, in the simulation system, the single braking duration parameter T₁ is analyzed under the condition of the shortest braking distance, which shows that the braking distance is shortest when T₁ is between 50 and 150 ms.

S3: constructing the reinforcement learning module comprising the reward function. In the algorithm, the reward is produced by the defined reward function, and the quality of that definition directly determines the quality of the learning module, so the reinforcement learning module is rebuilt around the new reward function.

S4: defining a wheel speed variation range reward function value and a brake time variation range reward function value in the reward function, and multiplying the two values to output a result, which specifically comprises the following steps:

firstly, the wheel speed variation range reward function value is defined, which specifically comprises the following steps:

① comparing the shortest braking distance L under different combinations of vehicle body speed V₂ and wheel speed V₁;

② defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, the braking distance being shortest when the wheel speed V₁ is 70-90% of the vehicle body speed V₂;

③ according to the difference ratio parameter between the vehicle body speed V₂ and the wheel speed V₁ at the shortest braking distance L, defining the wheel speed V₁ variation range reward function value as:

when $V_1 < 75\%\,V_2$, the reward function value is $1 - \left(\frac{V_2 - V_1}{V_2} - 25\%\right) \div 75\%$;

when $V_1 > 85\%\,V_2$, the reward function value is $1 - \left(15\% - \frac{V_2 - V_1}{V_2}\right) \div 15\%$;

when the wheel speed V₁ is 75%-85% of the vehicle body speed V₂, the wheel speed V₁ variation range reward function value is 1.

It should be noted that: a graded reward is given both when the wheel speed is above 85% of the vehicle body speed and when it is below 75% of the vehicle body speed, so that the agent can learn a better difference value and does not drift in only one direction;

the wheel speed reward value is scaled to 0-1, and the reward weights above 85% and below 75% of the vehicle body speed are consistent, which prevents the agent from learning one side preferentially;

when the wheel speed is 75%-85% of the vehicle body speed, the reward value is a constant 1, which gives the agent a buffer zone and avoids abrupt jumps.
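As an illustration of the wheel speed reward just described, the following Python sketch reads the two sloped branches as 1 − (s − 25%) ÷ 75% for V₁ below 75% of V₂ and 1 − (15% − s) ÷ 15% for V₁ above 85% of V₂, where s = (V₂ − V₁)/V₂ limited to 0-1. That reading, the function name `wheel_speed_reward` and the requirement that V₂ be positive are assumptions made for this sketch.

```python
def wheel_speed_reward(wheel_speed: float, body_speed: float) -> float:
    """Wheel speed variation range reward, scaled to 0-1.

    wheel_speed is V1, body_speed is V2 (assumed > 0 in this sketch).
    """
    if body_speed <= 0.0:
        raise ValueError("body_speed must be positive in this sketch")
    # Difference ratio s = (V2 - V1) / V2, limited to 0-1.
    s = min(1.0, max(0.0, (body_speed - wheel_speed) / body_speed))
    if s > 0.25:
        # V1 below 75% of V2: reward falls linearly from 1 (s = 0.25) to 0 (s = 1).
        return 1.0 - (s - 0.25) / 0.75
    if s < 0.15:
        # V1 above 85% of V2: reward falls linearly from 1 (s = 0.15) to 0 (s = 0).
        return 1.0 - (0.15 - s) / 0.15
    # V1 within 75%-85% of V2: constant reward, the buffer zone.
    return 1.0


if __name__ == "__main__":
    for v1 in (0.0, 50.0, 75.0, 80.0, 85.0, 100.0):
        print(v1, wheel_speed_reward(v1, 100.0))
```

Both sloped branches reach 1 at the edge of the 75%-85% band and 0 at the far end of their range (a fully locked wheel on one side, no slip on the other), which matches the note that the two sides are weighted consistently.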

Then, the brake time variation range reward function value is defined, which specifically comprises the following steps:

① comparing the shortest braking distance L under different combinations of vehicle body speed V₂ and wheel speed V₁;

② defining the value range of the single braking duration parameter T₁ under the condition of the shortest braking distance L, the braking distance being shortest when T₁ is between 50 and 150 ms;

③ according to this value range, defining the brake time variation range reward function value as:

when the duration t is greater than 150 ms, the reward function value is $1 - (t - 150\,\mathrm{ms}) / 100\,\mathrm{ms}$;

when the duration t is less than or equal to 150 ms, the reward function value is 1;

wherein t represents a braking time.
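The brake time reward can be sketched in the same way. The sketch below assumes that the constant branch applies to durations up to and including 150 ms and that the sloped branch applies above 150 ms. It also clips the result at 0 for durations beyond 250 ms so that the value stays in the stated 0-1 range; the source does not spell out that clipping, and it gives no separate formula for durations below 50 ms. The name `brake_time_reward` is hypothetical.

```python
def brake_time_reward(duration_ms: float) -> float:
    """Brake time variation range reward for a single braking action.

    duration_ms is the braking time t in milliseconds.
    """
    if duration_ms <= 150.0:
        # Assumed reading: durations up to 150 ms receive the full reward.
        return 1.0
    # Above 150 ms the reward falls linearly: 1 - (t - 150 ms) / 100 ms.
    reward = 1.0 - (duration_ms - 150.0) / 100.0
    # Assumption: clip at 0 so the value stays within 0-1 beyond 250 ms.
    return max(0.0, reward)


if __name__ == "__main__":
    for t in (50.0, 150.0, 200.0, 250.0, 400.0):
        print(t, brake_time_reward(t))
```

Without the clip, a 400 ms braking action would score −1.5, which would break the 0-1 scaling that the method relies on when the two rewards are multiplied.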

It should be further explained that the wheel speed V₁ variation range reward function value and the brake time variation range reward function value are both normalized when they are defined, which keeps the algorithm balanced and improves training efficiency. The reinforcement learning module is built on the product of the two reward function values. This addresses the problem of the agent spending a large amount of time searching and learning in invalid or inefficient ranges, which leaves the learned model biased and difficult to adjust. Because several types of reward are included in the reward function definition, the agent can learn a comprehensively optimal reward across different evaluation indexes rather than performing well on only one of them. Because normalization is included in the definition, each reward value is scaled to 0-1, so all rewards are balanced and no single reward takes an absolutely dominant position.
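A short sketch shows why multiplying two rewards that are each scaled to 0-1 behaves as described. The function name `combined_reward` and the input check are illustrative assumptions; the source states only that the two reward function values are multiplied.

```python
def combined_reward(wheel_reward: float, time_reward: float) -> float:
    """Multiply two reward values that are each already scaled to 0-1."""
    for name, value in (("wheel_reward", wheel_reward),
                        ("time_reward", time_reward)):
        if not 0.0 <= value <= 1.0:
            # Assumption: inputs outside 0-1 are treated as an error here.
            raise ValueError(f"{name} must lie in 0-1, got {value}")
    return wheel_reward * time_reward


if __name__ == "__main__":
    print(combined_reward(1.0, 1.0))  # both indexes optimal
    print(combined_reward(1.0, 0.5))  # braking time too long
    print(combined_reward(0.0, 1.0))  # wheel speed far from the target band
```

Because both factors lie in 0-1, the product also lies in 0-1 and can never exceed the smaller factor, so a poor score on either index pulls the combined reward down regardless of how good the other score is.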

S5: inputting the output result to a reinforcement learning module for training;

S6: controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.

It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.

Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.

Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.

Example 2

Referring to fig. 1 to 5, the present embodiment provides an antilock brake control system based on reinforcement learning, including:

an extraction module 100 for extracting two key parameters, namely the difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂ and the single braking duration parameter T₁, and for quantizing both key parameters,

wherein, the extracting module 100 specifically comprises,

the analysis unit is used for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm;

the extraction unit is used for extracting two key parameters;

and the quantization unit is used for quantizing the two key parameters.

a limiting module 200 for limiting the range of the two quantized key parameters to obtain the value ranges of the difference ratio parameter and of the single braking duration parameter T₁ at which the braking distance is shortest;

a definition module 300 for defining, under the condition of the shortest braking distance L, the wheel speed V₁ variation range reward function value and the brake time variation range reward function value;

wherein the definition module 300 specifically comprises,

a function definition unit for defining, under the condition of the shortest braking distance L, the wheel speed V₁ variation range reward function value and the brake time variation range reward function value;

and the normalizing unit is used for normalizing the two well-defined function values.

An output module 400, configured to input an algorithm formed by multiplying two function values defined in the definition module into the reinforcement learning module for training;

and the control module 500 controls the anti-lock brake of the vehicle by using the trained reinforcement learning module.

It should be noted that the algorithm model provided by the invention shows an obvious convergence effect and a clearly reduced braking distance, whereas the traditional algorithm model does not converge during training and keeps oscillating in the range from 44 m to over 70 m.

FIGS. 4 and 5 of the accompanying drawings compare the training results of the present invention and of the conventional algorithm, wherein:

① the abscissa of each graph is the number of training iterations, and the ordinate is the braking distance, in meters (m), from the start of braking at 100 km/h until the vehicle speed reaches 0 km/h;

② the road friction coefficient is the same in both graphs;

③ FIG. 5 shows the training result when the definition of this patent is not adopted: after 6000 training iterations, the braking distance of the vehicle still oscillates in the range from 44 m to over 70 m, with no sign of convergence;

④ FIG. 4 shows the training result when the definition of this patent is adopted: the total number of training iterations is 25000, but convergence already appears at about 2000 iterations; after 6000 iterations the braking distance is generally kept below 50 m, and as training continues the braking distance fluctuates within 41-44 m.

As is obvious from the effect comparison graph, the reinforcement learning module containing the reward function constructed by the invention can provide an excellent algorithm for anti-lock brake control, and can be better applied to the field of automatic driving, so that the outstanding control effect is achieved.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. An anti-lock brake control method based on reinforcement learning, characterized by comprising the following steps:

extracting key parameters;

quantizing the extracted key parameters and limiting the range of the key parameters;

constructing a reinforcement learning module comprising a reward function;

defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the wheel speed change range reward function value and the brake time change range reward function value to output a result;

inputting the output result to the reinforcement learning module for training;

and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
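The multiplication step of claim 1 can be illustrated with a minimal Python sketch. The function name `combined_reward` and its parameter names are hypothetical, and the sketch assumes each partial reward value has already been computed as a plain number; it does not represent the training procedure itself.

```python
def combined_reward(wheel_speed_reward: float, brake_time_reward: float) -> float:
    """Multiply the two partial reward values into one output result.

    Both argument names are illustrative; the claim only states that the
    wheel speed value and the brake time value are multiplied.
    """
    return wheel_speed_reward * brake_time_reward


if __name__ == "__main__":
    # A poor value in either factor scales down the whole result.
    print(combined_reward(1.0, 0.5))
    print(combined_reward(0.0, 1.0))
```

One consequence of multiplying rather than adding is that a value of zero in either factor yields a combined result of zero, regardless of the other factor.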

2. The reinforcement learning-based antilock brake control method according to claim 1, wherein the key parameters include:

a difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂;

a single-braking duration parameter T₁.

3. The reinforcement learning-based antilock brake control method according to claim 2, wherein the difference ratio parameter is calculated as

(V₂ − V₁)/V₂, and the calculated value of the difference ratio parameter is limited to the range of 0 to 1.

4. The reinforcement learning-based antilock brake control method according to claim 3, wherein the calculated value of the difference ratio parameter is limited as follows:

when the calculated value of the difference ratio parameter is greater than 1, the calculated value is reset to 1.
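The calculation and range limiting of claims 3 and 4 might be sketched as follows. The function name `difference_ratio` is hypothetical. Two behaviors are assumptions not stated in the claims: negative values are raised to 0 (inferred only from the 0 to 1 range in claim 3), and a non-positive body speed returns 0 to avoid division by zero.

```python
def difference_ratio(v_body: float, v_wheel: float) -> float:
    """Return (V2 - V1) / V2 limited to the range 0..1.

    v_body is the vehicle body speed V2, v_wheel is the wheel speed V1.
    """
    if v_body <= 0.0:
        # Assumption: with no forward body speed the ratio is undefined,
        # so this sketch reports 0.
        return 0.0
    ratio = (v_body - v_wheel) / v_body
    # Values above 1 are reset to 1 (claim 4); the lower bound of 0 is an
    # assumption derived from the 0..1 range in claim 3.
    return min(max(ratio, 0.0), 1.0)


if __name__ == "__main__":
    print(difference_ratio(20.0, 16.0))  # wheel at 80% of body speed
    print(difference_ratio(20.0, -5.0))  # raw value 1.25 is reset to 1
```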

5. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein defining the wheel speed variation range reward function value includes the following steps:

comparing the shortest braking distance L under different vehicle body speeds V₂ and wheel speeds V₁;

defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, and determining that the braking distance is shortest when the wheel speed V₁ is 70-90% of the vehicle body speed V₂;

according to the difference ratio parameter between the vehicle body speed V₂ and the wheel speed V₁ at the shortest braking distance L, the wheel speed V₁ variation range reward function value is defined as:

when V₁ < 75% V₂, the reward function value is 1 − (((V₂ − V₁)/V₂ − 25%) ÷ 75%);

when V₁ > 85% V₂, the reward function value is 1 − ((15% − (V₂ − V₁)/V₂) ÷ 15%).
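The two cases of claim 5 can be written as a piecewise function of the difference ratio (V₂ − V₁)/V₂. In this sketch the name `wheel_speed_reward` is hypothetical, and the value 1 returned for ratios between 15% and 25% is an assumption, since the claim as given lists only the two outer cases.

```python
def wheel_speed_reward(ratio: float) -> float:
    """Wheel speed variation range reward as a function of (V2 - V1) / V2."""
    if ratio > 0.25:
        # V1 < 75% of V2: reward falls linearly from 1 (ratio 0.25) to 0 (ratio 1).
        return 1.0 - (ratio - 0.25) / 0.75
    if ratio < 0.15:
        # V1 > 85% of V2: reward falls linearly from 1 (ratio 0.15) to 0 (ratio 0).
        return 1.0 - (0.15 - ratio) / 0.15
    # Assumption: inside the 75%..85% band the reward is at its maximum of 1.
    return 1.0


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.2, 0.5, 1.0):
        print(r, wheel_speed_reward(r))
```

Under these assumptions the function is continuous at both band edges, because each linear branch evaluates to 1 at its boundary.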

6. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein defining the brake time variation range reward function value comprises the following steps:

defining the value range of the single-braking duration parameter T₁ under the condition of the shortest braking distance L, and determining that the braking distance is shortest when the single-braking duration parameter T₁ is between 50 and 150 ms;

according to the value range, the brake time variation range reward function value is defined as:

when t is greater than 150 ms, the reward function value is 1 − (t − 150 ms)/100 ms;

when t is less than or equal to 150 ms, the reward function value is 1;

wherein t represents a braking time.
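A minimal sketch of the brake time reward of claim 6 follows, reading the constant case as covering durations of 150 ms or less. The function name `brake_time_reward` is hypothetical. The claim does not state a lower limit for the linear branch, so this sketch returns the raw value, which becomes negative beyond 250 ms; clamping at 0 would be a further assumption.

```python
def brake_time_reward(t_ms: float) -> float:
    """Brake time variation range reward for a single braking duration in ms."""
    if t_ms > 150.0:
        # Longer than 150 ms: reward decreases by 0.01 per extra millisecond.
        return 1.0 - (t_ms - 150.0) / 100.0
    # Assumption: 150 ms or less yields the maximum reward of 1.
    return 1.0


if __name__ == "__main__":
    for t in (50.0, 150.0, 200.0, 250.0):
        print(t, brake_time_reward(t))
```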

7. The reinforcement learning-based antilock brake control method according to claim 5, wherein: when the wheel speed V₁ is 80% of the vehicle body speed V₂, the shortest braking distance L is the smallest.

8. An anti-lock brake control system based on reinforcement learning, characterized by comprising:

an extraction module (100) for extracting two key parameters, namely the difference ratio parameter between the wheel speed V₁ and the vehicle body speed V₂ and the single-braking duration parameter T₁, and quantizing both of said key parameters;

a limiting module (200) for limiting the range of the two quantized key parameters to respectively obtain the value ranges of the difference ratio parameter and the single-braking duration parameter T₁ at which the braking distance is shortest;

a definition module (300) for respectively defining the wheel speed V₁ variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L;

an output module (400) for inputting the result obtained by multiplying the two function values defined in the definition module (300) into a reinforcement learning module for training;

and a control module (500) for controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.

9. The reinforcement learning-based antilock brake control system according to claim 8, wherein the extraction module (100) specifically comprises:

an extraction unit for extracting the two key parameters;

and a quantization unit for quantizing the two key parameters.

10. The reinforcement learning-based antilock brake control system according to claim 8, wherein the definition module (300) specifically comprises:

a function defining unit for respectively defining the wheel speed V₁ variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L;