CN113715805B

CN113715805B - Rule fusion deep reinforcement learning energy management method based on working condition identification

Info

Publication number: CN113715805B
Application number: CN202111177978.1A
Authority: CN
Inventors: ***; 昌诚程; 张自宇; 栾众楷; 赵万忠; 周冠; 文凯
Original assignee: Nanjing Tianhang Intelligent Equipment Research Institute Co ltd; Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing Tianhang Intelligent Equipment Research Institute Co ltd; Nanjing University of Aeronautics and Astronautics
Priority date: 2021-10-09
Filing date: 2021-10-09
Publication date: 2023-01-06
Anticipated expiration: 2041-10-09
Also published as: CN113715805A

Abstract

The invention discloses a rule fusion deep reinforcement learning energy management method based on working condition identification. Taking a plug-in hybrid electric vehicle as the object, a hybrid power system model is established with a parallel structure as the connection mode of the engine and the motor. A working condition library is built from 8 standard working conditions and divided into kinematic segments, and the vehicle working conditions are classified and identified by comparing 9 representative parameters of the segments. The state, action, agent and penalty function of a deep Q-learning algorithm are then designed, and the resulting rule-fused deep reinforcement learning algorithm is trained under three different training working conditions and used for energy distribution. Energy is thereby distributed and used efficiently, fewer poor samples arise during training, training efficiency is high, and the comprehensive performance of the hybrid power system is high.

Description

Rule fusion deep reinforcement learning energy management method based on working condition identification

Technical Field

The invention relates to the field of energy management of hybrid power systems, in particular to a rule fusion deep reinforcement learning energy management method based on working condition identification.

Background

The hybrid power system is a relatively mature drive type for the transition period from fuel vehicles to pure electric vehicles. As a relatively new drive type, the plug-in hybrid power system has been widely applied in recent years along with the development of battery technology.

The energy management strategies of current hybrid vehicles can be roughly divided into three categories: rule-based, optimization-based and learning-based energy management strategies. A rule-based strategy requires extensive experimental results and experience, tends toward local optimization at the component level, and cannot achieve overall optimal control of the plug-in hybrid power system; the designed rules usually target specific working conditions only and adapt poorly to other working conditions. An optimization-based strategy can only find an optimal solution under known working conditions and does not suit unknown working conditions well; global optimization is prone to the curse of dimensionality and has poor real-time performance, while instantaneous optimization depends heavily on the model and cannot guarantee optimal distribution over a long time period. A learning-based strategy does not consider working condition adaptability: the algorithm is generally trained under one standard working condition, and when the working condition characteristics change, the strategy can lead to unreasonable energy distribution and low operating efficiency of the hybrid power system. In addition, the intelligent algorithm hands the entire action space to the machine for exploration during training and does not benefit from expert experience. As a result, many poor samples arise during training, training efficiency is low, the control effect of the trained strategy is unsatisfactory under certain conditions, and the comprehensive performance of the hybrid power system is low.
Therefore, in view of these problems, the invention provides a rule fusion deep reinforcement learning energy management method based on working condition identification, so that the energy of the hybrid power system is distributed reasonably.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the shortcomings of the background art, a rule fusion deep reinforcement learning energy management method based on working condition identification. The method addresses the problems that many poor samples arise during training of the algorithm, that training efficiency is low, that the control effect of the trained energy management strategy is unsatisfactory under certain conditions, and that the comprehensive performance of the hybrid power system is low.

The invention adopts the following technical scheme for solving the technical problems:

a rule fusion deep reinforcement learning energy management method based on working condition identification specifically comprises the following steps:

step 1, establishing a hybrid power system model;

step 2, classifying and identifying working conditions;

step 3, designing a rule-fused deep reinforcement learning energy management strategy.

Further, in step 1, the hybrid power system model is established with a plug-in hybrid electric vehicle as the object and with a parallel structure as the connection mode of the engine and the motor. The plug-in hybrid power system comprises a fuel engine, a motor, a vehicle-mounted power battery, a fuel tank, a torque coupler, a clutch and a 5-gear transmission. The fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, the other end of the clutch is connected with the 5-gear transmission, and power is then transmitted to the front axle to drive the vehicle;

the power battery adopts a Rint equivalent circuit model:

in the formula, $I$ is the battery current, positive when discharging and negative when charging; $U_{ocv}$ is the open-circuit voltage of the battery, which can be obtained by an open-circuit voltage test; $R$ is the internal resistance of the battery, whose value changes with the SOC and can be obtained from a look-up table; $P_{bat}$ is the battery power: when the motor torque $T_m$ is positive the battery is discharging, and when $T_m$ is negative the battery is charging; $n_m$ is the motor speed; $\eta_{bat-d}$ is the battery discharge efficiency; $\eta_{bat-c}$ is the battery charging efficiency; $\eta_m$ is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; $\Delta t$ is the sampling interval; $Q$ is the battery capacity;
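A minimal sketch of how the quantities defined above are usually related in a Rint model, assuming the standard form $P_{bat} = U_{ocv} I - I^2 R$ solved for $I$ and an SOC update by coulomb counting. The 9550 conversion factor (N·m and r/min to kW), the way the efficiencies are applied, and all function and parameter names are assumptions for illustration, not taken from the patent text.

```python
import math

def battery_power_kw(t_m, n_m, eta_m, eta_dis, eta_chg):
    """Battery power in kW from motor torque (N*m) and speed (r/min).

    Assumed convention: positive torque discharges the battery,
    negative torque charges it.
    """
    p_mech = t_m * n_m / 9550.0  # mechanical power in kW
    if t_m >= 0:
        # Discharging: the battery supplies more than the shaft power.
        return p_mech / (eta_m * eta_dis)
    # Charging: only part of the shaft power reaches the battery.
    return p_mech * eta_m * eta_chg

def battery_current(p_bat_kw, u_ocv, r_int):
    """Solve P = U_ocv*I - I^2*R for I (positive when discharging)."""
    p_w = p_bat_kw * 1000.0
    disc = u_ocv ** 2 - 4.0 * r_int * p_w
    if disc < 0:
        raise ValueError("requested power exceeds what the battery can deliver")
    return (u_ocv - math.sqrt(disc)) / (2.0 * r_int)

def soc_next(soc, current, dt, q_ah):
    """Coulomb counting: dt in seconds, capacity q_ah in ampere-hours."""
    return soc - current * dt / (q_ah * 3600.0)
```

In a simulation loop, `u_ocv` and `r_int` would be looked up from the current SOC before each call, as the text describes.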

neglecting the vertical motion and handling stability of the vehicle, the longitudinal driving equation of the vehicle is as follows:

wherein $T_{con}$ is the torque required by the current working condition; $i_g$ is the transmission ratio of the transmission in the current gear; $i_0$ is the transmission ratio of the final drive; $\eta_T$ is the overall transmission efficiency; $r$ is the wheel radius; $m$ is the mass of the whole vehicle; $g$ is the gravitational acceleration; $f$ is the rolling resistance coefficient; $\theta$ is the road slope angle; $C_D$ is the air resistance coefficient; $A$ is the frontal area of the vehicle; $u$ is the vehicle speed; $\delta$ is the rotating mass conversion coefficient.
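As an illustration, the symbols above match the textbook longitudinal force balance, in which the traction force $T_{con} i_g i_0 \eta_T / r$ equals the sum of rolling, grade, aerodynamic and acceleration resistance. The sketch below assumes that form, with the speed in km/h and the customary 21.15 aerodynamic constant; the function name and the default values are hypothetical.

```python
import math

def required_torque(u_kmh, accel, i_g, i_0, eta_t, r, m, f, c_d, area,
                    theta=0.0, delta=1.0, g=9.81):
    """Torque demanded at the transmission input for a given speed and acceleration.

    u_kmh: vehicle speed in km/h; accel: acceleration in m/s^2;
    theta: road slope angle in rad. Assumed textbook force balance.
    """
    rolling = m * g * f * math.cos(theta)      # rolling resistance, N
    grade = m * g * math.sin(theta)            # grade resistance, N
    aero = c_d * area * u_kmh ** 2 / 21.15     # aerodynamic drag, N
    inertia = delta * m * accel                # acceleration resistance, N
    force = rolling + grade + aero + inertia
    return force * r / (i_g * i_0 * eta_t)
```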

The vehicle torque coupler adopts three-port two-degree-of-freedom mechanical configuration, a port 1 is used for unidirectional power input, a port 2 and a port 3 are used for bidirectional power input or output, the port 1 is connected with an engine crankshaft, the port 2 is connected with a motor output shaft, and the port 3 is connected with a clutch input end;

the relationship between the torque and the rotating speed of each port of the torque coupler is as follows:

$$T_3 = i_e T_e + i_m T_m$$

in the formula, $T_e$ is the engine torque; $n_e$ is the engine speed; $T_3$ is the coupler output torque; $n_3$ is the coupler output speed; $i_e$ is the transmission ratio where port 1 connects to the engine crankshaft, taken here as $i_e = 1$; $i_m$ is the transmission ratio where port 2 connects to the motor output shaft. Because the motor speed is generally high and needs to be reduced, the invention takes $i_m = 1.7368$;

there are 3 driving modes according to the energy flow direction of the engine and the motor in the torque coupler:

(1) Combined driving mode: in this mode, port 1 and port 2 are power input ends and port 3 is the power output end. The engine and the motor jointly provide power to drive the vehicle; the motor torque $T_m$ is positive and the battery is in a discharging state;

(2) Pure electric drive mode: in this mode, port 1 has no power input and port 2 is the power input end, so the motor drives the vehicle alone. The motor torque $T_m$ is positive, the battery is in a discharging state and the engine is stopped. Because port 1 is a unidirectional power input, the engine can be decoupled from the power system, which reduces mechanical loss;

(3) Motor charging mode: in this mode, the motor works as a generator and the motor torque $T_m$ is negative. Depending on the vehicle running state, charging is divided into charging while driving and charging while parked. When charging while driving, the clutch is engaged, port 1 is the power input end, and port 2 and port 3 are power output ends; the engine drives the vehicle and at the same time drives the generator, and the battery is in a charging state. When charging while parked, port 1 is the power input end, port 2 is the power output end and port 3 has no power output; the clutch is disengaged, which avoids the mechanical loss of the gearbox and the front axle, and the engine only powers the generator to charge the battery.
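The coupler relation and the three modes can be sketched as follows. The torque relation $T_3 = i_e T_e + i_m T_m$ and the ratio values are stated in the patent; the speed relation ($n_e = i_e n_3$, $n_m = i_m n_3$) is an assumption derived from a lossless power balance, and the function names and the "other" fallback label are hypothetical.

```python
I_E = 1.0      # ratio at port 1 (engine crankshaft), from the text
I_M = 1.7368   # ratio at port 2 (motor output shaft), from the text

def coupler_output(t_e, t_m, n_3):
    """Return coupler output torque and the shaft speeds of engine and motor."""
    t_3 = I_E * t_e + I_M * t_m
    n_e = I_E * n_3   # assumed: lossless coupler, power in equals power out
    n_m = I_M * n_3
    return t_3, n_e, n_m

def drive_mode(t_e, t_m):
    """Classify the operating point by the signs of engine and motor torque."""
    if t_m < 0:
        return "motor charging"
    if t_m > 0 and t_e > 0:
        return "combined drive"
    if t_m > 0 and t_e == 0:
        return "pure electric"
    return "other"  # e.g. engine-only operation, not enumerated in the text
```

For example, `coupler_output(100, 50, 1000)` combines 100 N·m of engine torque with 50 N·m of motor torque at a coupler output speed of 1000 r/min.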

Further, a kinematic segment of the vehicle working condition in step 2 represents the vehicle driving state from the start of one idle period to the start of the next idle period, and includes an idling process and a driving process; the vehicle is stationary during the idling process, and the driving process includes multiple acceleration, constant-speed and deceleration behaviors. To build the deep reinforcement learning training working conditions comprehensively, the invention selects 8 standard working conditions to establish a working condition library, divides the library into kinematic segments, and then calculates the following 9 representative parameters for each segment to characterize it: average vehicle speed, average running vehicle speed, maximum vehicle speed, average acceleration, acceleration ratio, deceleration ratio, constant-speed ratio, maximum acceleration and maximum deceleration;
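The segmentation and the 9 parameters might be computed along these lines. This is a sketch under stated assumptions: the speed trace is sampled at a fixed interval, idle means speed at or below `idle_eps`, "average acceleration" is taken as the mean over accelerating samples, and the 0.1 acceleration threshold is an arbitrary illustrative value; none of these details are given in the patent.

```python
import numpy as np

def split_segments(speed, idle_eps=0.0):
    """Cut a speed trace into segments running from one idle start to the next."""
    idle = np.asarray(speed, dtype=float) <= idle_eps
    # An idle period starts where the vehicle is idle but was moving just before.
    starts = [i for i in range(len(idle)) if idle[i] and (i == 0 or not idle[i - 1])]
    return [np.asarray(speed[a:b], dtype=float) for a, b in zip(starts[:-1], starts[1:])]

def segment_features(seg, dt=1.0, acc_eps=0.1):
    """Nine characteristic parameters of one kinematic segment."""
    v = np.asarray(seg, dtype=float)
    a = np.diff(v) / dt
    moving = v > 0
    acc, dec = a > acc_eps, a < -acc_eps
    cruise = (~acc) & (~dec) & moving[1:]
    return {
        "mean_speed": v.mean(),
        "mean_running_speed": v[moving].mean() if moving.any() else 0.0,
        "max_speed": v.max(),
        "mean_accel": a[acc].mean() if acc.any() else 0.0,
        "accel_ratio": acc.mean(),
        "decel_ratio": dec.mean(),
        "cruise_ratio": cruise.mean(),
        "max_accel": a.max(),
        "max_decel": a.min(),
    }
```

Stacking the feature dictionaries of all segments row by row gives the n-by-9 data matrix that the subsequent dimension reduction operates on.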

the characteristic parameters of each kinematic segment represent the characteristics of that segment, but they are not independent of one another. The invention therefore uses principal component analysis to reduce the dimension of the characteristic parameters while covering the working condition characteristics as fully as possible, which reduces the classification difficulty and improves reliability. The specific implementation process is as follows:

(1) Standardize the data:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}$$

wherein $x_{ij}$ is the j-th characteristic parameter of the i-th kinematic segment; $\bar{x}_j$ is the sample mean; $S_j$ is the standard deviation; $i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, m$.

(2) Calculating the covariance matrix C of the Z matrix

(3) Eigenvalue decomposition of the covariance matrix C

$$C = Q \Sigma Q^{-1} \tag{6}$$

wherein $Q$ is the matrix formed by the eigenvectors and $\Sigma$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$.

(4) Calculate the contribution rates $p_1, p_2, \ldots, p_m$ of the eigenvectors and the cumulative contribution rate:

$$p_j = \frac{\lambda_j}{\sum_{i=1}^{m} \lambda_i}, \qquad P_k = \sum_{j=1}^{k} p_j$$

wherein the cumulative contribution rate $P_k$ is the sum of the contribution rates of the first k principal components.

(5) Taking the feature vector corresponding to the principal component as a conversion matrix, and multiplying the data matrix by the conversion matrix to realize principal component mapping to obtain the corresponding kinematics segment feature parameters after dimension reduction;
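Steps (1) to (5) can be sketched with NumPy as below. The 0.85 cumulative-contribution threshold used to choose the number of retained components is an assumption (the patent does not state one), the function name is hypothetical, and the sketch assumes no feature column is constant.

```python
import numpy as np

def pca_reduce(x, target=0.85):
    """Standardize, eigendecompose the covariance matrix, and project."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # step (1)
    c = np.cov(z, rowvar=False)                        # step (2)
    eigvals, eigvecs = np.linalg.eigh(c)               # step (3), ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()                  # step (4)
    cumulative = np.cumsum(contrib)
    # Smallest k whose cumulative contribution reaches the target.
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(eigvals))
    scores = z @ eigvecs[:, :k]                        # step (5)
    return scores, contrib, cumulative
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, so $Q^{-1} = Q^{T}$ and the eigenvalues are real.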

then, fuzzy C-means clustering is used to perform cluster analysis on the kinematic segments according to the obtained principal component results. The flow is as follows:

(1) Set the number of clusters $n_c$ and the weighting index $b$;

(2) Initialize each cluster center $m_j$;

(3) Calculate the membership functions of all samples under the current cluster centers:

wherein $\mu_j(x_i)$ is the membership function of the i-th sample with respect to the j-th class.

(4) Calculating various clustering centers under the current membership function:

(5) Repeat steps (3) and (4) until the algorithm converges or the maximum number of iterations is reached.
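The loop above can be sketched as follows, assuming the standard fuzzy C-means update formulas (membership proportional to the inverse squared distance raised to $1/(b-1)$, centers as membership-weighted means). Initializing the centers from randomly chosen samples, the tolerance and the function name are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(x, n_c, b=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return cluster centers and the membership matrix u (samples x clusters)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (2): initialize centers from randomly chosen samples (assumption).
    centers = x[rng.choice(len(x), size=n_c, replace=False)]
    for _ in range(max_iter):
        # Step (3): memberships from squared distances to the current centers.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)            # avoid division by zero
        inv = d2 ** (-1.0 / (b - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Step (4): centers as membership-weighted means.
        w = u ** b
        new_centers = (w.T @ x) / w.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:                        # step (5): convergence check
            break
    return centers, u
```

Taking `u.argmax(axis=1)` gives a hard class label for each kinematic segment when the segment libraries are assembled.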

To determine the number of clusters $n_c$, the function $L(n_c)$ is used herein as an evaluation index, with the following formula:

in the formula, the numerator represents the sum of the inter-class distances and the denominator represents the sum of the intra-class distances, so a larger $L(n_c)$ indicates a better classification.
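One common index of this shape can be sketched as below. The exact expression is an assumption: membership-weighted squared distances from the centers to the overall mean in the numerator, membership-weighted squared distances from the samples to their centers in the denominator, normalized by $n_c - 1$ and $n - n_c$ respectively. The patent text only states the numerator and denominator roles.

```python
import numpy as np

def cluster_validity(x, centers, u, b=2.0):
    """Ratio of between-class to within-class scatter; larger is better."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    u = np.asarray(u, dtype=float)
    n, n_c = u.shape
    w = u ** b
    grand_mean = x.mean(axis=0)
    # Between-class term: weighted distance of each center from the overall mean.
    between = (w.sum(axis=0) * ((centers - grand_mean) ** 2).sum(axis=1)).sum()
    # Within-class term: weighted distance of each sample from each center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    within = (w * d2).sum()
    return (between / (n_c - 1)) / (within / (n - n_c))
```

Evaluating this index for several candidate values of $n_c$ and keeping the value with the largest result is one way to arrive at a cluster count such as the 3 classes used here.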

According to the fuzzy clustering result, kinematic segments of the same class are merged to form 3 classes of kinematic segment libraries. A certain number of kinematic segments are then randomly drawn from each of the 3 libraries and randomly arranged to obtain 3 working conditions for training.

Finally, an LVQ neural network is trained to identify the working condition category under the 3 training working conditions. The specific steps are as follows:

(1) Training working conditions 1, 2 and 3 are combined for training. A sliding window algorithm is used to calculate the 9 corresponding characteristic parameters from the window data as the input of the LVQ neural network, and the working condition category in vector form is used as the label.

(2) If the window is too long, the window data may contain more than one type of working condition, which increases the identification difficulty. If the window is too short, the working condition characteristic information is incomplete, which reduces the identification accuracy and in turn the fuel economy of the whole vehicle. Weighing both, the method uses a window length of 35 s for rolling extraction of the working condition characteristic parameters.

(3) Train the LVQ neural network. The selected hyper-parameters are as follows: the number of nodes in the LVQ competitive layer is 500, the learning rate is 0.0005, the learning function is learnlv1, and the number of training iterations is 50.

(4) Verify the accuracy of the LVQ neural network. A sliding window of length 35 s is applied to the verification working condition, the characteristic parameters extracted by rolling are used as the input of the trained LVQ neural network, and an indexing operation is performed on the output to obtain the identification result for the verification working condition.
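The windowing and the LVQ1 learning rule can be sketched in plain NumPy as below. This is not the MATLAB `learnlv1` implementation named in the text, only an illustration of the same rule: the winning prototype moves toward a sample of its own class and away from a sample of another class. The prototype initialization and all function names are assumptions; the defaults for window length, learning rate and iteration count follow the values in the text.

```python
import numpy as np

def sliding_windows(speed, window=35, step=1):
    """Rolling windows over a 1 Hz speed trace (35 samples = 35 s)."""
    speed = np.asarray(speed, dtype=float)
    return [speed[i:i + window] for i in range(0, len(speed) - window + 1, step)]

def train_lvq1(x, y, prototypes, proto_labels, lr=0.0005, epochs=50, seed=0):
    """LVQ1: update only the nearest prototype for each sample."""
    x = np.asarray(x, dtype=float)
    w = np.array(prototypes, dtype=float)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            k = int(np.argmin(((w - x[i]) ** 2).sum(axis=1)))
            # Attract on a class match, repel on a mismatch.
            sign = 1.0 if proto_labels[k] == y[i] else -1.0
            w[k] += sign * lr * (x[i] - w[k])
    return w

def predict_lvq(x, w, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    x = np.asarray(x, dtype=float)
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(proto_labels)[d2.argmin(axis=1)]
```

In this sketch the 500 competitive-layer nodes of the text correspond to 500 rows of `prototypes`, divided among the three working condition categories.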

Further, the design in step 3 comprises the state, action, agent and penalty function. The state space is selected as the required torque $T_r$, the battery SOC and the current transmission ratio of the transmission; the action variables are selected as the engine output torque $T_e$ and the gear shifting action $A_g$. The rule-fused agent design borrows the idea of energy distribution from rule-based algorithms: the rules are fused into the agent of deep Q-learning to obtain a rule-fused deep Q-learning algorithm, which increases the number of effective samples in the sample pool. A plug-in hybrid electric vehicle generally keeps the battery SOC within a certain working interval to protect the cycle life of the battery and to retain a small reserve of electric energy for special conditions, so the SOC is used as a rule control quantity and its efficient working range is set to 0.2-0.8; the torque of the power system is also used as a rule control quantity;

the penalty function calculation method comprises the following steps:

wherein $b$ is the fuel consumption rate, which can be obtained from the engine universal characteristic map according to the current engine torque and speed; $\rho$ is the fuel density; $g$ is the gravitational acceleration; $C_f$ is the price per liter of fuel; $C_e$ is the price of electrical energy per kW·h; $\lambda_A$ is the weighting factor of the shift action value; $\lambda_{p1}$ is the penalty factor under a poor shift strategy; $\lambda_{p2}$ is the penalty factor for the SOC exceeding its upper or lower usage limit.
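One way rules can be fused into a deep Q-learning agent is to mask the action set before epsilon-greedy selection, so that exploration never produces samples the rules would reject. The sketch below illustrates that idea only; the specific rules (no battery discharge at or below the lower SOC limit, no charging at or above the upper limit), the interpretation of the required torque as the coupler output torque, and all names are assumptions, since the patent does not spell out its rule table here.

```python
import numpy as np

SOC_MIN, SOC_MAX = 0.2, 0.8   # efficient SOC working range from the text

def rule_mask(soc, t_req, engine_torques, i_e=1.0, i_m=1.7368):
    """Boolean mask of engine-torque actions permitted by the assumed SOC rules."""
    engine_torques = np.asarray(engine_torques, dtype=float)
    # Motor torque implied by each candidate engine torque (coupler relation).
    t_m = (t_req - i_e * engine_torques) / i_m
    mask = np.ones(len(engine_torques), dtype=bool)
    if soc <= SOC_MIN:
        mask &= t_m <= 0      # assumed rule: do not discharge a depleted battery
    if soc >= SOC_MAX:
        mask &= t_m >= 0      # assumed rule: do not charge a full battery
    if not mask.any():
        mask[:] = True        # fall back to the full action set
    return mask

def select_action(q_values, mask, epsilon, rng):
    """Epsilon-greedy selection restricted to rule-compliant actions."""
    allowed = np.flatnonzero(mask)
    if rng.random() < epsilon:
        return int(rng.choice(allowed))
    return int(allowed[np.argmax(np.asarray(q_values)[allowed])])
```

Because both the exploratory and the greedy branch draw from `allowed`, every transition stored in the sample pool respects the rules, which is consistent with the stated aim of increasing the share of effective samples.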

Compared with the prior art, the invention adopting the technical scheme has the following beneficial effects:

1. The designed rule-fused deep reinforcement learning algorithm is trained under three different training working conditions to obtain three deep neural networks, net1, net2 and net3, each suited to a different working condition category, for energy distribution of the hybrid power system;

2. In actual use, a sliding window algorithm is first used to calculate the 9 corresponding characteristic parameters from the window data, which are fed to the trained LVQ neural network to obtain the current working condition category. The rule-fused deep reinforcement learning algorithm trained for that category is then used to distribute the energy of the hybrid power system, so that energy is distributed and used efficiently.
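The online procedure in item 2 amounts to a classify-then-dispatch step, sketched below. The callables stand in for the trained LVQ classifier and for net1, net2 and net3; their signatures, the category keys and the function name are hypothetical.

```python
def dispatch(window_features, classify, policies, state):
    """Pick the policy trained for the identified working condition and query it.

    classify: maps window features to a category key (stand-in for the LVQ network).
    policies: maps each category key to a callable policy (stand-ins for net1-net3).
    """
    category = classify(window_features)
    action = policies[category](state)
    return category, action
```

The caller would invoke this once per control step, recomputing `window_features` from the most recent 35 s of speed data.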

Drawings

FIG. 1 is a block diagram of a plug-in hybrid powertrain system;

FIG. 2 is a battery Rint equivalent circuit model;

FIG. 3 is a flow chart of an energy management policy algorithm.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the attached drawings:

the present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

Fig. 1 shows a plug-in hybrid power system structure diagram, which is composed of a fuel engine, an electric motor, a vehicle-mounted power battery, a fuel tank, a torque coupler, a clutch and a 5-gear transmission. The fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, the other end of the clutch is connected with the 5-gear transmission, and then power is transmitted to the front axle to drive the vehicle to run. The vehicle model comprises a five-gear transmission, and gears

The torque required by the power system is directly related to, and thus affects, the power reserve capacity of the vehicle, so the torque of the power system is also used herein as a rule control quantity.

The engine of the plug-in hybrid electric vehicle is the power source both for driving the vehicle and for replenishing the battery, and it is more important than the motor. The engine torque is therefore used as the first-stage rule control quantity and the battery SOC as the second-stage rule control quantity; the motor torque is used as the third-stage rule control quantity because the larger the motor torque, the stronger the power reserve capacity.

Fig. 2 shows the Rint equivalent circuit model of the battery, from which the following can be obtained:

in the formula, $I$ is the battery current, positive when discharging and negative when charging; $U_{ocv}$ is the open-circuit voltage of the battery, which can be obtained by an open-circuit voltage test; $R$ is the internal resistance of the battery, whose value changes with the SOC and can be obtained from a look-up table; $P_{bat}$ is the battery power: when the motor torque $T_m$ is positive the battery is discharging, and when $T_m$ is negative the battery is charging; $n_m$ is the motor speed; $\eta_{bat-d}$ is the battery discharge efficiency; $\eta_{bat-c}$ is the battery charging efficiency; $\eta_m$ is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; $\Delta t$ is the sampling interval; $Q$ is the battery capacity.

Fig. 3 shows the flow chart of the energy management strategy algorithm.

First, principal component analysis is used to reduce the dimensionality of the characteristic parameters of the kinematic segments in the working conditions, fuzzy clustering is used to classify the kinematic segments, the working conditions are recombined according to the classification results to obtain low-speed, medium-speed and high-speed training working conditions, and an LVQ neural network is trained to identify the working condition category. Next, rules are established with the engine torque, SOC and motor torque as rule control variables and the driving mode as the output, the rules are fused into the agent of deep reinforcement learning, and the rule-fused deep reinforcement learning energy management strategy is trained under the three working conditions in combination with the designed penalty function. In actual use, the characteristic parameters of the current working condition are first extracted with the sliding window algorithm and fed to the trained LVQ neural network to obtain the current working condition category; the rule-fused deep reinforcement learning energy management strategy trained for that category is then selected to distribute the energy of the hybrid power system.

The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that these embodiments are only illustrative and are not intended to limit the present invention; any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A rule fusion deep reinforcement learning energy management method based on working condition identification is characterized by comprising the following steps:

step 1, establishing a hybrid power system model, establishing a plug-in hybrid power automobile as a target, and establishing the hybrid power system model by adopting a parallel structure as a connection mode of an engine and a motor, wherein the plug-in hybrid power system consists of a fuel engine, the motor, a vehicle-mounted power battery, an oil tank, a torque coupler, a clutch and a 5-gear gearbox, the fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, the other end of the clutch is connected with the 5-gear gearbox, and then transmitting power to a front axle to drive the vehicle to run;

the power battery adopts a Rint equivalent circuit model:

wherein $I$ is the current of the battery, positive when discharging and negative when charging; $U_{ocv}$ is the open-circuit voltage of the battery, which can be obtained by an open-circuit voltage test; $R$ is the internal resistance of the battery, whose value changes with the SOC and can be obtained from a look-up table; $P_{bat}$ is the battery power: when the motor torque $T_m$ is positive the battery is discharging, and when $T_m$ is negative the battery is charging; $n_m$ is the motor speed; $\eta_{bat-d}$ is the battery discharge efficiency; $\eta_{bat-c}$ is the battery charging efficiency; $\eta_m$ is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; $\Delta t$ is the sampling interval; $Q$ is the battery capacity;

neglecting the vertical motion and handling stability of the vehicle, the longitudinal driving equation of the vehicle is:

wherein $T_{con}$ is the torque required by the current working condition; $i_g$ is the transmission ratio of the transmission in the current gear; $i_0$ is the transmission ratio of the final drive; $\eta_T$ is the overall transmission efficiency; $r$ is the wheel radius; $m$ is the mass of the whole vehicle; $g$ is the gravitational acceleration; $f$ is the rolling resistance coefficient; $\theta$ is the road slope angle; $C_D$ is the air resistance coefficient; $A$ is the frontal area of the vehicle; $u$ is the vehicle speed; $\delta$ is the rotating mass conversion coefficient;

$$T_3 = i_e T_e + i_m T_m$$

in the formula, $T_e$ is the engine torque; $n_e$ is the engine speed; $T_3$ is the coupler output torque; $n_3$ is the coupler output speed; $i_e$ is the transmission ratio where port 1 connects to the engine crankshaft, taken as $i_e = 1$; $i_m$ is the transmission ratio where port 2 connects to the motor output shaft, taken as $i_m = 1.7368$;

step 2, classifying and identifying working conditions, wherein the kinematic segment of the working condition of the vehicle represents the driving state of the vehicle in the period from one idling starting to the next idling starting, and comprises an idling process and a driving process, wherein the vehicle is in a static state in the idling process, and the driving process comprises multiple acceleration, constant speed and deceleration behaviors of the vehicle; establishing a deep reinforcement learning training working condition, selecting 8 standard working conditions, establishing a working condition library, performing kinematics segmentation on the working condition library, and then selecting the following 9 representative parameters according to the segmented kinematics segment to calculate the characteristics of the kinematics segment: average vehicle speed, average running vehicle speed, maximum vehicle speed, average acceleration, acceleration ratio, deceleration ratio, constant speed ratio, maximum acceleration and maximum deceleration;

the characteristic parameters in each kinematic segment can represent the characteristics of the kinematic segment, but each characteristic parameter is not independent and has a certain relationship with each other, so that the characteristic parameters of the kinematic segment are subjected to dimension reduction by utilizing principal component analysis, all working condition characteristics are covered as comprehensively as possible, the classification difficulty is reduced, and the reliability is improved;

step 3, designing a rule-fused deep reinforcement learning energy management strategy, wherein the design comprises a state, an action, an agent and a penalty function; the state space is selected as the required torque $T_r$, the battery SOC and the current transmission ratio of the transmission; the action variables are selected as the engine output torque $T_e$ and the gear shifting action $A_g$; the rule-fused agent design draws on the idea of energy distribution from rule-based algorithms to obtain a rule-fused deep Q-learning algorithm and increase the number of effective samples in the sample pool; the SOC is used as a rule control quantity and its efficient working range is set to 0.2-0.8; and the torque of the power system is used as a rule control quantity;

the penalty function calculation method comprises the following steps:

wherein $b$ is the fuel consumption rate, which can be obtained from the engine universal characteristic map according to the current engine torque and speed; $\rho$ is the fuel density; $g$ is the gravitational acceleration; $C_f$ is the price per liter of fuel; $C_e$ is the price of electrical energy per kW·h; $\lambda_A$ is the weighting factor of the shift action value; $\lambda_{p1}$ is the penalty factor under a poor shift strategy; $\lambda_{p2}$ is the penalty factor for the SOC exceeding its upper or lower usage limit.

2. The rule fusion deep reinforcement learning energy management method based on working condition identification as claimed in claim 1, wherein the step 1 can be divided into 3 driving modes according to the energy flow direction of the engine and the motor in the torque coupler:

(1) A combined driving mode: in the mode, the port 1 and the port 2 are power input ends, the port 3 is a power output end, the engine and the motor jointly provide power to drive the vehicle to run, the motor torque Tm is positive, and the battery is in a discharging state;

(2) Pure electric drive mode: in the mode, the port 1 has no power input, the port 2 is a power input end, the motor drives the vehicle independently, the motor torque Tm is positive, the battery is in a discharge state, and the engine is stopped;

(3) Motor charging mode: in this mode, the motor of the vehicle works as a generator and the motor torque Tm is negative; depending on the vehicle running state, charging is divided into charging while driving and charging while parked; when charging while driving, the clutch is engaged, port 1 is the power input end, port 2 and port 3 are power output ends, the engine drives the generator while providing power to drive the vehicle, and the battery is in a charging state; when charging while parked, port 1 is the power input end, port 2 is the power output end, port 3 has no power output, the clutch is disengaged, the mechanical loss caused by the gearbox and the front axle is reduced, and the engine only provides power for the generator to charge the battery.

3. The rule fusion deep reinforcement learning energy management method based on working condition identification as claimed in claim 1, wherein the specific implementation process in step 2 is as follows:

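(1) Standardizing the data:

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}$$

wherein $x_{ij}$ is the j-th characteristic parameter of the i-th kinematic segment; $\bar{x}_j$ is the sample mean; $S_j$ is the standard deviation; $i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, m$;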

(2) Calculating the covariance matrix C of the Z matrix

(3) Eigenvalue decomposition of the covariance matrix C

$$C = Q \Sigma Q^{-1}$$

wherein $Q$ is the matrix formed by the eigenvectors and $\Sigma$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$;

(4) Calculating the contribution rates $p_1, p_2, \ldots, p_m$ of the eigenvectors and the cumulative contribution rate;

the cumulative contribution rate $P_k$ is the sum of the contribution rates of the first k principal components;

(1) Setting the number of clusters $n_c$ and the weighting index $b$;

(2) Initializing each cluster center $m_j$;

wherein $\mu_j(x_i)$ is the membership function of the i-th sample with respect to the j-th class;

(5) Repeating steps (3) and (4) until the algorithm converges or the maximum number of iterations is reached;

to determine the number of clusters $n_c$, the function $L(n_c)$ is used as an evaluation index, with the following formula:

wherein the numerator represents the sum of the inter-class distances and the denominator represents the sum of the intra-class distances, so a larger $L(n_c)$ indicates a better classification effect;

4. The rule fusion deep reinforcement learning energy management method based on working condition identification as claimed in claim 3, wherein the step 2 further comprises training identification of working condition classes under 3 training working conditions by using LVQ neural network, and the specific steps are as follows:

(1) Combining training working conditions 1, 2 and 3, calculating the 9 corresponding characteristic parameters from the window data by using a sliding window algorithm as the input of the LVQ neural network, and training with the working condition category in vector form as the label;

(2) If the window is too long, the window data may contain more than one type of working condition, which increases the identification difficulty; if the window is too short, the working condition characteristic information is incomplete, which reduces the identification accuracy and in turn the fuel economy of the whole vehicle; the working condition characteristic parameters are therefore extracted by rolling with a window length of 35 s;

(3) Training the LVQ neural network, wherein the selected hyper-parameters are as follows: the number of nodes in the LVQ competitive layer is 500, the learning rate is 0.0005, the learning function is learnlv1, and the number of training iterations is 50;

(4) Verifying the accuracy of the LVQ neural network: a sliding window of length 35 s is applied to the verification working condition, the characteristic parameters extracted by rolling are used as the input of the trained LVQ neural network, and an indexing operation is performed on the output to obtain the identification result for the verification working condition.