CN115730529A - PHET energy management strategy generation method and system based on working condition identification - Google Patents

PHET energy management strategy generation method and system based on working condition identification

Info

Publication number
CN115730529A
CN115730529A
Authority
CN
China
Prior art keywords
vehicle
neural network
energy management
working condition
phet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211627066.4A
Other languages
Chinese (zh)
Other versions
CN115730529B (en)
Inventor
王姝
赵轩
韩琪
谢鹏辉
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University
Priority to CN202211627066.4A
Publication of CN115730529A
Application granted
Publication of CN115730529B
Legal status: Active
Anticipated expiration

Landscapes

  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

A PHET energy management strategy generation method and system based on working condition identification are disclosed. The method comprises: constructing typical driving conditions of the vehicle under different operation scenes; identifying the real-time driving condition of the vehicle; constructing a neural network based on the DDPG algorithm and performing deep reinforcement learning on the source domain of the neural network to complete its training, wherein the source domain consists of the typical driving conditions of the vehicle under different operation scenes; and transferring the trained neural network from the source domain to the target domain by transfer learning to generate a PHET energy management strategy that accords with the driving scene characteristics, wherein the target domain is the real-time driving condition of the vehicle. The method addresses the problems of existing learning-based energy management strategies: real-time optimization of energy consumption cannot be guaranteed, the control strategy is poorly timed and performs poorly in real-time application under brand-new, complex and variable driving conditions, and the balance between the energy consumption and the aging state of the power battery of the plug-in hybrid electric vehicle is not considered.

Description

PHET energy management strategy generation method and system based on working condition identification
Technical Field
The invention belongs to the technical field of new energy automobile design, and particularly relates to a PHET energy management strategy generation method and system based on working condition identification.
Background
With the rapid development of the new energy vehicle industry, hybrid electric vehicles can achieve better fuel economy and lower exhaust emissions than traditional internal-combustion-engine vehicles, and offer a longer driving range than pure electric vehicles. Plug-in hybrid electric vehicles in particular, compared with conventional hybrid electric vehicles, can draw electric energy from the power grid through an external charger and adapt better to a variety of driving environments, so they have received wide attention and research in the commercial vehicle field. At present, research on optimizing and improving the performance of the plug-in hybrid electric heavy truck (PHET) mainly focuses on the energy management control strategy, and with the development of artificial intelligence technology, learning-based energy management strategies, in particular deep reinforcement learning (DRL) methods, have become an effective approach to real-time energy management. However, deep reinforcement learning has certain disadvantages in real-time application due to its computational complexity. Meanwhile, the complexity and variability of PHET driving conditions place higher demands on the energy management strategy.
The energy management strategy based on deep reinforcement learning currently involved has the following disadvantages:
1) After a control strategy based on deep reinforcement learning is trained, a strategy that is optimal under one working condition may be only suboptimal under another, so real-time optimization of energy consumption cannot be guaranteed. 2) When deep reinforcement learning faces a brand-new working condition, even for an energy management problem in the same field, it must explore again and consume a long computation time, which makes it difficult to meet the real-time requirements of an energy management strategy. 3) In traditional working condition identification, typical representative driving conditions are constructed from standard driving cycles; however, PHET driving conditions are variable and driving behaviors differ across operation scenes, so standard driving cycles are not sufficient to reflect the driving characteristics of a heavy truck in a specific scene. 4) Some existing energy management strategies adopt transfer learning to accelerate the training of deep reinforcement learning and thus improve training efficiency, but they do not consider the influence of the variability of driving conditions and the accurate identification of the vehicle's real-time driving condition on the formulation and application of the energy management strategy. 5) Most existing deep reinforcement learning energy management strategies only focus on using intelligent information to optimize the fuel economy of the vehicle while guaranteeing drivability, and do not consider the balance between battery energy loss and battery aging.
Disclosure of Invention
The invention aims to provide a PHET energy management strategy generation method and system based on working condition identification, which can obtain a control strategy that accords with the characteristics of the driving scene and solve the problems of existing learning-based energy management strategies: real-time optimization of energy consumption cannot be guaranteed, the control strategy is poorly timed and performs poorly in real-time application under brand-new, complex and variable driving conditions, and the balance between the energy loss and the aging state of the power battery of the plug-in hybrid electric vehicle is not considered.
In order to achieve the purpose, the invention has the following technical scheme:
a PHET energy management strategy generation method based on working condition identification comprises the following steps:
constructing typical running conditions of the vehicle under different running scenes;
identifying the real-time running condition of the vehicle;
constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to finish training of the neural network, wherein the source domain is a typical driving condition of a vehicle under different operation scenes;
and transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy according with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
As a preferred scheme, the building of the typical driving conditions of the vehicle in different operating scenes specifically comprises the following steps:
collecting running condition data of the vehicle in different running scenes through cloud big data or vehicle-mounted OBD;
preprocessing the running condition data by adopting wavelet decomposition and reconstruction, and performing kinematic segmentation on the preprocessed data;
performing dimensionality reduction processing on the characteristic parameters describing the characteristics of each kinematic segment by adopting a principal component analysis algorithm;
and classifying the kinematic segments by adopting an SVM and K-means mixed classification algorithm, and constructing typical driving conditions under different operation scenes by utilizing a Markov chain and Monte Carlo simulation method on the basis of finishing classification.
As a preferred scheme, when the real-time running condition of the vehicle is identified, learning vector quantization is selected as a condition identifier.
As a preferred scheme, the identifying the real-time driving condition of the vehicle specifically includes the following steps:
selecting characteristic parameters by calculating Pearson correlation coefficients among the classical characteristic parameters;
extracting and training corresponding characteristic parameters based on typical running condition data of the vehicle in different running scenes;
and identifying the real-time running condition of the vehicle by calculating the Pearson correlation coefficient among the characteristic parameters.
As a preferred scheme, in the step of extracting and training the corresponding characteristic parameters, a sliding window mode is adopted for parameter extraction.
Preferably, when identifying the real-time driving condition of the vehicle by calculating the Pearson correlation coefficients among the characteristic parameters, 25 s is selected as the initial identification window, and every 25 s the vehicle driving condition is determined from the accumulated historical condition in a rolling superposition manner.
As a preferred scheme, the neural network is constructed based on the DDPG algorithm, deep reinforcement learning is carried out on a source domain of the neural network, and training of the neural network is completed by designing a state space, an action space and a reward function of the deep reinforcement learning;
the design expression for the state space is as follows:
S={V,acc,SoC,SoH}
where V and acc are the vehicle speed and vehicle acceleration respectively, SoC is the battery state of charge, and SoH is the battery state of health;
the design expression of the action space is as follows:
action = {P_eng | P_eng ∈ [0, 172 kW]}
where P_eng is the engine output power;
the design expression of the reward function is as follows:
J = α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]
where J is the objective function defined in energy management, α is the weight of fuel consumption, β is the weight of battery charge sustaining, γ is the weight of battery degradation cost, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SoC, and SoH_ref is the reference value of the battery state of health.
As a preferred scheme, the neural network is constructed based on the DDPG algorithm, deep reinforcement learning is carried out on a source domain of the neural network, and training of the neural network is completed, and the method further comprises the steps of providing corresponding constraints for all parts of the whole vehicle power assembly;
the constraint expression is as follows:
[The constraint expressions for the powertrain components are reproduced only as an image in the original publication.]
the DDPG algorithm is a deep reinforcement learning algorithm developed based on an Actor-Critic framework, and is used for solving the problem of the Actor-network mu (s | theta) μ ) Middle inputState observations, then mapped to a deterministic behavior by neural networks; critic-network Q (s | theta) Q ) Inputting the action taken by the Actor network and the observed quantity of the current state to evaluate the quality of the current action;
a target Actor network μ′(s|θ^{μ′}) and a target Critic network Q′(s|θ^{Q′}) are introduced to estimate the Q-value:
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′}) | θ^{Q′})
training a Critic-network:
empirical data are randomly sampled from the experience pool, the loss function is calculated, and the Critic network parameters are updated; the goal of the DDPG algorithm is to minimize the expectation of the loss function by updating the network parameters, and the temporal-difference (TD) error based loss is:
L = (1/N) Σ_i [y_i − Q(s_i, a_i | θ^Q)]²
where L is the average loss and N is the fixed size of the mini-batch randomly sampled from the experience replay buffer;
for the Actor network μ(s|θ^μ), the purpose of action selection is to maximize the Q-value, so the parameter θ^μ is updated by a gradient method; the derived chain rule is:
∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ}μ(s|θ^μ)|_{s=s_i}
in addition, the target networks μ′ and Q′ are updated with a time-lagged (soft) update, whose expression is:
θ′ ← τθ + (1 − τ)θ′
where τ is the soft update factor, and θ and θ′ are the parameters of the original network and the target network respectively;
on the premise of ensuring the fuel economy of the whole vehicle, the controller searches for an optimal solution in a smaller action space.
As a preferred scheme, transferring the trained neural network from the source domain to the target domain by transfer learning to generate a PHET energy management strategy that accords with the driving scene characteristics comprises: on the basis of a given source domain M_s and target domain M_t, learning the optimal policy π* of the target domain M_t from the source domain M_s through transfer learning, thereby realizing the transfer from the source domain to the target domain, the source network and the target network both using the same DDPG architecture.
A PHET energy management strategy generation system based on working condition identification comprises:
the typical working condition construction module is used for constructing typical running working conditions of the vehicle under different running scenes;
the real-time working condition identification module is used for identifying the real-time driving working condition of the vehicle;
the neural network training module is used for constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network and finishing the training of the neural network, wherein the source domain is a typical driving working condition of a vehicle under different running scenes;
and the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy according with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
Compared with the prior art, the invention at least has the following beneficial effects:
the driving condition recognition technology is combined with a depth certainty strategy gradient (DDPG) algorithm based on Transfer Learning (TL) so as to obtain a control strategy according with the driving scene characteristics, and the method has obvious effects on improving the PHET comprehensive performance and improving the system efficiency and adaptability and is obviously superior to the existing energy management strategy based on deep reinforcement learning. The PHET energy management strategy generation method provided by the invention adopts Transfer Learning (TL) to realize the transfer of the energy management strategy between the source domain (based on the PHET typical driving condition constructed by data driving) and the target domain (based on the PHET real-time driving condition identified by the neural network condition identification algorithm).
Drawings
FIG. 1 is a schematic diagram of an overall energy management control framework for a plug-in hybrid vehicle;
FIG. 2 is a schematic view of the driving parameters after wavelet decomposition and reconstruction preprocessing;
figure 3 is a schematic flow diagram based on a markov chain and monte carlo simulation;
FIG. 4 is a schematic diagram of three exemplary representative cycle conditions constructed: (a) urban construction; (b) mining; (c) coal;
FIG. 5 is a schematic diagram of a learning vectorized neural network;
FIG. 6 is a schematic diagram of an engine map (containing an optimal equivalent fuel consumption curve);
FIG. 7 is an engine output power profile for three energy management strategies under three operating scenarios: (a) a scenario one; (b) scenario two; and (c) scene three.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, those skilled in the art can also obtain other embodiments without creative efforts.
As shown in fig. 1, the overall energy management control framework of the PHET is divided into an upper layer and a lower layer. The upper layer is an LVQ-based working condition identification framework that identifies the operation scene of the vehicle by collecting real-time driving data on the basis of the constructed PHET driving conditions. The lower layer is a deep transfer reinforcement learning energy management control framework based on the DDPG algorithm: based on the actual operation scene identified by the upper layer, the transfer learning technique applies the neural network that has been fully trained by deep reinforcement learning (DRL) in the corresponding scene to the actual working condition, so that the current neural network converges within fewer learning episodes under the target condition, the energy management strategy is generated rapidly, and optimal performance is ensured. The PHET energy management strategy generation method based on working condition recognition is divided into four steps: construction of typical driving conditions, identification of the real-time driving condition of the vehicle, pre-training and storage of the representative condition of each operation scene by deep reinforcement learning based on the DDPG algorithm, and transfer of the pre-trained neural network from the source domain to the target domain by transfer learning.
Step 1: the construction of the typical running condition specifically comprises the following steps:
step 1.1: and acquiring running condition data of the vehicle under different application scenes through the cloud big data/vehicle-mounted OBD. The application scenarios of PHET studied by the invention mainly include three types: urban construction residue soil transport vehicle, mining transport vehicle and coal transport vehicle.
Step 1.2: because the acquired original data often presents the conditions of burrs, sudden changes and the like, the original data is smoothed and denoised by adopting wavelet decomposition and reconstruction before the typical running working condition is constructed. The data after wavelet decomposition and reconstruction preprocessing is shown in fig. 2. And performing kinematic segmentation on the preprocessed data, wherein the segmentation standard is as follows: idling section (vehicle speed < 2km/h, -0.15 m/s) 2 Acceleration < 0.15m/s 2 ) Acceleration section (acceleration is more than or equal to 0.15 m/s) 2 ) The deceleration section (the acceleration is less than or equal to-0.15 m/s) 2 ) And a cruise section (the speed of the vehicle is more than or equal to 2km/h and is-0.15 m/s 2 Acceleration < 0.15m/s 2 )。
Step 1.3: and (3) performing dimensionality reduction on the characteristic parameters describing the characteristics of each motion segment by adopting a Principal Component Analysis (PCA) algorithm. The principal component analysis results of the characteristic parameters are shown in table 1.
TABLE 1
Principal component    Variance    Contribution ratio (%)    Cumulative contribution ratio (%)
1 5.5762 55.762 55.762
2 2.0218 20.228 75.990
3 1.0112 10.112 86.102
4 0.7913 7.913 94.015
5 0.3975 3.975 97.99
As can be seen from Table 1, the variance of the first 3 principal components is greater than 1, that of the fourth is close to 1, and the cumulative contribution rate of the first four principal components exceeds 90%, so the first four principal components contain most of the information of the original variables. Accordingly, the four characteristic parameters of maximum speed, minimum speed, average speed and speed standard deviation are selected to characterize the kinematics of the driving segments.
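The PCA step of Step 1.3 can be sketched as below with scikit-learn; the feature matrix X is a hypothetical placeholder (one row per kinematic segment, one column per characteristic parameter), and the 90% cumulative-contribution cutoff follows the criterion discussed above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: hypothetical feature matrix, one row per kinematic segment,
# one column per characteristic parameter (e.g. max speed, mean speed, ...).
X = np.random.rand(500, 10)

X_std = StandardScaler().fit_transform(X)       # PCA is scale-sensitive
pca = PCA().fit(X_std)

variance = pca.explained_variance_              # analogue of the "Variance" column in Table 1
contribution = pca.explained_variance_ratio_    # contribution ratio
cumulative = np.cumsum(contribution)            # cumulative contribution ratio

# Keep enough components to exceed 90 % cumulative contribution, as in the text.
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
X_reduced = PCA(n_components=n_keep).fit_transform(X_std)
```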
Step 1.4: and classifying the motion segments by adopting an SVM and K-means mixed classification algorithm. On the basis of finishing classification, representative cycle conditions under three application scenes are constructed by utilizing a Markov chain and a Monte Carlo simulation method. The process flow based on the markov chain and monte carlo simulation is shown in figure 3. The vehicle speed-time relationships for the three exemplary representative cycle conditions constructed are shown in fig. 4 (a), (b), and (c).
Step 2: the method for identifying the real-time running condition of the automobile specifically comprises the following steps:
the embodiment of the invention selects a data-driven mode to carry out modeling of working condition identification.
Because learning vector quantization (LVQ) offers high accuracy and strong practicability for working condition identification, the invention selects learning vector quantization as the condition identifier. The structure of the learning vector quantization neural network is shown in fig. 5. The LVQ neural network consists of a competition layer and a linear layer. The network combines competitive learning with supervised learning to classify the input vector X in the competition layer; this process has two parts: selecting the best matching neuron and adaptively updating the weight vector. In the linear layer, the classification result of the competition layer is mapped to the target classes defined by the user. Driving scenes for the PHET are classified into 3 types: scene 1 represents the urban construction residue soil transport vehicle, scene 2 represents the mining transport vehicle, and scene 3 represents the coal transport vehicle.
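As a rough illustration of the LVQ identifier, the sketch below implements a minimal LVQ1 scheme: labelled prototype vectors play the role of the competition-layer neurons, and the class labels attached to them play the role of the linear layer. Prototype counts, learning rate and epoch number are assumptions.

```python
import numpy as np

class SimpleLVQ1:
    """Minimal LVQ1: labelled prototype vectors, moved toward or away from training samples."""
    def __init__(self, n_prototypes_per_class=2, lr=0.05, epochs=50, seed=0):
        self.k, self.lr, self.epochs, self.seed = n_prototypes_per_class, lr, epochs, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        classes = np.unique(y)
        protos, labels = [], []
        for c in classes:               # initialise prototypes from random samples of each class
            idx = rng.choice(np.flatnonzero(y == c), self.k, replace=False)
            protos.append(X[idx])
            labels.append(np.full(self.k, c))
        self.w, self.w_y = np.vstack(protos), np.concatenate(labels)
        for _ in range(self.epochs):
            for i in rng.permutation(len(X)):
                j = np.argmin(np.linalg.norm(self.w - X[i], axis=1))  # best matching prototype
                sign = 1.0 if self.w_y[j] == y[i] else -1.0           # attract or repel
                self.w[j] += sign * self.lr * (X[i] - self.w[j])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.w[None, :, :], axis=2)
        return self.w_y[np.argmin(d, axis=1)]
```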
Step 2.1: through calculating the Pearson correlation coefficient among the classical characteristic parameters, 10 characteristic parameters of maximum speed, average speed, speed standard deviation, average acceleration, average deceleration, acceleration standard deviation, acceleration proportion (acceleration time/total time), deceleration proportion (deceleration time/total time), uniform speed proportion (uniform speed time/total time) and idle speed proportion (idle speed time/total time) are selected.
Specific numerical values of the characteristic parameters of the driving scenes are shown in table 2.
TABLE 2
Characteristic parameter    Urban construction    Mining    Coal
Maximum vehicle speed (km/h)    64.87    65.02    86.00
Average speed (km/h)    18.78    13.80    41.74
Standard deviation of speed    21.13    15.70    27.97
Mean acceleration (m/s²)    0.31    0.38    0.30
Average deceleration (m/s²)    -0.42    -0.45    -0.46
Standard deviation of acceleration    0.23    0.27    0.28
Acceleration ratio (%)    0.18    0.17    0.25
Deceleration ratio (%)    0.12    0.15    0.17
Uniform speed ratio (%)    0.30    0.27    0.12
Idling ratio (%)    0.39    0.40    0.44
Step 2.2: and (3) extracting and training the corresponding characteristic parameters based on the 3 typical driving cycle representative working condition data established in the step (1). When the training set is determined, in order to improve the number of training samples, the invention adopts a sliding window mode to extract parameters. Because the duration of the synthesized representative working condition is limited, and a single working condition is not enough to provide enough training samples, the working conditions established in the previous text are repeated for 10 times and connected in series to serve as the representative working condition, the window length is selected to be 1800s, 100s is used as interval time, characteristic parameters of a speed interval in the window are extracted, and the extracted characteristic parameters are input into an identifier for training.
Step 2.3: the real-time driving working condition of the automobile is accurately identified by calculating the Pearson correlation coefficient among all the characteristic parameters. When the running condition of the real vehicle running data is identified, 25s is selected as an initial identification view field, and the rolling superposition mode is adopted to judge the running condition of the vehicle for the accumulated historical working condition every 25 s. The characteristics of the whole driving cycle may not be reflected due to the fact that the initially selected working condition is short in time, but the working condition characteristics can represent the driving scene of the vehicle more and more along with the increase of the accumulated time, and the proportion of the driving scene determined by the driving condition recognition module for inputting the accumulated historical time each time in the first 1000s of the running of the vehicle is taken as the scene to which the vehicle runs at this time according to the actual situation.
And step 3: the method specifically comprises the following steps of pre-training and storing representative working conditions of each operation scene based on the deep reinforcement learning of the DDPG algorithm:
and pre-training the 3 types of typical representative driving condition data constructed by the source domain, namely the first part by adopting a deep reinforcement learning method based on a DDPG algorithm. The overall algorithm for DDPG is as follows:
[The DDPG algorithm pseudocode is reproduced only as an image in the original publication.]
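Since the pseudocode itself is only available as an image, the following PyTorch-style sketch shows one way the DDPG pre-training loop described in this section could be realized, using the four-dimensional state and the engine power action defined in Step 3.1 below (cf. formulas (5)-(8) in Steps 3.4-3.5). Network sizes, learning rates and other hyperparameters are assumptions, and exploration noise and episode handling are omitted.

```python
import copy
import random
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the state (V, acc, SoC, SoH) to an engine power in [0, 172 kW]."""
    def __init__(self, s_dim=4, a_dim=1, p_max=172.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, a_dim), nn.Sigmoid())
        self.p_max = p_max
    def forward(self, s):
        return self.net(s) * self.p_max

class Critic(nn.Module):
    """Estimates Q(s, a) from the concatenated state and action."""
    def __init__(self, s_dim=4, a_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))

actor, critic = Actor(), Critic()
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)    # target networks mu', Q'
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau, batch_size = 0.99, 0.005, 64
# Transitions (s, a, r, s2) with tensor shapes (4,), (1,), (1,), (4,) are appended to the
# buffer while the agent interacts with the powertrain simulation on the source-domain cycles.
buffer = []

def train_step():
    s, a, r, s2 = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    with torch.no_grad():                          # target Q-value y_t from the target networks
        y = r + gamma * critic_t(s2, actor_t(s2))
    loss_c = ((y - critic(s, a)) ** 2).mean()      # TD-error based critic loss
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    loss_a = -critic(s, actor(s)).mean()           # deterministic policy gradient for the actor
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)   # soft (time-lagged) target update
```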
step 3.1: the deep reinforcement learning state, action and reward are designed, and the specific description is as follows:
In the design of the state space, the embodiment of the invention considers not only the energy consumption of the whole system but also the balance between system energy loss and battery aging, so the battery state of charge SoC and the battery state of health SoH, which accounts for battery aging, are selected. The entire state space is shown in the following formula (1):
S={V,acc,SoC,SoH} (1)
where V and acc are the vehicle speed and vehicle acceleration respectively, SoC is the battery state of charge, and SoH is the battery state of health; these variables are the key parameters characterizing the vehicle's running state.
In terms of the design of the motion space, the control strategy of the overall vehicle energy management aims at continuously controlling the mechanical power of the vehicle, so the engine output power is taken as a control variable, which is specifically shown in the following formula (2):
action = {P_eng | P_eng ∈ [0, 172 kW]}    (2)
in the design of a reward function (objective function) of an energy management problem, the reward function is determined by comprehensively considering several optimization objectives of vehicle energy consumption, power battery SOC and battery degradation cost, and is specifically expressed as the following formula (3):
J = α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]    (3)
where J is the objective function defined in energy management, α is the weight of fuel consumption, β is the weight of battery charge sustaining, γ is the weight of battery degradation cost, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SoC, and SoH_ref is the reference value of the battery state of health.
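Under the convention that the reinforcement learning reward is the negative of the cost J in formula (3), a reward computation might be sketched as follows; the weights and reference values are purely illustrative, since the patent does not disclose them.

```python
def reward(fuel_g, elec_kwh, soc, soh,
           alpha=1.0, beta=50.0, gamma_w=10.0, soc_ref=0.3, soh_ref=1.0):
    """Negative of the cost J in formula (3); weights and references are illustrative."""
    cost = (alpha * (fuel_g + elec_kwh)
            + beta * (soc - soc_ref) ** 2
            + gamma_w * (soh - soh_ref))
    return -cost
```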
Step 3.2: Corresponding constraints are imposed on each component of the whole-vehicle powertrain, as shown in the following formula (4):
[Formula (4), the powertrain constraint expressions, is reproduced only as an image in the original publication.]
the DDPG algorithm is a deep reinforcement learning algorithm developed based on an Actor-critical architecture, wherein the Actor-network mu (s | theta) is μ ) Is a state observation, which is then mapped to a deterministic behavior by a neural network. Critic-network Q (s | theta [ ]) Q ) And inputting the action taken by the Actor network and the observed quantity of the current state to evaluate the quality of the current action.
Step 3.4: In order to reduce the deviation caused by using a single Critic to estimate the Q-value and a single Actor network to select actions, a target Actor network μ′(s|θ^{μ′}) and a target Critic network Q′(s|θ^{Q′}) are introduced to estimate the Q-value, as shown in the following formula (5):
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′}) | θ^{Q′})    (5)
step 3.5: training the Critic-network. Randomly selecting a small batch of experience from the experience pool, calculating a loss function and updating a Critic-network parameter, wherein the goal of the DDPG is to minimize the expectation of the loss function by updating the network parameter, and the calculated time difference (td) -error is shown as the following formula (6):
Figure BDA0004003913600000121
where L is the average loss and N is the fixed size of the mini-batch randomly sampled from the experience replay buffer. For the Actor network μ(s|θ^μ), the purpose of action selection is to maximize the Q-value, so the update of the parameter θ^μ can be solved numerically by a gradient method; the derived chain rule is shown in the following formula (7):
∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ}μ(s|θ^μ)|_{s=s_i}    (7)
In addition, updating the target networks μ′ and Q′ with a time-lagged (soft) update greatly improves the stability of learning; the specific expression is shown in the following formula (8):
θ′ ← τθ + (1 − τ)θ′    (8)
where τ is the soft update factor, and θ and θ′ are the parameters of the original network and the target network respectively.
Step 3.6: on the premise of ensuring the fuel economy of the whole vehicle, in order to effectively reduce the dimension of the deep reinforcement learning action space and enable the controller to search for an optimal solution in a smaller action space and further accelerate the convergence rate of the reinforcement learning, as shown in fig. 6, an optimal fuel consumption rate curve is constructed in a Map of the engine, and when the engine works, the power of any engine corresponds to a rotating speed torque pair on the curve.
And 4, step 4: the method adopts transfer learning to realize the transfer of a pre-trained neural network from a source domain to a target domain, and specifically comprises the following steps:
based on the transferability of the neural network, the transfer learning technology can apply the neural network which is fully trained under the corresponding scene to the actual working condition. The transmission of the energy management policy between the source domain and the target domain can be realized by combining a deep migration (DTL) algorithm and a DDPG algorithm.
The source domain consists of the three typical driving-cycle representative working conditions of step 1, and the target domain is the real-time driving condition of the vehicle identified by the condition identification module in step 2. Since the driving cycles of the source domain and the target domain share the same feature space and are correlated with each other, given the source domain M_s and the target domain M_t, transfer learning can learn the optimal policy π* of the target domain M_t from the source domain M_s, i.e., the knowledge of the source domain is transferred to the related target domain. Meanwhile, because most parameters in the neural network are the same and only the parameters of the output layer need to be retrained, the source network and the target network both use the same DDPG architecture.
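A minimal sketch of this weight transfer, assuming the Actor class from the DDPG sketch above, is given below: all pretrained layers are copied, the hidden layers are frozen, and only the output layer is re-initialized and retrained. Freezing the hidden layers is one possible realization of "only the parameters of the output layer need to be retrained", not the only one.

```python
import copy
import torch
import torch.nn as nn

def transfer_actor(source_actor, retrain_lr=1e-4):
    """Clone the source-domain actor, freeze shared layers, retrain only the output layer."""
    target_actor = copy.deepcopy(source_actor)            # same DDPG architecture
    layers = [m for m in target_actor.net if isinstance(m, nn.Linear)]
    for layer in layers[:-1]:                             # keep hidden-layer knowledge
        for p in layer.parameters():
            p.requires_grad = False
    nn.init.uniform_(layers[-1].weight, -3e-3, 3e-3)      # re-initialise the output layer
    nn.init.zeros_(layers[-1].bias)
    params = [p for p in target_actor.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=retrain_lr)   # fine-tuning on the target domain
    return target_actor, optimizer
```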
As mentioned above, the PHET energy management strategy generation method based on the working condition recognition combines the driving working condition recognition technology and the deep reinforcement learning method based on the transfer learning, so as to obtain the control strategy according with the driving scene characteristics.
As shown in (a), (b) and (c) of fig. 7, taking the DP-based energy management strategy as the benchmark, the engine operation under the proposed control strategy is closer to that of the DP-based strategy, and the proportion of engine operating points in the high-power region is smaller than that of the deep reinforcement learning energy management strategy that does not consider driving condition identification. This indicates that the proposed energy management strategy achieves good fuel economy and a better energy-saving effect than the DRL energy management strategy without driving condition identification.
As shown in table 3, because the battery energy consumption is considered jointly with the dynamic balance of battery health when determining the state space and reward function of the DDPG, the economy of the proposed energy management strategy is slightly lower if the fuel consumption index is considered alone; however, when the degradation cost of the power battery is taken into account as well, the comprehensive operating cost of the proposed control strategy is effectively reduced compared with the DDPG algorithm that ignores battery health.
TABLE 3
[Table 3 is reproduced only as an image in the original publication.]
Finally, as shown in table 4, adopting the transfer learning technique effectively shortens the training period of the neural network: the number of iterations to convergence is reduced by about 50% and the convergence speed is increased, which facilitates real-time use of the condition-identification-based energy strategy proposed by the invention and improves the efficiency of whole-vehicle control and implementation.
TABLE 4
[Table 4 is reproduced only as an image in the original publication.]
[Table 5 is reproduced only as an image in the original publication.]
The PHET energy management strategy generation method based on the working condition identification at least has the following advantages:
(1) A two-layer energy management framework is provided. The upper layer adopts a learning vector quantization (LVQ) based driving condition identification framework, and the lower layer adopts a deep transfer reinforcement learning control framework based on the deep deterministic policy gradient (DDPG) algorithm; the driving condition identification technology is combined with the transfer learning (TL) based DDPG algorithm to obtain a control strategy that accords with the driving scene characteristics. Taking the dynamic programming (DP) based energy management strategy as the benchmark, the strategy combining driving condition identification with deep reinforcement learning is closer to the DP-based strategy in terms of engine operation than the deep reinforcement learning strategy that does not consider driving condition identification, and its proportion of engine operating points in the high-power region is smaller. Meanwhile, the battery state of charge (SOC) trajectory of the energy management strategy that adopts the driving condition identification technology and considers the battery state of health (SOH) declines more slowly and fluctuates less throughout the process. The energy management strategy provided by the invention can adopt a power distribution strategy that reflects the characteristics of the actual operation scene; it has an obvious effect on improving the comprehensive performance of the PHET and the efficiency and adaptability of the system, and is clearly superior to existing energy management strategies based on deep reinforcement learning.
(2) The method is data-driven: the collected historical motion data segments of the vehicle are first classified with an SVM and K-means hybrid classification algorithm, and then a Markov chain and Monte Carlo simulation method is used to construct typical representative cyclic driving conditions that reflect the real operation scenes and driving behaviors of the plug-in hybrid electric vehicle (PHET). Because the constructed driving condition data are derived from real driving data of the vehicle, the constructed driving conditions, used as the source domain of transfer learning, provide a more accurate evaluation basis for the actual energy consumption of the PHET, so the proposed energy management strategy has important practical significance.
(3) And (3) accurately identifying the scene of the actual running working condition of the vehicle by adopting a Learning Vectorization (LVQ) neural network based working condition identification algorithm. In order to enhance the accuracy of identifying the working condition, 10 characteristic parameters are finally selected by calculating the pearson correlation coefficient among the classical characteristic parameters, wherein the 10 characteristic parameters are respectively maximum speed, average speed, speed standard deviation, average acceleration, average deceleration, acceleration standard deviation, acceleration proportion (acceleration time/total time), deceleration proportion (deceleration time/total time), uniform proportion (uniform time/total time) and idle proportion (idle time/total time). Meanwhile, when a training set is determined, in order to improve the number of training samples, the method adopts a sliding window mode to extract parameters. By means of the method, on the premise of guaranteeing usability, accuracy of real-time working condition recognition is greatly improved, and guarantee is provided for presentation of a target domain of later transfer learning.
(4) The lower-layer control framework adopts the deep deterministic policy gradient (DDPG) algorithm. The DDPG algorithm of the invention uses prioritized experience replay, which removes the randomness of and dependence among samples while improving training efficiency, the stability of the training process and the robustness of the model. Meanwhile, extra noise is added to the output of the Actor network so that the DDPG algorithm can better explore and select correct actions; by comparing the influence of different action noises on algorithm performance, Soft-max action noise (SAN) is adopted.
(5) In determining the state space and reward function of the DDPG algorithm, compared with conventional optimization algorithms that only consider the energy consumption of the whole system, the method also considers the balance between battery energy consumption and aging and includes the battery state of health (SOH) in the state space. Meanwhile, when determining the objective function (reward function), the battery degradation cost is introduced into the optimization objective together with the fuel consumption term. In this way, the running state of the vehicle in each time period is characterized more comprehensively and deeply, the comprehensive performance of the whole vehicle under the algorithm is further improved, and the battery degradation cost is reduced along with the fuel and electric energy consumption costs of the whole vehicle.
(6) On the premise of ensuring fuel economy, in order to effectively reduce the dimension of the DDPG algorithm action space, an optimal fuel consumption rate curve is constructed in an engine map, and when an engine runs, any engine power corresponds to a rotating speed and torque pair on the curve. Therefore, the controller can find the optimal solution in a smaller motion space, and the convergence speed of reinforcement learning is further increased.
(7) Transfer Learning (TL) is adopted to realize the transfer of energy management strategies between a source domain (PHET typical driving condition constructed based on data driving) and a target domain (PHET real-time driving condition identified based on an LVQ neural network condition identification algorithm), and the optimal strategy of the target domain is learned from the source domain on the basis that the source domain provides prior knowledge that the target domain can access. Therefore, the convergence rate of the energy management strategy training can be increased, the timeliness of the energy management control strategy is effectively improved, and the adaptability of the energy management control strategy under the changeable complex driving working condition is improved.
Another embodiment of the present invention further provides a system for generating a PHET energy management policy based on condition identification, including:
the typical working condition construction module is used for constructing typical running working conditions of the vehicle under different running scenes;
the real-time working condition identification module is used for identifying the real-time running working condition of the vehicle;
the neural network training module is used for constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network and completing training of the neural network, wherein the source domain is a typical driving working condition of a vehicle under different running scenes;
and the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy according with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
Another embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the PHET energy management policy generation method based on condition identification according to the present invention is implemented.
For example, the instructions stored in the memory may be divided into one or more modules/units, and the one or more modules/units are stored in the computer readable storage medium and executed by the processor to implement the PHET energy management strategy generation method based on condition recognition according to the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the server.
The electronic device can be a computing device such as a smart phone, a notebook, a palm computer and a cloud server. The electronic device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, or combine certain components, or different components, e.g., the electronic device may also include input-output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage may be an internal storage unit of the server, such as a hard disk or a memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the server. Further, the memory may also include both an internal storage unit of the server and an external storage device. The memory is used to store the computer readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the above contents of information interaction, execution process, and the like between the module units, specific functions and technical effects brought by the same concept as that of the method embodiment may be specifically referred to a part of the method embodiment, and details are not described here.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a usb-drive, a removable hard drive, a magnetic or optical disk, etc.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A PHET energy management strategy generation method based on working condition identification is characterized by comprising the following steps:
constructing typical running conditions of the vehicle in different running scenes;
identifying the real-time running condition of the vehicle;
constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to finish training of the neural network, wherein the source domain is a typical driving condition of a vehicle under different operation scenes;
and transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy according with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
2. The PHET energy management strategy generation method based on condition identification as claimed in claim 1, wherein the construction of the typical driving conditions of the vehicle under different operation scenes specifically comprises the following steps:
collecting running condition data of the vehicle in different running scenes through cloud big data or vehicle-mounted OBD;
preprocessing the driving condition data by adopting wavelet decomposition and reconstruction, and performing kinematic segmentation on the preprocessed data;
performing dimensionality reduction processing on the characteristic parameters describing the characteristics of each kinematic segment by adopting a principal component analysis algorithm;
and classifying the kinematics sections by adopting an SVM (support vector machine) and K-means mixed classification algorithm, and constructing typical driving conditions under different operation scenes by utilizing a Markov chain and a Monte Carlo simulation method on the basis of finishing classification.
3. The PHET energy management strategy generation method based on working condition identification according to claim 1, wherein learning vector quantization is selected as a working condition identifier when the real-time driving working condition of the vehicle is identified.
4. The PHET energy management strategy generation method based on condition identification as claimed in claim 3, wherein the step of identifying the real-time driving condition of the vehicle specifically comprises the steps of:
selecting characteristic parameters by calculating the Pearson correlation coefficient among the classical characteristic parameters;
extracting and training corresponding characteristic parameters based on typical running condition data of the vehicle in different running scenes;
and identifying the real-time running condition of the vehicle by calculating the Pearson correlation coefficient among the characteristic parameters.
5. The PHET energy management strategy generation method based on working condition identification as claimed in claim 4, wherein in the step of extracting and training the corresponding characteristic parameters, a sliding window mode is adopted for parameter extraction.
6. The PHET energy management strategy generation method based on working condition identification as claimed in claim 4, wherein, when identifying the real-time driving condition of the vehicle by calculating the Pearson correlation coefficients among the characteristic parameters, 25 s is selected as the initial identification window, and every 25 s the vehicle driving condition is determined from the accumulated historical condition in a rolling superposition manner.
7. The PHET energy management strategy generation method based on working condition identification as claimed in claim 1, wherein the neural network is constructed based on DDPG algorithm, deep reinforcement learning is carried out on the source domain of the neural network, and the training of the neural network is completed by designing the state space, the action space and the reward function of the deep reinforcement learning;
the design expression for the state space is as follows:
S={V,acc,SoC,SoH}
where V and acc are the vehicle speed and vehicle acceleration respectively, SoC is the battery state of charge, and SoH is the battery state of health;
the design expression of the action space is as follows:
action = {P_eng | P_eng ∈ [0, 172 kW]}
where P_eng is the engine output power;
the design expression of the reward function is as follows:
J = α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]
where J is the objective function defined in energy management, α is the weight of fuel consumption, β is the weight of battery charge sustaining, γ is the weight of battery degradation cost, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SoC, and SoH_ref is the reference value of the battery state of health.
8. The PHET energy management strategy generation method based on working condition identification as claimed in claim 7, wherein the neural network is constructed based on DDPG algorithm, deep reinforcement learning is carried out on the source domain of the neural network, and the training of the neural network is completed further comprising proposing corresponding constraints for each part of the whole vehicle powertrain;
the constraint expression is as follows:
[The constraint expressions are reproduced only as an image in the original publication.]
the DDPG algorithm is a deep reinforcement learning algorithm developed from the Actor-Critic architecture; the Actor network μ(s|θ^μ) takes the state observation as input and maps it to a deterministic action through the neural network; the Critic network Q(s,a|θ^Q) takes the action selected by the Actor network and the current state observation as inputs and evaluates the quality of the current action;
a target Actor network μ′(s|θ^{μ′}) and a target Critic network Q′(s|θ^{Q′}) are introduced to estimate the Q-value:
y_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′}) | θ^{Q′})
training a Critic-network:
empirical data are randomly sampled from the experience pool, the loss function is calculated, and the Critic network parameters are updated; the goal of the DDPG algorithm is to minimize the expectation of the loss function by updating the network parameters, and the temporal-difference (TD) error based loss is:
L = (1/N) Σ_i [y_i − Q(s_i, a_i | θ^Q)]²
where L is the average loss and N is the fixed size of the mini-batch randomly sampled from the experience replay buffer;
for the Actor network μ(s|θ^μ), the purpose of action selection is to maximize the Q-value, so the parameter θ^μ is updated by a gradient method; the derived chain rule is:
∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ}μ(s|θ^μ)|_{s=s_i}
in addition, the target networks μ′ and Q′ are updated with a time-lagged (soft) update, whose expression is:
θ′ ← τθ + (1 − τ)θ′
where τ is the soft update factor, and θ and θ′ are the parameters of the original network and the target network respectively;
on the premise of ensuring the fuel economy of the whole vehicle, the controller searches for an optimal solution in a smaller action space.
9. The method for generating the PHET energy management strategy based on working condition identification according to claim 1, wherein transferring the trained neural network from the source domain to the target domain by transfer learning to generate the PHET energy management strategy that accords with the driving scene features comprises: on the basis of a given source domain M_s and target domain M_t, learning the optimal policy π* of the target domain M_t from the source domain M_s through transfer learning, thereby realizing the transfer from the source domain to the target domain, the source network and the target network both using the same DDPG architecture.
10. A PHET energy management strategy generation system based on working condition identification, characterized by comprising:
the typical working condition construction module is used for constructing typical running working conditions of the vehicle under different running scenes;
the real-time working condition identification module is used for identifying the real-time driving working condition of the vehicle;
the neural network training module is used for constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network and finishing the training of the neural network, wherein the source domain is a typical driving working condition of a vehicle under different running scenes;
and the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by transfer learning to generate a PHET energy management strategy conforming to the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
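Purely as a hypothetical illustration of how these four modules might be wired together in software (the class and method names below are placeholders, not part of the claimed system):

```python
class PHETEnergyManagementSystem:
    """Hypothetical skeleton wiring the four claimed modules together."""

    def __init__(self, cycle_builder, condition_identifier, ddpg_trainer, transfer_module):
        self.cycle_builder = cycle_builder                  # builds typical driving cycles per scenario
        self.condition_identifier = condition_identifier    # recognises the real-time driving condition
        self.ddpg_trainer = ddpg_trainer                    # trains the DDPG network on the source domain
        self.transfer_module = transfer_module              # transfers the trained network to the target domain

    def generate_strategy(self, historical_data, live_signals):
        typical_cycles = self.cycle_builder.build(historical_data)              # source domain
        real_time_condition = self.condition_identifier.identify(live_signals)  # target domain
        trained_net = self.ddpg_trainer.train(typical_cycles)
        return self.transfer_module.transfer(trained_net, real_time_condition)
```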
CN202211627066.4A 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification Active CN115730529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211627066.4A CN115730529B (en) 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211627066.4A CN115730529B (en) 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification

Publications (2)

Publication Number Publication Date
CN115730529A true CN115730529A (en) 2023-03-03
CN115730529B CN115730529B (en) 2024-02-27

Family

ID=85301512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211627066.4A Active CN115730529B (en) 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification

Country Status (1)

Country Link
CN (1) CN115730529B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090198396A1 (en) * 2008-02-04 2009-08-06 Fernando Rodriguez Adaptive control strategy and method for optimizing hybrid electric vehicles
CN104071161A (en) * 2014-04-29 2014-10-01 福州大学 Method for distinguishing working conditions and managing and controlling energy of plug-in hybrid electric vehicle
CN108198425A (en) * 2018-02-10 2018-06-22 长安大学 A kind of construction method of Electric Vehicles Driving Cycle
CN113051667A (en) * 2021-03-29 2021-06-29 东南大学 Accelerated learning method for energy management strategy of hybrid electric vehicle
CN114969982A (en) * 2022-06-14 2022-08-30 南京航空航天大学 Fuel cell automobile deep reinforcement learning energy management method based on strategy migration
CN115150787A (en) * 2022-07-06 2022-10-04 四川大学 Deployment system and method of energy management strategy package based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Z. Ma: "Deep Deterministic Policy Gradient Based Energy Management Strategy for Hybrid Electric Tracked Vehicle With Online Updating Mechanism", IEEE Access, pages 7280 - 7292 *
Yin Andong; Jiang Tao: "Research on PHEV Control Strategy Based on LVQ Working Condition Identification", Vehicle & Power Technology, no. 02, pages 1 - 6 *
Deng Tao: "Adaptive Energy Management Control Strategy for Hybrid Electric Vehicles Based on LVQ Working Condition Identification", China Mechanical Engineering, vol. 27, no. 3, pages 1 - 6 *

Also Published As

Publication number Publication date
CN115730529B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
Lian et al. Cross-type transfer for deep reinforcement learning based hybrid electric vehicle energy management
Lian et al. Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle
CN110341690B (en) PHEV energy management method based on deterministic strategy gradient learning
Liu et al. Optimal power management based on Q-learning and neuro-dynamic programming for plug-in hybrid electric vehicles
Liu et al. Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle
Lin et al. An ensemble learning velocity prediction-based energy management strategy for a plug-in hybrid electric vehicle considering driving pattern adaptive reference SOC
Du et al. Heuristic energy management strategy of hybrid electric vehicle based on deep reinforcement learning with accelerated gradient optimization
Sun et al. High robustness energy management strategy of hybrid electric vehicle based on improved soft actor-critic deep reinforcement learning
CN112668799A (en) Intelligent energy management method and storage medium for PHEV (Power electric vehicle) based on big driving data
Lin et al. Optimal adaptation equivalent factor of energy management strategy for plug-in CVT HEV
Zhang et al. Tackling SOC long-term dynamic for energy management of hybrid electric buses via adaptive policy optimization
Yan et al. Design of a deep inference framework for required power forecasting and predictive control on a hybrid electric mining truck
CN113135113A (en) Global SOC (System on chip) planning method and device
CN115107733A (en) Energy management method and system for hybrid electric vehicle
Yang et al. Real-time energy management for a hybrid electric vehicle based on heuristic search
Ranaei et al. Patent-based technology forecasting: case of electric and hydrogen vehicle
Chen et al. A novel method of developing driving cycle for electric vehicles to evaluate the private driving habits
He et al. Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives
Tao et al. Terrain information-involved power allocation optimization for fuel cell/battery/ultracapacitor hybrid electric vehicles via an improved deep reinforcement learning
Li et al. Route optimization of electric vehicles based on reinsertion genetic algorithm
CN117465301A (en) Fuel cell automobile real-time energy management method based on data driving
CN112035536A (en) Electric automobile energy consumption prediction method considering dynamic road network traffic flow
CN115730529B (en) PHET energy management strategy generation method and system based on working condition identification
Liu et al. A controllable neural network-based method for optimal energy management of fuel cell hybrid electric vehicles
Zhang et al. SSIT: a sample selection-based incremental model training method for image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant