CN117787925B - Method, device, equipment and medium for managing hybrid power energy - Google Patents

Method, device, equipment and medium for managing hybrid power energy

Info

Publication number
CN117787925B
Authority
CN
China
Prior art keywords: layer, action, state, neural network, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410217191.0A
Other languages
Chinese (zh)
Other versions
CN117787925A (en)
Inventor
张元清
吕潇
张元生
朱铭
刘鹏
孙昊
甄文昊
陈章铭
宋岳汶
贠思齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Beikuang Intelligent Technology Co ltd
Original Assignee
Beijing Beikuang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Beikuang Intelligent Technology Co ltd filed Critical Beijing Beikuang Intelligent Technology Co ltd
Priority to CN202410217191.0A
Publication of CN117787925A
Application granted
Publication of CN117787925B

Landscapes

  • Hybrid Electric Vehicles (AREA)

Abstract

The application provides a method, a device, equipment and a medium for managing hybrid power energy, wherein the method comprises the following steps: performing deep reinforcement learning training on an initial management model by using the observation state quantity generated in the test process of a hybrid power test vehicle system to obtain a trained target management model; inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model; and managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy. The application can effectively improve the fuel economy of the scraper and give full play to the energy buffering function of the super capacitor.

Description

Method, device, equipment and medium for managing hybrid power energy
Technical Field
The application relates to the technical field of energy management, in particular to a method, a device, equipment and a medium for hybrid power energy management.
Background
As an important way to save energy and reduce emissions of vehicles, hybrid power technology is currently highly valued in the automobile and engineering machinery fields. By virtue of the performance advantages of the super capacitor, applying it in a hybrid power system to match and coordinate with the engine is highly applicable to peak power output matching and fuel economy improvement. An energy management strategy refers to a strategy for reasonably distributing the output power among the various driving devices according to the specific requirements of the vehicle during operation. Its main aim is to optimize energy utilization on the premise of ensuring vehicle performance.
Current research into energy management strategies is largely divided into three categories, namely rule-based management strategies, optimization-based management strategies, and learning-based management strategies. The energy management strategies in the prior art have significant limitations and bring only limited improvement to the performance of the vehicle system.
Disclosure of Invention
Accordingly, the present application is directed to a method, apparatus, device and medium for hybrid energy management, which overcome the problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for hybrid energy management, the method including:
using the observation state quantity generated in the test process of the hybrid power test vehicle system to perform deep reinforcement learning training on the initial management model to obtain a trained target management model;
Inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
And managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
In some embodiments of the present application, the initial management model includes a first neural network model and a second neural network model; the observation state quantity generated in the test process of the test vehicle system using the hybrid power is used for carrying out deep reinforcement learning training on the initial management model to obtain a trained target management model; comprising the following steps:
Inputting the observed state quantity into the first neural network model to obtain a prediction action output by the first neural network model;
Inputting the observed state quantity and the predicted action into the second neural network model to obtain a current value evaluation value output by the second neural network model;
And adjusting a first parameter of the first neural network model and a second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached.
In some embodiments of the present application, after the test vehicle system executes an action, a reward for the action is generated and the state is updated to the state corresponding to the action; an experience pool is built based on the actions, the rewards and the states;
the method obtains the observed state quantity by:
acquiring a preset number of observed state quantities from the experience pool; the observed state quantity includes a first action, a first state, a second state, and a first reward; wherein the first state is a state after the first action is performed by the test vehicle system; the second state is a state after the second action of the test vehicle system is executed, the second action is an adjacent action before the first action, and the first state is input into the first neural network model to obtain a second action predicted by the first neural network model.
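To make the sampling step concrete, a minimal experience-pool sketch is given below. It assumes a Python implementation; the class name ReplayBuffer and its methods push and sample are illustrative and not part of the original disclosure.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool D holding (state, action, reward, next_state) quadruples."""
    def __init__(self, capacity):
        # Oldest samples are discarded automatically once the capacity K is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        # Store one observed state quantity: second state, first action, first reward, first state
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Randomly draw a preset number of observed state quantities for training
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```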
In some embodiments of the present application, the first neural network model includes a first current network and a first target network, and the second neural network model includes a second current network and a second target network; the first current network and the second current network are used for calculating the current value evaluation value;
the adjusting the first parameter of the first neural network model and the second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, when reaching a preset convergence stop condition, obtaining the target management model includes:
And adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached.
In some embodiments of the present application, the adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached, includes:
calculating a target value evaluation value according to the current value evaluation value and the observed state;
adjusting a first parameter of the first current network and a second parameter of the second current network based on the current value evaluation value and the target value evaluation value with the aim of approaching a target network; the target network comprises a first target network and a second target network;
When a preset adjustment condition is reached, adjusting the target network to obtain an adjusted target network; the adjusted target network is used for being used when parameter adjustment is performed again.
In some embodiments of the present application, the first current network includes an input layer, an intermediate layer, an output layer, and a tanh activation function layer, where the intermediate layer includes two full connection layers and two ReLU activation function layers.
In some embodiments of the present application, the second current network includes a state path and an action path; wherein the state path comprises two full connection layers and a ReLU activation function layer; the action path comprises a full connection layer; and the full connection layer of the state path and the full connection layer of the action path are commonly connected with an addition layer, a ReLU activation function layer and an output layer.
In a second aspect, an embodiment of the present application provides an apparatus for hybrid energy management, the apparatus comprising:
The training module is used for performing deep reinforcement learning training on the initial management model by using the observation state quantity generated in the test process of the hybrid power test vehicle system to obtain a trained target management model;
The processing module is used for inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
And the management module is used for managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for hybrid energy management described above when the processor executes the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of hybrid energy management described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
The method comprises the steps of using an observation state quantity generated in a test process of a hybrid power test vehicle system to perform deep reinforcement learning training on an initial management model to obtain a trained target management model; inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model; and managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
The application approximates the true value of the policy function through a deep neural network, so that the system can work in a continuous state space and action space, can process high-dimensional data and learn the structure in the data, can handle very complex control problems, and has very strong generalization capability; it can effectively improve the fuel economy of the scraper and give full play to the energy buffering function of the super capacitor.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for hybrid energy management according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a hybrid scraper power drive system according to an embodiment of the present application;
FIG. 3 is a schematic diagram showing interaction between an agent and an environment according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a network structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a first neural network model and a second neural network model data transmission provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an initial management model specific training process provided by an embodiment of the present application;
FIG. 7 illustrates a schematic diagram of an apparatus for hybrid energy management provided by an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
Hybrid power technology is currently highly valued in the automobile and engineering machinery fields. By virtue of the performance advantages of the super capacitor, applying it in a hybrid power system to match and coordinate with the engine is highly applicable to peak power output matching and fuel economy improvement. An energy management strategy refers to a strategy for reasonably distributing the output power among the various driving devices according to the specific requirements of the vehicle during operation. Its main aim is to optimize energy utilization on the premise of ensuring vehicle performance. Current research into energy management strategies is largely divided into three categories, namely rule-based management strategies, optimization-based management strategies, and learning-based management strategies.
Rule-based energy management strategies rely too heavily on human expert experience and engineering experience, and can achieve neither real-time optimization nor globally optimal solutions. The long parameter tuning period of this approach typically requires lengthy trial and error and local adjustments, which increases the difficulty of learning and validating the strategy. The mathematical models required by optimization-based energy management strategies are complex and their real-time performance is poor, so real-time optimization cannot be achieved. Learning-based control strategies suffer from problems such as the curse of dimensionality and low computational efficiency; in particular, in a high-dimensional state space the storage space and time cost of the calculation grow exponentially.
Based on this, the embodiment of the application provides a method, a device, equipment and a medium for managing hybrid power energy, and the description is given below by way of embodiment.
FIG. 1 is a flow chart of a method for hybrid energy management according to an embodiment of the present application, wherein the method includes steps S101-S103; specific:
s101, performing deep reinforcement learning training on an initial management model by using an observation state quantity generated in a test process of a hybrid power test vehicle system to obtain a trained target management model;
s102, inputting a state quantity to be detected of a vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
S103, managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
The application approximates the true value of the policy function through a deep neural network, so that the system can work in a continuous state space and action space, can process high-dimensional data and learn the structure in the data, can handle very complex control problems, and has very strong generalization capability; it can effectively improve the fuel economy of the scraper and give full play to the energy buffering function of the super capacitor.
Some embodiments of the application are described in detail below. The following embodiments and features of the embodiments may be combined with each other without conflict.
The vehicle system of the embodiment of the application can specifically select a hybrid power scraper. The drive system configuration of the hybrid scraper power is shown in fig. 2. The vehicle system adopts a diesel power and electric power hybrid power device to provide power, and an engine and a super capacitor energy storage system are used as power sources. On the basis, the hydraulic system of the scraper provides hydraulic power for the working module of the scraper, and the driving system provides traction power for the movement of the scraper.
Power balance is critical to the control of the hybrid power system. In practical applications, because of the particularity of the control system, the analysis of the energy flow is very important: through energy flow analysis, the energy conversion and performance characteristics of the control system can be understood in depth, providing important references for the design, optimization and adjustment of the system.
The hybrid power system mainly performs energy flow analysis through an energy supply end and an energy consumption end. The combination of the engine, the super capacitor and the generator is a combination form of an energy supply end of the hybrid power system, wherein the engine is mainly responsible for providing mechanical energy for the system; the super capacitor mainly plays roles of energy storage and instantaneous energy supply; the generator is then responsible for powering the motor drive system. The hydraulic system and the motor drive system are important components of the energy consuming end. The hydraulic system is a key part of transmission power and execution operation in engineering machinery, and converts mechanical energy output by an engine into pressure energy of liquid through a hydraulic pump so as to drive a hydraulic cylinder to complete work. The motor driving system obtains energy through the generator and the super capacitor, and the motor converts electric energy into mechanical energy to drive the mechanical component to complete work. On this basis, further energy conversion and flow analysis are required for key constituent elements of the energy supply end and the energy consumption end to further determine the physical quantity required to be acquired by the system control and the corresponding control parameters. The scraper hybrid power system mainly comprises various power sources, transmission elements and energy consumption elements, and the elements execute corresponding operations according to instructions of a control system.
For the energy supply end and the energy consumption end of the system, the energy supply end and the energy consumption end accord with the principle of energy conservation, and the power balance relation of the system can be obtained according to the principle:
$$P_{eng}\,\eta_T = P_{gen}^{in} + P_{hyd}^{in},\qquad P_{gen}^{out} + P_{sc} = P_{mot}^{in},\qquad P_{hyd}^{out} + P_{mot}^{out} = P_{load}$$

In the formula, $P_{eng}$ represents the engine output power; $\eta_T$ represents the efficiency of the transfer case; $P_{gen}^{in}$ and $P_{gen}^{out}$ respectively represent the input power and the output power of the permanent magnet generator; $P_{hyd}^{in}$ and $P_{hyd}^{out}$ represent the input and output power of the hydraulic system; $P_{sc}$ represents the charge and discharge power of the super capacitor; $P_{mot}^{in}$ and $P_{mot}^{out}$ respectively represent the input power and the output power of the motor; and $P_{load}$ represents the total power of the system load.
The hybrid power scraper uses the super capacitor as an energy buffer under a reasonable energy management strategy: on the premise of meeting the normal working requirements of the scraper, the energy stored in the super capacitor is released when the engine is under high load to share the power demand of the energy consumption end, so that the engine works stably in a more efficient power interval for a long time and the fuel utilization efficiency is improved. The goal of the energy management strategy is therefore to meet the performance requirements of the hybrid scraper system, ensure the energy stability of the system, and improve fuel economy. Two problems mainly need to be considered: the energy stability problem, i.e. meeting the system load demand while keeping the engine from being overloaded and keeping the super capacitor SOC within a reasonable range; and the fuel economy problem, i.e. making the engine work in a power interval with relatively low fuel consumption and higher efficiency. The embodiment of the application provides a hybrid power energy management method based on an energy management strategy of deep reinforcement learning, which not only ensures fuel economy but also keeps the engine power within a proper interval.
Specifically, in the embodiment of the application, the observation state quantity generated in the test process of the hybrid power test vehicle system is used for performing deep reinforcement learning training on the initial management model to obtain a trained target management model; inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model; and managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
In the embodiment of the application, the initial management model comprises a first neural network model and a second neural network model, wherein the first neural network model can select an Actor network when the method is implemented, and the second neural network model can select a Critic network. Based on the first neural network model and the second neural network model, an agent is constructed, interaction is carried out between the agent and the environment, and the observed state quantity generated in the test process of the test vehicle system is obtained as shown in fig. 3. The first neural network model comprises a first current network and a first target network, and the second neural network model comprises a second current network and a second target network; the first current network and the first target network have the same structure, and the second current network and the second target network have the same structure. As shown in fig. 4, the first current network includes an input layer, an intermediate layer, an output layer, and a tanh activation function layer, wherein the intermediate layer includes two fully connected layers and two ReLU activation function layers. The second current network comprises a state path and an action path; wherein the state path comprises two full connection layers and a ReLU activation function layer; the action path comprises a full connection layer; and the full connection layer of the state path and the full connection layer of the action path are commonly connected with an addition layer, a ReLU activation function layer and an output layer.
Specifically, in the Actor network the input layer and the output layer have 4 and 1 neurons respectively; there are 2 intermediate layers, the first intermediate layer having 128 neurons and the second intermediate layer having 200 neurons, with the ReLU function adopted as the activation function. The output layer uses the tanh activation function to constrain the output action to the interval [-1, 1].
The design of the Critic network differs from that of the Actor network in that the Critic network takes the action output by the Actor network as one of its inputs. The Critic network is therefore divided into two branches: a state path that processes the state input and an action path that processes the action input; a common path then joins the state path and the action path together to form the Critic network. The state path input layer has 4 neurons and its two intermediate layers are fully connected, the first intermediate layer having 128 neurons and the second having 200 neurons; the action path input layer has 1 neuron and its single intermediate layer has 200 neurons; the two paths are joined in the common path by an addition layer; the output layer has 1 neuron; and the activation function adopts the ReLU function.
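The layer arrangement just described maps directly onto a small feed-forward implementation. The following is a minimal sketch assuming PyTorch; the framework choice, class names and variable names are assumptions introduced here for illustration, not part of the original disclosure.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """First current network: 4 -> 128 -> 200 -> 1, ReLU middle layers, tanh output."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 128), nn.ReLU(),    # first intermediate fully connected layer
            nn.Linear(128, 200), nn.ReLU(),  # second intermediate fully connected layer
            nn.Linear(200, 1), nn.Tanh(),    # output action constrained to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Second current network: a state path and an action path joined by an addition layer."""
    def __init__(self):
        super().__init__()
        self.state_path = nn.Sequential(
            nn.Linear(4, 128), nn.ReLU(),    # state path: two fully connected layers + ReLU
            nn.Linear(128, 200),
        )
        self.action_path = nn.Linear(1, 200)  # action path: one fully connected layer
        self.out = nn.Linear(200, 1)          # output layer producing the Q value

    def forward(self, state, action):
        x = torch.relu(self.state_path(state) + self.action_path(action))  # addition layer + ReLU
        return self.out(x)
```

Joining the state and action branches with an addition layer, as described above, lets a single small head estimate the Q value while still conditioning it on both the state and the action.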
The observed state quantity in the embodiment of the application comprises a first action, a first state, a second state and a first reward; wherein the first state is a state after the first action is performed by the test vehicle system; the second state is a state after the second action of the test vehicle system is executed, the second action is an adjacent action after the first action, and the first state is input into the first neural network model to obtain a second action predicted by the first neural network model.
State: The power requirements of the energy-consumption-end systems of the hybrid power scraper differ, so the output proportion of the power sources must be adjusted according to the actual power demand of the scraper; in addition, the SOC range of the super capacitor during operation affects the safety and energy stability of the system. Therefore, the hydraulic system demand power $P_{hyd}(t)$, the drive system demand power $P_{drv}(t)$, the super capacitor $SOC(t)$, and the action at the last moment, i.e. the generator power at the last moment $P_{gen}(t-1)$, are taken as the observed state of the controller, expressed as:

$$s_t = \left[\,P_{hyd}(t),\; P_{drv}(t),\; SOC(t),\; P_{gen}(t-1)\,\right]$$

The corresponding observed quantities can be obtained by calculation from sensor measurements of the motor rotating speed $n_m$, the motor torque $T_m$ and the hydraulic pump oil pressure $p_h$, where $t$ represents time.
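The conversion from raw sensor readings to the observed power quantities is not detailed here; as one hedged illustration, assuming the drive power is approximated from motor torque and speed and the hydraulic power from pump pressure and an assumed flow rate, a state-assembly helper might look as follows. The function name, arguments and power formulas are illustrative assumptions.

```python
import math

def build_state(n_m_rpm, t_m_nm, p_h_pa, q_pump_m3s, soc, p_gen_prev_w):
    """Assemble the observation s_t = [P_hyd, P_drv, SOC, P_gen(t-1)] from sensor readings.

    The power formulas below are illustrative assumptions; the text only states that
    the observed quantities are obtained by calculation from motor speed, motor torque
    and hydraulic pump oil pressure.
    """
    omega_m = 2.0 * math.pi * n_m_rpm / 60.0   # motor angular speed [rad/s]
    p_drv = t_m_nm * omega_m                   # drive system demand power [W] (assumed)
    p_hyd = p_h_pa * q_pump_m3s                # hydraulic demand power [W] (assumed)
    return [p_hyd, p_drv, soc, p_gen_prev_w]
```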
Action: The action taken by the intelligent agent for energy management is a key link in the energy management strategy, and the action quantity is required to reasonably distribute the energy of the hybrid power system. The generator is used as a core unit for converting mechanical energy into electric energy, the generated power of the generator can be used for adjusting the energy provided by the engine, and the mechanical energy provided for the hydraulic system to work and the mechanical energy provided for the generator to generate electricity are distributed. The generator power is selected as the action, expressed as:
$$a_t = P_{gen}(t)$$

where $P_{gen}(t)$ is the generator power at time $t$.
Reward: The reward signal is closely related to the optimization objective of the energy management strategy, which is to improve the fuel economy of the scraper during operation while maintaining the SOC within a reasonable range, so the reward function needs to be designed around both fuel economy and the SOC range. The reward function is therefore designed as:

$$r_t = -\Big(\,k_1\,\big|P_{gen}(t-1) + P_{hyd}(t) - P_{opt}\big| \;+\; k_2\,\big|SOC(t) - SOC_0\big| \;+\; \sigma B\,\Big)$$

The reward function is composed of three parts added together. The first part relates to the optimal power of the engine $P_{opt}$; $P_{gen}(t-1)$ and $P_{hyd}(t)$ respectively denote the generator power at the last moment and the power required by the hydraulic system, and the smaller the deviation of their sum from the optimal power $P_{opt}$, the higher the reward value. The second part relates to the super capacitor SOC: the system presets the initial value of the SOC as $SOC_0$, and, while the drive system power demand is met, the smaller the deviation of the SOC from this initial value, the higher the reward value. The third part relates to the operating thresholds of the system, where $\sigma$ is a very large constant and $B$ is a yes/no indicator variable: when the power demanded of the system overloads the engine or the SOC exceeds the set threshold, $B$ is 1, otherwise $B$ is 0, so that a large penalty is applied when the operating parameters of the system exceed their thresholds. The three parts are combined as negative values in a certain proportion ($k_1$ and $k_2$ being the proportion coefficients), so that during training the agent continuously approaches the optimal engine power, keeps the SOC value stable, and does not overload the system.
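A minimal sketch of such a reward is shown below, assuming a Python implementation; the coefficient values, the use of absolute deviations, and the function and argument names are illustrative assumptions, since the text only states that the three parts are combined as negative values in a certain proportion.

```python
def reward(p_gen_prev, p_hyd, p_opt, soc, soc_init,
           engine_overloaded, soc_out_of_range,
           k1=1.0, k2=1.0, sigma=1e6):
    """r_t = -(k1*|P_gen(t-1)+P_hyd(t)-P_opt| + k2*|SOC-SOC_0| + sigma*B).

    k1, k2 and sigma are illustrative values; sigma plays the role of the very
    large constant and B is 1 when the engine is overloaded or the SOC leaves
    its allowed range, 0 otherwise.
    """
    b = 1.0 if (engine_overloaded or soc_out_of_range) else 0.0
    return -(k1 * abs(p_gen_prev + p_hyd - p_opt)
             + k2 * abs(soc - soc_init)
             + sigma * b)
```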
As shown in FIG. 5, the observed state quantity is input into the Actor network, which outputs a first action; after the first action is executed, the agent obtains from the environment the first state that is reached and the first reward, and forms a quadruple together with the action value and the previous state (the second state), which is stored in the memory pool. Then, the observed state quantity and the action are input into the Critic network to estimate a Q value (the current value evaluation value); based on the current value evaluation value and the observed state quantity, the first parameter of the first neural network model and the second parameter of the second neural network model are adjusted, and when a preset convergence stop condition is reached, the target management model is obtained.
The adjusting the first parameter of the first neural network model and the second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, when reaching a preset convergence stop condition, obtaining the target management model includes:
And adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached.
And adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value, and obtaining the target management model when a preset convergence stop condition is reached, wherein the method comprises the following steps:
calculating a target value evaluation value according to the current value evaluation value and the observed state;
adjusting a first parameter of the first current network and a second parameter of the second current network based on the current value evaluation value and the target value evaluation value with the aim of approaching a target network; the target network comprises a first target network and a second target network;
When a preset adjustment condition is reached, adjusting the target network to obtain an adjusted target network; the adjusted target network is used for being used when parameter adjustment is performed again.
Specifically, after the Q value is calculated, the gradient is calculated by using the Q value, and then the Actor network is updated by using a strategy gradient method. In the Actor network, the current state information is input, so that the action prediction can be performed, and the optimal action corresponding to the input state is judged. Thereafter, the action predicted by the Actor network is taken as input and combined with the next state into a new quadruple for input into the memory pool. Next, samples are randomly selected from the memory pool, and then the randomly selected states and actions are input into the Critic network to estimate the Q value. After calculating the estimated Q value of Critic, the Critic network is updated using Adam optimizer. At regular intervals, the Target network is updated to alleviate the problem that the Critic network may oscillate rapidly. The above process is cyclically performed until convergence. Finally, the trained goal management model enables predictions from state to action, while enabling adaptive interactions between agents and the environment.
The specific training process is shown in fig. 6. Initialization: state space $s_t=\left[P_{hyd}(t),\,P_{drv}(t),\,SOC(t),\,P_{gen}(t-1)\right]$; action space $a_t=P_{gen}(t)$; the capacity of the experience pool $D$ is set to $K$ and the minimum batch sample number is set to $M$; current Actor network parameters $\theta^{\mu}$; target Actor network parameters $\theta^{\mu'}$; current Critic network parameters $\theta^{Q}$; target Critic network parameters $\theta^{Q'}$; the number of training rounds is set to $L$ and the simulation length to $T$.
For episode = 1 : L :
Initialize the random noise $N$;
Acquire the initial state of the scraper: $s_1=\left[P_{hyd}(1),\,P_{drv}(1),\,SOC(1),\,P_{gen}(0)\right]$;
For t = 1 : T :
Select the action $a_t=\mu(s_t\mid\theta^{\mu})+N_t$ according to the current strategy and the random noise;
Execute the action $a_t$, generate the reward $r_t$ and update the state to $s_{t+1}$;
Store the sample state-action pair $(s_t,\,a_t,\,r_t,\,s_{t+1})$ in the experience pool $D$;
Randomly sample $M$ state-action pair samples $(s_i,\,a_i,\,r_i,\,s_{i+1})$ from the experience pool $D$;
For i = 1 : M :
Calculation using the TD method: $y_i=r_i+\gamma\,Q'\!\left(s_{i+1},\,\mu'(s_{i+1}\mid\theta^{\mu'})\mid\theta^{Q'}\right)$; here $r_i$ is the reward value, $Q'$ is the Critic target network with parameters $\theta^{Q'}$, $\mu'$ is the Actor target network with parameters $\theta^{\mu'}$, and $\gamma$ is the discount factor.
Gradient descent minimization is used on $L(\theta^{Q})=\frac{1}{M}\sum_{i}\left(y_i-Q(s_i,\,a_i\mid\theta^{Q})\right)^{2}$ to update the Critic network parameters; here $\theta^{Q}$ are the parameters of the current Critic network, $M$ is the mini-batch size, $y_i$ is the target network Q value and $Q$ is the current network.
Policy gradient: $\nabla_{\theta^{\mu}}J\approx\frac{1}{M}\sum_{i}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_i}$ is back-propagated to update the Actor network parameters; here $\nabla$ denotes the gradient, $Q$ is the current Critic network with parameters $\theta^{Q}$, and $\mu$ is the current Actor network with parameters $\theta^{\mu}$. Then $\theta^{\mu}$ is updated using gradient ascent.
Update the target network parameters: $\theta^{Q'}\leftarrow\tau\theta^{Q}+(1-\tau)\theta^{Q'}$, $\theta^{\mu'}\leftarrow\tau\theta^{\mu}+(1-\tau)\theta^{\mu'}$;
End for;
End for;
End for.
wherein: $v$ is a constant coefficient;
$a_t$ is the action at the moment $t$;
$P_{hyd}(1)$ is the hydraulic system demand power in the initialized state of the scraper;
$P_{drv}(1)$ is the drive system demand power in the initialized state of the scraper;
$SOC(1)$ is the state of charge of the super capacitor in the initialized state of the scraper;
$P_{gen}(0)$ is the generator power in the initialized state of the scraper;
$s_i$ is the state corresponding to sample $i$;
$a_i$ is the action corresponding to sample $i$;
$s_{i+1}$ is the state corresponding to sample $i+1$;
$L(\theta^{Q})$ is the loss function under the current network parameters;
$a$ is a sampled action value;
$\tau$ is the target smoothing factor, which is typically much smaller than 1.
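The inner loop above is the standard DDPG update. A condensed sketch of one update step is shown below, assuming PyTorch and the Actor, Critic and ReplayBuffer classes sketched earlier; the optimizer choice (Adam, as mentioned above), the hyper-parameter values and the function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, buffer,
                batch_size=64, gamma=0.99, tau=0.005):
    """One DDPG update: TD target, Critic regression, policy gradient, soft target update.

    batch_size, gamma and tau are illustrative values standing in for M, the discount
    factor and the target smoothing factor of the pseudocode above.
    """
    batch = buffer.sample(batch_size)
    s, a, r, s_next = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    a = a.reshape(batch_size, -1)   # actions as a column vector
    r = r.reshape(batch_size, 1)

    # TD target: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r + gamma * critic_target(s_next, actor_target(s_next))

    # Critic update: minimize (1/M) * sum_i (y_i - Q(s_i, a_i))^2 (Adam optimizer assumed)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: ascend Q(s, mu(s)), implemented as descending -Q
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft target update: theta' <- tau*theta + (1 - tau)*theta'
    with torch.no_grad():
        for target, source in ((critic_target, critic), (actor_target, actor)):
            for tp, sp in zip(target.parameters(), source.parameters()):
                tp.mul_(1 - tau).add_(tau * sp)
```

The target networks can be created as deep copies of the current networks before training, and a Gaussian or Ornstein-Uhlenbeck process can play the role of the exploration noise $N$; both choices are assumptions here rather than details given in the original.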
After the target management model is obtained, the to-be-measured state quantity of the to-be-measured vehicle system is input into the target management model, and the target management strategy output by the target management model is obtained. And managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
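As a hedged illustration of this step: once trained, only the Actor network needs to be queried online. The sketch below assumes the PyTorch Actor from earlier, and the rescaling of the tanh output in [-1, 1] to a generator power command is an assumption introduced here.

```python
import torch

def target_management_strategy(actor, p_hyd, p_drv, soc, p_gen_prev, p_gen_max):
    """Map the measured state quantities of the vehicle under test to a generator
    power command. The linear rescaling of the tanh output is illustrative."""
    state = torch.tensor([[p_hyd, p_drv, soc, p_gen_prev]], dtype=torch.float32)
    with torch.no_grad():
        a = actor(state).item()           # in [-1, 1] because of the tanh output layer
    return 0.5 * (a + 1.0) * p_gen_max    # assumed mapping to [0, p_gen_max] watts
```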
When the demand power of the vehicle system to be tested is high, the demand power of the hydraulic system is low and the charging requirement is high, the generator is given command power to generate electricity; when the demand power of the hydraulic system is high, the management system reduces the power generation command power, so that the engine mainly provides energy for the hydraulic system while the super capacitor discharges to supplement energy for the drive system. Meanwhile, under the target management strategy, the command power of the generator changes smoothly, can be kept in a moderate range for a long time, and can be raised rapidly when the system demands it.
The target management strategy can enable the power generation instruction power of the generator of the vehicle system to be tested and the corresponding engine output power to be smoother, and large fluctuation can not occur. Meanwhile, the optimal result is pursued from the global angle, the output power is not changed directly along with the power required by the system, the energy buffering function of the super capacitor is utilized more, the power output is carried out more stably, and the stability of the system energy is ensured. The output power of the engine can be seen to be kept in a moderate interval, the power cannot fluctuate back and forth between lower power and higher power, the engine can be operated in a more efficient range, and the fuel economy of the system is improved. The super capacitor SOC curve is smoother. Meanwhile, when larger power requirements are met, the SOC curve valley value of the super capacitor is lower, and the energy buffering effect of the super capacitor is better utilized.
Fig. 7 is a schematic structural diagram of an apparatus for hybrid energy management according to an embodiment of the present application, where the apparatus includes:
The training module is used for performing deep reinforcement learning training on the initial management model by using the observation state quantity generated in the test process of the hybrid power test vehicle system to obtain a trained target management model;
The processing module is used for inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
And the management module is used for managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy.
The initial management model comprises a first neural network model and a second neural network model; the observation state quantity generated in the test process of the test vehicle system using the hybrid power is used for carrying out deep reinforcement learning training on the initial management model to obtain a trained target management model; comprising the following steps:
Inputting the observed state quantity into the first neural network model to obtain a prediction action output by the first neural network model;
Inputting the observed state quantity and the predicted action into the second neural network model to obtain a current value evaluation value output by the second neural network model;
And adjusting a first parameter of the first neural network model and a second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached.
After the test vehicle system executes an action, a reward for the action is generated and the state is updated to the state corresponding to the action; an experience pool is built based on the actions, the rewards and the states;
the method obtains the observed state quantity by:
acquiring a preset number of observed state quantities from the experience pool; the observed state quantity includes a first action, a first state, a second state, and a first reward; wherein the first state is a state after the first action is performed by the test vehicle system; the second state is a state after the second action of the test vehicle system is executed, the second action is an adjacent action before the first action, and the first state is input into the first neural network model to obtain a second action predicted by the first neural network model.
The first neural network model comprises a first current network and a first target network, and the second neural network model comprises a second current network and a second target network; the first current network and the second current network are used for calculating the current value evaluation value;
the adjusting the first parameter of the first neural network model and the second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, when reaching a preset convergence stop condition, obtaining the target management model includes:
And adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached.
The step of adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, and obtaining the target management model when a preset convergence stop condition is reached, includes:
calculating a target value evaluation value according to the current value evaluation value and the observed state;
adjusting a first parameter of the first current network and a second parameter of the second current network based on the current value evaluation value and the target value evaluation value with the aim of approaching a target network; the target network comprises a first target network and a second target network;
When a preset adjustment condition is reached, adjusting the target network to obtain an adjusted target network; the adjusted target network is used for being used when parameter adjustment is performed again.
The first current network comprises an input layer, an intermediate layer, an output layer and a tanh activation function layer, wherein the intermediate layer comprises two full connection layers and two ReLU activation function layers.
The second current network comprises a state path and an action path; wherein the state path comprises two full connection layers and a ReLU activation function layer; the action path comprises a full connection layer; and the full connection layer of the state path and the full connection layer of the action path are commonly connected with an addition layer, a ReLU activation function layer and an output layer.
As shown in fig. 8, an embodiment of the present application provides an electronic device for performing the method of hybrid energy management in the present application, where the device includes a memory, a processor, a bus, and a computer program stored on the memory and capable of running on the processor, where the processor implements the steps of the method of hybrid energy management when executing the computer program.
In particular, the above-mentioned memory and processor may be general-purpose memory and processor, and are not particularly limited herein, and the above-mentioned method of hybrid energy management can be performed when the processor runs a computer program stored in the memory.
Corresponding to the method of hybrid energy management in the present application, the embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the method of hybrid energy management described above.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, etc., on which a computer program is executed, capable of performing the above-described method of hybrid energy management.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other manners. The system embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions in actual implementation, and e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, system or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above examples are only specific embodiments of the present application, and are not intended to limit the scope of the present application, but it should be understood by those skilled in the art that the present application is not limited thereto, and that the present application is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions. Are intended to be encompassed within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (5)

1. A method for managing hybrid power energy, characterized in that the method is applied to a hybrid power scraper, wherein the energy supply end of the hybrid power system of the hybrid power scraper adopts a combination of an engine, a super capacitor and a generator, the engine being mainly responsible for providing mechanical energy, the super capacitor mainly playing the roles of energy storage and instantaneous energy supply, and the generator being responsible for providing energy for the motor drive system; the method comprises the following steps:
using the observation state quantity generated in the test process of the hybrid power test vehicle system to perform deep reinforcement learning training on the initial management model to obtain a trained target management model;
Inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
Based on the target management strategy, the hybrid power energy of the vehicle system to be tested is managed and distributed, so that, on the premise of meeting the normal working requirements of the scraper, the energy stored in the super capacitor is released when the engine is under high load to share the power demand of the energy consumption end, whereby the engine works stably in a more efficient power interval for a long time and the fuel utilization efficiency is improved;
the initial management model comprises a first neural network model and a second neural network model; the observation state quantity generated in the test process of the test vehicle system using the hybrid power is used for carrying out deep reinforcement learning training on the initial management model to obtain a trained target management model; comprising the following steps:
Inputting the observed state quantity into the first neural network model to obtain a prediction action output by the first neural network model;
Inputting the observed state quantity and the predicted action into the second neural network model to obtain a current value evaluation value output by the second neural network model;
based on the current value evaluation value and the observed state quantity, adjusting a first parameter of the first neural network model and a second parameter of the second neural network model, and obtaining the target management model when a preset convergence stop condition is reached;
After the test vehicle system executes an action, a reward for the action is generated and the state is updated to the state corresponding to the action; an experience pool is built based on the actions, the rewards and the states;
the method obtains the observed state quantity by:
Acquiring a preset number of observed state quantities from the experience pool; the observed state quantity includes a first action, a first state, a second state, and a first reward; wherein the first state is a state after the first action is performed by the test vehicle system; the second state is a state after the second action of the test vehicle system is executed, the second action is an adjacent action before the first action, the first state is input into the first neural network model, and the second action predicted by the first neural network model is obtained;
The first neural network model comprises a first current network and a first target network, and the second neural network model comprises a second current network and a second target network; the first current network and the second current network are used for calculating the current value evaluation value;
The adjusting the first parameter of the first neural network model and the second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, when reaching a preset convergence stop condition, obtaining the target management model includes:
Based on the current value evaluation value and the observed state quantity, adjusting a first parameter of the first current network and a second parameter of the second current network, and obtaining the target management model when a preset convergence stop condition is reached;
the first current network comprises an input layer, a middle layer, an output layer and a tanh activation function layer, wherein the middle layer comprises two fully connected layers and two ReLU activation function layers; specifically, in the first current network the input layer and the output layer have 4 and 1 neurons respectively, the middle layer is set to 2 layers, the first middle layer has 128 neurons, the second middle layer has 200 neurons, and the ReLU function is adopted as the activation function; and the output layer adopts the tanh activation function to restrict the output action to between [-1, 1];
the second current network comprises a state path and an action path, wherein the state path comprises two fully connected layers and a ReLU activation function layer, the action path comprises a fully connected layer, and the fully connected layers of the state path and of the action path are jointly connected to an addition layer, a ReLU activation function layer and an output layer; the state path input layer has 4 neurons and its two intermediate layers are fully connected, the first intermediate layer having 128 neurons and the second intermediate layer having 200 neurons, with the ReLU function adopted as the activation function; the action path input layer has 1 neuron and its single intermediate layer has 200 neurons; the two paths are connected together by an addition layer in the common path, the activation function adopts the ReLU function, and the output layer has 1 neuron;
The first state and the second state each comprise the hydraulic system demand power, the drive system demand power, the SOC of the super capacitor and the action at the last moment, namely the generator power at the last moment, during operation of the hybrid power scraper;
the first action includes generator power;
the reward function for obtaining the first reward is:

$$r_t = -\Big(\,k_1\,\big|P_{gen}(t-1) + P_{hyd}(t) - P_{opt}\big| \;+\; k_2\,\big|SOC(t) - SOC_0\big| \;+\; \sigma B\,\Big)$$

the reward function is composed of three parts added together: the first part relates to the optimal power of the engine $P_{opt}$, where $P_{gen}(t-1)$ and $P_{hyd}(t)$ respectively refer to the generator power at the last moment and the power required by the hydraulic system; the second part relates to the super capacitor SOC, the system presetting the initial value of the SOC as $SOC_0$; the third part relates to the operating threshold of the system, wherein $\sigma$ is a constant and $B$ is a yes/no indicator variable taking the value 0 or 1.
2. The method according to claim 1, wherein the adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the observed state quantity, when a preset convergence stop condition is reached, obtains the target management model includes:
calculating a target value evaluation value according to the current value evaluation value and the observed state;
adjusting the first parameter of the first current network and the second parameter of the second current network based on the current value evaluation value and the target value evaluation value, with the aim of approaching the target network, wherein the target network comprises a first target network and a second target network (a minimal update sketch follows this claim);
when a preset adjustment condition is reached, adjusting the target network to obtain an adjusted target network; the adjusted target network is used the next time parameter adjustment is performed.
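For illustration, the following is a minimal sketch of the target-value computation and parameter adjustment recited in claim 2, written in the style of deep deterministic policy gradient training and reusing the Actor and Critic classes from the sketch after claim 1. The discount factor, the learning rates, and the use of a soft (rather than periodic hard) target adjustment are assumptions not stated in the claim.

```python
import copy

import torch
import torch.nn.functional as F

gamma, tau = 0.99, 0.005          # assumed discount factor and target adjustment rate

actor, critic = Actor(), Critic()                                  # first / second current networks
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)    # first / second target networks
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)


def adjust(state, action, reward, next_state):
    """One adjustment step on batched tensors of shape (batch, 4), (batch, 1), (batch, 1), (batch, 4)."""
    # Target value evaluation value, computed from the reward and the target networks.
    with torch.no_grad():
        target_q = reward + gamma * critic_t(next_state, actor_t(next_state))
    # Adjust the second parameter (critic) toward the target value evaluation value.
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Adjust the first parameter (actor) to increase the current value evaluation value.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # When the preset adjustment condition is reached, move the target networks
    # toward the current networks (a soft update is assumed here).
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```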
3. A device for hybrid power energy management, characterized in that the device is applied to a hybrid power scraper, wherein the hybrid power scraper adopts a compound hybrid power system whose energy supply end combines an engine, a supercapacitor and a generator; the engine is mainly responsible for providing mechanical energy; the supercapacitor mainly performs energy storage and instantaneous energy supply; the generator is responsible for supplying energy to the motor drive system; the device comprises:
The training module is used for performing deep reinforcement learning training on the initial management model by using the observation state quantity generated in the test process of the hybrid power test vehicle system to obtain a trained target management model;
The processing module is used for inputting the state quantity to be detected of the vehicle system to be detected into the target management model to obtain a target management strategy output by the target management model;
The management module is used for managing and distributing the hybrid power energy of the vehicle system to be tested based on the target management strategy, so that, on the premise of meeting the normal working requirements of the scraper, the energy stored in the supercapacitor is released under high engine load to share the power demand of the energy-consuming end, allowing the engine to work stably in a more efficient power interval for a long time and improving fuel utilization efficiency;
the initial management model comprises a first neural network model and a second neural network model; the performing deep reinforcement learning training on the initial management model by using the observation state quantity generated in the test process of the hybrid power test vehicle system, to obtain a trained target management model, comprises the following steps:
Inputting the observed state quantity into the first neural network model to obtain a prediction action output by the first neural network model;
Inputting the observed state quantity and the predicted action into the second neural network model to obtain a current value evaluation value output by the second neural network model;
based on the current value evaluation value and the observed state quantity, adjusting a first parameter of the first neural network model and a second parameter of the second neural network model, and obtaining the target management model when a preset convergence stop condition is reached;
Generating a reward for the action after the test vehicle system executes the action, and updating to the state corresponding to the action; building an experience pool based on the actions, rewards and states;
the observed state quantity is obtained by:
Acquiring a preset number of observed state quantities from the experience pool; the observed state quantity includes a first action, a first state, a second state, and a first reward; wherein the first state is the state after the test vehicle system executes the first action, and the second state is the state after the test vehicle system executes the second action, the second action being the action immediately preceding the first action; the first state is input into the first neural network model to obtain the second action predicted by the first neural network model (a minimal experience-pool sketch follows this claim);
The first neural network model comprises a first current network and a first target network, and the second neural network model comprises a second current network and a second target network; the first current network and the second current network are used for calculating the current value evaluation value;
The adjusting the first parameter of the first neural network model and the second parameter of the second neural network model based on the current value evaluation value and the observed state quantity, to obtain the target management model when a preset convergence stop condition is reached, includes:
Based on the current value evaluation value and the observed state quantity, adjusting a first parameter of the first current network and a second parameter of the second current network, and obtaining the target management model when a preset convergence stop condition is reached;
the first current network comprises an input layer, an intermediate layer, an output layer and a tanh activation function layer, wherein the intermediate layer comprises two fully connected layers and two ReLU activation function layers; specifically, the first current network has 4 input-layer neurons and 1 output-layer neuron, the number of intermediate layers is set to 2, with 128 neurons in the first intermediate layer and 200 neurons in the second intermediate layer, and the ReLU function is adopted as the activation function; the output layer adopts a tanh activation function to constrain the output action to the interval [-1, 1];
The second current network comprises a state path and an action path; wherein the state path comprises two fully connected layers and a ReLU activation function layer, and the action path comprises one fully connected layer; the fully connected layers of the state path and of the action path are jointly connected to an addition layer, a ReLU activation function layer and an output layer; the state path has 4 input-layer neurons and two fully connected intermediate layers, with 128 neurons in the first intermediate layer and 200 neurons in the second, using the ReLU function as the activation function; the action path has 1 input-layer neuron and one intermediate layer with 200 neurons; the two paths are joined by an addition layer in the common path, where the ReLU function is used as the activation function, and the output layer has 1 neuron;
The first state and the second state each comprise the hydraulic system demanded power, the drive system demanded power, and the supercapacitor and generator power at the last moment, i.e., the previous time step, during operation of the hybrid power scraper;
the first action includes generator power;
the reward function for obtaining the first reward is the sum of three parts: the first part relates to the optimal power of the engine and is computed from the generator power at the previous moment and the hydraulic system demanded power; the second part relates to the supercapacitor SOC, whose initial value is preset by the system; the third part relates to the operational threshold of the system and involves a constant together with a yes/no indicator variable B taking the value 0 or 1.
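For illustration, the following is a minimal sketch of the experience pool described above: after the test vehicle system executes an action, the transition (state, action, reward, next state) is stored, and a preset number of observations is later sampled for training. The capacity, batch size, and the ordering of the four state components are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np


class ExperiencePool:
    """Fixed-capacity pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=100_000):
        self.pool = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        # state / next_state: assumed 4-element vectors, e.g.
        # [hydraulic demand, drive demand, supercapacitor quantity, previous generator power]
        self.pool.append((np.asarray(state, dtype=np.float32),
                          float(action), float(reward),
                          np.asarray(next_state, dtype=np.float32)))

    def sample(self, batch_size=64):
        # Draw a preset number of observed state quantities for one training step.
        batch = random.sample(self.pool, batch_size)
        states, actions, rewards, next_states = map(np.array, zip(*batch))
        return states, actions, rewards, next_states
```

A training loop would convert the sampled arrays to tensors and pass them to the adjustment step sketched after claim 2.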
4. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of hybrid energy management according to any one of claims 1 to 2.
5. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method of hybrid energy management according to any of claims 1 to 2.
CN202410217191.0A 2024-02-28 2024-02-28 Method, device, equipment and medium for managing hybrid power energy Active CN117787925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410217191.0A CN117787925B (en) 2024-02-28 2024-02-28 Method, device, equipment and medium for managing hybrid power energy

Publications (2)

Publication Number Publication Date
CN117787925A CN117787925A (en) 2024-03-29
CN117787925B true CN117787925B (en) 2024-05-31

Family

ID=90383769

Country Status (1)

Country Link
CN (1) CN117787925B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112249002A (en) * 2020-09-23 2021-01-22 南京航空航天大学 Heuristic series-parallel hybrid power energy management method based on TD3
CN113051667A (en) * 2021-03-29 2021-06-29 东南大学 Accelerated learning method for energy management strategy of hybrid electric vehicle
CN113525396A (en) * 2021-08-13 2021-10-22 北京理工大学 Hybrid electric vehicle layered prediction energy management method integrating deep reinforcement learning
CN117227700A (en) * 2023-11-15 2023-12-15 北京理工大学 Energy management method and system for serial hybrid unmanned tracked vehicle
CN117332677A (en) * 2023-09-12 2024-01-02 吉林大学 Deep reinforcement learning-based energy management method for fuel cell hybrid electric vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant