CN117151928A - Power saving calculation method and device combined with reinforcement learning - Google Patents

Power saving calculation method and device combined with reinforcement learning Download PDF

Info

Publication number
CN117151928A
CN117151928A (application number CN202311143879.0A)
Authority
CN
China
Prior art keywords
action
state
rewards
reinforcement learning
system performance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311143879.0A
Other languages
Chinese (zh)
Inventor
刘姚
陈嘉诺
孙启文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202311143879.0A priority Critical patent/CN117151928A/en
Publication of CN117151928A publication Critical patent/CN117151928A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Feedback Control In General (AREA)

Abstract

The embodiments of this specification provide a power saving calculation method and device combined with reinforcement learning, wherein the method comprises the following steps: defining the states, actions, rewards, and policy of the reinforcement learning algorithm; optimizing an appliance control strategy through the reinforcement learning algorithm; and controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity.

Description

Power saving calculation method and device combined with reinforcement learning
Technical Field
The present document relates to the field of electrical technology, and in particular, to a power saving calculation method and apparatus combined with reinforcement learning.
Background
In a practical school electricity-saving application scenario, the activity times of students and teachers are not completely regular, and if we set the on/off state of appliances based only on the class schedule and camera information, actual needs may not be met; for example, when occupants arrive outside scheduled hours, switching appliances on by schedule alone is not enough. Therefore, how to reduce power consumption as much as possible while still meeting the usage requirements of the appliances is a technical problem to be solved.
Disclosure of Invention
The invention aims to provide a power saving calculation method and device combined with reinforcement learning, so as to solve the above problems in the prior art.
The invention provides a power saving calculation method combined with reinforcement learning, which comprises the following steps:
defining the states, actions, rewards, and policy of the reinforcement learning algorithm;
optimizing an appliance control strategy through the reinforcement learning algorithm;
and controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity.
The invention provides a power-saving computing device combined with reinforcement learning, which comprises:
a definition module, used for defining the states, actions, rewards, and policy of the reinforcement learning algorithm;
an optimization module, used for optimizing the appliance control strategy through the reinforcement learning algorithm;
and a control module, used for controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity.
By adopting the embodiments of the invention, the appliance control strategy is optimized by the reinforcement learning method, so that the appliances reduce electricity consumption as much as possible while still meeting usage requirements.
Drawings
For a clearer description of one or more embodiments of the present specification or of the solutions of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below cover only some of the embodiments of this specification; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a power saving computing method incorporating reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a power-saving computing device incorporating reinforcement learning according to an embodiment of the present invention.
Detailed Description
To enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, those solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present specification without inventive effort shall fall within the scope of the present disclosure.
Method embodiment
According to an embodiment of the present invention, a power saving calculation method combined with reinforcement learning is provided. FIG. 1 is a flowchart of the method; as shown in FIG. 1, it specifically includes the following steps:
step S101, defining the definition, state, action, rewards and strategy of the reinforcement learning algorithm; specifically, system power, the number of persons in the electricity-saving space, and activity in the electricity-saving space are defined as states, in which,
1. obtaining system power by monitoring the current and voltage of the system;
2. defining the system performance parameter adjustment as an action, wherein the system performance parameter specifically comprises: air conditioning temperature, CPU frequency and memory size;
3. defining a reduction in energy consumption as a positive reward and a reduction in system performance as a negative reward, specifically as follows:
the reward function is denoted R(s, a), where s represents the state and a represents the system performance parameter, and the reward is defined according to Equation 1:
R(s_t, a_t) = α·(P_{t−1} − P_t) − β·(a_{t−1} − a_t) + γ·F_t    (Equation 1)
where α and β are positive constants used to control the weights of the positive and negative rewards respectively, P_t represents the system power at time t, γ is a weight controlling the influence of the users' perception, F_t is the teachers' and students' feeling at time t (F_t = 0 if the feeling is good, otherwise F_t = −1), and a_t represents the system performance parameter at time t.
4. defining the method of selecting an action according to the current state as the policy, specifically as follows:
the policy is expressed as π(a|s), where a represents a system performance parameter and s represents a state, and the policy is defined according to Equation 2:
π(a|s) = 1/n for every a ∈ A    (Equation 2)
where n represents the number of actions and A represents the action set; that is, in each state, actions are initially selected from a uniform distribution.
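To make these definitions concrete, the following is a minimal Python sketch of the reward and initial policy; the constant values, function names, and the reconstructed forms of Equations 1 and 2 are illustrative assumptions rather than part of the claimed method.

```python
import random

# Illustrative constants standing in for the weights alpha, beta and gamma of
# Equation 1; their values are assumptions chosen only for this sketch.
ALPHA, BETA, GAMMA_F = 1.0, 0.5, 0.8

def reward(p_prev, p_now, perf_prev, perf_now, feeling):
    """Equation 1 (as reconstructed above): a drop in power P earns a positive
    reward, a drop in the performance parameter a earns a negative one, and the
    teachers'/students' feeling F_t (0 if good, -1 otherwise) is weighted in."""
    return ALPHA * (p_prev - p_now) - BETA * (perf_prev - perf_now) + GAMMA_F * feeling

def uniform_policy(actions):
    """Equation 2: pi(a|s) = 1/n -- every action is equally likely in any state."""
    return random.choice(actions)
```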
Step S102, optimizing the appliance control strategy through the reinforcement learning algorithm, specifically comprising:
setting the reinforcement learning algorithm to the Q-learning algorithm, and initializing its value function Q(s, a), where Q(s, a) represents the value of selecting action a in the current state;
selecting an action a according to the ε-greedy policy π, performing action a, obtaining the reward r and the new state s′, and updating the value function Q(s, a) according to Equation 3:
Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)]    (Equation 3)
where the ε-greedy policy means selecting a random action with probability ε and the currently optimal action with probability 1 − ε; α is the learning rate, controlling the step size of each update; γ is the discount factor, measuring the importance of future rewards; and r represents the reward;
updating the state s to s′.
Step S103, controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity, specifically comprising:
initializing the system power P, the user experience F, the system performance parameter a, the positive and negative reward weights α and β, and the action set A = {a_1, a_2, ..., a_n}, where n is the number of actions; initializing the value function Q(s, a) to an arbitrary value; and initializing the state s = P;
selecting an action a according to the current state using the policy π(a|s), performing action a, obtaining the reward and the new state s′, updating the system power P and the user experience F, computing the reward function R(s, a), updating the value function Q(s, a), and updating the state s = P.
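Tying the above together, a hypothetical training loop for step S103 follows; it reuses `reward`, `select_action`, and `update_q` from the sketches above, and `step_env` is an invented stand-in for the monitored classroom system:

```python
ACTIONS = ["ac_temp_up", "cpu_freq_down", "keep"]  # illustrative adjustments

def step_env(state, action):
    """Assumed environment hook: applies a performance-parameter adjustment and
    returns the new system power and the users' feeling (0 good, -1 bad)."""
    power, _ = state
    if action == "cpu_freq_down":
        return max(power - 5.0, 10.0), 0    # saves energy, users unaffected
    if action == "ac_temp_up":
        return max(power - 10.0, 10.0), -1  # saves more energy, users notice
    return power, 0

def train(episodes=200, steps=48):
    for _ in range(episodes):
        state = (100.0, 0)                  # initialize the state s = P
        for _ in range(steps):
            action = select_action(state, ACTIONS)
            new_power, feeling = step_env(state, action)
            # performance change is not modelled in this stub, hence the zeros
            r = reward(state[0], new_power, 0.0, 0.0, feeling)
            next_state = (new_power, feeling)
            update_q(state, action, r, next_state, ACTIONS)
            state = next_state              # update the state s = P

train()
```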
The following describes the above technical solution of the embodiment of the present invention in detail.
Assume there are several appliances to be controlled in a classroom, and let the electricity consumption of the i-th appliance at each moment be given. Our goal is to minimize the electricity used by each appliance during the day.
A day may be divided into T time segments, each of length Δt, so the total time of a day is T·Δt, with the j-th time being t_j = j·Δt. The total electricity used by the i-th appliance in a day is then:
E_i = Σ_{j=1}^{T} x_{i,j} · P_i · Δt
where x_{i,j} is the power state of appliance i in segment j and P_i is its power. Our goal is to minimize the sum of the electricity usage of all appliances in a day:
min Σ_i E_i = min Σ_i Σ_{j=1}^{T} x_{i,j} · P_i · Δt
Obviously, this is a linear programming problem, so we can model it as the following optimization problem:
min Σ_i Σ_{j=1}^{T} x_{i,j} · P_i · Δt    subject to    x_{i,j} ∈ {0, 1}
That is, the electricity consumption over one day is minimized by saving as much electricity as possible on each individual appliance. To achieve this, we can embed the school class schedule and camera information into our model. In the simplest case, the power state x_{i,j} of appliance i in segment j takes two values, on and off, i.e., 1 and 0. If there is no class at the current time or no one is in the classroom, the corresponding appliance state is set to off; otherwise, it is set to on.
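To make the baseline concrete, the following Python sketch computes the daily-energy objective and the schedule/occupancy rule just described; the segment length, rated powers, and example schedules are invented inputs:

```python
DELTA_T = 1.0        # length of one time segment, in hours (assumed)
POWER = [1.5, 0.2]   # assumed rated power P_i of each appliance, in kW

def baseline_states(has_class, occupied):
    """x[i][j] = 1 (on) only when segment j has a class or the room is occupied."""
    T = len(has_class)
    return [[1 if (has_class[j] or occupied[j]) else 0 for j in range(T)]
            for _ in POWER]

def daily_energy(x):
    """E = sum_i sum_j x[i][j] * P_i * delta_t -- the quantity to be minimised."""
    return sum(POWER[i] * sum(row) * DELTA_T for i, row in enumerate(x))

# Example: an 8-segment day with classes in segments 2-4 and a visitor in segment 6.
has_class = [0, 0, 1, 1, 1, 0, 0, 0]
occupied  = [0, 0, 1, 1, 1, 0, 1, 0]
print(daily_energy(baseline_states(has_class, occupied)))  # 6.8 kWh
```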
However, it must be considered that in the actual application scenario the activity times of students and teachers are not completely regular; if the on/off state of the appliances is set based only on the class schedule and camera information, actual needs may not be met, for example when occupants arrive outside scheduled hours. Therefore, the appliance control strategy is optimized by a reinforcement learning method, so that the appliances reduce electricity consumption as much as possible while still meeting usage requirements.
1.1 Reinforcement learning algorithm
Reinforcement learning is a machine learning method that learns an optimal strategy through trial and error. In this problem, we can regard the appliance control strategy as an agent that, at each moment, chooses to turn an appliance on or off according to the current environment (e.g., whether someone is present, the time, etc.) and thereby obtains an immediate reward (e.g., reduced power consumption). By constantly interacting with the environment, the agent can learn the optimal appliance control strategy, reducing electricity consumption as much as possible while ensuring usage requirements are met. Under the reinforcement learning framework, we need to define the concepts of state, action, reward, and policy.
1.1.1 State definition
In the power saving algorithm, we take the system power, the number of people in the classroom, and the class schedule as the state. The power may be obtained by monitoring the current and voltage of the system. Assuming the system power is P, there are N people in the classroom, and a class starts within 10 minutes, the state s can be expressed as s = (P, N, 1).
1.1.2 Action definition
In the power saving algorithm, an adjustment of a system performance parameter is taken as an action, such as adjusting the air-conditioning temperature, CPU frequency, or memory size. Denoting a system performance parameter by a, the action set A may be expressed as A = {a_1, a_2, ..., a_n}.
1.1.3 Reward definition
In the power saving algorithm, we can treat a decrease in energy consumption as a positive reward and a decrease in system performance as a negative reward. Denoting the reward function by R(s, a), it can be defined as in Equation 1:
R(s_t, a_t) = α·(P_{t−1} − P_t) − β·(a_{t−1} − a_t) + γ·F_t
where α and β are positive constants used to control the weights of the positive and negative rewards, P_t denotes the system power at time t, γ is a weight controlling the influence on teachers and students, and F_t is the teachers' and students' feeling at time t: F_t = 0 if the feeling is good, otherwise F_t = −1. In this way, the agent controls the system power while taking the users' experience into account.
1.1.4 Policy definition
In the power saving algorithm, the method of selecting an action according to the current state is called the policy. Denoting the policy by π(a|s), it can be defined as in Equation 2:
π(a|s) = 1/n for every a ∈ A
That is, in each state, an action is selected from a uniform distribution.
1.2 Algorithm
1.2.1 Q-learning algorithm
Q-learning is a reinforcement learning algorithm that can be used to optimize the policy toward the optimization goal. Its basic idea is to continuously improve the policy by iteratively updating the value function. The algorithm flow is shown in Table 1:
Table 1: Q-learning algorithm flow
1. Initialize the value function Q(s, a) to arbitrary values.
2. In the current state s, select an action a according to the ε-greedy policy.
3. Perform a, obtain the reward r and the new state s′.
4. Update Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)].
5. Set s ← s′ and repeat from step 2 until convergence.
Here, the ε-greedy policy means selecting a random action with probability ε and the currently optimal action with probability 1 − ε; α is the learning rate, used to control the step size of each update; γ is the discount factor, used to measure the importance of future rewards.
In the power saving algorithm, the value function of the Q-learning algorithm may be expressed as Q(s, a), i.e., the value of selecting action a in the current state s.
Through continued iteration of the Q-learning algorithm, the policy is progressively optimized so as to minimize energy consumption.
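Once the iteration has converged, the optimized control strategy is simply the greedy action in each state; continuing the sketch above (the `Q` table and `ACTIONS` come from the earlier illustrative code):

```python
def optimized_policy(state):
    """Greedy policy read out of the learned Q table: exploration is disabled."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```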
1.2.2 Power saving algorithm combining reinforcement learning
For a reinforcement learning algorithm, the learning period is long, and in a real environment the feedback cycle is long and the cost is high. To solve this problem, the embodiment of the invention adopts a simulation method: the model is trained on a computer and then applied to actual production. The specific algorithm is shown in Table 2:
Table 2: Power saving algorithm combining reinforcement learning
1. Initialize the system power P, the user experience F, the system performance parameter a, the reward weights α and β, the action set A = {a_1, a_2, ..., a_n}, the value function Q(s, a) (to arbitrary values), and the state s = P.
2. Select an action a according to the current state using the policy π(a|s) and perform it in the simulation.
3. Obtain the reward and the new state s′; update the system power P and the user experience F.
4. Compute the reward function R(s, a), update the value function Q(s, a), and update the state s = P.
5. Repeat steps 2 to 4 until the policy converges, then apply the trained model to the real system.
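As a sketch of the simulate-then-deploy idea (the occupancy pattern, power dynamics, and class below are entirely invented; a real deployment would calibrate them against monitored data), the computer-side simulation might look like:

```python
import random

class ClassroomSim:
    """Toy simulated classroom used for offline training, standing in for the
    real monitored environment of Table 2. All dynamics are assumptions."""

    def __init__(self, base_power=100.0):
        self.base_power = base_power
        self.power = base_power

    def reset(self):
        self.power = self.base_power
        return (self.power, 0)  # initial state s = (P, feeling)

    def step(self, action):
        occupied = random.random() < 0.6  # irregular activity times
        if action == "cpu_freq_down":
            self.power = max(self.power - 5.0, 10.0)
        elif action == "ac_temp_up":
            self.power = max(self.power - 10.0, 10.0)
        feeling = -1 if (occupied and self.power < 40.0) else 0
        return (self.power, feeling), feeling

# After training against ClassroomSim (e.g. with the train() loop above), the
# learned Q table drives the real appliances greedily via optimized_policy.
```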
Device embodiment
According to an embodiment of the present invention, there is provided a power saving computing device combined with reinforcement learning, and fig. 2 is a schematic diagram of the power saving computing device combined with reinforcement learning according to the embodiment of the present invention, as shown in fig. 2, the power saving computing device combined with reinforcement learning according to the embodiment of the present invention specifically includes:
a definition module 20, configured to define the states, actions, rewards, and policy of the reinforcement learning algorithm; the definition module 20 is specifically configured to:
define the system power, the number of people in the electricity-saving space, and the activity in the electricity-saving space as the state, wherein the system power is obtained by monitoring the current and voltage of the system;
define the adjustment of a system performance parameter as an action, wherein the system performance parameters specifically include: air-conditioning temperature, CPU frequency, and memory size;
define a reduction in energy consumption as a positive reward and a reduction in system performance as a negative reward, specifically:
the reward function is denoted R(s, a), where s represents the state and a represents the system performance parameter, and the reward is defined according to Equation 1:
R(s_t, a_t) = α·(P_{t−1} − P_t) − β·(a_{t−1} − a_t) + γ·F_t    (Equation 1)
where α and β are positive constants used to control the weights of the positive and negative rewards, P_t represents the system power at time t, γ is a weight controlling the influence of the users' perception, F_t is the teachers' and students' feeling at time t (F_t = 0 if the feeling is good, otherwise F_t = −1), and a_t represents the system performance parameter at time t; and
define the method of selecting an action according to the current state as the policy, specifically:
the policy is expressed as π(a|s), where a represents a system performance parameter and s represents a state, and the policy is defined according to Equation 2:
π(a|s) = 1/n for every a ∈ A    (Equation 2)
where n represents the number of actions and A represents the action set.
an optimization module 22, configured to optimize the appliance control strategy through the reinforcement learning algorithm; the optimization module 22 is specifically configured to:
set the reinforcement learning algorithm to the Q-learning algorithm, and initialize its value function Q(s, a), where Q(s, a) represents the value of selecting action a in the current state;
select an action a according to the ε-greedy policy π, perform action a, obtain the reward r and the new state s′, and update the value function Q(s, a) according to Equation 3:
Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)]    (Equation 3)
where the ε-greedy policy means selecting a random action with probability ε and the currently optimal action with probability 1 − ε; α is the learning rate, controlling the step size of each update; γ is the discount factor, measuring the importance of future rewards; and r represents the reward; and
update the state s to s′.
and a control module 24, configured to control the switching on or off of appliances through the optimized appliance control strategy to save electricity; the control module 24 is specifically configured to:
initialize the system power P, the user experience F, the system performance parameter a, the positive and negative reward weights α and β, and the action set A = {a_1, a_2, ..., a_n}, where n is the number of actions; initialize the value function Q(s, a) to an arbitrary value; and initialize the state s = P;
select an action a according to the current state using the policy π(a|s), perform action a, obtain the reward and the new state s′, update the system power P and the user experience F, compute the reward function R(s, a), update the value function Q(s, a), and update the state s = P.
The embodiment of the present invention is an embodiment of a device corresponding to the embodiment of the method, and specific operations of each module may be understood by referring to descriptions of the embodiment of the method, which are not repeated herein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A power saving computing method in combination with reinforcement learning, comprising:
defining the states, actions, rewards, and policy of the reinforcement learning algorithm;
optimizing an appliance control strategy through the reinforcement learning algorithm;
and controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity.
2. The method of claim 1, wherein defining the states, actions, rewards, and policy of the reinforcement learning algorithm specifically comprises:
defining the system power, the number of people in the electricity-saving space, and the activity in the electricity-saving space as the state, wherein the system power is obtained by monitoring the current and voltage of the system;
defining the adjustment of a system performance parameter as an action, wherein the system performance parameters specifically include: air-conditioning temperature, CPU frequency, and memory size;
defining a reduction in energy consumption as a positive reward and a reduction in system performance as a negative reward; and
defining the method of selecting an action according to the current state as the policy.
3. The method of claim 2, wherein defining a reduction in energy consumption as a positive reward and a reduction in system performance as a negative reward specifically comprises:
denoting the reward function as R(s, a), where s represents the state and a represents the system performance parameter, and defining the reward according to Equation 1:
R(s_t, a_t) = α·(P_{t−1} − P_t) − β·(a_{t−1} − a_t) + γ·F_t    (Equation 1)
where α and β are positive constants used to control the weights of the positive and negative rewards, P_t represents the system power at time t, γ is a weight controlling the influence of the users' perception, F_t is the teachers' and students' feeling at time t (F_t = 0 if the feeling is good, otherwise F_t = −1), and a_t represents the system performance parameter at time t.
4. The method of claim 2, wherein defining the method of selecting an action according to the current state as the policy specifically comprises:
expressing the policy as π(a|s), where a represents a system performance parameter and s represents a state, and defining the policy according to Equation 2:
π(a|s) = 1/n for every a ∈ A    (Equation 2)
where n represents the number of actions and A represents the action set.
5. The method of claim 4, wherein optimizing the appliance control strategy through the reinforcement learning algorithm specifically comprises:
setting the reinforcement learning algorithm to the Q-learning algorithm, and initializing its value function Q(s, a), where Q(s, a) represents the value of selecting action a in the current state;
selecting an action a according to the ε-greedy policy π, performing action a, obtaining the reward r and the new state s′, and updating the value function Q(s, a) according to Equation 3:
Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)]    (Equation 3)
where the ε-greedy policy means selecting a random action with probability ε and the currently optimal action with probability 1 − ε; α is the learning rate, controlling the step size of each update; γ is the discount factor, measuring the importance of future rewards; and r represents the reward; and
updating the state s to s′.
6. The method of claim 5, wherein controlling the switching on or off of appliances through the optimized appliance control strategy comprises:
initializing the system power P, the user experience F, the system performance parameter a, the positive and negative reward weights α and β, and the action set A = {a_1, a_2, ..., a_n}, where n is the number of actions; initializing the value function Q(s, a) to an arbitrary value; and initializing the state s = P;
selecting an action a according to the current state using the policy π(a|s), performing action a, obtaining the reward and the new state s′, updating the system power P and the user experience F, computing the reward function R(s, a), updating the value function Q(s, a), and updating the state s = P.
7. A power saving computing device combined with reinforcement learning, comprising:
a definition module, used for defining the states, actions, rewards, and policy of the reinforcement learning algorithm;
an optimization module, used for optimizing the appliance control strategy through a reinforcement learning algorithm;
and a control module, used for controlling the switching on or off of appliances through the optimized appliance control strategy to save electricity.
8. The apparatus of claim 7, wherein the definition module is specifically configured to:
define the system power, the number of people in the electricity-saving space, and the activity in the electricity-saving space as the state, wherein the system power is obtained by monitoring the current and voltage of the system;
define the adjustment of a system performance parameter as an action, wherein the system performance parameters specifically include: air-conditioning temperature, CPU frequency, and memory size;
define a reduction in energy consumption as a positive reward and a reduction in system performance as a negative reward, specifically:
the reward function is denoted R(s, a), where s represents the state and a represents the system performance parameter, and the reward is defined according to Equation 1:
R(s_t, a_t) = α·(P_{t−1} − P_t) − β·(a_{t−1} − a_t) + γ·F_t    (Equation 1)
where α and β are positive constants used to control the weights of the positive and negative rewards, P_t represents the system power at time t, γ is a weight controlling the influence of the users' perception, F_t is the teachers' and students' feeling at time t (F_t = 0 if the feeling is good, otherwise F_t = −1), and a_t represents the system performance parameter at time t; and
define the method of selecting an action according to the current state as the policy, specifically:
the policy is expressed as π(a|s), where a represents a system performance parameter and s represents a state, and the policy is defined according to Equation 2:
π(a|s) = 1/n for every a ∈ A    (Equation 2)
where n represents the number of actions and A represents the action set.
9. The apparatus of claim 8, wherein the optimization module is specifically configured to:
set the reinforcement learning algorithm to the Q-learning algorithm, and initialize its value function Q(s, a), where Q(s, a) represents the value of selecting action a in the current state;
select an action a according to the ε-greedy policy π, perform action a, obtain the reward r and the new state s′, and update the value function Q(s, a) according to Equation 3:
Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)]    (Equation 3)
where the ε-greedy policy means selecting a random action with probability ε and the currently optimal action with probability 1 − ε; α is the learning rate, controlling the step size of each update; γ is the discount factor, measuring the importance of future rewards; and r represents the reward; and
update the state s to s′.
10. The apparatus of claim 9, wherein the control module is specifically configured to:
initialize the system power P, the user experience F, the system performance parameter a, the positive and negative reward weights α and β, and the action set A = {a_1, a_2, ..., a_n}, where n is the number of actions; initialize the value function Q(s, a) to an arbitrary value; and initialize the state s = P;
select an action a according to the current state using the policy π(a|s), perform action a, obtain the reward and the new state s′, update the system power P and the user experience F, compute the reward function R(s, a), update the value function Q(s, a), and update the state s = P.
CN202311143879.0A 2023-09-05 2023-09-05 Power saving calculation method and device combined with reinforcement learning Pending CN117151928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311143879.0A CN117151928A (en) 2023-09-05 2023-09-05 Power saving calculation method and device combined with reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311143879.0A CN117151928A (en) 2023-09-05 2023-09-05 Power saving calculation method and device combined with reinforcement learning

Publications (1)

Publication Number Publication Date
CN117151928A true CN117151928A (en) 2023-12-01

Family

ID=88911628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311143879.0A Pending CN117151928A (en) 2023-09-05 2023-09-05 Power saving calculation method and device combined with reinforcement learning

Country Status (1)

Country Link
CN (1) CN117151928A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200010982A (en) * 2018-06-25 2020-01-31 군산대학교산학협력단 Method and apparatus of generating control parameter based on reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN114139778A (en) * 2021-11-15 2022-03-04 北京华能新锐控制技术有限公司 Wind turbine generator power prediction modeling method and device
CN114218867A (en) * 2021-12-20 2022-03-22 暨南大学 Special equipment flow control method and system based on entropy optimization safety reinforcement learning
CN114370698A (en) * 2022-03-22 2022-04-19 青岛理工大学 Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN116523327A (en) * 2023-02-28 2023-08-01 福建亿榕信息技术有限公司 Method and equipment for intelligently generating operation strategy of power distribution network based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘犇: "加强学习的实现及其在多主体系统中的应用" (Implementation of reinforcement learning and its application in multi-agent systems), 北京印刷学院学报 (Journal of Beijing Institute of Graphic Communication), no. 01, 30 March 2000 (2000-03-30), pages 22-30 *

Similar Documents

Publication Publication Date Title
Yang et al. Reinforcement learning for optimal control of low exergy buildings
Barrett et al. Autonomous hvac control, a reinforcement learning approach
Leslie et al. Best-response dynamics in zero-sum stochastic games
Dounis et al. Intelligent control system for reconciliation of the energy savings with comfort in buildings using soft computing techniques
CN110826723A (en) Interactive reinforcement learning method combining TAMER framework and facial expression feedback
Qiao et al. An incremental neuronal-activity-based RBF neural network for nonlinear system modeling
Haghnevis et al. A modeling framework for engineered complex adaptive systems
Klein et al. Towards optimization of building energy and occupant comfort using multi-agent simulation
CN110134165A (en) A kind of intensified learning method and system for environmental monitoring and control
Ponce et al. Framework for communicating with consumers using an expectation interface in smart thermostats
Karjalainen et al. Integrated control and user interfaces for a space
Nedungadi et al. Incorporating forgetting in the personalized, clustered, bayesian knowledge tracing (pc-bkt) model
CN111442476A (en) Method for realizing energy-saving temperature control of data center by using deep migration learning
CN117151928A (en) Power saving calculation method and device combined with reinforcement learning
Dinh et al. MILP-based imitation learning for HVAC control
Grubaugh et al. Harnessing AI to power constructivist learning: An evolution in educational methodologies
CN110323758A (en) Power system discrete reactive power optimization method based on serial Q learning algorithm
von Grabe A preliminary cognitive model for the prediction of energy-relevant human interaction with buildings
Wang et al. Energy optimization for HVAC systems in multi-VAV open offices: A deep reinforcement learning approach
Lee et al. On-policy learning-based deep reinforcement learning assessment for building control efficiency and stability
Papadimitriou Adaptive and Intelligent MOOCs: How They Contribute to the Improvement of the MOOCs' Effectiveness
Kadamala et al. Enhancing HVAC control systems through transfer learning with deep reinforcement learning agents
Marantos et al. Towards Plug&Play smart thermostats for building’s heating/cooling control
Kamsa et al. Learning time planning in a distance learning system using intelligent agents
Erdemir Using web-based intelligent tutoring systems in teaching physics subjects at undergraduate level

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination