CN117369263A - Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism - Google Patents

Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism Download PDF

Info

Publication number
CN117369263A
CN117369263A
Authority
CN
China
Prior art keywords
network, state, combustion, control method, intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311375874.0A
Other languages
Chinese (zh)
Other versions
CN117369263B (en)
Inventor
梁合兰
国宏伟
闫炳基
杨韬然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority claimed from CN202311375874.0A
Publication of CN117369263A
Application granted; publication of CN117369263B
Legal status: Active

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 — Adaptive control systems as above, electric
    • G05B13/04 — Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B13/042 — Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention belongs to the technical field of hot blast stove combustion control and relates in particular to an intelligent combustion control method for a hot blast stove based on reinforcement learning and an attention mechanism, comprising the following steps. S1: acquiring historical combustion data of the stove and selecting continuous data from it with a moving time window as combustion state data; S2: obtaining a trained Attention-MLP model based on the combustion state data; S3: acquiring real-time combustion data and using the Attention-MLP model to control the adjustment direction of the gas valve position of the hot blast stove according to that data. The invention solves the problem that existing hot blast stove combustion control methods cannot satisfy both the control precision and real-time requirements; it achieves higher accuracy and can meet the intelligent combustion optimization control requirements of the blast furnace hot blast stove.

Description

Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism
Technical Field
The invention relates to the technical field of combustion control of hot blast stoves, in particular to an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanisms.
Background
Blast furnace smelting is a key link in the steel-making process, and its production cost accounts for more than 60% of the cost of steel products. The hot blast stove is a vital heat exchange device in the iron-making process; its function is to generate high-temperature hot blast and deliver it to the blast furnace to meet the heat requirements of the iron ore reduction process. Improving the combustion control precision of the hot blast stove is an important means of raising the blast temperature, lowering fuel consumption, and reducing emissions, and is also an effective means of prolonging the service life of the stove and reducing the labor intensity of workers.
The goal of hot blast stove combustion control is to achieve optimal control of the dome temperature and of the flue gas temperature rise rate during the combustion period by dynamically adjusting the air valve position and the gas valve position according to the combustion state. Combustion control methods for blast furnace hot blast stoves fall into three main categories: traditional control methods, mathematical model methods, and intelligent control methods. Traditional control methods suffer from control lag and excessively strong control actions, while mathematical model methods require a large investment because many thermal parameters must be monitored. In contrast, intelligent control methods can use intelligent knowledge to design the controller and have advantages such as a wide operating range and broad applicability, making them the mainstream direction of current development. However, in current intelligent control methods, rule induction and extraction remain difficult: in actual operation the rules are usually set manually and suffer from drawbacks such as a single form of expression and an overly simple structure.
Disclosure of Invention
Therefore, the technical problem the invention aims to solve is that prior-art methods for optimized hot blast stove combustion cannot satisfy the control precision and real-time requirements at the same time.
In order to solve the technical problems, the invention provides an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism, which comprises the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the gas valve position adjusting direction of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is a plurality of pieces, and any piece of combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
In one embodiment of the invention, storing the state transition records obtained from the agent's interaction with the environment in the experience pool comprises: the agent selects an action from the action space according to the output state, executes it, obtains the corresponding feedback reward, and transitions to a new state; the combustion state data, the action, the feedback reward, and the new state are stored in the experience pool as a state transition record. The action space is defined as the set of the 3 adjustment directions of the gas valve position, namely: down, up, and unchanged.
In one embodiment of the present invention, the specific steps by which the attention mechanism module processes the combustion state data into the output state are:
S211: taking the combustion state data $s_t \in \mathbb{R}^{\omega \times m}$ as input, linear features $Z_t$ are obtained by a linear mapping:
$Z_t = \{z_1, z_2, \ldots, z_\omega\} = W_z s_t + b_z$
where $z_i \in \mathbb{R}^d$ is the linear feature at time $i$, $i = 1, \ldots, \omega$; $\omega$ is the time window length; $m$ is the length of each piece of furnace data; $W_z \in \mathbb{R}^{d \times m}$ is a neural network parameter; $b_z \in \mathbb{R}^d$ is a bias parameter; and $d$ is the dimension of the embedded features;
S212: taking the linear features $Z_t$ as input, the hidden states $H_t$ with positional encoding added are computed:
$H_t = \{h_1, h_2, \ldots, h_\omega\} = Z_t + C$
where $h_i \in \mathbb{R}^d$ is the hidden state at time $i$, $i = 1, \ldots, \omega$, and $C$ is a neural network parameter of the same size as $Z_t$;
S213: $H_t$ is mapped by linear projections to keys $K_t$ and values $V_t$:
$K_t = \{k_1, k_2, \ldots, k_\omega\} = W_k H_t + b_k$
$V_t = \{v_1, v_2, \ldots, v_\omega\} = W_v H_t + b_v$
where $k_i, v_i \in \mathbb{R}^d$ are the key and value vectors at time $i$, $i = 1, \ldots, \omega$; $W_k, W_v \in \mathbb{R}^{d \times d}$ are weight parameters of the embedding network; and $b_k, b_v \in \mathbb{R}^d$ are bias parameters;
S214: a query vector $q \in \mathbb{R}^d$ is input and its relevance $A_t$ to the keys is computed:
$A_t = (a_1, a_2, \ldots, a_\omega) = \mathrm{softmax}(q^{\top} K_t)$
where $a_i$ is the attention value between the query vector and the key at time $i$, $i = 1, \ldots, \omega$; $q$ is a randomly initialized learnable parameter; and softmax is the normalized exponential function;
S215: the relevance vector and the value matrix are combined to obtain the output state $e_t$ for the combustion state data $s_t$ at time $t$:
$e_t = V_t A_t^{\top} = \sum_{i=1}^{\omega} a_i v_i$
In one embodiment of the invention, training the Q network with the experience replay mechanism comprises:
S21: initializing the experience pool $D$, the parameters of the Q network and the target network, and the sampling priority of each state transition record, with the state transition record at time $i$ denoted $\tau_i$;
S22: computing the sampling probability of each state transition record from the sampling priorities:
$P(\tau_i) = \dfrac{p_i^{\beta}}{\sum_k p_k^{\beta}}$
where $\beta$ is a state transition sampling constant, $\tau_i$ is the state transition record at time $i$, and $p_i$ is the sampling priority of $\tau_i$;
S23: sampling state transition records according to the sampling probability, setting a loss function from the Q values corresponding to the output actions of the $B$ sampled state transition records, and updating the parameters of the Q network by gradient descent so as to minimize the loss function value.
In one embodiment of the invention, the loss function is expressed as:
$L(\theta) = L_{TD}^{2} + \lambda_1 L_E + \lambda_2 L_{L2}$
where $L_{TD}$ is the temporal difference error, $L_E$ is the large-margin classification loss, $L_{L2}$ is the L2 regularization loss, and $\lambda_1$, $\lambda_2$ are the weights of the corresponding loss terms.
In one embodiment of the present invention, the expression of the temporal difference error is:
$L_{TD} = Y_t - Q(s_t, a_t; \theta)$
where $Y_t$ is the cumulative expected return estimate at time $t$, and $Q(s_t, a_t; \theta)$ is the cumulative expected return estimate output by the Q network for input state $s_t$ and action $a_t$; and
$Y_t = r_{t+1} + \gamma\, Q\big(s_{t+1}, a_{t+1}^{\max}; \theta^{-}\big), \quad a_{t+1}^{\max} = \arg\max_{a} Q(s_{t+1}, a; \theta)$
where $r_{t+1}$ is the reward at time $t+1$; $\gamma \in (0, 1)$ is the discount coefficient; $Q(s_{t+1}, a_{t+1}^{\max}; \theta^{-})$ is the cumulative expected return estimate output by the target network with parameters $\theta^{-}$ for input $s_{t+1}$ and action $a_{t+1}^{\max}$; and $a_{t+1}^{\max}$ is the action whose cumulative expected return estimate is maximal when $s_{t+1}$ is input to the Q network.
In one embodiment of the present invention, the expression of the large-margin classification loss is:
$L_E = \max_{a \in A}\big[\, Q(s_t, a; \theta) + l(a_t^{E}, a) \,\big] - Q(s_t, a_t^{E}; \theta)$
where $A$ is the action space; $s_t$ is the combustion state of the hot blast stove at time $t$; $a$ is any action in the action space $A$; $Q(s_t, a; \theta)$ is the output of the Q network with parameters $\theta$ for input state $s_t$ and action $a$; $a_t^{E}$ is the expert action at time $t$, i.e. the valve position adjustment action recorded for state $s_t$ in the state transition record; and $l(a_t^{E}, a)$ is a margin function that penalizes model actions inconsistent with the expert action in the record, with $l(a_t^{E}, a) = 0$ when $a = a_t^{E}$ and $l(a_t^{E}, a) = b$ otherwise, $b$ being a hyperparameter greater than zero.
In one embodiment of the present invention, the expression of the L2 regularization loss is:
$L_{L2} = \lVert \theta \rVert_2^{2}$
where $\theta$ denotes the parameters of the Q network.
In one embodiment of the present invention, the Q network synchronizes its updated parameters to the target network as follows: when the number of sampling steps reaches the preset target network parameter update frequency, the parameters of the Q network are copied to the target network.
In one embodiment of the present invention, the gradient descent method is as follows: the importance sampling weight of each state transition record is computed from its sampling probability; the gradient value is updated using the average importance-weighted gradient of the $B$ sampled state transition records; and the sampling priority of each state transition record is updated according to its temporal difference error.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the intelligent combustion control method for the hot blast stove based on the reinforcement learning and attention mechanism combines the reinforcement learning and supervision learning ideas, utilizes the burning history data set to guide the intelligent body to learn offline, particularly aims at the characteristics of history data, and uses the attention mechanism to better learn the state change rule of the hot blast stove contained in the history data, thereby outputting better combustion control actions of the hot blast stove, and solving the problem that the combustion of the hot blast stove cannot meet the requirements of control precision and real-time performance due to the characteristics of nonlinearity, continuity, hysteresis and the like.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
FIG. 1 is a flow chart of an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism;
FIG. 2 is a generation map of the combustion state data in the embodiment;
FIG. 3 is a block diagram of the Q network or the target network in an embodiment;
FIG. 4 is a flow chart of the processing of state data into output states by the attention-based embedded network of FIG. 3;
FIGS. 5(a)-(b) show the accuracy of the Attention-MLP model and the MLP model on the training set and the test set, respectively;
FIG. 6 is a graph of a comparative change in loss function values for an Attention-MLP model and an MLP model;
FIG. 7 is a graph showing Q-value change curves of the Attention-MLP model and the MLP model.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Referring to FIG. 1, the invention provides an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism, which comprises the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the gas valve position adjusting direction of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is multiple, and any one combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
Specifically, in S1 a moving time window strategy is adopted, and the continuous data within a window of size ω = 10 are selected to construct the state, i.e. s_t = X_{t−ω+1:t}, where x_t denotes the values of variables such as the gas pressure and the air pressure at time t and m is the length of each record. For the states before time ω (s_1 to s_{ω−1}), the missing earlier portion of the data is padded with 0. The generation of the state data is illustrated in FIG. 2: the upper part shows the collected furnace history data, and the lower part shows the state data of the reinforcement learning model, where the state data s_t at time t consists of the furnace history X_{t−ω+1:t} from time t−ω+1 to time t. The state data at the other times are obtained in the same way.
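As an illustrative sketch of this windowing scheme (not the patent's implementation; the array shapes and the zero-padding rule follow the description above, with m = 5 monitored variables assumed):

```python
import numpy as np

def build_states(history: np.ndarray, omega: int = 10) -> np.ndarray:
    """Build one omega-by-m state per time step from furnace history (T-by-m).

    States before time omega are zero-padded on the left, matching the
    rule that the missing earlier part of the data is filled with 0.
    """
    T, m = history.shape
    padded = np.vstack([np.zeros((omega - 1, m)), history])
    # s_t = X_{t-omega+1 : t}  (the omega most recent rows up to time t)
    return np.stack([padded[t:t + omega] for t in range(T)])

# Example: 12 time steps of 5 variables
history = np.arange(60, dtype=float).reshape(12, 5)
states = build_states(history)
print(states.shape)          # (12, 10, 5)
print(states[0, :9].sum())   # 0.0 -- the first state is mostly zero padding
```

The last state contains the final ten rows of the history unchanged, so each s_t is exactly the window X_{t−ω+1:t}.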
As shown in FIG. 3, the Q network and the target network each consist of an embedding network f and a fully connected network g. The attention-based embedding network f maps the two-dimensional time-series state data s_t to the output state e_t, which is then fed as input to the fully connected network g to obtain the Q value estimates for the 3 gas valve position actions.
Specifically, storing the state transition records obtained from the agent's interaction with the environment in the experience pool comprises: the agent selects an action from the action space according to the output state, executes it, obtains the corresponding feedback reward, and transitions to a new state; the combustion state data, the action, the feedback reward, and the new state are stored in the experience pool as a state transition record. The action space is defined as the set of the 3 adjustment directions of the gas valve position, namely: down, up, and unchanged.
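The record structure and the 3-action space described above can be sketched as follows (all names used here, such as `Transition` and `ACTIONS`, are illustrative assumptions, not terms from the patent):

```python
from collections import namedtuple

# Action space A: three gas-valve-position adjustment directions
ACTIONS = ("down", "up", "unchanged")

# One state transition record: (s_t, a_t, r_{t+1}, s_{t+1})
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

pool = []  # experience pool D
pool.append(Transition(state="s0", action=ACTIONS.index("up"),
                       reward=1.0, next_state="s1"))
print(len(pool), ACTIONS[pool[0].action])  # 1 up
```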
As shown in FIG. 4, the specific steps by which the attention mechanism module processes the combustion state data into the output state are:
S211: taking the combustion state data $s_t \in \mathbb{R}^{\omega \times m}$ as input, linear features $Z_t$ are obtained by a linear mapping:
$Z_t = \{z_1, z_2, \ldots, z_\omega\} = W_z s_t + b_z$
where $z_i \in \mathbb{R}^d$ is the linear feature at time $i$, $i = 1, \ldots, \omega$; $m$ is the length of each piece of combustion state data, with $m = 5$; $W_z \in \mathbb{R}^{d \times m}$ is a neural network parameter; $b_z \in \mathbb{R}^d$ is a bias parameter; and $d$ is the dimension of the embedded features, with $d = 64$;
S212: taking the linear features $Z_t$ as input, the hidden states $H_t$ with positional encoding added are computed:
$H_t = \{h_1, h_2, \ldots, h_\omega\} = Z_t + C$
where $h_i \in \mathbb{R}^d$ is the hidden state at time $i$, $i = 1, \ldots, \omega$, and $C$ is a neural network parameter of the same size as $Z_t$;
S213: $H_t$ is mapped by linear projections to keys $K_t$ and values $V_t$:
$K_t = \{k_1, k_2, \ldots, k_\omega\} = W_k H_t + b_k$
$V_t = \{v_1, v_2, \ldots, v_\omega\} = W_v H_t + b_v$
where $k_i, v_i \in \mathbb{R}^d$ are the key and value vectors at time $i$, $i = 1, \ldots, \omega$; $W_k, W_v \in \mathbb{R}^{d \times d}$ are weight parameters of the embedding network; and $b_k, b_v \in \mathbb{R}^d$ are bias parameters;
S214: a query vector $q \in \mathbb{R}^d$ is input and its relevance $A_t$ to the keys is computed:
$A_t = (a_1, a_2, \ldots, a_\omega) = \mathrm{softmax}(q^{\top} K_t)$
where $a_i$ is the attention value between the query vector and the key at time $i$, $i = 1, \ldots, \omega$; $q$ is a randomly initialized learnable parameter; and softmax is the normalized exponential function;
S215: the relevance vector and the value matrix are combined to obtain the output state $e_t$ for the combustion state data $s_t$ at time $t$:
$e_t = V_t A_t^{\top} = \sum_{i=1}^{\omega} a_i v_i$
The Q value estimates are obtained as follows: the output state $e_t$ is input to the fully connected network g, whose output layer has 3 nodes, giving the Q value estimates for the three actions of increasing the gas valve position, decreasing it, and keeping it unchanged, namely:
$Q = \{Q_1, Q_2, Q_3\} = \mathrm{MLP}(e_t)$
In this embodiment, training the Q network with the experience replay mechanism comprises:
S21: initializing the experience pool $D$, the parameters of the Q network and the target network, and the sampling priority of each state transition record, with the state transition record at time $i$ denoted $\tau_i$;
S22: computing the sampling probability of each state transition record from the sampling priorities:
$P(\tau_i) = \dfrac{p_i^{\beta}}{\sum_k p_k^{\beta}}$
where $\beta$ is a state transition sampling constant, with $\beta = 0.4$ in this embodiment; $\tau_i$ is the state transition record at time $i$; and $p_i$ is the sampling priority of $\tau_i$;
S23: sampling state transition records according to the sampling probability, setting a loss function from the Q values corresponding to the output actions of the $B$ sampled state transition records, and updating the parameters of the Q network by gradient descent so as to minimize the loss function value.
The expression of the loss function is:
$L(\theta) = L_{TD}^{2} + \lambda_1 L_E + \lambda_2 L_{L2}$
where $L_{TD}$ is the temporal difference error, $L_E$ is the large-margin classification loss, $L_{L2}$ is the L2 regularization loss, and $\lambda_1$, $\lambda_2$ are the weights of the corresponding loss terms.
The expression of the temporal difference error is:
$L_{TD} = Y_t - Q(s_t, a_t; \theta)$
where $Y_t$ is the cumulative expected return estimate at time $t$, and $Q(s_t, a_t; \theta)$ is the cumulative expected return estimate output by the Q network for input state $s_t$ and action $a_t$; and
$Y_t = r_{t+1} + \gamma\, Q\big(s_{t+1}, a_{t+1}^{\max}; \theta^{-}\big), \quad a_{t+1}^{\max} = \arg\max_{a} Q(s_{t+1}, a; \theta)$
where $r_{t+1}$ is the reward at time $t+1$; $\gamma \in (0, 1)$ is the discount coefficient; $Q(s_{t+1}, a_{t+1}^{\max}; \theta^{-})$ is the cumulative expected return estimate output by the target network with parameters $\theta^{-}$ for input $s_{t+1}$ and action $a_{t+1}^{\max}$; and $a_{t+1}^{\max}$ is the action whose cumulative expected return estimate is maximal when $s_{t+1}$ is input to the Q network.
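A toy numeric sketch of this target computation, in which the greedy action is chosen by the Q network and evaluated by the target network (all Q values and the reward are invented for illustration):

```python
import numpy as np

gamma = 0.9    # discount coefficient in (0, 1)
r_next = 1.0   # reward r_{t+1}

# Toy action values for s_{t+1} from the Q network (theta) and target network (theta-)
q_online = np.array([0.2, 0.8, 0.5])   # Q(s_{t+1}, . ; theta)
q_target = np.array([0.3, 0.6, 0.9])   # Q(s_{t+1}, . ; theta-)

a_max = int(np.argmax(q_online))        # action selected by the Q network
Y_t = r_next + gamma * q_target[a_max]  # value taken from the target network
print(a_max, round(Y_t, 3))             # 1 1.54
```

Note that the target network's own maximum (action 2 here) is not used; decoupling selection from evaluation is what reduces the overestimation bias of a single estimator.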
The expression of the large-margin classification loss is:
$L_E = \max_{a \in A}\big[\, Q(s_t, a; \theta) + l(a_t^{E}, a) \,\big] - Q(s_t, a_t^{E}; \theta)$
where $A$ is the action space; $s_t$ is the combustion state of the hot blast stove at time $t$; $a$ is any action in the action space $A$; $Q(s_t, a; \theta)$ is the output of the Q network with parameters $\theta$ for input state $s_t$ and action $a$; $a_t^{E}$ is the expert action at time $t$, i.e. the valve position adjustment action recorded for state $s_t$ in the state transition record; and $l(a_t^{E}, a)$ is a margin function that penalizes model actions inconsistent with the expert action in the record, with $l(a_t^{E}, a) = 0$ when $a = a_t^{E}$ and $l(a_t^{E}, a) = b$ otherwise, $b$ being a hyperparameter greater than zero.
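The large-margin classification loss can be illustrated numerically (the toy Q values and the margin b are assumptions; the margin function gives 0 for the expert action and b for every other action):

```python
import numpy as np

b = 0.8                                  # margin hyperparameter, b > 0
q_values = np.array([0.2, 0.5, 0.4])     # Q(s_t, a; theta) for the 3 actions
a_expert = 0                             # expert action a_t^E from the record

margin = np.full(3, b)
margin[a_expert] = 0.0                   # l(a_E, a) = 0 iff a == a_E
L_E = np.max(q_values + margin) - q_values[a_expert]
print(round(L_E, 3))                     # 1.1
```

The loss is zero only when the expert action's Q value exceeds every other action's Q value by at least the margin b, which pushes the network toward reproducing the recorded expert adjustments.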
The expression of the L2 regularization loss is:
$L_{L2} = \lVert \theta \rVert_2^{2}$
where $\theta$ denotes the parameters of the Q network.
In this embodiment, the Q network synchronizes its updated parameters to the target network as follows: when the number of sampling steps reaches the preset target network parameter update frequency, the parameters of the Q network are copied to the target network.
The parameters of the Q network are updated by gradient descent as follows.
The importance sampling weight of each state transition record is computed from its sampling probability:
$w_i = \big(N \cdot P(\tau_i)\big)^{-\eta} \,\big/\, \max_j \big(N \cdot P(\tau_j)\big)^{-\eta}$
where $N$ is the number of records in the experience pool and $\eta$ is the importance sampling constant.
The gradient value is then updated using the average importance-weighted gradient of the $B$ sampled state transition records: $\theta = \theta - \lambda \Delta_\theta L$, with $\Delta_\theta L = \frac{1}{B}\sum_{i=1}^{B} w_i\, \Delta_\theta L(\tau_i)$, where $\Delta_\theta L(\tau_i)$ is the gradient of the loss function for state transition record $\tau_i$ and $\lambda$ is the learning rate. Finally, the sampling priority of each sampled record is updated according to its temporal difference error:
$p_i = \lvert L_{TD}(\tau_i) \rvert + \epsilon$
where $\epsilon$ is a small positive constant.
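The priority-based sampling of steps S22-S23 and the priority refresh can be sketched as follows; β = 0.4 follows this embodiment, while the importance-sampling exponent η, the constant ε, the batch size, and the toy priorities are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, eta, eps = 0.4, 0.6, 1e-6
B = 4                                     # minibatch size

priorities = np.array([2.0, 0.5, 1.0, 3.0, 0.1, 1.5])
N = len(priorities)

# S22: sampling probability P(tau_i) = p_i^beta / sum_k p_k^beta
probs = priorities ** beta
probs /= probs.sum()

# S23: sample B records and compute max-normalized importance weights
idx = rng.choice(N, size=B, p=probs)
w = (1.0 / (N * probs[idx])) ** eta
w /= w.max()                              # normalize by the largest weight

# After the gradient step, refresh priorities from the new TD errors
td_errors = rng.normal(size=B)            # stand-in for L_TD of each sampled record
priorities[idx] = np.abs(td_errors) + eps

print(np.isclose(probs.sum(), 1.0), w.max() <= 1.0)  # True True
```

High-priority records are sampled more often, and the importance weights (all at most 1 after normalization) scale down their gradient contributions to correct the sampling bias.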
in order to evaluate the performance of the Attention-MLP model, the proposed model is realized by adopting software such as Python 3.11.3, JAX 0.4.8 and the like, a GNU/Linux operating system is installed, a computer with an Intel (R) Core (TM) i7-10700 CPU and a 16GB memory is used for completing calculation, and Wandb is used for recording experimental data. In terms of data, a set of stove combustion control data is obtained for training and testing. By dividing the data according to the burning period and eliminating the abnormal period, 27 burning periods are obtained, and the number of the burning periods is 7: the ratio of 3 randomly divides the training and testing sets (19 cycles for training phase and the remaining 8 for testing phase). Then 47739 state transitions are obtained in total according to the generation rules of the state data.
The parameters $\theta$ of the Q network and $\theta^{-}$ of the target network were initialized with four different random number seeds, and four experiments were performed for each network structure. FIGS. 5 to 7 show the variation of the statistical indices during training; the Attention-MLP model is the attention-based reinforcement learning model proposed by the present invention, and MLP is a reinforcement learning model using only a fully connected network.
As can be seen from the accuracy curves in FIGS. 5(a)-(b), the accuracy of both models increases as training proceeds, and the training set and test set accuracies are close at the same time step, indicating that the Attention-MLP model neither overfits nor underfits. FIG. 6 shows the change of the average loss value during training; the overall loss decreases continuously, indicating that the models can effectively learn the regularities in the data. FIG. 7 shows the average Q value during training: the average Q values of all models eventually converge to around 8, meaning all models reach similar Q value estimates. Compared with the MLP model, the Attention-MLP model can attend to the important parts of the input state sequence, and the added position embedding lets the model take sequence order into account, giving it the best solution quality.
According to the above embodiment, the intelligent optimized combustion control method for the blast furnace hot blast stove based on deep reinforcement learning requires neither monitoring instruments such as flow meters and residual oxygen meters nor an explicitly supplied correspondence between valve position and such detection equipment; instead, it autonomously learns the implicit relation between the stove combustion state and the valve position adjustment from historical stove operation records. Furthermore, considering the difficulty of representing the state features, an attention-based deep embedding network combined with a fully connected network is proposed. Experimental results show that the accuracy of the model reaches 86%, which can meet the intelligent combustion optimization control requirements of the blast furnace hot blast stove.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom are contemplated as falling within the scope of the present invention.

Claims (10)

1. An intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism is characterized by comprising the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the adjustment direction of a gas valve position of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is a plurality of pieces, and any piece of combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
2. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that storing a state transition record obtained by interaction of the agent with the environment in the experience pool comprises: after the agent selects an action from the action space according to the output state and executes it, the agent obtains the feedback reward corresponding to the action and transfers to a new state, and the combustion state data, the action, the feedback reward and the new state are stored in the experience pool as a state transition record; the action space is defined as the set of 3 adjustment directions of the gas valve position, the 3 adjustment directions being specifically: downward adjustment, upward adjustment and unchanged.
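The claim-2 experience pool and 3-action space can be sketched as below; all names here are illustrative assumptions, and the stored tuple mirrors the (state, action, reward, new state) record described in the claim:

```python
import random
from collections import deque

# the 3 gas-valve adjustment directions of claim 2 (down, unchanged, up)
ACTIONS = ("down", "hold", "up")

class ExperiencePool:
    def __init__(self, capacity):
        # deque with maxlen discards the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        assert action in ACTIONS
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform sampling shown here; claims 4 and 10 replace this with
        # priority-based sampling
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

pool = ExperiencePool(capacity=10000)
# one synthetic record: gas pressure, air valve, gas valve, vault temperature
pool.store([1.2, 0.4, 0.7, 1250.0], "up", 0.5, [1.2, 0.4, 0.8, 1255.0])
```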
3. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that the specific steps of forming the output state after processing by the attention mechanism module are as follows:
S211: with the combustion state data s_t as input, obtain the linear features Z_t ∈ R^(d×ω) by linear mapping, as follows:
Z_t = {z_1, z_2, ..., z_ω} = W_z s_t + b_z
where z_i ∈ R^d represents the linear feature at time i, i = 1, ..., ω; ω represents the time window length; m is the length of each piece of furnace data; W_z ∈ R^(d×m) is a neural network parameter; b_z ∈ R^d is a bias parameter; d is the dimension of the embedded feature;
S212: with the linear features Z_t as input, compute the hidden states H_t ∈ R^(d×ω) with position encoding added, as follows:
H_t = {h_1, h_2, ..., h_ω} = Z_t + C
where h_i ∈ R^d indicates the hidden state at time i, i = 1, ..., ω; C ∈ R^(d×ω) is a neural network parameter whose size is the same as that of Z_t;
S213: map H_t by linear mapping into the keys K_t and the values V_t, respectively, as follows:
K_t = {k_1, k_2, ..., k_ω} = W_k H_t + b_k
V_t = {v_1, v_2, ..., v_ω} = W_v H_t + b_v
where k_i ∈ R^d and v_i ∈ R^d respectively represent the key vector and the value vector at the i-th moment, i = 1, ..., ω; W_k, W_v ∈ R^(d×d) are weight parameters of the embedded network; b_k, b_v ∈ R^d are bias parameters;
S214: input the query vector q ∈ R^d and calculate the relevance A_t between the query vector and the keys, as follows:
A_t = (a_1, a_2, ..., a_ω) = softmax(q^T K_t)
where a_i is the attention value between the query vector and the key at the i-th moment, i = 1, ..., ω; q is a randomly initialized learnable parameter; softmax is the normalized exponential function;
S215: substitute the relevance vector and the value matrix into the following formula to obtain the output state o_t of the combustion state data s_t at time t:
o_t = Σ_{i=1}^{ω} a_i v_i
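Steps S211–S215 can be sketched numerically as below. This is an illustrative NumPy sketch, not the patent's implementation; the S215 readout is assumed to be the standard attention-weighted sum of value vectors, since the rendered formula is not legible in this extraction:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def attention_state(s_t, Wz, bz, C, Wk, bk, Wv, bv, q):
    # S211: linear mapping of the (m x omega) window, Z_t = Wz s_t + b_z
    Z = Wz @ s_t + bz[:, None]
    # S212: add the position encoding C (same shape as Z_t)
    H = Z + C
    # S213: map hidden states to keys and values
    K = Wk @ H + bk[:, None]
    V = Wv @ H + bv[:, None]
    # S214: attention weights between the query q and each key column
    A = softmax(q @ K)            # length-omega weight vector
    # S215 (assumed readout): weighted sum of value vectors
    return V @ A                  # d-dimensional output state

rng = np.random.default_rng(0)
m, omega, d = 4, 6, 8             # 4 fields per record, window length 6
s_t = rng.normal(size=(m, omega))
o_t = attention_state(
    s_t,
    rng.normal(size=(d, m)), rng.normal(size=d),   # Wz, bz
    rng.normal(size=(d, omega)),                   # C
    rng.normal(size=(d, d)), rng.normal(size=d),   # Wk, bk
    rng.normal(size=(d, d)), rng.normal(size=d),   # Wv, bv
    rng.normal(size=d),                            # q
)
```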
4. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that training the Q network by the experience playback mechanism comprises the following steps:
S21: initialize the experience pool D, the parameters of the Q network and the target network, and the sampling priority of each state transition record, denoting the state transition record at moment i as τ_i;
S22: according to the sampling priorities, calculate the sampling probability of each state transition record:
P(τ_i) = (p_{τ_i})^β / Σ_k (p_{τ_k})^β
where β is a state transition sampling constant, τ_i is the state transition record at moment i, and p_{τ_i} is the sampling priority of the record τ_i;
S23: sample state transition records according to the sampling probabilities, set the loss function according to the Q values corresponding to the output actions of the B sampled state transition records, and update the parameters of the Q network by gradient descent so as to minimize the loss function value.
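The priority-to-probability normalization of step S22 can be sketched as follows; the function name is an assumption for illustration:

```python
import numpy as np

def sampling_probabilities(priorities, beta):
    # P(tau_i) = p_i**beta / sum_k p_k**beta: records with higher priority
    # are drawn more often; beta is the sampling constant of step S22
    p = np.asarray(priorities, dtype=float) ** beta
    return p / p.sum()
```

A record with priority 4 is then sampled four times as often as one with priority 1 when beta = 1.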
5. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 4, characterized in that the expression of the loss function is:
L = L_TD + λ_1 L_E + λ_2 L_2
where L_TD is the temporal-difference error, L_E is the large-margin classification loss, L_2 is the L2 regularization loss, and λ_1, λ_2 are the weights of the corresponding loss terms.
6. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the temporal-difference error is:
L_TD = Y_t − Q(s_t, a_t; θ)
where Y_t represents the cumulative expected return estimate at time t, and Q(s_t, a_t; θ) represents the cumulative expected return estimate output by the Q network for the input state s_t; and
Y_t = r_{t+1} + γ Q(s_{t+1}, a*; θ^−)
where r_{t+1} is the reward at time t+1; γ is the discount coefficient; Q(s_{t+1}, a*; θ^−) is the cumulative expected return estimate output by the target network with parameters θ^− for the input s_{t+1} and the action a*; a* = argmax_{a∈A} Q(s_{t+1}, a; θ) is the action with the maximum cumulative expected return estimate output by the Q network for the input state s_{t+1}.
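The claim-6 target Y_t selects the next action with the online Q network and evaluates it with the target network (the double-DQN decoupling). A minimal sketch, with illustrative names:

```python
import numpy as np

def double_dqn_target(r_next, gamma, q_online_next, q_target_next):
    # a* = argmax_a Q(s_{t+1}, a; theta), chosen by the online Q network
    a_star = int(np.argmax(q_online_next))
    # ... but evaluated by the target network with parameters theta^-
    return r_next + gamma * q_target_next[a_star]
```

With online values [0.2, 0.8, 0.1] and target values [0.5, 0.3, 0.9], the online network picks action 1 but the target network supplies its value 0.3, not its own maximum 0.9, which is what damps overestimation.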
7. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the large-margin classification loss is:
L_E = max_{a∈A} [Q(s_t, a; θ) + l(a_t^E, a)] − Q(s_t, a_t^E; θ)
where A represents the action space; s_t represents the combustion state of the hot blast stove at time t; a is any action in the action space A; Q(s_t, a; θ) represents the output of the Q network with parameters θ for the input state s_t and the action a; a_t^E represents the expert action at time t, i.e. the valve position adjustment action corresponding to the state s_t in the state transition record; l(a_t^E, a) is the penalty function applied when the action output by the model is inconsistent with the expert action reflected in the record: l(a_t^E, a) = 0 if a = a_t^E, and l(a_t^E, a) = b otherwise, where b is a hyper-parameter greater than zero.
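The large-margin term of claim 7 can be sketched as below (illustrative names); it is zero only when the expert action's Q value exceeds every other action's Q value by at least the margin b:

```python
import numpy as np

def large_margin_loss(q_values, expert_action, b):
    # l(a_E, a) = 0 if a == a_E, else b  (the margin penalty of claim 7)
    margins = np.full(len(q_values), b)
    margins[expert_action] = 0.0
    # L_E = max_a [Q(s,a) + l(a_E,a)] - Q(s, a_E)
    return float(np.max(np.asarray(q_values) + margins) - q_values[expert_action])
```

For q_values = [1.0, 2.0, 0.5] and b = 0.8 the loss is 0 when the expert took action 1 (already dominant by more than b), but positive when the expert took action 0, pushing Q(s, a_E) upward during training.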
8. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the L2 regularization loss is:
L_2 = ‖θ‖_2^2
where θ denotes the parameters of the Q network.
9. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 4, characterized in that the Q network synchronizes the updated parameters to the target network as follows: when the number of sampling iterations reaches the preset target network parameter update frequency, the parameters of the Q network are synchronized to the target network.
10. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the gradient descent method comprises: calculating the importance sampling weight of each state transition record according to its sampling probability, updating the gradient values according to the average importance sampling weight of the B sampled state transition records, and updating the sampling priorities of the state transition records according to the temporal-difference error.
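The claim-10 corrections can be sketched as below. The exponent and the epsilon floor are common prioritized-replay conventions assumed here for illustration; the patent does not spell them out:

```python
import numpy as np

def is_weights(probs, n, alpha):
    # importance sampling weight w_i = (1 / (n * P(tau_i)))**alpha,
    # normalised by the maximum weight so gradients are only scaled down
    w = (1.0 / (n * np.asarray(probs, dtype=float))) ** alpha
    return w / w.max()

def updated_priorities(td_errors, eps=1e-6):
    # claim 10: the new priority grows with the TD-error magnitude;
    # eps keeps zero-error records sampleable (assumed convention)
    return np.abs(td_errors) + eps
```

With uniform sampling probabilities the weights are all 1, i.e. the correction vanishes exactly when no sampling bias was introduced.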
CN202311375874.0A 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism Active CN117369263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311375874.0A CN117369263B (en) 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism


Publications (2)

Publication Number Publication Date
CN117369263A true CN117369263A (en) 2024-01-09
CN117369263B CN117369263B (en) 2024-07-09

Family

ID=89396113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311375874.0A Active CN117369263B (en) 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism

Country Status (1)

Country Link
CN (1) CN117369263B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017818A (en) * 2022-06-20 2022-09-06 南京工业职业技术大学 Power plant flue gas oxygen content intelligent prediction method based on attention mechanism and multilayer LSTM
CN116668995A (en) * 2023-07-27 2023-08-29 苏州大学 Deep reinforcement learning-based vehicle networking dynamic beacon broadcasting method and system


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIGAO FU 等: "Control Strategy for Denitrification Efficiency of Coal-Fired Power Plant Based on Deep Reinforcement Learning", IEEE ACCESS, 2 April 2020 (2020-04-02) *
LIANGSHU HE 等: "Machine-learning-driven on-demand design of phononic beams", SCIENCE CHINA PHYSICS, MECHANICS & ASTRONOMY, 25 November 2021 (2021-11-25) *
PICTUREYAWEI XUE 等: "An Optimal Model of Power Source Investment Considering Carbon Emission Constraint Based on Deep Deterministic Policy Gradient Algorithm", PROCEEDINGS OF THE 2022 4TH INTERNATIONAL CONFERENCE ON ROBOTICS, INTELLIGENT CONTROL AND ARTIFICIAL, 19 April 2023 (2023-04-19) *
WEIJUN TAN 等: "A Fast Partial Video Copy Detection Using KNN and Global Feature Database", 2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, 15 February 2022 (2022-02-15) *
敖韬: "基于深度强化学习的热工过程控制方法研究与应用", 中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑, 15 March 2022 (2022-03-15) *
黄志刚 等: "深度分层强化学习研究与发展", 软件学报, 15 February 2023 (2023-02-15) *

Also Published As

Publication number Publication date
CN117369263B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
CN110187727A (en) A kind of Glass Furnace Temperature control method based on deep learning and intensified learning
CN107368125B (en) A kind of blast furnace temperature control system and method based on CBR Yu the parallel mixed inference of RBR
CN111915080B (en) Raw fuel cost optimal proportioning method based on molten iron quality constraint
CN109583585A (en) A kind of station boiler wall temperature prediction neural network model
CN107526927A (en) A kind of online robust flexible measurement method of blast-melted quality
CN110097929A (en) A kind of blast furnace molten iron silicon content on-line prediction method
CN104778361B (en) The method of modified EMD Elman neural network prediction molten iron silicon contents
CN106709197A (en) Molten iron silicon content predicting method based on slide window T-S fuzzy neural network model
CN110427715B (en) Method for predicting furnace hearth thermal state trend based on time sequence and multiple dimensions of blast furnace
CN104615856A (en) Gas consumption prediction model establishing method and device based on hot blast stove group
CN117369263B (en) Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism
Koyfman et al. Using of Intelligence Analysis of Technological Parameters Database for Implementation of Control Subsystem of Hot Blast Stoves Block ACS.
Tian et al. A new incremental learning modeling method based on multiple models for temperature prediction of molten steel in LF
JPH02170904A (en) Method for predicting furnace heat in blast furnace
JPH0673414A (en) Method for controlling quality of molten iron in blast furnace
Jiang et al. Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations
JP6933196B2 (en) Blast furnace unloading speed prediction model learning method, blast furnace unloading speed prediction method, blast furnace operation guidance method, blast furnace unloading speed control method, hot metal manufacturing method, blast furnace operation method, and blast furnace unloading speed prediction Model learning device
JP4110780B2 (en) Estimation method of hot metal temperature and disturbance in blast furnace
Wu et al. Intelligent optimization and control for reheating furnaces
JP7384311B1 (en) Driving support device, driving support method and program
CN113637819B (en) Blast furnace material distribution method and system based on deep reinforcement learning
Lakshmanan et al. A hybrid modelling approach based on deep learning for the prediction of the silicon content in the blast furnace
CN114943173B (en) Ladle baking system and optimization method based on deep reinforcement learning and combustion simulation coupling
Zhao et al. Study on prediction method of hot metal temperature in blast furnace
CN116464985A (en) Combustion control method of hot blast stove

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant