CN114943173B - Ladle baking system and optimization method based on deep reinforcement learning and combustion simulation coupling

Info

Publication number: CN114943173B
Application number: CN202210388962.3A
Authority: CN
Inventors: 张琦; 卢厚杨
Original assignee: 东北大学
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2024-06-28
Anticipated expiration: 2042-04-13
Also published as: CN114943173A

Abstract

The invention discloses a ladle baking system and an optimization method based on the coupling of deep reinforcement learning and combustion simulation technology. The method comprises the following steps: S1, establishing an actual baking geometric model of a ladle; S2, establishing and calculating a simulation model; S3, preprocessing data; S4, building a prediction model; S5, double-mode coupling; S6, intelligent optimization control. The method replaces manual control optimization with software modeling and control optimization, thereby shortening the optimization period, improving optimization efficiency, reducing labor cost and rapidly raising the ladle baking temperature. In addition, the software modeling optimization couples Python with combustion simulation software, so the optimization can be completed without a parameterized model; this avoids the problem that parameters inside the ladle are difficult to obtain under actual working conditions and greatly reduces the modeling cost.

Description

Ladle baking system and optimization method based on deep reinforcement learning and combustion simulation coupling

Technical Field

The invention relates to the technical field of optimization in the steel industry, in particular to a ladle baking system and an optimization method based on coupling of deep reinforcement learning and combustion simulation technology.

Background

The steel industry is a pillar industry of the national economy, a resource- and energy-intensive industry, and a key target for energy system optimization. At present, the energy structure of the iron and steel industry is dominated by low-grade ores and coal, the environmental pressure is high, and green, low-carbon development is imperative. There is a great need to raise the level of digitization and intelligence in the steel industry through leading-edge information technologies such as big data, cloud computing and artificial intelligence, to promote industrial transformation and upgrading through intelligent manufacturing, and to promote the high-level development of the steel industry. Owing to production process requirements and the need to save gas, a ladle roaster often has to bake the ladle to 1100-1200 ℃ in a short time while ensuring that the flame does not go out because of the low calorific value of the gas. In the existing baking process, a conventionally controlled ladle roaster raises the temperature to the target value by coarse manual adjustment of the gas flow, without attention to the air-fuel ratio, which causes resource waste and equipment damage. In addition, existing baking control suffers from weak control capability, unsuitable air-fuel ratio, incomplete combustion and low baking efficiency, and the target temperature is often reached by prolonging the baking time, which wastes a large amount of gas and slows the production rhythm. Moreover, because the flow field and temperature field inside the ladle are complex under actual on-site conditions, internal data are difficult to obtain for fine regulation. A traditional machine learning model can hardly acquire enough relevant data for training, so modeling and optimal control cannot be performed according to the requirements of the continuous casting production process.

Therefore, a new ladle baking modeling and optimization control method that realizes efficient and accurate ladle baking is an important intelligent process method that urgently needs to be developed.

Disclosure of Invention

In view of the defects of the existing baking technology, the invention provides a ladle baking modeling and optimization method based on the coupling of deep reinforcement learning and combustion simulation technology, so as to solve the problem in the prior art that the ladle baking temperature cannot meet the process requirement of rapidly reaching 1100 ℃ during baking, while also producing lower pollutant emissions. Specifically, the method replaces manual control optimization with software modeling and control optimization, thereby shortening the optimization period, improving optimization efficiency, reducing labor cost and rapidly raising the ladle baking temperature. In addition, the software modeling optimization couples Python with combustion simulation software, so the optimization can be completed without a parameterized model; this avoids the problem that parameters inside the ladle are difficult to obtain under actual working conditions and greatly reduces the modeling cost.

In order to achieve the above purpose, the technical scheme provided by the invention is a ladle baking system and an optimization method based on the coupling of deep reinforcement learning and combustion simulation technology, the method comprising the following steps:

S1, establishing an actual baking geometric model of the ladle based on acquired ladle parameters;

The geometric model comprises a gas pipeline, a combustion-supporting gas pipeline, a ladle cover, an internal combustion area, a ladle lining and a ladle shell.

S2, establishing and calculating a simulation model: importing the constructed actual baking geometric model of the ladle into combustion simulation software, calculating and analyzing to obtain multiple groups of parameters related to the ladle lining temperature distribution and the pollutant emission during the unsteady combustion process, and storing and outputting them by using a script.

Specifically, in step S2, the process of calculating and analyzing the actual baking condition of the ladle by using the combustion simulation software includes the following steps:

S21, mesh generation: modeling and meshing the ladle cover, the ladle body and the heat insulation layer separately, refining the mesh of the central combustion area of the ladle body, and setting the inlets, outlets and geometric boundaries of the model;

S22, calculating and solving: setting the required boundary conditions and fluid-solid coupled heat exchange conditions, selecting an unsteady-state solver, initializing the flow field, selecting monitoring points, and then starting the calculation according to a set baking system (namely the gas flow and combustion-supporting gas flow over a specified time);

S23, obtaining the ladle lining temperature distribution, flame length and pollutant emission during the ladle baking process by using the script, and storing them.
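By way of a non-limiting illustration, the baking system used in S22 and the scripted storage of S23 may be sketched in Python as follows. The class `ScheduleSegment`, the functions `flows_at` and `write_results`, the column names and the units are all assumptions made for this sketch; none of them is specified by the method above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleSegment:
    """One stage of a baking system: constant flows held for a duration."""
    duration_min: float  # length of the stage in minutes (unit assumed)
    gas_flow: float      # gas flow (unit assumed, e.g. m^3/h)
    air_flow: float      # combustion-supporting gas flow (same assumed unit)


def flows_at(schedule, t_min):
    """Return (gas_flow, air_flow) in force at time t_min.

    After the last stage ends, the flows of the last stage are held.
    """
    elapsed = 0.0
    for seg in schedule:
        elapsed += seg.duration_min
        if t_min < elapsed:
            return seg.gas_flow, seg.air_flow
    last = schedule[-1]
    return last.gas_flow, last.air_flow


# Hypothetical column names for the monitored quantities named in S23.
FIELDS = ["time_min", "lining_temp_c", "flame_length_m", "pollutant"]


def write_results(rows, stream):
    """Write monitored results (a list of dicts keyed by FIELDS) as CSV."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

A schedule is then an ordered list of segments, and `write_results` can target an open file or an in-memory `io.StringIO` buffer.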

S3, data preprocessing: dividing the multiple groups of parameters into training data and test data to form a data set.

Specifically, the preprocessing in this step includes the following sub-steps:

S31, checking the data and cleaning bad values, namely inspecting the data through visualization (such as distribution plots and violin plots), deleting abnormal points, and correcting the time data according to the actual baking time;

S32, dividing the data set into a training set and a test set in a certain proportion according to the baking time sequence. The earlier part is the training set, used for parameter training after the ladle model is built, and the later part is the test set, used for checking and evaluating model accuracy;

S33, standardizing the data set to eliminate the influence of differing dimensions (units) among indexes and make the data indexes comparable.
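As a hedged illustration of sub-steps S31 to S33, a minimal Python sketch is given below. The z-score rule in `drop_outliers` is an assumption that stands in for the visual inspection described in S31, and the 80/20 split, the function names and the `time_min` key are likewise illustrative only.

```python
from statistics import mean, pstdev


def drop_outliers(rows, key, z_max=3.0):
    """S31 stand-in: drop rows whose value for `key` lies more than z_max
    standard deviations from the mean (assumed rule, not from the method)."""
    values = [r[key] for r in rows]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return list(rows)
    return [r for r in rows if abs(r[key] - mu) / sd <= z_max]


def chronological_split(rows, train_fraction=0.8):
    """S32: earlier part of the baking sequence trains, later part tests."""
    ordered = sorted(rows, key=lambda r: r["time_min"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_standardizer(train_rows, keys):
    """S33: compute mean and standard deviation on the training set only."""
    stats = {}
    for k in keys:
        vals = [r[k] for r in train_rows]
        sd = pstdev(vals)
        stats[k] = (mean(vals), sd if sd > 0 else 1.0)
    return stats


def standardize(rows, stats):
    """S33: rescale the selected keys to zero mean and unit variance."""
    return [
        {**r, **{k: (r[k] - m) / s for k, (m, s) in stats.items()}}
        for r in rows
    ]
```

Fitting the standardizer on the training set alone and reusing its statistics for the test set keeps information from the later part of the baking sequence out of training.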

S4, building a prediction model: importing the data set into a deep learning framework to obtain and verify the ladle lining temperature distribution, flame length and pollutant emission predicted by the deep learning model, and finally saving the model.

S5, double-mode coupling: connecting the combustion simulation software with Python through a CORBA interface, creating a combustion simulation server session, and sending TUI (text user interface) commands and scripts to the server in real time, so that the combustion simulation software automatically executes the ladle baking heating system and returns the operation result.
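The command flow of S5 may be pictured with the following Python sketch, offered only as an assumption-laden illustration. The `send` callable stands in for the CORBA connection, and the strings in `TEMPLATES` are placeholders, not verified solver TUI syntax; real command text depends on the simulation software and its version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimulationSession:
    """Thin wrapper around whatever transport reaches the solver.

    `send` is a hypothetical callable that forwards one text command to the
    server session and returns its textual reply.
    """
    send: Callable[[str], str]
    log: List[str] = field(default_factory=list)

    def command(self, text: str) -> str:
        self.log.append(text)  # keep an audit trail of issued commands
        return self.send(text)


# Placeholder templates (NOT real solver syntax; replace with actual commands).
TEMPLATES: Dict[str, str] = {
    "set_gas": "set-gas-flow {value:.3f}",
    "set_air": "set-air-flow {value:.3f}",
    "advance": "advance-time-steps {steps:d}",
    "report": "report-monitors",
}


def run_stage(session, gas_flow, air_flow, steps, templates=TEMPLATES):
    """Apply one stage of the heating system and return the monitor reply."""
    session.command(templates["set_gas"].format(value=gas_flow))
    session.command(templates["set_air"].format(value=air_flow))
    session.command(templates["advance"].format(steps=steps))
    return session.command(templates["report"])
```

Keeping the transport behind a single callable lets the same loop be exercised against a stub before it is attached to a live simulation session.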

S6, intelligent optimization control: establishing a baking control agent, importing the state from the deep learning prediction model established in step S4, and using the combustion simulation model as the environment (in reinforcement learning, the environment interacts with the agent, which outputs a decision and obtains environmental feedback), so as to perform optimization control of the baking process (the changes in the opening of the gas valve and the combustion-supporting gas valve over the course of baking) in real time. The optimized control result is used for regulating the actual baking process.

Specifically, the intelligent optimization control in this step includes the following steps:

S61, state input: the deep learning prediction model established in step S4 serves as the input layer that imports the state; the ladle lining temperature distribution, flame length and pollutant emission are obtained from the combustion-supporting air flow and gas flow parameters of the reference environment and imported as state quantities into the agent network established in the subsequent step.

S62, establishing the agent network: an Actor policy network and a Critic evaluation network are each established on the basis of a deep learning network; after receiving the state input, the agent outputs a decision that changes the air-fuel ratio (the ratio of combustion-supporting air flow to gas flow) toward a better value, and after receiving the reward fed back by the environment, it adjusts the policy network weights and the evaluation network loss.

S63, environment interaction: the decisions output in step S62, such as changes in the combustion-supporting gas flow and the gas flow, are passed through CORBA into the combustion simulation environment; the environmental reward is obtained and fed back to the Critic evaluation network, and the environmental state at that moment, such as the gas flow, combustion-supporting gas flow and flame length, is input into the deep learning prediction model established in step S4. Steps S61-S63 are repeated iteratively until the optimal combustion-supporting air flow and gas flow (the optimal baking system) are obtained.

S64, actual baking control: and feeding back the optimal baking system to the actual baking process for baking control.
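As a small worked illustration of the quantities exchanged in S62 and S63, the following Python sketch computes the air-fuel ratio as defined in S62 and applies a flow-change decision. The `Flows` class, the `apply_decision` function and the valve limits are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flows:
    gas: float  # gas flow
    air: float  # combustion-supporting air flow

    @property
    def air_fuel_ratio(self):
        """Ratio of combustion-supporting air flow to gas flow (per S62)."""
        return self.air / self.gas


def apply_decision(flows, d_gas, d_air, gas_limits, air_limits):
    """Add the agent's flow changes, clamped to assumed valve limits.

    gas_limits and air_limits are (min, max) pairs; the lower gas limit is
    assumed positive so that the air-fuel ratio stays defined.
    """
    gas = min(max(flows.gas + d_gas, gas_limits[0]), gas_limits[1])
    air = min(max(flows.air + d_air, air_limits[0]), air_limits[1])
    return Flows(gas, air)
```

For example, starting from `Flows(1000.0, 1200.0)` (ratio 1.2), a decision of +100 gas and -50 air under a gas limit of 1050 yields `Flows(1050.0, 1150.0)`.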

Furthermore, because a finer simulation agent is used for the intelligent optimization control of ladle baking, the method is not limited to ladle baking control; it can be extended to the combustion control of various furnaces, and can realize combustion control for oxygen-enriched combustion and MILD combustion (Moderate & Intense Low Oxygen Dilution, a mild combustion mode in which the combustion oxygen is strongly diluted to a low concentration), which place higher demands on the combustion atmosphere.

The actual baking control comprises two control methods: automatic control by the system and manual control by operators.

Further, in automatic control, the control system regulates the consumption of gas and combustion-supporting gas according to the optimized control result of the agent, so as to achieve the optimal air-fuel ratio, improve combustion efficiency, save gas, achieve an energy-saving effect and control pollutant emission.

Further, in manual control, the operator adjusts the valves personally to regulate the gas consumption and the combustion-supporting gas consumption.

Compared with the prior art, the optimization effect of the invention is as follows:

1. By combining combustion simulation technology with a deep reinforcement learning agent, the invention solves the problems that the temperature distribution inside the ladle is complex under actual working conditions, that the actual baking condition is unknown and difficult to regulate, and that a traditional machine learning model can hardly acquire enough relevant data for training;

2. Replacing manual operation with software-based agent optimization control greatly shortens the optimization control period, improves control efficiency and reduces labor cost; it also effectively ensures that the optimization scheme obtained is an optimal result, so that ladle baking is performed efficiently and accurately.

Drawings

FIG. 1 is a model of the ladle roaster on which the optimization control method provided by the present invention operates:

Ladle cover: 1. gas pipeline; 2. combustion-supporting gas pipeline; 3. heat insulation layer; 5. gas inlet; 6. combustion-supporting gas inlet. Ladle body: 4. combustion zone. Ladle lining: 7. ladle shell; 8. permanent layer; 9. working layer.

FIG. 2 is a flow chart of a modeling optimization method provided by the present invention;

FIG. 3 is a schematic diagram of an LSTM cell according to an embodiment of the invention;

FIG. 4 is a schematic diagram of a baking agent according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a baking optimization control structure based on deep reinforcement learning and combustion simulation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, techniques and embodiments of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings. The specific embodiments described herein are to be considered in an illustrative sense only and are not intended to limit the invention.

Examples:

As shown in fig. 1, the ladle roaster adopted in this embodiment has a jet-type burner structure. A conical valve is arranged below the gas outlet, and gas is sprayed out of the annular gap between the gas pipeline 1 and the conical valve, so that the internal combustion area can preheat the external gas. A combustion-supporting gas pipeline 2 is arranged outside the gas pipeline, and heat-insulating cotton is added below the ladle cover to store heat and thereby preheat the combustion-supporting gas above the ladle cover. Below the ladle cover is the combustion area inside the ladle; outside the combustion area are the working layer 9, the permanent layer 8 and the ladle shell 7. The working layer, the permanent layer and the heat insulation layer 3 are all made of heat-insulating materials, the ladle shell is made of steel, and the gas pipeline and the combustion-supporting gas pipeline are made of high-temperature-resistant stainless steel. After the roaster starts to work, gas enters the burner from the gas inlet 5 of the gas pipeline, and its flow velocity increases as it passes the conical valve because the cross-sectional area suddenly decreases. A fan sends combustion-supporting gas into the burner through the combustion-supporting gas inlet 6 of the combustion-supporting gas pipeline. After passing the conical valve, the gas and the combustion-supporting gas mix in the burner and begin to burn in the combustion area 4. The combustion-supporting gas and the blast furnace gas are preheated by the waste heat of the flue gas that the fan extracts during baking, which effectively improves the combustion efficiency and the energy utilization rate. Because internal combustion and oxygen-enriched combustion are adopted, the flame is sprayed out of the burner under the combined action of the blast furnace gas pressure, the acceleration by the conical valve and the thermal expansion of the gas, and the flame temperature is high.

During the simulated baking, the gas is sprayed out of the annular gap at the upper part of the conical valve and entrains the combustion-supporting gas so that the two are fully mixed and burned. The reaction force of the gas, which expands violently during combustion, ejects the high-temperature flue gas from the nozzle. Meanwhile, when the gas flows around the conical valve, the flow separates from the body, and a stable combustion recirculation zone forms behind the bluff-body structure of the conical valve. The high-temperature flue gas in the recirculation zone entrains the gas and the combustion-supporting gas so that combustion proceeds stably.

Combustion simulation of the ladle baking process: because ladle baking takes place in a closed high-temperature environment, industry usually measures the outside with an infrared thermometer and then estimates the conditions inside the ladle, so the temperature distribution inside the ladle is difficult to measure accurately in real time. Simulation modeling with combustion simulation software solves the data source problem, makes it possible to simulate the effect of real-time optimization, and provides a basis for the ladle baking heating system.

A geometric model is established in SolidWorks according to the actual roaster, and the mesh is generated in ICEM, with the mesh of the central combustion area refined. The mesh is then imported into Fluent, boundary conditions (inlet conditions, outlet conditions and heat insulation parameters) are set according to the actual working conditions, the unsteady combustion process of ladle baking is simulated, and the simulation results are processed to form a data set for the subsequent neural network baking prediction model.

Baking prediction using a neural network model: a neural network can learn autonomously and has strong prediction capability. The intelligent prediction method for ladle baking uses the neural network model to train on, predict and identify the running state of the system, which reduces the installation and maintenance cost of hardware equipment and allows the ladle baking process to be predicted in real time. As an alternative embodiment, the neural network model may be a back propagation (BP) neural network model, but the prediction performance of a BP network is not ideal over longer periods. Considering that the ladle baking process is a dynamic time series, as a preferred embodiment the invention adopts a Long Short-Term Memory (LSTM) network, which has memory capability, to improve and optimize the intelligent prediction model of continuous casting ladle baking, so as to realize automatic identification and effective prediction of the air-fuel ratio during baking.
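Because the prediction model consumes a time series, the simulated records are typically arranged into fixed-length input windows before training. The following Python sketch shows one common way to do this; the function name `make_windows`, the window length and the one-step prediction horizon are assumptions, not details given in this embodiment.

```python
def make_windows(series, window, horizon=1):
    """Turn a time-ordered list of feature vectors into (inputs, targets).

    Each input is `window` consecutive feature vectors; its target is the
    feature vector `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets
```

With six records and a window of three, this yields three training pairs, the first mapping records 0 to 2 onto record 3.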

LSTM is a variant of the recurrent neural network (RNN). A standard RNN typically has only one simple node in its hidden layer, such as a sigmoid or tanh function, whereas in an LSTM network the hidden layer structure changes from a simple node to a unit (cell). The variant therefore retains most properties of the RNN model while also addressing the problems of gradient explosion and gradient vanishing during backpropagation. LSTM is thus well suited to real tasks that are strongly coupled with time.

Fig. 3 shows an LSTM hidden layer unit. In the figure, each line carries a vector from the output of one node to the input of another. Circles represent pointwise operations such as vector addition and pointwise multiplication, and boxes represent learned neural network layers. Lines that merge indicate concatenation, while a line that forks indicates that its content is copied and the copies go to different locations. The top line inside the hidden layer unit represents the cell state c, which preserves the characteristic parameters of previous time steps, and the arrows from x_t to the σ and tanh layers let the three gates observe the cell state for training on the input characteristic parameters. The memory unit has three parts: a forget gate f (the first σ layer), an input gate i (in two parts, a σ layer and a tanh layer) and an output gate o (whose σ layer output is combined with a tanh layer). Each gate is a standard neuron in the recurrent neural network; the subscript t denotes the activation of the input, output and forget gates at time step t; [ ] denotes a differentiable weighted sum. The outputs of the three control gates (input gate, forget gate and output gate) are each connected to a multiplication node in the figure, so as to control the input and output of the information flow and the state of the cell. x_t is the input node and corresponds to the characteristic parameters, namely the ladle lining temperature distribution, flame length and pollutant emission calculated and output by the combustion simulation; h is the output node and corresponds to the output state of the LSTM unit.

The operation of the LSTM unit can be represented by the following equations:

f_t = σ(W_f·[h_{t-1}, x_t] + b_f)    (1.1)

i_t = σ(W_i·[h_{t-1}, x_t] + b_i)    (1.2)
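For orientation, one time step of an LSTM unit may be sketched in plain Python as below. Only the forget gate and input gate lines correspond to equations (1.1) and (1.2); the candidate state, output gate, cell update and hidden output follow the common textbook formulation and are assumptions here, as are the parameter names `W_c`, `b_c`, `W_o` and `b_o`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Compute W·v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps names such as 'W_f' and 'b_f' to weights."""
    v = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # eq. (1.1)
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]    # eq. (1.2)
    g = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], v)]  # candidate (assumed)
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate (assumed)
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step, which gives a quick sanity check of an implementation.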

Baking optimization control using a deep reinforcement learning model: reinforcement learning differs from traditional supervised machine learning, which fits a fixed data set, in that learning proceeds quickly because new feedback is obtained promptly (an action is taken and a reward is received from the environment) and immediately affects subsequent decisions; it can also adapt quickly to environmental changes. However, in problems with a continuous state-action space, the weight calculation in reinforcement learning may fail because it lacks predictive ability and cannot generalize. The weights can instead be fitted non-linearly by a deep learning model, which yields a deep reinforcement learning model. The deep reinforcement learning model for ladle baking uses an agent to train on and control the running state of the system, which solves the problem that the internal condition of the baking process is difficult to observe, greatly reduces the cost of manual operation, and allows the ladle baking process to be predicted in real time. Considering that the ladle baking process has a dynamic continuous action space, the invention adopts Proximal Policy Optimization (PPO), which is suitable for continuous control, to improve and optimize the intelligent baking process of the continuous casting ladle, so as to realize intelligent optimization control of the baking process and reach the optimal air-fuel ratio.

Fig. 4 shows the deep reinforcement learning PPO agent for ladle baking optimization control. During training, the environment provides the ladle lining temperature distribution, flame length and pollutant emission obtained by combustion simulation. After preprocessing, these data serve as the input (state) of the Actor network, which outputs a set of change values for the gas flow and the combustion-supporting gas flow (the action), that is, the policy. The Critic network judges whether the output of the policy is good or bad. The reward and the environmental state at the next moment are obtained from the data acquired in the next period, and the Critic network evaluates (value) the action the Actor network took in the previous period, thereby guiding the optimization of the Actor network.
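The reward returned by the environment can be illustrated with a short Python sketch. It assumes a weighted sum of the lining temperature change, the flame length change and the pollutant emission, with the pollutant term entering negatively; the weights, the sign convention, the `Observation` class and the use of a single representative lining temperature are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    lining_temp: float   # representative ladle lining temperature
    flame_length: float  # flame length
    pollutant: float     # pollutant emission


def reward(prev, curr, w_temp=1.0, w_flame=0.1, w_pollutant=0.5):
    """Weighted sum of temperature rise, flame length change and emission.

    The weights are illustrative; the emission term is subtracted so that
    lower emission is rewarded (assumed sign convention).
    """
    return (
        w_temp * (curr.lining_temp - prev.lining_temp)
        + w_flame * (curr.flame_length - prev.flame_length)
        - w_pollutant * curr.pollutant
    )
```

In practice the weights would be tuned so that no single term dominates, since the three quantities have different units and magnitudes.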

The optimization control process of the PPO agent can be expressed as:

V_π(s) = E_π[G_t | s_t = s, a_t = a]    (1.6)

δ_t = r_t + γV(s_{t+1}) − V(s_t)    (1.8)

Here G_t is the expected long-term return, γ is the discount factor for the long-term return, and r_t is the reward at time t. V_π(s) is the value of state s, a_t is the action performed during the policy update, and s_t is the state at that moment. The advantage value of the policy at time t is used to compare the new policy with the old policy. λ is a weighted-average factor, δ_t is the temporal-difference error (TD error), L_V is the loss function of the Critic network (value loss), L_P is the loss function of the Actor network (policy loss), r_t(θ) is the policy ratio at time t, and ε is a hyperparameter. π_θ(a_t|s_t) is the probability of the action under the latest policy at time t, and π_θ,old(a_t|s_t) is the corresponding probability under the old policy.
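As a hedged illustration of how δ_t, γ and λ can combine into an advantage value, the Python sketch below computes the TD error of equation (1.8) and then a λ-weighted discounted sum of TD errors. The latter is the generalized advantage estimation commonly paired with PPO; that it is the estimator intended here is an assumption, and the function names are illustrative.

```python
def td_errors(rewards, values, gamma):
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), as in equation (1.8).

    `values` holds V(s_0) ... V(s_T), one more entry than `rewards`.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Advantage as a (gamma * lam)-discounted sum of future TD errors."""
    deltas = td_errors(rewards, values, gamma)
    advantages = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam` to 0 reduces the advantage to the one-step TD error, while `lam` equal to 1 accumulates all future TD errors with discount γ.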

The detailed steps of the optimal control of the PPO intelligent agent are as follows:

1. Input the environmental information s (ladle lining temperature distribution, flame length and pollutant emission) into the actor-new network to obtain the parameters μ and σ that describe the flow distribution, construct the Gaussian distribution they define, and sample from it the action a (combustion-supporting gas flow and gas flow); the actor-new network is then replicated to build the actor-old network. Input the action into the combustion simulation environment to obtain the reward r (a weighted sum of the temperature change in the ladle, the flame length change and the pollutant emission) and the next state s′, store the transitions in a list [(s, a, r), …], and input s′ into the actor-new network. Repeat this step until the list [(s, a, r), …] holds enough data for optimal control.

2. Input the s′ obtained in the last iteration of step 1 into the Critic network to obtain the corresponding state value v′ (reflecting the heating effect, the energy saving and the negative of the pollutant emission in that state).

3. Input the states stored in step 1 into the Critic network to obtain the v′ values of all states, and calculate the advantage value of the policy at time t.

4. Compute L_V and update the Critic network by backpropagation (BP).

5. Input the stored list of states s into the actor-old and actor-new networks to obtain the Gaussian distributions Normal1 and Normal2 respectively; input the stored list of actions into Normal1 and Normal2 to obtain prob1 and prob2 for each action (namely π_θ,old(a_t|s_t) and π_θ(a_t|s_t)); the ratio of prob2 to prob1 gives the importance weight, namely r_t(θ).

6. Compute L_P and update the actor-new network by backpropagation.

7. Repeat steps 5-6 until the accuracy requirement is met, then update the actor-old network by using the temporal-difference error δ_t of the latest actor-new network.

8. Repeat steps 1-7 and output the optimal policy.
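Steps 5 and 6 above can be illustrated with the following Python sketch for a one-dimensional action. It evaluates each stored action under the old and new Gaussian policies, forms the importance weight r_t(θ), and applies the clipped surrogate objective usually associated with PPO. The clipped form, the default ε of 0.2 and the function names are assumptions, since the expression for L_P is not reproduced in this text.

```python
import math


def gaussian_pdf(a, mu, sigma):
    """Density of action a under a normal distribution N(mu, sigma^2)."""
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def clipped_policy_loss(actions, old_params, new_params, advantages, epsilon=0.2):
    """Negative mean clipped surrogate (assumed form of the policy loss).

    old_params and new_params are lists of (mu, sigma) pairs produced by the
    actor-old and actor-new networks for the stored states.
    """
    total = 0.0
    for a, (mu_o, sd_o), (mu_n, sd_n), adv in zip(
        actions, old_params, new_params, advantages
    ):
        prob1 = gaussian_pdf(a, mu_o, sd_o)  # pi_theta_old(a_t | s_t)
        prob2 = gaussian_pdf(a, mu_n, sd_n)  # pi_theta(a_t | s_t)
        ratio = prob2 / prob1                # importance weight r_t(theta)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)
    return -total / len(actions)
```

When the two policies coincide, every ratio equals 1 and the loss is simply the negative mean advantage; a practical implementation would work with log-probabilities to avoid division by very small densities.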

Fig. 5 shows the baking optimization control structure based on deep reinforcement learning and combustion simulation technology. The ladle baking model built by the combustion simulation software provides an environment for the agent and interacts with it, and the optimal gas flow and combustion-supporting gas flow are output to the actual production environment in real time to guide the baking process.

While there has been shown and described what are at present considered to be the fundamental principles and structural features of the present invention, it will be understood by those skilled in the art that the present invention is not limited to the details of the foregoing preferred embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The ladle baking optimization method based on the coupling of the deep reinforcement learning and the combustion simulation technology is characterized by comprising the following steps of:

S2, establishing and calculating a simulation model: the constructed ladle actual baking geometric model is imported into combustion simulation software, calculation and analysis are carried out, the ladle baking unsteady state combustion process is simulated, and a plurality of groups of parameters related to the ladle lining temperature distribution and pollutant emission in the unsteady state combustion process are obtained;

S3, data preprocessing: dividing the acquired multiple groups of parameters into training data and test data to form a data set;

S4, building a prediction model: importing the data set into a deep learning model frame to obtain ladle lining temperature distribution, flame length and pollutant emission results predicted by the deep learning model, and storing a final prediction model after verification;

S5, double-mode coupling: connecting combustion simulation software and Python software, creating a combustion simulation server session, automatically executing a ladle baking heating system by the combustion simulation software, and returning an operation result;

S6, intelligent optimization control: establishing a baking control intelligent agent, importing a state by a deep learning prediction model established in the step S4, interacting a combustion simulation model as an environment, performing baking optimization control on the baking process in real time, and finally feeding back an optimal baking system to an actual baking process;

Step S6 specifically comprises the following steps:

S61, state input: the deep learning prediction model established in the step S4 is used as an input layer to be imported into a state, and the ladle lining temperature distribution, the flame length and the pollutant discharge amount are obtained according to the combustion-supporting air flow and the gas flow parameters of the reference environment and are used as state quantities to be imported into an intelligent agent network established in the subsequent step;

s62, establishing an agent network: establishing an Actor strategy network and a Critic evaluation network based on a deep learning network respectively, obtaining a state quantity input and then outputting a better air-fuel ratio change decision, and adjusting strategy network weight and evaluation network loss after obtaining environmental feedback rewards;

S63, environment interaction: the decision on the changes in combustion-supporting air flow and gas flow output by step S62 is imported into the combustion simulation environment; the environmental reward is obtained and fed back to the Critic evaluation network; the environmental state at that moment (gas flow, combustion-supporting air flow and flame length) is input into the deep learning prediction model established in step S4; steps S61-S63 are repeated iteratively; and the optimal combustion-supporting air flow and optimal gas flow are finally obtained as the optimal baking system;
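The S61-S63 iteration can be sketched as a plain loop. This is a minimal illustration only: the function name `run_iterations`, the three callables and the "keep the best reward seen" rule are assumptions introduced here, the Actor and Critic updates are deliberately left out, and the state is predicted from the two flows alone for brevity.

```python
from typing import Callable, Tuple

# Hypothetical type aliases, not terms from the claims.
State = Tuple[float, float, float]  # (lining temperature, flame length, pollutant emission)
Action = Tuple[float, float]        # (change in gas flow, change in air flow)


def run_iterations(
    predict_state: Callable[[float, float], State],    # stands in for the S4 prediction model
    policy: Callable[[State], Action],                 # stands in for the Actor network
    simulate_reward: Callable[[float, float], float],  # stands in for the combustion simulation
    gas_flow: float,
    air_flow: float,
    n_steps: int,
) -> Tuple[float, float, float]:
    """Repeat S61 -> S62 -> S63 and return (best reward, gas flow, air flow)."""
    best = (float("-inf"), gas_flow, air_flow)
    for _ in range(n_steps):
        state = predict_state(gas_flow, air_flow)     # S61: state input
        d_gas, d_air = policy(state)                  # S62: flow change decision
        gas_flow += d_gas
        air_flow += d_air
        reward = simulate_reward(gas_flow, air_flow)  # S63: environment feedback
        # A real agent would update the Actor and Critic here using `reward`.
        if reward > best[0]:
            best = (reward, gas_flow, air_flow)
    return best
```

In practice `simulate_reward` would wrap a call into the combustion simulation session of step S5, which is far slower than the toy callables a quick check would use.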

2. The ladle baking optimization method based on the coupling of deep reinforcement learning and combustion simulation technology according to claim 1, wherein the ladle actual baking geometric model in step S1 comprises a gas pipeline, a fuel gas pipeline, a ladle cover, an internal combustion area, a ladle lining and a ladle shell.

3. The ladle baking optimization method based on the coupling of deep reinforcement learning and combustion simulation technology according to claim 1, wherein step S2 specifically comprises the following steps:

S22, calculating and solving: setting the required boundary conditions and fluid-solid coupled heat exchange conditions, selecting an unsteady-state solver, initializing the initial field, selecting monitoring points, and then starting the calculation according to a set baking system, wherein the set baking system comprises the gas flow and the combustion-supporting air flow over a set time;

S23, acquiring the ladle lining temperature distribution, flame length and pollutant emission data of the ladle baking process calculated by the simulation model.
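A set baking system of the kind used in step S22, that is, gas flow and combustion-supporting air flow over a set time, can be represented as a piecewise-constant schedule. The sketch below is illustrative: the class names `Stage` and `BakingSchedule`, the use of seconds, and the piecewise-constant form are assumptions, not details given in the claims.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One stage of the schedule (hypothetical structure; units are assumed)."""
    start_s: float   # time at which this stage begins, in seconds
    gas_flow: float  # gas flow held during the stage
    air_flow: float  # combustion-supporting air flow held during the stage


class BakingSchedule:
    """Piecewise-constant gas and air flows over the baking time."""

    def __init__(self, stages: List[Stage]) -> None:
        if not stages:
            raise ValueError("at least one stage is required")
        self._stages = sorted(stages, key=lambda s: s.start_s)
        self._starts = [s.start_s for s in self._stages]

    def flows_at(self, t_s: float) -> Tuple[float, float]:
        """Return (gas_flow, air_flow) in effect at time t_s."""
        if t_s < self._starts[0]:
            raise ValueError("time precedes the first stage")
        # Index of the last stage whose start time is <= t_s.
        i = bisect_right(self._starts, t_s) - 1
        stage = self._stages[i]
        return stage.gas_flow, stage.air_flow
```

A simulation driver could query `flows_at` at each time step to set the inlet boundary conditions; any numbers passed in are the user's own, since the claims do not give flow values.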

4. The ladle baking optimization method based on coupling of deep reinforcement learning and combustion simulation technology according to claim 3, wherein after preprocessing the internal combustion data of the ladle obtained by combustion simulation, predictive training is performed by using a deep learning model, and updated lining temperature distribution, flame length and pollutant emission data are obtained.
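The preprocessing referred to here includes dividing the simulated groups of parameters into training and test data. A minimal sketch follows, assuming each group is one complete simulation run and is kept whole so that its time series is not broken up; the function name `split_groups`, the 20% test fraction and the fixed seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_groups(
    groups: Sequence[T],
    test_fraction: float = 0.2,  # assumed ratio, not taken from the claims
    seed: int = 0,               # fixed seed so the split is reproducible
) -> Tuple[List[T], List[T]]:
    """Shuffle whole groups and return (training groups, test groups)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    if len(groups) < 2:
        raise ValueError("need at least two groups to split")
    shuffled = list(groups)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one group on each side of the split.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]
```

Splitting by group rather than by individual time step avoids leaking neighbouring samples of the same run into both sets.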

5. The ladle baking optimization method based on coupling of deep reinforcement learning and combustion simulation technology according to claim 1 or 4, wherein the deep learning model comprises a long short-term memory (LSTM) network with memory capability, each input node corresponds to a characteristic parameter, and the characteristic parameters comprise the ladle lining temperature distribution, flame length and pollutant emission output by the combustion simulation calculation.
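For reference, the standard LSTM cell update is given below. The symbols are the conventional ones and are not defined in the claims: $x_t$ is the vector of characteristic parameters at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid and $\odot$ the element-wise product. The claims do not state which LSTM variant is used, so this is the textbook form only.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The cell state $c_t$ is what provides the memory capability: the forget gate $f_t$ decides how much of the earlier state is retained at each time step.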

6. The ladle baking optimization method based on the coupling of deep reinforcement learning and combustion simulation technology according to claim 1, wherein in step S6, an improved proximal policy optimization (PPO) network is adopted to optimize the intelligent baking process of continuous casting ladle baking.
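The modifications behind the "improved" network are not described in this section. For orientation only, the sketch below shows the clipped surrogate term of standard proximal policy optimization for a single sample; the function name and the clip range of 0.2 are assumptions.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped objective for one sample.

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- advantage estimate for the action taken
    eps       -- clip range (0.2 is a common default, assumed here)
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

The Actor is trained to maximize the average of this term over a batch, which keeps each policy update close to the policy that collected the data.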

7. The ladle baking optimization method based on the coupling of deep reinforcement learning and combustion simulation technology according to claim 1, wherein the baking optimization control of the baking process comprises controlling the variation in opening degree of a gas valve and a combustion-supporting air valve over the course of the baking process.
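A change in valve opening requested by the agent would normally be bounded before it reaches an actuator. The sketch below shows one way to do that; the percent scale, the per-step limit of 5 and the function name `apply_opening_change` are assumptions and do not come from the claims.

```python
def apply_opening_change(opening: float, delta: float, max_step: float = 5.0) -> float:
    """Apply a requested change to a valve opening expressed in percent.

    The change is first limited to +/- max_step per control step (assumed
    rate limit), then the result is clamped to the range 0 to 100 percent.
    """
    step = max(-max_step, min(delta, max_step))
    return max(0.0, min(opening + step, 100.0))
```

The same function can be applied separately to each valve, so that a single large decision from the agent is spread over several control steps.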

8. A ladle baking system based on deep reinforcement learning and combustion simulation technology, comprising:

a ladle geometric model, established based on the acquired ladle parameters;

a ladle combustion simulation model, used for obtaining, through the established ladle geometric model, a plurality of groups of parameters related to the ladle lining temperature distribution and pollutant emission in the unsteady combustion process;

a ladle baking deep learning prediction model, used for performing deep learning processing on the acquired groups of parameters;

a ladle baking deep reinforcement learning agent optimization control model, used for optimizing the intelligent baking process of continuous casting ladle baking and realizing intelligent optimization control of the baking process;

wherein a ladle baking model is established using combustion simulation software to build an environment with which the agent interacts, and the optimal gas flow and combustion-supporting air flow are output to the actual production environment in real time to guide the baking process;

The method specifically comprises the following steps:

S61, state input: the deep learning prediction model serves as the input layer that supplies the state; from the combustion-supporting air flow and gas flow parameters of the reference environment, it obtains the ladle lining temperature distribution, flame length and pollutant emission, which are passed as state quantities to the agent network established in the subsequent step;

S63, environment interaction: the decision on the changes in combustion-supporting air flow and gas flow output by step S62 is imported into the combustion simulation environment; the environmental reward is obtained and fed back to the Critic evaluation network; the environmental state at that moment (gas flow, combustion-supporting air flow and flame length) is input into the deep learning prediction model; steps S61-S63 are repeated iteratively; and the optimal combustion-supporting air flow and optimal gas flow are finally obtained as the optimal baking system;