CN116469481B

CN116469481B - LF refined molten steel composition forecasting method based on XGBoost algorithm

Info

Publication number: CN116469481B
Application number: CN202310726453.1A
Authority: CN
Inventors: 丁宏翔; 沙周凤; 李世健
Original assignee: Suzhou Fangxing Information Technology Co ltd
Current assignee: Suzhou Fangxing Information Technology Co ltd
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-08-29
Anticipated expiration: 2043-06-19
Also published as: CN116469481A

Abstract

The invention discloses an LF refined molten steel composition forecasting method based on the XGBoost algorithm, comprising the following steps: collecting the initial LF refining parameters; preprocessing the data; constructing an XGBoost model in a Python environment and ranking the features with the plot_importance function of the xgboost module; comparing the predicted values of the XGBoost model with the actual values, using the mean absolute error (MAE) and R² as evaluation criteria for the quality of the predictions; and predicting the molten steel composition once the model passes evaluation. The whole process is simple. By using the XGBoost algorithm to predict the Al, Si and Mn contents of the molten steel in the initial stage of LF refining, the method guides converter tapping and the initial stage of LF refining, improves the stability of the molten steel composition, raises steelmaking production efficiency, reduces steelmaking cost, and supplies qualified, high-quality molten steel to the continuous casting process.

Description

LF refined molten steel composition forecasting method based on XGBoost algorithm

Technical Field

The invention relates to the technical field of steelmaking, in particular to an LF refined molten steel composition forecasting method based on an XGBoost algorithm.

Background

In modern steelmaking routes consisting mainly of an electric arc furnace or converter, LF refining, vacuum degassing and continuous casting, the LF process links the upstream and downstream stages and is known as the buffer of the steelmaking process. The temperature and composition of the molten steel after primary smelting generally fluctuate to some degree; LF refining can bring them under control and so deliver molten steel of qualified temperature and composition to the degassing or continuous casting process on schedule. Because of these advantages, almost all high-grade steel is produced via LF refining. However, enterprises currently face some common problems in the LF process. For example, the end point of the primary furnace often fluctuates, and when the ladle arrives at the station, deoxidizer and slag-forming materials are added based on operator experience. As a result, the molten steel composition and slag properties fluctuate considerably in the initial LF stage; this fluctuation is often carried over into later processes and ultimately causes large fluctuations in product quality. Achieving stable control of the molten steel composition in the initial stage of LF refining is therefore a major difficulty facing metallurgists today.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an LF refined molten steel composition forecasting method based on the XGBoost algorithm. The method uses the XGBoost algorithm to forecast the contents of three elements, Al, Si and Mn, in the molten steel at the initial stage of LF refining, in order to guide converter tapping and the initial LF refining process and to improve the stability of the molten steel composition.

In order to achieve the above purpose, the invention adopts the following technical scheme: an LF refined molten steel composition forecasting method based on XGBoost algorithm comprises the following steps:

S1: Based on metallurgical principle analysis and the experience of field operators, preliminarily screen out the characteristic parameters that affect the molten steel composition in the initial LF stage, combine them with a large amount of actual production data, and integrate and store the data in a database;

S2: Remove duplicate records from the data with Python's duplicate-removal function and remove null values with the dropna function; normalize the data with the z-score method to eliminate the effects of differing dimensions (units) and of the magnitude and spread of each variable; delete outliers with the 3σ outlier detection method;

S3: Use Pearson correlation coefficients to characterize the correlation among the features of the preprocessed data, and remove features that contribute little to model training;

S4: Construct an XGBoost model in a Python environment to establish the relationship between the input features and the output feature; use the plot_importance function of the xgboost module to rank the input features by their contribution to the output, and finally eliminate features whose contribution score for the target index is less than 20;

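S5: Compare the predicted values of the XGBoost model with the actual values, using the mean absolute error MAE and the coefficient of determination R² as evaluation criteria for the quality of the predictions. The formulas are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RSS} = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where n and m are the numbers of predicted samples; ŷ_i is the model's predicted value; y_i is the actual value; ȳ is the mean of the actual values; and RSS is the residual sum of squares;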

S6: If the MAE and R² values from step S5 both fall within their preset ranges, the XGBoost model passes evaluation and the method proceeds to step S7; otherwise the model fails and steps S2 to S5 are repeated;

S7: Predict the molten steel composition with the qualified XGBoost model.
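Steps S2 and S3 name the operations but give no code. The following is a minimal pandas sketch, assuming the data sit in an all-numeric DataFrame (a hypothetical layout). It reads the "duplicate function" as pandas drop_duplicates. Because S3 does not say which features count as contributing little, the sketch assumes the redundancy reading: from each pair of features whose absolute Pearson coefficient exceeds a hypothetical threshold of 0.95, the later feature is dropped.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Step S2 sketch: de-duplicate, drop nulls, z-score, then 3-sigma filter."""
    # Remove repeated records first, then any row containing a null value
    df = df.drop_duplicates().dropna()
    # z-score normalization (x - mean) / std; constant columns get std 1
    # so they become all zeros instead of NaN
    std = df.std(ddof=0).replace(0, 1.0)
    z = (df - df.mean()) / std
    # 3-sigma rule: on z-scored data a point is an outlier when |z| > 3
    return z[(z.abs() <= 3).all(axis=1)]


def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> list:
    """Step S3 sketch (assumed criterion): for each feature pair with
    |Pearson r| > threshold, drop the later column. Threshold is hypothetical."""
    corr = X.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = {c for c in upper.columns if (upper[c] > threshold).any()}
    return [c for c in X.columns if c not in to_drop]
```

The sketch follows the order given in S2, normalizing first and then applying the 3σ rule. On z-scored data, that rule reduces to discarding rows in which any |z| exceeds 3.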

As a specific implementation, the data preprocessed in step S3 are randomly divided into a training set and a test set. The XGBoost model comprises three models, one each for Si, Mn and Al, whose training parameters are set as follows:

the booster is set to the tree model gbtree;

the maximum tree depth max_depth is set to 6;

the learning rate eta is set to 0.03 in the Si and Mn models and to 0.01 in the Al model;

the number of gbtree estimators (boosted trees) is set to 900 in the Si and Mn models and to 1000 in the Al model.
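The settings above map onto xgboost's native parameter names. The following is a minimal sketch that collects them per element. The objective "reg:squarederror" is an assumption, since the specification names none, and the helper name training_config is hypothetical.

```python
def training_config(element: str):
    """Return (params, num_boost_round) for the Si, Mn or Al model,
    using the values stated in the specification."""
    eta = {"Si": 0.03, "Mn": 0.03, "Al": 0.01}[element]
    rounds = {"Si": 900, "Mn": 900, "Al": 1000}[element]
    params = {
        "booster": "gbtree",  # tree booster
        "max_depth": 6,       # maximum tree depth
        "eta": eta,           # learning rate
        "objective": "reg:squarederror",  # assumed regression objective
    }
    return params, rounds


# Usage (requires the xgboost package and a DMatrix `dtrain`):
#   import xgboost as xgb
#   params, rounds = training_config("Al")
#   booster = xgb.train(params, dtrain, num_boost_round=rounds)
```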

As a specific embodiment, the MAE threshold is less than 0.015 for the Si model, less than 0.023 for the Mn model and less than 0.003 for the Al model, and R² must be greater than 0.98 for all three models.
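The S6 acceptance test with these thresholds amounts to a simple comparison. The following is a sketch; the names MAE_LIMITS, R2_LIMIT and passes_evaluation are hypothetical.

```python
# Acceptance thresholds from the specification
MAE_LIMITS = {"Si": 0.015, "Mn": 0.023, "Al": 0.003}  # MAE must be below these
R2_LIMIT = 0.98  # R^2 must exceed this for all three models


def passes_evaluation(element: str, mae_value: float, r2_value: float) -> bool:
    """Step S6 check: True means proceed to prediction (S7);
    False means repeat steps S2 to S5."""
    return mae_value < MAE_LIMITS[element] and r2_value > R2_LIMIT
```

The figures reported in the examples pass this check. For instance, the Al model's MAE of 0.002345 with R² of 0.983041 satisfies both conditions.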

As a specific embodiment, the method for forecasting the composition of the LF refined molten steel further includes the following steps:

S8: After step S7, actual production data from a subsequent period are collected, added to the sample data and stored in the database, and steps S2 to S7 are then repeated.

Owing to the above technical scheme, the invention has the following advantages over the prior art: it uses the XGBoost algorithm to predict the Al, Si and Mn contents of the molten steel in the initial stage of LF refining, thereby guiding converter tapping and the initial LF refining process, improving the stability of the molten steel composition, raising steelmaking production efficiency, reducing steelmaking cost, and supplying qualified, high-quality molten steel to the continuous casting process.

Drawings

FIG. 1 is a feature-importance score chart for the molten steel component Al in the LF refining treatment of Example 1;

FIG. 2 is a feature-importance score chart for the molten steel component Si in the LF refining treatment of Example 1;

FIG. 3 is a feature-importance score chart for the molten steel component Mn in the LF refining treatment of Example 1;

FIG. 4 compares the measured and predicted Al contents in the initial stage of LF refining in Example 1;

FIG. 5 compares the measured and predicted Si contents in the initial stage of LF refining in Example 1;

FIG. 6 compares the measured and predicted Mn contents in the initial stage of LF refining in Example 1;

FIG. 7 is a flow chart of molten steel composition prediction in the initial stage of LF refining based on the XGBoost algorithm.

Detailed Description

The technical scheme of the invention is further described below with reference to the accompanying drawings and specific embodiments.

The invention provides an LF refined molten steel composition forecasting method based on the XGBoost algorithm which, as shown in FIG. 7, comprises the following steps:

S1, acquire the initial LF refining parameters: based on metallurgical principle analysis and the experience of field operators, preliminarily screen out the characteristic parameters affecting the molten steel composition in the initial LF stage, collect a large amount of actual production data, and integrate and store the data in a database;

S2, data preprocessing: remove duplicate records with Python's duplicate-removal function and null values with the dropna function; normalize the data with the z-score method to eliminate the effects of differing dimensions (units) and of the magnitude and spread of each variable; delete outliers with the 3σ outlier detection method;

S3, use Pearson correlation coefficients to characterize the correlation among the features, and eliminate features that contribute little to model training;

S4, construct the XGBoost model with the training set:

the preprocessed data are randomly divided into a training set and a test set; the XGBoost model comprises three models, for Si, Mn and Al, whose training parameters are set as follows:

the booster is set to the tree model gbtree;

the maximum tree depth max_depth is set to 6;

the number of gbtree estimators (boosted trees) is set to 900 in the Si and Mn models and 1000 in the Al model;

construct the XGBoost model in a Python environment to establish the relationship between the input features and the output feature; use the plot_importance function of the xgboost module to rank the input features by their contribution to the output, and finally eliminate features whose contribution score for the target index is less than 20;

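S5, calculate the evaluation indices MAE and R²:

Compare the predicted values of the XGBoost model with the actual values, using the mean absolute error MAE and R² as evaluation criteria for the quality of the predictions. The formulas are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RSS} = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$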

S6, verify the XGBoost model with the test set: if, in step S5, the MAE of the Si content model is less than 0.015, the MAE of the Mn content model is less than 0.023, the MAE of the Al content model is less than 0.003, and R² is greater than 0.98 for all of them, the forecasting models pass evaluation and the method proceeds to step S7; otherwise they fail and steps S2 to S5 are repeated;

S7, use the prediction results: predict the molten steel composition with the qualified XGBoost model, and end;

S8, after the molten steel composition has been predicted with the qualified XGBoost model, collect actual production data from a subsequent period, add them to the sample database, and then repeat steps S2 to S7.
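The screening rule in step S4, which keeps features scoring at least 20 and ranks them, can be isolated from the plotting. The following is a minimal sketch that assumes the scores arrive as a dict of feature name to score. The specification does not say which score it uses. One plausible source is booster.get_score(importance_type="weight"), which returns the "weight" (F score) importance that xgboost.plot_importance draws by default. Features never used in a split are absent from that dict and so are dropped implicitly. The helper name select_by_importance is hypothetical.

```python
def select_by_importance(scores: dict, threshold: float = 20) -> list:
    """Step S4 screening sketch: return feature names whose importance
    score is at least `threshold`, ordered from most to least important."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]
```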

Example 1

In this example, the Al content of the LF refined molten steel is predicted using actual production data from the No. 2 Steelmaking Plant of Jiangsu Yongsteel Group Company, covering January 1, 2019 to January 1, 2023. The steps are as follows:

S1, integrate 2020 sets of data collected on site and store them in a database, and preliminarily screen out parameters possibly related to the Al content of the refined molten steel in the LF furnace, i.e. the input feature parameters, including: LF furnace start-up temperature, furnace cover age, ladle temperature, smelting period, total oxygen amount, molten iron weight, aluminum block addition amount, net promoting dosage, high-purity ferrosilicon addition amount, converter end-point C content, converter end-point S content, post-argon Mn content, post-argon Mo content, etc.;

S2, remove duplicate records with Python's duplicate-removal function and null values with the dropna function; normalize the data with the z-score method to eliminate the effects of differing dimensions (units) and of the magnitude and spread of each variable; delete outliers with the 3σ outlier detection method, leaving 2000 sets of data;

S4, construct the Al model with the training set:

the preprocessed data are randomly divided into 1600 training samples and 400 test samples, and the training parameters are set as follows:

the booster is set to the tree model gbtree;

the maximum tree depth max_depth is set to 6;

the learning rate eta is 0.01;

the number of gbtree estimators (boosted trees) is 1000;

construct the Al model in a Python environment to establish the relationship between the input features and the output feature; use the plot_importance function of the xgboost module to rank the input features by their contribution to the output, eliminate features whose contribution score for the target index is less than 20, and select the final feature parameters. FIG. 1 shows the top 30 input features by importance for the Al model, including: aluminum pellet addition amount, furnace cover age, LF furnace entry temperature, post-argon S content, ladle temperature, smelting period, weight of molten steel tapped from the converter, post-argon Si content, total oxygen content of the molten steel, nitrogen consumption, LF furnace age, argon station arrival time, tapping duration, slag recovery value, converter end-point S content, post-argon Mo content, post-argon Mn content, converter end-point C content, post-argon Nb content, post-argon V content, proportion of aluminum in the molten steel, steam recovery amount, temperature difference between converter tapping and LF furnace entry, actual slag MgO content, lance tip age, molten iron temperature, difference in Ni content between the molten iron and the post-argon steel, recovered heat value, and post-argon Cr content;

S5, calculate the evaluation indices MAE and R²:

Compare the predicted values of the Al model with the actual values, using the mean absolute error MAE and R² as evaluation criteria for the quality of the predictions. The formulas are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RSS} = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where n and m are the numbers of predicted samples, both taken as 2000; ŷ_i is the model's predicted value; y_i is the actual value; ȳ is the mean of the actual values; and RSS is the residual sum of squares;

the predicted values of the Al model are compared with the actual values in FIG. 4. With the number of estimators of the Al model set to 1000, the calculated MAE is 0.002345 and R² reaches 0.983041;

S6, verify the XGBoost model with the test set: the MAE of the Al model is below the set value of 0.003 and its R² is above the set value of 0.98, so the Al model passes evaluation and the method proceeds to step S7;

S7, use the prediction results: predict the Al content of the molten steel with the qualified Al model, and end;

S8, after the Al content of the molten steel has been predicted with the qualified Al model, collect actual production data from a subsequent period, add them to the sample data, store them in the database, and then repeat steps S2 to S7.
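The two S5 indices used in this example can be computed in a few lines of NumPy. The following sketch implements the standard definitions, MAE as the mean of |ŷ_i - y_i| and R² as 1 - RSS divided by the total sum of squares. The same quantities are also available as mean_absolute_error and r2_score in scikit-learn's metrics module. The function names here are illustrative.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: (1/n) * sum |y_hat_i - y_i|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - RSS / sum (y_i - y_bar)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - rss / tss)
```

As a toy check, for actual values [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8], the absolute errors average to about 0.15. RSS is 0.10 against a total sum of squares of 5, so R² is about 0.98.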

Example 2

The Si and Mn contents of the refined molten steel in the LF furnace are predicted by the same method as in Example 1:

S1, integrate 43147 sets of data collected on site and store them in a database, and preliminarily screen out parameters possibly related to the Si and Mn contents of the refined molten steel in the LF furnace, i.e. the input feature parameters, including: LF furnace start-up temperature, furnace cover age, ladle temperature, smelting period, total oxygen amount, molten iron weight, aluminum block addition amount, net promoting dosage, high-purity ferrosilicon addition amount, converter end-point C content, converter end-point S content, post-argon Mn content, post-argon Mo content, etc.;

S2, remove duplicate records with Python's duplicate-removal function and null values with the dropna function; normalize the data with the z-score method to eliminate the effects of differing dimensions (units) and of the magnitude and spread of each variable; delete outliers with the 3σ outlier detection method, leaving 43053 sets of data;

S4, construct the Si and Mn models with the training set:

for both the Si and Mn models, the 43053 samples are randomly divided into 34296 training samples and 8757 test samples, and the training parameters of the two models are set as follows:

the booster is set to the tree model gbtree;

the maximum tree depth max_depth is set to 6;

the learning rate eta is 0.03;

the number of gbtree estimators (boosted trees) is 900;

construct the XGBoost models in a Python environment to establish the relationship between the input features and the output feature; use the plot_importance function of the xgboost module to rank the input features by their contribution to the output, and finally eliminate features whose contribution score for the target index is less than 20. The resulting feature rankings are shown in FIGS. 2 and 3: FIG. 2 shows the top 30 input features by importance for the Si model, including post-argon Si content, LF furnace age, pre-argon Si content, post-argon S content, etc.; FIG. 3 shows the top 30 input features by importance for the Mn model, including post-argon Mn content, pre-argon Mn content, post-argon C content, post-argon S content, etc.;

S5, calculate the evaluation indices MAE and R²:

The indices are calculated with the same formulas as in Example 1, with n and m both equal to 43053. The predicted values of the Si and Mn models are compared with the actual values in FIGS. 5 and 6. With the number of estimators of both models set to 900, the calculated MAE is 0.014151 for the Si model and 0.022906 for the Mn model, and R² reaches 0.988921;

S6, verify the XGBoost models with the test set: the MAE of the Si model is below the set value of 0.015, the MAE of the Mn model is below the set value of 0.023, and the R² of both models is above the set value of 0.98, so the Si and Mn models pass evaluation and the method proceeds to step S7;

S7, use the prediction results: predict the Si and Mn contents of the molten steel with the qualified Si and Mn models respectively, and end;

S8, after the Si and Mn contents of the molten steel have been predicted with the qualified Si and Mn models, collect actual production data from a subsequent period, add them to the sample data, store them in the database, and then repeat steps S2 to S7.
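Both examples split the cleaned data at random, into 1600/400 samples in Example 1 and 34296/8757 in Example 2, without stating how. The following is a minimal NumPy sketch of such a split. The fixed seed is a hypothetical choice for reproducibility, and the function name random_split is illustrative.

```python
import numpy as np


def random_split(n_samples: int, n_test: int, seed: int = 0):
    """Randomly partition row indices 0..n_samples-1 into (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)  # seed is an assumption, not from the patent
    perm = rng.permutation(n_samples)  # shuffled indices, each used exactly once
    return perm[n_test:], perm[:n_test]
```

For example, random_split(2000, 400) yields 1600 training and 400 test indices, and random_split(43053, 8757) leaves 34296 for training. scikit-learn's train_test_split with an integer test_size gives an equivalent partition.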

The LF refined molten steel composition forecasting model therefore achieves high overall forecasting accuracy and provides useful guidance for actual production on the line. As the volume of data continues to grow and redundant outliers and noise are removed, the accuracy of the forecasting model increases further.

The above embodiments illustrate the technical concept and features of the present invention and are intended to enable those skilled in the art to understand and implement it; they do not limit the scope of protection of the invention. All equivalent changes or modifications made in accordance with the spirit of the present invention shall fall within its scope of protection.

Claims

1. An LF refined molten steel composition forecasting method based on an XGBoost algorithm is characterized by comprising the following steps:

S5: the mean absolute error MAE between the predicted and actual values of the XGBoost model is used as the main evaluation criterion, reflecting the deviation of the predictions from the actual values, and R² is used as an auxiliary evaluation criterion, reflecting the goodness of fit of the model; the formulas are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RSS} = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

wherein: n and m are the numbers of predicted samples; ŷ_i is the model's predicted value; y_i is the actual value; ȳ is the mean of the actual values; RSS is the residual sum of squares;

S6: if the MAE and R² values from step S5 both fall within their preset ranges, the XGBoost model passes evaluation and the method proceeds to step S7; otherwise the model fails and steps S2 to S5 are repeated;

S7: predict the molten steel composition with the qualified XGBoost model;

wherein the data preprocessed in step S3 are randomly divided into a training set and a test set, and the XGBoost model comprises three models, for Si, Mn and Al, whose training parameters are set as follows:

the booster is set to the tree model gbtree;

the maximum tree depth max_depth is set to 6;

2. The LF refined molten steel composition forecasting method based on the XGBoost algorithm according to claim 1, wherein the MAE threshold is less than 0.015 for the Si model, less than 0.023 for the Mn model and less than 0.003 for the Al model, and R² is greater than 0.98 for all three models.

3. The LF refined molten steel composition forecasting method based on the XGBoost algorithm according to claim 1, characterized in that the method further comprises the following steps: