CN117272782A - Hot-rolled steel mechanical property prediction method based on adaptive multi-branch depthwise separable convolution - Google Patents

Hot-rolled steel mechanical property prediction method based on adaptive multi-branch depthwise separable convolution

Info

Publication number
CN117272782A
CN117272782A (application CN202310986868.2A; granted as CN117272782B)
Authority
CN
China
Prior art keywords
convolution
channel
data
model
bottleneck
Prior art date
Legal status
Granted
Application number
CN202310986868.2A
Other languages
Chinese (zh)
Other versions
CN117272782B (en)
Inventor
张其文
郭荣平
唐兴昌
郭欣欣
王义超
Current Assignee
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202310986868.2A
Priority claimed from CN202310986868.2A
Publication of CN117272782A
Application granted
Publication of CN117272782B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06F 2119/14 — Force analysis or force optimisation, e.g. static or dynamic forces
    • Y02P 90/30 — Computing systems specially adapted for manufacturing


Abstract

The invention discloses a method for predicting the mechanical properties of hot-rolled steel based on adaptive multi-branch depthwise separable convolution, and relates to the technical field of hot-rolled steel mechanical property prediction. On the one hand, to address the weak feature-representation capability of the raw data, a Gramian Angular Field (GAF) method is adopted to convert the original one-dimensional data into a two-dimensional matrix, enriching the correlation features among the data, so that the model can extract more effective feature information and train better, thereby improving performance. On the other hand, to improve model accuracy while meeting the high real-time requirements of a compact task schedule, a module combining multi-branch depthwise separable convolution (MB-DSC) with an adaptive bottleneck layer (AB) is provided: the feature-extraction capability is improved by introducing an enhanced multi-branch structure on top of depthwise separable convolution, model performance is improved while the model is kept lightweight, and the relationship between accuracy and light weight is balanced, so that a lightweight high-precision model is constructed.

Description

Hot-rolled steel mechanical property prediction method based on adaptive multi-branch depthwise separable convolution
Technical Field
The invention relates to the technical field of hot-rolled steel mechanical property prediction, in particular to a method for predicting the mechanical properties of hot-rolled steel based on adaptive multi-branch depthwise separable convolution.
Background
With the development and progress of the times, requirements on the mechanical properties of hot-rolled steel have become increasingly strict; industrial products in industries such as construction, bridges and transportation all depend on the excellent mechanical properties of hot-rolled steel.
At present, in the smelting process of hot-rolled steel, different process parameters and chemical-component contents change the microstructure of the crystal phases in different ways, and this microstructure directly determines the mechanical properties of the hot-rolled steel. The hot-rolling process is shown in fig. 1. Important mechanical properties of hot-rolled steel include Yield Strength (YS), Tensile Strength (TS) and Elongation (EL). YS is the stress at which the metal undergoes plastic deformation under external stress, also called the yield limit when the yield phenomenon occurs. TS is the maximum tensile stress the steel withstands before breaking under tension. EL is the percentage of the elongated length relative to the original sample length when a steel sheet is stretched by an external force until it breaks.
The factors influencing these performance indexes fall into two classes: process parameters and chemical components. Among the process parameters, the furnace heating temperature (FT) has a certain influence on the start rolling temperature (SRT). A higher heating temperature increases the plasticity and hot workability of the steel, allowing a relatively low start rolling temperature. A lower start rolling temperature reduces recrystallization during hot deformation, thereby refining the grains and improving the strength and toughness of the steel. During the drop from the start rolling temperature to the finish rolling temperature, the mechanical properties may degrade under the influence of time and speed, and a finish rolling temperature (FRT) that is too high or too low further influences the coiling temperature (CT); the coiling temperature in turn affects the cooling rate, the phase-transformation behavior and the coiling process of the steel. In addition, the reduction ratio (Redn) also affects the finish rolling temperature: a higher reduction generates more heat, resulting in a relatively higher finish rolling temperature. Therefore, when selecting the finish rolling temperature, the limits of the reduction must be considered to avoid the loss of strength and toughness caused by excessive recrystallization and grain growth. Among the chemical components, their interrelationships have a non-negligible effect on the mechanical properties. Increasing the carbon (C) and silicon (Si) contents raises the yield strength and tensile strength of the steel but lowers the elongation and cold workability, and an excessively high carbon or silicon content may make the steel more brittle.
A proper manganese (Mn) content promotes the solid-solution strengthening of carbon, thereby improving the yield strength and tensile strength of the hot-rolled steel. Phosphorus (P) also exists in solid solution in steel and strengthens it, but when manganese forms a compound with phosphorus, the solid solubility of phosphorus is reduced, lowering its effective content. When sulfur (S) forms a compound with carbon, the solid solubility of carbon is lowered, leading to precipitation of carbon and the formation of brittle sulfides, further degrading the properties of the steel.
In the past, a random-sampling method or a physical-metallurgy model was used to measure the mechanical properties of each product batch. Such methods struggle to describe the nonlinear relationships of the production process, consume a great deal of time and labor, and, most importantly, depend on the personal experience of operators, so correct and effective conclusions cannot be guaranteed.
The prior application (publication No. CN 110472349A) provides a one-dimensional CNN model for mechanical property prediction in this field, and the prior application (publication No. CN 116307195A) uses sequential interpolation to enhance field data and performs rolling-width prediction with a CNN model. Conventional convolution is a basic local operation, as shown in fig. 2. Its core idea is to obtain an output feature map by computing while sliding a small window, called a convolution kernel, over the input data. This kernel is typically a small matrix whose size is much smaller than that of the input. The kernel and the input data are multiplied element-wise, the products are summed to a scalar value, and that scalar becomes one element of the output feature map. The moving window repeats this process until the entire input is traversed, producing the output feature map. Typically, a model convolves the input features with multiple convolution kernels to enrich the features.
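The sliding-window operation just described can be sketched in a few lines. This is an illustrative toy (names and sizes are my own, not from the patent), computing a "valid" cross-correlation on a single channel:

```python
# Minimal sketch of conventional convolution: slide a small kernel over
# the input, multiply element-wise, sum to a scalar per position.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each window produces one output element."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]  # 2x2 kernel, much smaller than the input
print(conv2d_valid(img, k))  # 4 sliding positions -> 2x2 feature map
```

Using several different kernels on the same input, as the text notes, simply stacks several such output maps.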
Convolution kernels play a key role in the convolution process. Kernels of different sizes and shapes can extract different features, which allows the network to learn more advanced feature expressions and inspires the optimization improvements that follow. However, compared with other image data, the feature representation of hot-rolled steel data remains unsatisfactory, which shows up in unreasonable data-enhancement methods. The primary purpose, when converting the original one-dimensional data into two-dimensional data, is to make the data better fit the feature-extraction process of a convolutional neural network, while introducing more correlation features to assist the learning and training of the network. For hot-rolling prediction problems that are widely deployed and tightly scheduled, constructing a lightweight model with excellent performance has important value for the production and use of hot-rolled steel.
However, the prior art proposes sequential interpolation to convert the original data into two-dimensional data for data enhancement. Its drawback is that the correlation features among the data are not fully considered: the original data are simply arranged in order, placed one by one into a spatial matrix of a specified size, so only the dimensionality changes, which contributes little to model performance. Meanwhile, the prior art applies a CNN model to the hot-rolling performance prediction problem, improving performance through the strong feature-extraction capability of CNNs. However, in actual production the hot-rolling process has high real-time requirements and limited front-end equipment resources, while the traditional CNN model is large and computationally heavy.
Disclosure of Invention
The invention addresses two problems in the prior art. On the one hand, in the field of hot-rolled steel mechanical property prediction, the data are complex and highly coupled; feeding them directly into a network ignores the correlation features among the influencing factors, leading to insufficient prediction accuracy. On the other hand, the hot-rolling process is widely deployed and tightly scheduled, with multiple prediction tasks and real-time requirements, so a lightweight model is needed given the limited computing resources of front-end equipment. Therefore, an adaptive multi-branch depthwise separable method for predicting the mechanical properties of hot-rolled steel is provided: the original one-dimensional data are converted into two-dimensional data with the Gramian angular field method, and data enhancement is realized by introducing data-correlation features, so that the model can extract more effective feature information and improve performance; and a depthwise separable convolution module combining a multi-branch enhanced convolution kernel with an adaptive bottleneck layer is provided to balance the relationship between model accuracy and light weight, so that a lightweight high-precision model is constructed.
In order to achieve the above object, the present invention provides the following technical solutions:
the invention provides a method for predicting mechanical properties of hot rolled steel based on self-adaptive multi-branch depth separable, wherein input parameters comprise 5 heat treatment parameters and 5 chemical components, the 5 heat treatment parameters comprise heating temperature in a furnace, start rolling temperature, finish rolling temperature, coiling temperature and compressibility, and the 5 chemical components comprise carbon, silicon, manganese, phosphorus and sulfur, and the method comprises the following steps:
S1, carrying out data enhancement on the original one-dimensional data to obtain a two-dimensional matrix, and dividing the data set;
S2, constructing an optimized and improved convolutional neural network model; the model uses a depthwise separable convolution operation to carry out channel-by-channel convolution and point-by-point convolution; the channel-by-channel convolution treats each single channel as a feature map and uses one convolution kernel per channel to extract its local information; the point-by-point convolution uses 1 × 1 kernels whose channel dimension matches the channel-by-channel output, integrating it into as many output feature maps as there are point-by-point kernels; in the channel-by-channel stage, a multi-branch convolutional neural network is constructed, where the branches use convolution kernels of different sizes and the multi-branch outputs are summed; in the point-by-point stage, the point-wise convolution is combined with an adaptive bottleneck layer, and a global average pooling operation adaptively adjusts the bottleneck coefficient through the channel weights;
S3, setting hyperparameters, training the optimized convolutional neural network model on the training set, and judging convergence through the loss function to obtain a trained model;
S4, predicting the test-set data with the trained model to obtain the hot-rolled steel mechanical property prediction results.
Further, in step S1, the data enhancement is performed with the Gramian angular field method, which comprises the steps of:
S11, performing normalization on the original one-dimensional data; assuming one-dimensional data X = {x_1, x_2, ..., x_N} of length N, normalize it to [−1, 1] as shown in formula (1):

x̃_i = ((x_i − max(X)) + (x_i − min(X))) / (max(X) − min(X))  (1)

where x_i is the data at the i-th position, x̃_i is its normalized value, and min(X) and max(X) are the minimum and maximum of X;
S12, representing the normalized data in polar coordinates, with the data value as the polar angle θ and the position of the value as the polar axis r, as shown in formula (2):

θ_i = arccos(x̃_i);  r_i = i / L  (2)

where i is the position index, L is the sequence length, r_i is the polar axis and θ_i the polar angle; after normalization x̃_i lies in [−1, 1], so θ_i lies in [0, π], and cos θ is monotonic on this interval;
S13, summing the angles of different positions to obtain the Gramian summation angular field, as shown in formula (3):

G_{i,j} = cos(θ_i + θ_j), i, j = 1, ..., N  (3)
further, the method is characterized in that in step S1, the enhanced data is represented by 9: the ratio of 1 is divided into a training set and a test set.
Further, the step S2 of constructing an optimized and improved convolutional neural network model includes the steps of:
S21, on the basis of CNN, performing the two convolution steps of channel-by-channel convolution and point-by-point convolution using a depthwise separable convolution operation; let the input feature map have size D_in × D_in × M and the output feature map size D_out × D_out × N, the dimensions corresponding to the height, width and channel number of the feature map, and let the convolution kernel have spatial size D_k × D_k; the channel-by-channel convolution uses M kernels of size D_k × D_k × 1 to produce a D_out × D_out × M intermediate feature map; the point-by-point convolution uses N kernels of size 1 × 1 × M to perform channel conversion on the intermediate feature map, generating a feature map of size D_out × D_out × N;
S22, on the basis of the depthwise separable convolution, constructing a multi-branch convolutional neural network in the channel-by-channel stage, where the branches use convolution kernels of different sizes, namely K × K, K × 1 and 1 × K, and the outputs of the three branches are summed;
S23, in the point-by-point stage, combining the point-wise convolution with an adaptive bottleneck layer: a global average pooling operation adaptively adjusts the bottleneck coefficient through the channel weights to realize adaptive bottleneck-layer optimization; the bottleneck coefficient sets the number of kernels, as a ratio, to a value smaller than the number of input channels, and after the bottleneck a point-convolution restoration operation recovers the number of channels;
S24, a Global Average Pooling (GAP) layer replaces the Fully Connected (FC) layer at the end, converting the feature map into a fixed-length vector; this replaces the FC weight parameters, significantly reduces the number of weights the network must learn, and makes the model lighter.
Further, in step S23, let the number of input feature-map channels be C_in, the number of output feature-map channels C_out, the number of bottleneck-layer channels C_b, and the bottleneck coefficient B. Without a bottleneck layer, the point-convolution parameter count is given by formula (4); with a bottleneck layer it is given by formula (5), where the bottleneck channel number follows formula (6). Substituting (6) into (5) gives formula (7); requiring P_2 < P_1 gives formula (8), which yields the condition (9) on B:

P_1 = C_in × C_out  (4)
P_2 = C_in × C_b + C_b × C_out  (5)
C_b = B × C_in  (6)
P_2 = B × C_in × (C_in + C_out)  (7)
B × C_in × (C_in + C_out) < C_in × C_out  (8)
B < C_out / (C_in + C_out)  (9)
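The inequality in formula (9) can be checked numerically. The following sketch uses illustrative channel counts (not values from the patent):

```python
# Hedged sketch of the parameter counts in formulas (4)-(9): point-wise
# convolution with and without a bottleneck of coefficient B.

def pointwise_params(c_in, c_out):
    return c_in * c_out                      # formula (4): P1

def bottleneck_params(c_in, c_out, b):
    c_b = int(b * c_in)                      # formula (6): C_b = B * C_in
    return c_in * c_b + c_b * c_out          # formula (5): P2

c_in, c_out = 256, 512                       # illustrative sizes
p1 = pointwise_params(c_in, c_out)
# Formula (9): P2 < P1 whenever B < C_out / (C_in + C_out)
threshold = c_out / (c_in + c_out)
print(round(threshold, 3))                            # about 0.667
print(bottleneck_params(c_in, c_out, 0.5) < p1)       # 0.5  < 0.667 -> True
print(bottleneck_params(c_in, c_out, 0.75) < p1)      # 0.75 > 0.667 -> False
```

So any bottleneck coefficient below the threshold strictly reduces the point-wise parameter count.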
further, step S3 includes the steps of:
S31, putting the two-dimensional matrix data enhanced by GAF into the multi-branch depthwise separable convolution; in the channel-by-channel stage of the depthwise separable convolution, performing enhanced convolution along three branch paths using kernels of sizes 3 × 3, 3 × 1 and 1 × 3; the number of convolution kernels grows as 64, 128, 256 and 512 with the number of convolution layers, for four depthwise separable convolution layers in total; the batch size is set to 220, ReLU is used as the nonlinear activation function, the model parameters are updated with an Adam optimizer at a learning rate of 0.001, and training runs for 30 iterations;
S32, in the point-by-point convolution stage, adopting a global average pooling operation to adaptively adjust the bottleneck coefficient through the channel weights, realizing adaptive bottleneck-layer optimization; the bottleneck coefficient sets the number of kernels, as a ratio, to a value smaller than the number of input channels;
S33, the loss function adopts the mean square error (MSE) shown in formula (10); the weight and bias parameters are updated through backpropagation to minimize the prediction error, and different evaluation indexes describe the prediction accuracy of the model;

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²  (10)
and S34, after training, saving weight bias to obtain a trained model structure.
Further, in step S33, different evaluation indexes are used, including the mean absolute error MAE, the root mean square error RMSE and the coefficient of determination R², as shown in formulas (11)-(13); the closer the errors are to 0 and the closer the coefficient of determination is to 1, the better the fitting effect of the model:

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|  (11)
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )  (12)
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²  (13)

where ŷ_i is the predicted value, y_i the corresponding actual value, and ȳ the mean of the actual values.
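For reference, the three indexes can be computed in plain Python as follows. This is an illustrative sketch with made-up sample values, not the patent's data:

```python
# Plain-Python versions of the evaluation metrics in formulas (11)-(13):
# mean absolute error, root mean square error, coefficient of determination.
import math

def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def r2(pred, actual):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

pred   = [2.5, 0.0, 2.0, 8.0]   # illustrative predictions
actual = [3.0, -0.5, 2.0, 7.0]  # illustrative ground truth
print(mae(pred, actual))         # 0.5
print(round(rmse(pred, actual), 4))
print(round(r2(pred, actual), 3))
```

Errors near 0 and R² near 1 indicate a good fit, matching the criterion stated above.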
Compared with the prior art, the invention has the beneficial effects that:
according to the self-adaptive multi-branch depth separable-based hot rolled steel mechanical property prediction method, on one hand, a Gram Angle Field (GAF) method is adopted to convert original one-dimensional data into a two-dimensional matrix aiming at the problem of weak original data characteristic representation capability, and correlation characteristics among data are enriched, so that the model can extract more effective characteristic information, better training is obtained, and further performance improvement is realized. On the other hand, in order to improve model precision when the high real-time requirement of task compactness is solved, a combination module of multi-branch depth separable convolution (Multi Branch Depthwise Separable Convolution, MB-DSC) and a self-adaptive bottleneck layer (AB) is provided, the feature extraction capability is improved, visual intuition is introduced on the basis of depth separation, the model performance is improved while the model is lightened, and the relationship between the model and the model is balanced, so that a lightweight high-precision model is constructed.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a flow chart of a hot rolling process.
Fig. 2 is a diagram of a standard convolution process.
Fig. 3 is a view of GAF data enhancement provided by an embodiment of the present invention, where a is original one-dimensional data and b is GAF two-dimensional data.
Fig. 4 is a diagram of a multi-branch depth separable convolution process according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of an adaptive bottleneck structure according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a multi-branch enhanced depth separable convolution model based on an adaptive bottleneck layer according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, it will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention. It is apparent that the described examples are only some embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of the present invention.
The invention provides a method for predicting the mechanical properties of hot-rolled steel based on adaptive multi-branch depthwise separable convolution, comprising a Gramian-angular-field data-enhancement method and a multi-branch depthwise separable model based on an adaptive bottleneck layer, to obtain better-performing predictions in a lightweight state. It comprises the following steps:
s1, carrying out data enhancement on original one-dimensional data to obtain a two-dimensional matrix with rich characteristics, and dividing a data set.
S2, constructing an optimized and improved convolutional neural network model.
S3, setting hyperparameters, training the model on the training set, and judging convergence through the loss function to obtain a trained model; the superiority and light weight of the model are demonstrated through several evaluation indexes.
S4, predicting the test-set data with the trained model to obtain improved results.
Specifically, step S1 includes:
s11, data preparation. The data used in the experiment are hot rolled steel production data in a wine steel group carbon steel sheet factory based on a real industrial environment, accord with international standards and have practical significance, and the obtained experimental result can be used in a real life production environment. The original data is 7898 pieces in total. The input parameters include 5 heat treatment parameters including in-furnace heating temperature (FT), start Rolling Temperature (SRT), finish Rolling Temperature (FRT), coiling Temperature (CT), compressibility (Redn), and 5 chemical components including carbon (C), silicon (Si), manganese (Mn), phosphorus (P), sulfur (S). The output parameters include three important mechanical properties, namely Yield Strength (YS), tensile Strength (TS) and Elongation (EL). The data distribution statistics of the minimum, maximum, mean and standard deviation of all input parameters in the original dataset are shown in table 1.
TABLE 1 Hot Rolling data distribution
S12, data enhancement using the Gramian Angular Field (GAF) method, comprising the steps of:
s121, performing normalization operation on the original one-dimensional data. Assume that there is one-dimensional data x= { X 1 ,x 2 ,...,x N Normalized one-dimensional data to [ -1,1 } length N]As shown in formula (1), wherein x i Data for the ith position, x i Is the value of the position after normalization. min (X) and max (X) represent the minimum and maximum values of X, respectively.
S122, the normalized data is represented using polar coordinates. Wherein the value of the one-dimensional data is taken as a polar angle theta, and the value of the position where the corresponding value is located is taken as a polar axis r. As shown in formula (2), wherein i is a position value, L is a position length, r i For its polar axis, θ i For its polar angle, the numerical relationship is preserved. Converting the recalibrated raw data onto a polar coordinate system, the interaction relationship between the components of the raw vector can be identified from an angular perspective by the triangles and/or differences between each point.
After normalization, x i Is within the range of [ -1,1]So theta is i The value range of (2) is [0, pi ]]And cos θ is monotonic in this interval, a unique mapping result is produced in polar coordinates.
And S123, summing the different position angles to obtain the gram and the angle field (Gramian summation angular field, GASF) as shown in the formula (3).
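Formulas (1)-(3) can be sketched as one short function. This is an illustrative implementation of the GASF construction, not the patent's code:

```python
# Minimal GASF sketch: rescale to [-1, 1] (formula 1), map to polar
# angles via arccos (formula 2), form the matrix cos(theta_i + theta_j)
# (formula 3).
import math

def gasf(series):
    lo, hi = min(series), max(series)
    # formula (1): rescale each x_i into [-1, 1]
    scaled = [((x - hi) + (x - lo)) / (hi - lo) for x in series]
    # formula (2): theta_i = arccos(x_i), unique on [0, pi]
    theta = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    # formula (3): G[i][j] = cos(theta_i + theta_j)
    return [[math.cos(ti + tj) for tj in theta] for ti in theta]

g = gasf([1.0, 2.0, 3.0])   # 3 values -> 3x3 matrix
print(len(g), len(g[0]))    # 3 3
print(round(g[0][0], 3))    # cos(pi + pi) = 1.0
```

A length-N series thus becomes an N × N matrix whose (i, j) entry encodes the angular correlation between positions i and j.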
As shown in fig. 3, data enhancement makes the inconspicuous feature relationships among the original one-dimensional data visible: the 10 originally one-dimensional input values are converted into a 10 × 10 matrix, yielding a rich, clearly distinguished set of original features and correlation features. This provides a more accurate and comprehensive data basis for the subsequent learning and training of the CNN. It should be noted that this data-enhancement step is performed only once, during the data-acquisition phase, and does not add to the complexity of the model. Applying the method can bring important help and improvement to process control, quality optimization and other aspects of hot-rolled steel production.
S13, performing training-set/test-set segmentation on the enhanced data, dividing it into a training set and a test set at a ratio of 9:1.
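A minimal sketch of such a 9:1 split follows. The shuffling and seed are my own assumptions for reproducibility; the patent does not specify them:

```python
# Illustrative 9:1 train/test split (step S13). Shuffle order and seed
# are assumptions, not specified by the patent.
import random

def split_dataset(samples, train_ratio=0.9, seed=42):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

data = list(range(7898))      # the description reports 7898 raw records
train, test = split_dataset(data)
print(len(train), len(test))  # 7108 790
```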
Further, step S2 constructs an optimized and improved convolutional neural network model, comprising the steps of:
S21, the channel dimension and the spatial-position features of a feature map can be decoupled; therefore, a depthwise separable convolution operation is used on the basis of CNN, achieving the same feature-extraction purpose as conventional convolution through the two steps of channel-by-channel convolution and point-by-point convolution, while effectively reducing the computation parameters and thereby achieving the goal of a lightweight model.
The process of depthwise separable convolution is illustrated in detail in fig. 4. The channel-by-channel convolution treats each single channel as a feature map and uses one convolution kernel per channel to extract that channel's local information. Unlike the full-channel convolution in conventional convolution, channel-by-channel convolution greatly reduces the number of convolution parameters. However, since the feature information obtained channel by channel is relatively independent, feature information across different channels is not extracted. Therefore, a 1 × 1 convolution kernel with the same channel dimension as the channel-by-channel output is used to integrate the channel information into as many output feature maps as there are point-by-point kernels.
With input and output feature maps of the same sizes, the parameter counts and computation of the two convolution methods can be compared. Let the input feature map be of size Din × Din × M and the output feature map of size Dout × Dout × N, where the dimensions correspond to the height, width and number of channels of the feature map, and let the convolution kernel be of size Dk × Dk. The standard convolution then occupies Dk × Dk × M × N parameters, with a computation count of Dk × Dk × M × Dout × Dout × N. For the depthwise separable convolution, the channel-by-channel stage uses M kernels of size Dk × Dk × 1 to produce a Dout × Dout × M intermediate feature map, and the point-by-point stage uses N kernels of size 1 × 1 × M on that intermediate map to perform channel conversion and produce a feature map of size Dout × Dout × N. Its total parameter count is Dk × Dk × 1 × M + 1 × 1 × M × N, with a computation count of Dk × Dk × Din × Din × M + 1 × 1 × M × Dout × Dout × N. Comparing the parameter and computation counts gives the ratios of formula (4) and formula (5). Since Dk and N are both greater than 1, the depthwise separable convolution has fewer parameters and less computation than the standard convolution.
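The parameter comparison above can be checked numerically. The sketch below (illustrative, not part of the patent; the function name and example sizes are ours) computes both counts and the ratio 1/N + 1/Dk², which is the parameter-count ratio that formulas (4)–(5) express.

```python
def conv_params(Dk, M, N):
    """Parameter counts for standard vs. depthwise separable convolution.

    Standard:  Dk * Dk * M * N              (full-channel kernels)
    Separable: Dk * Dk * 1 * M + 1 * 1 * M * N  (channelwise + pointwise)
    """
    standard = Dk * Dk * M * N
    separable = Dk * Dk * 1 * M + 1 * 1 * M * N
    return standard, separable

std, sep = conv_params(Dk=3, M=64, N=128)
# the parameter ratio separable/standard equals 1/N + 1/Dk**2,
# which is well below 1 whenever Dk > 1 and N > 1
print(std, sep, sep / std)  # 73728 8768 ≈ 0.1189
```

With a 3×3 kernel and 128 output channels the separable form needs roughly 12% of the standard parameters, which is the lightweighting the text claims.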
S22, on the basis of the depthwise separable convolution, the channel-by-channel stage is improved with a multi-branch convolution kernel structure in order to raise performance while keeping the model lightweight, enriching the feature relationships of the separable kernels and improving prediction performance. Because convolution is additive, two-dimensional kernels of compatible sizes applied to the same input with the same operation produce outputs of the same size, and kernels of compatible sizes can be added at corresponding positions to obtain a single equivalent kernel producing the same output features. A multi-branch depthwise separable convolution (Multi Branch Depthwise Separable Convolution, MB-DSC) is therefore constructed in which the branches use kernels of different sizes, K × K, K × 1 and 1 × K, and the three branch outputs are summed to enrich the feature space. Thanks to this additivity, a fully trained model can be deployed with the same response time as a single-path convolutional neural network; model performance is thus improved without compromising the lightweight depthwise separable model.
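The additivity claim above can be demonstrated directly: summing the outputs of K×K, K×1 and 1×K branches equals a single convolution with the three kernels zero-padded to K×K and added. The NumPy sketch below is ours (a naive 'same'-padding cross-correlation, not the patent's implementation), using K = 3 on a single channel.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2-D 'same' cross-correlation of a single-channel map with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k33 = rng.standard_normal((3, 3))
k31 = rng.standard_normal((3, 1))
k13 = rng.standard_normal((1, 3))

# three-branch output: sum of the 3x3, 3x1 and 1x3 convolutions
y_branches = conv2d_same(x, k33) + conv2d_same(x, k31) + conv2d_same(x, k13)

# equivalent single kernel: embed the asymmetric kernels into a 3x3 and add
fused = k33.copy()
fused[:, 1:2] += k31   # the 3x1 kernel occupies the middle column
fused[1:2, :] += k13   # the 1x3 kernel occupies the middle row
y_fused = conv2d_same(x, fused)

print(np.allclose(y_branches, y_fused))  # True
```

This is exactly why a deployed model keeps single-path response time: after training, the three branch kernels collapse into one equivalent kernel.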
S23, although the depthwise separable convolution module significantly reduces the parameters and computation of the model, in practice we want important features to be learned more thoroughly, so that the model reaches higher accuracy, while less important features are learned only lightly, further reducing parameter occupation and time cost. An adaptive bottleneck layer (Adaptive Bottleneck, AB) is therefore combined with the multi-branch depthwise separable convolution module, balancing the trade-off between model lightness and accuracy.
The bottleneck layer is a layer whose intermediate feature map has a small number of channels, like the neck of a bottle; it reduces computational complexity and parameter count while retaining a certain model expressive capability. Squeezing the data through this narrower layer effectively discards unimportant features and reduces compression cost. The invention uses a global average pooling operation to adaptively adjust the size of the bottleneck coefficient (Bottleneck coefficient) according to the weight magnitudes, realizing the optimization of the adaptive bottleneck layer: the bottleneck coefficient sets the number of kernels, as a ratio, to a value smaller than the number of input channels. To avoid losing detail information in the dimension-reduction operation, a point convolution after the bottleneck performs the restoration operation and recovers the number of channels; this introduces additional nonlinear activation functions and captures richer feature representations, benefiting the performance and expressive power of the network. As formulas (4) and (5) show, the point convolution, which integrates channel information, accounts for more computation than the channel-by-channel convolution, so the adaptive bottleneck layer is combined with the depthwise separable convolution on the point-by-point side.
P1 = Cin × Cout (6)
P2 = Cin × Cb + Cb × Cout (7)
Cb = B × Cin (8)
Let Cin be the number of channels of the input feature map, Cout the number of channels of the output feature map, Cb the number of channels of the bottleneck layer, and B the bottleneck coefficient. Without the bottleneck layer the point-convolution parameter count is given by formula (6); with it, by formula (7), where the number of bottleneck channels is given by formula (8). Substituting (8) into (7) gives formula (9), P2 = B × Cin × (Cin + Cout); requiring P2 smaller than P1 gives formula (10), B × Cin × (Cin + Cout) < Cin × Cout; and cancelling Cin yields formula (11), B < Cout / (Cin + Cout). When this condition holds, adding the point-convolution module with the adaptive bottleneck layer yields a lighter model; otherwise it yields a more accurate one.
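The parameter counts and the threshold B < Cout / (Cin + Cout) can be verified with a few lines of Python. This sketch is ours (illustrative channel counts, integer-truncated Cb as a simplifying assumption):

```python
def pointwise_params(c_in, c_out, b=None):
    """Pointwise-convolution parameter count, with or without a bottleneck.

    Without a bottleneck: P1 = Cin * Cout           (formula (6)).
    With a bottleneck of Cb = B * Cin channels:
    P2 = Cin * Cb + Cb * Cout                       (formulas (7)-(8)).
    """
    if b is None:
        return c_in * c_out
    c_b = int(b * c_in)
    return c_in * c_b + c_b * c_out

c_in, c_out = 128, 256
p1 = pointwise_params(c_in, c_out)
# the bottleneck saves parameters exactly when B < Cout / (Cin + Cout)
threshold = c_out / (c_in + c_out)            # 2/3 here
p2 = pointwise_params(c_in, c_out, b=0.5)     # 0.5 < 2/3, so p2 < p1
print(p1, p2, threshold)  # 32768 24576 0.666...
```

A coefficient above the threshold (e.g. B = 0.8 here) would instead enlarge the parameter count, which is the "otherwise, a more accurate model" branch of the text.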
The transfer of bottleneck coefficients in the adaptive bottleneck structure is illustrated in fig. 6. The bottleneck coefficients are adaptively adjusted according to the weights: layers with larger weights have their bottleneck widened for further feature-extraction learning, improving model performance, while layers with smaller weights have their bottleneck narrowed, reducing model parameters, accelerating overall training and improving the generalization capability of the model. The lightness and performance of the model are thus balanced.
S24, a global average pooling layer replaces the traditional fully connected layer at the end of the network, converting the feature map into a fixed-length vector in place of the large number of weight parameters in the fully connected layer. This markedly reduces the number of weights to be learned, lowers the risk of overfitting, and reduces the computation and storage requirements of the model, keeping it lightweight. It also preserves important spatial information, making the model less sensitive to target position and more robust. The result is the multi-branch enhanced depthwise separable convolution model with adaptive bottleneck layer shown in fig. 6.
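The saving from global average pooling is easy to quantify. The sketch below uses illustrative sizes of our choosing (a 512-channel 7×7 map feeding a 1000-unit layer), not figures from the patent:

```python
import numpy as np

# A 512-channel 7x7 feature map flattened into a 1000-unit fully connected
# layer needs 512*7*7*1000 weights; pooling each channel to a single value
# first leaves only 512*1000 weights in the final layer.
fmap = np.random.rand(512, 7, 7)
gap = fmap.mean(axis=(1, 2))   # fixed-length vector, one value per channel
print(gap.shape)               # (512,)
print(512 * 7 * 7 * 1000, 512 * 1000)  # 25088000 vs 512000 weights
```

Because the pooled value is an average over the whole spatial extent, it is also invariant to where in the map a feature fires, which is the position-insensitivity the text mentions.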
S3, super parameters are set, the model is trained on the training set, and convergence is judged through the loss function to obtain a trained model. Several evaluation-index formulas demonstrate the superiority and lightness of the model.
Next, step S3 sets super parameters, and performs model learning on the training set, including the following steps:
S31, the two-dimensional matrix data enhanced by GAF are fed into the multi-branch depthwise separable convolution. In the channel-by-channel stage, three branch paths perform the enhanced convolution with kernels of size 3×3, 3×1 and 1×3, respectively. The number of kernels grows with depth as 64, 128, 256 and 512 over four depthwise separable convolution layers in total. The batch size is set to 220, ReLU is used as the nonlinear activation function, training runs for 30 iterations, and the model parameters are updated with an Adam optimizer at a learning rate of 0.001.
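For reference, a single Adam update with the learning rate named above (0.001) can be sketched in NumPy. This is the standard Adam rule, not code from the patent; the toy weights and gradient are ours.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update step (the optimizer and learning rate used in S31)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
grad = np.array([0.5, -0.5])
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)  # each weight moves by ~lr opposite the gradient sign
```

On the first step the bias-corrected moments make the update magnitude approximately equal to the learning rate regardless of gradient scale, which is why Adam tolerates the un-tuned initial gradients of a freshly initialized CNN.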
S32, in the point-by-point convolution stage, a global average pooling operation adaptively adjusts the size of the bottleneck coefficient (Bottleneck coefficient) according to the weight magnitudes, realizing the adaptive bottleneck layer optimization; the bottleneck coefficient sets the number of kernels, as a ratio, to a value smaller than the number of input channels.
S33, the loss function adopts the mean square error (Mean Square Error, MSE) shown in formula (12), and the weight and bias parameters are updated through back propagation to minimize the prediction error. Meanwhile, different evaluation indexes, the mean absolute error (Mean Absolute Error, MAE) and root mean square error (Root Mean Square Error, RMSE) shown in formulas (13) and (14) and the coefficient of determination (R2) shown in formula (15), better describe the model's prediction accuracy: the closer the errors are to 0 and the coefficient of determination to 1, the better the fitting effect of the model. Here y_i denotes the predicted data; the remaining symbols in the formulas denote the corresponding actual data and the average of the actual data.
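The four indices named above have standard definitions, which the NumPy sketch below implements (ours, as an illustrative reconstruction of formulas (12)–(15); the sample values are made up):

```python
import numpy as np

def regression_metrics(y_pred, y_true):
    """MSE, MAE, RMSE and R^2 for a regression model's predictions."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)               # mean square error, formula (12)
    mae = np.mean(np.abs(err))            # mean absolute error, formula (13)
    rmse = np.sqrt(mse)                   # root mean square error, formula (14)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot              # coefficient of determination, formula (15)
    return mse, mae, rmse, r2

mse, mae, rmse, r2 = regression_metrics([2.5, 0.0, 2.1], [3.0, -0.5, 2.0])
print(mse, mae, rmse, r2)
```

As the text says, errors near 0 and R² near 1 indicate a good fit; R² compares the residual error against the variance of the actual data, so it is scale-free.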
S34, after training, the weights and biases are saved, giving a trained model structure for subsequent direct use.
S4, the trained model predicts on the test set data. The evaluation indices of step S33 again show that good results are obtained in practical application.
To further verify the effectiveness of the model improvements, the effect of each component on overall performance was studied: the improvement modules were added step by step to determine each component's contribution to overall model performance.
The experimental results are shown in Table 2. Across several evaluation indices, the GAF method introduces more features than the one-dimensional CNN (1d-CNN) but increases the parameter burden of the model. The multi-branch depthwise separable module (Multi Branch Depthwise Separable Convolution, MB-DSC) performs more effective feature extraction, greatly improving performance while remaining lightweight. The adaptive bottleneck structure (AB) further balances performance against lightness, producing a model of acceptable scale that is easier to deploy on mobile configurations while maintaining high prediction accuracy.
Table 2 ablation experiments
The foregoing merely illustrates the preferred embodiments and principles of the present invention and does not limit it. Any modification, equivalent replacement, improvement, etc. made by those skilled in the art within the spirit and principle of the present invention, based on the ideas provided herein, shall fall within its protection scope.

Claims (7)

1. A method for predicting the mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution, characterized in that the input parameters comprise 5 heat treatment parameters and 5 chemical components, the 5 heat treatment parameters being furnace heating temperature, start rolling temperature, finish rolling temperature, coiling temperature and reduction ratio, and the 5 chemical components being carbon, silicon, manganese, phosphorus and sulfur, the method comprising the following steps:
s1, carrying out data enhancement on original one-dimensional data to obtain a two-dimensional matrix, and dividing a data set;
S2, constructing an optimized and improved convolutional neural network model; the optimized and improved convolutional neural network model uses depthwise separable convolution operations to carry out channel-by-channel convolution and point-by-point convolution; the channel-by-channel convolution treats each single channel as a feature map and extracts each channel's local information with its own convolution kernel; the point-by-point convolution uses 1×1 kernels spanning the same channel dimension as the channel-by-channel convolution and integrates the channel information into as many output feature maps as it has kernels; in the channel-by-channel convolution stage, a multi-branch convolutional structure is constructed in which each branch uses a convolution kernel of a different size and the branch outputs are summed; in the point-by-point convolution stage, an adaptive bottleneck layer is combined with the point-by-point convolution, using a global average pooling operation to adaptively adjust the bottleneck coefficient according to the weight magnitudes;
s3, setting super parameters, learning, training, optimizing and improving a convolutional neural network model on a training set, and judging through a loss function to obtain a trained model;
s4, predicting the test set data by using the trained model to obtain a hot rolled steel mechanical property prediction result.
2. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 1, characterized in that step S1 performs data enhancement by the Gramian Angular Field method, comprising the steps of:
S11, performing a normalization operation on the original one-dimensional data: assuming one-dimensional data X = {x1, x2, …, xN} of length N, the data are normalized to [−1, 1] as shown in formula (1):
where xi is the datum at the i-th position, its normalized value is given by formula (1), and min(X) and max(X) denote the minimum and maximum values in X respectively;
S12, representing the normalized data in polar coordinates, taking the value of the one-dimensional datum as the polar angle θ and the position of that value as the polar radius r, as shown in formula (2):
where i is the position index, L is the position length, ri is the polar radius and θi the polar angle; since the normalized xi lies in [−1, 1], θi takes values in [0, π], an interval on which cos θ is monotonic;
S13, summing the angles of different positions gives the Gramian Angular Field, as shown in formula (3):
3. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 1, characterized in that in step S1 the enhanced data are divided into a training set and a test set at a ratio of 9:1.
4. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 1, wherein the step S2 of constructing an optimized and improved convolutional neural network model comprises the steps of:
S21, performing the two convolution steps of channel-by-channel convolution and point-by-point convolution using a depthwise separable convolution operation on the basis of CNN; let the input feature map be of size Din × Din × M and the output feature map of size Dout × Dout × N, the dimensions corresponding to the height, width and number of channels of the feature map, and let the convolution kernel be of size Dk × Dk; the channel-by-channel convolution uses M kernels of size Dk × Dk × 1 to produce a feature map of size Dout × Dout × M; the point-by-point convolution uses N kernels of size 1 × 1 × M on the intermediate feature map to perform channel conversion and produce a feature map of size Dout × Dout × N;
S22, constructing a multi-branch structure on the channel-by-channel side of the depthwise separable convolution, each branch using a convolution kernel of a different size, namely K × K, K × 1 and 1 × K, and summing the outputs of the three branches;
S23, combining an adaptive bottleneck layer with the point-by-point convolution, using a global average pooling operation to adaptively adjust the size of the bottleneck coefficient according to the weight magnitudes to realize the adaptive bottleneck layer optimization; the bottleneck coefficient sets the number of kernels, as a ratio, to a value smaller than the number of input channels; after the bottleneck, a point convolution performs the restoration operation and recovers the number of channels;
S24, replacing the fully connected layer at the end with a global average pooling layer, converting the feature map into a fixed-length vector in place of the weight parameters of the fully connected layer.
5. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 4, wherein in step S23, let the number of input-feature-map channels be Cin, the number of output-feature-map channels Cout, the number of bottleneck-layer channels Cb, and the bottleneck coefficient B; without the bottleneck layer the point-convolution parameter count is given by formula (4), with it by formula (5), where the number of bottleneck channels is given by formula (6); substituting (6) into (5) gives formula (7), P2 = B × Cin × (Cin + Cout); when P2 is smaller than P1, formula (8), B × Cin × (Cin + Cout) < Cin × Cout, holds, which after cancellation yields formula (9), B < Cout / (Cin + Cout):
P1 = Cin × Cout (4)
P2 = Cin × Cb + Cb × Cout (5)
Cb = B × Cin (6)
6. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 1, wherein the step S3 comprises the following steps:
S31, feeding the two-dimensional matrix data enhanced by GAF into the multi-branch depthwise separable convolution; in the channel-by-channel stage, performing the enhanced convolution along three branch paths with kernels of size 3×3, 3×1 and 1×3 respectively, the number of kernels increasing as 64, 128, 256 and 512 over four depthwise separable convolution layers in total; setting the batch size to 220, using ReLU as the nonlinear activation function, updating the model parameters with an Adam optimizer at a learning rate of 0.001, and iterating 30 times;
s32, in a point-by-point convolution stage, adopting global average pooling operation to adaptively adjust the size of a bottleneck coefficient through weight magnitude so as to realize adaptive bottleneck layer optimization, and setting the number of cores to be a value smaller than the number of input channels in a ratio mode by the bottleneck coefficient;
S33, the loss function adopts the mean square error MSE shown in formula (10); the weight and bias parameters are updated through back propagation to minimize the prediction error, and different evaluation indexes describe the model's prediction accuracy;
and S34, after training, saving weight bias to obtain a trained model structure.
7. The method for predicting mechanical properties of hot rolled steel based on adaptive multi-branch depthwise separable convolution according to claim 1, wherein in step S33 different evaluation indexes are used, comprising the mean absolute error MAE, the root mean square error RMSE and the coefficient of determination R2, as shown in formulas (11)-(13); the closer the errors are to 0 and the coefficient of determination to 1, the better the fitting effect of the model;
where y_i denotes the predicted data; the remaining symbols in the formulas denote the corresponding actual data and the average of the actual data.
CN202310986868.2A 2023-08-07 Hot rolled steel mechanical property prediction method based on self-adaptive multi-branch depth separation Active CN117272782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310986868.2A CN117272782B (en) 2023-08-07 Hot rolled steel mechanical property prediction method based on self-adaptive multi-branch depth separation


Publications (2)

Publication Number Publication Date
CN117272782A true CN117272782A (en) 2023-12-22
CN117272782B CN117272782B (en) 2024-06-21



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472349A (en) * 2019-08-20 2019-11-19 武汉科技大学 A kind of hot-rolled steel performance prediction method based on EEMD and depth convolutional network
CN113033106A (en) * 2021-04-06 2021-06-25 东北大学 Steel material performance prediction method based on EBSD and deep learning method
US20230004781A1 (en) * 2020-12-04 2023-01-05 Northeastern University Lstm-based hot-rolling roll-bending force predicting method
CN116307195A (en) * 2023-03-22 2023-06-23 苏州大学 Strip steel finish rolling expansion prediction method and system based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHI-WEI XU 等: "Mechanical Properties Prediction for Hot Rolled Alloy Steel Using Convolutional Neural Network", 《IEEE》, vol. 7, 18 April 2019 (2019-04-18) *
胡石雄;李维刚;杨威;: "基于卷积神经网络的热轧带钢力学性能预报", 武汉科技大学学报, no. 05, 29 September 2018 (2018-09-29) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117892258A (en) * 2024-03-12 2024-04-16 沃德传动(天津)股份有限公司 Bearing migration diagnosis method based on data fusion, electronic equipment and storage medium
CN117892258B (en) * 2024-03-12 2024-06-07 沃德传动(天津)股份有限公司 Bearing migration diagnosis method based on data fusion, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Qiwen

Inventor after: Wang Yichao

Inventor after: Tang Xingchang

Inventor after: Guo Xinxin

Inventor after: Guo Rongping

Inventor before: Zhang Qiwen

Inventor before: Guo Rongping

Inventor before: Tang Xingchang

Inventor before: Guo Xinxin

Inventor before: Wang Yichao

GR01 Patent grant