CN116303626B - Well cementation pump pressure prediction method based on feature optimization and online learning

Info

Publication number: CN116303626B
Application number: CN202310558753.3A
Authority: CN
Inventors: 钟原; 杨建新; 周静; 李平; 张涛
Original assignee: Southwest Petroleum University
Current assignee: Southwest Petroleum University
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2023-08-04
Anticipated expiration: 2043-05-18
Also published as: CN116303626A

Abstract

The invention provides a well cementation pump pressure prediction method based on feature optimization and online learning, comprising the following steps: calculating the well diameter expansion rate and the in-well mass variation from construction data and well structure data, thereby expanding and optimizing the features; dividing the data into subsets by clustering, according to the changes in data distribution caused by different operations, constructing a differentiated decision tree base model for each subset, and integrating the base models by Stacking to obtain a pre-trained model; converting real-time operation data into features, assigning each sample to a category by online clustering to build sample subsets, updating an existing base model or constructing a new one online, and predicting the pump pressure; and, at the same time, training the integrated model online with the real pump pressure values. The method can predict the well cementation pump pressure for different blocks and different well types with high accuracy and good generalization, and can provide timely and accurate pump pressure values so that well cementation operations are carried out more efficiently and safely.

Description

Well cementation pump pressure prediction method based on feature optimization and online learning

Technical Field

The invention relates to the technical field of oil well cementation operation pump pressure prediction based on machine learning, in particular to an online integration method for predicting a well cementation pump pressure by utilizing optimized characteristics.

Background

The prediction and monitoring of pressure during well cementation operations is of great significance in petroleum engineering. In actual operation, excessive pressure or untimely flow control can lead to blowout or lost circulation, causing heavy losses of manpower and money. Because well cementation is a downhole operation, operators cannot observe the actual downhole conditions in time or respond promptly to downhole events. Pump pressure prediction for well cementation operations therefore has high requirements for both real-time performance and accuracy.

At present, the pump pressure for well cementation operations is mainly calculated by simulation with a mathematical model: the pump pressure is estimated by running the model on a large amount of data. This approach has several shortcomings. The factors influencing pump pressure are complex and strongly correlated over time, so the simulation results contain large errors. The calculation method is also not universal: well structures differ from well to well, and both the well diameter and the changes in fluid pressure in the well strongly influence the pump pressure, yet a general mathematical model considers only flow-related effects and accounts for factors such as well diameter incompletely. Moreover, downhole environments differ between regions, so no single general mathematical model can calculate the pump pressure accurately, and the value estimated by a general numerical model differs considerably from the actual value. The invention therefore provides a more widely applicable machine learning model that accurately predicts the changes in pump pressure during pumping and can effectively assist and guide the safe execution of well cementation operations.

Disclosure of Invention

The invention aims to solve the problems and provides a well cementation pump pressure prediction method based on feature optimization and online learning.

A well cementation pump pressure prediction method based on feature optimization and online learning is characterized in that real data in the well cementation operation process is subjected to feature processing and feature construction, an integrated learning model is introduced to carry out model construction and prediction on data with different relations, and the method comprises the following steps:

S1, online feature calculation;

S2, model pre-training;

S3, model online training and prediction.

Further, in the well cementation pump pressure prediction method based on feature optimization and online learning, the data used in the model pre-training stage comprise the well cementation wellhead pump pressure and a data set of related factors, the related factors comprising stage, density, flow, stage displacement and total displacement.

Further, a well cementation pump pressure prediction method based on feature optimization and online learning, wherein the step S1 comprises the following steps:

features are calculated online from the sensor information obtained during the actual well cementation operation and the corresponding well structure data; during online training a mathematical formula is used to obtain new related factors as auxiliary features, the calculation formula being as follows:

；

where the quantities in the formula are: the well diameter expansion rate; the position (annulus or inside the well); the average well diameter over the section that the working fluid has already covered; and the well diameter at the current position.

Further, a well cementation pump pressure prediction method based on feature optimization and online learning, wherein the step S1 comprises the following sub-steps:

S11: constructing the in-well mass variation: reading the real data and the corresponding well structure data; calculating, record by record, the total volume of fluid that has entered the well so far; calculating the mass of each fluid at each moment with a mathematical formula; and, because the inside of the well and the annulus are two different cases, calculating the variation separately for the inside of the well and for the annulus;

S12: constructing and optimizing real-time features;

S121: constructing the well diameter expansion rate: reading the real data and the corresponding well diameter structure data, and building from the well diameter structure data a dictionary that maps depth to well diameter;

S122: calculating, record by record, the total volume of fluid that has entered the well so far; calculating the height of the fluid in the well from this volume and the well structure information; obtaining the well diameter at that depth by querying the dictionary; and calculating the well diameter expansion rate with a mathematical formula;

S123: as with the in-well mass, because the inside of the well and the annulus are two different cases, calculating the well diameter expansion rate separately for the inside of the well and for the annulus;

S13: feature processing: because the data sets contain abnormal values, processing the data by filling in missing information and normalizing.

Further, a well cementation pump pressure prediction method based on feature optimization and online learning, wherein step S2 includes:

the adaptive data stream K-means algorithm is adopted as the online clustering algorithm to divide the data into categories, and its core formula is as follows:

$$c_{t+1} = \frac{c_t\, n_t\, \alpha + x_t\, m_t}{n_t\, \alpha + m_t}$$

where $c_{t+1}$ denotes the cluster center obtained by the new clustering; $c_t$ denotes the cluster center obtained by the previous clustering; $n_t$ denotes the number of data points currently assigned to this cluster; $x_t$ denotes the cluster center of the new batch of data; $m_t$ denotes the number of data points that the new batch adds to the cluster; and $\alpha$ denotes the forgetting factor;

the new cluster center is obtained in a manner similar to an exponentially weighted moving average, where the forgetting factor $\alpha$ reduces the influence of data that have already been clustered, and $\alpha$ takes a value in the range 0 to 1;

when $\alpha$ is 0, only the latest data are used in calculating the new cluster center;

when $\alpha$ is 1, all data that have occurred affect the clustering result.
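
The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `update_center` and its argument names are hypothetical, and cluster centers are represented as plain lists of floats.

```python
def update_center(old_center, old_count, batch_center, batch_count, forgetting=0.9):
    """Forgetting-factor update of one cluster center (illustrative sketch).

    new = (old_center * old_count * a + batch_center * batch_count)
          / (old_count * a + batch_count)

    old_center / batch_center: lists of floats (previous center, center of the new batch)
    old_count / batch_count:   number of points behind each center
    forgetting:                forgetting factor a in [0, 1] (default value is an assumption)
    """
    discounted = old_count * forgetting      # history weight after forgetting
    total = discounted + batch_count         # denominator of the update
    if total == 0:
        return list(old_center)              # nothing to average, keep the old center
    return [
        (c * discounted + x * batch_count) / total
        for c, x in zip(old_center, batch_center)
    ]
```

With `forgetting=1.0` the result is the plain average over all points seen so far; with `forgetting=0.0` the history is discarded and the new center equals the center of the latest batch.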

Further, a well cementation pump pressure prediction method based on feature optimization and online learning, wherein the step S2 comprises the following sub-steps:

S21: data stream clustering;

S211: in the data stream clustering process, the flow and the operation stage are used as the clustering basis, and the data are divided into different categories according to flow and stage;

S212: an online clustering method is adopted, and the adaptive data stream K-means (Adaptive Streaming K-Means) algorithm is used to cluster the well cementation operation data stream;

S213: the adaptive data stream K-means algorithm is divided into an initialization stage and a continuous clustering stage;

S2131: in the initialization stage, the data points pass through two phases: accumulation and determination of candidate centers;

the probability density function (PDF) obtained by applying kernel density estimation (KDE) to the data is used as the basis for selecting the number of clusters k, and a candidate center is determined for each region, where a region is the part of the PDF curve between two consecutive direction changes, and every direction change in the shape of the calculated PDF curve marks the beginning of a new region;

the number of candidate clusters k is the number of regions, and the candidate initial centers are the centers of the regions; clustering is run again for each of the resulting values k ∈ [kmin, kmin+kmax], the clustering results for the different values of k are compared, and the best k and the corresponding region centers are selected as the initial centers;

S2132: in the continuous clustering stage, concept drift detection is performed on the historical data and the current data, and the standard deviation and mean of the data stored during the execution of the algorithm are calculated;

concept drift is predicted from the standard deviation and the mean; when a concept drift is predicted, the clustering algorithm is re-initialized and k and the cluster centers are recalculated;

clustering is then performed with the new k and cluster centers; otherwise, it is determined that no concept drift has occurred between the current data and the historical data, and the current data are clustered directly;

S22: base model construction;

S221: the VFDT tree is used as the base model, and the Hoeffding inequality is used as the basis for selecting the best split attribute at each decision node;

S222: in the tree generation stage, the VFDT model assigns the data in the data stream to different nodes according to the Hoeffding inequality;

S223: data are read continuously and leaf nodes are continuously replaced with decision nodes to grow the decision tree, each leaf node storing statistical information about the attribute values;

S224: when a new data sample is passed to the VFDT model, each node of the tree tests it, and the sample follows different branches according to its attribute values until it reaches a leaf node;

S23: model integration;

S231: the different base models are dynamically integrated by Stacking, and a univariate linear regression model is used as the meta-learner of the integrated model;

S232: the result parameters obtained from the adaptive data stream K-means clustering are used as the initial parameters of the base models in the integrated model, and the parameters of the meta-learner are adjusted dynamically as the model is trained online, realizing dynamic integration of the model.

Further, in the well cementation pump pressure prediction method based on feature optimization and online learning, step S3 comprises: calculating features online from the measured pump pressure data and the well structure data by numerical calculation, to construct the features related to the well diameter expansion rate and the in-well mass variation;

dividing the data in the data stream into different clusters by online data stream clustering, calling the VFDT base model corresponding to each cluster in the integrated model to make a prediction, and updating that VFDT online with the subsequent real pump pressure data;

and producing a real-time weighted pump pressure prediction from the integrated model according to the data stream clustering result, completing the integration of the Stacking online model, and adjusting the weights of the base models in the integrated model according to the data stream clustering result.

Further, a well cementation pump pressure prediction method based on feature optimization and online learning, wherein step S3 comprises the following sub-steps:

S31: online clustering of the data stream;

S311: the adaptive data stream K-means algorithm is adopted as the clustering model, and online clustering continues from the clustering model obtained in the model training stage;

S312: the clustering model is updated and the clustering result for the new data is obtained; the clustering result is normalized and used as the initial weight of each base learner in the integrated model;

S32: online training and prediction of the integrated model;

S321: the data stream clustering result and the data are passed to the integrated model, which calls the relevant base model to make a prediction; online training is then carried out with the real pump pressure data, completing the online update of the base model;

S322: as the data stream arrives, the different base models are each trained online, and at the same time the parameters of the meta-learner in the integrated model are adjusted dynamically, realizing real-time prediction and online tuning of the integrated model.

The invention has the beneficial effects that:

1. Online feature optimization is adopted: features are constructed, expanded and optimized from the collected real-time data by numerical calculation, which improves data quality and better suits the real-time pump pressure prediction scenario;

2. An online clustering algorithm is adopted to build an adaptive data drift detection method into the real-time data stream, and the clustering results are introduced into the online training of the model, which effectively improves how well the model fits the data and avoids the loss of prediction performance caused by data drift;

3. Online learning is adopted: based on the features collected and constructed in real time, an online integrated model of VFDTs is built by Stacking, realizing incremental learning on the service stream; compared with an offline model, it is better suited to real-time pump pressure prediction in well cementation operations.

Drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a feature construction flow chart.

Fig. 3 is a diagram of an integrated model structure.

Fig. 4 is a graph of experimental results.

Detailed Description

For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.

As shown in Fig. 1, where the three symbols marked in the figure denote the distances between the clustered data and the cluster centers and serve as the initial weights of the integrated model, the well cementation pump pressure prediction method based on feature optimization and online learning comprises the following steps:

s1, online calculation of characteristics:

according to the well cementation technology, the structure of the well has an influence on the pumping pressure value of well cementation operation, so that characteristic construction is carried out on different well structures of each well, the characteristics of the well diameter expansion rate, the quality variation in the well and the like are calculated in real time according to the online operation process, and all the non-independent and uniformly distributed data are preprocessed to be used as the characteristics of model training.

The hole diameter expansion rate is obtained by adopting a dictionary lookup table mode: constructing well structure data into a dictionary, simulating a well cementation operation process, calculating the real position of each piece of real data in the well cementation process, and searching the dictionary to obtain well information so as to calculate the well diameter expansion rate; meanwhile, real data are read in real time in the well cementation operation simulation process to calculate the quality change quantity of the fluid in the well under the current condition.
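
The dictionary lookup described above might look like the following Python sketch. Everything named here is hypothetical: the well is discretized into slices of fixed length, the fluid is assumed to fill the slices from the top down, and the ratio used in `expansion_rate` (current diameter relative to the mean diameter of the covered section) is only a placeholder assumption, since the patent's own formula is not reproduced in this text.

```python
import math

def build_diameter_dict(well_structure):
    """Map each depth (m) to the well diameter (m) recorded at that depth."""
    return {depth: diameter for depth, diameter in well_structure}

def fluid_front_depth(diameter_by_depth, volume, step=1.0):
    """Deepest slice completely filled by `volume` (m^3), or None if none is filled.

    Assumes each dictionary entry describes a cylindrical slice `step` metres long.
    """
    remaining = volume
    reached = None
    for depth in sorted(diameter_by_depth):
        slice_volume = math.pi / 4.0 * diameter_by_depth[depth] ** 2 * step
        if remaining < slice_volume:
            break
        remaining -= slice_volume
        reached = depth
    return reached

def expansion_rate(diameter_by_depth, front_depth):
    """Placeholder ratio (assumption): (current - mean covered) / mean covered."""
    covered = [d for z, d in diameter_by_depth.items() if z <= front_depth]
    mean_covered = sum(covered) / len(covered)
    current = diameter_by_depth[front_depth]
    return (current - mean_covered) / mean_covered

# Hypothetical well: 0.2 m diameter for two slices, widening to 0.3 m in the third.
well = build_diameter_dict([(0.0, 0.2), (1.0, 0.2), (2.0, 0.3)])
front = fluid_front_depth(well, volume=0.14)
rate = expansion_rate(well, front)
```

In a real pipeline the same lookup would be run twice, once with the inside-the-well geometry and once with the annulus geometry, as the text requires.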

S2, model pre-training phase:

The adaptive data stream K-means algorithm is adopted as the online clustering algorithm to divide the data into categories. The core formula is as follows:

$$c_{t+1} = \frac{c_t\, n_t\, \alpha + x_t\, m_t}{n_t\, \alpha + m_t} \tag{1}$$

The new cluster center is obtained in a manner similar to an exponentially weighted moving average. The forgetting factor $\alpha$ reduces the influence of data that have already been clustered and takes a value in the range 0 to 1: when $\alpha$ is 0, only the latest data are used in calculating the new cluster center; when $\alpha$ is 1, all data that have occurred affect the clustering result. Here $c_{t+1}$ denotes the cluster center obtained by the new clustering; $c_t$ denotes the cluster center obtained by the previous clustering; $n_t$ denotes the number of data points currently assigned to this cluster; $x_t$ denotes the cluster center of the new batch of data; and $m_t$ denotes the number of data points that the new batch adds to the cluster.

In equation (1), the denominator is the number of sample points newly added to the cluster plus the number of historical sample points multiplied by the forgetting factor $\alpha$, which reduces the influence of past data on the calculation of the new cluster center. The numerator is the vector sum of the sample points newly added to the cluster plus the vector sum of all past sample points after scaling by $\alpha$; without the forgetting factor, this second term is simply the vector sum of all sample points from the previous round.

In the above process, the numerator is the vector sum of the sample points retained after the current iteration of clustering, and the denominator is the total number of sample points retained. The new cluster center is therefore the vector average of the retained sample points. The forgetting factor is what chiefly shapes the new cluster center: it keeps the cluster center changing dynamically and prevents the situation in which, once a large amount of historical data has accumulated, the new cluster center is dominated by past data and new data can no longer be clustered accurately.

VFDT (Very Fast Decision Tree) is the base model of the adaptive random forest. It is an improved version of the Hoeffding tree that uses the Hoeffding inequality as the basis for selecting the best split attribute at each decision node, and it is widely used in data stream analysis. It processes data with fixed time and memory, which improves the time and space efficiency of the algorithm. Based on the Hoeffding inequality, the VFDT continuously receives new samples, finds suitable split attributes for its decision nodes within the confidence supported by the information received so far, and is thereby continuously optimized and updated, which tunes the base model.

The data are divided into categories by the clustering algorithm, one base model is generated per cluster, and each base model is trained on the data of its own category. Finally, following the idea of ensemble learning, the base models generated for the different data categories are dynamically integrated by Stacking, and the parameters obtained from the clustering model are used as dynamic parameters to adjust the weights of the base models in the integrated model.

S3, model online training and prediction:

First, features are calculated online from the measured pump pressure data and the well structure data by numerical calculation, constructing the well diameter expansion rate, the in-well mass variation and similar features. Second, the data in the data stream are divided into different clusters by online data stream clustering, the VFDT base model corresponding to each cluster in the integrated model is called to make a prediction, and that VFDT is updated online with the subsequent real pump pressure data. Then, according to the data stream clustering result, the integrated model produces a real-time weighted prediction of the pump pressure. In this process the Stacking online model is integrated, and the weights of the base models in the integrated model are adjusted according to the data stream clustering result.

2. As shown in Fig. 2, the feature construction in S1 mainly includes the following steps:

2.1 Constructing the in-well mass variation: the real data and the corresponding well structure data are read; the total volume of fluid that has entered the well so far is calculated record by record from the real data; and the mass of each fluid at each moment is calculated with a physical formula. Because the inside of the well and the annulus are two different cases, the variation is calculated separately for each.

2.2 Constructing the well diameter expansion rate: the real data and the corresponding well diameter structure data are read, and a dictionary mapping depth to well diameter is built from the well diameter structure data. The total volume of fluid that has entered the well so far is calculated record by record from the real data; the height of the fluid in the well is calculated from this volume and the well structure information; the well diameter at that depth is obtained by querying the dictionary; and the well diameter expansion rate is calculated with a mathematical formula. As with the in-well mass, because the inside of the well and the annulus are two different cases, the well diameter expansion rate is calculated separately for each.

2.3 Feature processing: because the data sets contain some abnormal values, the data are processed by filling in missing information, normalization and similar methods.
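
Steps 2.1 and 2.3 can be illustrated with a short Python sketch. The specific choices are assumptions made for illustration only: mass entering per sample is taken as density × flow × sampling interval, missing values are filled with the previous valid reading, and normalization is min-max scaling. The text names the steps but not these particular formulas.

```python
def mass_increments(densities, flows, dt=10.0):
    """Mass (kg) entering the well per sample: density (kg/m^3) * flow (m^3/s) * dt (s).

    dt defaults to 10 s, matching the sampling interval stated for the experiment data.
    """
    return [rho * q * dt for rho, q in zip(densities, flows)]

def forward_fill(values):
    """Replace each None with the most recent valid value (assumed imputation rule).

    A leading None has no predecessor and is left as None.
    """
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

The in-well and annulus variants of the mass feature would each be accumulated from these per-sample increments, using whichever fluid volume currently sits in that part of the well.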

3. The model training stage in S2 mainly comprises the following steps:

3.1 data stream clustering

The actual well cementation process and workflow show that the flow and the operation stage of the construction work influence the well cementation pump pressure. In the data stream clustering process, the flow and the operation stage are therefore used as the clustering basis, and the data are divided into different categories according to flow and stage. Because the well cementation operation delivers a real-time data stream, an online clustering method is adopted, and the adaptive data stream K-means (Adaptive Streaming K-Means) algorithm is used to cluster the well cementation operation data stream. The algorithm is divided into an initialization stage and a continuous clustering stage. In the initialization stage, the data points pass through two phases: accumulation and determination of candidate centers. The probability density function (PDF) obtained by applying kernel density estimation (KDE) to the data is used as the basis for selecting the number of clusters k, and a candidate center is determined for each region. A region is the part of the PDF curve between two consecutive direction changes, and every direction change in the shape of the calculated PDF curve marks the beginning of a new region. The number of candidate clusters k is the number of regions, and the candidate initial centers are the centers of the regions. Because the features follow different distributions, different features yield different candidate centers. Clustering is run again for each of the resulting values k ∈ [kmin, kmin+kmax], the clustering results for the different values of k are compared, and the best k and the corresponding region centers are selected as the initial centers.
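
The region-based choice of candidate centers can be sketched as follows for a single feature. This is a literal, simplified reading of the description, not the patented procedure: the Gaussian kernel, the fixed `bandwidth`, the evaluation grid and all function names are assumptions. Each rise and each fall of the PDF curve counts as its own region here.

```python
import math

def gaussian_kde_pdf(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

def direction_changes(pdf):
    """Grid indices where the curve switches between rising and falling."""
    changes, prev_sign = [], 0
    for i in range(1, len(pdf)):
        diff = pdf[i] - pdf[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            changes.append(i - 1)
        if sign != 0:
            prev_sign = sign
    return changes

def candidate_regions(samples, bandwidth, n_grid=200):
    """Return (k, centers): number of regions and the midpoint of each region."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    pdf = gaussian_kde_pdf(samples, grid, bandwidth)
    bounds = [0] + direction_changes(pdf) + [n_grid - 1]
    centers = [(grid[a] + grid[b]) / 2.0 for a, b in zip(bounds, bounds[1:])]
    return len(centers), centers

# Hypothetical flow readings grouped around two operating levels.
k, centers = candidate_regions([0.4, 0.5, 0.6, 1.9, 2.0, 2.1], bandwidth=0.1)
```

The resulting `k` is only a starting point; as the text states, several values of k are then re-clustered and compared before the initial centers are fixed.
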
In the continuous clustering stage, concept drift detection is first performed on the historical data and the current data: the standard deviation and mean of the data stored during the execution of the algorithm are calculated, and concept drift is predicted from these two values. When a concept drift is predicted, the clustering algorithm is re-initialized, k and the cluster centers are recalculated, and clustering is then performed with the new k and cluster centers. Otherwise, it is determined that no concept drift has occurred between the current data and the historical data, and the current data are clustered directly.
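
A drift check based on the mean and standard deviation might be sketched as below. The decision rule (flag drift when either statistic moves by more than `tolerance` times the historical standard deviation) and the default tolerance are assumptions for illustration; the text states only that the two statistics are used to predict drift.

```python
import statistics

def drift_detected(history, recent, tolerance=0.5):
    """Return True when the mean or standard deviation of `recent` departs from
    `history` by more than `tolerance` times the historical standard deviation.

    The threshold rule is an illustrative assumption, not taken from the patent.
    """
    h_mean, h_std = statistics.fmean(history), statistics.pstdev(history)
    r_mean, r_std = statistics.fmean(recent), statistics.pstdev(recent)
    scale = h_std if h_std > 0 else 1.0   # avoid a zero threshold on constant history
    return (abs(r_mean - h_mean) > tolerance * scale
            or abs(r_std - h_std) > tolerance * scale)
```

When the function returns True, the clustering would be re-initialized (k and the centers recomputed); otherwise the current batch is simply assigned to the existing clusters.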

3.2 base model construction

The VFDT (Very Fast Decision Tree) model is a decision tree model commonly used for data stream prediction; it improves on the time and space efficiency of the traditional Hoeffding tree. In the tree generation stage, the VFDT model assigns the data in the data stream to different nodes according to the Hoeffding inequality, then continuously reads data and replaces leaf nodes with decision nodes to grow the decision tree. Each leaf node stores statistical information about the attribute values. When a new data sample is passed to the VFDT model, each node of the tree tests it, the sample follows different branches according to its attribute values, and it finally reaches a leaf node, whose statistics are then updated in order to recompute the test values of that leaf for each attribute. The attribute statistics in the leaf nodes are updated continuously as the data stream is received.
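
The split decision at a leaf rests on the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split merit, δ the allowed error probability and n the number of samples seen at the leaf. The sketch below shows that test in Python; the function names, the generic "merit" values and the tie threshold default are illustrative assumptions rather than details taken from the patent.

```python
import math

def hoeffding_bound(value_range, delta, n_samples):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n_samples))

def should_split(best_merit, second_merit, value_range, delta, n_samples,
                 tie_threshold=0.05):
    """Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is already so small that the two are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n_samples)
    return (best_merit - second_merit) > eps or eps < tie_threshold
```

Because ε shrinks as n grows, a leaf that cannot yet separate its two best attributes simply keeps accumulating statistics until the test succeeds, which is what lets the tree grow from a stream with bounded memory.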

3.3 model integration

The different base models are dynamically integrated by Stacking, with a univariate linear regression model as the meta-learner of the integrated model. The result parameters of the adaptive data stream K-means clustering are used as the initial parameters of the base models in the integrated model, and the parameters of the meta-learner are adjusted dynamically as the model is trained online, realizing dynamic integration of the model.
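
One way to picture a linear meta-learner that is adjusted online is the sketch below. It is an assumption-laden illustration: the class name `OnlineLinearMetaLearner`, the squared-error gradient step and the learning rate are not specified in the text, which says only that a linear regression meta-learner starts from clustering-derived parameters and is tuned during online training.

```python
class OnlineLinearMetaLearner:
    """Linear combination of base-model outputs, updated one sample at a time.

    Hypothetical sketch: the update is a plain stochastic gradient step on the
    squared error, which the patent text does not prescribe.
    """

    def __init__(self, initial_weights, learning_rate=0.01):
        self.weights = list(initial_weights)   # e.g. normalized clustering results
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, base_predictions):
        """Weighted sum of the base-model predictions plus a bias term."""
        return self.bias + sum(w * p for w, p in zip(self.weights, base_predictions))

    def update(self, base_predictions, true_value):
        """Move weights and bias against the gradient of the squared error."""
        error = self.predict(base_predictions) - true_value
        self.weights = [
            w - self.learning_rate * error * p
            for w, p in zip(self.weights, base_predictions)
        ]
        self.bias -= self.learning_rate * error

meta = OnlineLinearMetaLearner([0.5, 0.5])
before = meta.predict([2.0, 4.0])     # weighted prediction before seeing the target
meta.update([2.0, 4.0], true_value=4.0)
after = meta.predict([2.0, 4.0])      # prediction after one online update
```

Each update nudges the prediction toward the measured pump pressure, so base models that track the current operating regime gradually receive more weight.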

4. As shown in Fig. 3, where the symbols in the figure denote the weight of each base model and the index of each base model, the pump pressure prediction in S3 mainly includes the following steps:

4.1, data stream on-line clustering

Newly arriving data are clustered online as a data stream. The adaptive data stream K-means algorithm is adopted as the clustering model, and online clustering continues from the clustering model obtained in the model training stage. While the clustering model is updated, the clustering result for the new data is obtained, normalized, and used as the initial weight of each base learner in the integrated model.
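
Turning a clustering result into initial weights could be done as in the sketch below. The inverse-distance rule is an assumption: the text says only that the clustering result (elsewhere described as the distance between the data and each cluster center) is normalized and used as the initial weight, without giving the exact mapping.

```python
def initial_weights(distances, eps=1e-9):
    """Normalize distances to cluster centers into weights that sum to 1.

    Assumed rule: a closer cluster gets a larger weight (inverse distance);
    eps guards against division by zero when a point sits on a center.
    """
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

# A sample three times closer to cluster 0 than to cluster 1.
weights = initial_weights([1.0, 3.0])
```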

4.2 on-line training and prediction of Integrated model

The data stream clustering result and the data are passed to the integrated model, which calls the relevant base model to make a prediction; online training is then carried out with the real pump pressure data, completing the online update of the base model. As the data stream arrives, the different base models are each trained online, and at the same time the parameters of the meta-learner in the Stacking integrated model are adjusted dynamically, realizing real-time prediction and online tuning of the integrated model.
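
The predict-first, train-afterwards loop can be sketched as follows. To keep the example self-contained, `RunningMeanModel` is a hypothetical stand-in for a VFDT base model, cluster assignment is a plain nearest-center lookup with fixed centers, and the meta-learner weighting is omitted; none of these simplifications come from the patent.

```python
class RunningMeanModel:
    """Hypothetical stand-in for a VFDT base model: predicts the mean target seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, features):
        return self.total / self.count if self.count else 0.0

    def learn(self, features, target):
        self.total += target
        self.count += 1

def nearest_cluster(features, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    dists = [sum((f - c) ** 2 for f, c in zip(features, center)) for center in centers]
    return dists.index(min(dists))

def predict_then_train(stream, centers, models):
    """For each sample: pick its cluster, predict with that cluster's model,
    then update the same model with the measured pump pressure."""
    predictions = []
    for features, true_pressure in stream:
        k = nearest_cluster(features, centers)
        predictions.append(models[k].predict(features))
        models[k].learn(features, true_pressure)
    return predictions

# Hypothetical stream of (features, measured pump pressure) pairs.
stream = [([1.0], 5.0), ([9.0], 20.0), ([0.5], 7.0), ([11.0], 22.0)]
preds = predict_then_train(stream, centers=[[0.0], [10.0]],
                           models=[RunningMeanModel(), RunningMeanModel()])
```

Each prediction is made before the true value is revealed, so the recorded predictions reflect what the model would have reported in real time.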

Feature construction and feature optimization are performed on the real well cementation pump pressure data, and the experimental data are then divided into categories; different VFDT trees are trained on the different categories of data, and finally the trained trees are integrated by Stacking to obtain the complete StackingVFDT model.

Training uses two modes, pre-training and online training: pre-training prepares the base models, and online training then updates the StackingVFDT model online.

The well cementation operation data comprise the well cementation wellhead pump pressure and a data set of related factors, the related factors comprising stage, density, flow, stage displacement and total displacement.

Constructing the StackingVFDT model mainly comprises the following steps:

Step 1, online feature calculation: features are calculated online from the real well cementation pump pressure and the corresponding well structure data; during online training a physical formula is used to obtain new related factors, such as the well diameter expansion rate, as auxiliary features. The formula is as follows:

；

Step 2, model pre-training: and (5) performing offline training on the model by using the actual well cementation pump pressure and the corresponding well structure data.

Step 2.1 data partitioning and base model training: dividing according to real data of a well cementation pump pressure, using a stage and flow as dividing basis, taking the divided data as different sub-data sets to train different VFDT trees as a base model, using a Hoeffding inequality as an optimal attribute dividing basis of a decision node, dividing data information in a data stream into different nodes according to the Hoeffding inequality in a generating stage of the decision tree by the VFDT model, and then continuously reading the data and continuously replacing leaf nodes with the decision nodes to generate the decision tree. Statistical information about the attribute values is maintained in each leaf node in the decision tree. When a new data sample is transferred into the VFDT model, each node of the tree tests or judges it, then, by dividing its value, it enters into different branches, and finally, reaches the leaf node of the tree.

Step 2.2, base model integration: the base models are integrated by Stacking, with a univariate linear regression model as the meta learner of the Stacking integrated model. The quantitative relation among the data subsets produced during data partitioning is used as the initial weight of the meta learner, and the output of each base model is used as the input of the meta learner. The weight of each sub-model in the final integrated model is adjusted continuously, finally yielding the pre-trained StackingVFDT model.
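As a rough sketch of this integration step, the code below reads "the quantitative relation among the data" as the share of samples that fell into each subset, which is an interpretation rather than something the text states outright. The function names are hypothetical.

```python
from typing import List, Sequence


def initial_weights(subset_sizes: Sequence[int]) -> List[float]:
    """Initial meta-learner weights, assumed proportional to subset sizes."""
    total = sum(subset_sizes)
    if total <= 0:
        raise ValueError("at least one subset must contain samples")
    return [size / total for size in subset_sizes]


def stacked_prediction(base_predictions: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Linear meta learner: weighted sum of the base model outputs plus a bias."""
    if len(base_predictions) != len(weights):
        raise ValueError("one weight is required per base model")
    return bias + sum(w * p for w, p in zip(weights, base_predictions))
```

Starting from proportional weights means the base model trained on the largest subset dominates the first predictions, and later training is free to move the weights away from that starting point.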

Step 3, online training and prediction: the model obtained in Step 2 is used for online training and prediction on the test set data.

Step 3.1, data partitioning and online training of the integrated model: the data are divided as in Step 2.1, and the base model in the integrated model that corresponds to each divided subset performs online prediction and training on that subset;

Step 3.2, online updating of the integrated model: and taking the prediction results of different base models as the input of a meta learner in the integrated model, updating the weights of different base models in the meta learner on line, and adjusting the final prediction results of the integrated model.
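Steps 3.1 and 3.2 can be pictured as a predict-then-train loop over the data stream. The sketch below shows only the meta learner and assumes a stochastic gradient descent update on the squared error; the text does not specify the update rule, so the class name, the learning rate and the update itself are illustrative assumptions. Inputs are assumed to be normalized so that the chosen learning rate stays stable.

```python
from typing import List, Sequence


class OnlineLinearMeta:
    """Linear meta learner whose weights are updated one sample at a time."""

    def __init__(self, weights: Sequence[float], learning_rate: float = 0.01):
        self.weights: List[float] = list(weights)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, base_predictions: Sequence[float]) -> float:
        return self.bias + sum(
            w * p for w, p in zip(self.weights, base_predictions)
        )

    def learn_one(self, base_predictions: Sequence[float],
                  true_pressure: float) -> float:
        """Predict first, then take one gradient step; returns the error made
        before the update."""
        error = self.predict(base_predictions) - true_pressure
        for i, p in enumerate(base_predictions):
            self.weights[i] -= self.learning_rate * error * p
        self.bias -= self.learning_rate * error
        return error
```

In use, each incoming sample is first routed to its base models for prediction, the meta learner combines their outputs into the pump pressure estimate, and once the measured pump pressure arrives `learn_one` adjusts the weights.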

The experiment of the method is as follows:

1. Description of data

The experimental data were collected from a cementing operation and sampled as a time series at 10-second intervals. The data set features and their descriptions are shown in Table 1 below.

Table 1 Description of well cementation pump pressure data

2. Experimental setup

The hardware environment for the experiment is an Intel(R) Core(TM) i9-10900KF CPU @ 3.70 GHz and an NVIDIA GeForce 3090 graphics card. To evaluate the performance of the method, three models are compared: an adaptive random forest using the original features (original ARF), an adaptive random forest using the optimized features (feature-optimized ARF), and the StackingVFDT model. Prediction performance is measured with two evaluation indexes, the mean square error (Mean Square Error, MSE) and the coefficient of determination (R2), defined as follows:

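$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\mathrm{MSE}$ is the mean square error, $n$ is the number of samples, $\hat{y}_i$ is the predicted result for the $i$-th sample, and $y_i$ is the true value; $R^2$ is the coefficient of determination, which evaluates how closely the predicted values agree with the actual values, $y_i$ is the $i$-th true value, $\hat{y}_i$ is the predicted value of the $i$-th sample, and $\bar{y}$ is the mean of the true values.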
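The two evaluation indexes can be computed directly from their standard definitions. The following is a minimal pure-Python sketch; the function names are chosen here for illustration and are not taken from the patent.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between predictions and true values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A lower MSE and an R2 closer to 1 both indicate that the predicted pump pressure tracks the measured pump pressure more closely.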

In traditional operations, mathematical models predict pump pressure inaccurately and with large errors, so this method introduces machine learning instead. Experiments were carried out on real data from two wells in the same block: one well served as the training set and the other as the test set for online training. The results are shown in Table 2.

Table 2 Experimental verification results

As shown in FIG. 4, the feature-optimized ARF model predicts more accurately than the ARF model using the original features, and the feature-optimized StackingVFDT model outperforms the feature-optimized ARF model, improving both MSE and R2. This demonstrates the good predictive effect of the StackingVFDT model with feature optimization.

In this scheme, the well cementation pump pressure prediction method based on feature optimization and online learning adopts an online feature optimization method: feature construction, feature expansion and feature optimization are performed on the collected real-time data by numerical calculation, which improves the data quality and makes the method better suited to the real-time pump pressure prediction service scene.

An adaptive data drift detection method is built on the real-time data stream using an online clustering algorithm, and the clustering results are fed into the online training process of the model. This effectively improves how well the model fits the data and avoids the loss of prediction performance caused by data drift.
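The drift check can be sketched as a comparison of window statistics. The code below is a simplified illustration that assumes a fixed-size sliding window and fixed thresholds on the mean and the standard deviation; the class name, the window size and both thresholds are hypothetical, and the re-initialization of the clustering (recomputing k and the cluster centers) is left to the caller.

```python
import statistics
from collections import deque


class MeanStdDriftDetector:
    """Flags drift when the window mean or standard deviation moves away from
    the reference statistics taken from the first full window."""

    def __init__(self, window_size: int = 50,
                 mean_threshold: float = 3.0, std_ratio: float = 2.0):
        self.window = deque(maxlen=window_size)
        self.mean_threshold = mean_threshold  # allowed mean shift, in reference stds
        self.std_ratio = std_ratio            # allowed growth of the std
        self.reference = None                 # (mean, std) of the reference window

    def update(self, value: float) -> bool:
        """Add one value; return True when drift is flagged."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False
        mean = statistics.fmean(self.window)
        std = statistics.pstdev(self.window)
        if self.reference is None:
            self.reference = (mean, std)
            return False
        ref_mean, ref_std = self.reference
        scale = max(ref_std, 1e-9)  # guard against a zero reference std
        mean_shift = abs(mean - ref_mean) > self.mean_threshold * scale
        std_shift = std > self.std_ratio * scale
        if mean_shift or std_shift:
            # Forget the old regime so the reference is rebuilt from new data.
            self.reference = None
            self.window.clear()
            return True
        return False
```

When `update` returns True, the caller would re-run the clustering on the data that follows and route subsequent samples to new or updated base models.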

An online learning mode is adopted: an online VFDT integrated model is built by Stacking from the features acquired and constructed in real time, realizing incremental learning on the service data stream. Compared with an offline model, it is better suited to the service scene of real-time pump pressure prediction in well cementation operations.

The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A well cementation pump pressure prediction method based on feature optimization and online learning is characterized in that real data in the well cementation operation process is subjected to feature processing and feature construction, an integrated learning model is introduced to perform model construction and prediction according to data distribution change, and the method comprises the following steps:

S1, calculating characteristics on line;

S2, model pre-training stage;

S3, model on-line training and prediction;

said step S1 comprises the sub-steps of:

S11: constructing a well quality variation: reading real data and corresponding well structure data, calculating record by record from the real data the total volume of fluid entering the well in the current situation, calculating the quality of each fluid at each moment through a physical formula, dividing the characteristics into the two situations of the well and the annulus, and calculating the variation;

S12: constructing and optimizing real-time characteristics;

S122: calculating record by record from the real data the total volume of fluid entering the well in the current situation, calculating specific height information of the fluid in the well according to the volume and well bore information, obtaining the well diameter at the relevant depth by querying a dictionary, and calculating the well diameter expansion rate by a mathematical formula;

S13: characteristic processing: since abnormal values exist in different data sets, data processing is realized by information completion and normalization;

said step S2 comprises the sub-steps of:

S21: clustering the data stream;

predicting the concept drift of the standard deviation and the mean value, and, when concept drift is predicted, re-initializing the clustering algorithm and re-calculating k and the cluster centers;

S22: constructing a base model;

S23: model integration;

S231: different base models are dynamically integrated by using Stacking, and a univariate linear regression model is used as the meta learner of the Stacking integrated model;

S232: the result parameter obtained by the K-means clustering of the self-adaptive data flow is used as the initial parameter of the base model in the integrated model, and the parameters of the meta learner are dynamically adjusted along with the on-line training of the model, so that the dynamic integration of the model is realized;

step S3 comprises the steps of carrying out characteristic online calculation on measured pump pressure data and well structure data by adopting a numerical calculation mode, and constructing a well diameter expansion rate and well quality variation related characteristics;

and realizing real-time weighted prediction of the pump pressure on the integrated model according to the data stream clustering result in the model, completing integration of the Stacking on-line model, and adjusting the weight of the base model in the integrated model according to the data stream clustering result.

2. The method for predicting the well cementation pumping pressure based on feature optimization and online learning according to claim 1, wherein the data of the model pre-training stage comprises a well cementation wellhead pumping pressure and related factor data sets, and the related factor data sets comprise stage, density, flow, stage displacement and total displacement features.

3. The method for predicting the pump pressure of well cementation based on feature optimization and online learning according to claim 1, wherein the step S1 comprises:

；

4. The method for predicting the well cementation pump pressure based on feature optimization and online learning according to claim 1, wherein the step S2 comprises:

；

5. The method for predicting the pump pressure of well cementation based on feature optimization and online learning according to claim 1, wherein the step S3 comprises the sub-steps of:

S31: on-line clustering of data streams;

S312: updating the clustering model, obtaining a clustering result of new data, carrying out normalization processing on the clustering result, and taking the result of online clustering as the initial weight of each base learner in the integrated model;

S32: on-line training and prediction of the integrated model;

S322: along with the input of data streams, different base models respectively perform online training, and meanwhile, parameters of a meta learner in a Stacking integrated model are dynamically adjusted, so that real-time prediction and online tuning of the integrated model are realized.