CN115630571A

CN115630571A - Oil well indicator diagram automatic diagnosis method based on ensemble learning

Info

Publication number: CN115630571A
Application number: CN202211252063.7A
Authority: CN
Inventors: 赵小波; 雷俊杰; 杨兴利; 肖红卫; 赵亚杰; 尹正秋; 张庆祝; 张虎; 于强; 李辉
Original assignee: Changan University; Yanchang Oil Field Co Ltd
Current assignee: Changan University; Yanchang Oil Field Co Ltd
Priority date: 2022-10-13
Filing date: 2022-10-13
Publication date: 2023-01-20

Abstract

The invention discloses an oil well indicator diagram automatic diagnosis method based on ensemble learning, which comprises the following steps: historical data are collected to construct an indicator diagram (dynamometer card) database, and the data in the database are preprocessed and cleaned; features of the indicator diagram are extracted based on oil extraction engineering theory and typical indicator diagrams; the original data and the generated data are divided into a training data set, a validation data set and a test data set; a random forest model is constructed by ensemble learning, combining the Bagging algorithm with decision trees; the validation data set is input into the random forest model to evaluate the automatic diagnosis results of the indicator diagram; and indicator diagram data are acquired in real time, and the random forest model is used to automatically diagnose the fault cause indicated by the indicator diagram in real time. The invention can collect oil well data in real time, identify the fault types that the indicator diagrams may indicate, and improve production management efficiency to a certain extent.

Description

Oil well indicator diagram automatic diagnosis method based on ensemble learning

Technical Field

The invention belongs to the technical field of fault diagnosis from oil well indicator diagrams, and particularly relates to an automatic diagnosis method for oil well indicator diagrams based on ensemble learning.

Background

The pumping unit is the main artificial lift equipment in most oil fields. Because the rods, tubing and pump operate in a harsh downhole environment, and the reservoir conditions change dynamically during production, the working conditions are complex and pumping unit faults occur frequently. Determining the downhole working condition promptly and accurately, and then analyzing the operating state of the pumping unit and the causes of production changes, allows production personnel and decision makers to make timely well management decisions, prevents further deterioration of the well's working condition to a certain extent, and thereby improves and optimizes production efficiency.

The oil well indicator diagram records how the load at the suspension point of the pumping unit varies with displacement, and analyzing the geometric shape of the indicator diagram is a common way to determine the downhole working condition of the pumping unit. Traditionally, indicator diagrams collected on site are identified manually to determine the downhole working condition, and a site engineer then proposes corresponding measures based on the result. With the continuous advance of automation technology and the emphasis that oil field enterprises place on automation and digitization, indicator diagram diagnosis methods based on machine learning have been further developed and applied. Most traditional methods recognize typical indicator diagrams in order to distinguish common downhole working conditions. In actual production, however, there are also atypical, complex indicator diagrams, which may represent various single or composite downhole working conditions: single working conditions include pump hanging, pump bumping, tubing leakage and the plunger pulling out of the pump barrel, while composite working conditions include sand production with vibration, leakage of both valves and insufficient liquid supply. Indicator diagrams under complex working conditions are similar to one another, and slight changes in their shape can represent different downhole working conditions. A model that can only recognize a few typical indicator diagrams therefore cannot provide comprehensive and accurate automatic diagnosis of oil well indicator diagrams, nor a full picture of real-time production conditions.
Accurate real-time automatic diagnosis of oil well indicator diagrams is therefore a current challenge in automatic oil well fault diagnosis; judging by current diagnostic performance, both the false-alarm rate and the missed-alarm rate of fault detection remain high.

Disclosure of Invention

The object of the invention is to provide an automatic diagnosis method for oil well indicator diagrams based on ensemble learning, which addresses the high false-alarm and missed-alarm rates of existing automatic oil well fault diagnosis.

The technical solution adopted by the invention is as follows:

An oil well indicator diagram automatic diagnosis method based on ensemble learning specifically comprises the following steps:

step 1: collecting historical data to construct an indicator diagram database, and performing data preprocessing and data cleaning on the data in the database;

step 2: extracting features of the indicator diagram based on oil extraction engineering theory and typical indicator diagrams;

step 3: dividing the data into a training data set, a validation data set and a test data set based on the original data and the generated data;

step 4: constructing a random forest model by ensemble learning, combining the Bagging algorithm with decision trees, and using it to classify the data;

step 5: inputting the validation data set into the random forest model, and evaluating the automatic diagnosis results of the indicator diagram by error rate, computational efficiency, accuracy and recall;

step 6: acquiring indicator diagram data in real time, and using the random forest model to automatically diagnose and determine the fault cause indicated by the indicator diagram in real time.

The invention is further characterized as follows:

In step 1, data cleaning comprises handling duplicate data, erroneous data and incomplete data, wherein:

step 1.1: for erroneous data, a consistency check is performed; where the data are inconsistent, they are unified by modification, and the check-and-modify process is repeated until the data meet the requirements, after which the data are output;

step 1.2: duplicate data are removed directly;

step 1.3: for incomplete data, the Euclidean distance between the incomplete sample $X_{\mathrm{miss}}$ and each other sample $X_i$ is calculated:

$$\mathrm{dist}_i = \lVert X_{\mathrm{miss}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$

the distances are sorted to find the sample closest to the incomplete sample:

$$(X_{\min}, Y_{\min}) = \arg\min_{i} \mathrm{dist}_i \tag{2}$$

where $X_{\min}$ is the feature vector of the sample with the minimum Euclidean distance and $Y_{\min}$ is the data category label of that sample;

the label of that sample is taken as the label of the incomplete sample.
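Steps 1.1 and 1.2 can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the record layout (dicts with hypothetical `well`, `max_load` and `min_load` fields) and the single consistency rule (maximum load must not be below minimum load, repaired by swapping the two fields) are assumptions made for the example, since the source does not specify which consistency checks it applies.

```python
from typing import Callable, Hashable, Iterable

# Hypothetical record layout: {"well": "W1", "max_load": 52.1, "min_load": 18.4}
Record = dict


def remove_duplicates(records: Iterable[Record], key: Callable[[Record], Hashable]) -> list:
    """Step 1.2: drop repeated records, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique


def make_consistent(rec: Record, max_rounds: int = 3) -> Record:
    """Step 1.1: check consistency, modify the record if needed, and check again.

    Hypothetical rule: max_load must not be smaller than min_load.
    """
    rec = dict(rec)  # never mutate the caller's record
    for _ in range(max_rounds):
        if rec["max_load"] >= rec["min_load"]:
            return rec  # consistent: output the record
        # Assumed repair: the two load fields were swapped at acquisition time.
        rec["max_load"], rec["min_load"] = rec["min_load"], rec["max_load"]
    raise ValueError(f"record could not be made consistent: {rec}")


raw = [
    {"well": "W1", "max_load": 52.1, "min_load": 18.4},
    {"well": "W1", "max_load": 52.1, "min_load": 18.4},  # exact duplicate
    {"well": "W2", "max_load": 17.0, "min_load": 49.3},  # fields swapped
]
# Sorting the items gives a hashable, key-order-independent identity for each record.
clean = [make_consistent(r) for r in remove_duplicates(raw, key=lambda r: tuple(sorted(r.items())))]
```

Bounding the check-and-modify loop with `max_rounds` keeps a record that can never satisfy the rule from looping forever; such records are reported instead of silently passed on.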

In step 2, the extracted features comprise oil extraction engineering features and indicator diagram geometric features.

In step 3, the training data set is used to train the model, i.e., the parameters of the fitted curve are determined from the training data; the validation data set is used for model selection, i.e., the final tuning and selection of the model, and thus assists in constructing the model; the test data set is used to test the accuracy of the trained model.

In step 4, the Bagging algorithm is as follows: the algorithm is trained over multiple rounds; the training set of each round consists of n training samples drawn at random, with replacement, from the initial training data, so that a given initial sample may appear several times in one round's training set or not at all; training yields a sequence of prediction functions.
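As a worked check on the sampling just described, assume (as in the per-tree rule later in this section, which draws m samples from m) that each round draws as many samples as the initial training set holds, here n. The probability that one particular initial sample is never drawn in a round is then:

```latex
P(\text{sample } j \text{ not drawn in a round}) = \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow{\;n \to \infty\;}\; e^{-1} \approx 0.368
```

So, for large n, roughly 36.8% of the initial samples are absent from any given round and about 63.2% appear at least once, which is what makes the training sets of different rounds differ from one another.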

In step 4, the specific flow of the random forest algorithm is as follows:

randomly sampling n samples from the sample set with replacement;

randomly selecting k features from all features, and building a decision tree on the selected samples using these k features;

repeating the two steps above m times to generate m decision trees, which form the random forest;

for new data, each decision tree makes a decision, and the class is determined by voting.

In step 4, the rule for generating each decision tree in the random forest model is as follows:

for an original data set of m samples, a training set of m samples is obtained by bootstrap sampling;

if each sample has d features, an integer k smaller than d is chosen; k features are randomly selected from the d features, and at each node split of the decision tree the optimal feature is chosen from among these k features;

each decision tree is grown to maximum depth without pruning;

wherein the value of k controls the degree of randomness of the random forest: when k = d, the base decision trees in the random forest are generated in the same way as traditional decision trees; when k = 1, one attribute is randomly selected for splitting; and k is set to k = …, where d is the number of features.
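The effect of k described above can be illustrated with a small Python helper that returns the candidate feature indices for one node split. This is a sketch under stated assumptions: the function name `candidate_features` is hypothetical, and because the source's recommended value of k is not recoverable, the default of round(sqrt(d)) is only an assumed, commonly used choice (scikit-learn's `RandomForestClassifier`, for example, defaults to the square root of the number of features).

```python
import math
import random
from typing import List, Optional


def candidate_features(d: int, rng: random.Random, k: Optional[int] = None) -> List[int]:
    """Indices of the k features a node may choose its split from.

    k == d: every feature is considered, as in a traditional decision tree.
    k == 1: the node splits on a single randomly chosen feature.
    Default k = round(sqrt(d)) is an assumed common choice, not a value from the patent.
    """
    if k is None:
        k = max(1, round(math.sqrt(d)))
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    return sorted(rng.sample(range(d), k))


rng = random.Random(42)
print(candidate_features(20, rng, k=20))  # all 20 indices: no randomness in the candidate set
print(candidate_features(20, rng, k=1))   # one random index
print(candidate_features(20, rng))        # round(sqrt(20)) = 4 random indices
```

Smaller k makes individual trees less alike (and individually weaker), which is the trade-off the k value controls.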

The advantages of the method are as follows. Through exploratory data analysis, several highly discriminative features were found in addition to the basic indicator diagram indices, such as the upper area, the lower area and the maximum load; these features make the faults that an indicator diagram may indicate easier to identify. Ensemble learning is used to combine Bagging with decision trees into a random forest model, which has strong generalization capability, can handle high-dimensional data, keeps its overall accuracy largely unaffected when individual features are occasionally missing, and computes quickly, so that the fault type of an indicator diagram can be diagnosed effectively. The method can markedly improve the ability of the classification model to diagnose indicator diagrams automatically, reduce the error rate and improve computational efficiency.

Drawings

FIG. 1 is a flow chart of the oil well indicator diagram automatic diagnosis method based on ensemble learning according to the invention;

FIG. 2 is a basic flow chart of a random forest model in the oil well indicator diagram automatic diagnosis method based on ensemble learning.

Detailed Description

The method for automatically diagnosing the indicator diagram of the oil well based on the ensemble learning is described in detail below with reference to the accompanying drawings and specific embodiments.

The invention relates to an oil well indicator diagram automatic diagnosis method based on ensemble learning. First, historical data are collected to build an indicator diagram database, the data are cleaned, indicator diagram features are extracted, and the data are divided into training, validation and test sets. Secondly, a random forest model is constructed by ensemble learning, combining Bagging with decision trees, to classify the data; after training, the validation data set is input into the classification model, and the automatic diagnosis results of the indicator diagram are evaluated by error rate, computational efficiency, accuracy and recall. Finally, indicator diagram data are acquired in real time, and the trained classification model is used to automatically diagnose and determine the fault cause indicated by the indicator diagram in real time.

As shown in FIG. 1, an oil well indicator diagram automatic diagnosis method based on ensemble learning comprises the following steps:

Step 1: historical data are collected to construct an indicator diagram database, and the data in the database are preprocessed and cleaned. Abnormal data, such as incomplete data, are handled by deletion or interpolation, with the method chosen according to the specific situation, as follows:

step 1.1: for erroneous data, a consistency check is performed; where the data are inconsistent, they are unified by modification, and the check-and-modify process is repeated until the data meet the requirements, after which the data are output;

step 1.2: duplicate data are removed directly;

step 1.3: for incomplete data, the following procedure is generally adopted:

a. calculate the Euclidean distance between the incomplete sample $X_{\mathrm{miss}}$ and each other sample $X_i$:

$$\mathrm{dist}_i = \lVert X_{\mathrm{miss}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$

b. sort the distances and find the sample closest to the incomplete sample:

$$(X_{\min}, Y_{\min}) = \arg\min_{i} \mathrm{dist}_i \tag{2}$$

where $X_{\min}$ is the feature vector of the sample with the minimum Euclidean distance and $Y_{\min}$ is the data category label of that sample;

c. take the label of that sample as the label of the incomplete sample.
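Equations (1) and (2) amount to a one-nearest-neighbour label assignment. The Python sketch below is an illustrative reading of steps a to c, not the patent's code; the function name `nearest_label` and the two-feature sample values are hypothetical.

```python
import math
from typing import Sequence


def nearest_label(x_miss: Sequence[float],
                  X: Sequence[Sequence[float]],
                  Y: Sequence[str]) -> str:
    """Return Y_min, the label of the sample nearest to x_miss (equations (1) and (2))."""
    if len(X) != len(Y) or not X:
        raise ValueError("X and Y must be non-empty and of equal length")
    # Equation (1): Euclidean (L2) distance to every labelled sample X_i.
    dist = [math.dist(x_miss, x_i) for x_i in X]
    # Equation (2): position of the smallest distance; ties go to the first sample.
    i_min = min(range(len(dist)), key=dist.__getitem__)
    return Y[i_min]


# Hypothetical two-feature samples: (maximum load, minimum load) in kN.
X = [[50.0, 20.0], [30.0, 25.0], [55.0, 10.0]]
Y = ["tubing leakage", "insufficient liquid supply", "pump bumping"]
print(nearest_label([31.0, 24.0], X, Y))  # -> insufficient liquid supply
```

Because the Euclidean distance mixes raw units, features with large magnitudes (for example loads) would dominate small ones (for example water cut); standardizing each feature before computing equation (1) is a common precaution.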

Step 2: features of the indicator diagram are extracted based on oil extraction engineering theory and the characteristics of typical indicator diagrams, so that the features better describe the indicator diagram under different fault conditions. Specifically:

the relevant features are extracted according to the well faults that different indicator diagrams may indicate, and comprise:

oil extraction engineering features: pump depth, current water cut of the well, pump fillage and effective stroke;

indicator diagram geometric features: maximum load, minimum load, theoretical upper load and theoretical lower load, upstroke average load, downstroke average load, the first and last peak values of the upstroke curve, the first and last peak values of the downstroke curve, the average slope of the upstroke curve, the average slope of the downstroke curve, the upper area, the lower area, and so on.
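A few of the geometric features listed above can be computed from one sampled card as in the Python sketch below. It rests on stated assumptions: the card is given as hypothetical `disp`/`load` sequences ordered around the loop starting at the bottom of the stroke, and "average slope" is taken as the end-to-end load change over the displacement change. The upper and lower areas and the peak values are not computed because the source does not define them precisely.

```python
from statistics import mean
from typing import Dict, Sequence


def geometric_features(disp: Sequence[float], load: Sequence[float]) -> Dict[str, float]:
    """A subset of the step-2 geometric features for one closed indicator diagram.

    Assumes samples are ordered around the card starting at the bottom of the stroke,
    so the point of maximum displacement splits the upstroke from the downstroke.
    """
    if len(disp) != len(load) or len(disp) < 3:
        raise ValueError("need equal-length displacement/load sequences with >= 3 points")
    top = max(range(len(disp)), key=disp.__getitem__)  # index of the top of the stroke
    if top in (0, len(disp) - 1):
        raise ValueError("card must contain both an upstroke and a downstroke")
    up_d, up_l = disp[: top + 1], load[: top + 1]
    dn_d, dn_l = disp[top:], load[top:]

    def chord_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
        # Assumed definition: end-to-end load change over displacement change.
        if xs[-1] == xs[0]:
            raise ValueError("stroke segment has zero displacement change")
        return (ys[-1] - ys[0]) / (xs[-1] - xs[0])

    return {
        "max_load": max(load),
        "min_load": min(load),
        "upstroke_mean_load": mean(up_l),
        "downstroke_mean_load": mean(dn_l),
        "upstroke_avg_slope": chord_slope(up_d, up_l),
        "downstroke_avg_slope": chord_slope(dn_d, dn_l),
    }


# Hypothetical toy card: displacement in m, load in kN.
print(geometric_features([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
                         [20.0, 50.0, 52.0, 50.0, 22.0, 20.0, 20.0]))
```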

Step 3: the data are divided into a training data set, a validation data set and a test data set based on the original data and the generated data:

training data set: used to train the model, i.e., to determine the parameters of the fitted curve from the training data;

validation data set: used for model selection, i.e., the final tuning and selection of the model, assisting in model construction;

test data set: used to test the accuracy of the trained model and to detect overfitting, which is more likely when the data are noisy.
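The three-way division above can be sketched in Python as a single seeded shuffle followed by two cuts. The function name `three_way_split` and the 60/20/20 proportions are assumptions for illustration only; the source does not state the split ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(samples: Sequence[T], train: float = 0.6, val: float = 0.2,
                    seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into disjoint training / validation / test subsets."""
    if not (0 < train and 0 <= val and train + val < 1):
        raise ValueError("fractions must leave a non-empty share for the test set")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # fixed seed -> reproducible split
    n_train = round(len(order) * train)
    n_val = round(len(order) * val)

    def take(ids: List[int]) -> List[T]:
        return [samples[i] for i in ids]

    return (take(order[:n_train]),
            take(order[n_train:n_train + n_val]),
            take(order[n_train + n_val:]))


train_set, val_set, test_set = three_way_split(list(range(100)))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

Fault classes in indicator diagram data are often imbalanced; in that case a stratified split, which keeps each class's proportion similar across the three subsets, is usually preferable to a plain shuffle.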

Step 4: a random forest model is constructed by ensemble learning, combining Bagging with decision trees, to classify the data.

The Bagging algorithm proceeds as follows: a learning algorithm is trained over multiple rounds; the training set of each round consists of n training samples drawn at random, with replacement, from the initial training data, so that a given initial sample may appear several times in one round's training set or not at all. Training yields a sequence of prediction functions $H_1, \dots, H_T$, one per round; when predicting new examples, the final prediction function $H$ combines them by voting for classification problems and by simple averaging for regression problems.
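The Bagging procedure can be sketched in Python with any base learner. To keep the example short, the base learner here is a decision stump (a depth-1 decision tree), whereas step 4 uses full decision trees; the names `fit_stump`, `bagging_fit` and `bagging_predict`, the 25 rounds and the toy data are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], object]


def fit_stump(X: List[Sequence[float]], y: List[str]) -> Predictor:
    """Base learner: a depth-1 decision tree chosen by training misclassifications."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]  # never empty: t is a data value
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def bagging_fit(X, y, fit_base, rounds: int = 25, seed: int = 0) -> List[Predictor]:
    """Train H_1..H_T, each on a bootstrap sample (drawn with replacement) of size len(X)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # a sample may repeat or be left out
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Predictor], x: Sequence[float], task: str = "classification"):
    """Combine the T predictions: majority vote (classification) or simple mean (regression)."""
    outputs = [h(x) for h in models]
    if task == "classification":
        return Counter(outputs).most_common(1)[0][0]
    return mean(outputs)


X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = ["A", "A", "A", "B", "B", "B"]
H = bagging_fit(X, y, fit_stump)
print(bagging_predict(H, [0.0]), bagging_predict(H, [20.0]))
```

Passing the base learner in as `fit_base` mirrors the point that Bagging is independent of the learner it wraps; a random forest is Bagging with decision trees plus random feature selection at each split.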

The rules by which the random forest algorithm generates each tree are as follows:

a. for an original data set of m samples, a training set of m samples is obtained by bootstrap sampling;

b. if each sample has d features, an integer k smaller than d is chosen, k features are randomly selected from the d features, and at each node split of the decision tree the optimal feature is chosen from among these k features;

c. each tree is grown to maximum depth without pruning;

As shown in FIG. 2, the basic flow of the random forest algorithm is:

a. randomly sample n samples from the sample set with replacement;

b. randomly select k features from all features and build a decision tree on the selected samples using these features;

c. repeat the two steps above m times to generate m decision trees, which form the random forest;

d. for new data, each tree makes a decision, and the final class is determined by voting.
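The flow above can be sketched end to end in plain Python: bootstrap sampling, unpruned Gini-based trees, and majority voting. This is an illustrative sketch, not the patent's implementation. It follows rule b by drawing k random features at every node (step b of the flow could also be read as choosing k features once per tree); the default k = round(sqrt(d)) is an assumption because the source's value is not recoverable; and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X: List[Sequence[float]], y: List[str], k: int, rng: random.Random) -> Dict:
    """Grow an unpruned CART-style tree; each node considers only k random features."""
    if len(set(y)) == 1:
        return {"label": y[0]}  # pure node: stop
    best = None  # (weighted Gini, feature, threshold)
    for f in rng.sample(range(len(X[0])), k):
        # Skip the largest value so that neither side of the split is empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # the k sampled features are constant here: make a majority leaf
        return {"label": majority(y)}
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([X[i] for i in li], [y[i] for i in li], k, rng),
            "right": build_tree([X[i] for i in ri], [y[i] for i in ri], k, rng)}


def predict_tree(node: Dict, x: Sequence[float]) -> str:
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


def random_forest_fit(X, y, m: int = 25, k: Optional[int] = None, seed: int = 0) -> List[Dict]:
    """Steps a-c: m bootstrap samples, one unpruned tree per sample."""
    rng = random.Random(seed)
    if k is None:
        k = max(1, round(math.sqrt(len(X[0]))))  # assumed default, not given by the source
    forest = []
    for _ in range(m):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sampling with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest


def random_forest_predict(forest: List[Dict], x: Sequence[float]) -> str:
    """Step d: every tree votes; the most common class wins."""
    return majority([predict_tree(tree, x) for tree in forest])


# Hypothetical toy features for two working conditions.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [9.0, 9.0], [8.0, 9.0], [9.0, 8.0]]
y = ["insufficient liquid supply"] * 3 + ["tubing leakage"] * 3
forest = random_forest_fit(X, y)
print(random_forest_predict(forest, [1.0, 1.0]), random_forest_predict(forest, [9.0, 9.0]))
```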

Step 5: the validation data set is input into the trained classification model, and the automatic diagnosis results of the indicator diagram are evaluated by error rate, computational efficiency, accuracy and recall.
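The step-5 metrics can be computed as in the Python sketch below. It assumes "accuracy" means overall classification accuracy and reports recall per fault class (the share of true cases of each class that were diagnosed, so low recall corresponds to missed alarms); measuring computational efficiency as mean wall-clock time per sample is one possible choice, not one stated in the source. Function names and labels are hypothetical.

```python
import time
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, object]:
    """Accuracy, error rate and per-class / macro-averaged recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    support = Counter(y_true)  # true samples per fault class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correctly diagnosed per class
    recall = {c: hits[c] / n for c, n in support.items()}
    return {"accuracy": accuracy,
            "error_rate": 1.0 - accuracy,
            "recall": recall,
            "macro_recall": sum(recall.values()) / len(recall)}


def timed_predictions(predict: Callable, samples: Sequence) -> Tuple[List, float]:
    """One possible efficiency measure: mean wall-clock seconds per diagnosed sample."""
    if not samples:
        return [], 0.0
    start = time.perf_counter()
    preds = [predict(x) for x in samples]
    return preds, (time.perf_counter() - start) / len(samples)


y_true = ["tubing leakage", "tubing leakage", "pump bumping", "pump bumping"]
y_pred = ["tubing leakage", "pump bumping", "pump bumping", "pump bumping"]
print(evaluate(y_true, y_pred))
```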

Step 6: indicator diagram data are acquired in real time, and the trained classification model is used to automatically diagnose and determine the fault cause indicated by the indicator diagram in real time.
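Step 6 can be organized as a generator that diagnoses each card as it arrives, as in the Python sketch below. The source does not describe its acquisition interface, so the card tuple layout, the name `diagnose_stream` and the injected `extract`/`predict` callables (standing in for the step-2 feature extraction and the trained step-4 model) are all assumptions.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Card = Tuple[str, Sequence[float], Sequence[float]]  # hypothetical: (well id, displacement, load)


def diagnose_stream(cards: Iterable[Card],
                    extract: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
                    predict: Callable[[Sequence[float]], str]) -> Iterator[Tuple[str, str]]:
    """Diagnose each card as it arrives and yield (well id, predicted condition)."""
    for well_id, disp, load in cards:
        features = extract(disp, load)    # step 2, applied to live data
        yield well_id, predict(features)  # trained classifier from step 4


def toy_extract(disp: Sequence[float], load: Sequence[float]) -> Sequence[float]:
    return [max(load)]  # hypothetical single feature: maximum load


def toy_predict(features: Sequence[float]) -> str:
    return "high load" if features[0] > 40.0 else "ok"  # hypothetical stand-in model


incoming = [("W1", [0.0, 1.0], [20.0, 50.0]), ("W2", [0.0, 1.0], [20.0, 30.0])]
for well, condition in diagnose_stream(incoming, toy_extract, toy_predict):
    print(well, condition)
```

Because the generator consumes any iterable, the same loop works whether cards come from a list, a polling loop or a message queue, and results are produced one card at a time rather than after a batch completes.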

With the oil well indicator diagram automatic diagnosis method based on ensemble learning, the trained ensemble classification model can be used for automatic diagnosis of oil well indicator diagrams: oil well data can be collected in real time, the fault types that the diagrams may indicate can be identified, and production management efficiency can be effectively improved, giving the method practical value.

Claims

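1. An oil well indicator diagram automatic diagnosis method based on ensemble learning, characterized by comprising the following steps:

step 1: collecting historical data to construct an indicator diagram database, and performing data preprocessing and data cleaning on the data in the database;

step 2: extracting features of the indicator diagram based on oil extraction engineering theory and typical indicator diagrams;

step 3: dividing the data into a training data set, a validation data set and a test data set based on the original data and the generated data;

step 4: constructing a random forest model by ensemble learning, combining the Bagging algorithm with decision trees, and using it to classify the data;

step 5: inputting the validation data set into the random forest model, and evaluating the automatic diagnosis results of the indicator diagram by error rate, computational efficiency, accuracy and recall;

step 6: acquiring indicator diagram data in real time, and using the random forest model to automatically diagnose and determine the fault cause indicated by the indicator diagram in real time.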

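2. The oil well indicator diagram automatic diagnosis method based on ensemble learning as claimed in claim 1, wherein in step 1, the data cleaning comprises handling duplicate data, erroneous data and incomplete data, wherein:

step 1.1: for erroneous data, performing a consistency check; where the data are inconsistent, unifying them by modification, and repeating the check-and-modify process until the data meet the requirements, after which the data are output;

step 1.2: for duplicate data, removing the duplicates directly;

step 1.3: for incomplete data, calculating the Euclidean distance between the incomplete sample $X_{\mathrm{miss}}$ and each other sample $X_i$:

$$\mathrm{dist}_i = \lVert X_{\mathrm{miss}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$

sorting the distances to find the sample closest to the incomplete sample:

$$(X_{\min}, Y_{\min}) = \arg\min_{i} \mathrm{dist}_i \tag{2}$$

wherein $X_{\min}$ is the feature vector of the sample with the minimum Euclidean distance and $Y_{\min}$ is the data category label of that sample;

and taking the label of that sample as the label of the incomplete sample.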

3. The method for automatically diagnosing the indicator diagram of the oil well based on the ensemble learning as claimed in claim 1, wherein the extracted features in the step 2 comprise: oil extraction engineering characteristics and indicator diagram geometrical characteristics.

4. The oil well indicator diagram automatic diagnosis method based on ensemble learning as claimed in claim 1, wherein in step 3, the training data set is used to train the model, i.e., the parameters of the fitted curve are determined from the training data; the validation data set is used for model selection, i.e., the final tuning and selection of the model, and assists in constructing the model; and the test data set is used to test the accuracy of the trained model.

5. The oil well indicator diagram automatic diagnosis method based on ensemble learning as claimed in claim 1, wherein in step 4, the Bagging algorithm is specifically as follows: the algorithm is trained over multiple rounds, the training set of each round consisting of n training samples drawn at random, with replacement, from the initial training data, so that a given initial sample may appear several times in one round's training set or not at all; a sequence of prediction functions is obtained after training, and when predicting new examples the final prediction function uses voting for classification problems and simple averaging for regression problems.

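6. The oil well indicator diagram automatic diagnosis method based on ensemble learning as claimed in claim 1, wherein in step 4, the specific flow of the random forest algorithm is as follows:

randomly sampling n samples from the sample set with replacement;

randomly selecting k features from all features, and building a decision tree on the selected samples using these k features;

repeating the two steps above m times to generate m decision trees, which form the random forest;

for new data, each decision tree makes a decision, and the class is determined by voting.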

7. The oil well indicator diagram automatic diagnosis method based on ensemble learning as claimed in claim 6, wherein in step 4, the rule for generating each decision tree in the random forest model is as follows:

for an original data set of m samples, a training set of m samples is obtained by bootstrap sampling;

if each sample has d features, an integer k smaller than d is chosen, k features are randomly selected from the d features, and at each node split of the decision tree the optimal feature is chosen from among these k features;

each decision tree is grown to maximum depth without pruning;

wherein the value of k controls the degree of randomness of the random forest: when k = d, the number of attributes, the base decision trees in the random forest are generated in the same way as traditional decision trees; when k = 1, one attribute is randomly selected for splitting; and k is set to k = …, where d is the number of features.