CN115630571A - Oil well indicator diagram automatic diagnosis method based on ensemble learning - Google Patents

Info

Publication number
CN115630571A
Authority
CN
China
Prior art keywords
data
indicator diagram
oil well
random forest
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211252063.7A
Other languages
Chinese (zh)
Inventor
赵小波
雷俊杰
杨兴利
肖红卫
赵亚杰
尹正秋
张庆祝
张虎
于强
李辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Yanchang Oil Field Co Ltd
Original Assignee
Changan University
Yanchang Oil Field Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University, Yanchang Oil Field Co Ltd filed Critical Changan University
Priority to CN202211252063.7A priority Critical patent/CN115630571A/en
Publication of CN115630571A publication Critical patent/CN115630571A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses an automatic diagnosis method for oil well indicator diagrams based on ensemble learning, comprising the following steps: collecting historical data to construct an indicator diagram database, and preprocessing and cleaning the data in the database; extracting indicator diagram features based on oil production engineering theory and typical indicator diagrams; dividing the data into a training data set, a validation data set and a test data set based on the original data and the generated data; using ensemble learning to combine the Bagging algorithm with decision trees to generate a random forest model; inputting the validation data set into the random forest model and evaluating the automatic diagnosis results for the indicator diagrams; and acquiring indicator diagram data in real time and using the random forest model to diagnose, in real time, the fault cause indicated by each indicator diagram. The invention can collect oil well data in real time, identify the fault types that an indicator diagram may indicate, and thereby improve production management efficiency.

Description

Oil well indicator diagram automatic diagnosis method based on ensemble learning
Technical Field
The invention belongs to the technical field of oil well indicator diagram fault diagnosis, and particularly relates to an automatic diagnosis method for oil well indicator diagrams based on ensemble learning.
Background
The pumping unit is the main lifting equipment in the production operations of most oil fields. Because the rod, tubing and pump work in a harsh environment deep underground, and because the reservoir environment changes dynamically during production, working conditions are complex and pumping unit faults occur frequently. Judging the downhole working condition promptly and accurately, and then analysing the operating state of the pumping unit and the causes of changes in output, lets production staff and decision makers make timely well management decisions, helps prevent further deterioration of the well's working condition, and improves production efficiency.
The oil well indicator diagram reflects how the suspension point load of the pumping unit varies with displacement, and analysing the geometric shape of the indicator diagram is a common way to judge the downhole working condition of the pumping unit. In traditional indicator diagram analysis, the diagrams collected on site are identified manually to judge the downhole working condition, and a field engineer then proposes corresponding measures. With the continuous advance of automation technology and the growing emphasis of oil field enterprises on automation and digitization, indicator diagram diagnosis methods based on machine learning have been further developed and applied. Most traditional methods distinguish common downhole working conditions by recognizing typical indicator diagrams. In actual production, however, atypical, complex indicator diagrams also occur, and these can represent a variety of single or composite downhole working conditions. For example, single working conditions include pump hanging, pump bumping, tubing leakage and the plunger coming out of the pump barrel; composite working conditions include sand production with vibration, double valve leakage, insufficient liquid supply and the like. Indicator diagrams under complex working conditions resemble one another, and slight changes in shape correspond to different downhole working conditions. A model that recognizes only a few typical indicator diagrams therefore cannot serve as a comprehensive and accurate automatic diagnosis model, and cannot capture the full range of real-time production conditions.
Accurate real-time automatic diagnosis of oil well indicator diagrams therefore remains a challenge in automatic oil well fault diagnosis: judging by current diagnostic results, the false-alarm and missed-alarm rates are still high.
Disclosure of Invention
The invention aims to provide an automatic diagnosis method for oil well indicator diagrams based on ensemble learning, which addresses the high false-alarm and missed-alarm rates of existing automatic oil well fault diagnosis.
The technical solution adopted by the invention is as follows.
An automatic diagnosis method for oil well indicator diagrams based on ensemble learning comprises the following steps:
Step 1: collecting historical data to construct an indicator diagram database, and preprocessing and cleaning the data in the database;
Step 2: extracting indicator diagram features based on oil production engineering theory and typical indicator diagrams;
Step 3: dividing the data into a training data set, a validation data set and a test data set based on the original data and the generated data;
Step 4: using ensemble learning to combine the Bagging algorithm with decision trees to generate a random forest model, which then classifies the data;
Step 5: inputting the validation data set into the random forest model, and evaluating the automatic diagnosis results for the indicator diagrams using the error rate, computational efficiency, accuracy and recall;
Step 6: acquiring indicator diagram data in real time, and using the random forest model to diagnose, in real time, the fault cause indicated by each indicator diagram.
The invention is further characterized as follows.
In step 1, the data cleaning comprises handling of duplicate data, erroneous data and incomplete data, wherein:
Step 1.1: for erroneous data, perform a consistency check; when the data are inconsistent, unify them by modifying the data, repeat the check and modification until the data meet the requirements, and then output the data;
Step 1.2: for duplicate data, remove the duplicates directly;
Step 1.3: for incomplete data, calculate the Euclidean distance between the incomplete data $X_{\mathrm{lack}}$ and each other data item $X_i$:
$$\mathrm{dist} = \lVert X_{\mathrm{lack}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$
Sort the Euclidean distances dist and find the data item with the minimum Euclidean distance to the incomplete data:
$$(X_{\min}, Y_{\min}) = \arg\min \mathrm{dist} \tag{2}$$
where $X_{\min}$ is the data feature with the minimum Euclidean distance dist, and $Y_{\min}$ is the data category label corresponding to the minimum Euclidean distance dist;
the label of that data item is then taken as the label of the data whose label is incomplete.
In step 2, the extracted features include oil production engineering features and indicator diagram geometric features.
In step 3, the training data set is used to train the model, that is, the parameters of the fitted curve are determined from the training data; the validation data set is used for model selection, that is, for final tuning and determination of the model, and so assists in building the model; the test data set is used to test the accuracy of the trained model.
In step 4, the Bagging algorithm is as follows: the learning algorithm is trained for multiple rounds; the training set of each round consists of n training samples drawn at random, with replacement, from the initial training data, so that a given initial training sample may appear several times in a given round's training set or not at all; a sequence of prediction functions is obtained after training.
In step 4, the random forest algorithm proceeds as follows:
randomly sample the sample set with replacement to select n samples;
randomly select k features from all the features, and build a decision tree on the selected samples using these k features;
repeat the above two steps m times to generate m decision trees, which form the random forest;
for new data, each decision tree makes a decision, and the class of the new data is determined by voting.
In step 4, each decision tree in the random forest model is generated according to the following rules:
for an original data set containing m samples, obtain a training set containing m samples by bootstrap sampling;
if each sample has d features, choose an integer k smaller than d, randomly select k features from the d features and, when the decision tree splits at each node, choose the optimal feature from these k features;
each decision tree grows to its maximum depth without pruning;
wherein the value of k controls the degree of randomness of the random forest: when k = d, each base decision tree in the random forest is generated in the same way as a traditional decision tree; when k = 1, a single attribute is selected at random for the split; and k is set to
[formula image BDA0003888541110000041, not reproduced in the text]
where d is the number of features.
The beneficial effects of the invention are as follows. Through exploratory data analysis, several highly discriminative features are identified in addition to the basic indicators of the indicator diagram, such as the upper area, the lower area and the maximum load, which make the features more informative for identifying the faults that an indicator diagram may indicate. Ensemble learning is used to combine Bagging with decision trees into a random forest model. The model generalizes well, can handle high-dimensional data, keeps its overall accuracy when individual features are occasionally missing, and computes quickly, so that the fault type indicated by the indicator diagram is diagnosed effectively. The method can markedly improve the classification model's ability to diagnose indicator diagrams automatically, reduce the error rate and improve computational efficiency.
Drawings
FIG. 1 is a flow chart of the automatic diagnosis method for oil well indicator diagrams based on ensemble learning according to the invention;
FIG. 2 is a basic flow chart of the random forest model used in the automatic diagnosis method for oil well indicator diagrams based on ensemble learning.
Detailed Description
The automatic diagnosis method for oil well indicator diagrams based on ensemble learning is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to an automatic diagnosis method for oil well indicator diagrams based on ensemble learning. First, historical data are collected and cleaned, indicator diagram features are extracted, and the data are divided into training, validation and test data sets. Second, ensemble learning is used to combine Bagging with decision trees to generate a random forest model that classifies the data; once training is complete, the validation data set is input into the classification model, and the automatic diagnosis results are evaluated using the error rate, computational efficiency, accuracy and recall. Finally, indicator diagram data are acquired in real time, and the trained classification model diagnoses, in real time, the fault cause indicated by each indicator diagram.
As shown in FIG. 1, the automatic diagnosis method for oil well indicator diagrams based on ensemble learning comprises the following steps:
Step 1: collect historical data to construct an indicator diagram database, and preprocess and clean the data in the database. Abnormal data such as incomplete data are handled by deletion or interpolation, with the method chosen according to the specific situation, as follows:
Step 1.1: for erroneous data, perform a consistency check; when the data are inconsistent, unify them by modifying the data, repeat the check and modification until the data meet the requirements, and then output the data;
Step 1.2: for duplicate data, remove the duplicates directly;
Step 1.3: for incomplete data, the following procedure is generally adopted:
a. calculate the Euclidean distance between the incomplete data $X_{\mathrm{lack}}$ and each other data item $X_i$:
$$\mathrm{dist} = \lVert X_{\mathrm{lack}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$
b. sort the Euclidean distances dist and find the data item with the minimum Euclidean distance to the incomplete data:
$$(X_{\min}, Y_{\min}) = \arg\min \mathrm{dist} \tag{2}$$
where $X_{\min}$ is the data feature with the minimum Euclidean distance dist, and $Y_{\min}$ is the data category label corresponding to the minimum Euclidean distance dist;
c. take the label of that data item as the label of the data whose label is incomplete.
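Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration, assuming each record is a tuple of numeric features and that "incomplete" means the record has features but no label; the function names, feature values and fault labels are hypothetical and do not come from the patent.

```python
import math

def drop_duplicates(rows):
    """Step 1.2: keep the first occurrence of each record, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def nearest_label(x_lack, features, labels):
    """Step 1.3: return the label of the record closest to x_lack.

    Implements dist = ||x_lack - x_i||_2 (equation 1) followed by the
    arg min over i (equation 2).
    """
    best_index, best_dist = None, math.inf
    for i, x_i in enumerate(features):
        d = math.dist(x_lack, x_i)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return labels[best_index], best_dist

# Hypothetical labelled records and one record whose label is missing.
features = [(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)]
labels = ["normal", "insufficient_supply", "tubing_leak"]
label, dist = nearest_label((1.0, 3.0), features, labels)
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled Euclidean distance is dominated by the feature with the largest magnitude.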
Step 2: extract indicator diagram features based on oil production engineering theory and the characteristics of typical indicator diagrams, so that the features better describe the indicator diagram under different fault conditions. The specific method is as follows:
since different indicator diagrams may indicate different oil well faults, the following features are extracted:
oil production engineering features: pump depth, current water cut of the well, pump fillage, effective stroke;
indicator diagram geometric features: maximum load, minimum load, theoretical upper load and theoretical lower load, average upstroke load, average downstroke load, first and last peak values of the upstroke curve, first and last peak values of the downstroke curve, average slope of the upstroke curve, average slope of the downstroke curve, upper area, lower area, and the like.
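A few of the geometric features can be computed directly from the sampled (displacement, load) points. The sketch below is an illustration only: it assumes the points trace one closed loop starting at the bottom of the stroke, treats everything up to the first point of maximum displacement as the upstroke, and computes the total enclosed area with the shoelace formula. The patent does not define the upper and lower areas, so they are not computed here, and all names are hypothetical.

```python
def card_features(displacement, load):
    """Compute simple geometric features of one indicator diagram."""
    n = len(load)
    # Index of the first point at maximum displacement (top of the stroke).
    top = max(range(n), key=lambda i: displacement[i])
    upstroke = load[: top + 1]
    downstroke = load[top + 1:]  # assumed non-empty for a closed loop
    # Shoelace formula for the area enclosed by the closed curve.
    twice_area = sum(
        displacement[i] * load[(i + 1) % n] - displacement[(i + 1) % n] * load[i]
        for i in range(n)
    )
    return {
        "max_load": max(load),
        "min_load": min(load),
        "upstroke_mean_load": sum(upstroke) / len(upstroke),
        "downstroke_mean_load": sum(downstroke) / len(downstroke),
        "enclosed_area": abs(twice_area) / 2,
    }

# Idealized rectangular card: displacement in m, load in kN (made-up values).
s = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]
f = [60.0, 60.0, 60.0, 60.0, 30.0, 30.0, 30.0, 30.0]
feats = card_features(s, f)
```

For this rectangle the enclosed area is the stroke length times the load difference, 3 × 30 = 90, which gives a quick sanity check on the shoelace sum.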
Step 3: divide the data into a training data set, a validation data set and a test data set based on the original data and the generated data:
training data set: used to train the model, that is, the parameters of the fitted curve are determined from the training data;
validation data set: used for model selection, that is, for final tuning and determination of the model, and so assists in building the model;
test data set: used to test the accuracy of the trained model and to guard against the overfitting that arises when the data are noisy.
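The three-way split can be sketched as a shuffled partition. The 70/15/15 proportions and the fixed seed below are assumptions made for illustration; the patent does not state the split ratios.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle indices once, then cut them into train / validation / test."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = round(len(indices) * train_frac)
    n_val = round(len(indices) * val_frac)
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test

# 100 placeholder samples; real samples would be (features, label) pairs.
train, val, test = split_dataset(list(range(100)))
```

When some fault classes are rare, a stratified split that preserves the class proportions in each subset is usually preferable to a plain shuffle.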
Step 4: use ensemble learning to combine Bagging with decision trees and generate a random forest model that classifies the data.
The Bagging algorithm proceeds as follows: the learning algorithm is trained for multiple rounds; the training set of each round consists of n training samples drawn at random, with replacement, from the initial training data, so that a given initial training sample may appear several times in a given round's training set or not at all. Training yields a sequence of prediction functions $H_1, \dots, H_n$. The final prediction function $H$ decides new examples by voting for classification problems and by simple averaging for regression problems.
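The two building blocks of Bagging described above, sampling with replacement and combining the per-round predictions, can be sketched as follows. The helper names and the example labels are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, n, rng):
    """Draw n items with replacement: an item may appear several times or not at all."""
    return [rng.choice(data) for _ in range(n)]

def majority_vote(predictions):
    """Classification: the final prediction is the most common class."""
    return Counter(predictions).most_common(1)[0][0]

def simple_average(predictions):
    """Regression: the final prediction is the plain mean."""
    return sum(predictions) / len(predictions)

rng = random.Random(0)
data = list(range(10))
# Three rounds, each with its own bootstrap training set.
rounds = [bootstrap_sample(data, len(data), rng) for _ in range(3)]
vote = majority_vote(["tubing_leak", "normal", "tubing_leak"])
mean = simple_average([2.0, 4.0, 6.0])
```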
Each tree in the random forest is generated according to the following rules:
a. for an original data set containing m samples, obtain a training set containing m samples by bootstrap sampling;
b. if each sample has d features, choose an integer k smaller than d, randomly select k features from the d features and, when the decision tree splits at each node, choose the optimal feature from these k features;
c. each tree grows to its maximum depth without pruning.
As shown in FIG. 2, the basic flow of the random forest algorithm is:
a. randomly sample the sample set with replacement to select n samples;
b. randomly select k features from all the features, and build a decision tree on the selected samples using these features;
c. repeat the above two steps m times to generate m decision trees, which form the random forest;
d. for new data, each tree makes a decision, and the final class is determined by voting.
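The rules and flow above can be sketched as a small pure-Python random forest. This is an illustration under stated assumptions, not the patent's implementation: Gini impurity is assumed as the split criterion (the text only says "optimal feature"), the k candidate features are redrawn at every node, labels are strings, and the toy data and fault labels are made up.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Lowest weighted-Gini (score, feature, threshold) among the candidates."""
    best = None
    for f in feature_ids:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(X, y, k, rng):
    """Grow without pruning; k random features are considered at each node."""
    if len(set(y)) == 1:
        return y[0]
    split = best_split(X, y, rng.sample(range(len(X[0])), k))
    if split is None:  # all candidate features constant: majority-class leaf
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in li], [y[i] for i in li], k, rng),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], k, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees, k, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Toy data: two well-separated clusters with made-up fault labels.
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = ["normal"] * 3 + ["tubing_leak"] * 3
forest = fit_forest(X, y, n_trees=15, k=1, seed=0)
```

Calling `predict_forest(forest, (0.0, 0.0))` then returns the voted class for a new sample. A production system would more likely rely on an established library implementation than on hand-written trees; the sketch is only meant to make the sampling, feature selection and voting steps concrete.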
Step 5: input the validation data set into the trained classification model, and evaluate the automatic diagnosis results for the indicator diagrams using the error rate, computational efficiency, accuracy and recall.
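Three of the metrics named in step 5 can be sketched as follows. The definitions used are the standard ones (accuracy as the fraction of correct predictions, error rate as its complement, recall computed for one chosen fault class); the patent does not spell out its formulas, and the labels are hypothetical. Computational efficiency would be measured separately, for example by timing prediction.

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy, error rate, and recall for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return {"accuracy": accuracy, "error_rate": 1 - accuracy, "recall": recall}

y_true = ["tubing_leak", "tubing_leak", "normal", "normal"]
y_pred = ["tubing_leak", "normal", "normal", "normal"]
metrics = evaluate(y_true, y_pred, positive="tubing_leak")
```

Here one of the two leaks is missed, so recall for the leak class is 0.5 even though overall accuracy is 0.75, which is why recall is reported alongside accuracy when missed faults matter.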
Step 6: acquire indicator diagram data in real time, and use the trained classification model to diagnose, in real time, the fault cause indicated by each indicator diagram.
With the automatic diagnosis method for oil well indicator diagrams based on ensemble learning, the classification model obtained by ensemble learning can be used for automatic diagnosis of oil well indicator diagrams: oil well data can be collected in real time and the fault types that the diagrams may indicate can be identified, which improves production management efficiency and gives the method practical value.

Claims (7)

1. An automatic diagnosis method for oil well indicator diagrams based on ensemble learning, characterized by comprising the following steps:
Step 1: collecting historical data to construct an indicator diagram database, and preprocessing and cleaning the data in the database;
Step 2: extracting indicator diagram features based on oil production engineering theory and typical indicator diagrams;
Step 3: dividing the data into a training data set, a validation data set and a test data set based on the original data and the generated data;
Step 4: using ensemble learning to combine the Bagging algorithm with decision trees to generate a random forest model, which then classifies the data;
Step 5: inputting the validation data set into the random forest model, and evaluating the automatic diagnosis results for the indicator diagrams using the error rate, computational efficiency, accuracy and recall;
Step 6: acquiring indicator diagram data in real time, and using the random forest model to diagnose, in real time, the fault cause indicated by each indicator diagram.
2. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 1, wherein in step 1 the data cleaning comprises handling of duplicate data, erroneous data and incomplete data, wherein:
Step 1.1: for erroneous data, perform a consistency check; when the data are inconsistent, unify them by modifying the data, repeat the check and modification until the data meet the requirements, and then output the data;
Step 1.2: for duplicate data, remove the duplicates directly;
Step 1.3: for incomplete data, calculate the Euclidean distance between the incomplete data $X_{\mathrm{lack}}$ and each other data item $X_i$:
$$\mathrm{dist} = \lVert X_{\mathrm{lack}} - X_i \rVert_2, \quad i = 1, 2, \dots, N \tag{1}$$
Sort the Euclidean distances dist and find the data item with the minimum Euclidean distance to the incomplete data:
$$(X_{\min}, Y_{\min}) = \arg\min \mathrm{dist} \tag{2}$$
where $X_{\min}$ is the data feature with the minimum Euclidean distance dist, and $Y_{\min}$ is the data category label corresponding to the minimum Euclidean distance dist;
the label of that data item is then taken as the label of the data whose label is incomplete.
3. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 1, wherein the features extracted in step 2 comprise oil production engineering features and indicator diagram geometric features.
4. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 1, wherein in step 3 the training data set is used to train the model, that is, the parameters of the fitted curve are determined from the training data; the validation data set is used for model selection, that is, for final tuning and determination of the model, and so assists in building the model; and the test data set is used to test the accuracy of the trained model.
5. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 1, wherein in step 4 the Bagging algorithm is as follows: the learning algorithm is trained for multiple rounds; the training set of each round consists of n training samples drawn at random, with replacement, from the initial training data, so that a given initial training sample may appear several times in a given round's training set or not at all; a sequence of prediction functions is obtained after training; and the final prediction function decides new examples by voting for classification problems and by simple averaging for regression problems.
6. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 1, wherein in step 4 the random forest algorithm proceeds as follows:
randomly sample the sample set with replacement to select n samples;
randomly select k features from all the features, and build a decision tree on the selected samples using these k features;
repeat the above two steps m times to generate m decision trees, which form the random forest;
for new data, each decision tree makes a decision, and the class of the new data is determined by voting.
7. The automatic diagnosis method for oil well indicator diagrams based on ensemble learning as claimed in claim 6, wherein in step 4 each decision tree in the random forest model is generated according to the following rules:
for an original data set containing m samples, obtain a training set containing m samples by bootstrap sampling;
if each sample has d features, choose an integer k smaller than d, randomly select k features from the d features and, when the decision tree splits at each node, choose the optimal feature from these k features;
each decision tree grows to its maximum depth without pruning;
wherein the value of k controls the degree of randomness of the random forest: when k = d, the number of attributes in the attribute set, each base decision tree in the random forest is generated in the same way as a traditional decision tree; when k = 1, a single attribute is selected at random for the split; and k is set to
[formula image FDA0003888541100000031, not reproduced in the text]
where d is the number of features.
CN202211252063.7A 2022-10-13 2022-10-13 Oil well indicator diagram automatic diagnosis method based on ensemble learning Pending CN115630571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211252063.7A CN115630571A (en) 2022-10-13 2022-10-13 Oil well indicator diagram automatic diagnosis method based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211252063.7A CN115630571A (en) 2022-10-13 2022-10-13 Oil well indicator diagram automatic diagnosis method based on ensemble learning

Publications (1)

Publication Number Publication Date
CN115630571A true CN115630571A (en) 2023-01-20

Family

ID=84904078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211252063.7A Pending CN115630571A (en) 2022-10-13 2022-10-13 Oil well indicator diagram automatic diagnosis method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN115630571A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226767A (en) * 2023-05-08 2023-06-06 国网浙江省电力有限公司宁波供电公司 Automatic diagnosis method for experimental data of power system
CN116226767B (en) * 2023-05-08 2023-10-17 国网浙江省电力有限公司宁波供电公司 Automatic diagnosis method for experimental data of power system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination