CN112215307B - Method for automatically detecting signal abnormality of earthquake instrument by machine learning - Google Patents

Info

Publication number
CN112215307B
CN112215307B CN202011300744.7A CN202011300744A CN112215307B CN 112215307 B CN112215307 B CN 112215307B CN 202011300744 A CN202011300744 A CN 202011300744A CN 112215307 B CN112215307 B CN 112215307B
Authority
CN
China
Prior art keywords
data
value
sample
probability density
density function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011300744.7A
Other languages
Chinese (zh)
Other versions
CN112215307A (en)
Inventor
薛蕾
周蓝捷
李文惠
方伟华
王遹其
方一成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202011300744.7A
Publication of CN112215307A
Application granted
Publication of CN112215307B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Geophysics And Detection Of Objects (AREA)

Abstract

The invention discloses a method for automatically detecting signal anomalies of a seismic instrument by applying machine learning, comprising the following steps: S1, collecting historical data sets of the same type; S2, taking the continuous record of each channel of each station in the data set over a fixed period of time as one sample; S3, extracting from each sample multiple feature values that characterize the signal state; S4, normalizing each feature value; S5, building a training set, a cross-validation set, and a test set; S6, constructing a probability density function model and selecting a threshold ε for the decision boundary; S7, testing the probability density function model with the data in the test set; S8, inspecting and analyzing the samples that the model misclassified, adding new feature values that capture the abnormal characteristics of those samples, then repeating steps S4 to S7 to train an optimized model; and S9, processing the real-time data of the seismic stations according to steps S2 to S4 and detecting anomalies in the real-time data with the optimized model.

Description

Method for automatically detecting signal abnormality of earthquake instrument by machine learning
Technical Field
The invention relates to the field of seismic monitoring, in particular to a method for automatically detecting signal anomalies of a seismic instrument by using machine learning.
Background
At present, in the field of earthquake monitoring, the seismometers in a station network can acquire and display data in real time, and abnormal station signals can be identified manually from the waveforms transmitted back in real time. However, as the construction of seismic stations accelerates, the number of stations in a single province has grown from tens to hundreds or even thousands. Once the station data reaches the system, it is difficult to pick out abnormal signals from the huge volume of waveform data by manual inspection alone, which hampers earthquake monitoring work.
Disclosure of Invention
The invention aims to solve the above problems by providing a method for automatically detecting signal anomalies of a seismic instrument using machine learning, which is simple to operate and improves efficiency.
In order to achieve the above object, the technical scheme of the present invention is as follows:
A method for automatically detecting seismic instrument signal anomalies using machine learning, comprising the following steps:
S1, collecting historical data sets of the same type;
S2, inspecting and analyzing the data set, taking the continuous record of each channel of each station in the data set as one sample, manually screening the samples, deleting samples with obvious errors or gaps, manually labeling the samples, and dividing the data set into a normal subset and an abnormal subset;
S3, extracting from each sample multiple feature values that characterize the signal state;
S4, normalizing each feature value;
S5, selecting 60% of the data in the normal subset as the training set; selecting 20% of the data in the normal subset and 50% of the data in the abnormal subset as the cross-validation set; and using the remaining data as the test set;
S6, constructing a probability density function model from the mean and variance of each feature value in the training set, and selecting a threshold ε for the decision boundary using the data in the cross-validation set;
S7, testing the probability density function model with the data in the test set, using the selected threshold ε;
S8, after the probability density function model has been tested, inspecting and analyzing the samples it misclassified and adding new feature values that capture the abnormal characteristics of those samples; then repeating steps S4 to S7 to train an optimized model;
S9, processing the real-time data of the seismic stations according to steps S2 to S4, and detecting anomalies in the real-time data with the optimized model.
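Step S4 does not state which normalization is applied. As a minimal sketch, assuming a z-score normalization whose statistics are taken from the training samples only, the step could look like the following. The function names `fit_scaler` and `apply_scaler` and the sample values are hypothetical and do not come from the patent.

```python
import statistics

def fit_scaler(rows):
    """Return per-feature (means, stds) computed column-wise from the rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant feature gets 1.0 to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_scaler(rows, means, stds):
    """Scale each feature to zero mean and unit variance using the given statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical feature vectors (two features per sample).
train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, stds = fit_scaler(train)
scaled = apply_scaler(train, means, stds)
```

Reusing the same `means` and `stds` for the cross-validation set, the test set, and later the real-time data keeps all samples on the scale the model was fitted on.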
Further, the feature values in step S3 include the mean, median, maximum, minimum, and amplitude.
Further, when feature values are extracted from a sample in step S3, a sliding time window is set first, and the differences in the maximum, minimum, median, mean, and amplitude between adjacent time windows are then used as feature values.
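The sliding-window extraction can be sketched as below. The 10 s window and 1 s step follow the example given in the detailed description; treating "amplitude" as the peak-to-peak range (maximum minus minimum) is an assumption, and the function names and the toy signal are hypothetical.

```python
import statistics

def window_stats(window):
    """Return [max, min, median, mean, amplitude] for one time window."""
    hi, lo = max(window), min(window)
    # Assumption: amplitude is taken as the peak-to-peak range of the window.
    return [hi, lo, statistics.median(window), statistics.fmean(window), hi - lo]

def sliding_features(signal, rate_hz, window_s=10, step_s=1):
    """Per-window statistics and their differences between adjacent windows."""
    size, step = int(window_s * rate_hz), int(step_s * rate_hz)
    stats = [window_stats(signal[i:i + size])
             for i in range(0, len(signal) - size + 1, step)]
    # Difference of each statistic between a window and the one before it.
    diffs = [[b - a for a, b in zip(prev, cur)]
             for prev, cur in zip(stats, stats[1:])]
    return stats, diffs

# Toy signal: 12 samples at 1 Hz, which yields three 10 s windows.
stats, diffs = sliding_features(list(range(12)), rate_hz=1)
```

A per-sample feature vector would then be built by summarizing `stats` and `diffs` (for example, by taking their mean or maximum over all windows); the patent leaves that choice open.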
Further, constructing the probability density function model in step S6 includes the following steps:
S61, for a given training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, calculating the mean and variance of each feature:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$
where m is the number of samples, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set;
S62, establishing the probability density function model from the means and variances:
$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$
where p(x) is the probability density function, n is the number of features, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature j, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set;
S63, setting a threshold ε for the decision boundary and predicting whether the data are abnormal by taking p(x) = ε as the decision boundary: a sample is normal when p(x) > ε and abnormal otherwise;
S64, substituting the data in the cross-validation set into the probability density function model, and selecting the threshold ε of the decision boundary according to precision and recall.
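The four sub-steps of S6 can be sketched in a few lines. This is an illustration under assumptions rather than the patented implementation: features are taken to be independent and Gaussian, the variance uses the 1/m estimator, candidate thresholds are scanned on a simple linear grid, and all function names and data values are hypothetical.

```python
import math

def fit_gaussian(train):
    """Per-feature mean and variance (1/m estimator) of the training set."""
    cols, m = list(zip(*train)), len(train)
    mu = [sum(c) / m for c in cols]
    var = [sum((v - u) ** 2 for v in c) / m for c, u in zip(cols, mu)]
    return mu, var

def density(x, mu, var):
    """p(x) as the product of independent one-dimensional Gaussian densities."""
    p = 1.0
    for xj, u, s2 in zip(x, mu, var):
        p *= math.exp(-(xj - u) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

def f1_score(p_values, labels, eps):
    """F1 for the rule 'abnormal unless p(x) > eps'; label 1 means abnormal."""
    tp = sum(1 for p, y in zip(p_values, labels) if p <= eps and y == 1)
    fp = sum(1 for p, y in zip(p_values, labels) if p <= eps and y == 0)
    fn = sum(1 for p, y in zip(p_values, labels) if p > eps and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_values, labels, steps=1000):
    """Scan a linear grid of thresholds and keep the one with the best F1."""
    lo, hi = min(p_values), max(p_values)
    best_eps, best_f1 = lo, 0.0
    for k in range(1, steps + 1):
        eps = lo + (hi - lo) * k / steps
        f1 = f1_score(p_values, labels, eps)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Hypothetical normalized data: normal training samples and a labeled validation set.
train = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5], [-0.5, 0.5]]
cv = [[0.1, 0.0], [0.2, -0.3], [4.0, 4.0], [-5.0, 3.0]]
cv_labels = [0, 0, 1, 1]

mu, var = fit_gaussian(train)
eps, best_f1 = select_epsilon([density(x, mu, var) for x in cv], cv_labels)
```

Because densities can span many orders of magnitude, a logarithmic grid or thresholds placed at the observed p(x) values may be a better choice than the linear grid used here.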
Compared with the prior art, the invention has the advantages and positive effects that:
The invention prepares samples from historical data and builds a probability density function model from the feature values extracted from those samples. During seismic monitoring, the real-time data of a large number of seismic stations can be fed into the probability density function model, which automatically distinguishes normal waveforms from abnormal ones without manual intervention and thereby screens out the stations with abnormal signals. This reduces labor cost, improves monitoring efficiency, and makes earthquake monitoring work more convenient.
Drawings
To illustrate the embodiments of the invention and the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the anomaly detection principle;
FIG. 2 is a framework diagram of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, modifications, equivalents, improvements, etc., which are apparent to those skilled in the art without the benefit of this disclosure, are intended to be included within the scope of this invention.
As shown in FIG. 1 and FIG. 2, the invention uses a machine learning method to test and classify the real-time data of seismic stations, and can rapidly identify stations with abnormal seismic signals.
Principle: real-time data can be regarded as a set in which "normal" data points generally resemble one another, while "abnormal" data points differ significantly from the rest and are therefore called outliers. Anomaly detection in machine learning is the process of finding such outliers, that is, data points that differ significantly from most other data points.
As shown in FIG. 1, assuming that the data set has two features, x1 and x2, points in the plot that lie far from the bulk of the data can be considered outliers because they behave quite differently from the other data.
For a given data set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, assuming that the features follow a Gaussian distribution, the values of $\mu_j$ and $\sigma_j^2$ can be calculated for each feature:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$
where m is the number of samples, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set.
Once $\mu_j$ and $\sigma_j^2$ have been obtained, p(x) can be calculated from the model for a new sample x:
$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$
where p(x) is the probability density function, n is the number of features, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature j, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set.
When p(x) is smaller than the threshold ε, the sample is judged to be abnormal.
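As a worked illustration of this decision rule, with values chosen purely for the example (they do not come from the patent), take n = 2 features with μ = (0, 0) and σ² = (1, 1), a new sample x = (3, 0), and a threshold ε = 0.01. With many features the product can underflow, so an implementation may prefer to compare log-densities; the second line gives the equivalent form.

```latex
p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-3^2/2}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-0^2/2}
     = \frac{1}{2\pi}\,e^{-4.5} \approx 1.77\times10^{-3} < \varepsilon = 0.01
     \quad\Rightarrow\quad \text{abnormal}

\ln p(x) = -\frac{1}{2}\sum_{j=1}^{n}\left[\ln\!\left(2\pi\sigma_j^2\right)
           + \frac{\left(x_j-\mu_j\right)^2}{\sigma_j^2}\right],
\qquad p(x) < \varepsilon \iff \ln p(x) < \ln\varepsilon
```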
As shown in fig. 2, the present invention is implemented as follows:
1. Data collection: collect historical data sets of the same type (including data with abnormal signals).
2. Data cleaning and arrangement: inspect and analyze the data set, take the continuous record of each channel of each station over a period of time as one sample, manually screen the samples, and delete samples with obvious errors or gaps in format, content, and so on. Then manually label and sort the data set: plot the time history of each sample, label it by hand, and divide the data set into a normal subset and an abnormal subset.
3. Feature engineering: extract from each sample the features that can characterize the signal state, such as statistical features of the whole record (for example the mean, median, maximum, minimum, and amplitude). Because seismic data change over time, a sliding time window is needed to capture the temporal behavior. For example, with a 10 s window that advances 1 s at a time, the statistical features within the window are extracted after each slide until the end of the record is reached. To capture how the data vary, the differences in the maximum, minimum, median, mean, and amplitude between adjacent time windows are computed, and the statistical features of these differences are extracted.
4. Feature processing: to make the algorithm more effective, normalize each feature value. Inspect the distribution of each feature; a feature can be transformed so that it approximates a normal distribution.
5. Data allocation: select 60% of the normal data as the training set; use 20% of the normal data and 50% of the abnormal data as the cross-validation set; use the remaining data as the test set; and label the data accordingly.
6. Model construction: estimate the mean and variance of the features from the training set and construct the probability density function p(x). On the cross-validation set, try different thresholds ε and predict whether the data are abnormal by taking p(x) = ε as the decision boundary: a sample is normal when p(x) > ε and abnormal otherwise. Finally, select the threshold ε according to precision and recall (or the F1 value, where F1 = 2 × precision × recall / (precision + recall)).
7. Model testing: using the selected threshold ε, run the model on the test set and calculate the precision and recall (F1 value) of the anomaly detection system.
8. Model optimization: examine the results of the model test. If the algorithm mistakes an abnormal sample for a normal one, that sample has a high p(x) value. The sample then needs to be inspected and analyzed, and new features that capture its abnormal characteristics need to be added. Steps 4 to 7 are then repeated to train an optimal model, until all abnormal samples in the test set are identified.
9. Practical application: process the real-time data of the seismic stations according to steps 2 to 4, then perform anomaly detection on the real-time seismic waveforms with the optimal model obtained in step 8.
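The data allocation in step 5 can be sketched as follows. The percentages are those given in the text; shuffling before the split, the fixed random seed, the integer rounding, and the function name `split_dataset` are assumptions made for the illustration.

```python
import random

def split_dataset(normal, abnormal, seed=0):
    """Split samples into a training set, a cross-validation set, and a test set.

    Training: 60% of the normal samples (unlabeled, all normal).
    Cross-validation: 20% of the normal and 50% of the abnormal samples.
    Test: the remaining samples. Label 0 means normal, 1 means abnormal.
    """
    rng = random.Random(seed)
    normal, abnormal = list(normal), list(abnormal)
    rng.shuffle(normal)
    rng.shuffle(abnormal)

    n_train = len(normal) * 6 // 10
    n_cv = len(normal) * 2 // 10
    a_cv = len(abnormal) // 2

    train = normal[:n_train]
    cv = ([(x, 0) for x in normal[n_train:n_train + n_cv]]
          + [(x, 1) for x in abnormal[:a_cv]])
    test = ([(x, 0) for x in normal[n_train + n_cv:]]
            + [(x, 1) for x in abnormal[a_cv:]])
    return train, cv, test

# Hypothetical sample identifiers: 100 normal and 10 abnormal samples.
train, cv, test = split_dataset(range(100), range(100, 110))
```

Only normal samples enter the training set, so the fitted density describes normal behavior; the abnormal samples are reserved for choosing ε and for the final test.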
The invention prepares samples from historical data and builds a probability density function model from the feature values extracted from those samples. During seismic monitoring, the real-time data of a large number of seismic stations can be fed into the probability density function model, which automatically distinguishes normal waveforms from abnormal ones without manual intervention and thereby screens out the stations with abnormal signals. This reduces labor cost, improves monitoring efficiency, and makes earthquake monitoring work more convenient.
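Screening stations in real time then reduces to scoring one feature vector per station channel against the fitted model. The sketch below assumes the features have already been extracted and normalized as in steps 2 to 4, and it compares log-densities to avoid numerical underflow. The channel identifiers, model parameters, and function names are hypothetical.

```python
import math

def log_density(x, mu, var):
    """Logarithm of the product of independent Gaussian densities."""
    return sum(-0.5 * math.log(2 * math.pi * s2) - (xj - u) ** 2 / (2 * s2)
               for xj, u, s2 in zip(x, mu, var))

def flag_abnormal(features_by_channel, mu, var, eps):
    """Return the channels whose density does not exceed the threshold eps."""
    log_eps = math.log(eps)
    return sorted(channel for channel, x in features_by_channel.items()
                  if log_density(x, mu, var) <= log_eps)

# Hypothetical fitted model and live feature vectors for two station channels.
mu, var, eps = [0.0, 0.0], [1.0, 1.0], 0.01
live = {"STA1.BHZ": [0.1, -0.2], "STA2.BHZ": [3.0, 0.0]}
abnormal_channels = flag_abnormal(live, mu, var, eps)
```

In this example only the second channel falls below the threshold, so it is the one that would be passed on for manual inspection.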

Claims (3)

1. A method for automatically detecting signal anomalies of a seismic instrument using machine learning, characterized in that the method comprises the following steps:
S1, collecting historical data sets of the same type;
S2, inspecting and analyzing the data set, taking the continuous record of each channel of each station in the data set as one sample, manually screening the samples, deleting samples with obvious errors or gaps, manually labeling the samples, and dividing the data set into a normal subset and an abnormal subset;
S3, extracting from each sample multiple feature values that characterize the signal state;
S4, normalizing each feature value;
S5, selecting 60% of the data in the normal subset as the training set; selecting 20% of the data in the normal subset and 50% of the data in the abnormal subset as the cross-validation set; and using the remaining data as the test set;
S6, constructing a probability density function model from the mean and variance of each feature value in the training set, and selecting a threshold ε for the decision boundary using the data in the cross-validation set;
S7, testing the probability density function model with the data in the test set, using the selected threshold ε;
S8, after the probability density function model has been tested, inspecting and analyzing the samples it misclassified and adding new feature values that capture the abnormal characteristics of those samples; then repeating steps S4 to S7 to train an optimized model;
S9, processing the real-time data of the seismic stations according to steps S2 to S4, and detecting anomalies in the real-time data with the optimized model;
wherein constructing the probability density function model in step S6 comprises the following steps:
S61, for a given training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, calculating the mean and variance of each feature:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$
where m is the number of samples, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set;
S62, establishing the probability density function model from the means and variances:
$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$
where p(x) is the probability density function, n is the number of features, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature j, $\mu_j$ is the mean of feature j in the training set, and $\sigma_j^2$ is the variance of feature j in the training set;
S63, setting a threshold ε for the decision boundary and predicting whether the data are abnormal by taking p(x) = ε as the decision boundary, a sample being normal when p(x) > ε and abnormal otherwise;
S64, substituting the data in the cross-validation set into the probability density function model, and selecting the threshold ε of the decision boundary according to precision and recall.
2. The method for automatically detecting signal anomalies of a seismic instrument using machine learning of claim 1, wherein the feature values in step S3 include the mean, median, maximum, minimum, and amplitude.
3. The method for automatically detecting signal anomalies of a seismic instrument using machine learning of claim 2, wherein, when feature values are extracted from a sample in step S3, a sliding time window is set first, and the differences in the maximum, minimum, median, mean, and amplitude between adjacent time windows are then used as feature values.
CN202011300744.7A 2020-11-19 2020-11-19 Method for automatically detecting signal abnormality of earthquake instrument by machine learning Active CN112215307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011300744.7A CN112215307B (en) 2020-11-19 2020-11-19 Method for automatically detecting signal abnormality of earthquake instrument by machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011300744.7A CN112215307B (en) 2020-11-19 2020-11-19 Method for automatically detecting signal abnormality of earthquake instrument by machine learning

Publications (2)

Publication Number Publication Date
CN112215307A (en) 2021-01-12
CN112215307B (en) 2024-03-19

Family

ID=74067857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011300744.7A Active CN112215307B (en) 2020-11-19 2020-11-19 Method for automatically detecting signal abnormality of earthquake instrument by machine learning

Country Status (1)

Country Link
CN (1) CN112215307B (en)

Also Published As

Publication number Publication date
CN112215307A (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant