CN112235043B - Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory - Google Patents

Info

Publication number
CN112235043B
Authority
CN
China
Prior art keywords
optical fiber
value
data
adaptive
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010959309.9A
Other languages
Chinese (zh)
Other versions
CN112235043A (en
Inventor
王俊龙
刘宏月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202010959309.9A priority Critical patent/CN112235043B/en
Publication of CN112235043A publication Critical patent/CN112235043A/en
Application granted granted Critical
Publication of CN112235043B publication Critical patent/CN112235043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B10/00 Transmission systems employing electromagnetic waves other than radio-waves, e.g. infrared, visible or ultraviolet light, or employing corpuscular radiation, e.g. quantum communication
    • H04B10/07 Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems
    • H04B10/075 Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems using an in-service signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Optical Communication System (AREA)

Abstract

The invention provides a device for repairing abnormal distributed optical fiber sensing data based on self-adaptive long short-term memory (LSTM). It mainly comprises a noise reduction preprocessing module and a self-adaptive LSTM prediction module, and is used to repair abnormal values in monitoring data. To match the noise characteristics of the monitored data, a data filtering method that fuses a weighted difference with a 3σ threshold criterion is adopted: time sequence variable filtering and position variable filtering are combined, and the theoretically derived limit strain of the optical fiber is used as the defining threshold, so that abnormal values are eliminated. The null values of the optical fiber sensing monitoring data and the filtered-out abnormal values are then taken as the prediction targets of the LSTM model for data restoration. A loss function Loss that adapts during iterative training is adopted to improve the learning efficiency of the model and further reduce the accumulated error. All monitoring sample points of the distributed optical fiber are traversed according to the spatial resolution to complete the repair, giving high efficiency and high monitoring precision.

Description

Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory
Technical Field
The invention relates to a missing data restoration model, in particular to a distributed optical fiber abnormal data restoration model based on self-adaptive long short-term memory (LSTM). It is used to restore sample point data that go missing during structure monitoring with a distributed optical fiber because of factors such as external interference and jumps in the input signal.
Background
As a novel structural health monitoring technology, the distributed optical fiber sensor system offers high measurement precision, a wide measurement range, and high spatial resolution. During structural state monitoring with a distributed optical fiber, when the measured strain gradient jumps beyond the demodulation range of the optical fiber demodulator because of factors such as a harsh external monitoring environment, an unreasonable optical fiber layout, noise signals, or the structural characteristics of the monitored object, the probability of abnormal values in the optical fiber sensing data increases greatly. To solve this problem, a targeted data recovery model must be provided according to the characteristics of distributed optical fiber sensing data.
A regular time sequence is the defining characteristic of the monitoring data collected by distributed optical fiber sensing. Common methods for time series analysis include the autoregressive integrated moving average (ARIMA) method, time series difference algorithms, deep neural networks, the Monte Carlo method, and long short-term memory (LSTM) recurrent networks. The LSTM effectively overcomes the vanishing and exploding gradients of recurrent neural networks and their insufficient long-term memory capacity, and is therefore better suited to training on long-range time sequence information. The LSTM takes the current data and the information already fed into the network, learns and continuously updates state information through a deep network, and uses its strong memory capacity to predict the trend of subsequent data; it is widely applied to fault feature diagnosis, image analysis, speech recognition, and time series prediction such as financial stock markets. The continuously autocorrelated time series data collected by distributed optical fibers in structure monitoring suffer from local gaps and noise signals, which limits the application of distributed optical fiber sensor systems and has become a technical problem that urgently needs solving.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory. It targets the local gaps and noise signals in the continuously autocorrelated time sequence data acquired by a distributed optical fiber in structure monitoring, optimizes the restoration model for optical fiber time sequence monitoring data, combines the LSTM algorithm with the characteristics of optical fiber monitoring data, constructs an optical fiber abnormal data restoration model consisting of a noise reduction preprocessing module and an LSTM prediction module, and improves the monitoring precision.
To achieve this purpose, the invention adopts the following inventive concept:
Building on the strong performance of the LSTM in time sequence prediction, the invention combines the LSTM algorithm with the characteristics of optical fiber monitoring data to construct an optical fiber abnormal data restoration model consisting of a noise reduction preprocessing module and an LSTM prediction module.
First, noise preprocessing is completed by fusing the weighted difference with a threshold method through the 3σ criterion.
Then, because the positions and quantities of missing data differ between sample points, the invention proposes an adaptively iterated input cycle length based on the optical fiber sampling frequency, and defines a corresponding error loss function to constrain the learning efficiency of the LSTM network model. Because the predicted value at one moment becomes an independent variable value at the next moment, iteration errors accumulate; the invention therefore sets an average correction coefficient δ to improve the fit between the predicted value and the measured value. Finally, the data restoration is completed by traversing all sample points. The invention thus provides a distributed optical fiber abnormal data restoration model based on self-adaptive LSTM, combines the improved LSTM algorithm with distributed optical fiber data, and improves the prediction precision of the LSTM algorithm on abnormal optical fiber data.
According to the inventive concept, the invention adopts the following technical scheme:
A distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory combines a self-adaptive LSTM algorithm with distributed optical fiber time sequence data, and completes the abnormal data restoration by adaptively changing the optical fiber time sequence input period length of the LSTM and the control parameters of the training process. The model mainly comprises a noise reduction preprocessing module and a self-adaptive LSTM data prediction module, and executes the following steps:
1) First, the noise reduction preprocessing module fuses a weighted difference algorithm based on the optical fiber time sequence variable with a 3σ threshold noise reduction algorithm based on the position variable, so as to improve the noise reduction of abnormal optical fiber noise signals;
2) The self-adaptive LSTM prediction module is constructed, in which two self-adaptive parameters of the LSTM, the input cycle length L and the loss function Loss, are continuously updated, and an error correction coefficient δ is set, so as to improve the repair precision of the model.
Preferably, step 1) is carried out as follows: a weighted disturbance factor $\alpha_i$ is defined in the noise reduction preprocessing module; under the 3σ threshold criterion for the position variable, a strain threshold based on the mechanical characteristics of the optical fiber is defined to further constrain the valid range of the monitored values; and the weighted difference algorithm for the time sequence variable is combined with the 3σ threshold method for the position variable to filter noise signals in preprocessing. Step 1) specifically comprises the following steps:
1-1) In the weighted difference method for the time sequence variable, the independent variable is the time sequence and the dependent variable is the measured strain value; the weighted first-order difference is calculated as follows:
Figure GDA0003911504620000021
where m is the length of the window used to compute $y_x$, and $y_{mean}$ is the mean of the m samples within the window, i.e.
Figure GDA0003911504620000022
where i is the index of the sampling point, $y_i$ is the monitored value of that sampling point, and $\alpha_i$ is the weighted disturbance factor, which satisfies:
Figure GDA0003911504620000023
where the influence factor of
Figure GDA0003911504620000024
on
Figure GDA0003911504620000025
is 1/m; the influence factor of
Figure GDA0003911504620000026
on
Figure GDA0003911504620000027
is 2/m; and so on, until the influence factor of
Figure GDA0003911504620000028
on
Figure GDA0003911504620000029
is (m-1)/m;
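The weighted difference formula itself survives here only as an image placeholder, so the following is a minimal sketch under stated assumptions rather than the patented formula. It assumes linearly increasing weights normalised to sum to 1 (so later samples in the window count more, in the spirit of the 1/m, 2/m, ... influence factors) and a hypothetical multiplier `factor`; the function name is illustrative only.

```python
def weighted_difference_flags(y, m, factor=3.0):
    """Flag samples whose first-order difference is large compared with a
    weighted average deviation of the preceding m samples.

    Assumptions (not from the source): weights alpha_i = i / (1 + 2 + ... + m),
    and the hypothetical threshold multiplier `factor`.
    """
    total = m * (m + 1) / 2
    alpha = [(i + 1) / total for i in range(m)]  # increasing weights, sum to 1
    flags = [False] * len(y)
    for x in range(m, len(y)):
        window = y[x - m:x]                      # the m samples before y[x]
        y_mean = sum(window) / m
        weighted_dev = sum(a * abs(v - y_mean) for a, v in zip(alpha, window))
        diff = abs(y[x] - y[x - 1])              # first-order difference
        flags[x] = diff > factor * weighted_dev
    return flags


strain = [10.0, 10.1, 9.9, 10.0, 10.1, 50.0, 10.1, 10.0]
flags = weighted_difference_flags(strain, m=4)
```

In this toy series only the jump to 50.0 is flagged; the return to the normal level is not, because the spike inflates the weighted deviation of the window that follows it.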
1-2) In the 3σ threshold criterion for the position variable, the threshold is set on the basis of the standard 3σ rule: the effective strain monitoring range of the optical fiber, $[\varepsilon_{min}, \varepsilon_{max}]$, is derived mainly from the ultimate stress $\sigma_u$ given by the mechanical strength of the fiber. With the engineering allowable stress
Figure GDA0003911504620000031
the limit strain $\varepsilon$ that the fiber can sustain satisfies:
$E(1 + c\varepsilon)\varepsilon \le [\sigma]$
where c is the nonlinear balance constant, with a value of 3.0 to 6.0, and n is the allowable stress balance constant, with a value of 3.0 to 5.0. Combining the two formulas gives the bounded range of the limit strain of the optical fiber:
Figure GDA0003911504620000032
where E is the elastic modulus of the optical fiber and $\sigma_u$ is its ultimate breaking stress.
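As a worked illustration of the inequality $E(1 + c\varepsilon)\varepsilon \le [\sigma]$, the sketch below solves for the largest positive strain that satisfies it. It assumes that the allowable stress is $[\sigma] = \sigma_u / n$, which is the usual safety-factor form but is not confirmed by the text because that formula is an image placeholder. The numerical values of E and $\sigma_u$ are hypothetical.

```python
import math


def limit_strain(E, sigma_u, c, n):
    """Largest strain eps >= 0 with E * (1 + c * eps) * eps <= sigma_u / n.

    Assumption: allowable stress [sigma] = sigma_u / n.
    The bound is the positive root of c*E*eps**2 + E*eps - sigma_u/n = 0.
    """
    allowable = sigma_u / n
    return (-1.0 + math.sqrt(1.0 + 4.0 * c * allowable / E)) / (2.0 * c)


# Hypothetical inputs: E = 72 GPa, sigma_u = 5 GPa, c = 4.0, n = 4.0
E, sigma_u, c, n = 72e9, 5e9, 4.0, 4.0
eps_max = limit_strain(E, sigma_u, c, n)
```

Substituting `eps_max` back into the left-hand side reproduces the allowable stress, which is a quick way to check the root.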
When the position variable data are filtered, an extreme threshold method is used to screen out abnormal values
Figure GDA0003911504620000033
Based on the mean and standard deviation of the data sampled along the optical fiber at the same instant, constraint coefficients $K_1$ and $K_2$ are defined to bound the width of the range; a normal measured value satisfies:
Figure GDA0003911504620000034
where $K_1\varepsilon_{min}$ should approximate $\mu - k\sigma$ and $K_2\varepsilon_{max}$ should approximate $\mu + k\sigma$; $\sigma$ is the standard deviation of all sampled values in the value window, $\mu$ is the mean of the sample points, and k is the confidence range coefficient, $k \in [0, 3]$.
The invention performs denoising preprocessing on the raw monitored data. Gross errors, as defined above, are removed by a noise reduction algorithm that combines the weighted difference over a time sequence sliding window with the 3σ threshold criterion for the position variable. In the first stage the independent variable is time, determined by the sampling frequency, and the dependent variable is the measured strain: the difference between the data $y_x$ at time x and $y_{x-1}$ at time x-1 is compared with the average weighted error of the data at the current m time instants. After the weighted difference filtering is finished, the independent variable is switched to the position variable, while the dependent variable remains the strain. The position variable monitoring data are then screened by a threshold method based on the strain limit of the optical fiber, with the ultimate stress $\sigma_u$ obtained from the mechanical strength of the optical fiber.
Preferably, step 2) is carried out as follows: in the self-adaptive LSTM prediction module, the training efficiency of the model is improved by using the self-adaptive input time period length L and the improved custom loss function Loss. Step 2) specifically comprises the following steps:
2-1) In the time prediction sequence, if the time interval between the current prediction point and the last prediction point is L, then after each window movement the improved self-adaptive window length L is:
Figure GDA0003911504620000035
where $d_k$ is the initial input period window for sampling point k, which satisfies:
Figure GDA0003911504620000036
where $L_k$ is the number of windows of consecutive missing data at sampling point k; $L_{sum}$ is the sum of the missing windows over all sampling points; $H_k$ is the number of abnormal values at sampling point k on the optical fiber; and T is the total sampling time;
2-2) On the basis of a standard regularization term function, the segmented input cycle length L and the window step distance n are integrated using the idea of a two-dimensional array, and the training loss function Loss of the LSTM is defined so as to improve the learning efficiency. Here L is the length of the current segmentation window, n < L, n is the number of sample point steps in the prediction set, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n). According to the error calculation formula, the loss function Loss of the training process is defined as:
Figure GDA0003911504620000041
where i is the index of the sampling point;
Figure GDA0003911504620000042
is the average over the segmentation window; $y_i$ is the predicted value; L is the input cycle length; n is the step distance of the prediction window; R(ω) is a regularization term used to limit interference noise in model learning; and β is its controllable weight factor constant.
Preferably, in the self-adaptive LSTM prediction module, an average error coefficient δ is defined, based on the position independent variable, from the differences between the monitored values of sampling points at adjacent positions over the n steps before the time t to be predicted. It is used to correct the actual LSTM predicted value and thereby reduce the iteration error of the optical fiber data. Over the n steps, the average error coefficient δ satisfies:
Figure GDA0003911504620000043
where $\beta_i$ is the proposed weight factor, equal to
Figure GDA0003911504620000044
n is the step distance of the prediction window, i is the number of shifts to the monitored value at the previous time, j is the number of shifts to the monitored value at the next time, $y_{t-i}$ is the monitored value at the previous time, and $y_{t-j}$ is the monitored value at the next time. The actual predicted value at time t,
Figure GDA0003911504620000045
is output as:
Figure GDA0003911504620000046
where $y_t$ is the theoretical prediction of the LSTM and
Figure GDA0003911504620000047
is the mean of the measured values of the sampling points over n steps; the higher the spatial resolution, the closer the monitored values of adjacent sampling points.
Preferably, 3σ criterion filtering is applied to all monitored data over both the time sequence and position variables. If $y_x$ lies within the confidence interval, the data point is treated as a normal value; otherwise it is treated as a noise signal and the original value is replaced with NaN:
$\mu - k\sigma \le y_x \le \mu + k\sigma$
where k sets the width of the confidence interval; generally, $k \in [0, 3]$.
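The confidence-interval test above can be sketched in a few lines. This is an illustrative implementation, not the patented module: it assumes the population standard deviation and that values already marked NaN are ignored when computing μ and σ; the function name is hypothetical.

```python
import math
import statistics


def three_sigma_filter(values, k=3.0):
    """Replace values outside [mu - k*sigma, mu + k*sigma] with NaN.

    Assumptions: mu and sigma are computed over the non-NaN values of the
    window, using the population standard deviation.
    """
    valid = [v for v in values if not math.isnan(v)]
    mu = statistics.fmean(valid)
    sigma = statistics.pstdev(valid)
    low, high = mu - k * sigma, mu + k * sigma
    return [
        v if (not math.isnan(v) and low <= v <= high) else math.nan
        for v in values
    ]


readings = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 500.0]
filtered = three_sigma_filter(readings, k=2.0)
```

With so few samples a single outlier inflates σ considerably, which is why this example uses k = 2; the source only states that k lies in [0, 3].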
The LSTM prediction module is then constructed, and the preprocessed data are grouped according to the spatial sampling rate, with each spatial sample point treated as an independent time series.
According to the characteristics of the monitoring data, a time sequence variable set T and the corresponding measured strain set ε are defined:
$T = (t_k, t_{k+1\tau}, \ldots, t_{k+d\tau})$
$\varepsilon = (\varepsilon_k, \varepsilon_{k+1\tau}, \ldots, \varepsilon_{k+d\tau})$
where the position of the time sequence independent variable $t_j$ at any time satisfies:
Figure GDA0003911504620000051
where $t_0$ is the initial acquisition time of the optical fiber and f is the sampling frequency of the optical fiber demodulator. Let $L_k$ be the number of windows of consecutive missing data at sampling point k and $L_{sum}$ the sum of the missing windows over all sampling points; the initial input period window length $d_k$ is then:
Figure GDA0003911504620000052
where $d_{sum}$, the number of sampling points with normal monitored values, can be expressed as:
$d_{sum} = m - H_k$
where $H_k$ is the number of abnormal values at sampling point k on the optical fiber and m is the total sampling time. Once the time interval between the current prediction point and the last prediction point is set, the adaptive window length L satisfies:
Figure GDA0003911504620000053
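The quantities that feed the window length can be counted directly from a filtered series in which abnormal values have been replaced by NaN. The sketch below computes $H_k$, $L_k$ and $d_{sum} = m - H_k$ for one sampling point. It assumes that m is the number of samples in the series and that a "window of consecutive missing data" is a maximal run of NaN values; it does not attempt the $d_k$ or L formulas, which are image placeholders here.

```python
import math


def missing_statistics(series):
    """Return (H_k, L_k, d_sum) for one sampling point's time series.

    Assumptions: m is the number of samples, and L_k counts maximal runs
    of consecutive NaN values.
    """
    m = len(series)
    missing = [math.isnan(v) for v in series]
    h_k = sum(missing)                          # number of abnormal values
    l_k = sum(
        1
        for i, is_missing in enumerate(missing)
        if is_missing and (i == 0 or not missing[i - 1])
    )                                           # number of missing runs
    d_sum = m - h_k                             # samples with normal values
    return h_k, l_k, d_sum


series_k = [1.0, math.nan, math.nan, 2.0, 3.0, math.nan, 4.0]
h_k, l_k, d_sum = missing_statistics(series_k)
```

Here the series has three missing values in two separate runs, leaving four normal samples.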
The loss function Loss of the training process is then defined from the error calculation formula and the length of the segmentation window. If L is the length of the current segmentation window (n < L) and n is the number of sample point steps in the prediction set, Loss can be defined as:
Figure GDA0003911504620000054
where R(ω) is a regularization term used to limit interference noise in model learning, and β is its controllable weight factor.
Finally, based on the position independent variable information, an average error coefficient δ is set from the differences between the monitored values of sampling points over the H steps adjacent to the moment to be predicted, in order to correct the fit of the estimated value.
The principle of the invention is as follows:
The invention relates to a distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory, which works as follows:
First, according to the characteristics of the discrete points present in the monitoring data, the invention introduces weighting into the differential filtering of the time sequence variable, and introduces a threshold method based on the mechanical characteristics of the optical fiber into the 3σ criterion noise reduction algorithm for the position variable. The noise signal is preprocessed by fusing the two filters.
Then a self-adaptive LSTM prediction model is provided. An adaptive window length is used to overcome a limitation of the traditional LSTM model, in which the window length is set empirically or at random and sample points are predicted redundantly when the number of abnormal values is small.
During model training, the loss function Loss and the regularization parameter values in the LSTM layers are adjusted continuously to improve the learning efficiency of the model.
Because the predicted output of the sequence at one moment becomes part of the training set at the next moment as the input window moves, errors accumulate. Therefore the differences between the monitored values of sampling points at adjacent positions over the H steps before that moment are computed with weights, an average iteration error coefficient δ is derived, and the predicted value is updated, which further improves the prediction accuracy of the LSTM.
Compared with the prior art, the invention has the following substantive features and notable advantages:
1. To address the abnormal values and null values produced while monitoring with a distributed optical fiber structure, the invention provides a noise reduction method that fuses a sliding-window weighted difference with the 3σ threshold criterion and eliminates the noise (discrete points) in the monitoring data; the denoised sample values and the system null values are then taken as the repair targets of the LSTM model;
2. Because the lengths of the lost segments differ between sampling points in the optical fiber monitoring data, the invention uses a self-adaptive sliding window in place of a fixed input cycle length for the LSTM time sequence variable, and redefines the loss function of the training process according to the length of the segmentation window. To further reduce the prediction error, an average error coefficient δ is set, each null value in the prediction sequence is replaced with the corresponding corrected predicted value, and the existing measured values in the sequence are kept. Finally, the agreement between the output sequence predicted by the LSTM and the actual monitored values is evaluated by cross validation, showing high efficiency and high calculation accuracy.
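The final merge described in item 2, where null values are replaced by corrected predictions while existing measurements are kept, can be sketched as follows. The function name and the sample numbers are illustrative, and the predictions are assumed to have been produced and corrected elsewhere.

```python
import math


def repair_series(measured, predicted):
    """Fill NaN entries of `measured` with the value at the same index in
    `predicted`; every existing measured value is kept unchanged."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [p if math.isnan(m) else m for m, p in zip(measured, predicted)]


measured = [1.0, math.nan, 3.0, math.nan]
predicted = [1.1, 2.0, 2.9, 4.2]        # hypothetical corrected predictions
repaired = repair_series(measured, predicted)
```

Only positions 1 and 3 take predicted values; the measured 1.0 and 3.0 are preserved even though predictions exist for them.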
Drawings
Fig. 1 is a flow chart of an improved algorithm of the preferred embodiment of the present invention.
Fig. 2 is a time series variation of sampling points for a preferred embodiment of the present invention.
Fig. 3 is a schematic diagram of distributed fiber monitoring data according to a preferred embodiment of the present invention.
Detailed Description
The invention will now be further described and illustrated with reference to the following examples and drawings. The following description is illustrative and not intended to limit the scope of the invention.
Example one:
In this embodiment, a distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory combines a self-adaptive LSTM algorithm with distributed optical fiber time sequence data, and completes the abnormal data restoration by adaptively changing the optical fiber time sequence input period length of the LSTM and the control parameters of the training process. The model mainly comprises a noise reduction preprocessing module and a self-adaptive LSTM data prediction module, and executes the following steps:
1) First, the noise reduction preprocessing module fuses a weighted difference algorithm based on the optical fiber time sequence variable with a 3σ threshold noise reduction algorithm based on the position variable, so as to improve the noise reduction of abnormal optical fiber noise signals;
2) The self-adaptive LSTM prediction module is constructed, in which two self-adaptive parameters of the LSTM, the input cycle length L and the loss function Loss, are continuously updated, and an error correction coefficient δ is set, so as to improve the repair precision of the model.
This embodiment addresses the local gaps and noise signals in the continuously autocorrelated time sequence data collected by a distributed optical fiber in structure monitoring. It optimizes the restoration model for optical fiber time sequence monitoring data, combines the LSTM algorithm with the characteristics of optical fiber monitoring data, constructs an optical fiber abnormal data restoration model consisting of a noise reduction preprocessing module and an LSTM prediction module, and improves the monitoring precision.
Example two:
This embodiment is substantially the same as example one, with the following particular features:
In this embodiment, referring to fig. 1 to fig. 3, step 1) is carried out as follows:
A weighted disturbance factor $\alpha_i$ is defined in the noise reduction preprocessing module; under the 3σ threshold criterion for the position variable, a strain threshold based on the mechanical characteristics of the optical fiber is defined to further constrain the valid range of the monitored values; and the weighted difference algorithm for the time sequence variable is combined with the 3σ threshold method for the position variable to filter noise signals in preprocessing.
1-1) In the weighted difference method for the time sequence variable, the independent variable is the time sequence and the dependent variable is the measured strain value; the weighted first-order difference is calculated as follows:
Figure GDA0003911504620000071
where $y_x$ and $y_{x-1}$ are the values at sample points x and x-1, respectively; m is the length of the window used to compute $y_x$; and $y_{mean}$ is the mean of the m samples within the window, i.e.
Figure GDA0003911504620000072
where i is the index of the sampling point and $y_i$ is the monitored value of that sampling point.
$\alpha_i$ is the weighted disturbance factor, which satisfies:
Figure GDA0003911504620000073
where the influence factor of
Figure GDA0003911504620000074
on
Figure GDA0003911504620000075
is 1/m; the influence factor of
Figure GDA0003911504620000076
on
Figure GDA0003911504620000077
is 2/m; and so on, until the influence factor of
Figure GDA0003911504620000078
on
Figure GDA0003911504620000079
is (m-1)/m;
1-2) In the 3σ threshold criterion for the position variable, the threshold is set on the basis of the standard 3σ rule: the effective strain monitoring range of the optical fiber, $[\varepsilon_{min}, \varepsilon_{max}]$, is derived mainly from the ultimate stress $\sigma_u$ given by the mechanical strength of the fiber. With the engineering allowable stress
Figure GDA00039115046200000710
the limit strain $\varepsilon$ that the fiber can sustain satisfies:
$E(1 + c\varepsilon)\varepsilon \le [\sigma]$
where c is the nonlinear balance constant, with a value of 3.0 to 6.0, and n is the allowable stress balance constant, with a value of 3.0 to 5.0. Combining the two formulas gives the bounded range of the limit strain of the optical fiber:
Figure GDA0003911504620000081
where E is the elastic modulus and $\sigma_u$ is the ultimate breaking stress of the fiber.
When the position variable data are filtered, an extreme threshold method is used to screen out abnormal values
Figure GDA0003911504620000082
Based on the mean and standard deviation of the data sampled along the optical fiber at the same instant, constraint coefficients $K_1$ and $K_2$ are defined to bound the width of the range; a normal measured value satisfies:
Figure GDA0003911504620000083
where $K_1\varepsilon_{min}$ should approximate $\mu - k\sigma$ and $K_2\varepsilon_{max}$ should approximate $\mu + k\sigma$; $\sigma$ is the standard deviation of all sampled values in the value window, $\mu$ is the mean of the sample points, and k is the confidence range coefficient, $k \in [0, 3]$.
In this embodiment, the acquired raw data are denoised in preprocessing on the basis of the weighted difference and the 3σ threshold criterion, and the filtered-out spatial positions are set to NaN to facilitate the subsequent data recovery. For data filtering based on the time sequence variable, where a sliding-window weighted difference algorithm compares the values at adjacent moments, a reasonable window length m and window weighted disturbance factor $\alpha_i$ must be set. When the position variable data are filtered, an extreme threshold method is used to screen out abnormal values
Figure GDA0003911504620000084
First, the elastic modulus E and the ultimate stress σ of the optical fiber under the experimental conditions u And the allowable stress balance coefficient n and the nonlinear balance constant c of the optical fiber can obtain the ultimate strain value of the optical fiber. In the embodiment, in a noise reduction preprocessing module, a weighted disturbance factor alpha is provided on the basis of a first-order difference algorithm of a time sequence variable i The concept of (a); on the basis of a 3 sigma criterion of a position variable, proposing a strain threshold value thought based on the mechanical characteristics of the optical fiber to further constrain the effective range of the monitoring value; the time sequence variable weighted difference algorithm is combined with the 3 sigma threshold method of the position variable, and the purpose of noise signal filtering preprocessing is achieved together.
In this embodiment, the specific method of step 2) is as follows: in the adaptive LSTM prediction module, the adaptive input period length L and the improved custom loss function Loss are used to improve the training efficiency of the model;
2-1) in the time-series prediction, let l be the time interval between the current prediction point and the last prediction point; after each window movement, the improved adaptive window length L is:
Figure GDA0003911504620000085
wherein d_k is the initial input period window of sampling point k, which satisfies:
Figure GDA0003911504620000086
wherein L_k is the number of windows of consecutive missing data at sampling point k; L_sum is the sum of the missing windows over all sampling points; H_k is the number of abnormal values at sampling point k on the optical fiber; T is the total sampling time;
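The counting quantities defined above can be illustrated with a minimal Python sketch. It assumes that each sampling point k holds a time series in which filtered or missing samples are stored as NaN, that H_k is the number of NaN entries, and that L_k counts the separate runs of consecutive NaN entries. The function name `missing_stats` and the sample data are hypothetical, and the formulas for d_k and L, which are given only in the figures, are not reproduced here.

```python
import math

def missing_stats(series):
    """Return (H_k, L_k) for the time series of one sampling point.

    H_k: number of NaN entries (filtered or never recorded).
    L_k: number of separate runs ("windows") of consecutive NaN entries.
    Both readings are assumptions made for this sketch.
    """
    h_k = 0
    l_k = 0
    in_gap = False
    for value in series:
        if math.isnan(value):
            h_k += 1
            if not in_gap:      # a new run of missing data starts here
                l_k += 1
            in_gap = True
        else:
            in_gap = False
    return h_k, l_k

nan = float("nan")
# Rows are sampling points k along the fiber, columns are time steps.
data = [
    [1.0, nan, nan, 1.2, 1.1, nan],
    [0.9, 1.0, 1.1, nan, 1.0, 1.0],
]
stats = [missing_stats(row) for row in data]
l_sum = sum(l_k for _, l_k in stats)   # L_sum over all sampling points
print(stats, l_sum)
```

These counts are the inputs that the adaptive window formulas consume; how they are combined follows the equations shown in the figures.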
2-2) on the basis of a standard regularization term, the segmented input period length L and the window step distance n are combined using the idea of a two-dimensional data set to define the training loss function Loss of the LSTM and thereby improve the learning efficiency; L is the length of the current segmentation window, n is the number of sample-point steps in the prediction set, n < L, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n); according to the error calculation formula, the loss function Loss in the training process is defined as:
Figure GDA0003911504620000091
in the formula, i is the index of the sampling point;
Figure GDA0003911504620000092
is the average of the segmented window; y_i is the monitored value; L is the input period length; n is the window step distance; R(ω) is a regularization term that limits interference noise during model learning; β is its controllable weight factor constant.
This embodiment proposes a two-dimensional data set that combines the segmented input period length L and the window step distance n, and redefines the training loss function Loss of the LSTM to improve the learning efficiency.
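One possible reading of the (L-n, n) two-dimensional array is a sliding-window split of a segment of length L into L-n input rows of n consecutive samples each. The Python sketch below shows that reading only; the function name `make_windows` and the choice of the next sample as the training target are assumptions, not details confirmed by the text.

```python
def make_windows(segment, n):
    """Split one segment of length L into (inputs, targets).

    inputs has L - n rows of n consecutive values each, and targets[i] is
    the value that immediately follows inputs[i] (an assumed pairing).
    """
    L = len(segment)
    if not 0 < n < L:
        raise ValueError("require 0 < n < L")
    inputs = [segment[i:i + n] for i in range(L - n)]
    targets = [segment[i + n] for i in range(L - n)]
    return inputs, targets

inputs, targets = make_windows([1, 2, 3, 4, 5, 6], n=2)
print(len(inputs), len(inputs[0]))  # rows = L - n, columns = n
```

With L = 6 and n = 2 the input array has four rows of two values, matching the (L-n, n) shape stated in the text.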
In this embodiment, in the adaptive LSTM prediction module, an average error coefficient δ is defined from the position variable by calculating the differences between the monitored values of the sampling points at adjacent positions in the n steps before the time t to be predicted; δ corrects the actual LSTM predicted value and thereby reduces the iterative error of the optical fiber data; within the n steps, the average error coefficient δ satisfies:
Figure GDA0003911504620000093
in the formula, β_i is the proposed weight factor, equal to
Figure GDA0003911504620000094
n is the step distance, i is the number of shifts of the monitored value at the previous time, j is the number of shifts of the monitored value at the subsequent time, y_{t-i} is the monitored value at the previous time, and y_{t-j} is the monitored value at the subsequent time; the actual predicted value at time t,
Figure GDA0003911504620000095
is output as:
Figure GDA0003911504620000096
in the formula, y_t is the LSTM predicted value;
Figure GDA0003911504620000097
is the average of the measured values of the sampling points over the n steps; the higher the spatial resolution, the closer the monitored values of adjacent sampling points.
This embodiment provides a repair model for distributed optical fiber abnormal sensing data based on adaptive long short-term memory (LSTM), used to repair abnormal values in monitoring data. The model mainly comprises a noise-reduction preprocessing module and an adaptive LSTM prediction module. First, in view of the noise characteristics of the monitoring data, a data filtering method that fuses a weighted difference with the 3σ threshold rule is adopted; time-series variable filtering and position variable filtering are combined, and the theoretically derived limit strain of the optical fiber serves as the limiting threshold, so that abnormal values are eliminated. Then, the null values of the optical fiber sensing monitoring data and the filtered abnormal values are taken as the prediction targets of the LSTM model for data repair. In the LSTM model, because the amount of lost data differs between sample points, this embodiment uses an adaptive input window as the input period length of the LSTM time-series variable, and a loss function Loss adapted during iterative training to improve the learning efficiency of the model. To further reduce the accumulated error, the average difference between each predicted time and the n adjacent spatial sampling points of the previous steps is calculated, and an average error coefficient δ is set to correct the LSTM predicted value at the current time. Finally, all monitoring sample points of the distributed optical fiber are traversed according to the spatial resolution to complete the repair.
Example three:
this embodiment is substantially the same as the previous embodiment, and is characterized in that:
in this embodiment, as shown in FIG. 2, 3σ criterion filtering is performed on both the time-series and position variables of all monitoring data. If y_x lies within the confidence interval, the value is regarded as normal; otherwise it is regarded as a noise signal and is replaced with NaN;
μ - kσ ≤ y_x ≤ μ + kσ
where k sets the range of the confidence interval; generally, k ∈ [0, 3];
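The 3σ rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming that μ and σ are computed over the non-NaN values of one window and that the population standard deviation is meant; the function name `sigma_filter` and the sample values are hypothetical.

```python
import math
import statistics

def sigma_filter(values, k=3.0):
    """Replace values outside [mu - k*sigma, mu + k*sigma] with NaN."""
    valid = [v for v in values if not math.isnan(v)]
    mu = statistics.fmean(valid)
    sigma = statistics.pstdev(valid)      # assumption: population std. dev.
    low, high = mu - k * sigma, mu + k * sigma
    # NaN inputs fail the comparison and therefore stay NaN.
    return [v if low <= v <= high else float("nan") for v in values]

readings = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.1]
cleaned = sigma_filter(readings, k=2.0)
print(cleaned)
```

With only eight samples a single spike inflates σ enough that k = 3 would not flag it, which is why the example uses k = 2, still inside the stated range k ∈ [0, 3].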
an LSTM prediction module is then constructed, and the preprocessed data are grouped according to the spatial sampling rate, each spatial sample point being regarded as an independent time series;
a time-series variable set T and a corresponding measured strain set ε are defined according to the characteristics of the monitoring data, see FIG. 2;
T = (t_k, t_{k+1τ}, …, t_{k+dτ})
ε = (ε_k, ε_{k+1τ}, …, ε_{k+dτ})
wherein the time-series variable t_j at any time satisfies:
Figure GDA0003911504620000101
in the formula, t_0 is the initial acquisition time of the optical fiber, and f is the sampling frequency of the optical fiber demodulator; let L_k be the number of windows of consecutive missing data at sampling point k, and L_sum the sum of the missing windows over all sampling points; the initial input period window length d_k is:
Figure GDA0003911504620000102
in the formula, d_sum is the number of sampling points with normal monitoring values, which can be expressed as:
d_sum = m - H_k
in the formula, H_k is the number of abnormal values at sampling point k on the optical fiber, and m is the total sampling time; letting l be the time interval between the current prediction point and the last prediction point, the adaptive window length L satisfies:
Figure GDA0003911504620000103
a loss function Loss for the training process is defined from the error calculation formula and the length of the segmentation window;
according to the error calculation formula, if L is the length of the current segmentation window and n (n < L) is the number of sample-point steps in the prediction set, the loss function Loss in the training process can be defined as:
Figure GDA0003911504620000111
in the formula, R(ω) is a regularization term that limits interference noise during model learning, and β is its controllable weight factor;
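The exact loss is given only in the figure, so the sketch below shows nothing more than the general structure named in the text: a data-fit term plus β·R(ω). It assumes a mean squared error for the data-fit term and an L2 penalty for R(ω); both choices, the function name `training_loss`, and the numbers are assumptions, and the dependence on L, n and the window mean is not modelled.

```python
def training_loss(predicted, observed, weights, beta=0.01):
    """Data-fit term plus beta * R(omega), with R taken as an L2 penalty."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal in length")
    # Assumed data-fit term: mean squared error over the window.
    data_term = sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)
    # Assumed regularizer R(omega): sum of squared model weights.
    reg_term = sum(w * w for w in weights)
    return data_term + beta * reg_term

loss = training_loss([1.0, 2.0], [1.0, 4.0], weights=[1.0, 2.0], beta=0.1)
print(loss)
```

Raising β penalizes large weights more strongly, which is the sense in which β acts as a controllable weight on the regularization term.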
finally, based on the position variable, an average error coefficient δ is set by calculating the differences between the monitored values of the sampling points in the H steps adjacent to the time to be predicted, and δ is used to correct the fit of the estimated value. In this embodiment, the loss function Loss and the regularization parameter values in the LSTM layers are adjusted continuously during model training to improve the learning efficiency of the model.
Example four:
this embodiment is substantially the same as the previous embodiment, and is characterized in that:
in this embodiment, as shown in FIG. 1, a distributed optical fiber abnormal data repair model based on adaptive LSTM is implemented in the following steps:
1) The acquired raw data undergo noise-reduction preprocessing based on a weighted difference and a 3σ threshold criterion, and the filtered positions are set to NaN to facilitate the subsequent data repair;
when filtering on the time-series variable, a sliding-window weighted difference algorithm compares the values at adjacent times, which requires a reasonable window length m and a window weighted disturbance factor α_i; the specific operation is:
Figure GDA0003911504620000112
wherein y_x and y_{x-1} are the values at sampling points x and x-1, respectively, and y_mean is the mean of the m sampling points within the window;
when filtering on the position variable, an extreme threshold method is adopted to screen abnormal values
Figure GDA0003911504620000113
First, given the elastic modulus E and the ultimate stress σ_u of the optical fiber under the experimental conditions, the allowable stress balance coefficient n used in engineering, and the nonlinear balance constant c of the optical fiber, the ultimate strain value of the optical fiber satisfies:
Figure GDA0003911504620000114
the specific operation of threshold screening is as follows:
Figure GDA0003911504620000115
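As a worked illustration of the strain limit, claim 1 states the constraint E(1+cε)ε ≤ [σ]. Assuming the allowable stress is [σ] = σ_u/n (the defining formula appears only as a figure), the largest admissible strain is the positive root of cε² + ε - [σ]/E = 0. The Python sketch below uses illustrative material values that are not taken from the patent, and a symmetric range [-ε_max, ε_max] that is likewise an assumption; the function names are hypothetical.

```python
import math

def limit_strain(E, sigma_u, n, c):
    """Largest strain eps >= 0 with E * (1 + c*eps) * eps <= sigma_u / n."""
    allowable = sigma_u / n              # ASSUMPTION: [sigma] = sigma_u / n
    # Positive root of c*eps**2 + eps - allowable/E = 0.
    return (-1.0 + math.sqrt(1.0 + 4.0 * c * allowable / E)) / (2.0 * c)

def screen(strains, eps_min, eps_max):
    """Replace strains outside [eps_min, eps_max] with NaN."""
    return [e if eps_min <= e <= eps_max else float("nan") for e in strains]

# Illustrative values only: E and sigma_u in Pa, n and c dimensionless.
E, sigma_u, n, c = 72e9, 5e9, 4.0, 5.0
eps_max = limit_strain(E, sigma_u, n, c)
cleaned = screen([0.001, 0.020, -0.002, 0.015], -eps_max, eps_max)
print(eps_max, cleaned)
```

Substituting the returned ε back into E(1+cε)ε reproduces σ_u/n, which is a quick check that the root was chosen correctly.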
when filtering on the time-series and position variables through the normal distribution (3σ) criterion, the specific operation is:
μ - kσ ≤ y_x ≤ μ + kσ
wherein μ and σ are the mean and standard deviation of the adjacent sampled data at that time, and k sets the width of the confidence interval; when y_x lies within the confidence interval it is a normal value, otherwise it is judged to be an abnormal value and is replaced with NaN;
2) An adaptive LSTM prediction module is constructed; the preprocessed data are grouped according to the spatial resolution, i.e. each sampling point forms an independent prediction array, and the input period length L and the time step H of the window movement are defined adaptively for each sampling point;
FIG. 3 is a schematic diagram of the data acquired by the distributed optical fiber sensing system, where NaN denotes a null value, and a normally monitored value that meets the noise-reduction criterion is treated as a noise signal;
the time interval l between the current prediction point and the previous adjacent prediction point is counted, and the adaptive input window length is calculated:
Figure GDA0003911504620000121
wherein d_k, the initial window period length, satisfies:
Figure GDA0003911504620000122
L_k is the number of windows of consecutive missing data at sampling point k; L_sum is the sum of the missing windows over all sampling points; T is the total sampling time; H_k is the number of abnormal values at sampling point k on the optical fiber;
then, according to the custom loss function Loss in the training process, the specific loss value is calculated as:
Figure GDA0003911504620000123
where R(ω) is the default regularization term and β is an initially set weight factor constant;
3) According to the set formula for the error correction coefficient, the average correction coefficient δ of the first n sampling points in the output step window of each point to be predicted is calculated as
Figure GDA0003911504620000124
In the formula, β_i is the proposed weight factor, equal to
Figure GDA0003911504620000125
By comparing the average of the measured values of the n step sampling points with the estimated value, the actual predicted value at time i,
Figure GDA0003911504620000126
can be expressed as:
Figure GDA0003911504620000127
thus, along the time variable, the training-set window is moved continuously until all missing time-series data of the sampling point have been repaired; finally, as shown in FIG. 3, the missing data of all sampling points are predicted by traversing along the position variable.
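The traversal order described here (time first within each sampling point, then across positions) can be sketched as follows. This is only a structural illustration: `mean_predictor` is a hypothetical placeholder standing in for the trained adaptive LSTM, the window length is fixed rather than adaptive, and missing values are assumed to be stored as NaN.

```python
import math

def mean_predictor(window):
    """Placeholder for the trained LSTM: predict the mean of the window."""
    return sum(window) / len(window)

def repair_series(series, window_len, predict=mean_predictor):
    """Fill NaN entries of one sampling point in time order."""
    repaired = list(series)
    for t, value in enumerate(repaired):
        if math.isnan(value):
            # Use up to window_len earlier samples, including repaired ones.
            history = [v for v in repaired[max(0, t - window_len):t]
                       if not math.isnan(v)]
            if history:                  # a leading gap has no history
                repaired[t] = predict(history)
    return repaired

def repair_all(data, window_len, predict=mean_predictor):
    """Traverse all sampling points (rows) along the position variable."""
    return [repair_series(row, window_len, predict) for row in data]

nan = float("nan")
result = repair_all([[1.0, nan, 3.0, nan], [nan, 2.0, nan, 4.0]], window_len=2)
print(result)
```

Because earlier repaired values feed later predictions, errors can accumulate along the time axis, which is the effect the correction coefficient δ is intended to reduce.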
The above examples are given to illustrate the invention clearly and do not limit it. A person skilled in the art can make modifications in other forms on the basis of the above description; the embodiments cannot all be listed exhaustively here, and obvious variations and modifications remain within the scope of the invention.

Claims (2)

1. A distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory, characterized in that: an adaptive LSTM algorithm is combined with distributed optical fiber time-series data, and the optical fiber time-series input period length of the LSTM and the control parameters of the training process are changed adaptively to complete the repair of abnormal data; the device mainly comprises a noise-reduction preprocessing module and an adaptive LSTM data prediction module, and specifically executes the following steps:
1) First, through the noise-reduction preprocessing module, a weighted difference algorithm based on the optical fiber time-series variable is fused with a 3σ threshold noise-reduction algorithm based on the position variable, so as to improve the noise-reduction effect on abnormal optical fiber noise signals;
2) By constructing an adaptive LSTM prediction module, two adaptive parameters, the input period length L and the loss function Loss of the LSTM, are updated continuously, and an error correction coefficient δ is set, thereby improving the repair accuracy of the model;
the specific method of step 1) comprises:
defining a weighted disturbance factor α_i in the noise-reduction preprocessing module; defining, on the basis of the 3σ threshold criterion for the position variable, a strain threshold based on the mechanical properties of the optical fiber to further constrain the effective range of the monitored values; and combining the time-series weighted difference algorithm with the 3σ threshold method for the position variable to filter the noise signal;
1-1) in the time-series weighted difference method, the independent variable is the time series and the dependent variable is the measured strain value, and the weighted first-order difference is calculated as:
Figure FDA0003911504610000011
where m is the window length used in computing y_x; y_mean is the mean of the m samples within the window, i.e.
Figure FDA0003911504610000012
i is the index of the sampling point, y_i is the monitored value of the sampling point, and α_i is the weighted disturbance factor, which satisfies:
Figure FDA0003911504610000013
wherein the influence factor of
Figure FDA0003911504610000014
on
Figure FDA0003911504610000015
is 1/m; the influence factor of
Figure FDA0003911504610000016
on
Figure FDA0003911504610000017
is 2/m; and, by analogy, the influence factor of
Figure FDA0003911504610000018
on
Figure FDA0003911504610000019
is (m-1)/m;
1-2) in the 3σ threshold criterion for the position variable, the threshold is set on the basis of the standard 3σ rule, mainly by deriving the effective strain monitoring range [ε_min, ε_max] of the optical fiber from the ultimate stress σ_u of its mechanical strength; with the engineering allowable stress
Figure FDA00039115046100000110
the limit strain ε of the optical fiber then satisfies:
E(1+cε)ε ≤ [σ]
wherein c is the nonlinear balance constant, with a value of 3.0 to 6.0; n is the allowable stress balance constant, with a value of 3.0 to 5.0; combining the two formulas, the value range of the limit strain of the optical fiber is calculated as:
Figure FDA00039115046100000111
wherein E is the elastic modulus and σ_u is the ultimate breaking stress of the optical fiber;
when data of position variables are filtered, an extreme threshold method is adopted to screen abnormal values
Figure FDA0003911504610000021
Based on the mean and standard deviation of the data sampled along the optical fiber at the same time, constraint coefficients K_1 and K_2 are defined to set the width of the range; a normal measured value
Figure FDA0003911504610000022
satisfies:
Figure FDA0003911504610000023
wherein K_1·ε_min should approximate μ - kσ and K_2·ε_max should approximate μ + kσ; σ is the standard deviation of all sampled values in the window, μ is the mean of the sampling points, and k is the confidence range coefficient, k ∈ [0, 3];
the specific method of step 2) comprises:
in the adaptive LSTM prediction module, the adaptive input period length L and the improved custom loss function Loss are used to improve the training efficiency of the model;
2-1) in the time-series prediction, let l be the time interval between the current prediction point and the last prediction point; after each window movement, the improved adaptive window length L is:
Figure FDA0003911504610000024
wherein d_k is the initial input period window of sampling point k, which satisfies:
Figure FDA0003911504610000025
wherein L_k is the number of windows of consecutive missing data at sampling point k; L_sum is the sum of the missing windows over all sampling points; H_k is the number of abnormal values at sampling point k on the optical fiber; T is the total sampling time;
2-2) on the basis of a standard regularization term, the segmented input period length L and the window step distance n are combined using the idea of a two-dimensional data set to define the training loss function Loss of the LSTM and thereby improve the learning efficiency; L is the length of the current segmentation window, n is the number of sample-point steps in the prediction set, n < L, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n); according to the error calculation formula, the loss function Loss in the training process is defined as:
Figure FDA0003911504610000026
in the formula, i is the index of the sampling point;
Figure FDA0003911504610000027
is the average of the segmented window; y_i is the monitored value; L is the input period length; n is the window step distance; R(ω) is a regularization term that limits interference noise during model learning; β is its controllable weight factor constant.
2. The distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory as claimed in claim 1, wherein in the adaptive LSTM prediction module, an average error coefficient δ is defined from the position variable by calculating the differences between the monitored values of the sampling points at adjacent positions in the n steps before the time t to be predicted, so as to correct the actual LSTM predicted value and thereby reduce the iterative error of the optical fiber data; within the n steps, the average error coefficient δ satisfies:
Figure FDA0003911504610000031
in the formula, β_i is the proposed weight factor, equal to
Figure FDA0003911504610000032
n is the step distance, i is the number of shifts of the monitored value at the previous time, j is the number of shifts of the monitored value at the subsequent time, y_{t-i} is the monitored value at the previous time, and y_{t-j} is the monitored value at the subsequent time; the actual predicted value at time t,
Figure FDA0003911504610000033
is output as:
Figure FDA0003911504610000034
in the formula, y_t is the theoretical predicted value of the LSTM, and
Figure FDA0003911504610000035
is the average of the measured values of the sampling points over the n steps; the higher the spatial resolution, the closer the monitored values of adjacent sampling points.
CN202010959309.9A 2020-09-14 2020-09-14 Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory Active CN112235043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010959309.9A CN112235043B (en) 2020-09-14 2020-09-14 Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory

Publications (2)

Publication Number Publication Date
CN112235043A CN112235043A (en) 2021-01-15
CN112235043B true CN112235043B (en) 2022-12-23

Family

ID=74116351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010959309.9A Active CN112235043B (en) 2020-09-14 2020-09-14 Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory

Country Status (1)

Country Link
CN (1) CN112235043B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant