CN112235043B

CN112235043B - Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory

Info

Publication number: CN112235043B
Application number: CN202010959309.9A
Authority: CN
Inventors: 王俊龙; 刘宏月
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2020-09-14
Filing date: 2020-09-14
Publication date: 2022-12-23
Anticipated expiration: 2040-09-14
Also published as: CN112235043A

Abstract

The invention provides a device for repairing abnormal distributed optical fiber sensing data based on adaptive long short-term memory (LSTM). It mainly comprises a noise reduction preprocessing module and an adaptive LSTM prediction module and is used to repair abnormal values in monitoring data. To match the noise characteristics of the monitored data, a data filtering method that fuses a weighted difference with a 3σ threshold criterion is adopted: time-sequence variable filtering is combined with position variable filtering, and the theoretically derived limit strain of the optical fiber serves as the defining threshold, so that abnormal values are eliminated. The null values in the optical fiber sensing data and the filtered-out abnormal values are then taken as the prediction targets of the LSTM model for data repair. A loss function (Loss) that adapts during iterative training improves the learning efficiency of the model and further reduces the accumulated error. All monitoring sample points of the distributed optical fiber are traversed according to the spatial resolution to complete the repair, giving high efficiency and high monitoring precision.

Description

Distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory

Technical Field

The invention relates to a missing-data repair model, in particular a distributed optical fiber abnormal data repair model based on adaptive long short-term memory (LSTM). It is used to repair sample point data that go missing during structural monitoring with a distributed optical fiber because of factors such as external interference and jumps in the input signal.

Background

As a novel structural health monitoring technology, the distributed optical fiber sensor system offers high measurement precision, a wide measurement range, and high spatial resolution. During structural monitoring with a distributed optical fiber, factors such as a harsh monitoring environment, an unreasonable fiber layout, noise signals, and the structural characteristics of the monitored object can make the measured strain gradient jump beyond the demodulation range of the optical fiber demodulator, which greatly increases the probability of abnormal values in the optical fiber sensing data. To solve this problem, a targeted data repair model is needed that matches the characteristics of distributed optical fiber sensing data.

Monitoring data collected by distributed optical fiber sensing have a typical, regular time-sequence character, so they are naturally treated as time series. Common time series analysis methods include the autoregressive integrated moving average (ARIMA) method, time series differencing, deep neural networks, the Monte Carlo method, and long short-term memory (LSTM) recurrent networks. The LSTM effectively overcomes the vanishing and exploding gradients of the recurrent neural network and its insufficient long-term memory capacity, and is better suited to training on long-range time-sequence information. The LSTM takes the current data and the information already fed into the network, learns and continuously updates its state information through a deep network, and uses its strong memory capacity to predict the trend of subsequent data. It is widely applied in fault feature diagnosis, image analysis, speech recognition, and time series prediction such as stock market forecasting. In structural monitoring, however, the continuous autocorrelated time series data collected by existing distributed optical fibers suffer from local gaps and noise signals; this limits the application of distributed optical fiber sensor systems and is a technical problem that urgently needs to be solved.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a distributed optical fiber abnormal data restoration device based on adaptive long short-term memory. It specifically targets the local gaps and noise signals in the continuous autocorrelated time series data collected by distributed optical fibers in structural monitoring. The invention optimizes the repair model for optical fiber time-sequence monitoring data, combines the LSTM algorithm with the characteristics of optical fiber monitoring data, and constructs an optical fiber abnormal data repair model consisting of a noise reduction preprocessing module and an LSTM prediction module, thereby improving monitoring precision.

In order to achieve the purpose of the invention, the invention adopts the following inventive concept:

Based on the excellent performance of the LSTM in time series prediction, the invention combines the LSTM algorithm with the characteristics of optical fiber monitoring data to construct an optical fiber abnormal data repair model consisting of a noise reduction preprocessing module and an LSTM prediction module.

First, a weighted difference method and a threshold method are fused through the 3σ criterion to perform noise preprocessing.

Then, because the positions and amounts of missing data differ between sample points, the invention proposes an adaptively iterated input period length based on the variation of the optical fiber sampling frequency, and defines a corresponding error loss function to constrain the learning efficiency of the LSTM network model. Because the predicted value at one moment becomes an input value at the next moment, iteration errors accumulate; the invention therefore sets an average correction coefficient δ to improve the fit between the predicted value and the measured value. Finally, the data repair is completed by traversing all sample points. The invention thus provides a distributed optical fiber abnormal data repair model based on adaptive LSTM, combines the improved LSTM algorithm with distributed optical fiber data, and improves the prediction precision of the LSTM algorithm on abnormal optical fiber data.

According to the inventive concept, the invention adopts the following technical scheme:

A distributed optical fiber abnormal data restoration device based on adaptive long short-term memory combines an adaptive LSTM algorithm with distributed optical fiber time-sequence data, and completes the repair of abnormal data by adaptively changing the LSTM's optical fiber time-sequence input period length and the control parameters used during training. The model mainly comprises a noise reduction preprocessing module and an adaptive LSTM data prediction module, and executes the following steps:

1) First, the noise reduction preprocessing module fuses a weighted difference algorithm based on the optical fiber time-sequence variable with a 3σ threshold noise reduction algorithm based on the position variable, improving the suppression of abnormal noise signals in the optical fiber data;

2) An adaptive LSTM prediction module is constructed in which two adaptive parameters, the LSTM input period length L and the loss function Loss, are continuously updated, and an error correction coefficient δ is set, improving the repair precision of the model.

Preferably, step 1) proceeds as follows. A weighted disturbance factor α_i is defined in the noise reduction preprocessing module; for the 3σ threshold criterion on the position variable, a strain threshold derived from the mechanical characteristics of the optical fiber further constrains the valid range of the monitored values; and the time-sequence weighted difference algorithm is combined with the position-variable 3σ threshold method to filter noise signals. Specifically:

1-1) In the time-sequence weighted difference method, the independent variable is time and the dependent variable is the measured strain. The weighted first-order difference is calculated as follows:

[equation missing]

where m is the length of the window used to compute y_x, and y_mean is the mean of the m samples within the window, i.e. y_mean = (1/m)·Σ y_i, summed over the m samples in the window;

i is the index of the sampling point, y_i is the monitored value at that sample point, and α_i is a weighted disturbance factor which satisfies:

[equation missing]

where the influence factors assigned to successive samples (the sample symbols are missing from the source text) are 1/m, 2/m, and so on, up to (m-1)/m;

1-2) In the position-variable 3σ threshold criterion, the threshold is set on the basis of the standard 3σ rule: the effective strain monitoring range [ε_min, ε_max] of the optical fiber is derived mainly from the ultimate stress σ_u given by the fiber's mechanical strength. The engineering allowable stress [σ] is:

[equation missing]

The limit strain ε that the fiber can sustain then satisfies:

E(1+cε)ε≤[σ]

where c is a nonlinear balance constant taking a value of 3.0 to 6.0, and n is the allowable stress balance constant taking a value of 3.0 to 5.0. Combining the two formulas gives the value range of the limit strain of the optical fiber:

[equation missing]

where E is the elastic modulus of the optical fiber and σ_u is its ultimate breaking stress.

When filtering data on the position variable, an extreme threshold method is used to screen out abnormal values. Based on the mean and standard deviation of the data sampled along the optical fiber at the same instant, constraint coefficients K₁ and K₂ are defined to limit the width of the range; a normal measured value satisfies:

[equation missing]

where K₁ε_min should approximate μ-kσ and K₂ε_max should approximate μ+kσ; σ is the standard deviation of all sampled values in the window, μ is the mean of the sample points, and k is the confidence range coefficient, k ∈ [0,3].
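As a worked illustration of step 1-2), the sketch below solves the inequality E(1+cε)ε ≤ [σ] for the largest admissible tensile strain. It assumes the allowable stress takes the common engineering form [σ] = σ_u/n, which this text does not reproduce, so treat that as an assumption rather than the patent's formula. The function name `limit_strain` is hypothetical.

```python
import math


def limit_strain(E, sigma_u, c, n):
    """Largest strain eps >= 0 with E * (1 + c * eps) * eps <= sigma_u / n.

    Assumption (not stated in the source text): allowable stress
    [sigma] = sigma_u / n. The inequality rearranges to
    c * eps**2 + eps - sigma_u / (n * E) <= 0, whose positive root
    is the upper strain limit.
    """
    if E <= 0 or c <= 0 or n <= 0 or sigma_u < 0:
        raise ValueError("E, c, n must be positive and sigma_u non-negative")
    allowable = sigma_u / n  # assumed form of [sigma]
    return (-1.0 + math.sqrt(1.0 + 4.0 * c * allowable / E)) / (2.0 * c)
```

Calling `limit_strain(E=72e9, sigma_u=4.0e9, c=4.0, n=4.0)` with these purely illustrative material values (they are not taken from the patent) returns the strain at which the left-hand side of the inequality equals the assumed allowable stress.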

The invention denoises the raw monitoring data. Gross errors, as defined above, are removed with a noise reduction algorithm that combines a weighted difference over a time-sequence sliding window with a position-variable 3σ threshold criterion. First, with time (determined by the sampling frequency) as the independent variable and the measured strain as the dependent variable, the difference between the data y_x at time x and the data y_x-1 at time x-1 is compared with the average weighted error of the data at the current m time instants. After the weighted difference filtering is finished, the independent variable is switched to position, with strain still the dependent variable. The position-variable monitoring data are then screened with a threshold method based on the strain limit of the optical fiber, where the limit stress σ_u is obtained from the mechanical strength of the optical fiber.

Preferably, step 2) proceeds as follows. In the adaptive LSTM prediction module, the training efficiency of the model is improved by using the adaptive input period length L and the improved custom loss function Loss. Specifically:

2-1) In the time prediction sequence, if the time interval between the current prediction point and the previous prediction point is L, then after each window movement the improved adaptive window length L is:

[equation missing]

where d^k is the initial input period window for sampling point k, which satisfies:

[equation missing]

where L_k is the number of windows of consecutive missing data at sampling point k; L_sum is the sum of the missing windows over all sampling points; H_k is the number of abnormal values at sampling point k on the optical fiber; and T is the total sampling time;

2-2) On the basis of a standard regularization term, the segmented input period length L and the window step distance n are combined using the idea of a two-dimensional array, and a training loss function Loss is defined for the LSTM to improve learning efficiency. Here L is the length of the current segmentation window, n is the number of steps of sample points in the prediction set, with n < L, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n). According to the error calculation formula, the loss function Loss in training is defined as:

[equation missing]

where i is the index of the sampling point; [symbol missing] is the average over the segmentation window; y_i is the predicted value; L is the input period length; n is the step distance of the prediction window; R(ω) is a regularization term used to limit interference noise in model learning; and β is its controllable weight factor, a constant.
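One way to read the (L-n, n) array in step 2-2) is as L-n training rows of n consecutive samples each, cut from a segment of length L, with the sample that follows each row serving as its target. That reading is an assumption; the text gives only the dimension. The sketch uses NumPy, and `make_training_pairs` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_training_pairs(segment, n):
    """Split a length-L segment into an (L - n, n) input array and targets.

    Assumed interpretation: each row holds n consecutive samples and the
    target is the sample immediately after that row.
    """
    segment = np.asarray(segment, dtype=float)
    L = segment.size
    if not 0 < n < L:
        raise ValueError("require 0 < n < L")
    windows = sliding_window_view(segment, n)  # shape (L - n + 1, n)
    X = windows[:-1]  # drop the last window: no sample follows it
    y = segment[n:]   # shape (L - n,)
    return X, y
```

For a segment of six samples and n = 2, this yields four rows of two samples each, matching the (L-n, n) dimension stated above.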

Preferably, in the adaptive LSTM prediction module, an average error coefficient δ is defined to correct the raw LSTM prediction and so reduce the iteration error of the optical fiber data. It is computed, using the position variable, from the differences between the monitored values of the sampling points at adjacent positions over the n steps before the time t to be predicted. At step n, the average error coefficient δ satisfies:

[equation missing]

where β_i is the proposed weight factor, equal to [expression missing]; n is the step distance of the prediction window; i is the number of shifts of the monitored value at the previous time and j is the number of shifts of the monitored value at the next time; y_{t-i} is the monitored value at the previous time and y_{t-j} is the monitored value at the next time. The actual predicted output value at time t is:

[equation missing]

where y_t is the theoretical LSTM prediction and [symbol missing] is the average of the measured values of the sampling points over n steps; the higher the spatial resolution, the closer the monitored values of adjacent sampling points.

Preferably, 3σ criterion filtering is applied to all monitored data over both the time-sequence and position variables. If y_x lies within the confidence interval, it is treated as a normal value; otherwise it is treated as a noise signal and the original value is replaced with NaN:

μ-kσ ≤ y_x ≤ μ+kσ

where k sets the width of the confidence interval; generally, k ∈ [0,3];
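The rule μ-kσ ≤ y_x ≤ μ+kσ with NaN replacement can be sketched in a few lines of NumPy. The sketch computes μ and σ over the whole array passed in; which window the statistics should be taken over is an assumption here, and `three_sigma_filter` is a hypothetical name.

```python
import numpy as np


def three_sigma_filter(y, k=3.0):
    """Replace values outside [mu - k*sigma, mu + k*sigma] with NaN.

    mu and sigma are taken over all non-NaN entries of y (an assumption;
    the text does not fix the window in this passage).
    """
    y = np.asarray(y, dtype=float)
    mu = np.nanmean(y)
    sigma = np.nanstd(y)
    out = y.copy()
    outside = (y < mu - k * sigma) | (y > mu + k * sigma)
    out[outside] = np.nan  # existing NaNs compare False and stay NaN
    return out
```

Applied to twenty readings of 10.0 followed by a single reading of 100.0, the spike is replaced by NaN and the other readings are left unchanged.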

An LSTM prediction module is then constructed, and the preprocessed data are grouped according to the spatial sampling rate, each spatial sample point being treated as an independent time series;

a time-sequence variable set T and a corresponding measured strain set ε are defined according to the characteristics of the monitoring data:

T = (t_k, t_{k+1τ}, …, t_{k+dτ})

ε = (ε_k, ε_{k+1τ}, …, ε_{k+dτ})

where the time-sequence independent variable t_j at any time satisfies:

[equation missing]

where t₀ is the initial acquisition time of the optical fiber and f is the sampling frequency of the optical fiber demodulator. Let L_k be the number of windows of consecutive missing data at sampling point k and L_sum the sum of the missing windows over all sampling points; the initial input period window length d^k is then:

[equation missing]

where d_sum, the number of sampling points with normal monitored values, can be expressed as:

d_sum = m - H_k

where H_k is the number of abnormal values at sampling point k on the optical fiber and m is the total sampling time. If the time interval between the current prediction point and the previous prediction point is set, the adaptive window length L satisfies:

[equation missing]

The loss function Loss used in training is defined from the error calculation formula and the length of the segmentation window. If L is the length of the current segmentation window (n < L) and n is the number of steps of sample points in the prediction set, Loss can be defined as:

[equation missing]

where R(ω) is a regularization term used to limit interference noise in model learning and β is its controllable weight factor;

finally, using the position variable information, an average error coefficient δ is set to correct the fit of the estimated value, by computing the differences between the monitored values of the sampling points over the H steps adjacent to the time to be predicted.
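The quantities that feed the window length can be counted directly from one sample point's time series once abnormal values have been set to NaN. The sketch below computes H_k and d_sum = m - H_k, taking m as the number of samples in the series, and also lists the lengths of the consecutive NaN runs. How those runs map onto L_k is an assumption, since the window-length formula itself is not reproduced; `missing_stats` is a hypothetical name.

```python
import numpy as np


def missing_stats(series):
    """Return (H_k, d_sum, run_lengths) for one sample point's series.

    H_k: count of NaN (abnormal or null) entries.
    d_sum: m - H_k, with m taken as the series length (assumption).
    run_lengths: lengths of each run of consecutive NaNs.
    """
    series = np.asarray(series, dtype=float)
    missing = np.isnan(series)
    H_k = int(missing.sum())
    d_sum = series.size - H_k
    # Pad with False so runs touching either end are still delimited.
    padded = np.concatenate(([False], missing, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return H_k, d_sum, (ends - starts).tolist()
```

For the series `[1, NaN, NaN, 4, NaN, 6]` this gives three abnormal values, three normal values, and two gaps of lengths 2 and 1.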

The principle of the invention is as follows:

The invention relates to a distributed optical fiber abnormal data restoration device based on adaptive long short-term memory, which works as follows:

first, according to the characteristics of the outliers in the monitoring data, the invention introduces weighting into the difference filtering on the time-sequence variable, and introduces a threshold method based on the mechanical characteristics of the optical fiber into the position-variable 3σ noise reduction algorithm. Noise signals are preprocessed by fusing the two filters;

then an adaptive LSTM prediction model is proposed, in which an adaptive window length removes a limitation of the conventional LSTM model, whose input length is set arbitrarily from experience and therefore produces redundant predictions at sample points that have few abnormal values;

the loss function Loss and the regularization parameter values in the LSTM layers are continuously adjusted during model training to improve learning efficiency;

because, as the input window moves, the value predicted at the previous moment becomes part of the training set at the next moment, errors accumulate. Therefore the differences between the monitored values of the sampling points at adjacent positions over the H steps before that moment are computed with weighting, an average iteration error coefficient δ is derived, and the predicted value is updated, further improving the prediction accuracy of the LSTM.

Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable advantages:

1. For the abnormal values and null values produced while monitoring a structure with a distributed optical fiber, the invention provides a noise reduction method that fuses a sliding-window weighted difference with a 3σ threshold criterion to remove noise (outliers) from the monitoring data; the values removed by denoising and the system null values are then taken as the repair targets of the LSTM model;

2. Because the lost segments differ in length between sampling points in the optical fiber monitoring data, the invention uses an adaptive sliding window as the input period length of the LSTM time-sequence variable, and redefines the loss function used in training according to the length of the segmentation window. To further reduce the prediction error, an average error coefficient δ is set; null values in the prediction sequence are replaced with the corresponding corrected predicted values, while existing measured values in the sequence are kept. Finally, the accuracy of the LSTM-predicted output sequence against the actual monitored values is evaluated by cross validation. The method is efficient and computationally accurate.

Drawings

Fig. 1 is a flow chart of an improved algorithm of the preferred embodiment of the present invention.

Fig. 2 is a time series variation of sampling points for a preferred embodiment of the present invention.

Fig. 3 is a schematic diagram of distributed fiber monitoring data according to a preferred embodiment of the present invention.

Detailed Description

The invention will now be further described and illustrated with reference to the following examples and drawings. The following description is illustrative and not intended to limit the scope of the invention.

Example one:

In this embodiment, a distributed optical fiber abnormal data restoration device based on adaptive long short-term memory combines an adaptive LSTM algorithm with distributed optical fiber time-sequence data, and completes the repair of abnormal data by adaptively changing the LSTM's optical fiber time-sequence input period length and the control parameters used during training. The model mainly comprises a noise reduction preprocessing module and an adaptive LSTM data prediction module.

This embodiment addresses the local gaps and noise signals in the continuous autocorrelated time series data collected by distributed optical fibers in structural monitoring. It optimizes the repair model for optical fiber time-sequence monitoring data, combines the LSTM algorithm with the characteristics of optical fiber monitoring data, and constructs an optical fiber abnormal data repair model consisting of a noise reduction preprocessing module and an LSTM prediction module, improving monitoring precision.

Example two:

This embodiment is substantially the same as Example one, with the following particular features:

In this embodiment, referring to Fig. 1 to Fig. 3, step 1) proceeds as follows:

A weighted disturbance factor α_i is defined in the noise reduction preprocessing module; for the 3σ threshold criterion on the position variable, a strain threshold derived from the mechanical characteristics of the optical fiber further constrains the valid range of the monitored values; and the time-sequence weighted difference algorithm is combined with the position-variable 3σ threshold method to filter noise signals;

[equation missing]

where y_x and y_x-1 are the values at sample points x and x-1, respectively; m is the length of the window used to compute y_x; and y_mean is the mean of the m samples within the window, i.e. y_mean = (1/m)·Σ y_i, summed over the m samples in the window;

i is the index of the sampling point and y_i is the monitored value at that sample point.

α_i is a weighted disturbance factor which satisfies:

[equation missing]

where the influence factors assigned to successive samples (the sample symbols are missing from the source text) are 1/m, 2/m, and so on, up to (m-1)/m;

1-2) In the position-variable 3σ threshold criterion, the threshold is set on the basis of the standard 3σ rule: the effective strain monitoring range [ε_min, ε_max] of the optical fiber is derived mainly from the ultimate stress σ_u given by the fiber's mechanical strength. The engineering allowable stress [σ] is:

[equation missing]

The limit strain ε that the fiber can sustain then satisfies:

E(1+cε)ε≤[σ]

where E is the elastic modulus and σ_u is the ultimate breaking stress of the optical fiber.

When filtering data on the position variable, an extreme threshold method is used to screen out abnormal values:

[equation missing]

where K₁ε_min should approximate μ-kσ and K₂ε_max should approximate μ+kσ; σ is the standard deviation of all sampled values in the window, μ is the mean of the sample points, and k is the confidence range coefficient, k ∈ [0,3].

In this embodiment, the acquired raw data are denoised using the weighted difference and the 3σ threshold criterion, and the filtered-out positions are set to NaN to facilitate the subsequent data repair. For filtering on the time-sequence variable, a sliding-window weighted difference algorithm compares values at adjacent moments, which requires a reasonable window length m and window weighted disturbance factor α_i to be set. For filtering on the position variable, an extreme threshold method is used to screen out abnormal values.

First, the limit strain of the optical fiber is obtained from its elastic modulus E and ultimate stress σ_u under the experimental conditions, together with the allowable stress balance coefficient n and the nonlinear balance constant c. In the noise reduction preprocessing module, this embodiment introduces the weighted disturbance factor α_i on top of the first-order difference algorithm on the time-sequence variable and, on top of the position-variable 3σ criterion, introduces a strain threshold based on the mechanical characteristics of the optical fiber to further constrain the valid range of the monitored values. Together, the time-sequence weighted difference algorithm and the position-variable 3σ threshold method accomplish the noise signal filtering preprocessing.

In this embodiment, step 2) proceeds as follows: in the adaptive LSTM prediction module, the training efficiency of the model is improved by using the adaptive input period length L and the improved custom loss function Loss;

2-1) In the time prediction sequence, if the time interval between the current prediction point and the previous prediction point is L, then after each window movement the improved adaptive window length L is:

[equation missing]

where d^k is the initial input period window for sampling point k, which satisfies:

[equation missing]

where L_k is the number of windows of consecutive missing data at sampling point k; L_sum is the sum of the missing windows over all sampling points; H_k is the number of abnormal values at sampling point k on the optical fiber; and T is the total sampling time;

2-2) On the basis of a standard regularization term, the segmented input period length L and the window step distance n are combined using the idea of a two-dimensional array, and a training loss function Loss is defined for the LSTM to improve learning efficiency. Here L is the length of the current segmentation window, n is the number of steps of sample points in the prediction set, with n < L, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n). According to the error calculation formula, the loss function Loss in training is defined as:

[equation missing]

where i is the index of the sampling point; [symbol missing] is the average over the segmentation window; y_i is the monitored value; L is the input period length; n is the step distance of the window; R(ω) is a regularization term used to limit interference noise in model learning; and β is its controllable weight factor, a constant.

This embodiment uses the idea of a two-dimensional array to combine the segmented input period length L with the window step distance n, and redefines the training loss function Loss of the LSTM to improve learning efficiency.

In this embodiment, in the adaptive LSTM prediction module, an average error coefficient δ is defined to correct the raw LSTM prediction and so reduce the iteration error of the optical fiber data. It is computed, using the position variable, from the differences between the monitored values of the sampling points at adjacent positions over the n steps before the time t to be predicted. At step n, the average error coefficient δ satisfies:

[equation missing]

where β_i is the proposed weight factor, equal to [expression missing]; n is the step distance; i is the number of shifts of the monitored value at the previous time and j is the number of shifts of the monitored value at the next time; y_{t-i} is the monitored value at the previous time and y_{t-j} is the monitored value at the next time. The actual predicted output value at time t is:

[equation missing]

where y_t is the predicted value of the LSTM.

This embodiment is a repair model for abnormal distributed optical fiber sensing data based on adaptive Long Short-Term Memory (LSTM), used to repair abnormal values in monitoring data. The model mainly comprises a noise reduction preprocessing module and an adaptive LSTM prediction module. First, to match the noise characteristics of the monitoring data, a data filtering method that fuses a weighted difference with a 3σ threshold criterion is adopted: time-sequence variable filtering is combined with position variable filtering, and the theoretically derived limit strain of the optical fiber serves as the defining threshold, so that abnormal values are eliminated. The null values in the optical fiber sensing data and the filtered-out abnormal values are then taken as the prediction targets of the LSTM model for data repair. In the LSTM model, because the amount of lost data differs between sample points, this embodiment uses an adaptive input window as the input period length of the LSTM time-sequence variable, and uses a loss function Loss that adapts during iterative training to improve the learning efficiency of the model. To further reduce the accumulated error, the average difference between each predicted output and the n adjacent spatial sampling points of the preceding steps is calculated, and an average error coefficient δ is set to correct the LSTM prediction at the current time. Finally, all monitoring sample points of the distributed optical fiber are traversed according to the spatial resolution to complete the repair.
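The final traversal can be sketched as a loop over spatial sample points, each treated as its own time series, in which only the NaN entries are overwritten and existing measurements are kept. The predictor below is plain linear interpolation, used purely as a stand-in so the sketch runs; it is not the adaptive LSTM described here. Both function names and the rows-are-time, columns-are-position layout are assumptions.

```python
import numpy as np


def interpolate_gaps(series):
    """Stand-in predictor (NOT the patent's LSTM): linear interpolation."""
    t = np.arange(series.size)
    ok = ~np.isnan(series)
    if not ok.any():
        return series.copy()  # nothing to interpolate from
    return np.interp(t, t[ok], series[ok])


def repair_all_points(data, predict=interpolate_gaps):
    """Repair a 2-D array: rows are time samples, columns are positions.

    Each column is repaired independently; measured values are kept and
    only NaN entries are replaced by the predictor's output.
    """
    data = np.asarray(data, dtype=float)
    repaired = data.copy()
    for k in range(data.shape[1]):
        column = data[:, k]
        predicted = predict(column)
        gaps = np.isnan(column)
        repaired[gaps, k] = predicted[gaps]
    return repaired
```

Any callable that maps a one-dimensional series to a same-length prediction can be passed as `predict`, which is where a trained model would plug in.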

Example three:

this embodiment is substantially the same as the previous embodiment, and is characterized in that:

in the present embodiment, as shown in fig. 2, 3 σ criterion filtering is performed on all the monitoring data timing and position variables. If y _x Within the confidence interval, the data is regarded as a normal value, otherwise, the data is regarded as a noise signal, and NaN is adopted to replace the original data;

μ-kσ≤y _x ≤μ+kσ

where k represents the range of confidence intervals, generally, k ∈ [0,3];

according to the characteristics of the monitoring data, a time sequence variable set T and a corresponding measured strain set ε are defined (see FIG. 2):

$T = (t_k, t_{k+1\tau}, \ldots, t_{k+d\tau})$

$\varepsilon = (\varepsilon_k, \varepsilon_{k+1\tau}, \ldots, \varepsilon_{k+d\tau})$

wherein the time sequence independent variable $t_j$ at any time satisfies:

where $t_0$ is the initial acquisition time of the optical fiber and f is the sampling frequency of the optical fiber demodulator. Let $L_k$ be the number of windows of continuously missing data at sampling point k, and let $L_{sum}$ denote the sum of the missing windows over all sampling points; the initial input period window length $d^k$ is:

where $d_{sum}$, the number of sampling points with normal monitoring values, can be expressed as:

$d_{sum} = m - H_k$

where $H_k$ is the number of abnormal values at sampling point k on the optical fiber and m is the total sampling time. Let l be the time interval between the current prediction point and the previous prediction point; the adaptive window length L then satisfies:

Finally, based on the position independent variable information, an average error coefficient δ is set to correct the fit of the estimated value; it is obtained by calculating the difference between the monitoring values of the sampling points within the H step lengths adjacent to the moment to be predicted. In this embodiment, the loss function Loss and the regularization parameter values in the LSTM layers are continuously adjusted during model training to improve the learning efficiency of the model.

Example four:

In this embodiment, as shown in FIG. 1, the distributed optical fiber abnormal data restoration model based on adaptive LSTM is implemented through the following specific steps:

1) For the acquired original data, data noise reduction preprocessing is performed based on a weighted difference and the 3σ threshold criterion, and the filtered-out spatial positions are set to NaN to facilitate the subsequent data restoration work;

when filtering data based on the time sequence variable, a sliding-window weighted difference algorithm is used to compare the values at adjacent moments, for which a reasonable window length m and window weighted disturbance factor $\alpha_i$ must be set. The specific operation is as follows:

where $y_x$ and $y_{x-1}$ are the values at sampling points x and x-1, respectively, and $y_{mean}$ is the mean of the m sampling points within the window;

Given the elastic modulus E and the ultimate stress $\sigma_u$ of the optical fiber under the experimental conditions, the stress balance coefficient n allowed in engineering, and the nonlinear balance constant c of the optical fiber, the ultimate strain value of the optical fiber satisfies:

the specific operation of threshold screening is as follows:

when data filtering based on the time sequence and position variables is performed using the normal distribution criterion, the specific operation is as follows:

$\mu - k\sigma \le y_x \le \mu + k\sigma$

where μ and σ are the mean and standard deviation of the adjacent sampled data at that moment, and k sets the width of the confidence interval; when $y_x$ lies within the confidence interval it is a normal value; otherwise it is determined to be an abnormal value and is replaced with NaN;
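The 3σ screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the monitoring data is stored as a 2-D array with rows as time steps and columns as fiber positions, and it computes μ and σ over a whole row or column rather than over a local neighbourhood. The function names `three_sigma_mask` and `filter_outliers` are hypothetical.

```python
import numpy as np

def three_sigma_mask(data, axis, k=3.0):
    """Flag entries farther than k*sigma from the mean taken along `axis`."""
    mu = np.nanmean(data, axis=axis, keepdims=True)
    sigma = np.nanstd(data, axis=axis, keepdims=True)
    return np.abs(data - mu) > k * sigma

def filter_outliers(data, k=3.0):
    """Replace values rejected along time (axis 0) or position (axis 1) with NaN.

    Assumed layout: rows are time steps, columns are fiber positions.
    """
    data = np.asarray(data, dtype=float)
    rejected = three_sigma_mask(data, axis=0, k=k) | three_sigma_mask(data, axis=1, k=k)
    cleaned = data.copy()
    cleaned[rejected] = np.nan
    return cleaned

# Example: 20 time steps, 3 positions, one spike at time 5, position 0.
readings = np.full((20, 3), 100.0)
readings[5, 0] = 500.0
cleaned = filter_outliers(readings)
```

Note that with few samples a single spike cannot exceed 3σ of its own sample, so short windows may need a smaller k within the stated range [0,3].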

2) Constructing a self-adaptive LSTM prediction module; grouping the preprocessed data according to spatial resolution, namely each sampling point is an independent prediction array, and adaptively defining the input period length L and the time step length H of window movement in each sampling point;
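As a rough sketch of the grouping in step 2), each fiber position can be treated as its own series and cut into input windows of length L that advance by H samples. The array layout (rows as time, columns as position), the choice of the next H samples as the prediction target, and the function names are assumptions made for illustration only.

```python
import numpy as np

def group_by_sampling_point(data):
    """Return one independent time series per fiber position (column)."""
    data = np.asarray(data, dtype=float)
    return [data[:, k] for k in range(data.shape[1])]

def sliding_windows(series, L, H):
    """Cut a series into (input, target) pairs.

    Each input holds L consecutive samples, the target is the next H samples,
    and the window advances by H samples per step.
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for start in range(0, len(series) - L - H + 1, H):
        inputs.append(series[start:start + L])
        targets.append(series[start + L:start + L + H])
    return np.array(inputs), np.array(targets)

# Example: a 10-sample series with input length L=4 and step H=2.
X, Y = sliding_windows(np.arange(10), L=4, H=2)
```

In the adaptive scheme, L and H would be recomputed per sampling point instead of being fixed as in this example.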

FIG. 3 is a schematic diagram of the data acquired by the distributed optical fiber sensing system, in which NaN denotes a null value, and a normal monitoring value that meets the noise reduction criterion is treated as a noise signal;

counting the time interval l between the current prediction point and the previous adjacent prediction point, and calculating the adaptive input window length:

where $d^k$, the initial window period length, satisfies:

where $L_k$ is the number of windows of continuously missing data at sampling point k; $L_{sum}$ is the sum of the missing windows over all sampling points; t is the total sampling time; and $H_k$ is the number of abnormal values at sampling point k on the optical fiber;
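The quantities defined above can be counted directly from a series in which abnormal and null values are already marked as NaN. The sketch below assumes that the total sampling time equals the number of samples, so that the relation d_sum = m - H_k from the description applies; `missing_statistics` is a hypothetical helper name.

```python
import numpy as np

def missing_statistics(series):
    """Return (H_k, L_k, d_sum) for the series of one sampling point.

    H_k   : number of missing or abnormal samples (NaN entries)
    L_k   : number of runs of consecutive NaN entries
    d_sum : number of samples with a normal monitoring value
    """
    is_nan = np.isnan(np.asarray(series, dtype=float))
    h_k = int(is_nan.sum())
    # A run starts wherever a NaN is not preceded by another NaN.
    previous_is_nan = np.concatenate(([False], is_nan[:-1]))
    l_k = int((is_nan & ~previous_is_nan).sum())
    d_sum = is_nan.size - h_k
    return h_k, l_k, d_sum

# Example: two gaps (one of length 2, one of length 1) in six samples.
h_k, l_k, d_sum = missing_statistics([1.0, np.nan, np.nan, 2.0, np.nan, 3.0])
```

L_sum would then be the sum of `l_k` over all sampling points of the fiber.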

then, the loss value is calculated from the custom loss function Loss used in the training process, as follows:

where R(ω) is the default regularization term and β is a weight factor constant set initially;

3) According to the defined formula for the error correction coefficient, the average correction coefficient δ over the first n sampling points in the output stepping window of each point to be predicted is calculated as

where $\beta_i$ is the proposed weight factor, equal to

By comparing the average of the measured values of the n step sampling points with the estimated value, the actual predicted value output at moment i can be expressed as:

Thus, by continuously moving the training-set window along the time variable, the repair of all time-sequence missing data at the sampling point is completed by traversal; finally, as shown in FIG. 3, the missing data of all sampling points are predicted by traversing along the position variable.
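The two-level traversal described above (along time within one sampling point, then across all positions) can be sketched as below. The predictor is passed in as a callable; the example uses `np.mean` purely as a stand-in and it is not the adaptive LSTM of the invention. The function names, the fixed `window` argument, and the array layout (rows as time, columns as position) are assumptions.

```python
import numpy as np

def repair_series(series, window, predict):
    """Fill NaN entries left to right using `predict` on the preceding window."""
    y = np.asarray(series, dtype=float).copy()
    for t in range(len(y)):
        if np.isnan(y[t]):
            history = y[max(0, t - window):t]
            history = history[~np.isnan(history)]
            if history.size:  # a gap at the very start has no history and stays NaN
                y[t] = predict(history)
    return y

def repair_all(data, window, predict):
    """Repair every sampling point (column) independently."""
    data = np.asarray(data, dtype=float)
    columns = [repair_series(data[:, k], window, predict) for k in range(data.shape[1])]
    return np.column_stack(columns)

# Example with a placeholder predictor (mean of the window).
raw = np.array([[1.0, np.nan],
                [2.0, 10.0],
                [np.nan, np.nan],
                [4.0, 12.0]])
repaired = repair_all(raw, window=2, predict=np.mean)
```

Because each repaired value is written back before the window moves on, later predictions can draw on earlier repaired points, which is where accumulated error can arise and why the description introduces a correction coefficient.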

The above examples are given to illustrate the invention clearly and not to limit it. Those skilled in the art can make modifications in other forms on the basis of the above description; it is neither necessary nor possible to list all embodiments exhaustively, and obvious variations and modifications remain within the scope of the invention.

Claims

1. A distributed optical fiber abnormal data restoration device based on self-adaptive long-term and short-term memory is characterized in that: combining the self-adaptive LSTM algorithm with distributed optical fiber time sequence data, and changing the optical fiber time sequence input cycle length of the LSTM and control parameters in the training process in a self-adaptive manner so as to finish abnormal data repair work; the device mainly comprises a noise reduction preprocessing module and a self-adaptive LSTM data prediction module, and specifically comprises the following execution steps:

2) By constructing a self-adaptive LSTM prediction module, continuously updating two self-adaptive parameters of an input cycle length L and a Loss function Loss of the LSTM, and setting an error correction coefficient delta, thereby improving the repair precision of the model;

the specific method of the step 1) comprises the following steps:

defining a weighted disturbance factor $\alpha_i$ in the noise reduction preprocessing module; defining, on the basis of the 3σ threshold criterion of the position variable, a strain threshold from the mechanical properties of the optical fiber to further constrain the effective range of the monitored values; and combining the weighted difference algorithm of the time sequence variable with the 3σ threshold method of the position variable to perform noise signal filtering preprocessing;

where m is the window length used in computing $y_x$; $y_{mean}$ is the mean of the m samples within the window, i.e.

i denotes the sampling point number, $y_i$ is the monitored value of a sample point, and $\alpha_i$ is the weighted disturbance factor, which satisfies:

wherein the first influence factor is 1/m, the second influence factor is 2/m, and so on by analogy, up to the last influence factor, which is (m-1)/m;

The ultimate strain ε of the optical fiber then satisfies:

$E(1 + c\varepsilon)\varepsilon \le [\sigma]$

where E is the elastic modulus and $\sigma_u$ is the ultimate breaking stress of the optical fiber;
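For illustration, the inequality E(1+cε)ε ≤ [σ] is a quadratic in ε, so its largest admissible value can be solved in closed form. The sketch below assumes E > 0 and c ≥ 0 and takes the allowable stress [σ] as a direct input; how [σ] is derived from $\sigma_u$ is not shown here. The function name and the numbers in the example are hypothetical.

```python
import math

def ultimate_strain(E, c, sigma_allow):
    """Largest strain satisfying E*(1 + c*eps)*eps <= sigma_allow.

    Assumes E > 0 and c >= 0. Rearranged: E*c*eps**2 + E*eps - sigma_allow <= 0,
    whose positive root is (-1 + sqrt(1 + 4*c*sigma_allow/E)) / (2*c).
    """
    if c == 0:
        return sigma_allow / E  # linear-elastic limit
    return (-1.0 + math.sqrt(1.0 + 4.0 * c * sigma_allow / E)) / (2.0 * c)

# Example with made-up, unit-free numbers.
eps_max = ultimate_strain(E=2.0, c=1.0, sigma_allow=4.0)
```

Substituting the result back gives 2.0 * (1 + 1.0 * eps_max) * eps_max equal to the allowable stress, which is a quick way to check a chosen threshold.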

Based on the mean and standard deviation of the sampled data along the optical fiber at the same moment, constraint coefficients $K_1$ and $K_2$ are defined to bound the width of the range, and a normal measured value satisfies:

where $K_1\varepsilon_{min}$ should approximate μ - kσ and $K_2\varepsilon_{max}$ should approximate μ + kσ; σ is the standard deviation of all sample values in the value window, μ is the mean of the sample points, and k is the confidence range coefficient, k ∈ [0,3];

The specific method of the step 2) comprises the following steps:

in the self-adaptive LSTM prediction module, the training efficiency of the model is improved by utilizing the self-adaptive input time period length L and the improved custom Loss function Loss;

where $L_k$ is the number of windows of continuously missing data at sampling point k; $L_{sum}$ is the sum of the missing windows over all sampling points; $H_k$ is the number of abnormal values at sampling point k on the optical fiber; and T is the total sampling time;

2-2) on the basis of a standard regularization term function, and using the idea of a two-dimensional array, the segmented input cycle length L and the window step distance n are combined to define the training loss function Loss of the LSTM so as to improve learning efficiency; L is the length of the current segmentation window, n < L, n is the step count of the prediction-set sample points, and the input and output of the hidden layer form a two-dimensional array of dimension (L-n, n); according to the error calculation formula, the loss function Loss in the training process is defined as:

where i denotes the sampling point number;

is the average of the split windows; $y_i$ is a monitored value; L is the input period length; n is the step distance of the window; R(ω) is a regularization term that limits interference noise during model learning; and β is its controllable weight factor constant.

2. The distributed optical fiber abnormal data restoration device based on adaptive long-short term memory as claimed in claim 1, wherein, in the adaptive LSTM prediction module, an average error coefficient δ is defined, based on the position independent variable, by calculating the difference between the monitoring values of the sampling points at adjacent positions over the n steps before the time t to be predicted, and is used to correct the actual LSTM predicted value, thereby reducing the iterative error of the optical fiber data; at step n, the average error coefficient δ satisfies:

where $\beta_i$ is the proposed weight factor, equal to

n is the step distance, i is the number of moves of the monitored value at the earlier moment, j is the number of moves of the monitored value at the later moment, $y_{t-i}$ is the monitored value at the earlier moment, and $y_{t-j}$ is the monitored value at the later moment; the actual predicted value output at time t is:

where $y_t$ is the theoretical prediction of the LSTM,