CN109408498B - Time series feature identification and decomposition method based on feature matrix decision tree - Google Patents


Info

Publication number
CN109408498B
Authority
CN
China
Legal status
Active
Application number
CN201811170289.6A
Other languages
Chinese (zh)
Other versions
CN109408498A (en)
Inventor
苏鹭梅
朱文婷
郑小龙
郑锐洁
张宝琼
叶恺昕
Current Assignee
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Application filed by Xiamen University of Technology
Priority to CN201811170289.6A
Publication of CN109408498A
Application granted
Publication of CN109408498B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a time series feature recognition and decomposition method based on a feature matrix decision tree. The method mainly comprises sample data preprocessing, determination of the sample data period, selection and extraction of sample data features, and establishment of a multivariate time series feature recognition and decomposition model. It can improve the speed and accuracy of feature identification and is particularly suitable for non-intrusive load identification and decomposition in the power industry.

Description

Time series feature identification and decomposition method based on feature matrix decision tree
Technical Field
The invention relates to the field of big data analysis and mining, and in particular to time-series-based data identification and decomposition.
Background
In recent years, time-series data models have become ubiquitous in Internet big data, machine and sensor data, and similar domains, and they are widely applied in finance, e-commerce platforms, the process industry and other fields, where they have attracted broad attention.
Time series analysis currently receives wide attention: a mathematical model is established, its parameters are estimated, and the model is then applied to prediction, industrial adaptive control, optimal filtering and many other tasks. Traditional time series analysis focuses on analyzing univariate time series and identifying their change points. As time series methods mature and their fields of application widen, models are no longer limited to univariate series but must handle multivariate time series, which therefore need to be analyzed, identified and decomposed.
Analyzing, identifying and decomposing a time series not only extracts its hidden features but also determines the range of its periodic fluctuations and how the multivariate series is divided into components. Decomposition makes multivariate time series data more intuitive and simpler, so that the patterns and trends in the data are easier to obtain; this is why multivariate time series decomposition is applied in many fields.
Disclosure of Invention
To this end, the invention provides a time series feature identification and decomposition method based on a feature matrix decision tree. The method comprises the following steps:
100. data preprocessing: carrying out data cleaning, data integration and data reduction on the sample data;
200. determining the sample period: the screened feature values are screened and grouped at fixed intervals according to a specific period; the period is found by applying a Fourier transform to the time series feature quantity to obtain its intensity spectrum, identifying the dominant (largest-amplitude) frequency component, and taking the reciprocal of that frequency as the period;
300. selecting and extracting features: feature subsets are evaluated with a combination of the sequential forward feature selection algorithm and the K-means clustering algorithm to determine the optimal feature subset, which completes feature selection; highly discriminative features are then extracted from the selected sample features;
400. establishing a multivariate time series feature identification and decomposition model.
Further, data cleaning in the data preprocessing of step 100 uses the Grubbs method: a "suspicious value" is identified in the sample data, its deviation from the mean is used to calculate the statistic Gi, and Gi is compared with the critical value GP(n) found in the Grubbs table; if Gi is greater than GP(n), the measured value is judged to be an outlier, and the suspicious value is removed from the data sample so that it does not take part in calculating the mean.
Further, data integration in the data preprocessing of step 100 uses the correlation coefficient method: the correlation coefficient is obtained from the sample standard deviations and covariance, and its value indicates the strength of the relationship between two variables. The correlation coefficient ranges from −1 to 1, where 1 indicates a perfect positive linear correlation, −1 a perfect negative correlation, and 0 no correlation; the closer the coefficient is to 0, the weaker the correlation.
Further, data reduction in the data preprocessing of step 100 uses regression analysis: building on the degree of association between parameters obtained from data integration, the relationships between variables are quantified and fixed, and irrelevant variables are removed, which reduces the dimensionality of the analyzed data sample and yields a reliable model.
Further, the specific process of step 300 is:
310. determining the optimal feature subset with the sequential forward feature selection algorithm: the k features already selected form a feature set X_k of size k, and the d − k unselected features x_j (j = 1, 2, 3, …, d − k) are ranked by the value of the criterion J after being combined with the features already selected; the sequential forward feature selection algorithm starts with an empty feature set and, in each subsequent cycle, selects the best remaining feature from the original feature set and adds it to the set, until the number of features reaches m;
320. evaluating, with the K-means clustering algorithm, how well the features separate samples of different classes: given a sample set, the K-means algorithm partitions it into K clusters, each cluster center being the mean of the samples in that cluster; every other object is assigned to the cluster with the nearest center, the center of each new cluster is recomputed, and this iterative relocation is repeated so that the sum of squared distances between the samples in each cluster and its center decreases until the objective function is minimized, and the optimal features are thereby selected;
330. extracting features with a time series feature selection algorithm: the feature values of the sample data are calculated, invalid periods in the sample data are eliminated, 15 valid periods are selected as sample data, the feature values of these 15 periods are calculated, and the features with the highest discriminative power are extracted by classifying the feature values.
Further, the step of establishing the multivariate time series identification model in step 400 comprises the following sub-steps:
410. based on the C4.5 decision tree classification algorithm, each feature is treated as a class, corresponding to a leaf node of the decision tree; attribute values (i.e., sample feature parameters) are compared at the internal nodes of the tree in a top-down recursive manner, and each sample follows the downward branch that matches its attribute value until every leaf contains a single class (the leaves are pure); the obtained optimal feature parameters are then used for identification and decomposition to determine the data class to which given feature parameters belong;
420. introducing an improved sliding window bilateral CUSUM event detection algorithm to segment the time series, with an event detection program continuously tracking changes in the feature parameters at every sampling point; by detecting whether any feature parameter changes anywhere in the time series, features in the series are identified; the time position of the current feature value group within the series is then determined and feature decomposition is performed, so as to determine which state the current data correspond to at the current time;
430. establishing a category feature matrix on the time series: the feature values of the data are averaged over the training samples and the standard deviation about this mean is taken as the fluctuation level; a category feature matrix decision tree is then introduced and a time series feature probability model is established, which yields the optimal solution for the current multivariate time series features and finally achieves their automatic identification and decomposition.
The method can improve the speed and the accuracy of feature identification, and is particularly suitable for non-intrusive load identification in the power industry.
Drawings
FIG. 1 is a general flow diagram of the process of the present invention;
FIG. 2 is a flow chart of feature selection and extraction in the method of the present invention;
FIG. 3 is a flow chart of the multivariate time series identification modeling in the method of the present invention;
FIG. 4 is a schematic diagram of the power spectrum analysis for a computer load;
FIG. 5 is a flowchart of a sliding window bilateral CUSUM event detection method in the method of the present invention;
FIG. 6 is a schematic diagram of the four stages of event detection in the CUSUM event detection method of FIG. 5.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
Referring to FIG. 1, the overall flow of the process of the present invention is described. The method mainly comprises sample data preprocessing 100, sample data period determination 200, feature selection and extraction 300 and establishment of a multivariate time series feature identification and decomposition model 400.
In the sample data preprocessing 100, this embodiment first performs data cleaning with the Grubbs method, i.e., it corrects identifiable errors in the data file, handles invalid, missing and abnormal values, and checks data consistency. A "suspicious value" is identified in the sample data, and its deviation from the mean gives the statistic Gi = |xi − x̄|/s, where x̄ is the sample mean and s the sample standard deviation. Gi is compared with the critical value GP(n) looked up in the Grubbs table; if Gi exceeds GP(n), the measurement is judged to be an outlier. In this way the Grubbs method removes the suspicious value from the data sample so that it does not take part in calculating the mean.
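As an illustration only, one pass of the Grubbs outlier test described above might look like the following Python sketch. It assumes the usual statistic Gi = |xi − x̄|/s with the sample standard deviation s; the function name `grubbs_clean` is hypothetical, and the critical value GP(n) is passed in by the caller because it has to be looked up in a Grubbs table for the chosen n and confidence level.

```python
import statistics


def grubbs_clean(samples, g_crit):
    """One pass of the Grubbs test.

    samples: list of measurements; g_crit: critical value GP(n) from a Grubbs
    table for n = len(samples) and the chosen confidence level.
    Returns (cleaned_samples, removed_value or None).
    """
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return list(samples), None  # all values identical: nothing is suspicious
    # The "suspicious value" is the sample farthest from the mean.
    idx = max(range(len(samples)), key=lambda i: abs(samples[i] - mean))
    g_i = abs(samples[idx] - mean) / s
    if g_i > g_crit:
        # Outlier: drop it so it no longer enters the mean.
        return samples[:idx] + samples[idx + 1:], samples[idx]
    return list(samples), None
```

For ten readings where one lies far from the rest, `grubbs_clean(readings, 2.29)` drops that reading; 2.29 is used here only as an illustrative table value (roughly the two-sided 5 % value for n = 10). Repeating the pass with the table value for the new n removes several outliers one at a time.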
Because some electrical equipment parameters may be highly correlated, we integrate the sample data using a correlation-coefficient-based method that reflects the degree of association between variables. The correlation coefficient is computed from the sample standard deviations and the sample covariance, and its value indicates how strongly the two variables are related. The correlation coefficient is calculated as:
$$r_{xy} = \frac{S_{xy}}{S_x S_y}$$
Sample covariance S_xy (n samples with means x̄ and ȳ):
$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Sample standard deviation S_x:
$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Sample standard deviation S_y:
$$S_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where r_xy is the sample correlation coefficient, S_xy the sample covariance, S_x the sample standard deviation of x, and S_y the sample standard deviation of y. The degrees of correlation corresponding to values of r_xy are listed in Table 1:
Table 1. Reference table for the degree of correlation indicated by r_xy
The correlation coefficient ranges from −1 to 1: 1 indicates that the two variables are perfectly positively linearly correlated, −1 that they are perfectly negatively correlated, and 0 that they are linearly uncorrelated. The closer the coefficient is to 0, the weaker the correlation.
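A minimal Python sketch of the correlation coefficient computed from the sample covariance and standard deviations, assuming the common n − 1 normalization (the choice cancels out in r_xy as long as it is applied consistently). The function name `sample_correlation` is hypothetical.

```python
import math


def sample_correlation(x, y):
    """Pearson correlation r_xy = S_xy / (S_x * S_y) with n - 1 normalization."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    # Sample covariance and sample standard deviations.
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)
```

A value close to 1 or −1 for two device parameters (for example, current and active power) would flag them as strongly linearly related and thus candidates for the data reduction step.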
Because the number of data samples to analyze is very large, the data samples undergo data reduction, i.e., parameter dimensionality reduction. Specifically, regression analysis is used: building on the degree of association between parameters obtained in data integration, the relationships between variables are quantified and fixed, irrelevant variables are removed, the dimensionality of the analyzed data samples is reduced, and a reliable model is mined. Taking the current, active power, reactive power, power factor and second-harmonic current of a laser printer and a notebook computer as examples, the results of regression analysis in MATLAB R2016a are shown in Tables 2 and 3:
Table 2. Correlation between laser printer parameters
Table 3. Correlation between computer parameters
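The text does not spell out how the regression analysis decides which variables are irrelevant. One simple reading, offered purely as an assumption, is a greedy screen: regress each parameter on every parameter already kept with ordinary least squares, and drop it when some kept parameter already explains it with R² at or above a threshold. The names `simple_regression_r2` and `reduce_parameters` and the 0.95 threshold are hypothetical.

```python
def simple_regression_r2(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        return my, 0.0, 0.0  # a constant predictor explains nothing
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    if syy == 0:
        return a, b, 1.0  # a constant response is trivially explained
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / syy


def reduce_parameters(columns, r2_threshold=0.95):
    """Keep a parameter only if no already-kept parameter explains it with R^2 >= threshold.

    columns: {parameter_name: list_of_samples}, processed in insertion order.
    """
    kept = []
    for name, values in columns.items():
        explained = any(
            simple_regression_r2(columns[k], values)[2] >= r2_threshold for k in kept
        )
        if not explained:
            kept.append(name)
    return kept
```

With made-up columns where active power is an exact multiple of current and the power factor varies independently, the screen keeps current and power factor and drops active power as redundant.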
In the sample data period determination 200, this embodiment uses spectral analysis: the screened feature values are screened and grouped at fixed intervals according to a specific period. The period is obtained by applying a Fourier transform to the time series feature quantity to obtain its intensity spectrum, identifying the dominant frequency component, and taking its reciprocal as the period; grouping by period improves the resolution of feature extraction.
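The discrete Fourier transform of the periodic discrete-time signal x(nT) can be expressed as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1$$
where x(n), n = 0, 1, …, N − 1, is the finite-length discrete signal, and bin k corresponds to the frequency k/(NT) for sampling interval T.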
FIG. 4 shows the spectral analysis for a computer. The period we estimate from the raw data is about 400 s, and the frequency of the second-highest spectral peak obtained with our algorithm is about 0.0025 Hz, i.e., a period of 1/(0.0025 Hz) = 400 s, which is consistent. The highest-amplitude peak is not used because our data are not strictly periodic: that peak lies near zero frequency, whereas the frequency of the second-highest peak is closer to the data period.
In the feature selection and extraction 300, this embodiment evaluates feature subsets by combining the sequential forward feature selection algorithm with the K-means clustering algorithm, determines the optimal feature subset to complete feature selection, and then extracts highly discriminative features from the selected sample features. As shown in FIG. 2, the specific process is as follows:
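The period estimate can be sketched with a direct DFT in Python. This is an O(N²) illustration rather than an FFT, and as a simplification it only skips the zero-frequency bin instead of searching for the second-highest peak; the function name `dominant_period` and the `skip_dc` flag are assumptions.

```python
import cmath
import math


def dominant_period(x, dt=1.0, skip_dc=True):
    """Period (in the units of dt) of the strongest DFT component of x.

    x: equally spaced samples; dt: sampling interval T.
    skip_dc: ignore bin k = 0, which reflects the signal mean rather than a period.
    """
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1 if skip_dc else 0, n // 2 + 1):  # non-negative frequencies only
        # X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)
        x_k = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        if abs(x_k) > best_mag:
            best_k, best_mag = k, abs(x_k)
    if best_k == 0:
        return math.inf  # strongest component is at zero frequency: no finite period
    freq = best_k / (n * dt)  # bin k lies at k / (N * T) Hz
    return 1.0 / freq
```

With a synthetic signal sampled every 10 s whose oscillation repeats every 400 s, the strongest non-zero bin sits at 0.0025 Hz and the function returns a period of about 400 s; with `skip_dc=False` the zero-frequency bin produced by the non-zero mean wins instead.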
310. The optimal feature subset is determined with the sequential forward feature selection algorithm. Assume that the k features already selected form a feature set X_k of size k, and that the d − k unselected features x_j (j = 1, 2, 3, …, d − k) are ranked by the value of the criterion J obtained after combining each of them with the features already selected, i.e., if
$$J(X_k + x_1) \ge J(X_k + x_2) \ge \cdots \ge J(X_k + x_{d-k}) \quad (6)$$
then the feature set selected in the next step is
$$X_{k+1} = X_k + x_1 \quad (7)$$
The sequential forward feature selection algorithm starts with an empty feature set, and in each subsequent cycle, the best feature in the original feature set is selected and added to the set until the number of features increases to m.
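Equations (6) and (7) describe a greedy loop, sketched below in Python. The criterion J is left as a caller-supplied function because the text does not fix it (it could, for example, be a cluster-separability score); `sequential_forward_selection` and the toy scores in the usage note are hypothetical.

```python
def sequential_forward_selection(features, criterion, m):
    """Greedy SFS: start from an empty set and add the best remaining feature until m are chosen.

    features: candidate feature names; criterion: function J(list_of_features) -> float.
    """
    selected = []
    remaining = list(features)
    while len(selected) < m and remaining:
        # Rank each unselected x_j by J(X_k + x_j) and keep the best one (eqs. 6 and 7).
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With made-up per-feature scores and a criterion that penalizes keeping two redundant features together, the loop picks the strongest feature first and then skips its redundant partner in favor of a weaker but complementary one, which is the behavior the greedy ranking in (6) produces.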
320. The K-means clustering algorithm is used to evaluate how well the features separate samples of different classes. Geometrically, greater separability between classes means larger inter-class distances, so samples of different classes lie farther apart, together with smaller intra-class distances, so samples of the same class are more tightly aggregated. Given a sample set, the K-means algorithm partitions it into K clusters whose centers are the means of the samples in each cluster; every other object is assigned to the cluster whose center is nearest, the center of each new cluster is recomputed, and this iterative relocation is repeated so that the sum of squared distances between the samples in each cluster and its center decreases until the objective function is minimized. The features giving the best separation are thereby selected.
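To make the clustering step concrete, here is a plain Lloyd's K-means in Python together with one possible separability score. Scoring a feature by cluster purity (how well the K clusters line up with the known appliance classes) is an assumption of this sketch, as are the names `kmeans` and `cluster_purity` and the choice to seed the centers with the first K samples.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's K-means on equal-length tuples; the first k points seed the centres (assumption)."""
    centres = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster whose centre is nearest.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centres


def cluster_purity(labels, classes):
    """Share of samples that belong to the majority known class of their cluster."""
    total = 0
    for c in set(labels):
        members = [cls for lab, cls in zip(labels, classes) if lab == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(classes)
```

A feature whose values form two well-separated groups for two appliances reaches a purity of 1.0, while a feature whose values overlap scores lower, so the former would be preferred.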
330. Because the clustering result often cannot complete feature selection for all categories of sample data, this embodiment uses a feature extraction method based on time-domain statistical features to improve the efficiency of feature selection: the operating feature values of the electrical equipment are calculated, invalid periods in the sample data are eliminated, 15 valid periods are selected as the sample data, and highly discriminative features are extracted by calculating the feature values of these 15 periods and comparing time-domain statistics such as the mean, variance and skewness.
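The time-domain statistics named here can be computed per period as in this Python sketch. The rule for an "invalid" period (any missing sample, or an all-zero segment while the device is off) is an assumption, as are the function names `skewness` and `period_features`; `n_periods=15` mirrors the 15 periods in the text, and skewness is the population skewness.

```python
import statistics


def skewness(xs):
    """Population skewness E[(x - mean)^3] / sigma^3; 0.0 for a constant segment."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        return 0.0
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sigma ** 3)


def period_features(signal, period_len, n_periods=15):
    """Split a signal into whole periods, drop invalid ones, and keep up to n_periods.

    Returns one dict of time-domain statistics (mean, variance, skewness) per kept period.
    """
    feats = []
    for start in range(0, len(signal) - period_len + 1, period_len):
        seg = signal[start:start + period_len]
        # Assumed validity rule: no missing samples and the device is not idle (all zeros).
        if any(v is None for v in seg) or all(v == 0 for v in seg):
            continue
        feats.append({
            "mean": statistics.fmean(seg),
            "variance": statistics.pvariance(seg),
            "skewness": skewness(seg),
        })
        if len(feats) == n_periods:
            break
    return feats
```

Comparing these per-period statistics across appliance classes (for instance, how far apart the class means are relative to their spread) is one way to rank which statistic discriminates best.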
In establishing the multivariate time series feature identification and decomposition model 400, the flow of the multivariate time series feature identification model of this embodiment is shown in FIG. 3 and comprises the following sub-steps:
410. Based on the C4.5 decision tree classification algorithm, each feature is treated as a class, corresponding to a leaf node of the decision tree. Attribute values (i.e., sample feature parameters) are compared at the internal nodes of the tree in a top-down recursive manner, and each sample follows the downward branch that matches its attribute value, until every leaf contains a single class (the leaves are pure). The obtained optimal feature parameters are then used for identification and decomposition, i.e., to determine the class to which given feature parameters belong.
The C4.5 decision tree classification algorithm is a supervised classification learning algorithm. Let PC denote the sample set and P_k (k = 1, 2, …, a) the proportion of samples of the k-th class in PC, where a is the total number of classes. The information entropy of the sample set is defined as:
$$\mathrm{Ent}(PC) = -\sum_{k=1}^{a} P_k \log_2 P_k$$
Suppose the sample set is split on attribute B. If B has X possible values b_1, …, b_X, X branch nodes are generated, where the x-th branch node (x = 1, 2, …, X) contains all samples in PC whose value on attribute B is b_x; this subset is denoted C_x. The information gain obtained by splitting the sample set on attribute B is then:
$$\mathrm{Gain}(PC, B) = \mathrm{Ent}(PC) - \sum_{x=1}^{X} \frac{|C_x|}{|PC|}\,\mathrm{Ent}(C_x)$$
Further, the information gain ratio of attribute B is:
$$\mathrm{GainRatio}(PC, B) = \frac{\mathrm{Gain}(PC, B)}{\mathrm{IV}(B)}, \qquad \mathrm{IV}(B) = -\sum_{x=1}^{X} \frac{|C_x|}{|PC|}\log_2 \frac{|C_x|}{|PC|}$$
where IV(B) is the intrinsic value (split information) of attribute B.
The gain ratio of each attribute is calculated with these formulas, and the attribute with the largest gain ratio is selected as the splitting attribute; the gain ratios of the remaining attributes are computed in the same way at each child node, and splitting proceeds recursively until all attributes have been used or all samples take the same values on all remaining attributes, so that no further split is possible.
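The split criterion can be sketched in Python for categorical (already discretized) attributes. It follows the rule stated here, picking the attribute with the largest gain ratio; the handling of continuous attributes, pruning and missing values that full C4.5 implementations include is omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy -sum(P_k * log2 P_k) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information IV."""
    n = len(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)  # one branch per attribute value
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0  # a single-valued attribute cannot split


def best_split(rows, labels, attrs):
    """Attribute with the largest gain ratio, used as the splitting attribute."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))
```

In a toy set where a discretized power level separates two appliances perfectly and a discretized power factor does not, the power attribute gets a gain ratio of 1.0, the power factor 0.0, and `best_split` chooses the power attribute.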
420. An improved sliding window bilateral CUSUM event detection algorithm is introduced to segment the time series, and an event detection program continuously tracks changes in the feature parameters at every sampling point. By detecting whether any feature parameter changes anywhere in the time series, features in the time series are identified; the time position of the current feature value group within the time series is then determined and feature decomposition is performed, so as to determine which state the current data correspond to at the current time.
The improved sliding window bilateral CUSUM event detection algorithm is described below, taking the detection of residential electrical appliances as an example. The algorithm comprises the following steps:
Let the active power time series be
$$P = \{p_1, p_2, \ldots, p_t, \ldots\}$$
where p_i is the active power at sampling instant i. Two consecutive sliding windows W_s (steady-state mean window) and W_u (transient mean window), of lengths s and u respectively, are defined on the time series, and their means A_s and A_u are calculated as:
$$A_s = \frac{1}{s}\sum_{p_i \in W_s} p_i$$
$$A_u = \frac{1}{u}\sum_{p_i \in W_u} p_i$$
Two quantities, [formula image] and [formula image], are then defined to detect whether a load is switched on (active power increases) or switched off (active power decreases) at the current moment, and a fluctuation level ε characterizing the time series in steady state is defined. They are calculated as follows:
[formula image]
[formula image]
Taking the detection of a switch-on (load input) event as an example, the sliding window bilateral CUSUM event detection procedure is as follows. When the mean A_u of the transient window exceeds A_s + ε, [formula image] starts to increase. A threshold K for deciding that an event has occurred must then be set: when [formula image], an event may have occurred at this moment. To avoid recognizing one load switch-on or switch-off event several times because of oscillations in the series, a time delay factor d (initial value 0) is introduced; each time d is increased by l, [formula image] and [formula image] are compared. If [formula image], the change in active power at that moment is attributed to fluctuation, [formula image] is set and d = 0, which prevents fluctuations in the device data from triggering repeated event identifications. When [formula image], d = d + l is set and [formula image] is calculated, until [formula image].
The occurrence time of the detected event is then obtained as t − d. FIG. 5 shows the sliding window bilateral CUSUM event detection process for a load switch-on event; switch-off events are detected in the same way.
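Because the CUSUM statistics and the threshold test appear only as images, the following Python sketch is a generic reading rather than the patent's exact formulas. It assumes an upward CUSUM of the form g = max(0, g + (A_u − A_s) − ε), declares an event when g exceeds K, and uses the delay factor d (incremented by one per step, i.e., l = 1) to count the steps since g last left zero, so that the event time is t − d. The lockout of s + u samples after a detection, which lets both windows pass the event before the detector is re-armed, is also an assumption, and `detect_switch_on` is a hypothetical name. A switch-off detector would use A_s − A_u instead.

```python
def detect_switch_on(p, s, u, eps, K):
    """Hedged sketch of a sliding-window upward CUSUM detector for switch-on events.

    p: active power samples; s, u: lengths of W_s and W_u; eps: steady-state
    fluctuation level; K: decision threshold. Returns estimated event indices.
    """
    events = []
    g = 0.0        # upward CUSUM statistic (assumed form)
    d = 0          # delay factor: steps since g last left zero
    t = s + u - 1  # first index at which both windows are full
    while t < len(p):
        a_s = sum(p[t - u - s + 1:t - u + 1]) / s  # W_s: the s samples before W_u
        a_u = sum(p[t - u + 1:t + 1]) / u          # W_u: the u most recent samples
        prev_g = g
        g = max(0.0, g + (a_u - a_s) - eps)
        # g == 0: treat the change as fluctuation and reset d; g just left zero: start at 0.
        d = d + 1 if (g > 0.0 and prev_g > 0.0) else 0
        if g > K:
            events.append(t - d)  # estimated occurrence time t - d
            g, d = 0.0, 0
            t += s + u            # lock out until both windows have passed the event
            continue
        t += 1
    return events
```

On a step from 100 W to 300 W at sample 50 (with s = 10, u = 5, ε = 5, K = 50) the sketch reports [50], while a single 130 W spike only nudges g upward briefly before it resets to zero, so no event is reported.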
As the sliding windows of the sliding window bilateral CUSUM event detection program pass over the occurrence time of an event, the process can be divided into four phases, as shown in FIG. 6, where P_0 is the active power before the event and ΔP is the difference between the active power after the event and P_0.
a. In the first phase, the transient detection window has not yet reached the event, so the values of both windows remain unchanged, i.e., A_u − A_s = 0;
b. In the second phase, the event occurrence time lies within the transient detection window: A_u changes continuously while A_s does not. Let P_1 = P_0 + ΔP and t_d = t − t_1, with t_d ∈ (1, u), where t_1 is the event occurrence time. At every moment in this phase, A_s and A_u are respectively
$$A_s = P_0, \qquad A_u = P_0 + \frac{t_d}{u}\,\Delta P$$
c. In the third phase, the event occurrence time lies within the steady-state mean window: A_u no longer changes while A_s changes continuously, with (t_d − u) ∈ (1, s − 1). At every moment in this phase, A_s and A_u are respectively
$$A_s = P_0 + \frac{t_d - u}{s}\,\Delta P, \qquad A_u = P_0 + \Delta P = P_1$$
d. In the fourth phase, both windows have slid past the event, and A_s and A_u no longer change.
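The four phases can be checked numerically with the Python sketch below. It assumes that W_s immediately precedes W_u, that the step of size ΔP first appears at sample index e, and that t_d counts the post-event samples already inside W_u; under those assumptions the window means follow A_u = P_0 + (t_d/u)ΔP in phase b and A_s = P_0 + ((t_d − u)/s)ΔP in phase c. The names `window_means` and `expected_means` are hypothetical.

```python
def window_means(p, t, s, u):
    """Means of W_s (older, length s) and W_u (newest, length u) ending at index t."""
    a_s = sum(p[t - u - s + 1:t - u + 1]) / s
    a_u = sum(p[t - u + 1:t + 1]) / u
    return a_s, a_u


def expected_means(t, e, s, u, p0, dp):
    """Closed-form (A_s, A_u) for a step of size dp whose first new sample is index e."""
    t_d = t - e + 1  # post-event samples seen so far (assumed convention)
    if t_d <= 0:                 # phase a: event not reached yet
        return p0, p0
    if t_d <= u:                 # phase b: event inside W_u
        return p0, p0 + t_d / u * dp
    if t_d - u <= s:             # phase c: event inside W_s
        return p0 + (t_d - u) / s * dp, p0 + dp
    return p0 + dp, p0 + dp      # phase d: both windows past the event
```

Comparing the two functions over every t of a synthetic 100 W to 300 W step confirms that the closed forms match the direct window averages in all four phases.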
The above calculation and analysis of the threshold K assume instant-on devices, but many residential electrical devices, such as microwave ovens and printers, do not switch on instantaneously. To reduce the event recognition error rate, a compromise is adopted in which the maximum and minimum values of the threshold are used to determine event occurrence, namely:
[formula image]
From the above derivation, it is only necessary to determine A_s and A_u; K, and hence the minimum power of the devices that can be identified, can then be determined. The threshold K for deciding that an event has occurred is:
$$K = \frac{K_{max} + K_{min}}{2} \quad (12)$$
430. A category feature matrix is established on the time series: the feature values of the data are averaged over the training samples, and the standard deviation about this mean is taken as the fluctuation level. A category feature matrix decision tree is then introduced and a time series feature probability model is established, which yields the optimal solution for the current multivariate time series features and finally achieves their automatic identification and decomposition.
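The text does not give the form of the time series feature probability model. As one possible reading, the Python sketch below stores each class's per-feature mean and standard deviation (the fluctuation level) as a row of a category feature matrix and scores a new feature vector with an independent Gaussian log-likelihood. The Gaussian assumption, the function names and the toy training values in the usage note are all illustrative, and the decision-tree part is not shown.

```python
import math
import statistics


def build_feature_matrix(training):
    """training: {class: [feature_vector, ...]} -> {class: [(mean, std), ...]}.

    Each row holds, per feature, the training mean and its standard deviation
    (used here as the fluctuation level). Needs at least two vectors per class.
    """
    matrix = {}
    for cls, vectors in training.items():
        columns = zip(*vectors)  # transpose: one tuple per feature
        matrix[cls] = [(statistics.fmean(col), statistics.stdev(col)) for col in columns]
    return matrix


def log_likelihood(x, row, min_std=1e-6):
    """Sum of independent Gaussian log-densities of feature vector x under one class row."""
    total = 0.0
    for value, (mu, sigma) in zip(x, row):
        sigma = max(sigma, min_std)  # guard against zero spread
        total += -math.log(sigma * math.sqrt(2 * math.pi)) - (value - mu) ** 2 / (2 * sigma ** 2)
    return total


def classify(x, matrix):
    """Return the class whose row gives x the highest log-likelihood."""
    return max(matrix, key=lambda cls: log_likelihood(x, matrix[cls]))
```

With made-up training vectors of (active power, power factor) for a printer and a laptop, a new vector near the printer's means is assigned to the printer row, and one near the laptop's means to the laptop row.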
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A time series feature recognition and decomposition method based on a feature matrix decision tree, used for non-intrusive load recognition in the power industry, characterized by comprising the following steps:
100. data preprocessing: carrying out data cleaning, data integration and data reduction on the sample data;
200. determining a sample period: screening the feature values and grouping the screened values at intervals of a preset number according to a preset period, wherein the grouping method is to perform a Fourier transform on the time series feature quantity to obtain an intensity spectrum, find the dominant (largest-amplitude) frequency component and take its reciprocal as the period;
300. selecting and extracting features: evaluating feature subsets with a combination of the sequential forward feature selection algorithm and the K-means clustering algorithm, determining the optimal feature subset to complete feature selection, and then extracting highly discriminative features from the selected sample features, the specific process comprising the following steps:
310. determining the optimal feature subset with the sequential forward feature selection algorithm, wherein the k selected features form a feature set X_k of size k, and the d − k unselected features x_j (j = 1, 2, 3, …, d − k) are ranked by the value of the criterion J after being combined with the features already selected; the sequential forward feature selection algorithm starts from an empty feature set and, in each subsequent cycle, selects the best feature in the original feature set and adds it to the optimal feature subset until the number of features reaches m;
320. evaluating, with the K-means clustering algorithm, how well the features separate samples of different classes: given a sample set, the K-means algorithm partitions it into K clusters, each cluster center being the mean of the samples in that cluster; every other object is assigned to the cluster with the nearest center, the center of each new cluster is recomputed, and this iterative relocation is repeated so that the sum of squared distances between the samples in each cluster and its center decreases until the objective function is minimized, thereby selecting the optimal features;
330. extracting features based on a time series feature selection algorithm: calculating the feature values of the sample data, eliminating invalid periods in the sample data, selecting 15 valid periods as the sample data, calculating the feature values of these 15 periods, and extracting the features with the highest discriminative power through feature value classification;
400. establishing a multivariate time series feature recognition and decomposition model, comprising the following substeps:
410. based on the C4.5 decision tree classification algorithm, treating each feature as one class, i.e., a leaf node of the decision tree; comparing attribute values, i.e., sample feature parameters, at the internal nodes of the decision tree in a top-down recursive manner, and classifying each sample along the downward branch that matches its attribute value until every leaf contains a single class, i.e., the leaves are pure; and performing identification and decomposition according to the obtained optimal feature parameters to determine the class to which given feature parameters belong;
420. introducing an improved sliding-window bilateral CUSUM event detection algorithm to detect load switching events. Specifically, let the active power time series be [formula not reproduced]. Two consecutive sliding windows are defined on the series, a steady-state mean window Ws of length s and a transient mean window Wu of length u, whose mean values are A_s and A_u respectively. The statistics [formula not reproduced] and [formula not reproduced] are defined to detect whether a load is switched on or off at the current moment, and a fluctuation level ε is defined to characterize the series in steady state. When the value of the detection window A_u exceeds A_s + ε, [formula not reproduced] begins to increase, and a threshold K is set for deciding that an event has occurred: when [formula not reproduced], an event may be occurring at this moment. A time delay factor d with initial value 0 is introduced; d is incremented by l, and [formula not reproduced] is compared with [formula not reproduced]. If [formula not reproduced], the change in active power at that moment is attributed to fluctuation, and [formula not reproduced] and d = 0 are set. When [formula not reproduced], let d = d + l and compute [formula not reproduced], until [formula not reproduced].
The occurrence time of the detected event can then be inferred as t - d. The time series is segmented, and the event detection procedure continuously tracks the change of the feature parameters at each sampling point; detecting whether any feature parameter changes over the whole time series realizes the identification of the features in the series. The feature value group at the current moment is then located in the time series and feature decomposition is carried out, so as to determine which state of which data the current data point belongs to;
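The detection statistics of step 420 survive only as image references, so the following Python sketch is an assumption-laden illustration rather than the patent's exact formulas. It applies the textbook two-sided CUSUM recursion to the difference between the transient-window mean A_u and the preceding steady-state-window mean A_s, with ε as the drift allowance and K as the decision threshold. The delay factor d is modelled here as the number of samples since the statistic last left zero, so t - d estimates the event onset; the function name `detect_events` and the skip-ahead of s + u samples after each detection are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def detect_events(p: Sequence[float], s: int, u: int, eps: float, K: float) -> List[int]:
    """Return estimated event indices (t - d) in the active power series p."""
    events: List[int] = []
    g_up = g_dn = 0.0  # switch-on / switch-off cumulative sums
    d = 0              # samples since the statistic last left zero
    rising = False
    t = s + u - 1
    while t < len(p):
        a_s = fmean(p[t - u - s + 1:t - u + 1])  # steady-state window Ws
        a_u = fmean(p[t - u + 1:t + 1])          # transient window Wu (ends at t)
        g_up = max(0.0, g_up + a_u - a_s - eps)
        g_dn = max(0.0, g_dn + a_s - a_u - eps)
        if g_up == 0.0 and g_dn == 0.0:
            rising, d = False, 0                 # change treated as fluctuation
        elif rising:
            d += 1
        else:
            rising, d = True, 0                  # statistic just left zero
        if max(g_up, g_dn) > K:
            events.append(t - d)                 # onset estimate
            g_up = g_dn = 0.0
            rising, d = False, 0
            t += s + u                           # let both windows refill after the step
            continue
        t += 1
    return events
```

For a series that steps from 100 W to 600 W at index 20, `detect_events(series, s=5, u=3, eps=10.0, K=500.0)` reports the onset at index 20 even though the threshold is crossed two samples later, which is the role the t - d correction plays in the claim.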
430. establishing a category feature matrix based on the time series: the feature values of the data are averaged over the training samples, and the standard deviation about this mean is taken as the fluctuation level; a category-feature-matrix decision tree is introduced and a time-series feature probability model is established, from which the optimal solution for the current multivariate time-series features is obtained, finally realizing automatic identification and decomposition of the features.
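The category feature matrix of step 430 (per-class mean and standard deviation of each feature over the training samples) can be sketched in Python as below. The claim does not give the form of its probability model, so this sketch assumes independent Gaussian features and picks the class with the highest log-likelihood; `build_class_matrix`, `classify` and `min_std` are hypothetical names.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

ClassMatrix = Dict[str, List[Tuple[float, float]]]


def build_class_matrix(samples: Dict[str, List[Sequence[float]]]) -> ClassMatrix:
    """For each class, store (mean, std) of every feature; the std plays
    the role of the fluctuation level."""
    matrix: ClassMatrix = {}
    for label, rows in samples.items():
        columns = list(zip(*rows))
        matrix[label] = [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]
    return matrix


def classify(x: Sequence[float], matrix: ClassMatrix, min_std: float = 1e-6) -> str:
    """Assumed model: independent Gaussian per feature; return the most likely class."""
    def log_likelihood(stats: List[Tuple[float, float]]) -> float:
        total = 0.0
        for xi, (mu, sd) in zip(x, stats):
            sd = max(sd, min_std)  # guard against constant features
            total += -math.log(sd * math.sqrt(2 * math.pi)) - (xi - mu) ** 2 / (2 * sd ** 2)
        return total
    return max(matrix, key=lambda label: log_likelihood(matrix[label]))
```

The Gaussian assumption is only one way to turn the mean and fluctuation level into a probability; a rule that checks whether each feature lies within the mean ± a multiple of its fluctuation level would use the same matrix.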
2. The method of claim 1, wherein the data cleansing method in the data preprocessing described in step 100 is the Grubbs method, which specifically comprises: identifying a suspicious value in the sample data; calculating its deviation to obtain the statistic Gi; looking up the critical value GP(n) in the Grubbs table; and comparing Gi with GP(n), the data point being judged an outlier if Gi is greater than the critical value GP(n).
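The Grubbs procedure of claim 2 can be sketched in Python as follows, using the common definition Gi = |xi - mean| / s with s the sample standard deviation. The critical value GP(n) is passed in by the caller because it must be read from a Grubbs table for the sample size n and the chosen significance level; the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def grubbs_statistic(data: Sequence[float]) -> Tuple[int, float]:
    """Return the index of the most suspicious value and its Gi."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
    return idx, abs(data[idx] - mean) / sd


def is_outlier(data: Sequence[float], critical_value: float) -> Tuple[int, bool]:
    """Compare Gi with the tabulated GP(n) supplied by the caller."""
    idx, g = grubbs_statistic(data)
    return idx, g > critical_value
```

In practice the test is applied one suspicious value at a time: if the value is rejected, it is removed and the test is repeated on the remaining n - 1 points with the critical value for n - 1.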
3. The method of claim 1, wherein the method of data integration in the data preprocessing described in step 100 is a correlation coefficient method.
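A common use of the correlation coefficient during data integration is to flag redundant attributes. The Python sketch below computes Pearson's r for every pair of attributes and reports pairs whose |r| exceeds a threshold; the threshold of 0.95 and the function names are hypothetical, since claim 3 only names the method.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant attribute has no defined correlation; treat as unrelated
    return cov / (sx * sy)


def redundant_pairs(
    columns: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Attribute pairs whose absolute correlation reaches the threshold."""
    out = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            out.append((a, b, r))
    return out
```

For example, the same active power recorded once in W and once in kW yields r = 1, so one of the two columns can be dropped when the sources are merged.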
4. The method of claim 1, wherein the data reduction method in the data preprocessing described in step 100 is regression analysis.
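Claim 4 only names regression analysis; one common reading is parametric numerosity reduction, in which each segment of raw samples is replaced by the coefficients of a fitted model. The Python sketch below assumes a least-squares straight line per segment, so each segment is stored as an (intercept, slope) pair; the segment length and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y ≈ a + b*t for t = 0..n-1; returns (a, b)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx if sxx else 0.0  # a single-sample segment has no slope
    return y_mean - b * t_mean, b


def reduce_by_regression(y: Sequence[float], segment: int) -> List[Tuple[float, float]]:
    """Replace each block of `segment` samples by its fitted (a, b)."""
    return [fit_line(y[i:i + segment]) for i in range(0, len(y), segment)]
```

Storing two coefficients per segment instead of `segment` raw samples reduces the data volume, at the cost of discarding the within-segment residuals.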
CN201811170289.6A 2018-10-09 2018-10-09 Time series feature identification and decomposition method based on feature matrix decision tree Active CN109408498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811170289.6A CN109408498B (en) 2018-10-09 2018-10-09 Time series feature identification and decomposition method based on feature matrix decision tree

Publications (2)

Publication Number Publication Date
CN109408498A CN109408498A (en) 2019-03-01
CN109408498B true CN109408498B (en) 2022-12-13

Family

ID=65466850

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant