CN108536862A - A time series similarity measure based on dynamic time warping - Google Patents
- Publication number
- CN108536862A (application number CN201810355812.6A)
- Authority
- CN
- China
- Prior art keywords
- time series
- dist
- time
- length
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Measurement Of Unknown Time Intervals (AREA)
Abstract
The present invention discloses a time series similarity measure that combines the dynamic time warping (DTW) algorithm with the derivative dynamic time warping (DDTW) algorithm. This improves the accuracy of time series similarity measurement and provides a solid foundation for subsequent research on time series.
Description
Technical field
The present invention relates to the field of data analysis, and in particular to a method for measuring the similarity between time series.
Background
Nowadays, with the continuous development of Internet technology, electronic devices and software technology, every industry generates huge amounts of data at every moment. Data volumes are growing exponentially, and the data are characterized by large scale, great variety, fast updates and high intrinsic value. A time series is a sequence of numerical values or symbols that is associated with time and ordered in time. Time series are very common in real life, especially in fields such as economics, meteorology and biomedicine, and some non-time-series data can also be converted into time series data for analysis. How to mine hidden, useful information from massive time series data is therefore one of the topics on which the data mining field currently needs to focus.
Time series data mining is a core subfield of data mining with a very wide range of applications. As important foundational research within time series data mining, time series similarity measurement is a prerequisite for other data mining tasks such as classification, clustering, anomaly detection and pattern recognition. To a certain extent, therefore, the quality of the similarity measure determines the efficiency of time series data mining algorithms. There are many similarity measures for time series; common ones include Euclidean distance (ED) and dynamic time warping (DTW). However, many of these measures compute only the distance between two time series and do not consider the shape features of the series. A better method is therefore needed, one that takes the shape features of the time series into account while computing the distance between them.
Summary of the invention
To better compute the similarity between time series, the present invention provides a method for computing time series similarity that considers not only the distance between time series but also their shape features. The specific technical solution is as follows:
(1) Set the lengths of the m time series to be measured uniformly to n, where n is not less than the length of the longest of the m time series;
(2) Form a matrix $T_{m \times n}$ from the m time series of length n;
(3) Reduce the dimensionality of $T_{m \times n}$ with the PCA dimension-reduction algorithm to obtain a new matrix $T_{m \times l}$, where l is the length of each time series after dimension reduction;
(4) Compute the DTW distance $Dist_1$ between two time series (A and B) in $T_{m \times l}$;
(5) Compute the derivative of each time series in $T_{m \times l}$ to form derivative time series, then compute the DTW distance $Dist_2$ between the derivative time series of the two time series A and B from step 4, i.e. the DDTW distance of the time series;
(6) The resulting time series similarity is $Dist = \alpha \cdot Dist_1 + (1-\alpha) \cdot Dist_2$, where $\alpha \in (0, 1)$;
(7) Perform clustering according to the similarity Dist and compute the homologous correlation coefficient between the clustering result and Dist; try different values of α and find the value α' that maximizes the homologous correlation coefficient;
(8) Using the α' obtained in step 7, obtain the final similarity of time series A and B: $Dist = \alpha' \cdot Dist_1 + (1-\alpha') \cdot Dist_2$.
Further, in step 1, n is the length of the longest of the m time series to be measured.
Further, in step 1, any time series shorter than n is padded with zeros at its end so that its length is n.
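The length unification and matrix formation in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the patent's own code: the function name `pad_to_common_length`, the `fill` parameter and the toy data are assumptions introduced here.

```python
def pad_to_common_length(series_list, n=None, fill=0.0):
    """Right-pad every series with `fill` so that all of them have length n."""
    longest = max(len(s) for s in series_list)
    if n is None:
        n = longest  # default: n equals the length of the longest series
    if n < longest:
        raise ValueError("n must not be smaller than the longest series")
    return [list(s) + [fill] * (n - len(s)) for s in series_list]


# Hypothetical toy data: m = 3 series of unequal length.
raw = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
T = pad_to_common_length(raw)  # the m x n matrix, stored as a list of rows
```

Padding at the end keeps the original samples aligned at the start of each row, which is what the text describes; any other fill value or alignment would be a different design choice.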
In computing time series similarity, the similarity measure of the present invention not only computes the distance between time series but also takes their shape features into account, which makes the similarity measurement more accurate.
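The combination of a distance term and a shape term (steps 4 to 6) can be sketched as below. Several details are assumptions, since the text does not fix them: the local cost is the absolute difference, and the derivative is estimated with the formula from Keogh and Pazzani's DDTW, with the end points copied from their neighbours. All function names are illustrative.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW; local cost is |x - y| (an assumption)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of the three admissible warping moves.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def derivative(s):
    """Derivative estimate used in DDTW (assumed; the text does not specify one)."""
    if len(s) < 3:
        raise ValueError("need at least three points")
    inner = [((s[i] - s[i - 1]) + (s[i + 1] - s[i - 1]) / 2) / 2
             for i in range(1, len(s) - 1)]
    return [inner[0]] + inner + [inner[-1]]  # copy estimates to the end points


def combined_distance(a, b, alpha):
    """Dist = alpha * Dist1 + (1 - alpha) * Dist2 with alpha in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    dist1 = dtw_distance(a, b)                          # distance term (DTW)
    dist2 = dtw_distance(derivative(a), derivative(b))  # shape term (DDTW)
    return alpha * dist1 + (1 - alpha) * dist2


# Two hypothetical series with the same shape but a constant offset.
A = [0.0, 1.0, 2.0, 3.0]
B = [1.0, 2.0, 3.0, 4.0]
d = combined_distance(A, B, alpha=0.5)
```

In this toy case the derivative series are identical, so the shape term is zero and only the DTW term contributes to the result.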
Brief description of the drawings
Fig. 1 shows the homologous correlation coefficients computed by the different methods.
Detailed description
The time series similarity measure of the present invention is further described below with reference to a specific embodiment.
The present invention provides a method for computing time series similarity that considers not only the distance between time series but also their shape features. Taking time series of mobile phone communication records as an example, the invention is described in detail as follows:
1. Set the lengths of the 2076 time series to be measured uniformly to 4032, where 4032 is the length of the longest of the 2076 time series; any time series shorter than 4032 is padded with zeros at its end so that its length is 4032. That is, m = 2076 and n = 4032;
2. Form a matrix $T_{m \times n}$ from the 2076 time series of length 4032;
3. Reduce the dimensionality of $T_{m \times n}$ with the PCA dimension-reduction algorithm to obtain a new matrix $T_{m \times l}$, where l is the length of each time series after dimension reduction; here l = 8;
4. Compute the DTW distance $Dist_1$ between every pair of time series in $T_{m \times l}$;
5. Compute the derivative of each time series in $T_{m \times l}$ to form derivative time series, then compute the DTW distance $Dist_2$ between every pair of derivative time series, i.e. the DDTW distance of the time series;
6. The resulting time series similarity is $Dist = \alpha \cdot Dist_1 + (1-\alpha) \cdot Dist_2$, where $\alpha \in (0, 1)$;
7. Perform clustering according to the similarity Dist and compute the homologous correlation coefficient between the clustering result and Dist; try different values of α and find the value α' that maximizes the homologous correlation coefficient;
8. Using the α' obtained in step 7, obtain the final time series similarity $Dist = \alpha' \cdot Dist_1 + (1-\alpha') \cdot Dist_2$.
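The PCA reduction in step 3 treats each row of the matrix as one sample and projects it onto the l leading principal components. The sketch below illustrates this with power iteration in plain Python. It is a toy-scale illustration under stated assumptions: `pca_reduce`, `iters` and `seed` are names introduced here, and for matrices of the size in the embodiment a numerical library would be the practical choice.

```python
import math
import random


def pca_reduce(rows, l, iters=200, seed=0):
    """Project m rows of length n onto their top-l principal components."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    X = [[r[j] - means[j] for j in range(n)] for r in rows]  # centred data
    rng = random.Random(seed)
    components = []
    for _comp in range(l):
        v = [rng.gauss(0.0, 1.0) for _k in range(n)]
        for _it in range(iters):
            # Remove the directions already found so the next one is orthogonal.
            for c in components:
                dot = sum(vi * ci for vi, ci in zip(v, c))
                v = [vi - dot * ci for vi, ci in zip(v, c)]
            # Apply X^T X to v without forming the n x n covariance matrix.
            scores = [sum(xi * vi for xi, vi in zip(x, v)) for x in X]
            v = [sum(scores[i] * X[i][j] for i in range(m)) for j in range(n)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [vi / norm for vi in v]
        components.append(v)
    # Each output row holds the l component scores of one input series.
    return [[sum(xi * ci for xi, ci in zip(x, c)) for c in components] for x in X]


# Hypothetical toy matrix: four series of length 2 that lie on a single line.
T_mn = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
T_ml = pca_reduce(T_mn, l=1)
```

The sign of each component is arbitrary, so the reduced series may come out mirrored from one run or library to another. Distances between rows are unaffected when all rows share the same components.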
As the figure shows, the homologous correlation coefficients obtained with DTW and with DDTW are higher than that obtained with a traditional similarity measure (e.g., Euclidean distance). Moreover, the homologous correlation coefficient obtained with the method of the present invention reaches its best value at a certain α. It follows that the method of the present invention reflects the similarity between two time series more accurately.
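The search for α' in steps 7 and 8 can be sketched as a grid search. The sketch rests on two assumptions that the text does not confirm: the "homologous correlation coefficient" is read as the cophenetic correlation coefficient of a hierarchical clustering, and the clustering uses average linkage. The names `cophenetic_correlation`, `blend` and `select_alpha` and the toy matrices are hypothetical.

```python
import math


def cophenetic_correlation(D):
    """Pearson correlation between D and the cophenetic distances of an
    average-linkage clustering built from D (assumed interpretation)."""
    m = len(D)
    if m < 2:
        raise ValueError("need at least two series")
    clusters = {i: [i] for i in range(m)}  # cluster id -> member indices
    coph = [[0.0] * m for _ in range(m)]   # merge height for each pair

    def linkage(a, b):
        # Average distance between all members of clusters a and b.
        total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    next_id = m
    while len(clusters) > 1:
        ids = sorted(clusters)
        height, a, b = min((linkage(p, q), p, q)
                           for k, p in enumerate(ids) for q in ids[k + 1:])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1

    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    x = [D[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    scale = math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))
    return cov / scale if scale else 0.0


def blend(D1, D2, alpha):
    """Element-wise alpha * Dist1 + (1 - alpha) * Dist2 for two distance matrices."""
    return [[alpha * p + (1 - alpha) * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(D1, D2)]


def select_alpha(D1, D2, alphas):
    """Return (alpha', score) maximising the coefficient over the given grid."""
    return max(((a, cophenetic_correlation(blend(D1, D2, a))) for a in alphas),
               key=lambda pair: pair[1])


# Hypothetical 3 x 3 DTW and DDTW distance matrices.
D1 = [[0.0, 1.0, 3.0], [1.0, 0.0, 3.0], [3.0, 3.0, 0.0]]
D2 = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
alphas = [0.25, 0.5, 0.75]
best_alpha, best_score = select_alpha(D1, D2, alphas)
```

A finer grid over (0, 1) gives a closer estimate of α' at the cost of one clustering run per candidate value.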
Claims (3)
1. A time series similarity measure, characterized by comprising the following steps:
(1) Set the lengths of the m time series to be measured uniformly to n, where n is not less than the length of the longest of the m time series;
(2) Form a matrix $T_{m \times n}$ from the m time series of length n;
(3) Reduce the dimensionality of $T_{m \times n}$ with the PCA dimension-reduction algorithm to obtain a new matrix $T_{m \times l}$, where l is the length of each time series after dimension reduction;
(4) Compute the DTW distance $Dist_1$ between two time series (A and B) in $T_{m \times l}$;
(5) Compute the derivative of each time series in $T_{m \times l}$ to form derivative time series, then compute the DTW distance $Dist_2$ between the derivative time series of the two time series A and B from step 4, i.e. the DDTW distance of the time series;
(6) The resulting time series similarity is $Dist = \alpha \cdot Dist_1 + (1-\alpha) \cdot Dist_2$, where $\alpha \in (0, 1)$;
(7) Perform clustering according to the similarity Dist and compute the homologous correlation coefficient between the clustering result and Dist; try different values of α and find the value α' that maximizes the homologous correlation coefficient;
(8) Using the α' obtained in step 7, obtain the final similarity of time series A and B: $Dist = \alpha' \cdot Dist_1 + (1-\alpha') \cdot Dist_2$.
2. The method according to claim 1, characterized in that, in step 1, n is the length of the longest of the m time series to be measured.
3. The method according to claim 1, characterized in that, in step 1, any time series shorter than n is padded with zeros at its end so that its length is n.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810355812.6A CN108536862A (en) | 2018-04-19 | 2018-04-19 | A kind of Time Series Similarity measure based on dynamic time warping |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810355812.6A CN108536862A (en) | 2018-04-19 | 2018-04-19 | A kind of Time Series Similarity measure based on dynamic time warping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108536862A (en) | 2018-09-14 |
Family
ID=63478644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810355812.6A Pending CN108536862A (en) | 2018-04-19 | 2018-04-19 | A kind of Time Series Similarity measure based on dynamic time warping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108536862A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109596929A (en) * | 2019-01-31 | 2019-04-09 | 国家电网有限公司 | A kind of voltage curve similitude judgment method considering the asynchronous influence of clock |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180914 |