CN111612082B

CN111612082B - Method and device for detecting abnormal subsequence in time sequence

Info

Publication number: CN111612082B
Application number: CN202010456099.1A
Authority: CN
Inventors: 翟波; 张亚; 曾海芳; 覃桢
Original assignee: Hebei Xiaopenguin Medical Technology Co ltd
Current assignee: Hebei Xiaopenguin Medical Technology Co ltd
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2023-06-23
Anticipated expiration: 2040-05-26
Also published as: CN111612082A

Abstract

An embodiment of the invention provides a method and device for detecting an abnormal subsequence in a time sequence. The method comprises: forming a tuple from a single numerical value and a single moment, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any moment; constructing a plurality of split points that divide the numerical space of the time sequence into a plurality of numerical intervals, obtaining the probability density of the time sequence, obtaining from the probability density the probability that any time point in the time sequence falls into any numerical interval, constructing an interval table from the probability and the plurality of numerical intervals, and constructing an extended interval table from the interval table; and obtaining the weight of each time point of each subsequence of the time sequence in the extended interval table, taking the average of all the weights as the score of each subsequence, and determining that the smaller the score, the more likely the subsequence is abnormal. The invention ensures the detection precision and reliability of abnormal subsequences.

Description

Method and device for detecting abnormal subsequence in time sequence

Technical Field

The embodiment of the invention relates to the technical field of data mining, in particular to a method and equipment for detecting abnormal subsequences in a time sequence.

Background

In real life, many fields generate large amounts of time-series data, such as patients' electrocardiographic and electroencephalographic data, industrial sensor data, and network traffic data. Time-series data is data ordered by the time at which it was generated. It therefore records how a given behavior fluctuates along the time dimension, and an abnormal subsequence contained in it carries more important information than most normal subsequences. For example, abnormal electrocardiographic data means that a patient may suffer from a certain type of heart disease, and abnormal electroencephalogram data may be caused by brain diseases such as epilepsy. Detection of abnormal subsequences (patterns) in time series is an important field: most of the data in a time series containing abnormal patterns is normal and the abnormal patterns occur very rarely, yet these rare patterns carry very important information. Unsupervised time-series anomaly detection algorithms do not require previously known data and belong to the lazy-learning family of machine learning algorithms. Unsupervised abnormal-subsequence detection algorithms judge abnormality by comparing pairs of subsequences within a time series. However, time-series data is dynamic and often high-dimensional, so these pairwise comparison methods often incur a large time overhead, and they often lose the time-dimension information of the data when converting the time-series representation, which affects the detection accuracy of the algorithm. Research on detecting abnormal subsequences in time-series data is therefore of great practical significance.
Therefore, developing a method for detecting abnormal subsequences in a time sequence, which can effectively overcome the above-mentioned drawbacks of the related art, is a technical problem to be solved in the industry.

Disclosure of Invention

Aiming at the problems existing in the prior art, the embodiment of the invention provides a method and equipment for detecting an abnormal subsequence in a time sequence.

In a first aspect, an embodiment of the present invention provides a method for detecting an abnormal subsequence in a time sequence, including: forming a tuple from a single numerical value and a single moment, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any moment; constructing a plurality of split points that divide the numerical space of the time sequence into a plurality of numerical intervals, obtaining the probability density of the time sequence, obtaining from the probability density the probability that any time point in the time sequence falls into any numerical interval, constructing an interval table from the probability and the plurality of numerical intervals, and constructing an extended interval table from the interval table; obtaining the weight of each time point of each subsequence of the time sequence in the extended interval table, taking the average of all the weights as the score of each subsequence, and determining that the smaller the score, the more likely the subsequence is abnormal; wherein the numerical space is made up of all the values in the plurality of tuples, and the probability that any numerical point falls within any numerical interval is the same.

Based on the foregoing method embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention uses a single value and a single time point to form a tuple, and forms a plurality of tuples into a time sequence, including:

$$P = \{(t_1, p_1), (t_2, p_2), (t_3, p_3), \ldots, (t_n, p_n)\}$$

where n is the length of the time sequence and is any integer; $(t_n, p_n)$ is the tuple; P is the time sequence; $t_n$ is the single time point; $p_n$ is the single value.

Based on the foregoing method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, defining the similarity of different time sequences at any time point includes: if the values of the first time sequence and the second time sequence at any time point from $t_1$ to $t_n$ lie in the same numerical interval, judging that the first time sequence and the second time sequence are similar at that time point.

Based on the content of the embodiment of the method, the method for detecting the abnormal subsequence in the time sequence provided in the embodiment of the invention, wherein the probability density is as follows:

the probability is:

where x is any time point; S is the number of numerical intervals; $\beta_i$ is the i-th split point; $i = 0, \ldots, S-1$.

Based on the foregoing method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, constructing a plurality of split points includes:

where p' is the derivative of p with respect to time; G is the constructed function: if G is zero, $\beta_i$ is determined to be a split point; if G is not zero, $\beta_i$ is determined not to be a split point.

Based on the foregoing method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, an interval table is constructed according to the probability and a plurality of numerical intervals, and correspondingly, elements of the interval table include:

wherein j is the j-th numerical interval; ITable is an element of the interval table.

Based on the foregoing content of the method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, taking the average of all the weights as the score of each subsequence includes:

where $t_i$ is the i-th moment; score($t_i$) is the weight of time point $t_i$ in the extended interval table; score(P) is the score of the subsequence; w is a weight; $r_{j+1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent upper interval; $r_{j-1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent lower interval.

In a second aspect, an embodiment of the present invention provides an apparatus for detecting an abnormal subsequence in a time sequence, including:

the sequence construction module is used for forming a tuple by adopting a single numerical value and a single time point, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any time point;

the extended interval table construction module is used for constructing a plurality of split points that divide the numerical space of the time sequence into a plurality of numerical intervals, obtaining the probability density of the time sequence, obtaining from the probability density the probability that any time point in the time sequence falls into any numerical interval, constructing an interval table from the probability and the plurality of numerical intervals, and constructing an extended interval table from the interval table;

the anomaly determination module is used for obtaining the weight of each time point of each subsequence of the time sequence in the extended interval table, taking the average of all the weights as the score of each subsequence, and determining that the smaller the score, the more likely the subsequence is abnormal;

wherein the numerical space is made up of all the values in the plurality of tuples, and the probability that any numerical point falls within any numerical interval is the same.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

at least one processor; and

at least one memory communicatively coupled to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute the method for detecting abnormal subsequences in a time sequence provided by any of the various possible implementations of the first aspect.

In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform a method of detecting an abnormal sub-sequence in a time sequence provided by any one of the various possible implementations of the first aspect.

According to the method and device for detecting an abnormal subsequence in a time sequence provided by the embodiments of the invention, the time sequence and its similarity are redefined, the numerical space is divided into a plurality of numerical intervals, the probability density of the time sequence and the corresponding probability of falling into each interval are obtained, an extended interval table is constructed on this basis, and the subsequences of the time sequence are scored according to the weights in the extended interval table. This improves the detection efficiency of the algorithm while keeping the time-dimension information of the time sequence intact, and ensures the detection precision and reliability of abnormal subsequences.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without any inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for detecting an abnormal subsequence in a time sequence according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the position of each time point of electrocardiographic data in the numerical space according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of similarity of numerical points at the same time points in different time sequences according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an apparatus for detecting abnormal subsequences in a time sequence according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. In addition, the technical features of the various embodiments or the single embodiments provided in the present invention may be combined with each other arbitrarily to form a feasible technical solution, but it is necessary to base that a person skilled in the art can implement the solution, and when the combination of the technical solutions contradicts or cannot implement the solution, it should be considered that the combination of the technical solutions does not exist and is not within the scope of protection claimed in the present invention.

The embodiment of the invention provides a method for detecting abnormal subsequences in a time sequence, referring to fig. 1, the method comprises the following steps:

101. adopting a single numerical value and a single moment to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any moment;

102. constructing a plurality of splitting points, dividing a numerical space in a time sequence into a plurality of numerical intervals, acquiring probability density of the time sequence, acquiring probability that any time point in the time sequence falls into any numerical interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical intervals, and constructing an extended interval table according to the interval table;

103. acquiring the weight of each time point of each subsequence of the time sequence in the extended interval table, taking the average of all the weights as the score of each subsequence, and determining that the smaller the score, the more likely the subsequence is abnormal.

Based on the foregoing disclosure of the foregoing method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, where a single value and a single point of time are adopted to form a tuple, and a plurality of tuples are formed into a time sequence, includes:

$$P = \{(t_1, p_1), (t_2, p_2), (t_3, p_3), \ldots, (t_n, p_n)\} \tag{1}$$

Specifically, assume a time series as in formula (1). If each tuple $(t_i, p_i)$ is regarded as a coordinate in two-dimensional space, it locates a point in that space. The tuple $(t_i, p_i)$ can thus be understood as using $p_i$ to represent the position of time point $t_i$ in the numerical space. Taking one electrocardiographic record from the ECG200 data set as an example, FIG. 2 illustrates how a time series looks under this representation. In FIG. 2, the subscript of $p_i$ in the time series P denotes $t_i$ (from $t_0$ to $t_{95}$), and the position of each time point of the electrocardiographic data in the numerical space is shown.

Based on the foregoing disclosure of the foregoing method embodiment, as an optional embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, defining the similarity of different time sequences at any time point includes: if the values of the first time sequence and the second time sequence at any time point from $t_1$ to $t_n$ lie in the same numerical interval, judging that the first time sequence and the second time sequence are similar at that time point.

Specifically, for time series of equal length, the time points $t_i$ are identical, so the difference between the time series can only be expressed by the values $p_i$ at each moment, and $p_i$ represents the position of the corresponding $t_i$ in the numerical space. The similarity of time series can therefore be computed by measuring the adjacency, in the numerical space, of the corresponding time points in the series. If the positions of a time point in the numerical space are adjacent, the time series are similar at that time point; if the positions differ greatly, the time series are said to be dissimilar at that time point. That is, if the positions of time point $t_i$ of time series P and Q in the numerical space lie in the same numerical interval, then P and Q are adjacent, also called similar, at moment $t_i$. For example, in FIG. 3, when the whole numerical space is divided into five intervals by straight lines, the time series P and Q are adjacent at time point $t_{12}$ and not adjacent at time point $t_0$ (the time points in the figure run from $t_0$ to $t_{18}$).
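The similarity test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `interval_index` and `similar_at`, the example split points, and the example series are hypothetical and do not come from the patent.

```python
from bisect import bisect_right


def interval_index(value, split_points):
    """Return the index (0 .. S-1) of the numerical interval containing `value`.

    `split_points` holds the S-1 interior split points in ascending order.
    """
    return bisect_right(split_points, value)


def similar_at(p, q, i, split_points):
    """True if series p and q lie in the same numerical interval at time index i."""
    return interval_index(p[i], split_points) == interval_index(q[i], split_points)


# Hypothetical split points that give five intervals, as in the FIG. 3 discussion.
splits = [-0.84, -0.25, 0.25, 0.84]
P = [1.2, 0.1, -0.5]
Q = [-1.0, 0.2, -0.3]
flags = [similar_at(P, Q, i, splits) for i in range(len(P))]
print(flags)
```

In this toy input the two series are dissimilar at the first time point and similar at the other two.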

Based on the foregoing content of the method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, the probability density is:

the probability is:

Specifically, the numerical intervals are obtained by dividing the whole numerical space into intervals of equal probability, i.e., the probability that any data point falls within any numerical interval is the same. Dividing the numerical space into S numerical intervals requires determining S−1 split points $\beta_1 < \beta_2 < \beta_3 < \ldots < \beta_{S-1}$. The S intervals are $[\beta_0, \beta_1], [\beta_1, \beta_2], \ldots, [\beta_{S-1}, \beta_S]$, where $\beta_0 = -\infty$ and $\beta_S = +\infty$. Assuming that the time sequence follows the normal distribution $X \sim N(0, 1)$, the probability density function of the time sequence is as shown in formula (2), and the probability that any time point of the time series falls within any numerical interval is calculated as shown in formula (3).
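Formulas (2) and (3) are not reproduced in this text. Under the stated assumption $X \sim N(0, 1)$ and the equal-probability requirement, they would plausibly take the following form; this is a reconstruction from the surrounding definitions, not the patent's own typesetting.

```latex
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2} \tag{2}

P(\beta_i \le x \le \beta_{i+1}) = \int_{\beta_i}^{\beta_{i+1}} f(x)\, dx = \frac{1}{S},
\qquad i = 0, \ldots, S-1 \tag{3}
```

For example, with S = 4 each interval carries probability 1/4, which places the three split points at the quartiles of the standard normal distribution, approximately −0.6745, 0, and 0.6745.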

Based on the foregoing disclosure of the foregoing method embodiment, as an optional embodiment, a method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, the constructing a plurality of split points includes:

Specifically, the split points dividing the numerical intervals can be obtained from the SAX literature. To verify the validity of the algorithm over a larger number of numerical intervals, Newton's method is used to calculate the split points for more numerical intervals. First, the function G(x) is constructed as shown in formula (4), where $\beta_i$ is the known previous split point ($\beta_0 = -\infty$, $\beta_S = +\infty$). Then an iteration for solving $\beta_{i+1}$ can be constructed. After each iteration, formula (4) is used to check whether the stopping condition is satisfied (if G equals 0, the value is determined to be a split point; otherwise it is not), thereby determining each split point.
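A minimal sketch of computing equal-probability split points with Newton's method follows. Because formula (4) is not reproduced in this text, the sketch assumes $G(x) = \Phi(x) - \Phi(\beta_i) - 1/S$, where $\Phi$ is the standard normal cumulative distribution function, so that $G'(x)$ is the density $f(x)$. The function names, the tolerance, and the starting point x = 0 are illustrative choices, not taken from the patent.

```python
import math


def pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def split_points(S, tol=1e-10, max_iter=100):
    """Return the S-1 interior split points of S equal-probability intervals."""
    betas = []
    prev_cdf = 0.0  # CDF value at beta_0 = -infinity
    for _k in range(S - 1):
        x = 0.0  # starting guess for the next split point
        for _it in range(max_iter):
            # Assumed form of G: probability mass between the previous
            # split point and x, minus the target mass 1/S.
            g = cdf(x) - prev_cdf - 1.0 / S
            if abs(g) < tol:  # numerical stand-in for "G equals 0"
                break
            x -= g / pdf(x)  # Newton update; G'(x) is the density
        betas.append(x)
        prev_cdf = cdf(x)
    return betas


print(split_points(4))
```

An exact test for G = 0 is replaced by a small tolerance here, since floating-point iterations rarely hit zero exactly.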

Based on the foregoing disclosure of the foregoing method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention constructs an interval table according to the probability and a plurality of numerical intervals, and correspondingly, elements of the interval table include:

Specifically, each interval of the interval table (Interval Table, ITable) records the set of time points whose data points lie in that interval. Because the time points of the time sequences are consistent, the set of time points of each interval can be converted into a binary representation. Each interval table is therefore a two-dimensional matrix of size S×n, where S is the number of numerical intervals and n is the length of the time series, and each element of the interval table can only be 0 or 1. If the element is 1, the position of the time point in the numerical space lies in the corresponding interval; otherwise, it does not. The form of each element of the ITable is shown in formula (6). For subsequences of equal length, the resulting interval tables are not only similar in structure but also contain the same number of binary 1s. The positions at which the binary 1s appear represent the differences between interval tables. Combined with the fact that abnormal data is "few and different", it follows that if the data point of a subsequence at some time point is abnormal, the position of the 1 in the interval table of that subsequence will differ from the position of the 1 in most other interval tables.
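The binary S×n interval table described above can be illustrated as follows. The function name `interval_table`, the list-of-lists layout, and the example values are assumptions made for this sketch; formula (6) itself is not reproduced in this text.

```python
from bisect import bisect_right


def interval_table(values, split_points):
    """Build the S x n binary table for one subsequence.

    table[j][i] is 1 when the value at time index i lies in interval j,
    and 0 otherwise.
    """
    S = len(split_points) + 1
    n = len(values)
    table = [[0] * n for _ in range(S)]
    for i, v in enumerate(values):
        table[bisect_right(split_points, v)][i] = 1
    return table


# Hypothetical split points for S = 4 and a short example subsequence.
splits = [-0.6745, 0.0, 0.6745]
table = interval_table([-1.0, 0.3, 0.9, -0.2], splits)
for row in table:
    print(row)
```

Every column contains exactly one 1, which is why subsequences of equal length always yield tables with the same number of 1s.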

Based on the foregoing disclosure of the foregoing method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, where the averaging of all weights as a score of each subsequence includes:

where $t_i$ is the i-th moment; score($t_i$) is the weight of time point $t_i$ in the extended interval table; score(P) is the score of the subsequence; w is a weight; $r_{j+1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent upper interval; $r_{j-1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent lower interval.

Specifically, each extended interval table (Extend Interval Table, EITable) is a matrix of size S×n, where n is the length of the subsequences constituting the EITable and S is the number of numerical intervals. Each element $w_{j,i}$ of the EITable is an integer greater than or equal to 0 and represents the weight, calculated over the data set, of time point $t_i$ in numerical interval j. The structure of the EITable is shown in Table 1.

TABLE 1

After the EITable is constructed, the weight distribution of each time point of the time-series data set over the S intervals is available. The greater the weight, the more subsequences in the data set have that time point positioned in that interval of the numerical space; the smaller the weight, the fewer subsequences have that time point positioned in that interval. From the "few and different" nature of abnormal data, it can be inferred that if a subsequence is abnormal, then some or even all of its time points are distributed in the numerical space in a way that differs significantly from the corresponding time points of most other subsequences. This difference shows up in the EITable: the time points of an abnormal subsequence fall into intervals that few other subsequences occupy, so their weights are necessarily small. The abnormality of a subsequence is therefore determined by calculating its weight score in the EITable. The weight score is calculated as follows: based on the constructed EITable, the weight of each time point of each subsequence is looked up in the table, and the average of all the time-point weights of the subsequence is taken as the score of the subsequence, as shown in formula (7). The larger the score of a subsequence P calculated with formula (7), the more likely P fits the distribution of most subsequences; the smaller the score, the less likely P fits the distribution of most non-self-matching subsequences, and the more likely P is abnormal.
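The scoring step can be sketched as below. The text does not reproduce how the EITable is derived from the interval tables, so this sketch assumes the simplest reading: each weight $w_{j,i}$ counts how many subsequences in the data set fall into interval j at time index i. It also uses only the weight of the interval a point belongs to, leaving out the adjacent-interval terms. The function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def build_eitable(subsequences, split_points):
    """Assumed EITable: per-interval, per-time-index counts over the data set."""
    S = len(split_points) + 1
    n = len(subsequences[0])
    ei = [[0] * n for _ in range(S)]
    for seq in subsequences:
        for i, v in enumerate(seq):
            ei[bisect_right(split_points, v)][i] += 1
    return ei


def score(seq, ei, split_points):
    """Average weight of the subsequence's time points, in the spirit of formula (7)."""
    weights = [ei[bisect_right(split_points, v)][i] for i, v in enumerate(seq)]
    return sum(weights) / len(weights)


# Toy data set: three similar subsequences and one that deviates everywhere.
splits = [-0.5, 0.5]
data = [
    [0.0, 0.1, -0.1],
    [0.1, 0.0, 0.2],
    [0.2, -0.2, 0.0],
    [1.5, -1.5, 1.2],
]
ei = build_eitable(data, splits)
scores = [score(s, ei, splits) for s in data]
print(scores)
```

The deviating subsequence receives the smallest score, which matches the rule that a smaller score indicates a more likely anomaly.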

score($t_i$) is the weight of time point $t_i$ calculated from the EITable. It consists of three parts: the weight of the interval to which $t_i$ belongs and the weights of the two intervals adjacent to it above and below. If $t_i$ belongs to the first or the last numerical interval, the weight obtained from the EITable consists of only two parts: the weight of the interval to which $t_i$ belongs and the weight of the single adjacent interval. When calculating the contribution of an adjacent interval, the degree of compactness between the position of time point $t_i$ in the numerical space and that adjacent interval must be calculated first. If the position of $t_i$ is very close to the adjacent interval, $t_i$ can approximately be classified into that interval; if there is a slight gap from the adjacent interval, this only indicates that $t_i$ has a neighbor relationship with a small amount of data in the adjacent interval. The formula for score($t_i$) is therefore as shown in formula (8).
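The three-part weight described above can be sketched as follows. Neither formula (8) nor the definition of the compactness coefficient is reproduced in this text, so both are assumptions here: the score is taken as $w_{j,i} + r_{j+1,i} w_{j+1,i} + r_{j-1,i} w_{j-1,i}$, and the compactness coefficient is a hypothetical stand-in, $1 / (1 + d)$, where d is the distance from the value to the boundary shared with the adjacent interval. The function names and the example table are also illustrative.

```python
from bisect import bisect_right


def compactness(value, boundary):
    """Hypothetical compactness coefficient in (0, 1]; 1 means on the boundary."""
    return 1.0 / (1.0 + abs(value - boundary))


def point_score(value, i, ei, split_points):
    """Weight of one time point: own interval plus compactness-scaled neighbours."""
    j = bisect_right(split_points, value)
    total = float(ei[j][i])
    if j + 1 < len(ei):
        # Upper neighbour exists; the shared boundary is split_points[j].
        total += compactness(value, split_points[j]) * ei[j + 1][i]
    if j - 1 >= 0:
        # Lower neighbour exists; the shared boundary is split_points[j - 1].
        total += compactness(value, split_points[j - 1]) * ei[j - 1][i]
    return total


# Example table with S = 3 intervals and a single time index.
splits = [-0.5, 0.5]
ei = [[0], [3], [1]]
middle = point_score(0.25, 0, ei, splits)  # three-part case
top = point_score(1.0, 0, ei, splits)      # last interval: two-part case
print(middle, top)
```

A point in the first or last interval has only one neighbour, so only two terms contribute, as in the text.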

According to the method for detecting the abnormal subsequence in the time sequence, which is provided by the embodiment of the invention, the time sequence and the similarity thereof are redefined, the numerical value space is divided into a plurality of numerical value intervals, the probability density and the corresponding falling probability of the time sequence are further obtained, an extended interval table is constructed on the basis, the subsequence of the time sequence is scored according to the weight in the extended interval table, the algorithm detection efficiency can be improved on the premise that the time dimension information of the time sequence is complete, and the detection precision and the reliability of the abnormal subsequence are ensured.

In order to more clearly illustrate the essence of the technical scheme of the invention, an integral embodiment is proposed on the basis of the above embodiment, and the overall view of the technical scheme of the invention is presented. It should be noted that, the overall embodiment is only for further embodying the technical essence of the present invention, and not limiting the scope of the present invention, and any combined technical solution meeting the technical essence of the present invention obtained by combining technical features on the basis of each embodiment of the present invention by a person skilled in the art is within the scope of protection of the present patent as long as the practical implementation is possible.

First, the time series data set selected in this experiment is shown in the following table 2 (UCR experimental data set):

TABLE 2

Table 2 contains a total of 4 different types of time-series data sets. Their time-series lengths range from 65 to 2709, and the abnormal time series they contain account for different proportions. This diversity allows the effectiveness of the proposed algorithm for time-series anomaly detection to be verified from different aspects. To quantify how accurately the algorithm detects abnormal time sequences, the proposed algorithm is evaluated using the AUC index. AUC is the area enclosed by the ROC curve and the coordinate axes. The ROC curve is used to evaluate the performance of binary classifiers: data samples are sorted by the classifier's prediction result, different thresholds are taken in that order, samples whose prediction result is greater than the threshold are taken as positive examples, and samples whose prediction result is smaller than the threshold are taken as negative examples. Each threshold yields a pair (FPR, TPR), where FPR is the false positive rate and TPR is the true positive rate. Plotting these pairs with FPR as the abscissa and TPR as the ordinate gives the ROC curve. The true positive rate is also known as sensitivity in machine learning, and the false positive rate is also known as the false alarm probability.
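The threshold sweep and area calculation described above can be sketched in plain Python. The function name `roc_auc`, the trapezoidal area rule, and the example labels are illustrative assumptions. Because a smaller EITable score indicates a more likely anomaly, the example negates the scores so that a higher value means "more anomalous".

```python
def roc_auc(labels, scores):
    """Area under the ROC curve.

    labels: 1 for positive (anomalous), 0 for negative (normal).
    scores: higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR) pairs, starting at the origin
    k = 0
    while k < len(pairs):
        threshold = pairs[k][0]
        # Move all samples tied at this threshold to the predicted-positive side.
        while k < len(pairs) and pairs[k][0] == threshold:
            if pairs[k][1] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical subsequence scores where the last subsequence is the anomaly.
subsequence_scores = [3.0, 3.0, 3.0, 1.0]
labels = [0, 0, 0, 1]
auc = roc_auc(labels, [-s for s in subsequence_scores])
print(auc)
```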

The selected comparison algorithms are: the angle-based anomaly detection algorithm (FastVOA) proposed in 2012; the anomaly detection algorithm combining piecewise aggregate approximation with a random walk model (PAPR-RW) proposed in 2017; the kernel-density-based anomaly detection algorithm (RDOS) proposed in 2017; and the interval-set-based time-series anomaly detection algorithm (international) proposed in 2018. The parameters of the comparison algorithms are set as follows: the number of neighbors in the RDOS algorithm is set to 10, following the parameters suggested in its reference; the number of hash functions in the FastVOA algorithm is set to 100; the international algorithm uses the parameters suggested in its reference, such as a boundary width factor of 0.2; the number of subspaces in the PAPR-RW algorithm is set to values from 6 to 9 as suggested, and its other three parameters are set to 0.3, 0.4, and 0.3, respectively. The AUC scores of each algorithm on the data sets are shown in Table 3, where the two best results on each data set are shown in bold and NA indicates that the algorithm could not complete the computation on that data set in the current experimental environment.

TABLE 3

In Table 3, the EITable column gives the experimental results of the proposed algorithm, and the other columns give the results of the selected comparison algorithms. The results show that in most cases EITable achieves better detection results, with AUC improvements of varying degree over the other algorithms. For example: on the MoteStrain data set, the two proposed algorithms improve on the other algorithms by more than ten percent; on the Lighting2 data set, the improvement is ten percent compared with the RDOS algorithm, and twenty percent compared with the RDOS algorithm; on the ECG200 data set, the improvement over the other algorithms is likewise ten percent; and good results were obtained on the three datasets of the DiatomSizeReduction.

Besides verifying effectiveness under the AUC index, the difference in CPU time between the proposed algorithm and the comparison algorithms was also measured. The CPU running times of each method on the data sets of Table 2 are compared in Table 4. The results in Table 4 (running-time comparison on different time-series data sets) show that EITable requires less running time on most data sets, as it has only linear time complexity; the international algorithm partitions the time series and computes a similarity matrix, which takes little time on small data sets, so it achieves the best running time on some data sets; the RDOS algorithm needs to compute Euclidean distances between time sequences and find the k nearest neighbors, both of which take time; and PAPR-RW requires the longest running time because it must first convert the time-series representation, compute a similarity matrix, and then feed the similarity matrix into the RW model for multiple rounds of iterative optimization.

Table 4

The embodiments of the present invention are implemented by programmed processing on a device having a processor function. Therefore, in engineering practice, the technical solutions and functions of the embodiments of the present invention can be packaged into various modules. On this basis, and building on the above embodiments, an embodiment of the present invention provides an apparatus for detecting an abnormal subsequence in a time sequence, which is configured to perform the method for detecting an abnormal subsequence in a time sequence of the above method embodiment. Referring to fig. 4, the apparatus includes:

a sequence construction module 401, configured to form a tuple from a single numerical value and a single time point, form a time sequence from a plurality of tuples, and define the similarity of different time sequences at any time point;

an extended interval table construction module 402, configured to construct a plurality of split points, divide the numerical space of the time sequence into a plurality of numerical intervals, obtain the probability density of the time sequence, obtain from the probability density the probability that any time point in the time sequence falls into any numerical interval, construct an interval table according to the probability and the plurality of numerical intervals, and construct an extended interval table according to the interval table;

an anomaly determination module 403, configured to obtain the weight, in the extended interval table, of each time point of each sub-sequence of the time sequence, take the average of all the weights as the score of each sub-sequence, and determine that the smaller the score, the less likely the sub-sequence is to be abnormal.

The apparatus for detecting an abnormal subsequence in a time sequence provided by the embodiment of the present invention uses the sequence construction module, the extended interval table construction module and the anomaly determination module. It redefines the time sequence and its similarity, divides the numerical space into a plurality of numerical intervals, obtains the probability density of the time sequence and the corresponding falling probability, constructs the extended interval table on that basis, and scores the sub-sequences of the time sequence according to the weights in the extended interval table. While preserving the integrity of the time sequence information in the time dimension, it can improve the detection efficiency of the algorithm and ensure the detection precision and reliability for abnormal subsequences.
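The division of labor among the three modules can be illustrated with a simplified sketch. It is not the claimed formulas, which are not reproduced here, and it rests on several assumptions: the split points are taken as empirical quantiles so that each numerical interval holds roughly the same probability mass; the interval table stores, for each time index, the fraction of reference series whose value falls in each interval; and the weight of a time point is one minus that fraction, so that a smaller score means a sub-sequence is less likely to be abnormal. The extension to adjacent intervals (the extended interval table) is omitted, and all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def split_points(values, num_intervals):
    """Cut points that give each interval roughly equal probability mass."""
    return quantiles(values, n=num_intervals)  # returns num_intervals - 1 cuts


def interval_index(value, cuts):
    """Index of the numerical interval that contains value."""
    return bisect_right(cuts, value)


def build_interval_table(reference, cuts):
    """table[t][j]: fraction of reference series in interval j at time index t."""
    num_intervals = len(cuts) + 1
    counts = [[0] * num_intervals for _ in range(len(reference[0]))]
    for series in reference:
        for t, value in enumerate(series):
            counts[t][interval_index(value, cuts)] += 1
    return [[c / len(reference) for c in row] for row in counts]


def subsequence_score(subsequence, table, cuts):
    """Average weight over the (time index, value) tuples of a sub-sequence."""
    # Assumed weight: 1 - fraction of reference series sharing the interval.
    weights = [1.0 - table[t][interval_index(v, cuts)] for t, v in subsequence]
    return sum(weights) / len(weights)


# Hypothetical reference series aligned on the same four time indices.
reference = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1], [0.9, 1.9, 2.9, 3.9]]
cuts = split_points([v for s in reference for v in s], num_intervals=4)
table = build_interval_table(reference, cuts)
typical = [(1, 2.0), (2, 3.0)]   # follows the reference series
unusual = [(1, 4.0), (2, 1.0)]   # values fall in intervals the references never visit
print(subsequence_score(typical, table, cuts), subsequence_score(unusual, table, cuts))
```

In this sketch two series are similar at a time point exactly when interval_index returns the same interval for both values, which mirrors the similarity definition used by the sequence construction module. Building the table and scoring a sub-sequence each take one pass over the data.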

The method of the embodiments of the present invention is implemented by an electronic device, so the relevant electronic device is introduced here. To this end, an embodiment of the present invention provides an electronic device, as shown in fig. 5, including: at least one processor 501, a communication interface 504, at least one memory 502 and a communication bus 503, wherein the at least one processor 501, the communication interface 504 and the at least one memory 502 communicate with each other via the communication bus 503. The at least one processor 501 may invoke logic instructions in the at least one memory 502 to perform the following method: forming a tuple from a single numerical value and a single time point, forming a time sequence from a plurality of tuples, and defining the similarity of different time sequences at any time point; constructing a plurality of split points, dividing the numerical space of the time sequence into a plurality of numerical intervals, obtaining the probability density of the time sequence, obtaining from the probability density the probability that any time point in the time sequence falls into any numerical interval, constructing an interval table according to the probability and the plurality of numerical intervals, and constructing an extended interval table according to the interval table; obtaining the weight, in the extended interval table, of each time point of each sub-sequence of the time sequence, taking the average of all the weights as the score of each sub-sequence, and determining that the smaller the score, the less likely the sub-sequence is to be abnormal; wherein the numerical space is made up of all the values in the plurality of tuples, and the probability that any numerical point falls into any numerical interval is the same.

Further, the logic instructions in the at least one memory 502 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. For example: forming a tuple from a single numerical value and a single time point, forming a time sequence from a plurality of tuples, and defining the similarity of different time sequences at any time point; constructing a plurality of split points, dividing the numerical space of the time sequence into a plurality of numerical intervals, obtaining the probability density of the time sequence, obtaining from the probability density the probability that any time point in the time sequence falls into any numerical interval, constructing an interval table according to the probability and the plurality of numerical intervals, and constructing an extended interval table according to the interval table; obtaining the weight, in the extended interval table, of each time point of each sub-sequence of the time sequence, taking the average of all the weights as the score of each sub-sequence, and determining that the smaller the score, the less likely the sub-sequence is to be abnormal; wherein the numerical space is made up of all the values in the plurality of tuples, and the probability that any numerical point falls into any numerical interval is the same.
The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Based on this knowledge, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting an abnormal subsequence in a time series, comprising:

adopting a single numerical value and a single moment to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any moment;

constructing a plurality of splitting points, dividing a numerical space in a time sequence into a plurality of numerical intervals, acquiring probability density of the time sequence, acquiring probability that any time point in the time sequence falls into any numerical interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical intervals, and constructing an extended interval table according to the interval table;

acquiring the weight of each time point of each sub-sequence of the time sequence in the extended interval table, taking an average value of all the weights as the score of each sub-sequence, and determining that the sub-sequence is less likely to be abnormal if the score is smaller;

wherein the numerical space is made up of all the values in the plurality of tuples; the probability that any numerical point falls into any numerical interval is the same;

the probability density is:

the probability is:

wherein x is any time point; S is the number of numerical intervals; β_i is the i-th split point;

i = 0, …, S-1;

the construction of a number of split points includes:

wherein p' is the derivative of p with respect to time; G is a construction function; if G is zero, β_i is determined to be a split point, and if G is not zero, β_i is determined not to be a split point;

the interval table is constructed according to the probability and a plurality of numerical intervals, and correspondingly, the elements of the interval table comprise:

wherein j is the j-th numerical interval; ITable is an element of the interval table;

said taking an average value of all the weights as the score of each sub-sequence comprises:

2. The method of claim 1, wherein the employing a single value with a single point in time to form a tuple, and the grouping of tuples into a time series, comprises:

P = {(t₁, p₁), (t₂, p₂), (t₃, p₃), ..., (t_n, p_n)}

3. The method for detecting an abnormal subsequence in a time series according to claim 2, characterized in that said defining the similarity of different time sequences at any time point comprises: if the numerical values of a first time sequence and a second time sequence at any time point within the period from t₁ to t_n are in the same numerical interval, determining that the first time sequence and the second time sequence are similar at that time point.

4. An apparatus for detecting an abnormal subsequence in a time series, comprising:

the probability density is:

the probability is:

i = 0, …, S-1;

the construction of a number of split points includes:

wherein t_i is the i-th time point; Score(t_i) is the weight of time point t_i in the extended interval table; Score(P) is the score of the sub-sequence; w is a weight; r_{j+1,i} is the compact coefficient between the position of time point t_i in the numerical space and the adjacent upper interval; r_{j-1,i} is the compact coefficient between the position of time point t_i in the numerical space and the adjacent lower interval.

5. An electronic device, comprising:

at least one processor, at least one memory, and a communication interface; wherein,

the processor, the memory and the communication interface are communicated with each other;

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-3.

6. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any one of claims 1 to 3.