CN112329847A

CN112329847A - Anomaly detection method and apparatus, electronic device and storage medium

Info

Publication number: CN112329847A
Application number: CN202011215040.XA
Authority: CN
Inventors: 郭海
Original assignee: Beijing Shenzhou Taiyue Software Co Ltd
Current assignee: Beijing Shenzhou Taiyue Software Co Ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2021-02-05

Abstract

The application provides an anomaly detection method, an anomaly detection apparatus, an electronic device, and a storage medium. The method includes: acquiring time series data to be detected, where the time series data to be detected is device performance data that changes over time; extracting data features of the time series data to be detected; looking up, in an association table, the indication information of the model corresponding to the data features; and performing anomaly detection on the time series data to be detected by using the anomaly detection model corresponding to the indication information, to obtain an anomaly detection result. In this implementation, different anomaly detection models are used for different types of performance time series data from different devices. This avoids the large number of false alarms that tend to arise when anomaly detection relies on index thresholds set according to the experience of operation and maintenance personnel, and improves the accuracy of anomaly detection on time series data.

Description

Anomaly detection method and apparatus, electronic device and storage medium

Technical Field

The present application relates to the technical field of machine learning and computer operation and maintenance, and in particular, to an anomaly detection method and apparatus, an electronic device, and a storage medium.

Background

At present, in the operation and maintenance of computers, servers, and network devices, anomaly detection is usually performed with index thresholds on device performance: when the real-time performance time series data of a device exceeds a preset index threshold, the device is determined to be abnormal. For example, if the utilization of the Central Processing Unit (CPU) or of the network bandwidth of a device exceeds 90% for longer than a preset duration, the device is determined to be abnormal. After the device is determined to be abnormal, alarm information for the device is obtained and sent to the operation and maintenance personnel responsible for the device, who then handle the device accordingly to eliminate the abnormality.
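The threshold rule described above can be sketched as follows. This is only an illustration of the conventional approach, not code from the application: the function name `threshold_alarm`, the use of a sample count (`min_run`) in place of a preset duration, and the sample values are all assumptions.

```python
from typing import Sequence


def threshold_alarm(usage: Sequence[float], limit: float = 90.0, min_run: int = 3) -> bool:
    """Return True if `usage` stays above `limit` for at least `min_run` consecutive samples.

    `min_run` stands in for the "preset duration" (assumption: evenly spaced samples).
    """
    run = 0
    for value in usage:
        # Extend the current run while the value exceeds the limit, otherwise reset it.
        run = run + 1 if value > limit else 0
        if run >= min_run:
            return True
    return False


cpu_utilization = [55.0, 91.0, 93.5, 97.2, 60.0]  # made-up percentages
print(threshold_alarm(cpu_utilization))
```

Both `limit` and `min_run` have to be chosen by hand for every device and every index, which is the manual workload the next paragraph criticizes.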

In practice, it has been found that the number and variety of hardware devices are very large, that the performance index thresholds to be set differ from device to device according to different requirements, and that the operating systems of the server devices to be maintained are also highly varied. As a result, the workload of manually setting index thresholds for every device is very heavy. Moreover, the accuracy of anomaly detection with manually set thresholds is very low, because it depends mainly on the experience of the operation and maintenance personnel, and it easily causes a large number of false alarms.

Disclosure of Invention

An object of the embodiments of the present application is to provide an abnormality detection method, an abnormality detection apparatus, an electronic device, and a storage medium, which are used to solve the problem of very low accuracy of abnormality detection.

An embodiment of the present application provides an anomaly detection method, including: acquiring time series data to be detected, where the time series data to be detected is device performance data that changes over time; extracting data features of the time series data to be detected; looking up, in an association table, the indication information of the model corresponding to the data features; and performing anomaly detection on the time series data to be detected by using the anomaly detection model corresponding to the indication information, to obtain an anomaly detection result. In this implementation, the data features of the device's performance time series data are extracted, the anomaly detection model corresponding to those features is found in the association table, and that model is then used to perform anomaly detection on the time series data. In other words, different anomaly detection models handle different types of performance time series data from different devices, which avoids the large number of false alarms that tend to arise when anomaly detection relies on index thresholds set according to the experience of operation and maintenance personnel, and improves the accuracy of anomaly detection on time series data.
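The four steps of the method can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: the feature extractor (mean and range), the nearest-feature lookup (used here as a stand-in for "most similar"), the toy deviation detectors, and every name in the code are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, float]
Detector = Callable[[Sequence[float]], List[int]]


def extract_features(series: Sequence[float]) -> Features:
    """Hypothetical feature extractor: (mean, range) of the series."""
    return (sum(series) / len(series), max(series) - min(series))


def look_up_model(features: Features, table: List[Tuple[Features, str]]) -> str:
    """Return the model name whose stored features are closest (squared Euclidean)."""
    def distance(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(table, key=lambda row: distance(row[0], features))[1]


def deviation_detector(limit: float) -> Detector:
    """Build a toy detector that flags points farther than `limit` from the mean."""
    def detect(series: Sequence[float]) -> List[int]:
        mean = sum(series) / len(series)
        return [i for i, value in enumerate(series) if abs(value - mean) > limit]
    return detect


def detect_anomalies(series: Sequence[float],
                     table: List[Tuple[Features, str]],
                     models: Dict[str, Detector]) -> List[int]:
    features = extract_features(series)          # extract data features
    model_name = look_up_model(features, table)  # look up the matching model
    return models[model_name](series)            # run the matched model


# Hypothetical association table: stored features -> model name.
TABLE = [((50.0, 5.0), "tight"), ((50.0, 80.0), "loose")]
MODELS = {"tight": deviation_detector(10.0), "loose": deviation_detector(60.0)}

print(detect_anomalies([50, 51, 49, 50, 75, 50], TABLE, MODELS))  # indices of flagged points
```

The point of the sketch is the dispatch: the same input pipeline selects a different detector depending on the features of the series, instead of applying one fixed threshold to every device.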

Optionally, in an embodiment of the present application, before looking up the indication information of the model corresponding to the data features in the association table, the method further includes: obtaining a plurality of time series data; classifying the plurality of time series data to obtain a plurality of data categories, where each data category includes at least one time series data; clustering all the time series data in each data category by using the clustering algorithm corresponding to that data category, to obtain a plurality of clustered subclasses, where each subclass includes at least one time series data; and training the anomaly detection model corresponding to each subclass by using all the time series data in that subclass, to obtain a trained anomaly detection model. In this implementation, the time series data in each data category are clustered with the clustering algorithm corresponding to that category, and the time series data clustered into the same subclass share one model. This greatly reduces the operation, maintenance, and management problems caused by maintaining many models, and saves computing resources, storage resources, maintenance labor, and cost.

Optionally, in an embodiment of the present application, after obtaining the trained anomaly detection model, the method further includes: extracting the centroid data of each subclass from the at least one time series data included in that subclass, where the centroid data is a time series formed by the mean values of all the time series data in the subclass; extracting features of the centroid data of each subclass; and storing the features of the centroid data of each subclass, together with the indication information of the anomaly detection model corresponding to that subclass, in the association table. In this implementation, the anomaly detection model corresponding to given data features can be matched quickly through the association table, which improves the speed and efficiency of anomaly detection on time series data.
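A minimal sketch of building the association table from clustered subclasses follows. It assumes that all series in a subclass have the same length, uses a hypothetical two-value feature vector (mean and range), and models the indication information as a small record with the fields named later in the description (host address, name, version, call URL). The host and URL values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ModelIndication:
    """Indication information of a model (field names are illustrative)."""
    host: str
    name: str
    version: str
    url: str


def centroid(series_list: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of equal-length series: the centroid data of a subclass."""
    length = len(series_list[0])
    if any(len(series) != length for series in series_list):
        raise ValueError("all series in a subclass must have the same length")
    count = len(series_list)
    return [sum(series[i] for series in series_list) / count for i in range(length)]


def centroid_features(centroid_series: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical features of the centroid: (mean, range)."""
    mean = sum(centroid_series) / len(centroid_series)
    return (mean, max(centroid_series) - min(centroid_series))


def build_association_table(
    subclasses: Dict[str, Sequence[Sequence[float]]],
    indications: Dict[str, ModelIndication],
) -> List[Tuple[Tuple[float, float], ModelIndication]]:
    """Store (centroid features, model indication) for every subclass."""
    return [
        (centroid_features(centroid(members)), indications[subclass_id])
        for subclass_id, members in subclasses.items()
    ]


subclasses = {"sub_1": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]}
indications = {
    "sub_1": ModelIndication(host="model-host.invalid", name="detector_sub_1",
                             version="1", url="http://model-host.invalid/detect"),
}
table = build_association_table(subclasses, indications)
print(table[0][0])  # features of the centroid of sub_1
```

Storing only the centroid features keeps each lookup proportional to the number of subclasses rather than to the number of training series.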

Optionally, in an embodiment of the present application, the plurality of data categories include periodic data and stationary data, and classifying the plurality of time series data to obtain a plurality of data categories includes: if the time series data has a periodicity characteristic, determining the data category of the time series data to be periodic data; and if the time series data has a stationarity characteristic, determining the data category of the time series data to be stationary data. This classification scheme is extensible and general: irregular data can be further subdivided, and corresponding anomaly detection methods can be added for the new subdivisions, which further improves the accuracy of anomaly detection on time series data.

Optionally, in an embodiment of the present application, clustering all the time series data in each data category by using the clustering algorithm corresponding to that data category includes: if the data category is periodic data, clustering all the time series data in the category by using a mean clustering algorithm; and if the data category is stationary data, clustering all the time series data in the category by using a mean clustering algorithm combined with a dynamic time warping algorithm.
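The text does not say how the mean clustering algorithm and dynamic time warping (DTW) are combined. One plausible reading, offered here only as an assumption, is that the DTW distance replaces the point-wise distance between series during clustering. The sketch below shows the classic dynamic-programming DTW distance with an absolute-difference local cost; the function name is illustrative.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two series (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already matched b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already matched a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# The second series repeats one sample; DTW absorbs the stretch.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Unlike a point-wise distance, DTW tolerates series of different lengths and local shifts in time, which may be why it is paired with the stationary category here.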

Optionally, in an embodiment of the present application, the plurality of data categories further include irregular data. Classifying the plurality of time series data to obtain a plurality of data categories further includes: if the time series data has neither a periodicity characteristic nor a stationarity characteristic, determining the data category of the time series data to be irregular data. Clustering all the time series data in each data category by using the clustering algorithm corresponding to that data category further includes: if the data category is irregular data, clustering all the time series data in the category by using a density-based clustering algorithm.
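The application does not specify how periodicity or stationarity is tested. The sketch below is one possible way to produce the three categories, under loudly stated assumptions: periodicity is detected as a local peak of the sample autocorrelation above a hypothetical threshold of 0.6, and stationarity as a small shift in mean and variance between the two halves of the series (tolerance 0.25, also hypothetical). Anything else is labelled irregular.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag`; 0.0 for a constant series."""
    mean = fmean(series)
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        return 0.0
    numerator = sum((series[i] - mean) * (series[i + lag] - mean)
                    for i in range(len(series) - lag))
    return numerator / denominator


def is_periodic(series: Sequence[float], threshold: float = 0.6) -> bool:
    """Assumed test: the autocorrelation has a local peak of at least `threshold`."""
    max_lag = len(series) // 2
    acf: List[float] = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    return any(
        acf[lag] >= threshold and acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        for lag in range(2, max_lag)
    )


def is_stationary(series: Sequence[float], tolerance: float = 0.25) -> bool:
    """Assumed test: mean and variance barely change between the two halves."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    scale = (max(series) - min(series)) or 1.0
    mean_shift = abs(fmean(first) - fmean(second)) / scale
    variance_shift = abs(pvariance(first) - pvariance(second)) / scale ** 2
    return mean_shift <= tolerance and variance_shift <= tolerance


def classify(series: Sequence[float]) -> str:
    if is_periodic(series):
        return "periodic"
    if is_stationary(series):
        return "stationary"
    return "irregular"


sine = [math.sin(2 * math.pi * i / 8) for i in range(64)]
flat = [5.0] * 20
trend = [float(i) for i in range(40)]
print(classify(sine), classify(flat), classify(trend))
```

A production system would more likely rely on established statistical tests; the sketch only shows the decision order implied by the text, with periodicity and stationarity checked first and irregular as the remainder.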

Optionally, in an embodiment of the present application, looking up the indication information of the model corresponding to the data features in the association table includes: calculating the similarity value between the data features and each of a plurality of features in the association table, to obtain a plurality of similarity values; selecting, from the plurality of similarity values, the feature with the maximum similarity value; and determining the indication information of the model corresponding to the feature with the maximum similarity value as the indication information of the model corresponding to the data features. In this implementation, the model whose stored feature is most similar to the data features performs the anomaly detection on the time series data, which improves the accuracy of anomaly detection on time series data.

An embodiment of the present application further provides an anomaly detection apparatus, including: a detection data acquisition module, configured to acquire time series data to be detected, where the time series data to be detected is device performance data that changes over time; a data feature extraction module, configured to extract the data features of the time series data to be detected; an indication information lookup module, configured to look up the indication information of the model corresponding to the data features in the association table; and a detection result obtaining module, configured to perform anomaly detection on the time series data to be detected by using the anomaly detection model corresponding to the indication information, to obtain an anomaly detection result.

Optionally, in an embodiment of the present application, the anomaly detection apparatus further includes: a time series data acquisition module, configured to obtain a plurality of time series data; a time series data classification module, configured to classify the plurality of time series data to obtain a plurality of data categories, where each data category includes at least one time series data; a time series data clustering module, configured to cluster all the time series data in each data category by using the clustering algorithm corresponding to that data category, to obtain a plurality of clustered subclasses, where each subclass includes at least one time series data; and a detection model obtaining module, configured to train the anomaly detection model corresponding to each subclass by using all the time series data in that subclass, to obtain a trained anomaly detection model.

Optionally, in an embodiment of the present application, the anomaly detection apparatus further includes: a centroid data extraction module, configured to extract the centroid data of each subclass from the at least one time series data of that subclass, where the centroid data is a time series formed by the mean values of all the time series data in the subclass; a centroid feature extraction module, configured to extract the features of the centroid data of each subclass; and an association information storage module, configured to store the features of the centroid data of each subclass, together with the indication information of the anomaly detection model corresponding to that subclass, in the association table.

Optionally, in an embodiment of the present application, the plurality of data categories include: periodic data and stationary data; a time series data classification module comprising: the periodic data classification module is used for determining the data type of the time sequence data as periodic data if the time sequence data has the periodic characteristic; and the stationary data classification module is used for determining the data type of the time sequence data as stationary data if the time sequence data has the stationary characteristic.

Optionally, in an embodiment of the present application, the time series data clustering module includes: a periodic data clustering module, configured to cluster all the time series data in the data category by using a mean clustering algorithm if the data category is periodic data; and a stationary data clustering module, configured to cluster all the time series data in the data category by using a mean clustering algorithm combined with a dynamic time warping algorithm if the data category is stationary data.

Optionally, in an embodiment of the present application, the plurality of data categories further include irregular data. The time series data classification module further includes: an irregular data classification module, configured to determine the data category of the time series data to be irregular data if the time series data has neither a periodicity characteristic nor a stationarity characteristic. The time series data clustering module further includes: an irregular data clustering module, configured to cluster all the time series data in the data category by using a density-based clustering algorithm if the data category is irregular data.

Optionally, in this embodiment of the present application, the indication information searching module includes: the similarity value calculation module is used for calculating the similarity values of the plurality of characteristics and the data characteristics in the association relation table to obtain a plurality of similarity values; the similarity value screening module is used for screening out the features with the maximum similarity values from the similarity values; and the indicating information determining module is used for determining the indicating information of the model corresponding to the features with the maximum similarity value as the indicating information of the model corresponding to the data features.

An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing processor-executable machine-readable instructions, the machine-readable instructions when executed by the processor performing the method as described above.

Embodiments of the present application also provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the method as described above.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a schematic flow chart of an anomaly detection method provided in an embodiment of the present application;

Fig. 2 is a schematic diagram of online anomaly detection and offline classification training provided by an embodiment of the present application;

Fig. 3 is a schematic flow chart illustrating training of an anomaly detection model and building of an association table according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an abnormality detection apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

The technical solution in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

Before describing the anomaly detection method provided by the embodiment of the present application, some concepts related to the embodiment of the present application are described:

Time series data is a sequence of data recorded for the same index in chronological order. Every data item in the same series must be measured on the same basis so that the items are comparable. Time series data can be period data or point-in-time data. The purpose of time series analysis is to build a time series model by finding the statistical characteristics and regularities of the series within the sample, and to use the model to make predictions outside the sample.

Fourier Transform (FT) means that a function satisfying certain conditions can be expressed as a linear combination of trigonometric functions (sine and/or cosine functions) or of their integrals. In signal analysis, many waveforms, such as sine waves, square waves, and sawtooth waves, can serve as the components of a signal; the Fourier transform uses sine waves as the components.
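For reference, the standard textbook forms of the two statements above can be written as follows. The notation (period T, coefficients a_n and b_n, angular frequency ω) is conventional and does not come from the application. The first line is the Fourier series of a periodic function as a linear combination of sines and cosines; the second is the Fourier transform as an integral.

```latex
f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}
       \left[ a_n \cos\!\left(\frac{2\pi n t}{T}\right)
            + b_n \sin\!\left(\frac{2\pi n t}{T}\right) \right],
\qquad
a_n = \frac{2}{T}\int_{0}^{T} f(t)\cos\!\left(\frac{2\pi n t}{T}\right)\,dt,
\quad
b_n = \frac{2}{T}\int_{0}^{T} f(t)\sin\!\left(\frac{2\pi n t}{T}\right)\,dt

\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt
```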

Discrete Wavelet Transform (DWT), as the name implies, takes a discrete input and produces a discrete output. It is useful in numerical analysis and time-frequency analysis, but in general there is no simple, explicit formula that expresses the relationship between its input and output.
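Although the general DWT has no single compact formula, one decomposition level is easy to write down for one specific wavelet, the Haar wavelet. The sketch below is an illustration chosen for this note rather than something the application specifies: it splits a signal into pairwise averages (approximation) and pairwise differences (detail), and then reconstructs it.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT; the signal length must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approximation = [(x + y) / SQRT2 for x, y in pairs]  # smooth, low-frequency part
    detail = [(x - y) / SQRT2 for x, y in pairs]         # local change, high-frequency part
    return approximation, detail


def haar_idwt(approximation: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one level of the Haar DWT."""
    signal: List[float] = []
    for a, d in zip(approximation, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


approximation, detail = haar_dwt([4.0, 4.0, 2.0, 2.0])
print(approximation, detail)
print(haar_idwt(approximation, detail))
```

Applying `haar_dwt` again to the approximation coefficients yields the next, coarser level, which is the hierarchical structure usually used to describe the transform.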

Feature selection refers to the process of selecting at least one feature from a plurality of existing features so as to optimize a specific index of the system; that is, the most effective features are selected from the original features in order to reduce the data dimension.

It should be noted that the abnormality detection method provided in the embodiment of the present application may be executed by an electronic device, where the electronic device refers to a device terminal or a server having a function of executing a computer program, and the device terminal includes, for example: a smart phone, a Personal Computer (PC), a tablet computer, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), a network switch or a network router, and the like.

Before describing the anomaly detection method provided by the embodiments of the present application, the application scenarios to which it applies are described. These include, but are not limited to, the operation and maintenance of computer software, hardware, and network devices. Specifically, the method determines, from the features of the time series data, the model to be used for anomaly detection on that data, and then uses the model to perform the detection. The time series data include, but are not limited to: at the service layer, access volume, database access duration, and page request latency; and at the basic performance layer, CPU utilization, memory utilization, and swap space utilization.

Please refer to Fig. 1, which is a schematic flow chart of the anomaly detection method provided in an embodiment of the present application. The main idea of the method is to first extract the data features of a device's performance time series data, then find the anomaly detection model corresponding to those features in the association table, and finally use that model to perform anomaly detection on the time series data to obtain an anomaly detection result. In other words, different anomaly detection models handle different types of performance time series data from different devices, which avoids the large number of false alarms that tend to arise when anomaly detection relies on index thresholds set according to the experience of operation and maintenance personnel, and improves the accuracy of anomaly detection on time series data. The anomaly detection method may include:

Step S110: acquire the time series data to be detected.

The time series data to be detected is time series data of device performance that changes in real time, for example: at the service layer, access volume, database access duration, and page request latency; and at the basic performance layer, CPU utilization, memory utilization, and swap space utilization.

There are many ways to obtain the time series data to be detected in step S110, including but not limited to the following. In the first way, the time series data to be detected is received from another terminal device and stored in a file system, a database, or a removable storage device. In the second way, pre-stored time series data to be detected is read, for example from a file system, a database, or a removable storage device. In the third way, the time series data to be detected is obtained from the Internet by using software such as a browser, or by using another application program that accesses the Internet.

After step S110, step S120 is performed: extract the data features of the time series data to be detected.

The data characteristics refer to statistical characteristics, spectral characteristics and the like of time series data, and specifically, the data characteristics include, but are not limited to: period, frequency, and amplitude, etc.
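As an illustration of the spectral features named above, the sketch below estimates the dominant frequency, period, and amplitude of a series with a naive discrete Fourier transform. It is an assumption-laden example rather than the extraction method of the application: the function name, the `sample_interval` parameter, and the choice to report only the single strongest component are hypothetical.

```python
import cmath
import math
from typing import Dict, Sequence


def dominant_component(series: Sequence[float], sample_interval: float = 1.0) -> Dict[str, float]:
    """Return frequency, period and amplitude of the strongest sinusoidal component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant offset first
    best_k, best_magnitude = 0, 0.0
    # Scan the bins strictly below the Nyquist frequency.
    for k in range(1, (n - 1) // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                          for i, x in enumerate(centered))
        if abs(coefficient) > best_magnitude:
            best_k, best_magnitude = k, abs(coefficient)
    if best_k == 0:
        return {"frequency": 0.0, "period": math.inf, "amplitude": 0.0}
    frequency = best_k / (n * sample_interval)
    return {
        "frequency": frequency,
        "period": 1.0 / frequency,
        "amplitude": 2.0 * best_magnitude / n,  # amplitude of a real sinusoid in that bin
    }


# Synthetic series: amplitude 3, period 16 samples, offset 10.
series = [10.0 + 3.0 * math.sin(2 * math.pi * i / 16) for i in range(64)]
print(dominant_component(series))
```

The naive transform costs O(n²); a fast Fourier transform from a numerical library would normally be used for long series.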

There are many embodiments of the above step S120, including but not limited to the following:

In a first embodiment, a machine learning algorithm is used to extract the data features of the time series data to be detected, where the machine learning algorithm includes, but is not limited to: decision trees, Bayesian learning, instance-based learning, genetic algorithms, rule learning, explanation-based learning, and the Histogram of Oriented Gradients (HOG) feature extraction algorithm.

In a second embodiment, a neural network model is used to extract the data features of the time series data to be detected. The neural network model may be a Deep Neural Network (DNN), which is a discriminative model; examples include, but are not limited to: the Feature Fusion Single Shot Multibox Detector (FSSD), LeNet, AlexNet, GoogLeNet, VGG, ResNet, Wide ResNet, and Inception networks.

After step S120, step S130 is performed: look up the indication information of the model corresponding to the data features in the association table.

The indication information of a model is information that directs how to obtain the model corresponding to the data features, for example: the address of the host where the model resides, the version number of the model, the name of the model, and the Uniform Resource Locator (URL) through which the model is called.

There are many embodiments of the above step S130, including but not limited to the following:

In a first embodiment, the model indication information is determined from the feature in the association table that has the largest similarity value with the data features. This embodiment may include:

Step S131: calculate the similarity value between the data features and each of a plurality of features in the association table, to obtain a plurality of similarity values.

There are many ways to calculate the similarity in step S131, including but not limited to calculating the similarity values between the data features and the plurality of features in the association table by using an average hash algorithm (also called a mean hash algorithm) or a perceptual hash algorithm. There are likewise many possible measures of similarity, including but not limited to cosine distance, cosine similarity, Hamming distance, and Euclidean distance. In practice, more complex schemes, such as a weighted combination of several measures, may also be chosen to determine the similarity value.
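Three of the measures listed above can be sketched with the standard library alone. The function names are illustrative. Note that cosine similarity grows with similarity, whereas the Euclidean and Hamming distances shrink, so a distance has to be negated or inverted before it is used as a "similarity value" to be maximized.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Number of differing positions, e.g. between two equal-length hash strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(hamming_distance("1011", "1001"))
```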

Step S132: select, from the plurality of similarity values, the feature with the maximum similarity value.

Step S133: determine the indication information of the model corresponding to the feature with the maximum similarity value as the indication information of the model corresponding to the data features.

Steps S132 to S133 may be implemented, for example, as follows. Suppose the similarity values between the data features and three features in the association table are 33, 88, and 99, and the models corresponding to those three features are A, B, and C, respectively. The maximum similarity value is 99 and the corresponding model is C, so the indication information of model C is determined as the indication information of the model corresponding to the data features. Other machine learning algorithms may also be used as auxiliary algorithms when matching the model indication information; for example, a decision tree may be used to match different anomaly detection models to the features of the time series data.
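The worked example with similarity values 33, 88, and 99 amounts to an arg-max over the scored rows of the association table. A minimal sketch follows; the function name and the (similarity, model) row layout are assumptions.

```python
from typing import List, Tuple


def pick_most_similar(scored: List[Tuple[float, str]]) -> str:
    """Return the model whose stored feature has the highest similarity value."""
    if not scored:
        raise ValueError("the association table is empty")
    _best_score, best_model = max(scored, key=lambda row: row[0])
    return best_model


# Similarity values from the example above, paired with models A, B and C.
scored_rows = [(33, "A"), (88, "B"), (99, "C")]
print(pick_most_similar(scored_rows))
```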

In this implementation, the similarity values between the data features and the plurality of features in the association table are calculated, the feature with the maximum similarity value is selected, and the indication information of the model corresponding to that feature is determined as the indication information of the model corresponding to the data features. The model whose stored feature is most similar to the data features therefore performs the anomaly detection on the time series data, which improves the accuracy of anomaly detection on time series data.

In a second embodiment, the features in the association table whose similarity values with the data features are greater than a preset threshold are first identified, and the model indication information is then determined from among those features. This embodiment may include:

Step S134: calculate the similarity value between the data features and each of a plurality of features in the association table, to obtain a plurality of similarity values.

The implementation principle and manner of step S134 are similar to those of step S131 and are therefore not repeated here; refer to the description of step S131 for details.

Step S135: select, from the plurality of similarity values, the features whose similarity values are greater than a preset threshold.

Step S136: randomly select one feature from the features whose similarity values are greater than the preset threshold, to obtain a target feature.

Steps S135 to S136 may be implemented, for example, as follows. Suppose the similarity values between the data features and three features in the association table are 33, 88, and 99, the models corresponding to those three features are A, B, and C, respectively, and the preset threshold is 80. The similarity values greater than the preset threshold are 88 and 99, and one of them is selected at random. If 88 is selected, the indication information of model B, whose feature has the similarity value 88, is determined as the indication information of the model corresponding to the data features; if 99 is selected, the indication information of model C, whose feature has the similarity value 99, is determined as the indication information of the model corresponding to the data features.
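The threshold-then-random selection of steps S135 and S136 can be sketched as below. The text does not say what happens when no similarity value exceeds the threshold; returning `None` in that case is an assumption of this sketch, as are the function name and the optional random generator parameter.

```python
import random
from typing import List, Optional, Tuple


def pick_above_threshold(
    scored: List[Tuple[float, str]],
    threshold: float,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Randomly pick one model among those whose similarity exceeds `threshold`."""
    rng = rng or random.Random()
    candidates = [model for similarity, model in scored if similarity > threshold]
    if not candidates:
        return None  # assumption: the caller decides how to fall back
    return rng.choice(candidates)


scored_rows = [(33, "A"), (88, "B"), (99, "C")]
print(pick_above_threshold(scored_rows, threshold=80))  # prints "B" or "C"
```

Random selection spreads requests over all sufficiently similar models, at the cost of not always using the closest one.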

Step S137: determine the indication information of the model corresponding to the target feature as the indication information of the model corresponding to the data features.

The implementation principle and implementation manner of step S137 are similar to those of step S133, and therefore, the implementation principle and implementation manner of step S133 are not described here, and reference may be made to the description of step S133 if it is unclear.
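As an illustrative sketch only, steps S135 to S137 could be written in Python as follows, reusing the numbers of the example above. The function name `select_model_by_threshold`, the pair layout of the table and the seeded random generator are assumptions made for this illustration and are not taken from the application.

```python
import random


def select_model_by_threshold(similarities, threshold, rng=None):
    """Pick a model at random among the features whose similarity exceeds threshold.

    similarities: list of (similarity_value, model_indication) pairs (assumed layout).
    Returns the model indication, or None when no feature passes the threshold.
    """
    rng = rng or random.Random()
    # Step S135: keep only the features above the preset threshold.
    candidates = [(s, m) for s, m in similarities if s > threshold]
    if not candidates:
        return None
    # Step S136: randomly select one target feature.
    _, model = rng.choice(candidates)
    # Step S137: its model indication is used for the data feature.
    return model


# Values from the example: similarities 33, 88, 99 map to models A, B, C.
table = [(33, "A"), (88, "B"), (99, "C")]
chosen = select_model_by_threshold(table, threshold=80, rng=random.Random(0))
print(chosen)  # either "B" or "C"
```

Returning `None` when nothing passes the threshold is a design choice of this sketch; the application does not state what happens in that case.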

After step S130, step S140 is performed: performing anomaly detection on the time series data to be detected by using the anomaly detection model corresponding to the indication information to obtain an anomaly detection result.

The implementation of the above step S140 differs according to the model, including but not limited to the following:

In a first embodiment, if the data type of the time series data to be detected is periodic data, the same-proportion algorithm model may be matched to the time series data of the periodic data type, and the same-proportion algorithm model is used to perform anomaly detection on the time series data to be detected to obtain an anomaly detection result.
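The application does not define the same-proportion algorithm further. A minimal sketch, assuming it compares each point with the value one full period earlier and flags large relative changes, might look like this; the function name, the `period` argument and the 50% change limit are all hypothetical.

```python
def same_period_anomalies(series, period, max_ratio_change=0.5):
    """Flag indices whose value deviates strongly from the value one period earlier.

    Assumption: "same-proportion" means a same-period comparison; the limit
    max_ratio_change is an illustrative value, not one given in the source.
    """
    anomalies = []
    for i in range(period, len(series)):
        previous = series[i - period]
        if previous == 0:
            continue  # relative change is undefined for a zero baseline
        change = abs(series[i] - previous) / abs(previous)
        if change > max_ratio_change:
            anomalies.append(i)
    return anomalies


# A series with period 3 in which index 7 jumps from the expected 20 to 60.
cpu = [10, 20, 30, 10, 20, 30, 10, 60, 30]
same_period_result = same_period_anomalies(cpu, period=3)
print(same_period_result)  # [7]
```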

In a second embodiment, if the data type of the time series data to be detected is stationary data, the ARIMA algorithm model may be matched to the time series data of the stationary data type, and the ARIMA algorithm model is used to perform anomaly detection on the time series data to be detected to obtain an anomaly detection result.

In a third embodiment, if the data type of the time series data to be detected is irregular data, the Box-plot algorithm model may be matched to the time series data of the irregular data type, and the Box-plot algorithm model is used to perform anomaly detection on the time series data to be detected to obtain an anomaly detection result.
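For illustration, a box-plot detector is commonly built on the interquartile range (IQR). The sketch below assumes the conventional whisker factor of 1.5, which the application does not specify, and uses only the Python standard library.

```python
import statistics


def boxplot_anomalies(series, k=1.5):
    """Return indices of points outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional box-plot whisker factor (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(series, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(series) if v < lower or v > upper]


latency = [10, 11, 12, 11, 10, 12, 11, 95, 10, 12]
print(boxplot_anomalies(latency))  # [7]
```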

In a fourth embodiment, if the matched model is a density-based anomaly detection algorithm model, a density-based anomaly detection model among the machine learning algorithms is used for anomaly detection. For example, the anomaly detection model is obtained according to indication information such as the address of the host where the model is located, the version number of the model, the name of the model, and the Uniform Resource Locator (URL) used to call the model, and the outlier data points that the model finds in the time series data to be detected are taken as the anomaly detection result.

In a fifth embodiment, if the matched model is based on a statistical-feature anomaly detection algorithm, a statistical-feature anomaly detection algorithm among the machine learning algorithms is used for anomaly detection. For example, when the samples follow a normal distribution, abnormal points are detected by applying a statistical algorithm based on the probability distribution.
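One common statistical rule for normally distributed samples is the three-sigma rule. The following sketch assumes that rule; the application only says that a probability-distribution-based statistical algorithm is applied, so the function name and the threshold `z=3.0` are illustrative assumptions.

```python
import statistics


def three_sigma_anomalies(series, z=3.0):
    """Return indices whose distance from the mean exceeds z standard deviations."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(series) if abs(v - mean) / std > z]


# Twenty samples around 50 with a single spike at the end.
usage = [50, 51, 49] * 6 + [50, 100]
print(three_sigma_anomalies(usage))  # [19]
```

With very short series a single spike cannot exceed three standard deviations, so a rule of this kind needs a reasonably long window.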

It can be understood that after the anomaly detection result is obtained, the anomaly detection result can be stored in a temporary database or a persistent database according to the actual scene requirements, so that the anomaly detection result can be displayed to a user subsequently, or the warning information of the anomaly detection can be sent to the user; the database here includes but is not limited to: in-memory databases, relational and non-relational databases, and the like.

In the implementation process, the data characteristics of the performance time sequence data of the equipment are extracted, the abnormal detection model corresponding to the data characteristics is found in the association relation table, and finally the abnormal detection model is used for carrying out abnormal detection on the time sequence data, so that an abnormal detection result is obtained; that is to say, different anomaly detection models are used for detecting and processing different types of performance time series data of different devices, so that the condition that a large number of false alarms are easily caused by setting an index threshold value depending on the experience level of operation and maintenance personnel to perform anomaly detection is avoided, the accuracy of anomaly detection on time series data is effectively improved, and the problem of very low accuracy of anomaly detection is solved.

Please refer to fig. 2, which is a schematic diagram of online anomaly detection and offline classification training provided in the embodiment of the present application. Before the association relation table is used, that is, before online anomaly detection is performed with the models, the anomaly detection models need to be trained in batch. A real-time data stream is collected from the online production environment and stored in a database (the data warehousing step in the figure), which provides the data source for model training; during training, a plurality of time series data can be extracted directly from the database as training data. The specific training process is described in detail below. After training, the indication information of the trained anomaly detection models is inserted into the association relation table, the table is used to match models and thereby provide the anomaly detection service, and alarm information generated according to an alarm policy is sent to the user. The anomaly detection models may be trained offline (that is, outside the production application environment) or online (that is, within the production application environment).

Please refer to fig. 3, which is a schematic flow chart of training an anomaly detection model and constructing an association relation table provided in the embodiment of the present application. Taking offline training as an example, the training process of the model is described in detail below; the specific process of training the anomaly detection model includes:

Step S210: obtaining a plurality of time series data.

Step S210 may be implemented in many ways, including but not limited to the following. In a first acquisition manner, a plurality of time series data sent by other terminal devices (for example, a real-time data stream collected by an acquisition device) are received and stored in a file system, a database or a removable storage device. In a second acquisition manner, a plurality of pre-stored time series data are obtained, for example by extracting them from the database.

Of course, when the time series data are obtained, they may also be preprocessed, for example by missing data processing and feature selection. Missing data processing is needed because raw mass data usually contains a large amount of incomplete, inconsistent, abnormal or biased data. Missing values can be handled in two ways: one is to delete the missing data, and the other is to interpolate the data, also called missing value imputation.
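The two ways of handling missing values can be sketched as follows. The sketch assumes missing samples are represented by `None` and that interpolation is linear between the nearest known neighbours; neither detail is stated in the application.

```python
def drop_missing(series):
    """First option: delete the missing samples."""
    return [v for v in series if v is not None]


def interpolate_missing(series):
    """Second option: fill gaps linearly; edge gaps copy the nearest known value."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


raw = [1.0, None, 3.0, None, None, 6.0, None]
print(drop_missing(raw))         # [1.0, 3.0, 6.0]
print(interpolate_missing(raw))  # gaps filled between and after known samples
```

Deleting samples changes the spacing of the time axis, so interpolation is usually preferable when the later steps rely on periodicity.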

After step S210, step S220 is performed: classifying the plurality of time series data to obtain a plurality of data categories, each of the plurality of data categories including at least one time series data.

The data category refers to the category obtained by classifying the time series data according to its data characteristics, and the data categories include: periodic data, stationary data, irregular data, and the like. Periodic data is time series data with a periodicity characteristic, stationary data is time series data with a stationarity characteristic, and irregular data is time series data with neither a periodicity characteristic nor a stationarity characteristic, which can be understood as time series data in which no regular pattern can be found.

There are many ways to classify the time series data in step S220, including but not limited to the following. The first classification manner is to classify according to data characteristics, such as the periodicity, stationarity and morphology of the time series. The second classification manner is to classify according to data source, for example: sources at the service level, such as access volume, database access duration and page request time; and sources at the basic performance level, such as CPU utilization, memory utilization and swap space utilization. Taking the first classification manner as an example, step S220 may include:

Step S221: if the time series data has the periodicity characteristic, determining the data category of the time series data as periodic data.

Step S221 may be implemented, for example, as follows: a Fourier transform algorithm or a fast Fourier transform algorithm is used to judge whether the time series data has the periodicity characteristic; if so, the time series data is marked with the category label of periodic data and stored in a list of periodic time series, so that the data category of the time series data is determined to be periodic data.
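A periodicity check based on the Fourier transform can be sketched as below. The sketch computes a plain discrete Fourier transform with the standard library and treats a series as periodic when a single frequency carries most of the spectral power; the 0.5 power-ratio threshold and the function name are assumptions, and a production system would more likely use a fast Fourier transform routine.

```python
import cmath
import math


def dominant_period(series, min_power_ratio=0.5):
    """Return the dominant period in samples, or None if no frequency dominates."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the constant component
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        powers.append((abs(coeff) ** 2, k))
    total = sum(p for p, _ in powers)
    if total == 0:
        return None  # constant series: no periodic component
    best_power, best_k = max(powers)
    if best_power / total < min_power_ratio:
        return None
    return n / best_k


wave = [math.sin(2 * math.pi * t / 8) for t in range(64)]
print(dominant_period(wave))        # close to 8.0
print(dominant_period([5.0] * 16))  # None
```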

Step S222: if the time series data has the stationarity characteristic, determining the data category of the time series data as stationary data.

Step S222 may be implemented, for example, as follows: a discrete wavelet transform algorithm is used to judge whether the time series data has the stationarity characteristic; if so, the time series data is marked with the category label of stationary data and stored in a list of stationary time series, so that the data category of the time series data is determined to be stationary data.

Step S223: if the time series data has neither the periodicity characteristic nor the stationarity characteristic, determining the data category of the time series data as irregular data.

Step S223 may be implemented, for example, as follows: the remaining time series data, which satisfy neither the periodicity characteristic nor the stationarity characteristic, are classified as irregular time series data, marked with the category label of irregular data and stored in a list of irregular time series, so that their data category is determined to be irregular data. Of course, in specific practice, the data categories of the time series data may be further expanded and subdivided according to actual needs, so as to match an anomaly detection model better suited to each category.

In the implementation process, if the time series data has the periodicity characteristic, the data type of the time series data is determined to be the periodicity data; if the time sequence data has the characteristic of stationarity, determining the data type of the time sequence data as stationarity data; therefore, the method has good expansibility and universality, and can perform subdivision classification on irregular data, so that the corresponding anomaly detection method is expanded, and the accuracy of anomaly detection on time sequence data is further improved.
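The order of the checks in steps S221 to S223 can be illustrated with the sketch below. The two predicates are deliberately simple stand-ins and are not the Fourier or discrete wavelet transform tests named above: `repeats_exactly` looks for an exact repetition and `halves_have_similar_mean` compares the means of the two halves. All names and the tolerance are hypothetical.

```python
from collections import defaultdict


def repeats_exactly(series):
    """Stand-in periodicity test: the series repeats exactly with some period."""
    n = len(series)
    for period in range(2, n // 2 + 1):
        if all(series[i] == series[i + period] for i in range(n - period)):
            return True
    return False


def halves_have_similar_mean(series, tolerance=1.0):
    """Stand-in stationarity test: both halves have nearly the same mean."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return abs(sum(first) / len(first) - sum(second) / len(second)) <= tolerance


def classify(series):
    if repeats_exactly(series):            # step S221
        return "periodic"
    if halves_have_similar_mean(series):   # step S222
        return "stationary"
    return "irregular"                     # step S223


def group_by_category(named_series):
    """Store each series name in the list of its category."""
    groups = defaultdict(list)
    for name, series in named_series.items():
        groups[classify(series)].append(name)
    return dict(groups)


metrics = {
    "cpu": [1, 5, 9, 1, 5, 9, 1, 5, 9],
    "mem": [50, 51, 49, 50, 52, 50, 49, 51],
    "net": [1, 2, 4, 8, 16, 32, 64, 128],
}
print(group_by_category(metrics))
```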

After step S220, step S230 is performed: clustering all the time series data in each data category by using the clustering algorithm corresponding to each of the plurality of data categories to obtain a plurality of clustered subclasses.

Wherein each of the plurality of subclasses includes at least one time series data.

Since there are many time series data classification methods, there are many time series data clustering methods in step S230, including but not limited to:

Step S231: if the data category of the time series data is periodic data, clustering all the time series data in each data category according to the data period of the time series data by using a K-means clustering algorithm (K-means for short).

Step S231 may be implemented, for example, as follows: the period of the periodic data is taken as the basis of clustering, so that time series with a daily period are grouped into one subclass, time series with an hourly period are grouped into another subclass, and so on. Of course, in a specific implementation process, other measures may be used for clustering, for example Euclidean distance, cosine similarity or Hamming distance. Euclidean distance, also called the Euclidean metric, refers to the ordinary (that is, straight-line) distance between two points in Euclidean space; with this distance, Euclidean space becomes a metric space.
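Clustering periodic series by their period can be sketched as a one-dimensional K-means over the period values. The initialisation from evenly spaced sorted values and the function name `kmeans_1d` are assumptions made to keep the sketch deterministic; the application does not describe how K-means is configured.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values (here: periods in hours) into k groups."""
    ordered = sorted(values)
    # Assumed initialisation: k evenly spaced values from the sorted input.
    centers = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Periods (in hours) of eight periodic series: roughly daily or hourly.
periods = [24, 24, 1, 1, 24, 1, 25, 1]
labels, centers = kmeans_1d(periods, k=2)
print(labels)   # [1, 1, 0, 0, 1, 0, 1, 0]
print(centers)  # [1.0, 24.25]
```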

Step S232: if the data category of the time series data is stationary data, clustering all the time series data in each data category according to morphological similarity by combining the K-means clustering algorithm (K-means) and a Dynamic Time Warping (DTW) algorithm.

Dynamic Time Warping (DTW) refers to stretching or compressing an unknown sequence until its length matches that of a reference template; in this process the time axis of the unknown sequence is warped or bent so that its features align with those of the standard pattern.
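The DTW distance itself is usually computed by dynamic programming. The sketch below is the textbook formulation with absolute difference as the local cost; how the application combines it with K-means is not described, so only the distance is shown.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again (stretch b)
                cost[i][j - 1],      # b[j-1] matched again (stretch a)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# The second series is the first with one sample repeated, so the distance is 0.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Because the warping absorbs differences in length and local timing, two series with the same shape but different speeds end up close together, which is what clustering by morphological similarity needs.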

Step S233: if the data category of the time series data is irregular data, clustering all the time series data in each data category by using Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Certainly, in a specific implementation process, the irregular data may not be clustered, or the clustering may be performed according to actual needs, for example: all timing data in each data category is clustered using DBSCAN.
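A compact DBSCAN can be sketched in pure Python as follows. Each time series is assumed to have been reduced to a feature tuple beforehand, and the values of `eps` and `min_pts` are illustrative; none of these choices come from the application.

```python
import math


def dbscan(points, eps, min_pts, distance=math.dist):
    """Return one cluster label per point; -1 marks noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# One-dimensional feature tuples: two dense groups and one isolated point.
features = [(1.0,), (1.2,), (0.9,), (8.0,), (8.1,), (7.9,), (50.0,)]
print(dbscan(features, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```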

After step S230, step S240 is performed: training the anomaly detection model corresponding to each subclass by using all the time series data in that subclass to obtain a trained anomaly detection model.

In the implementation process, all the time series data in each data category are clustered by the clustering algorithm corresponding to that data category; the time series data clustered into the same subclass share one model, which greatly reduces the operation, maintenance and management problems caused by having a large number of models, and at the same time saves computing resources, storage resources, maintenance manpower and cost.

After the trained anomaly detection model is obtained, data such as model indication information needs to be inserted into the association table, where a specific process of inserting data into the association table may include:

After step S240, step S250 is performed: extracting the centroid data of each subclass from the at least one time series data included in that subclass.

Wherein the centroid data is a time sequence composed of the average of all time sequence data in the subclass.
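The centroid defined above is a point-by-point average. A minimal sketch, assuming all series in a subclass have the same length (the application does not say how unequal lengths are handled):

```python
def centroid(series_list):
    """Time series whose t-th value is the mean of the t-th values of all members."""
    length = len(series_list[0])
    if any(len(series) != length for series in series_list):
        raise ValueError("all series in a subclass must have the same length")
    return [sum(column) / len(series_list) for column in zip(*series_list)]


print(centroid([[1, 2, 3], [3, 4, 5]]))  # [2.0, 3.0, 4.0]
```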

After step S250, step S260 is performed: features of the centroid data for each subclass are extracted.

After step S260, step S270 is performed: storing the features of the centroid data of each subclass and the indication information of the anomaly detection model corresponding to each subclass into the association relation table.

In the implementation process, the centroid data of each subclass is extracted from at least one time sequence data of each subclass, the characteristics of the centroid data of each subclass are extracted, and then the characteristics of the centroid data of each subclass and the indication information of the abnormal detection model corresponding to each subclass are stored in an association relation table; therefore, the abnormal detection model corresponding to the data characteristics can be quickly matched through the association relation table, and the speed and the efficiency of abnormal detection on the time sequence data are effectively improved.
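Steps S250 to S270 and the later lookup can be sketched together as follows. The feature set (mean, standard deviation, range), the similarity function, the dictionary layout and the model names and URLs are all hypothetical placeholders chosen for this illustration; the application does not specify them.

```python
import math
import statistics


def extract_features(series):
    """Assumed feature vector: (mean, population standard deviation, range)."""
    return (statistics.fmean(series), statistics.pstdev(series), max(series) - min(series))


def build_association_table(subclasses):
    """Store centroid features next to each subclass's model indication info."""
    table = []
    for subclass in subclasses:
        members = subclass["series"]
        center = [sum(column) / len(members) for column in zip(*members)]  # step S250
        table.append({
            "features": extract_features(center),  # step S260
            "model": subclass["model"],            # step S270
        })
    return table


def lookup_model(table, series):
    """Return the model whose stored features are most similar to the series."""
    features = extract_features(series)
    # Assumed similarity: 1 / (1 + Euclidean distance between feature vectors).
    best = max(table, key=lambda row: 1.0 / (1.0 + math.dist(row["features"], features)))
    return best["model"]


subclasses = [
    {"series": [[10, 12, 10, 12], [12, 10, 12, 10]],
     "model": {"name": "model-a", "url": "http://example.invalid/model-a"}},
    {"series": [[80, 90, 80, 90], [82, 92, 82, 92]],
     "model": {"name": "model-b", "url": "http://example.invalid/model-b"}},
]
association_table = build_association_table(subclasses)
print(lookup_model(association_table, [11, 12, 10, 11])["name"])  # model-a
```

Because only the centroid features are stored, a lookup compares the incoming series against one row per subclass rather than against every training series.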

Please refer to fig. 4, which illustrates a schematic structural diagram of an anomaly detection apparatus provided in the embodiment of the present application; the embodiment of the present application provides an abnormality detection apparatus 300, including:

a detection data obtaining module 310, configured to obtain time series data to be detected, where the time series data to be detected is device performance data that changes with time;

a data feature extraction module 320, configured to extract data features of the time series data to be detected;

an indication information searching module 330, configured to search the association relation table for the indication information of the model corresponding to the data features;

a detection result obtaining module 340, configured to perform anomaly detection on the time series data to be detected by using the anomaly detection model corresponding to the indication information to obtain an anomaly detection result.

Optionally, in this embodiment of the present application, the abnormality detecting device further includes:

a time series data obtaining module, configured to obtain a plurality of time series data;

a time series data classification module, configured to classify the plurality of time series data to obtain a plurality of data categories, where each of the plurality of data categories includes at least one time series data;

a time series data clustering module, configured to cluster all the time series data in each data category by using the clustering algorithm corresponding to each of the plurality of data categories to obtain a plurality of clustered subclasses, where each of the plurality of subclasses includes at least one time series data;

a detection model obtaining module, configured to train the anomaly detection model corresponding to each subclass by using all the time series data in that subclass to obtain a trained anomaly detection model.

Optionally, in this embodiment of the present application, the abnormality detecting device may further include:

a centroid data extraction module, configured to extract the centroid data of each subclass from the at least one time series data of that subclass, where the centroid data is a time series formed by the average values of all the time series data in the subclass;

a centroid feature extraction module, configured to extract features of the centroid data of each subclass;

an association information storage module, configured to store the features of the centroid data of each subclass and the indication information of the anomaly detection model corresponding to each subclass into the association relation table.

Optionally, in an embodiment of the present application, the plurality of data categories include: periodic data and stationary data; the time series data classification module includes:

a periodic data classification module, configured to determine the data category of the time series data as periodic data if the time series data has the periodicity characteristic;

a stationary data classification module, configured to determine the data category of the time series data as stationary data if the time series data has the stationarity characteristic.

Optionally, in an embodiment of the present application, the time series data clustering module includes:

a periodic data clustering module, configured to cluster all the time series data in each data category by using a K-means clustering algorithm if the data category of the time series data is periodic data;

a stationary data clustering module, configured to cluster all the time series data in each data category by combining the K-means clustering algorithm and a dynamic time warping algorithm if the data category of the time series data is stationary data.

Optionally, in this embodiment of the present application, the plurality of data categories further include: irregular data; the time series data classification module further includes:

an irregular data classification module, configured to determine the data category of the time series data as irregular data if the time series data has neither the periodicity characteristic nor the stationarity characteristic.

The time series data clustering module further includes:

an irregular data clustering module, configured to cluster all the time series data in each data category by using a density-based clustering algorithm if the data category of the time series data is irregular data.

Optionally, in this embodiment of the present application, the indication information searching module includes:

a similarity value calculation module, configured to calculate similarity values between a plurality of features in the association relation table and the data features to obtain a plurality of similarity values;

a similarity value screening module, configured to screen out the feature with the maximum similarity value from the plurality of similarity values;

an indication information determining module, configured to determine the indication information of the model corresponding to the feature with the maximum similarity value as the indication information of the model corresponding to the data features.

It should be understood that the apparatus corresponds to the above-mentioned embodiment of the abnormality detection method, and can perform the steps related to the above-mentioned embodiment of the method, and the specific functions of the apparatus can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy. The device includes at least one software function that can be stored in memory in the form of software or firmware (firmware) or solidified in the Operating System (OS) of the device.

Please refer to fig. 5, which illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 400 provided in an embodiment of the present application includes: a processor 410 and a memory 420, the memory 420 storing machine-readable instructions executable by the processor 410, the machine-readable instructions when executed by the processor 410 performing the method as above.

The embodiment of the present application also provides a storage medium 430, where the storage medium 430 stores a computer program, and the computer program is executed by the processor 410 to perform the method as above.

The storage medium 430 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules of the embodiments in the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an alternative embodiment of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application.

Claims

1. An abnormality detection method characterized by comprising:

acquiring time sequence data to be detected, wherein the time sequence data to be detected is equipment performance data changing along with time;

extracting data characteristics of the time sequence data to be detected;

searching indication information of the data characteristic corresponding model in an association relation table;

and carrying out anomaly detection on the time sequence data to be detected by using an anomaly detection model corresponding to the indication information to obtain an anomaly detection result.

2. The method according to claim 1, further comprising, before said looking up the indication information of the data feature correspondence model in the association table:

obtaining a plurality of time series data;

classifying the plurality of time series data to obtain a plurality of data categories, wherein each data category in the plurality of data categories comprises at least one time series data;

clustering all time sequence data in each data category by using a clustering algorithm corresponding to each data category in the data categories to obtain a plurality of clustered subclasses, wherein each subclass of the subclasses comprises at least one time sequence data;

and training the anomaly detection model corresponding to each subclass by using all the time sequence data in each subclass to obtain the trained anomaly detection model.

3. The method of claim 2, further comprising, after the obtaining the trained anomaly detection model:

extracting centroid data of each subclass from the subclass comprising at least one time sequence data, wherein the centroid data is a time sequence formed by the average value of all time sequence data in the subclass;

extracting features of the centroid data of each subclass;

and storing the characteristics of the centroid data of each subclass and the indication information of the abnormal detection model corresponding to each subclass into the association relation table.

4. The method of claim 2, wherein the plurality of data categories comprise: periodic data and stationary data; the classifying the plurality of time series data to obtain a plurality of data categories includes:

if the time sequence data has the periodicity characteristic, determining the data type of the time sequence data as the periodicity data;

and if the time sequence data has the characteristic of stationarity, determining the data type of the time sequence data as stationarity data.

5. The method of claim 4, wherein the clustering all time series data in each of the plurality of data categories using the clustering algorithm corresponding to the each data category comprises:

if the data category of the time series data is periodic data, clustering all the time series data in each data category by using a K-means clustering algorithm;

and if the data category of the time series data is stationary data, clustering all the time series data in each data category by combining the K-means clustering algorithm and a dynamic time warping algorithm.

6. The method of claim 4, wherein the plurality of data categories further comprises: irregular data; the classifying the plurality of time series data to obtain a plurality of data categories further includes:

if the time series data do not have the periodicity characteristic and the stationarity characteristic, determining the data type of the time series data to be irregular data;

the clustering of all time series data in each data category by using the clustering algorithm corresponding to each data category in the plurality of data categories further comprises:

and if the data type of the time sequence data is irregular data, clustering all the time sequence data in each data type by using a density-based clustering algorithm.

7. The method according to claim 1, wherein the looking up the indication information of the data feature correspondence model in the association table comprises:

calculating similarity values of a plurality of characteristics in the association relation table and the data characteristics to obtain a plurality of similarity values;

screening out the features with the maximum similarity values from the similarity values;

and determining the indication information of the model corresponding to the features with the maximum similarity value as the indication information of the model corresponding to the data features.

8. An abnormality detection device characterized by comprising:

a detection data obtaining module, configured to obtain time series data to be detected, wherein the time series data to be detected is device performance data that changes with time;

the data characteristic extraction module is used for extracting the data characteristics of the time sequence data to be detected;

the indication information searching module is used for searching the indication information of the data characteristic corresponding model in the association relation table;

and the detection result obtaining module is used for carrying out abnormity detection on the time sequence data to be detected by using the abnormity detection model corresponding to the indication information to obtain an abnormity detection result.

9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the method of any of claims 1 to 7.

10. A storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1 to 7.