WO2021185182A1 - Method and device for anomaly detection - Google Patents

Info

Publication number
WO2021185182A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
boundary
period
value
index value
Prior art date
Application number
PCT/CN2021/080564
Other languages
English (en)
French (fr)
Inventor
卢冠男
朱红燕
莫林林
孙芮
李冕正
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021185182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • This application relates to the technical field of operation and maintenance of financial technology (Fintech), and in particular to a method and device for anomaly detection.
  • computers can directly handle most of the business, such as detecting whether wealth management products are in a normal transaction state.
  • The transaction status can be monitored through indicators such as transaction volume and transaction delay. When an anomaly occurs in a monitored indicator such as transaction volume or transaction delay, an alarm is generated, so that the operation and maintenance personnel learn that the transaction is abnormal and can restore the normal state of the transaction by repairing abnormal transaction equipment, fixing abnormal transaction programs, or setting the permissions of malicious purchasing accounts.
  • the release and promotion effects of the product can also be judged by the abnormality of the monitored transaction volume or transaction delay indicators.
  • the embodiment of the present invention provides a method and device for abnormality detection, which can automatically adjust the abnormality threshold and improve the accuracy of abnormality detection on the basis of occupying a small memory.
  • an embodiment of the present invention provides an abnormality detection method, the method including:
  • For the data collected in the first major cycle, determine at least one index value of the index to be detected in each minor cycle of the first major cycle; for the at least one index value of each minor cycle, determine, according to a preset boundary rule, the boundary index value of the index to be detected in that minor cycle; according to the boundary index values of the minor cycles and in accordance with the boundary rule, determine the boundary index value of the index to be detected in the first major cycle;
  • according to the boundary index value of the first major cycle, determine the credible boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as the detection threshold for anomaly detection on the data collected in the second major cycle; the second major cycle is a cycle after the first major cycle.
  • The first major cycle includes multiple minor cycles. The data in the first major cycle is collected to determine at least one index value for each minor cycle in the first major cycle; for the index values of each minor cycle, the boundary index value corresponding to that minor cycle is obtained according to the preset boundary rule.
  • the index value of each small period can be selected through the boundary rule, and the boundary index value corresponding to each small period can be obtained, so that the boundary index value corresponding to each small period is more accurate.
  • the boundary index value of the first major period is obtained again according to the boundary rule through the boundary index value corresponding to each small period of the first major period, so that the boundary index value of the first major period is more accurate.
  • the credible boundary index value of the first major cycle is obtained according to the boundary index value of the first major cycle, and the credible boundary index value is used as the detection threshold for data generated later; in this way, the boundary index value of the first major cycle is adjusted to obtain the credible boundary index value, so that data collected after the first major cycle can fluctuate within a normal and reasonable range without being reported as abnormal.
  • the adaptive adjustment of the detection threshold is realized, the accuracy of the detection threshold is improved, and the accuracy of data anomaly detection is further improved.
  • the boundary rule is: for a set of data, determine the density area of the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the set of data; otherwise, the maximum value is deleted from the set of data and the process returns to the step of determining the density area of the maximum value in the data; the density threshold is set according to the data volume of the set of data.
  • The boundary value is selected by the boundary rule; specifically, it is judged whether the number of data in the density area of the maximum value is greater than the density threshold. If it is greater, the density of data near the maximum value can be considered to conform to the law of data change, so the maximum value is reasonable data and can be used as the boundary value. If it is less than the density threshold, the amount of data near the maximum value is abnormally small and does not conform to the law of data change; the maximum value is likely to be abnormal data, so it is deleted and the procedure restarts.
  • The maximum value in the set of data is then re-determined, and it is again judged whether the number of data in the density area of the re-determined maximum value is greater than the density threshold; this repeats until the number of data in the density area of the determined maximum value is greater than the density threshold, at which point that maximum value is the boundary value. This improves the accuracy of the boundary value and prevents abnormal data used as a boundary value from degrading the subsequent calculation of the detection threshold.
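The iterative rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `upper_boundary` and the callables `radius_of` and `threshold_of` are hypothetical, counting the candidate maximum itself as part of its density area is an assumption, and so is falling back to the original maximum when every candidate is rejected.

```python
from typing import Callable, Sequence


def upper_boundary(
    values: Sequence[float],
    radius_of: Callable[[Sequence[float]], float],
    threshold_of: Callable[[int], float],
) -> float:
    """Return the largest value whose density area holds enough data."""
    if not values:
        raise ValueError("values must not be empty")
    remaining = sorted(values)
    while remaining:
        candidate = remaining[-1]
        radius = radius_of(remaining)
        # Count points within `radius` of the candidate (candidate included).
        in_area = sum(1 for v in remaining if abs(v - candidate) <= radius)
        if in_area > threshold_of(len(remaining)):
            return candidate
        remaining.pop()  # treat the candidate as abnormal and retry
    return max(values)  # assumed fallback when no candidate qualifies


# Hypothetical data: one isolated spike (60) above a cluster near 40.
sample = [10, 12, 35, 36, 40, 60]
result = upper_boundary(
    sample,
    radius_of=lambda vs: (max(vs) - min(vs)) / 4,  # assumed 4 partitions
    threshold_of=lambda n: 2.0,                    # assumed constant threshold
)
```

With this sample the spike at 60 has no neighbours in its density area and is discarded, and the next maximum, 40, is accepted because three points fall within its area.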
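  • the density area is determined as follows: the number of partitions of the set of data is determined according to the maximum and minimum values of the set of data; the area radius is determined according to the maximum value, the minimum value, and the number of partitions; the density area is the area centered on the maximum value with the area radius as its radius.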
  • the number of partitions of a set of data is determined from its maximum and minimum values, and the area radius is determined from the maximum value, the minimum value, and the number of partitions, so that the resulting area radius divides the density area more accurately. The density area can then accurately characterize the data distribution near the maximum value, which makes it possible to judge whether the data near the maximum value is abnormal, whether the maximum value conforms to the law of data change, and whether the maximum value should be used in the subsequent detection threshold calculation. This increases the accuracy of the detection threshold calculation.
  • determining the credible boundary index value of the first major period according to the boundary index value of the first major period includes:
  • up_boundary is the credible boundary index value of the first major cycle;
  • up_p is the boundary index value of the first major cycle;
  • K is the fluctuation coefficient;
  • eps is the area radius, determined according to the maximum value and minimum value of the boundary index values of the small periods in the first major cycle and the number of partitions; the number of partitions is determined according to the number of boundary index values of the small periods in the first major cycle;
  • base is determined according to the maximum value and the minimum value.
  • the credible boundary index value is equal to the sum of the boundary index value, twice the area radius, and base.
  • adding twice the area radius to the boundary index value sets a reasonable fluctuation interval for subsequent data on top of the boundary index value, and base additionally tolerates a small amount of occasional data beyond that interval, so that the false alarm rate is reduced while the accuracy of anomaly detection is maintained.
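Read together with the symbol definitions, this description suggests a relation of the form up_boundary = up_p + K * eps + base with K = 2. The formula itself is not reproduced in this text, so the sketch below is an assumption; the function name and the sample values for eps and base are hypothetical.

```python
def credible_boundary(up_p: float, eps: float, base: float, k: float = 2.0) -> float:
    """Assumed form: boundary value + k * area radius + base."""
    return up_p + k * eps + base


# Hypothetical inputs: boundary 48.0 ms, area radius 3.75 ms, base 1.0 ms.
threshold = credible_boundary(up_p=48.0, eps=3.75, base=1.0)
```

Keeping `k` as a parameter lets the fluctuation coefficient be tuned without changing the rest of the calculation.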
  • it also includes:
  • the maximum value in the group of data is used as the boundary value of the group of data.
  • the maximum value in the set of data can be used as the boundary value to prevent the detection threshold from being unable to be calculated because the boundary value is empty.
  • the density threshold is set according to the data volume of the set of data, including:
  • the density threshold is determined by the amount of data, which can increase the rationality and accuracy of the density threshold.
  • each small period in the first large period is the same time period.
  • an embodiment of the present invention provides an abnormality detection device, the device includes:
  • An acquisition unit configured to determine at least one index value of each small period of the indicator to be detected in the first large period for the collected data in the first large period;
  • the processing unit is configured to determine the boundary index value of the indicator to be detected in each small period for at least one index value of each small period according to preset boundary rules; according to the boundary index value of each small period, according to the The boundary rule, determining the boundary index value of the index to be detected in the first major period;
  • the processing unit is further configured to determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as the detection threshold for anomaly detection on the data collected in the second large period; the second large period is a period after the first large period.
  • an embodiment of the present application further provides a computing device, including: a memory, configured to store program instructions; a processor, configured to call the program instructions stored in the memory, and execute according to the obtained program as in the first aspect
  • embodiments of the present application also provide a computer-readable non-volatile storage medium including computer-readable instructions; when a computer reads and executes the computer-readable instructions, the computer performs the method as in the first aspect.
  • FIG. 1 is a schematic diagram of an anomaly detection architecture provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of an abnormality detection method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a table of average delay data collected in a small period according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a curve of average delay data collected in a small period according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of yet another anomaly detection method provided by an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of an abnormality detection device provided by an embodiment of the present invention.
  • Fig. 1 is an anomaly detection system architecture provided by an embodiment of the present invention.
  • the data collection module 101 sends the collected data of each small cycle, or the collected data of the first large cycle, to the calculation module 102; the calculation module 102 calculates the boundary index value of each small cycle from the data sent by the data collection module 101, then calculates the boundary index value of the first major cycle from the boundary index values of the minor cycles in the first major cycle, and further calculates the credible boundary index value of the first major cycle.
  • the calculation module 102 sends the obtained credible boundary index value to the detection module 103, and the detection module 103 uses the credible boundary index value as the detection threshold to detect whether the data in the second major cycle is abnormal.
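The role of the detection module 103 can be illustrated with a short Python sketch. It assumes an upper-boundary threshold and treats any value strictly above the threshold as abnormal; the comparison direction, the function name `detect_anomalies`, and the sample values are assumptions for illustration only.

```python
from typing import Iterable, List


def detect_anomalies(values: Iterable[float], threshold: float) -> List[float]:
    """Return the values that exceed the detection threshold."""
    return [v for v in values if v > threshold]


# Hypothetical average delays (ms) collected in the second major cycle.
second_cycle = [35.0, 41.5, 58.0, 39.0]
anomalies = detect_anomalies(second_cycle, threshold=56.5)
```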
  • the embodiment of the present application provides a method flow of anomaly detection, as shown in FIG. 2, including:
  • Step 201 With regard to the collected data in the first major cycle, determine at least one index value of each minor cycle of the index to be detected in the first major cycle;
  • the first major cycle can be ten days, one week, four days, three days, etc.; the minor cycle can be six hours, three hours, one hour, etc. The first major cycle and the minor cycle can be set according to the patterns of historical data and specific needs, and are not specifically limited.
  • the index to be detected is the index that needs to be monitored, for example, the operating indexes of chip lithography equipment (resolution, alignment accuracy, etc.) or the transaction indexes of a wealth management product newly launched by a bank (transaction volume, transaction delay, etc.).
  • the collected data is the data generated corresponding to the indicators to be detected. For example, the resolution and alignment accuracy of the chip's lithography equipment need to be collected to determine whether the accuracy of the lithography equipment is qualified.
  • the index value is the value that reflects the operating level, such as the maximum value and the minimum value in the index data in a small period.
  • Step 202 For at least one indicator value of each small period, determine the boundary indicator value of the indicator to be detected in each small period according to a preset boundary rule;
  • the boundary index value is the upper boundary index value, lower boundary index value, discrete boundary, convergence boundary, conversion boundary, etc. of the group of index data collected in the corresponding small period. The specific type of boundary index value is determined by the data characteristics and change rules of the index to be detected, and is not specifically limited.
  • the preset boundary rule may be: for a group of data, determine the density area of the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group of data; otherwise, the maximum value is deleted from the group of data and the process returns to the step of determining the density area of the maximum value in the data; the density threshold is set according to the data volume of the group of data.
  • alternatively, the preset boundary rule may be: for a group of data, determine the density area of the minimum value in the data; if the number of data in the density area is greater than the density threshold, the minimum value is determined as the boundary value of the group of data; otherwise, the minimum value is deleted from the group of data and the process returns to the step of determining the density area of the minimum value in the data; the density threshold is set according to the data volume of the group of data.
  • the preset boundary rule is the rule for obtaining the boundary value. From the group of data for which the boundary value is needed, determine the maximum value and the density area of the maximum value. If the number of data in the density area of the maximum value is greater than the density threshold, the maximum value is the boundary value of the group of data. Otherwise, the maximum value is deleted from the group of data, a new maximum value is determined in the remaining data, and it is judged whether the number of data in the density area of this second maximum value is greater than the density threshold; if it is, the second maximum value is the boundary value of the group of data; otherwise, the second maximum value is deleted as well, a third maximum value is determined in the data from which the first and second maximum values have been deleted, and the judgment is repeated. This continues until the number of data in the density area of the determined maximum value is greater than the density threshold, and that maximum value is used as the boundary value, that is, the boundary index value.
  • likewise, the minimum value and the density area of the minimum value are determined from the group of data for which the boundary value is needed. If the number of data in the density area of the minimum value is greater than the density threshold, the minimum value is the boundary value of the group of data. Otherwise, the minimum value is deleted from the group of data, a new minimum value is determined in the remaining data, and it is judged whether the number of data in the density area of this second minimum value is greater than the density threshold; if it is, the second minimum value is the boundary value of the group of data; otherwise, the second minimum value is deleted as well, a third minimum value is determined in the data from which the first and second minimum values have been deleted, and the judgment is repeated. This continues until the number of data in the density area of the determined minimum value is greater than the density threshold, and that minimum value is used as the boundary value, that is, the boundary index value.
  • the density threshold is set according to the data volume of the set of data, including:
  • the corresponding function can be set according to data characteristics such as the density change rate of the collected data and the variance of the data.
  • for example, the density threshold is determined by a power function, or the density threshold is determined by a logarithmic function.
  • the values of the relevant coefficients are set according to actual needs, and the calculation method of the density threshold is not specifically limited.
  • the density area is determined as follows: determining the number of partitions of the group of data according to the maximum and minimum values of the group of data; determining the area radius according to the maximum and minimum values of the group of data and the number of partitions; and determining the density area with the maximum value as the center and the area radius as the radius.
  • the characteristics of the index data can be analyzed from historical index data of the same type, or of a different type but with the same change pattern, and the number of partitions can be obtained by a corresponding operation on the maximum and minimum values. The area radius determined by the number of partitions and by the maximum and minimum values of the group of data then divides the density area more accurately, so that the number of data in the density area of an index value accurately indicates whether the index value is abnormal, and the rationality of the index value can be judged accurately.
  • FIG. 3 is a schematic diagram of a table of average delay data collected in a small period according to an embodiment of the present invention. As shown in FIG. 3, the transactions that occur within each minute are collected in real time, and the average delay of the transactions within each minute is calculated. In the shaded part, the maximum average delay is 60ms and the minimum average delay is 10ms. The average of the maximum value and the minimum value is 35ms, and the number of partitions can be determined by the following formula:
  • block_edge = [(-1,4),(50,4),(200,6),(500,8),(1000,10),(2000,14),(mean+1,14)]
  • Here mean is the average of the maximum and minimum values, and mean+1 is the value that is looked up. For (-1,4),(50,4), a mean+1 greater than -1 and less than 50 corresponds to 4 partitions. For (50,4),(200,6), a mean+1 greater than 50 and less than 200 corresponds to 4 partitions. For (200,6),(500,8), a mean+1 greater than 200 and less than 500 corresponds to 6 partitions, and so on; for (2000,14),(mean+1,14), a mean+1 greater than 2000 corresponds to 14 partitions. The number of partitions corresponding to an average value of 35, that is, mean+1 = 36, is therefore 4.
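The block_edge lookup can be sketched in Python as a step function over the listed edges. The edges and partition counts come from the text; the function name `partition_count`, the use of `bisect_right`, and the treatment of a key that falls exactly on an edge (assigned to the interval above it) are assumptions, since the text only covers values strictly between edges.

```python
from bisect import bisect_right

# Lower edges and partition counts taken from block_edge in the text.
EDGES = [-1, 50, 200, 500, 1000, 2000]
PARTITIONS = [4, 4, 6, 8, 10, 14]


def partition_count(max_value: float, min_value: float) -> int:
    """Look up the partition count for mean + 1, where mean = (max + min) / 2."""
    key = (max_value + min_value) / 2 + 1
    index = bisect_right(EDGES, key) - 1
    if index < 0:
        raise ValueError("mean + 1 is below the first edge of block_edge")
    return PARTITIONS[index]


first_pass = partition_count(60, 10)   # mean 35, key 36
second_pass = partition_count(40, 10)  # mean 25, key 26
```

Both calls fall in the interval between -1 and 50, which matches the 4 partitions stated for the example.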
  • the area radius is determined according to the maximum and minimum values of the set of data and the number of partitions.
  • the difference between the maximum value and the minimum value is divided by the number of partitions to get the area radius; here, (60 - 10) / 4 = 12.5.
  • the density area is the interval of average delay [47.5, 72.5], that is, [60 - 12.5, 60 + 12.5].
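The arithmetic of this example can be checked with a short Python sketch. The function name `density_area` is hypothetical; the radius is the difference between the maximum and minimum divided by the number of partitions, as stated above.

```python
from typing import Tuple


def density_area(max_value: float, min_value: float, partitions: int) -> Tuple[float, float]:
    """Return the interval centered on max_value with the area radius as half-width."""
    eps = (max_value - min_value) / partitions  # area radius
    return (max_value - eps, max_value + eps)


area = density_area(60, 10, 4)  # radius 12.5 around the maximum of 60
```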
  • FIG. 4 is a schematic diagram of the curve of the average delay data collected in a small period according to an embodiment of the present invention.
  • Between the baseline at 72.5 and the baseline at 47.5 there is only the maximum average delay of 60ms itself, that is, the number of data in the density area is 0.
  • The maximum average delay of 60ms is therefore deleted from the data group of this small period, and the maximum average delay is re-determined to be 40ms.
  • The minimum average delay is 10ms, and the average of the maximum and minimum average delays is 25ms.
  • The number of partitions determined by the block_edge formula is 4.
  • The area radius is then (40 - 10) / 4 = 7.5, and the density area is the interval of average delay [32.5, 47.5].
  • Within this density area lie the maximum average delay of 40ms, the average delay of 36ms, and the average delay of 35ms, that is, the number of data in the density area is 3.
  • the density threshold is:
  • Since the number of data in the density area, 3, is greater than the density threshold of 2.7708520116421, the average delay of 40ms is the boundary index value, which can also be called the upper boundary index value.
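The formula for the density threshold is not reproduced in this text. As an observation only, log10(n) + 1 with n = 59 evaluates to about 2.7708520116421, which matches the value quoted above; whether this is the function used in the embodiment, and exactly which data n counts, are assumptions. The Python sketch below uses that assumed logarithmic form to repeat the comparison.

```python
import math


def density_threshold(n: int) -> float:
    """Assumed logarithmic form of the density threshold: log10(n) + 1."""
    return math.log10(n) + 1


threshold = density_threshold(59)  # assumed n = 59
in_area = 3                        # data count in the density area, from the text
is_boundary = in_area > threshold  # the maximum is accepted when this holds
```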
  • the density threshold can be calculated according to the number of data without the maximum deleted.
  • eps can be calculated using the deleted maximum value and the minimum value in the group of data.
  • the calculation method of the density threshold and the area radius is not specifically limited.
  • the operation and maintenance personnel can set a preset ratio to quickly determine whether the transaction delay data generated in the first major cycle can be used. For example, if the ratio of the time in a week during which no transaction is generated to the time during which transactions are generated is less than 15%, the boundary index value of the transaction delay can be calculated from that week's transaction delay data. Alternatively, some time periods of each day in a week contain very little data, for example at night or in the early morning when the number of transactions is small; for these, the preset ratio of no-transaction time to transaction time can be set according to users' schedules and habits to determine whether the transaction delay data can be used to calculate the boundary index value.
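The ratio check can be sketched in Python as follows. The 15% figure comes from the example above; the function name `usable_for_boundary`, the use of minutes as the time unit, and the handling of a cycle with no transaction time at all are assumptions.

```python
def usable_for_boundary(idle_minutes: float, active_minutes: float,
                        max_ratio: float = 0.15) -> bool:
    """Return True when no-transaction time is small relative to transaction time."""
    if active_minutes <= 0:
        return False  # assumed: a cycle without any transactions is unusable
    return idle_minutes / active_minutes < max_ratio


# Hypothetical weeks of 10080 minutes each.
busy_week = usable_for_boundary(idle_minutes=1000, active_minutes=9080)
quiet_week = usable_for_boundary(idle_minutes=3000, active_minutes=7080)
```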
  • the maximum value in the group of data is taken as the boundary value of the group of data. That is to say, when the number of data in the small period is small or the data characteristics are special, and the boundary value cannot be calculated according to the preset boundary rules, the maximum value in the group of data is used as the boundary value of the group of data.
  • Step 203 Determine the boundary index value of the index to be detected in the first major period according to the boundary index value of each small period according to the boundary rule;
  • In step 202 the boundary index value of each small period has been calculated; the boundary index values of the small periods in the first large period are taken, and the boundary index value of the indicator to be detected in the first large period is calculated according to the boundary rule.
  • the boundary rule may be: for a group of data, determine the density area of the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group of data; otherwise, the maximum value is deleted from the group of data and the process returns to the step of determining the density area of the maximum value in the data; the density threshold is set according to the data volume of the group of data.
  • alternatively, the preset boundary rule may be: for a group of data, determine the density area of the minimum value in the data; if the number of data in the density area is greater than the density threshold, the minimum value is determined as the boundary value of the group of data; otherwise, the minimum value is deleted from the group of data and the process returns to the step of determining the density area of the minimum value in the data; the density threshold is set according to the data volume of the group of data.
  • here, a set of data consists of the boundary index values of the small periods corresponding to the first large period, where the small periods may be all small periods in the first large period. For example, if the first major cycle is the seven days of a week and the minor cycle is one hour, then the first major cycle includes 7 × 24 minor cycles, corresponding to 168 boundary index values; these 168 boundary index values form one set of data, and the boundary index value of the first major period is calculated from them.
  • the small periods in the first large period may also be the same time period. For example, the first major cycle is the seven days of a week, and the minor cycle is one hour.
  • the first major cycle then includes 7 minor cycles of 0:00-1:00, 7 minor cycles of 1:00-2:00, 7 minor cycles of 2:00-3:00, ..., and 7 minor cycles of 23:00-24:00. Therefore, the boundary index values of the 7 minor cycles of 0:00-1:00 form one set of data, the boundary index values of the 7 minor cycles of 1:00-2:00 form one set of data, the boundary index values of the 7 minor cycles of 2:00-3:00 form one set of data, ..., and the boundary index values of the 7 minor cycles of 23:00-24:00 form one set of data.
  • in this way, the boundary index value corresponding to 0:00-1:00 in the first major period, the boundary index value corresponding to 1:00-2:00 in the first major period, ..., and the boundary index value corresponding to 23:00-24:00 in the first major period are determined. Thus, for each time period of the day, a corresponding boundary index value of the first major cycle is obtained. If analysis determines that there are almost no transactions, or very few transactions, in the small periods of some time period of the day, the boundary index value for that time period need not be calculated.
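The grouping by time of day can be sketched in Python. The sketch assumes the 168 hourly boundary index values of a week are stored in chronological order starting at 0:00 on the first day; the function name `group_by_hour` and the placeholder values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_hour(boundaries: Sequence[float], hours_per_day: int = 24) -> Dict[int, List[float]]:
    """Group chronologically ordered hourly boundary values by hour of day."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for i, value in enumerate(boundaries):
        groups[i % hours_per_day].append(value)
    return dict(groups)


week = [float(i) for i in range(7 * 24)]  # placeholder boundary values
groups = group_by_hour(week)              # 24 sets of 7 values each
```

Each of the 24 sets can then be passed separately through the boundary rule to obtain one boundary index value per time period of the day.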
  • the method of determining the boundary value from a set of data has been described in detail in step 202. Here, the set of data of the small period in step 202 is replaced with the set of data of the first large period described above, and the same calculation determines the boundary index value of the first large period; the details are not repeated here.
  • the density area is determined by the following methods, including:
  • the density area is determined.
  • the function for calculating the density threshold can be determined according to data characteristics such as the number of small cycles in the first major cycle and the variance of the data.
  • the density threshold is determined by a power function, or the density threshold is determined by a logarithmic function.
  • the function of the density threshold calculated in the first large period may be the same as or different from the function of the density threshold calculated in the small period.
  • the calculation method of the density threshold is not specifically limited.
  • The density area is determined as follows: determine the number of partitions of the group of data according to its maximum and minimum values; determine the area radius according to the maximum value, the minimum value, and the number of partitions; determine the density area, with the maximum value as the center and the area radius as the radius.
  • The method of determining the density area from a group of data has been described in detail in step 202. Here, the group of data from a small period in step 202 is replaced with the group formed by the boundary index values of the small periods in the first major cycle, and the same calculation yields the density area of the first major cycle; the details are not repeated.
  • In the previous example for the newly launched wealth-management product A, the boundary index value of the small period (assume it is the 0:00-1:00 small period on Monday) is 40 ms. If the small periods in the first major cycle cover the same time slot, the boundary index values of the 0:00-1:00 small periods on the other days of the week are obtained in the same way: 40 ms on Tuesday, 45 ms on Wednesday, 47.5 ms on Thursday, 48 ms on Friday, 48.5 ms on Saturday, and 55 ms on Sunday. In this group of data, the maximum value is 55 ms, the minimum value is 40 ms, and the average of the maximum and minimum is 47.5 ms.
  • Note that when the first major cycle contains many data and the number of deleted maximum values is too small to affect the density threshold, the density threshold can be calculated from the number of data before any maximum was deleted.
  • In special cases, for example when the data are stable and differ little from one another, eps can be calculated from the deleted maximum value and the minimum value of the group of data.
  • The calculation of the density threshold and the area radius is not specifically limited here.
  • If no boundary value is determined according to the boundary rule, the maximum value in the group of data is taken as the boundary value of the group. That is, when the first major cycle contains very few data or the data have unusual characteristics, so that the boundary value cannot be calculated according to the preset boundary rule, the maximum value in the group of data is used as its boundary value.
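The boundary rule, including the fallback to the maximum value, can be sketched as below. This is a hedged illustration under stated assumptions: the names `partition_count` and `boundary_value` are introduced here; the logarithm is taken as base 10 and the candidate maximum is not counted as its own neighbour, both read off the worked example in step 202; the sample list is hypothetical and merely mirrors that example's maximum (60), minimum (10), and second maximum (40).

```python
import math

# Lookup from the text: (lower edge for mean + 1, number of partitions).
BLOCK_EDGES = [(-1, 4), (50, 4), (200, 6), (500, 8), (1000, 10), (2000, 14)]


def partition_count(values):
    """Number of partitions, keyed on (max + min) / 2 + 1."""
    key = (max(values) + min(values)) / 2 + 1
    count = BLOCK_EDGES[0][1]
    for lower_edge, partitions in BLOCK_EDGES:
        if key > lower_edge:  # behaviour exactly at an edge is an assumption
            count = partitions
    return count


def boundary_value(data):
    """Return (upper boundary value, eps) of `data` under the boundary rule."""
    if not data:
        raise ValueError("data must not be empty")
    remaining = sorted(data)
    while remaining:
        top = remaining[-1]
        eps = (top - remaining[0]) / partition_count(remaining)
        # Count data inside [top - eps, top + eps], excluding `top` itself.
        neighbours = sum(1 for v in remaining[:-1] if top - eps <= v <= top + eps)
        density_threshold = math.log10(len(remaining)) + 1
        if neighbours > density_threshold:
            return top, eps
        remaining.pop()  # treat `top` as an outlier and try the next maximum
    # Fallback: no boundary found, so use the maximum of the whole group.
    return max(data), (max(data) - min(data)) / partition_count(data)


# Hypothetical sample: 60 is isolated, 40 has three close neighbours.
boundary, eps = boundary_value([10, 35, 36, 36, 40, 60])
```

With this sample the first pass rejects 60 (no neighbours within 12.5) and the second pass accepts 40 with eps = 7.5. A single-element group exercises the fallback branch.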
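  • Step 204: Determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle.
  • The boundary index value of the first major cycle can be used directly as its credible boundary index value, or some simple calculations can be applied to it to obtain the credible boundary index value, which makes the credible boundary index value more reliable.
  • Determining the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle includes calculating by the following formula:
  • up_boundary = up_p + K*eps + base
  • where up_boundary is the credible boundary index value of the first major cycle; up_p is the boundary index value of the first major cycle; K is the fluctuation coefficient; eps is the area radius, determined according to the maximum value, the minimum value, and the number of partitions of the boundary index values of the small periods in the first major cycle; the number of partitions is determined according to the number of boundary index values of the small periods in the first major cycle; and base is determined according to the maximum value and the minimum value.
  • In the previous example, up_p = 48 and eps = 1, so the credible boundary index value is up_boundary = up_p + K*eps = 50 (that is, K = 2). The value of K can be determined by technicians from the fluctuation amplitude of the data, taken from past industry experience with the product concerned, or set as recommended by industry experts; it can be set flexibly and is not specifically limited.
  • If the data vary widely while their growth rate remains normal, a base value can be added to up_boundary = up_p + K*eps to reduce false alarms. It represents the extra fluctuation allowed when the data are normal, for example base = 0.1*mean, where mean may be the non-zero average of all data in the group or the average of the maximum and minimum values in the group. The coefficient 0.1 can be adjusted according to the analysis of engineers; it is given here only as one possible implementation.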
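The formula up_boundary = up_p + K*eps + base can be checked with a few lines of code. The sketch below is illustrative only: the function names and default arguments are assumptions, K = 2 is inferred from the worked numbers (48 + K*1 = 50), and base uses the max/min average variant of mean with the coefficient 0.1 mentioned in the text.

```python
import math


def credible_boundary(up_p, eps, k=2.0, base=0.0):
    """up_boundary = up_p + K * eps + base."""
    return up_p + k * eps + base


def base_from_extremes(max_value, min_value, coefficient=0.1):
    """base = coefficient * mean, with mean taken as (max + min) / 2."""
    return coefficient * (max_value + min_value) / 2


# Worked numbers from the text: up_p = 48, eps = 1, no base term.
without_base = credible_boundary(48, 1)

# Same numbers with a base term computed from max = 48 and min = 40.
with_base = credible_boundary(48, 1, base=base_from_extremes(48, 40))
```

Adding the base term widens the tolerated band above the boundary index value, which trades a slightly later alarm for fewer false alarms on volatile data.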
  • Step 205 The credible boundary index value of the first major period is used as a detection threshold for abnormal detection of the collected data in the second major period; the second major period is the period after the first major period.
  • the second major cycle may be a major cycle immediately after the first major cycle, for example, the first week of January is the first major cycle, and the second week of January is the second major cycle.
  • the second major cycle may be a major cycle that is not immediately after the first major cycle, for example, the first week of January is the first major cycle, and the fourth week of January is the second major cycle.
  • the second major cycle can be a major cycle that overlaps with the first major cycle. For example, Monday to Sunday of the first week of January is the first major cycle, and Tuesday to Sunday of the first week of January plus Monday of the second week of January is the second major cycle.
  • In the previous example, the credible boundary index value that the first major cycle determines from 168 data can be used to check the data generated at each moment (ms, s, min, etc.) of the second major cycle; when data in the second major cycle exceed the credible boundary index value, the data can be judged abnormal and an alarm is triggered.
  • Alternatively, the credible boundary index value that the first major cycle determines from the 7 data of one time slot can be used to check the data generated at each moment (ms, s, min, etc.) of the corresponding time slot in the second major cycle; when data in that time slot exceed the credible boundary index value, the data can be judged abnormal and an alarm is triggered.
  • As a further alternative, the credible boundary index value of a given time slot on a given day of the first major cycle can be used to check the data generated in the corresponding time slot on the corresponding day of the second major cycle. For example, if the credible boundary index value for 0:00-1:00 on the first day of the first major cycle is a, it can be used to check the data generated at 0:00-1:00 on the first day of the second major cycle; if the credible boundary index value for 0:00-1:00 on the second day of the first major cycle is b, it can be used to check the data generated at 0:00-1:00 on the second day of the second major cycle.
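Checking second-cycle data against per-slot thresholds can be sketched as follows. The function name `detect_anomalies`, the `(day, hour)` keys, and all numbers are hypothetical; only the comparison itself (data greater than the credible boundary index value triggers an alarm) comes from the text.

```python
def detect_anomalies(samples, thresholds):
    """Return the samples that exceed the threshold of their time slot.

    `samples` is an iterable of (day, hour, value) from the second major cycle.
    `thresholds` maps (day, hour) -> credible boundary index value taken from
    the first major cycle. Slots without a threshold are skipped.
    """
    alarms = []
    for day, hour, value in samples:
        limit = thresholds.get((day, hour))
        if limit is not None and value > limit:
            alarms.append((day, hour, value, limit))
    return alarms


# Hypothetical thresholds a = 50.0 (day 0) and b = 52.0 (day 1) for 0:00-1:00.
thresholds = {(0, 0): 50.0, (1, 0): 52.0}
samples = [(0, 0, 49.0), (0, 0, 61.0), (1, 0, 52.0), (1, 0, 53.5), (2, 0, 99.0)]
alarms = detect_anomalies(samples, thresholds)
```

Skipping slots without a threshold matches the earlier remark that slots with almost no transactions need not have a boundary index value; whether such data should instead raise an alarm is a design choice the text leaves open.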
  • Note that the length of the first major cycle and the length of the second major cycle can be the same or different. For example, the first major cycle is two days and the second major cycle is one day, or the first major cycle is one day and the second major cycle is two days.
  • The data can be refreshed promptly by shortening the first and second major cycles. For example, with a first major cycle of one day and a second major cycle of one day, the credible boundary index value of a time slot in the first major cycle is used to check the data generated in the corresponding time slot of the second major cycle.
  • Neither the time span of the first major cycle used to determine the credible boundary index value nor the time span of the second major cycle checked with it is specifically limited; both can be adjusted flexibly according to the fluctuation characteristics of the data or as needed.
  • With the above method, the first major cycle contains multiple small periods; data in the first major cycle are collected and at least one index value is determined for each small period in the first major cycle. For the index values of each small period, the boundary index value of that small period is obtained according to the preset boundary rule.
  • In this way, the boundary rule selects among the index values of each small period, so the boundary index value obtained for each small period is more accurate.
  • Further, the boundary rule is applied again to the boundary index values of the small periods of the first major cycle, so the boundary index value obtained for the first major cycle is more accurate.
  • Furthermore, the credible boundary index value of the first major cycle is derived from the boundary index value of the first major cycle and used as the detection threshold for data generated afterwards. Because the credible boundary index value is an adjustment on top of an accurate boundary index value, data collected afterwards can fluctuate within a normal, reasonable range without triggering an anomaly and causing a false alarm.
  • This realizes adaptive adjustment of the detection threshold, improves the accuracy of the detection threshold, and further improves the accuracy of anomaly detection.
  • the embodiment of the present application provides yet another method flow of anomaly detection, as shown in FIG. 5, including:
  • Step 501 Set the first large period and each small period.
  • setting the first major period includes setting period size, period start time, period end time and other related parameters.
  • Setting each small period includes setting period size, period start time and period end time and other related parameters.
  • each small period is all small periods in the first large period, or each small period in the first large period is the same time period, etc. The relationship between each small period and the first large period can be flexibly set.
  • Step 502 Collect data in each small period.
  • Here, determine the product or equipment whose data need to be checked, determine the index to be detected for that product or equipment, and collect data for that index.
  • Step 503 According to the number of data in each small period, respectively calculate the density threshold corresponding to each small period.
  • Step 504 Determine the maximum value and the minimum value in the data of each small period respectively.
  • Step 505: For each small period, determine the number of partitions of its data according to the maximum and minimum values of the data, determine its area radius according to the maximum value, the minimum value, and the number of partitions, and determine its density area according to the maximum value and the area radius.
  • Step 506 Determine the number of data in the density area corresponding to the maximum value of each small period.
  • Step 507: Determine, for each small period, whether the number of data in its density area is greater than the density threshold. If not, perform step 508: delete the current maximum value from that small period, re-determine the maximum value, and repeat steps 505, 506, and 507 until the number of data in the density area of the small period is greater than the density threshold. If so, perform step 509.
  • Step 509 Use the maximum value corresponding to the density area when the number of data in the density area of the small period is greater than the density threshold as the boundary index value of the small period, and similarly obtain the boundary index value of each small period.
  • Step 510 Determine the number of the boundary index values of the small period in the first large period, and determine the density threshold.
  • Step 511 Determine the maximum value and the minimum value of the boundary index value of the small period in the first large period.
  • Step 512 Determine the number of partitions corresponding to the small period boundary index value data in the first major period according to the maximum value and the minimum value, determine the area radius according to the maximum value, minimum value and the number of partitions, and then determine the first major period according to the area radius The density area.
  • Step 513 Determine the number of data in the density area of the first major period.
  • Step 514: Determine whether the number of data in the density area of the first major cycle is greater than the density threshold of the first major cycle. If not, perform step 515: delete the current maximum value from the data group of the first major cycle, re-determine the maximum value, and repeat steps 511, 512, and 513 until the number of data in the density area of the first major cycle is greater than the density threshold. If so, perform step 516.
  • Step 516 When the number of data in the density area of the first major period is greater than the density threshold, the maximum value of the density area corresponding to the first major period is used as the boundary index value of the first major period.
  • Step 517 Obtain the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle.
  • Step 518 Monitor and detect the data of the second major cycle according to the credible boundary index value of the first major cycle, and detect whether the data of the second major cycle is abnormal.
  • Note that the order of the above steps is not unique; for example, step 502 can be performed before step 501.
  • FIG. 6 is a schematic diagram of an abnormality detection device provided by an embodiment of the application, as shown in FIG. 6, including:
  • the collection unit 601 is configured to determine at least one index value of each small period of the index to be detected in the first large period for the collected data in the first large period;
  • the processing unit 602 is configured to determine the boundary index value of the indicator to be detected in each small period for at least one index value of each small period according to preset boundary rules; according to the boundary index value of each small period, according to The boundary rule determines the boundary index value of the index to be detected in the first major period;
  • the processing unit 602 is further configured to determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as the detection threshold for anomaly detection on data collected in the second major cycle; the second major cycle is a cycle after the first major cycle.
  • In a possible design, the boundary rule is that, for a group of data, the boundary value is found by determining the density area starting from the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group; otherwise the maximum value is deleted from the group and the step of determining the density area from the maximum value is repeated; the density threshold is set according to the data volume of the group.
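  • In a possible design, the density area is determined as follows:
  • determine the number of partitions of the group of data according to its maximum and minimum values;
  • determine the area radius according to the maximum value, the minimum value, and the number of partitions of the group of data;
  • determine the density area, with the maximum value as the center and the area radius as the radius.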
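  • In a possible design, the processing unit 602 is specifically configured to calculate by the following formula:
  • up_boundary = up_p + K*eps + base
  • where up_boundary is the credible boundary index value of the first major cycle; up_p is the boundary index value of the first major cycle; K is the fluctuation coefficient; eps is the area radius, determined according to the maximum value, the minimum value, and the number of partitions of the boundary index values of the small periods in the first major cycle; the number of partitions is determined according to the number of boundary index values of the small periods in the first major cycle; and base is determined according to the maximum value and the minimum value.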
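  • In a possible design, the processing unit 602 is further configured to: if no boundary value is determined according to the boundary rule, use the maximum value in the group of data as the boundary value of the group.
  • In a possible design, the density threshold is set according to the data volume of the group of data, including: the larger the data volume of the group, the larger its density threshold.
  • In a possible design, the small periods in the first major cycle cover the same time slot.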
  • Those skilled in the art should understand that the embodiments of this application can be provided as a method, a system, or a computer program product. This application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Debugging And Monitoring (AREA)

Abstract

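Embodiments of the present invention provide an anomaly detection method and apparatus. The method includes: for data collected in a first major cycle, determining at least one index value of an index to be detected in each small period of the first major cycle; for the at least one index value of each small period, determining a boundary index value of the index to be detected in that small period according to a preset boundary rule; determining a boundary index value of the index to be detected in the first major cycle from the boundary index values of the small periods, according to the boundary rule; determining a credible boundary index value of the first major cycle from the boundary index value of the first major cycle; and using the credible boundary index value of the first major cycle as the detection threshold for anomaly detection on data collected in a second major cycle, the second major cycle being a cycle after the first major cycle. The method realizes adaptive adjustment of the detection threshold, improves the accuracy of the detection threshold, and further improves the accuracy of anomaly detection.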

Description

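Anomaly detection method and apparatus
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010196303.0, entitled "Anomaly detection method and apparatus" (一种异常检测的方法及装置), filed with the China Patent Office on March 19, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of operation and maintenance technology in financial technology (Fintech), and in particular to an anomaly detection method and apparatus.
Background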
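With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Because of the security and real-time requirements of the financial industry, however, higher demands are also placed on the technology. With today's rapidly developing networks, most financial business can be handled directly by computers. This greatly saves human resources and processes financial business quickly and accurately, improving the accuracy and timeliness of financial business processing.
At present, computers can directly handle most business, for example detecting whether a wealth-management product is in a normal trading state. The trading state can be monitored through indicators such as transaction volume and transaction latency. When such an indicator is found to be abnormal, an alarm is generated; operation and maintenance staff then learn that trading is abnormal and restore normal trading by repairing the faulty transaction equipment or program, or by restricting the permissions of maliciously purchasing accounts. Anomalies in monitored indicators such as transaction volume or transaction latency can also be used to judge the effect of a product release or promotion.
There are two existing monitoring approaches. In the first, a threshold is set manually and an alarm is generated when the transaction volume or latency exceeds it; this approach is too rigid, needs constant manual adjustment, and wastes labor. In the second, a model (such as LSTM or ARIMA) learns the pattern of change from historical data without supervision and sets the threshold; the model needs considerable memory to store its parameters, and for highly volatile, random data the false-alarm rate is very high.
An anomaly detection method and apparatus are therefore urgently needed that can automatically adjust the anomaly threshold and improve the accuracy of anomaly detection while occupying little memory.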
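Summary of the invention
Embodiments of the present invention provide an anomaly detection method and apparatus that can automatically adjust the anomaly threshold and improve the accuracy of anomaly detection while occupying little memory.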
第一方面,本发明实施例提供一种异常检测的方法,该方法包括:
针对第一大周期内的采集数据,确定待检测指标在所述第一大周期内的各小周期的至少一个指标值;针对每个小周期的至少一个指标值,根据预设的边界规则,确定所述待检测指标在每个小周期的边界指标值;根据各小周期的边界指标值,按照所述边界规则,确定所述待检测指标在所述第一大周期的边界指标值;根据所述第一大周期的边界指标值,确定所述第一大周期的可信边界指标值;所述第一大周期的可信边界指标值作为对第二大周期内的采集数据进行异常检测的检测阈值;所述第二大周期为所述第一大周期之后的周期。
采用上述方法,第一大周期内包含多个小周期,采集第一大周期内的数据,确定第一大周期及中各个小周期内的至少一个指标值;对于每个小周期对应的指标值,根据预设边界规则,获取每个小周期对应的边界指标值。如此,可以通过边界规则对每个小周期的指标值进行选择,得到每个小周期对应的边界指标值,使得每个小周期对应的边界指标值更准确。进一步,通过第一大周期的各个小周期对应的边界指标值,再次按照边界规则得到第一大周期的边界指标值,使得第一大周期的边界指标值更准确。更进一步的,根据第一大周期的边界指标值得到第一大周期的可信边界指标值,以可信边界指标值作为之后产生的数据的检测阈值;如此,在第一大周期的边界指标值的准确性基础上做相关调整得到可信边界指标值,使得之后的第一大周期采集的数据可以在正常合理的范围波动而不会触发异常,造成异常误报。实现了检测阈值的自适应调整,提高了检测阈值的准确性,进一步提高了数据异常检测的准确率。
在一种可能的设计中,所述边界规则为对于一组数据的边界值是从数据中的最大值开始确定密度区域;若密度区域内的数据的个数大于密度阈值,则将所述最大值确定为该组数据的边界值;否则从该组数据中将所述最大值删除,返回从数据中的最大值开始确定密度区域的步骤;所述密度阈值依据该组数据的数据量进行设定。
采用上述方法,通过边界规则选择边界值;具体来说,判断最大值密度区域内的数据个数是否大于密度阈值,若是最大值密度区域内的数据个数大于密度阈值;则可以认为在最大值附近的数据密度符合数据变化规律,最大值为合理数据,可以作为边界值。若是最大值密度区域内的数据个数小于密度阈值,就可以判定该最大值附近的数据量异常小,不符合数据变化规律,该最大值很可能是异常数据,则将该最大值删除并重新在该组数据中重新确定最大值,继续判断重新确定的最大值的密度区域的数据个数是否大于密度阈值, 直到确定的最大值的密度区域中的数据个数大于密度阈值,则以该最大值为边界值。因此,可以提高确定边界值的准确性,防止因为边界值为异常数据,而影响后续计算检测阈值的准确性。
在一种可能的设计中,所述密度区域通过如下方式确定,包括:
根据该组数据的最大值、最小值,确定该组数据的分区数;
根据该组数据的最大值、最小值和所述分区数,确定区域半径;
以所述最大值为中心,所述区域半径为半径,确定所述密度区域。
采用上述方法,密度区域是以最大值为中心,区域半径为半径确定的区域。其中,根据一组数据的最大值、最小值,确定该组数据的分区数,通过最大值、最小值和分区数确定区域半径,可以使得到的区域半径更能准确划分密度区域,使得该密度区域可以准确表征最大值附近的数据分布特征,进而判断该最大值附近的数据是否处于异常,确定该最大值是否符合数据变化规律,再确定该最大值是否应用到后续检测阈值的计算中,如此,增加检测阈值计算的准确性。
在一种可能的设计中,根据所述第一大周期的边界指标值,确定所述第一大周期的可信边界指标值,包括:
通过如下公式计算:
up_boundary=up_p+K*eps+base
其中,up_boundary为所述第一大周期的可信边界指标值,up_p为所述第一大周期的边界指标值;K为波动系数;eps为区域半径,根据所述第一大周期内小周期的边界指标值的最大值、最小值和分区数确定;所述分区数为根据所述第一大周期内小周期的边界指标值的数据的个数确定的;base为根据所述最大值和所述最小值确定的。
采用上述方法,使可信边界指标值等于边界指标值、两倍的区域半径和base的和。边界指标值加两倍的区域半径,可以使得在边界指标值的基础上设定后续数据的合理波动区间,base的设置则可以使得在合理波动区间的基础上增加小部分偶然数据,在保证异常检测的准确率的情况下,减少误报率。
在一种可能的设计中,还包括:
若根据所述边界规则未确定出边界值,则将该组数据中的最大值作为该组数据的边界值。
采用上述方法,若根据边界规则无法确定出边界值,则可以将该组数据中的最大值作为边界值,防止因为边界值为空,出现无法计算检测阈值的情况。
在一种可能的设计中,所述密度阈值依据该组数据的数据量进行设定,包括:
该组数据的数据量越大,该组数据的密度阈值越大。
采用上述方法,通过数据量决定密度阈值,可以增加密度阈值的合理性与准确性。
在一种可能的设计中,所述第一大周期内的各小周期为同一时段。
第二方面,本发明实施例提供一种异常检测的装置,所述装置包括:
采集单元,用于针对第一大周期内的采集数据,确定待检测指标在所述第一大周期内的各小周期的至少一个指标值;
处理单元,用于针对每个小周期的至少一个指标值,根据预设的边界规则,确定所述待检测指标在每个小周期的边界指标值;根据各小周期的边界指标值,按照所述边界规则,确定所述待检测指标在所述第一大周期的边界指标值;
所述处理单元还用于,根据所述第一大周期的边界指标值,确定所述第一大周期的可信边界指标值;所述第一大周期的可信边界指标值作为对第二大周期内的采集数据进行异常检测的检测阈值;所述第二大周期为所述第一大周期之后的周期。
第三方面,本申请实施例还提供一种计算设备,包括:存储器,用于存储程序指令;处理器,用于调用所述存储器中存储的程序指令,按照获得的程序执行如第一方面的各种可能的设计中所述的方法。
第四方面,本申请实施例还提供一种计算机可读非易失性存储介质,包括计算机可读指令,当计算机读取并执行所述计算机可读指令时,使得计算机执行如第一方面的各种可能的设计中所述的方法。
本申请的这些实现方式或其他实现方式在以下实施例的描述中会更加简明易懂。
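Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an anomaly detection architecture provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an anomaly detection method provided by an embodiment of the present invention;
FIG. 3 is a table of average latency data collected in a small period, provided by an embodiment of the present invention;
FIG. 4 is a curve of average latency data collected in a small period, provided by an embodiment of the present invention;
FIG. 5 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of an anomaly detection apparatus provided by an embodiment of the present invention.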
具体实施方式
为了使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明作进一步地详细描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。
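FIG. 1 shows a system architecture for anomaly detection provided by an embodiment of the present invention. The data collection module 101 sends the data collected in each small period, or the data collected in the first major cycle, to the calculation module 102. The calculation module 102 calculates the boundary index value of each small period from the per-small-period data sent by the data collection module 101, then calculates the boundary index value of the first major cycle from the boundary index values of the small periods in the first major cycle, and further calculates the credible boundary index value of the first major cycle. The calculation module 102 sends the resulting credible boundary index value to the detection module 103, which uses it as the detection threshold to check whether data in the second major cycle are abnormal.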
基于此,本申请实施例提供了一种异常检测的方法流程,如图2所示,包括:
步骤201、针对第一大周期内的采集数据,确定待检测指标在所述第一大周期内的各小周期的至少一个指标值;
此处,第一大周期可以为十天、一周、四天、三天等,小周期可以为六小时、三小时、一小时等,第一大周期和小周期可以根据以往数据的规律和具体需要等进行设定,具体不做限制。待检测指标为对应需要检测的指标,例如,芯片的光刻设备运行指标:分辨率、对准精度等。银行新推出理财产品的交易指标:交易数量、交易时延等。采集数据为采集对应待检测指标产生的数据,如,芯片的光刻设备的分辨率、对准精度等相关指标需要采集,以判断光刻设备的精确度是否合格。又如理财产品的交易数量、交易时延等相关指标需要采集,以判断处理交易的服务器状态是否良好,或者,是否有恶意购买等情况。指标值则为反映运行水平的量值,如小周期内的指标数据中的最大值、最小值等。
步骤202、针对每个小周期的至少一个指标值,根据预设的边界规则,确定所述待检测指标在每个小周期的边界指标值;
此处,边界指标值为在对应小周期中采集的待检测指标数据组中的上边界指标值、下边界指标值、离散边界、汇聚边界、转换边界等,具体边界指标值类型根据待检测指标的数据特征以及变化规律等确定,具体不做限定。
其中,预设的边界规则可以为对于一组数据的边界值是从数据中的最大值开始确定密度区域;若密度区域内的数据的个数大于密度阈值,则将所述最大值确定为该组数据的边界值;否则从该组数据中将所述最大值删除,返回从数据中的最大值开始确定密度区域的步骤;所述密度阈值依据该组数据的数据量进行设定。或,预设的边界规则可以为,对于 一组数据的边界值是从数据中的最小值开始确定密度区域;若密度区域内的数据的个数大于密度阈值,则将所述最小值确定为该组数据的边界值;否则从该组数据中将所述最小值删除,返回从数据中的最小值开始确定密度区域的步骤;所述密度阈值依据该组数据的数据量进行设定。
此处,预设的边界规则为如何获取边界值的规则,从需要获取边界值的一组数据中确定最大值以及最大值的密度区域;若最大值的密度区域中数据的个数大于密度阈值,则该最大值为该组数据的边界值;否则,最大值的密度区域中数据的个数小于密度阈值,则将该最大值从该组数据中删除;在删除了该最大值的该组数据中确定最大值,继续判断第二次确定的最大值的密度区域中数据的个数是否大于密度阈值;若大于,则该第二次确定的最大值为该组数据的边界值;否则,删除第二次确定的最大值;在删除了第一次确定的最大值和第二次确定的最大值的该组数据中确定最大值;继续判断第三次确定的最大值的密度区域中数据的个数是否大于密度阈值;直到所确定的最大值的密度区域中数据的个数大于密度阈值,则将最后确定的最大值作为边界值,即边界指标值。相应的,从需要获取边界值的一组数据中确定最小值以及最小值的密度区域;若最小值的密度区域中数据的个数大于密度阈值,则该最小值为该组数据的边界值;否则,最小值的密度区域中数据的个数小于密度阈值,则将该最小值从该组数据中删除;在删除了该最小值的该组数据中确定最小值,继续判断第二次确定的最小值的密度区域中数据的个数是否大于密度阈值;若大于,则该第二次确定的最小值为该组数据的边界值;否则,删除第二次确定的最小值;在删除了第一次确定的最小值和第二次确定的最小值的该组数据中确定最小值,继续判断第三次确定的最小值的密度区域中数据的个数是否大于密度阈值;直到所确定的最小值的密度区域中数据的个数大于密度阈值,则将最后确定的最小值作为边界值,即边界指标值。
其中,所述密度阈值依据该组数据的数据量进行设定,包括:
该组数据的数据量越大,该组数据的密度阈值越大。
此处,可以根据采集数据的密度变化速率、数据间平方差等数据特征设置对应函数,如,通过一次函数确定密度阈值:密度阈值=相关系数*数据个数。或者,通过幂函数确定密度阈值,或者,通过对数函数确定密度阈值。相关系数的系数值根据实际需要设置,密度阈值的计算方法具体不做限定。
其中,所述密度区域通过如下方式确定,包括:根据该组数据的最大值、最小值,确定该组数据的分区数;根据该组数据的最大值、最小值和所述分区数,确定区域半径;以所述最大值为中心,所述区域半径为半径,确定所述密度区域。
此处,可以根据历史同类或不同类,但具有相同变化规律的指标数据分析指标数据的 特征,确定通过最大值、最小值进行相应的运算可以得到对应的分区数,使得通过该分区数和该组数据的最大值、最小值确定的区域半径,能更准确的划分密度区域,使得指标值的密度区域中的数据个数能够准确表征该指标值是否异常,能准确判断指标值的合理性。
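As an example, consider a newly launched wealth-management product A whose transaction latency is monitored. If the transaction latency is too long, the transaction equipment may have failed; if it is too short (while still greater than 0), someone may be maliciously occupying the transaction system. Take monitoring for excessively long latency as the example, with a small period of one hour and a first major cycle of one week. FIG. 3 is a table of average latency data collected in a small period, provided by an embodiment of the present invention. As shown in FIG. 3, the transactions occurring in each minute are collected in real time and their average latency is calculated. The shaded cells mark the maximum average latency, 60 ms, and the minimum average latency, 10 ms; the average of the maximum and minimum is 35 ms. The number of partitions can be determined by the following formula:
block_edge: number of partitions
block_edge=[(-1,4),(50,4),(200,6),(500,8),(1000,10),(2000,14),(mean+1,14)]
Here mean is the average value and mean+1 is the lookup key. In this formula, the pair (-1,4),(50,4) means that when mean+1 is greater than -1 and less than 50, the number of partitions is 4; (50,4),(200,6) means that when mean+1 is greater than 50 and less than 200, the number of partitions is 4; (200,6),(500,8) means that when mean+1 is greater than 200 and less than 500, the number of partitions is 6; ...; and (2000,14),(mean+1,14) means that when mean+1 is greater than 2000, the number of partitions is 14. The key 35+1 therefore corresponds to 4 partitions.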
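The block_edge lookup can be written out as a small function. This is a sketch under assumptions: the name `partition_count` is introduced here, the trailing `(mean+1,14)` entry of block_edge is read as "everything above 2000 maps to 14", and the behaviour exactly at an edge (for example mean+1 = 200) is not stated in the text, so the strict comparison below is a guess.

```python
# (lower edge for mean + 1, number of partitions), taken from block_edge.
BLOCK_EDGE = [(-1, 4), (50, 4), (200, 6), (500, 8), (1000, 10), (2000, 14)]


def partition_count(max_value, min_value):
    """Look up the number of partitions for one group of data."""
    key = (max_value + min_value) / 2 + 1  # mean + 1, mean = (max + min) / 2
    count = BLOCK_EDGE[0][1]
    for lower_edge, partitions in BLOCK_EDGE:
        if key > lower_edge:  # keep the entry of the last edge that was passed
            count = partitions
    return count


first_pass = partition_count(60, 10)   # key 36, the case worked in the text
second_pass = partition_count(40, 10)  # key 26, after 60 ms is deleted
```

Both passes of the worked example fall in the lowest band, which is why the number of partitions stays at 4 while the area radius shrinks.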
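Next, the area radius is determined according to the maximum value, the minimum value, and the number of partitions of the group of data.
The area radius is determined by the following formula:
eps: radius
eps=(df_detect['value'].max()-df_detect['value'].min())/block_num
That is, the difference between the maximum value and the minimum value, divided by the number of partitions, gives the area radius.
eps = (60 - 10) / 4 = 12.5
The density area is the interval of average latency [47.5 (60-12.5), 72.5 (60+12.5)]. FIG. 4 is a curve of average latency data collected in a small period, provided by an embodiment of the present invention. Between reference line (□) 72.5 and reference line (△) 47.5 there is only the maximum average latency 60 ms itself; that is, the number of data in the density area is 0.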
The density threshold is determined by the following formula:
minsample=log(number of data)+1
minsample=log(60)+1=2.7781512503836
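It can therefore be concluded that the number of data in the density area of the maximum average latency 60 ms is less than the density threshold. The maximum average latency 60 ms is therefore deleted from the data group of this small period, and the maximum average latency is re-determined as 40 ms; the minimum average latency is 10 ms, and the average of the maximum and minimum is 25 ms. The block_edge formula gives 4 partitions,
eps = (40 - 10) / 4 = 7.5
The density area is the interval of average latency [32.5, 47.5]. In FIG. 4, between reference line (△) 47.5 and reference line (×) 32.5, the maximum average latency 40 ms has three data: average latencies of 36 ms, 36 ms, and 35 ms. That is, the number of data in the density area is 3.
After the maximum average latency 60 ms is deleted, the number of data in the small period decreases by 1, so the density threshold is:
minsample=log(60-1)+1=2.7708520116421.
The number of data in the density area, 3, is greater than the density threshold 2.7708520116421, so the average latency 40 ms is the boundary index value, which can also be called the upper boundary index value.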
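The two density thresholds in this worked example can be reproduced numerically. The sketch below assumes the logarithm is base 10, which is what the printed values imply; the function name `min_samples` is introduced here for illustration.

```python
import math


def min_samples(data_count):
    """Density threshold: minsample = log10(number of data) + 1."""
    return math.log10(data_count) + 1


first_threshold = min_samples(60)       # all 60 per-minute averages
second_threshold = min_samples(60 - 1)  # after deleting the 60 ms maximum

# Neighbour counts from the text: 0 around 60 ms, 3 around 40 ms.
first_accepted = 0 > first_threshold
second_accepted = 3 > second_threshold
```

Because the threshold grows only logarithmically with the number of data, removing a single outlier barely changes it, which is consistent with the later remark that the threshold may be computed from the undeleted count when the group is large.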
这里需要说明的是,当小周期内的数据个数很多,而删除的最大值的个数不足以影响密度阈值的计算时,可以按照未删除最大值的数据个数计算密度阈值。在特殊情况下,如数据比较平稳,数据间的差值相差不大时,eps可以应用删除的最大值和该组数据中的最小值计算。此处,密度阈值和区域半径的计算方式,具体不做限定。另外,当开始推出的理财产品A的交易量少,或交易情况不稳定时,运维人员可以通过设置预设比例快速识别第一大周期内的产生的交易时延数据是否可以应用;例如,一个星期内未产生交易的时间与一个星期内产生交易的时间比例小于15%,则可以基于这一星期的交易时延数据计算交易时延的边界指标值;或者,对于一星期内每天对应的时间段会出现数据量极少,如夜间或凌晨交易数量少时,可以根据用户作息与习惯对每天对应时间段分别设置未产生交易的时间与产生交易的时间的预设比例,以判断交易时延数据是否可以用于计算边界指标值。
其中,若根据所述边界规则未确定出边界值,则将该组数据中的最大值作为该组数据的边界值。也就是说,当小周期内的数据个数很少或数据特征特殊,无法根据预设的边界规则计算边界值时,则将该组数据中的最大值作为该组数据的边界值。
步骤203、根据各小周期的边界指标值,按照所述边界规则,确定所述待检测指标在所述第一大周期的边界指标值;
此处,在步骤202中已经计算出针对每个小周期的边界指标值,取第一大周期内的各个小周期的边界指标值,按照边界规则计算该待检测指标在第一大周期的边界指标值。
其中,所述边界规则可以为对于一组数据的边界值是从数据中的最大值开始确定密度区域;若密度区域内的数据的个数大于密度阈值,则将所述最大值确定为该组数据的边界值;否则从该组数据中将所述最大值删除,返回从数据中的最大值开始确定密度区域的步骤;所述密度阈值依据该组数据的数据量进行设定。或,预设的边界规则可以为对于一组数据的边界值是从数据中的最小值开始确定密度区域;若密度区域内的数据的个数大于密 度阈值,则将所述最小值确定为该组数据的边界值;否则从该组数据中将所述最小值删除,返回从数据中的最小值开始确定密度区域的步骤;所述密度阈值依据该组数据的数据量进行设定。
此处,一组数据则为第一大周期内对应的各个小周期的边界指标值;其中,各个小周期可以是第一大周期内的所有小周期。例如,第一大周期为一周七天,小周期为一个小时,则第一大周期内包含的小周期数为7×24,对应包含168个边界指标值,则这168个边界指标值为一组数据,通过这168个边界指标值计算第一大周期的边界指标值。或者,所述第一大周期内的各小周期可以为同一时段。例如,第一大周期为一周七天,小周期为一个小时,因为第一大周期内的各个小周期为同一时段,则第一大周期内包含0点-1点的7个小周期、1点-2点的7个小周期、2点-3点的7个小周期…23时-24时的7个小周期。因此,0时-1时的7个小周期的边界指标值为一组数据、1时-2时的7个小周期的边界指标值为一组数据、2时-3时的7个小周期的边界指标值为一组数据…23时-24时的7个小周期的边界指标值为一组数据。因此,则可以确定第一大周期内对应0时-1时的边界指标值、第一大周期内对应0时-1时的边界指标值、第一大周期内对应1时-2时的边界指标值…第一大周期内对应23时-24时的边界指标值。可以如此,针对一天内的每个时段第一大周期都会得到相应的第一大周期的边界指标值。若分析确定一天中有时段的小周期中几乎不产生交易,或交易量极少,则可以不计算该时段的小周期的边界指标值。
以上,基于一组数据确定边界值的方式,步骤202中已经对此处做了详细描述,将步骤202中小周期的一组数据更换成上述第一大周期内的一组数据进行计算以确定第一大周期的边界指标值,这里不做赘述。
其中,所述密度区域通过如下方式确定,包括:
根据该组数据的数据个数,确定该组数据的分区数;
根据该组数据的最大值、最小值和所述分区数,确定区域半径;
以所述最大值为中心,所述区域半径为半径,确定所述密度区域。
此处,因为在确定第一大周期的边界指标值计算方式后,可以确定第一大周期内的各个小周期的个数。如上述例子,或有168个小周期及其对应的边界指标值,或有24组相同时段的7个小周期对应的边界指标值;则通过该一组数据或该24组中每组数据的数量、数据的平方差等数据特征确定计算密度阈值的函数。如,通过一次函数确定密度阈值:密度阈值=相关系数*数据个数。或者,通过幂函数确定密度阈值,或者,通过对数函数确定密度阈值。第一大周期中计算的密度阈值的函数可以与小周期中计算密度阈值的函数相同或不同。密度阈值的计算方法具体不做限定。
其中,所述密度区域通过如下方式确定,包括:根据该组数据的最大值、最小值,确定该组数据的分区数;根据该组数据的最大值、最小值和所述分区数,确定区域半径;以所述最大值为中心,所述区域半径为半径,确定所述密度区域。以上,基于一组数据确定密度区域的方式,步骤202中已经对此处做了详细描述,将步骤202中小周期的一组数据更换成上述第一大周期内的各个小周期对应的边界指标值形成的一组数据进行计算,以确定第一大周期的密度区域,这里不做赘述。
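In the previous example for the newly launched wealth-management product A, the boundary index value of the small period (assume it is the 0:00-1:00 small period on Monday) is 40 ms. If the small periods in the first major cycle cover the same time slot, the boundary index values of the 0:00-1:00 small periods on the other days of the week are obtained in the same way: 40 ms on Tuesday, 45 ms on Wednesday, 47.5 ms on Thursday, 48 ms on Friday, 48.5 ms on Saturday, and 55 ms on Sunday. In this group of data the maximum value is 55 ms, the minimum value is 40 ms, and the average value is 47.5 ms, so the block_edge and eps formulas above give block_edge=4,
eps = (55 - 40) / 4 = 3.75
The density area is the interval of average latency [51.25, 58.75], and the number of data in the density area is 0. minsample=log(7)+1=1.84509804001426, so the number of data in the density area is less than the density threshold.
After 55 ms is deleted from the group, the maximum value in the group is 48 ms, the minimum value is 40 ms, and the average value is 44 ms, so the block_edge and eps formulas above give block_edge=4,
[Equation image PCTCN2021080564-appb-000004: area radius eps for this pass]
The density area is the interval of average latency [47, 49], and the number of data in the density area is 2. minsample=log(7-1)+1=1.77815125038364, so the number of data in the density area is greater than the density threshold, and 48 ms is the boundary index value for 0:00-1:00 in the first major cycle. It can also be called the upper boundary index value.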
这里需要说明的是,当第一大周期内的数据个数很多,而删除的最大值的个数不足以影响密度阈值的计算时,可以按照未删除最大值的数据个数计算密度阈值。在特殊情况下,如数据比较平稳,数据间的差值相差不大时,eps可以应用删除的最大值和该组数据中的最小值计算。此处,密度阈值和区域半径的计算方式,具体不做限定。
其中,若根据所述边界规则未确定出边界值,则将该组数据中的最大值作为该组数据的边界值。也就是说,当第一大周期内的数据个数很少或数据特征特殊,无法根据预设的边界规则计算边界值时,则将该组数据中的最大值作为该组数据的边界值。
步骤204、根据所述第一大周期的边界指标值,确定所述第一大周期的可信边界指标值;
此处,可以将第一大周期的边界指标值作为第一大周期的可信边界指标值,也可以在 第一大周期的边界指标值的基础上做一些简单运算获得可信边界指标值,增加可信边界指标值的可靠性。
其中,根据所述第一大周期的边界指标值,确定所述第一大周期的可信边界指标值,包括:
通过如下公式计算:
up_boundary=up_p+K*eps+base
其中,up_boundary为所述第一大周期的可信边界指标值,up_p为所述第一大周期的边界指标值;K为波动系数;eps为区域半径,根据所述第一大周期内小周期的边界指标值的最大值、最小值和分区数确定;所述分区数为根据所述第一大周期内小周期的边界指标值的数据的个数确定的;base为根据所述最大值和所述最小值确定的。
此处,再上一个示例中,up_p=48,eps=1,则可信边界指标值up boundary=up p+K*eps=50,其中,波动系数K的具体数值可以由技术人员通过分析数据的波动幅度确定;也可以设为数据对应产品的,以往行业经验的波动系数数值,或行业专家建议的波动系数数值;波动系数数值可以根据需要灵活设置,具体不做限定。若数据的变化比较大,且数据大小的增加率又处于正常水平,为了减少误报率,可以在up boundary=up p+K*eps增加一个Base值,用来表征在数据正常的情况下,减少误报率所要增加的数据波动值,例如,base=0.1*mean,其中,mean可以是该组数据中所有数据的非零平均值,或该组数据中最大值和最小值的平均值,系数0.1可以根据工程技术人员的分析做相应的调整,这里只是作为一种实现可能。
步骤205、所述第一大周期的可信边界指标值作为对第二大周期内的采集数据进行异常检测的检测阈值;所述第二大周期为所述第一大周期之后的周期。
此处,第二大周期可以是第一大周期之后紧邻的大周期,如一月份的第一周为第一大周期,一月份的第二周为第二大周期。或者,第二大周期可以是第一大周期之后非紧邻的大周期,如一月份的第一周为第一大周期,一月份的第四周为第二大周期。又或者,第二大周期可以是与第一大周期有重合时间段的大周期,如一月份的第一周的周一至周日为第一大周期,一月份的第一周的周二至周日加上一月份第二周的周一为第二大周期。在上一个示例中,第一大周期利用168个数据确定的可信边界指标值可以用来检测第二大周期中的每个时刻(ms\s\min等)内产生的数据,当第二大周期内的数据大于可信边界指标值时,可以判定数据异常触发告警。或者,第一大周期利用对应时段的7个数据确定的可信边界指标值可以用来检测第二大周期中对应时段的每个时刻(ms\s\min等)内产生的数据,当第二大周期内对应时段的数据大于可信边界指标值时,可以判定数据异常触发告警。又或 者,第一大周期中某一天的某一时段的可信边界指标可以用来检测第二大周期的对应这一天的对应时段产生的数据;例如,第一大周期第一天的0-1时的可信边界指标为a,可以用来检测第二大周期第一天的0-1时产生的数据、第一大周期第二天的0-1时的可信边界指标为b,可以用来检测第二大周期第二天的0-1时产生的数据。需要说明的是,以上第一大周期的周期长度与第二大周期的周期长度可以相同也可以不同,例如,第一大周期为两天,第二大周期为一天,或者第一大周期为一天,第二大周期为两天。可以通过缩短第一大周期的周期长度和第二大周期的周期长度实现数据的及时更新;例如,第一大周期为一天,第二大周期为一天,通过第一大周期某一时段的可信边界指标值检测第二大周期内对应时段产生的数据。确定可信边界指标值所取的第一大周期的时段具体不做限定,以及用该可信边界指标值检测第二大周期的时段具体不做限定,可以根据数据波动特征或需要做灵活调整。
采用上述方法,第一大周期内包含多个小周期,采集第一大周期内的数据,确定第一大周期及中各个小周期内的至少一个指标值;对于每个小周期对应的指标值,根据预设边界规则,获取每个小周期对应的边界指标值。如此,可以通过边界规则对每个小周期的指标值进行选择,得到每个小周期对应的边界指标值,使得每个小周期对应的边界指标值更准确。进一步,通过第一大周期的各个小周期对应的边界指标值,再次按照边界规则得到第一大周期的边界指标值,使得第一大周期的边界指标值更准确。更进一步的,根据第一大周期的边界指标值得到第一大周期的可信边界指标值,以可信边界指标值作为之后产生的数据的检测阈值;如此,在第一大周期的边界指标值的准确性基础上做相关调整得到可信边界指标值,使得之后的第一大周期采集的数据可以在正常合理的范围波动而不会触发异常,造成异常误报。实现了检测阈值的自适应调整,提高了检测阈值的准确性,进一步提高了异常检测的准确率。
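Based on this, an embodiment of this application provides another anomaly detection method flow, as shown in FIG. 5, including:
Step 501: Set the first major period and the minor periods.
Here, setting the first major period includes setting related parameters such as the period length, the period start time, and the period end time. Setting the minor periods likewise includes setting related parameters such as the period length, the period start time, and the period end time. In addition, the minor periods may be all of the minor periods within the first major period, or the minor periods within the first major period may all be the same time slot, and so on. The relationship between the minor periods and the first major period can be set flexibly.
Step 502: Collect the data within each minor period.
Here, the product, device, or the like whose data is to be checked is determined, the indicator to be detected for that product or device is determined, and data is collected for that indicator.
Step 503: Calculate the density threshold of each minor period from the number of data points in that minor period.
Step 504: Determine the maximum and minimum of the data in each minor period.
Step 505: For each minor period, determine the number of partitions for its data from the maximum and minimum of the data; determine the region radius from the maximum, the minimum, and the number of partitions; and determine the density region from the maximum and the region radius.
Step 506: Determine the number of data points in the density region corresponding to the maximum of each minor period.
Step 507: For each minor period, judge whether the number of data points in the density region is greater than the density threshold. If not, perform step 508: delete the current maximum from the minor period whose density region does not contain more data points than the density threshold, determine a new maximum, and perform steps 505, 506, and 507 again until the number of data points in the density region of that minor period is greater than the density threshold. If so, perform step 509.
Step 509: Take the maximum corresponding to the density region whose number of data points is greater than the density threshold as the boundary indicator value of the minor period; the boundary indicator value of every minor period is obtained in the same way.
Step 510: Determine the number of minor-period boundary indicator values within the first major period, and determine the density threshold.
Step 511: Determine the maximum and minimum of the minor-period boundary indicator values in the first major period.
Step 512: Determine the number of partitions for the minor-period boundary indicator values in the first major period from that maximum and minimum; determine the region radius from the maximum, the minimum, and the number of partitions; and then determine the density region of the first major period from the region radius.
Step 513: Determine the number of data points in the density region of the first major period.
Step 514: Judge whether the number of data points in the density region of the first major period is greater than the density threshold of the first major period. If not, perform step 515: delete the current maximum, whose density region does not contain more data points than the density threshold, from the data set of the first major period, determine a new maximum, and perform steps 511, 512, and 513 again until the number of data points in the density region of the first major period is greater than the density threshold. If so, perform step 516.
Step 516: When the number of data points in the density region of the first major period is greater than the density threshold, take the maximum corresponding to that density region as the boundary indicator value of the first major period.
Step 517: Obtain the trusted boundary indicator value of the first major period from the boundary indicator value of the first major period.
Step 518: Check the data of the second major period against the trusted boundary indicator value of the first major period to detect whether the data of the second major period is abnormal.
It should be noted that the order of the above flow is not unique; for example, for steps 501 and 502, step 502 may be performed before step 501.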
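The two-level flow of steps 503 to 516 can be sketched in Python as below. This is a sketch under stated assumptions rather than the applicant's implementation: the helper names are hypothetical, `density_threshold` (a base-2 logarithm of the data count) and `partition_count` (a rounded square root of the data count) are placeholder choices that only respect the stated dependence on the data, and the radius `eps = (max - min) / partitions` is likewise assumed. The fallback to the original maximum when no candidate qualifies follows the design in which the maximum of the set is used if the boundary rule finds no boundary value.

```python
import math
from typing import List, Sequence


def density_threshold(n: int) -> int:
    """Assumed placeholder: a threshold that grows with the number of data points."""
    return max(1, int(math.log2(n)))


def partition_count(values: Sequence[float]) -> int:
    """Assumed placeholder for the number of partitions of a data set."""
    return max(1, round(math.sqrt(len(values))))


def boundary_value(values: Sequence[float]) -> float:
    """Apply the boundary rule: drop maxima until the region around the maximum is dense."""
    data: List[float] = sorted(values)
    if not data:
        raise ValueError("boundary_value needs at least one data point")
    original_max = data[-1]
    while data:
        hi, lo = data[-1], data[0]
        threshold = density_threshold(len(data))
        eps = (hi - lo) / partition_count(data)  # assumed radius formula
        # Count the points inside the density region centred on the current maximum.
        in_region = sum(1 for v in data if hi - eps <= v <= hi + eps)
        if in_region > threshold:
            return hi
        data.pop()  # delete the current maximum and try the next one
    return original_max  # no boundary found: fall back to the maximum of the set


def major_period_boundary(minor_periods: Sequence[Sequence[float]]) -> float:
    """Apply the rule per minor period, then again to the resulting boundary values."""
    return boundary_value([boundary_value(p) for p in minor_periods])


one_period = [10, 11, 12, 11, 10, 12, 11, 100]  # 100 is an isolated spike
print(boundary_value(one_period))
print(major_period_boundary([one_period] * 4))
```

With these placeholder helpers the isolated spike is discarded because its density region contains only itself, and the boundary settles on the largest value that has enough neighbours.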
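Based on the same concept, an embodiment of this application provides an anomaly detection apparatus. FIG. 6 is a schematic diagram of an anomaly detection apparatus provided by an embodiment of this application; as shown in FIG. 6, it includes:
a collection unit 601, configured to determine, for the data collected within the first major period, at least one indicator value of the indicator to be detected in each minor period within the first major period;
a processing unit 602, configured to determine, for the at least one indicator value of each minor period and according to the preset boundary rule, the boundary indicator value of the indicator to be detected in each minor period; and to determine, from the boundary indicator values of the minor periods and according to the boundary rule, the boundary indicator value of the indicator to be detected in the first major period;
the processing unit 602 is further configured to determine the trusted boundary indicator value of the first major period from the boundary indicator value of the first major period; the trusted boundary indicator value of the first major period serves as the detection threshold for anomaly detection on the data collected within the second major period; the second major period is a period after the first major period.
In one possible design, the boundary rule is that the boundary value of a set of data is found by determining a density region starting from the maximum of the data; if the number of data points in the density region is greater than the density threshold, the maximum is determined to be the boundary value of the set; otherwise the maximum is deleted from the set and the process returns to the step of determining a density region starting from the maximum of the data; the density threshold is set according to the amount of data in the set.
In one possible design, the density region is determined as follows:
determine the number of partitions of the set of data from the maximum and minimum of the set;
determine the region radius from the maximum and minimum of the set and the number of partitions;
determine the density region with the maximum as its center and the region radius as its radius.
In one possible design, the processing unit 602 is specifically configured to calculate by the following formula:
up_boundary = up_p + K*eps + base
where up_boundary is the trusted boundary indicator value of the first major period; up_p is the boundary indicator value of the first major period; K is the fluctuation coefficient; eps is the region radius, determined from the maximum, the minimum, and the number of partitions of the minor-period boundary indicator values within the first major period; the number of partitions is determined from the number of minor-period boundary indicator values within the first major period; and base is determined from the maximum and the minimum.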
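As a quick numeric check of the formula above, the following Python sketch evaluates it directly. The function name `trusted_boundary` and the sample values are hypothetical; how eps and base are derived from the maximum, the minimum, and the number of partitions is not fixed in this passage, so they are passed in as plain inputs.

```python
def trusted_boundary(up_p: float, k: float, eps: float, base: float) -> float:
    """Evaluate up_boundary = up_p + K*eps + base."""
    return up_p + k * eps + base


# Hypothetical inputs: boundary 12.0, fluctuation coefficient 2.0,
# region radius 1.0, and base offset 0.5.
threshold = trusted_boundary(up_p=12.0, k=2.0, eps=1.0, base=0.5)
print(threshold)
```

For a positive eps, a larger K leaves more headroom above up_p, which would be expected to reduce false alarms at the cost of sensitivity.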
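In one possible design, the processing unit 602 is further configured to:
if no boundary value is determined according to the boundary rule, take the maximum of the set of data as the boundary value of the set.
In one possible design, setting the density threshold according to the amount of data in the set includes:
the larger the amount of data in the set, the larger the density threshold of the set.
In one possible design, the minor periods within the first major period are the same time slot.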
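To illustrate minor periods that share the same time slot, the sketch below groups hourly indicator values by hour of day across the days of a major period, so that each slot yields one group (for a seven-day period, seven values per slot). The input layout (one list of 24 hourly values per day) and the function name `group_by_hour` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_hour(values_by_day: Sequence[Sequence[float]]) -> Dict[int, List[float]]:
    """Collect, for each hour of day, the values observed in that hour on every day."""
    slots: Dict[int, List[float]] = defaultdict(list)
    for day in values_by_day:
        for hour, value in enumerate(day):
            slots[hour].append(value)
    return dict(slots)


# Hypothetical week of data: value = day * 100 + hour, just to make the grouping visible.
week = [[day * 100 + hour for hour in range(24)] for day in range(7)]
groups = group_by_hour(week)
print(groups[0])
```

Each group can then be passed to the boundary rule to obtain a threshold for that slot alone.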
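Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to this application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.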

Claims (10)

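  1. An anomaly detection method, characterized by comprising:
    determining, for the data collected within a first major period, at least one indicator value of an indicator to be detected in each minor period within the first major period;
    determining, for the at least one indicator value of each minor period and according to a preset boundary rule, the boundary indicator value of the indicator to be detected in each minor period;
    determining, from the boundary indicator values of the minor periods and according to the boundary rule, the boundary indicator value of the indicator to be detected in the first major period;
    determining the trusted boundary indicator value of the first major period from the boundary indicator value of the first major period; the trusted boundary indicator value of the first major period serving as the detection threshold for anomaly detection on the data collected within a second major period; the second major period being a period after the first major period.
  2. The method according to claim 1, characterized in that the boundary rule is that the boundary value of a set of data is found by determining a density region starting from the maximum of the data; if the number of data points in the density region is greater than a density threshold, the maximum is determined to be the boundary value of the set; otherwise the maximum is deleted from the set and the method returns to the step of determining a density region starting from the maximum of the data; the density threshold is set according to the amount of data in the set.
  3. The method according to claim 2, characterized in that the density region is determined as follows:
    determining the number of partitions of the set of data from the maximum and minimum of the set;
    determining the region radius from the maximum and minimum of the set and the number of partitions;
    determining the density region with the maximum as its center and the region radius as its radius.
  4. The method according to claim 2, characterized in that determining the trusted boundary indicator value of the first major period from the boundary indicator value of the first major period comprises:
    calculating by the following formula:
    up_boundary = up_p + K*eps + base
    where up_boundary is the trusted boundary indicator value of the first major period; up_p is the boundary indicator value of the first major period; K is the fluctuation coefficient; eps is the region radius, determined from the maximum, the minimum, and the number of partitions of the minor-period boundary indicator values within the first major period; the number of partitions is determined from the number of minor-period boundary indicator values within the first major period; and base is determined from the maximum and the minimum.
  5. The method according to claim 2, characterized by further comprising:
    if no boundary value is determined according to the boundary rule, taking the maximum of the set of data as the boundary value of the set.
  6. The method according to claim 2, characterized in that setting the density threshold according to the amount of data in the set comprises:
    the larger the amount of data in the set, the larger the density threshold of the set.
  7. The method according to any one of claims 1 to 6, characterized in that the minor periods within the first major period are the same time slot.
  8. An anomaly detection apparatus, characterized in that the apparatus comprises:
    a collection unit, configured to determine, for the data collected within a first major period, at least one indicator value of an indicator to be detected in each minor period within the first major period;
    a processing unit, configured to determine, for the at least one indicator value of each minor period and according to a preset boundary rule, the boundary indicator value of the indicator to be detected in each minor period; and to determine, from the boundary indicator values of the minor periods and according to the boundary rule, the boundary indicator value of the indicator to be detected in the first major period;
    the processing unit being further configured to determine the trusted boundary indicator value of the first major period from the boundary indicator value of the first major period; the trusted boundary indicator value of the first major period serving as the detection threshold for anomaly detection on the data collected within a second major period; the second major period being a period after the first major period.
  9. A computing device, characterized by comprising:
    a memory, configured to store program instructions;
    a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 7.
  10. A computer-readable non-volatile storage medium, characterized by comprising computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.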
PCT/CN2021/080564 2020-03-19 2021-03-12 Anomaly detection method and apparatus WO2021185182A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010196303.0A CN111400141B (zh) 2020-03-19 2020-03-19 Anomaly detection method and apparatus
CN202010196303.0 2020-03-19

Publications (1)

Publication Number Publication Date
WO2021185182A1 true WO2021185182A1 (zh) 2021-09-23

Family

ID=71432698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080564 WO2021185182A1 (zh) 2020-03-19 2021-03-12 Anomaly detection method and apparatus

Country Status (2)

Country Link
CN (1) CN111400141B (zh)
WO (1) WO2021185182A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114301803A (zh) * 2021-12-24 2022-04-08 北京百度网讯科技有限公司 Network quality detection method and apparatus, electronic device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400141B (zh) * 2020-03-19 2021-11-09 深圳前海微众银行股份有限公司 Anomaly detection method and apparatus
CN117130819B (zh) * 2023-10-27 2024-01-30 江西师范大学 Microservice fault diagnosis method based on delay variance and correlation coefficient values

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871190A (zh) * 2016-09-23 2018-04-03 阿里巴巴集团控股有限公司 Service indicator monitoring method and apparatus
CN109213654A (zh) * 2018-07-05 2019-01-15 北京奇艺世纪科技有限公司 Anomaly detection method and apparatus
CN109558295A (zh) * 2018-11-15 2019-04-02 新华三信息安全技术有限公司 Performance indicator anomaly detection method and apparatus
US10509695B1 (en) * 2015-03-30 2019-12-17 ThetaRay Ltd. System and method for anomaly detection in dynamically evolving data using low rank matrix decomposition
CN111400141A (zh) * 2020-03-19 2020-07-10 深圳前海微众银行股份有限公司 Anomaly detection method and apparatus

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7076695B2 (en) * 2001-07-20 2006-07-11 Opnet Technologies, Inc. System and methods for adaptive threshold determination for performance metrics
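CN105406991A (zh) * 2015-10-26 2016-03-16 上海华讯网络***有限公司 Method and *** for generating service thresholds from historical data based on network monitoring indicators
CN107153593B (zh) * 2016-03-02 2021-01-05 创新先进技术有限公司 Method and apparatus for determining an Internet service monitoring threshold
CN106557401A (zh) * 2016-10-13 2017-04-05 中国铁道科学研究院电子计算技术研究所 Dynamic threshold setting method and *** for IT equipment monitoring indicators
CN110189228A (zh) * 2019-06-24 2019-08-30 深圳前海微众银行股份有限公司 Method and apparatus for monitoring abnormal transactions
CN110377491A (zh) * 2019-07-10 2019-10-25 ***股份有限公司 Data anomaly detection method and apparatus
CN110890998B (zh) * 2019-11-06 2021-08-27 厦门网宿有限公司 Method and apparatus for determining a threshold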

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
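CN114301803A (zh) * 2021-12-24 2022-04-08 北京百度网讯科技有限公司 Network quality detection method and apparatus, electronic device, and storage medium
CN114301803B (zh) * 2021-12-24 2024-03-08 北京百度网讯科技有限公司 Network quality detection method and apparatus, electronic device, and storage medium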

Also Published As

Publication number Publication date
CN111400141B (zh) 2021-11-09
CN111400141A (zh) 2020-07-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21771452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21771452

Country of ref document: EP

Kind code of ref document: A1