WO2024066720A1 - Method and device for determining an indicator threshold, storage medium, and electronic device (指标阈值的确定方法、装置、存储介质及电子装置) - Google Patents


Info

Publication number
WO2024066720A1
Authority
WO
WIPO (PCT)
Prior art keywords
indicator
threshold
data
coordinate
value
Application number
PCT/CN2023/110331
Other languages
English (en)
French (fr)
Inventor
杨伟伟 (Yang Weiwei)
冯媛 (Feng Yuan)
邵敏峰 (Shao Minfeng)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/067 Enterprise or organisation modelling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • the present disclosure relates to the field of big data and artificial intelligence technology, and in particular, to a method, device, storage medium and electronic device for determining an indicator threshold.
  • many of these applications rely on threshold setting, for example anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation.
  • wireless network operators currently set thresholds for service indicators mainly from fixed empirical values or from dynamic thresholds derived from relatively complex statistical distributions. Even with dynamic thresholds based on statistical distributions or similar mathematical methods, the problem of solving for a threshold is merely converted into a threshold-setting problem in another dimension (for example, a rule of the form mean + k × standard deviation still requires someone to choose k). This makes it difficult to judge accurately and objectively whether a service indicator is good or bad, and therefore to guide network O&M and analysis effectively and maximize the value of the data.
  • the embodiments of the present disclosure provide a method, device, storage medium and electronic device for determining an indicator threshold, so as to at least solve the problem of how to determine the indicator threshold.
  • a method for determining an indicator threshold comprising: obtaining aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and determining an indicator threshold from a set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator.
  • a device for determining an indicator threshold comprising: an acquisition module configured to acquire aggregated indicator data corresponding to a target indicator; a first determination module configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain multiple groups, and fit the indicator data of each of the multiple groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; and a second determination module configured to determine an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned method for determining the indicator threshold value when running.
  • an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the method for determining the indicator threshold through the computer program.
  • aggregated indicator data corresponding to a target indicator is obtained; an indicator data set is determined from the aggregated indicator data, first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain a plurality of clustered groups, and the indicator data of each of the plurality of groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; an indicator threshold is determined from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the technical problem of how to determine the indicator threshold.
  • FIG. 1 is a hardware structure block diagram of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a method for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of two-dimensional discrete points according to an embodiment of the present disclosure.
  • FIG. 4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 1 is a hardware structure block diagram of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure.
  • the computer terminal may include one or more processors 202 (only one is shown in FIG. 1; the processor 202 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD)) and a memory 204 configured to store data.
  • MPU Microprocessor Unit
  • PLD Programmable Logic Device
  • the above-mentioned computer terminal may also include a transmission device 206 configured to provide a communication function, and an input/output device 208.
  • the structure shown in FIG1 is only for illustration and does not limit the structure of the above-mentioned computer terminal.
  • the computer terminal may also include more or fewer components than those shown in FIG. 1, or have a configuration that differs from FIG. 1 while providing the same functions or more.
  • the memory 204 may be configured to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the method for determining an indicator threshold in the embodiments of the present disclosure. The processor 202 runs the computer programs stored in the memory 204 to execute various functional applications and data processing, thereby implementing the above method.
  • the memory 204 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 204 may further include a memory remotely arranged relative to the processor 202, and these remote memories may be connected to the computer terminal via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 206 is configured to receive or send data via a network.
  • Specific examples of the above-mentioned network may include a wireless network provided by a communication provider of a computer terminal.
  • the transmission device 206 includes a network adapter (Network Interface Controller, referred to as NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 206 can be a radio frequency (Radio Frequency, referred to as RF) module, which is configured to communicate with the Internet wirelessly.
  • RF Radio Frequency
  • TCP Transmission Control Protocol
  • RTT Round-Trip Time, round-trip delay
  • CPU Central Processing Unit
  • FIG. 2 is a flow chart of a method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes the following steps:
  • Step S202 obtaining aggregated indicator data corresponding to the target indicator.
  • Step S204 determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain multiple groups, and fitting the indicator data of each of the multiple groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object.
  • clustering algorithms for clustering the second indicator data may include, but are not limited to, the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, and hierarchical clustering.
  • Step S206 determining an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • the disclosed embodiment obtains aggregated indicator data corresponding to the target indicator; determines an indicator data set from the aggregated indicator data, sorts the first indicator data in the indicator data set to obtain second indicator data, clusters the second indicator data to obtain a plurality of clustered groups, and fits the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; determines an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the problem of how to determine the indicator threshold.
  • the following implementation steps are proposed: determine the pre-set monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, the initial indicator data under the indicator category, and the time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the pre-set monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; aggregate the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
  • a technical solution for aggregating the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator, which specifically includes: obtaining a first time granularity of the indicator data to be aggregated; when it is determined that the first time granularity is smaller than the time aggregation granularity, obtaining the first indicator data of the indicator data to be aggregated within the first time granularity, and aggregating multiple first time granularities into the time aggregation granularity; aggregating multiple first indicator data within the multiple first time granularities into first aggregated indicator data within the time aggregation granularity, and determining the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.
  • when the first time granularity is equal to the time aggregation granularity, the first indicator data of the indicator data to be aggregated within the first time granularity can be obtained directly and determined as the aggregated indicator data corresponding to the target indicator.
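The aggregation rule in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: it assumes timestamps in minutes, assumes the no-roll-up case applies when both granularities are equal, and uses the arithmetic mean as the aggregation function, which the text does not specify (sum or maximum would fit the same structure). The helper name `aggregate_indicator` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable


def aggregate_indicator(
    samples: Iterable[tuple[int, float]],
    first_granularity: int,
    agg_granularity: int,
    reducer: Callable[[list[float]], float] = mean,
) -> list[tuple[int, float]]:
    """Aggregate (timestamp_in_minutes, value) samples to agg_granularity.

    Hypothetical helper: the mean reducer is an assumption, not from the source.
    """
    samples = sorted(samples)
    if first_granularity == agg_granularity:
        # Same granularity: the first indicator data is used directly.
        return samples
    if first_granularity > agg_granularity:
        # Assumption: asking for a finer granularity than the raw data is an error.
        raise ValueError("raw data is coarser than the aggregation granularity")
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its aggregation window.
        buckets[ts - ts % agg_granularity].append(value)
    return [(start, reducer(vals)) for start, vals in sorted(buckets.items())]


# 15-minute samples rolled up to 1-hour windows (illustrative values).
print(aggregate_indicator([(0, 10), (15, 20), (30, 30), (45, 40), (60, 50)], 15, 60))
# → [(0, 25), (60, 50)]
```

Passing a different `reducer`, for example `max`, changes only how each window is summarized.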
  • before the second indicator data is clustered, it can further be standardized to obtain a plurality of standardized indicator values, each corresponding to a sorting number. For each standardized indicator value, its sorting number is taken as the horizontal coordinate and the standardized value as the vertical coordinate, giving the coordinate point of that value. The coordinate slope between each pair of adjacent coordinate points is determined to obtain a plurality of coordinate slopes, and a smoothed value is determined for each coordinate slope to obtain a plurality of smoothed values. Third indicator data is then determined based on the plurality of smoothed values and taken as the updated second indicator data.
  • the above-mentioned standardization processing may include normalization processing.
  • the second indicator data may be standardized by using a normalization processing method to compress the range to within the range of [0, 1], so as to standardize the data and improve the data processing efficiency.
  • the following technical solution is proposed: clustering the multiple coordinate slopes according to a preset clustering algorithm to obtain multiple groups of slope values; for each group of slope values, determining the mean of the coordinate slopes of each group of slope values as the smoothing value of the coordinate slope of each group of slope values.
  • the above-mentioned preset clustering algorithm may include the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, hierarchical clustering, and so on; the present disclosure does not limit this.
  • a further technical solution is proposed, with the following steps: when a target group exists among the multiple groups of slope values, namely a group whose number of coordinate slopes differs from that of the other groups, the smoothed value of the coordinate slopes of the adjacent group is used as the smoothed value of the coordinate slopes in the target group, or the smoothed value of the coordinate slopes in the target group is set according to a preset smoothing value.
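The slope-clustering and smoothing scheme above might look like the sketch below. K-means is one of the algorithms listed in the disclosure; a small one-dimensional version is written out here so the example needs no third-party library. Two details are assumptions rather than anything the text states: a "target group" is taken to be a group with fewer than `min_size` slopes, and its "adjacent group" is the neighbouring group (by mean slope) whose mean is closest. The function names and slope values are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Plain one-dimensional K-means; returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Start from evenly spaced order statistics so the result is deterministic.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def smooth_slopes(slopes, k, min_size=2):
    """Replace each slope by its group mean; undersized groups borrow a neighbour's."""
    labels, _ = kmeans_1d(slopes, k)
    groups = {}
    for s, lab in zip(slopes, labels):
        groups.setdefault(lab, []).append(s)
    means = {lab: sum(m) / len(m) for lab, m in groups.items()}
    order = sorted(means, key=means.get)  # groups ranked by mean slope
    smoothed = dict(means)
    for pos, lab in enumerate(order):
        if len(groups[lab]) < min_size and len(order) > 1:
            # "Target group": take the smoothed value of the closest adjacent group.
            adjacent = [order[p] for p in (pos - 1, pos + 1) if 0 <= p < len(order)]
            nearest = min(adjacent, key=lambda g: abs(means[g] - means[lab]))
            smoothed[lab] = means[nearest]
    return [smoothed[lab] for lab in labels]


print([round(v, 4) for v in smooth_slopes([0.010, 0.012, 0.014, 0.040, 0.045, 0.052, 0.300], k=3)])
# → [0.012, 0.012, 0.012, 0.0457, 0.0457, 0.0457, 0.0457]
```

Here the lone slope 0.300 forms an undersized group, so it takes the mean of its only adjacent group instead of its own value; the "preset smoothing value" alternative would replace that lookup with a constant.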
  • multiple groups of means can be obtained according to the mean of the indicator data of each group, and a mean set can be determined according to the multiple groups of means, wherein the mean set includes the means corresponding to each piecewise function; the intersection coordinates of the intersection coordinate set of the piecewise function are determined, the left derivative and the right derivative corresponding to the intersection coordinates are determined, and the first mean of the first piecewise function corresponding to the left derivative in the mean set and the second mean of the second piecewise function corresponding to the right derivative in the mean set are determined; based on the first mean and the second mean, it is determined whether to retain the intersection coordinates within the indicator threshold set.
  • the following technical solution illustrates how to decide, based on the first mean and the second mean, whether to retain the intersection coordinates in the indicator threshold set: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine, from the first absolute distance value and the first mean, first coordinate information corresponding to the first mean, where the first coordinate information represents the independent-variable value of the first piecewise function; determine, from the first absolute distance value and the second mean, second coordinate information corresponding to the second mean, where the second coordinate information represents the independent-variable value of the second piecewise function; when the first coordinate information is the same as the second coordinate information, remove the intersection coordinates from the indicator threshold set; when they differ, retain the intersection coordinates in the indicator threshold set.
  • a technical solution for implementing step S206 above (determining the indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator) specifically includes: determining the set of non-differentiable points of the piecewise function and the coordinate points at which its second-order derivative equals a target value; determining the indicator threshold set based on the set of non-differentiable points, those coordinate points, and the intersection coordinate set of the piecewise function; and determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
  • target value may be, for example, 0, but is not limited thereto.
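One way, assumed here rather than taken from the disclosure, to find the coordinate points at which the second-order derivative equals the target value on sampled data is to compute central second differences on an evenly spaced grid and report where they hit or cross the target. The function name and the tolerance `tol` are hypothetical.

```python
def second_derivative_points(xs, ys, target=0.0, tol=1e-12):
    """Return x positions where the discrete second derivative equals `target`.

    Assumes xs is evenly spaced. When the value crosses `target` strictly
    between two neighbouring grid points, the point closer to `target` is kept.
    """
    h = xs[1] - xs[0]
    # Central second difference at each interior point xs[1] .. xs[-2].
    d2 = [(ys[i - 1] - 2 * ys[i] + ys[i + 1]) / h**2 for i in range(1, len(xs) - 1)]
    found = []
    for j, val in enumerate(d2):
        if abs(val - target) <= tol:
            found.append(xs[j + 1])
        elif j + 1 < len(d2):
            nxt = d2[j + 1]
            # Skip if the next point is itself a hit; it is reported on the next pass.
            if (val - target) * (nxt - target) < 0 and abs(nxt - target) > tol:
                closer = j + 1 if abs(val - target) <= abs(nxt - target) else j + 2
                found.append(xs[closer])
    return found


# y = x**3 has its inflection point (second derivative 0) at x = 0.
print(second_derivative_points(list(range(-3, 4)), [x**3 for x in range(-3, 4)]))
# → [0]
```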
  • the indicator threshold can be determined from the indicator threshold set according to the indicator bias of the target indicator in several ways. Way 1: when the indicator bias of the target indicator is negative, if the indicator threshold is an alarm threshold, the maximum value in the indicator threshold set is taken as the indicator threshold; if it is a preferential (optimal) threshold, the minimum value in the set is taken as the indicator threshold.
  • Way 2: when the indicator bias of the target indicator is positive, if the indicator threshold is an alarm threshold, the minimum value in the indicator threshold set is taken as the indicator threshold; if it is a preferential (optimal) threshold, the maximum value in the set is taken as the indicator threshold.
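The two selection rules reduce to a single comparison. In this sketch a negative bias is read as "higher values are worse", which matches the CPU usage and RTT delay examples later in the document; the function name and string labels are hypothetical.

```python
def select_threshold(candidates, bias, kind):
    """Pick the indicator threshold from the candidate threshold set.

    bias: "negative" (higher is worse, e.g. CPU usage) or "positive".
    kind: "alarm" or "preferential" (optimal).
    """
    if not candidates:
        raise ValueError("threshold set is empty")
    if bias not in ("negative", "positive") or kind not in ("alarm", "preferential"):
        raise ValueError("unknown bias or threshold type")
    # Way 1 (negative bias): alarm -> maximum, preferential -> minimum.
    # Way 2 (positive bias): alarm -> minimum, preferential -> maximum.
    take_max = (bias == "negative") == (kind == "alarm")
    return max(candidates) if take_max else min(candidates)


print(select_threshold([0.3, 0.5, 0.8], "negative", "alarm"))  # → 0.8
```

With the threshold set values of the CPU usage example, a negative bias and an alarm threshold select 0.8, the reference value used there.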
  • the alarm threshold can be understood as a threshold when the performance corresponding to the indicator data of the target indicator is poor.
  • for example, the alarm threshold of CPU usage may be set to 80%; at that level the CPU occupies substantial resources and performance is poor.
  • the optimal threshold can be understood as a threshold when the performance corresponding to the indicator data of the target indicator is better.
  • for example, the optimal threshold of network delay may be set to 10%, at which point the network delay is small and performance is better. In particular, when there are multiple optimal thresholds, the smallest one is selected as the optimal delay.
  • the monitoring dimensions and indicators are determined, and a key KPI indicator system for operation and maintenance scenarios, {KPI1, KPI2, ..., KPIn}, is constructed. Then, according to the real-time requirements of business O&M, the time aggregation granularity is determined, and each dimension-KPI data series is aggregated at that granularity to construct five-tuple data {dimension (i.e., monitoring dimension), object (i.e., monitored object), time granularity (i.e., time aggregation granularity), indicator (i.e., indicator category of the target indicator), data (i.e., initial indicator data under the indicator category)}. The business indicator configuration information must specify the bias of each indicator and its normal range.
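The five-tuple record and the business indicator configuration can be modelled with plain dataclasses, as in this sketch. The class, field, and function names are hypothetical, the data values are illustrative only, and `normal_max` defaults to infinity to express an open range such as "greater than or equal to 0". `build_dataset` pools the data of the selected objects into one indicator data set, in the spirit of Step a.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorRecord:
    """One five-tuple {dimension, object, time granularity, indicator, data}."""
    dimension: str             # monitoring dimension, e.g. "server"
    obj: str                   # monitored object ("object" would shadow the builtin)
    time_granularity: str      # time aggregation granularity, e.g. "1h"
    indicator: str             # indicator category, e.g. "CPU usage (%)"
    data: tuple[float, ...]    # aggregated indicator values


@dataclass(frozen=True)
class IndicatorConfig:
    """Business indicator configuration: bias and normal range."""
    indicator: str
    bias: str                  # "negative" (higher is worse) or "positive"
    normal_min: float
    normal_max: float = float("inf")


def build_dataset(records, dimension, objects, indicator):
    """Pool the data of the selected objects into one indicator data set."""
    return [
        value
        for r in records
        if r.dimension == dimension and r.obj in objects and r.indicator == indicator
        for value in r.data
    ]


cpu_cfg = IndicatorConfig("CPU usage (%)", bias="negative", normal_min=0, normal_max=100)
records = [
    IndicatorRecord("server", "server A", "1h", "CPU usage (%)", (12.0, 35.5)),
    IndicatorRecord("server", "server B", "1h", "CPU usage (%)", (40.0,)),
    IndicatorRecord("server", "server C", "1h", "CPU usage (%)", (99.0,)),
]
print(build_dataset(records, "server", {"server A", "server B"}, "CPU usage (%)"))
# → [12.0, 35.5, 40.0]
```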
  • Step a: construct a single-dimension KPI data set for a single object or multiple objects (the choice of objects depends on the application scenario), sort the values of the time-series indicator (ascending or descending), and obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, N], of ID values and KPI values, where N is the number of samples in the data set and the ID is the serial number of the sorted KPI value, starting from 1 and increasing in steps of 1.
  • the two-dimensional sequence can be expressed as a two-dimensional discrete point image with the sample ID and KPI index value as the coordinate axis.
  • each coordinate point uses the ID as its horizontal coordinate and the KPI value as its vertical coordinate.
  • Step b Considering that the KPI indicator value may fluctuate greatly, in order to facilitate subsequent processing, the indicator is first standardized.
  • the KPI indicator value is standardized by a normalization processing method, and the range is compressed to the range of [0, 1].
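Steps a and b can be sketched together: sort the values, number them from 1, and min-max scale them into [0, 1]. The sketch also returns the original minimum and maximum so that a normalized reference value can later be mapped back to the indicator's own scale, which is how the worked examples obtain their final thresholds. Function names are hypothetical, and rejecting a constant series (where min-max scaling would divide by zero) is an assumption.

```python
def to_coordinate_points(values, descending=False):
    """Sort values and return (ID, normalized value) points plus (min, max).

    IDs start at 1 and step by 1; min-max scaling maps values into [0, 1].
    """
    ordered = sorted(values, reverse=descending)
    lo, hi = min(ordered), max(ordered)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are equal; min-max scaling is undefined")
    points = [(i, (v - lo) / span) for i, v in enumerate(ordered, start=1)]
    return points, (lo, hi)


def denormalize(y, lo, hi):
    """Map a normalized value back to the original indicator scale."""
    return lo + y * (hi - lo)


points, (lo, hi) = to_coordinate_points([30, 10, 20])
print(points)                    # → [(1, 0.0), (2, 0.5), (3, 1.0)]
print(denormalize(0.5, lo, hi))  # → 20.0
```

The document does not give the minimum and maximum of its example data sets, so this sketch cannot reproduce the 90.72 and 2.5 figures; it only shows the mapping.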
  • a clustering algorithm avoids setting too many parameters when building the model, and it separates the parts of the sequence that behave differently, which makes the turning points easier to obtain later.
  • the present disclosure does not specifically limit the type of clustering algorithms.
  • Step d: curve fitting is performed separately on each of the G cluster result sets from step c, giving a piecewise function f(x) made up of G fitted functions.
  • the curve fitting method can quickly obtain an approximate piecewise function, which provides an effective way to solve the turning point in the subsequent automatic threshold calculation process.
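The disclosure does not name the fitting model. As an assumption, this sketch fits each cluster with an ordinary least-squares straight line, giving a piecewise-linear f(x); higher-order polynomials would fit into the same structure. It also assumes each cluster covers a contiguous run of IDs, so ordering the groups by their first x value orders the pieces from left to right. Function names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a*x + b to a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def fit_piecewise(points, labels):
    """Fit one line per cluster; pieces are ordered by their first x value."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    ordered = sorted(groups.values(), key=lambda g: g[0][0])
    return [fit_line(g) for g in ordered]


print(fit_piecewise([(1, 0), (2, 1), (3, 2), (4, 2), (5, 2)], [0, 0, 0, 1, 1]))
# → [(1.0, -1.0), (0.0, 2.0)]
```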
  • Step a: compute the intersection coordinates of each pair of adjacent fitted functions. Together with the set C of non-differentiable points of f(x) on the interval x ∈ [1, N] and the points at which the second derivative of f(x) is 0, these form the threshold set T. For each x ∈ C, compute the left derivative k1' and the right derivative k2', compute the absolute distance from each of them to every k ∈ K (the group mean smoothed slopes), and assign each derivative to the closest group. If both derivatives belong to the same group, the point is not a turning point and is removed from the threshold set T.
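Assuming each fitted piece is a straight line y = a·x + b (the disclosure does not fix the model), the intersection of two adjacent pieces has a closed form, and each such intersection is also a kink where f(x) is not differentiable. The second-derivative condition is degenerate for lines (it is 0 everywhere), so this hypothetical helper only collects the intersections that fall inside [x_min, x_max].

```python
def adjacent_intersections(lines, x_min, x_max):
    """Intersections of consecutive lines y = a*x + b inside [x_min, x_max].

    Parallel neighbours (equal slopes) have no single intersection and are skipped.
    """
    points = []
    for (a1, b1), (a2, b2) in zip(lines, lines[1:]):
        if a1 == a2:
            continue
        x = (b2 - b1) / (a1 - a2)
        if x_min <= x <= x_max:
            points.append((x, a1 * x + b1))
    return points


print(adjacent_intersections([(1.0, -1.0), (0.0, 2.0)], 1, 5))  # → [(3.0, 2.0)]
```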
  • Step b Combine the indicator bias information provided in the business indicator configuration information in step 1 and take the maximum or minimum value in the threshold set T as the threshold solution.
  • the monitoring dimension of this embodiment is the server
  • the objects are server A, server B, and server C
  • the indicators are CPU usage (%), memory usage (%), disk usage (%), and network rate (kbps).
  • the above indicators are constructed into a key KPI indicator system for server equipment operation and maintenance scenarios;
  • the time aggregation granularity is set to 1 hour. Taking the server's CPU usage (%) as the target indicator, the indicator is aggregated at that granularity to construct the five-tuple data {dimension, object, time granularity, indicator, data}; the business indicator configuration information states that the bias of the CPU usage (%) indicator is negative and that its normal range is 0 to 100.
  • Table 1 Indicator data record table
  • the mapping function is then solved from the discrete sample points.
  • taking server A and server B as objects, their corresponding indicator data sets are selected and the indicator data in the set are sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 48], of ID values and CPU usage (%) values; a two-dimensional discrete point image is then constructed with the sample ID on the horizontal axis and CPU usage (%) on the vertical axis.
  • the CPU usage (%) indicator value is normalized.
  • the minimum-maximum scaling method can be used to compress the indicator range to the range of [0, 1].
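  • the normalization function is as follows: y = (x - x_min) / (x_max - x_min), where x is an indicator value and x_min and x_max are the minimum and maximum indicator values in the data set.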
  • the two-dimensional sequence discrete points within the above two-dimensional sequence can be understood as coordinate points corresponding to the above standardized index values.
  • the slope between each pair of consecutive points is calculated segment by segment (this is the coordinate slope described above), and the slopes are smoothed with a three-point mean to obtain the smoothed slope k'.
  • the three-point mean smoothed slope cannot be calculated for the last two points, so the value of the preceding point is used for them instead, giving a new sequence {i: k'_i}, i ∈ [1, 48], of ID values and three-point mean smoothed slopes.
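The smoothing step can be sketched as below. "Three-point mean slope" is read here, as an assumption, as the average of the two segment slopes spanned by points i, i+1 and i+2, which equals (y[i+2] - y[i]) / 2 when IDs are spaced by 1; this reading is consistent with only the last two points lacking a value. The function name is hypothetical.

```python
def three_point_smoothed_slopes(ys):
    """One smoothed slope per point for IDs 1..N with unit spacing.

    k'_i is the mean of the segment slopes over points i, i+1, i+2.
    The last two points reuse the last computable value.
    """
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    seg = [ys[i + 1] - ys[i] for i in range(n - 1)]  # slopes between neighbours
    smoothed = [(seg[i] + seg[i + 1]) / 2 for i in range(n - 2)]
    smoothed += [smoothed[-1]] * 2  # fallback for the last two points
    return smoothed


print(three_point_smoothed_slopes([0, 1, 3, 6, 10]))  # → [1.5, 2.5, 3.5, 3.5, 3.5]
```

The output has the same length as the input, matching the sequence {i: k'_i} that runs over all 48 IDs.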
  • the threshold is automatically learned.
  • the intersection coordinate set of the adjacent piecewise fitting functions is X = {(10, 0.3), (30, 0.5), (40, 0.8)}.
  • together with the set C of non-differentiable points of f(x) on the interval x ∈ [1, 48] and the coordinate points where the second derivative is 0, the intersection coordinates form the threshold set T.
  • in this example, the sets C and T are both {(10, 0.3), (30, 0.5), (40, 0.8)}.
  • the left derivative k1' and the right derivative k2' are calculated respectively.
  • the absolute distances from k1' and from k2' to each k ∈ K are then calculated.
  • here, k takes the mean smoothed slopes of the groups corresponding to the piecewise functions on the left and right of the coordinate point, and each derivative is assigned to the group at the smaller absolute distance. For the coordinate point (10, 0.3), the left derivative k1' and the right derivative k2' are 0.04 and 0.01, and the mean smoothed slopes of the groups on its left and right are 0.046 and 0.012. By absolute distance, k1' and k2' belong to different groups, so the point is kept in the threshold set T. The other coordinate points are handled in the same way, and the final threshold set T is {(10, 0.3), (30, 0.5), (40, 0.8)}.
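The retention check for a candidate point reduces to a nearest-group comparison. This sketch reproduces the check for the point (10, 0.3) using the numbers given above; function names are hypothetical, and ties are broken toward the lower group index, an arbitrary choice.

```python
def nearest_group(derivative, group_slopes):
    """Index of the group whose mean smoothed slope is closest (absolute distance)."""
    return min(range(len(group_slopes)), key=lambda j: abs(derivative - group_slopes[j]))


def keep_turning_point(left_deriv, right_deriv, group_slopes):
    """Keep a candidate only if its left and right derivatives fall in different groups."""
    return nearest_group(left_deriv, group_slopes) != nearest_group(right_deriv, group_slopes)


# Point (10, 0.3): k1' = 0.04, k2' = 0.01; group means 0.046 (left) and 0.012 (right).
print(keep_turning_point(0.04, 0.01, [0.046, 0.012]))  # → True
```

0.04 is closest to 0.046 and 0.01 is closest to 0.012, so the two derivatives land in different groups and the point stays in T.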
  • the usage scenario is to find the CPU usage (%) at which an alarm should be issued, that is, the alarm threshold for poor performance. Because the CPU usage indicator has a negative bias, the maximum value of the three-point mean smoothed slope in the threshold set T, 0.8, is selected as the reference value for generating the threshold. Mapping this value back to the scale before normalization gives a CPU usage (%) of 90.72, which is the required threshold solution.
  • the monitoring dimension of this embodiment is the cell
  • the objects are cell 622001, cell 622002, cell 622003, ..., cell 622099
  • the indicators are TCP connection success rate (%), TCP retransmission rate (%), TCP out-of-order rate (%), and TCP average RTT delay (ms);
  • the time aggregation granularity is determined to be 1 hour.
  • the indicator is aggregated at that time granularity to construct the five-tuple data {dimension, object, time granularity, indicator, data}; the business indicator configuration information states that the bias of the TCP average RTT delay (ms) indicator is negative and that its normal range is greater than or equal to 0.
  • the mapping function is then solved from the discrete sample points.
  • 50 cells including cell 622001, cell 622002, cell 622003, ..., cell 622050, are determined as objects, and their corresponding indicator data sets are selected.
  • the indicator data in the set are sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 1200], of ID values and TCP average RTT delay (ms) values.
  • a two-dimensional discrete point image is constructed with the horizontal axis as sample ID and the vertical axis as TCP average RTT delay (ms).
  • the indicator value of TCP average RTT delay (ms) is normalized, and the minimum-maximum scaling method can be used to compress the indicator range to the range of [0, 1].
  • the threshold is automatically learned.
  • the minimum normalized value in the threshold set T, 0.123, is selected as the reference value for generating the threshold. Inverting the normalization gives a TCP average RTT delay (ms) of 2.5, which is the required threshold.
  • the indicators are first classified by their one-sidedness into positive and negative indicators. Next, the indicator data is preprocessed and the rate of change of the curve is calculated. A machine learning algorithm is then used to train a model on the indicator data. Finally, threshold learning is transformed into solving for the turning points determined by the indicator's one-sidedness, so that thresholds for different types of indicators are generated intelligently at lower cost and with higher accuracy.
  • the problem that the threshold or the converted threshold needs to be manually set can be solved more thoroughly.
  • the technical solution disclosed in the present invention has better applicability and accuracy, and provides a strong guarantee for the operation and maintenance support and operation analysis of mobile operators, which not only helps mobile operators to perform operation and maintenance support and operation analysis more accurately, but also greatly saves labor costs.
  • the present disclosure relates to the field of big data and artificial intelligence technology, in particular to communication big data and its engineering operation and maintenance in the Internet, the Internet of Things, and other fields where a large number of indicator thresholds must be set in a targeted way, for example the operation and maintenance support and operation analysis of mobile operators, including anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation.
  • indicator threshold setting methods are currently in wide use on the market, and thresholds are mainly set from fixed empirical thresholds or from dynamic thresholds derived from relatively complex statistical distributions.
  • the present disclosure converts the calculation problem of the indicator threshold into an image solving problem, and combines artificial intelligence algorithms for training and prediction.
  • the artificial intelligence algorithms used also have more choices in practical applications, such as neural network, clustering, classification and other algorithms.
  • the constructed model has good accuracy and broad application prospects, which provides a premise for the precision and intelligence of mobile operator engineering operation and maintenance, and also clarifies the direction for reducing labor costs.
  • This disclosure is aimed at the field of operation and maintenance, especially large and complex architecture systems. It includes IT equipment operation and maintenance based on underlying monitoring indicators and business system operation and maintenance based on model-based KPI/KQI indicators. By collecting and cleaning key system indicators, building monitoring dimension models and automatically learning indicator thresholds, it is possible to identify faults or risks in the system, thereby facilitating network optimization personnel to handle or avoid faults in advance.
  • a device for determining an indicator threshold is also provided, which is used to implement the above embodiments and preferred implementation methods.
  • the term “module” may be a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 4, the device for determining an indicator threshold includes:
  • An acquisition module 42 is configured to acquire aggregated indicator data corresponding to a target indicator
  • a first determination module 44 is configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object;
  • clustering algorithms for clustering the second indicator data may include, but are not limited to, the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, and hierarchical clustering.
  • the second determination module 46 is configured to determine an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • aggregated indicator data corresponding to the target indicator is obtained; an indicator data set is determined from the aggregated indicator data, the first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain multiple clustered groups, and the indicator data of each group of the multiple groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; an indicator threshold is determined from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the problem of how to determine the indicator threshold.
  • the acquisition module 42 is further configured to: determine a preset monitoring dimension, a monitoring object of the monitoring dimension, an indicator category of the target indicator, the initial indicator data under the indicator category, and a time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the preset monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; aggregate the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
  • the acquisition module 42 is further configured to: acquire the first time granularity of the indicator data to be aggregated; when determining that the first time granularity is smaller than the time aggregation granularity, acquire the first indicator data of the indicator data to be aggregated within the first time granularity, and aggregate multiple first time granularities into the time aggregation granularity; aggregate multiple first indicator data within the multiple first time granularities into first aggregate indicator data within the time aggregation granularity, and determine the first aggregate indicator data as the aggregate indicator data corresponding to the target indicator.
  • the above-mentioned acquisition module 42 is also configured to: when it is determined that the first time granularity is equal to the time aggregation granularity, obtain the first indicator data of the indicator data to be aggregated within the first time granularity, and determine the first indicator data as the aggregation indicator data corresponding to the target indicator.
  • the acquisition module 42 is further configured as follows: before clustering the second indicator data, the second indicator data is standardized to obtain a plurality of standardized indicator values, wherein each standardized indicator value corresponds to a sorting number; for each standardized indicator value, the sorting number corresponding to the standardized indicator value is determined as the horizontal coordinate, and the standardized indicator value is determined as the vertical coordinate to obtain the coordinate point corresponding to the standardized indicator value; the coordinate slopes between two adjacent coordinate points are determined to obtain a plurality of coordinate slopes, and for each of the plurality of coordinate slopes, a smoothing value of each coordinate slope is determined to obtain a plurality of smoothing values; the third indicator data is determined according to the plurality of smoothing values, and the third indicator data is determined as the updated second indicator data.
  • the above-mentioned standardization processing may include normalization processing.
  • the second indicator data may be standardized by using a normalization processing method to compress the range to within the range of [0, 1], so as to standardize the data and improve the data processing efficiency.
  • the acquisition module 42 is further configured to: in the process of determining the smoothing value of each coordinate slope in the multiple coordinate slopes to obtain multiple smoothing values, cluster the multiple coordinate slopes according to a preset clustering algorithm to obtain multiple groups of slope values; for each group of slope values, determine the mean of the coordinate slopes of each group of slope values as the smoothing value of the coordinate slope of each group of slope values.
  • the above-mentioned preset clustering algorithms may include the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, hierarchical clustering, etc., and the present disclosure does not limit this.
  • the acquisition module 42 is further configured to: when it is determined that there is a target group slope value among the multiple groups of slope values, determine the smoothed value of the coordinate slope of the adjacent group slope value adjacent to the target group slope value as the smoothed value of the coordinate slope within the target group slope value, or determine the smoothed value of the coordinate slope within the target group slope value according to a preset smoothing value, wherein the number of coordinate slopes within the target group slope value is different from the number of coordinate slopes within each group of slope values.
  • the first determination module 44 is further configured to: obtain multiple groups of means according to the mean of the indicator data of each group, and determine a mean set according to the multiple groups of means, wherein the mean set includes the means corresponding to each piecewise function; determine the intersection coordinates of the intersection coordinate set of the piecewise function, determine the left derivative and the right derivative corresponding to the intersection coordinates, and determine the first mean of the first piecewise function corresponding to the left derivative in the mean set and the second mean of the second piecewise function corresponding to the right derivative in the mean set; determine whether to retain the intersection coordinates within the indicator threshold set based on the first mean and the second mean.
  • the first determination module 44 is further configured to: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine first coordinate information corresponding to the first mean according to the first absolute distance value and the first mean, wherein the first coordinate information represents the independent variable value of the first piecewise function; determine second coordinate information corresponding to the second mean according to the first absolute distance value and the second mean, wherein the second coordinate information represents the independent variable value of the second piecewise function; when it is determined that the first coordinate information is the same as the second coordinate information, remove the intersection coordinates from the indicator threshold set, since the point is not a turning point; when it is determined that the first coordinate information is different from the second coordinate information, retain the intersection coordinates within the indicator threshold set.
  • the second determination module 46 is further configured to: determine the set of non-differentiable points of the piecewise function and the coordinate points where the second-order derivative is the target value; determine the indicator threshold set based on the set of non-differentiable points, the coordinate points where the second-order derivative is the target value, and the coordinate set of the intersection of the piecewise function; and determine the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
  • the second determination module 46 is further configured as follows: when it is determined that the indicator bias of the target indicator is negative, if the type of the indicator threshold is determined to be an alarm threshold, the maximum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be a preferential threshold, the minimum value in the indicator threshold set is determined as the indicator threshold.
  • the above-mentioned second determination module 46 is also configured as follows: when it is determined that the indicator bias of the target indicator is positive, if the type of the indicator threshold is determined to be an alarm threshold, the minimum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be a preferential threshold, the maximum value in the indicator threshold set is determined as the indicator threshold.
  • the present disclosure proposes an intelligent threshold learning and operation and maintenance device based on curve image calculation. It addresses a core problem of threshold self-learning in the industry, namely that it merely replaces one threshold with another (automatic learning of one threshold turns into setting another threshold), and truly achieves automatic threshold identification and operation and maintenance without human intervention.
  • the indicator data of the monitoring object based on the granularity of the actual application scenario is obtained; the constructed data is displayed in a sequence graphical manner, the distribution data is converted into a curve image, and by solving the turning point of the distribution, combined with the actual indicator business characteristics, the indicator threshold self-learning function is further realized.
  • a first aspect of the present disclosure provides a model building unit based on collected indicators, which is configured to implement functions such as data cleaning, aggregate model description, and core business indicator configuration item description.
  • the second aspect of the present disclosure provides a method for solving a mapping function based on discrete sample points, the method comprising: converting the time series data after model construction into the image representation required for solving the threshold in the present disclosure. It should be noted that the image is not actually drawn here, but the converted data sequence can express the characteristics of the image; through clustering and image fitting algorithms, a mapping function of the image based on the sample sequence is obtained.
  • the third aspect of the present disclosure provides a threshold automatic learning calculation method based on curve image calculation, the method comprising: solving the slope change rate of the function curve obtained in the above steps, combining the business characteristics of the indicator, to obtain the threshold intelligent recognition result.
  • the fourth aspect of the present disclosure provides a threshold automatic learning device based on curve image calculation, the device comprising: a real-time data aggregation module, configured to perform real-time data cleaning and indicator aggregation for key KPI indicators of each entity node of the multi-dimensional system of the operation and maintenance system; a threshold intelligent identification module, configured to execute the method in the above steps.
  • a fifth aspect of the present disclosure provides an electronic device comprising a processor and a memory, where the memory is configured to store a computer program;
  • the processor is configured to implement the functions implemented by the model building unit described in the first aspect above, and to execute a method for solving a mapping function based on discrete sample points described in the second aspect above and a threshold automatic learning calculation method based on curve image calculation described in the third aspect.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is a better implementation method.
  • the technical solution of the present disclosure, or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • the above-mentioned computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk, and other media that can store computer programs.
  • An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the processor may be configured to perform the following steps through a computer program:
  • the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
  • modules or steps of the present disclosure can be implemented by a general computing device, they can be concentrated on a single computing device, or distributed on a network composed of multiple computing devices, they can be implemented by a program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases, the steps shown or described can be executed in a different order than here, or they can be made into individual integrated circuit modules, or multiple modules or steps therein can be made into a single integrated circuit module for implementation.
  • the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

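Embodiments of the present disclosure provide a method and apparatus for determining an indicator threshold, a storage medium, and an electronic apparatus. The method includes: acquiring aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, where the same indicator data set represents indicator data of the same monitored object; and determining an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator. This technical solution solves the problem in the related art of how to determine an indicator threshold.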

Description

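Method and Apparatus for Determining Indicator Threshold, Storage Medium, and Electronic Apparatus
This disclosure claims priority to Chinese patent application No. 202211225305.3, filed with the China Patent Office on September 30, 2022 and entitled "Method and Apparatus for Determining Indicator Threshold, Storage Medium, and Electronic Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of big data and artificial intelligence technology, and in particular to a method and apparatus for determining an indicator threshold, a storage medium, and an electronic apparatus.
Background
With the arrival of the Internet of Everything era, devices such as sensors, smartphones, wearables, and smart home appliances will become part of the Internet of Everything and often generate massive amounts of data while operating. When processing this data rapidly, wireless network operators usually mine its value and apply it to operation and maintenance (O&M) support and operation analysis. Services based on the mobile Internet are rich and diverse, and different services place different demands on network performance. Therefore, when data is collected for O&M support and operation analysis, thresholds usually need to be set according to the characteristics of each service and the actual behavior of each indicator, so that evaluation criteria for the indicators can be built flexibly.
Many O&M analysis scenarios require threshold setting, such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation. In the past, wireless network operators set thresholds for service indicators mainly from fixed empirical thresholds or from dynamic thresholds derived from relatively complex statistical distributions. Even dynamic thresholds based on statistical distributions or other mathematical methods merely convert the threshold-solving problem into a threshold-setting problem in another dimension. This makes it difficult to measure the quality of service indicators accurately and objectively, and therefore to guide network O&M and analysis effectively and maximize the value of the data.
Therefore, no effective solution has yet been proposed in the related art to the problem of how to determine an indicator threshold.
It is therefore necessary to improve the related art to overcome the above defects.
Summary
Embodiments of the present disclosure provide a method and apparatus for determining an indicator threshold, a storage medium, and an electronic apparatus, so as to at least solve the problem of how to determine an indicator threshold.
According to one aspect of the embodiments of the present disclosure, a method for determining an indicator threshold is provided, including: acquiring aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, where the same indicator data set represents indicator data of the same monitored object; and determining an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
According to another aspect of the embodiments of the present disclosure, an apparatus for determining an indicator threshold is further provided, including: an acquisition module, configured to acquire aggregated indicator data corresponding to a target indicator; a first determination module, configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, where the same indicator data set represents indicator data of the same monitored object; and a second determination module, configured to determine an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is further provided, storing a computer program, where the computer program is configured to perform the above method for determining an indicator threshold when run.
According to another aspect of the embodiments of the present disclosure, an electronic apparatus is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor performs the above method for determining an indicator threshold through the computer program.
With the present disclosure, aggregated indicator data corresponding to a target indicator is acquired; an indicator data set is determined from the aggregated indicator data, the first indicator data in the set is sorted to obtain second indicator data, the second indicator data is clustered into a plurality of groups, and the indicator data of each group is fitted to obtain the corresponding piecewise function, where the same indicator data set represents indicator data of the same monitored object; and an indicator threshold is determined from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator. This solves the technical problem of how to determine an indicator threshold.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of it. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of the method for determining an indicator threshold according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of two-dimensional discrete points according to an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of an apparatus for determining an indicator threshold according to an embodiment of the present disclosure.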
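Detailed Description
To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
The method embodiments provided in the embodiments of the present disclosure may be executed on a computer terminal or a similar computing device. Taking a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 1, the computer terminal may include one or more processors 202 (only one is shown in FIG. 1), which may include, but are not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD), and a memory 204 configured to store data. In an exemplary embodiment, the computer terminal may further include a transmission device 206 for communication functions and an input/output device 208. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than shown in FIG. 1, or have a different configuration with functions equivalent to or greater than those shown in FIG. 1.
The memory 204 may be configured to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the method for determining an indicator threshold in the embodiments of the present disclosure. By running the computer program stored in the memory 204, the processor 202 executes various functional applications and data processing, that is, implements the above method. The memory 204 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 204 may further include memory located remotely from the processor 202, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 206 is configured to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 206 includes a network interface controller (NIC), which can be connected to other network devices through a base station to communicate with the Internet. In another example, the transmission device 206 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.
The technical terms used in the present disclosure are explained as follows:
KQI: Key Quality Indicator, a service quality parameter;
KPI: Key Performance Indicator;
TCP: Transmission Control Protocol;
RTT: Round-Trip Time;
CPU: Central Processing Unit.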
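FIG. 2 is a flowchart of the method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes the following steps:
Step S202: acquire the aggregated indicator data corresponding to the target indicator.
Step S204: determine an indicator data set from the aggregated indicator data; sort the first indicator data in the indicator data set to obtain second indicator data; cluster the second indicator data to obtain a plurality of groups; and fit the indicator data of each group to obtain the piecewise function corresponding to that group, where the same indicator data set represents indicator data of the same monitored object.
It should be noted that the clustering algorithm used to cluster the second indicator data may include, but is not limited to, the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, and hierarchical clustering.
Step S206: determine the indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
Through steps S202 to S206, that is, acquiring the aggregated indicator data of the target indicator, building a piecewise function by sorting, clustering, and fitting the indicator data of one monitored object, and selecting the indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias, the embodiments of the present disclosure solve the problem of how to determine an indicator threshold.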
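In an exemplary embodiment, step S202 (acquiring the aggregated indicator data corresponding to the target indicator) may be implemented as follows: determine a preset monitoring dimension, the monitored objects of that dimension, the indicator category of the target indicator, the initial indicator data under that category, and the time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the monitoring dimension, the monitored objects, the indicator category, and the initial indicator data; and aggregate the indicator data to be aggregated at the time aggregation granularity to obtain the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, aggregating the indicator data to be aggregated at the time aggregation granularity specifically includes: acquiring the first time granularity of the indicator data to be aggregated; when the first time granularity is smaller than the time aggregation granularity, acquiring the first indicator data of the indicator data to be aggregated within each first time granularity, and then combining multiple first time granularities into the time aggregation granularity; and aggregating the multiple first indicator data within those first time granularities into first aggregated indicator data within the time aggregation granularity, which is taken as the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, when the first time granularity is equal to the time aggregation granularity, the first indicator data of the indicator data to be aggregated within the first time granularity may be acquired directly and taken as the aggregated indicator data corresponding to the target indicator.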
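A minimal Python sketch of the aggregation in the two cases above, assuming timestamped samples and a 1-hour aggregation granularity. The function name `aggregate_to_hour` is hypothetical, and averaging is an assumption: the text does not fix the aggregation function, and sum or maximum may suit other indicators. When the first time granularity already equals one hour, each bucket holds a single sample, so the data passes through unchanged.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_to_hour(samples):
    """Aggregate (timestamp, value) samples into 1-hour buckets.

    Averaging is an assumption: the text does not fix the aggregation
    function, and sum or max may suit other indicators.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate each timestamp to the start of its hour (the aggregation granularity).
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # One aggregated value per hour, in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Usage: two 15-minute samples in hour 10 are merged; the single hour-11 sample passes through.
hourly = aggregate_to_hour([
    (datetime(2022, 9, 30, 10, 0), 40.0),
    (datetime(2022, 9, 30, 10, 15), 60.0),
    (datetime(2022, 9, 30, 11, 0), 70.0),
])
```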
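In an exemplary embodiment, before the second indicator data is clustered, it may further be standardized to obtain a plurality of standardized indicator values, each corresponding to a sort number. For each standardized indicator value, its sort number is taken as the horizontal coordinate and the value itself as the vertical coordinate, giving the coordinate point of that value. The coordinate slope between each pair of adjacent coordinate points is then determined, giving a plurality of coordinate slopes, and a smoothed value is determined for each coordinate slope, giving a plurality of smoothed values. Third indicator data is determined from the smoothed values and is used as the updated second indicator data.
It should be noted that the standardization may include normalization. For example, normalizing the second indicator data compresses its range to [0, 1], which standardizes the data and improves processing efficiency.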
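The sorting, normalization, and coordinate construction described above might look like this in Python. This is a sketch: `to_coordinate_points` is a hypothetical name, and mapping a constant series to 0.0 is an assumption the text does not address.

```python
def to_coordinate_points(values, ascending=True):
    """Sort raw indicator values, min-max normalize them to [0, 1], and pair
    each normalized value with its 1-based sort number as (i, y_i)."""
    ordered = sorted(values, reverse=not ascending)
    lo, hi = min(ordered), max(ordered)
    span = hi - lo
    if span == 0:
        # Degenerate case (all values equal): map everything to 0.0 (an assumption).
        return [(i, 0.0) for i in range(1, len(ordered) + 1)]
    # Sort number on the horizontal axis, normalized value on the vertical axis.
    return [(i, (v - lo) / span) for i, v in enumerate(ordered, start=1)]
```

Either sort direction is allowed by the text; the `ascending` flag only flips which end of the curve the largest values sit on.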
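In an exemplary embodiment, the smoothed value of each coordinate slope may be determined as follows: cluster the coordinate slopes using a preset clustering algorithm to obtain multiple groups of slope values; and for each group, take the mean of the coordinate slopes in the group as the smoothed value of the coordinate slopes in that group.
The preset clustering algorithm may include the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, hierarchical clustering, and so on; the present disclosure does not limit this.
In an exemplary embodiment, when a target group exists among the groups of slope values, where the number of coordinate slopes in the target group differs from the number in the other groups, the smoothed value of the coordinate slopes in an adjacent group may be used as the smoothed value of the coordinate slopes in the target group, or the smoothed value for the target group may be set according to a preset smoothing value.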
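The disclosure leaves the clustering algorithm open (K-means, DBSCAN, GMM, and others). The Python sketch below swaps in a deliberately simple, hypothetical stand-in so that it runs without third-party libraries: it splits the ID-ordered smoothed slopes into contiguous groups wherever consecutive values jump by more than a tolerance `tol`, uses each group's mean as the smoothed value, and lets an undersized group (fewer than `min_size` slopes, one possible reading of the "target group" above) borrow an adjacent group's value. `tol` and `min_size` are illustrative parameters, not values from the text.

```python
def group_slopes(smoothed, tol, min_size=2):
    """Hypothetical stand-in for the clustering step (not DBSCAN or GMM).

    Returns (groups, means): groups are lists of indices into `smoothed`,
    means are the per-group smoothed values.
    """
    if not smoothed:
        return [], []
    groups = [[0]]
    for j in range(1, len(smoothed)):
        if abs(smoothed[j] - smoothed[j - 1]) > tol:
            groups.append([j])      # a large jump starts a new group
        else:
            groups[-1].append(j)    # otherwise extend the current group
    # Smoothed value of each group = mean of its slopes.
    means = [sum(smoothed[j] for j in g) / len(g) for g in groups]
    # Undersized groups borrow an adjacent group's smoothed value
    # (the previous group if there is one, otherwise the next group).
    for idx, g in enumerate(groups):
        if len(g) < min_size and len(groups) > 1:
            neighbour = idx - 1 if idx > 0 else idx + 1
            means[idx] = means[neighbour]
    return groups, means
```

In practice one would substitute a real clustering algorithm, such as one of those listed above, over (ID, smoothed slope) pairs; the gap rule here only keeps the example self-contained.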
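In an exemplary embodiment, after the second indicator data is clustered into groups and each group is fitted to obtain its piecewise function, the following may further be performed: compute the mean of the indicator data of each group to obtain multiple group means, which form a mean set containing the mean corresponding to each piecewise function; for each intersection coordinate in the intersection coordinate set of the piecewise function, determine its left derivative and right derivative, and determine the first mean, in the mean set, of the first piecewise function corresponding to the left derivative and the second mean of the second piecewise function corresponding to the right derivative; and decide, based on the first mean and the second mean, whether to keep the intersection coordinate in the indicator threshold set.
In an exemplary embodiment, whether to keep the intersection coordinate in the indicator threshold set based on the first mean and the second mean may be decided as follows: determine a first absolute distance value between the intersection coordinate and the origin coordinate; determine, from the first absolute distance value and the first mean, first coordinate information corresponding to the first mean, where the first coordinate information represents the value of the independent variable of the first piecewise function; determine, from the first absolute distance value and the second mean, second coordinate information corresponding to the second mean, where the second coordinate information represents the value of the independent variable of the second piecewise function; when the first coordinate information is the same as the second coordinate information, remove the intersection coordinate from the indicator threshold set, since the point is not a turning point; and when the first coordinate information differs from the second coordinate information, keep the intersection coordinate in the indicator threshold set.
In an exemplary embodiment, step S206 (determining the indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator) specifically includes: determining the set of non-differentiable points of the piecewise function and the coordinate points at which the second derivative equals a target value; determining the indicator threshold set from the set of non-differentiable points, the coordinate points at which the second derivative equals the target value, and the intersection coordinate set of the piecewise function; and determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
It should be noted that the target value may be, for example, 0, but is not limited to this.
In an exemplary embodiment, the indicator threshold may be selected from the indicator threshold set according to the indicator bias of the target indicator in either of the following ways. Way 1: when the indicator bias of the target indicator is negative, if the indicator threshold is an alarm threshold, the maximum value in the indicator threshold set is taken as the indicator threshold; if it is a preferred threshold, the minimum value in the set is taken as the indicator threshold.
Way 2: when the indicator bias of the target indicator is positive, if the indicator threshold is an alarm threshold, the minimum value in the indicator threshold set is taken as the indicator threshold; if it is a preferred threshold, the maximum value in the set is taken as the indicator threshold.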
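The two selection rules above reduce to one comparison: take the maximum exactly when "negative bias" and "alarm threshold" are both true or both false. A minimal Python sketch, where the function name and the string labels `"negative"`, `"positive"`, `"alarm"`, and `"preferred"` are hypothetical encodings of the indicator bias and threshold type:

```python
def select_threshold(candidates, bias, threshold_type):
    """Pick the indicator threshold from the candidate set T.

    bias: "negative" (larger values are worse) or "positive" (larger values are better).
    threshold_type: "alarm" (poor performance) or "preferred" (good performance).
    """
    if not candidates:
        raise ValueError("threshold set is empty")
    if bias not in ("negative", "positive") or threshold_type not in ("alarm", "preferred"):
        raise ValueError("unsupported bias or threshold type")
    # Way 1: negative + alarm -> max, negative + preferred -> min.
    # Way 2: positive + alarm -> min, positive + preferred -> max.
    use_max = (bias == "negative") == (threshold_type == "alarm")
    return max(candidates) if use_max else min(candidates)
```

For example, with candidate values [0.3, 0.5, 0.8], a negative-bias alarm threshold such as CPU usage selects 0.8, and a negative-bias preferred threshold selects 0.3.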
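Optionally, in the above embodiments, an alarm threshold can be understood as the threshold at which the indicator data of the target indicator corresponds to poor performance. For example, if the alarm threshold for CPU usage is set to 80%, the CPU is consuming substantial resources and performance is poor at that point. A preferred threshold can be understood as the threshold at which the indicator data of the target indicator corresponds to good performance. For example, if the preferred threshold for network delay is set to 10%, the network delay is small and performance is good at that point. In particular, when there are multiple preferred thresholds, the smallest one is selected as the preferred delay threshold.
The process of determining the indicator threshold in the embodiments of the present disclosure is further described in detail through the following steps.
(1) Model building unit:
First, based on the actual business scenario of the O&M application, determine the monitoring dimensions and indicators, and build the key KPI indicator system {KPI1, KPI2, ..., KPIn} for the O&M scenario. Then, determine the time aggregation granularity according to the real-time requirements of business O&M, aggregate each dimension-KPI data series at that granularity, and build five-tuple data {dimension (the monitoring dimension), object (the monitored object), time granularity (the time aggregation granularity), indicator (the indicator category of the target indicator), data (the initial indicator data under the indicator category)}. The business indicator configuration information must specify the bias of each indicator and its normal range.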
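One way to hold the five-tuple data and the business indicator configuration in code is a pair of dataclasses. This is an illustrative Python sketch; the class and field names are hypothetical, and the normal range is encoded as a (low, high) pair with `None` for an open upper bound.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndicatorRecord:
    """One {dimension, object, time granularity, indicator, data} five-tuple."""
    dimension: str    # monitoring dimension, e.g. "server" or "cell"
    obj: str          # monitored object, e.g. "server A" or "cell 622001"
    granularity: str  # time aggregation granularity, e.g. "1h"
    indicator: str    # indicator category, e.g. "CPU usage rate (%)"
    data: float       # indicator value for one time bucket


@dataclass(frozen=True)
class IndicatorConfig:
    """Business indicator configuration: bias and normal range."""
    indicator: str
    bias: str                                    # "negative" or "positive" (illustrative labels)
    normal_range: Tuple[float, Optional[float]]  # (low, high); None means no upper bound

    def in_range(self, value: float) -> bool:
        """True if the value lies inside the configured normal range."""
        low, high = self.normal_range
        return value >= low and (high is None or value <= high)
```

The two embodiments below map onto this directly: CPU usage rate has the range (0, 100), and TCP average RTT delay has (0, None).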
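(2) Solving the mapping function from discrete sample points:
Step a: build a single-dimension, single-object or multi-object KPI indicator data set (the choice of objects depends on the actual application scenario), and sort the time-series indicator values (in either ascending or descending order) to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, N], combining ID values with KPI indicator values. N is the number of samples in the data set, and the ID value is the sort number of the KPI indicator value, starting from 1 and increasing in steps of 1.
For easier visualization, the two-dimensional sequence can be represented as a two-dimensional discrete point image whose axes are the sample ID and the KPI indicator value. As shown in FIG. 3, each point has the ID as its horizontal coordinate and the KPI indicator value as its vertical coordinate.
Step b: since KPI indicator values may fluctuate widely, the indicator is first standardized to ease subsequent processing. The KPI indicator values are normalized, compressing their range to [0, 1].
Step c: compute the slope between each pair of consecutive points piecewise, k = (y2 - y1)/(x2 - x1), and smooth the slopes with a three-point mean to obtain the smoothed slope k'. Cluster on the smoothed slope k' and the ID value to obtain G groups, compute the mean smoothed slope of each group, and record these means in a set K.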
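Step c's slope computation and three-point mean smoothing can be sketched in Python as follows. The text does not pin down the exact window; this sketch assumes a forward window over three consecutive slopes and, following the boundary rule in Embodiment 1, pads the last two positions with the preceding smoothed value. `smoothed_slopes` is a hypothetical name.

```python
def smoothed_slopes(points):
    """Slopes k = (y2 - y1) / (x2 - x1) between consecutive points, then a
    forward three-point mean k'. Returns one smoothed value per slope."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    if len(slopes) < 3:
        raise ValueError("need at least 4 points for a three-point mean of slopes")
    # Mean of each window of three consecutive slopes.
    smoothed = [sum(slopes[j:j + 3]) / 3 for j in range(len(slopes) - 2)]
    # The last two positions have no full window: reuse the preceding value.
    smoothed += [smoothed[-1]] * 2
    return smoothed
```

For points (1, 0), (2, 3), (3, 6), (4, 9), (5, 21) the raw slopes are 3, 3, 3, 12 and the smoothed sequence is [3.0, 6.0, 6.0, 6.0].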
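Using a clustering algorithm avoids excessive parameter setting while building the model and separates data that differ within the sequence, which makes it easier to locate the turning points later. The present disclosure does not specifically limit the type of clustering algorithm.
Step d: fit a curve to each of the G clustered groups from step c to obtain a piecewise function f(x) consisting of G fitted functions. Curve fitting quickly yields an approximate piecewise function, which provides an effective way to solve for turning points during the subsequent automatic threshold calculation.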
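For step d with linear pieces (the fitting used in Embodiment 1), each group can be fitted by ordinary least squares. A pure-Python sketch, assuming `groups` holds lists of point indices in ID order; the function names and the (x_min, x_max, a, b) piece layout are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b for one group."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_piecewise(points, groups):
    """Fit one line per group of point indices; each piece is (x_min, x_max, a, b)."""
    pieces = []
    for g in groups:
        xs = [points[j][0] for j in g]
        ys = [points[j][1] for j in g]
        a, b = fit_line(xs, ys)
        pieces.append((min(xs), max(xs), a, b))
    return pieces
```

Embodiment 2 also allows quadratic and other curves; this sketch covers only the linear case.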
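(3) Automatic threshold calculation:
Step a: solve for the intersection coordinates of each pair of adjacent fitted functions. Then, over the interval x ∈ [1, N], solve for the set C of non-differentiable points of f(x) and the coordinate points where the second derivative is 0; together with the intersections, these form the threshold set T. For each x ∈ C, compute the left derivative k1' and the right derivative k2', compute the absolute distance from each of k1' and k2' to the values k ∈ K, and assign each derivative to a group accordingly. If the left and right derivatives belong to the same group, the point is not the turning point being sought and is removed from the threshold set T.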
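When every piece is a line y = a*x + b, the intersection of adjacent pieces has a closed form, x* = (b2 - b1)/(a1 - a2), and the left and right derivatives at that point are simply the two slopes a1 and a2. A minimal Python sketch under that linear-piece assumption, with pieces given as (x_min, x_max, a, b) tuples (a hypothetical layout):

```python
def adjacent_intersections(pieces):
    """Intersection of each pair of adjacent fitted lines y = a*x + b.

    Parallel neighbours have no intersection and are skipped. For linear
    pieces, the left/right derivatives at an intersection are a1 and a2.
    """
    points = []
    for (_, _, a1, b1), (_, _, a2, b2) in zip(pieces, pieces[1:]):
        if a1 == a2:
            continue
        x = (b2 - b1) / (a1 - a2)
        points.append((x, a1 * x + b1))
    return points
```

For the pieces y = 2x - 1 and y = x + 2 this returns [(3.0, 5.0)].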
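Step b: using the indicator bias from the business indicator configuration information in step (1), take the maximum or minimum value in the threshold set T as the threshold solution.
The method for determining the indicator threshold is further described below with embodiments that use different O&M scenarios and their corresponding indicator data.
Embodiment 1
Take an IT equipment O&M scenario based on underlying monitoring indicators as an example. During O&M, thresholds often need to be set according to server resource usage; when a threshold is reached, an alarm must be raised and server capacity expansion considered.
In the model building unit, the monitoring dimension of this embodiment is determined to be the server, the objects are server A, server B, and server C, and the indicators are CPU usage rate (%), memory usage rate (%), disk usage rate (%), and network rate (kbps). These indicators form the key KPI indicator system for the server equipment O&M scenario.
According to the real-time requirements of business O&M, the time aggregation granularity is set to 1 hour. Taking the server CPU usage rate (%) as the target indicator, the indicator is aggregated at this granularity to build {dimension, object, time granularity, indicator, data} five-tuple data. The business indicator configuration information specifies that the bias of the CPU usage rate (%) indicator is negative and that its normal range is 0 to 100.
Table 1 shows the aggregated CPU usage rate indicator data for the 24 hours of one day at 1-hour granularity:
Table 1: Indicator data record table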

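Based on the output of the model building unit, the mapping function is solved from the discrete sample points.
First, from the results of the model building unit, server A and server B are selected as objects and their indicator data sets are taken. The indicator data in the set is sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 48], combining ID values with CPU usage rate (%) values, and a two-dimensional discrete point image is built with the sample ID on the horizontal axis and the CPU usage rate (%) on the vertical axis.
Second, the CPU usage rate (%) values are normalized. Min-max scaling can be used to compress the indicator range to [0, 1], with the normalization function x' = (x - x_min)/(x_max - x_min), where x_min and x_max are the minimum and maximum indicator values in the set.
Here, the discrete points of the two-dimensional sequence can be understood as the coordinate points corresponding to the standardized indicator values described above.
Third, the slope between each pair of consecutive points (the coordinate slope described above) is computed piecewise, and the slopes are smoothed with a three-point mean to obtain k'. The last 2 points have no complete window for the three-point mean, so the smoothed slope of the preceding point is used for them. This yields a new sequence {i: k'_i}, i ∈ [1, 48], combining ID values with three-point-mean smoothed slopes. DBSCAN clustering is applied to the values in this sequence, giving 4 groups, and the mean smoothed slope of each group is computed (corresponding to taking, for each group of slope values, the mean of its coordinate slopes as their smoothed value), giving the set K = {0.046, 0.012, 0.034, 0.089}.
Finally, the data of each group from the previous step is fitted with a linear function, giving a piecewise function f(x) with 4 pieces.
Based on the solved mapping function, the threshold is learned automatically.
First, the intersection coordinate set of adjacent fitted functions is solved: X = {(10, 0.3), (30, 0.5), (40, 0.8)}. Then, over the interval x ∈ [1, 48], the set C of non-differentiable points of f(x) and the coordinate points where the second derivative is 0 are solved to form the threshold set T; in this embodiment, both C and T are also {(10, 0.3), (30, 0.5), (40, 0.8)}. For each x ∈ C, the left derivative k1' and the right derivative k2' are computed, and the absolute distance from each to k ∈ K is computed, where k is taken as the mean smoothed slope of the groups corresponding to the piecewise functions on the left and right of the coordinate point; each derivative is assigned to a group according to this distance. For the coordinate point (10, 0.3), k1' and k2' are 0.04 and 0.01, respectively, and the mean smoothed slopes of the groups on its left and right are 0.046 and 0.012, respectively. By absolute distance, k1' and k2' belong to different groups, so the point is not removed from the threshold set T. The other coordinate points are handled in the same way, and the final threshold set T is {(10, 0.3), (30, 0.5), (40, 0.8)}.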
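The group-assignment test for one candidate point can be sketched in Python as follows. As in this embodiment, each one-sided derivative is compared only with the mean smoothed slopes of the two adjacent groups. The function name is hypothetical, and breaking distance ties toward the left group is an assumption.

```python
def is_turning_point(k_left, k_right, mean_left, mean_right):
    """Keep a candidate point only if its left and right derivatives fall into
    different groups, judged by absolute distance to the adjacent group means."""
    def nearest(k):
        # Ties go to the left group (an assumption).
        return "left" if abs(k - mean_left) <= abs(k - mean_right) else "right"
    return nearest(k_left) != nearest(k_right)
```

With the values for (10, 0.3), `is_turning_point(0.04, 0.01, 0.046, 0.012)` returns True: 0.04 is nearer to 0.046 and 0.01 is nearer to 0.012, so the point stays in T.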
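Second, the indicator bias set in the business indicator configuration information of the model building unit is negative, and the usage scenario is to find the CPU usage rate (%) at which an alarm must be raised, that is, the worse threshold. Therefore, the maximum normalized value in the threshold set T, 0.8, is selected as the reference value for generating the threshold, and inverting the normalization gives a CPU usage rate (%) of 90.72, which is the required threshold.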
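Recovering the threshold is the inverse of the min-max scaling used earlier, x = y(x_max - x_min) + x_min. The embodiment does not list the minimum and maximum CPU usage values behind 90.72, so the Python sketch below uses made-up bounds purely for illustration; the function names are hypothetical.

```python
def normalize(x, x_min, x_max):
    """Min-max scaling of an indicator value into [0, 1]."""
    return (x - x_min) / (x_max - x_min)


def denormalize(y, x_min, x_max):
    """Inverse of min-max scaling: map a normalized value back to indicator units."""
    return y * (x_max - x_min) + x_min


# Illustrative bounds only (not from the embodiment): CPU usage observed between 20% and 100%.
threshold = denormalize(0.5, 20.0, 100.0)  # 60.0 in indicator units
```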
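Embodiment 2
Take a business system O&M scenario based on model-built KPI/KQI indicators as an example. In mobile operators' network O&M, attention must be paid to the KQI indicators of cells, which are used to evaluate the user perception of each cell; this evaluation usually requires setting KQI indicator thresholds.
In the model building unit, the monitoring dimension of this embodiment is determined to be the cell, the objects are cell 622001, cell 622002, cell 622003, ..., cell 622099, and the indicators are TCP connection success rate (%), TCP retransmission rate (%), TCP out-of-order rate (%), and TCP average RTT delay (ms).
According to the real-time requirements of business O&M, the time aggregation granularity is set to 1 hour. Taking the cell's TCP average RTT delay (ms) as an example, the indicator is aggregated at this granularity to build {dimension, object, time granularity, indicator, data} five-tuple data. The business indicator configuration information specifies that the bias of the TCP average RTT delay (ms) indicator is negative and that its normal range is greater than or equal to 0.
The aggregated TCP average RTT delay (ms) indicator data for the 24 hours of one day at 1-hour granularity is obtained.
Based on the output of the model building unit, the mapping function is solved from the discrete sample points.
First, from the results of the model building unit, the 50 cells cell 622001, cell 622002, cell 622003, ..., cell 622050 are selected as objects and their indicator data sets are taken. The indicator data in the set is sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 1200], combining ID values with TCP average RTT delay (ms) values, and a two-dimensional discrete point image is built with the sample ID on the horizontal axis and the TCP average RTT delay (ms) on the vertical axis.
Second, the TCP average RTT delay (ms) values are normalized; min-max scaling can be used to compress the indicator range to [0, 1].
Third, the slope between each pair of consecutive points is computed piecewise and smoothed with a three-point mean to obtain k'. The last 2 points have no complete window for the three-point mean, so the smoothed slope of the preceding point is used for them. This yields a new sequence {i: k'_i}, i ∈ [1, 1200], combining ID values with three-point-mean smoothed slopes. Gaussian mixture clustering is applied to the values in this sequence, giving 13 groups, and the mean smoothed slope of each group is computed to obtain the set K.
Finally, the data of each group from the previous step is fitted with linear functions, quadratic functions, and so on, and the best-fitting curve may be chosen as the fitting result, giving a piecewise function f(x) with 13 pieces.
Based on the solved mapping function, the threshold is learned automatically.
First, the intersection coordinate set X of adjacent fitted functions is solved. Then, over the interval x ∈ [1, 1200], the set C of non-differentiable points of f(x) and the coordinate points where the second derivative is 0 are solved to form the threshold set T. For each x ∈ C, the left derivative k1' and the right derivative k2' are computed, and the absolute distance from each to k ∈ K is computed, where k is taken as the mean smoothed slope of the groups corresponding to the piecewise functions on the left and right of the coordinate point; each derivative is assigned to a group according to this distance. If both belong to the same group, the point is removed from the threshold set T.
Second, the indicator bias set in the business indicator configuration information of the model building unit is negative, and the usage scenario is to find the ideal (preferred) threshold. Therefore, the minimum normalized value in the threshold set T, 0.123, is selected as the reference value for generating the threshold, and inverting the normalization gives a TCP average RTT delay (ms) of 2.5, which is the required threshold.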
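Based on the embodiments above, and on the actual distribution of different types of indicators in live networks (for example, most indicators have a "three-segment" profile: a small number of "degraded" samples, the vast majority of "normal" samples, and a small number of "high-quality" samples), the calculation of an indicator threshold can be turned into the problem of solving for features of an image.
In the solving process, the one-sidedness of each indicator is first classified to distinguish positive from negative indicators; the indicator data is then preprocessed and the rate of change of the curve computed; a machine learning algorithm is then used to train a model on the indicator data; and finally, threshold learning is turned into finding turning points according to the indicator's one-sidedness. This generates thresholds for different types of indicators intelligently, at low cost and with high accuracy.
Compared with traditional fixed empirical thresholds or relatively complex dynamic thresholds based on statistical distributions, this approach largely removes the need to set thresholds, or the parameters that replace them, by hand. The technical solution of the present disclosure has better applicability and accuracy and gives mobile operators strong support for O&M support and operational analysis, helping them carry these out more accurately while greatly reducing labor costs.
In addition, the present disclosure relates to the fields of big data and artificial intelligence, and in particular to communication big data and its engineering O&M, such as the Internet and the Internet of Things, where a large number of indicator thresholds must be set individually; examples are mobile operators' O&M support and operational analysis in scenarios such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation. Methods for setting indicator thresholds are in very wide use today, and thresholds are mainly set either as fixed empirical values per indicator or as dynamic values derived from relatively complex statistical distributions. As AI technology develops, models built with AI algorithms offer better applicability. The present disclosure turns the calculation of indicator thresholds into an image-solving problem and combines it with AI algorithms for training and prediction; many AI algorithms can be used in practice, such as neural networks, clustering, and classification. The resulting models are accurate and broadly applicable, provide a basis for precise and intelligent engineering O&M at mobile operators, and point the way to lower labor costs.
The present disclosure targets the O&M domain, especially large systems with complex architectures, including IT equipment O&M based on underlying monitoring indicators and O&M of service systems whose KPI/KQI indicators are built from models. By collecting and cleaning key system indicators, building monitoring-dimension models, and learning indicator thresholds automatically, possible faults or risks in the system are identified, so that network optimization staff can handle or avoid faults in advance.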
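This embodiment also provides an apparatus for determining an indicator threshold. The apparatus implements the embodiments and preferred implementations above, and what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a structural block diagram of an apparatus for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus includes:
an acquisition module 42, configured to acquire the aggregated indicator data corresponding to a target indicator;
a first determination module 44, configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain multiple groups, and fit the indicator data of each of the groups to obtain the piecewise function corresponding to each group, where one indicator data set represents the indicator data of one monitored object;
It should be noted that the algorithm used to cluster the second indicator data may include, without limitation, K-means clustering, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift clustering, and hierarchical clustering.
a second determination module 46, configured to determine the indicator threshold from the set of intersection coordinates of the piecewise function according to the polarity of the target indicator.
With the above apparatus, the aggregated indicator data corresponding to the target indicator is acquired; an indicator data set is determined from it, the first indicator data in the set is sorted to obtain second indicator data, the second indicator data is clustered into multiple groups, and the indicator data of each group is fitted to obtain the piecewise function for each group, where one indicator data set represents the indicator data of one monitored object; and the indicator threshold is determined from the set of intersection coordinates of the piecewise function according to the polarity of the target indicator. This solves the problem of how to determine an indicator threshold.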
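In an exemplary embodiment, the acquisition module 42 is further configured to: determine a preset monitoring dimension, the monitored objects of that dimension, the indicator category of the target indicator, the initial indicator data under that category, and the time aggregation granularity of the target indicator; determine the indicator data to be aggregated from the preset monitoring dimension, its monitored objects, the indicator category, and the initial indicator data under that category; and aggregate the data to be aggregated at the target indicator's time aggregation granularity to obtain the aggregated indicator data for the target indicator.
In an exemplary embodiment, the acquisition module 42 is further configured to: obtain the first time granularity of the data to be aggregated; when the first time granularity is smaller than the time aggregation granularity, obtain the first indicator data of the data to be aggregated within each first time granularity, and then combine multiple first time granularities into the time aggregation granularity; and aggregate the first indicator data within those first time granularities into first aggregated indicator data within the time aggregation granularity, taking the first aggregated indicator data as the aggregated indicator data for the target indicator.
In an exemplary embodiment, the acquisition module 42 is further configured to: when the first time granularity is equal to the time aggregation granularity, obtain the first indicator data of the data to be aggregated within the first time granularity and take it as the aggregated indicator data for the target indicator.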
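The granularity rule of the two embodiments above can be sketched as follows. The sketch assumes that the fine-grained values arrive as a gap-free, time-ordered list, that the aggregation granularity is a whole multiple of the source granularity, and that each window is reduced with the arithmetic mean (the source does not name the aggregate). A trailing partial window is dropped, and a source granularity coarser than the target is rejected because the text does not cover that case. The function name is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def align_to_granularity(
    values: Sequence[float], source_minutes: int, target_minutes: int
) -> List[float]:
    """Return indicator data at the target (aggregation) granularity."""
    if source_minutes == target_minutes:
        return list(values)  # equal granularity: use the data as is
    if source_minutes > target_minutes:
        raise ValueError("source data is coarser than the aggregation granularity")
    if target_minutes % source_minutes:
        raise ValueError("aggregation granularity must be a multiple of the source granularity")
    per_window = target_minutes // source_minutes
    # Finer source granularity: combine per_window consecutive values into one window.
    return [
        mean(values[i:i + per_window])
        for i in range(0, len(values) - per_window + 1, per_window)
    ]
```

For instance, seven 15-minute values aggregated to 30 minutes yield three window means, and the seventh value, which has no partner, is discarded.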
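In an exemplary embodiment, the acquisition module 42 is further configured to, before the second indicator data is clustered: standardize the second indicator data to obtain multiple standardized indicator values, each of which has a rank; for each standardized value, take its rank as the abscissa and the value itself as the ordinate to obtain its coordinate point; determine the coordinate slope between each pair of adjacent points to obtain multiple coordinate slopes, and determine a smoothed value for each of them to obtain multiple smoothed values; and determine third indicator data from the smoothed values and take it as the updated second indicator data.
It should be noted that the standardization may include normalization; for example, the second indicator data may be normalized to compress its range into [0, 1], which standardizes the data and improves processing efficiency.
In an exemplary embodiment, the acquisition module 42 is further configured to, when determining the smoothed value of each coordinate slope: cluster the coordinate slopes with a preset clustering algorithm to obtain multiple groups of slope values; and, for each group, take the mean of the coordinate slopes in the group as the smoothed value of those slopes.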
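The group-mean smoothing can be illustrated with a deliberately simple one-dimensional clustering: sort the slopes and start a new group wherever the gap between neighbours exceeds `eps`. In one dimension this produces the same groups as DBSCAN with `min_samples=1`; it stands in here for whichever of the listed algorithms is actually used, and `eps` and the function names are hypothetical. Each slope is then replaced by its group mean, and the per-group means form the set K.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def gap_clusters(values: Sequence[float], eps: float) -> List[int]:
    """Label values so that sorted neighbours at most eps apart share a group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    label = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > eps:
            label += 1  # gap too wide: start a new group
        labels[cur] = label
    return labels


def group_mean_smoothing(
    slopes: Sequence[float], labels: Sequence[int]
) -> Tuple[List[float], Dict[int, float]]:
    """Replace each slope by the mean of its group; also return the group means (set K)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for slope, label in zip(slopes, labels):
        sums[label] += slope
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    return [means[label] for label in labels], means
```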
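The preset clustering algorithm may include K-means clustering, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift clustering, hierarchical clustering, and so on; the present disclosure does not limit this.
In an exemplary embodiment, the acquisition module 42 is further configured to: when a target group of slope values exists among the groups, take the smoothed value of the slopes in a neighboring group adjacent to the target group as the smoothed value of the slopes in the target group, or determine that smoothed value from a preset smoothed value, where the number of coordinate slopes in the target group differs from the number of coordinate slopes in each of the groups.
In an exemplary embodiment, the first determination module 44 is further configured to: obtain the mean of the indicator data of each group and form a mean set from these means, the mean set containing the mean for each segment of the piecewise function; determine each intersection coordinate in the set of intersection coordinates of the piecewise function, determine its left and right derivatives, and determine the first mean, in the mean set, of the first segment to which the left derivative corresponds and the second mean, in the mean set, of the second segment to which the right derivative corresponds; and decide, based on the first mean and the second mean, whether to keep the intersection coordinate in the indicator threshold set.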
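In an exemplary embodiment, the first determination module 44 is further configured to: determine a first absolute distance value between the intersection coordinate and the origin; determine, from the first absolute distance value and the first mean, first coordinate information corresponding to the first mean, the first coordinate information representing a value of the independent variable of the first segment; determine, from the first absolute distance value and the second mean, second coordinate information corresponding to the second mean, the second coordinate information representing a value of the independent variable of the second segment; when the first coordinate information is the same as the second coordinate information, remove the intersection coordinate from the indicator threshold set; and when they differ, keep the intersection coordinate in the indicator threshold set.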
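In an exemplary embodiment, the second determination module 46 is further configured to: determine the set of non-differentiable points of the piecewise function and the coordinate points at which its second derivative equals a target value; determine the indicator threshold set from the non-differentiable points, those coordinate points, and the set of intersection coordinates of the piecewise function; and determine the indicator threshold from the indicator threshold set according to the polarity of the target indicator.
In an exemplary embodiment, the second determination module 46 is further configured to: when the polarity of the target indicator is negative, take the maximum value in the indicator threshold set as the indicator threshold if the threshold type is an alarm threshold, and take the minimum value if the threshold type is an optimal threshold.
In an exemplary embodiment, the second determination module 46 is further configured to: when the polarity of the target indicator is positive, take the minimum value in the indicator threshold set as the indicator threshold if the threshold type is an alarm threshold, and take the maximum value if the threshold type is an optimal threshold.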
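The two polarity cases reduce to a single rule: take the maximum when a negative indicator needs an alarm threshold or a positive indicator needs an optimal threshold, and the minimum otherwise. A sketch with hypothetical string labels for polarity and threshold type; applied to the ordinates 0.3, 0.5, and 0.8 of the CPU usage example above, with negative polarity and an alarm threshold, it returns 0.8.

```python
from typing import Iterable


def select_threshold(candidates: Iterable[float], polarity: str, kind: str) -> float:
    """Pick a threshold from the candidate set by indicator polarity and threshold type.

    polarity: "negative" or "positive"; kind: "alarm" or "optimal".
    """
    values = list(candidates)
    if not values:
        raise ValueError("empty threshold set")
    if polarity not in ("negative", "positive") or kind not in ("alarm", "optimal"):
        raise ValueError("unknown polarity or threshold type")
    # negative + alarm -> max, negative + optimal -> min,
    # positive + alarm -> min, positive + optimal -> max
    take_max = (polarity == "negative") == (kind == "alarm")
    return max(values) if take_max else min(values)


if __name__ == "__main__":
    print(select_threshold([0.3, 0.5, 0.8], "negative", "alarm"))  # 0.8
```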
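Furthermore, the present disclosure proposes an intelligent threshold-learning O&M apparatus based on curve-image computation. It addresses the core substitution problem in threshold self-learning (that is, an automatic threshold-learning process that merely turns into setting the limit of another threshold) and achieves truly automatic threshold identification and O&M without manual intervention. By collecting, cleaning, and modeling big-data O&M indicators, indicator data of the monitored objects is obtained at the granularity of the actual use case; the modeled data is presented graphically as a sequence, turning the distribution into a curve image; and by solving for the turning points of the distribution and combining them with the indicator's actual service characteristics, indicator-threshold self-learning is achieved.
A first aspect of the present disclosure provides a model construction unit based on collected indicators, configured to perform data cleaning, specific model description, and description of core service indicator configuration items.
A second aspect of the present disclosure provides a method of solving a mapping function from discrete sample points, comprising: converting the time-series data produced by model construction into the image representation needed for threshold solving in the present disclosure (no image is actually drawn; rather, the converted data sequence expresses the characteristics of the image); and obtaining, through clustering and curve-fitting algorithms, the image's mapping function over the sample sequence.
A third aspect of the present disclosure provides an automatic threshold-learning computation method based on curve-image computation, comprising: solving for the rate of change of the slope of the function curve obtained in the steps above and combining it with the indicator's service characteristics to obtain an intelligent threshold-identification result.
A fourth aspect of the present disclosure provides an automatic threshold-learning apparatus based on curve-image computation, comprising: a real-time data aggregation module, configured to perform real-time data cleaning and indicator aggregation on the key KPIs of every entity node of the multi-dimensional system in the O&M system; and an intelligent threshold-identification module, configured to perform the methods in the steps above.
A fifth aspect of the present disclosure provides an electronic device comprising a computer processor and a memory, where the computer memory is configured to store a computer program;
and the processor is configured to implement the functions of the model construction unit of the first aspect, and to perform the method of solving a mapping function from discrete sample points of the second aspect and the automatic threshold-learning computation method based on curve-image computation of the third aspect.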
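From the description of the embodiments above, a person skilled in the art can clearly understand that the methods of the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (a mobile phone, computer, server, network device, or the like) to perform the methods of the embodiments of the present disclosure.
In an exemplary embodiment, the computer-readable storage medium may include, without limitation, any medium that can store a computer program, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
For specific examples in this embodiment, refer to the examples described in the embodiments and exemplary implementations above; they are not repeated here.
An embodiment of the present disclosure further provides an electronic apparatus including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps of any of the method embodiments above.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S1: acquire the aggregated indicator data corresponding to the target indicator;
S2: determine an indicator data set from the aggregated indicator data, sort the first indicator data in the set to obtain second indicator data, cluster the second indicator data into multiple groups, and fit the indicator data of each group to obtain the piecewise function corresponding to each group, where one indicator data set represents the indicator data of one monitored object;
S3: determine the indicator threshold from the set of intersection coordinates of the piecewise function according to the polarity of the target indicator.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, both connected to the processor.
For specific examples in this embodiment, refer to the examples described in the embodiments and exemplary implementations above; they are not repeated here.
A person skilled in the art should understand that the modules or steps of the present disclosure described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network of multiple computing devices; they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from that given here; and they may be made into individual integrated-circuit modules, or several of the modules or steps may be made into a single integrated-circuit module. Thus, the present disclosure is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; those skilled in the art may make various modifications and changes to it. Any modification, equivalent replacement, improvement, or the like made within the principles of the present disclosure shall fall within its scope of protection.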

Claims (15)

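  1. A method for determining an indicator threshold, comprising:
    acquiring aggregated indicator data corresponding to a target indicator;
    determining an indicator data set from the aggregated indicator data, sorting first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein one indicator data set represents the indicator data of one monitored object;
    determining an indicator threshold from a set of intersection coordinates of the piecewise function according to the polarity of the target indicator.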
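  2. The method for determining an indicator threshold according to claim 1, wherein acquiring the aggregated indicator data corresponding to the target indicator comprises:
    determining a preset monitoring dimension, monitored objects of the monitoring dimension, an indicator category of the target indicator, initial indicator data under the indicator category, and a time aggregation granularity corresponding to the target indicator;
    determining indicator data to be aggregated according to the preset monitoring dimension, the monitored objects of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category;
    aggregating the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.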
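  3. The method for determining an indicator threshold according to claim 2, wherein aggregating the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator comprises:
    obtaining a first time granularity of the indicator data to be aggregated;
    in a case where the first time granularity is determined to be smaller than the time aggregation granularity, obtaining first indicator data of the indicator data to be aggregated within the first time granularity, and then aggregating a plurality of first time granularities into the time aggregation granularity;
    aggregating a plurality of first indicator data within the plurality of first time granularities into first aggregated indicator data within the time aggregation granularity, and determining the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.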
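  4. The method for determining an indicator threshold according to claim 3, wherein the method further comprises:
    in a case where the first time granularity is determined to be equal to the time aggregation granularity, obtaining first indicator data of the indicator data to be aggregated within the first time granularity, and determining the first indicator data as the aggregated indicator data corresponding to the target indicator.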
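  5. The method for determining an indicator threshold according to claim 1, wherein before clustering the second indicator data, the method further comprises:
    standardizing the second indicator data to obtain a plurality of standardized indicator values, wherein each standardized indicator value has a corresponding rank;
    for each standardized indicator value, determining the rank corresponding to the standardized indicator value as the abscissa and the standardized indicator value as the ordinate, to obtain a coordinate point corresponding to the standardized indicator value;
    determining the coordinate slope between each pair of adjacent coordinate points to obtain a plurality of coordinate slopes, and, for each of the plurality of coordinate slopes, determining a smoothed value of the coordinate slope to obtain a plurality of smoothed values;
    determining third indicator data according to the plurality of smoothed values, and determining the third indicator data as the updated second indicator data.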
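  6. The method for determining an indicator threshold according to claim 5, wherein determining, for each of the plurality of coordinate slopes, the smoothed value of the coordinate slope to obtain the plurality of smoothed values comprises:
    clustering the plurality of coordinate slopes according to a preset clustering algorithm to obtain a plurality of groups of slope values;
    for each group of slope values, determining the mean of the coordinate slopes in the group as the smoothed value of the coordinate slopes in the group.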
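  7. The method for determining an indicator threshold according to claim 6, wherein the method further comprises:
    in a case where a target group of slope values is determined to exist among the plurality of groups of slope values, determining the smoothed value of the coordinate slopes in a neighboring group adjacent to the target group as the smoothed value of the coordinate slopes in the target group, or determining the smoothed value of the coordinate slopes in the target group according to a preset smoothed value, wherein the number of coordinate slopes in the target group differs from the number of coordinate slopes in each group of slope values.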
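  8. The method for determining an indicator threshold according to claim 1, wherein after clustering the second indicator data to obtain the plurality of groups and fitting the indicator data of each of the plurality of groups to obtain the piecewise function corresponding to each group, the method further comprises:
    obtaining a plurality of group means from the mean of the indicator data of each group, and determining a mean set according to the plurality of group means, wherein the mean set comprises the mean corresponding to each segment of the piecewise function;
    determining an intersection coordinate in the set of intersection coordinates of the piecewise function, determining a left derivative and a right derivative corresponding to the intersection coordinate, and determining a first mean, in the mean set, of a first segment corresponding to the left derivative and a second mean, in the mean set, of a second segment corresponding to the right derivative;
    determining, based on the first mean and the second mean, whether to keep the intersection coordinate in the indicator threshold set.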
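  9. The method for determining an indicator threshold according to claim 8, wherein determining, based on the first mean and the second mean, whether to keep the intersection coordinate in the indicator threshold set comprises:
    determining a first absolute distance value between the intersection coordinate and the origin;
    determining, according to the first absolute distance value and the first mean, first coordinate information corresponding to the first mean, wherein the first coordinate information represents a value of the independent variable of the first segment;
    determining, according to the first absolute distance value and the second mean, second coordinate information corresponding to the second mean, wherein the second coordinate information represents a value of the independent variable of the second segment;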
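    in a case where the first coordinate information is determined to be the same as the second coordinate information, removing the intersection coordinate from the indicator threshold set;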
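    in a case where the first coordinate information is determined to be different from the second coordinate information, keeping the intersection coordinate in the indicator threshold set.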
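  10. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the set of intersection coordinates of the piecewise function according to the polarity of the target indicator comprises:
    determining a set of non-differentiable points of the piecewise function and coordinate points at which the second derivative equals a target value;
    determining the indicator threshold set based on the set of non-differentiable points, the coordinate points at which the second derivative equals the target value, and the set of intersection coordinates of the piecewise function;
    determining the indicator threshold from the indicator threshold set according to the polarity of the target indicator.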
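  11. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the indicator threshold set according to the polarity of the target indicator comprises:
    in a case where the polarity of the target indicator is determined to be negative: if the type of the indicator threshold is determined to be an alarm threshold, determining the maximum value in the indicator threshold set as the indicator threshold;
    if the type of the indicator threshold is determined to be an optimal threshold, determining the minimum value in the indicator threshold set as the indicator threshold.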
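  12. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the indicator threshold set according to the polarity of the target indicator comprises:
    in a case where the polarity of the target indicator is determined to be positive: if the type of the indicator threshold is determined to be an alarm threshold, determining the minimum value in the indicator threshold set as the indicator threshold;
    if the type of the indicator threshold is determined to be an optimal threshold, determining the maximum value in the indicator threshold set as the indicator threshold.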
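  13. An apparatus for determining an indicator threshold, comprising:
    an acquisition module, configured to acquire aggregated indicator data corresponding to a target indicator;
    a first determination module, configured to determine an indicator data set from the aggregated indicator data, sort first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein one indicator data set represents the indicator data of one monitored object;
    a second determination module, configured to determine an indicator threshold from a set of intersection coordinates of the piecewise function according to the polarity of the target indicator.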
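  14. A computer-readable storage medium storing a computer program, wherein the computer program is configured to perform, when run, the method according to any one of claims 1 to 12.
  15. An electronic apparatus, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to perform, through the computer program, the method according to any one of claims 1 to 12.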
PCT/CN2023/110331 2022-09-30 2023-07-31 Method and apparatus for determining indicator threshold, storage medium, and electronic apparatus WO2024066720A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211225305.3 2022-09-30
CN202211225305.3A CN117875746A (zh) 2022-09-30 2022-09-30 Method and apparatus for determining indicator threshold, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
WO2024066720A1 true WO2024066720A1 (zh) 2024-04-04

Family

ID=90475973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110331 WO2024066720A1 (zh) 2022-09-30 2023-07-31 Method and apparatus for determining indicator threshold, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN117875746A (zh)
WO (1) WO2024066720A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118051800A (zh) * 2024-04-16 2024-05-17 山东峰华智能科技有限公司 Intelligent processing method for in-mine monitoring data based on cloud-edge collaboration

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127455A1 (en) * 2013-11-06 2015-05-07 Globys, Inc. Automated entity classification using usage histograms & ensembles
US20150127454A1 (en) * 2013-11-06 2015-05-07 Globys, Inc. Automated entity classification using usage histograms & ensembles
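CN110489306A (zh) * 2019-08-26 2019-11-22 北京博睿宏远数据科技股份有限公司 Alarm threshold determination method and apparatus, computer device, and storage medium
CN111985815A (zh) * 2020-08-21 2020-11-24 国网能源研究院有限公司 Method and apparatus for screening energy and power operation evaluation indicators
CN114780371A (zh) * 2022-05-10 2022-07-22 平安科技(深圳)有限公司 Load-test indicator analysis method, apparatus, device, and medium based on multi-curve fitting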

Also Published As

Publication number Publication date
CN117875746A (zh) 2024-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23869966

Country of ref document: EP

Kind code of ref document: A1