WO2022160675A1 - Root cause determination method and device - Google Patents

Root cause determination method and device

Info

Publication number
WO2022160675A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimension
combination
target
factor
abnormal
Prior art date
Application number
PCT/CN2021/113331
Other languages
English (en)
French (fr)
Inventor
黄宗怡
吴曙楠
王方舟
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022160675A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a root cause determination method and device.
  • The prior-art multi-dimensional cross solution for finding abnormal factors requires the user to input a hyperparameter that controls the final result, and the final result is extremely sensitive to this hyperparameter. In particular, when the hyperparameter is set too low, the solution may satisfy its stopping condition early during execution and stop searching for the root cause, so the returned root cause is likely to be single-dimensional rather than cross-dimensional.
  • the reason for multi-dimensional cross root cause analysis is to find more fine-grained root factors, and only returning single-dimensional root causes is obviously not in line with expectations.
  • Moreover, a hyperparameter selected by experience may not return the expected root cause and cannot be generalized: different hyperparameters need to be preset for different business scenarios. As a result, the prior-art multi-dimensional cross search for abnormal factors cannot accommodate various business scenarios.
  • Embodiments of the present disclosure provide a root cause determination method and device.
  • a method for determining a root cause comprising:
  • acquiring abnormal data, where the abnormal data includes a dimension and a factor included in the dimension;
  • constructing a root cause search tree according to the dimension, where the root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located;
  • acquiring a first dimension factor combination associated with a first dimension node in the root cause search tree, where the first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree;
  • calculating a first possibility parameter that an abnormal factor combination exists in the first dimension factor combination;
  • increasing a first threshold in response to the first possibility parameter being greater than the first threshold and the number of the first dimensions being not greater than a second threshold;
  • taking any dimension node in the root cause search tree as the first dimension node and using the increased first threshold as a new first threshold, repeating the process of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination and increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold, and then determining an abnormal factor combination from the first dimension factor combination.
  • the method further includes:
  • the method further includes:
  • in response to the recalculated first possibility parameter being greater than the new first threshold and the number of the first dimensions being not greater than the second threshold, an abnormal factor combination is determined from the first dimension factor combination.
  • the method further includes:
  • the method further includes:
  • determining an abnormal target combination, where a target combination is a factor combination associated with the dimension node of the Nth layer, and N is the number of dimensions of the abnormal data;
  • the calculating of the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination includes:
  • calculating the target proportion of each first dimension factor combination, where the target proportion of the i-th first dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first dimension factor combination, i is a positive integer, i ∈ [1, M], M ∈ [1, N], and M is the number of first dimension factor combinations associated with the first dimension node;
  • calculating, according to the target proportions of the first dimension factor combinations, the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination.
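The target proportion described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: factor combinations and target combinations are represented as tuples of factor names, a target combination counts as "associated" when it contains every factor of the factor combination, and the function name `target_proportion` is hypothetical rather than taken from the disclosure.

```python
def target_proportion(factor_combo, target_combos, abnormal_targets):
    """Share of abnormal target combinations among the target combinations
    associated with `factor_combo` (assumed representation: tuples of factor names)."""
    # A target combination is treated as associated when it contains every factor.
    associated = [t for t in target_combos if set(factor_combo) <= set(t)]
    if not associated:
        return 0.0  # assumption: no associated targets means proportion 0
    abnormal_count = sum(1 for t in associated if t in abnormal_targets)
    return abnormal_count / len(associated)


# Two-dimensional toy data: dimensions A = {a1, a2}, B = {b1, b2}.
targets = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
abnormal = {("a1", "b1"), ("a1", "b2")}

print(target_proportion(("a1",), targets, abnormal))  # 1.0
print(target_proportion(("b1",), targets, abnormal))  # 0.5
```

In this toy data every target combination containing a1 is abnormal, so a1 has the highest target proportion among the single-dimension factor combinations.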
  • the method further includes:
  • the preset condition includes that the change of a target object does not match the abnormal direction of the target indicator, where the target object is the first-type indicator values of the target combination collected at different times, the target indicator is the business indicator associated with the dimension, and a first-type indicator value is a value of the target indicator;
  • the determining the abnormal target combination includes:
  • An abnormal target combination in the first remaining target combination is determined.
  • the determining an abnormal target combination includes:
  • target combinations whose offsets are greater than the first target offset are determined to be abnormal target combinations, where the first target offset is the abscissa of the first inflection point in the first offset distribution graph.
  • the determining the first inflection point in the first offset distribution curve includes:
  • the sensitive parameter S in the inflection point detection algorithm based on the elbow rule is calculated, where L is the total number of target combinations involved in the first offset distribution graph, and m and n are preset constants;
  • the first inflection point in the first offset distribution curve graph is determined.
  • the determining an abnormal target combination further includes:
  • the target combinations are sorted in ascending order of offset to obtain a first sorting;
  • a second offset distribution graph is drawn, where the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
  • a target combination whose offset is greater than the second target offset is determined as an abnormal target combination, where the second target offset is the abscissa of the second inflection point in the second offset distribution graph.
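The offset distribution graph and the selection of abnormal target combinations can be sketched as follows. This is a hedged illustration, not the disclosed implementation: the inflection point detection itself (which depends on the sensitive parameter S, whose formula is not reproduced in this text) is left out, so the target offset is passed in as an input; the names `offset_distribution` and `abnormal_by_offset` are hypothetical; and the check against the fifth threshold follows the apparatus description elsewhere in this document.

```python
from bisect import bisect_left


def offset_distribution(offsets):
    """Points (x, y) of the curve: x is an offset, y is the number of
    target combinations whose offset is strictly less than x."""
    xs = sorted(offsets)
    return [(x, bisect_left(xs, x)) for x in xs]


def abnormal_by_offset(offsets, target_offset, fifth_threshold):
    """Indices of target combinations whose offset exceeds `target_offset`,
    or None when their share is greater than `fifth_threshold`."""
    picked = [i for i, o in enumerate(offsets) if o > target_offset]
    if len(picked) / len(offsets) > fifth_threshold:
        return None  # too many candidates: the caller would trim and redraw
    return picked


offsets = [0.1, 0.2, 0.2, 0.9]
print(offset_distribution(offsets))
print(abnormal_by_offset(offsets, target_offset=0.5, fifth_threshold=0.5))
```

Returning `None` models the branch in which the proportion exceeds the fifth threshold and a second offset distribution graph has to be drawn from the remaining target combinations.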
  • the calculating, according to the target proportion of the first dimension factor combination, a first possibility parameter of an abnormal factor combination in the first dimension factor combination includes:
  • the target combination associated with the candidate factor combination includes the factors in the candidate factor combination
  • the second preset number is greater than the third preset number.
  • the sorting of the combination of factors to be processed to obtain a third sorting includes:
  • the first parameter of each to-be-processed factor combination is calculated, and the to-be-processed factor combinations are sorted in descending order of the first parameter to obtain a third sorting, where the target indicator is the business indicator associated with the dimension, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
  • the target indicator is a derived indicator
  • a first value of each first target combination is obtained, where a first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, a second-type indicator value is a value of a first indicator, and the first indicator is the indicator used as the numerator in calculating the target indicator when the target indicator is a derived indicator;
  • the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, a third-type indicator value is a value of a second indicator, and the second indicator is the indicator used as the denominator in calculating the target indicator when the target indicator is a derived indicator;
  • the determining an abnormal factor combination from the first dimension factor combination includes:
  • the candidate factor combination is determined as the abnormal factor combination.
  • calculating a first possibility parameter of an abnormal factor combination in the first dimension factor combination according to the target combination associated with the candidate factor combination includes:
  • a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type indicator values of the first abnormal target combination at the second moment, f(Z) represents the sum of the first-type indicator values of the first target combination at the second moment, v(Z) represents the sum of the first-type indicator values of the first target combination at the first moment, a first-type indicator value is a value of the target indicator, the target indicator is the business indicator associated with the dimension, and the first moment is earlier than the second moment;
  • according to avg1, avg2, avg3, and the third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination is calculated;
  • GPS represents the first possibility parameter.
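The two formulas above can be evaluated with a short Python sketch. It assumes the first formula reads a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), that is, the fraction multiplies the bracketed difference; that reading is an interpretation of the text, not something this section states explicitly. How avg1, avg2, and avg3 are obtained is not described here, so they are plain inputs, and the function names are hypothetical.

```python
def expected_value(f_z1, f_z, v_z):
    """a(Z1) = f(Z1) - (f(Z1) / f(Z)) * (f(Z) - v(Z)), under the assumed reading."""
    if f_z == 0:
        raise ValueError("f(Z) must be non-zero")
    return f_z1 - (f_z1 / f_z) * (f_z - v_z)


def gps(avg1, avg2, avg3):
    """GPS = 1 - (avg3 + avg2) / (avg1 + avg2), the first possibility parameter."""
    if avg1 + avg2 == 0:
        raise ValueError("avg1 + avg2 must be non-zero")
    return 1 - (avg3 + avg2) / (avg1 + avg2)


print(expected_value(f_z1=50, f_z=100, v_z=80))  # 40.0
print(gps(avg1=10, avg2=2, avg3=1))              # 0.75
```

Under this reading a(Z1) simplifies algebraically to f(Z1) × v(Z)/f(Z): the overall change is distributed over the member combinations in proportion to their own values.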
  • the method further includes:
  • acquiring a second target combination associated with the abnormal factor combination, where a target combination is a factor combination associated with the dimension node of the Nth layer, N is the number of dimensions of the abnormal data, and the first target combination includes the factors in the abnormal factor combination;
  • acquiring a fourth numerical value, where the fourth numerical value is the sum of the first-type indicator values of the second target combination at a third moment, a first-type indicator value is a value of the target indicator, and the target indicator is the business indicator associated with the dimension;
  • acquiring a fifth numerical value, where the fifth numerical value is the sum of the first-type indicator values of all the target combinations at the third moment;
  • in response to the ratio of the fourth numerical value to the fifth numerical value being less than a third threshold, performing a preset prompt operation, where the preset prompt operation is used to prompt that the abnormal factor is not in the abnormal data.
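The verification step can be sketched in a few lines of Python. This is an illustrative sketch only: the inputs are assumed to be lists of first-type indicator values at the third moment, the function name `needs_prompt` is hypothetical, and the threshold value in the example is made up.

```python
def needs_prompt(second_target_values, all_target_values, third_threshold):
    """True when (fourth value / fifth value) < third threshold, i.e. the
    found root cause covers too little of the indicator to be trusted."""
    fourth_value = sum(second_target_values)  # targets tied to the abnormal factor combination
    fifth_value = sum(all_target_values)      # all target combinations
    if fifth_value == 0:
        raise ValueError("sum over all target combinations must be non-zero")
    return fourth_value / fifth_value < third_threshold


all_values = [5, 5, 40, 50]
print(needs_prompt([5, 5], all_values, third_threshold=0.2))    # True  (ratio 0.1)
print(needs_prompt([40, 50], all_values, third_threshold=0.2))  # False (ratio 0.9)
```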
  • a root cause determination device comprising:
  • a data acquisition module configured to acquire abnormal data, the abnormal data includes a dimension and a factor included in the dimension;
  • a building module configured to construct a root cause search tree according to the dimension, where the root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located;
  • a first factor combination acquisition module configured to acquire the first dimension factor combination associated with a first dimension node in the root cause search tree, where the first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree;
  • a first possibility parameter calculation module configured to calculate a first possibility parameter of an abnormal factor combination in the first dimension factor combination
  • a threshold increasing module configured to increase the first threshold in response to the first likelihood parameter being greater than a first threshold and the number of the first dimensions being not greater than a second threshold;
  • an execution module configured to take any dimension node in the root cause search tree as the first dimension node, use the increased first threshold as a new first threshold, and repeat the process of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination and increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold, and then determine an abnormal factor combination from the first dimension factor combination.
  • the apparatus further includes:
  • a first determination module configured to determine an abnormal factor combination from the first dimension factor combination in response to the first possibility parameter being greater than the first threshold and the number of the first dimensions being greater than the second threshold.
  • the apparatus further includes:
  • a second determination module configured to, on the basis of the increased first threshold, determine the abnormal factor combination from the first dimension factor combination in response to the recalculated first possibility parameter being greater than the new first threshold and the number of the first dimensions being not greater than the second threshold.
  • the apparatus further includes:
  • a second factor combination acquiring module configured to acquire, in response to the first possibility parameter being not greater than the first threshold, a second dimension factor combination associated with a second dimension node, where the second dimension node is associated with at least one second dimension, and the second dimension factor combination includes one factor of each of the second dimensions;
  • the second possibility parameter calculation module is configured to calculate a second possibility parameter that an abnormal factor combination exists in the second dimension factor combination.
  • the apparatus further includes:
  • an abnormal target combination determination module configured to determine an abnormal target combination, the target combination is a factor combination associated with the Nth layer dimension node, and N is the dimension number of the abnormal data;
  • the first possibility parameter calculation module includes:
  • a proportion calculation sub-module configured to calculate the target proportion of each first dimension factor combination, where the target proportion of the i-th first dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first dimension factor combination, i is a positive integer, i ∈ [1, M], M ∈ [1, N], and M is the number of first dimension factor combinations associated with the first dimension node;
  • the possibility parameter calculation sub-module is configured to calculate a first possibility parameter of an abnormal factor combination in the first dimension factor combination according to the target proportion of the first dimension factor combination.
  • the apparatus further includes:
  • a deletion module configured to delete the target combination that meets the preset condition to obtain the first remaining target combination
  • the preset condition includes that the change of a target object does not match the abnormal direction of the target indicator, where the target object is the first-type indicator values of the target combination collected at different times, the target indicator is the business indicator associated with the dimension, and a first-type indicator value is a value of the target indicator;
  • when the abnormal target combination determination module determines the abnormal target combination, it is specifically configured to:
  • An abnormal target combination in the first remaining target combination is determined.
  • the abnormal target combination determination module includes:
  • an offset obtaining submodule configured to obtain the offset of the target combination
  • a first drawing submodule configured to draw a first offset distribution graph, where the horizontal axis of the first offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
  • a first inflection point determination submodule configured to determine a first inflection point in the first offset distribution graph
  • a first abnormal target combination determination sub-module configured to, if the proportion of target combinations whose offset is greater than the first target offset among all the target combinations is not greater than a fifth threshold, determine the target combinations whose offset is greater than the first target offset as abnormal target combinations, where the first target offset is the abscissa of the first inflection point in the first offset distribution graph.
  • the first inflection point determination sub-module is specifically configured as:
  • the sensitive parameter S in the inflection point detection algorithm based on the elbow rule is calculated, where L is the total number of target combinations involved in the first offset distribution graph, and m and n are preset constants;
  • the first inflection point in the first offset distribution curve graph is determined.
  • the abnormal target combination determination module further includes:
  • a sorting sub-module configured to, if the proportion of target combinations whose offset is greater than the first target offset among all the target combinations is greater than the fifth threshold, sort the target combinations in ascending order of offset to obtain a first sorting;
  • a deletion submodule configured to remove the first preset number of target combinations in the first sorting to obtain a second remaining target combination
  • a second drawing sub-module configured to draw a second offset distribution graph according to the offsets of the second remaining target combination obtained this time, where the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
  • a second inflection point determination sub-module configured to determine a second inflection point in the second offset distribution graph
  • a second abnormal target combination determination sub-module configured to, if the proportion of target combinations whose offset is greater than the second target offset in the second remaining target combination obtained this time is not greater than the fifth threshold, determine the target combinations whose offset is greater than the second target offset as abnormal target combinations, where the second target offset is the abscissa of the second inflection point in the second offset distribution graph.
  • the likelihood parameter calculation sub-module is specifically configured as:
  • the target combination associated with the candidate factor combination includes the factors in the candidate factor combination
  • the second preset number is greater than the third preset number.
  • the possibility parameter calculation sub-module is specifically configured as:
  • the first parameter of each to-be-processed factor combination is calculated, and the to-be-processed factor combinations are sorted in descending order of the first parameter to obtain a third sorting, where the target indicator is the business indicator associated with the dimension, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
  • the target indicator is a derived indicator
  • a first value of each first target combination is obtained, where a first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, a second-type indicator value is a value of a first indicator, and the first indicator is the indicator used as the numerator in calculating the target indicator when the target indicator is a derived indicator;
  • the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, a third-type indicator value is a value of a second indicator, and the second indicator is the indicator used as the denominator in calculating the target indicator when the target indicator is a derived indicator;
  • the execution module determines the abnormal factor combination from the first dimension factor combination, it is specifically configured to:
  • the candidate factor combination is determined as the abnormal factor combination.
  • when the possibility parameter calculation submodule calculates the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination according to the target combination associated with the candidate factor combination, it is specifically configured to use:
  • a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type indicator values of the first abnormal target combination at the second moment, f(Z) represents the sum of the first-type indicator values of the first target combination at the second moment, v(Z) represents the sum of the first-type indicator values of the first target combination at the first moment, a first-type indicator value is a value of the target indicator, the target indicator is the business indicator associated with the dimension, and the first moment is earlier than the second moment;
  • according to avg1, avg2, avg3, and the third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination is calculated;
  • GPS represents the first possibility parameter.
  • the apparatus further includes:
  • a first verification parameter acquisition module configured to acquire a second target combination associated with the abnormal factor combination, where a target combination is a factor combination associated with the dimension node of the Nth layer, N is the number of dimensions of the abnormal data, and the first target combination includes the factors in the abnormal factor combination;
  • a second verification parameter obtaining module configured to obtain a fourth numerical value, where the fourth numerical value is the sum of the first-type indicator values of the second target combination at a third moment, a first-type indicator value is a value of the target indicator, and the target indicator is the business indicator associated with the dimension
  • a third verification parameter obtaining module configured to obtain a fifth numerical value, where the fifth numerical value is the sum of the first-type index values of all the target combinations at the third moment;
  • a verification module configured to perform a preset prompt operation in response to the ratio of the fourth numerical value to the fifth numerical value being less than a third threshold, the preset prompt operation is used to prompt that the abnormal factor is not in the abnormal data .
  • an electronic device comprising:
  • memory for storing instructions executable by the processor
  • the processor is configured to execute the instructions to implement the above-mentioned root cause determination method.
  • a non-volatile computer-readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to implement the root cause determination method described above.
  • a computer program product including a computer program, which implements the above-mentioned root cause determination method when the computer program is executed by a processor.
  • In this solution, abnormal data including dimensions and the factors included in the dimensions can be obtained, and a root cause search tree is constructed according to the dimensions. Any dimension node in the root cause search tree is then taken as the first dimension node, the first dimension factor combination associated with the first dimension node is obtained, and the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination is calculated. In response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than the second threshold, the first threshold is increased.
  • Then, with the increased first threshold as the new first threshold, the process of calculating the first possibility parameter, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, is repeated until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, at which point the abnormal factor combination is determined from the first dimension factor combination.
  • The embodiments of the present disclosure therefore do not require the user to input a hyperparameter. When a first possibility parameter greater than the initially set first threshold is obtained, the method checks whether the number of first dimensions is greater than the second threshold. If it is not, the threshold was set too low: the threshold condition was met and a result would be returned without searching deeper into the root cause search tree. In that case, the first threshold is increased by a certain step size and the abnormal factor combination is searched for again, so that a cross-dimension (that is, multi-dimension) root cause can be output. The embodiments of the present disclosure can thus return reasonable results without the user inputting any hyperparameter, and can be applied to various business scenarios.
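The threshold-raising loop described above can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not taken from the disclosure: dimension nodes are tuples of dimension names visited in a fixed order, the possibility parameter is supplied by a caller-provided function and lies in [0, 1], the search gives up once the threshold reaches 1.0, and the default values and the name `search_with_adaptive_threshold` are hypothetical.

```python
def search_with_adaptive_threshold(nodes, possibility, first_threshold=0.5,
                                   second_threshold=1, step=0.1):
    """Return (node, threshold) for the first node whose possibility parameter
    exceeds the threshold AND whose dimension count exceeds `second_threshold`.
    Returns None when no node qualifies."""
    threshold = first_threshold
    while threshold < 1.0:  # assumed cap: possibility parameters lie in [0, 1]
        for node in nodes:
            if possibility(node) <= threshold:
                continue  # not likely enough, try the next dimension node
            if len(node) > second_threshold:
                return node, threshold  # cross-dimension root cause found
            # Threshold was too low: a shallow node satisfied it.
            threshold += step
            break  # restart the search with the new first threshold
        else:
            return None  # no node exceeded the current threshold
    return None


scores = {("A",): 0.55, ("B",): 0.30, ("A", "B"): 0.90}
print(search_with_adaptive_threshold(list(scores), scores.get))
```

With the toy scores, node A first exceeds the initial threshold of 0.5 but is single-dimensional, so the threshold is raised once and the search then settles on node AB.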
  • FIG. 1 is a flowchart of a root cause determination method according to an exemplary embodiment.
  • FIG. 2 is a schematic diagram of a root cause search tree in an embodiment of the present disclosure
  • FIG. 3 is a flowchart of another root cause determination method according to an exemplary embodiment
  • FIG. 5 is a schematic diagram of the position of the inflection point when the sensitive parameter S has different values in an embodiment of the present disclosure
  • FIG. 6 is a graph in which the first offset distribution curve is mapped to a value range of 0 to 1 on an abscissa and a value range of 0 to 1 on an ordinate in an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a distance curve in an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of a specific implementation of a root cause determination method in an embodiment of the present disclosure
  • FIG. 9 is a block diagram of a root cause determination device according to an exemplary embodiment
  • FIG. 10 is a block diagram of another root cause determination device according to an exemplary embodiment
  • FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment
  • FIG. 12 is a block diagram of another electronic device according to an exemplary embodiment.
  • FIG. 1 is a flowchart of a root cause determination method according to an exemplary embodiment.
  • the root cause determination method can be applied to an electronic device, such as a server, a computer, a mobile phone, and the like. As shown in FIG. 1, the method includes the following steps 101-106.
  • Step 101 Obtain abnormal data.
  • the abnormal data includes dimensions and factors included in the dimensions.
  • Step 102 Build a root cause search tree according to the dimension.
  • The root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located.
  • If the dimensions associated with a dimension node of the jth layer belong to the dimensions associated with a dimension node of the (j+1)th layer, the dimension node of the jth layer is a parent node of that dimension node of the (j+1)th layer, where j is a positive integer, j ∈ [1, N], and N is the number of dimensions of the abnormal data. That is, in the embodiments of the present disclosure, the dimensions associated with the parent node of a dimension node belong to, or equivalently are included in, the dimensions associated with that dimension node.
  • For example, the abnormal data includes three dimensions A, B, and C, where dimension A includes two factors a1 and a2, dimension B includes three factors b1, b2, and b3, and dimension C includes four factors c1, c2, c3, and c4. The root cause search tree constructed from the three dimensions A, B, and C is shown in Figure 2.
  • the root cause search tree includes three layers, and the first layer includes three dimension nodes, which are dimension node A associated with dimension A, dimension node B associated with dimension B, and dimension node C associated with dimension C;
  • the second layer includes dimension nodes AB associated with A dimension and B dimension, dimension node AC associated with A dimension and C dimension, and dimension node BC associated with B dimension and C dimension;
  • the third layer includes the dimension node ABC associated with the A, B, and C dimensions.
  • Dimension A is included in the set formed by dimension A and dimension B. Therefore, the dimension node A associated with dimension A is a parent node of the dimension node AB associated with dimension A and dimension B.
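  • The tree construction described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `build_search_tree` and the use of `frozenset` to represent a dimension node are assumptions made for the example. Layer k holds every node associated with exactly k dimensions, and a node is a parent of a node in the next layer when its dimensions are a subset of the child's dimensions.

```python
from itertools import combinations


def build_search_tree(dimensions):
    """Return (layers, parents) for the given dimension names.

    layers[k] lists the dimension nodes of layer k (each a frozenset of k
    dimension names); parents[node] lists the parent nodes of a node.
    Both structures are hypothetical representations chosen for this sketch.
    """
    n = len(dimensions)
    layers = {
        k: [frozenset(c) for c in combinations(dimensions, k)]
        for k in range(1, n + 1)
    }
    parents = {}
    for k in range(2, n + 1):
        for child in layers[k]:
            # A layer-(k-1) node is a parent when its dimensions are
            # strictly contained in the child's dimensions.
            parents[child] = [p for p in layers[k - 1] if p < child]
    return layers, parents


layers, parents = build_search_tree(["A", "B", "C"])
```

  • For the example with dimensions A, B, and C, this yields three nodes in the first layer, three in the second, and the single node ABC in the third, with node A among the parents of node AB.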
  • Step 103 Obtain the first dimension factor combination associated with the first dimension node in the root cause search tree.
  • The first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree.
  • For example, the factor combinations associated with dimension node A are the factor combination composed of a1 and the factor combination composed of a2;
  • the factor combinations associated with dimension node B are the factor combination composed of b1, the factor combination composed of b2, and the factor combination composed of b3;
  • the factor combinations associated with dimension node C are the factor combination composed of c1, the factor combination composed of c2, the factor combination composed of c3, and the factor combination composed of c4.
  • The factor combinations associated with dimension node AB each consist of one factor selected from the two factors a1 and a2 and one factor selected from the three factors b1, b2, and b3, that is, 6 factor combinations;
  • the factor combinations associated with dimension node AC each consist of one factor selected from the two factors a1 and a2 and one factor selected from the four factors c1, c2, c3, and c4, that is, 8 factor combinations;
  • the factor combinations associated with dimension node BC each consist of one factor selected from the three factors b1, b2, and b3 and one factor selected from the four factors c1, c2, c3, and c4, that is, 12 factor combinations.
  • The factor combinations associated with dimension node ABC each consist of one factor selected from the two factors a1 and a2, one factor selected from the three factors b1, b2, and b3, and one factor selected from the four factors c1, c2, c3, and c4, that is, 24 factor combinations.
  • The factor combination associated with the dimension node that has the largest number of associated dimensions is called the target combination.
  • The dimension node with the largest number of associated dimensions is the dimension node of the N-th layer in the root cause search tree, where N is the number of dimensions of the abnormal data.
  • In the example of Figure 2, the factor combinations associated with the third-layer dimension node ABC are the target combinations.
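  • The counts in the example above follow from taking one factor per associated dimension. The sketch below enumerates the factor combinations of each node with a Cartesian product; the names `factors` and `factor_combinations`, and the representation of a node as a string of single-letter dimension names, are assumptions made for illustration.

```python
from itertools import product

# Factors of each dimension, taken from the example in the text.
factors = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3", "c4"],
}


def factor_combinations(node_dims, factors):
    """Enumerate factor combinations for a node: one factor per dimension."""
    dims = sorted(node_dims)
    return [
        dict(zip(dims, choice))
        for choice in product(*(factors[d] for d in dims))
    ]


nodes = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
counts = {name: len(factor_combinations(name, factors)) for name in nodes}
# The combinations of the deepest node (ABC) are the target combinations.
target_combinations = factor_combinations("ABC", factors)
```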
  • Step 104 Calculate a first possibility parameter that an abnormal factor combination exists in the first dimension factor combination.
  • There is a one-to-one correspondence between possibility parameters and dimension nodes, that is, each dimension node in the root cause search tree has a corresponding possibility parameter.
  • The possibility parameter corresponding to a dimension node represents the probability that an abnormal factor combination exists among the factor combinations associated with that dimension node.
  • The possibility parameter may be, for example, a General Potential Score (GPS).
  • GPS is a value that measures how likely a factor combination is to be a root cause in multi-dimensional cross root cause analysis; the calculation method of GPS will be described later.
  • Step 105 In response to the first possibility parameter being greater than a first threshold and the number of the first dimensions being not greater than a second threshold, increase the first threshold.
  • Step 106 Take any dimension node in the root cause search tree as the first dimension node and take the increased first threshold as the new first threshold; repeat the process of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold; then determine an abnormal factor combination from the first dimension factor combination.
  • Steps 105 and 106 are the process of searching for abnormal factor combinations among the factor combinations associated with the dimension nodes in the root cause search tree.
  • The abnormal factor combination that is found is the "root cause".
  • The order in which abnormal factor combinations are searched among the factor combinations associated with the dimension nodes in the root cause search tree is not limited.
  • For example, the root cause search tree can be traversed layer by layer in order of the number of dimensions from small to large, that is, the possibility parameter corresponding to each dimension node is calculated layer by layer. Alternatively, a dimension node can be selected at random and its possibility parameter calculated; if no abnormal factor combination exists among the factor combinations associated with that dimension node, the next dimension node whose possibility parameter has not yet been calculated is selected at random.
  • For the tree in Figure 2, the dimension nodes can be traversed in the order of the first layer, the second layer, and the third layer.
  • In the first layer, the possibility parameters are calculated in the order of dimension node A, dimension node B, and dimension node C;
  • in the second layer, the possibility parameters are calculated in the order of dimension node AB, dimension node AC, and dimension node BC.
  • If the first possibility parameter is greater than the first threshold and the number of first dimensions is not greater than the second threshold, the first threshold is set too low: the method would satisfy the threshold condition and return a result without searching deeper into the root cause search tree.
  • In this case, the first threshold is increased by a certain step, and the abnormal factor combination is searched again.
  • Once the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, the abnormal factor combination can be selected from the first dimension factor combination.
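  • The threshold-raising search of steps 105 and 106 can be sketched as a loop. This is a simplified illustration under stated assumptions: the possibility parameter of each node is supplied as a precomputed dictionary `score` (its calculation is not shown here), nodes are visited layer by layer, and the names `find_root_cause_node` and `step` as well as the numeric values are hypothetical.

```python
def find_root_cause_node(nodes, score, first_threshold, second_threshold, step):
    """Search for a node whose score passes the threshold at sufficient depth.

    nodes: dimension nodes (tuples of dimension names), ordered layer by layer.
    score: dict mapping each node to its precomputed possibility parameter.
    Returns (node, final_threshold); node is None when nothing qualifies.
    """
    threshold = first_threshold
    while True:
        raised = False
        for node in nodes:
            if score[node] <= threshold:
                continue  # an abnormal factor combination is unlikely here
            if len(node) > second_threshold:
                return node, threshold  # enough dimensions: accept this node
            # The hit is too shallow, so the threshold was too low:
            # raise it and restart the search.
            threshold += step
            raised = True
            break
        if not raised:
            return None, threshold  # no node exceeds the current threshold


nodes = [("A",), ("B",), ("C",), ("A", "B"), ("A", "C"), ("B", "C"),
         ("A", "B", "C")]
score = {("A",): 0.55, ("B",): 0.3, ("C",): 0.2, ("A", "B"): 0.9,
         ("A", "C"): 0.4, ("B", "C"): 0.3, ("A", "B", "C"): 0.5}
node, threshold = find_root_cause_node(nodes, score, 0.5, 1, 0.1)
```

  • In this run, node A first exceeds the initial threshold but is associated with only one dimension, so the threshold is raised once; on the next pass node AB is returned. The loop terminates because every restart raises the threshold, which eventually exceeds all scores.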
  • the method further includes:
  • If the first possibility parameter calculated in step 104 is greater than the first threshold and the number of first dimensions is greater than the second threshold, an abnormal factor combination is highly likely to exist in the first dimension factor combination and the number of first dimensions is reasonable, so the abnormal factor combination is selected from the first dimension factor combination.
  • the method further includes:
  • In response to the new first threshold being not less than a preset value, the recalculated first possibility parameter being greater than the new first threshold, and the number of the first dimensions being not greater than the second threshold, an abnormal factor combination is determined from the first dimension factor combination.
  • When the first threshold is not less than the preset value, the recalculated first possibility parameter is greater than the new first threshold, and the number of first dimensions is not greater than the second threshold, it is also reasonable for the number of dimensions to which the factors in the abnormal factor combination belong to be not greater than the second threshold.
  • Otherwise, the process of calculating the first possibility parameter, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, is repeated.
  • the method further includes:
  • If the first possibility parameter is not greater than the first threshold, the possibility that an abnormal factor combination exists in the first dimension factor combination is very small.
  • In this case, the possibility parameters corresponding to other dimension nodes need to be calculated. For example, when the root cause search tree is traversed layer by layer in order of the number of dimensions from small to large, if the possibility parameter of a dimension node is less than or equal to the first threshold, the traversal continues with the remaining dimension nodes, that is, the possibility parameter of the next dimension node is calculated. The traversal stops once a possibility parameter greater than the first threshold is obtained and the number of dimensions associated with the corresponding dimension node is greater than the second threshold.
  • If the second possibility parameter calculated for a second dimension node is greater than the first threshold and the number of second dimensions is greater than the second threshold, the abnormal factor combination can be selected from the second dimension factor combination.
  • If the second possibility parameter is not greater than the first threshold, or the second possibility parameter is greater than the first threshold and the number of second dimensions is not greater than the second threshold, the possibility parameters corresponding to the dimension nodes in the root cause search tree other than the first dimension node and the second dimension node are calculated.
  • As can be seen from the above, the embodiments of the present disclosure obtain the dimensions and the factors included in the dimensions, construct a root cause search tree according to the dimensions, take any dimension node in the root cause search tree as a first dimension node, and obtain the first dimension factor combination associated with the first dimension node.
  • The first possibility parameter that an abnormal factor combination exists in the first dimension factor combination is then calculated, and in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than the second threshold, the first threshold is increased.
  • With the increased first threshold as the new first threshold, the process of calculating the first possibility parameter, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, is repeated until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, at which point the abnormal factor combination is determined from the first dimension factor combination.
  • The embodiment of the present disclosure does not require the user to input hyperparameters. When a first possibility parameter greater than the initially set first threshold is obtained, the method determines whether the number of first dimensions is greater than the second threshold. If the number of first dimensions is not greater than the second threshold, the first threshold was set too low: the method would satisfy the threshold condition and return a result without searching deeper into the root cause search tree.
  • In this case, the first threshold is increased by a certain step size and the abnormal factor combination is searched again, so that a root cause spanning intersecting dimensions (that is, multiple dimensions) can be output. Therefore, the embodiments of the present disclosure can return reasonable, expected results without the user inputting any hyperparameters, and can be applied to various business scenarios.
  • Fig. 3 is a flow chart of a method for determining a root cause according to an exemplary embodiment.
  • the root cause determination method can be applied to an electronic device, and the electronic device can be, for example, a server, a computer, a mobile phone, or the like. As shown in FIG. 3, the method includes the following steps 301-308.
  • Step 301 Acquire abnormal data.
  • the abnormal data includes dimensions and factors included in the dimensions.
  • Step 302 Build a root cause search tree according to the dimension.
  • The root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located.
  • If the dimensions associated with a dimension node of the j-th layer belong to the dimensions associated with a dimension node of the (j+1)-th layer, the dimension node of the j-th layer is a parent node of that dimension node of the (j+1)-th layer, where j is a positive integer, j ∈ [1, N], and N is the number of dimensions of the abnormal data. That is, in the embodiment of the present disclosure, the dimensions associated with the parent node of a dimension node belong to, or can be regarded as being included in, the dimensions associated with that dimension node.
  • For example, the abnormal data includes three dimensions A, B, and C, where dimension A includes two factors a1 and a2, dimension B includes three factors b1, b2, and b3, and dimension C includes four factors c1, c2, c3, and c4. The root cause search tree constructed from the three dimensions A, B, and C is shown in Figure 2.
  • the root cause search tree includes three layers, and the first layer includes three dimension nodes, which are dimension node A associated with dimension A, dimension node B associated with dimension B, and dimension node C associated with dimension C;
  • the second layer includes dimension nodes AB associated with A dimension and B dimension, dimension node AC associated with A dimension and C dimension, and dimension node BC associated with B dimension and C dimension;
  • the third layer includes the dimension node ABC associated with the A, B, and C dimensions.
  • Dimension A is included in the set formed by dimension A and dimension B. Therefore, the dimension node A associated with dimension A is a parent node of the dimension node AB associated with dimension A and dimension B.
  • Step 303 Obtain the first dimension factor combination associated with the first dimension node in the root cause search tree.
  • The first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree.
  • For example, the factor combinations associated with dimension node A are the factor combination composed of a1 and the factor combination composed of a2;
  • the factor combinations associated with dimension node B are the factor combination composed of b1, the factor combination composed of b2, and the factor combination composed of b3;
  • the factor combinations associated with dimension node C are the factor combination composed of c1, the factor combination composed of c2, the factor combination composed of c3, and the factor combination composed of c4.
  • The factor combinations associated with dimension node AB each consist of one factor selected from the two factors a1 and a2 and one factor selected from the three factors b1, b2, and b3, that is, 6 factor combinations;
  • the factor combinations associated with dimension node AC each consist of one factor selected from the two factors a1 and a2 and one factor selected from the four factors c1, c2, c3, and c4, that is, 8 factor combinations;
  • the factor combinations associated with dimension node BC each consist of one factor selected from the three factors b1, b2, and b3 and one factor selected from the four factors c1, c2, c3, and c4, that is, 12 factor combinations.
  • The factor combinations associated with dimension node ABC each consist of one factor selected from the two factors a1 and a2, one factor selected from the three factors b1, b2, and b3, and one factor selected from the four factors c1, c2, c3, and c4, that is, 24 factor combinations.
  • The factor combination associated with the dimension node that has the largest number of associated dimensions is called the target combination.
  • The dimension node with the largest number of associated dimensions is the dimension node of the N-th layer in the root cause search tree, where N is the number of dimensions of the abnormal data.
  • In the example of Figure 2, the factor combinations associated with the third-layer dimension node ABC are the target combinations.
  • Step 304 Determine the abnormal target combination.
  • The target combination is the factor combination associated with the dimension node of the N-th layer, where N is the number of dimensions of the abnormal data.
  • For example, in Figure 2 the factor combinations associated with the dimension node in the third layer are the target combinations.
  • The abnormal target combination is a target combination whose offset is greater than a preset offset.
  • The offset is the absolute value of the difference between the first-type indicator values of the target combination collected at different times, the first-type indicator value is the value of the target indicator, and the target indicator is the business indicator associated with the dimension; for example, the target indicator is an abnormal indicator or an indicator to be detected.
  • Step 305 Calculate the target proportion of each of the first dimension factor combinations.
  • The target proportion of the i-th first dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first dimension factor combination, where i is a positive integer, i ∈ [1, M], M ∈ [1, N], and M is the number of first dimension factor combinations associated with the first dimension node.
  • If the factors included in a factor combination belong to the factors included in a target combination, the factor combination is associated with that target combination.
  • For example, if a first dimension factor combination is associated with 10 target combinations, 6 of which are abnormal target combinations, the target proportion of that first dimension factor combination is 0.6.
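  • The target proportion can be computed directly from this definition. In the sketch below, factor combinations and target combinations are represented as `frozenset` objects of factor names, and the set of abnormal target combinations is given as input; these representations and the function name `target_proportion` are assumptions for illustration.

```python
from itertools import product


def target_proportion(factor_combo, target_combos, abnormal_targets):
    """Share of abnormal target combinations among the associated ones.

    A target combination is associated with factor_combo when it contains
    every factor of factor_combo.
    """
    associated = [t for t in target_combos if factor_combo <= t]
    if not associated:
        return 0.0
    abnormal = sum(1 for t in associated if t in abnormal_targets)
    return abnormal / len(associated)


# Target combinations of the A/B/C example: one factor per dimension.
target_combos = [
    frozenset(c)
    for c in product(["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2", "c3", "c4"])
]
# Illustrative input: every target combination containing a1 and b1 is abnormal.
abnormal_targets = {t for t in target_combos if {"a1", "b1"} <= t}

p_a1 = target_proportion(frozenset({"a1"}), target_combos, abnormal_targets)
p_a1b1 = target_proportion(frozenset({"a1", "b1"}), target_combos, abnormal_targets)
```

  • Here the combination a1 is associated with 12 target combinations, 4 of which are abnormal, while the combination (a1, b1) is associated with 4 target combinations, all abnormal.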
  • Step 306 Calculate, according to the target proportion of the first dimension factor combination, a first possibility parameter that an abnormal factor combination exists in the first dimension factor combination.
  • There is a one-to-one correspondence between possibility parameters and dimension nodes, that is, each dimension node in the root cause search tree has a corresponding possibility parameter.
  • The possibility parameter corresponding to a dimension node represents the probability that an abnormal factor combination exists among the factor combinations associated with that dimension node.
  • The possibility parameter may be, for example, a General Potential Score (GPS).
  • GPS is a value that measures how likely a factor combination is to be a root cause in multi-dimensional cross root cause analysis; the calculation method of GPS will be described later.
  • Step 307 In response to the first possibility parameter being greater than a first threshold and the number of the first dimensions being not greater than a second threshold, increase the first threshold.
  • Step 308 Take any dimension node in the root cause search tree as the first dimension node and take the increased first threshold as the new first threshold; repeat the process of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold; then determine an abnormal factor combination from the first dimension factor combination.
  • The above steps 307 and 308 are the process of searching for abnormal factor combinations among the factor combinations associated with the dimension nodes in the root cause search tree.
  • The abnormal factor combination that is found is the "root cause".
  • The order in which abnormal factor combinations are searched among the factor combinations associated with the dimension nodes in the root cause search tree is not limited.
  • For example, the root cause search tree can be traversed layer by layer in order of the number of dimensions from small to large, that is, the possibility parameter corresponding to each dimension node is calculated layer by layer. Alternatively, a dimension node can be selected at random and its possibility parameter calculated; if no abnormal factor combination exists among the factor combinations associated with that dimension node, the next dimension node whose possibility parameter has not yet been calculated is selected at random.
  • For the tree in Figure 2, the dimension nodes can be traversed in the order of the first layer, the second layer, and the third layer.
  • In the first layer, the possibility parameters are calculated in the order of dimension node A, dimension node B, and dimension node C;
  • in the second layer, the possibility parameters are calculated in the order of dimension node AB, dimension node AC, and dimension node BC.
  • If the first possibility parameter is greater than the first threshold and the number of first dimensions is not greater than the second threshold, the first threshold is set too low: the method would satisfy the threshold condition and return a result without searching deeper into the root cause search tree.
  • In this case, the first threshold is increased by a certain step, and the abnormal factor combination is searched again.
  • Once the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, the abnormal factor combination can be selected from the first dimension factor combination.
  • the method further includes:
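  • In response to the first possibility parameter being greater than the first threshold and the number of the first dimensions being greater than the second threshold, the abnormal factor combination is selected from the first dimension factor combination.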
  • the method further includes:
  • In response to the new first threshold being not less than a preset value, the recalculated first possibility parameter being greater than the new first threshold, and the number of the first dimensions being not greater than the second threshold, an abnormal factor combination is determined from the first dimension factor combination.
  • When the first threshold is not less than the preset value, the recalculated first possibility parameter is greater than the new first threshold, and the number of first dimensions is not greater than the second threshold, it is also reasonable for the number of dimensions to which the factors in the abnormal factor combination belong to be not greater than the second threshold.
  • the method further includes:
  • If the first possibility parameter is not greater than the first threshold, the possibility that an abnormal factor combination exists in the first dimension factor combination is very small.
  • In this case, the possibility parameters corresponding to other dimension nodes need to be calculated. For example, when the root cause search tree is traversed layer by layer in order of the number of dimensions from small to large, if the possibility parameter of a dimension node is less than or equal to the first threshold, the traversal continues with the remaining dimension nodes, that is, the possibility parameter of the next dimension node is calculated. The traversal stops once a possibility parameter greater than the first threshold is obtained and the number of dimensions associated with the corresponding dimension node is greater than the second threshold.
  • If the second possibility parameter calculated for a second dimension node is greater than the first threshold and the number of second dimensions is greater than the second threshold, the abnormal factor combination can be selected from the second dimension factor combination.
  • If the second possibility parameter is not greater than the first threshold, or the second possibility parameter is greater than the first threshold and the number of second dimensions is not greater than the second threshold, the possibility parameters corresponding to the dimension nodes in the root cause search tree other than the first dimension node and the second dimension node are calculated.
  • the method further includes:
  • The preset condition includes that the change of a target object does not match the abnormal direction of the target indicator, where the target object is the first-type indicator values of the target combination collected at different times, the target indicator is the business indicator associated with the dimension, and the first-type indicator value is the value of the target indicator;
  • the determining the abnormal target combination includes:
  • An abnormal target combination in the first remaining target combination is determined.
  • That is, target combinations whose change is inconsistent with the abnormal direction of the target indicator are removed. For example, when the factor combination that caused DAU to increase needs to be determined, a target combination whose current DAU indicator value is lower than its past value needs to be removed, so that such target combinations do not interfere with the subsequent search for the real cause of the DAU increase.
  • the abnormal target combination in the obtained first remaining target combination is determined, and the subsequent process is performed by using the abnormal target combination in the first remaining target combination.
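  • The removal of target combinations that move against the abnormal direction can be sketched as a simple filter. The function name `remove_mismatched`, the `"up"`/`"down"` encoding of the abnormal direction, and the decision to keep combinations whose value did not change are all assumptions made for this example; the indicator values are illustrative.

```python
def remove_mismatched(targets, direction):
    """Keep target combinations whose change matches the abnormal direction.

    targets: dict mapping a target combination name to
             (past_value, current_value) of the target indicator.
    direction: "up" if the indicator rose abnormally, "down" if it fell.
    Unchanged values are kept here, which is an assumption of this sketch.
    """
    remaining = {}
    for name, (past, current) in targets.items():
        change = current - past
        if direction == "up" and change >= 0:
            remaining[name] = (past, current)
        elif direction == "down" and change <= 0:
            remaining[name] = (past, current)
    return remaining


# Illustrative DAU values (past, current) for three target combinations.
dau = {"a1,b1,c1": (100, 180), "a1,b2,c1": (90, 60), "a2,b1,c3": (50, 55)}
first_remaining = remove_mismatched(dau, "up")
```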
  • the determining an abnormal target combination includes:
  • Target combinations whose offsets are greater than a first target offset are determined to be abnormal target combinations, where the first target offset is the abscissa of the first inflection point in the first offset distribution curve graph.
  • the first offset distribution curve diagram may be, for example, as shown in FIG. 4 .
  • The offset is the absolute value of the difference between the first-type indicator values of the target combination collected at different times, the first-type indicator value is the value of the target indicator, and the target indicator is the business indicator associated with the dimension.
  • For example, if the DAU indicator value of a target combination collected at a certain past time is x1 and the DAU indicator value of the target combination collected at the current time is x2, the offset of the target combination is |x1 − x2|.
  • When the root cause search tree is obtained, the offset is calculated for each target combination, and the offset distribution curve is drawn from the calculated offsets. The inflection point of the offset distribution curve is then used to find the threshold required to determine abnormal target combinations, and target combinations whose offsets are greater than this threshold are defined as abnormal target combinations.
  • the determining the first inflection point in the first offset distribution curve includes:
  • The sensitivity parameter S of the inflection point detection algorithm based on the elbow rule is calculated, where L is the total number of target combinations involved in the first offset distribution curve graph, and m and n are preset constants;
  • the first inflection point in the first offset distribution curve graph is determined.
  • The sensitivity parameter S is used to control how conservative the search for the inflection point is.
  • The distribution of the inflection points (that is, the intersections of the dotted lines and the solid line in FIG. 5) is shown in FIG. 5. It can be seen from FIG. 5 that the larger S is, the larger the abscissa of the inflection point in the offset distribution curve graph, and therefore the fewer abnormal target combinations are determined. That is, the larger S is, the more conservative the detection.
  • an inflection point detection algorithm based on the elbow rule is used, and the process of determining the first inflection point in the first offset distribution curve graph is as follows:
  • the first offset distribution curve graph may be mapped to a graph with the abscissa value ranging from 0 to 1 and the ordinate value ranging from 0 to 1, for example, as shown in FIG. 6 .
  • For each point on the curve in FIG. 6, its distance to the line segment AB can be obtained, where point A is the starting point of the curve shown in FIG. 6 and point B is the end point of that curve.
  • a distance curve graph can be obtained according to the distance from each point in the curve in FIG. 6 to the line segment AB, as shown in FIG. 7 .
  • the vertical axis of the distance curve represents the distance from each point on the curve in FIG. 6 to the line segment AB.
  • From the distance curve graph, the number of points whose distance is greater than a predetermined distance threshold can be obtained and is denoted as Q;
  • the first target point determined in FIG. 7 is mapped to FIG. 6, that is, the point in FIG. 6 corresponding to the first target point is found and recorded as the second target point.
  • The point of the first offset distribution curve graph that corresponds to the second target point in FIG. 6 is the first inflection point in the first offset distribution curve graph.
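  • The normalization and distance-to-chord steps above can be sketched as follows. This is a deliberately simplified variant and not the algorithm of the disclosure: it ignores the sensitivity parameter S and the count Q, and simply takes the point farthest from the line through A and B as the inflection point. It also assumes that the curve plots, for each offset, the number of target combinations with a smaller offset. The function name `elbow_offset` is hypothetical, and the input list must be non-empty.

```python
from bisect import bisect_left
from math import sqrt


def elbow_offset(offsets):
    """Return the offset at the elbow of the offset distribution curve.

    x: sorted offsets; y: number of target combinations with a smaller
    offset. Both axes are scaled to [0, 1], so the curve starts at
    A = (0, 0) and ends at B = (1, 1); the elbow is taken as the point
    farthest from the line AB (a simplification, see the lead-in).
    """
    xs = sorted(offsets)
    ys = [bisect_left(xs, x) for x in xs]
    x_span = (xs[-1] - xs[0]) or 1.0  # avoid division by zero
    y_span = (ys[-1] - ys[0]) or 1.0
    best_x, best_d = xs[0], -1.0
    for x, y in zip(xs, ys):
        nx = (x - xs[0]) / x_span
        ny = (y - ys[0]) / y_span
        d = abs(nx - ny) / sqrt(2)  # distance from (nx, ny) to the line y = x
        if d > best_d:
            best_x, best_d = x, d
    return best_x


offsets = [3, 1, 100, 2, 5, 4]  # illustrative |x1 - x2| values
first_target_offset = elbow_offset(offsets)
abnormal = [o for o in offsets if o > first_target_offset]
```

  • With these illustrative offsets, the elbow lies at offset 5, so only the target combination with offset 100 is treated as abnormal.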
  • the determining an abnormal target combination further includes:
  • the target combinations are carried out according to the order of the offsets from small to large. Sort, get the first sort;
  • a second offset distribution graph is drawn, wherein the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the offset The number of target combinations whose offset is less than the offset represented by the numerical value on the horizontal axis;
  • the target combination of displacement amounts is determined as an abnormal target combination, wherein the second target displacement amount is the abscissa of the second inflection point in the second displacement amount distribution graph.
  • the embodiment of the present disclosure cyclically discards some target combinations in a certain proportion, Then, the offset distribution curve is redrawn according to the remaining target combinations, and the inflection point is calculated again until the proportion of the abnormal target combination found according to the inflection point in the remaining target combinations is less than the fifth threshold (eg, 50%).
  • the calculating a first possibility parameter of an abnormal factor combination in the first dimension factor combination according to the target proportion of the first dimension factor combination may include the following steps H1 to H6.
  • Step H1: Sort the first dimension factor combinations in descending order of target proportion to obtain the second sorting.
  • Step H2: Select the top second preset number of to-be-processed factor combinations in the second sorting.
  • Step H3: Sort the to-be-processed factor combinations to obtain the third sorting.
  • Step H4: Select the top third preset number of candidate factor combinations in the third sorting.
  • Step H5: Obtain the target combinations associated with the candidate factor combinations, wherein a target combination associated with a candidate factor combination includes the factors in that candidate factor combination.
  • Step H6: Calculate the first possibility parameter of an abnormal factor combination in the first dimension factor combinations according to the target combinations associated with the candidate factor combinations.
  • the second preset number is greater than the third preset number.
  • for example, if the first dimension node is associated with 50 first dimension factor combinations, the target proportion of each of the 50 first dimension factor combinations is calculated, and the 50 first dimension factor combinations are sorted in descending order of target proportion. The top second preset number (for example, 15) of to-be-processed factor combinations in this sorting are selected first; these to-be-processed factor combinations are then sorted again, and the top third preset number (for example, 3) of candidate factor combinations are selected. Finally, the first possibility parameter of an abnormal factor combination in the first dimension factor combinations is calculated according to the target combinations associated with the candidate factor combinations.
  • the candidate factor combination finally selected from the factor combination associated with a dimension node may be called the candidate factor combination associated with the dimension node.
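Steps H1 to H4 amount to a two-stage selection, which the following minimal sketch illustrates. The dictionary fields `target_proportion` and `offset_sum` and the function name are hypothetical; using the offset sum as the second ranking key is one of the options described in this section, and the defaults 15 and 3 are the example values given above.

```python
def select_candidates(combos, second_n=15, third_n=3):
    """Two-stage selection of candidate factor combinations.

    combos: list of dicts with hypothetical keys
            'name', 'target_proportion' and 'offset_sum'.
    """
    # H1: second sorting, descending target proportion.
    by_proportion = sorted(combos, key=lambda c: c["target_proportion"], reverse=True)
    # H2: keep the top `second_n` to-be-processed factor combinations.
    to_process = by_proportion[:second_n]
    # H3: third sorting, here by descending sum of offsets (assumed key).
    by_offset = sorted(to_process, key=lambda c: c["offset_sum"], reverse=True)
    # H4: keep the top `third_n` candidate factor combinations.
    return by_offset[:third_n]
```

Because the second stage re-ranks by offset, a combination with a high target proportion but a negligible offset sum can be pushed out of the final candidates.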
  • the sorting of the combination of factors to be processed to obtain a third sorting includes:
  • the first parameter of each to-be-processed factor combination is calculated, and the to-be-processed factor combinations are sorted in descending order of the first parameter to obtain the third sorting, wherein the target index is the business index associated with the dimension, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
  • the target index is a derivative index
  • obtain the first value of each first target combination, wherein the first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type index values of the first target combination at different times, the second-type index value is the value of the first index, and the first index is the index used as the numerator in the process of calculating the target index when the target index is a derivative index;
  • the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, and the third-type indicator value is the value of the second index, and the second index is the index used as the denominator in the process of calculating the target index when the target index is a derivative index;
  • the target index is an atomic index
  • when the overall fluctuation of the target index is rising, the above to-be-processed factor combinations are sorted again in descending order of the value of f-v; when the overall fluctuation of the target index is falling, the above to-be-processed factor combinations are sorted again in descending order of the value of v-f; where f represents the sum of the first-type index values of the target combinations associated with the to-be-processed factor combination at the second moment;
  • v represents the sum of the first-type index values of the target combination associated with the factor combination to be processed at the first moment, and the first moment is earlier than the second moment.
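For an atomic index, the re-ranking by f-v or v-f can be sketched as below. The field names `f` and `v` follow the symbols in the text; the function name and the dictionary representation are assumptions made for illustration only.

```python
def rerank_atomic(combos, rising):
    """Re-sort to-be-processed factor combinations for an atomic target index.

    combos: list of dicts with keys 'name', 'f' (sum at the second moment)
            and 'v' (sum at the first moment).
    rising: True if the overall fluctuation of the target index is rising.
    """
    sign = 1 if rising else -1
    # Rising: descending f - v. Falling: descending v - f.
    return sorted(combos, key=lambda c: sign * (c["f"] - c["v"]), reverse=True)
```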
  • when the target index is a derivative index, it is first necessary to determine the first index nume used as the numerator and the second index deno used as the denominator in the process of calculating the target index;
  • f_nume represents the sum of the second-type index values (that is, the values of the first index nume) at the second moment of the target combinations associated with the to-be-processed factor combination;
  • v_nume represents the sum of the second-type index values at the first moment of the target combinations associated with the to-be-processed factor combination;
  • f_deno represents the sum of the third-type index values (that is, the values of the second index deno) at the second moment of the target combinations associated with the to-be-processed factor combination;
  • v_deno represents the sum of the third-type index values at the first moment of the target combinations associated with the to-be-processed factor combination.
  • when the sum of the numbers of target combinations associated with the factor combinations associated with a dimension node is small, the GPS of the dimension node will be relatively high. Therefore, selecting factor combinations that have few associated target combinations but a relatively large target proportion as the candidate factor combinations for calculating the GPS of a dimension node would inflate the calculated GPS of that dimension node, so that it could not accurately represent the probability that an abnormal factor combination exists among the factor combinations associated with the dimension node, and the abnormal factor combinations found would be inaccurate. The specific reason why the GPS of a dimension node is relatively high when few factor combinations are associated with it is described below.
  • the target proportion is used for a first sorting, and the top-ranked factor combinations in descending order of target proportion are retained; an "offset" is then further introduced, and the retained factor combinations are sorted a second time, so as to exclude the cases where the number of associated target combinations is small but the target proportion is large.
  • the determining an abnormal factor combination from the first dimension factor combination includes:
  • the candidate factor combination is determined as the abnormal factor combination.
  • the above candidate factor combination can be determined as the abnormal factor combination, that is, as the combination of factors that causes the target index to be abnormal.
  • calculating a first possibility parameter of an abnormal factor combination in the first dimension factor combination according to the target combination associated with the candidate factor combination includes:
  • a(Z1) = f(Z1) - f(Z1)/f(Z) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type index values of the first abnormal target combinations at the second moment, f(Z) represents the sum of the first-type index values of the first target combinations at the second moment, v(Z) represents the sum of the first-type index values of the first target combinations at the first moment, the first-type index value is the value of the target index, the target index is the business index associated with the dimension, and the first moment is earlier than the second moment;
  • according to avg1, avg2, avg3 and the third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), calculate the first possibility parameter of an abnormal factor combination in the first dimension factor combination;
  • GPS represents the first possibility parameter.
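The two formulas above can be evaluated directly once their inputs are known. The sketch below reads the first expression as a(Z1) = f(Z1) - f(Z1)/f(Z) × (f(Z) - v(Z)) and the second as GPS = 1 - (avg3 + avg2)/(avg1 + avg2); the function names are hypothetical, and how avg1, avg2 and avg3 are obtained is not fully shown in this section, so they are taken as given inputs.

```python
def expected_value(f_z1, f_z, v_z):
    """a(Z1): value of the abnormal subset after distributing the overall change proportionally."""
    if f_z == 0:
        raise ValueError("f(Z) must be non-zero")
    return f_z1 - f_z1 / f_z * (f_z - v_z)

def gps(avg1, avg2, avg3):
    """First possibility parameter, GPS = 1 - (avg3 + avg2) / (avg1 + avg2)."""
    if avg1 + avg2 == 0:
        raise ValueError("avg1 + avg2 must be non-zero")
    return 1 - (avg3 + avg2) / (avg1 + avg2)
```

With these definitions, a smaller avg3 relative to avg1 pushes the GPS towards 1, while avg3 equal to avg1 gives a GPS of 0.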
  • for example, the candidate factor combinations associated with the first dimension node are associated with 10 target combinations in total, of which 5 are abnormal target combinations and 5 are normal target combinations. The offsets of the 5 abnormal target combinations need to be calculated. Then, calculate the second average value avg2 of the offsets of the 5 normal target combinations; next, calculate the sum of the first-type index values of the 5 abnormal target combinations collected at the first moment, to obtain f(Z1); next, calculate the sum of the first-type index values of the above 10 target combinations collected at the first moment, to obtain f(Z); again, calculate the collected at the second moment, the above
  • the method further includes the following steps K1-K4.
  • Step K1: Obtain the second target combinations associated with the abnormal factor combination, wherein a target combination is a factor combination associated with an Nth-layer dimension node, N is the number of dimensions of the abnormal data, and a second target combination includes the factors in the abnormal factor combination.
  • Step K2: Obtain a fourth numerical value, where the fourth numerical value is the sum of the first-type index values of the second target combinations at the third moment, the first-type index value is the value of the target index, and the target index is the business index associated with the dimension.
  • Step K3: Obtain a fifth numerical value, where the fifth numerical value is the sum of the first-type index values of all the target combinations at the third moment.
  • Step K4: In response to the ratio of the fourth numerical value to the fifth numerical value being smaller than the third threshold, perform a preset prompt operation, where the preset prompt operation is used to prompt that the abnormal factor is not in the abnormal data.
  • for example, the abnormal factor combinations found are a first factor combination consisting of factors a1 and b1, a second factor combination consisting of factors a1 and b2, and a third factor combination consisting of factors a2 and b3.
  • the target combination associated with the first factor combination includes: the target combination composed of a1, b1, and c1, the target combination composed of a1, b1, and c2, the target combination composed of a1, b1, and c3, and the target combination composed of a1, b1, and c4.
  • likewise, the target combinations associated with each of the other two factor combinations include four target combinations. There are then 12 target combinations associated with the abnormal factor combinations found.
  • compare the sum of the first-type index values (that is, the values of the target index) of the above 12 target combinations with the sum of the first-type index values of all target combinations: if the ratio of the former to the latter is less than the third threshold, this indicates that the sum of the index values of the target combinations associated with the abnormal factor combinations found accounts for too small a proportion.
  • for example, the target index is the number of daily active users (DAU), and the data collected at a certain time show that the sum of the DAUs of all target combinations of the root cause search tree shown in FIG. 2 is 200 million, while the sum of the DAUs of the above 12 target combinations is 2,000. Since 2,000 is far less than 200 million, the factor combinations found to cause the DAU anomaly account for too small a share, which means that the abnormal factors are not among the factor combinations associated with the dimension nodes in the above root cause search tree; prompting this avoids misleading users with the output factor combinations.
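The check in steps K1 to K4 can be sketched as follows, using the DAU figures from the example. The function name, the tuple representation of combinations, and the threshold value 0.01 used in the usage note are assumptions for illustration; the text does not give a value for the third threshold.

```python
def root_cause_outside_tree(target_values, abnormal_factor_combos, third_threshold):
    """Return True when the found abnormal factor combinations cover too small a share.

    target_values: dict mapping each target combination (tuple of factors)
                   to its first-type index value at the third moment.
    abnormal_factor_combos: iterable of factor tuples found as abnormal.
    """
    # K3: fifth value, the sum over all target combinations.
    total = sum(target_values.values())
    # K1 + K2: fourth value, the sum over target combinations that contain
    # all factors of at least one abnormal factor combination.
    covered = sum(
        value for combo, value in target_values.items()
        if any(set(fc) <= set(combo) for fc in abnormal_factor_combos)
    )
    # K4: prompt when the ratio is below the third threshold.
    return total > 0 and covered / total < third_threshold
```

For instance, with 2,000 DAU covered out of 200 million and a hypothetical threshold of 0.01, the function returns True, which would trigger the prompt that the root cause may lie outside the analysed dimensions.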
  • consider the proportion of the sum of the first-type index values of the target combinations associated with the selected abnormal factor combinations in the total index value (that is, the sum of the first-type index values of all target combinations): if this proportion is too small, the user is prompted that the root cause is not among the factor combinations associated with the dimension nodes in the root cause search tree, so as to avoid misleading users. It can be seen that the embodiment of the present disclosure also considers the possibility that the root cause is not among the factor combinations associated with the dimension nodes in the root cause search tree.
  • the percentage change of the target combination is the percentage change of the first type index value of the target combination at different times
  • the percentage change of the abnormal factor combination is the percentage change, between different times, of the sum of the first-type index values of all target combinations associated with the abnormal factor combination.
  • the percentage change of the abnormal factor combination is not much different from that of most of the target combinations associated with it. For example, if 3 abnormal factor combinations are selected and 6 target combinations are associated with them, then, if these 3 abnormal factor combinations are indeed the root cause and the sum of the DAUs of the 6 target combinations has risen by 15%, the DAU increase of each of the 6 target combinations will not differ much from 15%.
  • the probability that the first-type index values of two target combinations change by a similar amount is much greater than the probability that the first-type index values of 100 target combinations all change by a similar amount. Therefore, when the sum of the numbers of target combinations associated with the factor combinations associated with a dimension node is small, the GPS of the dimension node will be relatively high. It can be seen that a dimension node with deeper dimension intersection (i.e., a dimension node with a larger number of associated dimensions) tends to have a higher GPS score, because fewer target combinations are associated with each of its factor combinations.
  • the global optimum is not selected; instead, as the dimension intersection becomes deeper and deeper, the root cause of the abnormal target index is searched from shallow to deep by means of a first threshold. So in theory, even if a dimension intersection is shallow (for example, an intersection of only two dimensions with a GPS of 0.75), as long as its GPS is greater than this threshold, it still takes precedence over dimension intersections that are deeper but have higher GPS scores (for example, an intersection of three dimensions with a GPS of 0.8).
  • the root cause of the jitter of the indicator is not in the internal business; rather, competing products have taken some action.
  • there is no real root cause dimension in the root cause search tree, so all dimension intersections will theoretically have a low GPS score.
  • since the dimension nodes with deeper dimension intersection tend to have a higher GPS score, in this case the dimension nodes associated with the abnormal factor combinations found lie at a deeper level of the root cause search tree. As a result, the number of target combinations associated with the abnormal factor combinations found is relatively small, and the sum of the first-type index values of the target combinations associated with the returned abnormal factor combinations accounts for a small proportion of the total index value.
  • the embodiment of the present disclosure therefore judges the first-type index values of the target combinations associated with the abnormal factor combinations found, and outputs prompt information such as "the root cause you are looking for may not be in the current dimension" to indicate that the root cause has not been found through this solution. In root cause analysis in real business scenarios, it is very likely that the abnormal factor combination is not among the factor combinations associated with the dimension nodes in the root cause search tree, so this processing can greatly improve the rationality of the output.
  • the above-mentioned third moment may be the same as one of the aforementioned first moment and second moment. That is, in the case of finding the abnormal factor combination, the above steps K1 to K4 may be performed by using the first-type index value collected at the first moment or the first-type index value collected at the second moment. The above steps K1 to K4 may also be performed using the first-type index values collected at other times.
  • the specific implementation of the root cause determination method in the embodiment of the present disclosure is mainly divided into four stages, namely the preparation stage, the abnormal target combination confirmation stage, the root cause search stage, and the control result output stage.
  • the preparation stage mainly includes the following processes:
  • the offset distribution curve is drawn for the target combinations, the search for the inflection point of the curve is controlled through the sensitivity parameter S, and the abnormal target combinations are then confirmed according to the inflection point.
  • the dimension nodes are traversed layer by layer, and the target proportion of the abnormal target combination in the factor combination associated with the dimension node is calculated.
  • the factor combinations associated with the dimension node are sorted, the top 15 to-be-processed factor combinations are selected and sorted again, the top three candidate factor combinations are then selected, and the GPS of this dimension node is calculated according to the target combinations associated with the selected top three candidate factor combinations;
  • control result output stage which mainly includes the following processes:
  • the number of abnormal target combinations after pruning is kept within a reasonable range relative to the total number of target combinations. This includes taking the rising or falling direction of the indicator into account and retaining only the target combinations that change in the same direction as the indicator. Further, the sensitivity parameter of the elbow-rule-based inflection point detection algorithm is calculated from the total number of target combinations that change in the same direction as the indicator, so that a suitable inflection point can be found, according to the sensitivity parameter, in the offset curve drawn from the offsets of the target combinations. In addition, the case where the offset distribution curve is a concave function is also considered and corrected.
  • the “offset related to the target combinations associated with the to-be-processed factor combination” is introduced to sort the to-be-processed factor combinations, so as to avoid the ranking by target proportion being inflated when few target combinations are associated with a factor combination.
  • the preset first threshold is cyclically changed with a certain step size to ensure that the root cause of the output is multi-dimensional rather than single-dimensional.
  • whether the real root cause is among the factor combinations associated with the dimension nodes in the root cause search tree is judged by the proportion of the sum of the first-type index values of the target combinations associated with the selected abnormal factor combinations in the total index value. If this proportion is too small, the user is prompted that the root cause is not among the factor combinations associated with the dimension nodes in the root cause search tree, so as to avoid misleading users.
  • the embodiments of the present disclosure improve the robustness of the original algorithm in various business situations in all aspects, and compared with the original algorithm, the results are more reasonable and more interpretable.
  • the embodiments of the present disclosure correct the unreasonable parts of the existing methods in pruning, sorting and result output when faced with real business data.
  • it can also effectively avoid the need to select hyperparameters; when applied to multi-dimensional cross root cause analysis scenarios, it can effectively enhance the robustness of the algorithm across application scenarios, so that the search for abnormal factors through multi-dimensional intersection takes the realities of various business scenarios into account.
  • FIG. 9 is a block diagram of a root cause determination device according to an exemplary embodiment. As shown in FIG. 9 , the root cause determination device 90 may include:
  • a data acquisition module 901 configured to acquire abnormal data, where the abnormal data includes dimensions and factors included in the dimensions;
  • the building module 902 is configured to construct a root cause search tree according to the dimension, wherein the root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node is the same as the level of the dimension node layer in which the dimension node is located;
  • the first factor combination acquisition module 903 is configured to acquire the first dimension factor combination associated with the first dimension node in the root cause search tree, wherein the first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree;
  • a first possibility parameter calculation module 904 configured to calculate a first possibility parameter of an abnormal factor combination in the first dimension factor combination
  • a threshold increasing module 905 configured to increase the first threshold in response to the first likelihood parameter being greater than a first threshold and the number of the first dimension not greater than a second threshold;
  • the execution module 906 is configured to take any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, and to repeatedly execute the steps of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination and of increasing the first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold, and then to determine an abnormal factor combination from the first dimension factor combination.
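A layered structure of this kind, in which a node at layer k is associated with k dimensions and each node yields factor combinations containing one factor per dimension, can be sketched as below. Enumerating every subset of dimensions as a node is an assumption made for illustration (the actual tree is defined by the figures, which are not reproduced here), and the function names are hypothetical.

```python
from itertools import combinations, product

def build_tree(dimensions):
    """Return the dimension node layers; layer k (1-based) holds nodes with k dimensions.

    dimensions: dict mapping a dimension name to its list of factors.
    """
    names = sorted(dimensions)
    return [list(combinations(names, k)) for k in range(1, len(names) + 1)]

def factor_combinations(dimensions, node):
    """All factor combinations of a node: one factor from each associated dimension."""
    return list(product(*(dimensions[d] for d in node)))
```

The factor combinations of the deepest layer (all N dimensions) correspond to what the text calls target combinations.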
  • the embodiments of the present disclosure can construct a root cause search tree according to the target dimensions involved in the index to be detected and the factors included in the target dimensions, so as to traverse the paths of the root cause search tree and calculate the GPS of each dimension node encountered. When a calculated first GPS is greater than the first threshold and the number of target dimensions involved in the dimension node to which the first GPS belongs is less than or equal to the second threshold, the first threshold is increased and the traversal is repeated, until a calculated second GPS is greater than the increased first threshold and the number of target dimensions involved in the dimension node to which the second GPS belongs is greater than the second threshold; the traversal then stops, and a first preset number of factor combinations are selected from the factor combinations corresponding to the dimension node to which the second GPS belongs, as the factor combinations that cause the abnormality of the index to be detected.
  • the embodiment of the present disclosure does not require the user to input hyperparameters, and determines whether the number of dimensions involved in the dimension node to which the GPS belongs is greater than the second threshold. If it is not greater than the second threshold, the threshold was set too low: the algorithm satisfied the threshold condition and returned a result without exploring further. In that case, the embodiment of the present disclosure increases the first threshold by a certain step size and searches the root cause search tree from top to bottom again, so that a root cause involving dimension intersection can be output. Therefore, the embodiments of the present disclosure can return reasonable results without the user inputting any hyperparameters, and are thus applicable to various business scenarios.
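The threshold-raising search can be sketched as a loop over dimension nodes ordered from shallow to deep. This is a simplified illustration under stated assumptions: the GPS of each node is taken as precomputed, and the function name, the `step` size and the `cap` (standing in for the preset value at which raising stops) are hypothetical.

```python
def search(nodes, first_threshold, second_threshold, step=0.125, cap=1.0):
    """Return (node, threshold) for the node whose factor combinations are reported.

    nodes: list of (name, n_dims, gps) tuples ordered from shallow to deep layers.
    """
    threshold = first_threshold
    while True:
        # Top-down traversal: first node whose GPS exceeds the current threshold.
        hit = next((n for n in nodes if n[2] > threshold), None)
        if hit is None:
            return None, threshold
        # Deep enough (more dimensions than the second threshold): stop here.
        if hit[1] > second_threshold:
            return hit, threshold
        # Too shallow: raise the first threshold and search again.
        threshold += step
        if threshold >= cap:
            return hit, threshold
```

With a starting threshold of 0.5 and a second threshold of 1, a two-dimension node with GPS 0.75 is returned ahead of a three-dimension node with GPS 0.8, which matches the shallow-first behaviour described in this section.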
  • Fig. 10 is a block diagram of a root cause determination device according to an exemplary embodiment. As shown in Fig. 10 , the root cause determination device 100 may include:
  • the data acquisition module 1001 is configured to acquire abnormal data, and the abnormal data includes a dimension and a factor included in the dimension;
  • the building module 1002 is configured to build a root cause search tree according to the dimension, wherein the root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node is the same as the level of the dimension node layer in which the dimension node is located;
  • the first factor combination acquisition module 1003 is configured to acquire the first dimension factor combination associated with the first dimension node in the root cause search tree, wherein the first dimension node is associated with at least one first dimension, the first dimension factor combination includes one factor of each of the first dimensions, and the first dimension node is any dimension node in the root cause search tree;
  • the first possibility parameter calculation module 1004 is configured to calculate the first possibility parameter that there is an abnormal factor combination in the first dimension factor combination
  • a threshold increasing module 1005, configured to increase the first threshold in response to the first likelihood parameter being greater than a first threshold and the number of the first dimensions not greater than a second threshold;
  • the execution module 1006 is configured to take any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, and to repeatedly execute the steps of calculating the first possibility parameter that an abnormal factor combination exists in the first dimension factor combination and of increasing the first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of the first dimensions is greater than the second threshold, and then to determine an abnormal factor combination from the first dimension factor combination.
  • the apparatus further includes:
  • a first determination module 1007 configured to, in response to the recalculated first possibility parameter being greater than the new first threshold and the number of the first dimensions being not greater than the second threshold, increase the first threshold and determine an abnormal factor combination from the first dimension factor combination.
  • the apparatus further includes:
  • the second determining module 1008 is configured to, based on increasing the first threshold, determine an abnormal factor combination from the first dimension factor combination in response to the new first threshold being not less than a preset value.
  • the apparatus further includes:
  • the second factor combination obtaining module 1009 is configured to, in response to the first possibility parameter not being greater than the first threshold, obtain a second dimension factor combination associated with a second dimension node, the second dimension node being associated with at least one second dimension, the second dimension factor combination includes one factor of each of the second dimensions;
  • the second possibility parameter calculation module 1010 is configured to calculate a second possibility parameter that an abnormal factor combination exists in the second dimension factor combination.
  • the apparatus further includes:
  • the abnormal target combination determination module 1011 is configured to determine the abnormal target combination, the target combination is the factor combination associated with the Nth layer dimension node, and N is the number of dimensions of the abnormal data;
  • the first possibility parameter calculation module 1004 includes:
  • the proportion calculation sub-module 10041 is configured to calculate the target proportion of each of the first dimension factor combinations, wherein the target proportion of the i-th first dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first dimension factor combination, i is a positive integer, i ∈ [1, M], M ∈ [1, N], and M is the number of first dimension factor combinations associated with the first dimension node;
  • the likelihood parameter calculation sub-module 10042 is configured to calculate, according to the target proportion of the first dimension factor combination, a first possibility parameter of an abnormal factor combination in the first dimension factor combination.
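The target proportion of a factor combination is the share of abnormal target combinations among the target combinations that contain all of its factors. A minimal sketch, assuming combinations are represented as tuples of factor names (the function name is hypothetical):

```python
def target_proportions(factor_combos, target_combos, abnormal):
    """Map each factor combination to its share of abnormal associated target combinations.

    factor_combos: list of factor tuples associated with one dimension node.
    target_combos: list of all target combinations (tuples with one factor per dimension).
    abnormal: set of target combinations already determined to be abnormal.
    """
    result = {}
    for fc in factor_combos:
        # A target combination is associated with fc if it includes every factor of fc.
        linked = [t for t in target_combos if set(fc) <= set(t)]
        bad = sum(1 for t in linked if t in abnormal)
        result[fc] = bad / len(linked) if linked else 0.0
    return result
```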
  • the apparatus further includes:
  • the deletion module 1012 is configured to delete the target combination that meets the preset condition to obtain the first remaining target combination
  • the preset condition includes that the change of the target object does not match the abnormal direction of the target index, wherein the target object is the first-type index values of the target combination collected at different times, the target index is the business index associated with the dimension, and the first-type index value is the value of the target index;
  • the abnormal target combination determination module 1011 determines the abnormal target combination, it is specifically configured as:
  • An abnormal target combination in the first remaining target combination is determined.
  • the abnormal target combination determination module 1011 includes:
  • an offset acquisition sub-module 10111 configured to acquire the offset of the target combination
  • the first drawing sub-module 10112 is configured to draw a first offset distribution graph, wherein the horizontal axis of the first offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is smaller than the offset on the horizontal axis;
  • a first inflection point determination sub-module 10113 configured to determine a first inflection point in the first offset distribution curve
  • the first abnormal target combination determination sub-module 10114 is configured to, if the proportion of target combinations whose offset is greater than the first target offset among all the target combinations is not greater than the fifth threshold, determine the target combinations whose offset is greater than the first target offset as abnormal target combinations, wherein the first target offset is the abscissa of the first inflection point in the first offset distribution graph.
  • the first inflection point determination sub-module 10113 is specifically configured to:
  • calculate, according to the first preset formula S = min(m, L/n), the sensitivity parameter S of the inflection point detection algorithm based on the elbow rule, wherein L is the total number of target combinations involved in the first offset distribution graph, and m and n are preset constants;
  • determine the first inflection point in the first offset distribution graph using the inflection point detection algorithm based on the elbow rule.
  • the abnormal target combination determination module 1011 further includes:
  • the sorting sub-module 10115 is configured to, among all the target combinations, if the proportion of the target combinations whose offsets are greater than the first target offsets is greater than the fifth threshold, then according to the offsets from small to large order, sort the target combination to obtain the first order;
  • the deletion sub-module 10116 is configured to remove the first preset number of target combinations in the first sorting to obtain the second remaining target combination
  • the second drawing sub-module 10117 is configured to draw a second offset distribution graph according to the offsets of the second remaining target combinations obtained this time, wherein the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
  • a second inflection point determination sub-module 10118 configured to determine a second inflection point in the second offset distribution curve
  • the second abnormal target combination determination sub-module 10119 is configured to, if the proportion of target combinations whose offset is greater than the second target offset among the second remaining target combinations obtained this time is not greater than the fifth threshold, determine the target combinations whose offset is greater than the second target offset as abnormal target combinations, wherein the second target offset is the abscissa of the second inflection point in the second offset distribution graph.
  • the likelihood parameter calculation sub-module 10042 is specifically configured to:
  • the target combination associated with the candidate factor combination includes the factors in the candidate factor combination
  • the second preset number is greater than the third preset number.
  • when sorting the to-be-processed factor combinations to obtain the third ranking, the possibility parameter calculation sub-module 10042 is specifically configured to:
  • in the case where the target indicator is a native indicator, calculate the first parameter of each to-be-processed factor combination and sort the to-be-processed factor combinations in descending order of the first parameter to obtain the third ranking, wherein the target indicator is the business indicator associated with the dimension, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
  • in the case where the target indicator is a derived indicator, obtain the first value of each first target combination, wherein the first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, the second-type indicator value is the value of the first indicator, and the first indicator is the indicator used as the numerator when calculating the target indicator;
  • obtain the second value of each first target combination, wherein the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, the third-type indicator value is the value of the second indicator, and the second indicator is the indicator used as the denominator when calculating the target indicator;
  • the execution module 1006 is specifically configured to:
  • the candidate factor combination is determined as the abnormal factor combination.
  • the likelihood parameter calculation sub-module 10042 calculates the first likelihood parameter of an abnormal factor combination in the first dimension factor combination according to the target combination associated with the candidate factor combination , which is specifically configured as:
  • calculate the third parameter a(Z1) according to the second preset formula a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type indicator values of the first abnormal target combinations at the second moment, f(Z) represents the sum of the first-type indicator values of the first target combinations at the second moment, v(Z) represents the sum of the first-type indicator values of the first target combinations at the first moment, the first-type indicator value is the value of the target indicator, the target indicator is the business indicator associated with the dimension, and the first moment is earlier than the second moment;
  • calculate, according to avg1, avg2, avg3, and the third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), the first possibility parameter that an abnormal factor combination exists in the first dimension factor combinations;
  • GPS represents the first possibility parameter.
  • the apparatus further includes:
  • the first verification parameter acquisition module 1013 is configured to acquire a second target combination associated with the abnormal factor combination, where a target combination is a factor combination associated with the Nth-layer dimension node, N is the number of dimensions of the abnormal data, and the second target combination includes the factors in the abnormal factor combination;
  • the second verification parameter obtaining module 1014 is configured to obtain a fourth numerical value, where the fourth numerical value is the sum of the first type index values of the second target combination at the third moment, and the first type index value is the target the value of the indicator, the target indicator is the business indicator associated with the dimension;
  • the third verification parameter obtaining module 1015 is configured to obtain a fifth numerical value, where the fifth numerical value is the sum of the first-type index values of all the target combinations at the third moment;
  • the verification module 1016 is configured to perform a preset prompt operation in response to the ratio of the fourth numerical value to the fifth numerical value being less than the third threshold, where the preset prompt operation is used to prompt that the abnormal factor is not in the abnormal data.
  • the embodiments of the present disclosure can construct a root cause search tree according to the target dimensions involved in the indicator to be detected and the factors included in those dimensions, traverse the root cause search tree, and calculate the GPS of each dimension node encountered. When a calculated first GPS is greater than the first threshold and the number of target dimensions involved in the dimension node to which the first GPS belongs is less than or equal to the second threshold, the first threshold is increased by a step and the tree is traversed again. Traversal stops once a calculated second GPS is greater than the increased first threshold and the number of target dimensions involved in the dimension node to which the second GPS belongs is greater than the second threshold; a first preset number of factor combinations are then selected from the factor combinations corresponding to that dimension node as the factor combinations that cause the abnormality of the indicator to be detected.
  • the embodiments of the present disclosure do not require the user to input hyperparameters. They check whether the number of dimensions involved in the dimension node to which the GPS belongs is greater than the second threshold; if it is not, the threshold was set too low, and the algorithm satisfied the threshold condition and returned a result without exploring further.
  • in that case, the embodiments of the present disclosure increase the first threshold by a certain step size and search the root cause search tree from top to bottom again, so that a root cause that crosses dimensions can be output. The embodiments can therefore return reasonable results without the user inputting any hyperparameters, and are applicable to various business scenarios.
  • an electronic device is provided. Referring to FIG. 11, the electronic device includes:
  • a memory 1120 for storing the processor-executable instructions
  • the processor is configured to execute the instructions to implement the above-mentioned root cause determination method.
  • an electronic device is also provided.
  • the electronic device 900 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
  • an electronic device 1200 may include one or more of the following components: a processing component 1202, a memory 1204, a power supply component 1206, a multimedia component 1208, an audio component 1210, an input/output (I/O) interface 1212, a sensor component 1214 , and the communication component 1216.
  • the processing component 1202 generally controls the overall operation of the electronic device 1200, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 1202 can include one or more processors 1220 to execute instructions to perform all or some of the steps of the above-described methods.
  • processing component 1202 may include one or more modules that facilitate interaction between processing component 1202 and other components.
  • processing component 1202 may include a multimedia module to facilitate interaction between multimedia component 1208 and processing component 1202.
  • Memory 1204 is configured to store various types of data to support operation at device 1200 . Examples of such data include instructions for any application or method operating on electronic device 1200, contact data, phonebook data, messages, pictures, videos, and the like. Memory 1204 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • Power supply component 1206 provides power to various components of electronic device 1200 .
  • Power supply components 1206 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 1200 .
  • Multimedia component 1208 includes a screen that provides an output interface between the electronic device 1200 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP).
  • the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 1208 includes a front-facing camera and/or a rear-facing camera. When the device 1200 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data.
  • Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 1210 is configured to output and/or input audio signals.
  • audio component 1210 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 1200 is in operating modes, such as calling mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 1204 or transmitted via communication component 1216 .
  • audio component 1210 also includes a speaker for outputting audio signals.
  • the I/O interface 1212 provides an interface between the processing component 1202 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • the sensor assembly 1214 includes one or more sensors for providing various aspects of the status assessment of the electronic device 1200 .
  • the sensor component 1214 can detect the open/closed state of the device 1200 and the relative positioning of components, such as the display and keypad of the electronic device 1200. The sensor component 1214 can also detect a change in the position of the electronic device 1200 or of one of its components, the presence or absence of user contact with the electronic device 1200, the orientation or acceleration/deceleration of the electronic device 1200, and a change in its temperature.
  • Sensor assembly 1214 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 1214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1216 is configured to facilitate wired or wireless communication between electronic device 1200 and other devices.
  • Electronic device 1200 may access wireless networks based on communication standards, such as WiFi, carrier networks (e.g., 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 1216 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 1216 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • electronic device 1200 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the root cause determination method described above.
  • a non-transitory computer-readable storage medium including instructions such as a memory 1204 including instructions, is also provided, and the instructions can be executed by the processor 1220 of the electronic device 1200 to complete the above method.
  • the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device.
  • an embodiment of the present disclosure further provides a storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can execute the above-mentioned root cause determination method.
  • a computer program product comprising instructions, which, when executed on a computer, enable the computer to implement the above-mentioned root cause determination method.

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

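A root cause determination method and apparatus. The method includes: acquiring abnormal data (101); constructing a root cause search tree according to the dimensions (102); acquiring the first-dimension factor combinations associated with a first dimension node in the root cause search tree (103); calculating a first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations (104); increasing a first threshold when the first possibility parameter is greater than the first threshold and the number of first dimensions is not greater than a second threshold (105); and, taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeating the process of calculating the first possibility parameter and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and then determining the abnormal factor combination from the first-dimension factor combinations (106).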

Description

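Root cause determination method and apparatus
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202110130846.7 filed on January 29, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer technology, and in particular to a root cause determination method and apparatus.
Background
When a business indicator becomes abnormal, searching for abnormal factors across multiple dimensions is a common scenario. Combinations of factors under different dimensions are traversed and ranked by a score, and the most likely abnormal root-cause factors are finally output.
Existing schemes for multi-dimensional cross search of abnormal factors require the user to input a hyperparameter that controls the final result. The final result is extremely sensitive to this hyperparameter, so the hyperparameter has a large influence on what is returned. In particular, when the hyperparameter is set too low, the scheme may satisfy its stopping condition very early and stop searching, so that the returned root cause is likely to be single-dimensional rather than a cross of dimensions. The purpose of multi-dimensional cross root cause analysis, however, is precisely to find finer-grained root-cause factors, so returning only a single-dimensional root cause clearly does not meet expectations.
It follows that a hyperparameter chosen by experience may prevent the expected root cause from being returned, and that such a hyperparameter does not generalize: different business scenarios require different preset hyperparameters, so existing schemes for multi-dimensional cross search of abnormal factors cannot accommodate all business scenarios.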
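Summary
Embodiments of the present disclosure provide a root cause determination method and apparatus.
According to a first aspect of the embodiments of the present disclosure, a root cause determination method is provided, the method including:
acquiring abnormal data, the abnormal data including dimensions and the factors included in the dimensions;
constructing a root cause search tree according to the dimensions, the root cause search tree including at least one dimension node layer, each dimension node layer including at least one dimension node, each dimension node being associated with at least one dimension, and the number of dimensions associated with a dimension node being equal to the number of the dimension node layer in which the node is located;
acquiring first-dimension factor combinations associated with a first dimension node in the root cause search tree, the first dimension node being associated with at least one first dimension, each first-dimension factor combination including one factor of each first dimension, and the first dimension node being any dimension node in the root cause search tree;
calculating a first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
in response to the first possibility parameter being greater than a first threshold and the number of first dimensions being not greater than a second threshold, increasing the first threshold;
taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeating the process of calculating the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and determining the abnormal factor combination from the first-dimension factor combinations.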
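In some embodiments, the method further includes:
in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being greater than the second threshold, determining the abnormal factor combination from the first-dimension factor combinations.
In some embodiments, in the case where the new first threshold is not less than a preset value, the method further includes:
in response to the recalculated first possibility parameter being greater than the new first threshold and the number of first dimensions being not greater than the second threshold, determining the abnormal factor combination from the first-dimension factor combinations.
In some embodiments, the method further includes:
in response to the first possibility parameter being not greater than the first threshold, acquiring second-dimension factor combinations associated with a second dimension node, the second dimension node being associated with at least one second dimension, and each second-dimension factor combination including one factor of each second dimension;
calculating a second possibility parameter that an abnormal factor combination exists among the second-dimension factor combinations.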
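In some embodiments, the method further includes:
determining abnormal target combinations, where a target combination is a factor combination associated with the Nth-layer dimension node, and N is the number of dimensions of the abnormal data;
the calculating of the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations includes:
calculating a target proportion of each first-dimension factor combination, where the target proportion of the i-th first-dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first-dimension factor combination, i is a positive integer with i ∈ [1, M], M ∈ [1, N], and M is the number of first-dimension factor combinations associated with the first dimension node;
calculating, according to the target proportions of the first-dimension factor combinations, the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations.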
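The target proportion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: factor combinations and target combinations are modelled as tuples of factor names, "associated" is read as "the target combination contains every factor of the factor combination", and the name `target_proportion` is hypothetical rather than taken from the disclosure.

```python
def target_proportion(factor_combo, target_combos, abnormal_targets):
    """Share of abnormal target combinations among those associated with factor_combo.

    Assumption: a target combination is associated with a factor combination
    when it contains every factor of that combination.
    """
    wanted = set(factor_combo)
    associated = [t for t in target_combos if wanted <= set(t)]
    if not associated:
        return 0.0  # assumption: no associated target combination means proportion 0
    hits = sum(1 for t in associated if t in abnormal_targets)
    return hits / len(associated)


# Two dimensions A = {a1, a2} and B = {b1, b2}; both a1 target combinations are abnormal.
targets = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
abnormal = {("a1", "b1"), ("a1", "b2")}
proportions = {c: target_proportion(c, targets, abnormal)
               for c in [("a1",), ("a2",), ("b1",)]}
```

A factor combination whose associated target combinations are all abnormal receives the highest proportion, which is why the later ranking step sorts by this value first.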
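In some embodiments, the method further includes:
deleting the target combinations that meet a preset condition to obtain first remaining target combinations;
where the preset condition includes that the change of a target object does not match the abnormal direction of the target indicator, the target object is the first-type indicator values of the target combination collected at different times, the target indicator is the business indicator associated with the dimensions, and the first-type indicator value is the value of the target indicator;
the determining of abnormal target combinations includes:
determining the abnormal target combinations among the first remaining target combinations.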
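In some embodiments, the determining of abnormal target combinations includes:
acquiring the offsets of the target combinations;
drawing a first offset distribution graph, where the horizontal axis of the first offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset on the horizontal axis;
determining a first inflection point in the first offset distribution graph;
if, among all the target combinations, the proportion of target combinations whose offset is greater than a first target offset is not greater than a fifth threshold, determining the target combinations whose offset is greater than the first target offset as abnormal target combinations, where the first target offset is the abscissa of the first inflection point in the first offset distribution graph.
In some embodiments, the determining of the first inflection point in the first offset distribution graph includes:
calculating, according to a first preset formula S = min(m, L/n), the sensitivity parameter S of an inflection point detection algorithm based on the elbow rule, where L is the total number of target combinations involved in the first offset distribution graph, and m and n are preset constants;
determining the first inflection point in the first offset distribution graph using the inflection point detection algorithm based on the elbow rule.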
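The pieces around the inflection point can be sketched as follows. The sketch builds the offset distribution curve, evaluates S = min(m, L/n), and applies the fifth-threshold check for a given inflection abscissa. It deliberately does not implement the elbow-rule detection itself; `knee_x` is assumed to come from whatever detector is used. All function names, and the values chosen for m, n and the fifth threshold in the usage lines, are illustrative assumptions.

```python
from bisect import bisect_left


def offset_distribution_curve(offsets):
    """Return (xs, ys): for each distinct offset x, the number of offsets strictly below x."""
    ordered = sorted(offsets)
    xs = sorted(set(ordered))
    ys = [bisect_left(ordered, x) for x in xs]
    return xs, ys


def sensitivity(total_combinations, m, n):
    """First preset formula S = min(m, L/n); m and n are preset constants."""
    return min(m, total_combinations / n)


def abnormal_above_knee(offset_by_target, knee_x, fifth_threshold):
    """Targets with offset > knee_x, or None when their share exceeds fifth_threshold."""
    above = [t for t, off in offset_by_target.items() if off > knee_x]
    if len(above) / len(offset_by_target) <= fifth_threshold:
        return above
    return None  # the caller would trim the smallest offsets and redraw the curve


offsets = {"t1": 1.0, "t2": 1.0, "t3": 2.0, "t4": 9.0}
xs, ys = offset_distribution_curve(offsets.values())
s = sensitivity(len(offsets), m=1.0, n=100.0)   # hypothetical constants
picked = abnormal_above_knee(offsets, knee_x=2.0, fifth_threshold=0.3)
```

Returning `None` when too many target combinations lie above the inflection point mirrors the branch in which the smallest offsets are removed and a second curve is drawn.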
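In some embodiments, the determining of abnormal target combinations further includes:
if, among all the target combinations, the proportion of target combinations whose offset is greater than the first target offset is greater than the fifth threshold, sorting the target combinations in ascending order of offset to obtain a first ranking;
removing the first preset number of target combinations at the front of the first ranking to obtain second remaining target combinations;
drawing a second offset distribution graph according to the offsets of the second remaining target combinations obtained this time, where the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
determining a second inflection point in the second offset distribution graph;
if, among the second remaining target combinations obtained this time, the proportion of target combinations whose offset is greater than a second target offset is not greater than the fifth threshold, determining the target combinations whose offset is greater than the second target offset as abnormal target combinations, where the second target offset is the abscissa of the second inflection point in the second offset distribution graph.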
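In some embodiments, the calculating, according to the target proportions of the first-dimension factor combinations, of the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations includes:
sorting the first-dimension factor combinations in descending order of target proportion to obtain a second ranking;
selecting the first second-preset-number of factor combinations in the second ranking as to-be-processed factor combinations;
sorting the to-be-processed factor combinations to obtain a third ranking;
taking the first third-preset-number of factor combinations in the third ranking as candidate factor combinations;
acquiring the target combinations associated with the candidate factor combinations, where a target combination associated with a candidate factor combination includes the factors in that candidate factor combination;
calculating, according to the target combinations associated with the candidate factor combinations, the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
where the second preset number is greater than the third preset number.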
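In some embodiments, the sorting of the to-be-processed factor combinations to obtain the third ranking includes:
in the case where the target indicator is a native indicator, calculating a first parameter of each to-be-processed factor combination and sorting the to-be-processed factor combinations in descending order of the first parameter to obtain the third ranking, where the target indicator is the business indicator associated with the dimensions, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
in the case where the target indicator is a derived indicator, acquiring a first value of each first target combination, where a first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, the second-type indicator value is the value of a first indicator, and the first indicator is the indicator used as the numerator when calculating the target indicator;
acquiring a second value of each first target combination, where the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, the third-type indicator value is the value of a second indicator, and the second indicator is the indicator used as the denominator when calculating the target indicator;
calculating a third value of each first target combination, the third value being the sum of the first value and the second value of the same first target combination;
calculating a second parameter of each to-be-processed factor combination, the second parameter being the sum of the third values of the first target combinations associated with the same to-be-processed factor combination;
sorting the to-be-processed factor combinations in descending order of the second parameter to obtain the third ranking.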
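For the derived-indicator branch, the ranking key is the sum over associated target combinations of |Δnumerator| + |Δdenominator|. A minimal Python sketch follows, assuming each target combination is described by a tuple `(numerator_t1, numerator_t2, denominator_t1, denominator_t2)`; the tuple layout and the function names are assumptions made for illustration.

```python
def second_parameter(records):
    """Sum of third values, each third value being |numerator change| + |denominator change|."""
    return sum(abs(n2 - n1) + abs(d2 - d1) for n1, n2, d1, d2 in records)


def third_ranking(pending):
    """Sort to-be-processed factor combinations by second parameter, largest first."""
    return sorted(pending, key=lambda combo: second_parameter(pending[combo]), reverse=True)


# Hypothetical data for a ratio indicator such as clicks / impressions.
pending = {
    ("a1",): [(10, 4, 100, 90)],                      # 6 + 10 = 16
    ("a2",): [(10, 9, 100, 100), (5, 5, 50, 49)],     # (1 + 0) + (0 + 1) = 2
}
ranking = third_ranking(pending)
```

Using the absolute changes of numerator and denominator, rather than the change of the ratio itself, keeps a small-volume combination with a large ratio swing from outranking a high-volume one.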
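In some embodiments, the determining of the abnormal factor combination from the first-dimension factor combinations includes:
determining the candidate factor combinations as the abnormal factor combination.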
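In some embodiments, the calculating, according to the target combinations associated with the candidate factor combinations, of the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations includes:
calculating a first average value avg1 of the offsets of the first abnormal target combinations, where the first abnormal target combinations are the abnormal target combinations among the first target combinations, and the first target combinations are the target combinations associated with the candidate factor combinations;
calculating a second average value avg2 of the offsets of the first target combinations other than the first abnormal target combinations;
calculating a third parameter a(Z1) according to a second preset formula a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type indicator values of the first abnormal target combinations at a second moment, f(Z) represents the sum of the first-type indicator values of the first target combinations at the second moment, v(Z) represents the sum of the first-type indicator values of the first target combinations at a first moment, the first-type indicator value is the value of the target indicator, the target indicator is the business indicator associated with the dimensions, and the first moment is earlier than the second moment;
calculating the difference between the first-type indicator value of each first abnormal target combination at the first moment and the third parameter, to obtain a fourth parameter corresponding to each first abnormal target combination;
calculating a third average value avg3 of the absolute values of the fourth parameters corresponding to all the first abnormal target combinations;
calculating, according to avg1, avg2, avg3, and a third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
where GPS represents the first possibility parameter.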
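The GPS computation above can be followed step by step in a short Python sketch. It transcribes the two preset formulas literally; the record layout `(value_t1, value_t2, offset, is_abnormal)`, the function name, and the handling of empty or zero denominators are assumptions added only so the example runs.

```python
from statistics import mean


def gps_score(first_targets):
    """GPS = 1 - (avg3 + avg2) / (avg1 + avg2) for the given first target combinations.

    Each record is (value_t1, value_t2, offset, is_abnormal); this layout is
    an assumption made for illustration.
    """
    abnormal = [r for r in first_targets if r[3]]
    normal = [r for r in first_targets if not r[3]]
    if not abnormal:
        return 0.0  # assumption: nothing abnormal, so no evidence for a root cause
    avg1 = mean(r[2] for r in abnormal)
    avg2 = mean(r[2] for r in normal) if normal else 0.0  # assumption for an empty set
    f_z1 = sum(r[1] for r in abnormal)        # f(Z1): abnormal targets, second moment
    f_z = sum(r[1] for r in first_targets)    # f(Z): all first targets, second moment
    v_z = sum(r[0] for r in first_targets)    # v(Z): all first targets, first moment
    if f_z == 0 or avg1 + avg2 == 0:
        return 0.0  # assumption: guard against division by zero
    a_z1 = f_z1 - (f_z1 / f_z) * (f_z - v_z)  # second preset formula
    avg3 = mean(abs(r[0] - a_z1) for r in abnormal)  # mean |fourth parameter|
    return 1 - (avg3 + avg2) / (avg1 + avg2)


records = [
    (10.0, 20.0, 10.0, True),
    (20.0, 40.0, 20.0, True),
    (10.0, 10.0, 0.0, False),
]
score = gps_score(records)
```

The closer avg3 is to zero relative to avg1, the closer the score is to 1, so a candidate whose abnormal target combinations are well explained by the third parameter scores highest.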
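In some embodiments, the method further includes:
acquiring second target combinations associated with the abnormal factor combination, where a target combination is a factor combination associated with the Nth-layer dimension node, N is the number of dimensions of the abnormal data, and the second target combinations include the factors in the abnormal factor combination;
acquiring a fourth value, the fourth value being the sum of the first-type indicator values of the second target combinations at a third moment, where the first-type indicator value is the value of the target indicator, and the target indicator is the business indicator associated with the dimensions;
acquiring a fifth value, the fifth value being the sum of the first-type indicator values of all the target combinations at the third moment;
in response to the ratio of the fourth value to the fifth value being less than a third threshold, performing a preset prompt operation, the preset prompt operation being used to prompt that the abnormal factor is not in the abnormal data.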
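The verification step amounts to a share-of-total check. A minimal Python sketch, in which the function name, the list-based inputs, and the treatment of a zero total are assumptions:

```python
def should_prompt(second_target_values, all_target_values, third_threshold):
    """True when the abnormal factor combination covers too small a share of the indicator."""
    fourth_value = sum(second_target_values)   # second target combinations, third moment
    fifth_value = sum(all_target_values)       # all target combinations, third moment
    if fifth_value == 0:
        return True  # assumption: with no volume at all the result cannot be trusted
    return fourth_value / fifth_value < third_threshold


small_share = should_prompt([1.0, 2.0], [1.0, 2.0, 97.0], third_threshold=0.05)
large_share = should_prompt([50.0], [50.0, 50.0], third_threshold=0.05)
```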
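According to a second aspect of the embodiments of the present disclosure, a root cause determination apparatus is provided, the apparatus including:
a data acquisition module configured to acquire abnormal data, the abnormal data including dimensions and the factors included in the dimensions;
a construction module configured to construct a root cause search tree according to the dimensions, the root cause search tree including at least one dimension node layer, each dimension node layer including at least one dimension node, each dimension node being associated with at least one dimension, and the number of dimensions associated with a dimension node being equal to the number of the dimension node layer in which the node is located;
a first factor combination acquisition module configured to acquire first-dimension factor combinations associated with a first dimension node in the root cause search tree, the first dimension node being associated with at least one first dimension, each first-dimension factor combination including one factor of each first dimension, and the first dimension node being any dimension node in the root cause search tree;
a first possibility parameter calculation module configured to calculate a first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
a threshold increasing module configured to increase a first threshold in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than a second threshold;
an execution module configured to, taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeat the process of calculating the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and determine the abnormal factor combination from the first-dimension factor combinations.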
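In some embodiments, the apparatus further includes:
a first determination module configured to determine the abnormal factor combination from the first-dimension factor combinations in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being greater than the second threshold.
In some embodiments, the apparatus further includes:
a second determination module configured to, after the first threshold has been increased, determine the abnormal factor combination from the first-dimension factor combinations in response to the recalculated first possibility parameter being greater than the new first threshold and the number of first dimensions being not greater than the second threshold.
In some embodiments, the apparatus further includes:
a second factor combination acquisition module configured to acquire, in response to the first possibility parameter being not greater than the first threshold, second-dimension factor combinations associated with a second dimension node, the second dimension node being associated with at least one second dimension, and each second-dimension factor combination including one factor of each second dimension;
a second possibility parameter calculation module configured to calculate a second possibility parameter that an abnormal factor combination exists among the second-dimension factor combinations.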
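In some embodiments, the apparatus further includes:
an abnormal target combination determination module configured to determine abnormal target combinations, where a target combination is a factor combination associated with the Nth-layer dimension node, and N is the number of dimensions of the abnormal data;
the first possibility parameter calculation module includes:
a proportion calculation sub-module configured to calculate a target proportion of each first-dimension factor combination, where the target proportion of the i-th first-dimension factor combination is the proportion of abnormal target combinations among the target combinations associated with the i-th first-dimension factor combination, i is a positive integer with i ∈ [1, M], M ∈ [1, N], and M is the number of first-dimension factor combinations associated with the first dimension node;
a possibility parameter calculation sub-module configured to calculate, according to the target proportions of the first-dimension factor combinations, the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations.
In some embodiments, the apparatus further includes:
a deletion module configured to delete the target combinations that meet a preset condition to obtain first remaining target combinations;
where the preset condition includes that the change of a target object does not match the abnormal direction of the target indicator, the target object is the first-type indicator values of the target combination collected at different times, the target indicator is the business indicator associated with the dimensions, and the first-type indicator value is the value of the target indicator;
when determining the abnormal target combinations, the abnormal target combination determination module is specifically configured to:
determine the abnormal target combinations among the first remaining target combinations.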
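In some embodiments, the abnormal target combination determination module includes:
an offset acquisition sub-module configured to acquire the offsets of the target combinations;
a first drawing sub-module configured to draw a first offset distribution graph, where the horizontal axis of the first offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset on the horizontal axis;
a first inflection point determination sub-module configured to determine a first inflection point in the first offset distribution graph;
a first abnormal target combination determination sub-module configured to, if the proportion of target combinations whose offset is greater than a first target offset among all the target combinations is not greater than a fifth threshold, determine the target combinations whose offset is greater than the first target offset as abnormal target combinations, where the first target offset is the abscissa of the first inflection point in the first offset distribution graph.
In some embodiments, the first inflection point determination sub-module is specifically configured to:
calculate, according to a first preset formula S = min(m, L/n), the sensitivity parameter S of an inflection point detection algorithm based on the elbow rule, where L is the total number of target combinations involved in the first offset distribution graph, and m and n are preset constants;
determine the first inflection point in the first offset distribution graph using the inflection point detection algorithm based on the elbow rule.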
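In some embodiments, the abnormal target combination determination module further includes:
a sorting sub-module configured to, if the proportion of target combinations whose offset is greater than the first target offset among all the target combinations is greater than the fifth threshold, sort the target combinations in ascending order of offset to obtain a first ranking;
a pruning sub-module configured to remove the first preset number of target combinations at the front of the first ranking to obtain second remaining target combinations;
a second drawing sub-module configured to draw a second offset distribution graph according to the offsets of the second remaining target combinations obtained this time, where the horizontal axis of the second offset distribution graph represents the offset, and the vertical axis represents the number of target combinations whose offset is less than the offset represented by the value on the horizontal axis;
a second inflection point determination sub-module configured to determine a second inflection point in the second offset distribution graph;
a second abnormal target combination determination sub-module configured to, if the proportion of target combinations whose offset is greater than a second target offset among the second remaining target combinations obtained this time is not greater than the fifth threshold, determine the target combinations whose offset is greater than the second target offset as abnormal target combinations, where the second target offset is the abscissa of the second inflection point in the second offset distribution graph.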
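In some embodiments, the possibility parameter calculation sub-module is specifically configured to:
sort the first-dimension factor combinations in descending order of target proportion to obtain a second ranking;
select the first second-preset-number of factor combinations in the second ranking as to-be-processed factor combinations;
sort the to-be-processed factor combinations to obtain a third ranking;
take the first third-preset-number of factor combinations in the third ranking as candidate factor combinations;
acquire the target combinations associated with the candidate factor combinations, where a target combination associated with a candidate factor combination includes the factors in that candidate factor combination;
calculate, according to the target combinations associated with the candidate factor combinations, the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
where the second preset number is greater than the third preset number.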
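In some embodiments, when sorting the to-be-processed factor combinations to obtain the third ranking, the possibility parameter calculation sub-module is specifically configured to:
in the case where the target indicator is a native indicator, calculate a first parameter of each to-be-processed factor combination and sort the to-be-processed factor combinations in descending order of the first parameter to obtain the third ranking, where the target indicator is the business indicator associated with the dimensions, the first parameter is the sum of the offsets of the target combinations associated with the same to-be-processed factor combination, and a target combination associated with a to-be-processed factor combination includes the factors in that to-be-processed factor combination;
in the case where the target indicator is a derived indicator, acquire a first value of each first target combination, where a first target combination is a target combination associated with the to-be-processed factor combination, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, the second-type indicator value is the value of a first indicator, and the first indicator is the indicator used as the numerator when calculating the target indicator;
acquire a second value of each first target combination, where the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, the third-type indicator value is the value of a second indicator, and the second indicator is the indicator used as the denominator when calculating the target indicator;
calculate a third value of each first target combination, the third value being the sum of the first value and the second value of the same first target combination;
calculate a second parameter of each to-be-processed factor combination, the second parameter being the sum of the third values of the first target combinations associated with the same to-be-processed factor combination;
sort the to-be-processed factor combinations in descending order of the second parameter to obtain the third ranking.
In some embodiments, when determining the abnormal factor combination from the first-dimension factor combinations, the execution module is specifically configured to:
determine the candidate factor combinations as the abnormal factor combination.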
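In some embodiments, when calculating, according to the target combinations associated with the candidate factor combinations, the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations, the possibility parameter calculation sub-module is specifically configured to:
calculate a first average value avg1 of the offsets of the first abnormal target combinations, where the first abnormal target combinations are the abnormal target combinations among the first target combinations, and the first target combinations are the target combinations associated with the candidate factor combinations;
calculate a second average value avg2 of the offsets of the first target combinations other than the first abnormal target combinations;
calculate a third parameter a(Z1) according to a second preset formula a(Z1) = f(Z1) - (f(Z1)/f(Z)) × (f(Z) - v(Z)), where f(Z1) represents the sum of the first-type indicator values of the first abnormal target combinations at a second moment, f(Z) represents the sum of the first-type indicator values of the first target combinations at the second moment, v(Z) represents the sum of the first-type indicator values of the first target combinations at a first moment, the first-type indicator value is the value of the target indicator, the target indicator is the business indicator associated with the dimensions, and the first moment is earlier than the second moment;
calculate the difference between the first-type indicator value of each first abnormal target combination at the first moment and the third parameter, to obtain a fourth parameter corresponding to each first abnormal target combination;
calculate a third average value avg3 of the absolute values of the fourth parameters corresponding to all the first abnormal target combinations;
calculate, according to avg1, avg2, avg3, and a third preset formula GPS = 1 - (avg3 + avg2)/(avg1 + avg2), the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations;
where GPS represents the first possibility parameter.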
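In some embodiments, the apparatus further includes:
a first verification parameter acquisition module configured to acquire second target combinations associated with the abnormal factor combination, where a target combination is a factor combination associated with the Nth-layer dimension node, N is the number of dimensions of the abnormal data, and the second target combinations include the factors in the abnormal factor combination;
a second verification parameter acquisition module configured to acquire a fourth value, the fourth value being the sum of the first-type indicator values of the second target combinations at a third moment, where the first-type indicator value is the value of the target indicator, and the target indicator is the business indicator associated with the dimensions;
a third verification parameter acquisition module configured to acquire a fifth value, the fifth value being the sum of the first-type indicator values of all the target combinations at the third moment;
a verification module configured to perform a preset prompt operation in response to the ratio of the fourth value to the fifth value being less than a third threshold, the preset prompt operation being used to prompt that the abnormal factor is not in the abnormal data.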
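According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing instructions executable by the processor;
where the processor is configured to execute the instructions to implement the root cause determination method described above.
According to a fourth aspect of the embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to implement the root cause determination method described above.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the root cause determination method described above.
The embodiments of the present disclosure acquire the dimensions and the factors they include, construct a root cause search tree according to the dimensions, take any dimension node in the tree as the first dimension node, acquire the first-dimension factor combinations associated with that node, calculate the first possibility parameter that an abnormal factor combination exists among them, and increase the first threshold in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than the second threshold.
Then, again taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, the process of calculating the first possibility parameter, and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, is repeated until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold; the abnormal factor combination is then determined from the first-dimension factor combinations.
The embodiments of the present disclosure therefore do not require the user to input a hyperparameter. When a first possibility parameter greater than the initially set first threshold is obtained, they check whether the number of first dimensions is greater than the second threshold. If it is not, the threshold was set too low: the method satisfied the threshold condition and would return a result without exploring deeper layers of the root cause search tree. In that case the first threshold is raised by a certain step and the search for an abnormal factor combination is restarted, so that a root cause that crosses dimensions (that is, a multi-dimensional one) can be output. The embodiments can thus return a reasonable result without any user-supplied hyperparameter and are applicable to a variety of business scenarios.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.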
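Brief description of the drawings
FIG. 1 is a flowchart of a root cause determination method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of the root cause search tree in an embodiment of the present disclosure;
FIG. 3 is a flowchart of another root cause determination method according to an exemplary embodiment;
FIG. 4 is a schematic diagram of the offset distribution curve in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the position of the inflection point for different values of the sensitivity parameter S in an embodiment of the present disclosure;
FIG. 6 is the first offset distribution graph mapped so that both the abscissa and the ordinate range from 0 to 1, in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the distance curve in an embodiment of the present disclosure;
FIG. 8 is a flowchart of a specific implementation of the root cause determination method in an embodiment of the present disclosure;
FIG. 9 is a block diagram of a root cause determination apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram of another root cause determination apparatus according to an exemplary embodiment;
FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment;
FIG. 12 is a block diagram of another electronic device according to an exemplary embodiment.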
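Detailed description
FIG. 1 is a flowchart of a root cause determination method according to an exemplary embodiment. The method can be applied to an electronic device such as a server, a computer, or a mobile phone. As shown in FIG. 1, the method includes the following steps 101 to 106.
Step 101: acquire abnormal data.
The abnormal data includes dimensions and the factors included in the dimensions.
Step 102: construct a root cause search tree according to the dimensions.
The root cause search tree includes at least one dimension node layer, each dimension node layer includes at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the dimension node layer in which the node is located.
In addition, if the dimensions associated with a dimension node in layer j belong to the dimensions associated with a dimension node in layer j+1, the layer-j node is a parent of the layer-(j+1) node, where j is a positive integer, j ∈ [1, N], and N is the number of dimensions of the abnormal data. That is, in the embodiments of the present disclosure, the dimensions associated with the parent of a dimension node belong to, or can be regarded as included in, the dimensions associated with that node.
For example, suppose the abnormal data includes three dimensions A, B and C, where dimension A includes two factors a1 and a2, dimension B includes three factors b1, b2 and b3, and dimension C includes four factors c1, c2, c3 and c4. The root cause search tree constructed from dimensions A, B and C is shown in FIG. 2. The tree has three layers: the first layer includes three dimension nodes, namely node A associated with dimension A, node B associated with dimension B, and node C associated with dimension C; the second layer includes node AB associated with dimensions A and B, node AC associated with dimensions A and C, and node BC associated with dimensions B and C; the third layer includes node ABC associated with dimensions A, B and C.
As for parent nodes, in FIG. 2 dimension A is included in dimensions A and B, so node A, associated with dimension A, is a parent of node AB, associated with dimensions A and B.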
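Step 103: acquire the first-dimension factor combinations associated with a first dimension node in the root cause search tree.
The first dimension node is associated with at least one first dimension, each first-dimension factor combination includes one factor of each first dimension, and the first dimension node is any dimension node in the root cause search tree.
Take the root cause search tree of FIG. 2 as an example. In the first layer, the factor combinations associated with node A are the combination consisting of factor a1 and the combination consisting of factor a2; those associated with node B are the combinations consisting of b1, of b2, and of b3; those associated with node C are the combinations consisting of c1, of c2, of c3, and of c4.
In the second layer, the factor combinations associated with node AB consist of one factor chosen from a1 and a2 together with one factor chosen from b1, b2 and b3, giving 6 factor combinations; those associated with node AC consist of one factor chosen from a1 and a2 together with one factor chosen from c1, c2, c3 and c4, giving 8 factor combinations; those associated with node BC consist of one factor chosen from b1, b2 and b3 together with one factor chosen from c1, c2, c3 and c4, giving 12 factor combinations.
In the third layer, the factor combinations associated with node ABC consist of one factor chosen from a1 and a2, one factor chosen from b1, b2 and b3, and one factor chosen from c1, c2, c3 and c4, giving 24 factor combinations.
The factor combinations associated with the dimension node that is associated with the largest number of dimensions are called target combinations. It should be understood that this node is the Nth-layer dimension node of the root cause search tree, where N is the number of dimensions of the abnormal data. For example, the factor combinations associated with node ABC in the third layer of FIG. 2 are the target combinations.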
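The A/B/C example can be reproduced with the Python standard library. The sketch below is one possible representation, not the disclosed implementation: a dimension node is modelled as a tuple of dimension names, layer k holds every k-subset of the dimensions, and the factor combinations of a node are the Cartesian product of its dimensions' factors. The function names are hypothetical.

```python
from itertools import combinations, product


def build_search_tree(dimensions):
    """Map layer number k to the dimension nodes (k-tuples of dimension names) in that layer."""
    names = sorted(dimensions)
    return {k: list(combinations(names, k)) for k in range(1, len(names) + 1)}


def factor_combinations(node, dimensions):
    """All combinations that take exactly one factor from each dimension of the node."""
    return list(product(*(dimensions[d] for d in node)))


def parents(node):
    """Nodes one layer up whose dimensions are all contained in this node."""
    return list(combinations(node, len(node) - 1)) if len(node) > 1 else []


dims = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3", "c4"]}
tree = build_search_tree(dims)
counts = {node: len(factor_combinations(node, dims))
          for layer in tree.values() for node in layer}
target_combinations = factor_combinations(tree[len(dims)][0], dims)
```

With this representation the target combinations are simply the factor combinations of the single node in the last layer.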
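Step 104: calculate the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations.
Possibility parameters correspond one-to-one to dimension nodes: each dimension node in the root cause search tree has one possibility parameter, which represents the probability that an abnormal factor combination exists among the factor combinations associated with that node. In some embodiments, the possibility parameter may be the General Potential Score (GPS). In multi-dimensional cross root cause analysis, the GPS measures how likely a factor combination is to be the root cause; its calculation is described later.
Step 105: in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than the second threshold, increase the first threshold.
Step 106: taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeat the process of calculating the first possibility parameter that an abnormal factor combination exists among the first-dimension factor combinations and of increasing the new first threshold when the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and determine the abnormal factor combination from the first-dimension factor combinations.
Steps 105 and 106 are the process of searching for the abnormal factor combination among the factor combinations associated with the dimension nodes of the root cause search tree. The abnormal factor combination found is the "root cause".
Steps 105 and 106 do not restrict the order in which the dimension nodes are examined. For example, the tree can be traversed layer by layer in ascending order of the number of dimensions, calculating the possibility parameter of each dimension node in turn. Alternatively, a dimension node can be selected at random and its possibility parameter calculated; if the calculation indicates that no abnormal factor combination exists among the factor combinations associated with that node, the next node whose possibility parameter has not yet been calculated is selected at random.
For the root cause search tree of FIG. 2, for instance, the dimension nodes of each layer can be traversed in the order first layer, second layer, third layer.
In some embodiments, for the first layer the possibility parameters are calculated in the order node A, node B, node C; for the second layer, in the order node AB, node AC, node BC.
Furthermore, if the first possibility parameter is greater than the first threshold while the number of first dimensions is not greater than the second threshold, the first threshold was set too low: the method satisfied the threshold condition and would return a result without searching deeper layers of the root cause search tree. In this case the embodiments of the present disclosure raise the first threshold by a certain step and search for the abnormal factor combination again.
If the recalculated first possibility parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, an abnormal factor combination is very likely to exist among the first-dimension factor combinations and the number of first dimensions is reasonable, so the abnormal factor combination can be selected from the first-dimension factor combinations.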
在一些实施例中,所述方法还包括:
响应于所述第一可能性参数大于所述第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
即步骤104计算得到的第一可能性参数大于第一阈值,且第一维度的数量大于第二阈值,说明第一维度因子组合中存在异常因子组合的可能性很大,且第一维度的数量是合理的,则从第一维度因子组合中选择出异常因子组合。
在一些实施例中,在所述新第一阈值不小于预设值的情况下,则所述方法还包括:
响应于重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量不大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
其中,基于多次增大第一阈值,第一阈值不小于预设值,重新计算的第一可能性参数大于新第一阈值、且第一维度的数量不大于第二阈值,则表示异常因子组合中包括的因子所属的维度数量不大于第二阈值也是合理的。
另外,需要说明的是,新第一阈值小于预设值,则重复执行计算第一可能性参数、以及在重新计算的第一可能性参数大于新第一阈值且第一维度的数量不大于第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
在一些实施例中,所述方法还包括:
响应于所述第一可能性参数不大于所述第一阈值,获取第二维度结点关联的第二维度因子组合,所述第二维度结点关联至少一个第二维度,所述第二维度因子组合包括各所述第二维度的一个因子;
计算所述第二维度因子组合中存在异常因子组合的第二可能性参数。
其中,第一可能性参数不大于第一阈值,表示第一维度因子组合中存在异常因子组合的可能性很小,此种情况下则需要重新计算其他维度结点对应的可能性参数。例如在按照维度数量由小到大的顺序对根因查找树逐层遍历的过程中,计算得到某个维度结点的可能性参数小于或等于第 一阈值,则继续遍历剩余的维度结点,即计算这个可能性参数对应的维度结点的下一维度结点的可能性参数,直到计算得到一个大于第一阈值的可能性参数、且这个可能性参数对应的维度结点关联的维度的数量大于第二阈值的情况下,停止遍历。
需要说明的是,在根因查找树中的所有维度结点对应的可能性参数计算完成的情况下,一直未得到大于第一阈值的可能性参数,则表示异常因子组合并不存在于根因查找树中的维度结点关联的因子组合中。
另外,第二可能性参数大于第一阈值、且第二维度的数量大于第二阈值,则说明对于第二维度因子组合中存在异常因子组合的可能性很大,且第二维度的数量是合理的,则可以从第二维度因子组合中选择出异常因子组合。第二可能性参数不大于第一阈值,或者第二可能性参数大于第一阈值且第二维度的数量不大于第二阈值,则计算根因查找树中除第一维度结点和第二维度结点之外的其他维度结点对应的可能性参数。
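As can be seen from the above, the embodiments of the present disclosure can obtain the dimensions and the factors included in the dimensions, build the root cause search tree according to the dimensions, take any dimension node in the root cause search tree as the first dimension node, obtain the first dimension factor combinations associated with the first dimension node, compute the first possibility parameter that an abnormal factor combination exists among the first dimension factor combinations, and, in response to the first possibility parameter being greater than the first threshold and the number of first dimensions being not greater than the second threshold, increase the first threshold.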
然后,再次以根因查找树中的任一维度结点为第一维度结点,以增大后的第一阈值为新第一阈值,重复执行计算第一可能性参数、以及在重新计算的第一可能性参数大于新第一阈值且第一维度的数量不大于第二阈值时增大新第一阈值的过程,直至重新计算的第一可能性参数大于新第一阈值、且第一维度的数量大于第二阈值,从第一维度因子组合中确定异常因子组合。
由此可知,本公开的实施例,无需使用者输入超参数,并且会在计算得到一个大于最初设置的第一阈值的第一可能性参数的情况下,判断第一维度的数量是否大于第二阈值,第一维度的数量不大于第二阈值,则说明阈值设置的过低,方法没有往根因查找树的更深层进行探索就满足阈值条件并返回结果了,而此时,在本公开的实施例中,以一定步长提高第一阈值并重新寻找异常因子组合,从而可以输出维度交叉(即多维度)的根因。因此,本公开的实施例,无需使用者输入任何超参数即可返回合理结果,符合预期,且可以适用于多种业务场景。
图3是根据一示例性实施例示出的一种根因确定方法的流程图。该根因确定方法可以应用于电子设备,该电子设备例如可以为服务器、电脑、手机等。如图3所示,该方法包括以下步骤301-308。
步骤301:获取异常数据。
其中,所述异常数据包括维度以及所述维度包括的因子。
步骤302:根据所述维度,构建根因查找树。
其中,所述根因查找树包括至少一层维度结点层,每一维度结点层包括至少一个维度结点,所述维度结点关联至少一个维度,且所述维度结点关联的维度数量与所述维度结点位于的维度结点层数相同。
另外,第j层的一个维度结点关联的维度属于第j+1层的一个维度结点关联的维度,则该第j层的维度结点为该第j+1层的维度结点的父结点,j为正整数且j∈[1,N],N为所述异常数据的维度数量。即本公开实施例中,一个维度结点的父结点关联的维度,属于该维度结点关联的维度,也可以认为被包括在该维度结点关联的维度中。
例如异常数据包括A、B、C三个维度,A维度包括a1、a2两个因子,B维度包括b1、b2、b3三个因子,C维度包括c1、c2、c3、c4四个因子,则根据A、B、C三个维度,构建的根因查找树,如图2所示。即该根因查找树包括三层,第一层包括3个维度结点,分别为关联A维度的维度结点A、关联B维度的维度结点B以及关联C维度的维度结点C;第二层包括关联A维度和B维度的维度结点AB、关联A维度和C维度的维度结点AC、关联B维度和C维度的维度结点BC;第三层包括关联A、B、C维度组成的维度结点ABC。
其中,对于父结点,例如图2中,A维度包括在A维度和B维度中,因此,关联A维度的维度结点A为关联A维度和B维度的维度结点AB的父结点。
步骤303:获取所述根因查找树中第一维度结点关联的第一维度因子组合。
其中,所述第一维度结点关联至少一个第一维度,所述第一维度因子组合包括各所述第一维度的一个因子,所述第一维度结点为所述根因查找树中的任一维度结点。
例如图2所示的根因查找树,对于第一层:维度结点A关联的因子组合为:a1因子组成的因子组合、a2因子组合的因子组合;维度结点B关联的因子组合为:b1组成的因子组合、b2组成的因子组合、b3组成的因子组合;维度结点C关联的因子组合为:c1组成的因子组合、c2组成的因子组合、c3组成的因子组合、c4组成的因子组合。
针对第二层:维度结点AB关联的因子组合为:从a1、a2两个因子中选出的一个因子、以及从b1、b2、b3三个因子中选出的因子的组成的因子组合,即为6个因子组合;维度结点AC关联的因子组合为:从a1、a2两个因子中选出的一个因子、以及从c1、c2、c3、c4四个因子中选出的因子的组成的因子组合,即为8个因子组合;维度结点BC关联的因子组合为:从b1、b2、b3三个因子中选出的一个因子、以及从c1、c2、c3、c4四个因子中选出的因子的组成的因子组合,即为12个因子组合。
针对第三层:维度结点ABC关联的因子组合为:从a1、a2两个因子中选出的一个因子、从b1、b2、b3三个因子中选出的因子、以及从c1、c2、c3、c4四个因子中选出的因子的组成的因子组合,即为24个因子组合。
其中,关联维度数量最多的维度结点所关联的因子组合,我们称为目标组合,应当理解,关联维度数量最多的维度结点为根因查找树中的第N层维度结点,N为异常数据的维度数量。例如,图2中第三层的维度结点ABC关联的因子组合为目标组合。
步骤304:确定异常目标组合。
其中,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量。例如图2所示的根因查找树,第三层的维度结点关联的因子组合即为目标组合。
所述异常目标组合为偏移量大于预设偏移量的目标组合。所述偏移量为在不同时刻采集的目标组合的第一类指标值之差的绝对值,所述第一类指标值为目标指标的值,所述目标指标为所述维度关联的业务指标,例如所述目标指标为异常指标或者待检测指标。
步骤305:计算每一个所述第一维度因子组合的目标占比。
其中,第i个所述第一维度因子组合的目标占比为:第i个所述第一维度因子组合关联的目标组合中异常目标组合的占比,i为正整数且i∈[1,M],M∈[1,N],M为所述第一维度结点关联的所述第一维度因子组合的数量。
此处需要说明的是,本公开的实施例中,一个因子组合包括的因子,属于一个目标组合包括的因子,则该因子组合与该目标组合相关联。
例如某一个第一维度因子组合关联有10个目标组合,其中存在6个异常目标组合,则该第一维度因子组合的目标占比为0.6。
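The target ratio of step 305 can be sketched as follows. A factor combination is treated as associated with a target combination when all of its factors appear in that target combination; the helper names and the toy data are assumptions for illustration.

```python
def is_associated(combo, target):
    """True if every factor of `combo` belongs to the target combination."""
    return set(combo) <= set(target)


def target_ratio(combo, targets, abnormal_targets):
    """Share of abnormal target combinations among those associated with `combo`."""
    related = [t for t in targets if is_associated(combo, t)]
    if not related:
        return 0.0  # assumption: a combination with no targets gets ratio 0
    return sum(t in abnormal_targets for t in related) / len(related)


# Ten target combinations associated with ("a1",), six of them abnormal.
targets = [("a1", f"x{k}") for k in range(10)]
abnormal_targets = set(targets[:6])
ratio = target_ratio(("a1",), targets, abnormal_targets)
```

Here `ratio` is 0.6, the value given in the example above.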
步骤306:根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数。
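The possibility parameters correspond one-to-one to the dimension nodes; that is, each dimension node in the root cause search tree has one possibility parameter. The possibility parameter of a dimension node represents the probability that an abnormal factor combination exists among the factor combinations associated with that dimension node. In some embodiments, the possibility parameter may be the General Potential Score (GPS). In multi-dimensional cross root cause analysis, the GPS is a value that measures how likely a factor combination is to be the root cause; the method of computing the GPS is described later.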
步骤307:响应于所述第一可能性参数大于第一阈值且所述第一维度的数量不大于第二阈值,增大所述第一阈值。
步骤308:以所述根因查找树中的任一维度结点为第一维度结点、以增大后的第一阈值为新第一阈值,重复执行计算所述第一维度因子组合中存在异常因子组合的第一可能性参数、以及在重新计算的第一可能性参数大于所述新第一阈值且所述第一维度的数量不大于所述第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
其中,上述步骤307和步骤308,即为在根因查找树中的维度结点关联的因子组合中,查找异常因子组合的过程。查找到的异常因子组合即为“根因”。
另外,在上述步骤307和步骤308中,并未限定在根因查找树中的维度结点关联的因子组合中,查找异常因子组合的顺序,例如可以按照维度数量由小到大的顺序逐层对根因查找树进行遍历,即逐层计算每一个维度结点对应的可能性参数;也可以随机选择一个维度结点,并计算该维度结点对应的可能性参数,并在计算得到的该维度结点关联的因子组合中不存在异常因子组合的情况下,随机选择下一个未计算过可能性参数的维度结点。
例如图2所示的根因查找树,可以按照第一层、第二层、第三层的顺序,对每一层中的维度结点进行遍历。
在一些实施例中,针对第一层,按照维度结点A、维度结点B、维度结点C的顺序,计算维度结点对应的可能性参数;针对第二层,按照维度结点AB、维度结点AC、维度结点BC的顺序,计算维度结点对应的可能性参数。
此外,第一可能性参数大于第一阈值且第一维度的数量不大于第二阈值,则说明第一阈值设置的过低,方法没有往根因查找树的更深层进行搜索就满足阈值条件并返回结果了,此时,本公开的实施例,以一定步长提高第一阈值并重新寻找异常因子组合。
重新计算的第一可能性参数大于新第一阈值、且第一维度的数量大于第二阈值,则说明对于第一维度因子组合中存在异常因子组合的可能性很大,且第一维度的数量是合理的,则可以从第一维度因子组合中,选择出异常因子组合。
在一些实施例中,所述方法还包括:
响应于在所述第一可能性参数大于所述第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
即步骤306计算得到的第一可能性参数大于第一阈值,且第一维度的数量大于第二阈值的情况下,说明第一维度因子组合中存在异常因子组合的可能性很大,且第一维度的数量是合理的,则从第一维度因子组合中选择出异常因子组合。
在一些实施例中,在所述新第一阈值不小于预设值的情况下,则所述方法还包括:
响应于重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量不大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
其中,基于多次增大第一阈值,第一阈值不小于预设值,重新计算的第一可能性参数大于新第一阈值、且第一维度的数量不大于第二阈值,则表示异常因子组合中包括的因子所属的维度数量不大于第二阈值也是合理的。
在一些实施例中,所述方法还包括:
响应于所述第一可能性参数不大于所述第一阈值,获取第二维度结点关联的第二维度因子组合,所述第二维度结点关联至少一个第二维度,所述第二维度因子组合包括各所述第二维度的一个因子;
计算所述第二维度因子组合中存在异常因子组合的第二可能性参数。
其中,第一可能性参数不大于第一阈值,表示第一维度因子组合中存在异常因子组合的可能性很小,此种情况下则需要重新计算其他维度结点对应的可能性参数。例如在按照维度数量由小到大的顺序对根因查找树逐层遍历的过程中,计算得到某个维度结点的可能性参数小于或等于第一阈值,则继续遍历剩余的维度结点,即计算这个可能性参数对应的维度结点的下一维度结点的可能性参数,直到计算得到一个大于第一阈值的可能性参数、且这个可能性参数对应的维度结点关联的维度的数量大于第二阈值的情况下,停止遍历。
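It should be noted that if the possibility parameters of all dimension nodes in the root cause search tree have been computed and no possibility parameter greater than the first threshold has been obtained, the abnormal factor combination does not exist among the factor combinations associated with the dimension nodes of the root cause search tree.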
另外,第二可能性参数大于第一阈值、且第二维度的数量大于第二阈值,则说明对于第二维度因子组合中存在异常因子组合的可能性很大,且第二维度的数量是合理的,则可以从第二维度因子组合中选择出异常因子组合。第二可能性参数不大于第一阈值,或者第二可能性参数大于第一阈值且第二维度的数量不大于第二阈值,则计算根因查找树中除第一维度结点和第二维度结点之外的其他维度结点对应的可能性参数。
在一些实施例中,所述方法还包括:
删除符合预设条件的目标组合,得到第一剩余目标组合;
其中,所述预设条件包括目标对象的变化情况与目标指标的异常方向不匹配,所述目标对象为在不同时刻采集的所述目标组合的第一类指标值,所述目标指标为与所述维度关联的业务指标,所述第一类指标值为所述目标指标的值;
所述确定异常目标组合,包括:
确定所述第一剩余目标组合中的异常目标组合。
即本公开的实施例,为了确定异常目标组合,去掉与目标指标异常方向不一致的目标组合。例如,需要确定DAU增加的因子组合,但某些目标组合的DAU,现在的指标值相对于过去的指标值是减小的,则需要去掉该目标组合,从而避免此类目标组合影响后续寻找导致DAU增加的真正原因。
其中,在删除符合预设条件的目标组合的情况下,则确定得到的第一剩余目标组合中的异常目标组合,从而采用第一剩余目标组合中的异常目标组合执行后续的流程。
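The direction-based pruning can be sketched as below, together with the offset defined earlier as the absolute difference between two collected values. The function names, the `(earlier, later)` tuple layout and the choice to keep unchanged target combinations are assumptions of this sketch.

```python
def drop_mismatched(targets, rising):
    """targets: dict mapping a target combination to (earlier_value, later_value).
    Removes target combinations that moved against the abnormal direction."""
    kept = {}
    for name, (earlier, later) in targets.items():
        change = later - earlier
        moved_against = change < 0 if rising else change > 0
        if not moved_against:
            kept[name] = (earlier, later)
    return kept


def offsets_of(targets):
    """Offset = absolute difference between the two collected values."""
    return {name: abs(later - earlier)
            for name, (earlier, later) in targets.items()}


dau = {"t1": (100, 130), "t2": (200, 150), "t3": (50, 50)}
remaining = drop_mismatched(dau, rising=True)  # looking for causes of a DAU rise
remaining_offsets = offsets_of(remaining)
```

When looking for the cause of a DAU increase, `t2` (whose DAU fell) is dropped and only the remaining target combinations go on to the abnormal-target step.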
在一些实施例中,所述确定异常目标组合,包括:
获取所述目标组合的偏移量;
绘制第一偏移量分布曲线图,其中,所述第一偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴表示的偏移量的目标组合的数量;
确定所述第一偏移量分布曲线图中的第一拐点;
在所有所述目标组合中,偏移量大于第一目标偏移量的目标组合的占比不大于第五阈值,则确定偏移量大于所述第一目标偏移量的目标组合为异常目标组合,其中,所述第一目标偏移量为所述第一拐点在所述第一偏移量分布曲线图中的横坐标。
其中,第一偏移量分布曲线图例如可为图4所示。
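In addition, the offset is the absolute value of the difference between the first-type metric values of a target combination collected at different times, the first-type metric value is the value of the target metric, and the target metric is the business metric associated with the dimensions. For example, if the DAU metric value of a target combination collected at some past time is x1 and the DAU metric value of that target combination collected at the current time is x2, the offset of that target combination is |x2-x1|.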
即本公开的实施例中,在得到根因查找树的情况下,会针对每一个目标组合计算偏移量,并根据计算得到的偏移量绘制偏移量分布曲线图,然后通过寻找该偏移量分布曲线图的拐点,来找到确定异常目标组合所需的阈值,从而将大于该阈值的目标组合定义为异常目标组合。
在一些实施例中,所述确定所述第一偏移量分布曲线图中的第一拐点,包括:
根据第一预设公式S=min(m,L/n),计算基于肘部法则的拐点检测算法中的敏感参数S,其中,L为所述第一偏移量分布曲线图中涉及的目标组合的总数量,m和n分别为预先设置的常量;
采用所述基于肘部法则的拐点检测算法,确定所述第一偏移量分布曲线图中的第一拐点。
其中,在基于肘部法则的拐点检测算法寻找拐点的过程中,通过一敏感参数S来控制寻找拐点的保守程度。例如,在偏移量分布曲线图中,在S=1、S=3、S=5、S=10、S=100、S=200的情况下,各个拐点(即图5中虚线与实线的交点)的分布情况如图5所示,由图5可知,S越大,则拐点在偏移量曲线图中的横坐标的值越大,从而使得确定的异常目标组合的数量越少,即S越大越保守。
在一些实施例中,在计算出S的情况下,采用基于肘部法则的拐点检测算法,确定所述第一偏移量分布曲线图中的第一拐点的过程如下所述:
首先,第一偏移量分布曲线图可以映射到横坐标取值范围为0~1,以及纵坐标取值范围为0~1的曲线图中,例如图6所示。其中,在图6所示的曲线图中,每一个点都可以得到其到线段AB的距离,其中,线段AB的两个端点中,A点为图6中所示的曲线的起始点,B点为图6中所示的曲线的终点。则可以根据图6中的曲线中的每一个点到线段AB的距离,得到一个距离曲线图,如图7所示。其中,图7中所示的距离曲线的横轴与图6中所示的曲线的横轴表示的意义相同,即均为偏移量映射到0~1范围内的数值,图7中所示的距离曲线的纵轴表示图6中的曲线上的各个点到线段AB的距离。
其次,得到图7所示的距离曲线后,可以获取图7中所示的距离曲线中,距离大于预先确定的距离阈值的点的数量,记为Q;
然后,比较前述敏感参数S与Q的大小;在S小于或等于Q的情况下,将图7中所示的距离曲线中,距离大于前述距离阈值的点中的第S个点确定为第一目标点;在S大于Q的情况下,将图7中所示的距离曲线中,距离大于前述距离阈值的点中的最后一个点确定为第一目标点;
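Finally, the first target point determined in FIG. 7 is mapped back to FIG. 6, that is, the point in FIG. 6 corresponding to the first target point is found and denoted as the second target point. The point on the first offset distribution curve that corresponds to the second target point in FIG. 6 can then be determined, and that point is the first inflection point of the first offset distribution curve.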
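The chord-distance procedure above can be sketched as follows. This is not a definitive implementation: truncating S to an integer of at least 1, the parameter name `dist_threshold`, and returning `None` when no point is far enough from the chord are all assumptions.

```python
import math
from bisect import bisect_left


def sensitivity(total, m, n):
    """S = min(m, L/n); truncation to an integer >= 1 is an assumption."""
    return max(1, int(min(m, total / n)))


def knee_offset(offsets, m, n, dist_threshold):
    """Return the offset (abscissa) of the chosen inflection point, or None."""
    xs = sorted(offsets)
    if len(xs) < 2 or xs[0] == xs[-1]:
        return None
    # Ordinate: number of target combinations whose offset is smaller than x.
    ys = [bisect_left(xs, x) for x in xs]
    x_span, y_span = xs[-1] - xs[0], ys[-1] - ys[0]
    far = []
    for x, y in zip(xs, ys):
        xn, yn = (x - xs[0]) / x_span, (y - ys[0]) / y_span
        # After normalisation the chord AB runs from (0, 0) to (1, 1).
        if abs(yn - xn) / math.sqrt(2) > dist_threshold:
            far.append(x)
    if not far:
        return None
    s = sensitivity(len(xs), m, n)
    # Take the S-th qualifying point, or the last one when S exceeds Q.
    return far[s - 1] if s <= len(far) else far[-1]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
small_s = knee_offset(sample, m=3, n=1, dist_threshold=0.1)
large_s = knee_offset(sample, m=100, n=1, dist_threshold=0.1)
```

On this sample a larger S moves the inflection point to a larger offset (8 instead of 5), so fewer target combinations are labelled abnormal, which agrees with the statement that a larger S is more conservative.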
在一些实施例中,所述确定异常目标组合,还包括:
在所有所述目标组合中,偏移量大于所述第一目标偏移量的目标组合的占比大于所述第五阈值,则按照偏移量从小到大的顺序,对所述目标组合进行排序,获得第一排序;
将所述第一排序中的前第一预设数量的目标组合去除,得到第二剩余目标组合;
根据本次得到的所述第二剩余目标组合的偏移量,绘制第二偏移量分布曲线图,其中,所述第二偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴上的数值表示的偏移量的目标组合的数量;
确定所述第二偏移量分布曲线图中的第二拐点;
在本次得到的所述第二剩余目标组合中,偏移量大于第二目标偏移量的目标组合的占比不大于所述第五阈值,则将偏移量大于所述第二目标偏移量的目标组合确定为异常目标组合,其中,所述第二目标偏移量为所述第二拐点在所述第二偏移量分布曲线图中的横坐标。
由上述可知,针对偏移量分布曲线可能为凹函数以致找不到拐点的情况(如图4中虚线部分圈中的部分),本公开的实施例循环的以一定比例摒弃掉部分目标组合,然后重新根据剩余的目标组合绘制偏移量分布曲线图,并再次计算拐点,直到根据拐点找到的异常目标组合在剩余目标组合中的占比小于第五阈值(例如50%)为止。
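The repeated trimming described above can be sketched as a loop around any inflection point detector. The callable `find_knee`, the parameter names `max_share` (the fifth threshold) and `drop` (the first preset number), and the stand-in detector used in the example are assumptions for illustration only.

```python
def abnormal_by_trimming(offsets, find_knee, max_share=0.5, drop=1):
    """Discard the smallest offsets until the abnormal share is acceptable.
    find_knee: callable taking sorted offsets and returning the knee offset
    (or None when no inflection point is found)."""
    remaining = sorted(offsets)
    while len(remaining) > drop:
        knee = find_knee(remaining)
        if knee is not None:
            abnormal = [o for o in remaining if o > knee]
            if len(abnormal) / len(remaining) <= max_share:
                return abnormal
        remaining = remaining[drop:]  # remove the `drop` smallest offsets
    return []


# Stand-in detector (purely illustrative): the knee is the smallest offset.
result = abnormal_by_trimming([1, 2, 3, 4], find_knee=lambda xs: xs[0])
```

With the stand-in detector the first two passes label more than half of the remaining target combinations abnormal, so the smallest offsets are removed until the share falls to the 50% limit.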
在一些实施例中,所述根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数,可以包括以下步骤H1~H6。
步骤H1:按照所述目标占比从大到小的顺序,对所述第一维度因子组合进行排序,得到第二排序。
步骤H2:选出所述第二排序中前第二预设数量的待处理因子组合。
步骤H3:对所述待处理因子组合进行排序,获得第三排序。
Step H4: select the top third preset number of factor combinations in the third ranking as the candidate factor combinations.
步骤H5:获取所述备选因子组合关联的目标组合,其中,所述备选因子组合关联的目标组合包括所述备选因子组合中的因子。
步骤H6:根据所述备选因子组合关联的目标组合,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数。
其中,所述第二预设数量大于所述第三预设数量。
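For example, if the first dimension node is associated with 50 first dimension factor combinations, the target ratio of each of these 50 first dimension factor combinations is computed, and the 50 first dimension factor combinations are sorted in descending order of target ratio. The top second preset number (for example, 15) of them are first selected as the pending factor combinations; these pending factor combinations are then sorted again, and the top third preset number (for example, 3) are selected as the candidate factor combinations. The first possibility parameter that an abnormal factor combination exists among the first dimension factor combinations is then computed according to the target combinations associated with the candidate factor combinations.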
其中,最终从一个维度结点关联的因子组合中选出的备选因子组合,可以称为该维度结点关联的备选因子组合。
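Steps H1 to H4 amount to a two-stage ranking, which can be sketched as below. The callables `ratio` and `offset_key` stand in for the target ratio and the offset-based ranking value; these names and the toy numbers are assumptions.

```python
def select_candidates(combos, ratio, offset_key, second_n=15, third_n=3):
    """Rank by target ratio, keep the top `second_n` as pending combinations,
    re-rank those by an offset-based value and keep the top `third_n`."""
    by_ratio = sorted(combos, key=ratio, reverse=True)
    pending = by_ratio[:second_n]
    by_offset = sorted(pending, key=offset_key, reverse=True)
    return by_offset[:third_n]


ratios = {"p": 1.0, "q": 0.9, "r": 0.8, "s": 0.1}
offset_sums = {"p": 5, "q": 500, "r": 300, "s": 9999}
candidates = select_candidates(list(ratios), ratios.get, offset_sums.get,
                               second_n=3, third_n=2)
```

Combination `p` has a perfect target ratio but a tiny offset sum, so the second ranking removes it; `s` has a huge offset sum but never reaches the second stage because its target ratio is low.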
在一些实施例中,所述对所述待处理因子组合进行排序,获得第三排序,包括:
在目标指标为原生指标的情况下,计算所述待处理因子组合的第一参数,并按照所述第一参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序,其中,所述目标指标为所述维度关联的业务指标,所述第一参数为与同一个所述待处理因子组合相关联的目标组合的偏移量之和,所述待处理因子组合相关联的目标组合包括所述待处理因子组合中的因子;
在所述目标指标为衍生指标的情况下,获取每一个第一目标组合的第一数值,所述第一目标组合为所述待处理因子组合关联的目标组合,所述第一数值为所述第一目标组合在不同时刻的第二类指标值之差的绝对值,所述第二类指标值为第一指标的值,所述第一指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分子的指标;
获取每一个所述第一目标组合的第二数值,所述第二数值为所述第一目标组合在所述不同时刻的第三类指标值之差的绝对值,所述第三类指标值为第二指标的值,所述第二指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分母的指标;
计算每一个所述第一目标组合的第三数值,所述第三数值为同一个所述第一目标组合的所述第一数值与所述第二数值之和;
计算所述待处理因子组合的第二参数,所述第二参数为与同一个所述待处理因子组合关联的所述第一目标组合的所述第三数值之和;
按照所述待处理因子组合的所述第二参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序。
在一些实施例中,目标指标为原子指标,则在目标指标的整体波动为上升的情况下:根据f-v的值从大到小的顺序,将上述待处理因子组合再次进行排序;则在目标指标的整体波动为下降时:根据v-f的值从大到小的顺序,将上述待处理因子组合再次进行排序;其中,f表示与待处理因子组合相关联的目标组合在第二时刻的第一类指标值之和,v表示与待处理因子组合相关联的目标组合在第一时刻的第一类指标值之和,第一时刻早于第二时刻。
目标指标为衍生指标,则首先需要确定计算目标指标的过程中作为分子的第一指标nume和作为分母的第二指标deno;
然后,在目标指标的整体波动为上升的情况下,根据f_nume-v_nume+v_deno-f_deno的值由大到小的顺序,将上述待处理因子组合,再次进行排序;在待检测指标的整体波动为下降的情况下,根据v_nume-f_nume+f_deno-v_deno的值,将上述待处理因子组合,再次进行排序。
其中,f_nume表示与待处理因子组合相关联的目标组合在第二时刻的第二类指标值(即第一指标nume的值)之和,v_nume表示与待处理因子组合相关联的目标组合在第一时刻的第二类指标值之和,f_deno表示与待处理因子组合相关联的目标组合在第二时刻的第三类指标值(即第二指标deno的值)之和,v_deno表示与待处理因子组合相关联的目标组合在第一时刻的第三类指标值之和。
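The ranking values for the two kinds of target metric can be written as two small functions. The function names and the boolean `rising` flag are illustrative assumptions; the expressions follow the text above.

```python
def native_key(f, v, rising):
    """f: sum at the second (later) time; v: sum at the first (earlier) time."""
    return f - v if rising else v - f


def derived_key(f_nume, v_nume, f_deno, v_deno, rising):
    """Ranking value for a derived metric computed as numerator / denominator."""
    if rising:
        return f_nume - v_nume + v_deno - f_deno
    return v_nume - f_nume + f_deno - v_deno
```

For a rising derived metric, a combination scores highly when its numerator grew and its denominator shrank, since both movements push the ratio upwards.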
由上述可知,上述所述的对待处理因子组合进行排序的过程,实际是根据待处理因子组合关联的目标组合的相关的偏移量进行的。
其中,在根据因子组合的目标占比来排序的情况下,很有可能排在前面的都是一些本身关联的目标组合数量就很小的因子组合。例如,该因子组合关联的目标组合只有一个,而这一个又恰好是异常的,那么该占比将会是1/1=1,这样的因子组合一定会排在最前面。由此可见,这样的因子组合关联的目标组合的数量很少,但目标占比却很大。在选出这样的因子组合作为计算一个维度结点的GPS的备选因子组合的情况下,则备选因子组合关联的目标组合的数量总和会比较小。
而一个维度结点关联的因子组合关联的目标组合的数量之和较小的情况下,该维度结点的GPS会比较高。因此,在选择关联目标组合数量较少但目标占比比较大的因子组合,作为计算一个维度结点的GPS的备选因子组合,则会使得计算得到的这个维度结点的GPS偏高,从而无法准确表示该维度节点关联的因子组合中存在异常因子组合的概率,进而导致查找到的异常因子组合不准确。其中,在一个维度结点关联的因子组合较少的情况下,该维度结点的GPS会比较高的具体原因会在下文描述。
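In the embodiments of the present disclosure, the factor combinations are first sorted by target ratio, and the top second preset number of factor combinations in the descending target-ratio ranking are retained. The "offset" is then introduced to sort these retained factor combinations a second time, thereby excluding the cases in which a very small number of target combinations leads to a very large target ratio.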
在一些实施例中,所述从所述第一维度因子组合中确定异常因子组合,包括:
将所述备选因子组合,确定为所述异常因子组合。
由上述可知,采用上述步骤H1~H6,在计算第一维度因子组合中存在异常因子组合的第一可能性参数的情况下,则可以将上述备选因子组合确定为异常因子组合,即确定为导致目标指标异常的因子组合。
在一些实施例中,所述根据所述备选因子组合关联的目标组合,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数,包括:
计算第一异常目标组合的偏移量的第一平均值avg1,其中,所述第一异常目标组合为第一目标组合中的异常目标组合,所述第一目标组合为所述备选因子组合相关联的目标组合;
计算所述第一目标组合中除所述第一异常目标组合之外的其他目标组合的偏移量的第二平均值avg2;
根据第二预设公式a(Z1)=f(Z1)-f(Z1)/f(Z)(f(Z)-v(Z)),计算第三参数a(Z1),其中,f(Z1)表示所述第一异常目标组合在第二时刻的第一类指标值的和,f(Z)表示所述第一目标组合在所述第二时刻的所述第一类指标值的和,v(Z)表示所述第一目标组合在第一时刻的所述第一类指标值的和,所述第一类指标值为目标指标的值,所述目标指标为所述维度关联的业务指标,所述第一时刻早于所述第二时刻;
计算每一个所述第一异常目标组合在所述第一时刻的所述第一类指标值与所述第三参数之差,得到与每一个所述第一异常目标组合对应的第四参数;
计算所有所述第一异常目标组合对应的第四参数的绝对值的第三平均值avg3;
根据所述avg1、avg2、avg3、以及第三预设公式GPS=1-(avg3+avg2)/(avg1+avg2),计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
其中,GPS表示所述第一可能性参数。
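For example, suppose the candidate factor combinations associated with the first dimension node are associated with 10 target combinations in total, of which 5 are abnormal target combinations and 5 are normal target combinations. First, the first average avg1 of the offsets of the 5 abnormal target combinations is computed; then the second average avg2 of the offsets of the 5 normal target combinations is computed. Next, the sum of the first-type metric values of the 5 abnormal target combinations collected at the second time is computed to obtain f(Z1); the sum of the first-type metric values of the 10 target combinations collected at the second time is computed to obtain f(Z); and the sum of the first-type metric values of the 10 target combinations collected at the first time is computed to obtain v(Z), where the first time is earlier than the second time. f(Z1), f(Z) and v(Z) are then substituted into the second preset formula a(Z1)=f(Z1)-f(Z1)/f(Z)(f(Z)-v(Z)) to obtain a(Z1). Next, for each of the 5 abnormal target combinations, the difference between its first-type metric value collected at the first time and a(Z1) is computed, and the third average avg3 of the absolute values of these differences is computed. Finally, avg1, avg2 and avg3 are substituted into the formula GPS=1-(avg3+avg2)/(avg1+avg2) to obtain the first possibility parameter GPS that an abnormal factor combination exists among the first dimension factor combinations.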
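The GPS computation can be sketched by following the second and third preset formulas literally, reading the second formula as f(Z1) minus the product of f(Z1)/f(Z) and (f(Z)-v(Z)). The data layout `(v, f)` per target combination, the function name `gps`, and the fallbacks for empty groups or a zero denominator are assumptions of this sketch.

```python
def gps(abnormal, normal):
    """abnormal / normal: lists of (v, f) pairs for the target combinations
    associated with the candidate factor combinations, where v is the value
    at the first (earlier) time and f the value at the second (later) time."""
    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    avg1 = mean(abs(f - v) for v, f in abnormal)  # offsets of abnormal targets
    avg2 = mean(abs(f - v) for v, f in normal)    # offsets of normal targets

    f_z1 = sum(f for _, f in abnormal)
    f_z = sum(f for _, f in abnormal + normal)
    v_z = sum(v for v, _ in abnormal + normal)
    if f_z == 0 or avg1 + avg2 == 0:
        return 0.0  # assumption: degenerate input scores zero

    a_z1 = f_z1 - f_z1 / f_z * (f_z - v_z)        # second preset formula
    avg3 = mean(abs(v - a_z1) for v, _ in abnormal)
    return 1 - (avg3 + avg2) / (avg1 + avg2)      # third preset formula


concentrated = gps(abnormal=[(100, 150)], normal=[])
diluted = gps(abnormal=[(100, 150)], normal=[(100, 100), (200, 200)])
```

When the single abnormal target combination accounts for the whole change, the score is 1; adding unchanged target combinations to the same candidate set lowers it to about one third in this toy data.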
在一些实施例中,所述方法还包括如下步骤K1-K4。
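Step K1: obtain the second target combinations associated with the abnormal factor combination, where a target combination is a factor combination associated with a dimension node in the N-th layer, N is the number of dimensions of the abnormal data, and the second target combinations include the factors in the abnormal factor combination.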
步骤K2:获取第四数值,所述第四数值为所述第二目标组合在第三时刻的第一类指标值的和,所述第一类指标值为目标指标的取值,所述目标指标为与所述维度关联的业务指标。
步骤K3:获取第五数值,所述第五数值为所有所述目标组合在所述第三时刻的所述第一类指标值的和。
步骤K4:响应于在所述第四数值与所述第五数值的比值小于第三阈值,执行预设提示操作,所述预设提示操作用于提示异常因子未处于所述异常数据中。
例如对于图2所示的根因查找树,经过上述步骤找到异常因子组合为a1、b1因子组成的第一因子组合、a1和b2组成的第二因子组合以及a2和b3组成的第三因子组合,则与第一因子组合相关联的目标组合包括:a1、b1、c1组成的目标组合,a1、b1、c2组成的目标组合,a1、b1、c3组成的目标组合,a1、b1、c4组成的目标组合,即与第一因子组合相关联的目标组合包括四个目标组合;同理,与第二因子组合相关联的目标组合包括四个目标组合,与第三因子组合相关联的目标组合包括四个目标组合。则得到的异常因子组合相关联的目标组合包括12个。
其中,上述12个目标组合的第一类指标值(即目标指标的值)之和、所有目标组合的第一类指标值之和,在前者与后者的比值小于第三阈值的情况下,表示寻找到的异常因子组合关联的目标组合的指标值之和占比过小。例如目标指标为日活跃用户数量(Daily Active User,DAU),而某一次采集的数据显示在图2中所示的根因查找树的所有目标组合的DAU之和为2亿,而上述12个目标组合的DAU之和为2000,则2000远远小于2亿,则说明寻找到的导致DAU异常的因子组合占比过小,从而可以说明异常因子未处于上述根因查找树中的维度结点关联的因子组合中,从而以免输出的因子组合错误引导使用者。
即本公开的实施例中,选出的异常因子组合关联的目标组合的第一类指标值之和在总指标值(即所有目标组合的第一类指标值之和)中的占比,来判断真实根因是否在根因查找树中的维度结点关联的因子组合中。选出的异常因子组合关联的目标组合的第一类指标值之和在总指标值中的占比过小,则提示使用者根因并不在根因查找树中的维度结点关联的因子组合中,以免错误的引导使用者。由此可见,本公开的实施例,还考虑了根因不在根因查找树中的维度结点关联的因子组合中的可能性。
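The output check of steps K1 to K4 can be sketched as a share computation. The function name `covered_share` and the dictionary layout of metric values are assumptions; the DAU figures mirror the example above.

```python
def covered_share(abnormal_combos, target_values):
    """Share of the total metric value carried by the target combinations
    associated with any of the abnormal factor combinations.
    target_values: dict mapping a target combination (tuple) to its value."""
    total = sum(target_values.values())
    if total == 0:
        return 0.0  # assumption: an empty total is treated as zero share
    covered = sum(
        value for target, value in target_values.items()
        if any(set(combo) <= set(target) for combo in abnormal_combos)
    )
    return covered / total


dau = {("a1", "b1", "c1"): 2_000, ("a2", "b2", "c1"): 199_998_000}
share = covered_share([("a1", "b1")], dau)
third_threshold = 0.01  # illustrative value
if share < third_threshold:
    print("The root cause may not be in the current dimensions.")
```

Here 2,000 out of 200 million is far below the illustrative threshold, so the prompt is shown instead of the factor combination.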
其中,判断选出的异常因子组合是不是根因,实际上是通过该异常因子组合关联的每一个目标组合的变化百分比、与该异常因子组合的变化百分比之间的差距来判断的。其中,目标组合的变化百分比为目标组合的第一类指标值在不同时刻的变化百分比,异常因子组合的变化百分比为:与异常因子组合关联的所有目标组合的第一类指标值之和在不同时刻的变化百分比。
即在认为选出的异常因子组合属于根因的情况下,那么该异常因子组合的变化百分比与该异常因子组合关联的大部分目标组合的变化百分比相差不大。例如选出的异常因子组合为3个,这三个异常因子组合关联的目标组合的数量为6个,那么在这3个异常因子组合属于根因的情况下,基于这6个目标组合的DAU之和上涨15%,这6个目标组合中每一个目标组合的DAU上涨幅度与15%相差不大。
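For example, the probability that the first-type metric values of two target combinations both change by a similar magnitude is far greater than the probability that the first-type metric values of 100 target combinations all change by a similar magnitude. Therefore, when the total number of target combinations associated with the factor combinations of a dimension node is small, the GPS of that dimension node tends to be high. It follows that a dimension node with deeper dimension crossing (that is, a dimension node associated with more dimensions) tends to have a higher GPS score, because the factor combinations associated with it are associated with fewer target combinations.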
其中,在按照维度数量由小到大的顺序,逐层对根因查找树进行遍历寻找异常因子组合的过程中,并不是选择全局最优,而是随着维度交叉越来越深,通过一个第一阈值(threshold)去由浅至深的寻找导致目标指标异常的根因。因此理论上即使维度交叉比较浅(比如只有两个维度交叉且GPS为0.75),只要它的GPS大于这个阈值,它仍然会优先于维度交叉较深但是GPS分数较高(比如有三个维度交叉且GPS为0.8)的这种组合。
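However, when the abnormal factor combination is not among the factor combinations associated with the dimension nodes of the root cause search tree, for example when the fluctuation of the metric is caused not by the internal business but by some action of a competing product, the real root cause dimension does not exist in the root cause search tree, so in theory every dimension cross combination has a low GPS score. As described above, because dimension nodes with deeper dimension crossing tend to have higher GPS scores, the dimension node associated with the abnormal factor combination found in this case lies in a relatively deep layer of the root cause search tree. The number of target combinations associated with the abnormal factor combination found is therefore small, so that the sum of the first-type metric values of the target combinations associated with the returned abnormal factor combination accounts for only a small proportion of the total metric value.
For the above reasons, if the returned root cause is too deep and the sum of the first-type metric values of the target combinations associated with the returned abnormal factor combination accounts for a small proportion of the total metric value, it is highly probable that the real root cause is not among the currently input dimensions. For this case, the embodiments of the present disclosure check the proportion of the sum of the first-type metric values of the target combinations associated with the abnormal factor combination found. If the proportion is too small, a prompt indicating that no root cause was found by this solution is displayed directly, such as "The root cause factors found account for too small a proportion; the root cause you are looking for may not be in the current dimensions". In root cause analysis in real business scenarios, it is very likely that the abnormal factor combination is not among the factor combinations associated with the dimension nodes of the root cause search tree, so this handling can greatly improve the reasonableness of the output.
In addition, it should be noted that the third time may be the same as one of the first time and the second time described above. That is, once the abnormal factor combination is found, steps K1 to K4 may be performed using the first-type metric values collected at the first time or those collected at the second time. Steps K1 to K4 may also be performed using first-type metric values collected at another time.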
综上所述,本公开实施例的根因确定方法的具体实施方式,如图8所示,主要分为四个阶段,即准备阶段、确认异常目标组合阶段、搜索根因阶段、控制结果输出阶段。
首先,在准备阶段主要包括如下过程:
获取目标指标关联的维度以及维度包括的因子,从而根据维度构建根因查找树。
其次,进入确认异常目标组合阶段,其中,确认异常目标组合阶段主要包括如下过程:
计算目标组合(即为根因查找树中最后一层维度结点关联的因子组合)的偏移量,并剔除第一类指标值的变化情况与目标异常方向不一致的目标组合,从而利用剩余的目标组合绘制偏移量分布曲线图,并通过敏感参数S,控制寻找曲线拐点,进而根据拐点确认异常目标组合。
再次,进入搜索根因阶段,搜索根因阶段主要包括如下过程:
按照根因查找树的层级结构,逐层遍历维度结点,并计算遇到的维度结点关联的因子组合中异常目标组合的目标占比,从而按照目标占比从大到小的顺序,对于该维度结点关联的因子组合进行排序,然后选出前15名的待处理因子组合,并对待处理因子组合进行排序,然后选出前三名的备选因子组合,进而根据选出的前三名的备选因子组合关联的目标组合,计算该维度结点的GPS;
其中,在该维度结点的GPS大于第一阈值、且涉及多个维度的情况下,确定是选出的备选因子组合为异常因子组合,并进入控制结果输出阶段;
在该维度结点的GPS大于第一阈值、且涉及一个维度的情况下,提升第一阈值,结束本次遍历,并重新开始遍历根因查找树;
在该维度结点的GPS小于第一阈值的情况下,继续遍历下一维度结点,从而计算下一个维度结点的GPS。
最后,进入控制结果输出阶段,控制结果输出阶段主要包括如下过程:
判断异常因子组合的占比是否符合规定(即判断异常因子组合相关联的目标组合的第一类指标值之和与所有目标组合的第一类指标值之和的比值是否大于一定阈值),在符合规定的情况下,则输出该异常因子组合,否则,提示异常因子组合不在根因查找树中的维度结点关联的因子组合中。
此处需要说明的是,该实施方式中的其他技术细节(例如拐点的寻找方法、对待处理因子组合进行排序的过程、计算GPS的方法)可参见前文所述,此处不再赘述。
由上述可知,本公开的实施例,在剪枝(即对目标组合进行删减)的情况下,做到剪枝后的异常目标组合的数量相对于目标组合的总数在合理的范围内。其中包括考虑指标的涨跌方向,并且保留与指标涨跌方向相同的目标组合。进一步地,根据与指标涨跌方向相同的目标组合的总数量,计算基于肘部法则的拐点检测算法中的敏感参数,从而依据该敏感参数寻找根据目标组合的偏移量绘制的偏移量曲线图中的合适的拐点。另外,还考虑了偏移量分布曲线为凹函数的情况,并对这种情况进行了修正。
此外,在根据因子组合的目标占比的排序选出待处理因子组合的情况下,再引入“待处理因子组合关联的目标组合相关的偏移量”,对待处理因子组合进行排序,从而可以规避因子组合关联的目标组合数量少而导致目标占比的排名虚高的事实。在对结果进行输出的情况下,以一定的步长循环改变预先设置的第一阈值,保证输出的根因是多维度的而非单维度。
同时,通过选出的异常因子组合关联的目标组合的第一类指标值之和在总指标值中的占比,来判断真实根因是否在根因查找树中的维度结点关联的因子组合中。选出的异常因子组合关联的目标组合的第一类指标值之和在总指标值中的占比过小,则提示使用者根因并不在根因查找树中的维度结点关联的因子组合中,以免错误地引导使用者。
因此,本公开的实施例,全方面的提升了原算法在各种业务情况下的鲁棒性,相较于原算法,结果更加合理且具有较强解释性。
此外,现有算法往往会有一些苛刻的假设,如指标的变动只能是上升的,或是目标组合的总数在一个假设的范围内,亦或是真实的根因子一定包含在建立的根因查找树的维度结点关联的因子组合中等等……在面对一些不符合原模型假设的情况下,现有方法会有结果不准确的情况发生。同时,现有的方法要求使用者输入一个超参数来对最终的结果进行控制,由于最终结果对该超参数极其敏感,因此该超参数会对最终结果产生巨大影响。往往通过经验选取的超参数无法泛化,导致应用的广度无法进行高效延伸。
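By contrast, the embodiments of the present disclosure correct the unreasonable parts of the existing methods in pruning, ranking and result output when facing real business data. They also avoid having to select a hyperparameter. Applied to multi-dimensional cross root cause analysis, they can effectively enhance the robustness of the algorithm across different application scenarios, and avoid the problem that schemes for finding abnormal factors through multi-dimensional crossing fail to meet expectations and cannot accommodate various business scenarios.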
图9是根据一示例性实施例示出的一种根因确定装置的框图,如图9所示,该根因确定装置90可以包括:
数据获取模块901,被配置为获取异常数据,所述异常数据包括维度以及所述维度包括的因子;
构建模块902,被配置为根据所述维度,构建根因查找树,所述根因查找树包括至少一层维度结点层,每一维度结点层包括至少一个维度结点,所述维度结点关联至少一个维度,且所述维度结点关联的维度数量与所述维度结点位于的维度结点层数相同;
第一因子组合获取模块903,被配置为获取所述根因查找树中第一维度结点关联的第一维度因子组合,所述第一维度结点关联至少一个第一维度,所述第一维度因子组合包括各所述第一维度的一个因子,所述第一维度结点为所述根因查找树中的任一维度结点;
第一可能性参数计算模块904,被配置为计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
阈值增大模块905,被配置为响应于所述第一可能性参数大于第一阈值且所述第一维度的数量不大于第二阈值,增大所述第一阈值;
执行模块906,被配置为以所述根因查找树中的任一维度结点为第一维度结点、以增大后的第一阈值为新第一阈值,重复执行计算所述第一维度因子组合中存在异常因子组合的第一可能性参数、以及在重新计算的第一可能性参数大于所述新第一阈值且所述第一维度的数量不大于所述第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
由上述可知,本公开的实施例,能够根据待检测指标涉及的目标维度,以及目标维度包括的因子,构建根因查找树,从而按照根因查找树的路径进行遍历,并在每遇到一个维度结点的情况下,计算遇到的维度结点的GPS,并在计算得到的第一GPS大于第一阈值,且第一GPS所属的维度结点涉及的目标维度小于或等于第二阈值的情况下,将第一阈值增大预设步长,并返回前述按照根因查找树的路径进行遍历,并在每遇到一个维度结点的情况下,计算遇到的维度结点的GPS的步骤,直到计算得到的第二GPS大于增大后的第一阈值,且第二GPS所属的维度结点涉及的目标维度的数量大于第二阈值的情况下,停止遍历,并从第二GPS所属的维度结点对应的因子组合中选出第一预设数量的因子组合,作为导致待检测指标异常的因子组合。
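It can thus be seen that the embodiments of the present disclosure do not require the user to input a hyperparameter. When a GPS greater than the initially set first threshold is computed, it is determined whether the number of dimensions involved in the dimension node to which the GPS belongs is greater than the second threshold. If that number is not greater than the second threshold, the threshold was set too low, and the algorithm satisfied the threshold condition and returned a result without exploring deeper layers. In this case, the embodiments of the present disclosure raise the first threshold by a certain step and search the root cause search tree again from top to bottom, so that a root cause with crossed dimensions can be output. Therefore, the embodiments of the present disclosure can return reasonable results without the user inputting any hyperparameter, and are thus applicable to a variety of business scenarios.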
图10是根据一示例性实施例示出的一种根因确定装置的框图,如图10所示,该根因确定装置100可以包括:
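a data acquisition module 1001, configured to acquire abnormal data, the abnormal data including dimensions and the factors included in the dimensions;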
构建模块1002,被配置为根据所述维度,构建根因查找树,所述根因查找树包括至少一层维度结点层,每一维度结点层包括至少一个维度结点,所述维度结点关联至少一个维度,且所述维度结点关联的维度数量与所述维度结点位于的维度结点层数相同;
第一因子组合获取模块1003,被配置为获取所述根因查找树中第一维度结点关联的第一维度因子组合,所述第一维度结点关联至少一个第一维度,所述第一维度因子组合包括各所述第一维度的一个因子,所述第一维度结点为所述根因查找树中的任一维度结点;
第一可能性参数计算模块1004,被配置为计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
阈值增大模块1005,被配置为响应于所述第一可能性参数大于第一阈值且所述第一维度的数量不大于第二阈值,增大所述第一阈值;
执行模块1006,被配置为以所述根因查找树中的任一维度结点为第一维度结点、以增大后的第一阈值为新第一阈值,重复执行计算所述第一维度因子组合中存在异常因子组合的第一可能性参数、以及在重新计算的第一可能性参数大于所述新第一阈值且所述第一维度的数量不大于所述第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
在一些实施例中,所述装置还包括:
第一确定模块1007,被配置为基于增大所述第一阈值,响应于重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量不大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
在一些实施例中,所述装置还包括:
第二确定模块1008,被配置为基于增大所述第一阈值,响应于所述新第一阈值不小于预设值,从所述第一维度因子组合中确定异常因子组合。
在一些实施例中,所述装置还包括:
第二因子组合获取模块1009,被配置为响应于所述第一可能性参数不大于所述第一阈值,获取第二维度结点关联的第二维度因子组合,所述第二维度结点关联至少一个第二维度,所述第二维度因子组合包括各所述第二维度的一个因子;
第二可能性参数计算模块1010,被配置为计算所述第二维度因子组合中存在异常因子组合的第二可能性参数。
在一些实施例中,所述装置还包括:
异常目标组合确定模块1011,被配置为确定异常目标组合,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量;
所述第一可能性参数计算模块1004包括:
占比计算子模块10041,被配置为计算每一个所述第一维度因子组合的目标占比,其中,第i个所述第一维度因子组合的目标占比为,第i个所述第一维度因子组合关联的目标组合中异常目标组合的占比,i为正整数且i∈[1,M],M∈[1,N],M为所述第一维度结点关联的所述第一维度因子组合的数量;
可能性参数计算子模块10042,被配置为根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数。
在一些实施例中,所述装置还包括:
删除模块1012,被配置为删除符合预设条件的目标组合,得到第一剩余目标组合;
其中,所述预设条件包括目标对象的变化情况与目标指标的异常方向不匹配,所述目标对象为在不同时刻采集的所述目标组合的第一类指标值,所述目标指标为与所述维度关联的业务指标,所述第一类指标值为所述目标指标的值;
在所述异常目标组合确定模块1011在确定异常目标组合的情况下,具体被配置为:
确定所述第一剩余目标组合中的异常目标组合。
在一些实施例中,所述异常目标组合确定模块1011包括:
偏移量获取子模块10111,被配置为获取所述目标组合的偏移量;
第一绘制子模块10112,被配置为绘制第一偏移量分布曲线图,其中,所述第一偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴表示的偏移量的目标组合的数量;
第一拐点确定子模块10113,被配置为确定所述第一偏移量分布曲线图中的第一拐点;
第一异常目标组合确定子模块10114,被配置为在所有所述目标组合中,偏移量大于第一目 标偏移量的目标组合的占比不大于第五阈值,则确定偏移量大于所述第一目标偏移量的目标组合为异常目标组合,其中,所述第一目标偏移量为所述第一拐点在所述第一偏移量分布曲线图中的横坐标。
在一些实施例中,所述第一拐点确定子模块10113具体被配置为:
根据第一预设公式S=min(m,L/n),计算基于肘部法则的拐点检测算法中的敏感参数S,其中,L为所述第一偏移量分布曲线图中涉及的目标组合的总数量,m和n分别为预先设置的常量;
采用所述基于肘部法则的拐点检测算法,确定所述第一偏移量分布曲线图中的第一拐点。
在一些实施例中,所述异常目标组合确定模块1011还包括:
排序子模块10115,被配置为在所有所述目标组合中,偏移量大于所述第一目标偏移量的目标组合的占比大于所述第五阈值,则按照偏移量从小到大的顺序,对所述目标组合进行排序,获得第一排序;
删减子模块10116,被配置为将所述第一排序中的前第一预设数量的目标组合去除,得到第二剩余目标组合;
第二绘制子模块10117,被配置为根据本次得到的所述第二剩余目标组合的偏移量,绘制第二偏移量分布曲线图,其中,所述第二偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴上的数值表示的偏移量的目标组合的数量;
第二拐点确定子模块10118,被配置为确定所述第二偏移量分布曲线图中的第二拐点;
第二异常目标组合确定子模块10119,被配置为在本次得到的所述第二剩余目标组合中,偏移量大于第二目标偏移量的目标组合的占比不大于所述第五阈值,则将偏移量大于所述第二目标偏移量的目标组合确定为异常目标组合,其中,所述第二目标偏移量为所述第二拐点在所述第二偏移量分布曲线图中的横坐标。
在一些实施例中,所述可能性参数计算子模块10042具体被配置为:
按照所述目标占比从大到小的顺序,对所述第一维度因子组合进行排序,得到第二排序;
选出所述第二排序中前第二预设数量的待处理因子组合;
对所述待处理因子组合进行排序,获得第三排序;
select the top third preset number of factor combinations in the third ranking as the candidate factor combinations;
获取所述备选因子组合关联的目标组合,其中,所述备选因子组合关联的目标组合包括所述备选因子组合中的因子;
根据所述备选因子组合关联的目标组合,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
其中,所述第二预设数量大于所述第三预设数量。
在一些实施例中,所述可能性参数计算子模块10042在对所述待处理因子组合进行排序,获得第三排序的情况下,具体被配置为:
在目标指标为原生指标的情况下,计算所述待处理因子组合的第一参数,并按照所述第一参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序,其中,所述目标指标为所述维度关联的业务指标,所述第一参数为与同一个所述待处理因子组合相关联的目标组合的偏移量之和,所述待处理因子组合相关联的目标组合包括所述待处理因子组合中的因子;
在所述目标指标为衍生指标的情况下,获取每一个第一目标组合的第一数值,所述第一目标组合为所述待处理因子组合关联的目标组合,所述第一数值为所述第一目标组合在不同时刻的第二类指标值之差的绝对值,所述第二类指标值为第一指标的值,所述第一指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分子的指标;
获取每一个所述第一目标组合的第二数值,所述第二数值为所述第一目标组合在所述不同时刻的第三类指标值之差的绝对值,所述第三类指标值为第二指标的值,所述第二指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分母的指标;
计算每一个所述第一目标组合的第三数值,所述第三数值为同一个所述第一目标组合的所述第一数值与所述第二数值之和;
计算所述待处理因子组合的第二参数,所述第二参数为与同一个所述待处理因子组合关联的所述第一目标组合的所述第三数值之和;
按照所述待处理因子组合的所述第二参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序。
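In some embodiments, when determining the abnormal factor combination from the first dimension factor combinations, the execution module 1006 is specifically configured to: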
将所述备选因子组合,确定为所述异常因子组合。
在一些实施例中,所述可能性参数计算子模块10042在根据所述备选因子组合关联的目标组合,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数的情况下,具体被配置为:
计算第一异常目标组合的偏移量的第一平均值avg1,其中,所述第一异常目标组合为第一目标组合中的异常目标组合,所述第一目标组合为所述备选因子组合相关联的目标组合;
计算所述第一目标组合中除所述第一异常目标组合之外的其他目标组合的偏移量的第二平均值avg2;
根据第二预设公式a(Z1)=f(Z1)-f(Z1)/f(Z)(f(Z)-v(Z)),计算第三参数a(Z1),其中,f(Z1)表示所述第一异常目标组合在第二时刻的第一类指标值的和,f(Z)表示所述第一目标组合在所述第二时刻的所述第一类指标值的和,v(Z)表示所述第一目标组合在第一时刻的所述第一类指标值的和,所述第一类指标值为目标指标的值,所述目标指标为所述维度关联的业务指标,所述第一时刻早于所述第二时刻;
计算每一个所述第一异常目标组合在所述第一时刻的所述第一类指标值与所述第三参数之差,得到与每一个所述第一异常目标组合对应的第四参数;
计算所有所述第一异常目标组合对应的第四参数的绝对值的第三平均值avg3;
根据所述avg1、avg2、avg3、以及第三预设公式GPS=1-(avg3+avg2)/(avg1+avg2),计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
其中,GPS表示所述第一可能性参数。
在一些实施例中,所述装置还包括:
第一验证参数获取模块1013,被配置为获取与所述异常因子组合关联的第二目标组合,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量,所述第一目标组合包括所述异常因子组合中的因子;
第二验证参数获取模块1014,被配置为获取第四数值,所述第四数值为所述第二目标组合在第三时刻的第一类指标值的和,所述第一类指标值为目标指标的取值,所述目标指标为与所述维度关联的业务指标;
第三验证参数获取模块1015,被配置为获取第五数值,所述第五数值为所有所述目标组合在所述第三时刻的所述第一类指标值的和;
验证模块1016,被配置为响应于所述第四数值与所述第五数值的比值小于第三阈值,执行预设提示操作,所述预设提示操作用于提示异常因子未处于所述异常数据中。
由上述可知,本公开的实施例,能够根据待检测指标涉及的目标维度,以及目标维度包括的因子,构建根因查找树,从而按照根因查找树的路径进行遍历,并在每遇到一个维度结点的情况下,计算遇到的维度结点的GPS,并在计算得到的第一GPS大于第一阈值,且第一GPS所属的维度结点涉及的目标维度小于或等于第二阈值的情况下,将第一阈值增大预设步长,并返回前述按照根因查找树的路径进行遍历,并在每遇到一个维度结点的情况下,计算遇到的维度结点的GPS的步骤,直到计算得到的第二GPS大于增大后的第一阈值,且第二GPS所属的维度结点涉及的目标维度的数量大于第二阈值的情况下,停止遍历,并从第二GPS所属的维度结点对应的因子组合中选出第一预设数量的因子组合,作为导致待检测指标异常的因子组合。
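It can thus be seen that the embodiments of the present disclosure do not require the user to input a hyperparameter. When a GPS greater than the initially set first threshold is computed, it is determined whether the number of dimensions involved in the dimension node to which the GPS belongs is greater than the second threshold. If that number is not greater than the second threshold, the threshold was set too low, and the algorithm satisfied the threshold condition and returned a result without exploring deeper layers. In this case, the embodiments of the present disclosure raise the first threshold by a certain step and search the root cause search tree again from top to bottom, so that a root cause with crossed dimensions can be output. Therefore, the embodiments of the present disclosure can return reasonable results without the user inputting any hyperparameter, and are thus applicable to a variety of business scenarios.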
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
根据本公开实施例的第四方面,提供了一种电子设备。参照图11,该电子设备包括:
处理器1110;
用于存储所述处理器可执行指令的存储器1120;
其中,所述处理器被配置为执行所述指令,以实现上述所述的根因确定方法。
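According to a fifth aspect of the embodiments of the present disclosure, an electronic device is further provided. As shown in FIG. 12, the electronic device 1200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.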
参照图12,电子设备1200可以包括以下一个或多个组件:处理组件1202,存储器1204,电源组件1206,多媒体组件1208,音频组件1210,输入/输出(I/O)的接口1212,传感器组件1214,以及通信组件1216。
处理组件1202通常控制电子设备1200的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件1202可以包括一个或多个处理器1220来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件1202可以包括一个或多个模块,便于处理组件1202和其他组件之间的交互。例如,处理组件1202可以包括多媒体模块,以方便多媒体组件1208和处理组件1202之间的交互。
存储器1204被配置为存储各种类型的数据以支持在设备1200的操作。这些数据的示例包括用于在电子设备1200上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器1204可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
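The power component 1206 supplies power to the various components of the electronic device 1200. The power component 1206 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 1200.
The multimedia component 1208 includes a screen that provides an output interface between the electronic device 1200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1208 includes a front camera and/or a rear camera. When the electronic device 1200 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.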
音频组件1210被配置为输出和/或输入音频信号。例如,音频组件1210包括一个麦克风(MIC),在电子设备1200处于操作模式,如呼叫模式、记录模式和语音识别模式的情况下,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器1204或经由通信组件1216发送。在一些实施例中,音频组件1210还包括一个扬声器,用于输出音频信号。
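The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button and a lock button.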
传感器组件1214包括一个或多个传感器,用于为电子设备1200提供各个方面的状态评估。例如,传感器组件1214可以检测到设备1200的打开/关闭状态,组件的相对定位,例如所述组件为电子设备1200的显示器和小键盘,传感器组件1214还可以检测电子设备1200或电子设备1200一个组件的位置改变,用户与电子设备1200接触的存在或不存在,电子设备1200方位或加速/减速和电子设备1200的温度变化。传感器组件1214可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件1214还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件1214还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。
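The communication component 1216 is configured to facilitate wired or wireless communication between the electronic device 1200 and other devices. The electronic device 1200 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1216 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1216 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.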
在本公开的实施例中,电子设备1200可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述所述的根因确定方法。
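In the embodiments of the present disclosure, a non-transitory computer-readable storage medium including instructions is further provided, for example the memory 1204 including instructions, where the instructions can be executed by the processor 1220 of the electronic device 1200 to complete the above method. In some embodiments, for example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.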
在本公开实施的又一方面,本公开实施例还提供了一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行上述所述的根因确定方法。
根据本公开实施例的又一方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机实现上述所述的根因确定方法。
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。

Claims (31)

  1. 一种根因确定方法,包括:
    获取异常数据,所述异常数据包括维度以及所述维度包括的因子;
    根据所述维度,构建根因查找树,所述根因查找树包括至少一层维度结点层,每一维度结点层包括至少一个维度结点,所述维度结点关联至少一个维度,且所述维度结点关联的维度数量与所述维度结点位于的维度结点层数相同;
    获取所述根因查找树中第一维度结点关联的第一维度因子组合,所述第一维度结点关联至少一个第一维度,所述第一维度因子组合包括各所述第一维度的一个因子,所述第一维度结点为所述根因查找树中的任一维度结点;
    计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
    响应于所述第一可能性参数大于第一阈值且所述第一维度的数量不大于第二阈值,增大所述第一阈值;
    以所述根因查找树中的任一维度结点为第一维度结点、以增大后的第一阈值为新第一阈值,重复执行计算所述第一维度因子组合中存在异常因子组合的第一可能性参数、以及在重新计算的第一可能性参数大于所述新第一阈值且所述第一维度的数量不大于所述第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  2. 根据权利要求1所述的方法,还包括:
    响应于所述第一可能性参数大于所述第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  3. 根据权利要求1所述的方法,其中,基于增大所述第一阈值,在所述新第一阈值不小于预设值的情况下,则所述方法还包括:
    响应于重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量不大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  4. 根据所述权利要求1所述的方法,还包括:
    响应于所述第一可能性参数不大于所述第一阈值,获取第二维度结点关联的第二维度因子组合,所述第二维度结点关联至少一个第二维度,所述第二维度因子组合包括各所述第二维度的一个因子;
    计算所述第二维度因子组合中存在异常因子组合的第二可能性参数。
  5. 根据权利要求1所述的方法,还包括:
    确定异常目标组合,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量;
    所述计算所述第一维度因子组合中存在异常因子组合的第一可能性参数,包括:
    计算每一个所述第一维度因子组合的目标占比,其中,第i个所述第一维度因子组合的目标占比为,第i个所述第一维度因子组合关联的目标组合中异常目标组合的占比,i为正整数且i∈[1,M],M∈[1,N],M为所述第一维度结点关联的所述第一维度因子组合的数量;
    根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数。
  6. 根据权利要求5所述的方法,还包括:
    删除符合预设条件的目标组合,得到第一剩余目标组合;
    其中,所述预设条件包括目标对象的变化情况与目标指标的异常方向不匹配,所述目标对象为在不同时刻采集的所述目标组合的第一类指标值,所述目标指标为与所述维度关联的业务指标,所述第一类指标值为所述目标指标的值;
    所述确定异常目标组合,包括:
    确定所述第一剩余目标组合中的异常目标组合。
  7. 根据权利要求5所述的方法,其中,所述确定异常目标组合,包括:
    获取所述目标组合的偏移量;
    绘制第一偏移量分布曲线图,其中,所述第一偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴表示的偏移量的目标组合的数量;
    确定所述第一偏移量分布曲线图中的第一拐点;
    在所有所述目标组合中,偏移量大于第一目标偏移量的目标组合的占比不大于第五阈值,则确定偏移量大于所述第一目标偏移量的目标组合为异常目标组合,其中,所述第一目标偏移量为 所述第一拐点在所述第一偏移量分布曲线图中的横坐标。
  8. 根据权利要求7所述的方法,其中,所述确定所述第一偏移量分布曲线图中的第一拐点,包括:
    根据第一预设公式S=min(m,L/n),计算基于肘部法则的拐点检测算法中的敏感参数S,其中,L为所述第一偏移量分布曲线图中涉及的目标组合的总数量,m和n分别为预先设置的常量;
    采用所述基于肘部法则的拐点检测算法,确定所述第一偏移量分布曲线图中的第一拐点。
  9. 根据权利要求7所述的方法,其中,所述确定异常目标组合,还包括:
    在所有所述目标组合中,偏移量大于所述第一目标偏移量的目标组合的占比大于所述第五阈值,则按照偏移量从小到大的顺序,对所述目标组合进行排序,获得第一排序;
    将所述第一排序中的前第一预设数量的目标组合去除,得到第二剩余目标组合;
    根据本次得到的所述第二剩余目标组合的偏移量,绘制第二偏移量分布曲线图,其中,所述第二偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴上的数值表示的偏移量的目标组合的数量;
    确定所述第二偏移量分布曲线图中的第二拐点;
    在本次得到的所述第二剩余目标组合中,偏移量大于第二目标偏移量的目标组合的占比不大于所述第五阈值,则将偏移量大于所述第二目标偏移量的目标组合确定为异常目标组合,其中,所述第二目标偏移量为所述第二拐点在所述第二偏移量分布曲线图中的横坐标。
  10. 根据权利要求5所述的方法,其中,所述根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数,包括:
    按照所述目标占比从大到小的顺序,对所述第一维度因子组合进行排序,得到第二排序;
    选出所述第二排序中前第二预设数量的待处理因子组合;
    对所述待处理因子组合进行排序,获得第三排序;
    将所述第三排序中前第三预设数量的备选因子组合;
    获取所述备选因子组合关联的目标组合,其中,所述备选因子组合关联的目标组合包括所述备选因子组合中的因子;
    根据所述备选因子组合关联的目标组合,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
    其中,所述第二预设数量大于所述第三预设数量。
  11. 根据权利要求10所述的方法,其中,所述对所述待处理因子组合进行排序,获得第三排序,包括:
    在目标指标为原生指标的情况下,计算所述待处理因子组合的第一参数,并按照所述第一参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序,其中,所述目标指标为所述维度关联的业务指标,所述第一参数为与同一个所述待处理因子组合相关联的目标组合的偏移量之和,所述待处理因子组合相关联的目标组合包括所述待处理因子组合中的因子;
    在所述目标指标为衍生指标的情况下,获取每一个第一目标组合的第一数值,所述第一目标组合为所述待处理因子组合关联的目标组合,所述第一数值为所述第一目标组合在不同时刻的第二类指标值之差的绝对值,所述第二类指标值为第一指标的值,所述第一指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分子的指标;
    获取每一个所述第一目标组合的第二数值,所述第二数值为所述第一目标组合在所述不同时刻的第三类指标值之差的绝对值,所述第三类指标值为第二指标的值,所述第二指标为所述目标指标为衍生指标的情况下,计算所述目标指标的过程中作为分母的指标;
    计算每一个所述第一目标组合的第三数值,所述第三数值为同一个所述第一目标组合的所述第一数值与所述第二数值之和;
    计算所述待处理因子组合的第二参数,所述第二参数为与同一个所述待处理因子组合关联的所述第一目标组合的所述第三数值之和;
    按照所述待处理因子组合的所述第二参数从大到小的顺序,对所述待处理因子组合进行排序,得到第三排序。
  12. 根据权利要求10所述的方法,其中,所述从所述第一维度因子组合中确定异常因子组合,包括:
    将所述备选因子组合,确定为所述异常因子组合。
  13. 根据权利要求10所述的方法,其中,所述根据所述备选因子组合关联的目标组合,计算 所述第一维度因子组合中存在异常因子组合的第一可能性参数,包括:
    计算第一异常目标组合的偏移量的第一平均值avg1,其中,所述第一异常目标组合为第一目标组合中的异常目标组合,所述第一目标组合为所述备选因子组合相关联的目标组合;
    计算所述第一目标组合中除所述第一异常目标组合之外的其他目标组合的偏移量的第二平均值avg2;
    根据第二预设公式a(Z1)=f(Z1)-f(Z1)/f(Z)(f(Z)-v(Z)),计算第三参数a(Z1),其中,f(Z1)表示所述第一异常目标组合在第二时刻的第一类指标值的和,f(Z)表示所述第一目标组合在所述第二时刻的所述第一类指标值的和,v(Z)表示所述第一目标组合在第一时刻的所述第一类指标值的和,所述第一类指标值为目标指标的值,所述目标指标为所述维度关联的业务指标,所述第一时刻早于所述第二时刻;
    计算每一个所述第一异常目标组合在所述第一时刻的所述第一类指标值与所述第三参数之差,得到与每一个所述第一异常目标组合对应的第四参数;
    计算所有所述第一异常目标组合对应的第四参数的绝对值的第三平均值avg3;
    根据所述avg1、avg2、avg3、以及第三预设公式GPS=1-(avg3+avg2)/(avg1+avg2),计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
    其中,GPS表示所述第一可能性参数。
  14. 根据权利要求1所述的方法,还包括:
    获取与所述异常因子组合关联的第二目标组合,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量,所述第一目标组合包括所述异常因子组合中的因子;
    获取第四数值,所述第四数值为所述第二目标组合在第三时刻的第一类指标值的和,所述第一类指标值为目标指标的取值,所述目标指标为与所述维度关联的业务指标;
    获取第五数值,所述第五数值为所有所述目标组合在所述第三时刻的所述第一类指标值的和;
    响应于所述第四数值与所述第五数值的比值小于第三阈值,执行预设提示操作,所述预设提示操作用于提示异常因子未处于所述异常数据中。
  15. 一种根因确定装置,包括:
    数据获取模块,被配置为获取异常数据,所述异常数据包括维度以及所述维度包括的因子;
    构建模块,被配置为根据所述维度,构建根因查找树,所述根因查找树包括至少一层维度结点层,每一维度结点层包括至少一个维度结点,所述维度结点关联至少一个维度,且所述维度结点关联的维度数量与所述维度结点位于的维度结点层数相同;
    第一因子组合获取模块,被配置为获取所述根因查找树中第一维度结点关联的第一维度因子组合,所述第一维度结点关联至少一个第一维度,所述第一维度因子组合包括各所述第一维度的一个因子,所述第一维度结点为所述根因查找树中的任一维度结点;
    第一可能性参数计算模块,被配置为计算所述第一维度因子组合中存在异常因子组合的第一可能性参数;
    阈值增大模块,被配置为响应于所述第一可能性参数大于第一阈值且所述第一维度的数量不大于第二阈值,增大所述第一阈值;
    执行模块,被配置为以所述根因查找树中的任一维度结点为第一维度结点、以增大后的第一阈值为新第一阈值,重复执行计算所述第一维度因子组合中存在异常因子组合的第一可能性参数、以及在重新计算的第一可能性参数大于所述新第一阈值且所述第一维度的数量不大于所述第二阈值时增大所述新第一阈值的过程,直至重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  16. 根据权利要求15所述的装置,还包括:
    第一确定模块,被配置为响应于所述第一可能性参数大于所述第一阈值、且所述第一维度的数量大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  17. 根据权利要求15所述的装置,还包括:
    第二确定模块,被配置为基于增大所述第一阈值,响应于重新计算的第一可能性参数大于所述新第一阈值、且所述第一维度的数量不大于所述第二阈值,从所述第一维度因子组合中确定异常因子组合。
  18. 根据权利要求15所述的装置,还包括:
    第二因子组合获取模块,被配置为响应于所述第一可能性参数不大于所述第一阈值,获取第二维度结点关联的第二维度因子组合,所述第二维度结点关联至少一个第二维度;
    第二可能性参数计算模块,被配置为计算所述第二维度因子组合中存在异常因子组合的第二 可能性参数。
  19. 根据权利要求15所述的装置,还包括:
    异常目标组合确定模块,被配置为确定异常目标组合,所述目标组合为第N层维度结点关联的因子组合,N为所述异常数据的维度数量;
    所述第一可能性参数计算模块包括:
    占比计算子模块,被配置为计算每一个所述第一维度因子组合的目标占比,其中,第i个所述第一维度因子组合的目标占比为,第i个所述第一维度因子组合关联的目标组合中异常目标组合的占比,i为正整数且i∈[1,M],M∈[1,N],M为所述第一维度结点关联的所述第一维度因子组合的数量;
    可能性参数计算子模块,被配置为根据所述第一维度因子组合的目标占比,计算所述第一维度因子组合中存在异常因子组合的第一可能性参数。
  20. 根据权利要求19所述的装置,还包括:
    删除模块,被配置为删除符合预设条件的目标组合,得到第一剩余目标组合;
    其中,所述预设条件包括目标对象的变化情况与目标指标的异常方向不匹配,所述目标对象为在不同时刻采集的所述目标组合的第一类指标值,所述目标指标为与所述维度关联的业务指标,所述第一类指标值为所述目标指标的值;
    所述异常目标组合确定模块在确定异常目标组合的情况下,具体被配置为:
    确定所述第一剩余目标组合中的异常目标组合。
  21. 根据权利要求19所述的装置,其中,所述异常目标组合确定模块包括:
    偏移量获取子模块,被配置为获取所述目标组合的偏移量;
    第一绘制子模块,被配置为绘制第一偏移量分布曲线图,其中,所述第一偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴表示的偏移量的目标组合的数量;
    第一拐点确定子模块,被配置为确定所述第一偏移量分布曲线图中的第一拐点;
    第一异常目标组合确定子模块,被配置为在所有所述目标组合中,偏移量大于第一目标偏移量的目标组合的占比不大于第五阈值,则确定偏移量大于所述第一目标偏移量的目标组合为异常目标组合,其中,所述第一目标偏移量为所述第一拐点在所述第一偏移量分布曲线图中的横坐标。
  22. 根据权利要求21所述的装置,其中,所述第一拐点确定子模块具体被配置为:
    根据第一预设公式S=min(m,L/n),计算基于肘部法则的拐点检测算法中的敏感参数S,其中,L为所述第一偏移量分布曲线图中涉及的目标组合的总数量,m和n分别为预先设置的常量;
    采用所述基于肘部法则的拐点检测算法,确定所述第一偏移量分布曲线图中的第一拐点。
  23. 根据权利要求21所述的装置,其中,所述异常目标组合确定模块还包括:
    排序子模块,被配置为在所有所述目标组合中,偏移量大于所述第一目标偏移量的目标组合的占比大于所述第五阈值,则按照偏移量从小到大的顺序,对所述目标组合进行排序,获得第一排序;
    删减子模块,被配置为将所述第一排序中的前第一预设数量的目标组合去除,得到第二剩余目标组合;
    第二绘制子模块,被配置为根据本次得到的所述第二剩余目标组合的偏移量,绘制第二偏移量分布曲线图,其中,所述第二偏移量分布曲线图的横轴表示偏移量,纵轴表示偏移量小于横轴上的数值表示的偏移量的目标组合的数量;
    第二拐点确定子模块,被配置为确定所述第二偏移量分布曲线图中的第二拐点;
    第二异常目标组合确定子模块,被配置为在本次得到的所述第二剩余目标组合中,偏移量大于第二目标偏移量的目标组合的占比不大于所述第五阈值,则将偏移量大于所述第二目标偏移量的目标组合确定为异常目标组合,其中,所述第二目标偏移量为所述第二拐点在所述第二偏移量分布曲线图中的横坐标。
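  24. The apparatus according to claim 19, wherein the likelihood parameter calculation submodule is specifically configured to:
    sort the first-dimension factor combinations in descending order of target proportion, to obtain a second ordering;
    select a second preset number of pending factor combinations from the head of the second ordering;
    sort the pending factor combinations, to obtain a third ordering;
    determine a third preset number of pending factor combinations at the head of the third ordering as candidate factor combinations;
    acquire the target combinations associated with the candidate factor combinations, wherein a target combination associated with a candidate factor combination includes the factors of that candidate factor combination;
    calculate, according to the target combinations associated with the candidate factor combinations, the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations;
    wherein the second preset number is greater than the third preset number.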
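The two-stage selection of claim 24 can be sketched in a few lines of Python. The sketch assumes the target proportions are already available as a dictionary and that the re-ranking score of the third ordering is supplied as a callable; `select_candidates` and the sample values are hypothetical.

```python
def select_candidates(proportions, rank_score, second_count, third_count):
    """Pick candidate factor combinations in two stages.

    proportions:  factor combination -> target proportion
    rank_score:   callable giving the score used for the third ordering
    second_count: second preset number (must exceed third_count)
    third_count:  third preset number
    """
    if second_count <= third_count:
        raise ValueError("second preset number must exceed the third")
    # Second ordering: descending target proportion.
    second_ordering = sorted(proportions, key=proportions.get, reverse=True)
    pending = second_ordering[:second_count]
    # Third ordering: re-rank only the pending factor combinations.
    third_ordering = sorted(pending, key=rank_score, reverse=True)
    return third_ordering[:third_count]


if __name__ == "__main__":
    proportions = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
    scores = {"a": 1, "b": 5, "c": 3, "d": 100}
    print(select_candidates(proportions, scores.get, 3, 2))
```

Combination `d` has the highest re-ranking score but never reaches the third ordering, because its low target proportion excludes it in the first stage.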
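  25. The apparatus according to claim 24, wherein the likelihood parameter calculation submodule, when sorting the pending factor combinations to obtain the third ordering, is specifically configured to:
    in a case where the target indicator is a native indicator, calculate a first parameter of each pending factor combination, and sort the pending factor combinations in descending order of the first parameter to obtain the third ordering, wherein the target indicator is the business indicator associated with the dimensions, the first parameter is the sum of the offsets of the target combinations associated with the same pending factor combination, and a target combination associated with a pending factor combination includes the factors of that pending factor combination;
    in a case where the target indicator is a derived indicator, acquire a first value of each first target combination, wherein the first target combinations are the target combinations associated with the pending factor combinations, the first value is the absolute value of the difference between the second-type indicator values of the first target combination at different times, the second-type indicator values are values of a first indicator, and the first indicator is the indicator used as the numerator when calculating the target indicator in the case where the target indicator is a derived indicator;
    acquire a second value of each first target combination, wherein the second value is the absolute value of the difference between the third-type indicator values of the first target combination at the different times, the third-type indicator values are values of a second indicator, and the second indicator is the indicator used as the denominator when calculating the target indicator in the case where the target indicator is a derived indicator;
    calculate a third value of each first target combination, the third value being the sum of the first value and the second value of the same first target combination;
    calculate a second parameter of each pending factor combination, the second parameter being the sum of the third values of the first target combinations associated with the same pending factor combination;
    sort the pending factor combinations in descending order of their second parameters, to obtain the third ordering.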
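The two ranking parameters of claim 25 can be written out as a short sketch. The data layout is an assumption made for illustration: for a native indicator each pending factor combination maps to the list of offsets of its associated target combinations, and for a derived indicator it maps to a list of tuples `(numerator_before, numerator_after, denominator_before, denominator_after)`. All names are hypothetical.

```python
def first_parameter(offsets):
    """Native indicator: sum of the offsets of the associated target combinations."""
    return sum(offsets)


def second_parameter(observations):
    """Derived indicator: sum over the associated target combinations of
    |change in numerator| + |change in denominator|."""
    total = 0.0
    for num_before, num_after, den_before, den_after in observations:
        first_value = abs(num_after - num_before)
        second_value = abs(den_after - den_before)
        total += first_value + second_value  # third value of this combination
    return total


def third_ordering(pending, is_derived):
    """Sort pending factor combinations in descending order of their parameter."""
    score = second_parameter if is_derived else first_parameter
    return sorted(pending, key=lambda combo: score(pending[combo]), reverse=True)


if __name__ == "__main__":
    derived = {
        ("ios",): [(10, 4, 100, 90), (5, 5, 50, 50)],
        ("android",): [(8, 7, 80, 80)],
    }
    print(third_ordering(derived, is_derived=True))
```

For a derived indicator such as a ratio, ranking by the absolute changes of numerator and denominator keeps combinations with large underlying volume changes at the head of the ordering even when the ratio itself moves little.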
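  26. The apparatus according to claim 24, wherein the execution module, when determining the abnormal factor combination from the first-dimension factor combinations, is specifically configured to:
    determine the candidate factor combinations as the abnormal factor combination.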
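  27. The apparatus according to claim 24, wherein the likelihood parameter calculation submodule, when calculating, according to the target combinations associated with the candidate factor combinations, the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations, is specifically configured to:
    calculate a first average avg1 of the offsets of first abnormal target combinations, wherein the first abnormal target combinations are the abnormal target combinations among first target combinations, and the first target combinations are the target combinations associated with the candidate factor combinations;
    calculate a second average avg2 of the offsets of the first target combinations other than the first abnormal target combinations;
    calculate a third parameter a(Z1) according to a second preset formula a(Z1)=f(Z1)-f(Z1)/f(Z)(f(Z)-v(Z)), wherein f(Z1) denotes the sum of the first-type indicator values of the first abnormal target combinations at a second time, f(Z) denotes the sum of the first-type indicator values of the first target combinations at the second time, v(Z) denotes the sum of the first-type indicator values of the first target combinations at a first time, the first-type indicator values are values of the target indicator, the target indicator is the business indicator associated with the dimensions, and the first time is earlier than the second time;
    calculate the difference between the first-type indicator value of each first abnormal target combination at the first time and the third parameter, to obtain a fourth parameter corresponding to each first abnormal target combination;
    calculate a third average avg3 of the absolute values of the fourth parameters corresponding to all the first abnormal target combinations;
    calculate, according to avg1, avg2, avg3 and a third preset formula GPS=1-(avg3+avg2)/(avg1+avg2), the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations;
    wherein GPS denotes the first likelihood parameter.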
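The computation in claim 27 can be traced with a small numerical sketch. Two points are assumptions rather than statements of the application: the second preset formula is read as a(Z1) = f(Z1) - (f(Z1)/f(Z))·(f(Z) - v(Z)), and degenerate inputs (no abnormal combination, zero denominators) simply yield 0.0. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetCombination:
    offset: float     # offset of the target combination
    value_t1: float   # first-type indicator value at the first (earlier) time
    value_t2: float   # first-type indicator value at the second time
    abnormal: bool    # whether it was judged an abnormal target combination


def mean(values):
    return sum(values) / len(values) if values else 0.0


def likelihood_parameter(combos):
    """GPS = 1 - (avg3 + avg2) / (avg1 + avg2), following claim 27."""
    abnormal = [c for c in combos if c.abnormal]
    normal = [c for c in combos if not c.abnormal]
    if not abnormal:
        return 0.0  # assumption: nothing abnormal means no evidence
    avg1 = mean([c.offset for c in abnormal])
    avg2 = mean([c.offset for c in normal])
    f_z1 = sum(c.value_t2 for c in abnormal)
    f_z = sum(c.value_t2 for c in combos)
    v_z = sum(c.value_t1 for c in combos)
    if f_z == 0 or avg1 + avg2 == 0:
        return 0.0  # assumption: guard against division by zero
    # Third parameter under the assumed reading of the second preset formula.
    a_z1 = f_z1 - (f_z1 / f_z) * (f_z - v_z)
    # Fourth parameter per abnormal combination, then its mean absolute value.
    avg3 = mean([abs(c.value_t1 - a_z1) for c in abnormal])
    return 1 - (avg3 + avg2) / (avg1 + avg2)


if __name__ == "__main__":
    combos = [
        TargetCombination(offset=8, value_t1=20, value_t2=12, abnormal=True),
        TargetCombination(offset=1, value_t1=10, value_t2=9, abnormal=False),
        TargetCombination(offset=1, value_t1=10, value_t2=11, abnormal=False),
    ]
    print(likelihood_parameter(combos))
```

With this data avg1 = 8, avg2 = 1, a(Z1) = 12 - (12/32)·(32 - 40) = 15, avg3 = |20 - 15| = 5, so GPS = 1 - 6/9 ≈ 0.333.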
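  28. The apparatus according to claim 15, further comprising:
    a first verification parameter acquisition module, configured to acquire second target combinations associated with the abnormal factor combination, wherein a target combination is a factor combination associated with a dimension node in the N-th layer, N is the number of dimensions of the abnormal data, and the second target combinations include the factors of the abnormal factor combination;
    a second verification parameter acquisition module, configured to acquire a fourth value, the fourth value being the sum of the first-type indicator values of the second target combinations at a third time, the first-type indicator values being values of a target indicator, and the target indicator being a business indicator associated with the dimensions;
    a third verification parameter acquisition module, configured to acquire a fifth value, the fifth value being the sum of the first-type indicator values of all the target combinations at the third time;
    a verification module, configured to perform a preset prompt operation in response to the ratio of the fourth value to the fifth value being less than a third threshold, the preset prompt operation being used to indicate that the abnormal factor is not present in the abnormal data.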
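The verification of claim 28 amounts to a coverage check, sketched below. The sketch assumes target combinations are tuples with one factor per dimension and that the abnormal factor combination is given as a mapping from dimension index to factor; returning `False` when the total is zero is also an assumption. The function name is hypothetical.

```python
def root_cause_outside_data(abnormal_factors, values_at_t3, third_threshold):
    """Return True when the prompt of claim 28 should be issued.

    abnormal_factors: dimension index -> factor of the abnormal factor combination
    values_at_t3:     target combination (tuple) -> indicator value at the third time
    """
    fifth_value = sum(values_at_t3.values())
    if fifth_value == 0:
        return False  # assumption: nothing to compare against
    # Fourth value: total over target combinations containing all abnormal factors.
    fourth_value = sum(
        value
        for combo, value in values_at_t3.items()
        if all(combo[i] == factor for i, factor in abnormal_factors.items())
    )
    return fourth_value / fifth_value < third_threshold


if __name__ == "__main__":
    values = {
        ("ios", "bj"): 5, ("ios", "sh"): 5,
        ("android", "bj"): 45, ("android", "sh"): 45,
    }
    print(root_cause_outside_data({0: "ios"}, values, third_threshold=0.2))
```

When the detected abnormal factor combination accounts for only a small share of the indicator, the check signals that the real cause may lie in a dimension or factor that the abnormal data does not contain.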
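  29. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    acquiring abnormal data, the abnormal data comprising dimensions and the factors included in the dimensions;
    constructing a root cause search tree according to the dimensions, wherein the root cause search tree comprises at least one dimension node layer, each dimension node layer comprises at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located;
    acquiring first-dimension factor combinations associated with a first dimension node in the root cause search tree, wherein the first dimension node is associated with at least one first dimension, each first-dimension factor combination comprises one factor of each first dimension, and the first dimension node is any dimension node in the root cause search tree;
    calculating a first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations;
    increasing a first threshold in response to the first likelihood parameter being greater than the first threshold and the number of first dimensions being not greater than a second threshold;
    taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeating the process of calculating the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations, and of increasing the new first threshold when the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and then determining the abnormal factor combination from the first-dimension factor combinations.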
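The tree construction step can be illustrated with a minimal sketch. It assumes the abnormal data is summarised as a dictionary from dimension name to its factors, represents a dimension node as a tuple of dimension names, and builds layer k from all k-element subsets of the dimensions; enumerating every subset is one possible reading of the claim, and the names are hypothetical.

```python
from itertools import combinations, product


def build_search_tree(dimensions):
    """Return the dimension node layers: layer k (1-based) holds the nodes
    associated with exactly k dimensions."""
    names = list(dimensions)
    return [list(combinations(names, k)) for k in range(1, len(names) + 1)]


def factor_combinations(node, dimensions):
    """Factor combinations of a node: one factor from each of its dimensions."""
    return list(product(*(dimensions[name] for name in node)))


if __name__ == "__main__":
    dims = {"os": ["ios", "android"], "city": ["bj", "sh"], "version": ["v1"]}
    tree = build_search_tree(dims)
    for depth, layer in enumerate(tree, start=1):
        print(depth, layer)
    print(factor_combinations(("os", "city"), dims))
```

The factor combinations of the single node in the last layer are the target combinations referred to in claims 19 and 28, since that node is associated with all N dimensions.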
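  30. A non-volatile computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to implement the following steps:
    acquiring abnormal data, the abnormal data comprising dimensions and the factors included in the dimensions;
    constructing a root cause search tree according to the dimensions, wherein the root cause search tree comprises at least one dimension node layer, each dimension node layer comprises at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located;
    acquiring first-dimension factor combinations associated with a first dimension node in the root cause search tree, wherein the first dimension node is associated with at least one first dimension, each first-dimension factor combination comprises one factor of each first dimension, and the first dimension node is any dimension node in the root cause search tree;
    calculating a first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations;
    increasing a first threshold in response to the first likelihood parameter being greater than the first threshold and the number of first dimensions being not greater than a second threshold;
    taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeating the process of calculating the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations, and of increasing the new first threshold when the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and then determining the abnormal factor combination from the first-dimension factor combinations.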
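The threshold-raising loop in the last two steps can be sketched as below. The claim leaves the node visiting order and the size of each threshold increase open, so the sketch assumes the caller supplies an ordered list of nodes and a fixed `step`; the likelihood parameter is passed in as a callable. All names are hypothetical.

```python
def search_root_cause(nodes, likelihood, first_threshold, second_threshold, step):
    """Walk dimension nodes, raising the first threshold after shallow hits.

    nodes:            dimension nodes (tuples of dimensions) in visiting order
    likelihood:       callable returning the likelihood parameter of a node
    second_threshold: a node qualifies only with more dimensions than this
    step:             assumed fixed increment of the first threshold
    Returns (node, threshold_in_force), with node None if nothing qualifies.
    """
    threshold = first_threshold
    for node in nodes:
        score = likelihood(node)
        if score <= threshold:
            continue  # not convincing enough, move to another node
        if len(node) > second_threshold:
            return node, threshold  # deep enough: determine the root cause here
        # Convincing but too shallow: demand stronger evidence from now on.
        threshold += step
    return None, threshold


if __name__ == "__main__":
    scores = {("os",): 0.8, ("os", "city"): 0.85}
    node, threshold = search_root_cause(
        list(scores), scores.get, first_threshold=0.7, second_threshold=1, step=0.1
    )
    print(node, round(threshold, 2))
```

In the example the one-dimensional node `("os",)` exceeds the initial threshold but is too shallow, so the threshold rises to about 0.8 and the two-dimensional node `("os", "city")` is accepted only because its score still exceeds the raised value.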
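  31. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring abnormal data, the abnormal data comprising dimensions and the factors included in the dimensions;
    constructing a root cause search tree according to the dimensions, wherein the root cause search tree comprises at least one dimension node layer, each dimension node layer comprises at least one dimension node, each dimension node is associated with at least one dimension, and the number of dimensions associated with a dimension node equals the number of the layer in which that dimension node is located;
    acquiring first-dimension factor combinations associated with a first dimension node in the root cause search tree, wherein the first dimension node is associated with at least one first dimension, each first-dimension factor combination comprises one factor of each first dimension, and the first dimension node is any dimension node in the root cause search tree;
    calculating a first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations;
    increasing a first threshold in response to the first likelihood parameter being greater than the first threshold and the number of first dimensions being not greater than a second threshold;
    taking any dimension node in the root cause search tree as the first dimension node and the increased first threshold as a new first threshold, repeating the process of calculating the first likelihood parameter that an abnormal factor combination exists among the first-dimension factor combinations, and of increasing the new first threshold when the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is not greater than the second threshold, until the recalculated first likelihood parameter is greater than the new first threshold and the number of first dimensions is greater than the second threshold, and then determining the abnormal factor combination from the first-dimension factor combinations.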
PCT/CN2021/113331 2021-01-29 2021-08-18 Root cause determination method and apparatus WO2022160675A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130846.7 2021-01-29
CN202110130846.7A CN112949983B (zh) 2021-01-29 2021-01-29 Root cause determination method and apparatus

Publications (1)

Publication Number Publication Date
WO2022160675A1 true WO2022160675A1 (zh) 2022-08-04

Family

ID=76240378

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113331 WO2022160675A1 (zh) 2021-01-29 2021-08-18 Root cause determination method and apparatus

Country Status (2)

Country Link
CN (1) CN112949983B (zh)
WO (1) WO2022160675A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
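CN115392799A (zh) * 2022-10-27 2022-11-25 平安科技(深圳)有限公司 Attribution analysis method and apparatus, computer device, and storage medium
CN115756919A (zh) * 2022-11-10 2023-03-07 上海鼎茂信息技术有限公司 Root cause localization method and system for multi-dimensional data
CN117149486A (zh) * 2023-08-25 2023-12-01 北京优特捷信息技术有限公司 Alarm and root cause localization method, model training method, apparatus, device, and medium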

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949983B (zh) * 2021-01-29 2024-06-04 北京达佳互联信息技术有限公司 Root cause determination method and apparatus

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170154071A1 (en) * 2014-07-28 2017-06-01 Hewlett Packard Enterprise Development Lp Detection of abnormal transaction loops
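CN109753372A (zh) * 2018-12-20 2019-05-14 东软集团股份有限公司 Multi-dimensional data anomaly detection method and apparatus, readable storage medium, and electronic device
CN109992479A (zh) * 2019-03-31 2019-07-09 西安电子科技大学 Multi-dimensional KPI data anomaly localization method, apparatus, and computer device
CN110825769A (zh) * 2019-10-11 2020-02-21 苏宁金融科技(南京)有限公司 Query method and system for data indicator anomalies
CN111026570A (zh) * 2019-11-01 2020-04-17 支付宝(杭州)信息技术有限公司 Method and apparatus for determining the cause of a business system anomaly
CN111064614A (zh) * 2019-12-17 2020-04-24 腾讯科技(深圳)有限公司 Fault root cause localization method, apparatus, device, and storage medium
CN111444247A (zh) * 2020-06-17 2020-07-24 北京必示科技有限公司 KPI-based root cause localization method, apparatus, and storage medium
CN111538951A (zh) * 2020-03-31 2020-08-14 北京华三通信技术有限公司 Anomaly localization method and apparatus
CN112187554A (zh) * 2020-12-01 2021-01-05 北京蒙帕信创科技有限公司 Operation and maintenance system fault localization method and system based on Monte Carlo tree search
CN112256748A (zh) * 2020-09-25 2021-01-22 北京五八信息技术有限公司 Anomaly detection method and apparatus, electronic device, and storage medium
CN112949983A (zh) * 2021-01-29 2021-06-11 北京达佳互联信息技术有限公司 Root cause determination method and apparatus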

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10360527B2 (en) * 2010-11-10 2019-07-23 International Business Machines Corporation Casual modeling of multi-dimensional hierarchical metric cubes
US10902062B1 (en) * 2017-08-24 2021-01-26 Amazon Technologies, Inc. Artificial intelligence system providing dimension-level anomaly score attributions for streaming data
CN111160329A (zh) * 2019-12-27 2020-05-15 深圳前海微众银行股份有限公司 Root cause analysis method and apparatus
CN111641519B (zh) * 2020-04-30 2022-10-11 平安科技(深圳)有限公司 Abnormal root cause localization method, apparatus, and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
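CN115392799A (zh) * 2022-10-27 2022-11-25 平安科技(深圳)有限公司 Attribution analysis method and apparatus, computer device, and storage medium
CN115392799B (zh) * 2022-10-27 2023-04-11 平安科技(深圳)有限公司 Attribution analysis method and apparatus, computer device, and storage medium
CN115756919A (zh) * 2022-11-10 2023-03-07 上海鼎茂信息技术有限公司 Root cause localization method and system for multi-dimensional data
CN115756919B (zh) * 2022-11-10 2023-10-31 上海鼎茂信息技术有限公司 Root cause localization method and system for multi-dimensional data
CN117149486A (zh) * 2023-08-25 2023-12-01 北京优特捷信息技术有限公司 Alarm and root cause localization method, model training method, apparatus, device, and medium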

Also Published As

Publication number Publication date
CN112949983B (zh) 2024-06-04
CN112949983A (zh) 2021-06-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922270

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.11.2023)