CN112702198A - Abnormal root cause positioning method and device, electronic equipment and storage medium - Google Patents

Abnormal root cause positioning method and device, electronic equipment and storage medium

Publication number
CN112702198A
Authority
CN
China
Prior art keywords
attribution
dimension
root cause
anomaly
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011511168.0A
Other languages
Chinese (zh)
Other versions
CN112702198B (en
Inventor
李健
马茗
程媛
郭君健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011511168.0A
Publication of CN112702198A
Application granted
Publication of CN112702198B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure relates to an abnormal root cause positioning method, an abnormal root cause positioning device, an electronic device and a storage medium. The abnormal root cause positioning method comprises the following steps: detecting an anomaly; in response to detecting an anomaly, performing an attribution probe operation according to an attribution dimension set to determine root cause nodes in an attribution path of the anomaly, wherein the attribution dimension set comprises a plurality of attribution dimensions, and each root cause node corresponds to an abnormal attribution dimension value of one attribution dimension; and determining an abnormal root cause from all root cause nodes in the attribution path.

Description

Abnormal root cause positioning method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of signal processing, and in particular, to a method and an apparatus for locating an abnormal root cause, an electronic device, and a storage medium.
Background
In operation and maintenance work in various environments, operation and maintenance personnel face a large number of anomalies or faults every day. They need to find the real cause of an anomaly or fault in time; only then can they take corresponding measures to stop losses or restore normal service quickly, and thereby ensure a high-quality user experience. In addition, after an anomaly is found, the root cause often has to be searched for in a large dimensional space. Some root cause localization methods have been used in production environments, such as Adtributor, Microsoft's multi-dimensional root cause algorithm for advertising systems, which can determine the dimension most likely to have caused an advertising revenue anomaly. However, such a method can only locate a set of values within a single dimension that causes an anomaly in a multi-dimensional scenario. For scenarios in which the dimensions causing an anomaly intersect, or in which the root cause often involves more than one dimension (for example, scenarios related to a Content Delivery Network (CDN)), such a method cannot accurately locate the root cause. Therefore, a general technique capable of accurately locating the root cause of an anomaly is required.
Disclosure of Invention
The present disclosure provides a method and an apparatus for locating an abnormal root cause, an electronic device, and a storage medium, so as to at least solve the problem in the related art that the abnormal root cause cannot be located accurately.
According to a first aspect of the embodiments of the present disclosure, there is provided an abnormal root cause positioning method, including: detecting an anomaly; in response to detecting an anomaly, performing an attribution probe operation according to an attribution dimension set to determine root cause nodes in an attribution path of the anomaly, wherein the attribution dimension set comprises a plurality of attribution dimensions, and each root cause node corresponds to an abnormal attribution dimension value of one attribution dimension; and determining an abnormal root cause from all root cause nodes in the attribution path.
Optionally, detecting an anomaly comprises: performing anomaly detection on an attention dimension or an attention dimension combination of a user to determine an abnormal attention dimension value or attention dimension combination value; and performing the attribution probe operation according to the attribution dimension set to determine root cause nodes in the attribution path of the anomaly comprises: for the abnormal attention dimension value or attention dimension combination value, performing the attribution probe operation according to the attribution dimension set to determine the root cause nodes in the attribution path.
Optionally, the attribution probe operation comprises: for each attribution dimension in the attribution dimension set, performing anomaly detection under that attribution dimension to determine its abnormal attribution dimension values; calculating the degree of abnormality of each attribution dimension; and determining root cause nodes in the attribution path according to the degrees of abnormality of all attribution dimensions in the attribution dimension set.
Optionally, determining root cause nodes in the attribution path according to the degrees of abnormality of all attribution dimensions in the attribution dimension set comprises: in a case where the minimum of the degrees of abnormality of all attribution dimensions in the attribution dimension set satisfies a predetermined condition, determining the abnormal attribution dimension value of the attribution dimension whose degree of abnormality equals the minimum as corresponding to a root cause node in the attribution path, updating the attribution dimension set by removing that attribution dimension from it, and performing the attribution probe operation again according to the updated attribution dimension set, until the attribution dimension set is empty; and in a case where the minimum does not satisfy the predetermined condition, determining the abnormal attribution dimension value of the attribution dimension whose degree of abnormality equals the minimum as corresponding to a root cause node in the attribution path, and performing no further attribution probe operation.
Optionally, the degree of abnormality of each attribution dimension is equal to the ratio of the number of abnormal attribution dimension values of that attribution dimension to the number of attribution dimension values considered when performing anomaly detection under that attribution dimension.
Optionally, detecting an anomaly comprises: detecting an anomaly occurring in a content distribution network, wherein the attribution dimension set includes a plurality of dimensions that affect the quality of the content distribution network.
Optionally, the anomaly comprises an anomaly relating to a streaming media service.
Optionally, the abnormal root cause positioning method further includes: providing alarm information according to the determined abnormal root cause.
According to a second aspect of the embodiments of the present disclosure, there is provided an abnormal root cause positioning device, including: an abnormality detection unit configured to detect an anomaly; a root cause node determination unit configured to, in response to detecting an anomaly, perform an attribution probe operation according to an attribution dimension set to determine root cause nodes in an attribution path of the anomaly, wherein the attribution dimension set includes a plurality of attribution dimensions, and each root cause node corresponds to an abnormal attribution dimension value of one attribution dimension; and an abnormal root cause determination unit configured to determine an abnormal root cause from all root cause nodes in the attribution path.
Optionally, detecting an anomaly comprises: performing anomaly detection on an attention dimension or an attention dimension combination of a user to determine an abnormal attention dimension value or attention dimension combination value; and performing the attribution probe operation according to the attribution dimension set to determine root cause nodes in the attribution path of the anomaly comprises: for the abnormal attention dimension value or attention dimension combination value, performing the attribution probe operation according to the attribution dimension set to determine the root cause nodes in the attribution path.
Optionally, the attribution probe operation comprises: for each attribution dimension in the attribution dimension set, performing anomaly detection under that attribution dimension to determine its abnormal attribution dimension values; calculating the degree of abnormality of each attribution dimension; and determining root cause nodes in the attribution path according to the degrees of abnormality of all attribution dimensions in the attribution dimension set.
Optionally, determining root cause nodes in the attribution path according to the degrees of abnormality of all attribution dimensions in the attribution dimension set comprises: in a case where the minimum of the degrees of abnormality of all attribution dimensions in the attribution dimension set satisfies a predetermined condition, determining the abnormal attribution dimension value of the attribution dimension whose degree of abnormality equals the minimum as corresponding to a root cause node in the attribution path, updating the attribution dimension set by removing that attribution dimension from it, and performing the attribution probe operation again according to the updated attribution dimension set, until the attribution dimension set is empty; and in a case where the minimum does not satisfy the predetermined condition, determining the abnormal attribution dimension value of the attribution dimension whose degree of abnormality equals the minimum as corresponding to a root cause node in the attribution path, and performing no further attribution probe operation.
Optionally, the degree of abnormality of each attribution dimension is equal to the ratio of the number of abnormal attribution dimension values of that attribution dimension to the number of attribution dimension values considered when performing anomaly detection under that attribution dimension.
Optionally, detecting an anomaly comprises: detecting an anomaly occurring in a content distribution network, wherein the attribution dimension set includes a plurality of dimensions that affect the quality of the content distribution network.
Optionally, the anomaly comprises an anomaly relating to a streaming media service.
Optionally, the abnormal root cause locating device further includes: an alarm unit configured to provide alarm information according to the determined abnormal root cause.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method for anomaly root cause localization as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method for anomaly root cause localization as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product are executed by at least one processor in an electronic device to perform the method for anomaly root cause localization as described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: embodiments of the present disclosure determine root cause nodes in the attribution path of the anomaly by performing an attribution probe operation according to an attribution dimension set. Since the attribution dimension set includes a plurality of attribution dimensions and each root cause node corresponds to an abnormal attribution dimension value of one attribution dimension, an anomaly root cause spanning multiple dimensions can be accurately located according to all root cause nodes in the attribution path.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an exemplary system architecture to which exemplary embodiments of the present disclosure may be applied;
FIG. 2 is a flow chart of an anomaly root cause location method of an exemplary embodiment of the present disclosure;
FIG. 3 is a diagram illustrating an example of an anomaly root cause localization method of an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of an anomaly root cause locating device of an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Fig. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables. A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., audio-video data upload requests, audio-video data download requests). Various communication client applications, such as audio and video recording software, an audio and video player, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103. The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and capable of playing and recording audio and video, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
The terminal devices 101, 102, 103 may be equipped with an image capturing device (e.g., a camera) to capture video data. In practice, the smallest visual unit that makes up a video is a Frame (Frame). Each frame is a static image. Temporally successive sequences of frames are composited together to form a motion video. Further, the terminal apparatuses 101, 102, 103 may also be mounted with a component (e.g., a speaker) for converting an electric signal into sound to play the sound, and may also be mounted with a device (e.g., a microphone) for converting an analog audio signal into a digital audio signal to pick up the sound.
The server 105 may be a server providing various services, such as a background server providing support for multimedia applications installed on the terminal devices 101, 102, 103. The background server can analyze, store, and otherwise process received data such as audio and video data upload requests. It can also receive audio and video data download requests sent by the terminal devices 101, 102 and 103, and feed back the requested audio and video data to the terminal devices 101, 102 and 103.
By way of example, server 105 may be a streaming media server and network 104 may be a Content Delivery Network (CDN). The streaming server may transmit streaming media to the respective terminal devices through the content distribution network. The content distribution network includes a plurality of nodes that provide streaming media content. When a node in a content distribution network fails, an abnormality occurs when a streaming server provides a streaming service to a terminal device. If the failed node can be accurately located, the failure can be quickly handled and streaming media services can be better provided.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the abnormal root cause positioning method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the abnormal root cause positioning apparatus is generally disposed in the server 105. However, the abnormal root cause positioning method provided by the embodiment of the present disclosure may also be executed by the terminal device and the server in cooperation. Accordingly, the abnormality root cause positioning device may also be provided in both the terminal device and the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation, and the disclosure is not limited thereto.
Fig. 2 is a flowchart of an abnormal root cause locating method of an exemplary embodiment of the present disclosure.
Referring to fig. 2, in step S201, an anomaly is detected. Here, the anomaly may be detected in any manner, and the present disclosure places no limitation on the manner in which the anomaly is detected. As an example, an anomaly may be detected based on statistics of historical traffic data, or an anomaly that may occur may be predicted based on historical traffic data. Furthermore, the anomaly may be any anomaly occurring in any business scenario, and the present disclosure places no limitation on the kind, standard, etc. of the anomaly.
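As one illustration of detection based on statistics of historical data, the following minimal Python sketch flags a current metric value that deviates from the historical mean by more than `k` standard deviations. The disclosure does not prescribe this rule; the function name `is_anomalous`, the parameter `k`, and the sample values are assumptions made only for this example.

```python
import statistics
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` lies more than k standard deviations from the
    mean of `history`. This k-sigma rule is an assumed example, not the
    detection method of the disclosure, which allows any method."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if std == 0:
        # A flat history: any deviation at all is treated as anomalous.
        return current != mean
    return abs(current - mean) > k * std


# Hypothetical historical download failure rates and two current readings.
history = [0.010, 0.012, 0.011, 0.009, 0.010]
print(is_anomalous(history, 0.080))  # True
print(is_anomalous(history, 0.011))  # False
```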
For example, in a streaming media service architecture, CDNs play a critical role. Applying a CDN to the transmission of streaming media content can effectively improve the quality of service (QoS) during transmission and reduce bandwidth consumption on the backbone network, and CDNs are therefore increasingly widely used. In real-time quality monitoring of the service, anomalies in QoS indicators need to be found in time, and the real cause of a fault needs to be found quickly and accurately to guide operation and maintenance personnel in stopping losses promptly, so that a high-quality user experience is ultimately ensured. To this end, according to an exemplary embodiment, detecting the anomaly may include detecting an anomaly occurring in a Content Delivery Network (CDN). By way of example, the anomalies include anomalies related to streaming media services, such as, but not limited to, an abnormal data download failure rate.
In step S202, in response to detecting an anomaly, an attribution probe operation may be performed according to the attribution dimension set to determine root cause nodes in an attribution path of the anomaly. Here, the attribution dimension set may include a plurality of attribution dimensions, and each root cause node may correspond to an abnormal attribution dimension value of one attribution dimension. In the present disclosure, an attribution is a cause that may lead to an anomaly, and the root cause is the underlying cause that actually leads to the anomaly. An attribution dimension is a dimension of possible causes of an anomaly, and an abnormal attribution dimension value is a value in that dimension at which an anomaly occurs. Further, the attribution path is a path along which the cause of an anomaly is searched for, and a root cause node is a node corresponding to the root cause of the anomaly.
For example, as mentioned in the background of the present disclosure, in a scenario involving a Content Delivery Network (CDN), the dimensions causing an anomaly intersect or the root cause often involves more than one dimension, so the root cause has to be searched for in a large dimensional space after the anomaly is found. Thus, the attribution dimension set may be preset to facilitate locating the root cause in a multi-dimensional space. In such a scenario, according to an exemplary embodiment, the attribution dimension set may include a plurality of dimensions that affect the quality of the content distribution network. For example, the attribution dimension set can be (province, operator, domain name, network), but is not limited thereto. The user can define the dimensions in the attribution dimension set according to the needs of different scenarios.
As described above, the root cause of an anomaly often involves more than one dimension, and after an anomaly is found the root cause has to be searched for in a large dimensional space. However, faced with a large number of anomalies in various dimensions or dimension combinations every day, operation and maintenance personnel usually only need to attend to the dimensions or dimension combinations that actually change the quality of service, so that losses can be stopped in time. According to an exemplary embodiment, when an anomaly is detected in step S201, anomaly detection may be performed on a user's attention dimension or attention dimension combination to determine an abnormal attention dimension value or attention dimension combination value. Accordingly, in step S202, the attribution probe operation may be performed according to the attribution dimension set for the abnormal attention dimension value or attention dimension combination value to determine the root cause nodes in the attribution path.
For example, in a quality monitoring scenario in which streaming media services are provided through a CDN, the attention dimension of a user may be the CDN vendor, domain name, operator, province, or network, and the attention dimension combination of a user may be any combination of these, for example the combination (province, operator). In this case, when detecting an anomaly, anomaly detection may be performed only on the user's attention dimension or attention dimension combination to determine an abnormal attention dimension value or attention dimension combination value. For example, anomaly detection may be performed only on the attention dimension combination (province, operator) by checking the download failure rate of the customers served by each operator in each province. If the download failure rate of customers served by the Mobile operator in Henan is higher than a certain threshold, (Henan, Mobile) is determined to be an abnormal attention dimension combination value; that is, an anomaly has occurred for Henan Mobile in the streaming media service provided through the CDN. The attribution probe operation may then be performed according to the attribution dimension set only for the combination value (Henan, Mobile), to determine the root cause nodes in the attribution path and thus what caused the anomaly for Henan Mobile.
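The threshold check on an attention dimension combination can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `abnormal_combinations`, the input layout (failed and total download counts per (province, operator) pair), the threshold value, and all sample numbers are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Tuple

Combo = Tuple[str, str]  # (province, operator)


def abnormal_combinations(
    stats: Dict[Combo, Tuple[int, int]], threshold: float
) -> List[Combo]:
    """Return the (province, operator) values whose download failure rate
    exceeds `threshold`. `stats` maps each combination to
    (failed_downloads, total_downloads); this layout is an assumption."""
    abnormal = []
    for combo, (failed, total) in stats.items():
        # Skip combinations with no traffic to avoid dividing by zero.
        if total > 0 and failed / total > threshold:
            abnormal.append(combo)
    return abnormal


# Hypothetical counts for three combinations.
stats = {
    ("Henan", "Mobile"): (30, 100),
    ("Henan", "Unicom"): (1, 100),
    ("Hebei", "Mobile"): (2, 100),
}
print(abnormal_combinations(stats, threshold=0.1))  # [('Henan', 'Mobile')]
```

Only the combination values returned here would then be passed on to the attribution probe operation.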
According to an exemplary embodiment, the attribution probe operation performed at step S202 may include: for each attribution dimension in the attribution dimension set, performing anomaly detection under that attribution dimension to determine its abnormal attribution dimension values; calculating the degree of abnormality of each attribution dimension; and determining root cause nodes in the attribution path according to the degrees of abnormality of all attribution dimensions in the attribution dimension set. For example, in the above example, once the anomaly is determined to concern the dimension combination value (Henan, Mobile), and the attribution dimension set is, for example, (network, domain name), anomaly detection can be performed separately for each network and each domain name served by the Mobile operator in Henan province to determine the abnormal attribution dimension values under the network dimension and the domain name dimension. For example, if an anomaly is detected for the WIFI network, the abnormal attribution dimension value of the network attribution dimension is WIFI; if an anomaly is detected for the domain name alimov2, the abnormal attribution dimension value of the domain name attribution dimension is alimov2. After the abnormal attribution dimension values of each attribution dimension are determined, the degree of abnormality of each attribution dimension may be calculated. According to an exemplary embodiment, the degree of abnormality of each attribution dimension may be equal to the ratio of the number of abnormal attribution dimension values of that attribution dimension to the number of attribution dimension values considered when performing anomaly detection under that attribution dimension.
Here, the attribution dimension values to be considered in performing the abnormality detection in each attribution dimension may be defined by a user in advance, or may determine which attribution dimension values need to be considered according to a predetermined condition. For example, in a scenario where streaming media services are provided with a CDN, it may be determined which attribution dimension values need to be considered depending on whether the download amount exceeds a threshold.
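As a rough illustration of selecting the considered values by a predetermined condition, the sketch below keeps only the values whose download amount exceeds a threshold. The counts, the threshold, and the domain names other than alimov2 are hypothetical placeholders.

```python
# Hypothetical download counts per value of the domain name attribution dimension.
download_counts = {"alimov2": 12000, "example-a": 8000, "example-b": 40}

MIN_DOWNLOADS = 1000  # assumed threshold, not taken from the disclosure


def considered_values(download_counts, min_downloads):
    """Keep only the attribution dimension values whose download amount exceeds the threshold."""
    return sorted(
        value for value, count in download_counts.items() if count > min_downloads
    )


selected = considered_values(download_counts, MIN_DOWNLOADS)
print(selected)
```

Values with very little traffic are left out so that they do not distort the count of considered values used later as the denominator of the degree of abnormality.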
For example, if the attribution dimension is the network and two networks, namely the WIFI network and the cellular network, are considered when anomaly detection is performed under this attribution dimension, the attribution dimension values considered are the WIFI network and the cellular network, and their number is 2. If an anomaly of the WIFI network is detected, the degree of abnormality of the network attribution dimension is 1/2. Although an example of calculating the degree of abnormality of each attribution dimension is given above, the manner of calculating the degree of abnormality is not limited thereto.
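The ratio described above can be written down directly. The following is a small sketch using exact fractions; the function name and the use of sets are illustrative choices, not part of the disclosure.

```python
from fractions import Fraction


def abnormality_degree(abnormal_values, considered_values):
    """Ratio of abnormal values to considered values for one attribution dimension."""
    considered = set(considered_values)
    if not considered:
        raise ValueError("at least one attribution dimension value must be considered")
    # Only values that were actually considered can count as abnormal.
    abnormal = set(abnormal_values) & considered
    return Fraction(len(abnormal), len(considered))


degree = abnormality_degree({"WIFI"}, {"WIFI", "cellular"})
print(degree)
```

Using `Fraction` keeps values such as 1/3 exact, which matters when the degree is later compared against a bound like "less than 1/3".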
After the degree of anomaly for each attribution dimension is calculated, the root cause node in the attribution path can be determined according to the degree of anomaly for all attribution dimensions in the attribution dimension set. In particular, according to an exemplary embodiment, in a case where a minimum value of the abnormal degrees of all attribution dimensions in the attribution dimension set satisfies a predetermined condition, the abnormal attribution dimension value of the attribution dimension having the abnormal degree equal to the minimum value may be determined to correspond to a root cause node in the attribution path, the attribution dimension set may be updated by removing the attribution dimension having the abnormal degree equal to the minimum value from the attribution dimension set, and the attribution probing operation may be performed according to the updated attribution dimension set until the attribution dimension set is empty. In the case where the minimum value of the abnormal degrees of all attribution dimensions in the attribution dimension set does not satisfy the predetermined condition, the abnormal attribution dimension value of the attribution dimension having the abnormal degree equal to the minimum value may be determined to correspond to the root cause node in the attribution path, and the attribution detecting operation is not performed any more.
For convenience of description, the above process of performing an attribution probing operation according to an attribution dimension set to determine a root cause node in an attribution path of the exception may be represented as follows:
i) for each attribution dimension in the current attribution dimension set, performing anomaly detection under that attribution dimension to determine the abnormal attribution dimension value of each attribution dimension;
ii) calculating the degree of abnormality for each attribution dimension, which may be calculated, for example, as follows: number of abnormal attribution dimension values per attribution dimension/number of attribution dimension values considered in performing abnormality detection under each attribution dimension;
iii) if the minimum degree of abnormality satisfies a predetermined condition (for example, greater than 0 and less than 1/3, although the predetermined condition is not limited thereto), determining the abnormal attribution dimension value of the attribution dimension whose degree of abnormality is equal to the minimum value as corresponding to a root cause node in the attribution path; otherwise, the current node is considered to be the last root cause node, no further drill-down into finer dimensions is needed, and the loop is exited;
iv) updating the attribution dimension set by removing, from the current attribution dimension set, the attribution dimension whose degree of abnormality is equal to the minimum value, and repeating i) through iv) until the attribution dimension set is empty.
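Steps i) through iv) can be sketched as a loop. This is one possible reading rather than the definitive procedure: the `detect` callback and its return shape are hypothetical, and the sketch assumes that a dimension with no abnormal value (degree 0) is skipped when the minimum is taken, which is one interpretation that reproduces the outcome of the fig. 3 example. The demo numbers mirror that example.

```python
from fractions import Fraction


def default_condition(degree):
    # Example condition from the text: greater than 0 and less than 1/3.
    return 0 < degree < Fraction(1, 3)


def attribution_probe(dimensions, detect, condition=default_condition):
    """Return root cause nodes as a list of (dimension, abnormal_values).

    detect(dimension, path) is a hypothetical callback returning
    (abnormal_values, number_of_considered_values) for one attribution
    dimension, restricted to the root cause nodes already in path.
    """
    remaining = list(dimensions)
    path = []
    while remaining:
        # i) + ii): detect anomalies and compute the degree per dimension.
        candidates = {}
        for dim in remaining:
            abnormal, considered_count = detect(dim, path)
            # Assumption: a dimension without abnormal values yields no node.
            if abnormal and considered_count:
                degree = Fraction(len(abnormal), considered_count)
                candidates[dim] = (list(abnormal), degree)
        if not candidates:
            break
        # iii): the dimension with the minimum degree gives the next node.
        dim = min(candidates, key=lambda d: candidates[d][1])
        abnormal, degree = candidates[dim]
        path.append((dim, abnormal))
        if not condition(degree):
            break  # last root cause node, no finer drill-down
        # iv): drop that dimension and iterate on the reduced set.
        remaining.remove(dim)
    return path


def demo_detect(dim, path):
    # Stub detector with hard-coded results mirroring the fig. 3 example.
    if not path:  # first pass, nothing fixed yet
        table = {"province": (["Guangdong"], 13), "operator": (["Mobile"], 3),
                 "domain name": (["alimov2"], 7), "network": (["WIFI"], 2)}
    else:  # after Guangdong has been fixed as a root cause node
        table = {"operator": ([], 3), "domain name": ([], 7),
                 "network": (["WIFI"], 1)}
    return table[dim]


path = attribution_probe(
    ["province", "operator", "domain name", "network"], demo_detect
)
print(path)
```

In a real system `demo_detect` would be replaced by an anomaly detector that queries the monitored metric for each value of the given dimension under the already fixed nodes.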
After the root cause nodes in the attribution path are determined at step S202, an abnormal root cause is determined from all the root cause nodes in the attribution path at step S203. Specifically, since each root cause node corresponds to the abnormal attribution dimension value of one attribution dimension, the combination of the abnormal attribution dimension values corresponding to all root cause nodes is the abnormal root cause. For example, in the Henan Mobile example above, if there are two root cause nodes in the attribution path, one corresponding to the network attribution dimension with the abnormal attribution dimension value WIFI, and one corresponding to the domain name attribution dimension with the abnormal attribution dimension value alimov2, then the abnormal root cause is an anomaly of the domain name alimov2 under the WIFI network.
Optionally, according to an exemplary embodiment of the present disclosure, the above-mentioned abnormal root cause positioning method may further include providing alarm information according to the determined abnormal root cause. The alarm information may be provided to the user in various ways (e.g., an audible alarm). By providing the alarm information, the relevant operation and maintenance personnel can take corresponding countermeasures in time according to the abnormal root cause.
The abnormal root cause localization method according to the exemplary embodiment of the present disclosure has been described above with reference to fig. 2. This method determines the root cause nodes in the attribution path of the anomaly by performing an attribution probing operation according to an attribution dimension set. Since the attribution dimension set includes a plurality of attribution dimensions and each root cause node corresponds to the abnormal attribution dimension value of one attribution dimension, an abnormal root cause spanning a plurality of dimensions can be accurately localized according to all root cause nodes in the attribution path.
Hereinafter, in order to more clearly understand the present disclosure, the above-described abnormal root cause localization method is briefly described in conjunction with an example scenario to which exemplary embodiments of the present disclosure may be applied.
Fig. 3 is a schematic diagram illustrating an example of an abnormal root cause localization method of an exemplary embodiment of the present disclosure. The example of fig. 3 concerns abnormal root cause localization in a scenario where streaming media services are provided with a CDN. Referring to fig. 3, anomalies may be detected in streaming media quality of service monitoring, e.g., download failure rate anomalies may be detected. Assume that, through anomaly detection, an anomaly in the download failure rate of the resource PREFETCH_VIDEO is discovered. In this case, an attribution probing operation may be performed according to the attribution dimension set to determine the root cause nodes in the attribution path of this anomaly. For example, as shown in fig. 3, the attribution dimension set can be (province, operator, domain name, network) or can be represented as (provice, host, isp, network). First, anomaly detection may be performed under each attribution dimension in the attribution dimension set to determine the abnormal attribution dimension value of each attribution dimension. For example, as shown in fig. 3, anomaly detection is performed in the four dimensions of province, operator, domain name, and network, respectively (i.e., which of the considered provinces, which of the considered operators, which of the considered domain names, and which of the considered networks have a download failure rate anomaly), and the abnormal attribution dimension value of each attribution dimension can be determined from the detection under that dimension. For example, as shown in fig. 3, the abnormal attribution dimension values of the four dimensions province, operator, domain name, and network are Guangdong, Mobile, alimov2, and WIFI, respectively.
Next, the degree of abnormality of each attribution dimension can be calculated. Assuming that the numbers of attribution dimension values considered when anomaly detection is performed in the four dimensions of province, operator, domain name, and network are 13, 3, 7, and 2, respectively, the ratio of the number of abnormal attribution dimension values of each attribution dimension to the number of attribution dimension values considered under that attribution dimension can be taken as its degree of abnormality. For example, as shown in fig. 3, the degrees of abnormality (also referred to as "saliency") of the four attribution dimensions province, operator, domain name, and network calculated in this manner are 1/13, 1/3, 1/7, and 1/2, respectively. Subsequently, it may be determined whether the minimum of all the degrees of abnormality (i.e., the minimum degree of abnormality) satisfies a predetermined condition. In the example of fig. 3, the predetermined condition is that the minimum degree of abnormality is greater than 0 and less than 1/3. Since the minimum among the calculated degrees 1/13, 1/3, 1/7, and 1/2 is 1/13, which satisfies the predetermined condition of being greater than 0 and less than 1/3, the abnormal attribution dimension value of the attribution dimension whose degree of abnormality is 1/13 is determined to correspond to a root cause node in the attribution path; that is, Guangdong is determined to correspond to one root cause node.
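The arithmetic of this first round can be checked with a few lines. The dictionary names are illustrative; the counts are the ones given for fig. 3, with one abnormal value per dimension.

```python
from fractions import Fraction

# Numbers of considered values per attribution dimension, as given for fig. 3.
considered_counts = {"province": 13, "operator": 3, "domain name": 7, "network": 2}
# One abnormal value was found in each dimension (Guangdong, Mobile, alimov2, WIFI).
abnormal_counts = {dim: 1 for dim in considered_counts}

degrees = {
    dim: Fraction(abnormal_counts[dim], count)
    for dim, count in considered_counts.items()
}
min_dim = min(degrees, key=degrees.get)
satisfies = 0 < degrees[min_dim] < Fraction(1, 3)
print(min_dim, degrees[min_dim], satisfies)
```

The province dimension has the smallest degree, 1/13, and the condition holds, so the drill-down continues under Guangdong.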
Next, the attribution dimension with the minimum degree of abnormality (i.e., province) is removed from the current attribution dimension set (province, operator, domain name, network), and the operations of i) to iv) described above are continued until the attribution dimension set is empty. After the province attribution dimension is removed from the attribution dimension set, anomaly detection is performed under each of the remaining attribution dimensions to determine the abnormal attribution dimension value of each. That is, it is detected which of the considered operators in Guangdong province, which of the considered domain names in Guangdong province, and which of the considered networks in Guangdong province have an abnormal download failure rate. As shown in fig. 3, since no operator or domain name with an abnormal download failure rate is detected, the degree of abnormality of both the operator attribution dimension and the domain name attribution dimension is 0. Since the WIFI network is detected to have an abnormal download failure rate, and the WIFI network is the only attribution dimension value considered under the network attribution dimension, the degree of abnormality of the network attribution dimension is 1/1. Because the minimum degree of abnormality does not satisfy the predetermined condition, the current node is the last root cause node, and no further drill-down into finer dimensions is needed. At this point, the abnormal root cause may be determined from all root cause nodes in the attribution path.
Specifically, in the example of fig. 3, it may be determined that the anomaly of Guangdong-WIFI is the root cause of the anomaly in the download failure rate of the resource PREFETCH_VIDEO. In addition, alarm information can be provided according to the abnormal root cause; for example, the following alarm information can be output: PREFETCH-VIDEO-Guangdong-WIFI.
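A possible way to assemble such an alarm string from the root cause nodes is sketched below. The function name, the node representation, and the hyphen-joined layout are assumptions inferred from the single sample output above; the disclosure does not specify an alarm format.

```python
def format_alarm(metric_label, root_cause_nodes):
    """Join the metric label and the abnormal values of all root cause nodes."""
    values = [value for _dimension, value in root_cause_nodes]
    return "-".join([metric_label, *values])


# Root cause nodes from the fig. 3 example: (attribution dimension, abnormal value).
nodes = [("province", "Guangdong"), ("network", "WIFI")]
alarm = format_alarm("PREFETCH-VIDEO", nodes)
print(alarm)
```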
Although the above has briefly described the method for locating an abnormal root cause according to the exemplary embodiment of the present disclosure by taking a scenario in which a CDN provides a streaming media service as an example, the method for locating an abnormal root cause of the present disclosure is not limited to the above exemplary scenario, but may be applied to other scenarios requiring abnormal root cause location as needed.
Fig. 4 is a block diagram of an anomaly root cause locating device 400 of an exemplary embodiment of the present disclosure.
Referring to fig. 4, the abnormal root cause locating device 400 may include an abnormality detection unit 401, a root cause node determination unit 402, and an abnormal root cause determination unit 403. Specifically, the abnormality detection unit 401 may be configured to detect an anomaly. The root cause node determination unit 402 may be configured to, in response to detecting an anomaly, perform an attribution probing operation according to the attribution dimension set to determine the root cause nodes in the attribution path of the anomaly. Here, the attribution dimension set includes a plurality of attribution dimensions, and each root cause node corresponds to the abnormal attribution dimension value of one attribution dimension. The abnormal root cause determination unit 403 may be configured to determine an abnormal root cause from all root cause nodes in the attribution path. Optionally, the abnormal root cause locating device 400 may further include an alarm unit (not shown), which may be configured to provide alarm information according to the determined abnormal root cause.
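The division into units can be pictured as three pluggable callables wired together in sequence. The class, field names, and stub behaviours below are hypothetical and serve only to illustrate the data flow between units 401, 402, and 403.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Node = Tuple[str, str]  # (attribution dimension, abnormal attribution dimension value)


@dataclass
class AnomalyRootCauseLocator:
    detect_anomaly: Callable[[], bool]                    # role of unit 401
    determine_root_cause_nodes: Callable[[], List[Node]]  # role of unit 402
    determine_root_cause: Callable[[List[Node]], str]     # role of unit 403

    def run(self) -> Optional[str]:
        # Probing only happens in response to a detected anomaly.
        if not self.detect_anomaly():
            return None
        nodes = self.determine_root_cause_nodes()
        return self.determine_root_cause(nodes)


# Stub units with fixed results taken from the fig. 3 example.
locator = AnomalyRootCauseLocator(
    detect_anomaly=lambda: True,
    determine_root_cause_nodes=lambda: [("province", "Guangdong"), ("network", "WIFI")],
    determine_root_cause=lambda nodes: "-".join(value for _, value in nodes),
)
result = locator.run()
print(result)
```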
Since the abnormal root cause location method shown in fig. 2 can be executed by the abnormal root cause location apparatus 400 shown in fig. 4, and the abnormality detection unit 401, the root cause node determination unit 402, and the abnormal root cause determination unit 403 can respectively execute operations corresponding to step 201, step 202, and step 203 in fig. 2, any relevant details related to the operations executed by each unit in fig. 4 can be referred to in the corresponding description about fig. 2, and are not repeated here.
Furthermore, it should be noted that although the abnormal root cause locating device 400 is described as being divided into units for respectively executing corresponding processing, it is clear to those skilled in the art that the processing executed by each unit can also be executed without any specific unit division or explicit demarcation between units by the abnormal root cause locating device 400. Furthermore, the anomaly root cause locating device 400 may also include other units, such as a data processing unit, a storage unit, and the like.
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Referring to fig. 5, an electronic device 500 may include at least one memory 501 and at least one processor 502, the at least one memory 501 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 502, perform an abnormal root cause localization method in accordance with embodiments of the present disclosure.
By way of example, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the set of instructions described above. The electronic device need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) either individually or in combination. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, which may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integral to the processor, e.g., RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform an abnormal root cause localization method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer such that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product, instructions of which are executable by at least one processor in an electronic device to perform an abnormal root cause localization method according to an exemplary embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An abnormal root cause locating method is characterized by comprising the following steps:
detecting an anomaly;
in response to detecting an anomaly, performing an attribution probe operation according to an attribution dimension set to determine root cause nodes in an attribution path of the anomaly, wherein the attribution dimension set comprises a plurality of attribution dimensions, each root cause node corresponding to an anomaly attribution dimension value of one attribution dimension;
determining an abnormal root cause from all root cause nodes in the attribution path.
2. The method for locating abnormal root cause according to claim 1, wherein the detecting abnormality includes:
performing anomaly detection on the user's dimension of interest or dimension combination of interest to determine an abnormal value of the dimension of interest or of the dimension combination of interest,
wherein the performing an attribution probing operation according to the set of attribution dimensions to determine root cause nodes in the attribution path of the anomaly comprises:
performing, for the abnormal value of the dimension of interest or of the dimension combination of interest, the attribution probing operation according to the attribution dimension set to determine the root cause nodes in the attribution path.
3. The method of abnormal root cause localization according to claim 1 or 2, wherein the attribution detection operation comprises:
for each attribution dimension in the attribution dimension set, respectively performing anomaly detection under each attribution dimension to determine an anomaly attribution dimension value of each attribution dimension;
calculating the abnormal degree of each attribution dimension;
and determining root cause nodes in the attribution paths according to the abnormal degrees of all attribution dimensions in the attribution dimension set.
4. The method for abnormal root cause localization according to claim 3, wherein the determining root cause nodes in the attribution path according to the abnormal degrees of all attribution dimensions in the attribution dimension set comprises:
in the case where the minimum value of the abnormal degrees of all attribution dimensions in the attribution dimension set satisfies a predetermined condition, determining the abnormal attribution dimension value of the attribution dimension whose abnormal degree is equal to the minimum value as corresponding to a root cause node in the attribution path, updating the attribution dimension set by removing the attribution dimension whose abnormal degree is equal to the minimum value from the attribution dimension set, and executing the attribution detection operation according to the updated attribution dimension set until the attribution dimension set is empty;
in the case where the minimum value of the abnormal degrees of all attribution dimensions in the attribution dimension set does not satisfy a predetermined condition, determining the abnormal attribution dimension value of the attribution dimension, the abnormal degree of which is equal to the minimum value, as corresponding to the root cause node in the attribution path, and not performing the attribution detection operation any more.
5. The method of anomaly root cause localization according to claim 4, wherein the degree of anomaly for each attribution dimension is equal to the ratio of the number of anomaly attribution dimension values for each attribution dimension to the number of attribution dimension values considered in performing anomaly detection for the each attribution dimension.
6. The method for locating abnormal root cause according to claim 1, wherein the detecting abnormality includes:
detecting an abnormality occurring in the content distribution network,
wherein the set of attribution dimensions includes a plurality of dimensions that affect the quality of the content distribution network.
7. The method of anomaly root cause location according to claim 6, wherein said anomaly comprises an anomaly related to a streaming media service.
8. An abnormal root cause locating device, comprising:
an abnormality detection unit configured to detect an abnormality;
a root cause node determination unit configured to, in response to detecting an anomaly, perform an attribution probe operation according to an attribution dimension set to determine root cause nodes in an attribution path of the anomaly, wherein the attribution dimension set includes a plurality of attribution dimensions, each root cause node corresponding to an anomaly attribution dimension value of one attribution dimension;
an abnormal root cause determination unit configured to determine an abnormal root cause from all root cause nodes in the attribution path.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of anomaly root cause localization as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of anomaly root cause localization according to any one of claims 1 to 7.
CN202011511168.0A 2020-12-18 2020-12-18 Abnormal root cause positioning method and device, electronic equipment and storage medium Active CN112702198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011511168.0A CN112702198B (en) 2020-12-18 2020-12-18 Abnormal root cause positioning method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112702198A true CN112702198A (en) 2021-04-23
CN112702198B CN112702198B (en) 2023-03-14

Family

ID=75509093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011511168.0A Active CN112702198B (en) 2020-12-18 2020-12-18 Abnormal root cause positioning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112702198B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200053108A1 (en) * 2018-08-07 2020-02-13 Apple Inc. Utilizing machine intelligence to identify anomalies
CN111064614A (en) * 2019-12-17 2020-04-24 腾讯科技(深圳)有限公司 Fault root cause positioning method, device, equipment and storage medium
CN111078521A (en) * 2019-12-18 2020-04-28 北京三快在线科技有限公司 Abnormal event analysis method, device, equipment, system and storage medium
CN111158977A (en) * 2019-12-12 2020-05-15 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device
CN111444247A (en) * 2020-06-17 2020-07-24 北京必示科技有限公司 KPI (Key performance indicator) -based root cause positioning method and device and storage medium
CN111538951A (en) * 2020-03-31 2020-08-14 北京华三通信技术有限公司 Abnormity positioning method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547133A (en) * 2022-01-17 2022-05-27 北京元年科技股份有限公司 Multi-dimensional dataset-based conversational attribution analysis method, device and equipment
CN114900421A (en) * 2022-04-08 2022-08-12 深圳绿米联创科技有限公司 Fault detection method and device, electronic equipment and readable storage medium
CN117389230A (en) * 2023-11-16 2024-01-12 广州中健中医药科技有限公司 Antihypertensive traditional Chinese medicine extract production control method and system
CN117389230B (en) * 2023-11-16 2024-06-07 广州中健中医药科技有限公司 Antihypertensive traditional Chinese medicine extract production control method and system

Also Published As

Publication number Publication date
CN112702198B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN112702198B (en) Abnormal root cause positioning method and device, electronic equipment and storage medium
US10048995B1 (en) Methods and apparatus for improved fault analysis
US20240195833A1 (en) System for automated capture and analysis of business information for security and client-facing infrastructure reliability
US10178198B2 (en) System and method for selection and switching of content sources for a streaming content session
US9158650B2 (en) Mobile application performance management
US9405666B2 (en) Health monitoring using snapshot backups through test vectors
US9553909B2 (en) System and method for assignment and switching of content sources for a streaming content session
US9929930B2 (en) Reducing an amount of captured network traffic data to analyze
CN107332765B (en) Method and apparatus for repairing router failures
CN110727560A (en) Cloud service alarm method and device
US20200250019A1 (en) Method, device and computer program product for monitoring access request
US20160134508A1 (en) Non-disruptive integrated network infrastructure testing
CN113242443B (en) Data stream transmission abnormity detection method and device
US10423480B2 (en) Guided troubleshooting with autofilters
US20220337809A1 (en) Video playing
CN112954372B (en) Streaming media fault monitoring method and device
US10372524B2 (en) Storage anomaly detection
CN112702229A (en) Data transmission method, device, electronic equipment and storage medium
CN112769643B (en) Resource scheduling method and device, electronic equipment and storage medium
US9952773B2 (en) Determining a cause for low disk space with respect to a logical disk
CN112671590B (en) Data transmission method and device, electronic equipment and computer storage medium
WO2018001114A1 (en) Method and device for processing data
CN113079103A (en) Audio transmission method, audio transmission device, electronic equipment and storage medium
CN114049065A (en) Data processing method, device and system
CN111258845A (en) Detection of event storms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant